Performance testing is critical for APIs that need to handle real-world traffic. A slow API can frustrate users, lose customers, and cost money. But how do you know if your API can handle 100 users? 1,000? 10,000? Performance and load testing gives you the answer before your users find out the hard way.
This guide covers practical performance testing strategies, load testing tools, benchmarking techniques, and how to identify and fix bottlenecks. You'll learn how to test response times, throughput, and scalability to ensure your API performs well under real-world load.
Types of Performance Testing
Load Testing - Expected Traffic
Load testing simulates expected user traffic to verify your API handles normal and peak loads. This is your baseline performance test.
What to Test:
- Normal Load: Average daily traffic (e.g., 100 requests/second)
- Peak Load: Highest expected traffic (e.g., 500 requests/second during sales)
- Sustained Load: Can the API handle peak load for extended periods?
- Response Times: Do requests complete within acceptable time (e.g., <200ms)?
- Error Rate: Are there errors under normal load? (Target: <0.1%)
// Example: Load test with k6
import http from 'k6/http';
import { check, sleep } from 'k6';

export let options = {
  stages: [
    { duration: '2m', target: 100 }, // Ramp up to 100 users
    { duration: '5m', target: 100 }, // Stay at 100 users
    { duration: '2m', target: 0 },   // Ramp down
  ],
  thresholds: {
    http_req_duration: ['p(95)<200'], // 95% of requests < 200ms
    http_req_failed: ['rate<0.01'],   // Error rate < 1%
  },
};

export default function () {
  const res = http.get('https://api.example.com/products');
  check(res, {
    'status is 200': (r) => r.status === 200,
    'response time < 200ms': (r) => r.timings.duration < 200,
  });
  sleep(1);
}

Stress Testing - Breaking Point
Stress testing pushes your API beyond normal limits to find the breaking point. When does it start to slow down? When does it crash? How does it recover?
What to Test:
- Maximum Capacity: How many users before performance degrades?
- Failure Behavior: Does it crash gracefully or catastrophically?
- Error Messages: Are errors meaningful when overloaded?
- Recovery: Can it recover when load decreases?
- Resource Exhaustion: What resource runs out first? (CPU, memory, connections)
// Stress test - gradually increase load until failure
export let options = {
  stages: [
    { duration: '2m', target: 100 },  // Normal load
    { duration: '5m', target: 200 },  // Above normal
    { duration: '5m', target: 300 },  // Higher
    { duration: '5m', target: 400 },  // Even higher
    { duration: '5m', target: 500 },  // Keep pushing
    { duration: '10m', target: 0 },   // Recovery
  ],
};
// Monitor where performance starts to degrade

Spike Testing - Sudden Traffic Surges
Spike testing simulates sudden, dramatic increases in traffic. Think: product launch, viral content, flash sales. Can your API handle it?
Real-World Scenarios:
- Black Friday sales - traffic 10x overnight
- Product launch - sudden surge from marketing campaign
- Social media viral post - unexpected traffic spike
- DDoS attack - malicious traffic spike
// Spike test - sudden jump in traffic
export let options = {
  stages: [
    { duration: '1m', target: 100 },  // Normal
    { duration: '1m', target: 1000 }, // Sudden spike!
    { duration: '3m', target: 1000 }, // Hold spike
    { duration: '1m', target: 100 },  // Back to normal
  ],
};
// Test: Does the API handle the sudden surge?
// Does auto-scaling kick in fast enough?

Soak Testing - Long-Term Stability
Soak testing (endurance testing) runs for hours or days to catch memory leaks, resource exhaustion, and degradation over time.
What to Look For:
- Memory Leaks: Does memory usage grow over time?
- Connection Leaks: Are database connections properly closed?
- Log File Growth: Does logging fill up disk space?
- Performance Degradation: Does response time increase over hours?
- Resource Cleanup: Are temporary files/caches cleaned up?
// Soak test - run for 24+ hours
export let options = {
  stages: [
    { duration: '5m', target: 200 },  // Ramp up
    { duration: '24h', target: 200 }, // Run for 24 hours
    { duration: '5m', target: 0 },    // Ramp down
  ],
};
// Monitor system metrics throughout:
// - Memory usage
// - CPU usage
// - Database connections
// - Response times

Key Performance Metrics to Measure
Response Time (Latency)
How long does it take for the API to respond? This is what users feel directly.
Important Percentiles:
- p50 (median): 50% of requests are faster than this (typical experience)
- p95: 95% of requests are faster (good user experience benchmark)
- p99: 99% of requests are faster (catches outliers)
- p99.9: Worst-case scenarios (important for SLAs)
// Good performance targets:
// p50:   < 100ms (median user sees sub-100ms)
// p95:   < 200ms (95% of users happy)
// p99:   < 500ms (99% acceptable)
// p99.9: < 1s    (even the worst case stays under 1 second)
//
// Don't just look at averages!
// An average of 150ms can hide some 5-second requests.
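If you collect raw latency samples yourself, the percentiles are easy to compute. A minimal Node.js sketch using the nearest-rank method (not tied to any particular load testing tool; the sample data is made up for illustration):

```javascript
// Nearest-rank percentile: sort the samples, pick the value at rank ceil(p/100 * n).
function percentile(samples, p) {
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(rank - 1, 0)];
}

// Example: 100 latency samples in ms, mostly fast with a few slow outliers
const latencies = [];
for (let i = 1; i <= 97; i++) latencies.push(50 + i); // 51..147ms
latencies.push(900, 2000, 5000);                      // three outliers

console.log(percentile(latencies, 50)); // 100 - typical experience
console.log(percentile(latencies, 99)); // 2000 - the tail exposes the outliers
```

Note how the average here (about 175ms) looks close to the p50 target, while p99 reveals the multi-second requests hiding in the tail.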
Throughput (Requests per Second)
How many requests can your API handle per second? This determines capacity.
Calculating Capacity:
Current Usage: 50 requests/second average
Peak Usage: 200 requests/second (4x average)
Tested Capacity: 800 requests/second
✓ Headroom: 4x peak capacity (good!)
Rule of thumb: Have 2-3x headroom above peak for unexpected spikes
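The headroom calculation above is simple enough to automate, for example as part of a capacity report after each test run (a sketch; the figures are the example numbers from this section):

```javascript
// Headroom = tested capacity / peak usage. Rule of thumb: aim for 2-3x or more.
function headroom(testedCapacityRps, peakRps) {
  return testedCapacityRps / peakRps;
}

const peak = 200;   // requests/second at peak (4x the 50 rps average)
const tested = 800; // capacity demonstrated in load tests

const ratio = headroom(tested, peak);
console.log(`Headroom: ${ratio}x peak`); // 4x: comfortably above the 2-3x rule of thumb
```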
Error Rate
What percentage of requests fail? Under load, some errors are expected, but how many is too many?
✓ Acceptable Error Rates:
- <0.1% - Excellent
- 0.1-0.5% - Good
- 0.5-1% - Acceptable
✗ Problematic Error Rates:
- 1-5% - Investigate
- 5-10% - Serious issue
- >10% - Critical problem
// Track error types separately:
// 4xx errors: client errors (bad requests, auth failures)
// 5xx errors: server errors (crashes, timeouts, DB issues)
//
// 4xx errors might be expected (invalid input).
// 5xx errors are always bad (your fault).
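When post-processing test results, splitting failures by status class takes only a few lines. A sketch over an array of recorded status codes (the function name and data are illustrative):

```javascript
// Classify responses into client-error (4xx) and server-error (5xx) rates.
function errorRates(statusCodes) {
  const total = statusCodes.length;
  const clientErrors = statusCodes.filter((s) => s >= 400 && s < 500).length;
  const serverErrors = statusCodes.filter((s) => s >= 500).length;
  return {
    clientErrorRate: clientErrors / total, // may be expected (bad input)
    serverErrorRate: serverErrors / total, // always a problem
  };
}

// 1000 requests: 990 OK, 8 bad requests, 2 server errors
const codes = [
  ...Array(990).fill(200),
  ...Array(8).fill(400),
  ...Array(2).fill(500),
];
console.log(errorRates(codes)); // { clientErrorRate: 0.008, serverErrorRate: 0.002 }
```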
Resource Utilization
Monitor system resources during tests. Which resource is the bottleneck?
Resources to Monitor:
- CPU: >80% sustained = need more compute or optimization
- Memory: Growing over time = memory leak
- Database Connections: Pool exhausted = need more connections or faster queries
- Network I/O: Bandwidth limit reached = CDN or compression needed
- Disk I/O: Slow disk = use SSD or cache more data
Popular Load Testing Tools
k6 - Modern Load Testing
Developer-friendly, scriptable in JavaScript, great for CI/CD integration. Open source and powerful.
Pros:
- ✓ Write tests in JavaScript
- ✓ Excellent for CI/CD pipelines
- ✓ Great documentation and community
- ✓ Built-in metrics and thresholds
- ✓ Can run locally or in cloud
// Install: brew install k6 (or download binary)
// Run: k6 run script.js
import http from 'k6/http';
import { sleep } from 'k6';
export let options = {
  vus: 100,       // 100 virtual users
  duration: '5m', // Run for 5 minutes
};

export default function () {
  http.get('https://api.example.com/users');
  sleep(1);
}

// Output shows real-time metrics:
// http_req_duration: avg=95ms p95=150ms
// http_req_failed: 0.05%
// iterations: 28,500

Apache JMeter - Enterprise Standard
Industry-standard tool with GUI. More complex but very powerful for large-scale enterprise testing.
✓ Best For:
- Enterprise environments
- Complex test scenarios
- Teams used to GUI tools
- Detailed reporting needs
⚠ Drawbacks:
- Steeper learning curve
- Java-based (heavier)
- GUI can be slow
- Harder to version control
Artillery - Simple & Fast
Easy to get started, YAML-based configuration, good for quick load tests.
# Install: npm install -g artillery
# Run: artillery run loadtest.yml
# loadtest.yml
config:
  target: "https://api.example.com"
  phases:
    - duration: 60
      arrivalRate: 10  # 10 new users per second
    - duration: 120
      arrivalRate: 50  # Ramp to 50/second

scenarios:
  - name: "Get users"
    flow:
      - get:
          url: "/users"
      - think: 2  # Wait 2 seconds
      - get:
          url: "/users/{{ $randomNumber(1, 1000) }}"

# Artillery outputs a summary with response times and errors

Locust - Python-Based
Write load tests in Python. Great web UI for monitoring tests in real-time.
# Install: pip install locust
# Run: locust -f locustfile.py
from locust import HttpUser, task, between

class APIUser(HttpUser):
    wait_time = between(1, 3)  # Wait 1-3s between requests

    @task(3)  # 3x more likely than other tasks
    def get_users(self):
        self.client.get("/users")

    @task(1)
    def get_user(self):
        self.client.get("/users/123")

    @task(2)
    def create_user(self):
        self.client.post("/users", json={
            "name": "Test User",
            "email": "test@example.com"
        })

# Open http://localhost:8089 to control the test and see real-time stats

Identifying and Fixing Bottlenecks
Database Bottleneck
Most common bottleneck. Slow queries kill API performance.
Symptoms:
- Response times increase with more users
- Database CPU or connections maxed out
- API server CPU is low but responses are slow
- Connection pool exhausted errors
Solutions:
- ✓ Add database indexes on queried columns
- ✓ Optimize slow queries (use EXPLAIN)
- ✓ Increase connection pool size
- ✓ Add caching layer (Redis, Memcached)
- ✓ Use read replicas for read-heavy workloads
- ✓ Implement pagination (don't fetch all data)
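A caching layer usually follows the cache-aside pattern: check the cache, fall back to the database on a miss, and store the result with a TTL. A minimal in-memory sketch (a real deployment would use Redis or Memcached and async I/O; `fetchFromDb` is a hypothetical stand-in for your actual query):

```javascript
// Cache-aside with TTL: serve from cache while fresh, otherwise hit the DB.
// (Synchronous sketch for clarity; real DB calls would be awaited.)
const cache = new Map(); // key -> { value, expiresAt }

function getCached(key, fetchFromDb, ttlMs = 60_000) {
  const entry = cache.get(key);
  if (entry && entry.expiresAt > Date.now()) {
    return entry.value; // cache hit: no database round-trip
  }
  const value = fetchFromDb(key); // cache miss: one slow query
  cache.set(key, { value, expiresAt: Date.now() + ttlMs });
  return value;
}

// Usage: the "database" is only hit on the first call
let dbCalls = 0;
const fakeDb = (key) => { dbCalls++; return { key, fromDb: true }; };

getCached('user:1', fakeDb);
getCached('user:1', fakeDb); // served from cache
console.log(dbCalls); // 1
```

Under load, every cache hit is a database query you didn't run, which is exactly what relieves an exhausted connection pool.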
CPU Bottleneck
Your code is doing too much work. CPU is maxed but throughput is low.
Symptoms:
- CPU at 100% on API servers
- Response times increase linearly with load
- Database is fine, but API is slow
Solutions:
- ✓ Profile code to find slow functions
- ✓ Optimize algorithms (O(n²) → O(n log n))
- ✓ Remove unnecessary processing
- ✓ Cache expensive computations
- ✓ Use async/await to avoid blocking
- ✓ Scale horizontally (more servers)
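"Optimize algorithms" can be as simple as swapping a nested loop for a hash-based lookup. A sketch (hypothetical helper names): finding duplicate IDs in O(n) with a Set instead of O(n²) with nested loops:

```javascript
// O(n²): compare every pair - fine for 100 items, painful for 100,000
function hasDuplicateSlow(ids) {
  for (let i = 0; i < ids.length; i++) {
    for (let j = i + 1; j < ids.length; j++) {
      if (ids[i] === ids[j]) return true;
    }
  }
  return false;
}

// O(n): one pass with a Set
function hasDuplicateFast(ids) {
  const seen = new Set();
  for (const id of ids) {
    if (seen.has(id)) return true;
    seen.add(id);
  }
  return false;
}

console.log(hasDuplicateFast([1, 2, 3, 2])); // true
console.log(hasDuplicateFast([1, 2, 3]));    // false
```

Both functions return the same answers; only the CPU cost differs, and that difference grows quadratically with payload size.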
Memory Bottleneck
Running out of memory causes swapping, garbage collection pauses, and crashes.
Symptoms:
- • Memory usage grows over time (leak)
- • Intermittent slow responses (GC pauses)
- • Out of memory crashes under load
- • System starts swapping to disk
Solutions:
- ✓ Fix memory leaks (unclosed connections, event listeners)
- ✓ Stream large responses instead of buffering
- ✓ Implement pagination
- ✓ Increase server memory
- ✓ Use memory profiler to find leaks
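"Stream instead of buffering" means never holding the full response in memory at once. A sketch using a generator that serializes rows one at a time (in a real API you would pipe these chunks into the HTTP response; here they are concatenated only to demonstrate the output):

```javascript
// Generator: yields one serialized row at a time instead of
// building one giant string for the whole result set.
function* streamJsonArray(rows) {
  yield '[';
  let first = true;
  for (const row of rows) {
    yield (first ? '' : ',') + JSON.stringify(row);
    first = false;
  }
  yield ']';
}

// Demo only - a real server would write each chunk to the socket:
const rows = [{ id: 1 }, { id: 2 }];
let body = '';
for (const chunk of streamJsonArray(rows)) body += chunk;
console.log(body); // [{"id":1},{"id":2}]
```

Because only one row is in flight at a time, peak memory stays flat whether the result set has two rows or two million.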
Network Bottleneck
Bandwidth limit reached or high latency between services.
Solutions:
- ✓ Enable compression (gzip, brotli)
- ✓ Use CDN for static assets
- ✓ Reduce response payload size
- ✓ Use HTTP/2 or HTTP/3
- ✓ Colocate services in same region
Performance Testing Best Practices
✓ Test in Production-Like Environment
Don't test on your laptop. Use staging environment with similar specs to production: same server size, same database, same network setup. Otherwise results are meaningless.
✓ Use Realistic Test Data
Load production database snapshot into staging. Testing with 100 records doesn't tell you how it performs with 10 million. Query performance changes dramatically with data size.
✓ Simulate Real User Behavior
Don't just hammer one endpoint. Real users browse, search, create, update. Mix different request types and think times between requests.
// Realistic user journey
export default function () {
  // 1. Browse products
  http.get('/products');
  sleep(2); // User reads page

  // 2. View product detail
  http.get('/products/123');
  sleep(5); // User reads details

  // 3. Add to cart
  http.post('/cart', { product_id: 123, qty: 1 });
  sleep(1);

  // 4. Checkout (10% of users)
  if (Math.random() < 0.1) {
    http.post('/checkout', {...});
  }
}

✓ Run Tests Regularly
Performance degrades over time as code changes. Run performance tests in CI/CD to catch regressions early. Set up alerts if metrics exceed thresholds.
✓ Set Performance Budgets
Define acceptable performance and fail builds if exceeded.
// k6 thresholds - fail test if not met
export let options = {
  thresholds: {
    'http_req_duration': ['p(95)<200'],                  // 95% under 200ms
    'http_req_duration{endpoint:search}': ['p(95)<500'], // Search can be slower
    'http_req_failed': ['rate<0.01'],                    // < 1% errors
    'http_reqs': ['rate>100'],                           // > 100 req/s throughput
  },
};
// Test fails if any threshold is not met

Related Tools & Resources
External References
- k6 Documentation - Official k6 load testing guide
- Apache JMeter - Enterprise load testing tool
- Artillery Documentation - Modern load testing toolkit
- Locust - Python-based load testing framework
- Web Vitals - Google's performance metrics guide
Conclusion
Performance and load testing is not optional for production APIs. You need to know how your API performs before your users experience slowness or outages. By testing different load scenarios - normal load, peak load, stress, spikes, and endurance - you can identify bottlenecks and fix them proactively.
Start with simple load tests using k6 or Artillery, measure key metrics like response time and throughput, and gradually expand to more sophisticated scenarios. Set performance budgets and run tests regularly in CI/CD. Remember: a slow API costs money in lost users and revenue. Invest in performance testing now to save much more later. Your users and your bottom line will thank you.