API Performance & Load Testing

Complete guide to testing API performance, identifying bottlenecks, and ensuring scalability

Published: January 2025 • 16 min read

Performance testing is critical for APIs that need to handle real-world traffic. A slow API can frustrate users, lose customers, and cost money. But how do you know if your API can handle 100 users? 1,000? 10,000? Performance and load testing gives you the answer before your users find out the hard way.

This guide covers practical performance testing strategies, load testing tools, benchmarking techniques, and how to identify and fix bottlenecks. You'll learn how to test response times, throughput, and scalability to ensure your API performs well under any load.

Types of Performance Testing

Load Testing - Expected Traffic

Load testing simulates expected user traffic to verify your API handles normal and peak loads. This is your baseline performance test.

What to Test:

  • Normal Load: Average daily traffic (e.g., 100 requests/second)
  • Peak Load: Highest expected traffic (e.g., 500 requests/second during sales)
  • Sustained Load: Can the API handle peak load for extended periods?
  • Response Times: Do requests complete within acceptable time (e.g., <200ms)?
  • Error Rate: Are there errors under normal load? (Target: <0.1%)
// Example: Load test with k6
import http from 'k6/http';
import { check, sleep } from 'k6';

export let options = {
  stages: [
    { duration: '2m', target: 100 },  // Ramp up to 100 users
    { duration: '5m', target: 100 },  // Stay at 100 users
    { duration: '2m', target: 0 },    // Ramp down
  ],
  thresholds: {
    http_req_duration: ['p(95)<200'], // 95% of requests < 200ms
    http_req_failed: ['rate<0.01'],   // Error rate < 1%
  },
};

export default function () {
  const res = http.get('https://api.example.com/products');
  
  check(res, {
    'status is 200': (r) => r.status === 200,
    'response time < 200ms': (r) => r.timings.duration < 200,
  });
  
  sleep(1);
}

Stress Testing - Breaking Point

Stress testing pushes your API beyond normal limits to find the breaking point. When does it start to slow down? When does it crash? How does it recover?

What to Test:

  • Maximum Capacity: How many users before performance degrades?
  • Failure Behavior: Does it crash gracefully or catastrophically?
  • Error Messages: Are errors meaningful when overloaded?
  • Recovery: Can it recover when load decreases?
  • Resource Exhaustion: What resource runs out first? (CPU, memory, connections)
// Stress test - gradually increase load until failure
export let options = {
  stages: [
    { duration: '2m', target: 100 },   // Normal load
    { duration: '5m', target: 200 },   // Above normal
    { duration: '5m', target: 300 },   // Higher
    { duration: '5m', target: 400 },   // Even higher
    { duration: '5m', target: 500 },   // Keep pushing
    { duration: '10m', target: 0 },    // Recovery
  ],
};

// Monitor where performance starts to degrade

Spike Testing - Sudden Traffic Surges

Spike testing simulates sudden, dramatic increases in traffic. Think: product launch, viral content, flash sales. Can your API handle it?

Real-World Scenarios:

  • Black Friday sales - traffic 10x overnight
  • Product launch - sudden surge from marketing campaign
  • Social media viral post - unexpected traffic spike
  • DDoS attack - malicious traffic spike
// Spike test - sudden jump in traffic
export let options = {
  stages: [
    { duration: '1m', target: 100 },    // Normal
    { duration: '1m', target: 1000 },   // Sudden spike!
    { duration: '3m', target: 1000 },   // Hold spike
    { duration: '1m', target: 100 },    // Back to normal
  ],
};

// Test: Does API handle sudden surge?
// Does auto-scaling kick in fast enough?

Soak Testing - Long-Term Stability

Soak testing (endurance testing) runs for hours or days to catch memory leaks, resource exhaustion, and degradation over time.

What to Look For:

  • Memory Leaks: Does memory usage grow over time?
  • Connection Leaks: Are database connections properly closed?
  • Log File Growth: Does logging fill up disk space?
  • Performance Degradation: Does response time increase over hours?
  • Resource Cleanup: Are temporary files/caches cleaned up?
// Soak test - run for 24+ hours
export let options = {
  stages: [
    { duration: '5m', target: 200 },      // Ramp up
    { duration: '24h', target: 200 },     // Run for 24 hours
    { duration: '5m', target: 0 },        // Ramp down
  ],
};

// Monitor system metrics throughout:
// - Memory usage
// - CPU usage
// - Database connections
// - Response times

Key Performance Metrics to Measure

Response Time (Latency)

How long does it take for the API to respond? This is what users feel directly.

Important Percentiles:

  • p50 (median): 50% of requests are faster than this (typical experience)
  • p95: 95% of requests are faster (good user experience benchmark)
  • p99: 99% of requests are faster (catches outliers)
  • p99.9: Worst-case scenarios (important for SLAs)
// Good performance targets:
p50:  < 100ms  (median user sees sub-100ms)
p95:  < 200ms  (95% of users happy)
p99:  < 500ms  (99% acceptable)
p99.9: < 1s    (even worst case under 1 second)

// Don't just look at averages!
// Average: 150ms might hide some 5s requests
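Percentiles are easy to compute yourself. A minimal sketch in plain JavaScript (not a k6 script) using the nearest-rank method, with a made-up sample that shows how an average hides outliers:

```javascript
// Compute a latency percentile from a sample (nearest-rank method).
function percentile(latencies, p) {
  const sorted = [...latencies].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(rank - 1, 0)];
}

// 97 fast requests at 100ms, plus three 5s outliers:
const samples = Array(97).fill(100).concat([5000, 5000, 5000]);
const avg = samples.reduce((sum, v) => sum + v, 0) / samples.length;

console.log(avg);                      // 247 - looks fine on its own
console.log(percentile(samples, 50));  // 100 - typical experience
console.log(percentile(samples, 99));  // 5000 - the outliers exposed
```

The average of 247ms would pass a naive check, while p99 immediately reveals the 5-second requests.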

Throughput (Requests per Second)

How many requests can your API handle per second? This determines capacity.

Calculating Capacity:

Current Usage:    50 requests/second average
Peak Usage:      200 requests/second (4x average)
Tested Capacity: 800 requests/second

✓ Headroom: 4x peak capacity (good!)

Rule of thumb: Have 2-3x headroom above peak for unexpected spikes
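The headroom check is simple arithmetic. A small sketch in plain JavaScript, using the numbers from the example above:

```javascript
// Headroom: how many times peak traffic your tested capacity covers.
function headroom(testedRps, peakRps) {
  return testedRps / peakRps;
}

const peakRps = 200;    // highest observed traffic
const testedRps = 800;  // capacity found under load testing

console.log(headroom(testedRps, peakRps));       // 4
console.log(headroom(testedRps, peakRps) >= 2);  // true: within the 2-3x rule of thumb
```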

Error Rate

What percentage of requests fail? Under load, some errors are expected, but how many is too many?

✓ Acceptable Error Rates:

  • <0.1% - Excellent
  • 0.1-0.5% - Good
  • 0.5-1% - Acceptable

✗ Problematic Error Rates:

  • 1-5% - Investigate
  • 5-10% - Serious issue
  • >10% - Critical problem
// Track error types separately:
4xx errors: Client errors (bad requests, auth failures)
5xx errors: Server errors (crashes, timeouts, db issues)

// 4xx errors might be expected (invalid input)
// 5xx errors are always bad (your fault)
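Tracking the two classes separately is straightforward. A hypothetical helper (plain JavaScript) that splits observed status codes into 4xx and 5xx rates:

```javascript
// Classify responses by status code and compute a rate per class.
function errorRates(statusCodes) {
  const total = statusCodes.length;
  const clientErrors = statusCodes.filter((s) => s >= 400 && s < 500).length;
  const serverErrors = statusCodes.filter((s) => s >= 500).length;
  return {
    clientErrorRate: clientErrors / total,  // 4xx: may be expected (bad input)
    serverErrorRate: serverErrors / total,  // 5xx: always your problem
  };
}

const codes = [200, 200, 200, 404, 200, 500, 200, 200, 422, 200];
console.log(errorRates(codes));
// { clientErrorRate: 0.2, serverErrorRate: 0.1 }
```

A 20% 4xx rate might just mean your test sends invalid input on purpose; a 10% 5xx rate under the same load is a real finding.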

Resource Utilization

Monitor system resources during tests. Which resource is the bottleneck?

Resources to Monitor:

  • CPU: >80% sustained = need more compute or optimization
  • Memory: Growing over time = memory leak
  • Database Connections: Pool exhausted = need more connections or faster queries
  • Network I/O: Bandwidth limit reached = CDN or compression needed
  • Disk I/O: Slow disk = use SSD or cache more data

Popular Load Testing Tools

k6 - Modern Load Testing

Developer-friendly, scriptable in JavaScript, great for CI/CD integration. Open source and powerful.

Pros:

  • ✓ Write tests in JavaScript
  • ✓ Excellent for CI/CD pipelines
  • ✓ Great documentation and community
  • ✓ Built-in metrics and thresholds
  • ✓ Can run locally or in cloud
// Install: brew install k6 (or download binary)
// Run: k6 run script.js

import http from 'k6/http';
import { sleep } from 'k6';

export let options = {
  vus: 100,              // 100 virtual users
  duration: '5m',        // Run for 5 minutes
};

export default function() {
  http.get('https://api.example.com/users');
  sleep(1);
}

// Output shows real-time metrics:
// http_req_duration: avg=95ms p95=150ms
// http_req_failed: 0.05%
// iterations: 28,500

Apache JMeter - Enterprise Standard

Industry-standard tool with GUI. More complex but very powerful for large-scale enterprise testing.

✓ Best For:

  • Enterprise environments
  • Complex test scenarios
  • Teams used to GUI tools
  • Detailed reporting needs

⚠ Drawbacks:

  • Steeper learning curve
  • Java-based (heavier)
  • GUI can be slow
  • Harder to version control

Artillery - Simple & Fast

Easy to get started, YAML-based configuration, good for quick load tests.

# Install: npm install -g artillery
# Run: artillery run loadtest.yml

# loadtest.yml
config:
  target: "https://api.example.com"
  phases:
    - duration: 60
      arrivalRate: 10    # 10 users per second
    - duration: 120
      arrivalRate: 50    # Ramp to 50/second
      
scenarios:
  - name: "Get users"
    flow:
      - get:
          url: "/users"
      - think: 2          # Wait 2 seconds
      - get:
          url: "/users/{{ $randomNumber(1, 1000) }}"

# Artillery outputs summary with response times and errors

Locust - Python-Based

Write load tests in Python. Great web UI for monitoring tests in real-time.

# Install: pip install locust
# Run: locust -f locustfile.py

from locust import HttpUser, task, between

class APIUser(HttpUser):
    wait_time = between(1, 3)  # Wait 1-3s between requests
    
    @task(3)  # weight 3: runs 3x as often as get_user
    def get_users(self):
        self.client.get("/users")
    
    @task(1)
    def get_user(self):
        self.client.get("/users/123")
    
    @task(2)
    def create_user(self):
        self.client.post("/users", json={
            "name": "Test User",
            "email": "[email protected]"
        })

# Open http://localhost:8089 to control test and see real-time stats

Identifying and Fixing Bottlenecks

Database Bottleneck

Most common bottleneck. Slow queries kill API performance.

Symptoms:

  • Response times increase with more users
  • Database CPU or connections maxed out
  • API server CPU is low but responses are slow
  • Connection pool exhausted errors

Solutions:

  • ✓ Add database indexes on queried columns
  • ✓ Optimize slow queries (use EXPLAIN)
  • ✓ Increase connection pool size
  • ✓ Add caching layer (Redis, Memcached)
  • ✓ Use read replicas for read-heavy workloads
  • ✓ Implement pagination (don't fetch all data)
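To illustrate the caching idea, here is a minimal read-through cache with a TTL, sketched in plain JavaScript. A real deployment would use Redis or Memcached; `loadProduct` is a stand-in for a slow database query:

```javascript
// Minimal read-through cache with a time-to-live per entry.
class TtlCache {
  constructor(ttlMs) {
    this.ttlMs = ttlMs;
    this.store = new Map();
  }
  getOrLoad(key, loader) {
    const hit = this.store.get(key);
    if (hit && Date.now() < hit.expires) return hit.value;  // hit: skip the DB
    const value = loader(key);                              // miss: run the query
    this.store.set(key, { value, expires: Date.now() + this.ttlMs });
    return value;
  }
}

// Usage sketch: count how many times the "database" is actually hit.
let dbCalls = 0;
const loadProduct = (id) => {
  dbCalls += 1;
  return { id, name: 'Widget' };
};
const cache = new TtlCache(60_000);  // 60s TTL

cache.getOrLoad('product:123', loadProduct);
cache.getOrLoad('product:123', loadProduct);  // second call served from cache
console.log(dbCalls);  // 1
```

Even a short TTL can absorb most of the read load for hot keys, which is often the difference between a saturated database and a healthy one.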

CPU Bottleneck

Your code is doing too much work. CPU is maxed but throughput is low.

Symptoms:

  • CPU at 100% on API servers
  • Response times increase linearly with load
  • Database is fine, but API is slow

Solutions:

  • ✓ Profile code to find slow functions
  • ✓ Optimize algorithms (O(n²) → O(n log n))
  • ✓ Remove unnecessary processing
  • ✓ Cache expensive computations
  • ✓ Use async/await to avoid blocking
  • ✓ Scale horizontally (more servers)
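As a concrete example of the algorithm point, here is the same duplicate check written two ways: the nested-loop O(n²) version and an O(n) version using a Set (illustrative JavaScript, not from any particular codebase):

```javascript
// O(n²): nested loops compare every pair of elements.
function hasDuplicateSlow(ids) {
  for (let i = 0; i < ids.length; i++) {
    for (let j = i + 1; j < ids.length; j++) {
      if (ids[i] === ids[j]) return true;
    }
  }
  return false;
}

// O(n): a Set makes each membership check roughly constant time.
function hasDuplicateFast(ids) {
  const seen = new Set();
  for (const id of ids) {
    if (seen.has(id)) return true;
    seen.add(id);
  }
  return false;
}

console.log(hasDuplicateFast([1, 2, 3, 2]));  // true
console.log(hasDuplicateFast([1, 2, 3]));     // false
```

On a 10,000-element array the slow version does up to ~50 million comparisons per request; under load, that kind of hidden quadratic work is exactly what pins CPU at 100%.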

Memory Bottleneck

Running out of memory causes swapping, garbage collection pauses, and crashes.

Symptoms:

  • Memory usage grows over time (leak)
  • Intermittent slow responses (GC pauses)
  • Out of memory crashes under load
  • System starts swapping to disk

Solutions:

  • ✓ Fix memory leaks (unclosed connections, event listeners)
  • ✓ Stream large responses instead of buffering
  • ✓ Implement pagination
  • ✓ Increase server memory
  • ✓ Use memory profiler to find leaks
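Pagination is the simplest of these to sketch. A hypothetical helper (plain JavaScript) that returns one bounded page instead of buffering the whole result set:

```javascript
// Pagination sketch: memory per request is bounded by pageSize,
// no matter how large the full result set grows.
function paginate(rows, page, pageSize) {
  const start = (page - 1) * pageSize;
  return {
    data: rows.slice(start, start + pageSize),
    page,
    totalPages: Math.ceil(rows.length / pageSize),
  };
}

const rows = Array.from({ length: 95 }, (_, i) => ({ id: i + 1 }));
const result = paginate(rows, 2, 25);

console.log(result.data.length);  // 25
console.log(result.data[0].id);   // 26
console.log(result.totalPages);   // 4
```

In a real API the slicing would happen in the database query (LIMIT/OFFSET or cursor-based), so the full row set never enters application memory at all.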

Network Bottleneck

Bandwidth limit reached or high latency between services.

Solutions:

  • ✓ Enable compression (gzip, brotli)
  • ✓ Use CDN for static assets
  • ✓ Reduce response payload size
  • ✓ Use HTTP/2 or HTTP/3
  • ✓ Colocate services in same region

Performance Testing Best Practices

✓ Test in Production-Like Environment

Don't test on your laptop. Use staging environment with similar specs to production: same server size, same database, same network setup. Otherwise results are meaningless.

✓ Use Realistic Test Data

Load production database snapshot into staging. Testing with 100 records doesn't tell you how it performs with 10 million. Query performance changes dramatically with data size.

✓ Simulate Real User Behavior

Don't just hammer one endpoint. Real users browse, search, create, update. Mix different request types and think times between requests.

// Realistic user journey (k6 needs absolute URLs; BASE and the
// cart payload are illustrative)
const BASE = 'https://api.example.com';

export default function () {
  // 1. Browse products
  http.get(`${BASE}/products`);
  sleep(2);  // User reads page
  
  // 2. View product detail
  http.get(`${BASE}/products/123`);
  sleep(5);  // User reads details
  
  // 3. Add to cart
  http.post(`${BASE}/cart`, JSON.stringify({ product_id: 123, qty: 1 }), {
    headers: { 'Content-Type': 'application/json' },
  });
  sleep(1);
  
  // 4. Checkout (10% of users)
  if (Math.random() < 0.1) {
    http.post(`${BASE}/checkout` /* + checkout payload */);
  }
}

✓ Run Tests Regularly

Performance degrades over time as code changes. Run performance tests in CI/CD to catch regressions early. Set up alerts if metrics exceed thresholds.

✓ Set Performance Budgets

Define acceptable performance and fail builds if exceeded.

// k6 thresholds - fail test if not met
export let options = {
  thresholds: {
    'http_req_duration': ['p(95)<200'],     // 95% under 200ms
    'http_req_duration{endpoint:search}': ['p(95)<500'],  // Search can be slower
    'http_req_failed': ['rate<0.01'],       // < 1% errors
    'http_reqs': ['rate>100'],              // > 100 req/s throughput
  },
};

// Test fails if any threshold not met


Conclusion

Performance and load testing is not optional for production APIs. You need to know how your API performs before your users experience slowness or outages. By testing different load scenarios - normal load, peak load, stress, spikes, and endurance - you can identify bottlenecks and fix them proactively.

Start with simple load tests using k6 or Artillery, measure key metrics like response time and throughput, and gradually expand to more sophisticated scenarios. Set performance budgets and run tests regularly in CI/CD. Remember: a slow API costs money in lost users and revenue. Invest in performance testing now to save much more later. Your users and your bottom line will thank you.