Performance Testing: Load, Stress, and Soak
- Contributor
- Apr 28
"Is our system fast enough?" is a fuzzy question. "Can it handle 1000 concurrent users?" is a clearer one. "Will it survive a 10x spike?" is another. "Does it stay healthy over 48 hours?" is yet another. Each is a different test, with a different setup, finding different problems.
This guide gives a practical view of the main performance test types and when to use each.
The Three Main Types
Load testing: How does the system behave under expected load? Verifies that normal traffic produces acceptable response times.
Stress testing: What happens when the system is pushed beyond expected load? Reveals breaking points and behavior under overload.
Soak testing: Does the system remain stable over long periods? Surfaces memory leaks, resource exhaustion, and slow-accumulating problems.
There are others (spike, scalability, capacity) but these three cover most needs.
Load Testing
The most common performance test. Run the system at expected production load and measure:
Response times (p50, p95, p99)
Error rates
Throughput (requests per second)
Resource utilization (CPU, memory, network)
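As a rough illustration of how throughput and error rate fall out of raw request records, here is a minimal sketch. The `Request` record, its field names, and the choice to count only 5xx responses as errors are assumptions made for the example, not something any particular tool prescribes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    start_s: float      # seconds since the test started (hypothetical field)
    duration_ms: float  # time to complete the request
    status: int         # HTTP status code


def summarize(requests):
    """Return throughput (requests per second) and error rate for a run."""
    if not requests:
        raise ValueError("no requests recorded")
    first_start = min(r.start_s for r in requests)
    last_end = max(r.start_s + r.duration_ms / 1000 for r in requests)
    window_s = last_end - first_start
    if window_s <= 0:
        raise ValueError("observation window must be positive")
    # Assumption: only 5xx responses count as errors.
    errors = sum(1 for r in requests if r.status >= 500)
    return {
        "throughput_rps": len(requests) / window_s,
        "error_rate": errors / len(requests),
    }


# Hypothetical sample: four requests over two seconds, one server error.
sample = [
    Request(0.0, 500, 200),
    Request(0.5, 500, 200),
    Request(1.0, 500, 500),
    Request(1.5, 500, 200),
]
summary = summarize(sample)
```

Most load testing tools report these figures directly; the sketch only shows what the numbers mean.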
A working load test:
Models realistic traffic patterns (not just one endpoint hit repeatedly)
Includes realistic data (not all the same record)
Includes realistic user behavior (think time between requests)
Runs for long enough to reach steady state (typically 15-30 minutes)
Asserts against specific thresholds
Example assertion: "p95 of /api/checkout under 800ms at 500 concurrent users."
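An assertion like the one above can be sketched in a few lines. This version uses the nearest-rank definition of a percentile (tools differ in which definition they use), and the latency values are made up for illustration.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample with at least
    pct percent of all samples at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]


# Hypothetical /api/checkout latencies in milliseconds.
latencies_ms = [120, 180, 210, 250, 260, 300, 340, 410, 520, 760]

p50 = percentile(latencies_ms, 50)
p95 = percentile(latencies_ms, 95)

# The test fails loudly if the threshold is breached.
assert p95 < 800, f"p95 {p95}ms exceeds the 800ms threshold"
```

With only ten samples the p95 is simply the slowest request, which is one reason a real run needs enough traffic at steady state before percentiles mean much.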
Stress Testing
Push past expected load until something breaks. The goal is understanding limits.
Variants:
Ramp test: gradually increase load until performance degrades
Spike test: sudden burst of high load
Sustained overload: maintain high load to see what fails first
Stress testing answers:
What's our actual capacity?
What fails first under overload? (Database? API gateway? Memory?)
How does the system recover after the load is removed?
Are our scaling triggers correctly set?
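The three variants differ mainly in the shape of the load over time. A minimal sketch of each shape as a function from elapsed seconds to target concurrent users follows; the function names and parameters are hypothetical, and a real tool would express the same idea in its own configuration format.

```python
def stepped_ramp(t, step_users, step_seconds):
    """Ramp test: add step_users every step_seconds, with no upper bound.
    The run is stopped manually once performance degrades."""
    return step_users * (int(t // step_seconds) + 1)


def spike(t, baseline, peak, start, length):
    """Spike test: hold baseline, jump to peak for `length` seconds."""
    return peak if start <= t < start + length else baseline


def sustained_overload(t, overload, ramp_seconds):
    """Sustained overload: climb quickly to the overload level, then hold."""
    if t >= ramp_seconds:
        return overload
    return round(overload * t / ramp_seconds)


# Hypothetical schedules sampled at a few points in time.
ramp_points = [stepped_ramp(t, 100, 60) for t in (0, 59, 60, 150)]
spike_points = [spike(t, 50, 500, 60, 20) for t in (30, 60, 79, 80)]
hold_points = [sustained_overload(t, 1000, 10) for t in (0, 5, 10, 600)]
```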
Soak Testing
Long-running tests at moderate load. Often 24-72 hours.
What soak tests find:
Memory leaks
Connection pool exhaustion
Disk filling up (logs, temp files)
Background job queue buildup
Database performance degradation as data grows
Caching problems (for example, caches that grow without bound or serve stale entries)
Soak tests can be expensive (long-running infrastructure). For most teams, run them periodically — monthly or before major releases.
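One simple way to spot a slow leak in soak results is to fit a line through periodic memory samples and look at the slope. The sketch below does this with a plain least-squares fit. The sample values and the 1 MB/hour limit are invented for illustration, and a real analysis would need to account for warm-up and garbage collection noise.

```python
def slope_per_hour(samples):
    """Least-squares slope of (hours_elapsed, value) samples."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_x = sum(x for x, _ in samples) / n
    mean_y = sum(y for _, y in samples) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in samples)
    den = sum((x - mean_x) ** 2 for x, _ in samples)
    if den == 0:
        raise ValueError("samples must span more than one point in time")
    return num / den


def looks_like_leak(samples, max_mb_per_hour=1.0):
    """Flag sustained growth above a chosen limit (the limit is an assumption)."""
    return slope_per_hour(samples) > max_mb_per_hour


# Hypothetical resident memory in MB, sampled every 6 hours.
leaky = [(0, 512), (6, 530), (12, 548), (18, 566), (24, 584)]
steady = [(0, 512), (6, 515), (12, 510), (18, 514), (24, 511)]
```

The same slope check applies to other slowly accumulating resources from the list above, such as disk usage or queue depth.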
Spike Testing
Sudden traffic burst to test elasticity.
Examples:
Marketing campaign launches
Black Friday rushes
Social media virality
What spike tests verify:
Auto-scaling triggers correctly
The system doesn't fail before scaling completes
Recovery is clean after the spike
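"Recovery is clean" can be turned into a number: how long after the spike ends does latency return to near baseline and stay there? A minimal sketch, assuming time-ordered (seconds, p95) samples and a 20% tolerance band chosen arbitrarily for the example:

```python
def recovery_seconds(samples, spike_end, baseline_ms, tolerance=1.2):
    """Seconds after spike_end until p95 falls to baseline * tolerance
    and stays there for the rest of the run. Returns None if it never does.
    `samples` is a time-ordered list of (seconds, p95_ms) pairs."""
    limit = baseline_ms * tolerance
    recovered_at = None
    for t, p95 in samples:
        if t < spike_end:
            continue
        if p95 <= limit:
            if recovered_at is None:
                recovered_at = t
        else:
            recovered_at = None  # relapse: recovery did not hold
    return None if recovered_at is None else recovered_at - spike_end


# Hypothetical timeline: the spike runs from t=20 to t=40.
timeline = [(0, 200), (10, 210), (20, 900), (30, 1500), (40, 1100),
            (50, 600), (60, 260), (70, 230), (80, 220)]
recovery = recovery_seconds(timeline, spike_end=40, baseline_ms=200)
```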
Capacity / Scalability Testing
Determines maximum sustainable throughput.
Methodology:
Start at low load
Increase incrementally until error rate exceeds threshold
Note the throughput at that point
Note where additional capacity should be added
The result informs capacity planning: "we can handle 1500 RPS on the current cluster; we'd need to add a node at 2000."
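The methodology above amounts to a simple step search. In this sketch, `measure_error_rate` stands in for actually running a test at a given request rate; `fake_error_rate` is an invented model used only so the example runs on its own.

```python
def find_capacity(measure_error_rate, start_rps, step_rps, max_rps,
                  error_threshold=0.01):
    """Step the load up until the error rate exceeds the threshold.
    Returns the highest tested RPS that stayed within it, or None."""
    sustainable = None
    rps = start_rps
    while rps <= max_rps:
        if measure_error_rate(rps) > error_threshold:
            break
        sustainable = rps
        rps += step_rps
    return sustainable


def fake_error_rate(rps):
    """Stand-in for a real test run: clean up to 1500 RPS, then requests
    above that level are treated as failing."""
    return 0.0 if rps <= 1500 else (rps - 1500) / rps


capacity = find_capacity(fake_error_rate, start_rps=500, step_rps=250,
                         max_rps=3000)
```

The step size sets the resolution of the answer: with 250 RPS steps, the true limit lies somewhere between the returned value and the next step.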
Setting Targets
Performance tests need numeric targets. Without them, a test produces data but cannot pass or fail.
Targets come from:
SLOs (service level objectives) you've committed to
Customer expectations for the use case
Competitive benchmarks for similar products
Engineering intuition for what's "fast enough"
Examples:
p95 response time under 500ms for read endpoints
p99 response time under 1500ms for write endpoints
Error rate under 0.1% at expected load
System handles 2x expected peak load with slower responses but no errors
Document targets explicitly. Performance is the second-biggest source of vague engineering disagreements (after "good user experience").
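One way to document targets explicitly is to keep them as data next to the tests, so that a run can be checked mechanically. The metric names and measured values below are hypothetical; the limits mirror the examples above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Target:
    metric: str   # hypothetical metric name
    limit: float  # upper bound; the measured value must not exceed it


TARGETS = [
    Target("read_p95_ms", 500),
    Target("write_p99_ms", 1500),
    Target("error_rate", 0.001),  # 0.1%
]


def failures(targets, measured):
    """Return (metric, measured_value, limit) for every missed target.
    A metric missing from `measured` counts as a failure."""
    failed = []
    for target in targets:
        value = measured.get(target.metric)
        if value is None or value > target.limit:
            failed.append((target.metric, value, target.limit))
    return failed


# Hypothetical results from one run.
measured = {"read_p95_ms": 420, "write_p99_ms": 1620, "error_rate": 0.0004}
missed = failures(TARGETS, measured)
```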
What to Test
You can't load test everything. Pick:
Critical user journeys. What customers do most.
Money paths. Checkout, billing, payment.
Read-heavy endpoints. Where most traffic concentrates.
Reportedly slow areas. Where customers have complained.
Recently changed code. Where regressions might hide.
A typical team has 5-15 endpoints worth dedicated performance testing.
Tools
Common open-source options:
k6: modern, with tests scripted in JavaScript and a good developer experience
JMeter: mature, GUI-driven, broad protocol support
Locust: Python-based, code-driven
Gatling: Scala-based, high-concurrency
Cloud options (Loader.io, Loadster, OctoPerf) reduce setup cost.
For most teams, k6 is a reasonable starting point. The exact tool matters less than the discipline of using it.
Environment
Performance tests need realistic environments.
Production: the most realistic, but disruptive. Used carefully (during low-traffic windows, with kill switches).
Production-equivalent staging: ideal — same infrastructure shape and capacity.
Smaller staging: useful for comparative testing (regression between releases) but absolute numbers don't translate.
A common mistake: load testing a smaller environment and assuming the results scale linearly. They often don't.
When to Run
Cadence varies by team:
Continuously in CI: lightweight smoke load tests on every change (does this introduce a regression?)
Pre-release: full load and stress tests before major releases
Periodically: soak and capacity tests monthly or quarterly
Pre-launch: specific spike tests before known traffic events
The fastest-feedback layer (CI smoke perf) catches obvious regressions. Slower tests catch subtler issues.
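A CI smoke check can be as small as comparing the current run against a stored baseline and failing the build when a metric worsens beyond some tolerance. The sketch below assumes that higher values are worse for every metric and uses a 10% tolerance picked for illustration.

```python
def regressions(baseline, current, tolerance=0.10):
    """Return {metric: relative_increase} for metrics that got worse by
    more than `tolerance`. Metrics missing from `current`, or with a
    non-positive baseline, are skipped."""
    flagged = {}
    for name, base in baseline.items():
        now = current.get(name)
        if now is None or base <= 0:
            continue
        change = (now - base) / base
        if change > tolerance:
            flagged[name] = round(change, 3)
    return flagged


# Hypothetical numbers: p95 drifted 5%, p99 regressed 20%.
baseline = {"p95_ms": 400, "p99_ms": 900}
current = {"p95_ms": 420, "p99_ms": 1080}
flagged = regressions(baseline, current)
```

Short CI runs are noisy, so the tolerance usually has to be looser than the targets used for full pre-release tests.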
Interpreting Results
Raw response times alone are misleading. Look at:
Percentile distributions. p50, p95, p99. Averages hide tail latency.
Throughput vs. latency curves. As load increases, where does latency start climbing?
Error rates. Is the system returning errors before slowing down?
Resource utilization. What's saturating? CPU, memory, network, database connections?
A successful performance test produces graphs and conclusions, not just numbers.
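To make "where does latency start climbing?" concrete, one crude approach is to find the first load level at which p95 exceeds some multiple of its value at the lowest load. The factor of 2 and the curve below are assumptions for illustration; reading the graph by eye is often just as good.

```python
def latency_knee(curve, factor=2.0):
    """Return the first load level whose p95 exceeds `factor` times the
    p95 at the lowest load, or None if latency never climbs that far.
    `curve` is a list of (load, p95_ms) pairs."""
    if not curve:
        raise ValueError("empty curve")
    ordered = sorted(curve)
    base_p95 = ordered[0][1]
    for load, p95 in ordered[1:]:
        if p95 > factor * base_p95:
            return load
    return None


# Hypothetical throughput vs. latency measurements (RPS, p95 in ms).
curve = [(100, 120), (200, 125), (400, 140), (800, 190),
         (1200, 310), (1600, 900)]
knee = latency_knee(curve)
```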
Common Performance Test Mistakes
Unrealistic traffic patterns. All requests hitting one endpoint with one user ID. Doesn't model real traffic; doesn't catch real bugs.
Insufficient ramp time. Sudden jump to full load. Misses what happens during scaling.
Ignoring think time. Real users pause between requests. Removing think time produces synthetic patterns no real user creates.
Testing only happy paths. Real traffic includes errors, retries, slow responses. Test those too.
No baseline. Without historical comparison, performance tests produce absolute numbers with no context. Track results over time.
Testing infrastructure, not application. Hitting the load balancer hard reveals how the load balancer scales, not whether your app does.
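The effect of think time is easy to quantify. For a fixed pool of simulated users who each wait for a response and then pause before the next request, Little's Law gives the request rate as users divided by (response time + think time). The numbers below are hypothetical.

```python
def expected_rps(users, response_s, think_s):
    """Request rate generated by a fixed pool of users, each of whom
    waits for a response and then pauses for think_s seconds."""
    cycle_s = response_s + think_s
    if cycle_s <= 0:
        raise ValueError("response time plus think time must be positive")
    return users / cycle_s


# 500 users, 200ms responses, 4.8s of think time: about 100 requests/second.
with_think = expected_rps(500, 0.2, 4.8)

# The same 500 users with no think time: about 2500 requests/second,
# a very different test from the one intended.
without_think = expected_rps(500, 0.2, 0.0)
```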
Performance Bugs to Look For
N+1 queries that don't surface in single-request testing
Memory growth proportional to request count
Cache misses that cascade
Lock contention under concurrency
Slow paths in third-party dependencies
Garbage collection pauses
These typically surface in load and soak tests, not unit tests.
Reporting
Performance test reports should include:
Configuration: what was tested, with what setup
Results: response times, throughput, errors, resource use
Comparison: against prior runs and against targets
Findings: bottlenecks identified
Recommendations: what to address before production
A good report is short. Long reports get filed; short reports get read.
Key Takeaway
Performance testing isn't one activity. Load tests verify expected traffic, stress tests find breaking points, soak tests reveal slow accumulation, spike tests verify elasticity. Set specific numeric targets. Test the critical journeys, not everything. Use realistic traffic patterns. Run lightweight tests in CI, heavier tests pre-release, soak tests periodically. Report results with comparisons, not just numbers. Performance testing is most valuable when it produces actionable findings, not just data.
Keep learning. This article is part of the Test Automation path in the ShiftQuality Learning Center. Build test automation that lasts, with ROI you can defend.


