
Performance Testing: Load, Stress, and Soak

  • Contributor
  • Apr 28
  • 5 min read

"Is our system fast enough?" is a fuzzy question. "Can it handle 1000 concurrent users?" is a clearer one. "Will it survive a 10x spike?" is another. "Does it stay healthy over 48 hours?" is yet another. Each is a different test, with a different setup, finding different problems.

This guide is the practical view of the main performance test types and when to use each.

The Three Main Types

Load testing: How does the system behave under expected load? Verifies that normal traffic produces acceptable response times.

Stress testing: What happens when the system is pushed beyond expected load? Reveals breaking points and behavior under overload.

Soak testing: Does the system remain stable over long periods? Surfaces memory leaks, resource exhaustion, and slow-accumulating problems.

There are others (spike, capacity, scalability), covered briefly below, but these three cover most needs.

Load Testing

The most common performance test. Run the system at expected production load and measure:

  • Response times (p50, p95, p99)

  • Error rates

  • Throughput (requests per second)

  • Resource utilization (CPU, memory, network)

A working load test:

  1. Models realistic traffic patterns (not just one endpoint hit repeatedly)

  2. Includes realistic data (not all the same record)

  3. Includes realistic user behavior (think time between requests)

  4. Runs long enough to reach steady state (typically 15-30 minutes)

  5. Asserts against specific thresholds

Example assertion: "p95 of /api/checkout under 800ms at 500 concurrent users."
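
To make an assertion like this concrete, here is a minimal sketch in Python that computes a nearest-rank percentile from recorded latencies and checks it against the 800ms limit. The sample values and the function names are illustrative assumptions; most load tools report percentiles for you, possibly with a slightly different interpolation method.

```python
import math

def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample such that at least
    pct percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("no samples recorded")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]

def check_p95(latencies_ms, limit_ms=800):
    """Return (passed, p95) for a 'p95 under limit' assertion."""
    p95 = percentile(latencies_ms, 95)
    return p95 < limit_ms, p95

# Hypothetical /api/checkout latencies in milliseconds.
latencies = [120, 180, 210, 250, 300, 320, 410, 450, 600, 950]
passed, p95 = check_p95(latencies)
print(f"p50={percentile(latencies, 50)}ms p95={p95}ms passed={passed}")
```

With these made-up samples the median looks healthy while the p95 fails the limit, which is the situation a percentile threshold is meant to catch.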

Stress Testing

Push past expected load until something breaks. The goal is understanding limits.

Variants:

  • Ramp test: gradually increase load until performance degrades

  • Spike test: sudden burst of high load

  • Sustained overload: maintain high load to see what fails first

Stress testing answers:

  • What's our actual capacity?

  • What fails first under overload? (Database? API gateway? Memory?)

  • How does the system recover after the load is removed?

  • Are our scaling triggers correctly set?
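
A ramp test is usually described to the load tool as a list of stages. The sketch below generates such a list in Python; the function name, the starting point of 500 users, and the step sizes are assumptions for illustration, and the output would still need to be translated into whatever stage format your tool expects.

```python
def ramp_stages(start_users, step_users, max_users, hold_seconds):
    """Build (target_users, hold_seconds) stages for a stepped ramp.

    Each stage holds its load long enough to observe behavior before
    the next increase, which makes the degradation point easier to see.
    """
    if step_users <= 0:
        raise ValueError("step_users must be positive")
    stages = []
    users = start_users
    while users <= max_users:
        stages.append((users, hold_seconds))
        users += step_users
    return stages

# Start at an assumed expected load of 500 users and push to 3x.
for users, hold in ramp_stages(500, 250, 1500, 300):
    print(f"hold {users} users for {hold}s")
```

Holding each step for a few minutes, rather than ramping continuously, is one way to tie a specific load level to the first component that fails.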

Soak Testing

Long-running tests at moderate load. Often 24-72 hours.

What soak tests find:

  • Memory leaks

  • Connection pool exhaustion

  • Disk filling up (logs, temp files)

  • Background job queue buildup

  • Database performance degradation as data grows

  • Cache problems that build up over time (unbounded growth, stale entries)

Soak tests can be expensive because the infrastructure runs for days. For most teams, running them periodically (monthly, or before major releases) is enough.
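
One way to turn "memory leak" into a pass/fail check is to fit a straight line through memory samples taken during the soak and look at the slope. The following Python sketch does that with a plain least-squares fit; the sample data, the 1 MB per hour limit, and the function names are assumptions, and in practice you would probably discard the warm-up period before fitting.

```python
def slope_per_hour(samples):
    """Least-squares slope of (hours_elapsed, value) samples."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_x = sum(x for x, _ in samples) / n
    mean_y = sum(y for _, y in samples) / n
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in samples)
    denominator = sum((x - mean_x) ** 2 for x, _ in samples)
    if denominator == 0:
        raise ValueError("all samples share the same timestamp")
    return numerator / denominator

def looks_like_leak(samples, max_mb_per_hour=1.0):
    """Flag steady growth above an assumed tolerance."""
    return slope_per_hour(samples) > max_mb_per_hour

# Hypothetical resident memory (MB) sampled every 6 hours.
rss_mb = [(0, 512), (6, 540), (12, 566), (18, 595), (24, 622)]
print(f"growth: {slope_per_hour(rss_mb):.2f} MB/hour, "
      f"leak suspected: {looks_like_leak(rss_mb)}")
```

The same fit can be applied to open connections, disk usage, or queue depth, since each of the problems listed above shows up as a metric that keeps climbing at constant load.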

Spike Testing

Sudden traffic burst to test elasticity.

Examples:

  • Marketing campaign launches

  • Black Friday rushes

  • Social media virality

What spike tests verify:

  • Auto-scaling triggers correctly

  • The system doesn't fail before scaling completes

  • Recovery is clean after the spike
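
A spike can be described as a load profile: baseline traffic, a sudden jump, then a return to baseline so recovery can be observed. The Python sketch below expresses that as a function of elapsed time. All numbers (100 baseline users, a 10x peak, a two-minute spike) are assumptions chosen for illustration.

```python
def spike_profile(t, baseline=100, peak=1000, spike_start=300, spike_length=120):
    """Target concurrent users at t seconds into the test.

    Load sits at baseline, jumps to peak with no ramp, then drops back
    so the recovery phase can be measured.
    """
    if spike_start <= t < spike_start + spike_length:
        return peak
    return baseline

# Print the target load at a few points around the spike.
for t in (0, 299, 300, 419, 420, 600):
    print(f"t={t:>3}s users={spike_profile(t)}")
```

Keeping some baseline time after the spike matters: without it, the test cannot show whether latency and error rate return to their pre-spike levels.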

Capacity / Scalability Testing

Determines maximum sustainable throughput.

Methodology:

  1. Start at low load

  2. Increase incrementally until the error rate exceeds the threshold

  3. Note the throughput at the last step that stayed within the threshold

  4. Note which resource saturated first, since that is where additional capacity should be added

The result informs capacity planning: "we can handle 1500 RPS on the current cluster; we'd need to add a node at 2000."
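
The methodology above can be sketched in Python as a scan over the results of each load step. The step data and the 0.1% error threshold are assumptions for illustration; the function reports the last load level that stayed within the threshold, which is the sustainable figure rather than the breaking point.

```python
def max_sustainable_rps(steps, max_error_rate=0.001):
    """Return the highest RPS reached before the error rate first
    exceeds the threshold, or None if even the first step fails.

    steps: list of (rps, error_rate) in increasing load order.
    """
    sustained = None
    for rps, error_rate in steps:
        if error_rate > max_error_rate:
            break
        sustained = rps
    return sustained

# Hypothetical results from an incremental capacity run.
steps = [(500, 0.0), (1000, 0.0002), (1500, 0.0008), (2000, 0.012)]
print(f"max sustainable throughput: {max_sustainable_rps(steps)} RPS")
```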

Setting Targets

Performance tests need numeric targets. Without them, a test produces data but cannot pass or fail.

Targets come from:

  • SLOs (service level objectives) you've committed to

  • Customer expectations for the use case

  • Competitive benchmarks for similar products

  • Engineering intuition for what's "fast enough"

Examples:

  • p95 response time under 500ms for read endpoints

  • p99 response time under 1500ms for write endpoints

  • Error rate under 0.1% at expected load

  • System handles 2x expected peak load with degradation but no errors
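
Targets like these are easiest to enforce when they live in one place as data. Below is a minimal Python sketch that holds the first three example targets in a dictionary and reports which measurements miss them. The metric names and measured values are hypothetical, and the sketch assumes every metric is "lower is better".

```python
# Limits taken from the examples above; the key names are assumptions.
TARGETS = {
    "read_p95_ms": 500,
    "write_p99_ms": 1500,
    "error_rate": 0.001,  # 0.1%
}

def evaluate(measured, targets=TARGETS):
    """Return (metric, measured_value, limit) for every missed target.

    A metric passes only if it was measured and is strictly under its
    limit; a missing measurement counts as a failure.
    """
    failures = []
    for name, limit in targets.items():
        value = measured.get(name)
        if value is None or value >= limit:
            failures.append((name, value, limit))
    return failures

# Hypothetical results from one test run.
measured = {"read_p95_ms": 420, "write_p99_ms": 1720, "error_rate": 0.0004}
for name, value, limit in evaluate(measured):
    print(f"FAIL {name}: {value} (limit {limit})")
```

Treating a missing measurement as a failure is a deliberate choice here, so a broken test harness cannot silently pass.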

Document targets explicitly. Performance is the second-biggest source of vague engineering disagreements (after "good user experience").

What to Test

You can't load test everything. Pick:

  • Critical user journeys. What customers do most.

  • Money paths. Checkout, billing, payment.

  • Read-heavy endpoints. Where most traffic concentrates.

  • Reportedly slow areas. Where customers have complained.

  • Recently changed code. Where regressions might hide.

A typical team has 5-15 endpoints worth dedicated performance testing.

Tools

Common open-source options:

  • k6: modern JavaScript-based, good developer experience

  • JMeter: mature, GUI-driven, broad protocol support

  • Locust: Python-based, code-driven

  • Gatling: Scala-based, high-concurrency

Cloud options (Loader.io, Loadster, OctoPerf) reduce setup cost.

For most teams, k6 is a reasonable starting point. The exact tool matters less than the discipline of using it.

Environment

Performance tests need realistic environments.

  • Production: the most realistic, but disruptive. Use it carefully (during low-traffic windows, with kill switches).

  • Production-equivalent staging: ideal, because it has the same infrastructure shape and capacity.

  • Smaller staging: useful for comparative testing (regression between releases) but absolute numbers don't translate.

A common mistake: load testing a smaller environment and assuming the results scale linearly. They often don't.

When to Run

Cadence varies by team:

  • Continuously in CI: lightweight smoke load tests on every change (does this introduce a regression?)

  • Pre-release: full load and stress tests before major releases

  • Periodically: soak and capacity tests monthly or quarterly

  • Pre-launch: specific spike tests before known traffic events

The fastest-feedback layer (CI smoke perf) catches obvious regressions. Slower tests catch subtler issues.
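
A CI smoke check usually compares the current run with a stored baseline rather than with absolute targets. The Python sketch below flags any metric that got worse by more than a tolerance; the 10% tolerance, the metric names, and the numbers are assumptions, and all metrics are assumed to be "lower is better".

```python
def regressions(baseline, current, tolerance=0.10):
    """Return {metric: relative_change} for metrics that got worse
    than the baseline by more than the tolerance."""
    flagged = {}
    for name, base in baseline.items():
        now = current.get(name)
        if now is None or base <= 0:
            continue  # nothing to compare against
        change = (now - base) / base
        if change > tolerance:
            flagged[name] = round(change, 3)
    return flagged

# Hypothetical latencies from the last release and the current build.
baseline = {"p95_ms": 400, "p99_ms": 900}
current = {"p95_ms": 470, "p99_ms": 930}
print(regressions(baseline, current))
```

Short CI runs are noisy, so the tolerance typically needs to be wider than it would be for a full pre-release load test.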

Interpreting Results

Raw response times alone are misleading. Look at:

  • Percentile distributions. p50, p95, p99. Averages hide tail latency.

  • Throughput vs. latency curves. As load increases, where does latency start climbing?

  • Error rates. Is the system returning errors before slowing down?

  • Resource utilization. What's saturating? CPU, memory, network, database connections?

A successful performance test produces graphs and conclusions, not just numbers.
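
As one example of turning a throughput vs. latency curve into a conclusion, the Python sketch below finds the first load level at which p95 latency exceeds a multiple of its low-load value. The curve data and the 2x factor are assumptions; a real analysis might use a different definition of the knee.

```python
def latency_knee(curve, factor=2.0):
    """Return the first RPS at which p95 latency exceeds `factor`
    times the p95 measured at the lowest load, or None if it never does.

    curve: list of (rps, p95_ms) in increasing load order.
    """
    if not curve:
        return None
    floor_ms = curve[0][1]
    for rps, p95_ms in curve:
        if p95_ms > factor * floor_ms:
            return rps
    return None

# Hypothetical measurements from a stepped ramp.
curve = [(200, 110), (400, 115), (600, 125), (800, 160), (1000, 290), (1200, 740)]
print(f"latency starts climbing at about {latency_knee(curve)} RPS")
```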

Common Performance Test Mistakes

Unrealistic traffic patterns. All requests hitting one endpoint with one user ID. Doesn't model real traffic; doesn't catch real bugs.
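
A simple way to avoid the single-endpoint, single-user pattern is to draw each request from a weighted mix and a pool of user IDs. The Python sketch below shows the idea; the endpoint paths other than /api/checkout, the weights, and the pool size are assumptions that should be replaced with proportions taken from real traffic logs.

```python
import random

# Hypothetical traffic mix: relative weight per endpoint.
MIX = {"/api/search": 60, "/api/product": 30, "/api/checkout": 10}

def pick_endpoint(rng, mix=MIX):
    """Choose an endpoint with probability proportional to its weight."""
    endpoints = list(mix)
    weights = list(mix.values())
    return rng.choices(endpoints, weights=weights, k=1)[0]

def pick_user_id(rng, user_count=10_000):
    """Spread requests across a pool of users instead of reusing one ID."""
    return rng.randrange(user_count)

rng = random.Random(42)  # seeded so a test run can be reproduced
sample = [pick_endpoint(rng) for _ in range(1000)]
for endpoint in MIX:
    print(f"{endpoint}: {sample.count(endpoint)} requests")
```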

Insufficient ramp time. Sudden jump to full load. Misses what happens during scaling.

Ignoring think time. Real users pause between requests. Removing think time produces synthetic patterns no real user creates.
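
The effect of think time can be estimated with Little's Law for a closed system: throughput is roughly the number of users divided by the sum of response time and think time. The Python sketch below applies it; the 0.8 second response time and 4.2 second think time are assumed values for illustration.

```python
def throughput_rps(users, response_s, think_s):
    """Approximate request rate generated by a fixed pool of users,
    each waiting for a response and then pausing before the next request."""
    return users / (response_s + think_s)

users = 500
with_think = throughput_rps(users, response_s=0.8, think_s=4.2)
without_think = throughput_rps(users, response_s=0.8, think_s=0.0)
print(f"with think time:    {with_think:.0f} RPS")
print(f"without think time: {without_think:.0f} RPS")
```

Under these assumptions, the same 500 virtual users generate several times more requests once think time is removed, so "500 concurrent users" describes very different loads depending on how the script paces them.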

Testing only happy paths. Real traffic includes errors, retries, slow responses. Test those too.

No baseline. Performance tests without historical comparison are absolute numbers without context. Track over time.

Testing infrastructure, not application. Hitting the load balancer hard reveals how the load balancer scales, not whether your app does.

Performance Bugs to Look For

  • N+1 queries that don't surface in single-request testing

  • Memory growth proportional to request count

  • Cache misses that cascade

  • Lock contention under concurrency

  • Slow paths in third-party dependencies

  • Garbage collection pauses

These typically surface in load and soak tests, not unit tests.

Reporting

Performance test reports should include:

  • Configuration: what was tested, with what setup

  • Results: response times, throughput, errors, resource use

  • Comparison: against prior runs and against targets

  • Findings: bottlenecks identified

  • Recommendations: what to address before production

A good report is short. Long reports get filed; short reports get read.

Key Takeaway

Performance testing isn't one activity. Load tests verify expected traffic, stress tests find breaking points, soak tests reveal slow accumulation, spike tests verify elasticity. Set specific numeric targets. Test the critical journeys, not everything. Use realistic traffic patterns. Run lightweight tests in CI, heavier tests pre-release, soak tests periodically. Report results with comparisons, not just numbers. Performance testing is most valuable when it produces actionable findings, not just data.

Related reading

Keep learning. This article is part of the Test Automation path in the ShiftQuality Learning Center. Build test automation that lasts, with ROI you can defend.
