
Test Pyramid vs Test Trophy: Choosing a Model

  • Contributor
  • Mar 4
  • 5 min read

The test pyramid (many unit tests, fewer integration tests, fewer still E2E) was the dominant testing model for well over a decade. The test trophy (heavy on integration, lighter on unit and E2E) has been gaining traction since around 2018. Neither is universally right. The choice depends on what your application looks like and which kinds of bugs you most need to catch.

The Pyramid

Mike Cohn's pyramid, popularized in his 2009 book Succeeding with Agile:

        /\
       /E2E\        Few
      /----\
     /Integ.\      Some
    /--------\
   /  Unit    \   Many
  /------------\

Reasoning:

  • Unit tests are fastest and cheapest

  • More of the cheap ones, fewer of the expensive ones

  • Catch most bugs at the lowest level

The pyramid was widely adopted, and it reflected the tooling of its time, when integration and UI tests were slow and costly to set up and run.

The Trophy

Kent C. Dodds' trophy, from 2018:

       _____
      |  E2E  |     Some
      |-------|
      |       |
      | Integ.|     Many
      |       |
      |-------|
      |  Unit |     Some
      |-------|
      | Static|   Foundation
      |_______|

Reasoning:

  • Modern integration tests are fast enough to use heavily

  • Heavily mocked unit tests miss real bugs

  • Static analysis (types, lint) catches a class of bugs cheaply

  • The middle (integration) provides the best value per test

What the Models Are Actually About

Both models try to answer: "where should your testing effort go?"

The pyramid bet on: "the lowest level catches the most bugs cheapest."

The trophy bets on: "the level closest to user behavior catches the most relevant bugs, and integration is now cheap enough to live there."

The disagreement is empirical. Which level actually catches your bugs?

Why the Pyramid Sometimes Fails

Unit tests with mocks can pass while production fails. The mock returns what the test expects; production returns something else.
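A minimal Python sketch of that failure, assuming a hypothetical user service whose response field was renamed from `name` to `full_name`. The names `display_name` and `FakeProductionClient` are illustrative, not from any real codebase. The mocked test passes because the mock encodes the test author's assumption; the same code fails against the real response shape.

```python
from unittest.mock import Mock


def display_name(client, user_id: int) -> str:
    """Code under test: reads the 'name' field from the user service."""
    user = client.fetch_user(user_id)
    return user["name"].title()


class FakeProductionClient:
    """Hypothetical stand-in for the real service, which now returns
    'full_name' instead of 'name'."""

    def fetch_user(self, user_id: int) -> dict:
        return {"id": user_id, "full_name": "ada lovelace"}


def unit_test_with_mock() -> bool:
    # The mock returns exactly what the test author expects, so this passes.
    client = Mock()
    client.fetch_user.return_value = {"id": 1, "name": "ada lovelace"}
    return display_name(client, 1) == "Ada Lovelace"


def against_production_shape() -> bool:
    # Same code, real response shape: the dictionary lookup fails.
    try:
        display_name(FakeProductionClient(), 1)
        return True
    except KeyError:
        return False


if __name__ == "__main__":
    print("mocked unit test passes:", unit_test_with_mock())             # True
    print("works against production shape:", against_production_shape())  # False
```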

Common pyramid failures:

  • 95% unit coverage, but a deploy breaks the integration with payments

  • Unit tests verify each function works in isolation; the bug is that they don't compose

  • All the unit tests pass; production breaks because the schema migration wasn't tested with real Postgres

The pyramid assumed unit tests would catch these by virtue of coverage. They don't when the bugs live at integration points.
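The schema case can be sketched the same way. This assumes a hypothetical `orders` table whose `currency` column became required in a later migration, and it uses Python's built-in `sqlite3` with an in-memory database as a stand-in for Postgres. A mocked connection accepts any SQL; a real database enforces the schema.

```python
import sqlite3
from unittest.mock import Mock

# Schema after a hypothetical migration that made `currency` required.
SCHEMA = """
CREATE TABLE orders (
    id INTEGER PRIMARY KEY,
    amount_cents INTEGER NOT NULL,
    currency TEXT NOT NULL
)
"""


def save_order(conn, amount_cents: int) -> int:
    """Code under test: written before the migration, so it omits `currency`."""
    cur = conn.execute("INSERT INTO orders (amount_cents) VALUES (?)", (amount_cents,))
    return cur.lastrowid


def unit_test_with_mocked_connection() -> bool:
    # The mock accepts any SQL, so the missing column goes unnoticed.
    conn = Mock()
    conn.execute.return_value.lastrowid = 1
    return save_order(conn, 500) == 1


def integration_test_with_real_db() -> bool:
    # A real database enforces NOT NULL and rejects the insert.
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    try:
        save_order(conn, 500)
        return True
    except sqlite3.IntegrityError:
        return False
    finally:
        conn.close()
```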

Why the Trophy Sometimes Fails

Integration tests are slower than unit tests. A team that doubles down on integration can end up with suites that take 30+ minutes.

Common trophy failures:

  • Slow suite that gets run less often

  • Tests too coarse to localize failures

  • Tests slow enough that developers don't run them locally

  • Test data complexity grows unwieldy

The trophy assumed integration-testing tooling would keep up. For many teams it has, but not for all.
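One common way to keep a heavy integration suite fast and its test data manageable is to build the schema once and wrap each test in a transaction that is rolled back afterwards. A minimal sketch with `sqlite3`, assuming the code under test does not commit on its own (if it does, the rollback cannot undo its writes); `make_shared_db` and `rolled_back` are hypothetical helper names.

```python
import sqlite3
from contextlib import contextmanager


def make_shared_db() -> sqlite3.Connection:
    """Build the schema once per suite instead of once per test."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT UNIQUE NOT NULL)")
    conn.commit()
    return conn


@contextmanager
def rolled_back(conn: sqlite3.Connection):
    """Let one test write freely, then discard its uncommitted writes."""
    try:
        yield conn
    finally:
        conn.rollback()


def count_users(conn: sqlite3.Connection) -> int:
    return conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]


if __name__ == "__main__":
    db = make_shared_db()
    with rolled_back(db) as conn:
        conn.execute("INSERT INTO users (email) VALUES ('a@example.com')")
        print(count_users(conn))  # 1
    with rolled_back(db) as conn:
        # The same email inserts cleanly: the previous test's row is gone.
        conn.execute("INSERT INTO users (email) VALUES ('a@example.com')")
        print(count_users(conn))  # 1, not 2
    print(count_users(db))  # 0
```

Each test starts from the same clean state without re-creating the schema, which attacks both the speed and the test-data problems listed above.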

Choosing Between Them

Use the pyramid when:

  • Your application has lots of pure logic (calculation, transformation, parsing)

  • Components are mostly stateless and independent

  • Integration points are few and stable

  • Unit tests catch the bugs you're actually seeing

Use the trophy when:

  • Your application is mostly orchestration of services and data

  • Components have lots of integration with databases or other services

  • Most bugs live at integration points

  • Unit tests are mostly mock-shuffling

Many modern web applications fit the trophy better than the pyramid. Many systems-level codebases fit the pyramid better than the trophy.
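Pure logic is the pyramid's best case: no mocks, no I/O, and each test runs in milliseconds or less. A sketch using Python's `unittest`, with a hypothetical `parse_duration` function that turns strings like `"1h30m"` into seconds:

```python
import re
import unittest

_PART = re.compile(r"(\d+)([hms])")
_UNIT_SECONDS = {"h": 3600, "m": 60, "s": 1}


def parse_duration(text: str) -> int:
    """Parse strings like '1h30m' or '45s' into total seconds."""
    parts = _PART.findall(text)
    # Reject empty input or leftover characters such as '1x' or '1h 30m'.
    if not parts or "".join(n + u for n, u in parts) != text:
        raise ValueError(f"invalid duration: {text!r}")
    return sum(int(n) * _UNIT_SECONDS[u] for n, u in parts)


class ParseDurationTest(unittest.TestCase):
    def test_valid_inputs(self):
        cases = {"45s": 45, "2m": 120, "1h30m": 5400, "1h0m5s": 3605}
        for text, expected in cases.items():
            with self.subTest(text=text):
                self.assertEqual(parse_duration(text), expected)

    def test_invalid_inputs(self):
        for text in ["", "1x", "h1", "1h 30m"]:
            with self.subTest(text=text):
                with self.assertRaises(ValueError):
                    parse_duration(text)
```

Run it with `python -m unittest` or pytest, which also collects `unittest.TestCase` classes. Table-driven cases like these are cheap to add and cheap to run, which is the pyramid's core argument.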

What Matters More Than the Shape

The shape is a guide. What matters more:

  • Speed. Whatever level dominates your tests, they need to be fast enough to run often.

  • Reliability. Tests should fail when the system is broken; they should not fail otherwise.

  • Trust. A team that trusts the suite ships faster than a team that doesn't.

  • Maintenance. Tests should age well, not require constant fixing.

A pyramid that's fast and reliable beats a trophy that isn't. And vice versa.
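Reliability can be checked directly: a test that gives different results on unchanged code is flaky, and flaky tests erode trust whatever the shape. A minimal sketch that reruns a test function and classifies the outcome; the rerun count of 20 and the test names are arbitrary assumptions, and `order_dependent_test` is a deterministic stand-in for a race or shared-state bug.

```python
import itertools
from typing import Callable


def classify(test: Callable[[], None], runs: int = 20) -> str:
    """Run a test repeatedly on unchanged code and classify the results.

    A reliable test always passes or always fails; a mix means it is flaky.
    """
    passes = 0
    for _ in range(runs):
        try:
            test()
            passes += 1
        except Exception:  # AssertionError and any other error count as failures
            pass
    if passes == runs:
        return "pass"
    if passes == 0:
        return "fail"
    return f"flaky ({passes}/{runs} passed)"


def stable_test() -> None:
    assert sorted([3, 1, 2]) == [1, 2, 3]


_calls = itertools.count()


def order_dependent_test() -> None:
    # Hypothetical stand-in for a race or leaked shared state: fails every third run.
    assert next(_calls) % 3 != 2


if __name__ == "__main__":
    print(classify(stable_test))           # pass
    print(classify(order_dependent_test))  # flaky (14/20 passed)
```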

What to Measure

Rather than picking a shape and adhering to it, measure where your bugs are caught and where they escape.

  • For each bug in production, ask: "what level should have caught this?"

  • For each false alarm in CI, ask: "which level is failing when nothing is actually broken?"

The aggregate tells you whether your testing is balanced. If most production bugs are integration issues, you need more integration coverage. If most CI failures are unit tests failing on legitimate refactoring, you have too many implementation-coupled unit tests.
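The tallying itself is simple. A minimal sketch, assuming you record each escaped production bug with the level that should have caught it and each CI false alarm with the level of the failing test; the record types and the sample entries are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class EscapedBug:
    ticket: str
    should_have_caught: str  # "static", "unit", "integration", or "e2e"


@dataclass
class FalseAlarm:
    test_name: str
    level: str  # level of the test that failed without a real defect


def summarize(bugs: list[EscapedBug], alarms: list[FalseAlarm]) -> dict[str, Counter]:
    """Count escapes and false alarms per test level."""
    return {
        "escaped_by_level": Counter(b.should_have_caught for b in bugs),
        "false_alarms_by_level": Counter(a.level for a in alarms),
    }


if __name__ == "__main__":
    # Hypothetical sample records, for illustration only.
    bugs = [
        EscapedBug("BUG-1", "integration"),
        EscapedBug("BUG-2", "integration"),
        EscapedBug("BUG-3", "unit"),
        EscapedBug("BUG-4", "integration"),
    ]
    alarms = [
        FalseAlarm("test_total_calls_tax_helper", "unit"),
        FalseAlarm("test_checkout_journey", "e2e"),
        FalseAlarm("test_cart_uses_cache", "unit"),
    ]
    summary = summarize(bugs, alarms)
    print(summary["escaped_by_level"].most_common(1))       # [('integration', 3)]
    print(summary["false_alarms_by_level"].most_common(1))  # [('unit', 2)]
```

Read against the paragraph above, this sample would point to adding integration coverage and pruning implementation-coupled unit tests.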

Static Analysis as Foundation

The trophy adds a "static" layer at the bottom. The pyramid traditionally didn't.

Types, linting, and other static analyses catch a class of bugs very cheaply: before any test runs, with no test runtime cost. For typed languages especially, this has become the cheapest test layer available.

A team using TypeScript or strong static analysis tools gets a layer of coverage that pyramid-era discussions rarely treated as a layer of its own.
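The same idea applies in Python with type hints and a checker such as mypy. A minimal sketch, assuming a hypothetical `format_price` function whose allowed currencies are encoded as a `Literal` type. The checker rejects the bad calls without running any tests; at runtime, one would fail only when that line executes and the other would silently return a wrong value.

```python
from typing import Literal

# Hypothetical example: the allowed currencies are encoded in the type.
Currency = Literal["USD", "EUR", "GBP"]


def format_price(amount_cents: int, currency: Currency) -> str:
    """Format a non-negative amount in cents, e.g. 1999 -> '19.99 USD'."""
    return f"{amount_cents // 100}.{amount_cents % 100:02d} {currency}"


if __name__ == "__main__":
    print(format_price(1999, "USD"))  # 19.99 USD

    # A static checker such as mypy flags both calls below without running them.
    # At runtime, the first raises TypeError only when executed, and the
    # second silently returns "19.99 usd".
    # format_price("1999", "USD")
    # format_price(1999, "usd")
```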

Hybrid Models

You don't have to pick. Many teams use:

  • Static analysis foundation

  • Unit tests for pure logic

  • Heavy integration coverage for service/database code

  • Few but stable E2E tests for critical user journeys

This is essentially "use the right level for the right kind of code." Pure logic at unit level; orchestration at integration; full system at E2E.

Architecture Affects Shape

The shape of your testing should match the shape of your code.

  • Monolith with rich domain model: unit-heavy makes sense

  • API layer over database: integration-heavy makes sense

  • Microservices with thin services: integration- and contract-test-heavy makes sense

  • Real-time/distributed system: E2E and chaos testing become important

Trying to force a shape onto code it doesn't fit produces strained tests.
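Of the levels in that list, contract testing is the least self-explanatory. The core idea: the consumer records which fields and types it relies on, and the provider's CI checks real responses against that record. A hand-rolled sketch, not any particular contract-testing tool's API; `USER_CONTRACT` and `contract_violations` are hypothetical names.

```python
from typing import Any

# Hypothetical contract: the fields one consumer actually reads from a
# "users" provider, and the types it expects.
USER_CONTRACT: dict[str, type] = {"id": int, "email": str, "active": bool}


def contract_violations(response: dict[str, Any], contract: dict[str, type]) -> list[str]:
    """Return human-readable mismatches between a response and a contract."""
    problems = []
    for field, expected_type in contract.items():
        if field not in response:
            problems.append(f"missing field {field!r}")
        elif not isinstance(response[field], expected_type):
            actual = type(response[field]).__name__
            problems.append(f"{field!r} is {actual}, expected {expected_type.__name__}")
    return problems


if __name__ == "__main__":
    ok = {"id": 7, "email": "a@example.com", "active": True, "plan": "pro"}
    renamed = {"id": 7, "email_address": "a@example.com", "active": "yes"}
    print(contract_violations(ok, USER_CONTRACT))       # []
    print(contract_violations(renamed, USER_CONTRACT))  # ["missing field 'email'", "'active' is str, expected bool"]
```

Checking only the fields the consumer reads is a deliberate choice: the provider can add fields freely, but renaming or retyping one the consumer depends on fails the check.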

The Cost of Each Level

Approximate costs to run a single test:

  • Static check: microseconds

  • Unit test: 1-10 ms

  • Integration test (with real DB): 50-500 ms

  • E2E test: 5-30 seconds

Approximate cost to maintain:

  • Unit tests: low if simple, high if heavy mocking

  • Integration tests: medium; flakiness can be an issue

  • E2E tests: high; brittle to UI/flow changes

The right shape minimizes total cost (runtime + maintenance) for the coverage you need.
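The runtime half of that total is simple arithmetic. A sketch that compares two hypothetical suites using rough per-test values picked from inside the ranges above (5 ms unit, 200 ms integration, 15 s E2E); the counts are made up for illustration, and maintenance cost is not modeled.

```python
# Rough per-test runtimes in milliseconds, chosen from inside the ranges above.
# These are assumptions; measure your own suite for real numbers.
PER_TEST_MS = {"unit": 5, "integration": 200, "e2e": 15_000}


def suite_runtime_ms(counts: dict[str, int]) -> int:
    """Serial runtime of a suite, given how many tests it has at each level."""
    return sum(PER_TEST_MS[level] * n for level, n in counts.items())


def share_of_runtime(counts: dict[str, int], level: str) -> float:
    """Fraction of total serial runtime spent at one level."""
    return PER_TEST_MS[level] * counts.get(level, 0) / suite_runtime_ms(counts)


if __name__ == "__main__":
    # Two hypothetical suites with very different shapes.
    pyramid = {"unit": 2000, "integration": 200, "e2e": 20}
    trophy = {"unit": 500, "integration": 800, "e2e": 10}
    for name, counts in [("pyramid", pyramid), ("trophy", trophy)]:
        total_s = suite_runtime_ms(counts) / 1000
        e2e_share = share_of_runtime(counts, "e2e")
        print(f"{name}: {total_s:.1f} s serial, {e2e_share:.0%} of it in E2E")
    # pyramid: 350.0 s serial, 86% of it in E2E
    # trophy: 312.5 s serial, 48% of it in E2E
```

With these assumed numbers, twenty E2E tests account for most of the pyramid suite's serial runtime, which is one reason both models keep the E2E layer small.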

Anti-Patterns

Cargo-cult adherence. Following the pyramid (or trophy) because that's what the talks say, regardless of fit.

Coverage worship. "We need 80% coverage at the unit level" — without asking whether unit-level coverage catches your bugs.

E2E-only. A team with only E2E tests has slow CI and brittle tests. Defeats the purpose.

Unit-only. A team with only unit tests ships integration bugs constantly. Defeats the purpose.

Key Takeaway

The test pyramid says "more cheap tests at the bottom"; the test trophy says "more value per test at the integration level, on a static-analysis foundation." Neither is universally right. The pyramid suits pure-logic codebases; the trophy suits orchestration-heavy modern web apps. What matters more than the shape: speed, reliability, trust, maintenance. Measure where your bugs escape and where false alarms come from; let the data shape your investment. Most real teams use a hybrid: a static foundation, unit tests for logic, integration tests for orchestration, and E2E tests for critical journeys.

Related reading

Keep learning. This article is part of the Software Testing Foundations path in the ShiftQuality Learning Center. Learn to design tests that catch real bugs.
