Contract Testing: Keeping Services Honest
- ShiftQuality Contributor
- Jul 17, 2025
The previous posts in this path covered systems that test themselves and observability by design. This post addresses the testing gap that observability cannot fill: verifying that the agreements between services — the contracts — remain valid as services evolve independently.
In a statically typed monolith, the compiler catches interface mismatches. Change a method signature and every caller that passes the wrong arguments fails to compile. In a distributed system, services communicate over HTTP, message queues, and event streams, and no compiler checks those boundaries. Service A changes its response format on Tuesday. Service B, which depends on that format, does not find out until Wednesday: in production, at 3 AM, when a customer reports a broken checkout flow.
Contract testing prevents this by making the implicit promises between services explicit and verifiable.
What Is a Contract?
A contract is the agreement between a service provider (the service that produces data) and a service consumer (the service that uses that data). The contract specifies what the consumer expects: the shape of the response, the fields that must be present, the types of those fields, and the meanings of status codes.
Critically, the contract represents what the consumer actually uses — not the entire provider API. If the provider returns 50 fields and the consumer only reads 3 of them, the contract covers those 3 fields. The provider is free to change the other 47 fields without breaking the contract.
This consumer-driven approach is the key insight. The consumer defines what it needs. The provider verifies that it still provides what the consumer needs. As long as the contract holds, both services can evolve independently. The provider can add new fields, refactor internal implementations, and change endpoints that no consumer uses with far less risk, because the contract tests verify that the fields each consumer depends on are still present and correct.
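A minimal sketch of that idea, assuming a hypothetical order lookup where the consumer reads only three fields. The contract format and the `satisfies` helper are invented for illustration and are not part of any contract testing library.

```python
# Hypothetical contract for an order lookup: only the fields the consumer reads.
ORDER_CONTRACT = {"id": int, "status": str, "total": (int, float)}


def satisfies(contract, response):
    """Return True if every contracted field is present with the expected type."""
    return all(
        name in response and isinstance(response[name], expected)
        for name, expected in contract.items()
    )


# The provider returns an extra field the consumer never reads.
response_v1 = {"id": 123, "status": "shipped", "total": 42.5, "warehouse": "east"}
# The provider renames that unused field: the contract still holds.
response_v2 = {"id": 123, "status": "shipped", "total": 42.5, "depot": "east"}
# The provider changes the type of a contracted field: the contract breaks.
response_v3 = {"id": "123", "status": "shipped", "total": 42.5}

print(satisfies(ORDER_CONTRACT, response_v1))  # True
print(satisfies(ORDER_CONTRACT, response_v2))  # True
print(satisfies(ORDER_CONTRACT, response_v3))  # False
```

The check never looks at fields outside the contract, which is what leaves the provider free to change them.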
How Contract Testing Works
The workflow has two sides, and they run independently.
Consumer side. The consumer team writes a test that specifies the interaction they expect. "When I call GET /orders/123, I expect a JSON response with fields: id (integer), status (string), total (number), items (array)." This specification — the pact — is generated as an artifact and published to a shared location (typically a Pact Broker or similar registry).
Provider side. The provider team runs verification tests against every published pact. The test replays the consumer's expected request against the actual provider and verifies that the response matches what the consumer expects. If it matches, the contract holds. If it does not, the provider knows — before deployment — that a consumer will break.
This decoupling is what makes contract testing practical at scale. The consumer and provider do not need to be running simultaneously. The consumer's pact is a static artifact. The provider verifies against it in its own CI pipeline, on its own schedule. No shared test environment is needed, and teams do not have to coordinate test runs.
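The two sides can be sketched as a pair of functions that share nothing but a file. This is an illustration under stated assumptions: a temporary directory stands in for the broker, the JSON layout is invented rather than the real Pact file format, and `orders_handler` is a hypothetical stand-in for the provider endpoint.

```python
import json
import tempfile
from pathlib import Path

# Type names used in the hypothetical artifact, mapped to Python types.
TYPE_NAMES = {"integer": int, "string": str, "number": (int, float), "array": list}

INTERACTION = {
    "request": {"method": "GET", "path": "/orders/123"},
    "response_fields": {"id": "integer", "status": "string", "total": "number", "items": "array"},
}


def publish_pact(broker_dir, consumer, provider, interaction):
    """Consumer side: write the expected interaction as a static JSON artifact."""
    pact = {"consumer": consumer, "provider": provider, "interactions": [interaction]}
    path = Path(broker_dir) / f"{consumer}-{provider}.json"
    path.write_text(json.dumps(pact, indent=2))
    return path


def verify_pacts(broker_dir, provider, handler):
    """Provider side: replay every published request against the real handler."""
    failures = []
    for path in sorted(Path(broker_dir).glob("*.json")):
        pact = json.loads(path.read_text())
        if pact["provider"] != provider:
            continue
        for interaction in pact["interactions"]:
            request = interaction["request"]
            body = handler(request["method"], request["path"])
            for field, type_name in interaction["response_fields"].items():
                if not isinstance(body.get(field), TYPE_NAMES[type_name]):
                    failures.append(f"{pact['consumer']}: '{field}' missing or not {type_name}")
    return failures


def orders_handler(method, path):
    """Hypothetical provider endpoint; a real test would call the running service."""
    return {"id": 123, "status": "shipped", "total": 42.5, "items": [], "currency": "USD"}


with tempfile.TemporaryDirectory() as broker_dir:
    publish_pact(broker_dir, "checkout", "orders", INTERACTION)
    print(verify_pacts(broker_dir, "orders", orders_handler))  # []
```

The consumer step and the provider step never call each other. In practice they would run in separate pipelines at different times, with the artifact as the only link between them.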
When Integration Tests Fall Short
Integration tests verify that services work together by running them in a shared environment and exercising end-to-end workflows. This approach has three problems that contract testing solves.
Speed. Integration tests require standing up multiple services, databases, and dependencies. They are slow to set up, slow to run, and prone to flakiness caused by environmental issues. Contract tests run against a single service and typically execute in seconds.
Independence. Integration tests require all services to be available and running compatible versions simultaneously. A broken test environment blocks everyone. Contract tests run in each service's CI pipeline independently.
Specificity. When an integration test fails, the failure could be in any service involved in the workflow. Diagnosing requires investigating multiple codebases. When a contract test fails, the failure is localized — either the consumer's expectations changed or the provider's behavior changed. The specific field or response code that broke is identified in the test output.
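To illustrate the kind of localized output described above, here is a rough sketch of a structural comparison that reports the path of each mismatch. The `mismatches` function is hypothetical, and it compares exact Python types for simplicity, so an `int` where a `float` was expected would also be flagged.

```python
def mismatches(expected, actual, path="$"):
    """Yield one message per place where `actual` departs from `expected`.

    `expected` is an example value; only its structure and types are compared.
    """
    if isinstance(expected, dict):
        if not isinstance(actual, dict):
            yield f"{path}: expected object, got {type(actual).__name__}"
            return
        for key, value in expected.items():
            if key not in actual:
                yield f"{path}.{key}: missing"
            else:
                yield from mismatches(value, actual[key], f"{path}.{key}")
    elif isinstance(expected, list):
        if not isinstance(actual, list):
            yield f"{path}: expected array, got {type(actual).__name__}"
            return
        if expected:
            # The first expected item serves as the template for every actual item.
            for index, item in enumerate(actual):
                yield from mismatches(expected[0], item, f"{path}[{index}]")
    elif type(expected) is not type(actual):
        yield f"{path}: expected {type(expected).__name__}, got {type(actual).__name__}"


expected = {"id": 1, "status": "open", "items": [{"sku": "A1", "qty": 1}]}
actual = {"id": 123, "status": "shipped", "items": [{"sku": "B7", "qty": "2"}, {"sku": "C9"}]}

for message in mismatches(expected, actual):
    print(message)
# $.items[0].qty: expected int, got str
# $.items[1].qty: missing
```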
Integration tests are still valuable for verifying complex multi-service workflows. But contract tests catch one of the most common failures, interface drift between services, faster, cheaper, and more reliably.
Implementing Consumer-Driven Contracts
The implementation follows a pattern regardless of the tooling.
Start with the consumer. When writing code that calls another service, capture the assumptions being made about the response. These assumptions become the contract specification. The test stubs the provider using the contract, verifying that the consumer handles the expected response correctly.
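A consumer-side test along these lines might look like the sketch below. Everything in it is assumed for illustration: the `ORDER_PACT` structure, the `StubProvider` class, and the `order_summary` function standing in for real consumer code. Tools such as Pact generate the stub and the artifact for you; this only shows the shape of the test.

```python
# Hypothetical contract: the request the consumer sends and an example response.
ORDER_PACT = {
    "request": {"method": "GET", "path": "/orders/123"},
    "response": {
        "status": 200,
        "body": {"id": 123, "status": "shipped", "total": 42.5, "items": []},
    },
}


class StubProvider:
    """Answers only the request recorded in the contract and logs what was asked."""

    def __init__(self, pact):
        self.pact = pact
        self.calls = []

    def send(self, method, path):
        self.calls.append((method, path))
        request = self.pact["request"]
        if (method, path) != (request["method"], request["path"]):
            raise AssertionError(f"unexpected request: {method} {path}")
        response = self.pact["response"]
        return response["status"], response["body"]


def order_summary(transport, order_id):
    """Consumer code under test: reads only the fields it needs."""
    status, body = transport.send("GET", f"/orders/{order_id}")
    if status != 200:
        raise LookupError(f"order {order_id} not available")
    return f"Order {body['id']} is {body['status']}: {body['total']:.2f}"


def test_order_summary_matches_contract():
    stub = StubProvider(ORDER_PACT)
    assert order_summary(stub, 123) == "Order 123 is shipped: 42.50"
    # The consumer must send exactly the request the contract describes.
    assert stub.calls == [("GET", "/orders/123")]


test_order_summary_matches_contract()
```

Because the stub rejects any request that is not in the contract, the test fails if the consumer starts calling the provider in a way the contract does not describe.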
Publish the contract. The contract artifact — containing the expected request, the expected response, and metadata about the consumer and provider — is published to a broker where the provider can access it.
Verify on the provider side. The provider's CI pipeline pulls all contracts that consumers have published against it. For each contract, it replays the request against the actual provider (or a test instance) and verifies the response matches. A webhook or CI status check reports the result.
Gate deployments. The critical integration: neither consumer nor provider can deploy unless the contracts between them are verified. The consumer cannot deploy if its pact relies on provider behavior that has not been verified against the provider version it will run with. The provider cannot deploy if its changes break a pact published by a consumer it will run with. This "can-I-deploy" check is the safety net that helps prevent production breakage.
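The gating logic can be approximated as a lookup in a matrix of verification results. The sketch below is a simplification built on assumed data: the `VERIFICATIONS` and `PRODUCTION` tables, the service names, and the version numbers are all hypothetical, and a real broker tracks considerably more state than this.

```python
# Hypothetical verification results recorded by provider CI:
# (consumer, consumer_version, provider, provider_version) -> passed?
VERIFICATIONS = {
    ("checkout", "2.1.0", "orders", "5.0.0"): True,
    ("checkout", "2.2.0", "orders", "5.0.0"): False,
    ("checkout", "2.1.0", "orders", "5.1.0"): True,
}

# Hypothetical record of the versions currently running in production.
PRODUCTION = {"checkout": "2.1.0", "orders": "5.0.0"}


def can_deploy(service, version, deployed=PRODUCTION, results=VERIFICATIONS):
    """Allow a deploy only if every contract with a deployed counterpart is verified."""
    relationships = {(consumer, provider) for (consumer, _, provider, _) in results}
    for consumer, provider in relationships:
        if service == consumer and provider in deployed:
            key = (consumer, version, provider, deployed[provider])
        elif service == provider and consumer in deployed:
            key = (consumer, deployed[consumer], provider, version)
        else:
            continue
        # A missing result counts as unverified, so the deploy is blocked.
        if not results.get(key, False):
            return False
    return True


print(can_deploy("checkout", "2.2.0"))  # False: verification failed
print(can_deploy("orders", "5.1.0"))    # True: verified against checkout 2.1.0
print(can_deploy("orders", "5.2.0"))    # False: never verified
```

Treating a missing result as a failure is a deliberate choice here: an unverified pair is as risky as a failed one.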
Beyond HTTP: Events and Messages
Contract testing extends to asynchronous communication — message queues, event streams, and pub/sub systems. A service that publishes events makes an implicit promise about the shape and semantics of those events. Consumers that subscribe to those events depend on that promise.
Event contracts follow the same pattern. The consumer specifies the event structure it expects. The provider verifies that the events it publishes match that structure. The pact broker manages the contracts between publishers and subscribers.
The challenge with event contracts is evolution. Adding a new optional field to an event is backward-compatible. Removing a field or changing its type is breaking. The contract test catches breaking changes before they propagate through the event pipeline.
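As a rough illustration, the same kind of check applies to an event payload. The event name, the `OrderShipped` dataclass, and the contract below are hypothetical, and the payload is built in memory rather than read from a real queue.

```python
from dataclasses import asdict, dataclass
from typing import Optional

# Hypothetical expectation published by a subscriber to "order shipped" events.
SHIPPING_EVENT_CONTRACT = {"order_id": int, "shipped_at": str}


@dataclass
class OrderShipped:
    """Event as the publisher currently emits it (hypothetical)."""

    order_id: int
    shipped_at: str
    carrier: Optional[str] = None  # New optional field: backward-compatible.


def verify_event(contract, payload):
    """Return the contracted fields that are missing or have the wrong type."""
    return [
        name
        for name, expected in contract.items()
        if not isinstance(payload.get(name), expected)
    ]


event = asdict(OrderShipped(order_id=123, shipped_at="2025-07-17T03:00:00Z", carrier="UPS"))
print(verify_event(SHIPPING_EVENT_CONTRACT, event))  # []

# Simulate a breaking change: the publisher drops a contracted field.
without_timestamp = {key: value for key, value in event.items() if key != "shipped_at"}
print(verify_event(SHIPPING_EVENT_CONTRACT, without_timestamp))  # ['shipped_at']

# Simulate another breaking change: a contracted field changes type.
string_id = {**event, "order_id": "123"}
print(verify_event(SHIPPING_EVENT_CONTRACT, string_id))  # ['order_id']
```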
Evolving Contracts Over Time
Contracts are not static. As features are added and requirements change, the expectations between services evolve. Managing this evolution requires a few practices.
Additive changes are safe. Adding a new field to a response does not break existing consumers, provided they ignore fields they do not recognize, because they never reference the new field. Contract tests verify this: the consumer's pact still passes because it only checks the fields it uses.
Removal requires coordination. Removing a field breaks any consumer that depends on it. The contract test for that consumer will fail on the provider side, flagging the breaking change. The provider team must coordinate with the consumer team to migrate away from the removed field before it can be deleted.
Version your contracts. As consumers update their expectations, maintain version history. This allows tracking how the contract between two services has evolved and enables rollback if a new contract version introduces problems.
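One way to picture these practices is a small version history per consumer. The sketch below assumes a hypothetical `HISTORY` table of contracted field names. It considers only each consumer's latest contract, whereas a real setup would also need to account for older consumer versions that are still deployed.

```python
# Hypothetical history: contracted fields per consumer version, oldest first.
HISTORY = {
    "checkout": [("2.0.0", {"id", "status", "total"}), ("2.1.0", {"id", "status"})],
    "reporting": [("1.4.0", {"id", "total", "items"})],
}


def latest(history, consumer):
    """Return the field set from the consumer's most recent contract version."""
    return history[consumer][-1][1]


def blocking_consumers(history, field):
    """List consumers whose latest contract still depends on `field`."""
    return sorted(name for name in history if field in latest(history, name))


def contract_diff(history, consumer, old_version, new_version):
    """Show which expectations were added or dropped between two versions."""
    versions = dict(history[consumer])
    old, new = versions[old_version], versions[new_version]
    return {"added": sorted(new - old), "dropped": sorted(old - new)}


# Who has to migrate before the provider can remove "total"?
print(blocking_consumers(HISTORY, "total"))  # ['reporting']

# How did checkout's expectations change between versions?
print(contract_diff(HISTORY, "checkout", "2.0.0", "2.1.0"))
# {'added': [], 'dropped': ['total']}
```

Here `checkout` has already stopped reading `total`, so only `reporting` needs to migrate before that field can be removed.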
The Takeaway
Contract testing makes the implicit agreements between services explicit and verifiable. Consumers define what they need. Providers verify they deliver it. Both sides can evolve independently as long as the contracts hold.
This testing approach catches one of the most common distributed system failures, interface drift, without the overhead and fragility of full integration test environments. Contract tests are fast to run, specific when they fail, and provide the deployment safety gates that help prevent the 3 AM production incident caused by an incompatible service change.
Next in the "Quality Architecture" learning path: We'll cover chaos engineering — deliberately breaking things in production to build confidence in your system's resilience.


