DORA Metrics: A Framework for Delivery Performance
- Contributor
- Jun 14
DORA — DevOps Research and Assessment — published a framework of four metrics that predict software delivery performance. The metrics are simple to describe and hard to game. Together, they tell a useful story about whether a team is delivering software well.
This guide covers the four metrics, what they mean, and how to use them.
The Four Metrics
Deployment Frequency: how often you deploy to production
Lead Time for Changes: how long from code committed to running in production
Change Failure Rate: what percentage of deploys cause incidents
Mean Time to Recovery (MTTR): how long to recover from an incident
Two metrics about speed; two about quality. The framework's insight: high-performing teams excel at both, not one at the expense of the other.
Why These Four
The DORA research found these four metrics correlate strongly with:
Organizational performance (revenue, market share)
Team well-being (lower burnout, higher job satisfaction)
Software quality (fewer incidents, faster recovery)
The metrics aren't proxies for these outcomes — they predict them.
Deployment Frequency
How often do you deploy to production?
The DORA benchmarks:
Elite: multiple times per day
High: between once per week and once per month
Medium: between once per month and once per six months
Low: less than once per six months
Higher frequency correlates with better outcomes because it forces small batches. Small batches mean each deploy is less risky, easier to test, and easier to roll back.
Teams that deploy rarely tend to:
Batch many changes together (high risk per deploy)
Take a long time to detect a problem and trace it to a specific change
Treat each deploy as a special event requiring ceremony
Teams that deploy often have automated their pipeline to the point where deploys are routine.
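As a rough illustration of how deployment frequency might be computed, the sketch below averages production deploys per week over a trailing window. The function name `deploys_per_week`, the 28-day window, and the sample timestamps are assumptions made for this example, not part of the DORA definition.

```python
from datetime import datetime, timedelta


def deploys_per_week(deploy_times, as_of, window_days=28):
    """Average number of production deploys per week in the window ending at as_of."""
    start = as_of - timedelta(days=window_days)
    # Half-open window [start, as_of) so a deploy is never counted in two adjacent windows.
    in_window = [t for t in deploy_times if start <= t < as_of]
    return len(in_window) / (window_days / 7)


# Hypothetical data: one deploy every two days for four weeks.
sample = [datetime(2024, 1, 1) + timedelta(days=2 * i) for i in range(14)]
print(deploys_per_week(sample, as_of=datetime(2024, 1, 29)))  # 3.5
```

A trailing window smooths out quiet weeks and release spikes, which suits a metric that is meant to be read as a trend.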
Lead Time for Changes
From the moment code is committed, how long until it's running in production?
The benchmarks:
Elite: less than one hour
High: between one day and one week
Medium: between one week and one month
Low: more than one month
Short lead times require:
Fast CI/CD pipelines
Automated testing
Few manual gates
Confidence in the deployment process
Long lead times indicate friction — manual approvals, lengthy testing, batched releases.
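One possible way to compute lead time is sketched below, assuming you can pair each commit timestamp with the timestamp of the deploy that shipped it. Using the median rather than the mean is a choice made here so that a single stalled change does not dominate the number; `median_lead_time` and the sample data are hypothetical.

```python
from datetime import datetime, timedelta
from statistics import median


def median_lead_time(changes):
    """Median commit-to-production time for (committed_at, deployed_at) pairs."""
    seconds = [(deployed - committed).total_seconds() for committed, deployed in changes]
    return timedelta(seconds=median(seconds))


# Hypothetical changes: (committed_at, deployed_at)
changes = [
    (datetime(2024, 3, 4, 9, 0), datetime(2024, 3, 4, 15, 0)),  # 6 hours
    (datetime(2024, 3, 5, 9, 0), datetime(2024, 3, 6, 9, 0)),   # 1 day
    (datetime(2024, 3, 6, 9, 0), datetime(2024, 3, 9, 9, 0)),   # 3 days
]
print(median_lead_time(changes))  # 1 day, 0:00:00
```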
Change Failure Rate
What percentage of deploys cause incidents, hotfixes, or rollbacks?
The benchmarks:
Elite: 0-15%
High: 16-30%
Medium: 16-30%
Low: 16-30%
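(The repeated range is not a typo: in the benchmarks quoted here, only the Elite tier has a distinct band, while High, Medium, and Low share the same 16-30% range. This metric mainly separates Elite teams from everyone else.)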
Note: change failure rate doesn't have to be very low. Even Elite teams have failures. The key is recovering fast and learning from them.
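The arithmetic is simply failed deploys divided by total deploys. A minimal sketch follows, assuming a deploy counts as failed if it caused an incident, was rolled back, or needed a hotfix; the `Deploy` record and its field names are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Deploy:
    deploy_id: str
    caused_incident: bool = False
    rolled_back: bool = False
    hotfixed: bool = False

    @property
    def failed(self):
        # Any one of the three outcomes marks the deploy as a failure.
        return self.caused_incident or self.rolled_back or self.hotfixed


def change_failure_rate(deploys):
    """Percentage of deploys that failed; 0.0 when there are no deploys."""
    if not deploys:
        return 0.0
    return 100 * sum(d.failed for d in deploys) / len(deploys)


# Hypothetical data: 17 clean deploys and 3 failed ones.
deploys = [Deploy(f"d{i}") for i in range(17)]
deploys += [
    Deploy("d17", caused_incident=True),
    Deploy("d18", rolled_back=True),
    Deploy("d19", rolled_back=True, hotfixed=True),  # counted once, not twice
]
print(change_failure_rate(deploys))  # 15.0
```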
Mean Time to Recovery
When something breaks, how long until it's fixed?
The benchmarks:
Elite: less than one hour
High: less than one day
Medium: between one day and one week
Low: more than one week
Fast recovery requires:
Good monitoring (you know when something's broken)
Good runbooks (you know what to do)
Easy rollback (you can revert)
Practiced incident response
Slow recovery often means recovery procedures are theoretical rather than practiced.
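Recovery time can be sketched as the mean of incident durations, assuming your incident tool records a start and a resolution timestamp for each incident. The function name and sample incidents below are hypothetical.

```python
from datetime import datetime, timedelta


def mean_time_to_recovery(incidents):
    """Mean duration of (started_at, resolved_at) incident pairs."""
    if not incidents:
        raise ValueError("no incidents in the period; MTTR is undefined")
    durations = [resolved - started for started, resolved in incidents]
    return sum(durations, timedelta()) / len(durations)


# Hypothetical incidents: (started_at, resolved_at)
incidents = [
    (datetime(2024, 3, 4, 10, 0), datetime(2024, 3, 4, 10, 30)),  # 30 min
    (datetime(2024, 3, 8, 14, 0), datetime(2024, 3, 8, 15, 30)),  # 90 min
    (datetime(2024, 3, 12, 9, 0), datetime(2024, 3, 12, 11, 0)),  # 120 min
]
print(mean_time_to_recovery(incidents))  # 1:20:00
```

A period with no incidents raises an error here rather than reporting zero, since "no data" and "instant recovery" are different things.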
What the Metrics Tell You Together
The interplay matters as much as individual numbers.
High deployment frequency + low lead time + low failure rate + low MTTR: elite performance. The team has a healthy pipeline and good practices.
High frequency + high failure rate: shipping fast without quality. The pipeline outpaces the testing.
Low frequency + low failure rate: safe but slow. Likely over-batching changes.
High frequency + high MTTR: fast pipeline but no recovery capability. Risky.
The four numbers together diagnose where to invest.
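The combinations above can be written down as a small rule table. This is only a sketch: the boolean inputs assume you have already decided what counts as "high" or "short" for your team, and the function name `diagnose` is made up for this example.

```python
def diagnose(high_frequency, short_lead_time, high_failure_rate, high_mttr):
    """Map the four signals to the diagnostic patterns described in the text."""
    if high_frequency and short_lead_time and not high_failure_rate and not high_mttr:
        return ["elite: healthy pipeline and good practices"]
    findings = []
    if high_frequency and high_failure_rate:
        findings.append("shipping fast without quality: the pipeline outpaces the testing")
    if not high_frequency and not high_failure_rate:
        findings.append("safe but slow: likely over-batching changes")
    if high_frequency and high_mttr:
        findings.append("fast pipeline but no recovery capability")
    return findings


print(diagnose(high_frequency=False, short_lead_time=False,
               high_failure_rate=False, high_mttr=False))
# ['safe but slow: likely over-batching changes']
```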
Tracking the Metrics
Measurement is the first challenge. You need:
Deployment events: when did production deploys happen? (from CI/CD)
Commit-to-deploy data: when was each commit deployed? (from git + CI/CD)
Incident records: which deploys caused incidents? (from incident management)
Recovery times: when did incidents start and end? (from incident management)
Tools like LinearB, Sleuth, and Faros aggregate these. For smaller teams, a simple spreadsheet can work.
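For the spreadsheet route, a CSV export with one row per deploy is enough to get started. The sketch below assumes a hypothetical column layout (`deploy_id`, `committed_at`, `deployed_at`, `incident_started_at`, `incident_resolved_at`) with ISO 8601 timestamps and empty incident cells for clean deploys; real exports will differ, and a deploy usually carries more than one commit.

```python
import csv
import io
from datetime import datetime

# Hypothetical spreadsheet export; column names are assumptions for this example.
CSV_EXPORT = """deploy_id,committed_at,deployed_at,incident_started_at,incident_resolved_at
d1,2024-03-04T09:00:00,2024-03-04T15:00:00,,
d2,2024-03-05T09:00:00,2024-03-06T09:00:00,2024-03-06T09:20:00,2024-03-06T10:50:00
d3,2024-03-06T09:00:00,2024-03-09T09:00:00,,
"""


def parse_time(value):
    """Parse an ISO 8601 timestamp, treating an empty cell as 'no value'."""
    return datetime.fromisoformat(value) if value else None


def load_records(text):
    """Turn CSV text into a list of dicts with parsed timestamps."""
    rows = csv.DictReader(io.StringIO(text))
    return [
        {key: (value if key == "deploy_id" else parse_time(value)) for key, value in row.items()}
        for row in rows
    ]


records = load_records(CSV_EXPORT)
failed = [r for r in records if r["incident_started_at"] is not None]
print(len(records), len(failed))  # 3 1
```

With rows in this shape, all four metrics can be derived from one table: deploy timestamps give frequency, the commit and deploy columns give lead time, and the incident columns give failure rate and recovery time.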
Interpretation
The metrics are aggregate. Per-deploy variation is huge. Look at trends over weeks/months, not individual data points.
Compare against the team's own history more than against benchmarks. A team's movement from "medium" toward "high" matters more than its current ranking.
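To look at trends rather than individual data points, one option is to bucket measurements by week and compare the weekly medians. The sketch below groups by ISO week; the helper `weekly_median` and the lead-time samples (in hours) are hypothetical.

```python
from collections import defaultdict
from datetime import date
from statistics import median


def weekly_median(samples):
    """Group (day, value) samples by ISO (year, week) and return the median per week."""
    by_week = defaultdict(list)
    for day, value in samples:
        iso = day.isocalendar()
        by_week[(iso[0], iso[1])].append(value)
    return {week: median(values) for week, values in sorted(by_week.items())}


# Hypothetical lead times in hours, keyed by deploy date.
lead_time_hours = [
    (date(2024, 3, 4), 48),
    (date(2024, 3, 5), 24),
    (date(2024, 3, 7), 72),
    (date(2024, 3, 11), 24),
    (date(2024, 3, 13), 12),
]
print(weekly_median(lead_time_hours))  # {(2024, 10): 48, (2024, 11): 18.0}
```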
Common Misuse
As performance management. Using DORA for individual evaluation produces gaming. Engineers split commits, skip difficult changes, avoid risky work.
As inter-team comparison. Different teams have different contexts; comparing them without nuance is misleading.
Gamification. Optimizing for the metric rather than the underlying quality. Frequent deploys of trivial changes don't actually represent throughput.
Single-metric focus. Optimizing one at the expense of others. Improving frequency by shipping broken code makes things worse.
The metrics should drive team conversation, not management surveillance.
Improving Each Metric
To improve Deployment Frequency:
Smaller PRs
Faster CI
Automated testing
Trunk-based development
Reduce manual approvals
To improve Lead Time:
Address pipeline bottlenecks
Reduce manual gates
Automate tests that are currently run manually
Parallelize what runs serially
To improve Change Failure Rate:
Better automated testing
Pre-merge validation
Feature flags for risky changes
Canary deployments
To improve MTTR:
Monitoring that catches issues fast
Runbooks for common failures
Tested rollback procedures
Practice via game days
Each metric has specific levers. The diagnosis tells you which levers to pull.
The Cultural Side
DORA's research also identified cultural factors that correlate with performance:
Psychological safety
Trust between teams
Learning culture
Generative (not blame-oriented) organization
You can't fix DORA metrics with tooling alone if the culture isn't supportive. Investment in culture pays back in the metrics.
Beyond DORA: SPACE Framework
A newer framework, SPACE, extends DORA with additional dimensions:
Satisfaction and well-being
Performance (DORA-like)
Activity (volume of work)
Communication and collaboration
Efficiency and flow
SPACE is more holistic but harder to measure. For most teams, DORA is the place to start. SPACE comes later if you want more depth.
What DORA Doesn't Measure
DORA is silent on:
Whether you're building the right thing
Product quality from a customer perspective
Code maintainability
Team happiness specifically
Architectural soundness
Use DORA alongside other measures, not as a complete picture.
A Working Use of DORA
For an engineering organization:
Measure baseline. Where are you on each metric?
Identify the weakest dimension. Where's the most opportunity?
Invest there. Make changes targeted at that dimension.
Re-measure. Did the investment move the metric?
Repeat with the next weakest dimension.
The iteration produces real progress. Treating DORA as a one-time benchmark misses the value.
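The "identify the weakest dimension" step can be as simple as ranking each metric's tier and picking the lowest. In the sketch below, the tier assigned to each metric is hypothetical input you would supply from your own baseline; the code does not derive tiers from raw numbers.

```python
# Tiers from weakest to strongest, as named in the benchmark sections.
TIER_ORDER = ["Low", "Medium", "High", "Elite"]


def weakest_dimensions(tiers):
    """Return every metric sitting at the lowest tier present (ties included)."""
    lowest = min(tiers.values(), key=TIER_ORDER.index)
    return [metric for metric, tier in tiers.items() if tier == lowest]


# Hypothetical baseline assessment.
baseline = {
    "Deployment Frequency": "High",
    "Lead Time for Changes": "High",
    "Change Failure Rate": "Medium",
    "MTTR": "High",
}
print(weakest_dimensions(baseline))  # ['Change Failure Rate']
```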
Anti-Patterns
DORA dashboards for executives. Without context, the numbers are misleading.
DORA-driven feature work. Optimizing for metric improvement instead of actual value.
DORA as the only metric. Other things matter; DORA doesn't capture them.
DORA without action. Measuring without improving.
A Worked Example
An engineering team measures:
Deployment Frequency: weekly
Lead Time: 5 days
Change Failure Rate: 20%
MTTR: 4 hours
Their interpretation:
Lead Time and Frequency suggest over-batched changes (slow pipeline?)
20% failure rate is above the 0-15% Elite band; investigate root causes
MTTR is reasonable for the scale
Investments:
Reduce CI runtime from 30 min to 10 min
Add automated tests for the most-failed change types
Adopt feature flags for risky changes
After 3 months, they re-measure:
Frequency: 3x per week
Lead Time: 1 day
Change Failure Rate: 12%
MTTR: 2 hours
Real progress. The framework guided investment.
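The size of each change in this example can be checked with a relative-change calculation. The numbers below are the ones from the example, with deployment frequency expressed as deploys per week; `pct_change` is a helper written for this sketch.

```python
def pct_change(before, after):
    """Relative change from before to after, as a percentage."""
    return 100 * (after - before) / before


# (before, after) pairs taken from the worked example.
measurements = {
    "Deployment Frequency (per week)": (1, 3),
    "Lead Time (days)": (5, 1),
    "Change Failure Rate (%)": (20, 12),
    "MTTR (hours)": (4, 2),
}
for name, (before, after) in measurements.items():
    print(f"{name}: {pct_change(before, after):+.0f}%")
# Deployment Frequency (per week): +200%
# Lead Time (days): -80%
# Change Failure Rate (%): -40%
# MTTR (hours): -50%
```

For frequency a positive change is an improvement; for the other three, a negative change is.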
Key Takeaway
DORA measures delivery performance through four metrics: Deployment Frequency, Lead Time for Changes, Change Failure Rate, and MTTR. High performance requires improving all four together, not one at the expense of others. Track against your own history more than against benchmarks. Use the metrics to diagnose where to invest; avoid using them for individual performance management. Pair them with cultural investment, because tooling alone doesn't move the metrics in unhealthy cultures. DORA is one part of the picture; use it alongside product and quality measures.
Keep learning. This article is part of the Advanced Quality Engineering path in the ShiftQuality Learning Center. Take quality from a team chore to an organizational property.


