DORA Metrics: A Framework for Delivery Performance

Contributor
Jun 14

DORA — DevOps Research and Assessment — published a framework of four metrics that predict software delivery performance. The metrics are simple to describe and hard to game. Together, they tell a useful story about whether a team is delivering software well.

This guide covers the four metrics, what they mean, and how to use them.

The Four Metrics

Deployment Frequency: how often you deploy to production
Lead Time for Changes: how long from code committed to running in production
Change Failure Rate: what percentage of deploys cause incidents
Mean Time to Recovery (MTTR): how long to recover from an incident

Two metrics about speed; two about quality. The framework's insight: high-performing teams excel at both, not one at the expense of the other.

Why These Four

The DORA research found these four metrics correlate strongly with:

Organizational performance (revenue, market share)
Team well-being (lower burnout, higher job satisfaction)
Software quality (fewer incidents, faster recovery)

The metrics aren't proxies for these outcomes — they predict them.

Deployment Frequency

How often do you deploy to production?

The DORA benchmarks:

Elite: multiple times per day
High: between once per week and once per month
Medium: between once per month and once per six months
Low: less than once per six months
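
As a rough illustration of how deployment frequency could be computed from raw deploy dates, here is a minimal Python sketch. The function names and the sample dates are hypothetical, and it assumes you already have one date per production deploy (for example, exported from your CI/CD tool).

```python
from collections import Counter
from datetime import date


def deploys_per_iso_week(deploy_dates):
    """Count deploys per (ISO year, ISO week) bucket."""
    counts = Counter()
    for d in deploy_dates:
        iso = d.isocalendar()  # (year, week, weekday)
        counts[(iso[0], iso[1])] += 1
    return dict(counts)


def average_deploys_per_week(deploy_dates, window_start, window_end):
    """Average weekly deploy count over an inclusive date window."""
    days = (window_end - window_start).days + 1
    if days <= 0:
        raise ValueError("window_end must not be before window_start")
    in_window = [d for d in deploy_dates if window_start <= d <= window_end]
    return len(in_window) / (days / 7)


# Hypothetical sample data: three deploys across two weeks.
sample = [date(2024, 1, 1), date(2024, 1, 3), date(2024, 1, 9)]
weekly = deploys_per_iso_week(sample)
average = average_deploys_per_week(sample, date(2024, 1, 1), date(2024, 1, 14))
print(weekly)
print(average)
```

Bucketing by week rather than by day keeps the numbers readable for teams that deploy less than daily; a team closer to the Elite tier would probably bucket by day instead.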

Higher frequency correlates with better outcomes because it forces small batches. Small batches mean each deploy is less risky, easier to test, and easier to roll back.

Teams that deploy rarely tend to:

Batch many changes together (high risk per deploy)
Take longer to detect which change caused a problem
Treat each deploy as a special event requiring ceremony

Teams that deploy often have automated their pipeline to the point where deploys are routine.

Lead Time for Changes

From the moment code is committed, how long until it's running in production?

The benchmarks:

Elite: less than one hour
High: between one day and one week
Medium: between one week and one month
Low: more than one month
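
One way to turn commit and deploy timestamps into a single lead-time figure is sketched below. Using the median rather than the mean is an assumption made here so that one unusually slow change does not dominate the result; the function name and sample timestamps are hypothetical.

```python
from datetime import datetime, timedelta
from statistics import median


def median_lead_time(changes):
    """Median commit-to-production time for (committed_at, deployed_at) pairs."""
    if not changes:
        raise ValueError("need at least one change")
    seconds = [
        (deployed - committed).total_seconds()
        for committed, deployed in changes
    ]
    return timedelta(seconds=median(seconds))


# Hypothetical sample: three changes committed at the same time.
sample_changes = [
    (datetime(2024, 3, 4, 9, 0), datetime(2024, 3, 4, 10, 0)),  # 1 hour
    (datetime(2024, 3, 4, 9, 0), datetime(2024, 3, 4, 14, 0)),  # 5 hours
    (datetime(2024, 3, 4, 9, 0), datetime(2024, 3, 6, 9, 0)),   # 48 hours
]
result = median_lead_time(sample_changes)
print(result)
```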

Short lead times require:

Fast CI/CD pipelines
Automated testing
Few manual gates
Confidence in the deployment process

Long lead times indicate friction — manual approvals, lengthy testing, batched releases.

Change Failure Rate

What percentage of deploys cause incidents, hotfixes, or rollbacks?

The benchmarks:

Elite: 0-15%
High: 16-30%
Medium: 16-30%
Low: 16-30%

(The repeated 16-30% band is how these benchmarks read: only the Elite tier is set apart on this metric, while High, Medium, and Low share a single range.)

Note: change failure rate doesn't have to be very low. Even Elite teams have failures. The key is recovering fast and learning from them.
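
The arithmetic behind change failure rate is a simple ratio. A minimal sketch follows, assuming you can count total deploys and the deploys that led to an incident, hotfix, or rollback over the same period; the function name and sample counts are hypothetical.

```python
def change_failure_rate(total_deploys, failed_deploys):
    """Percentage of deploys that led to an incident, hotfix, or rollback."""
    if total_deploys <= 0:
        raise ValueError("total_deploys must be positive")
    if not 0 <= failed_deploys <= total_deploys:
        raise ValueError("failed_deploys must be between 0 and total_deploys")
    return failed_deploys * 100 / total_deploys


# Hypothetical month: 25 deploys, 5 of which caused a failure.
rate = change_failure_rate(total_deploys=25, failed_deploys=5)
print(f"{rate:.1f}%")
```

What counts as a "failed" deploy is a team decision; whichever definition you pick, apply it consistently so the trend stays comparable over time.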

Mean Time to Recovery

When something breaks, how long until it's fixed?

The benchmarks:

Elite: less than one hour
High: less than one day
Medium: between one day and one week
Low: more than one week
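
Recovery time can be averaged from incident start and end timestamps. The sketch below assumes each incident record has both timestamps; the function name and sample incidents are hypothetical.

```python
from datetime import datetime, timedelta


def mean_time_to_recovery(incidents):
    """Average duration of (started_at, resolved_at) incident pairs."""
    if not incidents:
        raise ValueError("need at least one incident")
    total = sum(
        (resolved - started for started, resolved in incidents),
        timedelta(),  # start value so sum() can add timedeltas
    )
    return total / len(incidents)


# Hypothetical sample: a 1-hour incident and a 3-hour incident.
sample_incidents = [
    (datetime(2024, 3, 6, 10, 0), datetime(2024, 3, 6, 11, 0)),
    (datetime(2024, 3, 8, 14, 0), datetime(2024, 3, 8, 17, 0)),
]
mttr = mean_time_to_recovery(sample_incidents)
print(mttr)
```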

Fast recovery requires:

Good monitoring (you know when something's broken)
Good runbooks (you know what to do)
Easy rollback (you can revert)
Practiced incident response

Slow recovery often means recovery procedures are theoretical rather than practiced.

What the Metrics Tell You Together

The interplay matters as much as individual numbers.

High deployment frequency + short lead time + low failure rate + low MTTR: elite performance. The team has a healthy pipeline and good practices.

High frequency + high failure rate: shipping fast without quality. The pipeline outpaces the testing.

Low frequency + low failure rate: safe but slow. Likely over-batching changes.

High frequency + high MTTR: fast pipeline but no recovery capability. Risky.

The four numbers together diagnose where to invest.
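
The combinations above can be written down as a small lookup. This is only a sketch: what counts as "high" or "short" for each flag is left to the caller, and the function name is hypothetical.

```python
def diagnose(high_frequency, short_lead_time, high_failure_rate, high_mttr):
    """Map the four signals to the readings described in this section."""
    if high_frequency and short_lead_time and not high_failure_rate and not high_mttr:
        return ["elite performance: healthy pipeline and good practices"]

    findings = []
    if high_frequency and high_failure_rate:
        findings.append("shipping fast without quality: the pipeline outpaces the testing")
    if not high_frequency and not high_failure_rate:
        findings.append("safe but slow: likely over-batching changes")
    if high_frequency and high_mttr:
        findings.append("fast pipeline but no recovery capability")
    return findings


# Hypothetical team: deploys often, but failures are frequent and slow to fix.
findings = diagnose(
    high_frequency=True,
    short_lead_time=True,
    high_failure_rate=True,
    high_mttr=True,
)
for finding in findings:
    print(finding)
```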

Tracking the Metrics

Measurement is the first challenge. You need:

Deployment events: when did production deploys happen? (from CI/CD)
Commit-to-deploy data: when was each commit deployed? (from git + CI/CD)
Incident records: which deploys caused incidents? (from incident management)
Recovery times: when did incidents start and end? (from incident management)

Tools like LinearB, Sleuth, and Faros aggregate these. For smaller teams, a simple spreadsheet can work.
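
For the spreadsheet route, a minimal sketch of what the data might look like and how it could be loaded is shown below. The column names, the sample rows, and the `load_deploy_rows` helper are all hypothetical; the only assumption is one row per production deploy, with incident columns left empty when nothing went wrong.

```python
import csv
import io
from datetime import datetime

# Hypothetical spreadsheet export: one row per production deploy.
# The incident columns stay empty when the deploy caused no incident.
SAMPLE_CSV = """deploy_id,committed_at,deployed_at,incident_start,incident_end
d1,2024-03-04T09:00:00,2024-03-04T15:00:00,,
d2,2024-03-05T10:00:00,2024-03-06T10:00:00,2024-03-06T10:30:00,2024-03-06T12:30:00
d3,2024-03-07T08:00:00,2024-03-07T09:00:00,,
"""

TIMESTAMP_COLUMNS = ("committed_at", "deployed_at", "incident_start", "incident_end")


def load_deploy_rows(text):
    """Parse CSV text into dicts, converting timestamps and blanks to None."""
    rows = []
    for raw in csv.DictReader(io.StringIO(text)):
        row = {"deploy_id": raw["deploy_id"]}
        for column in TIMESTAMP_COLUMNS:
            value = raw[column]
            row[column] = datetime.fromisoformat(value) if value else None
        rows.append(row)
    return rows


rows = load_deploy_rows(SAMPLE_CSV)
failed = [r for r in rows if r["incident_start"] is not None]
print(len(rows), "deploys,", len(failed), "linked to an incident")
```

Keeping the incident timestamps on the deploy row makes the link between a deploy and its incident explicit, which is the piece that is usually hardest to reconstruct after the fact.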

Interpretation

The metrics are aggregate. Per-deploy variation is huge. Look at trends over weeks/months, not individual data points.
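
One simple way to look at a trend instead of individual points is a rolling average. The sketch below assumes you already have one value per week for a metric; the function name, the window size, and the sample values are hypothetical.

```python
def rolling_mean(values, window):
    """Average of each run of `window` consecutive values."""
    if window < 1:
        raise ValueError("window must be at least 1")
    means = []
    for end in range(window, len(values) + 1):
        chunk = values[end - window:end]
        means.append(sum(chunk) / window)
    return means


# Hypothetical weekly lead times in days: noisy, but trending down.
weekly_lead_time_days = [6, 4, 7, 3, 4, 2]
trend = rolling_mean(weekly_lead_time_days, window=3)
print(trend)
```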

Compare against the team's own history more than against benchmarks. A team's movement from "medium" toward "high" matters more than its current ranking.

Common Misuse

As performance management. Using DORA for individual evaluation produces gaming. Engineers split commits, skip difficult changes, avoid risky work.

As inter-team comparison. Different teams have different contexts; comparing them without nuance is misleading.

Gaming the metric. Optimizing for the number rather than the underlying quality. Frequent deploys of trivial changes don't represent real throughput.

Single-metric focus. Optimizing one at the expense of others. Improving frequency by shipping broken code makes things worse.

The metrics should drive team conversation, not management surveillance.

Improving Each Metric

To improve Deployment Frequency:

Smaller PRs
Faster CI
Automated testing
Trunk-based development
Reduce manual approvals

To improve Lead Time:

Address pipeline bottlenecks
Reduce manual gates
Automate tests that are currently run manually
Parallelize what runs serially

To improve Change Failure Rate:

Better automated testing
Pre-merge validation
Feature flags for risky changes
Canary deployments

To improve MTTR:

Monitoring that catches issues fast
Runbooks for common failures
Tested rollback procedures
Practice via game days

Each metric has specific levers. The diagnosis tells you which levers to pull.

The Cultural Side

DORA's research also identified cultural factors that correlate with performance:

Psychological safety
Trust between teams
Learning culture
Generative (not blame-oriented) organization

You can't fix DORA metrics with tooling alone if the culture isn't supportive. Investment in culture pays back in the metrics.

Beyond DORA: SPACE Framework

A newer framework, SPACE, takes a broader view of developer productivity across five dimensions:

Satisfaction and well-being
Performance (DORA-like)
Activity (volume of work)
Communication and collaboration
Efficiency and flow

SPACE is more holistic but harder to measure. For most teams, DORA is the place to start. SPACE comes later if you want more depth.

What DORA Doesn't Measure

DORA is silent on:

Whether you're building the right thing
Product quality from a customer perspective
Code maintainability
Team happiness specifically
Architectural soundness

Use DORA alongside other measures, not as a complete picture.

A Working Use of DORA

For an engineering organization:

Measure baseline. Where are you on each metric?
Identify the weakest dimension. Where's the most opportunity?
Invest there. Make changes targeted at that dimension.
Re-measure. Did the investment move the metric?
Repeat with the next weakest dimension.

The iteration produces real progress. Treating DORA as a one-time benchmark misses the value.
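
The "identify the weakest dimension" step could be sketched as follows. It assumes each metric has already been placed in one of the benchmark tiers; the tier ranking, the tie-breaking rule (first metric listed wins), and the function name are assumptions made for illustration.

```python
# Assumed ordering of the benchmark tiers, lowest first.
TIER_RANK = {"low": 0, "medium": 1, "high": 2, "elite": 3}


def weakest_dimension(tiers):
    """Return the metric with the lowest tier; ties go to the first one listed."""
    if not tiers:
        raise ValueError("need at least one metric")
    return min(tiers, key=lambda metric: TIER_RANK[tiers[metric]])


# Hypothetical baseline measurement.
baseline = {
    "deployment_frequency": "high",
    "lead_time": "medium",
    "change_failure_rate": "high",
    "mttr": "elite",
}
target = weakest_dimension(baseline)
print(target)
```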

Anti-Patterns

DORA dashboards for executives. Without context, the numbers are misleading.

DORA-driven feature work. Optimizing for metric improvement instead of actual value.

DORA as the only metric. Other things matter; DORA doesn't capture them.

DORA without action. Measuring without improving.

A Worked Example

An engineering team measures:

Deployment Frequency: weekly
Lead Time: 5 days
Change Failure Rate: 20%
MTTR: 4 hours

Their interpretation:

Weekly deploys and a 5-day lead time suggest over-batched changes (slow pipeline?)
20% failure rate is above the 15% Elite ceiling; investigate root causes
MTTR is reasonable for the scale

Investments:

Reduce CI runtime from 30 min to 10 min
Add automated tests for the most-failed change types
Adopt feature flags for risky changes

After 3 months, they re-measure:

Frequency: 3x per week
Lead Time: 1 day
Change Failure Rate: 12%
MTTR: 2 hours

Real progress. The framework guided investment.
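
To put numbers on the before and after measurements, the relative change per metric can be computed as below. The figures are the ones from this example, with weekly frequency expressed as 1 deploy per week; the function and variable names are hypothetical.

```python
def percent_change(before, after):
    """Relative change from `before` to `after`, in percent."""
    if before == 0:
        raise ValueError("before must be non-zero")
    return (after - before) * 100 / before


# (before, after) pairs taken from the worked example.
measurements = {
    "deploys_per_week": (1, 3),
    "lead_time_days": (5, 1),
    "change_failure_rate_pct": (20, 12),
    "mttr_hours": (4, 2),
}
changes = {
    name: percent_change(before, after)
    for name, (before, after) in measurements.items()
}
for name, change in changes.items():
    print(f"{name}: {change:+.0f}%")
```

For frequency a positive change is the improvement; for the other three a negative change is.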

Key Takeaway

DORA measures delivery performance through four metrics: Deployment Frequency, Lead Time for Changes, Change Failure Rate, and MTTR. High performance requires improving all four together, not one at the expense of the others. Track against your own history more than against benchmarks. Use the metrics to diagnose where to invest; avoid using them for individual performance management. Pair them with cultural investment, because tooling alone doesn't move the metrics in unhealthy cultures. DORA is one part of the picture; use it alongside product and quality measures.

ShiftQuality
