
DORA Metrics: A Framework for Delivery Performance

  • Contributor
  • Jun 14
  • 5 min read

DORA — DevOps Research and Assessment — published a framework of four metrics that predict software delivery performance. The metrics are simple to describe and hard to game. Together, they tell a useful story about whether a team is delivering software well.

This guide covers the four metrics, what they mean, and how to use them.

The Four Metrics

  1. Deployment Frequency: how often you deploy to production

  2. Lead Time for Changes: how long from code committed to running in production

  3. Change Failure Rate: what percentage of deploys cause incidents

  4. Mean Time to Recovery (MTTR): how long to recover from an incident

Two metrics about speed; two about quality. The framework's insight: high-performing teams excel at both, not one at the expense of the other.

Why These Four

The DORA research found these four metrics correlate strongly with:

  • Organizational performance (revenue, market share)

  • Team well-being (lower burnout, higher job satisfaction)

  • Software quality (fewer incidents, faster recovery)

The metrics aren't proxies for these outcomes — they predict them.

Deployment Frequency

How often do you deploy to production?

The DORA benchmarks:

  • Elite: multiple times per day

  • High: between once per week and once per month

  • Medium: between once per month and once per six months

  • Low: less than once per six months

Higher frequency correlates with better outcomes because it forces small batches. Small batches mean each deploy is less risky, easier to test, and easier to roll back.

Teams that deploy rarely tend to:

  • Batch many changes together (high risk per deploy)

  • Take longer to detect which change caused a problem

  • Treat each deploy as a special event requiring ceremony

Teams that deploy often have automated their pipeline to the point where deploys are routine.
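
As a rough illustration, deployment frequency can be computed from a list of production deploy timestamps. This is a minimal sketch, not a standard implementation: the function name deploys_per_week and the explicit reporting window are assumptions made for the example.

```python
from datetime import datetime, timedelta


def deploys_per_week(deploy_times, window_start, window_end):
    """Average number of production deploys per week inside a reporting window.

    The explicit window is an assumption of this sketch; it avoids inflating
    the rate when only a handful of deploys exist.
    """
    weeks = (window_end - window_start) / timedelta(weeks=1)
    if weeks <= 0:
        raise ValueError("window_end must be after window_start")
    in_window = [t for t in deploy_times if window_start <= t < window_end]
    return len(in_window) / weeks


# Hypothetical data: one deploy every three days over a four-week window.
start = datetime(2024, 1, 1)
end = start + timedelta(weeks=4)
deploys = [start + timedelta(days=3 * i) for i in range(8)]
print(deploys_per_week(deploys, start, end))  # 2.0
```

Counting deploys inside a fixed window, rather than between the first and last deploy, keeps the number comparable from one reporting period to the next.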

Lead Time for Changes

From the moment code is committed, how long until it's running in production?

The benchmarks:

  • Elite: less than one hour

  • High: between one day and one week

  • Medium: between one week and one month

  • Low: more than one month

Short lead times require:

  • Fast CI/CD pipelines

  • Automated testing

  • Few manual gates

  • Confidence in the deployment process

Long lead times indicate friction — manual approvals, lengthy testing, batched releases.
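
Lead time can be sketched as the gap between each commit timestamp and the deploy that shipped it. The example below assumes you already have (committed_at, deployed_at) pairs; the function name median_lead_time is hypothetical, and summarizing with the median rather than the mean is a choice made here so that one very slow change does not dominate the number.

```python
from datetime import datetime, timedelta
from statistics import median


def median_lead_time(changes):
    """Median commit-to-production time for (committed_at, deployed_at) pairs."""
    if not changes:
        raise ValueError("no changes to measure")
    seconds = []
    for committed_at, deployed_at in changes:
        if deployed_at < committed_at:
            raise ValueError("a change cannot be deployed before it is committed")
        seconds.append((deployed_at - committed_at).total_seconds())
    return timedelta(seconds=median(seconds))


# Hypothetical data: three changes with lead times of 1, 5, and 48 hours.
base = datetime(2024, 3, 1, 9, 0)
changes = [
    (base, base + timedelta(hours=1)),
    (base, base + timedelta(hours=5)),
    (base, base + timedelta(hours=48)),
]
print(median_lead_time(changes))  # 5:00:00
```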

Change Failure Rate

What percentage of deploys cause incidents, hotfixes, or rollbacks?

The benchmarks:

  • Elite: 0-15%

  • High: 16-30%

  • Medium: 16-30%

  • Low: 16-30%

(In these benchmarks, only the Elite tier has a distinct range. High, Medium, and Low share the same 16-30% band, so this metric mainly separates Elite teams from everyone else.)

Note: change failure rate doesn't have to be very low. Even Elite teams have failures. The key is recovering fast and learning from them.
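
The arithmetic behind change failure rate is a simple ratio. A minimal sketch, assuming you can count total deploys and the deploys that led to an incident, hotfix, or rollback (the function name change_failure_rate is made up for the example):

```python
def change_failure_rate(total_deploys, failed_deploys):
    """Percentage of deploys that caused an incident, hotfix, or rollback."""
    if total_deploys <= 0:
        raise ValueError("total_deploys must be positive")
    if not 0 <= failed_deploys <= total_deploys:
        raise ValueError("failed_deploys must be between 0 and total_deploys")
    return 100.0 * failed_deploys / total_deploys


# Hypothetical quarter: 50 deploys, 10 of which needed remediation.
print(change_failure_rate(50, 10))  # 20.0
```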

Mean Time to Recovery

When something breaks, how long until it's fixed?

The benchmarks:

  • Elite: less than one hour

  • High: less than one day

  • Medium: between one day and one week

  • Low: more than one week

Fast recovery requires:

  • Good monitoring (you know when something's broken)

  • Good runbooks (you know what to do)

  • Easy rollback (you can revert)

  • Practiced incident response

Slow recovery often means recovery procedures are theoretical rather than practiced.
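
MTTR can be sketched as the average duration of incidents. The example assumes each incident is recorded as a (started_at, resolved_at) pair; the function name mean_time_to_recovery is hypothetical.

```python
from datetime import datetime, timedelta


def mean_time_to_recovery(incidents):
    """Mean duration of incidents given (started_at, resolved_at) pairs."""
    if not incidents:
        raise ValueError("no incidents to measure")
    total = sum(
        (resolved_at - started_at for started_at, resolved_at in incidents),
        timedelta(0),
    )
    return total / len(incidents)


# Hypothetical data: one 2-hour incident and one 6-hour incident.
base = datetime(2024, 5, 1, 12, 0)
incidents = [
    (base, base + timedelta(hours=2)),
    (base + timedelta(days=1), base + timedelta(days=1, hours=6)),
]
print(mean_time_to_recovery(incidents))  # 4:00:00
```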

What the Metrics Tell You Together

The interplay matters as much as individual numbers.

High deployment frequency + low lead time + low failure rate + low MTTR: elite performance. The team has a healthy pipeline and good practices.

High frequency + high failure rate: shipping fast without quality. The pipeline outpaces the testing.

Low frequency + low failure rate: safe but slow. Likely over-batching changes.

High frequency + high MTTR: fast pipeline but no recovery capability. Risky.

The four numbers together diagnose where to invest.
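
The combinations above can be written down as a small lookup. This is only a sketch: the function name diagnose and its three boolean inputs are assumptions, and deciding what counts as "frequent", "high", or "slow" for your team is left to the caller.

```python
def diagnose(frequent_deploys, high_failure_rate, slow_recovery):
    """Return the diagnoses from the combinations described above that apply."""
    findings = []
    if frequent_deploys and high_failure_rate:
        findings.append("shipping fast without quality: the pipeline outpaces the testing")
    if not frequent_deploys and not high_failure_rate:
        findings.append("safe but slow: likely over-batching changes")
    if frequent_deploys and slow_recovery:
        findings.append("fast pipeline but no recovery capability")
    return findings


# A team that deploys often, rarely breaks production, but recovers slowly.
for finding in diagnose(frequent_deploys=True, high_failure_rate=False, slow_recovery=True):
    print(finding)
```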

Tracking the Metrics

Measurement is the first challenge. You need:

  • Deployment events: when did production deploys happen? (from CI/CD)

  • Commit-to-deploy data: when was each commit deployed? (from git + CI/CD)

  • Incident records: which deploys caused incidents? (from incident management)

  • Recovery times: when did incidents start and end? (from incident management)

Tools like LinearB, Sleuth, and Faros aggregate these. For smaller teams, a simple spreadsheet can work.
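
Whatever the tool, the underlying records look roughly the same. The sketch below shows one possible shape for them; the class names, field names, and the caused_by_deploy link between an incident and a deploy are all assumptions, not the schema of any of the tools named above.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Deploy:
    deploy_id: str                      # identifier from CI/CD (hypothetical field)
    deployed_at: datetime               # when the deploy reached production
    commit_times: tuple[datetime, ...]  # commit timestamps shipped in this deploy


@dataclass(frozen=True)
class Incident:
    caused_by_deploy: str  # deploy_id the incident was attributed to
    started_at: datetime
    resolved_at: datetime


def failed_deploy_ids(deploys, incidents):
    """IDs of known deploys that have at least one incident attributed to them."""
    known = {d.deploy_id for d in deploys}
    return {i.caused_by_deploy for i in incidents if i.caused_by_deploy in known}


# Hypothetical records: two deploys, one incident attributed to the second.
t = datetime(2024, 6, 1, 10, 0)
deploys = [Deploy("d1", t, (t,)), Deploy("d2", t, (t,))]
incidents = [Incident("d2", t, t)]
print(failed_deploy_ids(deploys, incidents))  # {'d2'}
```

Deployment Frequency and Lead Time come from the Deploy records alone; Change Failure Rate and MTTR need the link between incidents and deploys, which is usually the hardest part to capture reliably.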

Interpretation

The metrics are aggregate. Per-deploy variation is huge. Look at trends over weeks/months, not individual data points.

Compare against the team's own history more than against benchmarks. A team's movement from "medium" toward "high" matters more than its current ranking.
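
One way to look at trends rather than individual points is to bucket events by week. A minimal sketch, assuming deploy dates are available as date objects; the function name weekly_counts and the choice of ISO weeks as buckets are illustrative only.

```python
from collections import Counter
from datetime import date


def weekly_counts(deploy_dates):
    """Count deploys per (ISO year, ISO week), ordered oldest to newest."""
    counts = Counter(d.isocalendar()[:2] for d in deploy_dates)
    return dict(sorted(counts.items()))


# Hypothetical data: two deploys in the first ISO week of 2024, one in the second.
dates = [date(2024, 1, 1), date(2024, 1, 3), date(2024, 1, 8)]
print(weekly_counts(dates))  # {(2024, 1): 2, (2024, 2): 1}
```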

Common Misuse

As performance management. Using DORA for individual evaluation produces gaming. Engineers split commits, skip difficult changes, avoid risky work.

As inter-team comparison. Different teams have different contexts; comparing them without nuance is misleading.

Gamification. Optimizing for the metric rather than the underlying quality. Frequent deploys of trivial changes don't actually represent throughput.

Single-metric focus. Optimizing one at the expense of others. Improving frequency by shipping broken code makes things worse.

The metrics should drive team conversation, not management surveillance.

Improving Each Metric

To improve Deployment Frequency:

  • Smaller PRs

  • Faster CI

  • Automated testing

  • Trunk-based development

  • Reduce manual approvals

To improve Lead Time:

  • Address pipeline bottlenecks

  • Reduce manual gates

  • Automate tests that are run manually

  • Parallelize what runs serially

To improve Change Failure Rate:

  • Better automated testing

  • Pre-merge validation

  • Feature flags for risky changes

  • Canary deployments

To improve MTTR:

  • Monitoring that catches issues fast

  • Runbooks for common failures

  • Tested rollback procedures

  • Practice via game days

Each metric has specific levers. The diagnosis tells you which levers to pull.

The Cultural Side

DORA's research also identified cultural factors that correlate with performance:

  • Psychological safety

  • Trust between teams

  • Learning culture

  • Generative (not blame-oriented) organization

You can't fix DORA metrics with tooling alone if the culture isn't supportive. Investment in culture pays back in the metrics.

Beyond DORA: SPACE Framework

A newer framework, SPACE, extends DORA with additional dimensions:

  • Satisfaction and well-being

  • Performance (DORA-like)

  • Activity (volume of work)

  • Communication and collaboration

  • Efficiency and flow

SPACE is more holistic but harder to measure. For most teams, DORA is the place to start. SPACE comes later if you want more depth.

What DORA Doesn't Measure

DORA is silent on:

  • Whether you're building the right thing

  • Product quality from a customer perspective

  • Code maintainability

  • Team happiness specifically

  • Architectural soundness

Use DORA alongside other measures, not as a complete picture.

A Working Use of DORA

For an engineering organization:

  1. Measure baseline. Where are you on each metric?

  2. Identify the weakest dimension. Where's the most opportunity?

  3. Invest there. Make changes targeted at that dimension.

  4. Re-measure. Did the investment move the metric?

  5. Repeat with the next weakest dimension.

The iteration produces real progress. Treating DORA as a one-time benchmark misses the value.
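
Step 2 of the loop can be sketched as picking the metric with the lowest tier. The tier names follow the benchmark labels used earlier; the function name weakest_dimension, the metric keys, and the tie-breaking rule (first metric listed wins) are assumptions of this example.

```python
TIER_RANK = {"low": 0, "medium": 1, "high": 2, "elite": 3}


def weakest_dimension(tiers):
    """Return the metric whose tier ranks lowest; ties go to the first one listed."""
    return min(tiers, key=lambda metric: TIER_RANK[tiers[metric]])


# Hypothetical baseline for one team.
baseline = {
    "deployment_frequency": "high",
    "lead_time": "medium",
    "change_failure_rate": "high",
    "mttr": "elite",
}
print(weakest_dimension(baseline))  # lead_time
```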

Anti-Patterns

DORA dashboards for executives. Without context, the numbers are misleading.

DORA-driven feature work. Optimizing for metric improvement instead of actual value.

DORA as the only metric. Other things matter; DORA doesn't capture them.

DORA without action. Measuring without improving.

A Worked Example

An engineering team measures:

  • Deployment Frequency: weekly

  • Lead Time: 5 days

  • Change Failure Rate: 20%

  • MTTR: 4 hours

Their interpretation:

  • Lead Time and Frequency suggest over-batched changes (slow pipeline?)

  • A 20% failure rate is above the Elite ceiling of 15%; investigate root causes

  • MTTR is reasonable for the scale

Investments:

  • Reduce CI runtime from 30 min to 10 min

  • Add automated tests for the most-failed change types

  • Adopt feature flags for risky changes

After 3 months, they re-measure:

  • Frequency: 3x per week

  • Lead Time: 1 day

  • Change Failure Rate: 12%

  • MTTR: 2 hours

Real progress. The framework guided investment.
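
The size of the improvement in this example can be checked with a few lines of arithmetic. The helper percent_reduction is a made-up name; the numbers are the before and after values listed above, with "weekly" treated as one deploy per week.

```python
def percent_reduction(before, after):
    """Relative reduction from before to after, as a percentage of before."""
    if before <= 0:
        raise ValueError("before must be positive")
    return 100.0 * (before - after) / before


print(3 / 1)                      # deploys per week, 1 -> 3: 3.0 times as often
print(percent_reduction(5, 1))    # lead time, 5 days -> 1 day: 80.0
print(percent_reduction(20, 12))  # change failure rate, 20% -> 12%: 40.0
print(percent_reduction(4, 2))    # MTTR, 4 hours -> 2 hours: 50.0
```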

Key Takeaway

DORA measures delivery performance through four metrics: Deployment Frequency, Lead Time for Changes, Change Failure Rate, and MTTR. High performance requires improving all four together, not one at the expense of the others. Track against your own history more than against benchmarks. Use the metrics to diagnose where to invest; avoid using them for individual performance management. Pair them with cultural investment, because tooling alone doesn't move the metrics in unhealthy cultures. DORA is one part of the picture; use it alongside product and quality measures.

Related reading

Keep learning. This article is part of the Advanced Quality Engineering path in the ShiftQuality Learning Center. Take quality from a team chore to an organizational property.
