DORA Metrics: A Practical Guide for Engineering Teams (2026)
What the four (now five) DORA metrics mean, how to calculate them from your git data, the official benchmarks, common mistakes, and where DORA falls short.
March 9, 2026·12 min read·Gitmore Team
Your team ships fast. But is it shipping well? Deployment frequency alone doesn't answer that, and neither does change failure rate on its own. You need speed and stability measured together to get the real picture.
That's what DORA metrics do. They're four (now five) metrics backed by a decade of research across 39,000+ professionals, and they're the closest thing the industry has to a standard for measuring software delivery performance.
This guide covers what each metric means, how to calculate them from your git data, the official benchmarks, common mistakes, and when DORA isn't enough on its own.
DORA stands for DevOps Research and Assessment. It was co-founded in 2015 by Dr. Nicole Forsgren, Jez Humble, and Gene Kim. Their research started in 2014 with the annual State of DevOps Report (originally published with Puppet), and by 2018 they'd published Accelerate: The Science of Lean Software and DevOps, summarizing four years of findings.
Google Cloud acquired DORA in December 2018. The research has continued every year since, and the 2024 report marks the 10th anniversary with data from over 39,000 professionals globally. The book won the Shingo Research and Professional Publication Award in 2019.
One of DORA's core findings: speed and stability are not tradeoffs. Top-performing teams score well on all metrics. Low performers score poorly across the board. You don't have to choose between moving fast and keeping things stable.
DORA groups its metrics into two dimensions: throughput (how fast you deliver) and stability (how reliable those deliveries are).
1. Deployment Frequency (DF)
How often your team deploys code to production. This isn't commits or merges. It's production deployments specifically.
Example: A team deploying 5 times per day vs. once per month. Higher frequency usually means smaller changes, which are easier to debug when something breaks.
2. Change Lead Time (CLT)
The time from a developer's first commit on a branch to that code running in production. This covers coding, review, testing, and deployment.
Example: A developer merges a PR at 9am and it reaches production by 11am = 2-hour lead time. If it takes 3 weeks, something in the pipeline needs attention.
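The arithmetic behind that example is plain timestamp subtraction. A minimal sketch (the timestamps here are made up to match the 9am/11am example, not pulled from any real system):

```python
from datetime import datetime

# Hypothetical merge and deploy timestamps for the example above.
merged_at = datetime(2026, 3, 9, 9, 0)
deployed_at = datetime(2026, 3, 9, 11, 0)

lead_time_hours = (deployed_at - merged_at).total_seconds() / 3600
print(lead_time_hours)  # 2.0
```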
3. Change Failure Rate (CFR)
The percentage of deployments that require an immediate rollback, hotfix, or patch. Not every bug counts. Only failures that need urgent intervention.
Example: 3 out of 30 deployments this month caused incidents = 10% CFR.
4. Failed Deployment Recovery Time (FDRT)
How long it takes to recover from a failed deployment. Previously called "Mean Time to Restore" (MTTR), it was renamed in 2023 to focus specifically on software-change failures rather than external outages.
Example: A deployment causes an outage at 2pm and service is restored by 3pm = 1-hour recovery time.
5. Deployment Rework Rate (added 2024)
The ratio of unplanned deployments that happen because of a production incident. This was split from Change Failure Rate in the 2024 report to better capture rework as a separate signal.
DORA uses cluster analysis across thousands of survey respondents each year to define performance tiers. These numbers shift annually (the 2022 report only detected 3 clusters, no Elite tier), but here are the widely referenced benchmarks:
| Metric | Elite | High | Medium | Low |
|---|---|---|---|---|
| Deployment Frequency | Multiple per day | Daily to weekly | Weekly to monthly | Monthly to every 6 months |
| Change Lead Time | < 1 day | 1 day to 1 week | 1 week to 1 month | 1 month to 6 months |
| Change Failure Rate | 5% | 10% | 15% | 64% |
| Recovery Time | < 1 hour | < 1 day | 1 day to 1 week | 1 month to 6 months |
The gap between the tiers is dramatic. Low-performing teams have a 64% change failure rate vs. 5% for Elite, and their recovery time is measured in months rather than hours. These aren't small differences.
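If you want to bucket your own numbers against the table, a small classifier is enough. This is an illustrative sketch, not an official DORA tool: the function name is made up, and the cutoffs are simply the upper bounds from the change failure rate column above.

```python
def cfr_tier(cfr_percent: float) -> str:
    """Map a change failure rate (in percent) to a benchmark tier.

    Thresholds taken from the benchmark table above; boundary values
    are assigned to the better tier, which is a judgment call.
    """
    if cfr_percent <= 5:
        return "Elite"
    if cfr_percent <= 10:
        return "High"
    if cfr_percent <= 15:
        return "Medium"
    return "Low"

print(cfr_tier(10))  # the 3-in-30 example earlier lands in "High"
```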
You can derive DORA metrics from your existing git and CI/CD data. Here's how each one maps to data you already have.
Count the number of successful deployments to production in a given period. If you use CI/CD pipelines, count pipeline runs that deploy to production. A common git-based proxy: count merged PRs to your main branch that trigger a deployment pipeline.
deployments_this_month = count(successful_deploys_to_production)
frequency = deployments_this_month / days_in_month

Measure the time from the first commit on a branch (or PR creation) to when that code is deployed to production. You can pull this from the GitHub API using PR creation timestamps and deployment timestamps.
lead_time = deployment_timestamp - first_commit_timestamp
average_lead_time = sum(all_lead_times) / count(deployments)

Divide the number of deployments that caused incidents by the total number of deployments. This requires tagging failed deployments in your CI/CD system or linking deployments to your incident tracking tool.
cfr = (failed_deployments / total_deployments) * 100

Measure the elapsed time between a failed deployment and the next successful deployment on that service. You can also use incident creation to incident resolution timestamps from your incident management tool.
recovery_time = next_successful_deploy - failed_deploy_timestamp
average_recovery = sum(all_recovery_times) / count(failures)

Google's DORA team also published an open-source project called Four Keys that sets up a data ingestion pipeline from GitHub or GitLab through Google Cloud into a dashboard. The project has since been deprecated and archived, but it remains a useful reference if you want to build your own measurement system.
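The four formulas above can be combined into a single pass over a list of deployment records. This is a sketch under assumptions: the `Deploy` record shape (deploy time, first commit time, failed flag) is invented for illustration, and in practice you'd populate it from the GitHub API or your CI/CD system.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass
class Deploy:
    at: datetime               # when the deployment finished
    first_commit_at: datetime  # first commit on the branch it shipped
    failed: bool               # required rollback, hotfix, or patch?

def dora_metrics(deploys: list[Deploy], days: int) -> dict:
    """Compute DF, CLT, CFR, and recovery time from deployment records."""
    deploys = sorted(deploys, key=lambda d: d.at)
    lead_times = [d.at - d.first_commit_at for d in deploys]
    failures = [d for d in deploys if d.failed]

    # Recovery time: failed deploy -> next successful deploy.
    recoveries = []
    for f in failures:
        nxt = next((d for d in deploys if d.at > f.at and not d.failed), None)
        if nxt is not None:
            recoveries.append(nxt.at - f.at)

    return {
        "deploys_per_day": len(deploys) / days,
        "avg_lead_time": sum(lead_times, timedelta()) / len(lead_times),
        "cfr_percent": 100 * len(failures) / len(deploys),
        "avg_recovery": (sum(recoveries, timedelta()) / len(recoveries)
                         if recoveries else None),
    }
```

Feeding in three deployments where one fails and the next deploy three hours later restores service would yield a 33% CFR and a 3-hour recovery time, matching the hand calculations above.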
DORA isn't just about engineering efficiency. The research found a direct link between software delivery performance and organizational performance.
Goodhart's Law: "When a measure becomes a target, it ceases to be a good measure." Saying "every team must deploy multiple times per day by Q4" incentivizes gaming, not improvement.
- Pushing many small, inconsequential updates to inflate Deployment Frequency without delivering real value. The metric goes up but nothing improves.
- Optimizing for deployment frequency while ignoring change failure rate, which means you're shipping bugs faster. The metrics are designed to work as a set.
- Creating league tables that rank teams against each other. This ignores context: a web frontend team will naturally deploy more frequently than a mobile app team or a team working on embedded systems, so comparing them is misleading.
- Skipping code reviews or automated tests to get a lower lead time number. Speed at the expense of quality defeats the purpose.
- Teams need at least 2-3 months of data before deciding what "good" looks like for their specific context. A single sprint doesn't give you a reliable baseline.
- DORA shows what changed but not why. A spike in lead time might mean your pipeline is broken, or it might mean half the team was on holiday. External factors like absent code owners or company events skew results.
- Teams that release monthly or quarterly will inherently show "low" deployment frequency. That doesn't mean they're performing poorly. The benchmarks assume continuous delivery pipelines.
- Going from 1 deployment per month to 1 per week is a big win. Going from 5 per day to 10 per day probably isn't. The framework doesn't tell you when to stop optimizing.
- DORA measures pipeline performance but misses PR review time, first-response time on code reviews, work categorization (features vs. bugs vs. tech debt), and developer experience. You need additional signals to get the full picture.
DORA is useful but it has real blind spots. Knowing these prevents you from over-relying on four numbers to run your engineering org.
The SPACE framework was created in 2021 by Nicole Forsgren (yes, the same person who co-founded DORA), Margaret-Anne Storey, and colleagues at Microsoft Research. It was published in ACM Queue and expands the measurement picture across five dimensions:
| Dimension | What It Measures | Example Metrics |
|---|---|---|
| Satisfaction & Well-being | Developer fulfillment | Retention rates, tool satisfaction surveys |
| Performance | System and code outcomes | Defect rates, reliability, feature usage |
| Activity | Observable work output | Commits, PRs, deployments, incidents handled |
| Communication & Collaboration | How well teams work together | Onboarding speed, documentation quality, review participation |
| Efficiency & Flow | Workflow smoothness | Interruptions, handoffs, focus time, cycle time |
DORA's metrics primarily cover the Activity and Performance dimensions. SPACE adds satisfaction, collaboration, and flow. The recommended approach is to start with DORA for delivery baselines, then layer in SPACE for a fuller picture of engineering health.
One practical guideline from the framework: pick at least 3 of the 5 dimensions, balance quantitative and survey data, and only report aggregated team-level results. Never use SPACE (or DORA) to evaluate individual developers.
Several tools can automate DORA measurement. They differ primarily in how they define a "deployment" and how they link incidents to deployments.
| Tool | Approach | Best For |
|---|---|---|
| Google Four Keys | Open-source, GitHub/GitLab to BigQuery | Teams that want full control |
| GitLab (built-in) | Native CI/CD pipeline analytics | GitLab-only teams |
| Sleuth | Deployment-centric event tracking | Accuracy-focused teams |
| LinearB | Git-centric with workflow automation | Teams that also want PR automation |
| Swarmia | DORA + SPACE + developer experience | Teams that want DORA plus DX surveys |
| Middleware | Open-source DORA platform | Budget-conscious teams on GitHub |
For a broader comparison of tools that track git activity and team metrics, see our guide to git reporting tools.
You don't need to buy an expensive platform to start measuring DORA. The calculations above run on git and CI/CD data you already have, and a spreadsheet is enough for a first baseline.
The goal isn't to reach "Elite" on every metric. It's to understand where your bottlenecks are and improve over time. A team that moves from Low to Medium performance has made a more meaningful improvement than an Elite team optimizing by 5%.
Gitmore doesn't calculate DORA metrics directly. What it does is give you automated visibility into the daily git activity that feeds into those metrics: what got deployed, what PRs merged, what work categories the team focused on, and where things are stuck.
Think of it this way: DORA tells you your lead time is 5 days. Gitmore's daily reports tell you why it's 5 days, because you can see that PRs sit in review for 3 of those days. The metrics and the narrative work together.
Try Gitmore free to get AI-generated team reports from your GitHub, GitLab, or Bitbucket repos. Two-minute setup, no credit card.
Explore git reporting for your platform
Automated git reports for your engineering team. Set up in 2 minutes, no credit card required.
Get Started Free