GitHub claims 55% faster. A new study found 19% slower. Here's what the research actually shows about AI coding tool productivity — and why the results conflict.
GitHub says developers are 55% faster with Copilot. Microsoft reports a 26% productivity boost. But a July 2025 study from METR found experienced developers were actually 19% slower when using AI tools.
Stack Overflow's developer satisfaction with AI dropped from 77% to 72% in just one year. Only 43% of developers say they're confident in AI accuracy.
Who's right? And what does it mean for your workflow?
The answer is more nuanced than any single headline suggests. Here's what the research actually shows.
Let's start with the studies showing AI helps.
The foundational GitHub Copilot study that launched a thousand vendor claims: in a controlled experiment, developers implementing an HTTP server in JavaScript finished roughly 55% faster with Copilot than without.
This study established the "AI makes you faster" narrative. But note the conditions: a specific, well-defined task in a controlled environment with a particular tech stack.
Microsoft followed up with a larger enterprise study, reporting a roughly 26% productivity boost.
This addressed some limitations of the original study: more developers, longer duration, varied work. But it still relied partly on self-reported metrics.
IBM surveyed 669 developers using its code assistant, and JetBrains surveyed 481 programmers about their AI tool usage patterns. Both land on the positive side of the ledger, though both rest on self-reported experience.
Now for the studies that complicate the narrative.
The METR study (full paper) dropped a statistic that made headlines:
Setup: 16 experienced open-source developers completed 246 real tasks in mature repositories they had maintained for years, with each task randomly assigned to allow or prohibit AI tools.
Results:
| Metric | Value |
|---|---|
| Expected speedup (developer prediction) | +24% faster |
| Perceived speedup (what developers thought happened) | +20% faster |
| Actual result (measured) | -19% slower |
The gap between perception and reality is striking. Developers believed they were getting a 20% speedup while actually experiencing a 19% slowdown.
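To make those percentages concrete, here's a quick back-of-the-envelope calculation. The 100-minute baseline is illustrative rather than a figure from the study, and it reads each percentage as a change in task completion time:

```python
# Illustrative only: a hypothetical task that takes 100 minutes without AI.
baseline_minutes = 100

expected = baseline_minutes * (1 - 0.24)   # developers predicted ~24% less time
perceived = baseline_minutes * (1 - 0.20)  # developers felt they spent ~20% less time
actual = baseline_minutes * (1 + 0.19)     # measured: ~19% more time

print(f"expected ~{expected:.0f} min, perceived ~{perceived:.0f} min, actual ~{actual:.0f} min")
# expected ~76 min, perceived ~80 min, actual ~119 min
```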
As Simon Willison noted in his analysis, this wasn't a study of novice AI users—these were experienced developers who knew their tools and their codebases intimately.
TechCrunch's coverage highlighted the key implication: AI coding tools may genuinely help some developers while slowing down others.
The annual Stack Overflow survey revealed erosion in AI sentiment: satisfaction with AI tools dropped from 77% to 72% in a single year, only 43% of developers say they're confident in AI accuracy, and respondents reported that AI often makes complex debugging worse rather than better.
GitClear analyzed code metrics across repositories using AI assistance and found rising code churn and copy-pasted code alongside a decline in refactoring: more code shipped, with signs of weaker long-term maintainability.
These aren't bad studies reaching wrong conclusions. They're measuring different things under different conditions.
| Study | Task Type | Finding |
|---|---|---|
| GitHub | Specific exercise (HTTP server) | +55% faster |
| METR | Real issues in production repos | -19% slower |
Simple, well-defined tasks with clear success criteria: AI helps. Complex, ambiguous tasks requiring system understanding: AI may hurt.
The pattern across studies extends beyond task type to who is using the tools.
The GitHub study included a mix of experience levels. The METR study specifically selected experienced developers who deeply knew their codebases. Different populations, different results.
METR developers had contributed to their repositories for years. They didn't need AI to explain the code to them.
The perception gap in the METR study is crucial: developers felt faster even as the clock showed them slower.
Many positive studies rely heavily on self-reported metrics. This isn't dishonest—perception matters—but it's not the same as measuring actual output.
Microsoft found it takes approximately 11 weeks to realize the full productivity benefit from AI tools, yet studies vary dramatically in duration, from single-session exercises to multi-month enterprise deployments. Short studies may be capturing the learning curve rather than the steady state.
Based on the research, clear patterns emerge about where AI helps.
Best fit scenarios:
| Use Case | Why It Works |
|---|---|
| Boilerplate generation | Repetitive, pattern-based, low risk |
| Tests and configs | Well-defined structure, easy to verify |
| Unfamiliar languages/frameworks | AI bridges knowledge gaps |
| Code explanation | Understanding > generation |
| API discovery | Finding methods, libraries, patterns |
| Prototyping | Speed matters more than perfection |
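For a sense of what "well-defined structure, easy to verify" looks like in practice, here's the kind of small, pattern-based function and test scaffold where assistant suggestions tend to land well. The `slugify` function and its tests are illustrative, not code from any of the studies:

```python
import re

def slugify(title: str) -> str:
    """Lowercase a title and collapse runs of non-alphanumerics into single hyphens."""
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower())
    return slug.strip("-")

# Tests like these are repetitive, pattern-based, and trivially verifiable,
# which is exactly the shape of work where AI assistance pays off most.
def test_slugify_basic():
    assert slugify("Hello, World!") == "hello-world"

def test_slugify_collapses_whitespace():
    assert slugify("  AI   coding tools ") == "ai-coding-tools"

def test_slugify_strips_edge_hyphens():
    assert slugify("--Already--Hyphenated--") == "already-hyphenated"
```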
Worst fit scenarios:
| Use Case | Why It Struggles |
|---|---|
| Complex debugging | Often makes it worse (Stack Overflow data) |
| Architecture decisions | Lacks full system context |
| Performance optimization | Doesn't understand constraints |
| Security-critical code | 45% failure rate (Veracode) |
| Familiar, well-understood tasks | Overhead exceeds benefit |
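To make the security row concrete: one classic failure mode in generated code is building SQL queries by string concatenation. The snippet below is an illustration of the pattern, not an example taken from the Veracode report:

```python
import sqlite3

def find_user_unsafe(conn: sqlite3.Connection, username: str):
    # Interpolating user input straight into SQL: passing "admin' OR '1'='1"
    # as the username turns this into a query that matches every row.
    query = f"SELECT id, username FROM users WHERE username = '{username}'"
    return conn.execute(query).fetchall()

def find_user_safe(conn: sqlite3.Connection, username: str):
    # Parameterized queries let the driver handle escaping, closing the injection hole.
    return conn.execute(
        "SELECT id, username FROM users WHERE username = ?", (username,)
    ).fetchall()
```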
One of the clearest findings across research: experience level dramatically affects outcomes.
For less experienced developers, AI genuinely bridges skill gaps. The "leveling up" effect is real: a junior developer with AI assistance can produce code that looks more like senior output, at least on the surface.
For experienced developers, the calculus is different. The METR study showed this directly: experts working in their own codebases were slowed down by AI tools.
A fundamental limitation underlies many of these struggles: limited context. The METR study involved repositories averaging 22,000+ stars, real, complex projects with millions of lines of code. AI tools couldn't grasp the full picture, so suggestions were often technically correct but contextually wrong.
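As a hypothetical illustration of "technically correct but contextually wrong": imagine a project that already standardizes HTTP retries in a shared helper. An assistant that only sees the current file will happily re-implement the logic inline. The module path and `fetch_with_retry` helper below are invented for this example:

```python
# What the codebase already has (hypothetical shared helper):
#
#     from myproject.http_utils import fetch_with_retry
#     payload = fetch_with_retry(url, attempts=3)
#
# What a file-level assistant tends to suggest instead: correct in isolation,
# but a duplicate of logic the team already maintains in one place.
import time
import urllib.request

def fetch_resource(url: str, attempts: int = 3, backoff: float = 1.0) -> bytes:
    for attempt in range(attempts):
        try:
            with urllib.request.urlopen(url) as response:
                return response.read()
        except OSError:
            if attempt == attempts - 1:
                raise
            time.sleep(backoff * (2 ** attempt))
```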
This is where the next generation of tools has opportunity: full project understanding, not just file-level assistance.
The JetBrains research revealed practical usage patterns that don't match the "AI writes all my code" narrative.
The common thread: developers reach for AI for targeted assistance rather than end-to-end feature generation, and trust calibration happens over time. The most productive developers aren't using AI for everything; they're using it strategically.
The trajectory is clear: AI tools are improving rapidly. Context windows are expanding. Agent capabilities are increasing.
But fundamental limitations remain: tools still lack full system context, still stumble on security-critical code, and still slow down experts working in familiar codebases. What to watch for in tools: project-level understanding rather than file-level suggestions, and evaluation against measured throughput rather than perceived speed.
AI coding tools help. But it's complicated.
The research shows:
| Factor | Impact on AI Benefit |
|---|---|
| Task complexity | Simple = helps, Complex = may hurt |
| Developer experience | Junior = big gains, Senior = smaller/negative |
| Codebase familiarity | Unfamiliar = helps, Familiar = overhead |
| Measurement method | Self-reported often inflated |
Don't trust vendor benchmarks blindly. A 55% improvement on a specific JavaScript exercise doesn't mean 55% improvement on your production codebase.
Your mileage will literally vary. The same tool can make one developer faster and another slower, depending on task, experience, and context.
Best approach: Experiment with your actual workflow. Measure honestly. Use AI strategically for tasks where research shows it helps, and maintain your own skills for tasks where it doesn't.
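A minimal sketch of what "measure honestly" can look like: log wall-clock time per completed task, tag whether AI assistance was used, and compare medians after a few weeks. The CSV layout and function names here are assumptions for the sketch, not an established tool:

```python
import csv
import statistics
from collections import defaultdict
from pathlib import Path

LOG = Path("task_times.csv")  # columns: task, minutes, ai_used

def log_task(task: str, minutes: float, ai_used: bool) -> None:
    """Append one finished task to the log."""
    new_file = not LOG.exists()
    with LOG.open("a", newline="") as f:
        writer = csv.writer(f)
        if new_file:
            writer.writerow(["task", "minutes", "ai_used"])
        writer.writerow([task, f"{minutes:.1f}", "yes" if ai_used else "no"])

def summarize() -> None:
    """Print median completion time with and without AI assistance."""
    groups = defaultdict(list)
    with LOG.open(newline="") as f:
        for row in csv.DictReader(f):
            groups[row["ai_used"]].append(float(row["minutes"]))
    for label, times in sorted(groups.items()):
        print(f"ai_used={label}: n={len(times)}, median={statistics.median(times):.1f} min")

# Example usage:
#   log_task("fix flaky auth test", 42, ai_used=True)
#   log_task("add pagination to /orders", 95, ai_used=False)
#   summarize()
```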
Solo IDE is built differently. Instead of file-level suggestions, AI agents understand your entire project. Describe what you want to build, and agents handle implementation with full context awareness.