ai ai-tooling developer-productivity software-engineering

The Bottleneck Keeps Moving: What Actually Limits You at Each Stage of AI-Assisted Development

12 min read

At each stage of AI-assisted development, the skill holding you back changes. Here's how to identify which bottleneck you've migrated to and what to do about it.

I wrote recently about the 5 stages of AI tooling adoption that engineering teams go through, from curiosity to orchestrated autonomy. That framework is about organizations. This one is about you, the individual developer, and the thing nobody warns you about when you start getting good with AI tools: the bottleneck keeps moving.

You spend weeks learning to prompt effectively, finally start getting useful output, and then discover that generating code was never really the problem. Now you’re drowning in code you can’t verify fast enough. So you build better review habits, develop a sense for when to trust the output and when to double-check, and then realize the new constraint is that you can’t break complex problems into pieces the AI can actually execute on. Every time you solve one bottleneck, the next one is already waiting behind it.

This is why “just use AI more” is useless advice. It’s like telling someone who’s plateaued at the gym to “just lift more.” Which muscle, specifically? If your bottleneck is verification and someone tells you to write better prompts, congratulations: you now have more unreviewed code. You’ll feel busier and produce less.

DX’s 2025 report across 135,000 developers found that time savings from AI tools plateaued at roughly 3.6 hours per week, even as adoption climbed from 50% to 91%. More people using the tools, more usage per person, same savings. The bottleneck had already moved somewhere the tools couldn’t help with. Nobody was talking about where it moved to.

Here are the five bottlenecks I’ve seen developers hit, in roughly the order they show up.

Bottleneck 1: Speed

This is where everyone starts. You write code by hand. Character by character, function by function. AI removes this constraint almost immediately.

Install Copilot. Start a Claude Code session. Within a few days you’re generating boilerplate, scaffolding tests, and autocompleting entire functions faster than you could type them. The speed improvement is real and it feels transformative.

This stage lasts about a week. The bottleneck clears so fast that most developers don’t even register it as a distinct stage.

What catches people off guard is the assumption that this acceleration will keep scaling. You got 3x faster at producing code, so surely 10x is next, then 30x. That’s not what happens. You hit the next bottleneck, and it has nothing to do with how fast code appears on screen.

Bottleneck 2: Expression

You have the tools. The speed is there. But the AI keeps giving you code that’s not quite what you wanted. It builds the wrong abstraction, misunderstands the edge case you care about, or produces something that technically works but doesn’t fit the architecture of your project.

The instinct here is to blame the model. “Claude didn’t understand what I meant.” It understood you perfectly. The problem is that what you said wasn’t what you meant. Those are different things, and the gap between them is where most AI frustration lives.

This is the expression bottleneck. You know what you want but you can’t articulate it precisely enough for the AI to produce it. The gap between your mental model of the solution and your ability to communicate that model is the constraint.

The skill you need here isn’t prompt engineering in the “add magic keywords” sense. It’s the same skill that makes someone a good tech lead: the ability to write a task description that a competent but literal-minded junior developer could execute without coming back with fifteen clarifying questions. You need to specify the interface, the constraints, the non-obvious requirements, the things you’d normally keep in your head because you’d be the one writing the code.

Most developers have never had to externalize their intent this thoroughly. When you write code yourself, you make hundreds of micro-decisions that never surface as explicit choices. Hand that same task to an AI and suddenly all of those implicit decisions need to be explicit, or the AI will make its own choices. It will make them confidently and they will be different from yours.
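To make this concrete, here’s a minimal sketch of the same task written twice: once as the vague prompt most people start with, and once with the implicit decisions made explicit. The task, the `ApiClient` name, and the `spec_score` heuristic are all hypothetical illustrations, not from any real tool.

```python
# The same task, expressed two ways. VAGUE leaves every decision to the AI;
# PRECISE surfaces the interface, constraints, and non-obvious requirements.

VAGUE = "Add retry logic to the API client."

PRECISE = """\
Add retry logic to ApiClient.request (hypothetical example).

Interface: keep the existing signature request(method, path, **kwargs).
Constraints:
- Retry only on 429 and 5xx responses, never on other 4xx.
- Exponential backoff: 0.5s base, factor 2, max 3 attempts.
- Re-raise the last exception after the final attempt.
Non-obvious requirements:
- POST is not idempotent here; retry it only on 429.
- Log each retry at WARNING with the attempt number.
"""

def spec_score(spec: str) -> int:
    """Rough heuristic: count markers of explicitly stated decisions."""
    signals = ["Interface:", "Constraints:", "Non-obvious", "only", "never", "max"]
    return sum(spec.count(s) for s in signals)

print(spec_score(VAGUE), spec_score(PRECISE))  # the vague prompt scores 0
```

The scoring function is obviously crude; the point is the shape of the precise version. Every bullet in it is a decision the AI would otherwise have made for you, confidently and differently.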

Getting through this bottleneck changes how you think about all specifications, not just AI prompts. You start writing better tickets, better design docs, better code comments. The skill is general. But it takes real practice, and most developers spend months here before the output consistently matches what they had in mind.

Bottleneck 3: Verification

This is where things get interesting, and where I see the most developers stuck right now.

You’ve learned to express your intent clearly. The AI produces reasonable code on the first try. You’re generating far more code per day than you used to write by hand. And that’s the problem: you’re producing code faster than you can confirm it works.

It compiles. The happy path looks right. The function signatures match. But is it actually correct? Does it handle the edge cases? Did it introduce a subtle bug in state management? Is the error handling actually robust, or does it just look robust from a distance? (AI-generated code is very good at looking robust from a distance.)

The METR study on AI-assisted development captures this bottleneck perfectly: developers using AI tools were 19% slower on their tasks but believed they were 24% faster. That’s a 43-point perception gap. They felt more productive because code was appearing on screen faster. They were actually less productive because they weren’t catching the problems in that code quickly enough.

This is where trust becomes the central skill. Not blind trust (“the AI is usually right, ship it”) and not paranoid distrust (“I need to verify every line character by character”). You need calibrated distrust: the ability to scan AI-generated code and know, from experience, where the bodies are likely buried.

The patterns you learn to watch for: AI-generated code is excellent at the happy path and terrible at edge cases. It will write clean, idiomatic code that handles the common case beautifully and then silently drop error handling you didn’t explicitly ask for. It produces code that passes a first read but breaks under concurrent access, unusual input, or production-scale data.

The fix for this bottleneck isn’t reading more carefully. You can’t review your way to safety when the volume keeps climbing. The fix is building verification infrastructure around the output. Test-first workflows where you write (or have the AI write) the tests before the implementation. Incremental generation where you verify small pieces instead of reviewing a 500-line diff. Contract-driven development where the interfaces are locked down before any code gets written.

This is also where a lot of developers start feeling an uncomfortable cognitive dissonance. You’re spending more time reviewing code than writing it. Your job used to be “person who writes code” and now it’s increasingly “person who evaluates code someone else wrote.” The shift feels subtle but it’s the beginning of a bigger identity question that fully surfaces at the next bottleneck.

Bottleneck 4: Decomposition

You can verify individual pieces of AI-generated code. Your review instincts are sharp, your test strategies are solid, and you know when to trust the output. The new problem is that you’re handed a large, ambiguous task and you can’t figure out how to break it into pieces the AI can actually execute on.

“Build a notification system” is not a prompt. It’s a project. And the skill of turning a project into a sequence of well-scoped, independently verifiable tasks that an AI agent can execute is different from any skill you’ve needed before. It’s architectural thinking, but applied to human-AI task allocation rather than system design.

The decomposition has to account for what the AI is good at and what it isn’t. Greenfield CRUD with clear specs? Hand that off all day. Refactoring a stateful service with implicit dependencies across three modules? You probably need to handle the analysis yourself and have the AI execute the mechanical parts. The skill is knowing where the boundary lives for each kind of task.

This is where I started building Hivemind. Not because I wanted a framework, but because I kept running into the same decomposition problem. I’d have a task that the AI could clearly handle, but setting up the context, isolating the environment, managing the iteration loop, and verifying the output was more overhead than just doing the task myself. Hivemind’s pipeline (plan, execute, verify, review, fix, PR) is my answer to “what does good decomposition look like when you systematize it?”

The architectural thinking required here is genuinely hard. You need to understand the codebase well enough to identify natural seams where tasks can be split without creating integration nightmares. You need to judge which tasks need shared context and which can be fully isolated. You need to sequence the work so that later tasks can build on earlier ones without requiring the AI to hold the full project state in its context window.

This bottleneck is also where the identity question from the verification stage comes to a head. At the verification stage, you started spending more time reviewing than writing. At the decomposition stage, you might not be writing any code at all on a given day. You’re specifying, decomposing, orchestrating, and reviewing. The code is being produced by something else.

For a lot of developers, this is genuinely disorienting. “Am I even a developer anymore?” is a question I’ve heard from people who are, objectively, operating at a higher level than they ever have before. I’ve written more about that identity shift separately, because it deserves its own space. The short version: it gets easier once your definition of value shifts from “person who writes code” to “person who gets the right software built.” But that shift takes longer than learning any tool.

Bottleneck 5: Judgment

This is the one that doesn’t have a technical fix.

You can generate code effortlessly. You can express your intent with precision. You can verify output efficiently. You can decompose complex problems into executable units. Code production is, for all practical purposes, a solved problem in your workflow.

The bottleneck is now: what should you build?

When implementation was expensive, the cost of building the wrong thing provided natural guardrails. Three months of developer time is a serious investment that requires justification. When implementation is cheap, those guardrails disappear. You can build anything in an afternoon. Which means you can also build the wrong thing in an afternoon, and do it again tomorrow, and again the day after, and wake up in a month surrounded by features nobody asked for. The question of whether you should build something is suddenly the only question that matters.

Judgment at this level means operating at the product and strategy layer. What problem are we actually solving? Is this feature worth building or should we ship what we have and see if anyone uses it? Are we optimizing the right metric? Should we build this internal tool or buy one?

These are questions that senior engineers and engineering managers have always dealt with. What’s changed is that they’ve become the primary bottleneck, not a secondary concern that lives alongside the “real work” of writing code. When every feature idea can be prototyped in an afternoon by an AI agent, the ability to decide which ideas deserve prototyping becomes the scarce resource.

I notice this in my own work with Hivemind. On any given day, I could spin up agents to work on a dozen different improvements. The constraint isn’t capacity. It’s knowing which three of those dozen would actually matter. I’ve gotten this wrong. I’ve built features that were easy to build and satisfying to ship and completely pointless. Cheap implementation makes that mistake much easier to repeat.

The skills at this level look nothing like the skills at the earlier levels. You need product sense: the ability to understand what users actually need versus what they say they want. You need strategic thinking about where the codebase is heading over the next six months, not just what the current sprint looks like. You need the judgment to say “no, we’re not building that” even when building it would be easy, because ease of implementation is no longer a useful filter for whether something should exist.

Finding Your Bottleneck

The diagnostic is straightforward. Read through the five stages and notice which description made you most uncomfortable. Not which one you intellectually disagree with, but which one described a problem you’re currently feeling.

If you’re frustrated that the AI keeps producing wrong code: you’re at Expression. Work on specifying intent more precisely. Write task descriptions as if you’re handing them to a competent but literal-minded contractor who has never seen your codebase.

If you’re generating a lot of code but feel uneasy about quality: you’re at Verification. Build test-first workflows and develop calibrated distrust. Learn where AI-generated code typically fails and look there first.

If you’re good at individual tasks but struggle with large, ambiguous projects: you’re at Decomposition. Practice breaking problems into independent, verifiable units. Study how your codebase naturally decomposes along its seams and boundaries.

If you can build anything but aren’t sure what’s worth building: you’re at Judgment. This is a product and strategy problem, not a technical one. Talk to users. Measure outcomes, not output.

The key insight is that the advice for each stage is different and often contradictory. At the Expression stage, better prompting is the right answer. At the Verification stage, better prompting actively makes things worse because it gives you more code to review. At the Judgment stage, the entire conversation about prompting is irrelevant.

The 5 Stages of AI Tooling Adoption describes where your team is. This framework describes where you are. They interact but they’re not the same. You can be personally at the Judgment bottleneck while your organization is stuck at stage two. That’s its own kind of frustrating: you know what you’d build if the infrastructure and workflows existed, but you’re spending your time trying to get the team to use AI tools at all.

Figure out which bottleneck is yours. Work on that one. Ignore the advice meant for the other stages, even when it’s coming from someone who sounds very confident on LinkedIn. The skill that gets you here is different from the skill that gets you further.