ai ai-tooling software-engineering opinion

Stop Treating AI Tools Like Junior Developers


The junior dev metaphor leads teams to waste AI tools on the wrong work. A better mental model changes everything about how you delegate.

Somewhere in the last two years, the industry settled on a mental model for AI coding tools: “It’s like having a junior developer. You still need to review everything carefully.”

A year ago, this was wrong because it overstated the tools’ capabilities. The tools have gotten dramatically better since then, and the metaphor is still wrong, just in the opposite direction now. Which, if anything, proves it was never the right framing to begin with.

I should say upfront: I use Claude Code and Codex every day. I’ve built an orchestration platform that runs multiple AI coding agents in parallel Docker containers. Within a single session, these tools are genuinely impressive: recursive context gathering, multi-file reasoning, sub-agents exploring codebases in parallel. The best agents on SWE-bench Pro solve around 55% of real-world engineering tasks. Far from autonomous, but far from useless.

I’m not here to argue the tools are bad. I’m here to argue that the metaphor leads teams to waste them on the wrong work and then wonder why their AI adoption isn’t paying off.

What the Metaphor Gets Wrong

I’ve managed actual junior developers. The entire relationship is built on one assumption: investment compounds. You review their PRs, explain why the database query creates an N+1 problem, show them the architectural reasoning behind a design choice. Next month, they catch those things themselves. The surface area of what you need to review shrinks over time. That’s the deal.
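That N+1 feedback is exactly the kind of investment that compounds. As a minimal sketch of what you'd be teaching (hypothetical data, with a query counter standing in for database round-trips):

```python
# Hypothetical ORM-ish sketch of the N+1 pattern a reviewer would flag.
orders = [{"id": i, "customer_id": i % 3} for i in range(6)]
customers = {0: "Acme", 1: "Globex", 2: "Initech"}

queries = 0

def fetch_customer(cid):
    global queries
    queries += 1  # one round-trip per call
    return customers[cid]

# N+1: one query for the orders, then one more per order for its customer.
names_n_plus_1 = [fetch_customer(o["customer_id"]) for o in orders]
print(queries)  # 6 round-trips for 6 orders

# The fix you'd explain once: batch the lookup into a single query.
queries = 0

def fetch_customers(cids):
    global queries
    queries += 1  # one round-trip total
    return {c: customers[c] for c in cids}

batch = fetch_customers({o["customer_id"] for o in orders})
names_batched = [batch[o["customer_id"]] for o in orders]
print(queries)  # 1 round-trip
```

A junior developer who gets this explanation once starts catching it in their own PRs. That's the compounding the metaphor assumes.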

AI tools don’t offer this deal. The review burden on Tuesday is identical to Monday’s. You can maintain a CLAUDE.md file or a .cursorrules config, and those help, but they’re injected notes, not formed understanding. It’s the difference between a contractor reading your style guide and an employee who’s internalized your team’s instincts. Claude Code’s auto-memory system saves notes across sessions, which is genuinely useful, but the open GitHub issue trying to make cross-session memory actually work tells you how far we are from solving this.

This is the fundamental mismatch. The junior dev metaphor implies a return on mentorship. You invest time now, you get independence later. With AI tools, there is no later. You’re briefing, not training. Every session is day one. It’s Groundhog Day, except Bill Murray can write a mean React component.

The Contractor Who Resets

The old version of this argument used “amnesiac contractor.” That undersold the within-session capabilities. Here’s a version that’s closer to where the tools actually are in 2026:

An extremely capable contractor with amnesia between engagements. Brilliant within a single working session: explores your codebase, asks clarifying questions, traces bugs, reasons about architecture. But they start from zero every time you rehire them. No memory of the last project. No recollection that you told them three engagements ago not to touch the UserService god object.

DHH described this well in January: “I feel like it’s a flickering light bulb. Total darkness, then it’ll flicker on, I can see everything, then boom, pitch black.” Within the session, the light is genuinely on. Across sessions, you’re in the dark again.

This reframe matters because it changes what work you hand off and how you hand it off.

What Actually Changes

You delegate based on task shape, not skill-building. The junior dev model nudges you toward “assign tasks that build learning.” That’s backwards for a tool that won’t retain anything. Instead, you’re sorting tasks by how well they can be specified in a single briefing and how much institutional context they require. High-pattern, well-specified work goes to the AI. Novel architecture and anything touching code with undocumented tribal knowledge stays with humans. I’ve written about the specific delegation levels that make this concrete.

You parallelize aggressively. You’d never give a junior developer five tasks at once. A fast contractor with amnesia? Absolutely, if each task is self-contained. The stages teams go through as they adopt AI tooling show this clearly: the biggest unlock is running many AI tasks in parallel, each with a complete briefing. The contractor model makes this intuitive. The junior dev model makes it feel irresponsible.
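Under the contractor model, parallel delegation is just fan-out over self-contained briefs. A minimal Python sketch, where `run_agent` is a hypothetical stub standing in for whatever actually launches an agent session or container:

```python
from concurrent.futures import ThreadPoolExecutor

def run_agent(brief: str) -> str:
    # Hypothetical stand-in: in a real setup this would dispatch the
    # brief to an isolated agent (CLI invocation, container, API call).
    return f"done: {brief}"

# Each brief is deliberately self-contained: the agent gets everything
# it needs, because nothing carries over from any previous session.
briefs = [
    "Migrate date helpers in utils/dates.py to zoneinfo; keep tests green.",
    "Add pagination to GET /invoices, following the GET /orders pattern.",
    "Replace deprecated logger.warn calls with logger.warning across src/.",
]

with ThreadPoolExecutor(max_workers=len(briefs)) as pool:
    results = list(pool.map(run_agent, briefs))

for r in results:
    print(r)
```

The structure is the point, not the stub: independent tasks, complete briefings, no shared state between workers. You'd never structure mentorship this way; you'd absolutely structure contract work this way.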

You write better specs instead of doing more mentoring. I’ve watched teams spend hours crafting elaborate system prompts, essentially writing onboarding docs for a tool that will forget everything tomorrow. That energy should go into explicit task specifications, structured coding standards, and project-level context files. Not because the tool is dumb, but because a contractor with amnesia needs the brief to be complete every time. No exceptions, no shortcutting.
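As a sketch of what "complete every time" means in practice, a self-contained brief might look like this (the file layout and project details are hypothetical, not a prescribed format):

```markdown
# Task: Add rate limiting to the public API

## Context (restated in full — nothing carries over between sessions)
- Express 4 app; middleware lives in src/middleware/
- Do NOT touch src/services/UserService.js (legacy, undocumented coupling)

## Requirements
- 100 requests/minute per API key; return 429 with a Retry-After header
- Follow the error-shape convention in src/middleware/errorHandler.js

## Done when
- New middleware is registered in src/app.js
- Unit tests pass: npm test -- rateLimit
```

Notice the context section repeats things a human colleague would already know. That repetition isn't waste; it's the cost of working with a contractor who resets.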

You review differently. This one’s uncomfortable. A CodeRabbit study of 470 pull requests found that AI-generated PRs contained 1.7x more issues overall, with security issues up to 2.74x more frequent than in human-written PRs. Meanwhile, Anthropic’s own research found that developers using AI scored 17% lower on code comprehension tests. The tools are getting better at writing code and we’re getting worse at reviewing it. That’s a combination worth taking seriously.

A junior developer’s code quality improves over time, which means your review intensity can decrease. AI-generated code quality is roughly constant session to session, which means your review standards can’t slip just because the tool feels more capable. The contractor model keeps you honest here. You wouldn’t skip code review for a contractor just because their last three engagements went well.

And yet the drift is already happening. Cal at YC wrote in February: “I still spot-check the code… but increasingly I am not reading every line of every PR. I rely on agents to spot subtle bugs.” That’s trusting the contractor to QA their own work. If you’ve ever managed actual contractors, you already know the ending.

Your junior devs deserve mentorship that compounds. Your AI tools deserve complete specs that work from a cold start. Mixing those up is how you end up with undertrained humans and overcoached robots.