The Spec Is the Product Now
AI tools execute your specs literally. Every gap in your specification becomes a gap in the output.
I used to write terrible tickets. Not maliciously terrible. Just the kind of lazy that everyone recognized and nobody questioned. “Add retry logic to the payment service.” “Handle the error case for expired tokens.” Tickets that assumed the implementer already knew what I meant, because the implementer was usually me, or someone sitting three desks away who’d been in the same standup that morning.
This worked fine for twenty years. It was, in hindsight, a system held together entirely by the fact that humans are excellent at reading between the lines. AI is not.
The Expensive Illusion
The dirty secret of software specifications is that most of them were never complete. They didn’t need to be. The person writing the ticket carried a mental model of the codebase, the business rules, the edge cases, and the unwritten conventions. The person implementing the ticket carried the same model, or close enough. The spec was a reminder, not an instruction set. A Post-it note from you to yourself.
Nobody noticed this because the feedback loop was invisible. You’d write a vague ticket, implement it yourself, fill in the gaps from memory, and ship it. If you handed it to another engineer, they’d ask a couple of clarifying questions in Slack, fill in the rest from team context, and ship it. The spec was incomplete, and it didn’t matter, because the whole system was humans talking to humans, and humans are remarkably good at inferring what you probably meant.
I run tasks through Hivemind, an agent orchestration platform I built in F# that drives Claude Code and Codex behind the scenes. Sometimes I still reach for those tools directly, but most of my work goes through the pipeline. These tools do exactly what you tell them to do. Not what you meant. Not what you would have done. What you wrote. Every assumption you left implicit, every edge case you figured was obvious, every convention you didn’t bother documenting, it all shows up as a bug in the output.
This isn’t a model problem. Qodo’s 2025 survey of 609 developers found that 65% say AI misses relevant context, and among the 44% who say AI degrades code quality, the blame lands on missing context, not model limitations. The models are fine. The specs are the problem.
A Ticket I Actually Wrote
Here’s something embarrassing. This is a real specification I wrote for a Hivemind task, lightly disguised. I’m sharing it because I think most developers write tickets like this and nobody admits it:
Add retry logic to the Stripe webhook handler. It should retry
on transient failures and give up on permanent ones.
Seems reasonable, right? You’ve probably written tickets like this. I certainly had. Here’s what the AI produced:
public async Task HandleWebhook(StripeEvent stripeEvent)
{
    var maxRetries = 3;
    for (int i = 0; i < maxRetries; i++)
    {
        try
        {
            await ProcessEvent(stripeEvent);
            return;
        }
        catch (HttpRequestException)
        {
            if (i == maxRetries - 1) throw;
            await Task.Delay(1000);
        }
        catch (Exception)
        {
            throw; // permanent failure
        }
    }
}
This code is technically correct. It retries on transient failures and gives up on permanent ones. It also has a fixed 1-second delay with no backoff, will hammer Stripe three times in three seconds during an outage, treats every non-HTTP exception as permanent (including timeouts, DNS failures, and socket exhaustion), has no logging, no dead letter queue, no idempotency check, and will silently retry events that already partially succeeded.
Every one of those problems came from something I didn’t say. The AI did exactly what I asked for. I just asked for the wrong thing, very precisely.
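To make one of those unstated assumptions concrete, here is a minimal sketch of the idempotency guard the AI never knew to write: check the event ID before processing, and record it only after success, in the same transaction. This is Python with SQLite for brevity; the table name `processed_events` matches the spec below, but `handle_event` and the `process` callback are hypothetical names, not code from the actual handler.

```python
import sqlite3
from typing import Callable

def handle_event(conn: sqlite3.Connection, event_id: str,
                 payload: dict, process: Callable[[dict], None]) -> str:
    """Process a webhook event at most once, keyed on its Stripe event ID."""
    cur = conn.execute(
        "SELECT 1 FROM processed_events WHERE stripe_event_id = ?",
        (event_id,))
    if cur.fetchone():
        return "duplicate"  # already handled: acknowledge and skip
    with conn:  # one transaction: process and record together, or neither
        process(payload)  # the business logic (placeholder)
        conn.execute(
            "INSERT INTO processed_events (stripe_event_id) VALUES (?)",
            (event_id,))
    return "processed"
```

Nothing about this is exotic. It just never appears unless someone writes it down.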
Here’s the version that actually worked:
# Feature: Stripe Webhook Retry Handler
## Problem Statement
The Stripe webhook endpoint has no retry logic. Transient failures
(network blips, Stripe rate limits, brief outages) cause events to
be silently dropped, leading to missed payments and inconsistent state.
## Solution Statement
Add a Polly-based retry policy with exponential backoff, idempotency
checks, and a dead letter queue for persistent failures. Surgical
change to the webhook handler only.
## Relevant Files
- `PaymentService.cs` - existing DI and error handling patterns to follow
- `StripeWebhookHandler.cs` - the handler that needs retry logic
- `processed_events` table - for idempotency checks
## Step by Step Tasks
### 1. Add idempotency check
- Check `stripe_event_id` against `processed_events` table before processing
- If already processed, return 200 and skip
- Write to `processed_events` AFTER successful processing, inside the same transaction
### 2. Implement retry policy with Polly
- Retry on: HTTP 429, 5xx responses, network timeouts, DNS resolution failures
- Do NOT retry on: 4xx (except 429), deserialization errors, business logic validation
- Exponential backoff: 1s, 2s, 4s, 8s, 16s with jitter (0-500ms), max 5 retries
### 3. Add dead letter queue
- After final retry failure, push event payload to SQS `stripe-webhook-dlq`
### 4. Add structured logging
- Log each retry attempt at WARN: event type, event ID, attempt number, error
- Log final failure at ERROR with full event payload
- Log skipped duplicates at INFO
### 5. Validate
- Run `dotnet test` — all existing + new tests pass with zero regressions
## Acceptance Criteria
- Transient 5xx from Stripe retries up to 5 times with backoff, then lands in DLQ
- Duplicate events (same `stripe_event_id`) return 200 without reprocessing
- Non-retriable errors (400, validation) fail immediately, no retry
- All retry attempts visible in structured logs at WARN level
That spec produced working code on the first pass. No back-and-forth. No “that’s close but you missed the idempotency check.” One shot.
The difference isn’t prompting skill. I didn’t use any magic words or clever formatting. I just wrote down everything I knew about how this system should behave. All the knowledge that used to live in my head, on the page.
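The retry schedule in step 2 is a good example of how little room the tight spec leaves for interpretation. It pins down the exact delays, the jitter range, and which status codes qualify. Sketched here in Python for illustration (the real handler would use Polly in C#; these function names are mine, not the codebase's), the whole policy reduces to a few unambiguous lines:

```python
import random

def backoff_schedule(max_retries: int = 5, base_delay: float = 1.0,
                     max_jitter: float = 0.5) -> list[float]:
    """Delays in seconds per the spec: 1s, 2s, 4s, 8s, 16s,
    each plus 0-500ms of random jitter."""
    return [base_delay * (2 ** attempt) + random.uniform(0, max_jitter)
            for attempt in range(max_retries)]

def should_retry(status_code: int) -> bool:
    """Retry only rate limits (429) and server errors (5xx);
    all other 4xx and validation failures fail immediately."""
    return status_code == 429 or 500 <= status_code < 600
```

The vague ticket left every one of those numbers to the model's imagination. The spec made them a lookup.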
And here’s the thing worth noting: you can use AI to help write these specs. Describe the problem in plain language, have Claude Code or Codex ask you clarifying questions, and iterate until the spec is tight. The AI is excellent at surfacing edge cases you hadn’t considered and structuring your thinking into an executable format. The spec still requires your judgment and domain knowledge, but the drafting process doesn’t have to be manual.
“Prompt Engineering” Is a Misnomer
There’s an entire cottage industry around prompt engineering. Tips, tricks, magic phrases, persona assignments, twelve-part prompting frameworks with their own acronyms. Most of it misses the point.
Andrej Karpathy started using the term “context engineering” instead, noting that people’s use of “prompt” tends to trivialize what is actually a complex engineering problem. He’s right, but I think even that framing is too generous to the novelty of it. What we’re really talking about is specification writing. It’s the same skill that distinguishes a good tech lead from a mediocre one: the ability to externalize intent completely enough that someone else can execute on it without reading your mind.
I’ve been an engineering manager for years. The best technical leads I’ve worked with could write a design doc or a task breakdown that a new hire could pick up and execute correctly. Not because the new hire was brilliant, but because the document was thorough. It anticipated questions. It specified edge cases. It made implicit knowledge explicit.
That’s exactly what working effectively with AI requires. The feedback loop is just faster. Instead of waiting two weeks for a junior dev to build the wrong thing, you wait ninety seconds for Claude to build the wrong thing. Same root cause, tighter iteration cycle.
Stack Overflow’s 2025 survey found that 45% of developers say their top frustration with AI tools is “solutions that are almost right, but not quite.” Two out of three spend more time fixing that almost-right code than they would have spent writing it from scratch. That’s not a model quality problem. That’s a specification quality problem. The models are doing exactly what was asked for. The ask just wasn’t what anyone actually wanted.
What “Senior” Means Now
The conventional definition of a senior engineer is someone who can take an ambiguous problem and figure out the right thing to build. Identify requirements, navigate tradeoffs, make judgment calls, and deliver something that works. That hasn’t changed. What’s changed is how that skill manifests.
For the past two decades, a senior engineer’s context lived in their head and came out through their hands on a keyboard. The spec was an afterthought because the senior engineer was both the spec writer and the implementer. The translation from “what should we build” to “working code” happened inside one person’s skull, and the ticket was just a breadcrumb trail.
Now there’s a new step: you have to get that context out of your skull and into a document that an AI can execute against. I’ve written before about treating AI tools like junior developers, and the same principle applies here. A staff engineer who can only produce good code when they’re the one typing is operating at a lower level than they think. The value was never in the typing. It was in knowing what to type.
If you can describe a system precisely enough that a competent stranger could implement it correctly, you can work effectively with AI. If you can’t, what you actually have isn’t ten years of experience. It’s three years of experience and seven years of relying on context that lives nowhere but your own memory.
I know that’s a harsh framing. I wrote it about myself first. It stung. Some codebases are genuinely so complex that no spec could capture them without being longer than the code itself. Some problems require real-time judgment that can’t be pre-specified. I’m not arguing that specs replace engineering skill. I’m arguing that the ability to write a complete spec is an engineering skill, and one that a lot of senior engineers have been able to skip because they were always their own implementer.
The Shift Already Happened
ThoughtWorks named “Spec-Driven Development” as a key engineering practice for 2025. GitHub launched Spec Kit. AWS built Kiro around the idea that specifications are the primary artifact in AI-assisted development. This isn’t a prediction. It’s a description of where the industry already is.
I’ve felt this shift personally. When I started building Hivemind, my task specs looked like that first retry example. Three months later, they looked like the second one, because I’d wasted enough cycles watching agents build the wrong thing from my lazy descriptions. The bottleneck kept moving, and it moved squarely onto the spec. Writing code got faster. Writing specs that produce correct code on the first pass became the actual work.
The time investment doesn’t shrink, by the way. It just moves. Less time typing code, more time thinking about what code should exist and describing it precisely. For people who got into programming because they like typing code, that’s going to feel like a loss. For people who got into it because they like solving problems, the feedback loop has never been better.
Where This Goes
I don’t think every engineer needs to become a technical writer. I do think the gap between “I know what to build” and “I can describe what to build so completely that a machine builds it correctly” is going to define career trajectories over the next few years.
The engineers who are already good at this, the ones who write thorough design docs, clear tickets, and precise code review comments, are going to have a significant advantage. Not because they’re better at prompting. Because they’ve been practicing specification writing their entire career, and the rest of the industry just discovered it matters.
The spec was always the product. We just didn’t notice because we were the only ones reading it.