The thesis

From prompting to orchestrating.

AI coding is reorganising software work along a four-stage progression. The models are good enough. The infrastructure isn't.

By Andrew Crookston · Updated May 2026 · 10 min read

AI models are excellent at coding now, and getting better every month. The limit has moved. What surrounds them — how we prompt, structure, verify, and scale the work — is now the harness that decides whether AI delivers anything useful.

This page is my thesis on where that work has landed and where it's going next. Four clear stages of adoption have emerged: prompting, pair programming, delegating, orchestrating. Each is a real shift in how engineering teams work with AI.

Where are you, and where is your team?

What's happening at scale

42% of committed code is now AI-generated or assisted. Projected 65% by 2027. (Sonar, 2026)

96% of developers don't fully trust AI-generated code. Only 48% always verify before committing. (Sonar, 2026)

73% of engineering teams have no standardised templates or golden paths. (Harness, 2026)

51% of frequent AI users report more quality problems than before. Not fewer. (Sonar, 2026)

Adoption is mainstream. Trust isn't. The gap is the story.

01 The four stages.

Prompting

Chat window, copy-paste, manual loop. The AI is fast; you're the bottleneck.

Tools: ChatGPT, Claude.ai

Limit: Human as copy-paste pipeline

Pair programming

AI lives in the editor, suggests line by line. You drive, it navigates.

Tools: Copilot, Cursor

Limit: Still your pace, your keyboard

Delegating

Describe a task, walk away (sort of). Powerful, but tethered to your machine.

Tools: Claude Code, Codex

Limit: Babysitting, lost context, sync

Orchestrating

Async, fleet-scale, pipeline-enforced. Queue work, review outcomes.

Tools: Stripe Minions, Spotify Honk, Pilot

Limit: Infrastructure doesn't exist yet

Most teams think they're at stage three. Most are stuck between one and two. → Read the foundational piece

Stage is decided by infrastructure, not ambition. You can't delegate if your environments are flaky. You can't orchestrate at scale without golden paths and a way to verify what came back.

Different stages suit different problems. One-shot a regex at stage one. Pair-program a feature at stage two. Delegate a migration at stage three. Earlier stages don't go away as you build toward stage four — they sit alongside it, each used where it fits.

02 AI is a multiplier.

Where AI shines

Greenfield

Empty repository. Fresh context. No constraints, no history. The model has everything because there's nothing it doesn't have. Demos and viral wins come from this mode.

Most demos · Few real codebases

Where AI struggles

Iceberg

The code in front of you is the small part above the waterline. The bulk of what matters — why this service owns that table, which module holds up three downstream consumers — lives in senior engineers' heads.

Most real work · Few demos

AI multiplies whatever you give it, including the broken parts. Spotify and Stripe didn't sprinkle AI on tired codebases. They spent a decade building platforms AI could plug into, then wrote their own tools.

Real work is the iceberg. The code in front of you is the small visible part. The bulk — why this service owns that table, which module holds up three downstream consumers, what the post-mortem from 2023 actually concluded — lives in senior engineers' heads and old Slack threads, where AI has no access.

What changed with AI wasn't the volume of work but its location. Code generation compressed; specification upstream and review downstream expanded to match.

The bottleneck moved.

To the two ends most teams have underinvested in for years.

Before AI: Spec → Writing code → Review

After AI: Spec → Code → Review

Writing was never the bottleneck. Reviewing was — and specification before it. AI compresses the middle and exposes the bookends. Vague specs in, bad code out, now at 10× the volume. → Read the multiplier essay

03 What stage four requires.

Existing tools converge on a specific shape — single task, single agent, single screen, synchronous, freeform. They work well for solo, non-critical, or non-production work, where speed and flexibility are the priorities and the cost of a bad output is low.

Production is a different problem. To let AI coding into shipped systems, the work needs checks and balances, pipelines, and verification — not as polish but as the structure that lets you trust the output. Without that, AI just produces more code than anyone can safely review.

Five constraints describe what stage four actually looks like. Each one is a frustration with stage three turned into a principle.

Chats → Tasks
Terminals → Servers
Synchronous → Async
Single agent → Fleet
Freeform → Pipeline

Five frustrations, five principles, one shape. → Read the constraints piece
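To make the shape concrete, here is a minimal in-process sketch of four of the five shifts: work arrives as named tasks rather than chats, runs asynchronously, is spread over a small fleet of workers, and always passes through the same fixed pipeline. The servers shift is out of scope for a single-file example. Every name here (`PIPELINE`, `run_task`, `worker`, `orchestrate`, the task names) is hypothetical and not taken from any of the tools mentioned above; the agent work itself is replaced by a stand-in.

```python
import asyncio

# Hypothetical pipeline stages; real stages would call agents, tests, linters.
PIPELINE = ("plan", "implement", "verify")


async def run_task(task: str) -> dict:
    """Push one task through every pipeline stage, in order."""
    done = []
    for stage in PIPELINE:
        await asyncio.sleep(0)  # stand-in for real agent work
        done.append(stage)
    return {"task": task, "stages": done}


async def worker(queue: asyncio.Queue, results: list) -> None:
    """One agent in the fleet: pull tasks until the queue is drained."""
    while True:
        try:
            task = queue.get_nowait()
        except asyncio.QueueEmpty:
            return
        results.append(await run_task(task))


async def orchestrate(tasks: list[str], fleet_size: int = 3) -> list:
    """Queue the work up front, then let the fleet drain it concurrently."""
    queue: asyncio.Queue = asyncio.Queue()
    for task in tasks:
        queue.put_nowait(task)
    results: list = []
    await asyncio.gather(*(worker(queue, results) for _ in range(fleet_size)))
    return results


results = asyncio.run(
    orchestrate(["fix-flaky-test", "bump-dependency", "migrate-config", "add-logging"])
)
```

The caller queues work and later reviews `results`; nothing in the loop waits on a human between stages.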

04 The loop has to close.

Under the five constraints sits a single principle: agents become useful when the loop closes.

The loop has two halves. Verification is what gets checked — tests, lints, spec match, structural gates. Heartbeat is what advances state — what triggers the next step, what signal moves the work forward. Both halves have to be load-bearing rather than advisory — pipelines, not suggestions.
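One way to picture the two halves working together is a state transition that only fires when verification passes. The sketch below is an illustration under assumptions, not a description of any existing tool: `Task`, `STAGES`, `advance`, and the two checks are hypothetical names, and the checks are plain booleans standing in for real test and lint runs.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical stage list; a real pipeline would define its own.
STAGES = ["specified", "implemented", "verified", "merged"]

Check = Callable[["Task"], bool]


@dataclass
class Task:
    name: str
    stage: str = "specified"
    has_tests: bool = False
    lint_clean: bool = False
    log: list = field(default_factory=list)


def advance(task: Task, checks: dict[str, Check]) -> bool:
    """Heartbeat: move the task one stage forward only if every check passes."""
    if task.stage == STAGES[-1]:
        return False  # already at the final stage, nothing to advance
    failed = [label for label, check in checks.items() if not check(task)]
    if failed:
        # Verification is load-bearing: a failed check blocks the transition.
        task.log.append(f"blocked at {task.stage}: {', '.join(failed)}")
        return False
    nxt = STAGES[STAGES.index(task.stage) + 1]
    task.log.append(f"{task.stage} -> {nxt}")
    task.stage = nxt
    return True


checks = {"tests": lambda t: t.has_tests, "lint": lambda t: t.lint_clean}
task = Task("add-retry-logic", stage="implemented", has_tests=True)

first = advance(task, checks)   # lint check fails, so the task stays put
task.lint_clean = True
second = advance(task, checks)  # all checks pass, so the task moves on
```

The point of the design is that `advance` is the only way to change `stage`, so a check cannot be skipped by ignoring its result.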

That distinction is what separates the existing tools from what stage four needs.

Where most tools sit today

Runner

Executes things. Fast. Flexible.

Generic heartbeat
Time- or run-based
Advisory structure
Fails open
Flexibility by default
Solo, exploratory work

Where stage four lives

Workflow

Executes things in order. Refuses to advance when preconditions aren't met.

Semantic heartbeat
State-aware
Load-bearing structure
Fails closed
Structure by default
Team-scale, regulated work

Flexibility lives at the spec layer, not the execution layer. → Read the close-the-loop piece
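The fails-open versus fails-closed difference can be shown with the same three steps run two ways. This is a deliberately small sketch: `run_as_runner`, `run_as_workflow`, and the step names are hypothetical, and each step is a stub that simply reports success or failure.

```python
from typing import Callable

Step = Callable[[], bool]  # returns True on success


def run_as_runner(steps: list[tuple[str, Step]]) -> list[str]:
    """Fails open: every step runs, and failures are only reported."""
    executed = []
    for name, step in steps:
        ok = step()
        executed.append(name if ok else f"{name} (failed, continued)")
    return executed


def run_as_workflow(steps: list[tuple[str, Step]]) -> list[str]:
    """Fails closed: a failed step halts everything after it."""
    executed = []
    for name, step in steps:
        if not step():
            executed.append(f"{name} (failed, halted)")
            break
        executed.append(name)
    return executed


steps = [
    ("generate", lambda: True),
    ("test", lambda: False),  # simulated failing test run
    ("deploy", lambda: True),
]

runner_trace = run_as_runner(steps)
workflow_trace = run_as_workflow(steps)
```

With the simulated test failure, the runner trace still includes `deploy`, while the workflow trace stops at the failed `test` step.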

05 Where this leads.

None of this is theoretical. The constraints, the bookends, and the close-the-loop logic all came out of the last year of doing production AI work and adopting AI across an engineering org in a regulated healthcare context.

I'm working the thesis from two sides. I'm building Pilot — an orchestration layer that takes the runner-vs-workflow split seriously, starting with an open-source community edition. And I do advisory work with engineering leaders navigating the same transition inside their organisations — figuring out which stage they're actually at, what foundation work matters next, how to adopt AI without amplifying what's already broken.

Different angles on the same question. The question is what I'm interested in.

If any of this resonates and you want to follow the thinking, the Field Notes go out when there's something worth saying. Otherwise, the writing keeps going.
