From prompting to orchestrating.
AI coding is reorganising software work along a four-stage progression. The models are good enough. The infrastructure isn't.
AI models are excellent at coding now, and getting better every month. The limit has moved. What surrounds them — how we prompt, structure, verify, and scale the work — is now the harness that decides whether AI delivers anything useful.
This page is my thesis on where that work has landed and where it's going next. Four clear stages of adoption have emerged: prompting, pair programming, delegating, orchestrating. Each is a real shift in how engineering teams work with AI.
Where are you, and where is your team?
Adoption is mainstream. Trust isn't. The gap is the story.
01 The four stages.
Most teams think they're at stage three. Most are stuck between one and two. → Read the foundational piece
Stage is decided by infrastructure, not ambition. You can't delegate if your environments are flaky. You can't orchestrate at scale without golden paths and a way to verify what came back.
Different stages suit different problems. One-shot a regex at stage one. Pair-program a feature at stage two. Delegate a migration at stage three. Earlier stages don't go away as you build toward stage four — they sit alongside it, each used where it fits.
02 AI is a multiplier.
AI multiplies whatever you give it, including the broken parts. Spotify and Stripe didn't sprinkle AI on tired codebases. They spent a decade building platforms AI could plug into, then wrote their own tools.
Real work is the iceberg. The code in front of you is the small visible part. The bulk — why this service owns that table, which module holds up three downstream consumers, what the post-mortem from 2023 actually concluded — lives in senior engineers' heads and old Slack threads, where AI has no access.
What changed with AI wasn't the volume of work but its location. Code generation compressed; specification upstream and review downstream expanded to match.
03 What stage four requires.
Existing tools converge on a specific shape — single task, single agent, single screen, synchronous, freeform. They work well for solo, non-critical, or non-production work, where speed and flexibility are the priorities and the cost of a bad output is low.
Production is a different problem. To let AI coding into shipped systems, the work needs checks and balances, pipelines, and verification — not as polish but as the structure that lets you trust the output. Without that, AI just produces more code than anyone can safely review.
Five constraints describe what stage four actually looks like. Each one is a frustration with stage three turned into a principle.
Five frustrations, five principles, one shape. → Read the constraints piece
04 The loop has to close.
Under the five constraints sits a single principle: agents become useful when the loop closes.
The loop has two halves. Verification is what gets checked — tests, lints, spec match, structural gates. Heartbeat is what advances state — what triggers the next step, what signal moves the work forward. Both halves have to be load-bearing rather than advisory — pipelines, not suggestions.
That distinction is what separates the existing tools from what stage four needs.
Existing tools:
- Generic heartbeat
- Time- or run-based
- Advisory structure
- Fails open
- Flexibility by default
- Solo, exploratory work

What stage four needs:
- Semantic heartbeat
- State-aware
- Load-bearing structure
- Fails closed
- Structure by default
- Team-scale, regulated work
Flexibility lives at the spec layer, not the execution layer. → Read the close-the-loop piece
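To make the two halves of the loop concrete, here is a minimal sketch in Python. Every name in it (`State`, `Spec`, `run_gate`, `heartbeat`) is hypothetical and invented for illustration; it is not Pilot's API or any existing tool's. It assumes a work item is a plain dictionary of results and that each check is a function returning `True` only on a definite pass.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, Dict, List, Tuple


class State(Enum):
    """Hypothetical states a unit of work can be in."""
    SPECIFIED = "specified"
    IMPLEMENTED = "implemented"
    VERIFIED = "verified"
    BLOCKED = "blocked"


# A check inspects the work item and returns True only on a definite pass.
Check = Callable[[dict], bool]


@dataclass
class Spec:
    """Spec layer: each task chooses its own checks (the flexible part)."""
    name: str
    checks: Dict[str, Check] = field(default_factory=dict)


def run_gate(spec: Spec, work: dict) -> Tuple[bool, List[str]]:
    """Verification half: run every check and fail closed."""
    failures: List[str] = []
    if not spec.checks:
        # Nothing was verified, so nothing may pass.
        failures.append("no checks defined")
    for label, check in spec.checks.items():
        try:
            passed = check(work) is True
        except Exception:
            # A check that errors counts as a failure, not as a skip.
            passed = False
        if not passed:
            failures.append(label)
    return (not failures, failures)


def heartbeat(state: State, spec: Spec, work: dict) -> Tuple[State, List[str]]:
    """Heartbeat half: state-aware, advancing only on a verified signal."""
    if state is State.IMPLEMENTED:
        ok, failures = run_gate(spec, work)
        return (State.VERIFIED if ok else State.BLOCKED, failures)
    # Any other state has no transition in this sketch, so it stays put.
    return (state, [])


# Example: the spec picks the checks; the execution path is fixed.
spec = Spec("rename-column", checks={
    "tests": lambda w: w["tests_failed"] == 0,
    "lint": lambda w: w["lint_errors"] == 0,
})
complete = {"tests_failed": 0, "lint_errors": 0}
missing_lint = {"tests_failed": 0}  # the lint result never arrived

print(heartbeat(State.IMPLEMENTED, spec, complete))
print(heartbeat(State.IMPLEMENTED, spec, missing_lint))
```

With the complete result the work moves to `VERIFIED`. When the lint result is missing, the check raises, the gate treats that as a failure, and the work moves to `BLOCKED` instead of slipping through. A fail-open design would differ in exactly that `except` branch: it would skip the check and let the work advance. The checks themselves vary per task in `Spec`, while `run_gate` and `heartbeat` stay the same for every task.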
05 Where this leads.
None of this is theoretical. The constraints, the bookends, and the close-the-loop logic all came out of the last year of building production AI work and adopting AI across an engineering org in a regulated healthcare context.
I'm working the thesis from two sides. I'm building Pilot — an orchestration layer that takes the runner-vs-workflow split seriously, starting with an open-source community edition. And I do advisory work with engineering leaders navigating the same transition inside their organisations — figuring out which stage they're actually at, what foundation work matters next, how to adopt AI without amplifying what's already broken.
Different angles on the same question. The question is what I'm interested in.
If any of this resonates and you want to follow the thinking, the Field Notes go out when there's something worth saying. Otherwise, the writing keeps going.