The Programmer as Orchestrator

Orchestrating Software begins with a simple shift in where the work sits

Feb 06, 2026

The work changes when the tools change, but responsibility remains.

Introduction

Over the past two years, an idea that once sounded slightly provocative has become increasingly practical. Andrej Karpathy captured the early shape of it in “English: The New Programming Language”, where the point is not that coding disappears but that natural language starts to behave like a serious interface for getting software built.

Since then, another framing has become harder to ignore because it describes how many people now work. In a widely shared post, Marc Andreessen argues that AI coding does not remove programmers so much as reposition them as supervisors of parallel agents, with the job shifting from typing to orchestration, evaluation, and accountability (Marc Andreessen via Ian Miles Cheong on X).

“AI coding doesn’t eliminate programmers — it redefines them.” — Marc Andreessen (quoted in the X post)

Taken together, these two frames point to a quiet but important change. Natural language may be the interface, but the deeper shift is that the programmer’s value moves upstream into specification and downstream into verification and responsibility.

What do we gain when software becomes easier to produce? And what do we risk when it becomes easier to believe we are finished?

*Early alignment is rarely perfect; the work is in bringing the parts into agreement.* — Louvre Pyramid, Paris (Wed 17 Sep 2014)

The New Loop

What becomes normal is rarely noticed until it becomes costly.

The day-to-day experience has quietly reorganised around a different loop. The centre is no longer “write → compile → fix”. It is closer to “specify → generate → verify → ship → observe → revise”. What matters is that revision is not only about the code. It is also about the specification, because implementation tends to reveal what was missing, what was assumed, and what the user actually meant.

Before going further, it helps to name what this loop usually contains:

A shift from producing to steering: The first draft is increasingly generated, and the human role becomes choosing direction and setting constraints.
A shift from making to checking: Progress is less visible in keystrokes and more visible in decisions, tests, and reviews.
A shift from “done” to “understood”: Shipping and testing often teach you what the specification failed to capture.

Once this loop is visible, it becomes easier to see why “English as the new programming language” describes only part of what is happening. It helps explain how intent enters the system. It does not fully explain how correctness, trust, and change are managed afterwards.

That is where orchestration starts to look like the work itself.

Orchestration as Work

Coordination is a form of engineering, even when it looks like conversation.

Marc’s framing leans on a simple idea: if you can run many agents in parallel, your job becomes orchestration rather than transcription. The detail that is easy to miss is that orchestration is not only “asking for code”. It is also decomposing work, maintaining coherence, and integrating outputs into something that can live in the real world.

Before unpacking this, it is worth noting one practical implication. Orchestration also includes revisiting the specification, because rapid implementation tends to surface what was underspecified or misunderstood.

With that in mind, the orchestration layer often includes:

Parallelism depends on good decomposition: Running multiple agents at once only works when problems are split into clean, testable parts with stable interfaces.
More output introduces a coordination tax: Multiple generated solutions can fragment patterns and conventions unless someone actively enforces consistency.
Integration becomes a central task: The time saved in producing components can reappear when aligning them into one maintainable system.
Orchestration rewards restraint as much as speed: Knowing what not to generate, and when to stop iterating, becomes part of keeping systems coherent.
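
One way to picture the points above is a minimal Python sketch in which each unit of work carries its own acceptance check, the units run in parallel, and only accepted results are integrated. Everything here is an assumption made for illustration: the `Task` structure, the two "agents" (plain functions standing in for whatever tool generates a draft), and the checks themselves. It is not a description of any particular agent framework.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable

TextFn = Callable[[str], str]


@dataclass
class Task:
    name: str
    produce: Callable[[], TextFn]    # hypothetical stand-in for an agent producing a draft
    accept: Callable[[TextFn], bool]  # the check the orchestrator owns


def agent_collapse_whitespace() -> TextFn:
    # A draft that happens to be correct.
    return lambda text: " ".join(text.split())


def agent_lowercase() -> TextFn:
    # A deliberately wrong draft: it upper-cases instead of lower-casing.
    return lambda text: text.upper()


tasks = [
    Task("collapse_whitespace", agent_collapse_whitespace,
         lambda f: f("  a   b ") == "a b"),
    Task("lowercase", agent_lowercase,
         lambda f: f("AbC") == "abc"),
]


def run_task(task: Task) -> tuple[str, bool]:
    # Generate a candidate, then judge it against the task's own check.
    candidate = task.produce()
    return task.name, task.accept(candidate)


def orchestrate(work: list[Task]) -> dict[str, bool]:
    # Tasks are independent, so they can run side by side.
    with ThreadPoolExecutor() as pool:
        return dict(pool.map(run_task, work))


results = orchestrate(tasks)
accepted = [name for name, ok in results.items() if ok]
print(results)
print("ready to integrate:", accepted)
```

The parallelism is the easy part of the sketch. The work sits in choosing the split and writing the `accept` checks, which is where the wrong draft gets caught before integration.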

This makes a useful connection back to Karpathy’s earlier point in “English: The New Programming Language”. If English becomes a powerful interface, the craft is not only phrasing. It is the ability to structure intent into parts that can be delegated without losing control.

Once the work is seen this way, it becomes natural to ask a more basic question: what exactly are we steering towards when the “target” itself keeps changing?

The Moving Specification

We discover what we mean by trying to build it.

A quiet change in AI-assisted work is that the specification can become more fluid, not less. Fast generation makes it easier to try an idea quickly, and that often reveals that the original request was incomplete, slightly wrong, or based on assumptions that do not survive contact with reality. In practice, you are not only iterating towards a fixed target. You are refining the target as you learn.

Before listing implications, it helps to treat the specification as part of the “program”:

The spec is rarely complete at the start: Early requirements often describe an intention, not the full behaviour needed in edge cases and real use.
Building surfaces hidden assumptions: Implementation reveals missing constraints, conflicting goals, and where “obvious” meanings diverge across people.
Change is not failure; it is feedback: Adjusting the specification can be a sign that the work is learning, not simply drifting.
A good spec becomes a living set of checks: The most reliable form of specification is what can be tested repeatedly as the system evolves.
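
The idea of a specification as a living set of checks can be sketched in a few lines of Python. The `slugify` function and the three checks are hypothetical examples chosen for illustration. The third check stands for a requirement that nobody wrote down at the start and that only surfaced once something was built.

```python
import re


def slugify(title: str) -> str:
    """Turn a title into a URL slug (illustrative implementation)."""
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower())
    return slug.strip("-")


# The specification, written as named checks that can be re-run at any time.
SPEC = [
    ("lowercases and hyphenates",
     lambda: slugify("Hello World") == "hello-world"),
    ("replaces punctuation with hyphens",
     lambda: slugify("What's new?") == "what-s-new"),
    # Added later, after building revealed the missing case:
    ("no leading or trailing hyphens",
     lambda: slugify("  --Draft--  ") == "draft"),
]


def failing_checks() -> list[str]:
    """Return the names of any checks the current implementation fails."""
    return [name for name, check in SPEC if not check()]


print(failing_checks() or "all checks pass")
```

When the understanding changes, the change lands in `SPEC` first, so the list grows alongside the implementation rather than after it.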

Once the target is allowed to move, evaluation becomes less about “did it run” and more about “does it still conform to what we now understand we need”. That transition is where Marc’s argument becomes especially pointed.

Evaluation Becomes the Craft

Confidence is not the same thing as correctness.

One line in Marc’s framing carries more weight than it first appears:

“If you don’t understand how to write code yourself, you can’t evaluate what the AI gives you.” — Marc Andreessen (quoted in the X post)

Evaluation is not only checking whether something runs. It is checking whether the behaviour conforms to the current specification — including the parts the team only discovered mid-build. This is also the moment where software stops being “generated code” and becomes “a system we are willing to stand behind”.

To make “evaluation” less vague, it helps to name what it often includes:

“Does it do what we meant?” comes before “does it run?”: Conformance includes edge cases, error handling, and the meanings that only become clear after trying.
“Is it still true after change?” is the maintenance test: A living specification needs repeatable checks so upgrades do not quietly drift behaviour.
“Is it safe?” is not optional: Security, privacy, and data boundaries are easy to omit and hard to retrofit once habits form.
“Can we prove it to ourselves?” becomes routine: Tests, invariants, and observable behaviour are how intent survives iteration.
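
As a rough illustration of what “proving it to ourselves” with invariants can look like, the Python sketch below checks properties that should hold for every input rather than asserting a handful of expected outputs. The `dedupe` function, the choice of invariants, and the fixed random seed are all assumptions made for the example.

```python
import random


def dedupe(items: list[int]) -> list[int]:
    """Remove duplicates while keeping first occurrences in order."""
    seen: set[int] = set()
    result: list[int] = []
    for item in items:
        if item not in seen:
            seen.add(item)
            result.append(item)
    return result


def holds_invariants(items: list[int]) -> bool:
    out = dedupe(items)
    return (
        len(out) == len(set(out))    # no duplicates remain
        and set(out) == set(items)   # nothing lost, nothing invented
        and dedupe(out) == out       # applying it twice changes nothing
    )


# A fixed seed keeps the check repeatable from one run to the next.
rng = random.Random(0)
samples = [
    [rng.randint(0, 9) for _ in range(rng.randint(0, 20))]
    for _ in range(200)
]
all_hold = all(holds_invariants(sample) for sample in samples)
print("invariants hold on all samples:", all_hold)
```

Checks of this kind do not say what the output should be for any one input. They say what must stay true however the implementation is regenerated, which makes them useful across iterations.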

As soon as evaluation is treated as the craft, the discussion stops being about whether English replaces code. It becomes about whether teams can create trust at the same pace as they can create output. That raises the next question directly: who holds responsibility when execution is delegated?

Responsibility Does Not Move

Delegation changes the work, not the ownership.

A recurring idea in the discussion around Marc’s post is that execution can be handed to tools, but responsibility remains with people. That is not mainly a philosophical claim. It is how accountability works in organisations and communities.

It is useful to say this plainly: when the system fails, the explanation that “the AI wrote it” does not reduce the impact on users or the obligations of those who shipped it.

That reality tends to reshape priorities:

Accountability becomes more explicit, not less: Someone still owns the outcome when generated code breaks in production.
Governance quietly expands: Review standards, testing gates, and audit trails become more valuable when change becomes cheaper.
Trust becomes a deliverable: The goal is not only to ship features, but to ship behaviour you can justify, explain, and safely modify.
Empathy becomes part of engineering judgement: The cost of mistakes is often paid by users and communities, not by the tool.

This is where the conversation naturally widens beyond programmers. End users do not experience “AI coding”. They experience reliability, regressions, and whether change is introduced with care. That external reality brings an internal question back into focus: if tools produce the first draft, how do people learn to judge?

The Apprenticeship Question

When struggle disappears, learning does not automatically remain.

The thread around Marc’s post includes a simple concern: if juniors do not write “bad code” first, how do they develop the intuition needed to evaluate future outputs? This matters because evaluation is not only knowledge. It is pattern recognition, caution, and a feel for where systems become brittle.

Before listing consequences, it is worth acknowledging that this is not nostalgia for older workflows. It is a question about how judgement is formed when the environment changes.

A few risks follow if learning pathways are left to chance:

Practice can be replaced by dependence: People may become fluent in producing outputs without building the mental models needed to debug them.
The “middle” can thin out quietly: Seniors gain leverage through orchestration, but fewer people may develop depth through repetition and failure.
Organisations can mortgage future capability: Short-term velocity can feel like progress until maintenance and incident response become harder.
The craft can narrow if it becomes too tool-shaped: Without deliberate exposure to fundamentals, teams may lose variety in approaches and problem-solving habits.

Once learning is included in the picture, it becomes clearer that “orchestrating software” is not merely a new productivity technique. It is a shift in how capability is built and sustained. That brings us back to the wider world, where users and communities experience the outcomes.

Conclusion

Andrej Karpathy’s early 2023 observation in “English: The New Programming Language” remains useful as an entry point. It helps explain why intent can now be expressed in natural language and turned into working output more readily than before.

Marc Andreessen’s early 2026 framing, three years later and widely shared on X, sharpens what comes next. It suggests the heart of programming is moving up a layer towards decomposition and specification, and down a layer towards verification, deployment, and the ongoing responsibility for outcomes.

If that is the direction of travel, then “orchestrating software” is not a slogan. It is a description of where attention goes when output becomes abundant and change becomes easy. The open question is whether our practices, incentives, and learning pathways will evolve fast enough to keep trust and quality moving with it.

*Alignment is what makes the system feel simple — even when it took iterations to get there.* — Louvre Pyramid, Paris (Wed 17 Sep 2014)

The future arrives quietly, then shapes what we consider normal.

Geoff’s Substack
