Vibe engineering · May 22, 2026 · 9 min read

System architecture with AI agents

Steering AI agents at the system level: module contracts, boundaries, and what stays with the human.

Most teams use AI agents at the file level: “add this endpoint”, “fix this test”, “add validation”. That works as long as the change fits in one file. The trouble starts higher up — where decisions concern module boundaries, the contracts between them, and who owns what. Steering an agent at the system level is a different discipline from steering it at the file level. It requires a human — a senior, a tech lead, an architect — to keep control over the things an agent will not invent on its own: the data model, the boundaries, and the non-functional requirements.

The module and its contract as the agent’s unit of work

A file is a poor unit of work for an agent because it has no semantics of its own. The right unit is a module with an explicit contract: a clearly described input, output, errors, and invariants. When you tell an agent “implement the billing module against this contract”, you hand it a closed space where it can be creative without spilling across the whole system.

In practice the contract is the types at the module’s edge (its public interface), the schema of incoming data (e.g. Zod), the defined error cases, and a short list of invariants — things that must hold regardless of implementation. The agent receives the contract as input, not as something to “fill in” itself. This moves the most important decision — the shape of the boundary — from the model to the human.
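A minimal sketch of what such a contract could look like for a billing module. Every name here (`ChargeInput`, `Receipt`, `BillingError`, `BillingModule`, `parseChargeInput`) is hypothetical, and a hand-written parser stands in for a schema library such as Zod so that the sketch has no dependencies.

```typescript
// Hypothetical contract for a billing module: input, output, errors, invariants.
type Currency = "EUR" | "USD";

// Input: what callers may send across the boundary.
// Invariant: amountCents is a positive integer.
type ChargeInput = { customerId: string; amountCents: number; currency: Currency };

// Output: what the module promises on success.
// Invariant: the receipt amount equals the requested amount.
type Receipt = { receiptId: string; amountCents: number; currency: Currency };

// Defined error cases: the only failures a caller has to handle.
type BillingError =
  | { kind: "invalid_input"; reason: string }
  | { kind: "customer_not_found"; customerId: string }
  | { kind: "payment_declined" };

type BillingResult<T> = { ok: true; value: T } | { ok: false; error: BillingError };

// Public interface: the agent implements it and does not redesign it.
interface BillingModule {
  charge(input: ChargeInput): Promise<BillingResult<Receipt>>;
}

function isCurrency(value: unknown): value is Currency {
  return value === "EUR" || value === "USD";
}

// Validates untrusted data at the module edge (stand-in for a schema library).
function parseChargeInput(raw: unknown): BillingResult<ChargeInput> {
  const invalid = (reason: string): BillingResult<ChargeInput> => ({
    ok: false,
    error: { kind: "invalid_input", reason },
  });
  if (typeof raw !== "object" || raw === null) return invalid("expected an object");
  const { customerId, amountCents, currency } = raw as Record<string, unknown>;
  if (typeof customerId !== "string" || customerId === "") {
    return invalid("customerId must be a non-empty string");
  }
  if (
    typeof amountCents !== "number" ||
    !isFinite(amountCents) ||
    amountCents <= 0 ||
    Math.floor(amountCents) !== amountCents
  ) {
    return invalid("amountCents must be a positive integer");
  }
  if (!isCurrency(currency)) return invalid("currency must be EUR or USD");
  return { ok: true, value: { customerId, amountCents, currency } };
}
```

The agent gets this file as read-only input. Its job is to write a class that satisfies `BillingModule`; changing the error cases or the input shape is a human decision.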

There is a further benefit you will only appreciate at scale. When a module has a closed contract, you can hand its implementation to one agent and the neighbouring module to another — in parallel, without collisions. The contract boundary is also the parallelism boundary. Without it, two agents working the same area start overwriting each other’s assumptions, and you spend the evening merging contradictory visions of the same service. A contract is not bureaucracy — it is the protocol that lets many agents (and humans) work side by side without stepping on each other.
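One way to picture the parallelism boundary, as a sketch with hypothetical names (`PaymentsPort`, `checkout`, `stubPayments`): the agent working on checkout depends only on the payments contract, and a stub that satisfies the contract stands in until the real implementation lands. The contract is kept synchronous here purely to keep the sketch short.

```typescript
// Hypothetical contract owned by the payments module.
interface PaymentsPort {
  capture(orderId: string, amountCents: number): "captured" | "declined";
}

// Agent B's work: checkout, written only against PaymentsPort.
function checkout(
  payments: PaymentsPort,
  orderId: string,
  amountCents: number,
): { orderId: string; paid: boolean } {
  const outcome = payments.capture(orderId, amountCents);
  return { orderId, paid: outcome === "captured" };
}

// A stub that satisfies the contract lets agent B build and test checkout
// before agent A's real payments implementation exists.
const stubPayments: PaymentsPort = {
  capture: (_orderId, amountCents) => (amountCents > 0 ? "captured" : "declined"),
};
```

Neither agent needs to read the other's implementation. If the two disagree, the disagreement shows up as a change request to `PaymentsPort`, which a human reviews.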

Why clear interfaces matter more when an agent writes the code

A human writing code carries unspoken context: they know this service “really” should not call the database directly, that this cache is temporary. The agent has none of that context. It reads what is written in the types and signatures — and treats it as the whole truth. If an interface is fuzzy, the agent fills the gap with the simplest assumption that compiles, not necessarily the one that is correct in the domain.

Hence the rule: the more code an agent writes, the sharper your interfaces must be. Narrow, explicit signatures, no “magic” parameters like options: any, no hidden side effects. A good interface is to an agent what good documentation is to a human: it states what is allowed and what is not. A bad boundary does not slow an agent down — on the contrary, the agent will happily cross it in ten files at once before you notice.
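To make the contrast concrete, here is a sketch of a fuzzy signature next to a narrow one. The function and type names (`planInvoiceDelivery`, `SendInvoiceOptions`, `OutboundMessage`) are invented for illustration.

```typescript
// Fuzzy: the signature permits anything, so an agent has to guess what
// "options" may contain and what the function does behind the scenes.
// function sendInvoice(invoiceId: string, options: any): void

// Narrow: every allowed choice is spelled out in the types.
type DeliveryChannel = "email" | "post";

type SendInvoiceOptions = {
  channel: DeliveryChannel;
  copyToAccounting: boolean;
};

type OutboundMessage = {
  invoiceId: string;
  channel: DeliveryChannel;
  recipients: readonly string[];
};

// No hidden side effect: the function only builds the message and returns it.
// The caller decides when and how to dispatch it.
function planInvoiceDelivery(
  invoiceId: string,
  customerAddress: string,
  options: SendInvoiceOptions,
): OutboundMessage {
  const recipients = options.copyToAccounting
    ? [customerAddress, "accounting"]
    : [customerAddress];
  return { invoiceId, channel: options.channel, recipients };
}
```

With the narrow version, an agent that invents a third channel or a new option gets a compile error instead of a silent runtime surprise.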

A practical consequence: it pays to make illegal states unrepresentable in the type itself. If a function takes status: string, sooner or later the agent will pass a misspelled “peding” and the code will compile. If it takes a narrow union of literals, the compiler catches the mistake before it reaches review. Each such typing decision is a gate the agent cannot jump — it works every time, without human vigilance. That is a cheaper form of control than catching the same bug by eye in the tenth PR of the week.
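A small sketch of that gate, with a hypothetical `OrderStatus` type and transition table. The commented-out call at the end shows the kind of mistake the compiler would reject.

```typescript
// With status: string, "peding" compiles. With a literal union, it does not.
type OrderStatus = "pending" | "paid" | "cancelled";

// Allowed transitions, keyed by the current status. Record<OrderStatus, ...>
// requires an entry for every status, so adding a status without deciding
// its transitions is a compile error.
const allowedTransitions: Record<OrderStatus, readonly OrderStatus[]> = {
  pending: ["paid", "cancelled"],
  paid: [],
  cancelled: [],
};

function canTransition(from: OrderStatus, to: OrderStatus): boolean {
  return allowedTransitions[from].some((next) => next === to);
}

// canTransition("peding", "paid");
// Rejected at compile time: "peding" is not assignable to OrderStatus.
```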

What must stay in human hands

There are three areas you do not hand off to an agent — not because it cannot write them, but because they are decisions, not generations.

The data model. The shape of entities, relations, keys, consistency boundaries. It is the foundation everything else stands on — the agent may propose a schema, but the sign-off and the migration consequences are yours.
Module boundaries. Where one responsibility ends and another begins. This is the decision about what will be cheap to change and what expensive. The agent cannot see your product trajectory a year out.
Non-functional requirements. Latency, LLM cost per request, memory budget, rollback path, security model. By default an agent optimises for “it works”, not for “it works in 50 ms and costs pennies”. You must state those constraints explicitly.

The common denominator: these are the things whose cost of error grows steeply over time. A badly placed module boundary in week one is a one-day refactor. The same boundary after six months means rewriting a quarter of the system.

Small, reviewable changes at the architecture level

The temptation with agents is to commission one huge change — “rewrite the persistence layer onto a new ORM” — and get a hundred files to review. That is an anti-pattern. A human will not review a hundred files diligently; they will review the first five and trust the rest. And the bug is precisely in the rest.

Cut architectural changes into a sequence of small, reversible steps, each of which leaves the system in a working state. First introduce the new interface alongside the old. Then switch one module. Then the next. Each step is its own PR, its own green build, its own rollback point. An agent executes such steps well, provided that you, not the agent, drew the sequence. The migration plan is an architectural decision; executing a single step is work you can delegate.
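The first two steps of such a sequence might look like the sketch below. `LegacyStore`, `UserRepository`, and `LegacyBackedUserRepository` are hypothetical names, and the adapter is one possible way to let the new interface coexist with the old API, not the only one.

```typescript
// Existing API, still used by modules that have not been switched yet.
class LegacyStore {
  private rows: Record<string, string> = {};
  saveRow(key: string, json: string): void {
    this.rows[key] = json;
  }
  loadRow(key: string): string | undefined {
    return this.rows[key];
  }
}

type User = { id: string; email: string };

// Step 1 (its own PR): introduce the new interface alongside the old one.
interface UserRepository {
  save(user: User): void;
  findById(id: string): User | undefined;
}

// Step 1 also ships an adapter over the old store, so the new interface
// works immediately and no existing caller has to change.
class LegacyBackedUserRepository implements UserRepository {
  private readonly store: LegacyStore;
  constructor(store: LegacyStore) {
    this.store = store;
  }
  save(user: User): void {
    this.store.saveRow(`user:${user.id}`, JSON.stringify(user));
  }
  findById(id: string): User | undefined {
    const json = this.store.loadRow(`user:${id}`);
    return json === undefined ? undefined : (JSON.parse(json) as User);
  }
}

// Step 2 (its own PR): switch ONE module to the new interface.
function registerUser(repo: UserRepository, id: string, email: string): User {
  const user: User = { id, email };
  repo.save(user);
  return user;
}

// Later steps, each a separate PR and rollback point: switch the next module,
// replace the adapter with the new ORM implementation, then delete LegacyStore.
```

Each step is small enough to review in full, and reverting any one of them leaves the system working.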

Documentation an agent consumes

Documentation in a world of agents stops being “for new hires”. It becomes an input to the model — the context on which the agent builds its decisions. That changes its form.

ADRs (Architecture Decision Records). Short records of a decision: what was decided, in what context, what the alternatives and consequences were. An agent reading an ADR understands why the boundary is here and not there — and will not “fix” it against intent.
Module specs. The contract, invariants, edge cases, explicit “out of scope”. This is the document you give the agent before implementation, not after.
A boundary map. One page: what the modules are, who talks to whom, what must not be called directly. It is the compass that keeps the agent in line at the system level.
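A boundary map can also be kept in a machine-readable form next to the prose version, so that the same page steers the agent and feeds a check. The sketch below assumes four hypothetical modules and assumes that the list of import edges comes from some dependency-analysis step that is not shown here.

```typescript
// Hypothetical boundary map: for each module, the modules it may call directly.
type ModuleName = "api" | "billing" | "orders" | "db";

const boundaryMap: Record<ModuleName, readonly ModuleName[]> = {
  api: ["billing", "orders"],
  billing: ["db"],
  orders: ["billing", "db"],
  db: [],
};

// One observed dependency: a file in module `from` imports from module `to`.
type ImportEdge = { from: ModuleName; to: ModuleName; file: string };

// Returns every edge that the boundary map does not allow.
// Imports within the same module are always allowed.
function findBoundaryViolations(edges: readonly ImportEdge[]): ImportEdge[] {
  return edges.filter(
    (edge) =>
      edge.from !== edge.to &&
      !boundaryMap[edge.from].some((allowed) => allowed === edge.to),
  );
}
```

Run against a change set, a non-empty result is a prompt for a human decision: either the shortcut goes, or the map changes and an ADR records why.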

Documentation an agent does not read is dead. Documentation you feed it in context becomes an active steering mechanism — cheaper and faster than catching mistakes at review.

Failure modes: an agent touches architecture unsupervised

When you let an agent make architectural decisions on its own, the failures have a repeatable shape. They are worth knowing, because each one is predictable.

Boundary erosion. The agent calls an internal module from the far end of the system “for convenience”, because it is the shortest path. After twenty such shortcuts the modules are tangled and the contract is fiction.
Data-model drift. Every change adds a field “just in case”. Entities bloat, semantics blur, and no one knows which field is the source of truth.
Invisible duplication. The agent does not find the existing abstraction, so it writes a third version of the same logic. It looks fine locally; globally you have three diverging implementations.
Silent non-functional regression. The code works, tests pass, but latency doubled or LLM cost per request shot up. No one notices, because no one measures it in a gate.

The common cause of all four: the agent optimises locally and short-term, because that is all it sees in context. Architecture is global, long-term optimisation — and that is exactly the part you do not delegate.

Crucially, these failure modes do not shout. None of them breaks the build or turns CI red, because locally everything is correct. They surface only as mounting friction: every next change costs a little more, every onboarding takes a little longer, every incident is harder to diagnose. This is architectural debt and, unlike debt in a single file, an agent almost never pays it back. The team does, by hand, over months. That is why architectural oversight is not something to bolt on “later”; it is the precondition for keeping the speed an agent gives you.

How to set this up in practice

A human defines the data model and module boundaries — written down in ADRs and a boundary map.
Each module gets a contract (types, schema, invariants, out of scope).
The agent implements inside one module, with the contract as input.
Architectural changes ship as a sequence of small, reversible PRs.
Non-functional requirements have their own gates (latency, cost, bundle size), not just tests.
Cross-module review is done by a human; the agent may review inside a module.
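As an illustration of a non-functional gate from the list above, here is a sketch of a budget check. The 50 ms latency figure echoes the example earlier in this article; the cost and bundle numbers, and the names `Budget` and `checkBudget`, are assumptions made for the sketch. How the measurements are collected is out of scope.

```typescript
// Hypothetical budgets, stated explicitly by a human rather than inferred.
type Budget = {
  p95LatencyMs: number;
  llmCostPerRequestUsd: number;
  bundleKb: number;
};

const budget: Budget = { p95LatencyMs: 50, llmCostPerRequestUsd: 0.01, bundleKb: 250 };

// Compares measured values with the budget and returns one message per
// metric that exceeds its limit. An empty array means the gate passes.
function checkBudget(measured: Budget, limit: Budget): string[] {
  const failures: string[] = [];
  (Object.keys(limit) as (keyof Budget)[]).forEach((metric) => {
    if (measured[metric] > limit[metric]) {
      failures.push(`${metric}: measured ${measured[metric]}, budget ${limit[metric]}`);
    }
  });
  return failures;
}
```

A CI step could fail the build whenever `checkBudget` returns a non-empty array, which turns a silent regression into a visible one.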

TL;DR

Steering an agent at the system level differs from the file level. The unit of work is a module with an explicit contract, not a file. The more code an agent writes, the sharper your interfaces must be, because the agent reads only what is written. The human keeps the data model, module boundaries, and non-functional requirements — these are decisions, not generations. Cut architectural changes into small, reversible steps, and treat documentation (ADRs, specs, the boundary map) as an input to the model, not an archive. Left unsupervised, an agent erodes boundaries, bloats the data model, duplicates logic, and silently breaks non-functionals — because it optimises locally, while architecture is global.