Agents & workflows · 9 min read

AI agents: an introduction

What an agent is (LLM plus tools plus loop plus memory), the action loop, MCP and when to reach for one.

“AI agent” is the most overused word in the industry right now. It is sold as magic, but inside sits a simple loop. Once you understand that loop, you stop falling for demos and start judging soberly when an agent makes sense and when a plain script is better. This text is for developers who have heard the word “agent” five hundred times and still cannot say how it differs from a chatbot. It differs in one thing: an agent acts, it does not merely answer.

What an agent is, and what it is not

A chatbot takes text and returns text. That is all. It has no access to the world, performs no actions, and remembers nothing beyond the context window of the current conversation. An agent is more. The practical definition worth keeping: an agent is an LLM plus tools, plus a loop, plus memory.

  • LLM — the brain that decides what to do next;
  • tools — the hands: search, a database, a terminal, an API;
  • loop — the machinery that lets it take more than one step;
  • memory — what the agent knows beyond the current prompt.

Remove any one of these and it stops being an agent. An LLM with no tools is a chatbot. An LLM with tools but no loop is a single function call — useful, but not an agent. Only the loop, where the model observes the result of its own action and decides the next one, turns a language model into something that pursues a goal.

The loop, the heart of an agent

The simplest mental model is the cycle perceive – plan – act – observe. The agent gets a goal, looks at the current state, picks an action, runs it, reads the result, and loops back. It repeats until it judges the goal met or it exhausts its step budget.

  1. Perceive — gather context: the goal, results so far, state of the world;
  2. Plan — decide which tool to call and with what arguments;
  3. Act — execute the tool call (your code does this, not the model);
  4. Observe — the tool result returns into context as a new observation.

This loop is deceptively simple, and that is where most production problems hide. Without a hard iteration cap an agent can spin in circles, call the same tool ten times, or “improve” a working solution until it breaks it. A step limit and a clear “done” criterion are not optional — they are a requirement.
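
The cycle above, with its step cap and explicit “done” criterion, can be sketched in a few lines. This is a minimal illustration, not a reference implementation: `Decision`, `run_agent`, `scripted_decide`, and `never_done` are hypothetical names, and the model is replaced by a scripted function so the example runs without any LLM.

```python
from dataclasses import dataclass, field


@dataclass
class Decision:
    """What the (hypothetical) model returns at each step."""
    done: bool
    tool: str = ""
    args: dict = field(default_factory=dict)
    answer: str = ""


def run_agent(goal, decide, tools, max_steps=5):
    """Perceive, plan, act, observe, with a hard iteration cap."""
    observations = []                                   # perceive: results so far
    for step in range(1, max_steps + 1):
        decision = decide(goal, observations)           # plan: pick the next action
        if decision.done:                               # explicit "done" criterion
            return {"status": "done", "answer": decision.answer, "steps": step}
        result = tools[decision.tool](**decision.args)  # act: our code runs the tool
        observations.append((decision.tool, result))    # observe: result returns to context
    return {"status": "step_limit", "answer": None, "steps": max_steps}


def scripted_decide(goal, observations):
    """Stand-in for an LLM: call one tool, then declare the goal met."""
    if not observations:
        return Decision(done=False, tool="add", args={"a": 2, "b": 3})
    return Decision(done=True, answer=f"result is {observations[-1][1]}")


def never_done(goal, observations):
    """Stand-in for a model that spins in circles."""
    return Decision(done=False, tool="add", args={"a": 1, "b": 1})


TOOLS = {"add": lambda a, b: a + b}

finished = run_agent("add 2 and 3", scripted_decide, TOOLS)
stuck = run_agent("loop forever", never_done, TOOLS, max_steps=3)
print(finished)  # {'status': 'done', 'answer': 'result is 5', 'steps': 2}
print(stuck)     # {'status': 'step_limit', 'answer': None, 'steps': 3}
```

The second run is the interesting one: the model never says it is finished, and only the cap stops it.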

Tool use and function calling

The mechanism that gives an LLM hands is called function calling or tool use. It works like this: you describe the available tools to the model, each with a name, a description, and an argument schema (usually JSON Schema). Instead of answering with text, the model returns a structured request: “call search with the argument query: AI agents”.

A key detail that trips up beginners: the model does not execute the tool. The model only asks for it to be executed. It is your code that receives the request, runs the actual function, catches the result, and feeds it back into context. The trust boundary runs right here — and it is on this side, in your code, that you put argument validation and limits. Never let model-generated text flow straight into eval, a shell, or a SQL query without a control layer.
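
Here is one way the code side of that boundary might look. It is a sketch under assumptions: the request shape (`name` plus `arguments`), the `TOOLS` registry, and `execute_tool_call` are made up for illustration, and the hand-rolled type check stands in for a real JSON Schema validator.

```python
import json


def search(query: str, limit: int = 3) -> list:
    """Stand-in tool; a real one would query a search backend."""
    return [f"result {i} for {query!r}" for i in range(1, limit + 1)]


# Hypothetical registry: a simplified stand-in for JSON Schema tool descriptions.
TOOLS = {
    "search": {
        "fn": search,
        "description": "Full-text search over documents.",
        "parameters": {
            "query": {"type": str, "required": True},
            "limit": {"type": int, "required": False},
        },
    }
}


def execute_tool_call(raw_request: str) -> dict:
    """Validate a model-produced request, then run the tool in our own code."""
    try:
        request = json.loads(raw_request)
    except json.JSONDecodeError:
        return {"ok": False, "error": "request is not valid JSON"}
    if not isinstance(request, dict) or not isinstance(request.get("name"), str):
        return {"ok": False, "error": "malformed request"}
    spec = TOOLS.get(request["name"])
    if spec is None:                      # only registered tools can ever run
        return {"ok": False, "error": "unknown tool"}
    args = request.get("arguments", {})
    if not isinstance(args, dict):
        return {"ok": False, "error": "malformed request"}
    params = spec["parameters"]
    for key, value in args.items():       # reject unknown or mistyped arguments
        if key not in params:
            return {"ok": False, "error": f"unexpected argument: {key}"}
        if not isinstance(value, params[key]["type"]):
            return {"ok": False, "error": f"wrong type for argument: {key}"}
    for key, rule in params.items():      # reject missing required arguments
        if rule["required"] and key not in args:
            return {"ok": False, "error": f"missing argument: {key}"}
    return {"ok": True, "result": spec["fn"](**args)}


good = execute_tool_call('{"name": "search", "arguments": {"query": "AI agents", "limit": 2}}')
bad = execute_tool_call('{"name": "shell", "arguments": {"cmd": "rm -rf /"}}')
print(good["ok"], bad)
```

Errors are returned as data rather than raised, so they can be fed back into context as an observation and the model can correct its next request.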

ReAct versus plan-and-execute

There are two dominant loop architectures, and the difference is worth knowing.

ReAct (reason + act) interleaves thinking and doing. The model reasons aloud as it goes, takes one action, reads the result, reasons again. It is reactive and copes well in unpredictable environments — when you cannot plan everything up front because each step changes the picture. The downside: it can be short-sighted and lose its way in a long chain.

Plan-and-execute separates the phases. First the model lays out a whole plan (a list of steps), then runs them in order, replanning if something goes wrong. It gives more predictability and is usually cheaper, because planning happens once rather than on every iteration. The downside: a rigid plan copes poorly with environments that change faster than the agent can work through the plan.

My default pick: start with ReAct, because it is simpler to run and debug. Move to plan-and-execute when tasks grow long and the cost of “thinking” on every step starts to hurt.
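
To make the cost difference concrete, here is a toy comparison that counts model calls in each style. Everything in it is hypothetical: `CountingModel` is a stub that returns canned steps, and the plan-and-execute variant deliberately leaves out replanning, which in a real system would add calls whenever a step fails.

```python
class CountingModel:
    """Stub LLM that returns canned steps and counts how often it is called."""

    def __init__(self, steps):
        self.steps = list(steps)
        self.calls = 0

    def next_action(self, history):
        """ReAct style: one call per step, given everything seen so far."""
        self.calls += 1
        done = len(history)
        return self.steps[done] if done < len(self.steps) else None

    def make_plan(self):
        """Plan-and-execute style: one call that returns the whole plan."""
        self.calls += 1
        return list(self.steps)


def run_tool(action):
    """Stand-in for real tool execution."""
    return f"ran {action}"


def react(model):
    history = []
    # Ask the model after every observation until it signals completion.
    while (action := model.next_action(history)) is not None:
        history.append((action, run_tool(action)))
    return history


def plan_and_execute(model):
    plan = model.make_plan()  # plan once, then execute without consulting the model
    return [(action, run_tool(action)) for action in plan]


steps = ["read file", "edit file", "run tests"]
react_model = CountingModel(steps)
plan_model = CountingModel(steps)
react_history = react(react_model)
plan_history = plan_and_execute(plan_model)
print(react_model.calls, plan_model.calls)  # 4 1
```

Both runs execute the same three actions. The ReAct run pays for four model calls (three actions plus the final “nothing left to do”), the planned run for one. What the sketch cannot show is the other side of the trade: the ReAct model sees every result before choosing the next step.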

One agent or many

The temptation to split a problem across a team of specialised agents — researcher, coder, reviewer, orchestrator — is strong and usually premature. Multi-agent sounds elegant but adds two real costs: communication between agents burns tokens, and one agent’s mistakes propagate to the rest, often unnoticed.

  • One agent — the default start. Easier to trace, cheaper, enough for the vast majority of tasks;
  • Many agents — reach for them when a task splits naturally into parallel sub-domains, or when you need different tool sets and permissions for different roles.

Rule of thumb: do not add a second agent until one truly is not enough. Most “we need multi-agent” is really “our one agent has too many tools and too vague a prompt”.

Memory: short, long, vector

Memory comes in two kinds, and conflating them is a common design mistake.

  • Short-term memory is the context window — the current conversation, recent observations. It is fast but limited and vanishes when the session ends;
  • Long-term memory is knowledge that outlives the session: documents, history, facts about the user. Usually kept outside the model and pulled in on demand.

The most common long-term memory mechanism is a vector store. You turn text into an embedding (a vector of numbers), and at query time you find the fragments closest in meaning and inject them into context. This is the foundation of RAG (retrieval-augmented generation). Watch the trap: a vector store is not magic. Feed it garbage and the agent gets semantically similar garbage back. The quality of your chunks and metadata decides everything.
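
The retrieval step can be sketched without any external service. Note the big assumption: `embed` below is a toy that counts words, so it measures word overlap rather than meaning. A real system would call an embedding model and usually a dedicated vector database. `VectorStore` and the sample chunks are made up for illustration.

```python
import math
from collections import Counter


def embed(text):
    """Toy embedding: word counts. A real system would call an embedding model."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(a[word] * b[word] for word in a if word in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class VectorStore:
    """In-memory store: keeps (vector, chunk, metadata) and ranks by similarity."""

    def __init__(self):
        self.items = []

    def add(self, chunk, metadata=None):
        self.items.append((embed(chunk), chunk, metadata or {}))

    def search(self, query, k=2):
        q = embed(query)
        ranked = sorted(self.items, key=lambda item: cosine(q, item[0]), reverse=True)
        return [(chunk, meta) for _, chunk, meta in ranked[:k]]


store = VectorStore()
store.add("refunds are processed within 14 days", {"source": "policy.md"})
store.add("the office is closed on public holidays", {"source": "hr.md"})
store.add("refunds require the original receipt", {"source": "policy.md"})

hits = store.search("how are refunds processed", k=2)
for chunk, meta in hits:
    print(meta["source"], "|", chunk)
```

The two refund chunks come back and the unrelated one does not. The metadata travels with each chunk, which is what lets you cite a source or filter by it later.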

MCP: a common standard for tools

Until recently every agent framework defined tools its own way. MCP (Model Context Protocol) is an attempt to unify this — an open protocol where a server exposes tools, resources, and prompts, and any client (agent) can use them without writing an integration from scratch.

Why this matters to you: you write an MCP server once, and it plugs into many agents and hosts. Instead of n × m integrations for n tools and m agents, you get one shared interface that each side implements once. For agent tooling it is roughly what LSP became for code editors: one protocol instead of a quadratic number of plugins. In 2026 it is worth writing new tool integrations against MCP if you have the choice.
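
The arithmetic behind that argument is short: point-to-point integrations grow as tools times agents, while a shared protocol needs one implementation per tool and one per agent. The sketch below also builds a tool-call message in the shape MCP uses as far as I understand the specification (JSON-RPC 2.0 with a `tools/call` method). Treat the exact fields as an assumption and check the current spec before relying on them. The function names are hypothetical.

```python
import json


def pairwise_integrations(n_tools, m_agents):
    """Every agent integrates every tool separately."""
    return n_tools * m_agents


def shared_protocol_integrations(n_tools, m_agents):
    """Each tool and each agent implements the shared protocol once."""
    return n_tools + m_agents


def tool_call_request(request_id, name, arguments):
    """Build a JSON-RPC 2.0 style request asking a server to run a tool."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    })


print(pairwise_integrations(5, 4))         # 20
print(shared_protocol_integrations(5, 4))  # 9
print(tool_call_request(1, "search", {"query": "AI agents"}))
```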

Where agents shine, and where they fall apart

Agents are great where a task needs many steps, access to tools, and tolerance for iteration:

  • Research — searching many sources, synthesis, cross-checking;
  • Coding — reading a repo, writing changes, running tests, fixing;
  • Operations — ticket triage, routine data tasks, workflow automation.

And just as predictably they fall apart on:

  • Long horizons — the more steps, the higher the chance the agent loses the goal or gets stuck;
  • Error compounding — if a single step succeeds 95% of the time, then after twenty steps the combined success chance drops below half. Errors multiply, they do not add;
  • Cost — every iteration is an LLM call. An agent that “thinks” thirty times costs thirty times more than a single query, and sometimes still does not finish the task.

Treat these numbers as an estimate that illustrates the mechanism, not a hard benchmark — real rates depend on the task and the model. But the direction is certain: unreliability grows with chain length.
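
The compounding is easy to check yourself. This small sketch assumes every step succeeds independently with the same probability, which real agents rarely satisfy, so read it as an illustration of the mechanism. The function names are made up.

```python
def chain_success(step_success, steps):
    """Probability that every one of `steps` independent steps succeeds."""
    return step_success ** steps


def steps_until_below(step_success, threshold):
    """Smallest chain length whose overall success falls below `threshold`."""
    if not 0 < step_success < 1 or not 0 < threshold <= 1:
        raise ValueError("expected 0 < step_success < 1 and 0 < threshold <= 1")
    steps, probability = 0, 1.0
    while probability >= threshold:
        steps += 1
        probability *= step_success
    return steps


print(round(chain_success(0.95, 20), 3))  # 0.358
print(steps_until_below(0.95, 0.5))       # 14
```

At 95% per step, a twenty-step chain finishes cleanly only about 36% of the time, and the odds already fall below one half at fourteen steps.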

Guardrails and human in the loop

An agent with tool access can genuinely break something — delete a file, send an email, spend money. So guardrails are not an add-on; they are part of the architecture.

  • Limits — a maximum step count, a token budget, a timeout;
  • Tool validation — every call checked on the code side (e.g. by schema) before anything runs;
  • Least privilege — the agent gets access only to what it truly needs, never to the production database “just in case”;
  • Human in the loop — irreversible or costly actions require human approval before they execute.

Human in the loop is not distrust of AI — it is the same principle you apply to a production deploy or a database migration. The higher the stakes and the harder the effect is to undo, the closer a human should stand to the decision.
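
Two of these guardrails, least privilege and human approval, fit in one small wrapper around tool execution. This is a sketch under assumptions: the policy sets, the `approve` callback, and the sample tools are hypothetical, and in a real system the approval would come from a UI or a review queue rather than a lambda.

```python
# Hypothetical policy tables; in practice these would come from your own config.
ALLOWED_TOOLS = {"read_file", "send_email"}  # least privilege: nothing else is reachable
NEEDS_APPROVAL = {"send_email"}              # irreversible or costly actions

outbox = []


def read_file(path):
    """Stand-in for a harmless, read-only tool."""
    return f"contents of {path}"


def send_email(to, body):
    """Stand-in for an action that cannot be undone."""
    outbox.append((to, body))
    return "sent"


def drop_table(table):
    """Exists in the codebase but is not on the allow list."""
    return f"dropped {table}"


TOOLS = {"read_file": read_file, "send_email": send_email, "drop_table": drop_table}


def guarded_call(name, args, tools, approve):
    """Run a tool only if policy allows it and, where required, a human agrees."""
    if name not in ALLOWED_TOOLS:
        return {"status": "denied", "reason": f"tool not allowed: {name}"}
    if name in NEEDS_APPROVAL and not approve(name, args):
        return {"status": "rejected", "reason": "human declined"}
    return {"status": "executed", "result": tools[name](**args)}


deny_all = lambda name, args: False  # simulates a human who clicks "no"
blocked = guarded_call("send_email", {"to": "a@example.com", "body": "hi"}, TOOLS, deny_all)
denied = guarded_call("drop_table", {"table": "users"}, TOOLS, lambda name, args: True)
read = guarded_call("read_file", {"path": "notes.txt"}, TOOLS, deny_all)
print(blocked["status"], denied["status"], read["status"])  # rejected denied executed
```

The read-only call goes through without asking anyone, the email waits for a person, and the tool outside the allow list cannot run even with approval.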

Agent or a plain script

The most important decision is made before you write a single line. The mental model is simple: if the steps are known up front and do not change — write a script. A script is cheaper, faster, deterministic, and easy to test.

Reach for an agent only when the path to the goal is not known up front, when each step depends on the result of the previous one, and the number of branches is too large to hard-code. In other words: an agent is a tool for uncertainty, not for repetition. If you can draw a flowchart of the task, a script is almost always better. If the chart has a hundred paths and half of them depend on the content of the data — that is fertile ground for an agent.

TL;DR

An agent is an LLM plus tools plus a loop plus memory — something that acts, not just answers. The heart is the perceive–plan–act–observe loop; function calling gives it hands, but your code runs the tool, not the model. Start with ReAct and a single agent, let a vector store handle knowledge beyond the context, and lean on MCP to standardise tools. Agents shine at research, coding, and operations, but fall apart over long horizons through error compounding and cost. Wrap them in limits and a human in the loop for irreversible actions. And if the steps are known up front — write a script, not an agent.

AI agents: an introduction | vibecoding.pl