What makes a model an agent
A plain language model maps a prompt to a response and stops. An agent wraps the same model in a loop and gives it tools — functions it can call, like searching the web, running code, querying a database, or editing a file. The model no longer just answers; it decides what to do next, takes an action, sees the result, and continues. The model supplies the judgment; the loop and the tools supply the ability to act on the world.
The ReAct loop: reason, act, observe
The pattern most agents follow was crystallized by the ReAct paper (2022): interleave reasoning and acting. The model reasons about the next step (“I need the current price, so I'll search”), emits an action (a tool call with arguments), and waits while the system runs the tool and feeds the observation (the result) back into the context. The model then reasons again with that new information, and the cycle repeats until it decides it has enough to answer.
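The reason, act, observe cycle can be sketched in a few lines. This is a minimal illustration, not the ReAct paper's implementation: `scripted_model` is a hypothetical stand-in for a real model call, the `search` tool returns canned text, and the dict format with `thought`, `action`, and `final` keys is an assumption made for the example.

```python
from typing import Callable

# Hypothetical tool registry: tool name -> function taking one string argument.
TOOLS: dict[str, Callable[[str], str]] = {
    "search": lambda query: "The price is 42 USD." if "price" in query else "No results.",
}

def scripted_model(context: list[str]) -> dict:
    """Stand-in for a real LLM call (assumed output format, for illustration only)."""
    observations = [line for line in context if line.startswith("Observation: ")]
    if not observations:
        # Nothing observed yet: reason, then ask for a tool call.
        return {"thought": "I need the current price, so I'll search.",
                "action": "search", "argument": "current price"}
    # An observation is available: answer from it.
    return {"thought": "I have what I need.",
            "final": observations[-1].removeprefix("Observation: ")}

def react_loop(question: str, model: Callable[[list[str]], dict], max_steps: int = 5):
    """Run reason -> act -> observe until the model returns a final answer."""
    context = [f"Question: {question}"]
    for _ in range(max_steps):
        step = model(context)                          # reason
        context.append(f"Thought: {step['thought']}")
        if "final" in step:
            return step["final"], context
        observation = TOOLS[step["action"]](step["argument"])   # act
        context.append(f"Action: {step['action']}({step['argument']!r})")
        context.append(f"Observation: {observation}")  # observe, fed back into context
    raise RuntimeError("step limit reached without a final answer")

answer, trace = react_loop("What does it cost?", scripted_model)
print(answer)  # The price is 42 USD.
```

The `max_steps` bound is there so that even this toy loop cannot run forever; a real agent would replace `scripted_model` with an actual model call and keep the same structure.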
This is why agents can handle tasks a single prompt-and-response exchange can't: they gather information incrementally and correct course based on real feedback, rather than committing to an answer up front. The approach pairs naturally with test-time compute scaling, since both spend extra inference to reach a better result.
How function calling actually works
Under the hood, tool use is less magical than it looks. The model is given a list of tools with names, descriptions, and argument schemas (typically JSON Schema). When it wants to use one, it doesn't run anything — it emits a structured request: the tool name and a JSON object of arguments. Your code (or the API runtime) parses that, actually executes the function, and returns the result as a new message. The model was trained — via fine-tuning on tool-use traces — to produce well-formed calls and to interpret the results.
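The round trip described above might look like the following sketch. The field names (`name`, `description`, `parameters`, `arguments`, `role`) are illustrative assumptions; real provider APIs use similar but not identical shapes. `get_weather` is a hypothetical tool that returns canned text.

```python
import json

def get_weather(city: str) -> str:
    """Hypothetical tool; a real one would query a weather service."""
    return f"Sunny in {city}"

# What the model is shown: name, description, and a JSON Schema for the arguments.
TOOL_SPECS = [{
    "name": "get_weather",
    "description": "Return the current weather for a city.",
    "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}]

# What your code keeps: the mapping from tool names to real functions.
REGISTRY = {"get_weather": get_weather}

def execute_tool_call(raw_model_output: str) -> dict:
    """Parse the model's structured request, run the function, wrap the result as a message."""
    call = json.loads(raw_model_output)       # the model only produced this text
    func = REGISTRY[call["name"]]             # your code looks up the function
    result = func(**call["arguments"])        # your code executes it
    return {"role": "tool", "name": call["name"], "content": result}

# Text the model might emit when it wants the tool:
model_output = '{"name": "get_weather", "arguments": {"city": "Paris"}}'
print(execute_tool_call(model_output))
# {'role': 'tool', 'name': 'get_weather', 'content': 'Sunny in Paris'}
```

Note that the model never touches `get_weather` directly: it sees only `TOOL_SPECS` and produces a string, and everything after `json.loads` is ordinary application code.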
So “the model called an API” really means “the model produced text saying which API to call, and trusted infrastructure called it.” That separation is also the security boundary: the model proposes, your code disposes, and whatever you let it call is exactly the blast radius — which is why tool allowlisting and sandboxing matter.
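One way to make that boundary concrete is a dispatcher that refuses anything not explicitly allowed. The sketch below is an assumption about how such a guard could look, not a standard API; `read_file` and `delete_file` are hypothetical tools that only return strings, and a real deployment would add sandboxing on top.

```python
# Allowlist: tool name -> the argument names that tool may receive.
ALLOWED_TOOLS = {"read_file": {"path"}}

def read_file(path: str) -> str:
    return f"<contents of {path}>"   # hypothetical: no real file access

def delete_file(path: str) -> str:
    return f"deleted {path}"         # implemented, but deliberately not allowlisted

IMPLEMENTATIONS = {"read_file": read_file, "delete_file": delete_file}

def guarded_dispatch(name: str, arguments: dict) -> str:
    """Run a proposed tool call only if the tool and its arguments are on the allowlist."""
    if name not in ALLOWED_TOOLS:
        return f"error: tool {name!r} is not permitted"
    unexpected = set(arguments) - ALLOWED_TOOLS[name]
    if unexpected:
        return f"error: unexpected arguments {sorted(unexpected)}"
    return IMPLEMENTATIONS[name](**arguments)

print(guarded_dispatch("read_file", {"path": "notes.txt"}))    # <contents of notes.txt>
print(guarded_dispatch("delete_file", {"path": "notes.txt"}))  # error: tool 'delete_file' is not permitted
```

Returning the refusal as a string, rather than raising, lets it flow back to the model as an observation so it can choose a different action.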
The failure modes — and the cost trap
Loops that act on the world fail in ways one-shot answers can't. Error cascades: a wrong early step poisons every later one, since the bad observation stays in context. Infinite loops: the agent retries the same failing action forever, or oscillates between two actions. Context bloat: every observation is appended, so long agent runs fill the context window and output quality degrades.
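Two of these failure modes have simple mechanical mitigations, sketched below under stated assumptions: actions are represented as hashable tuples, the thresholds are arbitrary, and trimming by dropping middle messages is only one of several strategies (summarizing is another). Error cascades are harder and are not addressed here.

```python
from collections import Counter

def is_stuck(actions: list, window: int = 6, max_repeats: int = 3) -> bool:
    """True if any single action occurs at least max_repeats times in the last `window` actions.

    Catches both a plain retry loop (A, A, A) and an oscillation (A, B, A, B, A, B).
    """
    recent = actions[-window:]
    return any(count >= max_repeats for count in Counter(recent).values())

def trim_context(messages: list[str], keep_last: int = 4) -> list[str]:
    """Keep the first message (the task) and the most recent keep_last messages."""
    if len(messages) <= keep_last + 1:
        return list(messages)
    omitted = len(messages) - 1 - keep_last
    return [messages[0], f"[{omitted} earlier messages omitted]"] + messages[-keep_last:]

a, b = ("search", "price"), ("open", "page 1")
print(is_stuck([a, b, a, b, a, b]))   # True
print(trim_context(["task", "o1", "o2", "o3", "o4", "o5", "o6"], keep_last=2))
# ['task', '[4 earlier messages omitted]', 'o5', 'o6']
```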
And the one that bites hardest in production: cost blowups. Each loop iteration is a full model call, and because each call re-reads the trajectory so far, tokens accumulate across the whole run. An agent left to run unbounded can run up an enormous bill on a single task. Hard caps on iterations, token budgets, and timeouts aren't polish; they're load-bearing. An agent without a budget ceiling is a runaway waiting to happen.
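A budget ceiling can be as small as one object that every iteration must charge against. The sketch below is illustrative: the `Budget` class, its limits, and the token figures are all assumptions, and the arithmetic helper assumes every call re-reads the full context with no prompt caching.

```python
import time
from dataclasses import dataclass, field

class BudgetExceeded(RuntimeError):
    """Raised when the agent run crosses any of its hard limits."""

@dataclass
class Budget:
    max_steps: int
    max_tokens: int
    max_seconds: float
    steps: int = 0
    tokens: int = 0
    started: float = field(default_factory=time.monotonic)

    def charge(self, tokens_used: int) -> None:
        """Record one loop iteration; raise as soon as any limit is exceeded."""
        self.steps += 1
        self.tokens += tokens_used
        if self.steps > self.max_steps:
            raise BudgetExceeded(f"step limit {self.max_steps} exceeded")
        if self.tokens > self.max_tokens:
            raise BudgetExceeded(f"token limit {self.max_tokens} exceeded")
        if time.monotonic() - self.started > self.max_seconds:
            raise BudgetExceeded(f"time limit {self.max_seconds}s exceeded")

def cumulative_input_tokens(prompt_tokens: int, tokens_per_step: int, steps: int) -> int:
    """Total input tokens billed if call i re-reads the prompt plus i earlier steps."""
    return sum(prompt_tokens + i * tokens_per_step for i in range(steps))

# A 1,000-token prompt growing by 500 tokens per step, over 20 steps:
print(cumulative_input_tokens(1000, 500, 20))  # 115000
```

In that example the final context holds only 10,500 tokens, yet the run has paid for 115,000 input tokens in total, which is why the cap belongs on the cumulative figure. In an agent loop, calling `budget.charge(...)` after every model call turns a runaway into a clean, catchable exception.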
Multi-agent orchestration
Beyond a single loop, you can compose multiple agents: a planner that breaks a task into subtasks, specialist agents that each own one, and a synthesizer that combines results. Done well, this parallelizes work and lets each agent keep a focused context. Done carelessly, it multiplies every failure mode above — more loops, more tokens, more places for an error to cascade — at multiplied cost. The discipline is the same: bound it, scope each agent tightly, and verify outputs rather than trusting them.
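The planner, specialist, synthesizer shape can be sketched as follows. Every function here is a hypothetical stub standing in for a model-backed agent, and the `verify` check is deliberately trivial; the point is the structure: a bounded number of subtasks, each specialist seeing only its own input, and outputs checked before they are combined.

```python
from concurrent.futures import ThreadPoolExecutor

def planner(task: str) -> list[dict]:
    """Stub planner: split the task into subtasks, each owned by one specialist."""
    return [{"role": "researcher", "input": task},
            {"role": "coder", "input": task}]

# Stub specialists; each sees only its own subtask input (a focused context).
SPECIALISTS = {
    "researcher": lambda text: f"notes on {text}",
    "coder": lambda text: f"code for {text}",
}

def verify(output) -> bool:
    """Placeholder check; a real system would validate content, not just shape."""
    return isinstance(output, str) and bool(output.strip())

def synthesizer(results: list[str]) -> str:
    return "; ".join(results)

def orchestrate(task: str, max_subtasks: int = 4) -> str:
    subtasks = planner(task)[:max_subtasks]            # bound the fan-out
    with ThreadPoolExecutor(max_workers=max(1, len(subtasks))) as pool:
        # map() runs the specialists in parallel and preserves subtask order.
        outputs = list(pool.map(lambda s: SPECIALISTS[s["role"]](s["input"]), subtasks))
    rejected = [out for out in outputs if not verify(out)]
    if rejected:
        raise ValueError(f"{len(rejected)} specialist output(s) failed verification")
    return synthesizer(outputs)

print(orchestrate("parsing CSV"))  # notes on parsing CSV; code for parsing CSV
```

With real agents behind each stub, every specialist would run its own loop, so each one also needs its own iteration and token limits.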
The throughline: an agent is a model plus a loop plus tools plus guardrails. The model is the easy part. The loop is where the capability comes from — and where the cost and the failures come from too.