Every vendor selling software right now claims their product is “agentic.” I’ve seen chatbots with a system prompt called an agent. I’ve seen a scheduled Python script described as autonomous AI. I’ve also shipped actual agents to production — at an insurance company, handling FNOL triage, policy lookup, and claims routing. The gap between what gets marketed as an agent and what you’d actually build is significant.
This post is about that gap. What an AI agent actually is, architecturally. How it differs from a prompt chain or a simple LLM call. The three patterns you’ll encounter in production. And — the part most write-ups skip — when you should not build one.
If you’re evaluating whether to build agent infrastructure for your platform, or just trying to get the vocabulary right before a design review, this is the place to start.
The Core Loop#
The term “agent” in AI comes from the same place it does in economics and philosophy: an entity that perceives its environment, makes decisions, and takes actions to achieve a goal. Applied to LLM-based systems, the definition is precise:
An AI agent is a system that uses an LLM as its reasoning core, can invoke external tools, maintains state across multiple steps, and continues operating until a goal is reached — without requiring a human to direct every step.
The loop that makes this work is: Perceive → Reason → Act → Observe, repeating until the task is complete or a stopping condition is hit.
Each cycle through this loop is one “step.” A simple task might complete in two steps. A complex multi-system workflow might take fifteen. The agent doesn’t know in advance how many steps it needs — that’s determined dynamically by the LLM’s reasoning at each iteration.
This is fundamentally different from anything you’d call a pipeline or a chain. A pipeline has a fixed number of nodes. An agent has a loop.
LLM Call vs. AI Agent#
The distinction matters for architecture, cost, latency, and reliability. Here’s the practical difference:
| Dimension | Plain LLM Call | AI Agent |
|---|---|---|
| Steps | Single round-trip | Multiple iterations, variable |
| Tool use | Optional, single invocation | Core mechanism, multi-step |
| State | Stateless (prompt window only) | Persistent state across steps |
| Goal completion | Responds to the prompt | Pursues a goal until done |
| Latency | Predictable (1–3s) | Variable (5s to minutes) |
| Cost | Predictable per-request | Multiplied by steps and retries |
| Failure modes | Hallucination, wrong answer | Looping, cascading tool failures, context exhaustion |
| Observability | Simple (input/output) | Requires tracing every step |
The key insight: an agent is not a smarter LLM call. It is an orchestration pattern that uses an LLM as the decision engine. You’re running multiple LLM calls per user request, each consuming tokens, each potentially calling an external system. The costs and risks compound accordingly.
Three Patterns You’ll Actually Use#
1. ReAct — Reason + Act Interleaved#
ReAct (Reasoning and Acting) is the baseline agent pattern. The LLM alternates between thinking about what to do and doing it. Each step produces a Thought, an Action, and an Observation — the observation from the previous tool call feeds into the next reasoning step.
ReAct agents are the right default for tasks that are exploratory or where the correct sequence of tool calls depends on intermediate results. They handle unexpected tool outputs well because they reason fresh at each step.
The failure mode: ReAct agents on long-horizon tasks accumulate context. By step 8 or 9, the model is reasoning over a window packed with prior tool outputs. Attention degrades on information from early in the context — the “lost in the middle” problem, documented by Stanford researchers — and the agent starts making worse decisions.
2. Plan-and-Execute — Strategy Before Action#
Plan-and-Execute separates the planning step from execution. A planner model (often a higher-capability model like o3 or Claude Sonnet) analyzes the goal and produces a task graph. An executor (often a smaller, cheaper model) carries out each task. A re-planner evaluates results and adjusts if needed.
This pattern performs better on complex multi-step tasks because the planner has global context before any execution starts. It also gives you a natural place to inject a human-approval step: show the plan to the user before running it.
The trade-off: the planning step adds latency. And if the planner gets the decomposition wrong — misses a dependency, over-specifies a sub-task — execution can compound that error across many steps.
3. Multi-Agent — Orchestrator + Specialists#
Rather than one agent doing everything, you decompose by domain: a Claims agent, a Policy agent, a Routing agent, each with its own tool set and system prompt. An orchestrator receives the top-level goal and delegates to specialists.
This pattern solves a real problem. A single agent managing a 30-tool list will misfire — tool selection accuracy degrades sharply as the tool count grows. Studies have measured tool hallucination rates above 90% when agents are given irrelevant or mismatched tools. Specialist agents have small, coherent tool sets, which keeps selection accurate.
Multi-agent also enables parallelism. The orchestrator can fan out to multiple specialists simultaneously and merge results.
The cost: coordination overhead, more moving parts to trace, and the hard problem of passing context between agents cleanly. I covered this architecture in depth in Multi-Agent AI for Enterprise.
What Agents Actually Need#
Four things separate a working agent from a prompt:
Tools are functions the agent can call. A tool is a typed interface: name, description, input schema, output schema. The LLM decides which tool to call and with what arguments. Tools can wrap REST APIs, database queries, file system operations, other LLMs, or any code you write. The description matters more than the implementation — if the LLM can’t figure out when to use a tool from its description, it won’t.
Memory covers how the agent retains information. Short-term memory is the context window — conversation history and tool outputs accumulated in the current session. Long-term memory requires external storage: vector stores for semantic retrieval, relational stores for structured facts, or key-value stores for session state. Most production agents need both: in-context working memory for the current task, retrieval-based memory for anything that doesn’t fit in the window.
State is what the agent tracks about the task in progress. This is different from memory. State is explicit: the current step in a workflow, which tools have been called, what results came back, whether a sub-task failed. In LangGraph, state is a typed schema that every node reads and writes. Without explicit state management, agents become hard to debug and impossible to resume after failure.
A runtime is the infrastructure that manages the loop. This handles scheduling steps, managing the event loop, handling tool call timeouts, retrying failed steps, and persisting state so you can inspect or restart a run. LangGraph, AutoGen, and Microsoft’s Semantic Kernel all provide agent runtimes. Rolling your own is a bad idea until you’ve spent significant time debugging production loops — there are dozens of edge cases that these frameworks have already handled.
The ReAct Agent Loop#
Here’s what actually happens inside a ReAct agent at the implementation level:
Each LLM call in this sequence costs tokens. Each tool call adds latency. A three-step agent like this one costs roughly 3–5x what a single LLM call costs, with 3–5x the latency. That’s the baseline you’re accepting before you’ve handled retries, failures, or edge cases.
When to Build an Agent#
Three conditions should all be true before you choose an agent architecture:
The task is genuinely multi-step, and the correct steps can’t be predetermined. If you already know the exact sequence of operations — call API A, then B, then format result — you don’t need an agent. You need a service. An agent earns its complexity when the path through the problem depends on intermediate results: “If the policy is active, check claim history; if there’s a recent claim, route to a specialist; otherwise, proceed to standard triage.”
The task requires dynamic tool selection. If you know which tools will be used in advance, hardcode the calls. Agents add value when the tool selection itself is non-deterministic — when different inputs require different tools, and you can’t enumerate all the combinations at design time.
The task can tolerate variable latency and multi-second processing time. Background jobs, asynchronous workflows, and batch processes are natural fits. Real-time user-facing interactions with sub-second expectations are not.
Secondary signal: tasks involving unstructured inputs (documents, emails, voice), variable task decomposition, or the need to reason about partial information mid-task all push toward agent patterns.
When NOT to Build an Agent#
This is the section that saves you from a bad week in production.
When the workflow is deterministic. If you’re processing invoices in a fixed sequence — extract fields, validate schema, match to PO, post to ERP — that’s an ETL pipeline. Add LLMs for the extraction step if you need them, but don’t make the whole thing an agent. You’ll pay for reasoning overhead on decisions that aren’t actually decisions.
When latency matters. Agents are inherently slow. A 3-step agent with Azure OpenAI GPT-4o takes 8–15 seconds end-to-end on a good day. If your SLA is under 3 seconds, or if a human is watching a spinner, the agent loop will cause problems. A simple prompt chain — fixed steps, predictable latency — is the right answer.
When you can enumerate the logic. I’ve seen teams replace 50 lines of business logic with an agent because it felt more “AI-native.” The business logic was clearer, faster, cheaper, and testable with unit tests. LLM reasoning is not a substitute for code when the rules are known.
When the cost doesn’t pencil out. An agent handling 10,000 requests per day at $0.05 per multi-step run costs $500/day, $15,000/month. A prompt chain at $0.002 per call costs $20/day. Before committing to agent architecture, model the per-request cost at scale. The 10x token multiplier from an agent loop has a real dollar figure.
When failure modes are unacceptable. Agents fail in non-obvious ways. Tool call hallucinations, cascading failures across multiple systems, context degradation mid-task. If a wrong action is irreversible — deleting a record, sending an email, posting a financial transaction — you need either deterministic logic or a mandatory human-in-the-loop approval gate, not an autonomous agent loop.
The honest framework: if you’d naturally describe the task as “run these steps in this order with this data,” build a service. If you’d describe it as “figure out what needs to happen and do it,” consider an agent.
Real-World Example: FNOL Claims Triage#
FNOL (First Notice of Loss) is the first contact a policyholder makes after an incident. Getting it right matters: fast routing reduces claim cycle time, accurate triage prevents underpayment, and missed information early in the process causes work 10 steps downstream.
Here’s why FNOL is a textbook agent use case:
- The input is unstructured: phone calls, web forms, photos, emails, chatbot transcripts
- The correct next step depends on intermediate findings: what type of incident, what the policy covers, whether there’s an active claim, what the damage severity estimate is
- The sequence of operations varies: a hail damage claim follows a different path than a liability claim
- Multiple systems must be queried: policy database, claims history, fraud signals, routing rules
A deterministic pipeline can’t handle this. The branching logic is too complex, and the input variability means you’d be writing conditional branches indefinitely. An agent handles it naturally.
Here’s the architecture:
In production, this agent runs asynchronously. The policyholder gets an acknowledgment immediately; the agent works in the background, typically completing in 20–40 seconds across 5–8 tool calls. The output is a structured claim record with routing decision, supporting evidence, and an audit trail of every tool call and reasoning step.
What makes this work: the tools are scoped, typed, and return structured data. The agent never writes to any system until the final routing step — all intermediate calls are read-only. Human review is built into the “complex” path, not bolted on as an afterthought.
A regional insurer using a similar architecture reported processing 12,000 FNOL submissions in six weeks, routing 83% to the correct queue automatically, and cutting first-contact time from 48 hours to 6. That’s the kind of result that justifies the agent complexity — because the problem is genuinely variable, the inputs are unstructured, and the correct path depends on what you find along the way.
Where to Go from Here#
The concepts in this post — the core loop, ReAct, Plan-and-Execute, tool design, state management — all have concrete implementations in LangGraph, which is the framework I use for Python-based agents and multi-agent systems.
The next post in this series covers LangGraph fundamentals: how to model agent state as a typed schema, how nodes and edges map to the Perceive-Reason-Act loop, and how to build the FNOL triage agent from scratch with proper error handling and observability. If you want to move from the conceptual framing here to running code, that’s the next step.

