Agentic AI¶
Traditional AI applications take a prompt and return a response. Agentic AI goes further -- it can plan, use tools, make decisions, and take multi-step actions to accomplish goals. This page covers the protocols, patterns, and infrastructure that make agentic AI possible.
What Makes AI "Agentic"?¶
A standard LLM interaction is reactive: you ask, it answers. An agentic system is proactive: given a goal, it can break it into steps, decide which tools to use, evaluate its own output, and iterate until the goal is met.
| Characteristic | Standard LLM | Agentic AI |
|---|---|---|
| Interaction | Single turn or multi-turn chat | Autonomous multi-step execution |
| Tool use | None | Calls APIs, searches databases, runs code |
| Planning | None | Breaks goals into subtasks |
| Memory | Limited to context window | Short-term and long-term memory |
| Self-correction | None | Evaluates and revises its own output |
| Decision-making | Follows instructions literally | Chooses between approaches |
Not Everything Needs to Be Agentic
Agentic systems add complexity. If a well-crafted prompt with RAG solves your problem, you do not need an agent. Use agents when the task genuinely requires multi-step reasoning, tool use, or dynamic decision-making.
The Agentic Ecosystem¶
The modern agentic AI ecosystem consists of models, protocols, frameworks, and infrastructure working together:
graph TD
subgraph Protocols
MCP["MCP\n(Model Context Protocol)"]
A2A["A2A\n(Agent-to-Agent)"]
AGUI["AG-UI\n(Agent-User Interface)"]
end
subgraph Frameworks
LG["LangGraph"]
AG["AutoGen"]
SK["Semantic Kernel"]
CW["CrewAI"]
end
subgraph Infrastructure
MEM["Memory\nSystems"]
OBS["Observability\n& Tracing"]
TOOLS["Tool\nRegistries"]
end
subgraph Models
LLM["LLMs\n(GPT, Claude, Gemini)"]
SLM["SLMs\n(Phi, Gemma)"]
end
Models --> Frameworks
Protocols --> Frameworks
Infrastructure --> Frameworks
style MCP fill:#057398,stroke:#004987,color:#fff
style A2A fill:#00A0DF,stroke:#004987,color:#fff
style AGUI fill:#57C0E8,stroke:#004987,color:#000
style LG fill:#632C4F,stroke:#632C4F,color:#fff
style AG fill:#853175,stroke:#632C4F,color:#fff
style SK fill:#9E57A2,stroke:#632C4F,color:#fff
style CW fill:#853175,stroke:#632C4F,color:#fff
style MEM fill:#004987,stroke:#004987,color:#fff
style OBS fill:#057398,stroke:#004987,color:#fff
style TOOLS fill:#00A0DF,stroke:#004987,color:#fff
style LLM fill:#259638,stroke:#259638,color:#fff
style SLM fill:#259638,stroke:#259638,color:#fff
Key Protocols¶
Model Context Protocol (MCP)¶
MCP is an open standard (created by Anthropic) that defines how AI models connect to external tools and data sources. Think of it as a USB-C for AI -- a universal interface that lets any model talk to any tool.
Why it matters:
- Before MCP, every tool integration was custom-built for each model/framework combination.
- With MCP, a tool server built once can work with any MCP-compatible client.
- It standardizes how tools describe their capabilities, accept inputs, and return results.
Key components:
- MCP Server
- Exposes tools, resources, and prompts via a standardized protocol.
- MCP Client
- The AI application that discovers and calls tools from MCP servers.
- Transport
- Communication layer (stdio for local, HTTP with SSE for remote).
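To make the server side concrete, here is a minimal sketch of how an MCP server might describe one of its tools in a `tools/list` response. The tool name and parameters are illustrative, but the three fields shown (`name`, `description`, and a JSON Schema `inputSchema`) follow the shape the protocol defines:

```python
# Illustrative sketch of an MCP tool description. The tool itself
# ("get_weather") is hypothetical; the field names follow the MCP spec's
# tools/list shape: name, description, and a JSON Schema for the inputs.
weather_tool = {
    "name": "get_weather",
    "description": "Return the current weather for a city.",
    "inputSchema": {  # standard JSON Schema describing the arguments
        "type": "object",
        "properties": {
            "city": {"type": "string", "description": "City name"},
        },
        "required": ["city"],
    },
}

def looks_like_mcp_tool(tool: dict) -> bool:
    """Check that a tool description carries the fields MCP clients expect."""
    return all(key in tool for key in ("name", "description", "inputSchema"))

print(looks_like_mcp_tool(weather_tool))  # True
```

Because the schema is self-describing, any MCP client can discover the tool, validate user arguments against `inputSchema`, and call it without model-specific glue code.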
Agent-to-Agent Protocol (A2A)¶
A2A (by Google) enables agents built on different frameworks to communicate with each other. While MCP connects models to tools, A2A connects agents to other agents.
Key concepts:
- Agent Cards: JSON metadata describing what an agent can do (like a business card for agents).
- Tasks: Structured units of work that one agent can send to another.
- Streaming: Support for long-running tasks with progress updates.
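An Agent Card is plain JSON, so another agent can inspect it before delegating work. The sketch below is illustrative, not a normative schema -- the field names loosely follow the published A2A card structure, and the endpoint URL is hypothetical:

```python
# A sketch of an A2A Agent Card: JSON metadata advertising what an agent
# can do. Field names are illustrative; the URL is a made-up example.
agent_card = {
    "name": "order-status-agent",
    "description": "Looks up order status and shipping details.",
    "url": "https://agents.example.com/orders",  # hypothetical endpoint
    "capabilities": {"streaming": True},
    "skills": [
        {
            "id": "order_lookup",
            "description": "Find an order by user ID or order number.",
        }
    ],
}

# A peer agent can check the card before sending a long-running task:
supports_streaming = agent_card["capabilities"].get("streaming", False)
print(supports_streaming)  # True
```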
AG-UI Protocol¶
AG-UI (Agent-User Interface) standardizes the communication between AI agents and frontend interfaces. It defines how agents stream their progress, decisions, and outputs to users in real time.
Why it matters:
- Users need visibility into what agents are doing (not just the final answer).
- AG-UI provides standard events for tool calls, state changes, and intermediate results.
- It enables consistent UX patterns across different agent frameworks.
Tool Use and Function Calling¶
Tool use (also called function calling) is the mechanism that lets an LLM invoke external functions. The model does not execute code directly -- instead, it outputs a structured request (function name + arguments), the application executes it, and the result is fed back to the model.
How It Works¶
1. You define available tools (name, description, parameters) in the system prompt or API call.
2. The model decides whether a tool is needed to answer the user's question.
3. If yes, the model outputs a tool call with arguments.
4. Your application executes the tool and returns the result.
5. The model incorporates the result into its response.
The Model Does Not Execute Tools
The model only decides which tool to call and with what arguments. Your application code is responsible for actually executing the tool. This is an important security boundary.
Agentic Design Patterns¶
ReAct (Reasoning + Acting)¶
The ReAct pattern interleaves reasoning and action. The agent thinks about what to do, takes an action (tool call), observes the result, and then thinks again.
Thought: I need to find the user's order status. I should search the database.
Action: search_orders(user_id="12345")
Observation: Order #789 - Shipped, tracking: XYZ123
Thought: I have the information. I can now respond to the user.
Answer: Your order #789 has been shipped. Tracking number: XYZ123.
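A toy version of that trace can be written as a loop that records each Thought/Action/Observation step. The "thoughts" are scripted here for clarity; a real ReAct agent would get them from the model at every iteration:

```python
# Toy ReAct trace matching the example above. Thoughts are hard-coded;
# search_orders is a stub standing in for a real database tool.

def search_orders(user_id):
    return "Order #789 - Shipped, tracking: XYZ123"   # stub result

def react_agent(goal):
    trace = [("Thought", "I need the order status; search the database.")]
    trace.append(("Action", 'search_orders(user_id="12345")'))
    observation = search_orders("12345")              # act, then observe
    trace.append(("Observation", observation))
    trace.append(("Thought", "I have the information; answer the user."))
    answer = f"Your order has shipped. {observation}"
    return trace, answer

trace, answer = react_agent("Find the user's order status")
for step, text in trace:
    print(f"{step}: {text}")
print("Answer:", answer)
```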
Reflection¶
In the Reflection pattern, an agent evaluates its own output and decides whether to revise it. This is like a built-in code review -- the agent generates a draft, critiques it, and improves it.
Common implementation:
- Generator agent produces initial output.
- Critic agent reviews the output against quality criteria.
- If the critic finds issues, the generator revises.
- This loop repeats until the quality is acceptable or a maximum number of iterations is reached.
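The generator/critic loop above can be sketched as follows. Both roles are plain functions here; in practice each would be a separate LLM call with its own prompt and quality criteria:

```python
# Sketch of the Reflection loop: generate, critique, revise, repeat.
# The drafts and critique rule are illustrative stand-ins for model calls.

def generate(draft=None, feedback=None):
    if feedback is None:
        return "The function sorts a list."            # initial draft
    return draft + " It runs in O(n log n) time."      # revised draft

def critique(draft):
    """Return a list of issues; an empty list means the draft passes."""
    issues = []
    if "O(" not in draft:
        issues.append("Missing complexity analysis.")
    return issues

def reflect(max_iterations=3):
    draft = generate()
    for _ in range(max_iterations):
        issues = critique(draft)
        if not issues:
            break                    # quality acceptable -- stop looping
        draft = generate(draft, issues)
    return draft

print(reflect())
```

The `max_iterations` cap matters: without it, a critic that is never satisfied would loop (and spend tokens) forever.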
Supervisor / Router¶
A Supervisor agent acts as a coordinator. It receives a user request, decides which specialized agent should handle it, routes the task, and aggregates results.
graph TD
U["User\nRequest"] --> S["Supervisor\nAgent"]
S -->|"Code question"| C["Coding\nAgent"]
S -->|"Data question"| D["Data\nAgent"]
S -->|"Writing task"| W["Writing\nAgent"]
C --> S
D --> S
W --> S
S --> R["Final\nResponse"]
style U fill:#057398,stroke:#004987,color:#fff
style S fill:#632C4F,stroke:#632C4F,color:#fff
style C fill:#00A0DF,stroke:#004987,color:#fff
style D fill:#00A0DF,stroke:#004987,color:#fff
style W fill:#00A0DF,stroke:#004987,color:#fff
style R fill:#259638,stroke:#259638,color:#fff
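In code, the supervisor reduces to a classification step plus a dispatch table. In the sketch below, `classify` is a keyword stub standing in for an LLM routing decision, and the specialists are stub functions:

```python
# Sketch of the Supervisor/Router pattern. classify() stands in for an LLM
# routing call; the specialist agents are stubs returning labeled strings.

SPECIALISTS = {
    "code": lambda q: f"[coding agent] answer to: {q}",
    "data": lambda q: f"[data agent] answer to: {q}",
    "writing": lambda q: f"[writing agent] answer to: {q}",
}

def classify(request: str) -> str:
    """Stand-in for the supervisor's LLM routing decision."""
    if "bug" in request or "function" in request:
        return "code"
    if "query" in request or "dataset" in request:
        return "data"
    return "writing"

def supervisor(request: str) -> str:
    route = classify(request)
    result = SPECIALISTS[route](request)       # delegate to the specialist
    return f"Supervisor aggregated: {result}"  # results flow back up

print(supervisor("Why does this function crash?"))
```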
Handoff¶
In a Handoff pattern, one agent transfers control to another when the task moves outside its area of expertise. Unlike a supervisor that routes upfront, handoff happens mid-conversation.
Example: A customer service agent handles a general inquiry, then hands off to a billing specialist agent when the conversation shifts to payment issues.
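The billing example can be sketched as two functions, where the general agent detects that the topic has moved outside its expertise and transfers control mid-conversation. The trigger keywords and agent names are illustrative:

```python
# Sketch of a mid-conversation handoff. The general agent answers until the
# topic turns to billing, then transfers control. Names are illustrative.

def billing_agent(message):
    return "billing", f"Billing specialist: let me look into that ({message})."

def general_agent(message):
    if "payment" in message or "charge" in message:
        # Outside our expertise -- hand off rather than guess.
        return billing_agent(message)
    return "general", f"General support: happy to help with '{message}'."

active, reply = general_agent("How do I reset my password?")
print(active, "->", reply)
active, reply = general_agent("Why was my card charged twice?")
print(active, "->", reply)
```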
Agent Memory¶
Agents need memory to maintain context across interactions and learn from past experience.
Short-Term Memory¶
- Stored within the current conversation or session.
- Typically the messages in the LLM's context window.
- Lost when the session ends.
- Example: Remembering what the user said three messages ago.
Long-Term Memory¶
- Persisted across sessions in an external store (database, vector store, file system).
- Allows agents to remember user preferences, past interactions, and learned facts.
- Must be explicitly managed (what to store, when to retrieve, when to forget).
- Example: Remembering that the user prefers Python over JavaScript.
Memory Is Not Free
Every piece of information stored in memory costs tokens when retrieved. Be selective about what goes into long-term memory. Store summaries and key facts, not raw transcripts.
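The short-term/long-term split can be sketched as a small class. Long-term storage here is an in-memory dict for illustration; in practice it would be a database or vector store, and `remember` is a hypothetical helper showing that promotion to long-term memory is an explicit decision:

```python
import time

# Sketch of agent memory: a session buffer (short-term) plus an explicitly
# managed fact store (long-term). The dict stands in for a real database.

class AgentMemory:
    def __init__(self):
        self.session = []     # short-term: messages in the current session
        self.long_term = {}   # long-term: key facts, not raw transcripts

    def add_message(self, role, content):
        self.session.append({"role": role, "content": content,
                             "ts": time.time()})

    def remember(self, key, fact):
        """Explicitly promote a key fact to long-term memory."""
        self.long_term[key] = fact

    def end_session(self):
        self.session = []     # short-term context is lost; facts survive

memory = AgentMemory()
memory.add_message("user", "I prefer Python over JavaScript.")
memory.remember("preferred_language", "Python")
memory.end_session()
print(memory.long_term["preferred_language"])  # Python
```

Storing the distilled fact (`preferred_language: Python`) rather than the full message is exactly the selectivity the warning above calls for.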
Human-in-the-Loop¶
Not every decision should be automated. Human-in-the-loop (HITL) patterns ensure that a human reviews and approves critical actions before they are executed.
When to use HITL:
- Actions with real-world consequences (sending emails, making purchases, modifying data)
- High-stakes decisions (financial transactions, medical recommendations)
- When confidence is low (the agent is unsure about its plan)
- When regulatory requirements demand human oversight
Implementation approaches:
- Approval gates: The agent pauses and asks for confirmation before executing a tool.
- Review queues: Actions are queued for human review before execution.
- Escalation: The agent recognizes when it is out of its depth and escalates to a human.
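An approval gate can be sketched as a wrapper around tool execution: risky tools pause for a human yes/no before running. The risky-tool list and tool names below are illustrative:

```python
# Sketch of an approval gate. Tools on the risky list pause for human
# confirmation; the tool names and the reviewer hook are illustrative.

RISKY_TOOLS = {"send_email", "make_purchase", "delete_record"}

def execute_tool(name, args, approve=input):
    """Run a tool, gating risky ones behind a human yes/no prompt."""
    if name in RISKY_TOOLS:
        answer = approve(f"Agent wants to call {name}({args}). Approve? [y/N] ")
        if answer.strip().lower() != "y":
            return {"status": "rejected", "tool": name}
    return {"status": "executed", "tool": name}

# Simulate a human reviewer saying no, instead of blocking on stdin:
result = execute_tool("send_email", {"to": "boss@example.com"},
                      approve=lambda prompt: "n")
print(result)  # {'status': 'rejected', 'tool': 'send_email'}
```

Defaulting to rejection on anything other than an explicit "y" keeps the gate fail-safe.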
Observability and Tracing¶
Agentic systems are harder to debug than simple API calls. An agent might make a dozen tool calls, revise its plan three times, and route through multiple sub-agents before producing a response. Observability gives you visibility into this process.
What to Trace¶
- Agent decisions: Why did the agent choose this tool? Why did it route to this sub-agent?
- Tool calls: What was called, with what arguments, what was returned, how long did it take?
- Token usage: How many tokens were consumed at each step?
- Latency breakdown: Where is time being spent?
- Errors and retries: What failed and how did the agent recover?
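A minimal way to capture most of the above is a tracing decorator around every tool function. Real systems would emit OpenTelemetry spans to a backend; this sketch just collects records in a list, and the traced tool is a stub:

```python
import functools
import time

# Sketch of tool-call tracing: a decorator that records tool name, arguments,
# latency, and errors. TRACE stands in for a real telemetry backend.

TRACE = []

def traced(fn):
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        error = None
        try:
            return fn(*args, **kwargs)
        except Exception as exc:              # record failures too
            error = repr(exc)
            raise
        finally:
            TRACE.append({
                "tool": fn.__name__,
                "args": args,
                "kwargs": kwargs,
                "latency_s": time.perf_counter() - start,
                "error": error,
            })
    return wrapper

@traced
def search_orders(user_id):
    return {"order": 789, "status": "shipped"}    # stub tool

search_orders("12345")
print(TRACE[0]["tool"], f"{TRACE[0]['latency_s']:.6f}s")
```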
Tools for Observability¶
| Tool | Type | Key Features |
|---|---|---|
| LangSmith | Managed service | Deep LangChain/LangGraph integration, evaluation |
| Azure AI Foundry Tracing | Managed service | Built into Azure AI, end-to-end traces |
| Phoenix (Arize) | Open source | Model-agnostic, real-time monitoring |
| OpenLLMetry | Open source | OpenTelemetry-based, vendor-neutral |
Deterministic vs Non-Deterministic Workflows¶
| Aspect | Deterministic | Non-Deterministic |
|---|---|---|
| Flow | Predefined sequence of steps | Agent decides the path dynamically |
| Predictability | Same input always produces same flow | Flow may vary between runs |
| Use case | Structured processes (approvals, pipelines) | Open-ended tasks (research, analysis) |
| Debugging | Easier -- follow the fixed path | Harder -- need observability |
| Example | "Extract data, validate, save to DB" | "Research this topic and write a report" |
Start Deterministic, Add Agency Gradually
Build your workflow as a deterministic pipeline first. Then identify specific decision points where the LLM should choose the path. This hybrid approach gives you predictability where you need it and flexibility where it helps.
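The hybrid approach can be sketched as a fixed pipeline with one LLM-driven decision point. Everything here is a stub: `llm_choose` stands in for a model call, and the step functions are placeholders for real pipeline stages:

```python
# Sketch of the hybrid approach: a deterministic pipeline with a single
# agentic decision point. All functions are illustrative stubs.

def extract(doc):      return {"text": doc.strip()}
def validate(record):  return len(record["text"]) > 0
def save(record):      return f"saved: {record['text'][:20]}"
def escalate(record):  return "sent to human review"

def llm_choose(record):
    """The one non-deterministic step: a model decides save vs. escalate."""
    return "save" if "invoice" in record["text"] else "escalate"

def pipeline(doc):
    record = extract(doc)           # deterministic
    if not validate(record):        # deterministic
        return "rejected"
    action = llm_choose(record)     # agentic decision point
    return save(record) if action == "save" else escalate(record)

print(pipeline("  invoice #42 from ACME  "))
```

Because only `llm_choose` varies between runs, debugging stays close to the deterministic case: every other step is reproducible, and observability effort can focus on the single decision point.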
Orchestration Frameworks¶
| Framework | Maintainer | Key Strengths |
|---|---|---|
| LangGraph | LangChain | Graph-based workflows, streaming, persistence, human-in-the-loop |
| AutoGen | Microsoft | Multi-agent conversations, code execution, group chat |
| Semantic Kernel | Microsoft | Enterprise-ready, .NET and Python, planner-based agents |
| CrewAI | CrewAI | Role-based agents, easy-to-define crews and tasks |
Each framework has its own philosophy. LangGraph is graph-first (you define nodes and edges). AutoGen is conversation-first (agents talk to each other). Semantic Kernel is plugin-first (you compose capabilities). CrewAI is role-first (you define agent personas).