Series note — Part of MAF v1: Python and .NET. The Python-only Part 7 — Production Readiness covered HTTP middleware for auth; this chapter is about the three agent-level middleware types that sit inside the agent: `AgentMiddleware`, `FunctionMiddleware`, and `ChatMiddleware`. Those three form what MAF calls the agent pipeline, and every production concern in the capstone — logging, tool audit, tool approval, content safety, PII redaction — is implemented as one of them.
Repo — Full runnable code for this chapter is at https://github.com/nitin27may/e-commerce-agents/tree/main/tutorials/06-middleware. Clone the repo, `cd tutorials/06-middleware`, and follow the README.
## Why this chapter
A stateless chat agent is a nice demo. A production agent needs observability, audit, approval gates, and content safety — and you don’t want any of that hard-coded into the tool bodies themselves. MAF’s answer is middleware: code that wraps an agent run at one of three clearly-defined layers.
Once the three layers click you’ll stop writing custom one-offs and start composing. The capstone’s shared/middleware.py ships three stock pieces (agent-run logger, tool audit, PII redactor) and every specialist picks them up by default. That file is basically a longer version of the example we build here.
## Prerequisites
- Completed Chapter 05 — Context Providers.
- `.env` at the repo root with `OPENAI_API_KEY` or the Azure OpenAI trio.
- Read-first (optional): Agents — Middleware, Agents — Agent Pipeline, Journey — Adding Middleware.
## The concept
A MAF agent doesn’t go straight to the LLM. A call to run() / RunAsync() descends through a stack of decorators — the agent pipeline — before anything hits the wire, and every response climbs back up through the same stack. You plug middleware in at one of three layers:
- Agent-run middleware (`AgentMiddleware`) — wraps the entire `run()` call. Runs once per turn. Use for high-level concerns: correlation ids, start/finish log lines, top-of-stack tracing spans, run-wide rate limits, global short-circuits (e.g., “deny this user outright”).
- Function-calling middleware (`FunctionMiddleware`) — wraps each individual tool invocation. May run zero, one, or many times per `run()` depending on how many tool calls the LLM emits. Use for per-tool concerns: approval gates, argument validation, auditing, result transformation.
- Chat-client middleware (`ChatMiddleware` in Python, `DelegatingChatClient` in .NET) — wraps the raw LLM call. Runs once per LLM turn (and there can be several in a single run — tool-calling loop, remember). Use for message-level concerns: PII redaction, prompt-injection filters, content moderation, response caching, model routing.
The three layers are orthogonal: they wrap different things, see different contexts, and fire at different cadences. The trick is composing them.
Nesting, not chaining. AgentMiddleware wraps FunctionMiddleware wraps ChatMiddleware wraps the LLM. A request flows inward through all three, the LLM responds, and control bubbles back out through all three in reverse. Each layer sees a different context object with a different slice of the call.
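The nesting is easy to see in a toy sketch — this is not MAF code, just three async wrappers composed the way the pipeline composes its layers:

```python
import asyncio

def make_layer(name, events):
    """Return a decorator that wraps an inner coroutine function
    with before/after markers, like one middleware layer."""
    def wrap(inner):
        async def layer(*args):
            events.append(f"{name}:before")   # request flowing inward
            result = await inner(*args)
            events.append(f"{name}:after")    # response bubbling back out
            return result
        return layer
    return wrap

async def llm(prompt):
    return f"echo:{prompt}"

events: list[str] = []
# AgentMiddleware wraps FunctionMiddleware wraps ChatMiddleware wraps the LLM.
pipeline = make_layer("agent", events)(
    make_layer("function", events)(
        make_layer("chat", events)(llm)))

result = asyncio.run(pipeline("hi"))
print(result)   # → echo:hi
print(events)   # agent/function/chat on the way in, reverse on the way out
```

Running this prints the markers in exactly the order the chapter describes: `agent:before`, `function:before`, `chat:before`, then the same three `:after` markers in reverse.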
## Layer responsibilities at a glance
| Layer | Wraps | Fires per run | Context object | Typical uses |
|---|---|---|---|---|
| `AgentMiddleware` | The whole `run()` | Once | `AgentContext` (agent, metadata) | Correlation id, run-wide logging, tracing spans, global auth check, top-level short-circuit |
| `FunctionMiddleware` | One tool call | 0..N | `FunctionInvocationContext` (function, arguments, result) | Tool approval gate, argument validation, audit log, result rewriting |
| `ChatMiddleware` / `DelegatingChatClient` | One LLM round-trip | 1..N (once per tool-calling iteration) | `ChatContext` (messages) / raw `ChatMessage` list | PII redaction, content safety, caching, model routing, prompt-injection filters |
`AgentMiddleware` sees the run as a whole. `FunctionMiddleware` sees individual tool arguments and can set the result directly. `ChatMiddleware` sees raw messages on the wire.
## Jargon recap
- Agent pipeline — the composed stack of middleware around an agent’s LLM and tool calls. The word pipeline is the one Microsoft’s docs use; “middleware chain” and “agent pipeline” mean the same thing in MAF.
- `AgentMiddleware` (Python) / run-level `.Use(runFunc, runStreamingFunc)` (.NET) — wraps one whole `run()` invocation.
- `FunctionMiddleware` (Python) / function-level `.Use(functionCallingFunc)` (.NET) — wraps each tool call.
- `ChatMiddleware` (Python) / `DelegatingChatClient` (.NET) — wraps the raw LLM call at `IChatClient` level.
- `DelegatingChatClient` (.NET) — base class for chat middleware. Override `GetResponseAsync` and `GetStreamingResponseAsync`; call `base.<method>` to pass through, or return a cached response to short-circuit.
- `.AsBuilder().Use(...)` (.NET) — the fluent API for attaching middleware. Returns a new, decorated agent; does not mutate the original. The most common .NET gotcha in this series — keep reading.
- Short-circuit / `MiddlewareTermination` — a middleware that sets a result and returns (or raises in Python) to skip downstream middleware and the LLM call. How approval gates work.
- `context.result` — the outgoing result carried in a middleware context. Setting it and returning without awaiting `call_next()` is how you short-circuit a tool or chat call with a canned response.
- Tool approval — a middleware pattern where a destructive tool is paused, surfaced to a human for confirmation, and only executed after the human approves. Implemented via `FunctionMiddleware`.
## Code walkthrough
Full source: dotnet/Program.cs. .NET builds two pipelines — the IChatClient pipeline (chat layer) and the AIAgent pipeline (agent-run and function layers) — which compose naturally via .AsBuilder().Use(...). This chapter demonstrates chat-layer middleware in full and sketches the agent-run layer; the capstone’s shared agent factory (agents/dotnet/src/ECommerceAgents.Shared/Middleware/) has all three.
```csharp
// dotnet/Program.cs (excerpt)
using System.Text.RegularExpressions;
using Microsoft.Extensions.AI;
using Microsoft.Agents.AI;

// ChatMiddleware via DelegatingChatClient subclass.
public sealed class PiiRedactingChatClient : DelegatingChatClient
{
    private readonly Regex _pattern;
    private readonly Stats _stats;

    public PiiRedactingChatClient(IChatClient inner, Stats stats, Regex pattern) : base(inner)
    {
        _stats = stats;
        _pattern = pattern;
    }

    public override Task<ChatResponse> GetResponseAsync(
        IEnumerable<ChatMessage> messages, ChatOptions? options = null, CancellationToken ct = default)
    {
        Redact(messages);
        return base.GetResponseAsync(messages, options, ct);
    }

    public override IAsyncEnumerable<ChatResponseUpdate> GetStreamingResponseAsync(
        IEnumerable<ChatMessage> messages, ChatOptions? options = null, CancellationToken ct = default)
    {
        Redact(messages); // ← both paths must be wrapped
        return base.GetStreamingResponseAsync(messages, options, ct);
    }

    private void Redact(IEnumerable<ChatMessage> messages) { /* replaces TextContent in-place */ }
}
```
```csharp
// Build the IChatClient pipeline, wrap the chat layer with the delegating client.
IChatClient pipeline = rawChat.AsIChatClient()
    .AsBuilder()
    .Use(inner => new PiiRedactingChatClient(inner, stats, CardPattern))
    .Build();

// Tool with an inline argument guard (FunctionMiddleware-equivalent).
var weather = AIFunctionFactory.Create(
    ([Description("The city to look up.")] string city) =>
    {
        stats.ToolInvocations.Add(city);
        if (string.Equals(city, "Atlantis", StringComparison.OrdinalIgnoreCase))
        {
            stats.BlockedTools.Add(city);
            return "Refused: that city isn't supported."; // canned refusal
        }
        return LookupWeather(city);
    }, name: "get_weather", description: "Look up the current weather for a city.");

var agent = new ChatClientAgent(pipeline, new ChatClientAgentOptions
{
    Name = "middleware-agent",
    ChatOptions = new ChatOptions { Instructions = Instructions, Tools = new[] { (AITool)weather } },
});
```

For agent-run middleware, .NET exposes `AIAgentBuilder.Use(runFunc, runStreamingFunc)`:
```csharp
// Agent-run middleware — wraps the whole RunAsync.
AIAgent wrapped = baseAgent
    .AsBuilder()
    .Use(
        runFunc: async (messages, thread, options, innerAgent, ct) =>
        {
            var runId = Guid.NewGuid().ToString("N")[..8];
            logger.LogInformation("agent.start run_id={RunId}", runId);
            var response = await innerAgent.RunAsync(messages, thread, options, ct);
            logger.LogInformation("agent.finish run_id={RunId}", runId);
            return response;
        },
        runStreamingFunc: (messages, thread, options, innerAgent, ct) =>
            // same shape on the streaming path — log, then delegate
            innerAgent.RunStreamingAsync(messages, thread, options, ct))
    .Build();
```

Full source: python/main.py. We build one tiny agent with one weather tool and three middleware — one of each layer. Key lines:
```python
# python/main.py (excerpt)
import re

from agent_framework import Agent, tool
from agent_framework._middleware import (
    AgentContext, AgentMiddleware,
    ChatContext, ChatMiddleware,
    FunctionInvocationContext, FunctionMiddleware,
)

_CARD_RE = re.compile(r"\b\d{4}[- ]?\d{4}[- ]?\d{4}[- ]?\d{4}\b")

class LoggingAgentMiddleware(AgentMiddleware):
    """AgentMiddleware — wraps the whole run()."""

    def __init__(self) -> None:
        self.events: list[str] = []

    async def process(self, context: AgentContext, call_next) -> None:
        self.events.append("agent:before")
        await call_next()
        self.events.append("agent:after")

class ArgValidatorMiddleware(FunctionMiddleware):
    """FunctionMiddleware — wraps each tool call. Short-circuits a forbidden city."""

    FORBIDDEN_CITY = "Atlantis"

    async def process(self, context: FunctionInvocationContext, call_next) -> None:
        city = (context.arguments or {}).get("city", "")
        if city.lower() == self.FORBIDDEN_CITY.lower():
            context.result = "Refused: that city isn't supported."
            return  # short-circuit — real tool body never runs
        await call_next()

class PiiRedactionChatMiddleware(ChatMiddleware):
    """ChatMiddleware — wraps the LLM call. Redacts card numbers in outbound messages."""

    async def process(self, context: ChatContext, call_next) -> None:
        for message in context.messages:
            for content in message.contents:
                text = getattr(content, "text", None)
                if text and _CARD_RE.search(text):
                    content.text = _CARD_RE.sub("[REDACTED-CARD]", text)
        await call_next()

agent = Agent(
    client,
    instructions=INSTRUCTIONS,
    name="middleware-agent",
    tools=[get_weather],
    middleware=[
        LoggingAgentMiddleware(),       # agent-run
        ArgValidatorMiddleware(),       # function
        PiiRedactionChatMiddleware(),   # chat
    ],
)
```

Three things worth staring at:
- `middleware=[...]` is one flat list. MAF inspects each entry’s base class and routes it to the right layer in the pipeline. Order within a layer matters (first listed wraps outermost); order across layers does not — `AgentMiddleware` always wraps `FunctionMiddleware`, which always wraps `ChatMiddleware`, no matter how you list them.
- Short-circuit via `context.result`. Setting `context.result` and returning without awaiting `call_next()` skips the real tool body and feeds the canned refusal back to the LLM as if the tool had returned it. No exception, no retry, no mess. This is the same mechanism tool-approval gates use.
- Mutating messages in place. MAF’s content objects are writeable in v1; the capstone’s `PiiRedactionMiddleware` (agents/python/shared/middleware.py:127-158) has a defensive variant that catches `AttributeError` on frozen content types.
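The short-circuit mechanics can be exercised without MAF at all — here `FunctionInvocationContext` is faked with a `SimpleNamespace` carrying the same `arguments`/`result` attributes the middleware reads, a sketch in the spirit of the capstone’s fake-context unit tests:

```python
import asyncio
from types import SimpleNamespace

class ArgValidator:
    """Same logic as the chapter's ArgValidatorMiddleware, minus the MAF base class."""
    FORBIDDEN_CITY = "atlantis"

    async def process(self, context, call_next):
        city = (context.arguments or {}).get("city", "")
        if city.lower() == self.FORBIDDEN_CITY:
            context.result = "Refused: that city isn't supported."
            return                      # real tool body never runs
        await call_next()

calls: list[str] = []
async def real_tool():                  # stands in for call_next()
    calls.append("tool ran")

blocked = SimpleNamespace(arguments={"city": "Atlantis"}, result=None)
asyncio.run(ArgValidator().process(blocked, real_tool))

allowed = SimpleNamespace(arguments={"city": "Oslo"}, result=None)
asyncio.run(ArgValidator().process(allowed, real_tool))

print(blocked.result)  # canned refusal; tool body was skipped
print(calls)           # only the Oslo call reached the tool
```

The blocked context carries the refusal string in `result` while `calls` shows the tool body ran exactly once — the pass-through path.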
## Agent Pipeline — naming the concept
The three layers we just wired up are, collectively, the agent pipeline. MAF’s docs use exactly that phrase — see Agents — Agent Pipeline. It’s worth internalising the term because the rest of the series (“the pipeline builds this agent”, “wired into the pipeline”, “pipeline ordering”) will assume you know what it means.
Mental model: an agent is a ChatClientAgent (or Agent in Python), and the pipeline is everything between your call site and the raw chat client. You compose a pipeline by decorating a base agent with .AsBuilder().Use(...) (.NET) or by passing a middleware=[...] list (Python). The result is a new AIAgent whose RunAsync transparently goes through your stack.
Two pipeline rules to keep in your head:
- Layers are ordered by type, not by listing. No matter how you order your `middleware=[...]` list, `AgentMiddleware` always wraps `FunctionMiddleware`, which always wraps `ChatMiddleware`. The type itself picks the layer.
- Within a layer, first listed is outermost. Two `AgentMiddleware` instances in the same list wrap each other in declaration order. That’s the knob for tracing-before-auth versus auth-before-tracing arguments.
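A toy version of the routing rule — not MAF internals, just a sketch of “sort by layer type, keep declaration order within a layer, then compose”:

```python
import asyncio

LAYER_RANK = {"agent": 0, "function": 1, "chat": 2}   # outermost → innermost

def compose(flat_list, events):
    """Route a flat middleware list by layer type, then declaration order."""
    ordered = sorted(flat_list, key=lambda m: LAYER_RANK[m[0]])  # stable sort
    async def core():
        events.append("llm")
    nxt = core
    for layer, name in reversed(ordered):              # build innermost-first
        async def step(inner=nxt, tag=f"{layer}:{name}"):
            events.append(tag)
            await inner()
        nxt = step
    return nxt

events: list[str] = []
# Listed chat-first, but the agent layer still ends up outermost,
# and the two agent middlewares keep their declaration order.
pipeline = compose(
    [("chat", "pii"), ("agent", "tracing"), ("agent", "auth"), ("function", "audit")],
    events)
asyncio.run(pipeline())
print(events)
```

Because Python’s `sorted` is stable, the two agent entries stay in declaration order while the layer rank forces agent → function → chat regardless of how the flat list was written.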
## Tool Approval — short-circuiting from `FunctionMiddleware`
Destructive tools — cancel order, initiate refund, delete account — shouldn’t run on autopilot. FunctionMiddleware is the mechanism: intercept the call, surface a pending approval to a human, and only let execution proceed once you have an answer.
The minimum viable gate:
```python
import hashlib

class ApprovalGateMiddleware(FunctionMiddleware):
    """Block tools flagged 'destructive' until an out-of-band approval lands."""

    DESTRUCTIVE = {"cancel_order", "initiate_refund", "delete_account"}

    def __init__(self, approvals: dict[str, bool]) -> None:
        self.approvals = approvals  # keyed by correlation id, filled by the UI
        self.pending: list[dict] = []

    async def process(self, context: FunctionInvocationContext, call_next) -> None:
        tool_name = getattr(context.function, "name", "")
        if tool_name not in self.DESTRUCTIVE:
            await call_next()
            return
        # Deterministic correlation id: the same tool + arguments map to the
        # same id, so a retried call can find the approval the UI recorded.
        # (A random uuid here would mint a fresh id per attempt and the
        # lookup below would never hit.)
        key = f"{tool_name}:{sorted((context.arguments or {}).items())}"
        correlation_id = hashlib.sha1(key.encode()).hexdigest()[:8]
        if self.approvals.get(correlation_id) is True:
            await call_next()  # human said yes
            return
        # Short-circuit: record the pending request, return a canned result.
        self.pending.append({
            "correlation_id": correlation_id,
            "tool": tool_name,
            "arguments": dict(context.arguments or {}),
        })
        context.result = (
            f"pending_approval:{correlation_id} — awaiting human confirmation "
            f"for {tool_name}. Reply with approval to proceed."
        )
```

Three things this buys you:
- No tool code changes. The `cancel_order` function doesn’t know it was gated. The gate lives entirely in the pipeline.
- Structured result, not an exception. Setting `context.result` feeds a string back to the LLM, which can tell the user “I’ve queued your cancellation for approval” naturally. An exception would abort the run.
- Auditability for free. `self.pending` is a record of every gated call; log it and you have a complete tool-approval audit trail.
The user-facing half of this (a UI that shows the approval, a resume path that flips the flag) is covered in Chapter 17 — Human in the Loop. For the gate itself, FunctionMiddleware is the whole story.
In .NET the same pattern lives on FunctionInvokingChatClient hooks or a guard baked into the tool body. The capstone’s ToolAuditMiddleware (agents/dotnet/src/ECommerceAgents.Shared/Middleware/ToolAuditMiddleware.cs) is the audit half of approval; add a pending check before await body() and you have the gate.
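The full approve-then-retry loop can be driven end-to-end with a self-contained toy — fake contexts, and a deterministic correlation id derived from tool + arguments so the retried call finds its approval (an assumption of this sketch, not the capstone’s code):

```python
import asyncio
import hashlib
from types import SimpleNamespace

class ApprovalGate:
    DESTRUCTIVE = {"cancel_order"}

    def __init__(self):
        self.approvals: dict[str, bool] = {}   # flipped by the approval UI
        self.pending: list[str] = []

    async def process(self, context, call_next):
        if context.function.name not in self.DESTRUCTIVE:
            await call_next()
            return
        # Same tool + same arguments → same id, so a retry finds the approval.
        key = f"{context.function.name}:{sorted((context.arguments or {}).items())}"
        cid = hashlib.sha1(key.encode()).hexdigest()[:8]
        if self.approvals.get(cid):
            await call_next()                  # human said yes
            return
        self.pending.append(cid)
        context.result = f"pending_approval:{cid}"

gate = ApprovalGate()
ran: list[str] = []
async def cancel_order():
    ran.append("cancelled")

def make_ctx():
    return SimpleNamespace(function=SimpleNamespace(name="cancel_order"),
                           arguments={"order_id": "A-42"}, result=None)

first = make_ctx()
asyncio.run(gate.process(first, cancel_order))   # gated: tool never runs
gate.approvals[gate.pending[0]] = True           # human clicks "approve"
retry = make_ctx()
asyncio.run(gate.process(retry, cancel_order))   # now it executes
print(first.result, ran)
```

The first call short-circuits with a `pending_approval:` result and the tool body never fires; after the approval flag flips, the identical retry passes straight through to `cancel_order`.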
## Agent Safety — content filtering at the chat layer
PII redaction is the safety example we already showed, but ChatMiddleware generalises. Anything you want to enforce on raw LLM traffic — prompt-injection filters, content moderation calls, jailbreak resistance, model routing on a per-message basis — belongs here.
Three variants all built on the same DelegatingChatClient shape:
- Input filtering — scan outbound messages for policy violations (PII, profanity, prompt-injection markers) and either redact, short-circuit with a canned refusal, or pass through. `PiiRedactionMiddleware` at agents/python/shared/middleware.py:127-158 is the redact variant.
- Output filtering — scan the LLM’s response after `await base.GetResponseAsync(...)` returns, before the response reaches your agent. Good for stripping secrets the model might have echoed back.
- Routing — inspect the message, pick a different `IChatClient` (cheaper model for trivial prompts, stronger model for complex ones), and forward the call. Microsoft’s docs call this out in the Middleware page.
The important design principle: the LLM never sees raw user input directly. It sees whatever the chat-layer pipeline hands to it. That’s the safety boundary — once you internalise it, questions like “where do I enforce content policy?” have a single answer: chat-layer middleware.
Stacking safety layers is fine and common. The capstone runs a single PiiRedactionMiddleware; a production deployment would add a content-moderation call and a prompt-injection scanner on top, all as separate ChatMiddleware instances in the same pipeline.
## Side-by-side — Python vs .NET
| Aspect | Python | .NET |
|---|---|---|
| Abstraction | Three base classes: AgentMiddleware, FunctionMiddleware, ChatMiddleware | One base class (DelegatingChatClient) plus AIAgentBuilder.Use(runFunc, runStreamingFunc) / .Use(functionCallingFunc) |
| Registration | Agent(..., middleware=[...]) — flat list, layer inferred from type | .AsBuilder().Use(...) per layer, fluent chain, returns a new agent |
| Short-circuit | Set context.result and return without awaiting call_next() | Return a cached response; don’t call base.GetResponseAsync(...) |
| Ordering within layer | First listed wraps outermost | First .Use() on the builder wraps outermost |
| Streaming | Not split — process handles both paths | Both runFunc and runStreamingFunc must be wired, same for GetStreamingResponseAsync on DelegatingChatClient |
| Mutation | Mutate context.messages / context.arguments / context.result in place | Replace items in message.Contents; new TextContent(...) instances |
| Context shape | AgentContext / FunctionInvocationContext / ChatContext | Raw IEnumerable<ChatMessage> + ChatOptions at chat layer |
| DI integration | Instantiate and pass | Register as DelegatingChatClient service; .AsBuilder().Use(inner => sp.GetRequiredService<T>(...)) |
Structurally identical in intent, different in plumbing. Python hangs everything off three abstract classes with explicit layer semantics. .NET reuses the IChatClient / AIAgent delegation pattern at different layers.
## Gotchas
- `.AsBuilder().Use(...)` returns a NEW agent. This trips people up constantly. `var wrapped = agent.AsBuilder().Use(...).Build();` does not mutate `agent`. You must use `wrapped`. Calling `agent.AsBuilder().Use(...).Build()` and never storing the result is a silent no-op — your middleware runs zero times and you spend an afternoon wondering why.
- Both `runFunc` and `runStreamingFunc` must be provided in .NET. `AIAgentBuilder.Use(runFunc, runStreamingFunc)` takes two delegates because `RunAsync` and `RunStreamingAsync` are distinct code paths. Wire only one and the other silently bypasses your middleware. The capstone’s `AgentRunLogger` implements both sides.
- Layer type determines position — not list order. In Python, listing `[chat, function, agent]` still produces the pipeline agent → function → chat → LLM. This is intentional; it makes the pipeline deterministic regardless of how the middleware list is composed. You only control order within a layer.
- Don’t carry state across runs unless you mean to. A counter on a middleware instance accumulates across every run that instance handles. Our tests build a fresh middleware per test; production code shares a singleton deliberately for metrics (PII redaction count, tool-invocation totals).
- The arguments dict isn’t mutable on every backend. To short-circuit a tool call in Python, set `context.result` rather than mutating `context.arguments`. The Pydantic models MAF uses internally may be frozen.
- `DelegatingChatClient` must call `base.GetResponseAsync(...)` (or return a response). Forgetting to call the inner client hangs the run — the `ChatClientAgent` is waiting for a response that never arrives.
- Tool body guards are NOT the same as `FunctionMiddleware`. A guard baked into the tool function runs after any function-level middleware and doesn’t participate in the pipeline’s observability. If you want to audit guarded refusals uniformly, the guard belongs in middleware.
- Exceptions in middleware propagate to the run. If your `AgentMiddleware` raises, `agent.run()` raises. Use `try/finally` if you need the finish-log to fire on failure; our capstone `AgentRunLogger` does exactly that.
- Ordering `AgentRunLogger` before `PiiRedactionMiddleware` is the right stack for most cases: the logger captures the redaction count at run end, not mid-stream. The capstone’s `default_middleware_stack()` in shared/middleware.py:164-174 locks this order.
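The `try/finally` point is worth one concrete sketch — a run-logger shape (not the capstone’s class) whose finish line fires even when the wrapped run throws:

```python
import asyncio

class RunLogger:
    """Agent-run middleware shape: the finish line fires even when the run fails."""
    def __init__(self):
        self.events: list[str] = []

    async def process(self, context, call_next):
        self.events.append("agent.start")
        try:
            await call_next()
        finally:
            self.events.append("agent.finish")   # success *and* failure paths

logger = RunLogger()

async def exploding_run():
    raise RuntimeError("LLM call failed")

failed = False
try:
    asyncio.run(logger.process(None, exploding_run))
except RuntimeError:
    failed = True                                # exception still propagates

print(failed, logger.events)
```

The exception still reaches the caller, but both log lines are recorded — without the `finally`, the finish event would be lost on every failed run.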
## Tests
```shell
# Python — 5 integration tests against real Azure OpenAI
source agents/.venv/bin/activate
python -m pytest tutorials/06-middleware/python/tests/ -v

# .NET — 4 integration tests
cd tutorials/06-middleware/dotnet
dotnet test tests/Middleware.Tests.csproj
```

All 9 tests green against Azure OpenAI. Integration-only on purpose — the middleware chain is tightly coupled to MAF’s invocation machinery, and stubbing it out defeats the point of testing the real composition.
The capstone’s agents/python/tests/test_middleware.py takes the opposite tack: 10+ unit tests against fake SimpleNamespace contexts, because the stock middleware’s behaviour (correlation id generation, counter accumulation, default-arg redaction) is testable without a live LLM. Both styles have their place.
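A minimal example of that unit-test style — a stand-in redactor with a counter, pinned down with no LLM in sight (the class and names here are illustrative, not the capstone’s):

```python
import re
from types import SimpleNamespace

CARD_RE = re.compile(r"\b\d{4}[- ]?\d{4}[- ]?\d{4}[- ]?\d{4}\b")

class PiiRedactor:
    def __init__(self):
        self.redactions = 0          # the metric the unit test pins down

    def redact(self, messages):
        for m in messages:
            m.text, n = CARD_RE.subn("[REDACTED-CARD]", m.text)
            self.redactions += n

def test_redaction_counter():
    redactor = PiiRedactor()
    msgs = [SimpleNamespace(text="cards 4111 1111 1111 1111 and 4242-4242-4242-4242")]
    redactor.redact(msgs)
    assert msgs[0].text == "cards [REDACTED-CARD] and [REDACTED-CARD]"
    assert redactor.redactions == 2

test_redaction_counter()
```

`SimpleNamespace` is enough of a message object for the behaviour under test, which is exactly why this style runs without a live model.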
## How this shows up in the capstone
The chapter’s example is a simplified version of what lives in the capstone. Every specialist agent picks up the same three layers by default:
- shared/middleware.py (Python, 175 lines, agents/python/shared/middleware.py:1-175) — three stock middleware plus a `default_middleware_stack()` factory:
  - `AgentRunLogger` at lines 40–70 — `AgentMiddleware` that logs agent.start/agent.finish with a short correlation id surfaced on `context.metadata["run_id"]` so downstream middleware and tools can thread it through their own logs.
  - `ToolAuditMiddleware` at lines 76–117 — `FunctionMiddleware` that records every tool invocation (tool, elapsed_ms, error) in a bounded list for health dashboards. Arguments are opt-in (`capture_arguments=True`) because audit logs leaking tool arguments is a classic data-exposure incident.
  - `PiiRedactionMiddleware` at lines 127–158 — `ChatMiddleware` that masks card-number and SSN-shaped strings before the LLM sees them. Counts redactions so you can alert when sensitive strings leak in at a higher than expected rate.
  - `default_middleware_stack()` at lines 164–174 — returns the ordered list every specialist picks up.
- Wiring — shared/factory.py composes the stack into every specialist via the `build_specialist_middleware(...)` helper. Individual specialists don’t know or care; they get the default stack unless they opt in to additions.
- .NET mirror — agents/dotnet/src/ECommerceAgents.Shared/Middleware/ has the same three pieces as standalone helpers (AgentRunLogger.cs, ToolAuditMiddleware.cs, PiiRedactor.cs) with unit tests at agents/dotnet/tests/ECommerceAgents.Shared.Tests/MiddlewareTests.cs.
- HTTP middleware is different — `AgentAuthMiddleware` (agents/python/shared/auth.py:27) is ASGI middleware (FastAPI layer), not MAF middleware. It validates JWTs on inbound HTTP, sets ContextVars, and is orthogonal to everything in this chapter. Both layers co-exist.
Production middleware layers validation, authorisation, audit, PII masking, telemetry spans, and approval gates on top of this chapter’s three-class example. Nothing in the decorator shape changes.
## Further reading
Microsoft Agent Framework docs
Related chapters
- Chapter 17 — Human in the Loop — the UI half of tool approval gates.
- Chapter 07 — Observability with OpenTelemetry — `AgentMiddleware` as a span-creation site.
## What’s next
Chapter 07 — Observability with OpenTelemetry picks up exactly where this chapter leaves off — the AgentMiddleware layer is where MAF creates the invoke_agent span, so adding OTel-native telemetry is a ten-line change on top of what we wrote today.

