Series note — Part of MAF v1: Python and .NET. The Python-only Part 7 — Production Readiness covered HTTP middleware for auth; this chapter is about the three agent-level middleware types that sit inside the agent: `AgentMiddleware`, `FunctionMiddleware`, and `ChatMiddleware`. Those three form what MAF calls the agent pipeline, and every production concern in the capstone — logging, tool audit, tool approval, content safety, PII redaction — is implemented as one of them.
Repo — Full runnable code for this chapter is at https://github.com/nitin27may/e-commerce-agents/tree/main/tutorials/06-middleware. Clone the repo, `cd tutorials/06-middleware`, and follow the README.
## Why this chapter
A stateless chat agent is a nice demo. A production agent needs observability, audit, approval gates, and content safety — and you don’t want any of that hard-coded into the tool bodies themselves. MAF’s answer is middleware: code that wraps an agent run at one of three clearly-defined layers.
Once the three layers click you’ll stop writing custom one-offs and start composing. The capstone’s shared/middleware.py ships three stock pieces (agent-run logger, tool audit, PII redactor) and every specialist picks them up by default. That file is basically a longer version of the example we build here.
## Prerequisites
- Completed Chapter 05 — Context Providers.
- `.env` at the repo root with `OPENAI_API_KEY` or the Azure OpenAI trio.
- Read-first (optional): Agents — Middleware, Agents — Agent Pipeline, Journey — Adding Middleware.
## The concept
A MAF agent doesn’t go straight to the LLM. A call to run() / RunAsync() descends through a stack of decorators — the agent pipeline — before anything hits the wire, and every response climbs back up through the same stack. You plug middleware in at one of three layers:
- Agent-run middleware (`AgentMiddleware`) — wraps the entire `run()` call. Runs once per turn. Use for high-level concerns: correlation ids, start/finish log lines, top-of-stack tracing spans, run-wide rate limits, global short-circuits (e.g., “deny this user outright”).
- Function-calling middleware (`FunctionMiddleware`) — wraps each individual tool invocation. May run zero, one, or many times per `run()` depending on how many tool calls the LLM emits. Use for per-tool concerns: approval gates, argument validation, auditing, result transformation.
- Chat-client middleware (`ChatMiddleware` in Python, `DelegatingChatClient` in .NET) — wraps the raw LLM call. Runs once per LLM turn (and there can be several in a single run — tool-calling loop, remember). Use for message-level concerns: PII redaction, prompt-injection filters, content moderation, response caching, model routing.
The three layers are orthogonal: they wrap different things, see different contexts, and fire at different cadences. The trick is composing them.
Nesting, not chaining. AgentMiddleware wraps FunctionMiddleware wraps ChatMiddleware wraps the LLM. A request flows inward through all three, the LLM responds, and control bubbles back out through all three in reverse. Each layer sees a different context object with a different slice of the call.
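The nesting is easy to see in a toy sketch — this is not MAF code, just three async wrappers composed the way the pipeline composes its layers:

```python
import asyncio

def make_layer(name, events):
    """Return a decorator that wraps an inner coroutine function
    with before/after markers, like one middleware layer."""
    def wrap(inner):
        async def layer(*args):
            events.append(f"{name}:before")   # request flowing inward
            result = await inner(*args)
            events.append(f"{name}:after")    # response bubbling back out
            return result
        return layer
    return wrap

async def llm(prompt):
    return f"echo:{prompt}"

events: list[str] = []
# AgentMiddleware wraps FunctionMiddleware wraps ChatMiddleware wraps the LLM.
pipeline = make_layer("agent", events)(
    make_layer("function", events)(
        make_layer("chat", events)(llm)))

result = asyncio.run(pipeline("hi"))
print(result)   # → echo:hi
print(events)   # agent/function/chat on the way in, reverse on the way out
```

Running this prints the markers in exactly the order the chapter describes: `agent:before`, `function:before`, `chat:before`, then the same three `:after` markers in reverse.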
## Layer responsibilities at a glance
| Layer | Wraps | Fires per run | Context object | Typical uses |
|---|---|---|---|---|
| `AgentMiddleware` | The whole `run()` | Once | `AgentContext` (agent, metadata) | Correlation id, run-wide logging, tracing spans, global auth check, top-level short-circuit |
| `FunctionMiddleware` | One tool call | 0..N | `FunctionInvocationContext` (function, arguments, result) | Tool approval gate, argument validation, audit log, result rewriting |
| `ChatMiddleware` / `DelegatingChatClient` | One LLM round-trip | 1..N (once per tool-calling iteration) | `ChatContext` (messages) / raw `ChatMessage` list | PII redaction, content safety, caching, model routing, prompt-injection filters |
`AgentMiddleware` sees the run as a whole. `FunctionMiddleware` sees individual tool arguments and can set the result directly. `ChatMiddleware` sees raw messages on the wire.
## Jargon recap
- Agent pipeline — the composed stack of middleware around an agent’s LLM and tool calls. The word pipeline is the one Microsoft’s docs use; “middleware chain” and “agent pipeline” mean the same thing in MAF.
- `AgentMiddleware` (Python) / run-level `.Use(runFunc, runStreamingFunc)` (.NET) — wraps one whole `run()` invocation.
- `FunctionMiddleware` (Python) / function-level `.Use(functionCallingFunc)` (.NET) — wraps each tool call.
- `ChatMiddleware` (Python) / `DelegatingChatClient` (.NET) — wraps the raw LLM call at `IChatClient` level.
- `DelegatingChatClient` (.NET) — base class for chat middleware. Override `GetResponseAsync` and `GetStreamingResponseAsync`; call `base.<method>` to pass through, or return a cached response to short-circuit.
- `.AsBuilder().Use(...)` (.NET) — the fluent API for attaching middleware. Returns a new, decorated agent; does not mutate the original. The most common .NET gotcha in this series — keep reading.
- Short-circuit / `MiddlewareTermination` — a middleware that sets a result and returns (or raises in Python) to skip downstream middleware and the LLM call. How approval gates work.
- `context.result` — the outgoing result carried in a middleware context. Setting it and returning without awaiting `call_next()` is how you short-circuit a tool or chat call with a canned response.
- Tool approval — a middleware pattern where a destructive tool is paused, surfaced to a human for confirmation, and only executed after the human approves. Implemented via `FunctionMiddleware`.
## Code walkthrough
Full source: dotnet/Program.cs. .NET builds two pipelines — the IChatClient pipeline (chat layer) and the AIAgent pipeline (agent-run and function layers) — which compose naturally via .AsBuilder().Use(...). This chapter demonstrates chat-layer middleware in full and sketches the agent-run layer; the capstone’s shared agent factory (agents/dotnet/src/ECommerceAgents.Shared/Middleware/) has all three.
```csharp
// dotnet/Program.cs (excerpt)
using System.Text.RegularExpressions;
using Microsoft.Extensions.AI;
using Microsoft.Agents.AI;

// ChatMiddleware via DelegatingChatClient subclass.
public sealed class PiiRedactingChatClient : DelegatingChatClient
{
    private readonly Regex _pattern;
    private readonly Stats _stats;

    public PiiRedactingChatClient(IChatClient inner, Stats stats, Regex pattern) : base(inner)
    {
        _stats = stats;
        _pattern = pattern;
    }

    public override Task<ChatResponse> GetResponseAsync(
        IEnumerable<ChatMessage> messages, ChatOptions? options = null, CancellationToken ct = default)
    {
        Redact(messages);
        return base.GetResponseAsync(messages, options, ct);
    }

    public override IAsyncEnumerable<ChatResponseUpdate> GetStreamingResponseAsync(
        IEnumerable<ChatMessage> messages, ChatOptions? options = null, CancellationToken ct = default)
    {
        Redact(messages); // ← both paths must be wrapped
        return base.GetStreamingResponseAsync(messages, options, ct);
    }

    private void Redact(IEnumerable<ChatMessage> messages) { /* replaces TextContent in-place */ }
}
```
```csharp
// Build the IChatClient pipeline, wrap the chat layer with the delegating client.
IChatClient pipeline = rawChat.AsIChatClient()
    .AsBuilder()
    .Use(inner => new PiiRedactingChatClient(inner, stats, CardPattern))
    .Build();

// Tool with an inline argument guard (FunctionMiddleware-equivalent).
var weather = AIFunctionFactory.Create(
    ([Description("The city to look up.")] string city) =>
    {
        stats.ToolInvocations.Add(city);
        if (string.Equals(city, "Atlantis", StringComparison.OrdinalIgnoreCase))
        {
            stats.BlockedTools.Add(city);
            return "Refused: that city isn't supported."; // canned refusal
        }
        return LookupWeather(city);
    }, name: "get_weather", description: "Look up the current weather for a city.");

var agent = new ChatClientAgent(pipeline, new ChatClientAgentOptions
{
    Name = "middleware-agent",
    ChatOptions = new ChatOptions { Instructions = Instructions, Tools = new[] { (AITool)weather } },
});
```

For agent-run middleware, .NET exposes `AIAgentBuilder.Use(runFunc, runStreamingFunc)`:
```csharp
// Agent-run middleware — wraps the whole RunAsync.
AIAgent wrapped = baseAgent
    .AsBuilder()
    .Use(
        runFunc: async (messages, thread, options, innerAgent, ct) =>
        {
            var runId = Guid.NewGuid().ToString("N")[..8];
            logger.LogInformation("agent.start run_id={RunId}", runId);
            var response = await innerAgent.RunAsync(messages, thread, options, ct);
            logger.LogInformation("agent.finish run_id={RunId}", runId);
            return response;
        },
        runStreamingFunc: (messages, thread, options, innerAgent, ct) =>
            // same shape on the streaming path — log, then delegate
            innerAgent.RunStreamingAsync(messages, thread, options, ct))
    .Build();
```

Full source: python/main.py. We build one tiny agent with one weather tool and three middleware — one of each layer. Key lines:
```python
# python/main.py (excerpt)
import re

from agent_framework import Agent, tool
from agent_framework._middleware import (
    AgentContext, AgentMiddleware,
    ChatContext, ChatMiddleware,
    FunctionInvocationContext, FunctionMiddleware,
)

_CARD_RE = re.compile(r"\b\d{4}[- ]?\d{4}[- ]?\d{4}[- ]?\d{4}\b")

class LoggingAgentMiddleware(AgentMiddleware):
    """AgentMiddleware — wraps the whole run()."""

    def __init__(self) -> None:
        self.events: list[str] = []

    async def process(self, context: AgentContext, call_next) -> None:
        self.events.append("agent:before")
        await call_next()
        self.events.append("agent:after")

class ArgValidatorMiddleware(FunctionMiddleware):
    """FunctionMiddleware — wraps each tool call. Short-circuits a forbidden city."""

    FORBIDDEN_CITY = "Atlantis"

    async def process(self, context: FunctionInvocationContext, call_next) -> None:
        city = (context.arguments or {}).get("city", "")
        if city.lower() == self.FORBIDDEN_CITY.lower():
            context.result = "Refused: that city isn't supported."
            return  # short-circuit — real tool body never runs
        await call_next()

class PiiRedactionChatMiddleware(ChatMiddleware):
    """ChatMiddleware — wraps the LLM call. Redacts card numbers in outbound messages."""

    async def process(self, context: ChatContext, call_next) -> None:
        for message in context.messages:
            for content in message.contents:
                text = getattr(content, "text", None)
                if text and _CARD_RE.search(text):
                    content.text = _CARD_RE.sub("[REDACTED-CARD]", text)
        await call_next()

agent = Agent(
    client,
    instructions=INSTRUCTIONS,
    name="middleware-agent",
    tools=[get_weather],
    middleware=[
        LoggingAgentMiddleware(),       # agent-run
        ArgValidatorMiddleware(),       # function
        PiiRedactionChatMiddleware(),   # chat
    ],
)
```

Three things worth staring at:
- `middleware=[...]` is one flat list. MAF inspects each entry’s base class and routes it to the right layer in the pipeline. Order within a layer matters (first listed wraps outermost); order across layers does not — `AgentMiddleware` always wraps `FunctionMiddleware`, which always wraps `ChatMiddleware`, no matter how you list them.
- Short-circuit via `context.result`. Setting `context.result` and returning without awaiting `call_next()` skips the real tool body and feeds the canned refusal back to the LLM as if the tool had returned it. No exception, no retry, no mess. This is the same mechanism tool-approval gates use.
- Mutating messages in place. MAF’s content objects are writeable in v1; the capstone’s `PiiRedactionMiddleware` (agents/python/shared/middleware.py:127-158) has a defensive variant that catches `AttributeError` on frozen content types.
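The short-circuit mechanics can be exercised without MAF at all — here `FunctionInvocationContext` is faked with a `SimpleNamespace` carrying the same `arguments`/`result` attributes the middleware reads, a sketch in the spirit of the capstone’s fake-context unit tests:

```python
import asyncio
from types import SimpleNamespace

class ArgValidator:
    """Same logic as the chapter's ArgValidatorMiddleware, minus the MAF base class."""
    FORBIDDEN_CITY = "atlantis"

    async def process(self, context, call_next):
        city = (context.arguments or {}).get("city", "")
        if city.lower() == self.FORBIDDEN_CITY:
            context.result = "Refused: that city isn't supported."
            return                      # real tool body never runs
        await call_next()

calls: list[str] = []
async def real_tool():                  # stands in for call_next()
    calls.append("tool ran")

blocked = SimpleNamespace(arguments={"city": "Atlantis"}, result=None)
asyncio.run(ArgValidator().process(blocked, real_tool))

allowed = SimpleNamespace(arguments={"city": "Oslo"}, result=None)
asyncio.run(ArgValidator().process(allowed, real_tool))

print(blocked.result)  # canned refusal; tool body was skipped
print(calls)           # only the Oslo call reached the tool
```

The blocked context carries the refusal string in `result` while `calls` shows the tool body ran exactly once — the pass-through path.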
## Agent Pipeline — naming the concept
The three layers we just wired up are, collectively, the agent pipeline. MAF’s docs use exactly that phrase — see Agents — Agent Pipeline. It’s worth internalising the term because the rest of the series (“the pipeline builds this agent”, “wired into the pipeline”, “pipeline ordering”) will assume you know what it means.
Mental model: an agent is a ChatClientAgent (or Agent in Python), and the pipeline is everything between your call site and the raw chat client. You compose a pipeline by decorating a base agent with .AsBuilder().Use(...) (.NET) or by passing a middleware=[...] list (Python). The result is a new AIAgent whose RunAsync transparently goes through your stack.
Two pipeline rules to keep in your head:
- Layers are ordered by type, not by listing. No matter how you order your `middleware=[...]` list, `AgentMiddleware` always wraps `FunctionMiddleware`, which always wraps `ChatMiddleware`. The type itself picks the layer.
- Within a layer, first listed is outermost. Two `AgentMiddleware` instances in the same list wrap each other in declaration order. That’s the knob for tracing-before-auth versus auth-before-tracing arguments.
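A toy version of the routing rule — not MAF internals, just a sketch of “sort by layer type, keep declaration order within a layer, then compose”:

```python
import asyncio

LAYER_RANK = {"agent": 0, "function": 1, "chat": 2}   # outermost → innermost

def compose(flat_list, events):
    """Route a flat middleware list by layer type, then declaration order."""
    ordered = sorted(flat_list, key=lambda m: LAYER_RANK[m[0]])  # stable sort
    async def core():
        events.append("llm")
    nxt = core
    for layer, name in reversed(ordered):              # build innermost-first
        async def step(inner=nxt, tag=f"{layer}:{name}"):
            events.append(tag)
            await inner()
        nxt = step
    return nxt

events: list[str] = []
# Listed chat-first, but the agent layer still ends up outermost,
# and the two agent middlewares keep their declaration order.
pipeline = compose(
    [("chat", "pii"), ("agent", "tracing"), ("agent", "auth"), ("function", "audit")],
    events)
asyncio.run(pipeline())
print(events)
```

Because Python’s `sorted` is stable, the two agent entries stay in declaration order while the layer rank forces agent → function → chat regardless of how the flat list was written.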
## Tool Approval — short-circuiting from `FunctionMiddleware`
Destructive tools — cancel order, initiate refund, delete account — shouldn’t run on autopilot. FunctionMiddleware is the mechanism: intercept the call, surface a pending approval to a human, and only let execution proceed once you have an answer.
The minimum viable gate:
```python
import hashlib

class ApprovalGateMiddleware(FunctionMiddleware):
    """Block tools flagged 'destructive' until an out-of-band approval lands."""

    DESTRUCTIVE = {"cancel_order", "initiate_refund", "delete_account"}

    def __init__(self, approvals: dict[str, bool]) -> None:
        self.approvals = approvals  # keyed by correlation id, filled by the UI
        self.pending: list[dict] = []

    async def process(self, context: FunctionInvocationContext, call_next) -> None:
        tool_name = getattr(context.function, "name", "")
        if tool_name not in self.DESTRUCTIVE:
            await call_next()
            return
        # Deterministic correlation id: the same tool + arguments map to the
        # same id, so a retried call can find the approval the UI recorded.
        # (A random uuid here would mint a fresh id per attempt and the
        # lookup below would never hit.)
        key = f"{tool_name}:{sorted((context.arguments or {}).items())}"
        correlation_id = hashlib.sha1(key.encode()).hexdigest()[:8]
        if self.approvals.get(correlation_id) is True:
            await call_next()  # human said yes
            return
        # Short-circuit: record the pending request, return a canned result.
        self.pending.append({
            "correlation_id": correlation_id,
            "tool": tool_name,
            "arguments": dict(context.arguments or {}),
        })
        context.result = (
            f"pending_approval:{correlation_id} — awaiting human confirmation "
            f"for {tool_name}. Reply with approval to proceed."
        )
```

Three things this buys you:
- No tool code changes. The `cancel_order` function doesn’t know it was gated. The gate lives entirely in the pipeline.
- Structured result, not an exception. Setting `context.result` feeds a string back to the LLM, which can tell the user “I’ve queued your cancellation for approval” naturally. An exception would abort the run.
- Auditability for free. `self.pending` is a record of every gated call; log it and you have a complete tool-approval audit trail.
The user-facing half of this (a UI that shows the approval, a resume path that flips the flag) is covered in Chapter 17 — Human in the Loop. For the gate itself, FunctionMiddleware is the whole story.
In .NET the same pattern lives on FunctionInvokingChatClient hooks or a guard baked into the tool body. The capstone’s ToolAuditMiddleware (agents/dotnet/src/ECommerceAgents.Shared/Middleware/ToolAuditMiddleware.cs) is the audit half of approval; add a pending check before await body() and you have the gate.
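The full approve-then-retry loop can be driven end-to-end with a self-contained toy — fake contexts, and a deterministic correlation id derived from tool + arguments so the retried call finds its approval (an assumption of this sketch, not the capstone’s code):

```python
import asyncio
import hashlib
from types import SimpleNamespace

class ApprovalGate:
    DESTRUCTIVE = {"cancel_order"}

    def __init__(self):
        self.approvals: dict[str, bool] = {}   # flipped by the approval UI
        self.pending: list[str] = []

    async def process(self, context, call_next):
        if context.function.name not in self.DESTRUCTIVE:
            await call_next()
            return
        # Same tool + same arguments → same id, so a retry finds the approval.
        key = f"{context.function.name}:{sorted((context.arguments or {}).items())}"
        cid = hashlib.sha1(key.encode()).hexdigest()[:8]
        if self.approvals.get(cid):
            await call_next()                  # human said yes
            return
        self.pending.append(cid)
        context.result = f"pending_approval:{cid}"

gate = ApprovalGate()
ran: list[str] = []
async def cancel_order():
    ran.append("cancelled")

def make_ctx():
    return SimpleNamespace(function=SimpleNamespace(name="cancel_order"),
                           arguments={"order_id": "A-42"}, result=None)

first = make_ctx()
asyncio.run(gate.process(first, cancel_order))   # gated: tool never runs
gate.approvals[gate.pending[0]] = True           # human clicks "approve"
retry = make_ctx()
asyncio.run(gate.process(retry, cancel_order))   # now it executes
print(first.result, ran)
```

The first call short-circuits with a `pending_approval:` result and the tool body never fires; after the approval flag flips, the identical retry passes straight through to `cancel_order`.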
## Agent Safety — content filtering at the chat layer
PII redaction is the safety example we already showed, but ChatMiddleware generalises. Anything you want to enforce on raw LLM traffic — prompt-injection filters, content moderation calls, jailbreak resistance, model routing on a per-message basis — belongs here.
Three variants all built on the same DelegatingChatClient shape:
- Input filtering — scan outbound messages for policy violations (PII, profanity, prompt-injection markers) and either redact, short-circuit with a canned refusal, or pass through. `PiiRedactionMiddleware` at agents/python/shared/middleware.py:127-158 is the redact variant.
- Output filtering — scan the LLM’s response after `await base.GetResponseAsync(...)` returns, before the response reaches your agent. Good for stripping secrets the model might have echoed back.
- Routing — inspect the message, pick a different `IChatClient` (cheaper model for trivial prompts, stronger model for complex ones), and forward the call. Microsoft’s docs call this out in the Middleware page.
The important design principle: the LLM never sees raw user input directly. It sees whatever the chat-layer pipeline hands to it. That’s the safety boundary — once you internalise it, questions like “where do I enforce content policy?” have a single answer: chat-layer middleware.
Stacking safety layers is fine and common. The capstone runs a single PiiRedactionMiddleware; a production deployment would add a content-moderation call and a prompt-injection scanner on top, all as separate ChatMiddleware instances in the same pipeline.
## Side-by-side — Python vs .NET
| Aspect | Python | .NET |
|---|---|---|
| Abstraction | Three base classes: AgentMiddleware, FunctionMiddleware, ChatMiddleware | One base class (DelegatingChatClient) plus AIAgentBuilder.Use(runFunc, runStreamingFunc) / .Use(functionCallingFunc) |
| Registration | Agent(..., middleware=[...]) — flat list, layer inferred from type | .AsBuilder().Use(...) per layer, fluent chain, returns a new agent |
| Short-circuit | Set context.result and return without awaiting call_next() | Return a cached response; don’t call base.GetResponseAsync(...) |
| Ordering within layer | First listed wraps outermost | First .Use() on the builder wraps outermost |
| Streaming | Not split — process handles both paths | Both runFunc and runStreamingFunc must be wired, same for GetStreamingResponseAsync on DelegatingChatClient |
| Mutation | Mutate context.messages / context.arguments / context.result in place | Replace items in message.Contents; new TextContent(...) instances |
| Context shape | AgentContext / FunctionInvocationContext / ChatContext | Raw IEnumerable<ChatMessage> + ChatOptions at chat layer |
| DI integration | Instantiate and pass | Register as DelegatingChatClient service; .AsBuilder().Use(inner => sp.GetRequiredService<T>(...)) |
Structurally identical in intent, different in plumbing. Python hangs everything off three abstract classes with explicit layer semantics. .NET reuses the IChatClient / AIAgent delegation pattern at different layers.
## Gotchas
- `.AsBuilder().Use(...)` returns a NEW agent. This trips people up constantly. `var wrapped = agent.AsBuilder().Use(...).Build();` does not mutate `agent`. You must use `wrapped`. Calling `agent.AsBuilder().Use(...).Build()` and never storing the result is a silent no-op — your middleware runs zero times and you spend an afternoon wondering why.
- Both `runFunc` and `runStreamingFunc` must be provided in .NET. `AIAgentBuilder.Use(runFunc, runStreamingFunc)` takes two delegates because `RunAsync` and `RunStreamingAsync` are distinct code paths. Wire only one and the other silently bypasses your middleware. The capstone’s `AgentRunLogger` implements both sides.
- Layer type determines position — not list order. In Python, listing `[chat, function, agent]` still produces the pipeline agent → function → chat → LLM. This is intentional; it makes the pipeline deterministic regardless of how the middleware list is composed. You only control order within a layer.
- Don’t carry state across runs unless you mean to. A counter on a middleware instance accumulates across every run that instance handles. Our tests build a fresh middleware per test; production code shares a singleton deliberately for metrics (PII redaction count, tool-invocation totals).
- The arguments dict isn’t mutable on every backend. To short-circuit a tool call in Python, set `context.result` rather than mutating `context.arguments`. The Pydantic models MAF uses internally may be frozen.
- `DelegatingChatClient` must call `base.GetResponseAsync(...)` (or return a response). Forgetting to call the inner client hangs the run — the `ChatClientAgent` is waiting for a response that never arrives.
- Tool body guards are NOT the same as `FunctionMiddleware`. A guard baked into the tool function runs after any function-level middleware and doesn’t participate in the pipeline’s observability. If you want to audit guarded refusals uniformly, the guard belongs in middleware.
- Exceptions in middleware propagate to the run. If your `AgentMiddleware` raises, `agent.run()` raises. Use `try/finally` if you need the finish-log to fire on failure; our capstone `AgentRunLogger` does exactly that.
- Ordering `AgentRunLogger` before `PiiRedactionMiddleware` is the right stack for most cases: the logger captures the redaction count at run end, not mid-stream. The capstone’s `default_middleware_stack()` in shared/middleware.py:164-174 locks this order.
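The `try/finally` point is worth one concrete sketch — a run-logger shape (not the capstone’s class) whose finish line fires even when the wrapped run throws:

```python
import asyncio

class RunLogger:
    """Agent-run middleware shape: the finish line fires even when the run fails."""
    def __init__(self):
        self.events: list[str] = []

    async def process(self, context, call_next):
        self.events.append("agent.start")
        try:
            await call_next()
        finally:
            self.events.append("agent.finish")   # success *and* failure paths

logger = RunLogger()

async def exploding_run():
    raise RuntimeError("LLM call failed")

failed = False
try:
    asyncio.run(logger.process(None, exploding_run))
except RuntimeError:
    failed = True                                # exception still propagates

print(failed, logger.events)
```

The exception still reaches the caller, but both log lines are recorded — without the `finally`, the finish event would be lost on every failed run.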
## Tests
```shell
# Python — 5 integration tests against real Azure OpenAI
source agents/.venv/bin/activate
python -m pytest tutorials/06-middleware/python/tests/ -v

# .NET — 4 integration tests
cd tutorials/06-middleware/dotnet
dotnet test tests/Middleware.Tests.csproj
```

All 9 tests green against Azure OpenAI. Integration-only on purpose — the middleware chain is tightly coupled to MAF’s invocation machinery, and stubbing it out defeats the point of testing the real composition.
The capstone’s agents/python/tests/test_middleware.py takes the opposite tack: 10+ unit tests against fake SimpleNamespace contexts, because the stock middleware’s behaviour (correlation id generation, counter accumulation, default-arg redaction) is testable without a live LLM. Both styles have their place.
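A minimal example of that unit-test style — a stand-in redactor with a counter, pinned down with no LLM in sight (the class and names here are illustrative, not the capstone’s):

```python
import re
from types import SimpleNamespace

CARD_RE = re.compile(r"\b\d{4}[- ]?\d{4}[- ]?\d{4}[- ]?\d{4}\b")

class PiiRedactor:
    def __init__(self):
        self.redactions = 0          # the metric the unit test pins down

    def redact(self, messages):
        for m in messages:
            m.text, n = CARD_RE.subn("[REDACTED-CARD]", m.text)
            self.redactions += n

def test_redaction_counter():
    redactor = PiiRedactor()
    msgs = [SimpleNamespace(text="cards 4111 1111 1111 1111 and 4242-4242-4242-4242")]
    redactor.redact(msgs)
    assert msgs[0].text == "cards [REDACTED-CARD] and [REDACTED-CARD]"
    assert redactor.redactions == 2

test_redaction_counter()
```

`SimpleNamespace` is enough of a message object for the behaviour under test, which is exactly why this style runs without a live model.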
## How this shows up in the capstone
The chapter’s example is a simplified version of what lives in the capstone. Every specialist agent picks up the same three layers by default:
- shared/middleware.py (Python, 175 lines, agents/python/shared/middleware.py:1-175) — three stock middleware plus a `default_middleware_stack()` factory:
  - `AgentRunLogger` at lines 40–70 — `AgentMiddleware` that logs agent.start/agent.finish with a short correlation id surfaced on `context.metadata["run_id"]` so downstream middleware and tools can thread it through their own logs.
  - `ToolAuditMiddleware` at lines 76–117 — `FunctionMiddleware` that records every tool invocation (tool, elapsed_ms, error) in a bounded list for health dashboards. Arguments are opt-in (`capture_arguments=True`) because audit logs leaking tool arguments is a classic data-exposure incident.
  - `PiiRedactionMiddleware` at lines 127–158 — `ChatMiddleware` that masks card-number and SSN-shaped strings before the LLM sees them. Counts redactions so you can alert when sensitive strings leak in at a higher than expected rate.
  - `default_middleware_stack()` at lines 164–174 — returns the ordered list every specialist picks up.
- Wiring — shared/factory.py composes the stack into every specialist via the `build_specialist_middleware(...)` helper. Individual specialists don’t know or care; they get the default stack unless they opt in to additions.
- .NET mirror — agents/dotnet/src/ECommerceAgents.Shared/Middleware/ has the same three pieces as standalone helpers (AgentRunLogger.cs, ToolAuditMiddleware.cs, PiiRedactor.cs) with unit tests at agents/dotnet/tests/ECommerceAgents.Shared.Tests/MiddlewareTests.cs.
- HTTP middleware is different — `AgentAuthMiddleware` (agents/python/shared/auth.py:27) is ASGI middleware (FastAPI layer), not MAF middleware. It validates JWTs on inbound HTTP, sets ContextVars, and is orthogonal to everything in this chapter. Both layers co-exist.
Production middleware layers validation, authorisation, audit, PII masking, telemetry spans, and approval gates on top of this chapter’s three-class example. Nothing in the decorator shape changes.
## Further reading
Microsoft Agent Framework docs
Related chapters
- Chapter 17 — Human in the Loop — the UI half of tool approval gates.
- Chapter 07 — Observability with OpenTelemetry — `AgentMiddleware` as a span-creation site.
## What’s next
Chapter 07 — Observability with OpenTelemetry picks up exactly where this chapter leaves off — the AgentMiddleware layer is where MAF creates the invoke_agent span, so adding OTel-native telemetry is a ten-line change on top of what we wrote today.

