Series note — Part of MAF v1: Python and .NET. Third orchestration pattern after Sequential and Concurrent. Next: Group Chat.
Repo — Full runnable code for this chapter is at https://github.com/nitin27may/e-commerce-agents/tree/main/tutorials/14-handoff-orchestration. Clone the repo,
cd tutorials/14-handoff-orchestration, and follow the per-language instructions below.
Why this chapter#
Sequential (Ch12) hard-codes the order. Concurrent (Ch13) runs everyone in parallel and then aggregates. Both predetermine the graph. Handoff is the first orchestration where the agents themselves decide what happens next. A Triage agent reads the user’s question, decides “this is math” or “this is history”, and invokes a tool call whose sole effect is to transfer control to a specialist. The specialist can finish the turn — or hand back to Triage if the question was mis-routed, or to a third specialist if the conversation evolves.
That shape is the backbone of customer-support bots (“billing question → billing specialist, account question → account specialist, escalation → human”), tutoring systems, and research assistants that pull in domain experts on demand. It’s also the shape that requires the most care, because agent-driven routing is where loops emerge: Triage hands to Math, Math decides “this is history actually” and hands back, Triage hands to History, History hands to Math, and the LLM budget evaporates. This chapter builds the happy path, shows exactly how the loop pathology happens, and names the mechanism that keeps it bounded — turn_limits.
Jargon defined inline below: HandoffBuilder (Python builder), AgentWorkflowBuilder.CreateHandoffBuilderWith (.NET builder), mesh topology, synthesised handoff_to_<name> tools, handoff_sent event, autonomous mode, turn_limits.
Prerequisites#
- Completed Chapter 13 — Concurrent Orchestration.
- `.env` at the repo root with either `OPENAI_API_KEY` or the Azure OpenAI trio (`AZURE_OPENAI_ENDPOINT`, `AZURE_OPENAI_KEY`, `AZURE_OPENAI_DEPLOYMENT`). The Python and .NET samples both run three real LLM calls end-to-end — unlike Ch09/Ch10 this chapter has no offline mode.
- `uv` for Python; .NET 10 SDK for the .NET sample.
The concept#
Mesh topology, not pipeline#
Sequential is a line. Concurrent is a star. Handoff is a mesh: every agent in the participants list is a node, and every configured handoff is a directed edge. The framework synthesises one tool per outgoing edge and advertises it to that agent’s LLM — a synthesised handoff_to_<name> tool is a function the framework injects into the agent’s tool list at build time, with a JSON schema derived from the target agent’s description, and whose sole side-effect when invoked is to transfer control of the workflow to the target. Agents don’t build the graph; they navigate it.
```mermaid
flowchart LR
    user([User question])
    triage["Triage<br/>start node"]
    math[Math specialist]
    history[History specialist]
    out([Final answer])
    user --> triage
    triage -- "handoff_to_math" --> math
    triage -- "handoff_to_history" --> history
    math -- "handoff_to_triage" --> triage
    history -- "handoff_to_triage" --> triage
    math --> out
    history --> out
    class triage core
    class math,history core
    class user infra
    class out success
```
Three agents, four handoff edges, two “answer” exits. The forward edges from Triage are how routing happens; the back edges from the specialists are what enables cycles — and why turn_limits exist.
Key properties of the mesh:
- Every outgoing edge becomes a tool. `WithHandoffs(triage, [math, history])` makes two tools appear on Triage's tool list: `handoff_to_math_tutor` and `handoff_to_history_tutor`. Triage never sees the graph; it sees two function signatures.
- The edges are directional. If you declare `triage -> math` but not `math -> triage`, Math has no handoff tool to call — it must either answer or fall through to the user-input pause.
- Agents with no outgoing edges must terminate. Without any `handoff_to_*` tool the LLM has exactly one legal output shape: a plain assistant message. That message ends the turn.
- Mesh is not all-to-all. You configure edges explicitly in .NET. Python's `HandoffBuilder` defaults to all-to-all if you never call `add_handoff(...)`, but as soon as you call it once the defaults disappear and you have to configure every source.
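To make the synthesis concrete, here is a dependency-free sketch — illustrative names, not the MAF API — of how each agent's tool list falls out of the edge declarations: one `handoff_to_<name>` stub per outgoing edge, described by the target's description.

```python
from dataclasses import dataclass


@dataclass
class AgentSpec:
    name: str
    description: str


def synthesize_handoff_tools(
    edges: dict[str, list[AgentSpec]],
) -> dict[str, list[dict]]:
    """One handoff_to_<target> stub per outgoing edge; the schema description
    comes from the target's description, mirroring what the framework does
    at build time (simplified: real tools also carry parameter schemas)."""
    return {
        source: [
            {
                "name": f"handoff_to_{target.name}",
                "description": f"Hand off the conversation to: {target.description}",
            }
            for target in targets
        ]
        for source, targets in edges.items()
    }


triage = AgentSpec("triage", "Routes questions to the appropriate specialist.")
math = AgentSpec("math", "Specialist for math and arithmetic questions.")
history = AgentSpec("history", "Specialist for historical questions and dates.")

# The four-edge mesh from the diagram above.
edges = {"triage": [math, history], "math": [triage], "history": [triage]}
tools = synthesize_handoff_tools(edges)
print([t["name"] for t in tools["triage"]])  # ['handoff_to_math', 'handoff_to_history']
```

Note how Triage's LLM only ever sees the two tool stubs, never the graph — exactly the "agents navigate the mesh, they don't build it" property.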
Why loops happen, and what stops them#
The mesh has cycles by construction: once a specialist has a back-edge to Triage, Triage has forward edges to every specialist, and every LLM decision is non-deterministic. Nothing in the graph stops a pathological run where the agents keep passing the baton to each other because each one is slightly unsure whether the question is really in its domain.
```mermaid
flowchart LR
    q(["Question:<br/>ambiguous: math or history"])
    t1[Triage turn 1]
    m1[Math turn 1]
    t2[Triage turn 2]
    m2[Math turn 2]
    halt["turn_limits tripped<br/>framework ends run"]
    q --> t1
    t1 -- "handoff_to_math" --> m1
    m1 -- "handoff_to_triage" --> t2
    t2 -- "handoff_to_math" --> m2
    m2 -. "would hand back" .-> halt
    class t1,m1,t2,m2 core
    class halt error
    class q infra
```
A realistic pathological loop. Each handoff consumes a turn; turn_limits is the budget that stops the run when an agent would exceed its cap. Without that budget the workflow just keeps billing the LLM.
`turn_limits` is a per-agent budget on how many times that agent can be invoked in a single run. In Python it's a kwarg on `with_autonomous_mode`:

```python
.with_autonomous_mode(
    agents=[triage, math, history],
    turn_limits={triage.name: 3, math.name: 2, history.name: 2},
)
```

In .NET the handoff builder exposes it implicitly through the interactive request/response loop — each `RunStreamingAsync(...)` call is one "batch" of turns, and the outer caller is responsible for bounding the number of batches. Both languages have the same failure mode without a cap: the LLM-driven cycle keeps consuming tokens until something outside the framework (quota, timeout, Ctrl-C) stops it.
Pick limits based on the minimum path length you actually need. For the two-specialist mesh above, triage: 3, math: 2, history: 2 is generous: Triage can route, get a mis-routed hand-back, route again; each specialist gets two shots at answering.
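The budget mechanics can be sketched without the framework — a per-agent counter that refuses the turn once the cap is hit (hypothetical class; the real enforcement lives inside MAF):

```python
class TurnBudget:
    """Per-agent invocation budget, mimicking turn_limits semantics:
    each take() consumes one turn; False means the cap is reached."""

    def __init__(self, limits: dict[str, int]):
        self.remaining = dict(limits)

    def take(self, agent: str) -> bool:
        if self.remaining.get(agent, 0) <= 0:
            return False
        self.remaining[agent] -= 1
        return True


budget = TurnBudget({"triage": 3, "math": 2, "history": 2})

# A pathological triage <-> math ping-pong, as in the diagram above:
trace = []
for agent in ["triage", "math", "triage", "math", "triage", "math"]:
    if not budget.take(agent):
        trace.append(f"HALT: {agent} exhausted")
        break
    trace.append(agent)
print(trace)
# ['triage', 'math', 'triage', 'math', 'triage', 'HALT: math exhausted']
```

The run ends on Math's third invocation attempt — five billed LLM turns instead of an unbounded cycle.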
Autonomous vs interactive mode#
Handoff is the only orchestration in this series that is interactive by default. The reason is structural: when an agent doesn't hand off, it has produced a message for the user — and the framework has no way to know whether the user wants to follow up. So instead of completing the workflow, the framework emits a `request_info` event and pauses, waiting for the caller to provide the next user turn.
Autonomous mode flips that. `HandoffBuilder.with_autonomous_mode(...)` tells the framework to fabricate a "continue" response on the agent's behalf whenever it would otherwise pause — the agent keeps the turn and either answers again or hands off. This is what most tutorials demo because you can't have an interactive CLI inside pytest. In production you usually want the interactive flow: the agent pauses, the user responds, the next batch of turns runs. .NET's `CreateHandoffBuilderWith` defaults to interactive; the caller provides each user message via `TrySendMessageAsync(new TurnToken(...))`.
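The difference between the two modes is just who supplies the next user turn at a pause. A toy sketch of that decision point (not the MAF API):

```python
from typing import Callable


def next_user_turn(autonomous: bool, ask_user: Callable[[], str]) -> str:
    """At a would-be pause: autonomous mode fabricates a 'continue' turn on
    the user's behalf; interactive mode defers to the real caller."""
    return "continue" if autonomous else ask_user()


# Autonomous: the workflow never blocks on a human (ask_user is never called).
print(next_user_turn(True, input))  # continue
# Interactive: the caller supplies the follow-up (stubbed here).
print(next_user_turn(False, lambda: "thanks, that's all"))  # thanks, that's all
```

Everything else about the state machine is identical; the modes differ only in this one branch.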
Jargon recap#
- `HandoffBuilder` (Python) — the fluent Python builder: `HandoffBuilder(participants=[...]).with_start_agent(a).add_handoff(a, [b, c]).build()`. Lives in `agent_framework.orchestrations`. Emits a regular `Workflow` you drive with `workflow.run(...)` / `workflow.run_stream(...)`.
- `AgentWorkflowBuilder.CreateHandoffBuilderWith` (.NET) — the equivalent .NET entry point. Returns a `HandoffWorkflowBuilder` on which you chain `WithHandoffs(source, targets)` calls and then `Build()`. Lives in `Microsoft.Agents.AI.Workflows`. Marked `[Experimental]` in 1.1 — public preview, stable-enough shape, suppressed analyzer `MAAIW001` in the sample `.csproj`.
- Mesh topology — the graph shape where every participant is a node and every configured handoff is a directed edge. Contrast with Sequential (line) and Concurrent (star).
- Synthesised `handoff_to_<name>` tool — an `AITool` the framework injects into each agent's tool list at build time, one per outgoing handoff edge. The `<name>` is the target agent's `Name` (`name=` in Python, `name:` in `AsAIAgent`). The tool's JSON schema is built from the target's `Description` / `description=` so the source agent's LLM sees what each handoff target specialises in.
- Autonomous mode — a Python builder toggle (`with_autonomous_mode(...)`) that makes the workflow fabricate a "continue" user turn whenever an agent declines to hand off, so the graph runs end-to-end without a human in the loop. Experimental in 1.1. .NET doesn't expose it as a builder flag; the equivalent is the interactive `TurnToken` loop in `Program.cs`.
- `turn_limits` — a per-agent budget (Python: kwarg on `with_autonomous_mode`; .NET: enforced by the outer request/response loop) bounding how many times each agent can be invoked in one run. Prevents the agents-bouncing-back-and-forth failure mode the second diagram shows.
- `handoff_sent` event (Python) — a stream event emitted by the workflow each time an agent successfully invokes a handoff tool. Payload is a `HandoffSentEvent` (source, target). The event is the observable equivalent of "an edge in the mesh just fired."
Code walkthrough#
Source: `python/main.py`. Three agents, one builder chain, one event-stream consumer that tracks routing.
```python
from agent_framework import Agent
from agent_framework.orchestrations import HandoffBuilder


def triage() -> Agent:
    return Agent(
        _default_client(),
        instructions=(
            "You are a Triage agent. Read the user's question and hand off to the "
            "right specialist: math for arithmetic/math questions, history for "
            "historical facts or dates. If the specialist answers, simply acknowledge "
            "and stop — do not rewrite the answer."
        ),
        name="triage",
    )


def math_expert() -> Agent:
    return Agent(
        _default_client(),
        instructions=(
            "You are a Math expert. Answer arithmetic and math questions directly "
            "with a single short sentence containing the numerical answer."
        ),
        name="math",
    )


def history_expert() -> Agent:
    return Agent(
        _default_client(),
        instructions=(
            "You are a History expert. Answer historical questions in one short "
            "sentence with the specific date or year."
        ),
        name="history",
    )


def build_workflow():
    t, m, h = triage(), math_expert(), history_expert()
    return (
        HandoffBuilder(participants=[t, m, h])
        .with_start_agent(t)
        .add_handoff(t, [m, h])
        .add_handoff(m, [t])  # specialists can hand back to triage
        .add_handoff(h, [t])
        .with_autonomous_mode(
            agents=[t, m, h],
            turn_limits={"triage": 3, "math": 2, "history": 2},
        )
        .build()
    )
```

Four details that matter:
- Agents are factories, not singletons. `triage()` builds a fresh `Agent` each time. Sharing a `ChatClient` across factories is fine (it's stateless); sharing the `Agent` itself across concurrent runs is a surprise waiting to happen if any middleware is stateful.
- `name=` becomes the executor id inside the workflow. That's what shows up on `executor_id` in stream events and what you key `turn_limits` on. Keep them short and stable.
- Instructions are load-bearing. Triage's "simply acknowledge and stop — do not rewrite the answer" clause is what keeps it from taking over after the specialist replies. The specialists' "in one short sentence" keeps them from rambling into a new handoff trigger. Drop these and you'll watch the routing go feral.
- `with_autonomous_mode(...)` is experimental. Without it the workflow pauses on `request_info` between agent turns, waiting for user input — fine for a chat UI, wrong for the scripted tests we run here. The `turn_limits` kwarg is the only thing standing between a demo and a runaway.
Reading the event stream#
`workflow.run(...)` streams events as each agent runs. For Handoff, the interesting shapes are `output` (carrying streaming `AgentResponseUpdate` deltas per agent) and `handoff_sent` (one event per successful handoff tool call):
```python
async def ask(question: str) -> tuple[list[str], str]:
    workflow = build_workflow()
    current: str | None = None
    buffers: list[tuple[str, list[str]]] = []
    handoffs: list[str] = []
    async for event in workflow.run(question, stream=True):
        etype = getattr(event, "type", None)
        eid = getattr(event, "executor_id", "") if etype == "output" else None
        if etype == "output" and eid in {"triage", "math", "history"}:
            if current != eid:
                current = eid
                buffers.append((eid, []))
            update = getattr(event, "data", None)
            text = getattr(update, "text", None) if update is not None else None
            if text:
                buffers[-1][1].append(text)
        elif etype == "handoff_sent":
            data = getattr(event, "data", None)
            target = getattr(data, "target", None)
            if target:
                handoffs.append(target)
    turns = [(eid, "".join(parts).strip()) for eid, parts in buffers if any(parts)]
    participants = [eid for eid, _ in turns]
    final = turns[-1][1] if turns else ""
    return participants, final
```

Two things worth pinning down:
- `output` events are streaming deltas, not complete messages. `event.data` is an `AgentResponseUpdate` — the `.text` field holds the fragment emitted in this chunk, not the whole assistant turn. Aggregating consecutive events from the same executor gives you each agent's full message.
- `handoff_sent` is the audit trail. Every time an agent calls a `handoff_to_*` tool successfully, one `handoff_sent` event fires with source/target. That's the trace you ship to Aspire to prove which routing decisions actually happened.
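The aggregation discipline is plain fold logic. This standalone sketch (synthetic events, no MAF imports) reconstructs per-agent turns from streamed fragments the same way `ask(...)` does:

```python
def aggregate_turns(events: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Fold (executor_id, text_fragment) deltas into (executor_id, message)
    turns, cutting a new turn whenever the executor id changes."""
    turns: list[tuple[str, list[str]]] = []
    current: str | None = None
    for eid, fragment in events:
        if eid != current:
            current = eid
            turns.append((eid, []))
        turns[-1][1].append(fragment)
    return [(eid, "".join(parts)) for eid, parts in turns]


# A synthetic stream: triage speaks in two chunks, then math in two chunks.
stream = [
    ("triage", "Routing to "), ("triage", "math."),
    ("math", "37 * 42 "), ("math", "= 1554."),
]
print(aggregate_turns(stream))
# [('triage', 'Routing to math.'), ('math', '37 * 42 = 1554.')]
```

Treating each delta as a complete message (the Ch10 pattern) would instead hand you four partial strings.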
Running it:
```shell
uv run python tutorials/14-handoff-orchestration/python/main.py "What is 37 * 42?"
# Q: What is 37 * 42?
# Routing: triage → math
# A: 37 multiplied by 42 is 1,554.

uv run python tutorials/14-handoff-orchestration/python/main.py "When did World War 2 end?"
# Q: When did World War 2 end?
# Routing: triage → history
# A: World War 2 ended in 1945.
```

Two real LLM calls per question: Triage decides, specialist answers. The `Routing:` line is built from the `executor_id` of each agent that produced output — same data you'd get from `handoff_sent` events if you were shipping to telemetry instead of printing.
Source: `dotnet/Program.cs`. Same three agents, same mesh, but .NET leans into the interactive pattern: the sample starts the workflow, streams one batch of turns, and exits — the same shape a chat UI would use between user messages.
```csharp
using Microsoft.Agents.AI;
using Microsoft.Agents.AI.Workflows;
using Microsoft.Extensions.AI;
using OpenAI.Chat;

using ChatMessage = Microsoft.Extensions.AI.ChatMessage;

ChatClient chatClient = BuildChatClient(); // OpenAI or Azure OpenAI

AIAgent triage = chatClient.AsAIAgent(
    instructions: TriageInstructions,
    name: "triage_agent",
    description: "Routes questions to the appropriate specialist.");

AIAgent mathTutor = chatClient.AsAIAgent(
    instructions: MathInstructions,
    name: "math_tutor",
    description: "Specialist agent for math and arithmetic questions.");

AIAgent historyTutor = chatClient.AsAIAgent(
    instructions: HistoryInstructions,
    name: "history_tutor",
    description: "Specialist agent for historical questions, dates, and events.");

Workflow workflow = AgentWorkflowBuilder.CreateHandoffBuilderWith(triage)
    .WithHandoffs(triage, new[] { mathTutor, historyTutor })
    .WithHandoffs(new[] { mathTutor, historyTutor }, triage)
    .Build();
```

Four things worth flagging before the runner code:
- `description:` matters. The .NET builder derives the synthesised `handoff_to_<name>` tool's JSON schema description from each target agent's `Description`. A missing or generic `description` produces a schema the source agent's LLM can't route against — and routing quality collapses. Treat it as a prompt, not metadata.
- `WithHandoffs` takes either direction. `WithHandoffs(triage, [math, history])` says "triage can call handoff_to_math / handoff_to_history"; `WithHandoffs([math, history], triage)` says "math and history can both call handoff_to_triage_agent". Both overloads are used in this sample.
- `CreateHandoffBuilderWith(triage)` pins the start agent. The first agent to run is whichever one you pass here; the mesh around it is built with subsequent `WithHandoffs` calls.
- The analyzer warning. `CreateHandoffBuilderWith` is marked `[Experimental("MAAIW001")]` in 1.1. `TreatWarningsAsErrors=true` (inherited from Ch12/Ch13) promotes it to a compile error. The sample `.csproj` adds `<NoWarn>$(NoWarn);MAAIW001</NoWarn>` so the rest of the warning budget stays strict. Shape is stable; the attribute is a ship-warning, not a stability warning.
The interactive run loop#
.NET uses the same `InProcessExecution.RunStreamingAsync` + `TurnToken` pattern you saw in Ch13, but for Handoff the loop is meaningfully different: one `RunStreamingAsync` call drives one batch of turns — Triage runs, handoff fires, specialist runs, the workflow pauses on a `WorkflowOutputEvent` carrying the accumulated `List<ChatMessage>`. The caller then decides whether to feed another user message in for a follow-up batch.
```csharp
var messages = new List<ChatMessage> { new(ChatRole.User, question) };
var routing = new List<string>();
string? lastExecutorId = null;
List<ChatMessage>? newMessages = null;

await using StreamingRun run = await InProcessExecution.RunStreamingAsync(workflow, messages);
await run.TrySendMessageAsync(new TurnToken(emitEvents: true));

await foreach (WorkflowEvent evt in run.WatchStreamAsync())
{
    switch (evt)
    {
        case AgentResponseUpdateEvent update:
            if (update.ExecutorId != lastExecutorId)
            {
                lastExecutorId = update.ExecutorId;
                routing.Add(update.ExecutorId ?? "agent");
                Console.WriteLine();
                Console.WriteLine($"[{update.ExecutorId}]");
            }
            Console.Write(update.Update.Text);
            break;

        case WorkflowOutputEvent output when output.Data is List<ChatMessage> list:
            newMessages = list;
            break;
    }
}
```

Three practical notes:
- `AgentResponseUpdateEvent` is the streaming shape. One event per token, grouped by `ExecutorId`. Watch for the id transition to print the agent header exactly once per turn. (`AgentResponseEvent` exists and fires once per agent when the turn completes, if you prefer non-streaming.)
- `WorkflowOutputEvent.Data` is a `List<ChatMessage>`, the conversation so far including the user turn you sent in. That's the value you append future user messages to for the next batch.
- `TurnToken(emitEvents: true)` is not optional. Without the token the workflow sits wired-but-idle and `WatchStreamAsync()` yields nothing. Same gotcha as Ch13.
Running all together:
```shell
cd tutorials/14-handoff-orchestration/dotnet
dotnet run -- "What is 37 * 42?"
# Q: What is 37 * 42?
#
# [triage_agent]
# This is a math question. I'll route it to the math tutor.
#
# [math_tutor]
# 37 multiplied by 42 is 1,554.
#
# Routing: triage_agent -> math_tutor
# Final : 37 multiplied by 42 is 1,554.
```

Side-by-side — Python vs .NET#
| Aspect | Python | .NET |
|---|---|---|
| Builder entry point | `HandoffBuilder(participants=[...]).with_start_agent(t).build()` | `AgentWorkflowBuilder.CreateHandoffBuilderWith(t).Build()` |
| Declare outgoing edges | `.add_handoff(source, [targets])` | `.WithHandoffs(source, new[] { targets })` |
| Declare incoming edges | Same `.add_handoff(target, [source])` flipped | `.WithHandoffs(new[] { sources }, target)` overload |
| Default handoff tool name | `handoff_to_<agent_name>` synthesised per edge | `handoff_to_<agent_name>` synthesised per edge |
| Pause-for-user behaviour | Emits `request_info` event with `HandoffAgentUserRequest` payload | `WorkflowOutputEvent` returned to caller; caller decides whether to resume |
| Autonomous flag | `.with_autonomous_mode(agents=..., turn_limits={...}, prompts={...})` | Not exposed as a builder flag; emulated via the outer caller's run loop |
| Per-agent turn budget | `turn_limits={name: int}` on autonomous mode | Outer run loop bounds number of `RunStreamingAsync` calls |
| Observe handoffs | `handoff_sent` event with source/target on `event.data` | No dedicated event; infer from `AgentResponseUpdateEvent.ExecutorId` transitions |
| Per-token streaming | `output` events carry `AgentResponseUpdate` with `.text` fragments | `AgentResponseUpdateEvent` carries `Update.Text` fragments |
| Per-turn completion | `executor_completed` with `list[AgentExecutorResponse]` | `AgentResponseEvent` per agent |
| Final surface | `WorkflowOutputEvent`-analogue (`type == "output"`) with conversation payload | `WorkflowOutputEvent.Data` as `List<ChatMessage>` |
| Stability marker | Autonomous mode is experimental; core builder is not | `CreateHandoffBuilderWith` / `WithHandoffs` marked `[Experimental("MAAIW001")]` |
Structurally the languages agree. The sharpest divergence is where the pause-for-user boundary lives: Python raises it as an explicit event inside the same run(...) call, so autonomous mode is a builder toggle that fabricates responses; .NET returns control to the caller at the same boundary, so “autonomous” is whatever outer loop the application builds. Neither is wrong; they’re different defaults for the same state machine.
Gotchas#
- Every source agent needs an outgoing edge list — even if that list is empty. If you skip `WithHandoffs(math, [...])` entirely, Math has no handoff tool and must answer. That's sometimes what you want, but the Python builder will warn `math has no handoff targets` to remind you.
- `description` on each agent is the handoff prompt. The synthesised `handoff_to_<name>` tool's JSON schema description comes from the target's `Description` (.NET) / `description=` (Python). A generic "specialist agent" string routes badly. Write these as if they were a prompt line, because they are.
- Turn limits are mandatory in autonomous mode. Without `turn_limits={...}`, the Python workflow will happily cycle `triage -> math -> triage -> math` until you hit a quota error. Pick limits that match your minimum viable path length plus one.
- `handoff_sent` fires on successful handoff, not on attempt. If the LLM emits a malformed tool call, MAF surfaces that as a function-calling error, not a handoff event. Treat `handoff_sent` as "routing actually happened."
- `output` events in Python are deltas. `event.data` is an `AgentResponseUpdate`, not a string. Use `.text` for the fragment; aggregate consecutive events from the same `executor_id` to reconstruct each turn. Copying Ch10's data-event pattern here gives you partial strings.
- .NET `AgentResponseUpdateEvent` also streams deltas. Same aggregation discipline applies. Track `ExecutorId` transitions to cut between agents.
- `TurnToken(emitEvents: true)` forgotten = silent hang (.NET). `RunStreamingAsync` returns a `StreamingRun` in a wired-but-not-started state. The TurnToken is what dispatches the first superstep.
- Experimental markers stay experimental. `HandoffBuilder.with_autonomous_mode` in Python and `CreateHandoffBuilderWith` / `WithHandoffs` in .NET are public preview in 1.1. Shape is fine for tutorials and internal tools; pin your MAF version in production so a minor upgrade doesn't rename a method.
- Don't share a single agent across workflows. Handoff agents are mutated at build time (the synthesised handoff tools are injected into their tool list). Build a fresh agent per workflow instance; reuse the `ChatClient`.
- Specialist instructions want a "stop when done" clause. Without one the specialist will gratefully take every follow-up from the user that wanders into view, even if it's no longer in their domain. The sample uses "in ONE short sentence" plus "Do not hand off back unless the question is clearly not about math" — both matter.
Tests#
Python ships 1 wiring test plus 3 real-LLM integration tests. Integration tests skip when no LLM credentials are in .env.
```shell
# Python (4 tests)
source agents/.venv/bin/activate
python -m pytest tutorials/14-handoff-orchestration/python/tests/ -v
# 4 passed (3 hit real Azure OpenAI)
```

The tests exercise:
- Wiring — `build_workflow()` returns a non-null `Workflow` with the three participants and mesh edges.
- Math routing — "What is 37 * 42?" lands in the Math specialist and the answer contains `1554` (comma-tolerant).
- History routing — "When did World War 2 end?" lands in the History specialist and the answer contains `1945`.
- Divergence — math and history questions routed in the same test session produce distinct participant sets, proving the mesh is actually routing and not always picking the same agent.
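"Comma-tolerant" just means normalising thousands separators before the substring check. A one-line sketch of the assertion shape the routing tests presumably use (helper name is illustrative):

```python
def contains_number(answer: str, expected: str) -> bool:
    """True if the expected digit string appears once commas are stripped,
    so '1,554' and '1554' both satisfy an expected '1554'."""
    return expected in answer.replace(",", "")


assert contains_number("37 multiplied by 42 is 1,554.", "1554")
assert contains_number("World War 2 ended in 1945.", "1945")
print("ok")
```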
.NET:
```shell
cd tutorials/14-handoff-orchestration/dotnet
dotnet build                               # type-check and compile
dotnet run -- "What is 37 * 42?"           # end-to-end with real LLM
dotnet run -- "When did World War 2 end?"
```

The .NET build pins `Microsoft.Agents.AI.Workflows` 1.1.0, `Microsoft.Agents.AI` 1.1.0, `Microsoft.Agents.AI.OpenAI` 1.1.0, and `Azure.AI.OpenAI` 2.1.0 — same as Ch12/Ch13. The `.csproj` adds `<NoWarn>$(NoWarn);MAAIW001</NoWarn>` to silence the experimental attribute; `TreatWarningsAsErrors=true` stays in force for everything else.
How this shows up in the capstone#
agents/python/orchestrator/agent.py ships a hand-rolled router today: one call_specialist_agent(agent_name, message) tool that the orchestrator’s LLM invokes, with the orchestrator deciding which specialist to name and then calling the A2A endpoint over HTTP. That’s effectively a single giant handoff tool where the LLM picks the target from a string enum — awkward to prompt and impossible to audit without custom logging.
Phase 7 plans/refactor/10-orchestrator-to-handoff.md replaces it with HandoffBuilder. The orchestrator becomes the start agent; each specialist (ProductDiscovery, OrderManagement, PricingPromotions, ReviewSentiment, InventoryFulfillment) is a participant with explicit edges: orchestrator -> every specialist, every specialist -> orchestrator. A2A over HTTP remains the wire transport — Handoff drives orchestration in-process; A2A moves messages between service boundaries when a specialist runs in a different container.
The .NET parity port in agents/dotnet/src/ECommerceAgents.Orchestrator/ follows the same trajectory via `AgentWorkflowBuilder.CreateHandoffBuilderWith(orchestrator).WithHandoffs(...)`. The refactor plan names this chapter as the pattern reference and cites the mesh-plus-turn-limits guidance above for the loop-prevention strategy in production.
Further reading#
- MAF docs — Handoff Orchestration (C#) — canonical .NET reference including the interactive multi-turn loop this chapter mirrors.
- MAF docs — Handoff Orchestration (Python) — covers `HandoffBuilder`, autonomous mode, tool approval inside handoffs, and checkpointed durable handoffs.
- Previous chapter: Chapter 13 — Concurrent Orchestration.
- Next chapter: Chapter 15 — Group Chat Orchestration — similar mesh shape, but a manager picks the next speaker instead of the agents picking each other.

