Series note — This article is part of MAF v1: Python and .NET. The Python-only predecessor Part 5 — Observability: Tracing Multi-Agent Workflows with OpenTelemetry is the definitive deep-dive for the capstone’s 400-line telemetry module. This chapter is the portable version: the minimum plumbing to get a real trace on screen in either language, with jargon defined inline.
Repo — Full runnable code for this chapter is at https://github.com/nitin27may/e-commerce-agents/tree/main/tutorials/07-observability-otel. Clone the repo, `cd tutorials/07-observability-otel`, and follow the README.
Why this chapter#
Agents fail in weird ways. The LLM called the wrong tool. The tool returned empty and the model hallucinated over it. A retry loop silently burned 40k tokens before giving up. A third-party API went from 200 ms to 4 s and every downstream request piled up behind it. None of those show up in a log line that says INFO: request completed.
What you need is a trace — a tree of timed operations (spans) that shows exactly which step took how long and what attributes it carried. MAF emits those spans out of the box. You wire one TracerProvider per process, point it at an exporter, and every agent.run / RunAsync call produces:
- an `invoke_agent` span wrapping the turn,
- a `chat` child span per LLM call with `gen_ai.request.model`, `gen_ai.usage.input_tokens`, `gen_ai.usage.output_tokens`, `gen_ai.response.finish_reason`,
- an HTTP span per network round trip to the provider (DNS, TLS, status code),
- one `execute_tool` span per tool call inside the loop.
That’s enough to debug the four failure modes above in seconds rather than hours. A retry loop shows up as three siblings named chat under one invoke_agent, each with the same gen_ai.request.model and increasing gen_ai.usage.input_tokens. A slow LLM shows up as one chat span whose duration bar stretches past the tool and HTTP spans around it. A wrong tool shows up as an execute_tool span with an unexpected tool.name attribute — no need to grep logs; the trace view is the log, grouped the way you want to read it.
Prerequisites#
- Completed Chapter 06 — Middleware.
- `.env` at the repo root with either `OPENAI_API_KEY` or the Azure OpenAI trio (`AZURE_OPENAI_ENDPOINT`, `AZURE_OPENAI_KEY`, `AZURE_OPENAI_DEPLOYMENT`).
- Read-first (optional): Agents — Observability and Integrations — Aspire / Azure Monitor / Langfuse.
The concept#
A span is a timed, named unit of work with a start time, end time, and a bag of string-typed attributes. Spans form a tree through parent-child references: the root span covers the whole request, children cover sub-operations. Together they make a trace (all spans that share a trace_id). Everything in observability — Jaeger, Azure Monitor, Datadog, the Aspire Dashboard — is a different UI on top of the same OTLP wire format carrying these spans.
MAF emits three kinds of span you care about, using the GenAI semantic attributes — a W3C-backed convention for LLM telemetry (gen_ai.operation.name, gen_ai.request.model, gen_ai.usage.input_tokens, gen_ai.response.finish_reason, etc.). Every agent framework and every LLM SDK that plays along tags spans with the same keys, so one dashboard works across Python, .NET, OpenAI, Azure, Anthropic, and anything else.
```
agent.run("…")
└── invoke_agent traced-agent            gen_ai.operation.name = invoke_agent
    │                                    gen_ai.agent.name = traced-agent
    └── chat gpt-4.1                     gen_ai.request.model
        │                                gen_ai.usage.input_tokens
        │                                gen_ai.usage.output_tokens
        │                                gen_ai.response.finish_reason
        └── HTTP POST /chat/completions  http.method, http.status_code,
            │                            net.peer.name
            └── OpenAI / Azure OpenAI
```
One user request builds a three-level span tree: invoke_agent wraps the turn, chat wraps each LLM call inside it, HTTP wraps the actual network round trip. A tool-calling run adds sibling execute_tool spans under invoke_agent. The parent-child shape is what lets the dashboard show “90 % of this request was one slow LLM call”.
The pipeline — from code to dashboard#
The SDK side is four objects chained together. Every OTel stack looks like this regardless of language.
| Stage | What it is | Python | .NET |
|---|---|---|---|
| Tracer / ActivitySource | Creates spans | trace.get_tracer("…") | new ActivitySource("…") |
| TracerProvider | Owns tracers, wires in processors | TracerProvider(resource=…) | Sdk.CreateTracerProviderBuilder() |
| Span processor | Batches or forwards spans | BatchSpanProcessor | BatchActivityExportProcessor / SimpleActivityExportProcessor |
| Exporter | Serializes spans to the wire | ConsoleSpanExporter / OTLPSpanExporter | ConsoleExporter / OtlpExporter |
Spans created by a tracer bubble up through the provider → processor → exporter, and land in your dashboard. Swap the exporter for OTLPSpanExporter pointed at http://aspire:4317 (Aspire’s OTLP gRPC port) and the same code that prints to console now feeds the dashboard. That’s the whole contract.
```
code → Tracer / ActivitySource
         │ creates
         ▼
       Span
         │ finishes
         ▼
       SpanProcessor (batches, adds sampling, etc.)
         │ hands off
         ▼
       Exporter (ConsoleSpanExporter | OTLPSpanExporter | ...)
         │ serializes
         ▼
       Collector / Dashboard (Aspire :18888, Jaeger, Azure Monitor)
```

Every arrow is asynchronous except the first, which is why tests need `force_flush()` — spans you just created have not hit the exporter yet.
Jargon recap#
- Span — a timed, named unit of work with attributes. Parent-child edges make a trace.
- Trace — all spans sharing one `trace_id`; usually one user request or one job.
- `TracerProvider` — the OTel object that owns tracers. Build once per process, set globally.
- `ActivitySource` (.NET) — .NET's name for a tracer. `System.Diagnostics.Activity` is .NET's name for a span. Same concept, older API wrapped by OTel.
- Exporter — where spans go. `ConsoleSpanExporter` for dev, `OTLPSpanExporter` for prod (Aspire, Jaeger, Azure Monitor, Datadog, anything speaking OTLP).
- `BatchSpanProcessor` — queues spans and exports in batches. The right default for prod: fewer round trips, doesn't block the agent. `SimpleActivityExportProcessor` is sync/immediate — fine for tests.
- OTLP — OpenTelemetry Protocol. The vendor-neutral wire format over gRPC (port 4317) or HTTP (port 4318). Every modern observability backend speaks it.
- GenAI semantic attributes — standardized keys for AI spans (`gen_ai.operation.name`, `gen_ai.request.model`, `gen_ai.usage.*`, `gen_ai.response.finish_reason`). MAF applies them automatically.
- Aspire Dashboard — .NET's dev-time telemetry UI. Runs at `:18888` in this repo; consumes OTLP, renders traces / metrics / logs.
- `enable_sensitive_data` (Python) — toggle on `enable_instrumentation(...)` that includes full prompts and responses as span attributes. Off by default. On for local debugging; off in prod.
Code walkthrough#
Full source: dotnet/Program.cs. The tracer-provider factory mirrors the Python one:
```csharp
using System.Diagnostics;
using OpenTelemetry;
using OpenTelemetry.Resources;
using OpenTelemetry.Trace;

public static readonly string[] ActivitySources = new[]
{
    "Microsoft.Agents.AI",
    "Microsoft.Extensions.AI",
    "*",
};

public static TracerProvider BuildTracerProvider(BaseExporter<Activity> exporter)
{
    return Sdk.CreateTracerProviderBuilder()
        .SetResourceBuilder(ResourceBuilder.CreateDefault().AddService("maf-v1-ch07"))
        .AddSource(ActivitySources)
        .AddProcessor(new SimpleActivityExportProcessor(exporter))
        .Build()!;
}

using var tracer = BuildTracerProvider(new ConsoleExporter());
var agent = BuildAgent();
await agent.RunAsync("What is C#?");
```

Three differences worth staring at:
- There is no `enable_instrumentation()` on .NET. MAF's .NET libraries emit spans unconditionally from named `ActivitySource` instances; you opt in by telling the tracer provider which source names to listen to. `Microsoft.Agents.AI` carries the agent-run spans; `Microsoft.Extensions.AI` carries the chat-client spans; the `"*"` wildcard catches everything else (HTTP, DNS, TLS).
- `SimpleActivityExportProcessor` is synchronous. It exports each span as it finishes — convenient for chapter demos and tests, but the prod default is `BatchActivityExportProcessor` or the batching you get automatically when you call `.AddOtlpExporter(...)`.
- Sensitive data is an env var, not a method argument. Set `OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT=true` to include prompts and completions as span attributes.
Run it:
```shell
cd tutorials/07-observability-otel/dotnet
dotnet run -- "What is C#?"

# Console prints:
# [span] POST localhost:...          | tags: 6   ← HTTP call
# [span] chat gpt-4.1                | tags: 9   ← chat child
# [span] invoke_agent traced-agent   | tags: 5   ← parent
```

Spans print in the order they finish, not the order they start, because a parent stays open until all its children close.
Full source: python/main.py. The setup function is seven lines of actual work:
```python
from agent_framework import Agent
from agent_framework.observability import enable_instrumentation
from agent_framework.openai import OpenAIChatClient, OpenAIChatCompletionClient
from opentelemetry import trace
from opentelemetry.sdk.resources import SERVICE_NAME, Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor, ConsoleSpanExporter


def setup_tracing(service_name: str = "maf-v1-ch07", exporter=None) -> TracerProvider:
    resource = Resource.create({SERVICE_NAME: service_name})
    provider = TracerProvider(resource=resource)
    provider.add_span_processor(BatchSpanProcessor(exporter or ConsoleSpanExporter()))
    trace.set_tracer_provider(provider)
    enable_instrumentation(enable_sensitive_data=True)
    return provider
```

Four things are happening:
- `Resource` carries the service name that labels every span this process emits — how the dashboard groups spans into swim lanes.
- `TracerProvider` is the global handle; once `trace.set_tracer_provider(...)` is called, every `trace.get_tracer(...)` in the process sees it. Calling it twice is a warning and a no-op.
- `BatchSpanProcessor` + `ConsoleSpanExporter` is the span pipeline. Swap `ConsoleSpanExporter` for `OTLPSpanExporter(endpoint="http://aspire:4317", insecure=True)` to send to Aspire.
- `enable_instrumentation(enable_sensitive_data=True)` is the MAF-specific line — it opts in the auto-instrumentation that wraps `ChatClient.get_response(...)` with a `chat` span carrying GenAI attributes. Leave `enable_sensitive_data=False` in production.
Run it:
```shell
cd tutorials/07-observability-otel/python
uv sync
uv run python main.py "What is Python?"
# prints the answer and ~3-5 spans per run
```

Observability UI — Aspire Dashboard, and where DevUI fits#
For dev, point the OTLP exporter at the Aspire Dashboard that already ships in this repo’s compose stack. It runs at localhost:18888, consumes OTLP on gRPC port 4317, and renders traces, metrics, and structured logs in one pane. The swap on either side:
```python
# Python — replace ConsoleSpanExporter
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter

exporter = OTLPSpanExporter(endpoint="http://localhost:18889", insecure=True)
```

```csharp
// .NET — replace the console exporter / use the OTLP extension
tracing.AddOtlpExporter(opts => opts.Endpoint = new Uri("http://localhost:18889"));
```

In the Aspire traces view you will see the three-level tree from the diagram above — `invoke_agent` at the top, `chat` beneath it, HTTP under that — with `gen_ai.*` attributes in the side panel and duration in milliseconds on each bar. The GenAI side tab (available once `OTEL_SEMCONV_STABILITY_OPT_IN=gen_ai_latest_experimental` is set on the producer) groups spans by `gen_ai.conversation.id`, turning a conversation into a single scrollable timeline.
For production, the same code points at a different OTLP endpoint:
- Azure Monitor / Application Insights — set `APPLICATIONINSIGHTS_CONNECTION_STRING` and add `azure-monitor-opentelemetry` (Python) / `Azure.Monitor.OpenTelemetry.AspNetCore` (.NET). Everything else is unchanged.
- Jaeger / Tempo / Grafana Cloud — run Jaeger all-in-one locally (`docker run jaegertracing/all-in-one`) and point the OTLP endpoint at `http://localhost:4317`.
- Langfuse — LLM-specific backend that understands `gen_ai.*` attributes natively and shows prompt/completion diffs between runs.
DevUI is different. DevUI is the MAF-native browser dashboard for interactively driving an agent or workflow: you type a prompt, watch tool calls fire, inspect intermediate state. Aspire visualises passive telemetry; DevUI is an active test harness. Chapter 20b — DevUI covers it end-to-end.
Side-by-side differences#
| Aspect | Python | .NET |
|---|---|---|
| Enable MAF instrumentation | enable_instrumentation(enable_sensitive_data=False) | .AddSource("Microsoft.Agents.AI", "Microsoft.Extensions.AI") |
| Span primitive | OpenTelemetry Span | System.Diagnostics.Activity (wrapped by OTel) |
| Tracer primitive | trace.get_tracer("name") | new ActivitySource("name") |
| Default processor in chapter | BatchSpanProcessor | SimpleActivityExportProcessor |
| Default exporter in chapter | ConsoleSpanExporter | Custom ConsoleExporter |
| Swap to OTLP | OTLPSpanExporter(endpoint=…, insecure=True) | .AddOtlpExporter(opts => opts.Endpoint = …) |
| GenAI attributes | gen_ai.operation.name, gen_ai.request.model, gen_ai.usage.* | Same attribute names (OTel cross-language convention) |
| Sensitive data toggle | enable_sensitive_data=True on enable_instrumentation | Env var OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT=true |
Both produce the same OTLP wire bytes. A Python process and a .NET process exporting to the same Aspire instance appear in the same trace if you propagate the context header (traceparent) across A2A calls — the capstone does exactly this.
Gotchas#
- `TracerProvider` is a process-global singleton. Python warns and ignores a second `trace.set_tracer_provider(...)`. In .NET the second `Build()` returns a new disposable but existing sources stay pinned to the first. Tests must share one provider (see `tests/test_observability.py`'s module-level `setup_tracing` and `_EXPORTER.clear()` between tests).
- .NET needs explicit `ActivitySource` names. Without `AddSource("Microsoft.Agents.AI", "Microsoft.Extensions.AI")` you will get zero MAF spans and spend an afternoon wondering why. The wildcard `"*"` is convenient for demos but blanket-captures every library using `ActivitySource` — noisy in prod.
- `enable_sensitive_data=True` prints prompts and responses. Useful for debugging a failed tool selection locally; a compliance incident in prod. Default is `False`. In the capstone it is gated on `GENAI_CAPTURE_CONTENT=true` in settings.
- Cost impact of sensitive-data mode. Spans grow from a few hundred bytes to several KB when they include full completions. At 20 spans per request and 100 rps, you can push tens of GB per hour to your collector. Aspire tolerates it locally; Azure Monitor and Datadog will bill you.
- Sampling. The SDK default is `AlwaysOn` — every span exported. Fine for a demo, expensive in prod. For high-QPS services, configure `TraceIdRatioBased(0.1)` in Python / `.SetSampler(new TraceIdRatioBasedSampler(0.1))` in .NET. Head sampling is per-trace, so once the root is sampled the whole tree is kept — no broken trees.
- Exporter choice matters. `ConsoleSpanExporter` is fine for a demo but flushes synchronously — don't use it in a hot path. `BatchSpanProcessor` + `OTLPSpanExporter` is the prod default.
- Async flush in tests. Both languages buffer spans. If you assert on `get_finished_spans()` without `provider.force_flush()` / `await Task.Delay(500)`, you will get a flaky test.
- Trace context across A2A calls. If you spawn a subprocess, queue a Service Bus message, or POST to another agent, the new process has a new `trace_id` by default. Propagate the W3C `traceparent` header (both `httpx` and `HttpClient` instrumentations do this automatically) or the child process's spans will live in their own orphan trace.
- Clock skew between services. Dashboards render timelines by absolute timestamp. Containers with drifted clocks produce traces with children that appear to start before their parents. Run NTP or rely on your container runtime's clock; there is no OTel fix at the trace level.
Tests#
```shell
# Python — 3 integration tests
cd tutorials/07-observability-otel/python
uv run pytest -v

# .NET — 2 integration tests
cd tutorials/07-observability-otel/dotnet
dotnet test tests/Observability.Tests.csproj
```

All 5 tests green — both stacks proven to emit real spans for real agent runs against Azure OpenAI. The most interesting assertion is `test_spans_include_genai_attributes`: it filters span attribute keys by `startswith("gen_ai.")` and fails if none are present. That test is the regression canary — if a future MAF version drops GenAI attributes or changes their prefix, this test screams before the capstone does.
How this shows up in the capstone#
Chapter 07 is the simplified version. The capstone telemetry module is 400 lines of production wiring at agents/python/shared/telemetry.py:
- `setup_telemetry(service_name)` (telemetry.py:30-52) — guarded by `OTEL_ENABLED`, safe to call when Aspire is unreachable, never throws into the agent host.
- `_do_setup(...)` (telemetry.py:54-108) — three providers in one call: `TracerProvider`, `MeterProvider` (metrics every 5 s), `LoggerProvider` bridging Python `logging` → OTel. Tries gRPC OTLP first and falls back to HTTP.
- `_instrument_openai()` (telemetry.py:356-377) — `OpenAIInstrumentor().instrument()` auto-wraps the OpenAI SDK with the GenAI conventions.
- `instrument_fastapi(app)` / `instrument_starlette(app)` (telemetry.py:110-129) — one-line auto-instrumentation for the HTTP layer.
- `agent_run_span(agent_name)` (telemetry.py:183-217) — context manager tagging the span with `gen_ai.operation.name=invoke_agent` and enriching it from ContextVars (user email, session id, `gen_ai.conversation.id`).
- `a2a_call_span(source, target, url)` (telemetry.py:220-247) — `SpanKind.CLIENT` span wrapping an inter-agent HTTP POST.
- `tool_call_span(tool_name)` (telemetry.py:250-273) — wraps individual tool invocations inside the tool-calling loop.
- `traced_tool` (telemetry.py:276-308) — decorator that layers a span around an `@tool`-decorated function when MAF doesn't emit one natively.
Wiring from a service: agents/python/product_discovery/main.py:12-14 calls setup_telemetry("ecommerce.product-discovery") and instrument_fastapi(app) inside the FastAPI on_startup hook. Every specialist does this identically.
The .NET equivalent lives at agents/dotnet/src/ECommerceAgents.Shared/Telemetry/TelemetrySetup.cs. AddAgentTelemetry(settings) on IServiceCollection wires AddAspNetCoreInstrumentation, AddHttpClientInstrumentation, and AddOtlpExporter in one call.
Both stacks export to the same Aspire Dashboard at :18888, which renders the full orchestrator → A2A → specialist → tool → LLM tree for any user request in the capstone — mixed Python and .NET in one flame graph.
Further reading#
- Canonical README: tutorials/07-observability-otel
- MAF docs — Observability
- MAF docs — Integrations (Aspire, Azure Monitor, Langfuse)
- OTel GenAI semantic conventions
- OTLP protocol spec
What’s next#
- Next chapter: Chapter 08 — MCP Tools — MCP calls appear as child spans under `invoke_agent` with zero extra wiring.
- Jump to the interactive flip side: Chapter 20b — DevUI.
- Full source: `python/` · `dotnet/`

