Series note — This article is part of MAF v1: Python and .NET. The Python-only predecessor Part 5 — Observability: Tracing Multi-Agent Workflows with OpenTelemetry is the definitive deep-dive for the capstone’s 400-line telemetry module. This chapter is the portable version: the minimum plumbing to get a real trace on screen in either language, with jargon defined inline.
Repo — Full runnable code for this chapter is at https://github.com/nitin27may/e-commerce-agents/tree/main/tutorials/07-observability-otel. Clone the repo, `cd tutorials/07-observability-otel`, and follow the README.
Why this chapter#
Agents fail in weird ways. The LLM called the wrong tool. The tool returned empty and the model hallucinated over it. A retry loop silently burned 40k tokens before giving up. A third-party API went from 200 ms to 4 s and every downstream request piled up behind it. None of those show up in a log line that says INFO: request completed.
What you need is a trace — a tree of timed operations (spans) that shows exactly which step took how long and what attributes it carried. MAF emits those spans out of the box. You wire one TracerProvider per process, point it at an exporter, and every agent.run / RunAsync call produces:
- an `invoke_agent` span wrapping the turn,
- a `chat` child span per LLM call with `gen_ai.request.model`, `gen_ai.usage.input_tokens`, `gen_ai.usage.output_tokens`, `gen_ai.response.finish_reason`,
- an HTTP span per network round trip to the provider (DNS, TLS, status code),
- one `execute_tool` span per tool call inside the loop.
That’s enough to debug the four failure modes above in seconds rather than hours. A retry loop shows up as three siblings named chat under one invoke_agent, each with the same gen_ai.request.model and increasing gen_ai.usage.input_tokens. A slow LLM shows up as one chat span whose duration bar stretches past the tool and HTTP spans around it. A wrong tool shows up as an execute_tool span with an unexpected tool.name attribute — no need to grep logs; the trace view is the log, grouped the way you want to read it.
Prerequisites#
- Completed Chapter 06 — Middleware.
- `.env` at the repo root with either `OPENAI_API_KEY` or the Azure OpenAI trio (`AZURE_OPENAI_ENDPOINT`, `AZURE_OPENAI_KEY`, `AZURE_OPENAI_DEPLOYMENT`).
- Read-first (optional): Agents — Observability and Integrations — Aspire / Azure Monitor / Langfuse.
The concept#
A span is a timed, named unit of work with a start time, end time, and a bag of string-typed attributes. Spans form a tree through parent-child references: the root span covers the whole request, children cover sub-operations. Together they make a trace (all spans that share a trace_id). Everything in observability — Jaeger, Azure Monitor, Datadog, the Aspire Dashboard — is a different UI on top of the same OTLP wire format carrying these spans.
MAF emits three kinds of span you care about, using the GenAI semantic attributes — a W3C-backed convention for LLM telemetry (gen_ai.operation.name, gen_ai.request.model, gen_ai.usage.input_tokens, gen_ai.response.finish_reason, etc.). Every agent framework and every LLM SDK that plays along tags spans with the same keys, so one dashboard works across Python, .NET, OpenAI, Azure, Anthropic, and anything else.
```
agent.run("…")
└── invoke_agent traced-agent            gen_ai.operation.name = invoke_agent
    │                                    gen_ai.agent.name = traced-agent
    └── chat gpt-4.1                     gen_ai.request.model
        │                                gen_ai.usage.input_tokens
        │                                gen_ai.usage.output_tokens
        │                                gen_ai.response.finish_reason
        └── HTTP POST /chat/completions  http.method, http.status_code,
            │                            net.peer.name
            └── OpenAI / Azure OpenAI
```
One user request builds a three-level span tree: invoke_agent wraps the turn, chat wraps each LLM call inside it, HTTP wraps the actual network round trip. A tool-calling run adds sibling execute_tool spans under invoke_agent. The parent-child shape is what lets the dashboard show “90 % of this request was one slow LLM call”.
The pipeline — from code to dashboard#
The SDK side is four objects chained together. Every OTel stack looks like this regardless of language.
| Stage | What it is | Python | .NET |
|---|---|---|---|
| Tracer / ActivitySource | Creates spans | trace.get_tracer("…") | new ActivitySource("…") |
| TracerProvider | Owns tracers, wires in processors | TracerProvider(resource=…) | Sdk.CreateTracerProviderBuilder() |
| Span processor | Batches or forwards spans | BatchSpanProcessor | BatchActivityExportProcessor / SimpleActivityExportProcessor |
| Exporter | Serializes spans to the wire | ConsoleSpanExporter / OTLPSpanExporter | ConsoleExporter / OtlpExporter |
Spans created by a tracer bubble up through the provider → processor → exporter, and land in your dashboard. Swap the exporter for OTLPSpanExporter pointed at http://aspire:4317 (Aspire’s OTLP gRPC port) and the same code that prints to console now feeds the dashboard. That’s the whole contract.
```
code → Tracer / ActivitySource
         │ creates
         ▼
       Span
         │ finishes
         ▼
       SpanProcessor (batches, adds sampling, etc.)
         │ hands off
         ▼
       Exporter (ConsoleSpanExporter | OTLPSpanExporter | ...)
         │ serializes
         ▼
       Collector / Dashboard (Aspire :18888, Jaeger, Azure Monitor)
```

Every arrow is asynchronous except the first, which is why tests need `force_flush()` — spans you just created have not hit the exporter yet.
Jargon recap#
- Span — a timed, named unit of work with attributes. Parent-child edges make a trace.
- Trace — all spans sharing one `trace_id`; usually one user request or one job.
- `TracerProvider` — the OTel object that owns tracers. Build once per process, set globally.
- `ActivitySource` (.NET) — .NET's name for a tracer. `System.Diagnostics.Activity` is .NET's name for a span. Same concept, older API wrapped by OTel.
- Exporter — where spans go. `ConsoleSpanExporter` for dev, `OTLPSpanExporter` for prod (Aspire, Jaeger, Azure Monitor, Datadog, anything speaking OTLP).
- `BatchSpanProcessor` — queues spans and exports in batches. The right default for prod: fewer round trips, doesn't block the agent. `SimpleActivityExportProcessor` is sync/immediate — fine for tests.
- OTLP — OpenTelemetry Protocol. The vendor-neutral wire format over gRPC (port 4317) or HTTP (port 4318). Every modern observability backend speaks it.
- GenAI semantic attributes — standardized keys for AI spans (`gen_ai.operation.name`, `gen_ai.request.model`, `gen_ai.usage.*`, `gen_ai.response.finish_reason`). MAF applies them automatically.
- Aspire Dashboard — .NET's dev-time telemetry UI. Runs at `:18888` in this repo; consumes OTLP, renders traces / metrics / logs.
- `enable_sensitive_data` (Python) — toggle on `enable_instrumentation(...)` that includes full prompts and responses as span attributes. Off by default. On for local debugging; off in prod.
Code walkthrough#
Full source: dotnet/Program.cs. The tracer-provider factory mirrors the Python one:
```csharp
using System.Diagnostics;
using OpenTelemetry;
using OpenTelemetry.Resources;
using OpenTelemetry.Trace;

public static readonly string[] ActivitySources = new[]
{
    "Microsoft.Agents.AI",
    "Microsoft.Extensions.AI",
    "*",
};

public static TracerProvider BuildTracerProvider(BaseExporter<Activity> exporter)
{
    return Sdk.CreateTracerProviderBuilder()
        .SetResourceBuilder(ResourceBuilder.CreateDefault().AddService("maf-v1-ch07"))
        .AddSource(ActivitySources)
        .AddProcessor(new SimpleActivityExportProcessor(exporter))
        .Build()!;
}

using var tracer = BuildTracerProvider(new ConsoleExporter());
var agent = BuildAgent();
await agent.RunAsync("What is C#?");
```

Three differences worth staring at:
- There is no `enable_instrumentation()` on .NET. MAF's .NET libraries emit spans unconditionally from named `ActivitySource` instances; you opt in by telling the tracer provider which source names to listen to. `Microsoft.Agents.AI` carries the agent-run spans; `Microsoft.Extensions.AI` carries the chat-client spans; the `"*"` wildcard catches everything else (HTTP, DNS, TLS).
- `SimpleActivityExportProcessor` is synchronous. It exports each span as it finishes — convenient for chapter demos and tests, but the prod default is `BatchActivityExportProcessor` or the batching you get automatically when you call `.AddOtlpExporter(...)`.
- Sensitive data is an env var, not a method argument. Set `OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT=true` to include prompts and completions as span attributes.
Run it:
```shell
cd tutorials/07-observability-otel/dotnet
dotnet run -- "What is C#?"

# Console prints:
# [span] POST localhost:...          | tags: 6   ← HTTP call
# [span] chat gpt-4.1                | tags: 9   ← chat child
# [span] invoke_agent traced-agent   | tags: 5   ← parent
```

Spans print in the order they finish, not the order they start, because a parent stays open until all its children close.
Full source: python/main.py. The setup function is seven lines of actual work:
```python
from agent_framework import Agent
from agent_framework.observability import enable_instrumentation
from agent_framework.openai import OpenAIChatClient, OpenAIChatCompletionClient
from opentelemetry import trace
from opentelemetry.sdk.resources import SERVICE_NAME, Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor, ConsoleSpanExporter


def setup_tracing(service_name: str = "maf-v1-ch07", exporter=None) -> TracerProvider:
    resource = Resource.create({SERVICE_NAME: service_name})
    provider = TracerProvider(resource=resource)
    provider.add_span_processor(BatchSpanProcessor(exporter or ConsoleSpanExporter()))
    trace.set_tracer_provider(provider)
    enable_instrumentation(enable_sensitive_data=True)
    return provider
```

Four things are happening:
- `Resource` carries the service name that labels every span this process emits — how the dashboard groups spans into swim lanes.
- `TracerProvider` is the global handle; once `trace.set_tracer_provider(...)` is called, every `trace.get_tracer(...)` in the process sees it. Calling it twice is a warning and a no-op.
- `BatchSpanProcessor` + `ConsoleSpanExporter` is the span pipeline. Swap `ConsoleSpanExporter` for `OTLPSpanExporter(endpoint="http://aspire:4317", insecure=True)` to send to Aspire.
- `enable_instrumentation(enable_sensitive_data=True)` is the MAF-specific line — it opts in the auto-instrumentation that wraps `ChatClient.get_response(...)` with a `chat` span carrying GenAI attributes. Leave `enable_sensitive_data=False` in production.
Run it:
```shell
cd tutorials/07-observability-otel/python
uv sync
uv run python main.py "What is Python?"
# prints the answer and ~3-5 spans per run
```

Observability UI — Aspire Dashboard, and where DevUI fits#
For dev, point the OTLP exporter at the Aspire Dashboard that already ships in this repo’s compose stack. It runs at localhost:18888, consumes OTLP on gRPC port 4317, and renders traces, metrics, and structured logs in one pane. The swap on either side:
```python
# Python — replace ConsoleSpanExporter
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter

exporter = OTLPSpanExporter(endpoint="http://localhost:18889", insecure=True)
```

```csharp
// .NET — replace the console exporter / use the OTLP extension
tracing.AddOtlpExporter(opts => opts.Endpoint = new Uri("http://localhost:18889"));
```

In the Aspire traces view you will see the three-level tree from the diagram above — `invoke_agent` at the top, `chat` beneath it, HTTP under that — with `gen_ai.*` attributes in the side panel and duration in milliseconds on each bar. The GenAI side tab (available once `OTEL_SEMCONV_STABILITY_OPT_IN=gen_ai_latest_experimental` is set on the producer) groups spans by `gen_ai.conversation.id`, turning a conversation into a single scrollable timeline.
For production, the same code points at a different OTLP endpoint:
- Azure Monitor / Application Insights — set `APPLICATIONINSIGHTS_CONNECTION_STRING` and add `azure-monitor-opentelemetry` (Python) / `Azure.Monitor.OpenTelemetry.AspNetCore` (.NET). Everything else is unchanged.
- Jaeger / Tempo / Grafana Cloud — run Jaeger all-in-one locally (`docker run jaegertracing/all-in-one`) and point the OTLP endpoint at `http://localhost:4317`.
- Langfuse — LLM-specific backend that understands `gen_ai.*` attributes natively and shows prompt/completion diffs between runs.
DevUI is different. DevUI is the MAF-native browser dashboard for interactively driving an agent or workflow: you type a prompt, watch tool calls fire, inspect intermediate state. Aspire visualises passive telemetry; DevUI is an active test harness. Chapter 20b — DevUI covers it end-to-end.
Side-by-side differences#
| Aspect | Python | .NET |
|---|---|---|
| Enable MAF instrumentation | enable_instrumentation(enable_sensitive_data=False) | .AddSource("Microsoft.Agents.AI", "Microsoft.Extensions.AI") |
| Span primitive | OpenTelemetry Span | System.Diagnostics.Activity (wrapped by OTel) |
| Tracer primitive | trace.get_tracer("name") | new ActivitySource("name") |
| Default processor in chapter | BatchSpanProcessor | SimpleActivityExportProcessor |
| Default exporter in chapter | ConsoleSpanExporter | Custom ConsoleExporter |
| Swap to OTLP | OTLPSpanExporter(endpoint=…, insecure=True) | .AddOtlpExporter(opts => opts.Endpoint = …) |
| GenAI attributes | gen_ai.operation.name, gen_ai.request.model, gen_ai.usage.* | Same attribute names (OTel cross-language convention) |
| Sensitive data toggle | enable_sensitive_data=True on enable_instrumentation | Env var OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT=true |
Both produce the same OTLP wire bytes. A Python process and a .NET process exporting to the same Aspire instance appear in the same trace if you propagate the context header (traceparent) across A2A calls — the capstone does exactly this.
Gotchas#
- `TracerProvider` is a process-global singleton. Python warns and ignores a second `trace.set_tracer_provider(...)`. In .NET the second `Build()` returns a new disposable but existing sources stay pinned to the first. Tests must share one provider (see `tests/test_observability.py`'s module-level `setup_tracing` and `_EXPORTER.clear()` between tests).
- .NET needs explicit `ActivitySource` names. Without `AddSource("Microsoft.Agents.AI", "Microsoft.Extensions.AI")` you will get zero MAF spans and spend an afternoon wondering why. The wildcard `"*"` is convenient for demos but blanket-captures every library using `ActivitySource` — noisy in prod.
- `enable_sensitive_data=True` prints prompts and responses. Useful for debugging a failed tool selection locally; a compliance incident in prod. Default is `False`. In the capstone it is gated on `GENAI_CAPTURE_CONTENT=true` in settings.
- Cost impact of sensitive-data mode. Spans grow from a few hundred bytes to several KB when they include full completions. At 20 spans per request and 100 rps, you can push tens of GB per hour to your collector. Aspire tolerates it locally; Azure Monitor and Datadog will bill you.
- Sampling. The SDK default is `AlwaysOn` — every span exported. Fine for a demo, expensive in prod. For high-QPS services, configure `TraceIdRatioBased(0.1)` in Python / `.SetSampler(new TraceIdRatioBasedSampler(0.1))` in .NET. Head sampling is per-trace, so once the root is sampled the whole tree is kept — no broken trees.
- Exporter choice matters. `ConsoleSpanExporter` is fine for a demo but flushes synchronously — don't use it in a hot path. `BatchSpanProcessor` + `OTLPSpanExporter` is the prod default.
- Async flush in tests. Both languages buffer spans. If you assert on `get_finished_spans()` without `provider.force_flush()` / `await Task.Delay(500)`, you will get a flaky test.
- Trace context across A2A calls. If you spawn a subprocess, queue a Service Bus message, or POST to another agent, the new process has a new `trace_id` by default. Propagate the W3C `traceparent` header (both `httpx` and `HttpClient` instrumentations do this automatically) or the child process's spans will live in their own orphan trace.
- Clock skew between services. Dashboards render timelines by absolute timestamp. Containers with drifted clocks produce traces with children that appear to start before their parents. Run NTP or rely on your container runtime's clock; there is no OTel fix at the trace level.
Tests#
```shell
# Python — 3 integration tests
cd tutorials/07-observability-otel/python
uv run pytest -v

# .NET — 2 integration tests
cd tutorials/07-observability-otel/dotnet
dotnet test tests/Observability.Tests.csproj
```

All 5 tests green — both stacks proven to emit real spans for real agent runs against Azure OpenAI. The most interesting assertion is `test_spans_include_genai_attributes`: it filters span attribute keys by `startswith("gen_ai.")` and fails if none are present. That test is the regression canary — if a future MAF version drops GenAI attributes or changes their prefix, this test screams before the capstone does.
How this shows up in the capstone#
Chapter 07 is the simplified version. The capstone telemetry module is 400 lines of production wiring at agents/python/shared/telemetry.py:
- `setup_telemetry(service_name)` (telemetry.py:30-52) — guarded by `OTEL_ENABLED`, safe to call when Aspire is unreachable, never throws into the agent host.
- `_do_setup(...)` (telemetry.py:54-108) — three providers in one call: `TracerProvider`, `MeterProvider` (metrics every 5 s), `LoggerProvider` bridging Python `logging` → OTel. Tries gRPC OTLP first and falls back to HTTP.
- `_instrument_openai()` (telemetry.py:356-377) — `OpenAIInstrumentor().instrument()` auto-wraps the OpenAI SDK with the GenAI conventions.
- `instrument_fastapi(app)` / `instrument_starlette(app)` (telemetry.py:110-129) — one-line auto-instrumentation for the HTTP layer.
- `agent_run_span(agent_name)` (telemetry.py:183-217) — context manager tagging the span with `gen_ai.operation.name=invoke_agent` and enriching it from ContextVars (user email, session id, `gen_ai.conversation.id`).
- `a2a_call_span(source, target, url)` (telemetry.py:220-247) — `SpanKind.CLIENT` span wrapping an inter-agent HTTP POST.
- `tool_call_span(tool_name)` (telemetry.py:250-273) — wraps individual tool invocations inside the tool-calling loop.
- `traced_tool` (telemetry.py:276-308) — decorator that layers a span around an `@tool`-decorated function when MAF doesn't emit one natively.
Wiring from a service: agents/python/product_discovery/main.py:12-14 calls setup_telemetry("ecommerce.product-discovery") and instrument_fastapi(app) inside the FastAPI on_startup hook. Every specialist does this identically.
The .NET equivalent lives at agents/dotnet/src/ECommerceAgents.Shared/Telemetry/TelemetrySetup.cs. AddAgentTelemetry(settings) on IServiceCollection wires AddAspNetCoreInstrumentation, AddHttpClientInstrumentation, and AddOtlpExporter in one call.
Both stacks export to the same Aspire Dashboard at :18888, which renders the full orchestrator → A2A → specialist → tool → LLM tree for any user request in the capstone — mixed Python and .NET in one flame graph.
Further reading#
- Canonical README: tutorials/07-observability-otel
- MAF docs — Observability
- MAF docs — Integrations (Aspire, Azure Monitor, Langfuse)
- OTel GenAI semantic conventions
- OTLP protocol spec
What’s next#
- Next chapter: Chapter 08 — MCP Tools — MCP calls appear as child spans under `invoke_agent` with zero extra wiring.
- Jump to the interactive flip side: Chapter 20b — DevUI.
- Full source: `python/` · `dotnet/`

