
# MAF v1 — Prompt engineering (Python + .NET)

Author: Nitin Kumar Singh
MAF v1: Python and .NET - This article is part of a series.
Part 24: This Article

Series note — Appendix to MAF v1: Python and .NET. Sits after Ch23 — Evaluation framework. Supersedes the original Python-only Part 2 — Prompt Engineering for AI Agents. The big change in this version: role-specific behavior moves out of the system prompt and into a context provider, so you don’t need to re-instantiate the agent per role. The YAML composition for the base prompt stays, because it’s still the right shape.

Capstone code — This chapter doesn’t ship a standalone tutorial folder. The runnable prompt-composition system lives in the e-commerce-agents capstone: agents/python/shared/prompt_loader.py (the loader) and agents/python/config/prompts/ (per-agent YAML files + _shared/ fragments).

## Why this chapter

Every chapter in this series has shown agents with one-paragraph system prompts. That’s fine for a tutorial. It is not what the capstone runs. A real agent system prompt does five things at once: declares the agent’s identity, lists its capabilities, sets constraints (especially anti-hallucination), shows tool-usage patterns, and pins output formatting. Each of those concerns evolves on a different cadence — grounding rules barely change, tool examples change every time you ship a new tool, role context changes per request. Putting all of it into one Python string in agent.py works for one agent on one prompt; it falls apart at six agents and forty prompt iterations.

This chapter walks through the prompt architecture the capstone settled on: base prompt composed from YAML at boot, role context injected per-request via a context provider, grounding rules shared across every agent. The framework angle is using MAF v1's instructions= parameter (the canonical system-prompt slot) for stable content and the context-provider lifecycle for everything that changes per request.

## Prerequisites

## What you’ll learn

  • The five concerns every non-trivial system prompt has to address, and why they go in different layers.
  • A four-file YAML layout (_shared/grounding-rules.yaml, _shared/schema-context.yaml, _shared/tool-examples.yaml, <agent>.yaml) that scales across six agents without duplication.
  • Why role-specific behavior belongs in a RoleContextProvider (per-request) rather than a load_prompt(agent, role) call (per-agent-instance).
  • The lru_cache trick that keeps YAML loading at zero-I/O after first request without losing dev-time hot reload.

## The five concerns

A chatbot prompt can get away with “You are a helpful assistant.” An agent prompt cannot. An agent makes decisions: which tool to call, what arguments to pass, whether to route to another agent, how to render the result. A vague prompt yields vague decisions, and vague decisions against a live database produce real bugs.

The five concerns:

| Concern | Lives where | Cadence of change |
| --- | --- | --- |
| Identity | Base prompt | Rare — defines the agent |
| Capabilities | Base prompt | When tools are added/removed |
| Constraints (grounding) | Shared YAML, applied to every agent | Once per quarter |
| Tool guidance | Per-domain YAML, referenced by agents that need it | Every tool change |
| Role context | Context provider, per-request | Per request |

The architectural mistake the original capstone made was putting all five into the same instructions= string and re-creating the agent for every role. That works, but it forces an agent restart cycle on every prompt iteration and burns context window when only a small slice (the role section) actually varies. Pulling role into a context provider fixes both.
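A toy sketch of the difference (all names here are illustrative, not the capstone's API):

```python
# Toy sketch: baking role into the prompt multiplies agents; per-request
# injection keeps one stable base. Strings stand in for real prompt sections.
BASE = "identity + capabilities + grounding + tool examples"  # stable, large
ROLES = {"customer": "customer context", "seller": "seller context", "admin": "admin context"}

# Anti-pattern: role composed into instructions -> one agent per role.
agents_per_role = {r: {"instructions": f"{BASE}\n\n{t}"} for r, t in ROLES.items()}

# Pattern: one agent; the role slice is appended as system content per request.
def system_content_for(role: str) -> list[str]:
    return [BASE, ROLES.get(role, ROLES["customer"])]

assert len(agents_per_role) == 3                 # agent count multiplied by role count
assert system_content_for("seller")[0] is BASE   # base is shared, never re-composed
```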

## File layout

Four file types: one YAML per agent, plus three shared.

agents/python/config/prompts/
├── _shared/
│   ├── grounding-rules.yaml      # Universal anti-hallucination rules
│   ├── schema-context.yaml       # Database schema by domain
│   └── tool-examples.yaml        # Tool usage patterns by domain
├── orchestrator.yaml             # Per-agent identity + references
├── product-discovery.yaml
├── order-management.yaml
├── pricing-promotions.yaml
├── review-sentiment.yaml
└── inventory-fulfillment.yaml

Per-agent files are tiny — they declare identity and pull shared sections by reference:

# product-discovery.yaml
name: product-discovery
version: "2.0"

system_prompt:
  base: |
    You are the product-discovery agent for ECommerce Agents. Your job is
    to help users find products in our catalog through search, comparison,
    and personalised recommendations grounded in the actual database.

    You have access to seven tools spanning keyword search, semantic
    search, price comparison, and trending products.

  schema_refs:
    - product
    - inventory

  tool_example_refs:
    - product_tools

Shared files are bigger but only have to be edited once for system-wide changes:

# _shared/grounding-rules.yaml
rules: |
  ## CRITICAL: Data Grounding Rules
  1. Always call a tool for data. Never invent product names, prices,
     order IDs, tracking numbers, or any specific value — even if the user
     supplies one. Verify with a tool before quoting it.
  2. Report results honestly. If a tool returns an error or empty result,
     say "I couldn't find that" — do not paper over the gap with content
     from training data.
  3. Say "I don't know" out loud. When you have no information and no tool
     can fetch it, admit it.
  4. Render tool results faithfully. Use bullet points or the rich-card
     code blocks below; never paraphrase numbers.

(The four-rule consolidation came from the audit of the original Part 2 — see the revised version of that article for the longer explanation.)

# _shared/schema-context.yaml
product: |
  ## Database: Products
  - **products**: id (UUID), name, description, category, brand, price,
    original_price, image_url, rating (1.0-5.0), review_count, specs (JSONB)
    - Categories: Electronics, Clothing, Home, Sports, Books
    - If original_price > price, the product is on sale
  - **product_embeddings**: product_id, embedding (vector 1536-dim)
  - **price_history**: product_id, price, recorded_at — 90 days of daily prices

order: |
  ## Database: Orders & Returns
  - **orders**: id (UUID), user_id, status, total, shipping_address (JSONB), …
    - Status flow: placed -> confirmed -> shipped -> out_for_delivery -> delivered
    - Also: cancelled (from placed/confirmed), returned (from delivered)

# _shared/tool-examples.yaml
product_tools: |
  ## Tool Usage Guide

  ### search_products(query, max_price?, category?, limit?)
  Keyword search by name and description. Use when the user names a product
  or characteristic.
    - search_products("noise cancelling headphones")
    - search_products("running shoes", max_price=150, category="Sports")
  -> Returns: [{"id": "...", "name": "...", "price": 199.99, "rating": 4.6, …}]

  ### semantic_search(intent, limit?)
  Vector search by user intent (pgvector). Use when the user describes
  a need rather than a product.
    - semantic_search("something to help me sleep on flights")
    - semantic_search("birthday gift for a runner", limit=5)
  -> Returns same shape as search_products.

Identical structure, different concerns. A new agent is one file. A new tool is one block append.

## The base loader (Python)

Pure composition. No role logic in here — that lives in the context provider.

# shared/prompt_loader.py
from __future__ import annotations
from functools import lru_cache
from pathlib import Path
import yaml

PROMPTS_DIR = Path(__file__).parent.parent / "config" / "prompts"
SHARED_DIR  = PROMPTS_DIR / "_shared"

@lru_cache(maxsize=32)
def _load_yaml(path: Path) -> dict:
    with open(path) as f:
        return yaml.safe_load(f) or {}

@lru_cache(maxsize=16)
def _shared(filename: str) -> dict:
    return _load_yaml(SHARED_DIR / filename)

def load_base_prompt(agent_name: str) -> str:
    """Compose the role-independent system prompt for an agent.

    Sections, in order:
        base + grounding-rules + schema-refs + tool-example-refs
    """
    cfg = _load_yaml(PROMPTS_DIR / f"{agent_name}.yaml")
    sp  = cfg.get("system_prompt", {})

    parts: list[str] = []
    if base := sp.get("base", "").strip():
        parts.append(base)

    if rules := _shared("grounding-rules.yaml").get("rules", "").strip():
        parts.append(rules)

    schema_data = _shared("schema-context.yaml")
    for ref in sp.get("schema_refs", []):
        if section := schema_data.get(ref, "").strip():
            parts.append(section)

    tool_data = _shared("tool-examples.yaml")
    for ref in sp.get("tool_example_refs", []):
        if section := tool_data.get(ref, "").strip():
            parts.append(section)

    return "\n\n".join(parts)

def reload_prompts() -> None:
    """Dev-only: clear caches so the next call re-reads YAML."""
    _load_yaml.cache_clear()
    _shared.cache_clear()

Two LRU caches — one keyed by file path, one by shared filename — keep production reads at zero I/O after the first request. reload_prompts() is the dev-time escape hatch; bind it to a /admin/reload-prompts endpoint behind the admin role and you can iterate on YAML without restarts.
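A self-contained demonstration of the caching behaviour the loader relies on — lru_cache serves the stale copy until cache_clear() runs, which is exactly what reload_prompts() does:

```python
# Demonstrates the loader's caching contract: second read is zero-I/O,
# edits are invisible until the cache is cleared. Toy file, not the capstone's.
from functools import lru_cache
from pathlib import Path
import tempfile

tmp = Path(tempfile.mkdtemp()) / "agent.yaml"
tmp.write_text("version: 1")

reads = 0

@lru_cache(maxsize=32)
def load(path: Path) -> str:
    global reads
    reads += 1          # count actual disk reads
    return path.read_text()

load(tmp)
load(tmp)
assert reads == 1                  # second call served from cache, no I/O

tmp.write_text("version: 2")
assert load(tmp) == "version: 1"   # edit not visible yet -- cache is stale

load.cache_clear()                 # what reload_prompts() does
assert load(tmp) == "version: 2"   # fresh read after clearing
```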

# Wiring at agent construction
from agent_framework import Agent
from shared.prompt_loader import load_base_prompt

product_discovery_agent = Agent(
    chat_client=client,
    instructions=load_base_prompt("product-discovery"),
    tools=PRODUCT_DISCOVERY_TOOLS,
    context_providers=[RoleContextProvider(), ECommerceContextProvider()],
)

The instructions= string carries everything stable. The role-aware paragraph rides in via the context provider — see the next section.

## Role context as a context provider

This is the substantive change from the original capstone. The original load_prompt(agent, role) returned a different string per role and required Agent(instructions=...) to be re-built for each call. That meant agents were either (a) instantiated per-request — expensive and tied to ContextVars in awkward ways — or (b) instantiated per-role at boot, which multiplies the agent count by three.

The MAF v1 way: one agent, one base prompt, role injected as per-request context.

# shared/context_providers.py
from agent_framework import ContextProvider, Context
from shared.identity import current_user_role     # ContextVar from Ch06

ROLE_INSTRUCTIONS = {
    "customer": """\
This user is a customer. Help them find products, track orders, discover
deals, and resolve issues. Route their questions to the appropriate
specialist agent.""",
    "seller": """\
This user is a seller on the platform. They may ask about their own
products, inventory, fulfillment for items they sell, and reviews on
their products. Route seller queries to specialists with seller context
("Bob is a seller asking about HIS own products").""",
    "admin": """\
This user is an admin with full access to all data and agents. Provide
complete, unfiltered information when requested. Data scoping
restrictions are lifted for this role.""",
}

class RoleContextProvider(ContextProvider):
    """Inject the role-specific instruction paragraph at request time.

    Reads `current_user_role` from the auth middleware ContextVar (Ch06).
    Falls back to 'customer' if the role isn't set (anonymous browsing
    scenarios where you still want a sensible default).
    """

    async def before_run(self, ctx: Context) -> None:
        role = current_user_role.get() or "customer"
        text = ROLE_INSTRUCTIONS.get(role, ROLE_INSTRUCTIONS["customer"])
        ctx.add_system_text(f"## Your Role Context\n{text}")

That’s the whole pattern. The agent is constructed once at boot, lives forever, serves customers and sellers and admins from the same instance. The role-context paragraph is added per-request as additional system content — it doesn’t bloat the persistent instructions= string and it doesn’t require the agent to know anything about authentication.

The win compounds when you scale up. If you also want per-tenant prompt overrides (multi-tenancy), they go in another context provider that reads the tenant from current_tenant_id and pulls a tenant-overrides.yaml. If you want A/B prompt experiments, that’s another provider that reads the experiment flag. The base prompt stays the same; the runtime injection composes.
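A framework-agnostic sketch of that composition (ContextProvider's real surface is stubbed here; the tenant-overrides map and provider names are illustrative):

```python
# Sketch: independent providers each append their own system slice per request.
# The inject() protocol is a stand-in for the MAF context-provider hook.
from contextvars import ContextVar

current_user_role = ContextVar("current_user_role", default="customer")
current_tenant_id = ContextVar("current_tenant_id", default=None)

# Hypothetical tenant-overrides content (would come from tenant-overrides.yaml).
TENANT_OVERRIDES = {"acme": "Use ACME's return-policy wording in all answers."}

class RoleProvider:
    def inject(self, out: list[str]) -> None:
        role = current_user_role.get()
        out.append(f"## Your Role Context\nThis user is a {role}.")

class TenantProvider:
    def inject(self, out: list[str]) -> None:
        tenant = current_tenant_id.get() or ""
        if override := TENANT_OVERRIDES.get(tenant):
            out.append(f"## Tenant Overrides\n{override}")

# Providers run in order; the base prompt is untouched.
current_user_role.set("seller")
current_tenant_id.set("acme")
parts: list[str] = []
for provider in (RoleProvider(), TenantProvider()):
    provider.inject(parts)

assert "seller" in parts[0]
assert "ACME" in parts[1]
```

Each provider owns one axis of variation, so adding A/B experiments later means adding one more provider, not touching the existing ones.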

## .NET twin

The same pattern in C#. IOptions<PromptOptions> plus IMemoryCache for the YAML (swap in IOptionsMonitor<PromptOptions> if you want change notifications), and a custom IContextProvider (or whichever protocol the .NET MAF surface settles on for this — see Ch22 asymmetries for the surface differences) for the role injection.

public sealed class PromptLoader
{
    private readonly IMemoryCache _cache;
    private readonly string _root;

    public PromptLoader(IMemoryCache cache, IOptions<PromptOptions> opt)
    {
        _cache = cache;
        _root = opt.Value.PromptsDirectory;
    }

    public string LoadBasePrompt(string agentName)
        => _cache.GetOrCreate($"prompt:{agentName}", _ =>
        {
            var cfg     = LoadYaml(Path.Combine(_root, $"{agentName}.yaml"));
            var sp      = cfg.GetSection("system_prompt");

            var parts = new List<string>();
            if (sp["base"] is { } b)        parts.Add(b.Trim());
            parts.Add(LoadShared("grounding-rules.yaml")["rules"]!.Trim());
            foreach (var r in sp.GetSection("schema_refs").Get<string[]>() ?? [])
                if (LoadShared("schema-context.yaml")[r] is { } s) parts.Add(s.Trim());
            foreach (var r in sp.GetSection("tool_example_refs").Get<string[]>() ?? [])
                if (LoadShared("tool-examples.yaml")[r] is { } s) parts.Add(s.Trim());

            return string.Join("\n\n", parts);
        })!;
}

IMemoryCache plays the same role as lru_cache. Hot reload is _cache.Remove(...); in production, leave it indefinite.

For the role-context provider, see Ch05’s .NET section for the canonical IContextProvider.BeforeRunAsync shape — the implementation transcribes the Python one literally.

## Composition diagram

%%{init: {'theme':'base', 'themeVariables': { 'primaryColor': '#2563eb','primaryTextColor': '#ffffff','primaryBorderColor': '#1e40af', 'lineColor': '#64748b','secondaryColor': '#f59e0b','tertiaryColor': '#10b981', 'background': 'transparent'}}}%%
flowchart LR
    classDef static fill:#2563eb,stroke:#1e40af,color:#ffffff
    classDef shared fill:#64748b,stroke:#334155,color:#ffffff
    classDef runtime fill:#10b981,stroke:#047857,color:#ffffff
    classDef merged fill:#f59e0b,stroke:#b45309,color:#000000

    base["Agent base<br/>(per-agent YAML)"]
    rules["Grounding rules<br/>(_shared)"]
    schema["Schema context<br/>(_shared)"]
    tools["Tool examples<br/>(_shared)"]
    loader["load_base_prompt"]
    instructions["Agent.instructions<br/>(stable, per-boot)"]
    role["Current user role<br/>(ContextVar)"]
    ctxprov["RoleContextProvider<br/>before_run()"]
    injected["System content<br/>(per-request)"]
    llm["LLM call"]

    base --> loader
    rules --> loader
    schema --> loader
    tools --> loader
    loader --> instructions
    role --> ctxprov --> injected
    instructions --> llm
    injected --> llm

    class base,rules,schema,tools,loader,instructions static
    class role,ctxprov runtime
    class injected merged
    class llm merged

Three boot-time YAML inputs compose into the agent’s stable instructions. The role-context provider injects per-request content from the auth ContextVar. Both stream into the same LLM call as system content — but only the static half is composed once, at agent-construction time.

## Two tests worth writing

The composition is pure — no LLM, no I/O after first read — so it tests cleanly.

# tests/test_prompt_loader.py
import asyncio

from shared.prompt_loader import load_base_prompt, reload_prompts

def setup_function():
    reload_prompts()

def test_base_prompt_includes_grounding_rules():
    prompt = load_base_prompt("product-discovery")
    assert "Always call a tool for data" in prompt

def test_base_prompt_resolves_schema_refs():
    prompt = load_base_prompt("product-discovery")
    assert "## Database: Products" in prompt
    assert "## Database: Inventory" in prompt
    # And does NOT pull schemas it didn't ask for:
    assert "## Database: Orders & Returns" not in prompt

def test_role_context_provider_injects_seller_paragraph():
    from shared.context_providers import RoleContextProvider
    from shared.identity import current_user_role
    from agent_framework.testing import FakeContext

    current_user_role.set("seller")
    ctx = FakeContext()
    asyncio.run(RoleContextProvider().before_run(ctx))
    assert "Bob is a seller" in ctx.system_text   # exact phrase from the role string

The first two are unit tests on the loader. The third is a unit test on the context provider with a fake Context shim. Together they catch the most common breakage: a YAML edit that accidentally drops a section or a role string that goes missing.
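If your framework build doesn’t ship a test double, a minimal shim covering only what RoleContextProvider touches works fine — assuming (as the provider above does) that the only Context method exercised is add_system_text:

```python
# Hypothetical Context test double: records injected system text so
# assertions can run without the framework. Not the real Context API.
class FakeContext:
    def __init__(self) -> None:
        self.system_text = ""

    def add_system_text(self, text: str) -> None:
        # Mirror the real behaviour just enough: accumulate system content.
        self.system_text += ("\n" if self.system_text else "") + text

ctx = FakeContext()
ctx.add_system_text("## Your Role Context\nThis user is a seller.")
assert "seller" in ctx.system_text
```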

## Gotchas

  • lru_cache and editor-saves. When you edit a YAML in dev, the cached copy serves until you call reload_prompts(). Wire the reload to a hotkey (uv run reload) or to a filesystem watcher. The original capstone shipped a --watch flag on the dev server that called reload_prompts() on every YAML change.
  • Don’t use the role context provider for authorisation. The role string changes the prompt; it does not enforce anything. The actual auth boundary is HTTP-layer middleware (Ch06 + Ch20c production hardening). A jailbroken model that ignores the role context paragraph still can’t call a tool you didn’t give it.
  • Keep _shared/ tight. The temptation is to dump everything that’s “shared” into the shared files. Three files is the right ceiling — grounding (constraints), schema (data shape), tools (usage). Adding more dilutes the discipline; if a fourth concern needs sharing, it’s probably a context provider in disguise.
  • Version field. Each per-agent YAML carries a version: "2.0". The loader doesn’t read it. The point is git-blame and the eval framework: when a regression appears, the version bump in the YAML diff narrows the blast radius.
  • Pretty-printed JSON in fenced rich-card blocks. The grounding-rules file says “keep JSON on a single line.” Multi-line JSON breaks the streaming chat renderer’s fenced-block parser (Ch03). The single-line constraint is non-obvious — call it out in the rules and back it up in the eval dataset with a “fenced JSON is single-line” check.
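A hedged sketch of what that eval check might look like (regex-based and illustrative; the fence-matching is simplified, not the capstone's renderer):

```python
# Eval-style check: every fenced JSON block in a reply must be one line
# and must parse. The fence strings are built programmatically to keep
# this sketch readable.
import json
import re

TICKS = "`" * 3
FENCED = re.compile(TICKS + r"json\n(.*?)\n" + TICKS, re.DOTALL)

def fenced_json_is_single_line(reply: str) -> bool:
    for block in FENCED.findall(reply):
        if "\n" in block:       # multi-line JSON breaks the streaming renderer
            return False
        json.loads(block)       # and it must still be valid JSON
    return True

good = f'Here you go:\n{TICKS}json\n{{"id": "p1", "price": 199.99}}\n{TICKS}'
bad = f'Here:\n{TICKS}json\n{{\n  "id": "p1"\n}}\n{TICKS}'

assert fenced_json_is_single_line(good) is True
assert fenced_json_is_single_line(bad) is False
```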

## What changes for the capstone

The capstone today (agents/python/shared/prompt_loader.py) ships the original Part 2 design — load_prompt(agent, role) returning a different string per role, agents constructed per-role at boot. Phase 10 of the refactor plan migrates it:

  • agents/python/shared/prompt_loader.py — drop the role parameter; rename load_prompt to load_base_prompt.
  • agents/python/shared/context_providers.py — add RoleContextProvider and wire it into default_context_providers() alongside ECommerceContextProvider.
  • agents/python/orchestrator/agent.py and the five specialists — drop the per-role agent factory and replace with a single create_agent().
  • agents/python/config/prompts/_shared/grounding-rules.yaml — already updated to the four-rule version.

No tool changes; no LLM-provider changes. The migration is a straight refactor.

## What’s next

That closes the prompt-architecture gap the original Part 2 left open. The next chapter returns to the original e-commerce series with the deployment port — Docker Compose, multi-target Dockerfiles, dev.sh — modernised against the production-hardening patterns from Ch20c.
