ADR-001: Agent Orchestration & Harness Strategy

Adopt Pydantic AI as the agent framework layer with Temporal for durable execution

Table of Contents

  1. ADR Header
  2. Context
    1. Why the Original Decision Was Correct
    2. What Changed (2025-2026)
  3. Decision Drivers
  4. Options Considered
    1. Summary Table
    2. Selected: Pydantic AI
    3. Runner-up: OpenAI Agents SDK
    4. Honorable Mention: Google ADK
  5. Decision
    1. 4-Layer Architecture
  6. Architecture Diagrams
    1. Single Agent Execution Flow
    2. Multi-Agent DAG Workflow
    3. MCP Tool Integration
  7. Code Examples
    1. Requirements Agent Definition (Pydantic AI)
    2. Temporal Activity Wrapper
    3. Temporal Workflow Coordinating Multiple Agents
  8. Build to Delete
  9. Migration Path
  10. Consequences
    1. Positive
    2. Negative
    3. Neutral
  11. Related Documents

ADR Header

| Field | Value |
|---|---|
| Status | Accepted |
| Date | 2026-02-28 |
| Decision | Adopt Pydantic AI (v1.0+, MIT) as the agent framework layer in a 4-layer architecture, with Temporal for durable execution and a custom hardware harness on top |
| Supersedes | “Custom orchestration, no agent framework” (original architecture decision) |
| Deciders | Architecture team |
| Categories | Agent orchestration, framework selection, durable execution |

Context

Why the Original Decision Was Correct

When MetaForge’s architecture was first designed, the decision to use custom orchestration with no agent framework was sound:

  • LangChain instability — The dominant framework (LangChain/LangGraph) was notorious for rapid breaking changes, thick abstraction layers, and a heavy dependency tree. Production teams consistently reported fragile state management.
  • Framework immaturity — No framework offered native Temporal integration, first-class MCP support, AND Pydantic-validated structured output simultaneously.
  • Industry pattern — The most successful production agent systems (Cursor, Devin, Claude Code, OpenHands) all used custom orchestration with raw LLM SDKs.

What Changed (2025-2026)

Three developments shifted the cost-benefit analysis:

  1. Pydantic AI v1.0 (September 2025) — The first framework to ship native Temporal integration AND native MCP support with Pydantic-validated structured output as its core strength. Lightweight, model-agnostic, MIT-licensed.

  2. MCP became the standard — Model Context Protocol became the de facto standard for tool integration across all major frameworks and LLM providers. Building custom tool dispatch plumbing is now redundant.

  3. The “Agent Harness” concept — Philipp Schmid’s influential January 2026 analysis (The importance of Agent Harness in 2026) clarified that production systems build domain-specific harnesses ON TOP of frameworks. The competitive advantage is the harness (hardware-domain knowledge), not reinventing tool dispatch plumbing.

Key insight: MetaForge’s competitive advantage is the hardware harness — hardware-domain prompt engineering, FreeCAD/KiCad/SPICE error recovery, design-rule context management, and EVT/DVT/PVT gate hooks. The agent framework is commodity infrastructure.

See: Agent Framework Comparison (2026) for the full 14-framework evaluation.


Decision Drivers

Six non-negotiable constraints drove the framework selection:

| # | Constraint | Rationale | Frameworks That Pass |
|---|---|---|---|
| 1 | Native Temporal integration | Long-running hardware workflows (hours/days) require crash recovery, replay, and approval gates | Pydantic AI, OpenAI Agents SDK |
| 2 | First-class MCP support | FreeCAD, KiCad, SPICE, Neo4j MCP servers must plug in directly without adapter layers | Pydantic AI, OpenAI Agents SDK, Google ADK, CrewAI, LangGraph |
| 3 | Pydantic structured output | MetaForge domain models (PCB specs, SPICE parameters, mechanical dimensions) are Pydantic models — zero impedance mismatch | Pydantic AI (core strength); others via add-on |
| 4 | Python ecosystem | Hardware tools (FreeCAD, KiCad IPC, OpenFOAM, CalculiX, ngspice) have Python-native APIs | All Python frameworks |
| 5 | Model-agnostic | Must support Claude, GPT-4o, Gemini, and local models without vendor lock-in | Pydantic AI, OpenAI Agents SDK, Google ADK |
| 6 | MIT license | Open-source platform requirement, no proprietary dependencies | Pydantic AI, OpenAI Agents SDK, LangGraph, CrewAI |

Only Pydantic AI satisfies all six constraints.


Options Considered

Summary Table

| Framework | Native Temporal | Native MCP | Pydantic Output | Model-Agnostic | MIT | MetaForge Fit | Disposition |
|---|---|---|---|---|---|---|---|
| Pydantic AI | Yes | Yes | Core strength | Yes | Yes | Strong | Selected |
| OpenAI Agents SDK | Yes | Yes | Yes | Yes | Yes | Moderate-Strong | Runner-up |
| Google ADK | No | Yes | Yes | Yes | Apache 2.0 | Moderate | Honorable mention |
| LangGraph | No (own persistence) | Yes | Yes | Yes | Yes | Moderate | Rejected — own persistence conflicts with Temporal |
| CrewAI | No (own Flows) | Yes | Yes | Yes | Yes | Moderate | Rejected — role metaphor mismatch, no Temporal |
| MS Agent Framework | No | Yes | Yes | Azure-oriented | Yes | Low-Moderate | Rejected — RC/preview, Azure-oriented |
| Claude Agent SDK | No | Yes | Yes | Claude-only | Proprietary | Moderate | Rejected — proprietary license, Claude-only |
| OpenHands | No | Yes | Internal only | SWE-specific | MIT | Poor | Rejected — SWE harness, not a framework |
| Mastra | No | Yes | Zod (TS only) | Yes | Apache 2.0 | N/A | Rejected — TypeScript-only |
| Semantic Kernel | No | — | — | Azure-oriented | MIT | See MS AF | Rejected — merged into MS Agent Framework |

Selected: Pydantic AI

Only framework with BOTH native Temporal integration AND native MCP support. Pydantic-validated structured output is its core strength — zero impedance mismatch with MetaForge’s domain models. Lightweight, model-agnostic, MIT-licensed.

Runner-up: OpenAI Agents SDK

Offers both Temporal and MCP support, but a thinner validation layer than Pydantic AI. Designed primarily around OpenAI’s model paradigm (the Responses API), even though it supports other providers.

Honorable Mention: Google ADK

Clean architecture, good MCP support, model-agnostic. However, no native Temporal integration and optimized for Google Cloud/Vertex AI deployment.


Decision

4-Layer Architecture

flowchart TB
    subgraph L4["L4: MetaForge Hardware Harness (CUSTOM)"]
        direction LR
        H1["Hardware Prompts"]
        H2["FreeCAD/KiCad/SPICE<br/>Error Recovery"]
        H3["Design-Rule Context"]
        H4["EVT/DVT/PVT<br/>Gate Hooks"]
        H5["Agent DAG Engine"]
    end

    subgraph L3["L3: Agent Framework (Pydantic AI)"]
        direction LR
        F1["Typed Agent<br/>Definitions"]
        F2["MCP Connections"]
        F3["Structured Output<br/>Validation"]
        F4["Multi-Agent<br/>Delegation"]
        F5["Human-in-the-Loop<br/>Approval"]
    end

    subgraph L2["L2: Durable Execution (Temporal)"]
        direction LR
        T1["Long-Running<br/>Workflows"]
        T2["Crash Recovery<br/>& Replay"]
        T3["Approval Gates"]
        T4["Activity Retries"]
    end

    subgraph L1["L1: LLM Providers (model-agnostic)"]
        direction LR
        M1["Claude"]
        M2["GPT-4o"]
        M3["Gemini"]
        M4["Local Models"]
    end

    L4 --> L3
    L3 --> L2
    L2 --> L1

    style L4 fill:#E67E22,color:#fff,stroke:#E67E22,stroke-width:2px
    style L3 fill:#9b59b6,color:#fff,stroke:#9b59b6,stroke-width:2px
    style L2 fill:#3498db,color:#fff,stroke:#3498db,stroke-width:2px
    style L1 fill:#2C3E50,color:#fff,stroke:#2C3E50,stroke-width:2px

Layer responsibilities:

| Layer | Responsibility | Build vs. Buy |
|---|---|---|
| L4: Hardware Harness | Domain-specific intelligence — hardware prompt engineering, tool error recovery, design-rule context injection, EVT/DVT/PVT lifecycle hooks, agent DAG orchestration | Custom (competitive advantage) |
| L3: Agent Framework | Agent definitions with typed dependencies and outputs, MCP server connections, structured output validation, multi-agent delegation | Pydantic AI (commodity) |
| L2: Durable Execution | Long-running workflow coordination, crash recovery, deterministic replay, human-in-the-loop approval gates, activity-level retries | Temporal (commodity) |
| L1: LLM Providers | Model-agnostic LLM access via standard SDKs | openai + anthropic SDKs (commodity) |

Architecture Diagrams

Single Agent Execution Flow

sequenceDiagram
    participant GW as Gateway<br/>(FastAPI)
    participant H as Hardware<br/>Harness (L4)
    participant A as Pydantic AI<br/>Agent (L3)
    participant MCP as MCP Server
    participant T as Temporal<br/>Activity (L2)
    participant LLM as LLM Provider<br/>(L1)

    GW->>T: Start workflow
    T->>H: Execute agent activity
    H->>H: Inject hardware context<br/>(design rules, constraints)
    H->>A: Run agent with deps
    A->>LLM: Prompt with tools
    LLM-->>A: Tool call request
    A->>MCP: Execute tool<br/>(KiCad ERC, SPICE sim)
    MCP-->>A: Tool result
    A->>LLM: Continue with result
    LLM-->>A: Structured output
    A-->>H: Validated Pydantic model
    H->>H: Hardware-specific<br/>post-processing
    H-->>T: Activity result
    T-->>GW: Workflow complete

Multi-Agent DAG Workflow

flowchart LR
    subgraph Temporal["Temporal Workflow"]
        direction LR
        REQ["REQ Agent<br/>Requirements"]
        SYS["SYS Agent<br/>Systems"]
        EE["EE Agent<br/>Electronics"]
        FW["FW Agent<br/>Firmware"]
        BOM["BOM Agent<br/>Supply Chain"]
        MFG["MFG Agent<br/>Manufacturing"]
    end

    REQ --> SYS
    SYS --> EE
    SYS --> FW
    EE --> BOM
    FW --> BOM
    BOM --> MFG

    GATE{"EVT Gate<br/>Approval"}
    MFG --> GATE

    style REQ fill:#9b59b6,color:#fff
    style SYS fill:#9b59b6,color:#fff
    style EE fill:#9b59b6,color:#fff
    style FW fill:#9b59b6,color:#fff
    style BOM fill:#9b59b6,color:#fff
    style MFG fill:#9b59b6,color:#fff
    style GATE fill:#E67E22,color:#fff

The Temporal workflow coordinates agent activities in a DAG. Each agent runs as a Temporal activity with crash recovery and retry semantics. The EVT/DVT/PVT gate is a Temporal signal that blocks until human approval.
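
The stage ordering above can be derived mechanically from the dependency edges. A minimal stdlib sketch (agent names are taken from the diagram; the `execution_stages` helper is illustrative, not MetaForge code):

```python
from graphlib import TopologicalSorter

# Dependency map from the diagram above: each agent -> its upstream agents.
AGENT_DAG = {
    "SYS": {"REQ"},
    "EE": {"SYS"},
    "FW": {"SYS"},
    "BOM": {"EE", "FW"},
    "MFG": {"BOM"},
}

def execution_stages(dag: dict[str, set[str]]) -> list[set[str]]:
    """Group agents into sequential stages; members of a stage can run in parallel."""
    ts = TopologicalSorter(dag)
    ts.prepare()
    stages: list[set[str]] = []
    while ts.is_active():
        ready = set(ts.get_ready())  # all agents whose dependencies are satisfied
        stages.append(ready)
        for agent in ready:
            ts.done(agent)
    return stages

print(execution_stages(AGENT_DAG))
# Five stages: REQ, then SYS, then {EE, FW} in parallel, then BOM, then MFG
```

The `{EE, FW}` stage is exactly the parallel `asyncio.gather` step in the workflow code below; the rest map to sequential `execute_activity` calls.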

MCP Tool Integration

flowchart LR
    subgraph Agent["Pydantic AI Agent"]
        AC["Agent Core<br/>+ MCP Client"]
    end

    subgraph MCP_Servers["MCP Servers"]
        K["KiCad MCP<br/>Server"]
        S["SPICE MCP<br/>Server"]
        F["FreeCAD MCP<br/>Server"]
        N["Neo4j MCP<br/>Server"]
    end

    subgraph Tools["External Tools"]
        KT["KiCad"]
        ST["ngspice"]
        FT["FreeCAD"]
        NT["Neo4j"]
    end

    AC --> K
    AC --> S
    AC --> F
    AC --> N

    K --> KT
    S --> ST
    F --> FT
    N --> NT

    style Agent fill:#9b59b6,color:#fff
    style MCP_Servers fill:#3498db,color:#fff
    style Tools fill:#27ae60,color:#fff

MCP servers wrap each external tool and expose typed tool schemas. Pydantic AI agents connect to MCP servers via MCPServerStdio or MCPServerHTTP, receiving tools as typed function calls with validated inputs and outputs.
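
Concretely, each MCP server advertises its tools with a JSON Schema describing the inputs. A hypothetical sketch of what a KiCad ERC tool listing might look like (the tool name and fields are invented for illustration — only the `name`/`description`/`inputSchema` shape follows the MCP tools/list convention):

```python
import json

# Hypothetical MCP tool descriptor — not the actual KiCad MCP server's schema.
erc_tool = {
    "name": "run_erc",
    "description": "Run a KiCad electrical rule check on a schematic",
    "inputSchema": {
        "type": "object",
        "properties": {
            "schematic_path": {"type": "string"},
            "severity": {"type": "string", "enum": ["error", "warning"]},
        },
        "required": ["schematic_path"],
    },
}

# The framework turns this schema into a typed function call for the model.
# It round-trips cleanly because it is plain JSON on the wire.
wire_form = json.dumps(erc_tool)
assert json.loads(wire_form)["inputSchema"]["required"] == ["schematic_path"]
```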


Code Examples

Requirements Agent Definition (Pydantic AI)

from __future__ import annotations

from dataclasses import dataclass
from pydantic import BaseModel, Field
from pydantic_ai import Agent, RunContext
from pydantic_ai.mcp import MCPServerStdio


# --- Dependencies (injected at runtime) ---

@dataclass
class RequirementsAgentDeps:
    """Typed dependencies for the Requirements Agent."""
    project_path: str
    design_rules: dict
    session_id: str


# --- Structured Output ---

class Constraint(BaseModel):
    name: str
    value: float
    unit: str
    min_value: float | None = None
    max_value: float | None = None

class RequirementsOutput(BaseModel):
    """Validated output from the Requirements Agent."""
    electrical: list[Constraint] = Field(description="Electrical constraints")
    mechanical: list[Constraint] = Field(description="Mechanical constraints")
    environmental: list[Constraint] = Field(description="Environmental constraints")
    cost: list[Constraint] = Field(description="Cost constraints")
    assumptions: list[str] = Field(description="Assumptions made during extraction")
    confidence: float = Field(ge=0.0, le=1.0, description="Confidence score")


# --- MCP Server Connections ---

neo4j_mcp = MCPServerStdio('npx', args=['-y', '@metaforge/neo4j-mcp-server'])
kicad_mcp = MCPServerStdio('python', args=['-m', 'metaforge.mcp.kicad_server'])  # shared with other agents; not attached to this agent below


# --- Agent Definition ---

requirements_agent = Agent(
    model='anthropic:claude-sonnet-4-20250514',
    deps_type=RequirementsAgentDeps,
    output_type=RequirementsOutput,
    mcp_servers=[neo4j_mcp],
    system_prompt=(
        'You are the MetaForge Requirements Agent. '
        'Extract structured engineering constraints from a PRD. '
        'Output must include electrical, mechanical, environmental, '
        'and cost constraints with units and ranges.'
    ),
)


# --- Custom Tool with Dependency Injection ---

@requirements_agent.tool
async def check_design_rules(
    ctx: RunContext[RequirementsAgentDeps],
    constraint_name: str,
    proposed_value: float,
) -> str:
    """Check a proposed constraint against known design rules."""
    rules = ctx.deps.design_rules
    if constraint_name in rules:
        rule = rules[constraint_name]
        if rule['min'] <= proposed_value <= rule['max']:
            return f"PASS: {constraint_name}={proposed_value} within [{rule['min']}, {rule['max']}]"
        return f"FAIL: {constraint_name}={proposed_value} outside [{rule['min']}, {rule['max']}]"
    return f"NO_RULE: No design rule found for {constraint_name}"
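
Because the rule check inside `check_design_rules` is plain Python, it can be unit-tested without the framework, an LLM, or MCP. A minimal standalone sketch (the rule name and limits are illustrative, not real MetaForge design rules):

```python
def check_rule(rules: dict, name: str, value: float) -> str:
    """Standalone version of the check_design_rules logic, for unit tests."""
    if name in rules:
        rule = rules[name]
        if rule["min"] <= value <= rule["max"]:
            return f"PASS: {name}={value} within [{rule['min']}, {rule['max']}]"
        return f"FAIL: {name}={value} outside [{rule['min']}, {rule['max']}]"
    return f"NO_RULE: No design rule found for {name}"

# Illustrative rule set — not real MetaForge design rules.
rules = {"trace_width_mm": {"min": 0.15, "max": 2.0}}
print(check_rule(rules, "trace_width_mm", 0.25))  # PASS
print(check_rule(rules, "trace_width_mm", 0.05))  # FAIL
print(check_rule(rules, "supply_voltage_v", 3.3))  # NO_RULE
```

Keeping the logic framework-free like this is what makes the tool decorators the only coupled surface (see “Build to Delete” below).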

Temporal Activity Wrapper

from temporalio import activity

from metaforge.agents.requirements import (
    requirements_agent,
    RequirementsAgentDeps,
    RequirementsOutput,
)


@activity.defn
async def run_requirements_agent(
    prd_content: str,
    project_path: str,
    session_id: str,
) -> dict:
    """Temporal activity that runs the Requirements Agent."""
    deps = RequirementsAgentDeps(
        project_path=project_path,
        design_rules=load_design_rules(project_path),  # MetaForge helper, defined elsewhere
        session_id=session_id,
    )

    result = await requirements_agent.run(
        f"Extract requirements from this PRD:\n\n{prd_content}",
        deps=deps,
    )

    # result.output is a validated RequirementsOutput instance
    return result.output.model_dump()

Temporal Workflow Coordinating Multiple Agents

import asyncio
from datetime import timedelta

from temporalio import workflow

with workflow.unsafe.imports_passed_through():
    from metaforge.activities import (
        run_requirements_agent,
        run_systems_agent,
        run_electronics_agent,
        run_firmware_agent,
        run_bom_agent,
        run_manufacturing_agent,
    )


@workflow.defn
class HardwareDesignWorkflow:
    """Temporal workflow that orchestrates the full agent DAG."""

    @workflow.run
    async def run(self, prd_content: str, project_path: str) -> dict:
        session_id = workflow.info().workflow_id

        # Stage 1: Requirements extraction
        requirements = await workflow.execute_activity(
            run_requirements_agent,
            args=[prd_content, project_path, session_id],
            start_to_close_timeout=timedelta(minutes=5),
        )

        # Stage 2: Systems architecture (depends on requirements)
        architecture = await workflow.execute_activity(
            run_systems_agent,
            args=[requirements, project_path, session_id],
            start_to_close_timeout=timedelta(minutes=10),
        )

        # Stage 3: Electronics + Firmware (parallel, both depend on architecture)
        electronics, firmware = await asyncio.gather(
            workflow.execute_activity(
                run_electronics_agent,
                args=[architecture, project_path, session_id],
                start_to_close_timeout=timedelta(minutes=10),
            ),
            workflow.execute_activity(
                run_firmware_agent,
                args=[architecture, project_path, session_id],
                start_to_close_timeout=timedelta(minutes=10),
            ),
        )

        # Stage 4: BOM analysis (depends on electronics + firmware)
        bom = await workflow.execute_activity(
            run_bom_agent,
            args=[electronics, firmware, project_path, session_id],
            start_to_close_timeout=timedelta(minutes=5),
        )

        # Stage 5: Manufacturing prep (depends on BOM)
        manufacturing = await workflow.execute_activity(
            run_manufacturing_agent,
            args=[bom, project_path, session_id],
            start_to_close_timeout=timedelta(minutes=5),
        )

        # EVT Gate: Wait for human approval
        await workflow.wait_condition(lambda: self._gate_approved)

        return {
            'requirements': requirements,
            'architecture': architecture,
            'electronics': electronics,
            'firmware': firmware,
            'bom': bom,
            'manufacturing': manufacturing,
        }

    # Gate state: class-level default, flipped by the approve_gate signal;
    # the wait_condition in run() unblocks once it becomes True.
    _gate_approved: bool = False

    @workflow.signal
    async def approve_gate(self) -> None:
        self._gate_approved = True

Build to Delete

Per Philipp Schmid’s principle: design so the Pydantic AI layer is replaceable.

| Component | Framework Coupling | Replacement Cost | Notes |
|---|---|---|---|
| MCP servers (KiCad, SPICE, FreeCAD, Neo4j) | None — framework-agnostic | Zero | MCP is a protocol, not a framework feature |
| Temporal workflows | None — framework-agnostic | Zero | Workflows call activities; activities wrap agents |
| Pydantic domain models | None — used by framework but owned by MetaForge | Zero | RequirementsOutput, BOMEntry, etc. are plain Pydantic models |
| Hardware harness logic | None — sits above the framework | Zero | Prompt templates, error recovery, gate hooks are custom code |
| Agent definitions | Coupled — Agent(), @agent.tool, RunContext | Moderate | ~20 agent definitions to rewrite |
| Tool decorators | Coupled — @agent.tool with dependency injection | Moderate | ~50-100 tool functions to re-register |
| MCP client wrappers | Coupled — MCPServerStdio, MCPServerHTTP | Low | Thin wrappers, standard protocol underneath |

Risk assessment: If Pydantic AI is abandoned or superseded, the replacement surface is limited to agent definitions and tool decorators (~2-3 weeks of refactoring). All domain logic, MCP servers, Temporal workflows, and Pydantic models survive intact.


Migration Path

Text replacements across existing documentation:

| File | Old Text | New Text |
|---|---|---|
| README.md | Custom orchestration layer (no agent framework) | Pydantic AI + Temporal ([ADR-001](docs/architecture/agent-orchestration-adr.md)) |
| docs/architecture/index.md | subgraph "Agent Base (Custom Orchestration)" | subgraph "Agent Base (Pydantic AI + Temporal)" |
| docs/architecture/index.md | Custom Orchestration blockquote (Section 2.3) | Agent Orchestration (ADR-001) blockquote referencing Pydantic AI + Temporal |
| docs/architecture/index.md | Custom Orchestration in Agent Runtime mermaid | Pydantic AI Framework with Temporal Activities and MCP connections |
| docs/architecture/index.md | custom orchestration, no agent framework in dependencies | Pydantic AI framework + Temporal with pydantic-ai and temporalio |
| docs/architecture/mvp-roadmap.md | Custom orchestration + openai + @anthropic-ai/sdk | Pydantic AI + Temporal + openai + anthropic SDKs |
| docs/architecture/mvp-roadmap.md | Agent Orchestration Decision blockquote | Updated text referencing Pydantic AI + Temporal + ADR-001 |
| docs/agents/index.md | TypeScript LLMProvider + Agent interface | Python Pydantic AI base agent pattern |
| docs/architecture/repository-structure.md | orchestrator/ table | Add note: agent execution uses Pydantic AI + Temporal |
| docs/architecture/repository-structure.md | domain_agents/ agent.py description | Reference Pydantic AI agent definition |
| docs/research/agent-framework-comparison-2026.md | (end of document) | Add “Decision Recorded” section linking to ADR-001 |

Consequences

Positive

  • Eliminates ~2K lines of custom orchestration plumbing — tool dispatch, MCP integration, structured output validation, and retry logic are handled by Pydantic AI + Temporal
  • Native crash recovery — Temporal provides deterministic replay for long-running hardware design workflows (hours/days)
  • Zero impedance mismatch — Pydantic AI speaks the same validation language as MetaForge’s domain models
  • MCP-native — FreeCAD, KiCad, SPICE, and Neo4j MCP servers plug in directly via MCPServerStdio/MCPServerHTTP
  • Model-agnostic — Claude, GPT-4o, Gemini, and local models supported without custom abstraction layers
  • Community momentum — Pydantic AI backed by the Pydantic team (ubiquitous in Python ecosystem)

Negative

  • Framework dependency — MetaForge now depends on pydantic-ai (MIT, actively maintained). Mitigated by “Build to Delete” architecture.
  • Team learning curve — Engineers must learn Pydantic AI’s Agent(), RunContext, @agent.tool patterns. Mitigated by strong documentation and Pydantic familiarity.
  • Framework maturity — Pydantic AI v1.0 shipped September 2025; younger than LangGraph or CrewAI. Mitigated by Pydantic team’s track record and rapid adoption.

Neutral

  • Agent lifecycle model maps cleanly — MetaForge’s existing Created → Loading → Executing → Validating → Completed lifecycle maps directly to Pydantic AI’s agent run semantics with Temporal activity wrapping
  • Skill pattern preserved — Skills remain pure, stateless capability modules invoked by agents. The framework change affects agent orchestration, not skill execution.
  • EVT/DVT/PVT gates — Gate workflow maps to Temporal signals with human-in-the-loop approval, matching the existing gate engine design.

Related Documents

| Document | Relationship |
|---|---|
| System Architecture | Parent architecture document — updated to reference this ADR |
| MVP Roadmap | Technology stack tables updated to reflect Pydantic AI + Temporal |
| Agent System | Base agent interface updated from TypeScript/custom to Python/Pydantic AI |
| Repository Structure | Directory descriptions updated for Pydantic AI agent definitions |
| Agent Framework Comparison (2026) | Research that informed this decision — 14-framework evaluation |
| Orchestrator | Technical orchestrator design — workflow coordination via Temporal |
| Digital Twin Evolution | Twin architecture unchanged; agents interact via MCP + Neo4j |
| Assistant Mode | Dual-mode operation — reuses Temporal activities for the ingest pipeline and Temporal signals for IDE notifications |
| System Observability | OpenTelemetry integration — Pydantic AI supports OTel tracing natively |

Document Version: v1.0 · Last Updated: 2026-02-28 · Status: Accepted
