ADR-001: Agent Orchestration & Harness Strategy

Adopt Pydantic AI as the agent framework layer with Temporal for durable execution

ADR Header

Field	Value
Status	Accepted
Date	2026-02-28
Decision	Adopt Pydantic AI (v1.0+, MIT) as the agent framework layer in a 4-layer architecture, with Temporal for durable execution and a custom hardware harness on top
Supersedes	“Custom orchestration, no agent framework” (original architecture decision)
Deciders	Architecture team
Categories	Agent orchestration, framework selection, durable execution

Context

Why the Original Decision Was Correct

When MetaForge’s architecture was first designed, the decision to use custom orchestration with no agent framework was sound:

LangChain instability — The dominant framework (LangChain/LangGraph) was notorious for rapid breaking changes, thick abstraction layers, and a heavy dependency tree. Production teams consistently reported fragile state management.
Framework immaturity — No framework offered native Temporal integration, first-class MCP support, AND Pydantic-validated structured output simultaneously.
Industry pattern — The most successful production agent systems (Cursor, Devin, Claude Code, OpenHands) all used custom orchestration with raw LLM SDKs.

What Changed (2025-2026)

Three developments shifted the cost-benefit analysis:

Pydantic AI v1.0 (September 2025) — The first framework to ship native Temporal integration AND native MCP support with Pydantic-validated structured output as its core strength. Lightweight, model-agnostic, MIT-licensed.
MCP became the standard — Model Context Protocol became the de facto standard for tool integration across all major frameworks and LLM providers. Building custom tool dispatch plumbing is now redundant.
The “Agent Harness” concept — Philipp Schmid’s influential January 2026 analysis (The importance of Agent Harness in 2026) clarified that production systems build domain-specific harnesses ON TOP of frameworks. The competitive advantage is the harness (hardware-domain knowledge), not reinventing tool dispatch plumbing.

Key insight: MetaForge’s competitive advantage is the hardware harness — hardware-domain prompt engineering, FreeCAD/KiCad/SPICE error recovery, design-rule context management, and EVT/DVT/PVT gate hooks. The agent framework is commodity infrastructure.

See: Agent Framework Comparison (2026) for the full 14-framework evaluation.

Decision Drivers

Six non-negotiable constraints drove the framework selection:

#	Constraint	Rationale	Frameworks That Pass
1	Native Temporal integration	Long-running hardware workflows (hours/days) require crash recovery, replay, and approval gates	Pydantic AI, OpenAI Agents SDK
2	First-class MCP support	FreeCAD, KiCad, SPICE, Neo4j MCP servers must plug in directly without adapter layers	Pydantic AI, OpenAI Agents SDK, Google ADK, CrewAI, LangGraph
3	Pydantic structured output	MetaForge domain models (PCB specs, SPICE parameters, mechanical dimensions) are Pydantic models — zero impedance mismatch	Pydantic AI (core strength), others via add-on
4	Python ecosystem	Hardware tools (FreeCAD, KiCad IPC, OpenFOAM, CalculiX, ngspice) have Python-native APIs	All Python frameworks
5	Model-agnostic	Must support Claude, GPT-4o, Gemini, and local models without vendor lock-in	Pydantic AI, OpenAI Agents SDK, Google ADK
6	MIT license	Open-source platform requirement, no proprietary dependencies	Pydantic AI, OpenAI Agents SDK, LangGraph, CrewAI

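Only Pydantic AI satisfies all six constraints natively. OpenAI Agents SDK comes closest, but it offers a thinner validation layer rather than Pydantic validation as its core strength.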

Options Considered

Summary Table

Framework	Native Temporal	Native MCP	Pydantic Output	Model-Agnostic	MIT	MetaForge Fit	Disposition
Pydantic AI	Yes	Yes	Core strength	Yes	Yes	Strong	Selected
OpenAI Agents SDK	Yes	Yes	Yes	Yes	Yes	Moderate-Strong	Runner-up
Google ADK	No	Yes	Yes	Yes	Apache 2.0	Moderate	Honorable mention
LangGraph	No (own persistence)	Yes	Yes	Yes	Yes	Moderate	Rejected — own persistence conflicts with Temporal
CrewAI	No (own Flows)	Yes	Yes	Yes	Yes	Moderate	Rejected — role metaphor mismatch, no Temporal
MS Agent Framework	No	Yes	Yes	Azure-oriented	Yes	Low-Moderate	Rejected — RC/preview, Azure-oriented
Claude Agent SDK	No	Yes	Yes	Claude-only	Proprietary	Moderate	Rejected — proprietary license, Claude-only
OpenHands	No	Yes	Internal only	SWE-specific	MIT	Poor	Rejected — SWE harness, not a framework
Mastra	No	Yes	Zod (TS only)	Yes	Apache 2.0	N/A	Rejected — TypeScript-only
Semantic Kernel	No	–	–	Azure-oriented	MIT	See MS AF	Rejected — merged into MS Agent Framework
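
The hard-constraint filter behind the table can be sketched in a few lines. This is an illustration only: the `Framework` dataclass and `shortlist` helper are hypothetical names, and the values are transcribed from three columns of the table above (Semantic Kernel is omitted because its MCP cell is blank).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Framework:
    name: str
    native_temporal: bool
    native_mcp: bool
    license: str


# Transcribed from the summary table (Native Temporal, Native MCP, MIT columns).
FRAMEWORKS = [
    Framework("Pydantic AI", True, True, "MIT"),
    Framework("OpenAI Agents SDK", True, True, "MIT"),
    Framework("Google ADK", False, True, "Apache 2.0"),
    Framework("LangGraph", False, True, "MIT"),
    Framework("CrewAI", False, True, "MIT"),
    Framework("MS Agent Framework", False, True, "MIT"),
    Framework("Claude Agent SDK", False, True, "Proprietary"),
    Framework("OpenHands", False, True, "MIT"),
    Framework("Mastra", False, True, "Apache 2.0"),
]


def shortlist(frameworks):
    """Keep frameworks that pass the Temporal, MCP, and MIT hard constraints."""
    return [
        f.name
        for f in frameworks
        if f.native_temporal and f.native_mcp and f.license == "MIT"
    ]


SHORTLIST = shortlist(FRAMEWORKS)
print(SHORTLIST)
```

Under these three constraints only Pydantic AI and OpenAI Agents SDK remain, which is why the final choice between them rests on validation depth and model paradigm rather than on the hard filters.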

Selected: Pydantic AI

The only evaluated framework that pairs native Temporal integration and native MCP support with Pydantic-validated structured output as its core strength, which means zero impedance mismatch with MetaForge’s domain models. It is also lightweight, model-agnostic, and MIT-licensed.

Runner-up: OpenAI Agents SDK

Temporal and MCP support, but thinner validation layer than Pydantic AI. Designed primarily around OpenAI’s model paradigm (Responses API) even though it supports other providers.

Honorable Mention: Google ADK

Clean architecture, good MCP support, model-agnostic. However, no native Temporal integration and optimized for Google Cloud/Vertex AI deployment.

Decision

4-Layer Architecture

flowchart TB
    subgraph L4["L4: MetaForge Hardware Harness (CUSTOM)"]
        direction LR
        H1["Hardware Prompts"]
        H2["FreeCAD/KiCad/SPICE<br/>Error Recovery"]
        H3["Design-Rule Context"]
        H4["EVT/DVT/PVT<br/>Gate Hooks"]
        H5["Agent DAG Engine"]
    end

    subgraph L3["L3: Agent Framework (Pydantic AI)"]
        direction LR
        F1["Typed Agent<br/>Definitions"]
        F2["MCP Connections"]
        F3["Structured Output<br/>Validation"]
        F4["Multi-Agent<br/>Delegation"]
        F5["Human-in-the-Loop<br/>Approval"]
    end

    subgraph L2["L2: Durable Execution (Temporal)"]
        direction LR
        T1["Long-Running<br/>Workflows"]
        T2["Crash Recovery<br/>& Replay"]
        T3["Approval Gates"]
        T4["Activity Retries"]
    end

    subgraph L1["L1: LLM Providers (model-agnostic)"]
        direction LR
        M1["Claude"]
        M2["GPT-4o"]
        M3["Gemini"]
        M4["Local Models"]
    end

    L4 --> L3
    L3 --> L2
    L2 --> L1

    style L4 fill:#E67E22,color:#fff,stroke:#E67E22,stroke-width:2px
    style L3 fill:#9b59b6,color:#fff,stroke:#9b59b6,stroke-width:2px
    style L2 fill:#3498db,color:#fff,stroke:#3498db,stroke-width:2px
    style L1 fill:#2C3E50,color:#fff,stroke:#2C3E50,stroke-width:2px

Layer responsibilities:

Layer	Responsibility	Build vs. Buy
L4: Hardware Harness	Domain-specific intelligence — hardware prompt engineering, tool error recovery, design-rule context injection, EVT/DVT/PVT lifecycle hooks, agent DAG orchestration	Custom (competitive advantage)
L3: Agent Framework	Agent definitions with typed dependencies and outputs, MCP server connections, structured output validation, multi-agent delegation	Pydantic AI (commodity)
L2: Durable Execution	Long-running workflow coordination, crash recovery, deterministic replay, human-in-the-loop approval gates, activity-level retries	Temporal (commodity)
L1: LLM Providers	Model-agnostic LLM access via standard SDKs	openai + anthropic SDKs (commodity)

Architecture Diagrams

Single Agent Execution Flow

sequenceDiagram
    participant GW as Gateway<br/>(FastAPI)
    participant H as Hardware<br/>Harness (L4)
    participant A as Pydantic AI<br/>Agent (L3)
    participant MCP as MCP Server
    participant T as Temporal<br/>Activity (L2)
    participant LLM as LLM Provider<br/>(L1)

    GW->>T: Start workflow
    T->>H: Execute agent activity
    H->>H: Inject hardware context<br/>(design rules, constraints)
    H->>A: Run agent with deps
    A->>LLM: Prompt with tools
    LLM-->>A: Tool call request
    A->>MCP: Execute tool<br/>(KiCad ERC, SPICE sim)
    MCP-->>A: Tool result
    A->>LLM: Continue with result
    LLM-->>A: Structured output
    A-->>H: Validated Pydantic model
    H->>H: Hardware-specific<br/>post-processing
    H-->>T: Activity result
    T-->>GW: Workflow complete
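
The harness steps in this flow (inject context, run the agent, post-process) can be sketched without any framework dependency. Everything named here is an assumption for illustration: `HarnessContext`, `build_prompt`, `run_with_harness`, and the `fake_agent` stand-in are not MetaForge APIs, and the rule values are invented.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Awaitable, Callable


@dataclass
class HarnessContext:
    """Hardware context injected before the agent runs (hypothetical shape)."""
    design_rules: dict[str, dict[str, float]]
    constraints: list[str] = field(default_factory=list)


# Stand-in for the L3 agent call: takes a prompt, returns a validated result dict.
AgentFn = Callable[[str], Awaitable[dict]]


def build_prompt(task: str, ctx: HarnessContext) -> str:
    """L4 pre-processing: prepend design rules and constraints to the task."""
    rules = "; ".join(
        f"{name} in [{r['min']}, {r['max']}]"
        for name, r in sorted(ctx.design_rules.items())
    )
    return f"Design rules: {rules}\nConstraints: {', '.join(ctx.constraints)}\n\n{task}"


async def run_with_harness(task: str, ctx: HarnessContext, agent: AgentFn) -> dict:
    """Inject context, run the agent, then apply hardware-specific post-processing."""
    output = await agent(build_prompt(task, ctx))
    # Illustrative post-processing: flag values that violate a known design rule.
    violations = [
        name
        for name, value in output.items()
        if name in ctx.design_rules
        and not ctx.design_rules[name]["min"] <= value <= ctx.design_rules[name]["max"]
    ]
    return {"output": output, "violations": violations}


async def fake_agent(prompt: str) -> dict:
    """Fake agent used only in this sketch; a real one would call the LLM."""
    return {"vcc_voltage": 5.0, "trace_width_mm": 0.2}


CTX = HarnessContext(
    design_rules={"vcc_voltage": {"min": 3.0, "max": 3.6}},
    constraints=["2-layer PCB"],
)
HARNESS_RESULT = asyncio.run(run_with_harness("Size the regulator.", CTX, fake_agent))
print(HARNESS_RESULT["violations"])
```

Passing the agent in as a plain async callable keeps the harness logic testable with a fake, which is the same separation the diagram draws between L4 and L3.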

Multi-Agent DAG Workflow

flowchart LR
    subgraph Temporal["Temporal Workflow"]
        direction LR
        REQ["REQ Agent<br/>Requirements"]
        SYS["SYS Agent<br/>Systems"]
        EE["EE Agent<br/>Electronics"]
        FW["FW Agent<br/>Firmware"]
        BOM["BOM Agent<br/>Supply Chain"]
        MFG["MFG Agent<br/>Manufacturing"]
    end

    REQ --> SYS
    SYS --> EE
    SYS --> FW
    EE --> BOM
    FW --> BOM
    BOM --> MFG

    GATE{"EVT Gate<br/>Approval"}
    MFG --> GATE

    style REQ fill:#9b59b6,color:#fff
    style SYS fill:#9b59b6,color:#fff
    style EE fill:#9b59b6,color:#fff
    style FW fill:#9b59b6,color:#fff
    style BOM fill:#9b59b6,color:#fff
    style MFG fill:#9b59b6,color:#fff
    style GATE fill:#E67E22,color:#fff

The Temporal workflow coordinates agent activities in a DAG. Each agent runs as a Temporal activity with crash recovery and retry semantics. The EVT/DVT/PVT gate is implemented with a Temporal signal: the workflow waits on a condition that the signal handler sets once a human approves.
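
The stage ordering implied by the diagram can be derived mechanically. The sketch below uses the standard library's `graphlib.TopologicalSorter`; the `AGENT_DAG` mapping and `execution_stages` helper are illustrative names, not part of MetaForge, and the edges are copied from the diagram.

```python
from graphlib import TopologicalSorter

# Each agent maps to the set of agents it depends on (edges from the diagram).
AGENT_DAG = {
    "REQ": set(),
    "SYS": {"REQ"},
    "EE": {"SYS"},
    "FW": {"SYS"},
    "BOM": {"EE", "FW"},
    "MFG": {"BOM"},
}


def execution_stages(dag):
    """Group agents into stages; agents within one stage can run in parallel."""
    sorter = TopologicalSorter(dag)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    stages = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        stages.append(ready)
        sorter.done(*ready)
    return stages


STAGES = execution_stages(AGENT_DAG)
print(STAGES)
```

The result has five stages, with EE and FW sharing the third, which matches the one parallel step in the workflow.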

MCP Tool Integration

flowchart LR
    subgraph Agent["Pydantic AI Agent"]
        AC["Agent Core<br/>+ MCP Client"]
    end

    subgraph MCP_Servers["MCP Servers"]
        K["KiCad MCP<br/>Server"]
        S["SPICE MCP<br/>Server"]
        F["FreeCAD MCP<br/>Server"]
        N["Neo4j MCP<br/>Server"]
    end

    subgraph Tools["External Tools"]
        KT["KiCad"]
        ST["ngspice"]
        FT["FreeCAD"]
        NT["Neo4j"]
    end

    AC --> K
    AC --> S
    AC --> F
    AC --> N

    K --> KT
    S --> ST
    F --> FT
    N --> NT

    style Agent fill:#9b59b6,color:#fff
    style MCP_Servers fill:#3498db,color:#fff
    style Tools fill:#27ae60,color:#fff

MCP servers wrap each external tool and expose typed tool schemas. Pydantic AI agents connect to MCP servers via MCPServerStdio or MCPServerHTTP, receiving tools as typed function calls with validated inputs and outputs.
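
To make "typed tool schemas" concrete: an MCP server describes each tool with a name, a description, and a JSON Schema for its input. The descriptor below is hypothetical (the `run_erc` tool name and its fields are assumptions, not the actual KiCad MCP server interface), and the validator is a deliberately minimal stand-in that checks only `required`, `type`, and `enum`.

```python
# Hypothetical tool descriptor, shaped like an entry in an MCP tool listing.
RUN_ERC_TOOL = {
    "name": "run_erc",
    "description": "Run the KiCad electrical rules check on a schematic.",
    "inputSchema": {
        "type": "object",
        "properties": {
            "schematic_path": {"type": "string"},
            "severity": {"type": "string", "enum": ["error", "warning"]},
        },
        "required": ["schematic_path"],
    },
}

_JSON_TYPES = {"string": str, "number": (int, float), "boolean": bool, "object": dict}


def validate_arguments(tool, arguments):
    """Return a list of problems; an empty list means the call matches the schema."""
    schema = tool["inputSchema"]
    problems = [
        f"missing required field: {key}"
        for key in schema.get("required", [])
        if key not in arguments
    ]
    for key, value in arguments.items():
        spec = schema["properties"].get(key)
        if spec is None:
            problems.append(f"unknown field: {key}")
        elif not isinstance(value, _JSON_TYPES[spec["type"]]):
            problems.append(f"{key}: expected {spec['type']}")
        elif "enum" in spec and value not in spec["enum"]:
            problems.append(f"{key}: must be one of {spec['enum']}")
    return problems


print(validate_arguments(RUN_ERC_TOOL, {"schematic_path": "board.kicad_sch"}))
print(validate_arguments(RUN_ERC_TOOL, {"severity": "fatal"}))
```

In practice the framework's MCP client performs this validation; the point of the sketch is only that the schema travels with the tool, so no per-tool adapter code is needed on the agent side.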

Code Examples

Requirements Agent Definition (Pydantic AI)

from __future__ import annotations

from dataclasses import dataclass
from pydantic import BaseModel, Field
from pydantic_ai import Agent, RunContext
from pydantic_ai.mcp import MCPServerStdio


# --- Dependencies (injected at runtime) ---

@dataclass
class RequirementsAgentDeps:
    """Typed dependencies for the Requirements Agent."""
    project_path: str
    design_rules: dict
    session_id: str


# --- Structured Output ---

class Constraint(BaseModel):
    name: str
    value: float
    unit: str
    min_value: float | None = None
    max_value: float | None = None

class RequirementsOutput(BaseModel):
    """Validated output from the Requirements Agent."""
    electrical: list[Constraint] = Field(description="Electrical constraints")
    mechanical: list[Constraint] = Field(description="Mechanical constraints")
    environmental: list[Constraint] = Field(description="Environmental constraints")
    cost: list[Constraint] = Field(description="Cost constraints")
    assumptions: list[str] = Field(description="Assumptions made during extraction")
    confidence: float = Field(ge=0.0, le=1.0, description="Confidence score")


# --- MCP Server Connections ---

neo4j_mcp = MCPServerStdio('npx', args=['-y', '@metaforge/neo4j-mcp-server'])
kicad_mcp = MCPServerStdio('python', args=['-m', 'metaforge.mcp.kicad_server'])


# --- Agent Definition ---

requirements_agent = Agent(
    model='anthropic:claude-sonnet-4-20250514',
    deps_type=RequirementsAgentDeps,
    output_type=RequirementsOutput,
    toolsets=[neo4j_mcp],  # MCP servers register as toolsets in Pydantic AI v1.x (mcp_servers is the older keyword)
    system_prompt=(
        'You are the MetaForge Requirements Agent. '
        'Extract structured engineering constraints from a PRD. '
        'Output must include electrical, mechanical, environmental, '
        'and cost constraints with units and ranges.'
    ),
)


# --- Custom Tool with Dependency Injection ---

@requirements_agent.tool
async def check_design_rules(
    ctx: RunContext[RequirementsAgentDeps],
    constraint_name: str,
    proposed_value: float,
) -> str:
    """Check a proposed constraint against known design rules."""
    rules = ctx.deps.design_rules
    if constraint_name in rules:
        rule = rules[constraint_name]
        if rule['min'] <= proposed_value <= rule['max']:
            return f"PASS: {constraint_name}={proposed_value} within [{rule['min']}, {rule['max']}]"
        return f"FAIL: {constraint_name}={proposed_value} outside [{rule['min']}, {rule['max']}]"
    return f"NO_RULE: No design rule found for {constraint_name}"
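
The tool above assumes that `design_rules` maps each constraint name to a dict with `min` and `max` keys. A framework-free sketch of the same branching makes that shape explicit and lets the logic be unit-tested without an LLM. The `evaluate_rule` name and the rule values are hypothetical; the ADR does not specify actual rule names or limits.

```python
def evaluate_rule(design_rules: dict, constraint_name: str, proposed_value: float) -> str:
    """Same branching as check_design_rules, without the RunContext wrapper."""
    rule = design_rules.get(constraint_name)
    if rule is None:
        return f"NO_RULE: No design rule found for {constraint_name}"
    in_range = rule["min"] <= proposed_value <= rule["max"]
    verdict, relation = ("PASS", "within") if in_range else ("FAIL", "outside")
    return f"{verdict}: {constraint_name}={proposed_value} {relation} [{rule['min']}, {rule['max']}]"


# Hypothetical rule set for illustration only.
DESIGN_RULES = {"vcc_voltage": {"min": 3.0, "max": 3.6}}

print(evaluate_rule(DESIGN_RULES, "vcc_voltage", 3.3))
print(evaluate_rule(DESIGN_RULES, "vcc_voltage", 5.0))
print(evaluate_rule(DESIGN_RULES, "trace_width", 0.2))
```

Keeping the decorated tool as a thin wrapper around a plain function like this also reduces the amount of code coupled to the framework.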

Temporal Activity Wrapper

from temporalio import activity

# NOTE: load_design_rules() is assumed to be a MetaForge helper defined elsewhere;
# it is not shown in this ADR.

from metaforge.agents.requirements import (
    requirements_agent,
    RequirementsAgentDeps,
    RequirementsOutput,
)


@activity.defn
async def run_requirements_agent(
    prd_content: str,
    project_path: str,
    session_id: str,
) -> dict:
    """Temporal activity that runs the Requirements Agent."""
    deps = RequirementsAgentDeps(
        project_path=project_path,
        design_rules=load_design_rules(project_path),
        session_id=session_id,
    )

    result = await requirements_agent.run(
        f"Extract requirements from this PRD:\n\n{prd_content}",
        deps=deps,
    )

    # result.output is a validated RequirementsOutput instance
    return result.output.model_dump()

Temporal Workflow Coordinating Multiple Agents

import asyncio  # needed for asyncio.gather in Stage 3
from datetime import timedelta

from temporalio import workflow

with workflow.unsafe.imports_passed_through():
    from metaforge.activities import (
        run_requirements_agent,
        run_systems_agent,
        run_electronics_agent,
        run_firmware_agent,
        run_bom_agent,
        run_manufacturing_agent,
    )


@workflow.defn
class HardwareDesignWorkflow:
    """Temporal workflow that orchestrates the full agent DAG."""

    @workflow.run
    async def run(self, prd_content: str, project_path: str) -> dict:
        session_id = workflow.info().workflow_id

        # Stage 1: Requirements extraction
        requirements = await workflow.execute_activity(
            run_requirements_agent,
            args=[prd_content, project_path, session_id],
            start_to_close_timeout=timedelta(minutes=5),
        )

        # Stage 2: Systems architecture (depends on requirements)
        architecture = await workflow.execute_activity(
            run_systems_agent,
            args=[requirements, project_path, session_id],
            start_to_close_timeout=timedelta(minutes=10),
        )

        # Stage 3: Electronics + Firmware (parallel, both depend on architecture)
        electronics, firmware = await asyncio.gather(
            workflow.execute_activity(
                run_electronics_agent,
                args=[architecture, project_path, session_id],
                start_to_close_timeout=timedelta(minutes=10),
            ),
            workflow.execute_activity(
                run_firmware_agent,
                args=[architecture, project_path, session_id],
                start_to_close_timeout=timedelta(minutes=10),
            ),
        )

        # Stage 4: BOM analysis (depends on electronics + firmware)
        bom = await workflow.execute_activity(
            run_bom_agent,
            args=[electronics, firmware, project_path, session_id],
            start_to_close_timeout=timedelta(minutes=5),
        )

        # Stage 5: Manufacturing prep (depends on BOM)
        manufacturing = await workflow.execute_activity(
            run_manufacturing_agent,
            args=[bom, project_path, session_id],
            start_to_close_timeout=timedelta(minutes=5),
        )

        # EVT Gate: Wait for human approval
        await workflow.wait_condition(lambda: self._gate_approved)

        return {
            'requirements': requirements,
            'architecture': architecture,
            'electronics': electronics,
            'firmware': firmware,
            'bom': bom,
            'manufacturing': manufacturing,
        }

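    def __init__(self) -> None:
        # Flipped by the approve_gate signal; run() blocks on it at the EVT gate.
        self._gate_approved = False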

    @workflow.signal
    async def approve_gate(self) -> None:
        self._gate_approved = True
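
The gate at the end of the workflow follows a simple pattern: the run method blocks until a separate handler flips a flag. The sketch below reproduces that control flow with plain `asyncio` so it can be run locally. It is an analogy only: `GateDemo` is a hypothetical class, and it has none of Temporal's durability, replay, or cross-process signalling.

```python
import asyncio


class GateDemo:
    """Plain-asyncio analogy of the approval gate (not durable, single process)."""

    def __init__(self) -> None:
        self._approved = asyncio.Event()
        self.log: list[str] = []

    async def run(self) -> str:
        self.log.append("manufacturing prep done")
        await self._approved.wait()  # analogous to workflow.wait_condition(...)
        self.log.append("gate passed")
        return "released"

    def approve_gate(self) -> None:
        # Analogous to the approve_gate signal handler.
        self._approved.set()


async def main() -> tuple[str, list[str], bool]:
    demo = GateDemo()
    task = asyncio.create_task(demo.run())
    await asyncio.sleep(0)  # let run() reach the gate
    blocked_before_approval = not task.done()
    demo.approve_gate()
    return await task, demo.log, blocked_before_approval


STATUS, LOG, WAS_BLOCKED = asyncio.run(main())
print(STATUS, LOG, WAS_BLOCKED)
```

In the Temporal version the approval arrives from outside the worker process as a signal sent to the running workflow, and the wait survives worker restarts.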

Build to Delete

Per Philipp Schmid’s principle: design so the Pydantic AI layer is replaceable.

Component	Framework Coupling	Replacement Cost	Notes
MCP servers (KiCad, SPICE, FreeCAD, Neo4j)	None — framework-agnostic	Zero	MCP is a protocol, not a framework feature
Temporal workflows	None — framework-agnostic	Zero	Workflows call activities; activities wrap agents
Pydantic domain models	None — used by framework but owned by MetaForge	Zero	`RequirementsOutput`, `BOMEntry`, etc. are plain Pydantic models
Hardware harness logic	None — sits above the framework	Zero	Prompt templates, error recovery, gate hooks are custom code
Agent definitions	Coupled — `Agent()`, `@agent.tool`, `RunContext`	Moderate	~20 agent definitions to rewrite
Tool decorators	Coupled — `@agent.tool` with dependency injection	Moderate	~50-100 tool functions to re-register
MCP client wrappers	Coupled — `MCPServerStdio`, `MCPServerHTTP`	Low	Thin wrappers, standard protocol underneath

Risk assessment: If Pydantic AI is abandoned or superseded, the replacement surface is limited to agent definitions, tool decorators, and the thin MCP client wrappers (~2-3 weeks of refactoring). All domain logic, MCP servers, Temporal workflows, and Pydantic models survive intact.
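
One way to keep the coupled surface small is to have harness code depend on a narrow interface rather than on framework imports. The sketch below is an assumption about how such a seam could look, not the MetaForge design: `AgentRunner`, `FakeRunner`, and `extract_requirements` are hypothetical names.

```python
import asyncio
from typing import Any, Protocol


class AgentRunner(Protocol):
    """Hypothetical seam between the harness (L4) and whichever framework fills L3."""

    async def run(self, prompt: str, deps: Any) -> dict: ...


class FakeRunner:
    """Test double; a production adapter would wrap a Pydantic AI agent instead."""

    async def run(self, prompt: str, deps: Any) -> dict:
        return {"prompt_chars": len(prompt), "session_id": deps["session_id"]}


async def extract_requirements(runner: AgentRunner, prd: str, session_id: str) -> dict:
    """Harness-side code depends only on AgentRunner, never on framework imports."""
    prompt = f"Extract requirements from this PRD:\n\n{prd}"
    return await runner.run(prompt, {"session_id": session_id})


SEAM_RESULT = asyncio.run(
    extract_requirements(FakeRunner(), "5 V input, 3.3 V rail", "wf-123")
)
print(SEAM_RESULT)
```

With this shape, swapping frameworks means writing one new adapter class per agent rather than touching harness logic, which is consistent with the "Moderate" replacement cost in the table.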

Migration Path

Text replacements across existing documentation:

File	Old Text	New Text
`README.md`	`Custom orchestration layer (no agent framework)`	`Pydantic AI + Temporal ([ADR-001](docs/architecture/agent-orchestration-adr.md))`
`docs/architecture/index.md`	`subgraph "Agent Base (Custom Orchestration)"`	`subgraph "Agent Base (Pydantic AI + Temporal)"`
`docs/architecture/index.md`	`Custom Orchestration` blockquote (Section 2.3)	`Agent Orchestration (ADR-001)` blockquote referencing Pydantic AI + Temporal
`docs/architecture/index.md`	`Custom Orchestration` in Agent Runtime mermaid	`Pydantic AI Framework` with Temporal Activities and MCP connections
`docs/architecture/index.md`	`custom orchestration, no agent framework` in dependencies	`Pydantic AI framework + Temporal` with `pydantic-ai` and `temporalio`
`docs/architecture/mvp-roadmap.md`	`Custom orchestration + openai + @anthropic-ai/sdk`	`Pydantic AI + Temporal + openai + anthropic SDKs`
`docs/architecture/mvp-roadmap.md`	Agent Orchestration Decision blockquote	Updated text referencing Pydantic AI + Temporal + ADR-001
`docs/agents/index.md`	TypeScript `LLMProvider` + `Agent` interface	Python Pydantic AI base agent pattern
`docs/architecture/repository-structure.md`	`orchestrator/` table	Add note: agent execution uses Pydantic AI + Temporal
`docs/architecture/repository-structure.md`	`domain_agents/` agent.py description	Reference Pydantic AI agent definition
`docs/research/agent-framework-comparison-2026.md`	(end of document)	Add “Decision Recorded” section linking to ADR-001
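
The literal replacements in the table can be applied with a small script. This is a minimal sketch, assuming exact string matches; `apply_replacements` is a hypothetical helper, and rows that describe a change in words (for example the blockquote rewrites) still need manual editing. The demo runs against a temporary file rather than the real repository.

```python
import tempfile
from pathlib import Path


def apply_replacements(path: Path, replacements: list[tuple[str, str]]) -> list[str]:
    """Apply literal replacements in place; return the old strings that were not found."""
    text = path.read_text(encoding="utf-8")
    missing = []
    for old, new in replacements:
        if old in text:
            text = text.replace(old, new)
        else:
            missing.append(old)
    path.write_text(text, encoding="utf-8")
    return missing


# First row of the table above.
README_REPLACEMENT = (
    "Custom orchestration layer (no agent framework)",
    "Pydantic AI + Temporal ([ADR-001](docs/architecture/agent-orchestration-adr.md))",
)

with tempfile.TemporaryDirectory() as tmp:
    readme = Path(tmp) / "README.md"
    readme.write_text(
        "Stack: Custom orchestration layer (no agent framework)\n", encoding="utf-8"
    )
    MISSING = apply_replacements(readme, [README_REPLACEMENT, ("not present", "x")])
    UPDATED = readme.read_text(encoding="utf-8")

print(UPDATED, MISSING)
```

Returning the unmatched strings makes it easy to spot documentation that has already drifted from the "Old Text" column before the migration is declared complete.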

Consequences

Positive

Eliminates ~2K lines of custom orchestration plumbing — tool dispatch, MCP integration, structured output validation, and retry logic are handled by Pydantic AI + Temporal
Native crash recovery — Temporal provides deterministic replay for long-running hardware design workflows (hours/days)
Zero impedance mismatch — Pydantic AI speaks the same validation language as MetaForge’s domain models
MCP-native — FreeCAD, KiCad, SPICE, and Neo4j MCP servers plug in directly via MCPServerStdio/MCPServerHTTP
Model-agnostic — Claude, GPT-4o, Gemini, and local models supported without custom abstraction layers
Community momentum — Pydantic AI backed by the Pydantic team (ubiquitous in Python ecosystem)

Negative

Framework dependency — MetaForge now depends on pydantic-ai (MIT, actively maintained). Mitigated by “Build to Delete” architecture.
Team learning curve — Engineers must learn Pydantic AI’s Agent(), RunContext, @agent.tool patterns. Mitigated by strong documentation and Pydantic familiarity.
Framework maturity — Pydantic AI v1.0 shipped September 2025; younger than LangGraph or CrewAI. Mitigated by Pydantic team’s track record and rapid adoption.

Neutral

Agent lifecycle model maps cleanly — MetaForge’s existing Created → Loading → Executing → Validating → Completed lifecycle maps directly to Pydantic AI’s agent run semantics with Temporal activity wrapping
Skill pattern preserved — Skills remain pure, stateless capability modules invoked by agents. The framework change affects agent orchestration, not skill execution.
EVT/DVT/PVT gates — Gate workflow maps to Temporal signals with human-in-the-loop approval, matching the existing gate engine design.

Related Documents

Document	Relationship
System Architecture	Parent architecture document — updated to reference this ADR
MVP Roadmap	Technology stack tables updated to reflect Pydantic AI + Temporal
Agent System	Base agent interface updated from TypeScript/custom to Python/Pydantic AI
Repository Structure	Directory descriptions updated for Pydantic AI agent definitions
Agent Framework Comparison (2026)	Research that informed this decision — 14-framework evaluation
Orchestrator Technical	Orchestrator design — workflow coordination via Temporal
Digital Twin Evolution	Twin architecture unchanged; agents interact via MCP + Neo4j
Assistant Mode	Dual-mode operation — reuses Temporal activities for ingest pipeline and Temporal signals for IDE notifications
System Observability	OpenTelemetry integration — Pydantic AI supports OTel tracing natively

Document Version: v1.0 | Last Updated: 2026-02-28 | Status: Accepted
