ADR-001: Agent Orchestration & Harness Strategy
Adopt Pydantic AI as the agent framework layer with Temporal for durable execution
Table of Contents
- ADR Header
- Context
- Decision Drivers
- Options Considered
- Decision
- Architecture Diagrams
- Code Examples
- Build to Delete
- Migration Path
- Consequences
- Related Documents
ADR Header
| Field | Value |
|---|---|
| Status | Accepted |
| Date | 2026-02-28 |
| Decision | Adopt Pydantic AI (v1.0+, MIT) as the agent framework layer in a 4-layer architecture, with Temporal for durable execution and a custom hardware harness on top |
| Supersedes | “Custom orchestration, no agent framework” (original architecture decision) |
| Deciders | Architecture team |
| Categories | Agent orchestration, framework selection, durable execution |
Context
Why the Original Decision Was Correct
When MetaForge’s architecture was first designed, the decision to use custom orchestration with no agent framework was sound:
- LangChain instability — The dominant framework (LangChain/LangGraph) was notorious for rapid breaking changes, thick abstraction layers, and a heavy dependency tree. Production teams consistently reported fragile state management.
- Framework immaturity — No framework offered native Temporal integration, first-class MCP support, AND Pydantic-validated structured output simultaneously.
- Industry pattern — The most successful production agent systems (Cursor, Devin, Claude Code, OpenHands) all used custom orchestration with raw LLM SDKs.
What Changed (2025-2026)
Three developments shifted the cost-benefit analysis:
- Pydantic AI v1.0 (September 2025): The first framework to ship native Temporal integration AND native MCP support with Pydantic-validated structured output as its core strength. Lightweight, model-agnostic, MIT-licensed.
- MCP became the standard: Model Context Protocol became the de facto standard for tool integration across all major frameworks and LLM providers. Building custom tool dispatch plumbing is now redundant.
- The “Agent Harness” concept: Philipp Schmid’s influential January 2026 analysis (The importance of Agent Harness in 2026) clarified that production systems build domain-specific harnesses ON TOP of frameworks. The competitive advantage is the harness (hardware-domain knowledge), not reinventing tool dispatch plumbing.
Key insight: MetaForge’s competitive advantage is the hardware harness — hardware-domain prompt engineering, FreeCAD/KiCad/SPICE error recovery, design-rule context management, and EVT/DVT/PVT gate hooks. The agent framework is commodity infrastructure.
See: Agent Framework Comparison (2026) for the full 14-framework evaluation.
Decision Drivers
Six non-negotiable constraints drove the framework selection:
| # | Constraint | Rationale | Frameworks That Pass |
|---|---|---|---|
| 1 | Native Temporal integration | Long-running hardware workflows (hours/days) require crash recovery, replay, and approval gates | Pydantic AI, OpenAI Agents SDK |
| 2 | First-class MCP support | FreeCAD, KiCad, SPICE, Neo4j MCP servers must plug in directly without adapter layers | Pydantic AI, OpenAI Agents SDK, Google ADK, CrewAI, LangGraph |
| 3 | Pydantic structured output | MetaForge domain models (PCB specs, SPICE parameters, mechanical dimensions) are Pydantic models — zero impedance mismatch | Pydantic AI (core strength), others via add-on |
| 4 | Python ecosystem | Hardware tools (FreeCAD, KiCad IPC, OpenFOAM, CalculiX, ngspice) have Python-native APIs | All Python frameworks |
| 5 | Model-agnostic | Must support Claude, GPT-4o, Gemini, and local models without vendor lock-in | Pydantic AI, OpenAI Agents SDK, Google ADK |
| 6 | MIT license | Open-source platform requirement, no proprietary dependencies | Pydantic AI, OpenAI Agents SDK, LangGraph, CrewAI |
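Only Pydantic AI satisfies all six constraints natively. OpenAI Agents SDK comes closest, but it reaches Pydantic-validated output through a thinner validation layer rather than as a core design feature.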
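The screening in the table amounts to a set intersection over the per-constraint pass lists. The following is a minimal sketch of that check, not part of the MetaForge codebase. It assumes that constraint 3 counts only frameworks where Pydantic validation is native (so "others via add-on" is excluded) and that "All Python frameworks" means the five Python frameworks named in the table. The names `PASSES`, `frameworks_passing_all`, and `failed_constraints` are illustrative.

```python
# Pass lists transcribed from the Decision Drivers table.
# Assumption: "All Python frameworks" is read as the five frameworks named there.
PYTHON_FRAMEWORKS = {
    "Pydantic AI", "OpenAI Agents SDK", "Google ADK", "CrewAI", "LangGraph",
}

PASSES = {
    "native_temporal": {"Pydantic AI", "OpenAI Agents SDK"},
    "first_class_mcp": set(PYTHON_FRAMEWORKS),
    # Assumption: only native Pydantic validation counts, not add-ons.
    "pydantic_output": {"Pydantic AI"},
    "python_ecosystem": set(PYTHON_FRAMEWORKS),
    "model_agnostic": {"Pydantic AI", "OpenAI Agents SDK", "Google ADK"},
    "mit_license": {"Pydantic AI", "OpenAI Agents SDK", "LangGraph", "CrewAI"},
}


def frameworks_passing_all(passes):
    """Return the frameworks that appear in every constraint's pass list."""
    return sorted(set.intersection(*passes.values()))


def failed_constraints(framework, passes):
    """Return the constraints (in table order) that a framework does not pass."""
    return [name for name, members in passes.items() if framework not in members]


survivors = frameworks_passing_all(PASSES)
runner_up_gaps = failed_constraints("OpenAI Agents SDK", PASSES)
```

Under these assumptions the intersection leaves a single framework, and the runner-up misses only the structured-output constraint.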
Options Considered
Summary Table
| Framework | Native Temporal | Native MCP | Pydantic Output | Model-Agnostic | MIT | MetaForge Fit | Disposition |
|---|---|---|---|---|---|---|---|
| Pydantic AI | Yes | Yes | Core strength | Yes | Yes | Strong | Selected |
| OpenAI Agents SDK | Yes | Yes | Yes | Yes | Yes | Moderate-Strong | Runner-up |
| Google ADK | No | Yes | Yes | Yes | Apache 2.0 | Moderate | Honorable mention |
| LangGraph | No (own persistence) | Yes | Yes | Yes | Yes | Moderate | Rejected — own persistence conflicts with Temporal |
| CrewAI | No (own Flows) | Yes | Yes | Yes | Yes | Moderate | Rejected — role metaphor mismatch, no Temporal |
| MS Agent Framework | No | Yes | Yes | Azure-oriented | Yes | Low-Moderate | Rejected — RC/preview, Azure-oriented |
| Claude Agent SDK | No | Yes | Yes | Claude-only | Proprietary | Moderate | Rejected — proprietary license, Claude-only |
| OpenHands | No | Yes | Internal only | SWE-specific | MIT | Poor | Rejected — SWE harness, not a framework |
| Mastra | No | Yes | Zod (TS only) | Yes | Apache 2.0 | N/A | Rejected — TypeScript-only |
| Semantic Kernel | No | – | – | Azure-oriented | MIT | See MS AF | Rejected — merged into MS Agent Framework |
Selected: Pydantic AI
The only framework that pairs native Temporal integration and native MCP support with Pydantic-validated structured output as its core strength, which means zero impedance mismatch with MetaForge’s domain models. Lightweight, model-agnostic, MIT-licensed.
Runner-up: OpenAI Agents SDK
Temporal and MCP support, but thinner validation layer than Pydantic AI. Designed primarily around OpenAI’s model paradigm (Responses API) even though it supports other providers.
Honorable Mention: Google ADK
Clean architecture, good MCP support, model-agnostic. However, no native Temporal integration and optimized for Google Cloud/Vertex AI deployment.
Decision
4-Layer Architecture
flowchart TB
subgraph L4["L4: MetaForge Hardware Harness (CUSTOM)"]
direction LR
H1["Hardware Prompts"]
H2["FreeCAD/KiCad/SPICE<br/>Error Recovery"]
H3["Design-Rule Context"]
H4["EVT/DVT/PVT<br/>Gate Hooks"]
H5["Agent DAG Engine"]
end
subgraph L3["L3: Agent Framework (Pydantic AI)"]
direction LR
F1["Typed Agent<br/>Definitions"]
F2["MCP Connections"]
F3["Structured Output<br/>Validation"]
F4["Multi-Agent<br/>Delegation"]
F5["Human-in-the-Loop<br/>Approval"]
end
subgraph L2["L2: Durable Execution (Temporal)"]
direction LR
T1["Long-Running<br/>Workflows"]
T2["Crash Recovery<br/>& Replay"]
T3["Approval Gates"]
T4["Activity Retries"]
end
subgraph L1["L1: LLM Providers (model-agnostic)"]
direction LR
M1["Claude"]
M2["GPT-4o"]
M3["Gemini"]
M4["Local Models"]
end
L4 --> L3
L3 --> L2
L2 --> L1
style L4 fill:#E67E22,color:#fff,stroke:#E67E22,stroke-width:2px
style L3 fill:#9b59b6,color:#fff,stroke:#9b59b6,stroke-width:2px
style L2 fill:#3498db,color:#fff,stroke:#3498db,stroke-width:2px
style L1 fill:#2C3E50,color:#fff,stroke:#2C3E50,stroke-width:2px
Layer responsibilities:
| Layer | Responsibility | Build vs. Buy |
|---|---|---|
| L4: Hardware Harness | Domain-specific intelligence — hardware prompt engineering, tool error recovery, design-rule context injection, EVT/DVT/PVT lifecycle hooks, agent DAG orchestration | Custom (competitive advantage) |
| L3: Agent Framework | Agent definitions with typed dependencies and outputs, MCP server connections, structured output validation, multi-agent delegation | Pydantic AI (commodity) |
| L2: Durable Execution | Long-running workflow coordination, crash recovery, deterministic replay, human-in-the-loop approval gates, activity-level retries | Temporal (commodity) |
| L1: LLM Providers | Model-agnostic LLM access via standard SDKs | openai + anthropic SDKs (commodity) |
Architecture Diagrams
Single Agent Execution Flow
sequenceDiagram
participant GW as Gateway<br/>(FastAPI)
participant H as Hardware<br/>Harness (L4)
participant A as Pydantic AI<br/>Agent (L3)
participant MCP as MCP Server
participant T as Temporal<br/>Activity (L2)
participant LLM as LLM Provider<br/>(L1)
GW->>T: Start workflow
T->>H: Execute agent activity
H->>H: Inject hardware context<br/>(design rules, constraints)
H->>A: Run agent with deps
A->>LLM: Prompt with tools
LLM-->>A: Tool call request
A->>MCP: Execute tool<br/>(KiCad ERC, SPICE sim)
MCP-->>A: Tool result
A->>LLM: Continue with result
LLM-->>A: Structured output
A-->>H: Validated Pydantic model
H->>H: Hardware-specific<br/>post-processing
H-->>T: Activity result
T-->>GW: Workflow complete
Multi-Agent DAG Workflow
flowchart LR
subgraph Temporal["Temporal Workflow"]
direction LR
REQ["REQ Agent<br/>Requirements"]
SYS["SYS Agent<br/>Systems"]
EE["EE Agent<br/>Electronics"]
FW["FW Agent<br/>Firmware"]
BOM["BOM Agent<br/>Supply Chain"]
MFG["MFG Agent<br/>Manufacturing"]
end
REQ --> SYS
SYS --> EE
SYS --> FW
EE --> BOM
FW --> BOM
BOM --> MFG
GATE{"EVT Gate<br/>Approval"}
MFG --> GATE
style REQ fill:#9b59b6,color:#fff
style SYS fill:#9b59b6,color:#fff
style EE fill:#9b59b6,color:#fff
style FW fill:#9b59b6,color:#fff
style BOM fill:#9b59b6,color:#fff
style MFG fill:#9b59b6,color:#fff
style GATE fill:#E67E22,color:#fff
The Temporal workflow coordinates agent activities in a DAG. Each agent runs as a Temporal activity with crash recovery and retry semantics. The EVT/DVT/PVT gate is a wait inside the workflow that a Temporal signal releases once a human approves.
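The dependency edges in the diagram determine which agents can run concurrently. As a rough illustration, the sketch below derives the execution stages from those edges using the standard-library `graphlib` module (Python 3.9+). The names `AGENT_DEPENDENCIES` and `execution_stages` are hypothetical, and this is plain Python for reasoning about the DAG, not Temporal code.

```python
from graphlib import TopologicalSorter

# Each agent maps to the set of agents it depends on (edges from the diagram).
AGENT_DEPENDENCIES = {
    "REQ": set(),
    "SYS": {"REQ"},
    "EE": {"SYS"},
    "FW": {"SYS"},
    "BOM": {"EE", "FW"},
    "MFG": {"BOM"},
}


def execution_stages(dependencies):
    """Group agents into stages; agents in the same stage do not depend on each other."""
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError if the graph contains a cycle
    stages = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # every agent whose dependencies are done
        stages.append(ready)
        sorter.done(*ready)
    return stages


stages = execution_stages(AGENT_DEPENDENCIES)
```

The EE and FW agents land in the same stage, which is why they are the natural candidates for parallel execution.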
MCP Tool Integration
flowchart LR
subgraph Agent["Pydantic AI Agent"]
AC["Agent Core<br/>+ MCP Client"]
end
subgraph MCP_Servers["MCP Servers"]
K["KiCad MCP<br/>Server"]
S["SPICE MCP<br/>Server"]
F["FreeCAD MCP<br/>Server"]
N["Neo4j MCP<br/>Server"]
end
subgraph Tools["External Tools"]
KT["KiCad"]
ST["ngspice"]
FT["FreeCAD"]
NT["Neo4j"]
end
AC --> K
AC --> S
AC --> F
AC --> N
K --> KT
S --> ST
F --> FT
N --> NT
style Agent fill:#9b59b6,color:#fff
style MCP_Servers fill:#3498db,color:#fff
style Tools fill:#27ae60,color:#fff
MCP servers wrap each external tool and expose typed tool schemas. Pydantic AI agents connect to MCP servers via MCPServerStdio or MCPServerHTTP, receiving tools as typed function calls with validated inputs and outputs.
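To make "typed tool schemas" concrete: an MCP server advertises each tool with a name, a description, and a JSON Schema for its arguments (`inputSchema`). The sketch below shows a hypothetical descriptor for a KiCad ERC tool and a deliberately tiny argument check. The tool name `run_erc`, its parameters, and the helper `argument_errors` are assumptions for illustration, and the check covers only `required` plus string/boolean types; it is not a full JSON Schema validator.

```python
# Hypothetical tool descriptor in the shape an MCP server advertises.
RUN_ERC_TOOL = {
    "name": "run_erc",  # hypothetical tool name
    "description": "Run the electrical rules check on a schematic.",
    "inputSchema": {
        "type": "object",
        "properties": {
            "schematic_path": {"type": "string"},
            "fail_on_warning": {"type": "boolean"},
        },
        "required": ["schematic_path"],
    },
}

# Only the JSON types used by the descriptor above are mapped.
_JSON_TYPES = {"string": str, "boolean": bool}


def argument_errors(tool, arguments):
    """Return a list of problems with the arguments; an empty list means they are accepted."""
    schema = tool["inputSchema"]
    errors = [
        f"missing required argument: {name}"
        for name in schema.get("required", [])
        if name not in arguments
    ]
    for name, value in arguments.items():
        spec = schema["properties"].get(name)
        if spec is None:
            errors.append(f"unknown argument: {name}")
        elif not isinstance(value, _JSON_TYPES[spec["type"]]):
            errors.append(
                f"{name}: expected {spec['type']}, got {type(value).__name__}"
            )
    return errors
```

In practice the framework and the MCP SDK perform this validation; the sketch only shows what information the schema carries.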
Code Examples
Requirements Agent Definition (Pydantic AI)
from __future__ import annotations

from dataclasses import dataclass

from pydantic import BaseModel, Field
from pydantic_ai import Agent, RunContext
from pydantic_ai.mcp import MCPServerStdio


# --- Dependencies (injected at runtime) ---
@dataclass
class RequirementsAgentDeps:
    """Typed dependencies for the Requirements Agent."""

    project_path: str
    design_rules: dict  # maps constraint name -> {'min': ..., 'max': ...}
    session_id: str


# --- Structured Output ---
class Constraint(BaseModel):
    name: str
    value: float
    unit: str
    min_value: float | None = None
    max_value: float | None = None


class RequirementsOutput(BaseModel):
    """Validated output from the Requirements Agent."""

    electrical: list[Constraint] = Field(description="Electrical constraints")
    mechanical: list[Constraint] = Field(description="Mechanical constraints")
    environmental: list[Constraint] = Field(description="Environmental constraints")
    cost: list[Constraint] = Field(description="Cost constraints")
    assumptions: list[str] = Field(description="Assumptions made during extraction")
    confidence: float = Field(ge=0.0, le=1.0, description="Confidence score")


# --- MCP Server Connections ---
neo4j_mcp = MCPServerStdio('npx', args=['-y', '@metaforge/neo4j-mcp-server'])
# Defined for completeness; not attached to the Requirements Agent below.
kicad_mcp = MCPServerStdio('python', args=['-m', 'metaforge.mcp.kicad_server'])

# --- Agent Definition ---
requirements_agent = Agent(
    model='anthropic:claude-sonnet-4-20250514',
    deps_type=RequirementsAgentDeps,
    output_type=RequirementsOutput,
    # MCP servers are passed as toolsets; the older `mcp_servers` keyword is deprecated.
    toolsets=[neo4j_mcp],
    system_prompt=(
        'You are the MetaForge Requirements Agent. '
        'Extract structured engineering constraints from a PRD. '
        'Output must include electrical, mechanical, environmental, '
        'and cost constraints with units and ranges.'
    ),
)


# --- Custom Tool with Dependency Injection ---
@requirements_agent.tool
async def check_design_rules(
    ctx: RunContext[RequirementsAgentDeps],
    constraint_name: str,
    proposed_value: float,
) -> str:
    """Check a proposed constraint against known design rules."""
    rules = ctx.deps.design_rules
    if constraint_name in rules:
        rule = rules[constraint_name]
        if rule['min'] <= proposed_value <= rule['max']:
            return f"PASS: {constraint_name}={proposed_value} within [{rule['min']}, {rule['max']}]"
        return f"FAIL: {constraint_name}={proposed_value} outside [{rule['min']}, {rule['max']}]"
    return f"NO_RULE: No design rule found for {constraint_name}"
Temporal Activity Wrapper
from temporalio import activity

from metaforge.agents.requirements import (
    requirements_agent,
    RequirementsAgentDeps,
    RequirementsOutput,
)

# Note: load_design_rules is referenced below but not defined in this ADR.
# It is assumed to be a MetaForge helper that returns the design-rule dict
# for a project path.


@activity.defn
async def run_requirements_agent(
    prd_content: str,
    project_path: str,
    session_id: str,
) -> dict:
    """Temporal activity that runs the Requirements Agent."""
    deps = RequirementsAgentDeps(
        project_path=project_path,
        design_rules=load_design_rules(project_path),
        session_id=session_id,
    )
    result = await requirements_agent.run(
        f"Extract requirements from this PRD:\n\n{prd_content}",
        deps=deps,
    )
    # result.output is a validated RequirementsOutput instance
    output: RequirementsOutput = result.output
    # Return a plain dict so the activity result serializes cleanly
    return output.model_dump()
Temporal Workflow Coordinating Multiple Agents
import asyncio
from datetime import timedelta

from temporalio import workflow

with workflow.unsafe.imports_passed_through():
    from metaforge.activities import (
        run_requirements_agent,
        run_systems_agent,
        run_electronics_agent,
        run_firmware_agent,
        run_bom_agent,
        run_manufacturing_agent,
    )


@workflow.defn
class HardwareDesignWorkflow:
    """Temporal workflow that orchestrates the full agent DAG."""

    def __init__(self) -> None:
        # Set by the approve_gate signal; read by wait_condition in run().
        self._gate_approved = False

    @workflow.run
    async def run(self, prd_content: str, project_path: str) -> dict:
        session_id = workflow.info().workflow_id

        # Stage 1: Requirements extraction
        requirements = await workflow.execute_activity(
            run_requirements_agent,
            args=[prd_content, project_path, session_id],
            start_to_close_timeout=timedelta(minutes=5),
        )

        # Stage 2: Systems architecture (depends on requirements)
        architecture = await workflow.execute_activity(
            run_systems_agent,
            args=[requirements, project_path, session_id],
            start_to_close_timeout=timedelta(minutes=10),
        )

        # Stage 3: Electronics + Firmware (parallel, both depend on architecture)
        electronics, firmware = await asyncio.gather(
            workflow.execute_activity(
                run_electronics_agent,
                args=[architecture, project_path, session_id],
                start_to_close_timeout=timedelta(minutes=10),
            ),
            workflow.execute_activity(
                run_firmware_agent,
                args=[architecture, project_path, session_id],
                start_to_close_timeout=timedelta(minutes=10),
            ),
        )

        # Stage 4: BOM analysis (depends on electronics + firmware)
        bom = await workflow.execute_activity(
            run_bom_agent,
            args=[electronics, firmware, project_path, session_id],
            start_to_close_timeout=timedelta(minutes=5),
        )

        # Stage 5: Manufacturing prep (depends on BOM)
        manufacturing = await workflow.execute_activity(
            run_manufacturing_agent,
            args=[bom, project_path, session_id],
            start_to_close_timeout=timedelta(minutes=5),
        )

        # EVT Gate: Wait for human approval
        await workflow.wait_condition(lambda: self._gate_approved)

        return {
            'requirements': requirements,
            'architecture': architecture,
            'electronics': electronics,
            'firmware': firmware,
            'bom': bom,
            'manufacturing': manufacturing,
        }

    @workflow.signal
    async def approve_gate(self) -> None:
        self._gate_approved = True
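The gate at the end of the workflow above follows a signal-plus-wait pattern: the workflow parks on a condition, and an external signal flips the state it is waiting on. The sketch below reproduces only that control flow in plain `asyncio` so it can be run without a Temporal server. `GateAnalogue` and `demo` are hypothetical names, and unlike Temporal this analogue is not durable: its state is lost if the process exits.

```python
import asyncio


class GateAnalogue:
    """Plain-asyncio stand-in for the signal-plus-wait pattern; not Temporal code."""

    def __init__(self):
        self._approved = asyncio.Event()
        self.log = []

    def approve(self):
        # Plays the role of the approve_gate signal handler.
        self._approved.set()

    async def run(self):
        self.log.append("agents finished")
        # Plays the role of workflow.wait_condition(...): park until approved.
        await self._approved.wait()
        self.log.append("gate approved")
        return self.log


async def demo():
    gate = GateAnalogue()
    task = asyncio.create_task(gate.run())
    await asyncio.sleep(0)  # yield once so run() reaches the wait
    was_blocked = not task.done()
    gate.approve()
    return was_blocked, await task


was_blocked, log = asyncio.run(demo())
```

The run stays parked until `approve()` is called, which mirrors how the workflow result is withheld until a human sends the approval signal.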
Build to Delete
Per Philipp Schmid’s principle: design so the Pydantic AI layer is replaceable.
| Component | Framework Coupling | Replacement Cost | Notes |
|---|---|---|---|
| MCP servers (KiCad, SPICE, FreeCAD, Neo4j) | None — framework-agnostic | Zero | MCP is a protocol, not a framework feature |
| Temporal workflows | None — framework-agnostic | Zero | Workflows call activities; activities wrap agents |
| Pydantic domain models | None — used by framework but owned by MetaForge | Zero | RequirementsOutput, BOMEntry, etc. are plain Pydantic models |
| Hardware harness logic | None — sits above the framework | Zero | Prompt templates, error recovery, gate hooks are custom code |
| Agent definitions | Coupled: Agent(), @agent.tool, RunContext | Moderate | ~20 agent definitions to rewrite |
| Tool decorators | Coupled: @agent.tool with dependency injection | Moderate | ~50-100 tool functions to re-register |
| MCP client wrappers | Coupled: MCPServerStdio, MCPServerHTTP | Low | Thin wrappers, standard protocol underneath |
Risk assessment: If Pydantic AI is abandoned or superseded, the replacement surface is limited to agent definitions and tool decorators (~2-3 weeks of refactoring). All domain logic, MCP servers, Temporal workflows, and Pydantic models survive intact.
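One way to keep the coupled surface this small is to let harness code depend on a narrow interface rather than on framework types directly. The sketch below is an assumption about how such a seam could look, not MetaForge's actual design. `AgentRunner`, `HarnessStep`, and `EchoRunner` are hypothetical names, and `EchoRunner` stands in for a framework-backed runner so the example runs without Pydantic AI installed.

```python
import asyncio
from dataclasses import dataclass
from typing import Any, Protocol


class AgentRunner(Protocol):
    """Hypothetical seam between the harness (L4) and the agent framework (L3)."""

    async def run(self, prompt: str, deps: Any) -> dict: ...


@dataclass
class HarnessStep:
    """Harness-side logic that only knows about the AgentRunner interface."""

    runner: AgentRunner
    context_prefix: str

    async def execute(self, task: str, deps: Any) -> dict:
        # Inject hardware context ahead of the task, then delegate to the runner.
        prompt = f"{self.context_prefix}\n\n{task}"
        return await self.runner.run(prompt, deps)


class EchoRunner:
    """Stand-in runner; a real one would wrap a framework agent behind run()."""

    async def run(self, prompt: str, deps: Any) -> dict:
        return {"prompt": prompt, "deps": deps}


step = HarnessStep(runner=EchoRunner(), context_prefix="Design rules: max 5 V")
result = asyncio.run(step.execute("Extract requirements", deps={"session": "s1"}))
```

Replacing the framework would then mean writing a new class that satisfies `AgentRunner`, while `HarnessStep` and the code around it stay unchanged.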
Migration Path
Text replacements across existing documentation:
| File | Old Text | New Text |
|---|---|---|
| README.md | Custom orchestration layer (no agent framework) | Pydantic AI + Temporal ([ADR-001](docs/architecture/agent-orchestration-adr.md)) |
| docs/architecture/index.md | subgraph "Agent Base (Custom Orchestration)" | subgraph "Agent Base (Pydantic AI + Temporal)" |
| docs/architecture/index.md | Custom Orchestration blockquote (Section 2.3) | Agent Orchestration (ADR-001) blockquote referencing Pydantic AI + Temporal |
| docs/architecture/index.md | Custom Orchestration in Agent Runtime mermaid | Pydantic AI Framework with Temporal Activities and MCP connections |
| docs/architecture/index.md | custom orchestration, no agent framework in dependencies | Pydantic AI framework + Temporal with pydantic-ai and temporalio |
| docs/architecture/mvp-roadmap.md | Custom orchestration + openai + @anthropic-ai/sdk | Pydantic AI + Temporal + openai + anthropic SDKs |
| docs/architecture/mvp-roadmap.md | Agent Orchestration Decision blockquote | Updated text referencing Pydantic AI + Temporal + ADR-001 |
| docs/agents/index.md | TypeScript LLMProvider + Agent interface | Python Pydantic AI base agent pattern |
| docs/architecture/repository-structure.md | orchestrator/ table | Add note: agent execution uses Pydantic AI + Temporal |
| docs/architecture/repository-structure.md | domain_agents/ agent.py description | Reference Pydantic AI agent definition |
| docs/research/agent-framework-comparison-2026.md | (end of document) | Add “Decision Recorded” section linking to ADR-001 |
Consequences
Positive
- Eliminates ~2K lines of custom orchestration plumbing — tool dispatch, MCP integration, structured output validation, and retry logic are handled by Pydantic AI + Temporal
- Native crash recovery — Temporal provides deterministic replay for long-running hardware design workflows (hours/days)
- Zero impedance mismatch — Pydantic AI speaks the same validation language as MetaForge’s domain models
- MCP-native: FreeCAD, KiCad, SPICE, and Neo4j MCP servers plug in directly via MCPServerStdio/MCPServerHTTP
- Model-agnostic: Claude, GPT-4o, Gemini, and local models supported without custom abstraction layers
- Community momentum — Pydantic AI backed by the Pydantic team (ubiquitous in Python ecosystem)
Negative
- Framework dependency: MetaForge now depends on pydantic-ai (MIT, actively maintained). Mitigated by “Build to Delete” architecture.
- Team learning curve: Engineers must learn Pydantic AI’s Agent(), RunContext, and @agent.tool patterns. Mitigated by strong documentation and Pydantic familiarity.
- Framework maturity: Pydantic AI v1.0 shipped September 2025; younger than LangGraph or CrewAI. Mitigated by Pydantic team’s track record and rapid adoption.
Neutral
- Agent lifecycle model maps cleanly — MetaForge’s existing Created → Loading → Executing → Validating → Completed lifecycle maps directly to Pydantic AI’s agent run semantics with Temporal activity wrapping
- Skill pattern preserved — Skills remain pure, stateless capability modules invoked by agents. The framework change affects agent orchestration, not skill execution.
- EVT/DVT/PVT gates — Gate workflow maps to Temporal signals with human-in-the-loop approval, matching the existing gate engine design.
Related Documents
| Document | Relationship |
|---|---|
| System Architecture | Parent architecture document — updated to reference this ADR |
| MVP Roadmap | Technology stack tables updated to reflect Pydantic AI + Temporal |
| Agent System | Base agent interface updated from TypeScript/custom to Python/Pydantic AI |
| Repository Structure | Directory descriptions updated for Pydantic AI agent definitions |
| Agent Framework Comparison (2026) | Research that informed this decision — 14-framework evaluation |
| Orchestrator Technical | Orchestrator design — workflow coordination via Temporal |
| Digital Twin Evolution | Twin architecture unchanged; agents interact via MCP + Neo4j |
| Assistant Mode | Dual-mode operation — reuses Temporal activities for ingest pipeline and Temporal signals for IDE notifications |
| System Observability | OpenTelemetry integration — Pydantic AI supports OTel tracing natively |
Document Version: v1.0 Last Updated: 2026-02-28 Status: Accepted