OpenAI Agents SDK: Architecture Patterns 2026
Why the OpenAI Agents SDK
The OpenAI Agents SDK is the spiritual successor to the Assistants API — a production-grade framework for building AI agents that goes well beyond raw API calls. It solves real pain points: tool definition is Pythonic, multi-agent handoffs are first-class, guardrails are declarative, and tracing is built in.
The Assistants API is being deprecated on August 26, 2026. If you're building on Assistants, migration to the Agents SDK (or the Responses API) is the path forward. If you're building new agent applications, the Agents SDK is the cleaner starting point.
TL;DR
The OpenAI Agents SDK provides four core primitives — Agent, Runner, Tool, and Handoff — that compose into everything from simple chatbots to complex multi-agent pipelines. The framework's design philosophy favors explicit composition over magic, and the generation of tool schemas from Python type hints is genuinely excellent. For production use, the built-in tracing and guardrails save weeks of boilerplate.
Key Takeaways
- Package: `openai-agents` (Python); `@openai/agents` (TypeScript)
- Core primitives: `Agent`, `Runner`, `function_tool` decorator, `handoff()`
- Multi-agent: Orchestrator/subagent pattern; agents hand off to each other via the `handoffs=[]` parameter
- Guardrails: `InputGuardrail` + `OutputGuardrail` for input validation and output checking; run in parallel with agent execution
- Tracing: Built-in, sends to OpenAI dashboard; exportable to custom backends
- Replacing: OpenAI Assistants API (deprecated August 26, 2026)
- Key difference from LangChain: Less magic, more explicit — tool schemas generated from Python type hints; no hidden state
Installation and Basic Setup
pip install openai-agents # current: v0.12.x (Python 3.10+)
The SDK is still in 0.x (active weekly releases since March 2025). The API is stable for the core primitives but expect evolution in areas like sessions and MCP integration.
import asyncio
from agents import Agent, Runner
# Minimal agent
agent = Agent(
name="Assistant",
instructions="You are a helpful assistant. Be concise.",
model="gpt-4.1",
)
async def main():
result = await Runner.run(
agent,
input="What is the capital of France?"
)
print(result.final_output)
asyncio.run(main())
# Output: "Paris"
The SDK is async-first. Runner.run() is the primary entry point — it handles the agentic loop (running the agent, executing tools, handling handoffs) until a final result is produced.
Defining Tools
Tools are the mechanism for agents to interact with the world. The SDK generates JSON schemas from Python function type hints — you don't write schema definitions manually.
from agents import Agent, Runner, function_tool
import httpx
@function_tool
def get_weather(city: str) -> str:
"""Get the current weather for a city.
Args:
city: The name of the city to get weather for.
"""
# Real implementation would call a weather API
response = httpx.get(f"https://api.weather.example.com/current?city={city}")
data = response.json()
return f"{data['temp']}°F, {data['condition']} in {city}"
@function_tool
def search_web(query: str, num_results: int = 5) -> list[str]:
"""Search the web and return relevant URLs.
Args:
query: The search query.
num_results: Number of results to return (default 5).
"""
# Implementation
return ["https://example.com/result1", "https://example.com/result2"]
agent = Agent(
name="Research Assistant",
instructions="Research topics thoroughly using web search and provide accurate answers.",
model="gpt-4.1",
tools=[get_weather, search_web],
)
The docstring serves dual purpose: it documents the function for developers AND populates the tool description that the model sees. Write good docstrings — they directly affect how reliably the model uses your tools.
Hosted Tools
The SDK also includes OpenAI-hosted tools that run on OpenAI's infrastructure:
from agents import Agent
from agents.tools import WebSearchTool, CodeInterpreterTool, FileSearchTool
agent = Agent(
name="Advanced Research Agent",
instructions="Use web search and code execution to research and analyze data.",
model="gpt-4.1",
tools=[
        WebSearchTool(),  # Hosted web search run by OpenAI
CodeInterpreterTool(), # Python execution sandbox
FileSearchTool( # Semantic search over uploaded files
vector_store_ids=["vs_abc123"]
),
],
)
Hosted tools handle their own infrastructure — you don't manage the web search implementation or code execution sandbox.
Context: Sharing State Between Tool Calls
The RunContextWrapper passes shared state across tool calls in a single agent run without going through the model:
from agents import Agent, Runner, RunContextWrapper, function_tool
from dataclasses import dataclass, field
import asyncio

@dataclass
class UserContext:
    user_id: str
    subscription_tier: str
    # Mutable defaults on dataclasses need default_factory, not a shared literal
    cached_data: dict = field(default_factory=dict)
@function_tool
def get_user_preferences(ctx: RunContextWrapper[UserContext]) -> str:
"""Get the current user's notification preferences."""
user_id = ctx.context.user_id
# Access user context without passing it through the LLM
return f"User {user_id} prefers email notifications, dark mode enabled"
@function_tool
def update_setting(ctx: RunContextWrapper[UserContext], setting: str, value: str) -> str:
"""Update a user setting.
Args:
setting: The setting name to update.
value: The new value for the setting.
"""
user_id = ctx.context.user_id
ctx.context.cached_data[setting] = value
return f"Updated {setting} to {value} for user {user_id}"
agent = Agent(
name="Settings Agent",
instructions="Help users manage their account settings.",
model="gpt-4.1",
tools=[get_user_preferences, update_setting],
)
async def main():
context = UserContext(user_id="user-123", subscription_tier="pro")
result = await Runner.run(
agent,
input="Turn on dark mode for me",
context=context,
)
print(result.final_output)
print(f"Cached data: {context.cached_data}")
asyncio.run(main())
The context type parameter (RunContextWrapper[UserContext]) is a generic — it enforces type safety across all tool functions that access context.
Guardrails
Guardrails run in parallel with agent execution and can block or modify the agent's behavior:
from agents import Agent, Runner, InputGuardrail, OutputGuardrail, GuardrailFunctionOutput
from agents import RunContextWrapper
from pydantic import BaseModel
class SafetyCheck(BaseModel):
is_safe: bool
reason: str
# Input guardrail — check before agent runs
guardrail_agent = Agent(
name="Safety Checker",
instructions="Determine if user input is safe for a customer service bot to respond to.",
model="gpt-4o-mini", # Use a smaller/cheaper model for guardrails
output_type=SafetyCheck,
)
async def safety_guardrail(ctx: RunContextWrapper, agent: Agent, input: str) -> GuardrailFunctionOutput:
result = await Runner.run(guardrail_agent, input=input, context=ctx.context)
safety = result.final_output
return GuardrailFunctionOutput(
output_info=safety,
tripwire_triggered=not safety.is_safe, # Block if not safe
)
# Output guardrail — check before returning to user
class ResponseCheck(BaseModel):
contains_pii: bool
pii_types: list[str]
output_checker = Agent(
name="PII Checker",
instructions="Check if a response contains PII (names, emails, phone numbers, addresses).",
model="gpt-4o-mini",
output_type=ResponseCheck,
)
async def pii_guardrail(ctx: RunContextWrapper, agent: Agent, output: str) -> GuardrailFunctionOutput:
result = await Runner.run(output_checker, input=f"Response to check: {output}", context=ctx.context)
check = result.final_output
return GuardrailFunctionOutput(
output_info=check,
tripwire_triggered=check.contains_pii,
)
# Main agent with guardrails
customer_service_agent = Agent(
name="Customer Service",
instructions="Help customers with orders, returns, and account questions.",
model="gpt-4.1",
input_guardrails=[
InputGuardrail(guardrail_function=safety_guardrail),
],
output_guardrails=[
OutputGuardrail(guardrail_function=pii_guardrail),
],
)
Guardrails use a separate (typically smaller) model and run concurrently with the main agent.
Guardrails scope in multi-agent chains: input guardrails run only for the first agent in a handoff chain, and output guardrails run only for the last agent (the one that produces the final response). If you need guardrails on intermediate agents, attach them directly to each agent in the chain.
If a guardrail trips, the run stops and an exception is raised; catch it in your application:
from agents.exceptions import InputGuardrailTripwireTriggered, OutputGuardrailTripwireTriggered
try:
result = await Runner.run(customer_service_agent, input=user_message)
return result.final_output
except InputGuardrailTripwireTriggered as e:
return "I can't help with that request."
except OutputGuardrailTripwireTriggered as e:
return "I encountered an issue generating a safe response."
Multi-Agent Patterns
The SDK's most powerful capability is composing multiple agents. Three patterns cover most systems: handoffs (agents transfer control), orchestrators (an agent calls subagents as tools), and parallel fan-out with plain asyncio.
Pattern 1: Handoffs (Specialization)
from agents import Agent, Runner, handoff
import asyncio
# Specialist agents
billing_agent = Agent(
name="Billing Specialist",
instructions="""You handle billing questions, payment issues, and subscription management.
You have access to billing systems and can process refunds up to $100.""",
model="gpt-4.1",
)
technical_agent = Agent(
name="Technical Support",
instructions="""You handle technical issues, bugs, and integration problems.
You have access to engineering documentation and can create support tickets.""",
model="gpt-4.1",
)
# Triage agent that routes to specialists
triage_agent = Agent(
name="Support Triage",
instructions="""You are a triage agent for customer support.
Route billing questions to the Billing Specialist.
Route technical questions to Technical Support.
Handle simple FAQ questions directly.""",
model="gpt-4o-mini", # Cheaper model for routing
handoffs=[billing_agent, technical_agent],
)
async def main():
result = await Runner.run(
triage_agent,
input="I was charged twice for my subscription last month"
)
print(result.final_output)
# Routes to billing_agent, which handles the response
asyncio.run(main())
In a handoff, the triage agent decides to transfer control to a specialist agent. The specialist runs with the full conversation history and produces the final response.
Pattern 2: Orchestrator + Subagents
from agents import Agent, Runner, function_tool
from typing import Any
import asyncio
# Subagents defined as tools for the orchestrator
@function_tool
async def research_topic(topic: str) -> str:
"""Research a topic thoroughly and return a summary.
Args:
topic: The topic to research.
"""
researcher = Agent(
name="Researcher",
instructions="Research the given topic and provide a thorough, factual summary.",
model="gpt-4.1",
)
result = await Runner.run(researcher, input=f"Research this topic: {topic}")
return result.final_output
@function_tool
async def write_section(content_brief: str, tone: str = "professional") -> str:
"""Write a content section based on a brief.
Args:
content_brief: Description of what to write.
tone: Writing tone (professional, casual, technical).
"""
writer = Agent(
name="Writer",
instructions=f"Write engaging {tone} content based on the provided brief.",
model="gpt-4.1",
)
result = await Runner.run(writer, input=content_brief)
return result.final_output
# Orchestrator manages the workflow
orchestrator = Agent(
name="Content Orchestrator",
instructions="""Create a comprehensive blog post by:
1. Researching the topic
2. Writing an introduction
3. Writing 3 main sections based on research
4. Writing a conclusion
Combine all sections into a complete post.""",
model="gpt-4.1",
tools=[research_topic, write_section],
)
async def main():
result = await Runner.run(
orchestrator,
input="Write a blog post about the environmental impact of data centers"
)
print(result.final_output)
asyncio.run(main())
The orchestrator pattern gives the top-level agent control over the workflow — it decides when to delegate, what to ask each subagent, and how to synthesize results.
Pattern 3: Parallel Agent Execution
from agents import Runner, Agent
import asyncio
analysis_agents = [
Agent(name="Technical Analyst", instructions="Analyze the technical aspects.", model="gpt-4.1"),
Agent(name="Business Analyst", instructions="Analyze the business implications.", model="gpt-4.1"),
Agent(name="Risk Analyst", instructions="Identify risks and mitigation strategies.", model="gpt-4.1"),
]
async def parallel_analysis(document: str) -> dict:
"""Run multiple analysis agents in parallel."""
tasks = [
Runner.run(agent, input=f"Analyze this document:\n\n{document}")
for agent in analysis_agents
]
results = await asyncio.gather(*tasks)
return {
"technical": results[0].final_output,
"business": results[1].final_output,
"risk": results[2].final_output,
}
Parallel execution is just asyncio.gather — no special SDK primitive needed. This pattern is effective for independent analyses that can be synthesized afterward.
Structured Output
Force agents to return structured data with the output_type parameter:
from agents import Agent, Runner
from pydantic import BaseModel
from typing import Optional
import asyncio
class ExtractedInvoice(BaseModel):
vendor_name: str
invoice_number: str
total_amount: float
currency: str
due_date: Optional[str]
line_items: list[dict]
extraction_agent = Agent(
name="Invoice Extractor",
instructions="Extract structured data from invoice text. Be precise with amounts and dates.",
model="gpt-4.1",
output_type=ExtractedInvoice,
)
async def extract_invoice(invoice_text: str) -> ExtractedInvoice:
result = await Runner.run(extraction_agent, input=invoice_text)
return result.final_output # Already typed as ExtractedInvoice
async def main():
invoice_text = """
INVOICE #INV-2026-0042
Vendor: Acme Corp
Due: 2026-04-15
Web hosting (1 month) $299.00
SSL Certificate $49.00
Total: $348.00 USD
"""
invoice = await extract_invoice(invoice_text)
print(f"Vendor: {invoice.vendor_name}")
print(f"Total: ${invoice.total_amount} {invoice.currency}")
asyncio.run(main())
The SDK uses Pydantic for output validation. The model is instructed to return JSON matching the schema, and the SDK validates and parses it automatically.
Streaming Responses
from agents import Runner, Agent
import asyncio
agent = Agent(
name="Streaming Agent",
instructions="Provide detailed, helpful responses.",
model="gpt-4.1",
)
from openai.types.responses import ResponseTextDeltaEvent

async def stream_response(user_input: str) -> str:
    # run_streamed returns a RunResultStreaming immediately; it is not awaited
    result = Runner.run_streamed(agent, input=user_input)
    async for event in result.stream_events():
        # Raw response events carry token-level deltas from the model
        if event.type == "raw_response_event" and isinstance(event.data, ResponseTextDeltaEvent):
            print(event.data.delta, end="", flush=True)
    print()  # Newline after streaming completes
    return result.final_output
Tracing and Observability
Every agent run automatically generates a trace in the OpenAI dashboard. You can also export traces to custom backends:
from agents.tracing import set_trace_processors, TracingProcessor
class CustomTracer(TracingProcessor):
    # TracingProcessor is abstract: all six methods below must be implemented
    def on_trace_start(self, trace):
        print(f"Trace started: {trace.trace_id}")

    def on_trace_end(self, trace):
        print(f"Trace complete: {trace.trace_id}")

    def on_span_start(self, span):
        pass

    def on_span_end(self, span):
        # span_data is type-specific: FunctionSpanData for tool calls (has .name),
        # GenerationSpanData for LLM calls (carries token usage), HandoffSpanData for handoffs
        data = span.span_data
        if getattr(data, "name", None):
            print(f"Tool called: {data.name}")

    def shutdown(self):
        pass

    def force_flush(self):
        pass
set_trace_processors([CustomTracer()])
The trace structure captures every LLM call, tool invocation, and handoff — essential for debugging complex multi-agent workflows.
Migration from Assistants API
The Assistants API is deprecated August 26, 2026. Key migration points:
| Assistants API | Agents SDK |
|---|---|
| Assistant object | Agent dataclass |
| Thread | Conversation history passed to Runner.run() |
| Run | Runner.run() return value |
| Tool definitions (JSON schema) | Python functions with type hints |
| File attachments | FileSearchTool + vector stores |
| Code Interpreter | CodeInterpreterTool() |
| Polling for run completion | await Runner.run() (async) |
The biggest conceptual shift: the Assistants API maintained state server-side (threads, runs, messages). The Agents SDK uses the Responses API under the hood (not the Assistants API or Chat Completions API) — it's stateless by default, though the built-in Sessions system handles persistence for you.
Sessions provide multi-turn memory, the SDK's replacement for server-side Threads:
from agents import Runner, SQLiteSession
session = SQLiteSession("conversation-user-123")
# Turn 1
result = await Runner.run(agent, "My name is Alex", session=session)
# Turn 2 — prior history loaded automatically
result = await Runner.run(agent, "What did I just tell you?", session=session)
# Agent knows it's Alex; no manual history management needed
Backends available: SQLiteSession, RedisSession, SQLAlchemySession, OpenAIConversationsSession. Custom backends via the SessionABC protocol.
When to Use OpenAI Agents SDK vs Alternatives
| Use Case | Best Choice |
|---|---|
| Production OpenAI agents | OpenAI Agents SDK |
| Multi-provider LLM (Claude + GPT + Gemini) | LangChain / LangGraph |
| Complex stateful multi-agent workflows | CrewAI or AutoGen |
| Maximum control, no framework | Raw API calls |
| TypeScript/Node.js agents | OpenAI Agents TypeScript SDK |
The OpenAI Agents SDK is the right default for teams building on OpenAI models. If you need multi-provider support or very complex state machines, LangGraph has more flexibility — at the cost of more complexity.
Compare AI agent frameworks at APIScout.
Related: LangChain vs CrewAI vs OpenAI Agents SDK 2026 · Claude API Extended Thinking 2026