OpenAI Agents SDK: Architecture Patterns 2026
Why the OpenAI Agents SDK
The OpenAI Agents SDK is the spiritual successor to the Assistants API — a production-grade framework for building AI agents that goes well beyond raw API calls. It solves real pain points: tool definition is Pythonic, multi-agent handoffs are first-class, guardrails are declarative, and tracing is built in.
The Assistants API is being deprecated on August 26, 2026. If you're building on Assistants, migration to the Agents SDK (or the Responses API) is the path forward. If you're building new agent applications, the Agents SDK is the cleaner starting point.
TL;DR
The OpenAI Agents SDK provides four core primitives — Agent, Runner, Tool, and Handoff — that compose into everything from simple chatbots to complex multi-agent pipelines. The framework's design philosophy favors explicit composition over magic, and the generation of tool schemas from Python type hints is genuinely excellent. For production use, the built-in tracing and guardrails save weeks of boilerplate.
Key Takeaways
- Package: `openai-agents` (Python); `@openai/agents` (TypeScript)
- Core primitives: `Agent`, `Runner`, `function_tool` decorator, `handoff()`
- Multi-agent: Orchestrator/subagent pattern; agents hand off to each other via the `handoffs=[]` parameter
- Guardrails: `InputGuardrail` + `OutputGuardrail` for input validation and output checking; run in parallel with agent execution
- Tracing: Built-in, sends to OpenAI dashboard; exportable to custom backends
- Replacing: OpenAI Assistants API (deprecated August 26, 2026)
- Key difference from LangChain: Less magic, more explicit — tool schemas generated from Python type hints; no hidden state
Installation and Basic Setup
pip install openai-agents # current: v0.12.x (Python 3.10+)
The SDK is still in 0.x (active weekly releases since March 2025). The API is stable for the core primitives but expect evolution in areas like sessions and MCP integration.
import asyncio
from agents import Agent, Runner
# Minimal agent
agent = Agent(
name="Assistant",
instructions="You are a helpful assistant. Be concise.",
model="gpt-4.1",
)
async def main():
result = await Runner.run(
agent,
input="What is the capital of France?"
)
print(result.final_output)
asyncio.run(main())
# Output: "Paris"
The SDK is async-first. Runner.run() is the primary entry point — it handles the agentic loop (running the agent, executing tools, handling handoffs) until a final result is produced.
Defining Tools
Tools are the mechanism for agents to interact with the world. The SDK generates JSON schemas from Python function type hints — you don't write schema definitions manually.
from agents import Agent, Runner, function_tool
import httpx
@function_tool
def get_weather(city: str) -> str:
"""Get the current weather for a city.
Args:
city: The name of the city to get weather for.
"""
# Real implementation would call a weather API
response = httpx.get(f"https://api.weather.example.com/current?city={city}")
data = response.json()
return f"{data['temp']}°F, {data['condition']} in {city}"
@function_tool
def search_web(query: str, num_results: int = 5) -> list[str]:
"""Search the web and return relevant URLs.
Args:
query: The search query.
num_results: Number of results to return (default 5).
"""
# Implementation
return ["https://example.com/result1", "https://example.com/result2"]
agent = Agent(
name="Research Assistant",
instructions="Research topics thoroughly using web search and provide accurate answers.",
model="gpt-4.1",
tools=[get_weather, search_web],
)
The docstring serves dual purpose: it documents the function for developers AND populates the tool description that the model sees. Write good docstrings — they directly affect how reliably the model uses your tools.
Hosted Tools
The SDK also includes OpenAI-hosted tools that run on OpenAI's infrastructure:
from agents import Agent
from agents.tools import WebSearchTool, CodeInterpreterTool, FileSearchTool
agent = Agent(
name="Advanced Research Agent",
instructions="Use web search and code execution to research and analyze data.",
model="gpt-4.1",
tools=[
        WebSearchTool(),  # Hosted web search run by OpenAI
CodeInterpreterTool(), # Python execution sandbox
FileSearchTool( # Semantic search over uploaded files
vector_store_ids=["vs_abc123"]
),
],
)
Hosted tools handle their own infrastructure — you don't manage the web search implementation or code execution sandbox.
Context: Sharing State Between Tool Calls
The RunContextWrapper passes shared state across tool calls in a single agent run without going through the model:
from agents import Agent, Runner, RunContextWrapper, function_tool
from dataclasses import dataclass, field
import asyncio

@dataclass
class UserContext:
    user_id: str
    subscription_tier: str
    # Mutable defaults on dataclasses need default_factory, not a shared literal
    cached_data: dict = field(default_factory=dict)
@function_tool
def get_user_preferences(ctx: RunContextWrapper[UserContext]) -> str:
"""Get the current user's notification preferences."""
user_id = ctx.context.user_id
# Access user context without passing it through the LLM
return f"User {user_id} prefers email notifications, dark mode enabled"
@function_tool
def update_setting(ctx: RunContextWrapper[UserContext], setting: str, value: str) -> str:
"""Update a user setting.
Args:
setting: The setting name to update.
value: The new value for the setting.
"""
user_id = ctx.context.user_id
ctx.context.cached_data[setting] = value
return f"Updated {setting} to {value} for user {user_id}"
agent = Agent(
name="Settings Agent",
instructions="Help users manage their account settings.",
model="gpt-4.1",
tools=[get_user_preferences, update_setting],
)
async def main():
context = UserContext(user_id="user-123", subscription_tier="pro")
result = await Runner.run(
agent,
input="Turn on dark mode for me",
context=context,
)
print(result.final_output)
print(f"Cached data: {context.cached_data}")
asyncio.run(main())
The context type parameter (RunContextWrapper[UserContext]) is a generic — it enforces type safety across all tool functions that access context.
Guardrails
Guardrails run in parallel with agent execution and can block or modify the agent's behavior:
from agents import Agent, Runner, InputGuardrail, OutputGuardrail, GuardrailFunctionOutput
from agents import RunContextWrapper
from pydantic import BaseModel
class SafetyCheck(BaseModel):
is_safe: bool
reason: str
# Input guardrail — check before agent runs
guardrail_agent = Agent(
name="Safety Checker",
instructions="Determine if user input is safe for a customer service bot to respond to.",
model="gpt-4o-mini", # Use a smaller/cheaper model for guardrails
output_type=SafetyCheck,
)
async def safety_guardrail(ctx: RunContextWrapper, agent: Agent, input: str) -> GuardrailFunctionOutput:
result = await Runner.run(guardrail_agent, input=input, context=ctx.context)
safety = result.final_output
return GuardrailFunctionOutput(
output_info=safety,
tripwire_triggered=not safety.is_safe, # Block if not safe
)
# Output guardrail — check before returning to user
class ResponseCheck(BaseModel):
contains_pii: bool
pii_types: list[str]
output_checker = Agent(
name="PII Checker",
instructions="Check if a response contains PII (names, emails, phone numbers, addresses).",
model="gpt-4o-mini",
output_type=ResponseCheck,
)
async def pii_guardrail(ctx: RunContextWrapper, agent: Agent, output: str) -> GuardrailFunctionOutput:
result = await Runner.run(output_checker, input=f"Response to check: {output}", context=ctx.context)
check = result.final_output
return GuardrailFunctionOutput(
output_info=check,
tripwire_triggered=check.contains_pii,
)
# Main agent with guardrails
customer_service_agent = Agent(
name="Customer Service",
instructions="Help customers with orders, returns, and account questions.",
model="gpt-4.1",
input_guardrails=[
InputGuardrail(guardrail_function=safety_guardrail),
],
output_guardrails=[
OutputGuardrail(guardrail_function=pii_guardrail),
],
)
Guardrails use a separate (typically smaller) model and run concurrently with the main agent.
Guardrails scope in multi-agent chains: input guardrails run only for the first agent in a handoff chain, and output guardrails run only for the last agent (the one that produces the final response). If you need guardrails on intermediate agents, attach them directly to each agent in the chain.
If a guardrail trips, the run stops and an exception is raised; catch it in your application:
from agents.exceptions import InputGuardrailTripwireTriggered, OutputGuardrailTripwireTriggered
try:
result = await Runner.run(customer_service_agent, input=user_message)
return result.final_output
except InputGuardrailTripwireTriggered as e:
return "I can't help with that request."
except OutputGuardrailTripwireTriggered as e:
return "I encountered an issue generating a safe response."
Multi-Agent Patterns
The SDK's most powerful capability is composing multiple agents. Three patterns cover most systems: handoffs (agents transfer control), orchestrators (an agent calls subagents as tools), and parallel fan-out with plain asyncio.
Pattern 1: Handoffs (Specialization)
from agents import Agent, Runner, handoff
import asyncio
# Specialist agents
billing_agent = Agent(
name="Billing Specialist",
instructions="""You handle billing questions, payment issues, and subscription management.
You have access to billing systems and can process refunds up to $100.""",
model="gpt-4.1",
)
technical_agent = Agent(
name="Technical Support",
instructions="""You handle technical issues, bugs, and integration problems.
You have access to engineering documentation and can create support tickets.""",
model="gpt-4.1",
)
# Triage agent that routes to specialists
triage_agent = Agent(
name="Support Triage",
instructions="""You are a triage agent for customer support.
Route billing questions to the Billing Specialist.
Route technical questions to Technical Support.
Handle simple FAQ questions directly.""",
model="gpt-4o-mini", # Cheaper model for routing
handoffs=[billing_agent, technical_agent],
)
async def main():
result = await Runner.run(
triage_agent,
input="I was charged twice for my subscription last month"
)
print(result.final_output)
# Routes to billing_agent, which handles the response
asyncio.run(main())
In a handoff, the triage agent decides to transfer control to a specialist agent. The specialist runs with the full conversation history and produces the final response.
Pattern 2: Orchestrator + Subagents
from agents import Agent, Runner, function_tool
from typing import Any
import asyncio
# Subagents defined as tools for the orchestrator
@function_tool
async def research_topic(topic: str) -> str:
"""Research a topic thoroughly and return a summary.
Args:
topic: The topic to research.
"""
researcher = Agent(
name="Researcher",
instructions="Research the given topic and provide a thorough, factual summary.",
model="gpt-4.1",
)
result = await Runner.run(researcher, input=f"Research this topic: {topic}")
return result.final_output
@function_tool
async def write_section(content_brief: str, tone: str = "professional") -> str:
"""Write a content section based on a brief.
Args:
content_brief: Description of what to write.
tone: Writing tone (professional, casual, technical).
"""
writer = Agent(
name="Writer",
instructions=f"Write engaging {tone} content based on the provided brief.",
model="gpt-4.1",
)
result = await Runner.run(writer, input=content_brief)
return result.final_output
# Orchestrator manages the workflow
orchestrator = Agent(
name="Content Orchestrator",
instructions="""Create a comprehensive blog post by:
1. Researching the topic
2. Writing an introduction
3. Writing 3 main sections based on research
4. Writing a conclusion
Combine all sections into a complete post.""",
model="gpt-4.1",
tools=[research_topic, write_section],
)
async def main():
result = await Runner.run(
orchestrator,
input="Write a blog post about the environmental impact of data centers"
)
print(result.final_output)
asyncio.run(main())
The orchestrator pattern gives the top-level agent control over the workflow — it decides when to delegate, what to ask each subagent, and how to synthesize results.
Pattern 3: Parallel Agent Execution
from agents import Runner, Agent
import asyncio
analysis_agents = [
Agent(name="Technical Analyst", instructions="Analyze the technical aspects.", model="gpt-4.1"),
Agent(name="Business Analyst", instructions="Analyze the business implications.", model="gpt-4.1"),
Agent(name="Risk Analyst", instructions="Identify risks and mitigation strategies.", model="gpt-4.1"),
]
async def parallel_analysis(document: str) -> dict:
"""Run multiple analysis agents in parallel."""
tasks = [
Runner.run(agent, input=f"Analyze this document:\n\n{document}")
for agent in analysis_agents
]
results = await asyncio.gather(*tasks)
return {
"technical": results[0].final_output,
"business": results[1].final_output,
"risk": results[2].final_output,
}
Parallel execution is just asyncio.gather — no special SDK primitive needed. This pattern is effective for independent analyses that can be synthesized afterward.
Structured Output
Force agents to return structured data with the output_type parameter:
from agents import Agent, Runner
from pydantic import BaseModel
from typing import Optional
import asyncio
class ExtractedInvoice(BaseModel):
vendor_name: str
invoice_number: str
total_amount: float
currency: str
due_date: Optional[str]
line_items: list[dict]
extraction_agent = Agent(
name="Invoice Extractor",
instructions="Extract structured data from invoice text. Be precise with amounts and dates.",
model="gpt-4.1",
output_type=ExtractedInvoice,
)
async def extract_invoice(invoice_text: str) -> ExtractedInvoice:
result = await Runner.run(extraction_agent, input=invoice_text)
return result.final_output # Already typed as ExtractedInvoice
async def main():
invoice_text = """
INVOICE #INV-2026-0042
Vendor: Acme Corp
Due: 2026-04-15
Web hosting (1 month) $299.00
SSL Certificate $49.00
Total: $348.00 USD
"""
invoice = await extract_invoice(invoice_text)
print(f"Vendor: {invoice.vendor_name}")
print(f"Total: ${invoice.total_amount} {invoice.currency}")
asyncio.run(main())
The SDK uses Pydantic for output validation. The model is instructed to return JSON matching the schema, and the SDK validates and parses it automatically.
Streaming Responses
from agents import Runner, Agent
import asyncio
agent = Agent(
name="Streaming Agent",
instructions="Provide detailed, helpful responses.",
model="gpt-4.1",
)
from openai.types.responses import ResponseTextDeltaEvent

async def stream_response(user_input: str) -> str:
    # run_streamed returns a RunResultStreaming immediately; it is not awaited
    result = Runner.run_streamed(agent, input=user_input)
    async for event in result.stream_events():
        # Raw response events carry token-level deltas from the model
        if event.type == "raw_response_event" and isinstance(event.data, ResponseTextDeltaEvent):
            print(event.data.delta, end="", flush=True)
    print()  # Newline after streaming completes
    return result.final_output
Tracing and Observability
Every agent run automatically generates a trace in the OpenAI dashboard. You can also export traces to custom backends:
from agents.tracing import set_trace_processors, TracingProcessor
class CustomTracer(TracingProcessor):
    # TracingProcessor is abstract: all six methods below must be implemented
    def on_trace_start(self, trace):
        print(f"Trace started: {trace.trace_id}")

    def on_trace_end(self, trace):
        print(f"Trace complete: {trace.trace_id}")

    def on_span_start(self, span):
        pass

    def on_span_end(self, span):
        # span_data is type-specific: FunctionSpanData for tool calls (has .name),
        # GenerationSpanData for LLM calls (carries token usage), HandoffSpanData for handoffs
        data = span.span_data
        if getattr(data, "name", None):
            print(f"Tool called: {data.name}")

    def shutdown(self):
        pass

    def force_flush(self):
        pass
set_trace_processors([CustomTracer()])
The trace structure captures every LLM call, tool invocation, and handoff — essential for debugging complex multi-agent workflows.
Migration from Assistants API
The Assistants API is deprecated August 26, 2026. Key migration points:
| Assistants API | Agents SDK |
|---|---|
| Assistant object | Agent dataclass |
| Thread | Conversation history passed to Runner.run() |
| Run | Runner.run() return value |
| Tool definitions (JSON schema) | Python functions with type hints |
| File attachments | FileSearchTool + vector stores |
| Code Interpreter | CodeInterpreterTool() |
| Polling for run completion | await Runner.run() (async) |
The biggest conceptual shift: the Assistants API maintained state server-side (threads, runs, messages). The Agents SDK uses the Responses API under the hood (not the Assistants API or Chat Completions API) — it's stateless by default, though the built-in Sessions system handles persistence for you.
Sessions provide multi-turn memory, the SDK's replacement for server-side Threads:
from agents import Runner, SQLiteSession
session = SQLiteSession("conversation-user-123")
# Turn 1
result = await Runner.run(agent, "My name is Alex", session=session)
# Turn 2 — prior history loaded automatically
result = await Runner.run(agent, "What did I just tell you?", session=session)
# Agent knows it's Alex; no manual history management needed
Backends available: SQLiteSession, RedisSession, SQLAlchemySession, OpenAIConversationsSession. Custom backends via the SessionABC protocol.
When to Use OpenAI Agents SDK vs Alternatives
| Use Case | Best Choice |
|---|---|
| Production OpenAI agents | OpenAI Agents SDK |
| Multi-provider LLM (Claude + GPT + Gemini) | LangChain / LangGraph |
| Complex stateful multi-agent workflows | CrewAI or AutoGen |
| Maximum control, no framework | Raw API calls |
| TypeScript/Node.js agents | OpenAI Agents TypeScript SDK |
The OpenAI Agents SDK is the right default for teams building on OpenAI models. If you need multi-provider support or very complex state machines, LangGraph has more flexibility — at the cost of more complexity.
Compare AI agent frameworks at APIScout.
Related: LangChain vs CrewAI vs OpenAI Agents SDK 2026 · Claude API Extended Thinking 2026