OpenAI Responses API vs Assistants API
TL;DR
Migrate to the Responses API now — don't wait for the August 26, 2026 deadline. The Responses API is simpler, cheaper (40–80% better cache utilization), and unlocks features the Assistants API never will: Computer Use, MCP server connections, and deep research tools. The migration is a meaningful architectural change (Threads → Conversations, Assistants → stateless calls), not a drop-in replacement, but it's worth doing before the forced cutoff.
Key Takeaways
- Deprecation deadline: Assistants API shuts down August 26, 2026 — Azure OpenAI customers get until February 2027
- Architecture shift: Assistants API was stateful (Threads, Runs, Messages as API objects); Responses API is stateless (you manage conversation history client-side)
- New capabilities: Responses API adds Computer Use, MCP server connections, and Deep Research — none will come to Assistants API
- Cost improvement: 40–80% better cache utilization vs Chat Completions, and outperforms Assistants API on caching
- Assistants → Prompts: Persistent Assistant objects are replaced by Prompts, but Prompts must be created in the OpenAI dashboard, not via API
- Threads → Conversations API: Long-running multi-turn conversations use the new Conversations API (durable conversation_id) instead of Threads
Why OpenAI Is Replacing the Assistants API
The Assistants API launched in 2023 as a managed stateful layer: you created Assistant objects (with a model, instructions, and tools), created Thread objects (conversation containers), and ran them with Run objects. All state lived on OpenAI's servers.
The problem was complexity — three API objects (Assistants, Threads, Runs) for what is fundamentally a "send prompt, get response" operation. The polling loop for Run completion was awkward. Tool calls inside Runs required additional API calls to submit outputs. And Threads became bottlenecks for long conversations due to context window management.
The Responses API collapses this into a cleaner model:
| Assistants API | Responses API |
|---|---|
| Create Assistant | Pass instructions inline (or via Prompt) |
| Create Thread | Pass conversation history inline |
| Create Run | Single responses.create() call |
| Poll for Run completion | Streaming response or single result |
| Fetch Messages | Response returned directly |
Architecture Differences
Stateful vs Stateless
The Assistants API stored your conversation history on OpenAI's servers. You referenced a thread_id and OpenAI managed what was included in the context window.
The Responses API is stateless by default — you send the full conversation history with each request:
import OpenAI from 'openai';
const client = new OpenAI();
// Assistants API — stateful (conversation lives on OpenAI's servers)
const thread = await client.beta.threads.create();
await client.beta.threads.messages.create(thread.id, {
role: 'user',
content: 'What is the capital of France?',
});
const run = await client.beta.threads.runs.createAndPoll(thread.id, {
assistant_id: 'asst_abc123',
});
const messages = await client.beta.threads.messages.list(thread.id);
const answer = messages.data[0].content[0].text.value;
// Responses API — stateless (you manage history)
const response = await client.responses.create({
model: 'gpt-4o',
instructions: 'You are a helpful assistant.',
input: 'What is the capital of France?',
});
console.log(response.output_text); // "Paris"
// Multi-turn: pass previous_response_id for context continuation
const followUp = await client.responses.create({
model: 'gpt-4o',
previous_response_id: response.id, // Links to previous context
input: 'What is its population?',
});
Managing Conversation State
For persistent multi-turn conversations (equivalent to Threads), the Responses API introduces the Conversations API:
// Create a durable conversation (replaces Threads)
const conversation = await client.conversations.create({
metadata: { userId: 'user_123', sessionType: 'support' },
});
// Use conversation_id for persistent context across sessions
const response = await client.responses.create({
model: 'gpt-4o',
conversation_id: conversation.id, // OpenAI stores the history
instructions: 'You are a helpful customer support agent.',
input: 'I need help with my subscription.',
});
// Later session, same conversation
const laterResponse = await client.responses.create({
model: 'gpt-4o',
conversation_id: conversation.id, // Same conversation_id
input: 'Has my issue been escalated?',
});
// OpenAI remembers the full history from this conversation_id
Tools: File Search, Code Interpreter, Computer Use
The core tools map across, but Computer Use is exclusive to the Responses API:
// File search (equivalent to Assistants' file_search tool)
const response = await client.responses.create({
model: 'gpt-4o',
input: 'Summarize the attached PDF',
tools: [
{
type: 'file_search',
vector_store_ids: ['vs_abc123'],
},
],
});
// Code interpreter (same as Assistants)
const calcResponse = await client.responses.create({
model: 'gpt-4o',
input: 'Calculate the compound interest on $10,000 at 5% for 10 years',
tools: [{ type: 'code_interpreter' }],
});
// Computer Use — Responses API ONLY (not available in Assistants)
const computerResponse = await client.responses.create({
model: 'computer-use-preview',
tools: [
{
type: 'computer_use_preview',
display_width: 1280,
display_height: 800,
environment: 'browser',
},
],
input: 'Go to Hacker News and get the top 5 headlines',
truncation: 'auto',
});
The Assistants → Prompts Change
In the Assistants API, you created persistent Assistant objects via API with a model, instructions, and tools configuration. These were stored server-side with an asst_ ID.
In the Responses API, the equivalent is Prompts — but with a key constraint: Prompts can only be created in the OpenAI dashboard, not via API.
// Assistants API — create Assistant object via API
const assistant = await client.beta.assistants.create({
name: 'Customer Support Bot',
instructions: 'You are a helpful customer support agent...',
model: 'gpt-4o',
tools: [{ type: 'file_search' }],
});
// Store: asst_abc123
// Use: reference assistant_id in thread runs
// Responses API — use a Prompt from the dashboard
// Prompts are created at platform.openai.com/prompts
// They're versioned, can be A/B tested, and referenced by ID
const response = await client.responses.create({
model: 'gpt-4o',
prompt: {
id: 'pmpt_abc123', // Dashboard-created Prompt ID
version: '2', // Specific version (optional)
variables: { // Template variables in the prompt
customerName: 'Royce',
accountTier: 'Pro',
},
},
input: 'I need help with my billing.',
});
The dashboard-only constraint for Prompt creation is a tradeoff: you get versioning, A/B testing, and a visual editor, but lose programmatic creation. If you need to dynamically generate different system prompts at runtime, pass instructions inline instead of using Prompts.
MCP Integration (Responses API Only)
One of the most significant new capabilities is direct MCP (Model Context Protocol) server connections. The Responses API can connect to MCP servers as tools:
// Connect to an MCP server directly from the Responses API
const response = await client.responses.create({
model: 'gpt-4o',
tools: [
{
type: 'mcp',
server_label: 'deepwiki',
server_url: 'https://mcp.deepwiki.com/mcp',
// No auth needed for public MCP servers
},
{
type: 'mcp',
server_label: 'github',
server_url: 'https://api.githubcopilot.com/mcp/',
headers: {
Authorization: `Bearer ${process.env.GITHUB_TOKEN}`,
},
allowed_tools: ['search_repositories', 'get_file_contents'],
},
],
input: 'Find the top TypeScript repositories on GitHub about AI agents',
});
This replaces the pattern of manually defining function tools and handling MCP protocol in your application code.
Function Calling Migration
Function calling syntax changed between APIs:
// Assistants API function calling
const assistant = await client.beta.assistants.create({
model: 'gpt-4o',
tools: [
{
type: 'function',
function: {
name: 'get_weather',
description: 'Get current weather for a location',
parameters: {
type: 'object',
properties: {
location: { type: 'string', description: 'City and state' },
},
required: ['location'],
},
},
},
],
});
// Handle requires_action status in run loop...
// Submit tool outputs back to the run...
// Poll again for completion...
// Very verbose
// Responses API function calling — cleaner loop
const response = await client.responses.create({
model: 'gpt-4o',
tools: [
{
type: 'function',
name: 'get_weather',
description: 'Get current weather for a location',
parameters: {
type: 'object',
properties: {
location: { type: 'string', description: 'City and state' },
},
required: ['location'],
},
},
],
input: "What's the weather in San Francisco?",
});
// Check for tool calls in the response
for (const item of response.output) {
if (item.type === 'function_call') {
const args = JSON.parse(item.arguments); // arguments arrive as a JSON string
const result = await getWeather(args.location);
// Submit result and continue
const finalResponse = await client.responses.create({
model: 'gpt-4o',
previous_response_id: response.id,
input: [
{
type: 'function_call_output',
call_id: item.call_id,
output: JSON.stringify(result),
},
],
});
console.log(finalResponse.output_text);
}
}
Streaming
Streaming is cleaner in the Responses API:
// Responses API streaming
const stream = await client.responses.create({
model: 'gpt-4o',
input: 'Write a haiku about TypeScript',
stream: true,
});
for await (const event of stream) {
if (event.type === 'response.output_text.delta') {
process.stdout.write(event.delta);
}
if (event.type === 'response.completed') {
console.log('\n[Done]', event.response.usage);
}
}
// With the SDK helper (recommended)
const streamHelper = client.responses.stream({
model: 'gpt-4o',
input: 'Explain transformers architecture',
});
streamHelper.on('text', (text) => process.stdout.write(text));
await streamHelper.finalResponse();
Migration Checklist
A phased migration approach:
Phase 1: Stop creating new Assistants
- Create all new agents with Responses API
- Keep existing Assistants running for in-flight threads
Phase 2: Migrate instructions
- Create equivalent Prompts in the dashboard (or use inline instructions)
- Validate instruction equivalence with output comparison
Phase 3: Migrate conversation history
- Export Thread messages using:
client.beta.threads.messages.list(threadId)
- Store in your database as message arrays
- Use conversation_id (Conversations API) or pass history inline
Phase 4: Migrate tool configurations
- file_search: update vector_store_ids syntax
- code_interpreter: largely compatible
- function tools: update parameter format
- Custom tools: rewrite as MCP servers if reusable
Phase 5: Migrate Run polling to streaming
- Replace createAndPoll() with stream: true
- Update tool call handling loop
Cost Impact
The cache utilization improvements in the Responses API directly affect cost:
Internal OpenAI tests show 40–80% better cache utilization vs Chat Completions.
Caching kicks in when the same prompt prefix (1,024 tokens or longer) appears across multiple requests.
For high-volume apps:
Example: 1M requests/day, $0.0025/1K input tokens, 2K-token prompts
- Without caching: 1M × 2K × $0.0025/1K = $5,000/day
- With a 60% cache hit rate (cached input assumed billed at 10% of list price): 0.4 × $5,000 + 0.6 × ($5,000 × 0.1) = $2,300/day
- Savings: $2,700/day on input tokens alone
Use consistent prompt prefixes (same system instructions, same context format) to maximize cache hit rates.
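The arithmetic above, spelled out. The 10% price for cached input tokens is this article's assumption; check your model's current cached-input rate before relying on it:

```typescript
// Worked cost example: 1M requests/day, 2K-token prompts, 60% cache hit rate.
const requestsPerDay = 1_000_000;
const tokensPerPrompt = 2_000;
const pricePer1KTokens = 0.0025; // USD per 1K input tokens
const cacheHitRate = 0.6;
const cachedPriceFactor = 0.1; // assumption: cached input billed at 10% of list price

const dailyCost = (requestsPerDay * tokensPerPrompt * pricePer1KTokens) / 1000; // $5,000
const cachedCost =
  (1 - cacheHitRate) * dailyCost + cacheHitRate * dailyCost * cachedPriceFactor; // $2,300
const savings = dailyCost - cachedCost; // $2,700

console.log({ dailyCost, cachedCost, savings });
```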
What You Don't Need to Migrate
Some things aren't changing:
- Vector stores: The same vector store IDs work in both APIs
- File uploads: Files uploaded to OpenAI storage work in both
- Models: Same model identifiers (gpt-4o, o3, etc.)
- Embeddings: The Embeddings API is separate and unaffected
- Fine-tuned models: Work in both APIs
Comparison Table
| Feature | Assistants API | Responses API |
|---|---|---|
| State management | Server-side (Threads) | Client-side or Conversations API |
| Agent config | Persistent Assistant objects | Inline instructions or dashboard Prompts |
| API objects | Assistants, Threads, Runs, Messages | Single Response object |
| Streaming | Via SSE with event types | Native, cleaner event types |
| Function calling | Submit outputs to active Run | Pass function_call_output in new request |
| File search | ✅ | ✅ |
| Code interpreter | ✅ | ✅ |
| Computer Use | ❌ | ✅ |
| MCP servers | ❌ | ✅ |
| Deep Research | ❌ | ✅ |
| Cache utilization | Baseline | 40–80% improvement |
| Create agent config via API | ✅ | ❌ (dashboard only) |
| Deprecation date | August 26, 2026 | Current standard |
Timeline for Migration
March 2026 (now): Start migrating — 5 months until deadline
April–May 2026: Migrate active projects
June 2026: Drain remaining Threads
July 2026: Final testing
August 26, 2026: Assistants API shuts down
February 2027: Azure OpenAI Assistants API shuts down
The migration is not trivial for apps with complex multi-turn conversation flows, but the Responses API is genuinely better: fewer moving parts, better cost efficiency, and a path to Computer Use and MCP integration that the Assistants API never gets.
Browse all AI APIs and compare pricing at APIScout.
Related: Vercel AI SDK vs LangChain vs Raw API Calls · OpenAI vs Anthropic API 2026