
OpenAI Responses API vs Assistants API

By APIScout Team
Tags: openai · responses-api · assistants-api · migration · llm · gpt · 2026

TL;DR

Migrate to the Responses API now — don't wait for the August 26, 2026 deadline. The Responses API is simpler, cheaper (40–80% better cache utilization), and unlocks features the Assistants API never will: Computer Use, MCP server connections, and deep research tools. The migration is a meaningful architectural change (Threads → Conversations, Assistants → stateless calls), not a drop-in replacement, but it's worth doing before the forced cutoff.

Key Takeaways

  • Deprecation deadline: Assistants API shuts down August 26, 2026 — Azure OpenAI customers get until February 2027
  • Architecture shift: Assistants API was stateful (Threads, Runs, Messages as API objects); Responses API is stateless (you manage conversation history client-side)
  • New capabilities: Responses API adds Computer Use, MCP server connections, and Deep Research — none will come to Assistants API
  • Cost improvement: 40–80% better cache utilization vs Chat Completions, and outperforms Assistants API on caching
  • Assistants → Prompts: Persistent Assistant objects are replaced by Prompts, but Prompts must be created in the OpenAI dashboard, not via API
  • Threads → Conversations API: Long-running multi-turn conversations use the new Conversations API (durable conversation_id) instead of Threads

Why OpenAI Is Replacing the Assistants API

The Assistants API launched in 2023 as a managed stateful layer: you created Assistant objects (with a model, instructions, and tools), created Thread objects (conversation containers), and ran them with Run objects. All state lived on OpenAI's servers.

The problem was complexity — three API objects (Assistants, Threads, Runs) for what is fundamentally a "send prompt, get response" operation. The polling loop for Run completion was awkward. Tool calls inside Runs required additional API calls to submit outputs. And Threads became bottlenecks for long conversations due to context window management.

The Responses API collapses this into a cleaner model:

Assistants API:                  Responses API:
  Create Assistant                 Pass instructions inline (or via Prompt)
  Create Thread                    Pass conversation history inline
  Create Run                       Single responses.create() call
  Poll for Run completion          Streaming response or single result
  Fetch Messages                   Response directly in API return

Architecture Differences

Stateful vs Stateless

The Assistants API stored your conversation history on OpenAI's servers. You referenced a thread_id and OpenAI managed what was included in the context window.

The Responses API is stateless by default — you send the full conversation history with each request:

import OpenAI from 'openai';
const client = new OpenAI();

// Assistants API — stateful (conversation lives on OpenAI's servers)
const thread = await client.beta.threads.create();
await client.beta.threads.messages.create(thread.id, {
  role: 'user',
  content: 'What is the capital of France?',
});
const run = await client.beta.threads.runs.createAndPoll(thread.id, {
  assistant_id: 'asst_abc123',
});
const messages = await client.beta.threads.messages.list(thread.id);
const answer = messages.data[0].content[0].text.value;

// Responses API — stateless (you manage history)
const response = await client.responses.create({
  model: 'gpt-4o',
  instructions: 'You are a helpful assistant.',
  input: 'What is the capital of France?',
});
console.log(response.output_text); // "Paris"

// Multi-turn: pass previous_response_id for context continuation
const followUp = await client.responses.create({
  model: 'gpt-4o',
  previous_response_id: response.id, // Links to previous context
  input: 'What is its population?',
});
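The example above uses previous_response_id, but "stateless by default" also means you can carry the history yourself and send it in full on every request. A minimal sketch of that pattern — the Turn type and appendTurn helper are illustrative, not part of the SDK; the commented-out calls assume the same client and parameters as the examples above:

```typescript
// Client-side history management sketch: build up a role-tagged message
// array and pass the whole thing as `input` on every request.
type Turn = { role: 'user' | 'assistant'; content: string };

// Pure helper: return a new history array with one turn appended.
function appendTurn(history: Turn[], role: Turn['role'], content: string): Turn[] {
  return [...history, { role, content }];
}

let history: Turn[] = [];
history = appendTurn(history, 'user', 'What is the capital of France?');
// const r1 = await client.responses.create({ model: 'gpt-4o', input: history });
history = appendTurn(history, 'assistant', 'Paris'); // store the model's reply
history = appendTurn(history, 'user', 'What is its population?');
// Each request carries the full history:
// const r2 = await client.responses.create({ model: 'gpt-4o', input: history });
```

This is the tradeoff of statelessness: you own storage and truncation of the history, instead of OpenAI managing it behind a thread_id.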

Managing Conversation State

For persistent multi-turn conversations (equivalent to Threads), the Responses API introduces the Conversations API:

// Create a durable conversation (replaces Threads)
const conversation = await client.conversations.create({
  metadata: { userId: 'user_123', sessionType: 'support' },
});

// Pass the conversation for persistent context across sessions
const response = await client.responses.create({
  model: 'gpt-4o',
  conversation: conversation.id, // OpenAI stores the history
  instructions: 'You are a helpful customer support agent.',
  input: 'I need help with my subscription.',
});

// Later session, same conversation
const laterResponse = await client.responses.create({
  model: 'gpt-4o',
  conversation: conversation.id, // Same conversation ID
  input: 'Has my issue been escalated?',
});
// OpenAI remembers the full history from this conversation_id

Tools: File Search, Code Interpreter, Computer Use

The core tools map across, but Computer Use is exclusive to the Responses API:

// File search (equivalent to Assistants' file_search tool)
const response = await client.responses.create({
  model: 'gpt-4o',
  input: 'Summarize the attached PDF',
  tools: [
    {
      type: 'file_search',
      vector_store_ids: ['vs_abc123'],
    },
  ],
});

// Code interpreter (same capability as Assistants; now requires a container)
const calcResponse = await client.responses.create({
  model: 'gpt-4o',
  input: 'Calculate the compound interest on $10,000 at 5% for 10 years',
  tools: [{ type: 'code_interpreter', container: { type: 'auto' } }],
});

// Computer Use — Responses API ONLY (not available in Assistants)
const computerResponse = await client.responses.create({
  model: 'computer-use-preview',
  tools: [
    {
      type: 'computer_use_preview',
      display_width: 1280,
      display_height: 800,
      environment: 'browser',
    },
  ],
  input: [
    {
      role: 'user',
      content: [
        { type: 'input_text', text: 'Go to Hacker News and get the top 5 headlines' },
      ],
    },
  ],
  truncation: 'auto',
});

The Assistants → Prompts Change

In the Assistants API, you created persistent Assistant objects via API with a model, instructions, and tools configuration. These were stored server-side with an asst_ ID.

In the Responses API, the equivalent is Prompts — but with a key constraint: Prompts can only be created in the OpenAI dashboard, not via API.

// Assistants API — create Assistant object via API
const assistant = await client.beta.assistants.create({
  name: 'Customer Support Bot',
  instructions: 'You are a helpful customer support agent...',
  model: 'gpt-4o',
  tools: [{ type: 'file_search' }],
});

// Store: asst_abc123
// Use: reference assistant_id in thread runs

// Responses API — use a Prompt from the dashboard
// Prompts are created at platform.openai.com/prompts
// They're versioned, can be A/B tested, and referenced by ID

const response = await client.responses.create({
  model: 'gpt-4o',
  prompt: {
    id: 'pmpt_abc123',   // Dashboard-created Prompt ID
    version: '2',        // Specific version (optional)
    variables: {         // Template variables in the prompt
      customerName: 'Royce',
      accountTier: 'Pro',
    },
  },
  input: 'I need help with my billing.',
});

The dashboard-only constraint for Prompt creation is a tradeoff: you get versioning, A/B testing, and a visual editor, but lose programmatic creation. If you need to dynamically generate different system prompts at runtime, pass instructions inline instead of using Prompts.
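That inline-instructions fallback can be sketched as follows — buildInstructions is a hypothetical helper for illustration, and the commented-out call assumes the same responses.create parameters shown earlier:

```typescript
// When Prompts can't be created via API, assemble the system instructions
// at runtime and pass them inline via `instructions`.
function buildInstructions(customerName: string, accountTier: string): string {
  return `You are a helpful customer support agent. ` +
    `You are speaking with ${customerName}, a ${accountTier}-tier customer.`;
}

const instructions = buildInstructions('Royce', 'Pro');
// const response = await client.responses.create({
//   model: 'gpt-4o',
//   instructions,
//   input: 'I need help with my billing.',
// });
```

You lose dashboard versioning and A/B testing this way, but keep full programmatic control over the prompt text.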


MCP Integration (Responses API Only)

One of the most significant new capabilities is direct MCP (Model Context Protocol) server connections. The Responses API can connect to MCP servers as tools:

// Connect to an MCP server directly from the Responses API
const response = await client.responses.create({
  model: 'gpt-4o',
  tools: [
    {
      type: 'mcp',
      server_label: 'deepwiki',
      server_url: 'https://mcp.deepwiki.com/mcp',
      // No auth needed for public MCP servers
    },
    {
      type: 'mcp',
      server_label: 'github',
      server_url: 'https://api.githubcopilot.com/mcp/',
      headers: {
        Authorization: `Bearer ${process.env.GITHUB_TOKEN}`,
      },
      allowed_tools: ['search_repositories', 'get_file_contents'],
    },
  ],
  input: 'Find the top TypeScript repositories on GitHub about AI agents',
});

This replaces the pattern of manually defining function tools and handling MCP protocol in your application code.


Function Calling Migration

Function calling syntax changed between APIs:

// Assistants API function calling
const assistant = await client.beta.assistants.create({
  model: 'gpt-4o',
  tools: [
    {
      type: 'function',
      function: {
        name: 'get_weather',
        description: 'Get current weather for a location',
        parameters: {
          type: 'object',
          properties: {
            location: { type: 'string', description: 'City and state' },
          },
          required: ['location'],
        },
      },
    },
  ],
});

// Handle requires_action status in run loop...
// Submit tool outputs back to the run...
// Poll again for completion...
// Very verbose

// Responses API function calling — cleaner loop
const response = await client.responses.create({
  model: 'gpt-4o',
  tools: [
    {
      type: 'function',
      name: 'get_weather',
      description: 'Get current weather for a location',
      parameters: {
        type: 'object',
        properties: {
          location: { type: 'string', description: 'City and state' },
        },
        required: ['location'],
      },
    },
  ],
  input: "What's the weather in San Francisco?",
});

// Check for tool calls in the response
for (const item of response.output) {
  if (item.type === 'function_call') {
    // item.arguments is a JSON string — parse it before use
    const args = JSON.parse(item.arguments);
    const result = await getWeather(args.location);

    // Submit result and continue
    const finalResponse = await client.responses.create({
      model: 'gpt-4o',
      previous_response_id: response.id,
      input: [
        {
          type: 'function_call_output',
          call_id: item.call_id,
          output: JSON.stringify(result),
        },
      ],
    });

    console.log(finalResponse.output_text);
  }
}

Streaming

Streaming is cleaner in the Responses API:

// Responses API streaming
const stream = await client.responses.create({
  model: 'gpt-4o',
  input: 'Write a haiku about TypeScript',
  stream: true,
});

for await (const event of stream) {
  if (event.type === 'response.output_text.delta') {
    process.stdout.write(event.delta);
  }
  if (event.type === 'response.completed') {
    console.log('\n[Done]', event.response.usage);
  }
}

// With the SDK helper (recommended)
const streamHelper = client.responses.stream({
  model: 'gpt-4o',
  input: 'Explain transformers architecture',
});

streamHelper.on('text', (text) => process.stdout.write(text));
await streamHelper.finalResponse();

Migration Checklist

A phased migration approach:

Phase 1: Stop creating new Assistants
  - Create all new agents with Responses API
  - Keep existing Assistants running for in-flight threads

Phase 2: Migrate instructions
  - Create equivalent Prompts in the dashboard (or use inline instructions)
  - Validate instruction equivalence with output comparison

Phase 3: Migrate conversation history
  - Export Thread messages using:
    client.beta.threads.messages.list(threadId)
  - Store in your database as message arrays
  - Use conversation_id (Conversations API) or pass history inline

Phase 4: Migrate tool configurations
  - file_search: update vector_store_ids syntax
  - code_interpreter: largely compatible
  - function tools: update parameter format
  - Custom tools: rewrite as MCP servers if reusable

Phase 5: Migrate Run polling to streaming
  - Replace createAndPoll() with stream: true
  - Update tool call handling loop

Cost Impact

The cache utilization improvements in the Responses API directly affect cost:

Internal OpenAI tests show 40–80% better cache utilization vs Chat Completions.
Caching kicks in when the same prefix appears in multiple requests.

For high-volume apps:
  Example: 1M requests/day, $0.0025/1K input tokens, 2K token prompts

  Without caching: 1M × 2K × $0.0025/1K = $5,000/day
  With 60% cache hit: 0.4 × $5,000 + 0.6 × ($5,000 × 0.1) = $2,300/day
  Savings: $2,700/day on input tokens alone

Use consistent prompt prefixes (same system instructions, same context format) to maximize cache hit rates.
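The arithmetic above as a reusable helper — the 10% cached-token price is the assumption from the worked example, not a universal rate (cached-input discounts vary by model):

```typescript
// Daily input-token cost with prompt caching.
// Cached tokens are billed at `cachedPriceFraction` of the base rate
// (0.1 here, matching the example's assumption).
function dailyInputCost(
  requestsPerDay: number,
  tokensPerRequest: number,
  pricePer1K: number,
  cacheHitRate: number,
  cachedPriceFraction = 0.1,
): number {
  const base = (requestsPerDay * tokensPerRequest * pricePer1K) / 1000;
  return (1 - cacheHitRate) * base + cacheHitRate * base * cachedPriceFraction;
}

dailyInputCost(1_000_000, 2000, 0.0025, 0);   // ≈ $5,000/day, no caching
dailyInputCost(1_000_000, 2000, 0.0025, 0.6); // ≈ $2,300/day at 60% hit rate
```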


What You Don't Need to Migrate

Some things aren't changing:

  • Vector stores: The same vector store IDs work in both APIs
  • File uploads: Files uploaded to OpenAI storage work in both
  • Models: Same model identifiers (gpt-4o, o3, etc.)
  • Embeddings: Embeddings API is separate, unaffected
  • Fine-tuned models: Work in both APIs

Comparison Table

Feature                        Assistants API                        Responses API
State management               Server-side (Threads)                 Client-side or Conversations API
Agent config                   Persistent Assistant objects          Inline instructions or dashboard Prompts
API objects                    Assistants, Threads, Runs, Messages   Single Response object
Streaming                      Via SSE with event types              Native, cleaner event types
Function calling               Submit outputs to active Run          Pass function_call_output in new request
File search                    ✅                                    ✅
Code interpreter               ✅                                    ✅
Computer Use                   ❌                                    ✅
MCP servers                    ❌                                    ✅
Deep Research                  ❌                                    ✅
Cache utilization              Baseline                              40–80% improvement
Create agent config via API    ✅                                    ❌ (dashboard only)
Deprecation date               August 26, 2026                       Current standard

Timeline for Migration

March 2026 (now):      Start migrating — 5 months until deadline
April–May 2026:        Migrate active projects
June 2026:             Drain remaining Threads
July 2026:             Final testing
August 26, 2026:       Assistants API shuts down
February 2027:         Azure OpenAI Assistants API shuts down

The migration is not trivial for apps with complex multi-turn conversation flows, but the Responses API is genuinely better: fewer moving parts, better cost efficiency, and a path to Computer Use and MCP integration that the Assistants API never gets.


Browse all AI APIs and compare pricing at APIScout.

Related: Vercel AI SDK vs LangChain vs Raw API Calls · OpenAI vs Anthropic API 2026
