
OpenAI Responses API vs Assistants API

By APIScout Team
Tags: openai · responses-api · assistants-api · migration · llm · gpt · 2026

TL;DR

Migrate to the Responses API now — don't wait for the August 26, 2026 deadline. The Responses API is simpler, cheaper (40–80% better cache utilization), and unlocks features the Assistants API never will: Computer Use, MCP server connections, and deep research tools. The migration is a meaningful architectural change (Threads → Conversations, Assistants → stateless calls), not a drop-in replacement, but it's worth doing before the forced cutoff.

Key Takeaways

  • Deprecation deadline: Assistants API shuts down August 26, 2026 — Azure OpenAI customers get until February 2027
  • Architecture shift: Assistants API was stateful (Threads, Runs, Messages as API objects); Responses API is stateless (you manage conversation history client-side)
  • New capabilities: Responses API adds Computer Use, MCP server connections, and Deep Research — none will come to Assistants API
  • Cost improvement: 40–80% better cache utilization vs Chat Completions, and outperforms Assistants API on caching
  • Assistants → Prompts: Persistent Assistant objects are replaced by Prompts, but Prompts must be created in the OpenAI dashboard, not via API
  • Threads → Conversations API: Long-running multi-turn conversations use the new Conversations API (durable conversation_id) instead of Threads

Why OpenAI Is Replacing the Assistants API

The Assistants API launched in 2023 as a managed stateful layer: you created Assistant objects (with a model, instructions, and tools), created Thread objects (conversation containers), and ran them with Run objects. All state lived on OpenAI's servers.

The problem was complexity — three API objects (Assistants, Threads, Runs) for what is fundamentally a "send prompt, get response" operation. The polling loop for Run completion was awkward. Tool calls inside Runs required additional API calls to submit outputs. And Threads became bottlenecks for long conversations due to context window management.

The Responses API collapses this into a cleaner model:

Assistants API:                  Responses API:
  Create Assistant                 Pass instructions inline (or via Prompt)
  Create Thread                    Pass conversation history inline
  Create Run                       Single responses.create() call
  Poll for Run completion          Streaming response or single result
  Fetch Messages                   Response directly in API return

Architecture Differences

Stateful vs Stateless

The Assistants API stored your conversation history on OpenAI's servers. You referenced a thread_id and OpenAI managed what was included in the context window.

The Responses API is stateless by default — you send the full conversation history with each request:

import OpenAI from 'openai';
const client = new OpenAI();

// Assistants API — stateful (conversation lives on OpenAI's servers)
const thread = await client.beta.threads.create();
await client.beta.threads.messages.create(thread.id, {
  role: 'user',
  content: 'What is the capital of France?',
});
const run = await client.beta.threads.runs.createAndPoll(thread.id, {
  assistant_id: 'asst_abc123',
});
const messages = await client.beta.threads.messages.list(thread.id);
const answer = messages.data[0].content[0].text.value;

// Responses API — stateless (you manage history)
const response = await client.responses.create({
  model: 'gpt-4o',
  instructions: 'You are a helpful assistant.',
  input: 'What is the capital of France?',
});
console.log(response.output_text); // "Paris"

// Multi-turn: pass previous_response_id for context continuation
const followUp = await client.responses.create({
  model: 'gpt-4o',
  previous_response_id: response.id, // Links to previous context
  input: 'What is its population?',
});
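The example above uses previous_response_id, but "stateless by default" also means you can carry the history yourself and send it in full on every request. A minimal sketch of that pattern — the Turn type and appendTurn helper are illustrative, not part of the SDK; the commented-out calls assume the same client and parameters as the examples above:

```typescript
// Client-side history management sketch: build up a role-tagged message
// array and pass the whole thing as `input` on every request.
type Turn = { role: 'user' | 'assistant'; content: string };

// Pure helper: return a new history array with one turn appended.
function appendTurn(history: Turn[], role: Turn['role'], content: string): Turn[] {
  return [...history, { role, content }];
}

let history: Turn[] = [];
history = appendTurn(history, 'user', 'What is the capital of France?');
// const r1 = await client.responses.create({ model: 'gpt-4o', input: history });
history = appendTurn(history, 'assistant', 'Paris'); // store the model's reply
history = appendTurn(history, 'user', 'What is its population?');
// Each request carries the full history:
// const r2 = await client.responses.create({ model: 'gpt-4o', input: history });
```

This is the tradeoff of statelessness: you own storage and truncation of the history, instead of OpenAI managing it behind a thread_id.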

Managing Conversation State

For persistent multi-turn conversations (equivalent to Threads), the Responses API introduces the Conversations API:

// Create a durable conversation (replaces Threads)
const conversation = await client.conversations.create({
  metadata: { userId: 'user_123', sessionType: 'support' },
});

// Pass the conversation for persistent context across sessions
const response = await client.responses.create({
  model: 'gpt-4o',
  conversation: conversation.id, // OpenAI stores the history
  instructions: 'You are a helpful customer support agent.',
  input: 'I need help with my subscription.',
});

// Later session, same conversation
const laterResponse = await client.responses.create({
  model: 'gpt-4o',
  conversation: conversation.id, // Same conversation ID
  input: 'Has my issue been escalated?',
});
// OpenAI remembers the full history from this conversation_id

Tools: File Search, Code Interpreter, Computer Use

The core tools map across, but Computer Use is exclusive to the Responses API:

// File search (equivalent to Assistants' file_search tool)
const response = await client.responses.create({
  model: 'gpt-4o',
  input: 'Summarize the attached PDF',
  tools: [
    {
      type: 'file_search',
      vector_store_ids: ['vs_abc123'],
    },
  ],
});

// Code interpreter (same capability as Assistants; now requires a container)
const calcResponse = await client.responses.create({
  model: 'gpt-4o',
  input: 'Calculate the compound interest on $10,000 at 5% for 10 years',
  tools: [{ type: 'code_interpreter', container: { type: 'auto' } }],
});

// Computer Use — Responses API ONLY (not available in Assistants)
const computerResponse = await client.responses.create({
  model: 'computer-use-preview',
  tools: [
    {
      type: 'computer_use_preview',
      display_width: 1280,
      display_height: 800,
      environment: 'browser',
    },
  ],
  input: [
    {
      role: 'user',
      content: [
        { type: 'input_text', text: 'Go to Hacker News and get the top 5 headlines' },
      ],
    },
  ],
  truncation: 'auto',
});

The Assistants → Prompts Change

In the Assistants API, you created persistent Assistant objects via API with a model, instructions, and tools configuration. These were stored server-side with an asst_ ID.

In the Responses API, the equivalent is Prompts — but with a key constraint: Prompts can only be created in the OpenAI dashboard, not via API.

// Assistants API — create Assistant object via API
const assistant = await client.beta.assistants.create({
  name: 'Customer Support Bot',
  instructions: 'You are a helpful customer support agent...',
  model: 'gpt-4o',
  tools: [{ type: 'file_search' }],
});

// Store: asst_abc123
// Use: reference assistant_id in thread runs

// Responses API — use a Prompt from the dashboard
// Prompts are created at platform.openai.com/prompts
// They're versioned, can be A/B tested, and referenced by ID

const response = await client.responses.create({
  model: 'gpt-4o',
  prompt: {
    id: 'pmpt_abc123',   // Dashboard-created Prompt ID
    version: '2',        // Specific version (optional)
    variables: {         // Template variables in the prompt
      customerName: 'Royce',
      accountTier: 'Pro',
    },
  },
  input: 'I need help with my billing.',
});

The dashboard-only constraint for Prompt creation is a tradeoff: you get versioning, A/B testing, and a visual editor, but lose programmatic creation. If you need to dynamically generate different system prompts at runtime, pass instructions inline instead of using Prompts.
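That inline-instructions fallback can be sketched as follows — buildInstructions is a hypothetical helper for illustration, and the commented-out call assumes the same responses.create parameters shown earlier:

```typescript
// When Prompts can't be created via API, assemble the system instructions
// at runtime and pass them inline via `instructions`.
function buildInstructions(customerName: string, accountTier: string): string {
  return `You are a helpful customer support agent. ` +
    `You are speaking with ${customerName}, a ${accountTier}-tier customer.`;
}

const instructions = buildInstructions('Royce', 'Pro');
// const response = await client.responses.create({
//   model: 'gpt-4o',
//   instructions,
//   input: 'I need help with my billing.',
// });
```

You lose dashboard versioning and A/B testing this way, but keep full programmatic control over the prompt text.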


MCP Integration (Responses API Only)

One of the most significant new capabilities is direct MCP (Model Context Protocol) server connections. The Responses API can connect to MCP servers as tools:

// Connect to an MCP server directly from the Responses API
const response = await client.responses.create({
  model: 'gpt-4o',
  tools: [
    {
      type: 'mcp',
      server_label: 'deepwiki',
      server_url: 'https://mcp.deepwiki.com/mcp',
      // No auth needed for public MCP servers
    },
    {
      type: 'mcp',
      server_label: 'github',
      server_url: 'https://api.githubcopilot.com/mcp/',
      headers: {
        Authorization: `Bearer ${process.env.GITHUB_TOKEN}`,
      },
      allowed_tools: ['search_repositories', 'get_file_contents'],
    },
  ],
  input: 'Find the top TypeScript repositories on GitHub about AI agents',
});

This replaces the pattern of manually defining function tools and handling MCP protocol in your application code.


Function Calling Migration

Function calling syntax changed between APIs:

// Assistants API function calling
const assistant = await client.beta.assistants.create({
  model: 'gpt-4o',
  tools: [
    {
      type: 'function',
      function: {
        name: 'get_weather',
        description: 'Get current weather for a location',
        parameters: {
          type: 'object',
          properties: {
            location: { type: 'string', description: 'City and state' },
          },
          required: ['location'],
        },
      },
    },
  ],
});

// Handle requires_action status in run loop...
// Submit tool outputs back to the run...
// Poll again for completion...
// Very verbose

// Responses API function calling — cleaner loop
const response = await client.responses.create({
  model: 'gpt-4o',
  tools: [
    {
      type: 'function',
      name: 'get_weather',
      description: 'Get current weather for a location',
      parameters: {
        type: 'object',
        properties: {
          location: { type: 'string', description: 'City and state' },
        },
        required: ['location'],
      },
    },
  ],
  input: "What's the weather in San Francisco?",
});

// Check for tool calls in the response
for (const item of response.output) {
  if (item.type === 'function_call') {
    // item.arguments is a JSON string — parse it before use
    const args = JSON.parse(item.arguments);
    const result = await getWeather(args.location);

    // Submit result and continue
    const finalResponse = await client.responses.create({
      model: 'gpt-4o',
      previous_response_id: response.id,
      input: [
        {
          type: 'function_call_output',
          call_id: item.call_id,
          output: JSON.stringify(result),
        },
      ],
    });

    console.log(finalResponse.output_text);
  }
}

Streaming

Streaming is cleaner in the Responses API:

// Responses API streaming
const stream = await client.responses.create({
  model: 'gpt-4o',
  input: 'Write a haiku about TypeScript',
  stream: true,
});

for await (const event of stream) {
  if (event.type === 'response.output_text.delta') {
    process.stdout.write(event.delta);
  }
  if (event.type === 'response.completed') {
    console.log('\n[Done]', event.response.usage);
  }
}

// With the SDK helper (recommended)
const streamHelper = client.responses.stream({
  model: 'gpt-4o',
  input: 'Explain transformers architecture',
});

streamHelper.on('text', (text) => process.stdout.write(text));
await streamHelper.finalResponse();

Migration Checklist

A phased migration approach:

Phase 1: Stop creating new Assistants
  - Create all new agents with Responses API
  - Keep existing Assistants running for in-flight threads

Phase 2: Migrate instructions
  - Create equivalent Prompts in the dashboard (or use inline instructions)
  - Validate instruction equivalence with output comparison

Phase 3: Migrate conversation history
  - Export Thread messages using:
    client.beta.threads.messages.list(threadId)
  - Store in your database as message arrays
  - Use conversation_id (Conversations API) or pass history inline

Phase 4: Migrate tool configurations
  - file_search: update vector_store_ids syntax
  - code_interpreter: largely compatible
  - function tools: update parameter format
  - Custom tools: rewrite as MCP servers if reusable

Phase 5: Migrate Run polling to streaming
  - Replace createAndPoll() with stream: true
  - Update tool call handling loop

Cost Impact

The cache utilization improvements in the Responses API directly affect cost:

Internal OpenAI tests show 40–80% better cache utilization vs Chat Completions.
Caching kicks in when the same prefix appears in multiple requests.

For high-volume apps:
  Example: 1M requests/day, $0.0025/1K input tokens, 2K token prompts

  Without caching: 1M × 2K × $0.0025/1K = $5,000/day
  With 60% cache hit: 0.4 × $5,000 + 0.6 × ($5,000 × 0.1) = $2,300/day
  Savings: $2,700/day on input tokens alone

Use consistent prompt prefixes (same system instructions, same context format) to maximize cache hit rates.
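The arithmetic above as a reusable helper — the 10% cached-token price is the assumption from the worked example, not a universal rate (cached-input discounts vary by model):

```typescript
// Daily input-token cost with prompt caching.
// Cached tokens are billed at `cachedPriceFraction` of the base rate
// (0.1 here, matching the example's assumption).
function dailyInputCost(
  requestsPerDay: number,
  tokensPerRequest: number,
  pricePer1K: number,
  cacheHitRate: number,
  cachedPriceFraction = 0.1,
): number {
  const base = (requestsPerDay * tokensPerRequest * pricePer1K) / 1000;
  return (1 - cacheHitRate) * base + cacheHitRate * base * cachedPriceFraction;
}

dailyInputCost(1_000_000, 2000, 0.0025, 0);   // ≈ $5,000/day, no caching
dailyInputCost(1_000_000, 2000, 0.0025, 0.6); // ≈ $2,300/day at 60% hit rate
```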


What You Don't Need to Migrate

Some things aren't changing:

  • Vector stores: The same vector store IDs work in both APIs
  • File uploads: Files uploaded to OpenAI storage work in both
  • Models: Same model identifiers (gpt-4o, o3, etc.)
  • Embeddings: Embeddings API is separate, unaffected
  • Fine-tuned models: Work in both APIs

Comparison Table

Feature                        Assistants API                        Responses API
State management               Server-side (Threads)                 Client-side or Conversations API
Agent config                   Persistent Assistant objects          Inline instructions or dashboard Prompts
API objects                    Assistants, Threads, Runs, Messages   Single Response object
Streaming                      Via SSE with event types              Native, cleaner event types
Function calling               Submit outputs to active Run          Pass function_call_output in new request
File search                    ✅                                    ✅
Code interpreter               ✅                                    ✅
Computer Use                   ❌                                    ✅
MCP servers                    ❌                                    ✅
Deep Research                  ❌                                    ✅
Cache utilization              Baseline                              40–80% improvement
Create agent config via API    ✅                                    ❌ (dashboard only)
Deprecation date               August 26, 2026                       Current standard

Timeline for Migration

March 2026 (now):      Start migrating — 5 months until deadline
April–May 2026:        Migrate active projects
June 2026:             Drain remaining Threads
July 2026:             Final testing
August 26, 2026:       Assistants API shuts down
February 2027:         Azure OpenAI Assistants API shuts down

The migration is not trivial for apps with complex multi-turn conversation flows, but the Responses API is genuinely better: fewer moving parts, better cost efficiency, and a path to Computer Use and MCP integration that the Assistants API never gets.


Browse all AI APIs and compare pricing at APIScout.

Related: Vercel AI SDK vs LangChain vs Raw API Calls · OpenAI vs Anthropic API 2026
