
OpenAI Assistants API: When to Use It (and When Not To)

APIScout Team

Tags: openai · assistants-api · ai-api · threads · file-search · code-interpreter · 2026

TL;DR

Don't use the Assistants API unless you specifically need its built-in tools. For most use cases, the regular chat.completions API is faster, cheaper, more flexible, and easier to debug. The Assistants API shines in three scenarios: when you want OpenAI to manage conversation state (threads) across many concurrent users without building your own storage, when you need built-in File Search (RAG) over uploaded documents, or when you need the built-in Code Interpreter for running actual Python code. For everything else, chat.completions wins.

Key Takeaways

  • Assistants API: stateful threads, file search (vector store), code interpreter, but ~2x slower
  • When to use: document Q&A with file upload, Python code execution, managing many concurrent long-running conversations
  • When NOT to use: simple chatbots, latency-sensitive apps, apps where you already manage conversation state
  • Cost surprise: Assistants API adds thread storage costs ($0.10/GB/day) + tool costs on top of model costs
  • File search: built-in RAG — upload PDFs/docs, Assistants handles chunking, embedding, retrieval
  • Code Interpreter: runs real Python in sandboxed environment, handles CSVs, generates charts

The Core Difference

Regular chat.completions:
  → You manage conversation history (array of messages)
  → You send full history on each request
  → Simple, fast, transparent
  → Handle your own state (database, cache, etc.)

Assistants API:
  → OpenAI manages conversation history (Threads)
  → You add messages to a Thread, run it, get responses
  → Built-in tools (File Search, Code Interpreter)
  → Extra latency for thread operations

For a simple chatbot serving 1,000 concurrent users, chat.completions with Redis for session state will outperform Assistants API in every way. The Assistants API is for when you want to offload the infrastructure.


Core Concepts

Assistant → A configured AI agent (model + instructions + tools)
Thread    → A conversation session (stores message history)
Message   → A message added to a thread
Run       → Execute an assistant against a thread
Run Step  → Individual steps the assistant took (tool calls, etc.)

Setup: Create an Assistant

import fs from 'node:fs';  // used by the file-upload examples below
import OpenAI from 'openai';

const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

// Create once, reuse everywhere:
const assistant = await openai.beta.assistants.create({
  name: 'Support Agent',
  instructions: `You are a helpful customer support agent for Acme Corp.
  Always be polite and professional.
  If you don't know the answer, say so and offer to escalate.
  Use the provided documentation to answer questions accurately.`,
  model: 'gpt-4o',
  tools: [
    { type: 'file_search' },        // Built-in document search
    { type: 'code_interpreter' },   // Run Python code
  ],
});

console.log('Assistant ID:', assistant.id);
// Store this ID — you reuse it, don't recreate it every request

Basic Conversation Flow

// Create a thread for a new conversation:
const thread = await openai.beta.threads.create();

// Add a message:
await openai.beta.threads.messages.create(thread.id, {
  role: 'user',
  content: 'How do I reset my password?',
});

// Run the assistant (this does the actual LLM call):
const run = await openai.beta.threads.runs.create(thread.id, {
  assistant_id: assistant.id,
});

// Poll until complete:
let completedRun = await openai.beta.threads.runs.poll(thread.id, run.id);

// Get the response:
const messages = await openai.beta.threads.messages.list(thread.id);
const lastMessage = messages.data[0];  // Most recent first
console.log(lastMessage.content[0].text.value);

// Streaming version (better UX):
const stream = await openai.beta.threads.runs.stream(thread.id, {
  assistant_id: assistant.id,
});

// Process stream events:
for await (const event of stream) {
  if (event.event === 'thread.message.delta') {
    const delta = event.data.delta.content?.[0];
    if (delta?.type === 'text') {
      process.stdout.write(delta.text?.value ?? '');
    }
  }
}

const finalRun = await stream.getFinalRun();
console.log('Run status:', finalRun.status);  // 'completed'
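Note that a run doesn't always finish as 'completed' — it can also end 'failed', 'expired', 'cancelled', 'incomplete', or 'requires_action' (when you've registered your own function tools). A minimal dispatcher sketch; the action labels ('retry', etc.) are our own, not SDK values:

```typescript
// Terminal run statuses, per the Assistants API.
type RunStatus =
  | 'completed' | 'failed' | 'cancelled' | 'expired'
  | 'requires_action' | 'incomplete';

// Decide what to do once polling or streaming finishes.
export function nextAction(
  status: RunStatus,
): 'read_messages' | 'submit_tools' | 'retry' | 'give_up' {
  switch (status) {
    case 'completed':
      return 'read_messages'; // fetch messages.list and show the reply
    case 'requires_action':
      return 'submit_tools';  // run your function tools, then submitToolOutputs
    case 'failed':
    case 'expired':
      return 'retry';         // often transient; retry with backoff
    default:
      return 'give_up';       // cancelled / incomplete
  }
}
```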

File Search: Built-In Document RAG

This is the strongest use case for the Assistants API — upload documents and ask questions without building your own RAG pipeline.

// Step 1: Create a vector store:
const vectorStore = await openai.beta.vectorStores.create({
  name: 'Product Documentation',
});

// Step 2: Upload files to the vector store:
const fileStream = fs.createReadStream('product-manual.pdf');
await openai.beta.vectorStores.files.uploadAndPoll(vectorStore.id, fileStream);

// Or upload multiple files:
const fileStreams = ['manual.pdf', 'faq.md', 'pricing.txt'].map((path) =>
  fs.createReadStream(path)
);
await openai.beta.vectorStores.fileBatches.uploadAndPoll(vectorStore.id, {
  files: fileStreams,
});

// Step 3: Attach vector store to assistant:
await openai.beta.assistants.update(assistant.id, {
  tool_resources: {
    file_search: {
      vector_store_ids: [vectorStore.id],
    },
  },
});

// Now conversations automatically search your documents:
await openai.beta.threads.messages.create(thread.id, {
  role: 'user',
  content: 'What is the warranty period for the Pro model?',
});

const run = await openai.beta.threads.runs.createAndPoll(thread.id, {
  assistant_id: assistant.id,
});

const messages = await openai.beta.threads.messages.list(thread.id);
const response = messages.data[0];

// Response includes citations with source file references:
const content = response.content[0];
if (content.type === 'text') {
  console.log(content.text.value);
  // "The Pro model comes with a 2-year warranty [1]."

  // Annotations show which files were cited:
  content.text.annotations?.forEach((annotation) => {
    if (annotation.type === 'file_citation') {
      console.log(`Citation: ${annotation.file_citation.file_id}`);
    }
  });
}
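If you render the reply verbatim, the citation markers appear as raw tokens (e.g. `【4:0†source】`). A small helper sketch that swaps each marker for a numbered footnote — the `FileCitation` shape below is abbreviated from the SDK's annotation type, and `annotation.text` (the marker string as it appears in the reply) is what we match on:

```typescript
type FileCitation = {
  type: 'file_citation';
  text: string; // the raw marker as it appears in the reply text
  file_citation: { file_id: string };
};

// Replace each raw marker with a numbered footnote, and return the
// cited file IDs so you can render file names alongside the answer.
export function formatCitations(
  value: string,
  annotations: FileCitation[],
): { text: string; sources: string[] } {
  const sources: string[] = [];
  let text = value;
  annotations.forEach((a, i) => {
    sources.push(a.file_citation.file_id);
    text = text.replace(a.text, ` [${i + 1}]`);
  });
  return { text, sources };
}
```

You would still need one `files.retrieve` call per file ID to map `file-abc…` back to a human-readable filename.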

File Search pricing:

  • Vector store storage: $0.10/GB/day
  • File search tool call: charged per run step (~$0.001-0.01 per query depending on model)
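To make the storage line item concrete, a quick back-of-envelope helper (it ignores any free storage allowance OpenAI grants, so treat the result as an upper bound):

```typescript
// Vector store storage cost at $0.10/GB/day.
export function monthlyStorageUSD(gigabytes: number, days = 30): number {
  return gigabytes * 0.1 * days;
}
```

Five gigabytes of uploaded documents works out to $15/month in storage alone — before any model or tool-call costs.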

Code Interpreter: Real Python Execution

Code Interpreter runs actual Python in an OpenAI sandbox. Useful for data analysis, chart generation, math.

// Create thread with a file to analyze:
const fileStream = fs.createReadStream('sales-data.csv');
const file = await openai.files.create({
  file: fileStream,
  purpose: 'assistants',
});

const thread = await openai.beta.threads.create({
  messages: [
    {
      role: 'user',
      content: 'Analyze the sales data and create a chart showing monthly trends.',
      attachments: [
        {
          file_id: file.id,
          tools: [{ type: 'code_interpreter' }],
        },
      ],
    },
  ],
});

const run = await openai.beta.threads.runs.createAndPoll(thread.id, {
  assistant_id: assistant.id,
});

// Retrieve the response (may include image outputs):
const messages = await openai.beta.threads.messages.list(thread.id);
const response = messages.data[0];

for (const block of response.content) {
  if (block.type === 'text') {
    console.log(block.text.value);
  } else if (block.type === 'image_file') {
    // Download the generated chart:
    const imageContent = await openai.files.content(block.image_file.file_id);
    fs.writeFileSync('chart.png', Buffer.from(await imageContent.arrayBuffer()));
    console.log('Chart saved to chart.png');
  }
}

Code Interpreter pricing: $0.03 per session, not per run step — a session is reused across run steps in the same thread (for up to about an hour), so repeated tool calls within one conversation share a single charge


When Assistants API Makes Sense

✅ Use Assistants API for:

1. Document Q&A product
   → Users upload their own PDFs/docs
   → Each user gets their own vector store
   → You want OpenAI to handle the RAG pipeline

2. Data analysis tool
   → Users upload CSV/Excel files
   → Need code interpreter to run analysis
   → Don't want to build Python execution infrastructure

3. High-concurrency applications
   → 10,000+ concurrent conversations
   → Don't want to manage your own thread storage
   → Willing to pay OpenAI for the state management

4. Very long-running conversations
   → Conversations spanning days/weeks
   → Thread history automatically maintained
   → No TTL management on your end

❌ Don't use Assistants API for:

1. Simple chatbots
   → chat.completions is faster, cheaper, easier to debug

2. Latency-sensitive applications
   → Assistants API is ~2x slower than raw completions
   → Thread operations add extra round trips

3. Custom RAG pipelines
   → You can't control chunking strategy
   → You can't use custom embedding models
   → pgvector + your own pipeline gives more control

4. When you need full message control
   → Assistants API batches responses
   → Can't easily inject intermediate messages

5. Cost-optimized applications
   → Thread storage adds $0.10/GB/day
   → Equivalent chat.completions is cheaper at scale

Managing Threads at Scale

// Thread management best practices:
class ThreadManager {
  private cache = new Map<string, string>();  // userId → threadId

  async getOrCreateThread(userId: string): Promise<string> {
    // Check cache first:
    if (this.cache.has(userId)) {
      return this.cache.get(userId)!;
    }

    // Check database:
    const existing = await db.userThread.findUnique({ where: { userId } });
    if (existing) {
      this.cache.set(userId, existing.threadId);
      return existing.threadId;
    }

    // Create new thread:
    const thread = await openai.beta.threads.create();
    await db.userThread.create({
      data: { userId, threadId: thread.id, createdAt: new Date() },
    });
    this.cache.set(userId, thread.id);
    return thread.id;
  }

  async deleteOldThreads(daysOld: number) {
    // Clean up threads older than N days to save storage costs:
    const cutoff = new Date(Date.now() - daysOld * 24 * 60 * 60 * 1000);
    const old = await db.userThread.findMany({
      where: { createdAt: { lt: cutoff } },
    });

    for (const thread of old) {
      await openai.beta.threads.del(thread.threadId);
      await db.userThread.delete({ where: { id: thread.id } });
      this.cache.delete(thread.userId);  // drop the stale cache entry too
    }
  }
}

Compare all AI APIs including OpenAI Assistants at APIScout.
