
Embedding Models Compared: OpenAI vs Cohere vs Voyage vs Open Source 2026

APIScout Team

Tags: embeddings · openai · cohere · voyage-ai · rag · vector-search · 2026

TL;DR

For most RAG and semantic search tasks: text-embedding-3-small from OpenAI wins on value — excellent performance, 1536 dimensions, $0.02/1M tokens. For maximum quality: Voyage AI's voyage-3-large leads the MTEB leaderboard. For zero-cost self-hosted: nomic-embed-text-v1.5 (via Ollama) is surprisingly competitive. Cohere's embed-v3 excels at multilingual and classification. The performance gap between these models is smaller than the gap between chunking strategies — your RAG pipeline's chunking and retrieval logic matters more than model choice.

Key Takeaways

  • OpenAI text-embedding-3-small: best value, 62.3 MTEB, $0.02/1M tokens, 1536 dims
  • OpenAI text-embedding-3-large: higher quality, 64.6 MTEB, $0.13/1M tokens, 3072 dims
  • Voyage AI voyage-3-large: top MTEB score 68.2, best raw quality, $0.12/1M tokens
  • Cohere embed-v3: multilingual (100+ languages), classification-optimized, $0.10/1M tokens
  • nomic-embed-text-v1.5: free self-hosted, 62.4 MTEB (beats OpenAI small!), 768 dims
  • Matryoshka embeddings: OpenAI, Voyage, and nomic support truncating output dimensions, cutting vector storage (and vector-DB cost) 4-6x with minimal quality loss

MTEB Benchmark Scores (2026)

MTEB (Massive Text Embedding Benchmark) is the standard leaderboard for embedding models across 56 tasks:

| Model                        | MTEB score | Dimensions | Cost / 1M tokens   | Context |
|------------------------------|------------|------------|--------------------|---------|
| voyage-3-large               | 68.2       | 1024       | $0.12              | 32K     |
| voyage-3                     | 65.1       | 1024       | $0.06              | 32K     |
| text-embedding-3-large       | 64.6       | 3072       | $0.13              | 8K      |
| Cohere embed-english-v3.0    | 64.5       | 1024       | $0.10              | 512     |
| nomic-embed-text-v1.5        | 62.4       | 768        | Free (self-hosted) | 8K      |
| text-embedding-3-small       | 62.3       | 1536       | $0.02              | 8K      |
| text-embedding-ada-002       | 61.0       | 1536       | $0.10              | 8K      |
| Cohere embed-multilingual-v3 | 60.1       | 1024       | $0.10              | 512     |

Don't over-optimize on MTEB. A 2-point MTEB difference rarely matters in practice as much as retrieval strategy, chunk size, or query preprocessing.
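Whichever model you choose, downstream scoring works the same way: compare vectors with cosine similarity. A provider-agnostic sketch:

```typescript
// Cosine similarity between two embedding vectors.
// Works on the raw number[] returned by any provider below.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error('dimension mismatch');
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```

Identical vectors score 1, orthogonal vectors 0. Most vector databases implement exactly this (or plain dot product on normalized vectors) natively, so in production you rarely write it yourself.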


OpenAI Embeddings: Default Choice

import OpenAI from 'openai';

const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

// Single embedding:
async function embed(text: string): Promise<number[]> {
  const response = await openai.embeddings.create({
    model: 'text-embedding-3-small',
    input: text.replace(/\n/g, ' '),  // Newlines hurt quality
  });
  return response.data[0].embedding;
}

// Batch embeddings (much more efficient):
async function embedBatch(texts: string[]): Promise<number[][]> {
  const response = await openai.embeddings.create({
    model: 'text-embedding-3-small',
    input: texts.map((t) => t.replace(/\n/g, ' ')),
  });
  // Response preserves order:
  return response.data.map((d) => d.embedding);
}
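One caveat with batching: a single embeddings request accepts a limited number of inputs (OpenAI documents 2,048 inputs per request at the time of writing; verify current limits). A small hypothetical `chunk` helper keeps large corpora under the cap:

```typescript
// Split a large array into request-sized batches.
// OpenAI's embeddings endpoint caps the number of inputs per
// request, so large corpora need chunked submission.
function chunk<T>(items: T[], size: number): T[][] {
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Usage sketch with an embedBatch() helper like the one above:
// const vectors: number[][] = [];
// for (const batch of chunk(allTexts, 2048)) {
//   vectors.push(...(await embedBatch(batch)));
// }
```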

Matryoshka: Truncate Dimensions for Cost

OpenAI's v3 models support Matryoshka Representation Learning — you can truncate the output dimensions with minimal quality loss:

// Full dimensions: 1536 (text-embedding-3-small)
// Truncated: 256 dims — 6x smaller, ~1.5% quality loss

const response = await openai.embeddings.create({
  model: 'text-embedding-3-small',
  input: text,
  dimensions: 256,  // Truncate to 256 dims
});

// Storage savings: 1536 float32 = 6KB per vector
//                  256 float32 = 1KB per vector
// At 1M documents: 6GB vs 1GB — huge difference for pgvector

// Useful dimensions for text-embedding-3-small:
// 1536 — full quality (default)
// 512  — good quality, 3x smaller
// 256  — acceptable quality, 6x smaller
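One caveat from OpenAI's docs: passing `dimensions` normalizes the truncated vector server-side, but if you slice a stored full-length vector yourself, the result is no longer unit-length and must be re-normalized before similarity search. A sketch (`truncateEmbedding` is an illustrative helper, not part of any SDK):

```typescript
// Truncate a Matryoshka embedding client-side.
// Slicing breaks the unit norm, so re-normalize before cosine
// or dot-product comparisons.
function truncateEmbedding(vec: number[], dims: number): number[] {
  const sliced = vec.slice(0, dims);
  const norm = Math.sqrt(sliced.reduce((sum, x) => sum + x * x, 0));
  return sliced.map((x) => x / norm);
}
```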

Voyage AI: Highest Quality

// npm install voyageai
import { VoyageAIClient } from 'voyageai';

const voyage = new VoyageAIClient({ apiKey: process.env.VOYAGE_API_KEY });

// Basic embedding:
const response = await voyage.embed({
  input: ['How do I connect to a database?'],
  model: 'voyage-3-large',
});

const embedding = response.data[0].embedding;  // 1024 dims

// Batch:
const batchResponse = await voyage.embed({
  input: texts,
  model: 'voyage-3-large',
  inputType: 'document',  // 'document' for corpus, 'query' for search queries
  truncation: true,       // Truncate at the context limit instead of erroring
});

Voyage Asymmetric Embeddings

Voyage supports asymmetric search — different representations for documents vs queries:

// When indexing documents:
const docEmbeddings = await voyage.embed({
  input: documentTexts,
  model: 'voyage-3',
  inputType: 'document',  // Optimized for documents
});

// When searching:
const queryEmbedding = await voyage.embed({
  input: [userQuery],
  model: 'voyage-3',
  inputType: 'query',     // Optimized for queries
});

// This matters more than it seems:
// A query "how to connect" and a document "database connection tutorial"
// mean the same thing but look different textually
// Asymmetric models bridge this gap better
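With documents and queries embedded under matching input types, retrieval is just a similarity ranking. A naive in-memory sketch (`Doc` and `topK` are illustrative names; this assumes unit-normalized vectors, so dot product equals cosine similarity):

```typescript
interface Doc {
  id: string;
  embedding: number[];  // unit-normalized embedding vector
}

// Rank documents by dot product against a query embedding
// and return the k best matches.
function topK(query: number[], docs: Doc[], k: number): Doc[] {
  const dot = (a: number[], b: number[]) =>
    a.reduce((sum, x, i) => sum + x * b[i], 0);
  return [...docs]
    .sort((a, b) => dot(query, b.embedding) - dot(query, a.embedding))
    .slice(0, k);
}
```

At real scale you would push this into pgvector, Qdrant, or similar, but the ranking logic those systems run is the same.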

Voyage models:

  • voyage-3-large: highest quality, $0.12/1M
  • voyage-3: balanced, $0.06/1M
  • voyage-3-lite: fast and cheap, $0.02/1M — competitive with OpenAI small
  • voyage-code-3: optimized for code search, $0.18/1M

Cohere Embeddings: Multilingual + Classification

// npm install cohere-ai
import { CohereClient } from 'cohere-ai';

const cohere = new CohereClient({ token: process.env.COHERE_API_KEY });

// English embeddings:
const response = await cohere.v2.embed({
  texts: [
    'What is machine learning?',
    'Explain neural networks',
  ],
  model: 'embed-english-v3.0',
  inputType: 'search_document',  // For indexing; 'search_query' for querying
  embeddingTypes: ['float'],
});

const embeddings = response.embeddings.float!;
// Multilingual — 100+ languages:
const multilingualResponse = await cohere.v2.embed({
  texts: [
    'How are you?',         // English
    '¿Cómo estás?',        // Spanish
    'Comment allez-vous?',  // French
    '你好吗?',               // Chinese
  ],
  model: 'embed-multilingual-v3.0',
  inputType: 'search_document',
  embeddingTypes: ['float'],
});

// All 4 texts will have similar embeddings for similar meanings
// Great for global apps where users query in different languages
// Cohere int8 embeddings — 4x smaller, minimal quality loss:
const int8Response = await cohere.v2.embed({
  texts: documents,
  model: 'embed-english-v3.0',
  inputType: 'search_document',
  embeddingTypes: ['int8'],  // 1024 int8 vs 1024 float32 = 4x smaller
});

// Great for: very large corpora where storage is a concern
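For ranking, int8 vectors can be scored directly with an integer dot product; relative ordering is preserved without dequantizing back to float. A minimal sketch (`dotInt8` is an illustrative helper):

```typescript
// Dot product over int8-quantized embeddings.
// 1024 int8 values = 1KB per vector vs 4KB for float32.
function dotInt8(a: Int8Array, b: Int8Array): number {
  if (a.length !== b.length) throw new Error('dimension mismatch');
  let sum = 0;
  for (let i = 0; i < a.length; i++) {
    sum += a[i] * b[i];
  }
  return sum;
}
```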

Open Source: nomic-embed via Ollama (Free)

# Install Ollama, then pull the model:
ollama pull nomic-embed-text

# Serve (already running with `ollama serve`):
# http://localhost:11434

// nomic-embed-text via Ollama API (TypeScript):
async function embedWithOllama(text: string): Promise<number[]> {
  const response = await fetch('http://localhost:11434/api/embeddings', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      model: 'nomic-embed-text',
      prompt: text,
    }),
  });
  const data = await response.json();
  return data.embedding;  // 768 dimensions
}

// Or use Ollama's OpenAI-compatible endpoint (with the OpenAI SDK imported earlier):
const openaiCompatible = new OpenAI({
  baseURL: 'http://localhost:11434/v1',
  apiKey: 'ollama',  // Ignored but required
});

const response = await openaiCompatible.embeddings.create({
  model: 'nomic-embed-text',
  input: text,
});

nomic-embed-text-v1.5 supports Matryoshka dimensions (768, 512, 256, 128, 64) — same feature as OpenAI.

Use open source embeddings when:

  • Running locally (dev, privacy-sensitive data)
  • Batch processing large corpora ($0 vs $200+ for 10B tokens)
  • Self-hosted infrastructure where cloud API calls are not possible
  • Experimenting without API costs

Cost at Scale

Embedding 10M documents (avg 200 tokens each = 2B tokens):

| Model                  | Cost | Notes                       |
|------------------------|------|-----------------------------|
| text-embedding-3-small | $40  | Best value commercial       |
| voyage-3-lite          | $40  | Competitive alternative     |
| text-embedding-ada-002 | $200 | Old model, avoid            |
| text-embedding-3-large | $260 | Only if quality is critical |
| nomic-embed-text       | $0   | Self-hosted                 |

For ongoing use (1M queries/day):

| Model                  | Monthly cost | Notes      |
|------------------------|--------------|------------|
| text-embedding-3-small | ~$1.20       | Very cheap |
| voyage-3               | ~$3.60       | 3x more    |
| text-embedding-3-large | ~$7.80       | 6.5x more  |

Query embedding cost is almost always negligible — focus on ingestion cost.
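The tables above are easy to recompute for your own corpus. A back-of-envelope sketch (`embeddingCostUSD` is an illustrative helper; prices taken from the tables):

```typescript
// Embedding cost = (token count / 1M) * price per 1M tokens.
function embeddingCostUSD(tokens: number, pricePerMillionUSD: number): number {
  return (tokens / 1_000_000) * pricePerMillionUSD;
}

// 10M docs * ~200 tokens = 2B tokens with text-embedding-3-small:
// embeddingCostUSD(2_000_000_000, 0.02) is about $40
```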


Practical Recommendation

Use text-embedding-3-small if:
  → You're starting a new project
  → Budget matters
  → English-only content
  → Good RAG quality is sufficient (vs best quality)

Use voyage-3 if:
  → You need maximum retrieval quality
  → Code search (voyage-code-3)
  → Long documents (32K context vs 8K for OpenAI)

Use cohere embed-multilingual if:
  → Users query in multiple languages
  → Classification alongside search
  → Building a multilingual search engine

Use nomic-embed-text if:
  → Self-hosting is a requirement
  → Processing huge corpora locally
  → Privacy-sensitive documents
  → Development/testing ($0 cost)

Skip text-embedding-ada-002:
  → Legacy model — always use v3 models instead
  → Strictly worse than text-embedding-3-small and costs 5x more

Find and compare embedding APIs at APIScout.
