Embedding Models Compared: OpenAI vs Cohere vs Voyage vs Open Source 2026
TL;DR
For most RAG and semantic search tasks: text-embedding-3-small from OpenAI wins on value — excellent performance, 1536 dimensions, $0.02/1M tokens. For maximum quality: Voyage AI's voyage-3-large leads the MTEB leaderboard. For zero-cost self-hosted: nomic-embed-text-v1.5 (via Ollama) is surprisingly competitive. Cohere's embed-v3 excels at multilingual and classification. The performance gap between these models is smaller than the gap between chunking strategies — your RAG pipeline's chunking and retrieval logic matters more than model choice.
Key Takeaways
- OpenAI text-embedding-3-small: best value, 62.3 MTEB, $0.02/1M tokens, 1536 dims
- OpenAI text-embedding-3-large: higher quality, 64.6 MTEB, $0.13/1M tokens, 3072 dims
- Voyage AI voyage-3-large: top MTEB score 68.2, best raw quality, $0.12/1M tokens
- Cohere embed-v3: multilingual (100+ languages), classification-optimized, $0.10/1M tokens
- nomic-embed-text-v1.5: free self-hosted, 62.4 MTEB (beats OpenAI small!), 768 dims
- Matryoshka embeddings: OpenAI and Voyage support truncating dimensions — save 4x cost without much quality loss
MTEB Benchmark Scores (2026)
MTEB (Massive Text Embedding Benchmark) is the standard leaderboard for embedding models across 56 tasks:
| Model | MTEB Score | Dimensions | Cost/1M tokens | Context |
|---|---|---|---|---|
| voyage-3-large | 68.2 | 1024 | $0.12 | 32K |
| voyage-3 | 65.1 | 1024 | $0.06 | 32K |
| text-embedding-3-large | 64.6 | 3072 | $0.13 | 8K |
| Cohere embed-english-v3.0 | 64.5 | 1024 | $0.10 | 512 |
| nomic-embed-text-v1.5 | 62.4 | 768 | Free (self-hosted) | 8K |
| text-embedding-3-small | 62.3 | 1536 | $0.02 | 8K |
| text-embedding-ada-002 | 61.0 | 1536 | $0.10 | 8K |
| Cohere embed-multilingual-v3 | 60.1 | 1024 | $0.10 | 512 |
Don't over-optimize on MTEB. A 2-point MTEB difference rarely matters in practice as much as retrieval strategy, chunk size, or query preprocessing.
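Since chunking tends to move retrieval quality more than model choice, it's worth getting right before comparing models. As a minimal sketch, here's a fixed-size chunker with overlap; `chunkText` and its character-based defaults (1000 chars, 200 overlap) are illustrative assumptions, not tuned values — production pipelines usually count tokens instead:

```typescript
// Split text into overlapping chunks before embedding.
// Sizes are in characters for simplicity; real pipelines often count
// tokens (e.g. with a tokenizer) and split on sentence boundaries.
function chunkText(text: string, chunkSize = 1000, overlap = 200): string[] {
  if (overlap >= chunkSize) throw new Error('overlap must be smaller than chunkSize');
  const chunks: string[] = [];
  let start = 0;
  while (start < text.length) {
    chunks.push(text.slice(start, start + chunkSize));
    start += chunkSize - overlap; // step forward, keeping `overlap` chars of context
  }
  return chunks;
}
```

The overlap means a sentence split at a chunk boundary still appears whole in at least one chunk, which usually helps retrieval more than a model upgrade.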
OpenAI Embeddings: Default Choice
```typescript
import OpenAI from 'openai';

const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

// Single embedding:
async function embed(text: string): Promise<number[]> {
  const response = await openai.embeddings.create({
    model: 'text-embedding-3-small',
    input: text.replace(/\n/g, ' '), // Newlines hurt quality
  });
  return response.data[0].embedding;
}

// Batch embeddings (much more efficient):
async function embedBatch(texts: string[]): Promise<number[][]> {
  const response = await openai.embeddings.create({
    model: 'text-embedding-3-small',
    input: texts.map((t) => t.replace(/\n/g, ' ')),
  });
  // Response preserves order:
  return response.data.map((d) => d.embedding);
}
```
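Once vectors are stored, retrieval compares them with cosine similarity. OpenAI's docs note their embeddings are normalized to unit length, so a plain dot product gives the same ranking there; a generic version that works for any model (a sketch, not tied to any SDK) looks like this:

```typescript
// Cosine similarity between two vectors: dot(a, b) / (|a| * |b|).
// Returns a value in [-1, 1]; higher means more semantically similar.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error('dimension mismatch');
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```

In practice a vector database (pgvector, Qdrant, etc.) does this for you at query time; the in-process version is mainly useful for tests and small corpora.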
Matryoshka: Truncate Dimensions for Cost
OpenAI's v3 models support Matryoshka Representation Learning — you can truncate the output dimensions with minimal quality loss:
```typescript
// Full dimensions: 1536 (text-embedding-3-small)
// Truncated: 256 dims — 6x smaller, ~1.5% quality loss
const response = await openai.embeddings.create({
  model: 'text-embedding-3-small',
  input: text,
  dimensions: 256, // Truncate to 256 dims
});

// Storage savings: 1536 float32 = 6KB per vector
//                   256 float32 = 1KB per vector
// At 1M documents: 6GB vs 1GB — a huge difference for pgvector

// Useful dimensions for text-embedding-3-small:
//   1536 — full quality (default)
//    512 — good quality, 3x smaller
//    256 — acceptable quality, 6x smaller
```
Voyage AI: Highest Quality
```typescript
// npm install voyageai
import { VoyageAIClient } from 'voyageai';

const voyage = new VoyageAIClient({ apiKey: process.env.VOYAGE_API_KEY });

// Basic embedding:
const response = await voyage.embed({
  input: ['How do I connect to a database?'],
  model: 'voyage-3-large',
});
const embedding = response.data[0].embedding; // 1024 dims

// Batch:
const batchResponse = await voyage.embed({
  input: texts,
  model: 'voyage-3-large',
  inputType: 'document', // 'document' for corpus, 'query' for search queries
  truncation: true, // Truncate at context limit instead of erroring
});
```
Voyage Asymmetric Embeddings
Voyage supports asymmetric search — different representations for documents vs queries:
```typescript
// When indexing documents:
const docEmbeddings = await voyage.embed({
  input: documentTexts,
  model: 'voyage-3',
  inputType: 'document', // Optimized for documents
});

// When searching:
const queryEmbedding = await voyage.embed({
  input: [userQuery],
  model: 'voyage-3',
  inputType: 'query', // Optimized for queries
});
```

This matters more than it seems: the query "how to connect" and the document "database connection tutorial" mean the same thing but look different textually. Asymmetric models bridge that gap better than a single shared representation.
Voyage models:
- voyage-3-large: highest quality, $0.12/1M
- voyage-3: balanced, $0.06/1M
- voyage-3-lite: fast and cheap, $0.02/1M — competitive with OpenAI small
- voyage-code-3: optimized for code search, $0.18/1M
Cohere Embeddings: Multilingual + Classification
```typescript
// npm install cohere-ai
import { CohereClient } from 'cohere-ai';

const cohere = new CohereClient({ token: process.env.COHERE_API_KEY });

// English embeddings:
const response = await cohere.v2.embed({
  texts: ['What is machine learning?', 'Explain neural networks'],
  model: 'embed-english-v3.0',
  inputType: 'search_document', // For indexing; 'search_query' for querying
  embeddingTypes: ['float'],
});
const embeddings = response.embeddings.float!;

// Multilingual — 100+ languages:
const multilingualResponse = await cohere.v2.embed({
  texts: [
    'How are you?', // English
    '¿Cómo estás?', // Spanish
    'Comment allez-vous?', // French
    '你好吗?', // Chinese
  ],
  model: 'embed-multilingual-v3.0',
  inputType: 'search_document',
  embeddingTypes: ['float'],
});
// All four texts get similar embeddings for similar meanings —
// great for global apps where users query in different languages.

// Cohere int8 embeddings — 4x smaller, minimal quality loss:
const int8Response = await cohere.v2.embed({
  texts: documents,
  model: 'embed-english-v3.0',
  inputType: 'search_document',
  embeddingTypes: ['int8'], // 1024 int8 vs 1024 float32 = 4x smaller
});
// Great for very large corpora where storage is a concern.
```
Open Source: nomic-embed via Ollama (Free)
```bash
# Install Ollama, then pull the model:
ollama pull nomic-embed-text

# The server listens at http://localhost:11434 (started with `ollama serve`)
```
```typescript
// nomic-embed-text via Ollama API:
async function embedWithOllama(text: string): Promise<number[]> {
  const response = await fetch('http://localhost:11434/api/embeddings', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      model: 'nomic-embed-text',
      prompt: text,
    }),
  });
  const data = await response.json();
  return data.embedding; // 768 dimensions
}

// Or use Ollama's OpenAI-compatible endpoint:
import OpenAI from 'openai';

const openaiCompatible = new OpenAI({
  baseURL: 'http://localhost:11434/v1',
  apiKey: 'ollama', // Ignored by Ollama, but the client requires a value
});
const response = await openaiCompatible.embeddings.create({
  model: 'nomic-embed-text',
  input: text,
});
```
nomic-embed-text-v1.5 supports Matryoshka dimensions (768, 512, 256, 128, 64) — same feature as OpenAI.
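Unlike OpenAI's API, the Ollama embeddings endpoint shown above has no `dimensions` parameter, so Matryoshka truncation happens client-side. A common approach (a sketch; `truncateEmbedding` is a hypothetical helper, and this simple slice-and-renormalize is an approximation of the model's full Matryoshka recipe) is:

```typescript
// Truncate a Matryoshka embedding to its first `dims` components and
// re-normalize to unit length so cosine/dot-product scores stay comparable.
function truncateEmbedding(embedding: number[], dims: number): number[] {
  const sliced = embedding.slice(0, dims);
  const norm = Math.sqrt(sliced.reduce((sum, x) => sum + x * x, 0));
  return sliced.map((x) => x / norm);
}
```

Truncate documents and queries to the same dimension count, and store only the truncated vectors to realize the storage savings.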
Use open source embeddings when:
- Running locally (dev, privacy-sensitive data)
- Batch processing large corpora ($0 vs $200+ for 10B tokens)
- Self-hosted infrastructure where cloud API calls are not possible
- Experimenting without API costs
Cost at Scale
Embedding 10M documents (avg 200 tokens each = 2B tokens):
| Model | Cost | Notes |
|---|---|---|
| text-embedding-3-small | $40 | Best value commercial |
| voyage-3-lite | $40 | Competitive alternative |
| text-embedding-ada-002 | $200 | Old model, avoid |
| text-embedding-3-large | $260 | Only if quality critical |
| nomic-embed-text | $0 | Self-hosted |
For ongoing query embedding (1M queries/day at roughly 20 tokens per query, about 600M tokens/month):
| Model | Monthly Cost | Notes |
|---|---|---|
| text-embedding-3-small | ~$12 | Very cheap |
| voyage-3 | ~$36 | 3x more |
| text-embedding-3-large | ~$78 | 6.5x more |
Query embedding cost is almost always negligible — focus on ingestion cost.
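The ingestion figures above follow from a one-line calculation. A tiny helper (`embeddingCost` is a hypothetical name; the prices are the per-1M-token rates from the tables above) makes it easy to re-run for your own corpus:

```typescript
// Cost in USD to embed `tokens` tokens at `pricePerMillion` USD per 1M tokens.
function embeddingCost(tokens: number, pricePerMillion: number): number {
  return (tokens / 1_000_000) * pricePerMillion;
}

// 10M docs x 200 tokens = 2B tokens:
const corpusTokens = 10_000_000 * 200;
const smallCost = embeddingCost(corpusTokens, 0.02); // $40, text-embedding-3-small
const largeCost = embeddingCost(corpusTokens, 0.13); // $260, text-embedding-3-large
```

Run the same numbers for your real average document length before committing — token counts per document vary far more between corpora than prices vary between the cheap models.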
Practical Recommendation
Use text-embedding-3-small if:
→ You're starting a new project
→ Budget matters
→ English-only content
→ Good RAG quality is sufficient (vs best quality)
Use voyage-3 if:
→ You need maximum retrieval quality
→ Code search (voyage-code-3)
→ Long documents (32K context vs 8K for OpenAI)
Use cohere embed-multilingual if:
→ Users query in multiple languages
→ Classification alongside search
→ Building a multilingual search engine
Use nomic-embed-text if:
→ Self-hosting is a requirement
→ Processing huge corpora locally
→ Privacy-sensitive documents
→ Development/testing ($0 cost)
Skip text-embedding-ada-002:
→ Legacy model — always use v3 models instead
→ Strictly worse than text-embedding-3-small and costs 5x more
Find and compare embedding APIs at APIScout.