Building a RAG Pipeline in 2026: Pinecone vs Weaviate vs pgvector
TL;DR
For most teams: pgvector. It runs in your existing Postgres database, costs nothing extra, and handles millions of vectors comfortably. Pinecone wins when you need a fully managed, scale-to-zero vector store with zero operational overhead. Weaviate wins when you want AI-native features (built-in vectorization, hybrid BM25+vector search, graph traversal) without running your own model. The "best" vector database in 2026 depends almost entirely on your existing stack — don't introduce a new service if Postgres already works.
Key Takeaways
- pgvector: free, Postgres-native, handles 1M+ vectors easily, no new infra
- Pinecone: best managed vector DB, auto-scaling, free tier (100K vectors), consistently low-latency search
- Weaviate: best AI-native features (HNSW + BM25 hybrid), multimodal, self-host or cloud
- Embedding models: text-embedding-3-small (OpenAI, cheap) or nomic-embed-text (open source, free)
- RAG latency: pgvector ~10-50ms, Pinecone ~10-20ms, Weaviate ~5-20ms
- Rule of thumb: <500K vectors → pgvector, 500K-10M → Pinecone, complex queries → Weaviate
The RAG Architecture
Before comparing databases, understand the full pipeline:
Document Ingestion:
Raw docs → Chunk → Embed → Store in vector DB
(Run once, or incrementally as content changes)
Query Time:
User query → Embed query → Similarity search → Get top-K chunks
↓
Inject into LLM prompt
↓
LLM generates answer
All three vector databases handle the "Store" and "Similarity search" steps. The embedding step is the same regardless of which you use (Weaviate can also generate embeddings for you, covered below).
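One way to picture that: everything outside a thin store interface is shared. A minimal sketch (the interface shape here is illustrative, not any library's API):
// The only part that changes between pgvector, Pinecone, and Weaviate:
interface VectorStore {
  upsert(docs: Array<{ id: string; content: string; embedding: number[] }>): Promise<void>;
  search(queryEmbedding: number[], topK: number): Promise<Array<{ content: string; score: number }>>;
}
// Chunking, embedding, and prompt assembly stay identical across all three.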
Setting Up Embeddings (Same for All Three)
// embeddings.ts — Generate embeddings with OpenAI:
import OpenAI from 'openai';
const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });
export async function embed(text: string): Promise<number[]> {
const response = await openai.embeddings.create({
model: 'text-embedding-3-small', // 1536 dims, $0.02/1M tokens
// model: 'text-embedding-3-large', // 3072 dims, $0.13/1M tokens
input: text.replace(/\n/g, ' '),
});
return response.data[0].embedding;
}
// Batch embeddings (more efficient):
export async function embedBatch(texts: string[]): Promise<number[][]> {
const response = await openai.embeddings.create({
model: 'text-embedding-3-small',
input: texts.map((t) => t.replace(/\n/g, ' ')),
});
return response.data.map((d) => d.embedding);
}
// Chunking strategy (critical for RAG quality):
export function chunkDocument(content: string, chunkSize = 500, overlap = 50): string[] {
const words = content.split(/\s+/);
const chunks: string[] = [];
for (let i = 0; i < words.length; i += chunkSize - overlap) {
const chunk = words.slice(i, i + chunkSize).join(' ');
if (chunk.length > 100) { // Skip fragments under ~100 characters
chunks.push(chunk);
}
}
return chunks;
}
// For code documentation — split by function/class, not words
// For PDFs — split by page or paragraph
// For markdown — split by heading sections
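As a concrete example, here's a minimal markdown-aware splitter that reuses chunkDocument as a fallback (a sketch; the heading regex and word threshold are illustrative):
// Markdown-aware chunking: split on headings first, fall back to word chunks.
export function chunkMarkdown(markdown: string, maxWords = 500): string[] {
  // Split before each heading line so every section keeps its heading
  const sections = markdown.split(/(?=^#{1,6}\s)/m);
  return sections
    .flatMap((section) =>
      section.split(/\s+/).length <= maxWords
        ? [section.trim()]
        : chunkDocument(section, maxWords) // reuse the word-based chunker above
    )
    .filter((s) => s.length > 0);
}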
pgvector: RAG in Postgres
Best choice when: you're already using Postgres (Supabase, Neon, Railway, self-hosted)
Setup
-- Enable the extension (Supabase: already enabled by default):
CREATE EXTENSION IF NOT EXISTS vector;
-- Create documents table:
CREATE TABLE documents (
id BIGSERIAL PRIMARY KEY,
content TEXT NOT NULL,
metadata JSONB DEFAULT '{}',
embedding VECTOR(1536), -- Match your model's dimensions
created_at TIMESTAMPTZ DEFAULT NOW()
);
-- Create HNSW index (best for most cases):
CREATE INDEX ON documents
USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 64);
-- OR IVFFlat (faster to build, lighter on memory; recall depends on probes):
-- CREATE INDEX ON documents
-- USING ivfflat (embedding vector_cosine_ops)
-- WITH (lists = 100); -- lists ≈ sqrt(num_rows)
// pgvector with Drizzle ORM:
import { drizzle } from 'drizzle-orm/postgres-js';
import postgres from 'postgres';
import { sql } from 'drizzle-orm';
import { customType, jsonb, pgTable, serial, text, timestamp } from 'drizzle-orm/pg-core';
// Custom vector type for Drizzle:
const vector = (name: string, dimensions: number) =>
customType<{ data: number[]; driverData: string }>({
dataType() {
return `vector(${dimensions})`;
},
toDriver(value: number[]) {
return `[${value.join(',')}]`;
},
fromDriver(value: string) {
return value.slice(1, -1).split(',').map(Number);
},
})(name);
export const documents = pgTable('documents', {
id: serial('id').primaryKey(),
content: text('content').notNull(),
metadata: jsonb('metadata').default({}),
embedding: vector('embedding', 1536),
createdAt: timestamp('created_at').defaultNow(),
});
const db = drizzle(postgres(process.env.DATABASE_URL!));
// Insert documents:
export async function insertDocument(
content: string,
metadata: Record<string, unknown> = {}
) {
const embedding = await embed(content);
await db.insert(documents).values({
content,
metadata,
embedding,
});
}
// Semantic search:
export async function searchDocuments(query: string, limit = 5) {
const queryEmbedding = await embed(query);
// cosine similarity search using pgvector <=> operator:
const results = await db.execute(sql`
SELECT
id,
content,
metadata,
1 - (embedding <=> ${`[${queryEmbedding.join(',')}]`}::vector) AS similarity
FROM documents
ORDER BY embedding <=> ${`[${queryEmbedding.join(',')}]`}::vector
LIMIT ${limit}
`);
// postgres-js returns rows directly (with node-postgres you'd read results.rows)
return results as unknown as Array<{
id: number;
content: string;
metadata: Record<string, unknown>;
similarity: number;
}>;
}
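If recall matters more than latency, pgvector's HNSW search width is tunable per transaction. A sketch reusing the db and embed helpers above (hnsw.ef_search defaults to 40; 100 is an illustrative value):
// Optional: trade latency for recall on a per-transaction basis.
export async function searchDocumentsHighRecall(query: string, limit = 5) {
  const queryEmbedding = await embed(query);
  return db.transaction(async (tx) => {
    // hnsw.ef_search (default 40): higher means better recall, slower queries.
    // With an IVFFlat index, set ivfflat.probes instead.
    await tx.execute(sql`SET LOCAL hnsw.ef_search = 100`);
    return tx.execute(sql`
      SELECT id, content,
        1 - (embedding <=> ${`[${queryEmbedding.join(',')}]`}::vector) AS similarity
      FROM documents
      ORDER BY embedding <=> ${`[${queryEmbedding.join(',')}]`}::vector
      LIMIT ${limit}
    `);
  });
}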
// Full RAG function with pgvector:
import { openai } from '@ai-sdk/openai';
import { generateText } from 'ai';
export async function ragAnswer(userQuestion: string): Promise<string> {
// 1. Search for relevant context:
const relevant = await searchDocuments(userQuestion, 5);
if (relevant.length === 0) {
return "I don't have information about that in my knowledge base.";
}
// 2. Build context string:
const context = relevant
.map((doc, i) => `[${i + 1}] ${doc.content}`)
.join('\n\n');
// 3. Generate answer:
const { text } = await generateText({
model: openai('gpt-4o'),
system: `You are a helpful assistant. Answer questions based ONLY on the provided context.
If the context doesn't contain enough information, say so.
Context:
${context}`,
prompt: userQuestion,
});
return text;
}
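The production checklist below recommends streaming. A minimal streaming variant of ragAnswer, assuming the AI SDK's streamText:
// Streaming variant: tokens reach the client as they're generated.
import { streamText } from 'ai';
export async function ragAnswerStream(userQuestion: string) {
  const relevant = await searchDocuments(userQuestion, 5);
  const context = relevant.map((doc, i) => `[${i + 1}] ${doc.content}`).join('\n\n');
  const result = streamText({
    model: openai('gpt-4o'),
    system: `Answer ONLY from the provided context.\n\nContext:\n${context}`,
    prompt: userQuestion,
  });
  return result.textStream; // AsyncIterable<string>: pipe to SSE or a Response stream
}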
Pinecone: Managed Vector Database
Best choice when: you need a fully managed solution, serve high query volume, or don't want to operate Postgres yourself.
Setup
// npm install @pinecone-database/pinecone
import { Pinecone } from '@pinecone-database/pinecone';
const pinecone = new Pinecone({
apiKey: process.env.PINECONE_API_KEY!,
});
// Create an index (serverless — scales to zero):
await pinecone.createIndex({
name: 'knowledge-base',
dimension: 1536, // Match embedding model
metric: 'cosine',
spec: {
serverless: {
cloud: 'aws',
region: 'us-east-1',
},
},
});
// Insert vectors:
const index = pinecone.index('knowledge-base');
export async function upsertDocuments(
documents: Array<{ id: string; content: string; metadata?: Record<string, string | number> }>
) {
// Embed in batches:
const batchSize = 100;
for (let i = 0; i < documents.length; i += batchSize) {
const batch = documents.slice(i, i + batchSize);
const embeddings = await embedBatch(batch.map((d) => d.content));
await index.upsert(
batch.map((doc, j) => ({
id: doc.id,
values: embeddings[j],
metadata: {
content: doc.content, // Store content in metadata for retrieval
...doc.metadata,
},
}))
);
}
}
// Query vectors:
export async function searchPinecone(
query: string,
filter?: Record<string, string | number>,
limit = 5
) {
const queryEmbedding = await embed(query);
const results = await index.query({
vector: queryEmbedding,
topK: limit,
includeMetadata: true,
filter, // Optional: filter by metadata fields
});
return results.matches.map((match) => ({
id: match.id,
content: match.metadata?.content as string,
score: match.score,
metadata: match.metadata,
}));
}
// Filter example — only search certain document types:
const results = await searchPinecone('pricing questions', {
document_type: 'pricing',
language: 'en',
});
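Pinecone filters also support operators such as $eq, $ne, $gt, $gte, $lt, $lte, $in, and $nin. A sketch (the year field is a hypothetical metadata key):
// Operator filters: combine set membership and range conditions.
const queryEmbedding = await embed('pricing questions');
const recent = await index.query({
  vector: queryEmbedding,
  topK: 5,
  includeMetadata: true,
  filter: {
    document_type: { $in: ['pricing', 'billing'] },
    year: { $gte: 2024 },
  },
});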
Pinecone Namespaces for Multi-Tenancy
// Namespace isolation per customer (no extra cost):
const customerIndex = pinecone.index('knowledge-base').namespace(`customer-${customerId}`);
// Insert to customer namespace:
await customerIndex.upsert([{ id: 'doc-1', values: embedding, metadata: { content } }]);
// Query only this customer's data:
const results = await customerIndex.query({ vector: queryEmbedding, topK: 5 });
// Clean up when customer leaves:
await customerIndex.deleteAll();
Weaviate: AI-Native Vector Search
Best choice when: you want hybrid search (semantic + keyword), built-in vectorization, or graph relationships between documents.
Setup with Weaviate Cloud
// npm install weaviate-client
import weaviate, { WeaviateClient, dataType } from 'weaviate-client';
const client: WeaviateClient = await weaviate.connectToWeaviateCloud(
process.env.WEAVIATE_URL!,
{
authCredentials: new weaviate.ApiKey(process.env.WEAVIATE_API_KEY!),
headers: {
'X-OpenAI-Api-Key': process.env.OPENAI_API_KEY!, // For auto-vectorization
},
}
);
// Create collection (Weaviate's equivalent of a table):
await client.collections.create({
name: 'Document',
vectorizers: [
weaviate.configure.vectorizer.text2VecOpenAI({
model: 'text-embedding-3-small',
}),
],
generative: weaviate.configure.generative.openAI({ model: 'gpt-4o' }),
properties: [
{ name: 'content', dataType: dataType.TEXT },
{ name: 'source', dataType: dataType.TEXT },
{ name: 'category', dataType: dataType.TEXT },
],
});
// Insert objects — Weaviate auto-vectorizes:
const collection = client.collections.get('Document');
await collection.data.insertMany([
{ content: 'PostgreSQL is an object-relational database...', source: 'docs/postgres.md', category: 'database' },
{ content: 'MongoDB is a document database...', source: 'docs/mongo.md', category: 'database' },
]);
// No need to call embed() — Weaviate does it automatically
// Hybrid search (vector + BM25 keyword):
const results = await collection.query.hybrid('how do I connect to postgres', {
limit: 5,
alpha: 0.75, // 0 = pure keyword, 1 = pure vector
returnMetadata: ['score', 'explainScore'],
filters: collection.filter.byProperty('category').equal('database'),
});
for (const result of results.objects) {
console.log(`Score: ${result.metadata?.score}, Content: ${result.properties.content}`);
}
// Weaviate Generative Search (built-in RAG):
const generated = await collection.generate.nearText(
'how does postgres handle transactions',
{
groupedTask: 'Summarize how the following documents describe database transactions.',
},
{ limit: 5 }
);
// generated.generated contains the LLM-generated answer
console.log(generated.generated);
// generated.objects holds the source chunks
Benchmark: Similarity Search Performance
For 1M documents with 1024-dimensional embeddings:
| Database | Query latency (p99) | Throughput | ANN accuracy |
|---|---|---|---|
| Pinecone (serverless) | 15-25ms | High | 99%+ |
| Weaviate (cloud) | 10-20ms | High | 99%+ |
| pgvector (HNSW) | 20-50ms | Medium | 98%+ |
| pgvector (IVFFlat) | 50-150ms | Medium | 95-99% |
For most applications, all three are fast enough. The difference matters at >10M vectors or >100 QPS.
Cost Comparison at Scale
1M vectors, 1536 dimensions, 1,000 queries/day (≈6GB of raw vector data: 1M × 1536 dims × 4 bytes, before index overhead):
| Solution | Monthly Cost | Notes |
|---|---|---|
| pgvector on Neon | ~$50 | 8GB storage, compute |
| pgvector on Supabase Pro | $25 | Included in Pro plan |
| Pinecone Serverless | ~$35 | Estimated by usage |
| Weaviate Cloud | ~$100 | Enterprise features |
| Weaviate Self-hosted | ~$20-50 | Just VPS cost |
pgvector wins on cost for teams already paying for Postgres.
Full Production RAG Checklist
Ingestion pipeline:
- [ ] Chunk documents intelligently (by section, not word count)
- [ ] Add metadata (source, date, document_type) for filtering
- [ ] Deduplicate before upserting by hashing content (see the sketch after this list)
- [ ] Store original content for retrieval (not just vectors)
- [ ] Batch embed for cost efficiency
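For the deduplication item above, a minimal content-hashing sketch (the 32-character ID length is an arbitrary choice):
// Content-hash IDs deduplicate identical chunks and make upserts idempotent:
import { createHash } from 'node:crypto';
export function contentId(content: string): string {
  return createHash('sha256').update(content).digest('hex').slice(0, 32);
}
// e.g. upsertDocuments(chunks.map((c) => ({ id: contentId(c), content: c })));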
Query time:
- [ ] Embed the query with the same model used for documents
- [ ] Retrieve 5-10 chunks (more chunks = better context, higher cost)
- [ ] Use hybrid search if keyword matching matters (Weaviate, or pgvector + tsvector)
- [ ] Filter by metadata when the query implies scope (dates, categories)
- [ ] Rerank results (Cohere Rerank or a cross-encoder) for better accuracy (sketch below)
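For the reranking item above, a sketch assuming the cohere-ai Node SDK (model name and topN are illustrative):
// Rerank retrieved chunks before prompting the LLM:
import { CohereClient } from 'cohere-ai';
const cohere = new CohereClient({ token: process.env.COHERE_API_KEY! });
export async function rerankChunks(query: string, chunks: string[], topN = 3): Promise<string[]> {
  const response = await cohere.rerank({
    model: 'rerank-english-v3.0',
    query,
    documents: chunks,
    topN,
  });
  // Results are sorted by relevance and reference indexes into the input array
  return response.results.map((r) => chunks[r.index]);
}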
LLM generation:
- [ ] Set a clear system prompt: "Answer ONLY from the context provided"
- [ ] Include source citations in the prompt
- [ ] Handle "not found in context" gracefully
- [ ] Use streaming for better UX
- [ ] Log what context was used (debugging + audit)
Evaluation:
- [ ] Track retrieval recall (were the relevant docs in the top-K? See the sketch after this list)
- [ ] Track answer faithfulness (did the LLM hallucinate beyond the context?)
- [ ] Use Ragas or ARES for automated RAG evaluation
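For retrieval recall, a minimal sketch using the searchDocuments helper from the pgvector section (the labeled-set shape is an assumption):
// Recall@K: what fraction of known-relevant docs appear in the top-K results?
export async function recallAtK(
  labeled: Array<{ query: string; relevantIds: number[] }>,
  k = 5
): Promise<number> {
  let hits = 0;
  let total = 0;
  for (const { query, relevantIds } of labeled) {
    const results = await searchDocuments(query, k);
    const retrieved = new Set(results.map((r) => r.id));
    hits += relevantIds.filter((id) => retrieved.has(id)).length;
    total += relevantIds.length;
  }
  return total === 0 ? 0 : hits / total;
}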
Compare vector databases and AI APIs at APIScout.