
Building a RAG Pipeline in 2026: Pinecone vs Weaviate vs pgvector

APIScout Team
Tags: rag, vector-database, pinecone, weaviate, pgvector, embeddings, 2026

TL;DR

For most teams: pgvector. It runs in your existing Postgres database, costs nothing extra, and handles millions of vectors comfortably. Pinecone wins when you need a fully managed, scale-to-zero vector store with zero operational overhead. Weaviate wins when you want AI-native features (built-in vectorization, hybrid BM25+vector search, graph traversal) without running your own model. The "best" vector database in 2026 depends almost entirely on your existing stack — don't introduce a new service if Postgres already works.

Key Takeaways

  • pgvector: free, Postgres-native, handles 1M+ vectors easily, no new infra
  • Pinecone: best managed vector DB, auto-scaling, $0 free tier (100K vectors), consistently fast similarity search
  • Weaviate: best AI-native features (HNSW + BM25 hybrid), multimodal, self-host or cloud
  • Embedding models: text-embedding-3-small (OpenAI, cheap) or nomic-embed-text (open source, free)
  • RAG latency (p99): pgvector ~20-50ms, Pinecone ~15-25ms, Weaviate ~10-20ms
  • Rule of thumb: <500K vectors → pgvector, 500K-10M → Pinecone, complex queries → Weaviate
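The rule of thumb above can be expressed as a small decision helper. This is purely illustrative (the function and option names are ours, not from any library):

```typescript
// Illustrative decision helper for the rule of thumb above.
// `pickVectorDb` and its options are hypothetical names, not a real API.
type VectorDb = 'pgvector' | 'pinecone' | 'weaviate';

export function pickVectorDb(opts: {
  numVectors: number;
  needsHybridOrGraph?: boolean; // BM25+vector, multimodal, graph traversal
  alreadyOnPostgres?: boolean;
}): VectorDb {
  if (opts.needsHybridOrGraph) return 'weaviate';
  if (opts.numVectors < 500_000 || opts.alreadyOnPostgres) return 'pgvector';
  return 'pinecone'; // 500K-10M vectors, fully managed
}
```

Note the ordering: feature requirements trump scale, and an existing Postgres database trumps a new managed service.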

The RAG Architecture

Before comparing databases, understand the full pipeline:

Document Ingestion:
  Raw docs → Chunk → Embed → Store in vector DB
  (Run once, or incrementally as content changes)

Query Time:
  User query → Embed query → Similarity search → Get top-K chunks
                                                         ↓
                                              Inject into LLM prompt
                                                         ↓
                                              LLM generates answer

All three vector databases handle the "Store" and "Similarity search" steps. The embedding step works the same way regardless of which database you choose.
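To make the "Similarity search" step concrete, here is a minimal in-memory sketch of cosine-similarity top-K retrieval. It is the same math that pgvector's `<=>` operator and Pinecone's `cosine` metric compute, minus the approximate-nearest-neighbor indexing that makes them fast at scale:

```typescript
// Brute-force cosine similarity search, for illustration only.
// Real vector DBs use ANN indexes (HNSW, IVFFlat) instead of a full scan.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

export function topK(
  query: number[],
  docs: Array<{ id: string; embedding: number[] }>,
  k = 5
) {
  return docs
    .map((d) => ({ id: d.id, score: cosineSimilarity(query, d.embedding) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, k);
}
```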


Setting Up Embeddings (Same for All Three)

// embeddings.ts — Generate embeddings with OpenAI:
import OpenAI from 'openai';

const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

export async function embed(text: string): Promise<number[]> {
  const response = await openai.embeddings.create({
    model: 'text-embedding-3-small',  // 1536 dims, $0.02/1M tokens
    // model: 'text-embedding-3-large',  // 3072 dims, $0.13/1M tokens
    input: text.replace(/\n/g, ' '),
  });
  return response.data[0].embedding;
}

// Batch embeddings (more efficient):
export async function embedBatch(texts: string[]): Promise<number[][]> {
  const response = await openai.embeddings.create({
    model: 'text-embedding-3-small',
    input: texts.map((t) => t.replace(/\n/g, ' ')),
  });
  return response.data.map((d) => d.embedding);
}
// Chunking strategy (critical for RAG quality):
export function chunkDocument(content: string, chunkSize = 500, overlap = 50): string[] {
  const words = content.split(/\s+/);
  const chunks: string[] = [];

  for (let i = 0; i < words.length; i += chunkSize - overlap) {
    const chunk = words.slice(i, i + chunkSize).join(' ');
    if (chunk.length > 100) {  // Skip tiny chunks
      chunks.push(chunk);
    }
  }

  return chunks;
}

// For code documentation — split by function/class, not words
// For PDFs — split by page or paragraph
// For markdown — split by heading sections
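For markdown, the heading-based split mentioned above can be sketched in a few lines. This is a deliberately simple illustration; production splitters (e.g. LangChain's MarkdownHeaderTextSplitter) handle nested sections and size caps more carefully:

```typescript
// Split markdown by heading sections instead of raw word count.
// Each chunk keeps its heading line, so retrieved text stays self-describing.
export function chunkMarkdown(markdown: string): string[] {
  const chunks: string[] = [];
  let current: string[] = [];

  for (const line of markdown.split('\n')) {
    // A new ATX heading (#, ##, ... up to ######) starts a new chunk
    if (/^#{1,6}\s/.test(line) && current.length > 0) {
      chunks.push(current.join('\n').trim());
      current = [];
    }
    current.push(line);
  }
  if (current.length > 0) chunks.push(current.join('\n').trim());

  return chunks.filter((c) => c.length > 0);
}
```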

pgvector: RAG in Postgres

Best choice when: already using Postgres (Supabase, Neon, Railway, self-hosted)

Setup

-- Enable the extension (Supabase: already enabled by default):
CREATE EXTENSION IF NOT EXISTS vector;

-- Create documents table:
CREATE TABLE documents (
  id          BIGSERIAL PRIMARY KEY,
  content     TEXT NOT NULL,
  metadata    JSONB DEFAULT '{}',
  embedding   VECTOR(1536),  -- Match your model's dimensions
  created_at  TIMESTAMPTZ DEFAULT NOW()
);

-- Create HNSW index (best for most cases):
CREATE INDEX ON documents
  USING hnsw (embedding vector_cosine_ops)
  WITH (m = 16, ef_construction = 64);

-- OR IVFFlat (faster to build, lower memory; somewhat lower recall):
-- CREATE INDEX ON documents
--   USING ivfflat (embedding vector_cosine_ops)
--   WITH (lists = 100);  -- lists ≈ sqrt(num_rows)
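The `lists` value in the comment above can be derived from the row count at migration time. A small sketch following pgvector's published guidance (rows/1000 up to roughly 1M rows, sqrt(rows) beyond that); the helper name is ours:

```typescript
// Compute an IVFFlat `lists` value from the table's row count.
// Follows pgvector's README guidance; the function name is hypothetical.
export function ivfflatLists(numRows: number): number {
  const lists = numRows <= 1_000_000
    ? numRows / 1000       // small tables: rows / 1000
    : Math.sqrt(numRows);  // large tables: sqrt(rows)
  return Math.max(10, Math.round(lists));
}
```

Recompute and reindex as the table grows; a `lists` value tuned for 100K rows will hurt recall at 10M.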
// pgvector with Drizzle ORM:
import { drizzle } from 'drizzle-orm/postgres-js';
import postgres from 'postgres';
import { sql } from 'drizzle-orm';
import { customType, jsonb, pgTable, serial, text, timestamp } from 'drizzle-orm/pg-core';

// Custom vector type for Drizzle:
const vector = (name: string, dimensions: number) =>
  customType<{ data: number[]; driverData: string }>({
    dataType() {
      return `vector(${dimensions})`;
    },
    toDriver(value: number[]) {
      return `[${value.join(',')}]`;
    },
    fromDriver(value: string) {
      return value.slice(1, -1).split(',').map(Number);
    },
  })(name);

export const documents = pgTable('documents', {
  id: serial('id').primaryKey(),
  content: text('content').notNull(),
  metadata: jsonb('metadata').default({}),
  embedding: vector('embedding', 1536),
  createdAt: timestamp('created_at').defaultNow(),
});

const db = drizzle(postgres(process.env.DATABASE_URL!));
// Insert documents:
export async function insertDocument(
  content: string,
  metadata: Record<string, unknown> = {}
) {
  const embedding = await embed(content);

  await db.insert(documents).values({
    content,
    metadata,
    embedding,
  });
}

// Semantic search:
export async function searchDocuments(query: string, limit = 5) {
  const queryEmbedding = await embed(query);

  // cosine similarity search using pgvector <=> operator:
  const results = await db.execute(sql`
    SELECT
      id,
      content,
      metadata,
      1 - (embedding <=> ${`[${queryEmbedding.join(',')}]`}::vector) AS similarity
    FROM documents
    ORDER BY embedding <=> ${`[${queryEmbedding.join(',')}]`}::vector
    LIMIT ${limit}
  `);

  return results.rows as Array<{
    id: number;
    content: string;
    metadata: Record<string, unknown>;
    similarity: number;
  }>;
}
// Full RAG function with pgvector:
import { openai } from '@ai-sdk/openai';
import { generateText } from 'ai';

export async function ragAnswer(userQuestion: string): Promise<string> {
  // 1. Search for relevant context:
  const relevant = await searchDocuments(userQuestion, 5);

  if (relevant.length === 0) {
    return "I don't have information about that in my knowledge base.";
  }

  // 2. Build context string:
  const context = relevant
    .map((doc, i) => `[${i + 1}] ${doc.content}`)
    .join('\n\n');

  // 3. Generate answer:
  const { text } = await generateText({
    model: openai('gpt-4o'),
    system: `You are a helpful assistant. Answer questions based ONLY on the provided context.
If the context doesn't contain enough information, say so.

Context:
${context}`,
    prompt: userQuestion,
  });

  return text;
}

Pinecone: Managed Vector Database

Best choice when: you need a fully managed solution, high query volume, or don't want to manage Postgres.

Setup

// npm install @pinecone-database/pinecone
import { Pinecone } from '@pinecone-database/pinecone';

const pinecone = new Pinecone({
  apiKey: process.env.PINECONE_API_KEY!,
});

// Create an index (serverless — scales to zero):
await pinecone.createIndex({
  name: 'knowledge-base',
  dimension: 1536,      // Match embedding model
  metric: 'cosine',
  spec: {
    serverless: {
      cloud: 'aws',
      region: 'us-east-1',
    },
  },
});
// Insert vectors:
const index = pinecone.index('knowledge-base');

export async function upsertDocuments(
  documents: Array<{ id: string; content: string; metadata?: Record<string, string | number> }>
) {
  // Embed in batches:
  const batchSize = 100;
  for (let i = 0; i < documents.length; i += batchSize) {
    const batch = documents.slice(i, i + batchSize);
    const embeddings = await embedBatch(batch.map((d) => d.content));

    await index.upsert(
      batch.map((doc, j) => ({
        id: doc.id,
        values: embeddings[j],
        metadata: {
          content: doc.content,  // Store content in metadata for retrieval
          ...doc.metadata,
        },
      }))
    );
  }
}
// Query vectors:
export async function searchPinecone(
  query: string,
  filter?: Record<string, string | number>,
  limit = 5
) {
  const queryEmbedding = await embed(query);

  const results = await index.query({
    vector: queryEmbedding,
    topK: limit,
    includeMetadata: true,
    filter,  // Optional: filter by metadata fields
  });

  return results.matches.map((match) => ({
    id: match.id,
    content: match.metadata?.content as string,
    score: match.score,
    metadata: match.metadata,
  }));
}

// Filter example — only search certain document types:
const results = await searchPinecone('pricing questions', {
  document_type: 'pricing',
  language: 'en',
});

Pinecone Namespaces for Multi-Tenancy

// Namespace isolation per customer (no extra cost):
const customerIndex = pinecone.index('knowledge-base').namespace(`customer-${customerId}`);

// Insert to customer namespace:
await customerIndex.upsert([{ id: 'doc-1', values: embedding, metadata: { content } }]);

// Query only this customer's data:
const results = await customerIndex.query({ vector: queryEmbedding, topK: 5 });

// Clean up when customer leaves:
await customerIndex.deleteAll();

Weaviate: AI-Native Vector Database

Best choice when: you want hybrid search (semantic + keyword), built-in vectorization, or graph relationships between documents.

Setup with Weaviate Cloud

// npm install weaviate-client
import weaviate, { WeaviateClient, dataType } from 'weaviate-client';

const client: WeaviateClient = await weaviate.connectToWeaviateCloud(
  process.env.WEAVIATE_URL!,
  {
    authCredentials: new weaviate.ApiKey(process.env.WEAVIATE_API_KEY!),
    headers: {
      'X-OpenAI-Api-Key': process.env.OPENAI_API_KEY!,  // For auto-vectorization
    },
  }
);

// Create collection (Weaviate's equivalent of a table):
await client.collections.create({
  name: 'Document',
  vectorizers: [
    weaviate.configure.vectorizer.text2VecOpenAI({
      model: 'text-embedding-3-small',
    }),
  ],
  generative: weaviate.configure.generative.openAI({ model: 'gpt-4o' }),
  properties: [
    { name: 'content', dataType: dataType.TEXT },
    { name: 'source', dataType: dataType.TEXT },
    { name: 'category', dataType: dataType.TEXT },
  ],
});
// Insert objects — Weaviate auto-vectorizes:
const collection = client.collections.get('Document');

await collection.data.insertMany([
  { content: 'PostgreSQL is an object-relational database...', source: 'docs/postgres.md', category: 'database' },
  { content: 'MongoDB is a document database...', source: 'docs/mongo.md', category: 'database' },
]);
// No need to call embed() — Weaviate does it automatically
// Hybrid search (vector + BM25 keyword):
const results = await collection.query.hybrid('how do I connect to postgres', {
  limit: 5,
  alpha: 0.75,  // 0 = pure keyword, 1 = pure vector
  returnMetadata: ['score', 'certainty'],
  filters: collection.filter.byProperty('category').equal('database'),
});

for (const result of results.objects) {
  console.log(`Score: ${result.metadata?.score}, Content: ${result.properties.content}`);
}
// Weaviate Generative Search (built-in RAG):
const results = await collection.generate.nearText(
  'how does postgres handle transactions',
  {
    groupedTask: 'Summarize how the following documents describe database transactions.',
    limit: 5,
  }
);

// results.generated contains the LLM-generated answer
console.log(results.generated);
// Each result also has the source chunks

Benchmark: Similarity Search Performance

For 1M documents with 1024-dimensional embeddings:

| Database              | Query latency (p99) | Throughput | ANN accuracy |
|-----------------------|---------------------|------------|--------------|
| Pinecone (serverless) | 15-25ms             | High       | 99%+         |
| Weaviate (cloud)      | 10-20ms             | High       | 99%+         |
| pgvector (HNSW)       | 20-50ms             | Medium     | 98%+         |
| pgvector (IVFFlat)    | 50-150ms            | Medium     | 95-99%       |

For most applications, all three are fast enough. The difference matters at >10M vectors or >100 QPS.


Cost Comparison at Scale

10M vectors, 1536 dimensions, 1000 queries/day:

| Solution                 | Monthly Cost | Notes                |
|--------------------------|--------------|----------------------|
| pgvector on Neon         | ~$50         | 8GB storage, compute |
| pgvector on Supabase Pro | $25          | Included in Pro plan |
| Pinecone Serverless      | ~$35         | Estimated by usage   |
| Weaviate Cloud           | ~$100        | Enterprise features  |
| Weaviate Self-hosted     | ~$20-50      | Just VPS cost        |

pgvector wins on cost for teams already paying for Postgres.


Full Production RAG Checklist

Ingestion pipeline:
[ ] Chunk documents intelligently (by section, not word count)
[ ] Add metadata (source, date, document_type) for filtering
[ ] Deduplicate before upserting (hash content)
[ ] Store original content for retrieval (not just vectors)
[ ] Batch embed for cost efficiency
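The "deduplicate before upserting" item above can be as simple as keying each chunk on a hash of its content, so re-ingesting the same document overwrites instead of duplicating. A sketch using Node's built-in crypto module (helper names are ours):

```typescript
import { createHash } from 'node:crypto';

// Deterministic ID from chunk content: the same chunk always gets the
// same ID, so upserts collapse duplicates instead of multiplying them.
export function contentId(content: string): string {
  return createHash('sha256').update(content.trim()).digest('hex').slice(0, 32);
}

export function dedupeChunks(chunks: string[]): Array<{ id: string; content: string }> {
  const seen = new Map<string, string>();
  for (const chunk of chunks) {
    seen.set(contentId(chunk), chunk); // Later duplicates overwrite the same key
  }
  return [...seen.entries()].map(([id, content]) => ({ id, content }));
}
```

The resulting `{ id, content }` pairs plug directly into an upsert call like Pinecone's, which keys on `id`.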

Query time:
[ ] Embed query with same model used for documents
[ ] Retrieve 5-10 chunks (more = better context, higher cost)
[ ] Hybrid search if keyword matching matters (Weaviate or pgvector + tsvector)
[ ] Filter by metadata when query implies scope (dates, categories)
[ ] Rerank results (Cohere Rerank or cross-encoder) for better accuracy

LLM generation:
[ ] Set clear system prompt: "Answer ONLY from the context provided"
[ ] Include source citations in the prompt
[ ] Handle "not found in context" gracefully
[ ] Use streaming for better UX
[ ] Log what context was used (debugging + audit)

Evaluation:
[ ] Track retrieval recall (were relevant docs in the top-K?)
[ ] Track answer faithfulness (did LLM hallucinate beyond context?)
[ ] Use Ragas or ARES for automated RAG evaluation
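Retrieval recall@K from the first evaluation item takes only a few lines to compute yourself. A sketch (the function name is ours; IDs are whatever your store uses):

```typescript
// recall@K: fraction of the known-relevant documents that appeared
// in the top-K retrieved results for a query.
export function recallAtK(relevantIds: string[], retrievedIds: string[]): number {
  if (relevantIds.length === 0) return 1; // Nothing to find: trivially perfect
  const retrieved = new Set(retrievedIds);
  const hits = relevantIds.filter((id) => retrieved.has(id)).length;
  return hits / relevantIds.length;
}
```

Average this across a labeled set of test queries to get a single retrieval score you can track across chunking and index changes.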

Compare vector databases and AI APIs at APIScout.
