
Pinecone vs Qdrant vs Weaviate

APIScout Team
pinecone · qdrant · weaviate · vector-database · rag · embeddings · semantic-search · 2026

TL;DR

Qdrant for performance-critical production workloads — Rust-based, 20ms p95 latency, 15K QPS, and the best payload filtering in the category. Pinecone for teams who want zero database operations — fully managed, consistent performance, and the simplest API. Weaviate for hybrid search (vector + keyword BM25) — its native BM25 integration and GraphQL API make it the best choice when you need both semantic and keyword search in the same index. Self-hosting Weaviate or Qdrant saves 60–70% versus Pinecone at scale.

Key Takeaways

  • Qdrant: 20ms p95 latency, 15K QPS, Rust-based, best payload filtering, self-host or managed cloud
  • Pinecone: 50ms p95 latency, 10K QPS, serverless ($0.33/GB/month), zero infrastructure, SOC 2 + ISO 27001 + HIPAA
  • Weaviate: 30ms p95 latency, 5K QPS, best hybrid search, GraphQL API, module ecosystem (vectorizers, generative)
  • Cost at scale (1B vectors): Pinecone ~$3,500/month managed; Weaviate Cloud ~$2,200/month; Qdrant Cloud ~$1,000/month; self-hosted ~$800/month
  • pgvector: For Postgres shops with <10M vectors and <100 QPS — free with your existing database

The Vector Database Landscape in 2026

Vector databases store high-dimensional embeddings (typically 1536-dimensional for OpenAI text-embedding-3-small) and perform approximate nearest-neighbor (ANN) search. The choice matters at scale — at 100M vectors, the difference between a well-optimized database and a poorly-chosen one is 10x on cost and 5x on latency.
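To ground the terminology: exact nearest-neighbor search is just a scored sort over every vector, which is what ANN indexes like HNSW approximate in sub-linear time. A minimal pure-Python sketch (illustrative only, not how any of these databases implement search):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    # cosine(a, b) = dot(a, b) / (|a| * |b|)
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def brute_force_search(query: list[float], corpus: dict, top_k: int = 2):
    # Exact k-NN: score every vector, sort descending. O(n * d) per query,
    # which is exactly the cost HNSW/IVF indexes exist to avoid at scale.
    scored = sorted(
        ((cosine_similarity(query, vec), doc_id) for doc_id, vec in corpus.items()),
        reverse=True,
    )
    return scored[:top_k]
```

At a few thousand vectors this brute-force loop is perfectly fine; the databases below earn their keep when n reaches millions.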

The 2026 landscape has three main segments:

Segment 1: Managed simplicity
  Pinecone — zero infrastructure, serverless, best for teams without MLOps

Segment 2: Self-hosted performance
  Qdrant — Rust performance, on-prem or cloud
  Weaviate — feature-rich, strong hybrid search

Segment 3: Embedded/lightweight
  Chroma — local dev, prototyping
  LanceDB — edge deployment
  pgvector — Postgres extension for small-medium workloads

Pinecone

Getting Started

import os

from pinecone import Pinecone, ServerlessSpec

pc = Pinecone(api_key=os.environ["PINECONE_API_KEY"])

# Create a serverless index
pc.create_index(
    name="rag-documents",
    dimension=1536,  # Match your embedding model dimensions
    metric="cosine",
    spec=ServerlessSpec(
        cloud="aws",
        region="us-east-1",
    ),
)

# Connect to index
index = pc.Index("rag-documents")

Upsert and Query

from openai import OpenAI

openai_client = OpenAI()

def get_embedding(text: str) -> list[float]:
    response = openai_client.embeddings.create(
        input=text,
        model="text-embedding-3-small",
    )
    return response.data[0].embedding

# Upsert documents
vectors = []
for doc in documents:
    embedding = get_embedding(doc.content)
    vectors.append({
        "id": doc.id,
        "values": embedding,
        "metadata": {
            "text": doc.content,
            "source": doc.source,
            "created_at": doc.created_at.isoformat(),
        },
    })

# Batch upsert (Pinecone recommends batches of ~100 vectors per call)
for i in range(0, len(vectors), 100):
    index.upsert(vectors=vectors[i:i+100])

# Query
query_embedding = get_embedding("What are the payment terms?")

results = index.query(
    vector=query_embedding,
    top_k=5,
    include_metadata=True,
    # Filter by metadata
    filter={
        "source": {"$eq": "contracts"},
        "created_at": {"$gte": "2025-01-01"},
    },
)

for match in results.matches:
    print(f"Score: {match.score:.4f}")
    print(f"Text: {match.metadata['text'][:200]}")
    print("---")

Namespaces for Multi-Tenancy

# Pinecone namespaces for tenant isolation
index.upsert(
    vectors=vectors,
    namespace=f"tenant-{tenant_id}",  # Isolated per tenant
)

results = index.query(
    vector=query_embedding,
    top_k=5,
    namespace=f"tenant-{tenant_id}",
    include_metadata=True,
)

Serverless Pricing (2026)

Pinecone Serverless:
  Storage:  $0.33/GB/month
  Writes:   $2.00/million write units
  Reads:    $4.00/million read units

Approximate at 1M documents (1536-dim, float32):
  Storage:  ~6GB → ~$2/month
  Monthly queries (100K/day): ~$12/month
  Monthly ingestion (100K docs): ~$0.20
  Total: ~$14/month at low volume

At 100M documents, 1M queries/day:
  Storage:  ~600GB → $198/month
  Queries:  30M/month → $120/month
  Total:    ~$318/month (light usage at scale)
  Performance tier: scales linearly
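The arithmetic above can be wrapped in a rough estimator. The rates are hardcoded from the table (2026 serverless pricing), and the `read_units_per_query` parameter is an assumption: real read-unit consumption depends on `top_k`, filters, and index size, so treat the result as a floor, not a quote.

```python
def estimate_pinecone_serverless(
    n_vectors: int,
    dims: int = 1536,
    queries_per_day: int = 100_000,
    read_units_per_query: float = 1.0,  # Assumption; grows with index size
) -> float:
    # Rates from the pricing table above (2026 serverless pricing)
    storage_gb = n_vectors * dims * 4 / 1e9      # float32 = 4 bytes per dim
    storage_cost = storage_gb * 0.33             # $0.33/GB/month
    read_units_m = queries_per_day * 30 * read_units_per_query / 1e6
    read_cost = read_units_m * 4.00              # $4.00/million read units
    return round(storage_cost + read_cost, 2)
```

For example, 100M vectors at 1M queries/day lands around the ~$318/month figure quoted above.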

Qdrant

Qdrant is the performance leader — written in Rust, it handles complex payload filtering without sacrificing search speed.

Getting Started

import os

from qdrant_client import QdrantClient, models

# Option 1: self-hosted (Docker)
client = QdrantClient(host="localhost", port=6333)

# Option 2: Qdrant Cloud
client = QdrantClient(
    url="https://your-cluster-url.qdrant.io",
    api_key=os.environ["QDRANT_API_KEY"],
)

# Create collection
client.create_collection(
    collection_name="documents",
    vectors_config=models.VectorParams(
        size=1536,
        distance=models.Distance.COSINE,
        on_disk=True,  # For large collections
    ),
    # HNSW configuration for performance tuning
    hnsw_config=models.HnswConfigDiff(
        m=16,              # Higher = better recall, more memory
        ef_construct=100,  # Higher = slower indexing, better quality
    ),
)

Upsert with Rich Payloads

from qdrant_client.models import PointStruct

# Qdrant uses "points" with payloads (equivalent to metadata)
points = [
    PointStruct(
        id=i,  # Integer or UUID
        vector=embedding,
        payload={
            "text": doc.content,
            "source": doc.source,
            "department": doc.department,
            "access_level": doc.access_level,
            "created_at": doc.created_at.timestamp(),
            "word_count": len(doc.content.split()),
        },
    )
    for i, (doc, embedding) in enumerate(zip(documents, embeddings))
]

client.upsert(
    collection_name="documents",
    points=points,
)

Advanced Filtering (Qdrant's Strength)

Qdrant's payload filtering is the most expressive in the category:

from datetime import datetime

from qdrant_client.models import Filter, FieldCondition, Range, MatchValue, MatchAny

# Complex filter: department = legal AND access_level >= 2 AND recent
results = client.search(
    collection_name="documents",
    query_vector=query_embedding,
    query_filter=Filter(
        must=[
            FieldCondition(
                key="department",
                match=MatchValue(value="legal"),
            ),
            FieldCondition(
                key="access_level",
                range=Range(gte=2),
            ),
            FieldCondition(
                key="created_at",
                range=Range(
                    gte=datetime(2025, 1, 1).timestamp(),
                ),
            ),
        ],
        should=[
            FieldCondition(
                key="source",
                match=MatchAny(any=["contracts", "agreements"]),
            ),
        ],
    ),
    limit=10,
    with_payload=True,
    with_vectors=False,  # Don't return vectors (saves bandwidth)
)

for result in results:
    print(f"Score: {result.score:.4f} | Source: {result.payload['source']}")
    print(result.payload['text'][:200])

Hybrid Search with Sparse Vectors

from qdrant_client.models import SparseVector, SparseVectorParams

# Create collection with both dense and sparse vectors
client.create_collection(
    collection_name="hybrid_docs",
    vectors_config={
        "dense": models.VectorParams(size=1536, distance=models.Distance.COSINE),
    },
    sparse_vectors_config={
        "sparse": SparseVectorParams(
            index=models.SparseIndexParams(on_disk=True),
        ),
    },
)

# Query with both vectors, fused server-side with Reciprocal Rank Fusion (RRF).
# In the query_points API, named vectors are selected via `using=` on each Prefetch.
results = client.query_points(
    collection_name="hybrid_docs",
    prefetch=[
        models.Prefetch(
            query=dense_embedding,
            using="dense",  # Named dense vector
            limit=20,
        ),
        models.Prefetch(
            query=models.SparseVector(
                indices=sparse_indices,
                values=sparse_values,
            ),
            using="sparse",  # Named sparse vector
            limit=20,
        ),
    ],
    query=models.FusionQuery(fusion=models.Fusion.RRF),
    limit=5,
)

Weaviate

Weaviate is the hybrid search specialist. Its native BM25 index means you don't need a separate Elasticsearch instance for keyword search.

Getting Started

import os

import weaviate
import weaviate.classes as wvc

# Connect to Weaviate Cloud (WCS)
client = weaviate.connect_to_weaviate_cloud(
    cluster_url=os.environ["WEAVIATE_URL"],
    auth_credentials=weaviate.auth.AuthApiKey(
        os.environ["WEAVIATE_API_KEY"]
    ),
)

# Create collection
client.collections.create(
    name="Document",
    vectorizer_config=wvc.config.Configure.Vectorizer.text2vec_openai(
        model="text-embedding-3-small",
    ),
    generative_config=wvc.config.Configure.Generative.openai(
        model="gpt-4o",
    ),
    properties=[
        wvc.config.Property(
            name="content",
            data_type=wvc.config.DataType.TEXT,
        ),
        wvc.config.Property(
            name="source",
            data_type=wvc.config.DataType.TEXT,
            skip_vectorization=True,
        ),
        wvc.config.Property(
            name="department",
            data_type=wvc.config.DataType.TEXT,
            skip_vectorization=True,
        ),
    ],
)

Hybrid Search (Weaviate's Strength)

documents = client.collections.get("Document")

# Pure vector search
vector_results = documents.query.near_text(
    query="payment terms and conditions",
    limit=5,
    return_metadata=wvc.query.MetadataQuery(distance=True),
)

# Pure keyword search (BM25)
keyword_results = documents.query.bm25(
    query="payment terms NET30",
    limit=5,
    return_metadata=wvc.query.MetadataQuery(score=True),
)

# Hybrid search (vector + BM25 combined) — Weaviate's signature feature
hybrid_results = documents.query.hybrid(
    query="payment terms NET30",
    alpha=0.5,  # 0 = pure BM25, 1 = pure vector, 0.5 = balanced
    limit=5,
    filters=wvc.query.Filter.by_property("department").equal("legal"),
    return_metadata=wvc.query.MetadataQuery(score=True, explain_score=True),
)

for result in hybrid_results.objects:
    print(f"Score: {result.metadata.score:.4f}")
    print(f"Content: {result.properties['content'][:200]}")
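Weaviate fuses the two result sets server-side; conceptually, with relative-score fusion (the default in recent versions), each result set's scores are min-max normalized to [0, 1] and then blended by `alpha`. A rough sketch, with the caveat that Weaviate's actual normalization and tie-handling differ in detail:

```python
def hybrid_score(
    vector_scores: dict[str, float],
    bm25_scores: dict[str, float],
    alpha: float = 0.5,
) -> dict[str, float]:
    # Min-max normalize each score set to [0, 1], then blend:
    # alpha weights the vector side, (1 - alpha) the BM25 side.
    def normalize(scores: dict[str, float]) -> dict[str, float]:
        lo, hi = min(scores.values()), max(scores.values())
        span = (hi - lo) or 1.0
        return {k: (v - lo) / span for k, v in scores.items()}

    v, b = normalize(vector_scores), normalize(bm25_scores)
    ids = set(v) | set(b)  # Missing from one list = contributes 0 from that side
    return {i: alpha * v.get(i, 0.0) + (1 - alpha) * b.get(i, 0.0) for i in ids}
```

This makes the `alpha` knob concrete: at 1.0 only the normalized vector scores matter, at 0.0 only BM25 does.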

Generative Search (RAG in One Query)

Weaviate's generative modules run the LLM call inside the database:

# Single query: search + generate response
response = documents.generate.hybrid(
    query="What are the payment terms in our enterprise contracts?",
    alpha=0.5,
    limit=3,
    # RAG: generate a response using the retrieved documents
    grouped_task="Summarize the payment terms found in these documents. "
                 "Format as a bullet list with key terms highlighted.",
)

print(response.generated)  # LLM-generated summary
for obj in response.objects:
    print(f"Source: {obj.properties['source']}")

Performance Comparison

Benchmark setup: 100M vectors, 1536 dimensions, 10% payload filter

Latency (p50 / p95 / p99):
  Pinecone:  12ms / 50ms / 85ms
  Qdrant:     8ms / 20ms / 35ms
  Weaviate:  10ms / 30ms / 55ms

Throughput (concurrent requests):
  Pinecone:  10,000 QPS (managed, auto-scales)
  Qdrant:    15,000 QPS (self-hosted, 32-core)
  Weaviate:   5,000 QPS (self-hosted, 32-core)

With complex payload filter (3 conditions):
  Pinecone:  +8ms latency overhead (metadata index)
  Qdrant:    +2ms latency overhead (native HNSW+filter)
  Weaviate:  +5ms latency overhead

Qdrant's HNSW+filter implementation is the most efficient —
payload filtering runs during graph traversal, not as post-filter.
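The difference is easy to see in a toy sketch of naive post-filtering (the `search_fn` and 10x over-fetch factor here are hypothetical, not any client API): with a 10%-selective filter, only ~0.1 × top_k of the retrieved candidates survive, so a post-filtering engine must over-fetch to fill its result page. Filter-aware traversal skips ineligible nodes during the graph walk instead.

```python
from typing import Callable

def post_filter_search(
    search_fn: Callable[[int], list],   # Returns top-n candidates by similarity
    predicate: Callable[[object], bool],  # The payload filter
    top_k: int,
) -> list:
    # Naive post-filtering: retrieve first, filter second. The 10x
    # over-fetch compensates for an assumed ~10% filter selectivity --
    # wasted work that filter-during-traversal avoids.
    candidates = search_fn(top_k * 10)
    return [c for c in candidates if predicate(c)][:top_k]
```

If the filter is more selective than the over-fetch factor anticipates, post-filtering returns fewer than top_k results, which is the classic failure mode of bolt-on metadata filtering.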

Cost Comparison at Scale

Scale          Pinecone Cloud   Weaviate Cloud   Qdrant Cloud     Self-hosted
1M vectors     ~$14/month       ~$45/month       ~$20/month       ~$15/month
10M vectors    ~$50/month       ~$120/month      ~$60/month       ~$50/month
100M vectors   ~$350/month      ~$800/month      ~$400/month      ~$200/month
1B vectors     ~$3,500/month    ~$2,200/month    ~$1,000/month    ~$800/month

Estimates based on 1536-dim vectors, moderate query volume (100K queries/day), 2026 pricing.


Feature Comparison

Feature               Pinecone           Qdrant               Weaviate
Managed cloud         ✅ Only             ✅ + self-host        ✅ + self-host
Open source           —                  ✅ Apache 2.0         ✅ BSD 3
Hybrid search         ⚠️ Manual           ✅ Sparse vectors     ✅ Native BM25
GraphQL API           —                  —                    ✅
REST API              ✅                  ✅                    ✅
gRPC API              ✅ (SDK)            ✅                    ✅
Built-in vectorizer   —                  —                    ✅ (module system)
Generative search     —                  —                    ✅ (RAG in one call)
Multi-tenancy         ✅ Namespaces       ✅ Collections        ✅ Multi-tenancy plugin
SOC 2 Type II         ✅                  ✅ Cloud              ✅ Cloud
HIPAA                 ✅ Enterprise       —                    ✅ Enterprise Cloud
Payload filtering     ✅ Metadata         ✅ Best-in-class      ✅ Good
On-disk storage       —                  ✅                    —
GPU acceleration      —                  ✅ (indexing)         —

When pgvector Is Enough

Before committing to a dedicated vector DB, consider pgvector:

-- Enable pgvector extension
CREATE EXTENSION IF NOT EXISTS vector;

-- Add vector column to existing table
ALTER TABLE documents
ADD COLUMN embedding vector(1536);

-- Create HNSW index
CREATE INDEX ON documents
USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 64);

-- Query
SELECT id, content,
       1 - (embedding <=> $1::vector) AS similarity
FROM documents
ORDER BY embedding <=> $1::vector
LIMIT 10;

pgvector is the right choice if:

  • You're already on Postgres (Supabase, Neon, PlanetScale)
  • Vectors < 10M
  • Query volume < 100 QPS
  • You don't want another service to manage

Beyond those bounds, dedicated vector databases win on performance.
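The checklist above reduces to a simple predicate, useful in architecture reviews. The thresholds are taken straight from the bullets and are heuristics, not hard limits: pgvector degrades gradually, and tuning (HNSW parameters, partial indexes) can stretch them.

```python
def pgvector_is_enough(n_vectors: int, peak_qps: int, on_postgres: bool) -> bool:
    # Rule of thumb from the criteria above: stay on pgvector while the
    # workload fits inside your existing Postgres; graduate to a dedicated
    # vector database only when scale or throughput demands it.
    return on_postgres and n_vectors < 10_000_000 and peak_qps < 100
```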


Decision Guide

Choose Pinecone if:

  • You want zero infrastructure — no Docker, no k8s, no ops
  • Your team has no MLOps resources
  • HIPAA compliance is required (Enterprise tier)
  • You're starting out and want to iterate fast

Choose Qdrant if:

  • Performance is critical — lowest latency, highest throughput
  • You need complex payload filtering (multiple conditions, nested objects)
  • You're comfortable with self-hosting or Qdrant Cloud
  • Cost at scale matters — significantly cheaper than Pinecone managed

Choose Weaviate if:

  • Hybrid search (semantic + keyword) is a core requirement
  • You want built-in vectorization (no separate embedding service)
  • Generative search (RAG in one query) simplifies your architecture
  • GraphQL API fits your existing patterns

Browse all vector database and AI infrastructure APIs at APIScout.

Related: RAG Pipeline: Pinecone vs Weaviate vs pgvector · Embedding Models Compared: OpenAI vs Cohere vs Voyage
