Pinecone vs Qdrant vs Weaviate
TL;DR
Qdrant for performance-critical production workloads — Rust-based, 20ms p95 latency, 15K QPS, and the best payload filtering in the category. Pinecone for teams who want zero database operations — fully managed, consistent performance, and the simplest API. Weaviate for hybrid search (vector + keyword BM25) — its native BM25 integration and GraphQL API make it the best choice when you need both semantic and keyword search in the same index. Self-hosting Weaviate or Qdrant saves 60–70% versus Pinecone at scale.
Key Takeaways
- Qdrant: 20ms p95 latency, 15K QPS, Rust-based, best payload filtering, self-host or managed cloud
- Pinecone: 50ms p95 latency, 10K QPS, serverless ($0.33/GB/month), zero infrastructure, SOC 2 + ISO 27001 + HIPAA
- Weaviate: 30ms p95 latency, 5K QPS, best hybrid search, GraphQL API, module ecosystem (vectorizers, generative)
- Cost at scale (1B vectors): Pinecone ~$3,500/month managed; Weaviate Cloud ~$2,200/month; Qdrant Cloud ~$1,000/month; self-hosted ~$800/month
- pgvector: For Postgres shops with <10M vectors and <100 QPS — free with your existing database
The Vector Database Landscape in 2026
Vector databases store high-dimensional embeddings (typically 1536-dimensional for OpenAI text-embedding-3-small) and perform approximate nearest-neighbor (ANN) search. The choice matters at scale — at 100M vectors, the difference between a well-optimized database and a poorly-chosen one is 10x on cost and 5x on latency.
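For intuition, the operation every one of these databases accelerates is similarity search over embeddings. A brute-force version is trivial to write and fine for a few thousand vectors, but it scans every vector per query, which is exactly what ANN indexes like HNSW exist to avoid. A minimal pure-Python sketch:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def brute_force_search(query: list[float],
                       vectors: list[list[float]],
                       top_k: int = 5) -> list[tuple[float, int]]:
    """Exact nearest-neighbor search: O(n * d) per query.
    ANN indexes trade a little recall for sub-linear query time."""
    scored = [(cosine_similarity(query, v), i) for i, v in enumerate(vectors)]
    return sorted(scored, reverse=True)[:top_k]
```

At 1536 dimensions and 100M vectors, this scan is hundreds of billions of multiply-adds per query, which is why the index choice (and the database built around it) dominates cost and latency at scale.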
The 2026 landscape has three main segments:
Segment 1: Managed simplicity
Pinecone — zero infrastructure, serverless, best for teams without MLOps
Segment 2: Self-hosted performance
Qdrant — Rust performance, on-prem or cloud
Weaviate — feature-rich, strong hybrid search
Segment 3: Embedded/lightweight
Chroma — local dev, prototyping
LanceDB — edge deployment
pgvector — Postgres extension for small-medium workloads
Pinecone
Getting Started
import os

from pinecone import Pinecone, ServerlessSpec

pc = Pinecone(api_key=os.environ["PINECONE_API_KEY"])

# Create a serverless index
pc.create_index(
    name="rag-documents",
    dimension=1536,  # Match your embedding model's dimensions
    metric="cosine",
    spec=ServerlessSpec(
        cloud="aws",
        region="us-east-1",
    ),
)

# Connect to the index
index = pc.Index("rag-documents")
Upsert and Query
from datetime import datetime

from openai import OpenAI

openai_client = OpenAI()

def get_embedding(text: str) -> list[float]:
    response = openai_client.embeddings.create(
        input=text,
        model="text-embedding-3-small",
    )
    return response.data[0].embedding

# Upsert documents
vectors = []
for doc in documents:
    embedding = get_embedding(doc.content)
    vectors.append({
        "id": doc.id,
        "values": embedding,
        "metadata": {
            "text": doc.content,
            "source": doc.source,
            # Store as a Unix timestamp: Pinecone's range operators
            # ($gte, $lte) compare numbers, not ISO date strings
            "created_at": doc.created_at.timestamp(),
        },
    })

# Batch upsert (100 vectors per call is the recommended batch size)
for i in range(0, len(vectors), 100):
    index.upsert(vectors=vectors[i:i + 100])

# Query
query_embedding = get_embedding("What are the payment terms?")

results = index.query(
    vector=query_embedding,
    top_k=5,
    include_metadata=True,
    # Filter by metadata
    filter={
        "source": {"$eq": "contracts"},
        "created_at": {"$gte": datetime(2025, 1, 1).timestamp()},
    },
)

for match in results.matches:
    print(f"Score: {match.score:.4f}")
    print(f"Text: {match.metadata['text'][:200]}")
    print("---")
Namespaces for Multi-Tenancy
# Pinecone namespaces for tenant isolation
index.upsert(
    vectors=vectors,
    namespace=f"tenant-{tenant_id}",  # Isolated per tenant
)

results = index.query(
    vector=query_embedding,
    top_k=5,
    namespace=f"tenant-{tenant_id}",
    include_metadata=True,
)
Serverless Pricing (2026)
Pinecone Serverless rates:
- Storage: $0.33/GB/month
- Writes: $2.00/million write units
- Reads: $4.00/million read units

Approximate costs at 1M documents (1536-dim, float32):
- Storage: ~6GB → ~$2/month
- Queries (100K/day): ~$12/month
- Ingestion (100K docs): ~$0.20/month
- Total: ~$14/month at low volume

At 100M documents, 1M queries/day:
- Storage: ~600GB → ~$198/month
- Queries: 30M/month → ~$120/month
- Total: ~$318/month (light usage at scale)

Costs scale roughly linearly with storage and query volume.
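The storage line items above follow directly from vector size. A quick sanity check of the arithmetic (raw float32 vectors only, ignoring metadata and index overhead, and assuming one read unit per query, which is a simplification of Pinecone's actual read-unit accounting):

```python
def serverless_storage_estimate(num_vectors: int, dim: int = 1536,
                                price_per_gb: float = 0.33) -> tuple[float, float]:
    """Raw vector storage in GB and monthly cost at $0.33/GB.
    float32 = 4 bytes per dimension; metadata overhead not included."""
    gb = num_vectors * dim * 4 / 1e9
    return gb, gb * price_per_gb

def monthly_read_cost(queries_per_day: int,
                      price_per_million: float = 4.00) -> float:
    """Monthly read cost assuming one read unit per query."""
    return queries_per_day * 30 / 1e6 * price_per_million
```

`serverless_storage_estimate(1_000_000)` gives ~6.1 GB and ~$2.03/month, and `monthly_read_cost(100_000)` gives $12.00/month, matching the low-volume figures above.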
Qdrant
Qdrant is the performance leader — written in Rust, it handles complex payload filtering without sacrificing search speed.
Getting Started
import os

from qdrant_client import QdrantClient, models

# Self-hosted (Docker) ...
client = QdrantClient(host="localhost", port=6333)

# ... or Qdrant Cloud
client = QdrantClient(
    url="https://your-cluster-url.qdrant.io",
    api_key=os.environ["QDRANT_API_KEY"],
)

# Create collection
client.create_collection(
    collection_name="documents",
    vectors_config=models.VectorParams(
        size=1536,
        distance=models.Distance.COSINE,
        on_disk=True,  # For large collections
    ),
    # HNSW configuration for performance tuning
    hnsw_config=models.HnswConfigDiff(
        m=16,  # Higher = better recall, more memory
        ef_construct=100,  # Higher = slower indexing, better quality
    ),
)
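The `on_disk` and `m` settings trade memory for speed and recall, so it helps to budget RAM before choosing them. A rough estimate, using the ~1.5× overhead multiplier Qdrant's own capacity-planning guidance suggests for in-memory collections (treat it as a rule of thumb, not a guarantee):

```python
def hnsw_memory_gb(num_vectors: int, dim: int = 1536) -> float:
    """Rough RAM estimate for an in-memory HNSW collection:
    raw float32 vectors times ~1.5x for graph and index overhead."""
    return num_vectors * dim * 4 * 1.5 / 1e9
```

`hnsw_memory_gb(1_000_000)` is ~9.2 GB; at 100M vectors that's ~920 GB, which is why `on_disk=True` matters for large collections.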
Upsert with Rich Payloads
from qdrant_client.models import PointStruct

# Qdrant uses "points" with payloads (equivalent to metadata)
points = [
    PointStruct(
        id=i,  # Integer or UUID
        vector=embedding,
        payload={
            "text": doc.content,
            "source": doc.source,
            "department": doc.department,
            "access_level": doc.access_level,
            "created_at": doc.created_at.timestamp(),
            "word_count": len(doc.content.split()),
        },
    )
    for i, (doc, embedding) in enumerate(zip(documents, embeddings))
]

client.upsert(
    collection_name="documents",
    points=points,
)
Advanced Filtering (Qdrant's Strength)
Qdrant's payload filtering is the most expressive in the category:
from datetime import datetime

from qdrant_client.models import Filter, FieldCondition, Range, MatchValue, MatchAny

# Complex filter: department = legal AND access_level >= 2 AND recent
results = client.search(
    collection_name="documents",
    query_vector=query_embedding,
    query_filter=Filter(
        must=[
            FieldCondition(
                key="department",
                match=MatchValue(value="legal"),
            ),
            FieldCondition(
                key="access_level",
                range=Range(gte=2),
            ),
            FieldCondition(
                key="created_at",
                range=Range(
                    gte=datetime(2025, 1, 1).timestamp(),
                ),
            ),
        ],
        should=[
            FieldCondition(
                key="source",
                match=MatchAny(any=["contracts", "agreements"]),
            ),
        ],
    ),
    limit=10,
    with_payload=True,
    with_vectors=False,  # Don't return vectors (saves bandwidth)
)

for result in results:
    print(f"Score: {result.score:.4f} | Source: {result.payload['source']}")
    print(result.payload['text'][:200])
Hybrid Search with Sparse Vectors
from qdrant_client.models import SparseVector, SparseVectorParams

# Create collection with both dense and sparse vectors
client.create_collection(
    collection_name="hybrid_docs",
    vectors_config={
        "dense": models.VectorParams(size=1536, distance=models.Distance.COSINE),
    },
    sparse_vectors_config={
        "sparse": SparseVectorParams(
            index=models.SparseIndexParams(on_disk=True),
        ),
    },
)

# Query with both (RRF fusion); the Query API selects a named vector via `using`
results = client.query_points(
    collection_name="hybrid_docs",
    prefetch=[
        models.Prefetch(
            query=dense_embedding,
            using="dense",
            limit=20,
        ),
        models.Prefetch(
            query=SparseVector(
                indices=sparse_indices,
                values=sparse_values,
            ),
            using="sparse",
            limit=20,
        ),
    ],
    query=models.FusionQuery(fusion=models.Fusion.RRF),
    limit=5,
)
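RRF (Reciprocal Rank Fusion) merges the two prefetch lists by rank rather than by raw score, which sidesteps the problem that dense and sparse scores live on different scales. A minimal sketch of the idea behind `Fusion.RRF` (k=60 is the conventional constant; Qdrant's internal implementation may differ in details):

```python
def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Reciprocal Rank Fusion: score(d) = sum over lists of 1 / (k + rank)."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# A doc ranked well in BOTH lists beats one that tops only a single list:
dense_ranking = ["a", "b", "c"]
sparse_ranking = ["b", "d", "a"]
fused = rrf_fuse([dense_ranking, sparse_ranking])  # "b" ranks first
```

Because only ranks matter, a document that is 2nd in both lists can outrank one that is 1st in one list and absent from the other, which is usually the behavior you want from hybrid retrieval.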
Weaviate
Weaviate is the hybrid search specialist. Its native BM25 index means you don't need a separate Elasticsearch instance for keyword search.
Getting Started
import os

import weaviate
import weaviate.classes as wvc

# Connect to Weaviate Cloud (WCS)
client = weaviate.connect_to_weaviate_cloud(
    cluster_url=os.environ["WEAVIATE_URL"],
    auth_credentials=weaviate.auth.AuthApiKey(
        os.environ["WEAVIATE_API_KEY"]
    ),
)

# Create collection
client.collections.create(
    name="Document",
    vectorizer_config=wvc.config.Configure.Vectorizer.text2vec_openai(
        model="text-embedding-3-small",
    ),
    generative_config=wvc.config.Configure.Generative.openai(
        model="gpt-4o",
    ),
    properties=[
        wvc.config.Property(
            name="content",
            data_type=wvc.config.DataType.TEXT,
        ),
        wvc.config.Property(
            name="source",
            data_type=wvc.config.DataType.TEXT,
            skip_vectorization=True,
        ),
        wvc.config.Property(
            name="department",
            data_type=wvc.config.DataType.TEXT,
            skip_vectorization=True,
        ),
    ],
)
Hybrid Search (Weaviate's Strength)
documents = client.collections.get("Document")

# Pure vector search
vector_results = documents.query.near_text(
    query="payment terms and conditions",
    limit=5,
    return_metadata=wvc.query.MetadataQuery(distance=True),
)

# Pure keyword search (BM25)
keyword_results = documents.query.bm25(
    query="payment terms NET30",
    limit=5,
    return_metadata=wvc.query.MetadataQuery(score=True),
)

# Hybrid search (vector + BM25 combined) — Weaviate's signature feature
hybrid_results = documents.query.hybrid(
    query="payment terms NET30",
    alpha=0.5,  # 0 = pure BM25, 1 = pure vector, 0.5 = balanced
    limit=5,
    filters=wvc.query.Filter.by_property("department").equal("legal"),
    return_metadata=wvc.query.MetadataQuery(score=True, explain_score=True),
)

for result in hybrid_results.objects:
    print(f"Score: {result.metadata.score:.4f}")
    print(f"Content: {result.properties['content'][:200]}")
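Conceptually, `alpha` is a weighted blend of the two score lists after each has been normalized to a common scale. This is a simplified sketch of the idea, not Weaviate's exact implementation (recent versions default to relative-score fusion, which normalizes each result set before blending):

```python
def hybrid_score(vector_score: float, bm25_score: float,
                 alpha: float = 0.5) -> float:
    """Blend normalized vector and BM25 scores for one document:
    alpha=1 -> pure vector, alpha=0 -> pure BM25."""
    return alpha * vector_score + (1 - alpha) * bm25_score
```

In practice this means `alpha` is the main tuning knob: push it toward 1 when paraphrased queries matter most, toward 0 when exact terms like "NET30" must dominate.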
Generative Search (RAG in One Query)
Weaviate's generative modules run the LLM call inside the database:
# Single query: search + generate response
response = documents.generate.hybrid(
    query="What are the payment terms in our enterprise contracts?",
    alpha=0.5,
    limit=3,
    # RAG: generate a response using the retrieved documents
    grouped_task="Summarize the payment terms found in these documents. "
                 "Format as a bullet list with key terms highlighted.",
)

print(response.generated)  # LLM-generated summary
for obj in response.objects:
    print(f"Source: {obj.properties['source']}")
Performance Comparison
Benchmark setup: 100M vectors, 1536 dimensions, 10% payload filter

| Metric | Pinecone | Qdrant | Weaviate |
|---|---|---|---|
| Latency (p50 / p95 / p99) | 12ms / 50ms / 85ms | 8ms / 20ms / 35ms | 10ms / 30ms / 55ms |
| Throughput | 10,000 QPS (managed, auto-scales) | 15,000 QPS (self-hosted, 32-core) | 5,000 QPS (self-hosted, 32-core) |
| Complex filter overhead (3 conditions) | +8ms (metadata index) | +2ms (native HNSW+filter) | +5ms |

Qdrant's HNSW+filter implementation is the most efficient: payload filtering runs during graph traversal, not as a post-filter.
Cost Comparison at Scale
| Scale | Pinecone Cloud | Weaviate Cloud | Qdrant Cloud | Self-hosted |
|---|---|---|---|---|
| 1M vectors | ~$14/month | ~$45/month | ~$20/month | ~$15/month |
| 10M vectors | ~$50/month | ~$120/month | ~$60/month | ~$50/month |
| 100M vectors | ~$350/month | ~$800/month | ~$400/month | ~$200/month |
| 1B vectors | ~$3,500/month | ~$2,200/month | ~$1,000/month | ~$800/month |
Estimates based on 1536-dim vectors, moderate query volume (100K queries/day), 2026 pricing
Feature Comparison
| Feature | Pinecone | Qdrant | Weaviate |
|---|---|---|---|
| Managed cloud | ✅ Only | ✅ + self-host | ✅ + self-host |
| Open source | ❌ | ✅ Apache 2.0 | ✅ BSD 3 |
| Hybrid search | ⚠️ Manual | ✅ Sparse vectors | ✅ Native BM25 |
| GraphQL API | ❌ | ❌ | ✅ |
| REST API | ✅ | ✅ | ✅ |
| gRPC API | ✅ | ✅ | ✅ |
| Built-in vectorizer | ❌ | ❌ | ✅ (module system) |
| Generative search | ❌ | ❌ | ✅ (RAG in one call) |
| Multi-tenancy | ✅ Namespaces | ✅ Collections | ✅ Native multi-tenancy |
| SOC 2 Type II | ✅ | ✅ Cloud | ✅ Cloud |
| HIPAA | ✅ Enterprise | ❌ | ✅ Enterprise Cloud |
| Payload filtering | ✅ Metadata | ✅✅ Best-in-class | ✅ Good |
| On-disk storage | ✅ | ✅ | ✅ |
| GPU acceleration | ❌ | ❌ | ❌ |
When pgvector Is Enough
Before committing to a dedicated vector DB, consider pgvector:
-- Enable pgvector extension
CREATE EXTENSION IF NOT EXISTS vector;
-- Add vector column to existing table
ALTER TABLE documents
ADD COLUMN embedding vector(1536);
-- Create HNSW index
CREATE INDEX ON documents
USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 64);
-- Query
SELECT id, content,
1 - (embedding <=> $1::vector) AS similarity
FROM documents
ORDER BY embedding <=> $1::vector
LIMIT 10;
pgvector is the right choice if:
- You're already on Postgres (Supabase, Neon, Amazon RDS)
- Vectors < 10M
- Query volume < 100 QPS
- You don't want another service to manage
Beyond those bounds, dedicated vector databases win on performance.
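The checklist above can be encoded as a quick gut check. The thresholds are this article's rules of thumb, not hard limits, and the function name is purely illustrative:

```python
def pgvector_is_enough(num_vectors: int, peak_qps: float,
                       already_on_postgres: bool) -> bool:
    """Rule of thumb from the guidelines above: stay on pgvector while
    you're already on Postgres, under ~10M vectors and ~100 QPS."""
    return already_on_postgres and num_vectors < 10_000_000 and peak_qps < 100
```

Workloads that clear any one of those bounds are where the dedicated databases in this comparison start earning their operational overhead.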
Decision Guide
Choose Pinecone if:
- You want zero infrastructure — no Docker, no k8s, no ops
- Your team has no MLOps resources
- HIPAA compliance is required (Enterprise tier)
- You're starting out and want to iterate fast
Choose Qdrant if:
- Performance is critical — lowest latency, highest throughput
- You need complex payload filtering (multiple conditions, nested objects)
- You're comfortable with self-hosting or Qdrant Cloud
- Cost at scale matters — significantly cheaper than Pinecone managed
Choose Weaviate if:
- Hybrid search (semantic + keyword) is a core requirement
- You want built-in vectorization (no separate embedding service)
- Generative search (RAG in one query) simplifies your architecture
- GraphQL API fits your existing patterns
Browse all vector database and AI infrastructure APIs at APIScout.
Related: RAG Pipeline: Pinecone vs Weaviate vs pgvector · Embedding Models Compared: OpenAI vs Cohere vs Voyage