OpenRouter API: One Key for 500+ LLM Models
The Problem OpenRouter Solves
Building an AI application in 2026 means navigating a fragmented model landscape. OpenAI releases GPT-4.1. Anthropic ships Claude Opus 4.6. Google releases Gemini 3.1 Pro. Meta drops Llama 4. Each requires a separate API key, separate billing, separate integration code, and separate rate limit management.
OpenRouter solves this with a single API gateway: one key, one endpoint, one billing account — access to every major frontier model and hundreds of open-source alternatives.
TL;DR
OpenRouter is an OpenAI-compatible API router that sits in front of every major LLM provider. You call one endpoint (https://openrouter.ai/api/v1), pass any model ID, and OpenRouter handles routing, authentication, and billing. It's genuinely useful for production systems that need model fallbacks, cost routing, or access to models across multiple providers without managing multiple accounts.
Key Takeaways
- Models: 500+ LLMs from OpenAI, Anthropic, Google, Meta, Mistral, Cohere, and dozens of open-source providers
- API format: Fully OpenAI-compatible — just change `base_url` and `api_key`
- Pricing: No per-token markup — you pay provider prices; 5.5% fee on credit purchases only
- Free models: 29 `:free` models (rate-limited: 20 req/min, 50 req/day without credits)
- Killer features: Model fallbacks, provider routing preferences, cost-based auto-routing
- No lock-in: Your code works identically against provider APIs; switching is a one-line change
How OpenRouter Works
When you make a request to OpenRouter:
- You specify a model ID (e.g., `anthropic/claude-opus-4-6`)
- OpenRouter authenticates your request using your single API key
- It routes to the appropriate provider (Anthropic, in this case), using your preferred provider settings
- The response is returned in OpenAI chat completions format — regardless of which provider served it
- OpenRouter bills your account; you manage one credit balance
```python
from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key="sk-or-v1-your-openrouter-key",
)

response = client.chat.completions.create(
    model="anthropic/claude-opus-4-6",
    messages=[{"role": "user", "content": "Explain transformer attention in one paragraph."}],
    extra_headers={
        "HTTP-Referer": "https://yourapp.com",  # Required for some rate tiers
        "X-Title": "Your App Name",             # Shown in OpenRouter dashboard
    },
)

print(response.choices[0].message.content)
```
No Anthropic SDK, no Anthropic API key, no Anthropic billing account. Just OpenRouter.
Supported Models
OpenRouter supports 500+ models across providers. Highlights:
OpenAI
- `openai/gpt-4.1` — Latest flagship
- `openai/gpt-4o` — Multimodal
- `openai/gpt-4o-mini` — Budget option
- `openai/o3` — Reasoning model
- `openai/o4-mini` — Budget reasoning
Anthropic
- `anthropic/claude-opus-4-6` — Most capable Claude
- `anthropic/claude-sonnet-4-6` — Balanced
- `anthropic/claude-haiku-4-5` — Fast and cheap
Google
- `google/gemini-2.5-pro` — 1M context, strong reasoning
- `google/gemini-3-flash-preview` — Newest Flash
- `google/gemini-2.5-flash` — Speed/cost balance
Meta / Open Source
- `meta-llama/llama-4-scout` — Latest Llama 4
- `meta-llama/llama-4-maverick` — Llama 4 large variant
- `meta-llama/llama-3.3-70b-instruct` — Proven workhorse
Mistral
- `mistralai/mistral-large-2` — Strong multilingual
- `mistralai/codestral` — Code specialist
- `mistralai/mistral-small` — Cheapest Mistral
Free Models
Many open-source models are available at $0/token (with rate limits):
- `meta-llama/llama-4-scout:free`
- `mistralai/mistral-7b-instruct:free`
- `google/gemma-3-12b-it:free`
Free models share a rate-limited pool — good for development and low-volume use, not production.
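You can also discover the free pool programmatically by filtering the `/models` payload for zero prices. A minimal sketch: the payload shape (a `data` list of entries with `id` and `pricing`) follows the public endpoint, but the sample entries below are illustrative, not live data, and `free_models` is a hypothetical helper.

```python
def free_models(models: list) -> list:
    """Return IDs of models whose prompt and completion prices are both zero."""
    free = []
    for m in models:
        pricing = m.get("pricing", {})
        if (float(pricing.get("prompt", "1")) == 0.0
                and float(pricing.get("completion", "1")) == 0.0):
            free.append(m["id"])
    return free

# Illustrative entries in the shape of the /models "data" list:
sample = [
    {"id": "meta-llama/llama-4-scout:free",
     "pricing": {"prompt": "0", "completion": "0"}},
    {"id": "openai/gpt-4o-mini",
     "pricing": {"prompt": "0.00000015", "completion": "0.0000006"}},
]
print(free_models(sample))  # ['meta-llama/llama-4-scout:free']
```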
Pricing
OpenRouter's pricing model is often misunderstood: there is no per-token markup. You pay the same per-token rate as calling the provider directly.
The only fees:
- Credit purchase fee: 5.5% (minimum $0.80) when buying credits upfront
- BYOK (Bring Your Own Key): First 1M requests/month free; 5% fee on provider cost beyond that
- Free models (`:free` suffix): 29 models with no token cost; rate-limited to 20 req/min and 50 req/day (1,000/day after purchasing $10+ in credits)
For most workloads, the 5.5% credit purchase fee is the only overhead — you amortize it across your entire credit balance. If you're spending $1,000 on API credits, you pay ~$55 to OpenRouter. If you use BYOK with your own API keys, you pay OpenRouter nothing for the first 1M requests.
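The fee arithmetic is easy to sanity-check in a couple of lines. A quick sketch: the `credit_fee` helper is illustrative, while the 5.5% rate and $0.80 minimum are the figures listed above.

```python
def credit_fee(purchase_usd: float, rate: float = 0.055, minimum: float = 0.80) -> float:
    """Fee charged when buying OpenRouter credits: 5.5%, floored at $0.80."""
    return max(purchase_usd * rate, minimum)

print(credit_fee(1000.0))  # ~55.0, matching the ~$55 on a $1,000 top-up
print(credit_fee(10.0))    # the $0.80 minimum kicks in on small purchases
```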
```python
# Check model pricing programmatically
import httpx

response = httpx.get("https://openrouter.ai/api/v1/models")
models = response.json()["data"]

for model in models[:5]:
    pricing = model.get("pricing", {})
    print(f"{model['id']}: ${pricing.get('prompt', 'N/A')}/token input")
```
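Those per-token prices can feed a simple cost estimator. A hedged sketch: `estimate_cost` is a hypothetical helper that assumes the `pricing` fields are USD-per-token strings, as the `/models` payload returns them, with token counts taken from a response's `usage` object.

```python
def estimate_cost(pricing: dict, prompt_tokens: int, completion_tokens: int) -> float:
    """Estimate a request's cost in USD from OpenRouter-style pricing strings."""
    return (prompt_tokens * float(pricing.get("prompt", 0))
            + completion_tokens * float(pricing.get("completion", 0)))

# A model priced at $3/M input and $15/M output tokens:
cost = estimate_cost({"prompt": "0.000003", "completion": "0.000015"}, 1000, 500)
print(f"${cost:.4f}")  # $0.0105
```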
Model Fallbacks
The most powerful OpenRouter feature for production: automatic failover when a provider has an outage or rate limits you.
```python
response = client.chat.completions.create(
    model="openai/gpt-4.1",
    messages=[{"role": "user", "content": "Hello"}],
    extra_body={
        "route": "fallback",
        "models": [
            "openai/gpt-4.1",
            "anthropic/claude-sonnet-4-6",  # Fallback 1
            "google/gemini-2.5-pro",        # Fallback 2
        ],
    },
)
```
If GPT-4.1 is rate-limited or down, OpenRouter automatically retries with Claude Sonnet 4.6, then Gemini 2.5 Pro. Your application never sees the error.
This is genuinely difficult to build yourself — you'd need error handling, retries, and provider health checking. OpenRouter gives it to you in one parameter.
Provider Routing Preferences
When a model is available from multiple providers (e.g., Llama 4 runs on Groq, Together AI, Fireworks, and others), you can control which provider OpenRouter uses:
```python
response = client.chat.completions.create(
    model="meta-llama/llama-4-maverick",
    messages=[{"role": "user", "content": "Hello"}],
    extra_body={
        "provider": {
            "order": ["Groq", "Together", "Fireworks"],  # Prefer in this order
            "allow_fallbacks": True,
        }
    },
)
```
Or use model routing variants — shorthand suffixes that let OpenRouter pick the best provider for your goal:
```python
# :nitro — routes to fastest available provider for this model
response = client.chat.completions.create(
    model="meta-llama/llama-4-scout:nitro",  # Maximum speed
    messages=[...],
)

# :floor — routes to cheapest available provider for this model
response = client.chat.completions.create(
    model="meta-llama/llama-4-scout:floor",  # Minimum cost
    messages=[...],
)

# :free — free tier (rate-limited, no cost)
response = client.chat.completions.create(
    model="meta-llama/llama-4-scout:free",  # Free with rate limits
    messages=[...],
)
```
This lets you optimize for cost or speed across the same model without changing your code.
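In application code, the goal-to-suffix mapping can be centralized in one place. A tiny sketch (the `route_model` helper is hypothetical, not part of any SDK; the suffixes are the ones shown above):

```python
# Map an optimization goal to OpenRouter's routing-variant suffix.
VARIANTS = {"speed": ":nitro", "cost": ":floor", "free": ":free"}

def route_model(model_id: str, goal: str = None) -> str:
    """Append the routing-variant suffix for `goal`; default routing if None."""
    return model_id + VARIANTS.get(goal, "")

print(route_model("meta-llama/llama-4-scout", "cost"))  # meta-llama/llama-4-scout:floor
print(route_model("meta-llama/llama-4-scout"))          # meta-llama/llama-4-scout
```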
Streaming
OpenRouter supports streaming with the same interface as the OpenAI SDK:
```python
stream = client.chat.completions.create(
    model="anthropic/claude-sonnet-4-6",
    messages=[{"role": "user", "content": "Write a poem about APIs."}],
    stream=True,
)

for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
```
Streaming works across all major models. OpenRouter normalizes the stream format so you get consistent delta.content chunks regardless of whether the underlying model is GPT, Claude, or Gemini.
Context and System Prompts
Context and system prompts work exactly as they do with the OpenAI SDK:
```python
response = client.chat.completions.create(
    model="anthropic/claude-sonnet-4-6",
    messages=[
        {
            "role": "system",
            "content": "You are a senior backend engineer. Be concise and precise.",
        },
        {
            "role": "user",
            "content": "What's the difference between optimistic and pessimistic locking?",
        },
    ],
    temperature=0.2,
    max_tokens=512,
)
```
Multi-turn conversations work identically — append messages to the list and send the full history each time. OpenRouter passes the conversation to the upstream provider as-is.
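One way to manage that history is a small session wrapper. This is a sketch, not an OpenRouter feature: `ChatSession` is a hypothetical class, and `client` is any OpenAI-compatible client (such as the OpenRouter-configured one from earlier), injected so the history logic stays easy to test.

```python
from typing import Optional

class ChatSession:
    """Keep a running message history and resend it in full on every turn."""

    def __init__(self, client, model: str, system: Optional[str] = None):
        self.client = client  # any OpenAI-compatible client
        self.model = model
        self.history = []
        if system:
            self.history.append({"role": "system", "content": system})

    def send(self, user_message: str) -> str:
        self.history.append({"role": "user", "content": user_message})
        response = self.client.chat.completions.create(
            model=self.model, messages=self.history
        )
        reply = response.choices[0].message.content
        # Store the assistant turn so the next call carries full context
        self.history.append({"role": "assistant", "content": reply})
        return reply
```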
Model Benchmarking Workflow
One of the best uses of OpenRouter: rapidly benchmarking which model performs best for your specific task.
```python
MODELS_TO_TEST = [
    "openai/gpt-4.1",
    "anthropic/claude-opus-4-6",
    "google/gemini-2.5-pro",
    "meta-llama/llama-4-maverick",
    "mistralai/mistral-large-2",
]

TEST_PROMPTS = [
    "Extract the company name, date, and total amount from this invoice: [invoice text]",
    "Classify this support ticket as bug/feature/question: [ticket text]",
    "Summarize this 5-page contract in 3 bullet points: [contract text]",
]

def benchmark_models():
    results = {}
    for model in MODELS_TO_TEST:
        model_results = []
        for prompt in TEST_PROMPTS:
            try:
                response = client.chat.completions.create(
                    model=model,
                    messages=[{"role": "user", "content": prompt}],
                    max_tokens=256,
                )
                model_results.append({
                    "output": response.choices[0].message.content,
                    "tokens": response.usage.total_tokens,
                    # calculate_cost: your own helper mapping model + usage to dollars
                    "cost_estimate": calculate_cost(model, response.usage),
                })
            except Exception as e:
                model_results.append({"error": str(e)})
        results[model] = model_results
    return results
```
With one API key, you get cost-normalized comparisons across every major frontier model. Migrating from one model to another becomes a one-line change.
Rate Limits and Production Considerations
OpenRouter's rate limits work in layers:
- Your OpenRouter account limits — based on your plan and credit balance
- Provider-specific limits — OpenRouter can hit provider rate limits independently
- Per-model limits — some models have tighter limits than others
For production, set up error handling that distinguishes between OpenRouter errors (your account rate-limited) and provider errors (upstream capacity):
```python
import time
from openai import RateLimitError, APIError

def chat_with_retry(model: str, messages: list, max_retries: int = 3) -> str:
    for attempt in range(max_retries):
        try:
            response = client.chat.completions.create(
                model=model,
                messages=messages,
            )
            return response.choices[0].message.content
        except RateLimitError:
            if attempt < max_retries - 1:
                time.sleep(2 ** attempt)  # Exponential backoff
            else:
                raise
        except APIError as e:
            if "provider" in str(e).lower():
                # Provider-side error — try a fallback model
                # (get_fallback_models: your own mapping from model to alternates)
                fallback_models = get_fallback_models(model)
                if fallback_models:
                    return chat_with_retry(fallback_models[0], messages, max_retries)
            raise
    raise Exception("Max retries exceeded")
```
Credits and Billing
OpenRouter uses a credit system:
- Buy credits upfront or set up auto-reload
- Each API call deducts from your balance based on token count × model price
- The OpenRouter dashboard shows real-time spend per model
- Per-model cost breakdown — useful for understanding which models drive your bill
For teams: OpenRouter supports multiple API keys under one account, letting you track spend by project or team.
TypeScript Usage
```typescript
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://openrouter.ai/api/v1",
  apiKey: process.env.OPENROUTER_API_KEY,
  defaultHeaders: {
    "HTTP-Referer": "https://yourapp.com",
    "X-Title": "Your App",
  },
});

async function chat(message: string, model: string = "openai/gpt-4.1") {
  const response = await client.chat.completions.create({
    model,
    messages: [{ role: "user", content: message }],
  });
  return response.choices[0].message.content;
}

// Switch models with a single variable change
const cheapResponse = await chat("Hello", "meta-llama/llama-4-scout:free");
const premiumResponse = await chat("Hello", "anthropic/claude-opus-4-6");
```
OpenRouter vs. Direct Provider Access
| Factor | OpenRouter | Direct Provider |
|---|---|---|
| Setup | One key, one account | Separate key + billing per provider |
| Cost | 5.5% fee on credit purchases (no per-token markup) | Direct pricing |
| Model access | 500+ models instantly | Only that provider's models |
| Fallbacks | Built-in | DIY (complex) |
| Observability | Unified dashboard | Separate per provider |
| Rate limits | Pooled across providers | Per-provider |
| Data privacy | Traffic through OpenRouter | Direct to provider |
| Fine-tuned models | Your fine-tunes on OpenAI only | Full fine-tune access |
Choose OpenRouter when:
- You want to experiment with many models quickly
- You need model fallbacks in production
- You're building a multi-model product (let users choose their AI)
- You don't want to manage multiple API accounts
Go direct when:
- You're at high volume (optimize out the margin)
- You need fine-tuned models
- Data privacy requires direct-to-provider routing
- You're 100% committed to one provider
Building a Model-Agnostic Abstraction Layer
The pattern most OpenRouter users end up with in production: a thin wrapper that lets you swap models with config changes, not code changes.
```python
import os
from typing import Optional

from openai import OpenAI

class LLMClient:
    """OpenRouter-backed LLM client with model flexibility."""

    def __init__(self):
        self.client = OpenAI(
            base_url="https://openrouter.ai/api/v1",
            api_key=os.environ["OPENROUTER_API_KEY"],
        )
        # Models configured via env — change without code deploys
        self.default_model = os.getenv("LLM_DEFAULT_MODEL", "openai/gpt-4.1")
        self.fast_model = os.getenv("LLM_FAST_MODEL", "meta-llama/llama-4-scout")
        self.cheap_model = os.getenv("LLM_CHEAP_MODEL", "openai/gpt-4o-mini")

    def complete(
        self,
        prompt: str,
        mode: str = "default",
        system: Optional[str] = None,
    ) -> str:
        model = {
            "default": self.default_model,
            "fast": self.fast_model,
            "cheap": self.cheap_model,
        }.get(mode, self.default_model)

        messages = []
        if system:
            messages.append({"role": "system", "content": system})
        messages.append({"role": "user", "content": prompt})

        response = self.client.chat.completions.create(
            model=model,
            messages=messages,
        )
        return response.choices[0].message.content

# Usage — model selection via config, not code
llm = LLMClient()
result = llm.complete("Summarize this doc", mode="cheap")  # Uses configured cheap model
```
When a new model launches that's 30% cheaper for your use case, you update an environment variable and redeploy — no code changes.
Limits and Gotchas
Context window limits: OpenRouter enforces the upstream model's context limits — no magic here.
Model-specific features: Extended thinking (Claude), reasoning tokens (o3), etc. — some are passed through, others aren't. Check the model's page on openrouter.ai.
Latency overhead: OpenRouter adds ~50–150ms of routing latency. For real-time voice applications, go direct.
Free tier throttling: Free models can queue during peak hours. Not suitable for production SLAs.
HTTP-Referer header: OpenRouter uses this for rate tier classification. Set it to your app's domain. Without it, you default to the lowest tier.
Bottom Line
OpenRouter is a genuine time saver for teams that need multi-model flexibility. The overhead is minimal at typical API spend levels, and features like fallbacks and provider routing are hard to replicate yourself. For a startup exploring which model works best for their use case — or building a product that supports multiple AI backends — OpenRouter is the fastest path to production.
For committed, high-volume production workloads on a single provider, go direct and skip the margin.
The typical lifecycle: start on OpenRouter, benchmark models, find your winner, then evaluate whether direct access saves enough to justify the migration at your volume. For most teams at $1K–$10K/month in API spend, OpenRouter's overhead is noise. At $100K+/month, it's worth the switch to direct.
Either way, designing your codebase around a simple abstraction layer (swap base_url and api_key) means the migration is always a one-hour task, not a refactor.
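That abstraction boundary can be as small as one config function. A sketch under stated assumptions: `gateway_config` and the env-var names are illustrative, not a standard; it returns the kwargs you would pass as `OpenAI(**gateway_config(...))`.

```python
import os

def gateway_config(gateway: str = "openrouter") -> dict:
    """Return the base_url/api_key kwargs for an OpenAI-compatible client."""
    if gateway == "openrouter":
        return {
            "base_url": "https://openrouter.ai/api/v1",
            "api_key": os.environ["OPENROUTER_API_KEY"],
        }
    # "direct": let the SDK use its default base_url with the provider's key
    return {"api_key": os.environ["OPENAI_API_KEY"]}
```

Swapping gateways then means changing one string, plus the model ID, since OpenRouter prefixes the vendor (`openai/gpt-4.1` vs `gpt-4.1`).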
Browse all supported models and live pricing at APIScout.
Related: How to Choose an LLM API in 2026 · Groq API: Fastest LLM Inference 2026