Portkey vs Kong AI Gateway: LLM Routing APIs 2026
TL;DR
Choose Portkey if you're building LLM-first — it understands tokens, models, and prompts natively, with fallbacks, retries, semantic caching, cost routing, and per-model observability built in. Choose Kong if you're already running Kong for traditional APIs — its own benchmarks show it 228% faster, and you can extend familiar API gateway patterns to AI workloads, though you sacrifice LLM-aware features. LiteLLM is the open-source alternative that competes with both. For most AI-native teams in 2026, Portkey wins on features and developer experience despite the latency overhead.
Key Takeaways
- Portkey latency overhead: ~20–40ms (AI-native features cost latency)
- Kong latency advantage: 228% faster than Portkey and 65% lower latency, per Kong's own benchmarks
- Portkey pricing: Free (10K requests/month) → $49/month (100K requests) → Enterprise
- Kong AI Gateway pricing: Complex multi-dimensional model — gateway + request + plugin fees; $30+ per million requests at scale
- Semantic caching: Portkey has it natively (exact + semantic); Kong requires custom plugin
- Token observability: Portkey (native); Kong (treats requests as opaque blobs, no token-level data)
- Model fallbacks: Portkey (native, configured in JSON); Kong (requires custom Lua/Python plugin)
What AI Gateways Do
An AI gateway sits between your application and LLM providers (OpenAI, Anthropic, Gemini, etc.). Instead of calling providers directly, your app calls the gateway, which handles:
- Load balancing across providers/models
- Fallbacks when a provider is down or rate-limited
- Caching to avoid redundant LLM calls
- Observability (cost tracking, latency, error rates)
- Rate limiting and quota management
- Authentication and API key management
The question is whether you want a gateway that understands LLMs (tokens, models, prompts) or a gateway that treats LLM calls as generic HTTP requests.
Portkey: AI-Native Gateway
Getting Started
import Portkey from 'portkey-ai';

// Drop-in replacement for the OpenAI SDK
const portkey = new Portkey({
  apiKey: process.env.PORTKEY_API_KEY,
  virtualKey: 'openai-key-abc123', // Your stored OpenAI key
});

// Works exactly like the OpenAI SDK
const completion = await portkey.chat.completions.create({
  model: 'gpt-4o',
  messages: [{ role: 'user', content: 'Hello!' }],
});
For existing OpenAI SDK code, point the client at Portkey's base URL:
import OpenAI from 'openai';

const client = new OpenAI({
  apiKey: process.env.PORTKEY_API_KEY,
  baseURL: 'https://api.portkey.ai/v1',
  defaultHeaders: {
    'x-portkey-virtual-key': 'openai-key-abc123',
  },
});

// Your existing OpenAI code works unchanged
const response = await client.chat.completions.create({
  model: 'gpt-4o',
  messages: [{ role: 'user', content: 'Write a haiku.' }],
});
Fallback Configuration (Configs)
Portkey's most powerful feature is declarative routing via JSON configs:
// Portkey Config: fallback across providers
{
  "strategy": {
    "mode": "fallback"
  },
  "targets": [
    {
      "virtual_key": "openai-key-abc123",
      "model": "gpt-4o",
      "override_params": { "max_tokens": 4096 }
    },
    {
      "virtual_key": "anthropic-key-abc123",
      "model": "claude-3-5-sonnet-20241022"
    },
    {
      "virtual_key": "groq-key-abc123",
      "model": "llama-3.3-70b-versatile"
    }
  ]
}
// Use the config in your app
const portkey = new Portkey({
  apiKey: process.env.PORTKEY_API_KEY,
  config: 'config-id-from-dashboard',
});

// If OpenAI is down → falls back to Anthropic → then Groq
const response = await portkey.chat.completions.create({
  messages: [{ role: 'user', content: 'Help me debug this code.' }],
  // model is determined by the config's primary target
});
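Behind the config, fallback routing is conceptually simple: try each target in order until one succeeds. A minimal sketch of the idea (the `Target` shape and `callModel` helper are hypothetical illustrations, not Portkey's internals):

```typescript
type Target = { virtualKey: string; model: string };

// Try each target in order; return the first successful result.
// `callModel` stands in for the actual provider call.
async function withFallback<T>(
  targets: Target[],
  callModel: (t: Target) => Promise<T>,
): Promise<T> {
  let lastError: unknown = new Error('no targets configured');
  for (const target of targets) {
    try {
      return await callModel(target);
    } catch (err) {
      lastError = err; // provider down or rate-limited: try the next target
    }
  }
  throw lastError;
}
```

The gateway does this server-side, so your app sees one request and one response regardless of how many providers were tried.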
Load Balancing and Cost Routing
// Load balance between GPT-4o and GPT-4o-mini by cost
{
  "strategy": {
    "mode": "loadbalance"
  },
  "targets": [
    {
      "virtual_key": "openai-key-abc123",
      "model": "gpt-4o-mini",
      "weight": 0.8 // 80% of traffic
    },
    {
      "virtual_key": "openai-key-abc123",
      "model": "gpt-4o",
      "weight": 0.2 // 20% for complex queries
    }
  ]
}
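Weighted load balancing of this kind reduces to a cumulative-weight draw. A sketch of the selection logic (names are illustrative; `r` is a uniform random number in [0, 1), passed in explicitly so the function stays deterministic and testable):

```typescript
type WeightedTarget = { model: string; weight: number };

// Pick a target so each one receives traffic proportional to its weight.
function pickTarget(targets: WeightedTarget[], r: number): WeightedTarget {
  const total = targets.reduce((sum, t) => sum + t.weight, 0);
  let threshold = r * total;
  for (const t of targets) {
    threshold -= t.weight;
    if (threshold < 0) return t;
  }
  return targets[targets.length - 1]; // guard against floating-point edge cases
}

const targets: WeightedTarget[] = [
  { model: 'gpt-4o-mini', weight: 0.8 },
  { model: 'gpt-4o', weight: 0.2 },
];
// In production you would call pickTarget(targets, Math.random()).
```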
Semantic Caching
Portkey's caching goes beyond exact-match:
const portkey = new Portkey({
  apiKey: process.env.PORTKEY_API_KEY,
  config: {
    cache: {
      mode: 'semantic', // Fuzzy match on similar prompts
      maxAge: 3600, // Cache TTL in seconds
      // "What is the capital of France?" and
      // "Tell me the capital city of France" both hit cache
    },
  },
});

// Or exact-match for deterministic prompts
const portkeyExact = new Portkey({
  apiKey: process.env.PORTKEY_API_KEY,
  config: {
    cache: {
      mode: 'exact',
      maxAge: 86400,
    },
  },
});
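Semantic caching typically works by embedding the prompt and returning a cached response when a stored prompt's embedding is close enough. A toy sketch using cosine similarity (the embeddings below are hand-made stand-ins for a real embedding model):

```typescript
type CacheEntry = { embedding: number[]; response: string };

// Cosine similarity between two equal-length vectors.
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Return the cached response most similar to the query embedding,
// or null if nothing clears the similarity threshold.
function semanticLookup(
  cache: CacheEntry[],
  queryEmbedding: number[],
  threshold = 0.95,
): string | null {
  let best: string | null = null;
  let bestScore = threshold;
  for (const entry of cache) {
    const score = cosine(entry.embedding, queryEmbedding);
    if (score >= bestScore) {
      best = entry.response;
      bestScore = score;
    }
  }
  return best;
}
```

The threshold is the key tuning knob: too low and different questions share answers; too high and only near-duplicates hit.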
Token and Cost Observability
// Portkey tracks token usage, cost, and latency for every request automatically;
// view them in the dashboard or query them via the API
const response = await portkey.chat.completions.create({
  model: 'gpt-4o',
  messages: [{ role: 'user', content: prompt }],
});

// Attach a trace ID and metadata (sent as Portkey headers) for filtering in the dashboard
const portkeyWithMeta = new Portkey({
  apiKey: process.env.PORTKEY_API_KEY,
  traceId: `trace-${Date.now()}`, // Custom trace ID
  metadata: JSON.stringify({
    userId: user.id,
    featureName: 'code-review',
    environment: 'production',
  }),
});

// Query costs programmatically
const analytics = await portkey.analytics.list({
  startDate: '2026-03-01',
  endDate: '2026-03-15',
  groupBy: 'model',
});
// Returns: cost breakdown by model, user, feature, etc.
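Per-request cost figures like these come from a straightforward price-table lookup over token counts. A sketch with illustrative per-million-token rates (verify against current provider pricing before relying on them):

```typescript
// Illustrative USD prices per million tokens; check current provider pricing.
const PRICES: Record<string, { input: number; output: number }> = {
  'gpt-4o': { input: 2.5, output: 10 },
  'gpt-4o-mini': { input: 0.15, output: 0.6 },
};

// Cost of a single request: tokens in each direction times the rate.
function requestCost(model: string, inputTokens: number, outputTokens: number): number {
  const p = PRICES[model];
  if (!p) throw new Error(`no pricing entry for ${model}`);
  return (inputTokens * p.input + outputTokens * p.output) / 1_000_000;
}

// e.g. 850 input + 320 output tokens on gpt-4o ≈ $0.0053
```

This is exactly the bookkeeping a token-aware gateway does for you on every call; a token-blind gateway can't, because it never parses the usage fields in the response.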
Guardrails
// Add content safety without changing your app code
const config = {
guardrails: {
input: [
{
type: 'regex',
pattern: '(credit card|SSN|social security)',
action: 'block',
message: 'PII detected in input',
},
],
output: [
{
type: 'pii_detection',
action: 'redact',
},
],
},
};
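The input guardrail above amounts to a regex gate applied to the prompt before it reaches the model. A minimal sketch of that check (an illustration of the concept, not Portkey's implementation):

```typescript
type GuardrailResult = { allowed: true } | { allowed: false; message: string };

// Block the request when the prompt matches a PII pattern.
const PII_PATTERN = /(credit card|SSN|social security)/i;

function checkInput(prompt: string): GuardrailResult {
  if (PII_PATTERN.test(prompt)) {
    return { allowed: false, message: 'PII detected in input' };
  }
  return { allowed: true };
}
```

Output guardrails work the same way in reverse: the gateway inspects the completion before it reaches your app, redacting or blocking as configured.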
Kong AI Gateway
Kong is the enterprise API gateway that added AI capabilities. Its architecture treats LLM calls as enhanced HTTP requests through the same plugin infrastructure used for traditional APIs.
Setup
# kong.yml
_format_version: "3.0"
services:
  - name: openai-service
    url: https://api.openai.com
    routes:
      - name: chat-completions
        paths:
          - /v1/chat/completions
plugins:
  - name: ai-proxy
    service: openai-service
    config:
      route_type: llm/v1/chat
      auth:
        header_name: Authorization
        header_value: Bearer ${OPENAI_API_KEY}
      model:
        provider: openai
        name: gpt-4o
        options:
          max_tokens: 4096
          temperature: 0.7
AI Proxy Plugin
# AI proxy with provider fallback
plugins:
  - name: ai-proxy-advanced
    config:
      targets:
        - model:
            provider: openai
            name: gpt-4o
          auth:
            header_name: Authorization
            header_value: Bearer ${OPENAI_API_KEY}
          weight: 100
        - model:
            provider: anthropic
            name: claude-3-5-sonnet-20241022
          auth:
            header_name: x-api-key
            header_value: ${ANTHROPIC_API_KEY}
          weight: 0 # Failover only
      balancer:
        algorithm: round-robin
Rate Limiting
Kong's rate limiting (its traditional strength):
plugins:
  - name: rate-limiting
    config:
      second: 10
      minute: 100
      hour: 1000
      policy: redis
      redis:
        host: redis-host
        port: 6379
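Conceptually, this plugin keeps a request counter per time window and rejects once the limit is hit; Kong stores the counters in Redis so limits hold across gateway nodes. A fixed-window sketch of the counter logic (in-memory, for illustration only):

```typescript
// Fixed-window rate limiter: at most `limit` requests per `windowMs`.
// An in-memory sketch; a distributed gateway keeps these counters in Redis.
class FixedWindowLimiter {
  private count = 0;
  private windowStart = 0;

  constructor(private limit: number, private windowMs: number) {}

  // Returns true if the request at time `nowMs` is within the limit.
  allow(nowMs: number): boolean {
    if (nowMs - this.windowStart >= this.windowMs) {
      this.windowStart = nowMs; // new window: reset the counter
      this.count = 0;
    }
    return ++this.count <= this.limit;
  }
}
```

Note this counts requests, not tokens — which is exactly the limitation the next section describes: a 10-token request and a 10,000-token request look identical to a request-based limiter.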
What Kong Lacks for LLMs
Traditional API Gateway metrics Kong tracks:
✅ HTTP status codes
✅ Request/response latency
✅ Requests per second
✅ Bandwidth bytes
LLM-specific metrics Kong does NOT track:
❌ Token count (input/output)
❌ Token cost (no model pricing knowledge)
❌ Prompt content (opaque blob)
❌ Model-aware routing (no semantic understanding)
❌ Cache hits based on semantic similarity
❌ Cost per request to different providers
Kong sees: POST /v1/chat/completions with 2KB body
Portkey sees: 850 input tokens → gpt-4o → 320 output tokens → $0.0053 total
Performance Benchmark
Kong's published benchmark (Kong vs Portkey vs LiteLLM):
Environment: AWS, same region as OpenAI API
Latency (p50):
Kong: 12ms overhead
Portkey: 27ms overhead (+125% vs Kong)
LiteLLM: 95ms overhead (+692% vs Kong)
Throughput (requests/second):
Kong: 8,200 req/s
Portkey: 2,100 req/s
LiteLLM: 890 req/s
For real-time chat applications (200ms total budget):
12ms gateway overhead = 6% of budget (acceptable)
27ms gateway overhead = 13.5% of budget (meaningful)
95ms gateway overhead = 47.5% of budget (problematic)
Context: This benchmark was published by Kong. Independent benchmarks show smaller differences. For most SaaS applications with 500ms+ round-trip LLM calls, 15ms extra overhead is irrelevant.
Pricing Comparison
Portkey Pricing:
Free: 10,000 requests/month + 30-day logs
Starter: $49/month — 100K requests + 90-day logs
Business: $249/month — 1M requests + advanced guardrails
Enterprise: Custom — unlimited + HIPAA + dedicated infra
Overage: $9 per additional 100K requests (Starter)
Kong Pricing:
Free (self-hosted): Open-source, no request limits
Konnect (cloud): $0.016+ per unit (complex pricing)
Enterprise: multi-dimensional pricing; reportedly $0.068/unit at the entry tier
Note: Kong Konnect has 5+ pricing dimensions
(gateway services, request units, paid plugins, premium plugins)
making total cost unpredictable at scale.
Reports of $30+/million requests for full feature stack.
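From the Portkey tiers listed above, Starter-plan spend is a flat fee plus overage blocks. A sketch of the arithmetic (billing per *started* 100K block is an assumption; confirm the rounding behavior with Portkey):

```typescript
// Portkey Starter: $49/month includes 100K requests, then $9 per
// additional 100K block (assumed billed per started block).
function portkeyStarterCost(requests: number): number {
  const base = 49;
  const included = 100_000;
  if (requests <= included) return base;
  const overageBlocks = Math.ceil((requests - included) / 100_000);
  return base + overageBlocks * 9;
}
```

This flat-plus-overage model is easy to forecast; Kong's multi-dimensional pricing has no equivalently simple formula, which is the unpredictability the note above refers to.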
Feature Comparison
| Feature | Portkey | Kong AI Gateway |
|---|---|---|
| Latency overhead | ~20–40ms | ~8–15ms |
| Throughput | ~2,100 req/s | ~8,200 req/s |
| Model fallbacks | ✅ Native (JSON config) | ⚠️ Via custom plugin |
| Token observability | ✅ Native | ❌ Not available |
| Cost tracking | ✅ Per-request, per-model | ❌ Not available |
| Semantic caching | ✅ Native | ⚠️ Via ai-semantic-cache plugin (Enterprise) |
| Exact-match caching | ✅ | ✅ (ai-semantic-cache plugin) |
| Provider normalization | ✅ (OpenAI-compatible for all) | ⚠️ Per-provider config |
| Prompt management | ✅ Prompt hub + versioning | ❌ |
| Guardrails | ✅ Native | ❌ Custom plugin required |
| RBAC / teams | ✅ | ✅ Enterprise |
| SOC 2 Type II | ✅ | ✅ |
| HIPAA | ✅ Enterprise | ✅ Enterprise |
| Self-hosted | ✅ (open-source version) | ✅ (open-source) |
| Traditional API routing | ⚠️ Limited | ✅ Full-featured |
| Learning curve | Low (JSON configs) | High (Lua/YAML/Admin API) |
LiteLLM: The Open-Source Option
Worth mentioning: LiteLLM is the open-source alternative that both Portkey and Kong compete against:
from litellm import completion

# Unified interface across 100+ providers
response = completion(
    model='openai/gpt-4o',
    messages=[{'role': 'user', 'content': 'Hello!'}],
    # Fall back if the primary model fails
    fallbacks=['anthropic/claude-3-5-sonnet-20241022', 'groq/llama-3.3-70b'],
)

# Or run the LiteLLM proxy server:
#   litellm --model openai/gpt-4o
# Any OpenAI-compatible client works against http://localhost:4000
LiteLLM is slower than both Portkey and Kong (95ms+ overhead in Kong's benchmark), but it costs $0 to self-host with the full feature set.
Decision Guide
Choose Portkey if:
- Your primary concern is LLM-specific features (fallbacks, cost tracking, semantic cache)
- You're building AI-first applications where understanding token costs matters
- You want a fast setup — JSON configs, no DevOps
- HIPAA or SOC 2 is required on the managed platform
- Budget: $49–$249/month is acceptable
Choose Kong if:
- You're already running Kong for traditional APIs — unify AI and HTTP routing
- Latency performance at extreme scale (10K+ req/s) is critical
- You have DevOps capacity to write plugins for LLM-specific needs
- You want open-source self-hosting with no per-request fees
Choose LiteLLM if:
- You're cost-sensitive and have DevOps capacity
- You want full LLM-aware features (fallbacks, load balancing) at $0 cost
- Latency overhead is acceptable in your use case
Browse all AI gateway and LLM infrastructure APIs at APIScout.
Related: OpenRouter vs LiteLLM: API Gateway for Multiple AI Models · The Rise of AI Gateway APIs