
Portkey vs Kong AI Gateway: LLM Routing APIs 2026

· APIScout Team
Tags: portkey · kong · ai-gateway · llm-routing · llmops · openai · anthropic · 2026

TL;DR

Portkey if you're building LLM-first — it understands tokens, models, and prompts natively. Fallbacks, retries, semantic caching, cost routing, and per-model observability are all built-in. Kong if you're already running Kong for traditional APIs — it's 228% faster (per its own published benchmark), and you can extend familiar API gateway patterns to AI workloads, though you sacrifice LLM-aware features. LiteLLM is the open-source alternative that competes with both. For most AI-native teams in 2026, Portkey wins on features and developer experience despite the latency overhead.

Key Takeaways

  • Portkey latency overhead: ~20–40ms (AI-native features cost latency)
  • Kong latency advantage: 228% faster than Portkey, 65% lower latency (in Kong's own benchmarks)
  • Portkey pricing: Free (10K requests/month) → $49/month (100K requests) → Enterprise
  • Kong AI Gateway pricing: Complex multi-dimensional model — gateway + request + plugin fees; $30+ per million requests at scale
  • Semantic caching: Portkey has it natively (exact + semantic); Kong requires custom plugin
  • Token observability: Portkey (native); Kong (treats requests as opaque blobs, no token-level data)
  • Model fallbacks: Portkey (native, configured in JSON); Kong (requires custom Lua/Python plugin)

What AI Gateways Do

An AI gateway sits between your application and LLM providers (OpenAI, Anthropic, Gemini, etc.). Instead of calling providers directly, your app calls the gateway, which handles:

  • Load balancing across providers/models
  • Fallbacks when a provider is down or rate-limited
  • Caching to avoid redundant LLM calls
  • Observability (cost tracking, latency, error rates)
  • Rate limiting and quota management
  • Authentication and API key management

The question is whether you want a gateway that understands LLMs (tokens, models, prompts) or a gateway that treats LLM calls as generic HTTP requests.
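To make the fallback bullet concrete, here is a minimal sketch of the try-next-target loop a gateway runs on your behalf. The `Target` type and the `call` function are hypothetical stand-ins for real provider SDK calls, not any vendor's actual code:

```typescript
type Target = { provider: string; model: string };

// Hypothetical provider call: resolves with the completion text,
// throws on outage or rate limit.
type CallFn = (target: Target, prompt: string) => Promise<string>;

// Try each target in order; return the first success.
async function withFallback(
  targets: Target[],
  prompt: string,
  call: CallFn,
): Promise<string> {
  let lastError: unknown;
  for (const target of targets) {
    try {
      return await call(target, prompt);
    } catch (err) {
      lastError = err; // provider down or rate-limited: try the next one
    }
  }
  throw lastError;
}
```

Both Portkey and Kong layer retries, timeouts, and weighting on top of this basic shape; the difference is whether the gateway also understands what's inside the request.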


Portkey: AI-Native Gateway

Getting Started

import Portkey from 'portkey-ai';

// Drop-in replacement for OpenAI SDK
const portkey = new Portkey({
  apiKey: process.env.PORTKEY_API_KEY,
  virtualKey: 'openai-key-abc123', // Your stored OpenAI key
});

// Works exactly like OpenAI SDK
const completion = await portkey.chat.completions.create({
  model: 'gpt-4o',
  messages: [{ role: 'user', content: 'Hello!' }],
});

For existing OpenAI SDK code, add Portkey as a base URL:

import OpenAI from 'openai';

const client = new OpenAI({
  apiKey: process.env.PORTKEY_API_KEY,
  baseURL: 'https://api.portkey.ai/v1',
  defaultHeaders: {
    'x-portkey-virtual-key': 'openai-key-abc123',
  },
});

// Your existing OpenAI code works unchanged
const response = await client.chat.completions.create({
  model: 'gpt-4o',
  messages: [{ role: 'user', content: 'Write a haiku.' }],
});

Fallback Configuration (Configs)

Portkey's most powerful feature is declarative routing via JSON configs:

// Portkey Config: fallback across providers
{
  "strategy": {
    "mode": "fallback"
  },
  "targets": [
    {
      "virtual_key": "openai-key-abc123",
      "model": "gpt-4o",
      "override_params": { "max_tokens": 4096 }
    },
    {
      "virtual_key": "anthropic-key-abc123",
      "model": "claude-3-5-sonnet-20241022"
    },
    {
      "virtual_key": "groq-key-abc123",
      "model": "llama-3.3-70b-versatile"
    }
  ]
}

// Use the config in your app
const portkey = new Portkey({
  apiKey: process.env.PORTKEY_API_KEY,
  config: 'config-id-from-dashboard',
});

// If OpenAI is down → falls back to Anthropic → then Groq
const response = await portkey.chat.completions.create({
  messages: [{ role: 'user', content: 'Help me debug this code.' }],
  // model is determined by the config's primary target
});

Load Balancing and Cost Routing

// Load balance between GPT-4o and GPT-4o-mini by cost
{
  "strategy": {
    "mode": "loadbalance"
  },
  "targets": [
    {
      "virtual_key": "openai-key-abc123",
      "model": "gpt-4o-mini",
      "weight": 0.8  // 80% of traffic
    },
    {
      "virtual_key": "openai-key-abc123",
      "model": "gpt-4o",
      "weight": 0.2  // 20% for complex queries
    }
  ]
}
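What a weight of 0.8 means in practice: each request picks a target with probability proportional to its weight. A minimal sketch of that selection step (`weightedPick` is illustrative, not Portkey's implementation):

```typescript
type WeightedTarget = { model: string; weight: number };

// Pick one target with probability proportional to its weight.
function weightedPick(
  targets: WeightedTarget[],
  rand: () => number = Math.random,
): WeightedTarget {
  const total = targets.reduce((sum, t) => sum + t.weight, 0);
  let r = rand() * total;
  for (const t of targets) {
    r -= t.weight;
    if (r <= 0) return t;
  }
  return targets[targets.length - 1]; // guard against float rounding
}
```

With the weights above, roughly 80% of calls land on gpt-4o-mini, keeping average cost low while still routing a slice of traffic to the larger model.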

Semantic Caching

Portkey's caching goes beyond exact-match:

const portkey = new Portkey({
  apiKey: process.env.PORTKEY_API_KEY,
  config: {
    cache: {
      mode: 'semantic',  // Fuzzy match on similar prompts
      maxAge: 3600,      // Cache TTL in seconds
      // "What is the capital of France?" and
      // "Tell me the capital city of France" both hit cache
    },
  },
});

// Or exact-match for deterministic prompts
const portkeyExact = new Portkey({
  apiKey: process.env.PORTKEY_API_KEY,
  config: {
    cache: {
      mode: 'exact',
      maxAge: 86400,
    },
  },
});
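Under the hood, semantic caching generally works by embedding the incoming prompt and comparing it against embeddings of cached prompts; above a similarity threshold, the cached response is returned without calling the model. A sketch of the matching step — the embedding call is elided, and the 0.95 threshold is an assumption (Portkey's actual threshold and embedding model are not public):

```typescript
// Cosine similarity between two embedding vectors of equal length.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

type CacheEntry = { embedding: number[]; response: string };

// Return a cached response if any stored prompt is similar enough.
function semanticLookup(
  queryEmbedding: number[],
  cache: CacheEntry[],
  threshold = 0.95, // assumed; real gateways tune this
): string | null {
  for (const entry of cache) {
    if (cosineSimilarity(queryEmbedding, entry.embedding) >= threshold) {
      return entry.response;
    }
  }
  return null;
}
```

This is why "What is the capital of France?" and "Tell me the capital city of France" can hit the same cache entry: their embeddings sit close together even though the strings differ.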

Token and Cost Observability

// Portkey tracks token usage, costs, and latency automatically
// Access via dashboard or API

const response = await portkey.chat.completions.create({
  model: 'gpt-4o',
  messages: [{ role: 'user', content: prompt }],
});

// Attach a trace ID and metadata (sent as request headers)
// for filtering in the dashboard
const portkeyWithMeta = new Portkey({
  apiKey: process.env.PORTKEY_API_KEY,
  traceId: `trace-${Date.now()}`,      // Custom trace ID
  metadata: JSON.stringify({
    userId: user.id,
    featureName: 'code-review',
    environment: 'production',
  }),
});

// Query costs programmatically
const analytics = await portkey.analytics.list({
  startDate: '2026-03-01',
  endDate: '2026-03-15',
  groupBy: 'model',
});
// Returns: cost breakdown by model, user, feature, etc.

Guardrails

// Add content safety without changing your app code
const config = {
  guardrails: {
    input: [
      {
        type: 'regex',
        pattern: '(credit card|SSN|social security)',
        action: 'block',
        message: 'PII detected in input',
      },
    ],
    output: [
      {
        type: 'pii_detection',
        action: 'redact',
      },
    ],
  },
};
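The input guardrail above amounts to a regex check before the request ever reaches a provider. A minimal sketch of the blocking step (`checkInput` is hypothetical — Portkey evaluates guardrails server-side, so you don't write this yourself):

```typescript
type GuardrailResult = { allowed: boolean; message?: string };

// Mirror of the regex input guardrail above: block prompts
// that mention PII-like terms.
const PII_PATTERN = /(credit card|SSN|social security)/i;

function checkInput(prompt: string): GuardrailResult {
  if (PII_PATTERN.test(prompt)) {
    return { allowed: false, message: 'PII detected in input' };
  }
  return { allowed: true };
}
```

The `pii_detection` output guardrail is the same idea in reverse: the model's response is scanned and matching spans are redacted before your application sees them.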

Kong AI Gateway

Kong is the enterprise API gateway that added AI capabilities. Its architecture treats LLM calls as enhanced HTTP requests through the same plugin infrastructure used for traditional APIs.

Setup

# kong.yml
_format_version: "3.0"

services:
  - name: openai-service
    url: https://api.openai.com
    routes:
      - name: chat-completions
        paths:
          - /v1/chat/completions

plugins:
  - name: ai-proxy
    service: openai-service
    config:
      route_type: llm/v1/chat
      auth:
        header_name: Authorization
        header_value: Bearer ${OPENAI_API_KEY}
      model:
        provider: openai
        name: gpt-4o
        options:
          max_tokens: 4096
          temperature: 0.7

AI Proxy Plugin

# AI proxy with provider fallback
plugins:
  - name: ai-proxy-advanced
    config:
      targets:
        - model:
            provider: openai
            name: gpt-4o
          auth:
            header_name: Authorization
            header_value: Bearer ${OPENAI_API_KEY}
          weight: 100
        - model:
            provider: anthropic
            name: claude-3-5-sonnet-20241022
          auth:
            header_name: x-api-key
            header_value: ${ANTHROPIC_API_KEY}
          weight: 0  # Failover only

      balancer:
        algorithm: round-robin

Rate Limiting

Kong's rate limiting (its traditional strength):

plugins:
  - name: rate-limiting
    config:
      second: 10
      minute: 100
      hour: 1000
      policy: redis
      redis:
        host: redis-host
        port: 6379
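Conceptually, each limit is a counter per time window; the `redis` policy shares those counters across gateway nodes so the limit holds cluster-wide. An in-memory, single-node sketch of the idea (Kong's actual implementation is in Lua and supports several policies and windows):

```typescript
// Fixed-window counter: allow at most `limit` requests per window.
class FixedWindowLimiter {
  private windowStart = 0;
  private count = 0;

  constructor(private limit: number, private windowMs: number) {}

  allow(nowMs: number): boolean {
    if (nowMs - this.windowStart >= this.windowMs) {
      this.windowStart = nowMs; // new window: reset the counter
      this.count = 0;
    }
    return ++this.count <= this.limit;
  }
}
```

The config above stacks three such windows (second, minute, hour); a request must pass all of them to go through.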

What Kong Lacks for LLMs

Traditional API Gateway metrics Kong tracks:
  ✅ HTTP status codes
  ✅ Request/response latency
  ✅ Requests per second
  ✅ Bandwidth (bytes)

LLM-specific metrics Kong does NOT track:
  ❌ Token count (input/output)
  ❌ Token cost (no model pricing knowledge)
  ❌ Prompt content (opaque blob)
  ❌ Model-aware routing (no semantic understanding)
  ❌ Cache hits based on semantic similarity
  ❌ Cost per request to different providers

Kong sees: POST /v1/chat/completions with 2KB body
Portkey sees: 850 input tokens → gpt-4o → 320 output tokens → $0.0053 total
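That $0.0053 figure is plain token arithmetic — which is exactly the data Kong doesn't have. A sketch, assuming gpt-4o list prices of $2.50 per million input tokens and $10 per million output tokens (assumed figures; verify against current provider pricing):

```typescript
// Per-token prices in USD (assumed list prices; check the provider's page).
const PRICING: Record<string, { input: number; output: number }> = {
  'gpt-4o': { input: 2.5 / 1_000_000, output: 10 / 1_000_000 },
};

// Estimate the cost of one request from its token counts.
function estimateCost(model: string, inputTokens: number, outputTokens: number): number {
  const p = PRICING[model];
  if (!p) throw new Error(`no pricing for ${model}`);
  return inputTokens * p.input + outputTokens * p.output;
}
```

850 input + 320 output tokens on gpt-4o works out to $0.005325, which rounds to the $0.0053 shown above. A token-aware gateway does this per request, per model, automatically.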

Performance Benchmark

Kong's published benchmark (Kong vs Portkey vs LiteLLM):

Environment: AWS, same region as OpenAI API

Latency (p50):
  Kong:      12ms overhead
  Portkey:   27ms overhead  (+125% vs Kong)
  LiteLLM:   95ms overhead  (+692% vs Kong)

Throughput (requests/second):
  Kong:      8,200 req/s
  Portkey:   2,100 req/s
  LiteLLM:   890 req/s

For real-time chat applications (200ms total budget):
  12ms gateway overhead = 6% of budget (acceptable)
  27ms gateway overhead = 13.5% of budget (meaningful)
  95ms gateway overhead = 47.5% of budget (problematic)

Context: This benchmark was published by Kong. Independent benchmarks show smaller differences. For most SaaS applications with 500ms+ round-trip LLM calls, 15ms extra overhead is irrelevant.
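The budget percentages above are simple ratios; the same arithmetic lets you plug in your own latency budget (a trivial helper, not from any vendor):

```typescript
// Gateway overhead as a percentage of a total latency budget.
function overheadShare(overheadMs: number, budgetMs: number): number {
  return (overheadMs * 100) / budgetMs;
}
```

For example, overheadShare(27, 200) gives 13.5 — the Portkey figure quoted above. With a 500ms budget, the same 27ms drops to 5.4%, which is why the overhead matters far less for typical SaaS workloads.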


Pricing Comparison

Portkey Pricing:
  Free:        10,000 requests/month + 30-day logs
  Starter:     $49/month — 100K requests + 90-day logs
  Business:    $249/month — 1M requests + advanced guardrails
  Enterprise:  Custom — unlimited + HIPAA + dedicated infra

  Overage: $9 per additional 100K requests (Starter)

Kong Pricing:
  Free (self-hosted): Open-source, no request limits
  Konnect (cloud):    $0.016+ per unit (complex pricing)
  Enterprise:         Multi-dimensional: $0.068/unit in Starter tier

  Note: Kong Konnect has 5+ pricing dimensions
  (gateway services, request units, paid plugins, premium plugins)
  making total cost unpredictable at scale.
  Reports of $30+/million requests for full feature stack.
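The Portkey tiers, at least, reduce to a simple formula. A sketch using the Starter numbers above ($49 for 100K requests, $9 per additional 100K block); Kong's multi-dimensional Konnect pricing has no equivalent one-liner:

```typescript
// Portkey Starter: $49 base covers 100K requests;
// overage is billed in $9 blocks of 100K.
function portkeyStarterCost(requests: number): number {
  const included = 100_000;
  const overageBlocks = Math.max(0, Math.ceil((requests - included) / 100_000));
  return 49 + overageBlocks * 9;
}
```

At 500K requests/month this comes to $85 — past that point the $249 Business tier (1M requests included) starts to look cheaper per request.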

Feature Comparison

Feature                     Portkey                          Kong AI Gateway
Latency overhead            ~20–40ms                         ~8–15ms
Throughput                  ~2,100 req/s                     ~8,200 req/s
Model fallbacks             ✅ Native (JSON config)          ⚠️ Via custom plugin
Token observability         ✅ Native                        ❌ Not available
Cost tracking               ✅ Per-request, per-model        ❌ Not available
Semantic caching            ✅ Native                        ❌ Not available
Exact-match caching         ✅                               ✅ (ai-semantic-cache plugin)
Provider normalization      ✅ (OpenAI-compatible for all)   ⚠️ Per-provider config
Prompt management           ✅ Prompt hub + versioning       ❌
RBAC / teams                ✅ Enterprise                    ✅ Enterprise
SOC 2 Type II               ✅                               ✅
HIPAA                       ✅ Enterprise                    ✅ Enterprise
Self-hosted                 ✅ (open-source version)         ✅ (open-source)
Traditional API routing     ⚠️ Limited                       ✅ Full-featured
Learning curve              Low (JSON configs)               High (Lua/YAML/Admin API)

LiteLLM: The Open-Source Option

Worth mentioning: LiteLLM is the open-source alternative that both Portkey and Kong compete against:

from litellm import completion

# Unified interface across 100+ providers
response = completion(
    model='openai/gpt-4o',
    messages=[{'role': 'user', 'content': 'Hello!'}],
    # Fallback
    fallbacks=['anthropic/claude-3-5-sonnet-20241022', 'groq/llama-3.3-70b'],
)

# Or run LiteLLM proxy server
# litellm --model openai/gpt-4o
# Any OpenAI-compatible client works against http://localhost:4000

LiteLLM is slower than both Portkey and Kong (95ms+ overhead) but $0 for self-hosted deployments with full features.
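Because the proxy speaks the OpenAI wire format, calling it from TypeScript is just an HTTP POST — no LiteLLM-specific client needed. A sketch (the localhost URL comes from the proxy example above; the body is the standard chat-completions shape):

```typescript
// Build a chat-completions request for an OpenAI-compatible endpoint,
// such as a local LiteLLM proxy.
function buildChatRequest(baseUrl: string, model: string, prompt: string) {
  return {
    url: `${baseUrl}/v1/chat/completions`,
    init: {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({
        model,
        messages: [{ role: 'user', content: prompt }],
      }),
    },
  };
}

// Usage: const req = buildChatRequest('http://localhost:4000', 'openai/gpt-4o', 'Hello!');
//        const res = await fetch(req.url, req.init);
```

The same portability is what makes switching between Portkey, Kong, and LiteLLM relatively painless: all three front an OpenAI-compatible surface.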


Decision Guide

Choose Portkey if:

  • Your primary concern is LLM-specific features (fallbacks, cost tracking, semantic cache)
  • You're building AI-first applications where understanding token costs matters
  • You want a fast setup — JSON configs, no DevOps
  • HIPAA or SOC 2 is required on the managed platform
  • Budget: $49–$249/month is acceptable

Choose Kong if:

  • You're already running Kong for traditional APIs — unify AI and HTTP routing
  • Latency performance at extreme scale (10K+ req/s) is critical
  • You have DevOps capacity to write plugins for LLM-specific needs
  • You want open-source self-hosting with no per-request fees

Choose LiteLLM if:

  • You're cost-sensitive and have DevOps capacity
  • You want full LLM-aware features (fallbacks, load balancing) at $0 cost
  • Latency overhead is acceptable in your use case

Browse all AI gateway and LLM infrastructure APIs at APIScout.

Related: OpenRouter vs LiteLLM: API Gateway for Multiple AI Models · The Rise of AI Gateway APIs
