LLM API Pricing 2026: GPT-5 vs Claude vs Gemini

By the APIScout Team
Tags: llm-apis, openai, anthropic, google-gemini, pricing, ai, 2026

The LLM Pricing Landscape Has Shifted Dramatically

Flagship LLM API prices have fallen 40–60% per generation year-over-year. Models that cost $30/MTok in 2024 cost $5/MTok today. Ultra-budget entrants like Mistral Nemo ($0.02/$0.04 per MTok) and Gemini 2.0 Flash-Lite ($0.075/$0.30) have compressed the bottom of the market to near-zero. At the top, GPT-5.2 Pro charges $21/$168 per MTok, more than 1,000x the input price of the cheapest commercial option.

Choosing the wrong model for your workload is now a significant cost risk. Build a high-volume production pipeline on GPT-5.2 Pro instead of DeepSeek V3.2 and you could overspend by an order of magnitude. Choose Gemini 2.0 Flash-Lite for a task that requires frontier reasoning and your quality metrics will fall apart.

This guide gives you the complete pricing picture for March 2026: every major model, every discount mechanism, and practical decision frameworks for real production workloads.

TL;DR

  • Cheapest commercial option: Mistral Nemo at $0.02/$0.04 per MTok
  • Best budget frontier model: DeepSeek V3.2 at $0.28/$0.42 — 90% prompt cache discount drops effective input cost to $0.028/MTok
  • Best value flagship: Gemini 2.5 Pro at $1.25/$10 with 1M context
  • Best for long-context tasks (flat-rate 1M): Claude Opus 4.6 or Sonnet 4.6 — no surcharge above 200K tokens
  • Most expensive option: GPT-5.2 Pro at $21/$168 per MTok (specialized use cases only)
  • Batch + caching combined: up to 95% total savings on qualifying workloads

Key Takeaways

  • Prices have dropped 40–60% year-over-year. What you budgeted for in 2024 likely costs half as much today.
  • DeepSeek V3.2's prompt cache discount is exceptional. A 90% discount on cached input tokens means $0.28/MTok becomes $0.028/MTok, approaching Mistral Nemo's $0.02 list rate while delivering far more capable output on cached workloads.
  • Claude Opus 4.6, GPT-5.4, and Gemini 2.5 Pro all ship 1M+ context windows in GA. Claude's 1M is flat-rate (no surcharge); Gemini 3 Pro charges a 2x premium above 200K tokens. Context window size is now table stakes at the flagship tier.
  • Batch API discounts (50% off) are available from OpenAI, Anthropic, and Google. If your workload tolerates 24-hour async processing, this is free money.
  • Combining batch and caching can reach 95% total savings. A $25/MTok output model becomes effectively $1.25/MTok.
  • Mistral Nemo at $0.02/$0.04 is the cheapest commercial API for classification, extraction, and structured output tasks that do not need frontier capability.
  • GPT-5.2 Pro and O3 Pro are in a category of their own. At $21/$168 and $150/MTok input respectively, they serve specialized enterprise and research workloads where cost is secondary to capability.

Full Pricing Comparison Table

Prices are per million tokens (MTok). All prices reflect standard list pricing as of March 2026.

| Model | Provider | Input ($/MTok) | Output ($/MTok) | Context Window | Notes |
|---|---|---|---|---|---|
| GPT-5.4 | OpenAI | $2.50 | $15.00 | 1.05M | General flagship; input cost doubles above 272K ctx |
| GPT-5.4 Pro | OpenAI | $30.00 | $180.00 | 1.05M | Enterprise deep reasoning |
| GPT-5.3 Codex | OpenAI | $3.00 | $15.00 | 128K | Code-optimized |
| GPT-5.2 Pro | OpenAI | $21.00 | $168.00 | 128K | Advanced reasoning |
| O3 Pro | OpenAI | $150.00 | n/a | 128K | Input pricing only published |
| Claude Opus 4.6 | Anthropic | $5.00 | $25.00 | 1M | Flat rate; GA March 13, 2026 |
| Claude Sonnet 4.6 | Anthropic | $3.00 | $15.00 | 1M | Best value Anthropic |
| Claude Haiku 3.5 | Anthropic | $0.25 | $1.25 | 200K | High-volume, low-latency |
| Gemini 3 Pro Preview | Google | $2.00/$4.00 | $12.00/$18.00 | 1M | 2x input premium >200K ctx |
| Gemini 2.5 Pro | Google | $1.25 | $10.00 | 1M | Best value with long context |
| Gemini 2.0 Flash | Google | $0.30 | $2.50 | 1M | Fast, cost-efficient |
| Gemini 2.0 Flash-Lite | Google | $0.075 | $0.30 | 1M | Ultra-budget |
| DeepSeek V3.2 | DeepSeek | $0.28 | $0.42 | 128K | 90% cache discount |
| DeepSeek R1 | DeepSeek | $0.55 | $2.19 | 128K | Reasoning model |
| Grok 3 | xAI | $3.00 | $15.00 | 131K | Real-time web access |
| Mistral Nemo | Mistral | $0.02 | $0.04 | 128K | Cheapest commercial API |
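As a quick sanity check on any of these rates, per-request cost is just tokens times price over a million. A minimal sketch in Python, using a handful of the list prices above (the model keys are illustrative shorthand, not official API identifiers):

```python
# List prices from the table above, as (input $/MTok, output $/MTok).
PRICES = {
    "gpt-5.4": (2.50, 15.00),
    "claude-opus-4.6": (5.00, 25.00),
    "gemini-2.5-pro": (1.25, 10.00),
    "deepseek-v3.2": (0.28, 0.42),
    "mistral-nemo": (0.02, 0.04),
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the list-price cost in USD for one request."""
    in_rate, out_rate = PRICES[model]
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000

# A 10K-token document plus a 1K-token summary on Claude Opus 4.6:
print(round(request_cost("claude-opus-4.6", 10_000, 1_000), 4))  # 0.075
```

Running the same call across several entries in `PRICES` is the fastest way to see how model choice dominates every other cost variable.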

Context Window Comparison

Context window size determines how much text, code, or data a model can process in a single request. This directly affects what use cases are feasible without chunking.

| Context Window | Models |
|---|---|
| 1M+ tokens | GPT-5.4 (1.05M), Claude Opus 4.6 (1M), Claude Sonnet 4.6 (1M), Gemini 3 Pro Preview (1M), Gemini 2.5 Pro (1M), Gemini 2.0 Flash (1M), Gemini 2.0 Flash-Lite (1M) |
| 200K tokens | Claude Haiku 3.5 |
| 128–131K tokens | GPT-5.3 Codex, GPT-5.2 Pro, O3 Pro, DeepSeek V3.2, DeepSeek R1, Grok 3, Mistral Nemo |

What a 1M context window enables in practice:

  • Entire codebases (medium-sized projects) in a single request
  • Full legal contracts, financial filings, or research papers without chunking
  • Long conversation history without summarization hacks
  • Multi-document analysis without retrieval pipelines

1M context is now standard among flagship models. The key differentiator is pricing above 200K tokens: Claude charges the same flat rate across the full 1M window, while Gemini 3 Pro charges 2x input / 1.5x output above 200K tokens, which is meaningful for very long-context workloads. GPT-5.4 input costs likewise double beyond 272K tokens per session.

The practical winner for sustained long-context work: Claude. Flat pricing across 1M tokens makes cost predictable at any context depth.

Cost Optimization: Batch API and Prompt Caching

The headline prices above are not what most production teams actually pay. Two discount mechanisms can dramatically reduce effective costs.

Batch API: 50% Discount for Async Workloads

OpenAI, Anthropic, and Google all offer batch processing APIs that accept jobs and return results within 24 hours. The tradeoff: no real-time responses. The reward: 50% off standard pricing.

Batch API math:

  • Claude Opus 4.6 output at list price: $25.00/MTok
  • Claude Opus 4.6 output with batch API: $12.50/MTok
  • Gemini 2.5 Pro input with batch API: $0.625/MTok
  • GPT-5.4 input with batch API: $1.25/MTok
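The batch figures above all follow from a single 50%-off multiplier:

```python
BATCH_DISCOUNT = 0.50  # 50% off list price, per the providers' batch APIs

def batch_price(list_price_per_mtok: float) -> float:
    """Effective $/MTok after the 50% batch discount."""
    return list_price_per_mtok * (1 - BATCH_DISCOUNT)

print(batch_price(25.00))  # 12.5   (Claude Opus 4.6 output)
print(batch_price(1.25))   # 0.625  (Gemini 2.5 Pro input)
print(batch_price(2.50))   # 1.25   (GPT-5.4 input)
```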

Batch API is the right choice for: data labeling pipelines, document summarization at scale, overnight report generation, content moderation queues, and any workload that does not require sub-second latency.

Prompt Caching: Up to 90% Savings on Repeated Context

Prompt caching lets you reuse a cached version of a long system prompt, document, or tool definition across many requests. Only new tokens are billed at full price. Cached tokens are billed at a steep discount.

| Provider | Cache Read Discount |
|---|---|
| Anthropic | 90% off input price |
| OpenAI | ~50% off input price |
| Google | ~75% off input price |
| DeepSeek | 90% off input price |

DeepSeek V3.2 caching math:

DeepSeek V3.2 input at list price: $0.28/MTok. With the 90% cache discount: $0.028/MTok for cached input, close to Mistral Nemo's already-minimal $0.02 list rate while delivering frontier-quality output. For workloads with heavy context reuse (RAG pipelines, agent loops with fixed tool definitions, document Q&A), DeepSeek V3.2 is the most cost-efficient frontier model available.
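The effective input price under caching depends on your cache hit rate, not just the discount. A sketch of the blended-rate arithmetic (it ignores cache-write costs, which providers typically bill at or near full input price):

```python
def blended_input_price(list_price: float, cache_discount: float,
                        hit_rate: float) -> float:
    """Effective input $/MTok when a fraction `hit_rate` of input tokens
    are served as cache reads at (1 - cache_discount) * list price."""
    cache_read_price = list_price * (1 - cache_discount)
    return hit_rate * cache_read_price + (1 - hit_rate) * list_price

# DeepSeek V3.2: $0.28 list input, 90% cache-read discount
print(round(blended_input_price(0.28, 0.90, 1.0), 4))  # 0.028  (fully cached)
print(round(blended_input_price(0.28, 0.90, 0.8), 4))  # 0.0784 (80% hit rate)
```

Even at a realistic 80% hit rate the blended price stays well under every flagship's list rate.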

Anthropic caching math:

Claude Opus 4.6 at $5.00/MTok input. A RAG pipeline sending a 50K-token knowledge base with every request:

  • Without caching: 50K tokens × $5.00/MTok = $0.25 per request
  • With caching (90% discount): 50K tokens × $0.50/MTok = $0.025 per request
  • Savings: 90% per cached token

This can fully invert cost comparisons. Claude Opus 4.6 with aggressive caching can be cheaper than Gemini 2.5 Pro at list price for the right workload.

Combined Savings: Batch + Caching

When batch API and prompt caching stack, savings compound:

  • Claude Opus 4.6 list output: $25.00/MTok
  • After batch API (50% off): $12.50/MTok
  • After cache read on input (90% off): effective blended cost drops significantly
  • Real-world combined savings: up to 95% on qualifying workloads

A workload that looks like $25,000/month at list price could cost $1,250/month with both mechanisms applied. That changes business cases fundamentally.
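The stacking arithmetic can be sketched as follows. It assumes the batch and cache discounts simply multiply; verify the stacking rules with each provider, since they vary by model tier:

```python
def effective_price(list_price: float, batch: bool = False,
                    cache_discount: float = 0.0, hit_rate: float = 0.0) -> float:
    """Effective $/MTok with the 50% batch discount and cache-read
    discount stacked multiplicatively (an assumption -- verify per provider)."""
    price = list_price * (0.5 if batch else 1.0)
    cache_read = price * (1 - cache_discount)
    return hit_rate * cache_read + (1 - hit_rate) * price

# Claude Opus 4.6 input at $5.00/MTok, batched, fully cached at 90% off:
print(round(effective_price(5.00, batch=True, cache_discount=0.90,
                            hit_rate=1.0), 2))  # 0.25 -- 95% below list
```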

Real Dollar Examples

Abstract per-million-token pricing can be hard to reason about. Here are concrete examples.

Generating 1M output tokens (roughly 750,000 words, several novels' worth of text):

| Model | Cost |
|---|---|
| Mistral Nemo | $0.04 |
| Gemini 2.0 Flash-Lite | $0.30 |
| DeepSeek V3.2 | $0.42 |
| Claude Haiku 3.5 | $1.25 |
| Gemini 2.0 Flash | $2.50 |
| Gemini 2.5 Pro | $10.00 |
| Claude Sonnet 4.6 | $15.00 |
| GPT-5.4 | $15.00 |
| Claude Opus 4.6 | $25.00 |
| GPT-5.2 Pro | $168.00 |

The spread from cheapest to most expensive: 4,200x. Mistral Nemo costs $0.04 to generate a million output tokens. GPT-5.2 Pro costs $168.00 for the same volume. Model selection is the single largest cost lever in any LLM-powered application.

Processing a 10,000-token document (roughly 30-40 pages):

| Model | Input Cost |
|---|---|
| Mistral Nemo | $0.0002 |
| Gemini 2.0 Flash-Lite | $0.00075 |
| Claude Haiku 3.5 | $0.0025 |
| DeepSeek V3.2 | $0.0028 |
| Gemini 2.5 Pro | $0.0125 |
| GPT-5.4 | $0.025 |
| Claude Opus 4.6 | $0.05 |
| GPT-5.2 Pro | $0.21 |

At this document scale, absolute costs are small for any model. The economics only become significant at volume: processing 100,000 such documents per month puts Claude Opus 4.6 at $5,000/month vs $75/month for Gemini 2.0 Flash-Lite.
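The volume math is simple enough to script. A sketch:

```python
def monthly_input_cost(doc_tokens: int, docs_per_month: int,
                       input_price_per_mtok: float) -> float:
    """Monthly input spend in USD for a fixed-size document pipeline."""
    return doc_tokens * docs_per_month * input_price_per_mtok / 1_000_000

# 10K-token documents, 100,000 per month:
print(round(monthly_input_cost(10_000, 100_000, 5.00), 2))   # 5000.0 (Claude Opus 4.6)
print(round(monthly_input_cost(10_000, 100_000, 0.075), 2))  # 75.0   (Gemini 2.0 Flash-Lite)
```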

Use Case Recommendations

Budget-Tight: Maximum Volume at Minimum Cost

Best options: Mistral Nemo, Gemini 2.0 Flash-Lite, DeepSeek V3.2

Use Mistral Nemo ($0.02/$0.04) for: text classification, entity extraction, structured output generation, content moderation, and any task where a capable-but-not-frontier model works.

Use Gemini 2.0 Flash-Lite ($0.075/$0.30) when you need a well-supported platform with Google's ecosystem and want sub-dollar costs at high volume.

Use DeepSeek V3.2 ($0.28/$0.42, $0.028 cached) when you need frontier-quality output at budget prices. For complex reasoning, DeepSeek R1 at $0.55/$2.19 remains competitive with GPT-5.4 while costing a fraction of the price (roughly a fifth on input, a seventh on output).

Explore DeepSeek's API pricing and capabilities and see how it compares in our DeepSeek vs OpenAI API comparison.

Performance-First: Best Quality Regardless of Cost

Best options: Claude Opus 4.6, GPT-5.2 Pro, O3 Pro

Claude Opus 4.6 ($5/$25) is the correct default for performance-first teams. It leads SWE-bench Verified benchmarks, offers 1M context, and has a well-documented API with strong tooling. It is the model to beat on complex reasoning, agentic coding, and nuanced document analysis.

GPT-5.2 Pro ($21/$168) and O3 Pro ($150/MTok input) are justified for specialized enterprise scenarios: high-stakes financial modeling, medical reasoning, research synthesis, or cases where even marginal quality improvements translate to significant downstream value. Most teams will not need them.

Explore Anthropic's API and OpenAI's API for full capability documentation.

Long-Context Workloads

Best options: Claude Opus 4.6, Claude Sonnet 4.6, Gemini 2.5 Pro

If your workload requires processing more than 100K tokens in a single request, you have three strong options. The key differentiator is not context window size (all flagships now hit 1M+), but how they price extended context:

  • Claude Opus 4.6 ($5/$25, 1M context, flat rate): Best for long-form reasoning, multi-document synthesis, codebase analysis. No surcharge at any context depth.
  • Claude Sonnet 4.6 ($3/$15, 1M context, flat rate): Best value for long-context tasks — same flat-rate 1M window at a lower price point.
  • Gemini 2.5 Pro ($1.25/$10, 1M context): Most cost-efficient for workloads under 200K tokens; charges a 2x input premium above that threshold.
  • GPT-5.4 ($2.50/$15, 1.05M context): Input costs double beyond 272K tokens per session — plan accordingly for sustained long-context workloads.
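To see how tiered pricing plays out, here is a sketch comparing Gemini 3 Pro's surcharged input billing against Claude's flat rate. It assumes the 2x premium applies to the whole request once it crosses 200K tokens (mirroring Google's earlier tiered-pricing behavior; verify against the current rate card):

```python
def gemini_3_pro_input_cost(tokens: int) -> float:
    """Gemini 3 Pro input: $2.00/MTok up to 200K, $4.00/MTok (2x premium)
    once a request exceeds 200K tokens. Whole-request tiering is an
    assumption -- check whether the premium is marginal or whole-request."""
    rate = 4.00 if tokens > 200_000 else 2.00
    return tokens * rate / 1_000_000

def claude_opus_input_cost(tokens: int) -> float:
    """Claude Opus 4.6: flat $5.00/MTok across the full 1M window."""
    return tokens * 5.00 / 1_000_000

# A 500K-token request:
print(gemini_3_pro_input_cost(500_000))  # 2.0
print(claude_opus_input_cost(500_000))   # 2.5
```

Note that even with the premium, the surcharged model can remain cheaper on raw input cost; Claude's advantage is predictability, a single rate at any context depth.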

Explore Google Gemini's API for full context window and pricing details. See also our Anthropic vs Google Gemini comparison.

Balanced: Cost-Efficient Frontier Quality

Best options: Gemini 2.5 Pro, Claude Sonnet 4.6, Grok 3

For teams that want strong performance without paying flagship premiums:

  • Gemini 2.5 Pro at $1.25/$10 is the most underrated model in this comparison. Frontier-quality output, 1M context, and a price point well below GPT-5.4 or Claude Opus 4.6.
  • Claude Sonnet 4.6 at $3/$15 is Anthropic's most popular model for good reason: it sits at the sweet spot between Haiku's cost-efficiency and Opus's reasoning depth.
  • Grok 3 at $3/$15 adds real-time web access, useful for applications requiring current information without separate retrieval pipelines.

When NOT to Use Premium Models

Premium models are often the wrong choice. Here are the specific cases to avoid them.

Do not use GPT-5.2 Pro or O3 Pro for:

  • High-volume text classification or entity extraction
  • Standard chatbot conversations
  • Summarization of documents where quality is "good enough" at lower tiers
  • Any workload where GPT-5.4 or Claude Sonnet 4.6 produces acceptable output

The quality gap between GPT-5.4 ($2.50/$15) and GPT-5.2 Pro ($21/$168) does not justify an 8–11x price increase for most applications. Run quality evaluations on your specific task with a tier-down model before committing to premium pricing.

Do not use Claude Opus 4.6 for:

  • Tasks that Sonnet 4.6 handles equally well (check with evals)
  • High-throughput, latency-sensitive production endpoints where Haiku 3.5 suffices
  • Simple extraction or formatting tasks

Rule of thumb: Always eval down. Start with the cheapest model that might work. Move up only when quality metrics fail. Most teams skip this step and overspend significantly.
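The eval-down loop is straightforward to operationalize. A sketch, where `call_model` and `passes_eval` are hypothetical stand-ins for your API client and task-specific eval harness:

```python
# Cheapest-first tiers; escalate only when the quality check fails.
TIERS = ["mistral-nemo", "claude-sonnet-4.6", "claude-opus-4.6"]

def cheapest_passing(prompt: str, call_model, passes_eval):
    """Return (model, output) from the cheapest tier whose output passes."""
    output = None
    for model in TIERS:
        output = call_model(model, prompt)
        if passes_eval(prompt, output):
            return model, output
    # Nothing passed: keep the top tier's answer and flag it for review.
    return TIERS[-1], output
```

In production you would run this routing logic offline during evals, then pin each task type to the cheapest tier that passed, rather than paying for every tier on every live request.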

See our API cost optimization strategies guide for a full framework on tiered model selection and eval-driven cost reduction.

Methodology and Sources

Pricing data in this article reflects publicly published list prices from each provider as of March 2026. All prices are per million tokens (MTok) in USD. Input and output token prices are listed separately where providers distinguish between them.

Sources:

  • OpenAI pricing page (platform.openai.com/pricing): GPT-5.4, GPT-5.3 Codex, GPT-5.2 Pro, O3 Pro
  • Anthropic pricing page (anthropic.com/pricing): Claude Opus 4.6, Claude Sonnet 4.6, Claude Haiku 3.5
  • Google AI pricing (ai.google.dev/pricing): Gemini 2.5 Pro, Gemini 2.0 Flash, Gemini 2.0 Flash-Lite
  • DeepSeek pricing (platform.deepseek.com/api-docs/pricing): DeepSeek V3.2, DeepSeek R1
  • xAI pricing (docs.x.ai/docs/models): Grok 3
  • Mistral pricing (mistral.ai/pricing): Mistral Nemo

Methodology notes:

  • Batch API discounts are stated as 50% off list price; actual discounts may vary by provider and model tier — always verify at time of purchase.
  • Prompt cache discount percentages are based on published cache read rates; write costs (to populate the cache) are typically billed at full input price.
  • "Combined savings" estimates assume ideal conditions: high cache hit rate and batch-eligible workload patterns. Real-world savings will vary.
  • Context window sizes reflect the maximum published context; performance on tasks filling the full window may degrade — test your specific use case.
  • Prices change frequently. Verify current rates via each provider's pricing page or the APIScout pricing tracker before making budget decisions.

We update this comparison quarterly. Last updated: March 16, 2026.
