API guide
Claude 3.7 vs GPT-5 vs Gemini 2.5 API Pricing: 2026 Update
Current Claude, GPT-5, and Gemini API pricing, context windows, reasoning controls, and model-selection tradeoffs for production LLM apps in 2026.

Claude 3.7 vs GPT-5 vs Gemini 2.5 API Pricing: 2026 Update
TL;DR
If you arrived here searching for Claude 3.7 Sonnet pricing, treat that model as the historical comparison point and compare today's production choices instead. As of May 16, 2026, Anthropic's practical Sonnet pick is Claude Sonnet 4.6 at $3/M input and $15/M output tokens, OpenAI's current flagship line is GPT-5.5 at $5/M input and $30/M output tokens, and Gemini 2.5 Pro remains the lower-cost long-context option at $1.25/M input and $10/M output for prompts up to 200K tokens. Gemini 3.1 Pro Preview is now available too, but this guide keeps Gemini 2.5 Pro in the main table because the route's search demand is specifically about Claude/GPT/Gemini 2.5 pricing.
For most production LLM API teams, the short answer is:
- Choose Claude Sonnet 4.6 when coding, agents, tool use, and controllable extended thinking matter more than the lowest token price.
- Choose GPT-5.5 when you want OpenAI's newest flagship model, OpenAI-native tool surfaces, and can justify the higher output price.
- Choose Gemini 2.5 Pro when long context and cost-per-token are the main constraints, especially for document-heavy RAG and batch analysis.
- Do not make a 2026 vendor decision from Claude 3.7-only numbers; use them only to understand older articles, migration plans, or invoices.
Key takeaways
- Gemini 2.5 Pro is still the cheapest flagship-style comparison point in this set: $1.25/M input and $10/M output for prompts up to 200K tokens, with higher pricing for prompts above 200K tokens.
- Claude Sonnet 4.6 keeps the Sonnet price band at $3/M input and $15/M output while adding current Claude 4-series capabilities and extended thinking support.
- GPT-5.5 is the premium OpenAI pick at $5/M input and $30/M output. The OpenAI pricing docs checked for this refresh list GPT-5.5/GPT-5.4 under pricing rows for contexts below 272K tokens; do not rely on older 1M-context or 128K-output assumptions unless OpenAI documents them for the exact model ID you deploy.
- Context limits are now vendor- and model-ID-specific: current Claude Opus/Sonnet docs expose 1M context windows, OpenAI's verified pricing docs for GPT-5.5/GPT-5.4 use pricing rows for contexts below 272K tokens, and Gemini 2.5 Pro pricing differentiates by whether the prompt is above 200K tokens.
- Pricing pages move faster than benchmark leaderboards. Use official price tables for budgets and treat third-party benchmark claims as directional, not final.
Why this comparison still matters in 2026
The original version of this guide compared Claude 3.7 Sonnet, GPT-5-era OpenAI models, and Gemini 2.5 Pro because that was the searcher's mental model: which frontier LLM API should I buy for a production app?
That question is still useful, but the vendor lineups have moved. Anthropic's current model overview lists Claude Opus 4.7, Claude Sonnet 4.6, and Claude Haiku 4.5 as the latest Claude family. OpenAI's model docs surface GPT-5.5 as the latest flagship for coding and professional work, with GPT-5.4 as the more affordable sibling. Google still documents Gemini 2.5 Pro as a state-of-the-art multipurpose model, while Gemini 3.1 Pro Preview now exists for teams willing to evaluate a newer preview line.
So the refreshed decision is not "does Claude 3.7 beat GPT-5?" It is: which current API line gives your app the best reliability, latency, context, reasoning controls, and token economics?
Current pricing matrix
Official USD pricing checked May 16, 2026. Prices are per 1M tokens unless noted.
| Model line | Input | Output | Context window | Max output | Current source-backed note |
|---|---|---|---|---|---|
| Gemini 2.5 Pro | $1.25 up to 200K prompt tokens; $2.50 above 200K | $10 up to 200K prompt tokens; $15 above 200K | Long-context Gemini API model; Google prices prompts above 200K separately | Use current model docs for exact limits | Lowest-cost option in this comparison for most prompt sizes. |
| Claude Sonnet 4.6 | $3 | $15 | 1M tokens | 64K tokens | Current Sonnet line; extended thinking and adaptive thinking are supported. |
| GPT-5.4 | $2.50 | $15 | Pricing row for contexts below 272K tokens in the verified OpenAI docs | Use current model docs for exact limits | More affordable OpenAI GPT-5 line for teams that do not need the newest flagship. |
| GPT-5.5 | $5 | $30 | Pricing row for contexts below 272K tokens in the verified OpenAI docs | Use current model docs for exact limits | OpenAI's newest flagship-class option in the docs. |
| Claude Opus 4.7 | $5 | $25 | 1M tokens | 128K tokens | Higher-end Claude option for complex reasoning and agentic coding. |
| Gemini 3.1 Pro Preview | $2 up to 200K prompt tokens; $4 above 200K | $12 up to 200K prompt tokens; $18 above 200K | Preview Gemini 3.1 Pro line | Use current model docs for exact limits | Worth evaluating, but preview status changes the production-risk profile. |
Cost example: a real SaaS feature
Assume 50,000 API calls per month, averaging 500 input tokens and 200 output tokens per call. That is 25M input tokens and 10M output tokens per month.
| Model line | Estimated monthly token cost |
|---|---|
| Gemini 2.5 Pro, prompts up to 200K | $131.25 |
| Gemini 3.1 Pro Preview, prompts up to 200K | $170.00 |
| GPT-5.4 | $212.50 |
| Claude Sonnet 4.6 | $225.00 |
| Claude Opus 4.7 | $375.00 |
| GPT-5.5 | $425.00 |
Two practical lessons follow:
- At small usage, developer experience and quality matter more than token-price deltas.
- At millions of calls or very long prompts, Gemini's input price and prompt-size tiers materially change infrastructure cost.
How to choose by workload
Choose Claude Sonnet 4.6 for coding agents and multi-step workflows
Claude Sonnet 4.6 is the safest default when the product is an AI coding assistant, code-review bot, refactoring workflow, tool-using agent, or developer-support assistant. The current Claude model overview positions Sonnet 4.6 as the best combination of speed and intelligence, and it supports extended thinking and adaptive thinking.
That matters when you want a model to reason before using tools, inspect a large code context, and produce fewer brittle edits. Claude is not the cheapest option, but Sonnet's $3/$15 price band is still below GPT-5.5's $5/$30 and far below older Opus 4.x pricing tiers.
Choose GPT-5.5 when OpenAI's platform surface is the product fit
GPT-5.5 is the most expensive model in the main comparison, but cost is not the only decision variable. Teams already using the Responses API, OpenAI SDKs, Agents SDK, built-in tools, file search, web search, computer use, or existing OpenAI observability may prefer one platform surface even when another model has cheaper raw tokens.
Use GPT-5.5 when you want the latest OpenAI flagship and can charge enough for the output quality. Use GPT-5.4 when OpenAI compatibility matters but the workload is high-volume enough that the $5/$30 GPT-5.5 price is hard to justify.
Choose Gemini 2.5 Pro for cost-sensitive long-context work
Gemini 2.5 Pro is still the price anchor in this comparison. It is especially attractive for RAG, document review, codebase summarization, multi-file analysis, and batch extraction where input tokens dominate output tokens.
The caveat is prompt size. Google's pricing table charges a higher rate for Gemini 2.5 Pro prompts above 200K tokens. If your app routinely sends huge contexts, model your actual prompt-size distribution instead of multiplying by the headline $1.25/M input number.
Evaluate Gemini 3.1 Pro Preview separately
Google's model docs list Gemini 3.1 Pro Preview as a newer option for advanced intelligence, complex problem solving, and agentic coding. It is relevant to any 2026 buying decision, but preview status changes the risk: pricing, model behavior, names, and availability can move faster than stable models.
Use it for evals and canary traffic before moving critical production routes.
Context window and output limits
Long-context claims are more nuanced than older comparisons suggested:
- Anthropic's current overview lists Claude Opus 4.7 and Claude Sonnet 4.6 at 1M context, with max outputs of 128K and 64K respectively.
- The OpenAI docs verified for this refresh price GPT-5.5 and GPT-5.4 under context rows below 272K tokens and did not publish a source-backed 128K max output limit for those exact model IDs.
- Gemini 2.5 Pro remains compelling because of its long-context pricing posture, but the tier change is workload-dependent: input and context-cache rates double above 200K prompt tokens, while output rises from $10/M to $15/M rather than doubling.
That means the question is no longer only "who has the biggest context window?" Ask instead:
- How much of your workload actually needs more than 200K prompt tokens?
- Do you need long output, or only long input?
- Can caching, summarization, retrieval, or batch processing reduce repeated input?
- Does model quality hold up across the full context, or only in benchmark-style prompts?
API developer experience
All three vendors now cover the baseline production API features developers expect:
| Capability | Claude | OpenAI | Gemini |
|---|---|---|---|
| Streaming | Yes | Yes | Yes |
| Tool/function calling | Yes | Yes | Yes |
| Structured outputs | Tool schema pattern | JSON/schema-oriented Responses patterns | Response schema / structured outputs |
| Long context | Yes on current Opus/Sonnet lines | Yes on GPT-5.5/GPT-5.4 | Yes on Gemini 2.5 / 3.1 lines |
| Reasoning controls | Extended/adaptive thinking on current Claude lines | Reasoning effort controls in current OpenAI docs | Thinking budgets / thinking-token pricing on Gemini pricing docs |
| Multimodal | Text and image across current Claude models | Text, image, audio/tooling depending on model/API | Text, image, audio/video family strengths |
The migration cost is usually not the HTTP call itself. It is your prompt format, tool schemas, streaming event parser, eval harness, retry strategy, caching layer, safety filters, and budget alerts.
A practical vendor-selection checklist
Before committing to one LLM API, run the same workload through all three with a thin adapter and record:
- Task success rate: pass/fail on your real eval set, not only public benchmarks.
- Median and p95 latency: include tool-calling and streaming first-token time.
- Total cost per successful task: token cost divided by tasks that actually pass review.
- Prompt length sensitivity: re-run with 25K, 100K, 200K, and 500K-token contexts if long context matters.
- Schema reliability: count invalid JSON, partial tool calls, retry loops, and safety refusals.
- Operational fit: SDK maturity, logging, data controls, region/cloud availability, and enterprise procurement.
The cheapest model per token can lose if it needs more retries. The best benchmark model can lose if its output price breaks unit economics.
Recommendations by use case
| Use case | Best first pick | Why |
|---|---|---|
| AI coding assistant | Claude Sonnet 4.6 | Strong current Sonnet positioning, extended thinking, good price/quality balance. |
| Expensive agentic coding / high-value automation | Claude Opus 4.7 or GPT-5.5 | Pay for top-tier reasoning only where each successful task is valuable. |
| Long document Q&A | Gemini 2.5 Pro | Lowest-cost long-context baseline, especially when output is short. |
| General chatbot / support assistant | GPT-5.4 or Claude Sonnet 4.6 | Strong platform fit without always paying GPT-5.5 prices. |
| High-volume extraction | Gemini 2.5 Pro or Gemini 3.1 Flash/Flash-Lite family | Model the lower-cost Gemini families if accuracy is enough. |
| OpenAI-native agent platform | GPT-5.4 or GPT-5.5 | Responses API, Agents SDK, built-in tools, and existing OpenAI workflows. |
| Preview-model experimentation | Gemini 3.1 Pro Preview | Run evals before production migration. |
Source notes and update policy
Sources checked May 16, 2026:
- Anthropic Claude model overview and pricing docs for Claude Opus 4.7, Claude Sonnet 4.6, Claude Haiku 4.5, context windows, max output, extended thinking support, and price bands.
- OpenAI model docs and pricing docs for GPT-5.5, GPT-5.4, context-length pricing rows, and token prices. The public marketing pricing page can return anti-bot responses to curl, so the developer docs are the more reliable machine-readable source for this refresh; the verified docs did not support the older 1M-context or 128K-output assumptions for these exact model IDs.
- Google Gemini API model and pricing docs for Gemini 2.5 Pro, Gemini 3.1 Pro Preview, prompt-size price tiers, thinking-token billing language, and Gemini family positioning.
Because LLM pricing and model IDs change quickly, treat every number above as a snapshot. Before signing a contract or launching a high-volume feature, re-check the official pricing docs and run a small eval against your own traffic.
Related APIScout guides
- LLM API pricing comparison 2026
- How to choose an LLM API in 2026
- OpenRouter API: one key for 100+ LLMs
- AI gateway rate limiting for production LLM apps
- Claude API extended thinking mode
- OpenAI vs Google Gemini API
Compare provider profiles on APIScout: Google Gemini vs OpenAI.
The API Integration Checklist (Free PDF)
Step-by-step checklist: auth setup, rate limit handling, error codes, SDK evaluation, and pricing comparison for 50+ APIs. Used by 200+ developers.
Join 200+ developers. Unsubscribe in one click.