Skip to main content

API guide

Claude 3.7 vs GPT-5 vs Gemini 2.5 API Pricing: 2026 Update

Current Claude, GPT-5, and Gemini API pricing, context windows, reasoning controls, and model-selection tradeoffs for production LLM apps in 2026.

·APIScout Team
Share:
Hero image for Claude 3.7 vs GPT-5 vs Gemini 2.5 API Pricing: 2026 Update

Claude 3.7 vs GPT-5 vs Gemini 2.5 API Pricing: 2026 Update

TL;DR

If you arrived here searching for Claude 3.7 Sonnet pricing, treat that model as the historical comparison point and compare today's production choices instead. As of May 16, 2026, Anthropic's practical Sonnet pick is Claude Sonnet 4.6 at $3/M input and $15/M output tokens, OpenAI's current flagship line is GPT-5.5 at $5/M input and $30/M output tokens, and Gemini 2.5 Pro remains the lower-cost long-context option at $1.25/M input and $10/M output for prompts up to 200K tokens. Gemini 3.1 Pro Preview is now available too, but this guide keeps Gemini 2.5 Pro in the main table because the route's search demand is specifically about Claude/GPT/Gemini 2.5 pricing.

For most production LLM API teams, the short answer is:

  • Choose Claude Sonnet 4.6 when coding, agents, tool use, and controllable extended thinking matter more than the lowest token price.
  • Choose GPT-5.5 when you want OpenAI's newest flagship model, OpenAI-native tool surfaces, and can justify the higher output price.
  • Choose Gemini 2.5 Pro when long context and cost-per-token are the main constraints, especially for document-heavy RAG and batch analysis.
  • Do not make a 2026 vendor decision from Claude 3.7-only numbers; use them only to understand older articles, migration plans, or invoices.

Key takeaways

  • Gemini 2.5 Pro is still the cheapest flagship-style comparison point in this set: $1.25/M input and $10/M output for prompts up to 200K tokens, with higher pricing for prompts above 200K tokens.
  • Claude Sonnet 4.6 keeps the Sonnet price band at $3/M input and $15/M output while adding current Claude 4-series capabilities and extended thinking support.
  • GPT-5.5 is the premium OpenAI pick at $5/M input and $30/M output. The OpenAI pricing docs checked for this refresh list GPT-5.5/GPT-5.4 under pricing rows for contexts below 272K tokens; do not rely on older 1M-context or 128K-output assumptions unless OpenAI documents them for the exact model ID you deploy.
  • Context limits are now vendor- and model-ID-specific: current Claude Opus/Sonnet docs expose 1M context windows, OpenAI's verified pricing docs for GPT-5.5/GPT-5.4 use pricing rows for contexts below 272K tokens, and Gemini 2.5 Pro pricing differentiates by whether the prompt is above 200K tokens.
  • Pricing pages move faster than benchmark leaderboards. Use official price tables for budgets and treat third-party benchmark claims as directional, not final.

Why this comparison still matters in 2026

The original version of this guide compared Claude 3.7 Sonnet, GPT-5-era OpenAI models, and Gemini 2.5 Pro because that was the searcher's mental model: which frontier LLM API should I buy for a production app?

That question is still useful, but the vendor lineups have moved. Anthropic's current model overview lists Claude Opus 4.7, Claude Sonnet 4.6, and Claude Haiku 4.5 as the latest Claude family. OpenAI's model docs surface GPT-5.5 as the latest flagship for coding and professional work, with GPT-5.4 as the more affordable sibling. Google still documents Gemini 2.5 Pro as a state-of-the-art multipurpose model, while Gemini 3.1 Pro Preview now exists for teams willing to evaluate a newer preview line.

So the refreshed decision is not "does Claude 3.7 beat GPT-5?" It is: which current API line gives your app the best reliability, latency, context, reasoning controls, and token economics?


Current pricing matrix

Official USD pricing checked May 16, 2026. Prices are per 1M tokens unless noted.

Model lineInputOutputContext windowMax outputCurrent source-backed note
Gemini 2.5 Pro$1.25 up to 200K prompt tokens; $2.50 above 200K$10 up to 200K prompt tokens; $15 above 200KLong-context Gemini API model; Google prices prompts above 200K separatelyUse current model docs for exact limitsLowest-cost option in this comparison for most prompt sizes.
Claude Sonnet 4.6$3$151M tokens64K tokensCurrent Sonnet line; extended thinking and adaptive thinking are supported.
GPT-5.4$2.50$15Pricing row for contexts below 272K tokens in the verified OpenAI docsUse current model docs for exact limitsMore affordable OpenAI GPT-5 line for teams that do not need the newest flagship.
GPT-5.5$5$30Pricing row for contexts below 272K tokens in the verified OpenAI docsUse current model docs for exact limitsOpenAI's newest flagship-class option in the docs.
Claude Opus 4.7$5$251M tokens128K tokensHigher-end Claude option for complex reasoning and agentic coding.
Gemini 3.1 Pro Preview$2 up to 200K prompt tokens; $4 above 200K$12 up to 200K prompt tokens; $18 above 200KPreview Gemini 3.1 Pro lineUse current model docs for exact limitsWorth evaluating, but preview status changes the production-risk profile.

Cost example: a real SaaS feature

Assume 50,000 API calls per month, averaging 500 input tokens and 200 output tokens per call. That is 25M input tokens and 10M output tokens per month.

Model lineEstimated monthly token cost
Gemini 2.5 Pro, prompts up to 200K$131.25
Gemini 3.1 Pro Preview, prompts up to 200K$170.00
GPT-5.4$212.50
Claude Sonnet 4.6$225.00
Claude Opus 4.7$375.00
GPT-5.5$425.00

Two practical lessons follow:

  1. At small usage, developer experience and quality matter more than token-price deltas.
  2. At millions of calls or very long prompts, Gemini's input price and prompt-size tiers materially change infrastructure cost.

How to choose by workload

Choose Claude Sonnet 4.6 for coding agents and multi-step workflows

Claude Sonnet 4.6 is the safest default when the product is an AI coding assistant, code-review bot, refactoring workflow, tool-using agent, or developer-support assistant. The current Claude model overview positions Sonnet 4.6 as the best combination of speed and intelligence, and it supports extended thinking and adaptive thinking.

That matters when you want a model to reason before using tools, inspect a large code context, and produce fewer brittle edits. Claude is not the cheapest option, but Sonnet's $3/$15 price band is still below GPT-5.5's $5/$30 and far below older Opus 4.x pricing tiers.

Choose GPT-5.5 when OpenAI's platform surface is the product fit

GPT-5.5 is the most expensive model in the main comparison, but cost is not the only decision variable. Teams already using the Responses API, OpenAI SDKs, Agents SDK, built-in tools, file search, web search, computer use, or existing OpenAI observability may prefer one platform surface even when another model has cheaper raw tokens.

Use GPT-5.5 when you want the latest OpenAI flagship and can charge enough for the output quality. Use GPT-5.4 when OpenAI compatibility matters but the workload is high-volume enough that the $5/$30 GPT-5.5 price is hard to justify.

Choose Gemini 2.5 Pro for cost-sensitive long-context work

Gemini 2.5 Pro is still the price anchor in this comparison. It is especially attractive for RAG, document review, codebase summarization, multi-file analysis, and batch extraction where input tokens dominate output tokens.

The caveat is prompt size. Google's pricing table charges a higher rate for Gemini 2.5 Pro prompts above 200K tokens. If your app routinely sends huge contexts, model your actual prompt-size distribution instead of multiplying by the headline $1.25/M input number.

Evaluate Gemini 3.1 Pro Preview separately

Google's model docs list Gemini 3.1 Pro Preview as a newer option for advanced intelligence, complex problem solving, and agentic coding. It is relevant to any 2026 buying decision, but preview status changes the risk: pricing, model behavior, names, and availability can move faster than stable models.

Use it for evals and canary traffic before moving critical production routes.


Context window and output limits

Long-context claims are more nuanced than older comparisons suggested:

  • Anthropic's current overview lists Claude Opus 4.7 and Claude Sonnet 4.6 at 1M context, with max outputs of 128K and 64K respectively.
  • The OpenAI docs verified for this refresh price GPT-5.5 and GPT-5.4 under context rows below 272K tokens and did not publish a source-backed 128K max output limit for those exact model IDs.
  • Gemini 2.5 Pro remains compelling because of its long-context pricing posture, but the tier change is workload-dependent: input and context-cache rates double above 200K prompt tokens, while output rises from $10/M to $15/M rather than doubling.

That means the question is no longer only "who has the biggest context window?" Ask instead:

  1. How much of your workload actually needs more than 200K prompt tokens?
  2. Do you need long output, or only long input?
  3. Can caching, summarization, retrieval, or batch processing reduce repeated input?
  4. Does model quality hold up across the full context, or only in benchmark-style prompts?

API developer experience

All three vendors now cover the baseline production API features developers expect:

CapabilityClaudeOpenAIGemini
StreamingYesYesYes
Tool/function callingYesYesYes
Structured outputsTool schema patternJSON/schema-oriented Responses patternsResponse schema / structured outputs
Long contextYes on current Opus/Sonnet linesYes on GPT-5.5/GPT-5.4Yes on Gemini 2.5 / 3.1 lines
Reasoning controlsExtended/adaptive thinking on current Claude linesReasoning effort controls in current OpenAI docsThinking budgets / thinking-token pricing on Gemini pricing docs
MultimodalText and image across current Claude modelsText, image, audio/tooling depending on model/APIText, image, audio/video family strengths

The migration cost is usually not the HTTP call itself. It is your prompt format, tool schemas, streaming event parser, eval harness, retry strategy, caching layer, safety filters, and budget alerts.


A practical vendor-selection checklist

Before committing to one LLM API, run the same workload through all three with a thin adapter and record:

  1. Task success rate: pass/fail on your real eval set, not only public benchmarks.
  2. Median and p95 latency: include tool-calling and streaming first-token time.
  3. Total cost per successful task: token cost divided by tasks that actually pass review.
  4. Prompt length sensitivity: re-run with 25K, 100K, 200K, and 500K-token contexts if long context matters.
  5. Schema reliability: count invalid JSON, partial tool calls, retry loops, and safety refusals.
  6. Operational fit: SDK maturity, logging, data controls, region/cloud availability, and enterprise procurement.

The cheapest model per token can lose if it needs more retries. The best benchmark model can lose if its output price breaks unit economics.


Recommendations by use case

Use caseBest first pickWhy
AI coding assistantClaude Sonnet 4.6Strong current Sonnet positioning, extended thinking, good price/quality balance.
Expensive agentic coding / high-value automationClaude Opus 4.7 or GPT-5.5Pay for top-tier reasoning only where each successful task is valuable.
Long document Q&AGemini 2.5 ProLowest-cost long-context baseline, especially when output is short.
General chatbot / support assistantGPT-5.4 or Claude Sonnet 4.6Strong platform fit without always paying GPT-5.5 prices.
High-volume extractionGemini 2.5 Pro or Gemini 3.1 Flash/Flash-Lite familyModel the lower-cost Gemini families if accuracy is enough.
OpenAI-native agent platformGPT-5.4 or GPT-5.5Responses API, Agents SDK, built-in tools, and existing OpenAI workflows.
Preview-model experimentationGemini 3.1 Pro PreviewRun evals before production migration.

Source notes and update policy

Sources checked May 16, 2026:

  • Anthropic Claude model overview and pricing docs for Claude Opus 4.7, Claude Sonnet 4.6, Claude Haiku 4.5, context windows, max output, extended thinking support, and price bands.
  • OpenAI model docs and pricing docs for GPT-5.5, GPT-5.4, context-length pricing rows, and token prices. The public marketing pricing page can return anti-bot responses to curl, so the developer docs are the more reliable machine-readable source for this refresh; the verified docs did not support the older 1M-context or 128K-output assumptions for these exact model IDs.
  • Google Gemini API model and pricing docs for Gemini 2.5 Pro, Gemini 3.1 Pro Preview, prompt-size price tiers, thinking-token billing language, and Gemini family positioning.

Because LLM pricing and model IDs change quickly, treat every number above as a snapshot. Before signing a contract or launching a high-volume feature, re-check the official pricing docs and run a small eval against your own traffic.


Compare provider profiles on APIScout: Google Gemini vs OpenAI.

The API Integration Checklist (Free PDF)

Step-by-step checklist: auth setup, rate limit handling, error codes, SDK evaluation, and pricing comparison for 50+ APIs. Used by 200+ developers.

Join 200+ developers. Unsubscribe in one click.