Free AI APIs for Developers 2026: Rate Limits Compared
TL;DR
Google Gemini has the most generous permanent free tier for a frontier model — Gemini 2.5 Pro with no expiring credits, 1M token context window, and 100 requests/day. Groq has the best throughput for free: 30,000 tokens/minute on LLaMA 3.1 8B, with sub-second response times. OpenAI's free tier is nearly unusable (3 RPM, GPT-3.5 only) — a $5 deposit is effectively required. Anthropic has no permanent free API tier at all. Here's the complete breakdown.
Key Takeaways
- Gemini: Permanent free tier with frontier model access — 2.5 Pro at 5 RPM, 100 RPD, 250K TPM, 1M context
- Groq: Fastest free inference — LLaMA 3.1 8B at 30K TPM, daily limits reset at midnight UTC
- Mistral: 1 billion tokens/month free, but prompts may be used for model training — privacy tradeoff
- Hugging Face: Thousands of open-source models, informal rate limits, best for model evaluation
- Together AI: $100 in credits at signup (not a permanent free tier), 200+ models, OpenAI-compatible API
- Cohere: Full model access but only 1,000 calls/month — evaluation use only
- OpenAI: 3 RPM on GPT-3.5 only — effectively not usable without a $5 deposit
- Anthropic: No permanent free API tier — trial credits only (~$5, expires)
The 2026 Free Tier Landscape
The AI API market matured significantly in 2025. The providers that started as "free trial" offerings split into two camps: those building genuine developer ecosystems with permanent free tiers (Google, Groq, Mistral, Hugging Face) and those treating the free tier as a funnel to paid plans (OpenAI, Anthropic).
This distinction matters for developers choosing where to prototype. A $5 trial credit that expires in 3 months is fundamentally different from a permanent rate-limited tier — especially for open source projects, side projects, and educational use.
One significant change to note: Google cut free-tier quotas by 50–80% in December 2025. The limits below reflect the post-cut state.
Complete Free Tier Comparison
| Provider | Permanent? | Best Free Model | RPM | TPM | Daily cap | Trial Credits |
|---|---|---|---|---|---|---|
| Google Gemini | ✅ | Gemini 2.5 Pro | 5 | 250K | 100 requests | None needed |
| Groq | ✅ | LLaMA 3.1 8B | ~30 | 30K | ~360K tokens | None needed |
| Mistral | ✅ | open-mixtral-8x7b | 60 | 500K | ~33M tokens | None needed |
| Hugging Face | ✅ | Thousands of models | ~200/hour | N/A | N/A | None needed |
| Together AI | ❌ | 200+ OSS models | N/A | N/A | N/A | $100 on signup |
| Cohere | ✅ | Command R+ | 20 | N/A | 1K calls/month | None needed |
| OpenAI | ✅ | GPT-3.5 Turbo | 3 | N/A | N/A | None |
| Anthropic | ❌ | None | — | — | — | ~$5 (expires) |
| Fireworks AI | ❌ | 200+ OSS models | N/A | N/A | N/A | ~$1 credits |
Provider Deep-Dives
Google Gemini — Best Overall Free Tier
Rate limits (free tier, March 2026):
| Model | RPM | TPM | RPD |
|---|---|---|---|
| Gemini 2.5 Pro | 5 | 250,000 | 100 |
| Gemini 2.5 Flash | 10 | 250,000 | 250 |
| Gemini 2.5 Flash-Lite | 15 | 250,000 | 1,000 |
```typescript
import { GoogleGenerativeAI } from '@google/generative-ai';

const genAI = new GoogleGenerativeAI(process.env.GEMINI_API_KEY!);
const model = genAI.getGenerativeModel({ model: 'gemini-2.5-flash' });

const result = await model.generateContent('Summarize this code review...');
console.log(result.response.text());
```
What makes Gemini's free tier exceptional:
- 1 million token context window — available on the free tier. No other provider offers this for free.
- Frontier model access — Gemini 2.5 Pro is competitive with GPT-4o and Claude Sonnet. Free tier access to a model this capable is unusual.
- No expiring credits — it's a permanent rate-limited tier, not a trial.
The constraint: 100 RPD on Gemini 2.5 Pro. For solo development with interactive usage, you'll hit this daily limit. The workaround: use Flash-Lite (1,000 RPD) for most calls and Pro only when you need maximum capability.
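That split can be automated with a tiny router that spends the Pro budget only on requests flagged as hard. A minimal sketch: the class, the routing flag, and the counter-reset policy are assumptions, not anything from the Gemini SDK.

```python
PRO_DAILY_CAP = 100  # Gemini 2.5 Pro free-tier RPD from the table above

class GeminiRouter:
    """Spend the daily Pro budget only on requests flagged as hard."""

    def __init__(self, pro_daily_cap: int = PRO_DAILY_CAP):
        self.pro_daily_cap = pro_daily_cap
        self.pro_used_today = 0  # caller resets this once a day

    def pick_model(self, needs_max_capability: bool) -> str:
        """Return the model name to use for this request."""
        if needs_max_capability and self.pro_used_today < self.pro_daily_cap:
            self.pro_used_today += 1
            return "gemini-2.5-pro"
        # Everything else falls back to the 1,000-RPD Flash-Lite tier.
        return "gemini-2.5-flash-lite"

router = GeminiRouter()
print(router.pick_model(needs_max_capability=False))  # gemini-2.5-flash-lite
print(router.pick_model(needs_max_capability=True))   # gemini-2.5-pro
```

Once the Pro counter is exhausted, even hard requests degrade gracefully to Flash-Lite instead of failing with a 429.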
When to upgrade: When you need more than 100 Pro requests/day, lower latency SLAs, or production-grade uptime guarantees. Pay-as-you-go pricing starts at under $0.001/1K tokens for Flash-Lite.
Groq — Best Free Tier for Speed and Throughput
Groq runs open-source models on purpose-built LPU hardware. The result: inference speeds that feel instant — often under 200ms to first token, versus 800ms–2s on standard GPU inference.
Rate limits (free tier, daily reset at midnight UTC):
| Model | RPM | TPM (per minute) | Tokens/Day |
|---|---|---|---|
| LLaMA 3.1 8B | ~30 | 30,000 | ~360,000 |
| LLaMA 3.3 70B | ~30 | 6,000 | ~100,000 |
| Mixtral 8x7B | ~30 | 5,000 | ~100,000 |
| Gemma 7B | ~30 | 15,000 | ~250,000 |
```typescript
import Groq from 'groq-sdk';

const groq = new Groq({ apiKey: process.env.GROQ_API_KEY });

const completion = await groq.chat.completions.create({
  messages: [{ role: 'user', content: 'Write a SQL query that...' }],
  model: 'llama-3.1-8b-instant', // Highest free-tier TPM
  temperature: 0.1,
});
console.log(completion.choices[0].message.content);
```
Groq's API is OpenAI-compatible — swap the base URL and API key to switch from OpenAI:
```typescript
// Drop-in replacement for OpenAI SDK:
import OpenAI from 'openai';

const client = new OpenAI({
  apiKey: process.env.GROQ_API_KEY,
  baseURL: 'https://api.groq.com/openai/v1',
});
```
The constraint: Daily token limits, not monthly. LLaMA 3.3 70B hits ~100K tokens/day — enough for a day of active development but not for automated batch processing. LLaMA 3.1 8B's 360K tokens/day is the most practical free-tier model for sustained development work.
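Because the caps are daily rather than monthly, a batch script benefits from a client-side budget check so it stops cleanly before hitting 429s. A sketch, where the characters-per-token estimate and the model IDs are assumptions to verify against Groq's model list:

```python
# Per-day token caps from the table above (approximate).
DAILY_CAPS = {
    "llama-3.1-8b-instant": 360_000,
    "llama-3.3-70b-versatile": 100_000,
}

class TokenBudget:
    """Track token spend against a model's daily free-tier cap."""

    def __init__(self, model: str):
        self.cap = DAILY_CAPS[model]
        self.used = 0  # caller resets at midnight UTC, when Groq resets

    def estimate_tokens(self, text: str) -> int:
        # Crude ~4 characters/token heuristic, not a real tokenizer.
        return max(1, len(text) // 4)

    def can_send(self, prompt: str, max_completion: int = 500) -> bool:
        """True if the request plausibly fits in today's remaining budget."""
        return self.used + self.estimate_tokens(prompt) + max_completion <= self.cap

    def record(self, prompt_tokens: int, completion_tokens: int) -> None:
        # Use the real counts from the API's usage field when available.
        self.used += prompt_tokens + completion_tokens

budget = TokenBudget("llama-3.3-70b-versatile")
print(budget.can_send("hello" * 100))  # True (well under the 100K/day cap)
```

Recording the actual `usage` counts the API returns, rather than the estimate, keeps the tracker honest over a long run.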
When to upgrade: When you need sustained automated workloads (batch jobs, CI pipelines) or models beyond the free tier lineup. Groq's paid pricing is competitive — LLaMA 3.1 8B at $0.05/M input tokens is among the cheapest production inference available.
Mistral — 1 Billion Tokens/Month Free (With a Catch)
Mistral's free "Experiment" plan offers the highest monthly token budget of any provider: approximately 1 billion tokens/month. For context, 100K words of text works out to roughly 133K tokens, so a billion tokens covers hundreds of millions of words, on the order of a million pages of text.
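The arithmetic is easy to sanity-check, assuming a rough 1.33 tokens-per-word ratio for English text:

```python
# Back-of-envelope check of the Experiment plan's monthly budget.
TOKENS_PER_WORD = 1.33  # rough English average; an assumption, not a spec

monthly_tokens = 1_000_000_000
words = monthly_tokens / TOKENS_PER_WORD
print(f"{words / 1e6:.0f}M words/month")  # ~752M words/month
```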
Rate limits (Experiment plan):
- ~1 RPS (60 RPM)
- 500,000 TPM
- ~1 billion tokens per month
- Phone verification required, no credit card
Available models: open-mistral-7b, open-mixtral-8x7b, open-mistral-nemo, and select others.
```python
import os

from mistralai import Mistral

client = Mistral(api_key=os.environ["MISTRAL_API_KEY"])
response = client.chat.complete(
    model="open-mixtral-8x7b",
    messages=[{"role": "user", "content": "Extract entities from this text..."}],
)
print(response.choices[0].message.content)
```
The catch: API calls under the Experiment plan may be used to train Mistral's models. This is disclosed in their terms. For prototyping with synthetic data or public information, this is fine. For anything containing proprietary code, user data, or sensitive business information, it's a dealbreaker.
When to upgrade: When data privacy matters (always for production), or when you need mistral-large or mistral-small — the more capable commercial models. Mistral Small starts at $0.10/M input tokens.
Hugging Face — Widest Model Selection
Hugging Face's Serverless Inference API provides free access to thousands of open-source models — Meta's LLaMA family, Mistral models, Google Gemma, Qwen, and community fine-tunes across every category (text, image, audio, video, multimodal).
Rate limits: A few hundred requests per hour, model-dependent. Not precisely documented — effectively: "use it reasonably."
```python
import os

from huggingface_hub import InferenceClient

client = InferenceClient(
    model="meta-llama/Llama-3.3-70B-Instruct",
    token=os.environ["HF_TOKEN"],
)
response = client.chat_completion(
    messages=[{"role": "user", "content": "Explain this algorithm..."}],
    max_tokens=500,
)
print(response.choices[0].message.content)
```
What makes it valuable: Before committing to any model for production, you can evaluate dozens of candidates using Hugging Face's free tier. Test Qwen 2.5 72B vs LLaMA 3.3 70B vs Mistral Large for your specific use case without spending anything.
The constraint: Rate limits are informal and enforced inconsistently. Not reliable for anything time-sensitive. Models sometimes go cold (need warmup time on first request). HF PRO ($9/month) provides higher limits and guaranteed availability.
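Cold starts in particular are easy to handle in code: retry with exponential backoff until the model is warm. A sketch, where the retry count and delay schedule are assumptions:

```python
import time

def with_warmup_retry(call, retries: int = 4, base_delay: float = 1.0):
    """Call `call()`, retrying with exponential backoff on failure.

    A serverless model that has gone cold typically errors until its
    weights are loaded; a few spaced retries usually ride that out.
    """
    for attempt in range(retries):
        try:
            return call()
        except Exception:
            if attempt == retries - 1:
                raise  # still failing after the last retry
            time.sleep(base_delay * (2 ** attempt))  # 1s, 2s, 4s, ...
```

Usage would look like `with_warmup_retry(lambda: client.chat_completion(...))` around the kind of call shown earlier in this section.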
When to upgrade to PRO: When you need reliable availability during development (no cold starts, consistent rate limits). $9/month is the lowest barrier to reliable AI inference available.
Together AI — Best One-Time Trial Credit
Together AI gives new accounts $100 in free credits — the most generous trial credit of any provider. With 200+ open-source models and OpenAI-compatible API, it's excellent for benchmarking multiple models before choosing one for production.
```typescript
import OpenAI from 'openai';

const client = new OpenAI({
  apiKey: process.env.TOGETHER_API_KEY,
  baseURL: 'https://api.together.xyz/v1',
});

// Same code, different models:
const response = await client.chat.completions.create({
  model: 'meta-llama/Llama-3.3-70B-Instruct-Turbo',
  messages: [{ role: 'user', content: 'Compare these two approaches...' }],
});
```
Important clarification: Together AI has no permanent free tier. Once the $100 credit runs out, you pay per token. How long it lasts depends on model choice and call volume: light development use can stretch it over months, while sustained multi-model benchmarking can exhaust it in weeks.
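A back-of-envelope estimator makes the burn rate concrete for your own usage. The price and per-call token figures below are placeholder assumptions, not Together's published rates; check together.ai/pricing for current numbers.

```python
def days_of_credit(credit_usd: float, calls_per_day: int,
                   tokens_per_call: int, price_per_m_tokens: float) -> float:
    """How many days a trial credit lasts at a steady usage rate."""
    daily_cost = calls_per_day * tokens_per_call * price_per_m_tokens / 1_000_000
    return credit_usd / daily_cost

# e.g. 500 calls/day x 2K tokens each at an assumed $0.90/M tokens:
print(round(days_of_credit(100, 500, 2000, 0.90)))  # 111 days
```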
Startup program: YC companies and qualifying startups can apply for $15,000–$50,000 in credits. Worth checking if you're early-stage.
OpenAI — Skip the Free Tier, Deposit $5
OpenAI's free tier is technically permanent but practically unusable:
- 3 requests per minute (one every 20 seconds)
- GPT-3.5 Turbo only (not GPT-4o, not o-series models)
- No published TPM limit (presumably very low)
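At 3 RPM, any loop that calls the API needs client-side pacing. A minimal throttle sketch; the class and its behavior are assumptions, not a feature of the OpenAI SDK:

```python
import time

class RpmThrottle:
    """Block just long enough that calls never exceed `rpm` per minute."""

    def __init__(self, rpm: int):
        self.interval = 60.0 / rpm  # 3 RPM means one call every 20 seconds
        self.last_call = 0.0

    def wait(self) -> None:
        """Sleep until the next call is allowed, then stamp the time."""
        now = time.monotonic()
        sleep_for = self.last_call + self.interval - now
        if sleep_for > 0:
            time.sleep(sleep_for)
        self.last_call = time.monotonic()

print(RpmThrottle(3).interval)  # 20.0
```

Calling `throttle.wait()` before each API request keeps a naive loop under the cap without manual sleeps scattered through the code.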
A $5 deposit upgrades you to Tier 1: 500 RPM, 30,000 TPM, access to all models including GPT-4o, GPT-4o mini, and the o-series reasoning models.
```typescript
import OpenAI from 'openai';

const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

// Tier 1 (after $5 deposit) — access to all models:
const response = await openai.chat.completions.create({
  model: 'gpt-4o-mini', // $0.15/$0.60 per 1M tokens — cheapest usable model
  messages: [{ role: 'user', content: '...' }],
});
```
When to use OpenAI: When you specifically need GPT-4o, o3, or o4-mini for their capabilities — reasoning, code generation, or instruction following where OpenAI models outperform alternatives. At $0.15/M input tokens, GPT-4o mini is competitive with Gemini Flash for pay-as-you-go use.
Anthropic Claude — No Free API Tier
Anthropic has no permanent free API tier. New accounts receive approximately $5 in trial credits that expire. University students can apply for $300 in credits through the research program.
The claude.ai web interface has a free plan (expanded in February 2026 to include Projects, Artifacts, and connectors), but this is not API access.
For API access: a $5 deposit starts Tier 1 access — Claude 3.5 Haiku at $0.80/M input tokens is the cheapest entry point, and one of the best value models for high-volume tasks.
```typescript
import Anthropic from '@anthropic-ai/sdk';

const anthropic = new Anthropic({ apiKey: process.env.ANTHROPIC_API_KEY });

const message = await anthropic.messages.create({
  model: 'claude-3-5-haiku-20241022', // Cheapest, still excellent
  max_tokens: 1024,
  messages: [{ role: 'user', content: '...' }],
});
```
When to Upgrade
The free tiers above are genuinely useful for development, prototyping, and side projects. Here's when each becomes insufficient:
Move to paid when:
- Traffic volume: Free tier rate limits (RPM/RPD) become the bottleneck in your development workflow
- Production deployment: Free tiers lack uptime SLAs — any production user-facing system needs paid tier reliability guarantees
- Data privacy: Mistral's Experiment plan (and some Hugging Face models) may use your prompts for training. Any proprietary code, user data, or confidential information needs a paid tier with data processing agreements
- Model access: GPT-4o, Claude Sonnet, and commercial Mistral models aren't available on free tiers — if you need these specific models, you pay
- Latency SLAs: Free tiers don't guarantee latency. Production apps serving end users need paid tier priority
The $5 rule: For OpenAI and Anthropic, a $5 deposit is the effective minimum for any real development. The free tier is a friction reducer, not a development platform.
Strategy for Zero-Budget Development
Build your prototype on free tiers in this order:
1. Gemini 2.5 Flash for most tasks — 250 RPD, 250K TPM, fast, capable, free
2. Groq + LLaMA 3.1 8B for latency-sensitive interactions — 30K TPM, near-instant responses
3. Hugging Face for specialized open-source models (code, multilingual, domain-specific)
4. Together AI for benchmarking — use the $100 credit to compare 5–10 models before committing to one
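Since Groq and Together expose OpenAI-compatible endpoints, and Gemini offers one as well, this ordering can be encoded as a small routing table driven by task type. The base URLs and model IDs below are assumptions to verify against each provider's documentation:

```python
# Sketch: task-based routing across free tiers. Every entry targets an
# OpenAI-compatible endpoint; only the base URL, key, and model change.
FREE_TIER_ROUTES = {
    "general": {  # Gemini's OpenAI-compatible endpoint
        "base_url": "https://generativelanguage.googleapis.com/v1beta/openai/",
        "model": "gemini-2.5-flash",
        "key_env": "GEMINI_API_KEY",
    },
    "low_latency": {  # Groq for near-instant responses
        "base_url": "https://api.groq.com/openai/v1",
        "model": "llama-3.1-8b-instant",
        "key_env": "GROQ_API_KEY",
    },
    "benchmark": {  # Together while the $100 credit lasts
        "base_url": "https://api.together.xyz/v1",
        "model": "meta-llama/Llama-3.3-70B-Instruct-Turbo",
        "key_env": "TOGETHER_API_KEY",
    },
}

def route(task: str) -> dict:
    """Return connection details for a task type, defaulting to 'general'."""
    return FREE_TIER_ROUTES.get(task, FREE_TIER_ROUTES["general"])

print(route("low_latency")["model"])  # llama-3.1-8b-instant
```

Feeding `route(...)` into any OpenAI-compatible client means switching providers is a one-line change rather than a rewrite.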
When your prototype needs OpenAI or Anthropic specifically (for their unique capabilities), deposit $5 into each. At typical development usage, $5 lasts weeks.
Methodology
Rate limits sourced from official documentation as of March 2026:
- Google AI Studio: ai.google.dev/gemini-api/docs/rate-limits
- Groq Console: console.groq.com/docs/rate-limits
- Mistral Documentation: docs.mistral.ai/deployment/ai-studio/tier
- Cohere Documentation: docs.cohere.com/docs/rate-limits
- OpenAI Platform: platform.openai.com/docs/guides/rate-limits
- Anthropic Documentation: platform.claude.com/docs
- Together AI Pricing: together.ai/pricing
Rate limits change frequently — verify current limits before relying on them for production planning.
Track live pricing and rate limits for 500+ AI APIs on APIScout — updated weekly with monitoring data.
Related: Groq vs OpenAI API 2026 · Anthropic Claude API Guide · Fireworks vs Together vs Groq