# How to Handle API Rate Limits Gracefully
Every API has rate limits. Hit them, and your requests fail with 429 errors. Handle them poorly, and your users see errors, your batch jobs crash, and your integrations break. Handle them well, and your app stays reliable even when you're pushing limits.
## How Rate Limits Work

### Common Rate Limit Types
| Type | How It Works | Example |
|---|---|---|
| Requests per second | Fixed window of requests per second | 10 req/s |
| Requests per minute | Fixed window per minute | 100 req/min |
| Token bucket | Tokens refill at steady rate, burst allowed | 100 tokens, 10/s refill |
| Sliding window | Rolling time window, no burst edge | 100 req in any 60s window |
| Concurrent | Max simultaneous requests | 5 concurrent connections |
| Daily quota | Fixed daily limit | 10,000 req/day |
| Token-based (AI) | Tokens per minute (TPM) | 100K TPM |
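The practical difference between fixed and sliding windows is the burst edge: a fixed 100 req/min window resets at the minute boundary, so up to 200 requests can land in the few seconds spanning it, while a sliding window never admits more than 100 in any 60-second span. A minimal sliding-window check looks like this (a hypothetical helper for illustration, not tied to any particular API):

```typescript
// Sliding-window check: allow a request only if fewer than `limit`
// requests were recorded in the last `windowMs` milliseconds.
function slidingWindowAllows(
  timestamps: number[], // past request times in ms, oldest first
  now: number,
  limit: number,
  windowMs: number
): boolean {
  // Keep only timestamps that still fall inside the rolling window
  const recent = timestamps.filter(t => now - t < windowMs);
  return recent.length < limit;
}
```

Because the window rolls with the clock, there is no boundary to burst across; the trade-off is that the server (or client) must keep per-caller timestamps instead of a single counter.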
### Rate Limit Headers

Most APIs report limit status in response headers:

```http
HTTP/1.1 200 OK
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 87
X-RateLimit-Reset: 1704067200
Retry-After: 30
```

The draft IETF standard (the "RateLimit header fields for HTTP" draft from the httpapi working group) drops the `X-` prefix and expresses the reset as delta seconds rather than a Unix timestamp:

```http
RateLimit-Limit: 100
RateLimit-Remaining: 87
RateLimit-Reset: 30
```
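When these headers are present you can pace proactively instead of reacting to 429s. A sketch, assuming the delta-seconds `RateLimit-*` form shown above (real APIs may use the `X-` variants or epoch timestamps); headers are passed as a plain `Map` here for testability, where real code would read `response.headers`:

```typescript
// Decide how long to pause before the next request based on the
// RateLimit-Remaining / RateLimit-Reset (delta-seconds) headers.
function pauseBeforeNextRequest(headers: Map<string, string>): number {
  const remaining = parseInt(headers.get("RateLimit-Remaining") ?? "1", 10);
  const resetSec = parseInt(headers.get("RateLimit-Reset") ?? "0", 10);
  if (remaining > 0) return 0;  // budget left: send immediately
  return resetSec * 1000;       // exhausted: wait for the window to reset
}
```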
### The 429 Response

```http
HTTP/1.1 429 Too Many Requests
Retry-After: 30
Content-Type: application/json

{
  "error": {
    "code": "rate_limit_exceeded",
    "message": "Rate limit exceeded. Please retry after 30 seconds.",
    "retry_after": 30
  }
}
```
## Pattern 1: Exponential Backoff with Jitter

The most important pattern. Retry failed requests with increasing delays.
```typescript
async function fetchWithRetry<T>(
  url: string,
  options: RequestInit,
  maxRetries = 5
): Promise<T> {
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    try {
      const response = await fetch(url, options);

      if (response.status === 429) {
        // Respect the Retry-After header if present. Number() handles the
        // delta-seconds form; an HTTP-date or missing header yields NaN/0
        // and falls through to computed backoff.
        const retryAfter = Number(response.headers.get('Retry-After'));
        const waitMs = retryAfter > 0
          ? retryAfter * 1000
          : calculateBackoff(attempt);
        console.log(`Rate limited. Waiting ${waitMs}ms before retry ${attempt + 1}`);
        await sleep(waitMs);
        continue;
      }

      if (response.status >= 500 && attempt < maxRetries) {
        // Server error: also worth retrying
        await sleep(calculateBackoff(attempt));
        continue;
      }

      if (!response.ok) {
        throw new Error(`HTTP ${response.status}: ${await response.text()}`);
      }

      return response.json();
    } catch (error) {
      if (attempt === maxRetries) throw error;
      if (error instanceof TypeError) {
        // Network error: retry
        await sleep(calculateBackoff(attempt));
        continue;
      }
      throw error;
    }
  }
  throw new Error('Max retries exceeded');
}

function calculateBackoff(attempt: number): number {
  // Exponential backoff: 1s, 2s, 4s, 8s, 16s...
  const baseMs = Math.pow(2, attempt) * 1000;
  // Add jitter: a random factor in [0.5, 1.5) to prevent thundering herd
  const jitter = baseMs * (0.5 + Math.random());
  // Cap at 30 seconds
  return Math.min(jitter, 30000);
}

function sleep(ms: number): Promise<void> {
  return new Promise(resolve => setTimeout(resolve, ms));
}
```
**Why jitter matters:** Without jitter, all retry requests hit the API at the same time (thundering herd). Jitter spreads them out.
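To make the spread concrete, here is the same formula with the random source injected so different draws are easy to compare (a hypothetical variant of `calculateBackoff` above, for illustration only):

```typescript
// Same backoff as calculateBackoff, but with the random source passed in
// so the jitter spread is deterministic and visible.
function backoffWithJitter(attempt: number, random: () => number): number {
  const baseMs = Math.pow(2, attempt) * 1000;        // 1s, 2s, 4s, ...
  const jitter = baseMs * (0.5 + random());          // factor in [0.5, 1.5)
  return Math.min(jitter, 30000);                    // cap at 30s
}
```

Two clients retrying attempt 2 with draws of 0 and 0.5 wait 2,000 ms and 4,000 ms respectively, instead of both arriving at the same instant.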
## Pattern 2: Client-Side Rate Limiting

Don't wait for 429s — prevent them by throttling requests yourself.
```typescript
class RateLimiter {
  private queue: Array<{
    execute: () => Promise<any>;
    resolve: (value: any) => void;
    reject: (error: any) => void;
  }> = [];
  private activeCount = 0;
  private timestamps: number[] = [];

  constructor(
    private maxPerSecond: number,
    private maxConcurrent: number = 10
  ) {}

  async execute<T>(fn: () => Promise<T>): Promise<T> {
    return new Promise((resolve, reject) => {
      this.queue.push({ execute: fn, resolve, reject });
      this.processQueue();
    });
  }

  private async processQueue() {
    if (this.queue.length === 0) return;
    if (this.activeCount >= this.maxConcurrent) return;

    // Drop timestamps older than one second
    const now = Date.now();
    this.timestamps = this.timestamps.filter(t => now - t < 1000);

    if (this.timestamps.length >= this.maxPerSecond) {
      // Wait until the oldest timestamp ages out of the window
      const waitMs = 1000 - (now - this.timestamps[0]);
      setTimeout(() => this.processQueue(), waitMs);
      return;
    }

    const item = this.queue.shift();
    if (!item) return;

    this.activeCount++;
    this.timestamps.push(now);

    try {
      const result = await item.execute();
      item.resolve(result);
    } catch (error) {
      item.reject(error);
    } finally {
      this.activeCount--;
      this.processQueue();
    }
  }
}

// Usage: 10 requests per second, at most 5 in flight
const limiter = new RateLimiter(10, 5);
const results = await Promise.all(
  userIds.map(id =>
    limiter.execute(() => fetch(`/api/users/${id}`).then(r => r.json()))
  )
);
```
## Pattern 3: Token Bucket

For APIs with token-bucket rate limiting (like AI APIs with tokens-per-minute limits):
```typescript
import OpenAI from 'openai';

class TokenBucket {
  private tokens: number;
  private lastRefill: number;

  constructor(
    private maxTokens: number,
    private refillRate: number // tokens per second
  ) {
    this.tokens = maxTokens;
    this.lastRefill = Date.now();
  }

  private refill() {
    const now = Date.now();
    const elapsed = (now - this.lastRefill) / 1000;
    this.tokens = Math.min(this.maxTokens, this.tokens + elapsed * this.refillRate);
    this.lastRefill = now;
  }

  async consume(count: number): Promise<void> {
    this.refill();
    if (this.tokens >= count) {
      this.tokens -= count;
      return;
    }
    // Wait until enough tokens have accumulated
    const deficit = count - this.tokens;
    const waitMs = (deficit / this.refillRate) * 1000;
    await new Promise(resolve => setTimeout(resolve, waitMs));
    this.refill();
    this.tokens -= count;
  }
}

// Usage with an AI API limited by tokens per minute
const openai = new OpenAI();
const bucket = new TokenBucket(100_000, 100_000 / 60); // 100K TPM

async function callAI(prompt: string) {
  const estimatedTokens = Math.ceil(prompt.length / 4); // rough estimate: ~4 chars/token
  await bucket.consume(estimatedTokens);
  return openai.chat.completions.create({
    model: 'gpt-4o',
    messages: [{ role: 'user', content: prompt }],
  });
}
```
## Pattern 4: Queue-Based Processing

For batch jobs that need to process thousands of items:
```typescript
class BatchProcessor<T, R> {
  private queue: T[] = [];

  constructor(
    private processFn: (item: T) => Promise<R>,
    private options: {
      maxPerSecond: number;
      maxConcurrent: number;
      onProgress?: (completed: number, total: number) => void;
    }
  ) {}

  async process(items: T[]): Promise<R[]> {
    this.queue = [...items];
    const total = items.length;
    let completed = 0;
    let active = 0;
    const results: R[] = new Array(total);

    return new Promise((resolve, reject) => {
      const interval = setInterval(() => {
        // Launch at most one item per tick so the interval itself enforces
        // maxPerSecond; maxConcurrent caps in-flight work.
        if (
          active < this.options.maxConcurrent &&
          this.queue.length > 0
        ) {
          const index = total - this.queue.length;
          const item = this.queue.shift()!;
          active++;

          this.processFn(item)
            .then(result => {
              results[index] = result;
              completed++;
              active--;
              this.options.onProgress?.(completed, total);
              if (completed === total) {
                clearInterval(interval);
                resolve(results);
              }
            })
            .catch(error => {
              clearInterval(interval);
              reject(error);
            });
        }
      }, 1000 / this.options.maxPerSecond);
    });
  }
}

// Usage
const processor = new BatchProcessor(
  async (userId: string) => {
    const response = await fetch(`/api/users/${userId}`);
    return response.json();
  },
  {
    maxPerSecond: 10,
    maxConcurrent: 5,
    onProgress: (done, total) => console.log(`${done}/${total}`),
  }
);
const allUsers = await processor.process(userIds);
```
## Pattern 5: Adaptive Rate Limiting

Automatically adjust your request rate based on API responses:
```typescript
class AdaptiveRateLimiter {
  private requestsPerSecond: number;
  private consecutiveSuccesses = 0;

  constructor(
    initialRate: number,
    private maxRate: number,
    private minRate: number = 1
  ) {
    this.requestsPerSecond = initialRate;
  }

  onSuccess() {
    this.consecutiveSuccesses++;
    // Increase rate by 20% after 10 consecutive successes
    if (this.consecutiveSuccesses >= 10) {
      this.requestsPerSecond = Math.min(
        this.maxRate,
        this.requestsPerSecond * 1.2
      );
      this.consecutiveSuccesses = 0;
    }
  }

  onRateLimit() {
    this.consecutiveSuccesses = 0;
    // Cut the rate in half on a rate limit response
    this.requestsPerSecond = Math.max(
      this.minRate,
      this.requestsPerSecond * 0.5
    );
  }

  getDelayMs(): number {
    return 1000 / this.requestsPerSecond;
  }
}
```
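The adjustment rule can also be expressed as a pure function, which makes the dynamics easy to see in isolation (this mirrors the class above; the 1.2x and 0.5x factors are the same judgment-call constants, not a standard):

```typescript
// Pure sketch of the adapt-on-feedback rule: halve on a 429, grow 20%
// after a streak of successes. Bounds keep the rate in [minRate, maxRate].
function adjustRate(
  rate: number,
  event: "success-streak" | "rate-limited",
  minRate = 1,
  maxRate = 100
): number {
  if (event === "rate-limited") return Math.max(minRate, rate * 0.5);
  return Math.min(maxRate, rate * 1.2);
}
```

Starting at 10 req/s, one 429 drops the rate to 5; a success streak then recovers it to 6, 7.2, and so on. Fast to back off, slow to recover: the safe direction for a shared API.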
## Provider-Specific Rate Limits

### Quick Reference
| Provider | Rate Limit | Headers | Retry Strategy |
|---|---|---|---|
| Stripe | 100/s (live), 25/s (test) | Standard X-RateLimit-* | Exponential backoff |
| OpenAI | TPM + RPM per model | Standard + usage headers | Exponential backoff, token estimation |
| Anthropic | TPM + RPM per tier | Standard | Backoff + tier upgrade |
| Twilio | 100/s per account | Standard | Backoff + request queuing |
| GitHub | 5,000/hour (auth) | X-RateLimit-* | Respect reset time |
| Shopify | 2/s (REST), cost-based (GraphQL) | X-Shopify-Shop-Api-Call-Limit | Leaky bucket |
| Algolia | Varies by plan | Standard | Client-side limiting |
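GitHub-style headers report the reset as a Unix timestamp rather than delta seconds, so "respect reset time" means computing the wait from the current clock. A small sketch, assuming the epoch-seconds convention of `X-RateLimit-Reset`:

```typescript
// Given a reset time in epoch seconds (GitHub-style X-RateLimit-Reset),
// return how long to wait from `nowMs`, floored at zero if the reset
// has already passed.
function msUntilReset(resetEpochSec: number, nowMs: number): number {
  return Math.max(0, resetEpochSec * 1000 - nowMs);
}
```

Passing `Date.now()` as `nowMs` in real code keeps the function itself clock-free and easy to test.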
## Monitoring Rate Limits

```typescript
// Track rate limit usage
class RateLimitMonitor {
  private metrics = {
    totalRequests: 0,
    rateLimitedRequests: 0,
    totalRetries: 0,
    avgRetryDelayMs: 0,
  };

  recordRequest(wasRateLimited: boolean, retryCount: number, retryDelayMs: number) {
    this.metrics.totalRequests++;
    if (wasRateLimited) {
      this.metrics.rateLimitedRequests++;
      this.metrics.totalRetries += retryCount;
      // Running average of retry delay across rate-limited requests
      const n = this.metrics.rateLimitedRequests;
      this.metrics.avgRetryDelayMs +=
        (retryDelayMs - this.metrics.avgRetryDelayMs) / n;
    }
  }

  getReport() {
    const { totalRequests, rateLimitedRequests } = this.metrics;
    const rateLimitRate =
      totalRequests > 0 ? rateLimitedRequests / totalRequests : 0;
    return {
      ...this.metrics,
      rateLimitRate,
      recommendation: rateLimitRate > 0.05
        ? 'Consider reducing request rate or upgrading API tier'
        : 'Rate limit handling is healthy',
    };
  }
}
```
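The 5% threshold in that report works as a standalone health check too (the threshold itself is a judgment call for alerting, not an industry standard):

```typescript
// Flag when more than `threshold` of requests are being rate limited.
function rateLimitHealthy(
  rateLimited: number,
  total: number,
  threshold = 0.05
): boolean {
  if (total === 0) return true; // no traffic: nothing to flag
  return rateLimited / total <= threshold;
}
```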
## Common Mistakes
| Mistake | Impact | Fix |
|---|---|---|
| No retry on 429 | Requests fail permanently | Implement exponential backoff |
| Retry without backoff | Makes rate limiting worse | Add exponential delay + jitter |
| Ignoring Retry-After header | Retrying too soon | Parse and respect Retry-After |
| No client-side throttling | Hit 429s constantly | Pre-limit requests to known rate |
| Fixed delay retries | Thundering herd problem | Add jitter to retry delays |
| No monitoring of 429 rates | Don't know you have a problem | Track rate limit hit percentage |
| Retrying on all errors | Retrying permanent failures | Only retry 429 and 5xx |
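The last row of the table reduces to one predicate worth centralizing: retry only on 429 and 5xx, and treat other statuses as permanent failures that retrying cannot fix:

```typescript
// Only 429 (rate limited) and 5xx (server error) are worth retrying;
// other 4xx statuses indicate a request that will fail the same way again.
function isRetryable(status: number): boolean {
  return status === 429 || (status >= 500 && status < 600);
}
```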
Compare API rate limits across providers on APIScout — find the most generous limits and best rate limit handling documentation.