
How to Handle API Rate Limits Gracefully

APIScout Team

Tags: rate limiting, api integration, best practices, resilience, performance


Every API has rate limits. Hit them, and your requests fail with 429 errors. Handle them poorly, and your users see errors, your batch jobs crash, and your integrations break. Handle them well, and your app stays reliable even when you're pushing limits.

How Rate Limits Work

Common Rate Limit Types

| Type | How It Works | Example |
| --- | --- | --- |
| Requests per second | Fixed window of requests per second | 10 req/s |
| Requests per minute | Fixed window per minute | 100 req/min |
| Token bucket | Tokens refill at a steady rate; bursts allowed | 100 tokens, 10/s refill |
| Sliding window | Rolling time window, no burst edge | 100 req in any 60s window |
| Concurrent | Max simultaneous requests | 5 concurrent connections |
| Daily quota | Fixed daily limit | 10,000 req/day |
| Token-based (AI) | Tokens per minute (TPM) | 100K TPM |
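The practical difference between the window types shows up at the boundaries: a fixed window can admit up to twice the limit in a burst that straddles the reset, while a sliding window never can. A minimal sliding-window check, as an illustrative sketch (not tied to any provider):

```typescript
// Minimal sliding-window counter: at most `limit` calls in any
// rolling `windowMs` interval. Illustrative sketch only.
class SlidingWindow {
  private timestamps: number[] = [];

  constructor(private limit: number, private windowMs: number) {}

  tryAcquire(now: number = Date.now()): boolean {
    // Drop timestamps that have aged out of the rolling window
    this.timestamps = this.timestamps.filter(t => now - t < this.windowMs);
    if (this.timestamps.length >= this.limit) return false;
    this.timestamps.push(now);
    return true;
  }
}
```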

Rate Limit Headers

Most APIs tell you about limits in response headers:

HTTP/1.1 200 OK
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 87
X-RateLimit-Reset: 1704067200
Retry-After: 30

# Or the newer draft standard (IETF draft-ietf-httpapi-ratelimit-headers):
RateLimit-Limit: 100
RateLimit-Remaining: 87
RateLimit-Reset: 30
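The two header families differ in more than the X- prefix: X-RateLimit-Reset is typically a Unix timestamp, while the draft standard's RateLimit-Reset is a delta in seconds. A small normalizing helper, sketched under those assumptions (verify the exact semantics against each provider's docs):

```typescript
interface RateLimitInfo {
  limit: number | null;
  remaining: number | null;
  resetSeconds: number | null; // seconds until the window resets
}

// Reads both legacy X-RateLimit-* and newer RateLimit-* headers.
// Assumes X-RateLimit-Reset is a Unix timestamp when it looks like one,
// and a delta in seconds otherwise.
function parseRateLimitHeaders(headers: Headers): RateLimitInfo {
  const get = (name: string) => {
    const v = headers.get(name) ?? headers.get(`X-${name}`);
    return v === null ? null : parseInt(v, 10);
  };
  const reset = get('RateLimit-Reset');
  return {
    limit: get('RateLimit-Limit'),
    remaining: get('RateLimit-Remaining'),
    resetSeconds:
      reset !== null && reset > 1_000_000_000 // looks like a Unix timestamp
        ? Math.max(0, reset - Math.floor(Date.now() / 1000))
        : reset,
  };
}
```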

The 429 Response

HTTP/1.1 429 Too Many Requests
Retry-After: 30
Content-Type: application/json

{
  "error": {
    "code": "rate_limit_exceeded",
    "message": "Rate limit exceeded. Please retry after 30 seconds.",
    "retry_after": 30
  }
}

Pattern 1: Exponential Backoff with Jitter

The most important pattern. Retry failed requests with increasing delays.

async function fetchWithRetry<T>(
  url: string,
  options: RequestInit,
  maxRetries = 5
): Promise<T> {
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    try {
      const response = await fetch(url, options);

      if (response.status === 429) {
        // Respect Retry-After if present (assumes delta-seconds; per
        // RFC 9110 it may also be an HTTP-date, which some APIs use)
        const retryAfter = response.headers.get('Retry-After');
        const waitMs = retryAfter
          ? parseInt(retryAfter, 10) * 1000
          : calculateBackoff(attempt);

        console.log(`Rate limited. Waiting ${waitMs}ms before retry ${attempt + 1}`);
        await sleep(waitMs);
        continue;
      }

      if (response.status >= 500 && attempt < maxRetries) {
        // Server error — also worth retrying
        await sleep(calculateBackoff(attempt));
        continue;
      }

      if (!response.ok) {
        throw new Error(`HTTP ${response.status}: ${await response.text()}`);
      }

      return response.json();
    } catch (error) {
      if (attempt === maxRetries) throw error;
      if (error instanceof TypeError) {
        // Network error — retry
        await sleep(calculateBackoff(attempt));
        continue;
      }
      throw error;
    }
  }

  throw new Error('Max retries exceeded');
}

function calculateBackoff(attempt: number): number {
  // Exponential backoff: 1s, 2s, 4s, 8s, 16s
  const baseMs = Math.pow(2, attempt) * 1000;
  // Add jitter: random ±50% to prevent thundering herd
  const jitter = baseMs * (0.5 + Math.random());
  // Cap at 30 seconds
  return Math.min(jitter, 30000);
}

function sleep(ms: number): Promise<void> {
  return new Promise(resolve => setTimeout(resolve, ms));
}

Why jitter matters: Without jitter, all retry requests hit the API at the same time (thundering herd). Jitter spreads them out.

Pattern 2: Client-Side Rate Limiting

Don't wait for 429s — prevent them by throttling requests yourself.

class RateLimiter {
  private queue: Array<{
    execute: () => Promise<any>;
    resolve: (value: any) => void;
    reject: (error: any) => void;
  }> = [];
  private activeCount = 0;
  private timestamps: number[] = [];

  constructor(
    private maxPerSecond: number,
    private maxConcurrent: number = 10
  ) {}

  async execute<T>(fn: () => Promise<T>): Promise<T> {
    return new Promise((resolve, reject) => {
      this.queue.push({ execute: fn, resolve, reject });
      this.processQueue();
    });
  }

  private async processQueue() {
    if (this.queue.length === 0) return;
    if (this.activeCount >= this.maxConcurrent) return;

    // Clean old timestamps
    const now = Date.now();
    this.timestamps = this.timestamps.filter(t => now - t < 1000);

    if (this.timestamps.length >= this.maxPerSecond) {
      // Wait until oldest timestamp expires
      const waitMs = 1000 - (now - this.timestamps[0]);
      setTimeout(() => this.processQueue(), waitMs);
      return;
    }

    const item = this.queue.shift();
    if (!item) return;

    this.activeCount++;
    this.timestamps.push(now);

    try {
      const result = await item.execute();
      item.resolve(result);
    } catch (error) {
      item.reject(error);
    } finally {
      this.activeCount--;
      this.processQueue();
    }
  }
}

// Usage
const limiter = new RateLimiter(10, 5); // 10 req/s, 5 concurrent

const results = await Promise.all(
  userIds.map(id =>
    limiter.execute(() => fetch(`/api/users/${id}`).then(r => r.json()))
  )
);

Pattern 3: Token Bucket

For APIs with token-bucket rate limiting (like AI APIs with tokens-per-minute):

class TokenBucket {
  private tokens: number;
  private lastRefill: number;

  constructor(
    private maxTokens: number,
    private refillRate: number, // tokens per second
  ) {
    this.tokens = maxTokens;
    this.lastRefill = Date.now();
  }

  private refill() {
    const now = Date.now();
    const elapsed = (now - this.lastRefill) / 1000;
    this.tokens = Math.min(this.maxTokens, this.tokens + elapsed * this.refillRate);
    this.lastRefill = now;
  }

  async consume(count: number): Promise<void> {
    this.refill();

    if (this.tokens >= count) {
      this.tokens -= count;
      return;
    }

    // Not enough tokens: wait until the deficit refills
    // (assumes sequential callers; concurrent consumers could
    // briefly drive the balance negative)
    const deficit = count - this.tokens;
    const waitMs = (deficit / this.refillRate) * 1000;
    await new Promise(resolve => setTimeout(resolve, waitMs));
    this.refill();
    this.tokens -= count;
  }
}

// Usage with AI API (tokens per minute)
const bucket = new TokenBucket(100000, 100000 / 60); // 100K TPM

async function callAI(prompt: string) {
  const estimatedTokens = Math.ceil(prompt.length / 4); // rough estimate: ~4 chars per token
  await bucket.consume(estimatedTokens);
  return openai.chat.completions.create({
    model: 'gpt-4o',
    messages: [{ role: 'user', content: prompt }],
  });
}

Pattern 4: Queue-Based Processing

For batch jobs that need to process thousands of items:

class BatchProcessor<T, R> {
  private queue: T[] = [];
  private results: Map<number, R> = new Map();

  constructor(
    private processFn: (item: T) => Promise<R>,
    private options: {
      maxPerSecond: number;
      maxConcurrent: number;
      onProgress?: (completed: number, total: number) => void;
    }
  ) {}

  async process(items: T[]): Promise<R[]> {
    this.queue = [...items];
    const total = items.length;
    let completed = 0;
    let active = 0;
    const results: R[] = new Array(total);

    if (total === 0) return results; // avoid a promise that never resolves

    return new Promise((resolve, reject) => {
      const interval = setInterval(() => {
        // Start at most one request per tick so the interval spacing
        // (1000 / maxPerSecond) actually enforces the rate; a while
        // loop here could burst past the per-second limit
        if (
          active < this.options.maxConcurrent &&
          this.queue.length > 0
        ) {
          const index = total - this.queue.length;
          const item = this.queue.shift()!;
          active++;

          this.processFn(item)
            .then(result => {
              results[index] = result;
              completed++;
              active--;
              this.options.onProgress?.(completed, total);

              if (completed === total) {
                clearInterval(interval);
                resolve(results);
              }
            })
            .catch(error => {
              clearInterval(interval);
              reject(error);
            });
        }
      }, 1000 / this.options.maxPerSecond);
    });
  }
}

// Usage
const processor = new BatchProcessor(
  async (userId: string) => {
    const response = await fetch(`/api/users/${userId}`);
    return response.json();
  },
  {
    maxPerSecond: 10,
    maxConcurrent: 5,
    onProgress: (done, total) => console.log(`${done}/${total}`),
  }
);

const allUsers = await processor.process(userIds);

Pattern 5: Adaptive Rate Limiting

Automatically adjust your request rate based on API responses:

class AdaptiveRateLimiter {
  private requestsPerSecond: number;
  private consecutiveSuccesses = 0;
  private consecutiveFailures = 0;

  constructor(
    private initialRate: number,
    private maxRate: number,
    private minRate: number = 1
  ) {
    this.requestsPerSecond = initialRate;
  }

  onSuccess() {
    this.consecutiveSuccesses++;
    this.consecutiveFailures = 0;

    // Increase rate after 10 consecutive successes
    if (this.consecutiveSuccesses >= 10) {
      this.requestsPerSecond = Math.min(
        this.maxRate,
        this.requestsPerSecond * 1.2
      );
      this.consecutiveSuccesses = 0;
    }
  }

  onRateLimit() {
    this.consecutiveFailures++;
    this.consecutiveSuccesses = 0;

    // Cut rate in half on rate limit
    this.requestsPerSecond = Math.max(
      this.minRate,
      this.requestsPerSecond * 0.5
    );
  }

  getDelayMs(): number {
    return 1000 / this.requestsPerSecond;
  }
}

Provider-Specific Rate Limits

Quick Reference

| Provider | Rate Limit | Headers | Retry Strategy |
| --- | --- | --- | --- |
| Stripe | 100/s (live), 25/s (test) | Standard X-RateLimit-* | Exponential backoff |
| OpenAI | TPM + RPM per model | Standard + usage headers | Exponential backoff, token estimation |
| Anthropic | TPM + RPM per tier | Standard | Backoff + tier upgrade |
| Twilio | 100/s per account | Standard | Backoff + request queuing |
| GitHub | 5,000/hour (auth) | X-RateLimit-* | Respect reset time |
| Shopify | 2/s (REST), cost-based (GraphQL) | X-Shopify-Shop-Api-Call-Limit | Leaky bucket |
| Algolia | Varies by plan | Standard | Client-side limiting |
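"Respect reset time" for GitHub means sleeping until the moment in its X-RateLimit-Reset header (a Unix timestamp in seconds) rather than backing off blindly. A hypothetical helper:

```typescript
// Sketch: compute how long to wait before retrying, based on the
// X-RateLimit-Reset header (Unix timestamp in seconds). Returns 0 if
// the header is missing or the reset moment has already passed.
function msUntilReset(headers: Headers, nowMs: number = Date.now()): number {
  const reset = headers.get('X-RateLimit-Reset');
  if (reset === null) return 0;
  return Math.max(0, parseInt(reset, 10) * 1000 - nowMs);
}
```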

Monitoring Rate Limits

// Track rate limit usage
class RateLimitMonitor {
  private metrics = {
    totalRequests: 0,
    rateLimitedRequests: 0,
    totalRetries: 0,
    avgRetryDelay: 0,
  };

  recordRequest(wasRateLimited: boolean, retryCount: number, retryDelayMs: number) {
    this.metrics.totalRequests++;
    if (wasRateLimited) {
      this.metrics.rateLimitedRequests++;
      this.metrics.totalRetries += retryCount;
      // Running average of retry delay across rate-limited requests
      const n = this.metrics.rateLimitedRequests;
      this.metrics.avgRetryDelay += (retryDelayMs - this.metrics.avgRetryDelay) / n;
    }
  }

  getReport() {
    const { totalRequests, rateLimitedRequests } = this.metrics;
    // Guard against division by zero before any requests are recorded
    const rateLimitRate = totalRequests > 0 ? rateLimitedRequests / totalRequests : 0;
    return {
      ...this.metrics,
      rateLimitRate,
      recommendation: rateLimitRate > 0.05
        ? 'Consider reducing request rate or upgrading API tier'
        : 'Rate limit handling is healthy',
    };
  }
}

Common Mistakes

| Mistake | Impact | Fix |
| --- | --- | --- |
| No retry on 429 | Requests fail permanently | Implement exponential backoff |
| Retry without backoff | Makes rate limiting worse | Add exponential delay + jitter |
| Ignoring Retry-After header | Retrying too soon | Parse and respect Retry-After |
| No client-side throttling | Hit 429s constantly | Pre-limit requests to known rate |
| Fixed delay retries | Thundering herd problem | Add jitter to retry delays |
| No monitoring of 429 rates | Don't know you have a problem | Track rate limit hit percentage |
| Retrying on all errors | Retrying permanent failures | Only retry 429 and 5xx |
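The last two rows boil down to one predicate: retry only 429 and transient 5xx responses; other 4xx client errors will fail identically on every attempt. A sketch:

```typescript
// Decide whether a failed response is worth retrying. Only 429 and
// 5xx statuses qualify; 4xx client errors (bad request, auth failure,
// not found) are permanent and should surface immediately.
function isRetryable(status: number): boolean {
  return status === 429 || (status >= 500 && status < 600);
}
```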

Compare API rate limits across providers on APIScout — find the most generous limits and best rate limit handling documentation.
