API Rate Limiting Best Practices for Developers

APIScout Team
Tags: rate limiting, best practices, api design, performance, tutorial

Why Rate Limiting Exists

Every API has limits. Whether it's 100 requests per minute or 10,000 per day, rate limiting protects API providers from abuse, ensures fair usage, and keeps infrastructure costs manageable.

As a developer consuming APIs, understanding rate limits isn't optional — it's the difference between a reliable application and one that randomly breaks at scale.

Common Rate Limiting Strategies

Fixed Window

The simplest approach. You get N requests per time window (e.g., 100 requests per minute). The counter resets at the start of each window.

  • Pro: Easy to understand and implement
  • Con: Burst traffic at window boundaries can cause issues
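As a sketch, a fixed-window counter fits in a few lines (the function names here are illustrative, not from any particular library):

```javascript
// Fixed-window counter (illustrative sketch).
// Allows `limit` requests per `windowMs`; the count resets
// whenever a new window begins.
function createFixedWindowLimiter(limit, windowMs) {
  let windowStart = null;
  let count = 0;

  return function allow(now = Date.now()) {
    if (windowStart === null || now - windowStart >= windowMs) {
      windowStart = now;  // a new window begins: reset the counter
      count = 0;
    }
    count++;
    return count <= limit;
  };
}

// Usage: 100 requests per minute
const allow = createFixedWindowLimiter(100, 60000);
```

The boundary problem is visible here: a client can spend its full quota at the end of one window and again at the start of the next, doubling the short-term rate.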

Sliding Window

Instead of resetting at fixed intervals, the window slides with each request. More fair, but harder to predict your remaining quota.

  • Pro: Smoother rate enforcement
  • Con: Harder to calculate remaining requests
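A sliding-window log can be sketched the same way: keep one timestamp per accepted request and evict anything older than the window.

```javascript
// Sliding-window log (illustrative sketch): store one timestamp per
// accepted request and evict entries older than `windowMs`.
function createSlidingWindowLimiter(limit, windowMs) {
  const timestamps = [];

  return function allow(now = Date.now()) {
    // Drop requests that have left the window
    while (timestamps.length > 0 && now - timestamps[0] >= windowMs) {
      timestamps.shift();
    }
    if (timestamps.length < limit) {
      timestamps.push(now);
      return true;
    }
    return false;
  };
}
```

This is why your remaining quota is hard to predict: it depends on exactly when each of your recent requests was made, not on a shared reset time.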

Token Bucket

You start with a bucket of tokens. Each request consumes one token. Tokens refill at a steady rate. Allows short bursts while enforcing an average rate.

  • Pro: Handles burst traffic gracefully
  • Con: Can be confusing — "I had 100 tokens, now I have 37?"
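Here's a rough token bucket in the same style; `capacity` and `refillPerSec` are parameters you'd take from the API's documented limits:

```javascript
// Token bucket (illustrative sketch): `capacity` tokens, refilled at
// `refillPerSec`. Bursts drain the bucket; the refill rate enforces
// the long-run average.
function createTokenBucket(capacity, refillPerSec) {
  let tokens = capacity;
  let last = null;

  return function allow(nowMs = Date.now()) {
    if (last !== null) {
      const elapsedSec = (nowMs - last) / 1000;
      tokens = Math.min(capacity, tokens + elapsedSec * refillPerSec);
    }
    last = nowMs;
    if (tokens >= 1) {
      tokens -= 1;  // "I had 100 tokens, now I have 37" happens here
      return true;
    }
    return false;
  };
}
```

The confusing part for consumers is that the remaining count depends on both your burst history and elapsed time, not just a request counter.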

Leaky Bucket

Requests queue up and are processed at a constant rate. Excess requests overflow (get rejected). Used by APIs that need strict throughput control.

  • Pro: Predictable processing rate
  • Con: Adds latency during bursts

Reading Rate Limit Headers

Most APIs tell you their limits via HTTP headers. There's no universal standard, but these names are the de facto convention:

HTTP/1.1 200 OK
X-RateLimit-Limit: 1000
X-RateLimit-Remaining: 847
X-RateLimit-Reset: 1709510460
Retry-After: 30
  • X-RateLimit-Limit: Max requests allowed in the window
  • X-RateLimit-Remaining: Requests left in the current window
  • X-RateLimit-Reset: Unix timestamp when the window resets
  • Retry-After: Seconds to wait before retrying (on 429)

Always parse these headers. Don't guess — let the API tell you exactly when you can send the next request.
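A small helper can pull these off a fetch Response (assuming the header names above; some APIs use variants like `RateLimit-*`, so check your provider's docs):

```javascript
// Pull the rate-limit headers off a response (illustrative sketch).
// Missing headers come back as null rather than NaN.
function parseRateLimitHeaders(headers) {
  const num = (name) => {
    const value = headers.get(name);
    return value == null ? null : Number(value);
  };
  return {
    limit: num('X-RateLimit-Limit'),
    remaining: num('X-RateLimit-Remaining'),
    resetAt: num('X-RateLimit-Reset'),    // Unix timestamp (seconds)
    retryAfterSec: num('Retry-After'),    // typically present on 429
  };
}
```

In practice you'd call `parseRateLimitHeaders(response.headers)` after each fetch; the Fetch API's Headers.get is case-insensitive, so exact casing doesn't matter there.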

Handling 429 Too Many Requests

When you hit a rate limit, the API returns HTTP 429. Here's how to handle it properly:

Exponential Backoff with Jitter

The gold standard for retry logic. Wait progressively longer between retries, with random jitter to prevent thundering herd problems.

async function fetchWithRetry(url, options, maxRetries = 5) {
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    const response = await fetch(url, options);

    if (response.status !== 429) return response;

    const retryAfter = response.headers.get('Retry-After');
    // Retry-After is usually seconds; some APIs send an HTTP date instead
    const baseDelay = retryAfter
      ? parseInt(retryAfter, 10) * 1000
      : Math.pow(2, attempt) * 1000;

    // Add jitter: random delay between 0 and baseDelay
    const jitter = Math.random() * baseDelay;
    const delay = baseDelay + jitter;

    console.log(`Rate limited. Retrying in ${Math.round(delay)}ms...`);
    await new Promise(resolve => setTimeout(resolve, delay));
  }

  throw new Error('Max retries exceeded');
}

Key Rules for Retries

  1. Always respect Retry-After — If the API tells you when to retry, listen
  2. Add jitter — Without it, all your retried requests hit at the same time
  3. Set a max retry count — Don't retry forever
  4. Log rate limit events — If you're hitting limits regularly, you have a design problem

Proactive Strategies

Don't wait for 429 errors. Design your application to stay within limits from the start.

1. Cache Aggressively

The fastest API call is the one you don't make. Cache responses at every layer:

// Simple in-memory cache with TTL
const cache = new Map();

async function cachedFetch(url, ttlMs = 60000) {
  const cached = cache.get(url);
  if (cached && Date.now() - cached.time < ttlMs) {
    return cached.data;
  }

  const response = await fetch(url);
  const data = await response.json();
  if (response.ok) {
    cache.set(url, { data, time: Date.now() });  // don't cache error responses
  }
  return data;
}

2. Batch Requests

Many APIs offer batch endpoints. Use them instead of making individual calls.

// Instead of this (10 API calls):
for (const id of userIds) {
  await fetch(`/api/users/${id}`);
}

// Do this (1 API call):
await fetch(`/api/users?ids=${userIds.join(',')}`);

3. Use Webhooks

Instead of polling an API every 30 seconds to check for changes, register a webhook and let the API notify you.

  • Polling: 2,880 requests/day per resource
  • Webhooks: 0 requests — the API pushes updates to you

4. Implement a Request Queue

For high-volume applications, queue outgoing API calls and process them at a controlled rate:

class RateLimitedQueue {
  constructor(requestsPerSecond) {
    this.interval = 1000 / requestsPerSecond;
    this.queue = [];
    this.processing = false;
  }

  async add(fn) {
    return new Promise((resolve, reject) => {
      this.queue.push({ fn, resolve, reject });
      this.process();
    });
  }

  async process() {
    if (this.processing) return;
    this.processing = true;

    while (this.queue.length > 0) {
      const { fn, resolve, reject } = this.queue.shift();
      try {
        resolve(await fn());
      } catch (err) {
        reject(err);
      }
      await new Promise(r => setTimeout(r, this.interval));
    }

    this.processing = false;
  }
}

// Usage: max 10 requests per second
const queue = new RateLimitedQueue(10);
await queue.add(() => fetch('/api/data'));

5. Monitor Your Usage

Track your API consumption proactively. Set alerts at 80% of your limit so you can optimize before you hit errors.
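One way to sketch that 80% alert: compute usage from the parsed X-RateLimit-* values after each response and warn past a threshold (the function names here are illustrative):

```javascript
// Usage monitor (illustrative sketch): feed it the limit and remaining
// values from each response's headers; it warns past the threshold.
function createUsageMonitor(threshold = 0.8, warn = console.warn) {
  return function check(limit, remaining) {
    const used = (limit - remaining) / limit;
    if (used >= threshold) {
      warn(`Rate limit ${Math.round(used * 100)}% consumed (${remaining}/${limit} left)`);
    }
    return used;
  };
}

// Usage: warn at 80% of the limit
const checkUsage = createUsageMonitor(0.8);
```

In a real application you'd route the warning to your metrics or alerting system instead of console.warn.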

  • GitHub: 60/hr (unauth) or 5,000/hr (auth) free; 15,000/hr paid; 1-hour window
  • OpenAI: varies by model on the free tier and by tier when paid; 1-minute window
  • Stripe: 100/sec (live), 25/sec (test); custom paid limits; per-second window
  • Twitter/X: 1,500 tweets/month free; 10,000/month paid; monthly window
  • Google Maps: 28,500/day free; pay-per-use beyond that; daily window

When You're the API Provider

If you're building an API, here's how to implement rate limiting well:

  1. Return clear headers — Always include X-RateLimit-* headers
  2. Use 429 status codes — Not 403 or 500
  3. Include Retry-After — Tell consumers exactly when to retry
  4. Offer tiered limits — Different plans should have different limits
  5. Document your limits — Don't make developers discover them by hitting them
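The first three rules can be sketched as a small per-client fixed-window check that returns the status code and the headers to attach to each response:

```javascript
// Provider-side sketch of rules 1-3: per-client fixed-window limiting
// with X-RateLimit-* headers, a 429 status, and Retry-After on rejection.
function createRateLimiter(limit, windowMs) {
  const clients = new Map();  // clientId -> { windowStart, count }

  return function check(clientId, now = Date.now()) {
    let entry = clients.get(clientId);
    if (!entry || now - entry.windowStart >= windowMs) {
      entry = { windowStart: now, count: 0 };
      clients.set(clientId, entry);
    }
    entry.count++;

    const allowed = entry.count <= limit;
    const headers = {
      'X-RateLimit-Limit': String(limit),
      'X-RateLimit-Remaining': String(Math.max(0, limit - entry.count)),
      'X-RateLimit-Reset': String(Math.ceil((entry.windowStart + windowMs) / 1000)),
    };
    if (!allowed) {
      // Rule 3: tell consumers exactly how long to wait
      headers['Retry-After'] = String(Math.ceil((entry.windowStart + windowMs - now) / 1000));
    }
    return { allowed, status: allowed ? 200 : 429, headers };
  };
}
```

An in-memory Map only works for a single server; behind a load balancer you'd back this with a shared store such as Redis so all instances see the same counts.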

Conclusion

Rate limiting is a fact of life when working with APIs. The best developers don't fight it — they design around it with caching, batching, queuing, and smart retry logic.

Browse our API directory to compare rate limits across hundreds of APIs and find the ones that fit your usage patterns.