How to Build Resilient API Integrations That Don't Break

APIScout Team
Tags: resilience, API integration, fault tolerance, best practices, reliability

Every API you depend on will go down. It will have bugs. It will change its response format. It will rate-limit you at the worst possible time. The question isn't whether your API integrations will face problems — it's whether your application survives them gracefully.

The Failure Modes

What Goes Wrong with API Integrations

| Failure Mode | Frequency | Impact |
| --- | --- | --- |
| Timeout | Daily | Slow responses cascade through your system |
| Rate limiting (429) | Daily-weekly | Requests fail until rate resets |
| Server error (5xx) | Weekly | Temporary failures, usually recoverable |
| DNS resolution failure | Monthly | Complete inability to connect |
| Certificate expiry | Rare but devastating | HTTPS connections fail |
| Breaking API change | Quarterly | Integration stops working |
| Response format change | Quarterly | Parsing errors, data corruption |
| Deprecation | Annually | Endpoints removed, features dropped |
| Provider shutdown | Rare | Complete integration loss |

Pattern 1: Timeouts on Everything

The most common cause of cascading failure: no timeouts.

// ❌ No timeout — request hangs forever if API is slow
const data = await fetch('https://api.example.com/data');

// ✅ Always set timeouts
const data = await fetch('https://api.example.com/data', {
  signal: AbortSignal.timeout(5000), // 5 second timeout
});

// ✅ Even better — different timeouts for different operations
const TIMEOUTS = {
  read: 5000,      // 5s for reads
  write: 10000,    // 10s for writes
  upload: 60000,   // 60s for file uploads
  webhook: 3000,   // 3s for webhook delivery
};

async function apiCall(path: string, type: keyof typeof TIMEOUTS) {
  return fetch(`https://api.example.com${path}`, {
    signal: AbortSignal.timeout(TIMEOUTS[type]),
  });
}

Rule of thumb: Set the timeout to roughly 2-5x the expected response time. If the API normally responds in 200ms, time out at 500ms-1s.
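`AbortSignal.timeout` only helps when the client accepts a signal. For SDK calls or other promises that do not, a generic wrapper can enforce the same deadline. This is a minimal sketch; `withTimeout` and `TimeoutError` are hypothetical names, not part of any library.

```typescript
// Hypothetical error type so callers can tell a timeout from other failures.
class TimeoutError extends Error {
  constructor(ms: number) {
    super(`Operation timed out after ${ms}ms`);
    this.name = 'TimeoutError';
  }
}

// Hypothetical helper: rejects if `promise` does not settle within `ms`.
function withTimeout<T>(promise: Promise<T>, ms: number): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const deadline = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new TimeoutError(ms)), ms);
  });
  // Whichever settles first wins; clear the timer so it cannot keep the process alive.
  return Promise.race([promise, deadline]).finally(() => clearTimeout(timer));
}
```

Unlike an aborted `fetch`, the wrapped operation keeps running in the background; only the caller stops waiting. Prefer the client's native timeout or cancellation option when one exists.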

Pattern 2: Circuit Breaker

Stop calling a broken API. Let it recover instead of overwhelming it with retries.

class CircuitBreaker {
  private failures = 0;
  private lastFailure = 0;
  private state: 'closed' | 'open' | 'half-open' = 'closed';

  constructor(
    private failureThreshold: number = 5,
    private resetTimeMs: number = 30000, // 30 seconds
  ) {}

  async execute<T>(fn: () => Promise<T>): Promise<T> {
    if (this.state === 'open') {
      // Check if enough time has passed to try again
      if (Date.now() - this.lastFailure > this.resetTimeMs) {
        this.state = 'half-open';
      } else {
        throw new CircuitOpenError('Circuit breaker is open');
      }
    }

    try {
      const result = await fn();
      this.onSuccess();
      return result;
    } catch (error) {
      this.onFailure();
      throw error;
    }
  }

  private onSuccess() {
    this.failures = 0;
    this.state = 'closed';
  }

  private onFailure() {
    this.failures++;
    this.lastFailure = Date.now();

    if (this.failures >= this.failureThreshold) {
      this.state = 'open';
    }
  }

  getState() {
    return {
      state: this.state,
      failures: this.failures,
    };
  }
}

class CircuitOpenError extends Error {
  constructor(message: string) {
    super(message);
    this.name = 'CircuitOpenError';
  }
}

// Usage
const paymentCircuit = new CircuitBreaker(5, 30000);

async function processPayment(amount: number) {
  try {
    return await paymentCircuit.execute(() =>
      stripe.charges.create({ amount, currency: 'usd' })
    );
  } catch (error) {
    if (error instanceof CircuitOpenError) {
      // Payment provider is down — queue for later
      await queueForRetry({ amount, type: 'payment' });
      return { status: 'queued', message: 'Payment will be processed shortly' };
    }
    throw error;
  }
}

Pattern 3: Graceful Degradation

When an API is down, serve reduced functionality instead of breaking entirely.

// Example: Product page with reviews from external API

async function getProductPage(productId: string) {
  // Core data — from your database (must succeed)
  const product = await db.products.findById(productId);

  // Enhanced data — from external APIs (can fail gracefully)
  const [reviews, recommendations, inventory] = await Promise.allSettled([
    fetchReviews(productId),        // Third-party reviews API
    fetchRecommendations(productId), // ML recommendation API
    fetchInventory(productId),       // Warehouse API
  ]);

  return {
    product,
    reviews: reviews.status === 'fulfilled'
      ? reviews.value
      : { items: [], message: 'Reviews temporarily unavailable' },
    recommendations: recommendations.status === 'fulfilled'
      ? recommendations.value
      : [],
    inventory: inventory.status === 'fulfilled'
      ? inventory.value
      : { available: true, message: 'Check store for availability' },
  };
}

Degradation Levels

| Level | What Works | What's Degraded | User Experience |
| --- | --- | --- | --- |
| Full | Everything | Nothing | Normal |
| Partial | Core features | Enhancements (reviews, recommendations) | Minor loss |
| Minimal | Read operations | Write operations queued | Can browse, can't act |
| Cached | Stale data served | No fresh data | "Data as of X minutes ago" |
| Maintenance | Nothing | Everything | Maintenance page |
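One way to make these levels actionable is to derive the current level from dependency health in a single place, so every page applies the same rules. The sketch below is one possible mapping, not a standard; the `DependencyHealth` fields and the `degradationLevel` function are hypothetical names, and how each flag is computed is left to the application.

```typescript
type DegradationLevel = 'full' | 'partial' | 'minimal' | 'cached' | 'maintenance';

// Hypothetical health snapshot for the dependencies behind one feature.
interface DependencyHealth {
  coreReadsUp: boolean;    // primary database / core read path
  writesUp: boolean;       // write path (payments, orders, ...)
  enhancementsUp: boolean; // reviews, recommendations, ...
  cacheAvailable: boolean; // stale data exists to fall back on
}

function degradationLevel(h: DependencyHealth): DegradationLevel {
  // Without the core read path, stale cache is the only thing left to serve.
  if (!h.coreReadsUp) return h.cacheAvailable ? 'cached' : 'maintenance';
  if (!h.writesUp) return 'minimal';       // can browse, writes are queued
  if (!h.enhancementsUp) return 'partial'; // core works, extras are hidden
  return 'full';
}
```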

Pattern 4: Caching and Stale Data

Serve cached data when the API is unavailable:

class CachedAPIClient {
  constructor(
    private cache: Map<string, { data: any; timestamp: number }> = new Map(),
    private maxAge: number = 300000, // 5 minutes
    private staleMaxAge: number = 3600000, // 1 hour (serve stale if API is down)
  ) {}

  async fetch<T>(url: string, options?: RequestInit): Promise<T & { _cached?: boolean; _stale?: boolean }> {
    const cached = this.cache.get(url);

    // Fresh cache — serve immediately
    if (cached && Date.now() - cached.timestamp < this.maxAge) {
      return { ...cached.data, _cached: true };
    }

    // Try fresh fetch
    try {
      const response = await fetch(url, {
        ...options,
        signal: AbortSignal.timeout(5000),
      });

      if (!response.ok) throw new Error(`HTTP ${response.status}`);

      const data = await response.json();
      this.cache.set(url, { data, timestamp: Date.now() });
      return data;
    } catch (error) {
      // Fetch failed — serve stale cache if available
      if (cached && Date.now() - cached.timestamp < this.staleMaxAge) {
        console.warn(`Serving stale cache for ${url} (age: ${Date.now() - cached.timestamp}ms)`);
        return { ...cached.data, _cached: true, _stale: true };
      }

      throw error; // No cache available — propagate error
    }
  }
}

Pattern 5: Idempotent Retry with Deduplication

Safe to retry without duplicate side effects:

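const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Marks errors that must not be retried (4xx other than 429)
class NonRetryableError extends Error {}

async function createOrderWithRetry(orderData: OrderInput): Promise<Order> {
  // Generate idempotency key BEFORE first attempt.
  // A random UUID avoids collisions between orders created in the same millisecond.
  const idempotencyKey = `order_${orderData.userId}_${crypto.randomUUID()}`;
  const maxAttempts = 3;

  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      const response = await fetch('https://api.payments.com/v1/orders', {
        method: 'POST',
        headers: {
          'Content-Type': 'application/json',
          'Idempotency-Key': idempotencyKey, // Same key for all retries
        },
        body: JSON.stringify(orderData),
        signal: AbortSignal.timeout(10000),
      });

      if (response.ok) return response.json();

      if (response.status !== 429 && response.status < 500) {
        // 4xx (except 429): don't retry, it's a client error
        throw new NonRetryableError(`HTTP ${response.status}: ${await response.text()}`);
      }

      // 429 or 5xx: retryable, handled by the backoff in the catch block
      throw new Error(`HTTP ${response.status}`);
    } catch (error) {
      // Client errors and the final attempt propagate immediately
      if (error instanceof NonRetryableError || attempt === maxAttempts - 1) throw error;
      // Same idempotency key on every retry means no duplicate charges
      await sleep(Math.pow(2, attempt) * 1000);
    }
  }

  // Unreachable: the loop either returns or throws on the last attempt
  throw new Error('Max retries exceeded');
}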
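The retry loop above backs off exponentially with fixed delays, so many clients that fail at the same moment also retry at the same moment. Adding random jitter spreads those retries out, and a `Retry-After` header on a 429 response, when the provider sends one, is a better guide than any guess. The sketch below uses the variant often called "full jitter"; the helper names, the 1s base, and the 30s cap are assumptions for illustration.

```typescript
// Full jitter: pick a random delay in [0, min(cap, base * 2^attempt)).
function backoffWithJitter(
  attempt: number,
  baseMs = 1000,
  capMs = 30000,
  random: () => number = Math.random, // injectable so tests can be deterministic
): number {
  const ceiling = Math.min(capMs, baseMs * 2 ** attempt);
  return Math.floor(random() * ceiling);
}

// Retry-After is either a number of seconds or an HTTP date.
// Returns the delay in ms, or null if the header is absent or unparseable.
function parseRetryAfter(header: string | null, now = Date.now()): number | null {
  if (!header) return null;
  const seconds = Number(header);
  if (Number.isFinite(seconds)) return Math.max(0, seconds * 1000);
  const date = Date.parse(header);
  return Number.isNaN(date) ? null : Math.max(0, date - now);
}

// Prefer the server's hint; fall back to jittered exponential backoff.
function retryDelayMs(attempt: number, retryAfter: string | null): number {
  return parseRetryAfter(retryAfter) ?? backoffWithJitter(attempt);
}
```

In a retry loop this would replace the fixed delay, for example `await sleep(retryDelayMs(attempt, response.headers.get('Retry-After')))`.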

Pattern 6: Health Check Monitoring

Detect issues before they hit users:

class APIHealthChecker {
  private healthStatus: Map<string, {
    healthy: boolean;
    lastCheck: number;
    latency: number;
    consecutiveFailures: number;
  }> = new Map();

  async check(name: string, healthUrl: string): Promise<boolean> {
    const start = Date.now();

    try {
      const response = await fetch(healthUrl, {
        signal: AbortSignal.timeout(3000),
      });

      const healthy = response.ok;
      const latency = Date.now() - start;

      this.healthStatus.set(name, {
        healthy,
        lastCheck: Date.now(),
        latency,
        consecutiveFailures: healthy ? 0 : (this.healthStatus.get(name)?.consecutiveFailures ?? 0) + 1,
      });

      return healthy;
    } catch {
      const current = this.healthStatus.get(name);
      this.healthStatus.set(name, {
        healthy: false,
        lastCheck: Date.now(),
        latency: Date.now() - start,
        consecutiveFailures: (current?.consecutiveFailures ?? 0) + 1,
      });
      return false;
    }
  }

  getStatus() {
    return Object.fromEntries(this.healthStatus);
  }
}

// Usage: check every 30 seconds
// (the health URLs below are illustrative; use each provider's documented status endpoint)
const checker = new APIHealthChecker();

setInterval(async () => {
  await Promise.all([
    checker.check('stripe', 'https://api.stripe.com/v1'),
    checker.check('resend', 'https://api.resend.com/health'),
    checker.check('auth', 'https://api.clerk.com/v1/health'),
  ]);

  const status = checker.getStatus();
  // Alert if any API has 3+ consecutive failures
  for (const [name, state] of Object.entries(status)) {
    if (state.consecutiveFailures >= 3) {
      await alertOps(`${name} API is unhealthy: ${state.consecutiveFailures} consecutive failures`);
    }
  }
}, 30000);

Pattern 7: Response Validation

Don't trust API responses — validate them:

import { z } from 'zod';

// Define expected response shape
const UserResponseSchema = z.object({
  id: z.string(),
  email: z.string().email(),
  name: z.string(),
  created_at: z.string().datetime(),
});

type UserResponse = z.infer<typeof UserResponseSchema>;

async function getUser(userId: string): Promise<UserResponse> {
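  const response = await fetch(`/api/users/${userId}`, {
    signal: AbortSignal.timeout(5000), // same timeout rule as every other external call
  });
  if (!response.ok) throw new Error(`HTTP ${response.status}`);
  const data = await response.json();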

  // Validate response matches expected schema
  const result = UserResponseSchema.safeParse(data);

  if (!result.success) {
    // API response format changed — log and alert
    console.error('API response validation failed:', {
      endpoint: `/api/users/${userId}`,
      errors: result.error.issues,
      received: data,
    });

    // Option 1: Throw (fail fast)
    throw new Error('API response format changed');

    // Option 2: Use with defaults (graceful)
    // return { ...defaults, ...data };
  }

  return result.data;
}

The Resilience Checklist

| Pattern | Priority | Impact |
| --- | --- | --- |
| Timeouts on all API calls | P0 | Prevents cascading failures |
| Exponential backoff with jitter | P0 | Handles rate limits and transient errors |
| Input/output validation | P0 | Catches API changes early |
| Circuit breaker | P1 | Stops hammering failing APIs |
| Graceful degradation | P1 | Users get partial functionality vs errors |
| Response caching (stale-while-error) | P1 | Serves data during outages |
| Idempotency keys on writes | P1 | Safe retries without duplicates |
| Health check monitoring | P2 | Early detection of issues |
| Multi-provider fallback | P2 | Survive provider outages |
| Response schema validation | P2 | Detect breaking changes |
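Multi-provider fallback is the one checklist item without an example so far. A minimal sketch, assuming each provider can be wrapped behind the same call signature; `Provider` and `withFallback` are hypothetical names for illustration, not an existing library API.

```typescript
// Hypothetical provider shape: a name plus a function that performs the call.
interface Provider<T> {
  name: string;
  call: () => Promise<T>;
}

// Tries providers in order and returns the first success,
// along with the name of the provider that served it.
async function withFallback<T>(
  providers: Provider<T>[],
): Promise<{ provider: string; value: T }> {
  const errors: string[] = [];
  for (const p of providers) {
    try {
      return { provider: p.name, value: await p.call() };
    } catch (error) {
      // Record why this provider failed, then move on to the next one.
      errors.push(`${p.name}: ${error instanceof Error ? error.message : String(error)}`);
    }
  }
  throw new Error(`All providers failed (${errors.join('; ')})`);
}
```

Each `call` should carry its own timeout so a hanging primary does not delay the backup. For writes, failing over is only safe if a request that may have partially succeeded on the first provider cannot be duplicated on the second.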

Common Mistakes

| Mistake | Impact | Fix |
| --- | --- | --- |
| No timeouts | One slow API freezes entire app | Set timeouts on every external call |
| Retry without backoff | Makes outages worse | Exponential backoff + jitter |
| Same code path for all errors | Retrying non-retryable errors | Handle 4xx vs 5xx vs network errors differently |
| No fallback for external APIs | Single point of failure | Cache, degrade, or use backup provider |
| Trusting API response format | Breaks when API changes | Validate responses with Zod/schemas |
| No monitoring of API health | Issues discovered by users | Health checks + alerting |
| Tight coupling to one provider | Locked in when problems arise | Abstraction layer for critical APIs |
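The "same code path for all errors" mistake is easiest to avoid by classifying failures in one place and letting callers branch on the result. The sketch below follows the rule used in this article (retry 429, 5xx, and network failures; never retry other 4xx); `FailureKind`, `classifyFailure`, and `shouldRetry` are hypothetical names.

```typescript
type FailureKind = 'network' | 'rate_limited' | 'server_error' | 'client_error';

// `status` is the HTTP status of a failed response; pass undefined when the
// request itself failed and no response arrived (DNS, TLS, timeout, reset).
function classifyFailure(status: number | undefined): FailureKind {
  if (status === undefined) return 'network';
  if (status === 429) return 'rate_limited';
  if (status >= 500) return 'server_error';
  return 'client_error';
}

// Client errors will fail the same way again, so retrying them only adds load.
function shouldRetry(kind: FailureKind): boolean {
  return kind !== 'client_error';
}
```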

Find the most reliable APIs on APIScout — uptime tracking, reliability scores, and resilience pattern guides for every provider.