
How to Handle Webhook Failures and Retries

APIScout Team
Tags: webhooks, reliability, event-driven, best practices, api integration


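Webhooks are push-based: the provider sends an HTTP request to your endpoint and uses your response to decide whether delivery succeeded. If your handler crashes, times out, or returns an error, the provider retries, sometimes for days. Because delivery is at-least-once, handling this correctly means making every event take effect exactly once, even when the same event arrives more than once or things go wrong.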

How Webhook Retries Work

Provider Retry Policies

| Provider | Max Retries | Retry Schedule | Timeout |
|---|---|---|---|
| Stripe | ~15 over 3 days | Exponential backoff | 20 seconds |
| GitHub | None (manual redelivery only) | No automatic retries | 10 seconds |
| Twilio | Up to 14 | Exponential | 15 seconds |
| Shopify | 19 over 48 hours | Exponential | 5 seconds |
| PayPal | 15 over 3 days | Exponential | 30 seconds |
| Clerk | Multiple over 3 days | Exponential | 30 seconds |
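Most providers above retry on an exponential schedule, though each publishes its own intervals. As an illustration only (the base delay, cap, and function names are assumptions, not any provider's actual policy), the delays grow like this, with optional "full jitter" to spread retries out so many failed deliveries do not all arrive at the same instant:

```typescript
// Illustrative only: real providers publish their own schedules.
// Delay before retry n is baseMs * 2^n, capped at capMs.
function backoffDelays(attempts: number, baseMs: number, capMs: number): number[] {
  const delays: number[] = [];
  for (let n = 0; n < attempts; n++) {
    delays.push(Math.min(capMs, baseMs * 2 ** n));
  }
  return delays;
}

// Full jitter: pick a random delay between 0 and the exponential value.
function withFullJitter(delayMs: number, random: () => number = Math.random): number {
  return Math.floor(random() * delayMs);
}

const schedule = backoffDelays(6, 60_000, 3_600_000);
console.log(schedule.map(ms => `${ms / 60_000} min`).join(', '));
// 1 min, 2 min, 4 min, 8 min, 16 min, 32 min
```

The same shape is useful on the receiving side, for example when retrying your own processing internally before giving up.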

What Triggers a Retry

| Response | Provider Action |
|---|---|
| 2xx (200-299) | ✅ Success, no retry |
| 3xx (redirect) | ❌ Treated as failure, retried |
| 4xx (client error) | ⚠️ Varies: some providers stop, others retry |
| 5xx (server error) | ❌ Retried with backoff |
| Timeout | ❌ Retried with backoff |
| Connection refused | ❌ Retried with backoff |
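Because the status code is the only signal the provider gets back, it helps to decide deliberately which code each handler outcome produces. The sketch below is one possible mapping under stated assumptions: the outcome names are hypothetical, 4xx is reserved for requests that can never succeed (a bad signature) since providers disagree on whether they retry 4xx, and transient failures get a 5xx so the provider redelivers later.

```typescript
// Hypothetical handler outcomes and the status code each one returns.
type Outcome = 'stored' | 'duplicate' | 'bad_signature' | 'transient_error';

const STATUS_FOR: Record<Outcome, number> = {
  stored: 200,          // persisted; processing continues in the background
  duplicate: 200,       // already seen: acknowledge so the provider stops retrying
  bad_signature: 401,   // permanent: a forged request will never verify
  transient_error: 503, // e.g. database down: ask the provider to retry later
};

// Per the table above, only a 2xx counts as a successful delivery.
function isAcknowledged(status: number): boolean {
  return status >= 200 && status < 300;
}
```

Acknowledging duplicates with a 200 matters most: returning an error for an event you have already handled just keeps it in the provider's retry loop.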

Pattern 1: Fast Acknowledgment

Return 200 as soon as the event is safely stored, then process it asynchronously:

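// ❌ Bad: Process synchronously (can time out)
app.post('/webhooks/stripe', async (req, res) => {
  const event = verifySignature(req);
  await updateDatabase(event);        // 500ms
  await sendNotification(event);      // 300ms
  await updateAnalytics(event);       // 200ms
  res.status(200).send('OK');         // Total: 1s+, and one slow dependency can exceed the provider's timeout
});

// ✅ Good: Acknowledge fast, process async
app.post('/webhooks/stripe', async (req, res) => {
  // 1. Verify signature (fast: <10ms). Assumes verifySignature throws on
  //    a bad signature, as Stripe's constructEvent does.
  let event;
  try {
    event = verifySignature(req);
  } catch {
    return res.status(401).send('Invalid signature');
  }

  // 2. Store raw event (fast: <50ms). A retried delivery reuses the same ID,
  //    so skip the insert instead of failing with a duplicate-key error.
  const existing = await db.webhookEvents.findById(event.id);
  if (!existing) {
    await db.webhookEvents.create({
      id: event.id,
      type: event.type,
      payload: event,
      status: 'pending',
      receivedAt: new Date(),
    });
  }

  // 3. Acknowledge immediately
  res.status(200).send('OK');

  // 4. Process asynchronously; the stored 'pending' row is the record if this fails
  processWebhookAsync(event).catch(error => {
    console.error('Webhook processing failed:', error);
  });
});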
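In the fast-acknowledgment handler, the provider considers the event delivered once it receives the 200. If the process crashes before `processWebhookAsync` finishes, nothing will resend the event, and its stored row stays `pending`. A periodic sweep can pick such rows up. This is a minimal sketch over an in-memory array, assuming the row fields used above (`id`, `status`, `receivedAt`); `findStalePending` and the five-minute threshold are illustrative choices.

```typescript
// Hypothetical shape mirroring the row stored by the webhook handler.
interface StoredEvent {
  id: string;
  status: 'pending' | 'processed' | 'failed';
  receivedAt: Date;
}

// Return events that were acknowledged but never finished processing,
// oldest first, so a background job can re-run them.
function findStalePending(
  events: StoredEvent[],
  now: Date,
  maxAgeMs: number = 5 * 60 * 1000,
): StoredEvent[] {
  return events
    .filter(e => e.status === 'pending' && now.getTime() - e.receivedAt.getTime() > maxAgeMs)
    .sort((a, b) => a.receivedAt.getTime() - b.receivedAt.getTime());
}
```

In a real deployment the filter would be a database query run on a schedule, and re-running the returned events is only safe because processing is idempotent (Pattern 2).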

Pattern 2: Idempotent Processing

Providers deliver webhooks at least once, so the same event can arrive multiple times. Make processing idempotent so each event takes effect exactly once:

async function processWebhookEvent(event: WebhookEvent): Promise<void> {
  // Check if already processed
  const existing = await db.webhookEvents.findById(event.id);

  if (existing?.status === 'processed') {
    console.log(`Event ${event.id} already processed, skipping`);
    return;
  }

  // Use a transaction to prevent race conditions
  await db.transaction(async (tx) => {
    // Double-check inside transaction (another worker might have started)
    const locked = await tx.webhookEvents.findByIdForUpdate(event.id);
    if (locked?.status === 'processed') return;

    // Process the event
    await handleEvent(event, tx);

    // Mark as processed
    await tx.webhookEvents.update(event.id, {
      status: 'processed',
      processedAt: new Date(),
    });
  });
}

async function handleEvent(event: WebhookEvent, tx: Transaction) {
  switch (event.type) {
    case 'payment_intent.succeeded':
      // Use idempotency key for downstream operations too
      await fulfillOrder(event.data.object.id, tx);
      break;
    case 'customer.subscription.deleted':
      await deactivateSubscription(event.data.object.id, tx);
      break;
    // ... other event types
  }
}
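The core of the pattern is keying work on the provider's event ID. Stripped of the database, the idea looks like the in-memory sketch below; `IdempotentProcessor` and `DeliveredEvent` are hypothetical names. A real deployment needs the shared, transactional store shown above, since an in-memory set is lost on restart and is not shared across instances.

```typescript
// Hypothetical minimal event shape; real payloads carry much more.
interface DeliveredEvent {
  id: string;
  type: string;
}

class IdempotentProcessor {
  // In-memory only: lost on restart and not shared between instances.
  private readonly processed = new Set<string>();

  // Runs `handle` once per event ID. Returns true if it ran now,
  // false if the event was a duplicate delivery.
  process(event: DeliveredEvent, handle: (e: DeliveredEvent) => void): boolean {
    if (this.processed.has(event.id)) return false;
    handle(event);
    // Record the ID only after the handler succeeds, so an exception
    // leaves the event eligible for the next delivery attempt.
    this.processed.add(event.id);
    return true;
  }
}

const processor = new IdempotentProcessor();
let fulfilled = 0;
const delivery: DeliveredEvent = { id: 'evt_1', type: 'payment_intent.succeeded' };
processor.process(delivery, () => { fulfilled++; }); // true: first delivery
processor.process(delivery, () => { fulfilled++; }); // false: provider retry, skipped
console.log(fulfilled); // 1
```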

Pattern 3: Signature Verification

Always verify webhook signatures to prevent forgery:

import { createHmac, timingSafeEqual } from 'crypto';

// Constant-time comparison. timingSafeEqual throws if the buffers differ
// in length, so a malformed signature must be rejected first.
function safeCompare(a: string, b: string): boolean {
  const bufA = Buffer.from(a);
  const bufB = Buffer.from(b);
  return bufA.length === bufB.length && timingSafeEqual(bufA, bufB);
}

// Stripe signature verification
// Header format: "t=<unix timestamp>,v1=<hex signature>[,v1=...]"
function verifyStripeSignature(
  payload: string, // Raw body string, NOT parsed JSON
  signature: string,
  secret: string
): boolean {
  const elements = signature.split(',');
  const timestamp = elements.find(e => e.startsWith('t='))?.slice(2);
  // Stripe may send several v1 signatures (e.g. while a secret is being rolled)
  const v1Signatures = elements.filter(e => e.startsWith('v1=')).map(e => e.slice(3));

  if (!timestamp || v1Signatures.length === 0) return false;

  // Prevent replay attacks (reject if older than 5 minutes)
  const now = Math.floor(Date.now() / 1000);
  const ts = Number(timestamp);
  if (!Number.isFinite(ts) || now - ts > 300) return false;

  const signedPayload = `${timestamp}.${payload}`;
  const expected = createHmac('sha256', secret)
    .update(signedPayload)
    .digest('hex');

  return v1Signatures.some(sig => safeCompare(sig, expected));
}

// Generic HMAC verification for providers that send a bare hex digest of the body
function verifyHmacSignature(
  payload: string,
  signature: string,
  secret: string,
  algorithm: string = 'sha256'
): boolean {
  const expected = createHmac(algorithm, secret)
    .update(payload)
    .digest('hex');

  return safeCompare(signature, expected);
}
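A generic hex-digest helper does not cover every provider. GitHub's `X-Hub-Signature-256` header prefixes the hex digest with `sha256=`, and Shopify's `X-Shopify-Hmac-Sha256` header is base64-encoded. A sketch that normalizes both formats to raw bytes before a length-checked, constant-time comparison (the function names are illustrative, not from any library):

```typescript
import { createHmac, timingSafeEqual } from 'node:crypto';

// Illustrative helpers; the names are not from any library.

// Constant-time comparison of raw digest bytes. timingSafeEqual throws
// if the lengths differ, so check that first.
function safeEqual(a: Buffer, b: Buffer): boolean {
  return a.length === b.length && timingSafeEqual(a, b);
}

function hmacSha256(secret: string, rawBody: string): Buffer {
  return createHmac('sha256', secret).update(rawBody).digest();
}

// GitHub style: "sha256=" followed by a hex digest.
function verifyGitHubStyle(rawBody: string, header: string, secret: string): boolean {
  const prefix = 'sha256=';
  if (!header.startsWith(prefix)) return false;
  const received = Buffer.from(header.slice(prefix.length), 'hex');
  return safeEqual(received, hmacSha256(secret, rawBody));
}

// Shopify style: base64-encoded digest.
function verifyShopifyStyle(rawBody: string, header: string, secret: string): boolean {
  const received = Buffer.from(header, 'base64');
  return safeEqual(received, hmacSha256(secret, rawBody));
}
```

Comparing decoded bytes rather than strings also means upper- and lower-case hex digests compare equal.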

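Critical: Read the raw request body as a string, NOT parsed JSON. The signature is computed over the exact bytes the provider sent, and parsing then re-stringifying can change them (whitespace, number formatting such as 1.0 becoming 1, Unicode escapes), which breaks signature verification.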

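// Next.js App Router: get raw body
// Assumes the signing secret is in an environment variable (name is illustrative)
const WEBHOOK_SECRET = process.env.STRIPE_WEBHOOK_SECRET!;

export async function POST(request: Request) {
  const rawBody = await request.text();
  const signature = request.headers.get('stripe-signature');

  // A missing header is treated the same as an invalid signature
  if (!signature || !verifyStripeSignature(rawBody, signature, WEBHOOK_SECRET)) {
    return new Response('Invalid signature', { status: 401 });
  }

  const event = JSON.parse(rawBody);
  // ... store and process event
  return new Response('OK', { status: 200 });
}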
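To see why the raw body matters, here is a small self-contained check: the same JSON document, re-serialized after parsing, loses its original whitespace, so its HMAC no longer matches. The secret and payload are made-up test values.

```typescript
import { createHmac } from 'node:crypto';

const secret = 'whsec_test_example'; // made-up test secret
// Raw bytes as a provider might send them (note the spaces and newlines).
const rawBody = '{\n  "id": "evt_1",\n  "amount": 2000\n}';

const sign = (body: string) => createHmac('sha256', secret).update(body).digest('hex');

// Parsing and re-stringifying produces compact JSON with different bytes.
const reserialized = JSON.stringify(JSON.parse(rawBody));

console.log(reserialized);                         // {"id":"evt_1","amount":2000}
console.log(sign(rawBody) === sign(reserialized)); // false
```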

Pattern 4: Dead Letter Queue

When your own processing keeps failing, the provider has already received its 200 and will not resend the event, so park it somewhere instead of losing it:

abstract class WebhookProcessor {
  // App-specific: apply the event's side effects, and record success
  abstract handleEvent(event: WebhookEvent): Promise<void>;
  protected abstract markProcessed(eventId: string): Promise<void>;

  async process(event: WebhookEvent): Promise<void> {
    const MAX_INTERNAL_RETRIES = 3;

    for (let attempt = 0; attempt < MAX_INTERNAL_RETRIES; attempt++) {
      try {
        await this.handleEvent(event);
        await this.markProcessed(event.id);
        return;
      } catch (error) {
        console.error(`Attempt ${attempt + 1} failed for event ${event.id}:`, error);

        if (attempt < MAX_INTERNAL_RETRIES - 1) {
          // Exponential backoff between attempts: 1s, then 2s
          await new Promise(r => setTimeout(r, Math.pow(2, attempt) * 1000));
        }
      }
    }

    // All retries failed: move to dead letter queue
    await this.moveToDeadLetter(event);
  }

  private async moveToDeadLetter(event: WebhookEvent) {
    await db.deadLetterQueue.create({
      eventId: event.id,
      eventType: event.type,
      payload: event,
      status: 'failed', // retryDeadLetterEvents() looks entries up by this status
      failedAt: new Date(),
      retryCount: 0,
    });

    // Alert team
    await alertSlack(`⚠️ Webhook event failed permanently: ${event.type} (${event.id})`);
  }
}

// Admin tool: retry dead letter events
async function retryDeadLetterEvents(processor: WebhookProcessor) {
  const failed = await db.deadLetterQueue.findAll({ status: 'failed' });

  for (const item of failed) {
    try {
      await processor.handleEvent(item.payload);
      await db.deadLetterQueue.update(item.id, { status: 'resolved' });
      console.log(`Resolved dead letter event: ${item.eventId}`);
    } catch (error) {
      await db.deadLetterQueue.update(item.id, {
        retryCount: item.retryCount + 1,
        lastError: String(error),
      });
    }
  }
}
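The dead-letter flow is a small state machine: an entry starts as `failed`, moves to `resolved` when a replay succeeds, and otherwise records another attempt and the last error. The in-memory sketch below models only that lifecycle, with a synchronous handler for simplicity; `DeadLetterQueue` is a hypothetical stand-in for the `deadLetterQueue` table, and keying entries by event ID (so one event is never queued twice) is a design choice of this sketch.

```typescript
type DlqStatus = 'failed' | 'resolved';

interface DlqEntry<T> {
  eventId: string;
  payload: T;
  status: DlqStatus;
  retryCount: number;
  lastError?: string;
}

// Hypothetical in-memory stand-in for the deadLetterQueue table.
class DeadLetterQueue<T> {
  private entries = new Map<string, DlqEntry<T>>();

  add(eventId: string, payload: T): void {
    // Keyed by event ID so the same event is never queued twice.
    if (!this.entries.has(eventId)) {
      this.entries.set(eventId, { eventId, payload, status: 'failed', retryCount: 0 });
    }
  }

  // Replay every failed entry once; returns how many were resolved.
  replay(handle: (payload: T) => void): number {
    let resolved = 0;
    this.entries.forEach(entry => {
      if (entry.status !== 'failed') return; // skip already-resolved entries
      try {
        handle(entry.payload);
        entry.status = 'resolved';
        resolved++;
      } catch (error) {
        entry.retryCount++;
        entry.lastError = String(error);
      }
    });
    return resolved;
  }

  get(eventId: string): DlqEntry<T> | undefined {
    return this.entries.get(eventId);
  }
}
```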

Pattern 5: Event Ordering

Providers do not guarantee delivery order, and retries make reordering more likely. Design handlers so that arrival order does not matter:

// Problem: "customer.subscription.updated" arrives before "customer.subscription.created"
// Solution: Use event timestamps and idempotent operations

async function handleSubscriptionEvent(event: WebhookEvent) {
  const subscription = event.data.object;
  const eventTime = new Date(event.created * 1000); // `created` is in Unix seconds

  // 1. Make sure the row exists (no-op if it already does)
  await db.subscriptions.upsert({
    where: { id: subscription.id },
    create: {
      id: subscription.id,
      status: subscription.status,
      customerId: subscription.customer,
      updatedAt: eventTime,
    },
    update: {},
  });

  // 2. Apply the change only if this event is newer than what we have.
  //    With a Prisma-style upsert, putting the timestamp filter in `where`
  //    sends a stale event to `create`, which collides with the existing ID.
  await db.subscriptions.updateMany({
    where: {
      id: subscription.id,
      updatedAt: { lt: eventTime },
    },
    data: {
      status: subscription.status,
      updatedAt: eventTime,
    },
  });
}
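The ordering rule is easiest to see without a database: keep, per object, the timestamp of the event that last changed it, and ignore anything older. A minimal in-memory sketch (`applyIfNewer` and the state shape are hypothetical). One caveat: Stripe's `created` field has one-second resolution, so two events from the same second tie; this sketch keeps whichever it saw first, and re-fetching the current object from the provider's API is one way to settle such ties.

```typescript
// Hypothetical in-memory store: subscription ID -> last applied state.
interface SubscriptionState {
  status: string;
  updatedAt: number; // event.created, Unix seconds
}

// Apply an event only if it is strictly newer than the stored state.
// Returns true if the state changed.
function applyIfNewer(
  store: Map<string, SubscriptionState>,
  id: string,
  status: string,
  created: number,
): boolean {
  const current = store.get(id);
  if (current && current.updatedAt >= created) return false; // stale or tie: ignore
  store.set(id, { status, updatedAt: created });
  return true;
}

const subs = new Map<string, SubscriptionState>();
applyIfNewer(subs, 'sub_1', 'active', 1_700_000_100);     // newer event arrives first
applyIfNewer(subs, 'sub_1', 'incomplete', 1_700_000_000); // older event, ignored
console.log(subs.get('sub_1')?.status); // active
```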

Pattern 6: Monitoring Webhook Health

class WebhookMonitor {
  async recordEvent(eventId: string, type: string, status: 'received' | 'processed' | 'failed') {
    await db.webhookMetrics.create({
      eventId,
      type,
      status,
      timestamp: new Date(),
    });
  }

  async getHealth(hours: number = 24) {
    const since = new Date(Date.now() - hours * 60 * 60 * 1000);
    const events = await db.webhookMetrics.findMany({
      where: { timestamp: { gte: since } },
    });

    const received = events.filter(e => e.status === 'received').length;
    const processed = events.filter(e => e.status === 'processed').length;
    const failed = events.filter(e => e.status === 'failed').length;

    // Guard against division by zero when no events arrived in the window
    const successRate = received > 0 ? processed / received : 1;
    const failureRate = received > 0 ? failed / received : 0;

    return {
      received,
      processed,
      failed,
      successRate,
      failureRate,
      alert: failureRate > 0.05 ? 'HIGH' : 'OK',
    };
  }
}
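A raw failure ratio is noisy at low volume: one failure out of three events already reads as 33%. A slightly more defensive check, sketched below, reports a hypothetical `'INSUFFICIENT_DATA'` state until a minimum number of events has arrived. The 5% threshold mirrors the monitor above; the 20-event minimum and the `classifyHealth` name are arbitrary illustrations.

```typescript
type HealthAlert = 'OK' | 'HIGH' | 'INSUFFICIENT_DATA';

// Hypothetical helper; thresholds are illustrative defaults.
function classifyHealth(
  received: number,
  failed: number,
  threshold = 0.05,
  minSample = 20,
): { failureRate: number; alert: HealthAlert } {
  // Guard against division by zero when no events arrived.
  const failureRate = received > 0 ? failed / received : 0;
  if (received < minSample) return { failureRate, alert: 'INSUFFICIENT_DATA' };
  return { failureRate, alert: failureRate > threshold ? 'HIGH' : 'OK' };
}
```

Whether an empty window should itself alert (a sudden silence can mean the endpoint is unreachable) is a separate judgment this sketch leaves out.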

Testing Webhooks

// Generate test webhook events locally
import Stripe from 'stripe';

const stripe = new Stripe(process.env.STRIPE_SECRET_KEY!);

// Stripe CLI for local testing
// stripe listen --forward-to localhost:3000/webhooks/stripe
// stripe trigger payment_intent.succeeded

// Programmatic test
test('handles payment succeeded webhook', async () => {
  const event = {
    id: 'evt_test_123',
    type: 'payment_intent.succeeded',
    created: Math.floor(Date.now() / 1000),
    data: {
      object: {
        id: 'pi_test_456',
        amount: 2000,
        status: 'succeeded',
        customer: 'cus_test_789',
      },
    },
  };

  const payload = JSON.stringify(event);
  const signature = stripe.webhooks.generateTestHeaderString({
    payload,
    secret: WEBHOOK_SECRET,
  });

  const response = await app.inject({
    method: 'POST',
    url: '/webhooks/stripe',
    headers: {
      'stripe-signature': signature,
      'content-type': 'application/json',
    },
    body: payload,
  });

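  expect(response.statusCode).toBe(200);

  // Processing runs after the 200 is sent (Pattern 1), so wait for it
  // to finish before asserting on side effects
  for (let i = 0; i < 20; i++) {
    const stored = await db.webhookEvents.findById('evt_test_123');
    if (stored?.status === 'processed') break;
    await new Promise(r => setTimeout(r, 50));
  }

  const order = await db.orders.findByPaymentIntent('pi_test_456');
  expect(order.status).toBe('paid');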
});
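If you use the hand-rolled `verifyStripeSignature` from Pattern 3 rather than the Stripe SDK, it can be unit-tested without any SDK or network by building the header yourself. The sketch assumes the `t=<unix seconds>,v1=<hex HMAC of "t.body">` format that function parses; `makeStripeStyleHeader` is a hypothetical helper, and `verify` is a compact copy of the Pattern 3 checks with `now` passed in so the replay window can be tested deterministically.

```typescript
import { createHmac, timingSafeEqual } from 'node:crypto';

// Hypothetical test helper: builds a header in the "t=<ts>,v1=<hex>" format.
function makeStripeStyleHeader(payload: string, secret: string, timestamp: number): string {
  const sig = createHmac('sha256', secret).update(`${timestamp}.${payload}`).digest('hex');
  return `t=${timestamp},v1=${sig}`;
}

// Compact version of the Pattern 3 checks, with `now` injected so the
// replay window can be tested without touching the clock.
function verify(
  payload: string,
  header: string,
  secret: string,
  now: number,
  toleranceSec = 300,
): boolean {
  const parts = header.split(',');
  const t = parts.find(p => p.startsWith('t='))?.slice(2);
  const v1 = parts.find(p => p.startsWith('v1='))?.slice(3);
  if (!t || !v1) return false;
  if (now - Number(t) > toleranceSec) return false; // too old: possible replay
  const expected = Buffer.from(
    createHmac('sha256', secret).update(`${t}.${payload}`).digest('hex'),
  );
  const actual = Buffer.from(v1);
  return actual.length === expected.length && timingSafeEqual(actual, expected);
}

const secret = 'whsec_test_example'; // made-up test secret
const body = JSON.stringify({ id: 'evt_test_123', type: 'payment_intent.succeeded' });
const now = 1_700_000_000;
const header = makeStripeStyleHeader(body, secret, now);

console.log(verify(body, header, secret, now));       // true
console.log(verify(body + ' ', header, secret, now)); // false: body was modified
console.log(verify(body, header, secret, now + 600)); // false: outside the 5-minute window
```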

Common Mistakes

| Mistake | Impact | Fix |
|---|---|---|
| Processing synchronously | Handler timeouts, missed events | Acknowledge fast, process async |
| No idempotency | Duplicate processing on retries | Check event ID before processing |
| Parsing body before signature check | Signature verification fails | Use raw body string for verification |
| No dead letter queue | Failed events lost forever | Store failed events for manual retry |
| Assuming event order | Race conditions, data inconsistency | Use timestamps, idempotent operations |
| No webhook monitoring | Don't know when things break | Track success/failure rates |

Find APIs with the best webhook support on APIScout — retry policies, signature verification docs, and event catalogs.
