How to Handle Webhook Failures and Retries
Webhook senders do not wait for your business logic: they send the request and judge delivery only by your HTTP response. If your handler crashes, times out, or returns an error, the provider retries, sometimes for days. Because retries make delivery at-least-once, handling this correctly means every event takes effect exactly once, even when things go wrong.
How Webhook Retries Work
Provider Retry Policies
| Provider | Max Retries | Retry Schedule | Timeout |
|---|---|---|---|
| Stripe | ~15 over 3 days | Exponential backoff | 20 seconds |
| GitHub | None (no automatic retries) | Manual redelivery from the UI or REST API | 10 seconds |
| Twilio | Up to 14 | Exponential | 15 seconds |
| Shopify | 19 over 48 hours | Exponential | 5 seconds |
| PayPal | 15 over 3 days | Exponential | 30 seconds |
| Clerk | Multiple over 3 days | Exponential | 30 seconds |
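To make "exponential backoff" in the table concrete, here is a minimal sketch of how a sender might space out retries. The function names and the base, factor, and cap values are illustrative assumptions, not any provider's actual settings.

```typescript
// Hypothetical helper: delay in seconds before retry number `attempt` (1-based).
// baseSeconds, factor, and capSeconds are made-up values for illustration.
function retryDelaySeconds(
  attempt: number,
  baseSeconds = 10,
  factor = 2,
  capSeconds = 3600
): number {
  // Each retry waits `factor` times longer than the previous one, up to the cap
  return Math.min(baseSeconds * Math.pow(factor, attempt - 1), capSeconds);
}

// Full schedule for a given number of retries
function retrySchedule(maxRetries: number): number[] {
  return Array.from({ length: maxRetries }, (_, i) => retryDelaySeconds(i + 1));
}

console.log(retrySchedule(5)); // [ 10, 20, 40, 80, 160 ]
```

The cap matters for long retry windows: without it, the 20th retry in this sketch would wait roughly two months.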
What Triggers a Retry
| Response | Provider Action |
|---|---|
| 2xx (200-299) | ✅ Success — no retry |
| 3xx (redirect) | ❌ Treated as failure, retries |
| 4xx (client error) | ⚠️ Varies — some providers stop, others retry |
| 5xx (server error) | ❌ Retry with backoff |
| Timeout | ❌ Retry with backoff |
| Connection refused | ❌ Retry with backoff |
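The table above can be sketched as a small classifier, which is handy when simulating provider behavior in tests. The type and function names are hypothetical, and the `provider_specific` result simply mirrors the "varies" row rather than any one provider's rule.

```typescript
// Hypothetical model of what the sender observed for one delivery attempt
type DeliveryOutcome =
  | { kind: 'response'; status: number }
  | { kind: 'timeout' }
  | { kind: 'connection_refused' };

type ProviderAction = 'done' | 'retry' | 'provider_specific';

function providerAction(outcome: DeliveryOutcome): ProviderAction {
  // Timeouts and refused connections are always retried
  if (outcome.kind !== 'response') return 'retry';

  const { status } = outcome;
  if (status >= 200 && status < 300) return 'done';
  // 4xx handling differs between providers, so the caller must decide
  if (status >= 400 && status < 500) return 'provider_specific';
  // 3xx, 5xx, and anything unexpected count as failures
  return 'retry';
}
```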
Pattern 1: Fast Acknowledgment
Return 200 immediately, process asynchronously:
// ❌ Bad: Process synchronously (can timeout)
app.post('/webhooks/stripe', async (req, res) => {
const event = verifySignature(req);
await updateDatabase(event); // 500ms
await sendNotification(event); // 300ms
await updateAnalytics(event); // 200ms
res.status(200).send('OK'); // Total: 1s+ (might timeout)
});
// ✅ Good: Acknowledge fast, process async
app.post('/webhooks/stripe', async (req, res) => {
// 1. Verify signature (fast — <10ms)
const event = verifySignature(req);
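// 2. Store raw event (fast, <50ms)
// A redelivered event reuses the same id, so this insert must tolerate duplicates
// (for example, upsert or ignore unique-key conflicts) instead of throwing a 500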
await db.webhookEvents.create({
id: event.id,
type: event.type,
payload: event,
status: 'pending',
receivedAt: new Date(),
});
// 3. Acknowledge immediately
res.status(200).send('OK');
// 4. Process asynchronously
processWebhookAsync(event).catch(error => {
console.error('Webhook processing failed:', error);
});
});
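One gap in acknowledging first: if the process crashes after sending the 200, the provider will not retry, and the event stays `pending`. A periodic sweep can pick such events up again. The sketch below shows only the selection logic over an in-memory list; the `StoredEvent` shape, the `findStalePending` name, and the 60-second threshold are assumptions for illustration.

```typescript
// Hypothetical shape of a stored webhook row, mirroring the fields saved above
interface StoredEvent {
  id: string;
  status: 'pending' | 'processed';
  receivedAt: Date;
}

// Returns events that were acknowledged but never finished processing
function findStalePending(
  events: StoredEvent[],
  now: Date,
  staleAfterMs = 60_000
): StoredEvent[] {
  return events.filter(
    e =>
      e.status === 'pending' &&
      now.getTime() - e.receivedAt.getTime() >= staleAfterMs
  );
}
```

A scheduled job could run this query every minute and feed the results back into the same processing function, which is safe as long as processing is idempotent.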
Pattern 2: Idempotent Processing
Webhooks can be delivered multiple times. Process each event exactly once:
async function processWebhookEvent(event: WebhookEvent): Promise<void> {
// Check if already processed
const existing = await db.webhookEvents.findById(event.id);
if (existing?.status === 'processed') {
console.log(`Event ${event.id} already processed, skipping`);
return;
}
// Use a transaction to prevent race conditions
await db.transaction(async (tx) => {
// Double-check inside transaction (another worker might have started)
const locked = await tx.webhookEvents.findByIdForUpdate(event.id);
if (locked?.status === 'processed') return;
// Process the event
await handleEvent(event, tx);
// Mark as processed
await tx.webhookEvents.update(event.id, {
status: 'processed',
processedAt: new Date(),
});
});
}
async function handleEvent(event: WebhookEvent, tx: Transaction) {
switch (event.type) {
case 'payment_intent.succeeded':
// Use idempotency key for downstream operations too
await fulfillOrder(event.data.object.id, tx);
break;
case 'customer.subscription.deleted':
await deactivateSubscription(event.data.object.id, tx);
break;
// ... other event types
}
}
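The claim-then-process idea can be seen in isolation with an in-memory store standing in for the database. This is a simplified, synchronous sketch: `InMemoryEventStore` and `processOnce` are hypothetical names, and a real implementation would rely on row locks or a unique constraint rather than a `Map`.

```typescript
type EventStatus = 'pending' | 'processing' | 'processed';

// Stand-in for the webhook events table
class InMemoryEventStore {
  private statuses = new Map<string, EventStatus>();

  // Returns true only for the caller that moves the event into 'processing'
  claim(id: string): boolean {
    const current = this.statuses.get(id);
    if (current === 'processing' || current === 'processed') return false;
    this.statuses.set(id, 'processing');
    return true;
  }

  markProcessed(id: string): void {
    this.statuses.set(id, 'processed');
  }

  // Give the event back so a later retry can claim it again
  release(id: string): void {
    this.statuses.set(id, 'pending');
  }
}

// Runs the handler at most once per successfully processed event id
function processOnce(
  store: InMemoryEventStore,
  eventId: string,
  handler: () => void
): boolean {
  if (!store.claim(eventId)) return false; // duplicate delivery, skip
  try {
    handler();
    store.markProcessed(eventId);
    return true;
  } catch (error) {
    store.release(eventId);
    throw error;
  }
}
```

Releasing the claim on failure is what lets the provider's next retry succeed instead of being skipped as a duplicate.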
Pattern 3: Signature Verification
Always verify webhook signatures to prevent forgery:
import crypto from 'crypto';
// Stripe signature verification
function verifyStripeSignature(
payload: string, // Raw body string, NOT parsed JSON
signature: string,
secret: string
): boolean {
const elements = signature.split(',');
const timestamp = elements.find(e => e.startsWith('t='))?.slice(2);
const v1Signature = elements.find(e => e.startsWith('v1='))?.slice(3);
if (!timestamp || !v1Signature) return false;
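// Prevent replay attacks (reject if older than 5 minutes or not a valid number)
const now = Math.floor(Date.now() / 1000);
const timestampSeconds = Number.parseInt(timestamp, 10);
if (Number.isNaN(timestampSeconds) || now - timestampSeconds > 300) return false;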
const signedPayload = `${timestamp}.${payload}`;
const expected = crypto
.createHmac('sha256', secret)
.update(signedPayload)
.digest('hex');
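// timingSafeEqual throws if the buffers differ in length, so compare lengths first
const receivedBuf = Buffer.from(v1Signature);
const expectedBuf = Buffer.from(expected);
return (
receivedBuf.length === expectedBuf.length &&
crypto.timingSafeEqual(receivedBuf, expectedBuf)
);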
}
// Generic HMAC verification (works for most providers)
function verifyHmacSignature(
payload: string,
signature: string,
secret: string,
algorithm: string = 'sha256'
): boolean {
const expected = crypto
.createHmac(algorithm, secret)
.update(payload)
.digest('hex');
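// timingSafeEqual throws if the buffers differ in length, so compare lengths first
const receivedBuf = Buffer.from(signature);
const expectedBuf = Buffer.from(expected);
return (
receivedBuf.length === expectedBuf.length &&
crypto.timingSafeEqual(receivedBuf, expectedBuf)
);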
}
Critical: Read the raw request body as a string, NOT parsed JSON. Parsing then re-stringifying changes the payload and breaks signature verification.
// Next.js App Router — get raw body
export async function POST(request: Request) {
const rawBody = await request.text();
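// The header may be missing entirely, so treat null as an invalid signature
const signature = request.headers.get('stripe-signature');
if (!signature || !verifyStripeSignature(rawBody, signature, WEBHOOK_SECRET)) {
return new Response('Invalid signature', { status: 401 });
}
const event = JSON.parse(rawBody);
// ... process event
// Route handlers must return a Response; acknowledge once the event is accepted
return new Response('OK', { status: 200 });
}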
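To see why the raw body matters, the following sketch signs a payload, then parses and re-serializes it, and compares the results. The secret and payload are made-up values; only Node's built-in `crypto` module is used.

```typescript
import { createHmac } from 'node:crypto';

const demoSecret = 'whsec_example'; // hypothetical secret for illustration
// The sender's exact bytes, including its own spacing choices
const rawBody = '{"id": "evt_1",  "amount": 2000}';

const sign = (body: string): string =>
  createHmac('sha256', demoSecret).update(body).digest('hex');

// Parsing and re-stringifying normalizes whitespace, so the bytes change
const reserialized = JSON.stringify(JSON.parse(rawBody));

console.log(reserialized);                         // {"id":"evt_1","amount":2000}
console.log(rawBody === reserialized);             // false
console.log(sign(rawBody) === sign(reserialized)); // false
```

The data is semantically identical, but the HMAC is computed over bytes, so any change in spacing, key order, or number formatting invalidates the signature.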
Pattern 4: Dead Letter Queue
When processing fails after all retries, don't lose the event:
class WebhookProcessor {
async process(event: WebhookEvent): Promise<void> {
const MAX_INTERNAL_RETRIES = 3;
for (let attempt = 0; attempt < MAX_INTERNAL_RETRIES; attempt++) {
try {
await this.handleEvent(event);
await this.markProcessed(event.id);
return;
} catch (error) {
console.error(`Attempt ${attempt + 1} failed for event ${event.id}:`, error);
if (attempt < MAX_INTERNAL_RETRIES - 1) {
await new Promise(r => setTimeout(r, Math.pow(2, attempt) * 1000));
}
}
}
// All retries failed — move to dead letter queue
await this.moveToDeadLetter(event);
}
private async moveToDeadLetter(event: WebhookEvent) {
await db.deadLetterQueue.create({
eventId: event.id,
eventType: event.type,
payload: event,
failedAt: new Date(),
retryCount: 0,
status: 'failed', // retryDeadLetterEvents() selects on this status
});
// Alert team
await alertSlack(`⚠️ Webhook event failed permanently: ${event.type} (${event.id})`);
}
}
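// Admin tool: retry dead letter events
// Assumes handleEvent is exposed as a public method on WebhookProcessor
const processor = new WebhookProcessor();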
async function retryDeadLetterEvents() {
const failed = await db.deadLetterQueue.findAll({ status: 'failed' });
for (const item of failed) {
try {
await processor.handleEvent(item.payload);
await db.deadLetterQueue.update(item.id, { status: 'resolved' });
console.log(`Resolved dead letter event: ${item.eventId}`);
} catch (error) {
await db.deadLetterQueue.update(item.id, {
retryCount: item.retryCount + 1,
lastError: String(error),
});
}
}
}
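The retry-then-park flow can be exercised without a database. The sketch below is a simplified, synchronous version with no backoff delays, so it is easy to test; `processWithDeadLetter` and the `DeadLetter` shape are hypothetical names loosely modeled on the fields used above.

```typescript
interface SimpleEvent {
  id: string;
  type: string;
}

interface DeadLetter {
  eventId: string;
  eventType: string;
  status: 'failed' | 'resolved';
  attempts: number;
  lastError: string;
}

// Tries the handler up to maxAttempts times; on total failure, records the
// event in the dead letter list instead of dropping it.
function processWithDeadLetter(
  event: SimpleEvent,
  handler: (e: SimpleEvent) => void,
  deadLetters: DeadLetter[],
  maxAttempts = 3
): boolean {
  let lastError = '';
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      handler(event);
      return true; // processed, nothing to park
    } catch (error) {
      lastError = String(error);
    }
  }
  deadLetters.push({
    eventId: event.id,
    eventType: event.type,
    status: 'failed',
    attempts: maxAttempts,
    lastError,
  });
  return false;
}
```

Storing the last error alongside the event gives whoever triages the queue a starting point without digging through logs.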
Pattern 5: Event Ordering
Webhooks may arrive out of order, so compare event timestamps before overwriting stored state:
// Problem: "subscription.updated" arrives before "subscription.created"
// Solution: Use event timestamps and idempotent operations
async function handleSubscriptionEvent(event: WebhookEvent) {
const subscription = event.data.object;
const eventTime = new Date(event.created * 1000);
// Skip events that are not newer than what we have. Putting this check in the
// upsert's `where` would send a stale event down the `create` path and hit a
// duplicate-key error. Run the check and the upsert in one transaction to avoid races.
const existing = await db.subscriptions.findById(subscription.id);
if (existing && existing.updatedAt >= eventTime) return;
await db.subscriptions.upsert({
where: { id: subscription.id },
create: {
id: subscription.id,
status: subscription.status,
customerId: subscription.customer,
updatedAt: eventTime,
},
update: {
status: subscription.status,
updatedAt: eventTime,
},
});
}
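The "only apply newer events" rule is easiest to check against an in-memory table. In this sketch, `applySubscriptionEvent` and the `SubscriptionRow` shape are hypothetical, and timestamps are Unix seconds as in the `created` field used above.

```typescript
interface SubscriptionRow {
  id: string;
  status: string;
  updatedAt: number; // Unix seconds, taken from the event's `created` field
}

// Applies the event only if it is newer than the stored row.
// Returns true if state changed, false if the event was stale or a duplicate.
function applySubscriptionEvent(
  rows: Map<string, SubscriptionRow>,
  id: string,
  status: string,
  eventCreated: number
): boolean {
  const existing = rows.get(id);
  if (existing && existing.updatedAt >= eventCreated) return false;
  rows.set(id, { id, status, updatedAt: eventCreated });
  return true;
}

// "updated" (t=200) arrives before "created" (t=100)
const rows = new Map<string, SubscriptionRow>();
applySubscriptionEvent(rows, 'sub_1', 'active', 200);
applySubscriptionEvent(rows, 'sub_1', 'incomplete', 100); // ignored as stale
console.log(rows.get('sub_1')?.status); // active
```

Using `>=` rather than `>` also makes a redelivered copy of the same event a no-op.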
Pattern 6: Monitoring Webhook Health
class WebhookMonitor {
async recordEvent(eventId: string, type: string, status: 'received' | 'processed' | 'failed') {
await db.webhookMetrics.create({
eventId,
type,
status,
timestamp: new Date(),
});
}
async getHealth(hours: number = 24) {
const since = new Date(Date.now() - hours * 3600000);
const events = await db.webhookMetrics.findMany({
where: { timestamp: { gte: since } },
});
const received = events.filter(e => e.status === 'received').length;
const processed = events.filter(e => e.status === 'processed').length;
const failed = events.filter(e => e.status === 'failed').length;
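// Guard against division by zero when no events arrived in the window
const successRate = received > 0 ? processed / received : 0;
const failureRate = received > 0 ? failed / received : 0;
return {
received,
processed,
failed,
successRate,
failureRate,
alert: failureRate > 0.05 ? 'HIGH' : 'OK',
};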
}
}
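An overall failure rate can hide a single broken event type. As a possible extension, the sketch below breaks the rate down per type from a plain array of metric records; `failureRateByType` and the `MetricRecord` shape are hypothetical and only mirror the fields recorded above.

```typescript
interface MetricRecord {
  type: string;
  status: 'received' | 'processed' | 'failed';
}

// Failure rate per event type, defined as failed / received (0 if none received)
function failureRateByType(metrics: MetricRecord[]): Record<string, number> {
  const counts: Record<string, { received: number; failed: number }> = {};
  for (const m of metrics) {
    const c = counts[m.type] ?? { received: 0, failed: 0 };
    if (m.status === 'received') c.received++;
    if (m.status === 'failed') c.failed++;
    counts[m.type] = c;
  }

  const rates: Record<string, number> = {};
  for (const type of Object.keys(counts)) {
    const c = counts[type];
    rates[type] = c.received > 0 ? c.failed / c.received : 0;
  }
  return rates;
}
```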
Testing Webhooks
// Generate test webhook events locally
import Stripe from 'stripe';
const stripe = new Stripe(process.env.STRIPE_SECRET_KEY!);
// Stripe CLI for local testing
// stripe listen --forward-to localhost:3000/webhooks/stripe
// stripe trigger payment_intent.succeeded
// Programmatic test
test('handles payment succeeded webhook', async () => {
const event = {
id: 'evt_test_123',
type: 'payment_intent.succeeded',
created: Math.floor(Date.now() / 1000),
data: {
object: {
id: 'pi_test_456',
amount: 2000,
status: 'succeeded',
customer: 'cus_test_789',
},
},
};
const payload = JSON.stringify(event);
const signature = stripe.webhooks.generateTestHeaderString({
payload,
secret: WEBHOOK_SECRET,
});
const response = await app.inject({
method: 'POST',
url: '/webhooks/stripe',
headers: {
'stripe-signature': signature,
'content-type': 'application/json',
},
body: payload,
});
expect(response.statusCode).toBe(200);
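// If the handler acknowledges first and processes asynchronously, wait for
// processing to finish (for example, by polling) before asserting on the order
const order = await db.orders.findByPaymentIntent('pi_test_456');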
expect(order.status).toBe('paid');
});
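For tests that should not depend on the Stripe SDK, a signature header in the same `t=...,v1=...` format verified earlier can be built by hand. This is a sketch under that assumption: `signTestPayload` and `verifyTestHeader` are hypothetical helpers, and the replay-window check is left out to keep the round trip deterministic.

```typescript
import { createHmac, timingSafeEqual } from 'node:crypto';

// Builds a header of the form "t=<unix seconds>,v1=<hex hmac>"
function signTestPayload(payload: string, secret: string, timestamp: number): string {
  const v1 = createHmac('sha256', secret)
    .update(`${timestamp}.${payload}`)
    .digest('hex');
  return `t=${timestamp},v1=${v1}`;
}

// Minimal verifier for the header above (no timestamp tolerance check)
function verifyTestHeader(payload: string, header: string, secret: string): boolean {
  const fields = header.split(',');
  const t = fields.find(f => f.startsWith('t='))?.slice(2);
  const v1 = fields.find(f => f.startsWith('v1='))?.slice(3);
  if (!t || !v1) return false;

  const expected = createHmac('sha256', secret)
    .update(`${t}.${payload}`)
    .digest('hex');

  // Compare lengths first because timingSafeEqual throws on a mismatch
  const a = Buffer.from(v1);
  const b = Buffer.from(expected);
  return a.length === b.length && timingSafeEqual(a, b);
}

const testPayload = JSON.stringify({ id: 'evt_test_123', type: 'payment_intent.succeeded' });
const testHeader = signTestPayload(testPayload, 'whsec_test', 1700000000);
console.log(verifyTestHeader(testPayload, testHeader, 'whsec_test'));       // true
console.log(verifyTestHeader(testPayload + ' ', testHeader, 'whsec_test')); // false
```

Signing the exact string that is sent as the request body keeps the test honest about the raw-body requirement.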
Common Mistakes
| Mistake | Impact | Fix |
|---|---|---|
| Processing synchronously | Handler timeouts, missed events | Acknowledge fast, process async |
| No idempotency | Duplicate processing on retries | Check event ID before processing |
| Parsing body before signature check | Signature verification fails | Use raw body string for verification |
| No dead letter queue | Failed events lost forever | Store failed events for manual retry |
| Assuming event order | Race conditions, data inconsistency | Use timestamps, idempotent operations |
| No webhook monitoring | Don't know when things break | Track success/failure rates |