The Art of API Migration: Switching Providers Without Downtime
Switching API providers is the project nobody wants. It's risky, time-consuming, and usually triggered by something painful — a price hike, an outage, a deprecation notice. But done right, a migration can be smooth, zero-downtime, and even improve your system. Here's the playbook.
Why Companies Migrate
| Trigger | Frequency | Urgency |
|---|---|---|
| Price increase | Common | Medium — negotiate first |
| Better alternative exists | Common | Low — plan carefully |
| Reliability issues | Occasional | High — after major incident |
| Feature gap | Occasional | Medium — evaluate alternatives |
| Acquisition/deprecation | Rare | High — forced migration |
| Compliance requirement | Rare | High — regulatory deadline |
| Vendor lock-in escape | Occasional | Low — strategic decision |
The Migration Playbook
Phase 1: Assessment (1-2 weeks)
Before writing any code, answer these questions:
## Migration Assessment Checklist
### Current State
- [ ] Document all endpoints you use (not all available — just what you call)
- [ ] List all data stored with the current provider
- [ ] Map all webhook handlers and event types
- [ ] Identify SDK usage across your codebase
- [ ] Check contractual obligations (notice period, data export rights)
- [ ] Measure current performance baselines (latency, uptime, error rate)
### Target State
- [ ] Verify feature parity for YOUR use cases
- [ ] Test target provider's API with your actual data shapes
- [ ] Compare pricing at your usage level (not just list price)
- [ ] Check SDK quality (types, error handling, documentation)
- [ ] Verify compliance requirements (SOC 2, GDPR, etc.)
### Migration Scope
- [ ] Estimate code changes (endpoints, models, error handling)
- [ ] Identify data migration needs (users, subscriptions, history)
- [ ] List integration points (webhooks, SDKs, admin dashboards)
- [ ] Assess team training needs
- [ ] Set rollback criteria
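The baseline item in the checklist above is easier to act on with concrete numbers. Here is a minimal sketch, assuming you can export per-call latency and a success flag from your logs. The `CallSample` shape and the function names are hypothetical, not part of any provider SDK.

```typescript
// Hypothetical shape for one logged API call; adapt it to what your logs record.
interface CallSample {
  latencyMs: number;
  ok: boolean;
}

interface Baseline {
  p50Ms: number;
  p99Ms: number;
  errorRate: number;
}

// Nearest-rank percentile over an ascending-sorted array.
function percentile(sorted: number[], p: number): number {
  if (sorted.length === 0) return NaN;
  const rank = Math.ceil((p / 100) * sorted.length);
  const index = Math.min(sorted.length - 1, Math.max(0, rank - 1));
  return sorted[index];
}

// Summarize latency and error rate for a window of calls.
function computeBaseline(samples: CallSample[]): Baseline {
  const latencies = samples.map((s) => s.latencyMs).sort((a, b) => a - b);
  const failures = samples.filter((s) => !s.ok).length;
  return {
    p50Ms: percentile(latencies, 50),
    p99Ms: percentile(latencies, 99),
    errorRate: samples.length === 0 ? 0 : failures / samples.length,
  };
}
```

Capturing these numbers before any code changes gives the rollback criteria something to be measured against later.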
Phase 2: Abstraction Layer (1 week)
If you don't already have one, add an abstraction layer:
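// Assumed setup: official SDK clients; the env var names are placeholders
import sgMail from '@sendgrid/mail';
import { Resend } from 'resend';

sgMail.setApiKey(process.env.SENDGRID_API_KEY ?? '');
const resend = new Resend(process.env.RESEND_API_KEY);

// Create an interface that abstracts the provider
interface EmailProvider {
  sendEmail(params: {
    to: string;
    subject: string;
    html: string;
    from?: string;
  }): Promise<{ id: string }>;
  getEmailStatus(id: string): Promise<'delivered' | 'bounced' | 'pending'>;
}

// Current provider implementation
class SendGridProvider implements EmailProvider {
  async sendEmail(params: Parameters<EmailProvider['sendEmail']>[0]) {
    const response = await sgMail.send({
      to: params.to,
      from: params.from || 'hello@company.com',
      subject: params.subject,
      html: params.html,
    });
    return { id: response[0].headers['x-message-id'] };
  }
  async getEmailStatus(id: string) { /* ... */ }
}

// New provider implementation (write alongside, don't replace yet)
class ResendProvider implements EmailProvider {
  async sendEmail(params: Parameters<EmailProvider['sendEmail']>[0]) {
    const result = await resend.emails.send({
      to: params.to,
      from: params.from || 'hello@company.com',
      subject: params.subject,
      html: params.html,
    });
    // Surface provider errors instead of asserting that data is present
    if (result.error || !result.data) {
      throw new Error(`Resend send failed: ${result.error?.message ?? 'no data returned'}`);
    }
    return { id: result.data.id };
  }
  async getEmailStatus(id: string) { /* ... */ }
}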
Key principle: Make the switch a configuration change, not a code change.
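One way to make the switch a configuration change is a small factory that reads the provider name from configuration. This is a sketch under stated assumptions: `StubProvider` stands in for the real provider classes, and the `EMAIL_PROVIDER` setting mentioned in the comments is a hypothetical name.

```typescript
// Trimmed copy of the EmailProvider interface so this sketch is self-contained.
interface EmailProvider {
  sendEmail(params: { to: string; subject: string; html: string }): Promise<{ id: string }>;
}

type ProviderName = 'sendgrid' | 'resend';

// Stand-in for the real SendGridProvider / ResendProvider classes.
class StubProvider implements EmailProvider {
  readonly name: ProviderName;
  constructor(name: ProviderName) {
    this.name = name;
  }
  async sendEmail(_params: { to: string; subject: string; html: string }) {
    return { id: `${this.name}-stub-id` };
  }
}

function isProviderName(value: string): value is ProviderName {
  return value === 'sendgrid' || value === 'resend';
}

// `configured` would typically come from an env var or config file,
// for example a hypothetical EMAIL_PROVIDER setting. Defaults to the current provider.
function createEmailProvider(configured: string | undefined): StubProvider {
  const name = (configured ?? 'sendgrid').trim().toLowerCase();
  if (!isProviderName(name)) {
    throw new Error(`Unknown email provider: ${name}`);
  }
  return new StubProvider(name);
}
```

Failing loudly on an unknown name is a deliberate choice here: a typo in configuration should stop the deploy rather than silently fall back to a provider you did not intend to use.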
Phase 3: Parallel Running (1-2 weeks)
Run both providers simultaneously to verify behavior:
class DualEmailProvider implements EmailProvider {
  constructor(
    private primary: EmailProvider,       // Current (SendGrid)
    private secondary: EmailProvider,     // New (Resend)
    private shadowPercent: number = 10,   // % of traffic to shadow
  ) {}

  async sendEmail(params: Parameters<EmailProvider['sendEmail']>[0]) {
    // Always send through primary
    const result = await this.primary.sendEmail(params);

    // Shadow send through secondary (don't fail if it errors).
    // Note: awaiting here adds the shadow call's latency to the shadowed requests.
    if (Math.random() * 100 < this.shadowPercent) {
      try {
        const shadowResult = await this.secondary.sendEmail({
          ...params,
          to: `shadow-test+${Date.now()}@company.com`, // Don't email real users!
        });
        this.logComparison(result, shadowResult);
      } catch (error) {
        this.logShadowError(error);
      }
    }
    return result;
  }

  // Status lookups stay on the primary, which issued the returned ID
  getEmailStatus(id: string) {
    return this.primary.getEmailStatus(id);
  }

  private logComparison(primary: { id: string }, secondary: { id: string }) {
    // Compare response times, formats, behavior
    console.log('Shadow comparison:', { primary, secondary });
  }

  private logShadowError(error: unknown) {
    // Record shadow failures without affecting the primary send
    console.error('Shadow send failed:', error);
  }
}
Shadow testing rules:
- Never send shadow traffic to real users
- Use test/sandbox endpoints or internal addresses
- Compare response formats, latency, error handling
- Run for at least 1 week before switching
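The rule about comparing latency and error handling can be made concrete with a small timing wrapper and a comparison function. This is an illustrative sketch; `timed`, `compareShadow`, and the report fields are hypothetical names, and a real setup would likely send the report to your metrics system rather than return it.

```typescript
interface TimedResult {
  ok: boolean;
  latencyMs: number;
}

// Time any async call and capture success or failure without throwing.
async function timed(call: () => Promise<unknown>): Promise<TimedResult> {
  const start = Date.now();
  try {
    await call();
    return { ok: true, latencyMs: Date.now() - start };
  } catch {
    return { ok: false, latencyMs: Date.now() - start };
  }
}

interface ShadowReport {
  outcomeMismatch: boolean; // one provider succeeded while the other failed
  latencyDeltaMs: number;   // positive means the shadow provider was slower
}

function compareShadow(primary: TimedResult, shadow: TimedResult): ShadowReport {
  return {
    outcomeMismatch: primary.ok !== shadow.ok,
    latencyDeltaMs: shadow.latencyMs - primary.latencyMs,
  };
}
```

Aggregating `outcomeMismatch` counts and latency deltas over the shadow period gives a simple go/no-go signal before any real traffic moves.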
Phase 4: Data Migration
// Data migration depends on category:
// PAYMENT MIGRATION (Stripe → other)
// Most complex — must migrate:
// - Customer records
// - Payment methods (usually NOT portable — re-collect)
// - Subscription data
// - Transaction history (for your records, not the new provider)
// EMAIL MIGRATION (SendGrid → Resend)
// Moderate — migrate:
// - DNS records (SPF, DKIM, DMARC)
// - Sender verification
// - Template mappings
// - Suppression lists (bounced emails)
// AUTH MIGRATION (Auth0 → Clerk)
// Complex — migrate:
// - User accounts (password hashes may not be portable)
// - Social connections
// - MFA settings
// - Session management
// - RBAC policies
// SEARCH MIGRATION (Algolia → Typesense)
// Moderate — migrate:
// - Index data (re-index from your database)
// - Search configuration (relevance, synonyms, filters)
// - API query format changes
Phase 5: Traffic Cutover
// Gradual traffic shift using feature flags
// (featureFlag and getPhase are placeholders for your own flag system)
class MigratingEmailProvider implements EmailProvider {
  constructor(
    private old: EmailProvider,
    private new_: EmailProvider,
  ) {}

  async sendEmail(params: Parameters<EmailProvider['sendEmail']>[0]) {
    // Feature flag controls rollout
    const useNew = await featureFlag.isEnabled('use-resend', {
      percent: getPhase(), // 0% → 10% → 25% → 50% → 100%
    });
    if (useNew) {
      try {
        return await this.new_.sendEmail(params);
      } catch (error) {
        // Fallback to old provider on error during migration
        console.error('New provider failed, falling back:', error);
        return await this.old.sendEmail(params);
      }
    }
    return await this.old.sendEmail(params);
  }

  // getEmailStatus omitted here; route it to whichever provider issued the ID
}

// Rollout schedule:
// Day 1: 0% (shadow testing only)
// Day 3: 10% (early adopters, monitor closely)
// Day 5: 25% (broader testing)
// Day 7: 50% (half traffic)
// Day 10: 100% (full migration)
// Day 40: Remove old provider code (after 30 days at 100% with no rollback)
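If your feature flag system does not already do percentage rollouts, the cutover can be sketched with deterministic bucketing, so a given recipient or user stays on the same provider as the percentage grows. This is an assumption about how the flag might work, not a description of any particular flag service; the hash is an FNV-1a-style function over UTF-16 code units, and all names are hypothetical.

```typescript
// FNV-1a-style 32-bit hash over the string's UTF-16 code units.
function fnv1a(input: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return hash >>> 0; // force unsigned
}

// Map a stable key (user ID, recipient address) to a bucket in 0..99.
function bucketFor(key: string): number {
  return fnv1a(key) % 100;
}

// A key is in the rollout when its bucket falls below the current percentage.
function useNewProvider(key: string, rolloutPercent: number): boolean {
  return bucketFor(key) < rolloutPercent;
}

// The rollout schedule from the text, expressed as data.
const ROLLOUT_SCHEDULE = [
  { day: 1, percent: 0 },
  { day: 3, percent: 10 },
  { day: 5, percent: 25 },
  { day: 7, percent: 50 },
  { day: 10, percent: 100 },
];

// Percentage in effect on a given day of the migration.
function percentForDay(day: number): number {
  let percent = 0;
  for (const step of ROLLOUT_SCHEDULE) {
    if (day >= step.day) percent = step.percent;
  }
  return percent;
}
```

Because a key's bucket never changes, anyone routed to the new provider at 10% is still routed there at 50%, which keeps incident reports easier to correlate with the rollout.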
Phase 6: Cleanup
## Post-Migration Checklist
- [ ] Old provider SDK removed from dependencies
- [ ] Old provider env vars removed
- [ ] Feature flags cleaned up
- [ ] Old webhook endpoints decommissioned
- [ ] DNS records updated (email: SPF/DKIM)
- [ ] Monitoring updated for new provider
- [ ] Team documentation updated
- [ ] Old provider account downgraded or closed
- [ ] Data export from old provider (for records)
- [ ] Runbook updated with new provider procedures
Category-Specific Migration Guides
Payment Provider Migration
Difficulty: Very High
Key challenges:
- Payment methods are rarely portable as-is (card data needs a PCI-compliant provider-to-provider transfer where both sides support it, or must be re-collected)
- Active subscriptions need careful handling
- PCI compliance during transition
- Financial reconciliation
Approach:
1. New users → new provider immediately
2. Existing users → dual-write during transition
3. Subscription renewal → migrate at next billing cycle
4. Communicate to customers about re-entering payment info
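The approach above amounts to a routing decision made at each charge or renewal. Here is a minimal sketch, assuming you track a signup date and whether the customer already has a usable payment method at the new provider; the type and field names are hypothetical.

```typescript
type Processor = 'old' | 'new';

interface CustomerBillingState {
  signedUpAt: Date;
  // True once the customer has a usable payment method at the new provider.
  newProviderPaymentMethodReady: boolean;
}

// Decide which processor should handle the next charge for this customer.
function processorFor(customer: CustomerBillingState, cutoverAt: Date): Processor {
  // New users go to the new provider immediately.
  if (customer.signedUpAt.getTime() >= cutoverAt.getTime()) return 'new';
  // Existing users move only once their payment method is available there.
  return customer.newProviderPaymentMethodReady ? 'new' : 'old';
}
```

Calling this at each renewal means existing subscriptions move at their next billing cycle, one customer at a time, instead of in a single batch.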
Auth Provider Migration
Difficulty: High
Key challenges:
- Password hashes may use different algorithms
- Social connection tokens need re-authorization
- Active sessions during cutover
- MFA device re-enrollment
Approach:
1. Bulk import users (most auth providers support this)
2. Force password reset for users with non-portable hashes
3. Social logins: re-link on next login
4. Cut over login page, not sessions (existing sessions stay valid)
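Steps 1 and 2 can be planned up front by splitting the exported users by hash algorithm. This sketch assumes your export includes an algorithm label per user and that you know which algorithms the target provider can import; both the `ExportedUser` shape and the algorithm names in the usage example are hypothetical, so check the target provider's import documentation.

```typescript
interface ExportedUser {
  id: string;
  hashAlgorithm: string; // for example "bcrypt", as labeled in your export
}

interface ImportPlan {
  importWithHash: string[];     // users whose password hashes can be carried over
  forcePasswordReset: string[]; // users who must reset on first login
}

// Partition users by whether the target provider accepts their hash algorithm.
function planUserImport(
  users: ExportedUser[],
  portableAlgorithms: ReadonlySet<string>,
): ImportPlan {
  const plan: ImportPlan = { importWithHash: [], forcePasswordReset: [] };
  for (const user of users) {
    const portable = portableAlgorithms.has(user.hashAlgorithm.toLowerCase());
    (portable ? plan.importWithHash : plan.forcePasswordReset).push(user.id);
  }
  return plan;
}
```

The size of `forcePasswordReset` tells you early how many users will see friction, which is useful input for the customer communication plan.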
Email Provider Migration
Difficulty: Medium
Key challenges:
- DNS propagation for SPF/DKIM
- IP reputation with new provider
- Suppression list transfer
- Template format differences
Approach:
1. Set up DNS records for new provider alongside old
2. Warm up new provider's sending reputation
3. Import suppression lists
4. Migrate templates
5. Switch traffic gradually
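Warming up a sending reputation generally means raising daily volume in steps rather than all at once. The sketch below generates such a ramp; the starting volume and growth factor are illustrative assumptions, not recommendations from any provider, so follow your new provider's warm-up guidance for real numbers.

```typescript
// Build a list of daily send caps that grows by `growthFactor` each day
// until it reaches `targetPerDay`. All numbers are caller-supplied assumptions.
function warmupSchedule(
  startPerDay: number,
  targetPerDay: number,
  growthFactor: number = 2,
): number[] {
  if (startPerDay <= 0 || growthFactor <= 1) {
    throw new Error('startPerDay must be > 0 and growthFactor must be > 1');
  }
  const caps: number[] = [];
  let cap = startPerDay;
  while (cap < targetPerDay) {
    caps.push(Math.floor(cap));
    cap *= growthFactor;
  }
  caps.push(targetPerDay); // final day runs at full volume
  return caps;
}
```

The daily cap can double as the ceiling for how much traffic the gradual switch in step 5 is allowed to send through the new provider.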
Search Provider Migration
Difficulty: Medium
Key challenges:
- Query syntax differences
- Relevance tuning needs re-work
- Re-indexing all data
- Search analytics continuity
Approach:
1. Re-index from your source of truth (database, not old index)
2. A/B test search quality before full switch
3. Map old query syntax to new
4. Monitor search metrics after switch
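Re-indexing from the source of truth usually means reading rows from your database and pushing them to the new index in batches. Here is a provider-agnostic sketch; `indexBatch` is a hypothetical callback that you would implement with the new provider's bulk import call, and the default batch size is an arbitrary assumption.

```typescript
// Split an array into consecutive batches of at most `size` items.
function chunk<T>(items: T[], size: number): T[][] {
  if (!Number.isInteger(size) || size < 1) {
    throw new Error('size must be a positive integer');
  }
  const out: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    out.push(items.slice(i, i + size));
  }
  return out;
}

// Push rows to the new index one batch at a time and return how many were sent.
async function reindexFromSource<T>(
  rows: T[],
  indexBatch: (batch: T[]) => Promise<void>,
  batchSize: number = 500,
): Promise<number> {
  let indexed = 0;
  for (const batch of chunk(rows, batchSize)) {
    await indexBatch(batch); // sequential, so a failure stops at a known point
    indexed += batch.length;
  }
  return indexed;
}
```

Comparing the returned count with the row count in your database is a cheap first check that nothing was dropped during the re-index.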
Rollback Plan
Every migration needs a rollback plan:
// Rollback criteria (define BEFORE starting)
const ROLLBACK_CRITERIA = {
  errorRate: 0.05,   // >5% error rate
  latencyP99: 2000,  // >2s P99 latency (in ms)
  downtime: 60,      // >60 seconds downtime
  dataLoss: 0,       // Any data loss = immediate rollback
};

// Rollback procedure
// (featureFlag, healthCheck, and alert are placeholders for your own tooling)
async function rollback() {
  // 1. Switch feature flag to 0% (all traffic to old provider)
  await featureFlag.disable('use-new-provider');
  // 2. Verify old provider is handling traffic
  await healthCheck.verify('old-provider');
  // 3. Alert team
  await alert('API migration rolled back: investigating');
  // 4. Do NOT delete new provider setup (may resume later)
}
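The criteria only help if something checks them. Here is a minimal sketch of that check, assuming your monitoring can supply the observed values; the `Metrics` field names are hypothetical and the thresholds mirror the ones above.

```typescript
interface Metrics {
  errorRate: number;       // fraction of failed calls, 0..1
  latencyP99Ms: number;    // P99 latency in milliseconds
  downtimeSeconds: number; // accumulated downtime in the window
  dataLossEvents: number;  // count of detected data-loss incidents
}

// Thresholds taken from the rollback criteria in the text.
const ROLLBACK_LIMITS: Metrics = {
  errorRate: 0.05,
  latencyP99Ms: 2000,
  downtimeSeconds: 60,
  dataLossEvents: 0,
};

// Return the name of every metric that exceeds its limit.
function breachedCriteria(
  observed: Metrics,
  limits: Metrics = ROLLBACK_LIMITS,
): (keyof Metrics)[] {
  const names = Object.keys(limits) as (keyof Metrics)[];
  return names.filter((name) => observed[name] > limits[name]);
}

// Roll back as soon as any single criterion is breached.
function shouldRollBack(observed: Metrics): boolean {
  return breachedCriteria(observed).length > 0;
}
```

Returning the list of breached metrics, rather than only a boolean, lets the alert message say why the rollback fired.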
Common Mistakes
| Mistake | Impact | Fix |
|---|---|---|
| Big-bang cutover | All-or-nothing, no rollback | Gradual traffic shift |
| No abstraction layer | Migration requires changing every file | Build abstraction first |
| Skipping parallel running | Bugs found in production | Shadow test for 1+ week |
| Forgetting webhook migration | Missing events after switch | Migrate webhooks BEFORE cutover |
| Migrating data, not re-syncing | Stale data in new provider | Re-sync from source of truth |
| No rollback plan | Can't recover if migration fails | Define rollback criteria upfront |
| Rushing to delete old provider | No fallback if issues emerge | Keep old provider active for 30 days |