Cloudflare Workers AI vs AWS Bedrock vs Azure OpenAI
TL;DR
If you're already on AWS, use Bedrock. Already on Azure, use Azure OpenAI. Building edge apps, use Cloudflare Workers AI. These three managed AI platforms differ more in philosophy than capability: Cloudflare runs inference at 300+ edge locations (sub-50ms globally), AWS Bedrock gives you the widest enterprise model catalog with built-in governance, and Azure OpenAI gives you GPT-4o with enterprise SLAs and compliance certifications. None of them is best in isolation — your existing cloud infrastructure almost always determines the right choice.
Key Takeaways
- Cloudflare Workers AI: fastest globally distributed inference, 50+ models, $0.011/1K neurons, no cold starts
- AWS Bedrock: 50+ models (Llama, Mistral, Claude, Titan), pay-per-token, IAM integration, best for existing AWS infrastructure
- Azure OpenAI: GPT-4o + o1 with enterprise SLAs, private endpoints, SOC2/HIPAA compliance, best for Microsoft shops
- Latency: Cloudflare ~30-50ms globally, AWS/Azure ~100-300ms from non-primary regions
- Enterprise compliance: Azure OpenAI wins (HIPAA, FedRAMP), Bedrock close behind
- Multi-model access: Bedrock wins (Claude, Llama, Mistral, Titan, Cohere, AI21 all unified)
Cloudflare Workers AI: Inference at the Edge
Best for: globally distributed apps, low-latency requirements, serverless-first architecture, Cloudflare Workers projects
Cloudflare runs AI inference on their global network — the same infrastructure that handles ~20% of web traffic. When your user is in Tokyo, the model runs in Tokyo. In Frankfurt, Frankfurt.
// workers-ai.ts — Running inference at the edge:
export interface Env {
  AI: Ai;
}
export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    const { prompt } = await request.json<{ prompt: string }>();
    // Text generation:
    const response = await env.AI.run('@cf/meta/llama-3.3-70b-instruct', {
      messages: [
        { role: 'system', content: 'You are a helpful assistant.' },
        { role: 'user', content: prompt },
      ],
      max_tokens: 512,
      stream: false,
    });
    return Response.json(response);
  },
};
// Streaming response from Workers AI:
export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    const stream = await env.AI.run('@cf/meta/llama-3.3-70b-instruct', {
      messages: [{ role: 'user', content: 'Explain DNS in 3 sentences.' }],
      stream: true,
    });
    // stream is a ReadableStream of SSE chunks:
    return new Response(stream, {
      headers: {
        'Content-Type': 'text/event-stream',
        'Cache-Control': 'no-cache',
      },
    });
  },
};
# wrangler.toml:
name = "ai-worker"
main = "src/index.ts"
compatibility_date = "2026-01-01"
[ai]
binding = "AI"
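On the consuming side, each SSE line from the worker carries a small JSON payload. A minimal parser sketch, assuming the `data: {"response":"…"}` chunk format Workers AI streams (with a `data: [DONE]` terminator); the function name is ours, not part of any API:

```typescript
// Parse one SSE line from a Workers AI stream into its token text.
// Returns null for non-data lines, keep-alives, and the [DONE] sentinel.
function parseSSELine(line: string): string | null {
  if (!line.startsWith('data: ')) return null;
  const payload = line.slice('data: '.length).trim();
  if (payload === '[DONE]') return null;
  try {
    // Assumed chunk shape: {"response": "<token text>"}
    const parsed = JSON.parse(payload) as { response?: string };
    return parsed.response ?? null;
  } catch {
    return null; // Ignore malformed or partial lines
  }
}
```

In a browser client you would split the `ReadableStream` on newlines and feed each line through this before appending tokens to the UI.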
Workers AI Models (2026)
Text Generation:
@cf/meta/llama-3.3-70b-instruct — Latest Llama, best quality
@cf/meta/llama-3.1-8b-instruct — Fast, cheap, good for simple tasks
@cf/mistral/mistral-7b-instruct-v0.2 — Good general purpose
@cf/qwen/qwen1.5-7b-chat-awq — Efficient, multilingual
Code Generation:
@cf/deepseek-ai/deepseek-coder-6.7b-instruct — Code-specialized
@hf/thebloke/deepseek-coder-6.7b-instruct-awq — Quantized (AWQ) variant
Text Embeddings:
@cf/baai/bge-small-en-v1.5 — 384 dims, fast
@cf/baai/bge-large-en-v1.5 — 1024 dims, better quality
Image Generation:
@cf/stabilityai/stable-diffusion-xl-base-1.0
@cf/bytedance/stable-diffusion-xl-lightning
Vision:
@cf/llava-hf/llava-1.5-7b-hf — Vision + text
Speech:
@cf/openai/whisper — Transcription
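The BGE embedding models return plain number arrays, which you typically compare with cosine similarity (e.g. for semantic search over Vectorize results). A small helper; the function name is ours, not part of the Workers AI API:

```typescript
// Cosine similarity between two embedding vectors, e.g. the 384-dim
// output of @cf/baai/bge-small-en-v1.5. Returns a value in [-1, 1].
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error('dimension mismatch');
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```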
Workers AI Pricing
Billing unit: "Neurons" (compute units)
Text generation (Llama 70B):
Input: ~0.027 neurons/token
Output: ~0.027 neurons/token
Cost: $0.011/1K neurons
Effective: ~$0.30-0.60/M tokens (varies by model)
Free tier: 10,000 neurons/day
Workers Paid plan ($5/month): 10M neurons/month included
Beyond: $0.011/1K neurons
Workers AI is significantly cheaper for medium loads because the $5/month Workers plan includes a generous neuron allocation.
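As a rough sketch of that math, using the figures above ($5 base, 10M neurons included, $0.011 per 1,000 neurons beyond); illustrative only, verify against Cloudflare's current pricing page:

```typescript
// Estimate monthly Workers AI cost (USD) on the Workers Paid plan,
// using the rates quoted in this article.
function estimateWorkersAICost(neuronsPerMonth: number): number {
  const BASE = 5;               // Workers Paid plan, $/month
  const INCLUDED = 10_000_000;  // Neurons included in the plan
  const RATE_PER_1K = 0.011;    // $ per 1,000 neurons beyond the allocation
  const overage = Math.max(0, neuronsPerMonth - INCLUDED);
  return BASE + (overage / 1000) * RATE_PER_1K;
}
```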
AWS Bedrock: Enterprise Model Catalog
Best for: AWS-heavy organizations, teams needing model governance, fine-tuning requirements, RAG with S3 data
AWS Bedrock is not just one model — it's a managed gateway to 50+ foundation models from Meta, Mistral, Anthropic, AI21, Cohere, and Amazon's own Titan models, all accessible with the same IAM-authenticated API.
// AWS Bedrock with @aws-sdk/client-bedrock-runtime:
import {
  BedrockRuntimeClient,
  InvokeModelCommand,
  InvokeModelWithResponseStreamCommand,
} from '@aws-sdk/client-bedrock-runtime';
const client = new BedrockRuntimeClient({
  region: 'us-east-1',
  // Uses IAM role or env vars (AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY)
});
// Invoke Llama 3.3 70B:
async function invokeLlama(prompt: string) {
  const payload = {
    prompt: `<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\n${prompt}<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n`,
    max_gen_len: 512,
    temperature: 0.7,
    top_p: 0.9,
  };
  const command = new InvokeModelCommand({
    modelId: 'meta.llama3-3-70b-instruct-v1:0',
    body: JSON.stringify(payload),
    contentType: 'application/json',
    accept: 'application/json',
  });
  const response = await client.send(command);
  const decoded = JSON.parse(Buffer.from(response.body).toString('utf-8'));
  return decoded.generation;
}
// Bedrock Converse API — unified format across ALL models:
import { ConverseCommand } from '@aws-sdk/client-bedrock-runtime';
async function converse(modelId: string, userMessage: string) {
  const command = new ConverseCommand({
    modelId,
    messages: [
      { role: 'user', content: [{ text: userMessage }] },
    ],
    inferenceConfig: {
      maxTokens: 512,
      temperature: 0.7,
    },
  });
  const response = await client.send(command);
  return response.output?.message?.content?.[0]?.text;
}
// Same API works for ALL Bedrock models:
await converse('meta.llama3-3-70b-instruct-v1:0', 'Hello');
await converse('mistral.mistral-7b-instruct-v0:2', 'Hello');
await converse('anthropic.claude-3-5-sonnet-20241022-v2:0', 'Hello');
await converse('amazon.titan-text-premier-v1:0', 'Hello');
// Bedrock streaming with Converse:
import { ConverseStreamCommand } from '@aws-sdk/client-bedrock-runtime';
async function* converseStream(modelId: string, message: string) {
  const command = new ConverseStreamCommand({
    modelId,
    messages: [{ role: 'user', content: [{ text: message }] }],
  });
  const response = await client.send(command);
  for await (const event of response.stream!) {
    if (event.contentBlockDelta?.delta?.text) {
      yield event.contentBlockDelta.delta.text;
    }
  }
}
// Usage:
for await (const chunk of converseStream('meta.llama3-3-70b-instruct-v1:0', 'Explain TLS')) {
  process.stdout.write(chunk);
}
Bedrock Model Catalog (Key Models)
| Provider | Model ID | Notes |
|---|---|---|
| Meta | meta.llama3-3-70b-instruct-v1:0 | Best open model on Bedrock |
| Mistral | mistral.mistral-large-2402-v1:0 | Strong reasoning |
| Anthropic | anthropic.claude-3-5-sonnet-20241022-v2:0 | Best quality, higher cost |
| Amazon | amazon.titan-text-premier-v1:0 | AWS-native, good for RAG |
| Cohere | cohere.command-r-plus-v1:0 | Best for long-context RAG |
| AI21 | ai21.jamba-1-5-large-v1:0 | Long context (256K) |
Bedrock Knowledge Bases (RAG Built-In)
Bedrock's killer feature for enterprise: managed RAG with S3 data sources.
import {
  BedrockAgentRuntimeClient,
  RetrieveAndGenerateCommand,
} from '@aws-sdk/client-bedrock-agent-runtime';
const agentClient = new BedrockAgentRuntimeClient({ region: 'us-east-1' });
async function ragQuery(query: string, knowledgeBaseId: string) {
  const command = new RetrieveAndGenerateCommand({
    input: { text: query },
    retrieveAndGenerateConfiguration: {
      type: 'KNOWLEDGE_BASE',
      knowledgeBaseConfiguration: {
        knowledgeBaseId,
        modelArn: 'arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-3-5-sonnet-20241022-v2:0',
        retrievalConfiguration: {
          vectorSearchConfiguration: { numberOfResults: 5 },
        },
      },
    },
  });
  const response = await agentClient.send(command);
  return {
    answer: response.output?.text,
    citations: response.citations,
  };
}
Bedrock Pricing
On-Demand pricing (us-east-1):
| Model | Input $/1M | Output $/1M |
|---|---|---|
| Llama 3.3 70B Instruct | $0.72 | $0.99 |
| Mistral Large (2402) | $4.00 | $12.00 |
| Claude 3.5 Sonnet | $3.00 | $15.00 |
| Amazon Titan Text Premier | $0.50 | $1.50 |
| Cohere Command R+ | $3.00 | $15.00 |
Provisioned Throughput: pre-buy capacity for consistent high-volume use
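A quick way to sanity-check spend from the table above. The rates are illustrative snapshots from this article, not live pricing; check the AWS Bedrock pricing page before budgeting:

```typescript
// On-demand rates in USD per 1M tokens, copied from the table above.
const BEDROCK_RATES: Record<string, { input: number; output: number }> = {
  'meta.llama3-3-70b-instruct-v1:0': { input: 0.72, output: 0.99 },
  'anthropic.claude-3-5-sonnet-20241022-v2:0': { input: 3.0, output: 15.0 },
};

// Cost of a single request given its token counts.
function bedrockCostUSD(modelId: string, inputTokens: number, outputTokens: number): number {
  const rate = BEDROCK_RATES[modelId];
  if (!rate) throw new Error(`No rate on file for ${modelId}`);
  return (inputTokens / 1_000_000) * rate.input + (outputTokens / 1_000_000) * rate.output;
}
```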
Azure OpenAI: Enterprise GPT with Microsoft Compliance
Best for: organizations requiring HIPAA/FedRAMP compliance, Microsoft/Azure shops, GPT-4o access with enterprise SLAs, government and healthcare
Azure OpenAI is Microsoft-hosted OpenAI models — the exact same models (GPT-4o, o1, DALL-E 3) but with Azure's enterprise wrapper: private endpoints, customer-managed encryption keys, content filtering policies, audit logs, and compliance certifications.
// Azure OpenAI uses the same SDK as OpenAI — just different endpoint:
import OpenAI from 'openai';
const azureClient = new OpenAI({
  apiKey: process.env.AZURE_OPENAI_API_KEY,
  baseURL: `https://${process.env.AZURE_OPENAI_ENDPOINT}/openai/deployments/${process.env.AZURE_OPENAI_DEPLOYMENT_NAME}`,
  defaultQuery: { 'api-version': '2024-12-01-preview' },
  defaultHeaders: { 'api-key': process.env.AZURE_OPENAI_API_KEY },
});
// Or use the AzureOpenAI class (exported by the same openai package) for first-class Azure support:
import { AzureOpenAI } from 'openai';
const client = new AzureOpenAI({
  endpoint: process.env.AZURE_OPENAI_ENDPOINT!,
  apiKey: process.env.AZURE_OPENAI_API_KEY!,
  apiVersion: '2024-12-01-preview',
  deployment: 'gpt-4o', // Your deployment name
});
// Chat completion (identical to OpenAI SDK):
const response = await client.chat.completions.create({
  model: 'gpt-4o', // Uses your deployment name
  messages: [
    { role: 'system', content: 'You are a helpful assistant.' },
    { role: 'user', content: 'Summarize this document...' },
  ],
  max_tokens: 1024,
  temperature: 0.7,
});
console.log(response.choices[0].message.content);
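Streaming works the same way as with OpenAI: pass `stream: true` and iterate the chunks, where `choices[0].delta.content` holds incremental text. A sketch of accumulating deltas, shown over a plain array for clarity; real code consumes them via `for await` on the stream returned by `client.chat.completions.create({ ..., stream: true })`:

```typescript
// Shape of the fields we read from each streamed chunk.
interface ChunkLike {
  choices: { delta: { content?: string | null } }[];
}

// Concatenate the incremental delta text from a sequence of chunks.
function joinDeltas(chunks: ChunkLike[]): string {
  let text = '';
  for (const chunk of chunks) {
    text += chunk.choices[0]?.delta?.content ?? '';
  }
  return text;
}
```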
// Azure OpenAI with function calling (same as OpenAI):
const tools = [
  {
    type: 'function' as const,
    function: {
      name: 'search_documents',
      description: 'Search enterprise document store',
      strict: true,
      parameters: {
        type: 'object',
        properties: {
          query: { type: 'string' },
          department: { type: 'string', enum: ['legal', 'hr', 'finance', 'engineering'] },
          date_range: { type: 'string', description: 'ISO date range, e.g. 2025-01-01/2025-12-31' },
        },
        // strict mode requires every property to be listed in `required`
        required: ['query', 'department', 'date_range'],
        additionalProperties: false,
      },
    },
  },
];
const toolResponse = await client.chat.completions.create({
  model: 'gpt-4o',
  messages: [{ role: 'user', content: 'Find all legal contracts from Q4 2025' }],
  tools,
  tool_choice: 'auto',
});
// Azure OpenAI Assistants API (full stateful conversations):
const assistant = await client.beta.assistants.create({
  name: 'Enterprise Support Agent',
  instructions: 'You help employees with HR and IT questions.',
  model: 'gpt-4o',
  tools: [{ type: 'file_search' }, { type: 'code_interpreter' }],
});
const thread = await client.beta.threads.create();
await client.beta.threads.messages.create(thread.id, {
  role: 'user',
  content: 'What is the vacation policy for US employees?',
});
const run = await client.beta.threads.runs.createAndPoll(thread.id, {
  assistant_id: assistant.id,
});
// Read the assistant's reply (newest message first):
if (run.status === 'completed') {
  const messages = await client.beta.threads.messages.list(thread.id);
  console.log(messages.data[0].content);
}
Azure OpenAI Models (2026)
| Model | Deployment | Notes |
|---|---|---|
| GPT-4o | gpt-4o | Latest, best quality |
| GPT-4o mini | gpt-4o-mini | Cheaper, fast |
| o1 | o1 | Reasoning model |
| o3-mini | o3-mini | Fast reasoning |
| DALL-E 3 | dall-e-3 | Image generation |
| Whisper | whisper | Speech transcription |
| text-embedding-3-large | text-embedding-3-large | Best embeddings |
Enterprise Features Only Azure Has
✅ Private Endpoints — Your traffic never leaves Azure backbone
✅ Customer-Managed Keys — Encrypt with your own Azure Key Vault keys
✅ Content Filtering — Customizable harm categories per deployment
✅ Managed Identity — No API keys needed, uses Azure AD
✅ Compliance: SOC2, HIPAA, FedRAMP High, ISO 27001, GDPR
✅ No data training: Your data is NOT used to train OpenAI models
✅ Regional deployment: Choose EU regions for data residency
✅ 99.9% uptime SLA (OpenAI has no SLA guarantee)
Azure Authentication Without API Keys
// Production pattern: use Managed Identity (no API keys):
import { DefaultAzureCredential, getBearerTokenProvider } from '@azure/identity';
import { AzureOpenAI } from 'openai';
const credential = new DefaultAzureCredential();
const scope = 'https://cognitiveservices.azure.com/.default';
const azureADTokenProvider = getBearerTokenProvider(credential, scope);
const client = new AzureOpenAI({
  endpoint: process.env.AZURE_OPENAI_ENDPOINT!,
  azureADTokenProvider, // No API key needed when running in Azure
  apiVersion: '2024-12-01-preview',
  deployment: 'gpt-4o',
});
Side-by-Side Comparison
| | Cloudflare Workers AI | AWS Bedrock | Azure OpenAI |
|---|---|---|---|
| Best for | Edge latency, global apps | AWS orgs, multi-model | Microsoft orgs, GPT-4o |
| Latency | ~30-50ms globally | ~100-300ms | ~100-200ms (US) |
| Model count | 50+ | 50+ | OpenAI models only |
| GPT-4o access | ❌ | ❌ | ✅ |
| Llama access | ✅ | ✅ | ❌ |
| Claude access | ❌ | ✅ | ❌ |
| HIPAA | Limited | ✅ | ✅ |
| Private endpoint | ✅ Workers | ✅ VPC | ✅ Private Link |
| Fine-tuning | ❌ | Limited | ✅ (GPT-4o mini) |
| RAG built-in | ✅ Vectorize | ✅ Knowledge Bases | ✅ AI Search |
| Free tier | 10K neurons/day | No free tier | $200 credit |
| Pricing model | Per neuron | Per token | Per token |
Decision Framework
Choose CLOUDFLARE WORKERS AI if:
→ Your app already runs on Cloudflare Workers
→ Globally distributed users, latency is critical
→ You need inference with zero cold starts
→ Simple use case: Llama, embeddings, image gen
Choose AWS BEDROCK if:
→ Your infrastructure is on AWS
→ You need access to multiple model providers (Llama, Claude, Mistral, Titan)
→ You want managed RAG with S3 data sources
→ Enterprise governance and model access control is required
→ You want Claude without going to Anthropic directly
Choose AZURE OPENAI if:
→ You need GPT-4o or o1 specifically
→ HIPAA, FedRAMP, or government compliance required
→ Your organization is Microsoft-certified / Azure-native
→ You need private endpoints and no data training guarantee
→ Existing Azure AD for authentication (Managed Identity)
Use NONE of these if:
→ You're a startup/indie dev (use OpenAI/Anthropic direct — simpler, cheaper)
→ You need bleeding-edge models first (hyperscalers lag direct providers by weeks)
→ You're optimizing for cost (direct APIs are cheaper, no cloud markup)
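The framework above can be condensed into a small helper. The flags and priority order are this article's heuristics, not an official decision rule:

```typescript
type Provider = 'azure-openai' | 'aws-bedrock' | 'cloudflare-workers-ai' | 'direct-api';

interface Requirements {
  needsGpt4o?: boolean;            // GPT-4o / o1 specifically
  needsStrictCompliance?: boolean; // HIPAA / FedRAMP
  onAws?: boolean;                 // Infrastructure already on AWS
  needsMultiModel?: boolean;       // Llama + Claude + Mistral from one API
  edgeLatencyCritical?: boolean;   // Globally distributed, latency-sensitive
}

// Priority mirrors the decision framework above: hard requirements
// (specific models, compliance) first, then infrastructure fit, then latency.
function chooseProvider(req: Requirements): Provider {
  if (req.needsGpt4o || req.needsStrictCompliance) return 'azure-openai';
  if (req.onAws || req.needsMultiModel) return 'aws-bedrock';
  if (req.edgeLatencyCritical) return 'cloudflare-workers-ai';
  return 'direct-api'; // Startups / bleeding-edge models: go direct to OpenAI or Anthropic
}
```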
Code: Switching Between Providers
Azure OpenAI and Cloudflare Workers AI both expose OpenAI-compatible endpoints; Bedrock uses its own SDK, so route it through an OpenAI-compatible gateway (e.g. LiteLLM) or call the AWS SDK directly:
// Universal client factory:
import OpenAI from 'openai';
type CloudProvider = 'cloudflare' | 'bedrock' | 'azure' | 'openai';
function createClient(provider: CloudProvider): OpenAI {
  switch (provider) {
    case 'azure':
      return new OpenAI({
        apiKey: process.env.AZURE_OPENAI_API_KEY,
        baseURL: `${process.env.AZURE_OPENAI_ENDPOINT}/openai/deployments/${process.env.AZURE_OPENAI_DEPLOYMENT}`,
        defaultQuery: { 'api-version': '2024-12-01-preview' },
        defaultHeaders: { 'api-key': process.env.AZURE_OPENAI_API_KEY },
      });
    case 'openai':
      return new OpenAI({ apiKey: process.env.OPENAI_API_KEY });
    case 'cloudflare':
      // Workers AI also exposes an OpenAI-compatible REST endpoint:
      return new OpenAI({
        apiKey: process.env.CLOUDFLARE_API_TOKEN,
        baseURL: `https://api.cloudflare.com/client/v4/accounts/${process.env.CLOUDFLARE_ACCOUNT_ID}/ai/v1`,
      });
    case 'bedrock':
      // Bedrock signs requests with SigV4, so the OpenAI client can't call it directly.
      // Use @aws-sdk/client-bedrock-runtime, or an OpenAI-compatible proxy such as LiteLLM.
      throw new Error('Bedrock requires the AWS SDK or an OpenAI-compatible gateway');
    default:
      throw new Error(`Unknown provider: ${provider}`);
  }
}
Discover and compare managed AI APIs at APIScout.