Cloudflare Workers AI vs AWS Bedrock vs Azure OpenAI

APIScout Team
Tags: cloudflare-workers-ai · aws-bedrock · azure-openai · managed-ai · enterprise-ai · 2026

TL;DR

If you're already on AWS, use Bedrock. Already on Azure, use Azure OpenAI. Building edge apps, use Cloudflare Workers AI. These three managed AI platforms differ more in philosophy than capability: Cloudflare runs inference at 300+ edge locations (sub-50ms globally), AWS Bedrock gives you the widest enterprise model catalog with built-in governance, and Azure OpenAI gives you GPT-4o with enterprise SLAs and compliance certifications. None of them is best in isolation — your existing cloud infrastructure almost always determines the right choice.

Key Takeaways

  • Cloudflare Workers AI: fastest globally distributed inference, 50+ models, $0.011/1K neurons beyond the included allocation, no cold starts
  • AWS Bedrock: 50+ models (Llama, Mistral, Claude, Titan), pay-per-token, IAM integration, best for existing AWS infrastructure
  • Azure OpenAI: GPT-4o + o1 with enterprise SLAs, private endpoints, SOC2/HIPAA compliance, best for Microsoft shops
  • Latency: Cloudflare ~30-50ms globally, AWS/Azure ~100-300ms from non-primary regions
  • Enterprise compliance: Azure OpenAI wins (HIPAA, FedRAMP), Bedrock close behind
  • Multi-model access: Bedrock wins (Claude, Llama, Mistral, Titan, Cohere, AI21 all unified)

Cloudflare Workers AI: Inference at the Edge

Best for: globally distributed apps, low-latency requirements, serverless-first architecture, Cloudflare Workers projects

Cloudflare runs AI inference on their global network — the same infrastructure that handles ~20% of web traffic. When your user is in Tokyo, the model runs in Tokyo. In Frankfurt, Frankfurt.

// workers-ai.ts — Running inference at the edge:
export interface Env {
  AI: Ai;
}

export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    const { prompt } = await request.json<{ prompt: string }>();

    // Text generation:
    const response = await env.AI.run('@cf/meta/llama-3.3-70b-instruct', {
      messages: [
        { role: 'system', content: 'You are a helpful assistant.' },
        { role: 'user', content: prompt },
      ],
      max_tokens: 512,
      stream: false,
    });

    return Response.json(response);
  },
};
// Streaming response from Workers AI:
export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    const stream = await env.AI.run('@cf/meta/llama-3.3-70b-instruct', {
      messages: [{ role: 'user', content: 'Explain DNS in 3 sentences.' }],
      stream: true,
    });

    // stream is a ReadableStream of SSE chunks:
    return new Response(stream, {
      headers: {
        'Content-Type': 'text/event-stream',
        'Cache-Control': 'no-cache',
      },
    });
  },
};
# wrangler.toml:
name = "ai-worker"
main = "src/index.ts"
compatibility_date = "2026-01-01"

[ai]
binding = "AI"

Workers AI Models (2026)

Text Generation:
  @cf/meta/llama-3.3-70b-instruct     — Latest Llama, best quality
  @cf/meta/llama-3.1-8b-instruct      — Fast, cheap, good for simple tasks
  @cf/mistral/mistral-7b-instruct-v0.2 — Good general purpose
  @cf/qwen/qwen1.5-7b-chat-awq        — Efficient, multilingual

Code Generation:
  @cf/deepseek-ai/deepseek-coder-6.7b-instruct — Code-specialized
  @hf/thebloke/deepseek-coder-6.7b-instruct-awq

Text Embeddings:
  @cf/baai/bge-small-en-v1.5         — 384 dims, fast
  @cf/baai/bge-large-en-v1.5         — 1024 dims, better quality

Image Generation:
  @cf/stabilityai/stable-diffusion-xl-base-1.0
  @cf/bytedance/stable-diffusion-xl-lightning

Vision:
  @cf/llava-hf/llava-1.5-7b-hf       — Vision + text

Speech:
  @cf/openai/whisper                  — Transcription
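The bge embedding models return plain number arrays (`data[i]` in the response), so when you don't need Vectorize you can rank results yourself with cosine similarity. A minimal sketch — `cosineSimilarity` is our helper, not part of the Workers AI SDK:

```typescript
// Cosine similarity between two embedding vectors, e.g. the 384-dim
// output of @cf/baai/bge-small-en-v1.5. Returns a value in [-1, 1]:
// 1 means identical direction, 0 means unrelated.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// In a Worker, the vectors would come from the embedding model:
//   const { data } = await env.AI.run('@cf/baai/bge-small-en-v1.5', {
//     text: ['first document', 'second document'],
//   });
//   const score = cosineSimilarity(data[0], data[1]);
```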

Workers AI Pricing

Billing unit: "Neurons" (compute units)

Text generation (Llama 70B):
  Input:  ~0.027 neurons/token
  Output: ~0.027 neurons/token
  Cost:   $0.011/1K neurons
  Effective: ~$0.30-0.60/M tokens (varies by model)

Free tier: 10,000 neurons/day
Workers Paid plan ($5/month): 10M neurons/month included
Beyond: $0.011/1K neurons

Workers AI is significantly cheaper for medium loads because the $5/month Workers plan includes a generous neuron allocation.
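That arithmetic can be sketched as a quick estimator. The constants are the figures quoted above (~0.027 neurons/token for Llama 70B, $0.011/1K neurons, 10M neurons included with the $5 plan) — they vary by model, so treat the result as an order of magnitude, not a quote:

```typescript
// Back-of-envelope Workers AI monthly cost on the Workers Paid plan.
const NEURONS_PER_TOKEN = 0.027;     // ~Llama 70B rate from the table above
const USD_PER_1K_NEURONS = 0.011;
const INCLUDED_NEURONS = 10_000_000; // bundled with the $5/month plan

function estimateMonthlyCost(tokensPerMonth: number): number {
  const neurons = tokensPerMonth * NEURONS_PER_TOKEN;
  const overage = Math.max(0, neurons - INCLUDED_NEURONS);
  return 5 + (overage / 1000) * USD_PER_1K_NEURONS; // $5 base + overage
}
```

At 100M tokens/month you stay inside the included 10M neurons and pay only the $5 base; around 1B tokens/month the overage starts to dominate.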


AWS Bedrock: Enterprise Model Catalog

Best for: AWS-heavy organizations, teams needing model governance, fine-tuning requirements, RAG with S3 data

AWS Bedrock is not just one model — it's a managed gateway to 50+ foundation models from Meta, Mistral, Anthropic, AI21, Cohere, and Amazon's own Titan models, all accessible with the same IAM-authenticated API.

// AWS Bedrock with @aws-sdk/client-bedrock-runtime:
import {
  BedrockRuntimeClient,
  InvokeModelCommand,
  InvokeModelWithResponseStreamCommand,
} from '@aws-sdk/client-bedrock-runtime';

const client = new BedrockRuntimeClient({
  region: 'us-east-1',
  // Uses IAM role or env vars (AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY)
});

// Invoke Llama 3.3 70B:
async function invokeLlama(prompt: string) {
  const payload = {
    prompt: `<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n${prompt}<|eot_id|><|start_header_id|>assistant<|end_header_id|>`,
    max_gen_len: 512,
    temperature: 0.7,
    top_p: 0.9,
  };

  const command = new InvokeModelCommand({
    modelId: 'meta.llama3-3-70b-instruct-v1:0',
    body: JSON.stringify(payload),
    contentType: 'application/json',
    accept: 'application/json',
  });

  const response = await client.send(command);
  const decoded = JSON.parse(Buffer.from(response.body).toString('utf-8'));
  return decoded.generation;
}
// Bedrock Converse API — unified format across ALL models:
import { ConverseCommand } from '@aws-sdk/client-bedrock-runtime';

async function converse(modelId: string, userMessage: string) {
  const command = new ConverseCommand({
    modelId,
    messages: [
      { role: 'user', content: [{ text: userMessage }] },
    ],
    inferenceConfig: {
      maxTokens: 512,
      temperature: 0.7,
    },
  });

  const response = await client.send(command);
  return response.output?.message?.content?.[0]?.text;
}

// Same API works for ALL Bedrock models:
await converse('meta.llama3-3-70b-instruct-v1:0', 'Hello');
await converse('mistral.mistral-7b-instruct-v0:2', 'Hello');
await converse('anthropic.claude-3-5-sonnet-20241022-v2:0', 'Hello');
await converse('amazon.titan-text-premier-v1:0', 'Hello');
// Bedrock streaming with Converse:
import { ConverseStreamCommand } from '@aws-sdk/client-bedrock-runtime';

async function* converseStream(modelId: string, message: string) {
  const command = new ConverseStreamCommand({
    modelId,
    messages: [{ role: 'user', content: [{ text: message }] }],
  });

  const response = await client.send(command);

  for await (const event of response.stream!) {
    if (event.contentBlockDelta?.delta?.text) {
      yield event.contentBlockDelta.delta.text;
    }
  }
}

// Usage:
for await (const chunk of converseStream('meta.llama3-3-70b-instruct-v1:0', 'Explain TLS')) {
  process.stdout.write(chunk);
}

Bedrock Model Catalog (Key Models)

Provider  | Model ID                                  | Notes
----------|-------------------------------------------|---------------------------
Meta      | meta.llama3-3-70b-instruct-v1:0           | Best open model on Bedrock
Mistral   | mistral.mistral-large-2402-v1:0           | Strong reasoning
Anthropic | anthropic.claude-3-5-sonnet-20241022-v2:0 | Best quality, higher cost
Amazon    | amazon.titan-text-premier-v1:0            | AWS-native, good for RAG
Cohere    | cohere.command-r-plus-v1:0                | Best for long-context RAG
AI21      | ai21.jamba-1-5-large-v1:0                 | Long context (256K)

Bedrock Knowledge Bases (RAG Built-In)

Bedrock's killer feature for enterprise: managed RAG with S3 data sources.

import {
  BedrockAgentRuntimeClient,
  RetrieveAndGenerateCommand,
} from '@aws-sdk/client-bedrock-agent-runtime';

const agentClient = new BedrockAgentRuntimeClient({ region: 'us-east-1' });

async function ragQuery(query: string, knowledgeBaseId: string) {
  const command = new RetrieveAndGenerateCommand({
    input: { text: query },
    retrieveAndGenerateConfiguration: {
      type: 'KNOWLEDGE_BASE',
      knowledgeBaseConfiguration: {
        knowledgeBaseId,
        modelArn: 'arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-3-5-sonnet-20241022-v2:0',
        retrievalConfiguration: {
          vectorSearchConfiguration: { numberOfResults: 5 },
        },
      },
    },
  });

  const response = await agentClient.send(command);
  return {
    answer: response.output?.text,
    citations: response.citations,
  };
}

Bedrock Pricing

On-Demand pricing (us-east-1):

Model                        | Input $/1M  | Output $/1M
-----------------------------|------------|------------
Llama 3.3 70B Instruct       | $0.72      | $0.99
Mistral Large (2402)         | $4.00      | $12.00
Claude 3.5 Sonnet            | $3.00      | $15.00
Amazon Titan Text Premier    | $0.50      | $1.50
Cohere Command R+            | $3.00      | $15.00

Provisioned Throughput: pre-buy capacity for consistent high-volume use
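The on-demand table translates directly into a per-request cost function. The prices below are the us-east-1 figures quoted above — check the AWS pricing page before relying on them:

```typescript
// Per-request Bedrock cost in USD, from the on-demand table above
// (prices are per 1M tokens, us-east-1).
const BEDROCK_PRICES: Record<string, { input: number; output: number }> = {
  'meta.llama3-3-70b-instruct-v1:0': { input: 0.72, output: 0.99 },
  'anthropic.claude-3-5-sonnet-20241022-v2:0': { input: 3.0, output: 15.0 },
  'amazon.titan-text-premier-v1:0': { input: 0.5, output: 1.5 },
};

function requestCost(modelId: string, inputTokens: number, outputTokens: number): number {
  const p = BEDROCK_PRICES[modelId];
  if (!p) throw new Error(`No pricing entry for ${modelId}`);
  return (inputTokens / 1e6) * p.input + (outputTokens / 1e6) * p.output;
}
```

Note the asymmetry: a Claude request with a long generated answer costs far more than the same request against Llama 3.3, because output tokens are priced 15x higher.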

Azure OpenAI: Enterprise GPT with Microsoft Compliance

Best for: organizations requiring HIPAA/FedRAMP compliance, Microsoft/Azure shops, GPT-4o access with enterprise SLAs, government and healthcare

Azure OpenAI is Microsoft-hosted OpenAI models — the exact same models (GPT-4o, o1, DALL-E 3) but with Azure's enterprise wrapper: private endpoints, customer-managed encryption keys, content filtering policies, audit logs, and compliance certifications.

// Azure OpenAI uses the same SDK as OpenAI — just different endpoint:
import OpenAI from 'openai';

const azureClient = new OpenAI({
  apiKey: process.env.AZURE_OPENAI_API_KEY,
  baseURL: `https://${process.env.AZURE_OPENAI_ENDPOINT}/openai/deployments/${process.env.AZURE_OPENAI_DEPLOYMENT_NAME}`,
  defaultQuery: { 'api-version': '2024-12-01-preview' },
  defaultHeaders: { 'api-key': process.env.AZURE_OPENAI_API_KEY },
});

// Or use the official Azure OpenAI SDK for stronger typing:
import { AzureOpenAI } from 'openai';

const client = new AzureOpenAI({
  endpoint: process.env.AZURE_OPENAI_ENDPOINT!,
  apiKey: process.env.AZURE_OPENAI_API_KEY!,
  apiVersion: '2024-12-01-preview',
  deployment: 'gpt-4o',  // Your deployment name
});
// Chat completion (identical to OpenAI SDK):
const response = await client.chat.completions.create({
  model: 'gpt-4o',     // Uses your deployment name
  messages: [
    { role: 'system', content: 'You are a helpful assistant.' },
    { role: 'user', content: 'Summarize this document...' },
  ],
  max_tokens: 1024,
  temperature: 0.7,
});

console.log(response.choices[0].message.content);
// Azure OpenAI with function calling (same as OpenAI):
const tools = [
  {
    type: 'function' as const,
    function: {
      name: 'search_documents',
      description: 'Search enterprise document store',
      strict: true,
      parameters: {
        type: 'object',
        properties: {
          query: { type: 'string' },
          department: { type: 'string', enum: ['legal', 'hr', 'finance', 'engineering'] },
          date_range: { type: 'string', description: 'ISO date range, e.g. 2025-01-01/2025-12-31' },
        },
        required: ['query'],
        additionalProperties: false,
      },
    },
  },
];

const response = await client.chat.completions.create({
  model: 'gpt-4o',
  messages: [{ role: 'user', content: 'Find all legal contracts from Q4 2025' }],
  tools,
  tool_choice: 'auto',
});
// Azure OpenAI Assistants API (full stateful conversations):
const assistant = await client.beta.assistants.create({
  name: 'Enterprise Support Agent',
  instructions: 'You help employees with HR and IT questions.',
  model: 'gpt-4o',
  tools: [{ type: 'file_search' }, { type: 'code_interpreter' }],
});

const thread = await client.beta.threads.create();

await client.beta.threads.messages.create(thread.id, {
  role: 'user',
  content: 'What is the vacation policy for US employees?',
});

const run = await client.beta.threads.runs.createAndPoll(thread.id, {
  assistant_id: assistant.id,
});

Azure OpenAI Models (2026)

Model                  | Deployment             | Notes
-----------------------|------------------------|---------------------
GPT-4o                 | gpt-4o                 | Latest, best quality
GPT-4o mini            | gpt-4o-mini            | Cheaper, fast
o1                     | o1                     | Reasoning model
o3-mini                | o3-mini                | Fast reasoning
DALL-E 3               | dall-e-3               | Image generation
Whisper                | whisper                | Speech transcription
text-embedding-3-large | text-embedding-3-large | Best embeddings

Enterprise Features Only Azure Has

✅ Private Endpoints — Your traffic never leaves Azure backbone
✅ Customer-Managed Keys — Encrypt with your own Azure Key Vault keys
✅ Content Filtering — Customizable harm categories per deployment
✅ Managed Identity — No API keys needed, uses Azure AD
✅ Compliance: SOC2, HIPAA, FedRAMP High, ISO 27001, GDPR
✅ No data training: Your data is NOT used to train OpenAI models
✅ Regional deployment: Choose EU regions for data residency
✅ 99.9% uptime SLA (OpenAI has no SLA guarantee)

Azure Authentication Without API Keys

// Production pattern: use Managed Identity (no API keys):
import { DefaultAzureCredential, getBearerTokenProvider } from '@azure/identity';
import { AzureOpenAI } from 'openai';

const credential = new DefaultAzureCredential();
const scope = 'https://cognitiveservices.azure.com/.default';
const azureADTokenProvider = getBearerTokenProvider(credential, scope);

const client = new AzureOpenAI({
  endpoint: process.env.AZURE_OPENAI_ENDPOINT!,
  azureADTokenProvider,  // No API key needed when running in Azure
  apiVersion: '2024-12-01-preview',
  deployment: 'gpt-4o',
});

Side-by-Side Comparison

                 | Cloudflare Workers AI     | AWS Bedrock           | Azure OpenAI
-----------------|---------------------------|-----------------------|------------------------
Best for         | Edge latency, global apps | AWS orgs, multi-model | Microsoft orgs, GPT-4o
Latency          | ~30-50ms globally         | ~100-300ms            | ~100-200ms (US)
Model count      | 50+                       | 50+                   | OpenAI models only
GPT-4o access    | ❌                        | ❌                    | ✅
Llama access     | ✅                        | ✅                    | ❌
Claude access    | ❌                        | ✅                    | ❌
HIPAA            | Limited                   | ✅                    | ✅
Private endpoint | ✅ Workers                | ✅ VPC                | ✅ Private Link
Fine-tuning      | Limited                   | ✅                    | ✅ (GPT-4o mini)
RAG built-in     | ✅ Vectorize              | ✅ Knowledge Bases    | ✅ AI Search
Free tier        | 10K neurons/day           | No free tier          | $200 credit
Pricing model    | Per neuron                | Per token             | Per token

Decision Framework

Choose CLOUDFLARE WORKERS AI if:
  → Your app already runs on Cloudflare Workers
  → Globally distributed users, latency is critical
  → You need inference with zero cold starts
  → Simple use case: Llama, embeddings, image gen

Choose AWS BEDROCK if:
  → Your infrastructure is on AWS
  → You need access to multiple model providers (Llama, Claude, Mistral, Titan)
  → You want managed RAG with S3 data sources
  → Enterprise governance and model access control is required
  → You want Claude without going to Anthropic directly

Choose AZURE OPENAI if:
  → You need GPT-4o or o1 specifically
  → HIPAA, FedRAMP, or government compliance required
  → Your organization is Microsoft-certified / Azure-native
  → You need private endpoints and no data training guarantee
  → Existing Azure AD for authentication (Managed Identity)

Use NONE of these if:
  → You're a startup/indie dev (use OpenAI/Anthropic direct — simpler, cheaper)
  → You need bleeding-edge models first (hyperscalers lag direct providers by weeks)
  → You're optimizing for cost (direct APIs are cheaper, no cloud markup)
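The checklist above collapses into a small helper for illustration. The predicate names and their precedence are our simplification of the framework, not an official decision tree:

```typescript
type Cloud = 'aws' | 'azure' | 'cloudflare' | 'none';

interface Requirements {
  primaryCloud: Cloud;
  needsGpt4o?: boolean;          // GPT-4o / o1 specifically
  strictCompliance?: boolean;    // HIPAA, FedRAMP, government
  edgeLatencyCritical?: boolean; // globally distributed users
}

function chooseProvider(req: Requirements): string {
  // Hard requirements first: only Azure OpenAI serves GPT-4o with
  // FedRAMP-grade compliance.
  if (req.needsGpt4o || req.strictCompliance) return 'azure-openai';
  // Edge latency (or an existing Workers stack) points at Cloudflare.
  if (req.edgeLatencyCritical || req.primaryCloud === 'cloudflare') return 'workers-ai';
  // Otherwise, follow the existing cloud.
  if (req.primaryCloud === 'aws') return 'bedrock';
  if (req.primaryCloud === 'azure') return 'azure-openai';
  // No cloud lock-in: go straight to OpenAI/Anthropic.
  return 'direct-api';
}
```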

Code: Switching Between Providers

Azure OpenAI speaks the OpenAI API natively, and Cloudflare exposes an OpenAI-compatible REST endpoint for Workers AI; Bedrock does not — it needs its own SDK or an OpenAI-compatible proxy such as LiteLLM. A factory that handles each case:

// Universal client factory:
import OpenAI from 'openai';

type CloudProvider = 'cloudflare' | 'bedrock' | 'azure' | 'openai';

function createClient(provider: CloudProvider): OpenAI {
  switch (provider) {
    case 'azure':
      return new OpenAI({
        apiKey: process.env.AZURE_OPENAI_API_KEY,
        baseURL: `${process.env.AZURE_OPENAI_ENDPOINT}/openai/deployments/${process.env.AZURE_OPENAI_DEPLOYMENT}`,
        defaultQuery: { 'api-version': '2024-12-01-preview' },
        defaultHeaders: { 'api-key': process.env.AZURE_OPENAI_API_KEY },
      });

    case 'cloudflare':
      // Workers AI exposes an OpenAI-compatible REST endpoint:
      return new OpenAI({
        apiKey: process.env.CLOUDFLARE_API_TOKEN,
        baseURL: `https://api.cloudflare.com/client/v4/accounts/${process.env.CLOUDFLARE_ACCOUNT_ID}/ai/v1`,
      });

    case 'openai':
      return new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

    case 'bedrock':
      // Bedrock is not OpenAI-compatible at the API level. Use
      // @aws-sdk/client-bedrock-runtime directly (see above), or put an
      // OpenAI-compatible proxy such as LiteLLM in front of it.
      throw new Error('Bedrock requires its own SDK or an OpenAI-compatible proxy');

    default:
      throw new Error(`Unknown provider: ${provider}`);
  }
}

Discover and compare managed AI APIs at APIScout.
