<!-- APIScout AI-readable guide source -->
<!-- Canonical: https://apiscout.dev/guides/anthropic-claude-api-complete-developer-guide-2026 -->
<!-- Raw Markdown: https://apiscout.dev/guides/anthropic-claude-api-complete-developer-guide-2026/raw.md -->
<!-- Source path: content/guides/anthropic-claude-api-complete-developer-guide-2026.mdx -->

---
og_image: "/images/guides/anthropic-claude-api-complete-developer-guide-2026.webp"
title: "Anthropic Claude API: Developer Guide 2026"
description: "Claude API guide for 2026: model selection, prompt caching, extended thinking, tool use, vision, streaming, and cost optimization with code examples."
date: "2026-03-08"
author: "APIScout Team"
tags: ["anthropic", "claude", "ai-api", "llm", "tool-use", "prompt-caching", "2026"]
tier: 1
---

## TL;DR

**Claude Sonnet 4.6 is the best model for most production use cases**: top-tier coding, excellent instruction following, and strong value at $3/$15 per 1M input/output tokens. Use **Haiku 4.5** for high-volume simple tasks ($1/$5 per 1M tokens), **Sonnet 4.6** for almost everything else, and **Opus 4.6** for the most demanding reasoning, agentic workflows, and complex coding ($5/$25). The API is OpenAI-compatible-ish, but key differences in tool use, content blocks, and prompt caching make the switch non-trivial. Here's everything you need.

## Key Takeaways

- **Model lineup**: Haiku 4.5 (fast/cheap) → Sonnet 4.6 (best value) → Opus 4.6 (most capable)
- **Prompt caching**: up to 90% cost reduction on repeated context — killer feature for RAG and chatbots
- **Adaptive thinking**: Claude dynamically decides when and how deeply to reason, which dramatically improves results on complex tasks
- **Tool use**: `stop_reason: "tool_use"` pattern, mixed text+tool content blocks in same response
- **Vision**: images in `image` content blocks, supports base64 and URLs
- **Context window**: 200K tokens on all models, with a 1M-token beta on Sonnet 4.6 and Opus 4.6

---

## Models and Pricing (2026)

| Model | Input $/1M | Output $/1M | Context | Best For |
|-------|-----------|------------|---------|----------|
| **claude-haiku-4-5** | $1.00 | $5.00 | 200K | High-volume, fast tasks |
| **claude-sonnet-4-6** | $3.00 | $15.00 | 200K (1M beta) | Most production use cases |
| **claude-opus-4-6** | $5.00 | $25.00 | 200K (1M beta) | Agentic workflows, complex reasoning |

**Recommendation**: Default to `claude-sonnet-4-6` for most production use cases. Use `claude-opus-4-6` for agentic coding and complex reasoning tasks.
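
To make the table concrete, here is a minimal sketch of a per-request cost estimate. The `PRICES_PER_MTOK` map and `estimateCostUSD` helper are illustrative names, not part of the SDK, and the rates are simply copied from the table above, so re-check them against Anthropic's pricing page before relying on the numbers.

```typescript
// Prices in USD per 1M tokens, copied from the table above (assumed current).
const PRICES_PER_MTOK = {
  'claude-haiku-4-5': { input: 1, output: 5 },
  'claude-sonnet-4-6': { input: 3, output: 15 },
  'claude-opus-4-6': { input: 5, output: 25 },
} as const;

type ModelId = keyof typeof PRICES_PER_MTOK;

// Hypothetical helper: estimates the cost of one uncached, non-batched request.
function estimateCostUSD(model: ModelId, inputTokens: number, outputTokens: number): number {
  const price = PRICES_PER_MTOK[model];
  return (inputTokens * price.input + outputTokens * price.output) / 1_000_000;
}

// Example: 2,000 input tokens and 500 output tokens on each model.
for (const model of Object.keys(PRICES_PER_MTOK) as ModelId[]) {
  console.log(model, estimateCostUSD(model, 2_000, 500).toFixed(4));
}
```

The estimate ignores prompt caching and batch discounts, which are covered later in this guide.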

---

## Basic Setup

```typescript
// npm install @anthropic-ai/sdk
import Anthropic from '@anthropic-ai/sdk';

const anthropic = new Anthropic({
  apiKey: process.env.ANTHROPIC_API_KEY,
});

// Basic completion:
const message = await anthropic.messages.create({
  model: 'claude-sonnet-4-6',
  max_tokens: 1024,
  messages: [
    { role: 'user', content: 'Explain async/await in 3 sentences.' }
  ],
});

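// `content` is an array of typed blocks, so narrow before reading `.text`:
const first = message.content[0];
if (first.type === 'text') {
  console.log(first.text);
}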
```

```typescript
// With system prompt:
const message = await anthropic.messages.create({
  model: 'claude-sonnet-4-6',
  max_tokens: 2048,
  system: `You are an expert TypeScript engineer. Be concise and precise.
Always include types in code examples. Use const over let.`,
  messages: [
    { role: 'user', content: 'Write a retry wrapper for async functions.' }
  ],
});
```

---

## Streaming

```typescript
// Streaming with the SDK's MessageStream helper:
const stream = anthropic.messages.stream({
  model: 'claude-sonnet-4-6',
  max_tokens: 1024,
  messages: [{ role: 'user', content: 'Write a haiku about TypeScript.' }],
});

// Stream text chunks as they arrive:
stream.on('text', (text) => {
  process.stdout.write(text);
});

// Wait for the stream to finish and get the final message:
const finalMessage = await stream.finalMessage();
console.log(finalMessage.usage);  // e.g. { input_tokens: 12, output_tokens: 42, ... }
```

```typescript
// Server-Sent Events for Next.js App Router:
export async function POST(req: Request) {
  const { messages } = await req.json();

  const stream = await anthropic.messages.stream({
    model: 'claude-sonnet-4-6',
    max_tokens: 1024,
    messages,
  });

  // Convert to ReadableStream for Response:
  return new Response(
    new ReadableStream({
      async start(controller) {
        const encoder = new TextEncoder();
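        // The stream is an async iterable of raw events; forward only text deltas:
        for await (const event of stream) {
          if (event.type === 'content_block_delta' && event.delta.type === 'text_delta') {
            controller.enqueue(encoder.encode(`data: ${JSON.stringify({ text: event.delta.text })}\n\n`));
          }
        }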
        controller.enqueue(encoder.encode('data: [DONE]\n\n'));
        controller.close();
      },
    }),
    { headers: { 'Content-Type': 'text/event-stream' } }
  );
}
```
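
On the client, the `data: ...` frames emitted by the route above have to be split on blank lines and decoded. The following is a minimal sketch of that step; `parseSseFrames` and `ParsedFrames` are hypothetical names, and it assumes the exact frame format used in the handler (one JSON object with a `text` field per frame, then a `[DONE]` sentinel).

```typescript
// Hypothetical client-side helper for the `data: {...}\n\n` frames emitted above.
interface ParsedFrames {
  texts: string[];  // decoded `text` payloads, in order
  done: boolean;    // true once the `[DONE]` sentinel has been seen
  rest: string;     // incomplete trailing frame to prepend to the next chunk
}

function parseSseFrames(buffer: string): ParsedFrames {
  const frames = buffer.split('\n\n');
  const rest = frames.pop() ?? '';  // the last piece has no terminator yet
  const texts: string[] = [];
  let done = false;

  for (const frame of frames) {
    if (!frame.startsWith('data: ')) continue;
    const payload = frame.slice('data: '.length);
    if (payload === '[DONE]') {
      done = true;
      break;
    }
    const parsed = JSON.parse(payload) as { text: string };
    texts.push(parsed.text);
  }
  return { texts, done, rest };
}

// Example: two complete frames followed by a partial one.
const sseExample = parseSseFrames('data: {"text":"Hel"}\n\ndata: {"text":"lo"}\n\ndata: {"te');
console.log(sseExample.texts.join(''), sseExample.done, JSON.stringify(sseExample.rest));
```

Because the server JSON-encodes each payload, newlines inside the model's text are escaped and cannot be mistaken for a frame boundary. Carry `rest` over and prepend it to the next network chunk before parsing again.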

---

## Prompt Caching: 90% Cost Reduction

Prompt caching is Anthropic's biggest cost-optimization feature. If you're sending the same long system prompt or context repeatedly, mark it for caching and pay roughly 90% less for those cached tokens on subsequent requests.

```typescript
// Without caching: pay full price for system prompt on every request
// With caching: pay once (write), then ~10% on subsequent reads

const systemPrompt = `You are an expert software architect with 20 years of experience.
[...imagine 10,000 tokens of detailed instructions, examples, and context...]
`;

// Mark the system prompt for caching:
const message = await anthropic.messages.create({
  model: 'claude-sonnet-4-6',
  max_tokens: 1024,
  system: [
    {
      type: 'text',
      text: systemPrompt,
      cache_control: { type: 'ephemeral' },  // ← Enable caching
    },
  ],
  messages: [
    { role: 'user', content: 'Review this PR description...' }
  ],
});

// First request: cache write, billed at a 25% premium over the base input price
// Subsequent requests within 5 minutes: cache read, ~90% cheaper than uncached input
```

```typescript
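// Cache large context (like documentation or a codebase):
import fs from 'fs';

const docsContext = fs.readFileSync('docs/api-reference.md', 'utf-8');
const userQuestion = 'How do I paginate list endpoints?';  // placeholder; varies per request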

const response = await anthropic.messages.create({
  model: 'claude-sonnet-4-6',
  max_tokens: 1024,
  system: 'You are a helpful developer support assistant.',
  messages: [
    {
      role: 'user',
      content: [
        {
          type: 'text',
          text: `Here is our API documentation:\n\n${docsContext}`,
          cache_control: { type: 'ephemeral' },  // Cache the long context
        },
        {
          type: 'text',
          text: userQuestion,  // Not cached — changes each request
        },
      ],
    },
  ],
});

// Check cache status:
console.log(response.usage);
// { cache_creation_input_tokens: 15000, cache_read_input_tokens: 0, input_tokens: 50 }
// → First request: writing cache
// On second request: cache_read_input_tokens: 15000 (90% cheaper)
```

**Rules for caching to work:**
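- Minimum 1024 tokens to cache (shorter prompts are processed without caching; check the docs for per-model minimums)
- Everything up to and including the marked block must be identical across requests (any change = new cache write)
- Cache TTL: 5 minutes for `ephemeral` type, refreshed each time the cache is read
- Cache position: put `cache_control` on the last block of the stable prefix in the system/user content; varying content goes after it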
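
As a rough sketch of what these rules mean for cost, the arithmetic below compares sending the same prefix with and without caching. `prefixCostUSD` is a hypothetical helper; the multipliers come from the figures quoted above (25% write premium, ~90% read discount), and it assumes every request after the first arrives within the cache TTL.

```typescript
// Multipliers taken from the text above: writes cost 1.25x base input, reads ~0.1x.
const CACHE_WRITE_MULTIPLIER = 1.25;
const CACHE_READ_MULTIPLIER = 0.1;

// Hypothetical helper: input cost of sending the same prefix `requests` times,
// assuming every request after the first is a cache hit.
function prefixCostUSD(prefixTokens: number, requests: number, inputPricePerMTok: number) {
  const base = (prefixTokens * inputPricePerMTok) / 1_000_000;
  const uncached = base * requests;
  const cached =
    requests === 0
      ? 0
      : base * CACHE_WRITE_MULTIPLIER + base * CACHE_READ_MULTIPLIER * (requests - 1);
  const savedFraction = uncached === 0 ? 0 : 1 - cached / uncached;
  return { uncached, cached, savedFraction };
}

// Example: a 10,000-token system prompt on Sonnet 4.6 ($3/1M input), 1 vs 100 requests.
console.log(prefixCostUSD(10_000, 1, 3));
console.log(prefixCostUSD(10_000, 100, 3));
```

With these multipliers a single request costs 25% more than it would uncached, and caching pays for itself from the second hit onward. The savings approach 90% only as the number of hits grows.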

---

## Adaptive Thinking

Adaptive thinking enables Claude to dynamically decide when and how deeply to reason before producing its final response — dramatically improving accuracy on math, coding, and complex analysis. On Claude 4.6 models, adaptive thinking is the recommended approach (the older `budget_tokens` parameter is deprecated).

```typescript
// Enable adaptive thinking (recommended for Opus 4.6 and Sonnet 4.6):
const response = await anthropic.messages.create({
  model: 'claude-sonnet-4-6',
  max_tokens: 16000,
  thinking: {
    type: 'adaptive',  // Claude decides when and how much to think
  },
  messages: [
    {
      role: 'user',
      content: `A train leaves Station A at 60 mph heading to Station B,
        250 miles away. 30 minutes later, another train leaves Station B
        at 80 mph heading to Station A. When and where do they meet?`,
    },
  ],
});

// Response contains both thinking blocks and the final answer:
for (const block of response.content) {
  if (block.type === 'thinking') {
    console.log('Thinking:', block.thinking);  // Claude's reasoning process
  } else if (block.type === 'text') {
    console.log('Answer:', block.text);        // Final answer to user
  }
}
```

```typescript
// Control thinking depth with the effort parameter:
const response = await anthropic.messages.create({
  model: 'claude-opus-4-6',
  max_tokens: 16000,
  thinking: { type: 'adaptive' },
  output_config: { effort: 'high' },  // low | medium | high | max (Opus only)
  messages: [{ role: 'user', content: complexCodingTask }],
});

// Streaming with adaptive thinking:
const stream = await anthropic.messages.stream({
  model: 'claude-sonnet-4-6',
  max_tokens: 64000,
  thinking: { type: 'adaptive' },
  messages: [{ role: 'user', content: complexCodingTask }],
});

for await (const event of stream) {
  if (event.type === 'content_block_delta') {
    if (event.delta.type === 'thinking_delta') {
      process.stdout.write(event.delta.thinking);
    } else if (event.delta.type === 'text_delta') {
      process.stdout.write(event.delta.text);
    }
  }
}
```

**When to use adaptive thinking:**
- Math and logic problems
- Complex code generation or debugging
- Multi-step reasoning tasks
- Analysis requiring several considerations

**Cost note**: thinking tokens are billed at output rates. Use the `effort` parameter to control the cost-quality tradeoff.

---

## Tool Use (Function Calling)

```typescript
const tools: Anthropic.Messages.Tool[] = [
  {
    name: 'search_web',
    description: 'Search the web for current information',
    input_schema: {
      type: 'object',
      properties: {
        query: { type: 'string', description: 'Search query' },
        max_results: { type: 'number', description: 'Number of results', default: 5 },
      },
      required: ['query'],
    },
  },
  {
    name: 'run_code',
    description: 'Execute Python code and return the output',
    input_schema: {
      type: 'object',
      properties: {
        code: { type: 'string', description: 'Python code to execute' },
      },
      required: ['code'],
    },
  },
];

async function runAgentLoop(userMessage: string): Promise<string> {
  const messages: Anthropic.Messages.MessageParam[] = [
    { role: 'user', content: userMessage },
  ];

  while (true) {
    const response = await anthropic.messages.create({
      model: 'claude-sonnet-4-6',
      max_tokens: 4096,
      tools,
      messages,
    });

    // Stop unless Claude asked for a tool. This covers 'end_turn' as well as
    // 'max_tokens' and 'stop_sequence', which would otherwise loop forever:
    if (response.stop_reason !== 'tool_use') {
      return response.content
        .filter((b) => b.type === 'text')
        .map((b) => (b as Anthropic.Messages.TextBlock).text)
        .join('');
    }

    // Handle tool calls:
    if (response.stop_reason === 'tool_use') {
      // Add Claude's response (may include text + tool_use blocks):
      messages.push({ role: 'assistant', content: response.content });

      // Execute each tool:
      const toolResults: Anthropic.Messages.ToolResultBlockParam[] = [];
      for (const block of response.content) {
        if (block.type !== 'tool_use') continue;

        let result: unknown;
        let isError = false;
        try {
          // executeTool is your own dispatcher (not part of the SDK):
          result = await executeTool(block.name, block.input as Record<string, unknown>);
        } catch (err) {
          isError = true;
          result = `Error: ${err instanceof Error ? err.message : 'Unknown error'}`;
        }

        toolResults.push({
          type: 'tool_result',
          tool_use_id: block.id,
          content: typeof result === 'string' ? result : JSON.stringify(result),
          is_error: isError,  // Tells Claude the tool call failed
        });
      }

      messages.push({ role: 'user', content: toolResults });
    }
  }
}
```
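
The loop above calls `executeTool`, which is left to the application. One possible shape is a small registry keyed by tool name, sketched below. The handlers are stand-ins that only echo their input; `toolHandlers` and `ToolHandler` are hypothetical names, and a real implementation would call a search provider and a sandboxed code runner instead.

```typescript
// Hypothetical tool registry; the handlers below are stand-ins, not real integrations.
type ToolHandler = (input: Record<string, unknown>) => Promise<unknown>;

const toolHandlers = new Map<string, ToolHandler>([
  [
    'search_web',
    async (input) => {
      const query = String(input.query ?? '');
      const maxResults = typeof input.max_results === 'number' ? input.max_results : 5;
      // Replace with a call to your search provider.
      return { query, maxResults, results: [] };
    },
  ],
  [
    'run_code',
    async (input) => {
      // Replace with a sandboxed executor; never run model-written code in-process.
      const code = String(input.code ?? '');
      return { stdout: '', note: `received ${code.length} chars of code` };
    },
  ],
]);

// Looks up the handler by name and rejects for tools that were never registered.
async function executeTool(name: string, input: Record<string, unknown>): Promise<unknown> {
  const handler = toolHandlers.get(name);
  if (!handler) {
    throw new Error(`Unknown tool: ${name}`);
  }
  return handler(input);
}
```

Throwing on an unknown tool name lets the loop's `catch` branch turn the failure into an error `tool_result`, so Claude can see what went wrong and try a different call.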

---

## Vision: Analyzing Images

```typescript
// Image from URL:
const response = await anthropic.messages.create({
  model: 'claude-sonnet-4-6',
  max_tokens: 1024,
  messages: [
    {
      role: 'user',
      content: [
        {
          type: 'image',
          source: {
            type: 'url',
            url: 'https://example.com/screenshot.png',
          },
        },
        {
          type: 'text',
          text: 'What UI issues do you see in this screenshot? Be specific.',
        },
      ],
    },
  ],
});
```

```typescript
// Image from file (base64):
import fs from 'fs';

const imageBuffer = fs.readFileSync('diagram.png');
const base64Image = imageBuffer.toString('base64');

const response = await anthropic.messages.create({
  model: 'claude-sonnet-4-6',
  max_tokens: 2048,
  messages: [
    {
      role: 'user',
      content: [
        {
          type: 'image',
          source: {
            type: 'base64',
            media_type: 'image/png',  // 'image/jpeg', 'image/gif', 'image/webp'
            data: base64Image,
          },
        },
        { type: 'text', text: 'Explain this architecture diagram.' },
      ],
    },
  ],
});
```

---

## Message History (Multi-turn Conversations)

```typescript
// Stateless pattern — manage history yourself:
const conversationHistory: Anthropic.Messages.MessageParam[] = [];

async function chat(userMessage: string): Promise<string> {
  // Add user message:
  conversationHistory.push({ role: 'user', content: userMessage });

  const response = await anthropic.messages.create({
    model: 'claude-sonnet-4-6',
    max_tokens: 1024,
    system: 'You are a helpful assistant.',
    messages: conversationHistory,
  });

  const assistantMessage = response.content
    .filter((b) => b.type === 'text')
    .map((b) => (b as Anthropic.Messages.TextBlock).text)
    .join('');

  // Add assistant response to history:
  conversationHistory.push({ role: 'assistant', content: response.content });

  return assistantMessage;
}
```

---

## Cost Optimization Checklist

```
1. Model selection:
   → Haiku 4.5 for classification, simple extraction, routing ($1/$5 per 1M)
   → Sonnet 4.6 for most production tasks — best value ($3/$15 per 1M)
   → Opus 4.6 for agentic workflows and complex reasoning ($5/$25 per 1M)

2. Prompt caching:
   → Cache system prompts >1024 tokens
   → Cache large context (docs, codebase, examples)
   → Saves 90% on input tokens for cached content

3. Token budgeting:
   → Set max_tokens to ~16000 for non-streaming, ~64000 for streaming
   → Use streaming to stop early when you have enough output
   → Use the effort parameter (low/medium/high) to control thinking costs

4. Batching:
   → Use Message Batches API for offline/async workloads
   → 50% cost reduction, up to 24hr processing window
   → Great for: bulk classification, data extraction, eval runs
```

## Claude vs. OpenAI API: Key Differences

If you're migrating from OpenAI or building on both APIs, several Claude API patterns differ in ways that will break direct ports.

**Content blocks vs. string content:** OpenAI's `choices[0].message.content` is a string. Claude's `content` is an array of typed blocks (`TextBlock`, `ToolUseBlock`, `ThinkingBlock`). Always check `block.type` before accessing block-specific fields. When you need just the text, filter for `type === 'text'` blocks and join.

**Tool response format:** OpenAI tool responses go in `messages` with `role: "tool"`. Claude tool results go in a `user` message with content type `tool_result`. The structure is similar but the field names and nesting differ enough to require careful porting.

**System prompt placement:** OpenAI takes `system` as a message with `role: "system"` inside the messages array. Claude takes `system` as a top-level parameter on the request. This matters when you're building conversation history — Claude's system prompt is separate from the conversation messages.
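
When porting existing OpenAI-style history, the system messages have to be lifted out of the array. The sketch below shows one way to do that; the local types are simplified stand-ins (string content only) rather than either SDK's real types, and `toClaudeRequest` is a hypothetical helper name.

```typescript
// Minimal local types; the real OpenAI and Anthropic message types carry more fields.
interface OpenAIStyleMessage {
  role: 'system' | 'user' | 'assistant';
  content: string;
}

interface ClaudeStyleRequest {
  system?: string;
  messages: { role: 'user' | 'assistant'; content: string }[];
}

// Hypothetical helper: lifts system messages into Claude's top-level `system` field.
function toClaudeRequest(history: OpenAIStyleMessage[]): ClaudeStyleRequest {
  const systemParts: string[] = [];
  const messages: ClaudeStyleRequest['messages'] = [];

  for (const msg of history) {
    if (msg.role === 'system') {
      systemParts.push(msg.content);
    } else {
      messages.push({ role: msg.role, content: msg.content });
    }
  }

  return systemParts.length > 0
    ? { system: systemParts.join('\n\n'), messages }
    : { messages };
}

// Example: one system message followed by a user turn.
console.log(
  toClaudeRequest([
    { role: 'system', content: 'You are a helpful assistant.' },
    { role: 'user', content: 'Hello!' },
  ]),
);
```

The result can be spread into `anthropic.messages.create({ model, max_tokens, ...request })`. Tool calls and tool results need their own conversion and are not handled here.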

**Stop reasons:** OpenAI uses `finish_reason: "stop"` or `"tool_calls"`. Claude uses `stop_reason: "end_turn"` or `"tool_use"`. Update your checks accordingly. Claude also returns `"max_tokens"` (truncated) and `"stop_sequence"` (hit a custom stop sequence) as stop reasons.
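
If shared code still expects OpenAI-style finish reasons, a small mapping layer can bridge the two. This is a sketch covering only the four stop reasons named above; `toOpenAIFinishReason` is a hypothetical helper, and any other value the API returns would need its own case.

```typescript
// Only the stop reasons discussed in this section; extend as needed.
type ClaudeStopReason = 'end_turn' | 'tool_use' | 'max_tokens' | 'stop_sequence';
type OpenAIFinishReason = 'stop' | 'tool_calls' | 'length';

// Hypothetical helper: maps a Claude stop_reason to the closest OpenAI finish_reason.
function toOpenAIFinishReason(reason: ClaudeStopReason): OpenAIFinishReason {
  switch (reason) {
    case 'end_turn':
    case 'stop_sequence':
      return 'stop';        // natural end or custom stop sequence
    case 'tool_use':
      return 'tool_calls';  // the model wants a tool executed
    case 'max_tokens':
      return 'length';      // output was truncated
  }
}

console.log(toOpenAIFinishReason('tool_use'));
```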

**Pricing model:** Both OpenAI and Anthropic bill per token, so the practical difference is in how caching is priced and controlled. On Sonnet 4.6, cache reads cost about $0.30/1M tokens versus $3.00/1M for uncached input, and you choose exactly what gets cached with `cache_control`. For workloads with large stable context (RAG, chatbots with long system prompts), Anthropic's prompt caching can make it significantly cheaper than OpenAI at equivalent quality levels.

## Error Handling and Rate Limits

The Anthropic API uses standard HTTP status codes with structured error responses. The errors you'll encounter in production:

**429 (Rate Limited):** Anthropic's rate limits are per-organization and vary by tier. Free tier accounts have tight limits; production accounts should request a limit increase via the Anthropic console before launch. The SDK includes automatic retry with exponential backoff by default (`maxRetries: 2` in the default configuration). For sustained high-volume workloads, use the Message Batches API instead of synchronous requests.

**529 (Overloaded):** Anthropic returns 529 when their systems are under high load. This is temporary — retry with backoff. The SDK handles this automatically under the same retry policy as 429s. If you see frequent 529s, it usually means your traffic spike coincided with high global demand on the API. Route non-time-sensitive requests to the Batch API during these periods.
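
For code paths where you want your own retry policy rather than the SDK's built-in one, the sketch below shows exponential backoff with jitter for 429 and 529 responses. All three helpers are hypothetical names, the base delay and cap are arbitrary choices, and `withRetry` assumes the thrown error exposes a numeric `status` field.

```typescript
// Status codes this guide treats as transient: 429 (rate limited) and 529 (overloaded).
function isRetryableStatus(status: number | undefined): boolean {
  return status === 429 || status === 529;
}

// Exponential backoff with full jitter; `random` is injectable so tests are deterministic.
function backoffDelayMs(
  attempt: number,
  baseMs = 500,
  capMs = 8_000,
  random: () => number = Math.random,
): number {
  const ceiling = Math.min(capMs, baseMs * 2 ** attempt);
  return Math.floor(random() * ceiling);
}

// Hypothetical wrapper: retries `fn` on transient errors, rethrows everything else.
async function withRetry<T>(fn: () => Promise<T>, maxRetries = 3): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      const status = (err as { status?: number } | null)?.status;
      if (attempt >= maxRetries || !isRetryableStatus(status)) throw err;
      await new Promise((resolve) => setTimeout(resolve, backoffDelayMs(attempt)));
    }
  }
}
```

If you wrap calls this way, consider setting `maxRetries: 0` on the client so the two retry layers do not multiply each other.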

**400 (Invalid Request):** Usually a malformed message structure: a missing `role`, a wrong content block format, or a `max_tokens` value above the model's output limit. Output limits vary by model, so confirm them in the model documentation; this guide assumes at most 8,192 for Haiku and 64,000 for Sonnet/Opus. When using adaptive thinking, thinking tokens count toward `max_tokens`, so leave at least 1,000 tokens of headroom for reasoning.

**Context window exceeded:** For 200K context windows, you'll hit this if you accumulate too much conversation history or include large documents without truncation. Track `usage.input_tokens` in responses and truncate the middle of conversation history when you approach 180K tokens (keeping the system prompt and recent turns intact).
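
The middle-truncation strategy can be sketched as follows. This is a simplified illustration: `ChatTurn` holds string content only, `estimateTokens` uses a rough four-characters-per-token heuristic (an assumption, not the real tokenizer), and `trimMiddle` assumes strictly alternating user/assistant turns with an even `keepHead`. In production, prefer the `usage.input_tokens` value from the previous response as the trigger.

```typescript
interface ChatTurn {
  role: 'user' | 'assistant';
  content: string;
}

// Rough heuristic (an assumption, not the real tokenizer): about 4 characters per token.
function estimateTokens(turns: ChatTurn[]): number {
  const chars = turns.reduce((sum, turn) => sum + turn.content.length, 0);
  return Math.ceil(chars / 4);
}

// Hypothetical helper: keeps the first exchange and the most recent turns, dropping
// user/assistant pairs from the middle until the estimate fits the budget.
function trimMiddle(
  turns: ChatTurn[],
  budgetTokens: number,
  keepHead = 2,
  keepTail = 2,
): ChatTurn[] {
  const trimmed = [...turns];
  while (
    estimateTokens(trimmed) > budgetTokens &&
    trimmed.length >= keepHead + keepTail + 2
  ) {
    trimmed.splice(keepHead, 2);  // remove the oldest pair after the head
  }
  return trimmed;
}

// Example: ten 100-token turns trimmed to a 600-token budget.
const longChat: ChatTurn[] = Array.from({ length: 10 }, (_, i): ChatTurn => ({
  role: i % 2 === 0 ? 'user' : 'assistant',
  content: 'x'.repeat(400),
}));
console.log(trimMiddle(longChat, 600).length);
```

Removing turns in pairs keeps the user/assistant alternation intact. Conversations that contain tool calls need extra care, because a `tool_use` block and its matching `tool_result` must be dropped together.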

## Production Hardening

Building a production Claude integration requires thinking beyond happy-path completions.

**Content filtering:** Claude's safety training means it will occasionally refuse requests that are legitimate for your use case, especially if your system prompt contains words that could trigger caution (security, exploit, hack, etc. in the context of security tooling). If you're building in a sensitive domain, test your prompts with Claude before launch and consider adding explicit permission context in the system prompt ("You are assisting authorized security researchers..."). Monitor for `stop_reason: "refusal"` in responses.

**Structured output validation:** When you need structured JSON from Claude, always validate the output against your expected schema before using it. Claude is excellent at producing valid JSON, but rare edge cases exist — truncated responses due to `max_tokens` limits, or unusual inputs that cause format drift. Use a library like Zod to validate the parsed JSON and have a fallback path for validation failures (retry, or return a default value).
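
The validate-then-fall-back pattern might look like the sketch below. To keep the example dependency-free it uses a hand-written type guard in place of Zod; `parseJsonReply`, `stripFence`, and the `Ticket` shape are all hypothetical, and the fence stripping assumes the reply may be wrapped in a Markdown code fence.

```typescript
const FENCE = '`'.repeat(3);  // Markdown code fence that sometimes wraps JSON replies

// Removes a surrounding code fence (with optional language tag) if present.
function stripFence(raw: string): string {
  let text = raw.trim();
  if (text.startsWith(FENCE)) {
    const firstNewline = text.indexOf('\n');
    text = firstNewline === -1 ? '' : text.slice(firstNewline + 1);
    if (text.endsWith(FENCE)) text = text.slice(0, -FENCE.length);
  }
  return text.trim();
}

// Parses and validates a reply; returns `fallback` on truncated, malformed, or off-schema JSON.
function parseJsonReply<T>(raw: string, isValid: (value: unknown) => value is T, fallback: T): T {
  try {
    const parsed: unknown = JSON.parse(stripFence(raw));
    return isValid(parsed) ? parsed : fallback;
  } catch {
    return fallback;
  }
}

// Hypothetical schema for illustration.
interface Ticket {
  category: string;
  priority: number;
}

function isTicket(value: unknown): value is Ticket {
  if (typeof value !== 'object' || value === null) return false;
  const v = value as Record<string, unknown>;
  return typeof v.category === 'string' && typeof v.priority === 'number';
}

const defaultTicket: Ticket = { category: 'unknown', priority: 0 };
console.log(parseJsonReply('{"category":"bug","priority":2}', isTicket, defaultTicket));
console.log(parseJsonReply('{"category":"bu', isTicket, defaultTicket));  // truncated reply
```

With Zod, `isTicket` would be replaced by a schema and `safeParse`. Returning a default is only one fallback option; retrying the request is often the better choice when the reply was cut off by `max_tokens`.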

**Model versioning:** Anthropic periodically updates models without changing the model ID string. `claude-sonnet-4-6` today may behave differently than `claude-sonnet-4-6` in six months. For production systems where consistency is critical, use date-stamped model IDs when they're available, and take the exact ID from Anthropic's model documentation rather than guessing the date suffix. For most applications, the ongoing improvements to the latest model version are desirable rather than a concern.

**Observability:** Track token usage per request (`response.usage.input_tokens`, `response.usage.output_tokens`, `cache_read_input_tokens`) and log it alongside your application metrics. Token usage correlates with cost and latency. A sudden spike in `input_tokens` means something in your prompt construction changed — often a bug where context is being appended instead of replaced. A spike in `output_tokens` means Claude is generating longer responses than expected, which may indicate a prompt change or model behavior shift. Set up alerts on both. For multi-turn conversations, track total conversation token count and log when it crosses 100K, 150K, and 180K so you can investigate before hitting the context limit.
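
A minimal sketch of the threshold logging described above follows. `crossedThresholds` and `recordUsage` are hypothetical helpers and `UsageLike` mirrors only the usage fields mentioned in this guide. It relies on the fact that each request resends the full history, so the latest response's input plus output tokens approximates the current conversation size.

```typescript
const TOKEN_THRESHOLDS = [100_000, 150_000, 180_000] as const;

// Returns the thresholds crossed when the running total moves from `previous` to `current`.
function crossedThresholds(previous: number, current: number): number[] {
  return TOKEN_THRESHOLDS.filter((t) => previous < t && current >= t);
}

// Only the usage fields referenced in this guide; cache fields are optional.
interface UsageLike {
  input_tokens: number;
  output_tokens: number;
  cache_creation_input_tokens?: number | null;
  cache_read_input_tokens?: number | null;
}

// Cached tokens are reported separately from `input_tokens`, so add them back in.
function conversationTokens(usage: UsageLike): number {
  return (
    usage.input_tokens +
    usage.output_tokens +
    (usage.cache_creation_input_tokens ?? 0) +
    (usage.cache_read_input_tokens ?? 0)
  );
}

let previousTotal = 0;

// Hypothetical hook: call with `response.usage` after every turn of one conversation.
function recordUsage(usage: UsageLike): void {
  const total = conversationTokens(usage);
  for (const threshold of crossedThresholds(previousTotal, total)) {
    console.warn(`Conversation crossed ${threshold} tokens (now ${total})`);
  }
  previousTotal = total;
}

recordUsage({ input_tokens: 95_000, output_tokens: 2_000 });
recordUsage({ input_tokens: 149_000, output_tokens: 3_000 });
```

In a real service, `previousTotal` would be stored per conversation rather than in a module-level variable, and the warning would go to your metrics pipeline instead of the console.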

**Testing:** Use `gpt-4o-mini` or a similar small model as a cheap proxy during unit test development, then run integration tests against Claude directly before deploying changes to system prompts. Claude's behavior is consistent enough that if your integration tests pass, production usually behaves as expected — but always run at least 20-50 representative examples through the actual model before shipping prompt changes.

## Methodology

Pricing data is sourced from Anthropic's pricing page (anthropic.com/pricing) as of early 2026. Prompt cache TTL (5 minutes for `ephemeral` type) is documented in Anthropic's prompt caching guide. The 90% cost reduction figure for cache reads vs. uncached input is Anthropic's published rate. Adaptive thinking with the `type: 'adaptive'` field is the recommended API design for Claude 4.6 models; the older `thinking` configuration with `budget_tokens` applies to earlier models. The `output_config.effort` parameter for controlling thinking depth is available on Opus 4.6; Sonnet 4.6 uses adaptive thinking without explicit effort control. All code examples use `@anthropic-ai/sdk` v0.30+. The cached vs. uncached comparison ($0.30/1M for cache reads vs. $3.00/1M uncached) uses Sonnet 4.6 rates; check anthropic.com/pricing for current figures as prices change with model updates. Cache write costs (25% premium over base input) are documented in Anthropic's prompt caching guide and apply to the first request that populates a cache entry.

---

*Compare all AI APIs including Anthropic at [APIScout](https://apiscout.dev).*

*Evaluate Anthropic and compare alternatives on [APIScout](https://apiscout.dev/compare/anthropic-vs-openai).*

*Related: [MCP Server Security: Best Practices 2026](/blog/anthropic-mcp-server-security-2026), [Anthropic MCP vs OpenAI Plugins vs Gemini Extensions](/blog/anthropic-mcp-vs-openai-plugins-vs-gemini-extensions-2026), [Anthropic vs Google Gemini](/blog/anthropic-vs-google-gemini-api-2026)*