Anthropic Claude API: Complete Developer Guide 2026
TL;DR
Claude 3.5 Sonnet is the best model for most production use cases — better coding than GPT-4o, excellent instruction following, and much cheaper than Claude 3 Opus. Use Haiku 3.5 for high-volume simple tasks ($0.80/$4 per 1M tokens), Sonnet 3.5 for almost everything else ($3/$15), and Opus 3 only for genuinely complex reasoning where cost isn't a concern. The API is OpenAI-compatible-ish, but key differences in tool use, content blocks, and prompt caching make the switch non-trivial. Here's everything you need.
Key Takeaways
- Model lineup: Haiku 3.5 (fast/cheap) → Sonnet 3.5 (best value) → Opus 3 (most capable)
- Prompt caching: up to 90% cost reduction on repeated context — killer feature for RAG and chatbots
- Extended thinking: Claude reasons step-by-step before answering, dramatically improves complex tasks
- Tool use: `stop_reason: "tool_use"` pattern, mixed text + tool content blocks in the same response
- Vision: images in `image` content blocks; supports base64 and URLs
- Context window: 200K tokens on all models, among the longest available in production LLMs
Models and Pricing (2026)
| Model | Input $/1M | Output $/1M | Context | Best For |
|---|---|---|---|---|
| claude-3-5-haiku-20241022 | $0.80 | $4.00 | 200K | High-volume, fast tasks |
| claude-3-5-sonnet-20241022 | $3.00 | $15.00 | 200K | Most production use cases |
| claude-3-opus-20240229 | $15.00 | $75.00 | 200K | Complex reasoning |
| claude-3-haiku-20240307 | $0.25 | $1.25 | 200K | Cheapest, legacy |
Recommendation: Default to claude-3-5-sonnet-20241022 unless you have a specific reason to deviate.
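As code, that guidance might look like this hypothetical `chooseModel` helper (the task categories are illustrative labels, not an official Anthropic taxonomy):

```typescript
// Hypothetical helper encoding the table above; the task categories are
// illustrative labels, not an official Anthropic taxonomy.
type TaskKind = 'classification' | 'extraction' | 'routing' | 'general' | 'complex-reasoning';

function chooseModel(task: TaskKind): string {
  switch (task) {
    case 'classification':
    case 'extraction':
    case 'routing':
      return 'claude-3-5-haiku-20241022'; // high-volume, fast tasks
    case 'complex-reasoning':
      return 'claude-3-opus-20240229'; // only when quality justifies 5x the price
    default:
      return 'claude-3-5-sonnet-20241022'; // the default recommended above
  }
}
```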
Basic Setup
// npm install @anthropic-ai/sdk
import Anthropic from '@anthropic-ai/sdk';
const anthropic = new Anthropic({
apiKey: process.env.ANTHROPIC_API_KEY,
});
// Basic completion:
const message = await anthropic.messages.create({
model: 'claude-3-5-sonnet-20241022',
max_tokens: 1024,
messages: [
{ role: 'user', content: 'Explain async/await in 3 sentences.' }
],
});
console.log(message.content[0].text);
// With system prompt:
const reply = await anthropic.messages.create({
model: 'claude-3-5-sonnet-20241022',
max_tokens: 2048,
system: `You are an expert TypeScript engineer. Be concise and precise.
Always include types in code examples. Use const over let.`,
messages: [
{ role: 'user', content: 'Write a retry wrapper for async functions.' }
],
});
Streaming
// Streaming with async iterator:
const stream = await anthropic.messages.stream({
model: 'claude-3-5-sonnet-20241022',
max_tokens: 1024,
messages: [{ role: 'user', content: 'Write a haiku about TypeScript.' }],
});
// Stream text chunks:
for await (const text of stream.textStream) {
process.stdout.write(text);
}
// Or get the final message after streaming:
const finalMessage = await stream.getFinalMessage();
console.log(finalMessage.usage); // { input_tokens: 12, output_tokens: 42 }
// Server-Sent Events for Next.js App Router:
export async function POST(req: Request) {
const { messages } = await req.json();
const stream = await anthropic.messages.stream({
model: 'claude-3-5-sonnet-20241022',
max_tokens: 1024,
messages,
});
// Convert to ReadableStream for Response:
return new Response(
new ReadableStream({
async start(controller) {
const encoder = new TextEncoder();
for await (const text of stream.textStream) {
controller.enqueue(encoder.encode(`data: ${JSON.stringify({ text })}\n\n`));
}
controller.enqueue(encoder.encode('data: [DONE]\n\n'));
controller.close();
},
}),
{ headers: { 'Content-Type': 'text/event-stream' } }
);
}
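On the client side, the route above can be consumed with `fetch` and a stream reader. A sketch, assuming the `data: {...}` framing emitted by the route and a hypothetical `/api/chat` path:

```typescript
// Parse one SSE line of the form `data: {"text":"..."}` into its text chunk.
// Returns null for the [DONE] sentinel and for non-data lines.
function parseSSELine(line: string): string | null {
  if (!line.startsWith('data: ')) return null;
  const payload = line.slice('data: '.length);
  if (payload === '[DONE]') return null;
  return (JSON.parse(payload) as { text: string }).text;
}

// Hypothetical client: stream the response body and accumulate text chunks.
async function consumeChat(messages: unknown[]): Promise<string> {
  const res = await fetch('/api/chat', { // assumed route path
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ messages }),
  });
  const reader = res.body!.getReader();
  const decoder = new TextDecoder();
  let full = '';
  let buffer = '';
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    buffer += decoder.decode(value, { stream: true });
    const lines = buffer.split('\n');
    buffer = lines.pop() ?? ''; // keep any partial line for the next chunk
    for (const line of lines) {
      const text = parseSSELine(line);
      if (text !== null) full += text;
    }
  }
  return full;
}
```

Buffering partial lines matters: network chunks are not guaranteed to align with SSE line boundaries.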
Prompt Caching: 90% Cost Reduction
Prompt caching is Anthropic's biggest cost-optimization feature. If you're sending the same long system prompt or context repeatedly, mark it for caching and pay 90% less on subsequent requests.
// Without caching: pay full price for system prompt on every request
// With caching: pay once (write), then ~10% on subsequent reads
const systemPrompt = `You are an expert software architect with 20 years of experience.
[...imagine 10,000 tokens of detailed instructions, examples, and context...]
`;
// Mark the system prompt for caching:
const message = await anthropic.messages.create({
model: 'claude-3-5-sonnet-20241022',
max_tokens: 1024,
system: [
{
type: 'text',
text: systemPrompt,
cache_control: { type: 'ephemeral' }, // ← Enable caching
},
],
messages: [
{ role: 'user', content: 'Review this PR description...' }
],
});
// First request: full price (cache write)
// Subsequent requests within 5 minutes: 90% cheaper (cache read)
// Cache write: $3.75/1M tokens (25% premium to write)
// Cache read: $0.30/1M tokens (90% discount vs uncached $3.00/1M)
// Cache large context (like documentation or a codebase):
const docsContext = fs.readFileSync('docs/api-reference.md', 'utf-8');
const response = await anthropic.messages.create({
model: 'claude-3-5-sonnet-20241022',
max_tokens: 1024,
system: 'You are a helpful developer support assistant.',
messages: [
{
role: 'user',
content: [
{
type: 'text',
text: `Here is our API documentation:\n\n${docsContext}`,
cache_control: { type: 'ephemeral' }, // Cache the long context
},
{
type: 'text',
text: userQuestion, // Not cached — changes each request
},
],
},
],
});
// Check cache status:
console.log(response.usage);
// { cache_creation_input_tokens: 15000, cache_read_input_tokens: 0, input_tokens: 50 }
// → First request: writing cache
// On second request: cache_read_input_tokens: 15000 (90% cheaper)
Rules for caching to work:
- Minimum 1024 tokens to cache (2048 for Haiku models)
- Content must be identical across requests (any change = new cache write)
- Cache TTL: 5 minutes for the `ephemeral` type, refreshed on each cache hit
- Cache position: the `cache_control` marker goes at the end of the stable prefix, before the varying content
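Given the prices quoted above ($3.00/1M uncached, $3.75/1M cache write, $0.30/1M cache read for Sonnet 3.5), the break-even works out quickly in caching's favor. A back-of-envelope sketch with those prices hardcoded:

```typescript
// Cost in dollars of sending `cachedTokens` of stable prefix `requests` times
// within the cache TTL, at the Sonnet 3.5 prices quoted above (assumed fixed).
function cachedCost(cachedTokens: number, requests: number): number {
  const write = (cachedTokens / 1_000_000) * 3.75; // first request writes the cache
  const reads = (cachedTokens / 1_000_000) * 0.3 * (requests - 1);
  return write + reads;
}

function uncachedCost(cachedTokens: number, requests: number): number {
  return (cachedTokens / 1_000_000) * 3.0 * requests;
}

// A 10K-token system prompt over 100 requests: cached ~ $0.33 vs uncached $3.00,
// so caching pays for its 25% write premium from the second request onward.
```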
Extended Thinking
Extended thinking enables Claude to reason step-by-step before producing its final response — dramatically improving accuracy on math, coding, and complex analysis.
// Enable extended thinking:
const response = await anthropic.messages.create({
model: 'claude-3-5-sonnet-20241022', // Extended thinking requires Sonnet or Opus
max_tokens: 16000,
thinking: {
type: 'enabled',
budget_tokens: 10000, // Max tokens Claude can use for thinking
},
messages: [
{
role: 'user',
content: `A train leaves Station A at 60 mph heading to Station B,
250 miles away. 30 minutes later, another train leaves Station B
at 80 mph heading to Station A. When and where do they meet?`,
},
],
});
// Response contains both thinking blocks and the final answer:
for (const block of response.content) {
if (block.type === 'thinking') {
console.log('Thinking:', block.thinking); // Claude's reasoning process
} else if (block.type === 'text') {
console.log('Answer:', block.text); // Final answer to user
}
}
// Streaming with thinking:
const stream = await anthropic.messages.stream({
model: 'claude-3-5-sonnet-20241022',
max_tokens: 8000,
thinking: { type: 'enabled', budget_tokens: 5000 },
messages: [{ role: 'user', content: complexCodingTask }],
});
// Track thinking vs response separately via the raw stream events:
let thinkingText = '';
let responseText = '';
stream.on('streamEvent', (event) => {
  if (event.type === 'content_block_delta') {
    if (event.delta.type === 'thinking_delta') thinkingText += event.delta.thinking;
    else if (event.delta.type === 'text_delta') responseText += event.delta.text;
  }
});
// Most apps hide the thinking block from users but use it for debugging
When to use extended thinking:
- Math and logic problems
- Complex code generation or debugging
- Multi-step reasoning tasks
- Analysis requiring several considerations
Cost note: thinking tokens count toward max_tokens and are billed at output rates.
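Because thinking tokens bill as output, the worst-case spend per request is easy to bound. A quick sketch using Sonnet 3.5's $15/1M output price (assumed fixed):

```typescript
// Worst-case output cost in dollars for one request: the model may spend the
// entire thinking budget plus the visible answer, all billed at output rates.
function maxOutputCost(maxTokens: number, outputPricePer1M = 15): number {
  return (maxTokens / 1_000_000) * outputPricePer1M;
}

// max_tokens: 16000 (up to 10000 of it thinking) caps output spend at
// maxOutputCost(16000) per request.
```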
Tool Use (Function Calling)
const tools: Anthropic.Messages.Tool[] = [
{
name: 'search_web',
description: 'Search the web for current information',
input_schema: {
type: 'object',
properties: {
query: { type: 'string', description: 'Search query' },
max_results: { type: 'number', description: 'Number of results', default: 5 },
},
required: ['query'],
},
},
{
name: 'run_code',
description: 'Execute Python code and return the output',
input_schema: {
type: 'object',
properties: {
code: { type: 'string', description: 'Python code to execute' },
},
required: ['code'],
},
},
];
async function runAgentLoop(userMessage: string): Promise<string> {
const messages: Anthropic.Messages.MessageParam[] = [
{ role: 'user', content: userMessage },
];
while (true) {
const response = await anthropic.messages.create({
model: 'claude-3-5-sonnet-20241022',
max_tokens: 4096,
tools,
messages,
});
// Handle tool calls:
if (response.stop_reason === 'tool_use') {
  // Add Claude's response (may include text + tool_use blocks):
  messages.push({ role: 'assistant', content: response.content });
  // Execute each tool:
  const toolResults: Anthropic.Messages.ToolResultBlockParam[] = [];
  for (const block of response.content) {
    if (block.type !== 'tool_use') continue;
    let result: unknown;
    let isError = false;
    try {
      result = await executeTool(block.name, block.input as Record<string, unknown>);
    } catch (err) {
      result = `Error: ${err instanceof Error ? err.message : 'Unknown error'}`;
      isError = true;
    }
    toolResults.push({
      type: 'tool_result',
      tool_use_id: block.id,
      content: JSON.stringify(result),
      is_error: isError,
    });
  }
  messages.push({ role: 'user', content: toolResults });
  continue; // Send tool results back to Claude
}
// Anything else (end_turn, max_tokens, etc.): return the accumulated text
// instead of looping forever on an unexpected stop_reason.
return response.content
  .filter((b) => b.type === 'text')
  .map((b) => (b as Anthropic.Messages.TextBlock).text)
  .join('');
}
}
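The loop above assumes an `executeTool` dispatcher. A minimal sketch with stubbed handlers (real implementations would call your search backend and a code sandbox):

```typescript
// Hypothetical dispatcher for the tools declared above. The implementations
// here are stubs; wire them to real services in production.
async function executeTool(
  name: string,
  input: Record<string, unknown>,
): Promise<unknown> {
  switch (name) {
    case 'search_web':
      // Stub: a real implementation would call a search API.
      return { results: [], query: input.query };
    case 'run_code':
      // Stub: a real implementation would run the code in a sandbox.
      return { stdout: '', exit_code: 0 };
    default:
      // Unknown tool name: throw so the loop reports an error result to Claude.
      throw new Error(`Unknown tool: ${name}`);
  }
}
```

Throwing on unknown names matters: the agent loop converts the exception into an `is_error` tool result, letting Claude recover instead of silently receiving nothing.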
Vision: Analyzing Images
// Image from URL:
const response = await anthropic.messages.create({
model: 'claude-3-5-sonnet-20241022',
max_tokens: 1024,
messages: [
{
role: 'user',
content: [
{
type: 'image',
source: {
type: 'url',
url: 'https://example.com/screenshot.png',
},
},
{
type: 'text',
text: 'What UI issues do you see in this screenshot? Be specific.',
},
],
},
],
});
// Image from file (base64):
import fs from 'fs';
const imageBuffer = fs.readFileSync('diagram.png');
const base64Image = imageBuffer.toString('base64');
const response = await anthropic.messages.create({
model: 'claude-3-5-sonnet-20241022',
max_tokens: 2048,
messages: [
{
role: 'user',
content: [
{
type: 'image',
source: {
type: 'base64',
media_type: 'image/png', // 'image/jpeg', 'image/gif', 'image/webp'
data: base64Image,
},
},
{ type: 'text', text: 'Explain this architecture diagram.' },
],
},
],
});
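If the image path varies at runtime, the `media_type` field can be derived from the file extension. A small helper covering the four formats listed above:

```typescript
// Map a filename to the image media types Claude accepts (per the list above).
type ImageMediaType = 'image/png' | 'image/jpeg' | 'image/gif' | 'image/webp';

function imageMediaType(filename: string): ImageMediaType {
  const ext = filename.slice(filename.lastIndexOf('.') + 1).toLowerCase();
  switch (ext) {
    case 'png': return 'image/png';
    case 'jpg':
    case 'jpeg': return 'image/jpeg';
    case 'gif': return 'image/gif';
    case 'webp': return 'image/webp';
    default: throw new Error(`Unsupported image type: .${ext}`);
  }
}
```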
Message History (Multi-turn Conversations)
// Stateless pattern — manage history yourself:
const conversationHistory: Anthropic.Messages.MessageParam[] = [];
async function chat(userMessage: string): Promise<string> {
// Add user message:
conversationHistory.push({ role: 'user', content: userMessage });
const response = await anthropic.messages.create({
model: 'claude-3-5-sonnet-20241022',
max_tokens: 1024,
system: 'You are a helpful assistant.',
messages: conversationHistory,
});
const assistantMessage = response.content
.filter((b) => b.type === 'text')
.map((b) => (b as Anthropic.Messages.TextBlock).text)
.join('');
// Add assistant response to history:
conversationHistory.push({ role: 'assistant', content: response.content });
return assistantMessage;
}
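Unbounded history will eventually overflow the 200K context window. A trimming sketch using a rough 4-characters-per-token heuristic (an approximation, not Anthropic's actual tokenizer, and assuming string contents):

```typescript
// Rough token estimate: ~4 characters per token. A heuristic, not exact;
// use a real token counter for precise budgeting.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Drop the oldest turns, in user+assistant pairs, until the estimated total
// fits within the budget. Always keeps at least the most recent pair.
function trimHistory(
  history: { role: 'user' | 'assistant'; content: string }[],
  maxTokens: number,
): { role: 'user' | 'assistant'; content: string }[] {
  const trimmed = [...history];
  let total = trimmed.reduce((sum, m) => sum + estimateTokens(m.content), 0);
  while (total > maxTokens && trimmed.length > 2) {
    const removed = trimmed.splice(0, 2); // drop the oldest user+assistant pair
    total -= removed.reduce((sum, m) => sum + estimateTokens(m.content), 0);
  }
  return trimmed;
}
```

Dropping whole pairs keeps the alternating user/assistant structure the API expects.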
Cost Optimization Checklist
1. Model selection:
→ Haiku 3.5 for classification, simple extraction, routing
→ Sonnet 3.5 for most production tasks (best value)
→ Opus only for complex reasoning where quality is critical
2. Prompt caching:
→ Cache system prompts >1024 tokens
→ Cache large context (docs, codebase, examples)
→ Saves 90% on input tokens for cached content
3. Token budgeting:
→ Set max_tokens to a reasonable upper bound (not 4096 "just in case")
→ Use streaming to stop early when you have enough output
→ Shorter system prompts → lower costs
4. Batching:
→ Use Message Batches API for offline/async workloads
→ 50% cost reduction, up to 24hr processing window
→ Great for: bulk classification, data extraction, eval runs
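The Message Batches API takes a list of per-request params, each tagged with a `custom_id` for matching results back. A sketch of building that payload for a bulk sentiment-classification job (the prompt wording is illustrative); the result would be submitted via the SDK's `anthropic.messages.batches.create`:

```typescript
// Build Message Batches API request entries for a bulk classification job.
// The classification prompt here is illustrative, not prescriptive.
function buildBatchRequests(texts: string[]) {
  return texts.map((text, i) => ({
    custom_id: `classify-${i}`, // your key for matching results back later
    params: {
      model: 'claude-3-5-haiku-20241022', // cheap model fits simple classification
      max_tokens: 16,
      messages: [
        {
          role: 'user' as const,
          content: `Classify the sentiment (positive/negative/neutral): ${text}`,
        },
      ],
    },
  }));
}

// Submit (processed asynchronously, results within up to 24h):
// const batch = await anthropic.messages.batches.create({
//   requests: buildBatchRequests(texts),
// });
```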
Compare all AI APIs including Anthropic at APIScout.