
Anthropic Claude API: Complete Developer Guide 2026

APIScout Team

Tags: anthropic · claude · ai-api · llm · tool-use · prompt-caching · 2026

TL;DR

Claude 3.5 Sonnet is the best model for most production use cases — stronger coding than GPT-4o, excellent instruction following, and much cheaper than Claude 3 Opus. Use Haiku 3.5 for high-volume simple tasks ($0.80/$4 per 1M tokens), Sonnet 3.5 for almost everything else ($3/$15), and Opus 3 only for genuinely complex reasoning where cost isn't a concern. The API resembles OpenAI's, but key differences in tool use, content blocks, and prompt caching make the switch non-trivial. Here's everything you need.

Key Takeaways

  • Model lineup: Haiku 3.5 (fast/cheap) → Sonnet 3.5 (best value) → Opus 3 (most capable)
  • Prompt caching: up to 90% cost reduction on repeated context — killer feature for RAG and chatbots
  • Extended thinking: Claude reasons step-by-step before answering, dramatically improves complex tasks
  • Tool use: stop_reason: "tool_use" pattern, mixed text+tool content blocks in same response
  • Vision: images in image content blocks, supports base64 and URLs
  • Context window: 200K tokens on all models — among the longest contexts available in production LLMs

Models and Pricing (2026)

| Model | Input $/1M | Output $/1M | Context | Best For |
|---|---|---|---|---|
| claude-3-5-haiku-20241022 | $0.80 | $4.00 | 200K | High-volume, fast tasks |
| claude-3-5-sonnet-20241022 | $3.00 | $15.00 | 200K | Most production use cases |
| claude-3-opus-20240229 | $15.00 | $75.00 | 200K | Complex reasoning |
| claude-3-haiku-20240307 | $0.25 | $1.25 | 200K | Cheapest, legacy |

Recommendation: Default to claude-3-5-sonnet-20241022 unless you have a specific reason to deviate.
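That default-plus-exceptions rule can be captured in a few lines. `pickModel` and the `TaskKind` categories are our own illustrative names, not part of the SDK:

```typescript
// Hypothetical helper: route a task category to the cheapest suitable model.
type TaskKind = 'classification' | 'extraction' | 'general' | 'complex-reasoning';

function pickModel(task: TaskKind): string {
  switch (task) {
    case 'classification':
    case 'extraction':
      return 'claude-3-5-haiku-20241022';   // high volume, simple tasks
    case 'complex-reasoning':
      return 'claude-3-opus-20240229';      // quality over cost
    default:
      return 'claude-3-5-sonnet-20241022';  // the sensible default
  }
}
```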


Basic Setup

// npm install @anthropic-ai/sdk
import Anthropic from '@anthropic-ai/sdk';

const anthropic = new Anthropic({
  apiKey: process.env.ANTHROPIC_API_KEY,
});

// Basic completion:
const message = await anthropic.messages.create({
  model: 'claude-3-5-sonnet-20241022',
  max_tokens: 1024,
  messages: [
    { role: 'user', content: 'Explain async/await in 3 sentences.' }
  ],
});

console.log(message.content[0].text);

// With a system prompt:
const messageWithSystem = await anthropic.messages.create({
  model: 'claude-3-5-sonnet-20241022',
  max_tokens: 2048,
  system: `You are an expert TypeScript engineer. Be concise and precise.
Always include types in code examples. Use const over let.`,
  messages: [
    { role: 'user', content: 'Write a retry wrapper for async functions.' }
  ],
});

Streaming

// Streaming with async iterator:
const stream = await anthropic.messages.stream({
  model: 'claude-3-5-sonnet-20241022',
  max_tokens: 1024,
  messages: [{ role: 'user', content: 'Write a haiku about TypeScript.' }],
});

// Stream text chunks via the 'text' event:
stream.on('text', (text) => {
  process.stdout.write(text);
});

// Or await the final message after streaming completes:
const finalMessage = await stream.finalMessage();
console.log(finalMessage.usage);  // e.g. { input_tokens: 12, output_tokens: 42 }

// Server-Sent Events for Next.js App Router:
export async function POST(req: Request) {
  const { messages } = await req.json();

  const stream = await anthropic.messages.stream({
    model: 'claude-3-5-sonnet-20241022',
    max_tokens: 1024,
    messages,
  });

  // Convert to ReadableStream for Response:
  return new Response(
    new ReadableStream({
      async start(controller) {
        const encoder = new TextEncoder();
        stream.on('text', (text) => {
          controller.enqueue(encoder.encode(`data: ${JSON.stringify({ text })}\n\n`));
        });
        await stream.finalMessage();  // wait for streaming to finish
        controller.enqueue(encoder.encode('data: [DONE]\n\n'));
        controller.close();
      },
    }),
    { headers: { 'Content-Type': 'text/event-stream' } }
  );
}

Prompt Caching: 90% Cost Reduction

Prompt caching is Anthropic's biggest cost-optimization feature. If you're sending the same long system prompt or context repeatedly, mark it for caching and pay 90% less on subsequent requests.

// Without caching: pay full price for system prompt on every request
// With caching: pay once (write), then ~10% on subsequent reads

const systemPrompt = `You are an expert software architect with 20 years of experience.
[...imagine 10,000 tokens of detailed instructions, examples, and context...]
`;

// Mark the system prompt for caching:
const message = await anthropic.messages.create({
  model: 'claude-3-5-sonnet-20241022',
  max_tokens: 1024,
  system: [
    {
      type: 'text',
      text: systemPrompt,
      cache_control: { type: 'ephemeral' },  // ← Enable caching
    },
  ],
  messages: [
    { role: 'user', content: 'Review this PR description...' }
  ],
});

// First request: full price (cache write)
// Subsequent requests within 5 minutes: 90% cheaper (cache read)
// Cache write: $3.75/1M tokens (25% premium to write)
// Cache read: $0.30/1M tokens (90% discount vs the uncached $3/1M)

// Cache a large context (like documentation or a codebase):
import fs from 'fs';

const docsContext = fs.readFileSync('docs/api-reference.md', 'utf-8');

const response = await anthropic.messages.create({
  model: 'claude-3-5-sonnet-20241022',
  max_tokens: 1024,
  system: 'You are a helpful developer support assistant.',
  messages: [
    {
      role: 'user',
      content: [
        {
          type: 'text',
          text: `Here is our API documentation:\n\n${docsContext}`,
          cache_control: { type: 'ephemeral' },  // Cache the long context
        },
        {
          type: 'text',
          text: userQuestion,  // Not cached — changes each request
        },
      ],
    },
  ],
});

// Check cache status:
console.log(response.usage);
// { cache_creation_input_tokens: 15000, cache_read_input_tokens: 0, input_tokens: 50 }
// → First request: writing cache
// On second request: cache_read_input_tokens: 15000 (90% cheaper)

Rules for caching to work:

  • Minimum cacheable length: 1024 tokens (2048 for Haiku models)
  • Content must be identical across requests (any change = new cache write)
  • Cache TTL: 5 minutes for ephemeral type
  • Cache position: put cache_control on the last stable block — everything up to and including it is cached, so varying content must come after it
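These rules are easy to get wrong when assembling content arrays by hand. A small helper makes the breakpoint placement explicit — `buildCachedContent` is our own sketch, not an SDK function:

```typescript
// Hypothetical helper: mark the last stable text block for caching,
// then append the per-request (varying) text uncached.
type TextBlock = {
  type: 'text';
  text: string;
  cache_control?: { type: 'ephemeral' };
};

function buildCachedContent(stableParts: string[], varying: string): TextBlock[] {
  const blocks: TextBlock[] = stableParts.map((text) => ({ type: 'text', text }));
  if (blocks.length > 0) {
    // The breakpoint goes on the LAST stable block; the whole prefix up to
    // and including it is cached. The varying text after it is not.
    blocks[blocks.length - 1].cache_control = { type: 'ephemeral' };
  }
  blocks.push({ type: 'text', text: varying });
  return blocks;
}
```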

Extended Thinking

Extended thinking enables Claude to reason step-by-step before producing its final response — dramatically improving accuracy on math, coding, and complex analysis.

// Enable extended thinking:
const response = await anthropic.messages.create({
  model: 'claude-3-7-sonnet-20250219',  // extended thinking requires Claude 3.7+ models
  max_tokens: 16000,
  thinking: {
    type: 'enabled',
    budget_tokens: 10000,  // Max tokens Claude can use for thinking
  },
  messages: [
    {
      role: 'user',
      content: `A train leaves Station A at 60 mph heading to Station B,
        250 miles away. 30 minutes later, another train leaves Station B
        at 80 mph heading to Station A. When and where do they meet?`,
    },
  ],
});

// Response contains both thinking blocks and the final answer:
for (const block of response.content) {
  if (block.type === 'thinking') {
    console.log('Thinking:', block.thinking);  // Claude's reasoning process
  } else if (block.type === 'text') {
    console.log('Answer:', block.text);        // Final answer to user
  }
}

// Streaming with thinking:
const stream = await anthropic.messages.stream({
  model: 'claude-3-7-sonnet-20250219',  // extended thinking requires Claude 3.7+ models
  max_tokens: 8000,
  thinking: { type: 'enabled', budget_tokens: 5000 },
  messages: [{ role: 'user', content: complexCodingTask }],
});

// Track thinking vs response text separately via raw stream events:
let thinkingText = '';
let responseText = '';

for await (const event of stream) {
  if (event.type === 'content_block_delta') {
    if (event.delta.type === 'thinking_delta') thinkingText += event.delta.thinking;
    else if (event.delta.type === 'text_delta') responseText += event.delta.text;
  }
}

// Most apps hide the thinking block from users but use it for debugging

When to use extended thinking:

  • Math and logic problems
  • Complex code generation or debugging
  • Multi-step reasoning tasks
  • Analysis requiring several considerations

Cost note: thinking tokens count toward max_tokens and are billed at output rates.
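A quick back-of-the-envelope using the Sonnet 3.5 prices from the table above ($3/1M input, $15/1M output) shows why the budget matters — `estimateCostUSD` is just illustrative arithmetic, not an SDK call:

```typescript
// Rough cost estimate for a request with extended thinking.
// Thinking tokens bill at the OUTPUT rate and count toward max_tokens.
const INPUT_PER_M = 3;    // $ per 1M input tokens (Sonnet 3.5)
const OUTPUT_PER_M = 15;  // $ per 1M output tokens (Sonnet 3.5)

function estimateCostUSD(inputTokens: number, thinkingTokens: number, answerTokens: number): number {
  const input = (inputTokens / 1_000_000) * INPUT_PER_M;
  const output = ((thinkingTokens + answerTokens) / 1_000_000) * OUTPUT_PER_M;
  return input + output;
}

// 2K prompt + 8K thinking + 1K answer ≈ $0.141 per request —
// the thinking budget dominates the bill.
```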


Tool Use (Function Calling)

const tools: Anthropic.Messages.Tool[] = [
  {
    name: 'search_web',
    description: 'Search the web for current information',
    input_schema: {
      type: 'object',
      properties: {
        query: { type: 'string', description: 'Search query' },
        max_results: { type: 'number', description: 'Number of results', default: 5 },
      },
      required: ['query'],
    },
  },
  {
    name: 'run_code',
    description: 'Execute Python code and return the output',
    input_schema: {
      type: 'object',
      properties: {
        code: { type: 'string', description: 'Python code to execute' },
      },
      required: ['code'],
    },
  },
];

async function runAgentLoop(userMessage: string): Promise<string> {
  const messages: Anthropic.Messages.MessageParam[] = [
    { role: 'user', content: userMessage },
  ];

  while (true) {
    const response = await anthropic.messages.create({
      model: 'claude-3-5-sonnet-20241022',
      max_tokens: 4096,
      tools,
      messages,
    });

    // Stop when Claude finishes without requesting tools (end_turn, max_tokens, etc.):
    if (response.stop_reason !== 'tool_use') {
      return response.content
        .filter((b) => b.type === 'text')
        .map((b) => (b as Anthropic.Messages.TextBlock).text)
        .join('');
    }

    // Handle tool calls:
    if (response.stop_reason === 'tool_use') {
      // Add Claude's response (may include text + tool_use blocks):
      messages.push({ role: 'assistant', content: response.content });

      // Execute each tool:
      const toolResults: Anthropic.Messages.ToolResultBlockParam[] = [];
      for (const block of response.content) {
        if (block.type !== 'tool_use') continue;

        let result: unknown;
        try {
          result = await executeTool(block.name, block.input as Record<string, unknown>);
        } catch (err) {
          result = `Error: ${err instanceof Error ? err.message : 'Unknown error'}`;
        }

        toolResults.push({
          type: 'tool_result',
          tool_use_id: block.id,
          content: JSON.stringify(result),
          // is_error: true,  // Uncomment for error results
        });
      }

      messages.push({ role: 'user', content: toolResults });
    }
  }
}
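The loop above assumes an `executeTool` dispatcher. One minimal shape it could take — the handler map and its stubbed entries are illustrative, not part of the SDK:

```typescript
// Hypothetical dispatcher: map tool names to handler functions.
type ToolHandler = (input: Record<string, unknown>) => Promise<unknown>;

const toolHandlers: Record<string, ToolHandler> = {
  search_web: async (input) => {
    // Call your real search backend here; stubbed for illustration.
    return { results: [`result for "${String(input.query)}"`] };
  },
  run_code: async (input) => {
    // Run in a sandbox in real code — never eval untrusted model output.
    return { stdout: `(would execute ${String(input.code).length} chars of Python)` };
  },
};

async function executeTool(name: string, input: Record<string, unknown>): Promise<unknown> {
  const handler = toolHandlers[name];
  if (!handler) throw new Error(`Unknown tool: ${name}`);
  return handler(input);
}
```

Throwing on unknown tools (rather than returning silently) lets the agent loop surface the error back to Claude as a `tool_result` with `is_error: true`.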

Vision: Analyzing Images

// Image from URL:
const response = await anthropic.messages.create({
  model: 'claude-3-5-sonnet-20241022',
  max_tokens: 1024,
  messages: [
    {
      role: 'user',
      content: [
        {
          type: 'image',
          source: {
            type: 'url',
            url: 'https://example.com/screenshot.png',
          },
        },
        {
          type: 'text',
          text: 'What UI issues do you see in this screenshot? Be specific.',
        },
      ],
    },
  ],
});

// Image from file (base64):
import fs from 'fs';

const imageBuffer = fs.readFileSync('diagram.png');
const base64Image = imageBuffer.toString('base64');

const base64Response = await anthropic.messages.create({
  model: 'claude-3-5-sonnet-20241022',
  max_tokens: 2048,
  messages: [
    {
      role: 'user',
      content: [
        {
          type: 'image',
          source: {
            type: 'base64',
            media_type: 'image/png',  // 'image/jpeg', 'image/gif', 'image/webp'
            data: base64Image,
          },
        },
        { type: 'text', text: 'Explain this architecture diagram.' },
      ],
    },
  ],
});

Message History (Multi-turn Conversations)

// Stateless pattern — manage history yourself:
const conversationHistory: Anthropic.Messages.MessageParam[] = [];

async function chat(userMessage: string): Promise<string> {
  // Add user message:
  conversationHistory.push({ role: 'user', content: userMessage });

  const response = await anthropic.messages.create({
    model: 'claude-3-5-sonnet-20241022',
    max_tokens: 1024,
    system: 'You are a helpful assistant.',
    messages: conversationHistory,
  });

  const assistantMessage = response.content
    .filter((b) => b.type === 'text')
    .map((b) => (b as Anthropic.Messages.TextBlock).text)
    .join('');

  // Add assistant response to history:
  conversationHistory.push({ role: 'assistant', content: response.content });

  return assistantMessage;
}
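One caveat with this pattern: the history grows without bound, and you pay for every token of it on each request. A crude but effective fix is to cap the number of retained turns — `trimHistory` is our own helper; real apps often trim by estimated token count instead:

```typescript
// Keep at most `maxTurns` user/assistant pairs, always preserving the
// most recent messages. Crude but effective for bounding input cost.
function trimHistory<T>(history: T[], maxTurns: number): T[] {
  const maxMessages = maxTurns * 2;  // each turn = one user + one assistant message
  return history.length <= maxMessages ? history : history.slice(-maxMessages);
}
```

Note that dropping messages from the front changes the prompt prefix and so invalidates any cached prefix — token-based trimming pairs best with cache breakpoints on truly stable content like the system prompt.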

Cost Optimization Checklist

1. Model selection:
   → Haiku 3.5 for classification, simple extraction, routing
   → Sonnet 3.5 for most production tasks (best value)
   → Opus only for complex reasoning where quality is critical

2. Prompt caching:
   → Cache system prompts >1024 tokens
   → Cache large context (docs, codebase, examples)
   → Saves 90% on input tokens for cached content

3. Token budgeting:
   → Set max_tokens to reasonable upper bound (not 4096 "just in case")
   → Use streaming to stop early when you have enough output
   → Shorter system prompts → lower costs

4. Batching:
   → Use Message Batches API for offline/async workloads
   → 50% cost reduction, up to 24hr processing window
   → Great for: bulk classification, data extraction, eval runs
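For point 4, a batch is an array of `{ custom_id, params }` entries, where `params` are ordinary `messages.create` parameters. A sketch of building one for bulk classification — the helper is ours, so double-check the `batches.create` request shape against the current SDK docs:

```typescript
// Build a Message Batches request array for bulk sentiment classification.
// Each entry pairs a custom_id (used to match results later) with normal
// messages.create params.
type BatchRequest = {
  custom_id: string;
  params: {
    model: string;
    max_tokens: number;
    messages: { role: 'user'; content: string }[];
  };
};

function buildClassificationBatch(texts: string[]): BatchRequest[] {
  return texts.map((text, i) => ({
    custom_id: `classify-${i}`,
    params: {
      model: 'claude-3-5-haiku-20241022',  // cheap model for a simple task
      max_tokens: 10,
      messages: [{ role: 'user', content: `Classify as positive/negative/neutral: ${text}` }],
    },
  }));
}

// Then submit with:
// await anthropic.messages.batches.create({ requests: buildClassificationBatch(texts) });
```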

Compare all AI APIs including Anthropic at APIScout.
