Building an AI Agent in 2026: Architecture Patterns and Tools
TL;DR
An AI agent is just an LLM in a loop that can call tools. The complexity comes from: deciding when to stop, handling tool errors gracefully, managing memory across steps, and preventing infinite loops. In 2026, you have three options: build your own loop (most control), use the Vercel AI SDK's maxSteps feature (easiest for simple agents), or use Mastra/OpenAI Agents SDK for multi-agent orchestration. The choice depends on complexity — single-task agents are easy, multi-step research agents are hard, multi-agent systems with coordination are very hard.
Key Takeaways
- Agent = LLM + tools + loop — the fundamentals are simple; production is hard
- Vercel AI SDK: maxSteps enables multi-turn tool-use loops in ~10 lines of code
- OpenAI Agents SDK: handoffs between agents, guardrails, tracing — best for OpenAI-centric stacks
- Mastra: TypeScript-first, provider-agnostic, built-in memory and workflow support
- Memory patterns: in-context (short-term), vector store (semantic recall), structured DB (facts)
- The real challenge: error recovery, loop detection, cost caps, graceful degradation
The Core Agent Loop
Every agent starts here:
// The minimal agent loop:
import { generateText, tool } from 'ai';
import { openai } from '@ai-sdk/openai';
import { z } from 'zod';
import fs from 'node:fs/promises';
// webSearch is assumed to be a helper backed by your search provider.
const result = await generateText({
model: openai('gpt-4o'),
maxSteps: 10, // ← This is what makes it an agent (Vercel AI SDK)
tools: {
searchWeb: tool({
description: 'Search the web for information',
parameters: z.object({ query: z.string() }),
execute: async ({ query }) => {
return await webSearch(query);
},
}),
writeFile: tool({
description: 'Write content to a file',
parameters: z.object({
filename: z.string(),
content: z.string(),
}),
execute: async ({ filename, content }) => {
await fs.writeFile(filename, content);
return { success: true, filename };
},
}),
},
prompt: 'Research the top 5 AI companies in 2026 and save a report to ai-report.md',
});
console.log('Steps taken:', result.steps.length);
console.log('Final output:', result.text);
That's it. maxSteps: 10 tells the SDK to keep calling the LLM until it finishes (no more tool calls) or hits the step limit. Everything else is complexity management.
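maxSteps caps runaway loops bluntly, but a subtler failure is the agent re-issuing the identical tool call every step until the cap is hit. A small detector can catch that early — a sketch, where ToolCallRecord and isLooping are illustrative names, not AI SDK APIs:

```typescript
// Sketch: detect when the agent repeats the same tool call with the same
// arguments — a common "stuck" pattern that burns the whole step budget.
type ToolCallRecord = { toolName: string; args: unknown };

function isLooping(calls: ToolCallRecord[], windowSize = 3): boolean {
  if (calls.length < windowSize) return false;
  // Fingerprint the last N calls; if they are all identical, we're stuck.
  const recent = calls
    .slice(-windowSize)
    .map((c) => `${c.toolName}:${JSON.stringify(c.args)}`);
  return new Set(recent).size === 1;
}
```

Record each step's tool calls in onStepFinish and bail out (or inject a corrective system message) once isLooping returns true.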
Memory: The Core Architecture Challenge
Agents need memory across steps, conversations, and sessions:
// Three memory types. (embed, vectorStore, and db below are stand-ins for
// your embedding model, vector database, and relational store of choice.)
// 1. In-context memory (short-term, current conversation):
const conversationHistory: CoreMessage[] = [];
// 2. Vector store memory (semantic recall across sessions):
async function rememberFact(content: string, userId: string) {
const embedding = await embed(content);
await vectorStore.upsert([{
id: crypto.randomUUID(),
values: embedding,
metadata: { content, userId, timestamp: Date.now() },
}]);
}
async function recall(query: string, userId: string, limit = 5) {
const embedding = await embed(query);
const results = await vectorStore.query({
vector: embedding,
topK: limit,
filter: { userId },
});
return results.matches.map((m) => m.metadata?.content as string);
}
// 3. Structured memory (facts, preferences, state):
interface AgentMemory {
userId: string;
preferences: Record<string, string>;
completedTasks: string[];
workingMemory: Record<string, unknown>;
}
// Full agent with memory:
async function runAgentWithMemory(userMessage: string, userId: string) {
// Load relevant memories:
const memories = await recall(userMessage, userId);
const userPrefs = await db.agentMemory.findUnique({ where: { userId } });
const systemPrompt = `You are a helpful assistant.
${memories.length > 0 ? `\nRelevant context from previous conversations:\n${memories.join('\n')}` : ''}
${userPrefs ? `\nUser preferences: ${JSON.stringify(userPrefs.preferences)}` : ''}`;
const result = await generateText({
model: openai('gpt-4o'),
maxSteps: 5,
system: systemPrompt,
messages: [...conversationHistory, { role: 'user', content: userMessage }],
tools: { /* ... */ },
onStepFinish: async ({ text, toolResults }) => {
// Optionally save important facts to memory during execution:
if (text.includes('REMEMBER:')) {
const factMatch = text.match(/REMEMBER: (.+)/);
if (factMatch) await rememberFact(factMatch[1], userId);
}
},
});
// Save to conversation history:
conversationHistory.push(
{ role: 'user', content: userMessage },
{ role: 'assistant', content: result.text }
);
return result.text;
}
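One gap in the sketch above: conversationHistory grows without bound and will eventually overflow the context window. A minimal sliding-window trim, generic over the message type — maxMessages is an illustrative budget, and production agents often summarize the dropped prefix into one message instead of discarding it:

```typescript
// Sketch: keep only the most recent messages before each model call.
function trimHistory<T>(history: T[], maxMessages = 20): T[] {
  if (history.length <= maxMessages) return history;
  return history.slice(-maxMessages);
}
```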
Error Recovery
Production agents must handle tool failures gracefully:
// Resilient tool execution with retry + fallback:
function createResilientTool<TParams, TResult>(config: {
name: string;
description: string;
parameters: z.ZodType<TParams>;
execute: (params: TParams) => Promise<TResult>;
fallback?: (params: TParams, error: Error) => TResult;
maxRetries?: number;
}) {
return tool({
description: config.description,
parameters: config.parameters,
execute: async (params) => {
const maxRetries = config.maxRetries ?? 2;
let lastError: Error | undefined;
for (let attempt = 0; attempt <= maxRetries; attempt++) {
try {
return await config.execute(params);
} catch (err) {
lastError = err instanceof Error ? err : new Error(String(err));
if (attempt < maxRetries) {
// Exponential backoff:
await new Promise((r) => setTimeout(r, 1000 * Math.pow(2, attempt)));
}
}
}
// All retries exhausted — use fallback or return structured error:
if (config.fallback) {
return config.fallback(params as TParams, lastError!);
}
// Return error as data so LLM can handle it:
return {
error: true,
message: lastError!.message,
tool: config.name,
} as TResult;
},
});
}
// Usage:
const searchWeb = createResilientTool({
name: 'searchWeb',
description: 'Search for information',
parameters: z.object({ query: z.string() }),
execute: async ({ query }) => externalSearchAPI(query),
fallback: ({ query }) => ({ results: [], error: `Search unavailable for "${query}"` }),
maxRetries: 2,
});
Multi-Agent Patterns
Coordinator + Specialists
// Pattern: one coordinator, multiple specialist agents
const specialists = {
researcher: async (task: string) => {
const result = await generateText({
model: openai('gpt-4o'),
system: 'You are a research specialist. Find information and cite sources.',
prompt: task,
maxSteps: 8,
tools: { searchWeb, scrapeUrl },
});
return result.text;
},
writer: async (brief: string) => {
const result = await generateText({
model: openai('gpt-4o'),
system: 'You are a technical writer. Write clear, structured content.',
prompt: brief,
maxSteps: 3,
tools: { formatMarkdown },
});
return result.text;
},
reviewer: async (content: string) => {
const { object } = await generateObject({
model: openai('gpt-4o'),
system: 'You are a quality reviewer. Check content for accuracy and clarity.',
prompt: `Review this content:\n\n${content}`,
schema: z.object({
approved: z.boolean(),
issues: z.array(z.string()),
suggestions: z.array(z.string()),
}),
});
return object;
},
};
// Coordinator orchestrates the flow:
async function coordinateResearch(topic: string) {
console.log('Step 1: Research');
const research = await specialists.researcher(`Research: ${topic}`);
console.log('Step 2: Write');
const draft = await specialists.writer(`Write an article based on:\n${research}`);
console.log('Step 3: Review');
const review = await specialists.reviewer(draft);
if (!review.approved) {
console.log('Step 4: Revise');
const revised = await specialists.writer(
`Revise this article:\n${draft}\n\nIssues to fix:\n${review.issues.join('\n')}`
);
return revised;
}
return draft;
}
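Note that the coordinator above revises at most once and ships the result even if the reviewer would still reject it. A bounded review loop keeps iterating until approval or a round budget runs out — a sketch, where write and review stand in for the specialist calls:

```typescript
// Sketch: revise until the reviewer approves, with a hard round limit so a
// persistently unhappy reviewer can't loop forever.
async function reviseUntilApproved(
  draft: string,
  write: (prompt: string) => Promise<string>,
  review: (content: string) => Promise<{ approved: boolean; issues: string[] }>,
  maxRounds = 3,
): Promise<string> {
  let current = draft;
  for (let round = 0; round < maxRounds; round++) {
    const verdict = await review(current);
    if (verdict.approved) return current;
    current = await write(
      `Revise this article:\n${current}\n\nIssues to fix:\n${verdict.issues.join('\n')}`,
    );
  }
  return current; // round budget exhausted — return best effort
}
```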
OpenAI Agents SDK Handoffs
// OpenAI Agents SDK (official TypeScript package: @openai/agents).
// Signatures below are a sketch — verify against the current SDK docs.
import { Agent, run } from '@openai/agents';
const supportAgent = new Agent({
name: 'Support Agent',
handoffDescription: 'For general support questions',
instructions: 'You handle customer support questions. Escalate billing issues.',
tools: [lookupOrder, updateTicket],
});
const billingAgent = new Agent({
name: 'Billing Agent',
handoffDescription: 'For billing, payment, or refund questions',
instructions: 'You handle billing disputes and refunds. You have authority to issue credits.',
tools: [lookupInvoice, issueCreditNote, processRefund],
});
const triageAgent = Agent.create({
name: 'Triage Agent',
instructions: 'Route customer requests to the right specialist.',
handoffs: [supportAgent, billingAgent],
// Guardrails (e.g. blocking PII in output) attach via the inputGuardrails /
// outputGuardrails options — see the SDK docs for the exact guardrail shape.
});
// Run:
const result = await run(triageAgent, 'I was charged twice last month');
console.log(result.finalOutput);
console.log('Handled by:', result.lastAgent?.name);
Cost and Safety Controls
// Agent with cost cap and timeout:
class SafeAgent {
private totalTokensUsed = 0;
private maxTokens: number;
private timeoutMs: number;
constructor({ maxTokens = 50000, timeoutMs = 60000 } = {}) {
this.maxTokens = maxTokens;
this.timeoutMs = timeoutMs;
}
async run(task: string): Promise<string> {
const startTime = Date.now();
const result = await Promise.race([
generateText({
model: openai('gpt-4o'),
maxSteps: 20,
prompt: task,
tools: { /* ... */ },
onStepFinish: ({ usage }) => {
this.totalTokensUsed += (usage?.totalTokens ?? 0);
if (this.totalTokensUsed > this.maxTokens) {
throw new Error(`Token budget exceeded: ${this.totalTokensUsed}/${this.maxTokens}`);
}
if (Date.now() - startTime > this.timeoutMs) {
throw new Error(`Agent timeout after ${this.timeoutMs}ms`);
}
},
}),
new Promise<never>((_, reject) =>
// Hard backstop beyond the in-loop timeout check; generateText also
// accepts an abortSignal option for cleaner cancellation of in-flight requests.
setTimeout(() => reject(new Error('Hard timeout')), this.timeoutMs + 5000)
),
]);
console.log(`Task complete. Tokens used: ${this.totalTokensUsed}`);
return result.text;
}
}
Observability: What's Your Agent Doing?
// Log every step for debugging:
const result = await generateText({
model: openai('gpt-4o'),
maxSteps: 10,
prompt: task,
tools: { /* ... */ },
// Note: the AI SDK exposes onStepFinish only (there is no onStepStart hook):
onStepFinish: ({ stepType, text, toolResults, usage }) => {
console.log(`[Agent] Step: ${stepType}`);
if (stepType === 'tool-result') {
toolResults?.forEach((tr) => {
console.log(`[Tool] ${tr.toolName}(${JSON.stringify(tr.args)}) → ${JSON.stringify(tr.result)}`);
});
}
if (text) console.log(`[LLM] ${text.slice(0, 200)}...`);
if (usage) console.log(`[Usage] ${usage.promptTokens}+${usage.completionTokens} tokens`);
},
});
Discover AI APIs and agent frameworks at APIScout.