ElevenLabs vs OpenAI TTS vs Deepgram Aura
TL;DR
ElevenLabs for voice quality and cloning. OpenAI TTS for simplicity and ecosystem. Deepgram Aura for production-grade low latency at scale. ElevenLabs produces the most natural-sounding speech and is the only one of the three with high-quality voice cloning from about a minute of audio. OpenAI TTS is good enough for most use cases and has the simplest API. Deepgram Aura wins on first-byte latency (~200ms), which matters for real-time voice apps. The right choice depends on whether you're building a voice product (ElevenLabs), an AI assistant (Deepgram), or just adding audio to your app (OpenAI).
Key Takeaways
- ElevenLabs: best voice quality, voice cloning, 32 languages, $0.30/1K chars ($0.0003/char)
- OpenAI TTS: simple API, several preset voices, $15/1M chars ($0.000015/char) — 20x cheaper than ElevenLabs
- Deepgram Aura: ~200ms first byte, streaming WebSocket, $0.015/1K chars
- Streaming first-byte latency: Deepgram ~200ms, OpenAI ~400ms, ElevenLabs ~500ms (turbo; ~200ms with flash)
- Voice cloning: ElevenLabs only (30-second to 1-minute sample needed)
- Real-time voice: Deepgram Aura + STT in same platform = low-latency voice assistant loop
OpenAI TTS: Simplest API
Best for: adding audio to an existing OpenAI app, simple narration, notifications
import OpenAI from 'openai';
import fs from 'fs';
const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });
// Generate audio file:
const mp3 = await openai.audio.speech.create({
model: 'tts-1', // or 'tts-1-hd' (higher quality, ~2x cost)
voice: 'alloy', // alloy, ash, coral, echo, fable, nova, onyx, sage, shimmer
input: 'Hello! Welcome to our platform.',
response_format: 'mp3', // mp3, opus, aac, flac, wav, pcm
speed: 1.0, // 0.25 to 4.0
});
// Save to file:
const buffer = Buffer.from(await mp3.arrayBuffer());
fs.writeFileSync('speech.mp3', buffer);
// Streaming (for long text):
const stream = await openai.audio.speech.create({
model: 'tts-1',
voice: 'nova',
input: longText,
response_format: 'mp3',
});
// Stream to file or HTTP response. The SDK returns a web ReadableStream,
// which has no .pipe(); convert it to a Node stream first:
import { Readable } from 'stream';
const dest = fs.createWriteStream('speech.mp3');
Readable.fromWeb(stream.body as any).pipe(dest);
// Next.js API route — stream audio to browser:
export async function POST(req: Request) {
const { text, voice = 'alloy' } = await req.json();
const mp3 = await openai.audio.speech.create({
model: 'tts-1',
voice,
input: text,
});
return new Response(mp3.body, {
headers: {
'Content-Type': 'audio/mpeg',
'Transfer-Encoding': 'chunked',
},
});
}
OpenAI TTS Voices
| Voice | Character | Best For |
|---|---|---|
| alloy | Neutral, balanced | General use |
| ash | Warm, conversational | Chatbots |
| coral | Clear, professional | Announcements |
| echo | Deep, authoritative | Presentations |
| nova | Bright, friendly | Customer service |
| onyx | Rich, deep | Narration |
| sage | Calm, clear | Education |
| shimmer | Warm, expressive | Stories |
OpenAI TTS Pricing
tts-1: $15/1M characters ($0.000015/char)
tts-1-hd: $30/1M characters ($0.000030/char)
For context:
Average sentence (100 chars): $0.0015
1-minute narration (~800 chars): $0.012
1-hour audiobook (~48K chars): $0.72
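The arithmetic above generalizes to a one-line helper. A sketch, with prices hard-coded from the table (adjust if OpenAI's pricing changes):

```typescript
// Per-character prices from the pricing notes above (USD).
const PRICE_PER_CHAR = {
  'tts-1': 15 / 1_000_000,    // $15 per 1M characters
  'tts-1-hd': 30 / 1_000_000, // $30 per 1M characters
} as const;

// Estimate the cost of synthesizing `text` with a given model.
function estimateTtsCost(
  text: string,
  model: keyof typeof PRICE_PER_CHAR = 'tts-1'
): number {
  return text.length * PRICE_PER_CHAR[model];
}

// A 100-char sentence on tts-1 costs $0.0015; the same on tts-1-hd costs $0.003.
```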
ElevenLabs: Premium Quality and Voice Cloning
Best for: high-quality voice content, voice cloning, multilingual apps, podcasts/audiobooks
// npm install elevenlabs
import { ElevenLabsClient } from 'elevenlabs';
const elevenlabs = new ElevenLabsClient({
apiKey: process.env.ELEVENLABS_API_KEY,
});
// Convert text to speech:
const audioStream = await elevenlabs.textToSpeech.convertAsStream('21m00Tcm4TlvDq8ikWAM', { // Rachel's voice ID — the method takes a voice ID, not a name
text: 'Welcome to our platform.',
model_id: 'eleven_turbo_v2', // Fast model; 'eleven_multilingual_v2' for multilingual
voice_settings: {
stability: 0.5, // 0-1: lower = more expressive, higher = more consistent
similarity_boost: 0.8, // 0-1: higher = more similar to original voice
style: 0.0, // 0-1: style exaggeration
use_speaker_boost: true,
},
output_format: 'mp3_44100_128',
});
// Collect stream into buffer:
const chunks: Uint8Array[] = [];
for await (const chunk of audioStream) {
chunks.push(chunk);
}
const audio = Buffer.concat(chunks);
fs.writeFileSync('speech.mp3', audio);
// List available voices:
const voices = await elevenlabs.voices.getAll();
for (const voice of voices.voices) {
console.log(`${voice.name}: ${voice.voice_id} (${voice.labels?.accent ?? 'no accent'})`);
}
// Use a specific voice by ID:
const adamAudio = await elevenlabs.textToSpeech.convertAsStream(
'pNInz6obpgDQGcFmaJgB', // Adam voice ID
{ text: 'Hello world', model_id: 'eleven_turbo_v2' }
);
ElevenLabs Voice Cloning
// Instant Voice Cloning (1 minute of audio → custom voice):
const voiceClone = await elevenlabs.voices.ivc.create({
name: 'My Custom Voice',
description: 'A custom voice for our product',
files: [
new File([fs.readFileSync('voice-sample.mp3')], 'sample.mp3', { type: 'audio/mpeg' }),
],
labels: JSON.stringify({ accent: 'American', age: 'young adult', gender: 'female' }),
});
console.log('Voice ID:', voiceClone.voice_id);
// Now use the cloned voice:
const audio = await elevenlabs.textToSpeech.convertAsStream(voiceClone.voice_id, {
text: 'This is my cloned voice.',
model_id: 'eleven_turbo_v2',
});
ElevenLabs Models
| Model | Quality | Latency | Languages | Notes |
|---|---|---|---|---|
| eleven_turbo_v2 | Good | ~500ms | 32 | Best balance |
| eleven_turbo_v2_5 | Better | ~500ms | 32 | Improved quality |
| eleven_multilingual_v2 | Best | ~800ms | 29 | Highest quality |
| eleven_flash_v2_5 | Good | ~200ms | 32 | Lowest latency |
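The table reduces to a simple decision rule: flash when first-byte latency dominates, multilingual when quality dominates, turbo otherwise. A sketch encoding that rule (model IDs taken from the table):

```typescript
type Priority = 'latency' | 'quality' | 'balanced';

// Map a product priority to one of the model IDs in the table above.
function pickElevenLabsModel(priority: Priority): string {
  switch (priority) {
    case 'latency':
      return 'eleven_flash_v2_5'; // ~200ms first byte
    case 'quality':
      return 'eleven_multilingual_v2'; // highest quality, ~800ms
    case 'balanced':
      return 'eleven_turbo_v2_5'; // good quality, ~500ms
  }
}
```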
ElevenLabs Pricing
Creator ($22/month): 100K chars/month
Pro ($99/month): 500K chars/month + commercial use + voice cloning
Scale ($330/month): 2M chars/month
Enterprise: custom
Beyond plan limits (usage-based overage):
ElevenLabs: $0.30/1K chars ($0.0003/char)
vs OpenAI: $0.015/1K chars ($0.000015/char)
ElevenLabs is 20x more expensive than OpenAI TTS.
Worth it for: high-quality consumer products, voice assistants, content creation.
Not worth it for: internal tools, simple notifications, cost-sensitive apps.
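To see where the 20x multiplier actually bites, compare monthly bills: plan fee plus overage for ElevenLabs versus a flat per-character rate for OpenAI. A sketch using the plan numbers above (illustrative, not a quote):

```typescript
// Monthly cost on an ElevenLabs plan: the fee covers `included` characters;
// anything beyond is billed per character at the overage rate.
function elevenLabsMonthly(
  chars: number,
  planFee: number,
  included: number,
  overagePerChar = 0.0003
): number {
  return planFee + Math.max(0, chars - included) * overagePerChar;
}

// OpenAI TTS has no plan fee — pure pay-as-you-go.
function openAiMonthly(chars: number, perChar = 0.000015): number {
  return chars * perChar;
}

// At 1M chars/month on the Creator plan ($22, 100K included):
// ElevenLabs: 22 + 900,000 * 0.0003 = $292; OpenAI: $15.
```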
Deepgram Aura: Low-Latency Production
Best for: real-time voice assistants, customer service bots, apps needing fast first-byte response
// npm install @deepgram/sdk
import { createClient } from '@deepgram/sdk';
const deepgram = createClient(process.env.DEEPGRAM_API_KEY!);
// TTS with streaming:
const response = await deepgram.speak.request(
{ text: 'Hello! How can I help you today?' },
{
model: 'aura-asteria-en', // Fastest English voice
encoding: 'linear16', // PCM16 for real-time playback
sample_rate: 24000,
}
);
const stream = await response.getStream();
if (stream) {
// First bytes arrive in ~200ms (vs ~400ms for OpenAI)
const audioData = await getAudioBuffer(stream); // helper, not part of the SDK
}
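`getAudioBuffer` above is not part of the Deepgram SDK; one way to write it is to drain the web `ReadableStream` returned by `getStream()` into a single Buffer:

```typescript
// Collect a web ReadableStream<Uint8Array> (what getStream() returns)
// into one contiguous Buffer.
async function getAudioBuffer(stream: ReadableStream<Uint8Array>): Promise<Buffer> {
  const reader = stream.getReader();
  const chunks: Uint8Array[] = [];
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    if (value) chunks.push(value);
  }
  return Buffer.concat(chunks);
}
```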
// WebSocket for true real-time (lower overhead than HTTP):
import { LiveTTSEvents } from '@deepgram/sdk';
const ws = deepgram.speak.live({
  model: 'aura-asteria-en',
  encoding: 'linear16',
  sample_rate: 24000,
});
ws.on(LiveTTSEvents.Open, () => {
  ws.sendText('Hello! I am your AI assistant.');
  ws.flush(); // tell the server to synthesize the buffered text
});
ws.on(LiveTTSEvents.Audio, (audioChunk: Buffer) => {
  // Stream each chunk to audio output immediately
  playAudioChunk(audioChunk);
});
ws.on(LiveTTSEvents.Close, () => console.log('Done'));
Deepgram Aura Voices
English voices:
aura-asteria-en — Female, warm (recommended)
aura-luna-en — Female, natural
aura-stella-en — Female, clear
aura-athena-en — Female, authoritative
aura-hera-en — Female, confident
aura-orion-en — Male, deep
aura-arcas-en — Male, warm
aura-perseus-en — Male, authoritative
aura-angus-en — Male, Irish accent
aura-orpheus-en — Male, American
aura-helios-en — Male, British accent
aura-zeus-en — Male, commanding
Deepgram Full Voice Pipeline (STT + TTS)
The killer use case: Deepgram handles both speech-to-text and TTS, minimizing round trips:
// Complete voice assistant loop with Deepgram:
import { createClient, LiveTranscriptionEvents } from '@deepgram/sdk';
const deepgram = createClient(process.env.DEEPGRAM_API_KEY!);
// 1. Speech-to-Text (Deepgram Nova-3):
const sttConnection = deepgram.listen.live({
model: 'nova-3',
language: 'en-US',
smart_format: true,
interim_results: false,
utterance_end_ms: 1000,
});
sttConnection.on(LiveTranscriptionEvents.Transcript, async (data) => {
const transcript = data.channel?.alternatives[0]?.transcript;
if (!transcript || data.is_final === false) return;
// 2. Send to LLM:
const llmResponse = await getLLMResponse(transcript);
// 3. Text-to-Speech (Deepgram Aura):
const ttsResponse = await deepgram.speak.request(
{ text: llmResponse },
{ model: 'aura-asteria-en', encoding: 'linear16', sample_rate: 24000 }
);
const audioStream = await ttsResponse.getStream();
// Play audio immediately
playStream(audioStream);
});
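`getLLMResponse` is a placeholder in the loop above. A minimal sketch against the OpenAI Chat Completions API, written with an injected client so it can be stubbed in tests or swapped for another provider (the model name is an assumption):

```typescript
// Minimal shape of the chat client we need (matches the OpenAI SDK's surface).
interface ChatClient {
  chat: {
    completions: {
      create(params: {
        model: string;
        messages: { role: 'system' | 'user'; content: string }[];
      }): Promise<{ choices: { message: { content: string | null } }[] }>;
    };
  };
}

// Turn a transcript into a short spoken reply. Keep answers brief: every
// extra character adds TTS latency and cost.
async function getLLMResponse(transcript: string, client: ChatClient): Promise<string> {
  const completion = await client.chat.completions.create({
    model: 'gpt-4o-mini', // assumption: any chat-capable model works here
    messages: [
      { role: 'system', content: 'You are a voice assistant. Answer in one or two short sentences.' },
      { role: 'user', content: transcript },
    ],
  });
  return completion.choices[0]?.message?.content ?? "Sorry, I didn't catch that.";
}
```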
Deepgram Pricing
Aura TTS:
Pay-as-you-go: $0.015/1K chars ($0.000015/char)
Same price as OpenAI TTS-1 but with better latency
For 1M chars/month:
OpenAI TTS-1: $15
Deepgram Aura: $15
ElevenLabs: $300 (on pay-as-you-go)
Head-to-Head: When to Choose
| Need | Best Choice |
|---|---|
| Simplest setup | OpenAI TTS |
| Highest voice quality | ElevenLabs |
| Voice cloning | ElevenLabs |
| Lowest cost | OpenAI TTS or Deepgram |
| Fastest first-byte | Deepgram Aura or ElevenLabs Flash |
| Multilingual (29+ languages) | ElevenLabs |
| Full STT+TTS pipeline | Deepgram |
| Already using OpenAI | OpenAI TTS |
| Real-time voice assistant | Deepgram Aura |
| Consumer product (quality matters) | ElevenLabs |
| Internal tool | OpenAI TTS |
Code: Audio Playback in the Browser
// Play audio from TTS API in browser:
async function playText(text: string) {
const response = await fetch('/api/tts', {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({ text }),
});
const arrayBuffer = await response.arrayBuffer();
const audioContext = new AudioContext();
// For MP3:
const audioBuffer = await audioContext.decodeAudioData(arrayBuffer);
const source = audioContext.createBufferSource();
source.buffer = audioBuffer;
source.connect(audioContext.destination);
source.start();
// Returns promise when done playing:
return new Promise<void>((resolve) => {
source.onended = () => resolve();
});
}
// Streaming playback (start playing before download complete):
async function playStreamingText(text: string) {
const response = await fetch('/api/tts', {
method: 'POST',
body: JSON.stringify({ text }),
headers: { 'Content-Type': 'application/json' },
});
const audioContext = new AudioContext();
const reader = response.body!.getReader();
// ... chunk-by-chunk audio playback
}
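Whichever provider you use, a second latency lever is to split long text at sentence boundaries and synthesize the first chunk immediately while the rest is still in flight. A naive sketch (a production splitter should handle abbreviations, decimals, and so on):

```typescript
// Split text into sentence-sized chunks so TTS can start on the first one
// while later ones are still being synthesized.
function chunkForTts(text: string, maxLen = 200): string[] {
  // Greedy sentence match: runs of non-terminators followed by .!? (or a
  // trailing fragment with no terminator).
  const sentences = text.match(/[^.!?]+[.!?]+\s*|[^.!?]+$/g) ?? [text];
  const chunks: string[] = [];
  let current = '';
  for (const sentence of sentences) {
    // Start a new chunk when adding this sentence would exceed maxLen.
    if (current && current.length + sentence.length > maxLen) {
      chunks.push(current.trim());
      current = '';
    }
    current += sentence;
  }
  if (current.trim()) chunks.push(current.trim());
  return chunks;
}
```

Send each chunk to the TTS API in order and queue the audio for playback; the user hears the first sentence while the rest is still being generated.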
Compare all voice and speech APIs at APIScout.