
ElevenLabs vs OpenAI TTS vs Deepgram Aura

APIScout Team
Tags: text-to-speech, elevenlabs, openai-tts, deepgram, voice-cloning, tts-api, 2026

TL;DR

ElevenLabs for voice quality and cloning. OpenAI TTS for simplicity and ecosystem. Deepgram Aura for low-latency production workloads at scale. ElevenLabs produces the most natural-sounding speech and, of the three, is the only one offering high-quality voice cloning from about a minute of audio. OpenAI TTS is good enough for most use cases and has the simplest API. Deepgram Aura leads on first-byte latency (~200ms), which matters for real-time voice apps. The right choice depends on whether you're building a voice product (ElevenLabs), a real-time AI voice assistant (Deepgram), or just adding audio to your app (OpenAI).

Key Takeaways

  • ElevenLabs: best voice quality, voice cloning, 32 languages, $0.30/1K chars ($0.0003/char)
  • OpenAI TTS: 9 built-in voices, simple API, $15/1M chars ($0.000015/char), 20x cheaper than ElevenLabs
  • Deepgram Aura: ~200ms first byte, streaming WebSocket, $0.015/1K chars
  • Latency for streaming: Deepgram ~200ms, OpenAI ~400ms, ElevenLabs ~500ms (Turbo models; Flash ~200ms)
  • Voice cloning: ElevenLabs only (30-second to 1-minute sample needed)
  • Real-time voice: Deepgram Aura + STT in same platform = low-latency voice assistant loop

OpenAI TTS: Simplest API

Best for: adding audio to an existing OpenAI app, simple narration, notifications

import OpenAI from 'openai';
import fs from 'fs';
import { pipeline } from 'stream/promises';

const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

// Generate audio file:
const mp3 = await openai.audio.speech.create({
  model: 'tts-1',       // or 'tts-1-hd' (higher quality, 2x cost)
  voice: 'alloy',       // alloy, ash, coral, echo, fable, nova, onyx, sage, shimmer
  input: 'Hello! Welcome to our platform.',
  response_format: 'mp3',   // mp3, opus, aac, flac, wav, pcm
  speed: 1.0,            // 0.25 to 4.0
});

// Save to file:
const buffer = Buffer.from(await mp3.arrayBuffer());
fs.writeFileSync('speech.mp3', buffer);

// Streaming (for long text; `input` is capped at 4,096 characters per request):
const stream = await openai.audio.speech.create({
  model: 'tts-1',
  voice: 'nova',
  input: longText,       // your text, 4,096 characters max
  response_format: 'mp3',
});

// Stream to a file without buffering the whole response in memory.
// pipeline() accepts async-iterable sources, which covers Node streams and web ReadableStreams (Node 18+):
const dest = fs.createWriteStream('speech.mp3');
await pipeline(stream.body as unknown as NodeJS.ReadableStream, dest);

// Next.js API route: stream audio to the browser
export async function POST(req: Request) {
  const { text, voice = 'alloy' } = await req.json();

  const mp3 = await openai.audio.speech.create({
    model: 'tts-1',
    voice,
    input: text,
  });

  // The server applies chunked transfer encoding itself when the body is a stream,
  // so only the content type needs to be set here.
  return new Response(mp3.body, {
    headers: { 'Content-Type': 'audio/mpeg' },
  });
}
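Because the speech endpoint rejects `input` longer than 4,096 characters, "long text" in practice means several requests whose audio you play or concatenate in order. A minimal sketch of a sentence-aware splitter, assuming sentences end in `.`, `!` or `?` followed by whitespace; `splitForTts` is a hypothetical helper, not part of the OpenAI SDK:

```typescript
// Hypothetical helper: split text into chunks of at most maxChars,
// preferring to break after sentence-ending punctuation.
// Whitespace between sentences is normalized to a single space.
function splitForTts(text: string, maxChars = 4096): string[] {
  // Split after ., ! or ? when followed by whitespace; the punctuation stays with its sentence.
  const sentences = text.trim().split(/(?<=[.!?])\s+/).filter((s) => s.length > 0);
  const chunks: string[] = [];
  let current = '';

  for (const sentence of sentences) {
    // A single sentence longer than the limit is hard-split on character count.
    if (sentence.length > maxChars) {
      if (current) {
        chunks.push(current);
        current = '';
      }
      for (let i = 0; i < sentence.length; i += maxChars) {
        chunks.push(sentence.slice(i, i + maxChars));
      }
      continue;
    }
    // Otherwise pack sentences into the current chunk until it would overflow.
    const candidate = current ? `${current} ${sentence}` : sentence;
    if (candidate.length > maxChars) {
      chunks.push(current);
      current = sentence;
    } else {
      current = candidate;
    }
  }
  if (current) chunks.push(current);
  return chunks;
}
```

Each chunk can then be passed as `input` to `openai.audio.speech.create` in sequence; splitting on sentences rather than raw character counts avoids cutting words mid-utterance, which is audible in the output.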

OpenAI TTS Voices

| Voice | Character | Best For |
|---|---|---|
| alloy | Neutral, balanced | General use |
| ash | Warm, conversational | Chatbots |
| coral | Clear, professional | Announcements |
| echo | Deep, authoritative | Presentations |
| nova | Bright, friendly | Customer service |
| onyx | Rich, deep | Narration |
| sage | Calm, clear | Education |
| shimmer | Warm, expressive | Stories |

OpenAI TTS Pricing

tts-1:    $15/1M characters ($0.000015/char)
tts-1-hd: $30/1M characters ($0.000030/char)

For context (at tts-1 rates):
  Average sentence (100 chars):  $0.0015
  1-minute narration (~800 chars): $0.012
  1-hour audiobook (~48K chars):  $0.72
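The figures above follow from two inputs: a flat per-character rate and the rule of thumb of roughly 800 characters per minute of narration. A small sketch that reproduces them; `CHARS_PER_MINUTE` is the article's estimate, not an OpenAI figure, and the function names are illustrative:

```typescript
// Per-million-character rates quoted above (USD).
const RATE_PER_MILLION = { 'tts-1': 15, 'tts-1-hd': 30 } as const;
type TtsModel = keyof typeof RATE_PER_MILLION;

// Rule of thumb used above: ~800 characters per minute of narration.
const CHARS_PER_MINUTE = 800;

function costForChars(chars: number, model: TtsModel = 'tts-1'): number {
  return (chars * RATE_PER_MILLION[model]) / 1_000_000;
}

function costForMinutes(minutes: number, model: TtsModel = 'tts-1'): number {
  return costForChars(minutes * CHARS_PER_MINUTE, model);
}

console.log(costForChars(100).toFixed(4));              // 0.0015
console.log(costForMinutes(1).toFixed(3));              // 0.012
console.log(costForMinutes(60).toFixed(2));             // 0.72
console.log(costForMinutes(60, 'tts-1-hd').toFixed(2)); // 1.44
```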

ElevenLabs: Premium Quality and Voice Cloning

Best for: high-quality voice content, voice cloning, multilingual apps, podcasts/audiobooks

// npm install elevenlabs
import { ElevenLabsClient } from 'elevenlabs';
import fs from 'fs';

const elevenlabs = new ElevenLabsClient({
  apiKey: process.env.ELEVENLABS_API_KEY,
});

// Convert text to speech (the first argument is a voice ID, not a display name):
const audioStream = await elevenlabs.textToSpeech.convertAsStream('21m00Tcm4TlvDq8ikWAM', { // Rachel
  text: 'Welcome to our platform.',
  model_id: 'eleven_turbo_v2',  // Fast model; 'eleven_multilingual_v2' for multilingual
  voice_settings: {
    stability: 0.5,        // 0-1: lower = more expressive, higher = more consistent
    similarity_boost: 0.8, // 0-1: higher = more similar to original voice
    style: 0.0,            // 0-1: style exaggeration
    use_speaker_boost: true,
  },
  output_format: 'mp3_44100_128',
});

// Collect the stream into a buffer:
const chunks: Uint8Array[] = [];
for await (const chunk of audioStream) {
  chunks.push(chunk);
}
const audio = Buffer.concat(chunks);
fs.writeFileSync('speech.mp3', audio);

// List available voices:
const voices = await elevenlabs.voices.getAll();
for (const voice of voices.voices) {
  console.log(`${voice.name}: ${voice.voice_id} (${voice.labels?.accent ?? 'no accent'})`);
}

// Use a specific voice by ID:
const adamStream = await elevenlabs.textToSpeech.convertAsStream(
  'pNInz6obpgDQGcFmaJgB',  // Adam voice ID
  { text: 'Hello world', model_id: 'eleven_turbo_v2' }
);
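`convertAsStream` takes a voice ID rather than a display name, so passing a name fails at request time. A minimal sketch of a name-to-ID lookup over the `voices` array returned by `voices.getAll()`; the `VoiceSummary` interface models only the two fields used here and is our assumption, not the SDK's own type:

```typescript
// Minimal shape of an entry in voices.getAll().voices (only the fields used here).
interface VoiceSummary {
  voice_id: string;
  name?: string;
}

// Case-insensitive name lookup; throws with the available names so typos are obvious.
function findVoiceId(voices: VoiceSummary[], name: string): string {
  const wanted = name.trim().toLowerCase();
  const match = voices.find((v) => (v.name ?? '').trim().toLowerCase() === wanted);
  if (!match) {
    const available = voices.map((v) => v.name ?? '(unnamed)').join(', ');
    throw new Error(`Voice "${name}" not found. Available: ${available}`);
  }
  return match.voice_id;
}

// Usage with the SDK (needs an API key, so shown as comments):
// const { voices } = await elevenlabs.voices.getAll();
// const rachelId = findVoiceId(voices, 'Rachel');
```

Resolving the ID once at startup and caching it avoids an extra `getAll()` call on every request.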

ElevenLabs Voice Cloning

// Instant Voice Cloning (1 minute of audio → custom voice).
// In the `elevenlabs` package this endpoint is voices.add(); the newer
// @elevenlabs/elevenlabs-js SDK exposes it as voices.ivc.create() with camelCase fields.
const voiceClone = await elevenlabs.voices.add({
  name: 'My Custom Voice',
  description: 'A custom voice for our product',
  files: [
    new File([fs.readFileSync('voice-sample.mp3')], 'sample.mp3', { type: 'audio/mpeg' }),
  ],
  labels: JSON.stringify({ accent: 'American', age: 'young adult', gender: 'female' }), // sent as a JSON string
});

console.log('Voice ID:', voiceClone.voice_id);

// Now use the cloned voice:
const clonedStream = await elevenlabs.textToSpeech.convertAsStream(voiceClone.voice_id, {
  text: 'This is my cloned voice.',
  model_id: 'eleven_turbo_v2',
});

ElevenLabs Models

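| Model | Quality | Latency | Languages | Notes |
|---|---|---|---|---|
| eleven_turbo_v2 | Good | ~500ms | English only | Best balance |
| eleven_turbo_v2_5 | Better | ~500ms | 32 | Improved quality |
| eleven_multilingual_v2 | Best | ~800ms | 29 | Highest quality |
| eleven_flash_v2_5 | Good | ~200ms | 32 | Lowest latency |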

ElevenLabs Pricing

Creator ($22/month): 100K chars/month
Pro ($99/month):     500K chars/month + commercial use + voice cloning
Scale ($330/month):  2M chars/month
Enterprise: custom

Beyond the plan limit (usage-based):
  ElevenLabs:  $0.30/1K chars ($0.0003/char)
  vs OpenAI:   $0.015/1K chars ($0.000015/char)

ElevenLabs is 20x more expensive than OpenAI tts-1 (10x more than tts-1-hd).
Worth it for: high-quality consumer products, voice assistants, content creation.
Not worth it for: internal tools, simple notifications, cost-sensitive apps.
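Whether a subscription or pure overage is cheaper depends on monthly volume. A rough sketch using only the plan prices listed above, with the simplifying assumption (ours) that every plan bills excess characters at the $0.30/1K rate quoted; actual per-plan overage rates may differ:

```typescript
interface Plan {
  name: string;
  monthlyUsd: number;
  includedChars: number;
}

// Plan prices and allowances as listed above.
const PLANS: Plan[] = [
  { name: 'Creator', monthlyUsd: 22, includedChars: 100_000 },
  { name: 'Pro', monthlyUsd: 99, includedChars: 500_000 },
  { name: 'Scale', monthlyUsd: 330, includedChars: 2_000_000 },
];

// Assumption: every plan bills extra characters at $0.30 per 1K.
const OVERAGE_PER_1K = 0.3;

function monthlyCost(plan: Plan, chars: number): number {
  const extraChars = Math.max(0, chars - plan.includedChars);
  return plan.monthlyUsd + (extraChars / 1000) * OVERAGE_PER_1K;
}

// Pick the plan with the lowest total (subscription + overage) for a given volume.
function cheapestPlan(chars: number): { plan: Plan; cost: number } {
  return PLANS.map((plan) => ({ plan, cost: monthlyCost(plan, chars) })).reduce(
    (best, option) => (option.cost < best.cost ? option : best),
  );
}

console.log(cheapestPlan(1_000_000).plan.name); // Pro
```

Under these assumptions, 1M characters a month costs about $292 on Creator, $249 on Pro and $330 on Scale, so the subscription tier matters nearly as much as the provider.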

Deepgram Aura: Low-Latency Production

Best for: real-time voice assistants, customer service bots, apps needing fast first-byte response

// npm install @deepgram/sdk
import { createClient } from '@deepgram/sdk';

const deepgram = createClient(process.env.DEEPGRAM_API_KEY!);

// TTS with streaming:
const response = await deepgram.speak.request(
  { text: 'Hello! How can I help you today?' },
  {
    model: 'aura-asteria-en',  // Female English voice (see the voice list below)
    encoding: 'linear16',       // PCM16 for real-time playback
    sample_rate: 24000,
  }
);

const stream = await response.getStream();
if (stream) {
  // First bytes arrive in ~200ms (vs ~400ms for OpenAI).
  // getAudioBuffer is your own helper that drains the stream into a Buffer:
  const audioData = await getAudioBuffer(stream);
}
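The REST example above calls `getAudioBuffer`, which is not part of the Deepgram SDK; you supply it yourself. A minimal sketch, assuming `getStream()` resolves to a web `ReadableStream<Uint8Array>` (Node 18+ provides `ReadableStream` globally):

```typescript
// Drain a web ReadableStream of bytes into a single Node Buffer.
async function getAudioBuffer(stream: ReadableStream<Uint8Array>): Promise<Buffer> {
  const reader = stream.getReader();
  const chunks: Uint8Array[] = [];
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    if (value) chunks.push(value);
  }
  reader.releaseLock();
  return Buffer.concat(chunks);
}
```

Buffering the whole response like this gives up the first-byte advantage for playback; it suits saving files, while live playback should consume chunks as they arrive.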
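
import { LiveTTSEvents } from '@deepgram/sdk';

// WebSocket for true real-time (lower overhead than HTTP):
const ws = deepgram.speak.live({
  model: 'aura-asteria-en',
  encoding: 'linear16',
  sample_rate: 24000,
});

ws.on(LiveTTSEvents.Open, () => {
  ws.sendText('Hello! I am your AI assistant.');
  ws.flush(); // synthesize the buffered text now instead of waiting for more
});

ws.on(LiveTTSEvents.Audio, (audioChunk) => {
  // Stream each chunk to audio output immediately (playAudioChunk is your own playback function)
  playAudioChunk(Buffer.from(audioChunk));
});

ws.on(LiveTTSEvents.Close, () => console.log('Done'));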
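First-byte numbers like ~200ms, ~400ms and ~500ms vary with region, text length and network, so it is worth measuring them from your own deployment. A minimal sketch that times the first chunk of any async-iterable audio source, such as a Node stream or the ElevenLabs `convertAsStream` result; `measureFirstChunk` is a hypothetical helper, not an SDK function:

```typescript
// Time from calling `start` until the first audio chunk arrives, plus totals.
async function measureFirstChunk(
  start: () => Promise<AsyncIterable<Uint8Array>>,
): Promise<{ firstChunkMs: number; totalMs: number; bytes: number }> {
  const t0 = performance.now();
  const source = await start();
  let firstChunkMs = -1; // stays -1 if the source yields nothing
  let bytes = 0;
  for await (const chunk of source) {
    if (firstChunkMs < 0) firstChunkMs = performance.now() - t0;
    bytes += chunk.byteLength;
  }
  return { firstChunkMs, totalMs: performance.now() - t0, bytes };
}

// Usage (voiceId is a placeholder; requires an API key):
// const stats = await measureFirstChunk(() =>
//   elevenlabs.textToSpeech.convertAsStream(voiceId, { text: 'Hello', model_id: 'eleven_flash_v2_5' }));
```

Timing starts before the request is issued, so the result includes connection setup, which is what a user actually waits for.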

Deepgram Aura Voices

English voices:
  aura-asteria-en     — Female, warm (recommended)
  aura-luna-en        — Female, natural
  aura-stella-en      — Female, clear
  aura-athena-en      — Female, authoritative
  aura-hera-en        — Female, confident
  aura-orion-en       — Male, deep
  aura-arcas-en       — Male, warm
  aura-perseus-en     — Male, authoritative
  aura-angus-en       — Male, Irish accent
  aura-orpheus-en     — Male, American
  aura-helios-en      — Male, British accent
  aura-zeus-en        — Male, commanding

Deepgram Full Voice Pipeline (STT + TTS)

The killer use case: Deepgram handles both speech-to-text and TTS, so the whole listen, respond, speak loop runs on one platform with one SDK and one API key:

// Complete voice assistant loop with Deepgram:
import { createClient, LiveTranscriptionEvents } from '@deepgram/sdk';

const deepgram = createClient(process.env.DEEPGRAM_API_KEY!);

// 1. Speech-to-Text (Deepgram Nova-3):
const sttConnection = deepgram.listen.live({
  model: 'nova-3',
  language: 'en-US',
  smart_format: true,
  interim_results: true,   // utterance_end_ms requires interim results to be enabled
  utterance_end_ms: 1000,
});

// Once LiveTranscriptionEvents.Open fires, send microphone audio with sttConnection.send(chunk).

sttConnection.on(LiveTranscriptionEvents.Transcript, async (data) => {
  const transcript = data.channel?.alternatives[0]?.transcript;
  if (!transcript || !data.is_final) return;  // ignore interim results

  // 2. Send to LLM (getLLMResponse is your own function):
  const llmResponse = await getLLMResponse(transcript);

  // 3. Text-to-Speech (Deepgram Aura):
  const ttsResponse = await deepgram.speak.request(
    { text: llmResponse },
    { model: 'aura-asteria-en', encoding: 'linear16', sample_rate: 24000 }
  );

  const audioStream = await ttsResponse.getStream();
  // Play audio immediately (playStream is your own playback function)
  if (audioStream) playStream(audioStream);
});
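The loop above waits for the complete LLM reply before calling TTS, which adds the whole generation time to the response latency. A common refinement is to stream the LLM output and send each finished sentence to TTS as soon as it is complete. A minimal sketch of the buffering step, assuming a sentence ends with `.`, `!` or `?` followed by whitespace; `SentenceBuffer` is a hypothetical helper, and wiring it to a specific LLM streaming API is left out:

```typescript
// Accumulates streamed LLM text and releases complete sentences for TTS.
class SentenceBuffer {
  private pending = '';

  // Add a streamed fragment; returns any sentences it completed.
  push(fragment: string): string[] {
    this.pending += fragment;
    const sentences: string[] = [];
    const boundary = /[.!?]\s+/g;
    let lastEnd = 0;
    let match: RegExpExecArray | null;
    while ((match = boundary.exec(this.pending)) !== null) {
      // Keep the punctuation, drop the whitespace after it.
      const sentence = this.pending.slice(lastEnd, match.index + 1).trim();
      if (sentence) sentences.push(sentence);
      lastEnd = match.index + match[0].length;
    }
    this.pending = this.pending.slice(lastEnd);
    return sentences;
  }

  // Call when the LLM stream ends to release any trailing text.
  flush(): string[] {
    const rest = this.pending.trim();
    this.pending = '';
    return rest ? [rest] : [];
  }
}
```

Requiring whitespace after the punctuation means a fragment ending in "3." waits for the next token, so numbers like "3.14" are not split; abbreviations such as "e.g." will still cause an early break.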

Deepgram Pricing

Aura TTS:
  Pay-as-you-go: $0.015/1K chars ($0.000015/char)
  Same price as OpenAI TTS-1 but with better latency

For 1M chars/month:
  OpenAI TTS-1:    $15
  Deepgram Aura:   $15
  ElevenLabs:      $300 (at the $0.30/1K overage rate)

Head-to-Head: When to Choose

| Need | Best Choice |
|---|---|
| Simplest setup | OpenAI TTS |
| Highest voice quality | ElevenLabs |
| Voice cloning | ElevenLabs |
| Lowest cost | OpenAI TTS or Deepgram |
| Fastest first-byte | Deepgram Aura or ElevenLabs Flash |
| Multilingual (29+ languages) | ElevenLabs |
| Full STT+TTS pipeline | Deepgram |
| Already using OpenAI | OpenAI TTS |
| Real-time voice assistant | Deepgram Aura |
| Consumer product (quality matters) | ElevenLabs |
| Internal tool | OpenAI TTS |

Code: Audio Playback in the Browser

// Play audio from TTS API in browser:
async function playText(text: string): Promise<void> {
  const response = await fetch('/api/tts', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ text }),
  });
  if (!response.ok) throw new Error(`TTS request failed: ${response.status}`);

  const arrayBuffer = await response.arrayBuffer();
  const audioContext = new AudioContext();

  // decodeAudioData handles MP3 (and any other format the browser supports):
  const audioBuffer = await audioContext.decodeAudioData(arrayBuffer);
  const source = audioContext.createBufferSource();
  source.buffer = audioBuffer;
  source.connect(audioContext.destination);

  // Resolve once playback finishes, then release the audio context:
  return new Promise<void>((resolve) => {
    source.onended = () => {
      void audioContext.close();
      resolve();
    };
    source.start();
  });
}

// Streaming playback (start playing before download complete):
async function playStreamingText(text: string) {
  const response = await fetch('/api/tts', {
    method: 'POST',
    body: JSON.stringify({ text }),
    headers: { 'Content-Type': 'application/json' },
  });

  const audioContext = new AudioContext();
  const reader = response.body!.getReader();
  // ... chunk-by-chunk audio playback
}
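For a raw PCM source such as Deepgram's `linear16` output, chunk-by-chunk playback means converting each network chunk into Web Audio samples. Network reads can split a 16-bit sample across two chunks, so the converter has to carry a leftover byte between calls. A minimal sketch of that conversion step, assuming headerless little-endian signed 16-bit mono PCM; `Pcm16Decoder` is a hypothetical helper, and scheduling playback is left out:

```typescript
// Converts streamed little-endian signed 16-bit PCM bytes into Float32 samples in [-1, 1).
class Pcm16Decoder {
  private leftover: number | null = null; // odd byte carried over from the previous chunk

  decode(chunk: Uint8Array): Float32Array {
    let bytes = chunk;
    if (this.leftover !== null) {
      // Prepend the carried byte so it pairs with the first byte of this chunk.
      bytes = new Uint8Array(chunk.length + 1);
      bytes[0] = this.leftover;
      bytes.set(chunk, 1);
      this.leftover = null;
    }
    const sampleCount = Math.floor(bytes.length / 2);
    if (bytes.length % 2 === 1) this.leftover = bytes[bytes.length - 1];

    const view = new DataView(bytes.buffer, bytes.byteOffset, sampleCount * 2);
    const samples = new Float32Array(sampleCount);
    for (let i = 0; i < sampleCount; i++) {
      samples[i] = view.getInt16(i * 2, true) / 32768; // true = little-endian
    }
    return samples;
  }
}
```

In the browser, each decoded `Float32Array` can be copied into an `AudioBuffer` via `audioContext.createBuffer(1, samples.length, 24000)` and `copyToChannel(samples, 0)`, then scheduled back-to-back with `AudioBufferSourceNode.start(when)`.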

Compare all voice and speech APIs at APIScout.
