# DeepgramTTS
Stream low-latency text-to-speech over WebSocket using Deepgram's Aura 2 voice models.
Text flows over a persistent WebSocket connection and audio chunks arrive incrementally, so users hear the first words before the full response finishes generating.
## Prerequisites
- A Deepgram API key or a CompositeVoice proxy server
- No additional peer dependencies required
DeepgramTTS connects through a raw native WebSocket — no external SDK or WebSocket manager library is required.
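Because the provider speaks to Deepgram over a plain browser `WebSocket`, a direct (`apiKey`-mode) connection presumably looks something like the sketch below. The endpoint URL and query parameter names follow Deepgram's public Speak API documentation and are assumptions here, not read from this library's source:

```typescript
// Build the Deepgram Speak WebSocket URL. The endpoint and query parameter
// names (model, encoding, sample_rate) follow Deepgram's public API docs
// and are assumptions, not lifted from this library's internals.
function buildSpeakUrl(opts: { model: string; encoding: string; sampleRate: number }): string {
  const params = new URLSearchParams({
    model: opts.model,
    encoding: opts.encoding,
    sample_rate: String(opts.sampleRate),
  });
  return `wss://api.deepgram.com/v1/speak?${params}`;
}

// In 'token' auth mode the API key travels as a WebSocket subprotocol,
// because browsers cannot set an Authorization header on a WebSocket upgrade:
// const ws = new WebSocket(buildSpeakUrl({ ... }), ['token', apiKey]);
const url = buildSpeakUrl({ model: 'aura-2-thalia-en', encoding: 'linear16', sampleRate: 24000 });
// → wss://api.deepgram.com/v1/speak?model=aura-2-thalia-en&encoding=linear16&sample_rate=24000
```

This is also why the `authType` option exists: the subprotocol trick works for API keys, while `'bearer'` mode is needed for OAuth-style tokens.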
## Basic setup
```typescript
import {
  CompositeVoice,
  MicrophoneInput,
  DeepgramSTT,
  AnthropicLLM,
  DeepgramTTS,
  BrowserAudioOutput,
} from '@lukeocodes/composite-voice';

const voice = new CompositeVoice({
  providers: [
    new MicrophoneInput(),
    new DeepgramSTT({ proxyUrl: '/api/proxy/deepgram' }),
    new AnthropicLLM({
      proxyUrl: '/api/proxy/anthropic',
      model: 'claude-haiku-4-5',
    }),
    new DeepgramTTS({
      proxyUrl: '/api/proxy/deepgram',
      voice: 'aura-2-thalia-en',
      sampleRate: 24000,
      outputFormat: 'linear16',
    }),
    new BrowserAudioOutput(),
  ],
});

await voice.initialize();
await voice.startListening();
```
## Configuration options
| Option | Type | Default | Description |
|---|---|---|---|
| `apiKey` | `string` | — | Deepgram API key (direct mode) |
| `authType` | `'token' \| 'bearer'` | `'token'` | Controls WebSocket auth. `'token'` sends the key as the subprotocol `['token', apiKey]`; set to `'bearer'` for OAuth tokens. |
| `proxyUrl` | `string` | — | Proxy server URL (recommended for production) |
| `voice` | `string` | `'aura-2-thalia-en'` | Voice model identifier |
| `sampleRate` | `number` | `24000` | Output sample rate: 8000, 16000, 24000, 32000, or 48000 Hz |
| `outputFormat` | `string` | `'linear16'` | Audio encoding: `linear16`, `mulaw`, or `alaw` |
| `options.model` | `string` | Falls back to `voice` | Overrides the voice model |
| `options.encoding` | `string` | Falls back to `outputFormat` | Overrides the encoding |
| `options.sampleRate` | `number` | Falls back to `sampleRate` | Overrides the output sample rate |
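The `options.*` fields layer on top of the top-level ones. A minimal sketch of how that fallback chain resolves, using the defaults from the table (field names mirror the table; the resolution logic itself is an illustration, not the library's actual code):

```typescript
// Illustrative resolution of the option fallbacks described in the table.
// Shape mirrors the documented config; internals are assumed, not sourced.
interface DeepgramTTSConfig {
  voice?: string;
  sampleRate?: number;
  outputFormat?: string;
  options?: { model?: string; encoding?: string; sampleRate?: number };
}

function resolveOptions(cfg: DeepgramTTSConfig) {
  return {
    // options.model wins, then voice, then the documented default
    model: cfg.options?.model ?? cfg.voice ?? 'aura-2-thalia-en',
    encoding: cfg.options?.encoding ?? cfg.outputFormat ?? 'linear16',
    sampleRate: cfg.options?.sampleRate ?? cfg.sampleRate ?? 24000,
  };
}

const resolved = resolveOptions({
  voice: 'aura-2-andromeda-en',
  options: { sampleRate: 48000 },
});
// → { model: 'aura-2-andromeda-en', encoding: 'linear16', sampleRate: 48000 }
```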
## Available voices
**Aura 2 (recommended):** 40 English voices and 10 Spanish voices.

Popular English voices: `aura-2-thalia-en`, `aura-2-andromeda-en`, `aura-2-janus-en`, `aura-2-proteus-en`, `aura-2-orion-en`, `aura-2-luna-en`, `aura-2-arcas-en`, `aura-2-athena-en`, `aura-2-helios-en`, `aura-2-zeus-en`, and 30 more.

Spanish voices: `aura-2-sirio-es`, `aura-2-nestor-es`, `aura-2-carina-es`, `aura-2-celeste-es`, `aura-2-alvaro-es`, `aura-2-diana-es`, `aura-2-aquila-es`, `aura-2-selena-es`, `aura-2-estrella-es`, `aura-2-javier-es`.

**Aura 1 (legacy):** 12 English voices.

`aura-asteria-en`, `aura-luna-en`, `aura-stella-en`, `aura-athena-en`, `aura-hera-en`, `aura-orion-en`, `aura-arcas-en`, `aura-perseus-en`, `aura-angus-en`, `aura-orpheus-en`, `aura-helios-en`, `aura-zeus-en`. Use Aura 1 only if you need a specific voice that did not carry over to Aura 2.
## Complete example
```typescript
import {
  CompositeVoice,
  MicrophoneInput,
  DeepgramSTT,
  AnthropicLLM,
  DeepgramTTS,
  BrowserAudioOutput,
} from '@lukeocodes/composite-voice';

const tts = new DeepgramTTS({
  proxyUrl: '/api/proxy/deepgram',
  voice: 'aura-2-andromeda-en',
  sampleRate: 24000,
  outputFormat: 'linear16',
});

const voice = new CompositeVoice({
  providers: [
    new MicrophoneInput(),
    new DeepgramSTT({ proxyUrl: '/api/proxy/deepgram' }),
    new AnthropicLLM({
      proxyUrl: '/api/proxy/anthropic',
      model: 'claude-haiku-4-5',
    }),
    tts,
    new BrowserAudioOutput(),
  ],
  logging: { enabled: true, level: 'debug' },
});

voice.on('tts.start', () => console.log('Speaking...'));
voice.on('tts.end', () => console.log('Done speaking'));

await voice.initialize();
await voice.startListening();
```
## Streaming lifecycle
DeepgramTTS extends `LiveTTSProvider`. When used inside a `CompositeVoice` pipeline, the SDK manages the full lifecycle automatically. For standalone use:
```typescript
const tts = new DeepgramTTS({
  proxyUrl: '/api/proxy/deepgram',
  voice: 'aura-2-thalia-en',
});

await tts.initialize(); // Validate configuration
await tts.connect();    // Open WebSocket

tts.onAudio((chunk) => {
  // chunk.data is an ArrayBuffer of linear16 PCM audio
  // chunk.metadata contains sampleRate, encoding, channels, bitDepth
});

tts.sendText('Hello, world!');

await tts.finalize();   // Flush remaining audio
await tts.disconnect(); // Close WebSocket
```
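Inside a pipeline, `BrowserAudioOutput` handles playback, but when driving `DeepgramTTS` standalone you decode each chunk yourself. A minimal sketch, assuming `linear16` output (16-bit signed PCM), of converting `chunk.data` into the `Float32Array` samples Web Audio expects:

```typescript
// Convert a linear16 chunk (16-bit signed PCM) into Float32 samples
// in [-1, 1), the format AudioBuffer.copyToChannel expects.
function linear16ToFloat32(data: ArrayBuffer): Float32Array {
  const pcm = new Int16Array(data);
  const floats = new Float32Array(pcm.length);
  for (let i = 0; i < pcm.length; i++) {
    floats[i] = pcm[i] / 32768; // scale the signed 16-bit range down to [-1, 1)
  }
  return floats;
}

// Browser-only playback sketch (ctx is an AudioContext you create elsewhere):
// tts.onAudio((chunk) => {
//   const samples = linear16ToFloat32(chunk.data);
//   const buffer = ctx.createBuffer(1, samples.length, chunk.metadata.sampleRate);
//   buffer.copyToChannel(samples, 0);
//   const src = ctx.createBufferSource();
//   src.buffer = buffer;
//   src.connect(ctx.destination);
//   src.start();
// });
```

Reading the sample rate from `chunk.metadata` rather than hard-coding it keeps playback correct if you change the `sampleRate` option later.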
## Tips
- Use `proxyUrl` in production to keep your API key server-side. Pass `apiKey` only during local development.
- Aura 2 voices deliver better quality than Aura 1. Use `aura-2-thalia-en` as a starting point.
- DeepgramTTS uses a raw native WebSocket, not `@deepgram/sdk`, so there are no extra packages to install.
- DeepgramTTS emits metadata events with sample rate and encoding information. Use these to configure downstream audio processing.