# DeepgramTTS
Stream low-latency text-to-speech over WebSocket using Deepgram's Aura 2 voice models.
Text flows over a persistent WebSocket connection and audio chunks arrive incrementally, so users hear the first words before the full response finishes generating.
## Prerequisites
- A Deepgram API key or a CompositeVoice proxy server
- No additional peer dependencies required
DeepgramTTS connects through a raw native WebSocket — no external SDK or WebSocket manager library is required.
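Because the provider speaks to Deepgram over a plain browser `WebSocket`, a direct (`apiKey`-mode) connection presumably looks something like the sketch below. The endpoint URL and query parameter names follow Deepgram's public Speak API documentation and are assumptions here, not read from this library's source:

```typescript
// Build the Deepgram Speak WebSocket URL. The endpoint and query parameter
// names (model, encoding, sample_rate) follow Deepgram's public API docs
// and are assumptions, not lifted from this library's internals.
function buildSpeakUrl(opts: { model: string; encoding: string; sampleRate: number }): string {
  const params = new URLSearchParams({
    model: opts.model,
    encoding: opts.encoding,
    sample_rate: String(opts.sampleRate),
  });
  return `wss://api.deepgram.com/v1/speak?${params}`;
}

// In 'token' auth mode the API key travels as a WebSocket subprotocol,
// because browsers cannot set an Authorization header on a WebSocket upgrade:
// const ws = new WebSocket(buildSpeakUrl({ ... }), ['token', apiKey]);
const url = buildSpeakUrl({ model: 'aura-2-thalia-en', encoding: 'linear16', sampleRate: 24000 });
// → wss://api.deepgram.com/v1/speak?model=aura-2-thalia-en&encoding=linear16&sample_rate=24000
```

This is also why the `authType` option exists: the subprotocol trick works for API keys, while `'bearer'` mode is needed for OAuth-style tokens.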
## Basic setup
```typescript
import {
  CompositeVoice,
  MicrophoneInput,
  DeepgramSTT,
  AnthropicLLM,
  DeepgramTTS,
  BrowserAudioOutput,
} from '@lukeocodes/composite-voice';

const voice = new CompositeVoice({
  providers: [
    new MicrophoneInput(),
    new DeepgramSTT({ proxyUrl: '/api/proxy/deepgram' }),
    new AnthropicLLM({
      proxyUrl: '/api/proxy/anthropic',
      model: 'claude-haiku-4-5',
    }),
    new DeepgramTTS({
      proxyUrl: '/api/proxy/deepgram',
      voice: 'aura-2-thalia-en',
      sampleRate: 24000,
      outputFormat: 'linear16',
    }),
    new BrowserAudioOutput(),
  ],
});

await voice.initialize();
await voice.startListening();
```
## Configuration options
| Option | Type | Default | Description |
|---|---|---|---|
| `apiKey` | `string` | — | Deepgram API key (direct mode) |
| `authType` | `'token' \| 'bearer'` | `'token'` | Controls WebSocket auth. `'token'` sends the key as the subprotocol `['token', apiKey]`; set to `'bearer'` for OAuth tokens. |
| `proxyUrl` | `string` | — | Proxy server URL (recommended for production) |
| `voice` | `string` | `'aura-2-thalia-en'` | Voice model identifier |
| `sampleRate` | `number` | `24000` | Output sample rate: 8000, 16000, 24000, 32000, or 48000 Hz |
| `outputFormat` | `string` | `'linear16'` | Audio encoding: `linear16`, `mulaw`, or `alaw` |
| `options.model` | `string` | Falls back to `voice` | Overrides the voice model |
| `options.encoding` | `string` | Falls back to `outputFormat` | Overrides the encoding |
| `options.sampleRate` | `number` | Falls back to `sampleRate` | Overrides the output sample rate |
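The `options.*` fields layer on top of the top-level ones. A minimal sketch of how that fallback chain resolves, using the defaults from the table (field names mirror the table; the resolution logic itself is an illustration, not the library's actual code):

```typescript
// Illustrative resolution of the option fallbacks described in the table.
// Shape mirrors the documented config; internals are assumed, not sourced.
interface DeepgramTTSConfig {
  voice?: string;
  sampleRate?: number;
  outputFormat?: string;
  options?: { model?: string; encoding?: string; sampleRate?: number };
}

function resolveOptions(cfg: DeepgramTTSConfig) {
  return {
    // options.model wins, then voice, then the documented default
    model: cfg.options?.model ?? cfg.voice ?? 'aura-2-thalia-en',
    encoding: cfg.options?.encoding ?? cfg.outputFormat ?? 'linear16',
    sampleRate: cfg.options?.sampleRate ?? cfg.sampleRate ?? 24000,
  };
}

const resolved = resolveOptions({
  voice: 'aura-2-andromeda-en',
  options: { sampleRate: 48000 },
});
// → { model: 'aura-2-andromeda-en', encoding: 'linear16', sampleRate: 48000 }
```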
## Available voices
**Aura 2 (recommended):** 40 English voices and 10 Spanish voices.

Popular English voices: `aura-2-thalia-en`, `aura-2-andromeda-en`, `aura-2-janus-en`, `aura-2-proteus-en`, `aura-2-orion-en`, `aura-2-luna-en`, `aura-2-arcas-en`, `aura-2-athena-en`, `aura-2-helios-en`, `aura-2-zeus-en`, and 30 more.

Spanish voices: `aura-2-sirio-es`, `aura-2-nestor-es`, `aura-2-carina-es`, `aura-2-celeste-es`, `aura-2-alvaro-es`, `aura-2-diana-es`, `aura-2-aquila-es`, `aura-2-selena-es`, `aura-2-estrella-es`, `aura-2-javier-es`.

**Aura 1 (legacy):** 12 English voices.

`aura-asteria-en`, `aura-luna-en`, `aura-stella-en`, `aura-athena-en`, `aura-hera-en`, `aura-orion-en`, `aura-arcas-en`, `aura-perseus-en`, `aura-angus-en`, `aura-orpheus-en`, `aura-helios-en`, `aura-zeus-en`. Use Aura 1 only if you need a specific voice that did not carry over to Aura 2.
## Complete example
```typescript
import {
  CompositeVoice,
  MicrophoneInput,
  DeepgramSTT,
  AnthropicLLM,
  DeepgramTTS,
  BrowserAudioOutput,
} from '@lukeocodes/composite-voice';

const tts = new DeepgramTTS({
  proxyUrl: '/api/proxy/deepgram',
  voice: 'aura-2-andromeda-en',
  sampleRate: 24000,
  outputFormat: 'linear16',
});

const voice = new CompositeVoice({
  providers: [
    new MicrophoneInput(),
    new DeepgramSTT({ proxyUrl: '/api/proxy/deepgram' }),
    new AnthropicLLM({
      proxyUrl: '/api/proxy/anthropic',
      model: 'claude-haiku-4-5',
    }),
    tts,
    new BrowserAudioOutput(),
  ],
  logging: { enabled: true, level: 'debug' },
});

voice.on('tts.start', () => console.log('Speaking...'));
voice.on('tts.end', () => console.log('Done speaking'));

await voice.initialize();
await voice.startListening();
```
## Streaming lifecycle
DeepgramTTS extends `LiveTTSProvider`. When used inside a `CompositeVoice` pipeline, the SDK manages the full lifecycle automatically. For standalone use:
```typescript
const tts = new DeepgramTTS({
  proxyUrl: '/api/proxy/deepgram',
  voice: 'aura-2-thalia-en',
});

await tts.initialize(); // Validate configuration
await tts.connect();    // Open WebSocket

tts.onAudio((chunk) => {
  // chunk.data is an ArrayBuffer of linear16 PCM audio
  // chunk.metadata contains sampleRate, encoding, channels, bitDepth
});

tts.sendText('Hello, world!');

await tts.finalize();   // Flush remaining audio
await tts.disconnect(); // Close WebSocket
```
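Inside a pipeline, `BrowserAudioOutput` handles playback, but when driving `DeepgramTTS` standalone you decode each chunk yourself. A minimal sketch, assuming `linear16` output (16-bit signed PCM), of converting `chunk.data` into the `Float32Array` samples Web Audio expects:

```typescript
// Convert a linear16 chunk (16-bit signed PCM) into Float32 samples
// in [-1, 1), the format AudioBuffer.copyToChannel expects.
function linear16ToFloat32(data: ArrayBuffer): Float32Array {
  const pcm = new Int16Array(data);
  const floats = new Float32Array(pcm.length);
  for (let i = 0; i < pcm.length; i++) {
    floats[i] = pcm[i] / 32768; // scale the signed 16-bit range down to [-1, 1)
  }
  return floats;
}

// Browser-only playback sketch (ctx is an AudioContext you create elsewhere):
// tts.onAudio((chunk) => {
//   const samples = linear16ToFloat32(chunk.data);
//   const buffer = ctx.createBuffer(1, samples.length, chunk.metadata.sampleRate);
//   buffer.copyToChannel(samples, 0);
//   const src = ctx.createBufferSource();
//   src.buffer = buffer;
//   src.connect(ctx.destination);
//   src.start();
// });
```

Reading the sample rate from `chunk.metadata` rather than hard-coding it keeps playback correct if you change the `sampleRate` option later.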
## Tips
- Use `proxyUrl` in production to keep your API key server-side. Pass `apiKey` only during local development.
- Aura 2 voices deliver better quality than Aura 1. Use `aura-2-thalia-en` as a starting point.
- DeepgramTTS uses a raw native WebSocket, not `@deepgram/sdk`, so there are no extra packages to install.
- DeepgramTTS emits metadata events with sample rate and encoding information. Use these to configure downstream audio processing.