# DeepgramFlux

Low-latency real-time speech recognition with eager end-of-turn signals for the speculative LLM pipeline.

:::caution[Currently Disabled]
DeepgramFlux is currently disabled and will throw an error on construction. The V2 Flux API integration is not yet available. This guide documents the intended API for when it becomes available.
:::
Use DeepgramFlux for the lowest-latency voice pipelines. It connects to Deepgram’s V2 (Flux) streaming API, which delivers turn-based transcription with eager end-of-turn signals — the key ingredient for the eager LLM pipeline.
## Prerequisites

- A Deepgram API key

No extra packages are needed — DeepgramFlux uses a native WebSocket connection directly to the Deepgram V2 API.

For production, set up a proxy server so your API key stays server-side.
## Basic setup

```typescript
import {
  CompositeVoice,
  MicrophoneInput,
  DeepgramFlux,
  AnthropicLLM,
  DeepgramTTS,
  BrowserAudioOutput,
} from '@lukeocodes/composite-voice';

const agent = new CompositeVoice({
  providers: [
    new MicrophoneInput(),
    new DeepgramFlux({
      proxyUrl: '/api/proxy/deepgram',
      options: {
        model: 'flux-general-en',
        eagerEotThreshold: 0.5,
      },
    }),
    new AnthropicLLM({
      proxyUrl: '/api/proxy/anthropic',
      model: 'claude-haiku-4-5',
      systemPrompt: 'You are a helpful voice assistant. Keep responses brief.',
    }),
    new DeepgramTTS({
      proxyUrl: '/api/proxy/deepgram',
      voice: 'aura-2-thalia-en',
    }),
    new BrowserAudioOutput(),
  ],
  eagerLLM: {
    enabled: true,
    cancelOnTextChange: true,
    similarityThreshold: 0.8,
  },
});

await agent.initialize();
await agent.startListening();
```
## Configuration options

| Option | Type | Default | Description |
|---|---|---|---|
| `proxyUrl` | `string` | — | URL of your CompositeVoice proxy endpoint (recommended) |
| `apiKey` | `string` | — | Deepgram API key (development only) |
| `language` | `string` | `'en-US'` | Language code |
| `interimResults` | `boolean` | `true` | Emit partial transcripts while the user speaks |
| `options.model` | `string` | `'flux-general-en'` | Flux transcription model |
| `options.encoding` | `string` | — | Audio encoding: `'linear16'`, `'linear32'`, `'mulaw'`, `'alaw'`, `'opus'`, `'ogg-opus'` |
| `options.sampleRate` | `number` | — | Audio sample rate in Hz (required when `encoding` is set) |
| `options.eotThreshold` | `number` | `0.7` | Confidence (0.5–0.9) required to confirm end-of-turn |
| `options.eagerEotThreshold` | `number` | — | Confidence (0.3–0.9) to fire `EagerEndOfTurn` (enables eager mode) |
| `options.eotTimeoutMs` | `number` | `5000` | Max ms before forcing end-of-turn regardless of confidence |
| `options.keyterms` | `string[]` | — | Specialized terminology to boost recognition |
| `options.tag` | `string` | — | Label for usage reporting in the Deepgram console |
| `options.mipOptOut` | `boolean` | `false` | Opt out of the Deepgram Model Improvement Program |

See the API reference for the full list.
## How Flux differs from DeepgramSTT

DeepgramFlux uses Deepgram’s V2 API (`listen.v2`), which is fundamentally different from the V1 API used by DeepgramSTT:

| | DeepgramSTT (V1) | DeepgramFlux (V2) |
|---|---|---|
| API | `listen.live` | `listen.v2` |
| Models | Nova-3, Nova-2 | Flux (e.g., `flux-general-en`) |
| Transcription model | Event-streaming (`Results` events) | Turn-based (`TurnInfo` events) |
| Events | `is_final`, `speech_final` | `StartOfTurn`, `Update`, `EagerEndOfTurn`, `TurnResumed`, `EndOfTurn` |
| Preflight signals | No | Yes (`EagerEndOfTurn` → `isPreflight: true`) |
| Eager LLM pipeline | Not supported | Supported |
| Utterance buffering | SDK buffers `is_final` segments until `speech_final` | Turn lifecycle managed by Deepgram |

**Use DeepgramFlux when:** you want the lowest latency via the eager LLM pipeline, or you prefer the turn-based conversation model.

**Use DeepgramSTT when:** you need Nova-3’s broader language support, domain-specific models (medical, finance), or V1-specific features like diarization.
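For intuition, the V1 buffering behavior described above can be sketched as a small reducer. This is an illustration of the concept only, not the SDK's actual implementation; the `V1Result` shape is an assumption for the sketch:

```typescript
// Simplified sketch of V1-style utterance buffering: collect is_final
// segments and flush the assembled utterance only on speech_final.
interface V1Result {
  text: string;
  isFinal: boolean;
  speechFinal: boolean;
}

function bufferUtterances(results: V1Result[]): string[] {
  const utterances: string[] = [];
  let segments: string[] = [];
  for (const r of results) {
    if (r.isFinal) segments.push(r.text);
    if (r.speechFinal && segments.length > 0) {
      utterances.push(segments.join(' '));
      segments = [];
    }
  }
  return utterances;
}
```

With Flux, none of this client-side bookkeeping exists: Deepgram owns the turn lifecycle and emits one `EndOfTurn` with the complete transcript.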
## TurnInfo events

Flux delivers transcription through `TurnInfo` events that map to the CompositeVoice transcription model:

| V2 event | SDK result | Description |
|---|---|---|
| `StartOfTurn` | `isFinal: false` | Speech detected, turn has begun |
| `Update` | `isFinal: false` | Partial transcript update (like interim results) |
| `EagerEndOfTurn` | `isPreflight: true` | Early end-of-turn prediction — triggers eager LLM |
| `TurnResumed` | `isFinal: false` | User resumed speaking after an eager end-of-turn |
| `EndOfTurn` | `isFinal: true`, `speechFinal: true` | Confirmed end of utterance — sets `utteranceComplete: true`, triggering LLM processing |
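The mapping in the table can be sketched as a small lookup. This is illustrative only, not the SDK's internal code, and the `ResultFlags` shape is an assumption based on the fields shown above:

```typescript
// Illustrative mapping of V2 TurnInfo event names to the SDK result
// flags from the table above.
type TurnEvent = 'StartOfTurn' | 'Update' | 'EagerEndOfTurn' | 'TurnResumed' | 'EndOfTurn';

interface ResultFlags {
  isFinal: boolean;
  isPreflight?: boolean;
  speechFinal?: boolean;
  utteranceComplete?: boolean;
}

function mapTurnEvent(event: TurnEvent): ResultFlags {
  switch (event) {
    case 'EagerEndOfTurn':
      // Preflight: the eager LLM pipeline starts speculating here.
      return { isFinal: false, isPreflight: true };
    case 'EndOfTurn':
      // Confirmed: the utterance is complete and LLM processing begins.
      return { isFinal: true, speechFinal: true, utteranceComplete: true };
    default:
      // StartOfTurn, Update, TurnResumed are all non-final updates.
      return { isFinal: false };
  }
}
```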
## Eager LLM pipeline

The killer feature of DeepgramFlux is the `EagerEndOfTurn` signal. When Deepgram predicts that the speaker is about to stop talking, it fires this event early — before the final `EndOfTurn` confirmation. The SDK uses it to start LLM generation speculatively.

Configure the threshold to balance speed vs. accuracy:
```typescript
const stt = new DeepgramFlux({
  proxyUrl: '/api/proxy/deepgram',
  options: {
    model: 'flux-general-en',
    eagerEotThreshold: 0.5, // lower = faster but more false positives
    eotThreshold: 0.7, // higher = more certain before confirming end-of-turn
  },
});
```
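To make the two thresholds concrete, the decision can be sketched as a simple comparison against the end-of-turn confidence. This is a mental model of the configuration above, not Deepgram's actual endpointing logic:

```typescript
// Illustrative sketch: how the two thresholds partition the confidence
// range. Confidence at or above eotThreshold confirms the turn; between
// the two thresholds it fires the speculative EagerEndOfTurn signal.
function classifyEot(
  confidence: number,
  eagerEotThreshold: number,
  eotThreshold: number
): 'EndOfTurn' | 'EagerEndOfTurn' | 'none' {
  if (confidence >= eotThreshold) return 'EndOfTurn';
  if (confidence >= eagerEotThreshold) return 'EagerEndOfTurn';
  return 'none';
}
```

Lowering `eagerEotThreshold` widens the speculative band, so the LLM starts earlier but is cancelled more often.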
Enable the eager pipeline in CompositeVoice:

```typescript
const agent = new CompositeVoice({
  providers: [
    new MicrophoneInput(),
    stt,
    new AnthropicLLM({ proxyUrl: '/api/proxy/anthropic' }),
    new DeepgramTTS({ proxyUrl: '/api/proxy/deepgram' }),
    new BrowserAudioOutput(),
  ],
  eagerLLM: {
    enabled: true,
    cancelOnTextChange: true,
    similarityThreshold: 0.8, // accept if >=80% word overlap
  },
});
```
The `similarityThreshold` controls how different the final text can be from the preflight text before the speculative response is cancelled. A value of 0.8 means that if 80%+ of the words match (in order), the response is kept. See `textSimilarity` for details on how similarity is computed.
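As a rough sketch of the idea, an in-order word overlap can be computed as the longest common subsequence of words divided by the longer text's length. This is an illustration of the concept, not necessarily how the SDK's `textSimilarity` is implemented:

```typescript
// Illustrative in-order word-overlap similarity: LCS of word sequences
// divided by the longer sequence's length, giving a value in [0, 1].
function wordOverlapSimilarity(preflight: string, final: string): number {
  const a = preflight.toLowerCase().split(/\s+/).filter(Boolean);
  const b = final.toLowerCase().split(/\s+/).filter(Boolean);
  if (a.length === 0 && b.length === 0) return 1;
  // Standard LCS dynamic program over the two word sequences.
  const lcs: number[][] = Array.from({ length: a.length + 1 }, () =>
    new Array<number>(b.length + 1).fill(0)
  );
  for (let i = 1; i <= a.length; i++) {
    for (let j = 1; j <= b.length; j++) {
      lcs[i][j] =
        a[i - 1] === b[j - 1]
          ? lcs[i - 1][j - 1] + 1
          : Math.max(lcs[i - 1][j], lcs[i][j - 1]);
    }
  }
  return lcs[a.length][b.length] / Math.max(a.length, b.length);
}
```

Under this sketch, "book a table for two" vs. "book a table for two please" scores 5/6 ≈ 0.83, so a 0.8 threshold would keep the speculative response.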
## Complete example

```typescript
import {
  CompositeVoice,
  MicrophoneInput,
  DeepgramFlux,
  AnthropicLLM,
  DeepgramTTS,
  BrowserAudioOutput,
} from '@lukeocodes/composite-voice';

const agent = new CompositeVoice({
  providers: [
    new MicrophoneInput(),
    new DeepgramFlux({
      proxyUrl: '/api/proxy/deepgram',
      language: 'en',
      options: {
        model: 'flux-general-en',
        eagerEotThreshold: 0.5,
        eotThreshold: 0.7,
        eotTimeoutMs: 5000,
        keyterms: ['CompositeVoice'],
      },
    }),
    new AnthropicLLM({
      proxyUrl: '/api/proxy/anthropic',
      model: 'claude-haiku-4-5',
      maxTokens: 256,
      systemPrompt: 'You are a helpful voice assistant. Keep responses under two sentences.',
    }),
    new DeepgramTTS({
      proxyUrl: '/api/proxy/deepgram',
      voice: 'aura-2-thalia-en',
    }),
    new BrowserAudioOutput(),
  ],
  eagerLLM: {
    enabled: true,
    cancelOnTextChange: true,
    similarityThreshold: 0.8,
  },
  conversationHistory: { enabled: true, maxTurns: 10 },
  logging: { enabled: true, level: 'info' },
});

agent.on('transcription.preflight', (event) => {
  console.log('Eager end-of-turn:', event.text);
});

agent.on('transcription.speechFinal', (event) => {
  console.log('Confirmed:', event.text);
});

await agent.initialize();
await agent.startListening();
```
## Tips and gotchas

- **Always use a proxy in production.** Pass `proxyUrl` instead of `apiKey` so your Deepgram key never reaches the browser. The SDK converts `http(s)` to `ws(s)` automatically.
- **No peer dependencies.** DeepgramFlux connects directly to the Deepgram V2 WebSocket API using native WebSocket — no `@deepgram/sdk` required.
- **Set `eagerEotThreshold` to enable preflight.** Without this option, DeepgramFlux will not emit `EagerEndOfTurn` events, and the eager LLM pipeline will have no preflight signals to work with.
- **Connection timeout.** The WebSocket connection defaults to a 10-second timeout. Adjust with `timeout` in the config if your network is slow.
- **Keep-alive.** Use `sendKeepAlive()` to prevent the V2 WebSocket from timing out during long pauses.
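The scheme conversion mentioned in the first tip amounts to rewriting the URL prefix. A minimal sketch, assuming only the `http(s)` → `ws(s)` behavior stated above (the SDK's exact implementation may differ):

```typescript
// Illustrative http(s) → ws(s) conversion for an absolute proxy URL.
// 'https:' becomes 'wss:' and 'http:' becomes 'ws:'.
function toWebSocketUrl(proxyUrl: string): string {
  return proxyUrl.replace(/^http(s?):/i, 'ws$1:');
}
```

Relative proxy URLs like `/api/proxy/deepgram` are resolved against the page origin first, so the secure variant `wss:` is used whenever the page is served over `https:`.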
## Related resources
- Eager pipeline example — preflight signals with speculative LLM
- Deepgram pipeline example — full Deepgram STT + TTS pipeline
- Proxy server example — secure your API key server-side
- Pipeline architecture — how the eager pipeline works
- API reference: DeepgramFlux
- API reference: DeepgramFluxConfig
- Providers reference