AssemblyAISTT

Add real-time speech recognition with word boosting to your voice pipeline using AssemblyAI's WebSocket API.

Use AssemblyAISTT when you need real-time transcription with word boosting for domain-specific vocabulary and automatic WebSocket reconnection.

Prerequisites

No peer dependencies are required. AssemblyAISTT connects through a raw WebSocket managed by the SDK’s built-in WebSocketManager.

For production, set up a proxy server so your API key stays server-side.
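Server-side, a proxy typically opens its own connection to AssemblyAI using the secret key. As an illustration, the upstream realtime URL could be built like this; `buildRealtimeUrl` is a hypothetical helper (not part of the SDK), and the endpoint shown reflects AssemblyAI's v2 realtime API, so verify it against current AssemblyAI documentation before relying on it.

```typescript
// Hypothetical helper: build the upstream AssemblyAI realtime URL a proxy
// would open server-side. The endpoint and query params reflect AssemblyAI's
// v2 realtime API; check the current AssemblyAI docs before use.
function buildRealtimeUrl(sampleRate: number, token?: string): string {
  const url = new URL('wss://api.assemblyai.com/v2/realtime/ws');
  url.searchParams.set('sample_rate', String(sampleRate));
  if (token) url.searchParams.set('token', token); // temporary session token
  return url.toString();
}

console.log(buildRealtimeUrl(16000));
// wss://api.assemblyai.com/v2/realtime/ws?sample_rate=16000
```

The browser only ever talks to your proxy endpoint; the AssemblyAI key stays on the server.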

Basic setup

import { CompositeVoice, MicrophoneInput, AssemblyAISTT, AnthropicLLM, NativeTTS } from '@lukeocodes/composite-voice';

const agent = new CompositeVoice({
  providers: [
    new MicrophoneInput(),
    new AssemblyAISTT({
      proxyUrl: '/api/proxy/assemblyai',
      sampleRate: 16000,
    }),
    new AnthropicLLM({
      proxyUrl: '/api/proxy/anthropic',
      model: 'claude-haiku-4-5',
      systemPrompt: 'You are a helpful voice assistant. Keep responses brief.',
    }),
    new NativeTTS(),
  ],
});

await agent.initialize();
await agent.startListening();

Configuration options

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| proxyUrl | string | | URL of your CompositeVoice proxy endpoint (recommended) |
| apiKey | string | | AssemblyAI API key (development only) |
| sampleRate | number | 16000 | Audio sample rate in Hz |
| language | string | 'en' | Language code for transcription |
| wordBoost | string[] | | Words to prioritize during recognition |
| interimResults | boolean | true | Emit partial transcripts while the user speaks |
| timeout | number | 10000 | Connection timeout in milliseconds |

See the API reference for the full list.
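The options above can be summarized as a TypeScript shape. This interface is an illustrative sketch derived from the table, not the SDK's exported type, which may differ.

```typescript
// Illustrative sketch of the options shape from the table above; the SDK's
// actual exported type may differ. Provide either proxyUrl (recommended)
// or apiKey (development only).
interface AssemblyAISTTOptions {
  proxyUrl?: string;        // your CompositeVoice proxy endpoint
  apiKey?: string;          // AssemblyAI key, development only
  sampleRate?: number;      // default 16000 (Hz)
  language?: string;        // default 'en'
  wordBoost?: string[];     // domain terms to prioritize
  interimResults?: boolean; // default true
  timeout?: number;         // default 10000 (ms)
}

const opts: AssemblyAISTTOptions = {
  proxyUrl: '/api/proxy/assemblyai',
  sampleRate: 16000,
  wordBoost: ['CompositeVoice'],
};

console.log(opts.sampleRate); // 16000
```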

Complete example

import { CompositeVoice, MicrophoneInput, AssemblyAISTT, AnthropicLLM, NativeTTS } from '@lukeocodes/composite-voice';

const agent = new CompositeVoice({
  providers: [
    new MicrophoneInput(),
    new AssemblyAISTT({
      proxyUrl: '/api/proxy/assemblyai',
      sampleRate: 16000,
      language: 'en',
      wordBoost: ['CompositeVoice', 'Deepgram', 'AssemblyAI'],
    }),
    new AnthropicLLM({
      proxyUrl: '/api/proxy/anthropic',
      model: 'claude-haiku-4-5',
      maxTokens: 256,
      systemPrompt: 'You are a helpful voice assistant. Keep responses under two sentences.',
    }),
    new NativeTTS({ voiceLang: 'en-US' }),
  ],
  conversationHistory: { enabled: true, maxTurns: 10 },
  logging: { enabled: true, level: 'info' },
});

agent.on('transcription.final', (event) => {
  console.log('User said:', event.text);
});

agent.on('response.text', (event) => {
  console.log('Assistant:', event.text);
});

await agent.initialize();
await agent.startListening();

Utterance completion

AssemblyAISTT sets utteranceComplete: true on FinalTranscript messages from the AssemblyAI WebSocket. This is the flag CompositeVoice checks to trigger LLM processing — interim/partial transcripts do not trigger the pipeline.
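The mapping described above can be sketched as a small function. The `message_type` values (`PartialTranscript`, `FinalTranscript`) come from AssemblyAI's realtime API; the event shape and function name here are illustrative, not the SDK's internals.

```typescript
// Sketch of the mapping described above. message_type values come from
// AssemblyAI's realtime API; the event shape is illustrative.
type AssemblyAIMessage = { message_type: string; text: string };

function toTranscriptionEvent(msg: AssemblyAIMessage) {
  return {
    text: msg.text,
    // Only FinalTranscript messages complete the utterance, which is what
    // CompositeVoice checks before triggering LLM processing.
    utteranceComplete: msg.message_type === 'FinalTranscript',
  };
}

console.log(toTranscriptionEvent({ message_type: 'FinalTranscript', text: 'hi' }));
// { text: 'hi', utteranceComplete: true }
```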

Tips and gotchas

  • Always use a proxy in production. Pass proxyUrl instead of apiKey so your AssemblyAI key never reaches the browser. The SDK converts http(s) to ws(s) automatically.
  • No peer dependencies. Unlike DeepgramSTT, AssemblyAISTT uses the SDK’s built-in WebSocketManager — no extra packages to install.
  • Word boosting improves accuracy. Pass product names, technical terms, or proper nouns in wordBoost so AssemblyAI prioritizes them during recognition.
  • Audio is base64-encoded. The provider converts raw ArrayBuffer audio into base64 JSON messages ({ audio_data: "..." }) before sending. This is handled automatically.
  • Automatic reconnection. The WebSocketManager reconnects with exponential backoff (up to 5 attempts, 1s initial delay, 30s max delay) if the connection drops.
  • No preflight signals. AssemblyAISTT does not emit preflight/eager end-of-turn events. If you need the eager LLM pipeline, use DeepgramSTT instead.
  • Graceful disconnect. When you call disconnect(), the provider sends a terminate_session message to AssemblyAI before closing the WebSocket.
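The reconnection policy stated above (1s initial delay, 30s cap, up to 5 attempts) can be sketched as an exponential backoff schedule. The doubling factor is an assumption on my part; WebSocketManager's exact curve (and any jitter) may differ.

```typescript
// Sketch of an exponential backoff schedule matching the stated policy:
// 1s initial delay, capped at 30s, up to 5 attempts. The doubling factor
// is an assumption; the SDK's exact curve and jitter may differ.
function backoffDelays(attempts = 5, initialMs = 1000, maxMs = 30000): number[] {
  return Array.from({ length: attempts }, (_, i) =>
    Math.min(initialMs * 2 ** i, maxMs),
  );
}

console.log(backoffDelays()); // [1000, 2000, 4000, 8000, 16000]
```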

© 2026 CompositeVoice. All rights reserved.
