Open Source

Build voice interfaces for the web

A composable SDK for speech-to-text, LLM, and text-to-speech pipelines. Mix providers, stream responses, and ship conversational AI — all from the browser.

pnpm i @lukeocodes/composite-voice@0.0.5

Everything you need for voice

Event-driven architecture, typed APIs, and production-ready defaults.

Composable Pipeline

Five-role pipeline with swappable input, STT, LLM, TTS, and output providers. Swap Deepgram for AssemblyAI or Anthropic for OpenAI — your pipeline, your choice.

Eager Streaming

Speculative LLM generation begins during speech recognition. Responses start before the user finishes speaking.

Resilient Connections

Pipeline-level error recovery and WebSocket reconnection with exponential backoff. Configurable retry strategies keep voice sessions alive.
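For a sense of what exponential backoff looks like, here is a deterministic delay schedule. The base, cap, and function name are illustrative assumptions, not the SDK's actual retry configuration:

```typescript
// Deterministic exponential backoff: 500 ms, 1 s, 2 s, ... capped at 10 s.
// Illustrative defaults; the SDK's retry options may differ.
function reconnectDelay(attempt: number, baseMs = 500, maxMs = 10_000): number {
  return Math.min(baseMs * 2 ** attempt, maxMs);
}
```

Production schedules usually add random jitter on top, so many clients that dropped at the same moment do not all reconnect in lockstep.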

Turn-Taking Strategies

Automatic barge-in, plus conservative, aggressive, or detect strategies. Users can interrupt mid-speech and the SDK handles generation tracking seamlessly.
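Generation tracking for barge-in can be reduced to a monotonically increasing id: when the user interrupts, the live id advances, and any chunk tagged with a stale id is dropped. A self-contained sketch with invented names, not the SDK's actual API:

```typescript
// Barge-in bookkeeping as a monotonically increasing generation id.
class GenerationTracker {
  private current = 0;

  // A new LLM/TTS generation starts; returns its id.
  begin(): number {
    return ++this.current;
  }

  // User barged in: invalidate whatever is currently in flight.
  interrupt(): void {
    this.current++;
  }

  // Should a chunk tagged with this id still be rendered or spoken?
  isLive(id: number): boolean {
    return id === this.current;
  }
}
```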

Smart Text Routing

LLM output is split into visual and spoken streams. Code fences are buffered and never read aloud. Markdown is stripped for natural TTS while the UI gets full formatting.
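The spoken half of this routing can be approximated with a few regex passes. This is a simplified sketch on a complete string, not the SDK's actual splitter, which works on streams:

```typescript
// Simplified spoken-stream routing: drop code fences entirely and strip
// markdown markers so TTS reads natural prose.
function toSpoken(markdown: string): string {
  return markdown
    .replace(/```[\s\S]*?```/g, "")          // never read code fences aloud
    .replace(/`([^`]*)`/g, "$1")             // inline code: keep text, drop ticks
    .replace(/\[([^\]]*)\]\([^)]*\)/g, "$1") // links: speak the label, not the URL
    .replace(/(\*\*|__|\*|_)/g, "")          // bold/italic markers
    .replace(/^#{1,6}\s*/gm, "")             // heading hashes
    .replace(/\n{2,}/g, "\n")                // collapse leftover blank lines
    .trim();
}
```

The UI stream keeps the original markdown untouched, so code blocks and formatting still render for the reader.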

Zero SDK Dependencies

LLM and TTS providers use native fetch — no @anthropic-ai/sdk or openai packages to install. WebSocket providers use native WebSocket. Only WebLLM requires a peer dependency.


Server-Side Proxy

Express, Next.js, and plain Node adapters with built-in rate limiting, body size limits, and auth hooks. API keys stay server-side.
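At its heart, such a proxy maps a client-side path to the upstream service and attaches the API key on the server, so the key never ships to the browser. A hypothetical sketch: the route shape, service table, and hard-coded key below are all invented, and a real deployment would read keys from environment variables and layer on rate limiting and auth:

```typescript
// Hypothetical proxy-request builder: resolve the upstream service from
// the client path and inject the server-side API key.
interface UpstreamRequest {
  url: string;
  headers: Record<string, string>;
  body: string;
}

const UPSTREAMS: Record<string, { base: string; authHeader: string; key: string }> = {
  // In a real deployment the key comes from a server-side env var.
  anthropic: { base: "https://api.anthropic.com", authHeader: "x-api-key", key: "server-side-secret" },
};

function buildUpstreamRequest(proxyPath: string, body: string): UpstreamRequest {
  // "/api/proxy/anthropic/v1/messages" -> service "anthropic", rest "v1/messages"
  const [service, ...rest] = proxyPath.replace(/^\/api\/proxy\//, "").split("/");
  const upstream = UPSTREAMS[service];
  if (!upstream) throw new Error(`unknown service: ${service}`);
  return {
    url: `${upstream.base}/${rest.join("/")}`,
    headers: { [upstream.authHeader]: upstream.key, "content-type": "application/json" },
    body,
  };
}
```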

Extensible Providers

Abstract base classes for STT, LLM, and TTS. Build custom providers for any service with a clean, typed interface.
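A custom provider might look like the following. The base-class name and method signature here are hypothetical illustrations; the SDK's typed interfaces define the real contract:

```typescript
// Hypothetical shape of a custom TTS provider built on an abstract base class.
abstract class TTSProvider {
  abstract synthesize(text: string): Promise<Uint8Array>;
}

class MyTTS extends TTSProvider {
  async synthesize(text: string): Promise<Uint8Array> {
    // A real provider would call its service's HTTP or WebSocket API here.
    // This stub just returns the text as UTF-8 bytes.
    return new TextEncoder().encode(text);
  }
}
```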

Your pipeline, your providers

First-class support for leading AI services — or bring your own with extensible base classes.

Speech-to-Text

  • Deepgram
  • AssemblyAI
  • ElevenLabs
  • Web Speech API

Large Language Models

  • Anthropic
  • OpenAI
  • Groq
  • Google Gemini
  • Mistral
  • WebLLM

Text-to-Speech

  • Deepgram
  • OpenAI
  • ElevenLabs
  • Cartesia
  • Web Speech API

Up and running in minutes

Three providers, a few lines of code, and you have a working voice pipeline.

voice.ts

import {
  CompositeVoice,
  DeepgramSTT,
  AnthropicLLM,
  DeepgramTTS,
} from "@lukeocodes/composite-voice";

const voice = new CompositeVoice({
  providers: [
    new DeepgramSTT({ proxyUrl: "/api/proxy/deepgram" }),
    new AnthropicLLM({
      proxyUrl: "/api/proxy/anthropic",
      model: "claude-sonnet-4-20250514",
    }),
    new DeepgramTTS({ proxyUrl: "/api/proxy/deepgram" }),
  ],
});

voice.on("transcription.speechFinal", ({ text }) => console.log("User:", text));
voice.on("llm.complete", ({ text }) => console.log("AI:", text));

await voice.initialize();
await voice.startListening();

Ready to build?

Get started with the documentation, explore the design system, or jump straight into the code.