
OpenAI Compatible

Connect any OpenAI-compatible LLM endpoint to a CompositeVoice pipeline.

Use OpenAICompatibleLLM when you need to connect a custom, self-hosted, or third-party LLM that speaks the OpenAI chat completions format. This includes services like Ollama, vLLM, LiteLLM, Together AI, Perplexity, DeepSeek, and any other /v1/chat/completions endpoint.

Prerequisites

  • An accessible OpenAI-compatible API endpoint
  • No additional dependencies required. OpenAICompatibleLLM uses native fetch internally.
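
Internally, the provider issues standard OpenAI chat completions requests over fetch. A minimal sketch of the request body it sends (the helper names below are illustrative, not SDK internals):

```typescript
// Illustrative only: the JSON body for POST {endpoint}/chat/completions.
// buildChatRequest is not part of the SDK; it sketches the wire format.
type Role = 'system' | 'user' | 'assistant';
interface ChatMessage { role: Role; content: string; }

function buildChatRequest(model: string, systemPrompt: string, userText: string) {
  const messages: ChatMessage[] = [
    { role: 'system', content: systemPrompt },
    { role: 'user', content: userText },
  ];
  // stream: true asks the endpoint to deliver tokens as server-sent events
  return { model, messages, stream: true };
}

const body = buildChatRequest('my-custom-model', 'Be brief.', 'Hello!');
```

Any endpoint that accepts this shape at /v1/chat/completions should work with OpenAICompatibleLLM.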

Basic setup

import { CompositeVoice, OpenAICompatibleLLM, NativeSTT, NativeTTS } from '@lukeocodes/composite-voice';

const agent = new CompositeVoice({
  providers: [
    new NativeSTT({ language: 'en-US' }),
    new OpenAICompatibleLLM({
      endpoint: 'https://my-model-server.example.com/v1',
      apiKey: 'my-api-key',
      model: 'my-custom-model',
      systemPrompt: 'You are a concise voice assistant. Keep answers under two sentences.',
    }),
    new NativeTTS(),
  ],
});

await agent.initialize();
await agent.startListening();

Configuration options

| Option       | Type    | Default    | Description                                                    |
|--------------|---------|------------|----------------------------------------------------------------|
| model        | string  | (required) | Model identifier recognized by the target endpoint.            |
| endpoint     | string  |            | Custom API endpoint URL (e.g., http://localhost:11434/v1).     |
| systemPrompt | string  |            | System-level instructions for the assistant.                   |
| temperature  | number  |            | Randomness (0 = deterministic, 2 = creative).                  |
| maxTokens    | number  |            | Maximum tokens per response.                                   |
| topP         | number  |            | Nucleus sampling threshold (0–1).                              |
| stream       | boolean | true       | Stream tokens incrementally.                                   |
| proxyUrl     | string  |            | CompositeVoice proxy endpoint. Takes precedence over endpoint. |
| apiKey       | string  |            | API key for the target endpoint.                               |
| maxRetries   | number  | 3          | Retry count for failed requests.                               |
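
The sampling and streaming options compose like any other configuration. For example, a lower-variance, non-streaming setup (values here are illustrative, not recommended defaults):

```typescript
const llm = new OpenAICompatibleLLM({
  endpoint: 'https://my-model-server.example.com/v1',
  apiKey: 'my-api-key',
  model: 'my-custom-model',
  temperature: 0.2,  // near-deterministic output
  topP: 0.9,
  maxTokens: 200,    // keep spoken replies short
  stream: false,     // for endpoints without SSE support
  maxRetries: 5,
});
```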

Common endpoints

Ollama (local models)

const llm = new OpenAICompatibleLLM({
  endpoint: 'http://localhost:11434/v1',
  apiKey: 'ollama',  // Ollama ignores the key but the SDK requires one
  model: 'llama3.2',
  systemPrompt: 'You are a helpful voice assistant.',
});
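
vLLM (self-hosted)

vLLM's OpenAI-compatible server listens on http://localhost:8000/v1 by default. The model identifier below is a placeholder; pass whichever model you launched vLLM with.

```typescript
const llm = new OpenAICompatibleLLM({
  endpoint: 'http://localhost:8000/v1',  // vLLM's default OpenAI-compatible address
  apiKey: 'not-needed',  // vLLM ignores the key unless the server was started with one
  model: 'meta-llama/Llama-3.1-8B-Instruct',  // placeholder: match your served model
});
```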

Together AI

const llm = new OpenAICompatibleLLM({
  endpoint: 'https://api.together.xyz/v1',
  apiKey: 'your-together-api-key',
  model: 'meta-llama/Llama-3.3-70B-Instruct-Turbo',
});

DeepSeek

const llm = new OpenAICompatibleLLM({
  endpoint: 'https://api.deepseek.com/v1',
  apiKey: 'your-deepseek-api-key',
  model: 'deepseek-chat',
});

Complete example

import {
  CompositeVoice,
  MicrophoneInput,
  OpenAICompatibleLLM,
  DeepgramSTT,
  DeepgramTTS,
  BrowserAudioOutput,
} from '@lukeocodes/composite-voice';

const agent = new CompositeVoice({
  providers: [
    new MicrophoneInput(),
    new DeepgramSTT({
      proxyUrl: '/api/proxy/deepgram',
      language: 'en',
      options: { model: 'nova-3', smartFormat: true },
    }),
    new OpenAICompatibleLLM({
      endpoint: 'http://localhost:11434/v1',
      apiKey: 'ollama',
      model: 'llama3.2',
      temperature: 0.7,
      maxTokens: 256,
      systemPrompt: 'You are a friendly voice assistant. Answer briefly.',
    }),
    new DeepgramTTS({
      proxyUrl: '/api/proxy/deepgram',
      voice: 'aura-2-thalia-en',
    }),
    new BrowserAudioOutput(),
  ],
  conversationHistory: { enabled: true, maxTurns: 10 },
});

await agent.initialize();
await agent.startListening();

Tips

  • Provide either apiKey or proxyUrl. At least one is required. If both are set, proxyUrl takes precedence and the SDK sends a dummy key.
  • Verify your endpoint supports streaming. Some self-hosted setups disable SSE streaming. Set stream: false if your endpoint does not support it.
  • This is the base class for OpenAI, Groq, Mistral, and Gemini. If you use one of those services, prefer their dedicated provider classes — they set correct defaults for endpoint and model.
  • Extend this class for custom providers. Override providerName and buildClientOptions() to add provider-specific behavior. See the source of GroqLLM or GeminiLLM for examples.
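
As a runnable sketch of that extension pattern: the stub base class below stands in for the real OpenAICompatibleLLM so the example is self-contained; only the providerName and buildClientOptions() names come from the SDK, and everything else (PerplexityLLM, the option fields) is illustrative.

```typescript
// Hypothetical stand-in for OpenAICompatibleLLM from '@lukeocodes/composite-voice'.
interface LLMConfig { endpoint: string; apiKey: string; model: string; }

class OpenAICompatibleLLMStub {
  constructor(protected config: LLMConfig) {}
  get providerName(): string { return 'openai-compatible'; }
  buildClientOptions(): Record<string, unknown> {
    return { baseURL: this.config.endpoint };
  }
}

// Illustrative custom provider: override the identity and layer
// provider-specific options on top of the base client options.
class PerplexityLLM extends OpenAICompatibleLLMStub {
  override get providerName(): string { return 'perplexity'; }
  override buildClientOptions(): Record<string, unknown> {
    return { ...super.buildClientOptions(), defaultQuery: { source: 'voice' } };
  }
}

const llm = new PerplexityLLM({
  endpoint: 'https://api.perplexity.ai',
  apiKey: 'your-perplexity-api-key',
  model: 'sonar',
});
```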

© 2026 CompositeVoice. All rights reserved.
