Google Gemini

Use Google Gemini models as the LLM provider in a CompositeVoice pipeline.

Use GeminiLLM when you want Google’s Gemini models with their strong multimodal capabilities and competitive performance.

Prerequisites

  • A Google AI Studio API key or a CompositeVoice proxy server
  • No additional dependencies required. GeminiLLM uses native fetch internally.

Basic setup

import { CompositeVoice, GeminiLLM, NativeSTT, NativeTTS } from '@lukeocodes/composite-voice';

const agent = new CompositeVoice({
  providers: [
    new NativeSTT({ language: 'en-US' }),
    new GeminiLLM({
      proxyUrl: '/api/proxy/gemini',
      model: 'gemini-2.0-flash',
      systemPrompt: 'You are a concise voice assistant. Keep answers under two sentences.',
    }),
    new NativeTTS(),
  ],
});

await agent.initialize();
await agent.startListening();

Configuration options

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| model | string | 'gemini-2.0-flash' | Model identifier. See model variants below. |
| systemPrompt | string |  | System-level instructions for the assistant. |
| temperature | number |  | Randomness (0 = deterministic, 2 = creative). |
| maxTokens | number |  | Maximum tokens per response. |
| topP | number |  | Nucleus sampling threshold (0–1). |
| stream | boolean | true | Stream tokens incrementally. |
| proxyUrl | string |  | CompositeVoice proxy endpoint. Recommended for browsers. |
| geminiApiKey | string |  | Gemini API key. Convenience alias for apiKey. |
| apiKey | string |  | Direct API key. geminiApiKey takes precedence if both are set. |
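As a quick illustration of the key precedence described above, here is a sketch. The resolveApiKey helper is hypothetical, written only to make the documented behaviour concrete; it is not an export of @lukeocodes/composite-voice:

```typescript
// Hypothetical helper illustrating the documented precedence:
// geminiApiKey wins when both keys are set. Not part of the library's API.
interface KeyOptions {
  geminiApiKey?: string;
  apiKey?: string;
}

function resolveApiKey(opts: KeyOptions): string | undefined {
  return opts.geminiApiKey ?? opts.apiKey;
}

console.log(resolveApiKey({ geminiApiKey: 'studio-key', apiKey: 'other' })); // → 'studio-key'
```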

Model variants

| Model | Speed | Notes |
| --- | --- | --- |
| gemini-2.0-flash | Fast | Default. Best for low-latency voice applications. |
| gemini-1.5-flash | Fast | Previous-generation flash model. |
| gemini-1.5-pro | Slower | Larger context, higher capability. |
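To switch variants, pass the identifier as model. A minimal configuration sketch, reusing the proxy path from the Basic setup example:

```typescript
import { GeminiLLM } from '@lukeocodes/composite-voice';

// Sketch: trade some latency for capability by selecting gemini-1.5-pro.
const llm = new GeminiLLM({
  proxyUrl: '/api/proxy/gemini',
  model: 'gemini-1.5-pro', // larger context, slower than the flash models
});
```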

Complete example

import {
  CompositeVoice,
  MicrophoneInput,
  GeminiLLM,
  DeepgramSTT,
  DeepgramTTS,
  BrowserAudioOutput,
} from '@lukeocodes/composite-voice';

const agent = new CompositeVoice({
  providers: [
    new MicrophoneInput(),
    new DeepgramSTT({
      proxyUrl: '/api/proxy/deepgram',
      language: 'en',
      options: { model: 'nova-3', smartFormat: true },
    }),
    new GeminiLLM({
      proxyUrl: '/api/proxy/gemini',
      model: 'gemini-2.0-flash',
      temperature: 0.7,
      maxTokens: 256,
      systemPrompt: 'You are a friendly voice assistant. Answer briefly.',
    }),
    new DeepgramTTS({
      proxyUrl: '/api/proxy/deepgram',
      voice: 'aura-2-thalia-en',
    }),
    new BrowserAudioOutput(),
  ],
  conversationHistory: { enabled: true, maxTurns: 10 },
});

await agent.initialize();
await agent.startListening();

Tips

  • GeminiLLM talks to Google’s OpenAI-compatible endpoint. The base URL defaults to https://generativelanguage.googleapis.com/v1beta/openai; you do not need to set it manually.
  • gemini-2.0-flash is ideal for voice. It delivers fast inference with good quality for conversational tasks.
  • GeminiLLM uses native fetch — no Gemini-specific SDK or openai package needed.
  • Google AI Studio keys include a free tier that works for development and testing. For production, use Vertex AI credentials through a proxy.
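The proxy route used throughout these examples has to exist somewhere server-side. Below is a minimal sketch of one possible implementation using Node's built-in http module. The /api/proxy/gemini path, the GEMINI_API_KEY environment variable, and the forwarding behaviour are assumptions for illustration, not part of CompositeVoice:

```typescript
import * as http from 'node:http';

// Google's OpenAI-compatible chat completions endpoint.
const UPSTREAM =
  'https://generativelanguage.googleapis.com/v1beta/openai/chat/completions';

// Hypothetical proxy: forwards POST /api/proxy/gemini to the upstream
// endpoint, attaching the server-side API key so it never reaches the browser.
const server = http.createServer(async (req, res) => {
  if (req.method !== 'POST' || req.url !== '/api/proxy/gemini') {
    res.writeHead(404).end();
    return;
  }
  const chunks: Buffer[] = [];
  for await (const chunk of req) chunks.push(chunk as Buffer);
  const upstream = await fetch(UPSTREAM, {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      Authorization: `Bearer ${process.env.GEMINI_API_KEY ?? ''}`,
    },
    body: Buffer.concat(chunks),
  });
  res.writeHead(upstream.status, { 'Content-Type': 'application/json' });
  res.end(await upstream.text());
});

server.listen(0); // port 0 = any free port; a real deployment would pick a fixed one
```

A production proxy would also need to forward streaming responses and restrict allowed origins.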

© 2026 CompositeVoice. All rights reserved.
