WebLLMLLMConfig

Configuration for the WebLLM in-browser LLM provider.

Defined in: src/providers/llm/webllm/WebLLMLLM.ts:92

Remarks

Unlike server-side providers, WebLLM needs no API key or proxy: everything runs client-side via WebGPU. The only required field is model.
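Because WebLLM depends entirely on WebGPU, an application may want to confirm WebGPU is usable before choosing this provider. The sketch below is not part of the SDK: canRunWebLLM, GPULike, and NavigatorLike are hypothetical names. It takes a navigator-like object as a parameter so the check can run outside a browser. navigator.gpu and requestAdapter() come from the WebGPU specification, and checking for an adapter is stricter than testing navigator.gpu alone, because a browser can expose the API yet return no adapter.

```typescript
// Minimal structural types for the slice of the WebGPU API this check uses.
// Hypothetical names; a browser's navigator.gpu satisfies them.
interface GPULike {
  requestAdapter(): Promise<unknown | null>;
}

interface NavigatorLike {
  gpu?: GPULike;
}

// Hypothetical helper (not part of CompositeVoice): resolves true only when
// WebGPU is exposed AND the browser can provide a GPU adapter.
async function canRunWebLLM(nav: NavigatorLike | undefined): Promise<boolean> {
  if (!nav?.gpu) return false; // WebGPU API not exposed at all
  try {
    const adapter = await nav.gpu.requestAdapter();
    return adapter !== null; // null means no suitable GPU is available
  } catch {
    return false; // treat any failure as "WebGPU unusable"
  }
}
```

In a browser you would call canRunWebLLM(navigator as unknown as NavigatorLike). The cast keeps the call compiling whether or not your TypeScript DOM typings include WebGPU. If the check resolves false, fall back to a server-side LLM provider.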

Example

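// Only model is required; every other field is optional.
const config: WebLLMLLMConfig = {
  model: 'Llama-3.2-1B-Instruct-q4f16_1-MLC',
  // Stream tokens so TTS can start before the full reply is generated.
  stream: true,
  systemPrompt: 'You are a helpful assistant running locally.',
  // Fires during model download and WebGPU shader compilation.
  onLoadProgress: ({ progress, text }) => {
    console.log(`Loading: ${Math.round(progress * 100)}% - ${text}`);
  },
};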

See

Extends

LLMProviderConfig

Properties

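Each property below lists its type, default value, and description. Where relevant, it also lists the LLMProviderConfig property it overrides or inherits from. Every entry ends with its source location.
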
apiKey?
Type: string
Default: undefined
API key or authentication token for the provider.
Remarks: For client-side usage, consider using a proxy server to keep API keys secure. The SDK provides Express, Next.js, and Node adapters for this purpose. WebLLM does not need an API key.
Inherited from: LLMProviderConfig.apiKey
Defined in: src/core/types/providers.ts:67

authType?
Type: "token" | "bearer"
Default: provider-specific (typically 'token' for Deepgram; ignored by REST providers)
Authentication type for providers that support multiple auth mechanisms.
Remarks: Controls how apiKey is sent to the provider:
- 'token': WebSocket subprotocol ['token', apiKey] or header Authorization: Token <key>. This is the default for Deepgram providers.
- 'bearer': WebSocket subprotocol ['bearer', token] or header Authorization: Bearer <token>. Use this for OAuth tokens or providers that expect Bearer auth.
REST/SDK providers (Anthropic, OpenAI) handle auth through their SDK constructors and ignore this field.
Inherited from: LLMProviderConfig.authType
Defined in: src/core/types/providers.ts:111

chatOpts?
Type: Record<string, unknown>
Default: undefined
Overrides entries from mlc-chat-config.json at engine creation time.
Remarks: Use it to tune engine parameters such as context_window_size, prefill_chunk_size, or sliding_window_size. The model's packaged configuration stays unchanged.
Example:
  chatOpts: {
    context_window_size: 2048,
    prefill_chunk_size: 1024,
  }
Defined in: src/providers/llm/webllm/WebLLMLLM.ts:145

debug?
Type: boolean
Default: false
Whether to enable debug logging for this provider.
Remarks: When true, the provider emits detailed internal logs. This is separate from the SDK-level LoggingConfig.
Inherited from: LLMProviderConfig.debug
Defined in: src/core/types/providers.ts:122

endpoint?
Type: string
Default: undefined
Custom endpoint URL that overrides the provider's default API endpoint.
Remarks: Useful for self-hosted instances, proxy servers, or development environments.
Inherited from: LLMProviderConfig.endpoint
Defined in: src/core/types/providers.ts:75

maxTokens?
Type: number
Default: undefined
Maximum number of tokens to generate in the response.
Remarks: For voice applications, lower values (100 to 300) help keep responses concise and reduce TTS latency.
Inherited from: LLMProviderConfig.maxTokens
Defined in: src/core/types/providers.ts:677

model
Type: string
Default: none (required)
WebLLM model identifier.
Remarks: Must match one of the model IDs supported by @mlc-ai/web-llm. The browser downloads the model weights on first use and caches them for later loads.
Example: 'Llama-3.2-1B-Instruct-q4f16_1-MLC'
See: Available models
Overrides: LLMProviderConfig.model
Defined in: src/providers/llm/webllm/WebLLMLLM.ts:105

onLoadProgress?
Type: (progress) => void
Default: undefined
Callback fired during model download and WebGPU shader compilation.
Remarks: Wire this to a progress bar, because the initial download can exceed 100 MB. The callback receives a WebLLMLoadProgress object with three fields: progress (0 to 1), timeElapsed (seconds), and text (a human-readable status description).
Example:
  onLoadProgress: ({ progress, text }) => {
    progressBar.style.width = `${progress * 100}%`;
    statusLabel.textContent = text;
  }
Defined in: src/providers/llm/webllm/WebLLMLLM.ts:125

proxyUrl?
Type: string
Default: undefined
URL of a CompositeVoice proxy server endpoint for this provider.
Remarks: When set, requests go through the proxy, which injects the real API key server-side. This keeps API keys out of the browser. For WebSocket providers, the HTTP URL is automatically converted to ws:// or wss://. Providers that require authentication need at least one of apiKey or proxyUrl. Every provider requires authentication except NativeSTT, NativeTTS, and WebLLM, so WebLLM does not need a proxyUrl.
Example: proxyUrl: 'http://localhost:3000/api/proxy/deepgram'
Inherited from: LLMProviderConfig.proxyUrl
Defined in: src/core/types/providers.ts:93

stopSequences?
Type: string[]
Default: undefined
Sequences that cause the LLM to stop generating.
Remarks: When the model generates any of these sequences, generation halts. Useful for controlling response boundaries.
Inherited from: LLMProviderConfig.stopSequences
Defined in: src/core/types/providers.ts:715

stream?
Type: boolean
Default: undefined
Whether to stream the LLM response token by token.
Remarks: When true, the provider yields tokens incrementally via an async iterable. Streaming is essential for low-latency voice applications because it lets TTS begin synthesizing before the full response is generated.
Inherited from: LLMProviderConfig.stream
Defined in: src/core/types/providers.ts:706

systemPrompt?
Type: string
Default: undefined
System prompt providing instructions and context to the LLM.
Remarks: Sets the behavior and persona of the assistant. For voice applications, include instructions to keep responses brief and conversational.
Inherited from: LLMProviderConfig.systemPrompt
Defined in: src/core/types/providers.ts:696

temperature?
Type: number
Default: undefined
Temperature controlling generation randomness.
Remarks: Values range from 0 (deterministic) to 2 (highly creative). Lower values produce more focused responses; higher values increase variety.
Inherited from: LLMProviderConfig.temperature
Defined in: src/core/types/providers.ts:668

timeout?
Type: number
Default: undefined
Request timeout in milliseconds.
Remarks: Applies to HTTP requests (REST providers) and connection establishment (WebSocket providers). Set to 0 for no timeout.
Inherited from: LLMProviderConfig.timeout
Defined in: src/core/types/providers.ts:131

topP?
Type: number
Default: undefined
Top-P (nucleus) sampling parameter.
Remarks: Limits token selection to the smallest set of tokens whose cumulative probability exceeds this value. Values range from 0 to 1. Often used as an alternative to temperature.
Inherited from: LLMProviderConfig.topP
Defined in: src/core/types/providers.ts:687
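The onLoadProgress remarks describe a WebLLMLoadProgress object with progress (0 to 1), timeElapsed (seconds), and text. Below is a minimal sketch that turns that object into a status line. LoadProgressLike and formatLoadProgress are hypothetical names, and the interface mirrors only the documented fields. The time-remaining estimate assumes the load proceeds at a constant rate, which real downloads and shader compilation rarely do.

```typescript
// Mirrors the fields documented for WebLLMLoadProgress; declared locally
// (hypothetical name) so the sketch is self-contained.
interface LoadProgressLike {
  progress: number;    // 0 to 1
  timeElapsed: number; // seconds
  text: string;        // human-readable status
}

// Hypothetical helper: percentage plus status text, with a rough
// time-remaining estimate once loading is under way. The estimate
// extrapolates linearly from the elapsed time.
function formatLoadProgress({ progress, timeElapsed, text }: LoadProgressLike): string {
  const clamped = Math.min(Math.max(progress, 0), 1);
  const percent = Math.round(clamped * 100);
  if (clamped > 0 && clamped < 1) {
    const remaining = Math.round((timeElapsed * (1 - clamped)) / clamped);
    return `${percent}% (about ${remaining} s left) - ${text}`;
  }
  return `${percent}% - ${text}`;
}
```

To use it in the config, pass the progress object straight through, for example by setting statusLabel.textContent = formatLoadProgress(p) inside onLoadProgress.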
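The stopSequences remarks say generation halts when the model produces any listed sequence. The sketch below applies that cut-off rule to finished text, where the earliest occurrence of any stop sequence ends the response. truncateAtStop is a hypothetical helper, not the provider's implementation. It assumes the stop sequence itself is dropped from the output; whether a given engine keeps it is not specified here.

```typescript
// Hypothetical helper: cuts text at the earliest occurrence of any stop
// sequence. Assumption: the stop sequence itself is not kept.
function truncateAtStop(text: string, stopSequences: string[]): string {
  let cut = text.length;
  for (const stop of stopSequences) {
    if (stop.length === 0) continue; // an empty sequence would match everywhere
    const index = text.indexOf(stop);
    if (index !== -1 && index < cut) cut = index;
  }
  return text.slice(0, cut);
}
```

A stop sequence such as "\nUser:" is a common choice in chat-style prompts, because it keeps the model from writing the user's next turn.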
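The stream remarks say tokens arrive through an async iterable, so TTS can start before the reply is complete. One common way to exploit that is to hand text to TTS one sentence at a time. toSentences is a hypothetical helper. It accepts any AsyncIterable<string> rather than calling a specific provider method, because the provider's streaming method name is not documented here. It approximates sentence boundaries as ., !, or ? followed by whitespace, so abbreviations such as "Dr." will split early.

```typescript
// A sentence boundary: shortest prefix ending in . ! or ? that is followed by
// whitespace. [\s\S] lets the prefix span newlines without the /s flag.
const SENTENCE_END = /^([\s\S]*?[.!?])\s+/;

// Hypothetical helper (not part of CompositeVoice): regroups a token stream
// into sentence-sized chunks so speech can start after the first sentence.
async function* toSentences(tokens: AsyncIterable<string>): AsyncGenerator<string> {
  let buffer = "";
  for await (const token of tokens) {
    buffer += token;
    // Emit every complete sentence currently sitting in the buffer.
    let match = buffer.match(SENTENCE_END);
    while (match !== null) {
      yield (match[1] ?? "").trim();
      buffer = buffer.slice(match[0].length);
      match = buffer.match(SENTENCE_END);
    }
  }
  // Flush any trailing text once the stream ends.
  const rest = buffer.trim();
  if (rest.length > 0) yield rest;
}
```

Each yielded sentence can then go to whatever TTS call the application uses. Speech begins after the first sentence instead of after the whole reply.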
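The temperature remarks describe a range from 0 (deterministic) to 2 (highly creative). In the textbook mechanism behind that range, the logits are divided by the temperature before the softmax, and a temperature of 0 means greedy decoding. softmaxWithTemperature is an illustrative sketch of that formulation, not WebLLM's sampling kernel.

```typescript
// Illustrative only: converts raw logits into sampling probabilities at the
// given temperature (textbook formulation, hypothetical helper name).
function softmaxWithTemperature(logits: number[], temperature: number): number[] {
  if (logits.length === 0) return [];
  // Index of the largest logit (reduce avoids spreading a large vocabulary).
  const best = logits.reduce((bi, l, i) => (l > logits[bi] ? i : bi), 0);
  if (temperature <= 0) {
    // Temperature 0: greedy decoding, all probability on the top token.
    return logits.map((_, i) => (i === best ? 1 : 0));
  }
  // Subtract the max logit before exponentiating for numerical stability.
  const max = logits[best];
  const exps = logits.map((l) => Math.exp((l - max) / temperature));
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / sum);
}
```

Raising the temperature flattens the distribution toward uniform. Lowering it concentrates probability on the top token, which matches the "focused" versus "varied" behavior described in the remarks.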
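The topP remarks define the nucleus as the smallest set of tokens whose cumulative probability exceeds the threshold. Below is a minimal sketch that selects that set from a probability vector. nucleus is a hypothetical name. It stops as soon as the running total reaches topP, which is the usual formulation, and it leaves out renormalization and the final random draw.

```typescript
// Illustrative only: returns the indices of the nucleus, i.e. the most
// probable tokens, taken in descending order until their cumulative
// probability reaches topP. Sampling would then be restricted to these.
function nucleus(probs: number[], topP: number): number[] {
  const order = probs
    .map((p, i) => ({ p, i }))
    .sort((a, b) => b.p - a.p); // most probable first
  const kept: number[] = [];
  let cumulative = 0;
  for (const { p, i } of order) {
    kept.push(i);
    cumulative += p;
    if (cumulative >= topP) break; // smallest set that reaches the threshold
  }
  return kept;
}
```

A low topP keeps only a handful of high-probability tokens, while a value near 1 admits almost the whole vocabulary. This is why it can stand in for temperature as a randomness control.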

© 2026 CompositeVoice. All rights reserved.
