WebLLMLLMConfig

Configuration for the WebLLM in-browser LLM provider.

Defined in: src/providers/llm/webllm/WebLLMLLM.ts:92

Remarks

Unlike server-side providers, WebLLM needs no API key or proxy: everything runs client-side via WebGPU. The only required field is `model`.
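
Because inference runs on WebGPU, it can be useful to check for it before building the config. The sketch below is not part of the SDK: `hasWebGPU` is a hypothetical helper, and it assumes that the standard `navigator.gpu` entry point is an adequate first signal of support.

```typescript
// Hypothetical helper (not part of the SDK): reports whether a navigator-like
// object exposes the standard WebGPU entry point, `navigator.gpu`.
function hasWebGPU(nav: unknown): boolean {
  // Reject undefined, null, and primitives before reading a property.
  if (typeof nav !== 'object' || nav === null) return false;
  // `gpu` is missing in environments without WebGPU support.
  return (nav as { gpu?: unknown }).gpu != null;
}

// In a browser, call it as: hasWebGPU(globalThis.navigator)
```

A `true` result only means the API is exposed. `navigator.gpu.requestAdapter()` can still resolve to `null` on unsupported hardware, so a fuller check would also await an adapter.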

Example

const config: WebLLMLLMConfig = {
  model: 'Llama-3.2-1B-Instruct-q4f16_1-MLC',
  stream: true,
  systemPrompt: 'You are a helpful assistant running locally.',
  onLoadProgress: ({ progress, text }) => {
    console.log(`Loading: ${Math.round(progress * 100)}% - ${text}`);
  },
};
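
The progress callback above logs a raw percentage. As a rough sketch of how the same values might be prepared for a progress bar or status label, the helpers below clamp `progress` and build a one-line status string. `LoadProgressLike`, `toPercent`, and `formatLoadStatus` are hypothetical names introduced for illustration; the interface is a local stand-in that assumes only the `progress`, `timeElapsed`, and `text` fields documented for `onLoadProgress`.

```typescript
// Local stand-in for the progress object passed to onLoadProgress
// (assumption: only these three documented fields are needed).
interface LoadProgressLike {
  progress: number; // fraction from 0 to 1
  timeElapsed: number; // seconds
  text: string; // human-readable description
}

// Hypothetical helper: clamps progress to [0, 1] and returns a whole percent.
function toPercent(progress: number): number {
  if (Number.isNaN(progress)) return 0;
  return Math.round(Math.min(1, Math.max(0, progress)) * 100);
}

// Hypothetical helper: builds a one-line status string for a label or log.
function formatLoadStatus(p: LoadProgressLike): string {
  return `${toPercent(p.progress)}% after ${p.timeElapsed.toFixed(1)}s: ${p.text}`;
}
```

Under these assumptions, `onLoadProgress: (p) => console.log(formatLoadStatus(p))` would replace the inline template string in the example.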

See

LLMProviderConfig for inherited base properties (temperature, maxTokens, systemPrompt, etc.).
Available WebLLM models

Extends

LLMProviderConfig

Properties

Property	Type	Default value	Description	Overrides	Inherited from	Defined in
`apiKey?`	`string` \| () => `Promise`<`string`>	`undefined`	API key or authentication token for the provider. Remarks Can be a static string or an async factory function that returns a fresh token on each call. Use a factory for short-lived tokens (e.g. Deepgram JWTs) so each WebSocket connection gets a valid credential. For client-side usage, consider using a proxy server to keep API keys secure. The SDK provides Express, Next.js, and Node adapters for this purpose.	-	`LLMProviderConfig`.`apiKey`	src/core/types/providers.ts:71
`authType?`	`"token"` \| `"bearer"`	`Provider-specific (typically 'token' for Deepgram, ignored for REST providers)`	Authentication type for providers that support multiple auth mechanisms. Remarks Controls how the `apiKey` is sent to the provider: - `'token'` — WebSocket subprotocol `['token', apiKey]` or header `Authorization: Token <key>`. This is the default for Deepgram providers. - `'bearer'` — WebSocket subprotocol `['bearer', token]` or header `Authorization: Bearer <token>`. Use this for OAuth tokens or providers that expect Bearer auth. REST/SDK providers (Anthropic, OpenAI) handle auth through their SDK constructors and ignore this field.	-	`LLMProviderConfig`.`authType`	src/core/types/providers.ts:115
`chatOpts?`	`Record`<`string`, `unknown`>	`undefined`	Override entries from `mlc-chat-config.json` at engine creation time. Remarks Useful for tuning engine parameters such as `context_window_size`, `prefill_chunk_size`, or `sliding_window_size` without modifying the model’s packaged configuration. Example `chatOpts: { context_window_size: 2048, prefill_chunk_size: 1024, }`	-	-	src/providers/llm/webllm/WebLLMLLM.ts:145
`debug?`	`boolean`	`false`	Whether to enable debug logging for this provider. Remarks When `true`, the provider emits detailed internal logs. This is separate from the SDK-level LoggingConfig.	-	`LLMProviderConfig`.`debug`	src/core/types/providers.ts:126
`endpoint?`	`string`	`undefined`	Custom endpoint URL to override the provider’s default API endpoint. Remarks Useful for self-hosted instances, proxy servers, or development environments.	-	`LLMProviderConfig`.`endpoint`	src/core/types/providers.ts:79
`maxTokens?`	`number`	`undefined`	Maximum number of tokens to generate in the response. Remarks For voice applications, lower values (100-300) help keep responses concise and reduce TTS latency.	-	`LLMProviderConfig`.`maxTokens`	src/core/types/providers.ts:689
`model`	`string`	`undefined`	WebLLM model identifier. Remarks Must match one of the model IDs supported by `@mlc-ai/web-llm`. The model weights are downloaded on first use and cached by the browser for subsequent loads. Example `'Llama-3.2-1B-Instruct-q4f16_1-MLC'` See Available models	`LLMProviderConfig`.`model`	-	src/providers/llm/webllm/WebLLMLLM.ts:105
`onLoadProgress?`	(`progress`) => `void`	`undefined`	Callback fired during model download and WebGPU shader compilation. Remarks Wire this to a progress bar for good UX, since initial loads can be 100 MB or more. The callback receives a WebLLMLoadProgress object with `progress` (0 to 1), `timeElapsed` (seconds), and a human-readable `text` description. Example ``onLoadProgress: ({ progress, text }) => { progressBar.style.width = `${progress * 100}%`; statusLabel.textContent = text; }``	-	-	src/providers/llm/webllm/WebLLMLLM.ts:125
`proxyUrl?`	`string`	`undefined`	URL of a CompositeVoice proxy server endpoint for this provider. Remarks When set, requests are routed through the proxy which injects the real API key server-side. This keeps API keys out of the browser. For WebSocket providers the HTTP URL is automatically converted to `ws(s)://`. At least one of `apiKey` or `proxyUrl` must be set for providers that require authentication (all except NativeSTT, NativeTTS, and WebLLM). Example `proxyUrl: 'http://localhost:3000/api/proxy/deepgram'`	-	`LLMProviderConfig`.`proxyUrl`	src/core/types/providers.ts:97
`stopSequences?`	`string`[]	`undefined`	Sequences that cause the LLM to stop generating. Remarks When the model generates any of these sequences, generation halts. Useful for controlling response boundaries.	-	`LLMProviderConfig`.`stopSequences`	src/core/types/providers.ts:727
`stream?`	`boolean`	`undefined`	Whether to stream the LLM response token by token. Remarks When `true`, the provider yields tokens incrementally via an async iterable. Streaming is essential for low-latency voice applications as it allows TTS to begin synthesizing before the full response is generated.	-	`LLMProviderConfig`.`stream`	src/core/types/providers.ts:718
`systemPrompt?`	`string`	`undefined`	System prompt providing instructions and context to the LLM. Remarks Sets the behavior and persona of the assistant. For voice applications, include instructions to keep responses brief and conversational.	-	`LLMProviderConfig`.`systemPrompt`	src/core/types/providers.ts:708
`temperature?`	`number`	`undefined`	Temperature for controlling generation randomness. Remarks Values from 0 (deterministic) to 2 (highly creative). Lower values produce more focused responses; higher values increase variety.	-	`LLMProviderConfig`.`temperature`	src/core/types/providers.ts:680
`timeout?`	`number`	`undefined`	Request timeout in milliseconds. Remarks Applies to HTTP requests (REST providers) and connection establishment (WebSocket providers). Set to `0` for no timeout.	-	`LLMProviderConfig`.`timeout`	src/core/types/providers.ts:135
`topP?`	`number`	`undefined`	Top-P (nucleus) sampling parameter. Remarks Limits token selection to the smallest set whose cumulative probability exceeds this value. Values from 0 to 1. Often used as an alternative to temperature.	-	`LLMProviderConfig`.`topP`	src/core/types/providers.ts:699
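
The sampling properties in the table document value ranges (`temperature` from 0 to 2, `topP` from 0 to 1) and a suggested `maxTokens` of 100-300 for voice applications. The sketch below shows one way to check those ranges before passing values to the provider. It is illustrative only: `SamplingFields` is a local stand-in rather than the real `WebLLMLLMConfig` type, `checkSampling` is a hypothetical helper, and the rule that `maxTokens` must be a positive integer is an assumption not stated in this reference.

```typescript
// Local stand-in for the sampling fields documented above; the real type is
// WebLLMLLMConfig, which this sketch deliberately does not import.
interface SamplingFields {
  temperature?: number;
  topP?: number;
  maxTokens?: number;
}

// Hypothetical helper: returns human-readable problems, or an empty array.
function checkSampling(cfg: SamplingFields): string[] {
  const problems: string[] = [];
  const { temperature, topP, maxTokens } = cfg;
  // The negated range checks also reject NaN.
  if (temperature !== undefined && !(temperature >= 0 && temperature <= 2)) {
    problems.push('temperature must be between 0 and 2');
  }
  if (topP !== undefined && !(topP >= 0 && topP <= 1)) {
    problems.push('topP must be between 0 and 1');
  }
  // Assumption: a token limit only makes sense as a positive integer.
  if (maxTokens !== undefined && !(Number.isInteger(maxTokens) && maxTokens > 0)) {
    problems.push('maxTokens must be a positive integer');
  }
  return problems;
}

// Example values for a voice application: short replies, moderate randomness.
const voiceSampling: SamplingFields = { temperature: 0.7, topP: 0.9, maxTokens: 200 };
```

Calling `checkSampling(voiceSampling)` returns an empty array, so these values could then be spread into a config alongside `model`.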