Running Fully Local
Kenzy was architected so that every stage of the voice pipeline can run on your own hardware — nothing spoken in your home has to leave your network. The default setup uses OpenAI for the "thinking" and the voice because it's the easiest start, but that's a configuration choice, not a dependency. This page is the recipe for flipping each stage local.
What "fully local" means here
The voice pipeline — wake word, speech-to-text, reasoning, text-to-speech, and speaker identification — runs entirely on your machines. Two honest caveats:
- Skills that fetch internet content still fetch it. Weather, news, stocks, and web search exist to bring the outside world in; asking for them makes an outbound request (with the query, never room audio). Disable any of them in the dashboard's Skills tab if you want zero outbound traffic.
- First-time setup downloads models once (kenzy-setup, pip installs, the first Ollama pull). After that, day-to-day operation needs no internet.
Stage by stage
| Stage | Local engine | Status |
|---|---|---|
| Wake word | openwakeword on the node | Always local, no config needed |
| Speaker ID | SpeechBrain ECAPA on kenzy-speaker | Always local, no config needed |
| Speech-to-text | faster-whisper on kenzy-stt | Local by default (provider: whisper) |
| Reasoning (LLM) | Ollama / LM Studio via LiteLLM | One config change (below) |
| Text-to-speech | Kokoro on kenzy-tts | One extra + one config change (below) |
| Dashboard, node links, HA control, lists | — | Already LAN-only by design |
So on a default install, three of five stages are already local — only the LLM and the voice need switching.
The LLM → Ollama
On the server (or any box with the horsepower):
curl -fsSL https://ollama.com/install.sh | sh
ollama pull qwen2.5:14b # or another tool-calling-capable model
Then in the dashboard, Services → llm:
No API key is needed — Kenzy deliberately sends no cloud credential to a custom
base_url. Skill sub-calls (news summaries, the HA resolver) follow the same
model unless you override their per-skill model.
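Before pointing Kenzy at the local server, it can help to confirm that Ollama is running and that the pull finished. The following is a minimal sketch, not part of Kenzy: it assumes Ollama is listening on its default port (11434) on the same machine and queries Ollama's `/api/tags` endpoint, which lists downloaded models. The helper names are hypothetical.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # assumption: Ollama's default port on this host
MODEL = "qwen2.5:14b"                  # the model pulled above


def model_is_pulled(tags: dict, name: str) -> bool:
    """Return True if `name` appears in an Ollama /api/tags response."""
    return any(m.get("name") == name for m in tags.get("models", []))


def fetch_tags(base_url: str = OLLAMA_URL, timeout: float = 5.0) -> dict:
    """Ask the local Ollama server which models it has downloaded."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
        return json.load(resp)


if __name__ == "__main__":
    try:
        ok = model_is_pulled(fetch_tags(), MODEL)
    except OSError as exc:  # connection refused, timeout, DNS failure
        print(f"Ollama not reachable at {OLLAMA_URL}: {exc}")
    else:
        print(f"{MODEL} is ready" if ok else f"{MODEL} not found; run: ollama pull {MODEL}")
```

If Ollama runs on a different box than Kenzy, change `OLLAMA_URL` to that host's LAN address.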
The hardware reality
This is the one stage where local costs real iron. Kenzy's skills depend on tool calling, and small models are noticeably worse at it. A ~7–14B model on a GPU with 8–16 GB VRAM (or an Apple-silicon Mac) gives a good experience; CPU-only LLM inference is technically possible and practically frustrating. Whisper and Kokoro, by contrast, are perfectly happy on CPU.
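As a rough way to sanity-check those numbers, the weights of a quantized model take about (parameters × bits per weight ÷ 8) bytes. The sketch below is back-of-envelope arithmetic only: the 4-bit quantization and the 2 GB allowance for context cache and runtime overhead are assumptions, and real usage varies with the quantization scheme and context length.

```python
def weights_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights in GB (10^9 bytes)."""
    return params_billion * bits_per_weight / 8


def fits(params_billion: float, bits_per_weight: float, vram_gb: float,
         overhead_gb: float = 2.0) -> bool:
    """True if weights plus an assumed overhead allowance fit in VRAM."""
    return weights_gb(params_billion, bits_per_weight) + overhead_gb <= vram_gb


# Compare the two ends of the 7-14B range against 8 GB and 16 GB cards.
for size in (7, 14):
    for vram in (8, 16):
        verdict = "fits" if fits(size, 4, vram) else "tight"
        print(f"{size}B @ 4-bit on {vram} GB: {verdict}")
```

Under these assumptions a 7B model is comfortable on 8 GB, while a 14B model wants the larger card.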
The voice → Kokoro
On the TTS host:
sudo apt-get install espeak-ng
pip install 'kenzy[kokoro]' # in Kenzy's venv (adds Kokoro + PyTorch)
kenzy-setup # pre-download the voice weights
Then in the dashboard, Services → tts: set provider: kokoro (voice/speed
under the kokoro: block). Output format is identical to the cloud provider,
so nothing else changes. See TTS Configuration.
Speech-to-text: verify, don't change
provider: whisper is the shipped default; just confirm nobody switched it to
openai (Services → stt). Pick the model size for your CPU in the same
place (guide).
Skills housekeeping
In the dashboard's Skills tab, for a strictly-no-outbound setup disable
web_search, get_news/get_news_article, get_current_weather/get_forecast,
and get_stock_info — or keep them and accept those queries going out. If you
want web search without a cloud dependency, point it at a self-hosted
SearXNG (skills.web_search.provider: searxng).
Verifying it
The pull-the-plug test, in order of what it proves:
1. Disconnect your router's WAN (leave the LAN up).
2. "Hey Kenzie… what time is it?" proves wake word → STT → fast path → TTS, end to end, with no internet.
3. "Turn on the kitchen lights." adds Home Assistant control (all LAN).
4. Ask something open-ended ("tell me a joke") to prove the local LLM path.
5. Check the dashboard's Activity tab: every interaction should show normal latencies and the fleet view should be all green, with the WAN still dark.
If step 2 works but 4 doesn't, the LLM config is the issue; if nothing speaks,
check TTS — the Troubleshooting page takes it from there.
And if a stage does fail, Kenzy now says so out loud rather than going silent
(the pre-recorded failure cue — sound_error in a node's settings).
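The test only proves something if the WAN really is down. As a sketch (not a Kenzy tool), the script below tries to open TCP connections to two public DNS resolvers and reports whether any route to the internet still works. The probe addresses and the function name are assumptions chosen for illustration; run it from the Kenzy server after unplugging the WAN.

```python
import socket

# Assumption: well-known public resolvers make reasonable "is the internet up?" probes.
PROBES = [("1.1.1.1", 53), ("8.8.8.8", 53)]


def wan_is_dark(connect=socket.create_connection, probes=PROBES, timeout=2.0) -> bool:
    """Return True only if every outbound probe fails to connect."""
    for host, port in probes:
        try:
            conn = connect((host, port), timeout=timeout)
        except OSError:
            continue  # unreachable, as expected with the WAN unplugged
        conn.close()
        return False  # at least one route to the internet still works
    return True


if __name__ == "__main__":
    if wan_is_dark():
        print("WAN is dark")
    else:
        print("WAN is still up; the test would prove nothing")
```

The `connect` parameter exists only so the check can be exercised without touching the network.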
The hybrid option: cloud primary, local fallback
You don't have to choose. Keep the cloud providers as your day-to-day setup and let the local engines catch failures automatically — each retry is silent, and only a double failure reaches the error cue:
- LLM: set fallback.model/fallback.base_url to a local Ollama model. An internet outage then degrades to a dumber-but-present assistant instead of an apology.
- TTS: openai.fallback: true (the default) uses local Kokoro when the cloud fails, provided you've installed the kokoro extra.
- STT: openai.fallback: true (the default) retries with local whisper; relevant only if you switched STT to the cloud provider.
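The behaviour described above (silent retry on the local engine, error cue only on a double failure) can be illustrated with a short sketch. This is not Kenzy's implementation; `with_fallback` and its arguments are hypothetical names used only to show the control flow.

```python
from typing import Callable, Optional


def with_fallback(primary: Callable[[str], str],
                  fallback: Optional[Callable[[str], str]],
                  play_error_cue: Callable[[], None],
                  request: str) -> Optional[str]:
    """Try the cloud engine, silently retry locally, cue only on double failure."""
    try:
        return primary(request)
    except Exception:
        pass  # first failure stays silent
    if fallback is not None:
        try:
            return fallback(request)
        except Exception:
            pass  # second failure falls through to the cue
    play_error_cue()
    return None


# Usage: a cloud engine that is offline and a local engine that answers.
def cloud(text: str) -> str:
    raise ConnectionError("no internet")

def local(text: str) -> str:
    return f"local answer to: {text}"

print(with_fallback(cloud, local, lambda: print("<error cue>"), "tell me a joke"))
```

With no fallback configured (`fallback=None`), the first failure goes straight to the error cue.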