Skip to content

Running Fully Local

Kenzy was architected so that every stage of the voice pipeline can run on your own hardware — nothing spoken in your home has to leave your network. The default setup uses OpenAI for the "thinking" and the voice because it's the easiest start, but that's a configuration choice, not a dependency. This page is the recipe for flipping each stage local.

What "fully local" means here

The voice pipeline — wake word, speech-to-text, reasoning, text-to-speech, and speaker identification — runs entirely on your machines. Two honest caveats:

  • Skills that fetch internet content still fetch it. Weather, news, stocks, and web search exist to bring the outside world in; asking for them makes an outbound request (with the query, never room audio). Disable any of them in the dashboard's Skills tab if you want zero outbound traffic.
  • First-time setup downloads models once (kenzy-setup, pip installs, the first Ollama pull). After that, day-to-day operation needs no internet.

Stage by stage

Stage Local engine Status
Wake word openwakeword on the node Always local — no config needed
Speaker ID SpeechBrain ECAPA on kenzy-speaker Always local — no config needed
Speech-to-text faster-whisper on kenzy-stt Local by default (provider: whisper)
Reasoning (LLM) Ollama / LM Studio via LiteLLM One config change (below)
Text-to-speech Kokoro on kenzy-tts One extra + one config change (below)
Dashboard, node links, HA control, lists Already LAN-only by design

So on a default install, three of five stages are already local — only the LLM and the voice need switching.

The LLM → Ollama

On the server (or any box with the horsepower):

curl -fsSL https://ollama.com/install.sh | sh
ollama pull qwen2.5:14b        # or another tool-calling-capable model

Then in the dashboard, Services → llm:

model: "ollama/qwen2.5:14b"
base_url: "http://127.0.0.1:11434"

No API key is needed — Kenzy deliberately sends no cloud credential to a custom base_url. Skill sub-calls (news summaries, the HA resolver) follow the same model unless you override their per-skill model.

The hardware reality

This is the one stage where local costs real iron. Kenzy's skills depend on tool calling, and small models are noticeably worse at it. A ~7–14B model on a GPU with 8–16 GB VRAM (or an Apple-silicon Mac) gives a good experience; CPU-only LLM inference is technically possible and practically frustrating. Whisper and Kokoro, by contrast, are perfectly happy on CPU.

The voice → Kokoro

On the TTS host:

sudo apt-get install espeak-ng
pip install 'kenzy[kokoro]'     # in Kenzy's venv (adds Kokoro + PyTorch)
kenzy-setup                     # pre-download the voice weights

Then in the dashboard, Services → tts: set provider: kokoro (voice/speed under the kokoro: block). Output format is identical to the cloud provider, so nothing else changes. See TTS Configuration.

Speech-to-text — verify, don't change

provider: whisper is the shipped default; just confirm nobody switched it to openai (Services → stt). Pick the model size for your CPU in the same place (guide).

Skills housekeeping

In the dashboard's Skills tab, for a strictly-no-outbound setup disable web_search, get_news/get_news_article, get_current_weather/get_forecast, and get_stock_info — or keep them and accept those queries going out. If you want web search without a cloud dependency, point it at a self-hosted SearXNG (skills.web_search.provider: searxng).

Verifying it

The pull-the-plug test, in order of what it proves:

  1. Disconnect your router's WAN (leave the LAN up).
  2. "Hey Kenzie… what time is it?" — proves wake word → STT → fast path → TTS, end to end, no internet.
  3. "Turn on the kitchen lights." — adds Home Assistant control (all LAN).
  4. Ask something open-ended ("tell me a joke") — proves the local LLM path.
  5. Check the dashboard's Activity tab: every interaction should show normal latencies, and the fleet view all-green, with the WAN still dark.

If step 2 works but 4 doesn't, the LLM config is the issue; if nothing speaks, check TTS — the Troubleshooting page takes it from there. And if a stage does fail, Kenzy now says so out loud rather than going silent (the pre-recorded failure cue — sound_error in a node's settings).

The hybrid option: cloud primary, local fallback

You don't have to choose. Keep the cloud providers as your day-to-day setup and let the local engines catch failures automatically — each retry is silent, and only a double failure reaches the error cue:

  • LLM: set fallback.model / fallback.base_url to a local Ollama model — an internet outage degrades to a dumber-but-present assistant instead of an apology.
  • TTS: openai.fallback: true (the default) uses local Kokoro when the cloud fails — if you've installed the kokoro extra.
  • STT: openai.fallback: true (the default) retries with local whisper — relevant only if you switched STT to the cloud provider.