Node Configuration
File: configs/node.yaml
Command: kenzy-node [config_path]
The node service runs on each room device. It captures microphone audio, detects the wake word, streams PCM to the server, and plays back TTS responses.
node.yaml is bootstrap-only: it holds just what the node needs to start up, log, and reach the server (log_level/verbose, server_url/discovery, and a stable node_id). On every boot the node pulls its full operational config from the server — audio device, sample rates, wakeword models/threshold/VAD, sounds, VAD timing, and its room name — and does not initialize audio until that config arrives. Configure all of it from the dashboard, keyed by the node's node_id. The operational keys below may still be set locally as a pre-connect fallback for any key the server does not push, but the dashboard/server is authoritative.
Full reference

| Key | Default | Description |
|---|---|---|
| server_url | null | WebSocket URL of the kenzy-server. Leave null/empty to auto-discover the server on the LAN via mDNS; set an explicit ws:// URL to skip discovery (e.g. across VLANs that block multicast). |
| discovery.enabled | true | Browse for the server over mDNS when server_url is unset |
| node_id | (generated) | Stable primary identifier for this node. Leave unset: one is generated and written back to node.yaml on first run (or assigned at install with kenzy-init --node-id ID), then kept across restarts. The server keys the registry, per-node config, and all controls on it, so a node's identity (and its config) survives even when the room name changes or the device is reimaged. |
| room_id | null | Human room name (e.g. kitchen). Server-owned: set it from the dashboard. Until the server provides one it falls back to the hostname. Sent to the assistant as context (used in conversation history). |
| audio_device | null | PortAudio device name substring or integer index. null uses the system default. Use kenzy-devices to find the correct value. |
| capture_sample_rate | 16000 | Sample rate for microphone capture. Set to the device's native rate if it does not support 16000 Hz; audio is resampled automatically. |
| playback_sample_rate | 24000 | Sample rate for speaker output. Set to the device's native rate if it does not support 24000 Hz; TTS audio is resampled automatically. |
| volume | 100 | Playback volume [0–100]. Server-owned, applies live (config-pull). Settable from the dashboard or by voice ("turn it up", "set the volume to 40"). Affects TTS, intercom, and announcements. |
| muted | false | Runtime only, not persisted. Mutes all playback except the wake-word ready chime, which stays audible (at a floor level) so you can tell the device is listening and knowingly unmute. Toggle from the dashboard or by voice ("mute"/"unmute"). A node always comes back un-muted after a restart. |
| log_level | "info" | What the node prints to its console (debug/info/warning/error). Live-tunable from the dashboard. |
| log_capture_level | "debug" | Server-owned. How deep the dashboard log viewer can see for this node (trace/debug/…), independent of log_level. Captured only while the dashboard's logs flag is on (otherwise zero overhead). Set trace to include per-frame audio logs. |
| verbose | false | Also enables debug output from websockets and asyncio internals |

Let the dashboard tune these for you
wakeword_threshold, wakeword_vad_threshold, and silence_rms_threshold are
mic- and room-specific. Rather than guessing, use the Calibration panel in the
dashboard (or kenzy-node --calibrate on a headless node) to measure your room and
apply suggested values. See Calibrating a node's audio.
Wake word

| Key | Default | Description |
|---|---|---|
| wakeword_models | [] | List of paths to .tflite or .onnx model files. Empty uses the bundled hey_ken_zee.tflite |
| wakeword_threshold | 0.5 | Confidence threshold [0.0–1.0] above which a detection fires |
| wakeword_vad_threshold | 0.0 | openwakeword Silero VAD gate [0.0–1.0]. Wake-word predictions are discarded unless the voice-activity score exceeds this. 0 disables it. Set to ~0.5 to suppress false detections on near-silence/noise. With it enabled you can safely lower wakeword_threshold (e.g. 0.4) for better real-speech sensitivity without reintroducing silence false-positives. The Silero VAD model is downloaded automatically by kenzy-setup. |

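
The way wakeword_threshold and wakeword_vad_threshold interact can be sketched as a simple gate. This is only an illustration of the behavior described in the table, not Kenzy's or openWakeWord's actual code: the helper name `wake_word_fires` is hypothetical, and the strict `>` comparisons are an assumption based on the words "above" and "exceeds".

```python
def wake_word_fires(
    score: float,
    vad_score: float,
    wakeword_threshold: float = 0.5,
    wakeword_vad_threshold: float = 0.0,
) -> bool:
    """Hypothetical helper: decide whether one wake-word prediction counts."""
    # A VAD threshold of 0 disables the gate entirely.
    if wakeword_vad_threshold > 0 and vad_score <= wakeword_vad_threshold:
        return False  # discarded: voice-activity score did not exceed the gate
    # Assumption: a detection fires only when the score is strictly above
    # the confidence threshold.
    return score > wakeword_threshold
```

With the defaults, a prediction of 0.45 never fires. With wakeword_threshold at 0.4 and wakeword_vad_threshold at 0.5, the same prediction fires only when the voice-activity score is above 0.5, which is why the lower threshold does not bring back detections on silence.
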
Voice activity detection (VAD)

| Key | Default | Description |
|---|---|---|
| vad_enabled | true | When false, the node streams until the server sends STOP. Hard cap does not apply. |
| silence_rms_threshold | 50 | RMS amplitude [0–32767] below which a frame is considered silent |
| silence_ms | 400 | Consecutive silence (ms) that ends an active session, once speech_min_ms has been heard |
| speech_min_ms | 400 | Minimum speech (ms) that must be detected before silence detection activates. Prevents the session ending on the pause after the wake word. |
| no_speech_timeout_ms | 15000 | Timeout (ms) if no speech is heard after activation. Prevents indefinite streaming when the wake word fires accidentally. |
| hard_cap_ms | 30000 | Unconditional session ceiling (ms). The session ends regardless of VAD state. |

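
To show how these timing keys fit together, here is a minimal sketch of the end-of-session decision. It is not Kenzy's implementation. The names `VadConfig`, `frame_rms`, and `session_end_reason` are hypothetical, and several details are assumptions: frames have a fixed length, speech time is counted cumulatively, and "no speech heard" is taken to mean that less than speech_min_ms of speech has accumulated.

```python
import math
from dataclasses import dataclass
from typing import Iterable, Optional, Sequence


@dataclass
class VadConfig:
    # Defaults copied from the table above.
    vad_enabled: bool = True
    silence_rms_threshold: float = 50
    silence_ms: int = 400
    speech_min_ms: int = 400
    no_speech_timeout_ms: int = 15000
    hard_cap_ms: int = 30000


def frame_rms(samples: Sequence[int]) -> float:
    """Root-mean-square amplitude of one frame of signed 16-bit PCM samples."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def session_end_reason(
    rms_per_frame: Iterable[float], frame_ms: int, cfg: VadConfig
) -> Optional[str]:
    """Return why the session would end, or None if it is still streaming."""
    if not cfg.vad_enabled:
        return None  # streams until the server sends STOP; no hard cap
    elapsed = speech = silence = 0
    for rms in rms_per_frame:
        elapsed += frame_ms
        if rms < cfg.silence_rms_threshold:
            silence += frame_ms  # consecutive silence keeps growing
        else:
            speech += frame_ms   # assumption: speech time is cumulative
            silence = 0
        if elapsed >= cfg.hard_cap_ms:
            return "hard_cap"
        if speech >= cfg.speech_min_ms:
            if silence >= cfg.silence_ms:
                return "silence"
        elif elapsed >= cfg.no_speech_timeout_ms:
            return "no_speech_timeout"
    return None
```

With 20 ms frames and the defaults, 600 ms of speech followed by 400 ms of silence ends the session with "silence", while 200 ms of speech followed by the same pause does not, because speech_min_ms has not been reached yet.
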
Sound files

| Key | Default | Description |
|---|---|---|
| sound_ready | null | WAV file played on activation (the "chime"). null uses the bundled ready.wav. Accepts an absolute path or a bare filename loaded from the bundled sounds directory. |
| sound_waiting | null | WAV file played while waiting for the server response. Plays once and stops naturally or is interrupted when TTS begins. null (or an empty string) disables it (pure silence while waiting). Provide a filename or path to enable it. |
| sound_connect | "connect.wav" | Chime played when an intercom call connects (bundled default; path, or empty/null to disable). |
| sound_disconnect | "disconnect.wav" | Chime played when an intercom call ends (bundled default; path, or empty/null to disable). |

Zero-config nodes (discovery + config-pull)
A node needs no operational local config. With server_url unset it finds the server via mDNS, generates a stable node_id on first run (or uses one assigned at install via kenzy-init --node-id), and blocks until the server answers: it connects, sends hello, and waits for the server's config frame before initializing audio. That effective config is the server's node_defaults plus any per-node override in configs/nodes/<node_id>.yaml.
Hardware keys (audio_device, sample rates, wakeword_models/VAD gate, sounds) are applied as the audio stack is built on this first pull; a later change to a hardware key needs a restart (one click in the dashboard). Live-tunable keys (wake-word threshold, silence RMS, VAD timing) apply immediately on every push.
A room device can therefore run with an essentially empty node.yaml, with everything, including its room name, configured from the dashboard and centralised on the server. To pre-seed a node, create configs/nodes/<node_id>.yaml on the server before the device first connects. See Server Configuration.
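
The layering of node_defaults and a per-node override, and the split between hardware and live-tunable keys, can be sketched as follows. This is an illustration under stated assumptions, not the server's code: the function names are hypothetical, the merge is assumed to be shallow, and the membership of `HARDWARE_KEYS` is one reading of the list in the paragraph above.

```python
from typing import Any, Dict, List

# Keys described above as hardware keys (applied when the audio stack is
# built). The exact membership of this set is an assumption.
HARDWARE_KEYS = {
    "audio_device", "capture_sample_rate", "playback_sample_rate",
    "wakeword_models", "wakeword_vad_threshold",
    "sound_ready", "sound_waiting", "sound_connect", "sound_disconnect",
}


def effective_config(
    node_defaults: Dict[str, Any], override: Dict[str, Any]
) -> Dict[str, Any]:
    """Per-node override wins over node_defaults (shallow merge, assumed)."""
    return {**node_defaults, **override}


def restart_keys(old: Dict[str, Any], new: Dict[str, Any]) -> List[str]:
    """Hardware keys whose value changed between two pushes, sorted by name."""
    return sorted(k for k in HARDWARE_KEYS if old.get(k) != new.get(k))
```

In this sketch, a push that changes only wakeword_threshold yields an empty list from `restart_keys` (the change applies live), while a push that also changes capture_sample_rate reports that key, meaning the node would need a restart.
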
Finding the right device name
Run kenzy-devices after install. It tests every PortAudio device against Kenzy's required sample rates and prints ready-to-paste node.yaml settings, including capture_sample_rate and playback_sample_rate when resampling is needed.
Prefer a speakerphone with hardware AEC
Use a USB speakerphone with built-in acoustic echo cancellation as the node's mic+speaker. Kenzy does not cancel echo itself, so without hardware AEC the node hears its own TTS playback and may falsely wake or interrupt. If you cannot use such a device, you must handle echo cancellation outside Kenzy.
Custom wake word
Custom wake word models can be trained with openWakeWord and referenced via wakeword_models. Both .tflite and .onnx formats are supported.
Example
A typical node only needs the bootstrap keys — audio and tuning come from the server:
```yaml
log_level: "info"
server_url: null   # null = discover the server via mDNS
discovery:
  enabled: true
# node_id is generated and written here automatically on first run; leave it unset
```
The operational keys may still be set locally as a pre-connect fallback for any key the server does not push (e.g. to pin a device before the node is configured in the dashboard):
```yaml
audio_device: "Anker PowerConf S330"  # substring of name shown by kenzy-devices
capture_sample_rate: 48000            # device native rate; resampled to 16000 Hz
playback_sample_rate: 48000           # device native rate; resampled to 24000 Hz
wakeword_threshold: 0.4               # lower is safe once VAD gating is on
wakeword_vad_threshold: 0.5           # reject wake-word hits on near-silence/noise
```