STT Configuration

File: configs/stt.yaml
Command: kenzy-stt [config_path]

The STT service accepts POST requests with base64-encoded PCM audio and returns a transcript. It is built on faster-whisper, a CTranslate2-optimized implementation of OpenAI Whisper.

Pulled from the server

By default, kenzy-stt pulls this config from the server at boot. It discovers the server via mDNS (or KENZY_SERVER_URL) and blocks until the server answers, so start the server first. To change the config, edit it from the dashboard's Services tab, which writes configs/services/stt.yaml on the server and restarts the service. Passing an explicit path (kenzy-stt configs/stt.yaml) loads that local file instead, as a dev/offline escape hatch. See central config for backend services.

Full reference

Key                Default      Description
host               "127.0.0.1"  Bind address
port               8767         HTTP port
log_level          "info"       What the service prints to its console
log_capture_level  "debug"      How deep the dashboard log viewer can see (trace/debug/…), independent of log_level
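
As a small illustration of how the two logging keys work independently, the fragment below keeps the console at the default level while letting the dashboard log viewer capture the most detail. It is a sketch that uses only level names mentioned in the table, and it assumes these keys sit at the top level of the file, as host and port do.

```yaml
# Console output stays at the default verbosity.
log_level: "info"
# The dashboard log viewer can still drill down to trace-level messages.
log_capture_level: "trace"
```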

Whisper model

Key                   Default  Description
whisper.model         "tiny"   Model size: tiny, base, small, medium, large-v2, large-v3. Larger models are more accurate but slower and need more RAM.
whisper.device        "cpu"    Inference device: cpu or cuda
whisper.compute_type  "int8"   Quantisation: int8 (fastest on CPU), float16 (GPU), float32 (highest quality)
whisper.language      "en"     Language code (e.g. "en", "fr"), or null for auto-detect
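
For contrast with the CPU defaults, here is a sketch of what a GPU-oriented whisper block might look like, using only values listed in the table. The choice of "medium" is illustrative rather than a tested recommendation for any particular GPU.

```yaml
whisper:
  model: "medium"          # larger model: more accurate, but slower and needs more RAM
  device: "cuda"           # run inference on the GPU instead of the CPU
  compute_type: "float16"  # the GPU-oriented option from the table
  language: null           # null asks the service to auto-detect the language
```

Setting language to a fixed code such as "en" skips auto-detection when every speaker uses the same language.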

Model size guide

Model     Size     Relative speed  Notes
tiny      ~75 MB   Fastest         Good for low-power hardware or simple commands
base      ~145 MB  Fast            Better accuracy, still CPU-friendly
small     ~460 MB  Moderate        Good balance for a dedicated CPU server
medium    ~1.5 GB  Slow on CPU     Recommended with a GPU
large-v3  ~3 GB    Slow            Best accuracy; GPU strongly recommended

Run STT off the node

Don't run STT on a room-node board (Orange Pi Zero 3 / Raspberry Pi 3–5). Run it on a more powerful server instead and point stt.url in server.yaml at it. The tiny or base model on a modern x86 CPU gives acceptable latency.
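
A minimal sketch of that split setup follows, shown as two YAML documents. Several details are assumptions: the address 192.168.1.50 is a placeholder, the nesting of stt.url as a url key under stt is inferred from the dotted name, and the URL is written as a bare base address because this page does not specify a path. The host change is included because the default bind address, 127.0.0.1, accepts connections only from the same machine.

```yaml
# STT service config on the more powerful machine
host: "0.0.0.0"   # listen on all interfaces so other machines can reach it
port: 8767

whisper:
  model: "base"
  device: "cpu"
  compute_type: "int8"
  language: "en"
---
# server.yaml (assumed layout for the stt.url key)
stt:
  url: "http://192.168.1.50:8767"   # placeholder address of the STT machine
```

Binding to 0.0.0.0 exposes the port on every network interface, so this is best kept to a trusted local network.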

Example

host: "127.0.0.1"
port: 8767

whisper:
  model: "base"
  device: "cpu"
  compute_type: "int8"
  language: "en"