// Self-hosted · Open source · v3.0.0
Kenzy is a distributed, self-hosted voice assistant built from six independent microservices, among them wake word, speech-to-text, a language model, speaker ID, and text-to-speech. Point the LLM at a model on your own machine and your voice never leaves the network.
// one-line installer landing soon — manual setup on Get Started
01 Why local
Most assistants ship every word you say to a server you don't control. Kenzy flips that: because kenzy-llm runs on LiteLLM, you can point it at a model running on your own box — Ollama, LM Studio, vLLM — or a cloud provider if you'd rather. Your call, per service.
Private
Audio, transcripts, and model calls can all run inside your own network. No third party in the loop unless you put one there.
Open model
LiteLLM speaks to local runtimes and every major provider. Swap models with one line of YAML — no rewiring.
No meter
Run it on hardware you already own and the marginal cost of "what's the weather?" is electricity, not API credits.
Hackable
Plain Python, readable configs, and a one-file skill system. Built to be tinkered with, not locked down.
02 What's inside
kenzy-node
openWakeWord runs on every frame locally, with an optional Silero VAD gate to kill false triggers. Train and drop in your own wake word.
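As a rough sketch of how a VAD gate cuts false triggers: the wake word only fires when the wake-word score and the speech probability both clear a threshold. The function name, score values, and 0.5 thresholds here are illustrative assumptions, not Kenzy's actual settings.

```python
def should_wake(wake_score: float, speech_prob: float,
                wake_threshold: float = 0.5, vad_threshold: float = 0.5) -> bool:
    """Fire only when the wake-word score is high AND the VAD hears speech.

    Both thresholds are placeholder values chosen for this example.
    """
    return wake_score >= wake_threshold and speech_prob >= vad_threshold


# A door slam: the wake-word model is fooled, the VAD is not.
print(should_wake(wake_score=0.81, speech_prob=0.05))
# Someone actually saying the wake word.
print(should_wake(wake_score=0.93, speech_prob=0.97))
```

The first call is rejected by the gate and the second passes, which is the whole point: a noise that looks like the wake word still has to sound like a person.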
kenzy-speaker
SpeechBrain ECAPA-TDNN identifies enrolled speakers — so unlocking the front door by voice can require a recognized person.
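Speaker identification of this kind usually comes down to comparing a voice embedding against the enrolled ones. Here is a minimal sketch using cosine similarity; the toy 3-dimensional vectors, the `identify` helper, and the 0.6 threshold are assumptions for illustration, and a real setup would use embeddings produced by the speaker model.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def identify(embedding, enrolled, threshold=0.6):
    """Return the best-matching enrolled name, or None if nobody clears the threshold."""
    best_name, best_score = None, threshold
    for name, reference in enrolled.items():
        score = cosine(embedding, reference)
        if score >= best_score:
            best_name, best_score = name, score
    return best_name


# Hypothetical enrolled speakers with made-up embeddings.
enrolled = {"alice": [0.9, 0.1, 0.0], "bob": [0.0, 0.2, 0.9]}

print(identify([0.8, 0.2, 0.1], enrolled))  # close to alice's vector
print(identify([0.1, 0.9, 0.1], enrolled))  # matches nobody well
```

Returning `None` for an unknown voice is what lets a sensitive action, like unlocking a door, refuse anyone who is not enrolled.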
kenzy-llm
Drop an async function in skills/, decorate it with @skill, and the model calls it as a tool. No registration, no boilerplate.
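A skill might look something like the sketch below. Kenzy's real `@skill` decorator and its import path are not shown on this page, so the example defines a stand-in decorator and a hypothetical `call_tool` dispatcher just to make the idea runnable; `set_timer` is an invented skill.

```python
import asyncio

# Stand-in for Kenzy's @skill decorator (the real one is not shown here).
# This toy version only records the function by name.
SKILLS = {}


def skill(func):
    SKILLS[func.__name__] = func
    return func


@skill
async def set_timer(minutes: int) -> str:
    """Start a countdown timer for the given number of minutes."""
    return f"Timer set for {minutes} minutes."


async def call_tool(name, **kwargs):
    """Hypothetical dispatcher: look up a skill by name and await it."""
    return await SKILLS[name](**kwargs)


print(asyncio.run(call_tool("set_timer", minutes=5)))
```

The docstring and type hints matter in designs like this, since they are typically what the model sees when deciding whether and how to call the tool.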
fast path
Common phrases like "turn on the lights" resolve deterministically — no model round-trip — so they answer the moment you finish speaking.
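One way a deterministic fast path could work is a small table of patterns checked before anything is sent to the model. This is a sketch under assumptions: the `FAST_PATHS` table, the `route` function, and the `set_power` action name are invented for illustration and are not Kenzy's actual matcher.

```python
import re

# Hypothetical pattern table: each entry maps a regex to an action builder.
FAST_PATHS = [
    (
        re.compile(r"^turn (on|off) the (\w+)$"),
        lambda m: ("set_power", m.group(2), m.group(1)),
    ),
]


def normalize(text: str) -> str:
    """Lowercase, strip punctuation, and collapse whitespace."""
    return " ".join(re.sub(r"[^\w\s]", "", text.lower()).split())


def route(transcript: str):
    """Return an action tuple for a known phrase, or None to fall back to the LLM."""
    phrase = normalize(transcript)
    for pattern, build in FAST_PATHS:
        match = pattern.match(phrase)
        if match:
            return build(match)
    return None


print(route("Turn on the lights."))
print(route("What's the weather like?"))
```

The first phrase resolves locally with no model call; the second returns `None`, which is the cue to hand the transcript to the language model.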
v3.0.0 · GROUND-UP REWRITE
Kenzy v3 is a complete redesign — not a refactor. The monolith is gone, replaced by six small services that each do one job and talk over a simple WebSocket + PCM protocol.
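To give a feel for what a PCM-over-WebSocket protocol involves, here is a sketch that slices raw audio into fixed-size frames, each of which could travel as one binary message. The audio format (16 kHz, 16-bit mono) and the 80 ms frame length are assumptions for the example; the page does not spell out Kenzy's actual wire format.

```python
SAMPLE_RATE = 16_000   # samples per second (assumed)
SAMPLE_WIDTH = 2       # bytes per sample, i.e. 16-bit PCM (assumed)
FRAME_MS = 80          # frame length in milliseconds (assumed)
FRAME_BYTES = SAMPLE_RATE * SAMPLE_WIDTH * FRAME_MS // 1000  # 2560 bytes


def pcm_frames(buffer: bytes):
    """Yield fixed-size PCM frames; a trailing partial frame is dropped."""
    for start in range(0, len(buffer) - FRAME_BYTES + 1, FRAME_BYTES):
        yield buffer[start:start + FRAME_BYTES]


# One second of silence stands in for microphone input.
one_second = bytes(SAMPLE_RATE * SAMPLE_WIDTH)
frames = list(pcm_frames(one_second))
print(len(frames), len(frames[0]))
```

Small, fixed-size frames keep the node's job simple, which suits a low-power board: capture, slice, send.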
The result is a system you can spread across the house: a featherweight node on a Raspberry Pi Zero 2 W in each room, the heavy lifting on a server or workstation wherever you've got the horsepower.
# point Kenzy at a model on your own machine
model: "ollama/llama3.1"
base_url: "http://localhost:11434"

# ...or a cloud provider, same two lines
# model: "gpt-4o"
# model: "claude-opus-4-8"
03 The stack
// Bring it home
Clone it, install the services you need, point it at your hardware, and start talking. The docs walk you through every step.