// Self-hosted · Open source · v3.0.0

Home voice AI that runs on your hardware.

Kenzy is a distributed, self-hosted voice assistant built from six independent microservices — wake word, speech-to-text, a language model, speaker ID, and text-to-speech. Point the LLM at a model on your own machine and your voice never leaves the network.

$ curl -fsSL https://kenzy.dev/install.sh | bash

// one-line installer landing soon — manual setup on Get Started

6
Microservices
100%
Local-capable
~$0
Per-request cost
Pi Zero 2 W
Runs on edge

01 Why local

Your voice never has to leave the house.

Most assistants ship every word you say to a server you don't control. Kenzy flips that: because kenzy-llm runs on LiteLLM, you can point it at a model running on your own box — Ollama, LM Studio, vLLM — or a cloud provider if you'd rather. Your call, per service.

Private

Stays on your LAN

Audio, transcripts, and model calls can all run inside your own network. No third party in the loop unless you put one there.

Open model

Bring your own LLM

LiteLLM speaks to local runtimes and every major provider. Swap models with one line of YAML — no rewiring.

No meter

No per-word bills

Run it on hardware you already own and the marginal cost of "what's the weather?" is electricity, not API credits.

Hackable

Yours to take apart

Plain Python, readable configs, and a one-file skill system. Built to be tinkered with, not locked down.

02 What's inside

A real assistant, assembled from parts you can see.

kenzy-node

On-device wake word

openWakeWord runs on every frame locally, with an optional Silero VAD gate to kill false triggers. Train and drop in your own wake word.

kenzy-speaker

Knows who's talking

SpeechBrain ECAPA-TDNN identifies enrolled speakers — so unlocking the front door by voice can require a recognized person.

kenzy-llm

Skills & tool-calling

Drop an async function in skills/, decorate it with @skill, and the model calls it as a tool. No registration, no boilerplate.

fast path

Instant commands

Common phrases like "turn on the lights" resolve deterministically — no model round-trip — so they answer the moment you finish speaking.

v3.0.0 · GROUND-UP REWRITE

Rebuilt from the ground up.

Kenzy v3 is a complete redesign — not a refactor. The monolith is gone, replaced by six small services that each do one job and talk over a simple WebSocket + PCM protocol.

The result is a system you can spread across the house: a featherweight node on a Raspberry Pi Zero 2 W in each room, the heavy lifting on a server or workstation wherever you've got the horsepower.

  • More flexible — run every service on one box or scatter them across the network.
  • More responsive — a deterministic fast path skips the LLM for everyday commands.
  • Edge-ready — nodes do only wake word + capture, so they fit tiny hardware.
  configs/llm.yaml
# point Kenzy at a model on your own machine
model: "ollama/llama3.1"
base_url: "http://localhost:11434"

# ...or a cloud provider, same two lines
# model: "gpt-4o"
# model: "claude-opus-4-8"
Ollama LM Studio vLLM OpenAI Anthropic any LiteLLM provider

03 The stack

Six services. One pipeline.

ON DEVICEnodeWake word, capture & playbackkenzy-node
PORT 8765serverWebSocket hub & orchestratorkenzy-server
PORT 8767sttSpeech-to-text · faster-whisperkenzy-stt
PORT 8766llmLLM & skills · LiteLLMkenzy-llm
PORT 8769ttsText-to-speech · OpenAI / Kokorokenzy-tts
PORT 8768speakerSpeaker ID · SpeechBrainkenzy-speaker

See the full architecture

// Bring it home

Stand up your own voice assistant.

Clone it, install the services you need, point it at your hardware, and start talking. The docs walk you through every step.