Sakura is a VTuber in the style of Neuro-sama: your personal AI companion who listens through your mic, responds with TTS, remembers conversations via Hindsight memory, and can be routed into VTube Studio with a virtual audio cable.
Sakura takes inspiration from Kira_AI:
https://github.com/JonathanDunkleberger/Kira_AI
- Voice Conversations: VAD + Whisper STT + local LLM + MLX Audio TTS
- Smart Memory: Long‑term memory powered by Hindsight (Docker container)
- Dynamic Personality: Persona and emotional state updates from `mission.yaml`
- VTube Studio Audio Routing: Output to a virtual cable device
- Web search (scaffolded in `web_search.py`, not connected to the main loop)
- Twitch chat integration (bot exists in `twitch_bot.py`, but disabled by default)
- More features coming
- Python 3.10+ recommended

Install dependencies:

```bash
pip install -r requirements_lock.txt
```

Copy `.env.example` to `.env` and set at least these values:
```env
# LLM
LLM_PROVIDER=local # or "kimi"
LLM_HOST=127.0.0.1
LLM_PORT=8080
N_CTX=2048
LLM_MAX_RESPONSE_TOKENS=512

# Kimi (if using LLM_PROVIDER=kimi)
MOONSHOT_API_KEY=your_key
KIMI_MODEL=kimi-k2-turbo-preview

# STT
WHISPER_MODEL_SIZE=base.en

# TTS (MLX Audio)
TTS_ENGINE=mlx
MLX_AUDIO_MODEL=/path/to/mlx/model
MLX_REF_AUDIO=/path/to/reference.wav

# Optional audio routing (VTube Studio)
VIRTUAL_AUDIO_DEVICE=

# Hindsight memory (used by the container)
OPENAI_API_KEY=your_openai_key
```

Notes:

- `MLX_AUDIO_MODEL` and `MLX_REF_AUDIO` are required for MLX TTS.
- `VIRTUAL_AUDIO_DEVICE` should match your virtual cable device name.
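A quick way to catch a misconfigured `.env` before launch is to parse it and check the required keys. This is a standalone sketch (not part of Sakura's code) using only the standard library; adjust `REQUIRED` to your provider:

```python
from pathlib import Path

# Assumption: these are the keys your setup needs; edit to match your provider.
REQUIRED = ["LLM_PROVIDER", "MLX_AUDIO_MODEL", "MLX_REF_AUDIO"]

def load_env(path: str = ".env") -> dict:
    """Parse simple KEY=VALUE lines, skipping blanks and # comment lines."""
    env = {}
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop trailing inline comments like `local # or "kimi"`.
        env[key.strip()] = value.split("#", 1)[0].strip()
    return env

def missing_keys(env: dict) -> list:
    """Return required keys that are absent or empty."""
    return [k for k in REQUIRED if not env.get(k)]
```

Note the inline-comment stripping is naive (it would truncate a value containing `#`), which is fine for a sanity check like this.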
Hindsight runs as a container. Use `hindsight.yaml` and your `OPENAI_API_KEY` from `.env`:

```bash
docker compose -f hindsight.yaml up -d
```

More info about Hindsight installation: https://hindsight.vectorize.io/developer/installation
`llamacpp.yaml` is ready to run a local server:

```bash
docker compose -f llamacpp.yaml up -d
```

Update the model path inside `llamacpp.yaml` or mount your model under `./models`.
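Once a server is up (Docker or the source build below), you can smoke-test it from Python before wiring it into the bot. llama.cpp's server exposes an OpenAI-compatible `/v1/chat/completions` endpoint; this standalone sketch uses only the standard library and the `LLM_HOST`/`LLM_PORT` defaults from `.env`:

```python
import json
import urllib.request

def build_chat_request(host: str, port: int, user_text: str,
                       max_tokens: int = 512) -> urllib.request.Request:
    """Build a POST against llama.cpp's OpenAI-compatible chat endpoint."""
    payload = {
        "messages": [{"role": "user", "content": user_text}],
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        f"http://{host}:{port}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )

def ask(text: str, host: str = "127.0.0.1", port: int = 8080) -> str:
    """Send one chat turn and return the model's reply text."""
    with urllib.request.urlopen(build_chat_request(host, port, text)) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

If `ask("Hello!")` returns text, the server side of the pipeline is healthy.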
Alternatively, build llama.cpp from source with Metal (Apple Silicon):

```bash
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
mkdir build
cd build
cmake .. -DCMAKE_APPLE_SILICON_PROCESSOR=arm64 -DCMAKE_OSX_ARCHITECTURES=arm64 -DGGML_METAL=on
make -j
```

Run the server:

```bash
./llama-server -m /Users/wasami/Sakura_AI/models/gemma-3-1b-it-BF16.gguf -c 2048 --n-gpu-layers all
```

If you want Sakura to speak into VTube Studio:
- Install a virtual audio cable.
- Find the correct device name:

  ```bash
  python virtual_cable_finder.py
  ```

- Set that name in `.env` as `VIRTUAL_AUDIO_DEVICE`.
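Picking the device usually comes down to a case-insensitive substring match against the device names PyAudio reports. A standalone sketch of that matching logic (the device names in the test are examples, not guaranteed to exist on your machine):

```python
from typing import List, Optional

def find_device(device_names: List[str], wanted: str) -> Optional[str]:
    """Return the first device whose name contains `wanted`, ignoring case."""
    wanted = wanted.lower()
    for name in device_names:
        if wanted in name.lower():
            return name
    return None
```

Set `VIRTUAL_AUDIO_DEVICE` to the exact name returned, since audio backends typically require a verbatim device name.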
Start Sakura:

```bash
python bot.py
```

You should see:

```
--- Sakura is now running. Press Ctrl+C to exit. ---
```
Recap of the full startup sequence:

- Start memory:

  ```bash
  docker compose -f hindsight.yaml up -d
  ```

- Start LLM server:
  - Linux: use Docker with `llamacpp.yaml`
  - Apple Silicon: build llama.cpp from source (see the Metal build section above)
- Run Sakura:

  ```bash
  python bot.py
  ```

Troubleshooting:

- "No module named ...": re-run `pip install -r requirements_lock.txt`
- Model not found: verify the model path and `LLM_HOST`/`LLM_PORT` for your server
- No audio: confirm mic permissions and VAD settings
- TTS silent: check `MLX_AUDIO_MODEL`, `MLX_REF_AUDIO`, and the device name
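On the "No audio" / VAD front: webrtcvad only accepts 16-bit mono PCM at 8/16/32/48 kHz in exact 10, 20, or 30 ms frames, and feeding it the wrong frame size is a common silent failure. A small stdlib helper (assuming 16-bit mono, as webrtcvad requires) to compute the byte length each frame must have:

```python
def vad_frame_bytes(sample_rate: int, frame_ms: int) -> int:
    """Byte length webrtcvad expects for one frame of 16-bit mono PCM."""
    if frame_ms not in (10, 20, 30):
        raise ValueError("webrtcvad frames must be 10, 20, or 30 ms")
    if sample_rate not in (8000, 16000, 32000, 48000):
        raise ValueError("webrtcvad supports 8/16/32/48 kHz only")
    samples = sample_rate * frame_ms // 1000
    return samples * 2  # 2 bytes per 16-bit sample
```

If the chunks you read from PyAudio don't match this length, `is_speech()` will reject them, and the bot will appear deaf.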
Before installing Python deps on Apple Silicon, install PortAudio (PyAudio's native dependency; the Homebrew formula is `portaudio`):

```bash
brew install portaudio
```

Packages installed via pip (reference):
```
chromadb
google-api-python-client
mlx-audio
openai
pillow
pyaudio
pygame
python-dotenv
pyttsx3
sentence-transformers
webrtcvad-wheels
nest-asyncio
```
MIT License — feel free to modify and share.
