WasamiKirua/Sakura-VTuber

🎤 Sakura AI VTuber

Demo of AI VTuber in action

Sakura is a Neuro-sama-style AI VTuber: a personal AI companion that listens through your mic, responds with TTS, remembers conversations via Hindsight memory, and can be routed into VTube Studio through a virtual audio cable.

Sakura takes inspiration from Kira_AI:

https://github.com/JonathanDunkleberger/Kira_AI

✨ What Sakura Can Do (Current)

  • Voice Conversations: VAD + Whisper STT + local LLM + MLX Audio TTS
  • Smart Memory: Long‑term memory powered by Hindsight (Docker container)
  • Dynamic Personality: Persona and emotional state updates from mission.yaml
  • VTube Studio Audio Routing: Output to a virtual cable device
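
The pipeline in the first bullet (VAD → STT → LLM → TTS) amounts to a single turn loop. Below is a minimal, hedged sketch: every helper is a stub standing in for the real VAD/Whisper/llama.cpp/MLX Audio calls, and none of these names are Sakura's actual API.

```python
# Hedged sketch of one conversation turn. All helpers are stubs; the real
# project wires VAD, Whisper, llama.cpp/Kimi, and MLX Audio in their place.
def record_until_silence() -> bytes:
    return b"<mic audio>"                 # stand-in: VAD-gated mic capture

def transcribe(audio: bytes) -> str:
    return "hello sakura"                 # stand-in: Whisper STT

def complete(history: list[dict]) -> str:
    return "hi! how was your day?"        # stand-in: local LLM / Kimi call

def speak(text: str) -> None:
    print(f"[TTS] {text}")                # stand-in: MLX Audio -> audio device

def conversation_turn(history: list[dict]) -> list[dict]:
    user_text = transcribe(record_until_silence())
    history.append({"role": "user", "content": user_text})
    reply = complete(history)
    history.append({"role": "assistant", "content": reply})
    speak(reply)
    return history
```

Keeping the full `history` list lets the LLM see prior turns; long-term recall beyond the current session is what Hindsight adds on top.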

🚧 In Progress / Not Yet Wired

  • Web search (scaffolded in web_search.py, not connected to the main loop)
  • Twitch chat integration (bot exists in twitch_bot.py, but disabled by default)
  • More features coming

✅ Setup (Local)

1) Install Python

  • Python 3.10+ recommended

2) Install Dependencies

pip install -r requirements_lock.txt

3) Configure .env

Copy .env.example to .env and set at least these values:

# LLM
LLM_PROVIDER=local            # or "kimi"
LLM_HOST=127.0.0.1
LLM_PORT=8080
N_CTX=2048
LLM_MAX_RESPONSE_TOKENS=512

# Kimi (if using LLM_PROVIDER=kimi)
MOONSHOT_API_KEY=your_key
KIMI_MODEL=kimi-k2-turbo-preview

# STT
WHISPER_MODEL_SIZE=base.en

# TTS (MLX Audio)
TTS_ENGINE=mlx
MLX_AUDIO_MODEL=/path/to/mlx/model
MLX_REF_AUDIO=/path/to/reference.wav

# Optional audio routing (VTube Studio)
VIRTUAL_AUDIO_DEVICE=

# Hindsight memory (used by the container)
OPENAI_API_KEY=your_openai_key

Notes:

  • MLX_AUDIO_MODEL and MLX_REF_AUDIO are required for MLX TTS.
  • VIRTUAL_AUDIO_DEVICE should match your virtual cable device name.
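
Reading these settings in code is straightforward. The project ships python-dotenv (see the package list below), but this sketch uses only the standard library; the variable names come from the sample .env above, while the defaults and the helper's name are assumptions for illustration.

```python
import os

# Defaults mirror the sample .env above; only the variable names are
# taken from it, the rest of this helper is illustrative.
def load_llm_settings(env=os.environ):
    provider = env.get("LLM_PROVIDER", "local")
    host = env.get("LLM_HOST", "127.0.0.1")
    port = int(env.get("LLM_PORT", "8080"))
    if provider == "kimi" and not env.get("MOONSHOT_API_KEY"):
        raise ValueError("MOONSHOT_API_KEY is required when LLM_PROVIDER=kimi")
    return provider, f"http://{host}:{port}"
```

Failing fast on a missing `MOONSHOT_API_KEY` avoids a confusing error deep inside the first LLM call.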

🧠 Memory (Hindsight)

Hindsight runs as a container. Use hindsight.yaml and your OPENAI_API_KEY from .env:

docker compose -f hindsight.yaml up -d

More info about Hindsight installation:

https://hindsight.vectorize.io/developer/installation

🦙 Local LLM (llama.cpp)

Option A: Docker (Linux only)

llamacpp.yaml is ready to run a local server:

docker compose -f llamacpp.yaml up -d

Update the model path inside llamacpp.yaml or mount your model under ./models.

Option B: Build on Apple Silicon (Metal)

git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
mkdir build
cd build
cmake .. -DCMAKE_APPLE_SILICON_PROCESSOR=arm64 -DCMAKE_OSX_ARCHITECTURES=arm64 -DGGML_METAL=on
make -j

Run the server (adjust the model path to your own GGUF file):

./llama-server -m /Users/wasami/Sakura_AI/models/gemma-3-1b-it-BF16.gguf -c 2048 --n-gpu-layers all
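
llama-server exposes an OpenAI-compatible HTTP API, so you can smoke-test it with nothing but the standard library. A sketch, assuming the server is already up and listening on the host/port from the sample .env:

```python
import json
import urllib.request

def build_chat_request(prompt, host="127.0.0.1", port=8080):
    """Build a POST to llama-server's OpenAI-compatible chat endpoint."""
    body = json.dumps({
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 64,
    }).encode()
    return urllib.request.Request(
        f"http://{host}:{port}/v1/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
    )

def ask(prompt):
    # Only works with a running llama-server; parses the first choice.
    with urllib.request.urlopen(build_chat_request(prompt), timeout=60) as resp:
        return json.loads(resp.read())["choices"][0]["message"]["content"]
```

If `ask("hello")` returns text, the server is reachable and the model loaded; otherwise re-check the model path and LLM_HOST/LLM_PORT.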

🎬 VTube Studio Audio Routing

If you want Sakura to speak into VTube Studio:

  1. Install a virtual audio cable.
  2. Find the correct device name:
python virtual_cable_finder.py
  3. Set that name in .env as VIRTUAL_AUDIO_DEVICE.
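
The matching in step 2 is presumably a substring search over the system's audio device names; here is a hedged sketch (the function name and keyword are illustrative, not virtual_cable_finder.py's actual code):

```python
# Illustrative device matching; not the actual virtual_cable_finder.py code.
def find_virtual_device(device_names, keyword="cable"):
    """Return the first device whose name contains `keyword` (case-insensitive)."""
    for name in device_names:
        if keyword.lower() in name.lower():
            return name
    return None

# With pyaudio installed, the names would come from something like:
#   pa = pyaudio.PyAudio()
#   names = [pa.get_device_info_by_index(i)["name"]
#            for i in range(pa.get_device_count())]
```

Whatever the script prints for your cable (e.g. "BlackHole 2ch" or "VB-Cable") is the exact string to put in VIRTUAL_AUDIO_DEVICE.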

▶️ Run Sakura

python bot.py

You should see:

--- Sakura is now running. Press Ctrl+C to exit. ---

✅ Quick Start (Recommended Order)

  1. Start memory:
docker compose -f hindsight.yaml up -d
  2. Start the LLM server:
  • Linux: use Docker with llamacpp.yaml
  • Apple Silicon: build llama.cpp from source (see the Metal build section above)
  3. Run Sakura:
python bot.py

🛠️ Troubleshooting

  • "No module named ...": re-run pip install -r requirements_lock.txt
  • Model not found: verify the model path and LLM_HOST/LLM_PORT for your server
  • No audio: confirm mic permissions and VAD settings
  • TTS silent: check MLX_AUDIO_MODEL, MLX_REF_AUDIO, and device name

✅ Apple Silicon Notes (Important)

PyAudio needs the PortAudio C library, so install it with Homebrew before installing the Python deps:

brew install portaudio

Packages installed via pip (reference):

chromadb
google-api-python-client
mlx-audio
openai
pillow
pyaudio
pygame
python-dotenv
pyttsx3
sentence-transformers
webrtcvad-wheels
nest-asyncio

📜 License

MIT License — feel free to modify and share.
