It gives you:
- A friendly persona using `gpt-5-mini` for chat.
- Natural-sounding TTS using `gpt-4o-mini-tts` (voice: `nova`) with prosody shaping so it feels more alive.
- Whisper-based speech-to-text so you can talk to Zelda via microphone instead of just typing.
- A simple avatar front-end (PNG + SadTalker MP4 clips) that reacts in sync with the audio (as best as pre-rendered video allows).
The repo is intentionally minimal:
├── backend/
│ ├── main.py
│ ├── voice.py
│ ├── prosody.py
│ ├── transcribe.py
│ ├── zelda_key.env # your local API key (ignored)
│ ├── audio/ # generated audio (ignored)
│ └── video/ # pre-generated reaction videos
├── frontend/
│ ├── index.html
│ └── zelda.PNG
- `main.py` – FastAPI app exposing:
  - `POST /chat` – chat endpoint returning text + `audio_url` + `tone`
  - `POST /transcribe` – transcription endpoint for microphone audio
  - Static `/audio` mount for generated MP3 TTS files
- `voice.py` – TTS helper using OpenAI `gpt-4o-mini-tts` with the `nova` voice and prosody shaping via `prosody.py`.
- `prosody.py` – detects emotional tone of replies and reshapes text for more natural speech delivery, plus exposes `detect_tone()` used by the backend.
- `transcribe.py` – audio transcription via Whisper (`whisper-1`).
- `index.html` – the front-end chat UI with:
  - Typing box + history
  - Mic button (records audio, calls `/transcribe`)
  - “Play Zelda’s voice” toggle
  - Zelda avatar (PNG + pre-generated SadTalker MP4 clips)
- `audio/` – auto-created folder for generated MP3 files (served at `/audio/...`).
- `video/` – (optional) SadTalker MP4 clips used by the avatar (e.g. `zelda_happy.mp4`, `zelda_neutral.mp4`, etc.).
- `zelda_key.env` – not committed: a single-line file containing your OpenAI API key, read by both `main.py` and `voice.py`.
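
Since `zelda_key.env` is described as a single-line file holding only the key, reading it could look roughly like the sketch below. The function name `load_api_key` is hypothetical and is not taken from `main.py` or `voice.py`; the actual code may read the file differently.

```python
from pathlib import Path


def load_api_key(env_path: Path) -> str:
    """Return the API key stored in a single-line key file.

    Hypothetical helper: illustrates the idea, not the repo's actual code.
    """
    # Strip the trailing newline or spaces an editor may have added.
    key = env_path.read_text(encoding="utf-8").strip()
    if not key:
        raise ValueError(f"{env_path} is empty; paste your OpenAI API key into it")
    return key
```

Called as `load_api_key(Path("backend") / "zelda_key.env")` from the repo root, this would return the key as a plain string, or raise if the file is missing or empty.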
- 🗣️ Conversational AI using `gpt-5-mini`
- 🔊 Natural Text-to-Speech via `gpt-4o-mini-tts`
- 🎤 Speech-to-Text using `gpt-4o-transcribe`
- 🎭 Emotion-based avatar reactions
- 🌐 Pure local HTML/JS frontend with optional web setup
- ⚡ Lightweight FastAPI backend (`localhost:8000`)
- 🧩 Three Personality Modes (Friendly, Therapist, Balanced)
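
To illustrate the emotion-based avatar reactions, here is a minimal sketch of how a detected tone might be mapped to a pre-rendered clip. The keyword lists, the `sad` tone, and both function names are assumptions made for illustration; the real `detect_tone()` in `prosody.py` may use a different approach entirely. Only the `zelda_happy.mp4` and `zelda_neutral.mp4` file names come from this README.

```python
import re

# Hypothetical keyword lists; not taken from prosody.py.
TONE_KEYWORDS = {
    "happy": ("great", "awesome", "glad", "love", "yay"),
    "sad": ("sorry", "sad", "unfortunately", "miss"),
}


def detect_tone_sketch(reply: str) -> str:
    """Guess a coarse tone from keywords, defaulting to 'neutral'."""
    words = set(re.findall(r"[a-z']+", reply.lower()))
    for tone, keywords in TONE_KEYWORDS.items():
        if words & set(keywords):
            return tone
    return "neutral"


def clip_for_tone(tone: str, available: set[str]) -> str:
    """Pick the clip for a tone, falling back to the neutral clip."""
    candidate = f"zelda_{tone}.mp4"
    return candidate if candidate in available else "zelda_neutral.mp4"
```

Falling back to the neutral clip means the avatar still has something to play when no clip was rendered for a given tone.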
- Python 3.11+ (3.10 will likely work, but 3.11 is recommended).
- An OpenAI API key with access to:
  - `gpt-4.1-mini` (chat)
  - `gpt-4o-mini-tts` (TTS)
  - `whisper-1` (STT)
- A modern browser (Chrome, Brave, etc.) with microphone access.
pip install -r requirements.txt

Create a file in `backend/` named `zelda_key.env` and paste your OpenAI API key into it.

cd backend
uvicorn main:app --reload

Open `frontend/index.html` in your browser (Chrome recommended).
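
Once the backend is running, the `/chat` endpoint can also be exercised without the browser. The sketch below uses only the standard library and assumes the endpoint accepts a JSON body with a `message` field; that field name is a guess, so check `main.py` for the actual request schema. The `audio_url` and `tone` response fields are the ones described above.

```python
import json
from urllib import request

BASE_URL = "http://localhost:8000"  # backend address given in this README


def build_chat_request(message: str, base_url: str = BASE_URL) -> request.Request:
    """Build a POST request for the /chat endpoint."""
    # "message" is a hypothetical field name; the real schema lives in main.py.
    body = json.dumps({"message": message}).encode("utf-8")
    return request.Request(
        f"{base_url}/chat",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(message: str) -> dict:
    """Send one message and return the parsed JSON reply."""
    with request.urlopen(build_chat_request(message), timeout=60) as resp:
        return json.load(resp)
```

With the server up, `reply = chat("Hi Zelda!")` should return a dictionary from which `reply.get("tone")` and `reply.get("audio_url")` can be read, assuming the request shape guessed here matches the backend.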
Pull requests welcome. This is an evolving project; improvements and ideas are appreciated.
Built with:
- Python / FastAPI
- OpenAI APIs
- HTML / JavaScript
- Videos generated with SadTalker: https://github.com/OpenTalker/SadTalker
