Live2D + ASR + LLM + TTS → Real-time voice interaction | Local deployment / Cloud inference
Live2D-LLM-Chat is a real-time AI interaction project that integrates Live2D virtual avatars, Automatic Speech Recognition (ASR), Large Language Models (LLM), and Text-to-Speech (TTS). It allows a virtual character to recognize the user's speech through ASR, generate intelligent responses using AI, synthesize speech via TTS, and drive Live2D animations with lip-sync for a natural interaction experience.
- 🎙 Automatic Speech Recognition (ASR): Uses SenseVoice (via the FunASR toolkit) for speech-to-text (STT) processing.
- 🧠 Large Language Model (LLM): Supports natural conversation using OpenAI GPT / DeepSeek.
- 🔊 Text-to-Speech (TTS): Uses CosyVoice for high-quality speech synthesis.
- 🏆 Live2D Virtual Character Interaction: Renders models using the Live2D SDK and provides real-time feedback.
- The LLM module supports both local and cloud deployment. Local deployment is based on LM Studio, which covers mainstream open-source models, though personal device performance may limit larger models. Cloud deployment supports the OpenAI and DeepSeek APIs.
- Stores conversation history with context memory. Every five conversation turns, a summary is generated to prevent excessive context accumulation.
- Conversation logging records the timestamp and dialogue history, including TTS audio outputs, making it easy to review past interactions. This feature can be disabled in the config file to reduce memory usage.
- Enhances Live2D eye-tracking and blinking logic to provide natural blinking even if the Live2D model lacks built-in blinking behavior. Implements lip-sync by analyzing the real-time audio volume of the TTS output (see the sketch after this list).
- Modifies CosyVoice API to directly save generated speech files and merge segmented audio for long text synthesis.
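Below is a minimal sketch of the volume-driven lip-sync idea mentioned above, assuming a live2d-py model object; the gain, smoothing factor, and the `ParamMouthOpenY` parameter ID are illustrative assumptions, not the project's exact implementation:

```python
import numpy as np

def update_lip_sync(model, chunk: np.ndarray, prev_value: float,
                    gain: float = 8.0, smoothing: float = 0.5) -> float:
    """Map the RMS volume of the current TTS audio chunk to the mouth-open parameter."""
    rms = float(np.sqrt(np.mean(chunk.astype(np.float64) ** 2)))
    target = min(1.0, rms * gain)                           # clamp to [0, 1]
    value = prev_value + smoothing * (target - prev_value)  # smooth out jitter between frames
    model.SetParameterValue("ParamMouthOpenY", value)       # live2d-py parameter update
    return value
```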
| Voice Input | AI Processing | Live2D Output |
|---|---|---|
| 🎤 You: Hello! | 🤖 AI: Hi there! | 🧑‍🎤 "Hi there!" (Lip sync) |
| 🎤 You: How's the weather? | 🤖 AI: It's a sunny day! | 🧑‍🎤 "It's a sunny day!" (Speech tone variation) |
| Component | Technology |
|---|---|
| ASR (Automatic Speech Recognition) | SenseVoice |
| LLM (Large Language Model) | OpenAI GPT / DeepSeek |
| TTS (Text-to-Speech) | CosyVoice |
| Live2D Animation | live2d-py + OpenGL |
| Configuration Management | Python Config |
This project is developed with Python 3.11, and the following system requirements should be met before running it:
✅ Operating System:
- 🖥 Windows 10/11 or Linux
✅ Python Version:
- 📌 Recommended Python 3.8 or above
⚠️ Note:
The TTS module runs in a conda environment and requires Miniconda to be installed beforehand.
🔗 You can download it from the Miniconda official website.
This project leverages the following open-source libraries and models:
🎙 Automatic Speech Recognition (ASR):
- SenseVoice - High-precision multilingual speech recognition and speech emotion analysis.
- 🔗 GitHub: SenseVoice Repository
🔊 Text-to-Speech (TTS):
- CosyVoice - A powerful generative speech synthesis system, supporting zero-shot voice cloning.
- 🔗 GitHub: CosyVoice Repository
📽 Live2D Animation:
- live2d-py - A tool for directly loading and manipulating Live2D models in Python.
- 🔗 GitHub: live2d-py Repository
```bash
git clone https://github.com/suzuran0y/Live2D-LLM-Chat.git
cd Live2D-LLM-Chat
python -m venv venv
source venv/bin/activate   # Linux/macOS activation
venv\Scripts\activate      # Windows activation
pip install -r requirements.txt
```

🎙 Speech Recognition (ASR) - SenseVoice

This project uses SenseVoice for ASR, supporting high-precision multilingual speech recognition and speech emotion detection.
Install SenseVoice dependencies using pip:

```bash
pip install funasr
```

If you need ONNX or TorchScript inference, install the corresponding versions:

```bash
pip install funasr-onnx    # ONNX version
pip install funasr-torch   # TorchScript version
```

SenseVoice provides several pre-trained models, which can be downloaded via ModelScope:
```python
from modelscope import snapshot_download

# Download the SenseVoice-Small version
snapshot_download('iic/SenseVoiceSmall', local_dir='pretrained_models/SenseVoiceSmall')
# Download the SenseVoice-Large version for higher accuracy
snapshot_download('iic/SenseVoiceLarge', local_dir='pretrained_models/SenseVoiceLarge')
```

🔗 More details: SenseVoice GitHub | ModelScope
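Once a model is downloaded, a quick way to sanity-check it is FunASR's `AutoModel` interface. The snippet below is a minimal sketch based on the public SenseVoice examples; the audio path is a placeholder assumption:

```python
from funasr import AutoModel
from funasr.utils.postprocess_utils import rich_transcription_postprocess

model = AutoModel(model="pretrained_models/SenseVoiceSmall", trust_remote_code=True)
res = model.generate(
    input="ASR_env/input_voice/voice.wav",  # placeholder: any 16 kHz mono WAV file
    language="auto",                        # auto-detect the spoken language
    use_itn=True,                           # inverse text normalization (punctuation, numbers)
)
print(rich_transcription_postprocess(res[0]["text"]))
```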
🔊 Text-to-Speech (TTS) - CosyVoice

This project uses CosyVoice for TTS, supporting multilingual speech synthesis, voice cloning, and cross-lingual synthesis.
Clone the CosyVoice repository:

```bash
git clone --recursive https://github.com/FunAudioLLM/CosyVoice.git
cd CosyVoice
git submodule update --init --recursive
```

Create a Conda virtual environment and install the required dependencies:

```bash
conda create -n cosyvoice -y python=3.10
conda activate cosyvoice
conda install -y -c conda-forge pynini==2.1.5
pip install -r requirements.txt -i https://mirrors.aliyun.com/pypi/simple/ --trusted-host=mirrors.aliyun.com
```

Install SoX (if necessary):
```bash
# Ubuntu
sudo apt-get install sox libsox-dev
# CentOS
sudo yum install sox sox-devel
```

It is recommended to download the following CosyVoice pre-trained models:
```python
from modelscope import snapshot_download

snapshot_download('iic/CosyVoice2-0.5B', local_dir='pretrained_models/CosyVoice2-0.5B')
snapshot_download('iic/CosyVoice-300M', local_dir='pretrained_models/CosyVoice-300M')
snapshot_download('iic/CosyVoice-300M-SFT', local_dir='pretrained_models/CosyVoice-300M-SFT')
snapshot_download('iic/CosyVoice-300M-Instruct', local_dir='pretrained_models/CosyVoice-300M-Instruct')
snapshot_download('iic/CosyVoice-ttsfrd', local_dir='pretrained_models/CosyVoice-ttsfrd')
```

🔗 More details: CosyVoice GitHub | ModelScope
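To verify a downloaded model outside of this project, CosyVoice's own Python API can be used directly. This is a minimal sketch adapted from the public CosyVoice examples; the SFT speaker name '中文女' is one of the bundled voices, and the output path is an assumption:

```python
import torchaudio
from cosyvoice.cli.cosyvoice import CosyVoice

cosyvoice = CosyVoice('pretrained_models/CosyVoice-300M-SFT')
# Synthesize with a built-in SFT voice; the generator may yield several segments.
for i, out in enumerate(cosyvoice.inference_sft('Hello, nice to meet you!', '中文女', stream=False)):
    torchaudio.save(f'sft_{i}.wav', out['tts_speech'], 22050)  # 22.05 kHz output for the 300M models
```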
After installing ASR and TTS models, follow these steps for local configuration:
✅ Replace the SenseVoice Directory
- Move the downloaded SenseVoice folder into `Live2D-LLM-Chat/ASR_env/`, replacing the existing empty folder.

✅ Replace the CosyVoice Directory
- Move the downloaded CosyVoice folder into `Live2D-LLM-Chat/TTS_env/`, replacing the existing empty folder.

✅ Replace the webui.py File
- Move the `TTS_env/webui.py` file into the `CosyVoice` folder, replacing the original `webui.py` file.
Modify `config.py` to adjust local file paths and parameters. Example:

```python
import os

class Config:
    # 🏠 Project Root Directory
    PROJECT_ROOT = "E:/PyCharm/project/project1"

    # 🎙 ASR (Automatic Speech Recognition) Configuration
    ASR_MODEL_DIR = os.path.join(PROJECT_ROOT, "ASR_env/SenseVoice/models/SenseVoiceSmall")
    ASR_AUDIO_INPUT = os.path.join(PROJECT_ROOT, "ASR_env/input_voice/voice.wav")

    # 🔊 TTS (Text-to-Speech) Configuration
    TTS_API_URL = "http://localhost:8000/"
    TTS_OUTPUT_DIR = os.path.join(PROJECT_ROOT, "TTS_env/output_voice/")
```

❗ Ensure all paths are correctly set up before running the project!
Local deployment of the LLM model relies on LM Studio. Follow these steps:
1. Download LM Studio from GitHub or the LM Studio official website.
2. Start LM Studio and obtain the local API URL.
3. Adjust the model path & port number in `config.py`.
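LM Studio exposes an OpenAI-compatible HTTP endpoint (by default at http://localhost:1234/v1), so the same client code can target either local or cloud inference. The sketch below uses the official `openai` Python package; the model identifier is a hypothetical placeholder:

```python
from openai import OpenAI

# Point the OpenAI client at LM Studio's local server instead of the cloud.
client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")  # key is unused locally

response = client.chat.completions.create(
    model="local-model",  # hypothetical: use the identifier shown in LM Studio
    messages=[
        {"role": "system", "content": "You are a friendly Live2D character."},
        {"role": "user", "content": "Hello!"},
    ],
)
print(response.choices[0].message.content)
```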
Before running the main program, start the TTS API:

```bash
python TTS_api.py  # Now integrated into the main program, but can be run separately for debugging.
```

🎯 The TTS API module runs webui.py in the conda environment. Once it has started successfully, you can access the WebUI for voice synthesis management.
🌍 Default address: http://localhost:8000

❗ Ensure the TTS API is running properly, or the program will not be able to generate speech.
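One way a script like TTS_api.py could launch webui.py inside the `cosyvoice` conda environment is via `conda run`; this is a hedged sketch of that approach, not necessarily how the project implements it:

```python
import subprocess

# Launch the CosyVoice WebUI inside the "cosyvoice" conda environment.
# --no-capture-output keeps the server's logs visible in this terminal.
proc = subprocess.Popen([
    "conda", "run", "--no-capture-output", "-n", "cosyvoice",
    "python", "webui.py",
])
# proc.terminate() can be used to shut the server down on exit.
```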
Once the TTS API is started, run the main program:

```bash
python main.py
```

🎙 Interaction Steps:
1️⃣ Press and hold the Ctrl key to start recording; press the Alt key to stop. The recording is automatically converted into text.
2️⃣ The text is processed by the LLM module, which generates a response.
3️⃣ The response text is converted into speech via the TTS module, and the Live2D model syncs its lip movements to the speech.
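A minimal sketch of the push-to-talk recording in step 1️⃣, assuming the third-party `keyboard`, `sounddevice`, and `scipy` packages; the hotkeys mirror the steps above, while the 16 kHz sample rate and helper name are illustrative assumptions:

```python
import keyboard                 # global hotkey detection
import numpy as np
import sounddevice as sd
from scipy.io.wavfile import write

SAMPLE_RATE = 16000  # 16 kHz mono, a common input rate for ASR models

def record_push_to_talk(out_path: str = "ASR_env/input_voice/voice.wav") -> None:
    frames = []
    def on_audio(indata, frame_count, time_info, status):
        frames.append(indata.copy())      # collect each incoming audio buffer
    keyboard.wait("ctrl")                  # block until Ctrl is pressed -> start recording
    with sd.InputStream(samplerate=SAMPLE_RATE, channels=1, callback=on_audio):
        keyboard.wait("alt")               # keep recording until Alt is pressed
    write(out_path, SAMPLE_RATE, np.concatenate(frames))
```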
| Step | Module | Input | Processing | Output |
|---|---|---|---|---|
| 🎤 User Speech | User | Speech Input | User speaks | Audio Signal |
| 🎙 Speech Recognition | ASR (SenseVoice) | Audio Signal | Speech-to-Text (STT) | Recognized Text |
| 🤖 Text Understanding & Generation | LLM (GPT-4 / DeepSeek) | Recognized Text | Semantic Analysis & AI Response Generation | AI-Generated Text |
| 🔊 Speech Synthesis | TTS (CosyVoice) | AI-Generated Text | Text-to-Speech (TTS) | Speech Data |
| 🎭 Live2D Animation | Live2D | Speech Data | Motion Generation | Character Animation |
| 🗣 AI Voice Feedback | User | Character Voice & Actions | User hears AI response | Voice & Visual Interaction |
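The table above corresponds to one turn of the interaction loop. As a hedged sketch, the glue logic could look like the following; the function names are hypothetical, layered over the project's actual ASR.py, LLM.py, TTS.py, and Live2d_animation.py modules:

```python
# Hypothetical one-turn pipeline; the imported function names are illustrative,
# not the project's actual API.
from ASR import transcribe                        # audio file -> recognized text
from LLM import generate_reply                    # text -> AI response text
from TTS import synthesize                        # text -> path of synthesized WAV
from Live2d_animation import play_with_lip_sync   # WAV -> lip-synced playback

def run_one_turn(audio_path: str) -> None:
    text = transcribe(audio_path)     # 🎙 Speech-to-Text
    reply = generate_reply(text)      # 🤖 LLM response generation
    wav_path = synthesize(reply)      # 🔊 Text-to-Speech
    play_with_lip_sync(wav_path)      # 🎭 Live2D animation output
```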
This project follows a modular design, integrating ASR (speech recognition), TTS (text-to-speech), LLM (large language model), and Live2D animation rendering as core functionalities. Below is the complete project structure:
```
Live2D-LLM-Chat/
│── main.py               # 🚀 Main program entry
│── ASR.py                # 🎙 Speech Recognition (ASR) module
│── TTS.py                # 🔊 Speech Synthesis (TTS) module
│── TTS_api.py            # 🌐 TTS API module
│── LLM.py                # 🤖 Large Language Model (LLM) module
│── Live2d_animation.py   # 🎭 Live2D animation management module
│── webui.py              # 🖥 WebUI for voice synthesis
│── config.py             # ⚙️ Configuration file
│── requirements.txt      # 📦 Dependency list
└── README.md             # 📄 Project documentation
```

- 🎯 Core Goals Defined: Developing a Live2D + LLM real-time interaction system.
- 🔍 Technology Research: Investigating ASR (speech recognition), TTS (text-to-speech), and Live2D solutions.
- ✅ Core Components Selected:
- SenseVoice for ASR
- CosyVoice for TTS
- live2d-py for animation rendering
- 🎙 Implemented speech input & recognition (ASR)
- 🤖 Integrated LLM for text generation
- 🔊 Generated speech output & synced Live2D mouth movements
🔹 LLM Module Optimization:
- Due to device limitations, locally deployed models may not match cloud-based models in quality; the LLM processing logic is being improved to enhance stability.
🔹 Refined Output Management:
- Optimizing program logs and output messages to retain only essential information for a cleaner display.
🔹 Enhanced Live2D Interaction:
- Improving Live2D model expressions and movements to make interactions feel more natural and engaging.
🔹 Additional Optimizations:
- 🛠 Improving TTS & ASR efficiency
- 🌍 Expanding multilingual support
- 🔗 Enhancing cloud-based inference capabilities
This project builds upon work from SenseVoice, CosyVoice, and live2d-py, incorporating modifications and optimizations to fit the project’s requirements.
🎉 Special thanks to the original developers!
💡 We welcome contributions and feedback!
📢 If you have suggestions or improvements, please submit a PR (Pull Request) or Issue on GitHub.
This project is licensed under the Apache-2.0 License.
