Important
This repository has moved to Hangry Labs: https://github.com/hangry-labs/MeloTTS
This old fork is kept as a public redirect/archive for existing links and users. Please use the Hangry Labs repository for issues, discussions, pull requests, releases, and current documentation.
Easy-to-run text-to-speech Docker images with a browser UI and HTTP API included.
This fork is made for ease of use. The aim is that anyone should be able to run text to speech without friction: a person trying it at home, a developer wiring it into an app, or a professional evaluating it for a production environment. Install Docker, run one command from Quick Start, open the local link, and start generating speech.
You get:
- A browser UI for manual text-to-speech generation
- An HTTP API for your own applications and tools
- No manual Python, model, or audio dependency setup
- Full multilingual images and smaller EN-focused images
- Offline-friendly usage: download an image once, keep it, and run it later without relying on live model downloads
Official Docker images are published here: sensejworld/melotts on Docker Hub.
Preview MP3 samples from the full multilingual image:
GitHub does not render embedded audio players directly in README files, so direct MP3 links are also provided below.
| Language | Sample |
|---|---|
| English | Listen to MP3 |
| English v2 | Listen to MP3 |
| English newest | Listen to MP3 |
| Spanish | Listen to MP3 |
| French | Listen to MP3 |
| Chinese | Listen to MP3 |
| Japanese | Listen to MP3 |
| Korean | Listen to MP3 |
Full multilingual build:

    docker run -p 8888:8888 --gpus all sensejworld/melotts:latest

EN-focused build (smaller target image):

    docker run -p 8888:8888 --gpus all sensejworld/melotts:latest_en

Run on a specific GPU (example: GPU index 1):

    docker run -p 8888:8888 --gpus "device=1" sensejworld/melotts:latest

Then open: http://localhost:8888
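
The container may need a moment after `docker run` before the port answers; the changelog in this README mentions transient startup errors such as `Empty reply from server`. The following is a minimal sketch of a readiness wait, not code shipped with the image. The helper names `probe` and `wait_until_ready`, the attempt count, and the delay are illustrative assumptions.

```python
import time
import urllib.error
import urllib.request


def probe(url, timeout=3.0):
    """Send one GET request and return the HTTP status code."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # An HTTP error status still means the server is answering.
        return err.code


def wait_until_ready(check, attempts=30, delay=2.0, sleep=time.sleep):
    """Call check() until it succeeds or the attempts run out.

    Connection-level failures (refused, reset, empty reply, timeout) are
    OSError subclasses in Python; they are treated as "not ready yet".
    Any other exception propagates to the caller.
    """
    for attempt in range(1, attempts + 1):
        try:
            check()
            return True
        except OSError:
            if attempt < attempts:
                sleep(delay)
    return False


# Hypothetical usage against the Quick Start port mapping:
#   wait_until_ready(lambda: probe("http://localhost:8888"))
```

Passing the check as a callable keeps the retry loop independent of the URL, so the same loop can wait on the UI root or on an API route.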

    curl -X POST "http://localhost:8888/tts/convert/tts" \
      -H "Content-Type: application/json" \
      -d '{"text":"Hello world!","language":"EN","speaker_id":"EN-BR"}' \
      -o output.wav

The API remains backward compatible: when `format` is omitted, it returns WAV audio as before.
To request a smaller response, add `format` with one of `mp3`, `flac`, or `ogg`:

    curl -X POST "http://localhost:8888/tts/convert/tts" \
      -H "Content-Type: application/json" \
      -d '{"text":"Hello world!","language":"EN","speaker_id":"EN-BR","format":"mp3"}' \
      -o output.mp3

Available formats are exposed at `GET /tts/formats`.
The web UI defaults to MP3 downloads because it is a more practical size for interactive use.
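
For callers that prefer Python over `curl`, the same requests can be sketched with the standard library alone. This is an illustration, not a client shipped with the project: the function names, the `BASE_URL` constant, and the timeouts are assumptions, and `list_formats` assumes that `/tts/formats` returns a JSON body without assuming its shape.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8888"  # assumption: the Quick Start port mapping


def build_request(text, language="EN", speaker_id="EN-BR", fmt=None, base_url=BASE_URL):
    """Build the POST request for /tts/convert/tts without sending it."""
    payload = {"text": text, "language": language, "speaker_id": speaker_id}
    if fmt is not None:
        # Leaving "format" out keeps the backward-compatible WAV default.
        payload["format"] = fmt
    return urllib.request.Request(
        f"{base_url}/tts/convert/tts",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def synthesize(text, out_path, **kwargs):
    """Send the request and write the returned audio bytes to out_path."""
    with urllib.request.urlopen(build_request(text, **kwargs), timeout=120) as resp:
        audio = resp.read()
    with open(out_path, "wb") as fh:
        fh.write(audio)
    return len(audio)


def list_formats(base_url=BASE_URL):
    """Return the decoded JSON body of GET /tts/formats (assumed to be JSON)."""
    with urllib.request.urlopen(f"{base_url}/tts/formats", timeout=10) as resp:
        return json.load(resp)
```

With a container running, `synthesize("Hello world!", "output.mp3", fmt="mp3")` would mirror the second `curl` call above. Keeping `build_request` separate from the network call makes the payload easy to inspect or test offline.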
This project is an independently maintained fork of the original MeloTTS by Wenliang Zhao, Xumin Yu, and Zengyi Qin. The original work is licensed under the MIT License, and we thank the authors for their excellent research and contributions.
While the original MeloTTS is an impressive research project, this fork focuses on making it simple to run and integrate: Docker image, included UI, and API support out of the box.
License and attribution are preserved in LICENSE. The original MeloTTS copyright remains with MyShell.ai; this fork adds separate Hangry Labs copyright for the Docker packaging, Web UI/API integration, documentation, release tooling, and other modifications.
Offline Mode: Supported when models are baked into the Docker image or mounted through a volume.
If you encounter bugs, have feature requests, or need help using MeloTTS:
- Please open a new GitHub Issue with as much detail as possible
- Include error messages, logs, and reproduction steps if applicable
- For general questions or ideas, you can also use the Discussions tab
- Pinned dependencies for reproducible builds
- Preloaded models for instant offline use (optional)
- GPU acceleration when available
- HTTP API + web UI in one container
- Split image strategy: full multilingual images use the plain version tag; EN-focused images use `*_en`
You can explore all available MeloTTS container images on Docker Hub.
This is useful if you want to:
- Select a specific version of MeloTTS for compatibility
- Check the latest available builds before pulling
- Verify image tags for deployment
Current tag pattern:
- EN-focused image: `latest_en`, `<version>_en`
- Full multilingual image: `latest`, `<version>`
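
The tag pattern above can be expressed as a small helper for deployment scripts. This is a sketch only: the function name `image_ref` is hypothetical, and the version string is passed through unchanged (the release commands in this README use tags such as `v0.0.8`).

```python
REPO = "sensejworld/melotts"


def image_ref(version=None, en_only=False):
    """Return the image reference for a version and image flavor.

    version=None selects the rolling "latest" tag; en_only=True appends
    the "_en" suffix used by the EN-focused images.
    """
    tag = version if version else "latest"
    if en_only:
        tag += "_en"
    return f"{REPO}:{tag}"
```

For example, `image_ref("v0.0.8", en_only=True)` yields `sensejworld/melotts:v0.0.8_en`.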
- Scope: runtime-focused cleanup for the Docker UI/API fork.
- Removed unused upstream training surfaces, including training scripts/modules, training example data, legacy script-style package tests, and original upstream docs that no longer matched this fork.
- Trimmed runtime helper code by reducing `melo/utils.py` to inference text preparation, config loading, and `HParams`.
- Removed stale phonemizer generation artifacts and notebook files that were not read by runtime synthesis.
- Cleaned stale imports, unused locals, and unreachable flow-layer code found by lint checks.
- Improved Taskfile API readiness checks by retrying transient startup errors such as `Empty reply from server`.
- Reworked the UI into a Kokoro-style Gradio layout while keeping MeloTTS language, speaker, preset, and advanced synthesis controls.
- Added text metrics, per-language random quotes, voice inventory, synthesis presets, advanced controls, Gradio audio waveform preview, runtime metadata, favicon/brand icon, and richer API documentation links.
- Added `/tts/status`, `/tts/defaults`, `/tts/voices`, `/tts/metrics`, and `/tts/purge` endpoints for the new UI and companion integrations.
- Added backward-compatible optional API output formats: default WAV plus MP3, FLAC, and Ogg Vorbis via `format`, with discovery at `/tts/formats`.
- Added an output format selector to the Gradio UI; the UI defaults to MP3 while the API remains WAV-by-default for old clients.
- Modernized the runtime dependency stack using `requirements.in` + resolved pins in `requirements.txt`; key validated versions include `gradio==6.14.0`, `fastapi==0.136.1`, `starlette==1.0.0`, `pydantic==2.13.4`, `torch==2.11.0`, `torchaudio==2.11.0`, `transformers==5.8.0`, `numpy==2.2.6`, and `soundfile==0.13.1`.
- Normalized package metadata versioning in `setup.py` so display versions like `v0.0.8-SNAPSHOT` install as valid Python package versions such as `0.0.8.dev0`.
- Added `task release` backed by the root snapshot `VERSION` file, and corrected Docker release tags so the full image publishes as `<version>` while the EN-focused image publishes as `<version>_en`.
- Expanded rapid local iteration tasks so `task localrun`, `task localdev`, and `task localapi` bind-mount `melo/app.py`.
- Documentation: corrected API examples to use `/tts/convert/tts` JSON payloads and documented the current runtime-only scope.
- Run with:
  - `docker run -p 8888:8888 --gpus all sensejworld/melotts:v0.0.8_en`
  - `docker run -p 8888:8888 --gpus all sensejworld/melotts:v0.0.8`
  - `docker run -p 8888:8888 --gpus "device=1" sensejworld/melotts:v0.0.8_en`
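
The version normalization mentioned in the list above (a display version such as `v0.0.8-SNAPSHOT` installing as `0.0.8.dev0`) could look roughly like this. It is a sketch of the idea, not the code in this fork's `setup.py`; the function name and the accepted input pattern are assumptions.

```python
import re

# Optional leading "v", dotted numeric release, optional "-SNAPSHOT" suffix.
_DISPLAY_VERSION = re.compile(r"^v?(\d+(?:\.\d+)*)(-SNAPSHOT)?$")


def normalize_version(display):
    """Map a display version to a PEP 440 style package version."""
    match = _DISPLAY_VERSION.match(display.strip())
    if match is None:
        raise ValueError(f"unrecognized version string: {display!r}")
    release, snapshot = match.groups()
    # Snapshots become development releases; tagged versions pass through.
    return f"{release}.dev0" if snapshot else release
```

Rejecting unknown shapes with `ValueError` is a design choice for the sketch: a build that silently invents a version is harder to debug than one that fails early.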
- Upgraded Docker runtime/build baseline to Python 3.10 (`python:3.10-slim`) and aligned packaging with `python_requires>=3.10`.
- Reworked app versioning/build metadata:
  - Root `VERSION` file is now the single version source of truth.
  - Build metadata is generated at image build time (no hardcoded `BUILD_ID`) and exposed in UI/API.
- Upgraded web stack to newer compatible releases: `gradio==4.44.1`, `gradio-client==1.3.0`, `fastapi==0.115.12`, `starlette==0.46.2`, `typer==0.12.5`.
- Applied large dependency/security refresh with pinned versions for reproducible builds, including network/security-sensitive packages such as `requests==2.32.4`, `urllib3==2.3.0`, `certifi==2025.6.15`, plus broad runtime library updates.
- Added/kept compatibility guardrails for stability:
  - `markupsafe` remains on 2.x for Gradio compatibility.
  - `huggingface-hub==0.21.4` and `filelock==3.13.1` remain constrained by `cached-path==1.6.2`.
- Improved offline reliability and startup resilience:
  - Build-time preload profiles (`EN_ONLY` / `FULL`) with retry + strict/non-strict controls.
  - NLTK resources required for EN synthesis (including `averaged_perceptron_tagger_eng` and `cmudict`) are preloaded during image build for offline-ready runs.
- Fixed Gradio 4.x UI regressions after upgrades (language/speaker loading + synth output compatibility) while keeping API behavior stable.
- Split Docker release flow into EN and FULL image tracks/workflows (`<version>_en`, `<version>`) to improve build/release flexibility.
- Run with (see https://hub.docker.com/r/sensejworld/melotts):
  - `docker run -p 8888:8888 --gpus all sensejworld/melotts:v0.0.7_en`
  - `docker run -p 8888:8888 --gpus all sensejworld/melotts:v0.0.7`
  - `docker run -p 8888:8888 --gpus "device=1" sensejworld/melotts:v0.0.7_en`
- Model loading is now much faster (from ~30 seconds down to only a few seconds in testing).
- Added working RTX 50-series (`sm_120`) support in the Docker setup.
- Added GPU selection support for Docker runs, so you can choose which GPU to use.
- Improved build resilience for model preloading during Docker image creation.
- Run with: `docker run -p 8888:8888 --gpus all sensejworld/melotts:v0.0.6`
- Added more English model options (including V2 and V3 variants).
- Added UI tabs for `UI Playground` and `API Docs`.
- Added build/version badge in UI (top-right) via `APP_VERSION` and `BUILD_ID`.
- Added memory management in UI (`Purge others`) to release non-selected language models.
- Improved API documentation visibility directly inside the app (`/` -> API Docs tab + `/tts/docs`).
- Updated release planning: V2/V3 scope completed; deferred separate base-repo split plan.
- Run with: `docker run -p 8888:8888 --gpus all sensejworld/melotts:v0.0.5`
- Dependency updates for improved performance and stability.
- Full offline support: all required models are now baked into the image.
- Model overwrite option: set `MELOTTTS_MODELS` to point to your custom model folder.
- Smaller image size via optimized multi-stage Docker build.
- Run with: `docker run -p 8888:8888 --gpus all sensejworld/melotts:v0.0.4`
- Optimized the Docker build to use layer caching, so rebuilds after the initial build are fast
- Expanded ping to include version and build
- Expanded UI with sdp_ratio, noise_scale and noise_scale_w
- Expanded API with sdp_ratio, noise_scale and noise_scale_w
- Corrected faulty version dates
- Updated documentation
- Run with: `docker run -p 8888:8888 --gpus all sensejworld/melotts:v0.0.3`
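
The three synthesis controls named above (`sdp_ratio`, `noise_scale`, `noise_scale_w`) can be added to a request body as optional fields. The sketch below only builds the JSON payload. It assumes the API accepts these names as top-level JSON keys next to `text`, `language`, and `speaker_id`, which this README does not spell out; the helper name `build_payload` is hypothetical, and any numeric values passed in are the caller's choice rather than documented defaults.

```python
import json

# Control names taken from the changelog entry; their placement as
# top-level JSON keys is an assumption.
ADVANCED_CONTROLS = {"sdp_ratio", "noise_scale", "noise_scale_w"}


def build_payload(text, language="EN", speaker_id="EN-BR", **controls):
    """Return the request body as a dict, with optional synthesis controls."""
    unknown = set(controls) - ADVANCED_CONTROLS
    if unknown:
        raise ValueError(f"unsupported controls: {sorted(unknown)}")
    payload = {"text": text, "language": language, "speaker_id": speaker_id}
    # Only controls the caller actually set are sent; the rest stay at
    # whatever defaults the server applies.
    payload.update({name: float(value) for name, value in controls.items()})
    return payload


def to_json(payload):
    """Serialize the payload for use as an HTTP request body."""
    return json.dumps(payload)
```

Sending only the controls that were set keeps older request shapes valid, which matches the backward-compatible stance described for the API elsewhere in this README.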
- Enabled API calls together with the UI
- Run with: `docker run -p 8888:8888 --gpus all sensejworld/melotts:v0.0.2`
- Run for English only: `docker run -p 8888:8888 -e TTS_LANGUAGES=EN sensejworld/melotts:v0.0.2`
- Run for English and Japanese: `docker run -p 8888:8888 -e TTS_LANGUAGES=EN,JP sensejworld/melotts:v0.0.2`
- Run for English with GPU support in a container named `melotts_gpu_en`: `docker run -p 8888:8888 --gpus all -e TTS_LANGUAGES=EN --name melotts_gpu_en sensejworld/melotts:v0.0.2`
- Initial release
- Basic TTS functionality
- Support for English (Default, US, BR, India, AU)
- Docker support for both CPU and GPU
- Web interface on port 8888 (http://localhost:8888/)
- Run with: `docker pull sensejworld/melotts:v0.0.1`
If you're interested in building MeloTTS locally, testing changes, or working directly on the codebase, I have included additional technical details and tips in notes.md.
This file contains guidance for:
- Local environment setup
- Dependency management
- Testing workflows
- Build & Docker optimization notes
This fork is licensed under the MIT License.
Original work by Wenliang Zhao, Xumin Yu, and Zengyi Qin in MeloTTS.
