"Finally, I can speak." β m2
Text-to-Speech service designed for AI agents. OpenAI-compatible API with file output and WebSocket streaming.
- OpenAI-compatible `/v1/audio/speech` endpoint
- File generation with URL return for async workflows
- WebSocket streaming for real-time audio
- Multiple backends: Qwen3-TTS (GPU) or Piper (CPU)
- Coolify-ready Docker Compose

Deploy on Coolify:

- Create a new project in Coolify
- Add Resource → Docker Compose → Git repo
- Point it to this repo
- Enable GPU if using Qwen3-TTS
- Set the domain: `voice.yourdomain.ai`

    ┌─────────────────────────────────────────────────────┐
    │                    Coolify Proxy                    │
    │                  (TLS termination)                  │
    └──────────────────────┬──────────────────────────────┘
                           │
    ┌──────────────────────▼──────────────────────────────┐
    │              speech-gateway (FastAPI)               │
    │                                                     │
    │  • /v1/audio/speech  - OpenAI compatible            │
    │  • /speak/file       - File generation              │
    │  • /ws/tts           - WebSocket streaming          │
    │  • /files/{id}       - Serve generated files        │
    └──────────────────────┬──────────────────────────────┘
                           │
    ┌──────────────────────▼──────────────────────────────┐
    │                qwen-tts / piper-tts                 │
    │                    (TTS Backend)                    │
    │                                                     │
    │  GPU: Qwen3-TTS-1.7B (expressive, multi-voice)      │
    │  CPU: Piper (fast, lightweight)                     │
    └─────────────────────────────────────────────────────┘

File generation:

    curl -X POST https://voice.machinemachine.ai/speak/file \
      -H "Content-Type: application/json" \
      -d '{"text": "Hello from m2!", "format": "mp3"}'

Response:

    {"url": "/files/abc123.mp3", "format": "mp3", "id": "abc123"}

OpenAI-compatible endpoint:

    curl -X POST https://voice.machinemachine.ai/v1/audio/speech \
      -H "Content-Type: application/json" \
      -d '{"text": "Hello world", "voice": "default", "format": "mp3"}' \
      --output speech.mp3

WebSocket streaming:

    const ws = new WebSocket('wss://voice.machinemachine.ai/ws/tts');
    ws.binaryType = 'arraybuffer';
    // Send only once the connection is open; calling send() earlier throws.
    ws.onopen = () => ws.send(JSON.stringify({text: "Hello!", voice: "default"}));
    // Receive binary audio chunks
    ws.onmessage = (event) => { /* event.data holds one audio chunk */ };

GPU deployment (Qwen3-TTS):

    # Use docker-compose.yml
    docker compose up -d

Requires an NVIDIA GPU with CUDA 12.1+.

CPU deployment (Piper):

    # Use docker-compose.cpu.yml
    docker compose -f docker-compose.cpu.yml up -d

Works on any machine, with faster inference and smaller models.

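The choice between the two stacks can be scripted. The following is a minimal Python sketch, not part of this repo: `compose_command` and `detect_gpu` are hypothetical helper names, and treating a successful `nvidia-smi` run as "GPU present" is an assumption. It does not check the CUDA 12.1+ requirement.

```python
import shutil
import subprocess


def compose_command(has_gpu: bool) -> list[str]:
    """Return the docker compose invocation for the GPU or CPU stack."""
    if has_gpu:
        # The default docker-compose.yml runs the Qwen3-TTS backend.
        return ["docker", "compose", "up", "-d"]
    # docker-compose.cpu.yml runs the Piper backend.
    return ["docker", "compose", "-f", "docker-compose.cpu.yml", "up", "-d"]


def detect_gpu() -> bool:
    """Heuristic (assumption): a working nvidia-smi means an NVIDIA GPU is usable."""
    if shutil.which("nvidia-smi") is None:
        return False
    result = subprocess.run(["nvidia-smi"], capture_output=True)
    return result.returncode == 0
```

Calling `subprocess.run(compose_command(detect_gpu()), check=True)` from the repo root would then start whichever stack fits the host.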
| Variable | Default | Description |
|---|---|---|
| `TTS_BASE_URL` | `http://qwen-tts:8000` | Backend TTS service |
| `PUBLIC_URL` | (empty) | Base URL for file links |
| `STORAGE_DIR` | `/app/output` | Where to store audio files |
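
As an illustration of how these variables might be consumed, here is a hedged Python sketch. `Settings`, `load_settings`, and `file_link` are hypothetical names, not the gateway's actual code, and prefixing `PUBLIC_URL` to the file path is an assumption that matches the relative `/files/abc123.mp3` link in the sample response.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    tts_base_url: str
    public_url: str
    storage_dir: str


def load_settings(env=None) -> Settings:
    """Read the three variables, falling back to the documented defaults."""
    env = os.environ if env is None else env
    return Settings(
        tts_base_url=env.get("TTS_BASE_URL", "http://qwen-tts:8000"),
        public_url=env.get("PUBLIC_URL", ""),
        storage_dir=env.get("STORAGE_DIR", "/app/output"),
    )


def file_link(settings: Settings, file_id: str, fmt: str) -> str:
    """Build the link for a generated file (hypothetical helper).

    An empty PUBLIC_URL yields a relative path; otherwise the path is
    appended to the configured base URL.
    """
    return f"{settings.public_url.rstrip('/')}/files/{file_id}.{fmt}"
```

Under these assumptions, leaving `PUBLIC_URL` empty produces relative links, while setting it to the public domain produces absolute links that can be handed to other services as-is.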
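
    import httpx

    async def speak(text: str) -> str:
        """Generate speech via /speak/file and return the audio file URL."""
        async with httpx.AsyncClient() as client:
            resp = await client.post(
                "https://voice.machinemachine.ai/speak/file",
                json={"text": text, "format": "mp3"},
            )
            resp.raise_for_status()  # surface HTTP errors instead of a KeyError
            return resp.json()["url"]

Send the audio URL directly, or download the file and send it as a voice message.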
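
Because the sample response returns a relative link (`/files/abc123.mp3`), a client may need to resolve it against the service host before sharing or downloading it. A minimal sketch using only the standard library follows; `absolute_audio_url` and `download_audio` are hypothetical helpers, and the base URL is simply the host used in the examples above.

```python
import urllib.request
from urllib.parse import urljoin

# Host taken from the curl examples; replace with your own deployment.
BASE_URL = "https://voice.machinemachine.ai"


def absolute_audio_url(url: str, base_url: str = BASE_URL) -> str:
    """Resolve a relative link like /files/abc123.mp3 into a full URL.

    Links that are already absolute are returned unchanged.
    """
    return urljoin(base_url + "/", url)


def download_audio(url: str, dest: str) -> str:
    """Fetch the audio to dest and return the path (performs network I/O)."""
    full_url = absolute_audio_url(url)
    with urllib.request.urlopen(full_url) as resp, open(dest, "wb") as out:
        out.write(resp.read())
    return dest
```

An agent could pass `absolute_audio_url(await speak("Hello"))` to a chat API that accepts audio URLs, or call `download_audio` first when the target platform expects an uploaded voice message.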
License: MIT
Part of the OpenClaw ecosystem.