machine-machine/openclaw-tts-skill

🎀 m2 Voice - TTS for OpenClaw

"Finally, I can speak." - m2

Text-to-Speech service designed for AI agents. OpenAI-compatible API with file output and WebSocket streaming.

Features

  • OpenAI-compatible /v1/audio/speech endpoint
  • File generation with URL return for async workflows
  • WebSocket streaming for real-time audio
  • Multiple backends: Qwen3-TTS (GPU) or Piper (CPU)
  • Coolify-ready Docker Compose

Quick Deploy (Coolify)

  1. Create new project in Coolify
  2. Add Resource → Docker Compose → Git repo
  3. Point to this repo
  4. Enable GPU if using Qwen3-TTS
  5. Set domain: voice.yourdomain.ai

Architecture

┌──────────────────────────────────────────────────────┐
│                   Coolify Proxy                      │
│                (TLS termination)                     │
└──────────────────────┬───────────────────────────────┘
                       │
┌──────────────────────▼───────────────────────────────┐
│              speech-gateway (FastAPI)                │
│                                                      │
│  • /v1/audio/speech  - OpenAI compatible             │
│  • /speak/file       - File generation               │
│  • /ws/tts           - WebSocket streaming           │
│  • /files/{id}       - Serve generated files         │
└──────────────────────┬───────────────────────────────┘
                       │
┌──────────────────────▼───────────────────────────────┐
│              qwen-tts / piper-tts                    │
│                  (TTS Backend)                       │
│                                                      │
│  GPU: Qwen3-TTS-1.7B (expressive, multi-voice)       │
│  CPU: Piper (fast, lightweight)                      │
└──────────────────────────────────────────────────────┘

Usage

Generate Speech (File)

curl -X POST https://voice.machinemachine.ai/speak/file \
  -H "Content-Type: application/json" \
  -d '{"text": "Hello from m2!", "format": "mp3"}'

Response:

{"url": "/files/abc123.mp3", "format": "mp3", "id": "abc123"}
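The `url` field in the response above is a relative path, so a client will usually need to join it with the service address before downloading the file. A minimal sketch, assuming the host from the curl example; `absolute_file_url` is a hypothetical helper for illustration, not part of the service:

```python
from urllib.parse import urljoin

# Assumed base address, taken from the curl example above.
BASE_URL = "https://voice.machinemachine.ai"


def absolute_file_url(response: dict, base_url: str = BASE_URL) -> str:
    """Join the "url" field of a /speak/file response with the base address."""
    return urljoin(base_url, response["url"])


example = {"url": "/files/abc123.mp3", "format": "mp3", "id": "abc123"}
print(absolute_file_url(example))
```

If the service is configured to return absolute links, `urljoin` leaves them unchanged, so the helper is safe to call in both cases.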

OpenAI Compatible

curl -X POST https://voice.machinemachine.ai/v1/audio/speech \
  -H "Content-Type: application/json" \
  -d '{"text": "Hello world", "voice": "default", "format": "mp3"}' \
  --output speech.mp3
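The same call can be made from Python with only the standard library. This is a sketch that mirrors the curl request above field for field; the function names `build_speech_request` and `save_speech` are hypothetical, and the response is assumed to be raw audio bytes, as the `--output` flag suggests:

```python
import json
import urllib.request

# Assumed base address, taken from the curl example above.
BASE_URL = "https://voice.machinemachine.ai"


def build_speech_request(text: str, voice: str = "default", fmt: str = "mp3",
                         base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) the POST request shown in the curl example."""
    body = json.dumps({"text": text, "voice": voice, "format": fmt}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/v1/audio/speech",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def save_speech(text: str, path: str = "speech.mp3") -> None:
    """Send the request and write the returned audio bytes to `path`."""
    with urllib.request.urlopen(build_speech_request(text)) as resp:
        with open(path, "wb") as f:
            f.write(resp.read())
```

Separating request construction from sending keeps the payload easy to inspect without touching the network.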

WebSocket Streaming

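const ws = new WebSocket('wss://voice.machinemachine.ai/ws/tts');
ws.binaryType = 'arraybuffer';
// Send the request only once the connection is open
ws.onopen = () => ws.send(JSON.stringify({text: "Hello!", voice: "default"}));
// Receive binary audio chunks
ws.onmessage = (event) => { /* event.data holds one audio chunk */ };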
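On the receiving side, the binary chunks have to be reassembled in arrival order before they can be played or saved. A minimal sketch of that step in Python, assuming audio arrives as binary frames and that any text frames (for example status messages) can be ignored; both points are assumptions, and `collect_audio` is a hypothetical helper:

```python
from typing import Iterable, Union


def collect_audio(messages: Iterable[Union[bytes, str]]) -> bytes:
    """Concatenate binary frames in arrival order, skipping text frames."""
    buffer = bytearray()
    for message in messages:
        # Only binary frames are treated as audio (assumption).
        if isinstance(message, (bytes, bytearray)):
            buffer.extend(message)
    return bytes(buffer)


# Simulated frames standing in for messages read from /ws/tts.
frames = [b"\x00\x01", "status: ok", b"\x02\x03"]
audio = collect_audio(frames)
```

The function takes any iterable, so it can be fed from whichever WebSocket client library the agent already uses.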

Configuration

GPU Version (Qwen3-TTS)

# Use docker-compose.yml
docker compose up -d

Requires an NVIDIA GPU with CUDA 12.1+.

CPU Version (Piper)

# Use docker-compose.cpu.yml
docker compose -f docker-compose.cpu.yml up -d

Runs on any machine without a GPU. Piper trades expressiveness for faster inference and smaller models.

Environment Variables

| Variable | Default | Description |
|---|---|---|
| TTS_BASE_URL | http://qwen-tts:8000 | Backend TTS service |
| PUBLIC_URL | (empty) | Base URL for file links |
| STORAGE_DIR | /app/output | Where to store audio files |
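To illustrate how these variables and their defaults might be consumed, here is a sketch of a settings loader. It is not the gateway's actual code: the `Settings` class, `load_settings`, and `file_link` are hypothetical, and the way `PUBLIC_URL` is prefixed to file links is an assumption based on its description.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    tts_base_url: str
    public_url: str
    storage_dir: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read the documented variables, falling back to the documented defaults."""
    return Settings(
        tts_base_url=env.get("TTS_BASE_URL", "http://qwen-tts:8000"),
        public_url=env.get("PUBLIC_URL", ""),
        storage_dir=env.get("STORAGE_DIR", "/app/output"),
    )


def file_link(settings: Settings, file_id: str, fmt: str) -> str:
    """Build a link to a generated file (assumed /files/{id}.{format} layout)."""
    return f"{settings.public_url.rstrip('/')}/files/{file_id}.{fmt}"


defaults = load_settings({})
print(file_link(defaults, "abc123", "mp3"))
```

With `PUBLIC_URL` left empty the link stays relative, which matches the `/files/abc123.mp3` form shown in the `/speak/file` response.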

Integration

Python (OpenClaw agents)

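import httpx

async def speak(text: str) -> str:
    """Generate speech for `text` and return the URL of the audio file."""
    async with httpx.AsyncClient() as client:
        resp = await client.post(
            "https://voice.machinemachine.ai/speak/file",
            json={"text": text, "format": "mp3"},
        )
        # Fail loudly on HTTP errors instead of raising a KeyError below
        resp.raise_for_status()
        return resp.json()["url"]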

Telegram Bot

Send the audio URL directly, or download the file and send it as a voice message.
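For the URL-based route, the Telegram Bot API's `sendVoice` method accepts an HTTP URL in its `voice` field. The sketch below only builds the endpoint and payload and does not send anything. The helper name, the bot token, and the chat id are placeholders, and the voice service host is assumed from the examples above. Telegram's accepted audio formats for voice messages should be checked against the Bot API documentation.

```python
from typing import Any, Dict, Tuple
from urllib.parse import urljoin

TELEGRAM_API = "https://api.telegram.org"
# Assumed voice service address, taken from the examples above.
VOICE_BASE_URL = "https://voice.machinemachine.ai"


def build_send_voice_call(bot_token: str, chat_id: int, file_url: str,
                          base_url: str = VOICE_BASE_URL) -> Tuple[str, Dict[str, Any]]:
    """Return the Bot API endpoint and JSON payload for a sendVoice call."""
    endpoint = f"{TELEGRAM_API}/bot{bot_token}/sendVoice"
    # Telegram fetches the file itself, so the link must be absolute.
    payload = {"chat_id": chat_id, "voice": urljoin(base_url, file_url)}
    return endpoint, payload


endpoint, payload = build_send_voice_call("123:ABC", 42, "/files/abc123.mp3")
```

The returned pair can be passed to any HTTP client, for example as `client.post(endpoint, json=payload)` with `httpx`.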

License

MIT


Part of the OpenClaw ecosystem.
