High-Performance Real-time Text-to-Speech Service | OpenAI Realtime API Compatible | Enterprise-Grade Architecture
streamTTS is a high-performance real-time text-to-speech service compatible with the OpenAI Realtime API protocol. It converts standard v1/audio interfaces into real-time WebSocket streams and applies intelligent text preprocessing.
- 🎯 OpenAI Realtime API Compatible - Seamlessly converts standard audio interfaces to streaming over the standard protocol
- 🚀 Enterprise-Grade Performance - High-concurrency, low-latency TTS generation
- 🎙️ Intelligent Speech Synthesis - Customizable language model, voice, and speed
- 🧠 Smart Text Preprocessing - Automatic normalization of numbers, dates, and symbols
- 🛡️ High-Availability Architecture - Automatic reconnection, error recovery, and resource monitoring
git clone https://github.com/your-repo/streamTTS.git
cd streamTTS
go build -o streamTTS main.go && ./streamTTS
# Or start with Docker Compose instead of building from source
docker-compose up -d
Access after launch: http://localhost:8080
WebSocket endpoint: ws://localhost:8080/v1/realtime
const ws = new WebSocket('ws://localhost:8080/v1/realtime');

// Wait for the connection to open before sending; calling send() earlier throws.
ws.onopen = () => {
  // Create session
  ws.send(JSON.stringify({
    type: 'session.create',
    session: {
      model: 'kokoros-pro',
      voice: 'zf_001',
      modalities: ['text', 'audio']
    }
  }));

  // Text-to-speech
  ws.send(JSON.stringify({
    type: 'response.create',
    response: {
      modalities: ['audio'],
      input: 'Hello, this is a test text.'
    }
  }));
};

// Handle audio response
ws.onmessage = (event) => {
  const data = JSON.parse(event.data);
  if (data.type === 'response.audio.delta') {
    playAudio(data.delta); // playAudio is a playback function supplied by your application
  }
};
server:
  host: "127.0.0.1"
  port: 8080
ttsEngine:
  audio:
    sampleRate: 16000
    bitDepth: 16
    channels: 1
  api:
    baseURL: "http://localhost:3000/v1"
    AccessKey: "your_api_key"
Text preprocessing automatically improves speech naturalness and supports:
- Number Conversion - 1.5→one point five | 100→one hundred
- Date Time - 2024-01-01→January first twenty twenty-four
- Symbol Processing - 50%→fifty percent | 25℃→twenty-five degrees Celsius
- Smart Segmentation - Automatic sentence splitting based on punctuation and length
- Mixed Language - Automatic detection of the language context
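As a rough illustration of the number and symbol rules listed above, the following sketch spells out small numbers, percentages, and Celsius temperatures. It is not streamTTS's actual preprocessing code: the function names (`integerToWords`, `numberToWords`, `normalizeText`) are hypothetical, it only handles whitespace-separated tokens with up to three integer digits, and dates and larger numbers pass through unchanged.

```javascript
// Illustrative sketch only; not the preprocessing code shipped with streamTTS.
const ONES = ['zero', 'one', 'two', 'three', 'four', 'five', 'six', 'seven',
  'eight', 'nine', 'ten', 'eleven', 'twelve', 'thirteen', 'fourteen',
  'fifteen', 'sixteen', 'seventeen', 'eighteen', 'nineteen'];
const TENS = ['', '', 'twenty', 'thirty', 'forty', 'fifty', 'sixty',
  'seventy', 'eighty', 'ninety'];

// Spell out an integer in the range 0..999.
function integerToWords(n) {
  if (n < 20) return ONES[n];
  if (n < 100) {
    const rest = n % 10;
    return TENS[Math.floor(n / 10)] + (rest ? '-' + ONES[rest] : '');
  }
  const rest = n % 100;
  return ONES[Math.floor(n / 100)] + ' hundred' +
    (rest ? ' ' + integerToWords(rest) : '');
}

// Spell out a numeric token such as "100" or "1.5".
// Digits after the decimal point are read one by one.
function numberToWords(token) {
  const [whole, fraction] = token.split('.');
  let words = integerToWords(Number(whole));
  if (fraction !== undefined) {
    words += ' point ' + [...fraction].map((d) => ONES[Number(d)]).join(' ');
  }
  return words;
}

// Normalize each whitespace-separated token that is a small number,
// optionally followed by "%" or "℃" and trailing punctuation.
function normalizeText(text) {
  return text.split(/(\s+)/).map((token) => {
    const m = token.match(/^(\d{1,3}(?:\.\d+)?)(%|℃)?([.,!?;:]*)$/);
    if (!m) return token; // words, dates, and larger numbers are left as-is
    const [, num, unit, trailing] = m;
    const unitWords = unit === '%' ? ' percent'
      : unit === '℃' ? ' degrees Celsius' : '';
    return numberToWords(num) + unitWords + trailing;
  }).join('');
}
```

For example, `normalizeText('Humidity is 50% at 25℃')` returns `'Humidity is fifty percent at twenty-five degrees Celsius'`. Working token by token keeps the sketch short; a production normalizer would also need date, ordinal, and large-number rules.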
Configuration Example:
textPreprocessing:
  enabled: true
  normalization:
    enableNumberNormalization: true
    enableDateNormalization: true
    enableSymbolNormalization: true
OpenAI Realtime API Compatible - Standard message types:
- session.create - Create session
- response.create - TTS request
- response.audio.delta - Audio stream data
- error - Error handling
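A client typically dispatches on these message types. The sketch below is one way to do that, under two assumptions that this README does not confirm: that `delta` carries base64-encoded 16-bit little-endian PCM (as in the OpenAI Realtime API), and that error messages carry an `error` field. The audio constants come from the sample configuration (16 kHz, mono); `decodePcm16`, `durationSeconds`, and `handleMessage` are hypothetical helper names.

```javascript
// Audio format taken from the sample configuration (16 kHz, 16-bit, mono).
const SAMPLE_RATE = 16000;
const BYTES_PER_SAMPLE = 2;
const CHANNELS = 1;

// Decode a base64 payload into signed 16-bit little-endian PCM samples.
// Assumption: the server encodes audio deltas this way.
function decodePcm16(base64) {
  const bytes = Uint8Array.from(atob(base64), (c) => c.charCodeAt(0));
  const view = new DataView(bytes.buffer);
  const samples = new Int16Array(Math.floor(bytes.length / BYTES_PER_SAMPLE));
  for (let i = 0; i < samples.length; i++) {
    samples[i] = view.getInt16(i * BYTES_PER_SAMPLE, true); // true = little-endian
  }
  return samples;
}

// Playback length of a chunk of interleaved samples, in seconds.
function durationSeconds(samples) {
  return samples.length / (SAMPLE_RATE * CHANNELS);
}

// Route one parsed server message and report what was done with it.
function handleMessage(message, onAudio) {
  switch (message.type) {
    case 'response.audio.delta': {
      const samples = decodePcm16(message.delta);
      onAudio(samples);
      return { kind: 'audio', seconds: durationSeconds(samples) };
    }
    case 'error':
      // Assumption: error details are in message.error.
      return { kind: 'error', detail: message.error };
    default:
      return { kind: 'ignored', type: message.type };
  }
}
```

In a browser this could be wired up as `ws.onmessage = (e) => handleMessage(JSON.parse(e.data), enqueueForPlayback)`, where `enqueueForPlayback` stands in for your own playback code. With this format, one second of audio is 32,000 bytes before base64 encoding.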
- Web Interface - Visit http://localhost:8080
- Command Line - make client_run
- v0.3 - Intelligent text preprocessing that automatically enhances speech naturalness
- v0.2 - Full OpenAI Realtime API compatibility
- v0.1 - Basic streaming TTS functionality
MIT License - See LICENSE file for details
Issues and Pull Requests are welcome!