streamTTS is a high-performance, real-time text-to-speech service compatible with the OpenAI Realtime API protocol. It converts standard v1/audio interfaces into real-time WebSocket streaming and applies intelligent text preprocessing.

go-restream/tts

🚀 streamTTS

High-Performance Real-time Text-to-Speech Service | OpenAI Realtime API Compatible | Enterprise-Grade Architecture

📖 Language / 语言: English | 中文版

✨ Core Features

  • 🎯 OpenAI Realtime API Compatible - Converts the standard audio interface to streaming over the standard protocol
  • 🚀 Enterprise-Grade Performance - Handles many concurrent connections with low-latency TTS generation
  • 🎙️ Intelligent Speech Synthesis - Customizable model, voice, and speaking speed
  • 🧠 Smart Text Preprocessing - Automatic normalization of numbers, dates, and symbols
  • 🛡️ High-Availability Architecture - Automatic reconnection, error recovery, and resource monitoring

🚀 Quick Start

One-Command Launch

git clone https://github.com/your-repo/streamTTS.git
cd streamTTS
go build -o streamTTS main.go && ./streamTTS

Docker Deployment

docker-compose up -d

Once the service is running, it is available at http://localhost:8080

🔌 API Interface

WebSocket Connection

ws://localhost:8080/v1/realtime

JavaScript Quick Example

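const ws = new WebSocket('ws://localhost:8080/v1/realtime');

// Register the handler first so no audio chunk is missed
ws.onmessage = (event) => {
  const data = JSON.parse(event.data);
  if (data.type === 'response.audio.delta') {
    // playAudio is a playback function supplied by your application
    playAudio(data.delta);
  } else if (data.type === 'error') {
    console.error('TTS error:', data);
  }
};

// Send only once the connection is open; send() throws while still connecting
ws.onopen = () => {
  // Create session
  ws.send(JSON.stringify({
    type: 'session.create',
    session: {
      model: 'kokoros-pro',
      voice: 'zf_001',
      modalities: ['text', 'audio']
    }
  }));

  // Text-to-speech
  ws.send(JSON.stringify({
    type: 'response.create',
    response: {
      modalities: ['audio'],
      input: 'Hello, this is a test text.'
    }
  }));
};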
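
For Go clients, the same two requests can be built with encoding/json. This is a minimal sketch, not code from the streamTTS repository: the struct and function names are hypothetical, and the JSON field names are copied from the JavaScript example above. Writing the encoded text to the socket is left to whichever WebSocket library you use.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Session mirrors the session object in the JavaScript example.
type Session struct {
	Model      string   `json:"model"`
	Voice      string   `json:"voice"`
	Modalities []string `json:"modalities"`
}

// SessionCreate is the first message a client sends (hypothetical type name).
type SessionCreate struct {
	Type    string  `json:"type"`
	Session Session `json:"session"`
}

// Response mirrors the response object in the JavaScript example.
type Response struct {
	Modalities []string `json:"modalities"`
	Input      string   `json:"input"`
}

// ResponseCreate asks the server to synthesize Input (hypothetical type name).
type ResponseCreate struct {
	Type     string   `json:"type"`
	Response Response `json:"response"`
}

func newSessionCreate(model, voice string) SessionCreate {
	return SessionCreate{
		Type:    "session.create",
		Session: Session{Model: model, Voice: voice, Modalities: []string{"text", "audio"}},
	}
}

func newResponseCreate(text string) ResponseCreate {
	return ResponseCreate{
		Type:     "response.create",
		Response: Response{Modalities: []string{"audio"}, Input: text},
	}
}

// encode marshals a message to the JSON text that would be sent over the socket.
func encode(msg interface{}) (string, error) {
	b, err := json.Marshal(msg)
	return string(b), err
}

func main() {
	for _, msg := range []interface{}{
		newSessionCreate("kokoros-pro", "zf_001"),
		newResponseCreate("Hello, this is a test text."),
	} {
		text, err := encode(msg)
		if err != nil {
			panic(err)
		}
		fmt.Println(text)
	}
}
```

Typed structs keep the field names in one place, so a typo in a message key shows up at compile time rather than as a silent protocol error.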

Configuration File

server:
  host: "127.0.0.1"
  port: 8080

ttsEngine:
  audio:
    sampleRate: 16000
    bitDepth: 16
    channels: 1
  api:
    baseURL: "http://localhost:3000/v1"
    AccessKey: "your_api_key"
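
The three audio values determine how much data each second of speech occupies, which helps when sizing playback buffers. The sketch below works through that arithmetic for the settings above. It assumes the stream is uncompressed PCM, which this README does not state, and the AudioFormat type is hypothetical.

```go
package main

import (
	"fmt"
	"time"
)

// AudioFormat holds the three audio values from the configuration file.
// Assumption: the audio stream is raw, uncompressed PCM.
type AudioFormat struct {
	SampleRate int // samples per second, per channel
	BitDepth   int // bits per sample
	Channels   int
}

// BytesPerSecond returns the raw PCM data rate.
func (f AudioFormat) BytesPerSecond() int {
	return f.SampleRate * (f.BitDepth / 8) * f.Channels
}

// Duration returns the playback time represented by n bytes of raw PCM.
func (f AudioFormat) Duration(n int) time.Duration {
	return time.Duration(n) * time.Second / time.Duration(f.BytesPerSecond())
}

func main() {
	f := AudioFormat{SampleRate: 16000, BitDepth: 16, Channels: 1}
	fmt.Println(f.BytesPerSecond()) // 32000
	fmt.Println(f.Duration(3200))   // 100ms
}
```

Under these assumptions, 16 kHz, 16-bit mono audio needs 32,000 bytes per second, so a 3,200-byte chunk carries 100 ms of speech.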

🧠 Intelligent Text Preprocessing

Text is normalized automatically before synthesis so that the speech sounds more natural. Supported features:

  • Number Conversion - 1.5 → one point five | 100 → one hundred
  • Date and Time - 2024-01-01 → January first, twenty twenty-four
  • Symbol Processing - 50% → fifty percent | 25℃ → twenty-five degrees Celsius
  • Smart Segmentation - Automatic sentence splitting based on punctuation and length
  • Mixed Language - Automatic detection of the language in mixed-language text
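
To illustrate what number and symbol normalization involves, here is a minimal sketch that spells out whole-number percentages. It is not the streamTTS implementation: the function names are hypothetical, and it only handles integers from 0 to 999 (no decimals, dates, or units).

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

var ones = []string{
	"zero", "one", "two", "three", "four", "five", "six", "seven", "eight", "nine",
	"ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen", "sixteen",
	"seventeen", "eighteen", "nineteen",
}

var tens = []string{
	"", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy", "eighty", "ninety",
}

// numberToWords spells out an integer in the range 0..999.
func numberToWords(n int) string {
	switch {
	case n < 20:
		return ones[n]
	case n < 100:
		if n%10 == 0 {
			return tens[n/10]
		}
		return tens[n/10] + "-" + ones[n%10]
	default:
		words := ones[n/100] + " hundred"
		if n%100 != 0 {
			words += " " + numberToWords(n%100)
		}
		return words
	}
}

// percentRe matches a standalone run of one to three digits followed by '%'.
var percentRe = regexp.MustCompile(`\b\d{1,3}%`)

// normalizePercent rewrites tokens such as "50%" as "fifty percent".
func normalizePercent(text string) string {
	return percentRe.ReplaceAllStringFunc(text, func(m string) string {
		n, err := strconv.Atoi(m[:len(m)-1]) // drop the trailing '%'
		if err != nil {
			return m // leave the token untouched if it cannot be parsed
		}
		return numberToWords(n) + " percent"
	})
}

func main() {
	fmt.Println(numberToWords(100))                 // one hundred
	fmt.Println(normalizePercent("Battery at 50%")) // Battery at fifty percent
}
```

A production normalizer would also need to cover decimals, larger numbers, dates, and units such as ℃, each with language-specific rules.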

Configuration Example:

textPreprocessing:
  enabled: true
  normalization:
    enableNumberNormalization: true
    enableDateNormalization: true
    enableSymbolNormalization: true

📋 Protocol Support

Compatible with the OpenAI Realtime API. Standard message types:

  • session.create - Create session
  • response.create - TTS request
  • response.audio.delta - Audio stream data
  • error - Error handling
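
A client typically dispatches on the type field of each incoming message. The sketch below shows one way to do that in Go. It assumes the delta field carries base64-encoded audio, as in the OpenAI Realtime API, and that error details arrive in an error field. Neither is confirmed by this README, and the type and function names are hypothetical.

```go
package main

import (
	"encoding/base64"
	"encoding/json"
	"fmt"
)

// envelope holds the fields this sketch reads from a server message.
type envelope struct {
	Type  string          `json:"type"`
	Delta string          `json:"delta"` // assumed: base64-encoded audio
	Error json.RawMessage `json:"error"` // assumed: error details, kept as raw JSON
}

// handleMessage returns decoded audio bytes for response.audio.delta,
// a Go error for error messages, and (nil, nil) for any other type.
func handleMessage(raw []byte) ([]byte, error) {
	var msg envelope
	if err := json.Unmarshal(raw, &msg); err != nil {
		return nil, fmt.Errorf("invalid message: %w", err)
	}
	switch msg.Type {
	case "response.audio.delta":
		return base64.StdEncoding.DecodeString(msg.Delta)
	case "error":
		return nil, fmt.Errorf("server error: %s", msg.Error)
	default:
		return nil, nil
	}
}

func main() {
	audio, err := handleMessage([]byte(`{"type":"response.audio.delta","delta":"AAEC"}`))
	fmt.Println(audio, err) // [0 1 2] <nil>
}
```

Unknown message types are ignored rather than treated as failures, so the client keeps working if the server adds new events.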

🛠️ Quick Testing

  1. Web Interface - Visit http://localhost:8080
  2. Command Line - make client_run

📈 Recent Updates

  • v0.3 - Intelligent text preprocessing that automatically improves speech naturalness
  • v0.2 - Full OpenAI Realtime API compatibility
  • v0.1 - Basic streaming TTS functionality

📄 License

MIT License - See LICENSE file for details

🤝 Contributing

Issues and Pull Requests are welcome!
