OpenAI-compatible speech-to-text API powered by OpenAI Whisper CLI.
docker run -d -p 8090:8000 --name whisper-api ggai/whisper-api

# Build and start
docker compose up -d --build
# View logs
docker compose logs -f

The service will be available at http://localhost:8090.
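Right after `docker compose up`, the container may not be accepting requests yet. The sketch below polls the `/health` endpoint (listed in the endpoint table of this document) until it answers. It assumes that a ready service replies with HTTP 200; the helper names `http_ok` and `wait_until_healthy` are illustrative and not part of the project.

```python
import time
import urllib.error
import urllib.request


def http_ok(url: str, timeout: float = 2.0) -> bool:
    """Return True if a GET to `url` answers with HTTP 200 (assumed to mean 'ready')."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or a non-2xx status: not ready yet.
        return False


def wait_until_healthy(base_url, attempts=30, delay=2.0, probe=http_ok, sleep=time.sleep):
    """Poll `<base_url>/health` until `probe` succeeds or `attempts` run out."""
    url = base_url.rstrip("/") + "/health"
    for attempt in range(attempts):
        if probe(url):
            return True
        if attempt < attempts - 1:
            sleep(delay)
    return False
```

Calling `wait_until_healthy("http://localhost:8090")` returns `True` once the service responds and `False` if it never does within the attempt budget. The `probe` and `sleep` parameters exist only so the loop can be exercised without a running container.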
Set TUNNEL_MODE in docker-compose.yml to expose the service publicly.
- For Tailscale mode, get an auth key from the Tailscale Admin Console (check "Reusable" and enable Funnel)
- Configure in docker-compose.yml:
environment:
  - TUNNEL_MODE=tailscale
  - TS_AUTHKEY=tskey-auth-xxxxx
- First run will print an authorization link in the logs. Open it to enable Funnel on your tailnet (one-time only).
- The public URL will be printed in the logs on subsequent starts.
Cloudflare mode needs no account or token. Just set the mode:
environment:
  - TUNNEL_MODE=cloudflare

A random *.trycloudflare.com URL is generated and printed in the logs on each startup.
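Because the Cloudflare URL changes on every start, it can be handy to pull it out of the log text programmatically. The following is a minimal sketch, assuming the URL appears in the logs in the form `https://<name>.trycloudflare.com`; the exact log line format is not documented here, and `find_tunnel_url` is a hypothetical helper.

```python
import re

# Assumption: the tunnel URL is logged as https://<lowercase-words>.trycloudflare.com
TRYCLOUDFLARE_URL = re.compile(r"https://[a-z0-9-]+\.trycloudflare\.com")


def find_tunnel_url(log_text: str):
    """Return the last trycloudflare.com URL found in `log_text`, or None."""
    matches = TRYCLOUDFLARE_URL.findall(log_text)
    # The last match wins, since older URLs from previous starts may still be in the log.
    return matches[-1] if matches else None
```

Feed it the captured output of `docker compose logs`. Taking the last match is a design choice: earlier URLs from previous starts stop working once a new one is generated.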
To use the service from OpenClaw, replace {WHISPER_URL} with your actual deployment address (e.g. http://192.168.31.105:8090):
openclaw config set tools.media.audio.models '[{"model":"whisper-1","baseUrl":"{WHISPER_URL}","headers":{"Authorization":"Bearer local"}}]'

POST /v1/audio/transcriptions
| Parameter | Type | Required | Description |
|---|---|---|---|
| file | file | Yes | Audio file (wav/mp3/m4a/flac/ogg/webm) |
| model | string | No | Fixed to turbo; any value passed is ignored |
| language | string | No | Language code (e.g. zh/en/ja), auto-detect if omitted |
| prompt | string | No | Context prompt to guide transcription style |
| response_format | string | No | json (default) or verbose_json |
| temperature | string | No | Sampling temperature, default 0 |
| timestamp_granularities | string | No | Set to word to enable word-level timestamps |
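The parameters above are sent as `multipart/form-data`. As a rough illustration of what that looks like from Python without third-party packages, the sketch below encodes the form by hand and posts it with `urllib`. The function names `encode_multipart` and `transcribe` are made up for this example, and the response is assumed to be a JSON body as in the response examples of this document.

```python
import json
import mimetypes
import urllib.request
import uuid
from pathlib import Path


def encode_multipart(fields: dict, filename: str, file_bytes: bytes):
    """Encode text fields plus one `file` part; return (body, content_type)."""
    boundary = uuid.uuid4().hex
    file_type = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    parts = []
    for name, value in fields.items():
        # Plain text fields such as language, prompt, or response_format.
        parts.append(
            f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"\r\n\r\n{value}\r\n'.encode()
        )
    # The audio goes in the part named "file", matching the parameter table.
    parts.append(
        (
            f'--{boundary}\r\nContent-Disposition: form-data; name="file"; '
            f'filename="{filename}"\r\nContent-Type: {file_type}\r\n\r\n'
        ).encode()
        + file_bytes
        + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode())
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def transcribe(base_url: str, audio_path: str, **fields) -> dict:
    """POST an audio file to /v1/audio/transcriptions and return the decoded JSON."""
    path = Path(audio_path)
    body, content_type = encode_multipart(fields, path.name, path.read_bytes())
    request = urllib.request.Request(
        base_url.rstrip("/") + "/v1/audio/transcriptions",
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
    with urllib.request.urlopen(request) as resp:
        return json.load(resp)
```

For example, `transcribe("http://localhost:8090", "audio.wav", language="zh")` would send the same form as a curl call with `-F "file=@audio.wav" -F "language=zh"`.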
Example request:
curl -X POST http://localhost:8090/v1/audio/transcriptions \
  -F "file=@audio.wav" \
  -F "language=zh"

Example response (json):
{
  "id": "whisper-1773844245",
  "object": "whisper.result",
  "created": 1773844245,
  "model": "turbo",
  "text": "Transcribed text"
}

Example response (verbose_json):
{
  "id": "whisper-1773844245",
  "object": "whisper.result",
  "created": 1773844245,
  "model": "turbo",
  "processing_time": 211.5,
  "data": {
    "text": "Transcribed text",
    "segments": [...],
    "language": "zh"
  }
}

| Endpoint | Method | Description |
|---|---|---|
| /v1/models | GET | List available models |
| /health | GET | Health check |
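Note that the two transcription response shapes shown earlier place the transcript differently: `json` has `text` at the top level, while `verbose_json` nests it under `data`. A client that accepts either format could normalise them roughly as follows. This is a sketch based only on the two example responses; `extract_text` and `extract_language` are illustrative names.

```python
def extract_text(response: dict) -> str:
    """Return the transcript from either a json or a verbose_json response."""
    if "text" in response:
        # json format: the transcript is a top-level field.
        return response["text"]
    # verbose_json format: the transcript is nested under "data".
    return response["data"]["text"]


def extract_language(response: dict):
    """Return the language from a verbose_json response, or None for json."""
    return response.get("data", {}).get("language")
```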
Set API_KEY_ENABLED=true in docker-compose.yml to enable API key authentication.
A random API key will be generated on each startup and printed in the logs:
docker compose logs | grep "API key"
Requests must include the key in the Authorization header:
curl -X POST http://localhost:8090/v1/audio/transcriptions \
  -H "Authorization: Bearer <your-api-key>" \
  -F "file=@audio.wav"

The /health endpoint is exempt from authentication.
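The same header can be attached from Python. Below is a small sketch using the standard library; `build_request` is a hypothetical helper, and the key is whatever value was printed in the logs for the current start.

```python
import urllib.request


def build_request(url, api_key=None, data=None, method="GET"):
    """Build a urllib Request, adding a Bearer token only when a key is given."""
    headers = {}
    if api_key:
        headers["Authorization"] = f"Bearer {api_key}"
    return urllib.request.Request(url, data=data, headers=headers, method=method)
```

For instance, `urllib.request.urlopen(build_request("http://localhost:8090/v1/models", api_key="<your-api-key>"))` would list the models with authentication enabled. Since the key is regenerated on each startup, clients should read it at run time rather than hard-code it.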
| Variable | Default | Description |
|---|---|---|
| API_KEY_ENABLED | false | Enable API key authentication |
| TUNNEL_MODE | none | Tunnel mode: none, tailscale, cloudflare |
| TS_AUTHKEY | (empty) | Tailscale auth key (required for tailscale mode) |
| WHISPER_MODEL_DIR | /root/.cache/whisper | Model cache directory |
| DEFAULT_LANGUAGE | (empty) | Default language, auto-detect if empty |
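To make the defaults and the one stated constraint (TS_AUTHKEY is required in tailscale mode) concrete, here is a sketch of how such variables could be read and validated in Python. It is not taken from `app/main.py` or `entrypoint.sh`; the `Settings` class, `load_settings`, and the case-insensitive parsing of `true` are assumptions made for illustration.

```python
import os
from dataclasses import dataclass

VALID_TUNNEL_MODES = ("none", "tailscale", "cloudflare")


@dataclass(frozen=True)
class Settings:
    api_key_enabled: bool
    tunnel_mode: str
    ts_authkey: str
    whisper_model_dir: str
    default_language: str


def load_settings(env=None) -> Settings:
    """Read the documented variables from `env` (defaults to os.environ)."""
    env = os.environ if env is None else env
    settings = Settings(
        # Assumption: only the string "true" (any case) enables authentication.
        api_key_enabled=env.get("API_KEY_ENABLED", "false").strip().lower() == "true",
        tunnel_mode=env.get("TUNNEL_MODE", "none").strip().lower(),
        ts_authkey=env.get("TS_AUTHKEY", ""),
        whisper_model_dir=env.get("WHISPER_MODEL_DIR", "/root/.cache/whisper"),
        default_language=env.get("DEFAULT_LANGUAGE", ""),
    )
    if settings.tunnel_mode not in VALID_TUNNEL_MODES:
        raise ValueError(f"TUNNEL_MODE must be one of {VALID_TUNNEL_MODES}")
    if settings.tunnel_mode == "tailscale" and not settings.ts_authkey:
        raise ValueError("TS_AUTHKEY is required when TUNNEL_MODE=tailscale")
    return settings
```

Validating at load time means a misconfigured tunnel fails immediately with a clear message rather than partway through startup.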
whisper-api/
├── app/
│ └── main.py # FastAPI application
├── Dockerfile
├── docker-compose.yml
├── entrypoint.sh # Startup script with tunnel support
├── requirements.txt
└── README.md