An xyOps Marketplace Event Plugin that transcribes audio files with OpenAI Whisper via the local, offline whisper.cpp runtime.
This Plugin is designed for workflows and event runs where a single audio file is passed in as job input. It launches a Docker container, runs whisper-cli, streams progress back into xyOps, and returns both structured transcript data and optional transcript files.
- Uses whisper.cpp, not a hosted API
- Runs fully inside Docker on your xyOps worker
- Ships separate prebuilt images for `tiny`, `base`, `small`, `medium`, and `large-v3`
- Builds both `linux/amd64` and `linux/arm64` images with GitHub Actions
- Bakes one Whisper model directly into each image, so jobs do not download models at runtime
- Works with xyRun for remote file download/upload handling inside Docker
- Emits live progress updates back to xyOps
None. This Plugin does not require any API key, token, or secret vault configuration.
This Plugin does not collect analytics, telemetry, or usage metrics.
The actual job transcription runs locally inside your Docker container using the baked-in Whisper model. No audio is sent to OpenAI or any other hosted inference service by this Plugin.
This Plugin processes the first input file only.
The underlying `whisper-cli` example in whisper.cpp documents support for:

- `.mp3`
- `.wav`
- `.ogg`
- `.flac`
The Plugin always returns structured job data, including:
- `transcript`: the plain text transcript
- `segments`: timestamped transcript segments
- `detectedLanguage`: the language detected by Whisper
- `outputs`: any attached file artifacts
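If you consume the structured `segments` array directly instead of the attached subtitle files, you can render SRT text yourself. A minimal Node sketch, assuming the segment shape shown in the example job output later in this README (`start`/`end` as SRT timestamps, `text` as the segment text):

```javascript
// Render SRT subtitle text from the Plugin's structured `segments` output.
// The segment field names are assumed from the example job output in this README.
function segmentsToSrt(segments) {
  return segments
    .map((seg, idx) =>
      `${idx + 1}\n${seg.start} --> ${seg.end}\n${seg.text.trim()}\n`)
    .join('\n');
}

const sample = [
  { start: '00:00:00,000', end: '00:00:03,210', text: ' And so my fellow Americans...' }
];
console.log(segmentsToSrt(sample));
```

Each segment becomes one numbered SRT cue; leading whitespace in `text` (as emitted by whisper-cli) is trimmed.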
Depending on the selected parameters, it can also attach:
- `.txt`
- `.srt`
- `.vtt`
- `.lrc`
- `.json`
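For example, to get a plain-text transcript plus an SRT file attached, the job parameters could look like this (the `model`, `language`, `text`, and `srt` keys appear in the local-run examples later in this README; flags for the other formats are assumed to follow the same pattern):

```json
{
  "model": "base",
  "language": "auto",
  "text": true,
  "srt": true
}
```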
The available image variants are:
- `tiny`
- `base`
- `small`
- `medium`
- `large-v3`
As a rough rule:
- `tiny` is the fastest and lightest
- `base` is a good general default
- `small` and `medium` trade more CPU and RAM for better accuracy
- `large-v3` is the heaviest but usually the most accurate
Example structured job output:
```json
{
  "model": "base",
  "requestedModel": "base",
  "requestedLanguage": "auto",
  "detectedLanguage": "en",
  "translate": false,
  "input": {
    "filename": "meeting.mp3",
    "size": 1234567
  },
  "transcript": "And so my fellow Americans ask not what your country can do for you...",
  "segments": [
    {
      "start": "00:00:00,000",
      "end": "00:00:03,210",
      "start_ms": 0,
      "end_ms": 3210,
      "text": " And so my fellow Americans..."
    }
  ],
  "outputs": [
    { "type": "txt", "filename": "meeting.txt" },
    { "type": "srt", "filename": "meeting.srt" }
  ]
}
```

If you want to run the wrapper directly on your machine instead of inside Docker, first download and build whisper.cpp from upstream:
```sh
curl -L https://github.com/ggml-org/whisper.cpp/archive/refs/heads/master.zip -o /tmp/whisper.cpp.zip
unzip /tmp/whisper.cpp.zip -d /tmp
cd /tmp/whisper.cpp-master
cmake -B build -DBUILD_SHARED_LIBS=OFF -DWHISPER_BUILD_TESTS=OFF -DWHISPER_BUILD_SERVER=OFF
cmake --build build -j --config Release --target whisper-cli
./models/download-ggml-model.sh base ./models
```

From the repo root, point the wrapper at the local CLI and model:
```sh
printf '%s\n' '{"xy":1,"cwd":"'"$PWD"'","params":{"model":"base","language":"auto","text":true,"srt":true},"input":{"files":[{"filename":"/tmp/whisper.cpp-master/samples/jfk.mp3"}]}}' | \
WHISPER_CLI_PATH="/tmp/whisper.cpp-master/build/bin/whisper-cli" \
WHISPER_MODEL=base \
WHISPER_MODEL_PATH="/tmp/whisper.cpp-master/models/ggml-base.bin" \
node index.js
```

The wrapper writes any generated transcript files into `./output/` under the selected working directory.
Example for the base model:
```sh
docker build --build-arg WHISPER_MODEL=base -t xyplug-whisper-base:test .
```

Then do a direct wrapper smoke test inside the container by bypassing xyrun:
```sh
mkdir -p /tmp/xyplug-whisper-test
curl -L https://github.com/ggml-org/whisper.cpp/raw/master/samples/jfk.mp3 -o /tmp/xyplug-whisper-test/sample.mp3
printf '%s\n' '{"xy":1,"cwd":"/work","params":{"model":"base","language":"auto","text":true,"srt":true},"input":{"files":[{"filename":"sample.mp3"}]}}' | \
docker run -i --rm \
  -v /tmp/xyplug-whisper-test:/work \
  --entrypoint node \
  xyplug-whisper-base:test /app/index.js
```

For a full end-to-end xyOps run, keep the default container command so xyrun can handle the real job file downloads and uploads.
The repository includes a workflow at `.github/workflows/docker.yml` that:

- builds one image per model size
- builds `linux/amd64` and `linux/arm64`
- publishes to GHCR on semver tag pushes such as `v1.0.0`
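A matrix build along those three axes might look roughly like the following sketch (illustrative only; the actual workflow lives in `.github/workflows/docker.yml` and may differ):

```yaml
jobs:
  docker:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        model: [tiny, base, small, medium, large-v3]
    steps:
      - uses: actions/checkout@v4
      - uses: docker/setup-qemu-action@v3
      - uses: docker/setup-buildx-action@v3
      - uses: docker/build-push-action@v6
        with:
          platforms: linux/amd64,linux/arm64
          build-args: WHISPER_MODEL=${{ matrix.model }}
          tags: ghcr.io/pixlcore/xyplug-whisper-${{ matrix.model }}:${{ github.ref_name }}
          push: true
```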
This produces images like:
- `ghcr.io/pixlcore/xyplug-whisper-tiny:v1.0.0`
- `ghcr.io/pixlcore/xyplug-whisper-base:v1.0.0`
- `ghcr.io/pixlcore/xyplug-whisper-small:v1.0.0`
- `ghcr.io/pixlcore/xyplug-whisper-medium:v1.0.0`
- `ghcr.io/pixlcore/xyplug-whisper-large-v3:v1.0.0`
MIT
