From e88fce10f205d0acc521c7388153c33ee1bca345 Mon Sep 17 00:00:00 2001
From: Lawrence Mckinney
Date: Tue, 26 May 2026 23:55:00 -0500
Subject: [PATCH] feat(transcribe): --prompt-file flag for Scribe keyterms vocabulary bias
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Pure post-process fuzzy correction can't recover phonetic or acronym
leaps where the raw transcript shares no characters with the canonical
phrase. The fix has to happen at the engine's input.

ElevenLabs Scribe supports `keyterms`, an array of phrases that biases
the decoder toward the supplied vocabulary. This change exposes that as
a `--prompt-file` CLI flag. The file holds one phrase per line; lines
starting with `#` are comments and blank lines are ignored. A missing
file produces a warning and is skipped, so callers without a vocabulary
still transcribe. Oversize phrases (>50 chars or >5 words) are filtered
out to satisfy Scribe's per-keyterm limits, and the total list is
truncated to the 1000-keyterm cap.

Behaviour is engine-agnostic at the CLI surface: the same file format
can also feed OpenAI Whisper's `initial_prompt` if the helper is ever
swapped.

Motivating incident: IUIC "show MMC" mistranscription (2026-05-26). On
clean speaker audio, the canonical IUIC salutation

    "shalom Most High in Christ Bless"

transcribed as

    "show MMC in Christ Bless"

"Most High" -> "MMC" has near-zero string similarity, so no downstream
fuzzy matcher can recover it. Passing the IUIC vocabulary as `keyterms`
gives Scribe the prior it needs.

Tests bootstrap a `tests/` tree (none existed) with a pytest suite:

- parser strips comments and blank lines
- three-term fixture round-trips into the mocked Scribe call
- missing file warns cleanly
- oversize phrases get filtered
- empty keyterms omit the `keyterms` field (no 20% surcharge)
- `--help` surfaces the flag
- a real-execution slice, gated on IUIC_RUN_REAL_SCRIBE=1, synthesises
  an IUIC salutation via macOS `say` and verifies that Scribe lands on
  the canonical form
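
For reviewers, the prompt-file rules described above can be sketched
roughly as follows. This is a minimal illustration, not the code in
`helpers/transcribe.py`: the names `load_keyterms` and
`build_request_fields` are hypothetical, and it assumes that `#` only
marks a comment when it starts the (stripped) line.

```python
import sys
from pathlib import Path

MAX_CHARS = 50        # per-keyterm character limit cited in the commit message
MAX_WORDS = 5         # per-keyterm word limit cited in the commit message
MAX_KEYTERMS = 1000   # total keyterm cap cited in the commit message


def load_keyterms(path):
    """Hypothetical helper: read a prompt file into a list of keyterms."""
    prompt_file = Path(path)
    if not prompt_file.is_file():
        # Missing file: warn and carry on with no vocabulary.
        print(f"warning: prompt file not found, skipping: {path}", file=sys.stderr)
        return []

    keyterms = []
    for raw in prompt_file.read_text(encoding="utf-8").splitlines():
        phrase = raw.strip()
        # Skip blank lines and full-line comments (assumed comment rule).
        if not phrase or phrase.startswith("#"):
            continue
        # Drop phrases that exceed the per-keyterm limits.
        if len(phrase) > MAX_CHARS or len(phrase.split()) > MAX_WORDS:
            continue
        keyterms.append(phrase)

    # Truncate to the overall cap.
    return keyterms[:MAX_KEYTERMS]


def build_request_fields(keyterms):
    """Hypothetical helper: include `keyterms` only when the list is non-empty."""
    fields = {}
    if keyterms:
        fields["keyterms"] = keyterms
    return fields
```

Leaving the `keyterms` field out entirely when the list is empty mirrors
the "empty keyterms omit the field" test case listed above.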

Co-Authored-By: Claude Opus 4.7 (1M context)
---
 SKILL.md                             |   2 +-
 helpers/transcribe.py                |  83 ++++++++++-
 tests/__init__.py                    |   0
 tests/conftest.py                    |  14 ++
 tests/test_transcribe_prompt_file.py | 205 +++++++++++++++++++++++++++
 5 files changed, 302 insertions(+), 2 deletions(-)
 create mode 100644 tests/__init__.py
 create mode 100644 tests/conftest.py
 create mode 100644 tests/test_transcribe_prompt_file.py

diff --git a/SKILL.md b/SKILL.md
index fa4b776..9047c4f 100644
--- a/SKILL.md
+++ b/SKILL.md
@@ -71,7 +71,7 @@ Helpers (`helpers/transcribe.py`, `helpers/render.py`, etc.) live alongside this
 ## Helpers
-- **`transcribe.py