chore(scripts): add segmented-transcript backfill script by ssarunic · Pull Request #122 · ssarunic/thestill

ssarunic · 2026-05-30T18:47:57Z

Follow-up to #120 (which dropped the legacy-blended/shadow transcript tabs). That PR merged the frontend change but not the script that made it possible; this PR adds it.

What

scripts/backfill_segmented_transcripts.py — a one-off, idempotent repair that re-runs the spec #18 segmented cleanup (TranscriptCleaningProcessor) on episodes that have a cleaned Markdown transcript but no AnnotatedTranscript JSON sidecar. This is what backfilled the 5 stragglers that predated the segmented pipeline, bringing coverage to 866/866 so the legacy tab could be removed.
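The selection rule described above (cleaned Markdown present, AnnotatedTranscript JSON sidecar missing) can be sketched roughly as follows. This is not the script's actual code: `EpisodeRow`, its field names, and `select_candidates` are hypothetical stand-ins, since the real episode schema is not shown in this PR.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class EpisodeRow:
    # Hypothetical stand-in for an episode record; field names are assumptions.
    episode_id: str
    cleaned_markdown_path: Optional[str]  # cleaned Markdown transcript, if any
    annotated_json_path: Optional[str]    # AnnotatedTranscript JSON sidecar, if any


def needs_backfill(ep: EpisodeRow) -> bool:
    """True when the episode has a cleaned transcript but no sidecar."""
    return bool(ep.cleaned_markdown_path) and not ep.annotated_json_path


def select_candidates(episodes: List[EpisodeRow]) -> List[EpisodeRow]:
    """Keep only the episodes that the backfill would need to re-clean."""
    return [ep for ep in episodes if needs_backfill(ep)]
```

Under these assumptions, an episode with no cleaned Markdown at all is left alone, because there is nothing for the segmented cleanup to run on.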

Dry-run by default; --apply to write; --episode-id to target specific rows; --force to re-clean episodes that already have a sidecar.
Skips episodes that already have a sidecar unless --force, so re-running is safe and won't re-spend LLM tokens.
Already ran successfully against the production DB (5/5 episodes).
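A minimal sketch of how the flags above could interact, assuming an `argparse` CLI. The flag names come from this PR; everything else (`build_parser`, `plan`, the action labels, and `--episode-id` being repeatable) is an assumption made for illustration, not the script's real structure.

```python
import argparse
from typing import Dict, List, Tuple


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Backfill segmented transcripts (sketch).")
    # Dry-run is the default: nothing is written unless --apply is passed.
    parser.add_argument("--apply", action="store_true", help="write results")
    # Assumption: the flag may be repeated to target several episodes.
    parser.add_argument("--episode-id", action="append", dest="episode_ids",
                        help="limit the run to specific episodes")
    parser.add_argument("--force", action="store_true",
                        help="re-clean episodes that already have a sidecar")
    return parser


def plan(args: argparse.Namespace, has_sidecar: Dict[str, bool]) -> List[Tuple[str, str]]:
    """Decide an action per episode; has_sidecar maps episode id -> sidecar exists."""
    targets = args.episode_ids or sorted(has_sidecar)
    actions = []
    for episode_id in targets:
        if episode_id not in has_sidecar:
            actions.append((episode_id, "not-found"))
        elif has_sidecar[episode_id] and not args.force:
            # Idempotency: an existing sidecar is never re-cleaned without --force.
            actions.append((episode_id, "skip"))
        elif args.apply:
            actions.append((episode_id, "clean"))
        else:
            actions.append((episode_id, "would-clean"))
    return actions
```

Keeping the decision in a pure function like the hypothetical `plan` makes it cheap to check that a re-run would spend no LLM tokens before passing `--apply`.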

Note

Uses print() for CLI output, consistent with the sibling scripts/backfill_feed_holes.py. scripts/ is one-off operational tooling, not application code.

Re-runs the spec #18 segmented cleanup (TranscriptCleaningProcessor) on episodes that have a cleaned Markdown transcript but no AnnotatedTranscript JSON sidecar, so every episode gets a segmented view. This backfilled the 5 stragglers that predated the segmented pipeline (coverage now 866/866), which was the prerequisite for dropping the legacy-blended tab in #120. Idempotent: skips episodes that already have a sidecar unless --force. Dry-run by default; --apply to write; --episode-id to target specific rows.

ssarunic merged commit 2a42f53 into main May 30, 2026
5 checks passed

ssarunic deleted the chore/backfill-segmented-script branch May 30, 2026 19:04

ssarunic mentioned this pull request May 30, 2026

chore(scripts): auto-enqueue entity branch after segmented backfill #124

Merged

ssarunic merged 1 commit into main from chore/backfill-segmented-script