
chore(scripts): add segmented-transcript backfill script #122

Merged: ssarunic merged 1 commit into main from chore/backfill-segmented-script on May 30, 2026


@ssarunic (Owner)

Follow-up to #120 (which dropped the legacy-blended/shadow transcript tabs). That PR merged the frontend change but not the script that made it possible; this PR adds it.

What

scripts/backfill_segmented_transcripts.py is a one-off, idempotent repair script. It re-runs the spec #18 segmented cleanup (TranscriptCleaningProcessor) on episodes that have a cleaned Markdown transcript but no AnnotatedTranscript JSON sidecar. This is the script that backfilled the 5 stragglers predating the segmented pipeline, bringing coverage to 866/866 so the legacy tab could be removed.

  • Dry-run by default; --apply to write; --episode-id to target specific rows; --force to re-clean episodes that already have a sidecar.
  • Skips episodes that already have a sidecar unless --force, so re-running is safe and won't re-spend LLM tokens.
  • Already ran successfully against the production DB (5/5 episodes).
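The selection rules in the bullets above can be sketched in Python roughly as follows. This is a minimal illustration under stated assumptions, not the contents of scripts/backfill_segmented_transcripts.py: the `Episode` record, the `select_targets`/`run_backfill` helpers, the `clean` callback (standing in for the TranscriptCleaningProcessor call), and the repeatable form of `--episode-id` are all hypothetical.

```python
"""Hypothetical sketch of the backfill CLI's selection logic.

Episode, select_targets, run_backfill and the clean callback are illustrative
assumptions, not the real scripts/backfill_segmented_transcripts.py API.
"""
import argparse
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional, Sequence


@dataclass
class Episode:
    # Hypothetical minimal view of an episode row.
    id: int
    has_cleaned_markdown: bool
    has_sidecar: bool  # True if the AnnotatedTranscript JSON sidecar exists


def parse_args(argv: Optional[Sequence[str]] = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Backfill segmented transcripts (sketch).")
    parser.add_argument("--apply", action="store_true",
                        help="write changes; without it the run is a dry run")
    # Assumption: --episode-id may be repeated to target several rows.
    parser.add_argument("--episode-id", type=int, action="append", dest="episode_ids",
                        help="restrict to specific episode ids (repeatable)")
    parser.add_argument("--force", action="store_true",
                        help="re-clean episodes that already have a sidecar")
    return parser.parse_args(argv)


def select_targets(episodes: Iterable[Episode],
                   episode_ids: Optional[List[int]],
                   force: bool) -> List[Episode]:
    targets = []
    for ep in episodes:
        if episode_ids and ep.id not in episode_ids:
            continue  # not one of the requested rows
        if not ep.has_cleaned_markdown:
            continue  # no cleaned Markdown to re-segment
        if ep.has_sidecar and not force:
            continue  # idempotent: skip so re-runs don't re-spend LLM tokens
        targets.append(ep)
    return targets


def run_backfill(episodes: Iterable[Episode], args: argparse.Namespace,
                 clean: Callable[[Episode], None]) -> List[int]:
    targets = select_targets(episodes, args.episode_ids, args.force)
    for ep in targets:
        if args.apply:
            clean(ep)  # hypothetical hook for the segmented cleanup
            print(f"cleaned episode {ep.id}")
        else:
            print(f"[dry-run] would clean episode {ep.id}")
    return [ep.id for ep in targets]
```

Keeping the sidecar check in the selection step, rather than in the write path, means a dry run reports exactly the set of episodes that `--apply` would then process.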

Note

Uses print() for CLI output, consistent with the sibling scripts/backfill_feed_holes.py. scripts/ is one-off operational tooling, not application code.

Re-runs the spec #18 segmented cleanup (TranscriptCleaningProcessor) on
episodes that have a cleaned Markdown transcript but no AnnotatedTranscript
JSON sidecar, so every episode gets a segmented view. This backfilled the
5 stragglers that predated the segmented pipeline (coverage now 866/866),
which was the prerequisite for dropping the legacy-blended tab in #120.

Idempotent: skips episodes that already have a sidecar unless --force.
Dry-run by default; --apply to write; --episode-id to target specific rows.
ssarunic merged commit 2a42f53 into main on May 30, 2026 (5 checks passed).
ssarunic deleted the chore/backfill-segmented-script branch on May 30, 2026 at 19:04.