render: fix progressive A/V (lip-sync) drift across multi-segment concats #62
Open
wdeynes wants to merge 1 commit into
Per-segment video rounds up to whole 24fps frames while AAC audio keeps the raw -t duration (~17-40ms shorter per segment). The -c copy concat packs each stream back-to-back independently, so the mismatch accumulates into progressive audio-early drift: measured -570ms over a 37-segment, 103s timeline via cross-correlation of output vs source audio.

Quantize each segment to whole output frames (-frames:v, vdur=n/fps), force the audio to exactly vdur (atrim + apad), and write sample-exact PCM .mov intermediates, encoding AAC once at the final composite.

After the fix every segment has |a-v| = 0ms and output-vs-source cross-correlation shows 0.0ms lag at every checkpoint.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
1 issue found across 1 file
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="helpers/render.py">
<violation number="1" location="helpers/render.py:531">
P2: Double AAC encoding in loudnorm path: `build_final_composite()` encodes PCM → AAC for the prenorm intermediate, then `apply_loudnorm_two_pass()` re-encodes AAC → AAC. This contradicts the PR goal of a single final AAC encode and wastes CPU while degrading audio quality.</violation>
</file>
- run(["ffmpeg", "-y", "-i", str(base_path), "-c", "copy", str(out_path)], quiet=True)
+ # No filters — copy video, encode the PCM intermediate audio to AAC for mp4
+ run(["ffmpeg", "-y", "-i", str(base_path), "-c:v", "copy",
+ "-c:a", "aac", "-b:a", "192k", "-ar", "48000",
P2: Double AAC encoding in loudnorm path: build_final_composite() encodes PCM → AAC for the prenorm intermediate, then apply_loudnorm_two_pass() re-encodes AAC → AAC. This contradicts the PR goal of a single final AAC encode and wastes CPU while degrading audio quality.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At helpers/render.py, line 531:
<comment>Double AAC encoding in loudnorm path: `build_final_composite()` encodes PCM → AAC for the prenorm intermediate, then `apply_loudnorm_two_pass()` re-encodes AAC → AAC. This contradicts the PR goal of a single final AAC encode and wastes CPU while degrading audio quality.</comment>
<file context>
@@ -508,8 +526,10 @@ def build_final_composite(
- run(["ffmpeg", "-y", "-i", str(base_path), "-c", "copy", str(out_path)], quiet=True)
+ # No filters — copy video, encode the PCM intermediate audio to AAC for mp4
+ run(["ffmpeg", "-y", "-i", str(base_path), "-c:v", "copy",
+ "-c:a", "aac", "-b:a", "192k", "-ar", "48000",
+ "-movflags", "+faststart", str(out_path)], quiet=True)
return
</file context>
Summary
Long EDLs render with progressively worsening lip-sync: the audio runs ahead of the video, getting worse toward the end of the timeline. On a real 37-segment / 103 s edit, audio was −570 ms early by the last segment — blatantly visible.
Root cause
`extract_segment()` writes each segment with `-t <duration>` at CFR `-r 24` + AAC audio:
- video rounds up to whole 24 fps frames
- audio keeps the raw `-t` length (quantized only by AAC's 21.3 ms frame size)

so every segment's audio ends up ~17–40 ms shorter than its video.

`concat_segments()` then concatenates with `-c copy`, and the concat demuxer packs each stream back-to-back independently, so the mismatch accumulates: segment N's audio plays roughly N × 17 ms before its video.

The mismatch was measured on a 37-segment EDL from per-segment `ffprobe` stream durations.

Cross-correlating the output audio against the source audio at each segment's video-timeline position (numpy, 16 kHz mono, ±0.8 s search window) confirms the drift is progressive and audible.
(correlation confidence 0.90–1.00 at every checkpoint)
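To make the accumulation concrete, here is a minimal sketch of the old behaviour described under Root cause. It is a simplified model, not code from `helpers/render.py`: the function names are hypothetical, the 2.77 s segment length is invented for illustration, and audio is assumed to keep the raw `-t` length with AAC frame rounding ignored.

```python
import math

OUTPUT_FPS = 24  # frame rate named in the PR


def old_segment_lengths(duration):
    """Model one segment under the old pipeline: (video_seconds, audio_seconds)."""
    # Video is padded up to the next whole frame.
    video = math.ceil(duration * OUTPUT_FPS) / OUTPUT_FPS
    # Simplification: audio keeps the raw -t length (AAC frame rounding ignored).
    audio = duration
    return video, audio


def drift_at_segment_starts(durations):
    """Audio-minus-video offset in seconds at the start of each segment.

    Negative values mean the audio plays early, because a stream-copy concat
    lays the audio and video streams end to end independently.
    """
    drifts, total = [], 0.0
    for duration in durations:
        drifts.append(total)
        video, audio = old_segment_lengths(duration)
        total += audio - video
    return drifts


# Hypothetical timeline: 37 segments of 2.77 s each.
drifts = drift_at_segment_starts([2.77] * 37)
print(f"last segment starts {drifts[-1] * 1000:.0f} ms off")
```

The first segment is always in sync. Each later segment inherits the sum of every earlier shortfall, which is why the error is small near the start of a timeline and obvious near the end.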
Fix
- Quantize each segment to whole output frames: `n_frames = round(duration × OUTPUT_FPS)`, `vdur = n_frames / OUTPUT_FPS`; cap video with `-frames:v` (the `-t` now overshoots by 0.5 s purely to give the audio filters enough input).
- Force the audio to exactly `vdur` with `atrim=end=vdur,apad=whole_dur=vdur` (the 30 ms fades are unchanged, now timed against `vdur`).
- Write PCM audio (`pcm_s16le`) in `.mov` intermediates instead of AAC mp4 segments: PCM stream durations are sample-accurate, with no encoder priming or frame rounding to survive the concat demuxer.
- `build_final_composite()`'s early-return and filter paths now use `-c:a aac -b:a 192k` instead of `-c copy` / `-c:a copy`. Final deliverables are unchanged (`.mp4`, h264 + AAC, `+faststart`).

After
Notes
- Intermediates are now `clips_*/seg_NN_<src>.mov` and `base*.mov` (PCM-in-mp4 is poorly supported; final outputs are still mp4). PCM audio costs ~11.5 MB/min of intermediate disk, negligible next to the video data.
- `apad` fills audio to `vdur`; video may still come up short, as before.

🤖 Generated with Claude Code
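The quantization and audio-trim steps listed under Fix can be sketched as follows. This is an illustration under stated assumptions rather than the PR's actual code: `quantize_segment`, `audio_filter` and `segment_output_args` are hypothetical names, the 0.5 s overshoot is assumed to be added to `vdur`, and the 30 ms fades are left out.

```python
OUTPUT_FPS = 24  # frame rate named in the PR


def quantize_segment(duration, fps=OUTPUT_FPS):
    """Return (n_frames, vdur): the segment snapped to whole output frames."""
    n_frames = round(duration * fps)
    return n_frames, n_frames / fps


def audio_filter(vdur):
    """Trim audio at vdur, then pad with silence so it lasts exactly vdur."""
    return f"atrim=end={vdur:.6f},apad=whole_dur={vdur:.6f}"


def segment_output_args(duration):
    """Hypothetical helper: ffmpeg output options for one PCM .mov intermediate."""
    n_frames, vdur = quantize_segment(duration)
    return [
        "-t", f"{vdur + 0.5:.6f}",   # assumption: the 0.5 s overshoot is added to vdur
        "-frames:v", str(n_frames),  # video stops after exactly n_frames
        "-af", audio_filter(vdur),   # the PR's 30 ms fades are omitted here
        "-c:a", "pcm_s16le",         # sample-accurate PCM for the intermediate
    ]


print(segment_output_args(2.77))
```

Because both streams are cut to the same `vdur`, a stream-copy concat of such segments has no per-segment mismatch left to accumulate. Note that Python's `round()` rounds exact halves to the nearest even integer.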
Summary by cubic
Fixes progressive lip-sync drift across multi-segment renders. Audio now stays aligned with video across the entire timeline (measured −570 ms -> 0 ms on a 103 s/37-segment edit).
Bug Fixes
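- Quantize each segment to whole frames at `OUTPUT_FPS=24` and cap video with `-frames:v`.
- Force audio to `vdur` exactly using `atrim` + `apad` (30 ms fades now timed to `vdur`).
- Write PCM audio to `.mov` for safe `-c copy` concat; encode AAC only once in the final composite.

Migration
- Intermediates are now `.mov`: `clips_*/seg_*.mov` and `base*.mov`. Update any scripts that referenced `.mp4`.
- Final outputs remain `.mp4` (H.264 + AAC, `+faststart`).

Written for commit f7206d8. Summary will update on new commits.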