[security] fix(podcast): cap remote media file downloads#237
[security] fix(podcast): cap remote media file downloads#237Hinotoi-agent wants to merge 4 commits into
Conversation
|
Codex review: needs maintainer review before merge. Reviewed June 4, 2026, 11:08 PM ET / 03:08 UTC. Summary Reproducibility: yes. from source inspection and the PR tests: current main checks only HEAD metadata before the temp-file GET stream, while the PR exercises under-reported and oversized streams. I did not execute tests because the review contract required a read-only checkout. Review metrics: 3 noteworthy metrics.
Merge readiness Overall follows the weaker of proof and patch quality, so missing proof can cap an otherwise strong patch. Rank-up moves:
Risk before merge
Maintainer options:
Next step before merge
Security Review detailsBest possible solution: Land the hard cap after a maintainer explicitly accepts the documented 512 MB fail-closed upgrade behavior and the finite opt-in path for larger legitimate media. Do we have a high-confidence way to reproduce the issue? Yes from source inspection and the PR tests: current main checks only HEAD metadata before the temp-file GET stream, while the PR exercises under-reported and oversized streams. I did not execute tests because the review contract required a read-only checkout. Is this the best way to solve the issue? Yes, with a maintainer decision on compatibility: enforcing the cap at the write sink is the right security boundary, and the finite opt-in is a narrow way to support larger legitimate files without restoring unbounded downloads. AGENTS.md: found and applied where relevant. Codex review notes: model gpt-5.5, reasoning high; reviewed against 821e76613ded. Label changesLabel changes:
Label justifications:
Evidence reviewedWhat I checked:
Likely related people:
What the crustacean ranks mean
Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics. How this review workflow works
|
|
@clawsweeper re-review I pushed follow-up commit
All completed successfully locally; only the repo's existing Node engine warning appeared because this machine is on Node v22.22.2 while |
|
🦞🧹 I asked ClawSweeper to review this item again. Re-review progress:
|
Keep the default remote podcast/media download cap fail-closed while allowing operators to configure a larger finite byte limit explicitly. Invalid or unsafe override values are ignored so the built-in cap remains active, and the selected cap is used for both content-length preflight checks and streaming enforcement.
|
@clawsweeper re-review I pushed follow-up commit
Current local validation passed:
|
|
🦞🧹 I asked ClawSweeper to review this item again. Re-review progress:
|
|
@clawsweeper re-review I pushed follow-up commit
Current local validation passed:
|
|
🦞🧹 I asked ClawSweeper to review this item again. Re-review progress:
|
Summary
This PR hardens remote podcast/media transcription downloads so the remote media size limit is enforced during the actual GET stream, not only during the optional HEAD preflight.
The default remains fail-closed at 512 MB. For operators who need larger legitimate remote podcast/media files and accept the disk/DoS tradeoff, this update also adds an explicit finite opt-in byte limit via
SUMMARIZE_REMOTE_MEDIA_MAX_BYTES. Override values must be positive integer byte counts; fractional, sub-byte, or otherwise unsafe values are ignored so the built-in cap remains active.Content-Lengthonce the streamed bytes exceed the cap.Content-Lengthchecks and streaming temp-file enforcement.Security issues covered
Before this PR
transcribeMediaUrl()rejected media only when a successful HEAD response reportedcontent-length > MAX_REMOTE_MEDIA_BYTES.Content-Length, or reported a smaller value than the GET body, the ffmpeg temp-file path streamed the full response to/tmp/summarize-podcast-*.bin.After this PR
transcribeMediaUrl()resolves an effective cap fromSUMMARIZE_REMOTE_MEDIA_MAX_BYTESonly when the value is a finite positive integer byte count.Infinity, fractional values, or sub-byte values are treated as unset and fall back to the built-in cap.Content-Lengthpreflight rejection and the GET temp-file write path.downloadToFile()checks the next byte count before every file write and throwsRemote media too largeinstead of writing beyond the configured cap.arrayBuffer()fallback also rejects oversized responses before writing a file.Why this matters
Remote media URLs can be controlled by a feed publisher or by whoever supplies a media URL to the CLI. A size check that only trusts HEAD metadata is not enough because servers can omit HEAD support, omit
Content-Length, use chunked transfer, or advertise a smaller size than the body returned by GET.Silently allowing unknown-size or under-reported responses to stream without a hard byte ceiling would keep the original host-disk exhaustion risk. The explicit finite opt-in gives operators a compatibility path for larger files without returning to unbounded temp-file writes by default.
Attack flow
Affected code
packages/core/src/content/transcript/providers/podcast/media.ts,packages/core/src/content/transcript/transcription-config.ts,tests/security.remote-media-file-download-cap.test.ts,docs/media.mdRoot cause
Remote media temp-file download cap:
CVSS assessment
CVSS:3.1/AV:N/AC:L/PR:N/UI:R/S:U/C:N/I:N/A:LRationale:
Safe reproduction steps
downloadToFile()with a small testmaxBytesvalue.Remote media too largebefore writing past the cap.The regression tests use this pattern safely with local in-memory streams and temporary files.
Real behavior proof
I also ran real HTTP download proofs against local loopback servers. The URLs/ports are redacted because the servers were local and ephemeral.
Original fail-closed proof with omitted
Content-Lengthand production default cap:Redacted terminal output:
Follow-up proof for the explicit finite opt-in path:
Redacted terminal output:
Together, these proofs exercise the actual HTTP GET streaming path and show both the secure default and the explicit bounded compatibility override.
Expected vulnerable behavior
downloadToFile()resolved after writingmaxBytes + 1bytes.downloadToFile()resolved after writingmaxBytes + 1bytes.Remote media too largebefore writing beyond the selected cap.Changes in this PR
maxBytesguard todownloadToFile().SUMMARIZE_REMOTE_MEDIA_MAX_BYTESparsing through transcription config.Content-Lengthchecks and GET stream enforcement.docs/media.mdwith the 512 MB default, explicit opt-in variable, and invalid-value fallback.Files changed
packages/core/src/content/transcript/providers/podcast/media.tspackages/core/src/content/transcript/transcription-config.tsSUMMARIZE_REMOTE_MEDIA_MAX_BYTESand finite positive byte normalizationtests/security.remote-media-file-download-cap.test.tsdocs/media.mdMaintainer impact
Fix rationale
nextDownloadedBytesbefore each write prevents the temp file from growing past the configured limit.Type of change
Test plan
Current head validation (
b99d5bc7):pnpm -s vitest run tests/security.remote-media-file-download-cap.test.ts— 1 file passed, 4 tests passed, including fractional/sub-byte override regression coverage.pnpm -s tsx /tmp/summarize-remote-media-cap-optin-proof.ts— explicit opt-in proof passed.git diff --check— passed.pnpm -s format:check— all 969 matched files formatted.pnpm -s lint— 0 warnings and 0 errors across 903 files.pnpm -s typecheck— passed.Earlier validation on this branch also passed:
pnpm exec vitest run tests/security.remote-media-file-download-cap.test.tspnpm -s tsx /tmp/summarize-remote-media-cap-proof.tspnpm format:check docs/media.mdpnpm exec vitest run tests/transcript.podcast-provider.transcribe-media-url-branches.test.ts tests/security.remote-media-file-download-cap.test.tspnpm typecheckpnpm lintpnpm testLocal
pnpmpreviously emitted the repository engine warning because this machine was on Nodev22.22.2whilepackage.jsondeclaresnode >=24; the listed commands completed successfully locally.Disclosure notes
SECURITY.mdin the repository, so I am submitting this as a regular hardening PR with safe regression tests.