Make video stream optional in RTSP sessions#6
Merged
Conversation
697031d to
337ab0b
Compare
A session is now considered valid when any of video, audio, or analytics
metadata is set up, so audio-only and metadata-only RTSP configurations
(e.g. Axis cameras with `video=0`) work end-to-end.
Breaking: SessionDescription replaces the flat video and audio fields
with `video: VideoStream?` and `audio: AudioStream?` substructs. One
Optional check unlocks all the codec-and-parameter fields together —
removes the prior anti-pattern of multiple parallel Optionals sharing
the same "all-set-or-all-nil" invariant.
```swift
struct SessionDescription {
let video: VideoStream? // codec, clockRate, sps, pps, vps, resolution
let audio: AudioStream? // codec, sampleRate, channels, extraData
let metadataEncoding: String?
}
```
Internally, the startup path mirrors the audio/metadata best-effort
pattern: video find is optional, video SETUP failure is still fatal
(matching audio), and two gates flank PLAY:
- A pre-PLAY gate throws if no supported video/audio/metadata stream
was set up. The error message enumerates what was offered to make
misconfigurations diagnosable.
- A post-init gate catches the case where SETUP succeeded but every
depacketizer/timeline init failed (e.g. malformed AAC fmtp, broken
metadata clock rate). Required so the documented "at least one
usable stream" invariant holds at the return site too.
The audio depacketizer init path now also nullifies its stream state
on failure, for symmetry with the metadata path — keeps the post-init
gate honest.
Encoding-support predicates (`isVideoEncodingSupported`,
`isAudioEncodingSupported`, `isApplicationEncodingSupported`) are
extracted to module-level free functions so tests can exercise them
directly.
Four new SDP fixtures: three Axis-style configurations (audio-only, metadata-only, audio + metadata) modelled on `video=0` query strings, and one all-unsupported fixture (JPEG video, Opus audio, vendor- specific metadata) used to exercise the encoding-support filters. Parser tests cover each fixture's stream layout, including the negative invariants (no video, etc.) the optional-video refactor enables. Additional tests cover the encoding-support predicates directly and the "would the gate fire?" filter the predicates feed: the all-unsupported fixture produces nil for all three slots, while the Axis fixtures produce exactly the slots their SDPs advertise. Fixtures now include `a=recvonly` per media section, matching the ONVIF Streaming Specification example and the existing camera SDP fixtures in this directory.
- README: note that audio-only / metadata-only sessions are supported. - API.md: document the new VideoStream / AudioStream substructs and update the Quick Start snippet to the new shape (was still showing the flat field names and was missing the `.metadata` switch case). Also fills two pre-existing gaps that the metadata PR (#5) didn't catch: adds the metadataEncoding field, the `.metadata` PublicCodecItem case, and a PublicMetadataFrame definition. - CHANGELOG: add Upcoming entries for the breaking SessionDescription shape change (Breaking changes), the ONVIF analytics metadata stream support that #5 forgot to document (New), and the audio-init-failure state-coherence fix under Fixes (rather than New, since it's an internal cleanup not a user-visible feature).
337ab0b to
2d3f196
Compare
steelbrain
added a commit
that referenced
this pull request
May 30, 2026
Trust the AudioSpecificConfig over the rtpmap channel count (RFC 3640 makes the ASC authoritative; the rtpmap field is informational and often wrong/omitted), and accept an RTP clock that is a small integer multiple of the ASC sampling frequency (HE-AAC/SBR) — both previously dropped audio silently. Enforce the 16-bit AU-headers-length invariant the code already documents instead of truncating an off-spec length, and compare the full 32-bit fragment timestamp so a low-16-bit collision can't mis-stitch. Remove the dead SBR/PS branch and the unreachable channelConfiguration==0 guard. Addresses audit findings #5/#6/#17/#18/#20/#54. Adds regression tests.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Removes the assumption that every RTSP session must carry an H.264/H.265 video stream. A session is now valid when any of video, audio, or analytics metadata is set up, so audio-only and metadata-only
configurations (e.g. Axis cameras with
video=0,video=0&audio=0&event=on) work end-to-end.Breaking changes
SessionDescriptionnow groups video and audio fields intoVideoStream?andAudioStream?substructs. One Optional check unlocks all the codec-and-parameter fields together — replaces the prior anti-pattern ofmultiple parallel Optionals sharing an "all-set-or-all-nil" invariant.
Consumer usage:
The doc comment on
SessionDescriptiondocuments the invariant: at least one ofvideo,audio, ormetadataEncodingis non-nil.What changed
RTSPSession.start()Video find is now optional (no throw on missing video).
Video SETUP, depacketizer init, and Timeline init are conditional on a successful video SETUP — mirroring the existing best-effort pattern used by audio and metadata.
New gate before PLAY throws
sessionSetupFailedif no supported video, audio, or metadata stream was set up. The error message includes the list of offered streams (e.g."... (offered: video/jpeg, audio/opus)") so misconfigurations are diagnosable.Audio depacketizer init failures now null out
audioStreamIndexand friends, matching the metadata-init failure path. Keeps state coherent for the new gate (an audio-only session whose AACfmtpis malformedno longer slips past with
audio: niland zero deliverable streams).The empty-
Data()fallback for SPS/PPS is gone —video.sps/video.ppsareniluntil parameters are observed rather than emptyData. Matches the new optional contract; previously a footgun if a consumerfed the empty buffer into
CMVideoFormatDescriptionCreateFromH264ParameterSets.The
depacketizer/videoClockRateinvariant ("non-nil together") is now expressed asif let depkt = depacketizer, let clockRate = videoClockRateat the construction site — no sentinel fallback needed.Decoder setup wrapped in
if let video = desc.video, let sps = video.sps, let pps = video.pps.Audio setup wrapped in
if let audio = desc.audio.Video case in the
for try await item in session.frames()loop skips whendesc.videois nil.Window title falls back to
"no video"when there's no video stream.Tests (+3, total 107)
DescribeParsertest per fixture asserting the expected stream layout, including the negative invariant that no video stream is present.Docs
README.md— adds a bullet that audio-only / metadata-only sessions are supported, and anif let video = desc.videosnippet in the usage example.API.md— replaces the flatSessionDescriptionblock withVideoStream/AudioStream/SessionDescription. Also fills two pre-existing gaps that the metadata PR (Add ONVIF analytics metadata stream support #5) didn't catch: addsmetadataEncoding, the.metadatacase toPublicCodecItem, and aPublicMetadataFramedefinition.CHANGELOG.md— adds Upcoming entries for both the metadata stream support (also missing from Add ONVIF analytics metadata stream support #5) and this breaking change.Design choices
unwrap, and codec/clockRate are now non-Optional inside the substruct (they're guaranteed when the stream exists). SPS/PPS/VPS/resolution remain Optional within
VideoStreambecause they reflect "parametersobserved yet?" — a real semantic distinct from "is there a video stream?".
audioCodec/audioSampleRate/audioChannels/audioExtraDatawere already parallel Optionals). Folding them intoAudioStreamis cheap when you're already breaking the API.(width: Int, height: Int)?— not promoted to a named struct, since that's a separate concern.best-effort like metadata is a defensible follow-up but out of scope here.
Compatibility
if let videoIdx = videoStreamIndexguards (pre-existing) already toleratednil.desc.videoCodec→desc.video?.codec,desc.audioSampleRate→desc.audio?.sampleRate, etc. No type-system changes beyond grouping.Test plan
swift build— clean.swift test— 107 tests pass across 16 suites (104 inherited from main + 3 new).xcrun swift-format lint --strict -r Sources/ Tests/ Examples/— clean.CameraViewer) builds with the new substruct flow.video=0/video=0&audio=0&event=onbefore tagging a release. The DescribeParser tests cover SDP parsing; the runtime path is not yetexercised against a live audio-only or metadata-only session.