Make video stream optional in RTSP sessions#6

Merged
steelbrain merged 3 commits into main from steelbrain/optional-video-stream on May 20, 2026

@steelbrain (Owner) commented May 20, 2026

Summary

Removes the assumption that every RTSP session must carry an H.264/H.265 video stream. A session is now valid when any of video, audio, or analytics metadata is set up, so audio-only and metadata-only
configurations (e.g. Axis cameras with video=0, video=0&audio=0&event=on) work end-to-end.

Breaking changes

SessionDescription now groups video and audio fields into VideoStream? and AudioStream? substructs. One Optional check unlocks all the codec-and-parameter fields together — replaces the prior anti-pattern of
multiple parallel Optionals sharing an "all-set-or-all-nil" invariant.

// before
let videoCodec: VideoCodec
let sps: Data
let pps: Data
let vps: Data?
let resolution: (width: Int, height: Int)?
let clockRate: UInt32
let audioCodec: PublicAudioCodec?
let audioSampleRate: UInt32?
let audioChannels: UInt16?
let audioExtraData: Data?

// after
let video: VideoStream?
let audio: AudioStream?

struct VideoStream {
  let codec: VideoCodec
  let clockRate: UInt32
  let sps: Data?           // nil until parameters observed
  let pps: Data?
  let vps: Data?
  let resolution: (width: Int, height: Int)?
}

struct AudioStream {
  let codec: PublicAudioCodec
  let sampleRate: UInt32
  let channels: UInt16?
  let extraData: Data?     // e.g. AudioSpecificConfig for AAC
}

Consumer usage:

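// sps/pps are Optional now, so unwrap them before configuring the decoder
if let video = desc.video, let sps = video.sps, let pps = video.pps {
    setupDecoder(codec: video.codec, sps: sps, pps: pps, vps: video.vps)
}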
if let audio = desc.audio {
    audioPlayer.start(codec: audio.codec, sampleRate: audio.sampleRate)
}

The doc comment on SessionDescription documents the invariant: at least one of video, audio, or metadataEncoding is non-nil.
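
The invariant above can also be checked mechanically. The following is a minimal sketch, not the library's code: the trimmed `VideoStream` / `AudioStream` shapes follow the definitions in this description, `String` stands in for the codec enums, and `hasDeliverableStream` is a hypothetical helper name.

```swift
import Foundation

// Trimmed stand-ins for the substructs above. String replaces the codec
// enums so the sketch is self-contained (an assumption for illustration).
struct VideoStream {
    let codec: String
    let clockRate: UInt32
    let sps: Data?   // nil until parameters are observed
    let pps: Data?
}

struct AudioStream {
    let codec: String
    let sampleRate: UInt32
}

struct SessionDescription {
    let video: VideoStream?
    let audio: AudioStream?
    let metadataEncoding: String?

    // Hypothetical helper: true when the documented invariant holds,
    // i.e. at least one of the three slots is non-nil.
    var hasDeliverableStream: Bool {
        video != nil || audio != nil || metadataEncoding != nil
    }
}

// An audio-only session satisfies the invariant without any video.
let audioOnly = SessionDescription(
    video: nil,
    audio: AudioStream(codec: "aac", sampleRate: 48_000),
    metadataEncoding: nil
)
print(audioOnly.hasDeliverableStream)  // true
```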

What changed

RTSPSession.start()

  • Video find is now optional (no throw on missing video).

  • Video SETUP, depacketizer init, and Timeline init now run only when a video stream was found, mirroring the existing best-effort pattern used by audio and metadata.

  • New gate before PLAY throws sessionSetupFailed if no supported video, audio, or metadata stream was set up. The error message includes the list of offered streams (e.g. "... (offered: video/jpeg, audio/opus)") so misconfigurations are diagnosable.

  • Audio depacketizer init failures now null out audioStreamIndex and friends, matching the metadata-init failure path. Keeps state coherent for the new gate (an audio-only session whose AAC fmtp is malformed
    no longer slips past with audio: nil and zero deliverable streams).

  • The empty-Data() fallback for SPS/PPS is gone — video.sps/video.pps are nil until parameters are observed rather than empty Data. Matches the new optional contract; previously a footgun if a consumer
    fed the empty buffer into CMVideoFormatDescriptionCreateFromH264ParameterSets.

  • The depacketizer / videoClockRate invariant ("non-nil together") is now expressed as if let depkt = depacketizer, let clockRate = videoClockRate at the construction site — no sentinel fallback needed.

Example app (CameraViewer)

  • Decoder setup wrapped in if let video = desc.video, let sps = video.sps, let pps = video.pps.

  • Audio setup wrapped in if let audio = desc.audio.

  • Video case in the for try await item in session.frames() loop skips when desc.video is nil.

  • Window title falls back to "no video" when there's no video stream.
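
The pre-PLAY gate described in the list above could look roughly like this. It is a sketch under assumptions: `SessionError`, `ensureDeliverableStream`, and the exact message wording are hypothetical, and only the idea (throw `sessionSetupFailed` and list what the camera offered) comes from this description.

```swift
// Hypothetical error type; the real sessionSetupFailed case may differ.
enum SessionError: Error, Equatable {
    case sessionSetupFailed(String)
}

// Hypothetical gate: throws when nothing deliverable was set up, and lists
// the offered "media/encoding" pairs so the failure is diagnosable.
func ensureDeliverableStream(
    videoReady: Bool,
    audioReady: Bool,
    metadataReady: Bool,
    offered: [String]
) throws {
    guard videoReady || audioReady || metadataReady else {
        let list = offered.isEmpty ? "none" : offered.joined(separator: ", ")
        throw SessionError.sessionSetupFailed(
            "no supported video, audio, or metadata stream (offered: \(list))"
        )
    }
}

// A camera that only offers unsupported encodings trips the gate before PLAY.
do {
    try ensureDeliverableStream(
        videoReady: false, audioReady: false, metadataReady: false,
        offered: ["video/jpeg", "audio/opus"]
    )
} catch {
    print(error)
}
```

Running the check before PLAY keeps the failure on the setup path, where no RTP packets are in flight yet.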

Tests (+3, total 107)

  • Three Axis-style SDP fixtures: audio-only, metadata-only, and audio + metadata.
  • One DescribeParser test per fixture asserting the expected stream layout, including the negative invariant that no video stream is present.
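
To illustrate the kind of layout assertion these tests make, here is a small standalone sketch. It does not use the project's DescribeParser: the SDP text is an illustrative stand-in (payload types and rtpmap values are assumptions, not a captured fixture), and `mediaTypes(inSDP:)` is a hypothetical helper that only reads the `m=` lines.

```swift
// Illustrative SDP, loosely modelled on an audio + metadata session.
// Payload types and rtpmap values are assumptions, not a captured fixture.
let sdp = """
v=0
o=- 0 0 IN IP4 192.0.2.1
s=Session
t=0 0
m=audio 0 RTP/AVP 97
a=rtpmap:97 MPEG4-GENERIC/16000/1
a=recvonly
m=application 0 RTP/AVP 98
a=rtpmap:98 vnd.onvif.metadata/90000
a=recvonly
"""

// Hypothetical helper: returns the media type of each "m=" line, in order.
func mediaTypes(inSDP sdp: String) -> [String] {
    sdp.split(whereSeparator: \.isNewline).compactMap { line -> String? in
        guard line.hasPrefix("m=") else { return nil }
        // "m=audio 0 RTP/AVP 97" -> "audio"
        return line.dropFirst(2).split(separator: " ").first.map { String($0) }
    }
}

let types = mediaTypes(inSDP: sdp)
print(types)                     // ["audio", "application"]
print(types.contains("video"))   // false: the negative invariant
```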

Docs

  • README.md: adds a bullet that audio-only / metadata-only sessions are supported, and an if let video = desc.video snippet in the usage example.
  • API.md: replaces the flat SessionDescription block with VideoStream / AudioStream / SessionDescription. Also fills two pre-existing gaps that the metadata PR (#5) didn't catch: adds metadataEncoding, the .metadata case to PublicCodecItem, and a PublicMetadataFrame definition.
  • CHANGELOG.md: adds Upcoming entries for both the metadata stream support (also missing from #5) and this breaking change.

Design choices

  • Substruct over parallel Optionals. Five fields with a shared "either-all-set-or-all-nil" invariant is worse than one Optional carrying a substruct: the compiler enforces the invariant once, consumers get one
    unwrap, and codec/clockRate are now non-Optional inside the substruct (they're guaranteed when the stream exists). SPS/PPS/VPS/resolution remain Optional within VideoStream because they reflect "parameters
    observed yet?" — a real semantic distinct from "is there a video stream?".
  • Audio gets the same treatment. The flat audio fields had the identical anti-pattern (audioCodec / audioSampleRate / audioChannels / audioExtraData were already parallel Optionals). Folding them into
    AudioStream is cheap when you're already breaking the API.
  • Resolution stays as a tuple. (width: Int, height: Int)? — not promoted to a named struct, since that's a separate concern.
  • Video SETUP failure is still fatal (matches audio). Reported case is "no video advertised at all," not "video advertised but SETUP rejected." Keeping SETUP fatal is the narrower change; making it
    best-effort like metadata is a defensible follow-up but out of scope here.
  • At-least-one-stream check sits before PLAY, not after. A degenerate session has nothing to deliver, and sending PLAY just opens the door to packets we can't route.

Compatibility

  • Cameras that advertise video continue to behave exactly as before. The dispatch loop's if let videoIdx = videoStreamIndex guards (pre-existing) already tolerated nil.
  • Audio and metadata paths were already optional/best-effort and are unchanged in behavior, with the one symmetric tweak above (audio depacketizer init failure now nulls state).
  • Field renames are the only source breakage: desc.videoCodec → desc.video?.codec, desc.audioSampleRate → desc.audio?.sampleRate, etc. No type-system changes beyond grouping.
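
For consumers, the migration is mostly a matter of optional chaining. The sketch below assumes trimmed stand-in types (with `String` in place of the codec enums) and a hypothetical `summary(of:)` function; it only illustrates how the renamed fields are reached.

```swift
// Trimmed stand-ins for the new shape (String replaces the codec enums).
struct VideoStream {
    let codec: String
    let clockRate: UInt32
}

struct AudioStream {
    let codec: String
    let sampleRate: UInt32
}

struct SessionDescription {
    let video: VideoStream?
    let audio: AudioStream?
}

// Hypothetical consumer code after migration.
func summary(of desc: SessionDescription) -> String {
    // was: desc.videoCodec
    let video = desc.video?.codec ?? "no video"
    // was: desc.audioCodec / desc.audioSampleRate
    let audio = desc.audio.map { "\($0.codec) @ \($0.sampleRate) Hz" } ?? "no audio"
    return "\(video), \(audio)"
}

let desc = SessionDescription(
    video: nil,
    audio: AudioStream(codec: "aac", sampleRate: 16_000)
)
print(summary(of: desc))  // no video, aac @ 16000 Hz
```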

Test plan

  • swift build — clean.
  • swift test — 107 tests pass across 16 suites (104 inherited from main + 3 new).
  • xcrun swift-format lint --strict -r Sources/ Tests/ Examples/ — clean.
  • Example app (CameraViewer) builds with the new substruct flow.
  • Pending live test: verify against a real Axis camera with video=0 / video=0&audio=0&event=on before tagging a release. The DescribeParser tests cover SDP parsing; the runtime path is not yet
    exercised against a live audio-only or metadata-only session.

@steelbrain marked this pull request as draft on May 20, 2026 14:31
@steelbrain force-pushed the steelbrain/optional-video-stream branch 2 times, most recently from 697031d to 337ab0b on May 20, 2026 15:17
A session is now considered valid when any of video, audio, or analytics
metadata is set up, so audio-only and metadata-only RTSP configurations
(e.g. Axis cameras with `video=0`) work end-to-end.

Breaking: SessionDescription replaces the flat video and audio fields
with `video: VideoStream?` and `audio: AudioStream?` substructs. One
Optional check unlocks all the codec-and-parameter fields together —
removes the prior anti-pattern of multiple parallel Optionals sharing
the same "all-set-or-all-nil" invariant.

```swift
struct SessionDescription {
  let video: VideoStream?         // codec, clockRate, sps, pps, vps, resolution
  let audio: AudioStream?         // codec, sampleRate, channels, extraData
  let metadataEncoding: String?
}
```

Internally, the startup path mirrors the audio/metadata best-effort
pattern: video find is optional, video SETUP failure is still fatal
(matching audio), and two gates flank PLAY:

- A pre-PLAY gate throws if no supported video/audio/metadata stream
  was set up. The error message enumerates what was offered to make
  misconfigurations diagnosable.
- A post-init gate catches the case where SETUP succeeded but every
  depacketizer/timeline init failed (e.g. malformed AAC fmtp, broken
  metadata clock rate). Required so the documented "at least one
  usable stream" invariant holds at the return site too.

The audio depacketizer init path now also nullifies its stream state
on failure, for symmetry with the metadata path — keeps the post-init
gate honest.

Encoding-support predicates (`isVideoEncodingSupported`,
`isAudioEncodingSupported`, `isApplicationEncodingSupported`) are
extracted to module-level free functions so tests can exercise them
directly.

Four new SDP fixtures: three Axis-style configurations (audio-only,
metadata-only, audio + metadata) modelled on `video=0` query strings,
and one all-unsupported fixture (JPEG video, Opus audio, vendor-
specific metadata) used to exercise the encoding-support filters.

Parser tests cover each fixture's stream layout, including the
negative invariants (no video, etc.) the optional-video refactor
enables. Additional tests cover the encoding-support predicates
directly and the "would the gate fire?" filter the predicates feed:
the all-unsupported fixture produces nil for all three slots, while
the Axis fixtures produce exactly the slots their SDPs advertise.
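
A rough sketch of the predicates and the filter they feed is shown below. The function names come from the commit message, but the signatures, the encoding lists, and the `gateWouldFire` helper are assumptions made for illustration; the real predicates may accept more encodings or take different parameters.

```swift
// Assumed signatures and encoding lists; the real predicates may accept
// more encodings or take different parameters.
func isVideoEncodingSupported(_ encoding: String) -> Bool {
    ["H264", "H265"].contains(encoding.uppercased())
}

func isAudioEncodingSupported(_ encoding: String) -> Bool {
    ["MPEG4-GENERIC"].contains(encoding.uppercased())
}

func isApplicationEncodingSupported(_ encoding: String) -> Bool {
    ["VND.ONVIF.METADATA"].contains(encoding.uppercased())
}

// Hypothetical "would the gate fire?" filter: true when none of the
// offered encodings passes its predicate, so all three slots stay nil.
func gateWouldFire(video: String?, audio: String?, application: String?) -> Bool {
    let videoOK = video.map(isVideoEncodingSupported) ?? false
    let audioOK = audio.map(isAudioEncodingSupported) ?? false
    let metadataOK = application.map(isApplicationEncodingSupported) ?? false
    return !(videoOK || audioOK || metadataOK)
}

// All-unsupported offer: JPEG video, Opus audio, vendor-specific metadata.
print(gateWouldFire(video: "JPEG", audio: "opus", application: "vnd.example.metadata"))  // true
```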

Fixtures now include `a=recvonly` per media section, matching the
ONVIF Streaming Specification example and the existing camera SDP
fixtures in this directory.

- README: note that audio-only / metadata-only sessions are supported.
- API.md: document the new VideoStream / AudioStream substructs and
  update the Quick Start snippet to the new shape (was still showing
  the flat field names and was missing the `.metadata` switch case).
  Also fills two pre-existing gaps that the metadata PR (#5) didn't
  catch: adds the metadataEncoding field, the `.metadata` PublicCodecItem
  case, and a PublicMetadataFrame definition.
- CHANGELOG: add Upcoming entries for the breaking SessionDescription
  shape change (Breaking changes), the ONVIF analytics metadata stream
  support that #5 forgot to document (New), and the audio-init-failure
  state-coherence fix under Fixes (rather than New, since it's an
  internal cleanup not a user-visible feature).
@steelbrain force-pushed the steelbrain/optional-video-stream branch from 337ab0b to 2d3f196 on May 20, 2026 15:35
@steelbrain marked this pull request as ready for review on May 20, 2026 15:40
@steelbrain merged commit bc2ccd3 into main on May 20, 2026
@steelbrain deleted the steelbrain/optional-video-stream branch on May 20, 2026 15:40
steelbrain added a commit that referenced this pull request May 30, 2026
Trust the AudioSpecificConfig over the rtpmap channel count (RFC 3640
makes the ASC authoritative; the rtpmap field is informational and often
wrong/omitted), and accept an RTP clock that is a small integer multiple
of the ASC sampling frequency (HE-AAC/SBR) — both previously dropped
audio silently. Enforce the 16-bit AU-headers-length invariant the code
already documents instead of truncating an off-spec length, and compare
the full 32-bit fragment timestamp so a low-16-bit collision can't
mis-stitch. Remove the dead SBR/PS branch and the unreachable
channelConfiguration==0 guard.

Addresses audit findings #5/#6/#17/#18/#20/#54. Adds regression tests.
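
As a rough illustration of what trusting the AudioSpecificConfig involves, the sketch below reads the sample rate and channel configuration from the first two ASC bytes and checks an RTP clock against the ASC rate. It is not the commit's code: `ASCInfo`, `parseASC`, and `isCompatibleClock` are hypothetical names, and the "1x or 2x" bound on the clock multiple is an assumed policy.

```swift
// Sampling frequencies by samplingFrequencyIndex (ISO/IEC 14496-3).
let ascSampleRates: [UInt32] = [
    96_000, 88_200, 64_000, 48_000, 44_100, 32_000,
    24_000, 22_050, 16_000, 12_000, 11_025, 8_000, 7_350,
]

struct ASCInfo: Equatable {
    let sampleRate: UInt32
    let channelConfiguration: UInt8
}

// Hypothetical helper: reads the first two bytes of an AudioSpecificConfig.
// Returns nil for the escape encodings (audioObjectType 31 or
// samplingFrequencyIndex 15), which this sketch does not handle.
func parseASC(_ bytes: [UInt8]) -> ASCInfo? {
    guard bytes.count >= 2 else { return nil }
    let objectType = bytes[0] >> 3                                   // 5 bits
    let freqIndex = Int(((bytes[0] & 0x07) << 1) | (bytes[1] >> 7))  // 4 bits
    let channels = (bytes[1] >> 3) & 0x0F                            // 4 bits
    guard objectType != 31, freqIndex < ascSampleRates.count else { return nil }
    return ASCInfo(sampleRate: ascSampleRates[freqIndex], channelConfiguration: channels)
}

// Assumed policy: accept an RTP clock that is 1x or 2x the ASC rate.
func isCompatibleClock(rtpClock: UInt32, ascRate: UInt32) -> Bool {
    ascRate > 0 && rtpClock % ascRate == 0 && (1...2).contains(rtpClock / ascRate)
}

// 0x12 0x10 encodes AAC-LC, 44.1 kHz, channel configuration 2.
if let info = parseASC([0x12, 0x10]) {
    print(info.sampleRate, info.channelConfiguration)  // 44100 2
}
```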