
Add ONVIF analytics metadata stream support #5

Merged
steelbrain merged 7 commits into main from steelbrain/application-metadata-stream on May 20, 2026

@steelbrain

Owner

Summary

Adds support for the RTSP application media-type stream, depacketizing ONVIF analytics metadata (vnd.onvif.metadata) alongside the existing video and audio streams. The new ApplicationDepacketizer follows
the ONVIF Streaming Specification: payloads concatenate across RTP packets until the marker bit, which signals end-of-document.

Closes #N/A. Internal request — supports cameras like Axis and Dahua that expose analytics events (object detection, scene metadata) on a third RTSP stream.

What's new

  • PublicMetadataFrame — exposes data, timestamp, encodingName, loss. Surfaced as PublicCodecItem.metadata(...) in the session.frames() stream.
  • SessionDescription.metadataEncoding: String? — non-nil when a metadata stream is active, mirroring the audio-discovery pattern (audioCodec, audioSampleRate).
  • ApplicationDepacketizer in Sources/IPCamKit/Codec/ — internal type, matches the conventions used by AudioDepacketizer/SimpleAudioDepacketizer.
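A minimal sketch of how a consumer might handle the new case. The types below are local stand-ins that reuse the names from this PR; the field types (`[UInt8]`, `UInt32`, `UInt16`), the payload-free `.video`/`.audio` cases, and the `describe` helper are assumptions made for illustration, not the library's actual declarations.

```swift
// Stand-in types mirroring the names in this PR. Field types are assumptions:
// the PR description lists only the field names.
struct PublicMetadataFrame {
    var data: [UInt8]          // raw document bytes (ONVIF metadata is XML)
    var timestamp: UInt32      // RTP timestamp of the marker packet
    var encodingName: String   // e.g. "vnd.onvif.metadata"
    var loss: UInt16           // packets lost before this frame
}

enum PublicCodecItem {
    case video                 // placeholder: real payload omitted
    case audio                 // placeholder: real payload omitted
    case metadata(PublicMetadataFrame)
}

/// Hypothetical helper: summarize one item from the frames stream.
func describe(_ item: PublicCodecItem) -> String {
    switch item {
    case .video:
        return "video"
    case .audio:
        return "audio"
    case .metadata(let frame):
        // Decode leniently; invalid UTF-8 is replaced rather than trapping.
        let xml = String(decoding: frame.data, as: UTF8.self)
        return "metadata[\(frame.encodingName)] loss=\(frame.loss): \(xml)"
    }
}
```

Because the switch is exhaustive, existing consumers need a new `.metadata` arm, which is the change made to the example app in this PR.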

Design choices

  • Encoding allowlist: only vnd.onvif.metadata for now (gated by isApplicationEncodingSupported, same shape as isAudioEncodingSupported). Other application encodings are silently skipped — easy to extend.
  • Truly best-effort: application SETUP failures and metadata-stream init failures both degrade to .warning diagnostics rather than aborting the session. Video/audio retain their original behavior.
  • Fragment cap: 1 MiB per document. Overflow fires one .warning per in-flight document (rate-limited) and drops until the next marker. Mid-document loss is handled the same way: the in-flight prefix is discarded and packets are dropped until the next marker.
  • push invariant enforced: precondition(ready == nil) means a caller that forgets to drain via pull() fails fast instead of silently dropping frames.
  • PublicMetadataFrame.timestamp reflects the marker packet's timestamp (last-packet semantics). For typical ONVIF documents where all fragments share a timestamp, this matches the audio/video convention; for
    the rare case of timestamp changes mid-document, this is documented on the field.
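The choices above can be sketched roughly as follows. This is an illustration under stated assumptions, not the actual `ApplicationDepacketizer`: the type name `SketchDepacketizer`, the `push` parameter list, the `warnings` array, and the treatment of loss seen while idle (recorded but not treated as a drop) are all hypothetical.

```swift
/// One reassembled metadata document.
struct MetadataFrame: Equatable {
    var data: [UInt8]
    var timestamp: UInt32   // timestamp of the marker packet (last-packet semantics)
    var loss: UInt16
}

/// Sketch of marker-bit reassembly with a size cap. Not the IPCamKit type.
struct SketchDepacketizer {
    let maxDocumentSize: Int
    private(set) var warnings: [String] = []
    private var buffer: [UInt8] = []
    private var pendingLoss: UInt16 = 0   // carried over to the next clean frame
    private var dropping = false          // true: discard until the next marker
    private var ready: MetadataFrame?

    init(maxDocumentSize: Int = 1 << 20) {   // 1 MiB default
        self.maxDocumentSize = maxDocumentSize
    }

    mutating func push(payload: [UInt8], timestamp: UInt32, marker: Bool, loss: UInt16) {
        // Fail fast if the caller forgot to drain the previous frame.
        precondition(ready == nil, "pull() must drain before the next push()")

        // Accumulate loss with saturation so it survives dropped documents.
        let (sum, overflowed) = pendingLoss.addingReportingOverflow(loss)
        pendingLoss = overflowed ? .max : sum

        // Mid-document loss: the buffered prefix can no longer be completed.
        if loss > 0 && !buffer.isEmpty {
            buffer.removeAll()
            dropping = true
        }
        // Overflow: warn once per document, since `dropping` gates this check.
        if !dropping && buffer.count + payload.count > maxDocumentSize {
            warnings.append("metadata document exceeds \(maxDocumentSize) bytes; dropping")
            buffer.removeAll()
            dropping = true
        }
        if !dropping {
            buffer.append(contentsOf: payload)
        }

        guard marker else { return }
        // Marker bit = end of document. Emit only clean, non-empty documents.
        if !dropping && !buffer.isEmpty {
            ready = MetadataFrame(data: buffer, timestamp: timestamp, loss: pendingLoss)
            pendingLoss = 0
        }
        buffer.removeAll()
        dropping = false
    }

    /// Returns the completed frame, if any, and clears it.
    mutating func pull() -> MetadataFrame? {
        defer { ready = nil }
        return ready
    }
}
```

Gating the overflow check on `dropping` gives the one-warning-per-document behavior without a separate counter, and resetting state only at the marker keeps a dropped document from swallowing the one that follows it.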

Compatibility

  • Existing video/audio behavior is unchanged. Cameras without an application m-line see no difference.
  • Tested against the existing Dahua and Hikvision SDP fixtures (DescribeParserTests), which already advertise vnd.onvif.metadata/90000 streams; my changes ensure those streams now get SETUP'd and depacketized.

Test plan

  • swift build — clean.
  • swift test — 104 tests across 16 suites pass, including 11 new tests in ApplicationDepacketizerTests.
  • Verified the example app (CameraViewer) builds with the new exhaustive switch arm.
  • Pending live test: confirm end-to-end against a real Axis or Dahua camera before tagging a release.

Review trail

The depacketizer survived four review rounds during development (each catching a real bug: overflow loss-accounting, post-overflow document swallowing, edge cases in loss/marker interaction). After the initial
branch was ready, an Opus self-review surfaced five additional issues — all fixed in commit 415c9fb (precondition, drop-path invariants, best-effort SETUP, public discovery field, missing test).

Accumulates RTP payload bytes across packets and emits a MetadataFrame
when the marker bit fires (per ONVIF Streaming Spec: marker = end of
XML document). Mid-document loss and oversized documents both discard
the in-flight prefix and drop until the next marker; loss is preserved
across drops so it surfaces on the next clean frame.

Extends internal CodecItem with .metadataFrame; the public surface is
wired up in subsequent commits.

Discover the application m-line during DESCRIBE, SETUP it alongside
audio (best-effort — the rest of the session continues if the metadata
stream can't be initialized), and dispatch incoming RTP packets to the
ApplicationDepacketizer. Surface frames as PublicCodecItem.metadata
carrying raw bytes, timestamp, encoding name, and loss count.

Limited to vnd.onvif.metadata encoding for now; other application
encodings fall through to the same not-supported path as unknown audio.
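A rough sketch of the allowlist and the best-effort SETUP described above. Only the name `isApplicationEncodingSupported` comes from this PR; the `Diagnostic` and `SetupError` types, the `setUpMetadataStream` helper, and the case-insensitive comparison are assumptions made for illustration.

```swift
// Hypothetical diagnostic and error types for the sketch.
enum Diagnostic: Equatable {
    case warning(String)
}

struct SetupError: Error {
    let status: Int
}

/// Allowlist check. Comparing case-insensitively is an assumption here.
func isApplicationEncodingSupported(_ encoding: String) -> Bool {
    encoding.lowercased() == "vnd.onvif.metadata"
}

/// Returns the active metadata encoding, or nil when the stream is skipped
/// or its SETUP failed. A failure never propagates to the caller.
func setUpMetadataStream(
    encoding: String,
    setup: () throws -> Void,
    diagnostics: inout [Diagnostic]
) -> String? {
    // Unsupported encodings are skipped without a diagnostic.
    guard isApplicationEncodingSupported(encoding) else { return nil }
    do {
        try setup()
        return encoding
    } catch {
        // Degrade to a warning so video/audio can continue.
        diagnostics.append(.warning("metadata SETUP failed: \(error)"))
        return nil
    }
}
```

Returning an optional encoding mirrors the idea behind `metadataEncoding`: a nil value tells the consumer that no metadata stream is active, whatever the reason.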

Cover the happy path (single + multi-packet documents, back-to-back
documents), loss handling (initial, between, mid-document), the
1 MiB overflow cap with diagnostic rate-limiting and recovery, and
the empty-payload edge cases at idle and mid-document.

Adds the feature bullet, a .metadata arm in the usage example, a
Metadata subsection, and an architecture-diagram tweak. Also catches
up the test count (90 -> 103) that was already stale on main.

- Add precondition(ready == nil) to ApplicationDepacketizer.push so
  push-without-drain is a hard failure rather than silent data loss.
- Nil out lastTimestamp on all drop paths so the "buffer non-empty iff
  lastTimestamp set" invariant is local rather than global.
- Wrap application SETUP in do/catch so a 4xx response degrades to a
  diagnostic instead of aborting the whole session.
- Expose metadataEncoding on SessionDescription so consumers can
  discover metadata availability without waiting for the first packet.
- Add depacketizer test for the marker-packet-carries-loss path.

steelbrain marked this pull request as draft on May 20, 2026, 03:42
steelbrain marked this pull request as ready for review on May 20, 2026, 14:21
steelbrain merged commit e67037b into main on May 20, 2026
2 checks passed
steelbrain deleted the steelbrain/application-metadata-stream branch on May 20, 2026, 14:21

steelbrain added a commit that referenced this pull request on May 20, 2026:
- README: note that audio-only / metadata-only sessions are supported.
- API.md: document the new VideoStream / AudioStream substructs. Also
  fills two pre-existing gaps that the metadata PR (#5) didn't catch:
  adds the metadataEncoding field, the .metadata PublicCodecItem case,
  and a PublicMetadataFrame definition.
- CHANGELOG: add Upcoming entries for both the metadata stream support
  (also missing from #5) and this breaking change.

steelbrain added a commit that referenced this pull request on May 20, 2026:
- README: note that audio-only / metadata-only sessions are supported.
- API.md: document the new VideoStream / AudioStream substructs. Also
  fills two pre-existing gaps that the metadata PR (#5) didn't catch:
  adds the metadataEncoding field, the .metadata PublicCodecItem case,
  and a PublicMetadataFrame definition.
- CHANGELOG: add Upcoming entries for the breaking SessionDescription
  shape change (Breaking changes), the ONVIF analytics metadata stream
  support that #5 forgot to document (New), and the audio-init-failure
  state-coherence fix under Fixes (rather than New, since it's an
  internal cleanup not a user-visible feature).

steelbrain added a commit that referenced this pull request on May 20, 2026:
- README: note that audio-only / metadata-only sessions are supported.
- API.md: document the new VideoStream / AudioStream substructs and
  update the Quick Start snippet to the new shape (was still showing
  the flat field names and was missing the `.metadata` switch case).
  Also fills two pre-existing gaps that the metadata PR (#5) didn't
  catch: adds the metadataEncoding field, the `.metadata` PublicCodecItem
  case, and a PublicMetadataFrame definition.
- CHANGELOG: add Upcoming entries for the breaking SessionDescription
  shape change (Breaking changes), the ONVIF analytics metadata stream
  support that #5 forgot to document (New), and the audio-init-failure
  state-coherence fix under Fixes (rather than New, since it's an
  internal cleanup not a user-visible feature).

steelbrain added a commit that referenced this pull request on May 30, 2026:
Trust the AudioSpecificConfig over the rtpmap channel count (RFC 3640
makes the ASC authoritative; the rtpmap field is informational and often
wrong/omitted), and accept an RTP clock that is a small integer multiple
of the ASC sampling frequency (HE-AAC/SBR) — both previously dropped
audio silently. Enforce the 16-bit AU-headers-length invariant the code
already documents instead of truncating an off-spec length, and compare
the full 32-bit fragment timestamp so a low-16-bit collision can't
mis-stitch. Remove the dead SBR/PS branch and the unreachable
channelConfiguration==0 guard.

Addresses audit findings #5/#6/#17/#18/#20/#54. Adds regression tests.
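
The clock-rate acceptance rule in this commit message might look something like the sketch below. The function name, its parameters, and the upper bound of 2 on the multiple are assumptions; the commit message says only "a small integer multiple".

```swift
/// Hypothetical check: accept an RTP clock rate that equals the
/// AudioSpecificConfig sampling frequency times a small integer.
/// With HE-AAC/SBR, the ASC may carry a core rate below the output rate.
func isCompatibleClock(rtpClockRate: Int, ascSampleRate: Int, maxMultiple: Int = 2) -> Bool {
    guard maxMultiple >= 1,
          ascSampleRate > 0,
          rtpClockRate > 0,
          rtpClockRate % ascSampleRate == 0 else {
        return false
    }
    // The quotient is an exact integer at this point.
    return (1...maxMultiple).contains(rtpClockRate / ascSampleRate)
}
```

Under these assumptions a 48000 Hz RTP clock with a 24000 Hz ASC rate is accepted, while 48000 Hz against 44100 Hz is rejected because the ratio is not an integer.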