A desktop app that records the screen and system audio in the background, with LiveKit WebRTC integration. Built with Electron, TypeScript, and a native Swift sidecar for macOS accessibility APIs.
- Screen + system audio capture via ScreenCaptureKit (macOS 13+), no virtual audio driver needed
- LiveKit WebRTC publishing — publish/unpublish screen share tracks without interrupting local recording
- Gapless segment rotation — dual overlapping MediaRecorder instances for seamless 5-minute WebM chunks
- Swift accessibility sidecar — real-time a11y tree traversal with 13 app-specific parsers (Chrome, Slack, Zoom, etc.)
- Menu bar app — system tray UI with 3-state icons, no Dock presence
- 75 tests covering capture, recording, LiveKit, tray, sidecar crash recovery, and shutdown ordering
┌─────────────────────────────────────────────────────┐
│ Electron App │
│ │
│ Main Process │
│ ├── Tray (system tray menu, 3 states) │
│ ├── SidecarManager (spawn/restart Swift binary) │
│ ├── IPC handlers (file I/O, state relay) │
│ └── App lifecycle (flags, permissions, shutdown) │
│ │
│ Renderer Process (hidden window) │
│ ├── capture.ts (getDisplayMedia) │
│ ├── recorder.ts (MediaRecorder + segmentation) │
│ ├── livekit.ts (room + track publish) │
│ └── renderer.ts (orchestrator) │
│ │
│ Preload (contextBridge IPC) │
└──────────────┬──────────────────────────────────────┘
│ spawns
v
┌──────────────────────────────────────────────────────┐
│ Swift Sidecar (observer-sidecar) │
│ ├── A11y APIs (AXUIElement) │
│ ├── 13 app parsers (Chrome, Slack, Zoom, etc.) │
│ ├── HashStore (dedupe) │
│ ├── SnapshotCapture (diff) │
│ └── stdout: JSON Lines │
└──────────────────────────────────────────────────────┘
All user interaction goes through the system tray context menu; there is no visible window. The renderer owns MediaStream, MediaRecorder, and livekit-client because they require a browser context. The sidecar communicates over stdio, writing JSON Lines to stdout.
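To make the stdio protocol concrete, here is a minimal sketch of a JSON Lines reader that tolerates chunks split mid-line. It is not taken from sidecar-manager.ts (which, per the project layout, relies on readline); the class name `JsonLinesParser` and the event field names in the usage lines are hypothetical.

```typescript
// Hypothetical helper: buffers raw stdout chunks and emits one object per complete line.
type SidecarEvent = Record<string, unknown>;

class JsonLinesParser {
  private buffer = "";

  // Feed a raw chunk; returns every event completed by this chunk.
  push(chunk: string): SidecarEvent[] {
    this.buffer += chunk;
    const lines = this.buffer.split("\n");
    // The last element is an unfinished line (or "" when the chunk ended on "\n").
    this.buffer = lines.pop() ?? "";

    const events: SidecarEvent[] = [];
    for (const line of lines) {
      const trimmed = line.trim();
      if (trimmed === "") continue;
      try {
        const parsed: unknown = JSON.parse(trimmed);
        if (typeof parsed === "object" && parsed !== null && !Array.isArray(parsed)) {
          events.push(parsed as SidecarEvent);
        }
      } catch {
        // Assumption: a malformed line is skipped rather than treated as fatal.
      }
    }
    return events;
  }
}

// Usage: an event split across two chunks is emitted only once it is complete.
const parser = new JsonLinesParser();
const firstBatch = parser.push('{"type":"app","name":"Sla');
const secondBatch = parser.push('ck"}\n{"type":"tick"}\n');
console.log(firstBatch.length, secondBatch.length);
```

Keeping the parser free of process handling makes it easy to test without spawning the Swift binary.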
| Decision | Choice | Rationale |
|---|---|---|
| Screen capture | setDisplayMediaRequestHandler with useSystemPicker: false | Auto-grants without picker dialog after initial OS permission |
| System audio | audio: 'loopback' + Chromium flags | Native macOS 13+ ScreenCaptureKit, no third-party driver needed |
| Video format | WebM (VP8 + Opus) | MediaRecorder limitation; VP8 is more reliable than VP9 across Electron builds |
| Segment rotation | Dual overlapping MediaRecorder instances | Single stop/start has frame drops; two alternating recorders produce seamless segments |
| Duration fix | fix-webm-duration | Chromium bug: MediaRecorder writes WebM without duration metadata |
| Sidecar protocol | stdio JSON Lines | Simpler lifecycle than HTTP/WebSocket, no port conflicts |
| LiveKit context | Data channel (publishData) | Separate from media tracks, structured JSON, chunked for >15KB |
| Dock hiding | app.dock.hide() + type: 'panel' + LSUIElement | Prevents dock icon reappearing when BrowserWindow interacts |
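The segment rotation row can be illustrated with a timing plan. This is a sketch of the scheduling idea only, not the code in recorder.ts: the function `planRotation`, the `SegmentWindow` shape, and the 1-second overlap are assumptions, since the overlap length is not stated here.

```typescript
// Hypothetical planner: when each of two alternating recorders starts and stops.
interface SegmentWindow {
  index: number;
  recorder: "A" | "B";
  startMs: number;
  stopMs: number;
}

function planRotation(segmentMs: number, overlapMs: number, count: number): SegmentWindow[] {
  // The overlap must be shorter than a segment, otherwise a recorder
  // would still be running when its next turn begins.
  if (overlapMs < 0 || overlapMs >= segmentMs) {
    throw new RangeError("overlapMs must be in [0, segmentMs)");
  }
  const windows: SegmentWindow[] = [];
  for (let i = 0; i < count; i++) {
    windows.push({
      index: i,
      recorder: i % 2 === 0 ? "A" : "B",
      startMs: i * segmentMs,
      // Keep recording past the boundary so the next recorder is already running.
      stopMs: (i + 1) * segmentMs + overlapMs,
    });
  }
  return windows;
}

// Usage: 5-minute segments with an assumed 1-second overlap.
const plan = planRotation(5 * 60_000, 1_000, 3);
console.log(plan);
```

Each recorder stops shortly after the other has started, so no frame falls into a gap between files; the cost is a small amount of duplicated footage at every boundary.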
- Node.js 18+
- macOS 14+ (for ScreenCaptureKit audio)
- Xcode Command Line Tools (for Swift sidecar build)
The sidecar is a Swift executable in sidecar-swift/ that uses macOS Accessibility APIs to observe the frontmost app.
cd sidecar-swift
swift build --product caret-sidecar
mkdir -p ../sidecar-bin
cp .build/debug/caret-sidecar ../sidecar-bin/observer-sidecar
npm install
npm start
To publish to LiveKit, set these environment variables before starting:
export LIVEKIT_URL=ws://localhost:7880
export LIVEKIT_TOKEN=<your-access-token>
npm start
Or use LiveKit's Meet app to generate a token for testing.
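One way the two variables could be read and validated at startup is sketched below. The function `readLiveKitConfig` and the `LiveKitConfig` type are hypothetical, and treating missing variables as "publishing disabled" rather than as an error is an assumption.

```typescript
// Hypothetical config reader; pass process.env in a real app.
interface LiveKitConfig {
  url: string;
  token: string;
}

function readLiveKitConfig(env: Record<string, string | undefined>): LiveKitConfig | null {
  const url = env["LIVEKIT_URL"]?.trim();
  const token = env["LIVEKIT_TOKEN"]?.trim();
  // Assumption: without both values the app still records locally, so return null.
  if (!url || !token) return null;
  if (!/^wss?:\/\//.test(url)) {
    throw new Error(`LIVEKIT_URL must start with ws:// or wss://, got "${url}"`);
  }
  return { url, token };
}

// Usage with literal values instead of the real environment.
const config = readLiveKitConfig({
  LIVEKIT_URL: "ws://localhost:7880",
  LIVEKIT_TOKEN: "<your-access-token>",
});
console.log(config?.url);
```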
On first launch, macOS will prompt for:
- Screen Recording — required for capture
- Accessibility — required for the sidecar (a11y tree traversal)
Grant both in System Settings > Privacy & Security. The app may need a restart after granting permissions.
- Launch the app: it appears as a gray circle in the menu bar (no Dock icon)
- Click the tray icon > Start Recording: icon turns red, segments start saving to ~/Documents/Caret/Recordings/
- Click > Start Publishing: icon turns green, screen + audio publish to LiveKit room
- Click > Stop Publishing: returns to red, local recording continues
- Click > Stop Recording: finalizes last segment, icon returns to gray
- Click > Quit: clean shutdown (flushes final segment)
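The steps above describe a three-state machine, which can be sketched as a transition table. The state and action names here are hypothetical (the project defines its own AppState enum in shared/types.ts), and ignoring actions that the steps do not describe, such as stopping recording while publishing, is an assumption.

```typescript
// Hypothetical names; a string union stands in for the project's AppState enum.
type AppState = "idle" | "recording" | "publishing";
type Action = "startRecording" | "startPublishing" | "stopPublishing" | "stopRecording";

// Icon color per state, as described in the usage steps.
const ICON_COLOR: Record<AppState, string> = {
  idle: "gray",
  recording: "red",
  publishing: "green",
};

// Only the transitions listed in the usage steps are allowed.
const TRANSITIONS: Record<AppState, Partial<Record<Action, AppState>>> = {
  idle: { startRecording: "recording" },
  recording: { startPublishing: "publishing", stopRecording: "idle" },
  publishing: { stopPublishing: "recording" },
};

function nextState(state: AppState, action: Action): AppState {
  // Assumption: an action that is not valid in the current state is a no-op.
  return TRANSITIONS[state][action] ?? state;
}

// Usage: walk through record, publish, unpublish, stop.
let state: AppState = "idle";
for (const action of ["startRecording", "startPublishing", "stopPublishing", "stopRecording"] as Action[]) {
  state = nextState(state, action);
  console.log(action, "->", state, ICON_COLOR[state]);
}
```

A table like this also tells the tray which menu items to enable: only the actions with an entry for the current state.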
~/Documents/Caret/Recordings/
├── 2025-01-15T10-30-00-000Z_000.webm # Video segment 1
├── 2025-01-15T10-30-00-000Z_001.webm # Video segment 2
└── context-2025-01-15T10-30-00-000Z.jsonl # Sidecar context data
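The file names in the listing above look like an ISO 8601 timestamp with `:` and `.` replaced by `-`, followed by a zero-padded segment index. A sketch of helpers that reproduce that pattern follows; the function names are hypothetical and the pattern is inferred from the example names, not taken from the source code.

```typescript
// Hypothetical helpers that reproduce the naming pattern shown in the listing.
function sessionStamp(start: Date): string {
  // "2025-01-15T10:30:00.000Z" -> "2025-01-15T10-30-00-000Z" (safe as a file name)
  return start.toISOString().replace(/[:.]/g, "-");
}

function segmentFileName(start: Date, index: number): string {
  return `${sessionStamp(start)}_${String(index).padStart(3, "0")}.webm`;
}

function contextFileName(start: Date): string {
  return `context-${sessionStamp(start)}.jsonl`;
}

// Usage
const sessionStart = new Date("2025-01-15T10:30:00.000Z");
console.log(segmentFileName(sessionStart, 0)); // 2025-01-15T10-30-00-000Z_000.webm
console.log(contextFileName(sessionStart));    // context-2025-01-15T10-30-00-000Z.jsonl
```

Because every segment shares the session timestamp, sorting the directory by name keeps the segments of one session together and in order.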
caret-recorder/
├── src/
│ ├── main.ts — Electron entry, Chromium flags, hidden window, IPC
│ ├── tray.ts — System tray icon + context menu (3 states)
│ ├── preload.ts — contextBridge IPC exposure
│ ├── renderer.ts — Orchestrator (capture + recorder + livekit + sidecar data)
│ ├── capture.ts — getDisplayMedia wrapper
│ ├── recorder.ts — Dual MediaRecorder + 5-min segmentation
│ ├── livekit.ts — LiveKit room + track publish/unpublish/data
│ ├── sidecar/
│ │ ├── sidecar-manager.ts — Spawn/readline/restart Swift binary
│ │ └── types.ts — SidecarEvent, payload types
│ └── shared/
│ └── types.ts — AppState enum, IPC channels, config constants
├── sidecar-swift/ — Swift sidecar source (macOS a11y observer)
│ ├── Package.swift
│ └── Sources/CaretSidecar/
│ ├── main.swift
│ ├── FrontmostAppObserver.swift
│ ├── AccessibilityTraversal.swift
│ ├── Payloads.swift
│ └── JSONOutput.swift
├── sidecar-bin/ — Built sidecar binary (gitignored)
├── forge.config.ts — Electron Forge + Vite config, extraResource for sidecar
├── index.html — Minimal shell (hidden window)
└── package.json
Screen + Audio → getDisplayMedia → MediaStream
├── SegmentedRecorder → WebM segments → disk (~/Documents/Caret/Recordings/)
└── LiveKitPublisher → Track.Source.ScreenShare + ScreenShareAudio → LiveKit room
Sidecar stdout → SidecarManager.readline → parsed JSON
├── IPC → renderer → LiveKit data channel (real-time context)
├── disk (context-{timestamp}.jsonl alongside WebM segments)
└── Main process → tray tooltip (current app name)
Offline pipeline that processes recorded JSONL into condensed markdown moments. See docs/pipeline.md for full architecture.
adapters → dedup → buffer → stripper → condensation → moments
The first four stages are mechanical (no LLM). Condensation uses source-specific prompts to extract signal from stripped accessibility tree diffs.
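As an illustration of what a mechanical (non-LLM) stage can look like, here is a minimal dedup sketch. It is not the pipeline's actual dedup logic, which is documented in docs/pipeline.md; the `ContextRecord` shape and the choice to ignore the timestamp when comparing records are assumptions.

```typescript
// Hypothetical record shape; the real JSONL schema may differ.
interface ContextRecord {
  ts: number;   // capture time in ms
  app: string;  // frontmost app name
  text: string; // flattened accessibility text
}

// Keeps the first occurrence of each (app, text) pair and drops later repeats.
function dedup(records: ContextRecord[]): ContextRecord[] {
  const seen = new Set<string>();
  const kept: ContextRecord[] = [];
  for (const record of records) {
    // The timestamp is left out of the key so identical snapshots taken
    // at different times count as duplicates (an assumption).
    const key = JSON.stringify([record.app, record.text]);
    if (seen.has(key)) continue;
    seen.add(key);
    kept.push(record);
  }
  return kept;
}

// Usage
const sample: ContextRecord[] = [
  { ts: 1, app: "Slack", text: "#general" },
  { ts: 2, app: "Slack", text: "#general" },
  { ts: 3, app: "Chrome", text: "#general" },
];
console.log(dedup(sample).map((r) => r.ts));
```

Because the stage is a pure function from records to records, it can be tested on its own and composed with the stages that follow it.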
Condensation requires Codex CLI for LLM calls (gpt-5.4 via ChatGPT account):
npm install -g @openai/codex
codex login
Then run the full pipeline on recorded data:
npx tsx pipeline/run_pipeline.ts ~/Documents/Caret/Recordings/context-*.jsonl
Run the per-stage tests and the benchmark:
npx tsx pipeline/test_dedup.ts
npx tsx pipeline/test_buffer.ts
npx tsx pipeline/test_stripper.ts
npx tsx pipeline/test_condensation.ts
npx tsx pipeline/bench_pipeline.ts ~/Documents/Caret/Recordings/context-*.jsonl
- WebM not MP4: MediaRecorder API limitation in Chromium. VP8+Opus produces WebM; converting to MP4 would require ffmpeg post-processing
- System audio requires macOS 13+: Earlier versions need a virtual audio driver (BlackHole)
- Screen Recording permission requires app restart: macOS caches the grant; first launch may need manual restart
- Hidden window appears in capture: Electron's hidden BrowserWindow is included in full-screen capture (mitigated by 1x1 pixel size + type: 'panel')
- No upload: Segments are saved locally only; background upload is not implemented
- Single monitor: Captures primary display only; multi-monitor selection is not exposed in UI
- LiveKit data channel 15KB limit: Large traversal payloads are chunked automatically, but receiver must reassemble
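For the last limitation, a receiver needs some way to put chunks back together. The sketch below shows one possible scheme, not the wire format this app uses: the `Chunk` envelope (`id`, `index`, `total`, `bytes`), the function names, and the 14,000-byte payload budget (chosen to leave headroom under 15KB) are all assumptions.

```typescript
// Hypothetical chunk envelope; how it is serialized for publishData is left out.
interface Chunk {
  id: string;     // message id shared by all chunks of one payload
  index: number;  // position of this chunk, starting at 0
  total: number;  // number of chunks in the message
  bytes: Uint8Array;
}

// Splits on UTF-8 bytes, so a multi-byte character may straddle two chunks.
function chunkMessage(id: string, json: string, maxBytes: number): Chunk[] {
  if (maxBytes <= 0) throw new RangeError("maxBytes must be positive");
  const data = new TextEncoder().encode(json);
  const total = Math.max(1, Math.ceil(data.length / maxBytes));
  const chunks: Chunk[] = [];
  for (let i = 0; i < total; i++) {
    chunks.push({ id, index: i, total, bytes: data.slice(i * maxBytes, (i + 1) * maxBytes) });
  }
  return chunks;
}

class Reassembler {
  private pending = new Map<string, Map<number, Uint8Array>>();

  // Returns the full string once every chunk of a message has arrived, else null.
  accept(chunk: Chunk): string | null {
    if (chunk.index < 0 || chunk.index >= chunk.total) return null;
    const parts = this.pending.get(chunk.id) ?? new Map<number, Uint8Array>();
    parts.set(chunk.index, chunk.bytes);
    this.pending.set(chunk.id, parts);
    if (parts.size < chunk.total) return null;

    this.pending.delete(chunk.id);
    let length = 0;
    for (let i = 0; i < chunk.total; i++) length += parts.get(i)!.length;
    // Concatenate bytes first and decode once, so split characters survive.
    const joined = new Uint8Array(length);
    let offset = 0;
    for (let i = 0; i < chunk.total; i++) {
      const part = parts.get(i)!;
      joined.set(part, offset);
      offset += part.length;
    }
    return new TextDecoder().decode(joined);
  }
}

// Usage: chunks may arrive in any order.
const MAX_PAYLOAD_BYTES = 14_000; // assumed budget under the 15KB limit
const outgoing = chunkMessage("msg-1", JSON.stringify({ text: "x".repeat(30_000) }), MAX_PAYLOAD_BYTES);
const reassembler = new Reassembler();
const results = outgoing.reverse().map((c) => reassembler.accept(c));
console.log(outgoing.length, results[results.length - 1]?.length);
```

A production receiver would also want to expire incomplete messages, since a lost chunk otherwise leaves its partial entry in memory indefinitely.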
npm test # Run all tests
npm run test:watch # Watch mode
npm run test:coverage # Coverage report
npm test -- --reporter=verbose # See all test names
Tests cover: 1080p/30fps capture, system audio, 5-min WebM segments, LiveKit publish/unpublish, system tray states, sidecar crash recovery, and clean shutdown ordering. The sidecar binary integration test runs automatically when the binary is present and is skipped otherwise.
- Upload pipeline: Background upload of segments to S3/GCS with retry logic
- ffmpeg transcoding: Convert WebM segments to MP4 (H.264+AAC) post-capture for wider compatibility
- Multi-monitor picker: Allow selecting which display to capture via tray submenu
- Performance profiling: Measure actual CPU/memory impact of dual-recorder approach
- Auto-reconnect LiveKit: The SDK handles it, but re-publishing tracks after reconnect needs explicit handling
- Sidecar binary signing: Move from extraResource to Frameworks/ for proper macOS code signing
- End-to-end tests: Playwright/Spectron for full Electron lifecycle (segment files on disk, LiveKit track presence)