Caret Desktop Recorder

A desktop app that records the screen and system audio in the background, with LiveKit WebRTC integration. Built with Electron, TypeScript, and a native Swift sidecar for macOS accessibility APIs.

Screen + system audio capture via ScreenCaptureKit (macOS 13+), no virtual audio driver needed
LiveKit WebRTC publishing — publish/unpublish screen share tracks without interrupting local recording
Gapless segment rotation — dual overlapping MediaRecorder instances for seamless 5-minute WebM chunks
Swift accessibility sidecar — real-time a11y tree traversal with 13 app-specific parsers (Chrome, Slack, Zoom, etc.)
Menu bar app — system tray UI with 3-state icons, no Dock presence
75 tests covering capture, recording, LiveKit, tray, sidecar crash recovery, and shutdown ordering

Architecture

┌─────────────────────────────────────────────────────┐
│ Electron App                                        │
│                                                     │
│  Main Process                                       │
│  ├── Tray (system tray menu, 3 states)              │
│  ├── SidecarManager (spawn/restart Swift binary)    │
│  ├── IPC handlers (file I/O, state relay)           │
│  └── App lifecycle (flags, permissions, shutdown)    │
│                                                     │
│  Renderer Process (hidden window)                   │
│  ├── capture.ts (getDisplayMedia)                   │
│  ├── recorder.ts (MediaRecorder + segmentation)     │
│  ├── livekit.ts (room + track publish)              │
│  └── renderer.ts (orchestrator)                     │
│                                                     │
│  Preload (contextBridge IPC)                        │
└──────────────┬──────────────────────────────────────┘
               │ spawns
               v
┌──────────────────────────────────────────────────────┐
│ Swift Sidecar (observer-sidecar)                     │
│  ├── A11y APIs (AXUIElement)                         │
│  ├── 13 app parsers (Chrome, Slack, Zoom, etc.)      │
│  ├── HashStore (dedupe)                              │
│  ├── SnapshotCapture (diff)                          │
│  └── stdout: JSON Lines                              │
└──────────────────────────────────────────────────────┘

All user interaction happens through the system tray context menu (there is no visible window). The renderer owns MediaStream, MediaRecorder, and livekit-client because they require a browser context. The sidecar communicates over stdio, emitting JSON Lines on stdout.
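The JSON Lines framing on the sidecar's stdout can be illustrated with a small line buffer. This is a minimal sketch, not the project's SidecarManager (which uses readline, per Project Structure); the `JsonLineBuffer` name and the choice to skip malformed lines are assumptions made for illustration.

```typescript
// Hypothetical helper: accumulates stdout chunks and yields one parsed
// object per complete line. A partial trailing line is held until the
// next chunk completes it.
class JsonLineBuffer {
  private pending = "";

  push(chunk: string): unknown[] {
    this.pending += chunk;
    const lines = this.pending.split("\n");
    // The last element is either "" (chunk ended on a newline) or a partial line.
    this.pending = lines.pop() ?? "";

    const events: unknown[] = [];
    for (const line of lines) {
      const trimmed = line.trim();
      if (trimmed === "") continue;
      try {
        events.push(JSON.parse(trimmed));
      } catch {
        // Assumption: malformed lines are dropped rather than treated as fatal.
      }
    }
    return events;
  }
}
```

In a Node main process, each `data` chunk from the child's stdout (decoded as UTF-8) could be passed to `push`, with the returned events relayed over IPC.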

Key Technical Decisions

| Decision | Choice | Rationale |
| --- | --- | --- |
| Screen capture | `setDisplayMediaRequestHandler` with `useSystemPicker: false` | Auto-grants without picker dialog after initial OS permission |
| System audio | `audio: 'loopback'` + Chromium flags | Native macOS 13+ ScreenCaptureKit, no third-party driver needed |
| Video format | WebM (VP8 + Opus) | MediaRecorder limitation; VP8 is more reliable than VP9 across Electron builds |
| Segment rotation | Dual overlapping MediaRecorder instances | Single stop/start has frame drops; two alternating recorders produce seamless segments |
| Duration fix | `fix-webm-duration` | Chromium bug: MediaRecorder writes WebM without duration metadata |
| Sidecar protocol | stdio JSON Lines | Simpler lifecycle than HTTP/WebSocket, no port conflicts |
| LiveKit context | Data channel (`publishData`) | Separate from media tracks, structured JSON, chunked for >15KB |
| Dock hiding | `app.dock.hide()` + `type: 'panel'` + `LSUIElement` | Prevents dock icon reappearing when BrowserWindow interacts |
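The segment rotation decision (start the next recorder before stopping the current one) can be sketched independently of the browser. This is an illustration under assumptions, not the code in recorder.ts: `RecorderLike`, `OverlappingRotator`, and the factory callback are hypothetical names, and the real overlap duration is not stated in this README.

```typescript
// Minimal recorder surface; the browser MediaRecorder has both methods.
interface RecorderLike {
  start(): void;
  stop(): void;
}

// Hypothetical rotator: keeps one active recorder and, on rotation,
// starts the successor first so the two briefly overlap.
class OverlappingRotator {
  private active: RecorderLike | null = null;
  private nextIndex = 0;
  private readonly create: (segmentIndex: number) => RecorderLike;

  constructor(create: (segmentIndex: number) => RecorderLike) {
    this.create = create;
  }

  begin(): void {
    if (this.active) return; // already recording
    this.active = this.create(this.nextIndex++);
    this.active.start();
  }

  rotate(): void {
    if (!this.active) return;
    const previous = this.active;
    const next = this.create(this.nextIndex++);
    next.start();    // successor is running before...
    previous.stop(); // ...the predecessor is finalized
    this.active = next;
  }

  end(): void {
    if (this.active) this.active.stop();
    this.active = null;
  }
}
```

In the renderer, the factory might wrap `new MediaRecorder(stream, options)` and `rotate()` might be driven by a 5-minute timer; both are assumptions about wiring rather than a description of the existing implementation.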

Setup

Prerequisites

Node.js 18+
macOS 13+ (for ScreenCaptureKit audio)
Xcode Command Line Tools (for Swift sidecar build)

Build the Swift Sidecar

The sidecar is a Swift executable in sidecar-swift/ that uses macOS Accessibility APIs to observe the frontmost app.

cd sidecar-swift
swift build --product caret-sidecar
mkdir -p ../sidecar-bin
cp .build/debug/caret-sidecar ../sidecar-bin/observer-sidecar

Install & Run

npm install
npm start

LiveKit Configuration

Set environment variables before starting:

export LIVEKIT_URL=ws://localhost:7880
export LIVEKIT_TOKEN=<your-access-token>
npm start

Or use LiveKit's Meet app to generate a token for testing.
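Reading the two variables can be sketched as a small function. This is a hypothetical helper: the README does not say how the app behaves when a variable is missing, so returning `null` here is an assumption, as are the `LiveKitConfig` and `readLiveKitConfig` names.

```typescript
interface LiveKitConfig {
  url: string;
  token: string;
}

// Takes the environment as a plain record so it can be exercised without
// touching the real process environment.
function readLiveKitConfig(env: Record<string, string | undefined>): LiveKitConfig | null {
  const url = (env.LIVEKIT_URL ?? "").trim();
  const token = (env.LIVEKIT_TOKEN ?? "").trim();
  // Assumption: with either value missing, publishing is simply unavailable.
  if (url === "" || token === "") return null;
  return { url, token };
}
```

In the Electron main process this could be called as `readLiveKitConfig(process.env)`.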

Permissions

On first launch, macOS will prompt for:

Screen Recording — required for capture
Accessibility — required for the sidecar (a11y tree traversal)

Grant both in System Settings > Privacy & Security. The app may need a restart after granting permissions.

Usage

Launch the app — it appears as a gray circle in the menu bar (no Dock icon)
Click the tray icon > Start Recording — icon turns red, segments start saving to ~/Documents/Caret/Recordings/
Click > Start Publishing — icon turns green, screen + audio publish to LiveKit room
Click > Stop Publishing — returns to red, local recording continues
Click > Stop Recording — finalizes last segment, icon returns to gray
Click > Quit — clean shutdown (flushes final segment)
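The tray flow above amounts to a three-state machine. The sketch below models it with a string union rather than the project's `AppState` enum; the state and action names are assumptions, and actions the steps do not describe (for example, stopping recording while publishing) are treated as no-ops.

```typescript
type AppState = "idle" | "recording" | "publishing";
type TrayAction = "startRecording" | "stopRecording" | "startPublishing" | "stopPublishing";

// Icon colors as described in the usage steps.
const ICON_COLOR: Record<AppState, string> = {
  idle: "gray",
  recording: "red",
  publishing: "green",
};

// Returns the next state, or the unchanged state if the action does not
// apply in the current state.
function nextState(state: AppState, action: TrayAction): AppState {
  switch (action) {
    case "startRecording":
      return state === "idle" ? "recording" : state;
    case "startPublishing":
      return state === "recording" ? "publishing" : state;
    case "stopPublishing":
      return state === "publishing" ? "recording" : state;
    case "stopRecording":
      return state === "recording" ? "idle" : state;
  }
}
```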

Recording Output

~/Documents/Caret/Recordings/
├── 2025-01-15T10-30-00-000Z_000.webm   # Video segment 1
├── 2025-01-15T10-30-00-000Z_001.webm   # Video segment 2
└── context-2025-01-15T10-30-00-000Z.jsonl  # Sidecar context data
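The file names in the listing look like an ISO 8601 timestamp with `:` and `.` replaced by `-`, plus a zero-padded segment index. A minimal sketch that reproduces that pattern follows; the function names are hypothetical and the project's actual naming code may differ.

```typescript
// Turns a Date into a filesystem-safe stamp such as 2025-01-15T10-30-00-000Z.
function sessionStamp(sessionStart: Date): string {
  return sessionStart.toISOString().replace(/[:.]/g, "-");
}

// Segment indices are zero-based and padded to three digits, as in the listing.
function segmentFileName(sessionStart: Date, index: number): string {
  return `${sessionStamp(sessionStart)}_${String(index).padStart(3, "0")}.webm`;
}

function contextFileName(sessionStart: Date): string {
  return `context-${sessionStamp(sessionStart)}.jsonl`;
}
```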

Project Structure

caret-recorder/
├── src/
│   ├── main.ts              — Electron entry, Chromium flags, hidden window, IPC
│   ├── tray.ts              — System tray icon + context menu (3 states)
│   ├── preload.ts           — contextBridge IPC exposure
│   ├── renderer.ts          — Orchestrator (capture + recorder + livekit + sidecar data)
│   ├── capture.ts           — getDisplayMedia wrapper
│   ├── recorder.ts          — Dual MediaRecorder + 5-min segmentation
│   ├── livekit.ts           — LiveKit room + track publish/unpublish/data
│   ├── sidecar/
│   │   ├── sidecar-manager.ts  — Spawn/readline/restart Swift binary
│   │   └── types.ts            — SidecarEvent, payload types
│   └── shared/
│       └── types.ts            — AppState enum, IPC channels, config constants
├── sidecar-swift/           — Swift sidecar source (macOS a11y observer)
│   ├── Package.swift
│   └── Sources/CaretSidecar/
│       ├── main.swift
│       ├── FrontmostAppObserver.swift
│       ├── AccessibilityTraversal.swift
│       ├── Payloads.swift
│       └── JSONOutput.swift
├── sidecar-bin/             — Built sidecar binary (gitignored)
├── forge.config.ts          — Electron Forge + Vite config, extraResource for sidecar
├── index.html               — Minimal shell (hidden window)
└── package.json

Data Flow

Screen + Audio → getDisplayMedia → MediaStream
  ├── SegmentedRecorder → WebM segments → disk (~/Documents/Caret/Recordings/)
  └── LiveKitPublisher → Track.Source.ScreenShare + ScreenShareAudio → LiveKit room

Sidecar stdout → SidecarManager.readline → parsed JSON
  ├── IPC → renderer → LiveKit data channel (real-time context)
  ├── disk (context-{timestamp}.jsonl alongside WebM segments)
  └── Main process → tray tooltip (current app name)

Perception Pipeline

Offline pipeline that processes recorded JSONL into condensed markdown moments. See docs/pipeline.md for full architecture.

adapters → dedup → buffer → stripper → condensation → moments

The first four stages are mechanical (no LLM). Condensation uses source-specific prompts to extract signal from stripped accessibility tree diffs.
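As an illustration of what a mechanical dedup stage can look like, the sketch below drops payloads whose text has been seen before, keyed by a content hash. It is not the pipeline's implementation: the FNV-1a hash, the `SeenHashes` name, and keying on the full text are all assumptions, and a 32-bit hash can collide, so the length is included in the key to reduce false matches.

```typescript
// 32-bit FNV-1a over UTF-16 code units (not bytes); adequate for a sketch.
function fnv1a(text: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return hash >>> 0;
}

// Hypothetical store: remembers a compact key per payload instead of the payload itself.
class SeenHashes {
  private readonly seen = new Set<string>();

  // Returns true the first time a given text is offered, false afterwards.
  isNew(text: string): boolean {
    const key = fnv1a(text).toString(16) + ":" + text.length;
    if (this.seen.has(key)) return false;
    this.seen.add(key);
    return true;
  }
}
```

A JSONL record stream could then be filtered with something like `records.filter((r) => store.isNew(JSON.stringify(r)))`.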

LLM Setup

Condensation requires Codex CLI for LLM calls (gpt-5.4 via ChatGPT account):

npm install -g @openai/codex
codex login

Then run the full pipeline on recorded data:

npx tsx pipeline/run_pipeline.ts ~/Documents/Caret/Recordings/context-*.jsonl

Pipeline Tests & Benchmarks

npx tsx pipeline/test_dedup.ts
npx tsx pipeline/test_buffer.ts
npx tsx pipeline/test_stripper.ts
npx tsx pipeline/test_condensation.ts
npx tsx pipeline/bench_pipeline.ts ~/Documents/Caret/Recordings/context-*.jsonl

Known Limitations

WebM not MP4 — MediaRecorder API limitation in Chromium. VP8+Opus produces WebM; converting to MP4 would require ffmpeg post-processing
System audio requires macOS 13+ — Earlier versions need a virtual audio driver (BlackHole)
Screen Recording permission requires app restart — macOS caches the grant; first launch may need manual restart
Hidden window appears in capture — Electron's hidden BrowserWindow is included in full-screen capture (mitigated by 1x1 pixel size + type: 'panel')
No upload — Segments are saved locally only; background upload is not implemented
Single monitor — Captures primary display only; multi-monitor selection is not exposed in UI
LiveKit data channel 15KB limit — Large traversal payloads are chunked automatically, but receiver must reassemble
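The chunking and reassembly mentioned in the last limitation can be sketched as follows. The `Chunk` shape, the 14 KiB chunk size (chosen to leave headroom under 15KB), and the function names are assumptions for illustration; the actual wire format used with `publishData` is not described in this README.

```typescript
interface Chunk {
  id: string;        // identifies which payload the chunk belongs to
  seq: number;       // zero-based position within the payload
  total: number;     // number of chunks in the payload
  bytes: Uint8Array; // slice of the UTF-8 encoded JSON
}

const MAX_CHUNK_BYTES = 14 * 1024; // assumption: headroom under the 15KB limit

function chunkPayload(id: string, payload: unknown, maxBytes: number = MAX_CHUNK_BYTES): Chunk[] {
  const encoded = new TextEncoder().encode(JSON.stringify(payload));
  const total = Math.max(1, Math.ceil(encoded.length / maxBytes));
  const chunks: Chunk[] = [];
  for (let seq = 0; seq < total; seq++) {
    chunks.push({ id, seq, total, bytes: encoded.subarray(seq * maxBytes, (seq + 1) * maxBytes) });
  }
  return chunks;
}

// Accepts chunks in any order; throws if some are missing.
function reassemble(chunks: Chunk[]): unknown {
  if (chunks.length === 0) throw new Error("no chunks");
  if (chunks.length !== chunks[0].total) throw new Error("missing chunks");
  const ordered = [...chunks].sort((a, b) => a.seq - b.seq);
  const size = ordered.reduce((n, c) => n + c.bytes.length, 0);
  const joined = new Uint8Array(size);
  let offset = 0;
  for (const c of ordered) {
    joined.set(c.bytes, offset);
    offset += c.bytes.length;
  }
  // Decode only after joining, so a multi-byte character split across chunks survives.
  return JSON.parse(new TextDecoder().decode(joined));
}
```

A receiver would collect chunks by `id` until `total` have arrived, then call `reassemble`. How the header fields are serialized alongside the bytes is left open here.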

Testing

npm test                        # Run all tests
npm run test:watch              # Watch mode
npm run test:coverage           # Coverage report
npm test -- --reporter=verbose  # See all test names

Tests cover: 1080p/30fps capture, system audio, 5-min WebM segments, LiveKit publish/unpublish, system tray states, sidecar crash recovery, and clean shutdown ordering. The sidecar binary integration test runs automatically when the binary is present and is skipped otherwise.

Improvements

Upload pipeline — Background upload of segments to S3/GCS with retry logic
ffmpeg transcoding — Convert WebM segments to MP4 (H.264+AAC) post-capture for wider compatibility
Multi-monitor picker — Allow selecting which display to capture via tray submenu
Performance profiling — Measure actual CPU/memory impact of dual-recorder approach
Auto-reconnect LiveKit — The SDK handles it, but re-publishing tracks after reconnect needs explicit handling
Sidecar binary signing — Move from extraResource to Frameworks/ for proper macOS code signing
End-to-end tests — Playwright/Spectron for full Electron lifecycle (segment files on disk, LiveKit track presence)
