feat(studio-desktop): signed background helper — db/serve survive app close/crash (#599)#604

Open
Necmttn wants to merge 14 commits into main from feat/599-studio-app-register-db-serve-via

@Necmttn Necmttn commented Jun 24, 2026

Closes #599.

A signed macOS background helper for ax studio.app that owns the surreal + serve + ingest backend via launchd (SMAppService agentService), so the data plane survives the app closing or crashing. Built to fix a real incident: studio.app crashed, orphaned + wedged its :8521 db, and the graph silently went 4 days stale with ax daemon status still reporting "listening."

What shipped (8 tasks, TDD, each reviewed)

| Area | Change |
| --- | --- |
| Helper runtime | ax serve --managed-db (supervise bundled surreal child) + --ingest-every loop - additive, useful for all users |
| Incident fix | Real-SELECT 1 wedge watchdog → SIGKILL+respawn (KeepAlive only restarts on exit; the incident was a wedge) |
| Bundle | Form-A LaunchAgent plist (BundleProgram=bundled bun → ax-src serve), electron-builder wiring, bun install into staged ax-src |
| Registration | ElectronApp.registerBackgroundHelper/unregister/helperStatus via setLoginItemSettings({type:'agentService'}), fail-soft startup |
| UI | Attach-mode invariants pinned - UI never double-spawns, never kills the helper on quit |
| Visibility | ax daemon status now flags a wedged db (real query probe) + ax daemon restart hint |
| Hardening | Idempotent pre-spawn :8521 probe (attach-if-healthy) - no orphan restart storm |

Design notes (two spike-driven pivots, full rationale in the plan)

  • The issue's literal "SMAppService compiled helper" was infeasible: the Electron main binary can't be a launchd agent (needs WindowServer), and the compiled axctl can't live-ingest (lmdb won't bundle). Form A = launchd → bundled bun → ax-src serve instead.
  • This is Option A from docs/superpowers/specs/2026-06-16-smappservice-background-helper-design.md - previously rejected in favor of the IDE model, now implemented because the 4-day stall made true app-closed capture a hard requirement (the spec's own named escape hatch).

Plan: docs/superpowers/plans/2026-06-24-studio-helper-smappservice.md · Contract: docs/superpowers/notes/2026-06-24-agentservice-contract.md

⚠️ Maintainer signed-build smoke (required before relying on the helper)

These need a real notarized Developer-ID build and could not run in CI:

  • S1 - signed-build install; quit the app, confirm surreal+serve still answer (curl :1738/api/version), launchctl list | grep ax-studio-helper exit 0.
  • S2 (load-bearing) - verify the plist's bundle-relative ProgramArguments actually resolve under launchd (default cwd is /). If ax serve doesn't start, switch BundleProgram to the §4 shell-wrapper (spike-validated, cwd-independent).
  • S3 - wedge-recovery smoke (kill -STOP surreal → watchdog SIGKILL+respawn) and hard-crash orphan reap (kill the helper's bun hard → confirm idempotent restart attaches, no storm).
  • S4 - System Settings → Login Items shows ONE "ax studio" Developer-ID item (not "bash - unidentified developer").

Review status

Final whole-branch review: ready to merge pending the S1–S4 smoke, no Critical. One Important (orphan restart storm) fixed in this branch. 12 Minor findings triaged keep-as-followup (in the plan's progress ledger). Branch tests green (axctl 454, dashboard 430, studio-desktop 63).

Necmttn and others added 14 commits June 24, 2026 11:20
SMAppService Option A — single signed agentService helper owns the
backend (surreal+serve+ingest) via launchd, UI attaches. Includes the
real-query wedge watchdog (the actual incident fix; KeepAlive alone
misses a hung db) and a daemon-status wedge probe.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- serviceName = plist name (no .plist extension); plist at Contents/Library/LaunchAgents/
- BundleProgram is relative to the .app root (not absolute)
- Electron source confirms: agentService → SMAppService.agentServiceWithPlistName:
- Status values: not-registered | enabled | requires-approval | not-found
- Gate resolved: main Electron binary is NOT usable as agentService program (no display)
- Decided strategy B: separate compiled helper binary at Contents/Library/LaunchAgents/ax-serve-helper
- electron-builder auto-signs all Mach-Os; helper staged via extraFiles

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…ed-db

Spikes 1+1b: helper = launchd→bundled bun→ax-src serve (Form A, no separate
compiled binary; compiled axctl can't ingest). Task 2 reshaped to
'ax serve --managed-db --ingest-every'. No StandardOutPath (macOS 14.4+).
stage-ax-source must bun install the bundle.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…loop (#599)

- managed-db.ts: resolveManagedSurrealPath (sibling-of-execPath) + makeManagedDb
  (Effect that spawns bundled surreal, waits /health, Scope-finalized kill)
- serve-ingest-loop.ts: runIngestLoop (Schedule.spaced, fail-soft per iteration)
  with no-op TraceSink + caller-supplied baseLayer
- parseDurationString: parse compact strings '2m','30s','1h','500ms'
- serve command: --managed-db (boolean) + --ingest-every (optional string)
- serveDashboard: spawn surreal before server when --managed-db, fork ingest
  loop via Effect.runFork when --ingest-every; managed-db scope closed on
  shutdown after serve runtime disposes
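
For readers unfamiliar with the compact duration format accepted by --ingest-every, here is a minimal sketch of a parser for the four example strings above. It is an independent illustration, not the PR's parseDurationString; the millisecond return unit and the `undefined`-on-bad-input behavior are assumptions.

```typescript
// Hypothetical re-implementation for illustration; the PR's real function may
// return a different type (e.g. an Effect Duration) or reject input differently.
const UNIT_TO_MS: Record<string, number> = {
  ms: 1,
  s: 1_000,
  m: 60_000,
  h: 3_600_000,
};

// Parses compact strings such as '2m', '30s', '1h', '500ms' into milliseconds.
// Returns undefined for anything that is not <digits><unit>.
function parseDurationString(input: string): number | undefined {
  // 'ms' is listed first so that '500ms' is not read as '500m' plus a stray 's'.
  const match = /^(\d+)(ms|s|m|h)$/.exec(input.trim());
  if (match === null) return undefined;
  const amount = Number(match[1]);
  const factor = UNIT_TO_MS[match[2] ?? ""];
  if (factor === undefined) return undefined;
  return amount * factor;
}
```

Under these assumptions `--ingest-every=2m` would resolve to 120000 ms, and a malformed value would be rejected before the loop is forked.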

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…599)

- Add SurrealWatchdog.ts: pure, TestClock-drivable probe loop
  (sleep-first, Ref counter, trips after N consecutive failures,
  re-arms after each trip)
- Wire into makeManagedDb: SELECT 1 probe with 1s timeout (not /health,
  which passes on a wedge), SIGKILL on trip (SIGTERM was ignored in the
  incident), respawn + re-probe health
- Capture managedDbScope via Effect.scope so spawnAndReady can be called
  from onWedged (R=never) while still registering spawner finalizers in
  the outer scope
- 7 TDD tests covering: trip, no-trip, re-arm, success-reset, mixed
  probes, Effect-failure-as-not-ok, interruption
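
The trip logic described above (counter of consecutive failures, reset on success, re-arm after each trip) can be sketched without Effect as a pure step function plus a small sleep-first driver. All names here (`stepWatchdog`, `runWatchdog`, `countTrips`) are hypothetical and are not the identifiers in SurrealWatchdog.ts; the real module is built on Effect, Ref and TestClock.

```typescript
// Hypothetical plain-TypeScript sketch of the watchdog's counting rules.
interface WatchdogState {
  readonly consecutiveFailures: number;
}

const INITIAL_STATE: WatchdogState = { consecutiveFailures: 0 };

// One probe result in, next state plus a "tripped" flag out.
function stepWatchdog(
  state: WatchdogState,
  probeOk: boolean,
  threshold: number,
): { state: WatchdogState; tripped: boolean } {
  // Any success resets the counter.
  if (probeOk) return { state: INITIAL_STATE, tripped: false };
  const failures = state.consecutiveFailures + 1;
  // Trip on the Nth consecutive failure, then re-arm from zero.
  if (failures >= threshold) return { state: INITIAL_STATE, tripped: true };
  return { state: { consecutiveFailures: failures }, tripped: false };
}

// Folds a recorded probe history and counts how often the watchdog would trip.
function countTrips(probes: readonly boolean[], threshold: number): number {
  let state = INITIAL_STATE;
  let trips = 0;
  for (const ok of probes) {
    const next = stepWatchdog(state, ok, threshold);
    state = next.state;
    if (next.tripped) trips += 1;
  }
  return trips;
}

// Sleep-first driver: waits one interval before the first probe, treats a
// rejected probe as "not ok", and calls onWedged on every trip.
async function runWatchdog(opts: {
  probe: () => Promise<boolean>;
  onWedged: () => Promise<void>;
  sleep: (ms: number) => Promise<void>;
  shouldStop: () => boolean;
  intervalMs: number;
  threshold: number;
}): Promise<void> {
  let state = INITIAL_STATE;
  while (!opts.shouldStop()) {
    await opts.sleep(opts.intervalMs);
    if (opts.shouldStop()) return;
    const ok = await opts.probe().catch(() => false);
    const next = stepWatchdog(state, ok, opts.threshold);
    state = next.state;
    if (next.tripped) await opts.onWedged();
  }
}
```

Keeping the counting rule pure is what makes it testable without real timers; in this sketch `onWedged` is where a SIGKILL and respawn would be wired in.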

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…st (#599)

- Add build/LaunchAgents/com.necmttn.ax-studio.helper.plist (Form A plist):
  BundleProgram=Contents/Resources/bin/arm64/bun, ProgramArguments=[bun,
  ax-src entry, serve, --managed-db, --port=1738, --ingest-every=2m].
  KeepAlive=true, RunAtLoad=true, ProcessType=Background, ThrottleInterval=5,
  SoftResourceLimits.NumberOfFiles=65536. NO StandardOutPath/StandardErrorPath
  (macOS 14.4+ SMAppService rejection guard).

- Add electron-builder.config.cjs extraFiles entry placing the plist at
  Contents/Library/LaunchAgents/com.necmttn.ax-studio.helper.plist (required
  location for SMAppService agentServiceWithPlistName: lookup).

- Add scripts/verify-helper-bundle.ts (plutil -convert json parser) +
  scripts/verify-helper-bundle.test.ts (TDD gate: 15 assertions covering
  required keys, BundleProgram arch pattern, ProgramArguments serve flags,
  and the critical NO-StandardOutPath/StandardErrorPath invariant).

- stage-ax-source.ts already has bun install --linker hoisted (added in
  earlier PRs); no changes needed. Confirmed via git log.
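
As an illustration of the kind of gate the verify script provides, here is a minimal sketch that checks an already-parsed plist object (for example the JSON produced by plutil) against the invariants listed above. The function name, the `arm64|x64` arch pattern, and the error strings are assumptions; this is not the content of scripts/verify-helper-bundle.ts.

```typescript
// Hypothetical checker; operates on the plist after it has been converted to JSON.
type Plist = Record<string, unknown>;

const REQUIRED_ARGS = ["serve", "--managed-db", "--port=1738", "--ingest-every=2m"];
// Assumed arch pattern: bundle-relative, full Contents/Resources prefix required.
const BUNDLE_PROGRAM_PATTERN = /^Contents\/Resources\/bin\/(arm64|x64)\/bun$/;
// Keys the commit message says must be absent (macOS 14.4+ rejection guard).
const FORBIDDEN_KEYS = ["StandardOutPath", "StandardErrorPath"];

// Returns a list of human-readable problems; an empty list means the plist passes.
function verifyHelperPlist(plist: Plist): string[] {
  const problems: string[] = [];

  const bundleProgram = plist["BundleProgram"];
  if (typeof bundleProgram !== "string" || !BUNDLE_PROGRAM_PATTERN.test(bundleProgram)) {
    problems.push("BundleProgram must be a bundle-relative path to the bundled bun");
  }

  const args = plist["ProgramArguments"];
  const argList = Array.isArray(args) ? args : [];
  for (const required of REQUIRED_ARGS) {
    if (!argList.includes(required)) problems.push(`ProgramArguments is missing ${required}`);
  }

  for (const key of ["KeepAlive", "RunAtLoad"]) {
    if (plist[key] !== true) problems.push(`${key} must be true`);
  }

  for (const key of FORBIDDEN_KEYS) {
    if (key in plist) problems.push(`${key} must not be present`);
  }

  return problems;
}
```

Returning every violation rather than failing on the first keeps a CI log useful when several keys drift at once.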

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…gram in helper plist test (#599)

- Replace misleading launchd cwd assumption with honest caveat and fallback reference
- Tighten BundleProgram regex to require full Contents/Resources prefix, not just suffix
- Add assertions for --port=1738 and --ingest-every=2m flags in ProgramArguments

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…Service (#599)

- Add AgentServiceStatus type + BACKGROUND_HELPER_SERVICE_NAME const to ElectronApp.ts
- Add ElectronAppLike interface for testable dependency injection
- Export makeFrom(app: ElectronAppLike) factory (used by tests + production layer)
- Add registerBackgroundHelper / unregisterBackgroundHelper / helperStatus
  to ElectronAppShape and implementation (darwin-guarded, no-op elsewhere)
- Startup: call registerBackgroundHelper after setOpenAtLogin in DesktopApp.ts
  (both are kept - mainAppService and agentService are independent)
  logs 'requires-approval' nudge pointing to System Settings → Login Items
- Test: 7 assertions in ElectronApp.test.ts using makeFrom + stub (no mock.module)
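
A rough sketch of how the injected-app pattern described above might look follows. The type and function names are taken from this commit message, but the bodies, the `platform` parameter, the service-name value (derived from the plist filename), and the choice to return `undefined` off darwin are assumptions rather than the code in ElectronApp.ts. The `type`, `serviceName`, and `status` fields follow Electron's documented login-item settings.

```typescript
type AgentServiceStatus = "not-registered" | "enabled" | "requires-approval" | "not-found";

interface AgentServiceRef {
  type: "agentService";
  serviceName: string;
}

// Minimal slice of Electron's `app` needed here, so tests can pass a stub.
interface ElectronAppLike {
  setLoginItemSettings(settings: AgentServiceRef & { openAtLogin: boolean }): void;
  getLoginItemSettings(options: AgentServiceRef): { status?: string };
}

// Assumed value: the plist name without the .plist extension.
const BACKGROUND_HELPER_SERVICE_NAME = "com.necmttn.ax-studio.helper";

const KNOWN_STATUSES: readonly AgentServiceStatus[] = [
  "not-registered",
  "enabled",
  "requires-approval",
  "not-found",
];

function makeFrom(app: ElectronAppLike, platform: string) {
  const ref: AgentServiceRef = { type: "agentService", serviceName: BACKGROUND_HELPER_SERVICE_NAME };
  const isDarwin = platform === "darwin";

  // Reads the current status; undefined off darwin or if the call fails.
  const helperStatus = (): AgentServiceStatus | undefined => {
    if (!isDarwin) return undefined;
    try {
      const status = app.getLoginItemSettings(ref).status;
      return KNOWN_STATUSES.find((known) => known === status);
    } catch {
      return undefined;
    }
  };

  // Fail-soft: a registration error must not block app startup.
  const setEnabled = (openAtLogin: boolean): AgentServiceStatus | undefined => {
    if (!isDarwin) return undefined;
    try {
      app.setLoginItemSettings({ ...ref, openAtLogin });
    } catch {
      // Swallowed on purpose; the caller inspects the returned status instead.
    }
    return helperStatus();
  };

  return {
    registerBackgroundHelper: () => setEnabled(true),
    unregisterBackgroundHelper: () => setEnabled(false),
    helperStatus,
  };
}
```

In this sketch a caller would log the Login Items nudge whenever `registerBackgroundHelper()` returns 'requires-approval'.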

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…wnership

- AxDaemonArbitration.test.ts: add 3 tests pinning the launchd-helper
  invariant (daemonHealthy short-circuits to 'attach' regardless of
  surrealHealthy or portsFree) so a future refactor can't silently break
  the no-double-spawn guarantee.

- AxBackendManager.test.ts: add explicit stop()-in-attach-mode test
  asserting zero process events, pinning the quit-safety invariant that
  the app never kills the helper's surreal/ax-serve on quit.

Both sets of assertions already held (no production code changed); these
tests make the invariants regression-proof and document the launchd-helper
ownership contract.
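
To make the pinned invariant concrete, here is a small sketch of an arbitration function in which a healthy daemon always wins. Only the 'attach' short-circuit is stated in this PR; the function name and the other two outcomes ('spawn' and 'blocked') are hypothetical placeholders for whatever AxDaemonArbitration actually returns.

```typescript
// Hypothetical decision type; only 'attach' is confirmed by the commit message.
type Decision = "attach" | "spawn" | "blocked";

interface Observations {
  daemonHealthy: boolean;
  surrealHealthy: boolean;
  portsFree: boolean;
}

function arbitrate(obs: Observations): Decision {
  // Invariant: a healthy daemon (for example the launchd helper) short-circuits
  // to 'attach', regardless of surrealHealthy or portsFree.
  if (obs.daemonHealthy) return "attach";
  // Assumed fallbacks, for illustration only.
  if (obs.portsFree) return "spawn";
  return "blocked";
}

// Exhaustive check of the invariant over the two remaining inputs.
function attachInvariantHolds(): boolean {
  for (const surrealHealthy of [true, false]) {
    for (const portsFree of [true, false]) {
      if (arbitrate({ daemonHealthy: true, surrealHealthy, portsFree }) !== "attach") {
        return false;
      }
    }
  }
  return true;
}
```

Because the input space is tiny, enumerating all combinations is cheap, which is roughly what the three added tests do for the real arbitration code.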

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…aemonStatus wedged branch

- Add probeDbQueryPure(host, port): async, SELECT 1 via HTTP /sql, 1.5s
  AbortSignal.timeout, fail-CLOSED (any error/timeout → false).
  Exported for unit tests so dead-port fail-closed is verified without
  touching the live :8521 db.
- Add dbQueryOk: boolean | null to DaemonStatus (null when not listening,
  true = healthy, false = wedged).
- collectDaemonStatus calls probeDbQueryPure via Effect.promise when the
  port is listening.
- formatDaemonStatus new wedged branch: listening but NOT answering queries
  (wedged) — shows the word 'wedged' + 'ax daemon restart' hint.
- TDD: wedge test written first (failing), then implementation (26 tests green).

Fixes: ax daemon status silently reported 'listening' during the production
wedge incident because probePort only checked the socket, never ran a query.
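
The difference between a socket check and a real query probe can be sketched as follows. This is an illustration under assumptions, not the PR's probeDbQueryPure: the injectable `fetchImpl` parameter, the `isHealthySqlBody` helper, and the expected response shape (a JSON array of entries with `status: "OK"`) are hypothetical, and a real SurrealDB endpoint will likely also need namespace, database, and auth headers.

```typescript
interface ProbeResponse {
  ok: boolean;
  json(): Promise<unknown>;
}

// Narrow fetch signature so tests can inject a stub instead of hitting :8521.
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string; signal: AbortSignal },
) => Promise<ProbeResponse>;

// Assumed response shape: a non-empty array whose entries all report status "OK".
function isHealthySqlBody(body: unknown): boolean {
  return (
    Array.isArray(body) &&
    body.length > 0 &&
    body.every(
      (entry) =>
        typeof entry === "object" &&
        entry !== null &&
        (entry as { status?: unknown }).status === "OK",
    )
  );
}

// Fail-closed probe: any error, timeout, non-2xx, or odd body yields false.
async function probeDbQuery(
  host: string,
  port: number,
  fetchImpl: FetchLike = fetch,
  timeoutMs = 1_500,
): Promise<boolean> {
  try {
    const response = await fetchImpl(`http://${host}:${port}/sql`, {
      method: "POST",
      headers: { Accept: "application/json" },
      body: "SELECT 1", // query text as named in the commit message
      signal: AbortSignal.timeout(timeoutMs), // a wedged db aborts here instead of hanging
    });
    if (!response.ok) return false;
    return isHealthySqlBody(await response.json());
  } catch {
    return false;
  }
}
```

A wedged database accepts the TCP connection but never answers, so in this sketch the timeout is the branch that turns "listening" into `false`.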

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…#599)

- Status updated: Option C (IDE model) → Option A implemented
- UPDATE 2026-06-24 callout at top (4-day-stall incident, plan link)
- Option A 'Rejected alternatives' entry annotated as NOW IMPLEMENTED
  with Form A summary (bundled bun BundleProgram, --managed-db, watchdog)
- 'Not doing (v0)' Option A entry struck through
- Added 'Operating the helper' operator guide section covering:
  verify (launchctl/ax daemon status/curl), uninstall (4 paths),
  and open cwd/relative-path smoke item with shell-wrapper fallback note

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…al, no orphan restart storm (#599)

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@cloudflare-workers-and-pages

Deploying ax with Cloudflare Pages

Latest commit: dfb96ae
Status: ✅  Deploy successful!
Preview URL: https://e4475b60.ax-62d.pages.dev
Branch Preview URL: https://feat-599-studio-app-register.ax-62d.pages.dev


Development

Successfully merging this pull request may close these issues.

studio app: register db/serve via SMAppService (KeepAlive) so the daemon survives app close/crash