Skip to content

v0.9 sub-issue #6: Stage 5 Guard (withdrawal / lifecycle / expiry watchers + clean teardown + audit emitter) #125

@devin-ai-integration

Description

@devin-ai-integration

v0.9 Sub-issue #6 — Stage 5 Guard

Part of v0.9 epic.

Implements v0.7 §4.3 + §5 + §6 + v0.8 Part B Stage 5 (Guard): the
withdrawal watcher, lifecycle watcher, expiry watcher, audit emitter
and clean teardown. After this sub-issue merges, lifectl run is
fully functional except for the integration tests + echo Provider +
docs (sub-issue 7).

Spec ref

  • docs/LIFE_RUNTIME_STANDARD.md §4.3 (withdrawal polling + revocation)
  • docs/LIFE_RUNTIME_STANDARD.md §5 (audit emission)
  • docs/LIFE_RUNTIME_STANDARD.md §6 (termination)
  • docs/LIFE_RUNTIME_STANDARD.md Part B §B.7 (withdrawal_poll,
    lifecycle_transition_observed)
  • docs/LIFE_LIFECYCLE_SPEC.md (lifecycle states)

Watchers

Each watcher runs as a daemon thread (or async task) started after
Stage 3's mount_succeeded event.

Withdrawal watcher (§4.3)

  • Polls life-package.withdrawal_endpoint at minimum every 24h.
    CLI flag --poll-interval-override <seconds> (test-only) reduces this
    for the conformance harness.
  • Each poll emits withdrawal_poll{endpoint, result, http_status}
    result is one of not_revoked / revoked / unreachable / malformed.
  • On revoked: trigger graceful teardown with reason withdrawal;
    emit unmount{reason: "withdrawal"}.
  • On 3 consecutive unreachable polls (≥ 72h gap): emit
    withdrawal_unreachable_warning{consecutive_failures} but do NOT
    unmount (per §4.3 — runtime continues serving but flags the user).

Lifecycle watcher

  • Polls the package's lifecycle/lifecycle.json::lifecycle_state at
    the same cadence as the withdrawal watcher.
  • Detects transitions:
    • active → superseded: emit lifecycle_transition_observed;
      teardown with reason lifecycle if the user has set
      ~/.config/dlrs/runtime.json::on_supersede = "unmount" (default
      "warn"); else flag user with a banner.
    • active → frozen: enter memorial read-only mode (Stage 4 starts
      refusing new turns; replays only). v0.9 implementation: emit a
      transition event + print a banner "this .life is now frozen
      in memorial state — read-only" and refuse further input. Memorial
      holographic echo (the Q8 D extension) is v0.10+.
    • active → withdrawn: teardown with reason lifecycle.

For pointer-mode .life, the lifecycle file may be remote (per
lifecycle spec). v0.9 reads from local cache only; warns but does NOT
auto-fetch (offline-first).

Expiry watcher

  • Reads life-package.expires_at once at mount time. Schedules a
    callback at that time (using threading.Timer or asyncio.call_at).
  • On fire: emit expiry_reached{expires_at} + teardown with reason
    expiry.

Teardown sequence

When any of the three watchers (or Ctrl-C / explicit lifectl quit)
triggers teardown:

  1. Stop accepting new turns at Stage 4 Run loop (the loop checks a
    shared Event flag every iteration).
  2. Wait for any in-flight invoke() to return (≤ 30s timeout; SIGTERM
    the subprocess host after that).
  3. Call teardown() on every bound LifeCapabilityProvider.
  4. SIGTERM all user_installed subprocess hosts; SIGKILL after 5s if
    they don't exit.
  5. Remove any temp-extracted files from the zip mount (no raw-asset
    leakage per §3.3).
  6. Emit unmount{reason} as the final event in the audit chain.
  7. lifectl process exits 0 (normal teardown) or non-zero (error).

Audit emitter (v0.4 hash chain)

runtime/audit/emitter.py (created in sub-issue 1, fleshed out here):

  • Hash-chained appender to audit/events.jsonl inside the runtime's
    per-mount data directory (~/.local/share/dlrs/mounts/<package_id>/).
    This is the runtime's local audit log, distinct from the .life's
    bundled audit/events.jsonl (which is read-only at runtime).
  • Each event {event_type, occurred_at, actor: "runtime/<version>", prev_hash, ...fields}.
  • prev_hash is sha256(prev_event_json_canonical) per v0.4 hash chain.
  • The first runtime event (mount_attempted) chains its prev_hash from
    the bundled audit log's tip (this is what links the runtime's session
    log back to the .life's issuer-emitted chain).

Module layout

runtime/guard/
├── __init__.py            # exports start_watchers(verify_result, assemble_result, on_teardown)
├── withdrawal_watcher.py
├── lifecycle_watcher.py
├── expiry_watcher.py
└── teardown.py            # signal-handlers + ordered teardown sequence

runtime/audit/
├── __init__.py
└── emitter.py             # full hash-chain implementation (was stub in sub1)

Audit events emitted

  • withdrawal_poll{endpoint, result, http_status} — every poll.
  • withdrawal_unreachable_warning{consecutive_failures} — at 3+ failures.
  • lifecycle_transition_observed{old_state, new_state, package_id}
    any state change.
  • expiry_reached{expires_at} — once at the scheduled time.
  • unmount{reason} — the terminal event.

CLI surface

lifectl run after this PR exits cleanly on:

  • Ctrl-C (SIGINT) → unmount{reason: "user_quit"}.
  • Withdrawal trigger → unmount{reason: "withdrawal"}.
  • Lifecycle transition to withdrawnunmount{reason: "lifecycle"}.
  • Expiry reached → unmount{reason: "expiry"}.
  • Stage exception → unmount{reason: "error"}.

lifectl run --poll-interval-override <s> flag added (test-only).

Tests

tools/test_runtime_guard.py:

  1. Withdrawal trigger: mock endpoint returns {revoked: true} on
    the second poll → unmount within 2 polls; unmount{reason: "withdrawal"}
    emitted.
  2. Withdrawal unreachable warning: mock endpoint always 503 → 3
    withdrawal_poll{result: "unreachable"} events + 1
    withdrawal_unreachable_warning event; runtime keeps serving.
  3. Lifecycle to withdrawn: rebuild package mid-run with state
    withdrawn; lifecycle watcher detects + unmounts.
  4. Lifecycle to frozen: detect transition; Run loop refuses further
    input; memorial banner shown.
  5. Expiry trigger: short-lived package (expires_at = now + 5s);
    timer fires → unmount with reason expiry.
  6. Ctrl-C: send SIGINT; runtime tears down within 30s + emits
    unmount{reason: "user_quit"}.
  7. Subprocess SIGTERM-then-SIGKILL: user_installed Provider
    subprocess refuses SIGTERM; runtime SIGKILLs after 5s; no zombie.
  8. Hash chain unbroken: full mount → 5+ turns → unmount; assert
    every event in the runtime's audit log links via prev_hash and the
    first event's prev_hash matches the .life's bundled audit-tip hash.

Acceptance

  • All 3 watchers implemented + tested
  • Clean teardown with no zombie subprocesses
  • Audit hash chain validates across stages
  • All 8 test cases pass
  • CI runtime-guard job green

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions