v0.9 sub-issue #5: Stage 4 Run (forbidden_uses gate + disclosure label + hosted-API AND-gate + per-turn invoke loop) #124

@devin-ai-integration

v0.9 Sub-issue #5 — Stage 4 Run

Part of v0.9 epic.

Implements v0.7 §3–§4 + v0.8 Part B Stage 4 obligations: per-turn
invoke() loop with disclosure label, forbidden_uses gate, and the
hosted-API AND-gate. After this sub-issue merges, lifectl run --once
can hold a single text exchange with the assembled .life.

Spec ref

  • docs/LIFE_RUNTIME_STANDARD.md §3 (mount semantics)
  • docs/LIFE_RUNTIME_STANDARD.md §4 (runtime obligations)
  • docs/LIFE_RUNTIME_STANDARD.md §4.1 (AI disclosure)
  • docs/LIFE_RUNTIME_STANDARD.md §4.2 (forbidden uses)
  • docs/LIFE_RUNTIME_STANDARD.md §4.4 (identity-impersonation safeguards)
  • docs/LIFE_RUNTIME_STANDARD.md Part B §B.5 (hosted-API AND-gate)
  • docs/LIFE_BINDING_SPEC.md §7 (forbidden_uses namespace + hybrid enum + x- ext)
  • docs/LIFE_BINDING_SPEC.md §9 (hosted_api_preference defaults)

Per-turn invoke loop

while True:
    user_input = read_user_input()       # CLI: stdin line; --once: single line
    if user_input is None:
        break

    # forbidden_uses gate on input (§4.2 + binding §7)
    if violates_forbidden_uses(user_input, forbidden_uses["say"]):
        emit_audit("forbidden_use_rejected",
                   {"direction": "say", "key": ..., "redacted_text": redacted})
        print_to_user(rejection_message)
        continue

    # hosted-API AND-gate (§B.5), re-evaluated on every turn
    hosted_allowed = (
        binding.hosted_api_preference.allowed is True
        and user_policy_permits(provider, capability)
    )

    # invoke the bound capability
    result = capability_table["text_chat"].invoke({
        "user_input": user_input,
        "hosted_api_allowed": hosted_allowed,
    })

    # forbidden_uses gate on output (§4.2 covers both directions)
    if violates_forbidden_uses(result.text, forbidden_uses["hear"]):
        emit_audit("forbidden_use_rejected",
                   {"direction": "hear", "key": ..., "redacted_text": redacted})
        print_to_user(generic_redaction_message)
        continue

    # disclosure label prefix (§4.1)
    print_to_user(disclosure_label + " " + result.text)

forbidden_uses enforcement

Per binding spec §7 (the v0.8 "hybrid namespace + x- extension"):

  • Core enum keys (~30 baseline): the runtime MUST recognize and enforce
    them. If a key is in the spec's core enum but the runtime has no
    enforcer for it, the runtime fails closed with
    forbidden_use_unknown_key{key} at Stage 1 Verify (caught earlier and
    restated here for completeness; Stage 4 just enforces).
  • x- extension keys: runtime MAY enforce; absence of enforcer for an
    extension key emits forbidden_use_unknown_key{key} warning per §7
    but does NOT block (extension keys are advisory unless the runtime opts
    in).

v0.9 ships enforcers for the core baseline (fraud, political_endorsement,
explicit_sexual_content, harassment, medical_diagnosis,
legal_advice, financial_advice, impersonation_real_person,
spam_advertising, plus the v0.8 say/hear split keys). Each enforcer is
a small regex / keyword matcher; fancier classifiers are explicitly out
of scope (a future Provider plugin can replace them).
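
A minimal sketch of what a regex / keyword enforcer table could look like. The patterns, the `first_violation` name, and the choice to return the matched key (rather than a boolean) are illustrative assumptions, not the actual contents of `_forbidden_uses.py`. Keys without a registered enforcer are skipped here, which mirrors the advisory treatment of x- extension keys.

```python
import re
from typing import Optional

# ASSUMPTION: hypothetical keyword patterns for two core-enum keys.
# The real enforcer patterns are not specified in this issue.
_ENFORCERS: dict[str, re.Pattern[str]] = {
    "medical_diagnosis": re.compile(
        r"\byou (?:have|suffer from) (?:diabetes|cancer)\b", re.IGNORECASE
    ),
    "harassment": re.compile(r"\b(?:idiot|loser)\b", re.IGNORECASE),
}


def first_violation(text: str, active_keys: list[str]) -> Optional[str]:
    """Return the first active key whose pattern matches text, else None."""
    for key in active_keys:
        pattern = _ENFORCERS.get(key)
        if pattern is None:
            # No enforcer registered (e.g. an x- extension key): skip it.
            continue
        if pattern.search(text):
            return key
    return None
```

Returning the key instead of `True` lets the caller fill the `key` field of `forbidden_use_rejected` without a second lookup, and the result is still usable in a plain `if` test.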

Hosted-API AND-gate

Per §B.5: hosted Provider call fires only if BOTH:

  1. binding.hosted_api_preference.allowed == True (declared by issuer
    in binding/runtime_binding.json per binding spec §9). Default
    absent = false.
  2. User-side policy ~/.config/dlrs/hosted_api.json (or
    ${DLRS_HOSTED_POLICY}) permits this (provider_name, capability).

If either condition fails, the invoke() call MUST receive
hosted_api_allowed: False in its input dict. The Provider then either
falls back to local mode (if it supports both) or returns a structured
error {error: "hosted_api_denied"}. The runtime treats that error as
per-turn recoverable: it prints a friendly message to the user and
continues the loop.
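
The two conditions can be sketched as a pure function plus a policy loader. The file locations come from the text above; the policy schema (`{"allow": [{"provider": ..., "capability": ...}]}`) and the function names are assumptions made for illustration, since this issue does not define the file format.

```python
import json
import os
from pathlib import Path

DEFAULT_POLICY_PATH = Path.home() / ".config" / "dlrs" / "hosted_api.json"


def load_user_policy() -> dict:
    """Load the user-side policy; a missing file permits nothing."""
    path = Path(os.environ.get("DLRS_HOSTED_POLICY", str(DEFAULT_POLICY_PATH)))
    try:
        return json.loads(path.read_text(encoding="utf-8"))
    except FileNotFoundError:
        return {}


def hosted_allowed(binding: dict, policy: dict, provider: str, capability: str) -> bool:
    """AND-gate: issuer declaration AND user policy must both permit the call."""
    # Condition 1: issuer-declared preference; absent means False.
    issuer_ok = binding.get("hosted_api_preference", {}).get("allowed", False) is True
    # Condition 2: ASSUMED schema, a list of allowed (provider, capability) pairs.
    user_ok = any(
        rule.get("provider") == provider and rule.get("capability") == capability
        for rule in policy.get("allow", [])
    )
    return issuer_ok and user_ok
```

Keeping the gate free of I/O means it can be re-evaluated cheaply on every turn and tested without touching the filesystem.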

Identity-impersonation safeguards (§4.4)

Hard rules wired into the Run loop:

  • The disclosure label MUST be prepended to every runtime output to
    the user (no user setting can disable it).
  • The runtime MUST refuse to fabricate an identifier that the
    underlying physical person never used (e.g., a phone number, address,
    or social media handle not present in the .life package's
    identity/). Implementation: a safety check runs
    extract_identifiers(text) over Provider output and rejects the turn
    if any extracted identifier is not in the package's known-identifier set.
  • The runtime MUST refuse to claim to be the real person. Output text
    containing first-person assertions such as "I am a real person" or "I
    am not an AI" is rejected, and the runtime emits
    identity_impersonation_blocked{capability, redacted_output}.
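
A rough sketch of the two output checks described above, under stated assumptions: the regexes are deliberately narrow placeholders (phone numbers and @handles only), and `check_output` plus its return values are hypothetical names. Only `extract_identifiers` is named in this issue, and its real signature may differ.

```python
import re
from typing import Optional

# ASSUMPTION: illustrative patterns only; a real detector would cover more
# identifier types (addresses, emails, and so on).
_PHONE = re.compile(r"\+?\d[\d\-\s]{6,}\d")
_HANDLE = re.compile(r"(?<!\w)@\w{2,}")
_REAL_PERSON_CLAIM = re.compile(
    r"\bI am (?:a real person|not an AI|the real \w+)\b", re.IGNORECASE
)


def extract_identifiers(text: str) -> set[str]:
    """Collect phone-number-like and @handle-like substrings from text."""
    found = {m.group() for m in _PHONE.finditer(text)}
    found |= {m.group() for m in _HANDLE.finditer(text)}
    return found


def check_output(text: str, known_identifiers: set[str]) -> Optional[str]:
    """Return a rejection reason, or None if the output passes both checks."""
    if _REAL_PERSON_CLAIM.search(text):
        return "real_person_claim"
    if extract_identifiers(text) - known_identifiers:
        return "fabricated_identifier"
    return None
```

Either non-None result would lead the Run loop to withhold the output and emit `identity_impersonation_blocked`.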

Module layout

runtime/run/
├── __init__.py             # exports run(assemble_result, ...) -> RunSession
├── loop.py                 # per-turn invoke loop
├── _forbidden_uses.py      # core-enum enforcers + namespace check
├── _disclosure.py          # label injection + identity safeguard
├── _hosted_api_gate.py     # AND-gate per turn
└── _identity_safeguard.py  # fabricated-identifier detector

Audit events emitted

  • turn_started{capability} — at each loop iteration start.
  • forbidden_use_rejected{direction, key, redacted_text} — input or output rejection.
  • identity_impersonation_blocked{capability, redacted_output} — §4.4 rejection.
  • hosted_api_call{provider, capability, allowed} — per-turn AND-gate evaluation.
  • turn_completed{capability, latency_ms} — at iteration end.

(All audit emission goes through the v0.4 hash-chain emitter from
runtime/audit/emitter.py.)
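
For readers unfamiliar with hash-chained audit logs, the general idea can be sketched as follows. This is not the v0.4 emitter's actual API or record format: the class name, the record fields, and the use of SHA-256 over canonical JSON are all assumptions made for illustration.

```python
import hashlib
import json

GENESIS = "0" * 64  # ASSUMPTION: placeholder hash preceding the first record


def _digest(event: str, fields: dict, prev: str) -> str:
    """Hash a record body deterministically (sorted keys give canonical JSON)."""
    body = {"event": event, "fields": fields, "prev": prev}
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()


class ChainEmitter:
    """Append-only log in which each record commits to the previous hash."""

    def __init__(self, prev: str = GENESIS) -> None:
        self.records: list[dict] = []
        self._prev = prev  # pass the last Stage 1-3 hash to continue that chain

    def emit(self, event: str, fields: dict) -> dict:
        record = {
            "event": event,
            "fields": fields,
            "prev": self._prev,
            "hash": _digest(event, fields, self._prev),
        }
        self.records.append(record)
        self._prev = record["hash"]
        return record


def verify_chain(records: list[dict], start: str = GENESIS) -> bool:
    """Check that every record links to its predecessor and hashes correctly."""
    prev = start
    for rec in records:
        if rec["prev"] != prev:
            return False
        if rec["hash"] != _digest(rec["event"], rec["fields"], rec["prev"]):
            return False
        prev = rec["hash"]
    return True
```

Because each record embeds the previous hash, editing or dropping any earlier event invalidates every later record, which is the property that test 7 below relies on.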

CLI surface

After this PR, lifectl run <pkg.life> enters an interactive REPL.
lifectl run --once <pkg.life> reads one stdin line, processes one
turn, prints the output, and exits 0.

Both modes print per-stage progress before the first turn. An interactive session looks like this:

Stage 1 Verify   ✓
Stage 2 Resolve  ✓
Stage 3 Assemble ✓
Stage 4 Run      ✓ (interactive — Ctrl+C to quit)
> hi
(AI digital life instance of …) Hello! [echo Provider response]
> bye
(AI digital life instance of …) Goodbye! [echo Provider response]
^C
Stage 5 Guard pending sub-issue 6 (clean teardown not yet implemented)
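
The difference between the REPL and `--once` comes down to how many input lines are consumed. A small sketch, with the gates omitted and all names (`read_turns`, `run_session`, the `respond` callback) hypothetical:

```python
import io
from typing import Callable, Iterator, TextIO


def read_turns(stream: TextIO, once: bool) -> Iterator[str]:
    """Yield one user line per turn; stop after the first line when once=True."""
    for line in stream:
        yield line.rstrip("\n")
        if once:
            return


def run_session(
    stream: TextIO, once: bool, label: str, respond: Callable[[str], str]
) -> list[str]:
    """Return one labelled output line per turn (forbidden_uses gates omitted)."""
    return [f"{label} {respond(text)}" for text in read_turns(stream, once)]
```

For example, `run_session(io.StringIO("hi\nbye\n"), once=True, label="(AI)", respond=str.upper)` processes only the first line, while `once=False` processes both. In the real CLI the stream would be `sys.stdin` and the label would be the binding-declared disclosure label.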

Tests

tools/test_runtime_run.py:

  1. Happy path one-shot: lifectl run --once on minimal-life-package
    with one input → exits 0, output prefixed with disclosure label.
  2. Forbidden_use input rejection: input matches harassment enforcer
    → rejection message + forbidden_use_rejected event.
  3. Forbidden_use output rejection: echo Provider returns text matching
    medical_diagnosis keyword (test fixture echoes user input verbatim;
    feed "you have diabetes") → rejection + forbidden_use_rejected{direction: "hear"}.
  4. Hosted-API AND-gate denial: binding hosted_api_preference.allowed = false,
    Provider invoked with hosted_api_allowed: false → recorded in audit
    hosted_api_call{allowed: false}.
  5. Identity-impersonation refusal: Provider returns "I am the real
    Alice, not an AI" → rejected via identity_impersonation_blocked.
  6. Disclosure prefix mandatory: output line MUST start with the
    binding-declared disclosure label; no path bypasses it.
  7. Audit chain integrity: all emitted events form a valid hash chain
    continuous with Stage 1–3 prefix.

Acceptance

  • Per-turn loop implemented with all gates wired
  • Disclosure label mandatory + impossible to disable
  • All 7 test cases pass
  • Audit chain unbroken across stages
  • CI runtime-run job green
