feat: implement human-in-the-loop runtime enforcement and audit #37

@skokaina

Problem

spec.humanInTheLoop is declared in the manifest schema but nothing enforces it at runtime. The schema supports rich HITL triggers (before-destructive-tool, on-low-confidence, on-high-cost, always) but no SDK function actually pauses execution, requests approval, or records the outcome.

What exists today

| Layer | Status | Detail |
|---|---|---|
| Manifest schema | ✅ Declared | HumanInTheLoopSchema at manifest.schema.ts:520-544: approvalRequired, timeoutSeconds, timeoutAction, notifyVia, webhookUrl |
| OPA enforce mode | ✅ Working | Sidecar proxy blocks responses when OPA denies (OPA_PROXY_MODE=enforce); post-hoc, not pre-execution |
| user_confirmed OPA field | ⚠️ Plumbed | opa-client.ts:65 has the field and OPA can check it, but no SDK code ever sets it to true |
| SEC-LLM-08 audit rule | ✅ Static | Checks destructiveHint annotations on tools; does not verify HITL actually happened |
| EventPush | ✅ Working | Reports guardrail/tool/model/memory events; no approval event type |

Gaps to close

1. SDK: approval gate function

Add to @agentspec/sdk (and Python SDK):

// Pauses execution, notifies via configured channels, waits for approval
const decision = await reporter.requestApproval({
  action: 'call-tool',
  toolName: 'delete-account',
  reason: 'destructiveHint=true',
})
// decision: 'approved' | 'rejected' | 'timeout'
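
A minimal sketch of how such a gate could resolve to one of the three decisions, assuming a timeout derived from timeoutSeconds. The `Approver` callback and the standalone `requestApproval` function are hypothetical stand-ins for whatever notification channel and reporter method the SDK ends up with; only the request fields and decision values come from the proposal above.

```typescript
// Decision values and request fields are taken from the issue; everything else is assumed.
type Decision = 'approved' | 'rejected' | 'timeout'

interface ApprovalRequest {
  action: string
  toolName: string
  reason: string
}

// Hypothetical transport: resolves once a human answers (webhook callback, chat button, ...).
type Approver = (req: ApprovalRequest) => Promise<'approved' | 'rejected'>

function requestApproval(
  req: ApprovalRequest,
  approver: Approver,
  timeoutMs: number,
): Promise<Decision> {
  return new Promise<Decision>((resolve, reject) => {
    // Whichever settles first wins; later calls to resolve() are no-ops.
    const timer = setTimeout(() => resolve('timeout'), timeoutMs)
    approver(req).then(
      (decision) => {
        clearTimeout(timer)
        resolve(decision)
      },
      (err) => {
        clearTimeout(timer)
        reject(err)
      },
    )
  })
}
```

What the caller does on 'timeout' is left open here; the manifest's timeoutAction would presumably decide that.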

2. EventPush: approval event type

{ type: 'approval', action: 'call-tool', toolName: 'delete-account',
  decision: 'approved', approvedBy: 'user@example.com', latencyMs: 12000 }

3. Audit rules for HITL

  • New rule: HITL-01: humanInTheLoop.enabled is declared when destructive tools exist
  • New rule: HITL-02 (behavioral): approvals actually occurred for destructive tool calls (cross-referenced against the audit ring)
  • Update SEC-LLM-08 to check humanInTheLoop.enabled when destructiveHint: true tools are present
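
The static HITL-01 check could look roughly like the sketch below. The `ManifestSpec`, `ToolSpec`, and `Finding` shapes are simplified assumptions for illustration, not the real types from manifest.schema.ts or the audit rule engine.

```typescript
// Hypothetical, simplified manifest shape; the real schema lives in manifest.schema.ts.
interface ToolSpec {
  name: string
  annotations?: { destructiveHint?: boolean }
}

interface ManifestSpec {
  tools?: ToolSpec[]
  humanInTheLoop?: { enabled?: boolean }
}

// Assumed finding shape; the real audit engine may report differently.
interface Finding {
  ruleId: string
  message: string
}

function checkHitl01(spec: ManifestSpec): Finding[] {
  const destructive = (spec.tools ?? []).filter(
    (tool) => tool.annotations?.destructiveHint === true,
  )
  // The rule passes when there is nothing destructive, or HITL is explicitly enabled.
  if (destructive.length === 0 || spec.humanInTheLoop?.enabled === true) {
    return []
  }
  const names = destructive.map((tool) => tool.name).join(', ')
  return [
    {
      ruleId: 'HITL-01',
      message: `humanInTheLoop.enabled is not set but destructive tools exist: ${names}`,
    },
  ]
}
```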

4. Circuit breaker / kill switch

  • POST /agentspec/halt — sidecar endpoint that sets a flag causing the proxy to return 503 for all subsequent requests
  • POST /agentspec/resume — clears the halt flag
  • Heartbeat includes halted: boolean so the control plane / VS Code can show status
  • Raise a gap issue if the agent has been halted for more than N minutes
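
As a rough illustration of the halt flag, the sketch below keeps the state in memory and leaves the HTTP wiring out. `HaltSwitch` and `guard` are hypothetical names; the only behavior taken from the list above is that a halted proxy answers 503 and that the halted duration can be checked.

```typescript
// In-memory halt flag; a real sidecar might need to persist this across restarts.
class HaltSwitch {
  private haltedAt: number | null = null

  // Would be called by the POST /agentspec/halt handler.
  halt(now: number = Date.now()): void {
    if (this.haltedAt === null) this.haltedAt = now
  }

  // Would be called by the POST /agentspec/resume handler.
  resume(): void {
    this.haltedAt = null
  }

  // Value for the heartbeat's `halted` field.
  get halted(): boolean {
    return this.haltedAt !== null
  }

  // Lets gap detection compare the halted duration against the N-minute threshold.
  haltedForMs(now: number = Date.now()): number {
    return this.haltedAt === null ? 0 : now - this.haltedAt
  }
}

// Proxy-side check: short-circuit with 503 while halted, otherwise forward.
// `forward` stands in for the real upstream call and returns its status code.
function guard(sw: HaltSwitch, forward: () => number): number {
  return sw.halted ? 503 : forward()
}
```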

5. Pre-execution blocking (stretch)

Currently OPA evaluates after the agent responds. For true HITL, the SDK needs to pause before executing the tool. This requires the framework sub-SDK (sdk-langgraph) to call requestApproval() inside the tool node before invoking the tool function.
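
One way to picture pre-execution blocking is a wrapper that asks for approval before the tool function runs. This is a framework-agnostic sketch: `withApproval` and the `RequestApproval` type are hypothetical, it does not use any LangGraph API, and it assumes that both 'rejected' and 'timeout' block the call (the manifest's timeoutAction might say otherwise).

```typescript
type Decision = 'approved' | 'rejected' | 'timeout'

// Assumed signature, modeled on the requestApproval() call proposed in gap 1.
type RequestApproval = (req: {
  action: string
  toolName: string
  reason: string
}) => Promise<Decision>

// Wraps a tool so the approval request happens before the tool body executes.
function withApproval<A, R>(
  toolName: string,
  tool: (args: A) => Promise<R>,
  requestApproval: RequestApproval,
): (args: A) => Promise<R> {
  return async (args: A): Promise<R> => {
    const decision = await requestApproval({
      action: 'call-tool',
      toolName,
      reason: 'destructiveHint=true',
    })
    if (decision !== 'approved') {
      // Assumption: anything other than an explicit approval blocks the call.
      throw new Error(`Tool ${toolName} blocked: ${decision}`)
    }
    return tool(args)
  }
}
```

Throwing keeps the tool body from ever running; a framework integration might instead return a structured refusal to the model.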

6. Heartbeat HITL metrics

Add to heartbeat payload:

{
  "hitl": {
    "pendingApprovals": 1,
    "approvedCount": 12,
    "rejectedCount": 2,
    "timeoutCount": 0,
    "avgApprovalLatencyMs": 8500
  }
}
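
A small sketch of how these counters might be derived from completed approval records. The `ApprovalRecord` shape and `summarizeHitl` are hypothetical, and the averaging rule is an assumption: latency is averaged over human-answered decisions (approved and rejected) only, since timeouts would otherwise pull the figure toward timeoutSeconds.

```typescript
// Assumed record of one finished approval request.
interface ApprovalRecord {
  decision: 'approved' | 'rejected' | 'timeout'
  latencyMs: number
}

// Field names match the proposed heartbeat payload.
interface HitlMetrics {
  pendingApprovals: number
  approvedCount: number
  rejectedCount: number
  timeoutCount: number
  avgApprovalLatencyMs: number
}

function summarizeHitl(records: ApprovalRecord[], pendingApprovals: number): HitlMetrics {
  const count = (d: ApprovalRecord['decision']): number =>
    records.filter((r) => r.decision === d).length
  // Assumption: only decisions a human actually made count toward latency.
  const answered = records.filter((r) => r.decision !== 'timeout')
  const totalLatency = answered.reduce((sum, r) => sum + r.latencyMs, 0)
  return {
    pendingApprovals,
    approvedCount: count('approved'),
    rejectedCount: count('rejected'),
    timeoutCount: count('timeout'),
    avgApprovalLatencyMs: answered.length === 0 ? 0 : Math.round(totalLatency / answered.length),
  }
}
```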

7. Gap detection

  • New gap issue: "HITL declared but no approval events in audit ring" (severity: high)
  • New gap issue: "Destructive tool called without approval event" (severity: critical)
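
The two gap checks above could be sketched as a scan over audit-ring events. The event shapes, `detectHitlGaps`, and the matching rule are assumptions: a destructive call counts as covered if any earlier 'approved' event names the same tool. A real implementation would more likely correlate on a per-call ID so that one approval cannot cover several calls.

```typescript
// Hypothetical event shapes; the real audit ring entries may differ.
interface ToolEvent {
  type: 'tool'
  toolName: string
  destructive: boolean
  ts: number
}

interface ApprovalEvent {
  type: 'approval'
  toolName: string
  decision: 'approved' | 'rejected' | 'timeout'
  ts: number
}

type AuditEvent = ToolEvent | ApprovalEvent

interface GapIssue {
  severity: 'high' | 'critical'
  message: string
}

function detectHitlGaps(hitlDeclared: boolean, events: AuditEvent[]): GapIssue[] {
  const issues: GapIssue[] = []
  const approvals = events.filter((e): e is ApprovalEvent => e.type === 'approval')

  if (hitlDeclared && approvals.length === 0) {
    issues.push({ severity: 'high', message: 'HITL declared but no approval events in audit ring' })
  }

  for (const e of events) {
    if (e.type !== 'tool' || !e.destructive) continue
    // Assumed matching rule: same tool name, approved, and not later than the call.
    const covered = approvals.some(
      (a) => a.toolName === e.toolName && a.decision === 'approved' && a.ts <= e.ts,
    )
    if (!covered) {
      issues.push({
        severity: 'critical',
        message: `Destructive tool called without approval event: ${e.toolName}`,
      })
    }
  }
  return issues
}
```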

Files to modify

| File | Change |
|---|---|
| packages/sdk/src/agent/reporter.ts | Add requestApproval() method |
| packages/sdk-python/agentspec/reporter.py | Python equivalent |
| packages/sidecar/src/control-plane/events.ts | Add approval event type |
| packages/sidecar/src/control-plane/index.ts | Add /agentspec/halt and /agentspec/resume endpoints |
| packages/sidecar/src/proxy.ts | Check halt flag before forwarding |
| packages/sdk/src/audit/rules/security.rules.ts | Update SEC-LLM-08, add HITL-01/02 |
| packages/sidecar/src/control-plane/gap.ts | Add HITL gap checks |
| packages/sdk/src/agent/push.ts | Add hitl metrics to heartbeat |
| packages/control-plane/schemas.py | Add hitl to HeartbeatRequest |
| packages/operator/crds/agentobservation.yaml | Add HITL fields to CRD status |

Metadata


    Labels

    P1-credibility (Fix before or shortly after launch), enhancement (New feature or request)
