feat: Agent-driven command recovery loops for verifyCommand #381

## Summary

Enhance `verifyCommand` with a structured agent-driven recovery mode that classifies command failures, dispatches fix agents with error context, and re-verifies in a closed loop.

## Motivation

The framework's existing `verifyCommand()` runs a command, detects failures, and calls an opaque `onFixNeeded` callback. This works, but it pushes significant complexity onto consumers, who must:

1. Classify whether the failure is infrastructure (retry) vs. code-quality (fix)
2. Build context for the fix agent from the command output
3. Launch the fix agent
4. Re-run the command and check for regressions
5. Track how many fix rounds have occurred and report metrics

AAMF implements `runCommandWithRecovery()` (~200 lines), which does all of the above in a structured loop. The "command → classify → fix → re-verify" pattern is reusable for any workflow that runs validation commands (builds, tests, lints) and can dispatch agents to fix failures.

## Proposed API

### Enhanced VerifyCommandConfig

```typescript
interface VerifyCommandConfig {
  // ... existing fields ...

  /**
   * Error classifier for command failures. Returns a category label
   * for infrastructure errors, or undefined for code-quality errors.
   * Infrastructure errors are retried directly (no fix agent).
   */
  errorClassifier?: (output: string) => string | undefined;

  /** Maximum retries for infrastructure errors before escalating. */
  maxInfraRetries?: number;

  /**
   * Agent-driven recovery configuration. When provided, code-quality
   * failures invoke the fix agent instead of the basic onFixNeeded callback.
   */
  agentRecovery?: AgentRecoveryConfig;
}

interface AgentRecoveryConfig {
  /** Build context for the fix agent from the failed command output. */
  buildFixContext: (failure: CommandFailureContext) => AgentInvocation;

  /** Agent launcher to invoke the fix. */
  launcher: AgentLauncherLike;

  /** Maximum fix rounds (agent invocations) per verification cycle. */
  maxFixRounds: number;

  /** Called after each fix attempt with the agent result. */
  onFixAttempt?: (attempt: number, agentResult: AgentResult, commandResult: ProcessResult) => void;

  /**
   * Optional regression detection: compare current failures against
   * a previous snapshot to detect new failures introduced by the fix.
   */
  detectRegressions?: boolean;
}

interface CommandFailureContext {
  /** The command that failed. */
  command: string;
  /** Exit code of the failed command. */
  exitCode: number;
  /** Stdout from the failed command. */
  stdout: string;
  /** Stderr from the failed command. */
  stderr: string;
  /** Parsed failure strings from extractFailures(). */
  failures: string[];
  /** Which fix round this is (0-indexed). */
  fixRound: number;
  /** Previous fix attempts and their outcomes (for context accumulation). */
  previousAttempts: Array<{ agentResult: AgentResult; remainingFailures: string[] }>;
}
```
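As an illustration of how a consumer might fill in `errorClassifier` and `buildFixContext`, here is a minimal sketch. The pattern table, the labels, `classifyError`, `FailureSketch`, and `buildFixPrompt` are all hypothetical names, and the sketch returns a plain prompt string because the shape of `AgentInvocation` is not specified in this issue.

```typescript
// Hypothetical pattern table: both the regexes and the labels are illustrative.
const INFRA_PATTERNS: Array<[RegExp, string]> = [
  [/ECONNRESET|ETIMEDOUT|ENOTFOUND/, 'network'],
  [/out of memory|heap limit/i, 'oom'],
];

// Matches the proposed errorClassifier signature: a label for infrastructure
// errors, undefined for everything else (treated as a code-quality failure).
function classifyError(output: string): string | undefined {
  for (const [pattern, label] of INFRA_PATTERNS) {
    if (pattern.test(output)) return label;
  }
  return undefined;
}

// Trimmed-down stand-in for CommandFailureContext (assumption: only the
// fields used below; the real type also carries stdout, stderr, agent results).
interface FailureSketch {
  command: string;
  exitCode: number;
  failures: string[];
  fixRound: number;
  previousAttempts: Array<{ remainingFailures: string[] }>;
}

// Builds a prompt string that includes the history of earlier attempts.
function buildFixPrompt(failure: FailureSketch): string {
  const lines: string[] = [
    `Command "${failure.command}" exited with code ${failure.exitCode} (fix round ${failure.fixRound + 1}).`,
    'Current failures:',
    ...failure.failures.map((f) => `- ${f}`),
  ];
  failure.previousAttempts.forEach((attempt, i) => {
    lines.push(`After attempt ${i + 1}, ${attempt.remainingFailures.length} failure(s) remained.`);
  });
  return lines.join('\n');
}
```

A real `buildFixContext` would wrap a prompt like this in whatever `AgentInvocation` requires.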

### Enhanced VerifyCommandResult

```typescript
interface VerifyCommandResult {
  // ... existing fields ...

  /** Number of infrastructure retries performed. */
  infraRetries: number;

  /** Number of agent-driven fix rounds performed. */
  fixRounds: number;

  /** Whether the command ultimately passed after recovery. */
  recoveredSuccessfully: boolean;

  /** Regressions introduced during fix attempts (if detectRegressions enabled). */
  regressions: string[];

  /** Classification of the final error (if the command still fails). */
  errorClass?: string;
}
```
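A consumer could turn these fields into a one-line log or dashboard entry along the following lines. This is a sketch only: `RecoveryMetrics` mirrors just the new fields above, `summarizeRecovery` is a hypothetical helper, and the fallback label `'code-quality'` assumes `errorClass` is left unset when the final failure is not an infrastructure error.

```typescript
// Local copy of the proposed new result fields (existing fields omitted).
interface RecoveryMetrics {
  infraRetries: number;
  fixRounds: number;
  recoveredSuccessfully: boolean;
  regressions: string[];
  errorClass?: string;
}

// Hypothetical helper: condenses the metrics into a single status line.
function summarizeRecovery(r: RecoveryMetrics): string {
  if (r.recoveredSuccessfully) {
    return `recovered after ${r.infraRetries} infra retries and ${r.fixRounds} fix rounds`;
  }
  // Assumption: an unset errorClass means a code-quality failure.
  const cause = r.errorClass ?? 'code-quality';
  return `failed (${cause}) after ${r.fixRounds} fix rounds; ${r.regressions.length} regression(s) introduced`;
}
```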

## Execution Flow

```
command fails
    │
    ├─ errorClassifier → 'network'/'oom'/etc. → infra retry (fast backoff, up to maxInfraRetries)
    │
    └─ errorClassifier → undefined (code error)
        │
        ├─ agentRecovery configured?
        │   ├─ yes → buildFixContext() → launcher.launchAgent() → re-run command
        │   │         ├─ if detectRegressions: compare failures vs. previous snapshot
        │   │         └─ loop up to maxFixRounds
        │   └─ no → onFixNeeded() callback (existing behavior)
        │
        └─ return VerifyCommandResult with full metrics
```
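The branching in the diagram can be sketched as a pure decision function that a driver loop calls after each command run. `Step`, `LoopState`, `Limits`, and `decideNextStep` are hypothetical names for illustration, not part of the proposed API. Backoff, regression checks, and the `onFixNeeded` fallback are left out.

```typescript
// What the driver loop should do next (hypothetical type).
type Step =
  | { kind: 'done' }
  | { kind: 'infra-retry'; errorClass: string }
  | { kind: 'agent-fix' }
  | { kind: 'give-up'; errorClass?: string };

interface LoopState {
  infraRetries: number; // infrastructure retries performed so far
  fixRounds: number;    // agent fix rounds performed so far
}

interface Limits {
  maxInfraRetries: number;
  maxFixRounds: number;
}

// errorClass is the classifier's result for the failed run:
// a label for infrastructure errors, undefined for code-quality errors.
function decideNextStep(
  exitCode: number,
  errorClass: string | undefined,
  state: LoopState,
  limits: Limits,
): Step {
  if (exitCode === 0) return { kind: 'done' };
  if (errorClass !== undefined) {
    // Infrastructure error: retry directly until the budget is spent.
    return state.infraRetries < limits.maxInfraRetries
      ? { kind: 'infra-retry', errorClass }
      : { kind: 'give-up', errorClass };
  }
  // Code-quality error: dispatch the fix agent until the budget is spent.
  return state.fixRounds < limits.maxFixRounds
    ? { kind: 'agent-fix' }
    : { kind: 'give-up' };
}
```

Under this sketch the driver increments `infraRetries` or `fixRounds` after acting on a step and re-runs the command, so every iteration either returns or consumes part of a bounded budget.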

## Implementation Notes

- This is backward-compatible: `agentRecovery` is optional, and the existing `onFixNeeded` behavior is preserved when it is absent
- When both `onFixNeeded` and `agentRecovery` are provided, `agentRecovery` takes precedence
- `CommandFailureContext.previousAttempts` enables context accumulation: each fix round can see what was already tried, which helps keep the agent from repeating the same unsuccessful fix
- `detectRegressions` uses the existing `computeRegressions()` function to compare failure snapshots
- The error classifier should compose with the one from #377 — consumers can reuse `classifyInfrastructureError` here
- Metrics (`infraRetries`, `fixRounds`, `recoveredSuccessfully`) enable observability dashboards to track recovery effectiveness
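
To illustrate the snapshot comparison behind `detectRegressions`, here is a minimal sketch. `findNewFailures` is a hypothetical stand-in; the signature and matching rules of the real `computeRegressions()` are not shown in this issue and may differ (for example, it might normalize failure strings before comparing).

```typescript
// Hypothetical stand-in for a regression check: returns the failures present
// after a fix attempt that were not present in the previous snapshot.
function findNewFailures(before: string[], after: string[]): string[] {
  const known = new Set(before);
  return after.filter((failure) => !known.has(failure));
}

// Example: the fix resolved "a" but introduced "c".
const introduced = findNewFailures(['a', 'b'], ['b', 'c']);
```

Exact string matching is the simplest choice; it treats a failure whose message changed slightly as a new one, which may or may not be desirable.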
