feat: Agent-driven command recovery loops for verifyCommand #381

@jafreck

Summary

Enhance verifyCommand with a structured agent-driven recovery mode that classifies command failures, dispatches fix agents with error context, and re-verifies in a closed loop.

Motivation

The framework's existing verifyCommand() runs a command, detects failures, and calls an opaque onFixNeeded callback. This works, but it pushes significant complexity onto consumers, who need to:

  1. Classify the failure as infrastructure (retry) or code-quality (fix)
  2. Build context for the fix agent from the command output
  3. Launch the fix agent
  4. Re-run the command and check for regressions
  5. Track how many fix rounds have occurred and report metrics

AAMF implements runCommandWithRecovery() (~200 lines), which does all of the above in a structured loop. The "command → classify → fix → re-verify" pattern is reusable for any workflow that runs validation commands (builds, tests, lints) and can dispatch agents to fix failures.

Proposed API

Enhanced VerifyCommandConfig

interface VerifyCommandConfig {
  // ... existing fields ...

  /**
   * Error classifier for command failures. Returns a category label
   * for infrastructure errors, or undefined for code-quality errors.
   * Infrastructure errors are retried directly (no fix agent).
   */
  errorClassifier?: (output: string) => string | undefined;

  /** Maximum retries for infrastructure errors before escalating. */
  maxInfraRetries?: number;

  /**
   * Agent-driven recovery configuration. When provided, code-quality
   * failures invoke the fix agent instead of the basic onFixNeeded callback.
   */
  agentRecovery?: AgentRecoveryConfig;
}

interface AgentRecoveryConfig {
  /** Build context for the fix agent from the failed command output. */
  buildFixContext: (failure: CommandFailureContext) => AgentInvocation;

  /** Agent launcher to invoke the fix. */
  launcher: AgentLauncherLike;

  /** Maximum fix rounds (agent invocations) per verification cycle. */
  maxFixRounds: number;

  /** Called after each fix attempt with the agent result. */
  onFixAttempt?: (attempt: number, agentResult: AgentResult, commandResult: ProcessResult) => void;

  /**
   * Optional regression detection: compare current failures against
   * a previous snapshot to detect new failures introduced by the fix.
   */
  detectRegressions?: boolean;
}

interface CommandFailureContext {
  /** The command that failed. */
  command: string;
  /** Exit code of the failed command. */
  exitCode: number;
  /** Stdout from the failed command. */
  stdout: string;
  /** Stderr from the failed command. */
  stderr: string;
  /** Parsed failure strings from extractFailures(). */
  failures: string[];
  /** Which fix round this is (0-indexed). */
  fixRound: number;
  /** Previous fix attempts and their outcomes (for context accumulation). */
  previousAttempts: Array<{ agentResult: AgentResult; remainingFailures: string[] }>;
}
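
To make the proposed hooks concrete, here is a minimal sketch of what a consumer-side errorClassifier and buildFixContext might look like. The AgentResult and AgentInvocation shapes, the regex patterns, and the "fix-agent" name are assumptions for illustration only; the framework's real types may differ.

```typescript
// Hypothetical stand-ins for framework types; the real shapes may differ.
interface AgentResult { success: boolean; summary?: string }
interface AgentInvocation { agentName: string; prompt: string }

// Mirrors the CommandFailureContext proposed in this issue.
interface CommandFailureContext {
  command: string;
  exitCode: number;
  stdout: string;
  stderr: string;
  failures: string[];
  fixRound: number;
  previousAttempts: Array<{ agentResult: AgentResult; remainingFailures: string[] }>;
}

// Illustrative patterns only; a real classifier would be tuned per toolchain.
const INFRA_PATTERNS: Array<[RegExp, string]> = [
  [/ECONNRESET|ETIMEDOUT|ENOTFOUND/, "network"],
  [/out of memory|heap limit/i, "oom"],
];

// Returns a category label for infrastructure errors, undefined for code-quality errors.
function errorClassifier(output: string): string | undefined {
  for (const [pattern, label] of INFRA_PATTERNS) {
    if (pattern.test(output)) return label;
  }
  return undefined;
}

// Builds a prompt from the failure, including a short history of earlier rounds.
function buildFixContext(failure: CommandFailureContext): AgentInvocation {
  const history = failure.previousAttempts
    .map((a, i) => `Attempt ${i + 1}: ${a.remainingFailures.length} failure(s) remained`)
    .join("\n");
  const prompt = [
    `Command \`${failure.command}\` exited with code ${failure.exitCode}.`,
    `Fix round: ${failure.fixRound + 1}`,
    "Failures:",
    ...failure.failures.map((f) => `- ${f}`),
    history ? `Previous attempts:\n${history}` : "No previous attempts.",
  ].join("\n");
  return { agentName: "fix-agent", prompt };
}
```

Both functions are pure, so they can be unit-tested without running a command or launching an agent.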

Enhanced VerifyCommandResult

interface VerifyCommandResult {
  // ... existing fields ...

  /** Number of infrastructure retries performed. */
  infraRetries: number;

  /** Number of agent-driven fix rounds performed. */
  fixRounds: number;

  /** Whether the command ultimately passed after recovery. */
  recoveredSuccessfully: boolean;

  /** Regressions introduced during fix attempts (if detectRegressions enabled). */
  regressions: string[];

  /** Classification of the final error (if the command still fails). */
  errorClass?: string;
}
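
As a rough illustration of how the new result fields could feed logs or dashboards, the sketch below formats them into a one-line summary. RecoveryMetrics and summarizeRecovery are hypothetical names, and the "code-quality" fallback assumes errorClass is left undefined when the final failure is not an infrastructure error.

```typescript
// Only the fields proposed in this issue; existing result fields are omitted.
interface RecoveryMetrics {
  infraRetries: number;
  fixRounds: number;
  recoveredSuccessfully: boolean;
  regressions: string[];
  errorClass?: string;
}

// Hypothetical helper: turns the metrics into a one-line summary.
function summarizeRecovery(r: RecoveryMetrics): string {
  const effort = `${r.infraRetries} infra retries, ${r.fixRounds} fix rounds`;
  if (r.recoveredSuccessfully) return `recovered (${effort})`;
  // Assumption: an undefined errorClass means the final failure was code-quality.
  const cls = r.errorClass ?? "code-quality";
  const reg = r.regressions.length > 0 ? `, ${r.regressions.length} regression(s)` : "";
  return `failed [${cls}] (${effort}${reg})`;
}
```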

Execution Flow

command fails
    │
    ├─ errorClassifier → 'network'/'oom'/etc. → infra retry (fast backoff, up to maxInfraRetries)
    │
    └─ errorClassifier → undefined (code error)
        │
        ├─ agentRecovery configured?
        │   ├─ yes → buildFixContext() → launcher.launchAgent() → re-run command
        │   │         └─ if detectRegressions: compare failures vs. previous snapshot
        │   │         └─ loop up to maxFixRounds
        │   └─ no → onFixNeeded() callback (existing behavior)
        │
        └─ return VerifyCommandResult with full metrics
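
The flow above could be sketched roughly as the loop below. This is a simplified illustration, not the proposed implementation: callbacks are synchronous for brevity (a real launcher would presumably be async), backoff and regression detection are omitted, the single fix callback stands in for buildFixContext() plus the launcher, and "escalating" after maxInfraRetries is assumed to mean returning a failed result. RunResult, LoopOptions, LoopResult, and verifyWithRecovery are hypothetical names.

```typescript
// Hypothetical, simplified stand-ins for the framework's types.
interface RunResult { exitCode: number; output: string; failures: string[] }

interface LoopOptions {
  run: () => RunResult;
  errorClassifier?: (output: string) => string | undefined;
  maxInfraRetries: number;
  maxFixRounds: number;
  // Stands in for buildFixContext() followed by launching the fix agent.
  fix: (failures: string[], fixRound: number) => void;
}

interface LoopResult {
  infraRetries: number;
  fixRounds: number;
  recoveredSuccessfully: boolean;
  errorClass?: string;
}

function verifyWithRecovery(opts: LoopOptions): LoopResult {
  let infraRetries = 0;
  let fixRounds = 0;
  let result = opts.run();

  while (result.exitCode !== 0) {
    const errorClass = opts.errorClassifier?.(result.output);
    if (errorClass !== undefined) {
      // Infrastructure error: retry the command directly, no fix agent.
      if (infraRetries >= opts.maxInfraRetries) {
        return { infraRetries, fixRounds, recoveredSuccessfully: false, errorClass };
      }
      infraRetries++;
    } else {
      // Code-quality error: dispatch a fix, bounded by maxFixRounds.
      if (fixRounds >= opts.maxFixRounds) {
        return { infraRetries, fixRounds, recoveredSuccessfully: false };
      }
      opts.fix(result.failures, fixRounds);
      fixRounds++;
    }
    // Re-verify after either kind of recovery step.
    result = opts.run();
  }
  // Assumption: "recovered" simply means the command ultimately passed.
  return { infraRetries, fixRounds, recoveredSuccessfully: true };
}
```

Keeping the two budgets separate means a flaky network does not consume fix rounds, and a stubborn code error does not consume infrastructure retries.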

Implementation Notes

  • This is backward-compatible: agentRecovery is optional, and the existing onFixNeeded behavior is preserved when it is absent
  • When both onFixNeeded and agentRecovery are provided, agentRecovery takes precedence
  • CommandFailureContext.previousAttempts enables context accumulation: each fix round can see what was already tried, which helps the agent avoid repeating the same unsuccessful fix
  • detectRegressions uses the existing computeRegressions() function to compare failure snapshots
  • The error classifier should compose with the one from #377 (feat: Infrastructure error classification for RetryExecutor), so consumers can reuse classifyInfrastructureError here
  • Metrics (infraRetries, fixRounds, recoveredSuccessfully) enable observability dashboards to track recovery effectiveness
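
For readers unfamiliar with the regression check, the idea behind comparing failure snapshots can be illustrated with a simple set difference. The helper below is hypothetical and is not the framework's computeRegressions(), whose signature and matching rules are not shown in this issue; it assumes failures are compared as exact strings.

```typescript
// Hypothetical helper, not the framework's computeRegressions().
// Returns failures present after the fix that were absent before it.
function newFailures(previous: string[], current: string[]): string[] {
  const seen = new Set(previous);
  return current.filter((f) => !seen.has(f));
}
```

With this approach, a fix that resolves one failure but introduces another would surface the new one as a regression even though the total failure count is unchanged.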

Metadata

    Labels

    enhancement: New feature or request