feat(check): in-process cersei-agent executor replacing claude -p + MCP (MULTI-1367)#144

Merged
RobbieMcKinstry merged 2 commits into trunk from robbie/multi-1367
Jun 25, 2026

@RobbieMcKinstry

Contributor

Replace the shell-out ClaudeExecutor and the in-process MCP result server
with an in-process cersei-agent executor that captures verdicts through a
per-check cersei-tools judge tool. Consumes MULTI-1359's provider config.

  • CerseiExecutor (default): builds a cersei_agent::Agent per check in its
    CoW sandbox — concrete .model from the allowlist, provider via a per-check
    ProviderFactory (cersei Agent takes an owned Box<dyn Provider>), effort
    mapped to EffortLevel thinking budget, distinct session_id per check with
    clear_session_shell_state on teardown, tokio timeout + cancel token.
  • Judge tool (report-check-result): a per-check cersei_tools::Tool closing
    over a write-once VerdictSink + the agent's CancellationToken; records the
    verdict in-process and cancels the agent so run() returns immediately.
  • Least-privilege by default: read-only tool set (Read/Grep/Glob) + judge under
    AllowReadOnly.
  • New AgentOutcome { verdict, stop_reason, turns, error }; execution retries
    max-turns-without-verdict / stream-error / timeout per max_attempts.
  • Remove src/checks/mcp/ and mcp_config_path; drop now-dead rmcp, axum,
    schemars deps.
  • Executor selectable via --executor / MULTI_CHECKS_EXECUTOR / config
    (cersei default, claude fallback); the claude -p fallback now reports
    via a sentinel JSON file since the MCP server is gone.
  • Keep the slim CheckExecutor seam (Cersei + Claude + Fake); execution-phase
    tests still run without network or a live model.
  • Update guides/checks.md and the self-validation CHECKS.md to the new
    in-process design.
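
The judge-tool bullet above can be sketched with the standard library alone. This is a minimal illustration, not the code in this PR: `Verdict`, `CancelFlag`, and `JudgeTool::call` are hypothetical stand-ins. The real tool implements `cersei_tools::Tool` and cancels the agent's `CancellationToken`, neither of which is modeled here.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::{Arc, OnceLock};

/// Result a check reports through the judge tool (hypothetical shape).
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Verdict {
    Pass,
    Fail { reason: String },
}

/// Write-once slot: the first recorded verdict wins, later writes are rejected.
#[derive(Debug, Default)]
pub struct VerdictSink {
    slot: OnceLock<Verdict>,
}

impl VerdictSink {
    /// Returns `true` if this call stored the verdict, `false` if one was already set.
    pub fn record(&self, verdict: Verdict) -> bool {
        self.slot.set(verdict).is_ok()
    }

    /// The recorded verdict, if any.
    pub fn get(&self) -> Option<&Verdict> {
        self.slot.get()
    }
}

/// Std-only stand-in for the agent's cancellation token (assumption for this sketch).
#[derive(Debug, Default)]
pub struct CancelFlag(AtomicBool);

impl CancelFlag {
    pub fn cancel(&self) {
        self.0.store(true, Ordering::SeqCst);
    }

    pub fn is_cancelled(&self) -> bool {
        self.0.load(Ordering::SeqCst)
    }
}

/// Per-check judge tool: holds the sink and the cancel flag for one check.
pub struct JudgeTool {
    sink: Arc<VerdictSink>,
    cancel: Arc<CancelFlag>,
}

impl JudgeTool {
    pub fn new(sink: Arc<VerdictSink>, cancel: Arc<CancelFlag>) -> Self {
        Self { sink, cancel }
    }

    /// Invoked when the model calls `report-check-result`.
    /// Records the verdict once, then signals cancellation so the run can stop.
    pub fn call(&self, verdict: Verdict) -> Result<(), &'static str> {
        if self.sink.record(verdict) {
            self.cancel.cancel();
            Ok(())
        } else {
            Err("verdict already recorded")
        }
    }
}
```

In this sketch the executor would keep its own `Arc` clones of the sink and flag, poll or await the flag while the agent runs, and read `sink.get()` once the run returns. Rejecting a second write means a repeated tool call cannot overwrite the first verdict.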

Vendor cersei-provider 0.1.9 via [patch.crates-io] (third_party/cersei-provider)
with a one-line fix: upstream unconditionally sends a stale
anthropic-beta: interleaved-thinking-2025-04-14 header, which the current
Anthropic API rejects with HTTP 400, breaking every in-process check. See
third_party/cersei-provider/PATCH.md; remove the vendored copy once upstream
ships a fix.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

RobbieMcKinstry commented Jun 25, 2026

Contributor Author

… MCP (MULTI-1367)

Replace the shell-out `ClaudeExecutor` and the in-process MCP result server
with an in-process `cersei-agent` executor that captures verdicts through a
per-check `cersei-tools` judge tool. Consumes MULTI-1359's provider config.

- `CerseiExecutor` (default): builds a `cersei_agent::Agent` per check in its
  CoW sandbox — concrete `.model` from the allowlist, provider via a per-check
  `ProviderFactory` (cersei `Agent` takes an owned `Box<dyn Provider>`), effort
  applied as sampling temperature, distinct `session_id` per check with
  `clear_session_shell_state` on teardown, tokio timeout + cancel token.
- Judge tool (`report-check-result`): a per-check `cersei_tools::Tool` closing
  over a write-once `VerdictSink` + the agent's `CancellationToken`; records the
  verdict in-process and cancels the agent so `run()` returns immediately.
- Least-privilege by default: read-only tool set (Read/Grep/Glob) + judge under
  `AllowReadOnly`.
- New `AgentOutcome { verdict, stop_reason, turns, error }`; execution retries
  max-turns-without-verdict / stream-error / timeout per `max_attempts`.
- Remove `src/checks/mcp/` and `mcp_config_path`; drop now-dead `rmcp`, `axum`,
  `schemars` deps.
- Executor selectable via `--executor` / `MULTI_CHECKS_EXECUTOR` / config
  (`cersei` default, `claude` fallback); the `claude -p` fallback now reports
  via a sentinel JSON file since the MCP server is gone.
- Keep the slim `CheckExecutor` seam (Cersei + Claude + Fake); execution-phase
  tests still run without network or a live model.
- Update `guides/checks.md` and the self-validation `CHECKS.md` to the new
  in-process design.
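
The retry rule in the `AgentOutcome` bullet above can be illustrated roughly as follows. The field names come from the bullet; the `StopReason` variants, the simplified `Verdict`, and the `run_with_retries` helper are hypothetical names chosen for this sketch and may not match the real types.

```rust
/// Why an agent run stopped (hypothetical variants, named after the bullet's wording).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum StopReason {
    VerdictReported,
    MaxTurns,
    StreamError,
    Timeout,
}

/// Simplified verdict for this sketch.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Verdict {
    Pass,
    Fail,
}

/// Mirrors `AgentOutcome { verdict, stop_reason, turns, error }`; field types are assumed.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct AgentOutcome {
    pub verdict: Option<Verdict>,
    pub stop_reason: StopReason,
    pub turns: u32,
    pub error: Option<String>,
}

impl AgentOutcome {
    /// Retry only when no verdict was captured and the stop reason is one of
    /// max-turns, stream-error, or timeout.
    pub fn is_retryable(&self) -> bool {
        self.verdict.is_none()
            && matches!(
                self.stop_reason,
                StopReason::MaxTurns | StopReason::StreamError | StopReason::Timeout
            )
    }
}

/// Runs `attempt` until it yields a non-retryable outcome or `max_attempts` is
/// reached. Always runs at least once; the closure receives the 1-based attempt number.
pub fn run_with_retries<F>(max_attempts: u32, mut attempt: F) -> AgentOutcome
where
    F: FnMut(u32) -> AgentOutcome,
{
    let mut attempt_no = 1;
    loop {
        let outcome = attempt(attempt_no);
        if !outcome.is_retryable() || attempt_no >= max_attempts {
            return outcome;
        }
        attempt_no += 1;
    }
}
```

When attempts run out, the last outcome is returned unchanged, so the caller can still see why the final attempt stopped.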

Two cersei-provider 0.1.9 bugs surfaced by live runs against Anthropic, both
worked around so checks actually run:
- It sends a stale `anthropic-beta: interleaved-thinking-2025-04-14` header
  unconditionally (HTTP 400). Vendored via [patch.crates-io]
  (third_party/cersei-provider) with that header corrected. See
  third_party/cersei-provider/PATCH.md.
- Its SSE parser drops `signature_delta`, so an extended-thinking block is sent
  back on turn 2 with an empty signature (HTTP 400, "Invalid signature in
  thinking block"). Thinking is therefore left disabled; effort maps to
  temperature until upstream is fixed.
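
To illustrate the second bug, here is a simplified model of folding streamed deltas into a thinking block. It is not the cersei-provider parser: `Delta`, `ThinkingBlock`, `accumulate`, and `is_replayable` are hypothetical names, and the `keep_signature` switch exists only to mimic a parser that ignores `signature_delta`.

```rust
/// Delta payloads relevant to a thinking block (simplified; other delta kinds omitted).
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Delta {
    Thinking(String),
    Signature(String),
}

/// A thinking block as it would be echoed back on the next turn.
#[derive(Debug, Default, Clone, PartialEq, Eq)]
pub struct ThinkingBlock {
    pub thinking: String,
    pub signature: String,
}

/// Folds deltas into a block. With `keep_signature = false` the signature deltas
/// are silently dropped, which leaves `signature` empty.
pub fn accumulate(deltas: &[Delta], keep_signature: bool) -> ThinkingBlock {
    let mut block = ThinkingBlock::default();
    for delta in deltas {
        match delta {
            Delta::Thinking(text) => block.thinking.push_str(text),
            Delta::Signature(sig) if keep_signature => block.signature.push_str(sig),
            Delta::Signature(_) => {}
        }
    }
    block
}

/// Local guard for this sketch: only replay a block that still carries a signature.
pub fn is_replayable(block: &ThinkingBlock) -> bool {
    !block.signature.is_empty()
}
```

With the signature dropped, the thinking text is intact but the block fails the guard. That matches the symptom described above, where turn 2 is rejected for an invalid signature.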

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@RobbieMcKinstry RobbieMcKinstry added this pull request to the merge queue Jun 25, 2026
@github-merge-queue github-merge-queue Bot removed this pull request from the merge queue due to failed status checks Jun 25, 2026
@RobbieMcKinstry RobbieMcKinstry added this pull request to the merge queue Jun 25, 2026
…367)

The cersei agent deps pull in usearch (cersei-agent -> cersei-tools ->
cersei-embeddings -> usearch -> cxx -> link-cplusplus), which compiles C++.
`musl-tools` only provides musl-gcc (C), so the x86_64-unknown-linux-musl
build fails with `failed to find tool "x86_64-linux-musl-g++"`.

Install a full GNU musl-cross toolchain for the musl target and point cc-rs
(CC/CXX/AR) and the Rust linker at it. GNU is deliberate: link-cplusplus links
GNU libstdc++, which this toolchain bundles statically, so the fully-static
musl binary links the C++ objects cleanly.

Wired via the dist `github-build-setup` stub (release-prebuild.yml.stub) so
`dist generate` injects it into release.yml, and mirrored into the hand-written
on-merge.yml merge-queue job. Gated on the musl target so gnu builds are
unaffected.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@RobbieMcKinstry RobbieMcKinstry removed this pull request from the merge queue due to a manual request Jun 25, 2026
@RobbieMcKinstry RobbieMcKinstry added this pull request to the merge queue Jun 25, 2026
Merged via the queue into trunk with commit 5771c05 Jun 25, 2026
18 checks passed
@RobbieMcKinstry RobbieMcKinstry deleted the robbie/multi-1367 branch June 25, 2026 03:28