feat(check): in-process cersei-agent executor replacing claude -p + MCP (MULTI-1367) #144
Merged
… MCP (MULTI-1367)
Replace the shell-out `ClaudeExecutor` and the in-process MCP result server
with an in-process `cersei-agent` executor that captures verdicts through a
per-check `cersei-tools` judge tool. Consumes MULTI-1359's provider config.
- `CerseiExecutor` (default): builds a `cersei_agent::Agent` per check in its
CoW sandbox — concrete `.model` from the allowlist, provider via a per-check
`ProviderFactory` (cersei `Agent` takes an owned `Box<dyn Provider>`), effort
applied as sampling temperature, distinct `session_id` per check with
`clear_session_shell_state` on teardown, tokio timeout + cancel token.
- Judge tool (`report-check-result`): a per-check `cersei_tools::Tool` closing
over a write-once `VerdictSink` + the agent's `CancellationToken`; records the
verdict in-process and cancels the agent so `run()` returns immediately.
- Least-privilege by default: read-only tool set (Read/Grep/Glob) + judge under
`AllowReadOnly`.
- New `AgentOutcome { verdict, stop_reason, turns, error }`; execution retries
on max-turns-without-verdict, stream-error, and timeout outcomes, up to
`max_attempts`.
- Remove `src/checks/mcp/` and `mcp_config_path`; drop now-dead `rmcp`, `axum`,
`schemars` deps.
- Executor selectable via `--executor` / `MULTI_CHECKS_EXECUTOR` / config
(`cersei` default, `claude` fallback); the `claude -p` fallback now reports
via a sentinel JSON file since the MCP server is gone.
- Keep the slim `CheckExecutor` seam (Cersei + Claude + Fake); execution-phase
tests still run without network or a live model.
- Update `guides/checks.md` and the self-validation `CHECKS.md` to the new
in-process design.
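
A minimal sketch of how the judge tool and the retry policy described above could fit together, using only the standard library. Everything here is an illustrative stand-in, not the code in this PR: `Verdict`, `CancelFlag`, `JudgeTool`, `StopReason`, and `run_with_retries` are hypothetical names, the `AgentOutcome` field types are assumed, and an atomic flag takes the place of the agent's real `CancellationToken`.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::{Arc, OnceLock};

/// Hypothetical verdict type; the real definition is not shown in this PR text.
#[derive(Debug, Clone, PartialEq)]
pub enum Verdict {
    Pass,
    Fail(String),
}

/// Write-once sink: the first recorded verdict wins, later writes are ignored.
#[derive(Clone, Default)]
pub struct VerdictSink(Arc<OnceLock<Verdict>>);

impl VerdictSink {
    /// Returns true only if this call stored the verdict.
    pub fn record(&self, verdict: Verdict) -> bool {
        self.0.set(verdict).is_ok()
    }

    pub fn get(&self) -> Option<Verdict> {
        self.0.get().cloned()
    }
}

/// Stand-in for the agent's `CancellationToken` (a shared atomic flag here).
#[derive(Clone, Default)]
pub struct CancelFlag(Arc<AtomicBool>);

impl CancelFlag {
    pub fn cancel(&self) {
        self.0.store(true, Ordering::SeqCst);
    }

    pub fn is_cancelled(&self) -> bool {
        self.0.load(Ordering::SeqCst)
    }
}

/// Judge tool closing over the sink and the cancel flag for one check.
pub struct JudgeTool {
    sink: VerdictSink,
    cancel: CancelFlag,
}

impl JudgeTool {
    pub fn new(sink: VerdictSink, cancel: CancelFlag) -> Self {
        Self { sink, cancel }
    }

    /// Records the verdict, then cancels so the agent loop can return early.
    pub fn report(&self, verdict: Verdict) -> bool {
        let stored = self.sink.record(verdict);
        self.cancel.cancel();
        stored
    }
}

/// Why a run ended; the variant names are assumptions.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum StopReason {
    VerdictRecorded,
    MaxTurns,
    StreamError,
    Timeout,
}

/// Field names follow the PR text; the field types are assumptions.
#[derive(Debug, Clone, PartialEq)]
pub struct AgentOutcome {
    pub verdict: Option<Verdict>,
    pub stop_reason: StopReason,
    pub turns: u32,
    pub error: Option<String>,
}

/// Re-runs `attempt` while it ends without a verdict for a retryable reason,
/// up to `max_attempts` (the attempt always runs at least once).
pub fn run_with_retries(
    max_attempts: u32,
    mut attempt: impl FnMut(u32) -> AgentOutcome,
) -> AgentOutcome {
    let mut n = 1;
    loop {
        let outcome = attempt(n);
        let retryable = outcome.verdict.is_none()
            && matches!(
                outcome.stop_reason,
                StopReason::MaxTurns | StopReason::StreamError | StopReason::Timeout
            );
        if !retryable || n >= max_attempts {
            return outcome;
        }
        n += 1;
    }
}
```

In this sketch a second `report` call still cancels but cannot overwrite the first verdict, which is the property a write-once sink is meant to give; a fake executor in tests could drive `run_with_retries` with a closure and no network.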
Two cersei-provider 0.1.9 bugs surfaced by live runs against Anthropic, both
worked around so checks actually run:
- It sends a stale `anthropic-beta: interleaved-thinking-2025-04-14` header
unconditionally, which the API rejects with HTTP 400. Vendored via
`[patch.crates-io]` (`third_party/cersei-provider`) with that header
corrected. See `third_party/cersei-provider/PATCH.md`.
- Its SSE parser drops `signature_delta`, so an extended-thinking block is sent
back on turn 2 with an empty signature (HTTP 400, "Invalid signature in
thinking block"). Thinking is therefore left disabled; effort maps to
temperature until upstream is fixed.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
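
To illustrate the second bug noted above, here is a rough sketch of a stream accumulator that keeps the signature alongside the thinking text. It assumes the SSE events have already been decoded into a small enum; `ThinkingDelta`, `ThinkingBlock`, and `assemble` are hypothetical names and do not reflect cersei-provider's actual parser types.

```rust
/// Already-decoded streaming deltas for one thinking content block.
/// The variants mirror the `thinking_delta` and `signature_delta` event types.
#[derive(Debug, Clone, PartialEq)]
pub enum ThinkingDelta {
    Thinking(String),
    Signature(String),
}

/// Accumulated thinking block, as it would be replayed on the next turn.
#[derive(Debug, Default, Clone, PartialEq)]
pub struct ThinkingBlock {
    pub thinking: String,
    pub signature: String,
}

impl ThinkingBlock {
    /// Folds one delta into the block; signature deltas are kept, not dropped.
    pub fn apply(&mut self, delta: ThinkingDelta) {
        match delta {
            ThinkingDelta::Thinking(text) => self.thinking.push_str(&text),
            ThinkingDelta::Signature(sig) => self.signature.push_str(&sig),
        }
    }

    /// A block with an empty signature should not be sent back to the API.
    pub fn is_replayable(&self) -> bool {
        !self.signature.is_empty()
    }
}

/// Builds a block from a sequence of deltas.
pub fn assemble(deltas: impl IntoIterator<Item = ThinkingDelta>) -> ThinkingBlock {
    let mut block = ThinkingBlock::default();
    for delta in deltas {
        block.apply(delta);
    }
    block
}
```

A parser that ignores the `Signature` case would yield a block for which `is_replayable()` is false, which corresponds to the empty-signature failure on turn 2 described above.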
…367)

The cersei agent deps pull in usearch (cersei-agent -> cersei-tools ->
cersei-embeddings -> usearch -> cxx -> link-cplusplus), which compiles C++.
`musl-tools` only provides musl-gcc (C), so the x86_64-unknown-linux-musl
build fails with `failed to find tool "x86_64-linux-musl-g++"`.

Install a full GNU musl-cross toolchain for the musl target and point cc-rs
(CC/CXX/AR) and the Rust linker at it. GNU is deliberate: link-cplusplus links
GNU libstdc++, which this toolchain bundles statically, so the fully-static
musl binary links the C++ objects cleanly.

Wired via the dist `github-build-setup` stub (release-prebuild.yml.stub) so
`dist generate` injects it into release.yml, and mirrored into the
hand-written on-merge.yml merge-queue job. Gated on the musl target so gnu
builds are unaffected.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Replace the shell-out `ClaudeExecutor` and the in-process MCP result server
with an in-process `cersei-agent` executor that captures verdicts through a
per-check `cersei-tools` judge tool. Consumes MULTI-1359's provider config.

- `CerseiExecutor` (default): builds a `cersei_agent::Agent` per check in its
CoW sandbox: concrete `.model` from the allowlist, provider via a per-check
`ProviderFactory` (cersei `Agent` takes an owned `Box<dyn Provider>`), effort
mapped to `EffortLevel` thinking budget, distinct `session_id` per check with
`clear_session_shell_state` on teardown, tokio timeout + cancel token.
- Judge tool (`report-check-result`): a per-check `cersei_tools::Tool` closing
over a write-once `VerdictSink` + the agent's `CancellationToken`; records the
verdict in-process and cancels the agent so `run()` returns immediately.
- Least-privilege by default: read-only tool set (Read/Grep/Glob) + judge under
`AllowReadOnly`.
- New `AgentOutcome { verdict, stop_reason, turns, error }`; execution retries
max-turns-without-verdict / stream-error / timeout per `max_attempts`.
- Remove `src/checks/mcp/` and `mcp_config_path`; drop now-dead `rmcp`, `axum`,
`schemars` deps.
- Executor selectable via `--executor` / `MULTI_CHECKS_EXECUTOR` / config
(`cersei` default, `claude` fallback); the `claude -p` fallback now reports
via a sentinel JSON file since the MCP server is gone.
- Keep the slim `CheckExecutor` seam (Cersei + Claude + Fake); execution-phase
tests still run without network or a live model.
- Update `guides/checks.md` and the self-validation `CHECKS.md` to the new
in-process design.

Vendor cersei-provider 0.1.9 via `[patch.crates-io]` (`third_party/cersei-provider`)
with a one-line fix: upstream sends a stale
`anthropic-beta: interleaved-thinking-2025-04-14` header unconditionally, which
the current Anthropic API rejects with HTTP 400, breaking every in-process
check. See `third_party/cersei-provider/PATCH.md`; remove once upstream ships a
fix.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>