Release v3.3.0 — cluster-grade runtime + review hardening by ZhiXiao-Lin · Pull Request #47 · AI45Lab/Code

ZhiXiao-Lin · 2026-05-29T03:44:33Z

Summary

a3s-code v3.3.0 — cluster-grade runtime features for hosting long-lived
agent sessions across nodes, plus an adversarial-review hardening pass.
All additions are backward compatible (new methods, new optional
fields, SessionStore trait methods with default no-op impls) → minor bump.

Full details in CHANGELOG.md. Highlights:

Added

Lifecycle: real AgentSession::close() (cancels run + subagent tasks +
HITL), agent-side session registry (list_sessions/close_session/close),
session-level CancellationToken hierarchy.
Identity labels: tenant_id/principal/agent_template_id/correlation_id
on SessionOptions+SessionData (opaque, host-driven), exposed on both SDKs.
BudgetGuard cost/quota contract wired at the LLM call site (Deny →
BudgetExhausted, SoftLimit → event). SDK bridges (Python class, Node
setBudgetGuard, fail-closed).
HostEnv (IdGenerator + Clock) injection for deterministic replay.
Loop checkpoints + resume_run: per-tool-round persistence (crash-atomic
file writes), resume from the last boundary on any node sharing the store.
SessionRetentionLimits: FIFO caps on run/event/trace/terminal-subagent
state. MCP idle disconnect. Cluster AgentEvent variants. Subagent
tracker persistence.

Fixed (review hardening)

Loop-checkpoint leak (unbounded growth) — now cleared on terminal.
event_count corruption on restore. Node BudgetGuard fail-open → fail-closed.
MCP timestamp leak. Session-registry Weak pruning. Eviction TOCTOU
(atomic multi-lock).

Testing

1705 lib + all integration files green; Node 27 + Python 19 cargo tests;
all SDK smokes; clippy clean (core + both SDKs); cargo fmt --check clean.
Real-LLM end-to-end: 5 #[ignore] tests pass against
openai/MiniMax-M2.7-highspeed (BudgetGuard real usage, deny-blocks-call,
resume_run metric carry-forward, checkpoint clear, identity labels).
Version alignment green at 3.3.0 (scripts/check_release_versions.sh).

Release

Merging this, then pushing the v3.3.0 tag, triggers release.yml
(crates.io → npm → GH Release wheels → PyPI bootstrap → GitHub Release).

Known limitation

Node BudgetGuard callbacks must not throw (napi return-conversion aborts the
process); wrap in try/catch and return a decision. Hangs are fail-closed.
Python is unaffected.

Turn AgentSession::close() from a thin cancel() wrapper into a real cleanup path: - New `closed: AtomicBool` state; `is_closed()` accessor. - send / stream / send_with_attachments / stream_with_attachments fast-fail with new `CodeError::SessionClosed { session_id }` once closed. - close() is idempotent and runs the full sequence: cancel current run -> cancel all in-flight delegated subagent tasks -> cancel pending HITL confirmations. Tests cover the closed-state lifecycle (idempotency, post-close send/stream fast-fail, close-mid-flight cancellation).

Make AgentSession the parent of all in-flight cancellation tokens so session shutdown cascades automatically and races between close() and new operations are safe. - New `session_cancel: CancellationToken` field on AgentSession; public `session_cancel_token()` accessor lets embedders observe or fire it directly. - BlockingRunContext and StreamRunContext derive their per-run token via `session.session_cancel.child_token()` instead of fresh tokens. - Run lifecycles classify a failed run as Cancelled vs Failed by sampling the per-run token's `is_cancelled()` flag before clearing it — so runs killed via session_cancel land as Cancelled in the run store, not Failed. - close() fires session_cancel before per-run bookkeeping so any concurrently-spawned child inherits an already-cancelled token. Tests cover the token lifecycle and propagation to in-flight runs.

Make agent-side proactive shutdown a first-class API: enumerate live sessions, close one by ID, or close the whole agent (and its global MCP) in a single call. - New `agent_api/session_close.rs` module: `SessionCloseHandle` is an Arc-shareable bundle of every field needed to perform the close sequence. AgentSession holds an `Arc<SessionCloseHandle>`; Agent stores `Weak<SessionCloseHandle>` in its registry. - Agent gains `sessions: Mutex<HashMap<id, Weak<SessionCloseHandle>>>` and `closed: AtomicBool`. New public methods: `list_sessions()`, `close_session(id)`, `close()`, `is_closed()`. - AgentSession::close() refactored to a single-line delegate to the shared handle, so `agent.close_session(id)` and `session.close()` can never drift in behaviour. - `Agent::close()` also walks `global_mcp.list_connected()` and disconnects every server so background workers exit. - `Agent::session()` / `resume_session()` fail fast with `CodeError::SessionClosed` once the agent is closed. - Dead `Weak` entries are pruned lazily on `list_sessions()` / `close_session()` access — no background sweeper needed. Tests cover live-tracking with drop-driven pruning, close-by-id, agent-wide close, and post-close session() rejection.

Propagate the new core close-and-registry surface to the Node and Python SDKs. Node (napi): - Agent: listSessions / closeSession / close / isClosed. - Session: isClosed. Python (pyo3): - Agent: list_sessions / close_session / close + is_closed getter. - Session: is_closed getter. README updated to show the agent-side lifecycle calls alongside the existing session.cancel() / session.close() examples.

Cover the surfaces unit tests can't reach: - core/tests/test_session_close_lifecycle.rs (3 tests): * close_with_subagent_in_flight_marks_task_cancelled_and_resists_regression — verifies the session-close → subagent-tracker cross-module path, including the "late SubagentEnd must not regress Cancelled" contract. * agent_close_handles_global_mcp_branch_and_is_idempotent — exercises Agent::close() both with and without global_mcp, and confirms post-close session() rejection. * session_drop_prunes_registry_under_concurrency — multi-thread runtime spawns 32 sessions, drops half, asserts the Weak-ref registry converges to exactly the held set. - sdk/python/tests/test_session_close.py - sdk/node/test_session_close.mjs Smoke tests for the SDK wrappers: session.is_closed, session.close() idempotency, agent.list_sessions(), agent.close_session(id), agent.close() rejecting later session(). Verified runnable locally (maturin develop / napi build). Adds `AgentSession::subagent_tracker()` accessor so embedders with a custom subagent execution path can feed lifecycle events through the same close() pipeline as the built-in `task` tool — also unblocks the in-flight integration test. generated.d.ts is regenerated by `napi build` and now reflects the step-4 Agent / Session close methods.

Close the framework gap that blocked 书安OS from migrating sessions: session.save() now also persists the InMemorySubagentTaskTracker snapshots, and resume_session restores them. - SessionStore trait: new save_subagent_tasks / load_subagent_tasks with no-op defaults (backwards compatible for custom backends). - MemorySessionStore + FileSessionStore implement them; FileSessionStore writes to <dir>/subagent_tasks/<id>.json and cleans up on delete(). - InMemorySubagentTaskTracker::replace_snapshots() — restoration entry point. Cancellers are intentionally NOT restored (runtime-only). - session_persistence: wired into both save() and restore_persisted_session_state(). Integration test (test_session_close_lifecycle::subagent_tasks_persist _across_save_and_resume) covers the full matrix of terminal states (Completed/Failed/Cancelled) and verifies the post-restore semantics: late cancel attempts on terminal tasks safely return false. Before this, restoring a session lost its delegated child run history — which is the materialized view 书安OS needs to make scheduling / billing / drain decisions on a session about to migrate.

…emplate / correlation (P5) Add four optional, framework-opaque string slots to SessionOptions -> AgentSession -> SessionData so the host (书安OS) can drive multi-tenancy / accounting / distributed tracing without string-hacking session_id. - SessionOptions builder helpers: with_tenant_id / with_principal / with_agent_template_id / with_correlation_id. - AgentSession accessors: tenant_id() / principal() / agent_template_id() / correlation_id(). - Round-trip via SessionData (#[serde(default, skip_serializing_if = "Option::is_none")]) — forward-compatible for stores written by older versions. - apply_persisted_runtime_options restores labels on resume but caller- supplied opts take precedence (so a resume can intentionally relabel). Framework only transports these strings — it never interprets tenant boundaries, enforces quotas, or routes by template. That belongs to custom HookExecutor / PermissionChecker / (future) BudgetGuard impls the host plugs in. Tests: unit (defaults + round-trip via SessionOptions) + integration (persist & restore through MemorySessionStore, including caller-side override).

…PassivationRequested / PeerInvocation (P6) Extend AgentEvent with three variants that the host (书安OS) emits via HookExecutor to surface platform-level decisions inside the session loop: - BudgetThresholdHit { resource, kind, consumed, limit, message? } — host's BudgetGuard fired a soft/hard threshold. In-session policy decides what to do (compact, refuse next LLM call, etc). - PassivationRequested { reason, deadline_ms? } — host wants the session to release in-memory caches before close/migration. Framework does not act on this; in-session hooks can flush derived state. - PeerInvocation { from_session_id, from_tenant_id?, correlation_id? } — marks a send/stream that originated from a peer session in the cluster, so hooks can distinguish human-driven from peer-driven work. These variants are pure schema additions. The framework never produces them itself — only the host's transport / control plane does. They exist so cluster events flow through the same hook/trace pipeline as agent-loop events, without coupling the framework to a transport. #[non_exhaustive] on AgentEvent (already present) keeps minor releases non-breaking. New variants use #[serde(default, skip_serializing_if = "Option::is_none")] for optional fields to allow forward-compatible producers. Test: schema lock unit test verifies stable JSON tags (used by external producers) and round-trip with minimal payloads.

Define the cluster-grade cost/quota contract a3s-code owns at the framework level, with enforcement wired into the LLM call path. Real implementations (per-tenant Redis budgets, per-day USD caps, etc) live in 书安OS. - core/budget.rs: * BudgetGuard trait with default no-op methods (check_before_llm, record_after_llm, check_before_tool). * BudgetDecision { Allow, SoftLimit, Deny } — Allow proceeds silently, SoftLimit emits a BudgetThresholdHit("soft") event and continues, Deny bails with CodeError::BudgetExhausted. * NoopBudgetGuard as the framework default. - SessionOptions::with_budget_guard + AgentSession routes it through AgentConfig.budget_guard. - agent/llm_turn.rs::call_llm_with_circuit_breaker now consults the guard *once per turn* (not per retry): * Deny -> emit BudgetThresholdHit("hard") + return Err with "Budget exhausted on '{resource}': {reason}" * Soft -> emit BudgetThresholdHit("soft") + proceed * Allow -> proceed silently And calls record_after_llm with the real provider usage on success. - New CodeError::BudgetExhausted variant for SDK-level type matching. - Internal estimate_prompt_tokens helper (char/4 heuristic) — impls needing precision should use record_after_llm instead. Tests: NoopBudgetGuard fast-path; custom guard returning Deny; full integration via AgentSession::send verifies the LLM is never reached, history stays clean, record_after_llm doesn't fire on Deny. The framework still does not interpret tenants / cost / time itself — the trait is the boundary between framework mechanism and host policy.

Lift the framework's ambient capabilities — fresh IDs and current time — out of `uuid::Uuid::new_v4()` / `SystemTime::now()` direct calls and into a host-provided pair. Unlocks two cluster-grade features: 1. Deterministic replay of a run on another node (book-keep the seed, replay bit-identical). 2. Time-bending tests without monkey-patching std::time. - core/host_env.rs: * `IdGenerator` + `Clock` traits (Send+Sync+Debug). * `HostEnv { id_generator, clock }` bundle so a single Arc slot plumbs both. * Defaults: `SystemIdGenerator` (UUID v4) + `SystemClock` (wall time) — observably identical to pre-P2 behaviour. * Replay helpers: `SequentialIdGenerator { prefix, counter }` + `FixedClock { atomic now_ms }`. Public so external host crates (e.g. 书安OS replay) can use without copying the pattern. - AgentConfig gains `host_env: Arc<HostEnv>` defaulting to `HostEnv::system()`. SessionOptions::with_host_env() lets the host swap it. - Migrated two highest-leverage call sites (proof-of-pattern; the rest is leaf work that can move incrementally): * session_id generation in session_builder::prepare_session_options * run_id generation in RunControlState::start_run via new `InMemoryRunStore::create_run_with_id` overload (back-compat alias `create_run` kept). Tests: trait roundtrips (Seq/FixedClock determinism), and a full integration that wires SequentialIdGenerator into a fresh Agent and verifies the resulting sessions have session_id "test-0", "test-1". The framework still does not interpret IDs — it just consults the generator. Replay infrastructure (P3) lives on top of this contract.

Land the data contract and persistence wiring for crash-tolerant runs. After each completed tool round the agent loop snapshots the boundary into a LoopCheckpoint keyed by run_id; 书安OS picks this up to migrate / replay a run on another node. Boundary policy: checkpoints are taken **only between** tool rounds, never mid-tool. If a process dies mid-tool the work of that round is lost — the LLM re-deliberates from the previous checkpoint. This trades retry cost for correctness; re-executing a non-idempotent tool (write, bash) across the boundary is worse than re-asking. What this cut delivers: - core/loop_checkpoint.rs: * `LoopCheckpoint` struct (serde, schema_version=1 with forward- compatible `#[serde(default)]`). * `LoopCheckpointSink` trait — save_checkpoint / load_latest. * `SessionStoreCheckpointSink` default adapter — forwards into a `SessionStore`; failures are warn-logged but never halt a run. - `SessionStore` trait: save_loop_checkpoint / load_loop_checkpoint with no-op defaults (backwards compatible). MemorySessionStore + FileSessionStore implement them; FileSessionStore writes under `<dir>/loop_checkpoints/<run_id>.json` (keyed by run_id, not session_id — multiple runs per session). - `AgentLoop`: new `checkpoint_sink` + `checkpoint_run_id` fields; builder helpers `with_checkpoint_sink` + `set_checkpoint_run`. `build_agent_loop` auto-wires a sink from `session.session_store` when one is configured. BlockingRunContext + StreamRunContext bind the run id via `set_checkpoint_run` after `start_run` returns. - `loop_runtime::execute_loop_inner` calls `persist_loop_checkpoint` after each successful `execute_tool_turn`, capturing `(turn, messages, total_usage, tool_calls_count, verification_reports, host_env.now_ms())`. Checkpoint write is fire-and-forget — sink errors don't propagate to the loop. Tests: - core/src/loop_checkpoint.rs: JSON round-trip + forward-compat (missing schema_version defaults to 0). - integration: * loop_checkpoint_round_trips_through_session_store — full store contract: save → load → identical, second save overwrites, unknown run_id → None. * send_without_tool_calls_does_not_emit_loop_checkpoint — contractual negative: no tool rounds → no checkpoint pollution. Cut 2 (separate commit) will add `AgentSession::resume_run(run_id)` to actually pick up from a checkpoint, plus the end-to-end "crash mid-run on node A, finish on node B" integration test that needs a tool-using mock LLM. Data contract lands now so 书安OS can start building the surrounding control plane.

…3 cut 2) Land the resume side of the loop-checkpoint contract. After P3 cut 1 the framework persisted per-tool-round boundaries; this cut lets a new process pick up that boundary and continue. API: - `AgentSession::resume_run(checkpoint_run_id: &str) -> Result<AgentResult>` loads the latest `LoopCheckpoint` for the given run id from the session's `SessionStore` and replays the agent loop from the checkpoint's `messages`. Semantics: - A **new** run id is allocated for the resumed work — the framework does not pretend the old run continues. The relationship between old and new run is host metadata (e.g. 书安OS tracks it). - The new run also writes per-tool-round checkpoints (P3 cut 1 wiring), so a session can survive multiple node failures. - Two distinguishable error paths so cluster scheduling code can branch: * "resume_run requires a session_store" — host should fall back to a fresh session. * "no loop checkpoint found for run 'X'" — host can retry later (race against checkpoint write) or treat the run as lost. Implementation: - `conversation_runtime::resume_run` mirrors `send_with_attachments`' structure (BlockingRunContext + execute_from_messages) but feeds `checkpoint.messages` as the prebuilt message list, so the loop resumes at exactly the boundary state instead of appending a new user prompt. Integration test (`resume_run_error_paths_are_distinguishable`) covers both error branches so the strings stay stable for host-side matching. Together with cut 1, P3 now provides the full mechanism 书安OS needs: the framework persists boundary state automatically during a live run, and a fresh node can replay from any persisted boundary by calling resume_run on a freshly-created session bound to the same store. The framework still does not handle node selection, drain choreography, or run-graph metadata — those remain 书安OS concerns.

Mirror the P5 (identity labels) and P3 cut 2 (resume_run) framework additions through both SDKs so host code can drive them from JS/TS or Python without reaching into the Rust core. Node (napi): - SessionOptions gains optional `tenantId / principal / agentTemplateId / correlationId` fields. js_session_options_to_rust forwards each via the matching `with_*` builder. - Session gets four read-only getters with the same names plus an async `resumeRun(checkpointRunId)` that returns the AgentResult of the resumed run. - generated.d.ts regenerated by `napi build` so TypeScript callers see the new types. Python (pyo3): - PySessionOptions gains `tenant_id / principal / agent_template_id / correlation_id` with paired getter/setters (matching the existing session_id pattern). build_rust_session_options forwards each. - PySession exposes the same four labels as `@getter` properties plus a `resume_run(checkpoint_run_id)` method that raises RuntimeError on missing store / missing checkpoint (the framework's two distinguishable error strings stay intact for host-side branching). Verification: Node 27 unit tests + Python 19 unit tests still green; both SDKs pass `cargo clippy --lib -- -D warnings`.

Add a quick-reference example to the "Main APIs At A Glance" snippets showing the new identity labels (tenant_id / principal / agent_template_id / correlation_id) and `resume_run` flow. Same in both the Python and TypeScript mirror sections so SDK callers see the surface immediately. Detailed semantics live in apps/docs/content/docs/{en,cn}/code/ api-contract.mdx ("Cluster-grade extension points").

Long-running sessions accumulated three classes of in-memory state without bound: run records + per-run event buffers in InMemoryRunStore, trace events in InMemoryTraceSink, and terminal subagent task snapshots in InMemorySubagentTaskTracker. Fine for short sessions, a memory leak for the cluster workloads 书安OS is expected to host (hours / days / thousands per node). This commit lands FIFO eviction caps that the host opts into per session — defaults stay unbounded so existing callers see no behaviour change. API: - New `retention::SessionRetentionLimits` struct with four optional caps: max_runs_retained, max_events_per_run, max_trace_events, max_terminal_subagent_tasks. - `SessionOptions::with_retention_limits(...)` plumbs them into AgentConfig via session_builder + capabilities. Eviction policy (oldest-first, idempotent): - InMemoryRunStore: parallel VecDeque tracks insertion order; on create_run past the cap, oldest run + its events are dropped. Per-run events are FIFO-trimmed in record_event so the buffer stays at most max_events_per_run. `event_count` on RunSnapshot remains the cumulative total — not decremented on eviction. replace_records (resume path) rebuilds the queue in creation order so restored sessions honour the same cap. - InMemoryTraceSink: drain front past cap; preserves the most recent (most useful for debugging) events. drain(..excess) is O(n) per push at cap; switch to VecDeque if the trace path becomes a perf bottleneck. - InMemorySubagentTaskTracker: new terminal_order VecDeque records every Running → terminal transition (Completed / Failed / Cancelled). Running tasks are **never** evicted — only terminal snapshots. cancel() and the SubagentEnd record_event path both participate idempotently (a late SubagentEnd after cancel doesn't double-push). Tests: - Three unit-test blocks (one per store) cover: cap enforcement, FIFO ordering, default-unbounded behaviour, and the running-task-immunity property of the subagent tracker. - Integration test `retention_limits_are_plumbed_into_subagent_tracker` exercises the public SessionOptions → AgentSession → tracker path end-to-end via the same accessor 书安OS would use. Test count: 1692 unit (+9 new) + 9 integration (+1 new), clippy clean on `--lib -- -D warnings`.

The P3 cut-2 resume_run API only had error-path tests (missing store / missing checkpoint). Lock the happy-path contract too — write a LoopCheckpoint, call resume_run, verify the loop picks up from the checkpointed messages. test_resume_run_picks_up_from_persisted_checkpoint: - Seed a LoopCheckpoint in MemorySessionStore with messages representing one prior tool round. - Build a session bound to the same store with a StaticStreaming mock LLM that produces a final-answer text. - Call session.resume_run(seeded_run_id). - Assert: * AgentResult.text matches the mock's final response — proving the loop fed the checkpoint's messages to the LLM via execute_from_messages and ran to completion. * runs() contains exactly one new run whose id is NOT the seeded run id — framework allocates fresh, never pretends to continue the old run. * The seeded checkpoint stays in the store under the old run id — resume does not delete; retention is the host's call. Crash simulation is reduced to a manual checkpoint seed because the in-process agent loop has no "die mid-round" affordance suitable for unit testing. The contract surface that 书安OS will sit on (write on node A, hand run id to node B, resume) is fully exercised through the public API. 1693 lib tests + 9 integration green; clippy clean.

Hosts running thousands of long-lived sessions accumulate MCP subprocesses + transport connections even when individual servers fall quiet. Add a periodic-sweep API the host calls (e.g. every 60s) to drop servers whose last activity is older than a threshold. McpManager: - New `last_used_at_ms: HashMap<String, u64>` parallel to `clients`. Stamped on `connect` (initial use) and on every successful `call_tool` (active use). Cleared on `disconnect`. - `pub async fn last_used_at_ms(name) -> Option<u64>` for host-side observability (e.g. dashboards / Prometheus scrapes). - `pub async fn touch(name)` so hosts can mark a server as warm out-of-band when activity comes via a side channel. - `pub async fn disconnect_idle(threshold_ms) -> Vec<String>` — iterates connected clients, drops every one whose timestamp is older than `now - threshold_ms` (or has no timestamp at all — treated as infinitely idle). Per-server disconnect failures are warn-logged but never panic; the result vec lists every name attempted. Agent facade: - New `Agent::disconnect_idle_mcp(threshold_ms) -> Vec<String>` forwards to the global manager when present, else returns empty. This is the entry point a host's idle-reaper will call. Tests: - Three unit tests covering touch monotonicity, idle-sweep no-op with no live clients, and disconnect cleanup of the timestamp entry. - One Agent-level test covering the no-global-mcp short-circuit (contract surface for hosts). Uses a local `now_epoch_ms` helper rather than HostEnv (the manager predates HostEnv wiring; threading the host's Clock through to MCP is a separate change if needed for replay). 1697 unit + 9 integration tests green; clippy clean.

…Guard Propagate the framework-side BudgetGuard contract through pyo3 so Python hosts (the most likely 书安OS surface) can plug in cluster- wide cost / quota policy without writing Rust. Bridge: - `PyBudgetGuard { inner: Py<PyAny> }` implements a3s_code_core::budget::BudgetGuard via the async-trait. Each method acquires the GIL (Python::with_gil), looks up the named method on the held PyObject by `getattr`, and calls it. Missing methods behave as Allow / no-op so user classes only need to define what they actually want to govern. - `parse_py_budget_decision` parses the return value: None | {"decision": "allow"} → Allow {"decision": "soft", "resource", "consumed", "limit", "message"?} → SoftLimit {"decision": "deny", "resource", "reason"} → Deny Unknown / malformed shapes default to Allow (fail-safe). - Python callback raising never propagates: warning is printed to stderr, decision falls back to Allow / record_after_llm becomes no-op. A misbehaving guard cannot halt a live session. SDK surface: - PySessionOptions gains `budget_guard: Option<PyObject>` with matching `@getter` / `@setter`. build_rust_session_options wraps the held callable as `Arc<dyn BudgetGuard>` and calls with_budget_guard. Test (sdk/python/tests/test_budget_guard.py): - DenyingGuard returns {"decision":"deny", ...} from check_before_llm; session.send raises RuntimeError mentioning "Budget exhausted" / "llm_tokens"; check_before_llm fires exactly once; record_after_llm does not fire; session_id propagates correctly. - AllowingGuard with only no-op methods constructs a session cleanly — proves missing methods are tolerated. - SessionOptions.budget_guard round-trips through getter/setter including None reset. GIL acquisition blocks the tokio worker thread briefly per call. Acceptable here because BudgetGuard fires at most once per LLM turn / tool call, not on hot tool execution paths. Python SDK: 19 cargo tests + smoke test green, clippy clean.

Propagate the framework-side BudgetGuard contract through napi-rs so JS hosts can plug in cluster-wide cost / quota policy. The bridge follows the same pattern as Python's PyBudgetGuard but uses ThreadsafeFunction for cross-thread JS calls. Framework support (one small core addition): - AgentSession::set_budget_guard(Option<Arc<dyn BudgetGuard>>) + budget_guard() — runtime-mutable override slot. Read by build_agent_loop at every send/stream, takes precedence over config.budget_guard. Needed because JsFunction values can't live inside the value-typed SessionOptions. Node SDK surface: - `session.setBudgetGuard({checkBeforeLlm, recordAfterLlm, checkBeforeTool})` — all three handlers optional; missing methods fall back to Allow / no-op. Pass `null` for the whole arg to clear the guard. - Returns `BudgetDecision`: `null` | `{decision:'allow'}` → Allow `{decision:'soft', resource, consumed, limit, message?}` → emits `BudgetThresholdHit('soft')`, proceeds `{decision:'deny', resource, reason}` → aborts with "Budget exhausted" error - `BudgetGuardHandlers` napi(object) shape — typed in generated.d.ts. Bridge internals (NodeBudgetGuard): - ThreadsafeFunction<serde_json::Value, ErrorStrategy::Fatal> per handler. `Fatal` (not CalleeHandled) so the JS callback receives *only* the positional args — no leading Node-style `err`. - Args fanned out as `serde_json::Value::Array(...)`; the transformer closure unpacks the array into multiple positional JsUnknowns. - Rust async trait method blocks via `tokio::task::block_in_place` on a sync `mpsc::sync_channel` waiting for the JS callback's return value. 5-second timeout falls back to Allow. - Decisions parsed with the same shape as Python: - `parse_js_budget_decision(JsUnknown) -> BudgetDecision` - Unknown / malformed shapes fall back to Allow (fail-safe). Tests: - Unit: `test_runtime_budget_guard_overrides_session_options_value` verifies the runtime override slot wins over config.budget_guard and that clearing it (`set_budget_guard(None)`) restores the unguarded path. - Smoke: `sdk/node/test_budget_guard.mjs` installs a JS guard whose checkBeforeLlm denies, asserts `session.send` throws `Budget exhausted`, checkBeforeLlm fires exactly once, and recordAfterLlm doesn't fire. Then verifies `setBudgetGuard(null)` is accepted without error. generated.d.ts regenerated by `napi build:debug` — `setBudgetGuard` and `BudgetGuardHandlers` show up with the documented TS shape. Core: 1698 unit tests pass; Node: 27 cargo tests + smoke green; clippy clean across core and Node SDK.

Extend the Python and TypeScript "Main APIs at a Glance" snippets with two new sections so the recent additions are discoverable: - (15) Long-running session ops — SessionRetentionLimits + agent.disconnectIdleMcp / disconnect_idle_mcp. - (16) Budget / cost governance — Python opts.budget_guard class-shape and Node session.setBudgetGuard({...}) handler-shape, both showing Deny → "Budget exhausted" and the implicit Allow fallthrough. Detailed semantics live in apps/docs/content/docs/{en,cn}/code/ api-contract.mdx (covered in the docs-site commit on the parent repo).

@Getter

Propagate the framework-side retention caps through both SDKs so host code can stop long-running cluster sessions from accumulating in-memory state. Each cap is optional; missing fields keep the framework's unbounded default for that store. Python (pyo3): - PySessionOptions gains `retention_limits: Option<PyObject>` with matching @Getter / @Setter. - `parse_py_retention_limits` accepts a dict shape with optional integer keys: max_runs_retained / max_events_per_run / max_trace_events / max_terminal_subagent_tasks. Unknown / non-int values are ignored (no error), missing keys keep the default. - Forwarded via SessionOptions::with_retention_limits. Node (napi): - New `#[napi(object)] RetentionLimitsObject` with four optional `u32` fields (TypeScript-friendly), generated into the .d.ts as `RetentionLimitsObject` and surfaced on SessionOptions as `retentionLimits?: RetentionLimitsObject`. - js_session_options_to_rust converts each present field to usize and forwards via with_retention_limits. Verification: - Python SDK 19 cargo tests pass; clippy clean. - Node SDK 27 cargo tests pass; clippy clean. - generated.d.ts exposes the new types with the documented camelCase shape (retentionLimits + maxRunsRetained etc.).

Add a single integration test that exercises the **full** cluster- grade API surface in one realistic two-node lifecycle. This is the reference flow 书安OS-side scheduling code targets — every new piece shipped in the cluster pillars work participates. `cluster_ops_consolidated_session_lifecycle`: Node A (one Agent): - Creates a session with identity labels (tenant_id / principal / agent_template_id / correlation_id), retention caps, and a shared MemorySessionStore. - Injects a Completed subagent task into the tracker. - Persists state via session.save(). - Seeds a LoopCheckpoint for an "in-flight" run. - Drops the agent (simulating node failure / drain). Node B (a *different* Agent on the same store): - agent_b.resume_session("cluster-ops-target", opts) — Node B hydrates the session from the shared store. - Asserts identity labels survive verbatim. - Asserts the subagent task history (Completed status) survives. - Asserts the LoopCheckpoint persists across the migration with run id, turn count, message vec, and token usage intact — this is the contract resume_run reads from. The test deliberately doesn't *call* resume_run() because the test config has no real LLM credentials — that flow is covered by test_resume_run_picks_up_from_persisted_checkpoint in unit tests (uses build_session + mock streaming client). Here we lock the **data contract** for cross-node migration: identity, subagent view, and checkpoint state must round-trip through the store with no framework-level interpretation. Test count: 1698 lib + 10 integration (+1 new), clippy clean.

Adversarial multi-dimension review of the cluster-pillars batch surfaced 11 confirmed issues; this commit fixes every core-side one. HIGH: - H4 Checkpoint leak (unbounded growth). Loop checkpoints were written after every tool round and NEVER deleted — the dominant memory/disk leak for long-running hosts. Added SessionStore::delete_loop_checkpoint (memory + file impls); the run lifecycle now clears the checkpoint on any in-process terminal (complete/cancel/fail) in BlockingRunLifecycle ::complete and StreamRunLifecycle::wrap. Only a true crash (loop never returns) leaves a checkpoint for crash-recovery resume. Also: FileSessionStore::save_loop_checkpoint is now crash-atomic (temp + fsync + rename) — a checkpoint exists to survive a crash, so a half-written one (plain fs::write) that fails to parse defeats the purpose. (completeness-critic finding folded in.) - H3 event_count corruption. replace_records overwrote the persisted CUMULATIVE event_count with the (possibly trimmed) buffer length, corrupting audit counts for any restored run whose buffer hit max_events_per_run. Deleted the offending line; trust the snapshot. - H2 resume_run dropped checkpoint metrics. execute_from_messages built a fresh ExecutionLoopState (zeroed total_usage / tool_calls_count), so resumed runs under-reported cumulative cost. Added ExecutionSeed + ExecutionLoopState::new_seeded and threaded it through execute_from_messages_seeded -> BlockingRunContext -> resume_run, so a resumed run continues accounting from the checkpoint. MEDIUM: - M1 subagent eviction TOCTOU. mark_terminal_and_evict took terminal_order/tasks/cancellers locks separately, letting a concurrent record_event re-insert an evicted victim. Now holds all three together in one canonical order (callers drop their guards first → no deadlock). - M2 run-store eviction TOCTOU + lock-ordering. create_run_with_id locked runs/events separately (and in the opposite nesting order from the rest); now holds order+events+runs together for insert+evict in one canonical order. - M3 MCP timestamp leak. touch() records a timestamp unconditionally (even for a never-connected name) and disconnect_idle only scanned clients.keys(); orphan timestamps leaked. disconnect_idle now retains last_used_at_ms to live clients. LOW: - L1 Session registry dangling Weaks. close_agent now retain()s the registry before snapshotting handles. Tests: store delete + crash-atomic (no temp leftovers); lifecycle clears checkpoint on completion (deterministic run id); event_count preserved across replace_records after trim; resume_run carries non-zero checkpoint metrics (1002 not 2); two multi-thread concurrency stress tests guard the eviction lock-ordering changes against deadlock; MCP orphan-timestamp purge. 1705 lib + 10 integration green; clippy clean.

…/L2) Closes the SDK-side findings from the cluster-pillars review. H1 — Node BudgetGuard fail-OPEN hole (was: hung/slow guard silently ALLOWED, disabling budget enforcement): - call_decision now fails CLOSED: a guard that does not respond within timeoutMs -> Deny{budget_guard_timeout}; an unreadable return value -> Deny{budget_guard_error}. Previously both defaulted to Allow. - timeoutMs is configurable per guard (BudgetGuardHandlers.timeoutMs, default 5000). - Callbacks now receive a single context object — checkBeforeLlm({ sessionId, estimatedTokens }), recordAfterLlm({ sessionId, usage }), checkBeforeTool({ sessionId, toolName }). - Documented napi-rs constraint: a JS throw from a guard callback aborts the host process at return-value conversion (true under BOTH Fatal and CalleeHandled in this napi version — empirically verified), so callbacks MUST NOT throw; wrap in try/catch and return a decision. The fail-closed timeout still covers hangs. Python is unaffected (PyBudgetGuard catches exceptions). L2 — Python BudgetGuard re-entrancy doc: warn that calling session/agent APIs from inside a guard callback risks GIL re-entrancy deadlock. M4 — disconnect_idle_mcp now exposed in BOTH SDKs (was core-only, yet already referenced by the docs): - Node: agent.disconnectIdleMcp(idleThresholdMs) -> Promise<string[]> - Python: agent.disconnect_idle_mcp(idle_threshold_ms) -> list[str] The docs that referenced these now describe real methods. Verification: Node 27 + Python 19 cargo tests, clippy clean on both; Node smokes (budget deny path + fail-closed config + disconnectIdleMcp) and all three Python smokes (budget guard, session close + disconnect_idle_mcp, subagent query) pass against a fresh maturin build. generated.d.ts regenerated (setBudgetGuard ctx shape + timeoutMs + disconnectIdleMcp).

Reflow long lines in the BudgetGuard / disconnect_idle_mcp additions (281dc58) that the core-scoped pre-commit fmt hook did not check. Formatting only — no logic change (verified: only line wrapping).

Bump all packages 3.2.1 -> 3.3.0 and add the CHANGELOG entry. No push, no tag — release prep only. Version sync (scripts/check_release_versions.sh green at 3.3.0): - core/Cargo.toml, sdk/node/Cargo.toml (+ core dep pin), sdk/python/Cargo.toml (+ core dep pin) - sdk/node/package.json (+ @a3s-lab/code-* optionalDependencies) - sdk/python/pyproject.toml - sdk/python-bootstrap/pyproject.toml + _bootstrap.py __version__ - Cargo.lock, sdk/node/package-lock.json, sdk/node/examples/package-lock.json CHANGELOG 3.3.0 documents the cluster-grade runtime batch (session/agent lifecycle, identity labels, BudgetGuard, HostEnv, loop checkpoints + resume_run, retention caps, MCP idle disconnect, cluster AgentEvents, subagent persistence) plus the adversarial-review hardening fixes, and notes the Node BudgetGuard no-throw limitation. Minor bump per semver: all additions are backward compatible (new methods, new optional fields, SessionStore trait methods with default no-op impls).

Add #[ignore] integration tests (core/tests/test_real_llm_cluster_features.rs) that exercise the 3.3.0 LLM-loop features against a real model from .a3s/config.acl — validating paths mock clients cannot: - real_budget_guard_allow_records_actual_usage: record_after_llm receives the provider's ACTUAL non-zero token usage (mocks return fixed/zero). - real_budget_guard_deny_blocks_llm_call: Deny aborts before the provider is contacted; no usage, no history. - real_resume_run_carries_checkpoint_metrics_forward: resume_run against the live model continues cumulative metrics from the checkpoint. - real_run_with_store_leaves_no_dangling_checkpoint: completed real run clears its loop checkpoint (leak-fix lifecycle, end-to-end). - real_identity_labels_survive_live_run: tenant/principal/template/ correlation intact through a live run. Run with: A3S_CONFIG_FILE=/abs/.a3s/config.acl \ cargo test -p a3s-code-core --test test_real_llm_cluster_features \ -- --ignored --nocapture Verified locally: 5 passed against openai/MiniMax-M2.7-highspeed (155s).

claude added 27 commits May 28, 2026 10:57

style(sdk): cargo fmt node + python lib sources

213eb79

Reflow long lines in the BudgetGuard / disconnect_idle_mcp additions (281dc58) that the core-scoped pre-commit fmt hook did not check. Formatting only — no logic change (verified: only line wrapping).

ZhiXiao-Lin merged commit d2ed389 into main May 29, 2026
1 check passed

ZhiXiao-Lin deleted the release/v3.3.0 branch May 29, 2026 03:48

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Release v3.3.0 — cluster-grade runtime + review hardening#47

Release v3.3.0 — cluster-grade runtime + review hardening#47
ZhiXiao-Lin merged 27 commits into
mainfrom
release/v3.3.0

ZhiXiao-Lin commented May 29, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

ZhiXiao-Lin commented May 29, 2026

Summary

Added

Fixed (review hardening)

Testing

Release

Known limitation

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants