Release v3.3.0 — cluster-grade runtime + review hardening#47
Merged
Conversation
Turn AgentSession::close() from a thin cancel() wrapper into a real
cleanup path:
- New `closed: AtomicBool` state; `is_closed()` accessor.
- send / stream / send_with_attachments / stream_with_attachments
fast-fail with new `CodeError::SessionClosed { session_id }` once
closed.
- close() is idempotent and runs the full sequence: cancel current
run -> cancel all in-flight delegated subagent tasks -> cancel
pending HITL confirmations.
Tests cover the closed-state lifecycle (idempotency, post-close
send/stream fast-fail, close-mid-flight cancellation).
Make AgentSession the parent of all in-flight cancellation tokens so session shutdown cascades automatically and races between close() and new operations are safe. - New `session_cancel: CancellationToken` field on AgentSession; public `session_cancel_token()` accessor lets embedders observe or fire it directly. - BlockingRunContext and StreamRunContext derive their per-run token via `session.session_cancel.child_token()` instead of fresh tokens. - Run lifecycles classify a failed run as Cancelled vs Failed by sampling the per-run token's `is_cancelled()` flag before clearing it — so runs killed via session_cancel land as Cancelled in the run store, not Failed. - close() fires session_cancel before per-run bookkeeping so any concurrently-spawned child inherits an already-cancelled token. Tests cover the token lifecycle and propagation to in-flight runs.
Make agent-side proactive shutdown a first-class API: enumerate live sessions, close one by ID, or close the whole agent (and its global MCP) in a single call. - New `agent_api/session_close.rs` module: `SessionCloseHandle` is an Arc-shareable bundle of every field needed to perform the close sequence. AgentSession holds an `Arc<SessionCloseHandle>`; Agent stores `Weak<SessionCloseHandle>` in its registry. - Agent gains `sessions: Mutex<HashMap<id, Weak<SessionCloseHandle>>>` and `closed: AtomicBool`. New public methods: `list_sessions()`, `close_session(id)`, `close()`, `is_closed()`. - AgentSession::close() refactored to a single-line delegate to the shared handle, so `agent.close_session(id)` and `session.close()` can never drift in behaviour. - `Agent::close()` also walks `global_mcp.list_connected()` and disconnects every server so background workers exit. - `Agent::session()` / `resume_session()` fail fast with `CodeError::SessionClosed` once the agent is closed. - Dead `Weak` entries are pruned lazily on `list_sessions()` / `close_session()` access — no background sweeper needed. Tests cover live-tracking with drop-driven pruning, close-by-id, agent-wide close, and post-close session() rejection.
Propagate the new core close-and-registry surface to the Node and Python SDKs. Node (napi): - Agent: listSessions / closeSession / close / isClosed. - Session: isClosed. Python (pyo3): - Agent: list_sessions / close_session / close + is_closed getter. - Session: is_closed getter. README updated to show the agent-side lifecycle calls alongside the existing session.cancel() / session.close() examples.
Cover the surfaces unit tests can't reach:
- core/tests/test_session_close_lifecycle.rs (3 tests):
* close_with_subagent_in_flight_marks_task_cancelled_and_resists_regression
— verifies the session-close → subagent-tracker cross-module path,
including the "late SubagentEnd must not regress Cancelled" contract.
* agent_close_handles_global_mcp_branch_and_is_idempotent
— exercises Agent::close() both with and without global_mcp, and
confirms post-close session() rejection.
* session_drop_prunes_registry_under_concurrency
— multi-thread runtime spawns 32 sessions, drops half, asserts the
Weak-ref registry converges to exactly the held set.
- sdk/python/tests/test_session_close.py
- sdk/node/test_session_close.mjs
Smoke tests for the SDK wrappers: session.is_closed,
session.close() idempotency, agent.list_sessions(),
agent.close_session(id), agent.close() rejecting later session().
Verified runnable locally (maturin develop / napi build).
Adds `AgentSession::subagent_tracker()` accessor so embedders with a
custom subagent execution path can feed lifecycle events through the
same close() pipeline as the built-in `task` tool — also unblocks the
in-flight integration test.
generated.d.ts is regenerated by `napi build` and now reflects the
step-4 Agent / Session close methods.
Close the framework gap that blocked 书安OS from migrating sessions: session.save() now also persists the InMemorySubagentTaskTracker snapshots, and resume_session restores them. - SessionStore trait: new save_subagent_tasks / load_subagent_tasks with no-op defaults (backwards compatible for custom backends). - MemorySessionStore + FileSessionStore implement them; FileSessionStore writes to <dir>/subagent_tasks/<id>.json and cleans up on delete(). - InMemorySubagentTaskTracker::replace_snapshots() — restoration entry point. Cancellers are intentionally NOT restored (runtime-only). - session_persistence: wired into both save() and restore_persisted_session_state(). Integration test (test_session_close_lifecycle::subagent_tasks_persist _across_save_and_resume) covers the full matrix of terminal states (Completed/Failed/Cancelled) and verifies the post-restore semantics: late cancel attempts on terminal tasks safely return false. Before this, restoring a session lost its delegated child run history — which is the materialized view 书安OS needs to make scheduling / billing / drain decisions on a session about to migrate.
…emplate / correlation (P5) Add four optional, framework-opaque string slots to SessionOptions -> AgentSession -> SessionData so the host (书安OS) can drive multi-tenancy / accounting / distributed tracing without string-hacking session_id. - SessionOptions builder helpers: with_tenant_id / with_principal / with_agent_template_id / with_correlation_id. - AgentSession accessors: tenant_id() / principal() / agent_template_id() / correlation_id(). - Round-trip via SessionData (#[serde(default, skip_serializing_if = "Option::is_none")]) — forward-compatible for stores written by older versions. - apply_persisted_runtime_options restores labels on resume but caller- supplied opts take precedence (so a resume can intentionally relabel). Framework only transports these strings — it never interprets tenant boundaries, enforces quotas, or routes by template. That belongs to custom HookExecutor / PermissionChecker / (future) BudgetGuard impls the host plugs in. Tests: unit (defaults + round-trip via SessionOptions) + integration (persist & restore through MemorySessionStore, including caller-side override).
…PassivationRequested / PeerInvocation (P6)
Extend AgentEvent with three variants that the host (书安OS) emits via
HookExecutor to surface platform-level decisions inside the session
loop:
- BudgetThresholdHit { resource, kind, consumed, limit, message? } —
host's BudgetGuard fired a soft/hard threshold. In-session policy
decides what to do (compact, refuse next LLM call, etc).
- PassivationRequested { reason, deadline_ms? } — host wants the
session to release in-memory caches before close/migration. Framework
does not act on this; in-session hooks can flush derived state.
- PeerInvocation { from_session_id, from_tenant_id?, correlation_id? }
— marks a send/stream that originated from a peer session in the
cluster, so hooks can distinguish human-driven from peer-driven work.
These variants are pure schema additions. The framework never produces
them itself — only the host's transport / control plane does. They
exist so cluster events flow through the same hook/trace pipeline as
agent-loop events, without coupling the framework to a transport.
#[non_exhaustive] on AgentEvent (already present) keeps minor releases
non-breaking. New variants use #[serde(default, skip_serializing_if =
"Option::is_none")] for optional fields to allow forward-compatible
producers.
Test: schema lock unit test verifies stable JSON tags (used by
external producers) and round-trip with minimal payloads.
Define the cluster-grade cost/quota contract a3s-code owns at the
framework level, with enforcement wired into the LLM call path. Real
implementations (per-tenant Redis budgets, per-day USD caps, etc) live
in 书安OS.
- core/budget.rs:
* BudgetGuard trait with default no-op methods (check_before_llm,
record_after_llm, check_before_tool).
* BudgetDecision { Allow, SoftLimit, Deny } — Allow proceeds silently,
SoftLimit emits a BudgetThresholdHit("soft") event and continues,
Deny bails with CodeError::BudgetExhausted.
* NoopBudgetGuard as the framework default.
- SessionOptions::with_budget_guard + AgentSession routes it through
AgentConfig.budget_guard.
- agent/llm_turn.rs::call_llm_with_circuit_breaker now consults the
guard *once per turn* (not per retry):
* Deny -> emit BudgetThresholdHit("hard") + return Err with
"Budget exhausted on '{resource}': {reason}"
* Soft -> emit BudgetThresholdHit("soft") + proceed
* Allow -> proceed silently
And calls record_after_llm with the real provider usage on success.
- New CodeError::BudgetExhausted variant for SDK-level type matching.
- Internal estimate_prompt_tokens helper (char/4 heuristic) — impls
needing precision should use record_after_llm instead.
Tests: NoopBudgetGuard fast-path; custom guard returning Deny; full
integration via AgentSession::send verifies the LLM is never reached,
history stays clean, record_after_llm doesn't fire on Deny.
The framework still does not interpret tenants / cost / time itself —
the trait is the boundary between framework mechanism and host policy.
Lift the framework's ambient capabilities — fresh IDs and current time —
out of `uuid::Uuid::new_v4()` / `SystemTime::now()` direct calls and
into a host-provided pair. Unlocks two cluster-grade features:
1. Deterministic replay of a run on another node (book-keep the seed,
replay bit-identical).
2. Time-bending tests without monkey-patching std::time.
- core/host_env.rs:
* `IdGenerator` + `Clock` traits (Send+Sync+Debug).
* `HostEnv { id_generator, clock }` bundle so a single Arc slot
plumbs both.
* Defaults: `SystemIdGenerator` (UUID v4) + `SystemClock` (wall
time) — observably identical to pre-P2 behaviour.
* Replay helpers: `SequentialIdGenerator { prefix, counter }` +
`FixedClock { atomic now_ms }`. Public so external host crates
(e.g. 书安OS replay) can use without copying the pattern.
- AgentConfig gains `host_env: Arc<HostEnv>` defaulting to
`HostEnv::system()`. SessionOptions::with_host_env() lets the host
swap it.
- Migrated two highest-leverage call sites (proof-of-pattern; the rest
is leaf work that can move incrementally):
* session_id generation in session_builder::prepare_session_options
* run_id generation in RunControlState::start_run via new
`InMemoryRunStore::create_run_with_id` overload (back-compat alias
`create_run` kept).
Tests: trait roundtrips (Seq/FixedClock determinism), and a full
integration that wires SequentialIdGenerator into a fresh Agent and
verifies the resulting sessions have session_id "test-0", "test-1".
The framework still does not interpret IDs — it just consults the
generator. Replay infrastructure (P3) lives on top of this contract.
Land the data contract and persistence wiring for crash-tolerant
runs. After each completed tool round the agent loop snapshots the
boundary into a LoopCheckpoint keyed by run_id; 书安OS picks this up
to migrate / replay a run on another node.
Boundary policy: checkpoints are taken **only between** tool rounds,
never mid-tool. If a process dies mid-tool the work of that round is
lost — the LLM re-deliberates from the previous checkpoint. This
trades retry cost for correctness; re-executing a non-idempotent
tool (write, bash) across the boundary is worse than re-asking.
What this cut delivers:
- core/loop_checkpoint.rs:
* `LoopCheckpoint` struct (serde, schema_version=1 with forward-
compatible `#[serde(default)]`).
* `LoopCheckpointSink` trait — save_checkpoint / load_latest.
* `SessionStoreCheckpointSink` default adapter — forwards into a
`SessionStore`; failures are warn-logged but never halt a run.
- `SessionStore` trait: save_loop_checkpoint / load_loop_checkpoint
with no-op defaults (backwards compatible). MemorySessionStore +
FileSessionStore implement them; FileSessionStore writes under
`<dir>/loop_checkpoints/<run_id>.json` (keyed by run_id, not
session_id — multiple runs per session).
- `AgentLoop`: new `checkpoint_sink` + `checkpoint_run_id` fields;
builder helpers `with_checkpoint_sink` + `set_checkpoint_run`.
`build_agent_loop` auto-wires a sink from `session.session_store`
when one is configured. BlockingRunContext + StreamRunContext bind
the run id via `set_checkpoint_run` after `start_run` returns.
- `loop_runtime::execute_loop_inner` calls
`persist_loop_checkpoint` after each successful `execute_tool_turn`,
capturing `(turn, messages, total_usage, tool_calls_count,
verification_reports, host_env.now_ms())`. Checkpoint write is
fire-and-forget — sink errors don't propagate to the loop.
Tests:
- core/src/loop_checkpoint.rs: JSON round-trip + forward-compat
(missing schema_version defaults to 0).
- integration:
* loop_checkpoint_round_trips_through_session_store — full store
contract: save → load → identical, second save overwrites,
unknown run_id → None.
* send_without_tool_calls_does_not_emit_loop_checkpoint —
contractual negative: no tool rounds → no checkpoint pollution.
Cut 2 (separate commit) will add `AgentSession::resume_run(run_id)`
to actually pick up from a checkpoint, plus the end-to-end "crash
mid-run on node A, finish on node B" integration test that needs a
tool-using mock LLM. Data contract lands now so 书安OS can start
building the surrounding control plane.
…3 cut 2)
Land the resume side of the loop-checkpoint contract. After P3 cut 1
the framework persisted per-tool-round boundaries; this cut lets a
new process pick up that boundary and continue.
API:
- `AgentSession::resume_run(checkpoint_run_id: &str) -> Result<AgentResult>`
loads the latest `LoopCheckpoint` for the given run id from the
session's `SessionStore` and replays the agent loop from the
checkpoint's `messages`.
Semantics:
- A **new** run id is allocated for the resumed work — the framework
does not pretend the old run continues. The relationship between
old and new run is host metadata (e.g. 书安OS tracks it).
- The new run also writes per-tool-round checkpoints (P3 cut 1
wiring), so a session can survive multiple node failures.
- Two distinguishable error paths so cluster scheduling code can
branch:
* "resume_run requires a session_store" — host should fall back
to a fresh session.
* "no loop checkpoint found for run 'X'" — host can retry later
(race against checkpoint write) or treat the run as lost.
Implementation:
- `conversation_runtime::resume_run` mirrors `send_with_attachments`'
structure (BlockingRunContext + execute_from_messages) but feeds
`checkpoint.messages` as the prebuilt message list, so the loop
resumes at exactly the boundary state instead of appending a new
user prompt.
Integration test (`resume_run_error_paths_are_distinguishable`)
covers both error branches so the strings stay stable for host-side
matching.
Together with cut 1, P3 now provides the full mechanism 书安OS needs:
the framework persists boundary state automatically during a live
run, and a fresh node can replay from any persisted boundary by
calling resume_run on a freshly-created session bound to the same
store. The framework still does not handle node selection, drain
choreography, or run-graph metadata — those remain 书安OS concerns.
Mirror the P5 (identity labels) and P3 cut 2 (resume_run) framework additions through both SDKs so host code can drive them from JS/TS or Python without reaching into the Rust core. Node (napi): - SessionOptions gains optional `tenantId / principal / agentTemplateId / correlationId` fields. js_session_options_to_rust forwards each via the matching `with_*` builder. - Session gets four read-only getters with the same names plus an async `resumeRun(checkpointRunId)` that returns the AgentResult of the resumed run. - generated.d.ts regenerated by `napi build` so TypeScript callers see the new types. Python (pyo3): - PySessionOptions gains `tenant_id / principal / agent_template_id / correlation_id` with paired getter/setters (matching the existing session_id pattern). build_rust_session_options forwards each. - PySession exposes the same four labels as `@getter` properties plus a `resume_run(checkpoint_run_id)` method that raises RuntimeError on missing store / missing checkpoint (the framework's two distinguishable error strings stay intact for host-side branching). Verification: Node 27 unit tests + Python 19 unit tests still green; both SDKs pass `cargo clippy --lib -- -D warnings`.
Add a quick-reference example to the "Main APIs At A Glance" snippets
showing the new identity labels (tenant_id / principal /
agent_template_id / correlation_id) and `resume_run` flow. Same in
both the Python and TypeScript mirror sections so SDK callers see
the surface immediately.
Detailed semantics live in apps/docs/content/docs/{en,cn}/code/
api-contract.mdx ("Cluster-grade extension points").
Long-running sessions accumulated three classes of in-memory state without bound: run records + per-run event buffers in InMemoryRunStore, trace events in InMemoryTraceSink, and terminal subagent task snapshots in InMemorySubagentTaskTracker. Fine for short sessions, a memory leak for the cluster workloads 书安OS is expected to host (hours / days / thousands per node). This commit lands FIFO eviction caps that the host opts into per session — defaults stay unbounded so existing callers see no behaviour change. API: - New `retention::SessionRetentionLimits` struct with four optional caps: max_runs_retained, max_events_per_run, max_trace_events, max_terminal_subagent_tasks. - `SessionOptions::with_retention_limits(...)` plumbs them into AgentConfig via session_builder + capabilities. Eviction policy (oldest-first, idempotent): - InMemoryRunStore: parallel VecDeque tracks insertion order; on create_run past the cap, oldest run + its events are dropped. Per-run events are FIFO-trimmed in record_event so the buffer stays at most max_events_per_run. `event_count` on RunSnapshot remains the cumulative total — not decremented on eviction. replace_records (resume path) rebuilds the queue in creation order so restored sessions honour the same cap. - InMemoryTraceSink: drain front past cap; preserves the most recent (most useful for debugging) events. drain(..excess) is O(n) per push at cap; switch to VecDeque if the trace path becomes a perf bottleneck. - InMemorySubagentTaskTracker: new terminal_order VecDeque records every Running → terminal transition (Completed / Failed / Cancelled). Running tasks are **never** evicted — only terminal snapshots. cancel() and the SubagentEnd record_event path both participate idempotently (a late SubagentEnd after cancel doesn't double-push). Tests: - Three unit-test blocks (one per store) cover: cap enforcement, FIFO ordering, default-unbounded behaviour, and the running-task-immunity property of the subagent tracker. - Integration test `retention_limits_are_plumbed_into_subagent_tracker` exercises the public SessionOptions → AgentSession → tracker path end-to-end via the same accessor 书安OS would use. Test count: 1692 unit (+9 new) + 9 integration (+1 new), clippy clean on `--lib -- -D warnings`.
The P3 cut-2 resume_run API only had error-path tests
(missing store / missing checkpoint). Lock the happy-path
contract too — write a LoopCheckpoint, call resume_run, verify
the loop picks up from the checkpointed messages.
test_resume_run_picks_up_from_persisted_checkpoint:
- Seed a LoopCheckpoint in MemorySessionStore with messages
representing one prior tool round.
- Build a session bound to the same store with a StaticStreaming
mock LLM that produces a final-answer text.
- Call session.resume_run(seeded_run_id).
- Assert:
* AgentResult.text matches the mock's final response — proving
the loop fed the checkpoint's messages to the LLM via
execute_from_messages and ran to completion.
* runs() contains exactly one new run whose id is NOT the
seeded run id — framework allocates fresh, never pretends to
continue the old run.
* The seeded checkpoint stays in the store under the old run
id — resume does not delete; retention is the host's call.
Crash simulation is reduced to a manual checkpoint seed because
the in-process agent loop has no "die mid-round" affordance
suitable for unit testing. The contract surface that 书安OS will
sit on (write on node A, hand run id to node B, resume) is fully
exercised through the public API.
1693 lib tests + 9 integration green; clippy clean.
Hosts running thousands of long-lived sessions accumulate MCP subprocesses + transport connections even when individual servers fall quiet. Add a periodic-sweep API the host calls (e.g. every 60s) to drop servers whose last activity is older than a threshold. McpManager: - New `last_used_at_ms: HashMap<String, u64>` parallel to `clients`. Stamped on `connect` (initial use) and on every successful `call_tool` (active use). Cleared on `disconnect`. - `pub async fn last_used_at_ms(name) -> Option<u64>` for host-side observability (e.g. dashboards / Prometheus scrapes). - `pub async fn touch(name)` so hosts can mark a server as warm out-of-band when activity comes via a side channel. - `pub async fn disconnect_idle(threshold_ms) -> Vec<String>` — iterates connected clients, drops every one whose timestamp is older than `now - threshold_ms` (or has no timestamp at all — treated as infinitely idle). Per-server disconnect failures are warn-logged but never panic; the result vec lists every name attempted. Agent facade: - New `Agent::disconnect_idle_mcp(threshold_ms) -> Vec<String>` forwards to the global manager when present, else returns empty. This is the entry point a host's idle-reaper will call. Tests: - Three unit tests covering touch monotonicity, idle-sweep no-op with no live clients, and disconnect cleanup of the timestamp entry. - One Agent-level test covering the no-global-mcp short-circuit (contract surface for hosts). Uses a local `now_epoch_ms` helper rather than HostEnv (the manager predates HostEnv wiring; threading the host's Clock through to MCP is a separate change if needed for replay). 1697 unit + 9 integration tests green; clippy clean.
…Guard
Propagate the framework-side BudgetGuard contract through pyo3 so
Python hosts (the most likely 书安OS surface) can plug in cluster-
wide cost / quota policy without writing Rust.
Bridge:
- `PyBudgetGuard { inner: Py<PyAny> }` implements
a3s_code_core::budget::BudgetGuard via the async-trait. Each
method acquires the GIL (Python::with_gil), looks up the named
method on the held PyObject by `getattr`, and calls it. Missing
methods behave as Allow / no-op so user classes only need to
define what they actually want to govern.
- `parse_py_budget_decision` parses the return value:
None | {"decision": "allow"} → Allow
{"decision": "soft", "resource", "consumed", "limit", "message"?} → SoftLimit
{"decision": "deny", "resource", "reason"} → Deny
Unknown / malformed shapes default to Allow (fail-safe).
- Python callback raising never propagates: warning is printed to
stderr, decision falls back to Allow / record_after_llm becomes
no-op. A misbehaving guard cannot halt a live session.
SDK surface:
- PySessionOptions gains `budget_guard: Option<PyObject>` with
matching `@getter` / `@setter`. build_rust_session_options wraps
the held callable as `Arc<dyn BudgetGuard>` and calls
with_budget_guard.
Test (sdk/python/tests/test_budget_guard.py):
- DenyingGuard returns {"decision":"deny", ...} from
check_before_llm; session.send raises RuntimeError mentioning
"Budget exhausted" / "llm_tokens"; check_before_llm fires
exactly once; record_after_llm does not fire; session_id
propagates correctly.
- AllowingGuard with only no-op methods constructs a session
cleanly — proves missing methods are tolerated.
- SessionOptions.budget_guard round-trips through getter/setter
including None reset.
GIL acquisition blocks the tokio worker thread briefly per call.
Acceptable here because BudgetGuard fires at most once per LLM
turn / tool call, not on hot tool execution paths.
Python SDK: 19 cargo tests + smoke test green, clippy clean.
Propagate the framework-side BudgetGuard contract through napi-rs so
JS hosts can plug in cluster-wide cost / quota policy. The bridge
follows the same pattern as Python's PyBudgetGuard but uses
ThreadsafeFunction for cross-thread JS calls.
Framework support (one small core addition):
- AgentSession::set_budget_guard(Option<Arc<dyn BudgetGuard>>) +
budget_guard() — runtime-mutable override slot. Read by
build_agent_loop at every send/stream, takes precedence over
config.budget_guard. Needed because JsFunction values can't live
inside the value-typed SessionOptions.
Node SDK surface:
- `session.setBudgetGuard({checkBeforeLlm, recordAfterLlm,
checkBeforeTool})` — all three handlers optional; missing methods
fall back to Allow / no-op. Pass `null` for the whole arg to
clear the guard.
- Returns `BudgetDecision`:
`null` | `{decision:'allow'}` → Allow
`{decision:'soft', resource, consumed, limit, message?}` → emits
`BudgetThresholdHit('soft')`, proceeds
`{decision:'deny', resource, reason}` → aborts with
"Budget exhausted" error
- `BudgetGuardHandlers` napi(object) shape — typed in
generated.d.ts.
Bridge internals (NodeBudgetGuard):
- ThreadsafeFunction<serde_json::Value, ErrorStrategy::Fatal> per
handler. `Fatal` (not CalleeHandled) so the JS callback receives
*only* the positional args — no leading Node-style `err`.
- Args fanned out as `serde_json::Value::Array(...)`; the transformer
closure unpacks the array into multiple positional JsUnknowns.
- Rust async trait method blocks via `tokio::task::block_in_place`
on a sync `mpsc::sync_channel` waiting for the JS callback's
return value. 5-second timeout falls back to Allow.
- Decisions parsed with the same shape as Python:
- `parse_js_budget_decision(JsUnknown) -> BudgetDecision`
- Unknown / malformed shapes fall back to Allow (fail-safe).
Tests:
- Unit: `test_runtime_budget_guard_overrides_session_options_value`
verifies the runtime override slot wins over config.budget_guard
and that clearing it (`set_budget_guard(None)`) restores the
unguarded path.
- Smoke: `sdk/node/test_budget_guard.mjs` installs a JS guard whose
checkBeforeLlm denies, asserts `session.send` throws
`Budget exhausted`, checkBeforeLlm fires exactly once, and
recordAfterLlm doesn't fire. Then verifies `setBudgetGuard(null)`
is accepted without error.
generated.d.ts regenerated by `napi build:debug` — `setBudgetGuard`
and `BudgetGuardHandlers` show up with the documented TS shape.
Core: 1698 unit tests pass; Node: 27 cargo tests + smoke green;
clippy clean across core and Node SDK.
Extend the Python and TypeScript "Main APIs at a Glance" snippets
with two new sections so the recent additions are discoverable:
- (15) Long-running session ops — SessionRetentionLimits +
agent.disconnectIdleMcp / disconnect_idle_mcp.
- (16) Budget / cost governance — Python opts.budget_guard
class-shape and Node session.setBudgetGuard({...}) handler-shape,
both showing Deny → "Budget exhausted" and the implicit Allow
fallthrough.
Detailed semantics live in apps/docs/content/docs/{en,cn}/code/
api-contract.mdx (covered in the docs-site commit on the parent
repo).
Propagate the framework-side retention caps through both SDKs so host code can stop long-running cluster sessions from accumulating in-memory state. Each cap is optional; missing fields keep the framework's unbounded default for that store. Python (pyo3): - PySessionOptions gains `retention_limits: Option<PyObject>` with matching @Getter / @Setter. - `parse_py_retention_limits` accepts a dict shape with optional integer keys: max_runs_retained / max_events_per_run / max_trace_events / max_terminal_subagent_tasks. Unknown / non-int values are ignored (no error), missing keys keep the default. - Forwarded via SessionOptions::with_retention_limits. Node (napi): - New `#[napi(object)] RetentionLimitsObject` with four optional `u32` fields (TypeScript-friendly), generated into the .d.ts as `RetentionLimitsObject` and surfaced on SessionOptions as `retentionLimits?: RetentionLimitsObject`. - js_session_options_to_rust converts each present field to usize and forwards via with_retention_limits. Verification: - Python SDK 19 cargo tests pass; clippy clean. - Node SDK 27 cargo tests pass; clippy clean. - generated.d.ts exposes the new types with the documented camelCase shape (retentionLimits + maxRunsRetained etc.).
Add a single integration test that exercises the **full** cluster-
grade API surface in one realistic two-node lifecycle. This is the
reference flow 书安OS-side scheduling code targets — every new piece
shipped in the cluster pillars work participates.
`cluster_ops_consolidated_session_lifecycle`:
Node A (one Agent):
- Creates a session with identity labels (tenant_id / principal /
agent_template_id / correlation_id), retention caps, and a
shared MemorySessionStore.
- Injects a Completed subagent task into the tracker.
- Persists state via session.save().
- Seeds a LoopCheckpoint for an "in-flight" run.
- Drops the agent (simulating node failure / drain).
Node B (a *different* Agent on the same store):
- agent_b.resume_session("cluster-ops-target", opts) — Node B
hydrates the session from the shared store.
- Asserts identity labels survive verbatim.
- Asserts the subagent task history (Completed status) survives.
- Asserts the LoopCheckpoint persists across the migration with
run id, turn count, message vec, and token usage intact —
this is the contract resume_run reads from.
The test deliberately doesn't *call* resume_run() because the test
config has no real LLM credentials — that flow is covered by
test_resume_run_picks_up_from_persisted_checkpoint in unit tests
(uses build_session + mock streaming client). Here we lock the
**data contract** for cross-node migration: identity, subagent
view, and checkpoint state must round-trip through the store with
no framework-level interpretation.
Test count: 1698 lib + 10 integration (+1 new), clippy clean.
Adversarial multi-dimension review of the cluster-pillars batch surfaced 11 confirmed issues; this commit fixes every core-side one. HIGH: - H4 Checkpoint leak (unbounded growth). Loop checkpoints were written after every tool round and NEVER deleted — the dominant memory/disk leak for long-running hosts. Added SessionStore::delete_loop_checkpoint (memory + file impls); the run lifecycle now clears the checkpoint on any in-process terminal (complete/cancel/fail) in BlockingRunLifecycle ::complete and StreamRunLifecycle::wrap. Only a true crash (loop never returns) leaves a checkpoint for crash-recovery resume. Also: FileSessionStore::save_loop_checkpoint is now crash-atomic (temp + fsync + rename) — a checkpoint exists to survive a crash, so a half-written one (plain fs::write) that fails to parse defeats the purpose. (completeness-critic finding folded in.) - H3 event_count corruption. replace_records overwrote the persisted CUMULATIVE event_count with the (possibly trimmed) buffer length, corrupting audit counts for any restored run whose buffer hit max_events_per_run. Deleted the offending line; trust the snapshot. - H2 resume_run dropped checkpoint metrics. execute_from_messages built a fresh ExecutionLoopState (zeroed total_usage / tool_calls_count), so resumed runs under-reported cumulative cost. Added ExecutionSeed + ExecutionLoopState::new_seeded and threaded it through execute_from_messages_seeded -> BlockingRunContext -> resume_run, so a resumed run continues accounting from the checkpoint. MEDIUM: - M1 subagent eviction TOCTOU. mark_terminal_and_evict took terminal_order/tasks/cancellers locks separately, letting a concurrent record_event re-insert an evicted victim. Now holds all three together in one canonical order (callers drop their guards first → no deadlock). - M2 run-store eviction TOCTOU + lock-ordering. create_run_with_id locked runs/events separately (and in the opposite nesting order from the rest); now holds order+events+runs together for insert+evict in one canonical order. - M3 MCP timestamp leak. touch() records a timestamp unconditionally (even for a never-connected name) and disconnect_idle only scanned clients.keys(); orphan timestamps leaked. disconnect_idle now retains last_used_at_ms to live clients. LOW: - L1 Session registry dangling Weaks. close_agent now retain()s the registry before snapshotting handles. Tests: store delete + crash-atomic (no temp leftovers); lifecycle clears checkpoint on completion (deterministic run id); event_count preserved across replace_records after trim; resume_run carries non-zero checkpoint metrics (1002 not 2); two multi-thread concurrency stress tests guard the eviction lock-ordering changes against deadlock; MCP orphan-timestamp purge. 1705 lib + 10 integration green; clippy clean.
…/L2)
Closes the SDK-side findings from the cluster-pillars review.
H1 — Node BudgetGuard fail-OPEN hole (was: hung/slow guard silently
ALLOWED, disabling budget enforcement):
- call_decision now fails CLOSED: a guard that does not respond within
timeoutMs -> Deny{budget_guard_timeout}; an unreadable return value
-> Deny{budget_guard_error}. Previously both defaulted to Allow.
- timeoutMs is configurable per guard (BudgetGuardHandlers.timeoutMs,
default 5000).
- Callbacks now receive a single context object — checkBeforeLlm({
sessionId, estimatedTokens }), recordAfterLlm({ sessionId, usage }),
checkBeforeTool({ sessionId, toolName }).
- Documented napi-rs constraint: a JS throw from a guard callback
aborts the host process at return-value conversion (true under BOTH
Fatal and CalleeHandled in this napi version — empirically verified),
so callbacks MUST NOT throw; wrap in try/catch and return a decision.
The fail-closed timeout still covers hangs. Python is unaffected
(PyBudgetGuard catches exceptions).
L2 — Python BudgetGuard re-entrancy doc: warn that calling session/agent
APIs from inside a guard callback risks GIL re-entrancy deadlock.
M4 — disconnect_idle_mcp now exposed in BOTH SDKs (was core-only, yet
already referenced by the docs):
- Node: agent.disconnectIdleMcp(idleThresholdMs) -> Promise<string[]>
- Python: agent.disconnect_idle_mcp(idle_threshold_ms) -> list[str]
The docs that referenced these now describe real methods.
Verification: Node 27 + Python 19 cargo tests, clippy clean on both;
Node smokes (budget deny path + fail-closed config + disconnectIdleMcp)
and all three Python smokes (budget guard, session close +
disconnect_idle_mcp, subagent query) pass against a fresh maturin build.
generated.d.ts regenerated (setBudgetGuard ctx shape + timeoutMs +
disconnectIdleMcp).
Reflow long lines in the BudgetGuard / disconnect_idle_mcp additions (281dc58) that the core-scoped pre-commit fmt hook did not check. Formatting only — no logic change (verified: only line wrapping).
Bump all packages 3.2.1 -> 3.3.0 and add the CHANGELOG entry. No push, no tag — release prep only. Version sync (scripts/check_release_versions.sh green at 3.3.0): - core/Cargo.toml, sdk/node/Cargo.toml (+ core dep pin), sdk/python/Cargo.toml (+ core dep pin) - sdk/node/package.json (+ @a3s-lab/code-* optionalDependencies) - sdk/python/pyproject.toml - sdk/python-bootstrap/pyproject.toml + _bootstrap.py __version__ - Cargo.lock, sdk/node/package-lock.json, sdk/node/examples/package-lock.json CHANGELOG 3.3.0 documents the cluster-grade runtime batch (session/agent lifecycle, identity labels, BudgetGuard, HostEnv, loop checkpoints + resume_run, retention caps, MCP idle disconnect, cluster AgentEvents, subagent persistence) plus the adversarial-review hardening fixes, and notes the Node BudgetGuard no-throw limitation. Minor bump per semver: all additions are backward compatible (new methods, new optional fields, SessionStore trait methods with default no-op impls).
Add #[ignore] integration tests (core/tests/test_real_llm_cluster_features.rs)
that exercise the 3.3.0 LLM-loop features against a real model from
.a3s/config.acl — validating paths mock clients cannot:
- real_budget_guard_allow_records_actual_usage: record_after_llm receives
the provider's ACTUAL non-zero token usage (mocks return fixed/zero).
- real_budget_guard_deny_blocks_llm_call: Deny aborts before the provider
is contacted; no usage, no history.
- real_resume_run_carries_checkpoint_metrics_forward: resume_run against
the live model continues cumulative metrics from the checkpoint.
- real_run_with_store_leaves_no_dangling_checkpoint: completed real run
clears its loop checkpoint (leak-fix lifecycle, end-to-end).
- real_identity_labels_survive_live_run: tenant/principal/template/
correlation intact through a live run.
Run with:
A3S_CONFIG_FILE=/abs/.a3s/config.acl \
cargo test -p a3s-code-core --test test_real_llm_cluster_features \
-- --ignored --nocapture
Verified locally: 5 passed against openai/MiniMax-M2.7-highspeed (155s).
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
a3s-code v3.3.0 — cluster-grade runtime features for hosting long-lived
agent sessions across nodes, plus an adversarial-review hardening pass.
All additions are backward compatible (new methods, new optional
fields,
SessionStoretrait methods with default no-op impls) → minor bump.Full details in
CHANGELOG.md. Highlights:Added
AgentSession::close()(cancels run + subagent tasks +HITL), agent-side session registry (
list_sessions/close_session/close),session-level
CancellationTokenhierarchy.tenant_id/principal/agent_template_id/correlation_idon
SessionOptions+SessionData(opaque, host-driven), exposed on both SDKs.BudgetGuardcost/quota contract wired at the LLM call site (Deny →BudgetExhausted, SoftLimit → event). SDK bridges (Python class, NodesetBudgetGuard, fail-closed).HostEnv(IdGenerator + Clock) injection for deterministic replay.resume_run: per-tool-round persistence (crash-atomicfile writes), resume from the last boundary on any node sharing the store.
SessionRetentionLimits: FIFO caps on run/event/trace/terminal-subagentstate. MCP idle disconnect. Cluster
AgentEventvariants. Subagenttracker persistence.
Fixed (review hardening)
event_countcorruption on restore. Node BudgetGuard fail-open → fail-closed.Weakpruning. Eviction TOCTOU(atomic multi-lock).
Testing
all SDK smokes; clippy clean (core + both SDKs);
cargo fmt --checkclean.#[ignore]tests pass againstopenai/MiniMax-M2.7-highspeed(BudgetGuard real usage, deny-blocks-call,resume_run metric carry-forward, checkpoint clear, identity labels).
scripts/check_release_versions.sh).Release
Merging this, then pushing the
v3.3.0tag, triggersrelease.yml(crates.io → npm → GH Release wheels → PyPI bootstrap → GitHub Release).
Known limitation
Node
BudgetGuardcallbacks must not throw (napi return-conversion aborts theprocess); wrap in try/catch and return a decision. Hangs are fail-closed.
Python is unaffected.