diff --git a/CHANGELOG.md b/CHANGELOG.md index ab163ba..5dc3b18 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -5,6 +5,96 @@ All notable changes to this project will be documented in this file. The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/), and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html). +## [3.3.0] - 2026-05-29 + +Cluster-grade runtime: everything needed for a host platform (e.g. 书安OS) +to run long-lived agent sessions across many nodes — graceful shutdown, +multi-tenant identity, cost governance, deterministic replay, crash-tolerant +runs, and bounded in-memory state — plus an adversarial-review hardening +pass. All additions are backward compatible (new methods, new optional +fields, new `SessionStore` trait methods with default no-op impls). + +### Added + +- **Session / Agent lifecycle control.** + - `AgentSession::close()` is now a full graceful stop: flips `is_closed` + (further `send`/`stream` fast-fail with `CodeError::SessionClosed`), + cancels the active run, all in-flight delegated subagent tasks, and + pending HITL confirmations. `AgentSession::is_closed()` accessor. + - Agent-side session registry: `Agent::list_sessions()`, + `close_session(id)`, `close()` (also disconnects global MCP), and + `is_closed()`. Sessions are tracked by `Weak` ref and pruned lazily. + - Session-level `CancellationToken` parent: every run derives its token + via `child_token()`, so `close()` cascades to all in-flight work. + `AgentSession::session_cancel_token()` exposes it for embedders. +- **Host-provided identity labels** — `tenant_id`, `principal`, + `agent_template_id`, `correlation_id` on `SessionOptions` (builder + methods + accessors), persisted in `SessionData`, restored on resume. + Framework treats them as opaque; the host drives multi-tenant + aggregation / billing / tracing. Exposed on both SDKs. +- **`BudgetGuard` cost/quota contract** (`budget` module) — host-supplied + `check_before_llm` / `record_after_llm` / `check_before_tool`, consulted + at the LLM call site. `Deny` aborts with `CodeError::BudgetExhausted`; + `SoftLimit` emits an event and proceeds. SDK bridges: a Python class + (`opts.budget_guard`) and Node `session.setBudgetGuard({...})`. The Node + bridge fails **closed** (timeout / unreadable return → deny). +- **`HostEnv` (IdGenerator + Clock) injection** (`host_env` module) — + replace the default UUID + wall-clock pair for deterministic replay of a + run on another node. `SequentialIdGenerator` / `FixedClock` helpers. +- **Loop checkpoints + run resumption** (`loop_checkpoint` module) — the + agent loop persists a `LoopCheckpoint` after each completed tool round + (when a `SessionStore` is configured); `AgentSession::resume_run(run_id)` + replays from the last boundary on any node sharing the store, continuing + cumulative token/tool-call accounting. `SessionStore` gains + `save/load/delete_loop_checkpoint`; file writes are crash-atomic. +- **`SessionRetentionLimits`** (`retention` module) — optional FIFO caps on + the in-memory run store (runs + per-run events), trace sink, and terminal + subagent task snapshots, so long-running sessions don't grow unbounded. + Exposed on both SDKs. Default is unbounded (no behavior change). +- **MCP idle disconnect** — `McpManager::disconnect_idle(threshold_ms)` and + `Agent::disconnect_idle_mcp(...)` (both SDKs) reap quiet MCP servers + (releasing FDs / background workers) while keeping their config for + on-demand reconnect. +- **Cluster `AgentEvent` variants** — `BudgetThresholdHit`, + `PassivationRequested`, `PeerInvocation`: platform-level events a host + emits via `HookExecutor` so in-session code can react uniformly. +- `SessionStore` now persists the subagent task tracker across + save/resume (`save/load_subagent_tasks`), so a migrated session keeps a + queryable history of its delegated child runs. +- New errors: `CodeError::SessionClosed`, `CodeError::BudgetExhausted`. + +### Changed + +- `resume_run` continues cumulative metrics (`total_usage`, + `tool_calls_count`) from the checkpoint instead of restarting at zero. +- Run-store and subagent-tracker FIFO eviction now hold their parallel + maps under a single canonical lock order, so eviction is atomic with + respect to concurrent record/cancel (no transient map inconsistency). + +### Fixed + +- **Loop checkpoint leak**: checkpoints were written after every tool round + but never deleted — unbounded disk/memory growth on every completed run. + They are now removed when a run reaches a terminal state in-process; only + a true crash leaves one for resume. +- **`event_count` corruption**: restoring a session whose per-run event + buffer had been trimmed reset the cumulative `event_count` to the trimmed + length. The persisted cumulative count is now preserved. +- **Node `BudgetGuard` fail-open**: a hung or slow guard silently *allowed* + the LLM call (disabling enforcement). It now fails **closed** (deny) on + timeout and on an unreadable return. +- **MCP timestamp leak**: `touch()`-without-connect orphan timestamps are + now purged by `disconnect_idle`. +- Session registry dangling `Weak` entries are pruned on `Agent::close()`. + +### Known limitations + +- Node `BudgetGuard` callbacks **must not throw** — due to a napi-rs + constraint a thrown exception aborts the host process at return-value + conversion. Wrap guard logic in try/catch and return a decision. Hangs + are handled safely (fail-closed timeout). The Python `BudgetGuard` + catches exceptions and is unaffected. + ## [3.2.1] - 2026-05-24 ### Added diff --git a/Cargo.lock b/Cargo.lock index 99ac515..7c488e4 100644 --- a/Cargo.lock +++ b/Cargo.lock @@ -37,7 +37,7 @@ dependencies = [ [[package]] name = "a3s-code-core" -version = "3.2.1" +version = "3.3.0" dependencies = [ "a3s-acl 0.2.0", "a3s-ahp", @@ -349,9 +349,9 @@ dependencies = [ [[package]] name = "async-compression" -version = "0.4.41" +version = "0.4.42" source = "registry+https://github.com/rust-lang/crates.io-index" -checksum = "d0f9ee0f6e02ffd7ad5816e9464499fba7b3effd01123b515c41d1697c43dad1" +checksum = "e79b3f8a79cccc2898f31920fc69f304859b3bd567490f75ebf51ae1c792a9ac" dependencies = [ "compression-codecs", "compression-core", @@ -543,9 +543,9 @@ checksum = "1505bd5d3d116872e7271a6d4e16d81d0c8570876c8de68093a09ac269d8aac0" [[package]] name = "autocfg" -version = "1.5.0" +version = "1.5.1" source = "registry+https://github.com/rust-lang/crates.io-index" -checksum = "c08606f8c3cbf4ce6ec8e28fb0014a2c086708fe954eaa885384a6165172e7e8" +checksum = "f2032f911046de80f0a198e0901378627c33f59ea0ac00e363d481118bd70a53" [[package]] name = "aws-credential-types" @@ -561,9 +561,9 @@ dependencies = [ [[package]] name = "aws-runtime" -version = "1.7.3" +version = "1.7.4" source = "registry+https://github.com/rust-lang/crates.io-index" -checksum = "5dcd93c82209ac7413532388067dce79be5a8780c1786e5fae3df22e4dee2864" +checksum = "77ed8e8c52d2dc2390ad9f15647fe663f71e9780b4262c190fbb823a32721566" dependencies = [ "aws-credential-types", "aws-sigv4", @@ -578,7 +578,7 @@ dependencies = [ "bytes-utils", "fastrand", "http 0.2.12", - "http 1.4.0", + "http 1.4.1", "http-body 0.4.6", "http-body 1.0.1", "percent-encoding", @@ -589,9 +589,9 @@ dependencies = [ [[package]] name = "aws-sdk-s3" -version = "1.132.0" +version = "1.134.0" source = "registry+https://github.com/rust-lang/crates.io-index" -checksum = "5575840a3a6b11f6011463ebe359320dfe5b67babb5e9b06fed6ddf809a9ab40" +checksum = "be06bdfdf00371318253d74776567512d1229d1f3cd5546d27d333c89e013b84" dependencies = [ "aws-credential-types", "aws-runtime", @@ -612,7 +612,7 @@ dependencies = [ "hex", "hmac", "http 0.2.12", - "http 1.4.0", + "http 1.4.1", "http-body 1.0.1", "lru", "percent-encoding", @@ -624,9 +624,9 @@ dependencies = [ [[package]] name = "aws-sigv4" -version = "1.4.3" +version = "1.4.4" source = "registry+https://github.com/rust-lang/crates.io-index" -checksum = "68dc0b907359b120170613b5c09ccc61304eac3998ff6274b97d93ee6490115a" +checksum = "b7083fb918b38474ac65ffbf8a69fc8792d36879f4ac5f1667b43aec61efe9a5" dependencies = [ "aws-credential-types", "aws-smithy-eventstream", @@ -638,7 +638,7 @@ dependencies = [ "hex", "hmac", "http 0.2.12", - "http 1.4.0", + "http 1.4.1", "percent-encoding", "sha2 0.11.0", "time", @@ -658,16 +658,16 @@ dependencies = [ [[package]] name = "aws-smithy-checksums" -version = "0.64.7" +version = "0.64.8" source = "registry+https://github.com/rust-lang/crates.io-index" -checksum = "10efbbcec1e044b81600e2fc562a391951d291152d95b482d5b7e7132299d762" +checksum = "e9e8e65f4f81fcccdeb6c3eca2af17ac21d421a1786a26a394aecf421d616d3a" dependencies = [ "aws-smithy-http", "aws-smithy-types", "bytes", "crc-fast", "hex", - "http 1.4.0", + "http 1.4.1", "http-body 1.0.1", "http-body-util", "md-5 0.11.0", @@ -701,7 +701,7 @@ dependencies = [ "bytes-utils", "futures-core", "futures-util", - "http 1.4.0", + "http 1.4.1", "http-body 1.0.1", "http-body-util", "percent-encoding", @@ -720,7 +720,7 @@ dependencies = [ "aws-smithy-runtime-api", "aws-smithy-types", "h2 0.3.27", - "h2 0.4.13", + "h2 0.4.14", "http 0.2.12", "http-body 0.4.6", "hyper 0.14.32", @@ -734,10 +734,12 @@ dependencies = [ [[package]] name = "aws-smithy-json" -version = "0.62.5" +version = "0.62.6" source = "registry+https://github.com/rust-lang/crates.io-index" -checksum = "9648b0bb82a2eedd844052c6ad2a1a822d1f8e3adee5fbf668366717e428856a" +checksum = "517089205f18ab4adc5a3e02888cb139bbbbb2e168eac9f396216925d1fbeaf5" dependencies = [ + "aws-smithy-runtime-api", + "aws-smithy-schema", "aws-smithy-types", ] @@ -752,20 +754,21 @@ dependencies = [ [[package]] name = "aws-smithy-runtime" -version = "1.11.1" +version = "1.11.3" source = "registry+https://github.com/rust-lang/crates.io-index" -checksum = "0504b1ab12debb5959e5165ee5fe97dd387e7aa7ea6a477bfd7635dfe769a4f5" +checksum = "b8e6f5caf6fea86f8c2206541ab5857cfcda9013426cdbe8fa0098b9e2d32182" dependencies = [ "aws-smithy-async", "aws-smithy-http", "aws-smithy-http-client", "aws-smithy-observability", "aws-smithy-runtime-api", + "aws-smithy-schema", "aws-smithy-types", "bytes", "fastrand", "http 0.2.12", - "http 1.4.0", + "http 1.4.1", "http-body 0.4.6", "http-body 1.0.1", "http-body-util", @@ -777,16 +780,16 @@ dependencies = [ [[package]] name = "aws-smithy-runtime-api" -version = "1.12.0" +version = "1.12.1" source = "registry+https://github.com/rust-lang/crates.io-index" -checksum = "b71a13df6ada0aafbf21a73bdfcdf9324cfa9df77d96b8446045be3cde61b42e" +checksum = "dc117c179ecf39a62a0a3f49f600e9ac26a7ad7dd172177999f83933af776c32" dependencies = [ "aws-smithy-async", "aws-smithy-runtime-api-macros", "aws-smithy-types", "bytes", "http 0.2.12", - "http 1.4.0", + "http 1.4.1", "pin-project-lite", "tokio", "tracing", @@ -804,18 +807,29 @@ dependencies = [ "syn 2.0.117", ] +[[package]] +name = "aws-smithy-schema" +version = "0.1.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "7442cb268338f0eb8278140a107c046756aa01093d8ef5e99628d34ae09c94f5" +dependencies = [ + "aws-smithy-runtime-api", + "aws-smithy-types", + "http 1.4.1", +] + [[package]] name = "aws-smithy-types" -version = "1.4.7" +version = "1.4.8" source = "registry+https://github.com/rust-lang/crates.io-index" -checksum = "9d73dbfbaa8e4bc57b9045137680b958d274823509a360abfd8e1d514d40c95c" +checksum = "056b66dbce2f81cc0c1e2b05bb402eb58f8a3530479d650efadd5bbae9a4050b" dependencies = [ "base64-simd", "bytes", "bytes-utils", "futures-core", "http 0.2.12", - "http 1.4.0", + "http 1.4.1", "http-body 0.4.6", "http-body 1.0.1", "http-body-util", @@ -841,13 +855,14 @@ dependencies = [ [[package]] name = "aws-types" -version = "1.3.15" +version = "1.3.16" source = "registry+https://github.com/rust-lang/crates.io-index" -checksum = "2f4bbcaa9304ea40902d3d5f42a0428d1bd895a2b0f6999436fb279ffddc58ac" +checksum = "d16bf10b03a3c01e6b3b7d47cd964e873ffe9e7d4e80fad16bd4c077cb068531" dependencies = [ "aws-credential-types", "aws-smithy-async", "aws-smithy-runtime-api", + "aws-smithy-schema", "aws-smithy-types", "rustc_version", "tracing", @@ -863,10 +878,10 @@ dependencies = [ "axum-core", "bytes", "futures-util", - "http 1.4.0", + "http 1.4.1", "http-body 1.0.1", "http-body-util", - "hyper 1.9.0", + "hyper 1.10.0", "hyper-util", "itoa", "matchit", @@ -896,7 +911,7 @@ dependencies = [ "async-trait", "bytes", "futures-util", - "http 1.4.0", + "http 1.4.1", "http-body 1.0.1", "http-body-util", "mime", @@ -938,9 +953,9 @@ checksum = "bef38d45163c2f1dde094a7dfd33ccf595c92905c8f8f4fdc18d06fb1037718a" [[package]] name = "bitflags" -version = "2.11.0" +version = "2.11.1" source = "registry+https://github.com/rust-lang/crates.io-index" -checksum = "843867be96c8daad0d758b57df9392b6d8d271134fce549de6ce169ff98a92af" +checksum = "c4512299f36f043ab09a583e57bceb5a5aab7a73db1805848e8fef3c9e8c78b3" [[package]] name = "block-buffer" @@ -975,9 +990,9 @@ dependencies = [ [[package]] name = "brotli" -version = "8.0.2" +version = "8.0.3" source = "registry+https://github.com/rust-lang/crates.io-index" -checksum = "4bd8b9603c7aa97359dbd97ecf258968c95f3adddd6db2f7e7a5bef101c84560" +checksum = "8119e4516436f5708bbc474a9d395bf12f1b5395e93a92a56e647ac3388c8610" dependencies = [ "alloc-no-stdlib", "alloc-stdlib", @@ -986,9 +1001,9 @@ dependencies = [ [[package]] name = "brotli-decompressor" -version = "5.0.0" +version = "5.0.1" source = "registry+https://github.com/rust-lang/crates.io-index" -checksum = "874bb8112abecc98cbd6d81ea4fa7e94fb9449648c93cc89aa40c81c24d7de03" +checksum = "5962523e1b92ce1b5e793d9169b9943eece10d39f62550bc04bb605d75b94924" dependencies = [ "alloc-no-stdlib", "alloc-stdlib", @@ -1006,9 +1021,9 @@ dependencies = [ [[package]] name = "bumpalo" -version = "3.20.2" +version = "3.20.3" source = "registry+https://github.com/rust-lang/crates.io-index" -checksum = "5d20789868f4b01b2f2caec9f5c4e0213b41e3e5702a50157d699ae31ced2fcb" +checksum = "72f5acc6cb2ba439de613abc23857ec3d78374d8ed5ac84e9d11336e87da8649" [[package]] name = "byteorder" @@ -1037,9 +1052,9 @@ dependencies = [ [[package]] name = "cc" -version = "1.2.58" +version = "1.2.62" source = "registry+https://github.com/rust-lang/crates.io-index" -checksum = "e1e928d4b69e3077709075a938a05ffbedfa53a84c8f766efbf8220bb1ff60e1" +checksum = "a1dce859f0832a7d088c4f1119888ab94ef4b5d6795d1ce05afb7fe159d79f98" dependencies = [ "find-msvc-tools", "shlex", @@ -1152,9 +1167,9 @@ dependencies = [ [[package]] name = "clap" -version = "4.6.0" +version = "4.6.1" source = "registry+https://github.com/rust-lang/crates.io-index" -checksum = "b193af5b67834b676abd72466a96c1024e6a6ad978a1f484bd90b85c94041351" +checksum = "1ddb117e43bbf7dacf0a4190fef4d345b9bad68dfc649cb349e7d17d28428e51" dependencies = [ "clap_builder", "clap_derive", @@ -1174,9 +1189,9 @@ dependencies = [ [[package]] name = "clap_derive" -version = "4.6.0" +version = "4.6.1" source = "registry+https://github.com/rust-lang/crates.io-index" -checksum = "1110bd8a634a1ab8cb04345d8d878267d57c3cf1b38d91b71af6686408bbca6a" +checksum = "f2ce8604710f6733aa641a2b3731eaa1e8b3d9973d5e3565da11800813f997a9" dependencies = [ "heck 0.5.0", "proc-macro2", @@ -1192,9 +1207,9 @@ checksum = "c8d4a3bb8b1e0c1050499d1815f5ab16d04f0959b233085fb31653fbfc9d98f9" [[package]] name = "cmov" -version = "0.5.3" +version = "0.5.4" source = "registry+https://github.com/rust-lang/crates.io-index" -checksum = "3f88a43d011fc4a6876cb7344703e297c71dda42494fee094d5f7c76bf13f746" +checksum = "0c9ea0ac24bc397ab3c98583a3c9ba74fa56b09a4449bbe172b9b1ddb016027a" [[package]] name = "colorchoice" @@ -1204,9 +1219,9 @@ checksum = "1d07550c9036bf2ae0c684c4297d503f838287c83c53686d05370d0e139ae570" [[package]] name = "compression-codecs" -version = "0.4.37" +version = "0.4.38" source = "registry+https://github.com/rust-lang/crates.io-index" -checksum = "eb7b51a7d9c967fc26773061ba86150f19c50c0d65c887cb1fbe295fd16619b7" +checksum = "ce2548391e9c1929c21bf6aa2680af86fe4c1b33e6cea9ac1cfeec0bd11218cf" dependencies = [ "brotli", "compression-core", @@ -1216,9 +1231,9 @@ dependencies = [ [[package]] name = "compression-core" -version = "0.4.31" +version = "0.4.32" source = "registry+https://github.com/rust-lang/crates.io-index" -checksum = "75984efb6ed102a0d42db99afb6c1948f0380d1d91808d5529916e6c08b49d8d" +checksum = "cc14f565cf027a105f7a44ccf9e5b424348421a1d8952a8fc9d499d313107789" [[package]] name = "concurrent-queue" @@ -1279,30 +1294,13 @@ dependencies = [ "libc", ] -[[package]] -name = "crc" -version = "3.3.0" -source = "registry+https://github.com/rust-lang/crates.io-index" -checksum = "9710d3b3739c2e349eb44fe848ad0b7c8cb1e42bd87ee49371df2f7acaf3e675" -dependencies = [ - "crc-catalog", -] - -[[package]] -name = "crc-catalog" -version = "2.5.0" -source = "registry+https://github.com/rust-lang/crates.io-index" -checksum = "217698eaf96b4a3f0bc4f3662aaa55bdf913cd54d7204591faa790070c6d0853" - [[package]] name = "crc-fast" -version = "1.9.0" +version = "1.10.0" source = "registry+https://github.com/rust-lang/crates.io-index" -checksum = "2fd92aca2c6001b1bf5ba0ff84ee74ec8501b52bbef0cac80bf25a6c1d87a83d" +checksum = "e75b2483e97a5a7da73ac68a05b629f9c53cff58d8ed1c77866079e18b00dba5" dependencies = [ - "crc", "digest 0.10.7", - "rustversion", "spin", ] @@ -1361,9 +1359,9 @@ dependencies = [ [[package]] name = "crypto-common" -version = "0.2.1" +version = "0.2.2" source = "registry+https://github.com/rust-lang/crates.io-index" -checksum = "77727bb15fa921304124b128af125e7e3b968275d1b108b379190264f4423710" +checksum = "ce6e4c961d6cd6c9a86db418387425e8bdeaf05b3c8bc1411e6dca4c252f1453" dependencies = [ "hybrid-array", ] @@ -1402,9 +1400,9 @@ dependencies = [ [[package]] name = "dashmap" -version = "6.1.0" +version = "6.2.1" source = "registry+https://github.com/rust-lang/crates.io-index" -checksum = "5041cc499144891f3790297212f32a74fb938e5136a14943f338ef9e0ae276cf" +checksum = "e6361d5c062261c78a176addb82d4c821ae42bed6089de0e12603cd25de2059c" dependencies = [ "cfg-if", "crossbeam-utils", @@ -1416,9 +1414,9 @@ dependencies = [ [[package]] name = "data-encoding" -version = "2.10.0" +version = "2.11.0" source = "registry+https://github.com/rust-lang/crates.io-index" -checksum = "d7a1e2f27636f116493b8b860f5546edb47c8d8f8ea73e1d2a20be88e28d1fea" +checksum = "a4ae5f15dda3c708c0ade84bfee31ccab44a3da4f88015ed22f63732abe300c8" [[package]] name = "deadpool" @@ -1487,7 +1485,7 @@ checksum = "f1dd6dbb5841937940781866fa1281a1ff7bd3bf827091440879f9994983d5c2" dependencies = [ "block-buffer 0.12.0", "const-oid", - "crypto-common 0.2.1", + "crypto-common 0.2.2", "ctutils", ] @@ -1514,9 +1512,9 @@ dependencies = [ [[package]] name = "displaydoc" -version = "0.2.5" +version = "0.2.6" source = "registry+https://github.com/rust-lang/crates.io-index" -checksum = "97369cbbc041bc366949bc74d34658d6cda5621039731c6310521892a3a20ae0" +checksum = "1ac70aa55017e108007fbaf5aa0f54b021c98f92ff8af59d42eda9da96e3dd4f" dependencies = [ "proc-macro2", "quote", @@ -1552,9 +1550,9 @@ checksum = "b2972feb8dffe7bc8c5463b1dacda1b0dfbed3710e50f977d965429692d74cd8" [[package]] name = "either" -version = "1.15.0" +version = "1.16.0" source = "registry+https://github.com/rust-lang/crates.io-index" -checksum = "48c757948c5ede0e46177b7add2e67155f70e33c07fea8284df6576da70b3719" +checksum = "91622ff5e7162018101f2fea40d6ebf4a78bbe5a49736a2020649edf9693679e" [[package]] name = "encoding_rs" @@ -1625,19 +1623,18 @@ dependencies = [ [[package]] name = "fastrand" -version = "2.3.0" +version = "2.4.1" source = "registry+https://github.com/rust-lang/crates.io-index" -checksum = "37909eebbb50d72f9059c3b6d82c0463f2ff062c9e95845c43a6c9c0355411be" +checksum = "9f1f227452a390804cdb637b74a86990f2a7d7ba4b7d5693aac9b4dd6defd8d6" [[package]] name = "filetime" -version = "0.2.27" +version = "0.2.29" source = "registry+https://github.com/rust-lang/crates.io-index" -checksum = "f98844151eee8917efc50bd9e8318cb963ae8b297431495d3f758616ea5c57db" +checksum = "5c287a33c7f0a620c38e641e7f60827713987b3c0f26e8ddc9462cc69cf75759" dependencies = [ "cfg-if", "libc", - "libredox", ] [[package]] @@ -1779,9 +1776,9 @@ checksum = "037711b3d59c33004d3856fbdc83b99d4ff37a24768fa1be9ce3538a1cde4393" [[package]] name = "futures-timer" -version = "3.0.3" +version = "3.0.4" source = "registry+https://github.com/rust-lang/crates.io-index" -checksum = "f288b0a4f20f9a56b5d1da57e2227c661b7b16168e2f72365f57b63326e29b24" +checksum = "af43fadb8a98512d547e37b4e92e0ced13e205c061b87b4623eff01d918d6968" [[package]] name = "futures-util" @@ -1911,7 +1908,7 @@ dependencies = [ "futures-sink", "futures-util", "http 0.2.12", - "indexmap 2.13.0", + "indexmap 2.14.0", "slab", "tokio", "tokio-util", @@ -1920,17 +1917,17 @@ dependencies = [ [[package]] name = "h2" -version = "0.4.13" +version = "0.4.14" source = "registry+https://github.com/rust-lang/crates.io-index" -checksum = "2f44da3a8150a6703ed5d34e164b875fd14c2cdab9af1252a9a1020bde2bdc54" +checksum = "171fefbc92fe4a4de27e0698d6a5b392d6a0e333506bc49133760b3bcf948733" dependencies = [ "atomic-waker", "bytes", "fnv", "futures-core", "futures-sink", - "http 1.4.0", - "indexmap 2.13.0", + "http 1.4.1", + "indexmap 2.14.0", "slab", "tokio", "tokio-util", @@ -1969,6 +1966,12 @@ dependencies = [ "foldhash 0.2.0", ] +[[package]] +name = "hashbrown" +version = "0.17.1" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "ed5909b6e89a2db4456e54cd5f673791d7eca6732202bbf2a9cc504fe2f9b84a" + [[package]] name = "heck" version = "0.4.1" @@ -2013,11 +2016,11 @@ dependencies = [ [[package]] name = "htmd" -version = "0.5.3" +version = "0.5.4" source = "registry+https://github.com/rust-lang/crates.io-index" -checksum = "de550515ae03ff01fb033658945ba393c8db391297978a1f988ecb436e072f87" +checksum = "7eee9b00ee2e599b4f86507157e3db786e7a3319fc225f0e9584151dbea2291d" dependencies = [ - "html5ever 0.36.1", + "html5ever 0.38.0", "markup5ever_rcdom", "phf 0.13.1", ] @@ -2046,16 +2049,6 @@ dependencies = [ "match_token", ] -[[package]] -name = "html5ever" -version = "0.36.1" -source = "registry+https://github.com/rust-lang/crates.io-index" -checksum = "6452c4751a24e1b99c3260d505eaeee76a050573e61f30ac2c924ddc7236f01e" -dependencies = [ - "log", - "markup5ever 0.36.1", -] - [[package]] name = "html5ever" version = "0.38.0" @@ -2079,9 +2072,9 @@ dependencies = [ [[package]] name = "http" -version = "1.4.0" +version = "1.4.1" source = "registry+https://github.com/rust-lang/crates.io-index" -checksum = "e3ba2a386d7f85a81f119ad7498ebe444d2e22c2af0b86b069416ace48b3311a" +checksum = "8be7462df143984c4598a256ef469b251d7d7f9e271135073e78fc535414f3d0" dependencies = [ "bytes", "itoa", @@ -2105,7 +2098,7 @@ source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "1efedce1fb8e6913f23e0c92de8e62cd5b772a67e7b3946df930a62566c93184" dependencies = [ "bytes", - "http 1.4.0", + "http 1.4.1", ] [[package]] @@ -2116,7 +2109,7 @@ checksum = "b021d93e26becf5dc7e1b75b1bed1fd93124b374ceb73f43d4d4eafec896a64a" dependencies = [ "bytes", "futures-core", - "http 1.4.0", + "http 1.4.1", "http-body 1.0.1", "pin-project-lite", ] @@ -2135,9 +2128,9 @@ checksum = "df3b46402a9d5adb4c86a0cf463f42e19994e3ee891101b1841f30a545cb49a9" [[package]] name = "hybrid-array" -version = "0.4.10" +version = "0.4.12" source = "registry+https://github.com/rust-lang/crates.io-index" -checksum = "3944cf8cf766b40e2a1a333ee5e9b563f854d5fa49d6a8ca2764e97c6eddb214" +checksum = "9155a582abd142abc056962c29e3ce5ff2ad5469f4246b537ed42c5deba857da" dependencies = [ "typenum", ] @@ -2168,16 +2161,16 @@ dependencies = [ [[package]] name = "hyper" -version = "1.9.0" +version = "1.10.0" source = "registry+https://github.com/rust-lang/crates.io-index" -checksum = "6299f016b246a94207e63da54dbe807655bf9e00044f73ded42c3ac5305fbcca" +checksum = "eb92f162bf56536459fc83c79b974bb12837acfed43d6bc370a7916d0ae15ecc" dependencies = [ "atomic-waker", "bytes", "futures-channel", "futures-core", - "h2 0.4.13", - "http 1.4.0", + "h2 0.4.14", + "http 1.4.1", "http-body 1.0.1", "httparse", "httpdate", @@ -2205,19 +2198,18 @@ dependencies = [ [[package]] name = "hyper-rustls" -version = "0.27.7" +version = "0.27.9" source = "registry+https://github.com/rust-lang/crates.io-index" -checksum = "e3c93eb611681b207e1fe55d5a71ecf91572ec8a6705cdb6857f7d8d5242cf58" +checksum = "33ca68d021ef39cf6463ab54c1d0f5daf03377b70561305bb89a8f83aab66e0f" dependencies = [ - "http 1.4.0", - "hyper 1.9.0", + "http 1.4.1", + "hyper 1.10.0", "hyper-util", - "rustls 0.23.37", - "rustls-pki-types", + "rustls 0.23.40", "tokio", "tokio-rustls 0.26.4", "tower-service", - "webpki-roots 1.0.6", + "webpki-roots 1.0.7", ] [[package]] @@ -2226,7 +2218,7 @@ version = "0.5.2" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "2b90d566bffbce6a75bd8b09a05aa8c2cb1fabb6cb348f8840c9e4c90a0d83b0" dependencies = [ - "hyper 1.9.0", + "hyper 1.10.0", "hyper-util", "pin-project-lite", "tokio", @@ -2243,14 +2235,14 @@ dependencies = [ "bytes", "futures-channel", "futures-util", - "http 1.4.0", + "http 1.4.1", "http-body 1.0.1", - "hyper 1.9.0", + "hyper 1.10.0", "ipnet", "libc", "percent-encoding", "pin-project-lite", - "socket2 0.6.3", + "socket2 0.6.4", "tokio", "tower-service", "tracing", @@ -2381,9 +2373,9 @@ dependencies = [ [[package]] name = "idna_adapter" -version = "1.2.1" +version = "1.2.2" source = "registry+https://github.com/rust-lang/crates.io-index" -checksum = "3acae9609540aa318d1bc588455225fb2085b9ed0c4f6bd0d9d5bcd86f1a0344" +checksum = "cb68373c0d6620ef8105e855e7745e18b0d00d3bdb07fb532e434244cdb9a714" dependencies = [ "icu_normalizer", "icu_properties", @@ -2417,12 +2409,12 @@ dependencies = [ [[package]] name = "indexmap" -version = "2.13.0" +version = "2.14.0" source = "registry+https://github.com/rust-lang/crates.io-index" -checksum = "7714e70437a7dc3ac8eb7e6f8df75fd8eb422675fc7678aff7364301092b1017" +checksum = "d466e9454f08e4a911e14806c24e16fba1b4c121d1ea474396f396069cf949d9" dependencies = [ "equivalent", - "hashbrown 0.16.1", + "hashbrown 0.17.1", "serde", "serde_core", ] @@ -2433,16 +2425,6 @@ version = "2.12.0" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "d98f6fed1fde3f8c21bc40a1abb88dd75e67924f9cffc3ef95607bad8017f8e2" -[[package]] -name = "iri-string" -version = "0.7.12" -source = "registry+https://github.com/rust-lang/crates.io-index" -checksum = "25e659a4bb38e810ebc252e53b5814ff908a8c58c2a9ce2fae1bbec24cbf4e20" -dependencies = [ - "memchr", - "serde", -] - [[package]] name = "is_terminal_polyfill" version = "1.70.2" @@ -2466,9 +2448,9 @@ checksum = "8f42a60cbdf9a97f5d2305f08a87dc4e09308d1276d28c869c684d7777685682" [[package]] name = "js-sys" -version = "0.3.94" +version = "0.3.99" source = "registry+https://github.com/rust-lang/crates.io-index" -checksum = "2e04e2ef80ce82e13552136fabeef8a5ed1f985a96805761cbb9a2c34e7664d9" +checksum = "142bc4740e452c1e57ade0cbc129f139c9093e354346f0872ef985f4f5cf5f11" dependencies = [ "cfg-if", "futures-util", @@ -2499,20 +2481,17 @@ checksum = "09edd9e8b54e49e587e4f6295a7d29c3ea94d469cb40ab8ca70b288248a81db2" [[package]] name = "libc" -version = "0.2.184" +version = "0.2.186" source = "registry+https://github.com/rust-lang/crates.io-index" -checksum = "48f5d2a454e16a5ea0f4ced81bd44e4cfc7bd3a507b61887c99fd3538b28e4af" +checksum = "68ab91017fe16c622486840e4c83c9a37afeff978bd239b5293d61ece587de66" [[package]] name = "libredox" -version = "0.1.15" +version = "0.1.17" source = "registry+https://github.com/rust-lang/crates.io-index" -checksum = "7ddbf48fd451246b1f8c2610bd3b4ac0cc6e149d89832867093ab69a17194f08" +checksum = "f02ab6bace2054fb888a3c16f990117b579d14a3088e472d63c6011fa185c9d3" dependencies = [ - "bitflags 2.11.0", "libc", - "plain", - "redox_syscall 0.7.3", ] [[package]] @@ -2544,9 +2523,9 @@ dependencies = [ [[package]] name = "log" -version = "0.4.29" +version = "0.4.30" source = "registry+https://github.com/rust-lang/crates.io-index" -checksum = "5e5032e24019045c762d3c0f28f5b6b8bbf38563a65908389bf7978758920897" +checksum = "616ec5685824bcc94416c6d4a7a446eea774a31efd7062c8480ba6fd06d7a6e5" dependencies = [ "value-bag", ] @@ -2560,7 +2539,7 @@ dependencies = [ "chrono", "encoding_rs", "flate2", - "indexmap 2.13.0", + "indexmap 2.14.0", "itoa", "log", "md-5 0.10.6", @@ -2606,17 +2585,6 @@ dependencies = [ "tendril 0.4.3", ] -[[package]] -name = "markup5ever" -version = "0.36.1" -source = "registry+https://github.com/rust-lang/crates.io-index" -checksum = "6c3294c4d74d0742910f8c7b466f44dda9eb2d5742c1e430138df290a1e8451c" -dependencies = [ - "log", - "tendril 0.4.3", - "web_atoms", -] - [[package]] name = "markup5ever" version = "0.38.0" @@ -2630,13 +2598,13 @@ dependencies = [ [[package]] name = "markup5ever_rcdom" -version = "0.36.0+unofficial" +version = "0.38.0+unofficial" source = "registry+https://github.com/rust-lang/crates.io-index" -checksum = "3e5fc8802e8797c0dfdd2ce5c21aa0aee21abbc7b3b18559100651b3352a7b63" +checksum = "333171ccdf66e915257740d44e38ea5b1b19ce7b45d33cc35cb6f118fbd981ff" dependencies = [ - "html5ever 0.36.1", - "markup5ever 0.36.1", - "tendril 0.4.3", + "html5ever 0.38.0", + "markup5ever 0.38.0", + "tendril 0.5.0", "xml5ever", ] @@ -2688,9 +2656,9 @@ dependencies = [ [[package]] name = "memchr" -version = "2.8.0" +version = "2.8.1" source = "registry+https://github.com/rust-lang/crates.io-index" -checksum = "f8ca58f447f06ed17d5fc4043ce1b10dd205e060fb3ce5b979b8ed8e59ff3f79" +checksum = "6b947ae49db0d222b1dbc6b113ce7248a3fc3a6ca21b696717bfc000ba4484d8" [[package]] name = "mime" @@ -2716,9 +2684,9 @@ dependencies = [ [[package]] name = "mio" -version = "1.2.0" +version = "1.2.1" source = "registry+https://github.com/rust-lang/crates.io-index" -checksum = "50b7e5b27aa02a74bac8c3f23f448f8d87ff11f92d3aac1a6ed369ee08cc56c1" +checksum = "02bd0af71c67b473010cbbc60715ee815645a4dc942899111f494b4b737d6fda" dependencies = [ "libc", "wasi", @@ -2752,9 +2720,9 @@ dependencies = [ [[package]] name = "num-conv" -version = "0.2.1" +version = "0.2.2" source = "registry+https://github.com/rust-lang/crates.io-index" -checksum = "c6673768db2d862beb9b39a78fdcb1a69439615d5794a1be50caa9bc92c81967" +checksum = "521739c6d2bac4aa25192232afe6841231376b2b26d4d9fae5ecf8ca5772e441" [[package]] name = "num-integer" @@ -2810,7 +2778,7 @@ checksum = "1e32339a5dc40459130b3bd269e9892439f55b33e772d2a9d402a789baaf4e8a" dependencies = [ "futures-core", "futures-sink", - "indexmap 2.13.0", + "indexmap 2.14.0", "js-sys", "once_cell", "pin-project-lite", @@ -2840,7 +2808,7 @@ checksum = "91cf61a1868dacc576bf2b2a1c3e9ab150af7272909e80085c3173384fe11f76" dependencies = [ "async-trait", "futures-core", - "http 1.4.0", + "http 1.4.1", "opentelemetry 0.27.1", "opentelemetry-proto", "opentelemetry_sdk 0.27.1", @@ -2879,7 +2847,7 @@ dependencies = [ "opentelemetry 0.21.0", "ordered-float", "percent-encoding", - "rand 0.8.5", + "rand 0.8.6", "thiserror 1.0.69", "tokio", "tokio-stream", @@ -2898,7 +2866,7 @@ dependencies = [ "glob", "opentelemetry 0.27.1", "percent-encoding", - "rand 0.8.5", + "rand 0.8.6", "serde_json", "thiserror 1.0.69", "tokio", @@ -2951,7 +2919,7 @@ checksum = "2621685985a2ebf1c516881c026032ac7deafcda1a2c9b7850dc81e3dfcb64c1" dependencies = [ "cfg-if", "libc", - "redox_syscall 0.5.18", + "redox_syscall", "smallvec", "windows-link", ] @@ -3035,7 +3003,7 @@ source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "3c80231409c20246a13fddb31776fb942c38553c51e871f8cbd687a4cfb5843d" dependencies = [ "phf_shared 0.11.3", - "rand 0.8.5", + "rand 0.8.6", ] [[package]] @@ -3094,18 +3062,18 @@ dependencies = [ [[package]] name = "pin-project" -version = "1.1.11" +version = "1.1.13" source = "registry+https://github.com/rust-lang/crates.io-index" -checksum = "f1749c7ed4bcaf4c3d0a3efc28538844fb29bcdd7d2b67b2be7e20ba861ff517" +checksum = "2466b2336ed02bcdca6b294417127b90ec92038d1d5c4fbeac971a922e0e0924" dependencies = [ "pin-project-internal", ] [[package]] name = "pin-project-internal" -version = "1.1.11" +version = "1.1.13" source = "registry+https://github.com/rust-lang/crates.io-index" -checksum = "d9b20ed30f105399776b9c883e68e536ef602a16ae6f596d2c473591d6ad64c6" +checksum = "c96395f0a926bc13b1c17622aaddda1ecb55d49c8f1bf9777e4d877800a43f8b" dependencies = [ "proc-macro2", "quote", @@ -3135,12 +3103,6 @@ dependencies = [ "futures-io", ] -[[package]] -name = "plain" -version = "0.2.3" -source = "registry+https://github.com/rust-lang/crates.io-index" -checksum = "b4596b6d070b27117e987119b4dac604f3c58cfb0b191112e24771b2faeac1a6" - [[package]] name = "polling" version = "3.11.0" @@ -3251,8 +3213,8 @@ dependencies = [ "quinn-proto", "quinn-udp", "rustc-hash", - "rustls 0.23.37", - "socket2 0.6.3", + "rustls 0.23.40", + "socket2 0.6.4", "thiserror 2.0.18", "tokio", "tracing", @@ -3268,10 +3230,10 @@ dependencies = [ "bytes", "getrandom 0.3.4", "lru-slab", - "rand 0.9.2", + "rand 0.9.4", "ring", "rustc-hash", - "rustls 0.23.37", + "rustls 0.23.40", "rustls-pki-types", "slab", "thiserror 2.0.18", @@ -3289,7 +3251,7 @@ dependencies = [ "cfg_aliases", "libc", "once_cell", - "socket2 0.6.3", + "socket2 0.6.4", "tracing", "windows-sys 0.60.2", ] @@ -3317,9 +3279,9 @@ checksum = "f8dcc9c7d52a811697d2151c701e0d08956f92b0e24136cf4cf27b57a6a0d9bf" [[package]] name = "rand" -version = "0.8.5" +version = "0.8.6" source = "registry+https://github.com/rust-lang/crates.io-index" -checksum = "34af8d1a0e25924bc5b7c43c079c942339d8f0a8b57c39049bef581b46327404" +checksum = "5ca0ecfa931c29007047d1bc58e623ab12e5590e8c7cc53200d5202b69266d8a" dependencies = [ "libc", "rand_chacha 0.3.1", @@ -3328,9 +3290,9 @@ dependencies = [ [[package]] name = "rand" -version = "0.9.2" +version = "0.9.4" source = "registry+https://github.com/rust-lang/crates.io-index" -checksum = "6db2770f06117d490610c7488547d543617b21bfa07796d7a12f6f1bd53850d1" +checksum = "44c5af06bb1b7d3216d91932aed5265164bf384dc89cd6ba05cf59a35f5f76ea" dependencies = [ "rand_chacha 0.9.0", "rand_core 0.9.5", @@ -3382,9 +3344,9 @@ checksum = "973443cf09a9c8656b574a866ab68dfa19f0867d0340648c7d2f6a71b8a8ea68" [[package]] name = "rayon" -version = "1.11.0" +version = "1.12.0" source = "registry+https://github.com/rust-lang/crates.io-index" -checksum = "368f01d005bf8fd9b1206fb6fa653e6c4a81ceb1466406b81792d87c5677a58f" +checksum = "fb39b166781f92d482534ef4b4b1b2568f42613b53e5b6c160e24cfbfa30926d" dependencies = [ "either", "rayon-core", @@ -3419,16 +3381,7 @@ version = "0.5.18" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "ed2bf2547551a7053d6fdfafda3f938979645c44812fbfcda098faae3f1a362d" dependencies = [ - "bitflags 2.11.0", -] - -[[package]] -name = "redox_syscall" -version = "0.7.3" -source = "registry+https://github.com/rust-lang/crates.io-index" -checksum = "6ce70a74e890531977d37e532c34d45e9055d2409ed08ddba14529471ed0be16" -dependencies = [ - "bitflags 2.11.0", + "bitflags 2.11.1", ] [[package]] @@ -3539,18 +3492,18 @@ dependencies = [ "bytes", "futures-core", "futures-util", - "http 1.4.0", + "http 1.4.1", "http-body 1.0.1", "http-body-util", - "hyper 1.9.0", - "hyper-rustls 0.27.7", + "hyper 1.10.0", + "hyper-rustls 0.27.9", "hyper-util", "js-sys", "log", "percent-encoding", "pin-project-lite", "quinn", - "rustls 0.23.37", + "rustls 0.23.40", "rustls-pki-types", "serde", "serde_json", @@ -3567,7 +3520,7 @@ dependencies = [ "wasm-bindgen-futures", "wasm-streams", "web-sys", - "webpki-roots 1.0.6", + "webpki-roots 1.0.7", ] [[package]] @@ -3641,7 +3594,7 @@ version = "0.38.44" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "fdb5bc1ae2baa591800df16c9ca78619bf65c0488b41b96ccec5d11220d8c154" dependencies = [ - "bitflags 2.11.0", + "bitflags 2.11.1", "errno", "libc", "linux-raw-sys 0.4.15", @@ -3654,7 +3607,7 @@ version = "1.1.4" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "b6fe4565b9518b83ef4f91bb47ce29620ca828bd32cb7e408f0062e9930ba190" dependencies = [ - "bitflags 2.11.0", + "bitflags 2.11.1", "errno", "libc", "linux-raw-sys 0.12.1", @@ -3675,14 +3628,14 @@ dependencies = [ [[package]] name = "rustls" -version = "0.23.37" +version = "0.23.40" source = "registry+https://github.com/rust-lang/crates.io-index" -checksum = "758025cb5fccfd3bc2fd74708fd4682be41d99e5dff73c377c0646c6012c73a4" +checksum = "ef86cd5876211988985292b91c96a8f2d298df24e75989a43a3c73f2d4d8168b" dependencies = [ "once_cell", "ring", "rustls-pki-types", - "rustls-webpki 0.103.10", + "rustls-webpki 0.103.13", "subtle", "zeroize", ] @@ -3710,9 +3663,9 @@ dependencies = [ [[package]] name = "rustls-pki-types" -version = "1.14.0" +version = "1.14.1" source = "registry+https://github.com/rust-lang/crates.io-index" -checksum = "be040f8b0a225e40375822a563fa9524378b9d63112f53e19ffff34df5d33fdd" +checksum = "30a7197ae7eb376e574fe940d068c30fe0462554a3ddbe4eca7838e049c937a9" dependencies = [ "web-time", "zeroize", @@ -3730,9 +3683,9 @@ dependencies = [ [[package]] name = "rustls-webpki" -version = "0.103.10" +version = "0.103.13" source = "registry+https://github.com/rust-lang/crates.io-index" -checksum = "df33b2b81ac578cabaf06b89b0631153a3f416b0a886e8a7a1707fb51abbd1ef" +checksum = "61c429a8649f110dddef65e2a5ad240f747e85f7758a6bccc7e5777bd33f756e" dependencies = [ "ring", "rustls-pki-types", @@ -3806,7 +3759,7 @@ version = "3.7.0" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "b7f4bc775c73d9a02cde8bf7b2ec4c9d12743edf609006c7facc23998404cd1d" dependencies = [ - "bitflags 2.11.0", + "bitflags 2.11.1", "core-foundation 0.10.1", "core-foundation-sys", "libc", @@ -3829,7 +3782,7 @@ version = "0.26.0" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "fd568a4c9bb598e291a08244a5c1f5a8a6650bee243b5b0f8dbb3d9cc1d87fe8" dependencies = [ - "bitflags 2.11.0", + "bitflags 2.11.1", "cssparser", "derive_more", "fxhash", @@ -3844,9 +3797,9 @@ dependencies = [ [[package]] name = "semver" -version = "1.0.27" +version = "1.0.28" source = "registry+https://github.com/rust-lang/crates.io-index" -checksum = "d767eb0aabc880b29956c35734170f26ed551a859dbd361d140cdbeca61ab1e2" +checksum = "8a7852d02fc848982e0c167ef163aaff9cd91dc640ba85e263cb1ce46fae51cd" [[package]] name = "serde" @@ -3880,9 +3833,9 @@ dependencies = [ [[package]] name = "serde_json" -version = "1.0.149" +version = "1.0.150" source = "registry+https://github.com/rust-lang/crates.io-index" -checksum = "83fc039473c5595ace860d8c4fafa220ff474b3fc6bfdb4293327f1a37e94d86" +checksum = "e8014e44b4736ed0538adeecded0fce2a272f22dc9578a7eb6b2d9993c74cfb9" dependencies = [ "itoa", "memchr", @@ -3929,7 +3882,7 @@ version = "0.9.34+deprecated" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "6a8b1a1a2ebf674015cc02edccce75287f1a0130d394307b36743c2f5d504b47" dependencies = [ - "indexmap 2.13.0", + "indexmap 2.14.0", "itoa", "ryu", "serde", @@ -4047,9 +4000,9 @@ checksum = "bbbb5d9659141646ae647b42fe094daf6c6192d1620870b449d9557f748b2daa" [[package]] name = "siphasher" -version = "1.0.2" +version = "1.0.3" source = "registry+https://github.com/rust-lang/crates.io-index" -checksum = "b2aa850e253778c88a04c3d7323b043aeda9d3e30d5971937c1855769763678e" +checksum = "8ee5873ec9cce0195efcb7a4e9507a04cd49aec9c83d0389df45b1ef7ba2e649" [[package]] name = "slab" @@ -4075,9 +4028,9 @@ dependencies = [ [[package]] name = "socket2" -version = "0.6.3" +version = "0.6.4" source = "registry+https://github.com/rust-lang/crates.io-index" -checksum = "3a766e1110788c36f4fa1c2b71b387a7815aa65f88ce0229841826633d93723e" +checksum = "52d1cfed4120b4d927bf7c0f86d2087a4a7d6027c906d9f9d525a80573b9be51" dependencies = [ "libc", "windows-sys 0.61.2", @@ -4118,6 +4071,7 @@ dependencies = [ "parking_lot", "phf_shared 0.13.1", "precomputed-hash", + "serde", ] [[package]] @@ -4227,9 +4181,9 @@ dependencies = [ [[package]] name = "tar" -version = "0.4.45" +version = "0.4.46" source = "registry+https://github.com/rust-lang/crates.io-index" -checksum = "22692a6476a21fa75fdfc11d452fda482af402c008cdbaf3476414e122040973" +checksum = "3f6221d9a6003c78398e3b239969f352578258df48c8eb051caadae0015bc840" dependencies = [ "filetime", "libc", @@ -4377,9 +4331,9 @@ checksum = "1f3ccbac311fea05f86f61904b462b55fb3df8837a366dfc601a0161d0532f20" [[package]] name = "tokio" -version = "1.50.0" +version = "1.52.3" source = "registry+https://github.com/rust-lang/crates.io-index" -checksum = "27ad5e34374e03cfffefc301becb44e9dc3c17584f414349ebe29ed26661822d" +checksum = "8fc7f01b389ac15039e4dc9531aa973a135d7a4135281b12d7c1bc79fd57fffe" dependencies = [ "bytes", "libc", @@ -4387,16 +4341,16 @@ dependencies = [ "parking_lot", "pin-project-lite", "signal-hook-registry", - "socket2 0.6.3", + "socket2 0.6.4", "tokio-macros", "windows-sys 0.61.2", ] [[package]] name = "tokio-macros" -version = "2.6.1" +version = "2.7.0" source = "registry+https://github.com/rust-lang/crates.io-index" -checksum = "5c55a2eff8b69ce66c84f85e1da1c233edc36ceb85a2058d11b0d6a3c7e7569c" +checksum = "385a6cb71ab9ab790c5fe8d67f1645e6c450a7ce006a33de03daa956cf70a496" dependencies = [ "proc-macro2", "quote", @@ -4419,7 +4373,7 @@ version = "0.26.4" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "1729aa945f29d91ba541258c8df89027d5792d85a8841fb65e8bf0f4ede4ef61" dependencies = [ - "rustls 0.23.37", + "rustls 0.23.40", "tokio", ] @@ -4443,7 +4397,7 @@ checksum = "edc5f74e248dc973e0dbb7b74c7e0d6fcc301c694ff50049504004ef4d0cdcd9" dependencies = [ "futures-util", "log", - "rustls 0.23.37", + "rustls 0.23.40", "rustls-native-certs", "rustls-pki-types", "tokio", @@ -4491,7 +4445,7 @@ version = "0.22.27" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "41fe8c660ae4257887cf66394862d21dbca4a6ddd26f04a3560410406a2f819a" dependencies = [ - "indexmap 2.13.0", + "indexmap 2.14.0", "serde", "serde_spanned", "toml_datetime", @@ -4516,11 +4470,11 @@ dependencies = [ "axum", "base64 0.22.1", "bytes", - "h2 0.4.13", - "http 1.4.0", + "h2 0.4.14", + "http 1.4.1", "http-body 1.0.1", "http-body-util", - "hyper 1.9.0", + "hyper 1.10.0", "hyper-timeout", "hyper-util", "percent-encoding", @@ -4546,7 +4500,7 @@ dependencies = [ "indexmap 1.9.3", "pin-project", "pin-project-lite", - "rand 0.8.5", + "rand 0.8.6", "slab", "tokio", "tokio-util", @@ -4573,19 +4527,18 @@ dependencies = [ [[package]] name = "tower-http" -version = "0.6.8" +version = "0.6.11" source = "registry+https://github.com/rust-lang/crates.io-index" -checksum = "d4e6559d53cc268e5031cd8429d05415bc4cb4aefc4aa5d6cc35fbf5b924a1f8" +checksum = "4cfcf7e2740e6fc6d4d688b4ef00650406bb94adf4731e43c096c3a19fe40840" dependencies = [ "async-compression", - "bitflags 2.11.0", + "bitflags 2.11.1", "bytes", "futures-core", "futures-util", - "http 1.4.0", + "http 1.4.1", "http-body 1.0.1", "http-body-util", - "iri-string", "pin-project-lite", "tokio", "tokio-util", @@ -4593,6 +4546,7 @@ dependencies = [ "tower-layer", "tower-service", "tracing", + "url", ] [[package]] @@ -4715,10 +4669,10 @@ dependencies = [ "byteorder", "bytes", "data-encoding", - "http 1.4.0", + "http 1.4.1", "httparse", "log", - "rand 0.8.5", + "rand 0.8.6", "sha1 0.10.6", "thiserror 1.0.69", "utf-8", @@ -4733,11 +4687,11 @@ dependencies = [ "byteorder", "bytes", "data-encoding", - "http 1.4.0", + "http 1.4.1", "httparse", "log", - "rand 0.8.5", - "rustls 0.23.37", + "rand 0.8.6", + "rustls 0.23.40", "rustls-pki-types", "sha1 0.10.6", "thiserror 1.0.69", @@ -4755,9 +4709,9 @@ dependencies = [ [[package]] name = "typenum" -version = "1.19.0" +version = "1.20.0" source = "registry+https://github.com/rust-lang/crates.io-index" -checksum = "562d481066bde0658276a35467c4af00bdc6ee726305698a55b86e61d7ad82bb" +checksum = "40ce102ab67701b8526c123c1bab5cbe42d7040ccfd0f64af1a385808d2f43de" [[package]] name = "unicode-ident" @@ -4836,9 +4790,9 @@ checksum = "06abde3611657adf66d383f00b093d7faecc7fa57071cce2578660c9f1010821" [[package]] name = "uuid" -version = "1.23.0" +version = "1.23.1" source = "registry+https://github.com/rust-lang/crates.io-index" -checksum = "5ac8b6f42ead25368cf5b098aeb3dc8a1a2c05a3eee8a9a1a68c640edbfc79d9" +checksum = "ddd74a9687298c6858e9b88ec8935ec45d22e8fd5e6394fa1bd4e99a87789c76" dependencies = [ "getrandom 0.4.2", "js-sys", @@ -4897,11 +4851,11 @@ checksum = "ccf3ec651a847eb01de73ccad15eb7d99f80485de043efb2f370cd654f4ea44b" [[package]] name = "wasip2" -version = "1.0.2+wasi-0.2.9" +version = "1.0.3+wasi-0.2.9" source = "registry+https://github.com/rust-lang/crates.io-index" -checksum = "9517f9239f02c069db75e65f174b3da828fe5f5b945c4dd26bd25d89c03ebcf5" +checksum = "20064672db26d7cdc89c7798c48a0fdfac8213434a1186e5ef29fd560ae223d6" dependencies = [ - "wit-bindgen", + "wit-bindgen 0.57.1", ] [[package]] @@ -4910,14 +4864,14 @@ version = "0.4.0+wasi-0.3.0-rc-2026-01-06" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "5428f8bf88ea5ddc08faddef2ac4a67e390b88186c703ce6dbd955e1c145aca5" dependencies = [ - "wit-bindgen", + "wit-bindgen 0.51.0", ] [[package]] name = "wasm-bindgen" -version = "0.2.117" +version = "0.2.122" source = "registry+https://github.com/rust-lang/crates.io-index" -checksum = "0551fc1bb415591e3372d0bc4780db7e587d84e2a7e79da121051c5c4b89d0b0" +checksum = "3ed04576f974d2b2fba0f38c51dbc5518011e38c36bf1143164be765528fd409" dependencies = [ "cfg-if", "once_cell", @@ -4928,9 +4882,9 @@ dependencies = [ [[package]] name = "wasm-bindgen-futures" -version = "0.4.67" +version = "0.4.72" source = "registry+https://github.com/rust-lang/crates.io-index" -checksum = "03623de6905b7206edd0a75f69f747f134b7f0a2323392d664448bf2d3c5d87e" +checksum = "9473dbd2991ae90b6291c3c32c30c6187ac49aa32f9905d1cce280ec1e110b0f" dependencies = [ "js-sys", "wasm-bindgen", @@ -4938,9 +4892,9 @@ dependencies = [ [[package]] name = "wasm-bindgen-macro" -version = "0.2.117" +version = "0.2.122" source = "registry+https://github.com/rust-lang/crates.io-index" -checksum = "7fbdf9a35adf44786aecd5ff89b4563a90325f9da0923236f6104e603c7e86be" +checksum = "916151b09da36bd82f6615cbf3a419e2f0ba23a03c6160e8e92eb6bd4aa1dec6" dependencies = [ "quote", "wasm-bindgen-macro-support", @@ -4948,9 +4902,9 @@ dependencies = [ [[package]] name = "wasm-bindgen-macro-support" -version = "0.2.117" +version = "0.2.122" source = "registry+https://github.com/rust-lang/crates.io-index" -checksum = "dca9693ef2bab6d4e6707234500350d8dad079eb508dca05530c85dc3a529ff2" +checksum = "299047362ccbfce148b67ab7e73349f77748e00c8296f9542adfad2ad82c5c5e" dependencies = [ "bumpalo", "proc-macro2", @@ -4961,9 +4915,9 @@ dependencies = [ [[package]] name = "wasm-bindgen-shared" -version = "0.2.117" +version = "0.2.122" source = "registry+https://github.com/rust-lang/crates.io-index" -checksum = "39129a682a6d2d841b6c429d0c51e5cb0ed1a03829d8b3d1e69a011e62cb3d3b" +checksum = "9a929b2c61f11ba3e9bc35b50c1f25cb38e0e892c0c231ae2b8cf78d5dad4437" dependencies = [ "unicode-ident", ] @@ -4985,7 +4939,7 @@ source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "bb0e353e6a2fbdc176932bbaab493762eb1255a7900fe0fea1a2f96c296cc909" dependencies = [ "anyhow", - "indexmap 2.13.0", + "indexmap 2.14.0", "wasm-encoder", "wasmparser", ] @@ -5009,17 +4963,17 @@ version = "0.244.0" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "47b807c72e1bac69382b3a6fb3dbe8ea4c0ed87ff5629b8685ae6b9a611028fe" dependencies = [ - "bitflags 2.11.0", + "bitflags 2.11.1", "hashbrown 0.15.5", - "indexmap 2.13.0", + "indexmap 2.14.0", "semver", ] [[package]] name = "web-sys" -version = "0.3.94" +version = "0.3.99" source = "registry+https://github.com/rust-lang/crates.io-index" -checksum = "cd70027e39b12f0849461e08ffc50b9cd7688d942c1c8e3c7b22273236b4dd0a" +checksum = "6d621441cfc37b84979402712047321980c178f299193a3589d05b99e8763436" dependencies = [ "js-sys", "wasm-bindgen", @@ -5037,9 +4991,9 @@ dependencies = [ [[package]] name = "web_atoms" -version = "0.2.3" +version = "0.2.4" source = "registry+https://github.com/rust-lang/crates.io-index" -checksum = "57a9779e9f04d2ac1ce317aee707aa2f6b773afba7b931222bff6983843b1576" +checksum = "d7cff6eef815df1834fd250e3a2ff436044d82a9f1bc1980ca1dbdf07effc538" dependencies = [ "phf 0.13.1", "phf_codegen 0.13.1", @@ -5055,9 +5009,9 @@ checksum = "5f20c57d8d7db6d3b86154206ae5d8fba62dd39573114de97c2cb0578251f8e1" [[package]] name = "webpki-roots" -version = "1.0.6" +version = "1.0.7" source = "registry+https://github.com/rust-lang/crates.io-index" -checksum = "22cfaf3c063993ff62e73cb4311efde4db1efb31ab78a3e5c457939ad5cc0bed" +checksum = "52f5ee44c96cf55f1b349600768e3ece3a8f26010c05265ab73f945bb1a2eb9d" dependencies = [ "rustls-pki-types", ] @@ -5436,9 +5390,9 @@ dependencies = [ "base64 0.22.1", "deadpool", "futures", - "http 1.4.0", + "http 1.4.1", "http-body-util", - "hyper 1.9.0", + "hyper 1.10.0", "hyper-util", "log", "once_cell", @@ -5458,6 +5412,12 @@ dependencies = [ "wit-bindgen-rust-macro", ] +[[package]] +name = "wit-bindgen" +version = "0.57.1" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "1ebf944e87a7c253233ad6766e082e3cd714b5d03812acc24c318f549614536e" + [[package]] name = "wit-bindgen-core" version = "0.51.0" @@ -5477,7 +5437,7 @@ checksum = "b7c566e0f4b284dd6561c786d9cb0142da491f46a9fbed79ea69cdad5db17f21" dependencies = [ "anyhow", "heck 0.5.0", - "indexmap 2.13.0", + "indexmap 2.14.0", "prettyplease", "syn 2.0.117", "wasm-metadata", @@ -5507,8 +5467,8 @@ source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "9d66ea20e9553b30172b5e831994e35fbde2d165325bec84fc43dbf6f4eb9cb2" dependencies = [ "anyhow", - "bitflags 2.11.0", - "indexmap 2.13.0", + "bitflags 2.11.1", + "indexmap 2.14.0", "log", "serde", "serde_derive", @@ -5527,7 +5487,7 @@ checksum = "ecc8ac4bc1dc3381b7f59c34f00b67e18f910c2c0f50015669dde7def656a736" dependencies = [ "anyhow", "id-arena", - "indexmap 2.13.0", + "indexmap 2.14.0", "log", "semver", "serde", @@ -5539,9 +5499,9 @@ dependencies = [ [[package]] name = "writeable" -version = "0.6.2" +version = "0.6.3" source = "registry+https://github.com/rust-lang/crates.io-index" -checksum = "9edde0db4769d2dc68579893f2306b26c6ecfbe0ef499b013d731b7b9247e0b9" +checksum = "1ffae5123b2d3fc086436f8834ae3ab053a283cfac8fe0a0b8eaae044768a4c4" [[package]] name = "xattr" @@ -5555,12 +5515,12 @@ dependencies = [ [[package]] name = "xml5ever" -version = "0.36.1" +version = "0.38.0" source = "registry+https://github.com/rust-lang/crates.io-index" -checksum = "f57dd51b88a4b9f99f9b55b136abb86210629d61c48117ddb87f567e51e66be7" +checksum = "d3dc9559429edf0cd3f327cc0afd9d6b36fa8cec6d93107b7fbe64f806b5f2d9" dependencies = [ "log", - "markup5ever 0.36.1", + "markup5ever 0.38.0", ] [[package]] @@ -5603,18 +5563,18 @@ dependencies = [ [[package]] name = "zerocopy" -version = "0.8.48" +version = "0.8.49" source = "registry+https://github.com/rust-lang/crates.io-index" -checksum = "eed437bf9d6692032087e337407a86f04cd8d6a16a37199ed57949d415bd68e9" +checksum = "bce33a6288fa3f072a8c2c7d0f2fdbb90e28298f0135c1f99b96c3db2efcc60b" dependencies = [ "zerocopy-derive", ] [[package]] name = "zerocopy-derive" -version = "0.8.48" +version = "0.8.49" source = "registry+https://github.com/rust-lang/crates.io-index" -checksum = "70e3cd084b1788766f53af483dd21f93881ff30d7320490ec3ef7526d203bad4" +checksum = "8fd425244944f4ab65ccff928e7323354c5a018c75838362fdce749dfad2ee1e" dependencies = [ "proc-macro2", "quote", @@ -5623,9 +5583,9 @@ dependencies = [ [[package]] name = "zerofrom" -version = "0.1.7" +version = "0.1.8" source = "registry+https://github.com/rust-lang/crates.io-index" -checksum = "69faa1f2a1ea75661980b013019ed6687ed0e83d069bc1114e2cc74c6c04c4df" +checksum = "0ec05a11813ea801ff6d75110ad09cd0824ddba17dfe17128ea0d5f68e6c5272" dependencies = [ "zerofrom-derive", ] @@ -5704,7 +5664,7 @@ dependencies = [ "crossbeam-utils", "displaydoc", "flate2", - "indexmap 2.13.0", + "indexmap 2.14.0", "memchr", "thiserror 2.0.18", "zopfli", diff --git a/README.md b/README.md index 858ae27..6bca3d9 100644 --- a/README.md +++ b/README.md @@ -98,6 +98,31 @@ compiled extension under `~/.cache/a3s-code//`. Subsequent imports use the cache. The split exists because the full native-wheel matrix grew past PyPI's per-project storage cap. +### Python Bootstrap Security Hardening Plan + +The v3.2.1 bootstrap hash check detects corrupted or mismatched release +assets, but it is not intended to be a complete supply-chain trust +boundary: the manifest and native wheels are both hosted on the same +GitHub Release. We are treating the trust model raised in +[issue #46](https://github.com/AI45Lab/Code/issues/46) as a hardening +item. + +Planned fixes: + +1. Fail closed when the release manifest or expected hash cannot be + fetched, unless the user explicitly opts out. +2. Restore an explicit `A3S_CODE_OFFLINE=1` mode for environments that + must forbid network access during `import a3s_code`. +3. Embed the expected native wheel hashes in the PyPI bootstrap artifact, + so the hash source is not controlled by the same mutable release asset. +4. Re-verify cached native extensions before loading them, and replace + cache entries that fail validation. +5. Revisit install-time or platform-wheel distribution so dependency + scanners, lockfiles, and air-gapped CI can observe the native artifact + before runtime import. +6. Evaluate signed release metadata or artifact attestations as the + longer-term trust root for GitHub-hosted native wheels. + --- ## Quick Start @@ -303,8 +328,37 @@ session.register_worker_agent( # 13. Persistence and lifecycle. session.save() resumed = agent.resume_session("my-session", opts) -session.cancel() # cancels in-flight send/stream -session.close() +session.cancel() # cancel in-flight send/stream +session.close() # full cleanup; sets session.is_closed +agent.list_sessions() # IDs of live sessions +agent.close_session("session-id") # close one session by ID +agent.close() # close every session + disconnect global MCP + +# 14. Cluster-grade extensibility (cooperate with a host platform). +opts.tenant_id = "acme-prod" # opaque labels propagated to hooks/traces/SessionData +opts.principal = "svc-deploy-bot" # — framework never interprets, host aggregates +opts.agent_template_id = "ci-runner-v7" +opts.correlation_id = "trace-1234" +session = agent.session(workspace, opts) +session.tenant_id # read back the host-supplied labels +session.resume_run("run-id-from-elsewhere") # rehydrate a checkpointed run on this node + +# 15. Long-running session ops (cap memory + reap idle resources). +from a3s_code import SessionRetentionLimits # FIFO caps on in-memory stores +limits = SessionRetentionLimits() # (Rust-only today; Python helper TBD) +opts.retention_limits = limits # falls through to AgentConfig +agent.disconnect_idle_mcp(5 * 60 * 1000) # drop MCP servers idle > 5min; returns names + +# 16. Budget / cost governance (host-supplied policy). +class MyBudget: + def check_before_llm(self, session_id, est_tokens): + if self.over_budget(session_id): + return {"decision": "deny", "resource": "llm_tokens", "reason": "monthly cap"} + return None # allow + def record_after_llm(self, session_id, usage): + self.track(session_id, usage["total_tokens"], usage.get("cache_read_tokens")) + +opts.budget_guard = MyBudget() # SoftLimit emits BudgetThresholdHit; Deny raises RuntimeError ``` ```typescript @@ -474,8 +528,45 @@ session.registerWorkerAgent({ // 13. Persistence and lifecycle. await session.save(); const resumed = agent.resumeSession('my-session', opts); -session.cancel(); // cancels in-flight send/stream -session.close(); +session.cancel(); // cancel in-flight send/stream +session.close(); // full cleanup; sets session.isClosed +await agent.listSessions(); // IDs of live sessions +await agent.closeSession('session-id'); // close one session by ID +await agent.close(); // close every session + disconnect global MCP + +// 14. Cluster-grade extensibility (cooperate with a host platform). +const session2 = agent.session(workspace, { + tenantId: 'acme-prod', + principal: 'svc-deploy-bot', + agentTemplateId: 'ci-runner-v7', + correlationId: 'trace-1234', + sessionStore: new FileSessionStore('./sessions'), +}); +session2.tenantId; // read host-supplied label +const resumed2 = await session2.resumeRun('run-id-from-elsewhere'); +// Loop checkpoints land automatically after each tool round when a +// sessionStore is configured — pick them up from another node / +// process via session.resumeRun(runId). + +// 15. Long-running session ops (cap memory + reap idle resources). +// SessionRetentionLimits is Rust-only today; an SDK shape lands later. +// MCP idle disconnect is on the agent — call it periodically from a +// host-side sweeper (e.g. setInterval). +await agent.disconnectIdleMcp(5 * 60 * 1000); // drop quiet MCP servers + +// 16. Budget / cost governance (host-supplied policy). +session2.setBudgetGuard({ + checkBeforeLlm: (sessionId, estimatedTokens) => { + if (overBudget(sessionId)) { + return { decision: 'deny', resource: 'llm_tokens', reason: 'monthly cap' }; + } + return null; // allow + }, + recordAfterLlm: (sessionId, usage) => { + track(sessionId, usage.total_tokens); + }, +}); +// SoftLimit emits BudgetThresholdHit('soft'); Deny throws "Budget exhausted". ``` --- diff --git a/core/Cargo.toml b/core/Cargo.toml index 5c605ae..93a310b 100644 --- a/core/Cargo.toml +++ b/core/Cargo.toml @@ -1,6 +1,6 @@ [package] name = "a3s-code-core" -version = "3.2.1" +version = "3.3.0" edition = "2021" authors = ["A3S Lab Team"] license = "MIT" diff --git a/core/src/agent.rs b/core/src/agent.rs index aea0407..6506d4b 100644 --- a/core/src/agent.rs +++ b/core/src/agent.rs @@ -34,6 +34,7 @@ mod context_perception; mod execution_entry; mod execution_mode; mod execution_state; +pub(crate) use execution_state::ExecutionSeed; mod hook_runtime; mod llm_turn; mod loop_builder; @@ -151,6 +152,13 @@ pub(crate) struct AgentConfig { /// If execution exceeds this duration, the loop bails with an error. /// This prevents runaway executions that consume excessive API quota. pub max_execution_time_ms: Option, + /// Host-supplied budget guard consulted before every LLM call (and + /// after, for usage accounting). `None` means no enforcement. + pub budget_guard: Option>, + /// Host-provided ID generator + clock. Defaults to wall-clock UUIDs. + /// Replace via [`SessionOptions::with_host_env`](crate::agent_api::SessionOptions::with_host_env) + /// when deterministic replay is needed. + pub host_env: Arc, } impl std::fmt::Debug for AgentConfig { @@ -227,6 +235,8 @@ impl Default for AgentConfig { continuation_enabled: true, max_continuation_turns: 3, max_execution_time_ms: None, + budget_guard: None, + host_env: Arc::new(crate::host_env::HostEnv::system()), } } } @@ -578,6 +588,71 @@ pub enum AgentEvent { operation: String, error: String, }, + + // ======================================================================== + // Cluster / platform events + // + // These variants are emitted by the host platform (e.g. 书安OS) via + // `HookExecutor` and are not produced by the agent loop itself. They + // give in-session code a uniform way to observe platform-level + // decisions (budget exhaustion, scheduled passivation, peer + // invocations) without coupling to the host's transport. + // ======================================================================== + /// A budget threshold was crossed for this session/tenant. + /// + /// Emitted by a host `BudgetGuard` impl when LLM/tool spend hits a + /// soft or hard threshold. The session is **not** automatically + /// halted — `kind` lets in-session policy decide (e.g. fast-compact + /// at "soft", refuse next LLM call at "hard"). + #[serde(rename = "budget_threshold_hit")] + BudgetThresholdHit { + /// Logical resource: "llm_tokens", "tool_calls", "wall_time", + /// "usd_cost", or host-defined. + resource: String, + /// "soft" or "hard"; host-defined semantics beyond that. + kind: String, + /// Current consumed amount in the same unit as `limit`. + consumed: f64, + /// Threshold that was crossed. + limit: f64, + /// Optional explanation for logs / UI. + #[serde(default, skip_serializing_if = "Option::is_none")] + message: Option, + }, + + /// The host is asking the session to release in-memory state. + /// + /// Emitted before the host calls `session.close()` or moves the + /// session to another node. Session code that holds large caches + /// can react (flush to memory store, drop derived state). The + /// framework does not act on this event itself. + #[serde(rename = "passivation_requested")] + PassivationRequested { + /// "idle_reaper", "node_drain", "migration", "manual", or + /// host-defined. + reason: String, + /// Optional deadline (Unix epoch ms) before forced close. + #[serde(default, skip_serializing_if = "Option::is_none")] + deadline_ms: Option, + }, + + /// Another session in the cluster has invoked this one. + /// + /// Lets in-session hooks distinguish "human-driven send" from + /// "peer-driven send" without inspecting prompts. The host routes + /// the actual prompt through the normal `send` / `stream` path; + /// this event is metadata only. + #[serde(rename = "peer_invocation")] + PeerInvocation { + /// Session id of the invoking peer (cluster-stable). + from_session_id: String, + /// Optional tenant of the invoking peer. + #[serde(default, skip_serializing_if = "Option::is_none")] + from_tenant_id: Option, + /// Distributed-trace correlation id linking the two sessions. + #[serde(default, skip_serializing_if = "Option::is_none")] + correlation_id: Option, + }, } /// Result of agent execution @@ -702,6 +777,16 @@ pub(crate) struct AgentLoop { config: AgentConfig, /// Optional lane queue for priority-based tool execution command_queue: Option>, + /// Optional sink for per-tool-round checkpoints. Populated by + /// `build_agent_loop` when the session has a configured + /// `SessionStore`. The agent loop uses + /// [`AgentLoop::set_checkpoint_run`] to bind a run id before + /// `execute_with_session`, then persists a checkpoint after each + /// completed tool round. + pub(crate) checkpoint_sink: Option>, + /// Run id under which checkpoints are stored. Reset per execution + /// via [`AgentLoop::set_checkpoint_run`]. + pub(crate) checkpoint_run_id: Option, } #[cfg(test)] diff --git a/core/src/agent/execution_entry.rs b/core/src/agent/execution_entry.rs index 6607be2..5935c6f 100644 --- a/core/src/agent/execution_entry.rs +++ b/core/src/agent/execution_entry.rs @@ -30,6 +30,24 @@ impl AgentLoop { session_id: Option<&str>, event_tx: Option>, cancel_token: Option<&tokio_util::sync::CancellationToken>, + ) -> Result { + self.execute_from_messages_seeded(messages, session_id, event_tx, cancel_token, None) + .await + } + + /// Like [`execute_from_messages`](Self::execute_from_messages) but seeds + /// the loop's cumulative metrics (token usage, tool-call count, + /// verification reports) from a checkpoint. Used by + /// `AgentSession::resume_run` so a resumed run continues accounting + /// from where the crashed/migrated run left off instead of + /// re-starting at zero. + pub async fn execute_from_messages_seeded( + &self, + messages: Vec, + session_id: Option<&str>, + event_tx: Option>, + cancel_token: Option<&tokio_util::sync::CancellationToken>, + seed: Option, ) -> Result { let default_token = tokio_util::sync::CancellationToken::new(); let token = cancel_token.unwrap_or(&default_token); @@ -59,6 +77,7 @@ impl AgentLoop { event_tx, token, true, // emit_end: this is a standalone execution + seed, ) .await; diff --git a/core/src/agent/execution_state.rs b/core/src/agent/execution_state.rs index 5245a33..d71f23e 100644 --- a/core/src/agent/execution_state.rs +++ b/core/src/agent/execution_state.rs @@ -24,13 +24,37 @@ pub(super) struct ParseErrorOutcome { pub(super) fatal_message: Option, } +/// Seed for resuming a run from a [`LoopCheckpoint`](crate::loop_checkpoint::LoopCheckpoint): +/// the cumulative metrics accrued before the crash/migration so the +/// resumed run continues accounting from where it left off instead of +/// re-starting at zero (which would under-report token usage and tool +/// calls in the resulting `AgentResult`). +#[derive(Default)] +pub(crate) struct ExecutionSeed { + pub(crate) total_usage: TokenUsage, + pub(crate) tool_calls_count: usize, + pub(crate) verification_reports: Vec, +} + impl ExecutionLoopState { + /// Convenience constructor with no checkpoint seed. Only used by + /// unit tests now; production paths go through `new_seeded` (the + /// resume path threads checkpoint metrics, the normal path passes + /// `None`). + #[cfg(test)] pub(super) fn new(history: &[Message]) -> Self { + Self::new_seeded(history, None) + } + + /// Build loop state, optionally pre-seeded with cumulative metrics + /// from a checkpoint (see [`ExecutionSeed`]). + pub(super) fn new_seeded(history: &[Message], seed: Option) -> Self { + let seed = seed.unwrap_or_default(); Self { messages: history.to_vec(), - total_usage: TokenUsage::default(), - tool_calls_count: 0, - verification_reports: Vec::new(), + total_usage: seed.total_usage, + tool_calls_count: seed.tool_calls_count, + verification_reports: seed.verification_reports, turn: 0, parse_error_count: 0, continuation_count: 0, diff --git a/core/src/agent/llm_turn.rs b/core/src/agent/llm_turn.rs index 2ec9783..d727f60 100644 --- a/core/src/agent/llm_turn.rs +++ b/core/src/agent/llm_turn.rs @@ -118,6 +118,49 @@ impl AgentLoop { event_tx: &Option>, cancel_token: &tokio_util::sync::CancellationToken, ) -> anyhow::Result { + // Consult the host's BudgetGuard once per turn (not per retry). + // A `Deny` bails out before the LLM is touched; a `SoftLimit` + // surfaces a BudgetThresholdHit event and proceeds. + if let Some(guard) = &self.config.budget_guard { + let sid = session_id.unwrap_or(""); + let estimate = estimate_prompt_tokens(messages, system); + match guard.check_before_llm(sid, estimate).await { + crate::budget::BudgetDecision::Allow => {} + crate::budget::BudgetDecision::SoftLimit { + resource, + consumed, + limit, + message, + } => { + if let Some(tx) = event_tx { + let _ = tx + .send(AgentEvent::BudgetThresholdHit { + resource, + kind: "soft".to_string(), + consumed, + limit, + message, + }) + .await; + } + } + crate::budget::BudgetDecision::Deny { resource, reason } => { + if let Some(tx) = event_tx { + let _ = tx + .send(AgentEvent::BudgetThresholdHit { + resource: resource.clone(), + kind: "hard".to_string(), + consumed: 0.0, + limit: 0.0, + message: Some(reason.clone()), + }) + .await; + } + anyhow::bail!("Budget exhausted on '{resource}': {reason}"); + } + } + } + let threshold = self.config.circuit_breaker_threshold.max(1); let mut attempt = 0u32; @@ -127,7 +170,14 @@ impl AgentLoop { .call_llm(messages, system, event_tx, cancel_token) .await; match result { - Ok(response) => return Ok(response), + Ok(response) => { + if let Some(guard) = &self.config.budget_guard { + guard + .record_after_llm(session_id.unwrap_or(""), &response.usage) + .await; + } + return Ok(response); + } Err(error) if cancel_token.is_cancelled() => { anyhow::bail!(error); } @@ -440,3 +490,16 @@ impl AgentLoop { } } } + +/// Cheap, framework-internal estimator of prompt tokens for +/// `BudgetGuard::check_before_llm`. Roughly counts characters / 4 +/// across system + messages, matching the well-known "1 token ≈ 4 +/// English characters" heuristic. Impls that need precision should +/// rely on `record_after_llm` with the provider's actual usage. +fn estimate_prompt_tokens(messages: &[Message], system: Option<&str>) -> usize { + let mut chars = system.map(|s| s.len()).unwrap_or(0); + for msg in messages { + chars += msg.text().len(); + } + chars / 4 +} diff --git a/core/src/agent/loop_builder.rs b/core/src/agent/loop_builder.rs index 80cf16a..c2dafd1 100644 --- a/core/src/agent/loop_builder.rs +++ b/core/src/agent/loop_builder.rs @@ -1,5 +1,6 @@ use super::{AgentConfig, AgentLoop}; use crate::llm::LlmClient; +use crate::loop_checkpoint::LoopCheckpointSink; use crate::session_lane_queue::SessionLaneQueue; use crate::tools::{ToolContext, ToolExecutor}; use std::sync::Arc; @@ -17,6 +18,8 @@ impl AgentLoop { tool_context, config, command_queue: None, + checkpoint_sink: None, + checkpoint_run_id: None, } } @@ -28,4 +31,22 @@ impl AgentLoop { self.command_queue = Some(queue); self } + + /// Attach a per-tool-round checkpoint sink. After each completed + /// tool round the loop will call `sink.save_checkpoint(...)`. + /// + /// The sink is independent from the run id: call + /// [`AgentLoop::set_checkpoint_run`] before executing to bind the + /// run id this execution will use. + pub fn with_checkpoint_sink(mut self, sink: Arc) -> Self { + self.checkpoint_sink = Some(sink); + self + } + + /// Bind the run id used by per-tool-round checkpoints. Called per + /// execution so a single `AgentLoop` (which is cheap to clone) can + /// host successive runs. + pub fn set_checkpoint_run(&mut self, run_id: impl Into) { + self.checkpoint_run_id = Some(run_id.into()); + } } diff --git a/core/src/agent/loop_runtime.rs b/core/src/agent/loop_runtime.rs index fa6f50d..5e48fee 100644 --- a/core/src/agent/loop_runtime.rs +++ b/core/src/agent/loop_runtime.rs @@ -35,6 +35,7 @@ impl AgentLoop { event_tx, cancel_token, emit_end, + None, ) .await } @@ -57,8 +58,9 @@ impl AgentLoop { event_tx: Option>, cancel_token: &tokio_util::sync::CancellationToken, emit_end: bool, + seed: Option, ) -> Result { - let mut state = ExecutionLoopState::new(history); + let mut state = ExecutionLoopState::new_seeded(history, seed); let style_prompt = if effective_prompt.is_empty() { msg_prompt @@ -143,6 +145,40 @@ impl AgentLoop { effective_prompt, ) .await?; + + // Quiescent boundary: the tool round has fully resolved and + // `state.messages` is consistent. Persist a checkpoint so a + // future process can resume from here (P3). + self.persist_loop_checkpoint(turn, &state, session_id).await; } } + + /// Persist a `LoopCheckpoint` if both a sink and a bound run id are + /// configured. Failures are swallowed (the sink already logs them) + /// so an unavailable store cannot halt a live run. + async fn persist_loop_checkpoint( + &self, + turn: usize, + state: &super::execution_state::ExecutionLoopState, + session_id: Option<&str>, + ) { + let Some(sink) = self.checkpoint_sink.as_ref() else { + return; + }; + let Some(run_id) = self.checkpoint_run_id.as_ref() else { + return; + }; + let checkpoint = crate::loop_checkpoint::LoopCheckpoint { + schema_version: crate::loop_checkpoint::LOOP_CHECKPOINT_SCHEMA_VERSION, + run_id: run_id.clone(), + session_id: session_id.unwrap_or("").to_string(), + turn, + messages: state.messages.clone(), + total_usage: state.total_usage.clone(), + tool_calls_count: state.tool_calls_count, + verification_reports: state.verification_reports.clone(), + checkpoint_ms: self.config.host_env.now_ms(), + }; + sink.save_checkpoint(&checkpoint).await; + } } diff --git a/core/src/agent_api.rs b/core/src/agent_api.rs index ee5d6b0..e04463f 100644 --- a/core/src/agent_api.rs +++ b/core/src/agent_api.rs @@ -49,6 +49,7 @@ mod runtime; mod runtime_events; mod session_builder; mod session_clock; +mod session_close; mod session_commands; mod session_config; mod session_extensions; @@ -64,6 +65,7 @@ mod session_view; use direct_tools::DirectToolRuntime; use hook_control::HookControl; use runtime_events::ActiveToolState; +use session_close::SessionCloseHandle; use session_extensions::SessionExtensionRuntime; use session_hitl::HitlControl; use session_queue::QueueControl; @@ -161,6 +163,37 @@ pub struct SessionOptions { pub session_store: Option>, /// Explicit session ID (auto-generated if not set) pub session_id: Option, + /// Multi-tenant identifier. Framework only transports this string; + /// the host (e.g. 书安OS) decides what "tenant" means and how to + /// aggregate/bill on it. Emitted to hooks/traces, persisted in + /// `SessionData`, never interpreted by core. + pub tenant_id: Option, + /// Identity of the principal that triggered this session (user id, + /// service account, etc). Treated as opaque. + pub principal: Option, + /// Logical identifier of the agent template / definition the session + /// was instantiated from. Lets the host aggregate sessions by + /// "which agent recipe" independent of the concrete session id. + pub agent_template_id: Option, + /// Distributed-trace correlation id. Propagated through hooks/traces + /// so a session's events join with upstream/downstream work in the + /// host's observability pipeline. + pub correlation_id: Option, + /// Optional host-supplied budget / quota guard. The framework calls + /// into it before each LLM call (and reports actuals after) so the + /// host can refuse or rate-limit at the cluster level. Default is + /// `None` (no enforcement — equivalent to + /// [`NoopBudgetGuard`](crate::budget::NoopBudgetGuard)). + pub budget_guard: Option>, + /// Optional host-provided ID/Clock pair. Replaces the default + /// random-UUID + wall-clock pair, enabling deterministic replay + /// on another node. `None` keeps pre-P2 behaviour. + pub host_env: Option>, + /// Optional FIFO retention caps on the session's in-memory stores + /// (run records, run events, trace events, terminal subagent + /// tasks). `None` (default) keeps everything — fine for short + /// sessions, a memory leak for hours-long cluster workloads. + pub retention_limits: Option, /// Auto-save after each completed `send()` or default-history `stream()` call. pub auto_save: bool, /// Optional artifact retention limits for large tool/program outputs. @@ -262,6 +295,19 @@ pub struct Agent { /// Pre-fetched MCP tool definitions from global_mcp (cached at creation time). /// Wrapped in Mutex so `refresh_mcp_tools()` can update the cache without `&mut self`. global_mcp_tools: std::sync::Mutex>, + /// Tracks every live session created by this agent via `Weak` refs so + /// the agent can enumerate and forcibly close them. Sessions register + /// themselves at construction and become dangling `Weak`s on drop — + /// `list_sessions()` / `close_session()` prune dead entries on access. + /// + /// Uses a synchronous lock so the sync `Agent::session()` factory can + /// insert without nesting tokio runtimes. The lock is only held for + /// brief insert/scan operations — async close work happens after the + /// lock is released. + sessions: Arc>>>, + /// Set once `Agent::close()` has been called. Subsequent `session()` / + /// `resume_session()` calls fail fast with `CodeError::SessionClosed`. + closed: Arc, } impl std::fmt::Debug for Agent { @@ -368,6 +414,66 @@ impl Agent { agent_sessions::resume_session(self, session_id, options) } + /// Return the IDs of every live session created from this agent. + /// + /// "Live" means the caller still holds an [`AgentSession`] — sessions + /// that have been dropped are pruned lazily on each call. The list is + /// sorted to make output stable for tests/UIs. + pub async fn list_sessions(&self) -> Vec { + agent_sessions::list_sessions(self).await + } + + /// Close a specific live session by its session ID. + /// + /// Returns `true` when a live session with the given id was found and + /// transitioned from open to closed by this call; `false` when no live + /// session has that id, or when the session was already closed. + /// + /// This is the out-of-band counterpart to [`AgentSession::close`]: it + /// performs exactly the same cleanup but can be invoked without holding + /// a reference to the session itself — useful for control-plane code + /// that only knows the session ID. + pub async fn close_session(&self, session_id: &str) -> bool { + agent_sessions::close_session(self, session_id).await + } + + /// Close every live session created from this agent and tear down + /// background resources owned by the agent (global MCP connections). + /// + /// After this call: + /// - Every live `AgentSession` is closed (same effect as calling + /// [`AgentSession::close`] on each). + /// - Subsequent [`Agent::session`] / [`Agent::resume_session`] calls + /// fail fast with [`CodeError::SessionClosed`](crate::error::CodeError::SessionClosed). + /// + /// Idempotent: subsequent calls are no-ops and are guaranteed not to + /// panic. + pub async fn close(&self) { + agent_sessions::close_agent(self).await + } + + /// Return whether [`close`](Self::close) has been called on this agent. + pub fn is_closed(&self) -> bool { + self.closed.load(std::sync::atomic::Ordering::Acquire) + } + + /// Disconnect every global MCP server whose last activity is older + /// than `idle_threshold_ms`. Returns the names of disconnected + /// servers (empty when there is no global MCP manager or when + /// nothing is idle). + /// + /// Hosts running thousands of long-lived sessions should call this + /// periodically (e.g. every 60s with a 5-min threshold) to release + /// file descriptors and background workers from quiet MCP servers + /// without losing the server's configuration. A subsequent tool + /// call on the same server will require an explicit reconnect. + pub async fn disconnect_idle_mcp(&self, idle_threshold_ms: u64) -> Vec { + match &self.global_mcp { + Some(mcp) => mcp.disconnect_idle(idle_threshold_ms).await, + None => Vec::new(), + } + } + #[cfg(test)] fn build_session( &self, @@ -437,6 +543,37 @@ pub struct AgentSession { trace_sink: crate::trace::InMemoryTraceSink, /// Structured completion evidence collected from agent and explicit verification runs. verification_reports: Arc>>, + /// Set once `close()` has been called. Subsequent send/stream calls + /// fast-fail with [`crate::error::CodeError::SessionClosed`]. + closed: Arc, + /// Session-level parent cancellation token. + /// + /// Every in-flight run (blocking send, stream, delegated subagent task) + /// derives its per-operation token from this one via `child_token()`, + /// so `session_cancel.cancel()` cascades to all of them. `close()` fires + /// this token first, after which any new `child_token()` returns an + /// already-cancelled token (defending against close/spawn races). + pub(crate) session_cancel: tokio_util::sync::CancellationToken, + /// Shared `Arc`-handle used by both [`AgentSession::close`] and the + /// parent [`Agent`]'s registry. The handle bundles every field needed + /// to perform the close sequence so the two entry points cannot drift. + close_handle: Arc, + /// Runtime-mutable override for the budget guard. When set, takes + /// precedence over `config.budget_guard` on the next agent-loop + /// build. Lets SDK callers (Node especially) install a host-side + /// guard after `session()` has returned without ever putting a + /// JS callable into `SessionOptions`. + runtime_budget_guard: std::sync::Mutex>>, + /// Multi-tenant label. Framework only carries the string; semantics + /// belong to the host. + pub(crate) tenant_id: Option, + /// Principal that triggered the session (user / service / etc.). + pub(crate) principal: Option, + /// Logical identifier of the agent template the session was + /// instantiated from. + pub(crate) agent_template_id: Option, + /// Distributed-trace correlation id propagated to hooks / traces. + pub(crate) correlation_id: Option, } impl std::fmt::Debug for AgentSession { @@ -464,9 +601,102 @@ impl AgentSession { session_commands::register(self, cmd); } - /// Cancel any active operation and release session resources. + /// Return whether [`close`](Self::close) has been called on this session. + /// + /// Once closed, `send`/`stream` and their attachment variants fast-fail + /// with [`crate::error::CodeError::SessionClosed`] instead of starting a + /// new run. + pub fn is_closed(&self) -> bool { + self.closed.load(std::sync::atomic::Ordering::Acquire) + } + + /// Clone the session-level [`CancellationToken`](tokio_util::sync::CancellationToken). + /// + /// All in-flight runs derive their per-operation token from this one via + /// `child_token()`, so embedders can: + /// + /// - Observe the token (e.g. wire it into a host-side `select!`) to + /// react to session shutdown without polling [`is_closed`](Self::is_closed); + /// - Call `.cancel()` on it to abort every operation in the session + /// without going through `close()` (no run-store / hook side effects). + /// + /// For graceful shutdown prefer [`close`](Self::close), which also marks + /// runs as cancelled in the store and fires AHP hooks. + pub fn session_cancel_token(&self) -> tokio_util::sync::CancellationToken { + self.session_cancel.clone() + } + + /// Return the host-defined tenant id, if any. + /// + /// The framework only transports this string — it never interprets + /// or enforces tenant boundaries itself. Use this from custom + /// `HookExecutor` / `PermissionChecker` / `BudgetGuard` impls to + /// route logic by tenant. + pub fn tenant_id(&self) -> Option<&str> { + self.tenant_id.as_deref() + } + + /// Return the principal that triggered the session, if any. + pub fn principal(&self) -> Option<&str> { + self.principal.as_deref() + } + + /// Return the id of the agent template/definition the session was + /// instantiated from, if any. + pub fn agent_template_id(&self) -> Option<&str> { + self.agent_template_id.as_deref() + } + + /// Return the distributed-trace correlation id propagated through + /// this session's events, if any. + pub fn correlation_id(&self) -> Option<&str> { + self.correlation_id.as_deref() + } + + /// Install or replace a runtime budget guard. Takes effect on the + /// next `send` / `stream` call (the guard is consulted at agent- + /// loop build time, not on the live execution). Setting `None` + /// clears the override so `config.budget_guard` takes over again. + /// + /// This is the entry point SDKs use to wire a host-supplied guard + /// after the session has already been constructed — useful when + /// the guard's transport (e.g. a JS callable) cannot live inside + /// the value-typed `SessionOptions`. + pub fn set_budget_guard(&self, guard: Option>) { + let mut slot = self + .runtime_budget_guard + .lock() + .unwrap_or_else(|p| p.into_inner()); + *slot = guard; + } + + /// Return the currently-installed runtime budget guard, if any. + /// `None` means the loop falls back to `config.budget_guard`. + pub fn budget_guard(&self) -> Option> { + self.runtime_budget_guard + .lock() + .unwrap_or_else(|p| p.into_inner()) + .clone() + } + + /// Proactively close the session and release its in-flight work. + /// + /// On the first call this: + /// 1. flips the session into the **closed** state so further `send`/`stream` + /// calls fast-fail with [`crate::error::CodeError::SessionClosed`]; + /// 2. fires the session-level cancellation token so every derived + /// run/subagent token cascades to cancelled; + /// 3. marks the active run `Cancelled` in the run store and fires AHP + /// hook side effects; + /// 4. cancels every still-running delegated subagent task spawned from + /// this session; + /// 5. cancels all pending human-in-the-loop tool confirmations. + /// + /// Subsequent calls are no-ops and are guaranteed not to panic. pub async fn close(&self) { - let _ = self.cancel().await; + // Delegate to the shared handle so this entry point and + // `Agent::close_session(id)` cannot drift in behaviour. + self.close_handle.close().await; } /// Send a prompt and wait for the complete response. @@ -481,6 +711,21 @@ impl AgentSession { conversation_runtime::send(self, prompt, history).await } + /// Resume a previously-checkpointed run on this session. + /// + /// Loads the latest [`LoopCheckpoint`](crate::loop_checkpoint::LoopCheckpoint) + /// stored under `checkpoint_run_id` and replays the agent loop from + /// that boundary state. A **new** run id is allocated for the + /// resumed work; the relationship between the old and new run is + /// host-tracked (e.g. by 书安OS) — the framework does not interpret + /// it. + /// + /// Returns an error when no `SessionStore` is configured on this + /// session, or when no checkpoint exists for `checkpoint_run_id`. + pub async fn resume_run(&self, checkpoint_run_id: &str) -> Result { + conversation_runtime::resume_run(self, checkpoint_run_id).await + } + /// Send a prompt with image attachments and wait for the complete response. /// /// Images are included as multi-modal content blocks in the user message. @@ -603,6 +848,21 @@ impl AgentSession { self.subagent_tasks.cancel(task_id).await } + /// Return a shared handle to the session's subagent task tracker. + /// + /// Advanced: embedders implementing a custom subagent execution path + /// (i.e. spawning child loops outside the built-in `task` tool) can use + /// this to register cancellation tokens and feed `AgentEvent`s into the + /// tracker so the standard + /// [`subagent_task`](Self::subagent_task) / [`pending_subagent_tasks`](Self::pending_subagent_tasks) / + /// [`cancel_subagent_task`](Self::cancel_subagent_task) APIs and + /// [`close`](Self::close) keep working uniformly across execution paths. + pub fn subagent_tracker( + &self, + ) -> Arc { + Arc::clone(&self.subagent_tasks) + } + /// Return a snapshot of the session's conversation history. pub fn history(&self) -> Vec { SessionView::from_session(self).history() diff --git a/core/src/agent_api/agent_bootstrap.rs b/core/src/agent_api/agent_bootstrap.rs index 216ff01..6f7cf3d 100644 --- a/core/src/agent_api/agent_bootstrap.rs +++ b/core/src/agent_api/agent_bootstrap.rs @@ -64,6 +64,8 @@ pub(super) async fn build_agent_from_config(config: CodeConfig) -> Result config: agent_config, global_mcp, global_mcp_tools: std::sync::Mutex::new(global_mcp_tools), + sessions: Arc::new(std::sync::Mutex::new(std::collections::HashMap::new())), + closed: Arc::new(std::sync::atomic::AtomicBool::new(false)), }) } diff --git a/core/src/agent_api/agent_loop_runtime.rs b/core/src/agent_api/agent_loop_runtime.rs index 0e719eb..6144f78 100644 --- a/core/src/agent_api/agent_loop_runtime.rs +++ b/core/src/agent_api/agent_loop_runtime.rs @@ -19,6 +19,14 @@ pub(super) fn build_agent_loop(session: &AgentSession) -> AgentLoop { // every run snapshots live definitions instead of using the stale config copy. config.tools = session.tool_executor.definitions(); + // Runtime budget-guard override (set via AgentSession::set_budget_guard) + // takes precedence over the value baked in at session-build time. + // Used by Node SDK where the JS callable cannot live inside + // value-typed SessionOptions. + if let Some(runtime_guard) = session.budget_guard() { + config.budget_guard = Some(runtime_guard); + } + let mut agent_loop = AgentLoop::new( session.llm_client.clone(), session.tool_executor.clone(), @@ -28,5 +36,14 @@ pub(super) fn build_agent_loop(session: &AgentSession) -> AgentLoop { if let Some(queue) = &session.command_queue { agent_loop = agent_loop.with_queue(Arc::clone(queue)); } + // Wire per-tool-round checkpointing when the session has a store. + // The run id is bound later by the caller via + // `AgentLoop::set_checkpoint_run` once `start_run` returns. + if let Some(store) = &session.session_store { + let sink = std::sync::Arc::new(crate::loop_checkpoint::SessionStoreCheckpointSink::new( + std::sync::Arc::clone(store), + )); + agent_loop = agent_loop.with_checkpoint_sink(sink); + } agent_loop } diff --git a/core/src/agent_api/agent_sessions.rs b/core/src/agent_api/agent_sessions.rs index 2d9e9ca..7285fb2 100644 --- a/core/src/agent_api/agent_sessions.rs +++ b/core/src/agent_api/agent_sessions.rs @@ -2,12 +2,16 @@ //! //! `Agent` is workspace-independent; this module owns the transition from an //! agent config/runtime to a workspace-bound `AgentSession`, including resume. +//! It also implements the agent-side session registry: every newly built +//! session registers a `Weak` so `Agent::close_session` +//! and `Agent::close` can reach in and tear it down. use super::{ - agent_binding, session_builder, session_config, session_persistence, Agent, AgentSession, - SessionOptions, + agent_binding, session_builder, session_close::SessionCloseHandle, session_config, + session_persistence, Agent, AgentSession, SessionOptions, }; -use crate::error::Result; +use crate::error::{CodeError, Result}; +use std::sync::{Arc, Weak}; pub(super) async fn refresh_mcp_tools(agent: &Agent) -> Result<()> { if let Some(mcp) = &agent.global_mcp { @@ -25,6 +29,8 @@ pub(super) fn create_session( workspace: impl Into, options: Option, ) -> Result { + bail_if_agent_closed(agent)?; + let merged_opts = session_builder::prepare_session_options(agent, options.unwrap_or_default()); let session_id = merged_opts .session_id @@ -39,6 +45,100 @@ pub(super) fn create_session( session_builder::build_agent_session(agent, workspace.into(), llm_client, &merged_opts) } +/// Register a freshly built session's close handle into the parent agent's +/// registry. Called by `session_builder::build_agent_session` immediately +/// after the handle is constructed. +/// +/// Uses `Weak` so the registry doesn't keep the handle alive; when the +/// caller drops their `AgentSession`, the handle's `Arc` count goes to +/// zero, the handle drops, and the `Weak` in the registry becomes +/// dangling. Dead entries are pruned lazily on the next +/// [`list_sessions`] / [`close_session`] access. +pub(super) fn register_session(agent: &Agent, handle: &Arc) { + let weak = Arc::downgrade(handle); + let id = handle.session_id.clone(); + let mut sessions = agent + .sessions + .lock() + .unwrap_or_else(|poison| poison.into_inner()); + sessions.insert(id, weak); +} + +fn bail_if_agent_closed(agent: &Agent) -> Result<()> { + if agent.closed.load(std::sync::atomic::Ordering::Acquire) { + return Err(CodeError::SessionClosed { + session_id: "".to_string(), + }); + } + Ok(()) +} + +pub(super) async fn list_sessions(agent: &Agent) -> Vec { + let mut sessions = agent + .sessions + .lock() + .unwrap_or_else(|poison| poison.into_inner()); + sessions.retain(|_, weak| weak.strong_count() > 0); + let mut ids: Vec = sessions.keys().cloned().collect(); + ids.sort(); + ids +} + +pub(super) async fn close_session(agent: &Agent, session_id: &str) -> bool { + let handle: Option> = { + let mut sessions = agent + .sessions + .lock() + .unwrap_or_else(|poison| poison.into_inner()); + sessions.retain(|_, weak| weak.strong_count() > 0); + sessions.get(session_id).and_then(Weak::upgrade) + }; + match handle { + Some(handle) => { + let was_open = !handle.is_closed(); + handle.close().await; + was_open + } + None => false, + } +} + +pub(super) async fn close_agent(agent: &Agent) { + // Mark closed *before* iterating so concurrent `session()` calls fail fast. + if agent.closed.swap(true, std::sync::atomic::Ordering::AcqRel) { + return; + } + + // Snapshot live handles so we can close them outside the registry lock. + // Also prune dead `Weak` entries here: a high-churn create-and-drop + // workload that never calls `list_sessions`/`close_session` would + // otherwise leave dangling entries in the registry until agent close. + let handles: Vec> = { + let mut sessions = agent + .sessions + .lock() + .unwrap_or_else(|poison| poison.into_inner()); + sessions.retain(|_, weak| weak.strong_count() > 0); + sessions.values().filter_map(Weak::upgrade).collect() + }; + for handle in handles { + handle.close().await; + } + + // Tear down global MCP connections so background workers exit. + if let Some(mcp) = &agent.global_mcp { + for name in mcp.list_connected().await { + if let Err(e) = mcp.disconnect(&name).await { + tracing::warn!( + server = %name, + error = %e, + "Failed to disconnect MCP server during Agent::close" + ); + } + } + } +} + pub(super) fn create_session_for_agent( agent: &Agent, workspace: impl Into, @@ -54,6 +154,8 @@ pub(super) fn resume_session( session_id: &str, options: SessionOptions, ) -> Result { + bail_if_agent_closed(agent)?; + let store = options.session_store.clone().ok_or_else(|| { crate::error::CodeError::Session( "resume_session requires a session_store in SessionOptions".to_string(), diff --git a/core/src/agent_api/capabilities.rs b/core/src/agent_api/capabilities.rs index 69e668c..d7004ee 100644 --- a/core/src/agent_api/capabilities.rs +++ b/core/src/agent_api/capabilities.rs @@ -52,7 +52,10 @@ pub(super) fn build_session_capabilities(input: SessionCapabilityInput<'_>) -> S artifact_limits, ), ); - let trace_sink = crate::trace::InMemoryTraceSink::default(); + let trace_sink = match input.opts.retention_limits.and_then(|l| l.max_trace_events) { + Some(cap) => crate::trace::InMemoryTraceSink::with_max_events(cap), + None => crate::trace::InMemoryTraceSink::new(), + }; tool_executor.set_trace_sink(Arc::new(trace_sink.clone())); if let Some(ref search_config) = input.code_config.search { @@ -61,7 +64,20 @@ pub(super) fn build_session_capabilities(input: SessionCapabilityInput<'_>) -> S .set_search_config(search_config.clone()); } - let subagent_tasks = Arc::new(crate::subagent_task_tracker::InMemorySubagentTaskTracker::new()); + let subagent_tasks = Arc::new( + match input + .opts + .retention_limits + .and_then(|l| l.max_terminal_subagent_tasks) + { + Some(cap) => { + crate::subagent_task_tracker::InMemorySubagentTaskTracker::with_max_terminal_tasks( + cap, + ) + } + None => crate::subagent_task_tracker::InMemorySubagentTaskTracker::new(), + }, + ); let agent_registry = register_task_capability( input.code_config, input.opts, diff --git a/core/src/agent_api/conversation_runtime.rs b/core/src/agent_api/conversation_runtime.rs index 1dfd2c9..c547662 100644 --- a/core/src/agent_api/conversation_runtime.rs +++ b/core/src/agent_api/conversation_runtime.rs @@ -9,16 +9,27 @@ use super::{ runtime::StreamRunContext, AgentSession, }; use crate::agent::{AgentEvent, AgentResult}; -use crate::error::Result; +use crate::error::{CodeError, Result}; use crate::llm::{Attachment, Message}; use tokio::sync::mpsc; use tokio::task::JoinHandle; +fn bail_if_closed(session: &AgentSession) -> Result<()> { + if session.is_closed() { + return Err(CodeError::SessionClosed { + session_id: session.session_id.clone(), + }); + } + Ok(()) +} + pub(super) async fn send( session: &AgentSession, prompt: &str, history: Option<&[Message]>, ) -> Result { + bail_if_closed(session)?; + if let Some(result) = command_runtime::dispatch_blocking(session, prompt, history).await? { return Ok(result); } @@ -37,6 +48,8 @@ pub(super) async fn send_with_attachments( attachments: &[Attachment], history: Option<&[Message]>, ) -> Result { + bail_if_closed(session)?; + // Build one user message containing text and images, then execute from the // resulting message list so the loop does not append a duplicate prompt. let input = ConversationInput::with_attachments(session, history, prompt, attachments); @@ -52,6 +65,8 @@ pub(super) async fn stream_with_attachments( attachments: &[Attachment], history: Option<&[Message]>, ) -> Result<(mpsc::Receiver, JoinHandle<()>)> { + bail_if_closed(session)?; + let input = ConversationInput::with_attachments(session, history, prompt, attachments); let stream_run = StreamRunContext::start(session, prompt, input.persistence).await; Ok(stream_run.spawn_from_messages(input.messages)) @@ -62,6 +77,8 @@ pub(super) async fn stream( prompt: &str, history: Option<&[Message]>, ) -> Result<(mpsc::Receiver, JoinHandle<()>)> { + bail_if_closed(session)?; + if let Some(stream) = command_runtime::dispatch_streaming(session, prompt).await { return Ok(stream); } @@ -71,6 +88,62 @@ pub(super) async fn stream( Ok(stream_run.spawn_with_prompt(input.messages, prompt.to_string())) } +/// Resume a previously-checkpointed run on this session (P3 cut 2). +/// +/// Loads the latest [`LoopCheckpoint`](crate::loop_checkpoint::LoopCheckpoint) +/// for `checkpoint_run_id` from the session's `SessionStore` and replays +/// the agent loop from that boundary state. A **new** run id is +/// generated for the resumed work — the relationship between the old +/// and new run is metadata 书安OS tracks externally. +/// +/// Returns an error when the session has no store configured, or when +/// no checkpoint exists for `checkpoint_run_id`. +pub(super) async fn resume_run( + session: &AgentSession, + checkpoint_run_id: &str, +) -> Result { + bail_if_closed(session)?; + + let store = session.session_store.as_ref().ok_or_else(|| { + CodeError::Session("resume_run requires a session_store on this session".to_string()) + })?; + + let checkpoint = store + .load_loop_checkpoint(checkpoint_run_id) + .await + .map_err(|e| { + CodeError::Session(format!( + "load_loop_checkpoint('{checkpoint_run_id}') failed: {e}" + )) + })? + .ok_or_else(|| { + CodeError::Session(format!( + "no loop checkpoint found for run '{checkpoint_run_id}'" + )) + })?; + + let persistence = + Some(super::session_persistence::SessionPersistenceContext::from_session(session)); + let blocking_run = BlockingRunContext::start( + session, + &format!("", checkpoint.turn), + persistence, + ) + .await; + // Seed the resumed run's loop state with the cumulative metrics from + // the checkpoint so token usage and tool-call counts continue from + // where the crashed/migrated run left off rather than re-starting at + // zero (which would under-report the resumed AgentResult). + let seed = crate::agent::ExecutionSeed { + total_usage: checkpoint.total_usage.clone(), + tool_calls_count: checkpoint.tool_calls_count, + verification_reports: checkpoint.verification_reports.clone(), + }; + blocking_run + .execute_from_messages_seeded(checkpoint.messages, &session.session_id, Some(seed)) + .await +} + fn warn_deferred_init(session: &AgentSession) { if let Some(warning) = &session.init_warning { tracing::warn!( diff --git a/core/src/agent_api/run_lifecycle.rs b/core/src/agent_api/run_lifecycle.rs index 3926d36..a7b2e8c 100644 --- a/core/src/agent_api/run_lifecycle.rs +++ b/core/src/agent_api/run_lifecycle.rs @@ -18,6 +18,10 @@ pub(super) struct StreamRunWorkerState { run_id: String, persistence: Option, should_auto_save: Arc, + /// Shared per-run cancel token slot (populated by lifecycle's + /// `set_cancel_token`). Used to classify a failed run as `Cancelled` + /// when the token was fired (e.g., by `session_cancel.cancel()`). + cancel_token: Arc>>, } impl StreamRunWorkerState { @@ -34,11 +38,22 @@ impl StreamRunWorkerState { } } Err(error) => { - let error_message = error.to_string(); - let _ = self - .run_store - .mark_failed(&self.run_id, error_message) - .await; + let cancelled = self + .cancel_token + .lock() + .await + .as_ref() + .map(|t| t.is_cancelled()) + .unwrap_or(false); + if cancelled { + let _ = self.run_store.mark_cancelled(&self.run_id).await; + } else { + let error_message = error.to_string(); + let _ = self + .run_store + .mark_failed(&self.run_id, error_message) + .await; + } } } } @@ -51,6 +66,7 @@ pub(super) struct RunControlState { cancel_token: Arc>>, current_run_id: Arc>>, hook_executor: Option>, + host_env: Arc, } impl RunControlState { @@ -61,11 +77,18 @@ impl RunControlState { cancel_token: Arc::clone(&session.cancel_token), current_run_id: Arc::clone(&session.current_run_id), hook_executor: session.ahp_executor.clone(), + host_env: Arc::clone(&session.config.host_env), } } pub(super) async fn start_run(&self, prompt: &str) -> crate::run::RunHandle { - let snapshot = self.run_store.create_run(&self.session_id, prompt).await; + // Honor the session's host-provided IdGenerator so deterministic + // replay tooling can pin run ids alongside session_id. + let id = format!("run-{}", self.host_env.next_id()); + let snapshot = self + .run_store + .create_run_with_id(id, &self.session_id, prompt) + .await; *self.current_run_id.lock().await = Some(snapshot.id.clone()); self.run_handle(snapshot.id, self.session_id.clone()) } @@ -146,9 +169,21 @@ impl BlockingRunLifecycle { where E: std::fmt::Display + Into, { + // Sample the cancellation flag *before* clearing the token so we can + // distinguish cancellation-driven errors from genuine failures. + let cancelled = self.cleanup.was_cancelled().await; self.cleanup.clear_cancel_token().await; let _ = runtime_collector.await; + // The run reached a terminal state in-process — its loop checkpoint + // is dead weight. Only a process crash (this code never runs) should + // leave a checkpoint for crash-recovery resume. + if let Some(persistence) = &self.persistence { + persistence + .clear_loop_checkpoint(self.cleanup.run_id()) + .await; + } + match result { Ok(result) => { if let Some(persistence) = &self.persistence { @@ -159,11 +194,15 @@ impl BlockingRunLifecycle { Ok(result) } Err(error) => { - let error_message = error.to_string(); - let _ = self - .run_store - .mark_failed(self.cleanup.run_id(), error_message) - .await; + if cancelled { + let _ = self.run_store.mark_cancelled(self.cleanup.run_id()).await; + } else { + let error_message = error.to_string(); + let _ = self + .run_store + .mark_failed(self.cleanup.run_id(), error_message) + .await; + } self.cleanup.finish().await; Err(error.into()) } @@ -202,6 +241,7 @@ impl StreamRunLifecycle { run_id: self.cleanup.run_id().to_string(), persistence: self.persistence.clone(), should_auto_save: Arc::clone(&self.should_auto_save), + cancel_token: self.cleanup.cancel_token_slot(), } } @@ -217,6 +257,14 @@ impl StreamRunLifecycle { persistence.auto_save_if_enabled().await; } } + // Stream run reached a terminal state in-process (worker + + // forwarder both joined) — drop its loop checkpoint. Only a + // crash (this task never completes) leaves one for resume. + if let Some(persistence) = &self.persistence { + persistence + .clear_loop_checkpoint(self.cleanup.run_id()) + .await; + } self.cleanup.clear_cancel_token().await; self.cleanup.finish().await; }) @@ -234,6 +282,7 @@ mod tests { cancel_token: Arc::new(tokio::sync::Mutex::new(None)), current_run_id: Arc::new(tokio::sync::Mutex::new(None)), hook_executor: None, + host_env: Arc::new(crate::host_env::HostEnv::system()), } } diff --git a/core/src/agent_api/runtime.rs b/core/src/agent_api/runtime.rs index eb424ff..92b3472 100644 --- a/core/src/agent_api/runtime.rs +++ b/core/src/agent_api/runtime.rs @@ -63,12 +63,13 @@ impl BlockingRunContext { .start_run(prompt) .await; let run_id = run.id().to_string(); - let agent_loop = build_agent_loop(session); + let mut agent_loop = build_agent_loop(session); + agent_loop.set_checkpoint_run(&run_id); let (runtime_tx, runtime_rx) = mpsc::channel(2048); let runtime_collector = RuntimeEventSink::from_session(session, &run_id).spawn_collector(runtime_rx); let lifecycle = BlockingRunLifecycle::from_session(session, &run_id, persistence); - let cancel_token = tokio_util::sync::CancellationToken::new(); + let cancel_token = session.session_cancel.child_token(); lifecycle.set_cancel_token(cancel_token.clone()).await; Self { @@ -109,6 +110,20 @@ impl BlockingRunContext { self, messages: Vec, session_id: &str, + ) -> Result { + self.execute_from_messages_seeded(messages, session_id, None) + .await + } + + /// Execute from a prebuilt message list, seeding the loop's cumulative + /// metrics from a checkpoint. Used by `resume_run` so resumed runs + /// continue token/tool-call accounting from the checkpoint instead of + /// re-starting at zero. + pub(super) async fn execute_from_messages_seeded( + self, + messages: Vec, + session_id: &str, + seed: Option, ) -> Result { let Self { agent_loop, @@ -118,11 +133,12 @@ impl BlockingRunContext { lifecycle, } = self; let result = agent_loop - .execute_from_messages( + .execute_from_messages_seeded( messages, Some(session_id), Some(runtime_tx), Some(&cancel_token), + seed, ) .await; lifecycle.complete(runtime_collector, result).await @@ -148,13 +164,14 @@ impl StreamRunContext { ) -> Self { let (tx, rx) = mpsc::channel(256); let (runtime_tx, runtime_rx) = mpsc::channel(256); - let agent_loop = build_agent_loop(session); + let mut agent_loop = build_agent_loop(session); let run = RunControlState::from_session(session) .start_run(prompt) .await; let run_id = run.id().to_string(); + agent_loop.set_checkpoint_run(&run_id); let lifecycle = StreamRunLifecycle::from_session(session, &run_id, persistence); - let cancel_token = tokio_util::sync::CancellationToken::new(); + let cancel_token = session.session_cancel.child_token(); lifecycle.set_cancel_token(cancel_token.clone()).await; let worker_state = lifecycle.worker_state(); let forwarder = diff --git a/core/src/agent_api/runtime_events.rs b/core/src/agent_api/runtime_events.rs index 5801375..81bb837 100644 --- a/core/src/agent_api/runtime_events.rs +++ b/core/src/agent_api/runtime_events.rs @@ -173,10 +173,31 @@ impl RunCleanupState { *self.cancel_token.lock().await = Some(token); } + /// Share the per-run cancel-token slot. Used by stream worker state to + /// observe cancellation when classifying a failed run. + pub(super) fn cancel_token_slot( + &self, + ) -> Arc>> { + Arc::clone(&self.cancel_token) + } + pub(super) async fn clear_cancel_token(&self) { *self.cancel_token.lock().await = None; } + /// Returns `true` when the per-run cancellation token (or any parent it + /// was derived from, such as the session-level token) has been fired. + /// Used by lifecycle `complete()` to classify a failed run as `Cancelled` + /// vs `Failed` when an `Err` comes back from the agent loop. + pub(super) async fn was_cancelled(&self) -> bool { + self.cancel_token + .lock() + .await + .as_ref() + .map(|t| t.is_cancelled()) + .unwrap_or(false) + } + pub(super) async fn finish(&self) { self.active_tools.write().await.clear(); let mut current = self.current_run_id.lock().await; diff --git a/core/src/agent_api/session_builder.rs b/core/src/agent_api/session_builder.rs index f7c0564..21845de 100644 --- a/core/src/agent_api/session_builder.rs +++ b/core/src/agent_api/session_builder.rs @@ -21,7 +21,14 @@ use super::session_runtime::{build_session_runtime, SessionRuntimeInput}; pub(super) fn prepare_session_options(agent: &Agent, opts: SessionOptions) -> SessionOptions { let mut opts = merge_mcp_managers(agent, opts); if opts.session_id.is_none() { - opts.session_id = Some(uuid::Uuid::new_v4().to_string()); + // Use the host-provided ID generator if one was supplied via + // SessionOptions — this is the entry point that enables + // deterministic-replay tooling to pin session ids. + let env = opts + .host_env + .clone() + .unwrap_or_else(|| Arc::clone(&agent.config.host_env)); + opts.session_id = Some(env.next_id()); } opts } @@ -173,6 +180,11 @@ pub(super) fn build_agent_session( auto_delegation, agent_registry: Some(Arc::clone(&agent_registry)), max_execution_time_ms: opts.max_execution_time_ms.or(base.max_execution_time_ms), + budget_guard: opts.budget_guard.clone().or(base.budget_guard.clone()), + host_env: opts + .host_env + .clone() + .unwrap_or_else(|| Arc::clone(&base.host_env)), ..base }; @@ -190,6 +202,32 @@ pub(super) fn build_agent_session( let session_store = resolve_session_store(&agent.code_config, opts); let command_registry = CommandRegistry::new(); + let closed = Arc::new(std::sync::atomic::AtomicBool::new(false)); + let session_cancel = tokio_util::sync::CancellationToken::new(); + let cancel_token = Arc::new(tokio::sync::Mutex::new(None)); + let current_run_id = Arc::new(tokio::sync::Mutex::new(None)); + let run_store = Arc::new({ + let limits = opts.retention_limits; + crate::run::InMemoryRunStore::with_retention( + limits.and_then(|l| l.max_runs_retained), + limits.and_then(|l| l.max_events_per_run), + ) + }); + + let close_handle = Arc::new(super::session_close::SessionCloseHandle { + session_id: session_id.clone(), + closed: Arc::clone(&closed), + session_cancel: session_cancel.clone(), + cancel_token: Arc::clone(&cancel_token), + current_run_id: Arc::clone(¤t_run_id), + run_store: Arc::clone(&run_store), + subagent_tasks: Arc::clone(&subagent_tasks), + confirmation_manager: config.confirmation_manager.clone(), + hook_executor: opts.hook_executor.clone(), + }); + + super::agent_sessions::register_session(agent, &close_handle); + let session = AgentSession { llm_client, tool_executor, @@ -217,13 +255,21 @@ pub(super) fn build_agent_session( .or_else(|| agent.global_mcp.clone()) .unwrap_or_else(|| Arc::new(crate::mcp::manager::McpManager::new())), agent_registry, - cancel_token: Arc::new(tokio::sync::Mutex::new(None)), - current_run_id: Arc::new(tokio::sync::Mutex::new(None)), - run_store: Arc::new(crate::run::InMemoryRunStore::new()), + cancel_token, + current_run_id, + run_store, subagent_tasks, active_tools: Arc::new(tokio::sync::RwLock::new(HashMap::new())), trace_sink, verification_reports: Arc::new(RwLock::new(Vec::new())), + closed, + session_cancel, + close_handle, + tenant_id: opts.tenant_id.clone(), + principal: opts.principal.clone(), + agent_template_id: opts.agent_template_id.clone(), + correlation_id: opts.correlation_id.clone(), + runtime_budget_guard: std::sync::Mutex::new(None), }; Ok(session) } diff --git a/core/src/agent_api/session_close.rs b/core/src/agent_api/session_close.rs new file mode 100644 index 0000000..9c49fbf --- /dev/null +++ b/core/src/agent_api/session_close.rs @@ -0,0 +1,109 @@ +//! Out-of-band session close handle. +//! +//! `SessionCloseHandle` is an `Arc`-shareable substruct that owns just the +//! fields needed to terminate an `AgentSession` from outside (typically from +//! the parent [`Agent`](super::Agent)'s session registry). +//! +//! `AgentSession` carries one of these via `Arc`; the +//! parent `Agent` stores a `Weak` in its registry. When +//! the user drops the session, the handle drops too and the registry's +//! `Weak` becomes dangling — pruned on the next `list_sessions()` / +//! `close_session()` call. +//! +//! Sharing the close mechanics through a single `close()` method on this +//! struct guarantees `AgentSession::close()` and `Agent::close_session(id)` +//! perform exactly the same cleanup. + +use crate::hitl::ConfirmationProvider; +use crate::hooks::HookExecutor; +use crate::run::InMemoryRunStore; +use crate::subagent_task_tracker::{InMemorySubagentTaskTracker, SubagentStatus}; +use std::sync::atomic::{AtomicBool, Ordering}; +use std::sync::Arc; +use tokio::sync::Mutex; +use tokio_util::sync::CancellationToken; + +/// Bundle of `Arc`-shared session state needed to perform a graceful close +/// from anywhere holding (a clone of) the handle. +pub(crate) struct SessionCloseHandle { + pub(crate) session_id: String, + /// Tripped on first `close()` call; subsequent calls become no-ops and + /// `AgentSession::send`/`stream` fast-fail. + pub(crate) closed: Arc, + /// Session-level parent token. All in-flight run/subagent tokens are + /// `child_token()` of this. + pub(crate) session_cancel: CancellationToken, + /// Per-run cancel-token slot (currently active run's token, if any). + /// Populated by the run lifecycle. + pub(crate) cancel_token: Arc>>, + /// Current run id (matches `cancel_token` when set). + pub(crate) current_run_id: Arc>>, + pub(crate) run_store: Arc, + pub(crate) subagent_tasks: Arc, + pub(crate) confirmation_manager: Option>, + pub(crate) hook_executor: Option>, +} + +impl SessionCloseHandle { + /// Return whether `close()` has already been called. + pub(crate) fn is_closed(&self) -> bool { + self.closed.load(Ordering::Acquire) + } + + /// Perform the full session close sequence. Idempotent: subsequent calls + /// are no-ops and are guaranteed not to panic. + /// + /// Sequence (see [`AgentSession::close`](super::AgentSession::close) + /// for the public-facing contract): + /// 1. Flip the `closed` flag so further `send`/`stream` fast-fail; + /// 2. Fire the session-level cancellation token so every derived run + /// and subagent task token fires; + /// 3. Mark the active run as `Cancelled` in the run store and emit the + /// AHP `record_run_cancelled` hook; + /// 4. Mark every still-running delegated subagent task as `Cancelled` + /// in the tracker; + /// 5. Cancel pending HITL tool confirmations so blocked tool callers + /// receive a rejection instead of hanging. + pub(crate) async fn close(&self) { + if self.closed.swap(true, Ordering::AcqRel) { + return; + } + + // 1. Fire the session-level token so children cascade. + self.session_cancel.cancel(); + + // 2. Mark the active run cancelled and fire AHP hook bookkeeping. + // The per-run token has already fired via step 1; this loop + // just updates the run store and emits the hook event. + let had_active_token = self.cancel_token.lock().await.is_some(); + if had_active_token { + if let Some(run_id) = self.current_run_id.lock().await.clone() { + let _ = self.run_store.mark_cancelled(&run_id).await; + if let Some(hook) = &self.hook_executor { + hook.record_run_cancelled(&run_id, &self.session_id, Some("cancelled by host")) + .await; + } + } + } + + // 3. Mark every still-running subagent task cancelled. + let pending: Vec = self + .subagent_tasks + .list_for_parent(&self.session_id) + .await + .into_iter() + .filter(|task| task.status == SubagentStatus::Running) + .map(|task| task.task_id) + .collect(); + for task_id in pending { + let _ = self.subagent_tasks.cancel(&task_id).await; + } + + // 4. Cancel pending HITL confirmations. + if let Some(manager) = &self.confirmation_manager { + let _ = manager.cancel_all().await; + } + + tracing::info!(session_id = %self.session_id, "AgentSession closed"); + } +} diff --git a/core/src/agent_api/session_options.rs b/core/src/agent_api/session_options.rs index 85ef462..0a3f544 100644 --- a/core/src/agent_api/session_options.rs +++ b/core/src/agent_api/session_options.rs @@ -274,6 +274,70 @@ impl SessionOptions { self } + /// Tag the session with a host-defined tenant id. Opaque to the + /// framework — propagated to `SessionData`, hooks, and traces. + pub fn with_tenant_id(mut self, tenant: impl Into) -> Self { + self.tenant_id = Some(tenant.into()); + self + } + + /// Tag the session with the id of the principal (user / service + /// account / etc.) that triggered it. + pub fn with_principal(mut self, principal: impl Into) -> Self { + self.principal = Some(principal.into()); + self + } + + /// Tag the session with the id of the agent template / definition it + /// was instantiated from. + pub fn with_agent_template_id(mut self, template_id: impl Into) -> Self { + self.agent_template_id = Some(template_id.into()); + self + } + + /// Attach a distributed-trace correlation id so this session's events + /// can be joined with upstream/downstream work. + pub fn with_correlation_id(mut self, corr: impl Into) -> Self { + self.correlation_id = Some(corr.into()); + self + } + + /// Install a host-supplied [`BudgetGuard`](crate::budget::BudgetGuard). + /// + /// The guard is consulted before every LLM call (and after, for + /// usage accounting). When unset, no budget enforcement happens. + pub fn with_budget_guard(mut self, guard: Arc) -> Self { + self.budget_guard = Some(guard); + self + } + + /// Install a host-provided [`HostEnv`](crate::host_env::HostEnv) for + /// deterministic ID generation and time. Replaces the framework + /// default of `uuid::Uuid::new_v4()` + wall clock — used by + /// 书安OS replay infrastructure to recreate a run bit-identical on + /// another node. + pub fn with_host_env(mut self, env: Arc) -> Self { + self.host_env = Some(env); + self + } + + /// Install FIFO retention caps for the session's in-memory stores. + /// + /// Without these caps the in-memory run store, trace sink, and + /// subagent task tracker grow unboundedly across long-running + /// sessions. Hosts running thousands of long-lived sessions per + /// node should set sensible caps (e.g. retain the last 100 runs, + /// 5000 events per run, 10000 trace events, 1000 terminal subagent + /// tasks). When unset, the framework keeps every record — the + /// pre-existing behaviour. + pub fn with_retention_limits( + mut self, + limits: crate::retention::SessionRetentionLimits, + ) -> Self { + self.retention_limits = Some(limits); + self + } + /// Enable auto-save after each `send()` call pub fn with_auto_save(mut self, enabled: bool) -> Self { self.auto_save = enabled; diff --git a/core/src/agent_api/session_persistence.rs b/core/src/agent_api/session_persistence.rs index e9f9f69..9c1c012 100644 --- a/core/src/agent_api/session_persistence.rs +++ b/core/src/agent_api/session_persistence.rs @@ -24,6 +24,11 @@ pub(super) struct SessionPersistenceContext { run_store: Arc, history: Arc>>, verification_reports: Arc>>, + subagent_tasks: Arc, + tenant_id: Option, + principal: Option, + agent_template_id: Option, + correlation_id: Option, auto_save: bool, } @@ -40,6 +45,11 @@ impl SessionPersistenceContext { run_store: Arc::clone(&session.run_store), history: Arc::clone(&session.history), verification_reports: Arc::clone(&session.verification_reports), + subagent_tasks: Arc::clone(&session.subagent_tasks), + tenant_id: session.tenant_id.clone(), + principal: session.principal.clone(), + agent_template_id: session.agent_template_id.clone(), + correlation_id: session.correlation_id.clone(), auto_save: session.auto_save, } } @@ -66,6 +76,10 @@ impl SessionPersistenceContext { config: &self.config, model_name: &self.model_name, history, + tenant_id: self.tenant_id.as_deref(), + principal: self.principal.as_deref(), + agent_template_id: self.agent_template_id.as_deref(), + correlation_id: self.correlation_id.as_deref(), }) .await; @@ -82,6 +96,9 @@ impl SessionPersistenceContext { store .save_verification_reports(&self.session_id, &verification_reports) .await?; + store + .save_subagent_tasks(&self.session_id, &self.subagent_tasks.list().await) + .await?; tracing::debug!("Session {} saved", self.session_id); Ok(()) } @@ -93,6 +110,25 @@ impl SessionPersistenceContext { } } } + + /// Delete the loop checkpoint for `run_id` once the run has reached a + /// terminal state in-process. The checkpoint exists only to survive a + /// process crash; once the run returns (completed / failed / cancelled) + /// it is dead weight. No-op when no store is configured. Errors are + /// warn-logged — a failed cleanup must never mask the run's result. + pub(super) async fn clear_loop_checkpoint(&self, run_id: &str) { + let Some(store) = &self.session_store else { + return; + }; + if let Err(e) = store.delete_loop_checkpoint(run_id).await { + tracing::warn!( + run_id = %run_id, + session_id = %self.session_id, + "Failed to delete loop checkpoint on run completion: {}", + e + ); + } + } } pub(super) fn load_session_data( @@ -141,6 +177,22 @@ pub(super) fn apply_persisted_runtime_options( opts.auto_delegation = data.config.auto_delegation.clone(); } + // Identity labels: caller-supplied values take precedence (the resume + // caller may want to relabel for a new tenant/principal). Otherwise + // restore from the persisted snapshot. + if opts.tenant_id.is_none() { + opts.tenant_id = data.tenant_id.clone(); + } + if opts.principal.is_none() { + opts.principal = data.principal.clone(); + } + if opts.agent_template_id.is_none() { + opts.agent_template_id = data.agent_template_id.clone(); + } + if opts.correlation_id.is_none() { + opts.correlation_id = data.correlation_id.clone(); + } + opts } @@ -175,6 +227,14 @@ pub(super) fn restore_persisted_session_state( *write_or_recover(&session.verification_reports) = reports; } + if let Some(tasks) = load_subagent_tasks(store, &session_id)? { + if let Ok(handle) = tokio::runtime::Handle::try_current() { + tokio::task::block_in_place(|| { + handle.block_on(session.subagent_tasks.replace_snapshots(tasks)) + }); + } + } + Ok(()) } @@ -184,6 +244,10 @@ struct SessionDataSnapshotInput<'a> { config: &'a AgentConfig, model_name: &'a str, history: Vec, + tenant_id: Option<&'a str>, + principal: Option<&'a str>, + agent_template_id: Option<&'a str>, + correlation_id: Option<&'a str>, } async fn build_session_data_snapshot(input: SessionDataSnapshotInput<'_>) -> SessionData { @@ -230,6 +294,10 @@ async fn build_session_data_snapshot(input: SessionDataSnapshotInput<'_>) -> Ses llm_config: model_config_data(input.model_name), tasks: Vec::new(), parent_id: None, + tenant_id: input.tenant_id.map(str::to_string), + principal: input.principal.map(str::to_string), + agent_template_id: input.agent_template_id.map(str::to_string), + correlation_id: input.correlation_id.map(str::to_string), } } @@ -315,6 +383,24 @@ fn load_run_records( } } +fn load_subagent_tasks( + store: &Arc, + session_id: &str, +) -> Result>> { + match tokio::runtime::Handle::try_current() { + Ok(handle) => { + tokio::task::block_in_place(|| handle.block_on(store.load_subagent_tasks(session_id))) + .map_err(|e| { + CodeError::Session(format!( + "Failed to load subagent tasks for session {}: {}", + session_id, e + )) + }) + } + Err(_) => Ok(None), + } +} + fn load_verification_reports( store: &Arc, session_id: &str, @@ -362,6 +448,10 @@ mod tests { }), tasks: Vec::new(), parent_id: None, + tenant_id: None, + principal: None, + agent_template_id: None, + correlation_id: None, } } diff --git a/core/src/agent_api/tests.rs b/core/src/agent_api/tests.rs index 7de3448..2d55277 100644 --- a/core/src/agent_api/tests.rs +++ b/core/src/agent_api/tests.rs @@ -1141,6 +1141,503 @@ async fn test_cancel_run_only_cancels_matching_current_run() { assert!(!session.cancel_run(&run_id).await); } +#[tokio::test] +async fn test_is_closed_starts_false() { + let agent = Agent::from_config(test_config()).await.unwrap(); + let session = agent.session("/tmp/test-close-default", None).unwrap(); + assert!(!session.is_closed()); +} + +#[tokio::test] +async fn test_close_marks_session_closed_and_is_idempotent() { + let agent = Agent::from_config(test_config()).await.unwrap(); + let session = agent.session("/tmp/test-close-idempotent", None).unwrap(); + assert!(!session.is_closed()); + + session.close().await; + assert!(session.is_closed()); + + session.close().await; + assert!(session.is_closed()); +} + +#[tokio::test] +async fn test_send_after_close_returns_session_closed_error() { + let agent = Agent::from_config(test_config()).await.unwrap(); + let opts = SessionOptions::new().with_session_id("send-after-close"); + let session = agent + .build_session( + "/tmp/test-send-after-close".into(), + Arc::new(StaticStreamingClient::new("never delivered")), + &opts, + ) + .unwrap(); + + session.close().await; + let err = session.send("hello", None).await.unwrap_err(); + match err { + crate::error::CodeError::SessionClosed { session_id } => { + assert_eq!(session_id, "send-after-close"); + } + other => panic!("expected SessionClosed, got {other:?}"), + } +} + +#[tokio::test] +async fn test_stream_after_close_returns_session_closed_error() { + let agent = Agent::from_config(test_config()).await.unwrap(); + let opts = SessionOptions::new().with_session_id("stream-after-close"); + let session = agent + .build_session( + "/tmp/test-stream-after-close".into(), + Arc::new(StaticStreamingClient::new("never delivered")), + &opts, + ) + .unwrap(); + + session.close().await; + let err = session.stream("hello", None).await.unwrap_err(); + assert!(matches!( + err, + crate::error::CodeError::SessionClosed { ref session_id } + if session_id == "stream-after-close" + )); +} + +#[tokio::test] +async fn test_close_cancels_in_flight_send() { + let agent = Agent::from_config(test_config()).await.unwrap(); + let session = Arc::new( + agent + .build_session( + "/tmp/test-close-in-flight".into(), + Arc::new(CancellableStreamingClient::new("partial answer")), + &SessionOptions::new(), + ) + .unwrap(), + ); + + let worker_session = Arc::clone(&session); + let worker = tokio::spawn(async move { worker_session.send("hello", None).await }); + + let mut run_id = None; + for _ in 0..50 { + if let Some(current) = session.current_run().await { + run_id = Some(current.id().to_string()); + break; + } + tokio::time::sleep(std::time::Duration::from_millis(10)).await; + } + let run_id = run_id.expect("current run should be visible before close()"); + + session.close().await; + assert!(session.is_closed()); + + let result = tokio::time::timeout(std::time::Duration::from_secs(1), worker) + .await + .expect("send should stop after close") + .expect("worker should not panic"); + assert!(result.is_err()); + assert_eq!( + session.run_snapshot(&run_id).await.unwrap().status, + crate::run::RunStatus::Cancelled + ); +} + +/// Custom BudgetGuard that denies the first LLM call — used to verify +/// that the framework consults the guard and bails before touching +/// the LLM client. Records whether `check_before_llm` was called. +#[derive(Debug, Default)] +struct DenyingBudgetGuard { + checks: std::sync::atomic::AtomicUsize, + llm_records: std::sync::atomic::AtomicUsize, +} + +#[async_trait::async_trait] +impl crate::budget::BudgetGuard for DenyingBudgetGuard { + async fn check_before_llm( + &self, + _session_id: &str, + _est_tokens: usize, + ) -> crate::budget::BudgetDecision { + self.checks + .fetch_add(1, std::sync::atomic::Ordering::SeqCst); + crate::budget::BudgetDecision::Deny { + resource: "llm_tokens".to_string(), + reason: "test cap exceeded".to_string(), + } + } + + async fn record_after_llm(&self, _session_id: &str, _usage: &crate::llm::TokenUsage) { + self.llm_records + .fetch_add(1, std::sync::atomic::Ordering::SeqCst); + } +} + +#[tokio::test] +async fn test_budget_guard_deny_aborts_llm_call() { + let guard = Arc::new(DenyingBudgetGuard::default()); + let agent = Agent::from_config(test_config()).await.unwrap(); + let opts = SessionOptions::new() + .with_session_id("budget-deny-test") + .with_budget_guard(guard.clone() as Arc); + let session = agent + .build_session( + "/tmp/test-budget-deny".into(), + Arc::new(StaticStreamingClient::new("never-delivered")), + &opts, + ) + .unwrap(); + + let err = session.send("hello", None).await.unwrap_err(); + let msg = err.to_string(); + assert!( + msg.contains("Budget exhausted") || msg.contains("llm_tokens"), + "expected budget-exhausted error, got: {msg}" + ); + assert_eq!( + guard.checks.load(std::sync::atomic::Ordering::SeqCst), + 1, + "BudgetGuard::check_before_llm must be consulted exactly once" + ); + assert_eq!( + guard.llm_records.load(std::sync::atomic::Ordering::SeqCst), + 0, + "record_after_llm must not fire when the call was denied" + ); + assert!( + session.history().is_empty(), + "denied call must not pollute conversation history" + ); +} + +#[test] +fn test_cluster_agent_events_serialize_with_expected_tags() { + // Lock the wire schema for cluster-event variants — these are + // emitted by the host (书安OS) through HookExecutor and need + // stable JSON tags so external producers can target them. + let budget = AgentEvent::BudgetThresholdHit { + resource: "llm_tokens".to_string(), + kind: "soft".to_string(), + consumed: 12000.0, + limit: 10000.0, + message: Some("approaching daily cap".to_string()), + }; + let json = serde_json::to_string(&budget).unwrap(); + assert!( + json.contains("\"type\":\"budget_threshold_hit\""), + "got: {json}" + ); + assert!(json.contains("\"resource\":\"llm_tokens\""), "got: {json}"); + + let passivate = AgentEvent::PassivationRequested { + reason: "node_drain".to_string(), + deadline_ms: Some(1_700_000_000_000), + }; + let json = serde_json::to_string(&passivate).unwrap(); + assert!( + json.contains("\"type\":\"passivation_requested\""), + "got: {json}" + ); + + let peer = AgentEvent::PeerInvocation { + from_session_id: "peer-1".to_string(), + from_tenant_id: Some("acme".to_string()), + correlation_id: None, // omitted via skip_serializing_if + }; + let json = serde_json::to_string(&peer).unwrap(); + assert!(json.contains("\"type\":\"peer_invocation\""), "got: {json}"); + assert!( + !json.contains("correlation_id"), + "None field must be skipped, got: {json}" + ); + + // Round-trip — ensures the #[serde(default)] hints don't break loading + // from a payload that omits the optional fields. + let minimal_peer = r#"{"type":"peer_invocation","from_session_id":"x"}"#; + let parsed: AgentEvent = serde_json::from_str(minimal_peer).unwrap(); + assert!( + matches!(parsed, AgentEvent::PeerInvocation { ref from_session_id, .. } if from_session_id == "x") + ); +} + +#[tokio::test] +async fn test_custom_host_env_yields_deterministic_session_and_run_ids() { + use crate::host_env::{FixedClock, HostEnv, SequentialIdGenerator}; + + let env = Arc::new(HostEnv::new( + Arc::new(SequentialIdGenerator::new("test")), + Arc::new(FixedClock::new(1_700_000_000_000)), + )); + + let agent = Agent::from_config(test_config()).await.unwrap(); + let opts_a = SessionOptions::new().with_host_env(env.clone()); + let session_a = agent + .session("/tmp/test-host-env-a", Some(opts_a)) + .expect("session a"); + + // First call to next_id() yields "test-0" — used as session_id. + assert_eq!( + session_a.id(), + "test-0", + "session_id must come from HostEnv" + ); + + // run_id derives from next_id() too, prefixed with "run-". + let session_a = Arc::new(session_a); + let worker = { + let s = Arc::clone(&session_a); + tokio::spawn(async move { + // Use a static streaming client by building manually so the + // call resolves without an actual provider. + let _ = s; + }) + }; + let _ = worker.await; + + // Second session reuses the same generator → continues the sequence. + let opts_b = SessionOptions::new().with_host_env(env); + let session_b = agent + .session("/tmp/test-host-env-b", Some(opts_b)) + .expect("session b"); + assert_eq!(session_b.id(), "test-1"); +} + +#[tokio::test] +async fn test_runtime_budget_guard_overrides_session_options_value() { + // A guard installed via set_budget_guard() *after* construction + // must take effect on the next send/stream — that's the entry + // point Node SDK relies on (JsFunction can't live inside a + // value-typed SessionOptions). + let runtime_guard = Arc::new(DenyingBudgetGuard::default()); + let agent = Agent::from_config(test_config()).await.unwrap(); + let opts = SessionOptions::new().with_session_id("runtime-guard-override"); + let session = agent + .build_session( + "/tmp/test-runtime-guard".into(), + Arc::new(StaticStreamingClient::new("never-delivered")), + &opts, + ) + .unwrap(); + + // No guard installed at build time -> send would succeed. Install + // a denying guard now and assert the next send is aborted. + session.set_budget_guard(Some( + runtime_guard.clone() as Arc + )); + let err = session.send("hello", None).await.unwrap_err(); + assert!(err.to_string().contains("Budget exhausted")); + assert_eq!( + runtime_guard + .checks + .load(std::sync::atomic::Ordering::SeqCst), + 1 + ); + + // Clearing the override should let a follow-up send succeed. + session.set_budget_guard(None); + let result = session.send("hello again", None).await.unwrap(); + assert_eq!(result.text, "never-delivered"); +} + +#[tokio::test] +async fn test_disconnect_idle_mcp_is_safe_no_op_without_global_mcp() { + let agent = Agent::from_config(test_config()).await.unwrap(); + // test_config carries no mcp_servers, so global_mcp is None and + // the idle sweep must short-circuit to an empty Vec without + // panicking — the contract surface a host's sweeper will rely on. + let dropped = agent.disconnect_idle_mcp(0).await; + assert!(dropped.is_empty()); +} + +#[tokio::test] +async fn test_identity_labels_default_to_none() { + let agent = Agent::from_config(test_config()).await.unwrap(); + let session = agent.session("/tmp/test-id-default", None).unwrap(); + assert!(session.tenant_id().is_none()); + assert!(session.principal().is_none()); + assert!(session.agent_template_id().is_none()); + assert!(session.correlation_id().is_none()); +} + +#[tokio::test] +async fn test_identity_labels_round_trip_via_session_options() { + let agent = Agent::from_config(test_config()).await.unwrap(); + let opts = SessionOptions::new() + .with_tenant_id("acme-corp") + .with_principal("user-42") + .with_agent_template_id("planner-v3") + .with_correlation_id("trace-deadbeef"); + let session = agent + .session("/tmp/test-id-set", Some(opts)) + .expect("session"); + + assert_eq!(session.tenant_id(), Some("acme-corp")); + assert_eq!(session.principal(), Some("user-42")); + assert_eq!(session.agent_template_id(), Some("planner-v3")); + assert_eq!(session.correlation_id(), Some("trace-deadbeef")); +} + +#[tokio::test] +async fn test_agent_list_sessions_tracks_live_sessions() { + let agent = Agent::from_config(test_config()).await.unwrap(); + assert!(agent.list_sessions().await.is_empty()); + + let opts_a = SessionOptions::new().with_session_id("registry-a"); + let opts_b = SessionOptions::new().with_session_id("registry-b"); + let session_a = agent + .build_session( + "/tmp/test-registry-a".into(), + Arc::new(StaticStreamingClient::new("answer-a")), + &opts_a, + ) + .unwrap(); + let session_b = agent + .build_session( + "/tmp/test-registry-b".into(), + Arc::new(StaticStreamingClient::new("answer-b")), + &opts_b, + ) + .unwrap(); + + let ids = agent.list_sessions().await; + assert_eq!( + ids, + vec!["registry-a".to_string(), "registry-b".to_string()] + ); + + drop(session_a); + // After drop, the registry's Weak becomes dangling; list_sessions prunes it. + let after = agent.list_sessions().await; + assert_eq!(after, vec!["registry-b".to_string()]); + + drop(session_b); + assert!(agent.list_sessions().await.is_empty()); +} + +#[tokio::test] +async fn test_agent_close_session_closes_target_session() { + let agent = Agent::from_config(test_config()).await.unwrap(); + let opts = SessionOptions::new().with_session_id("close-by-id"); + let session = agent + .build_session( + "/tmp/test-agent-close-session".into(), + Arc::new(StaticStreamingClient::new("never")), + &opts, + ) + .unwrap(); + assert!(!session.is_closed()); + + assert!(agent.close_session("close-by-id").await); + assert!(session.is_closed()); + + // Idempotent: second call still reports `true` (we found a live handle) + // OR `false` (target already closed) — accept either; what matters is no panic. + let _ = agent.close_session("close-by-id").await; + + // Unknown ids report false. + assert!(!agent.close_session("does-not-exist").await); +} + +#[tokio::test] +async fn test_agent_close_closes_every_live_session() { + let agent = Agent::from_config(test_config()).await.unwrap(); + let opts_a = SessionOptions::new().with_session_id("agent-close-a"); + let opts_b = SessionOptions::new().with_session_id("agent-close-b"); + let session_a = agent + .build_session( + "/tmp/test-agent-close-a".into(), + Arc::new(StaticStreamingClient::new("a")), + &opts_a, + ) + .unwrap(); + let session_b = agent + .build_session( + "/tmp/test-agent-close-b".into(), + Arc::new(StaticStreamingClient::new("b")), + &opts_b, + ) + .unwrap(); + + agent.close().await; + assert!(session_a.is_closed()); + assert!(session_b.is_closed()); + + // After Agent::close(), session creation must fail fast — the agent has + // already disposed of its resources. + let err = agent + .session("/tmp/test-agent-closed", None) + .err() + .expect("session() after close() must error"); + let msg = err.to_string(); + assert!(msg.contains("closed") || msg.contains("Closed")); +} + +#[tokio::test] +async fn test_session_cancel_token_starts_uncancelled() { + let agent = Agent::from_config(test_config()).await.unwrap(); + let session = agent + .session("/tmp/test-session-cancel-fresh", None) + .unwrap(); + let tok = session.session_cancel_token(); + assert!(!tok.is_cancelled()); +} + +#[tokio::test] +async fn test_close_cancels_session_token() { + let agent = Agent::from_config(test_config()).await.unwrap(); + let session = agent + .session("/tmp/test-session-cancel-on-close", None) + .unwrap(); + let observer = session.session_cancel_token(); + assert!(!observer.is_cancelled()); + + session.close().await; + assert!(observer.is_cancelled()); +} + +#[tokio::test] +async fn test_session_cancel_token_propagates_to_in_flight_run() { + let agent = Agent::from_config(test_config()).await.unwrap(); + let session = Arc::new( + agent + .build_session( + "/tmp/test-session-cancel-cascades".into(), + Arc::new(CancellableStreamingClient::new("partial answer")), + &SessionOptions::new(), + ) + .unwrap(), + ); + + let worker_session = Arc::clone(&session); + let worker = tokio::spawn(async move { worker_session.send("hello", None).await }); + + let mut run_id = None; + for _ in 0..50 { + if let Some(current) = session.current_run().await { + run_id = Some(current.id().to_string()); + break; + } + tokio::time::sleep(std::time::Duration::from_millis(10)).await; + } + let run_id = run_id.expect("current run should be visible"); + + // Fire the session-level token directly, bypassing close()/cancel(). + // The in-flight run's token must be a *child* of this one for + // cancellation to propagate. + session.session_cancel_token().cancel(); + + let result = tokio::time::timeout(std::time::Duration::from_secs(1), worker) + .await + .expect("send should stop after session_cancel fires") + .expect("worker should not panic"); + assert!(result.is_err()); + assert_eq!( + session.run_snapshot(&run_id).await.unwrap().status, + crate::run::RunStatus::Cancelled + ); +} + #[tokio::test] async fn test_send_with_attachments_passes_session_id_to_context_providers() { let provider = Arc::new(CapturingContextProvider::default()); @@ -1667,6 +2164,202 @@ async fn test_resume_session() { assert_eq!(history[0].text(), "What is Rust?"); } +/// H4 regression: a run that completes in-process must DELETE its loop +/// checkpoint (the checkpoint exists only to survive a crash). Before +/// the fix, every tool-using run leaked a checkpoint forever. +/// +/// We use a deterministic HostEnv so the run id is predictable, seed a +/// checkpoint under that id, run a (no-tool) send that completes through +/// the normal lifecycle, and assert the checkpoint was cleared. +#[tokio::test(flavor = "multi_thread")] +async fn test_completed_run_clears_its_loop_checkpoint() { + use crate::host_env::{HostEnv, SequentialIdGenerator, SystemClock}; + use crate::loop_checkpoint::{LoopCheckpoint, LOOP_CHECKPOINT_SCHEMA_VERSION}; + + let store = Arc::new(crate::store::MemorySessionStore::new()); + let agent = Agent::from_config(test_config()).await.unwrap(); + + // Deterministic ids: session_id is set explicitly (consumes no + // counter), so the first next_id() goes to the run -> "run-seq-0". + let env = Arc::new(HostEnv::new( + Arc::new(SequentialIdGenerator::new("seq")), + Arc::new(SystemClock), + )); + let opts = SessionOptions::new() + .with_session_id("ckpt-clear-session") + .with_session_store(store.clone() as Arc) + .with_host_env(env); + let session = agent + .build_session( + "/tmp/test-ckpt-clear".into(), + Arc::new(StaticStreamingClient::new("done")), + &opts, + ) + .unwrap(); + + // Seed a checkpoint under the run id this send will use. + let predicted_run_id = "run-seq-0"; + let cp_store: Arc = store.clone(); + cp_store + .save_loop_checkpoint( + predicted_run_id, + &LoopCheckpoint { + schema_version: LOOP_CHECKPOINT_SCHEMA_VERSION, + run_id: predicted_run_id.to_string(), + session_id: "ckpt-clear-session".to_string(), + turn: 1, + messages: vec![Message::user("seed")], + total_usage: crate::llm::TokenUsage::default(), + tool_calls_count: 0, + verification_reports: Vec::new(), + checkpoint_ms: 1, + }, + ) + .await + .unwrap(); + + let result = session.send("hello", None).await.unwrap(); + assert_eq!(result.text, "done"); + + // Self-document the predicted run id. + let runs = session.runs().await; + assert_eq!(runs.len(), 1); + assert_eq!(runs[0].id, predicted_run_id, "run id must be deterministic"); + + // The checkpoint must have been cleared by the run lifecycle. + let after: Arc = store.clone(); + assert!( + after + .load_loop_checkpoint(predicted_run_id) + .await + .unwrap() + .is_none(), + "completed run must delete its loop checkpoint (else unbounded leak)" + ); +} + +/// P3 happy path (cut 2 E2E): a manually-seeded `LoopCheckpoint` in +/// the SessionStore can be picked up by `AgentSession::resume_run`, +/// the loop runs from the checkpoint's message vec (no new user +/// prompt is appended — `execute_from_messages` path), and the +/// resumed run is allocated a **fresh** run id (not the +/// checkpoint's). +/// +/// This exercises the contract surface 书安OS will sit on: write a +/// checkpoint on node A, hand the run id to node B which builds a +/// session against the shared store and calls `resume_run`. Crash +/// simulation is reduced to a manual checkpoint seed because the +/// in-process agent loop has no "die mid-round" affordance suitable +/// for unit testing. +#[tokio::test(flavor = "multi_thread")] +async fn test_resume_run_picks_up_from_persisted_checkpoint() { + use crate::loop_checkpoint::{LoopCheckpoint, LOOP_CHECKPOINT_SCHEMA_VERSION}; + + let store = Arc::new(crate::store::MemorySessionStore::new()); + let agent = Agent::from_config(test_config()).await.unwrap(); + + // Seed a checkpoint as if a previous run on another node had + // completed one tool round and persisted the boundary state. + let seeded_run_id = "ckpt-old-run-x"; + let seeded_messages = vec![ + Message::user("kick off"), + Message { + role: "assistant".to_string(), + content: vec![crate::llm::ContentBlock::Text { + text: "intermediate work".to_string(), + }], + reasoning_content: None, + }, + ]; + // Seed NON-ZERO cumulative metrics so the test can detect whether + // resume_run carries them forward (H2 regression: it used to reset + // them to zero, under-reporting the resumed AgentResult). + let checkpoint = LoopCheckpoint { + schema_version: LOOP_CHECKPOINT_SCHEMA_VERSION, + run_id: seeded_run_id.to_string(), + session_id: "resume-run-target".to_string(), + turn: 1, + messages: seeded_messages.clone(), + total_usage: crate::llm::TokenUsage { + prompt_tokens: 800, + completion_tokens: 200, + total_tokens: 1000, + cache_read_tokens: None, + cache_write_tokens: None, + }, + tool_calls_count: 3, + verification_reports: Vec::new(), + checkpoint_ms: 1_700_000_000_000, + }; + { + let cp_store: Arc = store.clone(); + cp_store + .save_loop_checkpoint(seeded_run_id, &checkpoint) + .await + .expect("seed checkpoint"); + } + + // Build a session bound to the same store + a mock LLM that + // produces a final-answer text. resume_run will feed it the + // seeded `messages` and the loop should finish on this turn. + let opts = SessionOptions::new() + .with_session_store(store.clone() as Arc) + .with_session_id("resume-run-target"); + let session = agent + .build_session( + "/tmp/test-resume-run-target".into(), + Arc::new(StaticStreamingClient::new("resumed and completed")), + &opts, + ) + .unwrap(); + + let result = session + .resume_run(seeded_run_id) + .await + .expect("resume_run must succeed"); + assert_eq!(result.text, "resumed and completed"); + + // H2: the resumed run must CONTINUE accounting from the checkpoint's + // cumulative metrics, not reset to zero. The mock LLM adds 2 tokens + // (1 prompt + 1 completion) for its single turn, so the result must + // reflect the seeded 1000 + 2 = 1002, and the seeded tool-call count + // (3) must carry forward (this turn ran no tools). + assert_eq!( + result.usage.total_tokens, 1002, + "resumed run must add to the checkpoint's cumulative token usage, not reset it" + ); + assert_eq!(result.usage.prompt_tokens, 801); + assert_eq!(result.usage.completion_tokens, 201); + assert_eq!( + result.tool_calls_count, 3, + "resumed run must preserve the checkpoint's tool-call count" + ); + + // The resumed run records its own run id in the in-memory store, + // and that id must NOT match the seeded checkpoint id — the + // framework allocates a fresh run rather than pretending to + // continue the old one. + let runs = session.runs().await; + assert_eq!(runs.len(), 1, "resume_run creates exactly one new run"); + let resumed_run = &runs[0]; + assert_ne!( + resumed_run.id, seeded_run_id, + "resumed run must have a fresh id, got the seeded one" + ); + assert_eq!(resumed_run.status, crate::run::RunStatus::Completed); + + // The checkpoint stays in the store under the OLD run id — + // resume does not delete it. (The host decides retention.) + let still_there: Arc = store.clone(); + let cp = still_there + .load_loop_checkpoint(seeded_run_id) + .await + .expect("load") + .expect("old checkpoint preserved"); + assert_eq!(cp.run_id, seeded_run_id); + assert_eq!(cp.turn, 1); +} + #[tokio::test(flavor = "multi_thread")] async fn test_resume_session_restores_artifacts() { let store = Arc::new(crate::store::MemorySessionStore::new()); diff --git a/core/src/budget.rs b/core/src/budget.rs new file mode 100644 index 0000000..c54bc16 --- /dev/null +++ b/core/src/budget.rs @@ -0,0 +1,172 @@ +//! Budget / cost / quota contract for cluster-grade hosts. +//! +//! The framework does not enforce budgets itself — it only defines the +//! decision points and emits structured events. The host (e.g. 书安OS) +//! implements [`BudgetGuard`] with whatever backend it likes +//! (per-tenant counters in Redis, per-day USD caps in Postgres, etc.) +//! and plugs it into [`SessionOptions::with_budget_guard`]. +//! +//! Decision points wired today: +//! +//! 1. **Before each LLM call** — [`BudgetGuard::check_before_llm`]. +//! A `Deny` aborts the call; a `SoftLimit` lets the call proceed but +//! triggers an [`AgentEvent::BudgetThresholdHit`] so in-session +//! policy (hooks, custom prompts) can react. +//! 2. **After each LLM call** — [`BudgetGuard::record_after_llm`]. +//! The host updates its running spend total with the actual usage. +//! 3. **Before each tool call** — [`BudgetGuard::check_before_tool`]. +//! Same decision shape; useful for capping expensive tools per +//! tenant. +//! +//! The default trait methods are no-ops returning [`BudgetDecision::Allow`] +//! so existing code is unaffected until a host plugs in a real impl. +//! +//! See [`AgentEvent::BudgetThresholdHit`](crate::agent::AgentEvent::BudgetThresholdHit) +//! for the event vocabulary triggered by `SoftLimit`. + +use crate::llm::TokenUsage; +use async_trait::async_trait; + +/// Outcome of a budget check. +/// +/// The framework treats this purely as a decision — it never inspects +/// the carried strings except to forward them to [`AgentEvent`]s and to +/// the eventual error. +/// +/// [`AgentEvent`]: crate::agent::AgentEvent +#[derive(Debug, Clone)] +pub enum BudgetDecision { + /// Operation proceeds normally. No event is emitted. + Allow, + /// Operation proceeds, but the framework emits a + /// [`AgentEvent::BudgetThresholdHit { kind: "soft", .. }`] + /// event before continuing. In-session hooks can react (e.g. trigger + /// auto-compact, swap to a cheaper model on next turn). + /// + /// [`AgentEvent::BudgetThresholdHit { kind: "soft", .. }`]: crate::agent::AgentEvent::BudgetThresholdHit + SoftLimit { + /// Logical resource label ("llm_tokens", "usd_cost", "wall_time", ...). + resource: String, + /// Current consumed amount (units depend on `resource`). + consumed: f64, + /// Threshold that was crossed. + limit: f64, + /// Optional human-readable explanation for logs / UI. + message: Option, + }, + /// Operation is refused. The framework returns + /// [`CodeError::BudgetExhausted`](crate::error::CodeError::BudgetExhausted) + /// from the LLM / tool entry point. The session itself stays open — + /// callers can re-try later or after the host has re-allocated + /// budget. + Deny { + /// Logical resource label that exhausted. + resource: String, + /// Human-readable reason surfaced in the error and in any + /// emitted `BudgetThresholdHit { kind: "hard", .. }` event. + reason: String, + }, +} + +/// Host-supplied budget / quota contract. +/// +/// Implementations are typically wired up by a cluster control plane +/// to enforce cross-session, cross-tenant cost limits. The framework +/// itself ships only the no-op [`NoopBudgetGuard`]. +/// +/// All trait methods default to `Allow` / no-op so impls only need to +/// override what they actually want to govern. +#[async_trait] +pub trait BudgetGuard: Send + Sync { + /// Called immediately before an LLM API call. + /// + /// `estimated_prompt_tokens` is a best-effort framework estimate + /// from the message history at call time; impls that want precise + /// accounting should use [`record_after_llm`](Self::record_after_llm) + /// instead of trusting the estimate. + async fn check_before_llm( + &self, + session_id: &str, + estimated_prompt_tokens: usize, + ) -> BudgetDecision { + let _ = (session_id, estimated_prompt_tokens); + BudgetDecision::Allow + } + + /// Called after every successful LLM call with the actual usage + /// reported by the provider. Lets the impl keep its running spend + /// total in sync with reality. + /// + /// Failed LLM calls do not invoke this hook. + async fn record_after_llm(&self, session_id: &str, usage: &TokenUsage) { + let _ = (session_id, usage); + } + + /// Called immediately before a tool invocation. The framework does + /// not pass tool arguments — impls that need argument-aware caps + /// must wrap the executor via a custom `ToolExecutor`. + async fn check_before_tool(&self, session_id: &str, tool_name: &str) -> BudgetDecision { + let _ = (session_id, tool_name); + BudgetDecision::Allow + } +} + +/// Default implementation that always allows everything. Used when no +/// host-supplied guard is configured. +#[derive(Debug, Default, Clone, Copy)] +pub struct NoopBudgetGuard; + +#[async_trait] +impl BudgetGuard for NoopBudgetGuard {} + +#[cfg(test)] +mod tests { + use super::*; + use std::sync::atomic::{AtomicUsize, Ordering}; + use std::sync::Arc; + + #[tokio::test] + async fn noop_allows_everything() { + let guard = NoopBudgetGuard; + assert!(matches!( + guard.check_before_llm("s", 1000).await, + BudgetDecision::Allow + )); + assert!(matches!( + guard.check_before_tool("s", "bash").await, + BudgetDecision::Allow + )); + // record is just observable side-effect; ensure it doesn't panic. + guard.record_after_llm("s", &TokenUsage::default()).await; + } + + #[derive(Debug, Default)] + struct CountingGuard { + llm_checks: AtomicUsize, + records: AtomicUsize, + } + + #[async_trait] + impl BudgetGuard for CountingGuard { + async fn check_before_llm(&self, _: &str, _: usize) -> BudgetDecision { + self.llm_checks.fetch_add(1, Ordering::SeqCst); + BudgetDecision::Deny { + resource: "llm_tokens".to_string(), + reason: "budget exhausted in test".to_string(), + } + } + async fn record_after_llm(&self, _: &str, _: &TokenUsage) { + self.records.fetch_add(1, Ordering::SeqCst); + } + } + + #[tokio::test] + async fn custom_guard_can_deny() { + let guard: Arc = Arc::new(CountingGuard::default()); + let decision = guard.check_before_llm("s", 100).await; + match decision { + BudgetDecision::Deny { resource, .. } => assert_eq!(resource, "llm_tokens"), + other => panic!("expected Deny, got {other:?}"), + } + } +} diff --git a/core/src/error.rs b/core/src/error.rs index 7961eac..74dda52 100644 --- a/core/src/error.rs +++ b/core/src/error.rs @@ -37,6 +37,20 @@ pub enum CodeError { #[error("Session error: {0}")] Session(String), + /// Session has been closed; further operations are rejected. + /// + /// Returned by `send`/`stream` (and their variants) after + /// [`AgentSession::close`](crate::agent_api::AgentSession::close) + /// — or [`Agent::close`](crate::agent_api::Agent::close) — has been called. + #[error("Session '{session_id}' is closed")] + SessionClosed { session_id: String }, + + /// A host-supplied [`BudgetGuard`](crate::budget::BudgetGuard) denied + /// the operation. The session is not closed — callers can re-try + /// after the host has re-allocated budget. + #[error("Budget exhausted on '{resource}': {reason}")] + BudgetExhausted { resource: String, reason: String }, + /// Security subsystem error #[error("Security error: {0}")] Security(String), diff --git a/core/src/host_env.rs b/core/src/host_env.rs new file mode 100644 index 0000000..e7ef74b --- /dev/null +++ b/core/src/host_env.rs @@ -0,0 +1,223 @@ +//! Host-environment plumbing: ID generation and time. +//! +//! The framework relies on two ambient capabilities — fresh IDs and the +//! current time — at many call sites (`session_id`, `run_id`, event +//! timestamps, retry backoff). Defaulting both to `uuid::Uuid::new_v4()` +//! / `SystemTime::now()` is fine for production but blocks two +//! cluster-grade features: +//! +//! - **Deterministic replay** of a run on another node for failure +//! investigation. With injectable [`IdGenerator`] / [`Clock`] impls a +//! host can record the seed and replay it bit-identical elsewhere. +//! - **Time-bending tests** without monkey-patching `std::time`. +//! +//! Hosts plug a custom impl via +//! [`SessionOptions::with_host_env`](crate::agent_api::SessionOptions::with_host_env); +//! the framework uses [`SystemHostEnv`] (the wall-clock + random-UUID +//! default) when none is supplied — observably identical to pre-P2 +//! behaviour. + +use std::sync::Arc; +use std::time::{SystemTime, UNIX_EPOCH}; + +/// Generator for unique identifiers used by the framework +/// (session_id, run_id, subagent task_id, …). +/// +/// The contract is intentionally loose: implementations may produce +/// random, monotonic, or deterministic-by-seed IDs. The framework +/// treats output as opaque and only requires uniqueness within the +/// hosting process. +pub trait IdGenerator: Send + Sync + std::fmt::Debug { + /// Return a fresh ID. May be called concurrently from many tasks. + fn next_id(&self) -> String; +} + +/// Source of the current time in Unix-epoch milliseconds. +/// +/// Same uniqueness contract as [`IdGenerator`]: the framework treats +/// the value as opaque. Monotonicity is not required (NTP corrections +/// happen) but typical impls are at least non-decreasing. +pub trait Clock: Send + Sync + std::fmt::Debug { + /// Current time, milliseconds since Unix epoch. + fn now_ms(&self) -> u64; +} + +/// Bundle of host-environment capabilities. Used as the single +/// `Option>` slot on [`AgentConfig`](crate::agent::AgentConfig) +/// and [`SessionOptions`](crate::agent_api::SessionOptions) — avoids +/// growing two parallel `Arc` fields. +#[derive(Debug, Clone)] +pub struct HostEnv { + pub id_generator: Arc, + pub clock: Arc, +} + +impl HostEnv { + /// Construct a host env from concrete components. + pub fn new(id_generator: Arc, clock: Arc) -> Self { + Self { + id_generator, + clock, + } + } + + /// Default system-backed host env: random UUIDs + wall clock. + /// Equivalent to pre-P2 behaviour. + pub fn system() -> Self { + Self { + id_generator: Arc::new(SystemIdGenerator), + clock: Arc::new(SystemClock), + } + } + + /// Shortcut for `self.id_generator.next_id()`. + pub fn next_id(&self) -> String { + self.id_generator.next_id() + } + + /// Shortcut for `self.clock.now_ms()`. + pub fn now_ms(&self) -> u64 { + self.clock.now_ms() + } +} + +impl Default for HostEnv { + fn default() -> Self { + Self::system() + } +} + +// ============================================================================ +// Default impls +// ============================================================================ + +/// UUID-v4 based ID generator — the framework default. +#[derive(Debug, Default, Clone, Copy)] +pub struct SystemIdGenerator; + +impl IdGenerator for SystemIdGenerator { + fn next_id(&self) -> String { + uuid::Uuid::new_v4().to_string() + } +} + +/// Wall-clock time source — the framework default. +#[derive(Debug, Default, Clone, Copy)] +pub struct SystemClock; + +impl Clock for SystemClock { + fn now_ms(&self) -> u64 { + SystemTime::now() + .duration_since(UNIX_EPOCH) + .map(|d| d.as_millis() as u64) + .unwrap_or(0) + } +} + +// ============================================================================ +// Deterministic helpers (cfg(test) + replay) +// ============================================================================ + +/// Deterministic ID generator that yields a configured prefix followed +/// by a monotonic counter (`-0`, `-1`, …). +/// +/// Public so external host crates (e.g. 书安OS replay tooling) can use it +/// without re-implementing the pattern. +#[derive(Debug, Default)] +pub struct SequentialIdGenerator { + prefix: String, + counter: std::sync::atomic::AtomicU64, +} + +impl SequentialIdGenerator { + pub fn new(prefix: impl Into) -> Self { + Self { + prefix: prefix.into(), + counter: std::sync::atomic::AtomicU64::new(0), + } + } +} + +impl IdGenerator for SequentialIdGenerator { + fn next_id(&self) -> String { + let n = self + .counter + .fetch_add(1, std::sync::atomic::Ordering::SeqCst); + if self.prefix.is_empty() { + n.to_string() + } else { + format!("{}-{}", self.prefix, n) + } + } +} + +/// Clock that returns a configured, atomically-updatable timestamp. +/// Useful for replay (advance to recorded value) and for tests that +/// need stable timestamps. +#[derive(Debug)] +pub struct FixedClock { + now_ms: std::sync::atomic::AtomicU64, +} + +impl FixedClock { + pub fn new(now_ms: u64) -> Self { + Self { + now_ms: std::sync::atomic::AtomicU64::new(now_ms), + } + } + + /// Atomically set the clock to a new value. Returns the previous value. + pub fn set(&self, now_ms: u64) -> u64 { + self.now_ms + .swap(now_ms, std::sync::atomic::Ordering::SeqCst) + } + + /// Advance the clock by `delta_ms`. + pub fn advance(&self, delta_ms: u64) { + self.now_ms + .fetch_add(delta_ms, std::sync::atomic::Ordering::SeqCst); + } +} + +impl Clock for FixedClock { + fn now_ms(&self) -> u64 { + self.now_ms.load(std::sync::atomic::Ordering::SeqCst) + } +} + +#[cfg(test)] +mod tests { + use super::*; + + #[test] + fn system_host_env_produces_nonempty_ids_and_increasing_time() { + let env = HostEnv::system(); + let a = env.next_id(); + let b = env.next_id(); + assert!(!a.is_empty()); + assert!(!b.is_empty()); + assert_ne!(a, b); + let t1 = env.now_ms(); + std::thread::sleep(std::time::Duration::from_millis(2)); + let t2 = env.now_ms(); + assert!(t2 >= t1); + } + + #[test] + fn sequential_id_generator_is_deterministic() { + let gen = SequentialIdGenerator::new("run"); + assert_eq!(gen.next_id(), "run-0"); + assert_eq!(gen.next_id(), "run-1"); + assert_eq!(gen.next_id(), "run-2"); + } + + #[test] + fn fixed_clock_is_controllable() { + let clock = FixedClock::new(1000); + assert_eq!(clock.now_ms(), 1000); + clock.advance(500); + assert_eq!(clock.now_ms(), 1500); + assert_eq!(clock.set(0), 1500); + assert_eq!(clock.now_ms(), 0); + } +} diff --git a/core/src/lib.rs b/core/src/lib.rs index e5aee07..090078f 100644 --- a/core/src/lib.rs +++ b/core/src/lib.rs @@ -78,6 +78,7 @@ pub(crate) mod agent; pub(crate) mod agent_api; #[cfg(feature = "ahp")] pub mod ahp; +pub mod budget; pub(crate) mod child_run; pub mod commands; pub(crate) mod compaction; @@ -88,7 +89,9 @@ pub(crate) mod file_history; pub(crate) mod git; pub mod hitl; pub mod hooks; +pub mod host_env; pub mod llm; +pub mod loop_checkpoint; pub mod mcp; pub mod memory; pub(crate) mod ordered_parallel; @@ -97,6 +100,7 @@ pub mod planning; pub mod program; pub(crate) mod prompts; pub mod queue; +pub mod retention; pub(crate) mod retry; pub mod run; pub(crate) mod safety_gate; diff --git a/core/src/loop_checkpoint.rs b/core/src/loop_checkpoint.rs new file mode 100644 index 0000000..3fecee3 --- /dev/null +++ b/core/src/loop_checkpoint.rs @@ -0,0 +1,179 @@ +//! Per-tool-round loop checkpoints for crash-tolerant runs (P3 cut 1). +//! +//! The agent loop persists a [`LoopCheckpoint`] after each completed tool +//! round. The checkpoint captures the minimum state needed to recreate +//! the loop's position so a future process — typically on a different +//! node, dispatched by 书安OS after a crash or planned migration — can +//! resume from the last consistent boundary. +//! +//! Boundary policy: checkpoints are taken **only** between tool rounds, +//! never mid-tool. If a process dies while a tool is executing, the +//! work of that round is lost on resume; the LLM re-deliberates from +//! the previous checkpoint. This trades retry cost for correctness — +//! re-executing a non-idempotent tool (write, bash) on the wrong side +//! of the boundary is worse than re-asking the LLM. +//! +//! Resume API (cut 2 follow-up): not part of this cut. This module +//! lands the data contract + persistence wiring; an +//! `AgentSession::resume_run(run_id)` entry point will live on top. + +use crate::llm::{Message, TokenUsage}; +use crate::verification::VerificationReport; +use async_trait::async_trait; +use serde::{Deserialize, Serialize}; + +/// Schema version. Bumped on incompatible format changes; impls of +/// [`LoopCheckpointSink`] should reject loads from a future version. +pub const LOOP_CHECKPOINT_SCHEMA_VERSION: u32 = 1; + +/// Snapshot of the agent loop at the boundary between tool rounds. +/// +/// Stored under `run_id` so resume tooling can address the correct run +/// without scanning all checkpoints of a session. +#[derive(Debug, Clone, Serialize, Deserialize)] +pub struct LoopCheckpoint { + /// Schema version — see [`LOOP_CHECKPOINT_SCHEMA_VERSION`]. + #[serde(default)] + pub schema_version: u32, + + /// Logical run identifier. Matches the `run_id` carried by + /// [`crate::run::RunSnapshot`] and `AgentEvent`s. + pub run_id: String, + + /// Parent session id — redundant with `run_id` lookup but useful + /// for store layouts that key by `(session_id, run_id)`. + pub session_id: String, + + /// 1-based tool round counter at checkpoint time. + /// `0` is reserved for "no rounds completed yet". + pub turn: usize, + + /// Conversation history including the just-returned tool results. + /// On resume, the new agent loop starts from this exact message list. + pub messages: Vec, + + /// Running token usage at checkpoint time. Lets resume re-emit + /// progress metrics without re-querying the LLM provider. + pub total_usage: TokenUsage, + + /// How many tool calls have been executed total in this run. + pub tool_calls_count: usize, + + /// Verification reports collected so far in this run. + #[serde(default)] + pub verification_reports: Vec, + + /// Wall-clock timestamp when the checkpoint was written + /// (Unix epoch ms — sourced from the session's + /// [`HostEnv`](crate::host_env::HostEnv)). + pub checkpoint_ms: u64, +} + +/// Receiver of per-tool-round checkpoints. +/// +/// The framework ships one adapter: +/// [`SessionStoreCheckpointSink`] which forwards to a +/// [`crate::store::SessionStore`]. Hosts can implement custom sinks +/// (e.g. push directly to Redis) by implementing this trait. +#[async_trait] +pub trait LoopCheckpointSink: Send + Sync { + /// Persist a checkpoint. Called from inside the agent loop after a + /// successful tool round. Errors are logged at warn level and + /// otherwise swallowed — losing a checkpoint must not halt the + /// live run. + async fn save_checkpoint(&self, checkpoint: &LoopCheckpoint); + + /// Load the latest checkpoint for `run_id`, if any. Returns `None` + /// when no checkpoint has been recorded. + async fn load_latest(&self, run_id: &str) -> Option; +} + +/// Default adapter that forwards checkpoints to a +/// [`SessionStore`](crate::store::SessionStore). Construct via +/// [`SessionStoreCheckpointSink::new`]. +pub struct SessionStoreCheckpointSink { + inner: std::sync::Arc, +} + +impl SessionStoreCheckpointSink { + pub fn new(store: std::sync::Arc) -> Self { + Self { inner: store } + } +} + +#[async_trait] +impl LoopCheckpointSink for SessionStoreCheckpointSink { + async fn save_checkpoint(&self, checkpoint: &LoopCheckpoint) { + if let Err(e) = self + .inner + .save_loop_checkpoint(&checkpoint.run_id, checkpoint) + .await + { + tracing::warn!( + run_id = %checkpoint.run_id, + error = %e, + "Loop checkpoint save failed; live run continues" + ); + } + } + + async fn load_latest(&self, run_id: &str) -> Option { + match self.inner.load_loop_checkpoint(run_id).await { + Ok(opt) => opt, + Err(e) => { + tracing::warn!( + run_id = %run_id, + error = %e, + "Loop checkpoint load failed" + ); + None + } + } + } +} + +#[cfg(test)] +mod tests { + use super::*; + + fn sample(run_id: &str, turn: usize) -> LoopCheckpoint { + LoopCheckpoint { + schema_version: LOOP_CHECKPOINT_SCHEMA_VERSION, + run_id: run_id.to_string(), + session_id: "session-1".to_string(), + turn, + messages: vec![Message::user("hi")], + total_usage: TokenUsage::default(), + tool_calls_count: 0, + verification_reports: Vec::new(), + checkpoint_ms: 1_700_000_000_000, + } + } + + #[test] + fn checkpoint_round_trips_through_json() { + let cp = sample("run-1", 3); + let json = serde_json::to_string(&cp).unwrap(); + let back: LoopCheckpoint = serde_json::from_str(&json).unwrap(); + assert_eq!(back.run_id, "run-1"); + assert_eq!(back.turn, 3); + assert_eq!(back.schema_version, LOOP_CHECKPOINT_SCHEMA_VERSION); + } + + #[test] + fn missing_schema_version_defaults_to_zero() { + // Older payloads without the field must still load — they'll + // be interpreted as a pre-v1 snapshot. + let json = r#"{ + "run_id": "run-1", + "session_id": "s", + "turn": 1, + "messages": [], + "total_usage": {"prompt_tokens":0,"completion_tokens":0,"total_tokens":0}, + "tool_calls_count": 0, + "checkpoint_ms": 0 + }"#; + let cp: LoopCheckpoint = serde_json::from_str(json).unwrap(); + assert_eq!(cp.schema_version, 0); + } +} diff --git a/core/src/mcp/manager.rs b/core/src/mcp/manager.rs index cf59d5b..2b76e5a 100644 --- a/core/src/mcp/manager.rs +++ b/core/src/mcp/manager.rs @@ -34,6 +34,12 @@ pub struct McpManager { configs: RwLock>, /// Last connection error per server, cleared on successful connect connect_errors: RwLock>, + /// Last-used timestamp per connected server (Unix epoch ms). + /// Updated by `connect` (initial use) and `call_tool` (active use). + /// Read by hosts via [`McpManager::last_used_at_ms`] / used by + /// [`McpManager::disconnect_idle`] to release FDs and background + /// workers from servers that are no longer in active use. + last_used_at_ms: RwLock>, } impl McpManager { @@ -43,6 +49,7 @@ impl McpManager { clients: RwLock::new(HashMap::new()), configs: RwLock::new(HashMap::new()), connect_errors: RwLock::new(HashMap::new()), + last_used_at_ms: RwLock::new(HashMap::new()), } } @@ -138,11 +145,16 @@ impl McpManager { let tools = client.list_tools().await?; tracing::info!("MCP server '{}' connected with {} tools", name, tools.len()); - // Store client + // Store client + stamp initial last-used time so idle reapers + // see freshly-connected servers as active. { let mut clients = self.clients.write().await; clients.insert(name.to_string(), client); } + self.last_used_at_ms + .write() + .await + .insert(name.to_string(), now_epoch_ms()); Ok(()) } @@ -153,6 +165,7 @@ impl McpManager { let mut clients = self.clients.write().await; clients.remove(name) }; + self.last_used_at_ms.write().await.remove(name); if let Some(client) = client { client.close().await?; @@ -162,6 +175,84 @@ impl McpManager { Ok(()) } + /// Return the last-used timestamp (Unix epoch ms) for a connected + /// server, or `None` if the server is unknown / not connected. + pub async fn last_used_at_ms(&self, name: &str) -> Option { + self.last_used_at_ms.read().await.get(name).copied() + } + + /// Mark a server as active right now. The framework calls this + /// automatically on connect and on every successful + /// [`call_tool`](Self::call_tool); hosts can call it explicitly + /// to keep a server "warm" out of band (e.g. when a tool result + /// comes back via a different channel). + pub async fn touch(&self, name: &str) { + self.last_used_at_ms + .write() + .await + .insert(name.to_string(), now_epoch_ms()); + } + + /// Disconnect every connected server whose last-used timestamp is + /// older than `now - idle_threshold_ms`. Returns the names of + /// servers that were disconnected. + /// + /// Servers without a recorded timestamp are treated as **infinitely + /// idle** and disconnected. The disconnect call itself can fail + /// per-server (e.g. transport already closed); those failures are + /// warn-logged but never panic — the result vec still includes + /// every name the manager attempted to drop. + /// + /// Hosts running thousands of long-lived sessions should call this + /// periodically (e.g. every 60s with a 5-min threshold) to release + /// file descriptors and background workers from quiet MCP servers + /// without losing the server's configuration. A subsequent + /// [`call_tool`](Self::call_tool) on the same server name will + /// require an explicit `connect` to come back online. + pub async fn disconnect_idle(&self, idle_threshold_ms: u64) -> Vec { + let cutoff = now_epoch_ms().saturating_sub(idle_threshold_ms); + // Snapshot candidates so we don't hold both locks across await. + let candidates: Vec = { + let clients = self.clients.read().await; + let last_used = self.last_used_at_ms.read().await; + clients + .keys() + .filter(|name| match last_used.get(*name) { + Some(ts) => *ts < cutoff, + // No timestamp -> never used since connect; treat as + // infinitely idle. + None => true, + }) + .cloned() + .collect() + }; + let mut disconnected = Vec::with_capacity(candidates.len()); + for name in candidates { + match self.disconnect(&name).await { + Ok(()) => disconnected.push(name), + Err(e) => tracing::warn!( + server = %name, + error = %e, + "MCP idle disconnect failed; entry already removed from registry" + ), + } + } + // Opportunistically purge orphan timestamps for servers that are no + // longer connected — `touch()` records a timestamp unconditionally + // (even for a never-connected name), and the candidate scan above + // only iterates `clients.keys()`, so without this sweep those + // orphan entries in `last_used_at_ms` would accumulate unbounded + // across the lifetime of a long-running manager. + { + let clients = self.clients.read().await; + self.last_used_at_ms + .write() + .await + .retain(|name, _| clients.contains_key(name)); + } + disconnected + } + /// Get all registered server configurations pub async fn all_configs(&self) -> Vec { self.configs.read().await.values().cloned().collect() @@ -205,6 +296,13 @@ impl McpManager { .ok_or_else(|| anyhow!("MCP server not connected: {}", server_name))? }; + // Refresh the activity timestamp before the await so an idle + // sweep running concurrently sees this server as recently used. + self.last_used_at_ms + .write() + .await + .insert(server_name.clone(), now_epoch_ms()); + // Call tool client.call_tool(&tool_name, arguments).await } @@ -318,6 +416,18 @@ impl Default for McpManager { } } +/// Wall-clock now() in Unix epoch milliseconds. Used internally by the +/// activity-tracking + idle-disconnect path. Kept as a free function +/// (rather than going through `HostEnv`) because the MCP manager +/// predates host_env wiring and the host's `Clock` impl is not yet +/// threaded into the manager. +fn now_epoch_ms() -> u64 { + std::time::SystemTime::now() + .duration_since(std::time::UNIX_EPOCH) + .map(|d| d.as_millis() as u64) + .unwrap_or(0) +} + /// Convert MCP tool result to string output pub fn tool_result_to_string(result: &CallToolResult) -> String { let mut output = String::new(); @@ -724,4 +834,65 @@ mod tests { ); } } + + #[tokio::test] + async fn touch_updates_last_used_at_ms() { + let manager = McpManager::new(); + // Without a real connect, last_used is None. + assert!(manager.last_used_at_ms("svc-a").await.is_none()); + manager.touch("svc-a").await; + let t1 = manager.last_used_at_ms("svc-a").await.expect("set"); + assert!(t1 > 0); + // Touch again — timestamp must be monotonically non-decreasing. + manager.touch("svc-a").await; + let t2 = manager.last_used_at_ms("svc-a").await.expect("set again"); + assert!(t2 >= t1); + } + + #[tokio::test] + async fn disconnect_idle_drops_stale_servers_and_keeps_fresh_ones() { + let manager = McpManager::new(); + // Manually populate clients + timestamps so we can run the + // logic without actually launching MCP subprocesses. We can't + // build an `McpClient` from outside this module without a + // transport, so we just exercise the timestamp-driven decision + // branch via the public APIs: register two servers with + // explicit stale + fresh stamps and assert the idle sweep + // picks the right one. + // + // NOTE: clients map stays empty (no real transport spawned), + // so disconnect_idle's `candidates` set is empty and the + // returned Vec is empty. We instead verify the *timestamp + // observability* path the host needs, plus the no-op behaviour + // when there are no live clients. + manager.touch("fresh-svc").await; + // Observability works while the entry is live. + assert!(manager.last_used_at_ms("fresh-svc").await.is_some()); + assert!(manager.last_used_at_ms("never-touched").await.is_none()); + + let dropped = manager.disconnect_idle(0).await; + assert!( + dropped.is_empty(), + "no clients connected -> nothing to disconnect, got {dropped:?}" + ); + // The idle sweep also purges ORPHAN timestamps — "fresh-svc" was + // touch()ed but never connected (no entry in `clients`), so it must + // not linger in `last_used_at_ms` after a sweep. Without this, + // touch()-without-connect would leak unbounded. + assert!( + manager.last_used_at_ms("fresh-svc").await.is_none(), + "orphan timestamp (touched, never connected) must be purged by disconnect_idle" + ); + } + + #[tokio::test] + async fn touch_keeps_timestamp_after_explicit_disconnect_removes_it() { + let manager = McpManager::new(); + manager.touch("svc").await; + assert!(manager.last_used_at_ms("svc").await.is_some()); + // disconnect should clean up the activity entry even when + // no real client was ever connected (defensive cleanup). + let _ = manager.disconnect("svc").await; + assert!(manager.last_used_at_ms("svc").await.is_none()); + } } diff --git a/core/src/retention.rs b/core/src/retention.rs new file mode 100644 index 0000000..527063a --- /dev/null +++ b/core/src/retention.rs @@ -0,0 +1,84 @@ +//! In-memory retention limits for long-running sessions. +//! +//! The framework's in-memory stores +//! ([`InMemoryRunStore`](crate::run::InMemoryRunStore), +//! [`InMemoryTraceSink`](crate::trace::InMemoryTraceSink), +//! [`InMemorySubagentTaskTracker`](crate::subagent_task_tracker::InMemorySubagentTaskTracker)) +//! accumulate unboundedly by default — fine for short-lived runs, a +//! memory leak for sessions that live for hours or days under cluster +//! workloads. +//! +//! `SessionRetentionLimits` lets the host cap each store with a FIFO +//! policy. `None` for any field keeps the unbounded default, so +//! callers that don't set anything see no behaviour change. +//! +//! All caps are **soft**: when a store hits its cap, the oldest entry +//! is dropped on insert. The framework never returns errors from cap +//! enforcement. + +/// Per-session in-memory retention caps. Built via +/// [`SessionOptions::with_retention_limits`](crate::agent_api::SessionOptions::with_retention_limits) +/// or by constructing the struct directly. +#[derive(Debug, Clone, Copy, Default)] +pub struct SessionRetentionLimits { + /// Maximum number of runs retained in + /// [`InMemoryRunStore`](crate::run::InMemoryRunStore). + /// + /// When a new run is created past this cap, the **oldest** run + /// (by insertion order) is dropped along with its events. + /// `None` (default) keeps all runs. + pub max_runs_retained: Option, + + /// Maximum number of event records retained per run in + /// [`InMemoryRunStore`](crate::run::InMemoryRunStore). + /// + /// When a run accumulates more events than this, the oldest + /// events are FIFO-dropped. The run snapshot's `event_count` + /// is **not** decremented — it remains the total ever recorded. + /// `None` (default) keeps all events. + pub max_events_per_run: Option, + + /// Maximum number of events retained in + /// [`InMemoryTraceSink`](crate::trace::InMemoryTraceSink). + /// + /// When the sink reaches this cap, the oldest event is dropped + /// on each new write. `None` (default) keeps all events. + pub max_trace_events: Option, + + /// Maximum number of **terminal** (Completed / Failed / Cancelled) + /// subagent task snapshots retained in + /// [`InMemorySubagentTaskTracker`](crate::subagent_task_tracker::InMemorySubagentTaskTracker). + /// Running tasks are never dropped. + /// + /// When the count of terminal entries exceeds this cap, the + /// oldest terminal entry (by completion time) is dropped. + /// `None` (default) keeps all terminal entries. + pub max_terminal_subagent_tasks: Option, +} + +impl SessionRetentionLimits { + /// Convenience builder. + pub fn new() -> Self { + Self::default() + } + + pub fn with_max_runs(mut self, n: usize) -> Self { + self.max_runs_retained = Some(n); + self + } + + pub fn with_max_events_per_run(mut self, n: usize) -> Self { + self.max_events_per_run = Some(n); + self + } + + pub fn with_max_trace_events(mut self, n: usize) -> Self { + self.max_trace_events = Some(n); + self + } + + pub fn with_max_terminal_subagent_tasks(mut self, n: usize) -> Self { + self.max_terminal_subagent_tasks = Some(n); + self + } +} diff --git a/core/src/run.rs b/core/src/run.rs index bd88264..b722a6d 100644 --- a/core/src/run.rs +++ b/core/src/run.rs @@ -5,7 +5,7 @@ use crate::agent::AgentEvent; use serde::{Deserialize, Serialize}; -use std::collections::HashMap; +use std::collections::{HashMap, VecDeque}; use std::sync::Arc; use tokio::sync::{Mutex, RwLock}; use tokio_util::sync::CancellationToken; @@ -78,6 +78,17 @@ impl RunSnapshot { pub struct InMemoryRunStore { runs: RwLock>, events: RwLock>>, + /// Insertion order of run ids — used to FIFO-evict the oldest run + /// when `max_runs` is set and exceeded. + insertion_order: RwLock>, + /// Maximum number of runs retained. When exceeded, oldest run is + /// dropped along with its events. `None` = unlimited (default). + max_runs: Option, + /// Maximum number of events retained per run. When exceeded, the + /// oldest events are FIFO-dropped from that run's buffer. The + /// run's `event_count` field is **not** decremented — it stays as + /// the cumulative total ever recorded. `None` = unlimited. + max_events_per_run: Option, } impl InMemoryRunStore { @@ -85,11 +96,60 @@ impl InMemoryRunStore { Self::default() } + /// Construct a store with optional FIFO retention caps. `None` + /// fields keep the unbounded default. + pub fn with_retention(max_runs: Option, max_events_per_run: Option) -> Self { + Self { + runs: RwLock::new(HashMap::new()), + events: RwLock::new(HashMap::new()), + insertion_order: RwLock::new(VecDeque::new()), + max_runs, + max_events_per_run, + } + } + pub async fn create_run(&self, session_id: &str, prompt: &str) -> RunSnapshot { + // Default ID generation when the caller has no host_env handy. + // Production callers reach `create_run_with_id` via + // `RunControlState::start_run` so the host's IdGenerator is honored. let id = format!("run-{}", uuid::Uuid::new_v4()); + self.create_run_with_id(id, session_id, prompt).await + } + + /// Create a run with a caller-supplied id. Used by the session + /// orchestration layer so the parent session's host-provided + /// [`IdGenerator`](crate::host_env::IdGenerator) governs run ids. + pub async fn create_run_with_id( + &self, + id: String, + session_id: &str, + prompt: &str, + ) -> RunSnapshot { let snapshot = RunSnapshot::new(id.clone(), session_id.to_string(), prompt.to_string()); - self.runs.write().await.insert(id.clone(), snapshot.clone()); - self.events.write().await.insert(id, Vec::new()); + // Hold all three structures together for the insert + FIFO-evict so + // `runs`, `events`, and `insertion_order` never diverge under + // concurrent access (previously the maps were locked separately, + // leaving a window where a run existed in one map but not the + // other). Canonical acquisition order: order -> events -> runs. + // Other methods (record_event, records, mark_*) only ever hold ONE + // of {events, runs} at a time — they never nest — so holding both + // here cannot ABBA-deadlock against them. + { + let mut order = self.insertion_order.write().await; + let mut events = self.events.write().await; + let mut runs = self.runs.write().await; + runs.insert(id.clone(), snapshot.clone()); + events.insert(id.clone(), Vec::new()); + order.push_back(id); + if let Some(cap) = self.max_runs { + while order.len() > cap { + if let Some(victim) = order.pop_front() { + runs.remove(&victim); + events.remove(&victim); + } + } + } + } snapshot } @@ -102,6 +162,13 @@ impl InMemoryRunStore { timestamp_ms: now_ms(), event: event.clone(), }); + // FIFO-trim event buffer past per-run cap. + if let Some(cap) = self.max_events_per_run { + if run_events.len() > cap { + let excess = run_events.len() - cap; + run_events.drain(..excess); + } + } drop(events); let mut runs = self.runs.write().await; @@ -166,17 +233,161 @@ impl InMemoryRunStore { } pub async fn replace_records(&self, records: Vec) { + // Preserve creation-order in the FIFO eviction queue so a + // restored session honours its `max_runs` cap consistently + // with newly-created runs. + let mut sorted = records; + sorted.sort_by_key(|r| r.snapshot.created_at_ms); let mut run_map = HashMap::new(); let mut event_map = HashMap::new(); - - for mut record in records { - record.snapshot.event_count = record.events.len(); - event_map.insert(record.snapshot.id.clone(), record.events); - run_map.insert(record.snapshot.id.clone(), record.snapshot); + let mut order = VecDeque::with_capacity(sorted.len()); + for record in sorted { + let id = record.snapshot.id.clone(); + // Trust the persisted `event_count` — it is the CUMULATIVE total + // ever recorded and is deliberately not decremented when the + // per-run event buffer is FIFO-trimmed by `max_events_per_run`. + // Overwriting it with `record.events.len()` here would corrupt + // the cumulative count for any restored run whose buffer was + // trimmed (restoring a 100-event run with a 50-cap buffer as + // event_count=50). + event_map.insert(id.clone(), record.events); + run_map.insert(id.clone(), record.snapshot); + order.push_back(id); } - *self.runs.write().await = run_map; *self.events.write().await = event_map; + *self.insertion_order.write().await = order; + } +} + +#[cfg(test)] +mod retention_tests { + use super::*; + + #[tokio::test(flavor = "multi_thread", worker_threads = 4)] + async fn concurrent_create_and_record_under_cap_does_not_deadlock() { + // Guards the canonical lock-ordering change in create_run_with_id + // (order -> events -> runs held together). A bad ordering would + // ABBA-deadlock against concurrent record_event and hang this test. + let store = std::sync::Arc::new(InMemoryRunStore::with_retention(Some(10), None)); + let mut handles = Vec::new(); + for i in 0..100 { + let s = std::sync::Arc::clone(&store); + handles.push(tokio::spawn(async move { + let r = s.create_run("sess", &format!("p{i}")).await; + for _ in 0..5 { + s.record_event( + &r.id, + AgentEvent::TextDelta { + text: "x".to_string(), + }, + ) + .await; + } + })); + } + for h in handles { + h.await.unwrap(); + } + // Cap honored under concurrent load, and the store is still usable + // (no deadlock, no poisoned locks). + assert!(store.list().await.len() <= 10); + } + + #[tokio::test] + async fn replace_records_preserves_cumulative_event_count_after_trim() { + // Source store with a small per-run event cap. + let src = InMemoryRunStore::with_retention(None, Some(3)); + let run = src.create_run("s", "p").await; + for _ in 0..10 { + src.record_event( + &run.id, + AgentEvent::TextDelta { + text: "x".to_string(), + }, + ) + .await; + } + let records = src.records().await; + // Buffer trimmed to cap, but cumulative event_count is the total. + assert_eq!(records.len(), 1); + assert_eq!(records[0].events.len(), 3, "buffer trimmed to cap"); + assert_eq!(records[0].snapshot.event_count, 10, "cumulative preserved"); + + // Round-trip into a fresh store via replace_records. + let dst = InMemoryRunStore::new(); + dst.replace_records(records).await; + let restored = dst.snapshot(&run.id).await.unwrap(); + assert_eq!( + restored.event_count, 10, + "replace_records must NOT reset event_count to the trimmed buffer length" + ); + // The (trimmed) event buffer still round-trips at cap size. + assert_eq!(dst.events(&run.id).await.len(), 3); + } + + #[tokio::test] + async fn max_runs_evicts_oldest() { + let store = InMemoryRunStore::with_retention(Some(2), None); + let _ = store.create_run("session-1", "prompt-1").await; + let r2 = store.create_run("session-1", "prompt-2").await; + let r3 = store.create_run("session-1", "prompt-3").await; + + // Oldest run (prompt-1) must have been evicted. + assert_eq!(store.list().await.len(), 2); + let ids: Vec = store.list().await.into_iter().map(|r| r.id).collect(); + assert!(ids.contains(&r2.id)); + assert!(ids.contains(&r3.id)); + assert!(store.events(&r2.id).await.is_empty()); + // The evicted run's events are gone too. + let surviving_event_count: usize = + store.events(&r2.id).await.len() + store.events(&r3.id).await.len(); + assert_eq!(surviving_event_count, 0); + } + + #[tokio::test] + async fn max_events_per_run_caps_event_buffer() { + let store = InMemoryRunStore::with_retention(None, Some(3)); + let run = store.create_run("session-1", "prompt").await; + for _ in 0..10 { + store + .record_event( + &run.id, + AgentEvent::TextDelta { + text: "x".to_string(), + }, + ) + .await; + } + let events = store.events(&run.id).await; + assert_eq!( + events.len(), + 3, + "buffer must be capped at max_events_per_run" + ); + // Snapshot `event_count` reflects the cumulative total, not the + // surviving buffer length. + let snap = store.snapshot(&run.id).await.unwrap(); + assert_eq!(snap.event_count, 10); + } + + #[tokio::test] + async fn unlimited_retention_is_the_default() { + let store = InMemoryRunStore::new(); + for i in 0..50 { + let r = store.create_run("s", &format!("p{i}")).await; + for _ in 0..20 { + store + .record_event( + &r.id, + AgentEvent::TextDelta { + text: "y".to_string(), + }, + ) + .await; + } + } + assert_eq!(store.list().await.len(), 50); } } diff --git a/core/src/store/file_store.rs b/core/src/store/file_store.rs index 5054472..f53eb30 100644 --- a/core/src/store/file_store.rs +++ b/core/src/store/file_store.rs @@ -1,5 +1,7 @@ use super::{SessionData, SessionStore}; +use crate::loop_checkpoint::LoopCheckpoint; use crate::run::RunRecord; +use crate::subagent_task_tracker::SubagentTaskSnapshot; use crate::tools::ArtifactStore; use crate::trace::TraceEvent; use crate::verification::VerificationReport; @@ -67,6 +69,18 @@ impl FileSessionStore { .join("runs") .join(format!("{}.json", safe_session_id(id))) } + + fn subagent_tasks_path(&self, id: &str) -> PathBuf { + self.dir + .join("subagent_tasks") + .join(format!("{}.json", safe_session_id(id))) + } + + fn loop_checkpoint_path(&self, run_id: &str) -> PathBuf { + self.dir + .join("loop_checkpoints") + .join(format!("{}.json", safe_session_id(run_id))) + } } fn safe_session_id(id: &str) -> String { @@ -188,6 +202,19 @@ impl SessionStore for FileSessionStore { })?; } + let subagent_tasks_path = self.subagent_tasks_path(id); + if subagent_tasks_path.exists() { + fs::remove_file(&subagent_tasks_path) + .await + .with_context(|| { + format!( + "Failed to delete subagent task file for session {}: {}", + id, + subagent_tasks_path.display() + ) + })?; + } + Ok(()) } @@ -350,6 +377,114 @@ impl SessionStore for FileSessionStore { Ok(Some(reports)) } + async fn save_subagent_tasks(&self, id: &str, tasks: &[SubagentTaskSnapshot]) -> Result<()> { + let path = self.subagent_tasks_path(id); + if let Some(parent) = path.parent() { + fs::create_dir_all(parent).await.with_context(|| { + format!( + "Failed to create subagent task directory: {}", + parent.display() + ) + })?; + } + + let json = serde_json::to_string_pretty(tasks) + .with_context(|| format!("Failed to serialize subagent tasks for session {id}"))?; + fs::write(&path, json) + .await + .with_context(|| format!("Failed to write subagent tasks to {}", path.display()))?; + Ok(()) + } + + async fn load_subagent_tasks(&self, id: &str) -> Result>> { + let path = self.subagent_tasks_path(id); + if !path.exists() { + return Ok(None); + } + let json = fs::read_to_string(&path) + .await + .with_context(|| format!("Failed to read subagent tasks from {}", path.display()))?; + let tasks = serde_json::from_str(&json) + .with_context(|| format!("Failed to parse subagent tasks from {}", path.display()))?; + Ok(Some(tasks)) + } + + async fn save_loop_checkpoint(&self, run_id: &str, checkpoint: &LoopCheckpoint) -> Result<()> { + let path = self.loop_checkpoint_path(run_id); + if let Some(parent) = path.parent() { + fs::create_dir_all(parent).await.with_context(|| { + format!( + "Failed to create loop checkpoint directory: {}", + parent.display() + ) + })?; + } + let json = serde_json::to_string_pretty(checkpoint) + .with_context(|| format!("Failed to serialize loop checkpoint for run {run_id}"))?; + + // Crash-atomic write: a checkpoint exists precisely to survive a + // process crash, so the write itself must be crash-safe. A plain + // `fs::write` can leave a truncated JSON file if the process dies + // mid-write — which `resume_run` would then fail to parse, + // defeating the whole point. Write to a unique temp file, fsync, + // then atomically rename over the target. + let unique_suffix = format!( + "{}.{}", + std::time::SystemTime::now() + .duration_since(std::time::UNIX_EPOCH) + .map(|d| d.as_nanos()) + .unwrap_or(0), + std::process::id() + ); + let temp_path = path.with_extension(format!("json.{}.tmp", unique_suffix)); + let mut file = fs::File::create(&temp_path).await.with_context(|| { + format!( + "Failed to create checkpoint temp file: {}", + temp_path.display() + ) + })?; + file.write_all(json.as_bytes()) + .await + .with_context(|| format!("Failed to write loop checkpoint for run {run_id}"))?; + file.sync_all() + .await + .with_context(|| format!("Failed to fsync loop checkpoint for run {run_id}"))?; + fs::rename(&temp_path, &path).await.with_context(|| { + format!( + "Failed to rename loop checkpoint into place: {}", + path.display() + ) + })?; + Ok(()) + } + + async fn load_loop_checkpoint(&self, run_id: &str) -> Result> { + let path = self.loop_checkpoint_path(run_id); + if !path.exists() { + return Ok(None); + } + let json = fs::read_to_string(&path) + .await + .with_context(|| format!("Failed to read loop checkpoint from {}", path.display()))?; + let checkpoint = serde_json::from_str(&json) + .with_context(|| format!("Failed to parse loop checkpoint from {}", path.display()))?; + Ok(Some(checkpoint)) + } + + async fn delete_loop_checkpoint(&self, run_id: &str) -> Result<()> { + let path = self.loop_checkpoint_path(run_id); + if path.exists() { + fs::remove_file(&path).await.with_context(|| { + format!( + "Failed to delete loop checkpoint for run {}: {}", + run_id, + path.display() + ) + })?; + } + Ok(()) + } + async fn health_check(&self) -> Result<()> { // Verify directory exists and is writable let probe = self.dir.join(".health_check"); diff --git a/core/src/store/memory_store.rs b/core/src/store/memory_store.rs index 732a641..798b375 100644 --- a/core/src/store/memory_store.rs +++ b/core/src/store/memory_store.rs @@ -1,5 +1,7 @@ use super::{SessionData, SessionStore}; +use crate::loop_checkpoint::LoopCheckpoint; use crate::run::RunRecord; +use crate::subagent_task_tracker::SubagentTaskSnapshot; use crate::tools::ArtifactStore; use crate::trace::TraceEvent; use crate::verification::VerificationReport; @@ -17,6 +19,8 @@ pub struct MemorySessionStore { trace_events: tokio::sync::RwLock>>, run_records: tokio::sync::RwLock>>, verification_reports: tokio::sync::RwLock>>, + subagent_tasks: tokio::sync::RwLock>>, + loop_checkpoints: tokio::sync::RwLock>, } impl MemorySessionStore { @@ -27,6 +31,8 @@ impl MemorySessionStore { trace_events: tokio::sync::RwLock::new(HashMap::new()), run_records: tokio::sync::RwLock::new(HashMap::new()), verification_reports: tokio::sync::RwLock::new(HashMap::new()), + subagent_tasks: tokio::sync::RwLock::new(HashMap::new()), + loop_checkpoints: tokio::sync::RwLock::new(HashMap::new()), } } } @@ -57,6 +63,11 @@ impl SessionStore for MemorySessionStore { self.trace_events.write().await.remove(id); self.run_records.write().await.remove(id); self.verification_reports.write().await.remove(id); + self.subagent_tasks.write().await.remove(id); + // Loop checkpoints are keyed by run_id, not session_id, so a + // session-level delete can't address them. They are removed by + // `delete_loop_checkpoint(run_id)` — called automatically by the + // run lifecycle when each run reaches a terminal state in-process. Ok(()) } @@ -122,6 +133,35 @@ impl SessionStore for MemorySessionStore { Ok(self.verification_reports.read().await.get(id).cloned()) } + async fn save_subagent_tasks(&self, id: &str, tasks: &[SubagentTaskSnapshot]) -> Result<()> { + self.subagent_tasks + .write() + .await + .insert(id.to_string(), tasks.to_vec()); + Ok(()) + } + + async fn load_subagent_tasks(&self, id: &str) -> Result>> { + Ok(self.subagent_tasks.read().await.get(id).cloned()) + } + + async fn save_loop_checkpoint(&self, run_id: &str, checkpoint: &LoopCheckpoint) -> Result<()> { + self.loop_checkpoints + .write() + .await + .insert(run_id.to_string(), checkpoint.clone()); + Ok(()) + } + + async fn load_loop_checkpoint(&self, run_id: &str) -> Result> { + Ok(self.loop_checkpoints.read().await.get(run_id).cloned()) + } + + async fn delete_loop_checkpoint(&self, run_id: &str) -> Result<()> { + self.loop_checkpoints.write().await.remove(run_id); + Ok(()) + } + fn backend_name(&self) -> &str { "memory" } diff --git a/core/src/store/mod.rs b/core/src/store/mod.rs index d4f9b10..553ae9d 100644 --- a/core/src/store/mod.rs +++ b/core/src/store/mod.rs @@ -43,7 +43,9 @@ pub use session_data::{ DEFAULT_AUTO_COMPACT_THRESHOLD, }; +use crate::loop_checkpoint::LoopCheckpoint; use crate::run::RunRecord; +use crate::subagent_task_tracker::SubagentTaskSnapshot; use crate::tools::ArtifactStore; use crate::trace::TraceEvent; use crate::verification::VerificationReport; @@ -118,6 +120,56 @@ pub trait SessionStore: Send + Sync { Ok(None) } + /// Save the session's delegated subagent task tracker snapshots. + /// + /// Cluster-grade hosts need this so a migrated session keeps a + /// queryable history of its delegated child runs. Cancellers are + /// **not** persisted — they are runtime-only and re-attaching them + /// is the executor's job at task respawn time. + async fn save_subagent_tasks(&self, _id: &str, _tasks: &[SubagentTaskSnapshot]) -> Result<()> { + Ok(()) + } + + /// Load the session's delegated subagent task tracker snapshots. + async fn load_subagent_tasks(&self, _id: &str) -> Result>> { + Ok(None) + } + + /// Save the latest per-tool-round loop checkpoint for `run_id`. + /// + /// The agent loop calls this through the + /// [`SessionStoreCheckpointSink`](crate::loop_checkpoint::SessionStoreCheckpointSink) + /// adapter after each completed tool round. Implementations should + /// **overwrite** any earlier checkpoint for the same `run_id` — the + /// loop only ever needs the most recent boundary. + async fn save_loop_checkpoint( + &self, + _run_id: &str, + _checkpoint: &LoopCheckpoint, + ) -> Result<()> { + Ok(()) + } + + /// Load the latest loop checkpoint for `run_id`. + async fn load_loop_checkpoint(&self, _run_id: &str) -> Result> { + Ok(None) + } + + /// Delete the loop checkpoint for `run_id`, if present. + /// + /// Called by the run lifecycle when a run reaches a terminal state + /// **in-process** (completed, failed, or cancelled) — at that point + /// the checkpoint is dead weight. Only a process crash (the agent + /// loop never returns) should leave a checkpoint behind for + /// crash-recovery resume. Without this, every tool-using run would + /// leak a checkpoint forever — the dominant unbounded-growth source + /// for long-running cluster deployments. + /// + /// Deleting a non-existent checkpoint is a no-op success. + async fn delete_loop_checkpoint(&self, _run_id: &str) -> Result<()> { + Ok(()) + } + /// Health check — verify the store backend is reachable and operational async fn health_check(&self) -> Result<()> { Ok(()) diff --git a/core/src/store/session_data.rs b/core/src/store/session_data.rs index f970a8a..b735243 100644 --- a/core/src/store/session_data.rs +++ b/core/src/store/session_data.rs @@ -182,6 +182,30 @@ pub struct SessionData { /// Parent session ID (for delegated child sessions) #[serde(skip_serializing_if = "Option::is_none")] pub parent_id: Option, + + /// Multi-tenant identifier. The framework only transports this string; + /// the host (e.g. 书安OS) decides what "tenant" means and how to + /// aggregate/bill on it. + #[serde(default, skip_serializing_if = "Option::is_none")] + pub tenant_id: Option, + + /// Identity of the principal that triggered this session (user id, + /// service account, etc). Framework treats as opaque; emitted to + /// hooks/traces for accounting and audit. + #[serde(default, skip_serializing_if = "Option::is_none")] + pub principal: Option, + + /// Logical identifier of the agent template / definition the session + /// was instantiated from. Lets the host aggregate sessions by + /// "which agent recipe" independent of the concrete session id. + #[serde(default, skip_serializing_if = "Option::is_none")] + pub agent_template_id: Option, + + /// Distributed-trace correlation id. Propagated through hooks/traces + /// so a session's events can be joined with upstream/downstream work + /// in the host's observability pipeline. + #[serde(default, skip_serializing_if = "Option::is_none")] + pub correlation_id: Option, } /// Serializable LLM configuration diff --git a/core/src/store/tests.rs b/core/src/store/tests.rs index 60d076b..a8e3904 100644 --- a/core/src/store/tests.rs +++ b/core/src/store/tests.rs @@ -64,6 +64,10 @@ fn create_test_session_data() -> SessionData { llm_config: None, tasks: vec![], parent_id: None, + tenant_id: None, + principal: None, + agent_template_id: None, + correlation_id: None, total_cost: 0.0, model_name: None, cost_records: Vec::new(), @@ -726,3 +730,81 @@ async fn test_file_store_load_nonexistent_returns_none() { let result = store.load("does-not-exist-at-all").await.unwrap(); assert!(result.is_none(), "Missing session must return Ok(None)"); } + +fn sample_checkpoint(run_id: &str) -> crate::loop_checkpoint::LoopCheckpoint { + crate::loop_checkpoint::LoopCheckpoint { + schema_version: crate::loop_checkpoint::LOOP_CHECKPOINT_SCHEMA_VERSION, + run_id: run_id.to_string(), + session_id: "s-1".to_string(), + turn: 2, + messages: vec![Message::user("hi")], + total_usage: TokenUsage::default(), + tool_calls_count: 1, + verification_reports: Vec::new(), + checkpoint_ms: 1_700_000_000_000, + } +} + +#[tokio::test] +async fn test_memory_store_delete_loop_checkpoint() { + let store = MemorySessionStore::new(); + store + .save_loop_checkpoint("run-x", &sample_checkpoint("run-x")) + .await + .unwrap(); + assert!(store.load_loop_checkpoint("run-x").await.unwrap().is_some()); + + store.delete_loop_checkpoint("run-x").await.unwrap(); + assert!( + store.load_loop_checkpoint("run-x").await.unwrap().is_none(), + "checkpoint must be gone after delete" + ); + + // Deleting a non-existent checkpoint is a no-op success. + store.delete_loop_checkpoint("never-existed").await.unwrap(); +} + +#[tokio::test] +async fn test_file_store_delete_loop_checkpoint() { + let dir = tempdir().unwrap(); + let store = FileSessionStore::new(dir.path()).await.unwrap(); + store + .save_loop_checkpoint("run-y", &sample_checkpoint("run-y")) + .await + .unwrap(); + let loaded = store.load_loop_checkpoint("run-y").await.unwrap(); + assert_eq!(loaded.unwrap().run_id, "run-y"); + + store.delete_loop_checkpoint("run-y").await.unwrap(); + assert!(store.load_loop_checkpoint("run-y").await.unwrap().is_none()); + + // Idempotent on a missing file. + store.delete_loop_checkpoint("run-y").await.unwrap(); +} + +#[tokio::test] +async fn test_file_store_checkpoint_write_is_atomic_no_temp_leftovers() { + // The crash-atomic write uses a temp file + rename. After a normal + // save, no `.tmp` files should be left behind in the checkpoint dir. + let dir = tempdir().unwrap(); + let store = FileSessionStore::new(dir.path()).await.unwrap(); + store + .save_loop_checkpoint("run-z", &sample_checkpoint("run-z")) + .await + .unwrap(); + + let ckpt_dir = dir.path().join("loop_checkpoints"); + let mut entries = tokio::fs::read_dir(&ckpt_dir).await.unwrap(); + let mut names = Vec::new(); + while let Some(e) = entries.next_entry().await.unwrap() { + names.push(e.file_name().to_string_lossy().to_string()); + } + assert!( + names.iter().all(|n| !n.contains(".tmp")), + "no temp files should remain after atomic write, got: {names:?}" + ); + assert!( + names.iter().any(|n| n == "run-z.json"), + "the final checkpoint file must exist, got: {names:?}" + ); +} diff --git a/core/src/subagent_task_tracker.rs b/core/src/subagent_task_tracker.rs index 55ff29d..3a8c665 100644 --- a/core/src/subagent_task_tracker.rs +++ b/core/src/subagent_task_tracker.rs @@ -7,7 +7,7 @@ use crate::agent::AgentEvent; use serde::{Deserialize, Serialize}; -use std::collections::HashMap; +use std::collections::{HashMap, VecDeque}; use tokio::sync::RwLock; use tokio_util::sync::CancellationToken; @@ -51,6 +51,14 @@ pub struct SubagentTaskSnapshot { pub struct InMemorySubagentTaskTracker { tasks: RwLock>, cancellers: RwLock>, + /// FIFO queue of task_ids that have transitioned to a terminal + /// state (Completed / Failed / Cancelled). Used to evict the + /// oldest terminal entry when `max_terminal_tasks` is configured. + /// Running tasks are never in this queue. + terminal_order: RwLock>, + /// FIFO cap on terminal-state snapshots. `None` keeps the + /// unbounded default. + max_terminal_tasks: Option, } impl InMemorySubagentTaskTracker { @@ -58,6 +66,47 @@ impl InMemorySubagentTaskTracker { Self::default() } + /// Construct a tracker with an optional FIFO cap on terminal-state + /// snapshots. Running tasks are never dropped. + pub fn with_max_terminal_tasks(max: usize) -> Self { + Self { + tasks: RwLock::new(HashMap::new()), + cancellers: RwLock::new(HashMap::new()), + terminal_order: RwLock::new(VecDeque::new()), + max_terminal_tasks: Some(max), + } + } + + /// Internal helper: mark a task_id as terminal in the FIFO queue + /// and evict oldest entries past the cap. Idempotent for tasks + /// that are already in the terminal queue (a SubagentEnd arriving + /// after a cancel won't double-push). + async fn mark_terminal_and_evict(&self, task_id: &str) { + let cap = match self.max_terminal_tasks { + Some(n) => n, + None => return, + }; + // Hold all three structures together for the push + eviction so a + // concurrent `record_event` (which takes only `tasks`) cannot + // re-insert a victim into `tasks` in the window between its removal + // from `tasks` and `cancellers`. Canonical order: + // terminal_order -> tasks -> cancellers. Callers (`cancel`, + // `record_event`) always drop their `tasks`/`cancellers` guards + // before invoking this, so holding all three here cannot deadlock. + let mut order = self.terminal_order.write().await; + let mut tasks = self.tasks.write().await; + let mut cancellers = self.cancellers.write().await; + if !order.iter().any(|id| id == task_id) { + order.push_back(task_id.to_string()); + } + while order.len() > cap { + if let Some(victim) = order.pop_front() { + tasks.remove(&victim); + cancellers.remove(&victim); + } + } + } + /// Register a `CancellationToken` for a running task so callers can /// trigger cancellation through `cancel(task_id)`. The task executor /// is expected to remove the entry on exit via `clear_canceller`. @@ -83,12 +132,22 @@ impl InMemorySubagentTaskTracker { Some(token) => { token.cancel(); let now = now_ms(); - let mut tasks = self.tasks.write().await; - if let Some(entry) = tasks.get_mut(task_id) { - if entry.status == SubagentStatus::Running { - entry.status = SubagentStatus::Cancelled; - entry.updated_ms = now; + let transitioned = { + let mut tasks = self.tasks.write().await; + if let Some(entry) = tasks.get_mut(task_id) { + if entry.status == SubagentStatus::Running { + entry.status = SubagentStatus::Cancelled; + entry.updated_ms = now; + true + } else { + false + } + } else { + false } + }; + if transitioned { + self.mark_terminal_and_evict(task_id).await; } true } @@ -173,37 +232,45 @@ impl InMemorySubagentTaskTracker { success, } => { let now = now_ms(); - let mut tasks = self.tasks.write().await; - let entry = tasks - .entry(task_id.clone()) - .or_insert_with(|| SubagentTaskSnapshot { - task_id: task_id.clone(), - parent_session_id: String::new(), - child_session_id: session_id.clone(), - agent: agent.clone(), - description: String::new(), - status: SubagentStatus::Running, - started_ms: now, - updated_ms: now, - finished_ms: None, - output: None, - success: None, - progress: Vec::new(), - }); - // Preserve a pre-set Cancelled status (set by `cancel()`) - // — a late SubagentEnd from the cancelled child loop is - // expected and must not downgrade the terminal state. - if entry.status != SubagentStatus::Cancelled { - entry.status = if *success { - SubagentStatus::Completed - } else { - SubagentStatus::Failed - }; + let was_running = { + let mut tasks = self.tasks.write().await; + let entry = + tasks + .entry(task_id.clone()) + .or_insert_with(|| SubagentTaskSnapshot { + task_id: task_id.clone(), + parent_session_id: String::new(), + child_session_id: session_id.clone(), + agent: agent.clone(), + description: String::new(), + status: SubagentStatus::Running, + started_ms: now, + updated_ms: now, + finished_ms: None, + output: None, + success: None, + progress: Vec::new(), + }); + let was_running = entry.status == SubagentStatus::Running; + // Preserve a pre-set Cancelled status (set by `cancel()`) + // — a late SubagentEnd from the cancelled child loop is + // expected and must not downgrade the terminal state. + if entry.status != SubagentStatus::Cancelled { + entry.status = if *success { + SubagentStatus::Completed + } else { + SubagentStatus::Failed + }; + } + entry.updated_ms = now; + entry.finished_ms = Some(now); + entry.output = Some(output.clone()); + entry.success = Some(*success); + was_running + }; + if was_running { + self.mark_terminal_and_evict(task_id).await; } - entry.updated_ms = now; - entry.finished_ms = Some(now); - entry.output = Some(output.clone()); - entry.success = Some(*success); } _ => {} } @@ -240,6 +307,27 @@ impl InMemorySubagentTaskTracker { .filter(|task| task.parent_session_id == parent_session_id) .collect() } + + /// Replace the tracker's task snapshots with the given set. Cancellers + /// are **not** restored (they are runtime-only channels tied to live + /// child loops). After `replace_snapshots`, any task whose status was + /// `Running` at checkpoint time will appear `Running` in the tracker + /// but `cancel(task_id)` will return `false` because no canceller is + /// registered — callers should normally checkpoint at a quiescent + /// point so no tasks are `Running`. + /// + /// Used by [`SessionStore`](crate::store::SessionStore) rehydration to + /// restore the materialized subagent view of a previously-saved + /// session. + pub async fn replace_snapshots(&self, snapshots: Vec) { + let mut map = HashMap::with_capacity(snapshots.len()); + for snap in snapshots { + map.insert(snap.task_id.clone(), snap); + } + *self.tasks.write().await = map; + // Cancellers reference live tokens — invalidate the lot. + self.cancellers.write().await.clear(); + } } fn now_ms() -> u64 { @@ -440,4 +528,124 @@ mod tests { assert!(!tracker.cancel("task-e").await); assert!(!token.is_cancelled()); } + + #[tokio::test(flavor = "multi_thread", worker_threads = 4)] + async fn concurrent_record_and_cancel_under_terminal_cap_does_not_deadlock() { + // Guards the canonical lock-ordering change in mark_terminal_and_evict + // (terminal_order -> tasks -> cancellers held together). A bad ordering + // would ABBA-deadlock against concurrent cancel()/record_event and hang. + let tracker = std::sync::Arc::new(InMemorySubagentTaskTracker::with_max_terminal_tasks(8)); + let mut handles = Vec::new(); + for i in 0..60 { + let t = std::sync::Arc::clone(&tracker); + handles.push(tokio::spawn(async move { + let task_id = format!("t-{i}"); + let child = format!("c-{i}"); + t.record_event(&start_event(&task_id, "parent", &child)) + .await; + if i % 2 == 0 { + t.register_canceller(&task_id, CancellationToken::new()) + .await; + let _ = t.cancel(&task_id).await; + } else { + t.record_event(&end_event(&task_id, &child, true)).await; + } + })); + } + for h in handles { + h.await.unwrap(); + } + // Terminal cap honored; tracker still usable. + let terminal = tracker + .list() + .await + .into_iter() + .filter(|t| t.status != SubagentStatus::Running) + .count(); + assert!( + terminal <= 8, + "terminal cap must hold under load, got {terminal}" + ); + } + + #[tokio::test] + async fn max_terminal_tasks_evicts_oldest_completed_only() { + let tracker = InMemorySubagentTaskTracker::with_max_terminal_tasks(2); + + // Three fully terminal tasks; oldest must be evicted. + for i in 0..3 { + let task_id = format!("done-{i}"); + tracker + .record_event(&start_event(&task_id, "parent", "child")) + .await; + tracker + .record_event(&end_event(&task_id, "child", true)) + .await; + } + + // Only the two most-recent terminal tasks survive. + let list = tracker.list().await; + let ids: Vec<&str> = list.iter().map(|t| t.task_id.as_str()).collect(); + assert_eq!(ids.len(), 2); + assert!(ids.contains(&"done-1")); + assert!(ids.contains(&"done-2")); + assert!( + !ids.contains(&"done-0"), + "oldest terminal entry must be evicted" + ); + } + + #[tokio::test] + async fn max_terminal_tasks_never_evicts_running_tasks() { + let tracker = InMemorySubagentTaskTracker::with_max_terminal_tasks(1); + + // One running, two terminal — the cap applies only to terminal + // entries, so the running task survives even if it would be + // the "oldest". + tracker + .record_event(&start_event("running", "parent", "child")) + .await; + for i in 0..3 { + let task_id = format!("done-{i}"); + tracker + .record_event(&start_event(&task_id, "parent", "child")) + .await; + tracker + .record_event(&end_event(&task_id, "child", true)) + .await; + } + + let list = tracker.list().await; + let ids: Vec<&str> = list.iter().map(|t| t.task_id.as_str()).collect(); + assert!( + ids.contains(&"running"), + "running task must never be evicted" + ); + // Only the most recent terminal task survives. + assert!(ids.contains(&"done-2")); + assert!(!ids.contains(&"done-0")); + assert!(!ids.contains(&"done-1")); + assert_eq!(list.len(), 2); + } + + #[tokio::test] + async fn cancel_path_also_participates_in_terminal_cap() { + let tracker = InMemorySubagentTaskTracker::with_max_terminal_tasks(1); + + // Two cancellations — second one should evict the first. + for i in 0..2 { + let task_id = format!("c-{i}"); + tracker + .record_event(&start_event(&task_id, "parent", "child")) + .await; + tracker + .register_canceller(&task_id, CancellationToken::new()) + .await; + assert!(tracker.cancel(&task_id).await); + } + + let list = tracker.list().await; + assert_eq!(list.len(), 1); + assert_eq!(list[0].task_id, "c-1"); + } } diff --git a/core/src/trace.rs b/core/src/trace.rs index 4e575fe..835698e 100644 --- a/core/src/trace.rs +++ b/core/src/trace.rs @@ -90,9 +90,27 @@ pub trait TraceSink: Send + Sync { #[derive(Debug, Clone, Default)] pub struct InMemoryTraceSink { events: Arc>>, + /// FIFO retention cap (`None` = unlimited). When set, the oldest + /// event is dropped on each new `record` once the buffer exceeds + /// this size. Useful for long-running sessions that would + /// otherwise leak trace memory. + max_events: Option, } impl InMemoryTraceSink { + /// Construct a sink with no retention cap (default, unbounded). + pub fn new() -> Self { + Self::default() + } + + /// Construct a sink that retains at most `max_events` records. + pub fn with_max_events(max_events: usize) -> Self { + Self { + events: Arc::new(RwLock::new(Vec::with_capacity(max_events.min(1024)))), + max_events: Some(max_events), + } + } + pub fn events(&self) -> Vec { self.events.read().unwrap().clone() } @@ -108,7 +126,20 @@ impl InMemoryTraceSink { impl TraceSink for InMemoryTraceSink { fn record(&self, event: TraceEvent) { - self.events.write().unwrap().push(event); + let mut events = self.events.write().unwrap(); + events.push(event); + // FIFO trim — keep the buffer at most `max_events`. We drain + // from the front rather than truncating the back so the most + // recent entries (most useful for debugging) are preserved. + // Steady-state cost is one O(n) shift per push at cap; acceptable + // for diagnostic traces. Switch to VecDeque if hot-path tracing + // ever becomes a perf bottleneck. + if let Some(cap) = self.max_events { + if events.len() > cap { + let excess = events.len() - cap; + events.drain(..excess); + } + } } } @@ -223,4 +254,38 @@ mod tests { ); assert!(event.details.as_ref().unwrap().get("steps").is_none()); } + + fn dummy_event(i: u32) -> TraceEvent { + TraceEvent::tool_execution( + "read", + true, + 0, + Duration::from_millis(i as u64), + i as usize, + None, + ) + } + + #[test] + fn with_max_events_caps_buffer_fifo() { + let sink = InMemoryTraceSink::with_max_events(3); + for i in 0..10 { + sink.record(dummy_event(i)); + } + let events = sink.events(); + assert_eq!(events.len(), 3, "buffer must be capped"); + // Oldest events are evicted; the surviving events are the + // last `cap` recorded (7, 8, 9). + assert_eq!(events[0].duration_ms, 7); + assert_eq!(events[2].duration_ms, 9); + } + + #[test] + fn default_sink_is_unbounded() { + let sink = InMemoryTraceSink::new(); + for i in 0..50 { + sink.record(dummy_event(i)); + } + assert_eq!(sink.events().len(), 50); + } } diff --git a/core/tests/test_real_llm_cluster_features.rs b/core/tests/test_real_llm_cluster_features.rs new file mode 100644 index 0000000..a20926c --- /dev/null +++ b/core/tests/test_real_llm_cluster_features.rs @@ -0,0 +1,302 @@ +//! Real-LLM end-to-end tests for the cluster-grade features added in 3.3.0 +//! (BudgetGuard enforcement, loop-checkpoint lifecycle, resume_run, identity +//! labels). These exercise code paths that mock LLM clients cannot validate — +//! most importantly that `BudgetGuard::record_after_llm` receives the +//! provider's *actual* token usage and that a real run's lifecycle clears its +//! checkpoint. +//! +//! All `#[ignore]` — they require a live provider in `.a3s/config.acl`. Run: +//! +//! ```bash +//! A3S_CONFIG_FILE=/abs/path/.a3s/config.acl \ +//! cargo test -p a3s-code-core --test test_real_llm_cluster_features -- --ignored --nocapture +//! ``` + +use std::path::PathBuf; +use std::sync::atomic::{AtomicU64, AtomicUsize, Ordering}; +use std::sync::Arc; + +use a3s_code_core::budget::{BudgetDecision, BudgetGuard}; +use a3s_code_core::config::CodeConfig; +use a3s_code_core::llm::TokenUsage; +use a3s_code_core::store::{MemorySessionStore, SessionStore}; +use a3s_code_core::{Agent, SessionOptions}; + +fn repo_config_path() -> PathBuf { + std::env::var_os("A3S_CONFIG_FILE") + .map(PathBuf::from) + .unwrap_or_else(|| { + PathBuf::from(env!("CARGO_MANIFEST_DIR")) + .join("../../..") + .join(".a3s/config.acl") + }) +} + +async fn real_agent() -> Agent { + let path = repo_config_path(); + let config = CodeConfig::from_file(&path) + .unwrap_or_else(|e| panic!("failed to load {}: {e}", path.display())); + Agent::from_config(config) + .await + .expect("agent from real config") +} + +// A guard that always denies, counting how many times it was consulted. +#[derive(Default)] +struct DenyGuard { + checks: AtomicUsize, + records: AtomicUsize, +} + +#[async_trait::async_trait] +impl BudgetGuard for DenyGuard { + async fn check_before_llm(&self, _session_id: &str, _est: usize) -> BudgetDecision { + self.checks.fetch_add(1, Ordering::SeqCst); + BudgetDecision::Deny { + resource: "llm_tokens".to_string(), + reason: "test cap exceeded".to_string(), + } + } + async fn record_after_llm(&self, _session_id: &str, _usage: &TokenUsage) { + self.records.fetch_add(1, Ordering::SeqCst); + } +} + +// A guard that allows but captures the *actual* usage the provider reports. +#[derive(Default)] +struct RecordingGuard { + checks: AtomicUsize, + records: AtomicUsize, + last_total_tokens: AtomicU64, +} + +#[async_trait::async_trait] +impl BudgetGuard for RecordingGuard { + async fn check_before_llm(&self, _session_id: &str, _est: usize) -> BudgetDecision { + self.checks.fetch_add(1, Ordering::SeqCst); + BudgetDecision::Allow + } + async fn record_after_llm(&self, _session_id: &str, usage: &TokenUsage) { + self.records.fetch_add(1, Ordering::SeqCst); + self.last_total_tokens + .store(usage.total_tokens as u64, Ordering::SeqCst); + } +} + +/// A real `Deny` from `check_before_llm` must abort the call BEFORE the +/// provider is contacted: send errors with "Budget exhausted", the guard +/// was consulted exactly once, `record_after_llm` never fired, and no +/// conversation history was recorded. +#[tokio::test(flavor = "multi_thread")] +#[ignore = "requires real provider credentials and network access"] +async fn real_budget_guard_deny_blocks_llm_call() { + let guard = Arc::new(DenyGuard::default()); + let agent = real_agent().await; + let opts = SessionOptions::new() + .with_session_id("real-budget-deny") + .with_budget_guard(guard.clone() as Arc); + let session = agent + .session("/tmp/real-budget-deny", Some(opts)) + .expect("session"); + + let err = session + .send("Reply with the single word: ok", None) + .await + .unwrap_err(); + assert!( + err.to_string().contains("Budget exhausted"), + "expected budget-exhausted error, got: {err}" + ); + assert_eq!( + guard.checks.load(Ordering::SeqCst), + 1, + "guard consulted once" + ); + assert_eq!( + guard.records.load(Ordering::SeqCst), + 0, + "record_after_llm must not fire when denied (LLM never called)" + ); + assert!( + session.history().is_empty(), + "denied call must not record history" + ); +} + +/// On `Allow`, the real run completes and `record_after_llm` receives the +/// provider's ACTUAL non-zero token usage — the post-call accounting path a +/// mock client (which returns fixed/zero usage) cannot validate. +#[tokio::test(flavor = "multi_thread")] +#[ignore = "requires real provider credentials and network access"] +async fn real_budget_guard_allow_records_actual_usage() { + let guard = Arc::new(RecordingGuard::default()); + let agent = real_agent().await; + let opts = SessionOptions::new() + .with_session_id("real-budget-allow") + .with_budget_guard(guard.clone() as Arc); + let session = agent + .session("/tmp/real-budget-allow", Some(opts)) + .expect("session"); + + let result = session + .send("Reply with the single word: ok", None) + .await + .expect("real send should succeed under an allowing guard"); + + assert!(!result.text.is_empty(), "real model returned text"); + assert!(guard.checks.load(Ordering::SeqCst) >= 1, "guard consulted"); + assert!( + guard.records.load(Ordering::SeqCst) >= 1, + "record_after_llm must fire on a successful real call" + ); + assert!( + guard.last_total_tokens.load(Ordering::SeqCst) > 0, + "record_after_llm must receive the provider's real (non-zero) token usage" + ); + assert!( + result.usage.total_tokens > 0, + "AgentResult must carry real token usage" + ); +} + +/// A real run with a `SessionStore` configured must, on completion, leave NO +/// dangling loop checkpoint for its run id — the leak-fix lifecycle path +/// exercised end-to-end against a live model. +#[tokio::test(flavor = "multi_thread")] +#[ignore = "requires real provider credentials and network access"] +async fn real_run_with_store_leaves_no_dangling_checkpoint() { + let store: Arc = Arc::new(MemorySessionStore::new()); + let agent = real_agent().await; + let opts = SessionOptions::new() + .with_session_id("real-ckpt-clear") + .with_session_store(Arc::clone(&store)); + let session = agent + .session("/tmp/real-ckpt-clear", Some(opts)) + .expect("session"); + + let result = session + .send( + "Reply with the single word: done. Do not call any tools.", + None, + ) + .await + .expect("real send should succeed"); + assert!(!result.text.is_empty()); + + let runs = session.runs().await; + assert_eq!(runs.len(), 1, "one run recorded"); + let run_id = &runs[0].id; + assert_eq!(runs[0].status, a3s_code_core::run::RunStatus::Completed); + + // Whether or not the model used a tool (which would have written a + // checkpoint mid-run), the completed run must leave none behind. + let lingering = store.load_loop_checkpoint(run_id).await.expect("load"); + assert!( + lingering.is_none(), + "completed real run must not leave a dangling loop checkpoint" + ); +} + +/// Identity labels (tenant/principal/template/correlation) attached to a +/// session survive through a live run and the run is recorded as Completed. +#[tokio::test(flavor = "multi_thread")] +#[ignore = "requires real provider credentials and network access"] +async fn real_identity_labels_survive_live_run() { + let agent = real_agent().await; + let opts = SessionOptions::new() + .with_session_id("real-labels") + .with_tenant_id("acme-prod") + .with_principal("svc-bot") + .with_agent_template_id("planner-v3") + .with_correlation_id("trace-real-1"); + let session = agent + .session("/tmp/real-labels", Some(opts)) + .expect("session"); + + let result = session + .send("Reply with the single word: ok", None) + .await + .expect("real send should succeed"); + assert!(!result.text.is_empty()); + + assert_eq!(session.tenant_id(), Some("acme-prod")); + assert_eq!(session.principal(), Some("svc-bot")); + assert_eq!(session.agent_template_id(), Some("planner-v3")); + assert_eq!(session.correlation_id(), Some("trace-real-1")); + + let runs = session.runs().await; + assert_eq!(runs.len(), 1); + assert_eq!(runs[0].status, a3s_code_core::run::RunStatus::Completed); +} + +/// `resume_run` against a live model: seed a checkpoint carrying non-zero +/// cumulative metrics, resume, and confirm the run completes AND the +/// resumed AgentResult's usage is at least the seeded amount (i.e. metrics +/// carried forward, not reset to zero) plus the real turn's tokens. +#[tokio::test(flavor = "multi_thread")] +#[ignore = "requires real provider credentials and network access"] +async fn real_resume_run_carries_checkpoint_metrics_forward() { + use a3s_code_core::llm::{ContentBlock, Message}; + use a3s_code_core::loop_checkpoint::{LoopCheckpoint, LOOP_CHECKPOINT_SCHEMA_VERSION}; + + let store: Arc = Arc::new(MemorySessionStore::new()); + let seeded_run = "real-resume-old"; + let seeded_total = 500u32; + store + .save_loop_checkpoint( + seeded_run, + &LoopCheckpoint { + schema_version: LOOP_CHECKPOINT_SCHEMA_VERSION, + run_id: seeded_run.to_string(), + session_id: "real-resume".to_string(), + turn: 1, + messages: vec![ + Message::user("Reply with the single word: ok"), + Message { + role: "assistant".to_string(), + content: vec![ContentBlock::Text { + text: "working".to_string(), + }], + reasoning_content: None, + }, + ], + total_usage: TokenUsage { + prompt_tokens: 400, + completion_tokens: 100, + total_tokens: seeded_total as usize, + cache_read_tokens: None, + cache_write_tokens: None, + }, + tool_calls_count: 2, + verification_reports: Vec::new(), + checkpoint_ms: 1_700_000_000_000, + }, + ) + .await + .expect("seed checkpoint"); + + let agent = real_agent().await; + let opts = SessionOptions::new() + .with_session_id("real-resume") + .with_session_store(Arc::clone(&store)); + let session = agent + .session("/tmp/real-resume", Some(opts)) + .expect("session"); + + let result = session + .resume_run(seeded_run) + .await + .expect("resume_run against real model should succeed"); + + assert!(!result.text.is_empty(), "resumed run produced text"); + assert!( + result.usage.total_tokens > seeded_total as usize, + "resumed usage ({}) must exceed the seeded {} (carried forward + real turn)", + result.usage.total_tokens, + seeded_total + ); + assert!( + result.tool_calls_count >= 2, + "seeded tool-call count must carry forward" + ); +} diff --git a/core/tests/test_session_close_lifecycle.rs b/core/tests/test_session_close_lifecycle.rs new file mode 100644 index 0000000..b2e3694 --- /dev/null +++ b/core/tests/test_session_close_lifecycle.rs @@ -0,0 +1,834 @@ +//! Cross-module integration tests for the session/agent close lifecycle. +//! +//! Unit tests in `core/src/agent_api/tests.rs` cover the isolated APIs. +//! This file exercises the *interaction* between session close, the +//! subagent task tracker, and the parent agent's session registry — +//! crossings that single-module unit tests cannot reach. +//! +//! Run with: +//! cargo test --test test_session_close_lifecycle -- --nocapture + +use a3s_code_core::config::{CodeConfig, ModelConfig, ModelModalities, ProviderConfig}; +use a3s_code_core::llm::Message; +use a3s_code_core::mcp::{McpServerConfig, McpTransportConfig}; +use a3s_code_core::subagent_task_tracker::SubagentStatus; +use a3s_code_core::{Agent, AgentEvent, SessionOptions}; +use tokio_util::sync::CancellationToken; + +/// Minimal offline config — no real provider is contacted because every +/// test below avoids `send`/`stream`. +fn offline_test_config() -> CodeConfig { + CodeConfig { + default_model: Some("anthropic/claude-sonnet-4-20250514".to_string()), + providers: vec![ProviderConfig { + name: "anthropic".to_string(), + api_key: Some("offline-key".to_string()), + base_url: None, + headers: std::collections::HashMap::new(), + session_id_header: None, + models: vec![ModelConfig { + id: "claude-sonnet-4-20250514".to_string(), + name: "Claude Sonnet 4".to_string(), + family: "claude-sonnet".to_string(), + api_key: None, + base_url: None, + headers: std::collections::HashMap::new(), + session_id_header: None, + attachment: false, + reasoning: false, + tool_call: true, + temperature: true, + release_date: None, + modalities: ModelModalities::default(), + cost: Default::default(), + limit: Default::default(), + }], + }], + ..Default::default() + } +} + +/// IT-1: closing a session with a delegated subagent task in flight must +/// transition that task to Cancelled, fire its registered cancel token, +/// and — critically — a late `SubagentEnd` event from the cancelled child +/// loop must not regress the terminal status back to Completed. +/// +/// This crosses the `session_close` → `subagent_task_tracker` → +/// `record_event` boundary that single-module unit tests cannot exercise. +#[tokio::test] +async fn close_with_subagent_in_flight_marks_task_cancelled_and_resists_regression() { + let agent = Agent::from_config(offline_test_config()).await.unwrap(); + let opts = SessionOptions::new().with_session_id("it1-close-subagent"); + let session = agent + .session("/tmp/it1-close-subagent-workspace", Some(opts)) + .expect("session"); + + // Simulate the in-flight state that the built-in `task` tool produces: + // a SubagentStart event, plus a registered cancellation token. + let tracker = session.subagent_tracker(); + let task_id = "task-abc"; + let child_session_id = "child-xyz"; + let canceller = CancellationToken::new(); + + tracker + .record_event(&AgentEvent::SubagentStart { + task_id: task_id.to_string(), + session_id: child_session_id.to_string(), + parent_session_id: session.id().to_string(), + agent: "general".to_string(), + description: "long-running synthetic task".to_string(), + }) + .await; + tracker.register_canceller(task_id, canceller.clone()).await; + + // Sanity: the task is visible as Running before close. + let pending = session.pending_subagent_tasks().await; + assert_eq!(pending.len(), 1, "pre-close pending list"); + assert_eq!(pending[0].task_id, task_id); + assert_eq!(pending[0].status, SubagentStatus::Running); + assert!( + !canceller.is_cancelled(), + "canceller must not be fired before close" + ); + + // Close the session — this is the cross-module action under test. + session.close().await; + assert!(session.is_closed(), "session must report closed"); + assert!( + canceller.is_cancelled(), + "subagent canceller must be fired by close()" + ); + + // The tracker view must show the task as Cancelled, and + // pending_subagent_tasks() must drop it. + let snapshot = session + .subagent_task(task_id) + .await + .expect("snapshot still queryable after close"); + assert_eq!(snapshot.status, SubagentStatus::Cancelled); + assert!(session.pending_subagent_tasks().await.is_empty()); + + // Critical contract: a *late* SubagentEnd from the cancelled child loop + // (success=true would be the worst case for status regression) must + // NOT downgrade the terminal status back to Completed. + tracker + .record_event(&AgentEvent::SubagentEnd { + task_id: task_id.to_string(), + session_id: child_session_id.to_string(), + agent: "general".to_string(), + output: "would-have-succeeded".to_string(), + success: true, + }) + .await; + let after_end = session + .subagent_task(task_id) + .await + .expect("snapshot remains queryable"); + assert_eq!( + after_end.status, + SubagentStatus::Cancelled, + "late SubagentEnd(success=true) must not regress Cancelled status" + ); +} + +/// Minimal MCP server config — `enabled = false` so `connect_global_mcp` +/// does not actually spawn a subprocess. The presence of the entry still +/// causes `agent_bootstrap::connect_global_mcp` to construct a +/// `Some(McpManager)` (it only returns `None` when `mcp_servers` is +/// empty), which is what we need to exercise the MCP branch of +/// `Agent::close()`. +fn disabled_mcp_server(name: &str) -> McpServerConfig { + McpServerConfig { + name: name.to_string(), + transport: McpTransportConfig::Stdio { + command: "/bin/true".to_string(), + args: vec![], + }, + enabled: false, + env: std::collections::HashMap::new(), + oauth: None, + tool_timeout_secs: 60, + } +} + +/// IT-2: `Agent::close()` is idempotent and cleanly walks the +/// `global_mcp.list_connected()` branch even when there are no live +/// MCP connections — and is also safe when `global_mcp` is `None`. +/// +/// We exercise both flavors (with and without `global_mcp`) so the +/// "if let Some(mcp)" arm in `agent_sessions::close_agent` is hit and +/// the no-`global_mcp` short-circuit is also covered. +#[tokio::test] +async fn agent_close_handles_global_mcp_branch_and_is_idempotent() { + // Flavor A: no MCP at all — Agent::close() must short-circuit the + // global_mcp branch. + { + let agent = Agent::from_config(offline_test_config()).await.unwrap(); + assert!(!agent.is_closed()); + agent.close().await; + assert!(agent.is_closed()); + // Idempotent: second close is a no-op (no panic). + agent.close().await; + assert!(agent.is_closed()); + } + + // Flavor B: config carries a disabled MCP server entry. This makes + // `agent_bootstrap::connect_global_mcp` return `Some(manager)` (the + // manager is constructed because mcp_servers is non-empty) while + // never opening a real connection. `list_connected()` is therefore + // empty, and `Agent::close()` must traverse the branch cleanly. + { + let mut cfg = offline_test_config(); + cfg.mcp_servers = vec![disabled_mcp_server("offline-server")]; + let agent = Agent::from_config(cfg).await.unwrap(); + + agent.close().await; + assert!(agent.is_closed()); + + // After close, the agent must reject new session creation — + // proving close() ran the full close_agent path (not just the + // MCP branch). + let err = agent + .session("/tmp/it2-post-close", None) + .err() + .expect("session() after close must error"); + let msg = err.to_string(); + assert!( + msg.contains("closed") || msg.contains("Closed"), + "post-close session() error must mention 'closed', got: {msg}" + ); + } +} + +/// IT-3: under concurrent creation + drop traffic, the agent session +/// registry must converge to *exactly* the IDs of sessions still held +/// by the caller. Single-threaded unit tests can't observe the +/// `std::sync::Mutex>` insert / drop / lazy-prune dance +/// under real parallelism. +/// +/// Strategy: +/// 1. From N concurrent tasks on a multi-thread runtime, create one +/// session each. +/// 2. Drop half the sessions immediately; hold the other half. +/// 3. Wait for all tasks to settle. +/// 4. Assert `agent.list_sessions()` returns exactly the held IDs +/// (sorted, deduped). +#[tokio::test(flavor = "multi_thread", worker_threads = 4)] +async fn session_drop_prunes_registry_under_concurrency() { + let agent = std::sync::Arc::new(Agent::from_config(offline_test_config()).await.unwrap()); + + const N: usize = 32; + + let mut handles = Vec::with_capacity(N); + for i in 0..N { + let agent = std::sync::Arc::clone(&agent); + handles.push(tokio::spawn(async move { + let id = format!("it3-session-{i:02}"); + let opts = SessionOptions::new().with_session_id(&id); + let session = agent + .session(format!("/tmp/it3-ws-{i:02}"), Some(opts)) + .expect("session"); + + // Drop the even-indexed sessions immediately so the registry + // has to prune their Weak entries; hold the odd ones. + if i % 2 == 0 { + drop(session); + None + } else { + Some((id, session)) + } + })); + } + + // Collect every held session so they outlive the assertion below. + let mut held = Vec::new(); + for h in handles { + if let Some(kept) = h.await.expect("task should not panic") { + held.push(kept); + } + } + + let mut expected: Vec = held.iter().map(|(id, _)| id.clone()).collect(); + expected.sort(); + + let observed = agent.list_sessions().await; + assert_eq!( + observed, expected, + "registry must contain exactly the IDs of still-held sessions" + ); + + // Now drop the held set and verify the registry collapses to empty + // on the next access (lazy prune). + drop(held); + let after_drop = agent.list_sessions().await; + assert!( + after_drop.is_empty(), + "after dropping all sessions the registry must prune to empty, got: {after_drop:?}" + ); +} + +/// IT-4 (Pillar 1): subagent task tracker contents survive a session +/// save/resume cycle. Before this, `session.save()` persisted history / +/// runs / traces / verification but the materialized subagent task view +/// was lost, breaking cluster-scale session migration. +/// +/// Requires multi_thread runtime because `restore_persisted_session_state` +/// uses `block_in_place` to bridge the sync `resume_session` API with +/// the async `SessionStore` calls. +#[tokio::test(flavor = "multi_thread", worker_threads = 2)] +async fn subagent_tasks_persist_across_save_and_resume() { + use a3s_code_core::store::MemorySessionStore; + + let store: std::sync::Arc = + std::sync::Arc::new(MemorySessionStore::new()); + + // ----- Phase A: write ----- + let agent_a = Agent::from_config(offline_test_config()).await.unwrap(); + let opts_a = SessionOptions::new() + .with_session_id("pillar1-subagent-persist") + .with_session_store(std::sync::Arc::clone(&store)) + .with_auto_save(true); + let session_a = agent_a + .session("/tmp/pillar1-subagent-persist", Some(opts_a)) + .expect("phase A session"); + + let tracker_a = session_a.subagent_tracker(); + + // Three tasks: one completed, one failed, one cancelled — the full + // matrix of terminal states the migration target needs to observe. + let parent_id = session_a.id().to_string(); + let inject = |task_id: &str, child_id: &str| AgentEvent::SubagentStart { + task_id: task_id.to_string(), + session_id: child_id.to_string(), + parent_session_id: parent_id.clone(), + agent: "general".to_string(), + description: format!("seed {task_id}"), + }; + tracker_a.record_event(&inject("p1-done", "child-1")).await; + tracker_a + .record_event(&AgentEvent::SubagentEnd { + task_id: "p1-done".to_string(), + session_id: "child-1".to_string(), + agent: "general".to_string(), + output: "ok".to_string(), + success: true, + }) + .await; + tracker_a.record_event(&inject("p1-fail", "child-2")).await; + tracker_a + .record_event(&AgentEvent::SubagentEnd { + task_id: "p1-fail".to_string(), + session_id: "child-2".to_string(), + agent: "general".to_string(), + output: "boom".to_string(), + success: false, + }) + .await; + tracker_a + .record_event(&inject("p1-cancel", "child-3")) + .await; + tracker_a + .register_canceller("p1-cancel", CancellationToken::new()) + .await; + let _ = session_a.cancel_subagent_task("p1-cancel").await; + + session_a.save().await.expect("phase A save"); + + let pre_save: Vec<(String, SubagentStatus)> = session_a + .subagent_tasks() + .await + .into_iter() + .map(|t| (t.task_id, t.status)) + .collect(); + assert_eq!(pre_save.len(), 3); + + // Drop everything from phase A. + drop(session_a); + drop(agent_a); + + // ----- Phase B: read ----- + let agent_b = Agent::from_config(offline_test_config()).await.unwrap(); + let resume_opts = SessionOptions::new().with_session_store(std::sync::Arc::clone(&store)); + let session_b = agent_b + .resume_session("pillar1-subagent-persist", resume_opts) + .expect("phase B resume"); + + let mut post_resume: Vec<(String, SubagentStatus)> = session_b + .subagent_tasks() + .await + .into_iter() + .map(|t| (t.task_id, t.status)) + .collect(); + post_resume.sort_by(|a, b| a.0.cmp(&b.0)); + let mut expected = pre_save.clone(); + expected.sort_by(|a, b| a.0.cmp(&b.0)); + assert_eq!( + post_resume, expected, + "resumed session must observe the same subagent task set & statuses" + ); + + // Cancellers are intentionally NOT restored. Cancelling an already- + // terminal task returns false (no live canceller), but must not panic + // and must keep the status stable. + let cancel_attempt = session_b.cancel_subagent_task("p1-done").await; + assert!( + !cancel_attempt, + "cancel on a restored terminal task must return false (no live canceller)" + ); + let still_done = session_b + .subagent_task("p1-done") + .await + .expect("snapshot still present"); + assert_eq!(still_done.status, SubagentStatus::Completed); +} + +/// IT-5 (Pillar 5): identity labels (tenant / principal / agent template / +/// correlation id) survive a session save/resume round trip and are +/// restored verbatim. These are framework-opaque strings that the host +/// (书安OS) uses for multi-tenancy / accounting / tracing — losing +/// them on migration breaks audit trails. +#[tokio::test(flavor = "multi_thread", worker_threads = 2)] +async fn identity_labels_persist_across_save_and_resume() { + use a3s_code_core::store::MemorySessionStore; + + let store: std::sync::Arc = + std::sync::Arc::new(MemorySessionStore::new()); + + // Phase A: write + let agent_a = Agent::from_config(offline_test_config()).await.unwrap(); + let opts_a = SessionOptions::new() + .with_session_id("pillar5-labels") + .with_session_store(std::sync::Arc::clone(&store)) + .with_auto_save(true) + .with_tenant_id("acme-prod") + .with_principal("svc-deploy-bot") + .with_agent_template_id("ci-runner-v7") + .with_correlation_id("trace-1234abcd"); + let session_a = agent_a + .session("/tmp/pillar5-labels", Some(opts_a)) + .expect("phase A session"); + + session_a.save().await.expect("phase A save"); + + assert_eq!(session_a.tenant_id(), Some("acme-prod")); + assert_eq!(session_a.correlation_id(), Some("trace-1234abcd")); + + drop(session_a); + drop(agent_a); + + // Phase B: resume on a fresh agent; supply only the store, no labels. + // Labels must be restored verbatim from the saved snapshot. + let agent_b = Agent::from_config(offline_test_config()).await.unwrap(); + let resume_opts = SessionOptions::new().with_session_store(std::sync::Arc::clone(&store)); + let session_b = agent_b + .resume_session("pillar5-labels", resume_opts) + .expect("phase B resume"); + + assert_eq!(session_b.tenant_id(), Some("acme-prod")); + assert_eq!(session_b.principal(), Some("svc-deploy-bot")); + assert_eq!(session_b.agent_template_id(), Some("ci-runner-v7")); + assert_eq!(session_b.correlation_id(), Some("trace-1234abcd")); + + // Caller-supplied labels on resume override the persisted ones — + // e.g. relabeling under a new correlation id for a follow-up trace. + drop(session_b); + let resume_relabel = SessionOptions::new() + .with_session_store(std::sync::Arc::clone(&store)) + .with_correlation_id("trace-followup"); + let session_c = agent_b + .resume_session("pillar5-labels", resume_relabel) + .expect("phase C resume"); + assert_eq!( + session_c.correlation_id(), + Some("trace-followup"), + "caller-supplied correlation_id must override persisted one" + ); + // Other labels still restored from snapshot. + assert_eq!(session_c.tenant_id(), Some("acme-prod")); +} + +/// IT-CONSOLIDATED (cluster ops): exercise the full cluster-grade +/// API surface in one realistic two-node lifecycle. This is the +/// reference flow 书安OS-side scheduling code targets. +/// +/// Two **separate** Agents share one MemorySessionStore (simulating +/// two cluster nodes mounting the same persistent store): +/// Node A: builds a session with identity labels + retention caps, +/// seeds a loop checkpoint, then drops everything. +/// Node B: loads the session by id, rehydrates labels + subagent +/// tracker, picks up the checkpointed run via resume_run. +/// +/// The host-supplied identity labels, retention caps, and persisted +/// subagent task snapshots must all survive the cross-node hop — +/// these are exactly the invariants 书安OS relies on for billing, +/// audit, and memory safety in a long-lived fleet. +#[tokio::test(flavor = "multi_thread", worker_threads = 2)] +async fn cluster_ops_consolidated_session_lifecycle() { + use a3s_code_core::loop_checkpoint::{LoopCheckpoint, LOOP_CHECKPOINT_SCHEMA_VERSION}; + use a3s_code_core::retention::SessionRetentionLimits; + use a3s_code_core::store::MemorySessionStore; + + let store: std::sync::Arc = + std::sync::Arc::new(MemorySessionStore::new()); + + // ------------------------------------------------------------------- + // Node A: create session, seed in-flight state, persist, then drop. + // ------------------------------------------------------------------- + let agent_a = Agent::from_config(offline_test_config()).await.unwrap(); + let limits_a = SessionRetentionLimits::new() + .with_max_runs(50) + .with_max_terminal_subagent_tasks(20); + let opts_a = SessionOptions::new() + .with_session_id("cluster-ops-target") + .with_session_store(std::sync::Arc::clone(&store)) + .with_auto_save(true) + .with_tenant_id("acme-prod") + .with_principal("svc-deploy-bot") + .with_agent_template_id("planner-v3") + .with_correlation_id("trace-cluster-ops") + .with_retention_limits(limits_a); + let session_a = agent_a + .session("/tmp/cluster-ops-node-a", Some(opts_a)) + .expect("node A session"); + + // Inject a completed subagent task — represents work that + // happened on node A and should survive migration. + let tracker_a = session_a.subagent_tracker(); + tracker_a + .record_event(&AgentEvent::SubagentStart { + task_id: "explore-1".to_string(), + session_id: "child-1".to_string(), + parent_session_id: session_a.id().to_string(), + agent: "explore".to_string(), + description: "find auth callsites".to_string(), + }) + .await; + tracker_a + .record_event(&AgentEvent::SubagentEnd { + task_id: "explore-1".to_string(), + session_id: "child-1".to_string(), + agent: "explore".to_string(), + output: "found 3 callsites".to_string(), + success: true, + }) + .await; + + session_a.save().await.expect("node A save"); + + // Seed a checkpoint as if a run was mid-tool-round when node A died. + let seeded_run_id = "in-flight-run-x"; + let cp = LoopCheckpoint { + schema_version: LOOP_CHECKPOINT_SCHEMA_VERSION, + run_id: seeded_run_id.to_string(), + session_id: session_a.id().to_string(), + turn: 2, + messages: vec![ + Message::user("refactor the auth module"), + Message { + role: "assistant".to_string(), + content: vec![a3s_code_core::llm::ContentBlock::Text { + text: "scanned callsites, planning edits".to_string(), + }], + reasoning_content: None, + }, + ], + total_usage: a3s_code_core::llm::TokenUsage { + prompt_tokens: 800, + completion_tokens: 200, + total_tokens: 1000, + cache_read_tokens: None, + cache_write_tokens: None, + }, + tool_calls_count: 1, + verification_reports: Vec::new(), + checkpoint_ms: 1_700_000_000_000, + }; + store + .save_loop_checkpoint(seeded_run_id, &cp) + .await + .expect("seed checkpoint"); + + // Node A goes down. + drop(session_a); + drop(agent_a); + + // ------------------------------------------------------------------- + // Node B: a different Agent picks up the session from the store. + // ------------------------------------------------------------------- + let agent_b = Agent::from_config(offline_test_config()).await.unwrap(); + let resume_opts = SessionOptions::new().with_session_store(std::sync::Arc::clone(&store)); + let session_b = agent_b + .resume_session("cluster-ops-target", resume_opts) + .expect("node B resume"); + + // Identity labels survive. + assert_eq!(session_b.tenant_id(), Some("acme-prod")); + assert_eq!(session_b.principal(), Some("svc-deploy-bot")); + assert_eq!(session_b.agent_template_id(), Some("planner-v3")); + assert_eq!(session_b.correlation_id(), Some("trace-cluster-ops")); + + // Subagent task history survives. + let restored_tasks = session_b.subagent_tasks().await; + assert_eq!(restored_tasks.len(), 1); + assert_eq!(restored_tasks[0].task_id, "explore-1"); + assert_eq!( + restored_tasks[0].status, + a3s_code_core::subagent_task_tracker::SubagentStatus::Completed + ); + + // Crashed run can be resumed from the persisted checkpoint via the + // session API. (Note: we don't actually call resume_run here + // because the test config has no real LLM credentials — that's + // covered by test_resume_run_picks_up_from_persisted_checkpoint + // which uses build_session with a mock client. We assert the + // *checkpoint contract* — what the run-resumption code reads — + // is intact across the migration.) + let cp_after = { + let s: std::sync::Arc = + std::sync::Arc::clone(&store); + s.load_loop_checkpoint(seeded_run_id) + .await + .expect("load checkpoint after migration") + .expect("checkpoint preserved") + }; + assert_eq!(cp_after.run_id, seeded_run_id); + assert_eq!(cp_after.turn, 2); + assert_eq!(cp_after.messages.len(), 2); + assert_eq!(cp_after.total_usage.total_tokens, 1000); + + // Node B can decide to clean up the old run id once it's done with + // resumption — the host (书安OS) tracks the old→new run mapping. + // The framework does not auto-delete checkpoints; that's the + // host's call. +} + +/// IT-9 (Retention): SessionOptions::with_retention_limits flows +/// through to the session's in-memory subagent task tracker so a +/// long-running session's terminal entries don't accumulate +/// unboundedly. Verified via the public tracker accessor — same +/// surface 书安OS would inspect / drive externally. +#[tokio::test(flavor = "multi_thread", worker_threads = 2)] +async fn retention_limits_are_plumbed_into_subagent_tracker() { + use a3s_code_core::retention::SessionRetentionLimits; + + let agent = Agent::from_config(offline_test_config()).await.unwrap(); + let limits = SessionRetentionLimits::new().with_max_terminal_subagent_tasks(2); + let opts = SessionOptions::new() + .with_session_id("it9-retention") + .with_retention_limits(limits); + let session = agent + .session("/tmp/it9-retention-ws", Some(opts)) + .expect("session"); + let tracker = session.subagent_tracker(); + + let parent = session.id().to_string(); + let start = |task_id: &str| AgentEvent::SubagentStart { + task_id: task_id.to_string(), + session_id: format!("{task_id}-child"), + parent_session_id: parent.clone(), + agent: "general".to_string(), + description: "seed".to_string(), + }; + let end = |task_id: &str| AgentEvent::SubagentEnd { + task_id: task_id.to_string(), + session_id: format!("{task_id}-child"), + agent: "general".to_string(), + output: "ok".to_string(), + success: true, + }; + + // Inject three completed tasks; the cap is 2 so the oldest must + // be evicted via the framework's FIFO terminal-cap policy. + for id in ["t-a", "t-b", "t-c"] { + tracker.record_event(&start(id)).await; + tracker.record_event(&end(id)).await; + } + + let surviving: Vec = session + .subagent_tasks() + .await + .into_iter() + .map(|t| t.task_id) + .collect(); + assert_eq!(surviving.len(), 2, "cap must be enforced"); + assert!(surviving.contains(&"t-b".to_string())); + assert!(surviving.contains(&"t-c".to_string())); + assert!( + !surviving.contains(&"t-a".to_string()), + "oldest terminal entry must be evicted by SessionRetentionLimits" + ); +} + +/// IT-6 (Pillar 3 cut 1): a `LoopCheckpoint` round-trips through the +/// `SessionStore` — this is the data contract 书安OS will sit on to +/// migrate / replay a run on another node. +/// +/// Cut 1 lands the data + persistence path. The actual in-loop +/// `persist_loop_checkpoint` call site is wired but exercising it +/// end-to-end needs a tool-using mock; the next cut will add that +/// integration coverage alongside the resume API. +#[tokio::test(flavor = "multi_thread", worker_threads = 2)] +async fn loop_checkpoint_round_trips_through_session_store() { + use a3s_code_core::llm::TokenUsage; + use a3s_code_core::loop_checkpoint::{LoopCheckpoint, LOOP_CHECKPOINT_SCHEMA_VERSION}; + use a3s_code_core::store::{MemorySessionStore, SessionStore}; + + let store: std::sync::Arc = std::sync::Arc::new(MemorySessionStore::new()); + + let run_id = "run-pillar3-roundtrip"; + let checkpoint = LoopCheckpoint { + schema_version: LOOP_CHECKPOINT_SCHEMA_VERSION, + run_id: run_id.to_string(), + session_id: "session-pillar3".to_string(), + turn: 4, + messages: vec![ + a3s_code_core::llm::Message::user("seed prompt"), + a3s_code_core::llm::Message { + role: "assistant".to_string(), + content: vec![a3s_code_core::llm::ContentBlock::Text { + text: "ack".to_string(), + }], + reasoning_content: None, + }, + ], + total_usage: TokenUsage { + prompt_tokens: 120, + completion_tokens: 30, + total_tokens: 150, + cache_read_tokens: None, + cache_write_tokens: None, + }, + tool_calls_count: 3, + verification_reports: Vec::new(), + checkpoint_ms: 1_700_000_000_000, + }; + + store + .save_loop_checkpoint(run_id, &checkpoint) + .await + .expect("save"); + + let loaded = store + .load_loop_checkpoint(run_id) + .await + .expect("load") + .expect("checkpoint present"); + + assert_eq!(loaded.run_id, run_id); + assert_eq!(loaded.session_id, "session-pillar3"); + assert_eq!(loaded.turn, 4); + assert_eq!(loaded.tool_calls_count, 3); + assert_eq!(loaded.messages.len(), 2); + assert_eq!(loaded.total_usage.total_tokens, 150); + assert_eq!(loaded.schema_version, LOOP_CHECKPOINT_SCHEMA_VERSION); + + // Overwrite semantics: a second save for the same run_id replaces + // the previous checkpoint (the loop only ever needs the latest). + let mut newer = loaded.clone(); + newer.turn = 5; + newer.tool_calls_count = 4; + store + .save_loop_checkpoint(run_id, &newer) + .await + .expect("save second"); + let again = store + .load_loop_checkpoint(run_id) + .await + .expect("load again") + .expect("checkpoint still present"); + assert_eq!(again.turn, 5); + assert_eq!(again.tool_calls_count, 4); + + // Unknown run id -> None. + let absent = store + .load_loop_checkpoint("does-not-exist") + .await + .expect("load missing"); + assert!(absent.is_none()); +} + +/// IT-7 (Pillar 3 cut 1): a `send()` whose LLM response carries no +/// tool calls must **not** write a loop checkpoint — the loop exits +/// at the no-tool boundary, before the per-tool-round persist point. +/// This guards against checkpoint pollution from purely conversational +/// turns. +#[tokio::test(flavor = "multi_thread", worker_threads = 2)] +async fn send_without_tool_calls_does_not_emit_loop_checkpoint() { + use a3s_code_core::store::{MemorySessionStore, SessionStore}; + + let store_arc: std::sync::Arc = + std::sync::Arc::new(MemorySessionStore::new()); + let store: std::sync::Arc = store_arc.clone(); + + let agent = Agent::from_config(offline_test_config()).await.unwrap(); + let opts = SessionOptions::new() + .with_session_id("pillar3-no-tool-call") + .with_session_store(std::sync::Arc::clone(&store)) + .with_auto_save(true); + let session = agent + .session("/tmp/pillar3-no-tools", Some(opts)) + .expect("session"); + + // Default session() routes through the real LLM (no mock client + // injection here), so we can't actually call send(). Instead, + // assert the *negative* property: with no run yet executed, no + // checkpoint exists for any run id we choose to query. + // + // This also documents the contract for 书安OS-side tooling: a + // session that hasn't completed a tool round has no checkpoint. + let probe = store + .load_loop_checkpoint("any-fake-run-id") + .await + .expect("probe"); + assert!(probe.is_none()); + + // Sanity: the session is set up correctly and would persist on + // tool rounds if the LLM emitted any. + assert!(!session.is_closed()); +} + +/// IT-8 (Pillar 3 cut 2): `AgentSession::resume_run` fails fast with a +/// helpful error when there is no checkpoint for the given run id, and +/// with a different error when no `SessionStore` is configured at all. +/// These are the error paths 书安OS-side scheduling code needs to +/// distinguish to decide between "retry later" and "fall back to a +/// fresh session". +#[tokio::test(flavor = "multi_thread", worker_threads = 2)] +async fn resume_run_error_paths_are_distinguishable() { + use a3s_code_core::store::MemorySessionStore; + + // Flavor A: no store on the session — resume_run must reject up + // front with a message that names the missing capability. + { + let agent = Agent::from_config(offline_test_config()).await.unwrap(); + let session = agent + .session("/tmp/it8-no-store", None) + .expect("session no store"); + let err = session.resume_run("any-id").await.unwrap_err(); + let msg = err.to_string(); + assert!( + msg.contains("session_store"), + "expected store-missing error, got: {msg}" + ); + } + + // Flavor B: store present but checkpoint absent — resume_run must + // reject with a message that names the missing run id. + { + let store: std::sync::Arc = + std::sync::Arc::new(MemorySessionStore::new()); + let agent = Agent::from_config(offline_test_config()).await.unwrap(); + let opts = SessionOptions::new() + .with_session_id("it8-no-checkpoint") + .with_session_store(std::sync::Arc::clone(&store)); + let session = agent + .session("/tmp/it8-no-checkpoint", Some(opts)) + .expect("session with store"); + let err = session.resume_run("does-not-exist").await.unwrap_err(); + let msg = err.to_string(); + assert!( + msg.contains("does-not-exist") && msg.contains("no loop checkpoint"), + "expected checkpoint-missing error naming the run id, got: {msg}" + ); + } +} diff --git a/sdk/node/Cargo.toml b/sdk/node/Cargo.toml index a2848a2..0db1b7b 100644 --- a/sdk/node/Cargo.toml +++ b/sdk/node/Cargo.toml @@ -1,6 +1,6 @@ [package] name = "a3s-code-node" -version = "3.2.1" +version = "3.3.0" edition = "2021" authors = ["A3S Lab Team"] license = "MIT" @@ -11,7 +11,7 @@ description = "A3S Code Node.js bindings - Native addon via napi-rs" crate-type = ["cdylib"] [dependencies] -a3s-code-core = { version = "3.2.1", path = "../../core", features = ["ahp", "s3"] } +a3s-code-core = { version = "3.3.0", path = "../../core", features = ["ahp", "s3"] } napi = { version = "2", features = ["async", "napi6", "serde-json"] } napi-derive = "2" tokio = { version = "1.35", features = ["full"] } diff --git a/sdk/node/examples/package-lock.json b/sdk/node/examples/package-lock.json index 2224103..61413f9 100644 --- a/sdk/node/examples/package-lock.json +++ b/sdk/node/examples/package-lock.json @@ -18,7 +18,7 @@ }, "..": { "name": "@a3s-lab/code", - "version": "3.2.1", + "version": "3.3.0", "license": "MIT", "devDependencies": { "@napi-rs/cli": "^2", @@ -27,12 +27,12 @@ "typescript": "^5.9.3" }, "optionalDependencies": { - "@a3s-lab/code-darwin-arm64": "3.2.1", - "@a3s-lab/code-linux-arm64-gnu": "3.2.1", - "@a3s-lab/code-linux-arm64-musl": "3.2.1", - "@a3s-lab/code-linux-x64-gnu": "3.2.1", - "@a3s-lab/code-linux-x64-musl": "3.2.1", - "@a3s-lab/code-win32-x64-msvc": "3.2.1" + "@a3s-lab/code-darwin-arm64": "3.3.0", + "@a3s-lab/code-linux-arm64-gnu": "3.3.0", + "@a3s-lab/code-linux-arm64-musl": "3.3.0", + "@a3s-lab/code-linux-x64-gnu": "3.3.0", + "@a3s-lab/code-linux-x64-musl": "3.3.0", + "@a3s-lab/code-win32-x64-msvc": "3.3.0" } }, "node_modules/@a3s-lab/code": { diff --git a/sdk/node/generated.d.ts b/sdk/node/generated.d.ts index 9eeddc4..a46329d 100644 --- a/sdk/node/generated.d.ts +++ b/sdk/node/generated.d.ts @@ -568,6 +568,35 @@ export interface SessionOptions { * ``` */ sessionId?: string + /** + * Host-defined tenant id. Opaque to the framework — propagated to + * SessionData, hooks, and traces for multi-tenant aggregation / + * billing. Pair with `principal` / `agentTemplateId` / + * `correlationId` for full identity context. + */ + tenantId?: string + /** + * Identity of the principal (user / service / etc.) that triggered + * this session. Treated as opaque. + */ + principal?: string + /** + * Logical identifier of the agent template / definition the session + * was instantiated from. + */ + agentTemplateId?: string + /** + * Distributed-trace correlation id propagated through this + * session's events. + */ + correlationId?: string + /** + * Optional FIFO retention caps on the session's in-memory stores. + * Cap any subset; missing fields keep the unbounded default for + * that store. Use this to stop long-running cluster sessions + * from leaking memory in the run / trace / subagent trackers. + */ + retentionLimits?: RetentionLimitsObject /** Automatically save the session to the configured store after each turn (default: false). */ autoSave?: boolean /** @@ -716,6 +745,50 @@ export interface McpServerStatusEntry { toolCount: number error?: string } +/** + * Shape of the JS handlers object accepted by `session.setBudgetGuard`. + * Each field is optional — methods that aren't provided fall back to + * the framework's default Allow / no-op behaviour. + */ +export interface BudgetGuardHandlers { + checkBeforeLlm?: (...args: any[]) => any + recordAfterLlm?: (...args: any[]) => any + checkBeforeTool?: (...args: any[]) => any + /** + * Max time (ms) to wait for a `check*` callback to return before + * the guard fails **closed** (denies). Default 5000. A guard that + * throws (so its return value never arrives) or hangs is denied + * after this deadline — budget enforcement never silently + * disables itself. + */ + timeoutMs?: number +} +/** + * FIFO retention caps on the session's in-memory stores. All fields + * optional; missing fields keep the unbounded default for that + * store. Use to cap memory growth across long-running cluster + * sessions. + */ +export interface RetentionLimitsObject { + /** + * Cap on the number of runs retained in InMemoryRunStore. + * When exceeded the oldest run is dropped along with its events. + */ + maxRunsRetained?: number + /** + * Cap on event records retained per run. Oldest events + * FIFO-dropped from each run's buffer past this cap. The + * snapshot's cumulative `eventCount` is not decremented. + */ + maxEventsPerRun?: number + /** Cap on events retained in InMemoryTraceSink. */ + maxTraceEvents?: number + /** + * Cap on **terminal** (Completed / Failed / Cancelled) subagent + * task snapshots. Running tasks are never evicted. + */ + maxTerminalSubagentTasks?: number +} /** MCP server metadata exposed to slash command handlers. */ export interface CommandMcpServerInfo { /** MCP server name. */ @@ -1076,6 +1149,45 @@ export declare class Agent { * @param options - Optional session overrides layered on top of the worker definition */ sessionForWorker(workspace: string, worker: WorkerAgentSpec, options?: SessionOptions | undefined | null): Session + /** + * List session IDs for every live session created from this agent. + * + * Sessions that have been dropped (no JS-side references remain) are + * pruned lazily on each call. Result is sorted for stable output. + */ + listSessions(): Promise> + /** + * Close a specific live session by its session ID. + * + * Returns `true` when a live session with the given id was found and + * transitioned from open to closed by this call; `false` when no live + * session has that id, or when it was already closed. + * + * Equivalent to calling `session.close()` directly, but does not + * require holding a reference to the session — handy for control-plane + * code that only knows the session ID. + */ + closeSession(sessionId: string): Promise + /** + * Close every live session created from this agent and disconnect + * background resources owned by the agent (global MCP connections). + * + * After this call, `agent.session(...)` and `agent.resumeSession(...)` + * reject with a "Session closed" error. Idempotent. + */ + close(): Promise + /** Whether `close()` has been called on this agent. */ + isClosed(): boolean + /** + * Disconnect every global MCP server idle longer than + * `idleThresholdMs`, returning the names disconnected. The server's + * registered config is kept — a later tool call reconnects on + * demand. Call periodically (e.g. every 60s with a 5-min threshold) + * from a host-side sweeper to release file descriptors and + * background workers from quiet MCP servers in long-running + * deployments. + */ + disconnectIdleMcp(idleThresholdMs: number): Promise> } /** Workspace-bound session. All LLM and tool operations happen here. */ export declare class Session { @@ -1088,6 +1200,19 @@ export declare class Session { send(request: string | SessionRequestOptions, history?: Array | null): Promise /** Alias for `send(...)` with a name that matches run/replay terminology. */ run(request: string | SessionRequestOptions, history?: Array | null): Promise + /** + * Resume a previously-checkpointed run on this session. + * + * Loads the latest loop checkpoint stored under `checkpointRunId` + * from the configured `SessionStore` and replays the agent loop + * from that boundary. A new run id is allocated for the resumed + * work; the relationship between the old and new run is host + * metadata. + * + * Rejects when the session has no `sessionStore` configured, or + * when no checkpoint exists for `checkpointRunId`. + */ + resumeRun(checkpointRunId: string): Promise /** * Send a prompt or request and get a streaming event iterator. * @@ -1387,6 +1512,14 @@ export declare class Session { get workspace(): string /** Return any deferred init warning (e.g. memory store failed to initialize). */ get initWarning(): string | null + /** Host-defined tenant id attached at session creation, if any. */ + get tenantId(): string | null + /** Identity of the principal that triggered the session, if any. */ + get principal(): string | null + /** Logical agent template / definition id, if any. */ + get agentTemplateId(): string | null + /** Distributed-trace correlation id propagated through this session, if any. */ + get correlationId(): string | null /** Save the session to the configured store. */ save(): Promise /** Check if memory is configured for this session. */ @@ -1509,4 +1642,41 @@ export declare class Session { * cleanly without waiting on session-scoped background workers. */ close(): void + /** + * Whether [`close`](#method.close) has been called on this session. + * + * Once `true`, calls to `send` / `stream` reject with a "Session closed" + * error instead of starting a new run. + */ + isClosed(): boolean + /** + * Install a host-supplied BudgetGuard on this session. + * + * Each callback receives a single context object: + * - `checkBeforeLlm({ sessionId, estimatedTokens }) -> BudgetDecision | null` + * - `recordAfterLlm({ sessionId, usage }) -> void` + * - `checkBeforeTool({ sessionId, toolName }) -> BudgetDecision | null` + * + * where `BudgetDecision` is one of: + * - `null` / `{ decision: 'allow' }` → allow + * - `{ decision: 'soft', resource, consumed, limit, message? }` → emits BudgetThresholdHit('soft'), proceeds + * - `{ decision: 'deny', resource, reason }` → aborts the call, throws "Budget exhausted" + * + * FAIL-CLOSED on hang: a `check*` callback that does not return + * within `timeoutMs` (default 5000) is treated as a **deny**, never + * a silent allow — a budget control must not disable itself when the + * guard stalls. A malformed/unreadable return likewise denies. + * + * ⚠吅 The callbacks MUST NOT throw. Due to a napi-rs limitation a JS + * exception thrown from a budget-guard callback aborts the host + * process (the return value cannot be converted). Wrap your logic in + * try/catch and return a decision (e.g. a deny) instead of throwing. + * (The Python SDK's BudgetGuard catches exceptions safely; only the + * Node binding has this constraint.) + * + * The guard takes effect on the next `send` / `stream`. Pass `null` + * for a method to leave it unhandled (default allow / no-op). Pass + * `null` for the whole handlers arg to clear the guard. + */ + setBudgetGuard(handlers: { checkBeforeLlm?: ((ctx: { sessionId: string; estimatedTokens: number }) => any) | null; recordAfterLlm?: ((ctx: { sessionId: string; usage: any }) => void) | null; checkBeforeTool?: ((ctx: { sessionId: string; toolName: string }) => any) | null; timeoutMs?: number | null } | null): void } diff --git a/sdk/node/package-lock.json b/sdk/node/package-lock.json index d0e864b..f8c85cf 100644 --- a/sdk/node/package-lock.json +++ b/sdk/node/package-lock.json @@ -1,12 +1,12 @@ { "name": "@a3s-lab/code", - "version": "3.2.1", + "version": "3.3.0", "lockfileVersion": 3, "requires": true, "packages": { "": { "name": "@a3s-lab/code", - "version": "3.2.1", + "version": "3.3.0", "license": "MIT", "devDependencies": { "@napi-rs/cli": "^2", @@ -15,12 +15,12 @@ "typescript": "^5.9.3" }, "optionalDependencies": { - "@a3s-lab/code-darwin-arm64": "3.2.1", - "@a3s-lab/code-linux-arm64-gnu": "3.2.1", - "@a3s-lab/code-linux-arm64-musl": "3.2.1", - "@a3s-lab/code-linux-x64-gnu": "3.2.1", - "@a3s-lab/code-linux-x64-musl": "3.2.1", - "@a3s-lab/code-win32-x64-msvc": "3.2.1" + "@a3s-lab/code-darwin-arm64": "3.3.0", + "@a3s-lab/code-linux-arm64-gnu": "3.3.0", + "@a3s-lab/code-linux-arm64-musl": "3.3.0", + "@a3s-lab/code-linux-x64-gnu": "3.3.0", + "@a3s-lab/code-linux-x64-musl": "3.3.0", + "@a3s-lab/code-win32-x64-msvc": "3.3.0" } }, "node_modules/@a3s-lab/code-darwin-arm64": { diff --git a/sdk/node/package.json b/sdk/node/package.json index 8304a02..b78afd4 100644 --- a/sdk/node/package.json +++ b/sdk/node/package.json @@ -1,6 +1,6 @@ { "name": "@a3s-lab/code", - "version": "3.2.1", + "version": "3.3.0", "description": "A3S Code - Native Node.js bindings for the coding-agent runtime", "main": "index.js", "types": "index.d.ts", @@ -43,11 +43,11 @@ "test:helpers": "node test-helpers.mjs" }, "optionalDependencies": { - "@a3s-lab/code-darwin-arm64": "3.2.1", - "@a3s-lab/code-linux-x64-gnu": "3.2.1", - "@a3s-lab/code-linux-x64-musl": "3.2.1", - "@a3s-lab/code-linux-arm64-gnu": "3.2.1", - "@a3s-lab/code-linux-arm64-musl": "3.2.1", - "@a3s-lab/code-win32-x64-msvc": "3.2.1" + "@a3s-lab/code-darwin-arm64": "3.3.0", + "@a3s-lab/code-linux-x64-gnu": "3.3.0", + "@a3s-lab/code-linux-x64-musl": "3.3.0", + "@a3s-lab/code-linux-arm64-gnu": "3.3.0", + "@a3s-lab/code-linux-arm64-musl": "3.3.0", + "@a3s-lab/code-win32-x64-msvc": "3.3.0" } } diff --git a/sdk/node/src/lib.rs b/sdk/node/src/lib.rs index ee5bc4a..77afb8c 100644 --- a/sdk/node/src/lib.rs +++ b/sdk/node/src/lib.rs @@ -1878,6 +1878,25 @@ pub struct SessionOptions { /// agent.resumeSession('my-session', { sessionStore: new FileSessionStore('./sessions') }); /// ``` pub session_id: Option, + /// Host-defined tenant id. Opaque to the framework — propagated to + /// SessionData, hooks, and traces for multi-tenant aggregation / + /// billing. Pair with `principal` / `agentTemplateId` / + /// `correlationId` for full identity context. + pub tenant_id: Option, + /// Identity of the principal (user / service / etc.) that triggered + /// this session. Treated as opaque. + pub principal: Option, + /// Logical identifier of the agent template / definition the session + /// was instantiated from. + pub agent_template_id: Option, + /// Distributed-trace correlation id propagated through this + /// session's events. + pub correlation_id: Option, + /// Optional FIFO retention caps on the session's in-memory stores. + /// Cap any subset; missing fields keep the unbounded default for + /// that store. Use this to stop long-running cluster sessions + /// from leaking memory in the run / trace / subagent trackers. + pub retention_limits: Option, /// Automatically save the session to the configured store after each turn (default: false). pub auto_save: Option, /// AHP transport configuration for external agent supervision. @@ -2402,6 +2421,34 @@ fn js_session_options_to_rust(options: Option) -> napi::Result Vec { + self.inner.list_sessions().await + } + + /// Close a specific live session by its session ID. + /// + /// Returns `true` when a live session with the given id was found and + /// transitioned from open to closed by this call; `false` when no live + /// session has that id, or when it was already closed. + /// + /// Equivalent to calling `session.close()` directly, but does not + /// require holding a reference to the session — handy for control-plane + /// code that only knows the session ID. + #[napi] + pub async fn close_session(&self, session_id: String) -> bool { + self.inner.close_session(&session_id).await + } + + /// Close every live session created from this agent and disconnect + /// background resources owned by the agent (global MCP connections). + /// + /// After this call, `agent.session(...)` and `agent.resumeSession(...)` + /// reject with a "Session closed" error. Idempotent. + #[napi] + pub async fn close(&self) { + self.inner.close().await + } + + /// Whether `close()` has been called on this agent. + #[napi] + pub fn is_closed(&self) -> bool { + self.inner.is_closed() + } + + /// Disconnect every global MCP server idle longer than + /// `idleThresholdMs`, returning the names disconnected. The server's + /// registered config is kept — a later tool call reconnects on + /// demand. Call periodically (e.g. every 60s with a 5-min threshold) + /// from a host-side sweeper to release file descriptors and + /// background workers from quiet MCP servers in long-running + /// deployments. + #[napi] + pub async fn disconnect_idle_mcp(&self, idle_threshold_ms: i64) -> Vec { + self.inner + .disconnect_idle_mcp(idle_threshold_ms.max(0) as u64) + .await + } } // ============================================================================ @@ -2928,6 +3028,27 @@ impl Session { send_session_request(self.inner.clone(), prompt, rust_history, rust_attachments).await } + /// Resume a previously-checkpointed run on this session. + /// + /// Loads the latest loop checkpoint stored under `checkpointRunId` + /// from the configured `SessionStore` and replays the agent loop + /// from that boundary. A new run id is allocated for the resumed + /// work; the relationship between the old and new run is host + /// metadata. + /// + /// Rejects when the session has no `sessionStore` configured, or + /// when no checkpoint exists for `checkpointRunId`. + #[napi] + pub async fn resume_run(&self, checkpoint_run_id: String) -> napi::Result { + let session = self.inner.clone(); + let result = get_runtime() + .spawn(async move { session.resume_run(&checkpoint_run_id).await }) + .await + .map_err(|e| napi::Error::from_reason(format!("Task join error: {e}")))? + .map_err(|e| napi::Error::from_reason(format!("{e}")))?; + Ok(AgentResult::from(result)) + } + /// Send a prompt or request and get a streaming event iterator. /// /// Returns an `EventStream`. Use `for await (const event of stream)` or call `.next()` manually. @@ -4042,6 +4163,30 @@ impl Session { self.inner.init_warning().map(|s| s.to_string()) } + /// Host-defined tenant id attached at session creation, if any. + #[napi(getter)] + pub fn tenant_id(&self) -> Option { + self.inner.tenant_id().map(|s| s.to_string()) + } + + /// Identity of the principal that triggered the session, if any. + #[napi(getter)] + pub fn principal(&self) -> Option { + self.inner.principal().map(|s| s.to_string()) + } + + /// Logical agent template / definition id, if any. + #[napi(getter)] + pub fn agent_template_id(&self) -> Option { + self.inner.agent_template_id().map(|s| s.to_string()) + } + + /// Distributed-trace correlation id propagated through this session, if any. + #[napi(getter)] + pub fn correlation_id(&self) -> Option { + self.inner.correlation_id().map(|s| s.to_string()) + } + // ======================================================================== // Session Persistence API // ======================================================================== @@ -4381,6 +4526,332 @@ impl Session { let session = self.inner.clone(); get_runtime().block_on(session.close()) } + + /// Whether [`close`](#method.close) has been called on this session. + /// + /// Once `true`, calls to `send` / `stream` reject with a "Session closed" + /// error instead of starting a new run. + #[napi] + pub fn is_closed(&self) -> bool { + self.inner.is_closed() + } + + /// Install a host-supplied BudgetGuard on this session. + /// + /// Each callback receives a single context object: + /// - `checkBeforeLlm({ sessionId, estimatedTokens }) -> BudgetDecision | null` + /// - `recordAfterLlm({ sessionId, usage }) -> void` + /// - `checkBeforeTool({ sessionId, toolName }) -> BudgetDecision | null` + /// + /// where `BudgetDecision` is one of: + /// - `null` / `{ decision: 'allow' }` → allow + /// - `{ decision: 'soft', resource, consumed, limit, message? }` → emits BudgetThresholdHit('soft'), proceeds + /// - `{ decision: 'deny', resource, reason }` → aborts the call, throws "Budget exhausted" + /// + /// FAIL-CLOSED on hang: a `check*` callback that does not return + /// within `timeoutMs` (default 5000) is treated as a **deny**, never + /// a silent allow — a budget control must not disable itself when the + /// guard stalls. A malformed/unreadable return likewise denies. + /// + /// ⚠️ The callbacks MUST NOT throw. Due to a napi-rs limitation a JS + /// exception thrown from a budget-guard callback aborts the host + /// process (the return value cannot be converted). Wrap your logic in + /// try/catch and return a decision (e.g. a deny) instead of throwing. + /// (The Python SDK's BudgetGuard catches exceptions safely; only the + /// Node binding has this constraint.) + /// + /// The guard takes effect on the next `send` / `stream`. Pass `null` + /// for a method to leave it unhandled (default allow / no-op). Pass + /// `null` for the whole handlers arg to clear the guard. + #[napi( + ts_args_type = "handlers: { checkBeforeLlm?: ((ctx: { sessionId: string; estimatedTokens: number }) => any) | null; recordAfterLlm?: ((ctx: { sessionId: string; usage: any }) => void) | null; checkBeforeTool?: ((ctx: { sessionId: string; toolName: string }) => any) | null; timeoutMs?: number | null } | null" + )] + pub fn set_budget_guard(&self, handlers: Option) -> napi::Result<()> { + use napi::threadsafe_function::{ErrorStrategy, ThreadSafeCallContext, ThreadsafeFunction}; + + let Some(h) = handlers else { + self.inner.set_budget_guard(None); + return Ok(()); + }; + + // Pass the call context as a SINGLE object arg so the JS callback + // signature is the clean `(ctx) => decision`. We use + // `ErrorStrategy::Fatal` (no leading `err` param). NOTE: in this + // napi-rs version a JS callback that THROWS aborts the host process + // at the return-value-conversion stage regardless of ErrorStrategy + // (CalleeHandled does not help) — so budget-guard callbacks MUST NOT + // throw; wrap your logic in try/catch and return a decision. Hangs + // are handled safely (fail-closed timeout below). + let single_obj = |ctx: ThreadSafeCallContext| { + Ok(vec![ctx.env.to_js_value(&ctx.value)?]) + }; + + let check_llm_tsfn: Option> = h + .check_before_llm + .map(|f| f.create_threadsafe_function(0, single_obj)) + .transpose()?; + + let record_tsfn: Option> = h + .record_after_llm + .map(|f| f.create_threadsafe_function(0, single_obj)) + .transpose()?; + + let check_tool_tsfn: Option> = + h.check_before_tool + .map(|f| f.create_threadsafe_function(0, single_obj)) + .transpose()?; + + let guard: Arc = Arc::new(NodeBudgetGuard { + check_before_llm: check_llm_tsfn, + record_after_llm: record_tsfn, + check_before_tool: check_tool_tsfn, + // Configurable; default 5s. On timeout the guard fails CLOSED + // (Deny), so a small value trades latency-on-hang for faster + // denial of a stuck guard. + timeout_ms: h.timeout_ms.map(|t| t as u64).unwrap_or(5_000), + }); + self.inner.set_budget_guard(Some(guard)); + Ok(()) + } +} + +// ============================================================================ +// Node-side BudgetGuard wrapper +// ============================================================================ + +/// Shape of the JS handlers object accepted by `session.setBudgetGuard`. +/// Each field is optional — methods that aren't provided fall back to +/// the framework's default Allow / no-op behaviour. +#[napi(object)] +pub struct BudgetGuardHandlers { + pub check_before_llm: Option, + pub record_after_llm: Option, + pub check_before_tool: Option, + /// Max time (ms) to wait for a `check*` callback to return before + /// the guard fails **closed** (denies). Default 5000. A guard that + /// throws (so its return value never arrives) or hangs is denied + /// after this deadline — budget enforcement never silently + /// disables itself. + pub timeout_ms: Option, +} + +/// FIFO retention caps on the session's in-memory stores. All fields +/// optional; missing fields keep the unbounded default for that +/// store. Use to cap memory growth across long-running cluster +/// sessions. +#[napi(object)] +pub struct RetentionLimitsObject { + /// Cap on the number of runs retained in InMemoryRunStore. + /// When exceeded the oldest run is dropped along with its events. + pub max_runs_retained: Option, + /// Cap on event records retained per run. Oldest events + /// FIFO-dropped from each run's buffer past this cap. The + /// snapshot's cumulative `eventCount` is not decremented. + pub max_events_per_run: Option, + /// Cap on events retained in InMemoryTraceSink. + pub max_trace_events: Option, + /// Cap on **terminal** (Completed / Failed / Cancelled) subagent + /// task snapshots. Running tasks are never evicted. + pub max_terminal_subagent_tasks: Option, +} + +struct NodeBudgetGuard { + check_before_llm: Option< + napi::threadsafe_function::ThreadsafeFunction< + serde_json::Value, + napi::threadsafe_function::ErrorStrategy::Fatal, + >, + >, + record_after_llm: Option< + napi::threadsafe_function::ThreadsafeFunction< + serde_json::Value, + napi::threadsafe_function::ErrorStrategy::Fatal, + >, + >, + check_before_tool: Option< + napi::threadsafe_function::ThreadsafeFunction< + serde_json::Value, + napi::threadsafe_function::ErrorStrategy::Fatal, + >, + >, + timeout_ms: u64, +} + +// SAFETY: ThreadsafeFunction is designed to be sent across threads. +unsafe impl Send for NodeBudgetGuard {} +unsafe impl Sync for NodeBudgetGuard {} + +impl NodeBudgetGuard { + fn call_decision( + &self, + tsfn: &napi::threadsafe_function::ThreadsafeFunction< + serde_json::Value, + napi::threadsafe_function::ErrorStrategy::Fatal, + >, + args: serde_json::Value, + ) -> a3s_code_core::budget::BudgetDecision { + let (tx, rx) = std::sync::mpsc::sync_channel::(1); + tsfn.call_with_return_value( + args, + napi::threadsafe_function::ThreadsafeFunctionCallMode::NonBlocking, + move |ret: napi::JsUnknown| { + // FAIL-CLOSED: if the JS return value can't even be read as + // a napi value, deny rather than allow. A budget guard is a + // cost/quota control — silently permitting on a broken + // response is the dangerous direction. (Explicit responses + // like null / {decision:'allow'} are still parsed leniently + // as Allow inside parse_js_budget_decision.) + let decision = parse_js_budget_decision(ret).unwrap_or_else(|_| { + a3s_code_core::budget::BudgetDecision::Deny { + resource: "budget_guard_error".to_string(), + reason: "budget guard return value could not be read".to_string(), + } + }); + let _ = tx.send(decision); + Ok(()) + }, + ); + // FAIL-CLOSED on timeout: a hung or throwing guard (under Fatal + // strategy a JS throw means the return closure never fires, so the + // channel stays empty and we hit this timeout) must DENY, not + // Allow. Previously this defaulted to Allow — meaning a slow/buggy + // guard silently disabled budget enforcement (a fail-open hole). + tokio::task::block_in_place(|| { + rx.recv_timeout(std::time::Duration::from_millis(self.timeout_ms)) + .unwrap_or_else(|_| a3s_code_core::budget::BudgetDecision::Deny { + resource: "budget_guard_timeout".to_string(), + reason: format!("budget guard did not respond within {}ms", self.timeout_ms), + }) + }) + } +} + +#[async_trait::async_trait] +impl a3s_code_core::budget::BudgetGuard for NodeBudgetGuard { + async fn check_before_llm( + &self, + session_id: &str, + estimated_prompt_tokens: usize, + ) -> a3s_code_core::budget::BudgetDecision { + let Some(tsfn) = self.check_before_llm.as_ref() else { + return a3s_code_core::budget::BudgetDecision::Allow; + }; + self.call_decision( + tsfn, + serde_json::json!({ + "sessionId": session_id, + "estimatedTokens": estimated_prompt_tokens, + }), + ) + } + + async fn record_after_llm(&self, session_id: &str, usage: &a3s_code_core::llm::TokenUsage) { + let Some(tsfn) = self.record_after_llm.as_ref() else { + return; + }; + let _ = tsfn.call( + serde_json::json!({ + "sessionId": session_id, + "usage": { + "promptTokens": usage.prompt_tokens, + "completionTokens": usage.completion_tokens, + "totalTokens": usage.total_tokens, + "cacheReadTokens": usage.cache_read_tokens, + "cacheWriteTokens": usage.cache_write_tokens, + }, + }), + napi::threadsafe_function::ThreadsafeFunctionCallMode::NonBlocking, + ); + } + + async fn check_before_tool( + &self, + session_id: &str, + tool_name: &str, + ) -> a3s_code_core::budget::BudgetDecision { + let Some(tsfn) = self.check_before_tool.as_ref() else { + return a3s_code_core::budget::BudgetDecision::Allow; + }; + self.call_decision( + tsfn, + serde_json::json!({ "sessionId": session_id, "toolName": tool_name }), + ) + } +} + +/// Parse the return value of a JS BudgetGuard callback into a +/// [`BudgetDecision`](a3s_code_core::budget::BudgetDecision). +/// +/// Accepted JS shapes mirror Python's: +/// - `null` / `undefined` / `{ decision: 'allow' }` → Allow +/// - `{ decision: 'soft', resource, consumed, limit, message? }` → SoftLimit +/// - `{ decision: 'deny', resource, reason }` → Deny +fn parse_js_budget_decision( + val: napi::JsUnknown, +) -> napi::Result { + use a3s_code_core::budget::BudgetDecision; + use napi::{JsObject, ValueType}; + + match val.get_type()? { + ValueType::Null | ValueType::Undefined => Ok(BudgetDecision::Allow), + ValueType::Object => { + let obj = unsafe { val.cast::() }; + let decision: String = obj + .get_named_property::("decision") + .ok() + .and_then(|s| s.into_utf8().ok()) + .and_then(|s| s.into_owned().ok()) + .unwrap_or_else(|| "allow".to_string()); + match decision.as_str() { + "deny" => { + let resource = obj + .get_named_property::("resource") + .ok() + .and_then(|s| s.into_utf8().ok()) + .and_then(|s| s.into_owned().ok()) + .unwrap_or_else(|| "unspecified".to_string()); + let reason = obj + .get_named_property::("reason") + .ok() + .and_then(|s| s.into_utf8().ok()) + .and_then(|s| s.into_owned().ok()) + .unwrap_or_else(|| "denied by host".to_string()); + Ok(BudgetDecision::Deny { resource, reason }) + } + "soft" => { + let resource = obj + .get_named_property::("resource") + .ok() + .and_then(|s| s.into_utf8().ok()) + .and_then(|s| s.into_owned().ok()) + .unwrap_or_else(|| "unspecified".to_string()); + let consumed = obj + .get_named_property::("consumed") + .ok() + .and_then(|n| n.get_double().ok()) + .unwrap_or(0.0); + let limit = obj + .get_named_property::("limit") + .ok() + .and_then(|n| n.get_double().ok()) + .unwrap_or(0.0); + let message = obj + .get_named_property::("message") + .ok() + .and_then(|s| s.into_utf8().ok()) + .and_then(|s| s.into_owned().ok()); + Ok(BudgetDecision::SoftLimit { + resource, + consumed, + limit, + message, + }) + } + _ => Ok(BudgetDecision::Allow), + } + } + _ => Ok(BudgetDecision::Allow), + } } // ============================================================================ diff --git a/sdk/node/test_budget_guard.mjs b/sdk/node/test_budget_guard.mjs new file mode 100644 index 0000000..a06f22f --- /dev/null +++ b/sdk/node/test_budget_guard.mjs @@ -0,0 +1,85 @@ +// Smoke test for the Node SDK BudgetGuard bridge. +// +// Verifies that a JS guard whose checkBeforeLlm returns +// { decision: 'deny', ... } aborts session.send before the LLM is +// touched. Runs with `node sdk/node/test_budget_guard.mjs`. + +import assert from 'node:assert/strict' +import os from 'node:os' +import path from 'node:path' +import fs from 'node:fs' +import mod from './index.js' + +const tmpRoot = fs.mkdtempSync(path.join(os.tmpdir(), 'a3s-budget-')) +const workspace = path.join(tmpRoot, 'workspace') +fs.mkdirSync(workspace, { recursive: true }) + +const inlineConfig = ` +default_model = "anthropic/claude-sonnet-4-20250514" + +providers "anthropic" { + api_key = "test-key" + models "claude-sonnet-4-20250514" { + name = "Claude Sonnet 4" + } +} +`.trim() + +const agent = await mod.Agent.create(inlineConfig) + +const session = agent.session(workspace, { + sessionId: 'budget-deny-node', + permissionPolicy: { defaultDecision: 'allow' }, + workspaceBackend: new mod.LocalWorkspaceBackend(workspace), +}) + +let llmChecks = 0 +let llmRecords = 0 +let toolChecks = 0 + +session.setBudgetGuard({ + checkBeforeLlm: (ctx) => { + llmChecks += 1 + assert.equal(ctx.sessionId, 'budget-deny-node', `wrong session_id, got ${ctx.sessionId}`) + assert.equal(typeof ctx.estimatedTokens, 'number', 'estimatedTokens must be a number') + return { decision: 'deny', resource: 'llm_tokens', reason: 'cap hit' } + }, + recordAfterLlm: (_ctx) => { + llmRecords += 1 + }, + checkBeforeTool: (_ctx) => { + toolChecks += 1 + return null + }, +}) + +let threw = false +try { + await session.send('hello') +} catch (err) { + threw = true + const msg = String(err).toLowerCase() + assert.ok( + msg.includes('budget exhausted') || msg.includes('llm_tokens'), + `expected budget-exhausted error, got: ${err}`, + ) +} +assert.equal(threw, true, 'send() must throw when checkBeforeLlm denies') +assert.equal(llmChecks, 1, `checkBeforeLlm must fire exactly once, got ${llmChecks}`) +assert.equal(llmRecords, 0, `recordAfterLlm must not fire on Deny, got ${llmRecords}`) +assert.equal(toolChecks, 0, 'no tool was attempted; toolChecks must stay 0') + +// Clearing the guard restores Allow-default behaviour. The mock LLM +// configured by `test_session_close.mjs` is not present here, so a +// real send would still fail at the provider level — we just verify +// that setBudgetGuard(null) is accepted without error. +session.setBudgetGuard(null) + +// Fail-closed semantics for HANGS / malformed returns are enforced in +// the bridge (timeout -> Deny, unreadable return -> Deny). We do not +// exercise a THROWING guard here: due to a napi-rs limitation a JS throw +// from the callback aborts the host process at return-value conversion +// (documented on setBudgetGuard — guards must not throw). The Python SDK +// budget-guard test (test_budget_guard.py) covers the throw-safe path. + +console.log('node sdk budget guard ok') diff --git a/sdk/node/test_session_close.mjs b/sdk/node/test_session_close.mjs new file mode 100644 index 0000000..aad4c39 --- /dev/null +++ b/sdk/node/test_session_close.mjs @@ -0,0 +1,112 @@ +// Smoke test for the Agent / Session close surface exposed by the +// core in steps 1–3 and propagated through the NAPI bindings in step 4. +// +// Run with: +// node sdk/node/test_session_close.mjs +// (no provider credentials needed — uses inline ACL). + +import assert from 'node:assert/strict' +import os from 'node:os' +import path from 'node:path' +import fs from 'node:fs' +import mod from './index.js' + +const tmpRoot = fs.mkdtempSync(path.join(os.tmpdir(), 'a3s-node-close-')) +const workspace = path.join(tmpRoot, 'workspace') +fs.mkdirSync(workspace, { recursive: true }) + +const inlineConfig = ` +default_model = "anthropic/claude-sonnet-4-20250514" + +providers "anthropic" { + api_key = "test-key" + models "claude-sonnet-4-20250514" { + name = "Claude Sonnet 4" + } +} +`.trim() + +const agent = await mod.Agent.create(inlineConfig) + +function makeSession(sessionId) { + return agent.session(workspace, { + sessionId, + permissionPolicy: { defaultDecision: 'allow' }, + workspaceBackend: new mod.LocalWorkspaceBackend(workspace), + }) +} + +// 1. Fresh session: isClosed is false; agent.listSessions sees it. +const sessionA = makeSession('node-close-1') +assert.equal(sessionA.isClosed(), false, 'fresh session should not be closed') + +const listedBefore = await agent.listSessions() +assert.ok( + listedBefore.includes('node-close-1'), + `agent.listSessions() should include node-close-1, got ${JSON.stringify(listedBefore)}`, +) + +// 2. session.close() flips isClosed and is idempotent. +sessionA.close() +assert.equal(sessionA.isClosed(), true, 'session.close() must set isClosed = true') +sessionA.close() // second close must not throw +assert.equal(sessionA.isClosed(), true) + +// 3. agent.closeSession(id) on a new live session closes it. +const sessionB = makeSession('node-close-2') +assert.equal(sessionB.isClosed(), false) +const wasOpen = await agent.closeSession('node-close-2') +assert.equal( + wasOpen, + true, + `closeSession() on a live session must return true, got ${wasOpen}`, +) +assert.equal( + sessionB.isClosed(), + true, + 'closeSession() must propagate to the JS wrapper\'s isClosed view', +) + +// 4. closeSession() on an unknown id returns false, doesn't throw. +const unknown = await agent.closeSession('does-not-exist') +assert.equal( + unknown, + false, + `closeSession() on unknown id must return false, got ${unknown}`, +) + +// 5. agent.close() closes every live session and rejects new session(). +const sessionC = makeSession('node-close-3') +const sessionD = makeSession('node-close-4') +assert.equal(sessionC.isClosed(), false) +assert.equal(sessionD.isClosed(), false) + +await agent.close() +assert.equal(agent.isClosed(), true, 'agent.isClosed() must be true after agent.close()') +assert.equal(sessionC.isClosed(), true, 'agent.close() must close sessionC') +assert.equal(sessionD.isClosed(), true, 'agent.close() must close sessionD') + +let threw = false +try { + makeSession('node-close-post') +} catch (err) { + threw = true + const msg = String(err).toLowerCase() + assert.ok( + msg.includes('closed'), + `post-close session() error must mention 'closed', got: ${err}`, + ) +} +assert.equal(threw, true, 'session() after agent.close() must throw') + +// disconnectIdleMcp is exposed and returns an array (empty here — the +// inline config registers no MCP servers). Call on a fresh agent since +// the one above is closed. +{ + const agent2 = await mod.Agent.create(inlineConfig) + const dropped = await agent2.disconnectIdleMcp(5 * 60 * 1000) + assert.ok(Array.isArray(dropped), 'disconnectIdleMcp must return an array') + assert.equal(dropped.length, 0, 'no MCP servers configured -> nothing dropped') +} + +console.log('node sdk session close api ok') diff --git a/sdk/python-bootstrap/pyproject.toml b/sdk/python-bootstrap/pyproject.toml index 0cac45e..2388250 100644 --- a/sdk/python-bootstrap/pyproject.toml +++ b/sdk/python-bootstrap/pyproject.toml @@ -7,7 +7,7 @@ name = "a3s-code" # Keep in sync with crates/code core release. The bootstrap loader fetches # the matching native wheel from `https://github.com/AI45Lab/Code/releases/tag/v` # at import time. -version = "3.2.1" +version = "3.3.0" description = "A3S Code Python SDK — pure-Python bootstrap that fetches the native wheel from GitHub Releases" readme = "README.md" license = {text = "MIT"} diff --git a/sdk/python-bootstrap/src/a3s_code/_bootstrap.py b/sdk/python-bootstrap/src/a3s_code/_bootstrap.py index e1ed30e..ac6bd83 100644 --- a/sdk/python-bootstrap/src/a3s_code/_bootstrap.py +++ b/sdk/python-bootstrap/src/a3s_code/_bootstrap.py @@ -31,7 +31,7 @@ # Version is the bootstrap's own version, which equals the matching native # wheel version on GH Releases. Bumped by the release workflow. -__version__ = "3.2.1" +__version__ = "3.3.0" _DEFAULT_BASE_URL = "https://github.com/AI45Lab/Code/releases/download" _REQUEST_TIMEOUT_S = 120 diff --git a/sdk/python/Cargo.toml b/sdk/python/Cargo.toml index 7586148..0a6b5a0 100644 --- a/sdk/python/Cargo.toml +++ b/sdk/python/Cargo.toml @@ -1,6 +1,6 @@ [package] name = "a3s-code-py" -version = "3.2.1" +version = "3.3.0" edition = "2021" authors = ["A3S Lab Team"] license = "MIT" @@ -12,7 +12,7 @@ name = "a3s_code" crate-type = ["cdylib"] [dependencies] -a3s-code-core = { version = "3.2.1", path = "../../core", features = ["ahp", "s3"] } +a3s-code-core = { version = "3.3.0", path = "../../core", features = ["ahp", "s3"] } pyo3 = "0.23" tokio = { version = "1.35", features = ["full"] } serde_json = "1.0" diff --git a/sdk/python/pyproject.toml b/sdk/python/pyproject.toml index 7224a82..53a4ad1 100644 --- a/sdk/python/pyproject.toml +++ b/sdk/python/pyproject.toml @@ -4,7 +4,7 @@ build-backend = "maturin" [project] name = "a3s-code" -version = "3.2.1" +version = "3.3.0" description = "A3S Code - Native Python bindings for the coding-agent runtime" readme = "README.md" license = {text = "MIT"} diff --git a/sdk/python/src/lib.rs b/sdk/python/src/lib.rs index 068569d..9a3c876 100644 --- a/sdk/python/src/lib.rs +++ b/sdk/python/src/lib.rs @@ -1257,6 +1257,55 @@ impl PyAgent { inner: Arc::new(session), }) } + + /// List session IDs for every live session created from this agent. + /// + /// Sessions that have been dropped (no Python references remain) are + /// pruned lazily on each call. Result is sorted for stable output. + fn list_sessions(&self, py: Python<'_>) -> Vec { + let agent = self.inner.clone(); + py.allow_threads(move || get_runtime().block_on(agent.list_sessions())) + } + + /// Close a specific live session by its session ID. + /// + /// Returns ``True`` when a live session with the given id was found and + /// transitioned from open to closed by this call; ``False`` when no + /// live session has that id, or when it was already closed. + fn close_session(&self, py: Python<'_>, session_id: String) -> bool { + let agent = self.inner.clone(); + py.allow_threads(move || get_runtime().block_on(agent.close_session(&session_id))) + } + + /// Close every live session created from this agent and disconnect + /// background resources owned by the agent (global MCP connections). + /// + /// After this call, ``agent.session(...)`` and ``agent.resume_session(...)`` + /// raise ``RuntimeError`` with a "Session closed" message. Idempotent. + fn close(&self, py: Python<'_>) { + let agent = self.inner.clone(); + py.allow_threads(move || get_runtime().block_on(agent.close())); + } + + /// Whether ``close()`` has been called on this agent. + #[getter] + fn is_closed(&self) -> bool { + self.inner.is_closed() + } + + /// Disconnect every global MCP server idle longer than + /// ``idle_threshold_ms``, returning the names disconnected. The + /// server's registered config is kept — a later tool call reconnects + /// on demand. Call periodically (e.g. every 60s with a 5-min + /// threshold) from a host-side sweeper to release file descriptors + /// and background workers from quiet MCP servers in long-running + /// deployments. + fn disconnect_idle_mcp(&self, py: Python<'_>, idle_threshold_ms: u64) -> Vec { + let agent = self.inner.clone(); + py.allow_threads(move || { + get_runtime().block_on(agent.disconnect_idle_mcp(idle_threshold_ms)) + }) + } } // ============================================================================ @@ -1314,6 +1363,22 @@ impl PySession { self.send(py, prompt, history) } + /// Resume a previously-checkpointed run on this session. + /// + /// Loads the latest loop checkpoint stored under ``checkpoint_run_id`` + /// and replays the agent loop from that boundary. A new run id is + /// allocated for the resumed work. + /// + /// Raises ``RuntimeError`` when no ``session_store`` is configured, + /// or when no checkpoint exists for the given id. + fn resume_run(&self, py: Python<'_>, checkpoint_run_id: String) -> PyResult { + let session = self.inner.clone(); + let result = py + .allow_threads(move || get_runtime().block_on(session.resume_run(&checkpoint_run_id))) + .map_err(|e| PyRuntimeError::new_err(format!("{e}")))?; + Ok(PyAgentResult::from(result)) + } + /// Send a prompt or request and get a streaming iterator of events. /// /// When ``history`` is omitted, session history and verification evidence are @@ -2625,6 +2690,31 @@ impl PySession { self.inner.init_warning().map(|s| s.to_string()) } + /// Host-defined tenant id attached at session creation, if any. + #[getter] + fn tenant_id(&self) -> Option { + self.inner.tenant_id().map(|s| s.to_string()) + } + + /// Identity of the principal that triggered the session, if any. + #[getter] + fn principal(&self) -> Option { + self.inner.principal().map(|s| s.to_string()) + } + + /// Logical agent template / definition id, if any. + #[getter] + fn agent_template_id(&self) -> Option { + self.inner.agent_template_id().map(|s| s.to_string()) + } + + /// Distributed-trace correlation id propagated through this session, + /// if any. + #[getter] + fn correlation_id(&self) -> Option { + self.inner.correlation_id().map(|s| s.to_string()) + } + // ======================================================================== // Session Persistence API // ======================================================================== @@ -2957,6 +3047,15 @@ impl PySession { Ok(()) } + /// Whether ``close()`` has been called on this session. + /// + /// Once ``True``, calls to ``send`` / ``stream`` raise ``RuntimeError`` + /// with a "Session closed" message instead of starting a new run. + #[getter] + fn is_closed(&self) -> bool { + self.inner.is_closed() + } + fn __repr__(&self) -> String { format!( "Session(id='{}', workspace='{}')", @@ -3089,6 +3188,231 @@ fn parse_py_hook_response( Ok(RustHookResponse::continue_()) } +// ============================================================================ +// Python BudgetGuard bridge +// ============================================================================ + +/// Bridges a Python BudgetGuard instance into the Rust async +/// [`a3s_code_core::budget::BudgetGuard`] trait. +/// +/// Looks up `check_before_llm`, `record_after_llm`, and +/// `check_before_tool` on the held `PyObject` at call time, so the +/// user's Python class only needs to define the methods it cares +/// about — missing methods are treated as a permissive default +/// (Allow / no-op). +/// +/// Calls into Python acquire the GIL via `Python::with_gil`, which +/// blocks the tokio worker thread briefly. Acceptable here because +/// `BudgetGuard` is called at most once per LLM turn / tool call, +/// not on a hot path. +/// +/// RE-ENTRANCY WARNING: do **not** call session/agent APIs (or any +/// blocking Rust path) from inside a Python budget-guard callback. The +/// tokio worker thread is already blocked acquiring the GIL to run the +/// callback; re-entering the runtime from there risks a deadlock or +/// re-entrancy panic. Budget guards should be pure policy — inspect the +/// args, consult host-side counters, return a decision. +struct PyBudgetGuard { + inner: pyo3::Py, +} + +impl PyBudgetGuard { + fn new(inner: pyo3::Py) -> Self { + Self { inner } + } +} + +#[async_trait::async_trait] +impl a3s_code_core::budget::BudgetGuard for PyBudgetGuard { + async fn check_before_llm( + &self, + session_id: &str, + estimated_prompt_tokens: usize, + ) -> a3s_code_core::budget::BudgetDecision { + pyo3::Python::with_gil(|py| { + let inner = self.inner.bind(py); + let method = match inner.getattr("check_before_llm") { + Ok(m) if !m.is_none() => m, + _ => return a3s_code_core::budget::BudgetDecision::Allow, + }; + match method.call1((session_id, estimated_prompt_tokens)) { + Ok(val) => parse_py_budget_decision(&val), + Err(e) => { + eprintln!( + "[a3s-code] warning: Python BudgetGuard.check_before_llm raised: {e}; defaulting to Allow" + ); + a3s_code_core::budget::BudgetDecision::Allow + } + } + }) + } + + async fn record_after_llm(&self, session_id: &str, usage: &a3s_code_core::llm::TokenUsage) { + pyo3::Python::with_gil(|py| { + let inner = self.inner.bind(py); + let method = match inner.getattr("record_after_llm") { + Ok(m) if !m.is_none() => m, + _ => return, + }; + // Hand Python a dict so they don't have to construct a + // TokenUsage type on their side. + let usage_dict = pyo3::types::PyDict::new(py); + let _ = usage_dict.set_item("prompt_tokens", usage.prompt_tokens); + let _ = usage_dict.set_item("completion_tokens", usage.completion_tokens); + let _ = usage_dict.set_item("total_tokens", usage.total_tokens); + let _ = usage_dict.set_item("cache_read_tokens", usage.cache_read_tokens); + let _ = usage_dict.set_item("cache_write_tokens", usage.cache_write_tokens); + if let Err(e) = method.call1((session_id, usage_dict)) { + eprintln!( + "[a3s-code] warning: Python BudgetGuard.record_after_llm raised: {e}; ignored" + ); + } + }) + } + + async fn check_before_tool( + &self, + session_id: &str, + tool_name: &str, + ) -> a3s_code_core::budget::BudgetDecision { + pyo3::Python::with_gil(|py| { + let inner = self.inner.bind(py); + let method = match inner.getattr("check_before_tool") { + Ok(m) if !m.is_none() => m, + _ => return a3s_code_core::budget::BudgetDecision::Allow, + }; + match method.call1((session_id, tool_name)) { + Ok(val) => parse_py_budget_decision(&val), + Err(e) => { + eprintln!( + "[a3s-code] warning: Python BudgetGuard.check_before_tool raised: {e}; defaulting to Allow" + ); + a3s_code_core::budget::BudgetDecision::Allow + } + } + }) + } +} + +/// Parse the return value of a Python BudgetGuard method into a +/// [`BudgetDecision`](a3s_code_core::budget::BudgetDecision). +/// +/// Accepted shapes: +/// - `None` → Allow +/// - `{"decision": "allow"}` → Allow +/// - `{"decision": "soft", "resource": str, "consumed": float, +/// "limit": float, "message"?: str}` → SoftLimit +/// - `{"decision": "deny", "resource": str, "reason": str}` → Deny +fn parse_py_budget_decision( + val: &pyo3::Bound, +) -> a3s_code_core::budget::BudgetDecision { + use a3s_code_core::budget::BudgetDecision; + use pyo3::types::PyDict; + + if val.is_none() { + return BudgetDecision::Allow; + } + + let Ok(dict) = val.downcast::() else { + return BudgetDecision::Allow; + }; + + let decision = dict + .get_item("decision") + .ok() + .flatten() + .and_then(|v| v.extract::().ok()) + .unwrap_or_else(|| "allow".to_string()); + + match decision.as_str() { + "deny" => { + let resource = dict + .get_item("resource") + .ok() + .flatten() + .and_then(|v| v.extract::().ok()) + .unwrap_or_else(|| "unspecified".to_string()); + let reason = dict + .get_item("reason") + .ok() + .flatten() + .and_then(|v| v.extract::().ok()) + .unwrap_or_else(|| "denied by host".to_string()); + BudgetDecision::Deny { resource, reason } + } + "soft" => { + let resource = dict + .get_item("resource") + .ok() + .flatten() + .and_then(|v| v.extract::().ok()) + .unwrap_or_else(|| "unspecified".to_string()); + let consumed = dict + .get_item("consumed") + .ok() + .flatten() + .and_then(|v| v.extract::().ok()) + .unwrap_or(0.0); + let limit = dict + .get_item("limit") + .ok() + .flatten() + .and_then(|v| v.extract::().ok()) + .unwrap_or(0.0); + let message = dict + .get_item("message") + .ok() + .flatten() + .and_then(|v| v.extract::().ok()); + BudgetDecision::SoftLimit { + resource, + consumed, + limit, + message, + } + } + _ => BudgetDecision::Allow, + } +} + +/// Convert a Python dict (`{max_runs_retained: int, ...}`) into a +/// [`SessionRetentionLimits`](a3s_code_core::retention::SessionRetentionLimits). +/// Returns `None` if the supplied object is not a dict (caller treats +/// that as "no caps" and the framework default applies). +fn parse_py_retention_limits( + py_obj: &pyo3::PyObject, +) -> Option { + use a3s_code_core::retention::SessionRetentionLimits; + use pyo3::types::PyDict; + + pyo3::Python::with_gil(|py| { + let bound = py_obj.bind(py); + let dict = bound.downcast::().ok()?; + let mut limits = SessionRetentionLimits::new(); + if let Some(v) = dict.get_item("max_runs_retained").ok().flatten() { + if let Ok(n) = v.extract::() { + limits.max_runs_retained = Some(n); + } + } + if let Some(v) = dict.get_item("max_events_per_run").ok().flatten() { + if let Ok(n) = v.extract::() { + limits.max_events_per_run = Some(n); + } + } + if let Some(v) = dict.get_item("max_trace_events").ok().flatten() { + if let Ok(n) = v.extract::() { + limits.max_trace_events = Some(n); + } + } + if let Some(v) = dict.get_item("max_terminal_subagent_tasks").ok().flatten() { + if let Ok(n) = v.extract::() { + limits.max_terminal_subagent_tasks = Some(n); + } + } + Some(limits) + }) +} + // ============================================================================ // PySlashCommand — bridges Python callables into the Rust SlashCommand trait // ============================================================================ @@ -4252,6 +4576,18 @@ struct PySessionOptions { /// # Later: /// resumed = agent.resume_session('my-session', opts) session_id: Option, + /// Host-defined tenant id. Opaque to the framework — propagated to + /// SessionData / hooks / traces for multi-tenant aggregation. + tenant_id: Option, + /// Principal identity (user / service / etc) that triggered the + /// session. Treated as opaque. + principal: Option, + /// Logical id of the agent template the session was instantiated + /// from. + agent_template_id: Option, + /// Distributed-trace correlation id propagated through this + /// session's events. + correlation_id: Option, /// Automatically save the session to the configured store after each turn (default: False). auto_save: bool, /// AHP transport configuration for external agent supervision. @@ -4265,6 +4601,29 @@ struct PySessionOptions { /// opts.ahp_transport = StdioTransport(program='python', args=['ahp_server.py']) /// session = agent.session('.', opts) ahp_transport: Option, + /// Optional Python-side BudgetGuard. The framework calls + /// `check_before_llm(session_id, estimated_tokens)`, + /// `record_after_llm(session_id, usage_dict)`, and + /// `check_before_tool(session_id, tool_name)` on this object. + /// Methods that aren't defined behave as Allow / no-op. + /// + /// Return shapes for check_*: ``None`` or ``{"decision":"allow"}`` + /// allows; ``{"decision":"soft","resource":...,"consumed":...,"limit":...,"message":...}`` + /// emits BudgetThresholdHit("soft"); ``{"decision":"deny","resource":...,"reason":...}`` + /// aborts the call with a ``Budget exhausted`` RuntimeError. + budget_guard: Option, + /// Optional FIFO retention caps on the session's in-memory stores. + /// Accepts a dict with optional integer keys: + /// + /// - ``max_runs_retained`` -- cap on InMemoryRunStore.runs + /// - ``max_events_per_run`` -- cap on per-run event buffers + /// - ``max_trace_events`` -- cap on InMemoryTraceSink + /// - ``max_terminal_subagent_tasks`` -- cap on terminal subagent entries + /// + /// Missing keys keep the unbounded default for that store. Used by + /// long-running cluster sessions to stop in-memory state from + /// growing unboundedly. + retention_limits: Option, } impl Clone for PySessionOptions { @@ -4315,10 +4674,20 @@ impl Clone for PySessionOptions { max_continuation_turns: self.max_continuation_turns, max_execution_time_ms: self.max_execution_time_ms, session_id: self.session_id.clone(), + tenant_id: self.tenant_id.clone(), + principal: self.principal.clone(), + agent_template_id: self.agent_template_id.clone(), + correlation_id: self.correlation_id.clone(), auto_save: self.auto_save, ahp_transport: pyo3::Python::with_gil(|py| { self.ahp_transport.as_ref().map(|o| o.clone_ref(py)) }), + budget_guard: pyo3::Python::with_gil(|py| { + self.budget_guard.as_ref().map(|o| o.clone_ref(py)) + }), + retention_limits: pyo3::Python::with_gil(|py| { + self.retention_limits.as_ref().map(|o| o.clone_ref(py)) + }), } } } @@ -4365,8 +4734,14 @@ impl PySessionOptions { max_continuation_turns: None, max_execution_time_ms: None, session_id: None, + tenant_id: None, + principal: None, + agent_template_id: None, + correlation_id: None, auto_save: false, ahp_transport: None, + budget_guard: None, + retention_limits: None, } } @@ -4813,6 +5188,51 @@ impl PySessionOptions { self.session_id = value; } + /// Host-defined tenant id. Opaque to the framework — used by hooks + /// / traces / SessionData for multi-tenant aggregation. + #[getter] + fn get_tenant_id(&self) -> Option { + self.tenant_id.clone() + } + + #[setter] + fn set_tenant_id(&mut self, value: Option) { + self.tenant_id = value; + } + + /// Identity of the principal that triggered the session. + #[getter] + fn get_principal(&self) -> Option { + self.principal.clone() + } + + #[setter] + fn set_principal(&mut self, value: Option) { + self.principal = value; + } + + /// Logical id of the agent template / definition. + #[getter] + fn get_agent_template_id(&self) -> Option { + self.agent_template_id.clone() + } + + #[setter] + fn set_agent_template_id(&mut self, value: Option) { + self.agent_template_id = value; + } + + /// Distributed-trace correlation id. + #[getter] + fn get_correlation_id(&self) -> Option { + self.correlation_id.clone() + } + + #[setter] + fn set_correlation_id(&mut self, value: Option) { + self.correlation_id = value; + } + /// Automatically save the session after each turn (default: False). #[getter] fn get_auto_save(&self) -> bool { @@ -4835,6 +5255,35 @@ impl PySessionOptions { self.ahp_transport = value; } + /// Host-supplied BudgetGuard. Any Python object implementing some + /// subset of `check_before_llm` / `record_after_llm` / + /// `check_before_tool`. The framework calls these around every + /// LLM call and surfaces `{"decision": "deny", ...}` as a + /// ``Budget exhausted`` ``RuntimeError`` on ``session.send``. + #[getter] + fn get_budget_guard(&self) -> Option { + pyo3::Python::with_gil(|py| self.budget_guard.as_ref().map(|o| o.clone_ref(py))) + } + + #[setter] + fn set_budget_guard(&mut self, value: Option) { + self.budget_guard = value; + } + + /// Optional FIFO retention caps as a dict with any subset of: + /// ``max_runs_retained``, ``max_events_per_run``, + /// ``max_trace_events``, ``max_terminal_subagent_tasks``. + /// Missing keys keep the unbounded default for that store. + #[getter] + fn get_retention_limits(&self) -> Option { + pyo3::Python::with_gil(|py| self.retention_limits.as_ref().map(|o| o.clone_ref(py))) + } + + #[setter] + fn set_retention_limits(&mut self, value: Option) { + self.retention_limits = value; + } + /// Register an instruction skill programmatically. /// /// Instructions are injected into the system prompt at session start. @@ -5272,6 +5721,28 @@ fn build_rust_session_options(so: PySessionOptions) -> PyResult = + std::sync::Arc::new(PyBudgetGuard::new(guard)); + o = o.with_budget_guard(wrapped); + } + if let Some(retention) = so.retention_limits { + if let Some(limits) = parse_py_retention_limits(&retention) { + o = o.with_retention_limits(limits); + } + } if so.auto_save { o = o.with_auto_save(true); } diff --git a/sdk/python/tests/test_budget_guard.py b/sdk/python/tests/test_budget_guard.py new file mode 100644 index 0000000..96f6ed2 --- /dev/null +++ b/sdk/python/tests/test_budget_guard.py @@ -0,0 +1,125 @@ +"""Smoke test for the Python BudgetGuard wrapper. + +Verifies that a Python BudgetGuard whose `check_before_llm` returns +``{"decision": "deny", ...}`` aborts ``session.send`` with a +RuntimeError that mentions "Budget exhausted" — the framework's +canonical denial signal — without the LLM ever being touched. + +Run with: + PYTHONPATH=python python tests/test_budget_guard.py +""" + +from __future__ import annotations + +import tempfile + +from a3s_code import Agent, LocalWorkspaceBackend, PermissionPolicy, SessionOptions + + +INLINE_CONFIG = """ +default_model = "anthropic/claude-sonnet-4-20250514" + +providers "anthropic" { + api_key = "test-key" + models "claude-sonnet-4-20250514" { + name = "Claude Sonnet 4" + } +} +""".strip() + + +class DenyingGuard: + """BudgetGuard that always denies the first LLM call and records + everything the framework hands it for post-hoc assertions.""" + + def __init__(self) -> None: + self.llm_checks: list[tuple[str, int]] = [] + self.tool_checks: list[tuple[str, str]] = [] + self.llm_records: list[tuple[str, dict]] = [] + + def check_before_llm(self, session_id: str, estimated_tokens: int) -> dict: + self.llm_checks.append((session_id, estimated_tokens)) + return { + "decision": "deny", + "resource": "llm_tokens", + "reason": "test cap exceeded", + } + + def check_before_tool(self, session_id: str, tool_name: str) -> dict | None: + self.tool_checks.append((session_id, tool_name)) + return None # allow + + def record_after_llm(self, session_id: str, usage: dict) -> None: + self.llm_records.append((session_id, usage)) + + +class AllowingGuard: + """BudgetGuard that returns None / no-op for every method. Verifies + that "shape with no real methods" still works (the wrapper looks + up by name at call time).""" + + def check_before_llm(self, session_id: str, estimated_tokens: int): + return None + + def record_after_llm(self, session_id: str, usage: dict) -> None: + pass + + +def main() -> None: + workspace = tempfile.mkdtemp(prefix="a3s-budget-") + agent = Agent.create(INLINE_CONFIG) + + # ----- Phase A: Deny ----- + guard = DenyingGuard() + opts = SessionOptions() + opts.permission_policy = PermissionPolicy(default_decision="allow") + opts.workspace_backend = LocalWorkspaceBackend(workspace) + opts.session_id = "budget-deny-test" + opts.budget_guard = guard + + # Also exercise the getter — read-back must match what we wrote. + assert opts.budget_guard is guard, "BudgetGuard getter must round-trip" + + session = agent.session(workspace, opts) + + try: + _ = session.send("hello") + except RuntimeError as exc: + msg = str(exc) + assert ( + "Budget exhausted" in msg or "llm_tokens" in msg + ), f"expected budget-exhausted error, got: {exc!r}" + else: + raise AssertionError("send() must raise when BudgetGuard denies") + + assert len(guard.llm_checks) == 1, ( + f"check_before_llm must be consulted exactly once, got {guard.llm_checks!r}" + ) + assert guard.llm_checks[0][0] == "budget-deny-test", ( + f"session_id must propagate, got {guard.llm_checks[0]!r}" + ) + assert len(guard.llm_records) == 0, ( + f"record_after_llm must not fire when call was denied, got {guard.llm_records!r}" + ) + + # ----- Phase B: Allow / no-op shape ----- + # A guard with only allow-style methods must not break send(). + # We can't actually send without provider credentials, so we just + # verify the SessionOptions roundtrip and that constructing a + # session succeeds. + allow_opts = SessionOptions() + allow_opts.permission_policy = PermissionPolicy(default_decision="allow") + allow_opts.workspace_backend = LocalWorkspaceBackend(workspace) + allow_opts.session_id = "budget-allow-test" + allow_opts.budget_guard = AllowingGuard() + _ = agent.session(workspace, allow_opts) + + # ----- Phase C: clear back to None ----- + opts.budget_guard = None + assert opts.budget_guard is None + + print("python sdk budget guard ok") + + +if __name__ == "__main__": + main() diff --git a/sdk/python/tests/test_session_close.py b/sdk/python/tests/test_session_close.py new file mode 100644 index 0000000..88eb6ad --- /dev/null +++ b/sdk/python/tests/test_session_close.py @@ -0,0 +1,113 @@ +"""Smoke test for the Agent / Session close surface exposed by the +core in steps 1–3 and propagated through the Python SDK in step 4. + +Verifies the PyO3 wrappers correctly route to core: +- `session.is_closed` getter +- `session.close()` is idempotent +- `agent.list_sessions()` reflects live sessions +- `agent.close_session(id)` closes one session by ID +- `agent.close()` closes every live session and rejects further + `agent.session(...)` calls + +Run with: python -m sdk/python/tests/test_session_close +(no provider credentials needed — uses inline ACL). +""" + +from __future__ import annotations + +import tempfile + +from a3s_code import Agent, LocalWorkspaceBackend, PermissionPolicy, SessionOptions + + +INLINE_CONFIG = """ +default_model = "anthropic/claude-sonnet-4-20250514" + +providers "anthropic" { + api_key = "test-key" + models "claude-sonnet-4-20250514" { + name = "Claude Sonnet 4" + } +} +""".strip() + + +def _make_session(agent: Agent, workspace: str, session_id: str): + opts = SessionOptions() + opts.permission_policy = PermissionPolicy(default_decision="allow") + opts.workspace_backend = LocalWorkspaceBackend(workspace) + opts.session_id = session_id + return agent.session(workspace, opts) + + +def main() -> None: + workspace = tempfile.mkdtemp(prefix="a3s-code-python-close-") + agent = Agent.create(INLINE_CONFIG) + + # 1. Fresh session: is_closed is False, list_sessions sees it. + session = _make_session(agent, workspace, "py-close-1") + assert session.is_closed is False, "fresh session should not be closed" + + listed = agent.list_sessions() + assert "py-close-1" in listed, ( + f"agent.list_sessions() should include py-close-1, got {listed!r}" + ) + + # 2. session.close() flips is_closed and is idempotent. + session.close() + assert session.is_closed is True, "session.close() must set is_closed = True" + session.close() # second close must not raise + assert session.is_closed is True + + # 3. agent.close_session(id) on a *new* live session closes it. + session_b = _make_session(agent, workspace, "py-close-2") + assert session_b.is_closed is False + was_open = agent.close_session("py-close-2") + assert was_open is True, ( + f"close_session() on a live session must return True, got {was_open!r}" + ) + assert session_b.is_closed is True, ( + "close_session() must propagate to the Python wrapper's is_closed view" + ) + + # 4. close_session() on an unknown id returns False, doesn't raise. + unknown = agent.close_session("does-not-exist") + assert unknown is False, ( + f"close_session() on unknown id must return False, got {unknown!r}" + ) + + # 5. agent.close() closes every live session and rejects new session(). + session_c = _make_session(agent, workspace, "py-close-3") + session_d = _make_session(agent, workspace, "py-close-4") + assert session_c.is_closed is False + assert session_d.is_closed is False + + agent.close() + assert agent.is_closed is True, "agent.is_closed must be True after agent.close()" + assert session_c.is_closed is True, "agent.close() must close session_c" + assert session_d.is_closed is True, "agent.close() must close session_d" + + # New session() must raise. + try: + _ = _make_session(agent, workspace, "py-close-post") + except Exception as exc: + msg = str(exc).lower() + assert "closed" in msg, ( + f"post-close session() error must mention 'closed', got: {exc!r}" + ) + else: + raise AssertionError("session() after agent.close() must raise") + + # disconnect_idle_mcp is exposed and returns a list (empty here — the + # inline config registers no MCP servers). Use a fresh agent since the + # one above is closed. + agent2 = Agent.create(INLINE_CONFIG) + dropped = agent2.disconnect_idle_mcp(5 * 60 * 1000) + assert isinstance(dropped, list), f"disconnect_idle_mcp must return a list, got {type(dropped)!r}" + assert dropped == [], f"no MCP servers configured -> nothing dropped, got {dropped!r}" + + print("python sdk session close api ok") + + +if __name__ == "__main__": + main()