test(bootstrap): make cache-dir assertion version-agnostic (unblock v3.3.0 PyPI publish) by ZhiXiao-Lin · Pull Request #48 · AI45Lab/Code

ZhiXiao-Lin · 2026-05-29T04:23:20Z

Why

The v3.3.0 release pipeline (run 26616927410) published crates.io ✅, npm ✅, and native wheels → GH Release ✅, but the publish-python-bootstrap (PyPI) job failed and github-release was consequently skipped.

Root cause: a stale literal in the bootstrap unit test (which the publish job runs before twine upload):

FAIL: test_downloads_extracts_and_registers_module
    self.assertEqual(cache, Path(self._tmp) / "3.2.1")
AssertionError: PosixPath('.../3.3.0') != PosixPath('.../3.2.1')

_cache_root() keys the cache dir on the module's own __version__ (now 3.3.0), not the version argument passed to ensure_native_loaded("3.2.1"). The assertion hardcoded "3.2.1", so the version bump broke it.
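A minimal model of the mismatch. It assumes a module-level `__version__` and a `_cache_root(base)` helper. The `base` parameter and the two-argument `ensure_native_loaded` below are hypothetical, since the PR does not show the real signatures. Whatever version the caller requests, the cache path ends in the module's own version:

```python
from pathlib import Path

# Hypothetical stand-in: the real bootstrap module defines its own __version__.
__version__ = "3.3.0"


def _cache_root(base: Path) -> Path:
    # The cache dir is keyed on the module's own __version__, not on any argument.
    return Path(base) / __version__


def ensure_native_loaded(version: str, base: Path) -> Path:
    # Hypothetical signature: `version` would pick what to download,
    # but it plays no part in where the files are cached.
    return _cache_root(base)


cache = ensure_native_loaded("3.2.1", Path("/tmp/a3s-cache"))
print(cache.name)  # 3.3.0  (the module version, not the "3.2.1" argument)
```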

Fix

Reference _bootstrap.__version__ directly — matching the sibling CacheDirTests (which already do this) — so the assertion can never go stale on a version bump again. One-line, test-only change.

Verified locally: python -m unittest tests.test_bootstrap -v → 15 passed, 1 skipped.

Release completion

After merge, the v3.3.0 tag will be moved to the merge commit and re-pushed. All publish jobs are idempotent (crate curl-check+skip, npm view+skip, wheels --clobber, twine --skip-existing), so the already-published artifacts no-op; only publish-python-bootstrap does new work, then github-release edits notes onto the existing v3.3.0 release.
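Re-running the release only works because every publish step is a no-op when its artifact already exists. The model below demonstrates that check-then-skip shape, as with `npm view` before publishing or `twine upload --skip-existing`. It uses an in-memory `FakeRegistry` and a `publish_idempotent` helper, both hypothetical. `--clobber` reaches idempotence a different way, by overwriting the existing release asset, and is not modeled here. The package name `pkg` is a placeholder.

```python
from dataclasses import dataclass, field


@dataclass
class FakeRegistry:
    """In-memory stand-in for a package registry (crates.io, npm, PyPI, ...)."""
    name: str
    published: set = field(default_factory=set)

    def exists(self, package: str, version: str) -> bool:
        return (package, version) in self.published

    def upload(self, package: str, version: str) -> None:
        # Registries generally refuse to re-publish an existing version,
        # so a blind re-upload would fail the job.
        if self.exists(package, version):
            raise RuntimeError(f"{self.name}: {package} {version} already exists")
        self.published.add((package, version))


def publish_idempotent(registry: FakeRegistry, package: str, version: str) -> str:
    """Check-then-skip: a no-op when the version is already published."""
    if registry.exists(package, version):
        return "skipped"
    registry.upload(package, version)
    return "published"


def run_pipeline(registries, package, version):
    return {r.name: publish_idempotent(r, package, version) for r in registries}


# State after the first run: crates.io and npm succeeded, PyPI never uploaded.
crates = FakeRegistry("crates.io", {("pkg", "3.3.0")})
npm = FakeRegistry("npm", {("pkg", "3.3.0")})
pypi = FakeRegistry("pypi")

# Re-run after the tag is moved: only PyPI does new work.
print(run_pipeline([crates, npm, pypi], "pkg", "3.3.0"))
# {'crates.io': 'skipped', 'npm': 'skipped', 'pypi': 'published'}
```

A third run would report every registry as `skipped`. This is why moving the tag and re-pushing is safe.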

ensure_native_loaded's cache dir is keyed on the module's __version__, not on the version argument. The test hardcoded "3.2.1", so the 3.3.0 version bump broke it. That failure blocked the publish-python-bootstrap job (which runs the bootstrap unit tests before twine upload), which in turn caused github-release to be skipped. Reference _bootstrap.__version__ directly, matching the sibling CacheDirTests, so the assertion can't go stale on future bumps.

* docs(code): document Agent / Session close surface

  Update both en and cn API contract pages with the full graceful-close contract: session.close() / isClosed semantics, agent.listSessions(), agent.closeSession(id), agent.close() (which also disconnects global MCP), and the SessionClosed error returned after agent.close(). Bumps the crates/code submodule pointer to include the new close surface across core (steps 1–3) and the Node/Python SDKs (step 4).

* test(code): bump submodule for session-close integration tests

  Picks up the cross-module integration test (core/tests/test_session_close_lifecycle.rs) and SDK smoke tests (sdk/python/tests/test_session_close.py, sdk/node/test_session_close.mjs) plus the AgentSession::subagent_tracker() accessor that unblocks them.

* chore(code): bump submodule for framework cluster-pillars P1+P5+P6+P4+P2

  Picks up the five framework-only mechanisms 书安OS needs as prerequisites for ultra-scale agent cluster operation. Boundaries respected — no scheduler / placement / transport in core; those remain 书安OS responsibilities.

  - P1 (e0b7e9b): SessionStore persists subagent task tracker across save/resume — unblocks session migration.
  - P5 (7c4c58c): tenant / principal / agent_template / correlation identity labels on SessionOptions+SessionData — unblocks multi-tenancy aggregation without string-hacking session_id.
  - P6 (0043844): AgentEvent variants BudgetThresholdHit / PassivationRequested / PeerInvocation — give in-session code a uniform way to observe platform decisions.
  - P4 (679efb8): BudgetGuard trait wired into the LLM call path — host plugs in cluster-aware quota/cost enforcement; framework emits structured events and bails on Deny.
  - P2 (9c290ad): HostEnv (IdGenerator + Clock) injection — unlocks deterministic replay of a run on another node.

  P3 (loop resumable / per-step checkpoint) remains for follow-up.

* chore(code): bump submodule — P3 cut 1 (loop checkpoint data + persistence)

  Picks up:
  - LoopCheckpoint data contract + SessionStoreCheckpointSink adapter.
  - SessionStore::save_loop_checkpoint / load_loop_checkpoint (default no-op; MemorySessionStore + FileSessionStore implement).
  - AgentLoop auto-wires a checkpoint sink from session.session_store in build_agent_loop, and persists after every successful tool round in execute_loop_inner.
  - Integration tests: store roundtrip + the no-tool-call negative property.

  Cut 2 (resume_run API) remains in the framework's P3 backlog.

* chore(code): bump submodule — P3 complete (resume_run API)

  Picks up `AgentSession::resume_run(checkpoint_run_id)` which loads a LoopCheckpoint via SessionStore and replays the agent loop from that boundary. Together with P3 cut 1 (in the previous submodule bump), the framework now provides full crash-tolerant run semantics — 书安OS plugs in placement / drain choreography on top. Two distinguishable error paths (`session_store` missing vs `loop checkpoint` missing) lock the API for host-side scheduling.

* chore(code): bump submodule — SDK identity labels + resume_run

  Surfaces the P5 (identity labels) and P3 (resume_run) framework additions through both Node and Python SDKs. JS/TS callers get `session.resumeRun(...)` + `session.tenantId` etc.; Python callers get `session.resume_run(...)` + matching property getters.

* docs(code): cluster-grade extension points (en + cn)

  New section in both api-contract pages walking through the five framework-level extension points the host platform (书安OS) sits on:
  - Identity labels (tenant_id / principal / agent_template_id / correlation_id) — opaque transport, host aggregates.
  - BudgetGuard — Allow / SoftLimit / Deny decision shape; structured events on threshold hits; LLM call-site enforcement.
  - Cluster AgentEvent variants — BudgetThresholdHit, PassivationRequested, PeerInvocation; host emits via HookExecutor.
  - Deterministic IDs / time via HostEnv (SequentialIdGenerator + FixedClock for replay).
  - LoopCheckpoint + session.resumeRun/resume_run with both error paths documented so cluster scheduling code can branch.

  Boundary policy ("between tool rounds, never mid-tool") is called out explicitly so host-side reasoning about lost-work semantics matches framework behaviour. Bumps crates/code submodule for the matching README update.

* chore(code): bump submodule — retention caps for in-memory stores

  Picks up `SessionRetentionLimits` with four optional FIFO caps: max_runs_retained / max_events_per_run / max_trace_events / max_terminal_subagent_tasks. Plumbs through SessionOptions::with_retention_limits → AgentConfig → store constructors so long-running cluster sessions stop accumulating memory unboundedly. Defaults stay unbounded — existing callers see no behaviour change. Eviction policy preserves the most-recent entries (useful for debugging) and never drops Running subagent tasks. 1692 unit + 9 integration tests green; clippy clean.

* chore(code): bump submodule — retention caps + resume_run E2E test

  Picks up SessionRetentionLimits with FIFO caps on RunStore / TraceSink / SubagentTracker plus the E2E happy-path test for resume_run that locks the P3 contract surface 书安OS will sit on. Defaults stay unbounded — pure additions.

* chore(code): bump submodule — MCP idle disconnect

  Picks up McpManager::disconnect_idle + Agent::disconnect_idle_mcp. Hosts now have a focused entry point to reap quiet MCP subprocesses without losing the registered config — paired with the in-memory retention caps shipped earlier this batch, the framework no longer leaks memory / FDs across long-running cluster workloads.

* chore(code): bump submodule — BudgetGuard SDK propagation (Python + Node)

  Picks up Python (PyBudgetGuard via Python::with_gil) and Node (NodeBudgetGuard via ThreadsafeFunction) bridges, plus the small framework addition (AgentSession::set_budget_guard) that lets the Node SDK install a JS-backed guard after session construction — required because JsFunction values can't live in the value-typed SessionOptions struct. Both SDKs use the same decision shape ({decision:'allow'|'soft'|'deny', ...}) and the same fail-safe defaults (unknown shapes / callback errors → Allow).

* docs(code): retention caps + MCP idle + BudgetGuard SDK examples (en + cn)

  Three new sub-sections under "Cluster-grade extension points" so the operational additions ship with discoverable usage examples:
  - Retention caps for long-running sessions — SessionRetentionLimits.with_max_runs / max_events_per_run / max_trace_events / max_terminal_subagent_tasks. Notes that running subagent tasks are never evicted and that SDK shapes follow later.
  - MCP idle disconnect — agent.disconnectIdleMcp / disconnect_idle_mcp with a periodic-sweeper example for both SDKs. Calls out McpManager.touch for side-channel keep-warm.
  - BudgetGuard SDK bridges — decision-shape table (allow/soft/deny) shared across Python and Node, Python class-style attach via SessionOptions.budget_guard, Node setBudgetGuard({...}) handler attach (justified by JsFunction lifetime), and the "callback errors fall back to Allow" fail-safe.

  Bumps crates/code submodule for the matching README update.

* chore(code): bump submodule — SessionRetentionLimits SDK propagation

  Picks up Python `opts.retention_limits = {dict}` and Node `opts.retentionLimits = {object}` shapes. Both forward into the framework's SessionRetentionLimits and into the per-session store construction. Missing fields keep the unbounded default.

* chore(code): bump submodule — cluster ops consolidation test

  Picks up `cluster_ops_consolidated_session_lifecycle`, a single integration test that exercises identity labels + subagent persistence + LoopCheckpoint round-trip across two simulated nodes sharing one MemorySessionStore. Reference flow for 书安OS-side scheduling.

* chore(code): bump submodule — cluster-pillars review hardening (11 fixes)

  Folds in the full fix batch from the adversarial multi-dimension review of the cluster-pillars work (11 confirmed findings, 1 rejected):
  - core (4b35537): H4 checkpoint leak + crash-atomic write; H3 event_count corruption; H2 resume_run metric loss; M1/M2 eviction TOCTOU; M3 MCP timestamp leak; L1 registry prune.
  - sdk (281dc58): H1 Node BudgetGuard fail-closed (timeout/parse -> Deny, not Allow) + documented no-throw constraint; M4 disconnect_idle_mcp exposed in both SDKs (docs now true); L2 Python re-entrancy doc.

  1705 lib + 10 integration green; Node 27 + Python 19 cargo tests; all SDK smokes pass; clippy clean across core + both SDKs.

* chore(code): bump submodule — v3.3.0 release prep

  Points to the a3s-code v3.3.0 release-prep commit: all package versions synced to 3.3.0, CHANGELOG entry added, SDK sources fmt-clean. Full core suite green (1705 lib + all integration files). Not pushed / not tagged.

* chore(code): bump submodule — real-LLM cluster-feature tests

  Picks up core/tests/test_real_llm_cluster_features.rs: 5 #[ignore] end-to-end tests validating the 3.3.0 LLM-loop features against a live provider. Verified passing against openai/MiniMax-M2.7-highspeed.

* chore(code): bump submodule — v3.3.0 released (crates.io/npm/PyPI/GH)

  Points crates/code at 44702931 (v3.3.0 tag + the bootstrap test fix from AI45Lab/Code#48). Release is live on all four registries.

---------

Co-authored-by: Roy Lin <roylin@a3s.dev>
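The BudgetGuard entries in the squash message describe a shared decision shape (`{decision: 'allow' | 'soft' | 'deny', ...}`) with a fail-safe fallback to Allow. The later hardening pass changed that fallback to fail-closed (Deny) for Node timeouts and parse errors. The Python sketch below models only that normalisation step. `Decision`, `evaluate_guard` and `token_guard` are hypothetical names, not the SDKs' PyBudgetGuard / NodeBudgetGuard API, and the token thresholds are illustrative.

```python
from collections.abc import Callable, Mapping
from enum import Enum
from typing import Any


class Decision(Enum):
    ALLOW = "allow"
    SOFT = "soft"
    DENY = "deny"


def evaluate_guard(
    callback: Callable[[Mapping[str, Any]], Any],
    request: Mapping[str, Any],
    on_failure: Decision = Decision.ALLOW,
) -> Decision:
    """Call a host-supplied guard and normalise its reply to a Decision.

    `on_failure` covers both a raising callback and an unrecognised reply:
    ALLOW models the fail-open default, DENY the fail-closed variant.
    """
    try:
        reply = callback(request)
    except Exception:
        return on_failure
    if not isinstance(reply, Mapping):
        return on_failure
    try:
        return Decision(reply.get("decision"))
    except ValueError:
        # Missing or unknown "decision" value.
        return on_failure


def token_guard(request):
    # Hypothetical policy: soft-limit large calls, deny very large ones.
    tokens = request["tokens"]
    if tokens > 100_000:
        return {"decision": "deny", "reason": "over budget"}
    if tokens > 10_000:
        return {"decision": "soft"}
    return {"decision": "allow"}


print(evaluate_guard(token_guard, {"tokens": 50_000}))  # Decision.SOFT
```

Making the fallback a parameter keeps the fail-open vs fail-closed choice explicit at the call site. That is the distinction the H1 hardening fix turned on.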

ZhiXiao-Lin merged commit 4470293 into main May 29, 2026
1 check passed

ZhiXiao-Lin deleted the fix/bootstrap-test-version-agnostic branch May 29, 2026 04:24

ZhiXiao-Lin mentioned this pull request May 29, 2026

chore(code): bump submodule — v3.3.0 released A3S-Lab/a3s#4

Merged


ZhiXiao-Lin merged 1 commit into main from fix/bootstrap-test-version-agnostic
