test(bootstrap): make cache-dir assertion version-agnostic (unblock v3.3.0 PyPI publish) #48
Merged
ensure_native_loaded's cache dir is keyed on the module's __version__, not the version argument. The test hardcoded "3.2.1", so the 3.3.0 version bump broke it and blocked the publish-python-bootstrap job (which runs the bootstrap unit tests before twine upload), which in turn skipped github-release. Reference _bootstrap.__version__ directly, matching the sibling CacheDirTests, so the assertion can't go stale on future bumps.
ZhiXiao-Lin added a commit to A3S-Lab/a3s that referenced this pull request on May 29, 2026
* docs(code): document Agent / Session close surface
Update both en and cn API contract pages with the full graceful-close
contract: session.close() / isClosed semantics, agent.listSessions(),
agent.closeSession(id), agent.close() (which also disconnects global
MCP), and the SessionClosed error returned after agent.close().
Bumps the crates/code submodule pointer to include the new close
surface across core (steps 1–3) and the Node/Python SDKs (step 4).
* test(code): bump submodule for session-close integration tests
Picks up the cross-module integration test
(core/tests/test_session_close_lifecycle.rs) and SDK smoke tests
(sdk/python/tests/test_session_close.py, sdk/node/test_session_close.mjs)
plus the AgentSession::subagent_tracker() accessor that unblocks them.
* chore(code): bump submodule for framework cluster-pillars P1+P5+P6+P4+P2
Picks up the five framework-only mechanisms 书安OS needs as
prerequisites for ultra-scale agent cluster operation. Boundaries
respected — no scheduler / placement / transport in core; those
remain 书安OS responsibilities.
- P1 (e0b7e9b): SessionStore persists subagent task tracker across
save/resume — unblocks session migration.
- P5 (7c4c58c): tenant / principal / agent_template / correlation
identity labels on SessionOptions+SessionData — unblocks multi-
tenancy aggregation without string-hacking session_id.
- P6 (0043844): AgentEvent variants BudgetThresholdHit /
PassivationRequested / PeerInvocation — give in-session code a
uniform way to observe platform decisions.
- P4 (679efb8): BudgetGuard trait wired into the LLM call path —
host plugs in cluster-aware quota/cost enforcement; framework
emits structured events and bails on Deny.
- P2 (9c290ad): HostEnv (IdGenerator + Clock) injection — unlocks
deterministic replay of a run on another node.
P3 (loop resumable / per-step checkpoint) remains for follow-up.
* chore(code): bump submodule — P3 cut 1 (loop checkpoint data + persistence)
Picks up:
- LoopCheckpoint data contract + SessionStoreCheckpointSink adapter.
- SessionStore::save_loop_checkpoint / load_loop_checkpoint
(default no-op; MemorySessionStore + FileSessionStore implement).
- AgentLoop auto-wires a checkpoint sink from session.session_store
in build_agent_loop, and persists after every successful tool round
in execute_loop_inner.
- Integration tests: store roundtrip + the no-tool-call negative
property.
Cut 2 (resume_run API) remains in the framework's P3 backlog.
* chore(code): bump submodule — P3 complete (resume_run API)
Picks up `AgentSession::resume_run(checkpoint_run_id)` which loads a
LoopCheckpoint via SessionStore and replays the agent loop from that
boundary. Together with P3 cut 1 (in the previous submodule bump),
the framework now provides full crash-tolerant run semantics — 书安OS
plugs in placement / drain choreography on top.
Two distinguishable error paths (`session_store` missing vs
`loop checkpoint` missing) lock the API for host-side scheduling.
* chore(code): bump submodule — SDK identity labels + resume_run
Surfaces the P5 (identity labels) and P3 (resume_run) framework
additions through both Node and Python SDKs. JS/TS callers get
`session.resumeRun(...)` + `session.tenantId` etc; Python callers
get `session.resume_run(...)` + matching property getters.
* docs(code): cluster-grade extension points (en + cn)
New section in both api-contract pages walking through the five
framework-level extension points the host platform (书安OS) sits on:
- Identity labels (tenant_id / principal / agent_template_id /
correlation_id) — opaque transport, host aggregates.
- BudgetGuard — Allow / SoftLimit / Deny decision shape; structured
events on threshold hits; LLM call-site enforcement.
- Cluster AgentEvent variants — BudgetThresholdHit,
PassivationRequested, PeerInvocation; host emits via HookExecutor.
- Deterministic IDs / time via HostEnv (SequentialIdGenerator +
FixedClock for replay).
- LoopCheckpoint + session.resumeRun/resume_run with both error
paths documented so cluster scheduling code can branch.
Boundary policy ("between tool rounds, never mid-tool") is called
out explicitly so host-side reasoning about lost-work semantics
matches framework behaviour.
Bumps crates/code submodule for the matching README update.
* chore(code): bump submodule — retention caps for in-memory stores
Picks up `SessionRetentionLimits` with four optional FIFO caps:
max_runs_retained / max_events_per_run / max_trace_events /
max_terminal_subagent_tasks. Plumbs through
SessionOptions::with_retention_limits → AgentConfig → store
constructors so long-running cluster sessions stop accumulating
memory unboundedly.
Defaults stay unbounded — existing callers see no behaviour
change. Eviction policy preserves the most-recent entries
(useful for debugging) and never drops Running subagent tasks.
1692 unit + 9 integration tests green; clippy clean.
* chore(code): bump submodule — retention caps + resume_run E2E test
Picks up SessionRetentionLimits with FIFO caps on RunStore /
TraceSink / SubagentTracker plus the E2E happy-path test for
resume_run that locks the P3 contract surface 书安OS will sit on.
Defaults stay unbounded — pure additions.
* chore(code): bump submodule — MCP idle disconnect
Picks up McpManager::disconnect_idle + Agent::disconnect_idle_mcp.
Hosts now have a focused entry point to reap quiet MCP subprocesses
without losing the registered config — paired with the in-memory
retention caps shipped earlier this batch, the framework no longer
leaks memory / FDs across long-running cluster workloads.
* chore(code): bump submodule — BudgetGuard SDK propagation (Python + Node)
Picks up Python (PyBudgetGuard via Python::with_gil) and Node
(NodeBudgetGuard via ThreadsafeFunction) bridges, plus the small
framework addition (AgentSession::set_budget_guard) that lets the
Node SDK install a JS-backed guard after session construction —
required because JsFunction values can't live in the value-typed
SessionOptions struct.
Both SDKs use the same decision shape ({decision:'allow'|'soft'|'deny',
...}) and the same fail-safe defaults (unknown shapes / callback
errors → Allow).
* docs(code): retention caps + MCP idle + BudgetGuard SDK examples (en + cn)
Three new sub-sections under "Cluster-grade extension points" so the
operational additions ship with discoverable usage examples:
- Retention caps for long-running sessions —
SessionRetentionLimits.with_max_runs / max_events_per_run /
max_trace_events / max_terminal_subagent_tasks. Notes that running
subagent tasks are never evicted and that SDK shapes follow later.
- MCP idle disconnect — agent.disconnectIdleMcp /
disconnect_idle_mcp with a periodic-sweeper example for both SDKs.
Calls out McpManager.touch for side-channel keep-warm.
- BudgetGuard SDK bridges — decision-shape table (allow/soft/deny)
shared across Python and Node, Python class-style attach via
SessionOptions.budget_guard, Node setBudgetGuard({...}) handler
attach (justified by JsFunction lifetime), and the "callback
errors fall back to Allow" fail-safe.
Bumps crates/code submodule for the matching README update.
* chore(code): bump submodule — SessionRetentionLimits SDK propagation
Picks up Python `opts.retention_limits = {dict}` and Node
`opts.retentionLimits = {object}` shapes. Both forward into the
framework's SessionRetentionLimits and into the per-session store
construction. Missing fields keep the unbounded default.
* chore(code): bump submodule — cluster ops consolidation test
Picks up `cluster_ops_consolidated_session_lifecycle`, a single
integration test that exercises identity labels + subagent
persistence + LoopCheckpoint round-trip across two simulated nodes
sharing one MemorySessionStore. Reference flow for 书安OS-side
scheduling.
* chore(code): bump submodule — cluster-pillars review hardening (11 fixes)
Folds in the full fix batch from the adversarial multi-dimension review
of the cluster-pillars work (11 confirmed findings, 1 rejected):
core (4b35537): H4 checkpoint leak + crash-atomic write; H3 event_count
corruption; H2 resume_run metric loss; M1/M2 eviction TOCTOU; M3 MCP
timestamp leak; L1 registry prune.
sdk (281dc58): H1 Node BudgetGuard fail-closed (timeout/parse -> Deny,
not Allow) + documented no-throw constraint; M4 disconnect_idle_mcp
exposed in both SDKs (docs now true); L2 Python re-entrancy doc.
1705 lib + 10 integration green; Node 27 + Python 19 cargo tests;
all SDK smokes pass; clippy clean across core + both SDKs.
* chore(code): bump submodule — v3.3.0 release prep
Points to the a3s-code v3.3.0 release-prep commit: all package versions
synced to 3.3.0, CHANGELOG entry added, SDK sources fmt-clean. Full core
suite green (1705 lib + all integration files). Not pushed / not tagged.
* chore(code): bump submodule — real-LLM cluster-feature tests
Picks up core/tests/test_real_llm_cluster_features.rs: 5 #[ignore]
end-to-end tests validating the 3.3.0 LLM-loop features against a live
provider. Verified passing against openai/MiniMax-M2.7-highspeed.
* chore(code): bump submodule — v3.3.0 released (crates.io/npm/PyPI/GH)
Points crates/code at 44702931 (v3.3.0 tag + the bootstrap test fix
from AI45Lab/Code#48). Release is live on all four registries.
---------
Co-authored-by: Roy Lin <roylin@a3s.dev>
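The retention-cap entry in the commit message above describes FIFO eviction that keeps the most recent entries and never drops Running subagent tasks. A minimal sketch of that policy, assuming a simplified task record: the `Task` class, the `"running"` status string, and `evict_terminal` are hypothetical names for illustration, not the framework's API.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Task:
    # Hypothetical stand-in for a subagent task record.
    task_id: str
    status: str  # "running", or a terminal state such as "completed" / "failed"


def evict_terminal(tasks: List[Task], max_terminal: Optional[int]) -> List[Task]:
    """Apply a FIFO cap to terminal tasks only.

    `tasks` is ordered oldest-first. Running tasks are never evicted;
    a cap of None means unbounded (the default described in the commit).
    """
    if max_terminal is None:
        return list(tasks)

    terminal = [t for t in tasks if t.status != "running"]
    excess = len(terminal) - max_terminal
    if excess <= 0:
        return list(tasks)

    # Drop the oldest terminal tasks so the most recent ones survive.
    dropped = {t.task_id for t in terminal[:excess]}
    return [t for t in tasks if t.task_id not in dropped]


history = [
    Task("a", "completed"),
    Task("b", "running"),
    Task("c", "failed"),
    Task("d", "completed"),
]
kept = evict_terminal(history, max_terminal=2)
```

With a cap of 2, the oldest terminal task is evicted while the running task stays, regardless of its age.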
Why
The v3.3.0 release pipeline (run 26616927410) published crates.io ✅, npm ✅, and native wheels → GH Release ✅, but the publish-python-bootstrap (PyPI) job failed and github-release was consequently skipped.
Root cause: a stale literal in the bootstrap unit test, which the publish job runs before twine upload. _cache_root() keys the cache dir on the module's own __version__ (now 3.3.0), not the version argument passed to ensure_native_loaded("3.2.1"). The assertion hardcoded "3.2.1", so the version bump broke it.
Fix
Reference _bootstrap.__version__ directly, matching the sibling CacheDirTests (which already do this), so the assertion can never go stale on a version bump again. One-line, test-only change.
Verified locally: python -m unittest tests.test_bootstrap -v → 15 passed, 1 skipped.
Release completion
After merge, the v3.3.0 tag will be moved to the merge commit and re-pushed. All publish jobs are idempotent (crate curl-check+skip, npm view+skip, wheels --clobber, twine --skip-existing), so the already-published artifacts no-op; only publish-python-bootstrap does new work, then github-release edits notes onto the existing v3.3.0 release.