
test(bootstrap): make cache-dir assertion version-agnostic (unblock v3.3.0 PyPI publish) #48

Merged
ZhiXiao-Lin merged 1 commit into main from fix/bootstrap-test-version-agnostic on May 29, 2026


@ZhiXiao-Lin
Contributor

Why

The v3.3.0 release pipeline (run 26616927410) published crates.io ✅, npm ✅, and native wheels → GH Release ✅, but the publish-python-bootstrap (PyPI) job failed and github-release was consequently skipped.

Root cause: a stale literal in the bootstrap unit test (which the publish job runs before twine upload):

FAIL: test_downloads_extracts_and_registers_module
    self.assertEqual(cache, Path(self._tmp) / "3.2.1")
AssertionError: PosixPath('.../3.3.0') != PosixPath('.../3.2.1')

_cache_root() keys the cache dir on the module's own __version__ (now 3.3.0), not the version argument passed to ensure_native_loaded("3.2.1"). The assertion hardcoded "3.2.1", so the version bump broke it.

Fix

Reference _bootstrap.__version__ directly — matching the sibling CacheDirTests (which already do this) — so the assertion can never go stale on a version bump again. One-line, test-only change.

Verified locally: python -m unittest tests.test_bootstrap -v → 15 passed, 1 skipped.

Release completion

After merge, the v3.3.0 tag will be moved to the merge commit and re-pushed. All publish jobs are idempotent (crate curl-check+skip, npm view+skip, wheels --clobber, twine --skip-existing), so the already-published artifacts no-op; only publish-python-bootstrap does new work, then github-release edits notes onto the existing v3.3.0 release.
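The check-then-skip pattern that makes the re-run safe can be sketched as follows. This is an illustration only: `publish_once`, the in-memory `registry` set, and the artifact names are hypothetical, and the real jobs rely on the CLI checks and flags listed above rather than on this code. It models only the skip-if-present style, not the overwrite style of `--clobber`.

```python
from typing import Callable, Set


def publish_once(artifact: str, registry: Set[str], upload: Callable[[str], None]) -> bool:
    """Upload `artifact` unless the registry already has it.

    Returns True when an upload happened, False when the step was skipped.
    """
    if artifact in registry:
        return False  # already published: re-running is a no-op
    upload(artifact)
    registry.add(artifact)
    return True


# Hypothetical state after the failed run: everything except the
# PyPI bootstrap package is already published.
registry = {"crate 3.3.0", "npm 3.3.0", "wheels 3.3.0"}
uploaded = []

for name in ["crate 3.3.0", "npm 3.3.0", "wheels 3.3.0", "pypi-bootstrap 3.3.0"]:
    publish_once(name, registry, uploaded.append)

print(uploaded)  # only the step that had not completed does new work
```

Running the loop a second time against the same `registry` uploads nothing, which is the property the tag move depends on.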

ensure_native_loaded's cache dir is keyed on the module's __version__,
not the version argument. The test hardcoded "3.2.1", so the 3.3.0
version bump broke it and blocked the publish-python-bootstrap job
(which runs the bootstrap unit tests before twine upload), which in
turn skipped github-release.

Reference _bootstrap.__version__ directly, matching the sibling
CacheDirTests, so the assertion can't go stale on future bumps.
@ZhiXiao-Lin merged commit 4470293 into main on May 29, 2026
1 check passed
@ZhiXiao-Lin deleted the fix/bootstrap-test-version-agnostic branch on May 29, 2026 04:24
ZhiXiao-Lin added a commit to A3S-Lab/a3s that referenced this pull request on May 29, 2026:
* docs(code): document Agent / Session close surface

Update both en and cn API contract pages with the full graceful-close
contract: session.close() / isClosed semantics, agent.listSessions(),
agent.closeSession(id), agent.close() (which also disconnects global
MCP), and the SessionClosed error returned after agent.close().

Bumps the crates/code submodule pointer to include the new close
surface across core (steps 1–3) and the Node/Python SDKs (step 4).

* test(code): bump submodule for session-close integration tests

Picks up the cross-module integration test
(core/tests/test_session_close_lifecycle.rs) and SDK smoke tests
(sdk/python/tests/test_session_close.py, sdk/node/test_session_close.mjs)
plus the AgentSession::subagent_tracker() accessor that unblocks them.

* chore(code): bump submodule for framework cluster-pillars P1+P5+P6+P4+P2

Picks up the five framework-only mechanisms 书安OS needs as
prerequisites for ultra-scale agent cluster operation. Boundaries
respected — no scheduler / placement / transport in core; those
remain 书安OS responsibilities.

- P1 (e0b7e9b): SessionStore persists subagent task tracker across
  save/resume — unblocks session migration.
- P5 (7c4c58c): tenant / principal / agent_template / correlation
  identity labels on SessionOptions+SessionData — unblocks multi-
  tenancy aggregation without string-hacking session_id.
- P6 (0043844): AgentEvent variants BudgetThresholdHit /
  PassivationRequested / PeerInvocation — give in-session code a
  uniform way to observe platform decisions.
- P4 (679efb8): BudgetGuard trait wired into the LLM call path —
  host plugs in cluster-aware quota/cost enforcement; framework
  emits structured events and bails on Deny.
- P2 (9c290ad): HostEnv (IdGenerator + Clock) injection — unlocks
  deterministic replay of a run on another node.

P3 (loop resumable / per-step checkpoint) remains for follow-up.

* chore(code): bump submodule — P3 cut 1 (loop checkpoint data + persistence)

Picks up:
- LoopCheckpoint data contract + SessionStoreCheckpointSink adapter.
- SessionStore::save_loop_checkpoint / load_loop_checkpoint
  (default no-op; MemorySessionStore + FileSessionStore implement).
- AgentLoop auto-wires a checkpoint sink from session.session_store
  in build_agent_loop, and persists after every successful tool round
  in execute_loop_inner.
- Integration tests: store roundtrip + the no-tool-call negative
  property.

Cut 2 (resume_run API) remains in the framework's P3 backlog.

* chore(code): bump submodule — P3 complete (resume_run API)

Picks up `AgentSession::resume_run(checkpoint_run_id)` which loads a
LoopCheckpoint via SessionStore and replays the agent loop from that
boundary. Together with P3 cut 1 (in the previous submodule bump),
the framework now provides full crash-tolerant run semantics — 书安OS
plugs in placement / drain choreography on top.

Two distinguishable error paths (`session_store` missing vs
`loop checkpoint` missing) lock the API for host-side scheduling.

* chore(code): bump submodule — SDK identity labels + resume_run

Surfaces the P5 (identity labels) and P3 (resume_run) framework
additions through both Node and Python SDKs. JS/TS callers get
`session.resumeRun(...)` + `session.tenantId` etc; Python callers
get `session.resume_run(...)` + matching property getters.

* docs(code): cluster-grade extension points (en + cn)

New section in both api-contract pages walking through the five
framework-level extension points the host platform (书安OS) sits on:

- Identity labels (tenant_id / principal / agent_template_id /
  correlation_id) — opaque transport, host aggregates.
- BudgetGuard — Allow / SoftLimit / Deny decision shape; structured
  events on threshold hits; LLM call-site enforcement.
- Cluster AgentEvent variants — BudgetThresholdHit,
  PassivationRequested, PeerInvocation; host emits via HookExecutor.
- Deterministic IDs / time via HostEnv (SequentialIdGenerator +
  FixedClock for replay).
- LoopCheckpoint + session.resumeRun/resume_run with both error
  paths documented so cluster scheduling code can branch.

Boundary policy ("between tool rounds, never mid-tool") is called
out explicitly so host-side reasoning about lost-work semantics
matches framework behaviour.

Bumps crates/code submodule for the matching README update.

* chore(code): bump submodule — retention caps for in-memory stores

Picks up `SessionRetentionLimits` with four optional FIFO caps:
max_runs_retained / max_events_per_run / max_trace_events /
max_terminal_subagent_tasks. Plumbs through
SessionOptions::with_retention_limits → AgentConfig → store
constructors so long-running cluster sessions stop accumulating
memory unboundedly.

Defaults stay unbounded — existing callers see no behaviour
change. Eviction policy preserves the most-recent entries
(useful for debugging) and never drops Running subagent tasks.

1692 unit + 9 integration tests green; clippy clean.

* chore(code): bump submodule — retention caps + resume_run E2E test

Picks up SessionRetentionLimits with FIFO caps on RunStore /
TraceSink / SubagentTracker plus the E2E happy-path test for
resume_run that locks the P3 contract surface 书安OS will sit on.

Defaults stay unbounded — pure additions.

* chore(code): bump submodule — MCP idle disconnect

Picks up McpManager::disconnect_idle + Agent::disconnect_idle_mcp.
Hosts now have a focused entry point to reap quiet MCP subprocesses
without losing the registered config — paired with the in-memory
retention caps shipped earlier this batch, the framework no longer
leaks memory / FDs across long-running cluster workloads.

* chore(code): bump submodule — BudgetGuard SDK propagation (Python + Node)

Picks up Python (PyBudgetGuard via Python::with_gil) and Node
(NodeBudgetGuard via ThreadsafeFunction) bridges, plus the small
framework addition (AgentSession::set_budget_guard) that lets the
Node SDK install a JS-backed guard after session construction —
required because JsFunction values can't live in the value-typed
SessionOptions struct.

Both SDKs use the same decision shape ({decision:'allow'|'soft'|'deny',
...}) and the same fail-safe defaults (unknown shapes / callback
errors → Allow).

* docs(code): retention caps + MCP idle + BudgetGuard SDK examples (en + cn)

Three new sub-sections under "Cluster-grade extension points" so the
operational additions ship with discoverable usage examples:

- Retention caps for long-running sessions —
  SessionRetentionLimits.with_max_runs / max_events_per_run /
  max_trace_events / max_terminal_subagent_tasks. Notes that running
  subagent tasks are never evicted and that SDK shapes follow later.

- MCP idle disconnect — agent.disconnectIdleMcp /
  disconnect_idle_mcp with a periodic-sweeper example for both SDKs.
  Calls out McpManager.touch for side-channel keep-warm.

- BudgetGuard SDK bridges — decision-shape table (allow/soft/deny)
  shared across Python and Node, Python class-style attach via
  SessionOptions.budget_guard, Node setBudgetGuard({...}) handler
  attach (justified by JsFunction lifetime), and the "callback
  errors fall back to Allow" fail-safe.

Bumps crates/code submodule for the matching README update.

* chore(code): bump submodule — SessionRetentionLimits SDK propagation

Picks up Python `opts.retention_limits = {dict}` and Node
`opts.retentionLimits = {object}` shapes. Both forward into the
framework's SessionRetentionLimits and into the per-session store
construction. Missing fields keep the unbounded default.

* chore(code): bump submodule — cluster ops consolidation test

Picks up `cluster_ops_consolidated_session_lifecycle`, a single
integration test that exercises identity labels + subagent
persistence + LoopCheckpoint round-trip across two simulated nodes
sharing one MemorySessionStore. Reference flow for 书安OS-side
scheduling.

* chore(code): bump submodule — cluster-pillars review hardening (11 fixes)

Folds in the full fix batch from the adversarial multi-dimension review
of the cluster-pillars work (11 confirmed findings, 1 rejected):

core (4b35537): H4 checkpoint leak + crash-atomic write; H3 event_count
corruption; H2 resume_run metric loss; M1/M2 eviction TOCTOU; M3 MCP
timestamp leak; L1 registry prune.

sdk (281dc58): H1 Node BudgetGuard fail-closed (timeout/parse -> Deny,
not Allow) + documented no-throw constraint; M4 disconnect_idle_mcp
exposed in both SDKs (docs now true); L2 Python re-entrancy doc.

1705 lib + 10 integration green; Node 27 + Python 19 cargo tests;
all SDK smokes pass; clippy clean across core + both SDKs.

* chore(code): bump submodule — v3.3.0 release prep

Points to the a3s-code v3.3.0 release-prep commit: all package versions
synced to 3.3.0, CHANGELOG entry added, SDK sources fmt-clean. Full core
suite green (1705 lib + all integration files). Not pushed / not tagged.

* chore(code): bump submodule — real-LLM cluster-feature tests

Picks up core/tests/test_real_llm_cluster_features.rs: 5 #[ignore]
end-to-end tests validating the 3.3.0 LLM-loop features against a live
provider. Verified passing against openai/MiniMax-M2.7-highspeed.

* chore(code): bump submodule — v3.3.0 released (crates.io/npm/PyPI/GH)

Points crates/code at 44702931 (v3.3.0 tag + the bootstrap test fix
from AI45Lab/Code#48). Release is live on all four registries.

---------

Co-authored-by: Roy Lin <roylin@a3s.dev>

2 participants