
sbp2: port session command core onto DICE #30

Merged
mrmidi merged 11 commits into DICE from sbp2-session-port
Jun 19, 2026
Conversation

mrmidi commented Jun 18, 2026

Owner

Summary

  • Port the SBP-2 session/command core onto the DICE architecture without cherry-picking the main-branch implementation.
  • Add the production DriverKit session scheduler (IOTimerDispatchSource + OSAction) and wire SessionRegistry into driver lifecycle ownership.
  • Expose the session/command user-client boundary through the DICE dispatch table and selectors.
  • Hook bus-reset and discovery-complete flow into SBP-2 suspend/reconnect handling.

Provenance

This is a DICE-shaped port of the SBP-2 work originally implemented by @gly11 across the main-branch PR history, especially PR #19 (feat(sbp2): add session registry and command flow) and its follow-up hardening commits. The implementation here preserves that behavior/test oracle while decomposing the monoliths into DICE-style components.

Verification

  • ./build.sh --test-only --no-bump — 1168/1168 host tests passed.
  • ./build.sh --no-bump — Xcode/IIG build succeeded.

Remaining Risk

This builds and passes the host test suite, but it still requires hardware verification against an SBP-2 target. The important smoke test is login → INQUIRY, ideally followed by a bus reset/reconnect check.


mrmidi commented Jun 18, 2026

Owner Author

@gly11 it builds, but it has not been tested on hardware. Could you test it, please?

mrmidi and others added 11 commits June 18, 2026 17:24
…tagging

FW-55 (foundation deltas for the SBP-2 session/command port). The session
layer ported from PR #19 tags each of its address-space ranges (login ORB,
login response, status FIFO, reconnect/logout ORBs) with a human-readable
label for diagnostics. DICE's AddressSpaceManager already covers every other
API the session layer needs and already had the remote-write callback
lifetime safety (callback copied out of the lock before firing), so this is
the only foundation gap.

Diagnostics-only: the label feeds range-dump logging and is a safe no-op for
unknown/zero handles and null labels. Added a contract test covering bad
input and that labelling never perturbs range state.
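The no-op contract described above can be sketched on the host as follows. This is a minimal illustration under stated assumptions, not DICE's AddressSpaceManager: `RangeTable`, `SetLabel`, and `Find` are hypothetical names, and handle 0 is assumed to be the invalid handle.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for one address-space range.
struct Range {
    uint64_t address = 0;
    uint32_t length = 0;
    std::string label;  // diagnostics only; never consulted for routing
};

// Hypothetical range table; not the real AddressSpaceManager API.
class RangeTable {
public:
    uint64_t Add(uint64_t address, uint32_t length) {
        const uint64_t handle = nextHandle_++;
        Range range;
        range.address = address;
        range.length = length;
        ranges_[handle] = range;
        return handle;
    }

    // Safe no-op for zero/unknown handles and null labels.
    void SetLabel(uint64_t handle, const char* label) {
        if (handle == 0 || label == nullptr) return;
        auto it = ranges_.find(handle);
        if (it == ranges_.end()) return;
        it->second.label = label;  // touches only the label field
    }

    const Range* Find(uint64_t handle) const {
        auto it = ranges_.find(handle);
        return it == ranges_.end() ? nullptr : &it->second;
    }

    std::size_t Size() const { return ranges_.size(); }

private:
    uint64_t nextHandle_ = 1;  // assumption: 0 is reserved as "invalid"
    std::unordered_map<uint64_t, Range> ranges_;
};
```

A contract test in this style would call `SetLabel` with bad input and then check that the range count, address, and length are unchanged.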

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Decomposes PR #19's two monoliths (SBP2LoginSession.cpp 1847 lines,
SBP2SessionRegistry.cpp 799 lines) into DICE-style single-purpose
components, and maps every #19 ORB call onto DICE's foundation API.

LoginSession -> LoginSession (orchestrator/state) + LoginOrbExchange
(management plane) + FetchAgent (command plane) + UnsolicitedStatusSink.
SessionRegistry -> SessionRegistry (identity/lifecycle) + CommandExecutor
(command plane) + slim SessionRecord.

Captures the single-Default-queue simplification (delete #19's owned
timeout-queue machinery) and the small CommandORB foundation additions
FW-56 needs (IsValid, bool SetCommandBlock, kern_return_t returns).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Approved guidance folded into the FW-56 plan:
- Timers via IOTimerDispatchSource + OSAction (WatchdogCoordinator precedent),
  not SBP2DelayedDispatch's IOSleep-on-queue hack which blocks the single
  Default queue thread for the whole delay.
- OSAction targets must be IIG TYPE() methods on a DriverKit class; the POCO
  session components therefore take an injected ISessionScheduler (production:
  IOTimerDispatchSource+OSAction wired in FW-58; tests: virtual-clock fake).
- Range memory stays IOBufferMemoryDescriptor/IODMACommand via
  AddressSpaceManager (handles, not raw buffers).
- UnsolicitedStatusSink owns the single status-FIFO range + routing callback.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

POCO session components (decided) can't host an OSAction directly, so they
take an injected one-shot timer interface instead of scheduling timers
themselves. Production backs it with a driver-level IOTimerDispatchSource +
OSAction (FW-58); host tests use FakeSessionScheduler, a deterministic
virtual clock.

The fake steps the clock to each callback's own deadline before firing, so a
handler that re-schedules with a relative delay (reconnect, busy-timeout
replay) computes its deadline from its fire time — matching a real timer.
5 contract tests cover deadline ordering, cancel, re-entrant scheduling, and
cross-cancellation. Object-model decision recorded in SBP2_SESSION_PORT.md.
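The virtual-clock behavior described above (step to each callback's own deadline, then fire) can be sketched as below. This is an illustrative host-only model, not the project's FakeSessionScheduler: the class name `FakeScheduler` and the methods `ScheduleAfter`, `Cancel`, `AdvanceTo`, and `NowMs` are assumptions made for the example.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <map>
#include <utility>

// Hypothetical deterministic scheduler driven by a virtual clock.
class FakeScheduler {
public:
    using Token = uint64_t;
    using Callback = std::function<void()>;

    uint64_t NowMs() const { return nowMs_; }

    // Schedule a one-shot callback delayMs after the current virtual time.
    Token ScheduleAfter(uint64_t delayMs, Callback cb) {
        const Token token = nextToken_++;
        pending_.emplace(std::make_pair(nowMs_ + delayMs, token), std::move(cb));
        return token;
    }

    // Returns true if a pending callback was removed.
    bool Cancel(Token token) {
        for (auto it = pending_.begin(); it != pending_.end(); ++it) {
            if (it->first.second == token) {
                pending_.erase(it);
                return true;
            }
        }
        return false;
    }

    // Fire every callback due at or before targetMs, in deadline order.
    void AdvanceTo(uint64_t targetMs) {
        while (!pending_.empty() && pending_.begin()->first.first <= targetMs) {
            auto it = pending_.begin();
            const uint64_t deadline = it->first.first;
            Callback cb = std::move(it->second);
            pending_.erase(it);  // erase first so the callback may re-schedule or cancel
            nowMs_ = deadline;   // step the clock to this callback's own deadline
            cb();
        }
        if (targetMs > nowMs_) nowMs_ = targetMs;
    }

private:
    uint64_t nowMs_ = 0;
    Token nextToken_ = 1;
    // Keyed by (deadline, token): ordered by deadline, ties broken by schedule order.
    std::map<std::pair<uint64_t, Token>, Callback> pending_;
};
```

Because the clock is set to the deadline before the callback runs, a handler that fires at 100 ms and re-schedules itself with a 50 ms delay lands at 150 ms, regardless of how far `AdvanceTo` was asked to jump.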

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

The session/command layer (FW-56) relies on three CommandORB guarantees the
DICE ctor doesn't currently surface:
- IsValid(): the ctor calls AllocateResources() but swallows its bool, so a
  failed ORB allocation is otherwise undetectable. Callers check before submit.
- SetCommandBlock now returns bool and rejects (not truncates) a CDB larger
  than maxCommandBlockSize_.
- PrepareForExecution / SetNextORBAddress / SetToDummy return kern_return_t so
  the fetch-agent path can propagate write/alloc failures (kIOReturnNotReady
  when the ORB isn't allocated).

WriteORBToAddressSpace now returns its write status. 3 new SBP2ORBTests cover
validity, oversized-CDB rejection, and chain/dummy success. 12/12 green.
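The three guarantees listed above can be modelled on the host roughly as follows. This is a sketch under assumptions, not the DICE CommandORB: `CommandOrbSketch`, `Status`, `kOk`, and `kNotReady` are hypothetical stand-ins (the latter three for `kern_return_t`, `kIOReturnSuccess`, and `kIOReturnNotReady`), and the allocation result is injected through the constructor purely for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

using Status = int;              // stand-in for kern_return_t
constexpr Status kOk = 0;        // stand-in for kIOReturnSuccess
constexpr Status kNotReady = 1;  // stand-in for kIOReturnNotReady

// Hypothetical host-only model; not the real CommandORB.
class CommandOrbSketch {
public:
    CommandOrbSketch(std::size_t maxCdbSize, bool allocationSucceeds)
        : maxCdbSize_(maxCdbSize), valid_(allocationSucceeds), cdb_(maxCdbSize, 0) {}

    // Surfaces the allocation result the constructor would otherwise swallow.
    bool IsValid() const { return valid_; }

    // Rejects (does not truncate) a CDB larger than the configured maximum.
    bool SetCommandBlock(const uint8_t* cdb, std::size_t length) {
        if (cdb == nullptr || length > maxCdbSize_) return false;
        std::fill(cdb_.begin(), cdb_.end(), 0);
        if (length > 0) std::memcpy(cdb_.data(), cdb, length);
        cdbLength_ = length;
        return true;
    }

    // Propagates "not allocated" instead of failing silently.
    Status PrepareForExecution() const { return valid_ ? kOk : kNotReady; }

    std::size_t CommandBlockLength() const { return cdbLength_; }

private:
    std::size_t maxCdbSize_;
    bool valid_;
    std::vector<uint8_t> cdb_;
    std::size_t cdbLength_ = 0;
};
```

A caller in this model checks `IsValid()` before submitting and treats a `false` return from `SetCommandBlock` as a hard failure, so an oversized CDB never reaches the target in truncated form.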

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Reading OnStatusBlockRemoteWrite shows status reception switches on LoginState
and routes straight into login/reconnect/logout completion, all sharing state_,
loginGeneration_, and the timers. Separating LoginOrbExchange and
UnsolicitedStatusSink would scatter that shared mutable state across class
boundaries. The one clean seam is the post-login command plane.

Login side is now LoginSession (state machine + status routing + ranges) +
FetchAgent (command plane). Build order revised bottom-up: FetchAgent →
LoginSession → SessionRegistry → CommandExecutor (the registry constructs a
LoginSession, so it can't precede it).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Extracts the post-login ORB submission machinery from #19's SBP2LoginSession
into a focused FetchAgent owned by the session (composition). Submits immediate
ORBs to the fetch-agent register, chains subsequent ORBs, rings the doorbell,
retries failed fetch-agent writes, tracks outstanding ORBs, times them out, and
matches incoming status blocks back to their ORB.

Adapted to DICE: driven by an explicit Binding (generation/node/agent addresses)
the session supplies on login, instead of reading login state directly; ORB
timeouts and write-retry backoff go through the injected ISessionScheduler (not
the IOSleep-on-queue path); CommandORB kern_return_t returns are propagated;
async bus callbacks guarded by a weak lifetime token. Adds CommandORB::GetTimeout.

4 FetchAgentTests cover unbound rejection, immediate submit→write→timeout-arm,
status completion, ORB timeout, and write-retry exhaustion → agent reset. Green.
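One common way to guard async callbacks with a weak lifetime token is shown below. The source does not show how FetchAgent implements its token, so this is only an assumed shape: `AgentSketch` and `MakeWriteCompletion` are hypothetical, and the sketch assumes callbacks and destruction happen on the same queue (no concurrent teardown).

```cpp
#include <cassert>
#include <functional>
#include <memory>

// Hypothetical agent whose late completions must not touch it after destruction.
class AgentSketch {
public:
    explicit AgentSketch(int* completionCounter)
        : alive_(std::make_shared<bool>(true)), counter_(completionCounter) {}

    // Returns a completion that the bus layer may invoke later,
    // possibly after this object has been destroyed.
    std::function<void(int)> MakeWriteCompletion() {
        std::weak_ptr<bool> weak = alive_;
        return [this, weak](int /*status*/) {
            if (weak.expired()) return;  // owner is gone: drop the callback
            ++*counter_;                 // safe: owner is still alive here
        };
    }

private:
    std::shared_ptr<bool> alive_;  // destroyed with the agent, expiring all tokens
    int* counter_;
};
```

The token check comes before any access through `this`, which is what makes a completion that arrives after teardown harmless in this single-queue model.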

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Decompose PR #19's SBP2LoginSession (1847 lines): the login/reconnect/
logout state machine + status-FIFO routing stays here as Session/
LoginSession; the post-login command plane is the composed FetchAgent
(step 2). DICE adaptations:

  * Timers run on the injected ISessionScheduler (one cancelable
    management timer at a time), replacing #19's two-queue IOSleep model
    (SetTimeoutQueue/EnsureTimeoutQueue/owned queue all removed, §4).
  * SubmitORB/ResetFetchAgent/solicited-status routing delegate to the
    FetchAgent; login/reconnect success Bind() it, bus reset/logout
    Unbind() it.
  * Logs under the Async category (DICE has no SBP2 log category).

Also fixes a FetchAgent port slip: the fetch-agent write retry backoff
was 1 ms but PR #19 (the behavioral oracle) uses 1000 ms; the ported
ImmediateORBRetry... test pins the 1000 ms timing.

Ported SBP2LoginSessionTests -> LoginSessionTests, adapted to the
scheduler model and FetchAgent delegation; #19's two-queue-specific
test is dropped. 39 SBP2 host tests green.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Decompose PR #19's SBP2SessionRegistry (799 lines, god-object record):

  * SessionRegistry (Session/SessionRegistry.{hpp,cpp}) — identity &
    lifecycle only: Create/StartLogin/GetSessionState/Release/OnBusReset/
    RefreshTargets, dup-target reject (afcbd9f), owner validation
    (8b64806), release-sessions-before-ranges ordering (9ca0d8e),
    async logout-retain (retiringSessions_ + SetReleaseLogoutCallback).
  * SessionRecord — slim value type; the command god-object is lifted
    into a per-record CommandExecutor.
  * CommandExecutor (Session/CommandExecutor.{hpp,cpp}) — command plane:
    owns command ORB / page table / management ORB / in-flight + result
    state, drives the session's FetchAgent via LoginSession::SubmitORB,
    preserves inquiry-failure status (f8b0403) and failed-ORB resource
    release (45a5609).

DICE adaptations vs #19:
  * LoginSession timers run on an injected ISessionScheduler (registry
    ctor argument); the two-queue model is gone. ManagementORB (foundation)
    keeps its single work queue, passed through for task-management timers.
  * ReleaseSession no longer blocks the single Default queue with an
    IOSleep(10) wait-loop; like ReleaseOwner it starts logout, retires the
    session, and lets the async logout completion / scheduler timeout erase
    it (the wait-loop path was untested).
  * CleanupCommandResources clears the session's fetch-agent ORB tracking
    via LoginSession::ClearCommandTracking (mirrors #19's
    session->ClearORBTracking), cancelling any in-flight fetch-agent write
    when a command completes/fails/aborts.
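The non-blocking release described in the adaptations above (start logout, retire the session, let the async completion erase it) might look like this in outline. All names here (`RegistrySketch`, `SessionSketch`, `CompleteLogout`) are hypothetical, and `CompleteLogout` stands in for whatever delivers the real logout status or scheduler timeout.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <map>
#include <memory>
#include <utility>

// Hypothetical session: remembers the callback to run when logout finishes.
struct SessionSketch {
    std::function<void()> onLogoutDone;
    void StartLogout(std::function<void()> done) { onLogoutDone = std::move(done); }
};

// Hypothetical registry showing retire-then-erase instead of a sleep loop.
class RegistrySketch {
public:
    uint64_t Create() {
        const uint64_t id = nextId_++;
        active_[id] = std::make_shared<SessionSketch>();
        return id;
    }

    // Returns immediately; the session stays alive in retiring_ until logout completes.
    bool Release(uint64_t id) {
        auto it = active_.find(id);
        if (it == active_.end()) return false;
        std::shared_ptr<SessionSketch> session = std::move(it->second);
        active_.erase(it);
        retiring_[id] = session;
        session->StartLogout([this, id] { retiring_.erase(id); });
        return true;
    }

    // Simulates the async logout completion (or timeout) arriving later.
    void CompleteLogout(uint64_t id) {
        auto it = retiring_.find(id);
        if (it == retiring_.end()) return;
        // Move the callback out first: running it destroys the session.
        std::function<void()> done = std::move(it->second->onLogoutDone);
        if (done) done();
    }

    std::size_t ActiveCount() const { return active_.size(); }
    std::size_t RetiringCount() const { return retiring_.size(); }

private:
    uint64_t nextId_ = 1;
    std::map<uint64_t, std::shared_ptr<SessionSketch>> active_;
    std::map<uint64_t, std::shared_ptr<SessionSketch>> retiring_;
};
```

The design point is that `Release` never waits: the caller's queue stays free, and the retiring map is the only thing keeping the session alive until completion.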

Ported SBP2SessionRegistryTests -> SessionRegistryTests, adapted to the
decomposed API + scheduler. Two assertions updated for DICE's 16-byte
NormalORB header (PR #19 assumed 20). The two SBP2Handler-dependent tests
defer to FW-57 (the session-aware user-client handler is not on DICE yet).

Full host suite green: 1166/1166.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

…Handler)

Expose the FW-56 session/command layer across the DriverKit user-client
boundary (host-testable core):

  * UserClient/WireFormats/SBP2CommandWireFormats.hpp — net-new ABI records
    (SBP2CommandRequestWire / SBP2CommandResultWire), ported verbatim from
    PR #19 with the static_assert layout guards intact.
  * UserClient/Handlers/SBP2Handler.hpp — re-thread #19's session-aware
    handler onto DICE's decomposed SessionRegistry (type/path adapted from
    SBP2SessionRegistry). Adds Create/StartLogin/GetSessionState/Inquiry/
    Command/CommandResult/TaskManagement/ReleaseSession on top of the
    foundation address-space methods. The registry pointer defaults to null
    so the existing address-space-only construction keeps compiling until
    the registry is wired into the driver lifecycle (FW-58). Owner-validation
    contract (void* owner + opaque handle) passes straight through (8b64806);
    SubmitSBP2Command hardens the structure-input ABI (1ee4515); ReleaseOwner
    releases sessions before address ranges (9ca0d8e).
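For readers unfamiliar with layout guards on wire records, the general technique looks like the sketch below. The struct is invented for illustration: its field names, sizes, and offsets do not reproduce SBP2CommandRequestWire, and `DecodeRequest` is a hypothetical example of the kind of length check that structure-input hardening usually involves, not the handler's actual code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <type_traits>

// Hypothetical wire record; fields are illustrative only.
struct ExampleRequestWire {
    uint64_t sessionHandle;
    uint32_t cdbLength;
    uint32_t transferLength;
    uint8_t  cdb[16];
};

// Layout guards: a change that shifts the ABI fails at compile time.
static_assert(std::is_standard_layout<ExampleRequestWire>::value, "must be standard layout");
static_assert(std::is_trivially_copyable<ExampleRequestWire>::value, "must be memcpy-safe");
static_assert(sizeof(ExampleRequestWire) == 32, "record size is part of the ABI");
static_assert(offsetof(ExampleRequestWire, cdbLength) == 8, "cdbLength offset is part of the ABI");
static_assert(offsetof(ExampleRequestWire, cdb) == 16, "cdb offset is part of the ABI");

// Validate an untrusted structure-input buffer before using its contents.
inline bool DecodeRequest(const void* data, std::size_t length, ExampleRequestWire* out) {
    if (data == nullptr || out == nullptr) return false;
    if (length != sizeof(ExampleRequestWire)) return false;  // reject short or long input
    std::memcpy(out, data, sizeof(*out));                    // copy; never alias caller memory
    if (out->cdbLength > sizeof(out->cdb)) return false;     // embedded length must fit
    return true;
}
```

Copying into a local record before validating the embedded length avoids reading fields from a buffer the caller could still be modifying.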

Ported the two SBP2Handler tests deferred from FW-56 → SBP2HandlerTests:
GetSBP2SessionState scalar-output sizing + SubmitSBP2Command ABI hardening.
Full host suite green: 1168/1168.

Remaining FW-57 driver-integration (paired with FW-58, needs Xcode/IIG to
verify): add the SBP-2 session selectors to ASFWDriver.iig + dispatch table,
and construct the SessionRegistry in UserClientRuntimeState (it needs the
scheduler + device manager the driver lifecycle owns). The .dext compile of
those selectors lands in FW-59.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

mrmidi force-pushed the sbp2-session-port branch from 7d21aed to 5cf44cb June 18, 2026 15:27

mrmidi commented Jun 18, 2026

Owner Author

Rebased sbp2-session-port onto current origin/DICE (c2bdf11, PR #29 included) and force-pushed the PR branch.

The reported DICE failure appears to have been a stale-base issue:

  • DiceDuplexRestartCoordinatorTests.LatestPendingClockRequestWinsDuringRestart: passes locally after rebase (1/1)
  • full host suite: passes locally after rebase (1143/1143)
  • Xcode/IIG build: succeeds after rebase with the existing isoch analyzer/script warnings only

The branch is now 0 commits behind origin/DICE and 11 SBP-2 commits ahead. Still requires hardware verification.


mrmidi commented Jun 19, 2026

Owner Author

Merging as is so that other feature branches can sync against it.

mrmidi marked this pull request as ready for review June 19, 2026 11:28
mrmidi merged commit dcf3b71 into DICE Jun 19, 2026
2 checks passed

gly11 commented Jun 19, 2026

Contributor

@gly11 it builds, but it has not been tested on hardware. Could you test it, please?

Sorry for the slow reply, I’ve been tied up with a few other things recently.

Thanks for porting and merging this. I’ll test the current main branch on my hardware and report back with the results. If I find any issues, I’ll try to fix them in a focused PR or report them with logs/details.



2 participants