Skip to content

feat(sbp2): add session registry and command flow#19

Merged
mrmidi merged 22 commits into
mrmidi:mainfrom
gly11:pr/sbp2-session-core
Jun 2, 2026
Merged

feat(sbp2): add session registry and command flow#19
mrmidi merged 22 commits into
mrmidi:mainfrom
gly11:pr/sbp2-session-core

Conversation

@gly11

@gly11 gly11 commented May 27, 2026

Copy link
Copy Markdown
Contributor

Summary

This PR builds on the foundation changes merged in #18 and splits out the SBP-2 session and command core from the larger SBP-2 bring-up branch. The branch has been rebased onto current main, so the visible diff is now limited to the SBP-2 session layer.

SBP-2 Session Core

  • Add login, reconnect, and logout session state handling, including bus-reset retry behavior, busy-timeout replay, and unsolicited status setup.
  • Add a session registry for discovered SBP-2 units and command lifecycle management.
  • Add command submission helpers for standard SCSI command flow and task management.
  • Wire SBP-2 session operations through ControllerCore, DriverContext, and the user-client handler.

ORB and Addressing

  • Extend management and command ORB layout and status handling.
  • Add full bus/node embedded-address handling and page-table/direct-descriptor updates.
  • Add command wire formats for the user-client boundary.

Tests

  • Add login-session and session-registry coverage.
  • Extend ORB and address-space tests for timeout, status, embedded-address, and command lifecycle behavior.

Why this split

This PR intentionally excludes local debug UI, diagnostic handlers, install-helper changes, diagnostic scripts, and documentation experiments. Those can be reviewed separately or kept local. The async discovery and bus-reset foundations landed in #18; this layer adds SBP-2 session and command behavior on top of that.

Verification

cmake -S . -B build-pr2 -DCMAKE_BUILD_TYPE=Debug
cmake --build build-pr2 --target AddressSpaceManagerTests SBP2ORBTests SBP2LoginSessionTests SBP2SessionRegistryTests --parallel 8
for t in AddressSpaceManagerTests SBP2ORBTests SBP2LoginSessionTests SBP2SessionRegistryTests; do "build-pr2/tests/$t" || exit $?; done
xcodebuild -project ASFW.xcodeproj -scheme ASFW -configuration Debug -destination 'platform=macOS' CODE_SIGNING_ALLOWED=NO build

After #18 merged, this branch was rebased onto current upstream/main; git diff --check upstream/main..pr/sbp2-session-core passes.

@gly11 gly11 force-pushed the pr/sbp2-session-core branch from 2823267 to 9113139 Compare May 27, 2026 07:54
@gly11 gly11 marked this pull request as ready for review May 27, 2026 13:32

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 91131392ab

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +103 to +105
if (lock_ != nullptr) {
IOLockFree(lock_);
lock_ = nullptr;

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Unlock the registry lock before freeing it

When the registry is destroyed, this frees lock_ while the IOLockGuard created at the top of the destructor is still alive. As the function returns, the guard destructor will call IOLockUnlock on the already-freed lock pointer it captured, which can crash or corrupt memory during driver shutdown/unload. End the guard scope before calling IOLockFree or free the lock after it has been unlocked.

Useful? React with 👍 / 👎.

@gly11 gly11 May 27, 2026

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in the current branch head (166b8a7). The registry destructor now ends the lock guard scope before freeing lock_, so the guard cannot unlock a freed IOLock during shutdown.

Comment on lines +1284 to +1286
orb->SetFetchAgentWriteRetries(testFetchAgentWriteRetries_);
orb->SetAppended(true);
outstandingORBs_[MakeORBKey(orb->GetORBAddress())] = orb;

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Start submitted ORB timeouts

After a command ORB is prepared and inserted into outstandingORBs_, no code starts the SBP2CommandORB timer. In the case where the fetch-agent write succeeds but the device never writes a status block, the registry leaves commandInFlight set and callers polling GetCommandResult wait forever despite SubmitCommand having configured a timeout on the ORB. Start the ORB timer when the ORB is actually submitted to the fetch agent/chain, and cancel it on normal status completion as this code already does.

Useful? React with 👍 / 👎.

@gly11 gly11 May 27, 2026

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in the current branch head (166b8a7). Submitted command ORBs now start their timeout once the fetch-agent write succeeds and the target can fetch the ORB; normal completion and failure paths still cancel the timer.

CleanupCommandResources(*rec);
});

if (!record->session->SubmitORB(orb.get())) {

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Avoid submit callbacks while holding the registry lock

This calls SubmitORB while still holding the registry IOLock, but SubmitORB can synchronously invoke the ORB completion callback on immediate submit failure (for example when WriteBlock returns an empty handle via AppendORBImmediateFailSubmittedORB). That callback immediately tries to take the same lock_, so a transient transport submission failure deadlocks the user-client request path instead of returning an error. Move the submission outside the locked section or ensure synchronous submit failures do not call back under this lock.

Useful? React with 👍 / 👎.

@gly11 gly11 May 27, 2026

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in the current branch head (166b8a7). SubmitCommand now prepares and records the command under the registry lock, releases that lock before calling SubmitORB, and cleans up safely if submission fails. This avoids synchronous submit-failure callbacks re-entering the registry lock.

@gly11 gly11 force-pushed the pr/sbp2-session-core branch 2 times, most recently from eff2132 to 76a8f1d Compare May 27, 2026 14:44
@gly11

gly11 commented May 27, 2026

Copy link
Copy Markdown
Contributor Author

The failing check on this PR is from the C++ coverage processing step, not from the SBP-2 code or tests. The run completed all 480 tests successfully, then failed while merging LLVM profile data.

I split the CI fix into #20. That PR isolates LLVM profile outputs for both test discovery and test execution, and its Build and Test check passes. Once #20 lands, this PR should only need a rerun/rebase against the updated workflow.

Comment thread ASFWDriver/Protocols/SBP2/SBP2WireFormats.hpp Outdated
Comment thread ASFWDriver/Protocols/SBP2/SBP2LoginSession.cpp
Comment thread ASFWDriver/Protocols/SBP2/SBP2LoginSession.cpp
Comment thread ASFWDriver/Protocols/SBP2/SBP2WireFormats.hpp
Comment thread ASFWDriver/Protocols/SBP2/SBP2CommandORB.cpp Outdated
Comment thread ASFWDriver/Protocols/SBP2/SBP2CommandORB.hpp Outdated
Comment thread ASFWDriver/Protocols/SBP2/SBP2CommandORB.cpp
Comment thread ASFWDriver/Protocols/SBP2/SBP2PageTable.hpp
Comment thread ASFWDriver/Protocols/SBP2/SBP2SessionRegistry.cpp
Comment thread ASFWDriver/Protocols/SBP2/SBP2SessionRegistry.cpp
Comment thread ASFWDriver/Protocols/SBP2/SBP2SessionRegistry.cpp
Comment thread ASFWDriver/Protocols/SBP2/SBP2SessionRegistry.cpp
Comment thread ASFWDriver/Protocols/SBP2/SBP2SessionRegistry.cpp Outdated
Comment thread ASFWDriver/Protocols/SBP2/SBP2SessionRegistry.cpp
Comment thread ASFWDriver/Protocols/SBP2/SBP2SessionRegistry.hpp
Comment thread ASFWDriver/Service/DriverContext.cpp

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

@mrmidi

mrmidi commented Jun 2, 2026

Copy link
Copy Markdown
Owner

Thanks @gly11 — this looks like a great starting point for the SBP-2 implementation.

I’m going to merge this as a foundation. The next step will be to adapt and split the SBP-2 pieces so they align better with the newer protocol/device architecture we now have on the DICE branch. That follow-up restructuring can happen separately; this PR gives us a useful base to build from.

@mrmidi mrmidi merged commit fa380ac into mrmidi:main Jun 2, 2026
2 checks passed
@gly11

gly11 commented Jun 2, 2026

Copy link
Copy Markdown
Contributor Author

I have a few follow-up draft branches from the old stack, but I do not want to open follow-up PRs against the wrong base or architecture.

Should small independent fixes still target main, or would you prefer follow-up work to target DICE while the newer protocol/device architecture is being developed there?

I can re-split the queue so SBP-2-related work waits for the DICE-aligned structure, while only truly independent app/debug fixes go to main if that is preferred.

@mrmidi

mrmidi commented Jun 2, 2026

Copy link
Copy Markdown
Owner

Preferably wait a bit. OR f you have some tokens to burn and time to debug/test — read below :).

I have some ideas how to re-organize different protocols: main idea could be seen in f19d2d2. Bierfly — decouple audio leakage from discovery and clear separation how protocols should be loaded. In the same time — i'm finishing full Bus/IRM manager implementation. So the goal is not to grow the driver for every single device, but make it hardware agnostic where possible — follow the specs first, quirks later.
Also in the DICE branch: some critical bug fixes regarding OHCI hw initialization, some IEEE 1394 constants fixes, etc. Generally it's much more reliable with my hardware right now. (audio performance degraded a bit, but that's not what I care right now). I've already ported your config rom fixes there too — please test if it works with your Nikon if you can.

So for SBP-2 i see it it like that (it's a draft, just the core idea):

ASFWDriver/
  DeviceProfiles/
    Storage/
      StorageProfileTypes.hpp
      StorageProfileRegistry.hpp
      StorageProfileRegistry.cpp
      Generic/
        GenericSBP2Profiles.hpp
        GenericSBP2Profiles.cpp
      Vendors/  <- probably not needed at all for SBP-2

  Storage/
    StorageCoordinator.hpp
    StorageCoordinator.cpp
    StorageRuntimeRegistry.hpp
    StorageRuntimeRegistry.cpp

    SCSI/ <- SCSIControllerDriverKit part
      DriverKitSCSIController.hpp
      DriverKitSCSIController.cpp
      DriverKitSCSIDevice.hpp
      DriverKitSCSIDevice.cpp
      DriverKitSCSIRequestBridge.hpp
      DriverKitSCSIRequestBridge.cpp

  Protocols/
    Storage/
      ISCSITransport.hpp
      SCSITransportTypes.hpp

      SBP2/
        SBP2StorageTransport.hpp
        SBP2StorageTransport.cpp

        Core/
          SBP2Constants.hpp
          SBP2Types.hpp
          SBP2TargetInfo.hpp
          SBP2TargetParser.hpp
          SBP2LoginSession.hpp
          SBP2LoginSession.cpp
          SBP2SessionState.hpp
          SBP2GenerationBinding.hpp

        ORB/
          SBP2WireFormats.hpp
          SBP2ManagementORB.hpp
          SBP2ManagementORB.cpp
          SBP2CommandORB.hpp
          SBP2CommandORB.cpp
          SBP2PageTable.hpp
          SBP2PageTable.cpp
          SBP2ORBExecutor.hpp
          SBP2ORBExecutor.cpp

        Addressing/
          SBP2AddressSpaceManager.hpp
          SBP2AddressSpaceManager.cpp
          SBP2AddressSpaceLease.hpp
          SBP2AddressSpaceLease.cpp
          SBP2StatusBlockAddress.hpp

        Status/
          SBP2StatusBlock.hpp
          SBP2StatusBlock.cpp
          SBP2StatusMapper.hpp
          SBP2StatusMapper.cpp

Ping me on Discord - we could chat about it more there!

@gly11

gly11 commented Jun 2, 2026

Copy link
Copy Markdown
Contributor Author

Preferably wait a bit. OR f you have some tokens to burn and time to debug/test — read below :).

I have some ideas how to re-organize different protocols: main idea could be seen in f19d2d2. Bierfly — decouple audio leakage from discovery and clear separation how protocols should be loaded. In the same time — i'm finishing full Bus/IRM manager implementation. So the goal is not to grow the driver for every single device, but make it hardware agnostic where possible — follow the specs first, quirks later. Also in the DICE branch: some critical bug fixes regarding OHCI hw initialization, some IEEE 1394 constants fixes, etc. Generally it's much more reliable with my hardware right now. (audio performance degraded a bit, but that's not what I care right now). I've already ported your config rom fixes there too — please test if it works with your Nikon if you can.

So for SBP-2 i see it it like that (it's a draft, just the core idea):

ASFWDriver/
  DeviceProfiles/
    Storage/
      StorageProfileTypes.hpp
      StorageProfileRegistry.hpp
      StorageProfileRegistry.cpp
      Generic/
        GenericSBP2Profiles.hpp
        GenericSBP2Profiles.cpp
      Vendors/  <- probably not needed at all for SBP-2

  Storage/
    StorageCoordinator.hpp
    StorageCoordinator.cpp
    StorageRuntimeRegistry.hpp
    StorageRuntimeRegistry.cpp

    SCSI/ <- SCSIControllerDriverKit part
      DriverKitSCSIController.hpp
      DriverKitSCSIController.cpp
      DriverKitSCSIDevice.hpp
      DriverKitSCSIDevice.cpp
      DriverKitSCSIRequestBridge.hpp
      DriverKitSCSIRequestBridge.cpp

  Protocols/
    Storage/
      ISCSITransport.hpp
      SCSITransportTypes.hpp

      SBP2/
        SBP2StorageTransport.hpp
        SBP2StorageTransport.cpp

        Core/
          SBP2Constants.hpp
          SBP2Types.hpp
          SBP2TargetInfo.hpp
          SBP2TargetParser.hpp
          SBP2LoginSession.hpp
          SBP2LoginSession.cpp
          SBP2SessionState.hpp
          SBP2GenerationBinding.hpp

        ORB/
          SBP2WireFormats.hpp
          SBP2ManagementORB.hpp
          SBP2ManagementORB.cpp
          SBP2CommandORB.hpp
          SBP2CommandORB.cpp
          SBP2PageTable.hpp
          SBP2PageTable.cpp
          SBP2ORBExecutor.hpp
          SBP2ORBExecutor.cpp

        Addressing/
          SBP2AddressSpaceManager.hpp
          SBP2AddressSpaceManager.cpp
          SBP2AddressSpaceLease.hpp
          SBP2AddressSpaceLease.cpp
          SBP2StatusBlockAddress.hpp

        Status/
          SBP2StatusBlock.hpp
          SBP2StatusBlock.cpp
          SBP2StatusMapper.hpp
          SBP2StatusMapper.cpp

Ping me on Discord - we could chat about it more there!

That makes sense. I’ll test the DICE branch with my Nikon hardware next.

If I find any issues, I’ll report them in an issue or open a focused PR with a fix.

I’ll also keep an eye on Discord for follow-up discussion.

mrmidi added a commit that referenced this pull request Jun 18, 2026
…tagging

FW-55 (foundation deltas for the SBP-2 session/command port). The session
layer ported from PR #19 tags each of its address-space ranges (login ORB,
login response, status FIFO, reconnect/logout ORBs) with a human-readable
label for diagnostics. DICE's AddressSpaceManager already covers every other
API the session layer needs and already had the remote-write callback
lifetime safety (callback copied out of the lock before firing), so this is
the only foundation gap.

Diagnostics-only: the label feeds range-dump logging and is a safe no-op for
unknown/zero handles and null labels. Added a contract test covering bad
input and that labelling never perturbs range state.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
mrmidi added a commit that referenced this pull request Jun 18, 2026
Decomposes PR #19's two monoliths (SBP2LoginSession.cpp 1847 lines,
SBP2SessionRegistry.cpp 799 lines) into DICE-style single-purpose
components, and maps every #19 ORB call onto DICE's foundation API.

LoginSession -> LoginSession (orchestrator/state) + LoginOrbExchange
(management plane) + FetchAgent (command plane) + UnsolicitedStatusSink.
SessionRegistry -> SessionRegistry (identity/lifecycle) + CommandExecutor
(command plane) + slim SessionRecord.

Captures the single-Default-queue simplification (delete #19's owned
timeout-queue machinery) and the small CommandORB foundation additions
FW-56 needs (IsValid, bool SetCommandBlock, kern_return_t returns).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
mrmidi added a commit that referenced this pull request Jun 18, 2026
Extracts the post-login ORB submission machinery from #19's SBP2LoginSession
into a focused FetchAgent owned by the session (composition). Submits immediate
ORBs to the fetch-agent register, chains subsequent ORBs, rings the doorbell,
retries failed fetch-agent writes, tracks outstanding ORBs, times them out, and
matches incoming status blocks back to their ORB.

Adapted to DICE: driven by an explicit Binding (generation/node/agent addresses)
the session supplies on login, instead of reading login state directly; ORB
timeouts and write-retry backoff go through the injected ISessionScheduler (not
the IOSleep-on-queue path); CommandORB kern_return_t returns are propagated;
async bus callbacks guarded by a weak lifetime token. Adds CommandORB::GetTimeout.

4 FetchAgentTests cover unbound rejection, immediate submit→write→timeout-arm,
status completion, ORB timeout, and write-retry exhaustion → agent reset. Green.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
mrmidi added a commit that referenced this pull request Jun 18, 2026
Decompose PR #19's SBP2LoginSession (1847 lines): the login/reconnect/
logout state machine + status-FIFO routing stays here as Session/
LoginSession; the post-login command plane is the composed FetchAgent
(step 2). DICE adaptations:

  * Timers run on the injected ISessionScheduler (one cancelable
    management timer at a time), replacing #19's two-queue IOSleep model
    (SetTimeoutQueue/EnsureTimeoutQueue/owned queue all removed, §4).
  * SubmitORB/ResetFetchAgent/solicited-status routing delegate to the
    FetchAgent; login/reconnect success Bind() it, bus reset/logout
    Unbind() it.
  * Logs under the Async category (DICE has no SBP2 log category).

Also fixes a FetchAgent port slip: the fetch-agent write retry backoff
was 1 ms but PR #19 (the behavioral oracle) uses 1000 ms; the ported
ImmediateORBRetry... test pins the 1000 ms timing.

Ported SBP2LoginSessionTests -> LoginSessionTests, adapted to the
scheduler model and FetchAgent delegation; #19's two-queue-specific
test is dropped. 39 SBP2 host tests green.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
mrmidi added a commit that referenced this pull request Jun 18, 2026
Decompose PR #19's SBP2SessionRegistry (799 lines, god-object record):

  * SessionRegistry (Session/SessionRegistry.{hpp,cpp}) — identity &
    lifecycle only: Create/StartLogin/GetSessionState/Release/OnBusReset/
    RefreshTargets, dup-target reject (afcbd9f), owner validation
    (8b64806), release-sessions-before-ranges ordering (9ca0d8e),
    async logout-retain (retiringSessions_ + SetReleaseLogoutCallback).
  * SessionRecord — slim value type; the command god-object is lifted
    into a per-record CommandExecutor.
  * CommandExecutor (Session/CommandExecutor.{hpp,cpp}) — command plane:
    owns command ORB / page table / management ORB / in-flight + result
    state, drives the session's FetchAgent via LoginSession::SubmitORB,
    preserves inquiry-failure status (f8b0403) and failed-ORB resource
    release (45a5609).

DICE adaptations vs #19:
  * LoginSession timers run on an injected ISessionScheduler (registry
    ctor argument); the two-queue model is gone. ManagementORB (foundation)
    keeps its single work queue, passed through for task-management timers.
  * ReleaseSession no longer blocks the single Default queue with an
    IOSleep(10) wait-loop; like ReleaseOwner it starts logout, retires the
    session, and lets the async logout completion / scheduler timeout erase
    it (the wait-loop path was untested).
  * CleanupCommandResources clears the session's fetch-agent ORB tracking
    via LoginSession::ClearCommandTracking (mirrors #19's
    session->ClearORBTracking), cancelling any in-flight fetch-agent write
    when a command completes/fails/aborts.

Ported SBP2SessionRegistryTests -> SessionRegistryTests, adapted to the
decomposed API + scheduler. Two assertions updated for DICE's 16-byte
NormalORB header (PR #19 assumed 20). The two SBP2Handler-dependent tests
defer to FW-57 (the session-aware user-client handler is not on DICE yet).

Full host suite green: 1166/1166.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
mrmidi added a commit that referenced this pull request Jun 18, 2026
…Handler)

Expose the FW-56 session/command layer across the DriverKit user-client
boundary (host-testable core):

  * UserClient/WireFormats/SBP2CommandWireFormats.hpp — net-new ABI records
    (SBP2CommandRequestWire / SBP2CommandResultWire), ported verbatim from
    PR #19 with the static_assert layout guards intact.
  * UserClient/Handlers/SBP2Handler.hpp — re-thread #19's session-aware
    handler onto DICE's decomposed SessionRegistry (type/path adapted from
    SBP2SessionRegistry). Adds Create/StartLogin/GetSessionState/Inquiry/
    Command/CommandResult/TaskManagement/ReleaseSession on top of the
    foundation address-space methods. The registry pointer defaults to null
    so the existing address-space-only construction keeps compiling until
    the registry is wired into the driver lifecycle (FW-58). Owner-validation
    contract (void* owner + opaque handle) passes straight through (8b64806);
    SubmitSBP2Command hardens the structure-input ABI (1ee4515); ReleaseOwner
    releases sessions before address ranges (9ca0d8e).

Ported the two SBP2Handler tests deferred from FW-56 → SBP2HandlerTests:
GetSBP2SessionState scalar-output sizing + SubmitSBP2Command ABI hardening.
Full host suite green: 1168/1168.

Remaining FW-57 driver-integration (paired with FW-58, needs Xcode/IIG to
verify): add the SBP-2 session selectors to ASFWDriver.iig + dispatch table,
and construct the SessionRegistry in UserClientRuntimeState (it needs the
scheduler + device manager the driver lifecycle owns). The .dext compile of
those selectors lands in FW-59.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
mrmidi added a commit that referenced this pull request Jun 19, 2026
DICE and main had diverged into two parallel lines. This merge records DICE
(40a796c) as a second parent so main's prior history — including gly11's
merged PRs #18/#19/#20 and the sbp2/ci/foundation fixes — is preserved as
ancestry, while the resulting tree is taken wholesale from DICE.

main-only code is superseded by DICE's implementation but remains recoverable
from history (refs/backup/main-pre-dice-merge = 757456d).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants