feat(ble-proxy): serialize BLE connects + structured slot-exhaustion errors#704
Draft
Apollon77 wants to merge 14 commits into
Draft
feat(ble-proxy): serialize BLE connects + structured slot-exhaustion errors#704Apollon77 wants to merge 14 commits into
Apollon77 wants to merge 14 commits into
Conversation
Add out_of_connection_slots and connection_aborted wire codes, a BleProxyError carrying the code, and isOutOfConnectionSlotsError() which classifies by structured code with a connection_failed message-text fallback for proxy clients predating the code. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
#handleResponse now rejects failed commands with BleProxyError so callers can read the structured .code (e.g. out_of_connection_slots) instead of only a flattened message. Adds a TestProxyClientError test helper so the test client can emit a chosen wire error code. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…odes _handle_connect now maps BLE connect exceptions to out_of_connection_slots, connection_aborted, device_not_found, or bluetooth_unavailable (else connection_failed), matching by exception class name + message markers so bleak_retry_connector stays a soft dependency. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Add a Semaphore(1) connect gate in ProxyBleCentralInterface so matter.js parallel-PASE against a device's rotating BLE addresses can't exhaust the proxy backend's connection slots. The slot is held for the channel lifetime and released idempotently on close/failure; a queued connect aborts on interface close, an incoming options.abort, or a 20s bounded wait. Because matter.js never signals a blocked openChannel, also track handed-out channels and tear down orphans (plus reject queued waiters) via abortPendingConnects(), and a generation check tears down a connect that completes after an abort — so a cancelled/timed-out commission can't leave a channel holding the single slot forever. Surface a clear actionable error on slot exhaustion. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Emit a commissioningEnded event from ControllerCommandHandler.commissionNode (finally, so success and failure both fire) and wire it in MatterServer to ProxyBle.abortPendingConnects(). BLE is only used during commissioning, so this releases queued connect waiters and tears down any orphaned channel holding the single connection slot once an attempt finishes. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ted codes Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The orphan branch threw into the connect-failure catch, which already sends the proxy Disconnect — so it was sent twice. Let the catch own it. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
node_modules/lockfile were on 0.17.0 while package.json pins the alpha; the branch did not type-check until synced. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Contributor
There was a problem hiding this comment.
Pull request overview
This PR hardens BLE Proxy commissioning by preventing concurrent BLE connect storms (which can exhaust proxy backend connection slots) and by introducing structured, typed error codes for clearer diagnosis across the TS server and Python proxy client.
Changes:
- Serialize BLE connects through the proxy with a single-slot semaphore, plus lifecycle abort/cleanup to prevent “orphaned” connections from holding the only slot.
- Add structured wire error codes (
out_of_connection_slots,connection_aborted) and propagate them as typed errors (BleProxyError) end-to-end. - Add focused TS + Python tests covering connect gating, abort behavior, orphan cleanup, and error classification/round-trip.
Reviewed changes
Copilot reviewed 13 out of 14 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| python_ble_proxy/tests/test_connect_errors.py | Adds unit tests for Python-side connect-error classification into structured wire codes. |
| python_ble_proxy/matter_ble_proxy/client.py | Implements connect-error classification and uses it when connect fails. |
| packages/ws-controller/test/CommissioningLifecycleTest.ts | Adds tests asserting a new commissioning lifecycle event fires on success/failure. |
| packages/ws-controller/src/controller/ControllerCommandHandler.ts | Introduces commissioningEnded event emitted after commissionNode attempt finishes. |
| packages/matter-server/src/MatterServer.ts | Hooks commissioning lifecycle to abort pending proxy BLE connects when commissioning ends. |
| packages/ble-proxy/test/BleProxyTestClient.ts | Enhances test proxy client to return structured error codes from handlers. |
| packages/ble-proxy/test/BleProxyProtocolTest.ts | Adds tests for slot-exhaustion classification and typed error preservation. |
| packages/ble-proxy/test/BleProxyIntegrationTest.ts | Adds integration tests for connect serialization, abort behavior, orphan cleanup, and error messaging. |
| packages/ble-proxy/src/ProxyBleChannel.ts | Implements connect concurrency gate, abort generation handling, and slot-exhaustion error surfacing. |
| packages/ble-proxy/src/ProxyBle.ts | Exposes abortPendingConnects() to trigger central-interface connect abort/cleanup. |
| packages/ble-proxy/src/BleProxyProtocol.ts | Adds new error codes, slot-exhaustion message marker matching, and BleProxyError. |
| packages/ble-proxy/src/BleProxyHandler.ts | Preserves structured error codes by rejecting pending commands with BleProxyError. |
| package-lock.json | Updates workspace Node engine constraints in the lockfile to >=22.13.0. |
| docs/ble-proxy-protocol.md | Documents the new structured connect error codes. |
The client sent discover_failed; the protocol enum and docs use discovery_failed. Align the client with the canonical wire code. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Mirror the TS isOutOfConnectionSlotsError lowercasing so mixed-case backend messages still classify as slot-exhaustion/aborted instead of falling back to connection_failed. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Keep commissioningStarted/commissioningEnded as plain lifecycle facts on the controller (emitted per attempt, never for invalid requests) and move the "release the BLE connect gate only when no commission is in progress" policy into wireBleConnectGate, the consumer-side ref-counter. Finishing one commission can no longer abort a concurrent attempt's connect, and the controller event stays generic and reusable. Addresses PR review feedback on the commissioningEnded finally placement. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Contributor
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 15 out of 16 changed files in this pull request and generated no new comments.
Comments suppressed due to low confidence (1)
python_ble_proxy/matter_ble_proxy/client.py:498
- The PR description's follow-ups mention a pre-existing wire-code drift (
discover_failedvsdiscovery_failed) as not addressed in this PR, but this change updates the Python client to emitdiscovery_failed. Please update the PR description/follow-ups so it matches what the code actually does (or revert this change if it was meant to stay out of scope).
if conn.client.is_connected:
await conn.client.disconnect()
await self._send_success(cmd_id)
async def _handle_discover_services(self, cmd_id: int, args: dict[str, Any]) -> None:
conn = self._get_connection(args["connection_handle"])
Keep PR #704 JS-only. The python client's _classify_connect_error (and its case-insensitive markers) move to the bleak_retry_connector connect_strategy PR, where the python-side connect path lives. Server still classifies slot-exhaustion via the connection_failed message-text fallback, so an un-updated client keeps working. discovery_failed fix retained. Tracked in todo: ble-proxy/python-structured-connect-errors. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Comment on lines
484
to
488
| async close() { | ||
| this.#connected = false; | ||
| this.#releaseSlot(); | ||
| this.#cleanupObservers(); | ||
| this.#terminateIterator(); |
Comment on lines
+332
to
+335
| if (this.#closed || this.#abortGeneration !== abortGenAtStart) { | ||
| await proxyChannel.close(); | ||
| throw new BleError(`BLE connect to ${peripheralAddress} aborted before completion`); | ||
| } |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
A single physical device advertising several rotating BLE addresses caused matter.js parallel-PASE to fire concurrent BLE connect attempts at it. The proxy backend (HA/ESPHome) ran out of connection slots and commissioning failed with a cascade of timeouts (
8 of 9 started attempt(s) failed).This serializes BLE connects through the proxy and surfaces slot exhaustion as a clear, structured error. Scope: JS/TS server side only — see "Deferred" below for the python-client counterpart.
What changed
ProxyBleCentralInterface): a matter.jsSemaphore(concurrency = 1) acquired beforeConnectand held for the channel lifetime, released idempotently on close/failure. At most one BLE connection in flight, so the connect storm never forms. A queued connect aborts on interface close, an incomingoptions.abort, or a 20 s bounded wait.openChannel(it only races the promise viaAbort.attempt, never passesoptions.aborton the BLE path) andTransport.close()only fires at node teardown. So a connect finishing after its commission was abandoned would hold the single slot forever. We track handed-out channels (#openChannels) and a generation check tears down a connect that completes after an abort;ProxyBleCentralInterface.abortPendingConnects()rejects queued waiters and closes orphans.ControllerCommandHandleremits policy-freecommissioningStarted/commissioningEnded(per attempt; never for invalid requests).matter-server/src/bleConnectGate.tswireBleConnectGate()ref-counts them and callsabortPendingConnects()only when no commission is in progress — so finishing one attempt can't tear down a concurrent attempt's connect. These events are generic and reusable by other JS consumers.out_of_connection_slots+connection_aborted, propagated as a typedBleProxyError(carries.code).isOutOfConnectionSlotsError()classifies by structured code or a case-insensitiveconnection_failedmessage-text fallback, so an un-updated python client still gets the clear, actionable error ("increase connection_slots / add proxies").discover_failed→discovery_failedwire-code fix is included here (independent bug — the canonical code in the enum + docs isdiscovery_failed).Tests
Connectin flight), queue-abort on interface close,abortPendingConnectsreject-but-reusable, orphan teardown (already-open + finishes-after-abort), clear slot-exhaustion error, typed-error round-trip;commissioningStarted/commissioningEndedbalanced/per-attempt/not-on-invalid-args;wireBleConnectGateref-count (release only after last concurrent attempt).Deferred to the bleak_retry_connector connect_strategy PR
The python client's structured connect-error classification (
_classify_connect_error+ bleak marker matching) was reverted from this PR to keep it JS-only; it belongs with the python-side connect path in theconnect_strategywork. The server's message-text fallback keeps things working until then. The python/HA side also needs to wire the commissioning gate behavior. Tracked locally inble-proxy/python-structured-connect-errors.Other follow-ups (pre-existing, not in this PR)
_handle_connectTimeoutErrorreturnsconnection_failedrather than the existingtimeoutcode.BleProxyHandler.#cleanupClientrejects with a plainError(could beBleProxyError).ControllerCommissioneromits{ abort }toble.openChannel(TCP path passes it); plumbing it would let the gate cancel queued connects instantly instead of via the 20 s backstop.🤖 Generated with Claude Code