fix(node): keep Schrödinger clusters across re-interview #3993
Merged
A cluster present in client storage but absent from the peer's descriptor server list survived startup (loadCache) but was deleted on the first live wildcard read when the descriptor chunk arrived before the cluster's attribute data and the cluster behavior was already active. The cancel-pending-delete logic only ran while building a new behavior, so an active cluster's scheduled deletion was never cleared.

Track clusters that deliver attribute data within an interaction and cancel any deletion a descriptor scheduled for them, regardless of chunk order or whether the behavior already exists.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Pull request overview
Fixes a client-structure edge case in @matter/node where a “Schrödinger” cluster (present in cached/local state and still reported by the peer via attributes, but omitted from the peer’s Descriptor serverList) could be incorrectly deleted during the first live re-interview due to chunk ordering.
Changes:
- Add per-interaction tracking of clusters that received data, and use it to prevent/suppress descriptor-driven deletion scheduling for those clusters regardless of report ordering.
- Centralize “preserve absent cluster” logic to cancel pending deletions when attribute/wire data arrives for a cluster that was previously marked for deletion.
- Add a regression test that reproduces descriptor-first ordering and asserts the cluster survives re-interview.
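The per-interaction tracking described in the list above can be sketched as a small standalone model. This is only an illustration under stated assumptions, not the code in `ClientStructure.ts`: `EndpointModel`, `Chunk`, and `ClusterEntry` are hypothetical names, the method names merely echo those mentioned in the pull request, and clearing the set in a `finally` block is an assumed way of keeping the tracking scoped to one interaction.

```typescript
// Hypothetical, simplified model of the fix; not the real ClientStructure.
type ClusterId = number;

interface ClusterEntry {
    pendingDelete: boolean;
}

// One chunk of a report, reduced to the two cases that matter here.
type Chunk =
    | { kind: "descriptor"; serverList: ClusterId[] }
    | { kind: "attributes"; cluster: ClusterId };

class EndpointModel {
    readonly clusters = new Map<ClusterId, ClusterEntry>();

    // Clusters that delivered attribute data during the current interaction.
    private clustersWithData = new Set<ClusterId>();

    constructor(cached: ClusterId[]) {
        cached.forEach(id => this.clusters.set(id, { pendingDelete: false }));
    }

    // Apply every chunk of one interaction, then drop what is still marked.
    mutate(chunks: Chunk[]): void {
        try {
            for (const chunk of chunks) {
                if (chunk.kind === "descriptor") {
                    this.synchronizeDescriptor(chunk.serverList);
                } else {
                    this.updateCluster(chunk.cluster);
                }
            }
            this.rebuild();
        } finally {
            // Assumption: reset here so nothing leaks into the next interaction,
            // even if a chunk throws.
            this.clustersWithData.clear();
        }
    }

    private synchronizeDescriptor(serverList: ClusterId[]): void {
        const listed = new Set(serverList);
        this.clusters.forEach((entry, id) => {
            // Data-first ordering: a cluster that already delivered data is skipped.
            if (!listed.has(id) && !this.clustersWithData.has(id)) {
                entry.pendingDelete = true;
            }
        });
    }

    private updateCluster(id: ClusterId): void {
        this.clustersWithData.add(id);
        const entry = this.clusters.get(id);
        if (entry === undefined) {
            this.clusters.set(id, { pendingDelete: false });
            return;
        }
        // Descriptor-first ordering: cancel a deletion scheduled earlier.
        entry.pendingDelete = false;
    }

    private rebuild(): void {
        this.clusters.forEach((entry, id) => {
            if (entry.pendingDelete) this.clusters.delete(id);
        });
    }
}

const DESCRIPTOR = 0x1d; // Matter Descriptor cluster ID
const VENDOR = 0x125dfc11; // manufacturer cluster from the PR description

const endpoint = new EndpointModel([DESCRIPTOR, VENDOR]);

// Descriptor-first re-interview: the vendor cluster still delivers data.
endpoint.mutate([
    { kind: "descriptor", serverList: [DESCRIPTOR] },
    { kind: "attributes", cluster: VENDOR },
]);
console.log(endpoint.clusters.has(VENDOR)); // true

// A later interaction with no data for the cluster drops it.
endpoint.mutate([{ kind: "descriptor", serverList: [DESCRIPTOR] }]);
console.log(endpoint.clusters.has(VENDOR)); // false
```

In this model the two guards cover both chunk orders: the descriptor handler consults the set, and the data handler clears any deletion already scheduled. A cluster that receives no data in the interaction is still removed.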
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| packages/node/src/node/client/ClientStructure.ts | Track clusters receiving data during an interaction and prevent descriptor sync from deleting clusters that still receive attribute/wire updates. |
| packages/node/test/node/ClientNodeTest.ts | Add a regression test for descriptor-first re-interview ordering to ensure “Schrödinger” clusters persist. |
Add a regression test asserting that a client cluster the peer stops serving (omitted from serverList and no attribute data in the interaction) is dropped from memory and erased from storage, and add a CHANGELOG entry for the re-interview preservation fix.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Problem
A client cluster present in local storage but absent from the peer's descriptor `serverList` ("Schrödinger's cluster") survived startup (`loadCache`) but was deleted on the first live wildcard read after a fresh data migration. A server restart reloaded it correctly; the deletion only struck the live read path.
Observed in matterjs-server (see home-assistant/addons#4668): a migrated node's EP1 carries data for manufacturer cluster `0x125dfc11` while its descriptor `serverList` omits it; the first re-interview logged `Removing ...ep1.neoCluster from active endpoint`.

Root cause
Asymmetry between `loadCache` and live `mutate`:
- `loadCache` processes the descriptor before injecting the endpoint's cluster behaviors, so the Schrödinger cluster is never in `currentlySupported` and is kept.
- In live `mutate`, the endpoint is already active. When the descriptor chunk (server list without the cluster) arrives before the cluster's attribute data, `#synchronizeDescriptor` schedules the cluster for deletion. The cancel-pending-delete logic lived only inside `#synchronizeCluster`'s `if (cluster.behavior === undefined)` branch, which is unreachable for an already-active cluster, so the later attribute data never cleared the deletion and `#rebuild` dropped it.
Storage is not truly erased, so the next `loadCache` restores the cluster, which is why a restart "fixed" it.

Fix
`ClientStructure`:
- `#clustersWithDataThisInteraction` set + `#preserveAbsentCluster` helper (clears `pendingDelete`, warns once).
- Data arriving via `#updateCluster` or the `applyWireChanges` update branch → cancels a deletion a descriptor scheduled earlier in the same interaction (descriptor-first ordering).
- `#synchronizeDescriptor` skips scheduling `pendingDelete` for a cluster already in the set → handles data-first ordering.
- The set is scoped to a single `mutate` / `applyWireChanges` call (exception-safe; no cross-interaction leak).
Live `mutate` is now symmetric with `loadCache`: a cluster the peer still serves data for survives a re-interview regardless of chunk order.

Test
New `ClientNodeTest` case "Schrödinger's cluster": establishes the cluster as active but absent from `serverList` in one interaction, then re-interviews with descriptor-first ordering. Reproduces the exact deletion before the fix; passes after.

Verification
- `packages/node` suite: 1251/1251 (ESM/CJS), 1242/1242 (Web)
- `format-verify` clean, `lint` clean

🤖 Generated with Claude Code