fix(node): keep Schrödinger clusters across re-interview #3993

Merged
mergify[bot] merged 4 commits into main from fix/client-schrodinger-cluster-reinterview on Jun 26, 2026
Conversation

@Apollon77 Apollon77 commented Jun 26, 2026

Collaborator

Problem

A client cluster present in local storage but absent from the peer's descriptor serverList ("Schrödinger's cluster") survived startup (loadCache) but was deleted on the first live wildcard read after a fresh data migration. A server restart re-loaded it correctly — the deletion only struck the live read path.

Observed in matterjs-server (see home-assistant/addons#4668): a migrated node's EP1 carries data for manufacturer cluster 0x125dfc11 while its descriptor serverList omits it; the first re-interview logged Removing ...ep1.neoCluster from active endpoint.

Root cause

Asymmetry between loadCache and live mutate:

  • loadCache processes the descriptor before injecting the endpoint's cluster behaviors, so the Schrödinger cluster is never in currentlySupported and is kept.
  • In mutate, the endpoint is already active. When the descriptor chunk (server list without the cluster) arrives before the cluster's attribute data, #synchronizeDescriptor schedules the cluster for deletion. The cancel-pending-delete logic lived only inside #synchronizeCluster's if (cluster.behavior === undefined) branch — unreachable for an already-active cluster — so the later attribute data never cleared the deletion and #rebuild dropped it.

Storage is not truly erased, so the next loadCache restores the cluster, which is why a restart "fixed" it.
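
The ordering dependence can be reproduced in a toy model. The sketch below is not the ClientStructure source: `ToyStructure`, `onDescriptor`, `onAttributeData` and `rebuild` are hypothetical names, and the model only mimics the pre-fix behavior described above, where the cancel step runs solely when a behavior has to be created. The startup scenario approximates loadCache by processing the descriptor while the cluster has no behavior yet.

```typescript
// Toy model of the pre-fix behavior. All names are hypothetical.
type ClusterId = number;

class ToyStructure {
    // Clusters whose behavior is already installed on the endpoint
    readonly active = new Set<ClusterId>();
    // Clusters scheduled for removal on the next rebuild
    readonly pendingDelete = new Set<ClusterId>();

    // Descriptor chunk: schedule every active cluster missing from serverList
    onDescriptor(serverList: ClusterId[]): void {
        const listed = new Set(serverList);
        this.active.forEach(id => {
            if (!listed.has(id)) this.pendingDelete.add(id);
        });
    }

    // Attribute chunk: the cancel only runs while creating a new behavior
    onAttributeData(id: ClusterId): void {
        if (!this.active.has(id)) {
            this.active.add(id);
            this.pendingDelete.delete(id);
        }
        // Already-active cluster: pendingDelete is left untouched (the bug)
    }

    // Drop everything that is still scheduled for deletion
    rebuild(): void {
        this.pendingDelete.forEach(id => this.active.delete(id));
        this.pendingDelete.clear();
    }
}

const SCHRODINGER = 0x125dfc11; // manufacturer cluster from the report
const DESCRIPTOR = 0x1d; // Descriptor cluster ID

// Startup-like path: descriptor is processed before the behavior exists
const startup = new ToyStructure();
startup.onDescriptor([DESCRIPTOR]);
startup.onAttributeData(SCHRODINGER);
startup.rebuild();
const survivesStartup = startup.active.has(SCHRODINGER);

// Live path: behavior already active, descriptor chunk arrives first
const live = new ToyStructure();
live.active.add(SCHRODINGER);
live.onDescriptor([DESCRIPTOR]);
live.onAttributeData(SCHRODINGER);
live.rebuild();
const survivesLive = live.active.has(SCHRODINGER);

console.log({ survivesStartup, survivesLive });
```

In this model the cluster survives the startup-like path but is dropped on the live path, which matches the reported symptom.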

Fix

ClientStructure:

  • New #clustersWithDataThisInteraction set + #preserveAbsentCluster helper (clears pendingDelete, warns once).
  • Populate the set and call the helper in #updateCluster and the applyWireChanges update branch → cancels a deletion a descriptor scheduled earlier in the same interaction (descriptor-first ordering).
  • #synchronizeDescriptor skips scheduling pendingDelete for a cluster already in the set → handles data-first ordering.
  • The set is cleared at the start of mutate/applyWireChanges (exception-safe; no cross-interaction leak).

Live mutate is now symmetric with loadCache: a cluster the peer still serves data for survives a re-interview regardless of chunk order.
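
The bookkeeping listed above can be sketched in the same toy style. This is an illustration under assumptions, not the merged code: `ToyStructureFixed`, `Chunk` and `reinterview` are hypothetical, `withDataThisInteraction` stands in for #clustersWithDataThisInteraction, and the sketch warns only when a scheduled deletion is actually cancelled, which may differ from when the real helper warns.

```typescript
// Toy model of the fix. Names are hypothetical stand-ins.
type ClusterId = number;
type Chunk =
    | { kind: "descriptor"; serverList: ClusterId[] }
    | { kind: "data"; cluster: ClusterId };

class ToyStructureFixed {
    readonly active = new Set<ClusterId>();
    readonly pendingDelete = new Set<ClusterId>();
    // Clusters that delivered attribute data in the current interaction
    readonly withDataThisInteraction = new Set<ClusterId>();
    readonly warnings: string[] = [];
    private readonly warned = new Set<ClusterId>();

    constructor(initiallyActive: ClusterId[]) {
        initiallyActive.forEach(id => this.active.add(id));
    }

    // Cancel a scheduled deletion and warn at most once per cluster
    private preserveAbsentCluster(id: ClusterId): void {
        const wasPending = this.pendingDelete.delete(id);
        if (wasPending && !this.warned.has(id)) {
            this.warned.add(id);
            this.warnings.push(`keeping cluster 0x${id.toString(16)}`);
        }
    }

    // Data-first ordering: skip clusters that already delivered data
    private onDescriptor(serverList: ClusterId[]): void {
        const listed = new Set(serverList);
        this.active.forEach(id => {
            if (!listed.has(id) && !this.withDataThisInteraction.has(id)) {
                this.pendingDelete.add(id);
            }
        });
    }

    // Descriptor-first ordering: data cancels an earlier scheduled deletion
    private onData(id: ClusterId): void {
        this.withDataThisInteraction.add(id);
        this.active.add(id);
        this.preserveAbsentCluster(id);
    }

    reinterview(chunks: Chunk[]): void {
        // Cleared at the start, so nothing leaks from a previous interaction
        this.withDataThisInteraction.clear();
        chunks.forEach(chunk => {
            if (chunk.kind === "descriptor") this.onDescriptor(chunk.serverList);
            else this.onData(chunk.cluster);
        });
        this.pendingDelete.forEach(id => this.active.delete(id));
        this.pendingDelete.clear();
    }
}

const CLUSTER = 0x125dfc11;
const descriptor: Chunk = { kind: "descriptor", serverList: [0x1d] };
const data: Chunk = { kind: "data", cluster: CLUSTER };

const a = new ToyStructureFixed([CLUSTER]);
a.reinterview([descriptor, data]);
const keptDescriptorFirst = a.active.has(CLUSTER);

const b = new ToyStructureFixed([CLUSTER]);
b.reinterview([data, descriptor]);
const keptDataFirst = b.active.has(CLUSTER);

// A later interaction without data for the cluster still removes it
a.reinterview([descriptor]);
const droppedWhenSilent = !a.active.has(CLUSTER);

console.log({ keptDescriptorFirst, keptDataFirst, droppedWhenSilent });
```

Clearing the set at the start of each interaction rather than at the end means an exception thrown mid-interaction cannot leave stale entries behind for the next one.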

Test

New ClientNodeTest case "Schrödinger's cluster": establishes the cluster active-but-absent-from-serverList in one interaction, then re-interviews with descriptor-first ordering. Reproduces the exact deletion before the fix; passes after.

Verification

  • Full packages/node suite: 1251/1251 (ESM/CJS), 1242/1242 (Web)
  • format-verify clean, lint clean

🤖 Generated with Claude Code

A cluster present in client storage but absent from the peer's
descriptor server list survived startup (loadCache) but was deleted on
the first live wildcard read when the descriptor chunk arrived before
the cluster's attribute data and the cluster behavior was already
active: the cancel-pending-delete logic only ran while building a new
behavior, so an active cluster's scheduled deletion was never cleared.

Track clusters that deliver attribute data within an interaction and
cancel any deletion a descriptor scheduled for them, regardless of
chunk order or whether the behavior already exists.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings June 26, 2026 15:57

Copilot AI left a comment

Contributor

Pull request overview

Fixes a client-structure edge case in @matter/node where a “Schrödinger” cluster (present in cached/local state and still reported by the peer via attributes, but omitted from the peer’s Descriptor serverList) could be incorrectly deleted during the first live re-interview due to chunk ordering.

Changes:

  • Add per-interaction tracking of clusters that received data, and use it to prevent/suppress descriptor-driven deletion scheduling for those clusters regardless of report ordering.
  • Centralize “preserve absent cluster” logic to cancel pending deletions when attribute/wire data arrives for a cluster that was previously marked for deletion.
  • Add a regression test that reproduces descriptor-first ordering and asserts the cluster survives re-interview.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| packages/node/src/node/client/ClientStructure.ts | Track clusters receiving data during an interaction and prevent descriptor sync from deleting clusters that still receive attribute/wire updates. |
| packages/node/test/node/ClientNodeTest.ts | Add a regression test for descriptor-first re-interview ordering to ensure “Schrödinger” clusters persist. |

Comment thread packages/node/src/node/client/ClientStructure.ts Outdated
Add a regression test asserting that a client cluster the peer stops
serving (omitted from serverList and no attribute data in the
interaction) is dropped from memory and erased from storage, and add a
CHANGELOG entry for the re-interview preservation fix.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@Apollon77 Apollon77 added the automerge label Jun 26, 2026
mergify Bot commented Jun 26, 2026

Contributor

Tick the box to add this pull request to the merge queue (same as @mergifyio queue).

  • Queue this pull request

@mergify mergify Bot merged commit deac803 into main Jun 26, 2026
37 checks passed
@mergify mergify Bot deleted the fix/client-schrodinger-cluster-reinterview branch June 26, 2026 17:11