Node Manager Phase 2b-2a: groupKey + groupKeyMap kinds & per-node mutex reconcile queue #3960

Merged: Apollon77 merged 9 commits into node-manager from node-manager-phase2b2a on Jun 23, 2026


@Apollon77

Collaborator

Sub-PR into node-manager (umbrella PR #3948 picks it up in CI). First slice of Phase 2b-2 — the command-based group-key kinds, plus a reconcile-concurrency rework.

ItemKinds (GroupKeyManagement, root)

  • groupKey (first command-based kind): provisions key sets via KeySetWrite/KeySetReadAllIndices/KeySetRemove (peer.commandsOf(GroupKeyManagementClient)). apply always writes (KeySetWrite is an idempotent overwrite and epoch keys are write-only/null, so key material is unverifiable); verify is presence-only; capacity = maxGroupKeysPerFabric (spec-floor 3 fallback); groupKeySetId === 0 (IPK) → ImplementationError.
  • groupKeyMap (attribute RMW, upsert-by-groupId): a group maps to exactly one key set, so apply replaces a differing mapping, no-ops the exact pair, never duplicates a groupId; capacity = maxGroupsPerFabric (floor 4); rejects a group→IPK(0) mapping.
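For illustration, the upsert-by-groupId rule for groupKeyMap can be sketched as a pure function over the current entry list. This is a minimal sketch under stated assumptions, not the code in this PR: `GroupKeyMapEntry`, `upsertGroupKeyMap`, and the plain `Error` are hypothetical names, and checking the maxGroupsPerFabric limit inside the function is an assumption about where capacity is enforced.

```typescript
// Hypothetical entry shape, loosely modeled on a group-to-key-set mapping.
interface GroupKeyMapEntry {
    groupId: number;
    groupKeySetId: number;
}

interface UpsertResult {
    changed: boolean; // false when the exact pair already exists
    entries: GroupKeyMapEntry[]; // list to write back (read-modify-write)
}

const IPK_KEY_SET_ID = 0;

function upsertGroupKeyMap(
    current: readonly GroupKeyMapEntry[],
    desired: GroupKeyMapEntry,
    maxGroupsPerFabric = 4, // assumption: the floor-4 fallback named in the description
): UpsertResult {
    // A group must never be mapped to the IPK key set (id 0).
    if (desired.groupKeySetId === IPK_KEY_SET_ID) {
        throw new Error("Refusing to map a group to the IPK key set (id 0)");
    }

    const existing = current.find(entry => entry.groupId === desired.groupId);
    if (existing !== undefined) {
        // Exact pair already present: nothing to write.
        if (existing.groupKeySetId === desired.groupKeySetId) {
            return { changed: false, entries: [...current] };
        }
        // Differing mapping: replace in place so the groupId is never duplicated.
        const entries = current.map(entry =>
            entry.groupId === desired.groupId ? { ...desired } : entry,
        );
        return { changed: true, entries };
    }

    // New group: append, but only while capacity remains.
    if (current.length >= maxGroupsPerFabric) {
        throw new Error(`Group key map is full (${maxGroupsPerFabric} entries)`);
    }
    return { changed: true, entries: [...current, { ...desired }] };
}
```

Returning `changed` lets a caller skip the attribute write for the no-op case while still using the same code path for replace and append.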

Verify semantics

  • planActions: a verify pass that detects drift now re-applies directly in the same pass (previously it re-pended the item), so reconcile(verify) converges deterministically without relying on a follow-up trigger. The now-unreachable repend action was removed.
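A rough sketch of the planning rule described above, assuming a simplified item model. `PlanItem`, `matchesDevice`, and `planActionsSketch` are hypothetical names chosen for illustration and do not reflect the real planActions signature.

```typescript
type ItemStatus = "pending" | "committed";
type PlannedAction = "apply" | "none";

interface PlanItem {
    id: string;
    status: ItemStatus;
    /** Outcome of the verify check; undefined when the item was not verified. */
    matchesDevice?: boolean;
}

interface PlanEntry {
    id: string;
    action: PlannedAction;
}

function planActionsSketch(items: readonly PlanItem[], verify: boolean): PlanEntry[] {
    return items.map(item => {
        if (item.status === "pending") {
            // Not yet on the device: apply.
            return { id: item.id, action: "apply" };
        }
        if (verify && item.matchesDevice === false) {
            // Drift on a committed item: apply in this same pass
            // instead of marking it pending for a later pass.
            return { id: item.id, action: "apply" };
        }
        return { id: item.id, action: "none" };
    });
}
```

Because drift maps straight to `apply`, a single verify pass is enough to converge; no second trigger is needed to pick up a re-pended item.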

Concurrency rework (no fire-and-forget)

  • Replaced InFlightGuard + void-ed trigger reconciles with a per-ClientNode Mutex (@matter/general). Sync trigger observers enqueue a coalesced request (verify/refreshCapacity OR-merge) via mutex.run; the Mutex owns and serializes the work and logs task rejections — no voided or silently swallowed promises. Explicit reconcile() uses mutex.produce so it is awaitable and never overlaps a triggered pass. A request arriving mid-pass coalesces into exactly one follow-up.
  • A "relevant-data-stable-for-X" debounce for externally-driven reconciles is deferred to Phase 1b (live subscription drift) — documented; not needed for the current event-based triggers.
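The queueing behaviour described above can be sketched with a small promise chain. This is a minimal, self-contained illustration and not the @matter/general Mutex: `SerialQueue`, `NodeReconciler`, and `ReconcileRequest` are hypothetical names, and only the `run`/`produce` split and the OR-merge of flags are taken from the description.

```typescript
interface ReconcileRequest {
    verify: boolean;
    refreshCapacity: boolean;
}

/** Hypothetical stand-in for a mutex that serializes async tasks. */
class SerialQueue {
    private tail: Promise<unknown> = Promise.resolve();

    /** Enqueue from sync code; rejections go to onError instead of being dropped. */
    run(task: () => Promise<void>, onError: (error: unknown) => void): void {
        this.tail = this.tail.then(task).catch(onError);
    }

    /** Enqueue and hand the result (or rejection) back to the caller. */
    produce<T>(task: () => Promise<T>): Promise<T> {
        const result = this.tail.then(task);
        this.tail = result.catch(() => undefined); // the caller owns the rejection
        return result;
    }
}

class NodeReconciler {
    readonly errors: unknown[] = []; // stands in for logging
    private readonly queue = new SerialQueue();
    private readonly pass: (request: ReconcileRequest) => Promise<void>;
    private pending: ReconcileRequest | undefined;

    constructor(pass: (request: ReconcileRequest) => Promise<void>) {
        this.pass = pass;
    }

    /** Called synchronously by trigger observers. */
    trigger(request: ReconcileRequest): void {
        if (this.pending !== undefined) {
            // A pass is already queued: OR-merge the flags into it.
            this.pending.verify = this.pending.verify || request.verify;
            this.pending.refreshCapacity = this.pending.refreshCapacity || request.refreshCapacity;
            return;
        }
        this.pending = { ...request };
        this.queue.run(
            async () => {
                // Take the merged request; later triggers queue exactly one follow-up.
                const merged = this.pending;
                this.pending = undefined;
                if (merged !== undefined) {
                    await this.pass(merged);
                }
            },
            error => this.errors.push(error),
        );
    }

    /** Explicit, awaitable reconcile that never overlaps a triggered pass. */
    reconcile(request: ReconcileRequest): Promise<void> {
        return this.queue.produce(() => this.pass(request));
    }
}
```

In this sketch, two triggers fired back to back produce one pass with both flags set, and an explicit `reconcile()` issued afterwards runs only once that pass has finished.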

Tests

  • Unit: groupKey (apply/verify/remove/capacity/IPK), groupKeyMap (upsert/verify/remove/capacity), planActions verify-drift→apply, engine hardenings.
  • Integration (single peer, @matter/node/testing): provision a key set + group-key map → committed + device shows both; remove the key set behind the engine → a verify reconcile re-applies (negative assertion proves it was gone first).
  • build --clean, format-verify, lint green; @matter/node-manager 61/61; @matter/node 1227/1227.

Review

Whole-branch review + independent cross-check: merge-ready, zero Critical/Important. Concurrency rework confirmed safe (no-void, coalescing, serialization, no deadlock, no leak); groupKeyMap IPK-guard judged spec-correct.

Scope

2b-2b (next): endpointGroupMembership (Groups cluster, per-endpoint AddGroup/RemoveGroup commands, verify via GetGroupMembership).

🤖 Generated with Claude Code

Apollon77 and others added 8 commits June 22, 2026 18:39
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…re groupKey capacity

Trigger observers fired the async reconcile with a naked `void`, so a rejection
from a detached pass (e.g. a capacity command initiating an exchange against a
peer torn down mid-flight) surfaced as an unhandled rejection. Route triggers
through #fireTrigger, which logs and swallows. This also covers command-based
verify. With it, groupKey.capacity() (KeySetReadAllIndices) is safe to restore.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…antics

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…d.apply

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
A verify pass that detects drift now re-applies in the same pass instead of
re-pending, so reconcile(verify) converges deterministically without relying on
a follow-up trigger.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Replace the InFlightGuard + fire-and-forget trigger wiring with a per-ClientNode
Mutex. Triggers synchronously enqueue a coalesced reconcile request (verify /
capacity-refresh flags OR-merge); the mutex serializes passes per node, owns the
work, and logs task rejections (no voided or silently swallowed promises).
Explicit reconcile() runs via mutex.produce so it is awaitable and never overlaps
a triggered pass, making convergence deterministic. A request arriving mid-pass
coalesces into exactly one follow-up.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Remove the now-unreachable repend ReconcileAction (verify drift applies directly);
re-clear pending after mutex close in #unwirePeer; fix a stale test comment.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@mergify

mergify Bot commented Jun 23, 2026

Contributor

Tick the box to add this pull request to the merge queue (same as @mergifyio queue).

  • Queue this pull request

…live device read

Capacity counts come from the subscription-maintained state (stateOf), not a
forced Matter read (getStateOf). Live reads stay only on the RMW path of data we
modify (apply/remove), before writing. groupKey drops capacity() entirely: the
key-set count has no subscribed attribute (only the KeySetReadAllIndices command),
so the device's RESOURCE_EXHAUSTED on KeySetWrite is its over-capacity gate.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
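As a rough illustration of using the device's RESOURCE_EXHAUSTED response as the over-capacity gate, the apply step could map that one status to a distinct outcome and rethrow everything else. This is a sketch under assumptions: `applyKeySet`, `ApplyOutcome`, and the `status` string on the error are hypothetical and do not reflect how matter.js surfaces status codes.

```typescript
type ApplyOutcome = "committed" | "over-capacity";

/** Narrow an unknown error to one carrying a status string (hypothetical shape). */
function hasStatus(error: unknown): error is { status: string } {
    return (
        typeof error === "object" &&
        error !== null &&
        typeof (error as { status?: unknown }).status === "string"
    );
}

/**
 * Runs the supplied write and lets the device decide whether capacity
 * is available, instead of counting key sets up front.
 */
async function applyKeySet(writeKeySet: () => Promise<void>): Promise<ApplyOutcome> {
    try {
        await writeKeySet();
        return "committed";
    } catch (error) {
        if (hasStatus(error) && error.status === "RESOURCE_EXHAUSTED") {
            // The device rejected the write because it is out of key-set slots.
            return "over-capacity";
        }
        throw error; // any other failure is not a capacity signal
    }
}
```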
Apollon77 merged commit 14f8e21 into node-manager Jun 23, 2026
2 checks passed
Apollon77 deleted the node-manager-phase2b2a branch June 23, 2026 09:29