[WIP] Node Manager: desired-state, reconciliation & multi-step tasks #3948
Draft
Apollon77 wants to merge 59 commits into
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…edStateBehavior Drop the second `, unknown` type param from both Observable declarations in DesiredStateBehavior.Events to match the codebase convention. Convert the two test handlers that return number (Array.push) to block bodies so they type-check under the stricter void return. Remove the inline WHAT comment above static schema. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- Add export * from "./desired-state/index.js" to system behavior barrel
- Add DesiredStateBehavior to ClientNode.RootEndpoint.with(...)
- New DesiredStatePersistenceTest: verifies registration on ClientNode and intent persistence across a node restart via shared Environment storage
- Update PEER1_STATE in ClientNodeTest to include desiredState initial state
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Add test coverage for GroupCapacityExceededError mapping and remove the unused unknownKindMapped fixture entry from the cache. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Re-emit a peer's BasicInformation softwareVersion change as a node-level lifecycle signal, wired into Peers BasicInformation instrumentation. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Pure status x mode branch-table function, internal to the package (consumed by the reconciler via #-import, not part of the public API). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Opt-in ServerNode-root behavior driving reachable peers toward intended state: pure planActions decision + executeActions executor, six triggers (settle, sweep, peers add/del, subscription-active, intent-change, software-version-change), per-peer in-flight guard, capacity refresh, and asyncDispose cleanup. Reachability gated on an active sustained subscription. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
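For reviewers, the "pure decision, separate executor" split can be pictured roughly as below. This is a minimal sketch, not the code in this PR: the status names (`pending`, `committed`, `deletePending`), the `Mode` values, the `priority` field, and the shape of `BRANCH_TABLE` are all assumptions chosen for illustration. Only the idea of a pure status x mode lookup feeding `planActions` comes from the commit messages.

```typescript
// Hypothetical item states and pass modes; the real enums may differ.
type ItemStatus = "pending" | "committed" | "deletePending";
type Mode = "normal" | "verify";
type Action = "apply" | "remove" | "verify" | "skip";

// Pure status x mode branch table: no I/O, no peer access.
const BRANCH_TABLE: Record<ItemStatus, Record<Mode, Action>> = {
    pending: { normal: "apply", verify: "apply" },
    committed: { normal: "skip", verify: "verify" },
    deletePending: { normal: "remove", verify: "remove" },
};

interface Item {
    key: string;
    status: ItemStatus;
    priority: number; // assumed: lower values are reconciled first
}

interface PlannedAction {
    key: string;
    action: "apply" | "remove" | "verify";
}

// Decide what to do for each item; executing the plan is a separate step.
function planActions(items: readonly Item[], mode: Mode): PlannedAction[] {
    const ordered = [...items].sort((a, b) => a.priority - b.priority);
    const plan: PlannedAction[] = [];
    for (const item of ordered) {
        const action = BRANCH_TABLE[item.status][mode];
        if (action !== "skip") {
            plan.push({ key: item.key, action });
        }
    }
    return plan;
}
```

Keeping the decision free of side effects means the branch table can be unit-tested exhaustively without a peer, while the executor only has to carry out a plan it is handed.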
…tion
- Wire per-peer trigger handlers directly on the peer ObserverGroup instead of via this.callback, so they are torn down on peer removal (no reactor leak on peer churn).
- #reachable mirrors NetworkClient.subscriptionActive: a sustained subscription counts as reachable only once active, not merely created.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Reconciler engine: @matter/node-manager package, ReconcilerBehavior (opt-in ServerNode-root, planActions decision + executeActions executor, six triggers, per-peer in-flight guard, capacity refresh, asyncDispose), plus @matter/node ephemeral capacity cache and softwareVersionChanged signal. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…rity bands Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ng, dispose race Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ting Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ard test Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… tradeoff Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… unread Pre-flight admission stays meaningful instead of failing open; the device write remains the authoritative gate for over-capacity (RESOURCE_EXHAUSTED). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…sing binding endpoint Also document itemMapKey separator escaping and make its tests format-agnostic. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…caping Simpler than escape-then-join; the separator never appears in identifier/number keys. Tests use the itemMapKey() helper rather than hardcoding the key format. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…re groupKey capacity Trigger observers fired the async reconcile with a naked `void`, so a rejection from a detached pass (e.g. a capacity command initiating an exchange against a peer torn down mid-flight) surfaced as an unhandled rejection. Route triggers through #fireTrigger, which logs and swallows. This also covers command-based verify. With it, groupKey.capacity() (KeySetReadAllIndices) is safe to restore. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…antics Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…d.apply Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
A verify pass that detects drift now re-applies in the same pass instead of re-pending, so reconcile(verify) converges deterministically without relying on a follow-up trigger. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Replace the InFlightGuard + fire-and-forget trigger wiring with a per-ClientNode Mutex. Triggers synchronously enqueue a coalesced reconcile request (verify / capacity-refresh flags OR-merge); the mutex serializes passes per node, owns the work, and logs task rejections (no voided or silently swallowed promises). Explicit reconcile() runs via mutex.produce so it is awaitable and never overlaps a triggered pass, making convergence deterministic. A request arriving mid-pass coalesces into exactly one follow-up. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
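The coalescing behavior described in this commit can be sketched as follows. This is an illustration under stated assumptions, not the PR's implementation: it does not use the matter.js `Mutex`, and `CoalescingReconciler`, `ReconcileFlags`, `request`, and `idle` are hypothetical names. It shows the three properties the message calls out: flags OR-merge, passes never overlap, and any number of requests arriving mid-pass produce exactly one follow-up pass.

```typescript
interface ReconcileFlags {
    verify: boolean;
    refreshCapacity: boolean;
}

type Pass = (flags: ReconcileFlags) => Promise<void>;

// One instance per node: serializes passes and coalesces queued requests.
class CoalescingReconciler {
    private pending: ReconcileFlags | undefined = undefined;
    private running: Promise<void> | undefined = undefined;
    private readonly pass: Pass;
    private readonly logError: (cause: unknown) => void;

    constructor(pass: Pass, logError: (cause: unknown) => void) {
        this.pass = pass;
        this.logError = logError;
    }

    // Synchronous enqueue: merge flags into the single pending request.
    request(flags: Partial<ReconcileFlags> = {}): void {
        this.pending = {
            verify: (this.pending?.verify ?? false) || (flags.verify ?? false),
            refreshCapacity:
                (this.pending?.refreshCapacity ?? false) || (flags.refreshCapacity ?? false),
        };
        if (this.running === undefined) {
            this.running = this.drain();
        }
    }

    // Resolves once no pass is running and nothing is pending.
    async idle(): Promise<void> {
        while (this.running !== undefined) {
            await this.running;
        }
    }

    // Runs passes one at a time; rejections are logged, never left unhandled.
    private async drain(): Promise<void> {
        try {
            while (this.pending !== undefined) {
                const flags = this.pending;
                this.pending = undefined;
                try {
                    await this.pass(flags);
                } catch (cause) {
                    this.logError(cause);
                }
            }
        } finally {
            this.running = undefined;
        }
    }
}
```

Because the queue owns the promise, trigger handlers can stay synchronous and no call site needs a bare `void`. A caller that wants determinism awaits `idle()`.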
Remove the now-unreachable repend ReconcileAction (verify drift applies directly); re-clear pending after mutex close in #unwirePeer; fix a stale test comment. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…live device read Capacity counts come from the subscription-maintained state (stateOf), not a forced Matter read (getStateOf). Live reads stay only on the RMW path of data we modify (apply/remove), before writing. groupKey drops capacity() entirely: the key-set count has no subscribed attribute (only the KeySetReadAllIndices command), so the device's RESOURCE_EXHAUSTED on KeySetWrite is its over-capacity gate. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Implements GroupMembershipItemKind — the fifth reconciler kind — which provisions peer endpoint group membership via the Groups cluster AddGroup/RemoveGroup/GetGroupMembership commands. Non-success statuses in response payloads are re-thrown as StatusResponseError so the engine's retry/drop machinery works uniformly. Capacity is read from the subscription-cached GroupKeyManagement state (no live I/O). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
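The "re-throw non-success payload statuses" step might look like the sketch below. It is hedged throughout: `StatusError` is a local stand-in for the real `StatusResponseError` (whose constructor is not shown in this PR text), and `unwrap` is a hypothetical helper name. The status values 0x00 (SUCCESS) and 0x89 (RESOURCE_EXHAUSTED) are the Matter interaction-model codes.

```typescript
// Matter interaction-model status codes used in this sketch.
const SUCCESS = 0x00;
const RESOURCE_EXHAUSTED = 0x89;

// Stand-in for StatusResponseError; the real class lives in matter.js.
class StatusError extends Error {
    readonly code: number;

    constructor(message: string, code: number) {
        super(message);
        this.name = "StatusError";
        this.code = code;
    }
}

// Groups-cluster style responses carry their status inside the payload.
interface StatusPayload {
    status: number;
}

// Turn an in-payload failure into a thrown error so callers handle
// command failures and transport failures through one code path.
function unwrap<T extends StatusPayload>(command: string, response: T): T {
    if (response.status !== SUCCESS) {
        throw new StatusError(
            `${command} returned status 0x${response.status.toString(16)}`,
            response.status,
        );
    }
    return response;
}
```

With this shape, a full group table on the device surfaces as a thrown error carrying `RESOURCE_EXHAUSTED`, which the engine can route to its retry or drop handling the same way as any other status error.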
Register GroupMembershipItemKind in ReconcilerBehavior.initialize() after GroupKeyMapItemKind (priority order: keyset < group < membership). Add three-scenario integration test (apply, behind-back re-apply, removeIntent).
API-drift notes:
- DesiredStateBehavior uses removeIntent(), not deleteIntent() (brief was wrong).
- GroupsServer.removeGroup calls assertRemoteActor, so the behind-back removal test mutates groupTable directly (same idiom as GroupKeyIntegrationTest).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…, driver) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…letion Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… park/resume) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…uccess Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Add the AddNodeToGroup task that provisions a peer endpoint into a group (groupKey + groupKeyMap + endpointGroupMembership), registered as a builtin. Fix the gate to park (not fail) when a peer is unreachable at gate entry: TaskContextImpl#evaluate now skips the unguarded reconcile for unreachable peers so the predicate stays unsatisfied and the gate waits for the reachability wake. Defer the persisted-task resume pass until the node is online so a builtin task resuming on a fresh node does not act before initialization completes. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…tImpl->RunningTaskContext Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
# Conflicts:
#   packages/node/test/behaviors/thermostat/AtomicWriteHandlerTest.ts
#   packages/node/test/endpoint/EndpointVariableServiceTest.ts
#   packages/node/test/node/ServerNodeTest.ts
Umbrella / integration branch for the Node Manager feature. Long-lived WIP PR: each phase merges into `node-manager` via its own sub-PR; this PR is the CI backstop for the integrated branch against `main`. Do not merge until all phases land and it's de-WIP'd.

Design doc: docs/superpowers/specs/2026-06-14-node-manager-desired-state-design.md

What this builds

A controller-side layer holding the intended state of fabric nodes (certs/keys/IPK, group keys, bindings, ACLs, group membership), with offline-tolerant reconciliation and multi-step orchestration (e.g. group-key rotation). Modeled on the JointFabric Datastore cluster (0x0752) and generalized.

Phase checklist

- @matter/node): ManagedItem/StatusEntry, ItemKind registry, capacity admission + typed errors, persistent `DesiredStateBehavior` on `ClientNode` (Node Manager Phase 1: Tier-1 desired-state model #3946)
- @matter/node-manager pkg): triggers, settle delay, verify-barrier, priority ordering, capacity reads, concrete ItemKinds; `RotateGroupKey`/`MoveNodeToGroup`; changeset rollback; 2-node rotation harness

Carry-forwards tracked for Phase 2

- `limit` only, derive `used` fresh at admission.
- `itemMapKey` separator unescaped: escape/document before non-identifier keys.

🤖 Generated with Claude Code