feat: Matter multicast group support (controller + WS API)#756
cedricziel wants to merge 4 commits into
@Apollon77 - I intentionally marked this as a draft because I wanted to get some feedback on the design here. I think there's some headroom to make groups manageable for consumers. Yes, a client like HA can jump through all the hoops (create the key, provision it onto devices, etc.), but this PR would make it a ton easier and open up opportunities for another UI in the dashboard. Looking forward to your feedback :)

@cedricziel Thanks a lot for this contribution. In fact, yes, this is a big part of what I also have in mind (just in a different place ;-) ) and have started working on. But because it is bigger it needs a bit of time; we could start with this, but then in 1-2 months remove big parts again once the real system included in matter.js is done. ... I also have some "Matter dashboard UI experiments" locally, and how and where to expose them and how to configure them for end users is a different thing to discuss with the Home Assistant guys (including what's best where). Can you contact me on Discord for a sync? Might be easier than here.
Add the OHF-extension WebSocket surface for controller-managed Matter multicast groups: create/delete/get groups, add/remove member, send_group_command, and reconcile_group, plus group_added/updated/removed events and the MatterGroupData / GroupReconciliation types. Advertise the capability via ServerInfoMessage.groups_supported. These are a superset of the Python Matter Server API (which has no group commands; see home-assistant-libs/python-matter-server#1071): groupcast send cannot be expressed via the unicast-only device_command, and the operational group key lives on the controller's own fabric.
Store controller-managed groups (id, name, key set id, members, and the operational epoch key) in dedicated storage so the controller can re-install its fabric key material after a restart — matter.js does not persist FabricGroups key sets across restarts. The epoch key is kept server-side only and stripped from the API shape. Allocates application group ids in the 0x0001..0xFEFF range and reserves key set 0 for the IPK.
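A minimal sketch of the two storage rules described above: handing out the lowest free application group id in 0x0001..0xFEFF, and stripping the epoch key before a record leaves the server. The `StoredGroup` shape and the helper names are hypothetical and only illustrate the idea; they are not taken from the PR's code.

```typescript
// Hypothetical record shape; field names are illustrative, not the PR's actual types.
interface StoredGroup {
    groupId: number;
    name: string;
    keySetId: number;
    members: { nodeId: bigint; endpointId: number }[];
    epochKey: Uint8Array; // operational epoch key, kept server-side only
}

// What the WebSocket API is allowed to see: everything except the epoch key.
type ApiGroup = Omit<StoredGroup, "epochKey">;

const MIN_APP_GROUP_ID = 0x0001;
const MAX_APP_GROUP_ID = 0xfeff;

/** Returns the lowest unused application group id, or undefined if the range is exhausted. */
function allocateGroupId(used: ReadonlySet<number>): number | undefined {
    for (let id = MIN_APP_GROUP_ID; id <= MAX_APP_GROUP_ID; id++) {
        if (!used.has(id)) return id;
    }
    return undefined;
}

/** Drops the epoch key before a record crosses the WebSocket boundary. */
function toApiShape(group: StoredGroup): ApiGroup {
    const { epochKey: _epochKey, ...rest } = group;
    return rest;
}
```

Scanning from the bottom of the range keeps ids stable and small; a real registry would also have to persist the allocation under the same mutex as other group mutations.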
Add GroupManager: installs the operational group key on the controller's own fabric, provisions members over unicast (KeySetWrite, GroupKeyMap, Groups.AddGroup, group ACL) and sends standards-compliant groupcasts via the matter.js ClientGroup path. Wire the WebSocket commands and broadcast group lifecycle events.
Hardening:
- Mutex-serialize group mutations (matches ConfigStorage's node-id pattern).
- Read-modify-write ACL/GroupKeyMap from a fresh fabric-filtered read so a pre-existing admin ACL entry is never dropped (no controller lock-out).
- Validate group ids via GroupId.isApplicationGroupId; surface a failed AddGroupResponse status instead of silently succeeding.
- Optional least-privilege ACL targets, persisted at group level.
- reconcile_group verifies registry membership against each device's GroupKeyManagement.GroupTable and can repair drift by re-provisioning.
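The read-modify-write rule for the ACL can be sketched roughly as follows. This assumes a simplified `AclEntry` shape and hypothetical helper names; the enum values (Operate = 3, Group auth mode = 3) follow the AccessControl cluster, but the actual PR code works on matter.js types rather than these.

```typescript
// Simplified ACL entry for illustration only (not the matter.js type).
interface AclEntry {
    privilege: number; // AccessControl privilege enum, Operate = 3
    authMode: number; // AccessControl auth mode enum, CASE = 2, Group = 3
    subjects: bigint[] | null;
    targets: { cluster: number | null; endpoint: number | null; deviceType: number | null }[] | null;
}

const AUTH_MODE_GROUP = 3;
const PRIVILEGE_OPERATE = 3;

/** True only for an entry whose single subject is exactly this group. */
function isGroupAclEntry(entry: AclEntry, groupId: number): boolean {
    return (
        entry.authMode === AUTH_MODE_GROUP &&
        entry.subjects !== null &&
        entry.subjects.length === 1 &&
        entry.subjects[0] === BigInt(groupId)
    );
}

/**
 * Builds the list to write back from a freshly read ACL: every existing entry
 * is kept, except a stale entry for this group, which is replaced.
 */
function withGroupAcl(
    current: readonly AclEntry[],
    groupId: number,
    targets: AclEntry["targets"] = null,
): AclEntry[] {
    const kept = current.filter(entry => !isGroupAclEntry(entry, groupId));
    return [
        ...kept,
        { privilege: PRIVILEGE_OPERATE, authMode: AUTH_MODE_GROUP, subjects: [BigInt(groupId)], targets },
    ];
}
```

Because the new list is always derived from what was just read, the admin entry is carried over unchanged and repeated calls do not accumulate duplicate group entries.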
Verified the group implementation against the Matter Core Specification R1.4 (CSA Doc 23-27349) and annotated the spec-driven decisions:
- application group id range 0x0001..0xFEFF (Core §2.5.4)
- key set 0 reserved for the IPK (§4.17.3.5.1)
- EpochStartTime in epoch-us; only 0 is illegal (§11.2.5.4, §11.2.7.1)
- TrustFirst is the only Mandatory key security policy (§11.2.5.1)
- Group ACL subject = uint64 carrying the 16-bit Group ID (§9.10.5.6); Operate is sufficient and Administer must never be granted (§6.6.6.2)
No behavioural change: every assumption was confirmed correct.
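The spec-driven decisions listed above could be collected into one pre-flight check along these lines. `GroupPlan` and `specViolations` are hypothetical names for this sketch; only the numeric rules come from the commit message.

```typescript
// Hypothetical input for a spec-check helper; not a type from the PR.
interface GroupPlan {
    groupId: number;
    keySetId: number;
    epochStartTimeUs: bigint; // EpochStartTime in epoch microseconds
    aclPrivilege: number; // AccessControl privilege enum value
}

const PRIVILEGE_ADMINISTER = 5; // AccessControl privilege enum: Administer

/** Returns one human-readable message per violated rule; empty means the plan passes. */
function specViolations(plan: GroupPlan): string[] {
    const problems: string[] = [];
    if (!Number.isInteger(plan.groupId) || plan.groupId < 0x0001 || plan.groupId > 0xfeff) {
        problems.push("group id outside the application range 0x0001..0xFEFF");
    }
    if (plan.keySetId === 0) {
        problems.push("key set 0 is reserved for the IPK");
    }
    if (plan.epochStartTimeUs === BigInt(0)) {
        problems.push("EpochStartTime 0 is illegal");
    }
    if (plan.aclPrivilege === PRIVILEGE_ADMINISTER) {
        problems.push("a group ACL entry must never grant Administer");
    }
    return problems;
}
```

Returning a list rather than throwing on the first problem lets a caller report everything that is wrong with a request in one response.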
df72640 to b1f7695
Apollon77 left a comment
OK, I did a first review pass over the PR. Besides the concrete comments in specific places, here is some general review feedback:
- The whole logic is missing error handling. Especially when multiple writes or commands are executed for one action, it leaks entries that are not removed when one of the others fails. This is especially problematic because of the next topic ...
- Node limits for ACLs (AccessControlEntriesPerFabric, TargetsPerAccessControlEntry, SubjectsPerAccessControlEntry) or GKM (MaxGroupsPerFabric, MaxGroupKeysPerFabric; we usually have a max of 4 groups per device per fabric and a max of 3 (no, 2!) group keys per device per fabric) are not checked or verified before trying to add new values. Group limits are in 2.11.1.2. Group Limits, and until Matter 1.6.0 most devices use these minimums as defaults. Additionally, for sleepy ICD devices the practical number of groups that will work with Thread is 1 or maybe 2 at most. ACL limits are in 2.11.1.1. Access Control Limits and mean that (excluding the one Admin ACL) we usually have at most 3 more ACL entries we can create.
- With the usual limits, the idea of creating a dedicated group key set per group, which would be very good security-wise, will also not work with Matter device reality. So this requires a bit more thinking about how to structure it, or we need to start with exactly one group key set for all ... I need to think about what's best.
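A pre-flight check for the limits named above might look roughly like this. It is a sketch only: the `NodeLimits` fields mirror the attribute names from the comment, while `CurrentUsage`, `PlannedJoin` and `limitViolations` are hypothetical, and whether the IPK counts towards the key set usage is left to the caller.

```typescript
// Limits as read from the device (attribute names from the review comment).
interface NodeLimits {
    maxGroupsPerFabric: number;
    maxGroupKeysPerFabric: number;
    accessControlEntriesPerFabric: number;
    subjectsPerAccessControlEntry: number;
    targetsPerAccessControlEntry: number;
}

// Hypothetical snapshot of what this fabric already uses on the device.
interface CurrentUsage {
    groups: number;
    groupKeySets: number; // the caller decides whether the IPK is included
    aclEntries: number;
}

// Hypothetical description of what joining a group would add.
interface PlannedJoin {
    addsGroup: boolean;
    addsKeySet: boolean;
    addsAclEntry: boolean;
    aclSubjects: number;
    aclTargets: number;
}

/** Lists every limit the planned join would exceed; empty means it fits. */
function limitViolations(limits: NodeLimits, usage: CurrentUsage, plan: PlannedJoin): string[] {
    const problems: string[] = [];
    if (plan.addsGroup && usage.groups + 1 > limits.maxGroupsPerFabric) {
        problems.push("MaxGroupsPerFabric exceeded");
    }
    if (plan.addsKeySet && usage.groupKeySets + 1 > limits.maxGroupKeysPerFabric) {
        problems.push("MaxGroupKeysPerFabric exceeded");
    }
    if (plan.addsAclEntry && usage.aclEntries + 1 > limits.accessControlEntriesPerFabric) {
        problems.push("AccessControlEntriesPerFabric exceeded");
    }
    if (plan.aclSubjects > limits.subjectsPerAccessControlEntry) {
        problems.push("SubjectsPerAccessControlEntry exceeded");
    }
    if (plan.aclTargets > limits.targetsPerAccessControlEntry) {
        problems.push("TargetsPerAccessControlEntry exceeded");
    }
    return problems;
}
```

Running such a check before the first write means a join that cannot fit is rejected up front instead of failing halfway through provisioning.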
Yes, I know the current ACL logic also does not check the size beforehand (I will add that soon). This was already the reason for failures with incomplete writes, and so also for write successes or errors that differ with the matter.js server, because here we write differently than Python, which also returned the write failures but no one ever checked them.
Additionally, we need to adjust the Python client code for the new commands and such (but that can be done later, once we are fine with the general structure).
We should also enhance the integration tests to really use them against the test device, to have practical proof.
 * primitives. Values passed/returned here are native (already Matter-shaped), not
 * the WebSocket tag-based representation.
 */
export interface GroupMemberOps {

Honestly, there's no need to map that onto the matterjs-server logic, which always adds an additional layer. The group manager has the CommissioningController and the nodes available and can do the writes directly using await node.setStateOf(ClusterType, { name: value }).
 * Optional ACL targets scoping the group's access on each member (least
 * privilege). When omitted the group is granted Operate on all clusters.
 */
acl_targets?: AccessControlTarget[];

ACL targets already include the endpoint and cluster? They don't really belong in the group record unless you want to pre-prime a group for a dedicated use case, which is problematic anyway because of the endpoint ID ... and would only work for clusters.
I see that the code uses this to add ACLs for the joined member, but I don't really get the idea behind it.
    requestArgs: Record<string, never>;
    response: MatterGroupData[];
};
add_group_member: {

Let's change this to "join_group", because that is what it is called in Matter and what it will be called later on with the new Groupcast that is coming.
    requestArgs: { group_id: number; node_id: number | bigint; endpoint_id: number };
    response: MatterGroupData;
};
remove_group_member: {
    requestArgs: { group_id: number; node_id: number | bigint; endpoint_id: number };
    response: MatterGroupData;
};
send_group_command: {

Formally not needed, because each group has its own Matter node ID which contains the group ID, so I would favor not having a special WebSocket command to send a command and instead just using the normal one with a group node ID provided.
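To illustrate the point about group node IDs: in Matter the group node ID range is 0xFFFF_FFFF_FFFF_xxxx, with the 16-bit group ID in the low bits. The sketch below shows the mapping in both directions; the function names are hypothetical, and the server would more likely rely on the matter.js helpers for this.

```typescript
// Upper 48 bits shared by all group node IDs (0xFFFF_FFFF_FFFF_xxxx).
const GROUP_NODE_ID_BASE = BigInt("0xFFFFFFFFFFFF0000");
const SIXTEEN = BigInt(16);

/** Embeds a 16-bit group ID into the group node ID range. */
function groupNodeId(groupId: number): bigint {
    if (!Number.isInteger(groupId) || groupId < 0 || groupId > 0xffff) {
        throw new RangeError(`group id out of 16-bit range: ${groupId}`);
    }
    return GROUP_NODE_ID_BASE | BigInt(groupId);
}

/** Extracts the group ID, or returns undefined if the node ID is not a group node ID. */
function groupIdFromNodeId(nodeId: bigint): number | undefined {
    if (nodeId >> SIXTEEN !== GROUP_NODE_ID_BASE >> SIXTEEN) return undefined;
    return Number(nodeId & BigInt(0xffff));
}
```

With such a mapping the existing command could decide between unicast and groupcast purely from the node ID it receives.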
        targets: desiredTargets,
        fabricIndex,
    });
    await this.#ops.writeNative(nodeId, EndpointNumber(0), AccessControl.id, "acl", next);
}

// 4. Grant the group its ACL entry (read-modify-write).
await this.#ensureGroupAcl(record, nodeId, fabricIndex);

If anything fails here, nothing rolls back the installed group.
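One way to address the missing rollback is to pair every provisioning step with a compensating action and undo the completed steps in reverse order when a later one fails. The `Step` interface and `runWithRollback` below are hypothetical and only sketch that pattern; they are not part of the PR.

```typescript
// A provisioning step together with the action that reverts it (hypothetical shape).
interface Step {
    name: string;
    apply(): Promise<void>;
    undo(): Promise<void>;
}

/**
 * Runs the steps in order. If one fails, the steps that already succeeded are
 * undone in reverse order (best effort) and the original error is rethrown.
 */
async function runWithRollback(steps: readonly Step[]): Promise<void> {
    const done: Step[] = [];
    for (const step of steps) {
        try {
            await step.apply();
            done.push(step);
        } catch (error) {
            for (const previous of done.reverse()) {
                try {
                    await previous.undo();
                } catch {
                    // Best effort: a failed undo must not hide the original error.
                }
            }
            throw error;
        }
    }
}
```

For a group join the steps would be the key set write, the GroupKeyMap write, AddGroup and the ACL write, each with its matching removal, so a failed ACL write would no longer leave the earlier entries behind.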
const keyMap = (await this.#readList(nodeId, GroupKeyManagement.id, GROUP_KEY_MAP_ATTRIBUTE_ID)).filter(
    entry => Number((entry as { groupId?: number }).groupId) !== groupId,
);
await this.#ops.writeNative(nodeId, EndpointNumber(0), GroupKeyManagement.id, "groupKeyMap", keyMap);
const acl = (await this.#readList(nodeId, AccessControl.id, ACL_ATTRIBUTE_ID)).filter(
    entry => !this.#isGroupAclEntry(entry, groupId),
);
await this.#ops.writeNative(nodeId, EndpointNumber(0), AccessControl.id, "acl", acl);

bluetooth_enabled: this.#commandHandler.bleEnabled,
ble_proxy_enabled: this.#commandHandler.bleProxyEnabled,
controller_node_id: this.#commandHandler.getCommissionerNodeId(),
groups_supported: true,

See above: better let's increase the schema version but still support the old one ... and basically the Python client also needs to be adjusted for the new version and code.
Standards-compliant Matter multicast group support: one command drives all members via real IPv6 groupcast (not unicast fan-out). Server-side half of "let HA consume groups, map them as entities, command a group's members".

OHF extension: a superset of the Python Matter Server API (no group commands; python-matter-server#1071). Groupcast can't ride the unicast-only device_command, and the operational group key is controller-fabric state, so these are first-class commands. Feature-detect via ServerInfoMessage.groups_supported.

Commands (ws-client): create_group, delete_group, get_groups, add_group_member, remove_group_member, send_group_command, reconcile_group; group_added/updated/removed events.

GroupManager + GroupRegistry (ws-controller): installs the operational group key on the controller's own fabric (matter.js doesn't persist FabricGroups across restarts, so it is re-installed on boot from the registry; the epoch key never leaves the server), provisions members over unicast (KeySetWrite → GroupKeyMap → Groups.AddGroup → group ACL), and sends groupcasts via the matter.js ClientGroup path.

Verified against Matter 1.4 Core (CSA 23-27349): group-id range 0x0001–0xFEFF (§2.5.4), keyset 0 = IPK (§4.17.3.5.1), ACL Group subject = uint64 low-16 = GroupId + never Administer (§9.10.5.6/§6.6.6.2), EpochStartTime epoch-µs, only 0 illegal (§11.2.7.1), TrustFirst mandatory (§11.2.5.1).

Hardening: mutex-serialized mutations; ACL/GroupKeyMap RMW from a fresh fabric-filtered read (never drops the admin ACL entry → no controller lock-out); surfaces failed AddGroupResponse status; optional least-privilege ACL targets; reconcile_group checks registry vs each device's GroupKeyManagement.GroupTable and repairs drift.

Tests: format/build/lint ✓, 121/121 unit ✓ (26 new).

⚠️ groupcast delivery not yet hardware-verified: needs a real LAN with IPv6 multicast (matter.js has no mock-multicast loopback).

Follow-ups (separate): HA-core entity integration; dashboard group UI.
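The drift check that reconcile_group performs can be pictured as a set comparison between the registry's expected endpoints for a node and what that node's GroupKeyManagement.GroupTable reports. The sketch below assumes a simplified `GroupTableEntry` and a hypothetical `computeDrift` helper; it only computes the difference and leaves the repair to the caller.

```typescript
// Simplified view of one GroupTable entry on a device.
interface GroupTableEntry {
    groupId: number;
    endpoints: number[];
}

// Hypothetical result shape for one node and one group.
interface Drift {
    missingEndpoints: number[]; // in the registry, but not on the device
    unexpectedEndpoints: number[]; // on the device, but not in the registry
}

/** Compares the registry's expected endpoints with the device's group table. */
function computeDrift(
    groupId: number,
    expectedEndpoints: readonly number[],
    groupTable: readonly GroupTableEntry[],
): Drift {
    const actual = new Set<number>();
    for (const entry of groupTable) {
        if (entry.groupId === groupId) {
            for (const endpoint of entry.endpoints) actual.add(endpoint);
        }
    }
    const expected = new Set<number>(expectedEndpoints);
    const ascending = (a: number, b: number) => a - b;
    return {
        missingEndpoints: [...expected].filter(e => !actual.has(e)).sort(ascending),
        unexpectedEndpoints: [...actual].filter(e => !expected.has(e)).sort(ascending),
    };
}
```

An empty result in both arrays means the node matches the registry; anything in `missingEndpoints` is a candidate for re-provisioning.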