feat: Matter multicast group support (controller + WS API) #756

Draft
cedricziel wants to merge 4 commits into matter-js:main from cedricziel:feat/multicast-groups

@cedricziel

Standards-compliant Matter multicast group support: one command drives all members via real IPv6 groupcast (not unicast fan-out). Server-side half of "let HA consume groups, map them as entities, command a group's members". OHF extension — superset of the Python Matter Server API (no group commands; python-matter-server#1071). Groupcast can't ride the unicast-only device_command, and the operational group key is controller-fabric state, so these are first-class commands. Feature-detect via ServerInfoMessage.groups_supported.
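
A client could gate its group features on the advertised flag roughly as follows. This is only a sketch: `groups_supported` is the field named in this PR, while the `ServerInfoLike` type and the `supportsGroups` helper are hypothetical names used for illustration.

```typescript
// Hypothetical subset of the server info message. Only `groups_supported`
// comes from this PR; `schema_version` is included for context.
interface ServerInfoLike {
    schema_version: number;
    groups_supported?: boolean;
}

// Servers without the extension omit the flag entirely, so a missing value
// is treated the same as "not supported".
function supportsGroups(info: ServerInfoLike): boolean {
    return info.groups_supported === true;
}
```

A client would call `supportsGroups` once after connecting and hide or disable its group commands when it returns false.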

Commands (ws-client): create_group, delete_group, get_groups, add_group_member, remove_group_member, send_group_command, reconcile_group; group_added/updated/removed events.

GroupManager + GroupRegistry (ws-controller): installs the operational group key on the controller's own fabric (matter.js doesn't persist FabricGroups across restarts, so it is re-installed on boot from the registry; the epoch key never leaves the server), provisions members over unicast (KeySetWrite → GroupKeyMap → Groups.AddGroup → group ACL), and sends groupcasts via the matter.js ClientGroup path.

Verified against Matter 1.4 Core (CSA 23-27349): group-id range 0x0001–0xFEFF (§2.5.4), keyset 0 = IPK (§4.17.3.5.1), ACL Group subject = uint64 low-16 = GroupId + never Administer (§9.10.5.6/§6.6.6.2), EpochStartTime epoch-µs, only 0 illegal (§11.2.7.1), TrustFirst mandatory (§11.2.5.1).
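
The group-id range and ACL subject rules above can be sketched as a pair of small helpers. The function names are hypothetical and are not the matter.js API (the PR itself validates via `GroupId.isApplicationGroupId`); the numeric bounds are the ones quoted from the spec above.

```typescript
// Application group id range quoted above (Core §2.5.4).
const MIN_APPLICATION_GROUP_ID = 0x0001;
const MAX_APPLICATION_GROUP_ID = 0xfeff;

// Hypothetical helper: true only for ids a controller may allocate to groups.
function isApplicationGroupId(groupId: number): boolean {
    return (
        Number.isInteger(groupId) &&
        groupId >= MIN_APPLICATION_GROUP_ID &&
        groupId <= MAX_APPLICATION_GROUP_ID
    );
}

// Hypothetical helper: the ACL subject is a uint64 whose low 16 bits carry
// the group id, so the conversion is a plain widening.
function groupAclSubject(groupId: number): bigint {
    if (!isApplicationGroupId(groupId)) {
        throw new RangeError(`Not an application group id: ${groupId}`);
    }
    return BigInt(groupId);
}

// Inverse direction: recover the group id from the low 16 bits of a subject.
function groupIdFromAclSubject(subject: bigint): number {
    return Number(subject & BigInt(0xffff));
}
```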

Hardening: mutex-serialized mutations; ACL/GroupKeyMap RMW from a fresh fabric-filtered read (never drops the admin ACL entry → no controller lock-out); surfaces failed AddGroupResponse status; optional least-privilege ACL targets; reconcile_group checks registry vs each device's GroupKeyManagement.GroupTable and repairs drift.
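
One common way to serialize mutations is a promise chain that lets only one task run at a time. The sketch below illustrates that general technique; it is not the PR's actual mutex, and `SerialQueue` is a hypothetical name.

```typescript
// Hypothetical promise-chain mutex: each task starts only after the
// previously queued task has settled (resolved or rejected).
class SerialQueue {
    private tail: Promise<void> = Promise.resolve();

    run<T>(task: () => Promise<T>): Promise<T> {
        const result = this.tail.then(() => task());
        // Swallow the outcome on the internal chain so one failed task
        // does not block the tasks queued behind it.
        this.tail = result.then(
            () => undefined,
            () => undefined,
        );
        return result;
    }
}
```

Each group mutation would then be wrapped as `queue.run(() => doMutation())`, so concurrent WebSocket requests cannot interleave their read-modify-write steps.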

Tests: format/build/lint ✓, 121/121 unit ✓ (26 new). ⚠️ groupcast delivery not yet hardware-verified — needs a real LAN with IPv6 multicast (matter.js has no mock-multicast loopback).

Follow-ups (separate): HA-core entity integration; dashboard group UI.

@cedricziel

Author

@Apollon77 - I intentionally marked this as a draft because I wanted to get some feedback on the design. I think there's some headroom to make groups manageable for consumers. Yes, a client like HA can jump through all the hoops itself (create the key, provision it onto devices, etc.), but this PR would make it a ton easier and open up opportunities for another UI in the dashboard. Looking forward to your feedback :)

@Apollon77

Collaborator

@cedricziel Thanks a lot for this contribution. In fact, yes, this is a big part of what I also have in mind (just in a different place ;-) ) and have started working on. But because this is bigger, it needs a bit of time; we could start with this, but then in 1-2 months remove big parts again when the real system included in matter.js is done. ... I also have some "Matter dashboard UI experiments" locally, and where and how to expose them and how to configure them for end users is yet another thing to discuss with the Home Assistant guys (including what's best where).

Can you contact me on Discord for a sync? Might be easier than here.

Add the OHF-extension WebSocket surface for controller-managed Matter
multicast groups: create/delete/get groups, add/remove member,
send_group_command, and reconcile_group, plus group_added/updated/removed
events and the MatterGroupData / GroupReconciliation types. Advertise the
capability via ServerInfoMessage.groups_supported.

These are a superset of the Python Matter Server API (which has no group
commands; see home-assistant-libs/python-matter-server#1071): groupcast
send cannot be expressed via the unicast-only device_command, and the
operational group key lives on the controller's own fabric.

Store controller-managed groups (id, name, key set id, members, and the
operational epoch key) in dedicated storage so the controller can
re-install its fabric key material after a restart — matter.js does not
persist FabricGroups key sets across restarts. The epoch key is kept
server-side only and stripped from the API shape. Allocates application
group ids in the 0x0001..0xFEFF range and reserves key set 0 for the IPK.

Add GroupManager: installs the operational group key on the controller's
own fabric, provisions members over unicast (KeySetWrite, GroupKeyMap,
Groups.AddGroup, group ACL) and sends standards-compliant groupcasts via
the matter.js ClientGroup path. Wire the WebSocket commands and broadcast
group lifecycle events.

Hardening:
- Mutex-serialize group mutations (matches ConfigStorage's node-id pattern).
- Read-modify-write ACL/GroupKeyMap from a fresh fabric-filtered read so a
  pre-existing admin ACL entry is never dropped (no controller lock-out).
- Validate group ids via GroupId.isApplicationGroupId; surface a failed
  AddGroupResponse status instead of silently succeeding.
- Optional least-privilege ACL targets, persisted at group level.
- reconcile_group verifies registry membership against each device's
  GroupKeyManagement.GroupTable and can repair drift by re-provisioning.
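
The read-modify-write step for the ACL can be illustrated as a pure function over the freshly read list. This is a sketch under assumptions: the `AclEntry` shape and `withGroupAcl` are hypothetical, and the numeric values are the AccessControl cluster enum values for the Operate privilege and the Group auth mode as I understand them.

```typescript
// Hypothetical native shape of one ACL entry.
interface AclEntry {
    privilege: number;
    authMode: number;
    subjects: bigint[] | null;
    targets: unknown[] | null;
    fabricIndex: number;
}

const PRIVILEGE_OPERATE = 3; // AccessControl privilege "Operate"
const AUTH_MODE_GROUP = 3; // AccessControl auth mode "Group"

// Returns the next ACL list to write. Every existing entry (including the
// admin entry) is carried over unchanged; the group entry is appended only
// if an equivalent one is not already present, so the call is idempotent.
function withGroupAcl(current: readonly AclEntry[], groupId: number, fabricIndex: number): AclEntry[] {
    const subject = BigInt(groupId);
    const exists = current.some(
        entry =>
            entry.authMode === AUTH_MODE_GROUP &&
            entry.fabricIndex === fabricIndex &&
            (entry.subjects ?? []).some(s => s === subject),
    );
    if (exists) {
        return [...current];
    }
    return [
        ...current,
        { privilege: PRIVILEGE_OPERATE, authMode: AUTH_MODE_GROUP, subjects: [subject], targets: null, fabricIndex },
    ];
}
```

Because the function never filters the input list, a stale or partial read cannot silently drop the admin entry; the worst case is a redundant group entry.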

Verified the group implementation against the Matter Core Specification
R1.4 (CSA Doc 23-27349) and annotate the spec-driven decisions:
- application group id range 0x0001..0xFEFF (Core §2.5.4)
- key set 0 reserved for the IPK (§4.17.3.5.1)
- EpochStartTime in epoch-us; only 0 is illegal (§11.2.5.4, §11.2.7.1)
- TrustFirst is the only Mandatory key security policy (§11.2.5.1)
- Group ACL subject = uint64 carrying the 16-bit Group ID (§9.10.5.6);
  Operate is sufficient and Administer must never be granted (§6.6.6.2)

No behavioural change — every assumption was confirmed correct.
@cedricziel force-pushed the feat/multicast-groups branch from df72640 to b1f7695 on June 17, 2026 15:46

@Apollon77 Apollon77 left a comment

Collaborator

OK, I did a first review pass over the PR. Besides the concrete comments in place, here is some general review feedback:

  • The whole logic lacks error handling. Especially when multiple writes or commands are executed for one action, it leaks entries that are not removed when a later one fails. This is especially problematic because of the next topic ...
  • Node limits for ACLs (AccessControlEntriesPerFabric, TargetsPerAccessControlEntry, SubjectsPerAccessControlEntry) or GKM (MaxGroupsPerFabric, MaxGroupKeysPerFabric; we usually have a max of 4 groups per device per fabric and max 3 (no, 2!) group keys per device per fabric) are not checked or verified before trying to add new values. Group limits are in 2.11.1.2 "Group Limits", and up to Matter 1.6.0 most devices use these minimums as defaults. ACL limits are in 2.11.1.1 "Access Control Limits" and mean that (excluding the one admin ACL entry) we usually have a maximum of 3 more ACL entries we can create. Additionally, for sleepy ICD devices the practical number of groups that will work over Thread is 1, or maybe 2 at most.
  • With the usual limits, the idea of creating a dedicated group key set per group (which would be very good security-wise) will not work with the reality of Matter devices. So this requires a bit more thinking about how to structure it, or we need to start with exactly one group key set for all ... I need to think about what's best.
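
A pre-flight check along the lines requested above could look like the following sketch. The attribute names are the ones listed in the review; the types, the function, and the way usage is counted are assumptions. In particular, whether the IPK key set counts against MaxGroupKeysPerFabric is left to the caller here.

```typescript
// Limits read from the node before provisioning (hypothetical shape).
interface NodeLimits {
    accessControlEntriesPerFabric: number;
    targetsPerAccessControlEntry: number;
    maxGroupsPerFabric: number;
    maxGroupKeysPerFabric: number;
}

// What the controller's fabric already uses on that node.
interface NodeUsage {
    aclEntries: number;
    groups: number;
    groupKeySets: number;
}

// What joining the group would add.
interface PlannedJoin {
    newAclEntries: number;
    targetsInNewAclEntry: number;
    newGroups: number;
    newGroupKeySets: number;
}

// Returns the names of all limits the planned join would exceed.
// An empty array means it is safe to start writing.
function findLimitViolations(limits: NodeLimits, usage: NodeUsage, plan: PlannedJoin): string[] {
    const violations: string[] = [];
    if (usage.aclEntries + plan.newAclEntries > limits.accessControlEntriesPerFabric) {
        violations.push("AccessControlEntriesPerFabric");
    }
    if (plan.targetsInNewAclEntry > limits.targetsPerAccessControlEntry) {
        violations.push("TargetsPerAccessControlEntry");
    }
    if (usage.groups + plan.newGroups > limits.maxGroupsPerFabric) {
        violations.push("MaxGroupsPerFabric");
    }
    if (usage.groupKeySets + plan.newGroupKeySets > limits.maxGroupKeysPerFabric) {
        violations.push("MaxGroupKeysPerFabric");
    }
    return violations;
}
```

Running this before the first write means a node that is already full is rejected up front instead of being left with a partially provisioned group.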

Yes, I know the current ACL logic also does not check the size beforehand (I will add that soon). This was already the reason for failures with incomplete writes, and also for write successes or errors that differ with the matter.js server, because here we write differently than Python, which also returned the write failures but no one ever checked them.

Additionally, we need to adjust the Python client code for the new commands and such (but that can be done later, once we are fine with the general structure).

We should also enhance the integration tests to actually run these against the test device, to have practical proof.

* primitives. Values passed/returned here are native (already Matter-shaped), not
* the WebSocket tag-based representation.
*/
export interface GroupMemberOps {

Collaborator

Honestly, there is no need to map that onto the matterjs-server logic, which always adds an additional layer. The group manager has the commissioning controller and the nodes available and can do the writes directly using await node.setStateOf(ClusterType, { name: value })

* Optional ACL targets scoping the group's access on each member (least
* privilege). When omitted the group is granted Operate on all clusters.
*/
acl_targets?: AccessControlTarget[];

Collaborator

ACL targets already include the endpoint and cluster? They do not really belong in the group record unless you want to pre-prime a group for a dedicated use case, which is problematic anyway because of the endpoint ID ... and would only work for clusters.

I see that the code uses this to add ACLs for the joined member, but I do not really get the idea behind it.

requestArgs: Record<string, never>;
response: MatterGroupData[];
};
add_group_member: {

Collaborator

Let's change this to "join_group", because that is what it is called in Matter and what it will be called later on with the new Groupcast coming

requestArgs: { group_id: number; node_id: number | bigint; endpoint_id: number };
response: MatterGroupData;
};
remove_group_member: {

Collaborator

And here "leave_group"

requestArgs: { group_id: number; node_id: number | bigint; endpoint_id: number };
response: MatterGroupData;
};
send_group_command: {

Collaborator

Formally not needed, because each group has its own Matter node ID which contains the group ID. So I would favor not having a special WebSocket command to send a group command, and instead just using the normal one with a group node ID provided
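
The mapping between a group ID and its group node ID can be sketched as below. It assumes the Core spec's Group Node ID range, 0xFFFF_FFFF_FFFF_0000 to 0xFFFF_FFFF_FFFF_FFFF, where the low 16 bits carry the group ID. The helper names are hypothetical and are not the matter.js API.

```typescript
// Upper 48 bits shared by all group node ids (assumed from the Core spec).
const GROUP_NODE_ID_PREFIX = BigInt("0xFFFFFFFFFFFF");
const GROUP_ID_BITS = BigInt(16);
const GROUP_ID_MASK = BigInt(0xffff);

// Hypothetical helper: build the node id that addresses a group.
function groupNodeIdFor(groupId: number): bigint {
    if (!Number.isInteger(groupId) || groupId < 0 || groupId > 0xffff) {
        throw new RangeError(`Group id out of 16-bit range: ${groupId}`);
    }
    return (GROUP_NODE_ID_PREFIX << GROUP_ID_BITS) | BigInt(groupId);
}

// Hypothetical helper: returns the group id if `nodeId` is a group node id,
// otherwise undefined (for example for an operational node id).
function groupIdOf(nodeId: bigint): number | undefined {
    if ((nodeId >> GROUP_ID_BITS) !== GROUP_NODE_ID_PREFIX) {
        return undefined;
    }
    return Number(nodeId & GROUP_ID_MASK);
}
```

With helpers like these, a generic command handler could check `groupIdOf(nodeId)` and route to the groupcast path when it returns a value, instead of exposing a separate command.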

targets: desiredTargets,
fabricIndex,
});
await this.#ops.writeNative(nodeId, EndpointNumber(0), AccessControl.id, "acl", next);

Collaborator

write result ignored
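
Checking the write outcome might look like this sketch. The `WriteStatus` shape is an assumption (the actual return type of the write call is not shown in this PR); the only fact relied on is that Matter status code 0 means success.

```typescript
// Hypothetical per-attribute result of a write interaction.
interface WriteStatus {
    attribute: string;
    status: number; // Matter status code; 0 means SUCCESS
}

// Throws if any attribute write did not succeed, so callers cannot
// silently continue after a partial or rejected write.
function assertWriteSucceeded(results: readonly WriteStatus[]): void {
    const failures = results.filter(result => result.status !== 0);
    if (failures.length > 0) {
        const detail = failures
            .map(failure => `${failure.attribute}=0x${failure.status.toString(16)}`)
            .join(", ");
        throw new Error(`Attribute write failed: ${detail}`);
    }
}
```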

}

// 4. Grant the group its ACL entry (read-modify-write).
await this.#ensureGroupAcl(record, nodeId, fabricIndex);

Collaborator

if anything fails here nothing rolls back the installed group
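
One way to address this is to pair every provisioning step with an undo action and unwind completed steps in reverse order on failure. The sketch below shows that general pattern; `Step` and `runWithRollback` are hypothetical names, and the real steps (key set write, group key map, AddGroup, ACL) would supply their own apply and undo logic.

```typescript
// Hypothetical description of one provisioning step and how to revert it.
interface Step {
    name: string;
    apply: () => Promise<void>;
    undo: () => Promise<void>;
}

// Applies the steps in order. If one fails, the steps that already
// completed are undone in reverse order and the original error is rethrown.
async function runWithRollback(steps: readonly Step[]): Promise<void> {
    const done: Step[] = [];
    try {
        for (const step of steps) {
            await step.apply();
            done.push(step);
        }
    } catch (error) {
        for (const step of done.reverse()) {
            try {
                await step.undo();
            } catch {
                // Best effort: keep unwinding even if one undo fails.
            }
        }
        throw error;
    }
}
```

A production version would likely also report which undo steps failed, since those are exactly the leaked entries the review is concerned about.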

const keyMap = (await this.#readList(nodeId, GroupKeyManagement.id, GROUP_KEY_MAP_ATTRIBUTE_ID)).filter(
entry => Number((entry as { groupId?: number }).groupId) !== groupId,
);
await this.#ops.writeNative(nodeId, EndpointNumber(0), GroupKeyManagement.id, "groupKeyMap", keyMap);

Collaborator

response ignored

const acl = (await this.#readList(nodeId, AccessControl.id, ACL_ATTRIBUTE_ID)).filter(
entry => !this.#isGroupAclEntry(entry, groupId),
);
await this.#ops.writeNative(nodeId, EndpointNumber(0), AccessControl.id, "acl", acl);

Collaborator

Response ignored here too

bluetooth_enabled: this.#commandHandler.bleEnabled,
ble_proxy_enabled: this.#commandHandler.bleProxyEnabled,
controller_node_id: this.#commandHandler.getCommissionerNodeId(),
groups_supported: true,

Collaborator

See above: better to increase the schema version but still support the old one ... and basically the Python client also needs to be adjusted for the new version and code
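
Schema-version gating as suggested here could be sketched as follows. The field names mirror the `schema_version` / `min_supported_schema_version` pair that the server info message is assumed to carry; the concrete version number and the helper are hypothetical.

```typescript
// Hypothetical: the first schema version that includes the group commands.
const GROUPS_SCHEMA_VERSION = 12;

// Assumed subset of what the server advertises.
interface ServerSchemaInfo {
    schema_version: number;
    min_supported_schema_version: number;
}

// What the client implements and the oldest server it can still talk to.
interface ClientSchemaInfo {
    schemaVersion: number;
    minServerSchemaVersion: number;
}

interface Negotiation {
    compatible: boolean;
    groups: boolean;
}

// Old clients stay compatible as long as the server still supports their
// schema; group commands are enabled only when both sides are new enough.
function negotiate(server: ServerSchemaInfo, client: ClientSchemaInfo): Negotiation {
    const compatible =
        server.schema_version >= client.minServerSchemaVersion &&
        server.min_supported_schema_version <= client.schemaVersion;
    const groups =
        compatible &&
        server.schema_version >= GROUPS_SCHEMA_VERSION &&
        client.schemaVersion >= GROUPS_SCHEMA_VERSION;
    return { compatible, groups };
}
```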
