feat: End-to-End Encryption with historical key sharing #5
Open
axel-krapotke wants to merge 39 commits into main from
…dling
- Store device_id in HttpAPI credentials (was missing)
- Attempt token refresh on any M_UNKNOWN_TOKEN 401, not just soft_logout
- Fail clearly if E2EE is enabled but no device_id is available
- Remove unsafe ODIN_DEVICE fallback

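The refresh-on-401 behaviour described above can be sketched roughly as follows. This is not the PR's code: the error shape (`status`, `errcode`) and the two callbacks `doRequest` and `refreshAccessToken` are assumptions made for illustration.

```javascript
// Sketch only: the error shape ({ status, errcode }) and both callbacks are
// hypothetical stand-ins for whatever HttpAPI actually uses.
async function requestWithTokenRefresh (doRequest, refreshAccessToken) {
  try {
    return await doRequest()
  } catch (err) {
    const tokenRejected = err.status === 401 && err.errcode === 'M_UNKNOWN_TOKEN'
    if (!tokenRejected) throw err
    // Refresh whether or not soft_logout is set, then retry exactly once.
    await refreshAccessToken()
    return doRequest()
  }
}
```

Retrying only once keeps a permanently invalid token from looping forever; a second 401 simply propagates to the caller.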
- OlmMachine is now initialized once and reused (no duplicate key uploads)
- MatrixClient factory keeps a shared CryptoManager instance
- Playground 'open' command reuses the existing client instead of creating a new one

- Add setRoomEncryption() to CryptoManager (calls OlmMachine.setRoomSettings)
- Parse m.room.encryption in roomStateReducer
- Project.hydrate() registers encrypted rooms with CryptoManager
- Pass encryption state through structure-api hierarchy
- Pass cryptoManager through Project constructor params

Without this, the OlmMachine had no knowledge of which rooms were encrypted, causing shareRoomKey() to fail silently and leaving Element clients unable to decrypt.
- Use queryKeysForUsers() to explicitly fetch device keys before key sharing
- Fix member filtering: use content.membership and state_key (was wrong path)
- Add debug logging to entire E2EE encrypt flow
- Process outgoing requests after key sharing to ensure delivery

The previous flow relied on outgoingRequests() returning a KeysQuery after updateTrackedUsers(), which doesn't always happen. Now we explicitly query device keys, ensuring the OlmMachine knows all devices before shareRoomKey().
Sends a standard m.room.message event directly through the command queue, bypassing the ODIN operation wrapper. Useful for testing E2EE with Element.
- New method: initializeWithStore(userId, deviceId, storeName, passphrase)
  Uses StoreHandle.open() + OlmMachine.initFromStore() for persistent crypto state (Olm/Megolm sessions survive restarts)
- New method: close() releases store handle and OlmMachine
- New getter: isPersistent
- Original initialize() (in-memory) preserved for backwards compatibility
- Tests: persistent store API surface, close() cleanup, post-close errors
- StoreHandle requires IndexedDB (Electron/browser only, not Node.js)

- Docker Compose setup with jevolk/tuwunel:latest (~27MB)
- Minimal tuwunel.toml (no federation, open registration)
- Full E2EE flow tested against real homeserver:
  - Register users, upload device keys
  - Create encrypted room, join, key exchange
  - Alice encrypts → sends → Bob syncs → decrypts
- npm run test:e2e (skips gracefully if no homeserver running)
- Regular 'npm test' unaffected (unit tests only)

Tests the actual API components against Tuwunel, not raw fetch:
- HttpAPI: processOutgoingCryptoRequests(), sendOutgoingCryptoRequest()
- StructureAPI: room creation with m.room.encryption state
- CommandAPI: encrypt + send via CryptoManager pipeline
- TimelineAPI: sync → receiveSyncChanges → decrypt m.room.encrypted
- Full round-trip: Alice sends 3 encrypted msgs, Bob decrypts all 3

All tests use real HttpAPI with ky, real CryptoManager with OlmMachine, against a real Tuwunel homeserver via Docker.
Tests now go through the real API stack as ODIN uses it:
Layer 1 - HttpAPI + CryptoManager:
- processOutgoingCryptoRequests() uploads device keys
- sendOutgoingCryptoRequest() handles KeysQuery
Layer 2 - StructureAPI:
- createProject({ encrypted: true }) sets m.room.encryption
- createLayer({ encrypted: true }) sets m.room.encryption
- createProject() without encrypted does NOT set encryption
Layer 3 - CommandAPI:
- schedule() + run() automatically encrypts sendMessageEvent
- Verifies server sees m.room.encrypted (not plaintext ODIN type)
Layer 4 - TimelineAPI:
- syncTimeline() transparently decrypts m.room.encrypted back to
io.syncpoint.odin.operation with decrypted=true flag
Full Stack:
- Alice creates encrypted layer (StructureAPI)
- Sends 2 ODIN operations (CommandAPI)
- Bob receives + decrypts both (TimelineAPI)
When CryptoManager is active, TimelineAPI now automatically:
1. BEFORE sync: Injects 'm.room.encrypted' into the server-side filter types. Without this, the server silently drops all encrypted events because it only sees the envelope type, not the original event type (e.g. io.syncpoint.odin.operation).
2. AFTER decrypt: Re-applies the original type constraint as a client-side filter. Since m.room.encrypted is a catch-all, any event type could be inside. The post-decrypt filter ensures only expected types pass through.

This is fully transparent to ODIN: no filter changes are needed in Project.content() or the Project.start() filterProvider.

Affected paths:
- syncTimeline(): sync filter + catch-up filter augmented
- content(): history replay filter augmented + decrypt + post-filter
- Original filter is never mutated (deep clone)

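The two-stage filtering described above can be illustrated with a minimal sketch. The helper names `augmentFilterForEncryption` and `postDecryptFilter` are hypothetical, and the sketch assumes the type constraint lives at `filter.room.timeline.types` and uses exact type names (no wildcard handling).

```javascript
// Hypothetical helpers; the PR's actual implementation is not shown in this thread.
const ENCRYPTED_TYPE = 'm.room.encrypted'

// Stage 1 (before sync): return a deep copy of the filter whose timeline
// types also admit the encrypted envelope type. The input is never mutated.
function augmentFilterForEncryption (filter) {
  const copy = JSON.parse(JSON.stringify(filter))
  const types = copy?.room?.timeline?.types
  if (Array.isArray(types) && !types.includes(ENCRYPTED_TYPE)) {
    types.push(ENCRYPTED_TYPE)
  }
  return copy
}

// Stage 2 (after decrypt): re-apply the original type constraint client-side,
// since any event type may have been inside an m.room.encrypted envelope.
function postDecryptFilter (events, originalFilter) {
  const types = originalFilter?.room?.timeline?.types
  if (!Array.isArray(types)) return events // no constraint to enforce
  return events.filter(event => types.includes(event.type))
}
```

Cloning via JSON round-trip is sufficient here because Matrix filters are plain JSON documents.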
MatrixClient encryption options extended:
encryption: {
enabled: true,
storeName: 'crypto-<projectUUID>', // IndexedDB name
passphrase: '<decrypted passphrase>' // encrypts the store
}
When storeName is provided, uses initializeWithStore() (IndexedDB-backed,
crypto state survives restarts). Without it, falls back to in-memory
(for testing or non-browser environments).
This is the integration point for ODIN: Project-services.js passes
storeName + passphrase from LevelDB/safeStorage, and the API handles
the rest transparently.
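As a rough illustration of the fallback rule above, the choice between the persistent and the in-memory path could be expressed as follows. `selectCryptoInit` and its return shape are hypothetical; only the option names `enabled`, `storeName` and `passphrase` come from the description.

```javascript
// Hypothetical helper: decides which CryptoManager initialization path applies.
function selectCryptoInit (encryption = {}) {
  if (!encryption.enabled) return { mode: 'disabled' }
  if (encryption.storeName) {
    // Persistent path: corresponds to initializeWithStore(...) in the description.
    return {
      mode: 'persistent',
      storeName: encryption.storeName,
      passphrase: encryption.passphrase
    }
  }
  // No store name: fall back to in-memory state (tests, non-browser environments).
  return { mode: 'in-memory' }
}
```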
Some homeservers (e.g. Tuwunel) place room creation state events exclusively in the timeline rather than the state block on initial sync. We now merge state events with state-bearing timeline events (those with state_key) before reducing, with timeline taking precedence per the Matrix spec.
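A minimal sketch of that merge, assuming events are plain objects with `type` and `state_key`. The function name `mergeStateEvents` is hypothetical and not taken from the PR.

```javascript
// Merge the sync "state" block with state-bearing timeline events.
// Later writes win, so timeline events override the state block.
function mergeStateEvents (stateEvents = [], timelineEvents = []) {
  const byKey = new Map()
  const put = event => byKey.set(`${event.type}\u0000${event.state_key}`, event)
  stateEvents.forEach(put)
  timelineEvents
    .filter(event => event.state_key !== undefined) // '' is a valid state_key
    .forEach(put)
  return [...byKey.values()]
}
```

Keying on the `(type, state_key)` pair mirrors how Matrix identifies a piece of room state, so each pair ends up with exactly one winning event.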
The join result now includes an 'encrypted' flag derived from the project's encryption state. This allows the caller to persist the E2EE setting per project when accepting an invitation.
Tuwunel may omit the timeline object entirely for rooms with no new timeline events, unlike Synapse which always includes it.
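A defensive accessor along these lines avoids assuming the `timeline` object is present. `timelineEventsOf` is a hypothetical name, and `joinedRoom` stands for one entry under `rooms.join` in a sync response.

```javascript
// Returns the room's timeline events, or an empty array when the server
// omitted the timeline object (or its events array) altogether.
function timelineEventsOf (joinedRoom) {
  return joinedRoom?.timeline?.events ?? []
}
```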
When a new user joins an encrypted layer room, the existing member (who is streaming) detects the m.room.member join event and:
1. Queries the new user's device keys
2. Establishes Olm sessions
3. Exports all historical Megolm session keys for the room
4. Encrypts them per-device using Olm (encryptToDeviceEvent)
5. Sends them as m.room.encrypted to_device messages

On the receiving side, receiveSyncChanges() detects the custom io.syncpoint.odin.room_keys event type after Olm decryption and imports the keys via importRoomKeys(). This enables the joining user to decrypt all existing content during the initial replay/catch-up.

Also:
- Add exportRoomKeys(roomId) and importRoomKeys() to CryptoManager
- Generalize HttpAPI.sendToDevice() to accept arbitrary message maps

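To make the "arbitrary message maps" shape concrete, here is a sketch of how a send-to-device request could be assembled. The endpoint path and the `messages` body (user ID → device ID → content) follow the Matrix Client-Server API; `buildSendToDeviceRequest` itself is a hypothetical helper, not the PR's HttpAPI method.

```javascript
// Builds the HTTP request for PUT /sendToDevice/{eventType}/{txnId}.
// `messages` maps userId -> deviceId -> event content.
function buildSendToDeviceRequest (eventType, txnId, messages) {
  const path = '/_matrix/client/v3/sendToDevice/' +
    `${encodeURIComponent(eventType)}/${encodeURIComponent(txnId)}`
  return { method: 'PUT', path, body: { messages } }
}

// Example: one Olm-encrypted payload per target device (contents elided).
const request = buildSendToDeviceRequest('m.room.encrypted', 'txn-1', {
  '@bob:example.org': {
    BOBDEVICE1: { algorithm: 'm.olm.v1.curve25519-aes-sha2' }
  }
})
```

Because the map is keyed per user and per device, a single request can carry a differently encrypted payload for every recipient device.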
Problem: If Alice shares an encrypted layer with content and Bob joins later (possibly while Alice is offline), Bob cannot decrypt historical events because keys were only shared on join (requiring Alice to be online).

Solution: Share keys at TWO points:
1. At share time (shareLayer): Alice sends all Megolm session keys to ALL project members via to_device. These are queued server-side, so even if Bob is offline he receives them on next sync.
2. At join time (membershipChanged): Safety net that catches any keys created between share and join.

Both paths use the new _shareHistoricalKeysWithProjectMembers() helper which handles device key query, Olm session establishment, and per-device Olm-encrypted to_device delivery.
The historical key share must happen after content has been encrypted and sent, not before (otherwise no Megolm session keys exist yet).

Changes:
- shareHistoricalKeys() now schedules a callback in the command queue that runs after all preceding content posts
- CommandAPI supports async callback functions in the queue
- Removed premature key sharing from shareLayer() (room is empty there)
- Key sharing still fires on member join as safety net

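The ordering guarantee can be sketched with a simple FIFO queue that accepts both commands and async callbacks. This `CommandQueue` class is a simplified, hypothetical stand-in for CommandAPI, and `send` is an injected function assumed to deliver one command.

```javascript
// Simplified stand-in for a command queue that supports callback entries.
class CommandQueue {
  constructor () {
    this.items = []
  }

  // Accepts either a command object or an (async) callback function.
  schedule (item) {
    this.items.push(item)
  }

  // Drains the queue strictly in order; a callback only runs after every
  // command scheduled before it has been sent.
  async run (send) {
    while (this.items.length > 0) {
      const item = this.items.shift()
      if (typeof item === 'function') await item()
      else await send(item)
    }
  }
}
```

Scheduling the key share as a callback behind the content posts means the Megolm sessions already exist by the time the export runs.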
syncTimeline now collects state events (from both state block and timeline state events) and returns them as stateEvents alongside timeline events. project.start() processes state events and emits a 'selfJoined' event when the current user's own m.room.member join is detected. This enables reliable content loading after join — the server has fully processed the join before we attempt to load content.
The Olm-encrypted approach failed because the WASM OlmMachine zeroizes content of decrypted to_device events it doesn't recognize.

New approach:
- Send exported Megolm session keys as unencrypted custom to_device events (type: io.syncpoint.odin.room_keys)
- Intercept these events in receiveSyncChanges() BEFORE passing to OlmMachine, import keys via importRoomKeys()
- Keys are the same exported format as server-side key backup

Also fixes receiveSyncChanges() result parsing (WASM objects with .rawEvent, not plain JSON).

Includes integration test (content-after-join.test.mjs) that validates the full ODIN flow: create encrypted layer → post content → share keys → Bob joins → Bob decrypts all content.
Reverts the unencrypted approach. Keys are now properly:
- Olm-encrypted per-device via device.encryptToDeviceEvent()
- Sent as m.room.encrypted to_device events
- Decrypted by OlmMachine on receiving side
- Extracted from DecryptedToDeviceEvent.rawEvent (JSON string)
- content field is a JSON string that needs double-parse

The previous approach failed because we didn't handle:
1. WASM return objects (need .rawEvent accessor, not JSON.parse)
2. Double-stringified content (encryptToDeviceEvent stringifies, rawEvent contains it as string)

Tests verify both encrypted and unencrypted content loading.

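The double-parse can be illustrated in isolation. This sketch assumes only what the description states: `rawEvent` is a JSON string, and its `content` field may itself be a JSON string. The helper name `parseToDeviceEvent` is hypothetical.

```javascript
// Parses a raw to_device event whose content may have been stringified twice.
function parseToDeviceEvent (rawEvent) {
  const event = typeof rawEvent === 'string' ? JSON.parse(rawEvent) : rawEvent
  const content = typeof event.content === 'string'
    ? JSON.parse(event.content) // second parse for double-stringified content
    : event.content
  return { ...event, content }
}
```

Checking the type before each parse keeps the helper tolerant of both the double-stringified and the already-parsed shape.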
Content loading after join is now handled directly in toolbar.js. The selfJoined approach via the stream didn't work due to filter timing. The stateEvents collection in timeline-api remains, since it may be useful later.
With E2EE, ODIN operations are sent as m.room.encrypted instead of io.syncpoint.odin.operation. Without an explicit power level for m.room.encrypted, it falls back to events_default (100 = ADMIN), causing 403 for CONTRIBUTORs (power level 25). Set m.room.encrypted to CONTRIBUTOR level in both layer and project room creation.
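The effect of the missing entry can be reproduced with a small sketch of the power-level lookup. The levels (25 for CONTRIBUTOR, 100 for events_default) come from the description above; the helper names are hypothetical, and the lookup follows the Matrix rule that a type missing from `events` falls back to `events_default`.

```javascript
const CONTRIBUTOR = 25 // role level as stated in the description

// Returns a copy of the power levels with an explicit m.room.encrypted entry.
function withEncryptedEventLevel (powerLevels, level = CONTRIBUTOR) {
  return {
    ...powerLevels,
    events: { ...(powerLevels.events ?? {}), 'm.room.encrypted': level }
  }
}

// Required level for a message event: explicit entry, else events_default.
function requiredLevel (powerLevels, eventType) {
  return powerLevels.events?.[eventType] ?? powerLevels.events_default ?? 0
}

const before = { events_default: 100, events: { 'io.syncpoint.odin.operation': 25 } }
const after = withEncryptedEventLevel(before)
```

With `before`, a level-25 user may send the plaintext ODIN type but not its encrypted envelope; with `after`, both require level 25.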
content() filtered out the current user's events (not_senders). This is correct for the live stream (own changes are already local), but on re-join after leave the local store is empty — ALL events are needed to reconstruct the layer state, including our own.
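The difference between the two paths comes down to whether `not_senders` is set on the event filter. A sketch under stated assumptions: `buildEventFilter` and its `includeOwn` option are hypothetical, while `types` and `not_senders` are standard Matrix event filter fields.

```javascript
// Builds an event filter for ODIN operations.
// Live stream: exclude own events (already applied locally).
// History replay after re-join: include them, the local store is empty.
function buildEventFilter (userId, { includeOwn }) {
  const filter = { types: ['io.syncpoint.odin.operation'] }
  if (!includeOwn) filter.not_senders = [userId]
  return filter
}
```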
…layground CLI Also fix stale powerlevel unit tests to match current role definitions.
Add interactive device verification via Short Authentication String (SAS).
Both users see 7 matching emojis and confirm to verify each other's devices.
CryptoManager methods:
- requestVerification(userId, deviceId) — initiate verification
- getVerificationRequest(userId, flowId) — get pending request
- getVerificationRequests(userId) — list all requests for a user
- acceptVerification(request) — accept incoming request (SAS method)
- startSas(request) — transition to SAS flow
- getSas(request) — get SAS state machine from request
- getEmojis(sas) — get 7 emoji objects {symbol, description}
- confirmSas(sas) — confirm emojis match (marks device verified)
- cancelSas(sas) / cancelVerification(request) — cancel flow
- isDeviceVerified(userId, deviceId) — check trust status
- getDeviceVerificationStatus(userId) — all devices with trust info
- getVerificationPhase(request) — human-readable phase name
Exports: VerificationMethod, VerificationRequestPhase
Test: sas-verification.test.mjs validates the complete flow against Tuwunel.
Also: cleaned up duplicate JSDoc comments in shareHistoricalRoomKeys.
Summary
This PR adds comprehensive End-to-End Encryption support to the matrix-client-api, including a novel historical key sharing mechanism that ensures late joiners can decrypt existing content.
What's New
E2EE Core
- … @matrix-org/matrix-sdk-crypto-wasm
- … OlmMachine
- … console.* calls
Historical Key Sharing
- … m.room.encrypted to_device events
- … receiveSyncChanges() and imports via importRoomKeys()
Tuwunel Compatibility
- … timeline object in sync response
Bug Fixes
- m.room.encrypted power level set to CONTRIBUTOR level (was falling back to events_default = ADMIN)
- content() no longer filters out own events (not_senders), fixing re-join state reconstruction
- ProjectList.join() returns encryption status so joiners persist the E2EE flag
- sendToDevice() generalized to support arbitrary user/device message maps
Documentation & Testing
Breaking Changes
- HttpAPI.sendToDevice() signature changed: (eventType, txnId, messages) instead of (deviceId, eventType, content, txnId)
- content() no longer uses not_senders filter (includes own events for correct re-join state)
Test Results