Summary
Event-based decryption currently batches multiple undecrypted fired identities into one decryption trigger / key-share / key-release flow.
This makes independent event identities depend on each other. If one identity in the batch is missing shares, it can block key construction for other identities that already have enough shares. While service-level signatures are still present, batching also means that Keypers must sign the exact same identity batch; if their local views of undecrypted identities differ, signatures can split across different batch hashes and keys may not be released.
Event-based identities are independent, so batching should be treated only as a P2P optimization, not as a correctness requirement.
Problem
There are two separate steps affected by batching:
- Build the key
Building a decryption key should only require enough valid shares for that specific identity. It should not fail because another identity in the same event-based batch is missing shares.
- Publish the key
Right now, publishing keys requires a threshold of Keypers to sign the same batch of identities. If Keypers have different local views of what is still undecrypted, they can sign different identity batches, and no batch reaches the threshold number of signatures.
The signatures prove that a threshold of Keypers agreed to publish that exact batch. For event-based triggers, batch-level approval does not seem necessary because identities are independent and keys can be verified cryptographically.
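To make the signature split concrete, here is a small Python model of the publish step. It is only a sketch: the names `batch_hash`, `batches_reaching_threshold`, and `identities_reaching_threshold`, the 4-Keyper set, the threshold of 3, and the SHA-256 batch hash are all assumptions made for illustration, not the actual Keyper implementation.

```python
import hashlib
from collections import Counter

THRESHOLD = 3  # hypothetical threshold for a hypothetical 4-Keyper set


def batch_hash(identities):
    """Hash a batch of identities in canonical (sorted) order.

    Length-prefixing each identity avoids ambiguous concatenations.
    """
    h = hashlib.sha256()
    for identity in sorted(identities):
        h.update(len(identity).to_bytes(4, "big"))
        h.update(identity)
    return h.hexdigest()


def batches_reaching_threshold(views, threshold):
    """Batched model: each Keyper signs the hash of its whole local batch."""
    votes = Counter(batch_hash(v) for v in views.values() if v)
    return [h for h, n in votes.items() if n >= threshold]


def identities_reaching_threshold(views, threshold):
    """Unbatched model: each Keyper signs every identity separately."""
    votes = Counter(i for v in views.values() for i in v)
    return sorted(i for i, n in votes.items() if n >= threshold)


# Each Keyper's local view of which fired identities are still undecrypted.
views = {
    "keyper-0": {b"id-1", b"id-2"},
    "keyper-1": {b"id-1", b"id-2"},
    "keyper-2": {b"id-1", b"id-2", b"id-3"},  # has already seen a newer event
    "keyper-3": {b"id-1"},                    # lagging behind
}

print(batches_reaching_threshold(views, THRESHOLD))
print(identities_reaching_threshold(views, THRESHOLD))
```

In this model no batch hash collects three signatures, while `id-1` and `id-2` each collect at least three when signed individually.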
Proposed Direction
For event-based triggers, stop batching identities.
one fired identity -> one decryption trigger -> one key-share message -> one key-release flow
This keeps the fix contained to the event-based implementation and avoids changing Keyper core behavior, which treats batches atomically for other implementations such as Gnosis.
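The effect of the proposed direction on key construction can be sketched in Python as below. `DecryptionTrigger`, `batched_trigger`, `per_identity_triggers`, and the share counts are hypothetical names and values chosen for this example; the only behavior taken from this issue is that a trigger is handled atomically.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DecryptionTrigger:
    """Hypothetical stand-in for a decryption trigger carrying identities."""
    identities: tuple


def batched_trigger(fired):
    """Current behavior: all undecrypted fired identities in one trigger."""
    return [DecryptionTrigger(tuple(fired))] if fired else []


def per_identity_triggers(fired):
    """Proposed behavior: one trigger per fired identity."""
    return [DecryptionTrigger((identity,)) for identity in fired]


def buildable(trigger, share_counts, threshold):
    """A trigger is atomic: every identity in it needs enough shares."""
    return all(share_counts.get(i, 0) >= threshold for i in trigger.identities)


def released_identities(triggers, share_counts, threshold):
    """Collect the identities whose trigger can be fully built."""
    released = []
    for trigger in triggers:
        if buildable(trigger, share_counts, threshold):
            released.extend(trigger.identities)
    return released


fired = [b"id-1", b"id-2"]
share_counts = {b"id-1": 3, b"id-2": 1}  # id-2 is still missing shares

print(released_identities(batched_trigger(fired), share_counts, 3))
print(released_identities(per_identity_triggers(fired), share_counts, 3))
```

With batching, `id-2` blocks `id-1`; with one trigger per identity, `id-1` is released as soon as it has enough shares.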
Relation to #617
Related: #617
Once #617 is fully implemented, service-level batch signature issues should be reduced or removed. However, removing event-based batching is a simpler contained fix that does not need to wait for that rollout.
Why not change the Keyper core first?
Keyper core is shared by multiple implementations. Some implementations may rely on atomic batch behavior.
For Gnosis, slot batches are intended to be atomic: either all selected transactions for the slot are decrypted or none are. Gnosis key-share messages also include signatures over the identities being decrypted, and keys messages include those signatures. If a keys message only contains a subset of the originally signed identities, those signatures would be invalid for the subset.
So this epic should not change Gnosis behavior or Keyper core semantics as the first step.
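A rough Python illustration of why a subset breaks signatures made over the full identity list follows. HMAC is used purely as a stand-in for a Keyper signature, and `identities_digest`, `sign`, `verify`, and the secret are hypothetical; this is not the signature scheme Gnosis uses.

```python
import hashlib
import hmac


def identities_digest(identities):
    """Digest over the ordered, length-prefixed list of identities."""
    h = hashlib.sha256()
    for identity in identities:
        h.update(len(identity).to_bytes(4, "big"))
        h.update(identity)
    return h.digest()


def sign(secret, identities):
    # HMAC stands in for a real Keyper signature (assumption for this sketch).
    return hmac.new(secret, identities_digest(identities), hashlib.sha256).digest()


def verify(secret, identities, signature):
    return hmac.compare_digest(sign(secret, identities), signature)


secret = b"hypothetical-keyper-secret"
slot_identities = [b"tx-1", b"tx-2", b"tx-3"]
signature = sign(secret, slot_identities)

print(verify(secret, slot_identities, signature))      # full slot batch
print(verify(secret, slot_identities[:2], signature))  # subset of the batch
```

The signature verifies only against the exact list that was signed, which is why splitting batches inside the shared core would not be safe for Gnosis.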
Message Volume
Removing batching increases P2P messages:
current: 1 message with N identities
new: N messages with 1 identity each
This is not an immediate concern because API-side rate limiting already bounds event trigger registration, but bursts or backlogs of fired triggers can still increase message volume. We should test this on Chiado and add metrics and logging where useful.
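For a back-of-the-envelope estimate before testing on Chiado, the growth can be modeled as below. This sketch assumes each Keyper sends exactly one key-share message per trigger; that assumption, the function name `key_share_messages`, and the example numbers are illustrative and not measured values.

```python
def key_share_messages(num_identities, num_keypers, batched):
    """Estimate key-share messages for one round of fired identities.

    Assumes one key-share message per Keyper per trigger (hypothetical).
    """
    if num_identities == 0:
        return 0
    triggers = 1 if batched else num_identities
    return triggers * num_keypers


# Example: a backlog of 50 fired identities with 4 Keypers.
print(key_share_messages(50, 4, batched=True))
print(key_share_messages(50, 4, batched=False))
```

Under this assumption the message count scales linearly with the number of fired identities, so the size of fired-trigger bursts is the figure worth tracking in metrics.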
Time-Based Follow-Up
Time-based triggers also batch multiple identities, so the same core behavior can theoretically occur there. The impact is different because time-based batching is tied to timestamp windows, while event-based batching draws from the undecrypted fired-trigger backlog, where stale identities can keep polluting future batches.
This epic focuses first on event-based triggers, but time-based batching should be reviewed separately.
Sub-Issues to create