
Epic: Fix event-based decryption liveness and batching behavior #698

@ylembachar

Summary

Event-based decryption currently batches multiple undecrypted fired identities into one decryption trigger / key-share / key-release flow.

This makes independent event identities depend on each other. If one identity in the batch is missing shares, it can block key construction for other identities that already have enough shares. While service-level signatures are still present, batching also means Keypers must sign the exact same identity batch; if their local views of what is undecrypted differ, signatures can split across different batch hashes and keys may not be released.

Event-based identities are independent, so batching should be treated only as a P2P optimization, not as a correctness requirement.

Problem

There are two separate steps affected by batching:

  1. Build the key

Building a decryption key should only require enough valid shares for that specific identity. It should not fail because another identity in the same event-based batch is missing shares.
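The difference between batch-level and per-identity readiness can be sketched as follows. This is a minimal illustration, not the Keyper implementation: the names `batch_ready`, `ready_identities`, the `shares` mapping, and the threshold value are all hypothetical.

```python
from typing import Dict, List, Set

THRESHOLD = 2  # hypothetical threshold, for illustration only


def batch_ready(shares: Dict[str, Set[int]], batch: List[str], threshold: int) -> bool:
    """Atomic batch rule: every identity in the batch needs enough shares."""
    return all(len(shares.get(identity, set())) >= threshold for identity in batch)


def ready_identities(shares: Dict[str, Set[int]], batch: List[str], threshold: int) -> List[str]:
    """Per-identity rule: each identity is judged only on its own shares."""
    return [i for i in batch if len(shares.get(i, set())) >= threshold]


# identity -> set of keyper indices whose share has been received (made-up data)
shares = {"id-a": {0, 1}, "id-b": {0}}
batch = ["id-a", "id-b"]

blocked = not batch_ready(shares, batch, THRESHOLD)       # "id-b" blocks the whole batch
buildable = ready_identities(shares, batch, THRESHOLD)    # "id-a" can be built on its own
```

Under the atomic rule the missing share for `id-b` also blocks `id-a`; under the per-identity rule `id-a` is buildable regardless.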

  2. Publish the key

Right now, publishing keys requires a threshold of Keypers to sign the same batch of identities. If Keypers have different local views of what is still undecrypted, they can sign different identity batches, and no batch reaches the threshold number of signatures.

The signatures prove that a threshold of Keypers agreed to publish that exact batch. For event-based triggers, batch-level approval does not seem necessary because identities are independent and keys can be verified cryptographically.
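The signature split described above can be sketched with a toy model. Everything here is an assumption for illustration: the batch hash construction (SHA-256 over sorted, length-prefixed identities), the function names, and the keyper views are hypothetical and do not reflect the actual message format.

```python
import hashlib
from collections import Counter
from typing import Dict, List


def batch_hash(identities: List[str]) -> str:
    """Hypothetical batch hash: SHA-256 over sorted, length-prefixed identities."""
    h = hashlib.sha256()
    for identity in sorted(identities):
        raw = identity.encode()
        h.update(len(raw).to_bytes(4, "big"))
        h.update(raw)
    return h.hexdigest()


def batch_released(views: Dict[str, List[str]], threshold: int) -> bool:
    """Batch rule: some single batch hash must collect `threshold` signatures."""
    counts = Counter(batch_hash(v) for v in views.values())
    return any(c >= threshold for c in counts.values())


def identities_released(views: Dict[str, List[str]], threshold: int) -> List[str]:
    """Per-identity rule: each identity collects signatures on its own."""
    counts = Counter(i for v in views.values() for i in set(v))
    return sorted(i for i, c in counts.items() if c >= threshold)


# Each keyper's local view of undecrypted fired identities (made-up data).
views = {
    "keyper-0": ["id-a", "id-b"],
    "keyper-1": ["id-a", "id-b", "id-stale"],
    "keyper-2": ["id-a"],
}
```

With a threshold of 2, the three views hash to three different batches, so no batch is released, even though `id-a` and `id-b` are each present in at least two views.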

Proposed Direction

For event-based triggers, stop batching identities.

one fired identity -> one decryption trigger -> one key-share message -> one key-release flow

This keeps the fix contained to the event-based implementation and avoids changing keyper core behavior, which treats batches atomically for other implementations such as Gnosis.
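As a rough sketch of the proposed direction, the change amounts to emitting one trigger per fired identity instead of one trigger for the whole backlog. The `DecryptionTrigger` type and both function names are hypothetical and only illustrate the shape of the change.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class DecryptionTrigger:
    """Hypothetical trigger carrying the identities to decrypt together."""
    identities: Tuple[str, ...]


def batched_triggers(fired: List[str]) -> List[DecryptionTrigger]:
    """Current behavior (sketch): all fired identities share one trigger."""
    return [DecryptionTrigger(tuple(fired))] if fired else []


def per_identity_triggers(fired: List[str]) -> List[DecryptionTrigger]:
    """Proposed behavior (sketch): one trigger per fired identity."""
    return [DecryptionTrigger((identity,)) for identity in fired]


fired = ["id-a", "id-b", "id-c"]
current = batched_triggers(fired)        # one trigger with three identities
proposed = per_identity_triggers(fired)  # three triggers with one identity each
```

Because each trigger then carries a single identity, the downstream key-share and key-release steps can keep treating a trigger as an atomic unit without coupling unrelated identities.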

Relation to #617

Related: #617

Once #617 is fully implemented, service-level batch signature issues should be reduced or removed. However, removing event-based batching is a simpler contained fix that does not need to wait for that rollout.

Why not change the Keyper core first?

Keyper core is shared by multiple implementations. Some implementations may rely on atomic batch behavior.

For Gnosis, slot batches are intended to be atomic: either all selected transactions for the slot are decrypted or none are. Gnosis key-share messages also include signatures over the identities being decrypted, and keys messages include those signatures. If a keys message only contains a subset of the originally signed identities, those signatures would be invalid for the subset.
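The subset problem can be illustrated with a toy example. This sketch uses HMAC-SHA256 from the Python standard library purely as a stand-in for the real signature scheme, and the digest construction, key, and identity values are assumptions; the point is only that a signature over the full identity list does not verify against a subset.

```python
import hashlib
import hmac
from typing import List


def identities_digest(identities: List[bytes]) -> bytes:
    """Hypothetical digest over an ordered, length-prefixed identity list."""
    h = hashlib.sha256()
    for identity in identities:
        h.update(len(identity).to_bytes(4, "big"))
        h.update(identity)
    return h.digest()


def sign(key: bytes, identities: List[bytes]) -> bytes:
    """Stand-in for a keyper signature (HMAC, NOT the real scheme)."""
    return hmac.new(key, identities_digest(identities), hashlib.sha256).digest()


def verify(key: bytes, identities: List[bytes], signature: bytes) -> bool:
    return hmac.compare_digest(sign(key, identities), signature)


key = b"demo-keyper-key"                    # made-up key material
slot_batch = [b"tx-1", b"tx-2", b"tx-3"]    # made-up slot identities
signature = sign(key, slot_batch)

full_ok = verify(key, slot_batch, signature)        # signature matches the full batch
subset_ok = verify(key, slot_batch[:2], signature)  # same signature, subset of identities
```

The signature verifies for the full slot batch and fails for the two-identity subset, which is why partial release would conflict with the Gnosis message format.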

So this epic should not change Gnosis behavior or Keyper core semantics as the first step.

Message Volume

Removing batching increases P2P messages:

current: 1 message with N identities
new: N messages with 1 identity each

We are not too worried about this initially because API-side rate limiting already bounds event trigger registration. However, bursts or backlogs of fired triggers can still increase message volume, so we should test this on Chiado and add metrics and logging where useful.

Time-Based Follow-Up

Time-based triggers also batch multiple identities, so the same core behavior can in theory occur there. The impact is different: time-based batching is tied to timestamp windows, whereas event-based batching draws from the undecrypted fired-trigger backlog, where stale identities can keep polluting future batches.
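The difference in impact can be sketched with a small simulation. This is a simplified model under stated assumptions, not the actual batching code: the round structure, the atomic success rule, and all names and data are hypothetical.

```python
from typing import Callable, List, Tuple


def simulate_event_rounds(
    fired_per_round: List[List[str]],
    has_enough_shares: Callable[[str], bool],
) -> List[List[str]]:
    """Event-based sketch: each round batches the whole undecrypted backlog."""
    fired: List[str] = []
    decrypted = set()
    batches = []
    for newly_fired in fired_per_round:
        fired.extend(newly_fired)
        batch = [i for i in fired if i not in decrypted]
        batches.append(batch)
        # Atomic rule: the batch succeeds only if every identity has shares.
        if all(has_enough_shares(i) for i in batch):
            decrypted.update(batch)
    return batches


def time_window_batch(entries: List[Tuple[int, str]], start: int, end: int) -> List[str]:
    """Time-based sketch: a batch covers identities in [start, end) only."""
    return [identity for ts, identity in entries if start <= ts < end]


# "id-stale" never receives enough shares (made-up scenario).
event_batches = simulate_event_rounds(
    [["id-stale", "id-a"], ["id-b"]],
    has_enough_shares=lambda identity: identity != "id-stale",
)

timed = [(10, "t-1"), (12, "t-2"), (25, "t-3")]
window_1 = time_window_batch(timed, 10, 20)
window_2 = time_window_batch(timed, 20, 30)
```

In this model the stale identity reappears in every event-based batch and keeps blocking it, while a problematic time-based identity stays confined to its own window.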

This epic focuses first on event-based triggers, but time-based batching should be reviewed separately.

Sub-Issues to create

  • Remove event-based batching / one identity per event-based message.
  • Add tests for independent event-based identity decryption.
  • Add mise-based event decryption test task.
  • Test message volume on Chiado.
  • Add/verify observability for event-based key-share/key-release flows.
  • Document or implement stale event-backlog cleanup.
  • Investigate missing key shares / P2P delivery.
  • Revisit time-based batching.
  • Review keyper core partial-batch behavior later, likely around the follow-up phases of #617 (Remove signatures from service keyper messages).
