Seed the canonical chain with the first block in the monitored head set#193

Merged
Chengxuan merged 5 commits into main from seed-canonical-chain on Jun 17, 2026
Conversation

@peterbroadhurst

Contributor

When we restart EVMConnect with a significant number of confirmations, it takes a long time for the block listener to confirm any new blocks, or to allow listeners to restart catchup.

This is because we begin building the canonical in-memory chain from the head, which means a full confirmations' worth of blocks (say 20, for example) needs to arrive before we have enough blocks in memory to confirm any block.

In reality, there might be a number of blocks mined since we stopped (maybe 100s or 1000s) that are fully confirmed now.
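To make the delay concrete, here is a minimal sketch of the arithmetic, not code from this PR. `newBlocksNeeded` is a hypothetical helper, and it assumes a block counts as confirmed once `confirmations` blocks sit on top of it in the in-memory chain.

```go
package main

import "fmt"

// newBlocksNeeded returns how many more blocks must arrive before the oldest
// block in an in-memory chain of chainLen blocks has the required number of
// confirmations stacked on top of it.
// Hypothetical helper for illustration only; not part of EVMConnect.
func newBlocksNeeded(chainLen, confirmations int) int {
	if chainLen <= 0 {
		// Nothing in memory yet: the block itself must arrive too.
		return confirmations + 1
	}
	have := chainLen - 1 // blocks already on top of the oldest block
	if have >= confirmations {
		return 0
	}
	return confirmations - have
}

func main() {
	fmt.Println(newBlocksNeeded(1, 20))  // chain started at the head: prints 20
	fmt.Println(newBlocksNeeded(50, 20)) // chain rebuilt behind the head: prints 0
}
```

With a one-block chain at the head, the listener has to wait for 20 new blocks; with a window already built behind the head, it does not have to wait at all.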

This PR proposes that we:

  • Seed the canonical chain on startup with a single block that is the monitored head length back from the highest block
  • Let it build using reconcileCanonicalChain on the first loop iteration
  • Then go into the same mode we would normally go into
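A minimal sketch of how the seed block number could be picked, assuming the window is counted inclusively (a 50-block window ending at block 1000 starts at 951). `seedBlockNumber` is a hypothetical name, and the fallback to the head when the chain is shorter than the window is an assumption for illustration.

```go
package main

import "fmt"

// seedBlockNumber returns the block number at which a monitored window of
// windowLen blocks ending at head would start. ok is false when the chain is
// shorter than the window (or the window is empty), in which case the caller
// is assumed to fall back to starting from the head.
// Hypothetical helper; the real seeding logic lives in the block listener.
func seedBlockNumber(head, windowLen uint64) (seed uint64, ok bool) {
	if windowLen == 0 || head < windowLen-1 {
		return head, false
	}
	return head - (windowLen - 1), true
}

func main() {
	seed, ok := seedBlockNumber(1000, 50)
	fmt.Println(seed, ok) // prints: 951 true

	seed, ok = seedBlockNumber(10, 50)
	fmt.Println(seed, ok) // prints: 10 false
}
```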

Signed-off-by: Peter Broadhurst <peter.broadhurst@kaleido.io>
@peterbroadhurst peterbroadhurst requested a review from a team May 13, 2026 01:28
Comment thread pkg/ethblocklistener/blocklistener.go Outdated
Comment on lines +372 to +387
if seedBi != nil {
    notifyPos = bl.reconcileCanonicalChain(seedBi)
    seedBi = nil
} else {
    rpcErr := bl.backend.CallRPC(bl.ctx, &blockHashes, "eth_getFilterChanges", filter)
    if rpcErr != nil {
        if etherrors.MapError(etherrors.FilterRPCMethods, rpcErr.Error()) == ffcapi.ErrorReasonNotFound {
            log.L(bl.ctx).Warnf("Block filter '%v' no longer valid. Recreating filter: %s", filter, rpcErr.Message)
            filter = ""
            gapPotential = true
        }
        log.L(bl.ctx).Errorf("Failed to query block filter changes: %s", rpcErr.Message)
        failCount++
        continue
    }
    log.L(bl.ctx).Debugf("Block filter received new block hashes: %+v", blockHashes)

Contributor

This highlights the trade-off between the current design and the proposed changes.

Current design builds the in-memory chain from the current head:

  1. The in-memory chain always starts from the head block
  2. No catch-up work is carried out to build the in-memory chain, so there is minimal delay in indexing new blocks

The proposed design:

  1. The in-memory chain sometimes starts from the head block (when the current head is less than the expected length, or the seed block information cannot be fetched), and sometimes starts from a historical block.
  2. Catch-up is required for the in-memory chain. Depending on how many historical blocks need to be fetched and how quickly they can be fetched, there could be a visible delay in fetching any new blocks from the chain

The current design serves a primary purpose of tracking the new blocks in a timely manner and optionally provides an in-memory chain for efficient caching.

The proposed design compromises the primary purpose.

@peterbroadhurst I wonder whether the bug you are trying to fix is in the logic that uses the in-memory chain? If it sees that the head of the in-memory chain is already sufficient to confirm an item, it should then fetch the blocks that are not in the in-memory chain and complete the confirmation itself, rather than wait for the in-memory chain. <--- This should already be the behaviour of ReconcileConfirmationsForTransaction function that the connector provides.

Contributor Author

"The in-memory chain always starts from the head block"

No. There is no change here.

The current design is that the in-memory canonical chain starts the monitored head length back from the head of the chain. This PR just changes how it's constructed.

@peterbroadhurst peterbroadhurst May 13, 2026

Contributor Author

"This should already be the behaviour of ReconcileConfirmationsForTransaction function that the connector provides."

This sounds like the misunderstanding, @Chengxuan.
The code never goes back and removes the first block in the in-memory list, unless that block is no longer on the blockchain (or the list wraps the monitored head length, obviously).

So this new code just ensures that this first block is established on startup at the beginning of the monitored head length, rather than at the end of it.
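As a rough illustration of that point, the sketch below models the in-memory list as a bounded window that only drops its oldest block on overflow. The `window` type and its methods are hypothetical stand-ins, not the listener's actual data structure, and removal after a fork is not modelled.

```go
package main

import (
	"container/list"
	"fmt"
)

// window is a hypothetical stand-in for the in-memory canonical chain.
type window struct {
	blocks *list.List // block numbers, oldest at the front
	maxLen int        // monitored head length
}

// newWindow creates a window holding a single starting block.
func newWindow(first uint64, maxLen int) *window {
	if maxLen < 1 {
		maxLen = 1 // keep at least one block so first() is always valid
	}
	w := &window{blocks: list.New(), maxLen: maxLen}
	w.blocks.PushBack(first)
	return w
}

// push appends a block and drops the oldest only once the window overflows.
func (w *window) push(n uint64) {
	w.blocks.PushBack(n)
	for w.blocks.Len() > w.maxLen {
		w.blocks.Remove(w.blocks.Front())
	}
}

// first returns the oldest block number still held in the window.
func (w *window) first() uint64 {
	return w.blocks.Front().Value.(uint64)
}

func main() {
	// Started at the head: the first block stays at 1000 until the window wraps.
	fromHead := newWindow(1000, 50)
	fromHead.push(1001)
	fmt.Println(fromHead.first(), fromHead.blocks.Len()) // prints: 1000 2

	// Started a window length back: the window is full as soon as it catches up.
	seeded := newWindow(951, 50)
	for n := uint64(952); n <= 1001; n++ {
		seeded.push(n)
	}
	fmt.Println(seeded.first(), seeded.blocks.Len()) // prints: 952 50
}
```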

Signed-off-by: Peter Broadhurst <peter.broadhurst@kaleido.io>
@peterbroadhurst

Contributor Author

Note that I'm seeing intermittent timing-related errors on builds, which has been a historical problem in this package.
I'm seeing if I can provide a separate PR into this branch to improve the readability of the tests, to make it easier to diagnose and fix those issues.

Signed-off-by: Peter Broadhurst <peter.broadhurst@kaleido.io>
Simplify the blocklistener_test.go with wrappers on mocks
Signed-off-by: Peter Broadhurst <peter.broadhurst@kaleido.io>
@Chengxuan

Chengxuan commented May 14, 2026

Contributor

@peterbroadhurst I feel I'm missing a trivial point here. Let me use an example to illustrate my understanding.

Example

At connector start-up time, the latest block on the chain is 1000. I have a transaction that was mined in block 1 and requires 5 confirmations.

Current behaviour:

  • a. Connector starts up,
      1. obtains canonicalChainLock
      2. builds an in-memory chain with 1 block (1000)
      3. releases canonicalChainLock
  • b. Consumer calls ReconcileConfirmationsForTransaction(1)
      1. obtains canonicalChainLock
      2. snapshots the in-memory chain
      3. releases canonicalChainLock
      4. because the first block of the in-memory chain (1000) > 6 (1 + 5), downloads blocks 2-6 (not under the lock, in a separate goroutine outside the listener)

^^ even with no new blocks indexed, confirmation should still happen through catchup

Proposed behaviour:

  • c. Connector starts up,
      1. obtains canonicalChainLock
      2. builds an in-memory chain with 50 blocks (default in-memory chain length) starting from block 951
      3. releases canonicalChainLock
  • d. Consumer calls ReconcileConfirmationsForTransaction(1)
      1. obtains canonicalChainLock
      2. snapshots the in-memory chain
      3. releases canonicalChainLock
      4. because the first block of the in-memory chain (951) > 6 (1 + 5), downloads blocks 2-6 (not under the lock, in a separate goroutine outside the listener)
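The comparison in the last step of both flows can be sketched as follows. `catchupRange` is a hypothetical helper written for this example; it only covers the case where every confirmation block lies below the in-memory chain, and it is not the implementation of ReconcileConfirmationsForTransaction.

```go
package main

import "fmt"

// catchupRange reports the block range a consumer would download itself when
// all confirmation blocks for a transaction mined in txBlock lie below the
// first block of the in-memory chain. ok is false when the in-memory chain
// overlaps the confirmation range (that case is not modelled here).
// Hypothetical helper for illustration only.
func catchupRange(txBlock, confirmations, firstInMemory uint64) (from, to uint64, ok bool) {
	target := txBlock + confirmations
	if confirmations == 0 || firstInMemory <= target {
		return 0, 0, false
	}
	return txBlock + 1, target, true
}

func main() {
	// Current behaviour: in-memory chain starts at the head (1000).
	fmt.Println(catchupRange(1, 5, 1000)) // prints: 2 6 true

	// Proposed behaviour: in-memory chain starts at block 951.
	fmt.Println(catchupRange(1, 5, 951)) // prints: 2 6 true
}
```

Both calls return the same range, which is the point of the example: for a transaction far behind the window, where the window starts does not change which blocks are downloaded.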

The difference to me is:

  1. The in-memory chain downloaded 49 extra blocks on startup. But I don't see how that matters to the confirmation action.
  2. The download of 49 extra blocks (c.2) increases the initialization time of the in-memory chain. Any function trying to obtain the canonicalChainLock will be delayed.
    • The delay becomes more visible if the connected JSON-RPC endpoint has a low rate limit (e.g. 5 requests/second). <--- this is already a concern today if the canonical chain has to rebuild due to a chain fork, but with this PR, the block listener will ALWAYS rebuild the canonical chain on every start-up.
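For a rough sense of scale, the sketch below estimates a lower bound on the rebuild time under a rate limit. It assumes one JSON-RPC request per block and no batching or parallelism, which may not match the real listener; `minFetchTime` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"time"
)

// minFetchTime estimates the minimum time to fetch the given number of blocks
// when the endpoint allows reqPerSec requests per second, assuming exactly one
// request per block. Hypothetical helper for illustration only.
func minFetchTime(blocks, reqPerSec int) time.Duration {
	if blocks <= 0 || reqPerSec <= 0 {
		return 0
	}
	return time.Duration(blocks) * time.Second / time.Duration(reqPerSec)
}

func main() {
	// 49 extra blocks at 5 requests/second.
	fmt.Println(minFetchTime(49, 5)) // prints: 9.8s
}
```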

@peterbroadhurst

peterbroadhurst commented May 15, 2026

Contributor Author

Thanks for discussing this with me @Chengxuan - and I hope we've agreed that your analysis of one potential use of this code fulfilling its expected behavior is accurate, but also not a reason to continue with the code not doing the job it promises to do.

Function of code:

  • To maintain a window at the head of the chain that is monitored

Bug fixed by this PR:

  • After restart, it does not attempt to recover and perform its job

Indirect implication of bug (which we over-indexed on, sorry):

  • A consumer of this module, assuming it is going to do its job as stated and maintain that window, would block for hours waiting for enough of the monitored chain head to build back up again naturally by accumulation of new blocks.

The area where we got stuck in the conversation, and thanks for working through it:

  • It might not make sense for consumers to assume they need the code to be functioning as expected after restart if they aren't anywhere close to the monitored window; they should just go ahead and do their catchup anyway.
    • No disagreement with this, and that optimization is encouraged for consumers, rather than waiting the (potentially few) seconds it takes for this code to rebuild the monitored chain head after restart

@Chengxuan Chengxuan left a comment

Contributor

@peterbroadhurst, thanks for the explanation on the behaviour that's being fixed.

Changes look good to me. One minor comment about the current logic is that it still relies on at least 1 new block to trigger the rebuild of the floating window.

Leaving it for you to consider the severity of that gap

gapPotential = true
var notifyPos *list.Element
if seedBi != nil {
    notifyPos = bl.reconcileCanonicalChain(seedBi)

Contributor

note: The rebuild/catch-up of the in-memory chain relies on the new block not matching the existing tail block in the in-memory chain, so this first iteration will only initialize the in-memory chain. The actual rebuild happens in the next iteration, if at least 1 block hash has been discovered.

@Chengxuan Chengxuan merged commit 21d38d5 into main Jun 17, 2026
4 checks passed
@Chengxuan Chengxuan deleted the seed-canonical-chain branch June 17, 2026 14:53