MF3-L15: fix(nitronode): fix reorg double-spend via confirmation gate#832
MF3-L15: fix(nitronode): fix reorg double-spend via confirmation gate#832nksazonov wants to merge 23 commits into
Conversation
|
Important Review skippedAuto reviews are disabled on base/target branches other than the default branch. Please check the settings in the CodeRabbit UI or the ⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
📝 WalkthroughWalkthroughThis PR implements a comprehensive reorg defense mechanism for EVM event processing. It introduces a confirmation gate that delays event delivery by a configurable per-chain duration while simultaneously implementing reconciliation logic to detect and recover from chain reorganizations. The system persists block hashes, walks historical blocks to find canonical ancestors, and provides duplicate detection to prevent re-processing events during recovery. ChangesReorg Defense with Confirmation Gating
Sequence DiagramsequenceDiagram
participant Listener
participant ConfirmationGate
participant Poller
participant ChannelHubReactor
participant Store
Note over Listener,Store: Event Arrival
Listener->>ConfirmationGate: HandleEvent (live log)
ConfirmationGate->>ConfirmationGate: Fetch block timestamp (cached)
ConfirmationGate->>ConfirmationGate: Queue entry: (txHash, logIndex, blockHash, arrivalTime)
Note over ConfirmationGate: Waits for confirmation delay
Note over Listener,Store: Removed Log (pre-gate reorg)
Listener->>ConfirmationGate: HandleEvent (removed=true)
ConfirmationGate->>ConfirmationGate: Find & remove queued entry by (txHash, logIndex, blockHash)
Note over ConfirmationGate: Entry canceled, not forwarded
Note over Listener,Store: Confirmation Delay Elapsed
Poller->>ConfirmationGate: Poll (background goroutine)
ConfirmationGate->>ConfirmationGate: Check if arrivalTime + delay ≤ now
ConfirmationGate->>ChannelHubReactor: Forward to handler
ChannelHubReactor->>Store: IsContractEventProcessed check
Store-->>ChannelHubReactor: false (not duplicate)
ChannelHubReactor->>Store: StoreContractEvent with BlockHash
ConfirmationGate->>ConfirmationGate: Record in recentlyForwarded TTL map
Note over Listener,Store: Removed Log (post-gate reorg)
Listener->>ConfirmationGate: HandleEvent (removed=true, after forward)
ConfirmationGate->>ConfirmationGate: Check recentlyForwarded map
ConfirmationGate->>ConfirmationGate: Log post-gate reorg warning
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
ihsraham
left a comment
There was a problem hiding this comment.
Thanks for putting the confirmation-gate path together. The uninterrupted live path is moving in the right direction: Removed logs now reach the gate, pending logs can be cancelled before processing, and the tests cover the basic pre-gate cancellation cases.
I would not close L-15 on this version yet. A few paths still either bypass the confirmation window or weaken the old listener retry behavior, so a reorg/restart or transient reactor failure can still leave the node in the wrong state. I left the closure blockers inline.
| ChannelHubAddress string `yaml:"channel_hub_address"` | ||
| // ChannelHubSigValidators maps validator IDs to the addresses of signature validators for the ChannelHub contract on this blockchain | ||
| ChannelHubSigValidators map[uint8]string `yaml:"channel_hub_sig_validators"` | ||
| // ConfirmationDelaySecs is the number of seconds to wait before processing an event. |
There was a problem hiding this comment.
This default leaves the mitigation off unless every deployed config is updated. ConfirmationDelaySecs stays at 0 when the YAML key is omitted, and 0 makes the gate transparent. The checked-in prod/sandbox/stress blockchains.yaml files do not set confirmation_delay_secs, so the deployed node would still process live deposits immediately.
For L-15 closure, please make the confirmation delay non-zero in shipped configs or enforce a safe default/validation for enabled EVM chains.
There was a problem hiding this comment.
Yes, I will set those for sure before the merge
There was a problem hiding this comment.
Rechecked current HEAD (3512884). This still looks open in the diff: the checked-in prod-v1, sandbox-v1, and stress-v1 blockchains.yaml files still omit confirmation_delay_secs, and omission still decodes as 0, so main.go takes the no-gate path.
A pre-merge config change is fine operationally, but for the audit closure I would want this represented in the PR itself or enforced through validation/defaults. Also, nitronode/docs/reorg-fix.md currently documents confirmation_delay_sec singular, while the YAML tag/API field are confirmation_delay_secs plural. Following the doc as written would also leave the field unset.
There was a problem hiding this comment.
Still open on current head (1bc0d565). The field-name/docs mismatch is fixed now, but the shipped chart configs still do not set confirmation_delay_secs.
Since the loader leaves the omitted field as 0 and main.go skips NewConfirmationGate when the delay is zero, the prod/sandbox/stress deploy path still processes live deposits immediately. The deploy workflow also passes those exact files via --set-file=config.blockchains=.../blockchains.yaml, so I would keep this open until the PR either sets non-zero values in those configs or enforces a safe default/validation for ChannelHub-enabled chains.
| // when the handler or the event-presence check fails. | ||
| func (l *Listener) listenEvents(ctx context.Context) error { | ||
| lastBlock, err := l.eventGetter.GetLatestContractEventBlockNumber(l.contractAddress.String(), l.blockchainID) | ||
| lastBlock, err := findCommonAncestor(ctx, l.client, l.eventGetter, l.contractAddress.String(), l.blockchainID, l.logger) |
There was a problem hiding this comment.
This new return value is later treated the same as an empty store, but findCommonAncestor also uses 0 when every stored block was reorged out. In that full-depth reorg case the listener skips historical replay entirely, leaving stale event-derived DB state in place instead of rebuilding from a canonical range.
Please distinguish the empty-store case from the all-stored-blocks-reorged case, or replay from the configured start/genesis boundary when reconciliation returns this because no stored block is still canonical.
There was a problem hiding this comment.
Acknowledged in 2ead611: added a code comment at the prevHash == "" branch and a §4.4 note explaining the empty-store / full-depth-reorg conflation is intentional; full-depth reorg to genesis is treated as a chain-level incident outside the gate's scope
There was a problem hiding this comment.
Re-reading this after the latest commits, I do not think the code comment quite matches the operational case. This branch is reached whenever every stored event block is no longer canonical. That is not necessarily a chain-level reorg to genesis: a fresh node, or a chain with only one processed ChannelHub event, can hit it on a shallow reorg of that one event block.
In that case findCommonAncestor returns 0, and listenEvents treats lastBlock == 0 as “skip historical logs fetching”, so canonical replacement logs that already exist between the start boundary and the subscription tip are missed. For L-15 closure I would still prefer distinguishing empty store from “no stored event block is canonical” and replaying from a configured start/genesis boundary for the latter.
There was a problem hiding this comment.
Fixed in 4709879: findCommonAncestor now splits the all-walk-exhausted branch — empty store (latestNum == 0) still skips replay; non-empty store with all blocks reorged (prevNum == 0 && prevHash == "" && latestNum > 0) returns the original latestNum as the resume anchor, so the listener re-fetches canonical replacements via eth_getLogs from the orphaned height (the orphan hash is discarded). Empty-store and all-reorged are no longer conflated.
There was a problem hiding this comment.
The split helps, but I think there is still one replay boundary left here. In the all-stored-blocks-reorged branch we now return latestNum; if the canonical replacement block is already the current tip at that same height, listenEvents calls reconcileBlockRange(currentBlock, lastBlock) with equal values and reconcileBlockRange never enters for currentBlock > startBlock.
That means the replacement log at the resume height is not fetched. Could we make the replay range include the resume height for this case, or return a start height that guarantees the orphaned height is queried?
|
@coderabbitai review |
✅ Action performedReview finished.
|
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (5)
pkg/blockchain/evm/mock_test.go (1)
123-129: ⚡ Quick winAdd doc comments to newly added exported mock methods.
These exported methods should include Go doc comments to comply with repository Go guidelines.
Suggested patch
+// HeaderByHash mocks EVMClient.HeaderByHash. func (m *MockEVMClient) HeaderByHash(ctx context.Context, hash common.Hash) (*types.Header, error) { args := m.Called(ctx, hash) if args.Get(0) == nil { return nil, args.Error(1) } return args.Get(0).(*types.Header), args.Error(1) } +// GetLatestContractEventBlockHashAndNumber mocks ContractEventGetter.GetLatestContractEventBlockHashAndNumber. func (m *MockContractEventGetter) GetLatestContractEventBlockHashAndNumber(contractAddress string, blockchainID uint64) (uint64, string, error) { args := m.Called(contractAddress, blockchainID) return args.Get(0).(uint64), args.String(1), args.Error(2) } +// GetPreviousDistinctBlockHash mocks ContractEventGetter.GetPreviousDistinctBlockHash. func (m *MockContractEventGetter) GetPreviousDistinctBlockHash(contractAddress string, blockchainID uint64, belowBlockNumber uint64) (uint64, string, error) { args := m.Called(contractAddress, blockchainID, belowBlockNumber) return args.Get(0).(uint64), args.String(1), args.Error(2) }As per coding guidelines, “Run
gofmtand write doc comments on all exported Go functions.”Also applies to: 146-154
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/blockchain/evm/mock_test.go` around lines 123 - 129, Add Go doc comments for the newly exported mock methods so they comply with repository guidelines: document MockEVMClient.HeaderByHash (and the other exported mock methods referenced around lines 146-154) with a brief sentence starting with the method name, describing its purpose and behavior in tests (e.g., that it returns a mocked *types.Header or an error via the testify mock). Ensure each comment is placed immediately above the corresponding method and follows Go doc comment style ("HeaderByHash ...").Source: Coding guidelines
nitronode/reorg-fix-spec.md (1)
319-323: 💤 Low valueAdd language identifier to fenced code block.
The fenced code block lacks a language identifier. While this is pseudocode rather than a specific programming language, adding an identifier (e.g.,
textorpseudocode) improves rendering and satisfies the markdownlint rule.✨ Suggested fix
-``` +```text tx_log_index = l.Index - min(l.Index for all logs of l.TxHash in this block)</details> <details> <summary>🤖 Prompt for AI Agents</summary>Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.In
@nitronode/reorg-fix-spec.mdaround lines 319 - 323, The fenced code block
containing the pseudocode "tx_log_index = l.Index - min(l.Index for all logs of
l.TxHash in this block)" needs a language identifier to satisfy markdownlint and
improve rendering; update the block so the opening fence includes a language tag
such as "text" or "pseudocode" (e.g., ```text) while leaving the content and
symbols like tx_log_index, l.Index and l.TxHash unchanged.</details> <!-- cr-comment:v1:076974948adca715997203cf --> _Source: Linters/SAST tools_ </blockquote></details> <details> <summary>pkg/blockchain/evm/interface.go (1)</summary><blockquote> `47-48`: _⚡ Quick win_ **Clarify the canonicality verification limitation.** The comment states that `HeaderByHash` is used "to verify block canonicality," but `HeaderByHash` alone does not prove a block is in the canonical chain—it only proves the block exists in the node's database. Depending on the backend, a reorged block may still be retrievable by hash if the node retains side-chain headers temporarily. The reconciler implementation must compare the retrieved header's hash with the canonical header at that block number (via `HeaderByNumber`) to confirm canonicality. Consider updating the comment to reflect that `HeaderByHash` is used as part of canonicality verification, not as the sole check. <details> <summary>🤖 Prompt for AI Agents</summary> ``` Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/blockchain/evm/interface.go` around lines 47 - 48, Update the comment on HeaderByHash to clarify it is only one part of canonicality checks: state that HeaderByHash retrieves a header by hash (verifies the block exists in the node DB) but does not alone prove inclusion in the canonical chain, and that callers (e.g., the reconciler) must compare the retrieved header against the canonical header at that block number using HeaderByNumber to confirm canonicality; reference HeaderByHash and HeaderByNumber by name in the comment. ``` </details> <!-- cr-comment:v1:3b71ed1b23f7e01b1ace884d --> </blockquote></details> <details> <summary>pkg/blockchain/evm/reconciler_test.go (1)</summary><blockquote> `17-24`: _🏗️ Heavy lift_ **Consider adding a test case for side-chain header detection.** Once the reconciler is updated to compare `HeaderByHash` results with `HeaderByNumber` (see the critical issue flagged in `reconciler.go`), add a test case where: - `HeaderByHash(storedHash)` returns a non-nil header (side-chain block still in database) - `HeaderByNumber(blockNum)` returns a different canonical hash - Expected behavior: the reconciler walks backward instead of accepting the side-chain block This will ensure the canonicality verification logic is fully covered. <details> <summary>🤖 Prompt for AI Agents</summary> ``` Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/blockchain/evm/reconciler_test.go` around lines 17 - 24, Add a unit test in reconciler_test.go that verifies side-chain header detection by stubbing the blockchain client so HeaderByHash(storedHash) returns a valid header but HeaderByNumber(blockNum) returns a different canonical hash; call the reconciler's reconcile method (Reconciler.Reconcile or the method under test in reconciler.go) with the stored header and assert that the reconciler does not accept the side-chain block and instead walks backward/reattempts canonical verification (i.e., triggers the same backward-walking code path used when HeaderByNumber disagrees with HeaderByHash). Use the existing test constants (testContract, testBlockchainID) and new mocks for HeaderByHash and HeaderByNumber to ensure the canonicality branch is exercised. ``` </details> <!-- cr-comment:v1:a98befbcb6b90fb3c13f98f5 --> </blockquote></details> <details> <summary>pkg/blockchain/evm/listener_test.go (1)</summary><blockquote> `350-353`: _⚡ Quick win_ **Use deterministic signaling instead of fixed sleeps in these new tests.** These `time.Sleep(...)` cancels are timing-sensitive and can become flaky in slower CI workers. Prefer channel/WaitGroup-driven cancellation once expected events are observed. Also applies to: 401-405 <details> <summary>🤖 Prompt for AI Agents</summary> ``` Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/blockchain/evm/listener_test.go` around lines 350 - 353, Replace the timing-sensitive goroutine that calls time.Sleep(...) then cancel() with deterministic signaling: have the test wait for the expected event(s) using a channel or sync.WaitGroup (e.g., an observedEventCh or wg) and call cancel() only after that signal is received; update both occurrences around the goroutine that currently sleeps and calls cancel() (and the similar block at the 401-405 region) so tests drive cancellation explicitly when the observed condition completes rather than relying on fixed sleeps. ``` </details> <!-- cr-comment:v1:0cc26647a8308c263f4d29e7 --> </blockquote></details> </blockquote></details> <details> <summary>🤖 Prompt for all review comments with AI agents</summary>Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.Inline comments:
In@nitronode/reorg-fix-spec.md:
- Line 242: The statement claiming "Zero RPC calls in the gate" is incorrect
because the gate uses blockTimestampFetcher which issues eth_getBlockByHash RPC
calls (one per unique block, with caching); update the text to remove the
absolute "Zero RPC calls" claim and instead state that the gate does not perform
chain queries per request but does fetch block timestamps via
blockTimestampFetcher using eth_getBlockByHash with caching to minimize RPCs.
Ensure you reference the gate and blockTimestampFetcher behavior and mention the
caching semantics.In
@nitronode/store/memory/blockchain_config.go:
- Around line 46-49: verifyBlockchainsConfig currently accepts any uint32 for
ConfirmationDelaySecs which can pause processing; add bounds checking in
verifyBlockchainsConfig to enforce a valid range (0..780 seconds) by returning
an error if blockchainConfig.ConfirmationDelaySecs is greater than the maximum
meaningful value (use a named constant like MaxConfirmationDelaySecs = 780).
Ensure you update any error messages to include the field name
(ConfirmationDelaySecs) and the invalid value so callers can correct configs.
Nitpick comments:
In@nitronode/reorg-fix-spec.md:
- Around line 319-323: The fenced code block containing the pseudocode
"tx_log_index = l.Index - min(l.Index for all logs of l.TxHash in this block)"
needs a language identifier to satisfy markdownlint and improve rendering;
update the block so the opening fence includes a language tag such as "text" or
"pseudocode" (e.g., ```text) while leaving the content and symbols like
tx_log_index, l.Index and l.TxHash unchanged.In
@pkg/blockchain/evm/interface.go:
- Around line 47-48: Update the comment on HeaderByHash to clarify it is only
one part of canonicality checks: state that HeaderByHash retrieves a header by
hash (verifies the block exists in the node DB) but does not alone prove
inclusion in the canonical chain, and that callers (e.g., the reconciler) must
compare the retrieved header against the canonical header at that block number
using HeaderByNumber to confirm canonicality; reference HeaderByHash and
HeaderByNumber by name in the comment.In
@pkg/blockchain/evm/listener_test.go:
- Around line 350-353: Replace the timing-sensitive goroutine that calls
time.Sleep(...) then cancel() with deterministic signaling: have the test wait
for the expected event(s) using a channel or sync.WaitGroup (e.g., an
observedEventCh or wg) and call cancel() only after that signal is received;
update both occurrences around the goroutine that currently sleeps and calls
cancel() (and the similar block at the 401-405 region) so tests drive
cancellation explicitly when the observed condition completes rather than
relying on fixed sleeps.In
@pkg/blockchain/evm/mock_test.go:
- Around line 123-129: Add Go doc comments for the newly exported mock methods
so they comply with repository guidelines: document MockEVMClient.HeaderByHash
(and the other exported mock methods referenced around lines 146-154) with a
brief sentence starting with the method name, describing its purpose and
behavior in tests (e.g., that it returns a mocked *types.Header or an error via
the testify mock). Ensure each comment is placed immediately above the
corresponding method and follows Go doc comment style ("HeaderByHash ...").In
@pkg/blockchain/evm/reconciler_test.go:
- Around line 17-24: Add a unit test in reconciler_test.go that verifies
side-chain header detection by stubbing the blockchain client so
HeaderByHash(storedHash) returns a valid header but HeaderByNumber(blockNum)
returns a different canonical hash; call the reconciler's reconcile method
(Reconciler.Reconcile or the method under test in reconciler.go) with the stored
header and assert that the reconciler does not accept the side-chain block and
instead walks backward/reattempts canonical verification (i.e., triggers the
same backward-walking code path used when HeaderByNumber disagrees with
HeaderByHash). Use the existing test constants (testContract, testBlockchainID)
and new mocks for HeaderByHash and HeaderByNumber to ensure the canonicality
branch is exercised.</details> <details> <summary>🪄 Autofix (Beta)</summary> Fix all unresolved CodeRabbit comments on this PR: - [ ] <!-- {"checkboxId": "4b0d0e0a-96d7-4f10-b296-3a18ea78f0b9"} --> Push a commit to this branch (recommended) - [ ] <!-- {"checkboxId": "ff5b1114-7d8c-49e6-8ac1-43f82af23a33"} --> Create a new PR with the fixes </details> --- <details> <summary>ℹ️ Review info</summary> <details> <summary>⚙️ Run configuration</summary> **Configuration used**: defaults **Review profile**: CHILL **Plan**: Pro **Run ID**: `cb3e38c7-cee9-4e41-abb7-c1eb13c2e53f` </details> <details> <summary>📥 Commits</summary> Reviewing files that changed from the base of the PR and between 8eea5664c4380dff4bd14c67f25eefad737cae67 and 256463668d96c054e7a8301d7c1a9b4dfd6440b0. </details> <details> <summary>⛔ Files ignored due to path filters (1)</summary> * `sdk/ts/test/unit/__snapshots__/public-api-drift.test.ts.snap` is excluded by `!**/*.snap` </details> <details> <summary>📒 Files selected for processing (26)</summary> * `nitronode/api/node_v1/utils.go` * `nitronode/config/migrations/postgres/20260608000000_add_block_hash_to_contract_events.sql` * `nitronode/main.go` * `nitronode/reorg-fix-spec.md` * `nitronode/store/database/contract_event.go` * `nitronode/store/database/interface.go` * `nitronode/store/memory/blockchain_config.go` * `nitronode/store/memory/memory_store.go` * `pkg/blockchain/evm/channel_hub_reactor.go` * `pkg/blockchain/evm/channel_hub_reactor_test.go` * `pkg/blockchain/evm/confirmation_gate.go` * `pkg/blockchain/evm/confirmation_gate_test.go` * `pkg/blockchain/evm/interface.go` * `pkg/blockchain/evm/listener.go` * `pkg/blockchain/evm/listener_test.go` * `pkg/blockchain/evm/mock_test.go` * `pkg/blockchain/evm/reconciler.go` * `pkg/blockchain/evm/reconciler_test.go` * `pkg/core/event.go` * `pkg/core/types.go` * `pkg/rpc/types.go` * `sdk/go/utils.go` * `sdk/go/utils_test.go` * `sdk/ts/src/core/types.ts` * `sdk/ts/src/rpc/types.ts` * `sdk/ts/src/utils.ts` </details> </details> <!-- This is an auto-generated comment by CodeRabbit for review status -->
Reworks the gate test suite for the new ConfirmationGate API (no fetcher arg, error-returning constructor, BlockTimestamp-driven arrivedAt) and adds coverage for the new behaviors: tombstone-skip, FIFO-eviction with early-delete tolerance, single-timer reschedule, kick-during-pending-timer, shutdown-with-non-empty-queue. Adds listener_test cases for the new ensureBlockTimestamp helper (populated path, fetch path, single-entry cache hit, fetch error) and updates existing listener tests that previously fed events with BlockTimestamp == 0. Updates reorg-fix-spec.md sections 4.1, 4.3, 6.1, 6.2, 6.3, 6.5, 6.6, 6.7 to describe the timer+kick design, tombstone-map queue, forwardedSet/forwardedQueue + FIFO eviction with value-check, and the listener-owned ensureBlockTimestamp fallback (replacing the gate's removed block-timestamp fetcher/cache).
…tractEventProcessed
ihsraham
left a comment
There was a problem hiding this comment.
Re-reviewed current head. The age-based historical routing, canonical HeaderByNumber check, and fatal error propagation look addressed. I am not approving yet because the checked-in config still leaves the gate disabled, and I left one more no-gate Removed:true path note inline.
ihsraham
left a comment
There was a problem hiding this comment.
Requesting changes for L-15 closure. The main implementation is much closer now: no-gate Removed:true logs no longer reach the reactor, recent historical events route through the gate, canonicality uses HeaderByNumber, and handler failures after dequeue now restart instead of being swallowed. I resolved/reacted on the threads that look addressed.
I would still keep L-15 open on three points: shipped configs still decode to confirmation_delay_secs: 0, pending gate entries can survive a reconnect if the removal signal is missed, and the all-stored-blocks-reorged replay path still misses the same-height replacement boundary. I replied on the existing config/full-depth threads and left one new inline comment for the reconnect case.
| l.logger.Debug("skipping already present current event", "blockchainID", l.blockchainID, "blockNumber", eventLog.BlockNumber, "logIndex", eventLog.Index) | ||
| continue | ||
| if !eventLog.Removed { | ||
| *lastBlock = eventLog.BlockNumber |
There was a problem hiding this comment.
This still leaves a reconnect gap for pending gate entries. The listener advances lastBlock before the gate has committed the event, and HandleEvent only queues it. If the subscription drops after an orphaned live log is queued but before the matching Removed:true arrives, the next loop replays only canonical logs and never sends a cancellation for that pending entry. The old pending entry can then mature in the gate and reach the reactor even though its block is no longer canonical.
Could we keep the resume cursor tied to committed events, clear/reconcile pending gate entries on reconnect, or have replay explicitly cancel queued entries that no longer appear canonically before they can mature?
Summary
Implements the reorg double-spend fix specified in
nitronode/reorg-fix-spec.md(the canonical implementation spec for this PR — start there for design rationale, the residual-finality trade-off, and the per-component contracts).The Nitronode previously credited a user's off-chain balance the instant it observed a deposit event on-chain. A subsequent reorg would remove the on-chain deposit while the off-chain credit persisted, allowing an attacker to drain that credit to a receiver before the node could detect the reorg. This PR introduces a per-chain confirmation window, a startup reconciliation walk over stored block hashes, and a reactor-level idempotency pre-check that together close the window and make replay safe.
Spec at a glance
confirmation_delay_secsper chain inblockchains.yamlRemoved:true, replace-on-re-addblock_hashcolumn + common-ancestor walk viaeth_getBlockByHashConfirmationGatedesignconfirmation_delay_secsonnode.v1.GetConfigrecentlyForwardedmap, WARN signalIsContractEventProcessedskips already-committed eventsarrivedAtfrom block time so historical replay matures immediatelyChanges
On-chain event pipeline (
pkg/blockchain/evm/)ConfirmationGate(new): in-memory buffer between the Listener and reactor. Holds events forconfirmation_delay_secskeyed by block timestamp; cancels onRemoved:truearrivals; logs WARN on post-gate reorgs.Listener: now accepts two handlers —eventHandler(live, through gate) andhistoricalEventHandler(Phase 1 replay, direct to reactor). Historical events bypass the gate per spec §4.4 step 5, avoiding spuriousHeaderByHashcalls during startup.Removed:truelogs are forwarded to the live handler (the gate) instead of being silently dropped, so the gate can cancel pending entries.ChannelHubReactor: pre-check via newIsContractEventProcessed(txHash, logIndex, blockchainID)(§6.6) — converts what was a constraint-violation rollback on the reconciliation replay path into a clean INFO exit.findCommonAncestor(new inreconciler.go): startup walk over storedblock_hashvalues, callingeth_getBlockByHashuntil it finds a still-canonical block; treats pre-migration empty hashes as canonical fallback.Storage (
nitronode/store/database/,pkg/core/)20260608000000_add_block_hash_to_contract_events.sql— addsCHAR(66) NOT NULL DEFAULT ''column.core.BlockchainEventgainsBlockHash; reactors populate it on every store.IsContractEventProcessed,GetLatestContractEventBlockHashAndNumber,GetPreviousDistinctBlockHash.Config & API surface
core.Blockchain.ConfirmationDelaySecs(uint32) flows fromblockchains.yaml→ memory store →node.v1.GetConfig→BlockchainInfoV1.sdk/go) and TS SDK (sdk/ts) updated to expose the field; SDK tests updated.Wiring (
nitronode/main.go)Listener → ConfirmationGate → reactorfor live events;Listener → reactordirectly for Phase 1.Tests
confirmation_gate_test.go— 13 tests (T1–T13) covering delay=0 transparent path, normal queue/forward, pre-gate cancel, out-of-order re-delivery, post-gate reorg WARN, unknown removal, block-timestamp bypass, partial delay, fetcher error fallback, cache dedup, shutdown, eviction, ordering.reconciler_test.go— 7 tests covering canonical, single reorg, walk-to-genesis, pre-migration latest row, pre-migration mid-walk, RPC error, no-stored-events.channel_hub_reactor_test.go—TestChannelHubReactor_HandleEvent_AlreadyProcessedandTestChannelHubReactor_HandleEvent_PreCheckErrorcover the §6.6 pre-check.listener_test.go—TestListener_RemovedLog_ForwardedToHandlerverifies Removed-log routing;TestListener_PhaseHandlerRoutingverifies Phase 1 / Phase 2 handler split.Notes for reviewers
44efbe45— anywhere they currently appear to diverge, the spec is the intended source of truth and the code should match it (or both should be updated together).log_indexwhich is not stable across reorgs that change the tx's position within a block. Business-logic idempotency in the existing handlers covers this case; tx-relativelog_indexis filed as a follow-up task in the spec.Summary by CodeRabbit
Release Notes
New Features
Documentation
Tests