MF3-L19: fix: prevent SC reentrancy + event-handler monotonicity #837

Open
nksazonov wants to merge 9 commits into fix/audit-findings-finalx3 from fix/mf3-l19
Conversation

@nksazonov nksazonov commented Jun 15, 2026

Contributor

Summary

Addresses audit finding MF3-L19 (reentrancy in ChannelHub lifecycle entrypoints) with paired contract-side and Nitronode-side fixes, plus supporting documentation.

  • fix(contracts): added nonReentrant to every external/public lifecycle entrypoint in ChannelHub.sol (root-cause remediation). Migrated guards upward from the internal fund helpers because OZ ReentrancyGuard uses a single shared status slot, so a guarded entrypoint cannot call a guarded helper. Added regression tests for the four reentrancy scenarios.
  • fix(nitronode): added version-monotonicity guards to six previously guard-less event handlers (HandleHomeChannelCheckpointed, HandleHomeChannelClosed, and the four escrow Initiated/Finalized handlers) via a shared helper. Stale events are dropped with a structured warn log instead of regressing local state or silently clearing challenges.
  • feat(nitronode): ChainStateRefresher interface + EVM implementation. On a home-channel guard drop, it fetches authoritative on-chain state via getChannelData and overwrites the local row. Defense-in-depth against any class of out-of-order delivery (indexer mis-order, reorg replay, future contract changes).
  • docs: new sections in contracts/SECURITY.md, docs/protocol/security-and-limitations.md, and nitronode/README.md; a new deployment-constraint matrix at contracts/deployments/HOOK-TOKEN-COMPATIBILITY.md recording per-chain hook-token support; WARNING blocks in the three nitronode/chart/config/*/assets.yaml files.
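
The version-monotonicity guard described above can be sketched roughly as follows. This is a minimal illustration, not the PR's guardEventVersionMonotonic helper: the channelRow type and the shouldApplyEvent name are hypothetical, and accepting an equal version is an assumption based on the "equal/higher-version acceptance" tests mentioned later in this thread.

```go
package main

import "fmt"

// channelRow is a hypothetical stand-in for the node's local channel record.
type channelRow struct {
	ChannelID    string
	StateVersion uint64
}

// shouldApplyEvent reports whether an event carrying eventVersion may be
// applied to the local row. An event older than the stored version is stale
// and is dropped; an equal or newer version is accepted (assumed behavior).
func shouldApplyEvent(row channelRow, eventVersion uint64) bool {
	return eventVersion >= row.StateVersion
}

func main() {
	row := channelRow{ChannelID: "0xabc", StateVersion: 7}
	for _, v := range []uint64{6, 7, 8} {
		if shouldApplyEvent(row, v) {
			fmt.Printf("version %d: apply\n", v)
		} else {
			fmt.Printf("version %d: drop as stale (local is %d)\n", v, row.StateVersion)
		}
	}
}
```

Keeping the comparison in one shared function means every handler drops stale events by the same rule, which appears to be the intent of the shared helper.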

Follow-up tracked at #836 (scheduler dedup for blockchain_actions).

Test plan

  • go build ./... — clean
  • go vet ./nitronode/... ./pkg/core/... ./pkg/blockchain/evm/... — clean
  • go test ./nitronode/event_handlers/... ./pkg/blockchain/evm/... ./pkg/core/... -count=1 — green
  • forge test — 292 passed, 0 failed (288 pre-existing + 4 new reentrancy regression tests)
  • forge fmt --check — clean

Summary by CodeRabbit

Release Notes

  • Security Enhancements

    • Added reentrancy protection to all channel lifecycle operations, preventing cross-function reentrancy attacks via token hooks
    • Node now detects out-of-order blockchain events and automatically refreshes state from the chain to ensure convergence
  • Known Limitations

    • Hook-enabled tokens (ERC777, ERC1363, and certain non-standard ERC20s) are not supported on currently deployed ChannelHub instances; node operators must not onboard such tokens

coderabbitai Bot commented Jun 15, 2026

Contributor

Review Change Stack

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 4b965674-c1f1-40d2-b239-f62e6348c7e0

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.


Walkthrough

This PR adds two defense-in-depth layers: (1) in ChannelHub.sol, nonReentrant is moved from internal helpers (_pullFunds, _pushFunds, _nonRevertingPushFunds) to all external/public lifecycle entry points; (2) in the node, EventHandlerService gains a ChainStateRefresher dependency that fetches authoritative on-chain channel snapshots when a stale StateVersion event is detected, overwriting local state within the event-processing transaction instead of silently diverging.
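
Why the guard has to sit on the entry points rather than on the fund helpers can be modelled with a toy simulation. The sketch below is Go, not Solidity, and every name in it (guard, hub, deposit, pullFunds, guardHelper, hook) is hypothetical; it only assumes a guard with one shared status flag, as the PR description states for OZ ReentrancyGuard.

```go
package main

import (
	"errors"
	"fmt"
)

var errReentrant = errors.New("reentrant call")

// guard models a reentrancy guard with a single shared status slot.
type guard struct{ entered bool }

// nonReentrant runs fn only if no guarded call is already in progress.
func (g *guard) nonReentrant(fn func() error) error {
	if g.entered {
		return errReentrant
	}
	g.entered = true
	defer func() { g.entered = false }()
	return fn()
}

// hub is a toy contract: deposit is the external entry point and pullFunds
// is the internal helper that triggers the token callback.
type hub struct {
	g           guard
	guardHelper bool         // true models a guard on the helper as well
	hook        func() error // token callback invoked during the transfer
}

func (h *hub) pullFunds() error {
	transfer := func() error {
		if h.hook != nil {
			return h.hook()
		}
		return nil
	}
	if h.guardHelper {
		return h.g.nonReentrant(transfer)
	}
	return transfer()
}

// deposit is guarded at the entry point.
func (h *hub) deposit() error {
	return h.g.nonReentrant(h.pullFunds)
}

func main() {
	// Guarding both layers makes even a legitimate call fail.
	both := &hub{guardHelper: true}
	fmt.Println("entry point and helper both guarded:", both.deposit())

	// Guarding only the entry point blocks a hook that re-enters it.
	attacked := &hub{}
	attacked.hook = func() error { return attacked.deposit() }
	fmt.Println("hook re-enters guarded entry point:", attacked.deposit())
}
```

In this model a guard on both layers rejects ordinary calls, while a guard on the entry point alone still rejects the re-entering hook, which matches the direction of the change described in the walkthrough.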

Changes

ChannelHub Reentrancy Protection and Node State Convergence

| Layer / File(s) | Summary |
| --- | --- |
| Core types and ChainStateRefresher interface (pkg/core/types.go, pkg/core/interface.go) | Defines RefreshedChannel struct (Status, StateVersion, ChallengeExpiresAt, LastStateUserSig) and ChainStateRefresher interface with RefreshChannelFromChain. |
| ChannelHub.sol: nonReentrant shifted to external entry points (contracts/src/ChannelHub.sol) | Adds nonReentrant to all 17 external/public lifecycle functions (node deposits/withdrawals, channel create/deposit/withdraw/checkpoint/challenge/close, escrow initiate/challenge/finalize, migration initiate/finalize, purgeEscrowDeposits) and removes it from _pullFunds, _pushFunds, _nonRevertingPushFunds. |
| EVMChainStateRefresher: on-chain snapshot reader (pkg/blockchain/evm/chain_state_refresher.go) | Implements EVMChainStateRefresher with per-chain ChannelHubCaller registry; RefreshChannelFromChain resolves home chain, fetches GetChannelData, maps on-chain status enums to core.ChannelStatus, and returns a RefreshedChannel. |
| EventHandlerService: monotonicity guards and refresh (nitronode/event_handlers/service.go) | Adds refresher field to EventHandlerService; introduces guardEventVersionMonotonic and refreshAfterDroppedEvent helpers; wires guards into HandleHomeChannelCheckpointed, HandleHomeChannelChallenged, HandleHomeChannelClosed, and four escrow handlers. |
| main.go: per-chain caller wiring (nitronode/main.go) | Builds channelHubCallers map and resolveChannelChain closure, constructs EVMChainStateRefresher, and passes it to NewEventHandlerService; registers a ChannelHubCaller per configured blockchain at startup. |
| Solidity reentrancy regression tests (contracts/test/mocks/ReentrantERC20.sol, contracts/test/ChannelHub_reentrancy.t.sol) | Adds ReentrantERC20 mock with arm/fire single-use reentrancy via transferFrom; four Forge scenarios (createChannel→depositToChannel, depositToChannel→checkpointChannel, challengeChannel→closeChannel, initiateEscrowDeposit→purgeEscrowDeposits) each assert ReentrancyGuardReentrantCall selector on inner call. |
| Go event handler tests (nitronode/event_handlers/service_test.go) | Updates stale-challenged test to use MockChainStateRefresher; adds newTestEventHandlerServiceWithRefresher helper; introduces §E.1–§E.14 suite covering regression-drop, equal/higher-version acceptance, guard-drop chain refresh, refresher error bubbling, rescue idempotency, and equal-version replay no-side-effects. |
| Security docs and operator warnings (contracts/SECURITY.md, contracts/deployments/HOOK-TOKEN-COMPATIBILITY.md, docs/protocol/security-and-limitations.md, nitronode/README.md, nitronode/chart/config/*/assets.yaml) | Adds invariants 8 and 26 to SECURITY.md; introduces deployment hook-token compatibility matrix; adds off-chain convergence section and known-limitations entry to protocol docs; documents monotonicity behavior in README; inserts warning comments in three YAML configs. |

Sequence Diagram(s)

sequenceDiagram
  participant TokenHook as Malicious Token Hook
  participant ChannelHub
  participant EventHandlerService
  participant EVMChainStateRefresher
  participant ChannelStore

  Note over ChannelHub: External entry point protected by nonReentrant
  TokenHook->>ChannelHub: transferFrom callback → reenter lifecycle function
  ChannelHub-->>TokenHook: ReentrancyGuardReentrantCall revert

  Note over EventHandlerService: On-chain event received out-of-order
  ChannelHub--)EventHandlerService: lifecycle event (stale StateVersion)
  EventHandlerService->>EventHandlerService: guardEventVersionMonotonic → dropped
  EventHandlerService->>EVMChainStateRefresher: RefreshChannelFromChain(ctx, channelID)
  EVMChainStateRefresher->>ChannelHub: GetChannelData(channelID)
  ChannelHub-->>EVMChainStateRefresher: on-chain Status, StateVersion, ChallengeExpiresAt, sig
  EVMChainStateRefresher-->>EventHandlerService: RefreshedChannel
  EventHandlerService->>ChannelStore: UpdateChannel(refreshed fields) within tx

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

  • layer-3/nitrolite#720: Both PRs modify HandleHomeChannelChallenged in nitronode/event_handlers/service.go to handle stale StateVersion events, with this PR replacing the warn-and-skip introduced by that one with chain-refresh convergence.
  • layer-3/nitrolite#651: Both PRs touch reentrancy/transfer logic in ChannelHub.sol around _pushFunds and _nonRevertingPushFunds, with this PR removing nonReentrant from those internal helpers while the earlier PR changed transfer-success detection in the same functions.
  • layer-3/nitrolite#666: Both PRs modify the same external lifecycle entry points in ChannelHub.sol (createChannel, withdrawFromChannel, checkpointChannel, closeChannel), with this PR adding nonReentrant and the earlier PR adjusting payable/msg.value validation.

Suggested reviewers

  • dimast-x
  • ihsraham

Poem

🐇 Hop, hop — no sneaky re-entry today!
The guard blocks the hook before it can play.
If events arrive stale and out of order,
We fetch from the chain and restore proper border.
Monotonic versions keep the state in line —
This rabbit's defense-in-depth works out just fine! 🛡️

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 77.78% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The PR title clearly summarizes the main changes: adding smart contract reentrancy protection and event-handler monotonicity guards to address audit finding MF3-L19. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

@nksazonov nksazonov changed the title MF3-L19: prevent reentrancy + event-handler monotonicity MF3-L19: fix: prevent SC reentrancy + event-handler monotonicity Jun 15, 2026
@nksazonov

Contributor Author

@coderabbitai review

coderabbitai Bot commented Jun 15, 2026

Contributor
✅ Action performed

Review finished.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@coderabbitai coderabbitai Bot left a comment

Contributor

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
contracts/deployments/HOOK-TOKEN-COMPATIBILITY.md (1)

1-35: ⚠️ Potential issue | 🔴 Critical

Update deployment matrix with correct testnet addresses and commits; verify git tags exist.

Lines 20–34 contain outdated deployment data for three testnet chains:

  1. Chain 80002 (Polygon Amoy)

    • Matrix lists: address 0x5dba8515af063db0c243c15ece7b99f91459c7c3, commit b88d511c
    • Artifacts show: address 0x55D6f0A0322606447fbc612Cf58014Faed65aF9D, commit fd394085
  2. Chain 84532 (Base Sepolia)

    • Matrix lists: address 0x5dba8515af063db0c243c15ece7b99f91459c7c3, commit b88d511c
    • Artifacts show: address 0x6E2C4707DA119425dF2c722E2695300154652f56, commit 6c0a41d5
  3. Chain 11155111 (Sepolia)

    • Matrix lists: address 0x5dba8515af063db0c243c15ece7b99f91459c7c3, commit b88d511c
    • Artifacts show: address 0xCe87FD88F4B5Fd5475d163e2642C5c2c7dD655Ec, commit 0e5cd5b7

Additionally, git tags prod v1.3.0 and sandbox v1.3.0 do not exist in the repository; only the commits themselves are present. The referenced commits (both e07ad9c2 and b88d511c) do exist, but verify how deployment tags should be recorded.

All configured chains from blockchains.yaml (prod-v1 and sandbox-v1) are represented in the matrix; production deployments match their artifacts.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@contracts/deployments/HOOK-TOKEN-COMPATIBILITY.md` around lines 1 - 35,
Update the deployment matrix in the HOOK-TOKEN-COMPATIBILITY.md file for three
testnet chains with correct data from artifacts. For chain 80002 (Polygon Amoy),
replace the ChannelHub Address with 0x55D6f0A0322606447fbc612Cf58014Faed65aF9D
and the Deploy Commit with fd394085. For chain 84532 (Base Sepolia), replace the
ChannelHub Address with 0x6E2C4707DA119425dF2c722E2695300154652f56 and the
Deploy Commit with 6c0a41d5. For chain 11155111 (Sepolia), replace the
ChannelHub Address with 0xCe87FD88F4B5Fd5475d163e2642C5c2c7dD655Ec and the
Deploy Commit with 0e5cd5b7. Additionally, verify the Deploy Tag column entries:
the git tags prod v1.3.0 and sandbox v1.3.0 do not exist in the repository, so
determine the correct tag naming convention or record strategy and update all
tag entries accordingly.
🧹 Nitpick comments (3)
docs/protocol/security-and-limitations.md (1)

56-57: 💤 Low value

Clarify escrow refresh deferral: when will it be implemented, and what is the risk window?

Line 58 states: "Escrow event handlers ship the version guard without the refresh hook; the cross-chain RPC plumbing required for escrow refresh is a deferred follow-up item."

This leaves escrow rows potentially divergent from chain until the next on-chain event arrives. To help operators understand the risk:

  1. Add an estimated timeline: When is escrow refresh planned? (e.g., "tracked in issue #XXX", "planned for v1.4.0")
  2. Quantify the window: Clarify that the divergence window is bounded by "until the next escrow event arrives" and that escrow lifecycles are typically short (specific duration would help).
  3. Operational guidance: Should operators monitor for escrow state divergence in logs, or is this purely internal and invisible?
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/protocol/security-and-limitations.md` around lines 56 - 57, The
documentation at line 58 regarding deferred escrow refresh lacks important
context for operators. Update the deferred follow-up item statement to include:
(1) an estimated timeline or tracking reference (e.g., issue number or planned
version), (2) explicit clarification that the divergence window is bounded by
the arrival of the next escrow event and note typical escrow lifecycle duration
to help quantify the risk, and (3) operational guidance indicating whether
operators should actively monitor escrow state divergence in logs or treat this
as an internal implementation detail. These additions will provide the risk
visibility needed for operators to understand the impact and scope of this
deferral.
contracts/SECURITY.md (2)

556-557: ⚡ Quick win

Reentrancy-via-inbound-hooks section references current off-chain policy; clarify enforcement boundary.

Line 556 states: "The guard makes already-deployed contracts that historically operated under an off-chain 'no hook-bearing tokens' policy safe against this class of attack at the contract layer; the off-chain policy remains in effect for already-deployed contracts at specific addresses as a defense-in-depth measure pending operator-controlled token onboarding decisions."

This dual-layer defense (contract-level nonReentrant + off-chain policy) is sensible, but the statement conflates:

  1. What the contract now guarantees: nonReentrant prevents hook-reentrancy attacks even if hook-bearing tokens are deployed.
  2. What the off-chain policy does: Operators choose not to onboard hook tokens anyway.

Clarify that:

  • The nonReentrant guard is a hard protection at the contract layer going forward.
  • The off-chain "no hook tokens" policy is a defense-in-depth layer for already-deployed contracts, not a substitute.
  • New ChannelHub deployments should rely on the contract layer alone; the off-chain policy is a legacy safeguard.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@contracts/SECURITY.md` around lines 556 - 557, Clarify the relationship
between the two defense layers against reentrancy-via-inbound-hooks in the
security document. The current text conflates the contract-level nonReentrant
guard with the off-chain token policy, making it unclear which is the hard
protection. Rewrite the relevant paragraph to explicitly state that nonReentrant
is a permanent, hard contract-level protection that prevents hook-reentrancy
attacks regardless of token type, the off-chain "no hook-bearing tokens" policy
is a legacy defense-in-depth measure for historically-deployed contracts at
specific addresses, and new ChannelHub deployments should rely solely on the
contract-layer guard without depending on operator-controlled token onboarding
decisions to prevent this attack class.

54-60: ⚡ Quick win

Clarify invariant numbering: new invariant 8 inserted mid-list displaces existing structure.

Invariant 8 (lines 54–60) is inserted before the "Formal Invariants List" section, which begins at line 75 with invariants numbered 1–7. This creates ambiguity:

  • If invariants 1–7 in the "Formal Invariants List" section are the canonical primary invariants, then the new behavior description (version monotonicity) logically belongs in that section, not above it.
  • If the description above (lines 1–53) contains informal invariants 1–7, then adding a new invariant 8 mid-document before the formal section is confusing.

Consider either:

  1. Moving the version-monotonicity behavior (lines 54–60) into the "Formal Invariants List" section and renumbering it appropriately (e.g., as invariant 8 in the formal list), or
  2. Clarifying in the section header that the behavior descriptions above are informal and separate from the numbered formal invariants below.

Renumbering for consistency is recommended.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@contracts/SECURITY.md` around lines 54 - 60, The new version-monotonicity
behavior description for channel-lifecycle events is numbered as "Invariant 8"
and placed before the "Formal Invariants List" section (which contains
invariants 1–7), creating structural confusion about whether these are informal
or formal invariants. Move the version-monotonicity behavior description
(currently at lines 54–60) into the "Formal Invariants List" section and
renumber it as "Invariant 8" within that formal section to maintain consistent
numbering and clear document structure. Alternatively, if keeping the
description in its current location, add a clarifying section header to
explicitly distinguish informal behavior descriptions from the numbered formal
invariants list that follows.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@contracts/SECURITY.md`:
- Around line 56-57: Align the escrow refresh deferral descriptions across all
three files to use consistent terminology and include the interim divergence
context. At contracts/SECURITY.md lines 56-57, update the phrase "follow-up
item" to match the more detailed language used in
docs/protocol/security-and-limitations.md lines 50-60, which explains this is a
"deferred follow-up item" and includes the contextual detail: "Pending its
arrival, escrow rows can remain divergent from chain across an interim window
until the next on-chain event arrives." Apply the same terminology and interim
divergence explanation to nitronode/README.md lines 40-48. Alternatively, if a
tracking issue exists for this deferral, add consistent tracking issue
references to all three locations instead of describing the interim window
directly.

In `@nitronode/main.go`:
- Around line 85-97: The `channelHubCallers` map is passed to
`evm.NewEVMChainStateRefresher` where it is held as a reference. During the loop
that follows, the map is written to at line 131 while goroutine-based listeners
are started at line 149. These listeners call `RefreshChannelFromChain` which
reads the same map, creating a concurrent map read/write data race as listener
goroutines from earlier iterations read the map while later iterations write to
it. Fix this by restructuring the loop to complete all map registrations before
starting any listeners, or by protecting the map with a synchronization
mechanism like a mutex around all reads and writes in both the loop and the
`RefreshChannelFromChain` function.
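
One of the two fixes suggested here, guarding the map with a mutex, could look roughly like the sketch below. It is not the change that landed in the PR (a later review mentions a per-chain reader instead). The callerRegistry type and its methods are hypothetical, and a string stands in for the real per-chain caller.

```go
package main

import (
	"fmt"
	"sync"
)

// callerRegistry is a hypothetical concurrency-safe replacement for a bare
// map shared between the startup loop and listener goroutines.
type callerRegistry struct {
	mu      sync.RWMutex
	callers map[uint64]string // chain ID -> caller (string stands in for a client)
}

func newCallerRegistry() *callerRegistry {
	return &callerRegistry{callers: make(map[uint64]string)}
}

// Register stores the caller for a chain under the write lock.
func (r *callerRegistry) Register(chainID uint64, caller string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.callers[chainID] = caller
}

// Lookup reads the caller for a chain under the read lock.
func (r *callerRegistry) Lookup(chainID uint64) (string, bool) {
	r.mu.RLock()
	defer r.mu.RUnlock()
	c, ok := r.callers[chainID]
	return c, ok
}

func main() {
	reg := newCallerRegistry()
	var wg sync.WaitGroup
	for _, id := range []uint64{80002, 84532, 11155111} {
		reg.Register(id, fmt.Sprintf("caller-%d", id))
		wg.Add(1)
		// A listener starts while later registrations are still pending,
		// which is the interleaving the review comment describes.
		go func(chainID uint64) {
			defer wg.Done()
			if c, ok := reg.Lookup(chainID); ok {
				fmt.Println("listener sees", c)
			}
		}(id)
	}
	wg.Wait()
}
```

The other suggested option, finishing every registration before starting any listener, needs no lock at all as long as the map is never written again afterwards.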

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 550c4569-a784-40dc-9e24-0b96c2c6fd76

📥 Commits

Reviewing files that changed from the base of the PR and between 21ffd5b and aa879a2.

📒 Files selected for processing (16)
  • contracts/SECURITY.md
  • contracts/deployments/HOOK-TOKEN-COMPATIBILITY.md
  • contracts/src/ChannelHub.sol
  • contracts/test/ChannelHub_reentrancy.t.sol
  • contracts/test/mocks/ReentrantERC20.sol
  • docs/protocol/security-and-limitations.md
  • nitronode/README.md
  • nitronode/chart/config/prod-v1/assets.yaml
  • nitronode/chart/config/sandbox-v1/assets.yaml
  • nitronode/chart/config/stress-v1/assets.yaml
  • nitronode/event_handlers/service.go
  • nitronode/event_handlers/service_test.go
  • nitronode/main.go
  • pkg/blockchain/evm/chain_state_refresher.go
  • pkg/core/interface.go
  • pkg/core/types.go

Comment thread contracts/SECURITY.md Outdated
Comment thread nitronode/main.go Outdated

@ihsraham ihsraham left a comment

Collaborator

The contract-side reentrancy fix looks solid: moving nonReentrant to the lifecycle entrypoints is the right shape, and the new Forge coverage exercises the pull-hook cases. I would still block this for now because the new node refresh path can fail closed by terminating or hanging the listener, and the current-deployment hook-token policy is not documented precisely enough for closure.

I also agree with the existing thread on the channelHubCallers map race and reacted there rather than repeating it.

Comment thread nitronode/event_handlers/service.go Outdated
	if err != nil {
		// Surface; if the on-chain read fails we cannot safely converge and must
		// retry the event by returning a non-nil error so the listener replays.
		return fmt.Errorf("refresh after dropped %s: %w", droppedIntent, err)

Collaborator

Before approval, I would not let this error reach the listener fatal path. Returning a refresh error does roll back the event tx, but processEvents returns it and main.go calls logger.Fatal, so a transient getChannelData failure on a stale event can stop nitronode. Can we keep the replay semantics with listener-level retry/backoff or a retryable error path instead of terminating the process?

Contributor Author

Addressed in 92299e3. Switched refreshAfterDroppedEvent from "return wrapped error" to "log Error and return nil" — the dedup row commits, the listener moves on, and the process no longer exits on a transient RPC blip. Trade-off documented in the helper's doc-comment: in the rare case of a sustained RPC outage coinciding with a guard-drop, the local row may stay divergent from chain until the next on-chain event for that channel arrives. For terminal states (Closed) this may be indefinite — accepted as strictly better than terminating the node. Tests renamed to LoggedAndIgnored to reflect the new contract.

Collaborator

Keeping this unresolved because the current change fixes the node-exit part, but drops the replay/convergence property. Once refreshAfterDroppedEvent logs and returns nil, the reactor records the event, so a transient RPC failure can leave the local row stale with no replay.

I moved the current-head blocker here: #837 (comment)

Comment thread pkg/blockchain/evm/chain_state_refresher.go Outdated
Comment thread contracts/deployments/HOOK-TOKEN-COMPATIBILITY.md
Comment thread contracts/SECURITY.md Outdated
Comment thread pkg/core/types.go
// between when the dropped event was emitted and when the refresh RPC ran. The
// Node row may therefore briefly skip an intermediate status it never observed,
// but it will always converge to a status the chain currently asserts.
type RefreshedChannel struct {

Contributor

please find a better name

Contributor

rename the file

@ihsraham ihsraham left a comment

Collaborator

The contract-side guard placement looks good now, and the deployment policy/docs issues I raised are addressed. The per-chain reader also removes the map race, and the refresh RPC is bounded.

I’m keeping this as changes requested for one remaining closure issue: refresh failures are now logged and swallowed, so a guard-drop event can be marked processed without the row converging to chain.

		"channelId", channel.ChannelID,
		"droppedIntent", droppedIntent,
		"error", err)
	return nil

Collaborator

This avoids the process exit, but I would still block here because the failed refresh is now acknowledged as processed. After this returns nil, the reactor records the contract event, so the listener will not replay it; the local row can stay divergent from chain on the guard-drop path this refresh is meant to close.

Can we keep the non-fatal behavior while making refresh failure retryable, for example with listener-level retry/backoff or a bounded local retry before the event is deduped?


3 participants