MF3-L19: fix: prevent SC reentrancy + event-handler monotonicity#837
MF3-L19: fix: prevent SC reentrancy + event-handler monotonicity#837nksazonov wants to merge 9 commits into
Conversation
|
Important Review skippedAuto reviews are disabled on base/target branches other than the default branch. Please check the settings in the CodeRabbit UI or the ⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
📝 WalkthroughWalkthroughThis PR adds two defense-in-depth layers: (1) in ChangesChannelHub Reentrancy Protection and Node State Convergence
Sequence Diagram(s)sequenceDiagram
participant TokenHook as Malicious Token Hook
participant ChannelHub
participant EventHandlerService
participant EVMChainStateRefresher
participant ChannelStore
Note over ChannelHub: External entry point protected by nonReentrant
TokenHook->>ChannelHub: transferFrom callback → reenter lifecycle function
ChannelHub-->>TokenHook: ReentrancyGuardReentrantCall revert
Note over EventHandlerService: On-chain event received out-of-order
ChannelHub--)EventHandlerService: lifecycle event (stale StateVersion)
EventHandlerService->>EventHandlerService: guardEventVersionMonotonic → dropped
EventHandlerService->>EVMChainStateRefresher: RefreshChannelFromChain(ctx, channelID)
EVMChainStateRefresher->>ChannelHub: GetChannelData(channelID)
ChannelHub-->>EVMChainStateRefresher: on-chain Status, StateVersion, ChallengeExpiresAt, sig
EVMChainStateRefresher-->>EventHandlerService: RefreshedChannel
EventHandlerService->>ChannelStore: UpdateChannel(refreshed fields) within tx
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
@coderabbitai review |
✅ Action performedReview finished.
|
There was a problem hiding this comment.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
contracts/deployments/HOOK-TOKEN-COMPATIBILITY.md (1)
1-35:⚠️ Potential issue | 🔴 CriticalUpdate deployment matrix with correct testnet addresses and commits; verify git tags exist.
Lines 20–34 contain outdated deployment data for three testnet chains:
Chain 80002 (Polygon Amoy)
- Matrix lists: address
0x5dba8515af063db0c243c15ece7b99f91459c7c3, commitb88d511c- Artifacts show: address
0x55D6f0A0322606447fbc612Cf58014Faed65aF9D, commitfd394085Chain 84532 (Base Sepolia)
- Matrix lists: address
0x5dba8515af063db0c243c15ece7b99f91459c7c3, commitb88d511c- Artifacts show: address
0x6E2C4707DA119425dF2c722E2695300154652f56, commit6c0a41d5Chain 11155111 (Sepolia)
- Matrix lists: address
0x5dba8515af063db0c243c15ece7b99f91459c7c3, commitb88d511c- Artifacts show: address
0xCe87FD88F4B5Fd5475d163e2642C5c2c7dD655Ec, commit0e5cd5b7Additionally, git tags
prod v1.3.0andsandbox v1.3.0do not exist in the repository; only the commits themselves are present. The referenced commits (bothe07ad9c2andb88d511c) do exist, but verify how deployment tags should be recorded.All configured chains from
blockchains.yaml(prod-v1 and sandbox-v1) are represented in the matrix; production deployments match their artifacts.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@contracts/deployments/HOOK-TOKEN-COMPATIBILITY.md` around lines 1 - 35, Update the deployment matrix in the HOOK-TOKEN-COMPATIBILITY.md file for three testnet chains with correct data from artifacts. For chain 80002 (Polygon Amoy), replace the ChannelHub Address with 0x55D6f0A0322606447fbc612Cf58014Faed65aF9D and the Deploy Commit with fd394085. For chain 84532 (Base Sepolia), replace the ChannelHub Address with 0x6E2C4707DA119425dF2c722E2695300154652f56 and the Deploy Commit with 6c0a41d5. For chain 11155111 (Sepolia), replace the ChannelHub Address with 0xCe87FD88F4B5Fd5475d163e2642C5c2c7dD655Ec and the Deploy Commit with 0e5cd5b7. Additionally, verify the Deploy Tag column entries: the git tags prod v1.3.0 and sandbox v1.3.0 do not exist in the repository, so determine the correct tag naming convention or record strategy and update all tag entries accordingly.
🧹 Nitpick comments (3)
docs/protocol/security-and-limitations.md (1)
56-57: 💤 Low valueClarify escrow refresh deferral: when will it be implemented, and what is the risk window?
Line 58 states: "Escrow event handlers ship the version guard without the refresh hook; the cross-chain RPC plumbing required for escrow refresh is a deferred follow-up item."
This leaves escrow rows potentially divergent from chain until the next on-chain event arrives. To help operators understand the risk:
- Add an estimated timeline: When is escrow refresh planned? (e.g., "tracked in issue
#XXX", "planned for v1.4.0")- Quantify the window: Clarify that the divergence window is bounded by "until the next escrow event arrives" and that escrow lifecycles are typically short (specific duration would help).
- Operational guidance: Should operators monitor for escrow state divergence in logs, or is this purely internal and invisible?
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@docs/protocol/security-and-limitations.md` around lines 56 - 57, The documentation at line 58 regarding deferred escrow refresh lacks important context for operators. Update the deferred follow-up item statement to include: (1) an estimated timeline or tracking reference (e.g., issue number or planned version), (2) explicit clarification that the divergence window is bounded by the arrival of the next escrow event and note typical escrow lifecycle duration to help quantify the risk, and (3) operational guidance indicating whether operators should actively monitor escrow state divergence in logs or treat this as an internal implementation detail. These additions will provide the risk visibility needed for operators to understand the impact and scope of this deferral.contracts/SECURITY.md (2)
556-557: ⚡ Quick winReentrancy-via-inbound-hooks section references current off-chain policy; clarify enforcement boundary.
Line 556 states: "The guard makes already-deployed contracts that historically operated under an off-chain 'no hook-bearing tokens' policy safe against this class of attack at the contract layer; the off-chain policy remains in effect for already-deployed contracts at specific addresses as a defense-in-depth measure pending operator-controlled token onboarding decisions."
This dual-layer defense (contract-level nonReentrant + off-chain policy) is sensible, but the statement conflates:
- What the contract now guarantees: nonReentrant prevents hook-reentrancy attacks even if hook-bearing tokens are deployed.
- What the off-chain policy does: Operators choose not to onboard hook tokens anyway.
Clarify that:
- The nonReentrant guard is a hard protection at the contract layer going forward.
- The off-chain "no hook tokens" policy is a defense-in-depth layer for already-deployed contracts, not a substitute.
- New ChannelHub deployments should rely on the contract layer alone; the off-chain policy is a legacy safeguard.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@contracts/SECURITY.md` around lines 556 - 557, Clarify the relationship between the two defense layers against reentrancy-via-inbound-hooks in the security document. The current text conflates the contract-level nonReentrant guard with the off-chain token policy, making it unclear which is the hard protection. Rewrite the relevant paragraph to explicitly state that nonReentrant is a permanent, hard contract-level protection that prevents hook-reentrancy attacks regardless of token type, the off-chain "no hook-bearing tokens" policy is a legacy defense-in-depth measure for historically-deployed contracts at specific addresses, and new ChannelHub deployments should rely solely on the contract-layer guard without depending on operator-controlled token onboarding decisions to prevent this attack class.
54-60: ⚡ Quick winClarify invariant numbering: new invariant 8 inserted mid-list displaces existing structure.
Invariant 8 (lines 54–60) is inserted before the "Formal Invariants List" section, which begins at line 75 with invariants numbered 1–7. This creates ambiguity:
- If invariants 1–7 in the "Formal Invariants List" section are the canonical primary invariants, then the new behavior description (version monotonicity) logically belongs in that section, not above it.
- If the description above (lines 1–53) contains informal invariants 1–7, then adding a new invariant 8 mid-document before the formal section is confusing.
Consider either:
- Moving the version-monotonicity behavior (lines 54–60) into the "Formal Invariants List" section and renumbering it appropriately (e.g., as invariant 8 in the formal list), or
- Clarifying in the section header that the behavior descriptions above are informal and separate from the numbered formal invariants below.
Renumbering for consistency is recommended.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@contracts/SECURITY.md` around lines 54 - 60, The new version-monotonicity behavior description for channel-lifecycle events is numbered as "Invariant 8" and placed before the "Formal Invariants List" section (which contains invariants 1–7), creating structural confusion about whether these are informal or formal invariants. Move the version-monotonicity behavior description (currently at lines 54–60) into the "Formal Invariants List" section and renumber it as "Invariant 8" within that formal section to maintain consistent numbering and clear document structure. Alternatively, if keeping the description in its current location, add a clarifying section header to explicitly distinguish informal behavior descriptions from the numbered formal invariants list that follows.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@contracts/SECURITY.md`:
- Around line 56-57: Align the escrow refresh deferral descriptions across all
three files to use consistent terminology and include the interim divergence
context. At contracts/SECURITY.md lines 56-57, update the phrase "follow-up
item" to match the more detailed language used in
docs/protocol/security-and-limitations.md lines 50-60, which explains this is a
"deferred follow-up item" and includes the contextual detail: "Pending its
arrival, escrow rows can remain divergent from chain across an interim window
until the next on-chain event arrives." Apply the same terminology and interim
divergence explanation to nitronode/README.md lines 40-48. Alternatively, if a
tracking issue exists for this deferral, add consistent tracking issue
references to all three locations instead of describing the interim window
directly.
In `@nitronode/main.go`:
- Around line 85-97: The `channelHubCallers` map is passed to
`evm.NewEVMChainStateRefresher` where it is held as a reference. During the loop
that follows, the map is written to at line 131 while goroutine-based listeners
are started at line 149. These listeners call `RefreshChannelFromChain` which
reads the same map, creating a concurrent map read/write data race as listener
goroutines from earlier iterations read the map while later iterations write to
it. Fix this by restructuring the loop to complete all map registrations before
starting any listeners, or by protecting the map with a synchronization
mechanism like a mutex around all reads and writes in both the loop and the
`RefreshChannelFromChain` function.
---
Outside diff comments:
In `@contracts/deployments/HOOK-TOKEN-COMPATIBILITY.md`:
- Around line 1-35: Update the deployment matrix in the
HOOK-TOKEN-COMPATIBILITY.md file for three testnet chains with correct data from
artifacts. For chain 80002 (Polygon Amoy), replace the ChannelHub Address with
0x55D6f0A0322606447fbc612Cf58014Faed65aF9D and the Deploy Commit with fd394085.
For chain 84532 (Base Sepolia), replace the ChannelHub Address with
0x6E2C4707DA119425dF2c722E2695300154652f56 and the Deploy Commit with 6c0a41d5.
For chain 11155111 (Sepolia), replace the ChannelHub Address with
0xCe87FD88F4B5Fd5475d163e2642C5c2c7dD655Ec and the Deploy Commit with 0e5cd5b7.
Additionally, verify the Deploy Tag column entries: the git tags prod v1.3.0 and
sandbox v1.3.0 do not exist in the repository, so determine the correct tag
naming convention or record strategy and update all tag entries accordingly.
---
Nitpick comments:
In `@contracts/SECURITY.md`:
- Around line 556-557: Clarify the relationship between the two defense layers
against reentrancy-via-inbound-hooks in the security document. The current text
conflates the contract-level nonReentrant guard with the off-chain token policy,
making it unclear which is the hard protection. Rewrite the relevant paragraph
to explicitly state that nonReentrant is a permanent, hard contract-level
protection that prevents hook-reentrancy attacks regardless of token type, the
off-chain "no hook-bearing tokens" policy is a legacy defense-in-depth measure
for historically-deployed contracts at specific addresses, and new ChannelHub
deployments should rely solely on the contract-layer guard without depending on
operator-controlled token onboarding decisions to prevent this attack class.
- Around line 54-60: The new version-monotonicity behavior description for
channel-lifecycle events is numbered as "Invariant 8" and placed before the
"Formal Invariants List" section (which contains invariants 1–7), creating
structural confusion about whether these are informal or formal invariants. Move
the version-monotonicity behavior description (currently at lines 54–60) into
the "Formal Invariants List" section and renumber it as "Invariant 8" within
that formal section to maintain consistent numbering and clear document
structure. Alternatively, if keeping the description in its current location,
add a clarifying section header to explicitly distinguish informal behavior
descriptions from the numbered formal invariants list that follows.
In `@docs/protocol/security-and-limitations.md`:
- Around line 56-57: The documentation at line 58 regarding deferred escrow
refresh lacks important context for operators. Update the deferred follow-up
item statement to include: (1) an estimated timeline or tracking reference
(e.g., issue number or planned version), (2) explicit clarification that the
divergence window is bounded by the arrival of the next escrow event and note
typical escrow lifecycle duration to help quantify the risk, and (3) operational
guidance indicating whether operators should actively monitor escrow state
divergence in logs or treat this as an internal implementation detail. These
additions will provide the risk visibility needed for operators to understand
the impact and scope of this deferral.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 550c4569-a784-40dc-9e24-0b96c2c6fd76
📒 Files selected for processing (16)
contracts/SECURITY.mdcontracts/deployments/HOOK-TOKEN-COMPATIBILITY.mdcontracts/src/ChannelHub.solcontracts/test/ChannelHub_reentrancy.t.solcontracts/test/mocks/ReentrantERC20.soldocs/protocol/security-and-limitations.mdnitronode/README.mdnitronode/chart/config/prod-v1/assets.yamlnitronode/chart/config/sandbox-v1/assets.yamlnitronode/chart/config/stress-v1/assets.yamlnitronode/event_handlers/service.gonitronode/event_handlers/service_test.gonitronode/main.gopkg/blockchain/evm/chain_state_refresher.gopkg/core/interface.gopkg/core/types.go
ihsraham
left a comment
There was a problem hiding this comment.
The contract-side reentrancy fix looks solid: moving nonReentrant to the lifecycle entrypoints is the right shape, and the new Forge coverage exercises the pull-hook cases. I would still block this for now because the new node refresh path can fail closed by terminating or hanging the listener, and the current-deployment hook-token policy is not documented precisely enough for closure.
I also agree with the existing thread on the channelHubCallers map race and reacted there rather than repeating it.
| if err != nil { | ||
| // Surface; if the on-chain read fails we cannot safely converge and must | ||
| // retry the event by returning a non-nil error so the listener replays. | ||
| return fmt.Errorf("refresh after dropped %s: %w", droppedIntent, err) |
There was a problem hiding this comment.
Before approval, I would not let this error reach the listener fatal path. Returning a refresh error does roll back the event tx, but processEvents returns it and main.go calls logger.Fatal, so a transient getChannelData failure on a stale event can stop nitronode. Can we keep the replay semantics with listener-level retry/backoff or a retryable error path instead of terminating the process?
There was a problem hiding this comment.
Addressed in 92299e3. Switched refreshAfterDroppedEvent from "return wrapped error" to "log Error and return nil" — the dedup row commits, the listener moves on, and the process no longer exits on a transient RPC blip. Trade-off documented in the helper's doc-comment: in the rare case of a sustained RPC outage coinciding with a guard-drop, the local row may stay divergent from chain until the next on-chain event for that channel arrives. For terminal states (Closed) this may be indefinite — accepted as strictly better than terminating the node. Tests renamed to LoggedAndIgnored to reflect the new contract.
There was a problem hiding this comment.
Keeping this unresolved because the current change fixes the node-exit part, but drops the replay/convergence property. Once refreshAfterDroppedEvent logs and returns nil, the reactor records the event, so a transient RPC failure can leave the local row stale with no replay.
I moved the current-head blocker here: #837 (comment)
| // between when the dropped event was emitted and when the refresh RPC ran. The | ||
| // Node row may therefore briefly skip an intermediate status it never observed, | ||
| // but it will always converge to a status the chain currently asserts. | ||
| type RefreshedChannel struct { |
There was a problem hiding this comment.
please find a better name
ihsraham
left a comment
There was a problem hiding this comment.
The contract-side guard placement looks good now, and the deployment policy/docs issues I raised are addressed. The per-chain reader also removes the map race, and the refresh RPC is bounded.
I’m keeping this as changes requested for one remaining closure issue: refresh failures are now logged and swallowed, so a guard-drop event can be marked processed without the row converging to chain.
| "channelId", channel.ChannelID, | ||
| "droppedIntent", droppedIntent, | ||
| "error", err) | ||
| return nil |
There was a problem hiding this comment.
This avoids the process exit, but I would still block here because the failed refresh is now acknowledged as processed. After this returns nil, the reactor records the contract event, so the listener will not replay it; the local row can stay divergent from chain on the guard-drop path this refresh is meant to close.
Can we keep the non-fatal behavior while making refresh failure retryable, for example with listener-level retry/backoff or a bounded local retry before the event is deduped?
Summary
Addresses audit finding MF3-L19 (reentrancy in
ChannelHublifecycle entrypoints) with paired contract-side and Nitronode-side fixes, plus supporting documentation.fix(contracts)— addednonReentrantto every external/public lifecycle entrypoint inChannelHub.sol(root-cause remediation). Migrated guards upward from internal fund helpers since OZReentrancyGuarduses a single shared status slot. New regression tests for the four reentrancy scenarios.fix(nitronode)— added version-monotonicity guards to six previously guard-less event handlers (HandleHomeChannelCheckpointed,HandleHomeChannelClosed, four escrow Initiated/Finalized handlers) via a shared helper. Drops stale events with a structured warn log instead of regressing local state or silently clearing challenges.feat(nitronode)—ChainStateRefresherinterface + EVM implementation. On home-channel guard drop, fetches authoritative on-chain state viagetChannelDataand overwrites the local row. Defense-in-depth against any class of out-of-order delivery (indexer mis-order, reorg replay, future contract changes).docs— new sections incontracts/SECURITY.md,docs/protocol/security-and-limitations.md,nitronode/README.md; new deployment-constraint matrix atcontracts/deployments/HOOK-TOKEN-COMPATIBILITY.mdrecording per-chain hook-token support; WARNING blocks in the threenitronode/chart/config/*/assets.yamlfiles.Follow-up tracked at #836 (scheduler dedup for
blockchain_actions).Test plan
go build ./...— cleango vet ./nitronode/... ./pkg/core/... ./pkg/blockchain/evm/...— cleango test ./nitronode/event_handlers/... ./pkg/blockchain/evm/... ./pkg/core/... -count=1— greenforge test— 292 passed, 0 failed (288 pre-existing + 4 new reentrancy regression tests)forge fmt --check— cleanSummary by CodeRabbit
Release Notes
Security Enhancements
Known Limitations