Speed up split-vote elections with the new FAILOVER_AUTH_NACK message #3833

Open
enjoy-binbin wants to merge 12 commits into valkey-io:unstable from enjoy-binbin:nack
@enjoy-binbin enjoy-binbin commented May 26, 2026

Member

In the current failover protocol a replica sends AUTH_REQUEST exactly
once per epoch and each voter casts at most one vote per epoch. Despite
the various delay heuristics in clusterHandleReplicaFailover that try to
stagger replicas, concurrent elections can still collide on the same
epoch. When the votes split and nobody reaches the quorum, the losing
replica has no way to learn this in time and must first wait for the
election to be declared expired after auth_timeout (2*cluster-node-timeout)
and then wait another auth_retry_time (2*auth_timeout) before it is even
allowed to start the next election with a higher epoch.
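
The two windows above can be worked through numerically. The following is only a sketch: `failoverWindows` and `failoverWindowsFor` are hypothetical names, the 3000 ms node timeout used in the usage note is an assumed value, and any minimums the real code may enforce are ignored.

```c
#include <stdint.h>

/* Hypothetical helper, not taken from the PR: derives both windows from
 * cluster-node-timeout using only the factors quoted in the description. */
typedef struct {
    int64_t auth_timeout;    /* 2 * cluster-node-timeout */
    int64_t auth_retry_time; /* 2 * auth_timeout */
} failoverWindows;

static failoverWindows failoverWindowsFor(int64_t node_timeout_ms) {
    failoverWindows w;
    w.auth_timeout = 2 * node_timeout_ms;
    w.auth_retry_time = 2 * w.auth_timeout;
    return w;
}
```

With an assumed node timeout of 3000 ms, `failoverWindowsFor(3000)` gives an auth_timeout of 6000 ms and an auth_retry_time of 12000 ms, so a split vote costs seconds rather than milliseconds.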

Introduce a new message type, FAILOVER_AUTH_NACK, that voters reply with
from every refusal branch. The replica counts incoming NACKs; since a
voter never re-answers within the same epoch, once size - nack_count
drops below the quorum the election is declared unwinnable and a new
one is started with a higher epoch right away, shrinking recovery from
the auth_timeout + auth_retry_time window to a few cron ticks.
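
The fast-fail condition can be sketched as below. `electionQuorum` and `electionUnwinnable` are hypothetical names used only for illustration, and the sketch assumes the quorum is a strict majority of the voters (size / 2 + 1).

```c
#include <stdbool.h>

/* Assumption: quorum is a strict majority of the voting primaries. */
static int electionQuorum(int size) {
    return size / 2 + 1;
}

/* Hypothetical helper: true once the voters that have not refused yet
 * are too few to reach the quorum, so waiting longer cannot help. */
static bool electionUnwinnable(int size, int nack_count) {
    return size - nack_count < electionQuorum(size);
}
```

With 7 voters the quorum is 4, so three NACKs still leave 4 possible votes, while the fourth NACK leaves only 3 and the election can be abandoned.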

Wire compatibility is preserved by gating NACK emission on a new
CLUSTER_NODE_FAILOVER_AUTH_NACK_SUPPORTED capability flag advertised in
PING/PONG flags via clusterUpdateMyselfFlags. Peers that do not advertise
the capability never see the new message type and fall back to the
legacy auth_timeout path.
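
A minimal sketch of the gating idea follows. The bit value `NACK_SUPPORTED_FLAG`, the `peerNode` type and both helpers are placeholders invented for this example; the real flag value and send path are not shown in this description.

```c
#include <stdbool.h>
#include <stdint.h>

/* Placeholder bit: the real value of the capability flag is not shown here. */
#define NACK_SUPPORTED_FLAG (1u << 15)

/* Hypothetical stand-in for a peer as learned from its PING/PONG flags. */
typedef struct {
    uint32_t flags;
} peerNode;

static bool peerSupportsNack(const peerNode *p) {
    return (p->flags & NACK_SUPPORTED_FLAG) != 0;
}

/* Counts a NACK as sent only for capable peers and reports whether it was. */
static bool maybeSendNack(const peerNode *p, int *sent_counter) {
    if (!peerSupportsNack(p)) return false; /* legacy peer: stay silent, it times out as before */
    (*sent_counter)++;
    return true;
}
```

Because the check happens on the sending side, an older node never has to parse a message type it does not know.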

Add a new DEBUG CLUSTER-FAILOVER-DELAY <ms> hook that overrides the delay
computed in clusterHandleReplicaFailover, for testing.
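
One way such an override could be applied is sketched here. `effectiveFailoverDelay` is a hypothetical helper, and treating any negative value as "no override" is an assumption made for the example.

```c
/* Hypothetical helper: a non-negative debug value replaces the computed
 * delay; a negative value (assumed to mean "unset") leaves it alone. */
static long long effectiveFailoverDelay(long long computed_ms, long long debug_delay_ms) {
    return debug_delay_ms >= 0 ? debug_delay_ms : computed_ms;
}
```

Setting the override to 0 would make competing replicas start their elections at the same time, which is the collision case the tests need to provoke.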

Signed-off-by: Binbin <binloveplay1314@qq.com>
@enjoy-binbin

enjoy-binbin commented May 26, 2026

Member Author

This is an example without the auth_time reset (the replica only prints the logs and does not reset failover_auth_time):

One replica:

81935:S 26 May 2026 20:46:15.061 * Start of election delayed for 0 milliseconds (rank #0, primary rank #2, offset 14).
81935:S 26 May 2026 20:46:15.061 * Starting a failover election for epoch 11, node config epoch is 1
81935:S 26 May 2026 20:46:15.084 * Currently unable to failover: Waiting for votes, but majority still not reached.
81935:S 26 May 2026 20:46:15.089 * Needed quorum: 4. Number of votes received so far: 0
81935:S 26 May 2026 20:46:15.102 # Failover auth NACK [already-voted] from 168a74f5255a32672cb114838466aae7c284ace1 (R4) for epoch 11 (NACKs 1/7)
81935:S 26 May 2026 20:46:15.103 # Failover auth NACK [already-voted] from ee52b4158d5a1e80165ba6e0eff0a1ef2ce7bfb3 (R3) for epoch 11 (NACKs 2/7)
81935:S 26 May 2026 20:46:15.103 # Failover auth NACK [already-voted] from 584c617a6cf7d4ba3d7842732d107b2c0fdcfcb8 (R5) for epoch 11 (NACKs 3/7)
81935:S 26 May 2026 20:46:15.113 # Failover auth NACK [already-voted] from f2166e8b732feb4a59a6db075247692e4b6b8d4a (R6) for epoch 11 (NACKs 4/7)
81935:S 26 May 2026 20:46:15.113 # Failover election for epoch 11 cannot reach quorum 4 (NACKs 4/7). Resetting the election since we cannot win an election without quorum.
81935:S 26 May 2026 20:46:16.002 * Currently unable to failover: Waiting for votes, but majority still not reached.
81935:S 26 May 2026 20:46:16.002 * Needed quorum: 4. Number of votes received so far: 0
81935:S 26 May 2026 20:46:17.010 * Currently unable to failover: Waiting for votes, but majority still not reached.
81935:S 26 May 2026 20:46:17.010 * Needed quorum: 4. Number of votes received so far: 0
81935:S 26 May 2026 20:46:18.021 * Currently unable to failover: Waiting for votes, but majority still not reached.
81935:S 26 May 2026 20:46:18.021 * Needed quorum: 4. Number of votes received so far: 0
81935:S 26 May 2026 20:46:19.032 * Currently unable to failover: Waiting for votes, but majority still not reached.
81935:S 26 May 2026 20:46:19.032 * Needed quorum: 4. Number of votes received so far: 0
81935:S 26 May 2026 20:46:20.043 * Currently unable to failover: Waiting for votes, but majority still not reached.
81935:S 26 May 2026 20:46:20.043 * Needed quorum: 4. Number of votes received so far: 0
81935:S 26 May 2026 20:46:21.053 * Currently unable to failover: Waiting for votes, but majority still not reached.
81935:S 26 May 2026 20:46:21.053 * Needed quorum: 4. Number of votes received so far: 0
81935:S 26 May 2026 20:46:21.155 * Currently unable to failover: Failover attempt expired.
81935:S 26 May 2026 20:46:21.155 * Needed quorum: 4. Number of votes received so far: 0
81935:S 26 May 2026 20:46:22.064 * Currently unable to failover: Failover attempt expired.
81935:S 26 May 2026 20:46:22.064 * Needed quorum: 4. Number of votes received so far: 0
81935:S 26 May 2026 20:46:23.074 * Currently unable to failover: Failover attempt expired.
81935:S 26 May 2026 20:46:23.074 * Needed quorum: 4. Number of votes received so far: 0
81935:S 26 May 2026 20:46:24.085 * Currently unable to failover: Failover attempt expired.
81935:S 26 May 2026 20:46:24.085 * Needed quorum: 4. Number of votes received so far: 0
81935:S 26 May 2026 20:46:25.093 * Currently unable to failover: Failover attempt expired.
81935:S 26 May 2026 20:46:25.093 * Needed quorum: 4. Number of votes received so far: 0
81935:S 26 May 2026 20:46:26.007 * Currently unable to failover: Failover attempt expired.
81935:S 26 May 2026 20:46:26.008 * Needed quorum: 4. Number of votes received so far: 0
81935:S 26 May 2026 20:46:27.017 * Currently unable to failover: Failover attempt expired.
81935:S 26 May 2026 20:46:27.018 * Needed quorum: 4. Number of votes received so far: 0
81935:S 26 May 2026 20:46:27.119 * Start of election delayed for 0 milliseconds (rank #0, primary rank #2, offset 14).
81935:S 26 May 2026 20:46:27.120 * Starting a failover election for epoch 13, node config epoch is 1
81935:S 26 May 2026 20:46:27.155 * Reconfiguring node 960f1889400efd7256c34b27a69db68a0e37e3ea (R9) as primary for shard 6d42ca63697f3e23acff992eced550081767c2f3
81935:S 26 May 2026 20:46:27.159 * Mismatch in topology information for sender node 960f1889400efd7256c34b27a69db68a0e37e3ea (R9) in shard 6d42ca63697f3e23acff992eced550081767c2f3
81935:S 26 May 2026 20:46:27.196 * Failover election won: I'm the new primary.

The other replica:

81924:S 26 May 2026 20:46:15.056 * Start of election delayed for 0 milliseconds (rank #0, primary rank #1, offset 14).
81924:S 26 May 2026 20:46:15.056 * Starting a failover election for epoch 11, node config epoch is 2
81924:S 26 May 2026 20:46:15.098 # Failover auth NACK [already-voted] from 168a74f5255a32672cb114838466aae7c284ace1 (R4) for epoch 11 (NACKs 1/7)
81924:S 26 May 2026 20:46:15.103 # Failover auth NACK [already-voted] from ee52b4158d5a1e80165ba6e0eff0a1ef2ce7bfb3 (R3) for epoch 11 (NACKs 2/7)
81924:S 26 May 2026 20:46:15.103 # Failover auth NACK [already-voted] from 584c617a6cf7d4ba3d7842732d107b2c0fdcfcb8 (R5) for epoch 11 (NACKs 3/7)
81924:S 26 May 2026 20:46:15.113 * Currently unable to failover: Waiting for votes, but majority still not reached.
81924:S 26 May 2026 20:46:15.113 * Needed quorum: 4. Number of votes received so far: 1
81924:S 26 May 2026 20:46:16.053 * Currently unable to failover: Waiting for votes, but majority still not reached.
81924:S 26 May 2026 20:46:16.053 * Needed quorum: 4. Number of votes received so far: 1
81924:S 26 May 2026 20:46:17.064 * Currently unable to failover: Waiting for votes, but majority still not reached.
81924:S 26 May 2026 20:46:17.064 * Needed quorum: 4. Number of votes received so far: 1
81924:S 26 May 2026 20:46:18.073 * Currently unable to failover: Waiting for votes, but majority still not reached.
81924:S 26 May 2026 20:46:18.073 * Needed quorum: 4. Number of votes received so far: 1
81924:S 26 May 2026 20:46:19.084 * Currently unable to failover: Waiting for votes, but majority still not reached.
81924:S 26 May 2026 20:46:19.084 * Needed quorum: 4. Number of votes received so far: 1
81924:S 26 May 2026 20:46:20.093 * Currently unable to failover: Waiting for votes, but majority still not reached.
81924:S 26 May 2026 20:46:20.093 * Needed quorum: 4. Number of votes received so far: 1
81924:S 26 May 2026 20:46:21.003 * Currently unable to failover: Waiting for votes, but majority still not reached.
81924:S 26 May 2026 20:46:21.004 * Needed quorum: 4. Number of votes received so far: 1
81924:S 26 May 2026 20:46:21.105 * Currently unable to failover: Failover attempt expired.
81924:S 26 May 2026 20:46:21.105 * Needed quorum: 4. Number of votes received so far: 1
81924:S 26 May 2026 20:46:22.014 * Currently unable to failover: Failover attempt expired.
81924:S 26 May 2026 20:46:22.014 * Needed quorum: 4. Number of votes received so far: 1
81924:S 26 May 2026 20:46:23.025 * Currently unable to failover: Failover attempt expired.
81924:S 26 May 2026 20:46:23.025 * Needed quorum: 4. Number of votes received so far: 1
81924:S 26 May 2026 20:46:24.034 * Currently unable to failover: Failover attempt expired.
81924:S 26 May 2026 20:46:24.034 * Needed quorum: 4. Number of votes received so far: 1
81924:S 26 May 2026 20:46:25.044 * Currently unable to failover: Failover attempt expired.
81924:S 26 May 2026 20:46:25.044 * Needed quorum: 4. Number of votes received so far: 1
81924:S 26 May 2026 20:46:26.056 * Currently unable to failover: Failover attempt expired.
81924:S 26 May 2026 20:46:26.056 * Needed quorum: 4. Number of votes received so far: 1
81924:S 26 May 2026 20:46:27.067 * Start of election delayed for 0 milliseconds (rank #0, primary rank #1, offset 14).
81924:S 26 May 2026 20:46:27.067 * Starting a failover election for epoch 12, node config epoch is 2
81924:S 26 May 2026 20:46:27.113 # Failover auth NACK [already-voted] from 168a74f5255a32672cb114838466aae7c284ace1 (R4) for epoch 12 (NACKs 1/7)
81924:S 26 May 2026 20:46:27.113 # Failover auth NACK [already-voted] from ee52b4158d5a1e80165ba6e0eff0a1ef2ce7bfb3 (R3) for epoch 12 (NACKs 2/7)
81924:S 26 May 2026 20:46:27.117 # Failover auth NACK [already-voted] from 584c617a6cf7d4ba3d7842732d107b2c0fdcfcb8 (R5) for epoch 12 (NACKs 3/7)
81924:S 26 May 2026 20:46:27.121 # Failover auth NACK [already-voted] from f2166e8b732feb4a59a6db075247692e4b6b8d4a (R6) for epoch 12 (NACKs 4/7)
81924:S 26 May 2026 20:46:27.121 # Failover election for epoch 12 cannot reach quorum 4 (NACKs 4/7). Resetting the election since we cannot win an election without quorum.
81924:S 26 May 2026 20:46:27.121 * Currently unable to failover: Waiting for votes, but majority still not reached.
81924:S 26 May 2026 20:46:27.121 * Needed quorum: 4. Number of votes received so far: 0
81924:S 26 May 2026 20:46:27.159 # Failover election in progress for epoch 12, but received a claim from node 960f1889400efd7256c34b27a69db68a0e37e3ea (R9) with an equal or higher epoch 12. Resetting the election since we cannot win an election in the past.
81924:S 26 May 2026 20:46:27.163 * Reconfiguring node 960f1889400efd7256c34b27a69db68a0e37e3ea (R9) as primary for shard 6d42ca63697f3e23acff992eced550081767c2f3
81924:S 26 May 2026 20:46:27.163 * Mismatch in topology information for sender node 960f1889400efd7256c34b27a69db68a0e37e3ea (R9) in shard 6d42ca63697f3e23acff992eced550081767c2f3
81924:S 26 May 2026 20:46:27.163 * This is the best ranked replica and can initiate the election immediately.
81924:S 26 May 2026 20:46:27.167 * Start of election delayed for 0 milliseconds (rank #0, primary rank #0, offset 14).
81924:S 26 May 2026 20:46:27.172 * Starting a failover election for epoch 14, node config epoch is 2
81924:S 26 May 2026 20:46:27.276 * Reconfiguring node d51d892710dd8818b886c8e0e7ccdb51a4020ddb (R7) as primary for shard aeac04a66b927d48c8102f6a85030c756dd33f46
81924:S 26 May 2026 20:46:27.276 * Mismatch in topology information for sender node d51d892710dd8818b886c8e0e7ccdb51a4020ddb (R7) in shard aeac04a66b927d48c8102f6a85030c756dd33f46
81924:S 26 May 2026 20:46:27.276 * Failover election won: I'm the new primary.

@coderabbitai

coderabbitai Bot commented May 26, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Walkthrough

Adds explicit FAILOVER_AUTH_NACK messaging and handling: wire contracts and capability flag, send/receive helpers, denial-path NACKs, NACK counting with election fast-fail, debug-configurable failover delay, and expanded parametrized tests.

Changes

Cluster Failover Authorization NACK Support

| Layer / File(s) | Summary |
|---|---|
| Wire protocol and capability contracts (src/cluster_legacy.h, src/cluster_legacy.c) | New message type CLUSTERMSG_TYPE_FAILOVER_AUTH_NACK, clusterMsgDataFailoverNack payload and union member, reason codes, CLUSTER_NODE_FAILOVER_AUTH_NACK_SUPPORTED flag and accessor, init failover_auth_nack_count, and message-name mapping. |
| Debug configuration and command (src/server.h, src/server.c, src/debug.c) | Adds server.debug_cluster_failover_delay (initialized to -1) and DEBUG CLUSTER-FAILOVER-DELAY <delay-ms> command with help text and validation. |
| Packet validation, capability tracking, send helper (src/cluster_legacy.c) | Validate FAILOVER_AUTH_NACK packets, update peer capability from header flags, route NACK packets to handler, add clusterNackReasonString() and clusterSendFailoverNack() that only send to capable peers. |
| Failover auth rejection with NACKs (src/cluster_legacy.c) | clusterSendFailoverAuthIfNeeded() now sends explicit NACKs for denial reasons (NOT_SAFE, REQ_EPOCH_OLD, ALREADY_VOTED, PRIMARY_UP, STALE_CONFIG); stale-config path sends UPDATE then STALE_CONFIG NACK. |
| NACK processing and election flow (src/cluster_legacy.c) | clusterProcessFailoverAuthNack() increments failover_auth_nack_count, logs reason, fast-fails elections when quorum is unreachable; retry scheduling resets NACK count and debug delay overrides computed election delay. |
| Parametrized failover test (tests/unit/cluster/failover2.tcl) | Refactors into test_same_epoch {delay} and runs scenario with delays 500 and 0; extends test_replica_config_epoch_failover with drop_nack to exercise legacy timeout vs fast-fail paths and expands the test matrix. |
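
The reason codes and their log labels could be mapped roughly as follows. This is a sketch only: the enum name, its numeric values and the helper name are placeholders, and every label except "already-voted" (which appears in the replica logs above) is a guess that follows the same pattern.

```c
#include <stdint.h>

/* Placeholder values: only the reason names come from the walkthrough. */
enum nackReason {
    NACK_NOT_SAFE = 1,
    NACK_REQ_EPOCH_OLD,
    NACK_ALREADY_VOTED,
    NACK_PRIMARY_UP,
    NACK_STALE_CONFIG
};

/* Hypothetical counterpart of a reason-to-string helper for log lines. */
static const char *nackReasonString(uint8_t reason) {
    switch (reason) {
    case NACK_NOT_SAFE: return "not-safe";           /* guessed label */
    case NACK_REQ_EPOCH_OLD: return "req-epoch-old"; /* guessed label */
    case NACK_ALREADY_VOTED: return "already-voted"; /* seen in the logs */
    case NACK_PRIMARY_UP: return "primary-up";       /* guessed label */
    case NACK_STALE_CONFIG: return "stale-config";   /* guessed label */
    default: return "unknown";                       /* tolerate newer senders */
    }
}
```

Returning a fallback string for unrecognized codes lets a receiver log a reason added by a newer sender without failing.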

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Suggested reviewers

  • zuiderkwast
  • lucasyonge
  • ranshid

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 64.29% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title directly and clearly describes the main change: introducing a new FAILOVER_AUTH_NACK message to speed up split-vote elections, which is the core feature of this PR. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Description check | ✅ Passed | The pull request description clearly explains the motivation, implementation approach, and wire compatibility guarantees for the new FAILOVER_AUTH_NACK message type. |

@coderabbitai coderabbitai Bot left a comment

🧹 Nitpick comments (3)
src/cluster_legacy.h (1)

147-150: ⚡ Quick win

Pin the new wire payload size with a static assert.

This struct is now part of the cluster wire contract. A compile-time size check here would catch accidental padding or field changes before they silently alter the on-the-wire format.

♻️ Proposed fix
 typedef struct {
     uint8_t reason;
     char notused1[24];
 } clusterMsgDataFailoverNack;
+static_assert(sizeof(clusterMsgDataFailoverNack) == 25, "unexpected FAILOVER_AUTH_NACK payload size");
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/cluster_legacy.h` around lines 147 - 150, The new cluster wire struct
clusterMsgDataFailoverNack must have its on-the-wire size pinned; add a
compile-time assertion (e.g., static_assert or _Static_assert depending on
project C standard) that sizeof(clusterMsgDataFailoverNack) == 25 (1 byte for
reason + 24 bytes for notused1) with a clear message indicating this pins the
wire format, so any accidental padding/field changes will fail to compile; place
the assertion next to the struct definition and use the same symbol name in the
assertion.
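
A standalone illustration of the suggested check, assuming a C11 compiler. `nackPayload` is an illustrative copy of the struct quoted above, not the project's definition.

```c
#include <assert.h> /* provides the static_assert macro in C11 */
#include <stdint.h>

/* Illustrative copy of the payload layout from the review comment. */
typedef struct {
    uint8_t reason;
    char notused1[24];
} nackPayload;

/* Both members have alignment 1, so no padding is expected: 1 + 24 bytes.
 * If a wider field is added later, compilation fails here. */
static_assert(sizeof(nackPayload) == 25, "unexpected NACK payload size");
```

The check costs nothing at run time, and it turns a silent wire-format change into a build error.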
src/cluster_legacy.c (1)

70-71: ⚡ Quick win

Make the new NACK helpers file-local.

clusterSendFailoverNack() and clusterProcessFailoverAuthNack() are only used inside this translation unit, so keeping them non-static needlessly widens the symbol surface.

Proposed change
-void clusterProcessFailoverAuthNack(clusterNode *sender, clusterMsg *request);
-void clusterSendFailoverNack(clusterNode *node, uint8_t reason);
+static void clusterProcessFailoverAuthNack(clusterNode *sender, clusterMsg *request);
+static void clusterSendFailoverNack(clusterNode *node, uint8_t reason);
...
-void clusterSendFailoverNack(clusterNode *node, uint8_t reason) {
+static void clusterSendFailoverNack(clusterNode *node, uint8_t reason) {
...
-void clusterProcessFailoverAuthNack(clusterNode *sender, clusterMsg *request) {
+static void clusterProcessFailoverAuthNack(clusterNode *sender, clusterMsg *request) {

As per coding guidelines "Use static keyword for file-local functions in C code".

Also applies to: 5312-5324, 5443-5471

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/cluster_legacy.c` around lines 70 - 71, The two helper functions
clusterSendFailoverNack and clusterProcessFailoverAuthNack are only used within
this translation unit and should be made file-local: add the static qualifier to
their forward declarations and their function definitions (and to any other
internally-used helpers referenced in the same areas noted, e.g., the helpers
around the 5312-5324 and 5443-5471 regions) so their symbols are not exported
from the object file; ensure prototypes and definitions match (both marked
static).
tests/unit/cluster/failover2.tcl (1)

79-79: ⚡ Quick win

Consider using the pause_process helper for consistency.

Line 79 uses exec kill -SIGSTOP directly, while the rest of the file uses the pause_process helper (lines 15, 34, 120, 182). If atomic pausing is intentional for same-epoch testing, consider documenting the rationale; otherwise, prefer the framework helper:

♻️ Align with test framework pattern
-        exec kill -SIGSTOP [srv 0 pid] [srv -1 pid] [srv -2 pid]
+        pause_process [srv 0 pid]
+        pause_process [srv -1 pid]
+        pause_process [srv -2 pid]
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/unit/cluster/failover2.tcl` at line 79, Replace the direct call to exec
kill -SIGSTOP used on the test processes with the existing pause_process helper
for consistency with this test suite (the helper is used at lines where
pause_process is referenced); locate the line containing exec kill -SIGSTOP [srv
0 pid] [srv -1 pid] [srv -2 pid] and change it to call pause_process for each
target process (or document in a short comment why an atomic SIGSTOP across
multiple pids is required if you intentionally need simultaneous pausing),
ensuring you use the same helper signature and import/context as other uses in
this file.
ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 5e5ebad9-589c-475d-8218-e3d48f92cb1b

📥 Commits

Reviewing files that changed from the base of the PR and between d9ba5ab and 887fccb.

📒 Files selected for processing (6)
  • src/cluster_legacy.c
  • src/cluster_legacy.h
  • src/debug.c
  • src/server.c
  • src/server.h
  • tests/unit/cluster/failover2.tcl

@enjoy-binbin enjoy-binbin requested a review from zuiderkwast June 5, 2026 07:45

@zuiderkwast zuiderkwast left a comment

Contributor

Looks good in general.

Have you considered using the light header for NACK? If we use it from the start, we don't need to support the non-light variant.

Comment thread src/cluster_legacy.c
Comment thread src/debug.c Outdated
Comment thread src/cluster_legacy.c
Comment thread src/cluster_legacy.c

@sarthakaggarwal97 sarthakaggarwal97 left a comment

Contributor

I like this idea, and the code looks mostly good. I would also like to run it with valkey-fuzzer to see whether it finds any issues. I shared one concern (not sure if it's valid).

@enjoy-binbin

Member Author

Have you considered using the light header for NACK? If we use it from the start, we don't need to support the non-light variant.

Yes, I did consider using a light header, but we do need the configEpoch information (which is currently in clusterMsg). Of course, we could also include the epoch information in the data payload, but there are some conditional checks I don't want to touch:

    if (sender && !nodeInHandshake(sender)) {
        /* Update our currentEpoch if we see a newer epoch in the cluster. */
        sender_claimed_current_epoch = ntohu64(msg->currentEpoch);
        sender_claimed_config_epoch = ntohu64(msg->configEpoch);

Another reason is that I want to keep the AUTH_REQUEST / AUTH_ACK / AUTH_NACK handling logic in one place, keeping symmetry across these three election messages. NACK is point-to-point and only emitted during failed elections, so I think the overhead is acceptable.

But of course, we could use the light header for this type.

enjoy-binbin and others added 2 commits June 9, 2026 10:33
Co-authored-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
Signed-off-by: Binbin <binloveplay1314@qq.com>
Signed-off-by: Binbin <binloveplay1314@qq.com>
…the failover" test

Signed-off-by: Binbin <binloveplay1314@qq.com>
Signed-off-by: Binbin <binloveplay1314@qq.com>
@enjoy-binbin enjoy-binbin added the run-extra-tests Run extra tests on this PR (Runs all tests from daily except valgrind and RESP) label Jun 9, 2026

@coderabbitai coderabbitai Bot left a comment

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
tests/unit/cluster/failover2.tcl (2)

78-78: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Typo in comment.

"Killing there primary nodes" should be "Killing three primary nodes".

-        # Killing there primary nodes.
+        # Killing three primary nodes.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/unit/cluster/failover2.tcl` at line 78, Typo in test comment: update
the comment string "Killing there primary nodes." in
tests/unit/cluster/failover2.tcl to read "Killing three primary nodes." so the
intent is clear; locate the commented line containing that exact phrase and
replace "there" with "three".

70-70: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Grammar issue in test name.

The test name contains "then they are elected" which should be "when they are elected" for grammatical correctness.

-    test "Primaries will not time out then they are elected in the same epoch - delay $delay" {
+    test "Primaries will not time out when they are elected in the same epoch - delay $delay" {
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/unit/cluster/failover2.tcl` at line 70, The test name string "Primaries
will not time out then they are elected in the same epoch - delay $delay" has a
grammar mistake; update the test declaration to replace "then" with "when" so it
reads "Primaries will not time out when they are elected in the same epoch -
delay $delay" (locate the test block that begins with the same quoted title and
change the text there).

Source: Coding guidelines

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 2f09e5f2-a421-48f6-843d-4f1f753114f9

📥 Commits

Reviewing files that changed from the base of the PR and between 6a9ddad and 64fac40.

📒 Files selected for processing (2)
  • src/cluster_legacy.c
  • tests/unit/cluster/failover2.tcl
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/cluster_legacy.c

@codecov

codecov Bot commented Jun 9, 2026


Codecov Report

❌ Patch coverage is 82.45614% with 10 lines in your changes missing coverage. Please review.
✅ Project coverage is 76.75%. Comparing base (6774c09) to head (573ece4).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/cluster_legacy.c | 83.33% | 8 Missing ⚠️ |
| src/debug.c | 75.00% | 2 Missing ⚠️ |

Additional details and impacted files
@@             Coverage Diff              @@
##           unstable    #3833      +/-   ##
============================================
+ Coverage     76.59%   76.75%   +0.16%     
============================================
  Files           162      162              
  Lines         80788    80845      +57     
============================================
+ Hits          61881    62056     +175     
+ Misses        18907    18789     -118     

| Files with missing lines | Coverage Δ |
|---|---|
| src/server.c | 89.50% <100.00%> (+<0.01%) ⬆️ |
| src/server.h | 100.00% <ø> (ø) |
| src/debug.c | 55.54% <75.00%> (+0.13%) ⬆️ |
| src/cluster_legacy.c | 88.46% <83.33%> (+0.35%) ⬆️ |

... and 25 files with indirect coverage changes

Signed-off-by: Binbin <binloveplay1314@qq.com>
@zuiderkwast

Contributor

Another reason is that I want to keep the AUTH_REQUEST / AUTH_ACK / AUTH_NACK handling logic in one place, keeping symmetry across these three election messages. NACK is point-to-point and only emitted during failed elections, so I think the overhead is acceptable.

But of course, we could use the light header for this type.

Let's use the normal header for NACK now. In the future, we can add a light variant of all of these.

@zuiderkwast zuiderkwast left a comment

Contributor

LGTM. Some comments/questions. You can check if you agree or not, or just merge.

Comment thread src/cluster_legacy.c Outdated
Comment thread src/cluster_legacy.c Outdated
Comment thread src/cluster_legacy.c Outdated
Comment thread src/cluster_legacy.c Outdated
@zuiderkwast zuiderkwast added the needs-doc-pr This change needs to update a documentation page. Remove label once doc PR is open. label Jun 9, 2026
Signed-off-by: Binbin <binloveplay1314@qq.com>
Signed-off-by: Binbin <binloveplay1314@qq.com>

@sarthakaggarwal97 sarthakaggarwal97 left a comment

Contributor

I did try this out with valkey-fuzzer and it looks to be passing 100 fuzzer runs: https://github.com/valkey-io/valkey-fuzzer/actions/runs/27182824004

Signed-off-by: Binbin <binloveplay1314@qq.com>
Comment thread src/cluster_legacy.h Outdated
Signed-off-by: Binbin <binloveplay1314@qq.com>
Signed-off-by: Binbin <binloveplay1314@qq.com>
@zuiderkwast

Contributor

@enjoy-binbin We can merge this. Just out of curiosity, I want to try this bot to see if there are any interesting comments.

@valkey-review-bot Please review this PR.

@valkey-review-bot

valkey-review-bot Bot commented Jun 17, 2026


The NACK protocol itself looks solid — the capability gating (nodeSupportsFailoverAuthNack on send, the self-advertised flag on receive), the wire-length symmetry in clusterIsValidPacket/clusterSendFailoverNack, and the fast-fail quorum check all hold up, and the build is clean. One unrelated file got swept into the PR; details inline.

Comment thread design-docs/redis62-valkey90-command-compat.md Outdated
Signed-off-by: Binbin <binloveplay1314@qq.com>