Skip to content

Fix adjacent slot range behavior in ASM operations#15

Open
tomerqodo wants to merge 8 commits into
coderabbit_combined_20260121_augment_sentry_coderabbit_1_base_fix_adjacent_slot_range_behavior_in_asm_operations_pr92from
coderabbit_combined_20260121_augment_sentry_coderabbit_1_head_fix_adjacent_slot_range_behavior_in_asm_operations_pr92
Open

Fix adjacent slot range behavior in ASM operations#15
tomerqodo wants to merge 8 commits into
coderabbit_combined_20260121_augment_sentry_coderabbit_1_base_fix_adjacent_slot_range_behavior_in_asm_operations_pr92from
coderabbit_combined_20260121_augment_sentry_coderabbit_1_head_fix_adjacent_slot_range_behavior_in_asm_operations_pr92

Conversation

@tomerqodo

@tomerqodo tomerqodo commented Jan 22, 2026

Copy link
Copy Markdown

Benchmark PR from qodo-benchmark#92

Summary by CodeRabbit

Release Notes

  • New Features

    • Slot ranges are now automatically normalized (sorted and merged during validation) for consistent handling
  • Bug Fixes

    • Improved error handling and resource cleanup during slot range operations
    • Enhanced error messages for invalid slot ranges and duplicate slots
  • Tests

    • Added tests validating automatic slot range merging for adjacent and unsorted input ranges

✏️ Tip: You can customize this high-level summary in your review settings.

@coderabbitai

coderabbitai Bot commented Jan 22, 2026

Copy link
Copy Markdown

Walkthrough

The PR refactors slot range handling in cluster operations by introducing a normalization step during validation, renaming the validation API, adding a utility for sorting and merging ranges, strengthening equality checks, improving error handling in import task creation with proper cleanup paths, and adding observability through key counting in event logs. New unit tests validate slot merging behavior across multiple scenarios.

Changes

Cohort / File(s) Summary
Cluster slot range core
src/cluster.h, src/cluster.c
Renamed validateSlotRanges to slotRangeArrayNormalizeAndValidate to reflect new semantics. Added slotRangeArraySortAndMerge utility for encapsulating sort-and-merge logic. Enhanced slotRangeArrayIsEqual to normalize before comparison. Made slotRangeArrayToString resilient to NULL input. Updated validation flow to normalize before checking for out-of-range, inverted, or duplicate slots.
Cluster ASM implementation
src/cluster_asm.c
Refactored import task creation with goto-based error cleanup for resource deallocation. Added asmCountKeysInSlots function to count keys across slot ranges. Integrated normalization and validation before creating import tasks. Enhanced logging in asmLogTaskEvent and task lifecycle functions to report key counts for improved observability.
Unit tests
tests/unit/cluster/atomic-slot-migration.tcl
Added test cases validating slot merging behavior for adjacent and unsorted ranges during import operations, confirming ranges are consolidated correctly across source and destination nodes.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Poem

🐰 Slots were scattered, wild and free,
Now sorted, merged in harmony,
Normalization sets them straight,
Validation checks each gate,
Errors cleaned with care so bright—
Rabbit hops with code delight!

🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 68.75% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title directly corresponds to the main changes: introducing slot range normalization (sorting/merging) in ASM operations via the new slotRangeArrayNormalizeAndValidate function and related utilities.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
  • 📝 Generate docstrings

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Fix all issues with AI agents
In `@src/cluster_asm.c`:
- Around line 1011-1023: The function asmCountKeysInSlots contains an unused
local variable total_ranges which triggers -Werror; remove the declaration of
total_ranges from asmCountKeysInSlots (or alternatively use it in the loop) so
there are no unused locals—update the function to iterate using
slots->num_ranges directly and keep the rest of the logic (kvstoreDictSize calls
and accumulation into key_count) unchanged.

In `@src/cluster.c`:
- Around line 2014-2019: The error string produced by
slotRangeArrayNormalizeAndValidate is freed without informing the client; update
the failure path where sds err is set so that before sdsfree(err) and
slotRangeArrayFree(slots) you call addReplyError(c, err) (using the existing
client variable c) to send the validation message, then free err and slots as
before; ensure you only call addReplyError when err is non-NULL to avoid null
deref.
🧹 Nitpick comments (1)
src/cluster.c (1)

1876-1889: Side effect: input arrays are mutated during equality check.

slotRangeArraySortAndMerge modifies the input arrays in-place. Callers expecting their arrays to remain unchanged after an equality check may encounter unexpected behavior.

Consider either:

  1. Documenting this mutation in a comment above the function
  2. Making copies before sorting/merging if callers need to preserve original order

If this is intentional and callers are aware, a brief comment would clarify the contract.

📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3650a58 and 6ad1eae.

📒 Files selected for processing (4)
  • src/cluster.c
  • src/cluster.h
  • src/cluster_asm.c
  • tests/unit/cluster/atomic-slot-migration.tcl
🧰 Additional context used
🧬 Code graph analysis (3)
src/cluster.h (1)
src/cluster.c (2)
  • slotRangeArraySortAndMerge (1860-1873)
  • slotRangeArrayNormalizeAndValidate (1736-1771)
src/cluster_asm.c (4)
src/cluster_legacy.c (1)
  • getMyClusterNode (5832-5834)
src/sds.c (3)
  • sdsnew (203-206)
  • sdscatprintf (635-642)
  • sdsempty (198-200)
src/cluster.c (3)
  • slotRangeArrayFree (1944-1946)
  • slotRangeArrayDup (1781-1785)
  • slotRangeArrayNormalizeAndValidate (1736-1771)
src/kvstore.c (1)
  • kvstoreDictSize (706-712)
src/cluster.c (2)
src/sds.c (3)
  • sdscatprintf (635-642)
  • sdsempty (198-200)
  • sdsfree (214-217)
src/networking.c (1)
  • addReplyErrorArity (704-707)
🪛 GitHub Actions: CI
src/cluster_asm.c

[error] 1016-1016: unused variable 'total_ranges' [-Werror=unused-variable]. Build failed in step: make -C src all (warnings are treated as errors).

🪛 GitHub Actions: External Server Tests
src/cluster_asm.c

[error] 1016-1016: unused variable 'total_ranges' [-Werror=unused-variable]

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
  • GitHub Check: build-old-chain-jemalloc
  • GitHub Check: build-32bit
  • GitHub Check: code-coverage
  • GitHub Check: Analyze (cpp)
  • GitHub Check: test-external-standalone
  • GitHub Check: build-macos-latest
  • GitHub Check: code-coverage
🔇 Additional comments (10)
src/cluster.h (1)

187-197: LGTM! Header declarations align with implementation.

The new slotRangeArraySortAndMerge declaration and the renamed slotRangeArrayNormalizeAndValidate function are consistent with the implementations in src/cluster.c. The naming follows the existing slotRangeArray* convention.

src/cluster.c (4)

1734-1771: LGTM! Normalization and validation logic is correct.

The function properly:

  1. Validates the number of ranges first (before sorting)
  2. Sorts and merges adjacent ranges
  3. Validates bounds, ordering, and detects overlaps via the used_slots bitmap

The ordering is appropriate since sorting must happen before the overlap detection via the bitmap.


1794-1806: LGTM! Good defensive NULL check.

The added NULL safety check at line 1796 prevents potential null pointer dereference and returns an empty string consistently.


1855-1873: LGTM! Merge logic is correct for adjacent-only merging.

The function correctly:

  1. Guards against NULL/single-element inputs
  2. Sorts by start slot
  3. Merges only truly adjacent ranges (end + 1 == start)
  4. Preserves overlapping ranges (which will be caught by validation)

The comment clearly documents that overlapping ranges are intentionally not merged.


1832-1837: LGTM! Proper error handling in parsing function.

The normalization call and error message cleanup are handled correctly for this internal parsing function that returns NULL on failure.

tests/unit/cluster/atomic-slot-migration.tcl (1)

514-548: LGTM! Comprehensive test coverage for adjacent slot range merging.

The test covers key scenarios:

  1. Simple adjacent ranges merging (7000-7001 7002-71007000-7100)
  2. Unsorted input producing multiple disjoint ranges after normalization
  3. Single-slot ranges merging into a contiguous range

Verifying consistent results on both source and destination nodes ensures the normalization is applied uniformly across the cluster.

src/cluster_asm.c (4)

1036-1055: Key-count enrichment in lifecycle logs looks good.

The additional key counts improve observability for import/migrate events.


2869-2898: Normalization before import task creation is a solid safety net.

Duplicating and normalizing ranges prior to task creation reduces ambiguity for adjacent/unsorted ranges and keeps validation consistent.


3303-3308: Consistent key counting for active trim start is helpful.

Good use of the shared helper to keep metrics/logging consistent across events.


798-847: Add NULL check after asmTaskCreate call to prevent NULL pointer dereference.

Ownership transfer semantics at call sites are correct—both callers (line 865, line 2877) pass ownership to asmCreateImportTask, which either consumes the slots or frees them on error paths. No UAF/double-free risk there.

However, asmTaskCreate can fail due to memory allocation, but line 829–835 dereferences task without NULL checking:

asmTask *task = asmTaskCreate(task_id);
task->slots = slots;  // NULL dereference if asmTaskCreate returned NULL

Add a NULL check after asmTaskCreate and handle the failure:

asmTask *task = asmTaskCreate(task_id);
if (!task) {
    *err = sdsnew("failed to create ASM task");
    slotRangeArrayFree(slots);
    return NULL;
}

Also remove the unused total_ranges variable in asmCountKeysInSlots (line 1016) to fix build failure.

✏️ Tip: You can disable this entire section by setting review_details to false in your review settings.

Comment thread src/cluster_asm.c
Comment on lines +1011 to +1023
/* Return the number of keys in the specified slot ranges. */
unsigned long long asmCountKeysInSlots(slotRangeArray *slots) {
if (!slots) return 0;

unsigned long long key_count = 0;
int total_ranges = slots->num_ranges;
for (int i = 0; i < slots->num_ranges; i++) {
for (int j = slots->ranges[i].start; j <= slots->ranges[i].end; j++) {
key_count += kvstoreDictSize(server.db[0].keys, j);
}
}
return key_count;
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Remove unused variable to fix -Werror build break.

CI fails due to the unused total_ranges variable in asmCountKeysInSlots. Please remove it (or use it).

🛠️ Proposed fix
 unsigned long long asmCountKeysInSlots(slotRangeArray *slots) {
     if (!slots) return 0;

     unsigned long long key_count = 0;
-    int total_ranges = slots->num_ranges;
     for (int i = 0; i < slots->num_ranges; i++) {
         for (int j = slots->ranges[i].start; j <= slots->ranges[i].end; j++) {
             key_count += kvstoreDictSize(server.db[0].keys, j);
         }
     }
     return key_count;
 }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
/* Return the number of keys in the specified slot ranges. */
unsigned long long asmCountKeysInSlots(slotRangeArray *slots) {
if (!slots) return 0;
unsigned long long key_count = 0;
int total_ranges = slots->num_ranges;
for (int i = 0; i < slots->num_ranges; i++) {
for (int j = slots->ranges[i].start; j <= slots->ranges[i].end; j++) {
key_count += kvstoreDictSize(server.db[0].keys, j);
}
}
return key_count;
}
unsigned long long asmCountKeysInSlots(slotRangeArray *slots) {
if (!slots) return 0;
unsigned long long key_count = 0;
for (int i = 0; i < slots->num_ranges; i++) {
for (int j = slots->ranges[i].start; j <= slots->ranges[i].end; j++) {
key_count += kvstoreDictSize(server.db[0].keys, j);
}
}
return key_count;
}
🧰 Tools
🪛 GitHub Actions: CI

[error] 1016-1016: unused variable 'total_ranges' [-Werror=unused-variable]. Build failed in step: make -C src all (warnings are treated as errors).

🪛 GitHub Actions: External Server Tests

[error] 1016-1016: unused variable 'total_ranges' [-Werror=unused-variable]

🤖 Prompt for AI Agents
In `@src/cluster_asm.c` around lines 1011 - 1023, The function asmCountKeysInSlots
contains an unused local variable total_ranges which triggers -Werror; remove
the declaration of total_ranges from asmCountKeysInSlots (or alternatively use
it in the loop) so there are no unused locals—update the function to iterate
using slots->num_ranges directly and keep the rest of the logic (kvstoreDictSize
calls and accumulation into key_count) unchanged.

Comment thread src/cluster.c
Comment on lines 2014 to 2019
sds err = NULL;
if (validateSlotRanges(slots, &err) != C_OK) {
addReplyErrorSds(c, err);
if (slotRangeArrayNormalizeAndValidate(slots, &err) != C_OK) {
sdsfree(err);
slotRangeArrayFree(slots);
return NULL;
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Validation error message is discarded; client receives no feedback.

When slotRangeArrayNormalizeAndValidate fails, the specific error message (e.g., "Slot X specified multiple times") is freed without being sent to the client. The function returns NULL, but callers may not provide any error reply.

Consider using addReplyError(c, err) before freeing to inform the client of the specific validation failure:

Suggested fix
     sds err = NULL;
     if (slotRangeArrayNormalizeAndValidate(slots, &err) != C_OK) {
+        addReplyError(c, err);
         sdsfree(err);
         slotRangeArrayFree(slots);
         return NULL;
     }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
sds err = NULL;
if (validateSlotRanges(slots, &err) != C_OK) {
addReplyErrorSds(c, err);
if (slotRangeArrayNormalizeAndValidate(slots, &err) != C_OK) {
sdsfree(err);
slotRangeArrayFree(slots);
return NULL;
}
sds err = NULL;
if (slotRangeArrayNormalizeAndValidate(slots, &err) != C_OK) {
addReplyError(c, err);
sdsfree(err);
slotRangeArrayFree(slots);
return NULL;
}
🤖 Prompt for AI Agents
In `@src/cluster.c` around lines 2014 - 2019, The error string produced by
slotRangeArrayNormalizeAndValidate is freed without informing the client; update
the failure path where sds err is set so that before sdsfree(err) and
slotRangeArrayFree(slots) you call addReplyError(c, err) (using the existing
client variable c) to send the validation message, then free err and slots as
before; ensure you only call addReplyError when err is non-NULL to avoid null
deref.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants