Fix adjacent slot range behavior in ASM operations#15
Conversation
WalkthroughThe PR refactors slot range handling in cluster operations by introducing a normalization step during validation, renaming the validation API, adding a utility for sorting and merging ranges, strengthening equality checks, improving error handling in import task creation with proper cleanup paths, and adding observability through key counting in event logs. New unit tests validate slot merging behavior across multiple scenarios. Changes
Estimated code review effort🎯 4 (Complex) | ⏱️ ~50 minutes Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches
Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In `@src/cluster_asm.c`:
- Around line 1011-1023: The function asmCountKeysInSlots contains an unused
local variable total_ranges which triggers -Werror; remove the declaration of
total_ranges from asmCountKeysInSlots (or alternatively use it in the loop) so
there are no unused locals—update the function to iterate using
slots->num_ranges directly and keep the rest of the logic (kvstoreDictSize calls
and accumulation into key_count) unchanged.
In `@src/cluster.c`:
- Around line 2014-2019: The error string produced by
slotRangeArrayNormalizeAndValidate is freed without informing the client; update
the failure path where sds err is set so that before sdsfree(err) and
slotRangeArrayFree(slots) you call addReplyError(c, err) (using the existing
client variable c) to send the validation message, then free err and slots as
before; ensure you only call addReplyError when err is non-NULL to avoid null
deref.
🧹 Nitpick comments (1)
src/cluster.c (1)
1876-1889: Side effect: input arrays are mutated during equality check.
slotRangeArraySortAndMergemodifies the input arrays in-place. Callers expecting their arrays to remain unchanged after an equality check may encounter unexpected behavior.Consider either:
- Documenting this mutation in a comment above the function
- Making copies before sorting/merging if callers need to preserve original order
If this is intentional and callers are aware, a brief comment would clarify the contract.
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
src/cluster.csrc/cluster.hsrc/cluster_asm.ctests/unit/cluster/atomic-slot-migration.tcl
🧰 Additional context used
🧬 Code graph analysis (3)
src/cluster.h (1)
src/cluster.c (2)
slotRangeArraySortAndMerge(1860-1873)slotRangeArrayNormalizeAndValidate(1736-1771)
src/cluster_asm.c (4)
src/cluster_legacy.c (1)
getMyClusterNode(5832-5834)src/sds.c (3)
sdsnew(203-206)sdscatprintf(635-642)sdsempty(198-200)src/cluster.c (3)
slotRangeArrayFree(1944-1946)slotRangeArrayDup(1781-1785)slotRangeArrayNormalizeAndValidate(1736-1771)src/kvstore.c (1)
kvstoreDictSize(706-712)
src/cluster.c (2)
src/sds.c (3)
sdscatprintf(635-642)sdsempty(198-200)sdsfree(214-217)src/networking.c (1)
addReplyErrorArity(704-707)
🪛 GitHub Actions: CI
src/cluster_asm.c
[error] 1016-1016: unused variable 'total_ranges' [-Werror=unused-variable]. Build failed in step: make -C src all (warnings are treated as errors).
🪛 GitHub Actions: External Server Tests
src/cluster_asm.c
[error] 1016-1016: unused variable 'total_ranges' [-Werror=unused-variable]
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
- GitHub Check: build-old-chain-jemalloc
- GitHub Check: build-32bit
- GitHub Check: code-coverage
- GitHub Check: Analyze (cpp)
- GitHub Check: test-external-standalone
- GitHub Check: build-macos-latest
- GitHub Check: code-coverage
🔇 Additional comments (10)
src/cluster.h (1)
187-197: LGTM! Header declarations align with implementation.The new
slotRangeArraySortAndMergedeclaration and the renamedslotRangeArrayNormalizeAndValidatefunction are consistent with the implementations insrc/cluster.c. The naming follows the existingslotRangeArray*convention.src/cluster.c (4)
1734-1771: LGTM! Normalization and validation logic is correct.The function properly:
- Validates the number of ranges first (before sorting)
- Sorts and merges adjacent ranges
- Validates bounds, ordering, and detects overlaps via the
used_slotsbitmapThe ordering is appropriate since sorting must happen before the overlap detection via the bitmap.
1794-1806: LGTM! Good defensive NULL check.The added NULL safety check at line 1796 prevents potential null pointer dereference and returns an empty string consistently.
1855-1873: LGTM! Merge logic is correct for adjacent-only merging.The function correctly:
- Guards against NULL/single-element inputs
- Sorts by start slot
- Merges only truly adjacent ranges (
end + 1 == start)- Preserves overlapping ranges (which will be caught by validation)
The comment clearly documents that overlapping ranges are intentionally not merged.
1832-1837: LGTM! Proper error handling in parsing function.The normalization call and error message cleanup are handled correctly for this internal parsing function that returns NULL on failure.
tests/unit/cluster/atomic-slot-migration.tcl (1)
514-548: LGTM! Comprehensive test coverage for adjacent slot range merging.The test covers key scenarios:
- Simple adjacent ranges merging (
7000-7001 7002-7100→7000-7100)- Unsorted input producing multiple disjoint ranges after normalization
- Single-slot ranges merging into a contiguous range
Verifying consistent results on both source and destination nodes ensures the normalization is applied uniformly across the cluster.
src/cluster_asm.c (4)
1036-1055: Key-count enrichment in lifecycle logs looks good.The additional key counts improve observability for import/migrate events.
2869-2898: Normalization before import task creation is a solid safety net.Duplicating and normalizing ranges prior to task creation reduces ambiguity for adjacent/unsorted ranges and keeps validation consistent.
3303-3308: Consistent key counting for active trim start is helpful.Good use of the shared helper to keep metrics/logging consistent across events.
798-847: Add NULL check after asmTaskCreate call to prevent NULL pointer dereference.Ownership transfer semantics at call sites are correct—both callers (line 865, line 2877) pass ownership to
asmCreateImportTask, which either consumes the slots or frees them on error paths. No UAF/double-free risk there.However,
asmTaskCreatecan fail due to memory allocation, but line 829–835 dereferencestaskwithout NULL checking:asmTask *task = asmTaskCreate(task_id); task->slots = slots; // NULL dereference if asmTaskCreate returned NULLAdd a NULL check after
asmTaskCreateand handle the failure:asmTask *task = asmTaskCreate(task_id); if (!task) { *err = sdsnew("failed to create ASM task"); slotRangeArrayFree(slots); return NULL; }Also remove the unused
total_rangesvariable inasmCountKeysInSlots(line 1016) to fix build failure.
✏️ Tip: You can disable this entire section by setting review_details to false in your review settings.
| /* Return the number of keys in the specified slot ranges. */ | ||
| unsigned long long asmCountKeysInSlots(slotRangeArray *slots) { | ||
| if (!slots) return 0; | ||
|
|
||
| unsigned long long key_count = 0; | ||
| int total_ranges = slots->num_ranges; | ||
| for (int i = 0; i < slots->num_ranges; i++) { | ||
| for (int j = slots->ranges[i].start; j <= slots->ranges[i].end; j++) { | ||
| key_count += kvstoreDictSize(server.db[0].keys, j); | ||
| } | ||
| } | ||
| return key_count; | ||
| } |
There was a problem hiding this comment.
Remove unused variable to fix -Werror build break.
CI fails due to the unused total_ranges variable in asmCountKeysInSlots. Please remove it (or use it).
🛠️ Proposed fix
unsigned long long asmCountKeysInSlots(slotRangeArray *slots) {
if (!slots) return 0;
unsigned long long key_count = 0;
- int total_ranges = slots->num_ranges;
for (int i = 0; i < slots->num_ranges; i++) {
for (int j = slots->ranges[i].start; j <= slots->ranges[i].end; j++) {
key_count += kvstoreDictSize(server.db[0].keys, j);
}
}
return key_count;
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| /* Return the number of keys in the specified slot ranges. */ | |
| unsigned long long asmCountKeysInSlots(slotRangeArray *slots) { | |
| if (!slots) return 0; | |
| unsigned long long key_count = 0; | |
| int total_ranges = slots->num_ranges; | |
| for (int i = 0; i < slots->num_ranges; i++) { | |
| for (int j = slots->ranges[i].start; j <= slots->ranges[i].end; j++) { | |
| key_count += kvstoreDictSize(server.db[0].keys, j); | |
| } | |
| } | |
| return key_count; | |
| } | |
| unsigned long long asmCountKeysInSlots(slotRangeArray *slots) { | |
| if (!slots) return 0; | |
| unsigned long long key_count = 0; | |
| for (int i = 0; i < slots->num_ranges; i++) { | |
| for (int j = slots->ranges[i].start; j <= slots->ranges[i].end; j++) { | |
| key_count += kvstoreDictSize(server.db[0].keys, j); | |
| } | |
| } | |
| return key_count; | |
| } |
🧰 Tools
🪛 GitHub Actions: CI
[error] 1016-1016: unused variable 'total_ranges' [-Werror=unused-variable]. Build failed in step: make -C src all (warnings are treated as errors).
🪛 GitHub Actions: External Server Tests
[error] 1016-1016: unused variable 'total_ranges' [-Werror=unused-variable]
🤖 Prompt for AI Agents
In `@src/cluster_asm.c` around lines 1011 - 1023, The function asmCountKeysInSlots
contains an unused local variable total_ranges which triggers -Werror; remove
the declaration of total_ranges from asmCountKeysInSlots (or alternatively use
it in the loop) so there are no unused locals—update the function to iterate
using slots->num_ranges directly and keep the rest of the logic (kvstoreDictSize
calls and accumulation into key_count) unchanged.
| sds err = NULL; | ||
| if (validateSlotRanges(slots, &err) != C_OK) { | ||
| addReplyErrorSds(c, err); | ||
| if (slotRangeArrayNormalizeAndValidate(slots, &err) != C_OK) { | ||
| sdsfree(err); | ||
| slotRangeArrayFree(slots); | ||
| return NULL; | ||
| } |
There was a problem hiding this comment.
Validation error message is discarded; client receives no feedback.
When slotRangeArrayNormalizeAndValidate fails, the specific error message (e.g., "Slot X specified multiple times") is freed without being sent to the client. The function returns NULL, but callers may not provide any error reply.
Consider using addReplyError(c, err) before freeing to inform the client of the specific validation failure:
Suggested fix
sds err = NULL;
if (slotRangeArrayNormalizeAndValidate(slots, &err) != C_OK) {
+ addReplyError(c, err);
sdsfree(err);
slotRangeArrayFree(slots);
return NULL;
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| sds err = NULL; | |
| if (validateSlotRanges(slots, &err) != C_OK) { | |
| addReplyErrorSds(c, err); | |
| if (slotRangeArrayNormalizeAndValidate(slots, &err) != C_OK) { | |
| sdsfree(err); | |
| slotRangeArrayFree(slots); | |
| return NULL; | |
| } | |
| sds err = NULL; | |
| if (slotRangeArrayNormalizeAndValidate(slots, &err) != C_OK) { | |
| addReplyError(c, err); | |
| sdsfree(err); | |
| slotRangeArrayFree(slots); | |
| return NULL; | |
| } |
🤖 Prompt for AI Agents
In `@src/cluster.c` around lines 2014 - 2019, The error string produced by
slotRangeArrayNormalizeAndValidate is freed without informing the client; update
the failure path where sds err is set so that before sdsfree(err) and
slotRangeArrayFree(slots) you call addReplyError(c, err) (using the existing
client variable c) to send the validation message, then free err and slots as
before; ensure you only call addReplyError when err is non-NULL to avoid null
deref.
Benchmark PR from qodo-benchmark#92
Summary by CodeRabbit
Release Notes
New Features
Bug Fixes
Tests
✏️ Tip: You can customize this high-level summary in your review settings.