Hash prefetching#2180
Conversation
|
This is great!. I wonder if we went all the way though? in some cases we might also want to add another state to check if the hashtable has value pointer and prefetch it as well. |
| if (val->encoding == OBJ_ENCODING_RAW && val->type == OBJ_STRING) { | ||
| valkey_prefetch(val->ptr); | ||
| } else if (val->encoding == OBJ_ENCODING_HASHTABLE && val->type == OBJ_HASH) { | ||
| if (info->client) { |
There was a problem hiding this comment.
Handle the case where info->client is NULL to prevent an infinite loop.
There was a problem hiding this comment.
should not need if we remove HASHTABLE_PREFETCH_DONE then it would go to markKeyAsdone
| if (info->client) { | ||
| if (info->client->io_parsed_cmd->proc == hsetCommand || info->client->io_parsed_cmd->proc == | ||
| hgetCommand) { | ||
| if (info->value_prefetch_info.state == HASHTABLE_PREFETCH_ENTRY) { |
There was a problem hiding this comment.
Consider refactor this logic into a dedicated method for prefetching hashtable values, as similar functionality will be needed for set-based commands.
| case PREFETCH_VALUE: { | ||
| prefetchValue(info); | ||
| if (info->state != PREFETCH_DONE) { | ||
| moveToNextKey(); |
There was a problem hiding this comment.
Consider refactoring to handle index management internally within the prefetch methods rather than requiring explicit moveToNextKey() calls. This would simplify the control flow.
There was a problem hiding this comment.
Please check the latest code
0092fd7 to
47955cd
Compare
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## unstable #2180 +/- ##
============================================
- Coverage 76.72% 76.55% -0.17%
============================================
Files 159 159
Lines 79681 79710 +29
============================================
- Hits 61135 61022 -113
- Misses 18546 18688 +142
🚀 New features to boost your workflow:
|
| if (info->client->parsed_cmd->proc == hsetCommand || info->client->parsed_cmd->proc == | ||
| hgetCommand) { |
There was a problem hiding this comment.
I have some concern for this kind of custom code per command. This is part of the HSET and HGET command, so the implementation/optimization of these commands becomes scattered. The more we add of this kind of code, the bigger this becomes, so if we want to do this for sets, sorted sets, etc., we should think about other ways to do this. For example, can we include add a prefetching callback in the command's JSON file?
Extend the existing memory prefetch system to walk inner hashtables for hash (OBJ_ENCODING_HASHTABLE), set (OBJ_ENCODING_HASHTABLE), and sorted set (OBJ_ENCODING_SKIPLIST) types using a 3-phase approach: 1. HEADER: prefetch val->ptr (data structure header) 2. INIT: initialize incremental find on inner hashtable with argv[2] 3. STEP: walk inner hashtable buckets This amortizes cache misses across multiple commands in the batch, improving throughput for HGET, SISMEMBER, ZSCORE and similar commands on large datasets that exceed L3 cache. Benchmark results (2x c7g.4xlarge, separate server/client, io-threads=8, 200 clients, P=1, 30M hash / 10M set / 10M zset): HGET: +12.4% (888K -> 999K RPS) SISMEMBER: +12.6% (888K -> 1000K RPS) ZSCORE: +12.5% (888K -> 999K RPS) GET: ~0% (no regression for strings) Design addresses PR valkey-io#2180 review feedback: - Generic encoding-based approach (no per-command checks) - Dedicated prefetchValueDeep() function for reuse - NULL client handled gracefully - Preserves existing copy-avoidance for strings Signed-off-by: Rosh Khatri <roshkhatri@gmail.com>
Introduce hashtable prefetch logic to bring hash value data closer to cache Signed-off-by: xbasel <103044017+xbasel@users.noreply.github.com>
📝 WalkthroughWalkthroughAdds a per-command value prefetch callback interface ( ChangesPer-command value prefetch callback for HGET/HSET
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes 🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
…tializer Replace hardcoded command checks in prefetchHashObj with a callback based architecture. Signed-off-by: xbasel <103044017+xbasel@users.noreply.github.com>
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/memory_prefetch.c`:
- Around line 121-123: The KeyPrefetchInfo structure binds client reference by
key index using batch->clients[i], but this loses the key-to-command context
because keys are enqueued per command while clients are enqueued per client.
When a client has multiple keys, the mapping at line 121-123 becomes incorrect,
causing line 147-149 to dereference info->client->parsed_cmd from the wrong
context. Store the complete per-key prefetch context (command, prefetch proc,
and argv context) in KeyPrefetchInfo at enqueue time instead of just storing
batch->clients[i]. Then in the prefetchValue callback around lines 259-293, use
this stored context directly rather than dereferencing from
info->client->parsed_cmd which may be NULL or belong to a different command.
In `@src/valkey-benchmark.c`:
- Around line 2635-2639: The integration test suite in
tests/integration/valkey-benchmark.tcl currently validates hset benchmark calls
but does not verify the new hget benchmark path added to the default suite in
valkey-benchmark.c. Update the full-suite test in
tests/integration/valkey-benchmark.tcl to add assertions that check for cmdstat
hget execution, ensuring that the HGET benchmark code path (which formats the
HGET myhash command) is continuously validated alongside the existing hset
validations.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro Plus
Run ID: c7d1e068-8daf-46e5-923d-2559d95186bf
📒 Files selected for processing (12)
src/cli_commands.csrc/commands.csrc/commands.defsrc/commands/hget.jsonsrc/commands/hset.jsonsrc/memory_prefetch.csrc/memory_prefetch.hsrc/server.hsrc/t_hash.csrc/valkey-benchmark.ctests/support/util.tclutils/generate-command-code.py
| info->client = batch->clients[i]; | ||
| info->value_prefetch_info.state = HASHTABLE_PREFETCH_ENTRY; | ||
| hashtableIncrementalFindInit(&info->hashtab_state, tables[i], batch->keys[i]); |
There was a problem hiding this comment.
Key-to-command context is lost during value-prefetch dispatch.
KeyPrefetchInfo binds info->client by key index (batch->clients[i]), but keys are enqueued per command while clients are enqueued per client. Once a client contributes multiple keys/queued commands, this mapping diverges; Line 147 then dereferences info->client->parsed_cmd from the wrong context (or NULL) and may invoke an unrelated callback path.
Please store per-key prefetch context at enqueue time (e.g., command/prefetch proc plus the argv context needed by the callback) and use that in prefetchValue instead of deriving from batch->clients[i].
Also applies to: 147-149, 259-293
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/memory_prefetch.c` around lines 121 - 123, The KeyPrefetchInfo structure
binds client reference by key index using batch->clients[i], but this loses the
key-to-command context because keys are enqueued per command while clients are
enqueued per client. When a client has multiple keys, the mapping at line
121-123 becomes incorrect, causing line 147-149 to dereference
info->client->parsed_cmd from the wrong context. Store the complete per-key
prefetch context (command, prefetch proc, and argv context) in KeyPrefetchInfo
at enqueue time instead of just storing batch->clients[i]. Then in the
prefetchValue callback around lines 259-293, use this stored context directly
rather than dereferencing from info->client->parsed_cmd which may be NULL or
belong to a different command.
| if (test_is_selected("hget")) { | ||
| len = valkeyFormatCommand(&cmd, "HGET myhash%s element:__rand_int__", tag); | ||
| benchmark("HGET", cmd, len); | ||
| free(cmd); | ||
| } |
There was a problem hiding this comment.
Add integration coverage for the new default-suite HGET benchmark path.
This changes default end-to-end benchmark behavior, but the existing integration suite snapshot shown for tests/integration/valkey-benchmark.tcl validates hset calls and not hget. Please add/adjust assertions for cmdstat hget in the full-suite test so this path is continuously verified.
Suggested test update
diff --git a/tests/integration/valkey-benchmark.tcl b/tests/integration/valkey-benchmark.tcl
@@
assert_match {*calls=100,*} [cmdstat hset]
+ assert_match {*calls=100,*} [cmdstat hget]
assert_match {*calls=100,*} [cmdstat spop]As per coding guidelines, “End-to-end behavior changes should be covered by integration tests.”
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/valkey-benchmark.c` around lines 2635 - 2639, The integration test suite
in tests/integration/valkey-benchmark.tcl currently validates hset benchmark
calls but does not verify the new hget benchmark path added to the default suite
in valkey-benchmark.c. Update the full-suite test in
tests/integration/valkey-benchmark.tcl to add assertions that check for cmdstat
hget execution, ensuring that the HGET benchmark code path (which formats the
HGET myhash command) is continuously validated alongside the existing hset
validations.
Source: Coding guidelines
Prefetches hash type values.
This is 1 hash with 30 million fields, 70 bytes each