Hash prefetching by xbasel · Pull Request #2180 · valkey-io/valkey

xbasel · 2025-06-06T11:56:56Z

Prefetches hash type values.

Benchmark setup: a single hash with 30 million fields, 70 bytes each.

Without hash prefetching (HGET):
Summary:
  throughput summary: 974468.94 requests per second
  latency summary (msec):
          avg       min       p50       p95       p99       max
        0.295     0.112     0.295     0.343     0.391    33.151
With hash prefetching:
Summary:
  throughput summary: 1210800.38 requests per second
  latency summary (msec):
          avg       min       p50       p95       p99       max
        0.240     0.104     0.239     0.311     0.407     7.271
~20% improvement.
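As a quick check on the figures above, the relative change can be computed directly from the two summaries. This is a minimal sketch: `percent_change` and `print_hget_deltas` are hypothetical helper names, and only the four input numbers come from the benchmark output.

```c
#include <assert.h>
#include <stdio.h>

/* Relative change from `before` to `after`, in percent. */
static double percent_change(double before, double after) {
    assert(before != 0.0);
    return (after - before) / before * 100.0;
}

/* Prints the throughput and average-latency deltas for the two HGET runs. */
static void print_hget_deltas(void) {
    /* Requests per second, without and with hash prefetching. */
    printf("throughput:  %+.1f%%\n", percent_change(974468.94, 1210800.38));
    /* Average latency in msec, without and with hash prefetching. */
    printf("avg latency: %+.1f%%\n", percent_change(0.295, 0.240));
}
```

By this arithmetic the throughput gain is roughly 24% and the average latency drops by roughly 19%, which brackets the rounded figure quoted above.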

ranshid · 2025-06-08T04:14:22Z

This is great! I wonder whether we went all the way, though. In some cases we might also want to add another state that checks whether the hashtable has a value pointer and prefetches it as well.

naglera · 2026-01-18T08:23:04Z

        if (val->encoding == OBJ_ENCODING_RAW && val->type == OBJ_STRING) {
            valkey_prefetch(val->ptr);
+        } else if (val->encoding == OBJ_ENCODING_HASHTABLE && val->type == OBJ_HASH) {
+            if (info->client) {


Handle the case where info->client is NULL to prevent an infinite loop.

roshkhatri (Jan 21, 2026): This should not be needed if we remove HASHTABLE_PREFETCH_DONE; it would then go to markKeyAsdone.

naglera · 2026-01-18T08:38:29Z

+            if (info->client) {
+                if (info->client->io_parsed_cmd->proc == hsetCommand || info->client->io_parsed_cmd->proc ==
+                    hgetCommand) {
+                    if (info->value_prefetch_info.state == HASHTABLE_PREFETCH_ENTRY) {


Consider refactoring this logic into a dedicated method for prefetching hashtable values, as similar functionality will be needed for set-based commands.

naglera · 2026-01-18T08:48:11Z

+            case PREFETCH_VALUE: {
+                prefetchValue(info);
+                if (info->state != PREFETCH_DONE) {
+                    moveToNextKey();


Consider refactoring to handle index management internally within the prefetch methods rather than requiring explicit moveToNextKey() calls. This would simplify the control flow.

xbasel (Jan 27, 2026): Please check the latest code.
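The suggestion about index management can be illustrated with a small sketch in which each step function advances the batch cursor itself, so the driver loop never calls a separate `moveToNextKey()`. Everything here (`Batch`, `stepKey`, `stepValue`, `runBatch`, the three states) is hypothetical and is not the code in `src/memory_prefetch.c`; the steps only flip state where real code would issue prefetches.

```c
#include <assert.h>
#include <stddef.h>

#define BATCH_MAX 16

typedef enum { PF_KEY, PF_VALUE, PF_DONE } PfState;

typedef struct {
    PfState state[BATCH_MAX];
    size_t count; /* keys in the batch */
    size_t done;  /* keys that reached PF_DONE */
    size_t cur;   /* index of the key currently being worked on */
} Batch;

/* Round-robin to the next unfinished key. Private to the step functions. */
static void advance(Batch *b) {
    if (b->done == b->count) return;
    do {
        b->cur = (b->cur + 1) % b->count;
    } while (b->state[b->cur] == PF_DONE);
}

/* Each step does one unit of work for the current key, then moves the
 * cursor itself, so callers never manage the index. */
static void stepKey(Batch *b) {
    b->state[b->cur] = PF_VALUE; /* real code would prefetch the key entry */
    advance(b);
}

static void stepValue(Batch *b) {
    b->state[b->cur] = PF_DONE; /* real code would prefetch the value */
    b->done++;
    advance(b);
}

/* Driver loop: dispatches on state only. Returns the number of steps taken. */
static size_t runBatch(Batch *b) {
    size_t steps = 0;
    assert(b->count <= BATCH_MAX);
    while (b->done < b->count) {
        switch (b->state[b->cur]) {
        case PF_KEY: stepKey(b); break;
        case PF_VALUE: stepValue(b); break;
        case PF_DONE: advance(b); break;
        }
        steps++;
    }
    return steps;
}
```

With this shape the `switch` in the driver has no bookkeeping of its own, which is the simplification the review comment asks for.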

codecov · 2026-01-27T23:26:51Z

Codecov Report

❌ Patch coverage is 45.16129% with 17 lines in your changes missing coverage. Please review.
✅ Project coverage is 76.55%. Comparing base (b83209d) to head (ce27eef).

Files with missing lines	Patch %	Lines
src/memory_prefetch.c	37.03%	17 Missing ⚠️

Additional details and impacted files

@@             Coverage Diff              @@
##           unstable    #2180      +/-   ##
============================================
- Coverage     76.72%   76.55%   -0.17%     
============================================
  Files           159      159              
  Lines         79681    79710      +29     
============================================
- Hits          61135    61022     -113     
- Misses        18546    18688     +142

Files with missing lines	Coverage Δ
src/valkey-benchmark.c	`61.81% <100.00%> (+0.10%)`	⬆️
src/memory_prefetch.c	`72.95% <37.03%> (-7.65%)`	⬇️

... and 23 files with indirect coverage changes

zuiderkwast · 2026-01-28T14:58:44Z

+        if (info->client->parsed_cmd->proc == hsetCommand || info->client->parsed_cmd->proc ==
+                                                                 hgetCommand) {


I have some concerns about this kind of custom per-command code. It is part of the HSET and HGET commands, so the implementation and optimization of these commands becomes scattered. The more of this code we add, the bigger it gets, so if we want to do this for sets, sorted sets, etc., we should think about other ways to do it. For example, can we add a prefetching callback to the command's JSON file?
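One way to read the callback idea is that each command descriptor carries an optional prefetch hook and the engine only checks whether the hook is set. The sketch below assumes hypothetical names throughout (`CommandSketch`, `ValuePrefetchHook`, `hashValueHook`, `enginePrefetchValue`); it does not reproduce Valkey's `serverCommand` struct, and `__builtin_prefetch` is a GCC/Clang builtin.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical hook type: returns true if it issued a prefetch. */
typedef bool (*ValuePrefetchHook)(const void *value, const char *field);

typedef struct {
    const char *name;
    ValuePrefetchHook prefetch; /* NULL: command has no value prefetch */
} CommandSketch;

static int hash_hook_calls; /* counts invocations, for illustration only */

/* Lives next to the hash commands, not inside the prefetch engine. */
static bool hashValueHook(const void *value, const char *field) {
    (void)field;               /* a real hook would look this field up */
    __builtin_prefetch(value); /* cache hint only, no observable effect */
    hash_hook_calls++;
    return true;
}

static const CommandSketch GET_CMD = {.name = "GET", .prefetch = NULL};
static const CommandSketch HGET_CMD = {.name = "HGET", .prefetch = hashValueHook};

/* Engine side: generic, never compares against specific command procs. */
static bool enginePrefetchValue(const CommandSketch *cmd, const void *value, const char *field) {
    assert(value != NULL);
    if (cmd == NULL || cmd->prefetch == NULL) return false;
    return cmd->prefetch(value, field);
}
```

Adding the same optimization for another data type would then mean writing one more hook next to that type's commands, with no change to `enginePrefetchValue`.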

Extend the existing memory prefetch system to walk inner hashtables for hash (OBJ_ENCODING_HASHTABLE), set (OBJ_ENCODING_HASHTABLE), and sorted set (OBJ_ENCODING_SKIPLIST) types using a 3-phase approach:

1. HEADER: prefetch val->ptr (data structure header)
2. INIT: initialize incremental find on inner hashtable with argv[2]
3. STEP: walk inner hashtable buckets

This amortizes cache misses across multiple commands in the batch, improving throughput for HGET, SISMEMBER, ZSCORE and similar commands on large datasets that exceed L3 cache.

Benchmark results (2x c7g.4xlarge, separate server/client, io-threads=8, 200 clients, P=1, 30M hash / 10M set / 10M zset):

HGET: +12.4% (888K -> 999K RPS)
SISMEMBER: +12.6% (888K -> 1000K RPS)
ZSCORE: +12.5% (888K -> 999K RPS)
GET: ~0% (no regression for strings)

Design addresses PR valkey-io#2180 review feedback:

- Generic encoding-based approach (no per-command checks)
- Dedicated prefetchValueDeep() function for reuse
- NULL client handled gracefully
- Preserves existing copy-avoidance for strings

Signed-off-by: Rosh Khatri <roshkhatri@gmail.com>
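The HEADER / INIT / STEP phases described in the commit message above can be sketched with a toy chained hash table. This is an illustration under stated assumptions, not the Valkey implementation: `InnerTable`, `Entry`, `DeepPrefetch`, `deepPrefetchStep` and `deepPrefetchBatch` are hypothetical, the real code uses Valkey's incremental find API instead of a linked bucket walk, and the `found` field exists only so the sketch can be checked.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define NBUCKETS 8

typedef struct Entry { const char *field; int value; struct Entry *next; } Entry;
typedef struct { Entry *buckets[NBUCKETS]; } InnerTable;
typedef struct { InnerTable *ptr; } Object; /* stands in for an object with val->ptr */

typedef enum { PHASE_HEADER, PHASE_INIT, PHASE_STEP, PHASE_DONE } Phase;

typedef struct {
    Object *val;
    const char *field; /* plays the role of argv[2] */
    Phase phase;
    Entry *cursor;
    Entry *found;
} DeepPrefetch;

static unsigned hashField(const char *s) {
    unsigned h = 5381;
    while (*s) h = h * 33u + (unsigned char)*s++;
    return h % NBUCKETS;
}

static void tableAdd(InnerTable *t, Entry *e) {
    unsigned b = hashField(e->field);
    e->next = t->buckets[b];
    t->buckets[b] = e;
}

/* Performs one unit of work; returns true while more work remains. */
static bool deepPrefetchStep(DeepPrefetch *p) {
    switch (p->phase) {
    case PHASE_HEADER: /* 1. prefetch the data structure header */
        assert(p->val != NULL && p->val->ptr != NULL);
        __builtin_prefetch(p->val->ptr);
        p->phase = PHASE_INIT;
        return true;
    case PHASE_INIT: /* 2. start the lookup for the field */
        p->cursor = p->val->ptr->buckets[hashField(p->field)];
        if (p->cursor) __builtin_prefetch(p->cursor);
        p->phase = p->cursor ? PHASE_STEP : PHASE_DONE;
        return p->cursor != NULL;
    case PHASE_STEP: /* 3. walk the bucket one entry per call */
        if (strcmp(p->cursor->field, p->field) == 0) {
            p->found = p->cursor;
            p->phase = PHASE_DONE;
            return false;
        }
        p->cursor = p->cursor->next;
        if (p->cursor) {
            __builtin_prefetch(p->cursor);
            return true;
        }
        p->phase = PHASE_DONE;
        return false;
    case PHASE_DONE:
        return false;
    }
    return false;
}

/* Interleaves steps across the batch so each prefetch has time to land
 * while the other jobs are being advanced. */
static void deepPrefetchBatch(DeepPrefetch *jobs, size_t n) {
    bool progress = true;
    while (progress) {
        progress = false;
        for (size_t i = 0; i < n; i++) {
            if (deepPrefetchStep(&jobs[i])) progress = true;
        }
    }
}
```

The interleaving in `deepPrefetchBatch` is what spreads the cache misses over the batch: no job touches memory in the same pass in which it requested the prefetch.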

coderabbitai · 2026-06-16T21:55:27Z

Walkthrough

Adds a per-command value prefetch callback interface (serverValuePrefetchProc) to the memory prefetch engine. KeyPrefetchInfo is extended with a client pointer and ValuePrefetchInfo state; prefetchValue dispatches to parsed_cmd->prefetch_proc when present. hashValuePrefetchCallback implements a 3-phase incremental hashtable lookup prefetch, wired to HGET and HSET via JSON metadata and the ASSIGN_PREFETCH_PROC macro.

Changes

Per-command value prefetch callback for HGET/HSET

Layer / File(s)	Summary
Prefetch state types and callback interface `src/memory_prefetch.h`, `src/server.h`	`memory_prefetch.h` adds `HashtablePrefetchState` enum and `ValuePrefetchInfo` struct (with `hashtableIncrementalFindState` union). `server.h` adds the `serverValuePrefetchProc` typedef, a `prefetch_proc` field on `serverCommand`, and the `hashValuePrefetchCallback` prototype.
Prefetch engine: client tracking and callback dispatch `src/memory_prefetch.c`	`KeyPrefetchInfo` gains `client *client` and `value_prefetch_info`; batch init assigns client and sets initial state; `prefetchValue` conditionally invokes `parsed_cmd->prefetch_proc` for non-RAW values and handles early return; `hashtablePrefetch` switch cases reformatted to braced blocks.
`hashValuePrefetchCallback` 3-phase state machine `src/t_hash.c`	Implements the hash-specific prefetch callback: phase 1 prefetches the hashtable pointer, phase 2 calls `hashtableIncrementalFindInit` with `argv[2]`, phase 3 advances via `hashtableIncrementalFindStep` until done; returns `false` immediately for non-hashtable encodings.
Command wiring: `ASSIGN_PREFETCH_PROC` macro, JSON metadata, and code generator `src/commands.c`, `src/cli_commands.c`, `src/commands/hget.json`, `src/commands/hset.json`, `src/commands.def`, `utils/generate-command-code.py`	`ASSIGN_PREFETCH_PROC(fn)` macro defined in both command compilation units; `hget.json` and `hset.json` add `prefetch_function: hashValuePrefetchCallback`; `commands.def` applies the macro to HGET and HSET entries; `generate-command-code.py` emits the macro call when `prefetch_function` is present.
Benchmark HGET case and debug prompt fix `src/valkey-benchmark.c`, `tests/support/util.tcl`	`valkey-benchmark.c` adds an HGET default-suite benchmark using the cluster/tag-aware key pattern; `util.tcl` simplifies the `bp` debugger prompt from `$who/$s>` to `>` .
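The command wiring row above can be pictured as one set of generated command rows compiled under two different macro definitions. The sketch below is an assumption-heavy illustration: the walkthrough only says that `ASSIGN_PREFETCH_PROC(fn)` is defined in both compilation units, so the choice to expand it to a designated initializer for the server and to nothing for the CLI is a guess, and `COMMAND_DEF`, `SERVER_ROW`, `CLI_ROW`, `ServerCmd`, `CliCmd` and `hashPrefetchStub` are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

typedef bool (*PrefetchFn)(void);

typedef struct { const char *name; PrefetchFn prefetch_proc; } ServerCmd;
typedef struct { const char *name; } CliCmd; /* the CLI has no use for the hook */

/* Hypothetical callback standing in for hashValuePrefetchCallback. */
static bool hashPrefetchStub(void) { return true; }

/* Stand-in for generated command rows: a name plus an optional hook. */
#define COMMAND_DEF(ROW, ASSIGN_PREFETCH_PROC)          \
    ROW("get", )                                        \
    ROW("hget", ASSIGN_PREFETCH_PROC(hashPrefetchStub)) \
    ROW("hset", ASSIGN_PREFETCH_PROC(hashPrefetchStub))

/* Server build: the macro becomes a designated initializer. */
#define SERVER_ASSIGN(fn) .prefetch_proc = fn
#define SERVER_ROW(name_, extra) {.name = name_, extra},
static const ServerCmd server_cmds[] = {COMMAND_DEF(SERVER_ROW, SERVER_ASSIGN)};

/* CLI build (assumed): the macro expands to nothing. */
#define CLI_ASSIGN(fn)
#define CLI_ROW(name_, extra) {.name = name_ extra},
static const CliCmd cli_cmds[] = {COMMAND_DEF(CLI_ROW, CLI_ASSIGN)};

/* Returns the hook registered for `name`, or NULL if there is none. */
static PrefetchFn serverPrefetchFor(const char *name) {
    assert(name != NULL);
    for (size_t i = 0; i < sizeof(server_cmds) / sizeof(server_cmds[0]); i++) {
        if (strcmp(server_cmds[i].name, name) == 0) return server_cmds[i].prefetch_proc;
    }
    return NULL;
}
```

Because unnamed fields in a designated initializer are zero-initialized, commands without a hook end up with a NULL `prefetch_proc` without any extra text in their rows.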

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 50.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title 'Hash prefetching' directly describes the main change: adding hash value prefetching functionality for performance optimization across multiple files.
Description check	✅ Passed	The description explains the feature being added (hash prefetching) and provides specific performance benchmark data demonstrating the ~20% improvement, directly relating to the changeset.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

coderabbitai

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/memory_prefetch.c`:
- Around line 121-123: The KeyPrefetchInfo structure binds client reference by
key index using batch->clients[i], but this loses the key-to-command context
because keys are enqueued per command while clients are enqueued per client.
When a client has multiple keys, the mapping at line 121-123 becomes incorrect,
causing line 147-149 to dereference info->client->parsed_cmd from the wrong
context. Store the complete per-key prefetch context (command, prefetch proc,
and argv context) in KeyPrefetchInfo at enqueue time instead of just storing
batch->clients[i]. Then in the prefetchValue callback around lines 259-293, use
this stored context directly rather than dereferencing from
info->client->parsed_cmd which may be NULL or belong to a different command.

In `@src/valkey-benchmark.c`:
- Around line 2635-2639: The integration test suite in
tests/integration/valkey-benchmark.tcl currently validates hset benchmark calls
but does not verify the new hget benchmark path added to the default suite in
valkey-benchmark.c. Update the full-suite test in
tests/integration/valkey-benchmark.tcl to add assertions that check for cmdstat
hget execution, ensuring that the HGET benchmark code path (which formats the
HGET myhash command) is continuously validated alongside the existing hset
validations.

📥 Commits

Reviewing files that changed from the base of the PR and between 6774c09 and be3511d.

📒 Files selected for processing (12)

src/cli_commands.c
src/commands.c
src/commands.def
src/commands/hget.json
src/commands/hset.json
src/memory_prefetch.c
src/memory_prefetch.h
src/server.h
src/t_hash.c
src/valkey-benchmark.c
tests/support/util.tcl
utils/generate-command-code.py

coderabbitai · 2026-06-16T22:03:24Z

+        info->client = batch->clients[i];
+        info->value_prefetch_info.state = HASHTABLE_PREFETCH_ENTRY;
        hashtableIncrementalFindInit(&info->hashtab_state, tables[i], batch->keys[i]);


⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Key-to-command context is lost during value-prefetch dispatch.

KeyPrefetchInfo binds info->client by key index (batch->clients[i]), but keys are enqueued per command while clients are enqueued per client. Once a client contributes multiple keys/queued commands, this mapping diverges; Line 147 then dereferences info->client->parsed_cmd from the wrong context (or NULL) and may invoke an unrelated callback path.
Please store per-key prefetch context at enqueue time (e.g., command/prefetch proc plus the argv context needed by the callback) and use that in prefetchValue instead of deriving from batch->clients[i].

Also applies to: 147-149, 259-293
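The fix requested here, capturing the command context per key when the key is enqueued, might look like the following sketch. All names (`KeyCtx`, `Batch`, `enqueueKey`, `runValuePrefetch`, `hashProc`) are hypothetical and the layout is not that of `KeyPrefetchInfo`; the point is only that each key slot owns its callback and field rather than indexing into a separate clients array.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_KEYS 16

typedef bool (*PrefetchProc)(const char *key, const char *field);

typedef struct { const char *name; PrefetchProc prefetch_proc; } Command;

/* Per-key context, filled in when the key is enqueued. */
typedef struct {
    const char *key;
    PrefetchProc proc; /* copied from the command that owns this key */
    const char *field; /* argument the callback needs, may be NULL */
} KeyCtx;

typedef struct { KeyCtx keys[MAX_KEYS]; size_t nkeys; } Batch;

static const char *last_prefetched_key; /* recorded for illustration only */

/* Hypothetical hash callback: just records which key it was called for. */
static bool hashProc(const char *key, const char *field) {
    (void)field;
    last_prefetched_key = key;
    return true;
}

/* One slot per key; a NULL command simply means "no value prefetch". */
static bool enqueueKey(Batch *b, const Command *cmd, const char *key, const char *field) {
    assert(b != NULL && key != NULL);
    if (b->nkeys == MAX_KEYS) return false;
    b->keys[b->nkeys++] = (KeyCtx){
        .key = key,
        .proc = cmd ? cmd->prefetch_proc : NULL,
        .field = field,
    };
    return true;
}

/* Uses only the stored context. Returns how many callbacks were invoked. */
static size_t runValuePrefetch(const Batch *b) {
    size_t invoked = 0;
    for (size_t i = 0; i < b->nkeys; i++) {
        const KeyCtx *k = &b->keys[i];
        if (k->proc != NULL && k->proc(k->key, k->field)) invoked++;
    }
    return invoked;
}
```

With a two-key command enqueued before an HGET, the third key slot still resolves to the hash callback, whereas indexing a per-client array by key position would pick the wrong entry or run past its end.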

coderabbitai · 2026-06-16T22:03:24Z

+        if (test_is_selected("hget")) {
+            len = valkeyFormatCommand(&cmd, "HGET myhash%s element:__rand_int__", tag);
+            benchmark("HGET", cmd, len);
+            free(cmd);
+        }


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Add integration coverage for the new default-suite HGET benchmark path.

This changes default end-to-end benchmark behavior, but the existing integration suite snapshot shown for tests/integration/valkey-benchmark.tcl validates hset calls and not hget. Please add/adjust assertions for cmdstat hget in the full-suite test so this path is continuously verified.

Suggested test update

diff --git a/tests/integration/valkey-benchmark.tcl b/tests/integration/valkey-benchmark.tcl
@@
         assert_match {*calls=100,*} [cmdstat hset]
+        assert_match {*calls=100,*} [cmdstat hget]
         assert_match {*calls=100,*} [cmdstat spop]

As per coding guidelines, “End-to-end behavior changes should be covered by integration tests.”

Source: Coding guidelines

xbasel marked this pull request as draft June 6, 2025 11:57

xbasel marked this pull request as ready for review December 8, 2025 08:27

roshkhatri self-assigned this Jan 15, 2026

naglera reviewed Jan 18, 2026

roshkhatri mentioned this pull request Jan 22, 2026

Prefetch hash data #3093

Closed

roshkhatri reviewed Jan 23, 2026

xbasel force-pushed the prefetch_hash branch 3 times, most recently from 0092fd7 to 47955cd Compare January 27, 2026 23:00

xbasel requested review from naglera and zuiderkwast January 27, 2026 23:00

xbasel force-pushed the prefetch_hash branch from 47955cd to 7e5c7a0 Compare January 27, 2026 23:07

roshkhatri assigned xbasel and unassigned roshkhatri Jan 27, 2026

zuiderkwast reviewed Jan 28, 2026

xbasel force-pushed the prefetch_hash branch from 7e5c7a0 to ce27eef Compare March 27, 2026 20:23

roshkhatri mentioned this pull request Jun 9, 2026

Deep prefetching for hash and zset inner hashtables #3940

Open

Add hash prefetching for hash type values

5b6551b

Introduce hashtable prefetch logic to bring hash value data closer to cache Signed-off-by: xbasel <103044017+xbasel@users.noreply.github.com>

xbasel force-pushed the prefetch_hash branch from ce27eef to d0a9893 Compare June 16, 2026 21:55

Refactor hash prefetching to use command callbacks via designated ini…

be3511d

…tializer Replace hardcoded command checks in prefetchHashObj with a callback based architecture. Signed-off-by: xbasel <103044017+xbasel@users.noreply.github.com>

xbasel force-pushed the prefetch_hash branch from d0a9893 to be3511d Compare June 16, 2026 21:55

coderabbitai Bot reviewed Jun 16, 2026
