Skip to content

Backport hfe fixes to 9.0#3111

Merged
ranshid merged 12 commits into
valkey-io:9.0from
ranshid:backport-hfe-fixes
Jan 29, 2026
Merged

Backport hfe fixes to 9.0#3111
ranshid merged 12 commits into
valkey-io:9.0from
ranshid:backport-hfe-fixes

Conversation

@ranshid ranshid added the run-extra-tests Run extra tests on this PR (Runs all tests from daily except valgrind and RESP) label Jan 26, 2026
@ranshid ranshid marked this pull request as ready for review January 26, 2026 09:40
@ranshid ranshid requested a review from murphyjacob4 January 26, 2026 09:40
@codecov

codecov Bot commented Jan 26, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 98.72611% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 72.49%. Comparing base (ab3c953) to head (f1fd905).
⚠️ Report is 12 commits behind head on 9.0.

Files with missing lines Patch % Lines
src/module.c 0.00% 1 Missing ⚠️
src/t_hash.c 99.32% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##              9.0    #3111      +/-   ##
==========================================
- Coverage   72.66%   72.49%   -0.18%     
==========================================
  Files         128      128              
  Lines       69867    69970     +103     
==========================================
- Hits        50769    50724      -45     
- Misses      19098    19246     +148     
Files with missing lines Coverage Δ
src/db.c 93.18% <100.00%> (+0.02%) ⬆️
src/server.c 88.39% <100.00%> (-0.15%) ⬇️
src/server.h 100.00% <ø> (ø)
src/module.c 9.76% <0.00%> (ø)
src/t_hash.c 96.80% <99.32%> (+0.36%) ⬆️

... and 13 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@murphyjacob4

Copy link
Copy Markdown
Contributor

Looks like Ubuntu 32bit tests failed: https://github.com/valkey-io/valkey/actions/runs/21352969942/job/61491923381?pr=3111#step:6:3064

In test: test {HSET/HLEN - Small hash creation} - seems like it could be related?

@ranshid

ranshid commented Jan 27, 2026

Copy link
Copy Markdown
Member Author

Looks like Ubuntu 32bit tests failed: https://github.com/valkey-io/valkey/actions/runs/21352969942/job/61491923381?pr=3111#step:6:3064

In test: test {HSET/HLEN - Small hash creation} - seems like it could be related?

Indeed. this is very strange. I was also looking into this. It seems that my IDE added include for "expire.h" at the start of "t_hash.c". This caused some change in the alignment in 32bit of "server.hash_max_listpack_entries", so when it is loded into a register with 32bit instructions, it will be loaded as '0' and that is the reason hash objects are always expended on 32 bits...

@ranshid

ranshid commented Jan 27, 2026

Copy link
Copy Markdown
Member Author

Looks like Ubuntu 32bit tests failed: https://github.com/valkey-io/valkey/actions/runs/21352969942/job/61491923381?pr=3111#step:6:3064
In test: test {HSET/HLEN - Small hash creation} - seems like it could be related?

Indeed. this is very strange. I was also looking into this. It seems that my IDE added include for "expire.h" at the start of "t_hash.c". This caused some change in the alignment in 32bit of "server.hash_max_listpack_entries", so when it is loded into a register with 32bit instructions, it will be loaded as '0' and that is the reason hash objects are always expended on 32 bits...

Now I get to understand it better. #2943 had features.h early include in expire.h, which in turn made a wrong assumption about SIZE_TYPE during preprocessor include of server.h. this made the compiler indicate server.hash_max_listpack_entries (which is size_t) as 64bit in this compiation unit. however the following size_t where correctly using 32bit SIZE_TYPE.

ranshid added a commit that referenced this pull request Jan 27, 2026
This caused some unaligned load of server values on 32bit compilations.

Example:
https://github.com/valkey-io/valkey/actions/runs/21352969942/job/61491923381?pr=3111#step:6:3064

Some insights here:
#3111 (comment)

Signed-off-by: Ran Shidlansik <ranshid@amazon.com>
ranshid added a commit to ranshid/valkey that referenced this pull request Jan 27, 2026
This caused some unaligned load of server values on 32bit compilations.

Example:
https://github.com/valkey-io/valkey/actions/runs/21352969942/job/61491923381?pr=3111#step:6:3064

Some insights here:
valkey-io#3111 (comment)

Signed-off-by: Ran Shidlansik <ranshid@amazon.com>
@ranshid ranshid requested a review from zuiderkwast January 27, 2026 15:40

@zuiderkwast zuiderkwast left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is one test failure on the address sanitizer test run:

*** [err]: HSETEX EXAT single field expires leaving other fields intact in tests/unit/hashexpire.tcl
Active expiry did not occur as expected expected: 31 ststs: 30

If it's a flaky test that's unrelated to these fixes, then I guess we can ignore it for now.

@ranshid

ranshid commented Jan 28, 2026

Copy link
Copy Markdown
Member Author

There is one test failure on the address sanitizer test run:

*** [err]: HSETEX EXAT single field expires leaving other fields intact in tests/unit/hashexpire.tcl
Active expiry did not occur as expected expected: 31 ststs: 30

If it's a flaky test that's unrelated to these fixes, then I guess we can ignore it for now.

Yes and no. It is flaky but it does point to a bug which is fixed in https://github.com/valkey-io/valkey/pull/3001/files

Bottom line: I can deflake the test anyway. Will do it asap

@ranshid ranshid force-pushed the backport-hfe-fixes branch from 97528bb to 0e1ce10 Compare January 28, 2026 14:47
ranshid and others added 4 commits January 28, 2026 22:12
…xiting fields (valkey-io#2973)

When HEXPIRE commands are set with a time-in-the-past they are all
deleting the specified fields.
In such cases we allocate a temporal new argv in order to replicate
`HDEL`.
However in case no mutation was done (ie all fields do not exist) we do
not deallocate the temporal new_argv and there is a memory leak.

example:

```
HSET myhash field1 value1
1
HEXPIRE myhash 0 FIELDS 1 field2
-2
```

---------

Signed-off-by: Ran Shidlansik <ranshid@amazon.com>
Following Hash-Field-Expiration feature, a hash object can hold volatile
fields.
volatile fields which are already expired are deleted and reclaimed ONLY
by the active-expiration background job.
This means that hash object can contain items which have not yet
expired.
In case mutations are requesting to set a value on these
"already-expired" fields, they will be overwritten with the new value.
In such cases, though, it is requiered to update the global per-db
tracking map by removing the key if it has no more volatile fields.
This was implemented in all mutation cases of the hash commands but the
`INCRBY` and `INCRBYFLOAT`.
This can lead to a dangling object which has no volatile items, which
might lead to assertion during the active-expiration job:

example reproduction:
```
DEBUG SET-ACTIVE-EXPIRE 0
hset myhash f1 10
hexpire myhash 1 FIELDS 1 f1
sleep(10)
hincrby myhash f1 1
DEBUG SET-ACTIVE-EXPIRE 1
```   

NOTE: we actually had tests for this scenario, only the test did not
include explicit assertion in case the item is still tracked after the
mutation.

Signed-off-by: Ran Shidlansik <ranshid@amazon.com>
If you send an `HSETEX key EXAT 0 FIELDS 1 foo bar`, it will create an
empty length string. This is not good behavior as other places in the
code will crash if an empty hash is accessed.

This change just makes it so that the key is freed if it ends up being
empty. This has some odd behavior w.r.t. to keyspace notifications
though.

Today, if you do `HSETEX key EXAT 0 FIELDS 1 foo bar` with a hash that
exists, it won't send an `HSET` notification or an `HEXPIRE`
notification unless foo is already present. It will also send a `DEL`
notification if foo is last the field, effectively deleting the key.

This isn't consistent with Redis, which will still send an `HSET` and
`HDEL` (*NOTE* not `hexpire`) notification if a key is added past the
expiration. Likewise, in the case where the key doesn't exist and
`HSETEX key EXAT 0 FIELDS 1 foo bar` is sent, Redis will send an HSET,
HDEL, and DEL notification in that order.

I think it makes sense to keep consistency with what we have today
(given that it's been out for awhile), but document the behavior.
Basically, if you send a command with an expiration in the past, we will
basically ignore the input unless it's "deleting" a key.

Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
HSETEX right now uses `strcmp` which does not account for
case-insensitivity replaced it with `strcasecmp`.
Fixes valkey-io#2996

---------

Signed-off-by: frostzt <aidenfrostbite@gmail.com>
Signed-off-by: Sourav Singh Rawat <aidenfrostbite@gmail.com>
Co-authored-by: Ran Shidlansik <ranshid@amazon.com>
@ranshid ranshid force-pushed the backport-hfe-fixes branch from 0e1ce10 to 8872177 Compare January 28, 2026 21:35
frostzt and others added 7 commits January 28, 2026 23:38
…alkey-io#3003)

In `dbSetValue` if the robj is a hash with volatile fields we need to
clean it up. It causes two core problems:
1. Type Confusion: Allows converting from hash to string with
`keys_with_volatile_items` tracked
2. Tracked Items with dangling pointer: Converting from `hash` to ie. a
string breaks this as `keys_with_volatile_items` is not cleaned up.

Check valkey-io#3000

---------

Signed-off-by: frostzt <aidenfrostbite@gmail.com>
Signed-off-by: Ran Shidlansik <ranshid@amazon.com>
In `dbSetValue()` the `old` pointer may be reassigned to point to the
incoming value object which was created without an embedded key, so
calling `dbUntrackKeyWithVolatileItems()` would call `objectGetKey()`
which returns NULL, causing a crash in `hashtableSdsHash()` when trying
to hash the NULL key.

Idea is to assign `old_was_hash_with_volatile` before the swap and use
`new` instead of `old` for untracking when theres no embedded key.

Introduced in valkey-io#3003
Run with NULL ptr dereference:
https://github.com/valkey-io/valkey/actions/runs/20701343184/job/59424029880

---------

Signed-off-by: Daniil Kashapov <daniil.kashapov.ykt@gmail.com>
Co-authored-by: Ran Shidlansik <ranshid@amazon.com>
Signed-off-by: Ran Shidlansik <ranshid@amazon.com>
…alkey-io#3022)

if the hrandfield(e.g. hrandfield myhash) command without other args
does not find a valid field, it will return an uninitialized lval.

```
debug set-active-expire no
hsetex myhash ex 1 fields 2 f1 v1 f2 v2
after 1s...
hrandfield myhash
[will return some uninitialized number]
```

related: valkey-io#3021

Signed-off-by: Ran Shidlansik <ranshid@amazon.com>
…on is in the past (valkey-io#3023)

if an expired time is used, the condition is ignored, and it directly
becomes the effect of the `hdel` command.
This is mainly important for cases where the user would like to protect
deleting the data in case the `HEXPIREAT` command is used and the user
would like to protect delay execution of this command to delete fields.

fixes: valkey-io#3021

---------

Signed-off-by: Ran Shidlansik <ranshid@amazon.com>
…piration is in the past (valkey-io#3048)

valkey-io#3023 was only partially solving
the problem.
We need to avoid expiring items in case of any validation rule failed.

---------

Signed-off-by: Ran Shidlansik <ranshid@amazon.com>
Co-authored-by: Binbin <binloveplay1314@qq.com>
Signed-off-by: Ran Shidlansik <ranshid@amazon.com>
…ey-io#3060)

There are currently several issues with the existing hash field
expiration mechanism:
1. `HINCRBY` is propagated to the replica "as-is". It mean it relies on
the fact that the state of the hash is the same on the primary and the
replica. HFE did change this assumption as the field might be expired
only when the replica will handle the propagated `hincrby`. the problem
is that the replica does not "expire" fields by it's own. it needs to
respect the request from the primary and always try to use the existing
field. This can lead to either miss-alignment with the value on the
primary and the replica AND even a disconnection since the replica might
hold and "expired" field which is not in "integer" format...
2. HINCRBYFLOAT is currently ALWAYS propagating `hset` - this means that
the expiration time of an entry will always be removed on the replica
side (it needs to propagate HSETEX when expiration time needs to be
maintained)
3. Currently all hash write commands which are mutating values might
overwrite an expired field. In such cases the existing implementation
will "silently" do so. The problem is that the user will not get any
key-space-notificaiton explaining the reason for the behavior. For
example, when `hincrby` is issued overwriting an expired field which was
not yet "cleaned" by active-expiration it will reset the counter to '0'
before incrementing it. this means that the user might ask: why is the
value '1' and not bigger, "I did not see any notification that the old
value expired"...
4. HSETEX with KEEPTTL suffers from a "somewhat" similar problem as
if the primary "replaced" the entry which is expired now but might not
have been expired when the primary applied it.

There are 2 options for a solution:
1. we could propagate `hdel` for every entry we are "overwritting"
(batch them if we can)
2. propagate the commands "by effect". For example - have `hincrby`
always propagate either HSET or HSETEX. This will not solve the '#(4)'
problem above though, for which we might HAVE to propagate `hdel`

I tend to go with the second option. The reason is that it is expected
to have less impact on replication stream and should include less
processing time on the replicas and network traffic. Specifically for
HSETEX with KEEPTTL we will have to propagate the `hdel` in case we
overwritten an expired field, but that would help limit the impact of
this propagation.

---------

Signed-off-by: Ran Shidlansik <ranshid@amazon.com>
Co-authored-by: Sourav Singh Rawat <aidenfrostbite@gmail.com>
Co-authored-by: Binbin <binloveplay1314@qq.com>
Signed-off-by: Ran Shidlansik <ranshid@amazon.com>
…alkey-io#3120)

The test is currently flakey, because until we merge:
valkey-io#3001
when the expiration time provided is in the past, and the field does not
exist the HSETX will just silently ignore it, without incrementing the
statistics.

I prefer to focus on writing a dedicated test for:
valkey-io#3001 and deflake this test now.

Signed-off-by: Ran Shidlansik <ranshid@amazon.com>
@ranshid ranshid force-pushed the backport-hfe-fixes branch from 8872177 to 4f040ff Compare January 28, 2026 21:39
This caused some unaligned load of server values on 32bit compilations.

Example:
https://github.com/valkey-io/valkey/actions/runs/21352969942/job/61491923381?pr=3111#step:6:3064

Some insights here:
valkey-io#3111 (comment)

Signed-off-by: Ran Shidlansik <ranshid@amazon.com>
@ranshid ranshid merged commit 27e76e3 into valkey-io:9.0 Jan 29, 2026
153 of 162 checks passed
harrylin98 pushed a commit to harrylin98/valkey_forked that referenced this pull request Feb 19, 2026
This caused some unaligned load of server values on 32bit compilations.

Example:
https://github.com/valkey-io/valkey/actions/runs/21352969942/job/61491923381?pr=3111#step:6:3064

Some insights here:
valkey-io#3111 (comment)

Signed-off-by: Ran Shidlansik <ranshid@amazon.com>
hpatro pushed a commit to hpatro/valkey that referenced this pull request Mar 5, 2026
This caused some unaligned load of server values on 32bit compilations.

Example:
https://github.com/valkey-io/valkey/actions/runs/21352969942/job/61491923381?pr=3111#step:6:3064

Some insights here:
valkey-io#3111 (comment)

Signed-off-by: Ran Shidlansik <ranshid@amazon.com>
Signed-off-by: Harkrishn Patro <bunty.hari@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

run-extra-tests Run extra tests on this PR (Runs all tests from daily except valgrind and RESP)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants