Skip to content

Backport bug fixes for 9.0.2#3129

Merged
zuiderkwast merged 12 commits into
valkey-io:9.0from
zuiderkwast:backport-9-0-2
Feb 3, 2026
Merged

Backport bug fixes for 9.0.2#3129
zuiderkwast merged 12 commits into
valkey-io:9.0from
zuiderkwast:backport-9-0-2

Conversation

@zuiderkwast

@zuiderkwast zuiderkwast commented Jan 29, 2026

Copy link
Copy Markdown
Contributor

Includes release notes for these commits and for the bug fixes backported earlier in #3111 and #3132.

@zuiderkwast zuiderkwast requested a review from ranshid January 29, 2026 15:25
@zuiderkwast zuiderkwast requested a review from a team January 29, 2026 15:25
@codecov

codecov Bot commented Jan 29, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 69.23077% with 12 lines in your changes missing coverage. Please review.
✅ Project coverage is 72.61%. Comparing base (d26a103) to head (529dad6).
⚠️ Report is 12 commits behind head on 9.0.

Files with missing lines Patch % Lines
src/module.c 0.00% 7 Missing ⚠️
src/networking.c 78.26% 5 Missing ⚠️
Additional details and impacted files
@@           Coverage Diff           @@
##              9.0    #3129   +/-   ##
=======================================
  Coverage   72.61%   72.61%           
=======================================
  Files         128      128           
  Lines       69972    69989   +17     
=======================================
+ Hits        50810    50826   +16     
- Misses      19162    19163    +1     
Files with missing lines Coverage Δ
src/db.c 93.18% <100.00%> (ø)
src/expire.c 97.83% <100.00%> (+0.01%) ⬆️
src/object.c 81.94% <100.00%> (ø)
src/server.c 88.43% <100.00%> (-0.11%) ⬇️
src/t_stream.c 93.38% <100.00%> (ø)
src/networking.c 88.39% <78.26%> (-0.07%) ⬇️
src/module.c 9.75% <0.00%> (-0.01%) ⬇️

... and 13 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Comment thread 00-RELEASENOTES Outdated
Comment thread 00-RELEASENOTES Outdated
Comment thread 00-RELEASENOTES Outdated
Comment thread 00-RELEASENOTES Outdated
Comment thread 00-RELEASENOTES Outdated
vitahlin and others added 11 commits January 30, 2026 12:25
GitHub has deprecated older macOS runners, and macos-13 is no longer supported.

1. The latest version of cross-platform-actions/action does allow
running on ubuntu-latest (Linux runner) and does not strictly require macOS.
2. Previously, cross-platform-actions/action@v0.22.0 used runs-on:
macos-13. I checked the latest version of cross-platform-actions, and
the official examples now use runs-on: ubuntu. I think we can switch from macOS to Ubuntu.

---------

Signed-off-by: Vitah Lin <vitahlin@gmail.com>
…ey-io#2945)

Previously, only changes to argv[0] were considered when deciding
whether to re-prepare the command and recompute the cluster slot. This
caused issues where changes to the argument count (argc) would not
trigger the necessary prepare in case additional keys were injected to
the command

This fix adds a check to ensure that changes to argc are properly
considered.

Signed-off-by: Patrik Hermansson <phermansson@gmail.com>
…ey-io#2944)

When import-mode is yes, we might be able to set an expired TTL. At the
same time,
commands like EXPIREAT/EXPIRE do not restrict TTL from being negative.
After we
set import-mode to no, server will crash at:
```
 int activeExpireCycleTryExpire(serverDb *db, robj *val, long long now, int didx) { 
     long long t = objectGetExpire(val); 
     serverAssert(t >= 0); 
```

In this case, we restrict ttl from being negative in
expireGenericCommand, we simply
change the expiration time to 0 to mark the key as expired since in
import-mode, the
import-source client can always read the expired keys anyway.

import-mode was introduced in valkey-io#1185

---------

Signed-off-by: cjx-zar <jxchenczar@foxmail.com>
…-io#2983)

There is a crash in freeReplicationBacklog:
```
Discarding previously cached primary state.
ASSERTION FAILED
'listLength(server.replicas) == 0' is not true
freeReplicationBacklog
```

The reason is that during dual channel operation, the RDB channel is protected.
In the chained replica case, `disconnectReplicas` is called to disconnect all
replica clients, but since the RDB channel is protected, `freeClient` does not
actually free the replica client. Later, we encounter an assertion failure in
`freeReplicationBacklog`.
```
void replicationAttachToNewPrimary(void) {
    /* Replica starts to apply data from new primary, we must discard the cached
     * primary structure. */
    serverAssert(server.primary == NULL);
    replicationDiscardCachedPrimary();

    /* Cancel any in progress imports (we will now use the primary's) */
    clusterCleanSlotImportsOnFullSync();

    disconnectReplicas();     /* Force our replicas to resync with us as well. */
    freeReplicationBacklog(); /* Don't allow our chained replicas to PSYNC. */
}
```

Dual channel replication was introduced in valkey-io#60.

Signed-off-by: Binbin <binloveplay1314@qq.com>
…n and data corruption (valkey-io#3004)

When loading AOF in cluster mode, keys inside a MULTI/EXEC block could
be
inserted into wrong hash slots, causing key duplication and data
corruption.

The root cause was the slot caching optimization in getKeySlot(). This
optimization reuses a cached slot value to avoid recalculating the hash
for every key operation. However, when replaying AOF, a transaction may
contain commands affecting keys in different slots. The cached slot from
a previous command (e.g., SET k1) would incorrectly be used for
subsequent
commands in the transaction (e.g., SET k0), causing k0 to be stored in
k1's
slot.

The existing code already skipped this optimization for replicated
clients
(commands from primary) using isReplicatedClient(). This change extends
that to also skip for AOF clients by using mustObeyClient() instead,
which
covers both replicated clients and the AOF client.

Fixes valkey-io#2995, introduced in valkey-io#1949.

Signed-off-by: aditya.teltia <teltia.aditya22@gmail.com>
…erhead (valkey-io#3005)

The metric `used_memory_dataset` turned into an insanely large number close to 2^64 (actually
overflowed negative value), as reported in valkey-io#2994.

## Double-Counted database memory

When server starts, the global variable `server.initial_memory_usage` is used to record a memory
baseline in InitServerLast. This `server.initial_memory_usage` has clearly included initial database
memory, since databases are created in initServer.

In function getMemoryOverheadData, the `mem_total` is firstly assigned the baseline, which includes
initial database memory. And then all extra memory usage of databases are added to mem_total. The initial
database memory are therefore counted TWICE.

This eventually caused wrongly larger `used_memory_overhead`. For a database with only a couple of keys,
the `used_memory_overhead` is easily larger than `used_memory` and causes an overflowed `used_memory_dataset`.

## Missed Empty Databases

In function getMemoryOverheadData(), kvstores without any allocated hashtable are ignored from calculation:
```c
if (db == NULL || !kvstoreNumAllocatedHashtables(db->keys)) continue;
```

However, even the kvstore has no allocated hashtable, there are still some memory allocated by kvstoreCreate(),
including `hashtable_size_index`, which can be larger than 128 KiB.

On the contrary, this caused wrongly smaller `used_memory_overhead` for an empty database. When we insert only
ONE key to the database, the database is suddenly taken into account, and `used_memory_overhead` will increase
(for `used_memory_dataset` decrease) by more than 128 KiB due to the single key insertion.

Signed-off-by: Ace Breakpoint <chemistudio@gmail.com>
Signed-off-by: bpint <chemistudio@gmail.com>
Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
Co-authored-by: Binbin <binloveplay1314@qq.com>
Co-authored-by: Madelyn Olson <madelyneolson@gmail.com>
Makes our tests possible to run with TCL 9.

The latest Fedora now has TCL 9.0 and it's working now, including the
TCL TLS package. (This wasn't working earlier due to some packaging
errors for TCL packages in Fedora, which have been fixed now.)

This PR also removes the custom compilation of TCL 8 used in our Daily
jobs and uses the system default TCL version instead. The TCL version
depends on the OS. For the latest Fedora, you get 9.0, for macOS you get
8.5 and for most other OSes you get 8.6.

The checks for TCL 8.7 are removed, because 8.7 doesn't exist. It was
never released.

---------

Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
…copy-avoidance (valkey-io#3046)

In releaseBufReferences(), clusterSlotStatsAddNetworkBytesOutForSlot() is called in the IO
thread after writing the reply to the client, and (since valkey-io#2652) it's also done in the normal
code path in the main thread (afterCommand() -> clusterSlotStatsAddNetworkBytesOutForUserClient()).
This means the output bytes are recorded twice in the cluster slot stats.

Fixes valkey-io#3043. Bug was introduced in valkey-io#2652 (9.0.1).

Signed-off-by: Binbin <binloveplay1314@qq.com>
Co-authored-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
When using XREAD STREAMS <stream> + on an empty stream created with
MKSTREAM, valkey returns an error instead of nil.

This happens because is missing a check on the stream length.

The fix adds the length check on the condition.

Fixes valkey-io#2728

Signed-off-by: diego-ciciani01 <diego.ciciani@gmail.com>
Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
Co-authored-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
…alkey-io#3086)

Currently after valkey-io#2652 in copy avoidance path we unconditionally track
`c->net_output_bytes_curr_cmd` even when `commandlog-request-larger-than
-1`. This PR provides ability to skip that accounting in copy avoidance
path based on config value. If `commandlog-request-larger-than -1` then
performance is same as before valkey-io#2652.

Also added tracking of `c->net_output_bytes_curr_cmd` in IO thread if it
is not tracked in main thread based on decision made in main thread.

Read Performance (write performance is the same):
```
With this change:
Summary:
  throughput summary: 2191732.75 requests per second
  latency summary (msec):
          avg       min       p50       p95       p99       max
        1.720     0.072     1.743     1.919     2.647    23.983

Unstable:
Summary:
  throughput summary: 1658197.25 requests per second
  latency summary (msec):
          avg       min       p50       p95       p99       max
        2.299     0.120     2.343     2.503     3.319     4.791

Config:
commandlog-request-larger-than -1
^^ without this performance is just like on unstable branch
databases 1
save ""
appendonly no
rdbcompression no
activedefrag no
maxclients 1000
io-threads 9
protected-mode no
hz 10
maxmemory 2gb
```

---------

Signed-off-by: Daniil Kashapov <daniil.kashapov.ykt@gmail.com>
This change amends valkey-io#3086 in which `commandlog-request-larger-than` was
mixed up with `commandlog-reply-larger-than` while checking for config
values and in tests.

---------

Signed-off-by: Daniil Kashapov <daniil.kashapov.ykt@gmail.com>
@zuiderkwast zuiderkwast force-pushed the backport-9-0-2 branch 3 times, most recently from 85f43fa to eeba9e3 Compare February 3, 2026 10:57
Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
@zuiderkwast zuiderkwast merged commit 1ac4cfe into valkey-io:9.0 Feb 3, 2026
63 of 65 checks passed
@zuiderkwast zuiderkwast deleted the backport-9-0-2 branch February 3, 2026 13:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

10 participants