Skip to content

8.6 commits backport#465

Closed
sundb wants to merge 17 commits into
8.6-basefrom
8.6-backport
Closed

8.6 commits backport#465
sundb wants to merge 17 commits into
8.6-basefrom
8.6-backport

Conversation

@sundb

@sundb sundb commented Jun 4, 2026

Copy link
Copy Markdown
Owner

No description provided.

…itectures (redis#15212)

Ensure backward compatibility and consistent behavior across different
architectures by explicitly setting the default value.

Fixes redis#15175

Co-authored-by: ofiryanai <ofiryanai1@gmail.com>
(cherry picked from commit 6c3a8ec)
@codecov

codecov Bot commented Jun 4, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 66.84492% with 62 lines in your changes missing coverage. Please review.
✅ Project coverage is 76.31%. Comparing base (77870e9) to head (8ee4d45).

Files with missing lines Patch % Lines
src/sentinel.c 0.00% 47 Missing ⚠️
src/rdb.c 42.85% 8 Missing ⚠️
src/db.c 86.66% 2 Missing ⚠️
src/redis-check-rdb.c 0.00% 2 Missing ⚠️
src/eval.c 0.00% 1 Missing ⚠️
src/module.c 50.00% 1 Missing ⚠️
src/networking.c 98.64% 1 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##           8.6-base     #465      +/-   ##
============================================
+ Coverage     76.24%   76.31%   +0.07%     
============================================
  Files           138      138              
  Lines         79708    79838     +130     
============================================
+ Hits          60771    60929     +158     
+ Misses        18937    18909      -28     
Files with missing lines Coverage Δ
src/cluster_asm.c 92.23% <100.00%> (-0.22%) ⬇️
src/commands.def 100.00% <ø> (ø)
src/evict.c 98.49% <100.00%> (ø)
src/keymeta.c 93.34% <100.00%> (+0.47%) ⬆️
src/multi.c 95.76% <100.00%> (+0.01%) ⬆️
src/object.c 88.77% <100.00%> (+0.05%) ⬆️
src/pubsub.c 91.57% <100.00%> (ø)
src/replication.c 87.63% <100.00%> (ø)
src/script_lua.c 90.31% <100.00%> (ø)
src/server.c 90.60% <100.00%> (+0.01%) ⬆️
... and 11 more

... and 15 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

melatron and others added 16 commits June 4, 2026 10:59
PFMERGE's second key spec (source keys) produces an empty range when
called with only a dest key (e.g. PFMERGE dest). getKeysUsingKeySpecs
treats that as invalid_spec, which discards all previously found keys
and returns an error. Add pfmergeGetKeys as a getkeys callback so the
command correctly falls back to it when key specs fail on the edge case.

(cherry picked from commit 5f5ddfd)
Validate HEXPIRE-family field counts without parser overflow
keep flexible option order; only require fields fit in argv
add tests for INT_MAX numfields across HEXPIRE/HPEXPIRE/HEXPIREAT/HPEXPIREAT

(cherry picked from commit e1d35ac)
This PR fixes two issues when processing corrupt data in
rdbLoadCheckModuleValue():

1. When handling `RDB_MODULE_OPCODE_STRING` opcode,
rdbGenericLoadStringObject() can return NULL on a corrupt payload. The
code called decrRefCount(o) unconditionally without a NULL check,
resulting in a NULL pointer dereference crash.

2. The while loop condition was `!= RDB_MODULE_OPCODE_EOF`, which means
a truncated payload (causing rdbLoadLen to return RDB_LENERR) would
never exit the loop, since `RDB_LENERR != RDB_MODULE_OPCODE_EOF` is
always true, potentially causing an infinite hang.

(cherry picked from commit ca6e471)
…tracking (redis#15037)

`xinfoReplyWithStreamInfo` passed the wrong key(c->argv[1]) instead of
`c->argv[2]` to `updateSlotAllocSize` when updating per-slot memory
tracking.

Fix by passing the key explicitly to `xinfoReplyWithStreamInfo` instead
of relying on a hardcoded argv index.
Also, add the `-DDEBUG_ASSERTIONS` flag to the test-ubuntu-jemalloc CI
to cover this debug assertion.

(cherry picked from commit 2049c7f)
RM_RegisterClusterMessageReceiver() unlinks a receiver node from the
clusterReceivers[type] linked list when the callback is set to NULL, but
when removing the head node (prev == NULL), the code updates
clusterReceivers[type]->next instead of clusterReceivers[type] itself.

This leaves clusterReceivers[type] pointing to the freed node, so any
later traversal through clusterReceivers[type] dereferences a dangling
pointer.

Fix by updating clusterReceivers[type] directly when prev == NULL.

Fixes redis#15057

---------

Co-authored-by: debing.sun <debing.sun@redis.com>
(cherry picked from commit 303667a)
Fixes checkPrefixCollisionsOrReply() to return 0 (failure) on any provided-prefix self-overlap, instead of accidentally returning a non-zero loop index for overlaps found after the first prefix.

Signed-off-by: Raj Danday <rajkripal.danday@gmail.com>
(cherry picked from commit 625b6f5)
### Problem
In `scanGenericCommand`, `maxiterations = count * 10` overflows when
`count > LONG_MAX / 10`, causing undefined behavior.

### Changed
1. Use saturating arithmetic to prevent overflow.
2. Added a test to trigger the overflow path, detectable by UBSan.

(cherry picked from commit fafc472)
Reject control characters (0x00-0x1F, 0x7F) in user-controlled string
arguments to SENTINEL SET, SENTINEL MONITOR, and SENTINEL CONFIG SET to
prevent newline injection into the persisted config file. An attacker
with access to SENTINEL SET could inject arbitrary config directives
(e.g. notification-script) by embedding \r\n in auth-pass or similar
fields, leading to code execution on restart.

As a defense-in-depth measure, config serialization now uses sdscatrepr
(via sentinelSdscatConfigArg) for all user-controlled string fields when
they contain characters that require escaping. Simple values remain
unquoted for backward compatibility with older config parsers.

Add comprehensive Sentinel tests (16-config-injection.tcl) covering
control character rejection for all affected commands, verification that
injection payloads do not pollute the config file, round-trip
persistence of values containing spaces and quotes through restart, and
backward compatibility with the old unquoted config format.

(cherry picked from commit 3e1afec)
## Issue

The vector set Python tests intentionally use two clients:
- the default client (`self.redis`) for the existing RESP2-oriented test
expectations
- `self.redis3` for RESP3-specific coverage.

However, the default client did not explicitly set a protocol, so it
depended on redis-py's default behavior. With newer redis-py versions,
RESP3 is now the default
protocol(redis/redis-py#4052). In particular,
vector set replies such as `VSIM ... WITHSCORES` may be parsed into
map/dict-like structures instead of the RESP2 flat-array shape assumed
by existing tests.

## Changes

Explicitly create the default primary and replica Redis clients with
`protocol=2`.
`self.redis3` is left unchanged and continues to use `protocol=3` for
RESP3-specific test coverage.

(cherry picked from commit 8fcf3dc)
…ded bulk string references (redis#14934)

After redis#14608 (Reply Copy Avoidance), when copy avoidance kicks in, bulk
string replies are sent by reference instead of being copied into the
output buffer.
The referenced bytes are not counted in  `reply_bytes`, which causes:

1. `getClientOutputBufferMemoryUsage()` underestimates the actual memory
usage, so output buffer limits may not be triggered in time, allowing
clients to consume unbounded memory.
2. Client eviction does not account for the referenced bytes, making it
ineffective when copy avoidance is used.
3. `omem` reported in `CLIENT LIST` / `CLIENT INFO` does not reflect the
true output buffer memory footprint.

Track the bytes of referenced bulk strings in the output buffer with two
per-client counters:

1. reply_bytes_shared - the logical size of all BULK_STR_REF payloads in
the output buffer.
   Updated incrementally whenever a reference is added/removed.
Represents memory the client is "charged" for even though it is shared
with the keyspace.

2. reply_bytes_unshared — the subset of the above where the referenced
object's refcount == 1 (i.e. the key has been deleted from the
keyspace), so the memory is kept alive solely by this client's output
buffer and would actually be freed on disconnect.
Maintained as a lazy cache refreshed via
updateClientUnsharedReplyBytes().

CLIENT LIST / CLIENT INFO — two new fields, plus refined semantics for
existing ones:

Field | Meaning
-- | --
omem | (semantics changed) logical output-buffer memory, now including
shared memory referenced from the keyspace. Still
excludes client->buf so static clients show 0.
omem-shared | (new) shared output-buffer memory (referenced bulk
strings, not solely owned by this client).
omem-unshared | (new) unshared output-buffer memory (referenced bulk
strings solely owned by this client; freed on disconnect).
tot-mem | (semantics refined) actual memory usage —
includes omem-unshared, excludes omem-shared to avoid double-counting
keyspace memory.

INFO memory — two new fields mirroring the above:

Field | Meaning
-- | --
mem_clients_normal | (semantics changed) actual memory usage of normal
clients (includes unshared, excludes shared).
mem_clients_normal_shared | (new) aggregate shared output-buffer memory
across normal clients.
mem_clients_normal_unshared | (new) aggregate unshared output-buffer
memory across normal clients.

MEMORY STATS — schema extended with the matching keys:

Field | Meaning
-- | --
clients.normal.shared | (new) aggregate shared output-buffer memory
across normal clients.
clients.normal.unshared | (new) aggregate unshared output-buffer memory
across normal clients.
Fix missing closeClientOnOutputBufferLimitReached() call when adding a
referenced robj to the reply

---------

Co-authored-by: oranagra <oran@redislabs.com>
(cherry picked from commit 05859cd)
This PR is based on: valkey-io/valkey#3511
Close redis#14983

## Summary

During diskless replication, if **any single replica** cannot accept a
write (TCP send buffer full / `EAGAIN`), the master stops reading the
RDB pipe entirely, stalling data delivery to **all** replicas —
including fast ones that are ready to receive data.

The failure reason is similar to
redis#14946, the socket buffer is more
easy to fill.

## Root Cause

In `rdbPipeReadHandler`, the master reads from the child's RDB pipe and
writes to all replica sockets in a loop. When `connWrite` to any replica
returns a partial write (socket send buffer full), the handler:

1. Installs a per-replica `rdbPipeWriteHandler` and increments
`rdb_pipe_numconns_writing`
2. **Removes the pipe read event** via `aeDeleteFileEvent(server.el,
server.rdb_pipe_read, AE_READABLE)`, stopping all pipe reads

The pipe read event is only re-enabled when **all** pending write
handlers complete (`rdb_pipe_numconns_writing == 0`), meaning the
**slowest replica dictates the throughput for all replicas**.

## Observed Behavior

With one slow replica (consuming at ~290 KB/s due to `key-load-delay`):

- Master bursts ~1.3 MB of RDB data until the slow replica's socket send
buffer fills
- `rdbPipeReadHandler` disables the pipe read event
- **All replicas starve for 4–5 seconds** while the slow replica drains
its buffer
- Cycle repeats: burst → stall → burst → stall

Ultimately, it leads to a very slow synchronization process of the
entire master and replica.

### Changes

1. Skip the entire `diskless replicas drop during rdb pipe` test under
Valgrind to avoid timing flakiness on slow env.

2. Move `start_server` inside the `foreach all_drop` loop so each
subcase gets a fresh master instead of sharing state across subcases.

3. For `no / slow / fast / all` subcases, replica 0 runs with
`key-load-delay 500`, which combined with the blocked-writer TCP
back-pressure can stall the RDB-saving child indefinitely; shrink the
dataset to ~40 MB so the transfer still exercises the blocked-writer
path but completes in reasonable time instead of hanging on the TCP
deadlock.
For the timeout subcase, replica 0 does not run with `key-load-delay
500`, so to avoid the TCP deadlock we still reduce the dataset somewhat,
but keep it larger than the other subcases. Otherwise the kernel TCP
send buffer can absorb the whole RDB, and we'd miss the
repl_last_partial_write != 0 "(full sync)" timeout path and only hit the
"(streaming sync)" path instead.

5. For the `all` subcase, set `rdb-key-save-delay 1000` on the master so
the RDB child keeps generating data while both replicas are killed,
ensuring the last-replica-drop path is exercised rather than racing with
normal completion.

6. Move the slow-replica `pause_process()` so it happens only in the
timeout subcase, not after killing replicas, so Redis observes the
disconnect promptly in non-timeout flows.

7. In the timeout subcase, set `repl-timeout` 2, wait inline for
`*Disconnecting timedout replica (full sync)*`, then restore
`repl-timeout` 60 so the remaining replica can finish the streamed RDB.

---------

Co-authored-by: Sarthak Aggarwal <sarthagg@amazon.com>
Co-authored-by: debing.sun <debing.sun@redis.com>
(cherry picked from commit 3189614)
Test 15-config-set-config-get.tcl was leaving announce-port and
announce-hostnames at non-default values, which breaks auto-discovery in
subsequent test units. Add reset lines at the end of each test that
modifies config. This PR fixes failures in Daily CI tests.

(cherry picked from commit d9b03bd)
Fixes consumer replication inconsistency when `XREADGROUP` is called for
a new consumer but no `XCLAIM` commands are propagated to the replica.
Previously, consumer creation was only propagated to replicas when
`noack=true`, relying on `XCLAIM` propagation to implicitly create the
consumer in the non-NOACK path. However, if no messages exist to read,
no `XCLAIM` is generated, and the consumer is silently lost on the
replica.

This is a follow-up to the original fix in
[redis#7140](redis#7140) /
[redis#7526](redis#7526), which
introduced `XGROUP CREATECONSUMER` propagation but only for the `NOACK`
case.

- **`xreadgroupCommand` (src/t_stream.c):** Replaced the `if (noack)`
guard around the `streamPropagateConsumerCreation()` call with a
deferred check after `streamReplyWithRange()`. Consumer creation is now
propagated when `noack || propCount == 0` — that is, only when no
`XCLAIM` commands were generated. This avoids redundant propagation in
the common case where `XCLAIM` already implicitly creates the consumer
on the replica, while correctly handling both the NOACK path (where
PEL/XCLAIM is skipped entirely) and the no-messages path (where there is
nothing to XCLAIM).
- **Test (tests/unit/type/stream-cgroups.tcl):** Added replication test
`"XREADGROUP propagates new consumer to replica"` that sets up a
master-replica pair and verifies consumer propagation in two cases: (1)
without NOACK when no messages are available to deliver, and (2) with
NOACK when messages are delivered but XCLAIM is skipped.

- **Master-replica consistency:** Consumers created by `XREADGROUP` are
now visible on replicas whenever no `XCLAIM` would otherwise create them
— covering both the NOACK path and the empty-stream path.
- **No redundant propagation:** The noack || propCount == 0 condition
avoids emitting a superfluous XGROUP CREATECONSUMER when XCLAIM commands
are already propagated and would implicitly create the consumer on the
replica.

(cherry picked from commit 0be39e5)
…s#15094)

Fixes redis#15085

## Problem
getKeySlot() may return `server.current_client->slot` while a command is
executing instead of computing the slot from the provided string.

The unsubscribe can be triggered by another client, in which case server.current_client is not the client being unsubscribed, so getKeySlot() would return that client's cached slot. Using this wrong slot would make the lookup in type.serverPubSubChannels miss the channel and ultimately trigger the assertion below.

## Fix
Always use keyHashSlot() instead of getKeySlot() on unsubscribe.

---------

Co-authored-by: debing.sun <debing.sun@redis.com>
(cherry picked from commit 0bbb196)
# Description
There is an array corruption bug in LDB caused by an incorrect size
argument being passed to `memmove()` inside the `ldbDelBreakpoint()`
function.

When deleting a breakpoint, `memmove()` is used to shift the remaining
breakpoints in the ldb.bp integer array forward. However, the size
parameter passes the number of elements rather than the number of bytes.
Because ldb.bp is an array of type `int`, this results in an under-copy.

(cherry picked from commit bf432c9)
PR redis#14440 changed `mstate.commands` from an array of `multiCmd` structs
to an array of `pendingCommand` pointers. This PR fixes the overhead
calculation in multiStateMemOverhead to account for both the pointer
array and the actual structs:
- The pointer array: `alloc_count * sizeof(pendingCommand*)`
- The individually allocated structs: `count * sizeof(pendingCommand)`

(cherry picked from commit 9302d27)
@sundb sundb closed this Jun 6, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.