8.6 commits backport by sundb · Pull Request #465 · sundb/redis

sundb · 2026-06-04T01:25:36Z

No description provided.

…itectures (redis#15212) Ensure backward compatibility and consistent behavior across different architectures by explicitly setting the default value.
Fixes redis#15175
Co-authored-by: ofiryanai <ofiryanai1@gmail.com>
(cherry picked from commit 6c3a8ec)

codecov · 2026-06-04T01:39:39Z

Codecov Report

❌ Patch coverage is 66.84492% with 62 lines in your changes missing coverage. Please review.
✅ Project coverage is 76.31%. Comparing base (77870e9) to head (8ee4d45).

Files with missing lines	Patch %	Lines
src/sentinel.c	0.00%	47 Missing ⚠️
src/rdb.c	42.85%	8 Missing ⚠️
src/db.c	86.66%	2 Missing ⚠️
src/redis-check-rdb.c	0.00%	2 Missing ⚠️
src/eval.c	0.00%	1 Missing ⚠️
src/module.c	50.00%	1 Missing ⚠️
src/networking.c	98.64%	1 Missing ⚠️

Additional details and impacted files

@@             Coverage Diff              @@
##           8.6-base     #465      +/-   ##
============================================
+ Coverage     76.24%   76.31%   +0.07%     
============================================
  Files           138      138              
  Lines         79708    79838     +130     
============================================
+ Hits          60771    60929     +158     
+ Misses        18937    18909      -28

Files with missing lines	Coverage Δ
src/cluster_asm.c	`92.23% <100.00%> (-0.22%)`	⬇️
src/commands.def	`100.00% <ø> (ø)`
src/evict.c	`98.49% <100.00%> (ø)`
src/keymeta.c	`93.34% <100.00%> (+0.47%)`	⬆️
src/multi.c	`95.76% <100.00%> (+0.01%)`	⬆️
src/object.c	`88.77% <100.00%> (+0.05%)`	⬆️
src/pubsub.c	`91.57% <100.00%> (ø)`
src/replication.c	`87.63% <100.00%> (ø)`
src/script_lua.c	`90.31% <100.00%> (ø)`
src/server.c	`90.60% <100.00%> (+0.01%)`	⬆️
... and 11 more

... and 15 files with indirect coverage changes

PFMERGE's second key spec (source keys) produces an empty range when the command is called with only a destination key (e.g. `PFMERGE dest`). `getKeysUsingKeySpecs` treats that as `invalid_spec`, which discards all previously found keys and returns an error. Add `pfmergeGetKeys` as a getkeys callback so the command correctly falls back to it when key specs fail in this edge case. (cherry picked from commit 5f5ddfd)
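The sketch below is a getkeys-style helper for `PFMERGE destkey [sourcekey ...]` and shows why an empty source range should not be an error. The destination key alone is still reported. It is a minimal illustration, not the `pfmergeGetKeys` from the patch. The name `pfmergeKeyIndexes` and its signature are hypothetical.

```c
#include <assert.h>

/* Hypothetical getkeys-style helper for: PFMERGE destkey [sourcekey ...]
 * Writes the argv indexes of all keys into `out` (capacity `max`) and
 * returns how many were written, or -1 if there is no destkey or `out`
 * is too small. An empty source range simply contributes no indexes. */
static int pfmergeKeyIndexes(int argc, int *out, int max) {
    if (argc < 2 || max < argc - 1) return -1;
    int n = 0;
    for (int i = 1; i < argc; i++)
        out[n++] = i; /* argv[1] is destkey; argv[2..] are sources (may be none) */
    return n;
}
```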

- Validate HEXPIRE-family field counts without parser overflow.
- Keep flexible option order; only require that the fields fit in argv.
- Add tests for INT_MAX numfields across HEXPIRE/HPEXPIRE/HEXPIREAT/HPEXPIREAT.

(cherry picked from commit e1d35ac)
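Here is a minimal sketch of an overflow-safe "fields fit in argv" check for a command shaped like `HEXPIRE key seconds [NX|XX|GT|LT] FIELDS numfields field [field ...]`. It assumes `numfields` was already parsed into a `long long` and that `first_field` is the argv index of the first field name. `fieldsFitInArgv` is a hypothetical helper, not the Redis implementation. It compares against the number of remaining arguments instead of computing `first_field + numfields`, so a value like INT_MAX cannot overflow.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical helper: returns 1 if `numfields` field names can be read
 * from argv starting at index `first_field`, 0 otherwise. Subtracting
 * (argc - first_field) instead of adding (first_field + numfields) keeps
 * the arithmetic in range even for numfields == INT_MAX. */
static int fieldsFitInArgv(long long numfields, int argc, int first_field) {
    if (numfields <= 0) return 0;
    if (first_field < 0 || first_field > argc) return 0;
    long long remaining = (long long)argc - first_field;
    return numfields <= remaining;
}
```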

This PR fixes two issues when processing corrupt data in `rdbLoadCheckModuleValue()`:

1. When handling the `RDB_MODULE_OPCODE_STRING` opcode, `rdbGenericLoadStringObject()` can return NULL on a corrupt payload. The code called `decrRefCount(o)` unconditionally without a NULL check, resulting in a NULL pointer dereference crash.
2. The while loop condition was `!= RDB_MODULE_OPCODE_EOF`, so a truncated payload (causing `rdbLoadLen` to return `RDB_LENERR`) would never exit the loop. Because `RDB_LENERR != RDB_MODULE_OPCODE_EOF` is always true, this could cause an infinite hang.

(cherry picked from commit ca6e471)
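The two fixes can be illustrated with a small self-contained loop over an in-memory opcode stream. This is a sketch, not Redis code. The reader, `readLen`, `readString`, `checkModuleValue`, and the `OP_*`/`LEN_ERR` constants are all hypothetical, and the opcode values are chosen only for illustration. The loop treats the length-error sentinel as a terminating failure instead of spinning. It also checks the loaded string for NULL before releasing it. In Redis that check is what keeps `decrRefCount()` from dereferencing NULL.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Illustrative values; LEN_ERR can never equal OP_EOF. */
#define OP_EOF    0
#define OP_STRING 5
#define LEN_ERR   UINT64_MAX

/* Hypothetical reader over an in-memory sequence of length-encoded values. */
typedef struct {
    const uint64_t *buf;
    size_t len, pos;
} reader;

static uint64_t readLen(reader *r) {
    if (r->pos >= r->len) return LEN_ERR; /* truncated input */
    return r->buf[r->pos++];
}

/* Returns a heap-allocated string, or NULL on a truncated/corrupt payload. */
static char *readString(reader *r) {
    uint64_t n = readLen(r);
    if (n == LEN_ERR || n > r->len - r->pos) return NULL;
    char *s = malloc((size_t)n + 1);
    if (s == NULL) return NULL;
    for (uint64_t i = 0; i < n; i++) s[i] = (char)r->buf[r->pos++];
    s[n] = '\0';
    return s;
}

/* Returns 0 if the value is well formed, -1 otherwise. */
static int checkModuleValue(reader *r) {
    uint64_t opcode;
    while ((opcode = readLen(r)) != OP_EOF) {
        if (opcode == LEN_ERR) return -1;   /* truncated: exit instead of looping forever */
        if (opcode != OP_STRING) return -1; /* unknown opcode */
        char *s = readString(r);
        if (s == NULL) return -1;           /* NULL check before releasing the value */
        free(s);
    }
    return 0;
}
```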

…tracking (redis#15037) `xinfoReplyWithStreamInfo` passed the wrong key (`c->argv[1]` instead of `c->argv[2]`) to `updateSlotAllocSize` when updating per-slot memory tracking. Fix by passing the key explicitly to `xinfoReplyWithStreamInfo` instead of relying on a hardcoded argv index. Also add the `-DDEBUG_ASSERTIONS` flag to the test-ubuntu-jemalloc CI job to cover this debug assertion. (cherry picked from commit 2049c7f)

`RM_RegisterClusterMessageReceiver()` unlinks a receiver node from the `clusterReceivers[type]` linked list when the callback is set to NULL. However, when removing the head node (`prev == NULL`), the code updates `clusterReceivers[type]->next` instead of `clusterReceivers[type]` itself. This leaves `clusterReceivers[type]` pointing to the freed node, so any later traversal through `clusterReceivers[type]` dereferences a dangling pointer. Fix by updating `clusterReceivers[type]` directly when `prev == NULL`.
Fixes redis#15057

---------
Co-authored-by: debing.sun <debing.sun@redis.com>
(cherry picked from commit 303667a)
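The correct unlink pattern for a singly linked list is sketched below. When the node being removed has no predecessor, the list head itself must be repointed. The `receiver` struct and `removeReceiver`/`pushReceiver` are hypothetical and only loosely modeled on `clusterReceivers[type]`. Passing the head by address (`receiver **`) is one common way to let the function update it.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical singly linked receiver list. */
typedef struct receiver {
    int module_id;
    struct receiver *next;
} receiver;

/* Unlinks and frees the node with `module_id`, if present. */
static void removeReceiver(receiver **head, int module_id) {
    receiver *prev = NULL, *cur = *head;
    while (cur != NULL && cur->module_id != module_id) {
        prev = cur;
        cur = cur->next;
    }
    if (cur == NULL) return;       /* not found */
    if (prev == NULL)
        *head = cur->next;         /* removing the head: repoint the head itself */
    else
        prev->next = cur->next;    /* removing an inner node */
    free(cur);
}

/* Pushes a new node at the front; returns the (possibly unchanged) head. */
static receiver *pushReceiver(receiver *head, int module_id) {
    receiver *r = malloc(sizeof(*r));
    if (r == NULL) return head;
    r->module_id = module_id;
    r->next = head;
    return r;
}
```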

Fixes checkPrefixCollisionsOrReply() to return 0 (failure) on any provided-prefix self-overlap, instead of accidentally returning a non-zero loop index for overlaps found after the first prefix. Signed-off-by: Raj Danday <rajkripal.danday@gmail.com> (cherry picked from commit 625b6f5)
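The bug class is easy to reproduce in isolation. If an overlap check returns a loop index on failure, an overlap found at index 0 returns 0, but any later one returns a non-zero value that the caller reads as success. The hypothetical `prefixesAreDisjoint` below returns the constant 0 on every failure path. It assumes "overlap" means one provided prefix is a prefix of another, and it is not the Redis implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Returns 1 if no prefix in `prefixes` is a prefix of another, 0 if any
 * pair overlaps. Every failure path returns the constant 0. */
static int prefixesAreDisjoint(const char *const *prefixes, size_t n) {
    for (size_t i = 0; i < n; i++) {
        size_t li = strlen(prefixes[i]);
        for (size_t j = i + 1; j < n; j++) {
            size_t lj = strlen(prefixes[j]);
            size_t m = li < lj ? li : lj;
            if (strncmp(prefixes[i], prefixes[j], m) == 0) return 0;
        }
    }
    return 1;
}
```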

### Problem
In `scanGenericCommand`, `maxiterations = count * 10` overflows when `count > LONG_MAX / 10`, causing undefined behavior.

### Changed
1. Use saturating arithmetic to prevent overflow.
2. Add a test that triggers the overflow path, detectable by UBSan.

(cherry picked from commit fafc472)
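One way to saturate instead of overflowing is sketched below. Signed overflow is undefined behavior in C, so the bound is checked by division before multiplying. `saturatingMulLong` is a hypothetical name, and the patch may structure this differently. A call site would look like `maxiterations = saturatingMulLong(count, 10);`.

```c
#include <assert.h>
#include <limits.h>

/* Returns count * factor, clamped to LONG_MAX instead of overflowing.
 * Assumes count >= 0 and factor > 0. */
static long saturatingMulLong(long count, long factor) {
    assert(count >= 0 && factor > 0);
    if (count > LONG_MAX / factor) return LONG_MAX; /* product would overflow */
    return count * factor;
}
```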

Reject control characters (0x00-0x1F, 0x7F) in user-controlled string arguments to SENTINEL SET, SENTINEL MONITOR, and SENTINEL CONFIG SET to prevent newline injection into the persisted config file. An attacker with access to SENTINEL SET could inject arbitrary config directives (e.g. `notification-script`) by embedding `\r\n` in `auth-pass` or similar fields, leading to code execution on restart.

As a defense-in-depth measure, config serialization now uses `sdscatrepr` (via `sentinelSdscatConfigArg`) for all user-controlled string fields when they contain characters that require escaping. Simple values remain unquoted for backward compatibility with older config parsers.

Add comprehensive Sentinel tests (`16-config-injection.tcl`) covering:
- control character rejection for all affected commands,
- verification that injection payloads do not pollute the config file,
- round-trip persistence of values containing spaces and quotes through restart, and
- backward compatibility with the old unquoted config format.

(cherry picked from commit 3e1afec)
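The rejection rule is a simple byte scan over the ranges named above (0x00-0x1F and 0x7F). `hasControlChars` is a hypothetical helper, not the Sentinel code. It takes an explicit length rather than relying on `strlen`, so an embedded NUL byte is caught as well. Values with spaces or quotes pass, which matches the round-trip behavior described above.

```c
#include <assert.h>
#include <stddef.h>

/* Returns 1 if the first `len` bytes of `s` contain an ASCII control
 * character (0x00-0x1F) or DEL (0x7F), 0 otherwise. */
static int hasControlChars(const char *s, size_t len) {
    for (size_t i = 0; i < len; i++) {
        unsigned char c = (unsigned char)s[i];
        if (c <= 0x1F || c == 0x7F) return 1;
    }
    return 0;
}
```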

## Issue
The vector set Python tests intentionally use two clients:
- the default client (`self.redis`) for the existing RESP2-oriented test expectations
- `self.redis3` for RESP3-specific coverage.

However, the default client did not explicitly set a protocol, so it depended on redis-py's default behavior. With newer redis-py versions, RESP3 is now the default protocol (redis/redis-py#4052). In particular, vector set replies such as `VSIM ... WITHSCORES` may be parsed into map/dict-like structures instead of the RESP2 flat-array shape assumed by existing tests.

## Changes
Explicitly create the default primary and replica Redis clients with `protocol=2`. `self.redis3` is left unchanged and continues to use `protocol=3` for RESP3-specific test coverage.

(cherry picked from commit 8fcf3dc)

…ded bulk string references (redis#14934)

After redis#14608 (Reply Copy Avoidance), when copy avoidance kicks in, bulk string replies are sent by reference instead of being copied into the output buffer. The referenced bytes are not counted in `reply_bytes`, which causes:

1. `getClientOutputBufferMemoryUsage()` underestimates the actual memory usage, so output buffer limits may not be triggered in time, allowing clients to consume unbounded memory.
2. Client eviction does not account for the referenced bytes, making it ineffective when copy avoidance is used.
3. `omem` reported in `CLIENT LIST` / `CLIENT INFO` does not reflect the true output buffer memory footprint.

Track the bytes of referenced bulk strings in the output buffer with two per-client counters:

1. `reply_bytes_shared`: the logical size of all `BULK_STR_REF` payloads in the output buffer. Updated incrementally whenever a reference is added or removed. Represents memory the client is "charged" for even though it is shared with the keyspace.
2. `reply_bytes_unshared`: the subset of the above where the referenced object's refcount == 1 (i.e. the key has been deleted from the keyspace), so the memory is kept alive solely by this client's output buffer and would actually be freed on disconnect. Maintained as a lazy cache refreshed via `updateClientUnsharedReplyBytes()`.

CLIENT LIST / CLIENT INFO: two new fields, plus refined semantics for existing ones:

Field | Meaning
-- | --
omem | (semantics changed) logical output-buffer memory, now including shared memory referenced from the keyspace. Still excludes client->buf so static clients show 0.
omem-shared | (new) shared output-buffer memory (referenced bulk strings, not solely owned by this client).
omem-unshared | (new) unshared output-buffer memory (referenced bulk strings solely owned by this client; freed on disconnect).
tot-mem | (semantics refined) actual memory usage: includes omem-unshared, excludes omem-shared to avoid double-counting keyspace memory.
INFO memory: two new fields mirroring the above:

Field | Meaning
-- | --
mem_clients_normal | (semantics changed) actual memory usage of normal clients (includes unshared, excludes shared).
mem_clients_normal_shared | (new) aggregate shared output-buffer memory across normal clients.
mem_clients_normal_unshared | (new) aggregate unshared output-buffer memory across normal clients.

MEMORY STATS: schema extended with the matching keys:

Field | Meaning
-- | --
clients.normal.shared | (new) aggregate shared output-buffer memory across normal clients.
clients.normal.unshared | (new) aggregate unshared output-buffer memory across normal clients.

Also fix a missing `closeClientOnOutputBufferLimitReached()` call when adding a referenced robj to the reply.

---------
Co-authored-by: oranagra <oran@redislabs.com>
(cherry picked from commit 05859cd)

This PR is based on: valkey-io/valkey#3511
Close redis#14983

## Summary
During diskless replication, if **any single replica** cannot accept a write (TCP send buffer full / `EAGAIN`), the master stops reading the RDB pipe entirely. This stalls data delivery to **all** replicas, including fast ones that are ready to receive data. The failure reason is similar to redis#14946: the socket buffer is easier to fill.

## Root Cause
In `rdbPipeReadHandler`, the master reads from the child's RDB pipe and writes to all replica sockets in a loop. When `connWrite` to any replica returns a partial write (socket send buffer full), the handler:
1. Installs a per-replica `rdbPipeWriteHandler` and increments `rdb_pipe_numconns_writing`
2. **Removes the pipe read event** via `aeDeleteFileEvent(server.el, server.rdb_pipe_read, AE_READABLE)`, stopping all pipe reads

The pipe read event is only re-enabled when **all** pending write handlers complete (`rdb_pipe_numconns_writing == 0`), meaning the **slowest replica dictates the throughput for all replicas**.

## Observed Behavior
With one slow replica (consuming at ~290 KB/s due to `key-load-delay`):
- Master bursts ~1.3 MB of RDB data until the slow replica's socket send buffer fills
- `rdbPipeReadHandler` disables the pipe read event
- **All replicas starve for 4–5 seconds** while the slow replica drains its buffer
- Cycle repeats: burst → stall → burst → stall

Ultimately, this makes synchronization between the master and all of its replicas very slow.

### Changes
1. Skip the entire `diskless replicas drop during rdb pipe` test under Valgrind to avoid timing flakiness in slow environments.
2. Move `start_server` inside the `foreach all_drop` loop so each subcase gets a fresh master instead of sharing state across subcases.
3. For the `no / slow / fast / all` subcases, replica 0 runs with `key-load-delay 500`. Combined with the blocked-writer TCP back-pressure, this can stall the RDB-saving child indefinitely, so shrink the dataset to ~40 MB. The transfer still exercises the blocked-writer path but completes in reasonable time instead of hanging on the TCP deadlock. For the timeout subcase, replica 0 does not run with `key-load-delay 500`, so to avoid the TCP deadlock we still reduce the dataset somewhat, but keep it larger than in the other subcases. Otherwise the kernel TCP send buffer can absorb the whole RDB, and we'd miss the `repl_last_partial_write != 0` "(full sync)" timeout path and only hit the "(streaming sync)" path instead.
5. For the `all` subcase, set `rdb-key-save-delay 1000` on the master so the RDB child keeps generating data while both replicas are killed, ensuring the last-replica-drop path is exercised rather than racing with normal completion.
6. Move the slow-replica `pause_process()` so it happens only in the timeout subcase, not after killing replicas, so Redis observes the disconnect promptly in non-timeout flows.
7. In the timeout subcase, set `repl-timeout` to 2, wait inline for `*Disconnecting timedout replica (full sync)*`, then restore `repl-timeout` to 60 so the remaining replica can finish the streamed RDB.

---------
Co-authored-by: Sarthak Aggarwal <sarthagg@amazon.com>
Co-authored-by: debing.sun <debing.sun@redis.com>
(cherry picked from commit 3189614)

Test 15-config-set-config-get.tcl was leaving announce-port and announce-hostnames at non-default values, which breaks auto-discovery in subsequent test units. Add reset lines at the end of each test that modifies config. This PR fixes failures in Daily CI tests. (cherry picked from commit d9b03bd)

Fixes consumer replication inconsistency when `XREADGROUP` is called for a new consumer but no `XCLAIM` commands are propagated to the replica.

Previously, consumer creation was only propagated to replicas when `noack=true`, relying on `XCLAIM` propagation to implicitly create the consumer in the non-NOACK path. However, if no messages exist to read, no `XCLAIM` is generated, and the consumer is silently lost on the replica. This is a follow-up to the original fix in redis#7140 / redis#7526, which introduced `XGROUP CREATECONSUMER` propagation but only for the `NOACK` case.

- **`xreadgroupCommand` (src/t_stream.c):** Replaced the `if (noack)` guard around the `streamPropagateConsumerCreation()` call with a deferred check after `streamReplyWithRange()`. Consumer creation is now propagated when `noack || propCount == 0`, that is, only when no `XCLAIM` commands were generated. This avoids redundant propagation in the common case where `XCLAIM` already implicitly creates the consumer on the replica, while correctly handling both the NOACK path (where PEL/XCLAIM is skipped entirely) and the no-messages path (where there is nothing to XCLAIM).
- **Test (tests/unit/type/stream-cgroups.tcl):** Added replication test `"XREADGROUP propagates new consumer to replica"` that sets up a master-replica pair and verifies consumer propagation in two cases: (1) without NOACK when no messages are available to deliver, and (2) with NOACK when messages are delivered but XCLAIM is skipped.
- **Master-replica consistency:** Consumers created by `XREADGROUP` are now visible on replicas whenever no `XCLAIM` would otherwise create them, covering both the NOACK path and the empty-stream path.
- **No redundant propagation:** The `noack || propCount == 0` condition avoids emitting a superfluous `XGROUP CREATECONSUMER` when `XCLAIM` commands are already propagated and would implicitly create the consumer on the replica.

(cherry picked from commit 0be39e5)

…s#15094) Fixes redis#15085

## Problem
While a command is executing, `getKeySlot()` may return `server.current_client->slot` instead of computing the slot from the provided string. The unsubscribe can be triggered by another client. In that case `server.current_client` is not the client being unsubscribed, so `getKeySlot()` would return that client's cached slot. Using this wrong slot makes the lookup in `type.serverPubSubChannels` miss the channel and ultimately triggers an assertion.

## Fix
Always use `keyHashSlot()` instead of `getKeySlot()` on unsubscribe.

---------
Co-authored-by: debing.sun <debing.sun@redis.com>
(cherry picked from commit 0bbb196)
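The point of the fix is that the slot is derived from the channel name itself and not from any per-client cache. The sketch below follows the key-to-slot algorithm in the Redis Cluster specification: CRC16 (XMODEM variant) modulo 16384, where a non-empty `{...}` hash tag is hashed instead of the whole name. `crc16Xmodem` and `hashSlotOf` are hypothetical names. This is an independent re-implementation for illustration, not the Redis `keyHashSlot()` source.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* CRC16-CCITT, XMODEM variant: polynomial 0x1021, initial value 0,
 * no reflection, no final XOR. */
static uint16_t crc16Xmodem(const char *buf, size_t len) {
    uint16_t crc = 0;
    for (size_t i = 0; i < len; i++) {
        crc ^= (uint16_t)((unsigned char)buf[i] << 8);
        for (int bit = 0; bit < 8; bit++)
            crc = (crc & 0x8000) ? (uint16_t)((crc << 1) ^ 0x1021)
                                 : (uint16_t)(crc << 1);
    }
    return crc;
}

/* Slot computed purely from the name: if it contains a non-empty {...}
 * section, only the bytes between the first '{' and the next '}' are hashed. */
static unsigned int hashSlotOf(const char *name, size_t len) {
    size_t s, e;
    for (s = 0; s < len && name[s] != '{'; s++) {}
    if (s == len) return crc16Xmodem(name, len) & 16383;
    for (e = s + 1; e < len && name[e] != '}'; e++) {}
    if (e == len || e == s + 1) return crc16Xmodem(name, len) & 16383;
    return crc16Xmodem(name + s + 1, e - s - 1) & 16383;
}
```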

# Description
There is an array corruption bug in LDB caused by an incorrect size argument being passed to `memmove()` inside `ldbDelBreakpoint()`. When deleting a breakpoint, `memmove()` shifts the remaining breakpoints in the `ldb.bp` integer array down to close the gap. However, the size argument was the number of elements rather than the number of bytes. Because `ldb.bp` is an array of type `int`, this results in an under-copy.

(cherry picked from commit bf432c9)
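Here is a minimal sketch of deleting an element from an `int` array with `memmove()`. The third argument is a byte count, so the number of elements to move must be multiplied by `sizeof(int)`. `deleteIntAt` is a hypothetical helper. It uses an explicit `count` parameter in the spirit of a breakpoint array with a separate counter.

```c
#include <assert.h>
#include <string.h>

/* Removes the element at `idx` from an int array holding `*count`
 * elements by shifting the tail down one slot. memmove() takes a size in
 * bytes; passing the bare element count would move too little data. */
static void deleteIntAt(int *arr, int *count, int idx) {
    assert(idx >= 0 && idx < *count);
    memmove(arr + idx, arr + idx + 1, (size_t)(*count - idx - 1) * sizeof(int));
    (*count)--;
}
```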

PR redis#14440 changed `mstate.commands` from an array of `multiCmd` structs to an array of `pendingCommand` pointers. This PR fixes the overhead calculation in `multiStateMemOverhead` to account for both the pointer array and the actual structs:
- The pointer array: `alloc_count * sizeof(pendingCommand*)`
- The individually allocated structs: `count * sizeof(pendingCommand)`

(cherry picked from commit 9302d27)
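The two terms above can be expressed as one function. The pointer array is charged by its capacity (`alloc_count`), and one struct is charged per queued command (`count`). The `pendingCommand` struct here is a hypothetical stand-in whose fields do not match the real one, and `queuedCommandsOverhead` is an illustrative name. Allocator overhead is deliberately not modeled.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in; the real pendingCommand has different fields. */
typedef struct pendingCommand {
    int argc;
    void **argv;
    size_t argv_len;
} pendingCommand;

/* Pointer array sized by capacity, plus one struct per queued command. */
static size_t queuedCommandsOverhead(size_t alloc_count, size_t count) {
    assert(count <= alloc_count);
    return alloc_count * sizeof(pendingCommand *) + count * sizeof(pendingCommand);
}
```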

melatron and others added 16 commits June 4, 2026 10:59

Fix HEXPIRE numfields overflow (redis#15021)

1be85ca


sundb force-pushed the 8.6-backport branch from 85def4b to 8ee4d45 Compare June 4, 2026 07:34

sundb closed this Jun 6, 2026

sundb wants to merge 17 commits into 8.6-base from 8.6-backport
