8.6 commits backport#465
Closed
sundb wants to merge 17 commits into
Closed
Conversation
…itectures (redis#15212) Ensure backward compatibility and consistent behavior across different architectures by explicitly setting the default value. Fixes redis#15175 Co-authored-by: ofiryanai <ofiryanai1@gmail.com> (cherry picked from commit 6c3a8ec)
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## 8.6-base #465 +/- ##
============================================
+ Coverage 76.24% 76.31% +0.07%
============================================
Files 138 138
Lines 79708 79838 +130
============================================
+ Hits 60771 60929 +158
+ Misses 18937 18909 -28
🚀 New features to boost your workflow:
|
PFMERGE's second key spec (source keys) produces an empty range when called with only a dest key (e.g. PFMERGE dest). getKeysUsingKeySpecs treats that as invalid_spec, which discards all previously found keys and returns an error. Add pfmergeGetKeys as a getkeys callback so the command correctly falls back to it when key specs fail on the edge case. (cherry picked from commit 5f5ddfd)
Validate HEXPIRE-family field counts without parser overflow keep flexible option order; only require fields fit in argv add tests for INT_MAX numfields across HEXPIRE/HPEXPIRE/HEXPIREAT/HPEXPIREAT (cherry picked from commit e1d35ac)
This PR fixes two issues when processing corrupt data in rdbLoadCheckModuleValue(): 1. When handling `RDB_MODULE_OPCODE_STRING` opcode, rdbGenericLoadStringObject() can return NULL on a corrupt payload. The code called decrRefCount(o) unconditionally without a NULL check, resulting in a NULL pointer dereference crash. 2. The while loop condition was `!= RDB_MODULE_OPCODE_EOF`, which means a truncated payload (causing rdbLoadLen to return RDB_LENERR) would never exit the loop, since `RDB_LENERR != RDB_MODULE_OPCODE_EOF` is always true, potentially causing an infinite hang. (cherry picked from commit ca6e471)
…tracking (redis#15037) `xinfoReplyWithStreamInfo` passed the wrong key(c->argv[1]) instead of `c->argv[2]` to `updateSlotAllocSize` when updating per-slot memory tracking. Fix by passing the key explicitly to `xinfoReplyWithStreamInfo` instead of relying on a hardcoded argv index. Also, add the `-DDEBUG_ASSERTIONS` flag to the test-ubuntu-jemalloc CI to cover this debug assertion. (cherry picked from commit 2049c7f)
RM_RegisterClusterMessageReceiver() unlinks a receiver node from the clusterReceivers[type] linked list when the callback is set to NULL, but when removing the head node (prev == NULL), the code updates clusterReceivers[type]->next instead of clusterReceivers[type] itself. This leaves clusterReceivers[type] pointing to the freed node, so any later traversal through clusterReceivers[type] dereferences a dangling pointer. Fix by updating clusterReceivers[type] directly when prev == NULL. Fixes redis#15057 --------- Co-authored-by: debing.sun <debing.sun@redis.com> (cherry picked from commit 303667a)
Fixes checkPrefixCollisionsOrReply() to return 0 (failure) on any provided-prefix self-overlap, instead of accidentally returning a non-zero loop index for overlaps found after the first prefix. Signed-off-by: Raj Danday <rajkripal.danday@gmail.com> (cherry picked from commit 625b6f5)
### Problem In `scanGenericCommand`, `maxiterations = count * 10` overflows when `count > LONG_MAX / 10`, causing undefined behavior. ### Changed 1. Use saturating arithmetic to prevent overflow. 2. Added a test to trigger the overflow path, detectable by UBSan. (cherry picked from commit fafc472)
Reject control characters (0x00-0x1F, 0x7F) in user-controlled string arguments to SENTINEL SET, SENTINEL MONITOR, and SENTINEL CONFIG SET to prevent newline injection into the persisted config file. An attacker with access to SENTINEL SET could inject arbitrary config directives (e.g. notification-script) by embedding \r\n in auth-pass or similar fields, leading to code execution on restart. As a defense-in-depth measure, config serialization now uses sdscatrepr (via sentinelSdscatConfigArg) for all user-controlled string fields when they contain characters that require escaping. Simple values remain unquoted for backward compatibility with older config parsers. Add comprehensive Sentinel tests (16-config-injection.tcl) covering control character rejection for all affected commands, verification that injection payloads do not pollute the config file, round-trip persistence of values containing spaces and quotes through restart, and backward compatibility with the old unquoted config format. (cherry picked from commit 3e1afec)
## Issue The vector set Python tests intentionally use two clients: - the default client (`self.redis`) for the existing RESP2-oriented test expectations - `self.redis3` for RESP3-specific coverage. However, the default client did not explicitly set a protocol, so it depended on redis-py's default behavior. With newer redis-py versions, RESP3 is now the default protocol(redis/redis-py#4052). In particular, vector set replies such as `VSIM ... WITHSCORES` may be parsed into map/dict-like structures instead of the RESP2 flat-array shape assumed by existing tests. ## Changes Explicitly create the default primary and replica Redis clients with `protocol=2`. `self.redis3` is left unchanged and continues to use `protocol=3` for RESP3-specific test coverage. (cherry picked from commit 8fcf3dc)
…ded bulk string references (redis#14934) After redis#14608 (Reply Copy Avoidance), when copy avoidance kicks in, bulk string replies are sent by reference instead of being copied into the output buffer. The referenced bytes are not counted in `reply_bytes`, which causes: 1. `getClientOutputBufferMemoryUsage()` underestimates the actual memory usage, so output buffer limits may not be triggered in time, allowing clients to consume unbounded memory. 2. Client eviction does not account for the referenced bytes, making it ineffective when copy avoidance is used. 3. `omem` reported in `CLIENT LIST` / `CLIENT INFO` does not reflect the true output buffer memory footprint. Track the bytes of referenced bulk strings in the output buffer with two per-client counters: 1. reply_bytes_shared - the logical size of all BULK_STR_REF payloads in the output buffer. Updated incrementally whenever a reference is added/removed. Represents memory the client is "charged" for even though it is shared with the keyspace. 2. reply_bytes_unshared — the subset of the above where the referenced object's refcount == 1 (i.e. the key has been deleted from the keyspace), so the memory is kept alive solely by this client's output buffer and would actually be freed on disconnect. Maintained as a lazy cache refreshed via updateClientUnsharedReplyBytes(). CLIENT LIST / CLIENT INFO — two new fields, plus refined semantics for existing ones: Field | Meaning -- | -- omem | (semantics changed) logical output-buffer memory, now including shared memory referenced from the keyspace. Still excludes client->buf so static clients show 0. omem-shared | (new) shared output-buffer memory (referenced bulk strings, not solely owned by this client). omem-unshared | (new) unshared output-buffer memory (referenced bulk strings solely owned by this client; freed on disconnect). tot-mem | (semantics refined) actual memory usage — includes omem-unshared, excludes omem-shared to avoid double-counting keyspace memory. INFO memory — two new fields mirroring the above: Field | Meaning -- | -- mem_clients_normal | (semantics changed) actual memory usage of normal clients (includes unshared, excludes shared). mem_clients_normal_shared | (new) aggregate shared output-buffer memory across normal clients. mem_clients_normal_unshared | (new) aggregate unshared output-buffer memory across normal clients. MEMORY STATS — schema extended with the matching keys: Field | Meaning -- | -- clients.normal.shared | (new) aggregate shared output-buffer memory across normal clients. clients.normal.unshared | (new) aggregate unshared output-buffer memory across normal clients. Fix missing closeClientOnOutputBufferLimitReached() call when adding a referenced robj to the reply --------- Co-authored-by: oranagra <oran@redislabs.com> (cherry picked from commit 05859cd)
This PR is based on: valkey-io/valkey#3511 Close redis#14983 ## Summary During diskless replication, if **any single replica** cannot accept a write (TCP send buffer full / `EAGAIN`), the master stops reading the RDB pipe entirely, stalling data delivery to **all** replicas — including fast ones that are ready to receive data. The failure reason is similar to redis#14946, the socket buffer is more easy to fill. ## Root Cause In `rdbPipeReadHandler`, the master reads from the child's RDB pipe and writes to all replica sockets in a loop. When `connWrite` to any replica returns a partial write (socket send buffer full), the handler: 1. Installs a per-replica `rdbPipeWriteHandler` and increments `rdb_pipe_numconns_writing` 2. **Removes the pipe read event** via `aeDeleteFileEvent(server.el, server.rdb_pipe_read, AE_READABLE)`, stopping all pipe reads The pipe read event is only re-enabled when **all** pending write handlers complete (`rdb_pipe_numconns_writing == 0`), meaning the **slowest replica dictates the throughput for all replicas**. ## Observed Behavior With one slow replica (consuming at ~290 KB/s due to `key-load-delay`): - Master bursts ~1.3 MB of RDB data until the slow replica's socket send buffer fills - `rdbPipeReadHandler` disables the pipe read event - **All replicas starve for 4–5 seconds** while the slow replica drains its buffer - Cycle repeats: burst → stall → burst → stall Ultimately, it leads to a very slow synchronization process of the entire master and replica. ### Changes 1. Skip the entire `diskless replicas drop during rdb pipe` test under Valgrind to avoid timing flakiness on slow env. 2. Move `start_server` inside the `foreach all_drop` loop so each subcase gets a fresh master instead of sharing state across subcases. 3. For `no / slow / fast / all` subcases, replica 0 runs with `key-load-delay 500`, which combined with the blocked-writer TCP back-pressure can stall the RDB-saving child indefinitely; shrink the dataset to ~40 MB so the transfer still exercises the blocked-writer path but completes in reasonable time instead of hanging on the TCP deadlock. For the timeout subcase, replica 0 does not run with `key-load-delay 500`, so to avoid the TCP deadlock we still reduce the dataset somewhat, but keep it larger than the other subcases. Otherwise the kernel TCP send buffer can absorb the whole RDB, and we'd miss the repl_last_partial_write != 0 "(full sync)" timeout path and only hit the "(streaming sync)" path instead. 5. For the `all` subcase, set `rdb-key-save-delay 1000` on the master so the RDB child keeps generating data while both replicas are killed, ensuring the last-replica-drop path is exercised rather than racing with normal completion. 6. Move the slow-replica `pause_process()` so it happens only in the timeout subcase, not after killing replicas, so Redis observes the disconnect promptly in non-timeout flows. 7. In the timeout subcase, set `repl-timeout` 2, wait inline for `*Disconnecting timedout replica (full sync)*`, then restore `repl-timeout` 60 so the remaining replica can finish the streamed RDB. --------- Co-authored-by: Sarthak Aggarwal <sarthagg@amazon.com> Co-authored-by: debing.sun <debing.sun@redis.com> (cherry picked from commit 3189614)
Test 15-config-set-config-get.tcl was leaving announce-port and announce-hostnames at non-default values, which breaks auto-discovery in subsequent test units. Add reset lines at the end of each test that modifies config. This PR fixes failures in Daily CI tests. (cherry picked from commit d9b03bd)
Fixes consumer replication inconsistency when `XREADGROUP` is called for a new consumer but no `XCLAIM` commands are propagated to the replica. Previously, consumer creation was only propagated to replicas when `noack=true`, relying on `XCLAIM` propagation to implicitly create the consumer in the non-NOACK path. However, if no messages exist to read, no `XCLAIM` is generated, and the consumer is silently lost on the replica. This is a follow-up to the original fix in [redis#7140](redis#7140) / [redis#7526](redis#7526), which introduced `XGROUP CREATECONSUMER` propagation but only for the `NOACK` case. - **`xreadgroupCommand` (src/t_stream.c):** Replaced the `if (noack)` guard around the `streamPropagateConsumerCreation()` call with a deferred check after `streamReplyWithRange()`. Consumer creation is now propagated when `noack || propCount == 0` — that is, only when no `XCLAIM` commands were generated. This avoids redundant propagation in the common case where `XCLAIM` already implicitly creates the consumer on the replica, while correctly handling both the NOACK path (where PEL/XCLAIM is skipped entirely) and the no-messages path (where there is nothing to XCLAIM). - **Test (tests/unit/type/stream-cgroups.tcl):** Added replication test `"XREADGROUP propagates new consumer to replica"` that sets up a master-replica pair and verifies consumer propagation in two cases: (1) without NOACK when no messages are available to deliver, and (2) with NOACK when messages are delivered but XCLAIM is skipped. - **Master-replica consistency:** Consumers created by `XREADGROUP` are now visible on replicas whenever no `XCLAIM` would otherwise create them — covering both the NOACK path and the empty-stream path. - **No redundant propagation:** The noack || propCount == 0 condition avoids emitting a superfluous XGROUP CREATECONSUMER when XCLAIM commands are already propagated and would implicitly create the consumer on the replica. (cherry picked from commit 0be39e5)
…s#15094) Fixes redis#15085 ## Problem getKeySlot() may return `server.current_client->slot` while a command is executing instead of computing the slot from the provided string. The unsubscribe can be triggered by another client, in which case server.current_client is not the client being unsubscribed, so getKeySlot() would return that client's cached slot. Using this wrong slot would make the lookup in type.serverPubSubChannels miss the channel and ultimately trigger the assertion below. ## Fix Always use keyHashSlot() instead of getKeySlot() on unsubscribe. --------- Co-authored-by: debing.sun <debing.sun@redis.com> (cherry picked from commit 0bbb196)
# Description There is an array corruption bug in LDB caused by an incorrect size argument being passed to `memmove()` inside the `ldbDelBreakpoint()` function. When deleting a breakpoint, `memmove()` is used to shift the remaining breakpoints in the ldb.bp integer array forward. However, the size parameter passes the number of elements rather than the number of bytes. Because ldb.bp is an array of type `int`, this results in an under-copy. (cherry picked from commit bf432c9)
PR redis#14440 changed `mstate.commands` from an array of `multiCmd` structs to an array of `pendingCommand` pointers. This PR fixes the overhead calculation in multiStateMemOverhead to account for both the pointer array and the actual structs: - The pointer array: `alloc_count * sizeof(pendingCommand*)` - The individually allocated structs: `count * sizeof(pendingCommand)` (cherry picked from commit 9302d27)
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
No description provided.