8.2 commits backport by sundb · Pull Request #463 · sundb/redis

sundb · 2026-06-04T01:17:17Z

No description provided.

…itectures (redis#15212) Ensure backward compatibility and consistent behavior across different architectures by explicitly setting the default value. Fixes redis#15175 Co-authored-by: ofiryanai <ofiryanai1@gmail.com> (cherry picked from commit 6c3a8ec)

Fixes [redis#15183](redis#15183). ## Motivation Commit [cf668f2](redis@cf668f2) tightened cluster-announce-ip validation to require a valid IPv4 or IPv6 address, which is a regression for users that legitimately announce a hostname. ## Changes * isValidClusterAnnounceIp() now accepts either: * A valid IPv4/IPv6 address * A valid hostname — same character rules as cluster-announce-hostname, length-bounded by NET_IP_STR_LEN to match the storage buffer. (cherry picked from commit 21f2569) (cherry picked from commit 54ea50c)

codecov · 2026-06-04T01:29:54Z

Codecov Report

❌ Patch coverage is 82.22222% with 8 lines in your changes missing coverage. Please review.
✅ Project coverage is 69.30%. Comparing base (ee7464f) to head (7692e39).

Files with missing lines	Patch %	Lines
modules/vector-sets/vset.c	0.00%	3 Missing ⚠️
src/module.c	0.00%	3 Missing ⚠️
src/cluster_legacy.c	85.71%	1 Missing ⚠️
src/config.c	90.90%	1 Missing ⚠️

Additional details and impacted files

@@             Coverage Diff              @@
##           8.2-base     #463      +/-   ##
============================================
+ Coverage     69.24%   69.30%   +0.05%     
============================================
  Files           128      128              
  Lines         72353    72369      +16     
============================================
+ Hits          50104    50152      +48     
+ Misses        22249    22217      -32

Files with missing lines	Coverage Δ
src/blocked.c	`92.35% <100.00%> (+0.02%)`	⬆️
src/db.c	`89.63% <100.00%> (+0.02%)`	⬆️
src/networking.c	`91.37% <100.00%> (+0.04%)`	⬆️
src/t_stream.c	`93.62% <100.00%> (+0.22%)`	⬆️
src/cluster_legacy.c	`74.13% <85.71%> (-0.19%)`	⬇️
src/config.c	`73.17% <90.90%> (-0.03%)`	⬇️
modules/vector-sets/vset.c	`25.31% <0.00%> (+0.85%)`	⬆️
src/module.c	`22.29% <0.00%> (+0.26%)`	⬆️

... and 15 files with indirect coverage changes

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

…4373) This PR aims to avoid the situation of a potential crash when efSearch is too large (and therefore the memory allocated could lead to a server crash or an integer overflow (where less memory is allocated than expected). - Limit the accepted EF in the request o 100_000 as in VADD - Limit the ef search to the number of nodes in the HNSW graph (cherry picked from commit 97df5b5)

redis#14417) Fix a heap-buffer-overflow vulnerability in the `CLUSTER FORGET` command when provided with a node ID shorter than the expected 40 bytes. When `CLUSTER FORGET` is called with a node ID that has a length smaller than `CLUSTER_NAMELEN` (40 bytes), the `clusterBlacklistExists()` function would read beyond the allocated string buffer. This occurs because the function always attempted to read exactly 40 bytes via `sdsnewlen(nodeid, CLUSTER_NAMELEN)`. Changes: - Added a `size_t len` parameter to `clusterBlacklistExists()` to use the actual string length - Updated all call sites to pass the appropriate length - Added a test case to verify the fix - Test added to verify that `CLUSTER FORGET` with an invalid short node ID returns an appropriate error. This PR is based on: valkey-io/valkey#2108 Co-authored-by: Ran Shidlansik <ranshid@amazon.com> (cherry picked from commit 48d0aa9)

This PR is based on: valkey-io/valkey#2347 This was introduced in redis#13512 The server crashes with a null pointer dereference when lookupKey() is called from handleClientsBlockedOnKey(). The crash occurs because server.executing_client is NULL, but the code attempts to access server.executing_client->cmd->proc without checking. **Crash scenario:** Client 1 enables CLIENT NO-TOUCH Client 2 blocks on BRPOP mylist 0 Client 1 executes RPUSH mylist elem When unblocking Client 2, lookupKey() dereferences NULL server.executing_client → crash **Solution** Added proper null checks before dereferencing server.executing_client: Check if LOOKUP_NOTOUCH flag is already set before attempting to modify it Verify both server.current_client and server.executing_client are not NULL before accessing their members Maintain the TOUCH command exception for scripts **Testing** Added regression test in tests/unit/type/list.tcl that reproduces and verifies the fix for this crash scenario. This fix is based on valkey-io/valkey#2347 Co-authored-by: Uri Yagelnik <uriy@amazon.com> Co-authored-by: Ran Shidlansik <ranshid@amazon.com> (cherry picked from commit 5b49119)

…ancellation (redis#14420) This issue was introduced by redis#10440 In that PR, we avoided resetting the current user during processCommand, but overlooked the fact that this client might not be the current one, it could be a client that was previously blocked due to shutdown. If we don’t reset these clients, and the shutdown is canceled, then when these clients continue executing other commands, they will trigger an assertion. This PR delays the operation of resetting the client to processUnblockedClients and no longer skips SHUTDOWN_BLOCKED clients. (cherry picked from commit 6d89370)

This PR is based on: valkey-io/valkey#3511 Close redis#14983 During diskless replication, if **any single replica** cannot accept a write (TCP send buffer full / `EAGAIN`), the master stops reading the RDB pipe entirely, stalling data delivery to **all** replicas — including fast ones that are ready to receive data. The failure reason is similar to redis#14946, the socket buffer is more easy to fill. In `rdbPipeReadHandler`, the master reads from the child's RDB pipe and writes to all replica sockets in a loop. When `connWrite` to any replica returns a partial write (socket send buffer full), the handler: 1. Installs a per-replica `rdbPipeWriteHandler` and increments `rdb_pipe_numconns_writing` 2. **Removes the pipe read event** via `aeDeleteFileEvent(server.el, server.rdb_pipe_read, AE_READABLE)`, stopping all pipe reads The pipe read event is only re-enabled when **all** pending write handlers complete (`rdb_pipe_numconns_writing == 0`), meaning the **slowest replica dictates the throughput for all replicas**. With one slow replica (consuming at ~290 KB/s due to `key-load-delay`): - Master bursts ~1.3 MB of RDB data until the slow replica's socket send buffer fills - `rdbPipeReadHandler` disables the pipe read event - **All replicas starve for 4–5 seconds** while the slow replica drains its buffer - Cycle repeats: burst → stall → burst → stall Ultimately, it leads to a very slow synchronization process of the entire master and replica. 1. Skip the entire `diskless replicas drop during rdb pipe` test under Valgrind to avoid timing flakiness on slow env. 2. Move `start_server` inside the `foreach all_drop` loop so each subcase gets a fresh master instead of sharing state across subcases. 3. For `no / slow / fast / all` subcases, replica 0 runs with `key-load-delay 500`, which combined with the blocked-writer TCP back-pressure can stall the RDB-saving child indefinitely; shrink the dataset to ~40 MB so the transfer still exercises the blocked-writer path but completes in reasonable time instead of hanging on the TCP deadlock. For the timeout subcase, replica 0 does not run with `key-load-delay 500`, so to avoid the TCP deadlock we still reduce the dataset somewhat, but keep it larger than the other subcases. Otherwise the kernel TCP send buffer can absorb the whole RDB, and we'd miss the repl_last_partial_write != 0 "(full sync)" timeout path and only hit the "(streaming sync)" path instead. 5. For the `all` subcase, set `rdb-key-save-delay 1000` on the master so the RDB child keeps generating data while both replicas are killed, ensuring the last-replica-drop path is exercised rather than racing with normal completion. 6. Move the slow-replica `pause_process()` so it happens only in the timeout subcase, not after killing replicas, so Redis observes the disconnect promptly in non-timeout flows. 7. In the timeout subcase, set `repl-timeout` 2, wait inline for `*Disconnecting timedout replica (full sync)*`, then restore `repl-timeout` 60 so the remaining replica can finish the streamed RDB. --------- Co-authored-by: Sarthak Aggarwal <sarthagg@amazon.com> Co-authored-by: debing.sun <debing.sun@redis.com> (cherry picked from commit 3189614)

`RedisModule_GetClusterNodeInfo` is [documented](https://redis.io/docs/latest/develop/reference/modules/modules-api-ref/#redismodule_getclusternodeslist) to be used with `RedisModule_GetClusterNodesList`, which states (and does) that the returned strings are not null-terminated. Therefore, it is unsafe to call `strlen` on the `const char *id` input of `RedisModule_GetClusterNodeInfo`, and the API should assume the string is of the correct length (cherry picked from commit b043ac5)

Fixes data race where main thread modifies pauserehash in dictNext while IO thread reads `useStoredKeyApi()` in `lookupCommand()->dictFind()` path. ASAN detected overlapping memory access at same byte offset causing race condition. When bit fields are adjacent in a struct, modifying one bit field requires a read-modify-write operation on the entire memory unit, which can cause race conditions with concurrent access to other bit fields in the same unit. This was introduced by redis#13696, although it was changed by redis#14440, but this issue still exist. The fix moves `useStoredKeyApi` to share a memory word with `pauseAutoResize` instead. Since `pauseAutoResize` is never modified for `server.commands`, this eliminates the race condition while maintaining memory efficiency through bit field packing. Reproduce step: ``` make SANITIZER=thread ./runtest --tsan --config io-threads 4 --accurate --verbose --dump-logs --single unit/obuf-limits --loop --stop ``` failed CI: https://github.com/redis/redis/actions/runs/18803219305/job/53653572710 --------- Co-authored-by: Moti Cohen <moti.cohen@redis.com> (cherry picked from commit 6ea4e2c)

- We will generate a random shard id when creating a cluster node, so `auxFieldHandlers[af_shard_id].isPresent(n) == 0` never meet, so it means we never add master nodes into `cluster.shards` when loading, this bug is introduced in redis#10536 that supports `shard-id` concept. BTW, redis#13468 can make replicas add into `cluster.shards` - Replica shard_id may be different with master, so before we add again, we should remove it, otherwise, `cluster.shards` will have dirty shards, introduced in redis#13468 These bugs causes the output of the `cluster shards` is corrupt, the temporary `slot_info` may not be cleaned up, we may see the duplicated slot ranges, and a shard info without master. (cherry picked from commit bf0ba1d)

…up (redis#14552) When destroying a consumer group with `XGROUP DESTROY`, the cached `min_cgroup_last_id` was not being invalidated. This caused incorrect behavior when using `XDELEX` with the `ACKED` option, as the cache still referenced the destroyed group's `last_id`. Invalidate the `min_cgroup_last_id` cache when the destroyed group's `last_id` equals the cached minimum. The cache will be recalculated on the next call to `streamEntryIsReferenced()`. --------- Co-authored-by: guybe7 <guy.benoish@redislabs.com> (cherry picked from commit bb6389e)

…egies (redis#14623) This bug was introduced by redis#14130 and found by guybe7 When using XTRIM/XADD with approx mode (~) and DELREF/ACKED delete strategies, if a node was eligible for removal but couldn't be removed directly (because consumer group references need to be checked), the code would incorrectly break out of the loop instead of continuing to process entries within the node. This fix allows the per-entry deletion logic to execute for eligible nodes when using non-KEEPREF strategies. (cherry picked from commit 9ca860b)

…#14816) `setModuleEnumConfig()` was passing the prefixed config name to module callbacks instead of the unprefixed name, inconsistent with other config types. Fixed by using getRegisteredConfigName() like bool, numeric, and string configs do. Added assertions to all test module config callbacks to validate correct unprefixed names are received. Issue was introduced by redis#13656 (cherry picked from commit 8a65b65)

In redis#14121, the SCAN filters order was changed, before redis#14121the order was - pattern, expiration and type, after redis#14121pattern became last, this break change broke the original behavior, which will cause scan with pattern also to remove the expired keys. This PR reorders the filters to be consistent with the original behavior and extends a test to cover this scenario. (cherry picked from commit 3920059)

## Summary This bus was introduced by redis#14623 Before PR redis#14623, when a stream node was going to be fully removed, we would just delete the whole node directly instead of iterating through and deleting each entry. Now, with the XTRIM/XADD flags, we have to iterate and delete entries one by one. However, the implementation in issue redis#8169 didn’t consider the case where all entries are removed, so `p` can end up being NULL. Fixes an UndefinedBehaviorSanitizer error in `streamTrim()` when marking the last entry in a listpack as deleted. The issue occurs when performing pointer arithmetic on a NULL pointer after `lpNext()` reaches the end of the listpack. ## Solution If p is NULL, we skip the delta calculation and the calculation of new `p`. (cherry picked from commit 85ab4ca)

YaacovHazan and others added 2 commits June 4, 2026 09:15

sundb changed the title ~~Test tcp deadlock fixes (redis#14667)~~ 8.2 commits backport Jun 4, 2026

sundb force-pushed the 8.2-backport branch from 604a238 to edaa8c8 Compare June 4, 2026 02:15

moticless and others added 3 commits June 4, 2026 10:40

sundb force-pushed the 8.2-backport branch from edaa8c8 to f83eab0 Compare June 4, 2026 02:40

vitahlin and others added 9 commits June 4, 2026 12:40

sundb closed this Jun 6, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

8.2 commits backport#463

8.2 commits backport#463
sundb wants to merge 15 commits into
8.2-basefrom
8.2-backport

sundb commented Jun 4, 2026

Uh oh!

codecov Bot commented Jun 4, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

9 participants

Conversation

sundb commented Jun 4, 2026

Uh oh!

codecov Bot commented Jun 4, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

9 participants

codecov Bot commented Jun 4, 2026 •

edited

Loading