8.4 commits backport#464
Closed
sundb wants to merge 13 commits into
Closed
Conversation
…itectures (redis#15212) Ensure backward compatibility and consistent behavior across different architectures by explicitly setting the default value. Fixes redis#15175 Co-authored-by: ofiryanai <ofiryanai1@gmail.com> (cherry picked from commit 6c3a8ec)
Fixes [redis#15183](redis#15183). ## Motivation Commit [cf668f2](redis@cf668f2) tightened cluster-announce-ip validation to require a valid IPv4 or IPv6 address, which is a regression for users that legitimately announce a hostname. ## Changes * isValidClusterAnnounceIp() now accepts either: * A valid IPv4/IPv6 address * A valid hostname — same character rules as cluster-announce-hostname, length-bounded by NET_IP_STR_LEN to match the storage buffer. (cherry picked from commit 21f2569) (cherry picked from commit 54ea50c)
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## 8.4-base #464 +/- ##
============================================
+ Coverage 76.10% 76.44% +0.33%
============================================
Files 131 131
Lines 76673 76690 +17
============================================
+ Hits 58353 58626 +273
+ Misses 18320 18064 -256
🚀 New features to boost your workflow:
|
…ack (redis#14848) Fixes redis#14838 ## Summary Fix a crash in `prefetchCommands()` that occurs during replica full sync when the replica has existing data that needs to be emptied. ## Problem Description During `emptyData()` → `kvstoreEmpty()` → `dictEmpty()` → `_dictClear()`, the first hash table is cleared and `d->ht_table[0]` is set to NULL via `_dictReset`. Then while clearing the second hash table, every 65536 buckets it invokes `replicationEmptyDbCallback()` → `processEventsWhileBlocked()` → `readQueryFromClient()` → `prefetchCommands()`. At this point, `dictSize() > 0` still holds (because the second hash table isn't fully cleared yet), but `ht_table[0]` is already NULL. The prefetch code assumed `ht_table[0]` is always valid when `dictSize() > 0`, leading to a crash. ## Solution 1. **Skip prefetch during loading**: Added a `server.loading` check at the top of `prefetchCommands()` to return early. During RDB loading, the main dictionary is being rebuilt, so prefetching keys from it is useless anyway. 2. **Add defensive assertion**: Added `serverAssert(batch->current_dicts[i]->ht_table[0])` in `initBatchInfo()` to catch any future cases where `ht_table[0]` is NULL while `dictSize() > 0` (which should only happen mid-`dictEmpty` via `_dictReset`). --------- Co-authored-by: kairosci <kairosci@users.noreply.github.com> Co-authored-by: debing.sun <debing.sun@redis.com> Co-authored-by: Yuan Wang <yuan.wang@redis.com> (cherry picked from commit 1abd489)
…up (redis#14552) ## Problem When destroying a consumer group with `XGROUP DESTROY`, the cached `min_cgroup_last_id` was not being invalidated. This caused incorrect behavior when using `XDELEX` with the `ACKED` option, as the cache still referenced the destroyed group's `last_id`. ## Solution Invalidate the `min_cgroup_last_id` cache when the destroyed group's `last_id` equals the cached minimum. The cache will be recalculated on the next call to `streamEntryIsReferenced()`. --------- Co-authored-by: guybe7 <guy.benoish@redislabs.com> (cherry picked from commit bb6389e)
…egies (redis#14623) This bug was introduced by redis#14130 and found by guybe7 When using XTRIM/XADD with approx mode (~) and DELREF/ACKED delete strategies, if a node was eligible for removal but couldn't be removed directly (because consumer group references need to be checked), the code would incorrectly break out of the loop instead of continuing to process entries within the node. This fix allows the per-entry deletion logic to execute for eligible nodes when using non-KEEPREF strategies. (cherry picked from commit 9ca860b)
MSETEX doesn't properly check ACL key permissions for all keys - only the first key is validated. MSETEX arguments look like: MSETEX <numkeys> key1 val1 key2 val2 ... EX seconds Keys are at every 2nd position (step=2). When Redis extracts keys for ACL checking, it calculates where the last key is: last = first + numkeys - 1; => calculation ignores step last = first + (numkeys-1) * step; With 2 keys starting at position 2: Bug: last = 2 + 2 - 1 = 3 → only checks position 2 Fix: last = 2 + (2-1)*2 = 4 → checks positions 2 and 4 Fixes redis#14657 (cherry picked from commit 7324949)
## Summary This bus was introduced by redis#14623 Before PR redis#14623, when a stream node was going to be fully removed, we would just delete the whole node directly instead of iterating through and deleting each entry. Now, with the XTRIM/XADD flags, we have to iterate and delete entries one by one. However, the implementation in issue redis#8169 didn’t consider the case where all entries are removed, so `p` can end up being NULL. Fixes an UndefinedBehaviorSanitizer error in `streamTrim()` when marking the last entry in a listpack as deleted. The issue occurs when performing pointer arithmetic on a NULL pointer after `lpNext()` reaches the end of the listpack. ## Solution If p is NULL, we skip the delta calculation and the calculation of new `p`. (cherry picked from commit 85ab4ca)
…#14816) `setModuleEnumConfig()` was passing the prefixed config name to module callbacks instead of the unprefixed name, inconsistent with other config types. Fixed by using getRegisteredConfigName() like bool, numeric, and string configs do. Added assertions to all test module config callbacks to validate correct unprefixed names are received. Issue was introduced by redis#13656 (cherry picked from commit 8a65b65)
This PR is based on: valkey-io/valkey#3511 Close redis#14983 During diskless replication, if **any single replica** cannot accept a write (TCP send buffer full / `EAGAIN`), the master stops reading the RDB pipe entirely, stalling data delivery to **all** replicas — including fast ones that are ready to receive data. The failure reason is similar to redis#14946, the socket buffer is more easy to fill. In `rdbPipeReadHandler`, the master reads from the child's RDB pipe and writes to all replica sockets in a loop. When `connWrite` to any replica returns a partial write (socket send buffer full), the handler: 1. Installs a per-replica `rdbPipeWriteHandler` and increments `rdb_pipe_numconns_writing` 2. **Removes the pipe read event** via `aeDeleteFileEvent(server.el, server.rdb_pipe_read, AE_READABLE)`, stopping all pipe reads The pipe read event is only re-enabled when **all** pending write handlers complete (`rdb_pipe_numconns_writing == 0`), meaning the **slowest replica dictates the throughput for all replicas**. With one slow replica (consuming at ~290 KB/s due to `key-load-delay`): - Master bursts ~1.3 MB of RDB data until the slow replica's socket send buffer fills - `rdbPipeReadHandler` disables the pipe read event - **All replicas starve for 4–5 seconds** while the slow replica drains its buffer - Cycle repeats: burst → stall → burst → stall Ultimately, it leads to a very slow synchronization process of the entire master and replica. 1. Skip the entire `diskless replicas drop during rdb pipe` test under Valgrind to avoid timing flakiness on slow env. 2. Move `start_server` inside the `foreach all_drop` loop so each subcase gets a fresh master instead of sharing state across subcases. 3. For `no / slow / fast / all` subcases, replica 0 runs with `key-load-delay 500`, which combined with the blocked-writer TCP back-pressure can stall the RDB-saving child indefinitely; shrink the dataset to ~40 MB so the transfer still exercises the blocked-writer path but completes in reasonable time instead of hanging on the TCP deadlock. For the timeout subcase, replica 0 does not run with `key-load-delay 500`, so to avoid the TCP deadlock we still reduce the dataset somewhat, but keep it larger than the other subcases. Otherwise the kernel TCP send buffer can absorb the whole RDB, and we'd miss the repl_last_partial_write != 0 "(full sync)" timeout path and only hit the "(streaming sync)" path instead. 5. For the `all` subcase, set `rdb-key-save-delay 1000` on the master so the RDB child keeps generating data while both replicas are killed, ensuring the last-replica-drop path is exercised rather than racing with normal completion. 6. Move the slow-replica `pause_process()` so it happens only in the timeout subcase, not after killing replicas, so Redis observes the disconnect promptly in non-timeout flows. 7. In the timeout subcase, set `repl-timeout` 2, wait inline for `*Disconnecting timedout replica (full sync)*`, then restore `repl-timeout` 60 so the remaining replica can finish the streamed RDB. --------- Co-authored-by: Sarthak Aggarwal <sarthagg@amazon.com> Co-authored-by: debing.sun <debing.sun@redis.com> (cherry picked from commit 3189614)
In redis#14121, the SCAN filters order was changed, before redis#14121the order was - pattern, expiration and type, after redis#14121pattern became last, this break change broke the original behavior, which will cause scan with pattern also to remove the expired keys. This PR reorders the filters to be consistent with the original behavior and extends a test to cover this scenario. (cherry picked from commit 3920059)
This PR fixes an issue(redis#14541) where EXEC’s ACL recheck was still being performed during AOF loading, that may cause AOF loading failed, if ACL rules are changed and don't allow some commands in MULTI-EXEC. (cherry picked from commit 0288d70)
In case we have to kill an rdb child at shutdown, we wait for the child process to exit, and then resume with the shutodwn, and we did not clear the child_pid variable, since we're going to terminate anyway. but if the shutdown is then aborted due to another issue further down that function, we will try to kill that child again, and the waitpid will never get released. Reproduced in the test "SHUTDOWN can proceed if shutdown command was with nosave" (cherry picked from commit f56e011)
PR redis#14440 changed `mstate.commands` from an array of `multiCmd` structs to an array of `pendingCommand` pointers. This PR fixes the overhead calculation in multiStateMemOverhead to account for both the pointer array and the actual structs: - The pointer array: `alloc_count * sizeof(pendingCommand*)` - The individually allocated structs: `count * sizeof(pendingCommand)` (cherry picked from commit 9302d27)
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
No description provided.