Skip to content

8.4 commits backport#464

Closed
sundb wants to merge 13 commits into
8.4-basefrom
8.4-backport
Closed

8.4 commits backport#464
sundb wants to merge 13 commits into
8.4-basefrom
8.4-backport

Conversation

@sundb

@sundb sundb commented Jun 4, 2026

Copy link
Copy Markdown
Owner

No description provided.

YaacovHazan and others added 2 commits June 4, 2026 09:21
…itectures (redis#15212)

Ensure backward compatibility and consistent behavior across different
architectures by explicitly setting the default value.

Fixes redis#15175

Co-authored-by: ofiryanai <ofiryanai1@gmail.com>
(cherry picked from commit 6c3a8ec)
Fixes [redis#15183](redis#15183).

## Motivation
Commit
[cf668f2](redis@cf668f2)
tightened cluster-announce-ip validation to require a valid IPv4 or IPv6
address, which is a regression for users that legitimately announce a
hostname.

## Changes
* isValidClusterAnnounceIp() now accepts either:
   * A valid IPv4/IPv6 address
   * A valid hostname — same character rules as cluster-announce-hostname,
length-bounded by NET_IP_STR_LEN to match the storage buffer.

(cherry picked from commit 21f2569)
(cherry picked from commit 54ea50c)
@codecov

codecov Bot commented Jun 4, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 97.14286% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 76.44%. Comparing base (e0bc295) to head (f68bcd2).

Files with missing lines Patch % Lines
src/config.c 90.90% 1 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##           8.4-base     #464      +/-   ##
============================================
+ Coverage     76.10%   76.44%   +0.33%     
============================================
  Files           131      131              
  Lines         76673    76690      +17     
============================================
+ Hits          58353    58626     +273     
+ Misses        18320    18064     -256     
Files with missing lines Coverage Δ
src/db.c 93.56% <100.00%> (+<0.01%) ⬆️
src/memory_prefetch.c 99.32% <100.00%> (+<0.01%) ⬆️
src/module.c 82.65% <100.00%> (+0.20%) ⬆️
src/multi.c 95.76% <100.00%> (+0.06%) ⬆️
src/server.c 90.66% <100.00%> (+<0.01%) ⬆️
src/t_stream.c 94.28% <100.00%> (+0.23%) ⬆️
src/config.c 85.81% <90.90%> (-0.07%) ⬇️

... and 24 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

kairosci and others added 11 commits June 4, 2026 10:20
…ack (redis#14848)

Fixes redis#14838

## Summary

Fix a crash in `prefetchCommands()` that occurs during replica full sync
when the replica has existing data that needs to be emptied.

## Problem Description

During `emptyData()` → `kvstoreEmpty()` → `dictEmpty()` →
`_dictClear()`, the first hash table is cleared and `d->ht_table[0]` is
set to NULL via `_dictReset`. Then while clearing the second hash table,
every 65536 buckets it invokes `replicationEmptyDbCallback()` →
`processEventsWhileBlocked()` → `readQueryFromClient()` →
`prefetchCommands()`.
At this point, `dictSize() > 0` still holds (because the second hash
table isn't fully cleared yet), but `ht_table[0]` is already NULL. The
prefetch code assumed `ht_table[0]` is always valid when `dictSize() >
0`, leading to a crash.

## Solution
1. **Skip prefetch during loading**: Added a `server.loading` check at
the top of `prefetchCommands()` to return early. During RDB loading, the
main dictionary is being rebuilt, so prefetching keys from it is useless
anyway.
2. **Add defensive assertion**: Added
`serverAssert(batch->current_dicts[i]->ht_table[0])` in
`initBatchInfo()` to catch any future cases where `ht_table[0]` is NULL
while `dictSize() > 0` (which should only happen mid-`dictEmpty` via
`_dictReset`).

---------

Co-authored-by: kairosci <kairosci@users.noreply.github.com>
Co-authored-by: debing.sun <debing.sun@redis.com>
Co-authored-by: Yuan Wang <yuan.wang@redis.com>
(cherry picked from commit 1abd489)
…up (redis#14552)

## Problem

When destroying a consumer group with `XGROUP DESTROY`, the cached
`min_cgroup_last_id` was not being invalidated. This caused incorrect
behavior when using `XDELEX` with the `ACKED` option, as the cache still
referenced the destroyed group's `last_id`.

## Solution

Invalidate the `min_cgroup_last_id` cache when the destroyed group's
`last_id` equals the cached minimum. The cache will be recalculated on
the next call to `streamEntryIsReferenced()`.

---------

Co-authored-by: guybe7 <guy.benoish@redislabs.com>
(cherry picked from commit bb6389e)
…egies (redis#14623)

This bug was introduced by redis#14130 and found by guybe7

When using XTRIM/XADD with approx mode (~) and DELREF/ACKED delete
strategies, if a node was eligible for removal but couldn't be removed
directly (because consumer group references need to be checked), the
code would incorrectly break out of the loop instead of continuing to
process entries within the node. This fix allows the per-entry deletion
logic to execute for eligible nodes when using non-KEEPREF strategies.

(cherry picked from commit 9ca860b)
MSETEX doesn't properly check ACL key permissions for all keys - only
the first key is validated.

MSETEX arguments look like: MSETEX <numkeys> key1 val1 key2 val2 ... EX
seconds

Keys are at every 2nd position (step=2). When Redis extracts keys for
ACL checking, it calculates where the last key is:

last = first + numkeys - 1;        => calculation ignores step
last = first + (numkeys-1) * step;
With 2 keys starting at position 2:

Bug: last = 2 + 2 - 1 = 3 → only checks position 2
Fix: last = 2 + (2-1)*2 = 4 → checks positions 2 and 4

Fixes redis#14657

(cherry picked from commit 7324949)
## Summary

This bus was introduced by redis#14623

Before PR redis#14623, when a stream node was going to be fully removed, we
would just delete the whole node directly instead of iterating through
and deleting each entry.

Now, with the XTRIM/XADD flags, we have to iterate and delete entries
one by one. However, the implementation in issue redis#8169 didn’t consider
the case where all entries are removed, so `p` can end up being NULL.

Fixes an UndefinedBehaviorSanitizer error in `streamTrim()` when marking
the last entry in a listpack as deleted. The issue occurs when
performing pointer arithmetic on a NULL pointer after `lpNext()` reaches
the end of the listpack.

## Solution
If p is NULL, we skip the delta calculation and the calculation of
new `p`.

(cherry picked from commit 85ab4ca)
…#14816)

`setModuleEnumConfig()` was passing the prefixed config name to module
callbacks instead of the unprefixed name, inconsistent with other
config types. Fixed by using getRegisteredConfigName() like bool,
numeric, and string configs do.

Added assertions to all test module config callbacks to validate
correct unprefixed names are received.

Issue was introduced by redis#13656

(cherry picked from commit 8a65b65)
This PR is based on: valkey-io/valkey#3511
Close redis#14983

During diskless replication, if **any single replica** cannot accept a
write (TCP send buffer full / `EAGAIN`), the master stops reading the
RDB pipe entirely, stalling data delivery to **all** replicas —
including fast ones that are ready to receive data.

The failure reason is similar to
redis#14946, the socket buffer is more
easy to fill.

In `rdbPipeReadHandler`, the master reads from the child's RDB pipe and
writes to all replica sockets in a loop. When `connWrite` to any replica
returns a partial write (socket send buffer full), the handler:

1. Installs a per-replica `rdbPipeWriteHandler` and increments
`rdb_pipe_numconns_writing`
2. **Removes the pipe read event** via `aeDeleteFileEvent(server.el,
server.rdb_pipe_read, AE_READABLE)`, stopping all pipe reads

The pipe read event is only re-enabled when **all** pending write
handlers complete (`rdb_pipe_numconns_writing == 0`), meaning the
**slowest replica dictates the throughput for all replicas**.

With one slow replica (consuming at ~290 KB/s due to `key-load-delay`):

- Master bursts ~1.3 MB of RDB data until the slow replica's socket send
buffer fills
- `rdbPipeReadHandler` disables the pipe read event
- **All replicas starve for 4–5 seconds** while the slow replica drains
its buffer
- Cycle repeats: burst → stall → burst → stall

Ultimately, it leads to a very slow synchronization process of the
entire master and replica.

1. Skip the entire `diskless replicas drop during rdb pipe` test under
Valgrind to avoid timing flakiness on slow env.

2. Move `start_server` inside the `foreach all_drop` loop so each
subcase gets a fresh master instead of sharing state across subcases.

3. For `no / slow / fast / all` subcases, replica 0 runs with
`key-load-delay 500`, which combined with the blocked-writer TCP
back-pressure can stall the RDB-saving child indefinitely; shrink the
dataset to ~40 MB so the transfer still exercises the blocked-writer
path but completes in reasonable time instead of hanging on the TCP
deadlock.
For the timeout subcase, replica 0 does not run with `key-load-delay
500`, so to avoid the TCP deadlock we still reduce the dataset somewhat,
but keep it larger than the other subcases. Otherwise the kernel TCP
send buffer can absorb the whole RDB, and we'd miss the
repl_last_partial_write != 0 "(full sync)" timeout path and only hit the
"(streaming sync)" path instead.

5. For the `all` subcase, set `rdb-key-save-delay 1000` on the master so
the RDB child keeps generating data while both replicas are killed,
ensuring the last-replica-drop path is exercised rather than racing with
normal completion.

6. Move the slow-replica `pause_process()` so it happens only in the
timeout subcase, not after killing replicas, so Redis observes the
disconnect promptly in non-timeout flows.

7. In the timeout subcase, set `repl-timeout` 2, wait inline for
`*Disconnecting timedout replica (full sync)*`, then restore
`repl-timeout` 60 so the remaining replica can finish the streamed RDB.

---------

Co-authored-by: Sarthak Aggarwal <sarthagg@amazon.com>
Co-authored-by: debing.sun <debing.sun@redis.com>
(cherry picked from commit 3189614)
In redis#14121, the SCAN filters order was changed, before redis#14121the order
was - pattern, expiration and type, after redis#14121pattern became last,
this break change broke the original behavior, which will cause scan
with pattern also to remove the expired keys.
This PR reorders the filters to be consistent with the original behavior
and extends a test to cover this scenario.

(cherry picked from commit 3920059)
This PR fixes an issue(redis#14541) where EXEC’s ACL recheck was still being
performed during AOF loading, that may cause AOF loading failed, if ACL
rules are changed and don't allow some commands in MULTI-EXEC.

(cherry picked from commit 0288d70)
In case we have to kill an rdb child at shutdown, we wait for the child
process to exit, and then resume with the shutodwn, and we did not clear
the child_pid variable, since we're going to terminate anyway. but if
the shutdown is then aborted due to another issue further down that
function, we will try to kill that child again, and the waitpid will
never get released.

Reproduced in the test "SHUTDOWN can proceed if shutdown command was
with nosave"

(cherry picked from commit f56e011)
PR redis#14440 changed `mstate.commands` from an array of `multiCmd` structs
to an array of `pendingCommand` pointers. This PR fixes the overhead
calculation in multiStateMemOverhead to account for both the pointer
array and the actual structs:
- The pointer array: `alloc_count * sizeof(pendingCommand*)`
- The individually allocated structs: `count * sizeof(pendingCommand)`

(cherry picked from commit 9302d27)
@sundb sundb closed this Jun 6, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.