Backport Test Fixes 7.2 #14

Open
roshkhatri wants to merge 20 commits into 7.2 from backport-test-fixes-7.2

@roshkhatri roshkhatri commented May 5, 2026

Replicates - CI/test stabilization backports for 7.2 branch. Includes ARC runner daily.yml changes for self-hosted CI testing.

sarthakaggarwal97 and others added 20 commits May 3, 2026 11:59
Partial cherry-pick of f3b6470 from unstable.
Only maxmemory.tcl and memefficiency.tcl apply to 7.2.
Adapted: valkey_deferring_client -> redis_deferring_client.

Signed-off-by: Sarthak Aggarwal <sarthagg@amazon.com>
(cherry picked from commit b6000ac)
Cherry-pick of c35dbdf from unstable.
Adapted: valkey_deferring_client -> redis_deferring_client.
Note: diskless-load-swapdb.tcl is at tests/cluster/tests/ on 7.2
(moved to tests/unit/cluster/ on later branches).

Signed-off-by: Sarthak Aggarwal <sarthagg@amazon.com>
(cherry picked from commit 31e3c88)
Fedora Rawhide/Latest now ships Tcl 9.x. The runtest scripts only
looked for tclsh8.5/8.6/8.7, causing 'You need tcl 8.5 or newer'
failures on Fedora CI jobs.

Minimal backport of the Tcl version detection from valkey-io#1673.
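
As a rough illustration of the detection problem described above (a Python sketch, not the shell code in the runtest scripts; the `find_tclsh` helper, the `tclsh9.0` candidate name, and the simulated host are assumptions made for this example), a lookup limited to the 8.x interpreter names finds nothing on a host that only installs a Tcl 9 binary:

```python
import shutil

# Candidate interpreter names. The 8.x names come from the commit message;
# "tclsh9.0" is an assumed name for the Tcl 9 binary, used for illustration.
OLD_CANDIDATES = ["tclsh8.5", "tclsh8.6", "tclsh8.7"]
NEW_CANDIDATES = OLD_CANDIDATES + ["tclsh9.0"]


def find_tclsh(candidates, which=shutil.which):
    """Return the path of the last candidate found on PATH, or None."""
    found = None
    for name in candidates:
        path = which(name)
        if path is not None:
            found = path  # later (newer) candidates take precedence
    return found


def fedora_like_which(name):
    # Simulated host where only a Tcl 9 interpreter is installed.
    return {"tclsh9.0": "/usr/bin/tclsh9.0"}.get(name)


print(find_tclsh(OLD_CANDIDATES, fedora_like_which))  # None
print(find_tclsh(NEW_CANDIDATES, fedora_like_which))  # /usr/bin/tclsh9.0
```

With the old list the lookup returns `None`, which corresponds to the 'You need tcl 8.5 or newer' failure; extending the list lets the same lookup succeed.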

Signed-off-by: Sarthak Aggarwal <sarthagg@amazon.com>
(cherry picked from commit 09b6338)
The weekly CI workflow uses unstable's daily.yml which passes
--io-threads to the test runner. 7.2's test_helper.tcl didn't
recognize this flag, causing 'Wrong argument: --io-threads' failures.

Translates --io-threads to --config io-threads 4 --config
io-threads-do-reads yes, matching the behavior on newer branches.
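
The translation can be pictured with a small Python sketch. The actual change lives in the Tcl test helper; the function name `translate_io_threads` and the `--verbose` argument in the usage line are placeholders for illustration. Only the expansion itself is taken from the description above.

```python
def translate_io_threads(argv):
    """Expand --io-threads into the explicit --config options described above."""
    expanded = []
    for arg in argv:
        if arg == "--io-threads":
            # Same expansion as stated in the commit message.
            expanded += ["--config", "io-threads", "4",
                         "--config", "io-threads-do-reads", "yes"]
        else:
            expanded.append(arg)  # every other argument passes through unchanged
    return expanded


print(translate_io_threads(["--verbose", "--io-threads"]))
```

Arguments other than `--io-threads` are left untouched, so a workflow file written for newer branches can pass the same command line to the 7.2 runner.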

Signed-off-by: Sarthak Aggarwal <sarthagg@amazon.com>
(cherry picked from commit ace3a97)
Port the robust version of 'client evicted due to percentage of
maxmemory' from unstable. The 7.2 version was racy — it asserted
memory usage immediately after write+flush without waiting for the
query buffer to be populated. Under TLS or ASAN overhead, the data
hadn't arrived yet, causing assertion failures.

Changes from unstable:
- Send n-1 bytes (not n) to avoid using the shared query buffer
- Add wait_for_condition before asserting tot-mem
- Add wait_for_condition in the eviction path too
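
The difference between asserting immediately and polling first can be sketched in Python. This is an illustration of the general pattern only: the Python `wait_for_condition` signature, the `FakeClient` class, and the 1024-byte threshold are assumptions, not the Tcl test code.

```python
import time


def wait_for_condition(max_tries, delay_s, condition):
    """Poll condition() up to max_tries times, sleeping delay_s after each miss."""
    for _ in range(max_tries):
        if condition():
            return True
        time.sleep(delay_s)
    return False


class FakeClient:
    """Stand-in for a client whose query buffer fills some time after the write."""

    def __init__(self, polls_until_ready):
        self.polls_left = polls_until_ready

    def tot_mem(self):
        # Reports 0 until the simulated data has "arrived".
        if self.polls_left > 0:
            self.polls_left -= 1
            return 0
        return 1024


# Racy variant: check once, right after the write.
racy_ok = FakeClient(polls_until_ready=3).tot_mem() >= 1024

# Robust variant: poll until the buffer is populated, then assert.
client = FakeClient(polls_until_ready=3)
robust_ok = wait_for_condition(50, 0.001, lambda: client.tot_mem() >= 1024)

print(racy_ok, robust_ok)  # False True
```

The immediate check fails whenever the data arrives late, which is the situation the commit message attributes to TLS or ASAN overhead; the polling variant tolerates that delay up to its retry budget.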

Signed-off-by: Sarthak Aggarwal <sarthagg@amazon.com>
(cherry picked from commit b75c6f9)
Use the CLIENT REPLY OFF pattern in the defrag eval scripts test, which
pipelines 50,000 script loads + 50,000 set commands. Without this,
the TCP send buffer fills up, causing 'I/O error reading reply'.

Same CLIENT REPLY OFF pattern as valkey-io#3430 and valkey-io#3452.

Signed-off-by: Sarthak Aggarwal <sarthagg@amazon.com>
(cherry picked from commit 6047b58)
Cherry-pick of 6ce75cd from unstable.

Signed-off-by: Sarthak Aggarwal <sarthagg@amazon.com>
(cherry picked from commit 2ac93c1)
Cherry-pick of f6b5461 from unstable.
Increases wait_for_condition timeout for rdb child termination.

Signed-off-by: Sarthak Aggarwal <sarthagg@amazon.com>
(cherry picked from commit 9e7f841)
(cherry picked from commit 884c57b)
Maybe partially resolves valkey-io#952.

The hostnames test relies on the assumption that node zero and node six
don't communicate with each other, so that it can test a range of
behavior in the handshake state. This was previously done by dropping
all meet packets; however, it seems there was some case where node zero
sent a single pong message to node six, which partially initialized the
state.

I couldn't track down why this happened, but I adjusted the test to
simply pause node zero. This also correctly emulates the state we want
to be in, since we're only testing state on node six, and it removes the
chance of errant messages. The test was failing about 5% of the time
locally, and I wasn't able to reproduce a failure with this new
configuration.

---------

Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
Signed-off-by: Ping Xie <pingxie@google.com>
(cherry picked from commit 5fde8e8)
This deflakes all variants of `diskless replicas drop during rdb pipe`.

The main issue turned out to be that the test was too sensitive to
timing and log ordering under TLS, not that the core behavior was wrong.
This keeps the same five subcases (no, slow, fast, all, timeout) but
makes them much less CI-fragile.

CI passes 200 times:
https://github.com/sarthakaggarwal97/valkey/actions/runs/24547258515

---------

Signed-off-by: Sarthak Aggarwal <sarthagg@amazon.com>
Signed-off-by: Sarthak Aggarwal <25262500+sarthakaggarwal97@users.noreply.github.com>
Co-authored-by: Sarthak Aggarwal <25262500+sarthakaggarwal97@users.noreply.github.com>
(cherry picked from commit 2b8df83)
(cherry picked from commit b051910)
Temporarily disable a few of the defrag tests in cluster mode to keep the daily run stable:

- Active defrag eval scripts
- Active defrag big keys
- Active defrag big list
- Active defrag edge case

(cherry picked from commit becd50d)
Replace (void (*)(void*))sdsfree casts with sdsfreeVoid (which already
exists on 7.2 in sds.c). Add engineLibraryFreeVoid wrapper in
functions.c. Remove unnecessary cast on zfree (already void*).

Fixes UBSan error: 'call to function sdsfree through pointer to
incorrect function type void (*)(void *)' at adlist.c:185.

Minimal backport of valkey-io#1451 from unstable.

Signed-off-by: Sarthak Aggarwal <sarthagg@amazon.com>
(cherry picked from commit 5eeedb1)
4 participants