Backport Test Fixes 7.2 by roshkhatri · Pull Request #14 · roshkhatri/valkey

roshkhatri · 2026-05-05T20:06:02Z

Replicates - CI/test stabilization backports for the 7.2 branch. Includes ARC runner changes to daily.yml for self-hosted CI testing.

Partial cherry-pick of f3b6470 from unstable. Only maxmemory.tcl and memefficiency.tcl apply to 7.2. Adapted: valkey_deferring_client -> redis_deferring_client.
Signed-off-by: Sarthak Aggarwal <sarthagg@amazon.com>
(cherry picked from commit b6000ac)

Cherry-pick of c35dbdf from unstable. Adapted: valkey_deferring_client -> redis_deferring_client. Note: diskless-load-swapdb.tcl is at tests/cluster/tests/ on 7.2 (moved to tests/unit/cluster/ on later branches).
Signed-off-by: Sarthak Aggarwal <sarthagg@amazon.com>
(cherry picked from commit 31e3c88)

Fedora Rawhide/Latest now ships Tcl 9.x. The runtest scripts only looked for tclsh8.5/8.6/8.7, causing 'You need tcl 8.5 or newer' failures on Fedora CI jobs. Minimal backport of the Tcl version detection from valkey-io#1673.
Signed-off-by: Sarthak Aggarwal <sarthagg@amazon.com>
(cherry picked from commit 09b6338)
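As an illustration of the detection described above, here is a minimal Python sketch (the real runtest scripts are shell, and this is not their code). `pick_tclsh` and its `lookup` parameter are hypothetical names, and `tclsh9.0` is an assumed binary name for Tcl 9.x.

```python
import shutil

# Candidate interpreter names. The 8.x entries come from the commit message;
# "tclsh9.0" is an assumed name for the Tcl 9.x binary.
CANDIDATES = ["tclsh8.5", "tclsh8.6", "tclsh8.7", "tclsh9.0"]


def pick_tclsh(candidates=CANDIDATES, lookup=shutil.which):
    """Return the path of the last candidate found on PATH, or None."""
    found = None
    for name in candidates:
        path = lookup(name)
        if path is not None:
            found = path  # later (newer) entries override earlier ones
    return found


if __name__ == "__main__":
    tclsh = pick_tclsh()
    if tclsh is None:
        print("You need tcl 8.5 or newer")
    else:
        print("using", tclsh)
```

With only the three 8.x names in the list, a host that ships nothing but Tcl 9.x would get `None`, which corresponds to the failure seen on the Fedora jobs.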

The weekly CI workflow uses unstable's daily.yml, which passes --io-threads to the test runner. 7.2's test_helper.tcl didn't recognize this flag, causing 'Wrong argument: --io-threads' failures. Translates --io-threads to --config io-threads 4 --config io-threads-do-reads yes, matching the behavior on newer branches.
Signed-off-by: Sarthak Aggarwal <sarthagg@amazon.com>
(cherry picked from commit ace3a97)
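The flag translation can be pictured with a short Python sketch. The actual change lives in the Tcl option parser of test_helper.tcl; `translate_io_threads` is a hypothetical name used only for this illustration.

```python
def translate_io_threads(argv):
    """Expand every --io-threads flag into explicit --config options."""
    # Expansion taken from the commit message above.
    expansion = ["--config", "io-threads", "4",
                 "--config", "io-threads-do-reads", "yes"]
    out = []
    for arg in argv:
        if arg == "--io-threads":
            out.extend(expansion)
        else:
            out.append(arg)  # every other argument passes through untouched
    return out


if __name__ == "__main__":
    print(translate_io_threads(["--verbose", "--io-threads"]))
```

Rewriting the flag into options the 7.2 runner already understands means the shared workflow file does not need a branch-specific variant.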

Port the robust version of 'client evicted due to percentage of maxmemory' from unstable. The 7.2 version was racy: it asserted memory usage immediately after write+flush without waiting for the query buffer to be populated. Under TLS or ASAN overhead, the data hadn't arrived yet, causing assertion failures.
Changes from unstable:
- Send n-1 bytes (not n) to avoid using the shared query buffer
- Add wait_for_condition before asserting tot-mem
- Add wait_for_condition in the eviction path too
Signed-off-by: Sarthak Aggarwal <sarthagg@amazon.com>
(cherry picked from commit b75c6f9)
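The race and its fix can be sketched in Python. This is an analogy, not the Tcl test itself: `wait_for_condition` here is a simplified stand-in for the test suite helper of the same name, and `FakeClient` is a hypothetical object whose memory usage only becomes visible after a few polls.

```python
import time


def wait_for_condition(predicate, max_tries=50, delay=0.01):
    """Poll predicate until it returns True or max_tries is exhausted."""
    for _ in range(max_tries):
        if predicate():
            return True
        time.sleep(delay)
    return False


class FakeClient:
    """Stand-in for a client whose query buffer is populated late."""

    def __init__(self, arrives_after_polls, size):
        self._polls_left = arrives_after_polls
        self._size = size

    def tot_mem(self):
        # Report 0 until the simulated data has "arrived".
        if self._polls_left > 0:
            self._polls_left -= 1
            return 0
        return self._size


if __name__ == "__main__":
    client = FakeClient(arrives_after_polls=3, size=1024)
    # Racy style: asserting on client.tot_mem() right here would see 0.
    # Robust style: poll first, then assert.
    assert wait_for_condition(lambda: client.tot_mem() >= 1024)
    print("tot-mem reached the expected size")
```

Polling costs little when the data arrives promptly and only uses the full retry budget when the condition is never met.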

Use the CLIENT REPLY OFF pattern in the defrag eval scripts test, which pipelines 50,000 script loads + 50,000 set commands. Without this, the TCP send buffer fills up, causing 'I/O error reading reply'. Same CLIENT REPLY OFF pattern as valkey-io#3430 and valkey-io#3452.
Signed-off-by: Sarthak Aggarwal <sarthagg@amazon.com>
(cherry picked from commit 6047b58)
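To make the pattern concrete, here is a minimal Python sketch that builds such a pipeline as raw RESP bytes. It is not the Tcl test code; `encode_command` and `build_silent_pipeline` are hypothetical helper names. The server sends no reply to `CLIENT REPLY OFF` or to the commands that follow it, and replies `+OK` to the closing `CLIENT REPLY ON`, so the sender has a single reply to read however many commands were pipelined.

```python
def encode_command(*args):
    """Encode one command as a RESP array of bulk strings."""
    parts = [b"*%d\r\n" % len(args)]
    for arg in args:
        data = arg if isinstance(arg, bytes) else str(arg).encode()
        parts.append(b"$%d\r\n%s\r\n" % (len(data), data))
    return b"".join(parts)


def build_silent_pipeline(commands):
    """Wrap commands in CLIENT REPLY OFF / ON so only the final ON replies."""
    payload = [encode_command("CLIENT", "REPLY", "OFF")]
    payload.extend(encode_command(*cmd) for cmd in commands)
    payload.append(encode_command("CLIENT", "REPLY", "ON"))
    return b"".join(payload)


if __name__ == "__main__":
    cmds = [("SET", "key:%d" % i, "value") for i in range(3)]
    print(build_silent_pipeline(cmds))
```

Because no replies are generated for the bulk of the pipeline, there is nothing to accumulate unread on the connection while the test is still writing.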

Maybe partially resolves valkey-io#952. The hostnames test relies on the assumption that node 0 and node 6 don't communicate with each other, so that it can test a range of behavior in the handshake state. Previously this was done by dropping all meet packets; however, there seemed to be some case where node 0 sent a single pong message to node 6, which partially initialized the state. I couldn't track down why this happened, but I adjusted the test to simply pause node 0, which also correctly emulates the state we want since we're only testing state on node 6, and removes the chance of errant messages. The test was failing about 5% of the time locally, and I wasn't able to reproduce a failure with this new configuration.
---------
Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
Signed-off-by: Ping Xie <pingxie@google.com>
(cherry picked from commit 5fde8e8)

This deflakes all variants of `diskless replicas drop during rdb pipe`. The main issue turned out to be that the test was too sensitive to timing and log ordering under TLS, not that the core behavior was wrong. This keeps the same five subcases (no, slow, fast, all, timeout) but makes them much less CI-fragile.
CI passes 200 times: https://github.com/sarthakaggarwal97/valkey/actions/runs/24547258515
---------
Signed-off-by: Sarthak Aggarwal <sarthagg@amazon.com>
Signed-off-by: Sarthak Aggarwal <25262500+sarthakaggarwal97@users.noreply.github.com>
Co-authored-by: Sarthak Aggarwal <25262500+sarthakaggarwal97@users.noreply.github.com>
(cherry picked from commit 2b8df83)
(cherry picked from commit b051910)

Replace (void (*)(void*))sdsfree casts with sdsfreeVoid (which already exists on 7.2 in sds.c). Add engineLibraryFreeVoid wrapper in functions.c. Remove unnecessary cast on zfree (already void*). Fixes UBSan error: 'call to function sdsfree through pointer to incorrect function type void (*)(void *)' at adlist.c:185. Minimal backport of valkey-io#1451 from unstable.
Signed-off-by: Sarthak Aggarwal <sarthagg@amazon.com>
(cherry picked from commit 5eeedb1)

sarthakaggarwal97 and others added 20 commits May 3, 2026 11:59

Fix replica online timing issue in failover test (valkey-io#1044)

7af6a00

Cherry-pick of 6ce75cd from unstable. Signed-off-by: Sarthak Aggarwal <sarthagg@amazon.com> (cherry picked from commit 2ac93c1)

Deflake 'diskless no replicas drop during rdb pipe' (valkey-io#3461)

a7b157a

Cherry-pick of f6b5461 from unstable. Increases wait_for_condition timeout for rdb child termination. Signed-off-by: Sarthak Aggarwal <sarthagg@amazon.com> (cherry picked from commit 9e7f841)

tests: add missing replica sync helper

16db1d6

(cherry picked from commit bf36908)

tests: allow Tcl 9 in test harness

e8ba038

(cherry picked from commit 884c57b)

tests: deflake 7.2 memefficiency clients

4da20b9

(cherry picked from commit 4eae1ba)

tests: deflake 7.2 big-list defrag setup

eb15c39

(cherry picked from commit 277b800)

tests: deflake 7.2 defrag edge-case setup

16ea7ce

(cherry picked from commit 794b7f3)

tests: avoid deferred reply backlog in defrag test

5ebbe19

(cherry picked from commit 2ba3ca1)

tests: deflake legacy cluster manual failover

4494a9c

(cherry picked from commit e1876d6)

Disable flaky defrag tests affecting daily run (#12672)

be460b7

Temporarily disabling a few of the defrag tests in cluster mode to make the daily run stable:
- Active defrag eval scripts
- Active defrag big keys
- Active defrag big list
- Active defrag edge case
(cherry picked from commit becd50d)

Add ARC runner changes to daily.yml for CI testing

cd73edb

github-actions Bot assigned roshkhatri May 5, 2026

roshkhatri force-pushed the 7.2 branch from 40d3ea7 to 986a3bc Compare May 5, 2026 20:08

roshkhatri wants to merge 20 commits into 7.2 from backport-test-fixes-7.2


4 participants
