Improve COB memory tracking with copy avoidance #3306

Merged
ranshid merged 21 commits into valkey-io:unstable from dvkashapov:cob-repl-avoidance-fix
Apr 20, 2026

Conversation

@dvkashapov

@dvkashapov dvkashapov commented Mar 4, 2026

Member

This improves client output buffer (COB) memory tracking when copy avoidance is used for bulk string replies. It fixes an underestimation of client memory usage that occurred when reply buffers stored pointers to shared robj instead of copying the data.
IO threads calculate the actual reply sizes by calling sdslen() on the strings before writing. To support that, payload headers need an atomic tracked_for_cob flag that prevents race conditions and double accounting.
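
To illustrate the idea of a "track once" flag, here is a minimal sketch in C11. It is not the code from this PR: the struct, its fields other than the `tracked_for_cob` name, and the `track_once` helper are hypothetical, and the real payload header layout is not shown on this page. The sketch only shows how an atomic compare-and-exchange lets exactly one caller account for a referenced string.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified payload header; NOT the real Valkey struct. */
typedef struct {
    size_t payload_len;          /* real length of the referenced string */
    atomic_bool tracked_for_cob; /* true once the bytes have been accounted */
} payload_header_sketch;

/* Adds payload_len to *cob_bytes only for the first caller that claims the
 * header. Returns true if this call did the accounting. The counter itself is
 * assumed to be owned by the calling thread. */
static bool track_once(payload_header_sketch *h, size_t *cob_bytes) {
    assert(h != NULL && cob_bytes != NULL);
    bool expected = false;
    if (atomic_compare_exchange_strong(&h->tracked_for_cob, &expected, true)) {
        *cob_bytes += h->payload_len;
        return true;
    }
    return false; /* someone else already accounted for this payload */
}
```

A second call on the same header returns false and leaves the counter unchanged, which is the property that rules out double accounting in this simplified model.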

See #2396

Signed-off-by: Daniil Kashapov <daniil.kashapov.ykt@gmail.com>
@codecov

codecov Bot commented Mar 4, 2026


Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 76.66%. Comparing base (a34f125) to head (3c1fc5f).
⚠️ Report is 32 commits behind head on unstable.

Additional details and impacted files
@@             Coverage Diff              @@
##           unstable    #3306      +/-   ##
============================================
+ Coverage     76.41%   76.66%   +0.24%     
============================================
  Files           159      159              
  Lines         79815    79877      +62     
============================================
+ Hits          60990    61234     +244     
+ Misses        18825    18643     -182     
| Files with missing lines | Coverage Δ |
|---|---|
| src/networking.c | 92.37% <100.00%> (+0.15%) ⬆️ |
| src/server.h | 100.00% <ø> (ø) |
| src/unit/test_networking.cpp | 99.33% <ø> (ø) |

... and 22 files with indirect coverage changes

@github-actions

github-actions Bot commented Mar 5, 2026


Benchmark ran on this commit: 26e24d2

Benchmark Comparison: HEAD vs HEAD (averaged) - rps metrics

Run Summary:

  • HEAD: 80 total runs, 16 configurations (avg 5.00 runs per config)
  • HEAD: 80 total runs, 16 configurations (avg 5.00 runs per config)

Statistical Notes:

  • CI99%: 99% Confidence Interval - range where the true population mean is likely to fall
  • PI99%: 99% Prediction Interval - range where a single future observation is likely to fall
  • CV: Coefficient of Variation - relative variability (σ/μ × 100%)

Note: Values with (n=X, σ=Y, CV=Z%, CI99%=±W%, PI99%=±V%) indicate averages from X runs with standard deviation Y, coefficient of variation Z%, 99% confidence interval margin of error ±W% of the mean, and 99% prediction interval margin of error ±V% of the mean. CI bounds [A, B] and PI bounds [C, D] show the actual interval ranges.
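
The percentages described in the note above can be recomputed from n, σ, and the mean. The sketch below assumes the report uses a two-sided Student's t critical value (about 4.604 for 99% with n = 5, i.e. 4 degrees of freedom). The page does not state this, but the figures reported for the first GET row in the results are consistent with it. All function and type names are illustrative.

```c
#include <assert.h>
#include <stddef.h>

/* Newton's method square root, used only to keep this sketch free of a
 * libm link dependency. */
static double sqrt_newton(double x) {
    assert(x >= 0.0);
    if (x == 0.0) return 0.0;
    double r = x > 1.0 ? x : 1.0;
    for (int i = 0; i < 60; i++) r = 0.5 * (r + x / r);
    return r;
}

typedef struct {
    double cv_pct; /* coefficient of variation, in percent */
    double ci_pct; /* confidence interval margin, in percent of the mean */
    double pi_pct; /* prediction interval margin, in percent of the mean */
} run_stats;

/* t_crit is the two-sided Student's t critical value for n - 1 degrees of
 * freedom (assumed: 4.604 for a 99% interval with n = 5). */
static run_stats summarize(double mean, double stddev, size_t n, double t_crit) {
    assert(n > 1 && mean != 0.0);
    double dn = (double)n;
    run_stats s;
    s.cv_pct = stddev / mean * 100.0;
    s.ci_pct = t_crit * stddev / sqrt_newton(dn) / mean * 100.0;
    s.pi_pct = t_crit * stddev * sqrt_newton(1.0 + 1.0 / dn) / mean * 100.0;
    return s;
}
```

Calling `summarize(227400.008, 1057.859, 5, 4.604)` gives roughly CV = 0.47%, CI99% = ±0.958%, and PI99% = ±2.346%, in line with the first GET row.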

Configuration:

  • architecture: aarch64
  • benchmark_mode: duration
  • clients: 1600
  • cluster_mode: False
  • data_size: 16
  • duration: 180
  • tls: False
  • valkey_benchmark_threads: 90
  • warmup: 30
| Command | Metric | Pipeline | io_threads | HEAD | HEAD | Diff | % Change |
|---|---|---|---|---|---|---|---|
| GET | rps | 1 | 1 | 227400.008 (n=5, σ=1057.859, CV=0.47%, CI99%=±0.958%, PI99%=±2.346%, CI[225221.862, 229578.154], PI[222064.662, 232735.354]) | 226125.028 (n=5, σ=1909.759, CV=0.84%, CI99%=±1.739%, PI99%=±4.260%, CI[222192.809, 230057.247], PI[216493.097, 235756.959]) | -1274.980 | -0.561% |
| GET | rps | 1 | 9 | 1512435.174 (n=5, σ=15762.058, CV=1.04%, CI99%=±2.146%, PI99%=±5.256%, CI[1479980.878, 1544889.470], PI[1432938.709, 1591931.639]) | 1510113.252 (n=5, σ=16705.214, CV=1.11%, CI99%=±2.278%, PI99%=±5.579%, CI[1475716.985, 1544509.519], PI[1425859.950, 1594366.554]) | -2321.922 | -0.154% |
| GET | rps | 10 | 1 | 1222464.972 (n=5, σ=3136.173, CV=0.26%, CI99%=±0.528%, PI99%=±1.294%, CI[1216007.549, 1228922.395], PI[1206647.581, 1238282.363]) | 1224528.104 (n=5, σ=2828.452, CV=0.23%, CI99%=±0.476%, PI99%=±1.165%, CI[1218704.283, 1230351.925], PI[1210262.715, 1238793.493]) | 2063.132 | +0.169% |
| GET | rps | 10 | 9 | 2795810.600 (n=5, σ=21499.027, CV=0.77%, CI99%=±1.583%, PI99%=±3.878%, CI[2751543.806, 2840077.394], PI[2687379.542, 2904241.658]) | 2782455.300 (n=5, σ=22808.488, CV=0.82%, CI99%=±1.688%, PI99%=±4.134%, CI[2735492.308, 2829418.292], PI[2667419.933, 2897490.667]) | -13355.300 | -0.478% |
| SET | rps | 1 | 1 | 219764.230 (n=5, σ=1064.040, CV=0.48%, CI99%=±0.997%, PI99%=±2.442%, CI[217573.357, 221955.103], PI[214397.710, 225130.750]) | 220145.450 (n=5, σ=3312.579, CV=1.50%, CI99%=±3.098%, PI99%=±7.589%, CI[213324.804, 226966.096], PI[203438.347, 236852.553]) | 381.220 | +0.173% |
| SET | rps | 1 | 9 | 1476617.048 (n=5, σ=13018.118, CV=0.88%, CI99%=±1.815%, PI99%=±4.446%, CI[1449812.564, 1503421.532], PI[1410959.739, 1542274.357]) | 1472671.674 (n=5, σ=8843.016, CV=0.60%, CI99%=±1.236%, PI99%=±3.029%, CI[1454463.782, 1490879.566], PI[1428071.628, 1517271.720]) | -3945.374 | -0.267% |
| SET | rps | 10 | 1 | 1042190.674 (n=5, σ=5579.471, CV=0.54%, CI99%=±1.102%, PI99%=±2.700%, CI[1030702.466, 1053678.882], PI[1014050.425, 1070330.923]) | 1042658.874 (n=5, σ=6189.874, CV=0.59%, CI99%=±1.222%, PI99%=±2.994%, CI[1029913.837, 1055403.911], PI[1011440.037, 1073877.711]) | 468.200 | +0.045% |
| SET | rps | 10 | 9 | 1949962.424 (n=5, σ=16895.009, CV=0.87%, CI99%=±1.784%, PI99%=±4.370%, CI[1915175.368, 1984749.480], PI[1864751.886, 2035172.962]) | 1947115.948 (n=5, σ=13189.373, CV=0.68%, CI99%=±1.395%, PI99%=±3.416%, CI[1919958.848, 1974273.048], PI[1880594.909, 2013636.987]) | -2846.476 | -0.146% |

Configuration:

  • architecture: aarch64
  • benchmark_mode: duration
  • clients: 1600
  • cluster_mode: False
  • data_size: 96
  • duration: 180
  • tls: False
  • valkey_benchmark_threads: 90
  • warmup: 30
| Command | Metric | Pipeline | io_threads | HEAD | HEAD | Diff | % Change |
|---|---|---|---|---|---|---|---|
| GET | rps | 1 | 1 | 218947.202 (n=5, σ=1703.005, CV=0.78%, CI99%=±1.602%, PI99%=±3.923%, CI[215440.692, 222453.712], PI[210358.041, 227536.363]) | 219489.050 (n=5, σ=1271.062, CV=0.58%, CI99%=±1.192%, PI99%=±2.921%, CI[216871.916, 222106.184], PI[213078.406, 225899.694]) | 541.848 | +0.247% |
| GET | rps | 1 | 9 | 1469890.974 (n=5, σ=14938.146, CV=1.02%, CI99%=±2.093%, PI99%=±5.126%, CI[1439133.124, 1500648.824], PI[1394549.936, 1545232.012]) | 1462914.178 (n=5, σ=15393.414, CV=1.05%, CI99%=±2.167%, PI99%=±5.307%, CI[1431218.926, 1494609.430], PI[1385276.984, 1540551.372]) | -6976.796 | -0.475% |
| GET | rps | 10 | 1 | 1160639.024 (n=5, σ=4426.945, CV=0.38%, CI99%=±0.785%, PI99%=±1.924%, CI[1151523.883, 1169754.165], PI[1138311.580, 1182966.468]) | 1158950.450 (n=5, σ=5297.702, CV=0.46%, CI99%=±0.941%, PI99%=±2.305%, CI[1148042.408, 1169858.492], PI[1132231.314, 1185669.586]) | -1688.574 | -0.145% |
| GET | rps | 10 | 9 | 2183098.500 (n=5, σ=13926.344, CV=0.64%, CI99%=±1.313%, PI99%=±3.217%, CI[2154423.964, 2211773.036], PI[2112860.519, 2253336.481]) | 2179273.800 (n=5, σ=16528.235, CV=0.76%, CI99%=±1.562%, PI99%=±3.825%, CI[2145241.937, 2213305.663], PI[2095913.100, 2262634.500]) | -3824.700 | -0.175% |
| SET | rps | 1 | 1 | 212138.730 (n=5, σ=1363.276, CV=0.64%, CI99%=±1.323%, PI99%=±3.241%, CI[209331.726, 214945.734], PI[205263.003, 219014.457]) | 213194.962 (n=5, σ=479.732, CV=0.23%, CI99%=±0.463%, PI99%=±1.135%, CI[212207.187, 214182.737], PI[210775.418, 215614.506]) | 1056.232 | +0.498% |
| SET | rps | 1 | 9 | 1454614.376 (n=5, σ=18405.263, CV=1.27%, CI99%=±2.605%, PI99%=±6.382%, CI[1416717.684, 1492511.068], PI[1361786.818, 1547441.934]) | 1450817.326 (n=5, σ=10139.973, CV=0.70%, CI99%=±1.439%, PI99%=±3.525%, CI[1429938.982, 1471695.670], PI[1399676.036, 1501958.616]) | -3797.050 | -0.261% |
| SET | rps | 10 | 1 | 1036871.312 (n=5, σ=4746.359, CV=0.46%, CI99%=±0.943%, PI99%=±2.309%, CI[1027098.493, 1046644.131], PI[1012932.893, 1060809.731]) | 1036169.886 (n=5, σ=6616.479, CV=0.64%, CI99%=±1.315%, PI99%=±3.221%, CI[1022546.465, 1049793.307], PI[1002799.456, 1069540.316]) | -701.426 | -0.068% |
| SET | rps | 10 | 9 | 1840951.026 (n=5, σ=23844.804, CV=1.30%, CI99%=±2.667%, PI99%=±6.533%, CI[1791854.244, 1890047.808], PI[1720688.963, 1961213.089]) | 1861706.450 (n=5, σ=20658.051, CV=1.11%, CI99%=±2.285%, PI99%=±5.596%, CI[1819171.238, 1904241.662], PI[1757516.885, 1965896.015]) | 20755.424 | +1.127% |

@sarthakaggarwal97

Contributor

@dvkashapov were you expecting something specific from the benchmarks? Looks mostly flat to me!

@dvkashapov

Member Author

Yeah, I need to make sure this change does not introduce a performance regression. Next I need to benchmark the case where only the IO threads do the size calculation for copy-avoided replies, because the values in this benchmark get computed in the main thread anyway.

@dvkashapov

dvkashapov commented Mar 6, 2026

Member Author

@sarthakaggarwal97 other than that, got any comments on the code itself? Would be glad to hear your opinion

@sarthakaggarwal97 sarthakaggarwal97 self-requested a review March 9, 2026 17:25

@sarthakaggarwal97 sarthakaggarwal97 left a comment

Contributor

I think there is a potential issue here. What will happen if the replies spill into c->reply? These reply blocks maintain their own headers (as I can see in _addReplyPayloadToList)

Spilled replies are then potentially counted twice - in addReplyBulk and in trackBufReferences.

I asked AI to write a TCL test. Do let me know if it makes sense!

test {Copy avoidance spill to reply list returns omem to zero after drain} {
    r config set min-io-threads-avoid-copy-reply 1
    r config set io-threads 4
    r config set commandlog-reply-larger-than 1
    r config set client-output-buffer-limit {normal 0 0 0}

    set value [string repeat "q" [expr 16*1024]]
    r set spill_key $value

    set rd [valkey_deferring_client]
    $rd client setname spill_omem_test
    assert_equal "OK" [$rd read]
    $rd client id
    set client_id [$rd read]

    set cmd_count 1300
    set pipeline ""
    for {set i 0} {$i < $cmd_count} {incr i} {
        append pipeline "get spill_key\r\n"
    }
    $rd write $pipeline
    $rd flush

    set spilled_to_reply_list 0
    for {set i 0} {$i < 100} {incr i} {
        set oll [get_field_in_client_list $client_id [r client list] oll]
        if {$oll ne "" && $oll > 0} {
            set spilled_to_reply_list 1
            break
        }
        after 50
    }
    if {!$spilled_to_reply_list} {
        fail "Client never spilled copy-avoided replies into c->reply"
    }

    set reply_len [expr {[string length $value] + [string length [string length $value]] + 5}]
    set remaining [expr {$reply_len * $cmd_count}]
    while {$remaining > 0} {
        set chunk [$rd rawread [expr {min($remaining, 65536)}]]
        set chunk_len [string length $chunk]
        if {$chunk_len == 0} {
            fail "Socket drained unexpectedly after reading [expr {$reply_len * $cmd_count - $remaining}] bytes"
        }
        incr remaining -$chunk_len
    }

    set fully_drained 0
    for {set i 0} {$i < 100} {incr i} {
        set client_list [r client list]
        set obl [get_field_in_client_list $client_id $client_list obl]
        set oll [get_field_in_client_list $client_id $client_list oll]
        if {$obl ne "" && $oll ne "" && $obl == 0 && $oll == 0} {
            set fully_drained 1
            break
        }
        after 50
    }
    if {!$fully_drained} {
        fail "Client reply buffers did not fully drain"
    }

    set omem [get_field_in_client_list $client_id [r client list] omem]

    $rd close
    r config set commandlog-reply-larger-than -1
    r config set min-io-threads-avoid-copy-reply 0
    r config set io-threads 1

    assert_equal 0 $omem
}
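
The `reply_len` expression in the test above (value length, plus the number of digits in that length, plus 5) follows from RESP bulk string framing: `$<len>\r\n<data>\r\n`. As a rough cross-check, the sketch below derives the same number by actually formatting the header. The helper names are made up for this illustration and do not exist in the Valkey code base.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Length of the "$<len>\r\n" prefix of a RESP bulk string reply. */
static size_t resp_bulk_header_len(size_t value_len) {
    char hdr[32];
    int n = snprintf(hdr, sizeof hdr, "$%zu\r\n", value_len);
    assert(n > 0 && (size_t)n < sizeof hdr);
    return (size_t)n;
}

/* Total bytes on the wire for one bulk string reply:
 * header + payload + trailing "\r\n". */
static size_t resp_bulk_reply_len(size_t value_len) {
    return resp_bulk_header_len(value_len) + value_len + 2;
}
```

For the 16 KiB value used in the test, the header is `$16384\r\n` (8 bytes), so each reply is 16394 bytes and the 1300 pipelined GETs add up to 21312200 bytes that the client has to read.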

@zuiderkwast zuiderkwast moved this to Needs Review in Valkey 9.1 Mar 9, 2026
@zuiderkwast zuiderkwast moved this to In Progress in Valkey 9.0 Mar 9, 2026
Signed-off-by: Daniil Kashapov <daniil.kashapov.ykt@gmail.com>
@dvkashapov

Member Author

@sarthakaggarwal97 Great catch! So I pushed commits that fix 2 problems:

  • When multiple bulk replies extended the same header, reply_len was overwritten instead of accumulated which caused undertracking.
  • When replies spilled from c->buf to c->reply, both main thread (addReplyBulk()) and IO thread (trackBufReferences()) tracked the same content.

Fixes:

  1. Change reply_len = to reply_len +=
  2. Only track in main thread when content stays in c->buf; IO thread tracks spilled content
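
The two fixes above can be sketched in a few lines of C. This is a simplified model, not the PR's diff: only the `reply_len` name comes from the comment, while the struct, the enum, and both helpers are hypothetical stand-ins for the logic described.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical header shared by consecutive bulk replies. */
typedef struct {
    size_t reply_len; /* total bytes of all bulk replies under this header */
} header_sketch;

/* Fix 1: accumulate instead of overwrite when a header is extended. */
static void extend_header(header_sketch *h, size_t bulk_len) {
    assert(h != NULL);
    h->reply_len += bulk_len; /* previously: h->reply_len = bulk_len; */
}

/* Fix 2: exactly one side accounts for a given reply. */
typedef enum { TRACKED_BY_MAIN, TRACKED_BY_IO } tracker_sketch;

static tracker_sketch who_tracks(bool stays_in_static_buf) {
    /* Content that stays in c->buf is tracked by the main thread;
     * content that spilled into c->reply is left to the IO thread. */
    return stays_in_static_buf ? TRACKED_BY_MAIN : TRACKED_BY_IO;
}
```

With the old assignment, extending a header with 100 and then 50 bytes would leave `reply_len` at 50; with accumulation it ends at 150.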

Signed-off-by: Daniil Kashapov <daniil.kashapov.ykt@gmail.com>
@dvkashapov dvkashapov added the run-extra-tests Run extra tests on this PR (Runs all tests from daily except valgrind and RESP) label Mar 10, 2026
@sarthakaggarwal97

sarthakaggarwal97 commented Mar 10, 2026

Contributor

Thanks @dvkashapov for handling the spill case. The 32-bit / TLS failures in CI are interesting to debug. They look specific to those environments.

Signed-off-by: Daniil Kashapov <daniil.kashapov.ykt@gmail.com>
…nce-fix

Signed-off-by: Daniil Kashapov <daniil.kashapov.ykt@gmail.com>
@dvkashapov

dvkashapov commented Mar 13, 2026

Member Author

@sarthakaggarwal97, can we run the benchmark with custom value sizes so that they exceed min-string-size-avoid-copy-reply, or set that config to a lower value if possible?

I handled the TLS and 32-bit failures, so the remaining questions are performance and correctness.

@dvkashapov

dvkashapov commented Mar 13, 2026

Member Author

I benchmarked with data sizes 16, 96, 1024, 8 KB, and 16 KB, with lowered copy avoidance config values to exaggerate the situation, and did not observe any RPS/latency regressions >= 1%. Looks like the changes are safe in that regard.

@madolson madolson requested review from madolson and ranshid March 30, 2026 16:00

@ranshid ranshid left a comment

Member

Overall looks good. I left some nit comments.

The memory ordering choices are correct and necessary. You can't safely downgrade any of them to relaxed. The performance cost on ARM is real but small: a few cycles per bulk string header in the write path, not per byte. For a workload doing thousands of small GETs with copy avoidance, it could add up to measurable overhead on Graviton.
Benchmarking shows no specific regression, and I don't think there's a way to avoid it without sacrificing correctness.
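
As background for the ordering remark, here is a generic release/acquire sketch in C11. It is not taken from the PR, whose atomics are not shown on this page; the struct and function names are hypothetical. It shows the usual reason such orderings cannot be relaxed: the writer publishes data with a release store, and the reader's acquire load guarantees the data is visible once the flag is seen.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical pairing of a plain field with an atomic "ready" flag. */
typedef struct {
    size_t actual_len;     /* written before the flag is published */
    atomic_bool len_ready; /* publication flag */
} published_len_sketch;

/* Writer side: store the data, then publish with release ordering so the
 * write to actual_len cannot be reordered after the flag store. */
static void publish_len(published_len_sketch *p, size_t len) {
    assert(p != NULL);
    p->actual_len = len;
    atomic_store_explicit(&p->len_ready, true, memory_order_release);
}

/* Reader side: an acquire load that observes the flag also observes the
 * earlier write to actual_len. Returns false if nothing is published yet. */
static bool try_read_len(published_len_sketch *p, size_t *out) {
    assert(p != NULL && out != NULL);
    if (!atomic_load_explicit(&p->len_ready, memory_order_acquire)) return false;
    *out = p->actual_len;
    return true;
}
```

With `memory_order_relaxed` on both sides, a reader on a weakly ordered CPU such as ARM could see the flag set and still read a stale `actual_len`. On x86-64 ordinary loads and stores already have acquire and release semantics at the hardware level, which is consistent with the extra cost showing up mainly on ARM.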

Comment thread src/unit/test_networking.cpp Outdated
Comment thread src/networking.c Outdated
Comment thread src/networking.c
Comment thread src/networking.c
Comment thread tests/unit/obuf-limits.tcl
@dvkashapov

Member Author

> The performance cost on ARM is real but small

I only ran benchmarks on x86-64. We can investigate the performance penalty on ARM if we merge #3433 and run a benchmark with custom data sizes that forces copy avoidance.

Signed-off-by: Daniil Kashapov <daniil.kashapov.ykt@gmail.com>
@dvkashapov

Member Author

@ranshid Thank you for the review. I addressed all comments. Overall no big concerns, right? Except maybe for ARM performance.

@dvkashapov dvkashapov added the run-extra-tests Run extra tests on this PR (Runs all tests from daily except valgrind and RESP) label Apr 16, 2026
@dvkashapov

Member Author

All test failures are unrelated. The only one I have not seen previously on unstable is a module API test failure. I tried reproducing it with the same build on Ubuntu, but everything passes:

Test Summary: 633 passed, 0 failed

\o/ All tests passed without errors!

Cleanup: may take some time... OK
dvkashapov@valkey-dev:~/valkey$ 

@dvkashapov dvkashapov removed the run-extra-tests Run extra tests on this PR (Runs all tests from daily except valgrind and RESP) label Apr 16, 2026
@ranshid ranshid merged commit 269b1c5 into valkey-io:unstable Apr 20, 2026
96 of 100 checks passed
@github-project-automation github-project-automation Bot moved this from Needs Review to To be backported in Valkey 9.1 Apr 20, 2026
@github-project-automation github-project-automation Bot moved this from In Progress / Needs Review to To be backported in Valkey 9.0 Apr 20, 2026
@dvkashapov dvkashapov deleted the cob-repl-avoidance-fix branch April 20, 2026 15:22
@sarthakaggarwal97

Contributor

@dvkashapov @ranshid Seeing approx. 5% degradation after this commit for 128-byte values. I'd wait for a few more runs to be sure of the degradation and that it's not noise, but wanted to check if you have some thoughts on this.

Dashboard Link: https://perf-dashboard.valkey.io/public-dashboards/3e45bf8ded3043edaa941331cd1a94e2?from=2026-04-13T23:51:19.280Z&to=2026-04-21T07:27:04.802Z&timezone=UTC

[Image: Screenshot 2026-04-21 at 11 15 01 AM]

sarthakaggarwal97 pushed a commit to sarthakaggarwal97/valkey that referenced this pull request Apr 23, 2026
madolson pushed a commit that referenced this pull request Apr 27, 2026
@sarthakaggarwal97 sarthakaggarwal97 added the release-notes This issue should get a line item in the release notes label Apr 28, 2026
@dvkashapov

Member Author

> Seeing approx. 5% degradation after this commit for 128-byte values. I'd wait for a few more runs to be sure of the degradation and that it's not noise, but wanted to check if you have some thoughts on this.

As far as I can tell, the performance penalty on ARM is there. It's noticeable, but in my opinion we can accept it. Should we talk about this in the weekly meeting? Maybe folks have another opinion about that.

Labels

release-notes This issue should get a line item in the release notes

Projects

Status: To be backported
Status: Done

4 participants