Improve COB memory tracking with copy avoidance#3306
Conversation
Signed-off-by: Daniil Kashapov <daniil.kashapov.ykt@gmail.com>
Signed-off-by: Daniil Kashapov <daniil.kashapov.ykt@gmail.com>
Signed-off-by: Daniil Kashapov <daniil.kashapov.ykt@gmail.com>
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## unstable #3306 +/- ##
============================================
+ Coverage 76.41% 76.66% +0.24%
============================================
Files 159 159
Lines 79815 79877 +62
============================================
+ Hits 60990 61234 +244
+ Misses 18825 18643 -182
🚀 New features to boost your workflow:
|
|
Benchmark ran on this commit: Benchmark Comparison: HEAD vs HEAD (averaged) - rps metricsRun Summary:
Statistical Notes:
Note: Values with (n=X, σ=Y, CV=Z%, CI99%=±W%, PI99%=±V%) indicate averages from X runs with standard deviation Y, coefficient of variation Z%, 99% confidence interval margin of error ±W% of the mean, and 99% prediction interval margin of error ±V% of the mean. CI bounds [A, B] and PI bounds [C, D] show the actual interval ranges. Configuration:
Configuration:
|
|
@dvkashapov were you expecting something specific from the benchmarks? Looks mostly flat to me! |
|
Yeah, I need to make sure this change does not introduce performance regression. Now I need to benchmark with only io threads calculation for copy avoided replies because values from benchmark get computed in main thread anyway. |
|
@sarthakaggarwal97 other than that, got any comments on the code itself? Would be glad to hear your opinion |
sarthakaggarwal97
left a comment
There was a problem hiding this comment.
I think there is a potential issue here. What will happen if the replies spill into c->reply? These reply blocks maintain their own headers (as I can see in _addReplyPayloadToList)
Spilled replies are then potentially counted twice - in addReplyBulk and in trackBufReferences.
I asked AI to write a TCL test. Do let me know if it makes sense!
test {Copy avoidance spill to reply list returns omem to zero after drain} {
r config set min-io-threads-avoid-copy-reply 1
r config set io-threads 4
r config set commandlog-reply-larger-than 1
r config set client-output-buffer-limit {normal 0 0 0}
set value [string repeat "q" [expr 16*1024]]
r set spill_key $value
set rd [valkey_deferring_client]
$rd client setname spill_omem_test
assert_equal "OK" [$rd read]
$rd client id
set client_id [$rd read]
set cmd_count 1300
set pipeline ""
for {set i 0} {$i < $cmd_count} {incr i} {
append pipeline "get spill_key\r\n"
}
$rd write $pipeline
$rd flush
set spilled_to_reply_list 0
for {set i 0} {$i < 100} {incr i} {
set oll [get_field_in_client_list $client_id [r client list] oll]
if {$oll ne "" && $oll > 0} {
set spilled_to_reply_list 1
break
}
after 50
}
if {!$spilled_to_reply_list} {
fail "Client never spilled copy-avoided replies into c->reply"
}
set reply_len [expr {[string length $value] + [string length [string length $value]] + 5}]
set remaining [expr {$reply_len * $cmd_count}]
while {$remaining > 0} {
set chunk [$rd rawread [expr {min($remaining, 65536)}]]
set chunk_len [string length $chunk]
if {$chunk_len == 0} {
fail "Socket drained unexpectedly after reading [expr {$reply_len * $cmd_count - $remaining}] bytes"
}
incr remaining -$chunk_len
}
set fully_drained 0
for {set i 0} {$i < 100} {incr i} {
set client_list [r client list]
set obl [get_field_in_client_list $client_id $client_list obl]
set oll [get_field_in_client_list $client_id $client_list oll]
if {$obl ne "" && $oll ne "" && $obl == 0 && $oll == 0} {
set fully_drained 1
break
}
after 50
}
if {!$fully_drained} {
fail "Client reply buffers did not fully drain"
}
set omem [get_field_in_client_list $client_id [r client list] omem]
$rd close
r config set commandlog-reply-larger-than -1
r config set min-io-threads-avoid-copy-reply 0
r config set io-threads 1
assert_equal 0 $omem
}
Signed-off-by: Daniil Kashapov <daniil.kashapov.ykt@gmail.com>
Signed-off-by: Daniil Kashapov <daniil.kashapov.ykt@gmail.com>
Signed-off-by: Daniil Kashapov <daniil.kashapov.ykt@gmail.com>
|
@sarthakaggarwal97 Great catch! So I pushed commits that fix 2 problems:
Fixes:
|
Signed-off-by: Daniil Kashapov <daniil.kashapov.ykt@gmail.com>
|
Thanks @dvkashapov for handing the spill case. The 32 bit / TLS failures in CI are interesting to debug. Looks specific to those environments. |
Signed-off-by: Daniil Kashapov <daniil.kashapov.ykt@gmail.com>
Signed-off-by: Daniil Kashapov <daniil.kashapov.ykt@gmail.com>
Signed-off-by: Daniil Kashapov <daniil.kashapov.ykt@gmail.com>
…nce-fix Signed-off-by: Daniil Kashapov <daniil.kashapov.ykt@gmail.com>
Signed-off-by: Daniil Kashapov <daniil.kashapov.ykt@gmail.com>
Signed-off-by: Daniil Kashapov <daniil.kashapov.ykt@gmail.com>
|
@sarthakaggarwal97, can we run benchmark on custom size values so that they're going to be > I handled the tls and 32bit failures and now the question is performance and correctness |
|
I benchmarked with data sizes: 16, 96, 1024, 8kb, 16kb with lowered copy avoidance config values to exaggerate situation and did not observe any RPS/latency regressions >= 1%. Looks like changes are safe in that regard. |
There was a problem hiding this comment.
Overall looks good. I gave some nit comments.
the ordering choices are correct and necessary. You can't safely downgrade any of them to relaxed. The performance cost on ARM is real but small — a few cycles per bulk string header in the write path, not per byte. For a workload doing thousands of small GETs with copy avoidance, it could add up to measurable overhead on Graviton.
benchmarking shows no specific regression , and I don't think there's a way to avoid it without sacrificing correctness.
I did benchmarks only on x86-64, we can investigate performance penalty on ARM if we merge #3433 and run custom data sizes benchmark that would force copy avoidance. |
Signed-off-by: Daniil Kashapov <daniil.kashapov.ykt@gmail.com>
|
@ranshid Thank you for review, addressed all comments, overall no big concerns, right? Except maybe for ARM performance. |
|
All test failures are unrelated, the only one I have not seen previously on unstable - module API test failure, tried reproducing with same build on Ubuntu but everything passes: |
|
@dvkashapov @ranshid Seeing approx 5% degradation after this commit for 128 byte values. I'd wait for a few more runs to be sure of the degradation and that it's not noise, but wanted check if you have some thoughts on this.
|
This improves COB memory tracking when using copy avoidance for bulk string replies. This fix addresses underestimation of client memory usage that occurred when reply buffers stored pointers to shared `robj` instead of copying data. IO threads calculate actual reply sizes by calling `sdslen()` on strings before writing, for that we need atomic `tracked_for_cob` flag in payload headers to prevent race conditions and double accounting. See valkey-io#2396 --------- Signed-off-by: Daniil Kashapov <daniil.kashapov.ykt@gmail.com>
This improves COB memory tracking when using copy avoidance for bulk string replies. This fix addresses underestimation of client memory usage that occurred when reply buffers stored pointers to shared `robj` instead of copying data. IO threads calculate actual reply sizes by calling `sdslen()` on strings before writing, for that we need atomic `tracked_for_cob` flag in payload headers to prevent race conditions and double accounting. See #2396 --------- Signed-off-by: Daniil Kashapov <daniil.kashapov.ykt@gmail.com>
As far as I can tell the performance penalty for ARM is there, it's noticeable but my opinion is that we can accept it. Should we talk about this in the weekly meeting? Maybe folks have another opinion about that. |

This improves COB memory tracking when using copy avoidance for bulk string replies. This fix addresses underestimation of client memory usage that occurred when reply buffers stored pointers to shared
robjinstead of copying data.IO threads calculate actual reply sizes by calling
sdslen()on strings before writing, for that we need atomictracked_for_cobflag in payload headers to prevent race conditions and double accounting.See #2396