Replication streaming compression (per-replica) — review vs #3531 base#18
Open
roshkhatri wants to merge 13 commits into
Adds replication wire compression on top of valkey-io#3531 with lz4 as the first supported codec for the incremental replication stream.

The replication stream from primary to replica is wrapped in a VKCS envelope (using STREAM_KIND_REPL) and compressed as a single long-lived frame at the per-replica buffer layer. Default behavior is unchanged with 'replcompression no'; existing replicas without the new capability stay uncompressed.

Negotiation is per-replica via the existing PSYNC handshake; a new REPLICA_CAPA_COMPRESSION capability lets each side opt in independently.

Compression runs inline on the IO thread that owns the replica's write job; no dedicated compression thread, no IPC, no reordering. Optional sticky thread affinity (lazy ownership + event-driven rebalance) keeps the long-lived LZ4 frame state on a single IO thread for cache locality.

Configs:
- replcompression: bool, default no
- repl-compression-thread-affinity: bool, default yes

Internal constants:
- REPLICA_CAPA_COMPRESSION: (1 << 4)
- REPL_COMPRESSION_ALGO: ALGO_LZ4
- REPL_COMPRESSION_LEVEL: 0 (LZ4 fast mode)
- REPL_COMPRESSION_BATCH_LIMIT: 1 MB raw input per dispatch
- REPL_STREAM_DECODER_OUTPUT_MAX: 256 MB

INFO replication per-replica fields: compression=lz4, compressed_bytes, uncompressed_bytes, compression_ratio, compression_errors, compression_cpu_usec, debug_compression_pending_drains, debug_thread_switches

INFO replication server-level (replica side): repl_decompression_errors, repl_decompression_cpu_usec, repl_decompressed_bytes_total, repl_apply_cpu_usec, repl_apply_batches

CI adds a test-replication-compression job that runs the replication-tagged integration tests with replcompression=yes to exercise compression across the broader replication test surface.

Tests: 18 streamReader push-mode unit tests + 3 replCompression unit tests + 27 integration tests.
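The per-replica opt-in described above can be illustrated with a small sketch. This is not code from the patch: the helper name `shouldCompressReplicaStream` and its parameters are hypothetical, and only the `REPLICA_CAPA_COMPRESSION` bit value is taken from the description. It assumes the primary compresses only when its own `replcompression` setting is on and the replica advertised the capability during the handshake.

```c
#include <stdbool.h>

/* Bit value as listed in the PR description. */
#define REPLICA_CAPA_COMPRESSION (1 << 4)

/* Hypothetical helper (not from the patch): decide whether the stream to one
 * replica is compressed. Both sides must opt in: the local config must be
 * enabled and the replica must have advertised the capability bit. */
bool shouldCompressReplicaStream(bool replcompression_enabled, int replica_capa) {
    if (!replcompression_enabled) return false;
    return (replica_capa & REPLICA_CAPA_COMPRESSION) != 0;
}
```

Under this assumption, a replica that predates the capability never sets the bit, so its stream stays uncompressed regardless of the primary's configuration.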
Performance (BlockMesh tweets, 3M keys x ~315 byte JSON values, 1,073 MB uncompressed per replica, 30 clients, pipeline 50, 2 cross-region replicas):
- LZ4 level 0 (default): 0.48 ratio, 52% bandwidth saved, 2.5s compression CPU per replica, <1% throughput overhead vs uncompressed baseline.
- Affinity ON vs OFF: throughput unchanged (118.6K vs 118.1K keys/s) but thread switches drop from ~800K to ~30 per replica.

ZSTD support follows in valkey-io#3798. Related to valkey-io#3531.

Signed-off-by: Roshan Khatri <rvkhatri@amazon.com>
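The relationship between the reported ratio and the bandwidth saving can be made explicit. The sketch below assumes the ratio is compressed bytes divided by uncompressed bytes, which is consistent with a 0.48 ratio corresponding to 52% saved; the function names are hypothetical and do not come from the patch.

```c
#include <stdint.h>

/* Assumed definition: compressed / uncompressed. Returns 0.0 when no
 * uncompressed bytes have been counted, to avoid dividing by zero. */
double compressionRatio(uint64_t compressed_bytes, uint64_t uncompressed_bytes) {
    if (uncompressed_bytes == 0) return 0.0;
    return (double)compressed_bytes / (double)uncompressed_bytes;
}

/* Percentage of wire bytes avoided relative to sending the stream raw. */
double bandwidthSavedPercent(uint64_t compressed_bytes, uint64_t uncompressed_bytes) {
    if (uncompressed_bytes == 0) return 0.0;
    return (1.0 - compressionRatio(compressed_bytes, uncompressed_bytes)) * 100.0;
}
```

With these definitions, 48 compressed bytes for every 100 uncompressed bytes gives a ratio of 0.48 and a saving of 52%.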
# Conflicts:
#	src/io_threads.c
#	src/networking.c
#	src/replication.c
Signed-off-by: Roshan Khatri <rvkhatri@amazon.com>
The compression CI job (--config replcompression yes) ran replication-buffer.tcl and the dual-channel buffer-memory tests, which assert exact replication buffer/backlog memory and byte volumes. Compression legitimately changes those (per-replica codec buffers add ~1MB scratch; fewer wire bytes let replicas keep up), so the assertions fail under compression even though replication is correct. Drop the repl-compression tag from replication-buffer.tcl and the two dual-channel blocks holding the memory tests; they still run uncompressed in the regular job. Functional dual-channel coverage stays in the compression job. Signed-off-by: Roshan Khatri <rvkhatri@amazon.com>
…io#3897)

## Summary
Fixes valkey-io#3008

> lets add assert checking that the object has a key in dbUntrackKeyWithVolatileItems and dbTrackKeyWithVolatileItems to be able to get a more explicit error in these cases

This is addressed in this PR.

Signed-off-by: ydsakshi <ydsakshi023@gmail.com>
clusterNode.shard_id is a fixed-size char[CLUSTER_NAMELEN] buffer that is not guaranteed to be NUL-terminated, so it must be printed with %.40s. This was introduced in valkey-io#2510. Signed-off-by: Binbin <binloveplay1314@qq.com>
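A minimal sketch of why the precision matters: with `%.40s`, `printf`-family functions read at most 40 bytes from the argument and do not require a terminating NUL. The helper `formatShardId` and the `shard-id=` prefix are illustrative only; `CLUSTER_NAMELEN` is assumed to be 40, as implied by the format specifier in the commit message.

```c
#include <stdio.h>
#include <stddef.h>

/* Assumed to be 40, matching the %.40s used in the commit message. */
#define CLUSTER_NAMELEN 40

/* Hypothetical helper: format a shard id that may fill the whole buffer
 * without a trailing NUL. The precision caps the read at 40 bytes, so
 * snprintf never looks past the end of shard_id. */
int formatShardId(char *out, size_t outlen, const char shard_id[CLUSTER_NAMELEN]) {
    return snprintf(out, outlen, "shard-id=%.40s", shard_id);
}
```

A plain `%s` on the same buffer would keep reading until it found a NUL byte, which may lie outside the array.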
…valkey-io#3941) valkey-io#2449 made the failover delay relative to cluster-node-timeout: delay = min(cluster-node-timeout / 30, 500). Any cluster-node-timeout below 30, including the legal minimum 0, collapses the delay to zero, and `x % 0` is undefined behaviour. Signed-off-by: Binbin <binloveplay1314@qq.com>
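The arithmetic behind this bug can be sketched as follows. This is not the actual fix from the commit: both function names are hypothetical, and the guard shown is just one way to avoid the zero modulus. Only the formula delay = min(cluster-node-timeout / 30, 500) comes from the commit message.

```c
/* Delay formula from the commit message: min(node_timeout / 30, 500).
 * Integer division means any timeout below 30 yields 0. */
long long failoverDelayMs(long long node_timeout_ms) {
    long long delay = node_timeout_ms / 30;
    if (delay > 500) delay = 500;
    return delay;
}

/* Hypothetical guarded use: pick an offset in [0, delay) from a
 * non-negative random value. Returning 0 when delay is 0 avoids
 * evaluating random_value % 0, which is undefined behaviour in C. */
long long jitterWithinDelay(long long random_value, long long delay) {
    if (delay <= 0) return 0;
    return random_value % delay;
}
```

For example, a timeout of 29 gives a delay of 0, so the guard returns 0 instead of evaluating the modulo.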
…thakaggarwal97/valkey into replication-streaming-compression-pr

# Conflicts:
#	src/compression_stream.c
#	src/compression_stream.h
#	src/config.c
#	src/rdb.c
#	src/server.h
#	src/unit/test_compression.cpp
#	tests/integration/rdb-compression.tcl
#	valkey.conf
…tate and plaintext passthrough Signed-off-by: Roshan Khatri <rvkhatri@amazon.com>
…val (fix macOS -Werror) Signed-off-by: Roshan Khatri <rvkhatri@amazon.com>
Signed-off-by: Roshan Khatri <rvkhatri@amazon.com>
Isolates the per-replica replication-stream compression work on top of the streaming-compression-rio (valkey-io#3531) base.
streaming-compression-rio-pr @ 0f15410 (the Streaming Compression support for RDB valkey-io/valkey#3531 commit this work is built on)
replication-streaming-compression-pr @ 5144e12
Review-only (both branches in this fork): shows just the replication-compression diff, separated from the valkey-io#3531 base.