Fork-based Per-type Object Memory Profiling #9

Closed
artikell wants to merge 1 commit into unstable from feature/rdb-save-stat-memory

artikell (Owner) commented May 22, 2026

The problem/use-case that the feature addresses

Production operators currently lack a precise, low-overhead mechanism to understand the memory composition of a Valkey instance at the object level. The existing toolset falls short:

  1. INFO memory only exposes aggregate metrics (used_memory, used_memory_rss, mem_fragmentation_ratio) with no breakdown by data type (string/list/set/zset/hash/stream).
  2. MEMORY USAGE <key> operates per-key. For instances with hundreds of millions of keys, scanning them all is prohibitively expensive: the scan is O(N) in the number of keys and blocks the main thread.
  3. Buffer visibility is fragmented — replication buffer/backlog, AOF buffer, and client output buffers are scattered across different INFO sections. There is no unified "memory profile" captured atomically in a single snapshot.
  4. Redis-era MEMORY STATS provides only coarse-grained overhead categories (dataset.percentage, overhead.total) without per-type exact allocation sizes.

Typical operational scenarios that remain difficult:

  • After a memory alert, quickly determining whether the spike is caused by hash-type bloat, repl-buffer accumulation, or client output buffers.
  • Capacity planning that requires per-type growth trend analysis across a fleet.
  • Automated, periodic memory profiling without per-key scanning overhead or external RDB parsing tools.

Description of the feature

Leverage the COW snapshot already created by fork() during BGSAVE to measure each key's exact memory footprint during RDB serialization, at near-zero additional cost to the parent process, then report a structured per-type summary back to the parent.

New subcommand:

BGSAVE STAT_MEMORY

This is mutually exclusive with SCHEDULE and CANCEL. When issued, the forked child process collects memory statistics during the normal RDB serialization pass and transmits them back via the existing childinfo pipe.
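As a rough illustration of the mutual exclusivity described above, argument handling could look like the sketch below. The enum and `parse_bgsave_args` are hypothetical names invented for this example, not identifiers from the patch; the only assumption carried over from the description is that at most one of SCHEDULE, CANCEL, or STAT_MEMORY may follow BGSAVE.

```c
#include <assert.h>
#include <strings.h>

/* Hypothetical result codes for this sketch; not names from the patch. */
typedef enum {
    BGSAVE_OPT_ERR = -1,
    BGSAVE_OPT_NONE = 0,
    BGSAVE_OPT_SCHEDULE,
    BGSAVE_OPT_CANCEL,
    BGSAVE_OPT_STAT_MEMORY
} bgsave_opt;

/* argv[0] is "BGSAVE". Accepting at most one option is what makes
 * STAT_MEMORY mutually exclusive with SCHEDULE and CANCEL here. */
bgsave_opt parse_bgsave_args(int argc, const char **argv) {
    assert(argc >= 1);
    if (argc == 1) return BGSAVE_OPT_NONE;
    if (argc > 2) return BGSAVE_OPT_ERR; /* two options at once: reject */
    if (!strcasecmp(argv[1], "schedule")) return BGSAVE_OPT_SCHEDULE;
    if (!strcasecmp(argv[1], "cancel")) return BGSAVE_OPT_CANCEL;
    if (!strcasecmp(argv[1], "stat_memory")) return BGSAVE_OPT_STAT_MEMORY;
    return BGSAVE_OPT_ERR; /* unknown option */
}
```

With this shape, `BGSAVE STAT_MEMORY SCHEDULE` is rejected as a syntax error instead of silently picking one of the two options.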

Statistics collected:

| Category | Fields | Description |
| --- | --- | --- |
| Per-type | count[type] / bytes[type] | Key count and exact allocated bytes per data type (string, list, set, zset, hash, stream) |
| KV data | mem_kv_user_data | Total user data bytes |
| KV data | mem_kv_expiration | Expires kvstore overhead |
| KV data | mem_kv_hash_metadata | Hash field-level TTL volatile metadata |
| User metadata | mem_um_kvstore | db->keys dictionary overhead (minus robj headers) |
| User metadata | mem_um_clients_io | NORMAL/PUBSUB/PRIMARY client buffers |
| System | mem_sys_aof_buffer | AOF write buffer |
| System | mem_sys_repl_buffer | Replication buffer (excluding backlog) |
| System | mem_sys_repl_backlog | Replication backlog + rax index |
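
For illustration only, the fields above could travel from the child to the parent as one fixed-size record. The struct layout and the `send_stats` / `recv_stats` helpers below are assumptions made for this sketch; the real childinfo message format in the patch may differ. A plain POSIX pipe stands in for the childinfo pipe.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <unistd.h>

#define STAT_NUM_TYPES 6 /* string, list, set, zset, hash, stream */

/* Hypothetical record mirroring the table above. */
typedef struct {
    size_t count[STAT_NUM_TYPES];
    size_t bytes[STAT_NUM_TYPES];
    size_t mem_kv_user_data;
    size_t mem_kv_expiration;
    size_t mem_kv_hash_metadata;
    size_t mem_um_kvstore;
    size_t mem_um_clients_io;
    size_t mem_sys_aof_buffer;
    size_t mem_sys_repl_buffer;
    size_t mem_sys_repl_backlog;
} rdb_mem_stats;

/* Child side: one write() of the whole record. The record is well under
 * PIPE_BUF (at least 512 bytes on POSIX), so the write is atomic. */
int send_stats(int fd, const rdb_mem_stats *s) {
    ssize_t n = write(fd, s, sizeof(*s));
    return n == (ssize_t)sizeof(*s) ? 0 : -1;
}

/* Parent side: accept only a complete record. */
int recv_stats(int fd, rdb_mem_stats *out) {
    ssize_t n = read(fd, out, sizeof(*out));
    return n == (ssize_t)sizeof(*out) ? 0 : -1;
}
```

A fixed-size record keeps the parent's read path trivial: it either receives a complete snapshot or discards the message.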

Output format (LL_NOTICE log lines after BGSAVE completes):

RDB object stats: total keys=1523456 mem_bytes=2147483648 | string=1200000/1073741824B hash=300000/859832320B zset=23456/213909504B
RDB memory breakdown: kv[user=2147483648B expire=12582912B hash_meta=0B] user_meta[kvstore=48234496B clients=8388608B] sys[aof_buf=4194304B repl_buf=67108864B repl_backlog=134217728B]
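
A minimal sketch of how the first log line could be assembled from per-type counters. `format_object_stats` and the `type_names` table are hypothetical; the sketch skips types with no keys and emits types in a fixed table order, which is an assumption and may not match the ordering the patch uses.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define NUM_TYPES 6

/* Display names in a fixed order chosen for this sketch. */
static const char *type_names[NUM_TYPES] = {
    "string", "list", "set", "zset", "hash", "stream"
};

/* Writes the "RDB object stats" line into buf.
 * Returns the line length, or -1 if buf is too small. */
int format_object_stats(char *buf, size_t len,
                        const unsigned long long count[NUM_TYPES],
                        const unsigned long long bytes[NUM_TYPES]) {
    unsigned long long total_keys = 0, total_bytes = 0;
    for (int i = 0; i < NUM_TYPES; i++) {
        total_keys += count[i];
        total_bytes += bytes[i];
    }
    int off = snprintf(buf, len,
                       "RDB object stats: total keys=%llu mem_bytes=%llu |",
                       total_keys, total_bytes);
    if (off < 0 || (size_t)off >= len) return -1;
    for (int i = 0; i < NUM_TYPES; i++) {
        if (count[i] == 0) continue; /* omit types with no keys */
        int n = snprintf(buf + off, len - (size_t)off, " %s=%llu/%lluB",
                         type_names[i], count[i], bytes[i]);
        if (n < 0 || (size_t)n >= len - (size_t)off) return -1;
        off += n;
    }
    return off;
}
```

Each `type=count/bytesB` pair stays on a single line, so a log scraper can split on spaces and then on `=` and `/`.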

Implementation approach:

  • A file-static function pointer (rdb_key_mem_stat_fn) in rdb.c serves as the statistics callback, registered only when RDBFLAGS_STAT_MEMORY is set in the rdbflags bitmask.
  • rdbSaveObject() invokes the callback with (obj_type, mem_used, count) per key using zmalloc_size() for exact allocator-level measurements.
  • Results are transmitted to the parent process via CHILD_INFO_TYPE_RDB_OBJECT_STATS over the childinfo pipe.
  • All measurement runs on the forked child's COW snapshot — lock-free, contention-free, zero impact on main-thread throughput.
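
The callback gating described above can be sketched as follows. Only `rdb_key_mem_stat_fn`, `RDBFLAGS_STAT_MEMORY`, and the `(obj_type, mem_used, count)` argument list come from this description; the flag's bit value, the accumulator arrays, and the helper functions are assumptions, and `save_object_sketch` merely stands in for the hook inside `rdbSaveObject()`.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_TYPES 6
#define RDBFLAGS_STAT_MEMORY (1 << 5) /* bit position is an assumption */

typedef void (*rdb_key_mem_stat_fn_t)(int obj_type, size_t mem_used,
                                      size_t count);

/* File-static callback: NULL unless statistics were requested. */
static rdb_key_mem_stat_fn_t rdb_key_mem_stat_fn = NULL;

static size_t stat_count[NUM_TYPES];
static size_t stat_bytes[NUM_TYPES];

/* Hypothetical accumulator registered as the callback. */
static void accumulate_stat(int obj_type, size_t mem_used, size_t count) {
    assert(obj_type >= 0 && obj_type < NUM_TYPES);
    stat_count[obj_type] += count;
    stat_bytes[obj_type] += mem_used;
}

/* Register the callback only when the flag is present in rdbflags. */
void rdb_stat_setup(int rdbflags) {
    rdb_key_mem_stat_fn =
        (rdbflags & RDBFLAGS_STAT_MEMORY) ? accumulate_stat : NULL;
}

/* Stand-in for the per-key hook; alloc_size plays the role of the
 * zmalloc_size() total measured for the object. */
void save_object_sketch(int obj_type, size_t alloc_size) {
    /* ... object serialization would happen here ... */
    if (rdb_key_mem_stat_fn) rdb_key_mem_stat_fn(obj_type, alloc_size, 1);
}

size_t rdb_stat_count(int obj_type) { return stat_count[obj_type]; }
size_t rdb_stat_bytes(int obj_type) { return stat_bytes[obj_type]; }
```

When the flag is absent, the per-key cost in this sketch is a single NULL check, which is consistent with the opt-in design.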

Alternatives you've considered

| Approach | Drawback |
| --- | --- |
| SCAN + MEMORY USAGE loop | O(N), blocks main thread, tens of seconds for millions of keys |
| DEBUG OBJECT + sampling | High sampling error; DEBUG commands are risky in production |
| External RDB parsing tools (rdb-tools) | Requires a full RDB file on disk; not real-time; lacks buffer info |
| Real-time counters updated on every write | High maintenance cost; degrades hot-path throughput |
| Redis MEMORY STATS / MEMORY DOCTOR | Only coarse-grained overhead categories; no per-type exact breakdown |

The proposed approach reuses the existing BGSAVE fork + full-scan path, so exact statistics come at near-zero additional cost to the parent process. The measured overhead in the child process is about 10% on 10M+ simple string keys, and lower with complex types, where I/O dominates.

Additional information

  • The feature is entirely opt-in per invocation (BGSAVE STAT_MEMORY). Normal BGSAVE, scheduled saves, and replication-triggered saves have zero overhead.
  • Memory measurement uses zmalloc_size(), which reports the size the allocator actually reserved for each allocation rather than the size requested, making it a closer proxy for RSS contribution.
  • Future extension: expose results in an INFO rdb_stats section for monitoring systems to scrape (current implementation is log-only).

@artikell artikell force-pushed the feature/rdb-save-stat-memory branch 2 times, most recently from dae4e3f to 3e59fc0 Compare May 28, 2026 11:52
@artikell artikell changed the title Add rdb-save-stat-memory option to log per-type object stats RFE: Optional in-process memory composition snapshot at RDB save time May 28, 2026
@artikell artikell changed the title RFE: Optional in-process memory composition snapshot at RDB save time Optional in-process memory composition snapshot at RDB save time May 28, 2026
@artikell artikell force-pushed the feature/rdb-save-stat-memory branch 3 times, most recently from b9bd312 to d5f549d Compare June 3, 2026 06:35
Introduce a new BGSAVE subcommand `STAT_MEMORY` that collects per-type
object memory statistics and a category breakdown during RDB save. The
implementation uses a file-static function pointer callback in rdb.c,
gated by the new RDBFLAGS_STAT_MEMORY bit passed through the rdbflags
chain.

Statistics collected in the forked child process:
- Per-type (string/list/set/zset/hash/stream): key count and exact bytes
- KV data: user data, expiration kvstore, hash field-level TTL metadata
- User metadata: kvstore overhead, client I/O buffers
- System: AOF buffer, replication buffer/backlog

Results are reported via childinfo pipe and logged at LL_NOTICE level
after BGSAVE completes. All measurement runs on the COW snapshot with
zero impact on main-thread throughput.
@artikell artikell force-pushed the feature/rdb-save-stat-memory branch from d5f549d to 8ae546b Compare June 3, 2026 06:57
@artikell artikell changed the title Optional in-process memory composition snapshot at RDB save time Fork-based Per-type Object Memory Profiling Jun 3, 2026
@artikell artikell closed this Jun 3, 2026