From 1e270d49fc0145361cbbff8ec887d140b2603923 Mon Sep 17 00:00:00 2001
From: Ran Shidlansik
Date: Tue, 5 Aug 2025 11:12:27 +0300
Subject: [PATCH 1/3] Introduce HASH items expiration
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Closes https://github.com/valkey-io/valkey/issues/640

This PR introduces support for **field-level expiration in Valkey hash types**, making it possible for individual fields inside a hash to expire independently, creating what we call **volatile fields**.

This is the first of 3 PRs. It focuses on the basic ability to set and modify hash field expiration, as well as persistence (AOF+RDB) and defrag.

[The second PR](https://github.com/ranshid/valkey/pull/5), which introduces the new algorithm (volatile-set) to track volatile hash fields, is in the last stages of review. The current implementation in this PR (in volatile_set.h/c) is just a stub implementation and will be replaced by [the second PR](https://github.com/ranshid/valkey/pull/5).

[The third PR](https://github.com/ranshid/valkey/pull/4/) introduces the active expiration and defragmentation jobs.

For more high-level design details, see the RFC PR: https://github.com/valkey-io/valkey-rfc/pull/22.

---

Some major high-level decisions taken as part of this work:

1. We decided to copy the existing Redis API in order to maintain compatibility with existing clients.
2. We decided to avoid introducing lazy-expiration at this point, in order to reduce complexity and rely only on active-expiration for memory reclamation. This will require us to keep improving the active expiration job and potentially consider introducing lazy-expiration support later on.
3. Although the commands that add expiration to hash fields affect memory utilization (by allocating more memory for the expiration time and metadata), we decided not to add DENYOOM to these commands (HSETEX is an exception), in order to be better aligned with key-level commands like `expire`.
4. Some hash type commands will produce unexpected results:
   - HLEN: will still reflect the number of fields that exist in the hash object (whether actually expired or not).
   - HRANDFIELD: in some cases we will not be able to randomly select a field that has not already expired. This happens in 2 cases: 1/ when we are asked to provide non-unique fields (i.e. a negative count); 2/ when the hash is much bigger than the count and we need to provide unique results. In both cases an empty response may be returned to the caller, even if the hash contains fields that are persistent or not yet expired.
5. When a field is given a zero (0) expiration time or an expiration time in the past, it is deleted immediately. We decided that, in order to be aligned with how top-level keys are handled, we will emit the hexpired keyspace event for that case (instead of hdel). For example, for the case:

   ```
   HSET myhash f1 v1
   > 0
   HGETEX myhash EX 0 FIELDS 1 f1
   > "v1"
   HTTL myhash FIELDS 1 f1
   > -2
   ```

   The reported events are:

   ```
   1) "psubscribe"
   2) "__keyevent@0__*"
   3) (integer) 1
   1) "pmessage"
   2) "__keyevent@0__*"
   3) "__keyevent@0__:hset"
   4) "myhash"
   1) "pmessage"
   2) "__keyevent@0__*"
   3) "__keyevent@0__:hexpired" <---------------- note this
   4) "myhash"
   1) "pmessage"
   2) "__keyevent@0__*"
   3) "__keyevent@0__:del"
   4) "myhash"
   ```
6. We will ALWAYS load hash fields during RDB load, including fields that have already expired. This means that when a primary reboots with an old snapshot, it will take time to reclaim all the expired fields. However, this simplifies the current logic and avoids a major refactoring that I suspect would otherwise be needed.

---

This PR also **modularizes and exposes the internal `hashTypeEntry` logic** as a new standalone `entry.c/h` module. This new abstraction handles all aspects of **field–value–expiry encoding** using multiple memory layouts optimized for performance and memory efficiency.

An `entry` is an abstraction that represents a single **field–value pair with optional expiration**. Internally, Valkey uses different memory layouts for compactness and efficiency, chosen dynamically based on size and encoding constraints.

The entry pointer is the field sds, which lets us use an entry just like any sds. We encode the entry layout type in the field SDS header. Field type SDS_TYPE_5 doesn't have any spare bits to encode this, so we use it only for the first layout type.

Entry with embedded value, used for small sizes. The value is stored as SDS_TYPE_8. The field can use any SDS type.

An entry can also have an expiration timestamp, which is the UNIX timestamp at which it expires. For aligned fast access, we keep the expiry timestamp before the start of the sds header.

    +----------------+--------------+---------------+
    | Expiration     | field        | value         |
    | 1234567890LL   | hdr "foo" \0 | hdr8 "bar" \0 |
    +-----------------------^-------+---------------+
                            |
                            |
                           entry pointer (points to field sds content)

Entry with value pointer, used for larger fields and values. The field is SDS type 8 or higher.

    +--------------+-------+--------------+
    | Expiration   | value | field        |
    | 1234567890LL | ptr   | hdr "foo" \0 |
    +--------------+--^----+------^-------+
                      |           |
                      |           |
                      |         entry pointer (points to field sds content)
                      |
                      value pointer = value sds

The `entry.c/h` API provides methods to:
- Create, read, write, and update field/value/expiration
- Set or clear expiration
- Check expiration state
- Clone or delete an entry

---

This PR introduces **new commands** and extends existing ones to support field expiration:

The proposed API is nearly identical to the API provided by Redis (7.4 and 8.0). This is intentional, to avoid breaking client applications that have already opted to use hash field TTLs.

**Synopsis**

```
HSETEX key [NX | XX] [FNX | FXX] [EX seconds | PX milliseconds | EXAT unix-time-seconds | PXAT unix-time-milliseconds | KEEPTTL] FIELDS numfields field value [field value ...]
```

Set the value of one or more fields of a given hash key, and optionally set their expiration time or time-to-live (TTL).

The `HSETEX` command supports the following set of options:

* `NX`: Only set the fields if the hash object does NOT exist.
* `XX`: Only set the fields if the hash object does exist.
* `FNX`: Only set the fields if none of them already exist.
* `FXX`: Only set the fields if all of them already exist.
* `EX seconds`: Set the specified expiration time in seconds.
* `PX milliseconds`: Set the specified expiration time in milliseconds.
* `EXAT unix-time-seconds`: Set the specified Unix time in seconds at which the fields will expire.
* `PXAT unix-time-milliseconds`: Set the specified Unix time in milliseconds at which the fields will expire.
* `KEEPTTL`: Retain the TTL associated with the fields.

The `EX`, `PX`, `EXAT`, `PXAT`, and `KEEPTTL` options are mutually exclusive.

**Synopsis**

```
HGETEX key [EX seconds | PX milliseconds | EXAT unix-time-seconds | PXAT unix-time-milliseconds | PERSIST] FIELDS numfields field [field ...]
```

Get the value of one or more fields of a given hash key and optionally set their expiration time or time-to-live (TTL).

The `HGETEX` command supports a set of options:

* `EX seconds`: Set the specified expiration time, in seconds.
* `PX milliseconds`: Set the specified expiration time, in milliseconds.
* `EXAT unix-time-seconds`: Set the specified Unix time at which the fields will expire, in seconds.
* `PXAT unix-time-milliseconds`: Set the specified Unix time at which the fields will expire, in milliseconds.
* `PERSIST`: Remove the TTL associated with the fields.

The `EX`, `PX`, `EXAT`, `PXAT`, and `PERSIST` options are mutually exclusive.

**Synopsis**

```
HEXPIRE key seconds [NX | XX | GT | LT] FIELDS numfields field [field ...]
```

Set an expiration (TTL or time to live) on one or more fields of a given hash key. You must specify at least one field. Field(s) will automatically be deleted from the hash key when their TTLs expire.

Field expirations will only be cleared by commands that delete or overwrite the contents of the hash fields, including `HDEL` and `HSET` commands. This means that all the operations that conceptually *alter* the value stored at a hash key's field without replacing it with a new one will leave the TTL untouched.

Specifying 0 for the `seconds` argument does not clear the TTL: as with calling `HEXPIRE`/`HPEXPIRE` with a time in the past, the hash field is deleted immediately. Use `HPERSIST` to remove a TTL without deleting the field.

The `HEXPIRE` command supports a set of options:

* `NX`: For each specified field, set expiration only when the field has no expiration.
* `XX`: For each specified field, set expiration only when the field has an existing expiration.
* `GT`: For each specified field, set expiration only when the new expiration is greater than the current one.
* `LT`: For each specified field, set expiration only when the new expiration is less than the current one.
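The four `HEXPIRE` conditions can be read as a small predicate over a field's current expiry and the requested one. The following is a minimal sketch of that reading, not code from this patch: `NO_EXPIRY`, `expire_cond`, and `should_set_expiry` are hypothetical names. The option list does not say how `GT`/`LT` treat a field with no TTL, so the sketch assumes the key-level `EXPIRE` convention, where a missing TTL counts as infinite.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical sentinel for "field has no TTL"; not taken from the patch. */
#define NO_EXPIRY (-1LL)

/* Hypothetical enum mirroring the NX | XX | GT | LT tokens. */
typedef enum { COND_NONE, COND_NX, COND_XX, COND_GT, COND_LT } expire_cond;

/* Returns true when the new absolute expiry (in ms) should be applied
 * to a field whose current absolute expiry is current_ms. */
bool should_set_expiry(long long current_ms, long long new_ms, expire_cond cond) {
    assert(new_ms >= 0);
    bool has_ttl = (current_ms != NO_EXPIRY);
    switch (cond) {
    case COND_NX: return !has_ttl;                      /* only if no expiration */
    case COND_XX: return has_ttl;                       /* only if an expiration exists */
    case COND_GT: return has_ttl && new_ms > current_ms;  /* assumption: no TTL == infinite */
    case COND_LT: return !has_ttl || new_ms < current_ms; /* assumption: no TTL == infinite */
    case COND_NONE: return true;                        /* unconditional */
    }
    return true;
}
```

Under these assumptions, `GT` never applies to a persistent field and `LT` always does; if the actual implementation chooses otherwise, only the two marked branches change.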
**Synopsis**

```
HEXPIREAT key unix-time-seconds [NX | XX | GT | LT] FIELDS numfields field [field ...]
```

`HEXPIREAT` has the same effect and semantics as `HEXPIRE`, but instead of specifying the number of seconds for the TTL (time to live), it takes an absolute Unix timestamp in seconds since Unix epoch. A timestamp in the past will delete the field immediately.

The `HEXPIREAT` command supports a set of options:

* `NX`: For each specified field, set expiration only when the field has no expiration.
* `XX`: For each specified field, set expiration only when the field has an existing expiration.
* `GT`: For each specified field, set expiration only when the new expiration is greater than the current one.
* `LT`: For each specified field, set expiration only when the new expiration is less than the current one.

**Synopsis**

```
HPEXPIRE key milliseconds [NX | XX | GT | LT] FIELDS numfields field [field ...]
```

This command works like `HEXPIRE`, but the expiration of a field is specified in milliseconds instead of seconds.

The `HPEXPIRE` command supports a set of options:

* `NX`: For each specified field, set expiration only when the field has no expiration.
* `XX`: For each specified field, set expiration only when the field has an existing expiration.
* `GT`: For each specified field, set expiration only when the new expiration is greater than the current one.
* `LT`: For each specified field, set expiration only when the new expiration is less than the current one.

**Synopsis**

```
HPEXPIREAT key unix-time-milliseconds [NX | XX | GT | LT] FIELDS numfields field [field ...]
```

`HPEXPIREAT` has the same effect and semantics as `HEXPIREAT`, but the Unix time at which the field will expire is specified in milliseconds since Unix epoch instead of seconds.

**Synopsis**

```
HPERSIST key FIELDS numfields field [field ...]
```

Remove the existing expiration on a hash key's field(s), turning the field(s) from *volatile* (a field with expiration set) to *persistent* (a field that will never expire as no TTL (time to live) is associated).

**Synopsis**

```
HSETEX key [NX] seconds field value [field value ...]
```

Similar to `HSET`, but adds one or more hash fields that expire after the specified number of seconds. By default, this command overwrites the values and expirations of specified fields that exist in the hash. If the `NX` option is specified, the field data will not be overwritten. If `key` doesn't exist, a new Hash key is created.

The `HSETEX` command supports a set of options:

* `NX`: For each specified field, set expiration only when the field has no expiration.

**Synopsis**

```
HTTL key FIELDS numfields field [field ...]
```

Returns the **remaining** TTL (time to live) of a hash key's field(s) that have a set expiration. This introspection capability allows you to check how many seconds a given hash field will continue to be part of the hash key.

**Synopsis**

```
HPTTL key FIELDS numfields field [field ...]
```

Like `HTTL`, this command returns the remaining TTL (time to live) of a field that has an expiration set, but in milliseconds instead of seconds.

**Synopsis**

```
HEXPIRETIME key FIELDS numfields field [field ...]
```

Returns the absolute Unix timestamp in seconds since Unix epoch at which the given key's field(s) will expire.

**Synopsis**

```
HPEXPIRETIME key FIELDS numfields field [field ...]
```

`HPEXPIRETIME` has the same semantics as `HEXPIRETIME`, but returns the absolute Unix expiration timestamp in milliseconds since Unix epoch instead of seconds.
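The relationship between the stored absolute expiry and the per-field reply of `HTTL`/`HPTTL` can be sketched as below. This is an illustration under stated assumptions, not the patch's code: `field_ttl_reply` and the constants are hypothetical names. The `-2` reply for a missing field matches the `HTTL` example in this description; the `-1` reply for a field without a TTL and the round-to-nearest conversion to seconds are assumptions borrowed from the key-level `TTL` command.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names; -2 is seen in this description's HTTL example,
 * -1 is assumed by analogy with the key-level TTL command. */
#define TTL_FIELD_MISSING (-2LL)
#define TTL_NO_EXPIRY (-1LL)
#define EXPIRY_NONE (-1LL) /* sentinel: field has no stored expiry */

/* Computes the per-field reply for HTTL (in_seconds = true) or
 * HPTTL (in_seconds = false) from an absolute expiry in ms. */
long long field_ttl_reply(bool field_exists, long long expiry_ms,
                          long long now_ms, bool in_seconds) {
    assert(now_ms >= 0);
    if (!field_exists) return TTL_FIELD_MISSING;
    if (expiry_ms == EXPIRY_NONE) return TTL_NO_EXPIRY;
    long long remaining = expiry_ms - now_ms;
    if (remaining < 0) remaining = 0; /* expired but not yet reclaimed */
    /* Assumption: round to the nearest second, as key-level TTL does. */
    return in_seconds ? (remaining + 500) / 1000 : remaining;
}
```

`HEXPIRETIME`/`HPEXPIRETIME` would skip the subtraction and report the stored absolute timestamp, converted to seconds where needed.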
This PR introduces new notification events to support field-level expiration:

| Event      | Trigger                                  |
|------------|------------------------------------------|
| `hexpire`  | Field expiration was set                 |
| `hexpired` | Field was deleted due to expiration      |
| `hpersist` | Expiration was removed from a field      |
| `del`      | Key was deleted after all fields expired |

Note that we diverge from Redis in the cases where we emit the hexpired event. For example, given the following use case:

```
HSET myhash f1 v1
(integer) 0
HGETEX myhash EX 0 FIELDS 1 f1
1) "v1"
HTTL myhash FIELDS 1 f1
1) (integer) -2
```

Regarding the keyspace notifications, Redis reports:

```
1) "psubscribe"
2) "__keyevent@0__:*"
3) (integer) 1
1) "pmessage"
2) "__keyevent@0__:*"
3) "__keyevent@0__:hset"
4) "myhash2"
1) "pmessage"
2) "__keyevent@0__:*"
3) "__keyevent@0__:hdel" <---------------- note this
4) "myhash2"
1) "pmessage"
2) "__keyevent@0__:*"
3) "__keyevent@0__:del"
4) "myhash2"
```

However, in our current suggestion, Valkey will emit:

```
1) "psubscribe"
2) "__keyevent@0__*"
3) (integer) 1
1) "pmessage"
2) "__keyevent@0__*"
3) "__keyevent@0__:hset"
4) "myhash"
1) "pmessage"
2) "__keyevent@0__*"
3) "__keyevent@0__:hexpired" <---------------- note this
4) "myhash"
1) "pmessage"
2) "__keyevent@0__*"
3) "__keyevent@0__:del"
4) "myhash"
```

---

- Expiration-aware commands (`HSETEX`, `HGETEX`, etc.) are **not propagated as-is**.
- Instead, Valkey rewrites them into equivalent commands such as:
  - `HDEL` (for expired fields)
  - `HPEXPIREAT` (for setting absolute expiration)
  - `HPERSIST` (for removing expiration)

This ensures compatibility with replication and AOF while maintaining consistent field-level expiry behavior.
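The rewrite rule above can be sketched as a small decision function: convert whatever expiration option the client sent into an absolute millisecond timestamp, then pick the command to propagate. This is a simplified illustration, not the code in `t_hash.c`: `ttl_option`, `to_absolute_ms`, and `propagated_command` are hypothetical names, overflow handling is omitted, and treating an expiry equal to the current time as already expired is an assumption based on the `EX 0` example in this description.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical enum for the expiration option supplied by the client. */
typedef enum { OPT_EX, OPT_PX, OPT_EXAT, OPT_PXAT, OPT_PERSIST } ttl_option;

/* Converts a relative or absolute option value to absolute milliseconds.
 * Returns -1 for options that carry no timestamp. No overflow checks. */
long long to_absolute_ms(ttl_option opt, long long value, long long now_ms) {
    switch (opt) {
    case OPT_EX: return now_ms + value * 1000;
    case OPT_PX: return now_ms + value;
    case OPT_EXAT: return value * 1000;
    case OPT_PXAT: return value;
    default: return -1;
    }
}

/* Picks the command name to propagate to replicas and the AOF.
 * When abs_ms_out is non-NULL it receives the absolute expiry. */
const char *propagated_command(ttl_option opt, long long value,
                               long long now_ms, long long *abs_ms_out) {
    assert(now_ms >= 0);
    if (opt == OPT_PERSIST) return "HPERSIST";
    long long abs_ms = to_absolute_ms(opt, value, now_ms);
    if (abs_ms_out != NULL) *abs_ms_out = abs_ms;
    /* Assumption: zero TTL or a past timestamp deletes the field. */
    if (abs_ms <= now_ms) return "HDEL";
    return "HPEXPIREAT";
}
```

Propagating an absolute timestamp rather than the original relative TTL means a replica or an AOF replay computes the same expiry regardless of when it processes the command.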
---

| Command Name | QPS Standard | QPS HFE | QPS Diff % | Latency Standard (ms) | Latency HFE (ms) | Latency Diff % |
|--------------|--------------|---------|------------|-----------------------|------------------|----------------|
| **One Large Hash Table** |
| HGET | 137988.12 | 138484.97 | +0.36% | 0.951 | 0.949 | -0.21% |
| HSET | 138561.73 | 137343.77 | -0.87% | 0.948 | 0.956 | +0.84% |
| HEXISTS | 139431.12 | 138677.02 | -0.54% | 0.942 | 0.946 | +0.42% |
| HDEL | 140114.89 | 138966.09 | -0.81% | 0.938 | 0.945 | +0.74% |
| **Many Hash Tables (100 fields)** |
| HGET | 136798.91 | 137419.27 | +0.45% | 0.959 | 0.956 | -0.31% |
| HEXISTS | 138946.78 | 139645.31 | +0.50% | 0.946 | 0.941 | -0.52% |
| HGETALL | 42194.09 | 42016.80 | -0.42% | 0.621 | 0.625 | +0.64% |
| HSET | 137230.69 | 137249.53 | +0.01% | 0.959 | 0.958 | -0.10% |
| HDEL | 138985.41 | 138619.34 | -0.26% | 0.948 | 0.949 | +0.10% |
| **Many Hash Tables (1000 fields)** |
| HGET | 135795.77 | 139256.36 | +2.54% | 0.965 | 0.943 | -2.27% |
| HEXISTS | 138121.55 | 137950.06 | -0.12% | 0.951 | 0.952 | +0.10% |
| HGETALL | 5885.81 | 5633.80 | **-4.28%** | 2.690 | 2.841 | **+5.61%** |
| HSET | 137005.08 | 137400.39 | +0.28% | 0.959 | 0.955 | -0.41% |
| HDEL | 138293.45 | 137381.52 | -0.65% | 0.948 | 0.955 | +0.73% |

- [ ] Consider extending HSETEX with extra arguments (NX/XX) so that it is possible to prevent adding/setting/mutating fields of a non-existent hash.
- [ ] Avoid loading expired fields when a non-preamble RDB is being loaded on a primary. This is an optimization to avoid loading unnecessary (expired) fields. It would also require us to propagate the HDEL to the replicas in case of RDBFLAGS_FEED_REPL. Note that it might require some refactoring: 1/ propagate the rdbflags and current time to rdbLoadObject; 2/ consider the case of restore and check_rdb etc. For this reason I would like to avoid this optimization for the first drop.
Signed-off-by: Ran Shidlansik --- cmake/Modules/SourceFiles.cmake | 5 +- src/Makefile | 2 +- src/anet.c | 2 - src/aof.c | 32 +- src/commands.def | 425 +++++ src/commands/hexpire.json | 118 ++ src/commands/hexpireat.json | 120 ++ src/commands/hexpiretime.json | 85 + src/commands/hgetex.json | 118 ++ src/commands/hpersist.json | 84 + src/commands/hpexpire.json | 120 ++ src/commands/hpexpireat.json | 120 ++ src/commands/hpexpiretime.json | 85 + src/commands/hpttl.json | 85 + src/commands/hsetex.json | 135 ++ src/commands/httl.json | 85 + src/db.c | 81 +- src/defrag.c | 26 +- src/entry.c | 410 +++++ src/entry.h | 94 ++ src/expire.c | 103 +- src/expire.h | 47 + src/hashtable.c | 27 +- src/hashtable.h | 6 + src/module.c | 8 +- src/monotonic.h | 2 + src/object.c | 9 +- src/rdb.c | 56 +- src/rdb.h | 50 +- src/server.c | 153 +- src/server.h | 80 +- src/serverassert.h | 4 + src/t_hash.c | 1406 ++++++++++++---- src/t_string.c | 181 +-- src/unit/test_entry.c | 471 ++++++ src/unit/test_files.h | 7 + src/util.c | 2 - src/util.h | 11 + src/valkey-check-rdb.c | 3 + src/volatile_set.c | 79 + src/volatile_set.h | 40 + tests/unit/hashexpire.tcl | 2639 +++++++++++++++++++++++++++++++ 42 files changed, 6943 insertions(+), 673 deletions(-) create mode 100644 src/commands/hexpire.json create mode 100644 src/commands/hexpireat.json create mode 100644 src/commands/hexpiretime.json create mode 100644 src/commands/hgetex.json create mode 100644 src/commands/hpersist.json create mode 100644 src/commands/hpexpire.json create mode 100644 src/commands/hpexpireat.json create mode 100644 src/commands/hpexpiretime.json create mode 100644 src/commands/hpttl.json create mode 100644 src/commands/hsetex.json create mode 100644 src/commands/httl.json create mode 100644 src/entry.c create mode 100644 src/entry.h create mode 100644 src/expire.h create mode 100644 src/unit/test_entry.c create mode 100644 src/volatile_set.c create mode 100644 src/volatile_set.h create mode 100644 tests/unit/hashexpire.tcl 
diff --git a/cmake/Modules/SourceFiles.cmake b/cmake/Modules/SourceFiles.cmake index 0e484e5179b..da23f9f880b 100644 --- a/cmake/Modules/SourceFiles.cmake +++ b/cmake/Modules/SourceFiles.cmake @@ -117,7 +117,10 @@ set(VALKEY_SERVER_SRCS ${CMAKE_SOURCE_DIR}/src/connection.c ${CMAKE_SOURCE_DIR}/src/unix.c ${CMAKE_SOURCE_DIR}/src/server.c - ${CMAKE_SOURCE_DIR}/src/logreqres.c) + ${CMAKE_SOURCE_DIR}/src/logreqres.c + ${CMAKE_SOURCE_DIR}/src/entry.c + ${CMAKE_SOURCE_DIR}/src/volatile_set.c) + # valkey-cli set(VALKEY_CLI_SRCS diff --git a/src/Makefile b/src/Makefile index 53911687014..2f4f360bf2b 100644 --- a/src/Makefile +++ b/src/Makefile @@ -423,7 +423,7 @@ ENGINE_NAME=valkey SERVER_NAME=$(ENGINE_NAME)-server$(PROG_SUFFIX) ENGINE_SENTINEL_NAME=$(ENGINE_NAME)-sentinel$(PROG_SUFFIX) ENGINE_TRACE_OBJ=trace/trace.o trace/trace_commands.o trace/trace_db.o trace/trace_cluster.o trace/trace_server.o trace/trace_rdb.o trace/trace_aof.o -ENGINE_SERVER_OBJ=threads_mngr.o adlist.o vector.o quicklist.o ae.o anet.o dict.o hashtable.o kvstore.o server.o sds.o zmalloc.o lzf_c.o lzf_d.o pqsort.o zipmap.o sha1.o ziplist.o release.o memory_prefetch.o io_threads.o networking.o util.o object.o db.o replication.o rdb.o t_string.o t_list.o t_set.o t_zset.o t_hash.o config.o aof.o pubsub.o multi.o debug.o sort.o intset.o syncio.o cluster.o cluster_legacy.o cluster_slot_stats.o crc16.o endianconv.o commandlog.o eval.o bio.o rio.o rand.o memtest.o syscheck.o crcspeed.o crccombine.o crc64.o bitops.o sentinel.o notify.o setproctitle.o blocked.o hyperloglog.o latency.o sparkline.o valkey-check-rdb.o valkey-check-aof.o geo.o lazyfree.o module.o evict.o expire.o geohash.o geohash_helper.o childinfo.o allocator_defrag.o defrag.o siphash.o rax.o t_stream.o listpack.o localtime.o lolwut.o lolwut5.o lolwut6.o acl.o tracking.o socket.o tls.o sha256.o timeout.o setcpuaffinity.o monotonic.o mt19937-64.o resp_parser.o call_reply.o script.o functions.o commands.o strl.o connection.o unix.o logreqres.o 
rdma.o scripting_engine.o lua/script_lua.o lua/function_lua.o lua/engine_lua.o lua/debug_lua.o +ENGINE_SERVER_OBJ=threads_mngr.o adlist.o vector.o quicklist.o ae.o anet.o dict.o hashtable.o kvstore.o server.o sds.o zmalloc.o lzf_c.o lzf_d.o pqsort.o zipmap.o sha1.o ziplist.o release.o memory_prefetch.o io_threads.o networking.o util.o object.o db.o replication.o rdb.o t_string.o t_list.o t_set.o t_zset.o t_hash.o config.o aof.o pubsub.o multi.o debug.o sort.o intset.o syncio.o cluster.o cluster_legacy.o cluster_slot_stats.o crc16.o endianconv.o commandlog.o eval.o bio.o rio.o rand.o memtest.o syscheck.o crcspeed.o crccombine.o crc64.o bitops.o sentinel.o notify.o setproctitle.o blocked.o hyperloglog.o latency.o sparkline.o valkey-check-rdb.o valkey-check-aof.o geo.o lazyfree.o module.o evict.o expire.o geohash.o geohash_helper.o childinfo.o allocator_defrag.o defrag.o siphash.o rax.o t_stream.o listpack.o localtime.o lolwut.o lolwut5.o lolwut6.o acl.o tracking.o socket.o tls.o sha256.o timeout.o setcpuaffinity.o monotonic.o mt19937-64.o resp_parser.o call_reply.o script.o functions.o commands.o strl.o connection.o unix.o logreqres.o rdma.o scripting_engine.o entry.o volatile_set.o lua/script_lua.o lua/function_lua.o lua/engine_lua.o lua/debug_lua.o ENGINE_SERVER_OBJ+=$(ENGINE_TRACE_OBJ) ENGINE_CLI_NAME=$(ENGINE_NAME)-cli$(PROG_SUFFIX) ENGINE_CLI_OBJ=anet.o adlist.o dict.o valkey-cli.o zmalloc.o release.o ae.o serverassert.o crcspeed.o crccombine.o crc64.o siphash.o crc16.o monotonic.o cli_common.o mt19937-64.o strl.o cli_commands.o sds.o util.o sha256.o diff --git a/src/anet.c b/src/anet.c index 8bc1626966c..5524e9cf4c1 100644 --- a/src/anet.c +++ b/src/anet.c @@ -52,8 +52,6 @@ #include "util.h" #include "serverassert.h" -#define UNUSED(x) (void)(x) - static void anetSetError(char *err, const char *fmt, ...) 
{ va_list ap; diff --git a/src/aof.c b/src/aof.c index 9b72aff0f53..567acdf60cf 100644 --- a/src/aof.c +++ b/src/aof.c @@ -1955,12 +1955,32 @@ static int rioWriteHashIteratorCursor(rio *r, hashTypeIterator *hi, int what) { * The function returns 0 on error, 1 on success. */ int rewriteHashObject(rio *r, robj *key, robj *o) { hashTypeIterator hi; - long long count = 0, items = hashTypeLength(o); - + long long count = 0, volatile_items = 0, non_volatile_items; + /* First serialize volatile items if exist */ + if (hashTypeHasVolatileElements(o)) { + hashTypeInitVolatileIterator(o, &hi); + while (hashTypeNext(&hi) != C_ERR) { + long long expiry = entryGetExpiry(hi.next); + sds field = entryGetField(hi.next); + sds value = entryGetValue(hi.next); + if (rioWriteBulkCount(r, '*', 8) == 0) return 0; + if (rioWriteBulkString(r, "HSETEX", 6) == 0) return 0; + if (rioWriteBulkObject(r, key) == 0) return 0; + if (rioWriteBulkString(r, "PXAT", 4) == 0) return 0; + if (rioWriteBulkLongLong(r, expiry) == 0) return 0; + if (rioWriteBulkString(r, "FIELDS", 6) == 0) return 0; + if (rioWriteBulkLongLong(r, 1) == 0) return 0; + if (rioWriteBulkString(r, field, sdslen(field)) == 0) return 0; + if (rioWriteBulkString(r, value, sdslen(value)) == 0) return 0; + volatile_items++; + } + hashTypeResetIterator(&hi); + } + non_volatile_items = hashTypeLength(o) - volatile_items; hashTypeInitIterator(o, &hi); while (hashTypeNext(&hi) != C_ERR) { if (count == 0) { - int cmd_items = (items > AOF_REWRITE_ITEMS_PER_CMD) ? AOF_REWRITE_ITEMS_PER_CMD : items; + int cmd_items = (non_volatile_items > AOF_REWRITE_ITEMS_PER_CMD) ? 
AOF_REWRITE_ITEMS_PER_CMD : non_volatile_items; if (!rioWriteBulkCount(r, '*', 2 + cmd_items * 2) || !rioWriteBulkString(r, "HMSET", 5) || !rioWriteBulkObject(r, key)) { @@ -1969,16 +1989,18 @@ int rewriteHashObject(rio *r, robj *key, robj *o) { } } + if (volatile_items > 0 && entryHasExpiry(hi.next)) + continue; + if (!rioWriteHashIteratorCursor(r, &hi, OBJ_HASH_FIELD) || !rioWriteHashIteratorCursor(r, &hi, OBJ_HASH_VALUE)) { hashTypeResetIterator(&hi); return 0; } if (++count == AOF_REWRITE_ITEMS_PER_CMD) count = 0; - items--; + non_volatile_items--; } hashTypeResetIterator(&hi); - return 1; } diff --git a/src/commands.def b/src/commands.def index 689b08be475..71d5a114736 100644 --- a/src/commands.def +++ b/src/commands.def @@ -3564,6 +3564,119 @@ struct COMMAND_ARG HEXISTS_Args[] = { {MAKE_ARG("field",ARG_TYPE_STRING,-1,NULL,NULL,NULL,CMD_ARG_NONE,0,NULL)}, }; +/********** HEXPIRE ********************/ + +#ifndef SKIP_CMD_HISTORY_TABLE +/* HEXPIRE history */ +#define HEXPIRE_History NULL +#endif + +#ifndef SKIP_CMD_TIPS_TABLE +/* HEXPIRE tips */ +#define HEXPIRE_Tips NULL +#endif + +#ifndef SKIP_CMD_KEY_SPECS_TABLE +/* HEXPIRE key specs */ +keySpec HEXPIRE_Keyspecs[1] = { +{NULL,CMD_KEY_RW|CMD_KEY_UPDATE,KSPEC_BS_INDEX,.bs.index={1},KSPEC_FK_RANGE,.fk.range={0,1,0}} +}; +#endif + +/* HEXPIRE condition argument table */ +struct COMMAND_ARG HEXPIRE_condition_Subargs[] = { +{MAKE_ARG("nx",ARG_TYPE_PURE_TOKEN,-1,"NX",NULL,NULL,CMD_ARG_NONE,0,NULL)}, +{MAKE_ARG("xx",ARG_TYPE_PURE_TOKEN,-1,"XX",NULL,NULL,CMD_ARG_NONE,0,NULL)}, +{MAKE_ARG("gt",ARG_TYPE_PURE_TOKEN,-1,"GT",NULL,NULL,CMD_ARG_NONE,0,NULL)}, +{MAKE_ARG("lt",ARG_TYPE_PURE_TOKEN,-1,"LT",NULL,NULL,CMD_ARG_NONE,0,NULL)}, +}; + +/* HEXPIRE fields argument table */ +struct COMMAND_ARG HEXPIRE_fields_Subargs[] = { +{MAKE_ARG("numfields",ARG_TYPE_INTEGER,-1,NULL,NULL,NULL,CMD_ARG_NONE,0,NULL)}, +{MAKE_ARG("field",ARG_TYPE_STRING,-1,NULL,NULL,NULL,CMD_ARG_MULTIPLE,0,NULL)}, +}; + +/* HEXPIRE argument table */ 
+struct COMMAND_ARG HEXPIRE_Args[] = { +{MAKE_ARG("key",ARG_TYPE_KEY,0,NULL,NULL,NULL,CMD_ARG_NONE,0,NULL)}, +{MAKE_ARG("seconds",ARG_TYPE_INTEGER,-1,NULL,NULL,NULL,CMD_ARG_NONE,0,NULL)}, +{MAKE_ARG("condition",ARG_TYPE_ONEOF,-1,NULL,NULL,NULL,CMD_ARG_OPTIONAL,4,NULL),.subargs=HEXPIRE_condition_Subargs}, +{MAKE_ARG("fields",ARG_TYPE_BLOCK,-1,"FIELDS",NULL,NULL,CMD_ARG_NONE,2,NULL),.subargs=HEXPIRE_fields_Subargs}, +}; + +/********** HEXPIREAT ********************/ + +#ifndef SKIP_CMD_HISTORY_TABLE +/* HEXPIREAT history */ +#define HEXPIREAT_History NULL +#endif + +#ifndef SKIP_CMD_TIPS_TABLE +/* HEXPIREAT tips */ +#define HEXPIREAT_Tips NULL +#endif + +#ifndef SKIP_CMD_KEY_SPECS_TABLE +/* HEXPIREAT key specs */ +keySpec HEXPIREAT_Keyspecs[1] = { +{NULL,CMD_KEY_RW|CMD_KEY_UPDATE,KSPEC_BS_INDEX,.bs.index={1},KSPEC_FK_RANGE,.fk.range={0,1,0}} +}; +#endif + +/* HEXPIREAT condition argument table */ +struct COMMAND_ARG HEXPIREAT_condition_Subargs[] = { +{MAKE_ARG("nx",ARG_TYPE_PURE_TOKEN,-1,"NX",NULL,NULL,CMD_ARG_NONE,0,NULL)}, +{MAKE_ARG("xx",ARG_TYPE_PURE_TOKEN,-1,"XX",NULL,NULL,CMD_ARG_NONE,0,NULL)}, +{MAKE_ARG("gt",ARG_TYPE_PURE_TOKEN,-1,"GT",NULL,NULL,CMD_ARG_NONE,0,NULL)}, +{MAKE_ARG("lt",ARG_TYPE_PURE_TOKEN,-1,"LT",NULL,NULL,CMD_ARG_NONE,0,NULL)}, +}; + +/* HEXPIREAT fields argument table */ +struct COMMAND_ARG HEXPIREAT_fields_Subargs[] = { +{MAKE_ARG("numfields",ARG_TYPE_INTEGER,0,NULL,NULL,NULL,CMD_ARG_NONE,0,NULL)}, +{MAKE_ARG("field",ARG_TYPE_STRING,-1,NULL,NULL,NULL,CMD_ARG_MULTIPLE,0,NULL)}, +}; + +/* HEXPIREAT argument table */ +struct COMMAND_ARG HEXPIREAT_Args[] = { +{MAKE_ARG("key",ARG_TYPE_KEY,0,NULL,NULL,NULL,CMD_ARG_NONE,0,NULL)}, +{MAKE_ARG("unix-time-seconds",ARG_TYPE_INTEGER,-1,NULL,NULL,NULL,CMD_ARG_NONE,0,NULL)}, +{MAKE_ARG("condition",ARG_TYPE_ONEOF,-1,NULL,NULL,"9.0.0",CMD_ARG_OPTIONAL,4,NULL),.subargs=HEXPIREAT_condition_Subargs}, 
+{MAKE_ARG("fields",ARG_TYPE_BLOCK,-1,"FIELDS",NULL,NULL,CMD_ARG_NONE,2,NULL),.subargs=HEXPIREAT_fields_Subargs}, +}; + +/********** HEXPIRETIME ********************/ + +#ifndef SKIP_CMD_HISTORY_TABLE +/* HEXPIRETIME history */ +#define HEXPIRETIME_History NULL +#endif + +#ifndef SKIP_CMD_TIPS_TABLE +/* HEXPIRETIME tips */ +#define HEXPIRETIME_Tips NULL +#endif + +#ifndef SKIP_CMD_KEY_SPECS_TABLE +/* HEXPIRETIME key specs */ +keySpec HEXPIRETIME_Keyspecs[1] = { +{NULL,CMD_KEY_RO|CMD_KEY_ACCESS,KSPEC_BS_INDEX,.bs.index={1},KSPEC_FK_RANGE,.fk.range={0,1,0}} +}; +#endif + +/* HEXPIRETIME fields argument table */ +struct COMMAND_ARG HEXPIRETIME_fields_Subargs[] = { +{MAKE_ARG("numfields",ARG_TYPE_INTEGER,0,NULL,NULL,NULL,CMD_ARG_NONE,0,NULL)}, +{MAKE_ARG("field",ARG_TYPE_STRING,-1,NULL,NULL,NULL,CMD_ARG_MULTIPLE,0,NULL)}, +}; + +/* HEXPIRETIME argument table */ +struct COMMAND_ARG HEXPIRETIME_Args[] = { +{MAKE_ARG("key",ARG_TYPE_KEY,0,NULL,NULL,NULL,CMD_ARG_NONE,0,NULL)}, +{MAKE_ARG("fields",ARG_TYPE_BLOCK,-1,"FIELDS",NULL,NULL,CMD_ARG_NONE,2,NULL),.subargs=HEXPIRETIME_fields_Subargs}, +}; + /********** HGET ********************/ #ifndef SKIP_CMD_HISTORY_TABLE @@ -3615,6 +3728,47 @@ struct COMMAND_ARG HGETALL_Args[] = { {MAKE_ARG("key",ARG_TYPE_KEY,0,NULL,NULL,NULL,CMD_ARG_NONE,0,NULL)}, }; +/********** HGETEX ********************/ + +#ifndef SKIP_CMD_HISTORY_TABLE +/* HGETEX history */ +#define HGETEX_History NULL +#endif + +#ifndef SKIP_CMD_TIPS_TABLE +/* HGETEX tips */ +#define HGETEX_Tips NULL +#endif + +#ifndef SKIP_CMD_KEY_SPECS_TABLE +/* HGETEX key specs */ +keySpec HGETEX_Keyspecs[1] = { +{NULL,CMD_KEY_RW|CMD_KEY_ACCESS,KSPEC_BS_INDEX,.bs.index={1},KSPEC_FK_RANGE,.fk.range={0,1,0}} +}; +#endif + +/* HGETEX expiration argument table */ +struct COMMAND_ARG HGETEX_expiration_Subargs[] = { +{MAKE_ARG("seconds",ARG_TYPE_INTEGER,-1,"EX",NULL,NULL,CMD_ARG_NONE,0,NULL)}, +{MAKE_ARG("milliseconds",ARG_TYPE_INTEGER,-1,"PX",NULL,NULL,CMD_ARG_NONE,0,NULL)}, 
+{MAKE_ARG("unix-time-seconds",ARG_TYPE_UNIX_TIME,-1,"EXAT",NULL,NULL,CMD_ARG_NONE,0,NULL)}, +{MAKE_ARG("unix-time-milliseconds",ARG_TYPE_UNIX_TIME,-1,"PXAT",NULL,NULL,CMD_ARG_NONE,0,NULL)}, +{MAKE_ARG("persist",ARG_TYPE_PURE_TOKEN,-1,"PERSIST",NULL,NULL,CMD_ARG_NONE,0,NULL)}, +}; + +/* HGETEX fields argument table */ +struct COMMAND_ARG HGETEX_fields_Subargs[] = { +{MAKE_ARG("numfields",ARG_TYPE_INTEGER,0,NULL,NULL,NULL,CMD_ARG_NONE,0,NULL)}, +{MAKE_ARG("field",ARG_TYPE_STRING,-1,NULL,NULL,NULL,CMD_ARG_MULTIPLE,0,NULL)}, +}; + +/* HGETEX argument table */ +struct COMMAND_ARG HGETEX_Args[] = { +{MAKE_ARG("key",ARG_TYPE_KEY,0,NULL,NULL,NULL,CMD_ARG_NONE,0,NULL)}, +{MAKE_ARG("expiration",ARG_TYPE_ONEOF,-1,NULL,NULL,NULL,CMD_ARG_OPTIONAL,5,NULL),.subargs=HGETEX_expiration_Subargs}, +{MAKE_ARG("fields",ARG_TYPE_BLOCK,-1,"FIELDS",NULL,NULL,CMD_ARG_NONE,2,NULL),.subargs=HGETEX_fields_Subargs}, +}; + /********** HINCRBY ********************/ #ifndef SKIP_CMD_HISTORY_TABLE @@ -3773,6 +3927,181 @@ struct COMMAND_ARG HMSET_Args[] = { {MAKE_ARG("data",ARG_TYPE_BLOCK,-1,NULL,NULL,NULL,CMD_ARG_MULTIPLE,2,NULL),.subargs=HMSET_data_Subargs}, }; +/********** HPERSIST ********************/ + +#ifndef SKIP_CMD_HISTORY_TABLE +/* HPERSIST history */ +#define HPERSIST_History NULL +#endif + +#ifndef SKIP_CMD_TIPS_TABLE +/* HPERSIST tips */ +#define HPERSIST_Tips NULL +#endif + +#ifndef SKIP_CMD_KEY_SPECS_TABLE +/* HPERSIST key specs */ +keySpec HPERSIST_Keyspecs[1] = { +{NULL,CMD_KEY_RW|CMD_KEY_UPDATE,KSPEC_BS_INDEX,.bs.index={1},KSPEC_FK_RANGE,.fk.range={0,1,0}} +}; +#endif + +/* HPERSIST fields argument table */ +struct COMMAND_ARG HPERSIST_fields_Subargs[] = { +{MAKE_ARG("numfields",ARG_TYPE_INTEGER,0,NULL,NULL,NULL,CMD_ARG_NONE,0,NULL)}, +{MAKE_ARG("field",ARG_TYPE_STRING,-1,NULL,NULL,NULL,CMD_ARG_MULTIPLE,0,NULL)}, +}; + +/* HPERSIST argument table */ +struct COMMAND_ARG HPERSIST_Args[] = { +{MAKE_ARG("key",ARG_TYPE_KEY,0,NULL,NULL,NULL,CMD_ARG_NONE,0,NULL)}, 
+{MAKE_ARG("fields",ARG_TYPE_BLOCK,-1,"FIELDS",NULL,NULL,CMD_ARG_NONE,2,NULL),.subargs=HPERSIST_fields_Subargs}, +}; + +/********** HPEXPIRE ********************/ + +#ifndef SKIP_CMD_HISTORY_TABLE +/* HPEXPIRE history */ +#define HPEXPIRE_History NULL +#endif + +#ifndef SKIP_CMD_TIPS_TABLE +/* HPEXPIRE tips */ +#define HPEXPIRE_Tips NULL +#endif + +#ifndef SKIP_CMD_KEY_SPECS_TABLE +/* HPEXPIRE key specs */ +keySpec HPEXPIRE_Keyspecs[1] = { +{NULL,CMD_KEY_RW|CMD_KEY_UPDATE,KSPEC_BS_INDEX,.bs.index={1},KSPEC_FK_RANGE,.fk.range={0,1,0}} +}; +#endif + +/* HPEXPIRE condition argument table */ +struct COMMAND_ARG HPEXPIRE_condition_Subargs[] = { +{MAKE_ARG("nx",ARG_TYPE_PURE_TOKEN,-1,"NX",NULL,NULL,CMD_ARG_NONE,0,NULL)}, +{MAKE_ARG("xx",ARG_TYPE_PURE_TOKEN,-1,"XX",NULL,NULL,CMD_ARG_NONE,0,NULL)}, +{MAKE_ARG("gt",ARG_TYPE_PURE_TOKEN,-1,"GT",NULL,NULL,CMD_ARG_NONE,0,NULL)}, +{MAKE_ARG("lt",ARG_TYPE_PURE_TOKEN,-1,"LT",NULL,NULL,CMD_ARG_NONE,0,NULL)}, +}; + +/* HPEXPIRE fields argument table */ +struct COMMAND_ARG HPEXPIRE_fields_Subargs[] = { +{MAKE_ARG("numfields",ARG_TYPE_INTEGER,0,NULL,NULL,NULL,CMD_ARG_NONE,0,NULL)}, +{MAKE_ARG("field",ARG_TYPE_STRING,-1,NULL,NULL,NULL,CMD_ARG_MULTIPLE,0,NULL)}, +}; + +/* HPEXPIRE argument table */ +struct COMMAND_ARG HPEXPIRE_Args[] = { +{MAKE_ARG("key",ARG_TYPE_KEY,0,NULL,NULL,NULL,CMD_ARG_NONE,0,NULL)}, +{MAKE_ARG("milliseconds",ARG_TYPE_INTEGER,-1,NULL,NULL,NULL,CMD_ARG_NONE,0,NULL)}, +{MAKE_ARG("condition",ARG_TYPE_ONEOF,-1,NULL,NULL,"9.0.0",CMD_ARG_OPTIONAL,4,NULL),.subargs=HPEXPIRE_condition_Subargs}, +{MAKE_ARG("fields",ARG_TYPE_BLOCK,-1,"FIELDS",NULL,NULL,CMD_ARG_NONE,2,NULL),.subargs=HPEXPIRE_fields_Subargs}, +}; + +/********** HPEXPIREAT ********************/ + +#ifndef SKIP_CMD_HISTORY_TABLE +/* HPEXPIREAT history */ +#define HPEXPIREAT_History NULL +#endif + +#ifndef SKIP_CMD_TIPS_TABLE +/* HPEXPIREAT tips */ +#define HPEXPIREAT_Tips NULL +#endif + +#ifndef SKIP_CMD_KEY_SPECS_TABLE +/* HPEXPIREAT key specs */ +keySpec 
HPEXPIREAT_Keyspecs[1] = { +{NULL,CMD_KEY_RW|CMD_KEY_UPDATE,KSPEC_BS_INDEX,.bs.index={1},KSPEC_FK_RANGE,.fk.range={0,1,0}} +}; +#endif + +/* HPEXPIREAT condition argument table */ +struct COMMAND_ARG HPEXPIREAT_condition_Subargs[] = { +{MAKE_ARG("nx",ARG_TYPE_PURE_TOKEN,-1,"NX",NULL,NULL,CMD_ARG_NONE,0,NULL)}, +{MAKE_ARG("xx",ARG_TYPE_PURE_TOKEN,-1,"XX",NULL,NULL,CMD_ARG_NONE,0,NULL)}, +{MAKE_ARG("gt",ARG_TYPE_PURE_TOKEN,-1,"GT",NULL,NULL,CMD_ARG_NONE,0,NULL)}, +{MAKE_ARG("lt",ARG_TYPE_PURE_TOKEN,-1,"LT",NULL,NULL,CMD_ARG_NONE,0,NULL)}, +}; + +/* HPEXPIREAT fields argument table */ +struct COMMAND_ARG HPEXPIREAT_fields_Subargs[] = { +{MAKE_ARG("numfields",ARG_TYPE_INTEGER,0,NULL,NULL,NULL,CMD_ARG_NONE,0,NULL)}, +{MAKE_ARG("field",ARG_TYPE_STRING,-1,NULL,NULL,NULL,CMD_ARG_MULTIPLE,0,NULL)}, +}; + +/* HPEXPIREAT argument table */ +struct COMMAND_ARG HPEXPIREAT_Args[] = { +{MAKE_ARG("key",ARG_TYPE_KEY,0,NULL,NULL,NULL,CMD_ARG_NONE,0,NULL)}, +{MAKE_ARG("unix-time-milliseconds",ARG_TYPE_INTEGER,-1,NULL,NULL,NULL,CMD_ARG_NONE,0,NULL)}, +{MAKE_ARG("condition",ARG_TYPE_ONEOF,-1,NULL,NULL,"9.0.0",CMD_ARG_OPTIONAL,4,NULL),.subargs=HPEXPIREAT_condition_Subargs}, +{MAKE_ARG("fields",ARG_TYPE_BLOCK,-1,"FIELDS",NULL,NULL,CMD_ARG_NONE,2,NULL),.subargs=HPEXPIREAT_fields_Subargs}, +}; + +/********** HPEXPIRETIME ********************/ + +#ifndef SKIP_CMD_HISTORY_TABLE +/* HPEXPIRETIME history */ +#define HPEXPIRETIME_History NULL +#endif + +#ifndef SKIP_CMD_TIPS_TABLE +/* HPEXPIRETIME tips */ +#define HPEXPIRETIME_Tips NULL +#endif + +#ifndef SKIP_CMD_KEY_SPECS_TABLE +/* HPEXPIRETIME key specs */ +keySpec HPEXPIRETIME_Keyspecs[1] = { +{NULL,CMD_KEY_RO|CMD_KEY_ACCESS,KSPEC_BS_INDEX,.bs.index={1},KSPEC_FK_RANGE,.fk.range={0,1,0}} +}; +#endif + +/* HPEXPIRETIME fields argument table */ +struct COMMAND_ARG HPEXPIRETIME_fields_Subargs[] = { +{MAKE_ARG("numfields",ARG_TYPE_INTEGER,0,NULL,NULL,NULL,CMD_ARG_NONE,0,NULL)}, 
+{MAKE_ARG("field",ARG_TYPE_STRING,-1,NULL,NULL,NULL,CMD_ARG_MULTIPLE,0,NULL)}, +}; + +/* HPEXPIRETIME argument table */ +struct COMMAND_ARG HPEXPIRETIME_Args[] = { +{MAKE_ARG("key",ARG_TYPE_KEY,0,NULL,NULL,NULL,CMD_ARG_NONE,0,NULL)}, +{MAKE_ARG("fields",ARG_TYPE_BLOCK,-1,"FIELDS",NULL,NULL,CMD_ARG_NONE,2,NULL),.subargs=HPEXPIRETIME_fields_Subargs}, +}; + +/********** HPTTL ********************/ + +#ifndef SKIP_CMD_HISTORY_TABLE +/* HPTTL history */ +#define HPTTL_History NULL +#endif + +#ifndef SKIP_CMD_TIPS_TABLE +/* HPTTL tips */ +#define HPTTL_Tips NULL +#endif + +#ifndef SKIP_CMD_KEY_SPECS_TABLE +/* HPTTL key specs */ +keySpec HPTTL_Keyspecs[1] = { +{NULL,CMD_KEY_RO|CMD_KEY_ACCESS,KSPEC_BS_INDEX,.bs.index={1},KSPEC_FK_RANGE,.fk.range={0,1,0}} +}; +#endif + +/* HPTTL fields argument table */ +struct COMMAND_ARG HPTTL_fields_Subargs[] = { +{MAKE_ARG("numfields",ARG_TYPE_INTEGER,0,NULL,NULL,NULL,CMD_ARG_NONE,0,NULL)}, +{MAKE_ARG("field",ARG_TYPE_STRING,-1,NULL,NULL,NULL,CMD_ARG_MULTIPLE,0,NULL)}, +}; + +/* HPTTL argument table */ +struct COMMAND_ARG HPTTL_Args[] = { +{MAKE_ARG("key",ARG_TYPE_KEY,0,NULL,NULL,NULL,CMD_ARG_NONE,0,NULL)}, +{MAKE_ARG("fields",ARG_TYPE_BLOCK,-1,"FIELDS",NULL,NULL,CMD_ARG_NONE,2,NULL),.subargs=HPTTL_fields_Subargs}, +}; + /********** HRANDFIELD ********************/ #ifndef SKIP_CMD_HISTORY_TABLE @@ -3869,6 +4198,60 @@ struct COMMAND_ARG HSET_Args[] = { {MAKE_ARG("data",ARG_TYPE_BLOCK,-1,NULL,NULL,NULL,CMD_ARG_MULTIPLE,2,NULL),.subargs=HSET_data_Subargs}, }; +/********** HSETEX ********************/ + +#ifndef SKIP_CMD_HISTORY_TABLE +/* HSETEX history */ +#define HSETEX_History NULL +#endif + +#ifndef SKIP_CMD_TIPS_TABLE +/* HSETEX tips */ +#define HSETEX_Tips NULL +#endif + +#ifndef SKIP_CMD_KEY_SPECS_TABLE +/* HSETEX key specs */ +keySpec HSETEX_Keyspecs[1] = { +{NULL,CMD_KEY_RW|CMD_KEY_INSERT,KSPEC_BS_INDEX,.bs.index={1},KSPEC_FK_RANGE,.fk.range={0,1,0}} +}; +#endif + +/* HSETEX fields_condition argument table */ +struct COMMAND_ARG 
HSETEX_fields_condition_Subargs[] = { +{MAKE_ARG("fnx",ARG_TYPE_PURE_TOKEN,-1,"FNX",NULL,NULL,CMD_ARG_NONE,0,NULL)}, +{MAKE_ARG("fxx",ARG_TYPE_PURE_TOKEN,-1,"FXX",NULL,NULL,CMD_ARG_NONE,0,NULL)}, +}; + +/* HSETEX expiration argument table */ +struct COMMAND_ARG HSETEX_expiration_Subargs[] = { +{MAKE_ARG("seconds",ARG_TYPE_INTEGER,-1,"EX",NULL,NULL,CMD_ARG_NONE,0,NULL)}, +{MAKE_ARG("milliseconds",ARG_TYPE_INTEGER,-1,"PX",NULL,NULL,CMD_ARG_NONE,0,NULL)}, +{MAKE_ARG("unix-time-seconds",ARG_TYPE_UNIX_TIME,-1,"EXAT",NULL,NULL,CMD_ARG_NONE,0,NULL)}, +{MAKE_ARG("unix-time-milliseconds",ARG_TYPE_UNIX_TIME,-1,"PXAT",NULL,NULL,CMD_ARG_NONE,0,NULL)}, +{MAKE_ARG("keepttl",ARG_TYPE_PURE_TOKEN,-1,"KEEPTTL",NULL,NULL,CMD_ARG_NONE,0,NULL)}, +}; + +/* HSETEX fields data argument table */ +struct COMMAND_ARG HSETEX_fields_data_Subargs[] = { +{MAKE_ARG("field",ARG_TYPE_STRING,-1,NULL,NULL,NULL,CMD_ARG_NONE,0,NULL)}, +{MAKE_ARG("value",ARG_TYPE_STRING,-1,NULL,NULL,NULL,CMD_ARG_NONE,0,NULL)}, +}; + +/* HSETEX fields argument table */ +struct COMMAND_ARG HSETEX_fields_Subargs[] = { +{MAKE_ARG("numfields",ARG_TYPE_INTEGER,0,NULL,NULL,NULL,CMD_ARG_NONE,0,NULL)}, +{MAKE_ARG("data",ARG_TYPE_BLOCK,-1,NULL,NULL,NULL,CMD_ARG_MULTIPLE,2,NULL),.subargs=HSETEX_fields_data_Subargs}, +}; + +/* HSETEX argument table */ +struct COMMAND_ARG HSETEX_Args[] = { +{MAKE_ARG("key",ARG_TYPE_KEY,0,NULL,NULL,NULL,CMD_ARG_NONE,0,NULL)}, +{MAKE_ARG("fields-condition",ARG_TYPE_ONEOF,-1,NULL,NULL,NULL,CMD_ARG_OPTIONAL,2,NULL),.subargs=HSETEX_fields_condition_Subargs}, +{MAKE_ARG("expiration",ARG_TYPE_ONEOF,-1,NULL,NULL,NULL,CMD_ARG_OPTIONAL,5,NULL),.subargs=HSETEX_expiration_Subargs}, +{MAKE_ARG("fields",ARG_TYPE_BLOCK,-1,"FIELDS",NULL,NULL,CMD_ARG_NONE,2,NULL),.subargs=HSETEX_fields_Subargs}, +}; + /********** HSETNX ********************/ #ifndef SKIP_CMD_HISTORY_TABLE @@ -3920,6 +4303,37 @@ struct COMMAND_ARG HSTRLEN_Args[] = { {MAKE_ARG("field",ARG_TYPE_STRING,-1,NULL,NULL,NULL,CMD_ARG_NONE,0,NULL)}, }; 
+/********** HTTL ********************/ + +#ifndef SKIP_CMD_HISTORY_TABLE +/* HTTL history */ +#define HTTL_History NULL +#endif + +#ifndef SKIP_CMD_TIPS_TABLE +/* HTTL tips */ +#define HTTL_Tips NULL +#endif + +#ifndef SKIP_CMD_KEY_SPECS_TABLE +/* HTTL key specs */ +keySpec HTTL_Keyspecs[1] = { +{NULL,CMD_KEY_RO|CMD_KEY_ACCESS,KSPEC_BS_INDEX,.bs.index={1},KSPEC_FK_RANGE,.fk.range={0,1,0}} +}; +#endif + +/* HTTL fields argument table */ +struct COMMAND_ARG HTTL_fields_Subargs[] = { +{MAKE_ARG("numfields",ARG_TYPE_INTEGER,0,NULL,NULL,NULL,CMD_ARG_NONE,0,NULL)}, +{MAKE_ARG("field",ARG_TYPE_STRING,-1,NULL,NULL,NULL,CMD_ARG_MULTIPLE,0,NULL)}, +}; + +/* HTTL argument table */ +struct COMMAND_ARG HTTL_Args[] = { +{MAKE_ARG("key",ARG_TYPE_KEY,0,NULL,NULL,NULL,CMD_ARG_NONE,0,NULL)}, +{MAKE_ARG("fields",ARG_TYPE_BLOCK,-1,"FIELDS",NULL,NULL,CMD_ARG_NONE,2,NULL),.subargs=HTTL_fields_Subargs}, +}; + /********** HVALS ********************/ #ifndef SKIP_CMD_HISTORY_TABLE @@ -11278,19 +11692,30 @@ struct COMMAND_STRUCT serverCommandTable[] = { /* hash */ {MAKE_CMD("hdel","Deletes one or more fields and their values from a hash. 
Deletes the hash if no fields remain.","O(N) where N is the number of fields to be removed.","2.0.0",CMD_DOC_NONE,NULL,NULL,"hash",COMMAND_GROUP_HASH,HDEL_History,1,HDEL_Tips,0,hdelCommand,-3,CMD_WRITE|CMD_FAST,ACL_CATEGORY_HASH,HDEL_Keyspecs,1,NULL,2),.args=HDEL_Args}, {MAKE_CMD("hexists","Determines whether a field exists in a hash.","O(1)","2.0.0",CMD_DOC_NONE,NULL,NULL,"hash",COMMAND_GROUP_HASH,HEXISTS_History,0,HEXISTS_Tips,0,hexistsCommand,3,CMD_READONLY|CMD_FAST,ACL_CATEGORY_HASH,HEXISTS_Keyspecs,1,NULL,2),.args=HEXISTS_Args}, +{MAKE_CMD("hexpire","Set expiry time on hash fields.","O(N) where N is the number of specified fields.","9.0.0",CMD_DOC_NONE,NULL,NULL,"hash",COMMAND_GROUP_HASH,HEXPIRE_History,0,HEXPIRE_Tips,0,hexpireCommand,-6,CMD_WRITE|CMD_FAST,ACL_CATEGORY_HASH,HEXPIRE_Keyspecs,1,NULL,4),.args=HEXPIRE_Args}, +{MAKE_CMD("hexpireat","Set expiry time on hash fields.","O(N) where N is the number of specified fields.","9.0.0",CMD_DOC_NONE,NULL,NULL,"hash",COMMAND_GROUP_HASH,HEXPIREAT_History,0,HEXPIREAT_Tips,0,hexpireatCommand,-6,CMD_WRITE|CMD_FAST,ACL_CATEGORY_HASH,HEXPIREAT_Keyspecs,1,NULL,4),.args=HEXPIREAT_Args}, +{MAKE_CMD("hexpiretime","Returns Unix timestamps in seconds since the epoch at which the given key's field(s) will expire","O(1) for each field, so O(N) for N items when the command is called with multiple fields.","9.0.0",CMD_DOC_NONE,NULL,NULL,"hash",COMMAND_GROUP_HASH,HEXPIRETIME_History,0,HEXPIRETIME_Tips,0,hexpiretimeCommand,-5,CMD_READONLY|CMD_FAST,ACL_CATEGORY_HASH,HEXPIRETIME_Keyspecs,1,NULL,2),.args=HEXPIRETIME_Args}, {MAKE_CMD("hget","Returns the value of a field in a hash.","O(1)","2.0.0",CMD_DOC_NONE,NULL,NULL,"hash",COMMAND_GROUP_HASH,HGET_History,0,HGET_Tips,0,hgetCommand,3,CMD_READONLY|CMD_FAST,ACL_CATEGORY_HASH,HGET_Keyspecs,1,NULL,2),.args=HGET_Args}, {MAKE_CMD("hgetall","Returns all fields and values in a hash.","O(N) where N is the size of the 
hash.","2.0.0",CMD_DOC_NONE,NULL,NULL,"hash",COMMAND_GROUP_HASH,HGETALL_History,0,HGETALL_Tips,1,hgetallCommand,2,CMD_READONLY,ACL_CATEGORY_HASH,HGETALL_Keyspecs,1,NULL,1),.args=HGETALL_Args}, +{MAKE_CMD("hgetex","Get the value of one or more fields of a given hash key, and optionally set their expiration time or time-to-live (TTL).","O(1)","9.0.0",CMD_DOC_NONE,NULL,NULL,"hash",COMMAND_GROUP_HASH,HGETEX_History,0,HGETEX_Tips,0,hgetexCommand,-5,CMD_WRITE|CMD_FAST,ACL_CATEGORY_HASH,HGETEX_Keyspecs,1,NULL,3),.args=HGETEX_Args}, {MAKE_CMD("hincrby","Increments the integer value of a field in a hash by a number. Uses 0 as initial value if the field doesn't exist.","O(1)","2.0.0",CMD_DOC_NONE,NULL,NULL,"hash",COMMAND_GROUP_HASH,HINCRBY_History,0,HINCRBY_Tips,0,hincrbyCommand,4,CMD_WRITE|CMD_DENYOOM|CMD_FAST,ACL_CATEGORY_HASH,HINCRBY_Keyspecs,1,NULL,3),.args=HINCRBY_Args}, {MAKE_CMD("hincrbyfloat","Increments the floating point value of a field by a number. Uses 0 as initial value if the field doesn't exist.","O(1)","2.6.0",CMD_DOC_NONE,NULL,NULL,"hash",COMMAND_GROUP_HASH,HINCRBYFLOAT_History,0,HINCRBYFLOAT_Tips,0,hincrbyfloatCommand,4,CMD_WRITE|CMD_DENYOOM|CMD_FAST,ACL_CATEGORY_HASH,HINCRBYFLOAT_Keyspecs,1,NULL,3),.args=HINCRBYFLOAT_Args}, {MAKE_CMD("hkeys","Returns all fields in a hash.","O(N) where N is the size of the hash.","2.0.0",CMD_DOC_NONE,NULL,NULL,"hash",COMMAND_GROUP_HASH,HKEYS_History,0,HKEYS_Tips,1,hkeysCommand,2,CMD_READONLY,ACL_CATEGORY_HASH,HKEYS_Keyspecs,1,NULL,1),.args=HKEYS_Args}, {MAKE_CMD("hlen","Returns the number of fields in a hash.","O(1)","2.0.0",CMD_DOC_NONE,NULL,NULL,"hash",COMMAND_GROUP_HASH,HLEN_History,0,HLEN_Tips,0,hlenCommand,2,CMD_READONLY|CMD_FAST,ACL_CATEGORY_HASH,HLEN_Keyspecs,1,NULL,1),.args=HLEN_Args}, {MAKE_CMD("hmget","Returns the values of all fields in a hash.","O(N) where N is the number of fields being 
requested.","2.0.0",CMD_DOC_NONE,NULL,NULL,"hash",COMMAND_GROUP_HASH,HMGET_History,0,HMGET_Tips,0,hmgetCommand,-3,CMD_READONLY|CMD_FAST,ACL_CATEGORY_HASH,HMGET_Keyspecs,1,NULL,2),.args=HMGET_Args}, {MAKE_CMD("hmset","Sets the values of multiple fields.","O(N) where N is the number of fields being set.","2.0.0",CMD_DOC_DEPRECATED,"`HSET` with multiple field-value pairs","4.0.0","hash",COMMAND_GROUP_HASH,HMSET_History,0,HMSET_Tips,0,hsetCommand,-4,CMD_WRITE|CMD_DENYOOM|CMD_FAST,ACL_CATEGORY_HASH,HMSET_Keyspecs,1,NULL,2),.args=HMSET_Args}, +{MAKE_CMD("hpersist","Remove the existing expiration on a hash key's field(s).","O(1) for each field assigned with TTL, so O(N) to persist N items when the command is called with multiple fields.","9.0.0",CMD_DOC_NONE,NULL,NULL,"hash",COMMAND_GROUP_HASH,HPERSIST_History,0,HPERSIST_Tips,0,hpersistCommand,-5,CMD_WRITE|CMD_FAST,ACL_CATEGORY_HASH,HPERSIST_Keyspecs,1,NULL,2),.args=HPERSIST_Args}, +{MAKE_CMD("hpexpire","Set expiry time on hash object.","O(N) where N is the number of specified fields.","9.0.0",CMD_DOC_NONE,NULL,NULL,"hash",COMMAND_GROUP_HASH,HPEXPIRE_History,0,HPEXPIRE_Tips,0,hpexpireCommand,-6,CMD_WRITE|CMD_FAST,ACL_CATEGORY_HASH,HPEXPIRE_Keyspecs,1,NULL,4),.args=HPEXPIRE_Args}, +{MAKE_CMD("hpexpireat","Set expiration time on hash field.","O(N) where N is the number of specified fields.","9.0.0",CMD_DOC_NONE,NULL,NULL,"hash",COMMAND_GROUP_HASH,HPEXPIREAT_History,0,HPEXPIREAT_Tips,0,hpexpireatCommand,-6,CMD_WRITE|CMD_FAST,ACL_CATEGORY_HASH,HPEXPIREAT_Keyspecs,1,NULL,4),.args=HPEXPIREAT_Args}, +{MAKE_CMD("hpexpiretime","Returns the Unix timestamp in milliseconds since Unix epoch at which the given key's field(s) will expire","O(1) for each field, so O(N) for N items when the command is called with multiple 
fields.","9.0.0",CMD_DOC_NONE,NULL,NULL,"hash",COMMAND_GROUP_HASH,HPEXPIRETIME_History,0,HPEXPIRETIME_Tips,0,hpexpiretimeCommand,-5,CMD_READONLY|CMD_FAST,ACL_CATEGORY_HASH,HPEXPIRETIME_Keyspecs,1,NULL,2),.args=HPEXPIRETIME_Args}, +{MAKE_CMD("hpttl","Returns the remaining time to live in milliseconds of a hash key's field(s) that have an associated expiration.","O(1) for each field assigned with TTL, so O(N) for N items when the command is called with multiple fields.","9.0.0",CMD_DOC_NONE,NULL,NULL,"hash",COMMAND_GROUP_HASH,HPTTL_History,0,HPTTL_Tips,0,hpttlCommand,-5,CMD_READONLY|CMD_FAST,ACL_CATEGORY_HASH,HPTTL_Keyspecs,1,NULL,2),.args=HPTTL_Args}, {MAKE_CMD("hrandfield","Returns one or more random fields from a hash.","O(N) where N is the number of fields returned","6.2.0",CMD_DOC_NONE,NULL,NULL,"hash",COMMAND_GROUP_HASH,HRANDFIELD_History,0,HRANDFIELD_Tips,1,hrandfieldCommand,-2,CMD_READONLY,ACL_CATEGORY_HASH,HRANDFIELD_Keyspecs,1,NULL,2),.args=HRANDFIELD_Args}, {MAKE_CMD("hscan","Iterates over fields and values of a hash.","O(1) for every call. O(N) for a complete iteration, including enough command calls for the cursor to return back to 0. 
N is the number of elements inside the collection.","2.8.0",CMD_DOC_NONE,NULL,NULL,"hash",COMMAND_GROUP_HASH,HSCAN_History,0,HSCAN_Tips,1,hscanCommand,-3,CMD_READONLY,ACL_CATEGORY_HASH,HSCAN_Keyspecs,1,NULL,5),.args=HSCAN_Args}, {MAKE_CMD("hset","Creates or modifies the value of a field in a hash.","O(1) for each field/value pair added, so O(N) to add N field/value pairs when the command is called with multiple field/value pairs.","2.0.0",CMD_DOC_NONE,NULL,NULL,"hash",COMMAND_GROUP_HASH,HSET_History,1,HSET_Tips,0,hsetCommand,-4,CMD_WRITE|CMD_DENYOOM|CMD_FAST,ACL_CATEGORY_HASH,HSET_Keyspecs,1,NULL,2),.args=HSET_Args}, +{MAKE_CMD("hsetex","Set the value of one or more fields of a given hash key, and optionally set their expiration time.","O(1)","9.0.0",CMD_DOC_NONE,NULL,NULL,"hash",COMMAND_GROUP_HASH,HSETEX_History,0,HSETEX_Tips,0,hsetexCommand,-6,CMD_WRITE|CMD_DENYOOM|CMD_FAST,ACL_CATEGORY_HASH,HSETEX_Keyspecs,1,NULL,4),.args=HSETEX_Args}, {MAKE_CMD("hsetnx","Sets the value of a field in a hash only when the field doesn't exist.","O(1)","2.0.0",CMD_DOC_NONE,NULL,NULL,"hash",COMMAND_GROUP_HASH,HSETNX_History,0,HSETNX_Tips,0,hsetnxCommand,4,CMD_WRITE|CMD_DENYOOM|CMD_FAST,ACL_CATEGORY_HASH,HSETNX_Keyspecs,1,NULL,3),.args=HSETNX_Args}, {MAKE_CMD("hstrlen","Returns the length of the value of a field.","O(1)","3.2.0",CMD_DOC_NONE,NULL,NULL,"hash",COMMAND_GROUP_HASH,HSTRLEN_History,0,HSTRLEN_Tips,0,hstrlenCommand,3,CMD_READONLY|CMD_FAST,ACL_CATEGORY_HASH,HSTRLEN_Keyspecs,1,NULL,2),.args=HSTRLEN_Args}, +{MAKE_CMD("httl","Returns the remaining time to live (in seconds) of a hash key's field(s) that have an associated expiration.","O(1) for each field, so O(N) for N items when the command is called with multiple fields.","9.0.0",CMD_DOC_NONE,NULL,NULL,"hash",COMMAND_GROUP_HASH,HTTL_History,0,HTTL_Tips,0,httlCommand,-5,CMD_READONLY|CMD_FAST,ACL_CATEGORY_HASH,HTTL_Keyspecs,1,NULL,2),.args=HTTL_Args}, {MAKE_CMD("hvals","Returns all values in a hash.","O(N) where N is the size of 
the hash.","2.0.0",CMD_DOC_NONE,NULL,NULL,"hash",COMMAND_GROUP_HASH,HVALS_History,0,HVALS_Tips,1,hvalsCommand,2,CMD_READONLY,ACL_CATEGORY_HASH,HVALS_Keyspecs,1,NULL,1),.args=HVALS_Args}, /* hyperloglog */ {MAKE_CMD("pfadd","Adds elements to a HyperLogLog key. Creates the key if it doesn't exist.","O(1) to add every element.","2.8.9",CMD_DOC_NONE,NULL,NULL,"hyperloglog",COMMAND_GROUP_HYPERLOGLOG,PFADD_History,0,PFADD_Tips,0,pfaddCommand,-2,CMD_WRITE|CMD_DENYOOM|CMD_FAST,ACL_CATEGORY_HYPERLOGLOG,PFADD_Keyspecs,1,NULL,2),.args=PFADD_Args}, diff --git a/src/commands/hexpire.json b/src/commands/hexpire.json new file mode 100644 index 00000000000..338fe53dd4f --- /dev/null +++ b/src/commands/hexpire.json @@ -0,0 +1,118 @@ +{ + "HEXPIRE": { + "summary": "Set expiry time on hash fields.", + "complexity": "O(N) where N is the number of specified fields.", + "group": "hash", + "since": "9.0.0", + "arity": -6, + "function": "hexpireCommand", + "command_flags": [ + "WRITE", + "FAST" + ], + "acl_categories": [ + "HASH" + ], + "key_specs": [ + { + "flags": [ + "RW", + "UPDATE" + ], + "begin_search": { + "index": { + "pos": 1 + } + }, + "find_keys": { + "range": { + "lastkey": 0, + "step": 1, + "limit": 0 + } + } + } + ], + "reply_schema": { + "description": "List of integer codes indicating the result of setting expiry on each specified field, in the same order as the fields are requested.", + "type": "array", + "minItems": 1, + "items": { + "oneOf": [ + { + "description": "Field does not exist in the HASH, or key does not exist.", + "const": -2 + }, + { + "description": "The specified NX | XX | GT | LT condition has not been met.", + "const": 0 + }, + { + "description": "The expiration time was applied.", + "const": 1 + }, + { + "description": "When called with a 0 second", + "const": 2 + } + ] + } + }, + "arguments": [ + { + "name": "key", + "type": "key", + "key_spec_index": 0 + }, + { + "name": "seconds", + "type": "integer" + }, + { + "name": "condition", + "type": "oneof", 
+ "optional": true, + "arguments": [ + { + "name": "nx", + "type": "pure-token", + "token": "NX" + }, + { + "name": "xx", + "type": "pure-token", + "token": "XX" + }, + { + "name": "gt", + "type": "pure-token", + "token": "GT" + }, + { + "name": "lt", + "type": "pure-token", + "token": "LT" + } + ] + }, + { + "name": "fields", + "token": "FIELDS", + "type": "block", + "arguments": [ + { + "name": "numfields", + "type": "integer", + "multiple": false, + "minimum": 1 + }, + { + "name": "field", + "type": "string", + "multiple": true + } + ] + } + ] + } +} diff --git a/src/commands/hexpireat.json b/src/commands/hexpireat.json new file mode 100644 index 00000000000..995391f0e6d --- /dev/null +++ b/src/commands/hexpireat.json @@ -0,0 +1,120 @@ +{ + "HEXPIREAT": { + "summary": "Set expiry time on hash fields.", + "complexity": "O(N) where N is the number of specified fields.", + "group": "hash", + "since": "9.0.0", + "arity": -6, + "function": "hexpireatCommand", + "command_flags": [ + "WRITE", + "FAST" + ], + "acl_categories": [ + "HASH" + ], + "key_specs": [ + { + "flags": [ + "RW", + "UPDATE" + ], + "begin_search": { + "index": { + "pos": 1 + } + }, + "find_keys": { + "range": { + "lastkey": 0, + "step": 1, + "limit": 0 + } + } + } + ], + "reply_schema": { + "description": "List of integer codes indicating the result of setting expiry on each specified field, in the same order as the fields are requested.", + "type": "array", + "minItems": 1, + "items": { + "oneOf": [ + { + "description": "Field does not exist in the HASH, or HASH is empty.", + "const": -2 + }, + { + "description": "The specified NX | XX | GT | LT condition has not been met.", + "const": 0 + }, + { + "description": "The expiration time was applied.", + "const": 1 + }, + { + "description": "When called with a 0 second or is called with a past Unix time in seconds.", + "const": 2 + } + ] + } + }, + "arguments": [ + { + "name": "key", + "type": "key", + "key_spec_index": 0 + }, + { + "name": 
"unix-time-seconds", + "type": "integer" + }, + { + "name": "condition", + "type": "oneof", + "optional": true, + "since": "9.0.0", + "arguments": [ + { + "name": "nx", + "type": "pure-token", + "token": "NX" + }, + { + "name": "xx", + "type": "pure-token", + "token": "XX" + }, + { + "name": "gt", + "type": "pure-token", + "token": "GT" + }, + { + "name": "lt", + "type": "pure-token", + "token": "LT" + } + ] + }, + { + "name": "fields", + "token": "FIELDS", + "type": "block", + "arguments": [ + { + "name": "numfields", + "type": "integer", + "key_spec_index": 0, + "multiple": false, + "minimum": 1 + }, + { + "name": "field", + "type": "string", + "multiple": true + } + ] + } + ] + } +} diff --git a/src/commands/hexpiretime.json b/src/commands/hexpiretime.json new file mode 100644 index 00000000000..82c4d5c70ee --- /dev/null +++ b/src/commands/hexpiretime.json @@ -0,0 +1,85 @@ +{ + "HEXPIRETIME": { + "summary": "Returns Unix timestamps in seconds since the epoch at which the given key's field(s) will expire", + "complexity": "O(1) for each field, so O(N) for N items when the command is called with multiple fields.", + "group": "hash", + "since": "9.0.0", + "arity": -5, + "function": "hexpiretimeCommand", + "command_flags": [ + "READONLY", + "FAST" + ], + "acl_categories": [ + "HASH" + ], + "key_specs": [ + { + "flags": [ + "RO", + "ACCESS" + ], + "begin_search": { + "index": { + "pos": 1 + } + }, + "find_keys": { + "range": { + "lastkey": 0, + "step": 1, + "limit": 0 + } + } + } + ], + "reply_schema": { + "description": "List of values associated with the result of getting the absolute expiry timestamp of the specific fields, in the same order as they are requested.", + "type": "array", + "minItems": 1, + "items": { + "oneOf": [ + { + "description": "Field does not exist in the provided hash key, or the hash key is empty.", + "const": -2 + }, + { + "description": "Field exists in the provided hash key, but has no expiration associated with it.", + "const": -1 + }, + 
{ + "description": "The expiration time associated with the hash key field, in seconds.", + "type": "integer", + "minimum": 0 + } + ] + } + }, + "arguments": [ + { + "name": "key", + "type": "key", + "key_spec_index": 0 + }, + { + "name": "fields", + "token": "FIELDS", + "type": "block", + "arguments": [ + { + "name": "numfields", + "type": "integer", + "key_spec_index": 0, + "multiple": false, + "minimum": 1 + }, + { + "name": "field", + "type": "string", + "multiple": true + } + ] + } + ] + } +} diff --git a/src/commands/hgetex.json b/src/commands/hgetex.json new file mode 100644 index 00000000000..ec25c79fa5c --- /dev/null +++ b/src/commands/hgetex.json @@ -0,0 +1,118 @@ +{ + "HGETEX": { + "summary": "Get the value of one or more fields of a given hash key, and optionally set their expiration time or time-to-live (TTL).", + "complexity": "O(1)", + "group": "hash", + "since": "9.0.0", + "arity": -5, + "function": "hgetexCommand", + "command_flags": [ + "WRITE", + "FAST" + ], + "acl_categories": [ + "HASH" + ], + "key_specs": [ + { + "flags": [ + "RW", + "ACCESS" + ], + "begin_search": { + "index": { + "pos": 1 + } + }, + "find_keys": { + "range": { + "lastkey": 0, + "step": 1, + "limit": 0 + } + } + } + ], + "reply_schema": { + "oneOf": [ + { + "description": "List of values associated with the given fields, in the same order as they are requested.", + "type": "array", + "minItems": 1, + "items": { + "oneOf": [ + { + "type": "string" + }, + { + "type": "null" + } + ] + } + }, + { + "description": "Key does not exist.", + "type": "null" + } + ] + }, + "arguments": [ + { + "name": "key", + "type": "key", + "key_spec_index": 0 + }, + { + "name": "expiration", + "type": "oneof", + "optional": true, + "arguments": [ + { + "name": "seconds", + "type": "integer", + "token": "EX" + }, + { + "name": "milliseconds", + "type": "integer", + "token": "PX" + }, + { + "name": "unix-time-seconds", + "type": "unix-time", + "token": "EXAT" + }, + { + "name": 
"unix-time-milliseconds", + "type": "unix-time", + "token": "PXAT" + }, + { + "name": "persist", + "type": "pure-token", + "token": "PERSIST" + } + ] + }, + { + "name": "fields", + "token": "FIELDS", + "type": "block", + "arguments": [ + { + "name": "numfields", + "type": "integer", + "key_spec_index": 0, + "multiple": false, + "minimum": 1 + }, + { + "name": "field", + "type": "string", + "multiple": true + } + ] + } + ] + } +} diff --git a/src/commands/hpersist.json b/src/commands/hpersist.json new file mode 100644 index 00000000000..180d3e90161 --- /dev/null +++ b/src/commands/hpersist.json @@ -0,0 +1,84 @@ +{ + "HPERSIST": { + "summary": "Remove the existing expiration on a hash key's field(s).", + "complexity": "O(1) for each field assigned with TTL, so O(N) to persist N items when the command is called with multiple fields.", + "group": "hash", + "since": "9.0.0", + "arity": -5, + "function": "hpersistCommand", + "command_flags": [ + "WRITE", + "FAST" + ], + "acl_categories": [ + "HASH" + ], + "key_specs": [ + { + "flags": [ + "RW", + "UPDATE" + ], + "begin_search": { + "index": { + "pos": 1 + } + }, + "find_keys": { + "range": { + "lastkey": 0, + "step": 1, + "limit": 0 + } + } + } + ], + "reply_schema": { + "description": "List of integer codes indicating the result of setting expiry on each specified field, in the same order as the fields are requested.", + "type": "array", + "minItems": 1, + "items": { + "oneOf": [ + { + "description": "Field does not exist in the provided hash key, or the hash key does not exist.", + "const": -2 + }, + { + "description": "Field exists in the provided hash key, but has no expiration associated with it.", + "const": -1 + }, + { + "description": "The expiration time was removed from the hash key field.", + "const": 1 + } + ] + } + }, + "arguments": [ + { + "name": "key", + "type": "key", + "key_spec_index": 0 + }, + { + "name": "fields", + "token": "FIELDS", + "type": "block", + "arguments": [ + { + "name": "numfields", + 
"type": "integer", + "key_spec_index": 0, + "multiple": false, + "minimum": 1 + }, + { + "name": "field", + "type": "string", + "multiple": true + } + ] + } + ] + } +} diff --git a/src/commands/hpexpire.json b/src/commands/hpexpire.json new file mode 100644 index 00000000000..0cdec60a3a4 --- /dev/null +++ b/src/commands/hpexpire.json @@ -0,0 +1,120 @@ +{ + "HPEXPIRE": { + "summary": "Set expiry time on hash object.", + "complexity": "O(N) where N is the number of specified fields.", + "group": "hash", + "since": "9.0.0", + "arity": -6, + "function": "hpexpireCommand", + "command_flags": [ + "WRITE", + "FAST" + ], + "acl_categories": [ + "HASH" + ], + "key_specs": [ + { + "flags": [ + "RW", + "UPDATE" + ], + "begin_search": { + "index": { + "pos": 1 + } + }, + "find_keys": { + "range": { + "lastkey": 0, + "step": 1, + "limit": 0 + } + } + } + ], + "reply_schema": { + "description": "List of integer codes indicating the result of setting expiry on each specified field, in the same order as the fields are requested.", + "type": "array", + "minItems": 1, + "items": { + "oneOf": [ + { + "description": "Field does not exist in the HASH, or HASH is empty.", + "const": -2 + }, + { + "description": "The specified NX | XX | GT | LT condition has not been met.", + "const": 0 + }, + { + "description": "The expiration time was applied.", + "const": 1 + }, + { + "description": "When called with a 0 millisecond", + "const": 2 + } + ] + } + }, + "arguments": [ + { + "name": "key", + "type": "key", + "key_spec_index": 0 + }, + { + "name": "milliseconds", + "type": "integer" + }, + { + "name": "condition", + "type": "oneof", + "optional": true, + "since": "9.0.0", + "arguments": [ + { + "name": "nx", + "type": "pure-token", + "token": "NX" + }, + { + "name": "xx", + "type": "pure-token", + "token": "XX" + }, + { + "name": "gt", + "type": "pure-token", + "token": "GT" + }, + { + "name": "lt", + "type": "pure-token", + "token": "LT" + } + ] + }, + { + "name": "fields", + "token": 
"FIELDS", + "type": "block", + "arguments": [ + { + "name": "numfields", + "type": "integer", + "key_spec_index": 0, + "multiple": false, + "minimum": 1 + }, + { + "name": "field", + "type": "string", + "multiple": true + } + ] + } + ] + } +} diff --git a/src/commands/hpexpireat.json b/src/commands/hpexpireat.json new file mode 100644 index 00000000000..a696b3a1386 --- /dev/null +++ b/src/commands/hpexpireat.json @@ -0,0 +1,120 @@ +{ + "HPEXPIREAT": { + "summary": "Set expiration time on hash field.", + "complexity": "O(N) where N is the number of specified fields.", + "group": "hash", + "since": "9.0.0", + "arity": -6, + "function": "hpexpireatCommand", + "command_flags": [ + "WRITE", + "FAST" + ], + "acl_categories": [ + "HASH" + ], + "key_specs": [ + { + "flags": [ + "RW", + "UPDATE" + ], + "begin_search": { + "index": { + "pos": 1 + } + }, + "find_keys": { + "range": { + "lastkey": 0, + "step": 1, + "limit": 0 + } + } + } + ], + "reply_schema": { + "description": "List of integer codes indicating the result of setting expiry on each specified field, in the same order as the fields are requested.", + "type": "array", + "minItems": 1, + "items": { + "oneOf": [ + { + "description": "Field does not exist in the HASH, or HASH is empty.", + "const": -2 + }, + { + "description": "The specified NX | XX | GT | LT condition has not been met.", + "const": 0 + }, + { + "description": "The expiration time was applied.", + "const": 1 + }, + { + "description": "When called with a 0 second or is called with a past Unix time in milliseconds.", + "const": 2 + } + ] + } + }, + "arguments": [ + { + "name": "key", + "type": "key", + "key_spec_index": 0 + }, + { + "name": "unix-time-milliseconds", + "type": "integer" + }, + { + "name": "condition", + "type": "oneof", + "optional": true, + "since": "9.0.0", + "arguments": [ + { + "name": "nx", + "type": "pure-token", + "token": "NX" + }, + { + "name": "xx", + "type": "pure-token", + "token": "XX" + }, + { + "name": "gt", + "type": 
"pure-token", + "token": "GT" + }, + { + "name": "lt", + "type": "pure-token", + "token": "LT" + } + ] + }, + { + "name": "fields", + "token": "FIELDS", + "type": "block", + "arguments": [ + { + "name": "numfields", + "type": "integer", + "key_spec_index": 0, + "multiple": false, + "minimum": 1 + }, + { + "name": "field", + "type": "string", + "multiple": true + } + ] + } + ] + } +} diff --git a/src/commands/hpexpiretime.json b/src/commands/hpexpiretime.json new file mode 100644 index 00000000000..6a2be6a22f8 --- /dev/null +++ b/src/commands/hpexpiretime.json @@ -0,0 +1,85 @@ +{ + "HPEXPIRETIME": { + "summary": "Returns the Unix timestamp in milliseconds since Unix epoch at which the given key's field(s) will expire", + "complexity": "O(1) for each field, so O(N) for N items when the command is called with multiple fields.", + "group": "hash", + "since": "9.0.0", + "arity": -5, + "function": "hpexpiretimeCommand", + "command_flags": [ + "READONLY", + "FAST" + ], + "acl_categories": [ + "HASH" + ], + "key_specs": [ + { + "flags": [ + "RO", + "ACCESS" + ], + "begin_search": { + "index": { + "pos": 1 + } + }, + "find_keys": { + "range": { + "lastkey": 0, + "step": 1, + "limit": 0 + } + } + } + ], + "reply_schema": { + "description": "List of values associated with the result of getting the absolute expiry timestamp of the specific fields, in the same order as they are requested.", + "type": "array", + "minItems": 1, + "items": { + "oneOf": [ + { + "description": "Field does not exist in the provided hash key, or the hash key is empty.", + "const": -2 + }, + { + "description": "Field exists in the provided hash key, but has no expiration associated with it.", + "const": -1 + }, + { + "description": "The expiration time associated with the hash key field, in milliseconds.", + "type": "integer", + "minimum": 0 + } + ] + } + }, + "arguments": [ + { + "name": "key", + "type": "key", + "key_spec_index": 0 + }, + { + "name": "fields", + "token": "FIELDS", + "type": "block", 
+ "arguments": [ + { + "name": "numfields", + "type": "integer", + "key_spec_index": 0, + "multiple": false, + "minimum": 1 + }, + { + "name": "field", + "type": "string", + "multiple": true + } + ] + } + ] + } +} diff --git a/src/commands/hpttl.json b/src/commands/hpttl.json new file mode 100644 index 00000000000..f1c7da24c7d --- /dev/null +++ b/src/commands/hpttl.json @@ -0,0 +1,85 @@ +{ + "HPTTL": { + "summary": "Returns the remaining time to live in milliseconds of a hash key's field(s) that have an associated expiration.", + "complexity": "O(1) for each field assigned with TTL, so O(N) for N items when the command is called with multiple fields.", + "group": "hash", + "since": "9.0.0", + "arity": -5, + "function": "hpttlCommand", + "command_flags": [ + "READONLY", + "FAST" + ], + "acl_categories": [ + "HASH" + ], + "key_specs": [ + { + "flags": [ + "RO", + "ACCESS" + ], + "begin_search": { + "index": { + "pos": 1 + } + }, + "find_keys": { + "range": { + "lastkey": 0, + "step": 1, + "limit": 0 + } + } + } + ], + "reply_schema": { + "description": "List of values associated with the result of getting the remaining time-to-live of the specific fields, in the same order as they are requested.", + "type": "array", + "minItems": 1, + "items": { + "oneOf": [ + { + "description": "Field does not exist in the provided hash key, or the hash key is empty", + "const": -2 + }, + { + "description": "Field exists in the provided hash key, but has no expiration associated with it.", + "const": -1 + }, + { + "description": "The expiration time associated with the hash key field, in milliseconds.", + "type": "integer", + "minimum": 0 + } + ] + } + }, + "arguments": [ + { + "name": "key", + "type": "key", + "key_spec_index": 0 + }, + { + "name": "fields", + "token": "FIELDS", + "type": "block", + "arguments": [ + { + "name": "numfields", + "type": "integer", + "key_spec_index": 0, + "multiple": false, + "minimum": 1 + }, + { + "name": "field", + "type": "string", + "multiple": 
true + } + ] + } + ] + } +} diff --git a/src/commands/hsetex.json b/src/commands/hsetex.json new file mode 100644 index 00000000000..7e1df6ead0e --- /dev/null +++ b/src/commands/hsetex.json @@ -0,0 +1,135 @@ +{ + "HSETEX": { + "summary": "Set the value of one or more fields of a given hash key, and optionally set their expiration time.", + "complexity": "O(1)", + "group": "hash", + "since": "9.0.0", + "arity": -6, + "function": "hsetexCommand", + "command_flags": [ + "WRITE", + "DENYOOM", + "FAST" + ], + "acl_categories": [ + "HASH" + ], + "key_specs": [ + { + "flags": [ + "RW", + "INSERT" + ], + "begin_search": { + "index": { + "pos": 1 + } + }, + "find_keys": { + "range": { + "lastkey": 0, + "step": 1, + "limit": 0 + } + } + } + ], + "reply_schema": { + "oneOf": [ + { + "description": "None of the provided fields value and or expiration time was set.", + "const": 0 + }, + { + "description": "All the fields value and or expiration time was set.", + "const": 1 + } + ] + }, + "arguments": [ + { + "name": "key", + "type": "key", + "key_spec_index": 0 + }, + { + "name": "fields-condition", + "type": "oneof", + "optional": true, + "arguments": [ + { + "name": "fnx", + "type": "pure-token", + "token": "FNX" + }, + { + "name": "fxx", + "type": "pure-token", + "token": "FXX" + } + ] + }, + { + "name": "expiration", + "type": "oneof", + "optional": true, + "arguments": [ + { + "name": "seconds", + "type": "integer", + "token": "EX" + }, + { + "name": "milliseconds", + "type": "integer", + "token": "PX" + }, + { + "name": "unix-time-seconds", + "type": "unix-time", + "token": "EXAT" + }, + { + "name": "unix-time-milliseconds", + "type": "unix-time", + "token": "PXAT" + }, + { + "name": "keepttl", + "type": "pure-token", + "token": "KEEPTTL" + } + ] + }, + { + "name": "fields", + "token": "FIELDS", + "type": "block", + "arguments": [ + { + "name": "numfields", + "type": "integer", + "key_spec_index": 0, + "multiple": false, + "minimum": 1 + }, + { + "name": "data", + "type": 
"block", + "multiple": true, + "arguments": [ + { + "name": "field", + "type": "string" + }, + { + "name": "value", + "type": "string" + } + ] + } + ] + } + ] + } +} diff --git a/src/commands/httl.json b/src/commands/httl.json new file mode 100644 index 00000000000..6d3ab789a7c --- /dev/null +++ b/src/commands/httl.json @@ -0,0 +1,85 @@ +{ + "HTTL": { + "summary": "Returns the remaining time to live (in seconds) of a hash key's field(s) that have an associated expiration.", + "complexity": "O(1) for each field, so O(N) for N items when the command is called with multiple fields.", + "group": "hash", + "since": "9.0.0", + "arity": -5, + "function": "httlCommand", + "command_flags": [ + "READONLY", + "FAST" + ], + "acl_categories": [ + "HASH" + ], + "key_specs": [ + { + "flags": [ + "RO", + "ACCESS" + ], + "begin_search": { + "index": { + "pos": 1 + } + }, + "find_keys": { + "range": { + "lastkey": 0, + "step": 1, + "limit": 0 + } + } + } + ], + "reply_schema": { + "description": "List of values associated with the result of getting the remaining time-to-live of the specific fields, in the same order as they are requested.", + "type": "array", + "minItems": 1, + "items": { + "oneOf": [ + { + "description": "Field does not exist in the provided hash key, or the hash key is empty", + "const": -2 + }, + { + "description": "Field exists in the provided hash key, but has no expiration associated with it.", + "const": -1 + }, + { + "description": "The expiration time associated with the hash key field, in seconds.", + "type": "integer", + "minimum": 0 + } + ] + } + }, + "arguments": [ + { + "name": "key", + "type": "key", + "key_spec_index": 0 + }, + { + "name": "fields", + "token": "FIELDS", + "type": "block", + "arguments": [ + { + "name": "numfields", + "type": "integer", + "key_spec_index": 0, + "multiple": false, + "minimum": 1 + }, + { + "name": "field", + "type": "string", + "multiple": true + } + ] + } + ] + } +} diff --git a/src/db.c b/src/db.c index 
add4ff9b940..9350bd56cc6 100644 --- a/src/db.c +++ b/src/db.c @@ -35,6 +35,7 @@ #include "io_threads.h" #include "module.h" #include "vector.h" +#include "expire.h" #include #include @@ -43,17 +44,6 @@ * C-level DB API *----------------------------------------------------------------------------*/ -/* Flags for expireIfNeeded */ -#define EXPIRE_FORCE_DELETE_EXPIRED 1 -#define EXPIRE_AVOID_DELETE_EXPIRED 2 - -/* Return values for expireIfNeeded */ -typedef enum { - KEY_VALID = 0, /* Could be volatile and not yet expired, non-volatile, or even non-existing key. */ - KEY_EXPIRED, /* Logically expired but not yet deleted. */ - KEY_DELETED /* The key was deleted now. */ -} keyStatus; - static keyStatus expireIfNeededWithDictIndex(serverDb *db, robj *key, robj *val, int flags, int dict_index); static keyStatus expireIfNeeded(serverDb *db, robj *key, robj *val, int flags); static int keyIsExpiredWithDictIndex(serverDb *db, robj *key, int dict_index); @@ -125,7 +115,7 @@ robj *lookupKey(serverDb *db, robj *key, int flags) { /* Update the access time for the ageing algorithm. * Don't do it if we have a saving child, as this will trigger * a copy on write madness. 
*/ - if (server.current_client && server.current_client->flag.no_touch && + if (server.current_client && server.current_client->flag.no_touch && server.executing_client && server.executing_client->cmd->proc != touchCommand) flags |= LOOKUP_NOTOUCH; if (!hasActiveChildProcess() && !(flags & LOOKUP_NOTOUCH)) { @@ -1004,9 +994,9 @@ void hashtableScanCallback(void *privdata, void *entry) { key = node->ele; /* zset data is copied after filtering by key */ } else if (o->type == OBJ_HASH) { - key = hashTypeEntryGetField(entry); + key = entryGetField(entry); if (!data->only_keys) { - val = hashTypeEntryGetValue(entry); + val = entryGetValue(entry); } } else { serverPanic("Type not handled in hashtable SCAN callback."); @@ -1900,16 +1890,6 @@ void propagateDeletion(serverDb *db, robj *key, int lazy) { server.replication_allowed = prev_replication_allowed; } -/* Returns 1 if the expire value is expired, 0 otherwise. */ -static int timestampIsExpired(mstime_t when) { - if (when < 0) return 0; /* no expire */ - mstime_t now = commandTimeSnapshot(); - - /* The key expired if the current (virtual or real) time is greater - * than the expire time of the key. */ - return now > when; -} - /* Use this instead of keyIsExpired if you already have the value object. */ static int objectIsExpired(robj *val) { /* Don't expire anything while loading. It will be done later. */ @@ -1925,7 +1905,7 @@ static int keyIsExpiredWithDictIndexImpl(serverDb *db, robj *key, int dict_index /* Don't expire anything while loading. It will be done later. */ if (server.loading) return 0; mstime_t when = getExpireWithDictIndex(db, key, dict_index); - return timestampIsExpired(when); + return timestampIsExpired(when) ? 1 : 0; } /* Check if the key is expired. 
*/ @@ -1953,52 +1933,11 @@ static keyStatus expireIfNeededWithDictIndex(serverDb *db, robj *key, robj *val, } else { if (!keyIsExpiredWithDictIndexImpl(db, key, dict_index)) return KEY_VALID; } - - /* If we are running in the context of a replica, instead of - * evicting the expired key from the database, we return ASAP: - * the replica key expiration is controlled by the primary that will - * send us synthesized DEL operations for expired keys. The - * exception is when write operations are performed on writable - * replicas. - * - * Still we try to return the right information to the caller, - * that is, KEY_VALID if we think the key should still be valid, - * KEY_EXPIRED if we think the key is expired but don't want to delete it at this time. - * - * When replicating commands from the primary, keys are never considered - * expired. */ - if (server.primary_host != NULL) { - if (server.current_client && (server.current_client->flag.primary)) return KEY_VALID; - if (!(flags & EXPIRE_FORCE_DELETE_EXPIRED)) return KEY_EXPIRED; - } else if (server.import_mode) { - /* If we are running in the import mode on a primary, instead of - * evicting the expired key from the database, we return ASAP: - * the key expiration is controlled by the import source that will - * send us synthesized DEL operations for expired keys. The - * exception is when write operations are performed on this server - * because it's a primary. - * - * Notice: other clients, apart from the import source, should not access - * the data imported by import source. - * - * Still we try to return the right information to the caller, - * that is, KEY_VALID if we think the key should still be valid, - * KEY_EXPIRED if we think the key is expired but don't want to delete it at this time. - * - * When receiving commands from the import source, keys are never considered - * expired. 
*/ - if (server.current_client && (server.current_client->flag.import_source)) return KEY_VALID; - if (!(flags & EXPIRE_FORCE_DELETE_EXPIRED)) return KEY_EXPIRED; - } - - /* In some cases we're explicitly instructed to return an indication of a - * missing key without actually deleting it, even on primaries. */ - if (flags & EXPIRE_AVOID_DELETE_EXPIRED) return KEY_EXPIRED; - - /* If 'expire' action is paused, for whatever reason, then don't expire any key. - * Typically, at the end of the pause we will properly expire the key OR we - * will have failed over and the new primary will send us the expire. */ - if (isPausedActionsWithUpdate(PAUSE_ACTION_EXPIRE)) return KEY_EXPIRED; + expirationPolicy policy = getExpirationPolicyWithFlags(flags); + if (policy == POLICY_IGNORE_EXPIRE) /* Ignore keys expiration. treat all keys as valid. */ + return KEY_VALID; + else if (policy == POLICY_KEEP_EXPIRED) /* Treat expired keys as invalid, but do not delete them. */ + return KEY_EXPIRED; /* The key needs to be converted from static to heap before deleted */ int static_key = key->refcount == OBJ_STATIC_REFCOUNT; diff --git a/src/defrag.c b/src/defrag.c index 9ea8a10741f..8eb0e32accd 100644 --- a/src/defrag.c +++ b/src/defrag.c @@ -39,6 +39,7 @@ */ #include "server.h" +#include "entry.h" #include "hashtable.h" #include "eval.h" #include "script.h" @@ -442,18 +443,27 @@ static void scanLaterSet(robj *ob, unsigned long *cursor) { } /* Hashtable scan callback for hash datatype */ -static void activeDefragHashTypeEntry(void *privdata, void *element_ref) { - UNUSED(privdata); - hashTypeEntry **entry_ref = (hashTypeEntry **)element_ref; - - hashTypeEntry *new_entry = hashTypeEntryDefrag(*entry_ref, activeDefragAlloc, activeDefragSds); - if (new_entry) *entry_ref = new_entry; +static void activeDefragEntry(void *privdata, void *element_ref) { + entry **entry_ref = (entry **)element_ref; + entry *old_entry = *entry_ref, *new_entry = NULL; + long long old_expiry = 
entryGetExpiry(old_entry); + + new_entry = entryDefrag(*entry_ref, activeDefragAlloc, activeDefragSds); + if (new_entry) { + /* In case the entry is tracked we need to update it in the volatile set */ + if (entryHasExpiry(new_entry)) { + robj *obj = (robj *)privdata; + serverAssert(obj); + hashTypeTrackUpdateEntry(obj, old_entry, new_entry, old_expiry, entryGetExpiry(new_entry)); + } + *entry_ref = new_entry; + } } static void scanLaterHash(robj *ob, unsigned long *cursor) { serverAssert(ob->type == OBJ_HASH && ob->encoding == OBJ_ENCODING_HASHTABLE); hashtable *ht = ob->ptr; - *cursor = hashtableScanDefrag(ht, *cursor, activeDefragHashTypeEntry, NULL, activeDefragAlloc, HASHTABLE_SCAN_EMIT_REF); + *cursor = hashtableScanDefrag(ht, *cursor, activeDefragEntry, ob, activeDefragAlloc, HASHTABLE_SCAN_EMIT_REF); } static void defragQuicklist(robj *ob) { @@ -498,7 +508,7 @@ static void defragHash(robj *ob) { } else { unsigned long cursor = 0; do { - cursor = hashtableScanDefrag(ht, cursor, activeDefragHashTypeEntry, NULL, activeDefragAlloc, HASHTABLE_SCAN_EMIT_REF); + cursor = hashtableScanDefrag(ht, cursor, activeDefragEntry, ob, activeDefragAlloc, HASHTABLE_SCAN_EMIT_REF); } while (cursor != 0); } /* defrag the hashtable struct and tables */ diff --git a/src/entry.c b/src/entry.c new file mode 100644 index 00000000000..097a36387c9 --- /dev/null +++ b/src/entry.c @@ -0,0 +1,410 @@ +#include +#include "server.h" +#include "serverassert.h" +#include "entry.h" + +#include + +/*----------------------------------------------------------------------------- + * Entry API + *----------------------------------------------------------------------------*/ + +/* The entry pointer is the field sds. We encode the entry layout type + * in the field SDS header. Field type SDS_TYPE_5 doesn't have any spare bits to + * encode this so we use it only for the first layout type. + * + * Entry with embedded value, used for small sizes. The value is stored as + * SDS_TYPE_8. 
The field can use any SDS type. + * + * Entry can also have expiration timestamp, which is the UNIX timestamp for it to be expired. + * For aligned fast access, we keep the expiry timestamp prior to the start of the sds header. + * + * +--------------+--------------+---------------+ + * | Expiration | field | value | + * | 1234567890LL | hdr "foo" \0 | hdr8 "bar" \0 | + * +--------------+--------------+---------------+ + * + * Entry with value pointer, used for larger fields and values. The field is SDS + * type 8 or higher. + * + * +--------------+-------+--------------+ + * | Expiration | value | field | + * | 1234567890LL | ptr | hdr "foo" \0 | + * +--------------+---^---+--------------+ + * | + * | + * value pointer = value sds + */ + +enum { + /* SDS aux flag. If set, it indicates that the entry has TTL metadata set. */ + FIELD_SDS_AUX_BIT_ENTRY_HAS_EXPIRY = 0, + /* SDS aux flag. If set, it indicates that the entry has an embedded value + * pointer located in memory before the embedded field. If unset, the entry + * instead has an embedded value located after the embedded field. */ + FIELD_SDS_AUX_BIT_ENTRY_HAS_VALUE_PTR = 1, + FIELD_SDS_AUX_BIT_MAX +}; +static_assert(FIELD_SDS_AUX_BIT_MAX < sizeof(char) - SDS_TYPE_BITS, "too many sds bits are used for entry metadata"); + +/* Returns true in case the entry's value is not embedded in the entry. + * Returns false otherwise. */ +static inline bool entryHasValuePtr(const entry *entry) { + return sdsGetAuxBit(entry, FIELD_SDS_AUX_BIT_ENTRY_HAS_VALUE_PTR); +} + +/* Returns true in case the entry's value is embedded in the entry. + * Returns false otherwise. */ +bool entryHasEmbeddedValue(entry *entry) { + return (!entryHasValuePtr(entry)); +} + +/* Returns true in case the entry has expiration timestamp. + * Returns false otherwise. 
*/ +bool entryHasExpiry(const entry *entry) { + return sdsGetAuxBit(entry, FIELD_SDS_AUX_BIT_ENTRY_HAS_EXPIRY); +} + +/* The entry pointer is the field sds, but that's an implementation detail. */ +sds entryGetField(const entry *entry) { + return (sds)entry; +} + +/* Returns the location of a pointer to a separately allocated value. Only for + * an entry without an embedded value. */ +static sds *entryGetValueRef(const entry *entry) { + serverAssert(entryHasValuePtr(entry)); + char *field_data = sdsAllocPtr(entry); + field_data -= sizeof(sds); + return (sds *)field_data; +} + +/* Returns the sds of the entry's value. */ +sds entryGetValue(const entry *entry) { + if (entryHasValuePtr(entry)) { + return *entryGetValueRef(entry); + } else { + /* Skip field content, field null terminator and value sds8 hdr. */ + size_t offset = sdslen(entry) + 1 + sdsHdrSize(SDS_TYPE_8); + return (char *)entry + offset; + } +} + +/* Modify the value of this entry and return a pointer to the (potentially new) entry. + * The value is taken by the function and cannot be reused after this function returns. */ +entry *entrySetValue(entry *e, sds value) { + if (entryHasValuePtr(e)) { + sds *value_ref = entryGetValueRef(e); + sdsfree(*value_ref); + *value_ref = value; + return e; + } else { + entry *new_entry = entryUpdate(e, value, entryGetExpiry(e)); + return new_entry; + } +} + +/* Returns the address of the entry allocation. */ +void *entryGetAllocPtr(const entry *entry) { + char *buf = sdsAllocPtr(entry); + if (entryHasValuePtr(entry)) buf -= sizeof(sds); + if (entryHasExpiry(entry)) buf -= sizeof(long long); + return buf; +} + +/**************************************** Entry Expiry API *****************************************/ + +/* Returns the entry expiration timestamp. + * In case this entry has no expiration time, will return EXPIRY_NONE. 
*/ +long long entryGetExpiry(const entry *entry) { + long long expiry = EXPIRY_NONE; + if (entryHasExpiry(entry)) { + char *buf = entryGetAllocPtr(entry); + debugServerAssert((((uintptr_t)buf & 0x7) == 0)); /* Test that the allocation is indeed 8 bytes aligned. + * This is needed since we access the expiry via pointer casting, + * which requires the access to be 8 bytes aligned. */ + expiry = *(long long *)buf; + } + return expiry; +} + +/* Modify the expiration time of this entry and return a pointer to the (potentially new) entry. */ +entry *entrySetExpiry(entry *e, long long expiry) { + if (entryHasExpiry(e)) { + char *buf = entryGetAllocPtr(e); + debugServerAssert((((uintptr_t)buf & 0x7) == 0)); /* Test that the allocation is indeed 8 bytes aligned. + * This is needed since we access the expiry via pointer casting, + * which requires the access to be 8 bytes aligned. */ + *(long long *)buf = expiry; + return e; + } + entry *new_entry = entryUpdate(e, NULL, expiry); + return new_entry; +} + +/* Returns true in case the entry has an expiration time which has already passed, or false otherwise. */ +bool entryIsExpired(entry *entry) { + return timestampIsExpired(entryGetExpiry(entry)); +} +/**************************************** Entry Expiry API - End *****************************************/ + +void entryFree(entry *entry) { + if (entryHasValuePtr(entry)) { + sdsfree(entryGetValue(entry)); + } + zfree(entryGetAllocPtr(entry)); +} + +static inline size_t entryReqSize(const_sds field, + sds value, + long long expiry, + bool *is_value_embedded, + int *field_sds_type, + size_t *field_size, + size_t *expiry_size, + size_t *embedded_value_size) { + size_t expiry_alloc_size = (expiry == EXPIRY_NONE) ? 
0 : sizeof(long long); + size_t field_len = sdslen(field); + int embedded_field_sds_type = sdsReqType(field_len); + if (embedded_field_sds_type == SDS_TYPE_5 && (expiry_alloc_size > 0)) { + embedded_field_sds_type = SDS_TYPE_8; + } + size_t field_alloc_size = sdsReqSize(field_len, embedded_field_sds_type); + size_t value_len = value ? sdslen(value) : 0; + size_t embedded_value_alloc_size = value ? sdsReqSize(value_len, SDS_TYPE_8) : 0; + size_t alloc_size = field_alloc_size + expiry_alloc_size; + bool embed_value = false; + if (value) { + if (alloc_size + embedded_value_alloc_size <= EMBED_VALUE_MAX_ALLOC_SIZE) { + /* Embed field and value. Value is fixed to SDS_TYPE_8. Unused + * allocation space is recorded in the embedded value's SDS header. + * + * +------+--------------+---------------+ + * | TTL | field | value | + * | | hdr "foo" \0 | hdr8 "bar" \0 | + * +------+--------------+---------------+ + */ + embed_value = true; + alloc_size += embedded_value_alloc_size; + } else { + /* Embed field, but not value. Field must be >= SDS_TYPE_8 to encode to + * indicate this type of entry. + * + * +------+-------+---------------+ + * | TTL | value | field | + * | | ptr | hdr8 "foo" \0 | + * +------+-------+---------------+ + */ + embed_value = false; + alloc_size += sizeof(sds); + if (embedded_field_sds_type == SDS_TYPE_5) { + embedded_field_sds_type = SDS_TYPE_8; + alloc_size -= field_alloc_size; + field_alloc_size = sdsReqSize(field_len, embedded_field_sds_type); + alloc_size += field_alloc_size; + } + } + } + if (expiry_size) *expiry_size = expiry_alloc_size; + if (field_sds_type) *field_sds_type = embedded_field_sds_type; + if (field_size) *field_size = field_alloc_size; + if (embedded_value_size) *embedded_value_size = embedded_value_alloc_size; + if (is_value_embedded) *is_value_embedded = embed_value; + + return alloc_size; +} + +/* Serialize the content of the entry into the provided buffer buf. 
Make use of the provided arguments provided by a call to entryReqSize. + * Note that this function will take ownership of the value so user should not assume it is valid after this call. */ +static entry *entryWrite(char *buf, + size_t buf_size, + const_sds field, + sds value, + long long expiry, + bool embed_value, + int embedded_field_sds_type, + size_t embedded_field_sds_size, + size_t embedded_value_sds_size, + size_t expiry_size) { + /* Set The expiry if exists */ + if (expiry_size) { + *(long long *)buf = expiry; + buf += expiry_size; + buf_size -= expiry_size; + } + if (value) { + if (!embed_value) { + *(sds *)buf = value; + buf += sizeof(sds); + buf_size -= sizeof(sds); + } else { + sdswrite(buf + embedded_field_sds_size, buf_size - embedded_field_sds_size, SDS_TYPE_8, value, sdslen(value)); + sdsfree(value); + buf_size -= embedded_value_sds_size; + } + } + /* Set the field data */ + entry *new_entry = sdswrite(buf, embedded_field_sds_size, embedded_field_sds_type, field, sdslen(field)); + + /* Field sds aux bits are zero, which we use for this entry encoding. */ + sdsSetAuxBit(new_entry, FIELD_SDS_AUX_BIT_ENTRY_HAS_VALUE_PTR, embed_value ? 0 : 1); + sdsSetAuxBit(new_entry, FIELD_SDS_AUX_BIT_ENTRY_HAS_EXPIRY, expiry_size > 0 ? 1 : 0); + + /* Check that the new entry was built correctly */ + debugServerAssert(sdsGetAuxBit(new_entry, FIELD_SDS_AUX_BIT_ENTRY_HAS_VALUE_PTR) == (embed_value ? 0 : 1)); + debugServerAssert(sdsGetAuxBit(new_entry, FIELD_SDS_AUX_BIT_ENTRY_HAS_EXPIRY) == (expiry_size > 0 ? 1 : 0)); + return new_entry; +} + +/* Takes ownership of value. 
does not take ownership of field */ +entry *entryCreate(const_sds field, sds value, long long expiry) { + bool embed_value = false; + int embedded_field_sds_type; + size_t expiry_size, embedded_value_sds_size, embedded_field_sds_size; + size_t alloc_size = entryReqSize(field, value, expiry, &embed_value, &embedded_field_sds_type, &embedded_field_sds_size, &expiry_size, &embedded_value_sds_size); + size_t buf_size; + + /* allocate the buffer */ + char *buf = zmalloc_usable(alloc_size, &buf_size); + + return entryWrite(buf, buf_size, field, value, expiry, embed_value, embedded_field_sds_type, embedded_field_sds_size, embedded_value_sds_size, expiry_size); +} + +/* Modify the entry's value and/or expiration time. + * In case the provided value is NULL, will use the existing value. + * Note that the value ownership is moved to this function and the caller should assume the + * value is no longer usable after calling this function. */ +entry *entryUpdate(entry *e, sds value, long long expiry) { + sds field = (sds)e; + entry *new_entry = NULL; + + bool update_value = value ? true : false; + long long curr_expiration_time = entryGetExpiry(e); + bool update_expiry = (expiry != curr_expiration_time) ? true : false; + /* Just a sanity check. If nothing changes, lets just return */ + if (!update_value && !update_expiry) + return e; + + if (!value) value = entryGetValue(e); + bool embed_value = false; + int embedded_field_sds_type; + size_t expiry_size, embedded_value_size, embedded_field_size; + size_t required_entry_size = entryReqSize(field, value, expiry, &embed_value, &embedded_field_sds_type, &embedded_field_size, &expiry_size, &embedded_value_size); + size_t current_embedded_allocation_size = entryHasValuePtr(e) ? 
0 : entryMemUsage(e); + + bool expiry_add_remove = update_expiry && (curr_expiration_time == EXPIRY_NONE || expiry == EXPIRY_NONE); // In case we are toggling expiration + bool value_change_encoding = update_value && (embed_value != entryHasEmbeddedValue(e)); // In case we change the way value is embedded or not + + + /* We will create a new entry in the following cases: + * 1. In the case were we add or remove expiration. + * 2. We change the way value is encoded + * 3. in the case were we are NOT migrating from an embedded entry to an embedded entry with ~the same size. */ + bool create_new_entry = (expiry_add_remove) || (value_change_encoding) || + (update_value && entryHasEmbeddedValue(e) && + !(required_entry_size <= EMBED_VALUE_MAX_ALLOC_SIZE && + required_entry_size <= current_embedded_allocation_size && + required_entry_size >= current_embedded_allocation_size * 3 / 4)); + + if (!create_new_entry) { + /* In this case we are sure we do not have to allocate new entry, so expiry must already be set. */ + if (update_expiry) { + serverAssert(entryHasExpiry(e)); + char *buf = entryGetAllocPtr(e); + *(long long *)buf = expiry; + } + /* In this case we are sure we do not have to allocate new entry, so value must already be set or we have enough room to embed it. */ + if (update_value) { + if (entryHasValuePtr(e)) { + sds *value_ref = entryGetValueRef(e); + sdsfree(*value_ref); + *value_ref = value; + } else { + /* Skip field content, field null terminator and value sds8 hdr. */ + sds old_value = entryGetValue(e); + /* We are using the same entry memory in order to store a potentially new value. + * In such cases the old value alloc was adjusted to the real buffer size part it was embedded to. + * Since we can potentially write here a smaller value, which requires less allocation space, we would like to + * inherit the old value memory allocation size. 
*/ + size_t value_size = sdsHdrSize(SDS_TYPE_8) + sdsalloc(old_value) + 1; + sdswrite(sdsAllocPtr(old_value), value_size, SDS_TYPE_8, value, sdslen(value)); + sdsfree(value); + } + } + new_entry = e; + + } else { + if (!update_value) { + /* Check if the value can be reused. */ + int value_was_embedded = !entryHasValuePtr(e); + /* In case the original entry value is embedded we have to duplicate it. + * Otherwise we reuse the value pointer, but detach it from the original entry since we are going to free it. */ + if (value_was_embedded) { + value = sdsdup(value); + } else { + sds *value_ref = entryGetValueRef(e); + *value_ref = NULL; + } + } + /* allocate the buffer for a new entry */ + size_t buf_size; + char *buf = zmalloc_usable(required_entry_size, &buf_size); + new_entry = entryWrite(buf, buf_size, entryGetField(e), value, expiry, embed_value, embedded_field_sds_type, embedded_field_size, embedded_value_size, expiry_size); + debugServerAssert(new_entry != e); + entryFree(e); + } + /* Check that the new entry was built correctly */ + debugServerAssert(sdsGetAuxBit(new_entry, FIELD_SDS_AUX_BIT_ENTRY_HAS_VALUE_PTR) == (embed_value ? 0 : 1)); + debugServerAssert(sdsGetAuxBit(new_entry, FIELD_SDS_AUX_BIT_ENTRY_HAS_EXPIRY) == (expiry_size > 0 ? 1 : 0)); + serverAssert(new_entry); + return new_entry; +} + +/* Returns memory usage of an entry, including all allocations owned by + * the entry. */ +size_t entryMemUsage(entry *entry) { + size_t mem = 0; + + if (entryHasValuePtr(entry)) { + /* In case the value is not embedded we might not be able to sum all the allocation sizes since the field + * header could be too small for holding the real allocation size. 
*/ + mem += zmalloc_usable_size(entryGetAllocPtr(entry)); + } else { + mem += sdsReqSize(sdslen(entry), sdsType(entry)); + if (entryHasExpiry(entry)) mem += sizeof(long long); + } + mem += sdsAllocSize(entryGetValue(entry)); + return mem; +} + +/* Defragments a hashtable entry (field-value pair) if needed, using the + * provided defrag functions. The defrag functions return NULL if the allocation + * was not moved, otherwise they return a pointer to the new memory location. + * A separate sds defrag function is needed because of the unique memory layout + * of sds strings. + * If the location of the entry changed we return the new location, + * otherwise we return NULL. */ +entry *entryDefrag(entry *entry, void *(*defragfn)(void *), sds (*sdsdefragfn)(sds)) { + if (entryHasValuePtr(entry)) { + sds *value_ref = entryGetValueRef(entry); + sds new_value = sdsdefragfn(*value_ref); + if (new_value) *value_ref = new_value; + } + char *allocation = entryGetAllocPtr(entry); + char *new_allocation = defragfn(allocation); + if (new_allocation != NULL) { + /* Return the same offset into the new allocation as the entry's offset + * in the old allocation. */ + return new_allocation + ((char *)entry - allocation); + } + return NULL; +} + +/* Used for releasing memory to OS to avoid unnecessary CoW. Called when we've + * forked and memory won't be used again. See zmadvise_dontneed() */ +void entryDismissMemory(entry *entry) { + /* Only dismiss values memory since the field size usually is small. 
*/ + if (entryHasValuePtr(entry)) { + dismissSds(*entryGetValueRef(entry)); + } +} diff --git a/src/entry.h b/src/entry.h new file mode 100644 index 00000000000..f23f3dfc7b1 --- /dev/null +++ b/src/entry.h @@ -0,0 +1,94 @@ +#ifndef _ENTRY_H_ +#define _ENTRY_H_ + +#include "sds.h" +#include + +/*----------------------------------------------------------------------------- + * Entry + *----------------------------------------------------------------------------*/ + +/* + * The entry pointer is the field `sds`. We encode the entry layout type + * in the SDS header. + * + * An entry represents a key–value pair with an optional expiration timestamp. + * The pointer of type `entry *` always points to the field `sds`. + * + * Layout 1: Embedded Field and Value (Compact Form) + * + * +-------------------+-------------------+-------------------+ + * | Expiration (opt) | Field (sds) | Value (sds) | + * | 8 bytes (int64_t) | "field" + header | "value" + header | + * +-------------------+-------------------+-------------------+ + * ^ + * | + * entry pointer + * + * - Both field and value are small and embedded. + * - The expiration is stored just before the first sds. + * + * + * Layout 2: Pointer-Based Value (Large Values) + * + * +-------------------+-------------------+------------------+ + * | Expiration (opt) | Value pointer | Field (sds) | + * | 8 bytes (int64_t) | 8 bytes (void *) | "field" + header | + * +-------------------+-------------------+------------------+ + * ^ + * | + * entry pointer + * + * - The value is stored separately via a pointer. + * - Used for large value sizes. */ +typedef void entry; + +/* The maximum allocation size we want to use for entries with embedded + * values. */ +#define EMBED_VALUE_MAX_ALLOC_SIZE 128 + +/* Returns the field string (sds) from the entry. */ +sds entryGetField(const entry *entry); + +/* Returns the value string (sds) from the entry. 
*/ +sds entryGetValue(const entry *entry); + +/* Sets or replaces the value string in the entry. May reallocate and return a new pointer. */ +entry *entrySetValue(entry *entry, sds value); + +/* Gets the expiration timestamp (UNIX time in milliseconds), or EXPIRY_NONE if none is set. */ +long long entryGetExpiry(const entry *entry); + +/* Returns true if the entry has an expiration timestamp set. */ +bool entryHasExpiry(const entry *entry); + +/* Sets the expiration timestamp. May reallocate and return a new pointer. */ +entry *entrySetExpiry(entry *entry, long long expiry); + +/* Returns true if the entry is expired compared to the current command time (commandTimeSnapshot). */ +bool entryIsExpired(entry *entry); + +/* Frees the memory used by the entry (including field/value). */ +void entryFree(entry *entry); + +/* Creates a new entry with the given field, value, and optional expiry. */ +entry *entryCreate(const_sds field, sds value, long long expiry); + +/* Updates the value and/or expiry of an existing entry. + * In case value is NULL, will use the existing entry value. + * In case expiry is EXPIRY_NONE, any existing expiration is removed; pass entryGetExpiry(entry) to keep it. */ +entry *entryUpdate(entry *entry, sds value, long long expiry); + +/* Returns the total memory used by the entry (in bytes). */ +size_t entryMemUsage(entry *entry); + +/* Defragments the entry and returns the new pointer (if moved). */ +entry *entryDefrag(entry *entry, void *(*defragfn)(void *), sds (*sdsdefragfn)(sds)); + +/* Advises allocator to dismiss memory used by entry. */ +void entryDismissMemory(entry *entry); + +/* Internal used for debug. 
No need to use this function except in tests */ +bool entryHasEmbeddedValue(entry *entry); + +#endif diff --git a/src/expire.c b/src/expire.c index a771454345d..d0a465979ec 100644 --- a/src/expire.c +++ b/src/expire.c @@ -537,23 +537,19 @@ int checkAlreadyExpired(long long when) { return (when <= commandTimeSnapshot() && !server.loading && !server.primary_host && !server.import_mode); } -#define EXPIRE_NX (1 << 0) -#define EXPIRE_XX (1 << 1) -#define EXPIRE_GT (1 << 2) -#define EXPIRE_LT (1 << 3) - -/* Parse additional flags of expire commands +/* Parse additional flags of expire commands, scanning arguments up to (but not including) index max_args. + * Pass c->argc as max_args to scan all arguments. * * Supported flags: * - NX: set expiry only when the key has no expiry * - XX: set expiry only when the key has an existing expiry * - GT: set expiry only when the new expiry is greater than current one * - LT: set expiry only when the new expiry is less than current one */ -int parseExtendedExpireArgumentsOrReply(client *c, int *flags) { +int parseExtendedExpireArgumentsOrReply(client *c, int *flags, int max_args) { int nx = 0, xx = 0, gt = 0, lt = 0; int j = 3; - while (j < c->argc) { + while (j < max_args) { char *opt = c->argv[j]->ptr; if (!strcasecmp(opt, "nx")) { *flags |= EXPIRE_NX; @@ -587,6 +583,32 @@ int parseExtendedExpireArgumentsOrReply(client *c, int *flags) { return C_OK; } +int convertExpireArgumentToUnixTime(client *c, robj *arg, long long basetime, int unit, long long *unixtime) { + long long when; + if (getLongLongFromObjectOrReply(c, arg, &when, NULL) != C_OK) return C_ERR; + + if (when < 0) { + addReplyErrorExpireTime(c); + return C_ERR; + } + + if (unit == UNIT_SECONDS) { + if (when > LLONG_MAX / 1000 || when < LLONG_MIN / 1000) { + addReplyErrorExpireTime(c); + return C_ERR; + } + when *= 1000; + } + if (when > LLONG_MAX - basetime) { + addReplyErrorExpireTime(c); + return C_ERR; + } + when += basetime; + debugServerAssert(unixtime); + 
*unixtime = when; + return C_OK; +} + 
/*----------------------------------------------------------------------------- * Expires Commands *----------------------------------------------------------------------------*/ @@ -607,7 +629,7 @@ void expireGenericCommand(client *c, long long basetime, int unit) { int flag = 0; /* checking optional flags */ - if (parseExtendedExpireArgumentsOrReply(c, &flag) != C_OK) { + if (parseExtendedExpireArgumentsOrReply(c, &flag, c->argc) != C_OK) { return; } @@ -795,3 +817,66 @@ void touchCommand(client *c) { if (lookupKeyRead(c->db, c->argv[j]) != NULL) touched++; addReplyLongLong(c, touched); } + +/* Returns true if the given expiration timestamp has passed, false otherwise. */ +bool timestampIsExpired(mstime_t when) { + if (when < 0) return false; /* no expire */ + mstime_t now = commandTimeSnapshot(); + + /* The time indicated by 'when' is considered expired if the current (virtual or real) time is greater + * than it. */ + return now > when; +} + +/* This function determines whether the current conditions allow expiration of keys and fields. + * In some cases expiration is not allowed, but we would still like to ignore the key, + * treating it as "expired" without actively deleting it. */ +expirationPolicy getExpirationPolicyWithFlags(int flags) { + if (server.loading) return POLICY_IGNORE_EXPIRE; + + /* If we are running in the context of a replica, instead of + * evicting the expired key from the database, we return ASAP: + * the replica key expiration is controlled by the primary that will + * send us synthesized DEL operations for expired keys. The + * exception is when write operations are performed on writable + * replicas. + * + * Still we try to reflect the correct state to the caller, + * that is, POLICY_KEEP_EXPIRED so that the key will be ignored, but not deleted. 
+ * + * When replicating commands from the primary, keys are never considered + * expired, so we return POLICY_IGNORE_EXPIRE */ + if (server.primary_host != NULL) { + if (server.current_client && (server.current_client->flag.primary)) return POLICY_IGNORE_EXPIRE; + if (!(flags & EXPIRE_FORCE_DELETE_EXPIRED)) return POLICY_KEEP_EXPIRED; + } else if (server.import_mode) { + /* If we are running in the import mode on a primary, instead of + * evicting the expired key from the database, we return ASAP: + * the key expiration is controlled by the import source that will + * send us synthesized DEL operations for expired keys. The + * exception is when write operations are performed on this server + * because it's a primary. + * + * Notice: other clients, apart from the import source, should not access + * the data imported by import source. + * + * Still we try to reflect the correct state to the caller, + * that is, POLICY_KEEP_EXPIRED so that the key will be ignored, but not deleted. + * + * When receiving commands from the import source, keys are never considered + * expired, so we return POLICY_IGNORE_EXPIRE */ + if (server.current_client && (server.current_client->flag.import_source)) return POLICY_IGNORE_EXPIRE; + if (!(flags & EXPIRE_FORCE_DELETE_EXPIRED)) return POLICY_KEEP_EXPIRED; + } + + /* In some cases we're explicitly instructed to return an indication of a + * missing key without actually deleting it, even on primaries. */ + if (flags & EXPIRE_AVOID_DELETE_EXPIRED) return POLICY_KEEP_EXPIRED; + + /* If 'expire' action is paused, for whatever reason, then don't expire any key. + * Typically, at the end of the pause we will properly expire the key OR we + * will have failed over and the new primary will send us the expire. 
*/ + if (isPausedActionsWithUpdate(PAUSE_ACTION_EXPIRE)) return POLICY_KEEP_EXPIRED; + + return POLICY_DELETE_EXPIRED; +} diff --git a/src/expire.h b/src/expire.h new file mode 100644 index 00000000000..11ef9d9c103 --- /dev/null +++ b/src/expire.h @@ -0,0 +1,47 @@ +#ifndef EXPIRE_H +#define EXPIRE_H + +#include +#include +#include "monotonic.h" + +/* Special Expiry values */ +#define EXPIRY_NONE -1 + +/* Flags for expireIfNeeded */ +#define EXPIRE_FORCE_DELETE_EXPIRED 1 +#define EXPIRE_AVOID_DELETE_EXPIRED 2 + +#define ACTIVE_EXPIRE_CYCLE_SLOW 0 +#define ACTIVE_EXPIRE_CYCLE_FAST 1 + +/* Command flags for items expiration update conditions */ +#define EXPIRE_NX (1 << 0) +#define EXPIRE_XX (1 << 1) +#define EXPIRE_GT (1 << 2) +#define EXPIRE_LT (1 << 3) + +/* Return values for expireIfNeeded */ +typedef enum { + KEY_VALID = 0, /* Could be volatile and not yet expired, non-volatile, or even nonexistent key. */ + KEY_EXPIRED, /* Logically expired but not yet deleted. */ + KEY_DELETED /* The key was deleted now. */ +} keyStatus; + +/* Return value for getExpirationPolicy */ +typedef enum { + POLICY_IGNORE_EXPIRE, /* Ignore expiration time of items and treat them as valid. */ + POLICY_KEEP_EXPIRED, /* Ignore items which are expired but do not actively delete them. */ + POLICY_DELETE_EXPIRED /* Delete expired keys on access. 
*/ +} expirationPolicy; + +/* Forward declarations */ +typedef struct client client; +typedef struct serverObject robj; + +bool timestampIsExpired(mstime_t when); +expirationPolicy getExpirationPolicyWithFlags(int flags); +int parseExtendedExpireArgumentsOrReply(client *c, int *flags, int max_args); +int convertExpireArgumentToUnixTime(client *c, robj *arg, long long basetime, int unit, long long *unixtime); + +#endif diff --git a/src/hashtable.c b/src/hashtable.c index eb64fd97dd0..214df11e7ee 100644 --- a/src/hashtable.c +++ b/src/hashtable.c @@ -368,6 +368,12 @@ typedef struct { /* --- Internal functions --- */ +/* --- Access API --- */ +static inline bool validateElementIfNeeded(hashtable *ht, void *elem) { + if (ht->type->validateEntry == NULL) return true; + return ht->type->validateEntry(ht, elem); +} + static bucket *findBucketForInsert(hashtable *ht, uint64_t hash, int *pos_in_bucket, int *table_index); static inline void freeEntry(hashtable *ht, void *entry) { @@ -690,6 +696,9 @@ static inline int checkCandidateInBucket(hashtable *ht, bucket *b, int pos, cons if (compareKeys(ht, key, elem_key) == 0) { /* It's a match. */ assert(pos_in_bucket != NULL); + if (!validateElementIfNeeded(ht, entry)) { + return 0; + } *pos_in_bucket = pos; if (table_index) *table_index = table; return 1; @@ -1132,6 +1141,15 @@ hashtableType *hashtableGetType(hashtable *ht) { return ht->type; } +/* Set the hashtable type and returns the old type of the hashtable. + * NOTE that changing the hashtable type can lead to unexpected results. + * For example, changing the hash function can impact the ability to correctly fetch elements. */ +hashtableType *hashtableSetType(hashtable *ht, hashtableType *type) { + hashtableType *oldtype = ht->type; + ht->type = type; + return oldtype; +} + /* Returns a pointer to the table's metadata (userdata) section. 
*/ void *hashtableMetadata(hashtable *ht) { return &ht->metadata; @@ -1785,7 +1803,7 @@ size_t hashtableScanDefrag(hashtable *ht, size_t cursor, hashtableScanFunction f if (b->presence != 0) { int pos; for (pos = 0; pos < ENTRIES_PER_BUCKET; pos++) { - if (isPositionFilled(b, pos)) { + if (isPositionFilled(b, pos) && validateElementIfNeeded(ht, b->entries[pos])) { void *emit = emit_ref ? &b->entries[pos] : b->entries[pos]; fn(privdata, emit); } @@ -1827,7 +1845,7 @@ size_t hashtableScanDefrag(hashtable *ht, size_t cursor, hashtableScanFunction f do { if (b->presence) { for (int pos = 0; pos < ENTRIES_PER_BUCKET; pos++) { - if (isPositionFilled(b, pos)) { + if (isPositionFilled(b, pos) && validateElementIfNeeded(ht, b->entries[pos])) { void *emit = emit_ref ? &b->entries[pos] : b->entries[pos]; fn(privdata, emit); } @@ -1857,7 +1875,7 @@ size_t hashtableScanDefrag(hashtable *ht, size_t cursor, hashtableScanFunction f do { if (b->presence) { for (int pos = 0; pos < ENTRIES_PER_BUCKET; pos++) { - if (isPositionFilled(b, pos)) { + if (isPositionFilled(b, pos) && validateElementIfNeeded(ht, b->entries[pos])) { void *emit = emit_ref ? &b->entries[pos] : b->entries[pos]; fn(privdata, emit); } @@ -2047,6 +2065,9 @@ int hashtableNext(hashtableIterator *iterator, void **elemptr) { /* No entry here. */ continue; } + if (!(iter->flags & HASHTABLE_ITER_SKIP_VALIDATION) && !validateElementIfNeeded(iter->hashtable, b->entries[iter->pos_in_bucket])) { + continue; + } /* Return the entry at this position. */ if (elemptr) { *elemptr = b->entries[iter->pos_in_bucket]; diff --git a/src/hashtable.h b/src/hashtable.h index ff02077fc82..3e8ec08dddd 100644 --- a/src/hashtable.h +++ b/src/hashtable.h @@ -31,6 +31,7 @@ #include #include #include +#include /* --- Opaque types --- */ @@ -57,6 +58,8 @@ typedef struct { /* Compare function, returns 0 if the keys are equal. Defaults to just * comparing the pointers for equality. 
*/ int (*keyCompare)(const void *key1, const void *key2); + /* Optional. Returns false if access to the entry should be masked; a masked entry is treated as if it does not exist. */ + bool (*validateEntry)(hashtable *ht, void *entry); /* Callback to free an entry when it's overwritten or deleted. * Optional. */ void (*entryDestructor)(void *entry); @@ -77,6 +80,7 @@ typedef struct { size_t (*getMetadataSize)(void); /* Flag to disable incremental rehashing */ unsigned instant_rehashing : 1; + } hashtableType; typedef enum { @@ -96,6 +100,7 @@ typedef void (*hashtableScanFunction)(void *privdata, void *entry); /* Iterator flags */ #define HASHTABLE_ITER_SAFE (1 << 0) #define HASHTABLE_ITER_PREFETCH_VALUES (1 << 1) +#define HASHTABLE_ITER_SKIP_VALIDATION (1 << 2) /* --- Prototypes --- */ @@ -113,6 +118,7 @@ hashtable *hashtableCreate(hashtableType *type); void hashtableRelease(hashtable *ht); void hashtableEmpty(hashtable *ht, void(callback)(hashtable *)); hashtableType *hashtableGetType(hashtable *ht); +hashtableType *hashtableSetType(hashtable *ht, hashtableType *type); void *hashtableMetadata(hashtable *ht); size_t hashtableSize(const hashtable *ht); size_t hashtableBuckets(hashtable *ht); diff --git a/src/module.c b/src/module.c index cbc5632ab96..080eec240f5 100644 --- a/src/module.c +++ b/src/module.c @@ -5350,11 +5350,11 @@ int VM_HashSet(ValkeyModuleKey *key, int flags, ...) { /* If CFIELDS is active, we can pass the ownership of the * SDS object to the low level function that sets the field * to avoid a useless copy. */ - if (flags & VALKEYMODULE_HASH_CFIELDS) low_flags |= HASH_SET_TAKE_FIELD; + if (flags & VALKEYMODULE_HASH_CFIELDS) low_flags |= (HASH_SET_TAKE_FIELD); robj *argv[2] = {field, value}; hashTypeTryConversion(key->value, argv, 0, 1); - int updated = hashTypeSet(key->value, field->ptr, value->ptr, low_flags); + int updated = hashTypeSet(key->value, field->ptr, value->ptr, EXPIRY_NONE, low_flags); count += (flags & VALKEYMODULE_HASH_COUNT_ALL) ? 
1 : updated; /* If CFIELDS is active, SDS string ownership is now of hashTypeSet(), @@ -11224,8 +11224,8 @@ static void moduleScanKeyHashtableCallback(void *privdata, void *entry) { key = node->ele; value = createStringObjectFromLongDouble(node->score, 0); } else if (o->type == OBJ_HASH) { - key = hashTypeEntryGetField(entry); - sds val = hashTypeEntryGetValue(entry); + key = entryGetField(entry); + sds val = entryGetValue(entry); value = createStringObject(val, sdslen(val)); } else { serverPanic("unexpected object type"); diff --git a/src/monotonic.h b/src/monotonic.h index b465f90b109..2880cda858b 100644 --- a/src/monotonic.h +++ b/src/monotonic.h @@ -20,6 +20,8 @@ * variable is associated with the monotonic clock and should not be confused * with other types of time.*/ typedef uint64_t monotime; +typedef long long mstime_t; /* millisecond time type. */ +typedef long long ustime_t; /* microsecond time type. */ /* Retrieve counter of micro-seconds relative to an arbitrary point in time. 
*/ extern monotime (*getMonotonicUs)(void); diff --git a/src/object.c b/src/object.c index 34a971e52a1..d85699b7cb4 100644 --- a/src/object.c +++ b/src/object.c @@ -527,7 +527,10 @@ void freeZsetObject(robj *o) { void freeHashObject(robj *o) { switch (o->encoding) { - case OBJ_ENCODING_HASHTABLE: hashtableRelease((hashtable *)o->ptr); break; + case OBJ_ENCODING_HASHTABLE: + hashTypeFreeVolatileSet(o); + hashtableRelease((hashtable *)o->ptr); + break; case OBJ_ENCODING_LISTPACK: lpFree(o->ptr); break; default: serverPanic("Unknown hash encoding type"); break; } @@ -682,7 +685,7 @@ void dismissHashObject(robj *o, size_t size_hint) { hashtableInitIterator(&iter, ht, 0); void *next; while (hashtableNext(&iter, &next)) { - dismissHashTypeEntry(next); + entryDismissMemory(next); } hashtableResetIterator(&iter); } @@ -1203,7 +1206,7 @@ size_t objectComputeSize(robj *key, robj *o, size_t sample_size, int dbid) { asize = zmalloc_size((void *)o) + hashtableMemUsage(ht); while (hashtableNext(&iter, &next) && samples < sample_size) { - elesize += hashTypeEntryMemUsage(next); + elesize += entryMemUsage(next); samples++; } hashtableResetIterator(&iter); diff --git a/src/rdb.c b/src/rdb.c index 0c8a42ef4dd..6ec4e064dd7 100644 --- a/src/rdb.c +++ b/src/rdb.c @@ -32,6 +32,7 @@ * SPDX-License-Identifier: BSD-3-Clause */ +#include "hashtable.h" #include "server.h" #include "lzf.h" /* LZF compression library */ #include "zipmap.h" @@ -717,7 +718,10 @@ int rdbSaveObjectType(rio *rdb, robj *o) { if (o->encoding == OBJ_ENCODING_LISTPACK) return rdbSaveType(rdb, RDB_TYPE_HASH_LISTPACK); else if (o->encoding == OBJ_ENCODING_HASHTABLE) - return rdbSaveType(rdb, RDB_TYPE_HASH); + if (hashTypeHasVolatileElements(o)) + return rdbSaveType(rdb, RDB_TYPE_HASH_2); + else + return rdbSaveType(rdb, RDB_TYPE_HASH); else serverPanic("Unknown hash encoding"); case OBJ_STREAM: return rdbSaveType(rdb, RDB_TYPE_STREAM_LISTPACKS_3); @@ -840,7 +844,6 @@ size_t rdbSaveStreamConsumers(rio *rdb, streamCG *cg) 
{ * Returns -1 on error, number of bytes written on success. */ ssize_t rdbSaveObject(rio *rdb, robj *o, robj *key, int dbid) { ssize_t n = 0, nwritten = 0; - if (o->type == OBJ_STRING) { /* Save a string value */ if ((n = rdbSaveStringObject(rdb, o)) == -1) return -1; @@ -963,13 +966,14 @@ ssize_t rdbSaveObject(rio *rdb, robj *o, robj *key, int dbid) { return -1; } nwritten += n; - + /* Check whether expiration times must be saved alongside the hash elements. */ + bool add_expiry = hashTypeHasVolatileElements(o); hashtableIterator iter; - hashtableInitIterator(&iter, ht, 0); + hashtableInitIterator(&iter, ht, HASHTABLE_ITER_SKIP_VALIDATION); void *next; while (hashtableNext(&iter, &next)) { - sds field = hashTypeEntryGetField(next); - sds value = hashTypeEntryGetValue(next); + sds field = entryGetField(next); + sds value = entryGetValue(next); if ((n = rdbSaveRawString(rdb, (unsigned char *)field, sdslen(field))) == -1) { hashtableResetIterator(&iter); @@ -981,8 +985,17 @@ ssize_t rdbSaveObject(rio *rdb, robj *o, robj *key, int dbid) { return -1; } nwritten += n; + if (add_expiry) { + long long expiry = entryGetExpiry(next); + if ((n = rdbSaveMillisecondTime(rdb, expiry)) == -1) { + hashtableResetIterator(&iter); + return -1; + } + nwritten += n; + } } hashtableResetIterator(&iter); + } else { serverPanic("Unknown hash encoding"); } @@ -2073,7 +2086,7 @@ robj *rdbLoadObject(int rdbtype, rio *rdb, sds key, int dbid, int *error) { lpSafeToAdd(NULL, totelelen)) { zsetConvert(o, OBJ_ENCODING_LISTPACK); } - } else if (rdbtype == RDB_TYPE_HASH) { + } else if (rdbtype == RDB_TYPE_HASH || rdbtype == RDB_TYPE_HASH_2) { uint64_t len; sds field, value; hashtable *dupSearchHashtable = NULL; @@ -2084,8 +2097,8 @@ robj *rdbLoadObject(int rdbtype, rio *rdb, sds key, int dbid, int *error) { o = createHashObject(); - /* Too many entries? Use a hash table right from the start. */ - if (len > server.hash_max_listpack_entries) + /* Too many entries or hash object contains elements with expiry? 
Use a hash table right from the start. */ + if (len > server.hash_max_listpack_entries || rdbtype == RDB_TYPE_HASH_2) hashTypeConvert(o, OBJ_ENCODING_HASHTABLE); else if (deep_integrity_validation) { /* In this mode, we need to guarantee that the server won't crash @@ -2126,21 +2139,23 @@ robj *rdbLoadObject(int rdbtype, rio *rdb, sds key, int dbid, int *error) { } /* Convert to hash table if size threshold is exceeded */ - if (sdslen(field) > server.hash_max_listpack_value || sdslen(value) > server.hash_max_listpack_value || - !lpSafeToAdd(o->ptr, sdslen(field) + sdslen(value))) { + if (o->encoding != OBJ_ENCODING_HASHTABLE && + (sdslen(field) > server.hash_max_listpack_value || sdslen(value) > server.hash_max_listpack_value || + !lpSafeToAdd(o->ptr, sdslen(field) + sdslen(value)))) { hashTypeConvert(o, OBJ_ENCODING_HASHTABLE); - hashTypeEntry *entry = hashTypeCreateEntry(field, value); + entry *entry = entryCreate(field, value, EXPIRY_NONE); sdsfree(field); if (!hashtableAdd((hashtable *)o->ptr, entry)) { rdbReportCorruptRDB("Duplicate hash fields detected"); if (dupSearchHashtable) hashtableRelease(dupSearchHashtable); - freeHashTypeEntry(entry); + entryFree(entry); decrRefCount(o); return NULL; } break; } + /* Add pair to listpack */ o->ptr = lpAppend(o->ptr, (unsigned char *)field, sdslen(field)); o->ptr = lpAppend(o->ptr, (unsigned char *)value, sdslen(value)); @@ -2178,15 +2193,26 @@ robj *rdbLoadObject(int rdbtype, rio *rdb, sds key, int dbid, int *error) { return NULL; } + /* Also load the entry expiry */ + long long itemexpiry = EXPIRY_NONE; + if (rdbtype == RDB_TYPE_HASH_2) { + itemexpiry = rdbLoadMillisecondTime(rdb, RDB_VERSION); + if (itemexpiry < EXPIRY_NONE || rioGetReadError(rdb)) return NULL; + } + /* Add pair to hash table */ - hashTypeEntry *entry = hashTypeCreateEntry(field, value); + entry *entry = entryCreate(field, value, itemexpiry); sdsfree(field); if (!hashtableAdd((hashtable *)o->ptr, entry)) { rdbReportCorruptRDB("Duplicate hash fields 
detected"); - freeHashTypeEntry(entry); + entryFree(entry); decrRefCount(o); return NULL; } + + if (rdbtype == RDB_TYPE_HASH_2 && itemexpiry > 0) { + hashTypeTrackEntry(o, entry); + } } /* All pairs should be read by now */ diff --git a/src/rdb.h b/src/rdb.h index 9f19a3a9eca..1253c3fd059 100644 --- a/src/rdb.h +++ b/src/rdb.h @@ -90,32 +90,36 @@ static_assert(RDB_VERSION < RDB_FOREIGN_VERSION_MIN || RDB_VERSION > RDB_FOREIGN /* Map object types to RDB object types. Macros starting with OBJ_ are for * memory storage and may change. Instead RDB types must be fixed because * we store them on disk. */ -#define RDB_TYPE_STRING 0 -#define RDB_TYPE_LIST 1 -#define RDB_TYPE_SET 2 -#define RDB_TYPE_ZSET 3 -#define RDB_TYPE_HASH 4 -#define RDB_TYPE_ZSET_2 5 /* ZSET version 2 with doubles stored in binary. */ -#define RDB_TYPE_MODULE_PRE_GA 6 /* Used in 4.0 release candidates */ -#define RDB_TYPE_MODULE_2 7 /* Module value with annotations for parsing without \ +enum RdbType { + RDB_TYPE_STRING = 0, + RDB_TYPE_LIST = 1, + RDB_TYPE_SET = 2, + RDB_TYPE_ZSET = 3, + RDB_TYPE_HASH = 4, + RDB_TYPE_ZSET_2 = 5, /* ZSET version 2 with doubles stored in binary. */ + RDB_TYPE_MODULE_PRE_GA = 6, /* Used in 4.0 release candidates */ + RDB_TYPE_MODULE_2 = 7, /* Module value with annotations for parsing without \ the generating module being loaded. 
*/ -#define RDB_TYPE_HASH_ZIPMAP 9 -#define RDB_TYPE_LIST_ZIPLIST 10 -#define RDB_TYPE_SET_INTSET 11 -#define RDB_TYPE_ZSET_ZIPLIST 12 -#define RDB_TYPE_HASH_ZIPLIST 13 -#define RDB_TYPE_LIST_QUICKLIST 14 -#define RDB_TYPE_STREAM_LISTPACKS 15 -#define RDB_TYPE_HASH_LISTPACK 16 -#define RDB_TYPE_ZSET_LISTPACK 17 -#define RDB_TYPE_LIST_QUICKLIST_2 18 -#define RDB_TYPE_STREAM_LISTPACKS_2 19 -#define RDB_TYPE_SET_LISTPACK 20 -#define RDB_TYPE_STREAM_LISTPACKS_3 21 -/* NOTE: WHEN ADDING NEW RDB TYPE, UPDATE rdbIsObjectType(), and rdb_type_string[] */ + RDB_TYPE_HASH_ZIPMAP = 9, + RDB_TYPE_LIST_ZIPLIST = 10, + RDB_TYPE_SET_INTSET = 11, + RDB_TYPE_ZSET_ZIPLIST = 12, + RDB_TYPE_HASH_ZIPLIST = 13, + RDB_TYPE_LIST_QUICKLIST = 14, + RDB_TYPE_STREAM_LISTPACKS = 15, + RDB_TYPE_HASH_LISTPACK = 16, + RDB_TYPE_ZSET_LISTPACK = 17, + RDB_TYPE_LIST_QUICKLIST_2 = 18, + RDB_TYPE_STREAM_LISTPACKS_2 = 19, + RDB_TYPE_SET_LISTPACK = 20, + RDB_TYPE_STREAM_LISTPACKS_3 = 21, + RDB_TYPE_HASH_2 = 22, + RDB_TYPE_LAST +}; +/* NOTE: WHEN ADDING NEW RDB TYPE, UPDATE rdb_type_string[] */ /* Test if a type is an object type. */ -#define rdbIsObjectType(t) (((t) >= 0 && (t) <= 7) || ((t) >= 9 && (t) <= 21)) +#define rdbIsObjectType(t) (((t) >= 0 && (t) <= 7) || ((t) >= 9 && (t) < RDB_TYPE_LAST)) /* Special RDB opcodes (saved/loaded with rdbSaveType/rdbLoadType). 
*/ #define RDB_OPCODE_FUNCTION2 245 /* function library data */ diff --git a/src/server.c b/src/server.c index e5d8acca2e6..75495ab80ec 100644 --- a/src/server.c +++ b/src/server.c @@ -664,20 +664,34 @@ hashtableType subcommandSetType = {.entryGetKey = hashtableSubcommandGetKey, /* Hash type hash table (note that small hashes are represented with listpacks) */ const void *hashHashtableTypeGetKey(const void *entry) { - const hashTypeEntry *hash_entry = entry; - return (const void *)hashTypeEntryGetField(hash_entry); + return (const void *)entryGetField(entry); } void hashHashtableTypeDestructor(void *entry) { - hashTypeEntry *hash_entry = entry; - freeHashTypeEntry(hash_entry); + entryFree(entry); } +size_t hashHashtableTypeMetadataSize(void) { + return sizeof(void *); +} + +extern bool hashHashtableTypeValidate(hashtable *ht, void *entry); + hashtableType hashHashtableType = { .hashFunction = dictSdsHash, .entryGetKey = hashHashtableTypeGetKey, .keyCompare = hashtableSdsKeyCompare, .entryDestructor = hashHashtableTypeDestructor, + .getMetadataSize = hashHashtableTypeMetadataSize, +}; + +hashtableType hashWithVolatileItemsHashtableType = { + .hashFunction = dictSdsHash, + .entryGetKey = hashHashtableTypeGetKey, + .keyCompare = hashtableSdsKeyCompare, + .entryDestructor = hashHashtableTypeDestructor, + .getMetadataSize = hashHashtableTypeMetadataSize, + .validateEntry = hashHashtableTypeValidate, }; /* Hashtable type without destructor */ @@ -2135,6 +2149,9 @@ void createSharedObjects(void) { shared.multi = createSharedString("MULTI"); shared.exec = createSharedString("EXEC"); shared.hset = createSharedString("HSET"); + shared.hdel = createSharedString("HDEL"); + shared.hpexpireat = createSharedString("HPEXPIREAT"); + shared.hpersist = createSharedString("HPERSIST"); shared.srem = createSharedString("SREM"); shared.xgroup = createSharedString("XGROUP"); shared.xclaim = createSharedString("XCLAIM"); @@ -2167,6 +2184,7 @@ void createSharedObjects(void) { 
shared.special_asterisk = createSharedString("*"); shared.special_equals = createSharedString("="); shared.redacted = createSharedString("(redacted)"); + shared.fields = createSharedString("FIELDS"); for (j = 0; j < OBJ_SHARED_INTEGERS; j++) { shared.integers[j] = makeObjectShared(createObject(OBJ_STRING, (void *)(long)j)); @@ -7333,4 +7351,131 @@ __attribute__((weak)) int main(int argc, char **argv) { aeDeleteEventLoop(server.el); return 0; } + +/* + * The parseExtendedCommandArgumentsOrReply() function performs the common validation for extended + * command arguments used in STRING and HASH commands. + * + * Get specific command extended options - PERSIST/DEL + * Set specific command extended options - XX/NX/GET/IFEQ + * HSET specific command extended options - FXX/FNX + * Common command extended options - EX/EXAT/PX/PXAT/KEEPTTL + * + * The function takes the client, pointers to flags and unit, pointers to the expire and compare_val objects + * (filled in when present) and command_type, which can be COMMAND_GET, COMMAND_SET, COMMAND_HGET or COMMAND_HSET. + * + * If there are any syntax violations C_ERR is returned else C_OK is returned. + * + * Input flags are updated upon parsing the arguments. Unit and expire are updated if there are any + * EX/EXAT/PX/PXAT arguments. Unit is updated to millisecond if PX/PXAT is set. + * + * max_args limits the scan to arguments whose index is below max_args. + */ +int parseExtendedCommandArgumentsOrReply(client *c, int *flags, int *unit, robj **expire, robj **compare_val, int command_type, int max_args) { + int j = command_type == COMMAND_SET ? 3 : 2; + for (; j < max_args; j++) { + char *opt = c->argv[j]->ptr; + robj *next = (j == max_args - 1) ? 
NULL : c->argv[j + 1]; + + /* clang-format off */ + if ((opt[0] == 'n' || opt[0] == 'N') && + (opt[1] == 'x' || opt[1] == 'X') && opt[2] == '\0' && + !(*flags & ARGS_SET_XX || *flags & ARGS_SET_IFEQ) && (command_type == COMMAND_SET)) + { + *flags |= ARGS_SET_NX; + } else if ((opt[0] == 'x' || opt[0] == 'X') && + (opt[1] == 'x' || opt[1] == 'X') && opt[2] == '\0' && + !(*flags & ARGS_SET_NX || *flags & ARGS_SET_IFEQ) && (command_type == COMMAND_SET)) + { + *flags |= ARGS_SET_XX; + } else if ((opt[0] == 'f' || opt[0] == 'F') && + (opt[1] == 'n' || opt[1] == 'N') && + (opt[2] == 'x' || opt[2] == 'X') && opt[3] == '\0' && + !(*flags & ARGS_SET_FXX || *flags & ARGS_SET_IFEQ) && (command_type == COMMAND_HSET)) + { + *flags |= ARGS_SET_FNX; + } else if ((opt[0] == 'f' || opt[0] == 'F') && + (opt[1] == 'x' || opt[1] == 'X') && + (opt[2] == 'x' || opt[2] == 'X') && opt[3] == '\0' && + !(*flags & ARGS_SET_FNX || *flags & ARGS_SET_IFEQ) && (command_type == COMMAND_HSET)) + { + *flags |= ARGS_SET_FXX; + } else if ((opt[0] == 'i' || opt[0] == 'I') && + (opt[1] == 'f' || opt[1] == 'F') && + (opt[2] == 'e' || opt[2] == 'E') && + (opt[3] == 'q' || opt[3] == 'Q') && opt[4] == '\0' && + next && + !(*flags & ARGS_SET_NX || *flags & ARGS_SET_XX || *flags & ARGS_SET_IFEQ) && (command_type == COMMAND_SET)) + { + *flags |= ARGS_SET_IFEQ; + *compare_val = next; + j++; + } else if ((opt[0] == 'g' || opt[0] == 'G') && + (opt[1] == 'e' || opt[1] == 'E') && + (opt[2] == 't' || opt[2] == 'T') && opt[3] == '\0' && + (command_type == COMMAND_SET)) + { + *flags |= ARGS_SET_GET; + } else if (!strcasecmp(opt, "KEEPTTL") && !(*flags & ARGS_PERSIST) && + !(*flags & ARGS_EX) && !(*flags & ARGS_EXAT) && + !(*flags & ARGS_PX) && !(*flags & ARGS_PXAT) && (command_type == COMMAND_SET || command_type == COMMAND_HSET)) + { + *flags |= ARGS_KEEPTTL; + } else if (!strcasecmp(opt,"PERSIST") && (command_type == COMMAND_GET || command_type == COMMAND_HGET) && + !(*flags & ARGS_EX) && !(*flags & ARGS_EXAT) && + 
!(*flags & ARGS_PX) && !(*flags & ARGS_PXAT) && + !(*flags & ARGS_KEEPTTL)) + { + *flags |= ARGS_PERSIST; + } else if ((opt[0] == 'e' || opt[0] == 'E') && + (opt[1] == 'x' || opt[1] == 'X') && opt[2] == '\0' && + !(*flags & ARGS_KEEPTTL) && !(*flags & ARGS_PERSIST) && + !(*flags & ARGS_EXAT) && !(*flags & ARGS_PX) && + !(*flags & ARGS_PXAT) && next) + { + *flags |= ARGS_EX; + *expire = next; + j++; + } else if ((opt[0] == 'p' || opt[0] == 'P') && + (opt[1] == 'x' || opt[1] == 'X') && opt[2] == '\0' && + !(*flags & ARGS_KEEPTTL) && !(*flags & ARGS_PERSIST) && + !(*flags & ARGS_EX) && !(*flags & ARGS_EXAT) && + !(*flags & ARGS_PXAT) && next) + { + *flags |= ARGS_PX; + *unit = UNIT_MILLISECONDS; + *expire = next; + j++; + } else if ((opt[0] == 'e' || opt[0] == 'E') && + (opt[1] == 'x' || opt[1] == 'X') && + (opt[2] == 'a' || opt[2] == 'A') && + (opt[3] == 't' || opt[3] == 'T') && opt[4] == '\0' && + !(*flags & ARGS_KEEPTTL) && !(*flags & ARGS_PERSIST) && + !(*flags & ARGS_EX) && !(*flags & ARGS_PX) && + !(*flags & ARGS_PXAT) && next) + { + *flags |= ARGS_EXAT; + *expire = next; + j++; + } else if ((opt[0] == 'p' || opt[0] == 'P') && + (opt[1] == 'x' || opt[1] == 'X') && + (opt[2] == 'a' || opt[2] == 'A') && + (opt[3] == 't' || opt[3] == 'T') && opt[4] == '\0' && + !(*flags & ARGS_KEEPTTL) && !(*flags & ARGS_PERSIST) && + !(*flags & ARGS_EX) && !(*flags & ARGS_EXAT) && + !(*flags & ARGS_PX) && next) + { + *flags |= ARGS_PXAT; + *unit = UNIT_MILLISECONDS; + *expire = next; + j++; + } else { + addReplyErrorObject(c,shared.syntaxerr); + return C_ERR; + } + /* clang-format on */ + } + return C_OK; +} + /* The End */ diff --git a/src/server.h b/src/server.h index 4f79793f5b6..a4e60788595 100644 --- a/src/server.h +++ b/src/server.h @@ -62,9 +62,6 @@ #define static_assert(expr, lit) extern char __static_assert_failure[(expr) ? 1 : -1] #endif -typedef long long mstime_t; /* millisecond time type. */ -typedef long long ustime_t; /* microsecond time type. 
*/ - #include "ae.h" /* Event driven programming library */ #include "sds.h" /* Dynamic safe strings */ #include "dict.h" /* Hash tables (old implementation) */ @@ -79,10 +76,13 @@ typedef long long ustime_t; /* microsecond time type. */ #include "sparkline.h" /* ASCII graphs API */ #include "quicklist.h" /* Lists are encoded as linked lists of N-elements flat arrays */ +#include "expire.h" /* Expiration public API */ #include "rax.h" /* Radix tree */ #include "connection.h" /* Connection abstraction */ #include "memory_prefetch.h" +#include "volatile_set.h" #include "trace/trace.h" +#include "entry.h" #ifdef USE_LTTNG #define valkey_fork() do_fork() @@ -162,9 +162,6 @@ struct hdr_histogram; #define CLIENT_MEM_USAGE_BUCKET_MAX_LOG 33 /* Bucket for largest clients: sizes above 4GB (2^32) */ #define CLIENT_MEM_USAGE_BUCKETS (1 + CLIENT_MEM_USAGE_BUCKET_MAX_LOG - CLIENT_MEM_USAGE_BUCKET_MIN_LOG) -#define ACTIVE_EXPIRE_CYCLE_SLOW 0 -#define ACTIVE_EXPIRE_CYCLE_FAST 1 - /* Children process will exit with this status code to signal that the * process terminated without an error: this is useful in order to kill * a saving child (RDB or AOF one), without triggering in the parent the @@ -220,6 +217,11 @@ struct hdr_histogram; extern int configOOMScoreAdjValuesDefaults[CONFIG_OOM_COUNT]; +#define COMMAND_GET 0 +#define COMMAND_SET 1 +#define COMMAND_HGET 2 +#define COMMAND_HSET 3 + /* Command flags. Please check the definition of struct serverCommand in this file * for more information about the meaning of every flag. */ #define CMD_WRITE (1ULL << 0) @@ -514,9 +516,6 @@ typedef enum { #define SUPERVISED_SYSTEMD 2 #define SUPERVISED_UPSTART 3 -/* Anti-warning macro... 
*/ -#define UNUSED(V) ((void)V) - #define ZSKIPLIST_MAXLEVEL 32 /* Should be enough for 2^64 elements */ #define ZSKIPLIST_P 0.25 /* Skiplist P = 1/4 */ #define ZSKIPLIST_MAX_SEARCH 10 @@ -719,6 +718,23 @@ typedef enum { * Data types *----------------------------------------------------------------------------*/ +/* Generic set command string object set flags */ +#define ARGS_NO_FLAGS 0 +#define ARGS_SET_NX (1 << 0) /* Set if the key does not exist. */ +#define ARGS_SET_XX (1 << 1) /* Set if the key exists. */ +#define ARGS_EX (1 << 2) /* Set if time in seconds is given */ +#define ARGS_PX (1 << 3) /* Set if time in ms is given */ +#define ARGS_KEEPTTL (1 << 4) /* Set and keep the ttl */ +#define ARGS_SET_GET (1 << 5) /* Set if want to get key before set */ +#define ARGS_EXAT (1 << 6) /* Set if timestamp in seconds is given */ +#define ARGS_PXAT (1 << 7) /* Set if timestamp in ms is given */ +#define ARGS_PERSIST (1 << 8) /* Set if we need to remove the ttl */ +#define ARGS_SET_IFEQ (1 << 9) /* Set if we need compare and set */ +#define ARGS_ARGV3 (1 << 10) /* Set if the value is at argv[3]; otherwise it's \ + * at argv[2]. */ +#define ARGS_SET_FNX (1 << 11) /* Set if the hash field does not exist. */ +#define ARGS_SET_FXX (1 << 12) /* Set if the hash field exists. */ + /* An Object, that is a type able to hold a string / list / set */ /* The actual Object */ @@ -852,8 +868,9 @@ typedef struct replBufBlock { * by integers from 0 (the default database) up to the max configured * database. The database number is the 'id' field in the structure. 
*/ typedef struct serverDb { - kvstore *keys; /* The keyspace for this DB */ - kvstore *expires; /* Timeout of keys with a timeout set */ + kvstore *keys; /* The keyspace for this DB */ + kvstore *expires; /* Timeout of keys with a timeout set */ + kvstore *object_with_volatile_elements; dict *blocking_keys; /* Keys with clients waiting for data (BLPOP)*/ dict *blocking_keys_unblock_on_nokey; /* Keys with clients waiting for * data, and should be unblocked if key is deleted (XREADEDGROUP). @@ -1361,10 +1378,10 @@ struct sharedObjectsStruct { *bgsaveerr_variants[2], *execaborterr, *noautherr, *noreplicaserr, *busykeyerr, *oomerr, *plus, *messagebulk, *pmessagebulk, *subscribebulk, *unsubscribebulk, *psubscribebulk, *punsubscribebulk, *del, *unlink, *rpop, *lpop, *lpush, - *rpoplpush, *lmove, *blmove, *zpopmin, *zpopmax, *emptyscan, *multi, *exec, *left, *right, *hset, *srem, + *rpoplpush, *lmove, *blmove, *zpopmin, *zpopmax, *emptyscan, *multi, *exec, *left, *right, *hset, *hdel, *hpexpireat, *hpersist, *srem, *xgroup, *xclaim, *script, *replconf, *eval, *persist, *set, *pexpireat, *pexpire, *time, *pxat, *absttl, *retrycount, *force, *justid, *entriesread, *lastid, *ping, *setid, *keepttl, *load, *createconsumer, *getack, - *special_asterisk, *special_equals, *default_username, *redacted, *ssubscribebulk, *sunsubscribebulk, + *special_asterisk, *special_equals, *default_username, *redacted, *ssubscribebulk, *sunsubscribebulk, *fields, *smessagebulk, *select[PROTO_SHARED_SELECT_CMDS], *integers[OBJ_SHARED_INTEGERS], *mbulkhdr[OBJ_SHARED_BULKHDR_LEN], /* "*\r\n" */ *bulkhdr[OBJ_SHARED_BULKHDR_LEN], /* "$\r\n" */ @@ -1609,7 +1626,6 @@ typedef enum childInfoType { CHILD_INFO_TYPE_RDB_COW_SIZE, CHILD_INFO_TYPE_MODULE_COW_SIZE } childInfoType; - struct valkeyServer { /* General */ pid_t pid; /* Main process pid. 
*/ @@ -2607,11 +2623,13 @@ typedef struct { typedef struct { robj *subject; int encoding; - + bool volatile_items_iter; unsigned char *fptr, *vptr; hashtableIterator iter; + volatileSetIterator viter; void *next; + } hashTypeIterator; #include "stream.h" /* Stream data type header file. */ @@ -2635,6 +2653,7 @@ extern hashtableType kvstoreKeysHashtableType; extern hashtableType kvstoreExpiresHashtableType; extern double R_Zero, R_PosInf, R_NegInf, R_Nan; extern hashtableType hashHashtableType; +extern hashtableType hashWithVolatileItemsHashtableType; extern dictType stringSetDictType; extern dictType externalStringType; extern dictType sdsHashDictType; @@ -2846,6 +2865,7 @@ int processIOThreadsWriteDone(void); void releaseReplyReferences(client *c); void resetLastWrittenBuf(client *c); +int parseExtendedCommandArgumentsOrReply(client *c, int *flags, int *unit, robj **expire, robj **compare_val, int command_type, int max_args); /* logreqres.c - logging of requests and responses */ void reqresReset(client *c, int free_buf); @@ -3335,16 +3355,14 @@ robj *setTypeDup(robj *o); /* Hash data type */ #define HASH_SET_TAKE_FIELD (1 << 0) #define HASH_SET_TAKE_VALUE (1 << 1) +#define HASH_SET_KEEP_EXPIRY (1 << 2) #define HASH_SET_COPY 0 -typedef void hashTypeEntry; -hashTypeEntry *hashTypeCreateEntry(sds field, sds value); -sds hashTypeEntryGetField(const hashTypeEntry *entry); -sds hashTypeEntryGetValue(const hashTypeEntry *entry); -size_t hashTypeEntryMemUsage(hashTypeEntry *entry); -hashTypeEntry *hashTypeEntryDefrag(hashTypeEntry *entry, void *(*defragfn)(void *), sds (*sdsdefragfn)(sds)); -void dismissHashTypeEntry(hashTypeEntry *entry); -void freeHashTypeEntry(hashTypeEntry *entry); + +void hashTypeFreeVolatileSet(robj *o); +void hashTypeTrackEntry(robj *o, void *entry); +void hashTypeUntrackEntry(robj *o, void *entry); +void hashTypeTrackUpdateEntry(robj *o, void *old_entry, void *new_entry, long long old_expiry, long long new_expiry); void hashTypeConvert(robj *o, 
int enc); void hashTypeTryConversion(robj *subject, robj **argv, int start, int end); @@ -3352,6 +3370,7 @@ int hashTypeExists(robj *o, sds key); int hashTypeDelete(robj *o, sds key); unsigned long hashTypeLength(const robj *o); void hashTypeInitIterator(robj *subject, hashTypeIterator *hi); +void hashTypeInitVolatileIterator(robj *subject, hashTypeIterator *hi); void hashTypeResetIterator(hashTypeIterator *hi); int hashTypeNext(hashTypeIterator *hi); void hashTypeCurrentFromListpack(hashTypeIterator *hi, @@ -3363,8 +3382,10 @@ sds hashTypeCurrentFromHashTable(hashTypeIterator *hi, int what); sds hashTypeCurrentObjectNewSds(hashTypeIterator *hi, int what); robj *hashTypeLookupWriteOrCreate(client *c, robj *key); robj *hashTypeGetValueObject(robj *o, sds field); -int hashTypeSet(robj *o, sds field, sds value, int flags); +int hashTypeSet(robj *o, sds field, sds value, long long expiry, int flags); robj *hashTypeDup(robj *o); +bool hashTypeHasVolatileElements(robj *o); +size_t hashTypeNumVolatileElements(robj *o); /* Pub / Sub */ int pubsubUnsubscribeAllChannels(client *c, int notify); @@ -3826,6 +3847,8 @@ void zrankCommand(client *c); void zrevrankCommand(client *c); void hsetCommand(client *c); void hsetnxCommand(client *c); +void hsetexCommand(client *c); +void hgetexCommand(client *c); void hgetCommand(client *c); void hmgetCommand(client *c); void hdelCommand(client *c); @@ -3847,6 +3870,15 @@ void hgetallCommand(client *c); void hexistsCommand(client *c); void hscanCommand(client *c); void hrandfieldCommand(client *c); +void hexpireCommand(client *c); +void hexpireatCommand(client *c); +void hpexpireCommand(client *c); +void hpexpireatCommand(client *c); +void httlCommand(client *c); +void hpttlCommand(client *c); +void hexpiretimeCommand(client *c); +void hpexpiretimeCommand(client *c); +void hpersistCommand(client *c); void configSetCommand(client *c); void configGetCommand(client *c); void configResetStatCommand(client *c); diff --git a/src/serverassert.h 
b/src/serverassert.h index 5ce8eb24505..88c9815e566 100644 --- a/src/serverassert.h +++ b/src/serverassert.h @@ -63,4 +63,8 @@ void _serverAssert(const char *estr, const char *file, int line); void _serverPanic(const char *file, int line, const char *msg, ...); +#ifndef static_assert +#define static_assert(expr, lit) extern char __static_assert_failure[(expr) ? 1 : -1] +#endif + #endif diff --git a/src/t_hash.c b/src/t_hash.c index 5a8c17e90c8..14332fcea80 100644 --- a/src/t_hash.c +++ b/src/t_hash.c @@ -32,233 +32,127 @@ * SPDX-License-Identifier: BSD-3-Clause */ +#include "hashtable.h" +#include "rax.h" +#include "sds.h" +#include "volatile_set.h" #include "server.h" +#include "zmalloc.h" #include -#include +#include +#include "entry.h" + +/* Enumeration of all the possible return values of commands manipulating field expiration. */ +typedef enum { + /* SDS aux flag. If set, it indicates that the entry has TTL metadata set. */ + EXPIRATION_MODIFICATION_NOT_EXIST = -2, /* in case the provided object is NULL or the specific field was not found */ + EXPIRATION_MODIFICATION_SUCCESSFUL = 1, /* if the expiration time was applied or modified */ + EXPIRATION_MODIFICATION_FAILED_CONDITION = 0, /* if some predefined conditions (e.g. hexpire conditional flags) have not been met */ + EXPIRATION_MODIFICATION_FAILED = -1, /* if applying the expiration modification failed (e.g. hpersist on item without expiration) */ + EXPIRATION_MODIFICATION_EXPIRE_ASAP = 2, /* if the expiration was set to a time in the past (i.e. the field is immediately expired) */ +} expiryModificationResult; + +volatileEntryType hashVolatileEntryType = { + .entryGetKey = (sds(*)(const void *entry))entryGetField, + .getExpiry = (long long (*)(const void *entry))entryGetExpiry, +}; /*----------------------------------------------------------------------------- - * Hash Entry API + * Hash type Expiry API *----------------------------------------------------------------------------*/ -/* The
hashTypeEntry pointer is the field sds. We encode the entry layout type - * in the field SDS header. Field type SDS_TYPE_5 doesn't have any spare bits to - * encode this so we use it only for the first layout type. - * - * Entry with embedded value, used for small sizes. The value is stored as - * SDS_TYPE_8. The field can use any SDS type. - * - * +--------------+---------------+ - * | field | value | - * | hdr "foo" \0 | hdr8 "bar" \0 | - * +------^-------+---------------+ - * | - * | - * entry pointer = field sds - * - * Entry with value pointer, used for larger fields and values. The field is SDS - * type 8 or higher. - * - * +-------+--------------+ - * | value | field | - * | ptr | hdr "foo" \0 | - * +-------+------^-------+ - * | - * | - * entry pointer = field sds - */ +static volatile_set *hashTypeGetVolatileSet(robj *o) { + serverAssert(o->encoding == OBJ_ENCODING_HASHTABLE); + return *(volatile_set **)hashtableMetadata(o->ptr); +} -/* The maximum allocation size we want to use for entries with embedded - * values. */ -#define EMBED_VALUE_MAX_ALLOC_SIZE 128 - -/* SDS aux flag. If set, it indicates that the entry has an embedded value - * pointer located in memory before the embedded field. If unset, the entry - * instead has an embedded value located after the embedded field. */ -#define FIELD_SDS_AUX_BIT_ENTRY_HAS_VALUE_PTR 0 - -static inline bool entryHasValuePtr(const hashTypeEntry *entry) { - return sdsGetAuxBit(entry, FIELD_SDS_AUX_BIT_ENTRY_HAS_VALUE_PTR); -} - -/* Returns the location of a pointer to a separately allocated value. Only for - * an entry without an embedded value. 
*/ -static sds *hashTypeEntryGetValueRef(const hashTypeEntry *entry) { - serverAssert(entryHasValuePtr(entry)); - char *field_data = sdsAllocPtr(entry); - field_data -= sizeof(sds *); - return (sds *)field_data; -} - -/* takes ownership of value, does not take ownership of field */ -hashTypeEntry *hashTypeCreateEntry(sds field, sds value) { - size_t field_len = sdslen(field); - int field_sds_type = sdsReqType(field_len); - size_t field_size = sdsReqSize(field_len, field_sds_type); - size_t value_len = sdslen(value); - size_t value_size = sdsReqSize(value_len, SDS_TYPE_8); - sds embedded_field_sds; - if (field_size + value_size <= EMBED_VALUE_MAX_ALLOC_SIZE) { - /* Embed field and value. Value is fixed to SDS_TYPE_8. Unused - * allocation space is recorded in the embedded value's SDS header. - * - * +--------------+---------------+ - * | field | value | - * | hdr "foo" \0 | hdr8 "bar" \0 | - * +--------------+---------------+ - */ - size_t min_size = field_size + value_size; - size_t buf_size; - char *buf = zmalloc_usable(min_size, &buf_size); - embedded_field_sds = sdswrite(buf, field_size, field_sds_type, field, field_len); - sdswrite(buf + field_size, buf_size - field_size, SDS_TYPE_8, value, value_len); - /* Field sds aux bits are zero, which we use for this entry encoding. */ - sdsSetAuxBit(embedded_field_sds, FIELD_SDS_AUX_BIT_ENTRY_HAS_VALUE_PTR, 0); - serverAssert(!entryHasValuePtr(embedded_field_sds)); - sdsfree(value); - } else { - /* Embed field, but not value. Field must be >= SDS_TYPE_8 to encode to - * indicate this type of entry. 
- * - * +-------+---------------+ - * | value | field | - * | ptr | hdr8 "foo" \0 | - * +-------+---------------+ - */ - char field_sds_type = sdsReqType(field_len); - if (field_sds_type == SDS_TYPE_5) field_sds_type = SDS_TYPE_8; - field_size = sdsReqSize(field_len, field_sds_type); - size_t alloc_size = sizeof(sds *) + field_size; - char *buf = zmalloc(alloc_size); - *(sds *)buf = value; - embedded_field_sds = sdswrite(buf + sizeof(sds *), field_size, field_sds_type, field, field_len); - /* Store the entry encoding type in sds aux bits. */ - sdsSetAuxBit(embedded_field_sds, FIELD_SDS_AUX_BIT_ENTRY_HAS_VALUE_PTR, 1); - serverAssert(entryHasValuePtr(embedded_field_sds)); - } - return (void *)embedded_field_sds; -} - -/* The entry pointer is the field sds, but that's an implementation detail. */ -sds hashTypeEntryGetField(const hashTypeEntry *entry) { - return (sds)entry; -} - -sds hashTypeEntryGetValue(const hashTypeEntry *entry) { - if (entryHasValuePtr(entry)) { - return *hashTypeEntryGetValueRef(entry); - } else { - /* Skip field content, field null terminator and value sds8 hdr. */ - size_t offset = sdslen(entry) + 1 + sdsHdrSize(SDS_TYPE_8); - return (char *)entry + offset; - } -} - -/* Returns the address of the entry allocation. */ -static void *hashTypeEntryAllocPtr(hashTypeEntry *entry) { - char *buf = sdsAllocPtr(entry); - if (entryHasValuePtr(entry)) { - buf -= sizeof(sds *); - } - return buf; -} - -/* Frees previous value, takes ownership of new value, returns entry (may be - * reallocated). */ -static hashTypeEntry *hashTypeEntryReplaceValue(hashTypeEntry *entry, sds value) { - sds field = (sds)entry; - size_t field_size = sdsHdrSize(sdsType(field)) + sdsalloc(field) + 1; - size_t value_len = sdslen(value); - size_t value_size = sdsReqSize(value_len, SDS_TYPE_8); - if (!entryHasValuePtr(entry)) { - /* Reuse the allocation if the new value fits and leaves no more than - * 25% unused space after replacing the value. 
*/ - char *alloc_ptr = sdsAllocPtr(entry); - size_t required_size = field_size + value_size; - size_t alloc_size; - if (required_size <= EMBED_VALUE_MAX_ALLOC_SIZE && - required_size <= (alloc_size = hashTypeEntryMemUsage(entry)) && - required_size >= alloc_size * 3 / 4) { - /* It fits in the allocation and leaves max 25% unused space. */ - sdswrite(alloc_ptr + field_size, alloc_size - field_size, SDS_TYPE_8, value, value_len); - sdsfree(value); - return entry; - } - hashTypeEntry *new_entry = hashTypeCreateEntry(hashTypeEntryGetField(entry), value); - freeHashTypeEntry(entry); - return new_entry; - } else { - /* The value pointer is located before the embedded field. */ - if (field_size + value_size <= EMBED_VALUE_MAX_ALLOC_SIZE) { - /* Convert to entry with embedded value. */ - hashTypeEntry *new_entry = hashTypeCreateEntry(field, value); - freeHashTypeEntry(entry); - return new_entry; - } else { - /* Not embedded value. */ - sds *value_ref = hashTypeEntryGetValueRef(entry); - sdsfree(*value_ref); - *value_ref = value; - return entry; +void hashTypeFreeVolatileSet(robj *o) { + volatile_set *set = hashTypeGetVolatileSet(o); + if (set) freeVolatileSet(set); +} + +bool hashTypeHasVolatileElements(robj *o) { + return ((o->encoding == OBJ_ENCODING_HASHTABLE) && (hashTypeGetVolatileSet(o) != NULL)); +} + +/* make any access to the hash object elements ignore the specific elements expiration. + * This is mainly in order to be able to access hash elements which are already expired. */ +static inline void hashTypeIgnoreTTL(robj *o, bool ignore) { + if (o->encoding == OBJ_ENCODING_HASHTABLE) { + /* prevent placing access function if not needed */ + if (!ignore && !hashTypeHasVolatileElements(o)) { + ignore = true; } + hashtableSetType(o->ptr, ignore ? &hashHashtableType : &hashWithVolatileItemsHashtableType); } } -/* Returns memory usage of a hashTypeEntry, including all allocations owned by - * the hashTypeEntry. 
*/ -size_t hashTypeEntryMemUsage(hashTypeEntry *entry) { - size_t mem = 0; - if (entryHasValuePtr(entry)) { - /* Alloc size is not stored in the embedded field. */ - mem = zmalloc_usable_size(hashTypeEntryAllocPtr(entry)); - mem += sdsAllocSize(*hashTypeEntryGetValueRef(entry)); - } else { - /* Remaining alloc size is encoded in the embedded value SDS header. */ - sds field = entry; - sds value = (char *)entry + sdslen(field) + 1 + sdsHdrSize(SDS_TYPE_8); - size_t field_size = sdsHdrSize(sdsType(field)) + sdslen(field) + 1; - size_t value_size = sdsHdrSize(SDS_TYPE_8) + sdsalloc(value) + 1; - mem = field_size + value_size; +static volatile_set *hashTypeGetOrcreateVolatileSet(robj *o) { + serverAssert(o->encoding == OBJ_ENCODING_HASHTABLE); + volatile_set **volatile_set_ref = hashtableMetadata(o->ptr); + if (*volatile_set_ref == NULL) { + *volatile_set_ref = createVolatileSet(&hashVolatileEntryType); + /* serves mainly for optimization. Use type which supports access function only when needed. */ + hashTypeIgnoreTTL(o, false); } - return mem; + return *volatile_set_ref; } -/* Defragments a hashtable entry (field-value pair) if needed, using the - * provided defrag functions. The defrag functions return NULL if the allocation - * was not moved, otherwise they return a pointer to the new memory location. - * A separate sds defrag function is needed because of the unique memory layout - * of sds strings. - * If the location of the hashTypeEntry changed we return the new location, - * otherwise we return NULL. 
*/ -hashTypeEntry *hashTypeEntryDefrag(hashTypeEntry *entry, void *(*defragfn)(void *), sds (*sdsdefragfn)(sds)) { - if (entryHasValuePtr(entry)) { - sds *value_ref = hashTypeEntryGetValueRef(entry); - sds new_value = sdsdefragfn(*value_ref); - if (new_value) *value_ref = new_value; - } - char *allocation = hashTypeEntryAllocPtr(entry); - char *new_allocation = defragfn(allocation); - if (new_allocation != NULL) { - /* Return the same offset into the new allocation as the entry's offset - * in the old allocation. */ - return new_allocation + ((char *)entry - allocation); - } - return NULL; +static void hashTypeDeleteVolatileSet(robj *o) { + volatile_set **volatile_set_ref = hashtableMetadata(o->ptr); + freeVolatileSet(*volatile_set_ref); + *volatile_set_ref = NULL; + /* serves mainly for optimization. by changing the hashtable type we can avoid extra function call in hashtable access */ + hashTypeIgnoreTTL(o, true); } -/* Used for releasing memory to OS to avoid unnecessary CoW. Called when we've - * forked and memory won't be used again. See zmadvise_dontneed() */ -void dismissHashTypeEntry(hashTypeEntry *entry) { - /* Only dismiss values memory since the field size usually is small. 
*/ - if (entryHasValuePtr(entry)) { - dismissSds(*hashTypeEntryGetValueRef(entry)); +void hashTypeTrackEntry(robj *o, void *entry) { + volatile_set *set = hashTypeGetOrcreateVolatileSet(o); + serverAssert(volatileSetAddEntry(set, entry, entryGetExpiry(entry))); +} + +void hashTypeUntrackEntry(robj *o, void *entry) { + if (!entryHasExpiry(entry)) return; + volatile_set *set = hashTypeGetVolatileSet(o); + debugServerAssert(set); + serverAssert(volatileSetRemoveEntry(set, entry, entryGetExpiry(entry))); + if (volatileSetNumEntries(set) == 0) { + hashTypeDeleteVolatileSet(o); } } -void freeHashTypeEntry(hashTypeEntry *entry) { - if (entryHasValuePtr(entry)) { - sdsfree(*hashTypeEntryGetValueRef(entry)); +void hashTypeTrackUpdateEntry(robj *o, void *old_entry, void *new_entry, long long old_expiry, long long new_expiry) { + int old_tracked = (old_entry && old_expiry != EXPIRY_NONE); + int new_tracked = (new_entry && new_expiry != EXPIRY_NONE); + /* If entry was not tracked before and not going to be tracked now, we can simply return */ + if (!old_tracked && !new_tracked) + return; + + volatile_set *set = hashTypeGetOrcreateVolatileSet(o); + debugServerAssert(set); + + if (old_tracked && !new_tracked) { + serverAssert(volatileSetRemoveEntry(set, old_entry, old_expiry)); + } else if (new_tracked && !old_tracked) { + serverAssert(volatileSetAddEntry(set, new_entry, new_expiry)); + } else { + volatile_set *set = hashTypeGetVolatileSet(o); + debugServerAssert(set); + serverAssert(volatileSetUpdateEntry(set, old_entry, new_entry, old_expiry, new_expiry) == 1); + } + if (volatileSetNumEntries(set) == 0) { + hashTypeDeleteVolatileSet(o); } - zfree(hashTypeEntryAllocPtr(entry)); +} + +bool hashHashtableTypeValidate(hashtable *ht, void *entry) { + UNUSED(ht); + expirationPolicy policy = getExpirationPolicyWithFlags(0); + if (policy == POLICY_IGNORE_EXPIRE) return true; + + if (!entryIsExpired(entry)) return true; + + return false; } 
/*----------------------------------------------------------------------------- @@ -322,16 +216,6 @@ int hashTypeGetFromListpack(robj *o, sds field, unsigned char **vstr, unsigned i return -1; } -/* Get the value from a hash table encoded hash, identified by field. - * Returns NULL when the field cannot be found, otherwise the SDS value - * is returned. */ -sds hashTypeGetFromHashTable(robj *o, sds field) { - serverAssert(o->encoding == OBJ_ENCODING_HASHTABLE); - void *found_element; - if (!hashtableFind(o->ptr, field, &found_element)) return NULL; - return hashTypeEntryGetValue(found_element); -} - /* Higher level function of hashTypeGet*() that returns the hash value * associated with the specified field. If the field is found C_OK * is returned, otherwise C_ERR. The returned object is returned by @@ -340,16 +224,48 @@ sds hashTypeGetFromHashTable(robj *o, sds field) { * * If *vll is populated *vstr is set to NULL, so the caller * can always check the function return by checking the return value - * for C_OK and checking if vll (or vstr) is NULL. */ -int hashTypeGetValue(robj *o, sds field, unsigned char **vstr, unsigned int *vlen, long long *vll) { + * for C_OK and checking if vll (or vstr) is NULL. + * + * If 'expiry' is not NULL, the function also provides the current field expiration time, + * or EXPIRY_NONE in case the field has no expiration time defined.
*/ +int hashTypeGetValue(robj *o, sds field, unsigned char **vstr, unsigned int *vlen, long long *vll, long long *expiry) { if (o->encoding == OBJ_ENCODING_LISTPACK) { *vstr = NULL; - if (hashTypeGetFromListpack(o, field, vstr, vlen, vll) == 0) return C_OK; + if (hashTypeGetFromListpack(o, field, vstr, vlen, vll) == 0) { + if (expiry) *expiry = EXPIRY_NONE; + return C_OK; + } } else if (o->encoding == OBJ_ENCODING_HASHTABLE) { - sds value = hashTypeGetFromHashTable(o, field); - if (value != NULL) { + void *entry = NULL; + hashtableFind(o->ptr, field, &entry); + if (entry) { + sds value = entryGetValue(entry); + serverAssert(value != NULL); *vstr = (unsigned char *)value; *vlen = sdslen(value); + if (expiry) *expiry = entryGetExpiry(entry); + return C_OK; + } + } else { + serverPanic("Unknown hash encoding"); + } + return C_ERR; +} + +/* Returns the expiration time associated with the specified field. + * If the field is found C_OK is returned, otherwise C_ERR. + * The matching item expiration time is assigned to `expiry` memory location, if specified. + * In case the item has no assigned expiration time, -1 is returned. 
*/ +int hashTypeGetExpiry(robj *o, sds field, long long *expiry) { + if (o->encoding == OBJ_ENCODING_LISTPACK) { + if (hashTypeExists(o, field)) { + if (expiry) *expiry = EXPIRY_NONE; + return C_OK; + } + } else if (o->encoding == OBJ_ENCODING_HASHTABLE) { + void *found_element = NULL; + if (hashtableFind(o->ptr, field, &found_element)) { + if (expiry) *expiry = entryGetExpiry(found_element); return C_OK; } } else { @@ -367,7 +283,7 @@ robj *hashTypeGetValueObject(robj *o, sds field) { unsigned int vlen; long long vll; - if (hashTypeGetValue(o, field, &vstr, &vlen, &vll) == C_ERR) return NULL; + if (hashTypeGetValue(o, field, &vstr, &vlen, &vll, NULL) == C_ERR) return NULL; if (vstr) return createStringObject((char *)vstr, vlen); else @@ -383,7 +299,7 @@ size_t hashTypeGetValueLength(robj *o, sds field) { unsigned int vlen = UINT_MAX; long long vll = LLONG_MAX; - if (hashTypeGetValue(o, field, &vstr, &vlen, &vll) == C_OK) len = vstr ? vlen : sdigits10(vll); + if (hashTypeGetValue(o, field, &vstr, &vlen, &vll, NULL) == C_OK) len = vstr ? vlen : sdigits10(vll); return len; } @@ -395,7 +311,7 @@ int hashTypeExists(robj *o, sds field) { unsigned int vlen = UINT_MAX; long long vll = LLONG_MAX; - return hashTypeGetValue(o, field, &vstr, &vlen, &vll) == C_OK; + return hashTypeGetValue(o, field, &vstr, &vlen, &vll, NULL) == C_OK; } /* Add a new field, overwrite the old with the new value if it already exists. @@ -416,14 +332,14 @@ int hashTypeExists(robj *o, sds field) { * semantics of copying the values if needed. * */ -int hashTypeSet(robj *o, sds field, sds value, int flags) { +int hashTypeSet(robj *o, sds field, sds value, long long expiry, int flags) { int update = 0; /* Check if the field is too long for listpack, and convert before adding the item. * This is needed for HINCRBY* case since in other commands this is handled early by * hashTypeTryConversion, so this check will be a NOP. 
*/ if (o->encoding == OBJ_ENCODING_LISTPACK) { - if (sdslen(field) > server.hash_max_listpack_value || sdslen(value) > server.hash_max_listpack_value) + if (expiry > 0 || sdslen(field) > server.hash_max_listpack_value || sdslen(value) > server.hash_max_listpack_value) hashTypeConvert(o, OBJ_ENCODING_HASHTABLE); } @@ -465,22 +381,40 @@ int hashTypeSet(robj *o, sds field, sds value, int flags) { v = sdsdup(value); } + /* We have to ignore the TTL when setting an element. this is mainly in order to be able to update an existing expired + * entry and not have it remain in the hashtable with the same field/value. */ + hashTypeIgnoreTTL(o, true); hashtablePosition position; void *existing; if (hashtableFindPositionForInsert(ht, field, &position, &existing)) { /* does not exist yet */ - hashTypeEntry *entry = hashTypeCreateEntry(field, v); + entry *entry = entryCreate(field, v, expiry); hashtableInsertAtPosition(ht, entry, &position); + /* In case an expiry is set on the new entry, we need to track it */ + if (expiry != EXPIRY_NONE) { + hashTypeTrackEntry(o, entry); + } } else { /* exists: replace value */ - void *new_entry = hashTypeEntryReplaceValue(existing, v); + long long entry_expiry = entryGetExpiry(existing); + /* It is possible that the entry is already expired. In this case we can override it, but we need to make sure to expire it first + * and treat it like it did not exist. */ + bool is_expired = timestampIsExpired(entry_expiry); + if (!is_expired && flags & HASH_SET_KEEP_EXPIRY) { + /* In case the HASH_SET_KEEP_EXPIRY will force keeping the existing entry expiry. */ + expiry = entry_expiry; + } + void *new_entry = entryUpdate(existing, v, expiry); if (new_entry != existing) { /* It has been reallocated. 
*/ int replaced = hashtableReplaceReallocatedEntry(ht, existing, new_entry); serverAssert(replaced); } - update = 1; + hashTypeTrackUpdateEntry(o, existing, new_entry, entry_expiry, expiry); + /* since we are exposed to expired entries, we must NOT reflect them as being "updated" */ + update = is_expired ? 0 : 1; } + hashTypeIgnoreTTL(o, false); } else { serverPanic("Unknown hash encoding"); } @@ -492,6 +426,110 @@ int hashTypeSet(robj *o, sds field, sds value, int flags) { return update; } +/* Set expiration on the specific HASH object 'o' item indicated by 'field'. + * returns -2 in case the provided object is NULL or the specific field was not found. + * returns 0 if the specified flag conditions have not been met. + * returns 1 if the expiration time was applied. + * returns 2 when 'expiry' indicates a past Unix time. In this case, if the item exists in the HASH, it will also be expired. */ +static expiryModificationResult hashTypeSetExpire(robj *o, sds field, long long expiry, int flag) { + /* If no object we will return -2 */ + if (o == NULL) return EXPIRATION_MODIFICATION_NOT_EXIST; + + if (o->encoding == OBJ_ENCODING_LISTPACK) { + unsigned char *vstr; + unsigned int vlen; + long long vll; + /* We do not want to convert to a hashtable for no good reason. + * So we first check if the item exists.*/ + if (hashTypeGetFromListpack(o, field, &vstr, &vlen, &vll) < 0) { + return EXPIRATION_MODIFICATION_NOT_EXIST; + } + /* When listpack representation is used, we consider it as infinite TTL, + * so an expire command with GT always fails the GT condition, as well as the existence (XX) condition. + * Else, we already know we are going to set an expiration so we expand to hashtable encoding.
*/ + if (flag & EXPIRE_XX || flag & EXPIRE_GT) { + return EXPIRATION_MODIFICATION_FAILED_CONDITION; + } else { + hashTypeConvert(o, OBJ_ENCODING_HASHTABLE); + } + } + + /* we must be hashtable encoded */ + serverAssert(o->encoding == OBJ_ENCODING_HASHTABLE); + + hashtable *ht = o->ptr; + void **entry_ref = NULL; + if ((entry_ref = hashtableFindRef(ht, field))) { + entry *current_entry = *entry_ref; + long long current_expire = entryGetExpiry(current_entry); + if (flag) { + /* NX option is set, check no current expiry */ + if (flag & EXPIRE_NX) { + if (current_expire != EXPIRY_NONE) { + return EXPIRATION_MODIFICATION_FAILED_CONDITION; + } + } + + /* XX option is set, check current expiry */ + if (flag & EXPIRE_XX) { + if (current_expire == EXPIRY_NONE) { + return EXPIRATION_MODIFICATION_FAILED_CONDITION; + } + } + + /* GT option is set, check current expiry */ + if (flag & EXPIRE_GT) { + /* When current_expire is -1, we consider it as infinite TTL, + * so expire command with gt always fail the GT. */ + if (expiry <= current_expire || current_expire == EXPIRY_NONE) { + return EXPIRATION_MODIFICATION_FAILED_CONDITION; + } + } + + /* LT option is set, check current expiry */ + if (flag & EXPIRE_LT) { + /* When current_expire -1, we consider it as infinite TTL, + * so if there is an expiry on the key and it's not less than current, we fail the LT. */ + if (current_expire != EXPIRY_NONE && expiry >= current_expire) { + return EXPIRATION_MODIFICATION_FAILED_CONDITION; + } + } + } + *entry_ref = entrySetExpiry(current_entry, expiry); + hashTypeTrackUpdateEntry(o, current_entry, *entry_ref, current_expire, expiry); + return EXPIRATION_MODIFICATION_SUCCESSFUL; + } + return EXPIRATION_MODIFICATION_NOT_EXIST; // we did not find anything to do. 
return -2 +} + + +static expiryModificationResult hashTypePersist(robj *o, sds field) { + /* NULL object returns -2 */ + if (o == NULL || o->type != OBJ_HASH) return EXPIRATION_MODIFICATION_NOT_EXIST; + + if (o->encoding == OBJ_ENCODING_LISTPACK) { + if (hashTypeExists(o, field)) + /* When listpack representation is used, All items are without expiry */ + return EXPIRATION_MODIFICATION_FAILED; + else + return EXPIRATION_MODIFICATION_NOT_EXIST; // Did not find any element return -2 + } + + hashtable *ht = o->ptr; + void **entry_ref = NULL; + if ((entry_ref = hashtableFindRef(ht, field))) { + entry *current_entry = *entry_ref; + long long current_expire = entryGetExpiry(current_entry); + if (current_expire != EXPIRY_NONE) { + hashTypeUntrackEntry(o, current_entry); + *entry_ref = entryUpdate(current_entry, NULL, EXPIRY_NONE); + return EXPIRATION_MODIFICATION_SUCCESSFUL; + } + return EXPIRATION_MODIFICATION_FAILED; // If the found element has no expiration set, return -1 + } + return EXPIRATION_MODIFICATION_NOT_EXIST; // Did not find any element return -2 +} + /* Delete an element from a hash. * Return 1 on deleted and 0 on not found. 
*/ int hashTypeDelete(robj *o, sds field) { @@ -513,7 +551,12 @@ int hashTypeDelete(robj *o, sds field) { } } else if (o->encoding == OBJ_ENCODING_HASHTABLE) { hashtable *ht = o->ptr; - deleted = hashtableDelete(ht, field); + void *entry = NULL; + deleted = hashtablePop(ht, field, &entry); + if (deleted) { + hashTypeUntrackEntry(o, entry); + entryFree(entry); + } } else { serverPanic("Unknown hash encoding"); } @@ -536,6 +579,7 @@ unsigned long hashTypeLength(const robj *o) { void hashTypeInitIterator(robj *subject, hashTypeIterator *hi) { hi->subject = subject; hi->encoding = subject->encoding; + hi->volatile_items_iter = false; if (hi->encoding == OBJ_ENCODING_LISTPACK) { hi->fptr = NULL; @@ -547,8 +591,27 @@ void hashTypeInitIterator(robj *subject, hashTypeIterator *hi) { } } +void hashTypeInitVolatileIterator(robj *subject, hashTypeIterator *hi) { + hi->subject = subject; + hi->encoding = subject->encoding; + hi->volatile_items_iter = true; + + if (hi->encoding == OBJ_ENCODING_LISTPACK) { + return; + } else if (hi->encoding == OBJ_ENCODING_HASHTABLE) { + volatileSetStart(hashTypeGetVolatileSet(subject), &hi->viter); + } else { + serverPanic("Unknown hash encoding"); + } +} + void hashTypeResetIterator(hashTypeIterator *hi) { - if (hi->encoding == OBJ_ENCODING_HASHTABLE) hashtableResetIterator(&hi->iter); + if (hi->encoding == OBJ_ENCODING_HASHTABLE) { + if (!hi->volatile_items_iter) + hashtableResetIterator(&hi->iter); + else + volatileSetReset(&hi->viter); + } } /* Move to the next entry in the hash. 
Return C_OK when the next entry @@ -558,6 +621,9 @@ int hashTypeNext(hashTypeIterator *hi) { unsigned char *zl; unsigned char *fptr, *vptr; + /* listpack encoding does not have volatile items, so return as iteration end */ + if (hi->volatile_items_iter) return C_ERR; + zl = hi->subject->ptr; fptr = hi->fptr; vptr = hi->vptr; @@ -581,7 +647,11 @@ int hashTypeNext(hashTypeIterator *hi) { hi->fptr = fptr; hi->vptr = vptr; } else if (hi->encoding == OBJ_ENCODING_HASHTABLE) { - if (!hashtableNext(&hi->iter, &hi->next)) return C_ERR; + if (!hi->volatile_items_iter) { + if (!hashtableNext(&hi->iter, &hi->next)) return C_ERR; + } else { + if (!volatileSetNext(&hi->viter, &hi->next)) return C_ERR; + } } else { serverPanic("Unknown hash encoding"); } @@ -611,9 +681,9 @@ sds hashTypeCurrentFromHashTable(hashTypeIterator *hi, int what) { serverAssert(hi->encoding == OBJ_ENCODING_HASHTABLE); if (what & OBJ_HASH_FIELD) { - return hashTypeEntryGetField(hi->next); + return entryGetField(hi->next); } else { - return hashTypeEntryGetValue(hi->next); + return entryGetValue(hi->next); } } @@ -682,10 +752,10 @@ void hashTypeConvertListpack(robj *o, int enc) { while (hashTypeNext(&hi) != C_ERR) { sds field = hashTypeCurrentObjectNewSds(&hi, OBJ_HASH_FIELD); sds value = hashTypeCurrentObjectNewSds(&hi, OBJ_HASH_VALUE); - hashTypeEntry *entry = hashTypeCreateEntry(field, value); + entry *entry = entryCreate(field, value, EXPIRY_NONE); sdsfree(field); if (!hashtableAdd(ht, entry)) { - freeHashTypeEntry(entry); + entryFree(entry); hashTypeResetIterator(&hi); /* Needed for gcc ASAN */ serverLogHexDump(LL_WARNING, "listpack with dup elements dump", o->ptr, lpBytes(o->ptr)); serverPanic("Listpack corruption detected"); @@ -731,21 +801,22 @@ robj *hashTypeDup(robj *o) { } else if (o->encoding == OBJ_ENCODING_HASHTABLE) { hashtable *ht = hashtableCreate(&hashHashtableType); hashtableExpand(ht, hashtableSize((const hashtable *)o->ptr)); + hobj = createObject(OBJ_HASH, ht); + hobj->encoding = 
OBJ_ENCODING_HASHTABLE; hashTypeInitIterator(o, &hi); while (hashTypeNext(&hi) != C_ERR) { /* Extract a field-value pair from an original hash object.*/ sds field = hashTypeCurrentFromHashTable(&hi, OBJ_HASH_FIELD); sds value = hashTypeCurrentFromHashTable(&hi, OBJ_HASH_VALUE); - + long long expiry = entryGetExpiry(hi.next); /* Add a field-value pair to a new hash object. */ - hashTypeEntry *entry = hashTypeCreateEntry(field, sdsdup(value)); + entry *entry = entryCreate(field, sdsdup(value), expiry); hashtableAdd(ht, entry); + if (expiry != EXPIRY_NONE) + hashTypeTrackEntry(hobj, entry); } hashTypeResetIterator(&hi); - - hobj = createObject(OBJ_HASH, ht); - hobj->encoding = OBJ_ENCODING_HASHTABLE; } else { serverPanic("Unknown hash encoding"); } @@ -771,16 +842,33 @@ void hashReplyFromListpackEntry(client *c, listpackEntry *e) { * 'val' can be NULL in which case it's not extracted. */ static void hashTypeRandomElement(robj *hashobj, unsigned long hashsize, listpackEntry *field, listpackEntry *val) { if (hashobj->encoding == OBJ_ENCODING_HASHTABLE) { - void *entry; - hashtableFairRandomEntry(hashobj->ptr, &entry); - sds sds_field = hashTypeEntryGetField(entry); - field->sval = (unsigned char *)sds_field; - field->slen = sdslen(sds_field); - if (val) { - sds sds_val = hashTypeEntryGetValue(entry); - val->sval = (unsigned char *)sds_val; - val->slen = sdslen(sds_val); + void *e = NULL; + int maxtries = 100; + hashTypeIgnoreTTL(hashobj, true); + while (!e) { + hashtableFairRandomEntry(hashobj->ptr, &e); + if (entryIsExpired(e) && --maxtries) { + e = NULL; + continue; + } else if (maxtries == 0) { + /* in case we will not be able to locate an entry which is not expired, we will just not return any + * result. An alternative would have been that we end up returning an expired entry. 
*/ + field->sval = NULL; + if (val) val->sval = NULL; + break; + } + sds sds_field = entryGetField(e); + field->sval = (unsigned char *)sds_field; + field->slen = sdslen(sds_field); + if (val) { + entry *hash_entry = e; + sds sds_val = entryGetValue(hash_entry); + val->sval = (unsigned char *)sds_val; + val->slen = + sdslen(sds_val); + } } + hashTypeIgnoreTTL(hashobj, false); } else if (hashobj->encoding == OBJ_ENCODING_LISTPACK) { lpRandomPair(hashobj->ptr, hashsize, field, val); } else { @@ -793,61 +881,16 @@ static void hashTypeRandomElement(robj *hashobj, unsigned long hashsize, listpac * Hash type commands *----------------------------------------------------------------------------*/ -void hsetnxCommand(client *c) { - robj *o; - if ((o = hashTypeLookupWriteOrCreate(c, c->argv[1])) == NULL) return; - - if (hashTypeExists(o, c->argv[2]->ptr)) { - addReply(c, shared.czero); - } else { - hashTypeTryConversion(o, c->argv, 2, 3); - hashTypeSet(o, c->argv[2]->ptr, c->argv[3]->ptr, HASH_SET_COPY); - signalModifiedKey(c, c->db, c->argv[1]); - notifyKeyspaceEvent(NOTIFY_HASH, "hset", c->argv[1], c->db->id); - server.dirty++; - addReply(c, shared.cone); - } -} - -void hsetCommand(client *c) { - int i, created = 0; - robj *o; - - if ((c->argc % 2) == 1) { - addReplyErrorArity(c); - return; - } - - if ((o = hashTypeLookupWriteOrCreate(c, c->argv[1])) == NULL) return; - hashTypeTryConversion(o, c->argv, 2, c->argc - 1); - - for (i = 2; i < c->argc; i += 2) created += !hashTypeSet(o, c->argv[i]->ptr, c->argv[i + 1]->ptr, HASH_SET_COPY); - - signalModifiedKey(c, c->db, c->argv[1]); - notifyKeyspaceEvent(NOTIFY_HASH, "hset", c->argv[1], c->db->id); - server.dirty += (c->argc - 2) / 2; - - /* HMSET (deprecated) and HSET return value is different. 
*/ - char *cmdname = c->argv[0]->ptr; - if (cmdname[1] == 's' || cmdname[1] == 'S') { - /* HSET */ - addReplyLongLong(c, created); - } else { - /* HMSET */ - addReply(c, shared.ok); - } -} - void hincrbyCommand(client *c) { long long value, incr, oldvalue; robj *o; sds new; unsigned char *vstr; unsigned int vlen; - + long long expiry = EXPIRY_NONE; if (getLongLongFromObjectOrReply(c, c->argv[3], &incr, NULL) != C_OK) return; if ((o = hashTypeLookupWriteOrCreate(c, c->argv[1])) == NULL) return; - if (hashTypeGetValue(o, c->argv[2]->ptr, &vstr, &vlen, &value) == C_OK) { + if (hashTypeGetValue(o, c->argv[2]->ptr, &vstr, &vlen, &value, &expiry) == C_OK) { if (vstr) { if (string2ll((char *)vstr, vlen, &value) == 0) { addReplyError(c, "hash value is not an integer"); @@ -866,7 +909,7 @@ void hincrbyCommand(client *c) { } value += incr; new = sdsfromlonglong(value); - hashTypeSet(o, c->argv[2]->ptr, new, HASH_SET_TAKE_VALUE); + hashTypeSet(o, c->argv[2]->ptr, new, expiry, HASH_SET_TAKE_VALUE); signalModifiedKey(c, c->db, c->argv[1]); notifyKeyspaceEvent(NOTIFY_HASH, "hincrby", c->argv[1], c->db->id); server.dirty++; @@ -880,6 +923,7 @@ void hincrbyfloatCommand(client *c) { sds new; unsigned char *vstr; unsigned int vlen; + long long expiry = EXPIRY_NONE; if (getLongDoubleFromObjectOrReply(c, c->argv[3], &incr, NULL) != C_OK) return; if (isnan(incr) || isinf(incr)) { @@ -887,7 +931,8 @@ void hincrbyfloatCommand(client *c) { return; } if ((o = hashTypeLookupWriteOrCreate(c, c->argv[1])) == NULL) return; - if (hashTypeGetValue(o, c->argv[2]->ptr, &vstr, &vlen, &ll) == C_OK) { + + if (hashTypeGetValue(o, c->argv[2]->ptr, &vstr, &vlen, &ll, &expiry) == C_OK) { if (vstr) { if (string2ld((char *)vstr, vlen, &value) == 0) { addReplyError(c, "hash value is not a float"); @@ -909,7 +954,7 @@ void hincrbyfloatCommand(client *c) { char buf[MAX_LONG_DOUBLE_CHARS]; int len = ld2string(buf, sizeof(buf), value, LD_STR_HUMAN); new = sdsnewlen(buf, len); - hashTypeSet(o, c->argv[2]->ptr, 
new, HASH_SET_TAKE_VALUE); + hashTypeSet(o, c->argv[2]->ptr, new, expiry, HASH_SET_TAKE_VALUE); signalModifiedKey(c, c->db, c->argv[1]); notifyKeyspaceEvent(NOTIFY_HASH, "hincrbyfloat", c->argv[1], c->db->id); server.dirty++; @@ -935,7 +980,7 @@ static void addHashFieldToReply(client *c, robj *o, sds field) { unsigned int vlen = UINT_MAX; long long vll = LLONG_MAX; - if (hashTypeGetValue(o, field, &vstr, &vlen, &vll) == C_OK) { + if (hashTypeGetValue(o, field, &vstr, &vlen, &vll, NULL) == C_OK) { if (vstr) { addReplyBulkCBuffer(c, vstr, vlen); } else { @@ -950,7 +995,6 @@ void hgetCommand(client *c) { robj *o; if ((o = lookupKeyReadOrReply(c, c->argv[1], shared.null[c->resp])) == NULL || checkType(c, o, OBJ_HASH)) return; - addHashFieldToReply(c, o, c->argv[2]->ptr); } @@ -961,12 +1005,16 @@ void hmgetCommand(client *c) { /* Don't abort when the key cannot be found. Non-existing keys are empty * hashes, where HMGET should respond with a series of null bulks. */ o = lookupKeyRead(c->db, c->argv[1]); + if (checkType(c, o, OBJ_HASH)) return; addReplyArrayLen(c, c->argc - 2); for (i = 2; i < c->argc; i++) { addHashFieldToReply(c, o, c->argv[i]->ptr); } + if (o && hashTypeLength(o) == 0) { + dbDelete(c->db, c->argv[1]); + } } void hdelCommand(client *c) { @@ -974,7 +1022,6 @@ void hdelCommand(client *c) { int j, deleted = 0, keyremoved = 0; if ((o = lookupKeyWriteOrReply(c, c->argv[1], shared.czero)) == NULL || checkType(c, o, OBJ_HASH)) return; - for (j = 2; j < c->argc; j++) { if (hashTypeDelete(o, c->argv[j]->ptr)) { deleted++; @@ -1028,10 +1075,395 @@ static void addHashIteratorCursorToReply(writePreparedClient *wpc, hashTypeItera } } +void hsetnxCommand(client *c) { + robj *o; + if ((o = hashTypeLookupWriteOrCreate(c, c->argv[1])) == NULL) return; + if (hashTypeExists(o, c->argv[2]->ptr)) { + addReply(c, shared.czero); + } else { + hashTypeTryConversion(o, c->argv, 2, 3); + hashTypeSet(o, c->argv[2]->ptr, c->argv[3]->ptr, EXPIRY_NONE, HASH_SET_COPY | 
HASH_SET_KEEP_EXPIRY); + signalModifiedKey(c, c->db, c->argv[1]); + notifyKeyspaceEvent(NOTIFY_HASH, "hset", c->argv[1], c->db->id); + server.dirty++; + addReply(c, shared.cone); + } +} + +void hsetCommand(client *c) { + int i, created = 0; + robj *o; + + if ((c->argc % 2) == 1) { + addReplyErrorArity(c); + return; + } + + if ((o = hashTypeLookupWriteOrCreate(c, c->argv[1])) == NULL) return; + hashTypeTryConversion(o, c->argv, 2, c->argc - 1); + + for (i = 2; i < c->argc; i += 2) created += !hashTypeSet(o, c->argv[i]->ptr, c->argv[i + 1]->ptr, EXPIRY_NONE, HASH_SET_COPY); + + signalModifiedKey(c, c->db, c->argv[1]); + notifyKeyspaceEvent(NOTIFY_HASH, "hset", c->argv[1], c->db->id); + server.dirty += (c->argc - 2) / 2; + + /* HMSET (deprecated) and HSET return value is different. */ + char *cmdname = c->argv[0]->ptr; + if (cmdname[1] == 's' || cmdname[1] == 'S') { + /* HSET */ + addReplyLongLong(c, created); + } else { + /* HMSET */ + addReply(c, shared.ok); + } +} + +/* High-Level Algorithm of HSETEX Command: + * + * - Parse arguments and options: + * Parses optional flags such as NX, XX, FNX, FXX, KEEPTTL, and expiration time options. + * Ensures the number of specified fields matches the actual provided key-value pairs. + * + * - Check object existence conditions: + * Depending on NX/XX flags, verifies whether the hash key must or must not exist. + * Exits early with a zero reply if conditions aren't met. + * + * - Create the hash object if needed: + * If the key does not exist and creation is permitted, allocates a new hash. + * + * - Handle expiration logic: + * Computes the expiry time (relative or absolute). + * If the expiration is in the past, the command proceeds to delete the relevant fields. + * + * - Enforce per-field conditions: + * If FNX (field must not exist) or FXX (field must exist) flags are set, + * ensures all fields satisfy these conditions before proceeding. 
+ * + * - Apply changes: + * Either deletes expired fields or sets fields with optional expiration. + * + * - Clean up and notify: + * Deletes the key if the hash becomes empty. + * Emits keyspace notifications for changes (see below). + * Modifies the command vector for AOF propagation if necessary. + * + * + * Return Value: + * - Returns integer 1 if all fields were successfully updated or deleted. + * - Returns integer 0 if no fields were updated due to condition failures. + * + * + * Keyspace Notifications (if enabled): + * - "hset" — Emitted when fields are added or updated. + * - "hexpire" — Emitted when expiration is set on fields. + * - "hexpired" — Emitted when fields are immediately expired and deleted. + * - "del" — Emitted if the entire key is removed (empty hash). + * + * + * Client Reply: + * - Integer reply: 1 if all changes succeeded, 0 if no changes occurred. */ +void hsetexCommand(client *c) { + robj *o; + robj *expire = NULL; + robj *comparison = NULL; + int unit = UNIT_SECONDS; + int flags = ARGS_NO_FLAGS; + int fields_index = 0; + long long num_fields = 0; + long long when = EXPIRY_NONE; + int i = 0; + int set_flags = HASH_SET_COPY, set_expired = 0; + int changes = 0; + robj **new_argv = NULL; + int new_argc = 0; + + for (; fields_index < c->argc; fields_index++) { + if (!strcasecmp(c->argv[fields_index]->ptr, "fields")) { + /* checking optional flags */ + if (parseExtendedCommandArgumentsOrReply(c, &flags, &unit, &expire, &comparison, COMMAND_HSET, fields_index++) != C_OK) return; + if (getLongLongFromObjectOrReply(c, c->argv[fields_index++], &num_fields, NULL) != C_OK) return; + break; + } + } + /* Check that the parsed fields number matches the real provided number of fields */ + if (!num_fields || num_fields != (c->argc - fields_index) / 2) { + addReplyError(c, "numfields should be greater than 0 and match the provided number of fields"); + return; + } + + o = lookupKeyWrite(c->db, c->argv[1]); + if (checkType(c, o, OBJ_HASH)) + return; + + 
if (o == NULL) { + o = createHashObject(); + dbAdd(c->db, c->argv[1], &o); + } + + /* Handle parsing and calculating the expiration time. */ + if (flags & ARGS_KEEPTTL) + set_flags |= HASH_SET_KEEP_EXPIRY; + else if (expire) { + long long basetime = (flags & (ARGS_EXAT | ARGS_PXAT)) ? 0 : commandTimeSnapshot(); + + if (convertExpireArgumentToUnixTime(c, expire, basetime, unit, &when) == C_ERR) + return; + + if (checkAlreadyExpired(when)) { + set_expired = 1; + } + } + + /* Check the FNX/FXX condition for every field before applying any change */ + if (flags & (ARGS_SET_FNX | ARGS_SET_FXX)) { + for (i = fields_index; i < c->argc; i += 2) { + if (((flags & ARGS_SET_FNX) && hashTypeExists(o, c->argv[i]->ptr)) || + ((flags & ARGS_SET_FXX) && !hashTypeExists(o, c->argv[i]->ptr))) { + addReply(c, shared.czero); + return; + } + } + } + + /* If the expiration time is already in the past, prepare a new argv, since we are going to delete the fields. */ + if (set_expired) { + new_argv = zmalloc(sizeof(robj *) * (num_fields + 2)); + new_argv[new_argc++] = shared.hdel; + incrRefCount(shared.hdel); + new_argv[new_argc++] = c->argv[1]; + incrRefCount(c->argv[1]); + } + + for (i = fields_index; i < c->argc; i += 2) { + if (set_expired) { + if (hashTypeDelete(o, c->argv[i]->ptr)) { + new_argv[new_argc++] = c->argv[i]; + incrRefCount(c->argv[i]); + changes++; + } + } else { + hashTypeSet(o, c->argv[i]->ptr, c->argv[i + 1]->ptr, when, set_flags); + changes++; + } + } + + + if (changes) { + notifyKeyspaceEvent(NOTIFY_HASH, "hset", c->argv[1], c->db->id); + if (set_expired) { + replaceClientCommandVector(c, new_argc, new_argv); + /* Emit a single hexpired event, since there are potentially many expired fields. */ + notifyKeyspaceEvent(NOTIFY_HASH, "hexpired", c->argv[1], c->db->id); + } else if (expire) { + /* Propagate as HSETEX Key Value PXAT millisecond-timestamp if there is an + * EX/PX/EXAT flag.
*/ + if (!(flags & ARGS_PXAT)) { + for (int i = 2; i < fields_index; i++) { + if (c->argv[i + 1] == expire) { + robj *milliseconds_obj = createStringObjectFromLongLong(when); + rewriteClientCommandArgument(c, i, shared.pxat); + rewriteClientCommandArgument(c, i + 1, milliseconds_obj); + decrRefCount(milliseconds_obj); + break; + } + } + } + notifyKeyspaceEvent(NOTIFY_HASH, "hexpire", c->argv[1], c->db->id); + } + signalModifiedKey(c, c->db, c->argv[1]); + /* Delete the object in case it was left empty */ + if (hashTypeLength(o) == 0) { + dbDelete(c->db, c->argv[1]); + notifyKeyspaceEvent(NOTIFY_GENERIC, "del", c->argv[1], c->db->id); + } + server.dirty += changes; + } else { + /* If no changes were done we still need to free the new argv array and the refcount of the first argument. */ + if (set_expired) + decrRefCount(c->argv[1]); + if (new_argv) zfree(new_argv); + } + addReplyLongLong(c, changes == num_fields ? 1 : 0); +} + +/* High-Level Algorithm of HGETEX Command: + * + * - Parses the command for optional arguments, including expiration options, + * persistence flags, and the list of hash fields to retrieve. + * + * - Verifies that the number of fields specified matches the actual arguments, + * and ensures the key exists and is a valid hash type. + * + * - Computes the expiration behavior: + * - If `PERSIST` is provided, removes the expiration from the fields. + * - If an expiration time is specified, calculates it relative or absolute. + * - If already expired, deletes the fields immediately. + * - Otherwise, schedules new expiration timestamps. + * + * - Retrieves and replies with the values for each requested field. + * + * - For each field: + * - If expiration is due: deletes the field. + * - If an expiry is scheduled: updates the field's expiration timestamp. + * - If persisting: clears the field's expiration. 
+ * + * - If any changes were made (deletes, expires, or persists): + * - Rewrites the command vector (for AOF and replication) using HDEL, HPEXPIREAT, or HPERSIST. + * - Issues keyspace notifications accordingly. + * - If the hash becomes empty as a result, deletes the key and notifies. + * + * + * Return Value: + * - Always replies with an array of values for the requested fields (including NULLs for missing fields). + * + * + * Keyspace Notifications (if enabled): + * - "hexpire" — When expiration is added to hash fields. + * - "hexpired" — When fields are immediately expired and deleted. + * - "hpersist" — When expiration is removed from fields. + * - "del" — If the hash becomes empty and is removed entirely. */ +void hgetexCommand(client *c) { + robj *o; + robj *expire = NULL; + robj *comparison = NULL; + int unit = UNIT_SECONDS; + int flags = ARGS_NO_FLAGS; + int fields_index = 0; + long long num_fields = -1; + long long when = EXPIRY_NONE; + int i = 0; + int set_expiry = 0, set_expired = 0, persist = 0; + int changes = 0; + robj **new_argv = NULL; + robj *milliseconds_obj = NULL, *numitems_obj = NULL; + int new_argc = 0; + int milliseconds_index = -1, numitems_index = -1; + + for (; fields_index < c->argc; fields_index++) { + if (!strcasecmp(c->argv[fields_index]->ptr, "fields")) { + /* checking optional flags */ + if (parseExtendedCommandArgumentsOrReply(c, &flags, &unit, &expire, &comparison, COMMAND_HGET, fields_index++) != C_OK) return; + if (getLongLongFromObjectOrReply(c, c->argv[fields_index++], &num_fields, NULL) != C_OK) return; + break; + } + } + + /* Check that the parsed fields number matches the real provided number of fields */ + if (!num_fields || num_fields != (c->argc - fields_index)) { + addReplyError(c, "numfields should be greater than 0 and match the provided number of fields"); + return; + } + + if ((o = lookupKeyReadOrReply(c, c->argv[1], shared.null[c->resp])) == NULL || checkType(c, o, OBJ_HASH)) return; + + /* Handle parsing and 
calculating the expiration time. */ + if (flags & ARGS_PERSIST) { + persist = 1; + } else if (expire) { + long long basetime = (flags & (ARGS_EXAT | ARGS_PXAT)) ? 0 : commandTimeSnapshot(); + + if (convertExpireArgumentToUnixTime(c, expire, basetime, unit, &when) == C_ERR) + return; + + if (checkAlreadyExpired(when)) { + set_expired = 1; + when = 0; + } else { + set_expiry = 1; + } + } + + initDeferredReplyBuffer(c); + + addReplyArrayLen(c, num_fields); + /* This command is never propagated as is. It is propagated as HDEL, HPEXPIREAT or HPERSIST. + * This is why it doesn't need special handling in feedAppendOnlyFile to convert a relative expire time to an absolute one. */ + if (set_expiry || set_expired || persist) { + /* Allocate a new client argv for replicating the command. */ + new_argv = zmalloc(sizeof(robj *) * (num_fields + 5)); + if (set_expired) + new_argv[new_argc++] = shared.hdel; + else if (persist) + new_argv[new_argc++] = shared.hpersist; + else + new_argv[new_argc++] = shared.hpexpireat; + + new_argv[new_argc++] = c->argv[1]; + incrRefCount(c->argv[1]); + + if (set_expiry) { + new_argv[new_argc++] = NULL; // placeholder for the expiration time + milliseconds_index = new_argc - 1; + } + + if (set_expiry || persist) { + new_argv[new_argc++] = shared.fields; + new_argv[new_argc++] = NULL; // placeholder for the number of objects + numitems_index = new_argc - 1; + } + } + for (i = fields_index; i < c->argc; i++) { + int changed = 0; + addHashFieldToReply(c, o, c->argv[i]->ptr); + if (set_expired) { + changed = hashTypeDelete(o, c->argv[i]->ptr); + } else if (set_expiry) { + changed = (hashTypeSetExpire(o, c->argv[i]->ptr, when, 0) == EXPIRATION_MODIFICATION_SUCCESSFUL) ?
1 : 0; + } + if (changed) { + changes++; + new_argv[new_argc++] = c->argv[i]; + incrRefCount(c->argv[i]); + } + } + + /* rewrite the command vector and persist in case there are changes. + * Also notify keyspace notifications and signal the key was changed. */ + if (changes) { + if (milliseconds_index > 0) { + milliseconds_obj = createStringObjectFromLongLong(when); + new_argv[milliseconds_index] = milliseconds_obj; + incrRefCount(milliseconds_obj); + } + if (numitems_index > 0) { + numitems_obj = createStringObjectFromLongLong(changes); + new_argv[numitems_index] = numitems_obj; + incrRefCount(numitems_obj); + } + replaceClientCommandVector(c, new_argc, new_argv); + if (set_expired) + notifyKeyspaceEvent(NOTIFY_HASH, "hexpired", c->argv[1], c->db->id); + else + notifyKeyspaceEvent(NOTIFY_HASH, set_expiry ? "hexpire" : "hpersist", c->argv[1], c->db->id); + if (milliseconds_obj) decrRefCount(milliseconds_obj); + if (numitems_obj) decrRefCount(numitems_obj); + + server.dirty += changes; + signalModifiedKey(c, c->db, c->argv[1]); + + /* Delete the object in case it was left empty */ + if (hashTypeLength(o) == 0) { + dbDelete(c->db, c->argv[1]); + notifyKeyspaceEvent(NOTIFY_GENERIC, "del", c->argv[1], c->db->id); + } + } else { + /* If no changes were done we still need to free the new argv array and the refcount of the first argument. */ + if (set_expiry || set_expired || persist) { + decrRefCount(c->argv[1]); + } + if (new_argv) zfree(new_argv); + } + + commitDeferredReplyBuffer(c, 1); +} + void genericHgetallCommand(client *c, int flags) { robj *o; hashTypeIterator hi; - int length, count = 0; + int count = 0; robj *emptyResp = (flags & OBJ_HASH_FIELD && flags & OBJ_HASH_VALUE) ? 
shared.emptymap[c->resp] : shared.emptyarray; if ((o = lookupKeyReadOrReply(c, c->argv[1], emptyResp)) == NULL || checkType(c, o, OBJ_HASH)) return; @@ -1040,13 +1472,7 @@ void genericHgetallCommand(client *c, int flags) { if (!wpc) return; /* We return a map if the user requested fields and values, like in the * HGETALL case. Otherwise to use a flat array makes more sense. */ - length = hashTypeLength(o); - if (flags & OBJ_HASH_FIELD && flags & OBJ_HASH_VALUE) { - addWritePreparedReplyMapLen(wpc, length); - } else { - addWritePreparedReplyArrayLen(wpc, length); - } - + void *replylen = addReplyDeferredLen(c); hashTypeInitIterator(o, &hi); while (hashTypeNext(&hi) != C_ERR) { if (flags & OBJ_HASH_FIELD) { @@ -1060,10 +1486,13 @@ void genericHgetallCommand(client *c, int flags) { } hashTypeResetIterator(&hi); - /* Make sure we returned the right number of elements. */ - if (flags & OBJ_HASH_FIELD && flags & OBJ_HASH_VALUE) count /= 2; - serverAssert(count == length); + if (flags & OBJ_HASH_FIELD && flags & OBJ_HASH_VALUE) { + count /= 2; + setDeferredMapLen(c, replylen, count); + } else { + setDeferredArrayLen(c, replylen, count); + } } void hkeysCommand(client *c) { @@ -1081,7 +1510,6 @@ void hgetallCommand(client *c) { void hexistsCommand(client *c) { robj *o; if ((o = lookupKeyReadOrReply(c, c->argv[1], shared.czero)) == NULL || checkType(c, o, OBJ_HASH)) return; - addReply(c, hashTypeExists(o, c->argv[2]->ptr) ? shared.cone : shared.czero); } @@ -1111,6 +1539,281 @@ static void hrandfieldReplyWithListpack(writePreparedClient *wpc, unsigned int c } } + +/* High-Level Algorithm of hexpireGenericCommand (used by HEXPIRE, HPEXPIRE, HEXPIREAT, HPEXPIREAT): + * + * - Parses optional flags and the number of hash fields to apply expiration to. + * + * - Converts the given expiration time (relative or absolute) into a Unix timestamp. + * + * - Determines if the given timestamp is already expired: + * - If so, immediately deletes the specified hash fields.
+ * - If not, updates their expiration metadata. + * + * - Responds with an array of integers: + * - 1 if the expiration was set. + * - 0 if it was unchanged (because the provided condition was not met). + * - -2 if the field does not exist or the hash does not exist. + * - 2 if the field was immediately expired and deleted because the provided expiration is 0 or in the past. + * + * - If fields were deleted due to expiration: + * - Rewrites the command as HDEL for replication/AOF. + * - Emits a "hexpired" keyspace event. + * + * - If expiration was newly set: + * - May rewrite the command as HPEXPIREAT if needed. + * - Emits a "hexpire" keyspace event. + * + * - If the hash becomes empty after deletions: + * - Deletes the hash key. + * - Emits a "del" event for the key. + * + * Return Value: + * - An array of integers corresponding to the result for each field. + * + * Keyspace Notifications (if enabled): + * - "hexpired": when fields are immediately expired and deleted. + * - "hexpire": when fields receive new expiration timestamps. + * - "del": when the hash key becomes empty and is removed. */ +void hexpireGenericCommand(client *c, long long basetime, int unit) { + robj *key = c->argv[1], *param = c->argv[2]; + long long when; /* Unix time in milliseconds at which the fields will expire.
*/ + int flag = 0; + int fields_index = 3; + long long num_fields = 0; + int i, expired = 0, updated = 0; + int set_expired = 0; + robj **new_argv = NULL; + int new_argc = 0; + + for (; fields_index < c->argc; fields_index++) { + if (!strcasecmp(c->argv[fields_index]->ptr, "fields")) { + /* checking optional flags */ + if (parseExtendedExpireArgumentsOrReply(c, &flag, fields_index++) != C_OK) return; + if (getLongLongFromObjectOrReply(c, c->argv[fields_index++], &num_fields, NULL) != C_OK) return; + break; + } + } + + /* Check that the parsed fields number matches the real provided number of fields */ + if (!num_fields || num_fields != (c->argc - fields_index)) { + addReplyError(c, "numfields should be greater than 0 and match the provided number of fields"); + return; + } + + if (convertExpireArgumentToUnixTime(c, param, basetime, unit, &when) == C_ERR) + return; + + if (checkAlreadyExpired(when)) + set_expired = 1; + + robj *obj = lookupKeyWrite(c->db, key); + + /* Non HASH type return simple error */ + if (checkType(c, obj, OBJ_HASH)) { + return; + } + /* From this point we would return array reply */ + addReplyArrayLen(c, num_fields); + + /* In case we are expiring all the elements prepare a new argv since we are going to delete all the expired fields. */ + if (set_expired) { + new_argv = zmalloc(sizeof(robj *) * (num_fields + 3)); + new_argv[new_argc++] = shared.hdel; + incrRefCount(shared.hdel); + new_argv[new_argc++] = c->argv[1]; + incrRefCount(c->argv[1]); + } + + for (i = 0; i < num_fields; i++) { + expiryModificationResult result = EXPIRATION_MODIFICATION_NOT_EXIST; + if (set_expired) { + if (obj && hashTypeDelete(obj, c->argv[fields_index + i]->ptr)) { + /* In case we deleted the field, add it to the new hdel command vector. 
*/ + new_argv[new_argc++] = c->argv[fields_index + i]; + incrRefCount(c->argv[fields_index + i]); + result = EXPIRATION_MODIFICATION_EXPIRE_ASAP; + expired++; + } + } else { + result = hashTypeSetExpire(obj, c->argv[fields_index + i]->ptr, when, flag); + if (result == EXPIRATION_MODIFICATION_SUCCESSFUL) updated++; + } + addReplyLongLong(c, result); + } + + if (expired || updated) { + if (expired) { + replaceClientCommandVector(c, new_argc, new_argv); + /* Emit a single hexpired event, since there are potentially many expired fields. */ + notifyKeyspaceEvent(NOTIFY_HASH, "hexpired", c->argv[1], c->db->id); + } else if (updated) { + /* Propagate as HPEXPIREAT millisecond-timestamp. + * Only rewrite the command arg if not already HPEXPIREAT. */ + if (c->cmd->proc != hpexpireatCommand) { + rewriteClientCommandArgument(c, 0, shared.hpexpireat); + } + + /* Avoid creating a string object when it's the same as the argv[2] parameter */ + if (basetime != 0 || unit == UNIT_SECONDS) { + robj *when_obj = createStringObjectFromLongLong(when); + rewriteClientCommandArgument(c, 2, when_obj); + decrRefCount(when_obj); + } + notifyKeyspaceEvent(NOTIFY_HASH, "hexpire", c->argv[1], c->db->id); + } + server.dirty += (expired + updated); // count every deleted or updated field as a change + signalModifiedKey(c, c->db, c->argv[1]); + /* Delete the object in case it was left empty */ + if (hashTypeLength(obj) == 0) { + dbDelete(c->db, c->argv[1]); + notifyKeyspaceEvent(NOTIFY_GENERIC, "del", c->argv[1], c->db->id); + } + } +} + +void hexpireCommand(client *c) { + hexpireGenericCommand(c, commandTimeSnapshot(), UNIT_SECONDS); +} + +void hexpireatCommand(client *c) { + hexpireGenericCommand(c, 0, UNIT_SECONDS); +} + +void hpexpireCommand(client *c) { + hexpireGenericCommand(c, commandTimeSnapshot(), UNIT_MILLISECONDS); +} + +void hpexpireatCommand(client *c) { + hexpireGenericCommand(c, 0, UNIT_MILLISECONDS); +} + +/* High-Level Algorithm of HPERSIST Command: + * + * - 
Expects a key and a list of hash fields whose expiration metadata should be removed. + * - Validates that the number of provided fields matches the declared count. + * + * - For each specified field, attempts to remove any existing expiration. + * - Replies to the client with an array of integers, each representing the result of persistence for one field: + * - 1 if the expiration for the field was removed. + * - -1 if the field exists, but has no expiration time set. + * - -2 if the field does not exist or the hash does not exist. + * + * - If any expirations were removed: + * - Marks the key as modified (for replication/AOF consistency). + * - Emits a "hpersist" keyspace notification. + * + * Keyspace Notifications (if enabled): + * - "hpersist": emitted once if any field had its expiration removed. */ +void hpersistCommand(client *c) { + int fields_index = 4, result = 0, changes = 0; + long long num_fields = 0; + + if (getLongLongFromObjectOrReply(c, c->argv[fields_index - 1], &num_fields, NULL) != C_OK) return; + + /* Check that the parsed fields number matches the real provided number of fields */ + if (!num_fields || num_fields != (c->argc - fields_index)) { + addReplyError(c, "numfields should be greater than 0 and match the provided number of fields"); + return; + } + + robj *hash = lookupKeyWrite(c->db, c->argv[1]); + if (checkType(c, hash, OBJ_HASH)) + return; + + /* From this point on we return an array reply (the type check must come first, so a WRONGTYPE error is never sent after the array header) */ + addReplyArrayLen(c, num_fields); + + for (int i = 0; i < num_fields; i++, fields_index++) { + result = hashTypePersist(hash, c->argv[fields_index]->ptr); + if (result == EXPIRATION_MODIFICATION_SUCCESSFUL) { + server.dirty++; + changes++; + } + addReplyLongLong(c, result); + } + if (changes) { + notifyKeyspaceEvent(NOTIFY_HASH, "hpersist", c->argv[1], c->db->id); + signalModifiedKey(c, c->db, c->argv[1]); + } +} + +/* High-Level Algorithm of HTTL / HPTTL / HEXPIRETIME / HPEXPIRETIME Commands: + * + * - These commands return the remaining time to live
(TTL) or absolute expiry time + * of one or more fields in a hash. + * + * - HTTL / HPTTL: + * - Return relative TTL of each field (in seconds or milliseconds). + * - TTL is computed as the difference between current time and expiry time. + * + * - HEXPIRETIME / HPEXPIRETIME: + * - Return the absolute Unix time at which each field will expire + * (in seconds or milliseconds, depending on the variant). + * + * For each field requested: + * - If the field or hash does not exist: reply with -2. + * - If the field exists but has no expiration: reply with -1. + * - If the field has an expiration: + * - HTTL / HPTTL: reply with remaining TTL (clamped at 0 if negative). + * - HEXPIRETIME / HPEXPIRETIME: reply with the absolute expiry time. + * + * Return Value: + * - An array of integers, one per field: + * - -2 = hash or field does not exist. + * - -1 = field exists but has no expiration. + * - >=0 = TTL or expiry time, depending on the command variant. + * + * Keyspace Notifications: + * - None emitted; this command is read-only. 
*/ +void httlGenericCommand(client *c, long long basetime, int unit) { + int fields_index = 4; + long long num_fields = 0, result = -2; + + if (getLongLongFromObjectOrReply(c, c->argv[fields_index - 1], &num_fields, NULL) != C_OK) return; + + /* Check that the parsed fields number matches the real provided number of fields */ + if (!num_fields || num_fields != (c->argc - fields_index)) { + addReplyErrorObject(c, shared.syntaxerr); + return; + } + + robj *hash = lookupKeyRead(c->db, c->argv[1]); + + if (checkType(c, hash, OBJ_HASH)) return; + + /* From this point we would return array reply */ + addReplyArrayLen(c, num_fields); + + for (int i = 0; i < num_fields; i++) { + if (!hash || hashTypeGetExpiry(hash, c->argv[fields_index + i]->ptr, &result) == C_ERR) { + addReplyLongLong(c, -2); + } else if (result == EXPIRY_NONE) { + addReplyLongLong(c, -1); + } else { + result = result - basetime; + if (result < 0) result = 0; + addReplyLongLong(c, unit == UNIT_MILLISECONDS ? result : ((result + 500) / 1000)); + } + } +} + +void httlCommand(client *c) { + httlGenericCommand(c, commandTimeSnapshot(), UNIT_SECONDS); +} + +void hpttlCommand(client *c) { + httlGenericCommand(c, commandTimeSnapshot(), UNIT_MILLISECONDS); +} + +void hexpiretimeCommand(client *c) { + httlGenericCommand(c, 0, UNIT_SECONDS); +} + +void hpexpiretimeCommand(client *c) { + httlGenericCommand(c, 0, UNIT_MILLISECONDS); +} + /* How many times bigger should be the hash compared to the requested size * for us to not use the "remove elements" strategy? Read later in the * implementation for more info. */ @@ -1144,26 +1847,30 @@ void hrandfieldWithCountCommand(client *c, long l, int withvalues) { writePreparedClient *wpc = prepareClientForFutureWrites(c); if (!wpc) return; + + void *replylen = addReplyDeferredLen(c); + unsigned long reply_size = 0; + /* CASE 1: The count was negative, so the extraction method is just: * "return N random elements" sampling the whole set every time. 
* This case is trivial and can be served without auxiliary data * structures. This case is the only one that also needs to return the * elements in random order. */ if (!uniq || count == 1) { - if (withvalues && c->resp == 2) - addWritePreparedReplyArrayLen(wpc, count * 2); - else - addWritePreparedReplyArrayLen(wpc, count); if (hash->encoding == OBJ_ENCODING_HASHTABLE) { while (count--) { - void *entry; - hashtableFairRandomEntry(hash->ptr, &entry); - sds field = hashTypeEntryGetField(entry); - sds value = hashTypeEntryGetValue(entry); + listpackEntry field, value; + hashTypeRandomElement(hash, size, &field, &value); + + /* In case we were unable to locate random element, it is probably because there is no such element + * since all elements are expired. */ + if (!field.sval) break; + if (withvalues && c->resp > 2) addWritePreparedReplyArrayLen(wpc, 2); - addWritePreparedReplyBulkCBuffer(wpc, field, sdslen(field)); - if (withvalues) addWritePreparedReplyBulkCBuffer(wpc, value, sdslen(value)); + addWritePreparedReplyBulkCBuffer(wpc, field.sval, field.slen); + if (withvalues) addWritePreparedReplyBulkCBuffer(wpc, value.sval, value.slen); if (c->flag.close_asap) break; + reply_size++; } } else if (hash->encoding == OBJ_ENCODING_LISTPACK) { listpackEntry *fields, *vals = NULL; @@ -1175,6 +1882,7 @@ void hrandfieldWithCountCommand(client *c, long l, int withvalues) { while (count) { sample_count = count > limit ? limit : count; count -= sample_count; + reply_size += sample_count; lpRandomPairs(hash->ptr, sample_count, fields, vals); hrandfieldReplyWithListpack(wpc, sample_count, fields, vals); if (c->flag.close_asap) break; @@ -1182,16 +1890,9 @@ void hrandfieldWithCountCommand(client *c, long l, int withvalues) { zfree(fields); zfree(vals); } - return; + goto set_deferred_response; } - /* Initiate reply count, RESP3 responds with nested array, RESP2 with flat one. */ - long reply_size = count < size ? 
count : size; - if (withvalues && c->resp == 2) - addWritePreparedReplyArrayLen(wpc, reply_size * 2); - else - addWritePreparedReplyArrayLen(wpc, reply_size); - /* CASE 2: * The number of requested elements is greater than the number of * elements inside the hash: simply return the whole hash. */ @@ -1202,11 +1903,14 @@ void hrandfieldWithCountCommand(client *c, long l, int withvalues) { if (withvalues && c->resp > 2) addWritePreparedReplyArrayLen(wpc, 2); addHashIteratorCursorToReply(wpc, &hi, OBJ_HASH_FIELD); if (withvalues) addHashIteratorCursorToReply(wpc, &hi, OBJ_HASH_VALUE); + reply_size++; } hashTypeResetIterator(&hi); - return; + + goto set_deferred_response; } + /* CASE 2.5 listpack only. Sampling unique elements, in non-random order. * Listpack encoded hashes are meant to be relatively small, so * HRANDFIELD_SUB_STRATEGY_MUL isn't necessary and we rather not make @@ -1216,6 +1920,7 @@ void hrandfieldWithCountCommand(client *c, long l, int withvalues) { * And it is inefficient to repeatedly pick one random element from a * listpack in CASE 4. So we use this instead. */ if (hash->encoding == OBJ_ENCODING_LISTPACK) { + reply_size = count < size ? count : size; listpackEntry *fields, *vals = NULL; fields = zmalloc(sizeof(listpackEntry) * count); if (withvalues) vals = zmalloc(sizeof(listpackEntry) * count); @@ -1223,7 +1928,7 @@ void hrandfieldWithCountCommand(client *c, long l, int withvalues) { hrandfieldReplyWithListpack(wpc, count, fields, vals); zfree(fields); zfree(vals); - return; + goto set_deferred_response; } /* CASE 3: @@ -1247,24 +1952,25 @@ void hrandfieldWithCountCommand(client *c, long l, int withvalues) { while (hashtableNext(&iter, &entry)) { int res = hashtableAdd(ht, entry); serverAssert(res); + reply_size++; } - serverAssert(hashtableSize(ht) == size); + serverAssert(hashtableSize(ht) == reply_size); hashtableResetIterator(&iter); /* Remove random elements to reach the right count. 
*/ - while (size > count) { + while (reply_size > count) { void *element; hashtableFairRandomEntry(ht, &element); hashtableDelete(ht, element); - size--; + reply_size--; } /* Reply with what's in the temporary hashtable and release memory */ hashtableInitIterator(&iter, ht, 0); void *next; while (hashtableNext(&iter, &next)) { - sds field = hashTypeEntryGetField(next); - sds value = hashTypeEntryGetValue(next); + sds field = entryGetField(next); + sds value = entryGetValue(next); if (withvalues && c->resp > 2) addWritePreparedReplyArrayLen(wpc, 2); addWritePreparedReplyBulkCBuffer(wpc, field, sdslen(field)); if (withvalues) addWritePreparedReplyBulkCBuffer(wpc, value, sdslen(value)); @@ -1287,8 +1993,12 @@ void hrandfieldWithCountCommand(client *c, long l, int withvalues) { while (added < count) { hashTypeRandomElement(hash, size, &field, withvalues ? &value : NULL); - /* Try to add the object to the hashtable. If it already exists - * free it, otherwise increment the number of objects we have + /* In case we were unable to locate random element, it is probably because there is no such element + * since all elements are expired. */ + if (!field.sval) break; + + /* Try to add the object to the hashtable. If expired, stop adding (there are probably none left). + * If it already exists free it, otherwise increment the number of objects we have * in the result hashtable. */ sds sfield = hashSdsFromListpackEntry(&field); if (!hashtableAdd(ht, sfield)) { @@ -1305,7 +2015,15 @@ void hrandfieldWithCountCommand(client *c, long l, int withvalues) { /* Release memory */ hashtableRelease(ht); + reply_size = added; } + +set_deferred_response: + /* Set the reply count, RESP3 responds with nested array, RESP2 with flat one. 
*/ + if (withvalues && c->resp == 2) + setDeferredArrayLen(c, replylen, reply_size * 2); + else + setDeferredArrayLen(c, replylen, reply_size); } /* HRANDFIELD key [ [WITHVALUES]] */ @@ -1328,6 +2046,7 @@ void hrandfieldCommand(client *c) { } } hrandfieldWithCountCommand(c, l, withvalues); + return; } @@ -1335,7 +2054,6 @@ void hrandfieldCommand(client *c) { if ((hash = lookupKeyReadOrReply(c, c->argv[1], shared.null[c->resp])) == NULL || checkType(c, hash, OBJ_HASH)) { return; } - hashTypeRandomElement(hash, hashTypeLength(hash), &ele, NULL); hashReplyFromListpackEntry(c, &ele); } diff --git a/src/t_string.c b/src/t_string.c index ef3e4bccde7..a8c46a8a913 100644 --- a/src/t_string.c +++ b/src/t_string.c @@ -55,6 +55,9 @@ static int checkStringLength(client *c, long long size, long long append) { return C_OK; } +/* Forward declaration */ +static int getExpireMillisecondsOrReply(client *c, robj *expire, int flags, int unit, long long *milliseconds); + /* The setGenericCommand() function implements the SET operation with different * options and variants. This function is called in order to implement the * following commands: SET, SETEX, PSETEX, SETNX, GETSET. @@ -70,24 +73,6 @@ static int checkStringLength(client *c, long long size, long long append) { * * If ok_reply is NULL "+OK" is used. * If abort_reply is NULL, "$-1" is used. */ - -#define OBJ_NO_FLAGS 0 -#define OBJ_SET_NX (1 << 0) /* Set if key not exists. */ -#define OBJ_SET_XX (1 << 1) /* Set if key exists. 
*/ -#define OBJ_EX (1 << 2) /* Set if time in seconds is given */ -#define OBJ_PX (1 << 3) /* Set if time in ms in given */ -#define OBJ_KEEPTTL (1 << 4) /* Set and keep the ttl */ -#define OBJ_SET_GET (1 << 5) /* Set if want to get key before set */ -#define OBJ_EXAT (1 << 6) /* Set if timestamp in second is given */ -#define OBJ_PXAT (1 << 7) /* Set if timestamp in ms is given */ -#define OBJ_PERSIST (1 << 8) /* Set if we need to remove the ttl */ -#define OBJ_SET_IFEQ (1 << 9) /* Set if we need compare and set */ -#define OBJ_ARGV3 (1 << 10) /* Set if the value is at argv[3]; otherwise it's \ - * at argv[2]. */ - -/* Forward declaration */ -static int getExpireMillisecondsOrReply(client *c, robj *expire, int flags, int unit, long long *milliseconds); - void setGenericCommand(client *c, int flags, robj *key, @@ -105,7 +90,7 @@ void setGenericCommand(client *c, return; } - if (flags & OBJ_SET_GET) { + if (flags & ARGS_SET_GET) { initDeferredReplyBuffer(c); if (getGenericCommand(c) == C_ERR) goto cleanup; } @@ -114,26 +99,26 @@ void setGenericCommand(client *c, found = existing_value != NULL; /* Handle the IFEQ conditional check */ - if (flags & OBJ_SET_IFEQ && found) { - if (!(flags & OBJ_SET_GET) && checkType(c, existing_value, OBJ_STRING)) { + if (flags & ARGS_SET_IFEQ && found) { + if (!(flags & ARGS_SET_GET) && checkType(c, existing_value, OBJ_STRING)) { goto cleanup; } if (compareStringObjects(existing_value, comparison) != 0) { - if (!(flags & OBJ_SET_GET)) { + if (!(flags & ARGS_SET_GET)) { addReply(c, abort_reply ? abort_reply : shared.null[c->resp]); } goto cleanup; } - } else if (flags & OBJ_SET_IFEQ && !found) { - if (!(flags & OBJ_SET_GET)) { + } else if (flags & ARGS_SET_IFEQ && !found) { + if (!(flags & ARGS_SET_GET)) { addReply(c, abort_reply ? 
abort_reply : shared.null[c->resp]); } goto cleanup; } - if ((flags & OBJ_SET_NX && found) || (flags & OBJ_SET_XX && !found)) { - if (!(flags & OBJ_SET_GET)) { + if ((flags & ARGS_SET_NX && found) || (flags & ARGS_SET_XX && !found)) { + if (!(flags & ARGS_SET_GET)) { addReply(c, abort_reply ? abort_reply : shared.null[c->resp]); } goto cleanup; @@ -144,13 +129,13 @@ void setGenericCommand(client *c, * If the key already exists, delete it. */ if (expire && checkAlreadyExpired(milliseconds)) { if (found) deleteExpiredKeyFromOverwriteAndPropagate(c, key); - if (!(flags & OBJ_SET_GET)) addReply(c, shared.ok); + if (!(flags & ARGS_SET_GET)) addReply(c, shared.ok); goto cleanup; } /* When expire is not NULL, we avoid deleting the TTL so it can be updated later instead of being deleted and then * created again. */ - setkey_flags |= ((flags & OBJ_KEEPTTL) || expire) ? SETKEY_KEEPTTL : 0; + setkey_flags |= ((flags & ARGS_KEEPTTL) || expire) ? SETKEY_KEEPTTL : 0; setkey_flags |= found ? SETKEY_ALREADY_EXIST : SETKEY_DOESNT_EXIST; setKey(c, c->db, key, &val, setkey_flags); @@ -158,7 +143,7 @@ void setGenericCommand(client *c, /* By setting the reallocated value back into argv, we can avoid duplicating * a large string value when adding it to the db. */ - c->argv[(flags & OBJ_ARGV3) ? 3 : 2] = val; + c->argv[(flags & ARGS_ARGV3) ? 3 : 2] = val; incrRefCount(val); server.dirty++; @@ -167,7 +152,7 @@ void setGenericCommand(client *c, if (expire) { /* Propagate as SET Key Value PXAT millisecond-timestamp if there is * EX/PX/EXAT flag. 
*/ - if (!(flags & OBJ_PXAT)) { + if (!(flags & ARGS_PXAT)) { robj *milliseconds_obj = createStringObjectFromLongLong(milliseconds); rewriteClientCommandVector(c, 5, shared.set, key, val, shared.pxat, milliseconds_obj); decrRefCount(milliseconds_obj); @@ -175,13 +160,13 @@ void setGenericCommand(client *c, notifyKeyspaceEvent(NOTIFY_GENERIC, "expire", key, c->db->id); } - if (!(flags & OBJ_SET_GET)) { + if (!(flags & ARGS_SET_GET)) { addReply(c, ok_reply ? ok_reply : shared.ok); } /* Propagate without the GET argument (Isn't needed if we had expire since in that case we completely re-written the * command argv) */ - if ((flags & OBJ_SET_GET) && !expire) { + if ((flags & ARGS_SET_GET) && !expire) { int argc = 0; int j; robj **argv = zmalloc((c->argc - 1) * sizeof(robj *)); @@ -227,7 +212,7 @@ static int getExpireMillisecondsOrReply(client *c, robj *expire, int flags, int if (unit == UNIT_SECONDS) *milliseconds *= 1000; - if ((flags & OBJ_PX) || (flags & OBJ_EX)) { + if ((flags & ARGS_PX) || (flags & ARGS_EX)) { *milliseconds += commandTimeSnapshot(); } @@ -240,118 +225,6 @@ static int getExpireMillisecondsOrReply(client *c, robj *expire, int flags, int return C_OK; } -#define COMMAND_GET 0 -#define COMMAND_SET 1 -/* - * The parseExtendedStringArgumentsOrReply() function performs the common validation for extended - * string arguments used in SET and GET command. - * - * Get specific commands - PERSIST/DEL - * Set specific commands - XX/NX/GET/IFEQ - * Common commands - EX/EXAT/PX/PXAT/KEEPTTL - * - * Function takes pointers to client, flags, unit, pointer to pointer of expire obj if needed - * to be determined and command_type which can be COMMAND_GET or COMMAND_SET. - * - * If there are any syntax violations C_ERR is returned else C_OK is returned. - * - * Input flags are updated upon parsing the arguments. Unit and expire are updated if there are any - * EX/EXAT/PX/PXAT arguments. Unit is updated to millisecond if PX/PXAT is set. 
- */ -int parseExtendedStringArgumentsOrReply(client *c, int *flags, int *unit, robj **expire, robj **compare_val, int command_type) { - int j = command_type == COMMAND_GET ? 2 : 3; - for (; j < c->argc; j++) { - char *opt = c->argv[j]->ptr; - robj *next = (j == c->argc - 1) ? NULL : c->argv[j + 1]; - - /* clang-format off */ - if ((opt[0] == 'n' || opt[0] == 'N') && - (opt[1] == 'x' || opt[1] == 'X') && opt[2] == '\0' && - !(*flags & OBJ_SET_XX || *flags & OBJ_SET_IFEQ) && (command_type == COMMAND_SET)) - { - *flags |= OBJ_SET_NX; - } else if ((opt[0] == 'x' || opt[0] == 'X') && - (opt[1] == 'x' || opt[1] == 'X') && opt[2] == '\0' && - !(*flags & OBJ_SET_NX || *flags & OBJ_SET_IFEQ) && (command_type == COMMAND_SET)) - { - *flags |= OBJ_SET_XX; - } else if ((opt[0] == 'i' || opt[0] == 'I') && - (opt[1] == 'f' || opt[1] == 'F') && - (opt[2] == 'e' || opt[2] == 'E') && - (opt[3] == 'q' || opt[3] == 'Q') && opt[4] == '\0' && - next && !(*flags & OBJ_SET_NX || *flags & OBJ_SET_XX || *flags & OBJ_SET_IFEQ) && (command_type == COMMAND_SET)) - { - *flags |= OBJ_SET_IFEQ; - *compare_val = next; - j++; - } else if ((opt[0] == 'g' || opt[0] == 'G') && - (opt[1] == 'e' || opt[1] == 'E') && - (opt[2] == 't' || opt[2] == 'T') && opt[3] == '\0' && - (command_type == COMMAND_SET)) - { - *flags |= OBJ_SET_GET; - } else if (!strcasecmp(opt, "KEEPTTL") && !(*flags & OBJ_PERSIST) && - !(*flags & OBJ_EX) && !(*flags & OBJ_EXAT) && - !(*flags & OBJ_PX) && !(*flags & OBJ_PXAT) && (command_type == COMMAND_SET)) - { - *flags |= OBJ_KEEPTTL; - } else if (!strcasecmp(opt,"PERSIST") && (command_type == COMMAND_GET) && - !(*flags & OBJ_EX) && !(*flags & OBJ_EXAT) && - !(*flags & OBJ_PX) && !(*flags & OBJ_PXAT) && - !(*flags & OBJ_KEEPTTL)) - { - *flags |= OBJ_PERSIST; - } else if ((opt[0] == 'e' || opt[0] == 'E') && - (opt[1] == 'x' || opt[1] == 'X') && opt[2] == '\0' && - !(*flags & OBJ_KEEPTTL) && !(*flags & OBJ_PERSIST) && - !(*flags & OBJ_EXAT) && !(*flags & OBJ_PX) && - !(*flags & 
OBJ_PXAT) && next) - { - *flags |= OBJ_EX; - *expire = next; - j++; - } else if ((opt[0] == 'p' || opt[0] == 'P') && - (opt[1] == 'x' || opt[1] == 'X') && opt[2] == '\0' && - !(*flags & OBJ_KEEPTTL) && !(*flags & OBJ_PERSIST) && - !(*flags & OBJ_EX) && !(*flags & OBJ_EXAT) && - !(*flags & OBJ_PXAT) && next) - { - *flags |= OBJ_PX; - *unit = UNIT_MILLISECONDS; - *expire = next; - j++; - } else if ((opt[0] == 'e' || opt[0] == 'E') && - (opt[1] == 'x' || opt[1] == 'X') && - (opt[2] == 'a' || opt[2] == 'A') && - (opt[3] == 't' || opt[3] == 'T') && opt[4] == '\0' && - !(*flags & OBJ_KEEPTTL) && !(*flags & OBJ_PERSIST) && - !(*flags & OBJ_EX) && !(*flags & OBJ_PX) && - !(*flags & OBJ_PXAT) && next) - { - *flags |= OBJ_EXAT; - *expire = next; - j++; - } else if ((opt[0] == 'p' || opt[0] == 'P') && - (opt[1] == 'x' || opt[1] == 'X') && - (opt[2] == 'a' || opt[2] == 'A') && - (opt[3] == 't' || opt[3] == 'T') && opt[4] == '\0' && - !(*flags & OBJ_KEEPTTL) && !(*flags & OBJ_PERSIST) && - !(*flags & OBJ_EX) && !(*flags & OBJ_EXAT) && - !(*flags & OBJ_PX) && next) - { - *flags |= OBJ_PXAT; - *unit = UNIT_MILLISECONDS; - *expire = next; - j++; - } else { - addReplyErrorObject(c,shared.syntaxerr); - return C_ERR; - } - /* clang-format on */ - } - return C_OK; -} - /* SET key value [NX | XX | IFEQ comparison-value] [GET] * [EX seconds | PX milliseconds | * EXAT seconds-timestamp | PXAT milliseconds-timestamp | KEEPTTL] */ @@ -359,9 +232,9 @@ void setCommand(client *c) { robj *expire = NULL; robj *comparison = NULL; int unit = UNIT_SECONDS; - int flags = OBJ_NO_FLAGS; + int flags = ARGS_NO_FLAGS; - if (parseExtendedStringArgumentsOrReply(c, &flags, &unit, &expire, &comparison, COMMAND_SET) != C_OK) { + if (parseExtendedCommandArgumentsOrReply(c, &flags, &unit, &expire, &comparison, COMMAND_SET, c->argc) != C_OK) { return; } @@ -371,17 +244,17 @@ void setCommand(client *c) { void setnxCommand(client *c) { c->argv[2] = tryObjectEncoding(c->argv[2]); - setGenericCommand(c, OBJ_SET_NX, 
c->argv[1], c->argv[2], NULL, 0, shared.cone, shared.czero, NULL); + setGenericCommand(c, ARGS_SET_NX, c->argv[1], c->argv[2], NULL, 0, shared.cone, shared.czero, NULL); } void setexCommand(client *c) { c->argv[3] = tryObjectEncoding(c->argv[3]); - setGenericCommand(c, OBJ_EX | OBJ_ARGV3, c->argv[1], c->argv[3], c->argv[2], UNIT_SECONDS, NULL, NULL, NULL); + setGenericCommand(c, ARGS_EX | ARGS_ARGV3, c->argv[1], c->argv[3], c->argv[2], UNIT_SECONDS, NULL, NULL, NULL); } void psetexCommand(client *c) { c->argv[3] = tryObjectEncoding(c->argv[3]); - setGenericCommand(c, OBJ_PX | OBJ_ARGV3, c->argv[1], c->argv[3], c->argv[2], UNIT_MILLISECONDS, NULL, NULL, NULL); + setGenericCommand(c, ARGS_PX | ARGS_ARGV3, c->argv[1], c->argv[3], c->argv[2], UNIT_MILLISECONDS, NULL, NULL, NULL); } /* DELIFEQ key value */ @@ -445,9 +318,9 @@ void getCommand(client *c) { void getexCommand(client *c) { robj *expire = NULL; int unit = UNIT_SECONDS; - int flags = OBJ_NO_FLAGS; + int flags = ARGS_NO_FLAGS; - if (parseExtendedStringArgumentsOrReply(c, &flags, &unit, &expire, NULL, COMMAND_GET) != C_OK) { + if (parseExtendedCommandArgumentsOrReply(c, &flags, &unit, &expire, NULL, COMMAND_GET, c->argc) != C_OK) { return; } @@ -472,7 +345,7 @@ void getexCommand(client *c) { /* This command is never propagated as is. It is either propagated as PEXPIRE[AT],DEL,UNLINK or PERSIST. * This why it doesn't need special handling in feedAppendOnlyFile to convert relative expire time to absolute one. */ - if (((flags & OBJ_PXAT) || (flags & OBJ_EXAT)) && checkAlreadyExpired(milliseconds)) { + if (((flags & ARGS_PXAT) || (flags & ARGS_EXAT)) && checkAlreadyExpired(milliseconds)) { /* When PXAT/EXAT absolute timestamp is specified, there can be a chance that timestamp * has already elapsed so delete the key in that case. 
*/ deleteExpiredKeyFromOverwriteAndPropagate(c, c->argv[1]); @@ -486,7 +359,7 @@ void getexCommand(client *c) { signalModifiedKey(c, c->db, c->argv[1]); notifyKeyspaceEvent(NOTIFY_GENERIC, "expire", c->argv[1], c->db->id); server.dirty++; - } else if (flags & OBJ_PERSIST) { + } else if (flags & ARGS_PERSIST) { if (removeExpire(c->db, c->argv[1])) { signalModifiedKey(c, c->db, c->argv[1]); rewriteClientCommandVector(c, 2, shared.persist, c->argv[1]); diff --git a/src/unit/test_entry.c b/src/unit/test_entry.c new file mode 100644 index 00000000000..27a2028f958 --- /dev/null +++ b/src/unit/test_entry.c @@ -0,0 +1,471 @@ +#include "../entry.h" +#include "test_help.h" +#include "../expire.h" +#include "../monotonic.h" +#include "../server.h" +#include +#include +#include +#include +#include + +/* Constants for test values */ +#define SHORT_FIELD "foo" +#define SHORT_VALUE "bar" +#define LONG_FIELD "k:123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890" +#define LONG_VALUE "v:12345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890" + +/* Verify entry properties */ +static int verify_entry_properties(entry *e, sds field, sds value_copy, long long expiry, bool has_expiry, bool has_valueptr) { + TEST_ASSERT(sdscmp(entryGetField(e), field) == 0); + TEST_ASSERT(sdscmp(entryGetValue(e), value_copy) == 0); + TEST_ASSERT(entryGetExpiry(e) == expiry); + TEST_ASSERT(entryHasExpiry(e) == has_expiry); + TEST_ASSERT(entryHasEmbeddedValue(e) != has_valueptr); + return 0; +} + +/** + * Test entryCreate functionality: + * 1. embedded with expiry + * 2. embedded without expiry + * 3. non-embedded with expiry + * 4. 
non-embedded without expiry + */ +int test_entryCreate(int argc, char **argv, int flags) { + UNUSED(argc); + UNUSED(argv); + UNUSED(flags); + + // Test with embedded value with expiry + sds field1 = sdsnew(SHORT_FIELD); + sds value1 = sdsnew(SHORT_VALUE); + sds value_copy1 = sdsdup(value1); // Keep a copy since entryCreate takes ownership of value + long long expiry1 = 100; + entry *e1 = entryCreate(field1, value1, expiry1); + verify_entry_properties(e1, field1, value_copy1, expiry1, true, false); + + // Test with embedded value with no expiry + sds field2 = sdsnew(SHORT_FIELD); + sds value2 = sdsnew(SHORT_VALUE); + sds value_copy2 = sdsdup(value2); + long long expiry2 = EXPIRY_NONE; + entry *e2 = entryCreate(field2, value2, expiry2); + verify_entry_properties(e2, field2, value_copy2, expiry2, false, false); + + // Test with non-embedded field and value with expiry + sds field3 = sdsnew(LONG_FIELD); + sds value3 = sdsnew(LONG_VALUE); + sds value_copy3 = sdsdup(value3); + long long expiry3 = 100; + entry *e3 = entryCreate(field3, value3, expiry3); + verify_entry_properties(e3, field3, value_copy3, expiry3, true, true); + + // Test with non-embedded field and value with no expiry + sds field4 = sdsnew(LONG_FIELD); + sds value4 = sdsnew(LONG_VALUE); + sds value_copy4 = sdsdup(value4); + long long expiry4 = EXPIRY_NONE; + entry *e4 = entryCreate(field4, value4, expiry4); + verify_entry_properties(e4, field4, value_copy4, expiry4, false, true); + + entryFree(e1); + entryFree(e2); + entryFree(e3); + entryFree(e4); + + // Free field as entryCreate doesn't take ownership + sdsfree(field1); + sdsfree(field2); + sdsfree(field3); + sdsfree(field4); + + sdsfree(value_copy1); + sdsfree(value_copy2); + sdsfree(value_copy3); + sdsfree(value_copy4); + + return 0; +} + +/** + * Test entryUpdate with various combinations of value and expiry changes: + * 1. Update only the value (keeping embedded) + * 2. Update only the expiry (keeping embedded) + * 3. 
Update both value and expiry (keeping embedded) + * 4. Update with no changes (should return same entry) + * 5. Update to a value that's too large to be embedded + * 6. Update expiry of a non-embedded entry + * 7. Update from non-embedded back to embedded value + * 8. Update entry to less than 3/4 allocation size + * 9. Update entry to more than 3/4 allocation size + * 10. Update entry to exactly 3/4 allocation size + */ +int test_entryUpdate(int argc, char **argv, int flags) { + UNUSED(argc); + UNUSED(argv); + UNUSED(flags); + + // Create embedded entry + sds value1 = sdsnew(SHORT_VALUE); + sds field = sdsnew(SHORT_FIELD); + sds value_copy1 = sdsdup(value1); + long long expiry1 = 100; + entry *e1 = entryCreate(field, value1, expiry1); + verify_entry_properties(e1, field, value_copy1, expiry1, true, false); + + // Update only value (keeping embedded) + sds value2 = sdsnew("bar2"); + sds value_copy2 = sdsdup(value2); + long long expiry2 = expiry1; + entry *e2 = entryUpdate(e1, value2, expiry2); + verify_entry_properties(e2, field, value_copy2, expiry2, true, false); + + // Update only expiry (keeping embedded) + long long expiry3 = 200; + entry *e3 = entryUpdate(e2, NULL, expiry3); + verify_entry_properties(e3, field, value_copy2, expiry3, true, false); + + // Update both value and expiry (keeping embedded) + sds value4 = sdsnew("bar4"); + long long expiry4 = 300; + sds value_copy4 = sdsdup(value4); + entry *e4 = entryUpdate(e3, value4, expiry4); + verify_entry_properties(e4, field, value_copy4, expiry4, true, false); + + // Update with no changes (should return same entry) + entry *e5 = entryUpdate(e4, NULL, expiry4); + verify_entry_properties(e5, field, value_copy4, expiry4, true, false); + TEST_ASSERT(e5 == e4); + + // Update to a value that's too large to be embedded + sds value6 = sdsnew(LONG_VALUE); + sds value_copy6 = sdsdup(value6); + long long expiry6 = expiry4; + entry *e6 = entryUpdate(e5, value6, expiry6); + verify_entry_properties(e6, field, value_copy6, 
expiry6, true, true); + + // Update expiry of a non-embedded entry + long long expiry7 = 400; + entry *e7 = entryUpdate(e6, NULL, expiry7); + verify_entry_properties(e7, field, value_copy6, expiry7, true, true); + + // Update from non-embedded back to embedded value + sds value8 = sdsnew("bar8"); + sds value_copy8 = sdsdup(value8); + long long expiry8 = expiry7; + entry *e8 = entryUpdate(e7, value8, expiry8); + verify_entry_properties(e8, field, value_copy8, expiry8, true, false); + + // Update value with identical value (keeping embedded) + sds value9 = sdsnew("bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb"); + sds value_copy9 = sdsdup(value9); + long long expiry9 = expiry8; + entry *e9 = entryUpdate(e8, value9, expiry9); + verify_entry_properties(e9, field, value_copy9, expiry9, true, false); + + // Update the value so that memory usage is less than 3/4 of the current allocation size + // Ensuring required_embedded_size < current_embedded_allocation_size * 3 / 4, which creates a new entry + size_t current_embedded_allocation_size = entryMemUsage(e9); + sds value10 = sdsnew("xxxxxxxxxxxxxxxxxxxxx"); + sds value_copy10 = sdsdup(value10); + long long expiry10 = expiry9; + entry *e10 = entryUpdate(e9, value10, expiry10); + verify_entry_properties(e10, field, value_copy10, expiry10, true, false); + TEST_ASSERT(entryMemUsage(e10) < current_embedded_allocation_size * 3 / 4); + TEST_ASSERT(e10 != e9); + + // Update the value so that memory usage is at least 3/4 of the current memory usage + // Ensuring required_embedded_size > current_embedded_allocation_size * 3 / 4 without creating a new entry + current_embedded_allocation_size = entryMemUsage(e10); + sds value11 = sdsnew("yyyyyyyyyyyyy"); + sds value_copy11 = sdsdup(value11); + long long expiry11 = expiry10; + entry *e11 = entryUpdate(e10, value11, expiry11); + verify_entry_properties(e11, field, value_copy11, expiry11, true, false); + TEST_ASSERT(entryMemUsage(e11) >= current_embedded_allocation_size * 3 / 4); + 
TEST_ASSERT(entryMemUsage(e11) <= current_embedded_allocation_size); + TEST_ASSERT(entryMemUsage(e11) <= + EMBED_VALUE_MAX_ALLOC_SIZE); + TEST_ASSERT(e10 == e11); + + // Update the value so that memory usage is exactly equal to the current allocation size + // Ensuring required_embedded_size == current_embedded_allocation_size without creating a new entry + current_embedded_allocation_size = entryMemUsage(e11); + sds value12 = sdsnew("zzzzzzzzzzzzz"); + sds value_copy12 = sdsdup(value12); + long long expiry12 = expiry11; + entry *e12 = entryUpdate(e11, value12, expiry12); + verify_entry_properties(e12, field, value_copy12, expiry12, true, false); + TEST_ASSERT(entryMemUsage(e12) == current_embedded_allocation_size); + TEST_ASSERT(entryMemUsage(e12) <= EMBED_VALUE_MAX_ALLOC_SIZE); + TEST_ASSERT(e12 == e11); + + entryFree(e12); + sdsfree(field); + sdsfree(value_copy1); + sdsfree(value_copy2); + sdsfree(value_copy4); + sdsfree(value_copy6); + sdsfree(value_copy8); + sdsfree(value_copy9); + sdsfree(value_copy10); + sdsfree(value_copy11); + sdsfree(value_copy12); + + return 0; +} + +/** + * Test setting expiry on an entry: + * 1. No expiry + * 2. Set expiry on entry without expiry + * 3. Update expiry on entry with expiry + * 4. Test with non-embedded entry + * 5. 
Set expiry on non-embedded entry + */ +int test_entryHasexpiry_entrySetExpiry(int argc, char **argv, int flags) { + UNUSED(argc); + UNUSED(argv); + UNUSED(flags); + + // No expiry + sds field1 = sdsnew(SHORT_FIELD); + sds value1 = sdsnew(SHORT_VALUE); + entry *e1 = entryCreate(field1, value1, EXPIRY_NONE); + TEST_ASSERT(entryHasExpiry(e1) == false); + TEST_ASSERT(entryGetExpiry(e1) == EXPIRY_NONE); + + // Set expiry on entry without expiry + long long expiry2 = 100; + entry *e2 = entrySetExpiry(e1, expiry2); + TEST_ASSERT(entryHasExpiry(e2) == true); + TEST_ASSERT(entryGetExpiry(e2) == expiry2); + + // Update expiry on entry with expiry + long long expiry3 = 200; + entry *e3 = entrySetExpiry(e2, expiry3); + TEST_ASSERT(entryHasExpiry(e3) == true); + TEST_ASSERT(entryGetExpiry(e3) == expiry3); + TEST_ASSERT(e2 == e3); // Should be the same pointer when just updating expiry + + // Test with non-embedded entry + sds field4 = sdsnew(LONG_FIELD); + sds value4 = sdsnew(LONG_VALUE); + entry *e4 = entryCreate(field4, value4, EXPIRY_NONE); + TEST_ASSERT(entryHasExpiry(e4) == false); + TEST_ASSERT(entryHasEmbeddedValue(e4) == false); + + // Set expiry on entry without expiry + long long expiry5 = 100; + entry *e5 = entrySetExpiry(e4, expiry5); + TEST_ASSERT(entryHasExpiry(e5) == true); + TEST_ASSERT(entryGetExpiry(e5) == expiry5); + + // Update expiry on entry with expiry + long long expiry6 = 200; + entry *e6 = entrySetExpiry(e5, expiry6); + TEST_ASSERT(entryHasExpiry(e6) == true); + TEST_ASSERT(entryGetExpiry(e6) == expiry6); + TEST_ASSERT(e5 == e6); // Should be the same pointer when just updating expiry + + entryFree(e3); + entryFree(e6); + sdsfree(field1); + sdsfree(field4); + + return 0; +} + +/** + * Test entryIsExpired: + * 1. No expiry + * 2. Future expiry + * 3. Current time expiry + * 4. Past expiry + * 5. Test with loading mode + * 6. Test with import mode and import source client + * 7. Test with import mode and import source client and import expiry + * 8. 
Test with import mode and import source client and import expiry and import expiry is in the past + */ +int test_entryIsExpired(int argc, char **argv, int flags) { + UNUSED(argc); + UNUSED(argv); + UNUSED(flags); + + // Setup server state + enterExecutionUnit(1, ustime()); + long long current_time = commandTimeSnapshot(); + + // No expiry + sds field1 = sdsnew(SHORT_FIELD); + sds value1 = sdsnew(SHORT_VALUE); + entry *e1 = entryCreate(field1, value1, EXPIRY_NONE); + TEST_ASSERT(entryGetExpiry(e1) == EXPIRY_NONE); + TEST_ASSERT(entryIsExpired(e1) == false); + + // Future expiry + sds field2 = sdsnew(SHORT_FIELD); + sds value2 = sdsnew(SHORT_VALUE); + long long future_time = current_time + 10000; // 10 seconds in future + entry *e2 = entryCreate(field2, value2, future_time); + TEST_ASSERT(entryGetExpiry(e2) == future_time); + TEST_ASSERT(entryIsExpired(e2) == false); + + // Current time expiry + sds field3 = sdsnew(SHORT_FIELD); + sds value3 = sdsnew(SHORT_VALUE); + entry *e3 = entryCreate(field3, value3, current_time); + TEST_ASSERT(entryGetExpiry(e3) == current_time); + TEST_ASSERT(entryIsExpired(e3) == false); + + // Test with past expiry + sds field4 = sdsnew(SHORT_FIELD); + sds value4 = sdsnew(SHORT_VALUE); + long long past_time = current_time - 10000; // 10 seconds ago + entry *e4 = entryCreate(field4, value4, past_time); + TEST_ASSERT(entryGetExpiry(e4) == past_time); + TEST_ASSERT(entryIsExpired(e4) == true); + + entryFree(e1); + entryFree(e2); + entryFree(e3); + entryFree(e4); + sdsfree(field1); + sdsfree(field2); + sdsfree(field3); + sdsfree(field4); + exitExecutionUnit(); + return 0; +} + +/** + * Test entryMemUsage: + * 1. 
Embedded entry tests: + * - Initial creation without expiry + * - Adding expiry (should increase memory usage) + * - Updating expiry (should not change memory usage) + * - Updating value while keeping it embedded: + * * To smaller value (should not decrease memory usage) + * * To bigger value (should not increase memory usage) + * + * 2. Non-embedded entry tests: + * - Initial creation without expiry + * - Adding expiry (should increase memory usage) + * - Updating expiry (should not change memory usage) + * - Updating value: + * * To smaller value (should decrease memory usage) + * * To bigger value (should increase memory usage) + */ +int test_entryMemUsage_entrySetExpiry_entrySetValue(int argc, char **argv, int flags) { + UNUSED(argc); + UNUSED(argv); + UNUSED(flags); + + // Tests with embedded entry + // Embedded entry without expiry + sds field1 = sdsnew(SHORT_FIELD); + sds value1 = sdsnew(SHORT_VALUE); + sds value_copy1 = sdsdup(value1); + long long expiry1 = EXPIRY_NONE; + entry *e1 = entryCreate(field1, value1, expiry1); + size_t e1_entryMemUsage = entryMemUsage(e1); + verify_entry_properties(e1, field1, value_copy1, expiry1, false, false); + TEST_ASSERT(e1_entryMemUsage > 0); + + // Add expiry to embedded entry without expiry + // This should increase memory usage by sizeof(long long) + 2 bytes + // (long long for the expiry value, 2 bytes for SDS header adjustment) + long long expiry2 = 100; + entry *e2 = entrySetExpiry(e1, expiry2); + size_t e2_entryMemUsage = entryMemUsage(e2); + verify_entry_properties(e2, field1, value_copy1, expiry2, true, false); + TEST_ASSERT(zmalloc_usable_size((char *)e2 - sizeof(long long) - 3) == e2_entryMemUsage); + + // Update expiry on an entry that already has one + // This should NOT change memory usage as we're just updating the expiry value (long long) + long long expiry3 = 10000; + entry *e3 = entrySetExpiry(e2, expiry3); + size_t e3_entryMemUsage = entryMemUsage(e3); + verify_entry_properties(e3, field1, value_copy1, 
expiry3, true, false); + TEST_ASSERT(e3_entryMemUsage == e2_entryMemUsage); + + // Update to smaller value (keeping embedded) + // Memory usage should decrease by the difference in value size (2 bytes) + sds value4 = sdsnew("x"); + sds value_copy4 = sdsdup(value4); + entry *e4 = entrySetValue(e3, value4); + size_t e4_entryMemUsage = entryMemUsage(e4); + verify_entry_properties(e4, field1, value_copy4, expiry3, true, false); + TEST_ASSERT(zmalloc_usable_size((char *)e4 - sizeof(long long) - 3) == e4_entryMemUsage); + + // Update to bigger value (keeping embedded) + // Memory usage should increase by the difference in value size (1 byte) + sds value5 = sdsnew("xx"); + sds value_copy5 = sdsdup(value5); + entry *e5 = entrySetValue(e4, value5); + size_t e5_entryMemUsage = entryMemUsage(e5); + verify_entry_properties(e5, field1, value_copy5, expiry3, true, false); + TEST_ASSERT(zmalloc_usable_size((char *)e5 - sizeof(long long) - 3) == e5_entryMemUsage); + + // Tests with non-embedded entry + // Non-embedded entry without expiry + sds field6 = sdsnew(LONG_FIELD); + field6 = sdscat(field6, LONG_FIELD); // Double the length to ensure non-embedded entry + sds value6 = sdsnew(LONG_VALUE); + sds value_copy6 = sdsdup(value6); + long long expiry6 = EXPIRY_NONE; + entry *e6 = entryCreate(field6, value6, EXPIRY_NONE); + size_t e6_entryMemUsage = entryMemUsage(e6); + verify_entry_properties(e6, field6, value_copy6, expiry6, false, true); + TEST_ASSERT(e6_entryMemUsage > 0); + + // Add expiry to non-embedded entry without expiry + // For non-embedded entries this increases memory by exactly sizeof(long long) + long long expiry7 = 100; + entry *e7 = entrySetExpiry(e6, expiry7); + size_t e7_entryMemUsage = entryMemUsage(e7); + verify_entry_properties(e7, field6, value_copy6, expiry7, true, true); + size_t expected_e7_entry_mem = zmalloc_usable_size((char *)e7 - sizeof(long long) - sizeof(sds) - 3) + sdsAllocSize(value6); + TEST_ASSERT(expected_e7_entry_mem == e7_entryMemUsage); + + 
// Update expiry on a non-embedded entry that already has one + // This should not change memory usage as we're just updating the expiry value + long long expiry8 = 10000; + entry *e8 = entrySetExpiry(e7, expiry8); + size_t e8_entryMemUsage = entryMemUsage(e8); + verify_entry_properties(e8, field6, value_copy6, expiry8, true, true); + TEST_ASSERT(e8_entryMemUsage == e7_entryMemUsage); + + // Update to smaller value (keeping non-embedded) + // Memory usage should decrease by at least the difference between LONG_VALUE and "x" (143) + sds value9 = sdsnew("x"); + sds value_copy9 = sdsdup(value9); + entry *e9 = entrySetValue(e8, value9); + size_t e9_entryMemUsage = entryMemUsage(e9); + verify_entry_properties(e9, field6, value_copy9, expiry8, true, true); + size_t expected_e9_entry_mem = zmalloc_usable_size((char *)e9 - sizeof(long long) - sizeof(sds) - 3) + sdsAllocSize(value9); + TEST_ASSERT(expected_e9_entry_mem == e9_entryMemUsage); + + // Update to bigger value (keeping non-embedded) + // Memory usage increases by the difference in value size (1 byte) + sds value10 = sdsnew("xx"); + sds value_copy10 = sdsdup(value10); + entry *e10 = entrySetValue(e9, value10); + size_t e10_entryMemUsage = entryMemUsage(e10); + size_t expected_10_entry_mem = zmalloc_usable_size((char *)e10 - sizeof(long long) - sizeof(sds) - 3) + sdsAllocSize(value10); + TEST_ASSERT(expected_10_entry_mem == e10_entryMemUsage); + + entryFree(e5); + entryFree(e10); + sdsfree(field1); + sdsfree(field6); + sdsfree(value_copy1); + sdsfree(value_copy4); + sdsfree(value_copy5); + sdsfree(value_copy6); + sdsfree(value_copy9); + sdsfree(value_copy10); + + return 0; +} diff --git a/src/unit/test_files.h b/src/unit/test_files.h index bb003993421..d7befe08943 100644 --- a/src/unit/test_files.h +++ b/src/unit/test_files.h @@ -20,6 +20,11 @@ int test_dictDisableResizeReduceTo3(int argc, char **argv, int flags); int test_dictDeleteOneKeyTriggerResizeAgain(int argc, char **argv, int flags); int 
test_dictBenchmark(int argc, char **argv, int flags); int test_endianconv(int argc, char *argv[], int flags); +int test_entryCreate(int argc, char **argv, int flags); +int test_entryUpdate(int argc, char **argv, int flags); +int test_entryHasexpiry_entrySetExpiry(int argc, char **argv, int flags); +int test_entryIsExpired(int argc, char **argv, int flags); +int test_entryMemUsage_entrySetExpiry_entrySetValue(int argc, char **argv, int flags); int test_cursor(int argc, char **argv, int flags); int test_set_hash_function_seed(int argc, char **argv, int flags); int test_add_find_delete(int argc, char **argv, int flags); @@ -242,6 +247,7 @@ unitTest __test_crc64_c[] = {{"test_crc64", test_crc64}, {NULL, NULL}}; unitTest __test_crc64combine_c[] = {{"test_crc64combine", test_crc64combine}, {NULL, NULL}}; unitTest __test_dict_c[] = {{"test_dictCreate", test_dictCreate}, {"test_dictAdd16Keys", test_dictAdd16Keys}, {"test_dictDisableResize", test_dictDisableResize}, {"test_dictAddOneKeyTriggerResize", test_dictAddOneKeyTriggerResize}, {"test_dictDeleteKeys", test_dictDeleteKeys}, {"test_dictDeleteOneKeyTriggerResize", test_dictDeleteOneKeyTriggerResize}, {"test_dictEmptyDirAdd128Keys", test_dictEmptyDirAdd128Keys}, {"test_dictDisableResizeReduceTo3", test_dictDisableResizeReduceTo3}, {"test_dictDeleteOneKeyTriggerResizeAgain", test_dictDeleteOneKeyTriggerResizeAgain}, {"test_dictBenchmark", test_dictBenchmark}, {NULL, NULL}}; unitTest __test_endianconv_c[] = {{"test_endianconv", test_endianconv}, {NULL, NULL}}; +unitTest __test_entry_c[] = {{"test_entryCreate", test_entryCreate}, {"test_entryUpdate", test_entryUpdate}, {"test_entryHasexpiry_entrySetExpiry", test_entryHasexpiry_entrySetExpiry}, {"test_entryIsExpired", test_entryIsExpired}, {"test_entryMemUsage_entrySetExpiry_entrySetValue", test_entryMemUsage_entrySetExpiry_entrySetValue}, {NULL, NULL}}; unitTest __test_hashtable_c[] = {{"test_cursor", test_cursor}, {"test_set_hash_function_seed", 
test_set_hash_function_seed}, {"test_add_find_delete", test_add_find_delete}, {"test_add_find_delete_avoid_resize", test_add_find_delete_avoid_resize}, {"test_instant_rehashing", test_instant_rehashing}, {"test_bucket_chain_length", test_bucket_chain_length}, {"test_two_phase_insert_and_pop", test_two_phase_insert_and_pop}, {"test_replace_reallocated_entry", test_replace_reallocated_entry}, {"test_incremental_find", test_incremental_find}, {"test_scan", test_scan}, {"test_iterator", test_iterator}, {"test_safe_iterator", test_safe_iterator}, {"test_compact_bucket_chain", test_compact_bucket_chain}, {"test_random_entry", test_random_entry}, {"test_random_entry_with_long_chain", test_random_entry_with_long_chain}, {"test_random_entry_sparse_table", test_random_entry_sparse_table}, {NULL, NULL}}; unitTest __test_intset_c[] = {{"test_intsetValueEncodings", test_intsetValueEncodings}, {"test_intsetBasicAdding", test_intsetBasicAdding}, {"test_intsetLargeNumberRandomAdd", test_intsetLargeNumberRandomAdd}, {"test_intsetUpgradeFromint16Toint32", test_intsetUpgradeFromint16Toint32}, {"test_intsetUpgradeFromint16Toint64", test_intsetUpgradeFromint16Toint64}, {"test_intsetUpgradeFromint32Toint64", test_intsetUpgradeFromint32Toint64}, {"test_intsetStressLookups", test_intsetStressLookups}, {"test_intsetStressAddDelete", test_intsetStressAddDelete}, {NULL, NULL}}; unitTest __test_kvstore_c[] = {{"test_kvstoreAdd16Keys", test_kvstoreAdd16Keys}, {"test_kvstoreIteratorRemoveAllKeysNoDeleteEmptyHashtable", test_kvstoreIteratorRemoveAllKeysNoDeleteEmptyHashtable}, {"test_kvstoreIteratorRemoveAllKeysDeleteEmptyHashtable", test_kvstoreIteratorRemoveAllKeysDeleteEmptyHashtable}, {"test_kvstoreHashtableIteratorRemoveAllKeysNoDeleteEmptyHashtable", test_kvstoreHashtableIteratorRemoveAllKeysNoDeleteEmptyHashtable}, {"test_kvstoreHashtableIteratorRemoveAllKeysDeleteEmptyHashtable", test_kvstoreHashtableIteratorRemoveAllKeysDeleteEmptyHashtable}, {NULL, NULL}}; @@ -268,6 +274,7 @@ struct 
unitTestSuite { {"test_crc64combine.c", __test_crc64combine_c}, {"test_dict.c", __test_dict_c}, {"test_endianconv.c", __test_endianconv_c}, + {"test_entry.c", __test_entry_c}, {"test_hashtable.c", __test_hashtable_c}, {"test_intset.c", __test_intset_c}, {"test_kvstore.c", __test_kvstore_c}, diff --git a/src/util.c b/src/util.c index aea6ae59371..0e93bbc7a18 100644 --- a/src/util.c +++ b/src/util.c @@ -59,8 +59,6 @@ #include #endif -#define UNUSED(x) ((void)(x)) - /* Glob-style pattern matching. */ static int stringmatchlen_impl(const char *pattern, int patternLen, diff --git a/src/util.h b/src/util.h index 514346939c6..db15f2d9003 100644 --- a/src/util.h +++ b/src/util.h @@ -33,6 +33,17 @@ #include #include "sds.h" +/* Anti-warning macro... */ +#ifndef UNUSED +#define UNUSED(V) ((void)V) +#endif + +/* min/max */ +#undef min +#undef max +#define min(a, b) ((a) < (b) ? (a) : (b)) +#define max(a, b) ((a) > (b) ? (a) : (b)) + /* The maximum number of characters needed to represent a long double * as a string (long double has a huge range of some 4952 chars, see LDBL_MAX). 
* This should be the size of the buffer given to ld2string */ diff --git a/src/valkey-check-rdb.c b/src/valkey-check-rdb.c index 8cc4a4eba91..e5bd9fa64d1 100644 --- a/src/valkey-check-rdb.c +++ b/src/valkey-check-rdb.c @@ -146,8 +146,11 @@ char *rdb_type_string[] = { "stream-v2", "set-listpack", "stream-v3", + "hash-volatile-items", }; +static_assert(sizeof(rdb_type_string) / sizeof(rdb_type_string[0]) == RDB_TYPE_LAST, "Mismatch between enum and string table"); + char *type_name[OBJ_TYPE_MAX] = {"string", "list", "set", "zset", "hash", "module", /* module type is special */ "stream"}; diff --git a/src/volatile_set.c b/src/volatile_set.c new file mode 100644 index 00000000000..97cbbbab870 --- /dev/null +++ b/src/volatile_set.c @@ -0,0 +1,79 @@ +#include +#include "volatile_set.h" +#include "zmalloc.h" +#include "config.h" +#include "endianconv.h" +#include "serverassert.h" + +#define EXPIRY_HASH_SIZE 16 +volatile_set *createVolatileSet(volatileEntryType *type) { + volatile_set *set = zmalloc(sizeof(volatile_set)); + set->etypr = type; + set->expiry_buckets = raxNew(); + return set; +} + +void freeVolatileSet(volatile_set *b) { + raxFree(b->expiry_buckets); + zfree(b); +} + +int volatileSetAddEntry(volatile_set *set, void *entry, long long expiry) { + unsigned char buf[EXPIRY_HASH_SIZE]; + expiry = htonu64(expiry); + memcpy(buf, &expiry, sizeof(expiry)); + memcpy(buf + 8, &entry, sizeof(entry)); + if (sizeof(entry) == 4) memset(buf + 12, 0, 4); /* Zero padding for 32bit target. */ + return raxTryInsert(set->expiry_buckets, buf, sizeof(buf), NULL, NULL); +} + +int volatileSetRemoveEntry(volatile_set *set, void *entry, long long expiry) { + unsigned char buf[EXPIRY_HASH_SIZE]; + expiry = htonu64(expiry); + memcpy(buf, &expiry, sizeof(expiry)); + memcpy(buf + 8, &entry, sizeof(entry)); + if (sizeof(entry) == 4) memset(buf + 12, 0, 4); /* Zero padding for 32bit target. 
*/ + return raxRemove(set->expiry_buckets, buf, sizeof(buf), NULL); +} + +int volatileSetUpdateEntry(volatile_set *set, void *old_entry, void *new_entry, long long old_expiry, long long new_expiry) { + if (old_entry == new_entry && old_expiry == new_expiry) return 1; + + if (old_entry && old_expiry != -1) { + assert(volatileSetRemoveEntry(set, old_entry, old_expiry)); + } + if (new_entry && new_expiry != -1) { + assert(volatileSetAddEntry(set, new_entry, new_expiry)); + } + return 1; +} + +int volatileSetExpireEntry(volatile_set *set, void *entry) { + volatileSetRemoveEntry(set, entry, set->etypr->getExpiry(entry)); + if (set->etypr->expire) { + set->etypr->expire(entry); + return 1; + } + return 0; +} + +size_t volatileSetNumEntries(volatile_set *set) { + assert(set && set->expiry_buckets); + return set->expiry_buckets->numele; +} + +void volatileSetStart(volatile_set *set, volatileSetIterator *it) { + raxStart(&it->bucket, set->expiry_buckets); +} + +int volatileSetNext(volatileSetIterator *it, void **entryptr) { + if (raxNext(&it->bucket)) { + assert(it->bucket.key_len == EXPIRY_HASH_SIZE); + memcpy(entryptr, it->bucket.key + sizeof(long long), sizeof(*entryptr)); + return 1; + } + return 0; +} +void volatileSetReset(volatileSetIterator *it) { + raxStop(&it->bucket); +} diff --git a/src/volatile_set.h b/src/volatile_set.h new file mode 100644 index 00000000000..37dc7c9923a --- /dev/null +++ b/src/volatile_set.h @@ -0,0 +1,40 @@ +#ifndef VOLATILESET_H +#define VOLATILESET_H + +#include +#include "rax.h" +#include "sds.h" + +typedef struct { + sds (*entryGetKey)(const void *entry); + + long long (*getExpiry)(const void *entry); + + int (*expire)(void *entry); + +} volatileEntryType; + + +typedef struct { + volatileEntryType *etypr; + rax *expiry_buckets; +} volatile_set; + +typedef struct volatileSetIterator { + raxIterator bucket; +} volatileSetIterator; + + +int volatileSetRemoveEntry(volatile_set *set, void *entry, long long expiry); +int 
volatileSetAddEntry(volatile_set *set, void *entry, long long expiry); +int volatileSetExpireEntry(volatile_set *set, void *entry); +int volatileSetUpdateEntry(volatile_set *set, void *old_entry, void *new_entry, long long old_expiry, long long new_expiry); +size_t volatileSetNumEntries(volatile_set *set); +void volatileSetStart(volatile_set *set, volatileSetIterator *it); +int volatileSetNext(volatileSetIterator *it, void **entryptr); +void volatileSetReset(volatileSetIterator *it); + +void freeVolatileSet(volatile_set *b); +volatile_set *createVolatileSet(volatileEntryType *type); + +#endif diff --git a/tests/unit/hashexpire.tcl b/tests/unit/hashexpire.tcl new file mode 100644 index 00000000000..c8989dace11 --- /dev/null +++ b/tests/unit/hashexpire.tcl @@ -0,0 +1,2639 @@ + +proc info_field {info field} { + foreach line [split $info "\n"] { + if {[string match "$field:*" $line]} { + return [string trim [lindex [split $line ":"] 1]] + } + } + return [s field_name] +} + +proc get_short_expire_value {command} { + expr { + ($command eq "HEXPIRE" || $command eq "EX") ? 1 : + ($command eq "HPEXPIRE" || $command eq "PX") ? 10 : + ($command eq "HEXPIREAT" || $command eq "EXAT") ? [clock seconds] + 1 : + [clock milliseconds] + 10 + } +} + +proc get_long_expire_value {command} { + expr { + ($command eq "HEXPIRE" || $command eq "EX") ? 60000000 : + ($command eq "HPEXPIRE" || $command eq "PX") ? 60000000 : + ($command eq "HEXPIREAT" || $command eq "EXAT") ? [clock seconds] + 60000000 : + [clock milliseconds] + 60000000 + } +} + +proc get_longer_then_long_expire_value {command} { + expr { + ($command eq "HEXPIRE" || $command eq "EX") ? 1200000000 : + ($command eq "HPEXPIRE" || $command eq "PX") ? 1200000000 : + ($command eq "HEXPIREAT" || $command eq "EXAT") ? [clock seconds] + 1200000000 : + [clock milliseconds] + 1200000000 + } +} + +proc get_past_zero_expire_value {command} { + expr { + ($command eq "HEXPIRE" || $command eq "EX") ? 
0 : + ($command eq "HPEXPIRE" || $command eq "PX") ? 0 : + ($command eq "HEXPIREAT" || $command eq "EXAT") ? [clock seconds] - 200000 : + [clock milliseconds] - 200000 + } +} + +proc get_check_ttl_command {command} { + if {$command eq "EX"} { + return "HTTL" + } elseif {$command eq "PX"} { + return "HPTTL" + } elseif {$command eq "EXAT"} { + return "HEXPIRETIME" + } else { + return "HPEXPIRETIME" + } +} + +proc assert_keyevent_patterns {rd key args} { + foreach event_type $args { + set event [$rd read] + assert_match "pmessage __keyevent@* __keyevent@*:$event_type $key" $event + } +} + +proc setup_replication_test {primary replica primary_host primary_port} { + $primary FLUSHALL + $replica replicaof $primary_host $primary_port + wait_for_condition 50 100 { + [lindex [$replica role] 0] eq {slave} && + [string match {*master_link_status:up*} [$replica info replication]] + } else { + fail "Can't turn the instance into a replica" + } + set primary_initial_expired [info_field [$primary info stats] expired_subkeys] + set replica_initial_expired [info_field [$replica info stats] expired_subkeys] + return [list $primary_initial_expired $replica_initial_expired] +} + +proc setup_single_keyspace_notification {r} { + $r config set notify-keyspace-events KEA + set rd [valkey_deferring_client] + assert_equal {1} [psubscribe $rd __keyevent@*] + return $rd +} + + +start_server {tags {"hashexpire"}} { + ####### Valid scenarios tests ####### + foreach command {EX PX EXAT PXAT} { + test "HGETEX $command expiry" { + r FLUSHALL + r DEBUG SET-ACTIVE-EXPIRE no + r HSET myhash f1 v1 + + set ttl_cmd [get_check_ttl_command $command] + set expire_time [get_long_expire_value $command] + + # Verify HGETEX command + assert_equal "v1" [r HGETEX myhash $command $expire_time FIELDS 1 f1] + set expire_result [r $ttl_cmd myhash FIELDS 1 f1] + + # Verify expiry + if {[regexp "AT$" $command]} { + assert_equal $expire_result $expire_time + } else { + assert_morethan $expire_result 0 + } + # Re-enable 
active expiry + r DEBUG SET-ACTIVE-EXPIRE yes + } {OK} {needs:debug} + + test "HGETEX $command with mix of existing and non-existing fields" { + r FLUSHALL + r HSET myhash f1 v1 f3 v3 + + # HGETEX on exist/non-exist fields + assert_equal "v1 {} v3" [r HGETEX myhash $command [get_long_expire_value $command] FIELDS 3 f1 f2 f3] + + # Verification checks (f2 should not be created) + assert_equal "" [r HGET myhash f2] + assert_equal -2 [r HTTL myhash FIELDS 1 f2] + assert_morethan [r HTTL myhash FIELDS 1 f1] 0 + assert_morethan [r HTTL myhash FIELDS 1 f3] 0 + } + + test "HGETEX $command on more then 1 field" { + r FLUSHALL + r DEBUG SET-ACTIVE-EXPIRE no + r HSET myhash f1 v1 f2 v2 + + set ttl_cmd [get_check_ttl_command $command] + set expire_time [get_long_expire_value $command] + + assert_equal "v1 v2" [r HGETEX myhash $command $expire_time FIELDS 2 f1 f2] + + # Verify expiration + if {[regexp "AT$" $command]} { + assert_equal $expire_time [r $ttl_cmd myhash FIELDS 1 f1] + assert_equal $expire_time [r $ttl_cmd myhash FIELDS 1 f2] + } else { + assert_morethan [r $ttl_cmd myhash FIELDS 1 f1] 0 + assert_morethan [r $ttl_cmd myhash FIELDS 1 f2] 0 + } + + # Re-enable active expiry + r DEBUG SET-ACTIVE-EXPIRE yes + } {OK} {needs:debug} + + test "HGETEX $command -> PERSIST" { + r FLUSHALL + r HSET myhash f1 v1 + r HSETEX myhash EX 10000 FIELDS 1 f2 v2 + + set ttl_cmd [get_check_ttl_command $command] + set expire_time [get_long_expire_value $command] + + assert_equal "v1" [r HGETEX myhash $command $expire_time FIELDS 1 f1] + if {[regexp "AT$" $command]} { + assert_equal $expire_time [r $ttl_cmd myhash FIELDS 1 f1] + } else { + assert_morethan [r $ttl_cmd myhash FIELDS 1 f1] 0 + } + + assert_equal "v1" [r HGETEX myhash PERSIST FIELDS 1 f1] + assert_equal -1 [r HTTL myhash FIELDS 1 f1] + # Verify f2 still has ttl + assert_morethan [r HTTL myhash FIELDS 1 f2] 100 + } + + test "HGETEX $command on non-exist field" { + r FLUSHALL + r HSET myhash f1 v1 + assert_equal {{}} [r HGETEX 
myhash $command [get_short_expire_value $command] FIELDS 1 f2] + } + + test "HGETEX $command on non-exist key" { + r FLUSHALL + assert_equal "" [r HGETEX myhash $command [get_long_expire_value $command] FIELDS 1 f2] + } + + test "HGETEX $command with duplicate field names" { + r FLUSHALL + r HSET myhash f1 v1 + assert_equal "v1 v1" [r HGETEX myhash $command [get_long_expire_value $command] FIELDS 2 f1 f1] + } + + + test "HGETEX $command overwrites existing field TTL with bigger value" { + r FLUSHALL + r HSETEX myhash $command [get_long_expire_value $command] FIELDS 1 f1 v1 + set old_ttl [r HTTL myhash FIELDS 1 f1] + r HGETEX myhash $command [get_longer_then_long_expire_value $command] FIELDS 1 f1 + set new_ttl [r HTTL myhash FIELDS 1 f1] + assert {$new_ttl > $old_ttl} + } + + test "HGETEX $command overwrites existing field TTL with smaller value" { + r FLUSHALL + r HSETEX myhash $command [get_long_expire_value $command] FIELDS 1 f1 v1 + set old_ttl [r HTTL myhash FIELDS 1 f1] + r HGETEX myhash $command [get_short_expire_value $command] FIELDS 1 f1 + set new_ttl [r HTTL myhash FIELDS 1 f1] + assert {$new_ttl <= $old_ttl} + } + } + + foreach command {EX PX} { + test "HGETEX $command with 0 ttl" { + r FLUSHALL + r HSET myhash f1 v1 + assert_equal "v1" [r HGETEX myhash $command 0 FIELDS 1 f1] + assert_equal "" [r HGET myhash f1] + assert_equal -2 [r HTTL myhash FIELDS 1 f1] + } + } + + foreach command {EXAT PXAT} { + test "HGETEX $command with past expiry" { + r FLUSHALL + r HSET myhash f1 v1 + assert_equal "v1" [r HGETEX myhash $command [get_past_zero_expire_value $command] FIELDS 1 f1] + assert_equal "" [r HGET myhash f1] + assert_equal -2 [r HTTL myhash FIELDS 1 f1] + } + } + + test {HGETEX - verify no change when field does not exist} { + r FLUSHALL + r HSET myhash f1 v1 + set mem_before [r MEMORY USAGE myhash] + assert_equal {{}} [r HGETEX myhash EX 1 FIELDS 1 f2] + set memory_after [r MEMORY USAGE myhash] + assert_equal $mem_before $memory_after + } + + ####### 
Invalid scenarios tests ####### + test {HGETEX EX- multiple options used (EX + PX)} { + r FLUSHALL + r HSET myhash f1 v1 + assert_error "ERR*" {r HGETEX myhash EX 60 PX 1000 FIELDS 1 f1} + } + + test {HGETEX EXAT- multiple options used (EXAT + PXAT)} { + r FLUSHALL + r HSET myhash f1 v1 + assert_error "ERR*" {r HGETEX myhash EXAT [expr {[clock seconds] + 100}] PXAT [expr {[clock milliseconds] + 100000}] 1000 FIELDS 1 f1} + } + + # Common error scenarios for all commands + foreach cmd {EX PX EXAT PXAT} { + test "HGETEX $cmd- missing TTL value" { + r FLUSHALL + r HSET myhash f1 v1 + catch {r HGETEX myhash $cmd FIELDS 1 f1} e + set e + } {ERR *} + + test "HGETEX $cmd- negative TTL" { + r FLUSHALL + r HSET myhash f1 v1 + catch {r HGETEX myhash $cmd -10 FIELDS 1 f1} e + set e + } {ERR invalid expire time in 'hgetex' command} + + test "HGETEX $cmd- non-integer TTL" { + r FLUSHALL + r HSET myhash f1 v1 + catch {r HGETEX myhash $cmd abc FIELDS 1 f1} e + set e + } {ERR value is not an integer or out of range} + + test "HGETEX $cmd- missing FIELDS keyword" { + r FLUSHALL + r HSET myhash f1 v1 + catch {r HGETEX myhash $cmd [get_short_expire_value $cmd] 1 f1} e + set e + } {ERR *} + + test "HGETEX $cmd- wrong numfields count (too few fields)" { + r FLUSHALL + r HSET myhash f1 v1 f2 v2 + catch {r HGETEX myhash $cmd [get_short_expire_value $cmd] FIELDS 2 f1} e + set e + } {ERR *} + + test "HGETEX $cmd- wrong numfields count (too many fields)" { + r FLUSHALL + r HSET myhash f1 v1 + catch {r HGETEX myhash $cmd [get_short_expire_value $cmd] FIELDS 1 f1 f2} e + set e + } {ERR *} + + test "HGETEX $cmd- key is wrong type (string instead of hash)" { + r FLUSHALL + r SET mystring "v1" + catch {r HGETEX mystring $cmd [get_short_expire_value $cmd] FIELDS 1 f1} e + set e + } {WRONGTYPE Operation against a key holding the wrong kind of value} + + test "HGETEX $cmd with FIELDS 0" { + r FLUSHALL + catch {r HGETEX myhash $cmd [get_short_expire_value $cmd] FIELDS 0} e + set e + } {ERR *} + + 
test "HGETEX $cmd with negative numfields" { + r FLUSHALL + catch {r HGETEX myhash $cmd [get_short_expire_value $cmd] FIELDS -10} e + set e + } {ERR *} + + test "HGETEX $cmd with missing key" { + r FLUSHALL + catch {r HGETEX $cmd [get_short_expire_value $cmd] FIELDS 1 f1} e + set e + } {ERR *} + } +} + +## HGETEX -> Keyspace notification tests #### +start_server {tags {"hashexpire"}} { + if {$::singledb} { + set db 0 + } else { + set db 9 + } + set all_h_pattern "h*" + set hexpire_pattern "hexpire" + set hpersist_pattern "hpersist" + + r config set notify-keyspace-events KEA + + foreach command {EX PX EXAT PXAT} { + test "HGETEX $command generates hexpire keyspace notification" { + r FLUSHALL + r HSET myhash f1 v1 + + set rd [setup_single_keyspace_notification r] + + r HGETEX myhash $command [get_long_expire_value $command] FIELDS 1 f1 + + assert_keyevent_patterns $rd myhash hexpire + $rd close + } + + test "HGETEX $command with multiple fields generates single notification" { + r FLUSHALL + r HSET myhash f1 v1 f2 v2 f3 v3 + + set rd [setup_single_keyspace_notification r] + + r HGETEX myhash $command [get_long_expire_value $command] FIELDS 3 f1 f2 f3 + + assert_keyevent_patterns $rd myhash hexpire + # Verify no notification (getting hset and not hexpire) + r HSET dummy dummy dummy + assert_keyevent_patterns $rd dummy hset + $rd close + } + + test "HGETEX $command on non-existent field generates no notification" { + r FLUSHALL + r HSET myhash f1 v1 + + set rd [setup_single_keyspace_notification r] + + # This HGETEX targets a non-existent field, so no notification about hexpire should be emitted + r HGETEX myhash $command [get_long_expire_value $command] FIELDS 1 f2 + + # Verify no notification (getting hset and not hexpire) + r HSET dummy dummy dummy + assert_keyevent_patterns $rd dummy hset + + $rd close + } + } + + test {HGETEX PERSIST generates hpersist keyspace notification} { + r FLUSHALL + r HSET myhash f1 v1 + r HEXPIRE myhash 60 FIELDS 1 f1 + + set rd 
[setup_single_keyspace_notification r] + + r HGETEX myhash PERSIST FIELDS 1 f1 + + assert_keyevent_patterns $rd myhash hpersist + $rd close + } + + foreach command {EX PX EXAT PXAT} { + + test "HGETEX $command 0/past time works correctly with 1 field" { + r FLUSHALL + + # Create hash with field + r HSET myhash f1 v1 + assert_equal 1 [r HLEN myhash] + assert_match {1} [scan [regexp -inline {keys\=([\d]*)} [r info keyspace]] keys=%d] + + set rd [setup_single_keyspace_notification r] + + # Set field to expire immediately + r HGETEX myhash $command [get_past_zero_expire_value $command] FIELDS 1 f1 + + # Verify field and keys are deleted + assert_keyevent_patterns $rd myhash hexpired del + assert_equal -2 [r HTTL myhash FIELDS 1 f1] + assert_equal 0 [r HLEN myhash] + assert_equal 0 [r EXISTS myhash] + assert_match "" [scan [regexp -inline {keys\=([\d]*)} [r info keyspace]] keys=%d] + + $rd close + } + + test "HGETEX $command 0/past time works correctly with 1 field on field with expire" { + r FLUSHALL + + # Create hash with field + r HSETEX myhash EX 1000 FIELDS 1 f1 v1 + assert_equal 1 [r HLEN myhash] + assert_match {1} [scan [regexp -inline {keys\=([\d]*)} [r info keyspace]] keys=%d] + + set rd [setup_single_keyspace_notification r] + + # Set field to expire immediately + r HGETEX myhash $command [get_past_zero_expire_value $command] FIELDS 1 f1 + + # Verify field and keys are deleted + assert_keyevent_patterns $rd myhash hexpired del + assert_equal -2 [r HTTL myhash FIELDS 1 f1] + assert_equal 0 [r HLEN myhash] + assert_equal 0 [r EXISTS myhash] + assert_match "" [scan [regexp -inline {keys\=([\d]*)} [r info keyspace]] keys=%d] + + $rd close + } + + test "HGETEX $command 0/past time works correctly with more then 1 field" { + r FLUSHALL + + # Create hash with field + r HSET myhash f1 v1 f2 v2 + assert_equal 2 [r HLEN myhash] + assert_match {1} [scan [regexp -inline {keys\=([\d]*)} [r info keyspace]] keys=%d] + + set rd [setup_single_keyspace_notification r] + + # Set 
field to expire immediately + r HGETEX myhash $command [get_past_zero_expire_value $command] FIELDS 1 f2 + + # Verify only the field is deleted and the key still exists + assert_keyevent_patterns $rd myhash hexpired + assert_equal -2 [r HTTL myhash FIELDS 1 f2] + assert_equal 1 [r HLEN myhash] + assert_equal 1 [r EXISTS myhash] + assert_match 1 [scan [regexp -inline {keys\=([\d]*)} [r info keyspace]] keys=%d] + + $rd close + } + + test "HGETEX $command 0/past time works correctly with more then 1 field and expire" { + r FLUSHALL + + # Create hash with several fields, one of them with a TTL + r HSET myhash f1 v1 f2 v2 f3 v3 f4 v4 + r HEXPIRE myhash 1000000 FIELDS 1 f1 + assert_equal 4 [r HLEN myhash] + assert_match {1} [scan [regexp -inline {keys\=([\d]*)} [r info keyspace]] keys=%d] + + set rd [setup_single_keyspace_notification r] + + # Set field to expire immediately + r HGETEX myhash $command [get_past_zero_expire_value $command] FIELDS 1 f1 + + # Verify only the field is deleted and the key still exists + assert_keyevent_patterns $rd myhash hexpired + assert_equal -2 [r HTTL myhash FIELDS 1 f1] + assert_equal 3 [r HLEN myhash] + assert_equal 1 [r EXISTS myhash] + assert_match 1 [scan [regexp -inline {keys\=([\d]*)} [r info keyspace]] keys=%d] + + $rd close + } + } +} + +# HSETEX #### +start_server {tags {"hashexpire"}} { + test {HSETEX KEEPTTL - preserves existing TTL of field} { + r FLUSHALL + + # Set a field with a known TTL + r HSETEX myhash PX 1000 FIELDS 1 field1 val1 + set original_pttl [r HPTTL myhash FIELDS 1 field1] + set original_expiretime [r HEXPIRETIME myhash FIELDS 1 field1] + + # Validate TTL is active and expiretime is in the future + assert {$original_pttl > 0} + assert {$original_expiretime > [clock seconds]} + + # Overwrite the field with KEEPTTL + r HSETEX myhash KEEPTTL FIELDS 1 field1 newval + + # Ensure TTL is preserved + set updated_pttl [r HPTTL myhash FIELDS 1 field1] + set updated_expiretime [r HEXPIRETIME myhash FIELDS 1 field1] + assert {$updated_pttl > 0} + assert {$updated_pttl <= $original_pttl} + 
assert_equal $original_expiretime $updated_expiretime + + # Ensure value was updated + assert_equal newval [r HGET myhash field1] + } + + test {HSETEX EX - FIELDS 0 returns error} { + r FLUSHALL + catch {r HSETEX myhash EX 10 FIELDS 0} e + set e + } {ERR *} + + test {HSETEX EX - test negative ttl} { + set ttl -10 + catch {r HSETEX myhash EX $ttl FIELDS 1 field1 val1} e + set e + } {ERR invalid expire time in 'hsetex' command} + + test {HSETEX EX - test non-numeric ttl} { + set ttl abc + catch {r HSETEX myhash EX $ttl FIELDS 1 field1 val1} e + set e + } {ERR value is not an integer or out of range} + + test {HSETEX EX - overwrite field resets TTL} { + r FLUSHALL + r HSETEX myhash EX 100 FIELDS 1 field1 val1 + r HSETEX myhash EX 200 FIELDS 1 field1 newval + assert_equal 200 [r HTTL myhash FIELDS 1 field1] + assert_equal newval [r HGET myhash field1] + } + + test {HSETEX EX - test zero ttl expires immediately} { + r FLUSHALL + r HSETEX myhash EX 0 FIELDS 1 field1 val1 + after 10 + assert_equal 0 [r HEXISTS myhash field1] + } + + test {HSETEX EX - test mix of expiring and persistent fields} { + r FLUSHALL + r HSET myhash field2 "persistent" + r HSETEX myhash EX 1 FIELDS 1 field1 "temp" + after 1100 + assert_equal 0 [r HEXISTS myhash field1] + assert_equal 1 [r HEXISTS myhash field2] + } + + test {HSETEX EX - test missing TTL} { + catch {r HSETEX myhash EX FIELDS 1 field1 val1} e + set e + } {ERR *} + + test {HSETEX EX - mismatched field/value count} { + catch {r HSETEX myhash EX 10 FIELDS 2 field1 val1} e + set e + } {ERR *} + + + + ###### PX ####### + + test {HSETEX PX - test negative ttl} { + set ttl -50 + catch {r HSETEX myhash PX $ttl FIELDS 1 field1 val1} e + set e + } {ERR invalid expire time in 'hsetex' command} + + test {HSETEX PX - test non-numeric ttl} { + set ttl xyz + catch {r HSETEX myhash PX $ttl FIELDS 1 field1 val1} e + set e + } {ERR value is not an integer or out of range} + + test {HSETEX PX - overwrite field resets TTL} { + r FLUSHALL + r HSETEX 
myhash PX 10000 FIELDS 1 field1 val1 + r HSETEX myhash PX 20000 FIELDS 1 field1 newval + set ttl [r HPTTL myhash FIELDS 1 field1] + assert {$ttl >= 19000 && $ttl <= 20000} + assert_equal newval [r HGET myhash field1] + } + + test {HSETEX PX - test zero ttl expires immediately} { + r FLUSHALL + r HSETEX myhash PX 0 FIELDS 1 field1 val1 + after 10 + assert_equal 0 [r HEXISTS myhash field1] + } + + test {HSETEX PX - test mix of expiring and persistent fields} { + r FLUSHALL + r HSET myhash field2 "persistent" + r HSETEX myhash PX 10 FIELDS 1 field1 "temp" + after 20 + assert_equal 0 [r HEXISTS myhash field1] + assert_equal 1 [r HEXISTS myhash field2] + } + + test {HSETEX PX - test missing TTL} { + catch {r HSETEX myhash PX FIELDS 1 field1 val1} e + set e + } {ERR *} + + # test {HSETEX PX - mismatched field/value count} { + # catch {r HSETEX myhash PX 100 FIELDS 2 field1 val1} e + # set e + # } {ERR wrong number of arguments for 'hsetex' command} + + + ## FNX/FXX + + # hsetex throws ERR *, it shouldn't + test {HSETEX EX FNX - set only if none of the fields exist} { + r FLUSHALL + r HSET myhash field1 val1 + set res [r HSETEX myhash EX 10 FNX FIELDS 1 field1 val2] + assert_equal 0 $res + assert_equal val1 [r HGET myhash field1] + + # Now try with all-new fields + set res [r HSETEX myhash EX 10 FNX FIELDS 2 f2 v2 f3 v3] + assert_equal 1 $res + assert_equal v2 [r HGET myhash f2] + assert_equal v3 [r HGET myhash f3] + } + + test {HSETEX EX FXX - set only if all fields exist} { + r FLUSHALL + r HSET myhash field1 val1 field2 val2 + set res [r HSETEX myhash EX 10 FXX FIELDS 2 field1 new1 field2 new2] + assert_equal 1 $res + assert_equal new1 [r HGET myhash field1] + assert_equal new2 [r HGET myhash field2] + + # Now try when one field doesn't exist + set res [r HSETEX myhash EX 10 FXX FIELDS 2 field1 x fieldX y] + assert_equal 0 $res + assert_equal new1 [r HGET myhash field1] + assert_equal 0 [r HEXISTS myhash fieldX] + } + + # Syntax error: HSETEX myhash PX 100 FNX FIELDS 2 
x 2 y 3 + test {HSETEX PX FNX - partial conflict returns 0} { + r FLUSHALL + r HSET myhash x 1 + set res [r HSETEX myhash PX 100 FNX FIELDS 2 x 2 y 3] + assert_equal 0 $res + assert_equal 1 [r HEXISTS myhash x] + assert_equal 0 [r HEXISTS myhash y] + } + + test {HSETEX PX FXX - one field missing returns 0} { + r FLUSHALL + r HSET myhash a 1 + set res [r HSETEX myhash PX 100 FXX FIELDS 2 a 2 b 3] + assert_equal 0 $res + assert_equal 1 [r HGET myhash a] + assert_equal 0 [r HEXISTS myhash b] + } + + test {HSETEX EX - FNX and FXX conflict error} { + catch {r HSETEX myhash EX 10 FNX FXX FIELDS 1 x y} e + set e + } {ERR *} + + ###### Test EXPIRE ############# + + + # Basic Expiry Functionality + test {HEXPIRE - set TTL on existing field} { + r FLUSHALL + r HSET myhash field1 hello + r HEXPIRE myhash 10 FIELDS 1 field1 + set ttl [r HTTL myhash FIELDS 1 field1] + assert {$ttl > 0} + } + + test {HEXPIRE - TTL 0 deletes field} { + r FLUSHALL + r HSET myhash field1 goodbye + set res [r HEXPIRE myhash 0 FIELDS 1 field1] + assert_equal {2} $res + assert_equal 0 [r HEXISTS myhash field1] + } + + test {HEXPIRE - negative TTL returns error} { + r FLUSHALL + r HSET myhash field1 val + catch {r HEXPIRE myhash -5 FIELDS 1 field1} e + set e + } {ERR invalid expire time in 'hexpire' command} + + test {HEXPIRE - wrong type key returns error} { + r FLUSHALL + r SET myhash notahash + catch {r HEXPIRE myhash 10 FIELDS 1 field1} e + set e + } {WRONGTYPE Operation against a key holding the wrong kind of value} + + # Conditionals: NX + test {HEXPIRE NX - only set when field has no TTL} { + r FLUSHALL + r HSETEX myhash PX 100 FIELDS 1 field1 val + set res [r HEXPIRE myhash 10 NX FIELDS 1 field1] + assert_equal {0} $res + + r HSET myhash field2 val2 + set res2 [r HEXPIRE myhash 10 NX FIELDS 1 field2] + assert_equal {1} $res2 + } + + # Conditionals: XX + test {HEXPIRE XX - only set when field has TTL} { + r FLUSHALL + r HSET myhash field1 val1 field2 val2 + r HEXPIRE myhash 20 FIELDS 1 field1 + 
set res [r HEXPIRE myhash 30 XX FIELDS 2 field1 field2] + assert_equal {1 0} $res + } + + # Conditionals: GT + test {HEXPIRE GT - only set if new TTL > existing TTL} { + r FLUSHALL + r HSETEX myhash EX 300 FIELDS 1 field1 val1 + after 10 + set res [r HEXPIRE myhash 600 GT FIELDS 1 field1] ;# 600s > 300s remaining + assert_equal {1} $res + + # GT should fail if field is persistent + r HSET myhash field2 val2 + set res2 [r HEXPIRE myhash 1 GT FIELDS 1 field2] + assert_equal {0} $res2 + } + + # Conditionals: LT + test {HEXPIRE LT - only set if new TTL < existing TTL} { + r FLUSHALL + r HSETEX myhash EX 600 FIELDS 1 field1 val1 + set res [r HEXPIRE myhash 1 LT FIELDS 1 field1] + assert_equal {1} $res + + ## TODO: is this really the expected behavior? What does a non-existing TTL mean for LT? + r HSET myhash field2 val2 + set res2 [r HEXPIRE myhash 1 LT FIELDS 1 field2] + assert_equal {1} $res2 + } + + # TTL Refresh + test {HEXPIRE - refresh TTL with new value} { + r FLUSHALL + r HSET myhash field1 val1 + r HEXPIRE myhash 1 FIELDS 1 field1 + after 500 + r HEXPIRE myhash 3 FIELDS 1 field1 + set ttl [r HTTL myhash FIELDS 1 field1] + assert {$ttl >= 2} + } + + # HEXPIRE on a non-existent field + test {HEXPIRE on a non-existent field (should not create field)} { + r FLUSHALL + r HSET myhash f1 v1 + r HEXPIRE myhash 1000 FIELDS 1 f2 + assert_equal 0 [r HEXISTS myhash f2] + assert_equal -2 [r HTTL myhash FIELDS 1 f2] + } + + # Error Cases + test {HEXPIRE - conflicting conditions error} { + r FLUSHALL + r HSET myhash field1 val + catch {r HEXPIRE myhash 10 NX XX FIELDS 1 field1} e + set e + } {ERR NX and XX, GT or LT options at the same time are not compatible} + + test {HEXPIRE - missing FIELDS error} { + r FLUSHALL + r HSET myhash field1 val + catch {r HEXPIRE myhash 10} e + set e + } {ERR wrong number of arguments for 'hexpire' command} + + test {HEXPIRE - no fields after FIELDS keyword} { + r FLUSHALL + r HSET myhash field1 val + catch {r HEXPIRE myhash 10 FIELDS 0} e + set e + } 
{ERR wrong number of arguments for 'hexpire' command} + + test {HEXPIRE - non-integer TTL error} { + r FLUSHALL + r HSET myhash field1 val + catch {r HEXPIRE myhash abc FIELDS 1 field1} e + set e + } {ERR value is not an integer or out of range} + + test {HEXPIRE - non-existing key returns -2} { + r FLUSHALL + set res [r HEXPIRE nokey 10 FIELDS 1 field1] + assert_equal {-2} $res + } + + test {HEXPIRE EX - set TTL on multiple fields} { + r FLUSHALL + r HSET myhash fieldA valA fieldB valB + set ttl 100 + r HEXPIRE myhash $ttl FIELDS 2 fieldA fieldB + + set ttlA [r HTTL myhash FIELDS 1 fieldA] + set ttlB [r HTTL myhash FIELDS 1 fieldB] + + assert { $ttlA > 0 && $ttlA <= $ttl } + assert { $ttlB > 0 && $ttlB <= $ttl } + } {} + + test {HEXPIRE returns -2 on non-existing key} { + r FLUSHALL + assert_equal {-2 -2} [r HEXPIRE nokey 10 FIELDS 2 field1 field2] + } {} + + test {HEXPIRE - GT condition fails when field has no TTL} { + r FLUSHALL + r HSET myhash f1 v1 + assert_equal 0 [r HEXPIRE myhash 10 GT fields 1 f1] + } + + test {HEXPIRE - LT condition succeeds when field has no TTL} { + r FLUSHALL + r HSET myhash f1 v1 + assert_equal 1 [r HEXPIRE myhash 10 LT fields 1 f1] + } + + ##### HTTL ##### + test {HTTL - persistent field returns -1} { + r FLUSHALL + r HSET myhash field1 val1 + assert_equal -1 [r HTTL myhash FIELDS 1 field1] + } {} + + test {HTTL - non-existent field returns -2} { + r FLUSHALL + r HSET myhash field1 val1 + assert_equal -2 [r HTTL myhash FIELDS 1 nofield] + } {} + + test {HTTL - non-existent key returns -2} { + r FLUSHALL + assert_equal -2 [r HTTL nokey FIELDS 1 field1] + } {} + + ##### EXPIRETIME ###### + + # Basic Expiry Functionality + test {HEXPIREAT - set absolute expiry on field} { + r FLUSHALL + r HSET myhash field1 hello + set now [clock seconds] + set exp [expr {$now + 30}] + r HEXPIREAT myhash $exp FIELDS 1 field1 + set etime [r HEXPIRETIME myhash FIELDS 1 field1] + assert_equal $exp $etime + } + + test {HEXPIREAT - timestamp in past deletes 
field immediately} { + r FLUSHALL + r HSET myhash field1 gone + set past [expr {[clock seconds] - 1000}] + set res [r HEXPIREAT myhash $past FIELDS 1 field1] + assert_equal {2} $res + assert_equal 0 [r HEXISTS myhash field1] + } + + + test {HEXPIREAT - set TTL on multiple fields (existing + non-existing)} { + r FLUSHALL + r HSET myhash field1 hello field2 world + set exp [expr {[clock seconds] + 10}] + set res [r HEXPIREAT myhash $exp FIELDS 3 field1 field2 fieldX] + assert_equal {1 1 -2} $res + } + + + # Conditionals: NX + test {HEXPIREAT NX - only set when field has no TTL} { + r FLUSHALL + r HSETEX myhash EX 100 FIELDS 1 field1 val + set exp [expr {[clock seconds] + 100}] + set res [r HEXPIREAT myhash $exp NX FIELDS 1 field1] + assert_equal {0} $res + + r HSET myhash field2 val2 + set res2 [r HEXPIREAT myhash $exp NX FIELDS 1 field2] + assert_equal {1} $res2 + } + + # Conditionals: XX + test {HEXPIREAT XX - only set when field has TTL} { + r FLUSHALL + r HSET myhash field1 val1 field2 val2 + set exp1 [expr {[clock seconds] + 20}] + r HEXPIREAT myhash $exp1 FIELDS 1 field1 + set exp2 [expr {[clock seconds] + 30}] + set res [r HEXPIREAT myhash $exp2 XX FIELDS 2 field1 field2] + assert_equal {1 0} $res + } + + # Conditionals: GT + test {HEXPIREAT GT - only set if new expiry > existing} { + r FLUSHALL + r HSETEX myhash PX 5000 FIELDS 1 field1 val1 + after 10 + set now [clock seconds] + set future [expr {$now + 10}] + set res [r HEXPIREAT myhash $future GT FIELDS 1 field1] + assert_equal {1} $res + + r HSET myhash field2 val2 + set res2 [r HEXPIREAT myhash $future GT FIELDS 1 field2] + assert_equal {0} $res2 + } + + + # Conditionals: LT + test {HEXPIREAT LT - only set if new expiry < existing} { + r FLUSHALL + set now [clock seconds] + # now + 20K seconds + set long_future_expiration [expr {$now + 20000}] + # now + 1K seconds + set short_future_expiration [expr {$now + 1000}] + r HSETEX myhash EXAT $long_future_expiration FIELDS 1 field1 val1 + assert_equal {1} [r 
HEXPIREAT myhash $short_future_expiration LT FIELDS 1 field1] + + r HSET myhash field2 val2 + assert_equal {1} [r HEXPIREAT myhash $short_future_expiration LT FIELDS 1 field2] + # TODO is this the expected behavior? if no TTL exist, it should be treated as minimum ttl possible? + } + + test {HEXPIREAT - refresh TTL with new future timestamp} { + r FLUSHALL + r HSET myhash field1 val1 + + # Set initial expiry to very near future + set ts1 [expr {[clock seconds] + 10}] + r HEXPIREAT myhash $ts1 FIELDS 1 field1 + + # Immediately refresh to a further expiry (no sleep needed) + set ts2 [expr {$ts1 + 5}] + r HEXPIREAT myhash $ts2 FIELDS 1 field1 + + # Confirm that expiry was updated + set actual [r HEXPIRETIME myhash FIELDS 1 field1] + assert_equal $ts2 $actual + } + + + # TTL Validations + test {HEXPIREAT - TTL is accurate via HEXPIRETIME} { + r FLUSHALL + r HSET myhash field1 val1 + set ts [expr {[clock seconds] + 50}] + r HEXPIREAT myhash $ts FIELDS 1 field1 + set returned [r HEXPIRETIME myhash FIELDS 1 field1] + assert_equal $ts $returned + } + + # Error Cases + test {HEXPIREAT - conflicting options error} { + r FLUSHALL + r HSET myhash field1 val + set ts [expr {[clock seconds] + 5}] + catch {r HEXPIREAT myhash $ts NX XX FIELDS 1 field1} e + set e + } {ERR NX and XX, GT or LT options at the same time are not compatible} + + + + test {HEXPIREAT - missing FIELDS keyword} { + r FLUSHALL + r HSET myhash field1 val + set ts [expr {[clock seconds] + 5}] + catch {r HEXPIREAT myhash $ts} e + set e + } {ERR wrong number of arguments for 'hexpireat' command} + + test {HEXPIREAT - no fields after FIELDS} { + r FLUSHALL + r HSET myhash field1 val + set ts [expr {[clock seconds] + 5}] + catch {r HEXPIREAT myhash $ts FIELDS 0} e + set e + } {ERR wrong number of arguments for 'hexpireat' command} + + test {HEXPIREAT - non-integer timestamp} { + r FLUSHALL + r HSET myhash field1 val + catch {r HEXPIREAT myhash tomorrow FIELDS 1 field1} e + set e + } {ERR value is not an integer or 
out of range} + + + + test {HEXPIREAT - non-existing key returns -2} { + r FLUSHALL + set ts [expr {[clock seconds] + 5}] + set res [r HEXPIREAT nokey $ts FIELDS 1 field1] + assert_equal {-2} $res + } + + #################### HEXPIRETIME ################## + + # Basic TTL retrieval + test {HEXPIRETIME - returns expiry timestamp for single field with TTL} { + r FLUSHALL + r HSET myhash field1 val + set ts [expr {[clock seconds] + 3}] + r HEXPIREAT myhash $ts FIELDS 1 field1 + set out [r HEXPIRETIME myhash FIELDS 1 field1] + assert_equal $ts $out + } + + + # No expiration set + test {HEXPIRETIME - field has no TTL returns -1} { + r FLUSHALL + r HSET myhash field1 val + set out [r HEXPIRETIME myhash FIELDS 1 field1] + assert_equal -1 $out + } + + # Non-existent field + test {HEXPIRETIME - field does not exist returns -2} { + r FLUSHALL + r HSET myhash field1 val + set out [r HEXPIRETIME myhash FIELDS 1 fieldX] + assert_equal -2 $out + } + + # Non-existent key + test {HEXPIRETIME - key does not exist returns -2} { + r FLUSHALL + set out [r HEXPIRETIME missingkey FIELDS 1 field1] + assert_equal -2 $out + } + + # Multiple fields: mix of TTL, no TTL, and missing + test {HEXPIRETIME - multiple fields mixed cases} { + r FLUSHALL + r HSET myhash f1 a f2 b + set now [clock seconds] + r HEXPIREAT myhash [expr {$now + 100}] FIELDS 1 f1 + set out [r HEXPIRETIME myhash FIELDS 3 f1 f2 f3] + # Should return: expiry for f1, -1 for f2 (no TTL), -2 for f3 (not found) + assert_equal [list [expr {$now + 100}] -1 -2] $out + } + + # Invalid usages + test {HEXPIRETIME - no FIELDS keyword} { + r FLUSHALL + r HSET myhash f1 a + catch {r HEXPIRETIME myhash} e + set e + } {ERR wrong number of arguments for 'hexpiretime' command} + + test {HEXPIRETIME - FIELDS 0} { + r FLUSHALL + r HSET myhash f1 a + catch {r HEXPIRETIME myhash FIELDS 0} e + set e + } {ERR wrong number of arguments for 'hexpiretime' command} + + test {HEXPIRETIME - wrong FIELDS count} { + r FLUSHALL + r HSET myhash f1 a + catch 
{r HEXPIRETIME myhash FIELDS 1} e + set e + } {ERR wrong number of arguments for 'hexpiretime' command} + + test {HEXPIRETIME - wrong type key} { + r FLUSHALL + r SET myhash "not a hash" + catch {r HEXPIRETIME myhash FIELDS 1 f1} e + set e + } {WRONGTYPE Operation against a key holding the wrong kind of value} + + + # Basic expiration in milliseconds + test {HPEXPIREAT - set absolute expiry with ms precision} { + r FLUSHALL + r HSET myhash field1 val + set now [clock milliseconds] + set future [expr {$now + 123456789}] + r HPEXPIREAT myhash $future FIELDS 1 field1 + set t [r HPEXPIRETIME myhash FIELDS 1 field1] + assert_equal $future $t + } + + test {HPEXPIREAT - past timestamp deletes field immediately} { + r FLUSHALL + r HSET myhash field1 val + set past [expr {[clock milliseconds] - 10000}] + set res [r HPEXPIREAT myhash $past FIELDS 1 field1] + assert_equal {2} $res + assert_equal 0 [r HEXISTS myhash field1] + } + + test {HPEXPIREAT - non-existent key returns -2} { + r FLUSHALL + set ts [expr {[clock milliseconds] + 1000}] + set res [r HPEXPIREAT nokey $ts FIELDS 1 field1] + assert_equal {-2} $res + } + + test {HPEXPIREAT - mixed fields} { + r FLUSHALL + r HSET myhash f1 a f2 b + set ts [expr {[clock milliseconds] + 200000}] + set res [r HPEXPIREAT myhash $ts FIELDS 3 f1 f2 fX] + assert_equal {1 1 -2} $res + } + + test {HPEXPIREAT - GT and LT options with success and failure cases} { + r FLUSHALL + r HSET myhash f1 a + + # Setup: assign a baseline expiry time + set now [clock milliseconds] + set ts1 [expr {$now + 10000}] + set ts2 [expr {$now + 20000}] + r HPEXPIREAT myhash $ts1 FIELDS 1 f1 + + # --- GT Case --- + # ts2 > ts1 → should succeed + set res_gt_pass [r HPEXPIREAT myhash $ts2 GT FIELDS 1 f1] + assert_equal {1} $res_gt_pass + + # ts1 < ts2 → now try GT with ts1 again (should fail because ts2 is already set) + set res_gt_fail [r HPEXPIREAT myhash $ts1 GT FIELDS 1 f1] + assert_equal {0} $res_gt_fail + + # --- LT Case --- + # ts1 < ts2 → LT should fail + 
set res_lt_fail [r HPEXPIREAT myhash $ts2 LT FIELDS 1 f1] + assert_equal {0} $res_lt_fail + + # ts1 < ts2 → try LT with earlier timestamp, should succeed + set ts0 [expr {$now + 5000}] + set res_lt_pass [r HPEXPIREAT myhash $ts0 LT FIELDS 1 f1] + assert_equal {1} $res_lt_pass + } + + test {HPEXPIREAT - invalid inputs} { + r FLUSHALL + r HSET myhash f1 a + catch {r HPEXPIREAT myhash abc FIELDS 1 f1} e + assert_match {*not an integer*} $e + + catch {r HPEXPIREAT myhash 12345 NX XX FIELDS 1 f1} e2 + assert_match {ERR NX and XX, GT or LT options at the same time are not compatible} $e2 + } + + + test {HPEXPIRETIME - check with multiple fields} { + r FLUSHALL + + # Setup: one expiring field, one persistent, one missing + r HSET myhash f1 v1 f2 v2 + set ts [expr {[clock milliseconds] + 1000}] + r HPEXPIREAT myhash $ts FIELDS 1 f1 + + # Query all 3 fields + set result [r HPEXPIRETIME myhash FIELDS 3 f1 f2 f3] + + # Expect: [timestamp] for f1, -1 for f2, -2 for f3 + assert {[llength $result] == 3} + # f1: has TTL → returns exact timestamp + assert_equal $ts [lindex $result 0] + + # f2: exists, no TTL → returns -1 + assert_equal -1 [lindex $result 1] + + # f3: doesn't exist → returns -2 + assert_equal -2 [lindex $result 2] + + } + + #################### HPERSIST ################## + + test "HPERSIST - field does not exist" { + r FLUSHALL + r hset myhash field1 value1 + assert_equal {-2} [r hpersist myhash FIELDS 1 field2] + } + + test "HPERSIST - key does not exist" { + r FLUSHALL + assert_equal {-2} [r hpersist nonexistent FIELDS 1 field1] + } + + test "HPERSIST - field exists but no expiration" { + r del myhash + r hset myhash field1 value1 + assert_equal {-1} [r hpersist myhash FIELDS 1 field1] + } + + test "HPERSIST - field exists with expiration" { + r FLUSHALL + r hset myhash field1 value1 + r hexpire myhash 600 FIELDS 1 field1 + assert_morethan [r httl myhash FIELDS 1 field1] 0 + assert_equal {1} [r hpersist myhash FIELDS 1 field1] + assert_equal {-1} [r httl myhash 
FIELDS 1 field1] + } + + test "HPERSIST - multiple fields with mixed state" { + r FLUSHALL + r hset myhash f1 v1 + r hset myhash f2 v2 + r hset myhash f3 v3 + r hexpire myhash 600 FIELDS 1 f1 + # f2 will have no expiration + # f4 does not exist + assert_equal {1 -1 -2} [r hpersist myhash FIELDS 3 f1 f2 f4] + } + + test {HPERSIST, then HEXPIRE, check new TTL is set} { + r FLUSHALL + r HSET myhash f1 v1 + r HEXPIRE myhash 1000 FIELDS 1 f1 + assert_equal 1 [r HPERSIST myhash FIELDS 1 f1] + r HEXPIRE myhash 2000 FIELDS 1 f1 + assert_morethan [r HTTL myhash FIELDS 1 f1] 1000 + } + + #################### HRANDFIELD ################## + + test "HRANDFIELD - CASE 1: negative count" { + r FLUSHALL + assert_equal {1} [r HSETEX myhash PX 1 fields 5 f1 v1 f2 v2 f3 v3 f4 v4 f5 v5] + wait_for_condition 100 100 { + [r HGETALL myhash] eq {} + } else { + fail "Hash is showing expired elements" + } + # check that we get an empty response even though there are expired fields + assert_match {} [r hrandfield myhash 1] + + # Now write a persistent element + assert_equal {1} [r HSET myhash f5 v5] + # make sure this is the element we will get all the time + for {set i 1} {$i <= 50} {incr i} { + assert_equal {f5 f5 f5 f5 f5} [r hrandfield myhash -5] + } + + } + + test "HRANDFIELD - CASE 2: The number of requested elements is greater than the number of elements inside the hash" { + r FLUSHALL + assert_equal {1} [r HSETEX myhash PX 1 fields 5 f1 v1 f2 v2 f3 v3 f4 v4 f5 v5] + wait_for_condition 100 100 { + [r HGETALL myhash] eq {} + } else { + fail "Hash is showing expired elements" + } + # check that we get an empty response even though there are expired fields + assert_match {} [r hrandfield myhash 10] + + # Now write a persistent element + assert_equal {3} [r HSET myhash f5 v5 f6 v6 f7 v7] + # make sure this is the element we will get all the time + for {set i 1} {$i <= 50} {incr i} { + set result [r hrandfield myhash 10] + assert_equal 3 [llength [split $result]] + assert_match {*f5*} 
$result + assert_match {*f6*} $result + assert_match {*f7*} $result + } + + } + + test "HRANDFIELD - CASE 3: The number of elements inside the hash is not greater than 3 times the number of requested elements" { + r FLUSHALL + assert_equal {1} [r HSETEX myhash PX 1 fields 5 f1 v1 f2 v2 f3 v3 f4 v4 f5 v5] + wait_for_condition 100 100 { + [r HGETALL myhash] eq {} + } else { + fail "Hash is showing expired elements" + } + # check that we get an empty response even though there are expired fields + assert_match {} [r hrandfield myhash 4] + + # Now write a persistent elements + assert_equal {4} [r HSET myhash f5 v5 f6 v6 f7 v7 f8 v8] + # make sure this is the elements we will get all the time + for {set i 1} {$i <= 50} {incr i} { + set result [r hrandfield myhash 4] + assert_equal 4 [llength [split $result]] + assert_match {*f5*} $result + assert_match {*f6*} $result + assert_match {*f7*} $result + assert_match {*f8*} $result + } + } + + test "HRANDFIELD - CASE 4: The number of elements inside the hash is greater than 3 times the number of requested elements" { + r FLUSHALL + assert_equal {1} [r HSETEX myhash PX 1 fields 8 f1 v1 f2 v2 f3 v3 f4 v4 f5 v5 f6 v6 f7 v7 f8 v8] + wait_for_condition 100 100 { + [r HGETALL myhash] eq {} + } else { + fail "Hash is showing expired elements" + } + + # check that we get an empty response even though there are expired fields + assert_match {} [r hrandfield myhash 2] + + # Now write a persistent elements + assert_equal {3} [r HSET myhash f8 v8 f9 v9 f10 v10] + # make sure this is the elements we will get all the time + for {set i 1} {$i <= 50} {incr i} { + set result [r hrandfield myhash 3] + assert_equal 3 [llength [split $result]] + assert_match {*f8*} $result + assert_match {*f9*} $result + assert_match {*f10*} $result + } + } +} + +####### Expiry fields skip tests +start_server {tags {"hashexpire"}} { + test {HGETALL skips expired fields} { + r FLUSHALL + r DEBUG SET-ACTIVE-EXPIRE no + + # Set two fields: one persistent, one with 
short TTL + r HSET myhash persistent "val1" + r HSETEX myhash PX 5 FIELDS 1 expiring "val2" + + # Wait for expiry to pass + after 10 + + # HGETALL should skip expired field + set result [r HGETALL myhash] + assert_equal {persistent val1} $result + + # Re-enable active expiry + r DEBUG SET-ACTIVE-EXPIRE yes + } {OK} {needs:debug} + + test {HSCAN skips expired fields} { + r FLUSHALL + r DEBUG SET-ACTIVE-EXPIRE no + + # Set multiple fields, one with expiry + r HSET myhash persistent1 "a" persistent2 "b" + r HSETEX myhash PX 5 FIELDS 1 expiring "c" + + # Wait for expiration + after 10 + + # HSCAN must not return the expired field + set cursor 0 + set allfields {} + while {1} { + set res [r HSCAN myhash $cursor] + set cursor [lindex $res 0] + set kvs [lindex $res 1] + lappend allfields {*}$kvs + if {$cursor eq "0"} break + } + + # Extract just the field names + set fieldnames [lmap {k v} $allfields { set k }] + set fieldnames_sorted [lsort $fieldnames] + + # Should only include persistent1 and persistent2 + assert_equal {persistent1 persistent2} $fieldnames_sorted + + # Re-enable active expiry for future tests + r DEBUG SET-ACTIVE-EXPIRE yes + } {OK} {needs:debug} + + test {MOVE preserves field TTLs} { + r FLUSHALL + r SELECT 0 + r HSETEX myhash PX 50000 FIELDS 1 field1 val1 + + # Capture original TTL + set original_ttl [r HPTTL myhash FIELDS 1 field1] + assert {$original_ttl > 0} + + # Move to DB 1 + assert_equal 1 [r MOVE myhash 1] + + # Switch to target DB + r SELECT 1 + + # Field must exist and TTL must be preserved + set moved_ttl [r HPTTL myhash FIELDS 1 field1] + assert {$moved_ttl > 0 && $moved_ttl <= $original_ttl} + } {} {needs:debug} + + test {HSET - overwrite expired field without TTL clears expiration} { + r FLUSHALL + r debug SET-ACTIVE-EXPIRE no + + # This test verifies that if a field has expired (but not yet lazily deleted), + # and it is overwritten using a plain HSET (i.e., no TTL), + # Valkey treats the field as non existing and updates it, + # 
effectively clearing the old TTL and making the field persistent. + + r HSETEX myhash PX 10 FIELDS 1 field1 oldval + wait_for_condition 100 100 { + [r HTTL myhash FIELDS 1 field1] eq "-2" + } else { + fail "hash value was not expired after timeout" + } + + # Field should still be present in memory because active expiry is disabled + assert_equal 1 [r HLEN myhash] + + # Overwrite with HSET (no TTL) before accessing + r HSET myhash field1 newval + + # TTL should now be gone; field becomes persistent + set ttl [r HPTTL myhash FIELDS 1 field1] + assert_equal -1 $ttl + assert_equal newval [r HGET myhash field1] + assert_equal 1 [r HLEN myhash] + + r debug SET-ACTIVE-EXPIRE yes + } {OK} {needs:debug} + + test {HINCRBY - on expired field} { + r FLUSHALL + r debug SET-ACTIVE-EXPIRE no + + # This test verifies that if a field has expired, + # and it is incremented using a plain HINCRBY (i.e., no TTL), + # Valkey treats the field as non-existing and recreates it, + # effectively clearing the old TTL and starting the value from 0. 
+ + r HSETEX myhash PX 10 FIELDS 1 field1 1 + wait_for_condition 100 100 { + [r HTTL myhash FIELDS 1 field1] eq "-2" + } else { + fail "hash value was not expired after timeout" + } + + # Field should still be present in memory + assert_equal 1 [r HLEN myhash] + + # Increment with HINCRBY (no TTL) before accessing + r HINCRBY myhash field1 1 + + # Sanity check: check we only have one field in the hash + assert_equal 1 [r HLEN myhash] + + # TTL should now be gone; field becomes persistent + set ttl [r HPTTL myhash FIELDS 1 field1] + assert_equal -1 $ttl + assert_equal 1 [r HGET myhash field1] + assert_equal 1 [r HLEN myhash] + + # set expiration on the field + assert_equal 1 [r HEXPIRE myhash 100000000 FIELDS 1 field1] + # verify the field has TTL + assert_morethan [r HPTTL myhash FIELDS 1 field1] 0 + # now incr the field again + assert_equal 2 [r HINCRBY myhash field1 1] + # verify the field still has its TTL + assert_morethan [r HPTTL myhash FIELDS 1 field1] 0 + r debug SET-ACTIVE-EXPIRE yes + } {OK} {needs:debug} + + test {HINCRBYFLOAT - on expired field} { + r FLUSHALL + r debug SET-ACTIVE-EXPIRE no + + # This test verifies that if a field has expired, + # and it is incremented using a plain HINCRBYFLOAT (i.e., no TTL), + # Valkey treats the field as non-existing and recreates it, + # effectively clearing the old TTL and starting the value from 0. 
+ + r HSETEX myhash PX 10 FIELDS 1 field1 1 + wait_for_condition 100 100 { + [r HTTL myhash FIELDS 1 field1] eq "-2" + } else { + fail "hash value was not expired after timeout" + } + + # Field should still be present in memory + assert_equal 1 [r HLEN myhash] + + # Increment with HINCRBYFLOAT (no TTL) before accessing + r HINCRBYFLOAT myhash field1 1 + + # Sanity check: check we only have one field in the hash + assert_equal 1 [r HLEN myhash] + + # TTL should now be gone; field becomes persistent + set ttl [r HPTTL myhash FIELDS 1 field1] + assert_equal -1 $ttl + assert_equal 1 [r HGET myhash field1] + assert_equal 1 [r HLEN myhash] + + # set expiration on the field + assert_equal 1 [r HEXPIRE myhash 100000000 FIELDS 1 field1] + # verify the field has TTL + assert_morethan [r HPTTL myhash FIELDS 1 field1] 0 + # now incr the field again + assert_equal 2 [r HINCRBYFLOAT myhash field1 1] + # verify the field still has its TTL + assert_morethan [r HPTTL myhash FIELDS 1 field1] 0 + r debug SET-ACTIVE-EXPIRE yes + } {OK} {needs:debug} + + test {HSET - overwrite unexpired field removes TTL} { + r FLUSHALL + r debug SET-ACTIVE-EXPIRE no + + # This test verifies that overwriting a field with HSET, + # even while its TTL is still valid (not expired), + # clears the TTL and makes the field persistent. + # This behavior is consistent with how SET clears the TTL of a top-level key. 
+ + # Set field with long TTL + r HSETEX myhash PX 1000 FIELDS 1 field1 val1 + + # Confirm TTL is active + set before [r HPTTL myhash FIELDS 1 field1] + assert {$before > 0} + + # Overwrite with HSET before TTL expires + r HSET myhash field1 newval + + # TTL should now be gone + set after [r HPTTL myhash FIELDS 1 field1] + assert_equal -1 $after + assert_equal newval [r HGET myhash field1] + + r debug SET-ACTIVE-EXPIRE yes + } {OK} {needs:debug} + + test {HDEL - expired field is removed without triggering expiry logic} { + r FLUSHALL + r debug SET-ACTIVE-EXPIRE no + + # This test verifies that deleting an expired field with HDEL + # does NOT trigger Valkey's expiration mechanism. + # + # The key observation is that Valkey tracks how many fields were + # expired via TTL using the `expired_subkeys` counter in INFO stats. + # If HDEL caused expiration to be processed internally, + # this counter would increment. We assert that it remains unchanged. + + # Capture expired_subkeys before + set before_info [r INFO stats] + set before [info_field $before_info expired_subkeys] + + # Create field with short TTL + r HSETEX myhash PX 10 FIELDS 1 field1 val1 + after 20 + + # Field is technically expired, but still in memory because active expiry is disabled + assert_equal 1 [r HLEN myhash] + + # Delete the expired field directly + r HDEL myhash field1 + + # Field should be gone + assert_equal 0 [r HEXISTS myhash field1] + + # Capture expired_subkeys again + set after_info [r INFO stats] + set after [info_field $after_info expired_subkeys] + + # Verify that no expiry occurred internally + assert_equal $before $after + r debug SET-ACTIVE-EXPIRE yes + } {OK} {needs:debug} + + test {HDEL on field with TTL, then re-add and check TTL is gone} { + r FLUSHALL + r HSET myhash f1 v1 + r HEXPIRE myhash 10000 FIELDS 1 f1 + assert_morethan [r HTTL myhash FIELDS 1 f1] 0 + r HDEL myhash f1 + r HSET myhash f1 v2 + assert_equal -1 [r HTTL myhash FIELDS 1 f1] + } + +} + +####### Test info +start_server {tags 
{"hash-ttl-info external:skip"}} { + test {Hash ttl - check command stats} { + r FLUSHALL + + # Run all relevant hash TTL commands + r HSET myhash f1 v1 f2 v2 + r HEXPIRE myhash 10 FIELDS 1 f1 + r HEXPIREAT myhash [expr {[clock seconds] + 10}] FIELDS 1 f2 + r HEXPIRETIME myhash FIELDS 2 f1 f2 + r HPEXPIRE myhash 1000 FIELDS 1 f1 + r HPEXPIREAT myhash [expr {[clock milliseconds] + 2000}] FIELDS 1 f2 + r HPEXPIRETIME myhash FIELDS 2 f1 f2 + r HGETEX myhash EX 120 FIELDS 1 f1 + r HTTL myhash FIELDS 1 f2 + r HPTTL myhash FIELDS 1 f1 + + # Fetch commandstats + set info [r INFO commandstats] + + # Extract call counts + proc get_calls {info cmd} { + foreach line [split $info "\n"] { + if {[string match "cmdstat_$cmd:*" $line]} { + regexp {calls=(\d+)} $line -> count + return $count + } + } + return -1 + } + + # Assert each command appears with correct call count (1 call each) + assert_equal 1 [get_calls $info hexpire] + assert_equal 1 [get_calls $info hexpireat] + assert_equal 1 [get_calls $info hexpiretime] + assert_equal 1 [get_calls $info hpexpire] + assert_equal 1 [get_calls $info hpexpireat] + assert_equal 1 [get_calls $info hpexpiretime] + assert_equal 1 [get_calls $info hgetex] + assert_equal 1 [get_calls $info httl] + assert_equal 1 [get_calls $info hpttl] + } +} + + +#### Replication #### +start_server {tags {"hashexpire external:skip"}} { + # Start another server to test replication of TTLs + start_server {tags {needs:repl external:skip}} { + # Set the outer layer server as primary + set primary [srv -1 client] + set primary_host [srv -1 host] + set primary_port [srv -1 port] + # Set this inner layer server as replica + set replica [srv 0 client] + + test {Setup replica and check field expiry after full sync} { + $primary flushall + + # Set up some TTLs on primary BEFORE replica connects + set now [clock milliseconds] + set f1_exp [expr {$now + 50000}] + set f2_exp [expr {$now + 70000}] + + $primary HSET myhash f1 v1 f2 v2 + $primary HPEXPIREAT myhash $f1_exp 
FIELDS 1 f1 + $primary HPEXPIREAT myhash $f2_exp FIELDS 1 f2 + + # Now connect replica + $replica replicaof $primary_host $primary_port + + wait_for_condition 100 100 { + [info_field [$replica info replication] master_link_status] eq "up" + } else { + fail "Master <-> Replica didn't finish sync" + } + + + # Wait for full sync + wait_for_ofs_sync $primary $replica + + + # Validate TTLs replicated correctly + set r1 [$replica HPEXPIRETIME myhash FIELDS 1 f1] + set r2 [$replica HPEXPIRETIME myhash FIELDS 1 f2] + + assert_equal $f1_exp $r1 + assert_equal $f2_exp $r2 + } + + + + test {HASH TTL - replicated TTL is absolute and consistent on replica} { + $primary flushall + + set now [clock milliseconds] + set future [expr {$now + 5000}] + set future_sec [expr {$future / 1000}] + + # HPEXPIREAT + $primary HSET myhash f1 v1 + $primary HPEXPIREAT myhash $future FIELDS 1 f1 + + # HSETEX EX + $primary HSETEX myhash EX 5 FIELDS 1 f2 v2 + + # HEXPIRE + $primary HSET myhash f3 v3 + $primary HEXPIRE myhash 5 FIELDS 1 f3 + + wait_for_ofs_sync $primary $replica + + set t1 [$primary HPEXPIRETIME myhash FIELDS 1 f1] + set t1r [$replica HPEXPIRETIME myhash FIELDS 1 f1] + assert_equal $t1 $t1r + + set t2 [$primary HEXPIRETIME myhash FIELDS 1 f2] + set t2r [$replica HEXPIRETIME myhash FIELDS 1 f2] + assert_equal $t2 $t2r + + set t3 [$primary HEXPIRETIME myhash FIELDS 1 f3] + set t3r [$replica HEXPIRETIME myhash FIELDS 1 f3] + assert_equal $t3 $t3r + } + + test {HASH TTL - field expired on master gets deleted on replica} { + $primary flushall + + $primary HSETEX myhash PX 10 FIELDS 1 f1 val1 + after 20 + wait_for_ofs_sync $primary $replica + + + # Trigger lazy expiry + catch {$primary HGET myhash f1} + wait_for_ofs_sync $primary $replica + + + assert_equal 0 [$replica HEXISTS myhash f1] + } + + + test {HASH TTL - replica retains TTL and field before expiration} { + $primary flushall + + $primary HSETEX myhash PX 1000 FIELDS 1 f1 val1 + wait_for_ofs_sync $primary $replica + + set 
master_ttl [$primary HPTTL myhash FIELDS 1 f1] + set replica_ttl [$replica HPTTL myhash FIELDS 1 f1] + assert {$replica_ttl > 0} + assert {$replica_ttl <= $master_ttl} + + } + + test {HSETEX with expired time is propagated to the replica} { + $primary flushall + + assert_equal [$primary HSET myhash f1 val1] "1" + + wait_for_condition 100 100 { + [$replica HGET myhash f1] eq {val1} + } else { + fail "hash field was not set on replica after timeout" + } + + assert_equal [$primary HSETEX myhash EXAT 0 FIELDS 1 f1 val1] {1} + + wait_for_condition 100 100 { + [$primary EXISTS myhash] eq "0" + } else { + fail "hash object was not deleted on primary after timeout" + } + wait_for_ofs_sync $primary $replica + + wait_for_condition 100 100 { + [$replica EXISTS myhash] eq "0" + } else { + fail "hash object was not deleted on replica after timeout" + } + } + + test {HGETEX with expired time is propagated to the replica} { + $primary flushall + + assert_equal [$primary HSET myhash f1 val1] "1" + + wait_for_condition 100 100 { + [$replica HGET myhash f1] eq {val1} + } else { + fail "hash field was not set on replica after timeout" + } + + assert_equal [$primary HGETEX myhash EXAT 0 FIELDS 1 f1] {val1} + + wait_for_condition 100 100 { + [$primary EXISTS myhash] eq "0" + } else { + fail "hash object was not deleted on primary after timeout" + } + wait_for_ofs_sync $primary $replica + + wait_for_condition 100 100 { + [$replica EXISTS myhash] eq "0" + } else { + fail "hash object was not deleted on replica after timeout" + } + } + test {HEXPIREAT with expired time is propagated to the replica} { + $primary flushall + + assert_equal [$primary HSET myhash f1 val1] "1" + + wait_for_condition 100 100 { + [$replica HGET myhash f1] eq {val1} + } else { + fail "hash field was not set on replica after timeout" + } + + assert_equal [$primary HEXPIREAT myhash 0 FIELDS 1 f1] {2} + + wait_for_condition 100 100 { + [$primary EXISTS myhash] eq "0" + } else { + fail "hash object was not deleted on 
primary after timeout" + } + wait_for_ofs_sync $primary $replica + + wait_for_condition 100 100 { + [$replica EXISTS myhash] eq "0" + } else { + fail "hash object was not deleted on replica after timeout" + } + } + } +} + +start_server {tags {"hashexpire external:skip"}} { + set primary [srv 0 client] + set primary_host [srv 0 host] + set primary_port [srv 0 port] + start_server {tags {needs:repl external:skip}} { + set replica_1 [srv 0 client] + set replica_1_host [srv 0 host] + set replica_1_port [srv 0 port] + + test {Replication Primary -> R1} { + lassign [setup_replication_test $primary $replica_1 $primary_host $primary_port] primary_initial_expired replica_1_initial_expired + + # Initialize deferred clients and subscribe to keyspace notifications + foreach instance [list $primary $replica_1] { + $instance config set notify-keyspace-events KEA + } + set rd_primary [valkey_deferring_client -1] + set rd_replica_1 [valkey_deferring_client $replica_1_host $replica_1_port] + foreach rd [list $rd_primary $rd_replica_1] { + assert_equal {1} [psubscribe $rd __keyevent@*] + } + + + # Setup hash, set expire and set expire 0 + $primary HSET myhash f1 v1 f2 v2 ;# Should trigger 3 hset + # Create hash and timing - f1 < f2 expiry times + set f1_exp [expr {[clock seconds] + 10000}] + $primary HEXPIREAT myhash $f1_exp FIELDS 1 f1 ;# Should trigger 3 hexpire + wait_for_ofs_sync $primary $replica_1 + + $primary HEXPIRE myhash 0 FIELDS 1 f1 ;# Should trigger 1 hexpired (for primary) and 1 hdel (for replica) + wait_for_ofs_sync $primary $replica_1 + + # Wait for f1 expiration + wait_for_condition 50 100 { + [$primary HTTL myhash FIELDS 1 f1] eq -2 && \ + [$replica_1 HTTL myhash FIELDS 1 f1] eq -2 + } else { + fail "f1 still exists" + } + + # Verify keyspace notification + foreach rd [list $rd_primary $rd_replica_1] { + assert_keyevent_patterns $rd myhash hset hexpire + } + # primary gets hexpired and replica gets hdel + assert_keyevent_patterns $rd_primary myhash hexpired + 
assert_keyevent_patterns $rd_replica_1 myhash hdel + + $rd_primary close + $rd_replica_1 close + } + + start_server {tags {needs:repl external:skip}} { + $primary FLUSHALL + set replica_2 [srv 0 client] + set replica_2_host [srv 0 host] + set replica_2_port [srv 0 port] + + test {Chain Replication (Primary -> R1 -> R2) preserves TTL} { + $replica_1 replicaof $primary_host $primary_port + # Wait for R1 to connect to the primary + wait_for_condition 100 100 { + [info_field [$replica_1 info replication] master_link_status] eq "up" + } else { + fail "R1 <-> PRIMARY didn't establish connection" + } + + $replica_2 replicaof $replica_1_host $replica_1_port + # Wait for R2 to connect to R1 + wait_for_condition 100 100 { + [info_field [$replica_2 info replication] master_link_status] eq "up" + } else { + fail "R2 <-> R1 didn't establish connection" + } + + # Initialize deferred clients and subscribe to keyspace notifications + set rd_primary [valkey_deferring_client -2] + set rd_replica_1 [valkey_deferring_client -1] + set rd_replica_2 [valkey_deferring_client $replica_2_host $replica_2_port] + assert_equal {1} [psubscribe $rd_primary __keyevent@*] + assert_equal {1} [psubscribe $rd_replica_1 __keyevent@*] + assert_equal {1} [psubscribe $rd_replica_2 __keyevent@*] + + # Expiry time for f1 (f2 stays persistent) + set f1_exp [expr {[clock seconds] + 10000}] + + ############################################# SETUP HASH ############################################# + $primary HSET myhash f1 v1 f2 v2 ;# Should trigger 3 hset + $primary HEXPIREAT myhash $f1_exp FIELDS 1 f1 ;# Should trigger 3 hexpire + wait_for_ofs_sync $primary $replica_1 + wait_for_ofs_sync $replica_1 $replica_2 + + $primary HPEXPIRE myhash 0 FIELDS 1 f1 ;# Should trigger 1 hexpired (for primary) and 2 hdel (for replicas) + wait_for_ofs_sync $primary $replica_1 + wait_for_ofs_sync $replica_1 $replica_2 + + + # Wait for f1 expiration + wait_for_condition 50 100 { + [$primary HTTL myhash FIELDS 1 f1] eq -2 && \ +
[$replica_1 HTTL myhash FIELDS 1 f1] eq -2 && \ + [$replica_2 HTTL myhash FIELDS 1 f1] eq -2 + } else { + fail "f1 still exists" + } + + # primary gets hexpired and replicas get hdel + foreach rd [list $rd_primary $rd_replica_1 $rd_replica_2] { + assert_keyevent_patterns $rd myhash hset hexpire + } + assert_keyevent_patterns $rd_primary myhash hexpired + assert_keyevent_patterns $rd_replica_1 myhash hdel + assert_keyevent_patterns $rd_replica_2 myhash hdel + + $rd_primary close + $rd_replica_1 close + $rd_replica_2 close + } + } + + test {Replica Failover} { + $primary FLUSHALL + $primary DEBUG SET-ACTIVE-EXPIRE no + $replica_1 DEBUG SET-ACTIVE-EXPIRE no + ####### Replication setup ####### + $replica_1 replicaof $primary_host $primary_port + wait_for_condition 50 100 { + [lindex [$replica_1 role] 0] eq {slave} && + [string match {*master_link_status:up*} [$replica_1 info replication]] + } else { + fail "Can't turn the instance into a replica" + } + + # Create hash fields with TTL on primary + set f1_exp [expr {[clock seconds] + 200}] + set f2_exp [expr {[clock seconds] + 300000}] + $primary HSET myhash f1 v1 f2 v2 f3 v3 + $primary HEXPIREAT myhash $f1_exp FIELDS 1 f1 + $primary HEXPIREAT myhash $f2_exp FIELDS 1 f2 + # f3 remains persistent + + # Wait for full sync + wait_for_ofs_sync $primary $replica_1 + + # Verify primary and replica are the same + foreach instance [list $primary $replica_1] { + assert_equal $f1_exp [$instance HEXPIRETIME myhash FIELDS 1 f1] + assert_equal $f2_exp [$instance HEXPIRETIME myhash FIELDS 1 f2] + assert_equal -1 [$instance HTTL myhash FIELDS 1 f3] + assert_match {1} [scan [regexp -inline {keys\=([\d]*)} [$instance info keyspace]] keys=%d] + assert_equal "v1 v2 v3" [$instance HMGET myhash f1 f2 f3] + assert_equal 3 [$instance HLEN myhash] + } + + # Perform failover + $replica_1 replicaof no one + # Wait for replica to become primary + wait_for_condition 100 100 { + [info_field [$replica_1 info replication] role] eq "master" + } else { 
+ fail "Replica didn't become master" + } + + # Setup keyspace notifications for the promoted replica + $replica_1 config set notify-keyspace-events KEA + set rd_replica [valkey_deferring_client $replica_1_host $replica_1_port] + assert_equal {1} [psubscribe $rd_replica __keyevent@*] + + # Check all values that checked before are the same + assert_equal 3 [$replica_1 HLEN myhash] + assert_equal $f1_exp [$replica_1 HEXPIRETIME myhash FIELDS 1 f1] + assert_equal $f2_exp [$replica_1 HEXPIRETIME myhash FIELDS 1 f2] + assert_equal -1 [$replica_1 HTTL myhash FIELDS 1 f3] + assert_equal "v1 v2 v3" [$replica_1 HGETEX myhash FIELDS 3 f1 f2 f3] + assert_equal 3 [$replica_1 HLEN myhash] + + # Set f1 to expire in 1 second and wait for expiration + $replica_1 HEXPIRE myhash 1 FIELDS 1 f1 ;# will trigger hexpire + wait_for_condition 50 100 { + [$replica_1 HTTL myhash FIELDS 1 f1] eq -2 + } else { + fail "f1 not expired" + } + + # Verify expiry in replica + assert_equal "" [$replica_1 HGET myhash f1] + assert_equal 3 [$replica_1 HLEN myhash] + + # Verify no expiry in primary + assert_equal "v1" [$primary HGET myhash f1] + + # Change TTL of f2 + $replica_1 HEXPIRE myhash 1000000 FIELDS 1 f2 ;# will trigger hexpire + assert_morethan [$replica_1 HTTL myhash FIELDS 1 f2] 9000 + assert_equal $f2_exp [$primary HEXPIRETIME myhash FIELDS 1 f2] + + # Change TTL of f2 to 0 (immediate expiry) + $replica_1 HGETEX myhash EX 0 FIELDS 1 f2 ;# will trigger hexpired + # Verify final state + assert_equal 2 [$replica_1 HLEN myhash] + assert_equal "{} {} v3" [$replica_1 HGETEX myhash FIELDS 3 f1 f2 f3] + assert_equal "v1 v2 v3" [$primary HGETEX myhash FIELDS 3 f1 f2 f3] ;# No change for primary + + assert_keyevent_patterns $rd_replica myhash hexpire hexpire hexpired + + $rd_replica close + # Re-enable active expiry + $primary DEBUG SET-ACTIVE-EXPIRE yes + $replica_1 DEBUG SET-ACTIVE-EXPIRE yes + } {OK} {needs:debug} + + + test {Promotion to primary} { + lassign [setup_replication_test $primary 
$replica_1 $primary_host $primary_port] primary_initial_expired replica_1_initial_expired + + # Initialize deferred clients and subscribe to keyspace notifications + foreach instance [list $primary $replica_1] { + $instance config set notify-keyspace-events KEA + $instance DEBUG SET-ACTIVE-EXPIRE no + } + ####### Replication setup ####### + $replica_1 replicaof $primary_host $primary_port + wait_for_condition 50 100 { + [lindex [$replica_1 role] 0] eq {slave} && + [string match {*master_link_status:up*} [$replica_1 info replication]] + } else { + fail "Can't turn the instance into a replica" + } + + # Create hash fields with TTL on primary + set f1_exp [expr {[clock seconds] + 200}] + set f2_exp [expr {[clock seconds] + 300000}] + $primary HSET myhash f1 v1 f2 v2 f3 v3 + $primary HEXPIREAT myhash $f1_exp FIELDS 1 f1 + $primary HEXPIREAT myhash $f2_exp FIELDS 1 f2 + # f3 remains persistent + + # Wait for full sync + wait_for_ofs_sync $primary $replica_1 + + # Verify primary and replica are the same + foreach instance [list $primary $replica_1] { + assert_equal $f1_exp [$instance HEXPIRETIME myhash FIELDS 1 f1] + assert_equal $f2_exp [$instance HEXPIRETIME myhash FIELDS 1 f2] + assert_equal -1 [$instance HTTL myhash FIELDS 1 f3] + assert_match {1} [scan [regexp -inline {keys\=([\d]*)} [$instance info keyspace]] keys=%d] + assert_equal "v1 v2 v3" [$instance HMGET myhash f1 f2 f3] + assert_equal 3 [$instance HLEN myhash] + } + + # Perform promotion to primary + $primary FAILOVER TO $replica_1_host $replica_1_port + # Wait for replica to become primary + wait_for_condition 100 100 { + [info_field [$replica_1 info replication] role] eq "master" + } else { + fail "Replica didn't become master" + } + + # Setup keyspace notifications + $primary config set notify-keyspace-events KEA + $replica_1 config set notify-keyspace-events KEA + set rd_primary [valkey_deferring_client -1] + set rd_replica_1 [valkey_deferring_client $replica_1_host $replica_1_port] + assert_equal {1} 
[psubscribe $rd_primary __keyevent@*] + assert_equal {1} [psubscribe $rd_replica_1 __keyevent@*] + + # Check all values that checked before are the same after the failover + foreach instance [list $primary $replica_1] { + assert_equal $f1_exp [$instance HEXPIRETIME myhash FIELDS 1 f1] + assert_equal $f2_exp [$instance HEXPIRETIME myhash FIELDS 1 f2] + assert_equal -1 [$instance HTTL myhash FIELDS 1 f3] + assert_match {1} [scan [regexp -inline {keys\=([\d]*)} [$instance info keyspace]] keys=%d] + assert_equal "v1 v2 v3" [$instance HMGET myhash f1 f2 f3] + assert_equal 3 [$instance HLEN myhash] + } + + # Set f1 to expire in 1 second and wait for expiration + $replica_1 HEXPIRE myhash 1 FIELDS 1 f1 ;# will trigger hexpire + wait_for_ofs_sync $replica_1 $primary + wait_for_condition 50 100 { + [$replica_1 HTTL myhash FIELDS 1 f1] eq -2 + } else { + fail "f1 not expired" + } + + # Verify replica and primary are sync + foreach instance [list $primary $replica_1] { + assert_equal $f2_exp [$instance HEXPIRETIME myhash FIELDS 1 f2] + assert_equal -2 [$instance HTTL myhash FIELDS 1 f1] + assert_match {1} [scan [regexp -inline {keys\=([\d]*)} [$instance info keyspace]] keys=%d] + assert_equal "{} v2 v3" [$instance HMGET myhash f1 f2 f3] + assert_equal 3 [$instance HLEN myhash] + } + + # Change TTL of f2 + $replica_1 HEXPIRE myhash 1000000 FIELDS 1 f2 ;# will trigger hexpire + wait_for_ofs_sync $replica_1 $primary + foreach instance [list $primary $replica_1] { + assert_morethan [$instance HTTL myhash FIELDS 1 f2] 9000 + } + + # Change TTL of f2 to 0 (immediate expiry) + $replica_1 HGETEX myhash EX 0 FIELDS 1 f2 ;# will trigger hexpired for replica_1 and hdel for primary + # Verify final state + foreach instance [list $primary $replica_1] { + assert_equal 2 [$instance HLEN myhash] + assert_equal "{} {} v3" [r HMGET myhash f1 f2 f3] + } + + foreach rd [list $rd_replica_1 $rd_primary] { + assert_keyevent_patterns $rd myhash hexpire hexpire + } + assert_keyevent_patterns 
$rd_replica_1 myhash hexpired + assert_keyevent_patterns $rd_primary myhash hdel + + $rd_replica_1 close + $rd_primary close + # Re-enable active expiry + $primary DEBUG SET-ACTIVE-EXPIRE yes + $replica_1 DEBUG SET-ACTIVE-EXPIRE yes + } {OK} {needs:debug} + } +} + +### Slot Migration #### +start_cluster 3 0 {tags {"cluster mytest external:skip"} overrides {cluster-node-timeout 1000}} { + # Flush all data on all cluster nodes before starting + for {set i 0} {$i < 3} {incr i} { + R $i FLUSHALL + } + if {$::singledb} { + set db 0 + } else { + set db 9 + } + set R0_id [R 0 CLUSTER MYID] + set R1_id [R 1 CLUSTER MYID] + + # Use a fixed hash tag to ensure key is in one slot + set key "{mymigrate}myhash" + + test {Hash with TTL fields migrates correctly between nodes} { + R 0 DEBUG SET-ACTIVE-EXPIRE no + R 1 DEBUG SET-ACTIVE-EXPIRE no + # Create hash fields + R 0 HSET $key f1 v1 f2 v2 f3 v3 + + # Set TTL on fields f1 and f2 + R 0 HEXPIRE $key 300 FIELDS 2 f1 f2 + + # Verify before slot migration + assert_equal 3 [R 0 HLEN $key] + assert_morethan [R 0 HTTL $key FIELDS 1 f1] 290 + assert_match {1} [scan [regexp -inline {keys\=([\d]*)} [R 0 info keyspace]] keys=%d] + + # Prepare slot migration + set slot [R 0 CLUSTER KEYSLOT $key] + assert_equal OK [R 1 CLUSTER SETSLOT $slot IMPORTING $R0_id] + assert_equal OK [R 0 CLUSTER SETSLOT $slot MIGRATING $R1_id] + + # Migrate key to destination node + R 0 MIGRATE [srv -1 host] [srv -1 port] $key 0 5000 + + # Complete slot migration + R 0 CLUSTER SETSLOT $slot NODE $R1_id + R 1 CLUSTER SETSLOT $slot NODE $R1_id + + # Verify after slot migration + assert_equal 3 [R 1 HLEN $key] + assert_morethan [R 1 HTTL $key FIELDS 1 f1] 280 + assert_match {1} [scan [regexp -inline {keys\=([\d]*)} [R 1 info keyspace]] keys=%d] + + # Setup keyspace notifications + R 1 config set notify-keyspace-events KEA + set rd [valkey_deferring_client -1] + assert_equal {1} [psubscribe $rd __keyevent@0__:hexpired] + + # Set expiration to 0 + R 1 HGETEX $key EX 0 
FIELDS 1 f1 + + # Veridy expiration + assert_keyevent_patterns $rd "{$key}" hexpired + assert_equal 2 [R 1 HLEN $key] + assert_equal "" [R 1 HGET $key f1] + assert_equal -2 [R 1 HTTL $key FIELDS 1 f1] + + $rd close + # Re-enable active expiry + R 0 DEBUG SET-ACTIVE-EXPIRE yes + R 1 DEBUG SET-ACTIVE-EXPIRE yes + } {OK} {needs:debug} +} + +start_server {tags {"hashexpire external:skip"}} { + foreach cmd {RENAME RESTORE} { + test "$cmd Preserves Field TTLs" { + r FLUSHALL + r DEBUG SET-ACTIVE-EXPIRE no + r HSET myhash f1 v1 f2 v2 + r HEXPIRE myhash 200 FIELDS 1 f1 + + # Verify initial TTL state + set mem_before [r MEMORY USAGE myhash] + assert_equal "v1 v2" [r HMGET myhash f1 f2] + assert_morethan [r HTTL myhash FIELDS 1 f1] 100 + assert_equal -1 [r HTTL myhash FIELDS 1 f2] + assert_equal 2 [r HLEN myhash] + assert_match {1} [scan [regexp -inline {keys\=([\d]*)} [r info keyspace]] keys=%d] + + # Run the command + if {$cmd eq "RENAME"} { + r rename myhash nwhash + set newhash nwhash + } elseif {$cmd eq "RESTORE"} { + set serialized [r DUMP myhash] + r RESTORE rstrhs 0 $serialized + set newhash rstrhs + } + + # Verify field values and TTLs are preserved + set memory_after [r MEMORY USAGE $newhash] + assert_equal "v1 v2" [r HMGET $newhash f1 f2] + assert_morethan [r HTTL $newhash FIELDS 1 f1] 100 + assert_equal -1 [r HTTL $newhash FIELDS 1 f2] + assert_equal 2 [r HLEN $newhash] + if {$cmd eq "RESTORE"} { + assert_match {2} [scan [regexp -inline {keys\=([\d]*)} [r info keyspace]] keys=%d] + } else { + assert_match {1} [scan [regexp -inline {keys\=([\d]*)} [r info keyspace]] keys=%d] + } + assert_equal $mem_before $memory_after + } + } + + test {COPY Preserves TTLs} { + r flushall + r DEBUG SET-ACTIVE-EXPIRE no + + # Create hash with fields + r HSET myhash f1 v1 f3 v3 f4 v4 + + # Set TTL on f1 only + r HEXPIRE myhash 200 FIELDS 1 f1 + r HEXPIRE myhash 2 FIELDS 1 f3 + + # Verify initial TTL state + set mem_before [r MEMORY USAGE myhash] + assert_equal "v1 v3 v4" [r HMGET 
myhash f1 f3 f4] + assert_morethan [r HTTL myhash FIELDS 1 f1] 100 + assert_morethan [r HTTL myhash FIELDS 1 f3] 0 + assert_equal -1 [r HTTL myhash FIELDS 1 f4] + assert_equal 3 [r HLEN myhash] + assert_match {1} [scan [regexp -inline {keys\=([\d]*)} [r info keyspace]] keys=%d] + + # Copy hash to new key + r copy myhash newhash1 + + # Verify myhash is the same + assert_equal "v1 v3 v4" [r HMGET myhash f1 f3 f4] + assert_morethan [r HTTL myhash FIELDS 1 f1] 100 + assert_morethan [r HTTL myhash FIELDS 1 f3] 0 + assert_equal -1 [r HTTL myhash FIELDS 1 f4] + assert_equal 3 [r HLEN myhash] + + # Verify new hash got same values + set mem_after [r MEMORY USAGE myhash] + assert_equal "v1 v3 v4" [r HMGET newhash1 f1 f3 f4] + assert_morethan [r HTTL newhash1 FIELDS 1 f1] 100 + assert_morethan [r HTTL newhash1 FIELDS 1 f3] 0 + assert_equal -1 [r HTTL newhash1 FIELDS 1 f4] + assert_equal 3 [r HLEN newhash1] + assert_match {2} [scan [regexp -inline {keys\=([\d]*)} [r info keyspace]] keys=%d] + + assert_equal $mem_before $mem_after + + # Modify TTL in original hash + r HEXPIRE myhash 5 FIELDS 1 f3 + + # Wait for original TTL to expire in copy + after 2000 + assert_equal "v1 {}" [r HMGET newhash1 f1 f3] + assert_equal "v1 v3" [r HMGET myhash f1 f3] + + r HSETEX myhash EX 2 FIELDS 1 f3 v3 + # Create second copy + r copy myhash newhash2 + + # Modify TTL in second copy + r HEXPIRE newhash2 500 FIELDS 1 f3 + + # Wait for original hash TTL to expire + after 2000 + assert_equal "v1 {}" [r HMGET myhash f1 f3] + assert_equal "v1 v3" [r HMGET newhash2 f1 f3] + # Re-enable active expiry + r DEBUG SET-ACTIVE-EXPIRE yes + } {OK} {needs:debug} + + test {Hash Encoding Transitions with TTL - Add TTL to Existing Fields} { + r flushall + r DEBUG SET-ACTIVE-EXPIRE no + + # Create small hash with listpack encoding + r HSET myhash f1 v1 f2 v2 + + # Verify initial encoding + assert_equal "listpack" [r OBJECT ENCODING myhash] + + # Add TTL to existing field + r HEXPIRE myhash 300 FIELDS 1 f1 + + # Verify
encoding changed to hashtable + assert_equal "hashtable" [r OBJECT ENCODING myhash] + + # Verify field values are preserved + assert_equal "v1 v2" [r HMGET myhash f1 f2] + # Verify expiry + assert_morethan [r HTTL myhash FIELDS 1 f1] 100 + assert_equal -1 [r HTTL myhash FIELDS 1 f2] + # Re-enable active expiry + r DEBUG SET-ACTIVE-EXPIRE yes + } {OK} {needs:debug} + + test {Hash Encoding Transitions with TTL - Create New Fields with TTL} { + r flushall + r DEBUG SET-ACTIVE-EXPIRE no + + # Create small hash with listpack encoding + r HSET myhash f1 v1 f2 v2 + + # Verify initial encoding + assert_equal "listpack" [r OBJECT ENCODING myhash] + + # Add many fields to force encoding transition + for {set i 3} {$i <= 600} {incr i} { + lappend pairs "f$i" "v$i" + } + r HSET myhash {*}$pairs + r HEXPIRE myhash 3 FIELDS 5 f1 f10 f100 f200 f300 + + # Verify encoding changed to hashtable + assert_equal "hashtable" [r OBJECT ENCODING myhash] + + # Verify all field values and TTLs are correct + for {set i 1} {$i <= 600} {incr i} { + assert_equal "v$i" [r HGET myhash "f$i"] + if {$i == 1 || $i == 10 || $i == 100 || $i == 200 || $i == 300} { + assert_equal 3 [r HTTL myhash FIELDS 1 "f$i"] + } else { + assert_equal -1 [r HTTL myhash FIELDS 1 "f$i"] + } + } + # Re-enable active expiry + r DEBUG SET-ACTIVE-EXPIRE yes + } {OK} {needs:debug} +} + +start_server {tags {"hashexpire external:skip"}} { + r config set notify-keyspace-events KEA + + foreach time_unit {s ms} { + test "Key TTL expires before field TTL: entire hash should be deleted timeunit: $time_unit" { + r FLUSHALL + r DEBUG SET-ACTIVE-EXPIRE no + r config set notify-keyspace-events KEA + set rd [valkey_deferring_client] + assert_equal {1} [psubscribe $rd __keyevent@*] + + r HSET myhash f1 v1 f2 v2 f3 v3 + assert_match {1} [scan [regexp -inline {keys\=([\d]*)} [r info keyspace]] keys=%d] + assert_equal 3 [r HLEN myhash] + if {$time_unit eq "s"} { + r HEXPIRE myhash 10 FIELDS 1 f1 + r EXPIRE myhash 1 + } else { + r HPEXPIRE myhash 10000 FIELDS 1 f1 + r
PEXPIRE myhash 1000 + } + + wait_for_condition 100 100 { + [r EXISTS myhash] eq "0" + } else { + fail "myhash still exists" + } + assert_equal 0 [r HLEN myhash] + assert_match "" [scan [regexp -inline {keys\=([\d]*)} [r info keyspace]] keys=%d] + + assert_keyevent_patterns $rd myhash hset hexpire expire + $rd close + # Re-enable active expiry + r DEBUG SET-ACTIVE-EXPIRE yes + } {OK} {needs:debug} + + test "Field TTL expires before key TTL: only the specific field should expire: $time_unit" { + r FLUSHALL + r DEBUG SET-ACTIVE-EXPIRE no + set rd [valkey_deferring_client] + assert_equal {1} [psubscribe $rd __keyevent@*] + + r HSET myhash f1 v1 f2 v2 f3 v3 + assert_match {1} [scan [regexp -inline {keys\=([\d]*)} [r info keyspace]] keys=%d] + assert_equal 3 [r HLEN myhash] + if {$time_unit eq "s"} { + r HEXPIRE myhash 1 FIELDS 1 f1 + r EXPIRE myhash 10 + } else { + r HPEXPIRE myhash 1000 FIELDS 1 f1 + r PEXPIRE myhash 10000 + } + + wait_for_condition 100 100 { + [r HGET myhash f1] eq "" + } else { + fail "f1 not expired" + } + assert_match {1} [scan [regexp -inline {keys\=([\d]*)} [r info keyspace]] keys=%d] + assert_equal 1 [r EXISTS myhash] + assert_equal "{} v2 v3" [r HMGET myhash f1 f2 f3] + assert_keyevent_patterns $rd myhash hset hexpire + $rd close + # Re-enable active expiry + r DEBUG SET-ACTIVE-EXPIRE yes + } {OK} {needs:debug} + + test "Key and field TTL expire simultaneously: entire hash should be deleted: $time_unit" { + r FLUSHALL + r DEBUG SET-ACTIVE-EXPIRE no + + r HSET myhash f1 v1 f2 v2 f3 v3 + assert_match {1} [scan [regexp -inline {keys\=([\d]*)} [r info keyspace]] keys=%d] + assert_equal 3 [r HLEN myhash] + + + if {$time_unit eq "s"} { + set expire [expr {[clock seconds] + 1}] + r HEXPIREAT myhash $expire FIELDS 1 f1 + r EXPIREAT myhash $expire + } else { + set expire [expr {[clock milliseconds] + 1000}] + r HPEXPIREAT myhash $expire FIELDS 1 f1 + r PEXPIREAT myhash $expire + } + + wait_for_condition 100 100 { + [r EXISTS myhash] eq 0 + } else { + 
fail "myhash still exist" + } + + assert_equal "{} {} {}" [r HMGET myhash f1 f2 f3] + assert_match "" [scan [regexp -inline {keys\=([\d]*)} [r info keyspace]] keys=%d] + assert_equal 0 [r HLEN myhash] + # Re-enable active expiry + r DEBUG SET-ACTIVE-EXPIRE yes + } {OK} {needs:debug} + + test {Millisecond/Seconds precision} { + r flushall + r DEBUG SET-ACTIVE-EXPIRE no + + r HSET myhash f1 v1 f2 v2 + if {$time_unit eq "s"} { + r HEXPIRE myhash 3 FIELDS 1 f1 + r EXPIRE myhash 1 + } else { + r HPEXPIRE myhash 3000 FIELDS 1 f1 + r PEXPIRE myhash 1000 + } + + after 1500 + assert_equal 0 [r EXISTS myhash] + # Re-enable active expiry + r DEBUG SET-ACTIVE-EXPIRE yes + } {OK} {needs:debug} + } + + test {Ensure that key-level PERSIST on the key don't affect field TTL} { + r FLUSHALL + + r HSET myhash f1 v1 f2 v2 + assert_match {1} [scan [regexp -inline {keys\=([\d]*)} [r info keyspace]] keys=%d] + assert_equal 2 [r HLEN myhash] + r HEXPIRE myhash 100000 FIELDS 1 f1 + r PERSIST myhash + + assert_equal -1 [r TTL myhash] + assert_morethan [r HTTL myhash FIELDS 1 f1] 0 + } +} + +#### AOF Test ##### +tags {"aof external:skip"} { + set defaults {appendonly {yes} appendfilename {appendonly.aof} appenddirname {appendonlydir} auto-aof-rewrite-percentage {0}} + set server_path [tmpdir server.multi.aof] + start_server_aof [list dir $server_path] { + test {TTL Persistence in AOF} { + r flushall + r DEBUG SET-ACTIVE-EXPIRE no + r config set appendonly yes + r config set appendfsync always + + # Create hash with 1 short, long and no expired fields + set long_expire [expr {[clock seconds] + 1000000}] + # Create 10 fields with long expiry + for {set i 1} {$i <= 10} {incr i} { + r HSETEX myhash EXAT $long_expire FIELDS 1 f$i v$i ;# 10 PXAT to aof + } + + # Create 10 fields with short expiry + for {set i 11} {$i <= 20} {incr i} { + r HSETEX myhash PXAT [expr {[clock milliseconds] + 10}] FIELDS 1 f$i v$i ;# 10 PXAT to aof + } + + # Create 10 fields with expire 0 + for {set i 21} {$i <= 30} 
{incr i} { + r HSET myhash f$i v$i + r HEXPIRE myhash 0 FIELDS 1 f$i ;# 10 HDEL to aof + } + + # Create 10 fields with no expiry + for {set i 31} {$i <= 40} {incr i} { + r HSET myhash f$i v$i + } + + # Now wait for expire of the short expiry + for {set i 11} {$i <= 20} {incr i} { + wait_for_condition 100 100 { + [r HTTL myhash FIELDS 1 f$i] eq "-2" + } else { + fail "hash value was not expired after timeout" + } + } + + # Verify initial HLEN + assert_equal 30 [r HLEN myhash] + # Verify values + for {set i 1} {$i <= 40} {incr i} { + if {$i >= 11 && $i <= 30} { + assert_equal "" [r HGET myhash f$i] + } else { + assert_equal v$i [r HGET myhash f$i] + } + } + + # Ensure the initial rewrite finishes + waitForBgrewriteaof r + + # Get the last incremental AOF file path + set aof_file [get_last_incr_aof_path r] + + wait_for_condition 100 100 { + [file exists $aof_file] eq 1 + } else { + fail "hash value was not expired after timeout" + } + + # Read and check content + set aof_content [exec cat $aof_file] + + # Verify amount of PXAT and HDEL + # Count PXAT commands (should be 20: 10 long + 10 short) + set pxat_count [regexp -all {PXAT} $aof_content] + assert_equal 20 $pxat_count + # Count HDEL commands (should be 10: from expire 0) + set hdel_count [regexp -all {HDEL} $aof_content] + assert_equal 10 $hdel_count + + # Restart the server and load the AOF + restart_server 0 true false + r debug loadaof + + # Verify hash after loading from aof + # Verify same HLEN + assert_equal 30 [r HLEN myhash] + # Verify the TTLs are preserved + for {set i 1} {$i <= 10} {incr i} { + assert_equal $long_expire [r HEXPIRETIME myhash FIELDS 1 f$i] + assert_equal v$i [r HGET myhash f$i] + } + # Verify expired fields + for {set i 11} {$i <= 30} {incr i} { + assert_equal -2 [r HTTL myhash FIELDS 1 f$i] + assert_equal "" [r HGET myhash f$i] + } + # Verify fields with no TTL + for {set i 31} {$i <= 40} {incr i} { + assert_equal -1 [r HTTL myhash FIELDS 1 f$i] + assert_equal v$i [r HGET myhash f$i] + 
} + # Re-enable active expiry + r DEBUG SET-ACTIVE-EXPIRE yes + } {OK} {needs:debug} + } +} From 1eb6d93f13f91735b111606e5cf0668966ece265 Mon Sep 17 00:00:00 2001 From: Ran Shidlansik Date: Tue, 5 Aug 2025 11:27:13 +0300 Subject: [PATCH 2/3] Introduce volatile set ------------- MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Overview: --------- This PR introduces a complete redesign of the 'vset' (stands for volatile set) data structure, creating an adaptive container for expiring entries. The new design is memory-efficient, scalable, and dynamically promotes/demotes its internal representation depending on runtime behavior and volume. The core concept uses a single tagged pointer (`expiry_buckets`) that encodes one of several internal structures: - NONE (-1): Empty set - SINGLE (0x1): One entry - VECTOR (0x2): Sorted vector of entry pointers - HT (0x4): Hash table for larger buckets with many entries - RAX (0x6): Radix tree (keyed by aligned expiry timestamps) This allows the set to grow and shrink seamlessly while optimizing for both space and performance. Motivation: ----------- The previous design lacked flexibility in high-churn environments or workloads with skewed expiry distributions. This redesign enables dynamic layout adjustment based on the time distribution and volume of the inserted entries, while maintaining fast expiry checks and minimal memory overhead. Key Concepts: ------------- - All pointers stored in the structure must be odd-aligned to preserve 3 bits for tagging. This is safe with SDS strings (which set the LSB). - Buckets evolve automatically: - Start as NONE. - On first insert → become SINGLE. - If another entry with similar expiry → promote to VECTOR. - If VECTOR exceeds 127 entries → convert to RAX. - If a RAX bucket's vector fills and cannot split → promote to HT. - Each vector bucket is kept sorted by `entry->getExpiry()`. - Binary search is used for efficient insertion and splitting. 
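The tagged-pointer idea from Key Concepts can be sketched in C roughly as follows. This is only an illustration under stated assumptions, not the code in `vset.c`: the names `bucket_type`, `TAG_MASK`, `tag_ptr`, `ptr_type` and `untag_ptr` are hypothetical, container pointers are assumed to be 8-byte aligned, a set LSB is assumed to mark a SINGLE entry pointer, and NONE is represented by `NULL` purely for the example. Only the tag values and the 3 reserved bits come from the description above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Tag values as listed in the description; the enum name is hypothetical. */
enum bucket_type {
    BUCKET_NONE = -1,
    BUCKET_SINGLE = 0x1,
    BUCKET_VECTOR = 0x2,
    BUCKET_HT = 0x4,
    BUCKET_RAX = 0x6
};

#define TAG_MASK ((uintptr_t)0x7) /* hypothetical name for the low 3 bits */

/* Pack an 8-byte-aligned container pointer together with its type tag. */
static inline void *tag_ptr(void *p, enum bucket_type t) {
    assert(((uintptr_t)p & TAG_MASK) == 0); /* the low bits must be free */
    assert(t == BUCKET_VECTOR || t == BUCKET_HT || t == BUCKET_RAX);
    return (void *)((uintptr_t)p | (uintptr_t)t);
}

/* Recover the bucket type from a tagged pointer. */
static inline enum bucket_type ptr_type(const void *tagged) {
    uintptr_t bits = (uintptr_t)tagged;
    if (tagged == NULL) return BUCKET_NONE;  /* assumption: empty set is NULL */
    if (bits & 0x1) return BUCKET_SINGLE;    /* assumption: entry pointers have the LSB set */
    return (enum bucket_type)(bits & TAG_MASK);
}

/* Strip the tag from a container pointer (VECTOR, HT or RAX only). */
static inline void *untag_ptr(void *tagged) {
    return (void *)((uintptr_t)tagged & ~TAG_MASK);
}
```

Because the type lives in the pointer itself, switching between representations only rewrites one word and needs no separate type field.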
# Coarse Buckets Expiration System for Hash Fields

This PR introduces **coarse-grained expiration buckets** to support per-field expirations in hash types, a feature known as *volatile fields*. It enables scalable expiration tracking by grouping fields into time-aligned buckets instead of individually tracking exact timestamps.

## Motivation

Valkey traditionally supports key-level expiration. However, in many applications, there's a strong need to expire individual fields within a hash (e.g., session keys, token caches). Tracking these at fine granularity is expensive and potentially unscalable, so this implementation introduces *bucketed expirations* to batch expirations together.

## Bucket Granularity and Timestamp Handling

- Each expiration bucket represents a time slice of fixed width (e.g., 8192 ms).
- Expiring fields are mapped to the **end** of a time slice (not the floor).
- This design facilitates:
  - Efficient *splitting* of large buckets when needed
  - *Downgrading* buckets when fields permit tighter packing
  - Coalescing during lazy cleanup or memory pressure

### Example Calculation

Suppose a field has an expiration time of `1690000123456` ms and the max bucket interval is 8192 ms:

```
BUCKET_INTERVAL_MAX = 8192;
expiry = 1690000123456;

bucket_ts = (expiry & ~(BUCKET_INTERVAL_MAX - 1LL)) + BUCKET_INTERVAL_MAX;
          = (1690000123456 & ~8191) + 8192
          = 1690000121856 + 8192
          = 1690000130048
```

The field is stored in a bucket that **ends at** `1690000130048` ms.

### Bucket Alignment Diagram

```
Time (ms) →        |----------------|----------------|----------------|
8192 ms buckets →             1690000121856    1690000130048
                                    ^                ^
                                    |                |
                             expiry floor     assigned bucket end
```

## Bucket Placement Logic

- If a suitable bucket **already exists** (i.e., its `end_ts > expiry`), the field is added.
- If no bucket covers the `expiry`, a **new bucket** is created at the computed `end_ts`.

## Bucket Downgrade Conditions

Buckets are downgraded to smaller intervals when overpopulated (>127 fields).
This happens when **all fields fit into a tighter bucket**.

Downgrade rule:

```
(max_expiry & ~(BUCKET_INTERVAL_MIN - 1LL)) + BUCKET_INTERVAL_MIN < current_bucket_ts
```

If the above holds, all fields can be moved to a tighter bucket interval.

### Downgrade Bucket - Diagram

```
Before downgrade:

  Current Bucket (8192 ms)
  |----------------------------------------|
  | Field A | Field B | Field C | Field D  |
  | exp=+30 | +200    | +500    | +1500    |
  |----------------------------------------|
                  ↑
  All expiries fall before tighter boundary

After downgrade to 1024 ms:

  New Bucket (1024 ms)
  |------------------|
  | A | B | C | D    |
  |------------------|
```

### Bucket Split Strategy

If downgrade is not possible, the bucket is **split**:

- Fields are sorted by expiration time.
- A subset that fits in an earlier bucket is moved out.
- Remaining fields stay in the original bucket.

### Split Bucket - Diagram

```
Before split:

  Large Bucket (8192 ms)
  |--------------------------------------------------|
  | A | B | C | D | E | F | G | H | I | J | ... | Z  |
  |---------------- Sorted by expiry ----------------|
                  ↑
  Fields A–L can be moved to an earlier bucket

After split:

  Bucket 1 (end=1690000129024) Bucket 2 (end=1690000131072)
  |------------------------|   |------------------------|
  | A | B | C | ... | L    |   | M | N | O | ... | Z    |
  |------------------------|   |------------------------|
```

## Summary of Bucket Behavior

| Scenario                       | Action Taken                 |
|--------------------------------|------------------------------|
| No bucket covers expiry        | New bucket is created        |
| Existing bucket fits           | Field is added               |
| Bucket overflows (>127 fields) | Downgrade or split attempted |

API Changes:
------------

Create/Free:

    void vsetInit(vset *set);
    void vsetClear(vset *set);

Mutation:

    bool vsetAddEntry(vset *set, vsetGetExpiryFunc getExpiry, void *entry);
    bool vsetRemoveEntry(vset *set, vsetGetExpiryFunc getExpiry, void *entry);
    bool vsetUpdateEntry(vset *set, vsetGetExpiryFunc getExpiry, void *old_entry,
                         void *new_entry, long long old_expiry, long long new_expiry);

Expiry Retrieval:

    long long vsetEstimatedEarliestExpiry(vset *set, vsetGetExpiryFunc getExpiry);
    size_t vsetPopExpired(vset *set, vsetGetExpiryFunc getExpiry, vsetExpiryFunc expiryFunc,
                          mstime_t now, size_t max_count, void *ctx);

Utilities:

    bool vsetIsEmpty(vset *set);
    size_t vsetMemUsage(vset *set);

Iteration:

    void vsetStart(vset *set, vsetIterator *it);
    bool vsetNext(vsetIterator *it, void **entryptr);
    void vsetStop(vsetIterator *it);

Entry Requirements:
-------------------

All entries must conform to the following interface via `volatileEntryType`:

    sds entryGetKey(const void *entry);        // for deduplication
    long long getExpiry(const void *entry);    // used for bucketing
    int expire(void *db, void *o, void *entry); // used for expiration callbacks

Diagrams:
---------

1. Tagged Pointer Representation
-----------------------------

Lower 3 bits of `expiry_buckets` encode bucket type:

    +------------------------------+
    | pointer           | TAG (3b) |
    +------------------------------+
                             ↑
                 masked via VSET_PTR_MASK

TAG values:

    0x1 → SINGLE
    0x2 → VECTOR
    0x4 → HT
    0x6 → RAX

2. Evolution of the Bucket
------------------------

*Volatile set top-level structure:*

```
+--------+     +--------+     +--------+     +--------+
| NONE   | --> | SINGLE | --> | VECTOR | --> | RAX    |
+--------+     +--------+     +--------+     +--------+
```

*If the top-level element is a RAX, it has child buckets of type:*

```
+--------+     +--------+     +-----------+
| SINGLE | --> | VECTOR | --> | HASHTABLE |
+--------+     +--------+     +-----------+
```

*Vectors can split into multiple vectors and shrink into SINGLE buckets. A RAX with only one element is collapsed by replacing the RAX with its single element on the top level (except for HASHTABLE buckets which are not allowed on the top level).*

3. RAX Structure with Expiry-Aligned Keys
--------------------------------------

Buckets in RAX are indexed by aligned expiry timestamps:

    +------------------------------+
    | RAX key (bucket_ts) → Bucket |
    +------------------------------+
    | 0x00000020 → VECTOR          |
    | 0x00000040 → VECTOR          |
    | 0x00000060 → HT              |
    +------------------------------+

4. Bucket Splitting (Inside RAX)
-----------------------------

If a vector bucket in a RAX fills:

- Binary search for best split point.
- Use `getExpiry(entry)` + `get_bucket_ts()` to find transition.
- Create 2 new buckets and update RAX.

Original:

    [entry1, entry2, ..., entryN] ← bucket_ts = 64ms

After split:

    [entry1, ..., entryK]   → bucket_ts = 32ms
    [entryK+1, ..., entryN] → bucket_ts = 64ms

If all entries share same bucket_ts → promote to HT.

5. Shrinking Behavior
------------------

On deletion:

- HT may shrink to VECTOR.
- VECTOR with 1 item → becomes SINGLE.
- If RAX has only one key left, it’s promoted up.

Summary:
--------

This redesign provides:

✓ Fine-grained memory control
✓ High scalability for bursty TTL data
✓ Fast expiry checks via windowed organization
✓ Minimal overhead for sparse sets
✓ Flexible binary-search-based sorting and bucketing

It also lays the groundwork for future enhancements, including metrics, prioritized expiry policies, or segmented cleaning.
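
As a rough illustration of the downgrade rule and the binary-search split described above, here is a minimal sketch. It is not taken from `vset.c`: the helper names (`bucket_ts`, `can_downgrade`, `split_point`) and the 1024 ms value for `BUCKET_INTERVAL_MIN` are assumptions chosen to match the diagrams, and the split shown is only one plausible reading of "find transition".

```c
#include <assert.h>
#include <stddef.h>

/* Assumed value for illustration only; the real constant is defined in vset.c. */
#define BUCKET_INTERVAL_MIN 1024LL /* hypothetical finest window, in ms */

/* Hypothetical helper: end timestamp of the interval-aligned window that
 * contains `expiry`. `interval` must be a power of two. */
static long long bucket_ts(long long expiry, long long interval) {
    return (expiry & ~(interval - 1LL)) + interval;
}

/* Downgrade rule from the description: the latest expiry in the bucket,
 * aligned to the finest window, must end before the current bucket does. */
static int can_downgrade(long long max_expiry, long long current_bucket_ts) {
    return bucket_ts(max_expiry, BUCKET_INTERVAL_MIN) < current_bucket_ts;
}

/* One possible split search (an assumption, not the patch's code):
 * `expiries` is sorted ascending, so the aligned window is non-decreasing.
 * Binary search for the first index whose window differs from that of the
 * first entry. Returns n when every entry shares one window, which is the
 * "promote to HT" case in the description. */
static size_t split_point(const long long *expiries, size_t n) {
    assert(n > 0);
    long long first = bucket_ts(expiries[0], BUCKET_INTERVAL_MIN);
    size_t lo = 0, hi = n;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (bucket_ts(expiries[mid], BUCKET_INTERVAL_MIN) == first)
            lo = mid + 1; /* still inside the first window */
        else
            hi = mid;     /* transition is at mid or earlier */
    }
    return lo;
}
```

Under these assumptions, expiries `{30, 200, 500, 1500}` split at index 3: the first three share the window ending at 1024, while 1500 falls into the window ending at 2048. A bucket ending at 8192 whose latest expiry is 1500 satisfies the downgrade rule, whereas one whose latest expiry is 7500 does not, because its finest window already ends at 8192.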
Signed-off-by: Ran Shidlansik --- cmake/Modules/SourceFiles.cmake | 2 +- src/Makefile | 2 +- src/hashtable.c | 6 +- src/object.c | 4 + src/server.h | 4 +- src/t_hash.c | 80 +- src/unit/test_files.h | 10 + src/unit/test_vset.c | 518 +++++++ src/volatile_set.c | 79 - src/volatile_set.h | 40 - src/vset.c | 2393 +++++++++++++++++++++++++++++++ src/vset.h | 97 ++ 12 files changed, 3063 insertions(+), 172 deletions(-) create mode 100644 src/unit/test_vset.c delete mode 100644 src/volatile_set.c delete mode 100644 src/volatile_set.h create mode 100644 src/vset.c create mode 100644 src/vset.h diff --git a/cmake/Modules/SourceFiles.cmake b/cmake/Modules/SourceFiles.cmake index da23f9f880b..861d782070b 100644 --- a/cmake/Modules/SourceFiles.cmake +++ b/cmake/Modules/SourceFiles.cmake @@ -119,7 +119,7 @@ set(VALKEY_SERVER_SRCS ${CMAKE_SOURCE_DIR}/src/server.c ${CMAKE_SOURCE_DIR}/src/logreqres.c ${CMAKE_SOURCE_DIR}/src/entry.c - ${CMAKE_SOURCE_DIR}/src/volatile_set.c) + ${CMAKE_SOURCE_DIR}/src/vset.c) # valkey-cli diff --git a/src/Makefile b/src/Makefile index 2f4f360bf2b..b9f2e9f0eed 100644 --- a/src/Makefile +++ b/src/Makefile @@ -423,7 +423,7 @@ ENGINE_NAME=valkey SERVER_NAME=$(ENGINE_NAME)-server$(PROG_SUFFIX) ENGINE_SENTINEL_NAME=$(ENGINE_NAME)-sentinel$(PROG_SUFFIX) ENGINE_TRACE_OBJ=trace/trace.o trace/trace_commands.o trace/trace_db.o trace/trace_cluster.o trace/trace_server.o trace/trace_rdb.o trace/trace_aof.o -ENGINE_SERVER_OBJ=threads_mngr.o adlist.o vector.o quicklist.o ae.o anet.o dict.o hashtable.o kvstore.o server.o sds.o zmalloc.o lzf_c.o lzf_d.o pqsort.o zipmap.o sha1.o ziplist.o release.o memory_prefetch.o io_threads.o networking.o util.o object.o db.o replication.o rdb.o t_string.o t_list.o t_set.o t_zset.o t_hash.o config.o aof.o pubsub.o multi.o debug.o sort.o intset.o syncio.o cluster.o cluster_legacy.o cluster_slot_stats.o crc16.o endianconv.o commandlog.o eval.o bio.o rio.o rand.o memtest.o syscheck.o crcspeed.o crccombine.o crc64.o bitops.o sentinel.o 
notify.o setproctitle.o blocked.o hyperloglog.o latency.o sparkline.o valkey-check-rdb.o valkey-check-aof.o geo.o lazyfree.o module.o evict.o expire.o geohash.o geohash_helper.o childinfo.o allocator_defrag.o defrag.o siphash.o rax.o t_stream.o listpack.o localtime.o lolwut.o lolwut5.o lolwut6.o acl.o tracking.o socket.o tls.o sha256.o timeout.o setcpuaffinity.o monotonic.o mt19937-64.o resp_parser.o call_reply.o script.o functions.o commands.o strl.o connection.o unix.o logreqres.o rdma.o scripting_engine.o entry.o volatile_set.o lua/script_lua.o lua/function_lua.o lua/engine_lua.o lua/debug_lua.o +ENGINE_SERVER_OBJ=threads_mngr.o adlist.o vector.o quicklist.o ae.o anet.o dict.o hashtable.o kvstore.o server.o sds.o zmalloc.o lzf_c.o lzf_d.o pqsort.o zipmap.o sha1.o ziplist.o release.o memory_prefetch.o io_threads.o networking.o util.o object.o db.o replication.o rdb.o t_string.o t_list.o t_set.o t_zset.o t_hash.o config.o aof.o pubsub.o multi.o debug.o sort.o intset.o syncio.o cluster.o cluster_legacy.o cluster_slot_stats.o crc16.o endianconv.o commandlog.o eval.o bio.o rio.o rand.o memtest.o syscheck.o crcspeed.o crccombine.o crc64.o bitops.o sentinel.o notify.o setproctitle.o blocked.o hyperloglog.o latency.o sparkline.o valkey-check-rdb.o valkey-check-aof.o geo.o lazyfree.o module.o evict.o expire.o geohash.o geohash_helper.o childinfo.o allocator_defrag.o defrag.o siphash.o rax.o t_stream.o listpack.o localtime.o lolwut.o lolwut5.o lolwut6.o acl.o tracking.o socket.o tls.o sha256.o timeout.o setcpuaffinity.o monotonic.o mt19937-64.o resp_parser.o call_reply.o script.o functions.o commands.o strl.o connection.o unix.o logreqres.o rdma.o scripting_engine.o entry.o vset.o lua/script_lua.o lua/function_lua.o lua/engine_lua.o lua/debug_lua.o ENGINE_SERVER_OBJ+=$(ENGINE_TRACE_OBJ) ENGINE_CLI_NAME=$(ENGINE_NAME)-cli$(PROG_SUFFIX) ENGINE_CLI_OBJ=anet.o adlist.o dict.o valkey-cli.o zmalloc.o release.o ae.o serverassert.o crcspeed.o crccombine.o crc64.o siphash.o 
crc16.o monotonic.o cli_common.o mt19937-64.o strl.o cli_commands.o sds.o util.o sha256.o diff --git a/src/hashtable.c b/src/hashtable.c index 214df11e7ee..4d42f41428c 100644 --- a/src/hashtable.c +++ b/src/hashtable.c @@ -1800,7 +1800,7 @@ size_t hashtableScanDefrag(hashtable *ht, size_t cursor, hashtableScanFunction f size_t used_before = ht->used[0]; bucket *b = &ht->tables[0][idx]; do { - if (b->presence != 0) { + if (fn && b->presence != 0) { int pos; for (pos = 0; pos < ENTRIES_PER_BUCKET; pos++) { if (isPositionFilled(b, pos) && validateElementIfNeeded(ht, b->entries[pos])) { @@ -1843,7 +1843,7 @@ size_t hashtableScanDefrag(hashtable *ht, size_t cursor, hashtableScanFunction f size_t used_before = ht->used[table_small]; bucket *b = &ht->tables[table_small][idx]; do { - if (b->presence) { + if (fn && b->presence) { for (int pos = 0; pos < ENTRIES_PER_BUCKET; pos++) { if (isPositionFilled(b, pos) && validateElementIfNeeded(ht, b->entries[pos])) { void *emit = emit_ref ? &b->entries[pos] : b->entries[pos]; @@ -1873,7 +1873,7 @@ size_t hashtableScanDefrag(hashtable *ht, size_t cursor, hashtableScanFunction f size_t used_before = ht->used[table_large]; bucket *b = &ht->tables[table_large][idx]; do { - if (b->presence) { + if (fn && b->presence) { for (int pos = 0; pos < ENTRIES_PER_BUCKET; pos++) { if (isPositionFilled(b, pos) && validateElementIfNeeded(ht, b->entries[pos])) { void *emit = emit_ref ? &b->entries[pos] : b->entries[pos]; diff --git a/src/object.c b/src/object.c index d85699b7cb4..144907c2015 100644 --- a/src/object.c +++ b/src/object.c @@ -28,10 +28,12 @@ * POSSIBILITY OF SUCH DAMAGE. 
*/ +#include "hashtable.h" #include "server.h" #include "serverassert.h" #include "functions.h" #include "intset.h" /* Compact integer set structure */ +#include "vset.h" #include "zmalloc.h" #include "sds.h" #include "module.h" @@ -1201,6 +1203,7 @@ size_t objectComputeSize(robj *key, robj *o, size_t sample_size, int dbid) { } else if (o->encoding == OBJ_ENCODING_HASHTABLE) { hashtable *ht = o->ptr; hashtableIterator iter; + vset *volatile_fields = hashtableMetadata(ht); hashtableInitIterator(&iter, ht, 0); void *next; @@ -1211,6 +1214,7 @@ size_t objectComputeSize(robj *key, robj *o, size_t sample_size, int dbid) { } hashtableResetIterator(&iter); if (samples) asize += (double)elesize / samples * hashtableSize(ht); + if (vsetIsValid(volatile_fields)) asize += vsetMemUsage(volatile_fields); } else { serverPanic("Unknown hash encoding"); } diff --git a/src/server.h b/src/server.h index a4e60788595..d63c5eaf20d 100644 --- a/src/server.h +++ b/src/server.h @@ -80,7 +80,7 @@ #include "rax.h" /* Radix tree */ #include "connection.h" /* Connection abstraction */ #include "memory_prefetch.h" -#include "volatile_set.h" +#include "vset.h" #include "trace/trace.h" #include "entry.h" @@ -2627,7 +2627,7 @@ typedef struct { unsigned char *fptr, *vptr; hashtableIterator iter; - volatileSetIterator viter; + vsetIterator viter; void *next; } hashTypeIterator; diff --git a/src/t_hash.c b/src/t_hash.c index 14332fcea80..b529355ff2d 100644 --- a/src/t_hash.c +++ b/src/t_hash.c @@ -35,7 +35,7 @@ #include "hashtable.h" #include "rax.h" #include "sds.h" -#include "volatile_set.h" +#include "vset.h" #include "server.h" #include "zmalloc.h" #include @@ -52,27 +52,23 @@ typedef enum { EXPIRATION_MODIFICATION_EXPIRE_ASAP = 2, /* if apply of the expiration modification was set to a time in the past (i.e field is immediately expired) */ } expiryModificationResult; -volatileEntryType hashVolatileEntryType = { - .entryGetKey = (sds(*)(const void *entry))entryGetField, - .getExpiry = (long long 
(*)(const void *entry))entryGetExpiry, -}; - /*----------------------------------------------------------------------------- * Hash type Expiry API *----------------------------------------------------------------------------*/ -static volatile_set *hashTypeGetVolatileSet(robj *o) { +static vset *hashTypeGetVolatileSet(robj *o) { serverAssert(o->encoding == OBJ_ENCODING_HASHTABLE); - return *(volatile_set **)hashtableMetadata(o->ptr); -} - -void hashTypeFreeVolatileSet(robj *o) { - volatile_set *set = hashTypeGetVolatileSet(o); - if (set) freeVolatileSet(set); + vset *set = (vset *)hashtableMetadata(o->ptr); + return vsetIsValid(set) ? set : NULL; } bool hashTypeHasVolatileElements(robj *o) { - return ((o->encoding == OBJ_ENCODING_HASHTABLE) && (hashTypeGetVolatileSet(o) != NULL)); + if (o->encoding == OBJ_ENCODING_HASHTABLE) { + vset *set = hashTypeGetVolatileSet(o); + if (set && !vsetIsEmpty(set)) + return true; + } + return false; } /* make any access to the hash object elements ignore the specific elements expiration. @@ -80,44 +76,43 @@ bool hashTypeHasVolatileElements(robj *o) { static inline void hashTypeIgnoreTTL(robj *o, bool ignore) { if (o->encoding == OBJ_ENCODING_HASHTABLE) { /* prevent placing access function if not needed */ - if (!ignore && !hashTypeHasVolatileElements(o)) { + if (!ignore && hashTypeGetVolatileSet(o) == NULL) { ignore = true; } hashtableSetType(o->ptr, ignore ? &hashHashtableType : &hashWithVolatileItemsHashtableType); } } -static volatile_set *hashTypeGetOrcreateVolatileSet(robj *o) { +static vset *hashTypeGetOrcreateVolatileSet(robj *o) { serverAssert(o->encoding == OBJ_ENCODING_HASHTABLE); - volatile_set **volatile_set_ref = hashtableMetadata(o->ptr); - if (*volatile_set_ref == NULL) { - *volatile_set_ref = createVolatileSet(&hashVolatileEntryType); + vset *set = (vset *)hashtableMetadata(o->ptr); + if (!vsetIsValid(set)) { + vsetInit(set); /* serves mainly for optimization. 
Use type which supports access function only when needed. */ hashTypeIgnoreTTL(o, false); } - return *volatile_set_ref; + return set; } -static void hashTypeDeleteVolatileSet(robj *o) { - volatile_set **volatile_set_ref = hashtableMetadata(o->ptr); - freeVolatileSet(*volatile_set_ref); - *volatile_set_ref = NULL; +void hashTypeFreeVolatileSet(robj *o) { + vset *set = (vset *)hashtableMetadata(o->ptr); + if (vsetIsValid(set)) vsetRelease(set); /* serves mainly for optimization. by changing the hashtable type we can avoid extra function call in hashtable access */ hashTypeIgnoreTTL(o, true); } void hashTypeTrackEntry(robj *o, void *entry) { - volatile_set *set = hashTypeGetOrcreateVolatileSet(o); - serverAssert(volatileSetAddEntry(set, entry, entryGetExpiry(entry))); + vset *set = hashTypeGetOrcreateVolatileSet(o); + serverAssert(vsetAddEntry(set, entryGetExpiry, entry)); } void hashTypeUntrackEntry(robj *o, void *entry) { if (!entryHasExpiry(entry)) return; - volatile_set *set = hashTypeGetVolatileSet(o); + vset *set = hashTypeGetVolatileSet(o); debugServerAssert(set); - serverAssert(volatileSetRemoveEntry(set, entry, entryGetExpiry(entry))); - if (volatileSetNumEntries(set) == 0) { - hashTypeDeleteVolatileSet(o); + serverAssert(vsetRemoveEntry(set, entryGetExpiry, entry)); + if (vsetIsEmpty(set)) { + hashTypeFreeVolatileSet(o); } } @@ -128,20 +123,13 @@ void hashTypeTrackUpdateEntry(robj *o, void *old_entry, void *new_entry, long lo if (!old_tracked && !new_tracked) return; - volatile_set *set = hashTypeGetOrcreateVolatileSet(o); - debugServerAssert(set); + vset *set = hashTypeGetOrcreateVolatileSet(o); + debugServerAssert(!old_tracked || !vsetIsEmpty(set)); - if (old_tracked && !new_tracked) { - serverAssert(volatileSetRemoveEntry(set, old_entry, old_expiry)); - } else if (new_tracked && !old_tracked) { - serverAssert(volatileSetAddEntry(set, new_entry, new_expiry)); - } else { - volatile_set *set = hashTypeGetVolatileSet(o); - debugServerAssert(set); - 
serverAssert(volatileSetUpdateEntry(set, old_entry, new_entry, old_expiry, new_expiry) == 1); - } - if (volatileSetNumEntries(set) == 0) { - hashTypeDeleteVolatileSet(o); + serverAssert(vsetUpdateEntry(set, entryGetExpiry, old_entry, new_entry, old_expiry, new_expiry) == 1); + + if (vsetIsEmpty(set)) { + hashTypeFreeVolatileSet(o); } } @@ -599,7 +587,7 @@ void hashTypeInitVolatileIterator(robj *subject, hashTypeIterator *hi) { if (hi->encoding == OBJ_ENCODING_LISTPACK) { return; } else if (hi->encoding == OBJ_ENCODING_HASHTABLE) { - volatileSetStart(hashTypeGetVolatileSet(subject), &hi->viter); + vsetInitIterator(hashTypeGetVolatileSet(subject), &hi->viter); } else { serverPanic("Unknown hash encoding"); } @@ -610,7 +598,7 @@ void hashTypeResetIterator(hashTypeIterator *hi) { if (!hi->volatile_items_iter) hashtableResetIterator(&hi->iter); else - volatileSetReset(&hi->viter); + vsetResetIterator(&hi->viter); } } @@ -650,7 +638,7 @@ int hashTypeNext(hashTypeIterator *hi) { if (!hi->volatile_items_iter) { if (!hashtableNext(&hi->iter, &hi->next)) return C_ERR; } else { - if (!volatileSetNext(&hi->viter, &hi->next)) return C_ERR; + if (!vsetNext(&hi->viter, &hi->next)) return C_ERR; } } else { serverPanic("Unknown hash encoding"); diff --git a/src/unit/test_files.h b/src/unit/test_files.h index d7befe08943..34459201bb2 100644 --- a/src/unit/test_files.h +++ b/src/unit/test_files.h @@ -201,6 +201,14 @@ int test_reclaimFilePageCache(int argc, char **argv, int flags); int test_writePointerWithPadding(int argc, char **argv, int flags); int test_valkey_strtod(int argc, char **argv, int flags); int test_vector(int argc, char **argv, int flags); +int test_vset_add_and_iterate(int argc, char **argv, int flags); +int test_vset_large_batch_same_expiry(int argc, char **argv, int flags); +int test_vset_large_batch_update_entry_same_expiry(int argc, char **argv, int flags); +int test_vset_large_batch_update_entry_multiple_expiries(int argc, char **argv, int flags); +int 
test_vset_iterate_multiple_expiries(int argc, char **argv, int flags); +int test_vset_add_and_remove_all(int argc, char **argv, int flags); +int test_vset_defrag(int argc, char **argv, int flags); +int test_vset_fuzzer(int argc, char **argv, int flags); int test_ziplistCreateIntList(int argc, char **argv, int flags); int test_ziplistPop(int argc, char **argv, int flags); int test_ziplistGetElementAtIndex3(int argc, char **argv, int flags); @@ -261,6 +269,7 @@ unitTest __test_sha1_c[] = {{"test_sha1", test_sha1}, {NULL, NULL}}; unitTest __test_util_c[] = {{"test_string2ll", test_string2ll}, {"test_string2l", test_string2l}, {"test_ll2string", test_ll2string}, {"test_ld2string", test_ld2string}, {"test_fixedpoint_d2string", test_fixedpoint_d2string}, {"test_version2num", test_version2num}, {"test_reclaimFilePageCache", test_reclaimFilePageCache}, {"test_writePointerWithPadding", test_writePointerWithPadding}, {NULL, NULL}}; unitTest __test_valkey_strtod_c[] = {{"test_valkey_strtod", test_valkey_strtod}, {NULL, NULL}}; unitTest __test_vector_c[] = {{"test_vector", test_vector}, {NULL, NULL}}; +unitTest __test_vset_c[] = {{"test_vset_add_and_iterate", test_vset_add_and_iterate}, {"test_vset_large_batch_same_expiry", test_vset_large_batch_same_expiry}, {"test_vset_large_batch_update_entry_same_expiry", test_vset_large_batch_update_entry_same_expiry}, {"test_vset_large_batch_update_entry_multiple_expiries", test_vset_large_batch_update_entry_multiple_expiries}, {"test_vset_iterate_multiple_expiries", test_vset_iterate_multiple_expiries}, {"test_vset_add_and_remove_all", test_vset_add_and_remove_all}, {"test_vset_defrag", test_vset_defrag}, {"test_vset_fuzzer", test_vset_fuzzer}, {NULL, NULL}}; unitTest __test_ziplist_c[] = {{"test_ziplistCreateIntList", test_ziplistCreateIntList}, {"test_ziplistPop", test_ziplistPop}, {"test_ziplistGetElementAtIndex3", test_ziplistGetElementAtIndex3}, {"test_ziplistGetElementOutOfRange", test_ziplistGetElementOutOfRange}, 
{"test_ziplistGetLastElement", test_ziplistGetLastElement}, {"test_ziplistGetFirstElement", test_ziplistGetFirstElement}, {"test_ziplistGetElementOutOfRangeReverse", test_ziplistGetElementOutOfRangeReverse}, {"test_ziplistIterateThroughFullList", test_ziplistIterateThroughFullList}, {"test_ziplistIterateThroughListFrom1ToEnd", test_ziplistIterateThroughListFrom1ToEnd}, {"test_ziplistIterateThroughListFrom2ToEnd", test_ziplistIterateThroughListFrom2ToEnd}, {"test_ziplistIterateThroughStartOutOfRange", test_ziplistIterateThroughStartOutOfRange}, {"test_ziplistIterateBackToFront", test_ziplistIterateBackToFront}, {"test_ziplistIterateBackToFrontDeletingAllItems", test_ziplistIterateBackToFrontDeletingAllItems}, {"test_ziplistDeleteInclusiveRange0To0", test_ziplistDeleteInclusiveRange0To0}, {"test_ziplistDeleteInclusiveRange0To1", test_ziplistDeleteInclusiveRange0To1}, {"test_ziplistDeleteInclusiveRange1To2", test_ziplistDeleteInclusiveRange1To2}, {"test_ziplistDeleteWithStartIndexOutOfRange", test_ziplistDeleteWithStartIndexOutOfRange}, {"test_ziplistDeleteWithNumOverflow", test_ziplistDeleteWithNumOverflow}, {"test_ziplistDeleteFooWhileIterating", test_ziplistDeleteFooWhileIterating}, {"test_ziplistReplaceWithSameSize", test_ziplistReplaceWithSameSize}, {"test_ziplistReplaceWithDifferentSize", test_ziplistReplaceWithDifferentSize}, {"test_ziplistRegressionTestForOver255ByteStrings", test_ziplistRegressionTestForOver255ByteStrings}, {"test_ziplistRegressionTestDeleteNextToLastEntries", test_ziplistRegressionTestDeleteNextToLastEntries}, {"test_ziplistCreateLongListAndCheckIndices", test_ziplistCreateLongListAndCheckIndices}, {"test_ziplistCompareStringWithZiplistEntries", test_ziplistCompareStringWithZiplistEntries}, {"test_ziplistMergeTest", test_ziplistMergeTest}, {"test_ziplistStressWithRandomPayloadsOfDifferentEncoding", test_ziplistStressWithRandomPayloadsOfDifferentEncoding}, {"test_ziplistCascadeUpdateEdgeCases", test_ziplistCascadeUpdateEdgeCases}, 
{"test_ziplistInsertEdgeCase", test_ziplistInsertEdgeCase}, {"test_ziplistStressWithVariableSize", test_ziplistStressWithVariableSize}, {"test_BenchmarkziplistFind", test_BenchmarkziplistFind}, {"test_BenchmarkziplistIndex", test_BenchmarkziplistIndex}, {"test_BenchmarkziplistValidateIntegrity", test_BenchmarkziplistValidateIntegrity}, {"test_BenchmarkziplistCompareWithString", test_BenchmarkziplistCompareWithString}, {"test_BenchmarkziplistCompareWithNumber", test_BenchmarkziplistCompareWithNumber}, {"test_ziplistStress__ziplistCascadeUpdate", test_ziplistStress__ziplistCascadeUpdate}, {NULL, NULL}}; unitTest __test_zipmap_c[] = {{"test_zipmapIterateWithLargeKey", test_zipmapIterateWithLargeKey}, {"test_zipmapIterateThroughElements", test_zipmapIterateThroughElements}, {NULL, NULL}}; unitTest __test_zmalloc_c[] = {{"test_zmallocAllocReallocCallocAndFree", test_zmallocAllocReallocCallocAndFree}, {"test_zmallocAllocZeroByteAndFree", test_zmallocAllocZeroByteAndFree}, {NULL, NULL}}; @@ -288,6 +297,7 @@ struct unitTestSuite { {"test_util.c", __test_util_c}, {"test_valkey_strtod.c", __test_valkey_strtod_c}, {"test_vector.c", __test_vector_c}, + {"test_vset.c", __test_vset_c}, {"test_ziplist.c", __test_ziplist_c}, {"test_zipmap.c", __test_zipmap_c}, {"test_zmalloc.c", __test_zmalloc_c}, diff --git a/src/unit/test_vset.c b/src/unit/test_vset.c new file mode 100644 index 00000000000..f8646875586 --- /dev/null +++ b/src/unit/test_vset.c @@ -0,0 +1,518 @@ +#include "../vset.h" +#include "../entry.h" +#include "test_help.h" +#include "../zmalloc.h" + +#include +#include +#include +#include +#include +#include +#include + +typedef entry mock_entry; + +static mock_entry *mockCreateEntry(const char *keystr, long long expiry) { + sds field = sdsnew(keystr); + mock_entry *e = entryCreate(field, sdsnew("value"), expiry); + sdsfree(field); + return e; +} + +static void mockFreeEntry(void *entry) { + // printf("mockFreeEntry: %p\n", entry); + entryFree(entry); +} + +static 
mock_entry *mockEntryUpdate(mock_entry *entry, long long expiry) { + mock_entry *new_entry = entryCreate(entryGetField(entry), sdsdup(entryGetValue(entry)), expiry); + entryFree(entry); + return new_entry; +} + +static long long mockGetExpiry(const void *entry) { + return entryGetExpiry(entry); +} + +int test_vset_add_and_iterate(int argc, char **argv, int flags) { + (void)argc; + (void)argv; + (void)flags; + + vset set; + vsetInit(&set); + + mock_entry *e1 = mockCreateEntry("item1", 123); + mock_entry *e2 = mockCreateEntry("item2", 456); + + TEST_ASSERT(vsetAddEntry(&set, mockGetExpiry, e1)); + TEST_ASSERT(vsetAddEntry(&set, mockGetExpiry, e2)); + + TEST_ASSERT(!vsetIsEmpty(&set)); + + vsetIterator it; + vsetInitIterator(&set, &it); + + void *entry; + int count = 0; + while (vsetNext(&it, &entry)) { + TEST_EXPECT(entry != NULL); + count++; + } + + TEST_ASSERT(count == 2); + + vsetResetIterator(&it); + vsetRelease(&set); + mockFreeEntry(e1); + mockFreeEntry(e2); + + TEST_PRINT_INFO("Test passed with %d expects", failed_expects); + return 0; +} + +int test_vset_large_batch_same_expiry(int argc, char **argv, int flags) { + (void)argc; + (void)argv; + (void)flags; + + vset set; + vsetInit(&set); + + const long long expiry_time = 1000LL; + const int total_entries = 200; + + // Allocate and add 200 entries with same expiry + mock_entry **entries = zmalloc(sizeof(mock_entry *) * total_entries); + TEST_ASSERT(entries != NULL); + + for (int i = 0; i < total_entries; i++) { + char key_buf[32]; + snprintf(key_buf, sizeof(key_buf), "entry_%d", i); + entries[i] = mockCreateEntry(key_buf, expiry_time); + TEST_ASSERT(vsetAddEntry(&set, mockGetExpiry, entries[i])); + } + + // Verify set is not empty + TEST_ASSERT(!vsetIsEmpty(&set)); + + // Iterate all entries and count them + vsetIterator it; + vsetInitIterator(&set, &it); + + void *entry; + int count = 0; + while (vsetNext(&it, &entry)) { + TEST_EXPECT(entry != NULL); + count++; + } + TEST_ASSERT(count == total_entries); + + // 
Cleanup + vsetResetIterator(&it); + vsetRelease(&set); + + for (int i = 0; i < total_entries; i++) { + mockFreeEntry(entries[i]); + } + zfree(entries); + + TEST_PRINT_INFO("Inserted and iterated %d entries with same expiry", total_entries); + return 0; +} + +int test_vset_large_batch_update_entry_same_expiry(int argc, char **argv, int flags) { + (void)argc; + (void)argv; + (void)flags; + + vset set; + vsetInit(&set); + + const long long expiry_time = 1000LL; + const unsigned int total_entries = 1000; + + mock_entry *entries[total_entries]; + + for (unsigned int i = 0; i < total_entries; i++) { + char key_buf[32]; + snprintf(key_buf, sizeof(key_buf), "entry_%d", i); + entries[i] = mockCreateEntry(key_buf, expiry_time); + TEST_ASSERT(vsetAddEntry(&set, mockGetExpiry, entries[i])); + } + // Verify set is not empty + TEST_ASSERT(!vsetIsEmpty(&set)); + + // Now iterate and replace all entries + for (unsigned int i = 0; i < total_entries; i++) { + mock_entry *old_entry = entries[i]; + entries[i] = mockEntryUpdate(entries[i], expiry_time); + TEST_ASSERT(vsetUpdateEntry(&set, mockGetExpiry, old_entry, entries[i], expiry_time, expiry_time)); + } + + for (unsigned int i = 0; i < total_entries; i++) { + TEST_ASSERT(vsetRemoveEntry(&set, mockGetExpiry, entries[i])); + } + + // Verify set is empty + TEST_ASSERT(vsetIsEmpty(&set)); + + // Cleanup + for (unsigned int i = 0; i < total_entries; i++) { + mockFreeEntry(entries[i]); + } + + TEST_PRINT_INFO("Inserted, updated and deleted %d entries with same expiry", total_entries); + return 0; +} + +int test_vset_large_batch_update_entry_multiple_expiries(int argc, char **argv, int flags) { + (void)argc; + (void)argv; + (void)flags; + const unsigned int total_entries = 1000; + + vset set; + vsetInit(&set); + + // Prepare entries with mixed expiry times, some duplicates + mock_entry *entries[total_entries]; + + // Initialize keys + for (unsigned int i = 0; i < total_entries; i++) { + char key_buf[32]; + snprintf(key_buf, 
sizeof(key_buf), "entry_%d", i); + long long expiry_time = rand() % 10000; + entries[i] = mockCreateEntry(key_buf, expiry_time); + TEST_ASSERT(vsetAddEntry(&set, mockGetExpiry, entries[i])); + } + // Verify set is not empty + TEST_ASSERT(!vsetIsEmpty(&set)); + + // Now iterate and replace all entries + for (unsigned int i = 0; i < total_entries; i++) { + mock_entry *old_entry = entries[i]; + long long old_expiry = entryGetExpiry(entries[i]); + long long new_expiry = old_expiry + rand() % 100000; + entries[i] = mockEntryUpdate(entries[i], new_expiry); + TEST_ASSERT(vsetUpdateEntry(&set, mockGetExpiry, old_entry, entries[i], old_expiry, new_expiry)); + } + + for (unsigned int i = 0; i < total_entries; i++) { + TEST_ASSERT(vsetRemoveEntry(&set, mockGetExpiry, entries[i])); + } + + // Verify set is empty + TEST_ASSERT(vsetIsEmpty(&set)); + + // Cleanup + for (unsigned int i = 0; i < total_entries; i++) { + mockFreeEntry(entries[i]); + } + + TEST_PRINT_INFO("Inserted, updated and deleted %d entries with different expiry", total_entries); + return 0; +} + +int test_vset_iterate_multiple_expiries(int argc, char **argv, int flags) { + (void)argc; + (void)argv; + (void)flags; + const unsigned int total_entries = 5; + + vset set; + vsetInit(&set); + + // Prepare entries with mixed expiry times, some duplicates + mock_entry *entries[total_entries]; + + // Initialize keys + for (unsigned int i = 0; i < total_entries; i++) { + char key_buf[32]; + snprintf(key_buf, sizeof(key_buf), "entry_%d", i); + long long expiry_time = rand() % 10000; + entries[i] = mockCreateEntry(key_buf, expiry_time); + TEST_ASSERT(vsetAddEntry(&set, mockGetExpiry, entries[i])); + } + + vsetIterator it; + vsetInitIterator(&set, &it); + + int found[5] = {0}; + int total = 0; + + void *entry; + while (vsetNext(&it, &entry)) { + TEST_EXPECT(entry != NULL); + mock_entry *e = (mock_entry *)entry; + + // Match the entries we inserted + for (int i = 0; i < 5; i++) { + if (strcmp(entryGetField(e), 
entryGetField(entries[i])) == 0) { + found[i] = 1; + break; + } + } + total++; + } + + TEST_ASSERT(total == 5); + + for (int i = 0; i < 5; i++) { + TEST_EXPECT(found[i]); + } + + vsetResetIterator(&it); + vsetRelease(&set); + for (int i = 0; i < 5; i++) mockFreeEntry(entries[i]); + + TEST_PRINT_INFO("Iterated all %d mixed expiry entries successfully", total); + return 0; +} + +int test_vset_add_and_remove_all(int argc, char **argv, int flags) { + UNUSED(argc); + UNUSED(argv); + UNUSED(flags); + + vset set; + vsetInit(&set); + + const int total_entries = 130; + mock_entry *entries[total_entries]; + long long expiry = 5000; + + for (int i = 0; i < total_entries; i++) { + char key[32]; + snprintf(key, sizeof(key), "key_%d", i); + entries[i] = mockCreateEntry(key, expiry); + TEST_ASSERT(vsetAddEntry(&set, mockGetExpiry, entries[i])); + } + + for (int i = 0; i < total_entries; i++) { + TEST_ASSERT(vsetRemoveEntry(&set, mockGetExpiry, entries[i])); + mockFreeEntry(entries[i]); + } + + TEST_ASSERT(vsetIsEmpty(&set)); + vsetRelease(&set); + + TEST_PRINT_INFO("Add/remove %d entries, set size now 0", total_entries); + return 0; +} + +/********************* Fuzzer tests ********************************/ + +#define NUM_ITERATIONS 100000 +#define MAX_ENTRIES 10000 +#define NUM_DEFRAG_STEPS 100 + +/* Global array to simulate a test database */ +mock_entry *mock_entries[MAX_ENTRIES]; +int mock_entry_count = 0; + +/* --------- volatileEntryType Callbacks --------- */ +sds mock_entry_get_key(const void *entry) { + return (sds)entry; +} + +long long mock_entry_get_expiry(const void *entry) { + return mockGetExpiry(entry); +} + +int mock_entry_expire(void *entry, void *ctx) { + mock_entry *e = (mock_entry *)entry; + long long now = *(long long *)ctx; + TEST_ASSERT(mock_entry_get_expiry(entry) <= now); + for (int i = 0; i < mock_entry_count; i++) { + if (mock_entries[i] == e) { + // printf("expire entry %p with expiry %llu\n", e, mockGetExpiry(e)); + mockFreeEntry(e); + 
mock_entries[i] = mock_entries[--mock_entry_count]; + return 1; + } + } + return 0; +} + +/* --------- Helper Functions --------- */ +mock_entry *mock_entry_create(const char *keystr, long long expiry) { + return mockCreateEntry(keystr, expiry); +} + +int insert_mock_entry(vset *set) { + if (mock_entry_count >= MAX_ENTRIES) return 0; + char keybuf[32]; + snprintf(keybuf, sizeof(keybuf), "key_%d", mock_entry_count); + + long long expiry = rand() % 10000 + 100; + mock_entry *e = mock_entry_create(keybuf, expiry); + // printf("adding entry %p with expiry %llu\n", e, expiry); + TEST_ASSERT(vsetAddEntry(set, mockGetExpiry, e)); + mock_entries[mock_entry_count++] = e; + return 0; +} + +int insert_mock_entry_with_expiry(vset *set, long long expiry) { + if (mock_entry_count >= MAX_ENTRIES) return 0; + char keybuf[32]; + snprintf(keybuf, sizeof(keybuf), "key_%d", mock_entry_count); + + mock_entry *e = mock_entry_create(keybuf, expiry); + // printf("adding entry %p with expiry %llu\n", e, expiry); + TEST_ASSERT(vsetAddEntry(set, mockGetExpiry, e)); + mock_entries[mock_entry_count++] = e; + return 0; +} + +int update_mock_entry(vset *set) { + if (mock_entry_count == 0) return 0; + int idx = rand() % mock_entry_count; + mock_entry *old = mock_entries[idx]; + long long old_expiry = mockGetExpiry(old); + long long new_expiry = old_expiry + (rand() % 500); + mock_entry *updated = mockEntryUpdate(old, new_expiry); + mock_entries[idx] = updated; + // printf("Update entry %p with entry %p with old expiry %llu new expiry %llu\n", old, updated, old_expiry, new_expiry); + TEST_ASSERT(vsetUpdateEntry(set, mockGetExpiry, old, updated, old_expiry, new_expiry)); + return 0; +} + +int remove_mock_entry(vset *set) { + if (mock_entry_count == 0) return 0; + int idx = rand() % mock_entry_count; + mock_entry *e = mock_entries[idx]; + // printf("removing entry %p with expiry %llu\n", e, mockGetExpiry(e)); + TEST_ASSERT(vsetRemoveEntry(set, mockGetExpiry, e)); + mockFreeEntry(e); + 
mock_entries[idx] = mock_entries[--mock_entry_count]; + + return 0; +} + + +int expire_mock_entries(vset *set, mstime_t now) { + // printf("Before expired entries entries: %d\n", mock_entry_count); + vsetRemoveExpired(set, mockGetExpiry, mock_entry_expire, now, mock_entry_count, &now); + // printf("After expired %zu entries left entries: %d and set is empty: %s\n", count, mock_entry_count, vsetIsEmpty(set) ? "true" : "false"); + return 0; +} + +void *mock_defragfn(void *ptr) { + size_t size = zmalloc_size(ptr); + void *newptr = zmalloc(size); + memcpy(newptr, ptr, size); + zfree(ptr); + return newptr; +} + +int mock_defrag_rax_node(raxNode **noderef) { + raxNode *newnode = mock_defragfn(*noderef); + if (newnode) { + *noderef = newnode; + return 1; + } + return 0; +} + +size_t defrag_vset(vset *set, size_t cursor, size_t steps) { + if (steps == 0) steps = ULONG_MAX; + do { + cursor = vsetScanDefrag(set, cursor, mock_defragfn, mock_defrag_rax_node); + steps--; + } while (cursor != 0 && steps > 0); + return cursor; +} + +int free_mock_entries(void) { + for (int i = 0; i < mock_entry_count; i++) { + mock_entry *e = mock_entries[i]; + mockFreeEntry(e); + } + mock_entry_count = 0; + return 0; +} + +/* --------- Defrag Test --------- */ +int test_vset_defrag(int argc, char **argv, int flags) { + UNUSED(argc); + UNUSED(argv); + UNUSED(flags); + srand(time(NULL)); + + vset set; + vsetInit(&set); + + /* defrag empty set */ + TEST_ASSERT(defrag_vset(&set, 0, 0) == 0); + + /* defrag when single entry */ + insert_mock_entry(&set); + TEST_ASSERT(defrag_vset(&set, 0, 0) == 0); + + /* defrag when vector */ + for (int i = 0; i < 127 - 1; i++) + insert_mock_entry(&set); + TEST_ASSERT(defrag_vset(&set, 0, 0) == 0); + + long long expiry = rand() % 10000 + 100; + for (int i = 0; i < 127 * 2; i++) { + insert_mock_entry_with_expiry(&set, expiry); + } + TEST_ASSERT(defrag_vset(&set, 0, 0) == 0); + + size_t cursor = 0; + for (int i = 0; i < NUM_ITERATIONS; i++) { + if (i % NUM_DEFRAG_STEPS 
== 0) + cursor = defrag_vset(&set, cursor, NUM_DEFRAG_STEPS); + insert_mock_entry_with_expiry(&set, expiry); + } + TEST_ASSERT(defrag_vset(&set, 0, 0) == 0); + + vsetRelease(&set); + free_mock_entries(); + + return 0; +} + +/* --------- Fuzzer Test --------- */ +int test_vset_fuzzer(int argc, char **argv, int flags) { + UNUSED(argc); + UNUSED(argv); + UNUSED(flags); + srand(time(NULL)); + + vset set; + vsetInit(&set); + + for (int i = 0; i < NUM_ITERATIONS; i++) { + int op = rand() % 5; + switch (op) { + case 0: + case 1: + insert_mock_entry(&set); + break; + case 2: + update_mock_entry(&set); + break; + case 3: + remove_mock_entry(&set); + break; + case 4: + TEST_ASSERT(defrag_vset(&set, 0, 0) == 0); + break; + } + + if (i % 100 == 0) { + mstime_t now = rand() % 10000; + expire_mock_entries(&set, now); + } + } + /* now expire all the entries and check that we have no entries left */ + expire_mock_entries(&set, LONG_LONG_MAX); + TEST_ASSERT(vsetIsEmpty(&set) && mock_entry_count == 0); + vsetRelease(&set); + free_mock_entries(); /* Just in case */ + return 0; +} diff --git a/src/volatile_set.c b/src/volatile_set.c deleted file mode 100644 index 97cbbbab870..00000000000 --- a/src/volatile_set.c +++ /dev/null @@ -1,79 +0,0 @@ -#include -#include "volatile_set.h" -#include "zmalloc.h" -#include "config.h" -#include "endianconv.h" -#include "serverassert.h" - -#define EXPIRY_HASH_SIZE 16 -volatile_set *createVolatileSet(volatileEntryType *type) { - volatile_set *set = zmalloc(sizeof(volatile_set)); - set->etypr = type; - set->expiry_buckets = raxNew(); - return set; -} - -void freeVolatileSet(volatile_set *b) { - raxFree(b->expiry_buckets); - zfree(b); -} - -int volatileSetAddEntry(volatile_set *set, void *entry, long long expiry) { - unsigned char buf[EXPIRY_HASH_SIZE]; - expiry = htonu64(expiry); - memcpy(buf, &expiry, sizeof(expiry)); - memcpy(buf + 8, &entry, sizeof(entry)); - if (sizeof(entry) == 4) memset(buf + 12, 0, 4); /* Zero padding for 32bit target. 
*/ - return raxTryInsert(set->expiry_buckets, buf, sizeof(buf), NULL, NULL); -} - -int volatileSetRemoveEntry(volatile_set *set, void *entry, long long expiry) { - unsigned char buf[EXPIRY_HASH_SIZE]; - expiry = htonu64(expiry); - memcpy(buf, &expiry, sizeof(expiry)); - memcpy(buf + 8, &entry, sizeof(entry)); - if (sizeof(entry) == 4) memset(buf + 12, 0, 4); /* Zero padding for 32bit target. */ - return raxRemove(set->expiry_buckets, buf, sizeof(buf), NULL); -} - -int volatileSetUpdateEntry(volatile_set *set, void *old_entry, void *new_entry, long long old_expiry, long long new_expiry) { - if (old_entry == new_entry && old_expiry == new_expiry) return 1; - - if (old_entry && old_expiry != -1) { - assert(volatileSetRemoveEntry(set, old_entry, old_expiry)); - } - if (new_entry && new_expiry != -1) { - assert(volatileSetAddEntry(set, new_entry, new_expiry)); - } - return 1; -} - -int volatileSetExpireEntry(volatile_set *set, void *entry) { - volatileSetRemoveEntry(set, entry, set->etypr->getExpiry(entry)); - if (set->etypr->expire) { - set->etypr->expire(entry); - return 1; - } - return 0; -} - -size_t volatileSetNumEntries(volatile_set *set) { - assert(set && set->expiry_buckets); - return set->expiry_buckets->numele; -} - -void volatileSetStart(volatile_set *set, volatileSetIterator *it) { - raxStart(&it->bucket, set->expiry_buckets); -} - -int volatileSetNext(volatileSetIterator *it, void **entryptr) { - if (raxNext(&it->bucket)) { - assert(it->bucket.key_len == EXPIRY_HASH_SIZE); - memcpy(entryptr, it->bucket.key + sizeof(long long), sizeof(*entryptr)); - return 1; - } - return 0; -} -void volatileSetReset(volatileSetIterator *it) { - raxStop(&it->bucket); -} diff --git a/src/volatile_set.h b/src/volatile_set.h deleted file mode 100644 index 37dc7c9923a..00000000000 --- a/src/volatile_set.h +++ /dev/null @@ -1,40 +0,0 @@ -#ifndef VOLATILESET_H -#define VOLATILESET_H - -#include -#include "rax.h" -#include "sds.h" - -typedef struct { - sds (*entryGetKey)(const void 
*entry); - - long long (*getExpiry)(const void *entry); - - int (*expire)(void *entry); - -} volatileEntryType; - - -typedef struct { - volatileEntryType *etypr; - rax *expiry_buckets; -} volatile_set; - -typedef struct volatileSetIterator { - raxIterator bucket; -} volatileSetIterator; - - -int volatileSetRemoveEntry(volatile_set *set, void *entry, long long expiry); -int volatileSetAddEntry(volatile_set *set, void *entry, long long expiry); -int volatileSetExpireEntry(volatile_set *set, void *entry); -int volatileSetUpdateEntry(volatile_set *set, void *old_entry, void *new_entry, long long old_expiry, long long new_expiry); -size_t volatileSetNumEntries(volatile_set *set); -void volatileSetStart(volatile_set *set, volatileSetIterator *it); -int volatileSetNext(volatileSetIterator *it, void **entryptr); -void volatileSetReset(volatileSetIterator *it); - -void freeVolatileSet(volatile_set *b); -volatile_set *createVolatileSet(volatileEntryType *type); - -#endif diff --git a/src/vset.c b/src/vset.c new file mode 100644 index 00000000000..4a5bc144184 --- /dev/null +++ b/src/vset.c @@ -0,0 +1,2393 @@ +#include "vset.h" +#include "rax.h" +#include "endianconv.h" +#include "serverassert.h" +#include "hashtable.h" +#include "util.h" +#include "zmalloc.h" + +#include +#include +#include + +#ifndef static_assert +#define static_assert _Static_assert +#endif + +/* + *----------------------------------------------------------------------------- + * Volatile Set - Adaptive, Expiry-aware Set Structure + *----------------------------------------------------------------------------- + * + * The `vset` is a dynamic, memory-efficient container for managing + * entries with expiry semantics. It is designed to efficiently track entries + * that expire at varying times and scales to large sets by adapting its internal + * representation as it grows or shrinks. 
+ * + *----------------------------------------------------------------------------- + * Expiry Buckets and Pointer Tagging + *----------------------------------------------------------------------------- + * + * Internally, the `vset` maintains a single `vsetBucket*` pointer, + * which can point to different types of buckets depending on the number of + * entries and the needed resolution. The pointer is tagged using the lowest 3 bits: + * + * #define VSET_BUCKET_NONE -1 + * #define VSET_BUCKET_SINGLE 0x1ULL // pointer to single entry (odd ptr) + * #define VSET_BUCKET_VECTOR 0x2ULL // pointer to pointer vector + * #define VSET_BUCKET_HT 0x4ULL // pointer to hashtable + * #define VSET_BUCKET_RAX 0x6ULL // pointer to radix tree + * + * #define VSET_TAG_MASK 0x7ULL + * #define VSET_PTR_MASK (~VSET_TAG_MASK) + * + * IMPORTANT!!!! - All entries must have LSB set (i.e., be odd-aligned) to be compatible with !!!! + * tagging constraints. + * + *----------------------------------------------------------------------------- + * Time Bucket Management + *----------------------------------------------------------------------------- + * + * Entries are grouped into **time buckets** based on their expiry time. + * Each time bucket represents a window aligned to: + * + * #define VOLATILESET_BUCKET_INTERVAL_MIN (1 << 4) // 16ms + * #define VOLATILESET_BUCKET_INTERVAL_MAX (1 << 13) // 8192ms + * + * A time bucket key is computed by rounding the expiry timestamp up to the + * nearest aligned window using `get_bucket_ts()`. + * + *----------------------------------------------------------------------------- + * Entry Addition and Bucket Promotion + *----------------------------------------------------------------------------- + * + * When a new entry is added: + * + * 1. If the current set is `NONE`, it becomes a `SINGLE` bucket. + * 2. If the set is a `SINGLE` bucket and another entry arrives: + * -> it is promoted to a `VECTOR` bucket (sorted by expiry). + * 3. 
If the `VECTOR` exceeds `VOLATILESET_VECTOR_BUCKET_MAX_SIZE` (127): + * -> the set becomes a `RAX`, and existing entries are migrated. + * 4. If the set is using RAX encoding, it will locate a bucket to add the entry + * following the strategy explained below. + * + *----------------------------------------------------------------------------- + * RAX Bucket and Dynamic Splitting + *----------------------------------------------------------------------------- + * + * Each bucket in the RAX bucket corresponds to a **time window**, defined by + * its bucket timestamp (`bucket_ts`). This timestamp represents the **END** of + * the time window. Entries in the bucket must expire *before* this timestamp. + * + * Time windows are defined in granular ranges: + * - Minimum granularity: VOLATILESET_BUCKET_INTERVAL_MIN (16 ms) + * - Maximum granularity: VOLATILESET_BUCKET_INTERVAL_MAX (8192 ms) + * + * A bucket can only contain entries that: + * 1. Have expiry < bucket_ts + * 2. Do not fit into any bucket with a smaller timestamp (i.e., earlier window) + * + * The structure allows multiple encodings: + * VSET_BUCKET_SINGLE - A single pointer to one entry. + * VSET_BUCKET_VECTOR - A sorted vector of pointers (up to 127 entries). + * VSET_BUCKET_HT - A hashtable used when vectors become too dense. + * + * Bucket Timestamp (END of window): + * + * |------------------ Bucket Span ------------------| + * [window_start .................................. bucket_ts) + * + * Layout Example: + * + * Timeline: ----------> increasing time -----------> + * +--------------+-------------+---------+ + * | B0 | B1 | B2 | + * | ts=32 | ts=128 | ts=2048 | + * +--------------+-------------+---------+ + * ^ ^ ^ + * | | | + * [E1,E2] ∈ B0 [E3...E7] ∈ B1 [E8...E15] ∈ B2 + * + * All entries expire BEFORE their bucket_ts + * + * Bucket Splitting Strategy: + * ---------------------------------- + * + * When a bucket (e.g. VECTOR) becomes too dense or needs realignment: + * + * 1.
Re-align to lower granularity: + * - Adjust the bucket timestamp down to a finer granularity (e.g. 16ms). + * - Only done if ALL entries still fit in the tighter window. + * - Effectively “moves” the bucket to an earlier timestamp. + * + * Example: B(ts=128, span=128ms) -> B(ts=64, span=16ms) + * + * 2. Split into two buckets: + * - Use binary search to find a “natural” boundary based on entry expiry. + * - Original bucket retains its timestamp (but holds fewer entries). + * - New bucket is inserted before the current one with its own tighter timestamp. + * + * Example: + * + * Before: + * [ Entry0 ... Entry126 ] -> B(ts=128) + * + * After Split: + * [ Entry0...Entry62 ] -> New B(ts=64) + * [ Entry63...Entry126 ] -> Original B(ts=128) + * + * 3. Convert to hashtable: + * - When no clean split is found (e.g. all entries share similar expiry), + * and realignment is not possible. + * - This allows efficient O(1) lookups even with clustered expiry values. + * + * Vector B(ts=128) -> Hashtable B(ts=128) + * + * This hierarchical design ensures: + * - Efficient memory usage (tight buckets) + * - Predictable iteration by expiry time + * - Low overhead insertions & deletions + * - Graceful promotion & demotion of bucket types + * + * NOTE: Buckets are always sorted by their `bucket_ts` in the radix tree (RAX), + * which allows efficient search for insertion/removal based on expiry. + * + *----------------------------------------------------------------------------- + * RAX Bucket Layout + *----------------------------------------------------------------------------- + * + * * RAX View with Time Keys: + * + * expiry_buckets = rax * | 0x6 + * + * +--------------------------+ + * | RAX (key = bucket_ts) | + * |--------------------------| + * | "000016" -> [entry1] | <- Vector (SINGLE->VECTOR->HT) + * | "000032" -> [entry2...] | <- Full vector, might split + * | "000048" -> [entry...] 
| + * +--------------------------+ + * + * * Splitting a Full Vector in RAX: + * + * Suppose vector at key "000032" has 13 entries: + * + * 1. Use binary search to find a transition point in expiry bucket_ts. + * We search for the first 2 adjacent entries which belong to different low granularity time windows, + * but as close as possible to the middle of the vector: + * [entry1, entry7, ..., entry13] + * ↑ + * split (first where get_bucket_ts(entry) > min_ts) + * + * 2. Create two vectors: + * bucket A -> [entry1..entry6] with key = "000032" + * bucket B -> [entry7..entry13] with key = "000048" + * + * 3. Insert both back to the RAX. + * + *----------------------------------------------------------------------------- + * Bucket Lifecycle + *----------------------------------------------------------------------------- + * + * NONE + * | + * v + * SINGLE (1 entry) + * | + * v + * VECTOR (sorted, up to 127) + * | + * v + * RAX (holds multiple buckets, keyed by each bucket's end timestamp) + * Bucket types within a RAX: + * + * SINGLE + * | + * v + * VECTOR (sorted, up to 127, can split + * | into multiple vectors) + * | + * v + * HASHTABLE (only when a vector can't split) + */ + +/************************************************************************************************************* + * pVector Implementation + *************************************************************************************************************/ + +#define PV_CARD_BITS 30 +#define PV_ALLOC_BITS 34 + +/* Custom vector structure with embedded allocation and length counters */ +typedef struct { + uint64_t len : PV_CARD_BITS; /* Number of elements (cardinality) */ + uint64_t alloc : PV_ALLOC_BITS; /* Allocated memory (zmalloc_size of the current vector allocation) */ + void *data[]; /* Flexible array member */ +} pVector; + +static const size_t PV_HEADER_SIZE = (sizeof(pVector)); + + +/* Returns the number of elements currently stored in the pVector.
+ * + * Arguments: + * vec - The pVector to query. + * + * Return: + * The number of elements in the vector. + * Note that a NULL is a !!!valid!!! vector - returns 0 if the vector is NULL. */ +static inline uint32_t +pvLen(pVector *vec) { + return (vec ? vec->len : 0); +} + +/* Returns the number of bytes allocated by the os to store the vector. + * This value is equal to the usable size returned by calling zrealloc_usable. + * + * Arguments: + * vec - The pVector to query. + * + * Return: + * The allocation size of the vector + * Note that a NULL is a !!!valid!!! vector - returns 0 if the vector is NULL. */ +static inline uint32_t pvAlloc(pVector *vec) { + return (vec ? vec->alloc : 0); +} + +/* Ensures that a pVector has enough capacity to hold additional elements. + * + * This function guarantees that the given pVector `pv` has at least enough + * allocated space to accommodate `additional` more elements, growing it if necessary. + * If the vector is currently `NULL`, it will be newly allocated. + * + * The allocation is handled using `zmalloc` or `zrealloc_usable`, depending on whether + * the vector is new or already initialized. The internal `alloc` field is updated to + * reflect the actual allocated size. + * + * Arguments: + * pv - Pointer to an existing pVector or NULL. + * additional - The number of additional elements the vector should be able to accommodate. + * + * Return: + * A pointer to the resized (or newly allocated) pVector with sufficient capacity. + * + * Note: + * The `additional` is the number of *additional* elements beyond the current length. + * This function does not modify the vector's logical length (`len`), only its allocation. 
*/ +static pVector *pvMakeRoomFor(pVector *pv, size_t additional) { + if (additional == 0) return pv; + /* Make sure we will have the capacity to store the extra number of elements */ + assert(pvLen(pv) + additional <= (1UL << PV_CARD_BITS) - 1); + + size_t required = PV_HEADER_SIZE + (pvLen(pv) + additional) * sizeof(void *); + + if (pvAlloc(pv) >= required) return pv; + + if (!pv) { + pv = zmalloc(required); + pv->len = 0; + } else { + pv = zrealloc_usable(pv, required, &required); + } + /* Make sure we have the capacity to save the allocation size */ + assert(required <= (size_t)((1ULL << PV_ALLOC_BITS) - 1)); + pv->alloc = required; + return pv; +} + +/* Shrinks a pVector to release unused allocated memory. + * + * This function checks if the current allocation (`used`) for the given + * `pVector` exceeds the memory actually required to store its elements. + * If so, it reallocates the vector to use only the needed memory, helping reduce + * memory overhead and improve space efficiency. + * + * The function uses `zrealloc_usable()` to reallocate memory in a way compatible + * with jemalloc (or other zmalloc backends) and updates the internal allocation + * size (`alloc`) to reflect the new length. + * + * Arguments: + * pv - A pointer to the `pVector` to shrink. + * + * Return: + * A potentially reallocated `pVector` with minimized memory usage. + * + * This function does not change the logical contents of the vector. + * It only adjusts the allocated memory footprint. If no reallocation + * is needed, the original pointer is returned unchanged. + * + * Example: + * pVector *vec = pvNew(16); + * // After some insertions and deletions + * vec = pvShrinkToFit(vec); */ +static pVector *pvShrinkToFit(pVector *pv) { + if (!pv) return NULL; + + size_t used = pvAlloc(pv); + size_t required = pvLen(pv) == 0 ?
0 : PV_HEADER_SIZE + pvLen(pv) * sizeof(void *); + + if (used > required) { + if (!required) { + zfree(pv); + return NULL; + } + pv = zrealloc_usable(pv, required, &required); + pv->alloc = required; + } + return pv; +} + +/** + * pvSplit - Splits a pVector into two parts at a given index. + * + * Arguments: + * pv_ptr: A pointer to the pVector* to split. This pointer is + * updated in-place to point to the left portion (elements [0..split_index-1]). + * split_index: The index at which to split the vector. The resulting right + * vector will contain elements [split_index..len-1]. + * + * This function is used to **efficiently split a sorted vector of pointers** + * into two separate vectors. The original vector is truncated in-place to + * only contain the first half, and a new vector is returned containing the + * second half. This allows for logical partitioning of data without scanning + * or reallocating unnecessary memory. + * + * The vector is assumed to be densely packed and its elements are of type `void*`. + * + * Memory is allocated for the new right vector using `zmalloc`, and the unused + * portion of the original vector may be freed or shrunk via `pvShrinkToFit` + * to optimize memory usage. + * + * Return: + * - A new pVector containing the right split [split_index..len-1]. + * + * Side effects: + * - The original vector pointer (`*pv_ptr`) is modified to point to the + * resized left portion. + * + * Example: + * -------- + * Suppose `pv_ptr` points to a vector of 5 elements: + * [A, B, C, D, E] + * + * Calling: + * pVector *right = pvSplit(&pv_ptr, 3); + * + * Results in: + * pv_ptr -> [A, B, C] + * right -> [D, E] + * + * If the split_index is 5 (i.e. the end), the function returns NULL and the + * original vector is unchanged. */ +pVector *pvSplit(pVector **pv_ptr, uint32_t split_index) { + pVector *pv = *pv_ptr; + + /* Handle edge cases: */ + + /* 1. 
null vector, or a split index which keeps the entire vector on the left side: + * simply return a NULL vector (the right side). + */ + if (!pv || split_index >= pvLen(pv)) return NULL; + + /* 2. zero split index means no left side. just return the existing vector and zero the input vector. */ + if (split_index == 0) { + *pv_ptr = NULL; + return pv; + } + + // Number of elements for the right half + uint64_t right_len = pv->len - split_index; + + // Allocate new vector for right part + size_t item_bytes = sizeof(void *); + size_t total_bytes = sizeof(pVector) + right_len * item_bytes; + size_t new_alloc; + pVector *right = zmalloc_usable(total_bytes, &new_alloc); + right->alloc = new_alloc; + right->len = right_len; + + // Copy the right part + memcpy(&right->data[0], &pv->data[split_index], right_len * item_bytes); + + // Shrink original vector + pv->len = split_index; + *pv_ptr = pvShrinkToFit(pv); + + return right; +} + +/* Creates a new pVector with the specified initial capacity. + * + * This function initializes a new pVector capable of holding at least + * `capacity` elements. Internally, it delegates allocation and setup to + * `pvMakeRoomFor`, starting from a NULL vector. + * + * Arguments: + * capacity - The initial number of elements the vector should be able to store. + * + * Return: + * A pointer to the newly allocated pVector. + * Note that a NULL is a !!valid!! vector whose size is zero. + * + * Note: + * The logical length (`len`) of the returned vector is initialized to 0. + */ +pVector *pvNew(uint32_t capacity) { + return pvMakeRoomFor(NULL, capacity); +} + +/* Inserts an element at the specified position in the pVector. + * + * Ensures enough capacity for the new element, shifts elements to make space, + * and inserts the given element at the desired position. + * + * Arguments: + * pv - The pVector to insert into (can be NULL). + * elem - The pointer to be inserted. + * idx - The index at which to insert the element (must be ≤ pv->len).
+ * + * Return: + * The updated pVector with the element inserted. */ +pVector *pvInsertAt(pVector *pv, void *elem, uint32_t idx) { + assert(idx <= pvLen(pv)); + pv = pvMakeRoomFor(pv, 1); + + if (idx < pv->len) { + memmove(&pv->data[idx + 1], &pv->data[idx], (pv->len - idx) * sizeof(void *)); + } + + pv->data[idx] = elem; + pv->len++; + return pv; +} + +/* Finds the index of the given element in the pVector. + * + * Parameters: + * pv - The vector to search. + * elem - The element to look for (pointer equality). + * + * Returns: + * The index of the element if found; otherwise, returns pv->len (i.e., not found). + * + * Notes: + * - This compares elements using raw pointer equality (`==`). + * - If pv is NULL or empty, returns 0 as a safe fallback. + * - Return value being equal to pv->len can be used to check for absence. */ +uint32_t pvFind(pVector *pv, void *elem) { + if (!pv || pv->len == 0) return 0; + + for (uint32_t i = 0; i < pv->len; i++) { + if (pv->data[i] == elem) { + return i; + } + } + return pv->len; +} + + +/* Removes the element at the specified index from the pVector. + * + * Shifts elements as necessary and optionally shrinks the vector if memory can be saved. + * If this is the last element in the vector, the vector is freed and NULL is returned. + * + * Arguments: + * pv - The pVector to remove from. + * idx - The index of the element to remove (must be < pv->len). + * + * Return: + * The updated pVector after removal. + * Returns NULL if the last element was removed and the vector was freed. */ +pVector *pvRemoveAt(pVector *pv, uint32_t idx) { + assert(pv && pv->len > 0); + assert(idx < pv->len); + if (pv->len == 1) { + /* Last element being removed; delete vector */ + zfree(pv); + return NULL; + } else if (idx < pv->len - 1UL) + memmove(&pv->data[idx], &pv->data[idx + 1], (pv->len - idx - 1) * sizeof(void *)); + pv->len--; + return pvShrinkToFit(pv); +} + +/* Removes the first matching element from the pVector.
+ * + * Performs a linear search for the given pointer and removes the first match. + * Updates the vector pointer in case a removal was done. + * + * Arguments: + * pv - A pointer to the pVector to remove from. + * elem - The element pointer to match and remove. + * removed - A pointer to a memory location to store the result of the removal. + * + * Return: + * the vector after the removal attempt */ +pVector *pvRemove(pVector *pv, void *elem, bool *removed) { + bool was_removed = false; + if (pv && pvLen(pv) > 0) { + uint32_t idx = pvFind(pv, elem); + if (idx < pvLen(pv)) { + pv = pvRemoveAt(pv, idx); + was_removed = true; + } + } + *removed = was_removed; + return pv; +} + +/* Retrieves the element at the specified index in the pVector. + * + * Arguments: + * pv - The pVector to retrieve from. + * idx - The index of the element to access. + * + * Return: + * A pointer to the element at the given index. + * Asserts that the vector is not NULL and that the index is within bounds. */ +void *pvGet(pVector *pv, uint32_t idx) { + assert(pv && idx < pvLen(pv)); + return pv->data[idx]; +} + +/* Frees the memory used by the pVector. + * + * Arguments: + * pv - The pVector to free. + * + * Return: + * None. */ +void pvFree(pVector *pv) { + if (pv) zfree(pv); +} + +/* Appends an element to the end of the given pVector. + * + * Parameters: + * pv - The vector to append to. + * elem - The element to append. + * + * Returns: + * A (possibly reallocated) pVector with the new element inserted at the end. + * + * Notes: + * Internally this uses pvInsertAt() with the current length of the vector, + * effectively appending the element. */ +pVector *pvPush(pVector *pv, void *elem) { + return pvInsertAt(pv, elem, pvLen(pv)); +} + +/* Removes and optionally returns the last element from the given pVector. + * + * Parameters: + * pv - The vector to remove the element from. + * pelem - Optional pointer to store the popped element. Can be NULL.
+ * + * Returns: + * A (possibly reallocated) pVector with the last element removed. + * + * Notes: + * Calling this function on an empty vector triggers an assertion. + * You can pass NULL for `pelem` if you don't need the removed value. */ +pVector *pvPop(pVector *pv, void **pelem) { + assert(pvLen(pv) > 0); + uint32_t last_idx = pvLen(pv) - 1; + if (pelem) *pelem = pvGet(pv, last_idx); + return pvRemoveAt(pv, last_idx); +} + +/* Set the element at given index inside the pVector. + * + * Parameters: + * pv - The vector containing the element to replace. + * idx - Index of the element. + * elem - pointer to the new element. + * + * Returns: + * None. + * + * Preconditions: + * - idx must be a valid index within the vector. */ +void pvSet(pVector *pv, uint32_t idx, void *elem) { + assert(idx < pvLen(pv)); + pv->data[idx] = elem; +} + +/* Swaps two elements at given indices inside the pVector. + * + * Parameters: + * pv - The vector containing the elements to swap. + * idx1 - Index of the first element. + * idx2 - Index of the second element. + * + * Returns: + * None. + * + * Preconditions: + * - idx1 and idx2 must both be valid indices within the vector. + * + * Notes: + * This is a simple in-place swap that uses direct pointer assignment. */ +void pvSwap(pVector *pv, uint32_t idx1, uint32_t idx2) { + assert(pv && pvLen(pv) > 0 && idx1 < pvLen(pv) && idx2 < pvLen(pv)); + void *temp = pv->data[idx1]; + pv->data[idx1] = pv->data[idx2]; + pv->data[idx2] = temp; +} + +/* Sort the elements of a pVector using a user-provided comparison function. + * + * This function performs an in-place sort of the elements in the given pVector. + * It uses the standard C library `qsort()` function under the hood and assumes + * the elements are pointers. The caller must supply a comparison function + * compatible with `qsort()`, which determines the ordering of the elements. + * + * Parameters: + * pv - A pointer to the pVector to sort.
+ * compare - A function pointer used to compare two elements. This function must + * match the signature: int compare(const void *a, const void *b) + * and return: + * < 0 if *a < *b + * > 0 if *a > *b + * 0 if *a == *b + * + * Returns: + * None. The pVector is sorted in place. + * + * Example: + * int cmp(const void *a, const void *b) { + * return strcmp(*(const char **)a, *(const char **)b); + * } + * + * pvSort(my_vector, cmp); */ +void pvSort(pVector *pv, int (*compare)(const void *a, const void *b)) { + if (pvLen(pv) <= 1) return; + qsort(pv->data, pv->len, sizeof(void *), compare); +} + +/************************************************************************************************************* + * pVector End + *************************************************************************************************************/ + +#define VOLATILESET_BUCKET_INTERVAL_MAX (1LL << 13LL) // 2^13 = 8192 milliseconds +#define VOLATILESET_BUCKET_INTERVAL_MIN (1LL << 4LL) // 2^4 = 16 milliseconds + +#define VOLATILESET_VECTOR_BUCKET_MAX_SIZE 127 + +#define VSET_NONE_BUCKET_PTR ((void *)(uintptr_t) - 1) +#define VSET_BUCKET_NONE -1 // matching the NULL case +#define VSET_BUCKET_SINGLE 0x1UL // xx1 (assuming sds) +#define VSET_BUCKET_VECTOR 0x2UL // 010 +#define VSET_BUCKET_HT 0x4UL // 100 +#define VSET_BUCKET_RAX 0x6UL // 110 + +#define VSET_TAG_MASK 0x7UL +#define VSET_PTR_MASK (~VSET_TAG_MASK) + +// Generic bucket type +typedef void vsetBucket; + +typedef struct vsetInternalIterator { + /* for rax bucket */ + raxIterator riter; + union { + /* for hashtable bucket */ + hashtableIterator hiter; + /* for vector bucket */ + uint32_t viter; + /* for single bucket */ + void *vsingle; + }; + /* the parent of the bucket we are currently iterating on */ + vsetBucket *parent_bucket; + /* the bucket we are currently iterating on */ + vsetBucket *bucket; + /* the pointer entry */ + void *entry; + /* In case of rax encoded set, this is the current iterated bucket timestamp */ + long 
long bucket_ts; + /* the state of the iteration */ + int iteration_state; +} vsetInternalIterator; + +/* The opaque vsetIterator is defined as a blob of bytes. */ +static_assert(sizeof(vsetIterator) >= sizeof(vsetInternalIterator), + "Opaque iterator size"); + +/* Conversion from user-facing opaque iterator type to internal struct. */ +static inline vsetInternalIterator *iteratorFromOpaque(vsetIterator *iterator) { + return (vsetInternalIterator *)(void *)iterator; +} + +/* Conversion from internal struct to user-facing opaque iterator type. */ +static inline vsetIterator *opaqueFromIterator(vsetInternalIterator *iterator) { + return (vsetIterator *)(void *)iterator; +} + + +/* Determine bucket type */ +static inline int vsetBucketType(vsetBucket *b) { + assert(b); + if (b == VSET_NONE_BUCKET_PTR) return VSET_BUCKET_NONE; + + uintptr_t bits = (uintptr_t)b; + if (bits & 0x1) + return VSET_BUCKET_SINGLE; + return bits & VSET_TAG_MASK; +} + +/* Access raw pointer */ +static inline void *vsetBucketRawPtr(vsetBucket *b) { + return (void *)((uintptr_t)b & VSET_PTR_MASK); +} + +// Accessors with type assertions +static inline pVector *vsetBucketVector(vsetBucket *b) { + assert(vsetBucketType(b) == VSET_BUCKET_VECTOR); + return (pVector *)vsetBucketRawPtr(b); +} + +static inline hashtable *vsetBucketHashtable(vsetBucket *b) { + assert(vsetBucketType(b) == VSET_BUCKET_HT); + return (hashtable *)vsetBucketRawPtr(b); +} + +static inline rax *vsetBucketRax(vsetBucket *b) { + assert(vsetBucketType(b) == VSET_BUCKET_RAX); + return (rax *)vsetBucketRawPtr(b); +} + +static inline void *vsetBucketSingle(vsetBucket *b) { + return b; +} + +static inline vsetBucket *vsetBucketFromRawPtr(void *ptr, int type) { + uintptr_t p = (uintptr_t)ptr; + return (vsetBucket *)(p | (type & VSET_TAG_MASK)); +} + +static inline vsetBucket *vsetBucketFromVector(pVector *vec) { + return vsetBucketFromRawPtr(vec, VSET_BUCKET_VECTOR); +} + +static inline vsetBucket *vsetBucketFromHashtable(hashtable
*ht) { + return vsetBucketFromRawPtr(ht, VSET_BUCKET_HT); +} + +static inline vsetBucket *vsetBucketFromSingle(void *ptr) { + return ptr; +} + +static inline vsetBucket *vsetBucketFromNone(void) { + return VSET_NONE_BUCKET_PTR; +} + +static inline vsetBucket *vsetBucketFromRax(rax *r) { + return vsetBucketFromRawPtr(r, VSET_BUCKET_RAX); +} + +/****************** Helper Functions *******************************************/ + +/* compare 2 expiration times */ +#define EXPIRE_COMPARE(exp1, exp2) (exp1 < exp2 ? -1 : exp1 == exp2 ? 0 \ + : 1) + +/* Since we do not have native posix support for qsort_r, we use this variable to help the vset + * compare function operate entry comparison given a dynamic getExpiry function is passed to + * different vset functions. */ +static __thread vsetGetExpiryFunc current_getter_func; + +static inline void vsetSetExpiryGetter(vsetGetExpiryFunc f) { + assert(current_getter_func == NULL); + current_getter_func = f; +} + +static inline void vsetUnsetExpiryGetter(void) { + current_getter_func = NULL; +} + +static inline vsetGetExpiryFunc vsetGetExpiryGetter(void) { + return current_getter_func; +} + +static int vsetCompareEntries(const void *a, const void *b) { + vsetGetExpiryFunc getExpiry = vsetGetExpiryGetter(); + long long ea = getExpiry(*(void **)a); + long long eb = getExpiry(*(void **)b); + return (ea > eb) - (ea < eb); +} + +/* used for popping from a rax bucket where we KNOW all entries are expired.
*/ +static long long vsetGetExpiryZero(const void *entry) { + UNUSED(entry); + return 0; +} + +static inline long long get_bucket_ts(long long expiry) { + return (expiry & ~(VOLATILESET_BUCKET_INTERVAL_MIN - 1LL)) + VOLATILESET_BUCKET_INTERVAL_MIN; +} + +static inline long long get_max_bucket_ts(long long expiry) { + return (expiry & ~(VOLATILESET_BUCKET_INTERVAL_MAX - 1LL)) + VOLATILESET_BUCKET_INTERVAL_MAX; +} + +static inline size_t encodeExpiryKey(long long expiry, unsigned char *key) { + long long be_ts = htonu64(expiry); + size_t size = sizeof(be_ts); + memcpy(key, &be_ts, size); + return size; +} + +static inline long long decodeExpiryKey(unsigned char *key) { + long long res; + memcpy(&res, key, sizeof(res)); + res = ntohu64(res); + return res; +} + +static inline size_t encodeNewExpiryBucketKey(unsigned char *key, long long expiry) { + long long bucket_ts = get_max_bucket_ts(expiry); + long long be_ts = htonu64(bucket_ts); + size_t size = sizeof(be_ts); + memcpy(key, &be_ts, size); + return size; +} + +/** + * Performs binary search to find the index where the element should be inserted. + * Returns the index where the element should be placed to keep the array sorted. 
+ * + * getExpiry Function that extracts the expiry timestamp from an entry + * bucket Vector-encoded bucket whose entries are sorted by expiry + * expiry Expiry timestamp of the entry to insert + * returns the insertion index (between 0 and the vector length) */ +static inline uint32_t findInsertPosition(vsetGetExpiryFunc getExpiry, vsetBucket *bucket, long long expiry) { + pVector *pv = vsetBucketVector(bucket); + uint32_t left = 0; + uint32_t right = pvLen(pv); + while (left < right) { + uint32_t mid = (left + right) / 2; + int res = EXPIRE_COMPARE(expiry, getExpiry(pv->data[mid])); + if (res <= 0) + right = mid; + else + left = mid + 1; + } + + return left; // Final position to insert the element +} + +/* findSplitPosition - Locate the first index where a bucket timestamp transition occurs + * + * This function finds a split point in a sorted pointer vector (`pVector`) of elements, + * where elements are grouped by their coarse-grained expiry time buckets. + * The goal is to identify the first pair of adjacent elements `e[i-1]` and `e[i]` + * such that: + * + * get_bucket_ts(getExpiry(e[i - 1])) < get_bucket_ts(getExpiry(e[i])) + * + * The vector is assumed to be sorted by the raw expiry timestamp (in ascending order). + * Bucket timestamps are derived using `get_bucket_ts()` on each element's expiry value. + * + * Arguments: + * - getExpiry: A function pointer that extracts an expiry timestamp from an element. + * - bucket: A pointer to a `vsetBucket` containing a sorted `pVector` of elements. + * - split_ts_out (optional): If provided, it will be set to the bucket timestamp of + * the last element in the lower (left) partition. + * + * The search begins from the middle of the vector and expands outwards in both + * directions, checking for the earliest position where a bucket transition occurs. + * This approach improves locality and helps produce balanced splits where possible.
+ * + * If a valid split is found, the function returns the index `i` at which the split + * should occur (i.e., elements `[0..i-1]` belong to one bucket, and `[i..len-1]` to another). + * If no split is found (i.e., all elements map to the same bucket), the function + * returns `pv->len`, indicating the entire vector belongs to one bucket. + * + * Return: + * - A split index in the range [1, pv->len), or + * - `pv->len` if no transition is found (no split possible). + * + * Example: + * -------- + * Raw expiry values: [1001, 1002, 1003, 2048, 2049] + * Bucket timestamps: [1024, 1024, 1024, 4096, 4096] + * + * This function returns index 3, as: + * get_bucket_ts(1003) == 1024 + * get_bucket_ts(2048) == 4096 → transition point + * + * So the vector can be split as: + * - Left partition: [1001, 1002, 1003] + * - Right partition: [2048, 2049] */ +static uint32_t findSplitPosition(vsetGetExpiryFunc getExpiry, vsetBucket *bucket, long long *split_ts_out) { + pVector *pv = vsetBucketVector(bucket); + if (!pv || pv->len < 2) return pv ? 
pv->len : 0; + + int mid = pv->len / 2; + int offset = 0; + + while (1) { + int left = mid - offset; + int right = mid + offset; + + // Check left side (as long as i > 0 to allow e[i-1]) + if (left > 0) { + long long ts1 = get_bucket_ts(getExpiry(pvGet(pv, left - 1))); + long long ts2 = get_bucket_ts(getExpiry(pvGet(pv, left))); + if (ts1 < ts2) { + if (split_ts_out) *split_ts_out = ts1; + return left; + } + } + + // Check right side (as long as i > 0 to allow e[i-1]) + if (right > 0 && right < pv->len) { + long long ts1 = get_bucket_ts(getExpiry(pvGet(pv, right - 1))); + long long ts2 = get_bucket_ts(getExpiry(pvGet(pv, right))); + if (ts1 < ts2) { + if (split_ts_out) *split_ts_out = ts1; + return right; + } + } + + offset++; + if (mid - offset < 1 && mid + offset >= pv->len) break; // searched entire vector + } + + return pv->len; // no split found +} + +#define VSET_BUCKET_KEY_LEN 8 + +/* hash_pointer - Computes a high-quality 64-bit hash from a pointer value. + * + * This function is designed to produce a well-distributed hash from a memory + * pointer, avoiding the common pitfall of poor entropy due to pointer alignment. + * It uses a platform-dependent mixing strategy based on MurmurHash3 finalization + * constants, ensuring good avalanche behavior and low collision rates. + * + * For 32-bit systems: + * The function uses a reduced MurmurHash3 32-bit finalizer: + * - XORs and right shifts to mix higher-order bits into lower ones. + * - Multiplies by large constants to further spread the bits. + * + * + * For 64-bit systems: + * The function uses MurmurHash3 64-bit finalizer constants: + * - These constants are chosen to maximize bit diffusion and avoid hash clustering. + * - This version benefits from the full 64-bit pointer space. + * + * Why this works: + * - Pointers tend to have low entropy in their lower bits (due to alignment). + * - A naive cast to integer leads to clustering and collisions in hash tables. 
+ * - This function performs fast and effective bit mixing to reduce collisions. + * - Ideal for use in pointer-keyed hash tables, interning systems, or caches. + * + * Note: + * - This is not a cryptographic hash. It is suitable for fast, internal use only. + * - Returns a 64-bit hash value, even on 32-bit systems. + * + * Returns: + * A 64-bit hash value derived from the input pointer. */ +static uint64_t hash_pointer(const void *ptr) { + uintptr_t x = (uintptr_t)ptr; +#if UINTPTR_MAX == 0xFFFFFFFF + // 32-bit platform + x ^= x >> 16; + x *= 0x85ebca6b; + x ^= x >> 13; + x *= 0xc2b2ae35; + x ^= x >> 16; + +#else + // 64-bit platform + x ^= x >> 33; + x *= 0xff51afd7ed558ccdULL; + x ^= x >> 33; + x *= 0xc4ceb9fe1a85ec53ULL; + x ^= x >> 33; +#endif + return (uint64_t)x; +} + +hashtableType pointerHashtableType = { + .hashFunction = hash_pointer, +}; + +static inline vsetBucket *findBucket(rax *expiry_buckets, long long expiry, unsigned char *key, size_t *key_len, long long *pbucket_ts, raxNode **node) { + *key_len = encodeExpiryKey(expiry, key); + vsetBucket *bucket = vsetBucketFromNone(); + /* First try to locate the first bucket whose key is larger than the specified key */ + raxIterator iter; + raxStart(&iter, expiry_buckets); + raxSeek(&iter, ">", (unsigned char *)key, *key_len); + + if (raxNext(&iter)) { + long long bucket_ts = decodeExpiryKey(iter.key); + /* If this bucket spans a window too far in the future, it is not a candidate. */ + if (get_max_bucket_ts(expiry) < bucket_ts) { + raxStop(&iter); + return vsetBucketFromNone(); + } + bucket = iter.data; + assert(iter.node->iskey); + if (node) *node = iter.node; + if (key) { + assert(iter.key_len == VSET_BUCKET_KEY_LEN); + memcpy(key, iter.key, iter.key_len); + } + if (pbucket_ts) *pbucket_ts = decodeExpiryKey(iter.key); + } + raxStop(&iter); + return bucket; +} + +/* Free all the vsetBucket memory.
+ * Since the bucket only holds references to entries, the entries themselves are NOT freed */ +static void freeVsetBucket(vsetBucket *bucket) { + switch (vsetBucketType(bucket)) { + case VSET_BUCKET_NONE: + case VSET_BUCKET_SINGLE: + // No internal memory to free + break; + case VSET_BUCKET_VECTOR: + pvFree(vsetBucketVector(bucket)); + break; + case VSET_BUCKET_HT: + hashtableRelease(vsetBucketHashtable(bucket)); + break; + case VSET_BUCKET_RAX: + raxFreeWithCallback(vsetBucketRax(bucket), freeVsetBucket); + break; + default: + panic("Unknown volatile set type in freeVsetBucket"); + } +} + +static bool splitBucketIfPossible(vsetBucket *parent, vsetGetExpiryFunc getExpiry, vsetBucket *bucket, long long bucket_ts, raxNode *node) { + /* We can only split vector encoded buckets */ + if (vsetBucketType(bucket) != VSET_BUCKET_VECTOR) { + return false; + } + size_t key_len; + long long target_bucket_ts = bucket_ts; + unsigned char key[VSET_BUCKET_KEY_LEN] = {0}; + vsetBucket *new_bucket = vsetBucketFromNone(); + pVector *pv = vsetBucketVector(bucket); + rax *expiry_buckets = vsetBucketRax(parent); + /* First, sort the vector; we cannot make a decision without it. + * We set the global expiry getter so we can sort according to the provided getExpiry function. + * TODO: After some thought I think it might be better to avoid sorting and attempt a quickselect: just allocate a new vector with the same size. + * Then scan once and choose a pivot which is the median or average bucket_ts. Then move all smaller entries to the new vector. Then shrink both vectors as needed.
*/ + vsetSetExpiryGetter(getExpiry); + pvSort(pv, vsetCompareEntries); + vsetUnsetExpiryGetter(); + + long long max_bucket_ts = get_bucket_ts(getExpiry(pv->data[pvLen(pv) - 1])); + long long min_bucket_ts = get_bucket_ts(getExpiry(pv->data[0])); + + if (max_bucket_ts < bucket_ts) { + /* In case the bucket is already spanning over a larger window than needed, just place the bucket in a new place */ + key_len = encodeExpiryKey(bucket_ts, key); + assert(raxRemove(expiry_buckets, key, key_len, (void **)&new_bucket)); + assert(new_bucket == bucket); + target_bucket_ts = max_bucket_ts; + + } else if (min_bucket_ts != max_bucket_ts) { + /* lets split the bucket. we know we can do it. */ + uint32_t split_index = findSplitPosition(getExpiry, bucket, &target_bucket_ts); + assert(target_bucket_ts < bucket_ts); + assert(split_index != pvLen(pv)); /* no way to split it ??? */ + pVector *new_bucket_vector = vsetBucketVector(bucket); + bucket = vsetBucketFromVector(pvSplit(&new_bucket_vector, split_index)); + new_bucket = vsetBucketFromVector(new_bucket_vector); + assert(pvLen(vsetBucketVector(new_bucket)) > 0); + assert(pvLen(vsetBucketVector(bucket)) > 0); + /* modify the current bucket data pointer */ + key_len = encodeExpiryKey(bucket_ts, key); + /* In order to avoid rax override, we directly change the node data */ + // alternative: raxInsert(*set, key, key_len, bucket, NULL); + raxSetData(node, bucket); + + } else { + /* We cannot split the bucket. just return false */ + return false; + } + /* We change the current bucket position OR we split it, either way we have a new bucket to insert. 
*/ + key_len = encodeExpiryKey(target_bucket_ts, key); + raxInsert(expiry_buckets, key, key_len, new_bucket, NULL); + return true; +} + +static inline vsetBucket *insertToBucket_NONE(vsetGetExpiryFunc getExpiry, vsetBucket *bucket, void *entry, long long expiry) { + UNUSED(getExpiry); + UNUSED(expiry); + UNUSED(bucket); + return vsetBucketFromSingle(entry); +} + +static inline vsetBucket *insertToBucket_SINGLE(vsetGetExpiryFunc getExpiry, vsetBucket *bucket, void *entry, long long expiry) { + /* Upgrade to vector */ + pVector *pv = pvNew(2); + void *curr_entry = vsetBucketSingle(bucket); + long long curr_expiry = getExpiry(curr_entry); + if (curr_expiry < expiry) { + pv = pvPush(pv, curr_entry); + pv = pvPush(pv, entry); + } else { + pv = pvPush(pv, entry); + pv = pvPush(pv, curr_entry); + } + bucket = vsetBucketFromVector(pv); + return bucket; +} + +static inline vsetBucket *insertToBucket_VECTOR(vsetGetExpiryFunc getExpiry, vsetBucket *bucket, void *entry, long long expiry, int pos) { + UNUSED(getExpiry); + UNUSED(expiry); + pVector *pv = vsetBucketVector(bucket); + /* Limit on the number of elements in a vector. */ + if (pvLen(pv) >= VOLATILESET_VECTOR_BUCKET_MAX_SIZE) { + // Upgrade to hashtable + hashtable *ht = hashtableCreate(&pointerHashtableType); + for (uint32_t i = 0; i < pvLen(pv); i++) { + hashtableAdd(ht, pvGet(pv, i)); + } + pvFree(pv); + /* Add the new entry as well */ + hashtableAdd(ht, entry); + + return vsetBucketFromHashtable(ht); + } else { + if (pos >= 0) + /* If we are explicitly given an insert position, place the entry there */ + return vsetBucketFromVector(pvInsertAt(pv, entry, pos)); + else + /* Otherwise it is better to just push the entry to the vector, with less chance of memmove and reallocation.
*/ + return vsetBucketFromVector(pvPush(pv, entry)); + } + return vsetBucketFromNone(); +} + +static inline vsetBucket *insertToBucket_HASHTABLE(vsetGetExpiryFunc getExpiry, vsetBucket *bucket, void *entry, long long expiry) { + UNUSED(getExpiry); + UNUSED(expiry); + + hashtable *ht = vsetBucketHashtable(bucket); + assert(hashtableAdd(ht, entry)); + return bucket; +} + +static inline vsetBucket *insertToBucket_RAX(vsetGetExpiryFunc getExpiry, vsetBucket *target, void *entry, long long expiry) { + unsigned char key[VSET_BUCKET_KEY_LEN] = {0}; + size_t key_len; + long long bucket_ts; + rax *expiry_buckets = vsetBucketRax(target); + raxNode *node; + vsetBucket *bucket = findBucket(expiry_buckets, expiry, key, &key_len, &bucket_ts, &node); + int type = vsetBucketType(bucket); + if (type == VSET_BUCKET_NONE) { + /* No bucket: create single-entry bucket */ + bucket = insertToBucket_NONE(getExpiry, bucket, entry, expiry); + assert(vsetBucketType(bucket) == VSET_BUCKET_SINGLE); + size_t key_size = encodeNewExpiryBucketKey(key, expiry); + raxInsert(expiry_buckets, key, key_size, bucket, NULL); + return target; + } else if (type == VSET_BUCKET_SINGLE) { + /* Upgrade to vector */ + bucket = insertToBucket_SINGLE(getExpiry, bucket, entry, expiry); + assert(vsetBucketType(bucket) == VSET_BUCKET_VECTOR); + /* In order to avoid rax override, we directly change the node data */ + // alternative: raxInsert(expiry_buckets, key, key_len, bucket, NULL); + raxSetData(node, bucket); + } else if (type == VSET_BUCKET_VECTOR) { + pVector *pv = vsetBucketVector(bucket); + if (pvLen(pv) == VOLATILESET_VECTOR_BUCKET_MAX_SIZE) { + /* Try to split the bucket. If not possible switch to hashtable encoding. */ + if (!splitBucketIfPossible(target, getExpiry, bucket, bucket_ts, node)) { + /* Can't split? 
insert into the vector anyway; it will just expand to a hashtable */ + bucket = insertToBucket_VECTOR(getExpiry, bucket, entry, expiry, -1); + assert(vsetBucketType(bucket) == VSET_BUCKET_HT); + /* In order to avoid rax override, we directly change the node data */ + // alternative: raxInsert(expiry_buckets, key, key_len, bucket, NULL); + raxSetData(node, bucket); + } else { + /* We split the bucket. Search again for a bucket to place the entry in, since there can be new options now. */ + return insertToBucket_RAX(getExpiry, target, entry, expiry); + } + } else { + vsetBucket *new_bucket = insertToBucket_VECTOR(getExpiry, bucket, entry, expiry, -1); + if (new_bucket != bucket) + /* In order to avoid rax override, we directly change the node data */ + // alternative: raxInsert(expiry_buckets, key, key_len, new_bucket, NULL); + raxSetData(node, new_bucket); + } + } else if (vsetBucketType(bucket) == VSET_BUCKET_HT) { + bucket = insertToBucket_HASHTABLE(getExpiry, bucket, entry, expiry); + } else { + panic("Unknown bucket type in insertToBucket_RAX"); + } + return target; +} + +static inline vsetBucket *removeFromBucket_SINGLE(vsetGetExpiryFunc getExpiry, vsetBucket *bucket, void *entry, long long expiry, bool *removed) { + UNUSED(getExpiry); + UNUSED(expiry); + + if (vsetBucketSingle(bucket) == entry) { + *removed = true; + return vsetBucketFromNone(); + } else { + *removed = false; + return bucket; + } +} + +static inline vsetBucket *removeFromBucket_VECTOR(vsetGetExpiryFunc getExpiry, vsetBucket *bucket, void *entry, long long expiry, bool *removed, bool pop) { + UNUSED(getExpiry); + UNUSED(expiry); + + vsetBucket *new_bucket = bucket; + bool success = false; + pVector *pv = vsetBucketVector(bucket); + /* In case we removed the entry */ + uint32_t vlen = pvLen(pv); + if (vlen <= 2) { + /* convert to single if needed */ + uint32_t idx = pvFind(pv, entry); + if (idx == vlen) { + success = false; + } else { + if (vlen == 1) + new_bucket = vsetBucketFromNone(); + else +
new_bucket = vsetBucketFromSingle(pvGet(pv, idx == 0 ? 1 : 0)); + success = true; + pvFree(pv); + } + } else { + /* pop is a more efficient way to remove an element from the vector. However it may + * change the order of the elements in the vector, so we should ask the user to indicate if to use pop or not. */ + if (pop) { + uint32_t idx = pvFind(pv, entry); + if (idx < vlen) { + void *popped_entry = NULL; + pvSwap(pv, idx, pvLen(pv) - 1); + success = true; + new_bucket = vsetBucketFromVector(pvPop(pv, &popped_entry)); + assert(popped_entry == entry); + } + } else { + pv = pvRemove(pv, entry, &success); + if (success) + new_bucket = vsetBucketFromVector(pv); + } + } + if (removed) *removed = success; + return new_bucket; +} + +static inline vsetBucket *removeFromBucket_HASHTABLE(vsetGetExpiryFunc getExpiry, vsetBucket *bucket, void *entry, long long expiry, bool *removed) { + UNUSED(getExpiry); + UNUSED(expiry); + + bool success = false; + vsetBucket *new_bucket = bucket; + hashtable *ht = vsetBucketHashtable(bucket); + if (hashtableDelete(ht, entry)) { + success = true; + assert(hashtableSize(ht) > 0); + if (hashtableSize(ht) == 1) { + // Downgrade to SINGLE + hashtableIterator hi; + hashtableInitIterator(&hi, ht, 0); + void *ptr; + hashtableNext(&hi, &ptr); + hashtableRelease(ht); + new_bucket = vsetBucketFromSingle(ptr); + } + } + if (removed) *removed = success; + return new_bucket; +} +static bool removeEntryFromRaxBucket(vsetBucket *rax_bucket, vsetGetExpiryFunc getExpiry, void *entry, vsetBucket *bucket, unsigned char *key, size_t key_len, vsetBucket **pbucket, raxNode *node) { + bool removed = false; + switch (vsetBucketType(bucket)) { + case VSET_BUCKET_SINGLE: + bucket = removeFromBucket_SINGLE(getExpiry, bucket, entry, 0, &removed); + if (removed) { + raxRemove(vsetBucketRax(rax_bucket), key, key_len, NULL); + if (pbucket) *pbucket = vsetBucketFromNone(); + } + break; + case VSET_BUCKET_VECTOR: { + vsetBucket *new_bucket = 
removeFromBucket_VECTOR(getExpiry, bucket, entry, 0, &removed, true); + if (new_bucket != bucket) { + if (vsetBucketType(new_bucket) == VSET_BUCKET_NONE) { + raxRemove(vsetBucketRax(rax_bucket), key, key_len, NULL); + if (pbucket) *pbucket = vsetBucketFromNone(); + } else { + /* In order to avoid rax override, we directly change the node data */ + // alternative: raxInsert(*set, key, key_len, new_bucket, NULL); + raxSetData(node, new_bucket); + if (pbucket) *pbucket = new_bucket; + } + } + break; + } + case VSET_BUCKET_HT: { + vsetBucket *new_bucket = removeFromBucket_HASHTABLE(getExpiry, bucket, entry, 0, &removed); + if (new_bucket != bucket) + /* In order to avoid rax override, we directly change the node data */ + // alternative: raxInsert(*set, key, key_len, bucket, NULL); + raxSetData(node, new_bucket); + + if (pbucket) *pbucket = new_bucket; + break; + } + default: + panic("Unknown bucket type for removeEntryFromRaxBucket"); + return false; + } + return removed; +} + +static inline bool shrinkRaxBucketIfPossible(vsetBucket **target, vsetGetExpiryFunc getExpiry) { + rax *expiry_buckets = vsetBucketRax(*target); + if (raxSize(expiry_buckets) == 1) { + raxIterator it; + raxStart(&it, expiry_buckets); + assert(raxSeek(&it, "^", NULL, 0)); + assert(raxNext(&it)); + vsetBucket *bucket = it.data; + int bucket_type = vsetBucketType(bucket); + raxStop(&it); + /* We will not convert hashtable to our only bucket since we will lose the ability to scan the items in a sorted way. + * We will also not shrink when we have a full vector, since it might immediately be repopulated. */ + if (bucket_type == VSET_BUCKET_SINGLE || + (bucket_type == VSET_BUCKET_VECTOR && pvLen(vsetBucketVector(bucket)) < VOLATILESET_VECTOR_BUCKET_MAX_SIZE)) { + if (bucket_type == VSET_BUCKET_VECTOR) { + pVector *pv = vsetBucketVector(bucket); + /* first lets sort the vector. 
we cannot set the target bucket as unsorted vector bucket */ + vsetSetExpiryGetter(getExpiry); + pvSort(pv, vsetCompareEntries); + vsetUnsetExpiryGetter(); + } + /* lets make our bucket to be the only left bucket */ + *target = bucket; + raxFree(expiry_buckets); + return true; + } + } + return false; +} + +static inline vsetBucket *removeFromBucket_RAX(vsetGetExpiryFunc getExpiry, vsetBucket *target, void *entry, long long expiry, bool *removed) { + unsigned char key[VSET_BUCKET_KEY_LEN] = {0}; + long long bucket_ts; + size_t key_len; + raxNode *node; + rax *expiry_buckets = vsetBucketRax(target); + vsetBucket *bucket = findBucket(expiry_buckets, expiry, key, &key_len, &bucket_ts, &node); + assert(bucket != VSET_NONE_BUCKET_PTR); + bool success = removeEntryFromRaxBucket(target, getExpiry, entry, bucket, key, key_len, NULL, node); + if (removed) *removed = success; + // shrink to single bucket if possible + shrinkRaxBucketIfPossible(&target, getExpiry); + return target; +} + +static inline size_t vsetBucketRemoveExpired_NONE(vsetBucket **bucket, vsetGetExpiryFunc getExpiry, vsetExpiryFunc expiryFunc, mstime_t now, size_t max_count, void *ctx) { + UNUSED(bucket); + UNUSED(getExpiry); + UNUSED(expiryFunc); + UNUSED(now); + UNUSED(max_count); + UNUSED(ctx); + return 0; +} + +static inline size_t vsetBucketRemoveExpired_SINGLE(vsetBucket **bucket, vsetGetExpiryFunc getExpiry, vsetExpiryFunc expiryFunc, mstime_t now, size_t max_count, void *ctx) { + void *entry = vsetBucketSingle(*bucket); + if (max_count && getExpiry(entry) <= now) { + freeVsetBucket(*bucket); + *bucket = vsetBucketFromNone(); + if (expiryFunc) expiryFunc(entry, ctx); + return 1; + } + return 0; +} + +static inline size_t vsetBucketRemoveExpired_VECTOR(vsetBucket **bucket, vsetGetExpiryFunc getExpiry, vsetExpiryFunc expiryFunc, mstime_t now, size_t max_count, void *ctx) { + pVector *pv = vsetBucketVector(*bucket); + uint32_t len = min(pvLen(pv), max_count); + uint32_t i = 0; + for (; i < len; i++) { + 
void *entry = pvGet(pv, i); + /* Stop as soon as we reach an entry which has not expired yet */ + if (getExpiry(entry) > now) + break; + if (expiryFunc) expiryFunc(entry, ctx); + } + pVector *new_pv = pvSplit(&pv, i); + *bucket = (new_pv ? vsetBucketFromVector(new_pv) : vsetBucketFromNone()); + pvFree(pv); + return i; +} + +static inline size_t vsetBucketRemoveExpired_HASHTABLE(vsetBucket **bucket, vsetGetExpiryFunc getExpiry, vsetExpiryFunc expiryFunc, mstime_t now, size_t max_count, void *ctx) { + UNUSED(getExpiry); + UNUSED(now); + hashtable *ht = vsetBucketHashtable(*bucket); + hashtableIterator it; + void *entry; + size_t count = 0; + hashtableInitIterator(&it, ht, HASHTABLE_ITER_SAFE); + while (count < max_count && hashtableNext(&it, &entry)) { + assert(hashtableDelete(ht, entry)); + if (expiryFunc) expiryFunc(entry, ctx); + count++; + } + hashtableResetIterator(&it); + + /* In case we completed scanning the hashtable or a single element is left, we can convert the hashtable. */ + size_t ht_size = hashtableSize(ht); + if (ht_size == 0) { + hashtableRelease(ht); + *bucket = vsetBucketFromNone(); + } else if (ht_size == 1) { + /* `entry` refers to the last deleted entry here, so fetch the one that is still stored. */ + void *remaining = NULL; + hashtableInitIterator(&it, ht, 0); + hashtableNext(&it, &remaining); + hashtableResetIterator(&it); + assert(remaining); + hashtableRelease(ht); + *bucket = vsetBucketFromSingle(remaining); + } + return count; +} + +static inline size_t vsetBucketRemoveExpired_RAX(vsetBucket **bucket, vsetGetExpiryFunc getExpiry, vsetExpiryFunc expiryFunc, mstime_t now, size_t max_count, void *ctx) { + UNUSED(getExpiry); + rax *buckets = vsetBucketRax(*bucket); + size_t count = 0; + while (count < max_count && raxSize(buckets) > 0) { + raxIterator it; + raxStart(&it, buckets); + raxSeek(&it, "^", NULL, 0); + assert(raxNext(&it)); + /* lets start again by going into the first bucket.
*/ + unsigned char key[VSET_BUCKET_KEY_LEN] = {0}; + vsetBucket *time_bucket = it.data; + int time_bucket_type = vsetBucketType(time_bucket); + long long time_bucket_ts = decodeExpiryKey(it.key); + memcpy(key, it.key, it.key_len); + size_t key_len = it.key_len; + raxNode *node = it.node; + raxStop(&it); + if (time_bucket_ts > now) + break; + switch (time_bucket_type) { + case VSET_BUCKET_SINGLE: + count += vsetBucketRemoveExpired_SINGLE(&time_bucket, vsetGetExpiryZero, expiryFunc, now, max_count - count, ctx); + break; + case VSET_BUCKET_VECTOR: + count += vsetBucketRemoveExpired_VECTOR(&time_bucket, vsetGetExpiryZero, expiryFunc, now, max_count - count, ctx); + break; + case VSET_BUCKET_HT: + count += vsetBucketRemoveExpired_HASHTABLE(&time_bucket, vsetGetExpiryZero, expiryFunc, now, max_count - count, ctx); + break; + default: + panic("Cannot expire entries from bucket which is not single, vector or hashtable"); + } + if (time_bucket == VSET_NONE_BUCKET_PTR) { + /* in case the bucket is freed, we can just remove it and continue to the next bucket. */ + raxRemove(buckets, key, key_len, NULL); + } else { + /* in case the bucket still exists, it must be since we reached the max_count or stopped due to expiry function. + * So we save the new bucket to the rax and bail. 
*/ + raxSetData(node, time_bucket); + break; + } + } + /* If all buckets were removed, free the rax and mark the set as empty. Otherwise try to shrink it. */ + if (raxSize(buckets) == 0) { + raxFree(buckets); + *bucket = vsetBucketFromNone(); + } else { + shrinkRaxBucketIfPossible(bucket, getExpiry); + } + return count; +} + +static int vsetBucketNext_NONE(vsetInternalIterator *it, void **entryptr) { + UNUSED(it); + UNUSED(entryptr); + return 0; +} + +static inline int vsetBucketNext_SINGLE(vsetInternalIterator *it, void **entryptr) { + bool init_bucket_scan = (it->iteration_state == VSET_BUCKET_NONE); + if (init_bucket_scan) { + it->iteration_state = VSET_BUCKET_SINGLE; + it->entry = vsetBucketSingle(it->bucket); + if (entryptr) *entryptr = it->entry; + return 1; + } + return 0; +} + +static inline int vsetBucketNext_VECTOR(vsetInternalIterator *it, void **entryptr) { + bool init_bucket_scan = (it->iteration_state == VSET_BUCKET_NONE); + pVector *pv = vsetBucketVector(it->bucket); + if (init_bucket_scan) { + it->iteration_state = VSET_BUCKET_VECTOR; + it->viter = 0; + } else { + it->viter++; + } + if (it->viter < pvLen(pv)) { + it->entry = pvGet(pv, it->viter); + } else { + return 0; + } + if (entryptr) *entryptr = it->entry; + return 1; +} + +static inline int vsetBucketNext_HASHTABLE(vsetInternalIterator *it, void **entryptr) { + bool init_bucket_scan = (it->iteration_state == VSET_BUCKET_NONE); + hashtable *ht = vsetBucketHashtable(it->bucket); + if (init_bucket_scan) { + it->iteration_state = VSET_BUCKET_HT; + hashtableInitIterator(&it->hiter, ht, 0); + } + if (!hashtableNext(&it->hiter, &it->entry)) { + hashtableResetIterator(&it->hiter); + return 0; + } + if (entryptr) *entryptr = it->entry; + return 1; +} + +static inline int vsetBucketNext_RAX(vsetInternalIterator *it, void **entryptr) { + bool init_bucket_scan = (it->iteration_state == VSET_BUCKET_NONE); + if (init_bucket_scan) { + /* set myself as the parent bucket */ + it->parent_bucket = it->bucket; + raxStart(&it->riter, vsetBucketRax(it->bucket)); +
raxSeek(&it->riter, "^", NULL, 0); + } + if (raxNext(&it->riter)) { + /* Descend into the next time bucket and reset the per-bucket scan state. */ + it->bucket_ts = decodeExpiryKey(it->riter.key); + it->bucket = it->riter.data; + it->iteration_state = VSET_BUCKET_NONE; + return vsetNext(opaqueFromIterator(it), entryptr); + } else { + /* We currently do not support nested RAX buckets */ + it->parent_bucket = vsetBucketFromNone(); + return 0; + } + return 1; +} + +static inline size_t vsetBucketMemUsage_NONE(vsetBucket *bucket) { + UNUSED(bucket); + return 0; +} + +static inline size_t vsetBucketMemUsage_SINGLE(vsetBucket *bucket) { + UNUSED(bucket); + return 0; +} + +static inline size_t vsetBucketMemUsage_VECTOR(vsetBucket *bucket) { + pVector *pv = vsetBucketVector(bucket); + assert(pv); + return pv->alloc; +} + +static inline size_t vsetBucketMemUsage_HASHTABLE(vsetBucket *bucket) { + hashtable *ht = vsetBucketHashtable(bucket); + return hashtableMemUsage(ht); +} + +static inline size_t vsetBucketMemUsage_RAX(vsetBucket *bucket) { + rax *r = vsetBucketRax(bucket); + size_t total_mem = raxAllocSize(r); + raxIterator it; + raxStart(&it, r); + assert(raxSeek(&it, "^", NULL, 0)); + while (raxNext(&it)) { + switch (vsetBucketType(it.data)) { + case VSET_BUCKET_NONE: + total_mem += vsetBucketMemUsage_NONE(it.data); + break; + case VSET_BUCKET_SINGLE: + total_mem += vsetBucketMemUsage_SINGLE(it.data); + break; + case VSET_BUCKET_VECTOR: + total_mem += vsetBucketMemUsage_VECTOR(it.data); + break; + case VSET_BUCKET_HT: + total_mem += vsetBucketMemUsage_HASHTABLE(it.data); + break; + default: + panic("Unknown bucket type encountered in vsetBucketMemUsage_RAX"); + } + } + raxStop(&it); + return total_mem; +} + +/* Adds an entry to a volatile set (vset) based on its expiration time. + * + * The volatile set maintains buckets of entries grouped by time windows.
Each + * entry is inserted into an appropriate bucket based on its expiry timestamp. + * Buckets are memory-efficient and use dynamic representations that evolve as + * the number of entries grows: + * + * - VSET_BUCKET_NONE: + * Indicates the set is empty. A new SINGLE bucket is created to hold the entry. + * + * - VSET_BUCKET_SINGLE: + * Holds a single entry directly. Upon inserting a second entry, the bucket + * is promoted to a VECTOR, preserving the sorted order. + * + * - VSET_BUCKET_VECTOR: + * Stores entries in a compact, sorted vector. The maximum size is 127 entries. + * If inserting a new entry exceeds the limit: + * - If all entries share the same bucket timestamp (same high-resolution time window), + * the entire vector is moved into a RAX bucket as a single node. + * - Otherwise, each vector entry is redistributed into the new RAX structure. + * + * - VSET_BUCKET_RAX: + * A radix tree (RAX) used for scalable management of multiple time-based buckets. + * Entries are inserted by computing their bucket key based on their expiration timestamp. + * + * The function uses the entry’s expiration time (provided via the getExpiry function) + * to determine the correct bucket. It promotes bucket types as needed to maintain + * sorted and efficient storage. + * + * In all cases, if the insertion causes a structural change (e.g., bucket promotion), + * the pointer to the root of the bucket tree is updated via the `set` pointer. + * + * This function always returns true, as insertion is guaranteed to succeed + * (barring internal memory allocation failure, which is outside its concern). + * + * Notes: + * - Buckets are upgraded in-place based on size and time span distribution. + * - Vector buckets allow binary search insertion to maintain order. + * - Tagged pointers are used to determine bucket types efficiently. + * - It is assumed that all entries have odd-valued pointers (LSB set). 
+ * - Key encoding in RAX is based on the maximum expiration timestamp + * that falls within a fixed window granularity. + * + * Example: + * vset *myset = NULL; + * vsetAddEntry(&myset, extract_expiry, my_object); + * + * // Internally, my_object is placed into the appropriate bucket. */ +bool vsetAddEntry(vset *set, vsetGetExpiryFunc getExpiry, void *entry) { + long long expiry = getExpiry(entry); + vsetBucket *expiry_buckets = *set; + assert(expiry_buckets); + int bucket_type = vsetBucketType(expiry_buckets); + switch (bucket_type) { + case VSET_BUCKET_NONE: + expiry_buckets = insertToBucket_NONE(getExpiry, expiry_buckets, entry, expiry); + break; + case VSET_BUCKET_SINGLE: + expiry_buckets = insertToBucket_SINGLE(getExpiry, expiry_buckets, entry, expiry); + break; + case VSET_BUCKET_VECTOR: { + pVector *vec = vsetBucketVector(expiry_buckets); + uint32_t len = pvLen(vec); + /* in case the vector is full, we need to turn into RAX */ + if (len == VOLATILESET_VECTOR_BUCKET_MAX_SIZE) { + rax *r = raxNew(); + long long min_expiry = getExpiry(pvGet(vec, 0)); + long long max_expiry = getExpiry(pvGet(vec, len - 1)); + if (get_max_bucket_ts(min_expiry) == get_max_bucket_ts(max_expiry)) { + /* In case we can just insert the bucket, no need to iterate and insert it's elements. we can just push the bucket as a whole. 
*/ + unsigned char key[VSET_BUCKET_KEY_LEN] = {0}; + size_t key_len = encodeNewExpiryBucketKey(key, max_expiry); + raxInsert(r, key, key_len, expiry_buckets, NULL); + expiry_buckets = vsetBucketFromRax(r); + expiry_buckets = insertToBucket_RAX(getExpiry, expiry_buckets, entry, expiry); + } else { + /* We need to migrate entries to the new set of buckets since we do not know all entries are in the same bucket */ + expiry_buckets = vsetBucketFromRax(r); + for (uint32_t i = 0; i < len; i++) { + void *moved_entry = pvGet(vec, i); + expiry_buckets = insertToBucket_RAX(getExpiry, expiry_buckets, moved_entry, getExpiry(moved_entry)); + } + /* free the vector */ + pvFree(vec); + /* now insert the new entry to the buckets */ + expiry_buckets = insertToBucket_RAX(getExpiry, expiry_buckets, entry, expiry); + } + } else { + uint32_t pos = findInsertPosition(getExpiry, expiry_buckets, expiry); + expiry_buckets = insertToBucket_VECTOR(getExpiry, expiry_buckets, entry, expiry, pos); + } + break; + } + case VSET_BUCKET_RAX: + expiry_buckets = insertToBucket_RAX(getExpiry, expiry_buckets, entry, expiry); + break; + default: + panic("Cannot insert to bucket which is not single, vector or rax"); + } + /* update the set */ + *set = expiry_buckets; + return true; +} + +static inline bool vsetRemoveEntryWithExpiry(vset *set, vsetGetExpiryFunc getExpiry, void *entry, long long expiry) { + bool removed; + vsetBucket *bucket = *set; + assert(bucket); + int bucket_type = vsetBucketType(bucket); + switch (bucket_type) { + case VSET_BUCKET_NONE: + /* We cannot remove from empty set */ + return false; + case VSET_BUCKET_SINGLE: + bucket = removeFromBucket_SINGLE(getExpiry, bucket, entry, expiry, &removed); + break; + case VSET_BUCKET_VECTOR: + bucket = removeFromBucket_VECTOR(getExpiry, bucket, entry, expiry, &removed, false); + break; + case VSET_BUCKET_HT: + bucket = removeFromBucket_HASHTABLE(getExpiry, bucket, entry, expiry, &removed); + break; + case VSET_BUCKET_RAX: + bucket = 
removeFromBucket_RAX(getExpiry, bucket, entry, expiry, &removed); + break; + default: + panic("Cannot remove from bucket which is not single, vector, hashtable or rax"); + } + *set = bucket; + return removed; +} + +/* Removes an entry from the volatile set (vset), based on its expiration time. + * + * The volatile set organizes entries into time-based buckets of varying types: + * SINGLE, VECTOR, or RAX. The bucket type determines how entries are stored + * and managed internally. This function will locate and remove the entry + * from its appropriate bucket. + * + * The removal process works as follows: + * + * 1. The expiration timestamp of the entry is used to compute which bucket + * (based on its end time) the entry should reside in. + * + * 2. Depending on the current top-level bucket type of the vset, the function + * dispatches to the appropriate removal handler: + * + * - VSET_BUCKET_SINGLE: + * If the stored entry matches, the bucket is set to NONE. + * + * - VSET_BUCKET_VECTOR: + * Performs a binary search to find and remove the entry from the vector. + * If the resulting vector size drops to 1, it is converted to a SINGLE bucket. + * If the vector becomes empty, it is removed entirely (set to NONE). + * + * - VSET_BUCKET_RAX: + * The function decodes the appropriate bucket key (based on the expiration + * time), looks up the RAX node, and dispatches removal to the sub-bucket. + * If a sub-bucket becomes empty or has only one entry left, its bucket + * type may be downgraded (e.g., to SINGLE or removed). + * + * 3. If the removal results in a structural change (e.g., shrinking a bucket), + * the bucket type may be changed, and the root pointer is updated accordingly. + * + * 4. If the entry is not found in the expected bucket, no action is taken. + * + * Notes: + * - Buckets self-adjust during removal for memory efficiency. + * - The vector bucket keeps entries sorted for fast search/removal. 
+ * - RAX-based sets support a large number of buckets and scale well + * with many time windows. + * - Entries are assumed to have pointer identity (odd-valued pointers). + * - Correct expiration timestamp must be provided for accurate removal. + * + * Return value: + * Returns true if the entry was found and removed successfully. + * Returns false if the entry was not found. + * + * Example usage: + * vsetRemoveEntry(myset, extract_expiry, my_object); + * + * // my_object is removed from the appropriate bucket in myset BUT is not freed. */ +bool vsetRemoveEntry(vset *set, vsetGetExpiryFunc getExpiry, void *entry) { + return vsetRemoveEntryWithExpiry(set, getExpiry, entry, getExpiry(entry)); +} + +static inline vsetBucket *vsetBucketUpdateEntry_SINGLE(vsetBucket *bucket, vsetGetExpiryFunc getExpiry, void *old_entry, void *new_entry, long long old_expiry, long long new_expiry) { + UNUSED(getExpiry); + UNUSED(old_expiry); + UNUSED(new_expiry); + + if (vsetBucketSingle(bucket) == old_entry) { + return vsetBucketFromSingle(new_entry); + } + return vsetBucketFromNone(); +} + +static inline vsetBucket *vsetBucketUpdateEntry_VECTOR(vsetBucket *bucket, vsetGetExpiryFunc getExpiry, void *old_entry, void *new_entry, long long old_expiry, long long new_expiry) { + UNUSED(getExpiry); + UNUSED(old_expiry); + UNUSED(new_expiry); + + pVector *pv = vsetBucketVector(bucket); + uint32_t idx = pvFind(pv, old_entry); + /* in case we did not locate the entry, just return NONE bucket */ + if (idx == pvLen(pv)) + return vsetBucketFromNone(); + pvSet(pv, idx, new_entry); + return bucket; +} + +static inline vsetBucket *vsetBucketUpdateEntry_HASHTABLE(vsetBucket *bucket, vsetGetExpiryFunc getExpiry, void *old_entry, void *new_entry, long long old_expiry, long long new_expiry) { + UNUSED(getExpiry); + UNUSED(old_expiry); + UNUSED(new_expiry); + + /* In this case no need to change anything. 
*/ + if (old_entry == new_entry) + return bucket; + + hashtablePosition pos; + hashtable *ht = vsetBucketHashtable(bucket); + /* We do a two stage pop in order to avoid rehashing. */ + void **ref = hashtableTwoPhasePopFindRef(ht, old_entry, &pos); + if (!ref) { + /* In case no entry found, the rehashing did not pause, so it is safe to return. */ + return vsetBucketFromNone(); + } else { + /* We know for sure the two entries are not the same, so it is safe to add the new and remove the old */ + assert(hashtableAdd(ht, new_entry)); + hashtableTwoPhasePopDelete(ht, &pos); + } + return bucket; +} + +static inline vsetBucket *vsetBucketUpdateEntry_RAX(vsetBucket *target, vsetGetExpiryFunc getExpiry, void *old_entry, void *new_entry, long long old_expiry, long long new_expiry) { + unsigned char key[VSET_BUCKET_KEY_LEN] = {0}; + size_t key_len; + long long bucket_ts; + rax *expiry_buckets = vsetBucketRax(target); + raxNode *node; + /* In case new and old are to be updated in the same bucket - just update the bucket. */ + bool update_bucket = (get_bucket_ts(old_expiry) == get_bucket_ts(new_expiry)); + vsetBucket *bucket = findBucket(expiry_buckets, old_expiry, key, &key_len, &bucket_ts, &node); + + if (!update_bucket) { + /* if the old and new entries are in different buckets, remove the old entry and add the new one. */ + if (removeEntryFromRaxBucket(target, getExpiry, old_entry, bucket, key, key_len, NULL, node)) + target = insertToBucket_RAX(getExpiry, target, new_entry, new_expiry); + else + return vsetBucketFromNone(); + } else { + /* Just update the current bucket */ + switch (vsetBucketType(bucket)) { + case VSET_BUCKET_NONE: + /* No bucket means there is no such old entry. 
return NONE */ + return vsetBucketFromNone(); + case VSET_BUCKET_SINGLE: + bucket = vsetBucketUpdateEntry_SINGLE(bucket, getExpiry, old_entry, new_entry, old_expiry, new_expiry); + break; + case VSET_BUCKET_VECTOR: + bucket = vsetBucketUpdateEntry_VECTOR(bucket, getExpiry, old_entry, new_entry, old_expiry, new_expiry); + break; + case VSET_BUCKET_HT: + bucket = vsetBucketUpdateEntry_HASHTABLE(bucket, getExpiry, old_entry, new_entry, old_expiry, new_expiry); + break; + default: + panic("Unknown bucket type to update entry"); + } + if (bucket) + raxSetData(node, bucket); + else + return vsetBucketFromNone(); + } + return target; +} + +/** + * Updates an existing entry in the volatile set (vset), optionally replacing it + * with a new entry and expiration time. + * + * This function provides a unified interface for removing an old entry and + * adding a new one. It supports three main cases: + * + * 1. Entry identity or expiry time didn't change: + * If the `old_entry` and `new_entry` are the same, and their expiration + * timestamps are also equal, the function returns early with no action taken. + * + * 2. Removal of the old entry: + * If `old_entry` is provided (i.e., not NULL) and its old expiration time + * is valid (`old_expiry != -1`), the function will remove it from the set. + * + * Note: Since the object might already be deallocated (or changed), the + * expiration time is passed explicitly as an argument, rather than + * relying on `getExpiry(old_entry)` which might not be safe to call. + * + * 3. Insertion of the new entry: + * If `new_entry` is provided (i.e., not NULL) and its new expiration time + * is valid (`new_expiry != -1`), the function will insert it into the set. + * + * The function assumes both `vsetRemoveEntryWithExpiry()` and + * `vsetAddEntry()` succeed. It uses assertions to enforce this at runtime, + * assuming this function is used in trusted code paths. + * + * Notes: + * - The update is not atomic. 
If the removal fails (assertion fails), + * insertion of the new entry does not occur. + * - If the new entry is the same as the old one, but the expiry changed, + * the entry is effectively reinserted in the correct bucket. + * - This is useful for renewal or replacement logic where entries may + * need to change time buckets due to updated TTLs or key mutation. + * + * Return value: + * Always returns true on success. + * In case of assertion failures, the program will abort. + * + * Example usage: + * vsetUpdateEntry(myset, getExpiry, old_ptr, new_ptr, old_ts, new_ts); + */ +bool vsetUpdateEntry(vset *set, vsetGetExpiryFunc getExpiry, void *old_entry, void *new_entry, long long old_expiry, long long new_expiry) { + assert(*set); + /* Nothing to do */ + if (old_entry == new_entry && old_expiry == new_expiry) + return true; + vsetBucket *updated = vsetBucketFromNone(); + /* case 1 - both entries were tracked. update the bucket */ + if (old_entry && old_expiry != -1 && new_entry && new_expiry != -1) { + switch (vsetBucketType(*set)) { + case VSET_BUCKET_NONE: + return false; + case VSET_BUCKET_SINGLE: + updated = vsetBucketUpdateEntry_SINGLE(*set, getExpiry, old_entry, new_entry, old_expiry, new_expiry); + break; + case VSET_BUCKET_VECTOR: + if (old_expiry != new_expiry) { + /* NOTE! - in this specific case we might have changed the vector order - need to sort it again (NLogN) */ + /* or remove it from the vector and re-add it (N+LogN). the later also looks cleaner... 
*/ + if (!vsetRemoveEntryWithExpiry(set, getExpiry, old_entry, old_expiry)) + return false; + return vsetAddEntry(set, getExpiry, new_entry); + } + /* We are just updating the entry ref, so sorting is not impacted */ + updated = vsetBucketUpdateEntry_VECTOR(*set, getExpiry, old_entry, new_entry, old_expiry, new_expiry); + break; + + case VSET_BUCKET_RAX: + updated = vsetBucketUpdateEntry_RAX(*set, getExpiry, old_entry, new_entry, old_expiry, new_expiry); + } + if (updated == VSET_NONE_BUCKET_PTR) + return false; + *set = updated; + return true; + } + /* case 2 - old entry was not tracked. just add the new entry */ + else if ((!old_entry || old_expiry == -1) && new_entry && new_expiry != -1) + return vsetAddEntry(set, getExpiry, new_entry); + /* case 3 - old entry was tracked. new entry is not. just remove the old entry */ + else if ((!new_entry || new_expiry == -1) && old_entry && old_expiry != -1) + /* We cannot take the expiration time from the removed entry, since it might not be allocated anymore. + * For this reason we ask the API user to provide us the removed entry expiration time. */ + return vsetRemoveEntryWithExpiry(set, getExpiry, old_entry, old_expiry); + else + return false; + + return false; +} + +/* vsetPopExpired - Remove expired entries from a volatile set up to a maximum count. + * + * Parameters: + * set: Pointer to the volatile set (vset *) to operate on. + * getExpiry: Function to retrieve the expiration time from an entry. + * expiryFunc: Function to call on each expired entry (e.g., to free or notify). + * now: Current time in milliseconds used to compare against expiry times. + * max_count: Maximum number of expired entries to remove. + * ctx: Opaque context pointer passed through to the expiryFunc callback. + * + * This function delegates expiration popping to a type-specific handler based on the + * internal bucket type of the set. 
It supports various bucket encodings: + * - NONE + * - SINGLE + * - VECTOR + * - RAX (radix tree) + * - HT (hashtable) + * + * Returns the number of expired entries successfully removed (and passed to expiryFunc). + * + * Panics if the bucket type is unknown or unsupported. + * + * Return: + * Number of expired entries removed (size_t). */ +size_t vsetRemoveExpired(vset *set, vsetGetExpiryFunc getExpiry, vsetExpiryFunc expiryFunc, mstime_t now, size_t max_count, void *ctx) { + vsetBucket *bucket = *set; + int bucket_type = vsetBucketType(bucket); + switch (bucket_type) { + case VSET_BUCKET_NONE: + return vsetBucketRemoveExpired_NONE(set, getExpiry, expiryFunc, now, max_count, ctx); + break; + case VSET_BUCKET_RAX: + return vsetBucketRemoveExpired_RAX(set, getExpiry, expiryFunc, now, max_count, ctx); + break; + case VSET_BUCKET_SINGLE: + return vsetBucketRemoveExpired_SINGLE(set, getExpiry, expiryFunc, now, max_count, ctx); + break; + case VSET_BUCKET_VECTOR: + return vsetBucketRemoveExpired_VECTOR(set, getExpiry, expiryFunc, now, max_count, ctx); + break; + case VSET_BUCKET_HT: + return vsetBucketRemoveExpired_HASHTABLE(set, getExpiry, expiryFunc, now, max_count, ctx); + break; + default: + panic("Unknown volatile set bucket type in vsetRemoveExpired"); + } + return 0; +} + +/* vsetEstimatedEarliestExpiry - Estimate the earliest expiration time in a volatile set. + * + * Parameters: + * set: Pointer to the volatile set (vset *) to inspect. + * getExpiry: Callback function used to extract the expiration time from a set entry. + * + * Returns the earliest expiration time based on the structure of the volatile set. + * This is an *approximate* value: + * - For bucketed types (e.g., radix tree, vector), it returns the expiry of the first bucket or entry, + * which may not be the actual earliest expiring item. + * - For single-entry sets, it returns the expiry of the sole item. + * - For VSET_BUCKET_NONE, it returns -1 to indicate there is no data.
+ * + * Supported bucket types: + * - VSET_BUCKET_SINGLE + * - VSET_BUCKET_VECTOR + * - VSET_BUCKET_RAX + * + * Panics if called with an unsupported bucket type. + * + * Return: + * Estimated earliest expiry time in milliseconds, or -1 if the set is empty. */ +long long vsetEstimatedEarliestExpiry(vset *set, vsetGetExpiryFunc getExpiry) { + int set_type = vsetBucketType(*set); + void *entry = NULL; + long long expiry; + switch (set_type) { + case VSET_BUCKET_NONE: + return -1; + break; + case VSET_BUCKET_RAX: { + rax *r = vsetBucketRax(*set); + raxIterator it; + raxStart(&it, r); + assert(raxSeek(&it, "^", NULL, 0)); + assert(raxNext(&it)); /* there MUST be at least one bucket! */ + expiry = decodeExpiryKey(it.key); + raxStop(&it); + break; + } + case VSET_BUCKET_SINGLE: { + entry = vsetBucketSingle(*set); + expiry = getExpiry(entry); + break; + } + case VSET_BUCKET_VECTOR: { + entry = pvGet(vsetBucketVector(*set), 0); + expiry = getExpiry(entry); + break; + } + default: + panic("Unsupported vset encoding type. Only supported types are single, vector or rax"); + } + return expiry; +} + +/* Advances the volatile set iterator to the next entry. + * + * This function handles iteration over various bucket types in the set. It attempts + * to return the next valid entry, updating the iterator state accordingly. + * + * If the current bucket is exhausted, the iterator automatically switches back to + * the parent bucket (typically used when iterating nested structures, such as RAX buckets). + * + * Parameters: + * - iter: Pointer to an initialized vsetIterator. + * - entryptr: Output pointer to receive the next entry. + * + * Returns: + * - true if a next entry is found. + * - false if iteration is complete.
*/ +bool vsetNext(vsetIterator *iter, void **entryptr) { + vsetInternalIterator *it = iteratorFromOpaque(iter); + vsetBucket *bucket = it->bucket; + int bucket_type = vsetBucketType(bucket); + int ret = 0; + switch (bucket_type) { + case VSET_BUCKET_NONE: + return vsetBucketNext_NONE(it, entryptr); + break; + case VSET_BUCKET_RAX: + return vsetBucketNext_RAX(it, entryptr); + break; + case VSET_BUCKET_SINGLE: + ret = vsetBucketNext_SINGLE(it, entryptr); + break; + case VSET_BUCKET_VECTOR: + ret = vsetBucketNext_VECTOR(it, entryptr); + break; + case VSET_BUCKET_HT: + ret = vsetBucketNext_HASHTABLE(it, entryptr); + break; + default: + panic("Unknown volatile set bucket type in vsetNext"); + } + if (ret == 0) { + /* continue iterating the parent bucket */ + it->iteration_state = vsetBucketType(it->parent_bucket); + it->bucket = it->parent_bucket; + return vsetNext(opaqueFromIterator(it), entryptr); + } + return ret == 1; +} + +size_t vsetMemUsage(vset *set) { + int bucket_type = vsetBucketType(*set); + switch (bucket_type) { + case VSET_BUCKET_NONE: + return vsetBucketMemUsage_NONE(*set); + case VSET_BUCKET_SINGLE: + return vsetBucketMemUsage_SINGLE(*set); + case VSET_BUCKET_VECTOR: + return vsetBucketMemUsage_VECTOR(*set); + case VSET_BUCKET_HT: + panic("Unsupported hashtable bucket type for vset"); + case VSET_BUCKET_RAX: + return vsetBucketMemUsage_RAX(*set); + default: + panic("Unknown set type encountered in vsetMemUsage"); + } + return 0; +} + +/* Initializes a volatile set iterator. + * + * This function prepares the iterator for scanning a volatile set from the beginning. + * It sets the internal state, pointing to the main set bucket, and uses VSET_BUCKET_NONE + * as an initial placeholder to transition correctly into the actual bucket logic. + * + * Parameters: + * - set: Pointer to the volatile set to iterate. + * - it: Pointer to a vsetInternalIterator structure to initialize. 
*/ +void vsetInitIterator(vset *set, vsetIterator *iter) { + vsetInternalIterator *it = iteratorFromOpaque(iter); + it->iteration_state = VSET_BUCKET_NONE; /*lets start by going to the first bucket. */ + it->bucket = *set; + it->bucket_ts = -1; + it->parent_bucket = vsetBucketFromNone(); +} + +/* Finalizes and cleans up an active volatile set iterator. + * + * Some internal iterators (e.g., RAX, hashtable) allocate temporary state. + * This function ensures proper cleanup of those structures when the iteration is done. + * + * Parameters: + * - it: Pointer to the vsetInternalIterator that was previously initialized with vsetInitIterator(). */ +void vsetResetIterator(vsetIterator *iter) { + vsetInternalIterator *it = iteratorFromOpaque(iter); + int bucket_type = vsetBucketType(it->bucket); + int parent_bucket_type = vsetBucketType(it->parent_bucket); + if (parent_bucket_type == VSET_BUCKET_RAX) + raxStop(&it->riter); + if (bucket_type == VSET_BUCKET_HT) + hashtableResetIterator(&it->hiter); +} + +/* Initializes an empty volatile set. + * + * The function sets the set to its initial state by assigning a "NONE" bucket. + * This is the starting point for all volatile sets before entries are inserted. + * + * Parameters: + * - set: Pointer to the volatile set to initialize. */ +void vsetInit(vset *set) { + *set = vsetBucketFromNone(); +} + +/* Clears the volatile set, freeing all memory used for internal buckets. + * + * This function deallocates all internal data structures used by the set (buckets, vectors, + * hash tables, etc.). It does NOT free the entries themselves, since the set only holds + * references. + * + * After this call, the set is reset to an empty state. + * + * Parameters: + * - set: Pointer to the volatile set to clear. */ +void vsetClear(vset *set) { + if (*set == VSET_NONE_BUCKET_PTR) return; + freeVsetBucket(*set); + *set = vsetBucketFromNone(); +} + +/* Same as calling vsetClear, but also de-initialize the set. 
+ * After this call you will have to call vsetInit again in order to continue using the set. */ +void vsetRelease(vset *set) { + vsetClear(set); + *set = NULL; +} + +/* Return true in case this set is an initialized set and false otherwise. */ +bool vsetIsValid(vset *set) { + if (set && *set) { + switch (vsetBucketType(*set)) { + case VSET_BUCKET_NONE: + case VSET_BUCKET_SINGLE: + case VSET_BUCKET_VECTOR: + case VSET_BUCKET_HT: + case VSET_BUCKET_RAX: + return true; + } + } + return false; +} + +/* Checks whether a volatile set is empty. + * + * This function simply checks if the set's current bucket type is VSET_BUCKET_NONE. + * + * Parameters: + * - set: Pointer to the volatile set. + * + * Returns: + * - true if the set contains no entries. + * - false otherwise. */ +bool vsetIsEmpty(vset *set) { + assert(*set); + return vsetBucketType(*set) == VSET_BUCKET_NONE; +} + +/**************** Defrag Logic *********************/ +static struct vsetDefragState { + long long bucket_ts; + size_t bucket_cursor; +} defragState; + +static size_t vsetBucketDefrag_VECTOR(vsetBucket **bucket, size_t cursor, void *(*defragfn)(void *)) { + UNUSED(cursor); + pVector *pv = vsetBucketVector(*bucket); + pv = defragfn(pv); + if (pv) + *bucket = vsetBucketFromVector(pv); + return 0; +} + +static size_t vsetBucketDefrag_HASHTABLE(vsetBucket **bucket, size_t cursor, void *(*defragfn)(void *)) { + hashtable *ht = vsetBucketHashtable(*bucket); + if (cursor == 0) { + /* First time we enter this hashtable, defrag the tables first. 
*/ + hashtable *new_ht = hashtableDefragTables(ht, defragfn); + if (new_ht) { + ht = new_ht; + *bucket = vsetBucketFromHashtable(ht); + } + } + return hashtableScanDefrag(ht, cursor, NULL, NULL, defragfn, 0); +} + +static size_t vsetBucketDefrag_RAX(vsetBucket **bucket, size_t cursor, void *(*defragfn)(void *), int (*defragRaxNode)(raxNode **)) { + struct vsetDefragState *state = (struct vsetDefragState *)cursor; + size_t bucket_cursor = 0; + unsigned char key[VSET_BUCKET_KEY_LEN] = {0}; + size_t key_len; + long long bucket_ts; + rax *r = vsetBucketRax(*bucket); + raxIterator ri; + + /* init the state if this is the first time we enter the bucket */ + if (!state) { + state = &defragState; + state->bucket_ts = -1; + state->bucket_cursor = 0; + if ((r = defragfn(r))) *bucket = vsetBucketFromRax(r); + r = vsetBucketRax(*bucket); + } + raxStart(&ri, r); + ri.node_cb = defragRaxNode; + if (state->bucket_ts < 0) { + /* No prev timestamp, meaning we are starting a new RAX bucket scan */ + assert(raxSeek(&ri, "^", NULL, 0)); + assert(raxNext(&ri)); /* there MUST be at least one bucket! */ + bucket_ts = decodeExpiryKey(ri.key); + } else { + /* we are continuing a RAX bucket scan. lets try and locate the last scanned bucket. + * If not found we can search for the next one. */ + key_len = encodeExpiryKey(state->bucket_ts, key); + if (state->bucket_cursor) { + /* We were in the middle of scanning a bucket. lets try and continue there. + * It is possible that this bucket was deleted. if so we will get to a new bucket + * which is also fine. */ + assert(raxSeek(&ri, ">=", key, key_len)); + } else { + /* in case we completed the last bucket, lets progress to a later bucket */ + assert(raxSeek(&ri, ">", key, key_len)); + } + /* in case we reached the end of the RAX, we are done. 
*/ + if (!raxNext(&ri)) { + return 0; + } + bucket_ts = decodeExpiryKey(ri.key); + if (state->bucket_ts != bucket_ts) { + /* if this is a new bucket, lets start from the beginning */ + bucket_cursor = 0; + } else { + bucket_cursor = state->bucket_cursor; + } + } + raxStop(&ri); + vsetBucket *time_bucket = ri.data; + switch (vsetBucketType(time_bucket)) { + case VSET_BUCKET_NONE: + case VSET_BUCKET_SINGLE: + bucket_cursor = 0; + break; + case VSET_BUCKET_VECTOR: + bucket_cursor = vsetBucketDefrag_VECTOR(&time_bucket, bucket_cursor, defragfn); + if (time_bucket != ri.data) + raxSetData(ri.node, time_bucket); + break; + case VSET_BUCKET_HT: + bucket_cursor = vsetBucketDefrag_HASHTABLE(&time_bucket, bucket_cursor, defragfn); + if (time_bucket != ri.data) + raxSetData(ri.node, time_bucket); + break; + default: + panic("Unsupported vset bucket type for RAX bucket. Only supported types are single, vector or hashtable"); + } + /* if we reached here, we are not done. lets return the state and next time we can continue from this bucket. 
*/ + state->bucket_ts = bucket_ts; + state->bucket_cursor = bucket_cursor; + return (size_t)state; +} + +size_t vsetScanDefrag(vset *set, size_t cursor, void *(*defragfn)(void *), int (*defragRaxNode)(raxNode **)) { + switch (vsetBucketType(*set)) { + case VSET_BUCKET_NONE: + case VSET_BUCKET_SINGLE: + /* nothing to do */ + return 0; + case VSET_BUCKET_VECTOR: + return vsetBucketDefrag_VECTOR(set, cursor, defragfn); + case VSET_BUCKET_RAX: + return vsetBucketDefrag_RAX(set, cursor, defragfn, defragRaxNode); + default: + panic("Unknown vset node type to defrag"); + } + return 0; +} diff --git a/src/vset.h b/src/vset.h new file mode 100644 index 00000000000..7349aa46ed1 --- /dev/null +++ b/src/vset.h @@ -0,0 +1,97 @@ +#ifndef VOLATILESET_H +#define VOLATILESET_H + +#include +#include + +#include "hashtable.h" +#include "rax.h" +#include "sds.h" +#include "monotonic.h" /* for mstime_t*/ + +/* + *----------------------------------------------------------------------------- + * Volatile Set - Adaptive, Expiry-aware Set Structure + *----------------------------------------------------------------------------- + * + * The `vset` is a dynamic, memory-efficient container for managing + * entries with expiry semantics. It is designed to efficiently track entries + * that expire at varying times and scales to large sets by adapting its internal + * representation as it grows or shrinks. + * + *----------------------------------------------------------------------------- + * Public API + *----------------------------------------------------------------------------- + * + * Create/Free: + * vsetInit(vset *set) - used in order to initialize a new vset. + * void vsetClear(vset *set) - used in order to empty all the data in a vset. + * void vsetRelease(vset *set) - just like vsetClear, but also release the set itself so it will become unusable. + * and will require a new call to vsetInit in order to continue using the set. 
+ * Example: + * vset set; + * vsetInit(&set); + * // add some elements to the vset + * vsetClear(&set); + * // verify the set is empty: + * assert(vsetIsEmpty(&set)); + * + * Mutation: + * bool vsetAddEntry(vset *set, vsetGetExpiryFunc getExpiry, void *entry) - used in order to insert a new entry into the set. + * The API also makes use of the provided getExpiry function in order to compare the expiration time of 'entry' with that of the other existing + * entries in the set. + * + * bool vsetRemoveEntry(vset *set, vsetGetExpiryFunc getExpiry, void *entry) - used in order to remove an entry from the set. + * + * bool vsetUpdateEntry(vset *set, vsetGetExpiryFunc getExpiry, void *old_entry, + * void *new_entry, long long old_expiry, + * long long new_expiry) - used in order to update an existing entry in the set. + * Note that the implementation assumes the 'old_entry' might not point to a valid memory location, thus it requires that the 'old_expiry' + * is provided and matches the old entry expiration time. + * + * Expiry Retrieval/Removal: + * long long vsetEstimatedEarliestExpiry(vset *set, vsetGetExpiryFunc getExpiry) - will return an estimate of the lowest expiry time of + * the entries which currently exist in the set. Because of the semi-sorted ordering this implementation is using, the returned value MIGHT not be the 'real' minimum + * but rather some value which is the maximum among a group of entries which are all close or equal to the 'real' minimum. + * + * size_t vsetRemoveExpired(vset *set, vsetGetExpiryFunc getExpiry, vsetExpiryFunc expiryFunc, mstime_t now, size_t max_count, void *ctx) - can be used + * in order to remove up to max_count entries from the vset. The removed entries will all satisfy the condition that their expiration time is smaller than the provided 'now'. + * Note that there are no guarantees about the order in which the entries are removed. + * + * Utilities: + * bool vsetIsEmpty(vset *set) - used in order to check if a given set has any entries.
+ * + * Iteration: + * void vsetInitIterator(vset *set, vsetIterator *it) - used to initialize a new vset iterator. + * bool vsetNext(vsetIterator *it, void **entryptr) - used to iterate to the next element. Will return false if there are no more elements. + * void vsetResetIterator(vsetIterator *it) - used in order to reset the iterator at the end of the iteration. + * + * Note that the vset iterator is NOT safe, meaning you should not change the set while iterating it. Adding entries and/or removing entries + * can result in unexpected behavior! */ + +/* Return the absolute expiration time in milliseconds for the provided entry */ +typedef long long (*vsetGetExpiryFunc)(const void *entry); +/* Callback to be optionally provided to vsetRemoveExpired. When an item is removed from the vset, this callback is also applied to it. */ +typedef int (*vsetExpiryFunc)(void *entry, void *ctx); +// vset is just a pointer to a bucket +typedef void *vset; + +typedef uint8_t vsetIterator[560]; + +bool vsetAddEntry(vset *set, vsetGetExpiryFunc getExpiry, void *entry); +bool vsetRemoveEntry(vset *set, vsetGetExpiryFunc getExpiry, void *entry); +bool vsetUpdateEntry(vset *set, vsetGetExpiryFunc getExpiry, void *old_entry, void *new_entry, long long old_expiry, long long new_expiry); +bool vsetIsEmpty(vset *set); +void vsetInitIterator(vset *set, vsetIterator *it); +bool vsetNext(vsetIterator *it, void **entryptr); +void vsetResetIterator(vsetIterator *it); +void vsetInit(vset *set); +void vsetClear(vset *set); +void vsetRelease(vset *set); +bool vsetIsValid(vset *set); +long long vsetEstimatedEarliestExpiry(vset *set, vsetGetExpiryFunc getExpiry); +size_t vsetRemoveExpired(vset *set, vsetGetExpiryFunc getExpiry, vsetExpiryFunc expiryFunc, mstime_t now, size_t max_count, void *ctx); +size_t vsetMemUsage(vset *set); +size_t vsetScanDefrag(vset *set, size_t cursor, void *(*defragfn)(void *), int (*defragRaxNode)(raxNode **)); + +#endif From 4319edbe5ca565a4ad5532cf263901f0b3cf0ff7 Mon Sep 17
00:00:00 2001
From: Ran Shidlansik
Date: Tue, 5 Aug 2025 11:31:50 +0300
Subject: [PATCH 3/3] Add ACTIVE-EXPIRY and ACTIVE-DEFRAG for hash objects with volatile items

This change adds support for active expiration of hash fields with TTLs (Hash Field Expiration), building on the existing key-level expiry system.

Field TTL metadata is tracked in volatile sets associated with each hash key. Expired fields are reclaimed incrementally by the active expiration loop, using a new job type to alternate between key expiry and field expiry within the same logic and effort budget.

Both key and field expiration now share the same scheduler infrastructure. Alternating job types ensures fairness and avoids starvation, while keeping CPU usage predictable.

    +-----------------+
    |       DB        |
    +-----------------+
             |
             v
    +---------------------+
    |       myhash        |  (key with TTL)
    +---------------------+
             |
             v
    +------------------------------------+
    | fields (hashType)                  |
    | - field1                           |
    | - field2                           |
    | - fieldN                           |
    +------------------------------------+
             |
             v
    +------------------------------------+
    | volatile set (field-level TTL)     |
    | - field1 expires at T1             |
    | - field5 expires at T5             |
    +------------------------------------+

No new configuration was introduced; the existing active-expire-effort and time budget are reused for both key and field expiry.

Active defrag for volatile sets is also added.
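
As a rough illustration of the alternation described above (this is not the code in this patch), the sketch below shares one budget between two job types and switches type after every fixed-size slice, so a large backlog of one kind cannot starve the other. All names here (`expireJobType`, `expireState`, `runExpireCycle`, `slice`) are hypothetical, and the real scheduler budgets by time and effort rather than by a simple item count.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical job types; the names used in src/expire.c may differ. */
typedef enum { JOB_KEYS = 0, JOB_FIELDS = 1, JOB_TYPE_COUNT } expireJobType;

typedef struct {
    size_t pending[JOB_TYPE_COUNT];   /* expired items still waiting to be reclaimed */
    size_t reclaimed[JOB_TYPE_COUNT]; /* items reclaimed so far, per job type */
    expireJobType next;               /* job type to run next (persists across cycles) */
} expireState;

/* Reclaim up to 'budget' items in one cycle. Work is done in slices of at most
 * 'slice' items, alternating job types after every slice. The cycle ends when
 * the budget is spent or when every job type has nothing left to do. */
size_t runExpireCycle(expireState *st, size_t budget, size_t slice) {
    assert(slice > 0);
    size_t done = 0;
    int idle = 0; /* consecutive job types that had no pending work */
    while (done < budget && idle < (int)JOB_TYPE_COUNT) {
        expireJobType job = st->next;
        st->next = (expireJobType)((job + 1) % JOB_TYPE_COUNT);

        size_t n = st->pending[job];
        if (n > slice) n = slice;
        if (n > budget - done) n = budget - done;
        if (n == 0) {
            idle++;
            continue;
        }
        idle = 0;
        st->pending[job] -= n;
        st->reclaimed[job] += n;
        done += n;
    }
    return done;
}
```

Because `next` is kept in the state, a cycle that runs out of budget in the middle of a round resumes with the other job type on the following cycle instead of always favoring keys.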
Signed-off-by: Ran Shidlansik --- src/aof.c | 2 +- src/db.c | 171 +++- src/defrag.c | 69 +- src/expire.c | 263 +++-- src/expire.h | 30 +- src/lazyfree.c | 7 +- src/module.c | 1 + src/monotonic.h | 2 - src/object.c | 5 + src/rdb.c | 6 +- src/server.c | 36 +- src/server.h | 44 +- src/t_hash.c | 192 +++- src/trace/README.md | 2 + src/trace/trace_db.h | 20 + src/unit/test_quicklist.c | 15 - src/unit/test_vset.c | 13 +- src/util.c | 17 +- src/util.h | 5 + src/vset.c | 39 +- src/vset.h | 7 +- tests/unit/hashexpire.tcl | 1898 ++++++++++++++++++++++++++++++++++++- 22 files changed, 2568 insertions(+), 276 deletions(-) diff --git a/src/aof.c b/src/aof.c index 567acdf60cf..7188d7e5171 100644 --- a/src/aof.c +++ b/src/aof.c @@ -1957,7 +1957,7 @@ int rewriteHashObject(rio *r, robj *key, robj *o) { hashTypeIterator hi; long long count = 0, volatile_items = 0, non_volatile_items; /* First serialize volatile items if exist */ - if (hashTypeHasVolatileElements(o)) { + if (hashTypeHasVolatileFields(o)) { hashTypeInitVolatileIterator(o, &hi); while (hashTypeNext(&hi) != C_ERR) { long long expiry = entryGetExpiry(hi.next); diff --git a/src/db.c b/src/db.c index 9350bd56cc6..cf61aec3bb8 100644 --- a/src/db.c +++ b/src/db.c @@ -185,6 +185,18 @@ robj *lookupKeyWriteOrReply(client *c, robj *key, robj *reply) { return o; } +/* For hash keys, checks if they contain volatile items and updates tracking accordingly. + * Always accesses the tracking kvstore, even if the tracking state doesn't change. */ +void dbUpdateObjectWithVolatileItemsTracking(serverDb *db, robj *o) { + if (o->type == OBJ_HASH) { + if (hashTypeHasVolatileFields(o)) { + dbTrackKeyWithVolatileItems(db, o); + } else { + dbUntrackKeyWithVolatileItems(db, o); + } + } +} + /* Add a key-value entry to the DB. * * A copy of 'key' is stored in the database. The caller must ensure the @@ -217,6 +229,9 @@ static void dbAddInternal(serverDb *db, robj *key, robj **valref, int update_if_ /* Not existing. 
Convert val to valkey object and insert. */ robj *val = *valref; val = objectSetKeyAndExpire(val, key->ptr, -1); + /* Track hash object if it has volatile fields (for active expiry). + * For example, this is needed when a hash is moved to a new DB (e.g. MOVE). */ + dbTrackKeyWithVolatileItems(db, val); initObjectLRUOrLFU(val); kvstoreHashtableAdd(db->keys, dict_index, val); signalKeyAsReady(db, key, val->type); @@ -284,6 +299,10 @@ int dbAddRDBLoad(serverDb *db, sds key, robj **valref) { val = objectSetKeyAndExpire(val, key, -1); kvstoreHashtableInsertAtPosition(db->keys, dict_index, val, &pos); initObjectLRUOrLFU(val); + + /* Track hash objects containing volatile items, created by rdbLoadObject (which lacks DB context). */ + dbTrackKeyWithVolatileItems(db, val); + *valref = val; return 1; } @@ -483,6 +502,11 @@ int dbGenericDeleteWithDictIndex(serverDb *db, robj *key, int async, int flags, debugServerAssert(0 == kvstoreHashtableDelete(db->expires, dict_index, key->ptr)); } + /* If deleting a hash object, un-track it from the volatile items tracking if it contains volatile items.*/ + if (val->type == OBJ_HASH && hashTypeHasVolatileFields(val)) { + dbUntrackKeyWithVolatileItems(db, val); + } + if (async) { freeObjAsync(key, val, db->id); } else { @@ -501,6 +525,20 @@ int dbGenericDelete(serverDb *db, robj *key, int async, int flags) { return dbGenericDeleteWithDictIndex(db, key, async, flags, dict_index); } +/* Add a key with volatile items to the tracking kvstore. 
*/ +void dbTrackKeyWithVolatileItems(serverDb *db, robj *o) { + if (o->type == OBJ_HASH && hashTypeHasVolatileFields(o)) { + int dict_index = getKVStoreIndexForKey(objectGetKey(o)); + kvstoreHashtableAdd(db->keys_with_volatile_items, dict_index, o); + } +} + +/* Delete a key from the keys with volatile entries tracking kvstore */ +void dbUntrackKeyWithVolatileItems(serverDb *db, robj *o) { + int dict_index = getKVStoreIndexForKey(objectGetKey(o)); + kvstoreHashtableDelete(db->keys_with_volatile_items, dict_index, objectGetKey(o)); +} + /* Delete a key, value, and associated expiration entry if any, from the DB */ int dbSyncDelete(serverDb *db, robj *key) { return dbGenericDelete(db, key, 0, DB_FLAG_KEY_DELETED); @@ -556,6 +594,17 @@ robj *dbUnshareStringValue(serverDb *db, robj *key, robj *o) { return o; } +/* Reset the expiry tracking state of a database. + * + * This clears the `expiry` array, which holds per-expiry-type + * data such as average TTL (for stats) and scan cursors used by + * the active expiration cycle. + * + * Should be called whenever the database is emptied or reinitialized. */ +void resetDbExpiryState(serverDb *db) { + memset(db->expiry, 0, sizeof(db->expiry)); +} + /* Remove all keys from the database(s) structure. The dbarray argument * may not be the server main DBs (could be a temporary DB). * @@ -582,10 +631,10 @@ long long emptyDbStructure(serverDb **dbarray, int dbnum, int async, void(callba } else { kvstoreEmpty(dbarray[j]->keys, callback); kvstoreEmpty(dbarray[j]->expires, callback); + kvstoreEmpty(dbarray[j]->keys_with_volatile_items, callback); } /* Because all keys of database are removed, reset average ttl. 
*/ - dbarray[j]->avg_ttl = 0; - dbarray[j]->expires_cursor = 0; + resetDbExpiryState(dbarray[j]); } return removed; @@ -651,6 +700,7 @@ void discardTempDb(serverDb **tempDb) { if (tempDb[i]) { kvstoreRelease(tempDb[i]->keys); kvstoreRelease(tempDb[i]->expires); + kvstoreRelease(tempDb[i]->keys_with_volatile_items); /* These are expected to be empty on temporary databases */ serverAssert(dictSize(tempDb[i]->blocking_keys) == 0); @@ -1628,6 +1678,15 @@ void scanDatabaseForDeletedKeys(serverDb *emptied, serverDb *replaced_with) { dictReleaseIterator(di); } +/* Copy expiry tracking state from one DB to another. + * + * This copies the `expiry` array, which contains per-expiry-type + * metadata such as the average TTL (for stats) and the active + * expiry scan cursor. */ +static void copyDbExpiry(serverDb *target, const serverDb *source) { + memcpy(target->expiry, source->expiry, sizeof(target->expiry)); +} + /* Swap two databases at runtime so that all clients will magically see * the new database even if already connected. Note that the client * structure c->db points to a given DB, so we need to be smarter and @@ -1657,13 +1716,14 @@ int dbSwapDatabases(int id1, int id2) { * remain in the same DB they were. */ db1->keys = db2->keys; db1->expires = db2->expires; - db1->avg_ttl = db2->avg_ttl; - db1->expires_cursor = db2->expires_cursor; + db1->keys_with_volatile_items = db2->keys_with_volatile_items; + copyDbExpiry(db1, db2); + db2->keys = aux.keys; db2->expires = aux.expires; - db2->avg_ttl = aux.avg_ttl; - db2->expires_cursor = aux.expires_cursor; + db2->keys_with_volatile_items = aux.keys_with_volatile_items; + copyDbExpiry(db2, &aux); /* Now we need to handle clients blocked on lists: as an effect * of swapping the two DBs, a client that was waiting for list @@ -1702,13 +1762,13 @@ void swapMainDbWithTempDb(serverDb **tempDb) { * remain in the same DB they were. 
*/ activedb->keys = newdb->keys; activedb->expires = newdb->expires; - activedb->avg_ttl = newdb->avg_ttl; - activedb->expires_cursor = newdb->expires_cursor; + activedb->keys_with_volatile_items = newdb->keys_with_volatile_items; + copyDbExpiry(activedb, newdb); newdb->keys = aux.keys; newdb->expires = aux.expires; - newdb->avg_ttl = aux.avg_ttl; - newdb->expires_cursor = aux.expires_cursor; + newdb->keys_with_volatile_items = aux.keys_with_volatile_items; + copyDbExpiry(newdb, &aux); /* Now we need to handle clients blocked on lists: as an effect * of swapping the two DBs, a client that was waiting for list @@ -1786,7 +1846,15 @@ robj *setExpire(client *c, serverDb *db, robj *key, long long when) { serverAssertWithInfo(NULL, key, valref != NULL); val = *valref; long long old_when = objectGetExpire(val); + robj *newval = objectSetExpire(val, when); + if (newval->type == OBJ_HASH && hashTypeHasVolatileFields(newval)) { + /* Replace the pointer in the keys_with_volatile_items table without accessing the old pointer. */ + int dict_index = getKVStoreIndexForKey(objectGetKey(newval)); + hashtable *volatile_items_ht = kvstoreGetHashtable(db->keys_with_volatile_items, dict_index); + int replaced = hashtableReplaceReallocatedEntry(volatile_items_ht, val, newval); + serverAssert(replaced); + } if (old_when != -1) { /* Val already had an expire field, so it was not reallocated. */ serverAssert(newval == val); @@ -1890,6 +1958,89 @@ void propagateDeletion(serverDb *db, robj *key, int lazy) { server.replication_allowed = prev_replication_allowed; } +static const size_t EXPIRE_BULK_LIMIT = 1024; /* Maximum number of fields to active-expire (per replicated HDEL command) */ + +/* Propagate HDEL commands for deleted hash fields to AOF and replicas. + * + * This function builds and propagates a single HDEL command with multiple fields + * for the given hash object `o`.
It temporarily enables replication (if needed), + * constructs the command using the field names, and sends it via alsoPropagate(). */ +static void propagateFieldsDeletion(serverDb *db, robj *o, size_t n_fields, robj *fields[]) { + int prev_replication_allowed = server.replication_allowed; + server.replication_allowed = 1; + + robj *argv[EXPIRE_BULK_LIMIT + 2]; /* HDEL + key + fields */ + int argc = 0; + robj *keyobj = createStringObjectFromSds(objectGetKey(o)); + argv[argc++] = shared.hdel; // HDEL command + argv[argc++] = keyobj; // key name + for (size_t i = 0; i < n_fields; i++) { + // field to delete + argv[argc++] = fields[i]; + } + + alsoPropagate(db->id, argv, argc, PROPAGATE_AOF | PROPAGATE_REPL); + server.replication_allowed = prev_replication_allowed; + for (int i = 0; i < argc; i++) { + decrRefCount(argv[i]); + } +} + +/* Process expired fields for a hash: delete them and propagate the changes to replicas and AOF. + * + * This routine: + * - iteratively identifies expired hash fields from the volatile set (batching up to 1024 at a time) + * - deletes the expired fields + * - deletes the entire key if the hash becomes empty + * - propagates an HDEL command for the deleted fields, followed by DEL if the key is fully deleted + * + * Batching avoids large stack allocations while allowing max_entries to be arbitrarily large. + * Returns the total number of expired fields removed. */ +size_t dbReclaimExpiredFields(robj *o, serverDb *db, mstime_t now, unsigned long max_entries) { + size_t total_expired = 0; + bool deleteKey = false; + + while (max_entries > 0) { + /* Process in batches to avoid large stack allocations. */ + unsigned long batch_size = max_entries > EXPIRE_BULK_LIMIT ?
EXPIRE_BULK_LIMIT : max_entries; + robj *entries[EXPIRE_BULK_LIMIT]; + size_t expired = hashTypeDeleteExpiredFields(o, now, batch_size, entries); + if (expired == 0) break; + + /* Clean up volatile set if no more volatile fields remain */ + if (!hashTypeHasVolatileFields(o)) { + dbUntrackKeyWithVolatileItems(db, o); + } + + /* Check if key is now empty after removing expired fields */ + deleteKey = hashTypeLength(o) == 0; + + enterExecutionUnit(1, 0); + robj *keyobj = createStringObjectFromSds(objectGetKey(o)); + /* Note that even though it might have been more efficient to only propagate del when the key has no more items left, + * we must keep consistency in order to allow the replica to report hdel notifications before del. */ + propagateFieldsDeletion(db, o, expired, entries); + notifyKeyspaceEvent(NOTIFY_EXPIRED, "hexpired", keyobj, db->id); + if (deleteKey) { + dbDelete(db, keyobj); + propagateDeletion(db, keyobj, server.lazyfree_lazy_expire); + notifyKeyspaceEvent(NOTIFY_GENERIC, "del", keyobj, db->id); + } else { + if (!hashTypeHasVolatileFields(o)) dbUntrackKeyWithVolatileItems(db, o); + } + signalModifiedKey(NULL, db, keyobj); + exitExecutionUnit(); + postExecutionUnitOperations(); + decrRefCount(keyobj); + + total_expired += expired; + max_entries -= expired; + if (deleteKey) break; /* Stop if key was deleted */ + } + + return total_expired; +} + /* Use this instead of keyIsExpired if you already have the value object. */ static int objectIsExpired(robj *val) { /* Don't expire anything while loading. It will be done later. */ diff --git a/src/defrag.c b/src/defrag.c index 8eb0e32accd..b4ce13e2543 100644 --- a/src/defrag.c +++ b/src/defrag.c @@ -39,7 +39,6 @@ */ #include "server.h" -#include "entry.h" #include "hashtable.h" #include "eval.h" #include "script.h" @@ -197,7 +196,7 @@ void *activeDefragAlloc(void *ptr) { * Returns NULL in case the allocation wasn't moved.
* When it returns a non-null value, the old pointer was already released * and should NOT be accessed. */ -static sds activeDefragSds(sds sdsptr) { +sds activeDefragSds(sds sdsptr) { void *ptr = sdsAllocPtr(sdsptr); void *newptr = activeDefragAlloc(ptr); if (newptr) { @@ -442,28 +441,9 @@ static void scanLaterSet(robj *ob, unsigned long *cursor) { *cursor = hashtableScanDefrag(ht, *cursor, activeDefragSdsHashtableCallback, NULL, activeDefragAlloc, HASHTABLE_SCAN_EMIT_REF); } -/* Hashtable scan callback for hash datatype */ -static void activeDefragEntry(void *privdata, void *element_ref) { - entry **entry_ref = (entry **)element_ref; - entry *old_entry = *entry_ref, *new_entry = NULL; - long long old_expiry = entryGetExpiry(old_entry); - - new_entry = entryDefrag(*entry_ref, activeDefragAlloc, activeDefragSds); - if (new_entry) { - /* In case the entry is tracked we need to update it in the volatile set */ - if (entryHasExpiry(new_entry)) { - robj *obj = (robj *)privdata; - serverAssert(obj); - hashTypeTrackUpdateEntry(obj, old_entry, new_entry, old_expiry, entryGetExpiry(new_entry)); - } - *entry_ref = new_entry; - } -} - static void scanLaterHash(robj *ob, unsigned long *cursor) { serverAssert(ob->type == OBJ_HASH && ob->encoding == OBJ_ENCODING_HASHTABLE); - hashtable *ht = ob->ptr; - *cursor = hashtableScanDefrag(ht, *cursor, activeDefragEntry, ob, activeDefragAlloc, HASHTABLE_SCAN_EMIT_REF); + *cursor = hashTypeScanDefrag(ob, *cursor, activeDefragAlloc); } static void defragQuicklist(robj *ob) { @@ -500,20 +480,24 @@ static void defragZsetSkiplist(robj *ob) { } } +/* Defragment a hash object. + * + * Large hashtable-encoded hashes are deferred via `defrag_later`. + * Smaller ones are defragmented immediately, possibly over multiple passes. + * Listpack-encoded hashes are always handled in a single pass. 
*/ static void defragHash(robj *ob) { - serverAssert(ob->type == OBJ_HASH && ob->encoding == OBJ_ENCODING_HASHTABLE); hashtable *ht = ob->ptr; - if (hashtableSize(ht) > server.active_defrag_max_scan_fields) { + if (ob->encoding == OBJ_ENCODING_HASHTABLE && hashtableSize(ht) > server.active_defrag_max_scan_fields) { + /* Large hashtable-encoded hashes are deferred via `defrag_later` */ defragLater(ob); } else { + /* Smaller hashtables are defragmented immediately, possibly over multiple passes. + * Listpack-encoded hashes are always handled in a single pass in hashTypeScanDefrag. */ unsigned long cursor = 0; do { - cursor = hashtableScanDefrag(ht, cursor, activeDefragEntry, ob, activeDefragAlloc, HASHTABLE_SCAN_EMIT_REF); + cursor = hashTypeScanDefrag(ob, cursor, activeDefragAlloc); } while (cursor != 0); } - /* defrag the hashtable struct and tables */ - hashtable *new_hashtable = hashtableDefragTables(ht, activeDefragAlloc); - if (new_hashtable) ob->ptr = new_hashtable; } static void defragSet(robj *ob) { @@ -710,6 +694,13 @@ static void defragKey(defragKeysCtx *ctx, robj **elemref) { int replaced = hashtableReplaceReallocatedEntry(expires_ht, ob, newob); serverAssert(replaced); } + if (newob->type == OBJ_HASH && hashTypeHasVolatileFields(newob)) { + /* Check if this is a hash object containing volatile fields. + * and update keys_with_volatile_items after defrag. 
*/ + hashtable *keys_with_volatile_items_ht = kvstoreGetHashtable(db->keys_with_volatile_items, slot); + int replaced = hashtableReplaceReallocatedEntry(keys_with_volatile_items_ht, ob, newob); + serverAssert(replaced); + } ob = newob; } @@ -741,13 +732,7 @@ static void defragKey(defragKeysCtx *ctx, robj **elemref) { serverPanic("Unknown sorted set encoding"); } } else if (ob->type == OBJ_HASH) { - if (ob->encoding == OBJ_ENCODING_LISTPACK) { - if ((newzl = activeDefragAlloc(ob->ptr))) ob->ptr = newzl; - } else if (ob->encoding == OBJ_ENCODING_HASHTABLE) { - defragHash(ob); - } else { - serverPanic("Unknown hash encoding"); - } + defragHash(ob); } else if (ob->type == OBJ_STREAM) { defragStream(ob); } else if (ob->type == OBJ_MODULE) { @@ -973,6 +958,15 @@ static doneStatus defragStageExpiresKvstore(monotime endtime, void *target, void scanHashtableCallbackCountScanned, NULL, NULL); } +// Target is a DBID +static doneStatus defragStageKeysWithvolaItemsKvstore(monotime endtime, void *target, void *privdata) { + UNUSED(privdata); + int dbid = (uintptr_t)target; + serverDb *db = server.db[dbid]; + return defragStageKvstoreHelper(endtime, db->keys_with_volatile_items, + scanHashtableCallbackCountScanned, NULL, NULL); +} + static doneStatus defragStagePubsubKvstore(monotime endtime, void *target, void *privdata) { // target is server.pubsub_channels or server.pubsubshard_channels @@ -1244,6 +1238,7 @@ static void beginDefragCycle(void) { addDefragStage(defragStageDbKeys, (void *)(uintptr_t)dbid, NULL); addDefragStage(defragStageExpiresKvstore, (void *)(uintptr_t)dbid, NULL); + addDefragStage(defragStageKeysWithvolaItemsKvstore, (void *)(uintptr_t)dbid, NULL); } static getClientChannelsFnWrapper getClientPubSubChannelsFn = {getClientPubSubChannels}; @@ -1331,4 +1326,8 @@ robj *activeDefragStringOb(robj *ob) { void defragWhileBlocked(void) { } +sds activeDefragSds(sds sdsptr) { + return sdsptr; +} + #endif diff --git a/src/expire.c b/src/expire.c index 
d0a465979ec..c8c45134089 100644 --- a/src/expire.c +++ b/src/expire.c @@ -119,10 +119,9 @@ int activeExpireCycleTryExpire(serverDb *db, robj *val, long long now) { #define ACTIVE_EXPIRE_CYCLE_KEYS_PER_LOOP 20 /* Keys for each DB loop. */ #define ACTIVE_EXPIRE_CYCLE_FAST_DURATION 1000 /* Microseconds. */ #define ACTIVE_EXPIRE_CYCLE_SLOW_TIME_PERC 25 /* Max % of CPU to use. */ -#define ACTIVE_EXPIRE_CYCLE_ACCEPTABLE_STALE 10 /* % of stale keys after which \ - we do extra efforts. */ +#define ACTIVE_EXPIRE_CYCLE_ACCEPTABLE_STALE 10 /* % of stale keys after which we do extra efforts. */ -/* Data used by the expire dict scan callback. */ +/* Data used by the key expire kvstore scan callback. */ typedef struct { serverDb *db; long long now; @@ -130,8 +129,17 @@ typedef struct { unsigned long expired; /* num keys expired */ long long ttl_sum; /* sum of ttl for key with ttl not yet expired */ int ttl_samples; /* num keys with ttl not yet expired */ + + /* Entry-specific fields */ + unsigned long max_entries; /* Max number of entries (e.g. fields) to expire during this scan */ + bool has_more_expired_entries; /* True if the hash likely has more fields to expire */ } expireScanData; +typedef struct activeExpireFieldIterator { + int current_db; + unsigned long cursor; /* Cursor for keys with volatile items (field-level TTL) */ +} activeExpireFieldIterator; + void expireScanCallback(void *privdata, void *entry) { robj *val = entry; expireScanData *data = privdata; @@ -149,7 +157,23 @@ void expireScanCallback(void *privdata, void *entry) { data->sampled++; } -static inline int expireShouldSkipTableForSamplingCb(hashtable *ht) { +/* Expires up to `max_entries` fields from a hash with volatile fields. + * Sets `has_more_expired_entries` if more remain. Updates stats.
*/ +void fieldExpireScanCallback(void *privdata, void *volaKey) { + expireScanData *data = privdata; + robj *o = volaKey; + serverAssert(o); + serverAssert(hashTypeHasVolatileFields(o)); + mstime_t now = server.mstime; + size_t expired_fields = dbReclaimExpiredFields(o, data->db, now, data->max_entries); + if (expired_fields) { + data->has_more_expired_entries = (expired_fields == data->max_entries); + data->expired++; + } + data->sampled++; +} + +static int expireShouldSkipTableForSamplingCb(hashtable *ht) { long long numkeys = hashtableSize(ht); unsigned long buckets = hashtableBuckets(ht); /* When there are less than 1% filled buckets, sampling the key @@ -161,43 +185,44 @@ static inline int expireShouldSkipTableForSamplingCb(hashtable *ht) { return 0; } -void activeExpireCycle(int type) { - /* Adjust the running parameters according to the configured expire - * effort. The default effort is 1, and the maximum configurable effort - * is 10. */ - unsigned long effort = server.active_expire_effort - 1, /* Rescale from 0 to 9. */ - config_keys_per_loop = ACTIVE_EXPIRE_CYCLE_KEYS_PER_LOOP + ACTIVE_EXPIRE_CYCLE_KEYS_PER_LOOP / 4 * effort, - config_cycle_fast_duration = - ACTIVE_EXPIRE_CYCLE_FAST_DURATION + ACTIVE_EXPIRE_CYCLE_FAST_DURATION / 4 * effort, - config_cycle_slow_time_perc = ACTIVE_EXPIRE_CYCLE_SLOW_TIME_PERC + 2 * effort, - config_cycle_acceptable_stale = ACTIVE_EXPIRE_CYCLE_ACCEPTABLE_STALE - effort; +/* Returns the zero-based active expire effort level. + * + * Internally we use a 0-based effort level (0–9), while the server config + * exposes it as 1–10. This helper normalizes it for internal use. 
*/ +static int activeExpireEffort(void) { + return server.active_expire_effort - 1; +} + +static long long activeExpireCycleJob(enum activeExpiryType jobType, int cycleType, long long timelimit_us) { + if (timelimit_us <= 0) return 0; + + unsigned long config_cycle_acceptable_stale = ACTIVE_EXPIRE_CYCLE_ACCEPTABLE_STALE - activeExpireEffort(); + unsigned long keys_per_loop = + ACTIVE_EXPIRE_CYCLE_KEYS_PER_LOOP + ACTIVE_EXPIRE_CYCLE_KEYS_PER_LOOP / 4 * activeExpireEffort(); /* This function has some global state in order to continue the work * incrementally across calls. */ - static unsigned int current_db = 0; /* Next DB to test. */ - static int timelimit_exit = 0; /* Time limit hit in previous call? */ - static long long last_fast_cycle = 0; /* When last fast cycle ran. */ + typedef struct { + unsigned int current_db; /* Next DB to test. */ + bool timelimit_exit; /* Time limit hit in previous call? */ + } expireState; + static expireState _expire_state[ACTIVE_EXPIRY_TYPE_COUNT] = {0}; // [KEYS, FIELDS] + expireState *state = &_expire_state[jobType]; + double *expired_stale_perc[ACTIVE_EXPIRY_TYPE_COUNT] = { + &server.stat_expired_keys_stale_perc, + &server.stat_expired_keys_with_vola_stale_perc, + }; int j, iteration = 0; int dbs_per_call = CRON_DBS_PER_CALL; int dbs_performed = 0; - long long start = ustime(), timelimit, elapsed; - - /* If 'expire' action is paused, for whatever reason, then don't expire any key. - * Typically, at the end of the pause we will properly expire the key OR we - * will have failed over and the new primary will send us the expire. */ - if (isPausedActionsWithUpdate(PAUSE_ACTION_EXPIRE)) return; + monotime start = getMonotonicUs(); - if (type == ACTIVE_EXPIRE_CYCLE_FAST) { + if (cycleType == ACTIVE_EXPIRE_CYCLE_FAST) { /* Don't start a fast cycle if the previous cycle did not exit * for time limit, unless the percentage of estimated stale keys is - * too high. 
Also never repeat a fast cycle for the same period - * as the fast cycle total duration itself. */ - if (!timelimit_exit && server.stat_expired_stale_perc < config_cycle_acceptable_stale) return; - - if (start < last_fast_cycle + (long long)config_cycle_fast_duration * 2) return; - - last_fast_cycle = start; + * too high. */ + if (!state->timelimit_exit && *expired_stale_perc[jobType] < config_cycle_acceptable_stale) return 0; } /* We usually should test CRON_DBS_PER_CALL per iteration, with @@ -207,17 +232,9 @@ void activeExpireCycle(int type) { * 2) If last time we hit the time limit, we want to scan all DBs * in this iteration, as there is work to do in some DB and we don't want * expired keys to use memory for too much time. */ - if (dbs_per_call > server.dbnum || timelimit_exit) dbs_per_call = server.dbnum; - - /* We can use at max 'config_cycle_slow_time_perc' percentage of CPU - * time per iteration. Since this function gets called with a frequency of - * server.hz times per second, the following is the max amount of - * microseconds we can spend in this function. */ - timelimit = config_cycle_slow_time_perc * 1000000 / server.hz / 100; - timelimit_exit = 0; - if (timelimit <= 0) timelimit = 1; + if (dbs_per_call > server.dbnum || state->timelimit_exit) dbs_per_call = server.dbnum; - if (type == ACTIVE_EXPIRE_CYCLE_FAST) timelimit = config_cycle_fast_duration; /* in microseconds. */ + state->timelimit_exit = false; /* Accumulate some global stats as we expire keys, to have some idea * about the number of keys that are already logically expired, but still @@ -225,32 +242,48 @@ void activeExpireCycle(int type) { long total_sampled = 0; long total_expired = 0; - /* Try to smoke-out bugs (server.also_propagate should be empty here) */ - serverAssert(server.also_propagate.numops == 0); - /* Stop iteration when one of the following conditions is met: * * 1) We have checked a sufficient number of databases with expiration time. 
* 2) The time limit has been exceeded. * 3) All databases have been traversed. */ - for (j = 0; dbs_performed < dbs_per_call && timelimit_exit == 0 && j < server.dbnum; j++) { + for (j = 0; dbs_performed < dbs_per_call && state->timelimit_exit == 0 && j < server.dbnum; j++) { /* Scan callback data including expired and checked count per iteration. */ - expireScanData data; + expireScanData data = {0}; + /* Increment the DB now so we are sure if we run out of time + * in the current DB we'll restart from the next. This allows to + * distribute the time evenly across DBs. */ + serverDb *db = server.db[(state->current_db++ % server.dbnum)]; + /* In case the current database is not used we can simply skip to the next database. */ + if (!db) continue; + data.ttl_sum = 0; data.ttl_samples = 0; - - serverDb *db = server.db[(current_db % server.dbnum)]; + data.max_entries = keys_per_loop * 4; data.db = db; int db_done = 0; /* The scan of the current DB is done? */ int update_avg_ttl_times = 0, repeat = 0; - /* Increment the DB now so we are sure if we run out of time - * in the current DB we'll restart from the next. This allows to - * distribute the time evenly across DBs. */ - current_db++; + hashtableScanFunction scan_cb; - if (db && kvstoreSize(db->expires)) dbs_performed++; + kvstore *kvs = NULL; + if (db) { + switch (jobType) { + case KEYS: + kvs = db->expires; + scan_cb = expireScanCallback; + break; + case FIELDS: + kvs = db->keys_with_volatile_items; + scan_cb = fieldExpireScanCallback; + break; + default: + serverPanic("Unknown active expiry job type %d.", jobType); + } + } + + if (db && kvstoreSize(kvs)) dbs_performed++; /* Continue to expire if at the end of the cycle there are still * a big percentage of keys to expire, compared to the number of keys @@ -265,18 +298,18 @@ void activeExpireCycle(int type) { iteration++; /* If there is nothing to expire try next DB ASAP. 
*/ - if ((num = kvstoreSize(db->expires)) == 0) { - db->avg_ttl = 0; + if ((num = kvstoreSize(kvs)) == 0) { + db->expiry[jobType].avg_ttl = 0; break; } - data.now = mstime(); + data.now = server.mstime; /* The main collection cycle. Scan through keys among keys * with an expire set, checking for expired ones. */ data.sampled = 0; data.expired = 0; - if (num > config_keys_per_loop) num = config_keys_per_loop; + if (num > keys_per_loop) num = keys_per_loop; /* Here we access the low level representation of the hash table * for speed concerns: this makes this code coupled with dict.c, @@ -294,9 +327,11 @@ void activeExpireCycle(int type) { int origin_ttl_samples = data.ttl_samples; while (data.sampled < num && checked_buckets < max_buckets) { - db->expires_cursor = kvstoreScan(db->expires, db->expires_cursor, -1, expireScanCallback, - expireShouldSkipTableForSamplingCb, &data); - if (db->expires_cursor == 0) { + unsigned long cursor = db->expiry[jobType].cursor; + cursor = kvstoreScan(kvs, cursor, -1, scan_cb, + expireShouldSkipTableForSamplingCb, &data); + if (!data.has_more_expired_entries) db->expiry[jobType].cursor = cursor; + if (db->expiry[jobType].cursor == 0 && !data.has_more_expired_entries) { db_done = 1; break; } @@ -322,14 +357,17 @@ void activeExpireCycle(int type) { !repeat) { /* Update the average TTL stats every 16 iterations or about to exit. */ /* Update the average TTL stats for this database, * because this may reach the time limit. */ - if (data.ttl_samples) { + if (data.ttl_samples && jobType == KEYS) { + /* Average TTL is calculated only for keys, as there's currently + * no reliable way to compute it for fields. */ + long long avg_ttl = data.ttl_sum / data.ttl_samples; /* Do a simple running average with a few samples. * We just use the current estimate with a weight of 2% * and the previous estimate with a weight of 98%. 
*/ - if (db->avg_ttl == 0) { - db->avg_ttl = avg_ttl; + if (db->expiry[jobType].avg_ttl == 0) { + db->expiry[jobType].avg_ttl = avg_ttl; } else { /* The origin code is as follow. * for (int i = 0; i < update_avg_ttl_times; i++) { @@ -343,16 +381,15 @@ void activeExpireCycle(int type) { * = avg_ttl + (db->avg_ttl - avg_ttl) * pow(0.98, update_avg_ttl_times) * Notice that update_avg_ttl_times is between 1 and 16, we use a constant table * to accelerate the calculation of pow(0.98, update_avg_ttl_times).*/ - db->avg_ttl = avg_ttl + (db->avg_ttl - avg_ttl) * avg_ttl_factor[update_avg_ttl_times - 1]; + db->expiry[jobType].avg_ttl = avg_ttl + (db->expiry[jobType].avg_ttl - avg_ttl) * avg_ttl_factor[update_avg_ttl_times - 1]; } update_avg_ttl_times = 0; data.ttl_sum = 0; data.ttl_samples = 0; } if ((iteration & 0xf) == 0) { /* check time limit every 16 iterations. */ - elapsed = ustime() - start; - if (elapsed > timelimit) { - timelimit_exit = 1; + if (elapsedUs(start) > (uint64_t)timelimit_us) { + state->timelimit_exit = 1; server.stat_expired_time_cap_reached_count++; break; } @@ -361,10 +398,12 @@ void activeExpireCycle(int type) { } while (repeat); } - elapsed = ustime() - start; - server.stat_expire_cycle_time_used += elapsed; - latencyAddSampleIfNeeded("expire-cycle", elapsed); - latencyTraceIfNeeded(db, expire_cycle, elapsed); + long long elapsed = (long long)elapsedUs(start); + if (jobType == KEYS) { + latencyTraceIfNeeded(db, expire_cycle_keys, elapsed); + } else if (jobType == FIELDS) { + latencyTraceIfNeeded(db, expire_cycle_fields, elapsed); + } /* Update our estimate of keys existing but yet to be expired. * Running average with this sample accounting for 5%. 
*/ @@ -373,7 +412,84 @@ void activeExpireCycle(int type) { current_perc = (double)total_expired / total_sampled; } else current_perc = 0; - server.stat_expired_stale_perc = (current_perc * 0.05) + (server.stat_expired_stale_perc * 0.95); + *expired_stale_perc[jobType] = (current_perc * 0.05) + (*expired_stale_perc[jobType] * 0.95); + + return elapsed; +} + +/* activeExpireCycle + * + * This function performs active expiration of both normal keys (with TTL) + * and hash fields (with field-level TTL via volatile sets). Its purpose is to + * reclaim memory from logically expired entries. + * + * The expiry is performed incrementally over multiple databases, respecting + * a CPU time budget derived from the configured active-expire-effort. + * + * There are two separate expiry mechanisms for keys and for hash fields + * because their iteration models are fundamentally different: + * - key expiry operates on db->key entries, scanning random keys + * with attached TTL entries. + * - field expiry operates on db->key->volatile_set entries, scanning + * fields within a hash that each have their own TTL. + * This hierarchy and lookup pattern are entirely different, requiring + * separate cursors, iteration logic, and data structure handling. + * + * The function uses an alternating scheme across event loop cycles: on one + * cycle it will prioritize key expiry first, then hash field expiry if time + * permits; on the next cycle, it will prioritize hash field expiry first, + * then key expiry if time permits. This ensures fairness and prevents + * starvation of either mechanism. Since the memory reclaim pace and iteration + * model of keys versus hash fields are different and unpredictable, + * alternating naturally balances the overall expiry effort when both are + * fully consuming their available time budget. */ +void activeExpireCycle(int type) { + /* If 'expire' action is paused, for whatever reason, then don't expire any key. 
+ * Typically, at the end of the pause we will properly expire the key OR we + * will have failed over and the new primary will send us the expire. */ + if (isPausedActionsWithUpdate(PAUSE_ACTION_EXPIRE)) return; + + /* Adjust the running parameters according to the configured expire + * effort. The default effort is 1, and the maximum configurable effort + * is 10. Also make sure not to run fast cycles back to back. */ + long long timelimit_us; + if (type == ACTIVE_EXPIRE_CYCLE_FAST) { + long long config_cycle_fast_duration = ACTIVE_EXPIRE_CYCLE_FAST_DURATION + ACTIVE_EXPIRE_CYCLE_FAST_DURATION / 4 * activeExpireEffort(); + + /* Never repeat a fast cycle for the same period + * as the fast cycle total duration itself. */ + static monotime last_fast_cycle_start_time; /* When last fast cycle ran. */ + monotime start = getMonotonicUs(); + if (start < last_fast_cycle_start_time + config_cycle_fast_duration * 2) return; + + last_fast_cycle_start_time = start; + timelimit_us = config_cycle_fast_duration; + } else { + /* We can use at max 'config_cycle_slow_time_perc' percentage of CPU + * time per iteration. Since this function gets called with a frequency of + * server.hz times per second, the following is the max amount of + * microseconds we can spend in this function. 
*/ + int config_cycle_slow_time_perc = ACTIVE_EXPIRE_CYCLE_SLOW_TIME_PERC + 2 * activeExpireEffort(); + timelimit_us = config_cycle_slow_time_perc * 1000000 / server.hz / 100; + } + + static bool expireCycleStartWithFields = 0; + long long elapsed = 0; + + /* Try to smoke-out bugs (server.also_propagate should be empty here) */ + serverAssert(server.also_propagate.numops == 0); + + if (expireCycleStartWithFields) { + elapsed += activeExpireCycleJob(FIELDS, type, timelimit_us - elapsed); + elapsed += activeExpireCycleJob(KEYS, type, timelimit_us - elapsed); + } else { + elapsed += activeExpireCycleJob(KEYS, type, timelimit_us - elapsed); + elapsed += activeExpireCycleJob(FIELDS, type, timelimit_us - elapsed); + } + server.stat_expire_cycle_time_used += elapsed; + latencyAddSampleIfNeeded("expire-cycle", elapsed); + latencyTraceIfNeeded(db, expire_cycle, elapsed); + expireCycleStartWithFields = !expireCycleStartWithFields; } /*----------------------------------------------------------------------------- @@ -818,7 +934,8 @@ void touchCommand(client *c) { addReplyLongLong(c, touched); } -/* Returns 1 if the expire value is expired, 0 otherwise. */ +/* Returns true if the provided timestamp represents an expired time, false otherwise. + * A negative value means no expiration. */ bool timestampIsExpired(mstime_t when) { if (when < 0) return false; /* no expire */ mstime_t now = commandTimeSnapshot(); diff --git a/src/expire.h b/src/expire.h index 11ef9d9c103..cc0cfbd6d49 100644 --- a/src/expire.h +++ b/src/expire.h @@ -1,9 +1,8 @@ #ifndef EXPIRE_H #define EXPIRE_H -#include #include -#include "monotonic.h" +#include "util.h" /* Special Expiry values */ #define EXPIRY_NONE -1 @@ -35,13 +34,38 @@ typedef enum { POLICY_DELETE_EXPIRED /* Delete expired keys on access. */ } expirationPolicy; +/* Types of active expiry jobs. Used to track and orchestrate + * separate expiry mechanisms within the same database. + * + * KEYS: Expiry of top-level keys via db->expires. 
+ * FIELDS: Expiry of hash fields stored in volatile sets (e.g., per-field TTLs). + * + * ACTIVE_EXPIRY_TYPE_COUNT: Number of expiry types, used for sizing arrays and iteration. */ +enum activeExpiryType { + KEYS, + FIELDS, + ACTIVE_EXPIRY_TYPE_COUNT +}; + /* Forward declarations */ typedef struct client client; typedef struct serverObject robj; +typedef struct serverDb serverDb; -bool timestampIsExpired(mstime_t when); +/* Return the relevant expiration policy based on the current server state and the provided flags. + * FLAGS can indicate either: + * EXPIRE_AVOID_DELETE_EXPIRED - indicates that the command is explicitly executed with the NO_EXPIRE flag. + * EXPIRE_FORCE_DELETE_EXPIRED - indicates that expired keys should be deleted even on a replica (for the writable replicas case) */ expirationPolicy getExpirationPolicyWithFlags(int flags); int parseExtendedExpireArgumentsOrReply(client *c, int *flags, int max_args); int convertExpireArgumentToUnixTime(client *c, robj *arg, long long basetime, int unit, long long *unixtime); +/* Handling of expired keys and hash fields */ +void activeExpireCycle(int type); +void expireReplicaKeys(void); +void rememberReplicaKeyWithExpire(serverDb *db, robj *key); +void flushReplicaKeysWithExpireList(void); +size_t getReplicaKeyWithExpireCount(void); +bool timestampIsExpired(mstime_t when); + #endif diff --git a/src/lazyfree.c b/src/lazyfree.c index 6d43c2c8f6c..904f1e8904c 100644 --- a/src/lazyfree.c +++ b/src/lazyfree.c @@ -24,10 +24,12 @@ void lazyfreeFreeObject(void *args[]) { void lazyfreeFreeDatabase(void *args[]) { kvstore *da1 = args[0]; kvstore *da2 = args[1]; + kvstore *da3 = args[2]; size_t numkeys = kvstoreSize(da1); kvstoreRelease(da1); kvstoreRelease(da2); + kvstoreRelease(da3); atomic_fetch_sub_explicit(&lazyfree_objects, numkeys, memory_order_relaxed); atomic_fetch_add_explicit(&lazyfreed_objects, numkeys, memory_order_relaxed); } @@ -192,11 +194,12 @@ void emptyDbAsync(serverDb *db) { slot_count_bits = 
CLUSTER_SLOT_MASK_BITS; flags |= KVSTORE_FREE_EMPTY_HASHTABLES; } - kvstore *oldkeys = db->keys, *oldexpires = db->expires; + kvstore *oldkeys = db->keys, *oldexpires = db->expires, *oldkeyswithexpires = db->keys_with_volatile_items; db->keys = kvstoreCreate(&kvstoreKeysHashtableType, slot_count_bits, flags); db->expires = kvstoreCreate(&kvstoreExpiresHashtableType, slot_count_bits, flags); + db->keys_with_volatile_items = kvstoreCreate(&kvstoreExpiresHashtableType, slot_count_bits, flags); atomic_fetch_add_explicit(&lazyfree_objects, kvstoreSize(oldkeys), memory_order_relaxed); - bioCreateLazyFreeJob(lazyfreeFreeDatabase, 2, oldkeys, oldexpires); + bioCreateLazyFreeJob(lazyfreeFreeDatabase, 3, oldkeys, oldexpires, oldkeyswithexpires); } /* Free the key tracking table. diff --git a/src/module.c b/src/module.c index 080eec240f5..6b415ae2766 100644 --- a/src/module.c +++ b/src/module.c @@ -5364,6 +5364,7 @@ int VM_HashSet(ValkeyModuleKey *key, int flags, ...) { decrRefCount(field); } } + dbUpdateObjectWithVolatileItemsTracking(key->db, key->value); va_end(ap); moduleDelKeyIfEmpty(key); if (count == 0) errno = ENOENT; diff --git a/src/monotonic.h b/src/monotonic.h index 2880cda858b..b465f90b109 100644 --- a/src/monotonic.h +++ b/src/monotonic.h @@ -20,8 +20,6 @@ * variable is associated with the monotonic clock and should not be confused * with other types of time.*/ typedef uint64_t monotime; -typedef long long mstime_t; /* millisecond time type. */ -typedef long long ustime_t; /* microsecond time type. */ /* Retrieve counter of micro-seconds relative to an arbitrary point in time. */ extern monotime (*getMonotonicUs)(void); diff --git a/src/object.c b/src/object.c index 144907c2015..07f647b766b 100644 --- a/src/object.c +++ b/src/object.c @@ -226,6 +226,11 @@ robj *createStringObject(const char *ptr, size_t len) { return createRawStringObject(ptr, len); } +/* Similar to createStringObject() but takes an existing SDS as input. 
*/ +robj *createStringObjectFromSds(const sds s) { + return createStringObject(s, sdslen(s)); +} + robj *createStringObjectWithKeyAndExpire(const char *ptr, size_t len, const sds key, long long expire) { /* When to embed? Embed when the sum is up to 64 bytes. There may be better * heuristics, e.g. we can look at the jemalloc sizes (16-byte intervals up diff --git a/src/rdb.c b/src/rdb.c index 6ec4e064dd7..c1ed6dc147d 100644 --- a/src/rdb.c +++ b/src/rdb.c @@ -718,7 +718,7 @@ int rdbSaveObjectType(rio *rdb, robj *o) { if (o->encoding == OBJ_ENCODING_LISTPACK) return rdbSaveType(rdb, RDB_TYPE_HASH_LISTPACK); else if (o->encoding == OBJ_ENCODING_HASHTABLE) - if (hashTypeHasVolatileElements(o)) + if (hashTypeHasVolatileFields(o)) return rdbSaveType(rdb, RDB_TYPE_HASH_2); else return rdbSaveType(rdb, RDB_TYPE_HASH); @@ -966,8 +966,8 @@ ssize_t rdbSaveObject(rio *rdb, robj *o, robj *key, int dbid) { return -1; } nwritten += n; - /* check if need to add expired time for the hash elements */ - bool add_expiry = hashTypeHasVolatileElements(o); + /* check if need to add expired time for the hash fields */ + bool add_expiry = hashTypeHasVolatileFields(o); hashtableIterator iter; hashtableInitIterator(&iter, ht, HASHTABLE_ITER_SKIP_VALIDATION); void *next; diff --git a/src/server.c b/src/server.c index 75495ab80ec..ff3d0a31471 100644 --- a/src/server.c +++ b/src/server.c @@ -316,22 +316,6 @@ void serverLogFromHandler(int level, const char *fmt, ...) { serverLogRawFromHandler(level, msg); } -/* Return the UNIX time in microseconds */ -long long ustime(void) { - struct timeval tv; - long long ust; - - gettimeofday(&tv, NULL); - ust = ((long long)tv.tv_sec) * 1000000; - ust += tv.tv_usec; - return ust; -} - -/* Return the UNIX time in milliseconds */ -mstime_t mstime(void) { - return ustime() / 1000; -} - /* Return the command time snapshot in milliseconds. 
* The time the command started is the logical time it runs, * and all the time readings during the execution time should @@ -2712,7 +2696,9 @@ void resetServerStats(void) { server.stat_numcommands = 0; server.stat_numconnections = 0; server.stat_expiredkeys = 0; - server.stat_expired_stale_perc = 0; + server.stat_expiredfields = 0; + server.stat_expired_keys_stale_perc = 0; + server.stat_expired_keys_with_vola_stale_perc = 0; server.stat_expired_time_cap_reached_count = 0; server.stat_expire_cycle_time_used = 0; server.stat_evictedkeys = 0; @@ -2803,13 +2789,13 @@ serverDb *createDatabase(int id) { serverDb *db = zmalloc(sizeof(serverDb)); db->keys = kvstoreCreate(&kvstoreKeysHashtableType, slot_count_bits, flags); db->expires = kvstoreCreate(&kvstoreExpiresHashtableType, slot_count_bits, flags); - db->expires_cursor = 0; + db->keys_with_volatile_items = kvstoreCreate(&kvstoreExpiresHashtableType, slot_count_bits, flags); db->blocking_keys = dictCreate(&keylistDictType); db->blocking_keys_unblock_on_nokey = dictCreate(&objectKeyPointerValueDictType); db->ready_keys = dictCreate(&objectKeyPointerValueDictType); db->watched_keys = dictCreate(&keylistDictType); db->id = id; - db->avg_ttl = 0; + resetDbExpiryState(db); return db; } @@ -6084,7 +6070,9 @@ sds genValkeyInfoString(dict *section_dict, int all_sections, int everything) { "sync_partial_ok:%lld\r\n", server.stat_sync_partial_ok, "sync_partial_err:%lld\r\n", server.stat_sync_partial_err, "expired_keys:%lld\r\n", server.stat_expiredkeys, - "expired_stale_perc:%.2f\r\n", server.stat_expired_stale_perc * 100, + "expired_fields:%lld\r\n", server.stat_expiredfields, + "expired_stale_perc:%.2f\r\n", server.stat_expired_keys_stale_perc * 100, + "expired_keys_with_volatile_items_stale_perc:%.2f\r\n", server.stat_expired_keys_with_vola_stale_perc * 100, "expired_time_cap_reached_count:%lld\r\n", server.stat_expired_time_cap_reached_count, "expire_cycle_cpu_milliseconds:%lld\r\n", server.stat_expire_cycle_time_used / 
1000, "evicted_keys:%lld\r\n", server.stat_evictedkeys, @@ -6341,13 +6329,15 @@ sds genValkeyInfoString(dict *section_dict, int all_sections, int everything) { for (j = 0; j < server.dbnum; j++) { serverDb *db = server.db[j]; if (db == NULL) continue; - long long keys, vkeys; + long long keys, vkeys, keysvitems; keys = kvstoreSize(db->keys); vkeys = kvstoreSize(db->expires); + keysvitems = kvstoreSize(db->keys_with_volatile_items); + if (keys || vkeys) { - info = sdscatprintf(info, "db%d:keys=%lld,expires=%lld,avg_ttl=%lld\r\n", j, keys, vkeys, - db->avg_ttl); + info = sdscatprintf(info, "db%d:keys=%lld,expires=%lld,avg_ttl=%lld,keys_with_volatile_items=%lld\r\n", j, keys, vkeys, + db->expiry[KEYS].avg_ttl, keysvitems); } } } diff --git a/src/server.h b/src/server.h index d63c5eaf20d..a76a80b1f0c 100644 --- a/src/server.h +++ b/src/server.h @@ -868,9 +868,9 @@ typedef struct replBufBlock { * by integers from 0 (the default database) up to the max configured * database. The database number is the 'id' field in the structure. */ typedef struct serverDb { - kvstore *keys; /* The keyspace for this DB */ - kvstore *expires; /* Timeout of keys with a timeout set */ - kvstore *object_with_volatile_elements; + kvstore *keys; /* The keyspace for this DB */ + kvstore *expires; /* Timeout of keys with a timeout set */ + kvstore *keys_with_volatile_items; /* Keys with volatile items */ dict *blocking_keys; /* Keys with clients waiting for data (BLPOP)*/ dict *blocking_keys_unblock_on_nokey; /* Keys with clients waiting for * data, and should be unblocked if key is deleted (XREADEDGROUP). @@ -878,8 +878,10 @@ typedef struct serverDb { dict *ready_keys; /* Blocked keys that received a PUSH */ dict *watched_keys; /* WATCHED keys for MULTI/EXEC CAS */ int id; /* Database ID */ - long long avg_ttl; /* Average TTL, just for stats */ - unsigned long expires_cursor; /* Cursor of the active expire cycle. 
*/ + struct { + long long avg_ttl; /* Average TTL, just for stats */ + unsigned long cursor; /* Cursor of the active expire cycle. */ + } expiry[ACTIVE_EXPIRY_TYPE_COUNT]; } serverDb; /* forward declaration for functions ctx */ @@ -1626,6 +1628,7 @@ typedef enum childInfoType { CHILD_INFO_TYPE_RDB_COW_SIZE, CHILD_INFO_TYPE_MODULE_COW_SIZE } childInfoType; + struct valkeyServer { /* General */ pid_t pid; /* Main process pid. */ @@ -1752,7 +1755,9 @@ struct valkeyServer { long long stat_numcommands; /* Number of processed commands */ long long stat_numconnections; /* Number of connections received */ long long stat_expiredkeys; /* Number of expired keys */ - double stat_expired_stale_perc; /* Percentage of keys probably expired */ + long long stat_expiredfields; /* Number of expired hash fields */ + double stat_expired_keys_stale_perc; /* Percentage of keys probably expired */ + double stat_expired_keys_with_vola_stale_perc; /* Percentage of keys with volatile items probably expired */ long long stat_expired_time_cap_reached_count; /* Early expire cycle stops.*/ long long stat_expire_cycle_time_used; /* Cumulative microseconds used. 
*/ long long stat_evictedkeys; /* Number of evicted keys (maxmemory) */ @@ -2673,8 +2678,6 @@ extern dict *modules; void populateCommandLegacyRangeSpec(struct serverCommand *c); /* Utils */ -long long ustime(void); -mstime_t mstime(void); mstime_t commandTimeSnapshot(void); uint64_t crc64(uint64_t crc, const unsigned char *s, uint64_t l); void exitFromChild(int retcode); @@ -2954,6 +2957,7 @@ void dismissObject(robj *o, size_t dump_size); robj *createObject(int type, void *ptr); void initObjectLRUOrLFU(robj *o); robj *createStringObject(const char *ptr, size_t len); +robj *createStringObjectFromSds(const sds s); robj *createRawStringObject(const char *ptr, size_t len); robj *tryCreateRawStringObject(const char *ptr, size_t len); robj *tryCreateStringObject(const char *ptr, size_t len); @@ -3315,6 +3319,7 @@ void checkChildrenDone(void); int setOOMScoreAdj(int process_class); void rejectCommandFormat(client *c, const char *fmt, ...); void *activeDefragAlloc(void *ptr); +sds activeDefragSds(sds sdsptr); robj *activeDefragStringOb(robj *ob); void dismissSds(sds s); void dismissMemoryInChild(void); @@ -3359,10 +3364,10 @@ robj *setTypeDup(robj *o); #define HASH_SET_COPY 0 -void hashTypeFreeVolatileSet(robj *o); -void hashTypeTrackEntry(robj *o, void *entry); -void hashTypeUntrackEntry(robj *o, void *entry); -void hashTypeTrackUpdateEntry(robj *o, void *old_entry, void *new_entry, long long old_expiry, long long new_expiry); +void hashTypeFreeVolatileSet(robj *o); /* needed only for freeHashObject */ +void hashTypeTrackEntry(robj *o, void *entry); /* needed only for rdbLoadObject */ +size_t hashTypeScanDefrag(robj *ob, size_t cursor, void *(*defragAlloc)(void *)); +size_t hashTypeDeleteExpiredFields(robj *o, mstime_t now, unsigned long max_fields, robj **out_fields); void hashTypeConvert(robj *o, int enc); void hashTypeTryConversion(robj *subject, robj **argv, int start, int end); @@ -3384,8 +3389,7 @@ robj *hashTypeLookupWriteOrCreate(client *c, robj *key); robj 
*hashTypeGetValueObject(robj *o, sds field); int hashTypeSet(robj *o, sds field, sds value, long long expiry, int flags); robj *hashTypeDup(robj *o); -bool hashTypeHasVolatileElements(robj *o); -size_t hashTypeNumVolatileElements(robj *o); +bool hashTypeHasVolatileFields(robj *o); /* Pub / Sub */ int pubsubUnsubscribeAllChannels(client *c, int notify); @@ -3500,6 +3504,7 @@ int removeExpire(serverDb *db, robj *key); void deleteExpiredKeyAndPropagate(serverDb *db, robj *keyobj); void deleteExpiredKeyFromOverwriteAndPropagate(client *c, robj *keyobj); void propagateDeletion(serverDb *db, robj *key, int lazy); +size_t dbReclaimExpiredFields(robj *o, serverDb *db, mstime_t now, unsigned long max_entries); int keyIsExpired(serverDb *db, robj *key); long long getExpire(serverDb *db, robj *key); robj *setExpire(client *c, serverDb *db, robj *key, long long when); @@ -3543,6 +3548,7 @@ robj *dbUnshareStringValue(serverDb *db, robj *key, robj *o); #define EMPTYDB_NOFUNCTIONS (1 << 1) /* Indicate not to flush the functions. 
*/ long long emptyData(int dbnum, int flags, void(callback)(hashtable *)); long long emptyDbStructure(serverDb **dbarray, int dbnum, int async, void(callback)(hashtable *)); +void resetDbExpiryState(serverDb *db); void flushAllDataAndResetRDB(int flags); long long dbTotalServerKeyCount(void); serverDb *initTempDb(int id); @@ -3559,6 +3565,9 @@ size_t lazyfreeGetFreedObjectsCount(void); void lazyfreeResetStats(void); void freeObjAsync(robj *key, robj *obj, int dbid); void freeReplicationBacklogRefMemAsync(list *blocks, rax *index); +void dbUntrackKeyWithVolatileItems(serverDb *db, robj *o); +void dbTrackKeyWithVolatileItems(serverDb *db, robj *o); +void dbUpdateObjectWithVolatileItemsTracking(serverDb *db, robj *o); /* API to get key arguments from commands */ #define GET_KEYSPEC_DEFAULT 0 @@ -3665,13 +3674,6 @@ void removeClientFromTimeoutTable(client *c); void handleBlockedClientsTimeout(void); int clientsCronHandleTimeout(client *c, mstime_t now_ms); -/* expire.c -- Handling of expired keys */ -void activeExpireCycle(int type); -void expireReplicaKeys(void); -void rememberReplicaKeyWithExpire(serverDb *db, robj *key); -void flushReplicaKeysWithExpireList(void); -size_t getReplicaKeyWithExpireCount(void); - /* evict.c -- maxmemory handling and LRU eviction. */ void evictionPoolAlloc(void); #define LFU_INIT_VAL 5 diff --git a/src/t_hash.c b/src/t_hash.c index b529355ff2d..f0f363075f7 100644 --- a/src/t_hash.c +++ b/src/t_hash.c @@ -62,7 +62,9 @@ static vset *hashTypeGetVolatileSet(robj *o) { return vsetIsValid(set) ? 
set : NULL; } -bool hashTypeHasVolatileElements(robj *o) { +bool hashTypeHasVolatileFields(robj *o) { + if (o == NULL) return false; + serverAssert(o->type == OBJ_HASH); if (o->encoding == OBJ_ENCODING_HASHTABLE) { vset *set = hashTypeGetVolatileSet(o); if (set && !vsetIsEmpty(set)) @@ -102,11 +104,17 @@ void hashTypeFreeVolatileSet(robj *o) { } void hashTypeTrackEntry(robj *o, void *entry) { - vset *set = hashTypeGetOrcreateVolatileSet(o); - serverAssert(vsetAddEntry(set, entryGetExpiry, entry)); + vset *set; + if (hashTypeHasVolatileFields(o)) { + set = hashTypeGetVolatileSet(o); + } else { + set = hashTypeGetOrcreateVolatileSet(o); + } + bool added = vsetAddEntry(set, entryGetExpiry, entry); + serverAssert(added); } -void hashTypeUntrackEntry(robj *o, void *entry) { +static void hashTypeUntrackEntry(robj *o, void *entry) { if (!entryHasExpiry(entry)) return; vset *set = hashTypeGetVolatileSet(o); debugServerAssert(set); @@ -116,7 +124,7 @@ void hashTypeUntrackEntry(robj *o, void *entry) { } } -void hashTypeTrackUpdateEntry(robj *o, void *old_entry, void *new_entry, long long old_expiry, long long new_expiry) { +static void hashTypeTrackUpdateEntry(robj *o, void *old_entry, void *new_entry, long long old_expiry, long long new_expiry) { int old_tracked = (old_entry && old_expiry != EXPIRY_NONE); int new_tracked = (new_entry && new_expiry != EXPIRY_NONE); /* If entry was not tracked before and not going to be tracked now, we can simply return */ @@ -398,7 +406,9 @@ int hashTypeSet(robj *o, sds field, sds value, long long expiry, int flags) { int replaced = hashtableReplaceReallocatedEntry(ht, existing, new_entry); serverAssert(replaced); } + hashTypeTrackUpdateEntry(o, existing, new_entry, entry_expiry, expiry); + /* since we are exposed to expired entries, we must NOT reflect them as being "updated" */ update = is_expired ? 
0 : 1; } @@ -1007,20 +1017,27 @@ void hmgetCommand(client *c) { void hdelCommand(client *c) { robj *o; - int j, deleted = 0, keyremoved = 0; + int j, deleted = 0; + bool keyremoved = false; if ((o = lookupKeyWriteOrReply(c, c->argv[1], shared.czero)) == NULL || checkType(c, o, OBJ_HASH)) return; + + bool hash_volatile_items = hashTypeHasVolatileFields(o); for (j = 2; j < c->argc; j++) { if (hashTypeDelete(o, c->argv[j]->ptr)) { deleted++; if (hashTypeLength(o) == 0) { - dbDelete(c->db, c->argv[1]); - keyremoved = 1; + if (hash_volatile_items) dbUntrackKeyWithVolatileItems(c->db, o); + dbDelete(c->db, c->argv[1]); /* Please note that this will also remove the tracking from the kvstore */ + keyremoved = true; break; } } } if (deleted) { + if (!keyremoved && hash_volatile_items != hashTypeHasVolatileFields(o)) { + dbUpdateObjectWithVolatileItemsTracking(c->db, o); + } signalModifiedKey(c, c->db, c->argv[1]); notifyKeyspaceEvent(NOTIFY_HASH, "hdel", c->argv[1], c->db->id); if (keyremoved) notifyKeyspaceEvent(NOTIFY_GENERIC, "del", c->argv[1], c->db->id); @@ -1070,7 +1087,11 @@ void hsetnxCommand(client *c) { addReply(c, shared.czero); } else { hashTypeTryConversion(o, c->argv, 2, 3); + bool has_volatile_fields = hashTypeHasVolatileFields(o); hashTypeSet(o, c->argv[2]->ptr, c->argv[3]->ptr, EXPIRY_NONE, HASH_SET_COPY | HASH_SET_KEEP_EXPIRY); + if (has_volatile_fields != hashTypeHasVolatileFields(o)) { + dbUpdateObjectWithVolatileItemsTracking(c->db, o); + } signalModifiedKey(c, c->db, c->argv[1]); notifyKeyspaceEvent(NOTIFY_HASH, "hset", c->argv[1], c->db->id); server.dirty++; @@ -1089,9 +1110,13 @@ void hsetCommand(client *c) { if ((o = hashTypeLookupWriteOrCreate(c, c->argv[1])) == NULL) return; hashTypeTryConversion(o, c->argv, 2, c->argc - 1); - - for (i = 2; i < c->argc; i += 2) created += !hashTypeSet(o, c->argv[i]->ptr, c->argv[i + 1]->ptr, EXPIRY_NONE, HASH_SET_COPY); - + bool has_volatile_fields = hashTypeHasVolatileFields(o); + for (i = 2; i < c->argc; i += 2) 
{ + created += !hashTypeSet(o, c->argv[i]->ptr, c->argv[i + 1]->ptr, EXPIRY_NONE, HASH_SET_COPY); + } + if (has_volatile_fields != hashTypeHasVolatileFields(o)) { + dbUpdateObjectWithVolatileItemsTracking(c->db, o); + } signalModifiedKey(c, c->db, c->argv[1]); notifyKeyspaceEvent(NOTIFY_HASH, "hset", c->argv[1], c->db->id); server.dirty += (c->argc - 2) / 2; @@ -1189,6 +1214,8 @@ void hsetexCommand(client *c) { dbAdd(c->db, c->argv[1], &o); } + bool has_volatile_fields = hashTypeHasVolatileFields(o); + /* Handle parsing and calculating the expiration time. */ if (flags & ARGS_KEEPTTL) set_flags |= HASH_SET_KEEP_EXPIRY; @@ -1238,6 +1265,9 @@ void hsetexCommand(client *c) { if (changes) { + if (has_volatile_fields != hashTypeHasVolatileFields(o)) { + dbUpdateObjectWithVolatileItemsTracking(c->db, o); + } notifyKeyspaceEvent(NOTIFY_HASH, "hset", c->argv[1], c->db->id); if (set_expired) { replaceClientCommandVector(c, new_argc, new_argv); @@ -1345,6 +1375,9 @@ void hgetexCommand(client *c) { if ((o = lookupKeyReadOrReply(c, c->argv[1], shared.null[c->resp])) == NULL || checkType(c, o, OBJ_HASH)) return; + /* Check if the hash object has volatile fields, used for active-expiry tracking */ + bool has_volatile_fields = hashTypeHasVolatileFields(o); + /* Handle parsing and calculating the expiration time. 
*/ if (flags & ARGS_PERSIST) { persist = 1; @@ -1432,6 +1465,10 @@ void hgetexCommand(client *c) { server.dirty += changes; signalModifiedKey(c, c->db, c->argv[1]); + if (has_volatile_fields != hashTypeHasVolatileFields(o)) { + dbUpdateObjectWithVolatileItemsTracking(c->db, o); + } + /* Delete the object in case it was left empty */ if (hashTypeLength(o) == 0) { dbDelete(c->db, c->argv[1]); @@ -1601,6 +1638,9 @@ void hexpireGenericCommand(client *c, long long basetime, int unit) { if (checkType(c, obj, OBJ_HASH)) { return; } + + bool has_volatile_fields = hashTypeHasVolatileFields(obj); + /* From this point we would return array reply */ addReplyArrayLen(c, num_fields); @@ -1631,6 +1671,9 @@ void hexpireGenericCommand(client *c, long long basetime, int unit) { } if (expired || updated) { + if (has_volatile_fields != hashTypeHasVolatileFields(obj)) { + dbUpdateObjectWithVolatileItemsTracking(c->db, obj); + } if (expired) { replaceClientCommandVector(c, new_argc, new_argv); /* We would like to reduce the number of hexpired events in case there are potential many expired fields. */ @@ -1712,6 +1755,8 @@ void hpersistCommand(client *c) { if (checkType(c, hash, OBJ_HASH)) return; + bool has_volatile_fields = hashTypeHasVolatileFields(hash); + for (int i = 0; i < num_fields; i++, fields_index++) { result = hashTypePersist(hash, c->argv[fields_index]->ptr); if (result == EXPIRATION_MODIFICATION_SUCCESSFUL) { @@ -1721,6 +1766,9 @@ void hpersistCommand(client *c) { addReplyLongLong(c, result); } if (changes) { + if (has_volatile_fields != hashTypeHasVolatileFields(hash)) { + dbUpdateObjectWithVolatileItemsTracking(c->db, hash); + } notifyKeyspaceEvent(NOTIFY_HASH, "hpersist", c->argv[1], c->db->id); signalModifiedKey(c, c->db, c->argv[1]); } @@ -2045,3 +2093,125 @@ void hrandfieldCommand(client *c) { hashTypeRandomElement(hash, hashTypeLength(hash), &ele, NULL); hashReplyFromListpackEntry(c, &ele); } + +/* Context structure for tracking expiry operations on hash fields. 
*/ +typedef struct { + robj *key; /* the hash object */ + unsigned long n_fields; /* number of entries processed */ + robj **fields; /* array of expired entries to replicate later */ +} expiryContext; + +/* Callback for popping expired entries from the volatile set. + * Deletes the entry from the hash table and tracks it in the expiry context. + * Returns 1 if deleted, 0 if nothing to do. */ +static int hashTypeExpireEntry(void *entry, void *c) { + expiryContext *ctx = c; + robj *o = ctx->key; + serverAssert(o->encoding == OBJ_ENCODING_HASHTABLE); + + hashtable *ht = o->ptr; + void *entry_ptr = NULL; + int deleted = hashtablePop(ht, entry, &entry_ptr); + + if (deleted) { + if (ctx->fields) + ctx->fields[ctx->n_fields++] = createStringObjectFromSds(entryGetField(entry)); + server.stat_expiredfields++; + entryFree(entry); + return 1; + } + return 0; +} + +/* Extract expired entries from a hash object's volatile set. + * Returns number of expired entries, populates `out_entries`. */ +size_t hashTypeDeleteExpiredFields(robj *o, mstime_t now, unsigned long max_fields, robj **out_entries) { + serverAssert(o->encoding == OBJ_ENCODING_HASHTABLE); + + /* skip TTL checks temporarily (to allow hashtable lookup) */ + hashTypeIgnoreTTL(o, 1); + + vset *vset = hashTypeGetVolatileSet(o); + if (!vset || vsetIsEmpty(vset)) { + hashTypeIgnoreTTL(o, 0); + return 0; + } + + expiryContext ctx = {.key = o, .fields = out_entries, .n_fields = 0}; + size_t expired = vsetRemoveExpired(vset, entryGetExpiry, hashTypeExpireEntry, now, max_fields, &ctx); + serverAssert(ctx.n_fields <= max_fields); + hashTypeIgnoreTTL(o, 0); + if (!hashTypeHasVolatileFields(o)) { + hashTypeFreeVolatileSet(o); + } + return expired; +} + +/* Hashtable scan callback for hash datatype */ +static void defragHashTypeEntry(void *privdata, void *element_ref) { + entry **entry_ref = (entry **)element_ref; + entry *old_entry = *entry_ref; + + entry *new_entry = entryDefrag(old_entry, activeDefragAlloc, activeDefragSds); + 
if (new_entry) { + long long expiry = entryGetExpiry(new_entry); + /* In case the entry is tracked we need to update it in the volatile set */ + if (expiry != EXPIRY_NONE) { + // We don't need to pass the db because db-level tracking isn't going to change for this update. + hashTypeTrackUpdateEntry(privdata, old_entry, new_entry, expiry, expiry); + } + *entry_ref = new_entry; + } +} + +size_t hashTypeScanDefrag(robj *ob, size_t cursor, void *(*defragAllocfn)(void *)) { + if (ob->encoding == OBJ_ENCODING_LISTPACK) { + unsigned char *newzl; + if ((newzl = activeDefragAlloc(ob->ptr))) ob->ptr = newzl; + return 0; + } + serverAssert(ob->encoding == OBJ_ENCODING_HASHTABLE); + static struct volatileSetCursor { + size_t cursor; + bool is_vsetDefrag; + } volaSetIter; + static struct volatileSetCursor *vset_cursor = NULL; + + vset_cursor = (struct volatileSetCursor *)cursor; + + if (!vset_cursor) { + /* New object scan */ + hashtable *ht = ob->ptr; + /* defrag the hashtable struct and tables */ + hashtable *new_hashtable = hashtableDefragTables(ht, defragAllocfn); + if (new_hashtable) ob->ptr = new_hashtable; + vset_cursor = &volaSetIter; + vset_cursor->cursor = 0; + vset_cursor->is_vsetDefrag = false; + } + + if (!vset_cursor->is_vsetDefrag) { + hashtable *ht = ob->ptr; + vset_cursor->cursor = hashtableScanDefrag(ht, vset_cursor->cursor, defragHashTypeEntry, ob, + defragAllocfn, + HASHTABLE_SCAN_EMIT_REF); + if (vset_cursor->cursor == 0) { + if (hashTypeHasVolatileFields(ob)) { + /* We're done scanning the hash table, continue to defrag the volatile set only if there's one. */ + vset_cursor->is_vsetDefrag = true; + } else { + /* We're done with this object. */ + return 0; + } + } + } else { + /* We're already defragging the volatile set. */ + vset *vset = hashTypeGetVolatileSet(ob); + vset_cursor->cursor = vsetScanDefrag(vset, vset_cursor->cursor, activeDefragAlloc); + if (vset_cursor->cursor == 0) { + /* We're done with this hash object. 
*/ + return 0; + } + } + return (long)vset_cursor; +} diff --git a/src/trace/README.md b/src/trace/README.md index b5049c9dd54..7b4b2c6e8c9 100644 --- a/src/trace/README.md +++ b/src/trace/README.md @@ -98,6 +98,8 @@ Generally valkey-server would not run in full utilization, the overhead is accep | eviction_lazyfree | valkey_db | | eviction_cycle | valkey_db | | expire_cycle | valkey_db | +| expire_cycle_fields | valkey_db | +| expire_cycle_keys | valkey_db | | cluster_config_open | valkey_cluster | | cluster_config_write | valkey_cluster | | cluster_config_fsync | valkey_cluster | diff --git a/src/trace/trace_db.h b/src/trace/trace_db.h index f6abefe14c1..a3264cc3f28 100644 --- a/src/trace/trace_db.h +++ b/src/trace/trace_db.h @@ -105,6 +105,26 @@ LTTNG_UST_TRACEPOINT_EVENT_INSTANCE( ) ) +LTTNG_UST_TRACEPOINT_EVENT_INSTANCE( + /* Name of the tracepoint class provider */ + valkey_db, valkey_db_class, valkey_db, expire_cycle_keys, + + /* List of tracepoint arguments (input) */ + LTTNG_UST_TP_ARGS( + uint64_t, duration + ) +) + +LTTNG_UST_TRACEPOINT_EVENT_INSTANCE( + /* Name of the tracepoint class provider */ + valkey_db, valkey_db_class, valkey_db, expire_cycle_fields, + + /* List of tracepoint arguments (input) */ + LTTNG_UST_TP_ARGS( + uint64_t, duration + ) +) + #define valkey_db_trace(...) 
lttng_ust_tracepoint(__VA_ARGS__) #endif /* __VALKEY_TRACE_DB_H__ */ diff --git a/src/unit/test_quicklist.c b/src/unit/test_quicklist.c index 63a97fedb98..7124e5f3451 100644 --- a/src/unit/test_quicklist.c +++ b/src/unit/test_quicklist.c @@ -21,21 +21,6 @@ static unsigned int err = 0; /*----------------------------------------------------------------------------- * Unit Function *----------------------------------------------------------------------------*/ -/* Return the UNIX time in microseconds */ -static long long ustime(void) { - struct timeval tv; - long long ust; - - gettimeofday(&tv, NULL); - ust = ((long long)tv.tv_sec) * 1000000; - ust += tv.tv_usec; - return ust; -} - -/* Return the UNIX time in milliseconds */ -static long long mstime(void) { - return ustime() / 1000; -} /* Generate new string concatenating integer i against string 'prefix' */ static char *genstr(char *prefix, int i) { diff --git a/src/unit/test_vset.c b/src/unit/test_vset.c index f8646875586..504ac465910 100644 --- a/src/unit/test_vset.c +++ b/src/unit/test_vset.c @@ -2,6 +2,7 @@ #include "../entry.h" #include "test_help.h" #include "../zmalloc.h" +#include "../allocator_defrag.h" #include #include @@ -407,19 +408,10 @@ void *mock_defragfn(void *ptr) { return newptr; } -int mock_defrag_rax_node(raxNode **noderef) { - raxNode *newnode = mock_defragfn(*noderef); - if (newnode) { - *noderef = newnode; - return 1; - } - return 0; -} - size_t defrag_vset(vset *set, size_t cursor, size_t steps) { if (steps == 0) steps = ULONG_MAX; do { - cursor = vsetScanDefrag(set, cursor, mock_defragfn, mock_defrag_rax_node); + cursor = vsetScanDefrag(set, cursor, mock_defragfn); steps--; } while (cursor != 0 && steps > 0); return cursor; @@ -439,6 +431,7 @@ int test_vset_defrag(int argc, char **argv, int flags) { UNUSED(argc); UNUSED(argv); UNUSED(flags); + allocatorDefragInit(); srand(time(NULL)); vset set; diff --git a/src/util.c b/src/util.c index 0e93bbc7a18..0631736c86d 100644 --- a/src/util.c +++ 
b/src/util.c @@ -52,7 +52,6 @@ #include "config.h" #include "zmalloc.h" #include "serverassert.h" - #include "valkey_strtod.h" #if HAVE_X86_SIMD @@ -1569,6 +1568,22 @@ int snprintf_async_signal_safe(char *to, size_t n, const char *fmt, ...) { return result; } +/* Return the UNIX time in microseconds */ +long long ustime(void) { + struct timeval tv; + long long ust; + + gettimeofday(&tv, NULL); + ust = ((long long)tv.tv_sec) * 1000000; + ust += tv.tv_usec; + return ust; +} + +/* Return the UNIX time in milliseconds */ +mstime_t mstime(void) { + return ustime() / 1000; +} + /* Writes a pointer into an 8 bytes field, padding with zeros on 32bit targets * to ensure a consistent fixed width encoding. */ void writePointerWithPadding(unsigned char *buf, const void *ptr) { diff --git a/src/util.h b/src/util.h index db15f2d9003..bdf6909e657 100644 --- a/src/util.h +++ b/src/util.h @@ -68,6 +68,9 @@ typedef enum { LD_STR_HEX /* %La */ } ld2string_mode; +typedef long long mstime_t; /* millisecond time type. */ +typedef long long ustime_t; /* microsecond time type. 
*/ + int stringmatchlen(const char *p, int plen, const char *s, int slen, int nocase); int stringmatch(const char *p, const char *s, int nocase); int stringmatchlen_fuzz_test(void); @@ -114,6 +117,8 @@ void getRandomSeedCString(char *buff, size_t len); void setRandomSeedCString(char *seed_str, size_t len); void getRandomHexChars(char *p, size_t len); void getRandomBytes(unsigned char *p, size_t len); +long long ustime(void); +mstime_t mstime(void); void writePointerWithPadding(unsigned char *buf, const void *ptr); #endif diff --git a/src/vset.c b/src/vset.c index 4a5bc144184..ad242cbd3fa 100644 --- a/src/vset.c +++ b/src/vset.c @@ -5,6 +5,7 @@ #include "hashtable.h" #include "util.h" #include "zmalloc.h" +#include "server.h" // for activeDefragAlloc #include #include @@ -762,6 +763,7 @@ static inline void *vsetBucketSingle(vsetBucket *b) { } static inline vsetBucket *vsetBucketFromRawPtr(void *ptr, int type) { + assert(ptr != NULL); uintptr_t p = (uintptr_t)ptr; return (vsetBucket *)(p | (type & VSET_TAG_MASK)); } @@ -1430,9 +1432,12 @@ static inline size_t vsetBucketRemoveExpired_VECTOR(vsetBucket **bucket, vsetGet break; if (expiryFunc) expiryFunc(entry, ctx); } - pVector *new_pv = pvSplit(&pv, i); - *bucket = (new_pv ? vsetBucketFromVector(new_pv) : vsetBucketFromNone()); - pvFree(pv); + /* If no expiry occurred, no need to split. */ + if (i > 0) { + pVector *new_pv = pvSplit(&pv, i); + *bucket = (new_pv ? vsetBucketFromVector(new_pv) : vsetBucketFromNone()); + pvFree(pv); + } return i; } @@ -1858,18 +1863,9 @@ static inline vsetBucket *vsetBucketUpdateEntry_HASHTABLE(vsetBucket *bucket, vs if (old_entry == new_entry) return bucket; - hashtablePosition pos; hashtable *ht = vsetBucketHashtable(bucket); - /* We do a two stage pop in order to avoid rehashing. */ - void **ref = hashtableTwoPhasePopFindRef(ht, old_entry, &pos); - if (!ref) { - /* In case no entry found, the rehashing did not pause, so it is safe to return. 
*/ - return vsetBucketFromNone(); - } else { - /* We know for sure the two entries are not the same, so it is safe to add the new and remove the old */ - assert(hashtableAdd(ht, new_entry)); - hashtableTwoPhasePopDelete(ht, &pos); - } + hashtableDelete(ht, old_entry); + assert(hashtableAdd(ht, new_entry)); return bucket; } @@ -2085,7 +2081,7 @@ long long vsetEstimatedEarliestExpiry(vset *set, vsetGetExpiryFunc getExpiry) { return -1; break; case VSET_BUCKET_RAX: { - rax *r = vsetBucketRax(set); + rax *r = vsetBucketRax(*set); raxIterator it; raxStart(&it, r); expiry = decodeExpiryKey(it.key); @@ -2376,7 +2372,18 @@ static size_t vsetBucketDefrag_RAX(vsetBucket **bucket, size_t cursor, void *(*d return (size_t)state; } -size_t vsetScanDefrag(vset *set, size_t cursor, void *(*defragfn)(void *), int (*defragRaxNode)(raxNode **)) { +/* Defrag callback for radix tree iterator, called for each node, + * used in order to defrag the nodes allocations. */ +static int defragRaxNode(raxNode **noderef) { + raxNode *newnode = activeDefragAlloc(*noderef); + if (newnode) { + *noderef = newnode; + return 1; + } + return 0; +} + +size_t vsetScanDefrag(vset *set, size_t cursor, void *(*defragfn)(void *)) { switch (vsetBucketType(*set)) { case VSET_BUCKET_NONE: case VSET_BUCKET_SINGLE: diff --git a/src/vset.h b/src/vset.h index 7349aa46ed1..8c1074c2cb4 100644 --- a/src/vset.h +++ b/src/vset.h @@ -6,8 +6,7 @@ #include "hashtable.h" #include "rax.h" -#include "sds.h" -#include "monotonic.h" /* for mstime_t*/ +#include "util.h" /* *----------------------------------------------------------------------------- @@ -71,7 +70,7 @@ /* Return the absolute expiration time in milliseconds for the provided entry */ typedef long long (*vsetGetExpiryFunc)(const void *entry); -/* Callback to be optionally provided to vsetPopExpired. when item is removed from the vset this callback will also be applied. */ +/* Callback to be optionally provided to vsetRemoveExpired. 
When an item is removed from the vset, this callback is also applied. */ typedef int (*vsetExpiryFunc)(void *entry, void *ctx); // vset is just a pointer to a bucket typedef void *vset; @@ -92,6 +91,6 @@ bool vsetIsValid(vset *set); long long vsetEstimatedEarliestExpiry(vset *set, vsetGetExpiryFunc getExpiry); size_t vsetRemoveExpired(vset *set, vsetGetExpiryFunc getExpiry, vsetExpiryFunc expiryFunc, mstime_t now, size_t max_count, void *ctx); size_t vsetMemUsage(vset *set); -size_t vsetScanDefrag(vset *set, size_t cursor, void *(*defragfn)(void *), int (*defragRaxNode)(raxNode **)); +size_t vsetScanDefrag(vset *set, size_t cursor, void *(*defragfn)(void *)); #endif diff --git a/tests/unit/hashexpire.tcl b/tests/unit/hashexpire.tcl index c8989dace11..0cd4ac5a75d 100644 --- a/tests/unit/hashexpire.tcl +++ b/tests/unit/hashexpire.tcl @@ -1,4 +1,3 @@ - proc info_field {info field} { foreach line [split $info "\n"] { if {[string match "$field:*" $line]} { @@ -8,6 +7,35 @@ proc info_field {info field} { return [s field_name] } +proc get_keys_with_volatile_items {r} { + set line [$r info keyspace] + set match [regexp -inline {keys_with_volatile_items=([\d]+)} $line] + + if {[llength $match] == 2} { + return [lindex $match 1] + } else { + return 0 + } +} + +proc get_keys {r} { + set line [$r info keyspace] + set match [regexp -inline {keys=([\d]+)} $line] + + if {[llength $match] == 2} { + return [lindex $match 1] + } else { + return 0 + } +} + +proc check_myhash_and_expired_subkeys {r myhash expected_len initial_expired expected_increment} { + expr { + [$r HLEN $myhash] == $expected_len && + [info_field [$r info stats] expired_fields] == ($initial_expired + $expected_increment) + } +} + proc get_short_expire_value {command} { expr { ($command eq "HEXPIRE" || $command eq "EX") ?
1 : @@ -72,8 +100,8 @@ proc setup_replication_test {primary replica primary_host primary_port} { } else { fail "Can't turn the instance into a replica" } - set primary_initial_expired [info_field [$primary info stats] expired_subkeys] - set replica_initial_expired [info_field [$replica info stats] expired_subkeys] + set primary_initial_expired [info_field [$primary info stats] expired_fields] + set replica_initial_expired [info_field [$replica info stats] expired_fields] return [list $primary_initial_expired $replica_initial_expired] } @@ -84,6 +112,13 @@ proc setup_single_keyspace_notification {r} { return $rd } +proc wait_for_active_expiry {r key expected_len initial_expired expected_increment {timeout 100} {interval 100}} { + wait_for_condition $timeout $interval { + [check_myhash_and_expired_subkeys $r $key $expected_len $initial_expired $expected_increment] + } else { + fail "Active expiry did not occur as expected" + } +} start_server {tags {"hashexpire"}} { ####### Valid scenarios tests ####### @@ -225,6 +260,48 @@ start_server {tags {"hashexpire"}} { } } + foreach command {EX PX EXAT PXAT} { + test "HGETEX $command overwrites existing field TTL with bigger value" { + r FLUSHALL + set config [dict create \ + EX [list setup_cmd EX setup_val 100000 bigger_val 200000] \ + PX [list setup_cmd PX setup_val 100000000 bigger_val 200000000] \ + EXAT [list setup_cmd EX setup_val 100000 bigger_val [expr {[clock seconds] + 200000}]] \ + PXAT [list setup_cmd PX setup_val 100000000 bigger_val [expr {[clock milliseconds] + 200000000}]] \ + ] + set params [dict get $config $command] + set setup_cmd [dict get $params setup_cmd] + set setup_val [dict get $params setup_val] + set bigger_val [dict get $params bigger_val] + + r HSETEX myhash $setup_cmd $setup_val FIELDS 1 f1 v1 + set old_ttl [r HTTL myhash FIELDS 1 f1] + r HGETEX myhash $command $bigger_val FIELDS 1 f1 + set new_ttl [r HTTL myhash FIELDS 1 f1] + assert {$new_ttl > $old_ttl} + } + + test "HGETEX $command 
overwrites existing field TTL with smaller value" { + r FLUSHALL + set config [dict create \ + EX [list setup_cmd EX setup_val 100000 smaller_val 50000] \ + PX [list setup_cmd PX setup_val 100000000 smaller_val 50000000] \ + EXAT [list setup_cmd EX setup_val 100000 smaller_val [expr {[clock seconds] + 50000}]] \ + PXAT [list setup_cmd PX setup_val 100000000 smaller_val [expr {[clock milliseconds] + 50000000}]] \ + ] + set params [dict get $config $command] + set setup_cmd [dict get $params setup_cmd] + set setup_val [dict get $params setup_val] + set smaller_val [dict get $params smaller_val] + + r HSETEX myhash $setup_cmd $setup_val FIELDS 1 f1 v1 + set old_ttl [r HTTL myhash FIELDS 1 f1] + r HGETEX myhash $command $smaller_val FIELDS 1 f1 + set new_ttl [r HTTL myhash FIELDS 1 f1] + assert {$new_ttl <= $old_ttl} + } + } + test {HGETEX - verify no change when field does not exist} { r FLUSHALL r HSET myhash f1 v1 @@ -335,34 +412,36 @@ start_server {tags {"hashexpire"}} { test "HGETEX $command generates hexpire keyspace notification" { r FLUSHALL r HSET myhash f1 v1 - + assert_equal 0 [get_keys_with_volatile_items r] set rd [setup_single_keyspace_notification r] r HGETEX myhash $command [get_long_expire_value $command] FIELDS 1 f1 assert_keyevent_patterns $rd myhash hexpire + assert_equal 1 [get_keys_with_volatile_items r] $rd close } test "HGETEX $command with multiple fields generates single notification" { r FLUSHALL r HSET myhash f1 v1 f2 v2 f3 v3 - + assert_equal 0 [get_keys_with_volatile_items r] set rd [setup_single_keyspace_notification r] - + r HGETEX myhash $command [get_long_expire_value $command] FIELDS 3 f1 f2 f3 assert_keyevent_patterns $rd myhash hexpire # Verify no notification (getting hset and not hexpire) r HSET dummy dummy dummy assert_keyevent_patterns $rd dummy hset + assert_equal 1 [get_keys_with_volatile_items r] $rd close } test "HGETEX $command on non-existent field generates no notification" { r FLUSHALL r HSET myhash f1 v1 - + 
assert_equal 0 [get_keys_with_volatile_items r] set rd [setup_single_keyspace_notification r] # This HGETEX targets a non-existent field, so no notification about hexpire should be emitted @@ -371,7 +450,7 @@ start_server {tags {"hashexpire"}} { # Verify no notification (getting hset and not hexpire) r HSET dummy dummy dummy assert_keyevent_patterns $rd dummy hset - + assert_equal 0 [get_keys_with_volatile_items r] $rd close } } @@ -379,26 +458,29 @@ start_server {tags {"hashexpire"}} { test {HGETEX PERSIST generates hpersist keyspace notification} { r FLUSHALL r HSET myhash f1 v1 - r HEXPIRE myhash 60 FIELDS 1 f1 + assert_equal 0 [get_keys_with_volatile_items r] + + r HEXPIRE myhash [get_long_expire_value HEXPIRE] FIELDS 1 f1 + assert_equal 1 [get_keys_with_volatile_items r] set rd [setup_single_keyspace_notification r] r HGETEX myhash PERSIST FIELDS 1 f1 assert_keyevent_patterns $rd myhash hpersist + assert_equal 0 [get_keys_with_volatile_items r] $rd close } foreach command {EX PX EXAT PXAT} { - test "HGETEX $command 0/past time works correctly with 1 field" { r FLUSHALL # Create hash with field r HSET myhash f1 v1 assert_equal 1 [r HLEN myhash] - assert_match {1} [scan [regexp -inline {keys\=([\d]*)} [r info keyspace]] keys=%d] - + assert_equal 0 [get_keys_with_volatile_items r] + assert_equal 1 [get_keys r] set rd [setup_single_keyspace_notification r] # Set field to expire immediately @@ -409,8 +491,8 @@ start_server {tags {"hashexpire"}} { assert_equal -2 [r HTTL myhash FIELDS 1 f1] assert_equal 0 [r HLEN myhash] assert_equal 0 [r EXISTS myhash] - assert_match "" [scan [regexp -inline {keys\=([\d]*)} [r info keyspace]] keys=%d] - + assert_equal 0 [get_keys r] + assert_equal 0 [get_keys_with_volatile_items r] $rd close } @@ -420,7 +502,8 @@ start_server {tags {"hashexpire"}} { # Create hash with field r HSETEX myhash EX 1000 FIELDS 1 f1 v1 assert_equal 1 [r HLEN myhash] - assert_match {1} [scan [regexp -inline {keys\=([\d]*)} [r info keyspace]] keys=%d] + 
assert_equal 1 [get_keys_with_volatile_items r] + assert_equal 1 [get_keys r] set rd [setup_single_keyspace_notification r] @@ -432,7 +515,8 @@ start_server {tags {"hashexpire"}} { assert_equal -2 [r HTTL myhash FIELDS 1 f1] assert_equal 0 [r HLEN myhash] assert_equal 0 [r EXISTS myhash] - assert_match "" [scan [regexp -inline {keys\=([\d]*)} [r info keyspace]] keys=%d] + assert_equal 0 [get_keys r] + assert_equal 0 [get_keys_with_volatile_items r] $rd close } @@ -443,7 +527,8 @@ start_server {tags {"hashexpire"}} { # Create hash with field r HSET myhash f1 v1 f2 v2 assert_equal 2 [r HLEN myhash] - assert_match {1} [scan [regexp -inline {keys\=([\d]*)} [r info keyspace]] keys=%d] + assert_equal 0 [get_keys_with_volatile_items r] + assert_equal 1 [get_keys r] set rd [setup_single_keyspace_notification r] @@ -455,7 +540,8 @@ start_server {tags {"hashexpire"}} { assert_equal -2 [r HTTL myhash FIELDS 1 f2] assert_equal 1 [r HLEN myhash] assert_equal 1 [r EXISTS myhash] - assert_match 1 [scan [regexp -inline {keys\=([\d]*)} [r info keyspace]] keys=%d] + assert_equal 1 [get_keys r] + assert_equal 0 [get_keys_with_volatile_items r] $rd close } @@ -465,9 +551,10 @@ start_server {tags {"hashexpire"}} { # Create hash with field r HSET myhash f1 v1 f2 v2 f3 v3 f4 v4 - r HEXPIRE myhash 1000000 FIELDS 1 f1 + r HEXPIRE myhash [get_long_expire_value HEXPIRE] FIELDS 1 f1 + assert_equal 1 [get_keys_with_volatile_items r] assert_equal 4 [r HLEN myhash] - assert_match {1} [scan [regexp -inline {keys\=([\d]*)} [r info keyspace]] keys=%d] + assert_equal 1 [get_keys r] set rd [setup_single_keyspace_notification r] @@ -479,7 +566,8 @@ start_server {tags {"hashexpire"}} { assert_equal -2 [r HTTL myhash FIELDS 1 f1] assert_equal 3 [r HLEN myhash] assert_equal 1 [r EXISTS myhash] - assert_match 1 [scan [regexp -inline {keys\=([\d]*)} [r info keyspace]] keys=%d] + assert_equal 1 [get_keys r] + assert_equal 0 [get_keys_with_volatile_items r] $rd close } @@ -495,6 +583,7 @@ start_server {tags 
{"hashexpire"}} { r HSETEX myhash PX 1000 FIELDS 1 field1 val1 set original_pttl [r HPTTL myhash FIELDS 1 field1] set original_expiretime [r HEXPIRETIME myhash FIELDS 1 field1] + assert_equal 1 [get_keys_with_volatile_items r] # Validate TTL is active and expiretime is in the future assert {$original_pttl > 0} @@ -551,6 +640,7 @@ start_server {tags {"hashexpire"}} { r FLUSHALL r HSET myhash field2 "persistent" r HSETEX myhash EX 1 FIELDS 1 field1 "temp" + assert_equal 1 [get_keys_with_volatile_items r] after 1100 assert_equal 0 [r HEXISTS myhash field1] assert_equal 1 [r HEXISTS myhash field2] @@ -589,6 +679,7 @@ start_server {tags {"hashexpire"}} { set ttl [r HPTTL myhash FIELDS 1 field1] assert {$ttl >= 19000 && $ttl <= 20000} assert_equal newval [r HGET myhash field1] + assert_equal 1 [get_keys_with_volatile_items r] } test {HSETEX PX - test zero ttl expires immediately} { @@ -1562,13 +1653,13 @@ start_server {tags {"hashexpire"}} { # does NOT trigger Valkey's expiration mechanism. # # The key observation is that Valkey tracks how many fields were - # expired via TTL using the `expired_subkeys` counter in INFO stats. + # expired via TTL using the `expired_fields` counter in INFO stats. # If HDEL caused expiration to be processed internally, # this counter would increment. We assert that it remains unchanged. 
- # Capture expired_subkeys before + # Capture expired_fields before set before_info [r INFO stats] - set before [info_field $before_info expired_subkeys] + set before [info_field $before_info expired_fields] # Create field with short TTL r HSETEX myhash PX 10 FIELDS 1 field1 val1 @@ -1583,9 +1674,9 @@ start_server {tags {"hashexpire"}} { # Field should be gone assert_equal 0 [r HEXISTS myhash field1] - # Capture expired_subkeys again + # Capture expired_fields again set after_info [r INFO stats] - set after [info_field $after_info expired_subkeys] + set after [info_field $after_info expired_fields] # Verify that no expiry occurred internally assert_equal $before $after @@ -1991,7 +2082,8 @@ start_server {tags {"hashexpire external:skip"}} { assert_equal $f1_exp [$instance HEXPIRETIME myhash FIELDS 1 f1] assert_equal $f2_exp [$instance HEXPIRETIME myhash FIELDS 1 f2] assert_equal -1 [$instance HTTL myhash FIELDS 1 f3] - assert_match {1} [scan [regexp -inline {keys\=([\d]*)} [$instance info keyspace]] keys=%d] + assert_equal 1 [get_keys $instance] + assert_equal 1 [get_keys_with_volatile_items $instance] assert_equal "v1 v2 v3" [$instance HMGET myhash f1 f2 f3] assert_equal 3 [$instance HLEN myhash] } @@ -2087,7 +2179,8 @@ start_server {tags {"hashexpire external:skip"}} { assert_equal $f1_exp [$instance HEXPIRETIME myhash FIELDS 1 f1] assert_equal $f2_exp [$instance HEXPIRETIME myhash FIELDS 1 f2] assert_equal -1 [$instance HTTL myhash FIELDS 1 f3] - assert_match {1} [scan [regexp -inline {keys\=([\d]*)} [$instance info keyspace]] keys=%d] + assert_equal 1 [get_keys $instance] + assert_equal 1 [get_keys_with_volatile_items $instance] assert_equal "v1 v2 v3" [$instance HMGET myhash f1 f2 f3] assert_equal 3 [$instance HLEN myhash] } @@ -2114,7 +2207,8 @@ start_server {tags {"hashexpire external:skip"}} { assert_equal $f1_exp [$instance HEXPIRETIME myhash FIELDS 1 f1] assert_equal $f2_exp [$instance HEXPIRETIME myhash FIELDS 1 f2] assert_equal -1 [$instance HTTL 
myhash FIELDS 1 f3] - assert_match {1} [scan [regexp -inline {keys\=([\d]*)} [$instance info keyspace]] keys=%d] + assert_equal 1 [get_keys $instance] + assert_equal 1 [get_keys_with_volatile_items $instance] assert_equal "v1 v2 v3" [$instance HMGET myhash f1 f2 f3] assert_equal 3 [$instance HLEN myhash] } @@ -2132,7 +2226,8 @@ start_server {tags {"hashexpire external:skip"}} { foreach instance [list $primary $replica_1] { assert_equal $f2_exp [$instance HEXPIRETIME myhash FIELDS 1 f2] assert_equal -2 [$instance HTTL myhash FIELDS 1 f1] - assert_match {1} [scan [regexp -inline {keys\=([\d]*)} [$instance info keyspace]] keys=%d] + assert_equal 1 [get_keys $instance] + assert_equal 1 [get_keys_with_volatile_items $instance] assert_equal "{} v2 v3" [$instance HMGET myhash f1 f2 f3] assert_equal 3 [$instance HLEN myhash] } @@ -2197,7 +2292,8 @@ start_cluster 3 0 {tags {"cluster mytest external:skip"} overrides {cluster-node assert_equal 3 [R 0 HLEN $key] assert_morethan [R 0 HTTL $key FIELDS 1 f1] 290 assert_match {1} [scan [regexp -inline {keys\=([\d]*)} [R 0 info keyspace]] keys=%d] - + assert_equal 1 [scan [lindex [regexp -inline {keys_with_volatile_items=([\d]+)} [R 0 info keyspace]] 1] "%d"] + # Prepare slot migration set slot [R 0 CLUSTER KEYSLOT $key] assert_equal OK [R 1 CLUSTER SETSLOT $slot IMPORTING $R0_id] @@ -2214,6 +2310,7 @@ start_cluster 3 0 {tags {"cluster mytest external:skip"} overrides {cluster-node assert_equal 3 [R 1 HLEN $key] assert_morethan [R 1 HTTL $key FIELDS 1 f1] 280 assert_match {1} [scan [regexp -inline {keys\=([\d]*)} [R 1 info keyspace]] keys=%d] + assert_equal 1 [scan [lindex [regexp -inline {keys_with_volatile_items=([\d]+)} [R 1 info keyspace]] 1] "%d"] # Setup keyspace notifications R 1 config set notify-keyspace-events KEA @@ -2250,7 +2347,8 @@ start_server {tags {"hashexpire external:skip"}} { assert_morethan [r HTTL myhash FIELDS 1 f1] 100 assert_equal -1 [r HTTL myhash FIELDS 1 f2] assert_equal 2 [r HLEN myhash] - assert_match 
{1} [scan [regexp -inline {keys\=([\d]*)} [r info keyspace]] keys=%d] + assert_equal 1 [get_keys r] + assert_equal 1 [get_keys_with_volatile_items r] # Run the command if {$cmd eq "RENAME"} { @@ -2269,9 +2367,11 @@ start_server {tags {"hashexpire external:skip"}} { assert_equal -1 [r HTTL $newhash FIELDS 1 f2] assert_equal 2 [r HLEN $newhash] if {$cmd eq "RESTORE"} { - assert_match {2} [scan [regexp -inline {keys\=([\d]*)} [r info keyspace]] keys=%d] + assert_equal 2 [get_keys r] + assert_equal 2 [get_keys_with_volatile_items r] } else { - assert_match {1} [scan [regexp -inline {keys\=([\d]*)} [r info keyspace]] keys=%d] + assert_equal 1 [get_keys r] + assert_equal 1 [get_keys_with_volatile_items r] } assert_equal $mem_before $memory_after } @@ -2295,7 +2395,8 @@ start_server {tags {"hashexpire external:skip"}} { assert_morethan [r HTTL myhash FIELDS 1 f3] 0 assert_equal -1 [r HTTL myhash FIELDS 1 f4] assert_equal 3 [r HLEN myhash] - assert_match {1} [scan [regexp -inline {keys\=([\d]*)} [r info keyspace]] keys=%d] + assert_equal 1 [get_keys r] + assert_equal 1 [get_keys_with_volatile_items r] # Copy hash to new key r copy myhash newhash1 @@ -2314,7 +2415,8 @@ start_server {tags {"hashexpire external:skip"}} { assert_morethan [r HTTL newhash1 FIELDS 1 f3] 0 assert_equal -1 [r HTTL newhash1 FIELDS 1 f4] assert_equal 3 [r HLEN newhash1] - assert_match {2} [scan [regexp -inline {keys\=([\d]*)} [r info keyspace]] keys=%d] + assert_equal 2 [get_keys r] + assert_equal 2 [get_keys_with_volatile_items r] assert_equal $mem_before $mem_after @@ -2412,7 +2514,8 @@ start_server {tags {"hashexpire external:skip"}} { assert_equal {1} [psubscribe $rd __keyevent@*] r HSET myhash f1 v1 f2 v2 f3 v3 - assert_match {1} [scan [regexp -inline {keys\=([\d]*)} [r info keyspace]] keys=%d] + assert_equal 1 [get_keys r] + assert_equal 0 [get_keys_with_volatile_items r] assert_equal 3 [r HLEN myhash] if {$time_unit eq "s"} { r HEXPIRE hash1 10 FIELDS 1 f1 @@ -2428,9 +2531,10 @@ start_server 
{tags {"hashexpire external:skip"}} { fail "myhash still exists" } assert_equal 0 [r HLEN myhash] - assert_match "" [scan [regexp -inline {keys\=([\d]*)} [r info keyspace]] keys=%d] + assert_equal 0 [get_keys r] assert_keyevent_patterns $rd myhash hset hexpire expire + assert_equal 0 [get_keys_with_volatile_items r] $rd close # Re-enable active expiry r DEBUG SET-ACTIVE-EXPIRE yes @@ -2443,7 +2547,8 @@ start_server {tags {"hashexpire external:skip"}} { assert_equal {1} [psubscribe $rd __keyevent@*] r HSET myhash f1 v1 f2 v2 f3 v3 - assert_match {1} [scan [regexp -inline {keys\=([\d]*)} [r info keyspace]] keys=%d] + assert_equal 1 [get_keys r] + assert_equal 0 [get_keys_with_volatile_items r] assert_equal 3 [r HLEN myhash] if {$time_unit eq "s"} { r HEXPIRE myhash 1 FIELDS 1 f1 @@ -2458,10 +2563,13 @@ start_server {tags {"hashexpire external:skip"}} { } else { fail "f1 not expired" } - assert_match {1} [scan [regexp -inline {keys\=([\d]*)} [r info keyspace]] keys=%d] + assert_equal 1 [get_keys r] assert_equal 1 [r EXISTS myhash] assert_equal "{} v2 v3" [r HMGET myhash f1 f2 f3] assert_keyevent_patterns $rd myhash hset hexpire + # When active expire is disabled, expired key is + # not deleted and get_keys_with_volatile_items is the same + assert_equal 1 [get_keys_with_volatile_items r] $rd close # Re-enable active expiry r DEBUG SET-ACTIVE-EXPIRE yes @@ -2472,7 +2580,7 @@ start_server {tags {"hashexpire external:skip"}} { r DEBUG SET-ACTIVE-EXPIRE no r HSET myhash f1 v1 f2 v2 f3 v3 - assert_match {1} [scan [regexp -inline {keys\=([\d]*)} [r info keyspace]] keys=%d] + assert_equal 1 [get_keys r] assert_equal 3 [r HLEN myhash] @@ -2493,7 +2601,7 @@ start_server {tags {"hashexpire external:skip"}} { } assert_equal "{} {} {}" [r HMGET myhash f1 f2 f3] - assert_match "" [scan [regexp -inline {keys\=([\d]*)} [r info keyspace]] keys=%d] + assert_equal 0 [get_keys r] assert_equal 0 [r HLEN myhash] # Re-enable active expiry r DEBUG SET-ACTIVE-EXPIRE yes @@ -2523,13 +2631,15 
@@ start_server {tags {"hashexpire external:skip"}} { r FLUSHALL r HSET myhash f1 v1 f2 v2 - assert_match {1} [scan [regexp -inline {keys\=([\d]*)} [r info keyspace]] keys=%d] + assert_equal 1 [get_keys r] + assert_equal 0 [get_keys_with_volatile_items r] assert_equal 2 [r HLEN myhash] r HEXPIRE myhash 100000 FIELDS 1 f1 r PERSIST myhash assert_equal -1 [r TTL myhash] assert_morethan [r HTTL myhash FIELDS 1 f1] 0 + assert_equal 1 [get_keys_with_volatile_items r] } } @@ -2538,11 +2648,13 @@ tags {"aof external:skip"} { set defaults {appendonly {yes} appendfilename {appendonly.aof} appenddirname {appendonlydir} auto-aof-rewrite-percentage {0}} set server_path [tmpdir server.multi.aof] start_server_aof [list dir $server_path] { + r DEBUG SET-ACTIVE-EXPIRE no test {TTL Persistence in AOF} { r flushall r DEBUG SET-ACTIVE-EXPIRE no r config set appendonly yes r config set appendfsync always + assert_equal 0 [get_keys_with_volatile_items r] # Create hash with 1 short, long and no expired fields set long_expire [expr {[clock seconds] + 1000000}] @@ -2586,6 +2698,7 @@ tags {"aof external:skip"} { assert_equal v$i [r HGET myhash f$i] } } + assert_equal 1 [get_keys_with_volatile_items r] # Ensure the initial rewrite finishes waitForBgrewriteaof r @@ -2613,10 +2726,19 @@ tags {"aof external:skip"} { # Restart the server and load the AOF restart_server 0 true false r debug loadaof + r DEBUG SET-ACTIVE-EXPIRE no - # Verify hash after loading from aof - # Verify same HLEN - assert_equal 30 [r HLEN myhash] + set hlen [r HLEN myhash] + set expired_fields [info_field [r info stats] expired_fields] + assert_equal 1 [get_keys_with_volatile_items r] + + # Verify that HLEN is between 20 and 30 (inclusive), and + # when combined with expired_fields, the total should be 30 + if {$hlen < 20 || $hlen > 30} { + fail "Expected HLEN to be between 20 and 30, but got $hlen" + } + assert_equal 30 [expr ($expired_fields + $hlen)] + # Verify the TTLs are preserved for {set i 1} {$i <= 10} {incr i} 
{ assert_equal $long_expire [r HEXPIRETIME myhash FIELDS 1 f$i] @@ -2637,3 +2759,1687 @@ tags {"aof external:skip"} { } {OK} {needs:debug} } } + +### ACTIVE EXPIRY TESTS #### +##### HGETEX Active Expiry Tests ##### +start_server {tags {"hashexpire external:skip"}} { + r config set notify-keyspace-events KEA + + foreach command {EX PX EXAT PXAT} { + test "HGETEX $command active expiry with single field" { + r FLUSHALL + set initial_expired [info_field [r info stats] expired_fields] + + r HSET myhash f1 v1 f2 v2 + assert_equal 2 [r HLEN myhash] + assert_equal 0 [get_keys_with_volatile_items r] + + # Use HGETEX to set expiry + assert_equal "v1" [r HGETEX myhash $command [get_short_expire_value $command] FIELDS 1 f1] + wait_for_active_expiry r myhash 1 $initial_expired 1 + assert_equal "{} v2" [r HGETEX myhash FIELDS 2 f1 f2] + assert_equal 0 [get_keys_with_volatile_items r] + } + + test "HGETEX $command active expiry with multiple fields" { + r FLUSHALL + set initial_expired [info_field [r info stats] expired_fields] + + r HSET myhash f1 v1 f2 v2 f3 v3 + assert_equal 3 [r HLEN myhash] + assert_equal 0 [get_keys_with_volatile_items r] + + # Set expiry on multiple fields with HGETEX + assert_equal "v1 v3" [r HGETEX myhash $command [get_short_expire_value $command] FIELDS 2 f1 f3] + + wait_for_active_expiry r myhash 1 $initial_expired 2 + + # Verify only non-expired field remains + assert_equal "{} v2 {}" [r HGETEX myhash FIELDS 3 f1 f2 f3] + assert_equal 0 [get_keys_with_volatile_items r] + } + + test "HGETEX $command active expiry removes entire key when last field expires" { + r FLUSHALL + set initial_expired [info_field [r info stats] expired_fields] + r HSET myhash f1 v1 + assert_equal 0 [get_keys_with_volatile_items r] + assert_equal "v1" [r HGETEX myhash $command [get_short_expire_value $command] FIELDS 1 f1] + wait_for_active_expiry r myhash 0 $initial_expired 1 + assert_equal 0 [r EXISTS myhash] + assert_equal 0 [get_keys_with_volatile_items r] + } + + test 
"HGETEX $command and HPEXPIRE" { + r FLUSHALL + r HSET myhash f1 v1 f2 v2 f3 v3 f4 v4 + r HEXPIRE myhash 3000 FIELDS 1 f1 + r HSETEX myhash EX 5000 FIELDS 1 f2 v2 + r HEXPIRE myhash 60000 FIELDS 1 f3 + assert_equal "v1 v2 v3 v4" [r HGETEX myhash FIELDS 4 f1 f2 f3 f4] + assert_equal "v3" [r HGETEX myhash PERSIST FIELDS 1 f3] + r HPEXPIRE myhash 1 FIELDS 1 f1 + } + } + + test "HGETEX PERSIST removes expiry and prevents active expiry" { + r FLUSHALL + + r HSET myhash f1 v1 f2 v2 + assert_equal 2 [r HLEN myhash] + assert_equal 0 [get_keys_with_volatile_items r] + + # Set short expiry + assert_equal "v1" [r HGETEX myhash PX 1000 FIELDS 1 f1] + + # Immediately persist to prevent expiry + assert_equal "v1" [r HGETEX myhash PERSIST FIELDS 1 f1] + assert_equal -1 [r HTTL myhash FIELDS 1 f1] + + # Wait longer than original expiry time + after 200 + + # Field should still exist due to PERSIST + assert_equal "v1" [r HGET myhash f1] + assert_equal 2 [r HLEN myhash] + assert_equal 0 [get_keys_with_volatile_items r] + } + + test "HGETEX overwrite existing expiry with active expiry" { + r FLUSHALL + set initial_expired [info_field [r info stats] expired_fields] + + r HSET myhash f1 v1 + assert_equal 1 [r HLEN myhash] + assert_equal 0 [get_keys_with_volatile_items r] + + # Set initial long expiry + r HEXPIRE myhash [get_long_expire_value HEXPIRE] FIELDS 1 f1 + assert_morethan [r HTTL myhash FIELDS 1 f1] 5000 + + # Use HGETEX to set shorter expiry + assert_equal "v1" [r HGETEX myhash PX 100 FIELDS 1 f1] + + # Wait for active expiry with new shorter time + wait_for_active_expiry r myhash 0 $initial_expired 1 + + assert_equal 0 [r EXISTS myhash] + assert_equal 0 [get_keys_with_volatile_items r] + } +} + +##### HGETEX Active Expiry Keyspace Notifications ##### +start_server {tags {"hashexpire external:skip"}} { + r config set notify-keyspace-events KEA + foreach command {EX PX EXAT PXAT} { + test "HGETEX $command keyspace notifications for active expiry" { + r FLUSHALL + set 
initial_expired [info_field [r info stats] expired_fields] + + r HSET myhash f1 v1 f2 v2 + assert_equal 2 [r HLEN myhash] + assert_equal 0 [get_keys_with_volatile_items r] + + assert_equal 2 [r HLEN myhash] + set rd [setup_single_keyspace_notification r] + + # Set expiry with HGETEX + r HGETEX myhash $command [get_short_expire_value $command] FIELDS 1 f1 + + wait_for_active_expiry r myhash 1 $initial_expired 1 + assert_keyevent_patterns $rd myhash hexpire hexpired + assert_equal 0 [get_keys_with_volatile_items r] + $rd close + } + } + + test "HGETEX keyspace notification when key deleted with active expiry" { + r FLUSHALL + set initial_expired [info_field [r info stats] expired_fields] + + r HSET myhash f1 v1 + assert_equal 1 [r HLEN myhash] + assert_equal 0 [get_keys_with_volatile_items r] + + set rd [setup_single_keyspace_notification r] + + # Set expiry on only field + r HGETEX myhash PX [get_short_expire_value PX] FIELDS 1 f1 + + wait_for_active_expiry r myhash 0 $initial_expired 1 + assert_equal 0 [r EXISTS myhash] + # Should get both hexpired and del notifications + assert_keyevent_patterns $rd myhash hexpire hexpired del + assert_equal 0 [get_keys_with_volatile_items r] + $rd close + } +} + +##### HSETEX Active Expiry Tests ##### +start_server {tags {"hashexpire external:skip"}} { + r config set notify-keyspace-events KEA + + foreach command {EX PX EXAT PXAT} { + test "HSETEX $command single field expires leaving other fields intact" { + r FLUSHALL + set initial_expired [info_field [r info stats] expired_fields] + r HSET myhash f2 v2 + assert_equal 1 [r HLEN myhash] + assert_equal 0 [get_keys_with_volatile_items r] + # Use HSETEX to set expiry + r HSETEX myhash $command [get_short_expire_value $command] FIELDS 1 f1 v1 + wait_for_active_expiry r myhash 1 $initial_expired 1 + assert_equal 0 [get_keys_with_volatile_items r] + assert_equal "{} v2" [r HGETEX myhash FIELDS 2 f1 f2] + } + + test "HSETEX $command multiple fields expire leaving non-expired fields 
intact" { + r FLUSHALL + set initial_expired [info_field [r info stats] expired_fields] + r HSET myhash f2 v2 + assert_equal 1 [r HLEN myhash] + assert_equal 0 [get_keys_with_volatile_items r] + # Set expiry on multiple fields with HSETEX + r HSETEX myhash $command [get_short_expire_value $command] FIELDS 2 f1 v1 f3 v3 + wait_for_active_expiry r myhash 1 $initial_expired 2 + assert_equal 0 [get_keys_with_volatile_items r] + # Verify only non-expired field remains + assert_equal "{} v2 {}" [r HGETEX myhash FIELDS 3 f1 f2 f3] + } + + test "HSETEX $command hash key deleted when all fields expire" { + r FLUSHALL + set initial_expired [info_field [r info stats] expired_fields] + r HSETEX myhash $command [get_short_expire_value $command] FIELDS 1 f1 v1 + wait_for_active_expiry r myhash 0 $initial_expired 1 + assert_equal 0 [r EXISTS myhash] + } + + test "HSETEX $command after HSETEX $command" { + r FLUSHALL + r HSETEX myhash EX 1000000000 FIELDS 1 f1 v1 + r HSETEX myhash PX 10 FIELDS 1 f2 v2 + } + } + + test "HPERSIST cancels HSETEX expiry preventing field deletion" { + r FLUSHALL + r HSET myhash f2 v2 + assert_equal 1 [r HLEN myhash] + # Set short expiry + r HSETEX myhash PX [get_short_expire_value PX] FIELDS 1 f1 v1 + # Immediately persist to prevent expiry + r HPERSIST myhash FIELDS 1 f1 + assert_equal -1 [r HTTL myhash FIELDS 1 f1] + # Wait longer than original expiry time + after 200 + # Field should still exist due to PERSIST + assert_equal "v1" [r HGET myhash f1] + assert_equal 2 [r HLEN myhash] + } + + test "HSETEX overwrites existing field expiry with new shorter expiry" { + r FLUSHALL + set initial_expired [info_field [r info stats] expired_fields] + r HSET myhash f1 v1 + assert_equal 1 [r HLEN myhash] + assert_equal 0 [get_keys_with_volatile_items r] + # Set initial long expiry + r HEXPIRE myhash [get_long_expire_value HEXPIRE] FIELDS 1 f1 + assert_equal 1 [get_keys_with_volatile_items r] + assert_morethan [r HTTL myhash FIELDS 1 f1] 5000 + # Use HSETEX to set 
shorter expiry + r HSETEX myhash PX 100 FIELDS 1 f1 v1 + # Wait for active expiry with new shorter time + wait_for_active_expiry r myhash 0 $initial_expired 1 + assert_equal 0 [get_keys_with_volatile_items r] + assert_equal 0 [r EXISTS myhash] + } +} + +##### HSETEX Active Expiry Keyspace Notifications ##### +start_server {tags {"hashexpire external:skip"}} { + r config set notify-keyspace-events KEA + foreach command {EX PX EXAT PXAT} { + test "HSETEX $command - keyspace notifications fired on field expiry" { + r FLUSHALL + set initial_expired [info_field [r info stats] expired_fields] + r HSET myhash f2 v2 + assert_equal 1 [r HLEN myhash] + assert_equal 0 [get_keys_with_volatile_items r] + set rd [setup_single_keyspace_notification r] + r HSETEX myhash $command [get_short_expire_value $command] FIELDS 1 f1 v1 + wait_for_active_expiry r myhash 1 $initial_expired 1 + assert_keyevent_patterns $rd myhash hset hexpire hexpired + assert_equal 0 [get_keys_with_volatile_items r] + $rd close + } + } + + test "HSETEX - keyspace notifications include del event when hash key removed" { + r FLUSHALL + set initial_expired [info_field [r info stats] expired_fields] + set rd [setup_single_keyspace_notification r] + r HSETEX myhash PX 100 FIELDS 1 f1 v1 + wait_for_active_expiry r myhash 0 $initial_expired 1 + assert_equal 0 [r EXISTS myhash] + assert_keyevent_patterns $rd myhash hset hexpire hexpired del + $rd close + } +} + +##### Active expiry test with 1 node ##### +start_server {tags {"hashexpire external:skip"}} { + r config set notify-keyspace-events KEA + set rd [valkey_deferring_client] + assert_equal {1} [psubscribe $rd __keyevent@*] + + test {Active expiry deletes entire key when only field expires} { + r FLUSHALL + set initial_expired [info_field [r info stats] expired_fields] + r HSET myhash f1 v1 + assert_equal 1 [r HLEN myhash] + assert_equal 1 [get_keys r] + assert_equal 0 [get_keys_with_volatile_items r] + r HPEXPIRE myhash 100 FIELDS 1 f1 + wait_for_active_expiry 
r myhash 0 $initial_expired 1 + # Key is deleted after its only field got expired + assert_equal 0 [get_keys r] + assert_equal 0 [get_keys_with_volatile_items r] + assert_equal "" [r HGET myhash f1] + assert_equal 0 [r EXISTS myhash] + # Verify keyspace notifications + assert_keyevent_patterns $rd myhash hset hexpire hexpired del + } + + test {Active expiry removes only expired field while preserving others} { + r FLUSHALL + set initial_expired [info_field [r info stats] expired_fields] + r HSET myhash f1 v1 f2 v2 f3 v3 + assert_equal 3 [r HLEN myhash] + assert_equal 1 [get_keys r] + assert_equal 0 [get_keys_with_volatile_items r] + r HPEXPIRE myhash 100 FIELDS 1 f1 + set mem_before [r MEMORY USAGE myhash] + wait_for_active_expiry r myhash 2 $initial_expired 1 + # Key still exists because it has 2 fields remaining + assert_equal 1 [get_keys r] + assert_equal 0 [get_keys_with_volatile_items r] + assert_equal "{} v2 v3" [r HGETEX myhash FIELDS 3 f1 f2 f3] + # Verify memory decreased after field expiry + set mem_after [r MEMORY USAGE myhash] + assert_morethan $mem_before $mem_after + # Verify keyspace notifications + assert_keyevent_patterns $rd myhash hset hexpire hexpired + assert_equal 0 [get_keys_with_volatile_items r] + } + + test {Active expiry reclaims memory correctly with large hash containing many fields} { + r FLUSHALL + set initial_expired [info_field [r info stats] expired_fields] + set value [string repeat x 1024] + set num_fields 10000 + # Set multiple fields + for {set i 1} {$i <= $num_fields} {incr i} { + lappend pairs "f$i" $value$i + } + r HSET myhash {*}$pairs + assert_equal 0 [get_keys_with_volatile_items r] + assert_equal $num_fields [r HLEN myhash] + + set mem_before_expire [r MEMORY USAGE myhash] + if {$mem_before_expire eq ""} {set mem_before_expire 0} + assert_morethan $mem_before_expire 10000000 + assert_equal 1 [get_keys r] + assert_equal $num_fields [r HLEN myhash] + r HPEXPIRE myhash 100 FIELDS 1 f1 + + wait_for_active_expiry r myhash 
[expr {$num_fields - 1}] $initial_expired 1 + # Key still exists because it has num_fields - 1 fields remaining + assert_equal 1 [get_keys r] + assert_equal "" [r HGET myhash f1] + for {set i 2} {$i <= $num_fields} {incr i} { + assert_equal $value$i [r HGET myhash "f$i"] + } + assert_equal 0 [get_keys_with_volatile_items r] + + # Expire all remaining fields + set all_field_names {} + for {set i 2} {$i <= $num_fields} {incr i} { + lappend all_field_names "f$i" + } + r HPEXPIRE myhash 100 FIELDS [expr {$num_fields - 1}] {*}$all_field_names + wait_for_active_expiry r myhash 0 $initial_expired $num_fields 350 100 + # Verify memory decreased by more than 10MB (the whole hash key was reclaimed) + set mem_after_expire [r MEMORY USAGE myhash] + if {$mem_after_expire eq ""} {set mem_after_expire 0} + assert_morethan [expr {$mem_before_expire - $mem_after_expire}] 10000000 + # Verify keyspace notifications + assert_keyevent_patterns $rd myhash hset hexpire hexpired hexpire hexpired + # Wait for del, maximum num_fields reads + for {set i 2} {$i <= $num_fields} {incr i} { + if {[string match "pmessage __keyevent@* __keyevent@*:del myhash" [$rd read]]} { + break + } + } + assert_equal 0 [get_keys_with_volatile_items r] + } + + test {Active expiry handles fields with different TTL values correctly} { + r FLUSHALL + set initial_expired [info_field [r info stats] expired_fields] + + r HSET myhash f1 v1 f2 v2 f3 v3 + assert_equal 3 [r HLEN myhash] + + # Set very short expiry and longer expiry + r HPEXPIRE myhash [get_short_expire_value HPEXPIRE] FIELDS 1 f1 + # Wait for f1 to expire + wait_for_active_expiry r myhash 2 $initial_expired 1 + r HEXPIRE myhash [get_long_expire_value HEXPIRE] FIELDS 1 f2 + # f3 has no expiry + # Verify f2 and f3 still exist + assert_equal 2 [r HLEN myhash] + assert_equal "{} v2 v3" [r HGETEX myhash FIELDS 3 f1 f2 f3] + assert_keyevent_patterns $rd myhash hset hexpire hexpired hexpire + } + + test {Active expiry removes only specified fields leaving others intact} { + r 
FLUSHALL + set initial_expired [info_field [r info stats] expired_fields] + + r HSET myhash f1 v1 f2 v2 f3 v3 f4 v4 f5 v5 + assert_equal 5 [r HLEN myhash] + + # Set expiry on alternating fields + r HPEXPIRE myhash 100 FIELDS 2 f1 f3 + # f2, f4, f5 have no expiry + + wait_for_active_expiry r myhash 3 $initial_expired 2 + + # Verify expired fields are gone and non-expired exists + assert_equal "{} v2 {} v4 v5" [r HGETEX myhash FIELDS 5 f1 f2 f3 f4 f5] + + # Key should still exist + assert_equal 1 [get_keys r] + } + + $rd close + + test {Field TTL is removed when field value is overwritten with HSET} { + r FLUSHALL + r HSET myhash f1 v1 + r HEXPIRE myhash 100000 FIELDS 1 f1 + r HSET myhash f1 v2 + # TTL should be removed after overwrite + assert_equal -1 [r HPTTL myhash FIELDS 1 f1] + # Field should still exist + assert_equal "v2" [r HGET myhash f1] + } + + # Active expiry with field deletion and recreation + test {Field TTL is cleared when field is deleted and recreated} { + r FLUSHALL + r HSET myhash f1 v1 + r HPEXPIRE myhash 100 FIELDS 1 f1 + r HDEL myhash f1 + r HSET myhash f1 v2 + assert_equal -1 [r HPTTL myhash FIELDS 1 f1] + after 200 + assert_equal v2 [r HGET myhash f1] + } +} + +##### Test Active Expiry Tests with all hash expire commands ##### +start_server {tags {"hashexpire external:skip"}} { + r config set notify-keyspace-events KEA + + foreach command {HEXPIRE HPEXPIRE HEXPIREAT HPEXPIREAT} { + test "$command active expiry on single field" { + r FLUSHALL + set initial_expired [info_field [r info stats] expired_fields] + + r HSET myhash f1 v1 f2 v2 + assert_equal 2 [r HLEN myhash] + + # Set expiry based on command type + r $command myhash [get_short_expire_value $command] FIELDS 1 f1 + + # Wait for active expiry + wait_for_active_expiry r myhash 1 $initial_expired 1 + + # Verify only expired field is gone + assert_equal "{} v2" [r HGETEX myhash FIELDS 2 f1 f2] + } + + test "$command active expiry with multiple fields" { + r FLUSHALL + set initial_expired 
[info_field [r info stats] expired_fields] + + r HSET myhash f1 v1 f2 v2 f3 v3 f4 v4 + assert_equal 4 [r HLEN myhash] + + # Set expiry on multiple fields + r $command myhash [get_short_expire_value $command] FIELDS 3 f1 f2 f4 + + # Wait for active expiry + wait_for_active_expiry r myhash 1 $initial_expired 3 + + # Only f3 should remain + assert_equal "{} {} v3 {}" [r HGETEX myhash FIELDS 4 f1 f2 f3 f4] + } + + test "$command active expiry removes entire key when last field expires" { + r FLUSHALL + set initial_expired [info_field [r info stats] expired_fields] + + r HSET myhash f1 v1 + assert_equal 1 [r HLEN myhash] + + # Set expiry on only field + r $command myhash [get_short_expire_value $command] FIELDS 1 f1 + + + # Wait for active expiry to remove key + wait_for_active_expiry r myhash 0 $initial_expired 1 + + assert_equal 0 [r EXISTS myhash] + } + + test "$command active expiry with non-existing fields" { + r FLUSHALL + set initial_expired [info_field [r info stats] expired_fields] + + r HSET myhash f1 v1 f2 v2 + assert_equal 2 [r HLEN myhash] + + # Try to expire non-existing fields + r $command myhash [get_short_expire_value $command] FIELDS 2 f3 f4 + + + # Wait to ensure no active expiry occurs + after 1500 + assert [check_myhash_and_expired_subkeys r myhash 2 $initial_expired 0] + } + + test "$command active expiry with mixed existing and non-existing fields" { + r FLUSHALL + set initial_expired [info_field [r info stats] expired_fields] + + r HSET myhash f1 v1 f2 v2 f3 v3 + assert_equal 3 [r HLEN myhash] + + # Mix of existing and non-existing fields + r $command myhash [get_short_expire_value $command] FIELDS 4 f1 f4 f3 f5 + + + # Wait for active expiry of existing fields only + wait_for_active_expiry r myhash 1 $initial_expired 2 + + # Only f2 should remain + assert_equal "{} v2 {}" [r HGETEX myhash FIELDS 3 f1 f2 f3] + } + + test "$command active expiry with already expired fields" { + r FLUSHALL + set initial_expired [info_field [r info stats] 
expired_fields] + + r HSET myhash f1 v1 f2 v2 f3 v3 + assert_equal 3 [r HLEN myhash] + + # Set very short expiry on f1 + r $command myhash [get_short_expire_value $command] FIELDS 1 f1 + + + # Wait for active expiry + wait_for_active_expiry r myhash 2 $initial_expired 1 + + # Now try to expire f1 again (already expired) and f2 (existing) + r $command myhash [get_short_expire_value $command] FIELDS 2 f1 f2 + + # Wait for f2 to expire + wait_for_active_expiry r myhash 1 $initial_expired 2 + + # Only f3 should remain + assert_equal "{} {} v3" [r HGETEX myhash FIELDS 3 f1 f2 f3] + } + } +} + +##### Active expiry test slot migration ##### +start_cluster 3 0 {tags {"cluster mytest external:skip"} overrides {cluster-node-timeout 1000}} { + # Flush all data on all cluster nodes before starting + for {set i 0} {$i < 3} {incr i} { + R $i FLUSHALL + } + set R0_id [R 0 CLUSTER MYID] + set R1_id [R 1 CLUSTER MYID] + + # Use a fixed hash tag to ensure key is in one slot + set key "{mymigrate}myhash" + + test {Hash field TTL values and active expiry state preserved during cluster slot migration} { + set initial_expired [info_field [R 0 info stats] expired_fields] + + R 0 HSET $key f1 v1 f2 v2 f3 v3 + assert_equal 3 [R 0 HLEN $key] + + set far_exp [expr {[clock seconds] + 30000}] + R 0 HEXPIREAT $key $far_exp FIELDS 1 f1 ; # f1 with far expire + R 0 HPEXPIRE $key 100 FIELDS 1 f2 ; # f2 with short expire + assert_equal 1 [scan [lindex [regexp -inline {keys_with_volatile_items=([\d]+)} [R 0 info keyspace]] 1] "%d"] + + # Wait for short expire field (f2) to be expired by active expire + wait_for_condition 100 100 { + [R 0 HLEN $key] eq 2 && + [info_field [R 0 info stats] expired_fields] eq [expr {$initial_expired + 1}] + } else { + fail "Fields should have expired" + } + + # Verify expired field returns empty string and non-expired returns value + assert_equal "v1 {} v3" [R 0 HMGET $key f1 f2 f3] + + # Prepare slot migration + set slot [R 0 CLUSTER KEYSLOT $key] + assert_equal OK [R 
1 CLUSTER SETSLOT $slot IMPORTING $R0_id] + assert_equal OK [R 0 CLUSTER SETSLOT $slot MIGRATING $R1_id] + + # Migrate key to destination node + R 0 MIGRATE [srv -1 host] [srv -1 port] $key 0 5000 + + # Complete slot migration + R 0 CLUSTER SETSLOT $slot NODE $R1_id + R 1 CLUSTER SETSLOT $slot NODE $R1_id + + set initial_expired [info_field [R 1 info stats] expired_fields] + + # Verify after slot migration all fields are present and ttl is kept + assert_match {1} [scan [regexp -inline {keys=([\d]*)} [R 1 info keyspace]] keys=%d] + assert_equal 1 [scan [lindex [regexp -inline {keys_with_volatile_items=([\d]+)} [R 1 info keyspace]] 1] "%d"] + assert_equal 2 [R 1 HLEN $key] + assert_equal "v1 {} v3" [R 1 HMGET $key f1 f2 f3] + assert_equal -1 [R 1 HTTL $key FIELDS 1 f3] + assert_equal $far_exp [R 1 HEXPIRETIME $key FIELDS 1 f1] + assert_equal -2 [R 1 HTTL $key FIELDS 1 f2] + + # Set short expiration on all fields (some do not exist) + R 1 HPEXPIRE $key 100 FIELDS 3 f1 f2 f3 + + # Verify active expiry + wait_for_condition 200 50 { + [R 1 HLEN $key] eq 0 && + [info_field [R 1 info stats] expired_fields] eq [expr {$initial_expired + 2}] + } else { + fail "All fields should have expired" + } + assert_match "" [scan [regexp -inline {keys=([\d]*)} [R 1 info keyspace]] keys=%d] + # TODO handle empty #Keyspace properly + # assert_equal 0 [scan [lindex [regexp -inline {keys_with_volatile_items=([\d]+)} [R 1 info keyspace]] 1] "%d"] + } +} + +##### Active expiry test slot migration with multiple fields ##### +start_cluster 3 0 {tags {"cluster mytest external:skip"} overrides {cluster-node-timeout 1000}} { + # Flush all data on all cluster nodes before starting + for {set i 0} {$i < 3} {incr i} { + R $i FLUSHALL + } + set R0_id [R 0 CLUSTER MYID] + set R1_id [R 1 CLUSTER MYID] + + # Use a fixed hash tag to ensure key is in one slot + set key "{mymigrate}myhash" + + test {Large hash with mixed TTL fields maintains expiry state after cluster slot migration} { + set initial_expired 
[info_field [R 0 info stats] expired_fields] + set num_fields 100 + + # Create hash fields + for {set i 1} {$i <= $num_fields} {incr i} { + lappend pairs "f$i" "v$i" + } + R 0 HSET $key {*}$pairs + assert_equal $num_fields [R 0 HLEN $key] + + set far_exp [expr {[clock seconds] + 30000}] + # Set large TTL on 25 fields + for {set i 1} {$i <= 25} {incr i} { + R 0 HEXPIREAT $key $far_exp FIELDS 1 "f$i" + } + + # Set short TTL on 25 fields + for {set i 26} {$i <= 50} {incr i} { + R 0 HPEXPIRE $key 100 FIELDS 1 "f$i" + } + + # wait for short expire field to be expired by active expire + wait_for_condition 100 100 { + [R 0 HLEN $key] eq 75 && + [info_field [R 0 info stats] expired_fields] eq [expr {$initial_expired + 25}] + } else { + fail "Fields should have expired" + } + + # Verify expired fields return empty string and non-expired return values + for {set i 26} {$i <= 50} {incr i} { + assert_equal "" [R 0 HGET $key "f$i"] + } + for {set i 1} {$i <= 25} {incr i} { + assert_equal "v$i" [R 0 HGET $key "f$i"] + } + for {set i 51} {$i <= $num_fields} {incr i} { + assert_equal "v$i" [R 0 HGET $key "f$i"] + } + + # Prepare slot migration + set slot [R 0 CLUSTER KEYSLOT $key] + assert_equal OK [R 1 CLUSTER SETSLOT $slot IMPORTING $R0_id] + assert_equal OK [R 0 CLUSTER SETSLOT $slot MIGRATING $R1_id] + + # Migrate key to destination node + R 0 MIGRATE [srv -1 host] [srv -1 port] $key 0 5000 + + # Complete slot migration + R 0 CLUSTER SETSLOT $slot NODE $R1_id + R 1 CLUSTER SETSLOT $slot NODE $R1_id + + set initial_expired [info_field [R 1 info stats] expired_fields] + # Verify after slot migration all fields are present and ttl is kept + assert_equal 75 [R 1 HLEN $key] + for {set i 1} {$i <= $num_fields} {incr i} { + if {$i > 50} { + assert_equal -1 [R 1 HTTL $key FIELDS 1 "f$i"] + assert_equal "v$i" [R 1 HGET $key "f$i"] + } else { + if {$i <= 25} { + assert_equal $far_exp [R 1 HEXPIRETIME $key FIELDS 1 f$i] + assert_equal "v$i" [R 1 HGET $key "f$i"] + } else { + assert_equal 
-2 [R 1 HTTL $key FIELDS 1 "f$i"] + assert_equal "" [R 1 HGET $key "f$i"] + } + } + } + + # Set short expiration on all fields (some do not exist) + set fields {} + for {set i 1} {$i <= 100} {incr i} { + lappend fields "f$i" + } + R 1 HPEXPIRE $key 100 FIELDS 100 {*}$fields + + # Verify active expiry + wait_for_condition 100 100 { + [R 1 HLEN $key] eq 0 && + [info_field [R 1 info stats] expired_fields] eq [expr {$initial_expired + 75}] + } else { + fail "All fields should have expired" + } + } +} + +##### Active expiry test replication ##### +start_server {tags {"hashexpire external:skip"}} { + set primary [srv 0 client] + set primary_host [srv 0 host] + set primary_port [srv 0 port] + start_server {tags {needs:repl external:skip}} { + set replica [srv 0 client] + set replica_host [srv 0 host] + set replica_port [srv 0 port] + # Set this inner layer server as replica + set replica [srv 0 client] + + test {Hash field active expiry on primary triggers HDEL replication to replica} { + lassign [setup_replication_test $primary $replica $primary_host $primary_port] primary_initial_expired replica_initial_expired + + # Initialize deferred clients and subscribe to keyspace notifications + foreach instance [list $primary $replica] { + $instance config set notify-keyspace-events KEA + } + set rd_primary [valkey_deferring_client -1] + set rd_replica [valkey_deferring_client $replica_host $replica_port] + foreach rd [list $rd_primary $rd_replica] { + assert_equal {1} [psubscribe $rd __keyevent@*] + } + + # Create hash and timing f1 < f2 expiry times + set f1_exp [expr {[clock seconds] + 10000}] + + # Setup hash, set expire and set expire 0 + $primary HSET myhash f1 v1 f2 v2 ;# Should trigger hset + wait_for_ofs_sync $primary $replica + + $primary HPEXPIRE myhash 500 FIELDS 1 f1 ;# Should trigger 1 hexpire and then hexpired (for primary) and 1 hdel (for replica) + wait_for_ofs_sync $primary $replica + + # Wait for active expiry + wait_for_active_expiry $primary myhash 1 
$primary_initial_expired 1 + # Ensure the replica does not increment expired_fields + assert_equal $replica_initial_expired [info_field [$replica info stats] expired_fields] + + # Verify expired field returns empty string and non-expired returns value + foreach instance [list $primary $replica] { + assert_equal "{} v2" [$instance HMGET myhash f1 f2] + assert_equal 0 [get_keys_with_volatile_items $instance] + } + + # Verify keyspace notification + foreach rd [list $rd_primary $rd_replica] { + assert_keyevent_patterns $rd myhash hset + assert_keyevent_patterns $rd myhash hexpire + } + # primary gets hexpired and replica gets hdel + assert_keyevent_patterns $rd_primary myhash hexpired + assert_keyevent_patterns $rd_replica myhash hdel + + $rd_primary close + $rd_replica close + } + + start_server {tags {needs:repl external:skip}} { + $primary FLUSHALL + set replica_2 [srv 0 client] + set replica_2_host [srv 0 host] + set replica_2_port [srv 0 port] + + test {Hash field TTL and active expiry propagate correctly through chain replication} { + $replica replicaof $primary_host $primary_port + # Wait for the first replica to connect to the primary + wait_for_condition 100 100 { + [info_field [$replica info replication] master_link_status] eq "up" + } else { + fail "Replica <-> Primary connection not established" + } + + $replica_2 replicaof $replica_host $replica_port + # Wait for the second replica to connect to the first replica + wait_for_condition 100 100 { + [info_field [$replica_2 info replication] master_link_status] eq "up" + } else { + fail "Second replica <-> First replica connection not established" + } + + # Initialize deferred clients and subscribe to keyspace notifications + foreach instance [list $primary $replica $replica_2] { + $instance config set notify-keyspace-events KEA + } + set rd_primary [valkey_deferring_client -2] + set rd_replica [valkey_deferring_client -1] + set rd_replica_2 [valkey_deferring_client $replica_2_host $replica_2_port] + foreach rd [list $rd_primary $rd_replica $rd_replica_2] { + 
assert_equal {1} [psubscribe $rd __keyevent@*] + } + + # Far-future expiry timestamp for f1 (f2 has no expiry) + set f1_exp [expr {[clock seconds] + 10000}] + + ############################################# SETUP HASH ############################################# + $primary HSET myhash f1 v1 f2 v2 ;# Should trigger 3 hset (one per node) + $primary HEXPIREAT myhash $f1_exp FIELDS 1 f1 ;# Should trigger 3 hexpire (one per node) + wait_for_ofs_sync $primary $replica + wait_for_ofs_sync $replica $replica_2 + + set primary_initial_expired [info_field [$primary info stats] expired_fields] + set replica_initial_expired [info_field [$replica info stats] expired_fields] + set replica_2_initial_expired [info_field [$replica_2 info stats] expired_fields] + + $primary HPEXPIRE myhash 100 FIELDS 1 f1 ;# Should trigger 3 hexpire, then 1 hexpired (on the primary) and 2 hdel (on the replicas) + wait_for_ofs_sync $primary $replica + wait_for_ofs_sync $replica $replica_2 + + # Wait for active expire + wait_for_active_expiry $primary myhash 1 $primary_initial_expired 1 + + # Ensure the replicas do not increment expired_fields + assert_equal $replica_initial_expired [info_field [$replica info stats] expired_fields] + assert_equal $replica_2_initial_expired [info_field [$replica_2 info stats] expired_fields] + + + # Verify expired field returns empty string and non-expired returns value + foreach instance [list $primary $replica $replica_2] { + assert_equal "{} v2" [$instance HMGET myhash f1 f2] + assert_equal 0 [get_keys_with_volatile_items $instance] + } + + # primary gets hexpired and replicas get hdel + foreach rd [list $rd_primary $rd_replica $rd_replica_2] { + assert_keyevent_patterns $rd myhash hset hexpire hexpire + } + assert_keyevent_patterns $rd_primary myhash hexpired + assert_keyevent_patterns $rd_replica myhash hdel + assert_keyevent_patterns $rd_replica_2 myhash hdel + + $rd_primary close + $rd_replica close + $rd_replica_2 close + } + } + + proc verify_values {instance f1_exp f2_exp} { + assert_equal $f1_exp [$instance 
HEXPIRETIME myhash FIELDS 1 f1] + assert_equal $f2_exp [$instance HEXPIRETIME myhash FIELDS 1 f2] + assert_equal -1 [$instance HTTL myhash FIELDS 1 f3] + assert_match {1} [scan [regexp -inline {keys=([\d]*)} [$instance info keyspace]] keys=%d] + assert_equal "v1" [$instance HGET myhash f1] + assert_equal "v2" [$instance HGET myhash f2] + assert_equal "v3" [$instance HGET myhash f3] + assert_equal 3 [$instance HLEN myhash] + } + + test {Hash field TTL values remain intact after replica promotion to primary} { + lassign [setup_replication_test $primary $replica $primary_host $primary_port] primary_initial_expired replica_initial_expired + + # Initialize deferred clients and subscribe to keyspace notifications + foreach instance [list $primary $replica] { + $instance config set notify-keyspace-events KEA + } + set rd_primary [valkey_deferring_client -1] + set rd_replica [valkey_deferring_client $replica_host $replica_port] + foreach rd [list $rd_primary $rd_replica] { + assert_equal {1} [psubscribe $rd __keyevent@*] + } + + # Create hash fields with TTL on primary + set f1_exp [expr {[clock seconds] + 2000}] + set f2_exp [expr {[clock seconds] + 300000}] + $primary HSET myhash f1 v1 f2 v2 f3 v3 + $primary HEXPIREAT myhash $f1_exp FIELDS 1 f1 + $primary HEXPIREAT myhash $f2_exp FIELDS 1 f2 + # f3 remains persistent + + # Wait for full sync + wait_for_ofs_sync $primary $replica + + # Verify primary and replica are the same + foreach instance [list $primary $replica] { + verify_values $instance $f1_exp $f2_exp + assert_equal 1 [get_keys_with_volatile_items $instance] + } + + # Perform failover + $replica replicaof no one + # Wait for replica to become primary + wait_for_condition 100 100 { + [info_field [$replica info replication] role] eq "master" + } else { + fail "Replica didn't become master" + } + + # Check all values that checked before are the same + verify_values $replica $f1_exp $f2_exp + + # Set f1 to expire in 1 second and wait for active expiration + set 
replica_initial_expired [info_field [$replica info stats] expired_fields] + $replica HEXPIRE myhash 1 FIELDS 1 f1 + wait_for_active_expiry $replica myhash 2 $replica_initial_expired 1 + + assert_equal "{} v2 v3" [$replica HMGET myhash f1 f2 f3] + # The old primary is not affected + assert_equal 3 [$primary HLEN myhash] + assert_equal "v1 v2 v3" [$primary HMGET myhash f1 f2 f3] + # Compare against the baseline captured at test setup; re-reading it here would make the check trivially true + assert_equal 0 [expr {[info_field [$primary info stats] expired_fields] - $primary_initial_expired}] + + foreach rd [list $rd_primary $rd_replica] { + assert_keyevent_patterns $rd myhash hset hexpire hexpire + } + assert_keyevent_patterns $rd_replica myhash hexpire + assert_keyevent_patterns $rd_replica myhash hexpired + $rd_primary close + $rd_replica close + } + + test {Hash field TTL values persist correctly during FAILOVER command execution} { + lassign [setup_replication_test $primary $replica $primary_host $primary_port] primary_initial_expired replica_initial_expired + + # Initialize deferred clients and subscribe to keyspace notifications + foreach instance [list $primary $replica] { + $instance config set notify-keyspace-events KEA + } + set rd_primary [valkey_deferring_client -1] + set rd_replica [valkey_deferring_client $replica_host $replica_port] + foreach rd [list $rd_primary $rd_replica] { + assert_equal {1} [psubscribe $rd __keyevent@*] + } + + # Create hash fields with TTL on primary + set f1_exp [expr {[clock seconds] + 2000}] + set f2_exp [expr {[clock seconds] + 300000}] + $primary HSET myhash f1 v1 f2 v2 f3 v3 + $primary HEXPIREAT myhash $f1_exp FIELDS 1 f1 + $primary HEXPIREAT myhash $f2_exp FIELDS 1 f2 + # f3 remains persistent + + # Wait for full sync + wait_for_ofs_sync $primary $replica + + # Verify primary and replica are the same + foreach instance [list $primary $replica] { + verify_values $instance $f1_exp $f2_exp + assert_equal 1 [get_keys_with_volatile_items $instance] + } + + # Perform 
failover swap roles + $primary FAILOVER TO $replica_host $replica_port + # Wait for role swap + wait_for_condition 100 100 { + [info_field [$replica info replication] role] eq "master" && + [info_field [$primary info replication] role] eq "slave" + } else { + fail "Failover didn't complete" + } + + # Verify primary and replica are still the same + foreach instance [list $primary $replica] { + verify_values $instance $f1_exp $f2_exp + assert_equal 1 [get_keys_with_volatile_items $instance] + } + + # Set f1 to expire in 1 second and wait for active expiration + $replica HEXPIRE myhash 1 FIELDS 1 f1 ;# will trigger hexpire + wait_for_ofs_sync $replica $primary + set replica_initial_expired [info_field [$replica info stats] expired_fields] + wait_for_active_expiry $replica myhash 2 $replica_initial_expired 1 + + # Verify the previous primary, now a replica of the new primary (the previous replica), is in sync + assert_equal 2 [$primary HLEN myhash] + # Verify expiry + assert_equal "{} v2 v3" [$replica HMGET myhash f1 f2 f3] + assert_equal "" [$primary HGET myhash f1] + assert_equal "v2" [$primary HGET myhash f2] + assert_equal "v3" [$primary HGET myhash f3] + + # The previous primary is now a replica, so its expired_fields must not change + assert_equal $primary_initial_expired [info_field [$primary info stats] expired_fields] + + foreach rd [list $rd_primary $rd_replica] { + assert_keyevent_patterns $rd myhash hset hexpire hexpire hexpire + } + assert_keyevent_patterns $rd_replica myhash hexpired + assert_keyevent_patterns $rd_primary myhash hdel + $rd_primary close + $rd_replica close + } + } +} + +## Check monitor tests ### +start_server {tags {"hashexpire external:skip"}} { + set primary [srv 0 client] + set primary_host [srv 0 host] + set primary_port [srv 0 port] + start_server {tags {needs:repl external:skip}} { + set replica [srv 0 client] + set replica_host [srv 0 host] + set replica_port [srv 0 port] + # Set this inner layer server as replica + set replica [srv 0 client] + + proc 
setup_replica_monitor_test {primary replica primary_host primary_port replica_host replica_port} { + lassign [setup_replication_test $primary $replica $primary_host $primary_port] primary_initial_expired replica_initial_expired + + set rd_replica [valkey_deferring_client $replica_host $replica_port] + $rd_replica monitor + assert_match {*OK*} [$rd_replica read] + + return [list $primary_initial_expired $rd_replica] + } + + proc read_monitor_output {rd_replica read_amount} { + set res {} + set i 0 + while {$i < $read_amount} { + set curr_read [$rd_replica read] + + # Skip lines with INFO commands + if {[regexp {\"info\"} $curr_read] || [regexp {\"SELECT\"} $curr_read]} { + continue + } + lappend res $curr_read + incr i + } + $rd_replica close + return [join $res " "] + } + + # These tests are flaky, probably monitor output should be filtered + test {Multiple expired hash fields are replicated as single HDEL command to replica} { + lassign [setup_replica_monitor_test $primary $replica $primary_host $primary_port $replica_host $replica_port] primary_initial_expired rd_replica + $primary HSET myhash f1 v1 f2 v2 f3 v3 + wait_for_ofs_sync $primary $replica + $primary HPEXPIRE myhash 50 FIELDS 1 f2 + wait_for_ofs_sync $primary $replica + wait_for_active_expiry $primary myhash 2 $primary_initial_expired 1 + set _ [read_monitor_output $rd_replica 3] + } {*HSET*myhash*f1*f2*f3*HDEL*myhash*f2*} + + test {HDEL replication includes only actually expired fields not non-existent ones} { + lassign [setup_replica_monitor_test $primary $replica $primary_host $primary_port $replica_host $replica_port] primary_initial_expired rd_replica + + $primary HSET myhash f1 v1 f2 v2 f3 v3 + wait_for_ofs_sync $primary $replica + $primary HPEXPIRE myhash 50 FIELDS 2 f1 f5 + wait_for_ofs_sync $primary $replica + wait_for_active_expiry $primary myhash 2 $primary_initial_expired 1 + set _ [read_monitor_output $rd_replica 3] + } {*HSET*myhash*f1*f2*f3*HDEL*myhash*f1*} + } +} + +## expired_fields 
Tests #### +start_server {tags {"hashexpire external:skip"}} { + r config set notify-keyspace-events KEA + test {expired_fields metric increments by one when single hash field expires} { + r FLUSHALL + set initial_expired [info_field [r info stats] expired_fields] + + # Create hash with fields and ttl + r HSET myhash f1 v1 f2 v2 f3 v3 + assert_equal 3 [r HLEN myhash] + + # Force expiration by setting very short TTL + r HPEXPIRE myhash 1 FIELDS 1 f1 + + # Wait for expiration + wait_for_active_expiry r myhash 2 $initial_expired 1 + + # Check expired_fields incremented by exactly one relative to the baseline + assert_equal 1 [expr {[info_field [r info stats] expired_fields] - $initial_expired}] + + # Verify expired field returns empty string and non-expired return values + assert_equal "{} v2 v3" [r HMGET myhash f1 f2 f3] + } + + test {expired_fields metric tracks multiple field expirations with keyspace notifications} { + r FLUSHALL + set initial_expired [info_field [r info stats] expired_fields] + + set rd [setup_single_keyspace_notification r] + + # Create hash with expiring fields + r HSET myhash f1 v1 f2 v2 f3 v3 f4 v4 f5 v5 + r HEXPIRE myhash 1000 FIELDS 1 f1 + r HEXPIRE myhash 2000 FIELDS 1 f2 + + # Force expiration with short ttl + r HPEXPIRE myhash 1 FIELDS 1 f1 + + # Wait for expiration + wait_for_active_expiry r myhash 4 $initial_expired 1 + + # Verify expired_fields incremented + assert_equal 1 [expr {[info_field [r info stats] expired_fields] - $initial_expired}] + + # Verify expired field returns empty string and non-expired return values + assert_equal "{} v2 v3 v4 v5" [r HMGET myhash f1 f2 f3 f4 f5] + + # HPERSIST removes the TTL from f2 + r HPERSIST myhash FIELDS 1 f2 + + # Verify f2 no longer has TTL + assert_equal -1 [r HTTL myhash FIELDS 1 f2] + assert_equal 1 [expr {[info_field [r info stats] expired_fields] - $initial_expired}] + + # Expire 2 fields at once + r HPEXPIRE myhash 1 FIELDS 2 f4 f5 + wait_for_active_expiry r myhash 2 $initial_expired 3 + assert_equal 3 [expr {[info_field [r info stats] expired_fields] 
- $initial_expired}] + + # Verify expired fields return empty string and non-expired return values + assert_equal "{} v2 v3 {} {}" [r HMGET myhash f1 f2 f3 f4 f5] + + # Wait for hset and hexpire events + assert_keyevent_patterns $rd myhash hset hexpire hexpire hexpire hexpired hpersist hexpire hexpired + $rd close + } +} + + +start_server {tags {"hashexpire external:skip"}} { + set primary [srv 0 client] + set primary_host [srv 0 host] + set primary_port [srv 0 port] + start_server {tags {needs:repl external:skip}} { + set replica [srv 0 client] + set replica_host [srv 0 host] + set replica_port [srv 0 port] + + test {expired_fields metric increments only on primary not replica during field expiry} { + lassign [setup_replication_test $primary $replica $primary_host $primary_port] primary_initial_expired replica_initial_expired + + # Create hash fields with different TTLs + $primary HSET myhash f1 v1 f2 v2 f3 v3 f4 v4 + $primary HEXPIRE myhash 3000 FIELDS 1 f1 + $primary HSETEX myhash EX 5000 FIELDS 1 f2 v2 + $primary HEXPIRE myhash 60000 FIELDS 1 f3 + wait_for_ofs_sync $primary $replica + + # Verify PERSIST + assert_equal "v3" [$primary HGETEX myhash PERSIST FIELDS 1 f3] + wait_for_ofs_sync $primary $replica + assert_equal -1 [$primary HTTL myhash FIELDS 1 f3] + assert_equal -1 [$replica HTTL myhash FIELDS 1 f3] + + $primary HPEXPIRE myhash 1 FIELDS 1 f1 + wait_for_ofs_sync $primary $replica + # Wait for active expiry + wait_for_active_expiry $primary myhash 3 $primary_initial_expired 1 + + assert_equal 0 [info_field [$replica info stats] expired_fields] + } + } +} + +start_server {tags {"hashexpire external:skip"}} { + set primary [srv 0 client] + set primary_host [srv 0 host] + set primary_port [srv 0 port] + start_server {tags {needs:repl external:skip}} { + set replica [srv 0 client] + set replica_host [srv 0 host] + set replica_port [srv 0 port] + + test {expired_fields metric correctly tracks sequential field expirations in replication} { + lassign 
[setup_replication_test $primary $replica $primary_host $primary_port] primary_initial_expired replica_initial_expired + # Initialize deferred clients and subscribe to keyspace notifications + foreach instance [list $primary $replica] { + $instance config set notify-keyspace-events KEA + } + set rd_primary [valkey_deferring_client -1] + set rd_replica [valkey_deferring_client $replica_host $replica_port] + foreach rd [list $rd_primary $rd_replica] { + assert_equal {1} [psubscribe $rd __keyevent@*] + } + + # Create hash fields with different TTLs + $primary HSET myhash f1 v1 f2 v2 f3 v3 f4 v4 + $primary HEXPIRE myhash 3000 FIELDS 1 f1 + $primary HSETEX myhash EX 5000 FIELDS 1 f2 v2 + $primary HEXPIRE myhash 60000 FIELDS 1 f3 + wait_for_ofs_sync $primary $replica + + # Verify TTLs are set correctly + assert_morethan [$primary HTTL myhash FIELDS 1 f1] 0 + assert_morethan [$primary HTTL myhash FIELDS 1 f2] 0 + assert_morethan [$primary HTTL myhash FIELDS 1 f3] 0 + assert_equal -1 [$primary HTTL myhash FIELDS 1 f4] + + assert_equal 4 [$primary HLEN myhash] + assert_equal 4 [$replica HLEN myhash] + + # Verify values + assert_equal "v1 v2 v3 v4" [$primary HMGET myhash f1 f2 f3 f4] + assert_equal "v1 v2 v3 v4" [$replica HMGET myhash f1 f2 f3 f4] + + # Verify PERSIST + assert_equal "v3" [$primary HGETEX myhash PERSIST FIELDS 1 f3] + wait_for_ofs_sync $primary $replica + assert_equal -1 [$primary HTTL myhash FIELDS 1 f3] + assert_equal -1 [$replica HTTL myhash FIELDS 1 f3] + + assert_equal 1 [get_keys_with_volatile_items $primary] + assert_equal 1 [get_keys_with_volatile_items $replica] + # Expire fields one by one + for {set i 1} {$i <= 4} {incr i} { + assert_equal 1 [get_keys $primary] + assert_equal 1 [get_keys $replica] + + # Set field to expire immediately + $primary HPEXPIRE myhash 1 FIELDS 1 f$i + wait_for_ofs_sync $primary $replica + + # Wait for active expiry + wait_for_active_expiry $primary myhash [expr {4 - $i}] $primary_initial_expired $i + + # Replica should 
NOT increment expired_fields + assert_equal 0 [info_field [$replica info stats] expired_fields] + + # Replica should also have the field removed with replication + assert_equal [expr {4 - $i}] [$replica HLEN myhash] + } + assert_equal 0 [get_keys_with_volatile_items $primary] + assert_equal 0 [get_keys_with_volatile_items $replica] + + # Hash should be deleted when all fields expire + assert_equal 0 [$primary EXISTS myhash] + assert_equal 0 [$replica EXISTS myhash] + assert_equal 0 [get_keys $primary] + assert_equal 0 [get_keys $replica] + assert_equal 0 [get_keys_with_volatile_items $primary] + assert_equal 0 [get_keys_with_volatile_items $replica] + + foreach rd [list $rd_primary $rd_replica] { + assert_keyevent_patterns $rd myhash hset hexpire hset hexpire hexpire hpersist hexpire + } + assert_keyevent_patterns $rd_primary myhash hexpired ; # f1 + assert_keyevent_patterns $rd_replica myhash hdel + foreach rd [list $rd_primary $rd_replica] { + assert_keyevent_patterns $rd myhash hexpire + } + assert_keyevent_patterns $rd_primary myhash hexpired ; # f2 + assert_keyevent_patterns $rd_replica myhash hdel + foreach rd [list $rd_primary $rd_replica] { + assert_keyevent_patterns $rd myhash hexpire + } + assert_keyevent_patterns $rd_primary myhash hexpired ; # f3 + assert_keyevent_patterns $rd_replica myhash hdel + foreach rd [list $rd_primary $rd_replica] { + assert_keyevent_patterns $rd myhash hexpire + } + assert_keyevent_patterns $rd_primary myhash hexpired del ; # f4 + assert_keyevent_patterns $rd_replica myhash hdel del + $rd_primary close + $rd_replica close + } + } +} + +#### CLIENT PAUSE WRITE prevents active expiration test ##### +start_server {tags {"hashexpire external:skip"}} { + test "CLIENT PAUSE WRITE blocks hash field active expiry until pause ends" { + r FLUSHALL + set initial_expired [info_field [r info stats] expired_fields] + + r HSET myhash f1 v1 f2 v2 + assert_equal 2 [r HLEN myhash] + + # To avoid flakiness - run commands in transaction + r multi 
+ + r HPEXPIRE myhash 500 FIELDS 1 f1 + r CLIENT PAUSE 1200 WRITE + + r exec + + # Verify no expiry happened immediately after transaction + assert_equal 2 [r HLEN myhash] + assert_equal 0 [expr {[info_field [r info stats] expired_fields] - $initial_expired}] + + # Wait longer than expiry time while paused + after 600 + + # Field should still exist because active expiry is paused + assert_equal 2 [r HLEN myhash] + assert_equal 0 [expr {[info_field [r info stats] expired_fields] - $initial_expired}] + + # Wait for pause to end + after 600 + + # Now active expiry should work + wait_for_active_expiry r myhash 1 $initial_expired 1 50 20 + + assert_equal "{} v2" [r HMGET myhash f1 f2] + } +} + + +##### Active Expiry Tests After RENAME/COPY/RESTORE Operations ##### +start_server {tags {"hashexpire external:skip"}} { + foreach command {HEXPIRE HPEXPIRE HEXPIREAT HPEXPIREAT} { + foreach op {RENAME COPY RESTORE MOVE} { + test "$command active expiry works correctly after $op operation" { + r FLUSHALL + r SELECT 0 + set initial_expired [info_field [r info stats] expired_fields] + + r HSET myhash f1 v1 f2 v2 f3 v3 f4 v4 + assert_equal 4 [r HLEN myhash] + assert_equal 0 [get_keys_with_volatile_items r] + + # Set expiry on fields + r $command myhash [get_short_expire_value $command] FIELDS 1 f1 + wait_for_active_expiry r myhash 3 $initial_expired 1 + r $command myhash [get_long_expire_value $command] FIELDS 1 f4 + assert_equal 1 [get_keys_with_volatile_items r] + + # Run op command + if {$op eq "RENAME"} { + r RENAME myhash newhash + set target_key newhash + } elseif {$op eq "COPY"} { + r COPY myhash copyhash + set target_key copyhash + } elseif {$op eq "RESTORE"} { + # RESTORE + set serialized [r DUMP myhash] + r DEL myhash + r RESTORE restorehash 0 $serialized + set target_key restorehash + } else { + r MOVE myhash 1 + # Switch to target DB + r SELECT 1 + set target_key myhash + } + if {$op eq "COPY"} { + assert_equal 2 [get_keys_with_volatile_items r] + } else { + 
assert_equal 1 [get_keys_with_volatile_items r] + } + + # Set expiry on fields after op command + r $command $target_key [get_short_expire_value $command] FIELDS 1 f3 + # Wait for active expiry on "new" key + wait_for_active_expiry r $target_key 2 $initial_expired 2 + + assert_equal "{} v2 {}" [r HMGET $target_key f1 f2 f3] + # For COPY, verify the original hash hasn't changed + if {$op eq "COPY"} { + assert_equal "{} v2 v3" [r HMGET myhash f1 f2 f3] + } + } + } + } +} + +foreach command {HEXPIRE HPEXPIRE HEXPIREAT HPEXPIREAT} { + start_server {tags {"hashexpire external:skip"}} { + test "$command active expiry processes multiple hash keys with different field counts" { + r FLUSHALL + set initial_expired [info_field [r info stats] expired_fields] + + # Create multiple hash keys + for {set i 1} {$i <= 5} {incr i} { + r HSET hash$i f1 v1_$i f2 v2_$i f3 v3_$i + } + assert_equal 5 [get_keys r] + assert_equal 0 [get_keys_with_volatile_items r] + + r $command hash1 [get_short_expire_value $command] FIELDS 1 f1 + r $command hash2 [get_short_expire_value $command] FIELDS 2 f1 f2 + r $command hash3 [get_short_expire_value $command] FIELDS 3 f1 f2 f3 + r $command hash4 [get_short_expire_value $command] FIELDS 1 f2 + + wait_for_condition 100 100 { + [r HLEN hash1] eq 2 && [r HLEN hash2] eq 1 && + [r HLEN hash3] eq 0 && [r HLEN hash4] eq 2 && [r HLEN hash5] eq 3 && + [expr {[info_field [r info stats] expired_fields] - $initial_expired}] eq 7 + } else { + fail "Fields should expire across multiple keys" + } + + assert_equal "{} v2_1 v3_1" [r HMGET hash1 f1 f2 f3] + assert_equal "{} {} v3_2" [r HMGET hash2 f1 f2 f3] + assert_equal 0 [r EXISTS hash3] + assert_equal "v1_4 {} v3_4" [r HMGET hash4 f1 f2 f3] + assert_equal "v1_5 v2_5 v3_5" [r HMGET hash5 f1 f2 f3] + assert_equal 4 [get_keys r] + assert_equal 0 [get_keys_with_volatile_items r] + + # Set long expire + r $command hash1 [get_long_expire_value $command] FIELDS 1 f2 + assert_equal 1 [get_keys_with_volatile_items r] + + r $command 
hash2 [get_long_expire_value $command] FIELDS 1 f3 + assert_equal 2 [get_keys_with_volatile_items r] + } + } +} +foreach command {HEXPIRE HPEXPIRE HEXPIREAT HPEXPIREAT} { + start_server {tags {"hashexpire external:skip"}} { + test "$command handles mixed short and long expiry times across multiple keys" { + r FLUSHALL + set initial_expired [info_field [r info stats] expired_fields] + + r HSET key1 f1 v1 f2 v2 f3 v3 + r HSET key2 f1 v1 f2 v2 f3 v3 + r HSET key3 f1 v1 f2 v2 f3 v3 + r HSET key4 f1 v1 f2 v2 f3 v3 + assert_equal 4 [get_keys r] + assert_equal 0 [get_keys_with_volatile_items r] + + r $command key2 [get_long_expire_value $command] FIELDS 1 f1 + assert_equal 1 [get_keys_with_volatile_items r] + + set short_expire [get_short_expire_value $command] + r $command key1 $short_expire FIELDS 1 f1 + r $command key3 $short_expire FIELDS 2 f1 f2 + r $command key4 $short_expire FIELDS 3 f1 f2 f3 + + wait_for_condition 100 100 { + [r HLEN key1] eq 2 && [r HLEN key3] eq 1 && + [r HLEN key4] eq 0 && [expr {[info_field [r info stats] expired_fields] - $initial_expired}] eq 6 + } else { + fail "Short expiry fields should expire" + } + + assert_equal "{} v2 v3" [r HMGET key1 f1 f2 f3] + assert_equal "v1 v2 v3" [r HMGET key2 f1 f2 f3] + assert_equal "{} {} v3" [r HMGET key3 f1 f2 f3] + assert_equal 0 [r EXISTS key4] + assert_equal 3 [get_keys r] + assert_equal 1 [get_keys_with_volatile_items r] + + assert_morethan [r HTTL key2 FIELDS 1 f1] 3000 + } + } +} +foreach command {HEXPIRE HPEXPIRE HEXPIREAT HPEXPIREAT} { + start_server {tags {"hashexpire external:skip"}} { + + test "$command deletes entire keys when all fields expire while preserving partial keys" { + r FLUSHALL + set initial_expired [info_field [r info stats] expired_fields] + + # Create keys where some will be completely deleted + for {set i 1} {$i <= 4} {incr i} { + r HSET delkey$i f1 v1 + } + r HSET keepkey f1 v1 f2 v2 + + # Set expiry on f1 field in delkey1-4 (which is all the fields there) + for {set i 1} {$i 
<= 4} {incr i} { + r $command delkey$i [get_short_expire_value $command] FIELDS 1 f1 + } + r $command keepkey [get_short_expire_value $command] FIELDS 1 f1 + + # Wait for active expiry - 4 keys deleted, 1 key reduced + wait_for_condition 100 100 { + [r EXISTS delkey1] eq 0 && [r EXISTS delkey2] eq 0 && + [r EXISTS delkey3] eq 0 && [r EXISTS delkey4] eq 0 && + [r HLEN keepkey] eq 1 && + [info_field [r info stats] expired_fields] eq [expr {$initial_expired + 5}] + } else { + fail "Keys should be deleted when last field expires" + } + + assert_equal "{} v2" [r HMGET keepkey f1 f2] + } + } +} + +foreach command {HEXPIRE HPEXPIRE HEXPIREAT HPEXPIREAT} { + start_server {tags {"hashexpire external:skip"}} { + test "$command active expiry reclaims memory efficiently across multiple large hash keys" { + r FLUSHALL + set initial_expired [info_field [r info stats] expired_fields] + + # Create keys with large values + set large_value [string repeat "x" 1024] + # 5 keys, 10 "large" fields in each + for {set i 1} {$i <= 5} {incr i} { + for {set j 1} {$j <= 10} {incr j} { + r HSET myhash$i f$j $large_value$i$j + } + } + + # Save initial memory + set total_mem_before 0 + for {set i 1} {$i <= 5} {incr i} { + set mem [r MEMORY USAGE myhash$i] + if {$mem eq ""} {set mem 0} + incr total_mem_before $mem + } + + # For each key, set expire for 5 fields + for {set i 1} {$i <= 5} {incr i} { + r $command myhash$i [get_short_expire_value $command] FIELDS 5 f1 f2 f3 f4 f5 + } + + # Wait for expiry + wait_for_condition 100 100 { + [r HLEN myhash1] eq 5 && [r HLEN myhash2] eq 5 && + [r HLEN myhash3] eq 5 && [r HLEN myhash4] eq 5 && + [r HLEN myhash5] eq 5 && + [info_field [r info stats] expired_fields] eq [expr {$initial_expired + 25}] + } else { + fail "25 fields should expire across 5 keys" + } + + # Verify memory reduction + set total_mem_after 0 + for {set i 1} {$i <= 5} {incr i} { + set mem [r MEMORY USAGE myhash$i] + if {$mem eq ""} {set mem 0} + incr total_mem_after $mem + } + + # Memory 
should be reduced + if {$total_mem_before > 0} { + assert_morethan [expr {$total_mem_before - $total_mem_after}] 10000 + } + } + } +} + +##### HINCRBY/HINCRBYFLOAT Active Expiry Tests ##### +start_server {tags {"hashexpire external:skip"}} { + foreach cmd {HINCRBY HINCRBYFLOAT} { + # Set increment values + if {$cmd eq "HINCRBY"} { + set inc1 2 + set inc2 3 + set inc3 4 + } else { + set inc1 2.5 + set inc2 3.5 + set inc3 4.5 + } + + # 1 key, 1 field + test "$cmd recreates field with correct value after active expiry deletion" { + r FLUSHALL + set initial_expired [info_field [r info stats] expired_fields] + + r HSET myhash f1 1 + assert_equal 1 [r HLEN myhash] + + # Set expiry on f1 + r HPEXPIRE myhash 100 FIELDS 1 f1 + + # Wait for active expiry + wait_for_active_expiry r myhash 0 $initial_expired 1 + + # Try increment after expiry (should recreate field) + r $cmd myhash f1 $inc1 + assert_equal $inc1 [r HGET myhash f1] + } + + # 1 key, 1 field, increment before expiry + test "$cmd preserves existing TTL when incrementing field value" { + r FLUSHALL + set initial_expired [info_field [r info stats] expired_fields] + + r HSET myhash f1 1 + assert_equal 1 [r HLEN myhash] + + # Set a long expiry before incrementing + r HEXPIRE myhash 100000 FIELDS 1 f1 + + # Increment after the expiry is set + r $cmd myhash f1 $inc1 + + # Check the value and that the expiry is still set + assert_equal [expr {$inc1 + 1}] [r HGET myhash f1] + assert_morethan [r HTTL myhash FIELDS 1 f1] 90000 + } + + # 1 key, 3 fields, increment multiple fields, expiry on multiple fields + test "$cmd handles mix of expired and existing fields during increment operations" { + r FLUSHALL + set initial_expired [info_field [r info stats] expired_fields] + + r HSET myhash f1 1 f2 2 f3 3 + assert_equal 3 [r HLEN myhash] + + # Set expiry on f1 and f3 + r HPEXPIRE myhash 100 FIELDS 2 f1 f3 + + # Wait for active expiry + wait_for_active_expiry r myhash 1 $initial_expired 2 + + # Increment all fields (f1 and f3 should be recreated, f2 should 
increment) + r $cmd myhash f1 $inc1 + r $cmd myhash f2 $inc2 + r $cmd myhash f3 $inc3 + + # Check values + assert_equal "$inc1 [expr {$inc2+2}] $inc3" [r HMGET myhash f1 f2 f3] + } + + # 1 key, 3 fields, increment before expiry, then expire + test "$cmd maintains TTL values when incrementing fields with existing expiry" { + r FLUSHALL + set initial_expired [info_field [r info stats] expired_fields] + + r HSET myhash f1 1 f2 2 f3 3 + assert_equal 3 [r HLEN myhash] + + # Set expiry on f1 and f3 + r HEXPIRE myhash 100000 FIELDS 2 f1 f3 + + # Increment f1 and decrement f3 + r $cmd myhash f1 $inc1 + r $cmd myhash f3 -$inc3 + # All fields should remain: f1 and f3 updated, f2 unchanged + assert_equal "[expr {$inc1+1}] 2 [expr {-$inc3+3}]" [r HMGET myhash f1 f2 f3] + } + } +} + +### HDEL WITH ACTIVE EXPIRE ##### +start_server {tags {"hashexpire external:skip"}} { + test {HDEL removes both expired and non-expired fields deleting key when empty} { + r FLUSHALL + set initial_expired [info_field [r info stats] expired_fields] + r HSET myhash f1 v1 f2 v2 + r HEXPIRE myhash 1 FIELDS 1 f1 + wait_for_active_expiry r myhash 1 $initial_expired 1 + # f1 is expired, f2 is not, f3 does not exist + r HDEL myhash f1 f2 f3 + # f1 and f2 should be gone, f3 never existed + assert_equal 0 [r HEXISTS myhash f1] + assert_equal 0 [r HEXISTS myhash f2] + assert_equal 0 [r HEXISTS myhash f3] + # The key should be deleted since all fields are gone + assert_equal 0 [r EXISTS myhash] + } +} + +##### HPERSIST TEST WITH ACTIVE EXPIRY ##### +start_server {tags {"hashexpire external:skip"}} { + test {HPERSIST returns -2 when attempting to persist already expired field} { + r FLUSHALL + set initial_expired [info_field [r info stats] expired_fields] + r HSET myhash f1 v1 + r HPEXPIRE myhash 50 FIELDS 1 f1 + wait_for_active_expiry r myhash 0 $initial_expired 1 + assert_equal -2 [r HPERSIST myhash FIELDS 1 f1] + assert_equal -2 [r HTTL myhash FIELDS 1 f1] + assert_equal "" [r HGET myhash f1] + } + + test {HPEXPIRE works correctly on field 
after HPERSIST removes its TTL} { + r FLUSHALL + set initial_expired [info_field [r info stats] expired_fields] + r HSET myhash f1 v1 + r HEXPIRE myhash 10000 FIELDS 1 f1 + r HPERSIST myhash FIELDS 1 f1 + r HPEXPIRE myhash 150 FIELDS 1 f1 + wait_for_active_expiry r myhash 0 $initial_expired 1 + assert_equal 0 [r EXISTS myhash] + } +} +