Forge is an embedded persistent key value storage engine. It's design is based on a lsm tree and sstable. Because of this were able to achieve high write throughputs easily. A lot of the engineering effort is to bring read speeds up and the orchestration and efficiency of the compaction process without sacrificing write speeds.
Current write and read speeds(depends largely on size of data/database):
- 1 - 1.67 million sequential writes per second.
- 1 million random writes per second.
- 100k sequential reads per second.
- 50k random reads per second
instruction to use the cli interface here https://github.com/iFrogHop2Worlds/Forge/blob/main/ForgeCli/readme.md
ForgeEngine docs can be generated by running cargo doc --open in the Forge/ForgeEngine directory. Will publish to docs.rs soon.
The Write Ahead Log (WAL) is a crucial component of Forge's architecture. It ensures durability by logging all write operations before they are applied to the database. This allows Forge to recover from crashes or failures by replaying the WAL entries during recovery, ensuring that no data is lost. The WAL is also used for crash recovery and point-in-time recovery, providing a mechanism to restore the database to a consistent state after a failure.
The Memtable is a temporary in-memory data structure that holds recently written data before it is flushed to disk. It acts as a buffer, improving write performance by reducing disk I/O. The Memtable is designed to be flushed to disk periodically or when it reaches a certain size, ensuring that data is persisted to disk. This design choice allows Forge to maintain high write throughput while ensuring data durability.
The Sorted String Table (SSTable) is a persistent data structure that stores data in sorted order on disk. It is designed for efficient read operations and is used as the primary storage format for Forge's database. SSTables are immutable, meaning once written, they cannot be modified. Instead, new SSTables are created for updated data, and the old ones are deleted during compaction. This design choice allows Forge to achieve high read performance while maintaining data durability and consistency.
Compaction is a process in Forge's architecture that merges multiple SSTables into fewer, larger SSTables. This process helps to reduce the number of SSTables and improve read performance. During compaction, Forge merges overlapping SSTables, discarding obsolete data and compacting the remaining data into a new SSTable. This process also helps to reduce the size of SSTables, making them more efficient to read from disk. Compaction is an essential part of Forge's architecture, ensuring that the database remains performant and efficient over time.
Each SSTable opened by Forge is wrapped in an in-memory table cache. The table cache keeps the lookup metadata for that table resident in memory so a point read does not need to reopen and reparse the SSTable companion files every time. Today the cache holds:
- the sparse index
- the bloom filter
- the optional key range fence
- the optional block index
- one open reader for the table data file
This makes the read path much cheaper because most lookups can stay in memory until Forge knows the exact byte range that must be touched on disk.
The table cache also owns the block cache for that table. Forge does not immediately decode and retain every block it touches. On a cold miss it reads the raw block bytes, searches that block, and returns the matching entry if found. If the same block is touched again, Forge promotes it into the decoded block cache. Once promoted, repeated reads can reuse the parsed entries directly instead of decoding the block again. This hybrid approach gives Forge better behavior for both random reads and localized repeated reads.
The fence stores the smallest and largest key in an SSTable. Forge persists this as a small companion file next to the table so it can be loaded into the table cache at open time.
Before Forge checks the bloom filter or the block index, it first asks whether the requested key is inside the fence's
min_key..=max_key range. If the key is outside that range, the whole table can be skipped immediately.
This is a cheap but important optimization. It reduces unnecessary bloom checks, unnecessary table probes, and overall read amplification when many SSTables are present.
The Sparse Index is a data structure that allows for efficient retrieval of data from SSTables. It is designed to reduce the number of disk reads required to retrieve data, improving read performance. The Sparse Index is built by sampling data from SSTables and storing those sampled keys and byte offsets in memory. When Forge looks up a key, the sparse index gives the byte offset of the closest sampled key less than or equal to the requested key. Forge then seeks to that offset and scans forward.
If the block index path cannot fully answer the lookup, the sparse index becomes the fallback path. Forge still avoids decoding every full record during that scan. It reads the key and record lengths, skips value payloads for non-matching entries, and only materializes the full value when the requested key is found.
The Block Index maps sampled first-keys to physical blocks inside the SSTable data file. This gives Forge a more precise starting point than the sparse index alone. When a block index is present, Forge can usually narrow a point lookup down to a single block instead of seeking to a coarser offset and scanning a larger portion of the file.
When querying a key, Forge uses the block index to locate the candidate block, reads that one block, and searches within it. If the block becomes hot, it is promoted into the decoded block cache owned by the table cache.
The Bloom Filter is a probabilistic data structure used in Forge's architecture to quickly determine if a key exists in a SSTable. It is designed to reduce the number of disk reads required to retrieve data, improving read performance. The Bloom Filter is built by hashing keys and storing the results in a bit array. This allows for quick lookups and reduces the need to read entire SSTables from disk.
In the current read path the bloom filter is not the first check. Forge first uses the fence to decide whether a table can be skipped by range. Only if the key is inside the table's range does Forge consult the bloom filter. This ordering keeps the bloom filter useful while avoiding work on tables that can already be rejected by their key range.