Skip to content

This repository is a learners journey to understanding Bloom filter basics

License

Notifications You must be signed in to change notification settings

stainlessray/bloom

Repository files navigation

๐Ÿง  Interactive Bloom Filter Demo (v0.1.1)

This console-based demo allows you to explore Bloom filter variants in real time, observe their bit patterns, and persist or reload filter states between sessions.

๐Ÿซ Build

mvn clean package

๐ŸŽฎ Launch

mvn clean package
java -cp target/classes com.bloomfilter.demo.InteractiveBloomDemo

Play

=======================================
 BLOOM FILTER INTERACTIVE DEMO 
=======================================

Switched to classic mode.
>

๐Ÿงฉ Commands Overview

Command Description
mode <classic partitioned counting> Switches the active Bloom filter type
add <word> Inserts an element
check <word> Tests membership of an element
remove <word> Removes an element (Counting only)
clear Resets all bits/counters
info Shows estimated count and false-positive rate
ingestlist <input.txt> <outputFolder/> Loads a word list, creates and saves a standardized binary
save <file> Manually saves the active filter state
load <file> Loads a legacy .bin file saved by save
loadstd <file> Loads a standardized .bin file created by ingestlist
loadmeta <file> Displays metadata (algorithm, bit size, hashes, source, timestamp) without loading
crossload <file> Hot-swaps the current in-memory filter with a serialized binary (same configuration required)
help Displays available commands
exit Quits the demo

๐Ÿงฐ Ingestion and Auto-Naming

The ingestlist command standardizes source text data into persistent .bin filters.

Example

> mode classic
> ingestlist src/main/resources/data/fruit.txt filters/

Produces:

[Standardized binary created] filters/fruit_Classic_m64_k3_v20251028192638.bin
Algorithm=ClassicBloomFilter | Bits=64 | Hashes=3 | Source=src/main/resources/data/fruit.txt

Auto-Naming Convention

<listName>_<Algorithm>_m<bitArraySize>_k<hashCount>_v<timestamp>.bin

Examples:

fruit_Classic_m64_k3_v20251028192638.bin
fruit_Counting_m64_k3_v20251028192700.bin
fruit_Partitioned_p4x32_k3_v20251028192812.bin
  • m = total bit array size
  • p#x# = partitioned filter configuration (partitions ร— size)
  • k = number of hash functions
  • v = creation timestamp (local time, yyyyMMddHHmmss)

Each algorithm uses its own storage layout. โœ… One text file may be ingested for multiple algorithms. ๐Ÿšซ One binary file cannot be used across modes.


๐Ÿงฌ Metadata and Standardization

Each standardized binary embeds descriptive metadata in the header:

Algorithm=ClassicBloomFilter | Bits=64 | Hashes=3 |
Source=src/main/resources/data/fruit.txt | Created=2025-10-28T19:26:38

This enables:

  • Self-documenting filters
  • Mode validation on load/crossload
  • Clean long-term persistence for experimentation

๐Ÿงฉ Example Workflow

> mode classic
> ingestlist src/main/resources/data/fruit.txt filters/
> loadstd filters/fruit_Classic_m64_k3_v20251028192638.bin
> check apple
Result: apple โ†’ possibly in set
> mode counting
> ingestlist src/main/resources/data/fruit.txt filters/
> remove apple
> check apple
Result: apple โ†’ definitely not
> save filters/fruit_counting_m64_k3_v20251028192700.bin

๐Ÿง  Notes

  • Verbose mode is automatically enabled for interactive learning, showing each hash, index, and bit operation.

  • All binary filters are portable within this framework version.

  • Cross-mode loading gracefully errors if configuration mismatch occurs, preserving the console session.

  • Planned enhancements for v0.2:

    • lsfilters command to list recent filters
    • optional JSON metadata export
    • Docker sandbox environment for browser-based testing

Bloom Filter Family Educational implementation and visualization framework. Not intended for production โ€” designed for learning, experimentation, and extension.


๐Ÿง  Overview

This project provides a learning-focused, extensible set of Bloom Filter implementations with full visualization and persistence support.
It allows you to explore the probabilistic behavior of these filters through console interaction, wordlist ingestion, and standardized binary serialization.


โš™๏ธ Features

Capability Description
๐Ÿงฉ Modular Filters ClassicBloomFilter, CountingBloomFilter, PartitionedBloomFilter all implement MembershipFilter<T>.
๐Ÿงฎ Hashing MurmurHash3 (x64/128) via HashUtils for stable, fast indexing.
๐Ÿง  Visualization Real-time console visualization of bit arrays and counters.
๐Ÿ’พ Persistence Filters can be saved, loaded, or ingested from .txt wordlists as standardized .bin files.
๐Ÿงพ Metadata Each .bin file includes FilterMetadata (algorithm, source, bit size, hash count, timestamp).
๐Ÿง‘โ€๐Ÿ’ป Interactive Console Learn Bloom filters by directly adding, checking, and removing elements.

๐Ÿงฉ Project Structure


bloom_project_extracted/
โ”‚
โ”œโ”€โ”€ README.md
โ”œโ”€โ”€ how_it_started.md
โ”œโ”€โ”€ .gitignore
โ”œโ”€โ”€ pickup-notes-102825.md
โ”œโ”€โ”€ literal_example.md
โ”œโ”€โ”€ LICENSE
โ”œโ”€โ”€ about_bloom_filters.md
โ”œโ”€โ”€ pom.xml
โ”œโ”€โ”€ CHANGELOG.md
โ”œโ”€โ”€ EDUCATIVE.md
โ”œโ”€โ”€ QUICKSTART.md
โ”‚
โ”œโ”€โ”€ .git/
โ”‚   โ”œโ”€โ”€ config
โ”‚   โ”œโ”€โ”€ HEAD
โ”‚   โ”œโ”€โ”€ index
โ”‚   โ”œโ”€โ”€ refs/
โ”‚   โ”œโ”€โ”€ objects/
โ”‚   โ”œโ”€โ”€ hooks/
โ”‚   โ”œโ”€โ”€ logs/
โ”‚   โ””โ”€โ”€ branches/
โ”‚
โ”œโ”€โ”€ .idea/
โ”‚   โ”œโ”€โ”€ compiler.xml
โ”‚   โ”œโ”€โ”€ encodings.xml
โ”‚   โ”œโ”€โ”€ misc.xml
โ”‚   โ”œโ”€โ”€ modules.xml
โ”‚   โ”œโ”€โ”€ vcs.xml
โ”‚   โ”œโ”€โ”€ workspace.xml
โ”‚   โ””โ”€โ”€ dictionaries/project.xml
โ”‚
โ”œโ”€โ”€ src/
โ”‚   โ”œโ”€โ”€ main/
โ”‚   โ”‚   โ”œโ”€โ”€ java/com/bloomfilter/
โ”‚   โ”‚   โ”‚   โ”œโ”€โ”€ AbstractBloomFilter.java
โ”‚   โ”‚   โ”‚   โ”œโ”€โ”€ ClassicBloomFilter.java
โ”‚   โ”‚   โ”‚   โ”œโ”€โ”€ CountingBloomFilter.java
โ”‚   โ”‚   โ”‚   โ”œโ”€โ”€ FilterIO.java
โ”‚   โ”‚   โ”‚   โ”œโ”€โ”€ FilterMetadata.java
โ”‚   โ”‚   โ”‚   โ”œโ”€โ”€ HashUtils.java
โ”‚   โ”‚   โ”‚   โ”œโ”€โ”€ MembershipFilter.java
โ”‚   โ”‚   โ”‚   โ”œโ”€โ”€ PartitionedBloomFilter.java
โ”‚   โ”‚   โ”‚   โ””โ”€โ”€ demo/
โ”‚   โ”‚   โ”‚       โ”œโ”€โ”€ InteractiveBloomDemo.java
โ”‚   โ”‚   โ”‚       โ””โ”€โ”€ MasterVisualDemo.java
โ”‚   โ”‚   โ”‚
โ”‚   โ”‚   โ””โ”€โ”€ resources/
โ”‚   โ”‚       โ”œโ”€โ”€ data/
โ”‚   โ”‚       โ”‚   โ”œโ”€โ”€ animals.txt
โ”‚   โ”‚       โ”‚   โ”œโ”€โ”€ cities.txt
โ”‚   โ”‚       โ”‚   โ”œโ”€โ”€ fruit.txt
โ”‚   โ”‚       โ”‚   โ”œโ”€โ”€ random.txt
โ”‚   โ”‚       โ”‚   โ”œโ”€โ”€ uuids.txt
โ”‚   โ”‚       โ”‚   โ””โ”€โ”€ doc/
โ”‚   โ”‚       โ”‚       โ”œโ”€โ”€ ABOUT_BLOOM_FILTERS.md
โ”‚   โ”‚       โ”‚       โ”œโ”€โ”€ CHANGELOG.md
โ”‚   โ”‚       โ”‚       โ””โ”€โ”€ PROJECT_ORIGIN.md
โ”‚   โ”‚
โ”‚   โ””โ”€โ”€ test/
โ”‚       โ””โ”€โ”€ java/com/bloomfilter/
โ”‚           โ”œโ”€โ”€ BloomFilterTest.java
โ”‚           โ”œโ”€โ”€ ClassicBloomFilterTest.java
โ”‚           โ”œโ”€โ”€ CountingBloomFilterTest.java
โ”‚           โ”œโ”€โ”€ HashUtilsTest.java
โ”‚           โ””โ”€โ”€ PartitionedBloomFilterTest.java
โ”‚
โ”œโ”€โ”€ target/
โ”‚   โ”œโ”€โ”€ bloom-0.1-SNAPSHOT.jar
โ”‚   โ”œโ”€โ”€ classes/...
โ”‚   โ”œโ”€โ”€ test-classes/...
โ”‚   โ”œโ”€โ”€ surefire-reports/ (JUnit results)
โ”‚   โ”œโ”€โ”€ generated-sources/
โ”‚   โ”œโ”€โ”€ generated-test-sources/
โ”‚   โ””โ”€โ”€ maven-status/
โ”‚
โ””โ”€โ”€ filters/
    โ”œโ”€โ”€ animals_ClassicBloomFilter_m64_k3_v20251028210153.bin
    โ”œโ”€โ”€ animals_CountingBloomFilter_m64_k3_v20251028210218.bin
    โ”œโ”€โ”€ animals_PartitionedBloomFilter_p4x32_k3_v20251028210240.bin
    โ”œโ”€โ”€ cities_ClassicBloomFilter_m64_k3_v20251028210444.bin
    โ”œโ”€โ”€ cities_CountingBloomFilter_m64_k3_v20251028210459.bin
    โ”œโ”€โ”€ cities_PartitionedBloomFilter_p4x32_k3_v20251028210507.bin
    โ”œโ”€โ”€ fruit_ClassicBloomFilter_m64_k3_v20251028205146.bin
    โ”œโ”€โ”€ fruit_CountingBloomFilter_m64_k3_v20251028202122.bin
    โ””โ”€โ”€ fruit_PartitionedBloomFilter_p4x32_k3_v20251028205314.bin


๐Ÿ’พ Standardized Binary Format

Each .bin file contains:

  1. A serialized FilterMetadata header:

    Algorithm=ClassicBloomFilter | Bits=64 | Hashes=3 | Source=data/fruit.txt | Created=2025-10-28T17:30:28
    
  2. A binary-encoded representation of the filterโ€™s internal state (BitSet or counter array).

This format ensures deterministic, comparable runs across filter variants and datasets.


๐Ÿงช Tests

Run JUnit tests:

mvn test

Youโ€™ll see console output illustrating the add/check/remove workflow for each filter type, verifying correctness and demonstrating expected FPR behavior.


๐Ÿงฑ Roadmap

Milestone Description Status
v0.1.0 (MVP) Ingestion, persistence, visualization โœ… Complete
v0.2.0 loadmeta (metadata inspection) and enhanced CLI help โณ Planned
v0.3.0 Docker containerization for sandboxed demo โณ Planned
v1.0.0 Documentation, optimization, educational release โณ Future

๐Ÿง‘โ€๐ŸŽ“ Educational Use

This project is not optimized for production performance or concurrency. It is intentionally verbose, instrumented for learning, and safe for experimentation.


๐Ÿงญ License

This repository is provided under the MIT License (add LICENSE file if not yet present). You are free to modify and extend it for educational or research purposes.


Built with persistence and curiosity. Probabilistic data structures in action.


About

This repository is a learners journey to understanding Bloom filter basics

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published