We fit a complete, modern, professional-grade, industry-standard English dictionary -- 123,676 words -- onto a Commodore 64.
A computer from 1982 with 64KB of RAM can now check your spelling against the exact same English dictionary embedded in all modern web browsers and most Linux distributions.
That English dictionary is the SCOWL 60 word list. That word list is 1.19MB by itself, before any code that operates on it. Unfortunately, the Commodore 1541 can fit only 170KB of data on a single disk. And the poor Commodore 64 has only 64KB of RAM. Do the math.
Bloom filters + 1541 disk drive as external memory.
Think of it as a probabilistic hash table that lives on your floppy disk. Five different hash functions compute bit positions in real-time on that 1MHz 6510 processor, and the program reads only the exact disk sectors it needs -- between one and five sectors per word lookup. This gives you spell checking with a 0.81% false positive rate and zero false negatives.
- 123,676 words - More than most people's active vocabulary
- 0.81% false positive rate - 99.19% of misspellings get caught
- 0% false negative rate - Words in this dictionary ALWAYS pass
- Fits on one floppy disk - 160KB Bloom filter + 8KB program
- Sorted disk access - Minimizes 1541 head movement
When you type a word, the C64:
- Computes 5 hash values (FNV-1a, DJB2, SDBM, Jenkins, Murmur)
- Sorts the bit positions for optimal disk access
- Reads 2-3 disk sectors on average
- Returns your answer in a few seconds
All while displaying progress dots and color-coded results in glorious PETSCII.
Download spellcheck.d64 - Latest build from main branch
Load it in VICE, or if you're truly l33t, write it to a real 1541 disk and run it on actual Commodore 64 hardware.
LOAD"SPELLCHECK",8
RUNType a word. Get a color-coded answer. Question your assumptions about what "vintage computing limitations" really means.
- Commodore 64 (or VICE emulator)
- 1541 disk drive (or emulation thereof)
- CMake 3.18+
- LLVM-MOS SDK with
mos-c64-clang - Python 3.8+ (for Bloom filter generation)
- VICE tools (c1541)
# Install dependencies
pip install -e .
# Get LLVM-MOS
wget https://github.com/llvm-mos/llvm-mos-sdk/releases/latest/download/llvm-mos-linux.tar.xz
tar xf llvm-mos-linux.tar.xz
sudo mv llvm-mos /opt/
export PATH="/opt/llvm-mos/bin:$PATH"
# Build
mkdir build && cd build
cmake .. && makeThe build process:
- Downloads the SCOWL 60 word list
- Generates a 1.28-million-bit Bloom filter
- Optimizes it for 1541 disk geometry
- Compiles C64-native code with LLVM-MOS
- Creates a bootable .d64 disk image
Want British spelling? Hacker jargon? Roman numerals? Edit src/python/build_bloom.py:
SCOWL_CONFIG = {
'max_size': 60, # 10-95 (small to insane)
'spelling': ['US'], # US, GBs, GBz, CA, AU
'max_variant': 0, # 0-3 (none to seldom-used)
'special': ['hacker', 'roman-numerals'],
}Then rebuild. The entire dictionary regenerates with your preferences.
- 1,286,256 bits organized as 633 × 254-byte REL records
- 5 hash functions (k=5) for optimal false positive rate
- 10.4 bits per word (m/n ratio)
- 38% bit density (actual vs theoretical: spot on)
The false positive rate formula: (1 - e^(-kn/m))^k = 0.0081
Translation: Only 1 in 123 misspellings sneaks through. And we empirically validated this with 100,000 random strings.
The 1541 drive head moves sequentially through the disk. We:
- Sort bit positions before checking (left-to-right on disk)
- Cache the current 254-byte REL record
- Minimize redundant seeks
Result: 2-3 disk reads per word instead of 5. The difference between "fast" and "glacial" on floppy hardware.
| Operation | Time | Notes |
|---|---|---|
| Hash computation | <10ms | 5 functions on 1MHz CPU |
| Disk sector read | ~200ms | 1541 seek + read |
| Total per word | ~400-600ms | Cached reads help |
| User perception | "Soonish" | Progress dots + color |
C64 RAM (64KB):
- Program code: ~5KB
- Record buffer: 254 bytes
- Variables: <1KB
- Remaining: ~59KB free!
Disk (170KB):
- BLOOM.DAT: 160KB (REL file)
- SPELLCHECK: 8KB (PRG file)
- Directory: 2KB
bloomer/
├── src/
│ ├── spellcheck.c # C64 spell checker (522 lines of C)
│ └── python/ # Build toolchain
│ ├── build_bloom.py # Orchestrator
│ ├── bloom_filter.py # Bloom filter implementation
│ ├── bloom_statistics.py # FP rate validation
│ └── disk_creator.py # D64 image creation
├── build/
│ ├── artifacts/
│ │ ├── spellcheck.prg # Compiled 6502 code
│ │ └── spellcheck.d64 # Bootable disk image
│ └── generated/
│ ├── bloom.dat # 160KB Bloom filter
│ └── bloom_config.h # Auto-generated constants
└── CMakeLists.txt # LLVM-MOS build config
LOAD"SPELLCHECK",8
RUNType any word:
- Green circle + "OK" → Spelled correctly
- Red X + "NOT FOUND" → Not in dictionary
Three-clause BSD. Do pretty much what you want with it, just don't claim that you wrote it, and don't sue me when it deletes your mom or whatever.
SPDX-License-Identifier: BSD-3-Clause
- SCOWL (Spell Checker Oriented Word Lists): http://wordlist.aspell.net/
- Bloom, Burton H. (1970): "Space/Time Trade-offs in Hash Coding with Allowable Errors"
- LLVM-MOS: https://github.com/llvm-mos/llvm-mos-sdk - Making modern C compilation for 6502 possible
- Wikipedia: Bloom Filter: https://en.wikipedia.org/wiki/Bloom_filter - For the curious