
Shrink crates.io footprint: build precompute_hash/sample tables lazily #48

Merged
GordonYuanyc merged 2 commits into main from reduce-sloc-precompute-buildtime
May 14, 2026

@GordonYuanyc
Collaborator

Summary

Replaces the checked-in precompute tables (PRECOMPUTED_HASH, PRECOMPUTED_SAMPLE, PRECOMPUTED_SAMPLE_RATE_1PERCENT), roughly 147K lines in total (16,390 for the hash table plus 65,541 for each of the two sample tables), with std::sync::LazyLock-backed tables that materialise once on first access.

Net diff vs main: +115 / -147,611 lines.

Why not just generate them at build time?

Two clean library-idiomatic options exist for "table that is really just a deterministic function of i":

  1. Build-time codegen into OUT_DIR — the approach the previous commit on this branch took for the sample tables. It works, but for PRECOMPUTED_HASH specifically the generator would need to either duplicate the byte-conversion + XxHash3_128 logic in build.rs or pull twox-hash in as a build dependency (drift risk; the contract is literally "what hash128_seeded returns for these inputs"). It also writes 64K+ lines of generated Rust under target/ on every clean build.
  2. LazyLock — describe the function, materialise the table on first access. No build script involvement, no extra build deps, and for the hash table the values are computed by the crate's own hash128_seeded, so by construction they can never drift.

This PR picks (2) for all three tables.
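
As a rough illustration of option (2), the sketch below builds a 0x4000-entry table lazily from a function of the index. `toy_hash128`, `TABLE_LEN`, and `TOY_HASH_TABLE` are hypothetical names; `toy_hash128` is a simple 64-bit mixer standing in for the crate's `hash128_seeded` and is not XxHash3_128.

```rust
use std::sync::LazyLock;

/// Table length; mirrors the 0x4000 entries quoted for PRECOMPUTED_HASH.
pub const TABLE_LEN: usize = 0x4000;

/// Hypothetical stand-in for the crate's `hash128_seeded` (NOT XxHash3_128).
pub fn toy_hash128(seed: u64, i: u64) -> u128 {
    // SplitMix64-style mixer, applied twice to fill 128 bits.
    fn mix(mut z: u64) -> u64 {
        z = z.wrapping_add(0x9E37_79B9_7F4A_7C15);
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }
    let hi = mix(seed ^ i);
    let lo = mix(hi ^ i.rotate_left(32));
    ((hi as u128) << 64) | lo as u128
}

/// The closure runs once, on first access; later reads reuse the boxed slice.
pub static TOY_HASH_TABLE: LazyLock<Box<[u128]>> = LazyLock::new(|| {
    (0..TABLE_LEN as u64).map(|i| toy_hash128(0, i)).collect()
});
```

Because the table is filled by calling the same function a caller would call directly, entry `i` always equals `toy_hash128(0, i)`, which is the no-drift property the PR relies on.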

Per-table summary

  • precompute_hash.rs: 16,390 lines → 31 lines. pub static PRECOMPUTED_HASH: LazyLock<Box<[u128]>> computed from hash128_seeded(0, &DataInput::U64(i)).
  • precompute_sample.rs / precompute_sample2.rs: LazyLock<Box<[f64]>> from a SmallRng seeded with the same 0xA5A0_5A71_B11B_C0DE the previous commit on this branch settled on, so values remain reproducible across runs/versions. The 1%-rate table reuses a private build_ln_one_minus_u_table(scale) helper.
  • build.rs: stripped of all codegen; only compiles .proto files now.
  • Cargo.toml: removed rand from [build-dependencies], the three [[bin]] entries for generate_precomputed_*, and the now-unused internal-bins feature.
  • src/bin/: deleted (only ever held the three generator bins; directory removed entirely).
  • docs/library_map.md: updated to point at the new LazyLock design instead of the removed bins.
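
The sample-table design above might look roughly like the following. This is a sketch under several assumptions: a small SplitMix64 generator stands in for rand's SmallRng so the block has no dependencies, the helper is assumed to store `ln(1 - u) * scale`, and the scale chosen for the 1% table (`1 / ln(1 - 0.01)`) is a guess at a typical geometric-sampling form rather than the crate's actual value. All names except the seed are hypothetical.

```rust
use std::sync::LazyLock;

pub const TOY_SAMPLE_LEN: usize = 0x10000;
/// Fixed seed quoted in the PR description.
const SEED: u64 = 0xA5A0_5A71_B11B_C0DE;

/// Tiny deterministic generator; a stand-in for SmallRng, not its algorithm.
struct SplitMix64(u64);

impl SplitMix64 {
    fn next_u64(&mut self) -> u64 {
        self.0 = self.0.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.0;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }

    /// Uniform draw in [0, 1) built from the top 53 bits.
    fn next_f64(&mut self) -> f64 {
        (self.next_u64() >> 11) as f64 / (1u64 << 53) as f64
    }
}

/// Assumed shape of the shared helper: each entry is ln(1 - u) * scale.
pub fn toy_ln_one_minus_u_table(scale: f64) -> Box<[f64]> {
    let mut rng = SplitMix64(SEED);
    (0..TOY_SAMPLE_LEN)
        .map(|_| (1.0 - rng.next_f64()).ln() * scale)
        .collect()
}

pub static TOY_SAMPLE: LazyLock<Box<[f64]>> =
    LazyLock::new(|| toy_ln_one_minus_u_table(1.0));

/// Scale is an assumption: 1 / ln(1 - p) with p = 0.01.
pub static TOY_SAMPLE_RATE_1PERCENT: LazyLock<Box<[f64]>> =
    LazyLock::new(|| toy_ln_one_minus_u_table(1.0 / (1.0f64 - 0.01).ln()));
```

With a fixed seed and a fixed generator, rebuilding the table yields the same values on every run, which is the reproducibility the bullet describes.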

API impact

Names are unchanged. Indexed access (PRECOMPUTED_HASH[i]), iteration (.iter()), and length (.len()) continue to work via Deref to the underlying slice — the existing call sites in src/sketch_framework/nitro.rs and src/common/structure_utils.rs are unmodified.

The element container changes from [T; N] to Box<[T]>:

  • PRECOMPUTED_HASH[i] -> u128: unchanged.
  • &PRECOMPUTED_HASH as &[u128; 0x4000]: no longer compiles; borrow a slice instead, either as &PRECOMPUTED_HASH[..] or through a binding typed &[u128].
  • New PRECOMPUTED_HASH_LEN (= 0x4000) and PRECOMPUTED_SAMPLE_LEN (= 0x10000) constants are exposed for compile-time length needs.
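
A minimal sketch of what call sites look like after the container change, using a hypothetical `DEMO_TABLE` and `DEMO_LEN` rather than the crate's real statics. The `as_array` helper shows one possible way to recover a fixed-size array reference when a caller truly needs one; it is an illustration, not something the PR adds.

```rust
use std::convert::TryFrom;
use std::sync::LazyLock;

pub const DEMO_LEN: usize = 0x4000;

/// Hypothetical table whose entry i is just i, so results are easy to check.
pub static DEMO_TABLE: LazyLock<Box<[u128]>> =
    LazyLock::new(|| (0..DEMO_LEN as u128).collect());

/// Callers should accept a slice instead of `&[u128; 0x4000]`.
fn sum_first(table: &[u128], n: usize) -> u128 {
    table[..n].iter().sum()
}

pub fn demo() -> (u128, usize, u128) {
    let x = DEMO_TABLE[3];             // indexing works through Deref
    let len = DEMO_TABLE.len();        // so do slice methods
    let s = sum_first(&DEMO_TABLE, 4); // deref coercion to &[u128]
    (x, len, s)
}

/// Optional: convert back to an array reference using the length constant.
pub fn as_array() -> &'static [u128; DEMO_LEN] {
    let slice: &'static [u128] = &DEMO_TABLE[..];
    <&[u128; DEMO_LEN]>::try_from(slice).expect("table length equals DEMO_LEN")
}
```

The conversion in `as_array` is checked at runtime, so it only succeeds when the slice length matches the constant.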

Runtime cost

One-time initialisation on first access:

  • hash table: ~16K XxHash3_128 evaluations + one 256 KiB heap alloc
  • each sample table: ~64K SmallRng draws + ln + one 512 KiB heap alloc

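Steady state: indexed access adds one well-cached atomic load for the LazyLock initialisation check (an acquire load in current std, not a relaxed one) plus a pointer indirection through the Box, versus the previous static array.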
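
The allocation sizes and the one-time initialisation can be checked with a small sketch. The constant and static names below are hypothetical, and the demo table is filled with zeros purely to count how often the initialiser runs.

```rust
use std::mem::size_of;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::LazyLock;

pub const HASH_LEN: usize = 0x4000;
pub const SAMPLE_LEN: usize = 0x10000;
pub const KIB: usize = 1024;

/// 0x4000 entries * 16 bytes per u128.
pub const HASH_TABLE_BYTES: usize = HASH_LEN * size_of::<u128>();
/// 0x10000 entries * 8 bytes per f64.
pub const SAMPLE_TABLE_BYTES: usize = SAMPLE_LEN * size_of::<f64>();

/// Counts how many times the initialiser closure has run.
static INIT_CALLS: AtomicUsize = AtomicUsize::new(0);

static DEMO_SAMPLE: LazyLock<Box<[f64]>> = LazyLock::new(|| {
    INIT_CALLS.fetch_add(1, Ordering::Relaxed);
    vec![0.0; SAMPLE_LEN].into_boxed_slice()
});

/// Reads the table `reads` times and reports the initialiser call count.
pub fn init_calls_after(reads: usize) -> usize {
    let mut acc = 0.0;
    for i in 0..reads {
        acc += DEMO_SAMPLE[i % SAMPLE_LEN];
    }
    std::hint::black_box(acc);
    INIT_CALLS.load(Ordering::Relaxed)
}
```

However many reads are issued, the initialiser count stays at one, and the byte totals work out to 256 KiB and 512 KiB respectively.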

Test plan

  • cargo build succeeds
  • cargo test --lib — 394/394 pass, including the nitro_batch_* tests that hammer PRECOMPUTED_SAMPLE_RATE_1PERCENT in the hot path
  • cargo clippy --lib --tests --no-deps clean
  • Reviewer to sanity-check the Box<[T]> API change is acceptable for downstream callers (the names and indexed-access pattern are preserved)

Made with Cursor

GordonYuanyc and others added 2 commits May 12, 2026 20:51
Move the two 65,541-line geometric-sampling tables out of the committed
source and regenerate them deterministically from build.rs into OUT_DIR.
Each precompute_sample{,2}.rs in src/common/ now reduces to a single
`include!(concat!(env!("OUT_DIR"), "/..."))` line.
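
For context, a build-script generator of this kind could be sketched as below. `render_table` and `write_table` are hypothetical helpers, the file name is left as a parameter because the original path is elided, and the values are assumed to be finite so that each one prints as a valid `f64` literal.

```rust
use std::fmt::Write;
use std::fs;
use std::io;
use std::path::Path;

/// Renders `values` as Rust source for a `pub static NAME: [f64; N]` array.
pub fn render_table(name: &str, values: &[f64]) -> String {
    let mut src = String::new();
    writeln!(src, "pub static {}: [f64; {}] = [", name, values.len()).unwrap();
    for v in values {
        // `{:?}` always prints a decimal point or exponent for finite floats.
        writeln!(src, "    {:?},", v).unwrap();
    }
    src.push_str("];\n");
    src
}

/// Writes the rendered table into `out_dir`, as a build script would do
/// with the directory named by the OUT_DIR environment variable.
pub fn write_table(
    out_dir: &Path,
    file_name: &str,
    name: &str,
    values: &[f64],
) -> io::Result<()> {
    fs::write(out_dir.join(file_name), render_table(name, values))
}
```

The source file in `src/common/` then only needs the single `include!` line to pull the generated array into the crate.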

This is a temporary fix to shrink the SLoC reported on crates.io
(currently ~169K, ~83% of which is these tables). precompute_hash.rs
is left as-is because the generator depends on crate-internal hashing
and can't be trivially moved into build.rs.

Note: the seeded SmallRng produces different table values than the
previously committed snapshot, but the values are only used as a
pre-rolled stream of random samples — no test relies on exact values
and all 394 lib tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…LazyLock

Replace the giant checked-in `PRECOMPUTED_HASH` array (and the
build-script codegen that the previous commit introduced for the two
`PRECOMPUTED_SAMPLE*` tables) with `LazyLock`-backed tables that
materialise once on first access.

For `PRECOMPUTED_HASH` this is meaningfully better than build-time
codegen: the table is by definition "what `hash128_seeded` returns for
these inputs", so computing it from the crate's own hasher at runtime
guarantees the two can never drift, and avoids either duplicating the
XxHash3_128 byte-conversion logic in `build.rs` or pulling `twox-hash`
in as a build dependency.

For the two sample tables the LazyLock path is a strict simplification
of the previous build-script approach: same fixed seed, same
determinism across runs/versions, but no `OUT_DIR` codegen, no extra
build-dep on `rand`, and no megabytes of generated source under
`target/` on every clean build.

While here, drop the three `generate_precomputed_*` bins, the
`internal-bins` feature, and the `src/bin/` directory they lived in --
they only existed to regenerate the checked-in tables and have no
remaining purpose.

Net diff vs main: +115 / -147,611 lines.

API impact: `PRECOMPUTED_HASH`, `PRECOMPUTED_SAMPLE`, and
`PRECOMPUTED_SAMPLE_RATE_1PERCENT` keep the same names and the same
indexed-access pattern (`X[i]`, `.iter()`, `.len()` all work via
`Deref`). The element container changes from `[T; N]` to `Box<[T]>`,
so callers spelling the type as `&[u128; 0x4000]` need `&[u128]`
instead; new `PRECOMPUTED_HASH_LEN` / `PRECOMPUTED_SAMPLE_LEN`
constants are exposed for compile-time length.

Co-authored-by: Cursor <cursoragent@cursor.com>
@GordonYuanyc GordonYuanyc merged commit e1c936e into main May 14, 2026
2 checks passed
@GordonYuanyc GordonYuanyc deleted the reduce-sloc-precompute-buildtime branch May 14, 2026 06:45