Skip to content

engine: deterministic VDF slot-table decode + arrayed lookup-only descriptor rebind#605

Merged
bpowers merged 1 commit into
mainfrom
vdf-deterministic-decode
May 21, 2026
Merged

engine: deterministic VDF slot-table decode + arrayed lookup-only descriptor rebind#605
bpowers merged 1 commit into
mainfrom
vdf-deterministic-decode

Conversation

@bpowers
Copy link
Copy Markdown
Owner

@bpowers bpowers commented May 21, 2026

Summary

Two related VDF-reader corrections that close the last structural gaps the
heuristics were papering over. The format is now decoded deterministically (no
scan/stride heuristics), and lookup-only "variables" -- graphical-function
tables, not time series -- are no longer reconstructed.

Slot table (Fixes #470, Fixes #549)

The slot-table under-count on edited fixtures (risk2.vdf, SCEN01.VDF, the
metasd social-network-valuation pair) was not a find_slot_table bug.
Root cause: parse_name_table_extended stopped at the first stale/deleted
name-table entry (a deleted name's non-printable binary payload, e.g. risk2's
91 01 00 00 63 79 20 30) instead of skipping it by its declared u16 length,
which capped the slot count via max_name_count. Vensim's reader skips such
entries; docs/design/vdf.md already specified the skip -- only the Rust code
stopped.

With the name table complete, the slot table is fully determined by the section
header, so the backward-scan heuristic (largest run of unique, in-range,
4-byte-aligned offsets with min_stride >= 4) is removed. slot_table_from_header:

  • start = section 1's field1 1-based word pointer
  • count = block1[7]
  • cross-checks the 0x00430000 terminator and the name-table boundary

These three over-determine each other and were verified to agree on all 138
run-file and 6 dataset VDFs
in the corpus. The block1[7] structural invariant
is tightened from a +/-2 diagnostic window to exact, with the
rust_slot_table_undercount_known exemptions removed.

Lookup-only descriptors (Fixes #597)

A graphical function is a table indexed by an explicit input (y = lookup(input)); a bare lookup with no call site of its own is not a time
series
, so Vensim saves no data block for it -- only a descriptor record whose
f[11] is a section-6 lookup-record index. The reader now drops standalone
lookup-only descriptors (exactly as it already drops overlapping descriptors)
instead of reconstructing a series at their f[11]-as-OT-start stock-ghost
block. The table's values, where they matter, are carried by the consumer
variables that call it with a real input -- those are ordinary owners the reader
emits under their own names.

This shrinks the C-LEARN EXPECTED_VDF_RESIDUAL from 21 to 13 with zero
regressions (verified by clearn_residual_exactness, grew == []): six
lookup-only bases (historical_gdp_lookup, historical_forestry_lookup,
rs_gdp_in_trillions, ozone_precursor_forcings, oc,_bc,_and_bio_aerosol_forcings,
other_forcings_smooth_plus_rcp85) drop out of the comparison entirely. The nine
that remain are lookup tables the model-free reader cannot safely distinguish
from a real owner, so it still emits a ghost column the comparator flags:

  • rs_hfc* (8): the descriptor forward-links to the wider 2-D consumer
    RS HFC[COP, HFC type] (forward width 63 != the descriptor's 7), so the
    conservative width gate declines to drop it.
  • ref_global_emissions_from_graph_lookup: its forward link is Time/0.

The deeper bug is in the engine: lowering a bare lookup to gf(Time) (#590)
synthesises a phantom series -- a table is not generally a function of time, and
using Time as the index is a unit error (it only coincidentally matches in
C-LEARN, where the tables' x-axis is calendar year and the consumers call them as
LOOKUP(Time / One year)). Fixing that removes the remaining nine from the
matched set entirely; tracked as a separate engine follow-up.

Verification

  • All VDF unit + integration tests pass (vdf, vdf_multidim, vdf_alias_decoder, vdf_structural_invariants).
  • clearn_residual_exactness and simulates_clearn pass in release (residual exact at 13, zero regressions).
  • Full pre-commit suite passes (fmt, clippy -D warnings, workspace tests, WASM build, TypeScript, Python).

@claude
Copy link
Copy Markdown

claude Bot commented May 21, 2026

Review: deterministic VDF slot-table decode + arrayed lookup-only rebind

The slot-table determinization (slot_table_from_header) and the arrayed lookup-only rebind both read cleanly, and the bounds/cross-checks in slot_table_from_header are sound. One correctness issue stands out, plus a minor robustness note.

[P1] Name-key walkers not updated to skip stale entries — record→name resolution misaligns on edited files

src/simlin-engine/src/vdf.rs:1779 (and the un-updated siblings at src/simlin-engine/src/vdf.rs:1491-1508 and src/simlin-engine/src/vdf/section5_dims.rs:82-97)

parse_name_table_extended now skips stale/non-printable entries — it advances pos += len without pushing them to names, so vdf.names is compacted. But the two sibling walkers that rebuild indices into vdf.namesrecord_name_key_to_name_index and build_name_key_to_name — still bump name_idx for every non-zero-length entry, stale ones included. After the first stale entry the two walks diverge: an offset key for a real name that sits at compacted index i in vdf.names is recorded with name_idx = i + (# preceding stale entries), so decoded_record_spans resolves a record's f[2] to the wrong vdf.names[name_idx] (or drops it when name_idx runs off the end). This fires precisely on the edited fixtures this PR targets (risk2.vdf, SCEN01.VDF, the social-network-valuation pair) whenever a record-backed name follows a stale entry, mislabeling or dropping columns in to_results_via_records and sec-5 dimension recovery. The touched tests only assert column counts (e.g. non_time_named >= 150), so the mislabeling passes unnoticed. Both walkers need the same !s.is_empty() && all-printable skip (advancing pos without incrementing name_idx) so their indices match the compacted vdf.names.

[P3] Unbounded slot_count * 4 from a raw file u32

src/simlin-engine/src/vdf.rs:1859

slot_count is read straight from block1[7] and multiplied by 4 before any upper bound. On a 32-bit target (wasm32 is a first-class build target here) a corrupt/crafted file with a huge block1[7] overflows usize in slot_start + slot_count * 4 — a debug-build panic, and in release it wraps (the terminator cross-check then usually rejects it, so impact is limited to malformed input). The old scanner bounded the count by names.len(); a checked_mul/explicit cap would preserve that property and matches the defensive style already used in this function.


Overall correctness: not correct. The P1 name-key misalignment is a data-correctness regression on the very edited-file fixtures the PR sets out to fix, and is not caught by the count-only assertions in the touched tests.

…ptors

Two related VDF-reader corrections that close the last structural gaps the
heuristics were papering over.

Slot table (#470, #549). The slot-table under-count on edited fixtures was not a
find_slot_table bug: parse_name_table_extended stopped at the first stale/deleted
name-table entry (a deleted name's non-printable binary payload) instead of
skipping it by its declared length, capping the slot count. Vensim's reader skips
such entries -- docs/design/vdf.md already specified the skip, only the Rust code
stopped. With the name table complete, the slot table is fully determined by the
section header, so the backward-scan heuristic is removed: slot_table_from_header
reads the start from section 1's field1 1-based word pointer and the count from
block1[7], and cross-checks the 0x00430000 terminator and the name-table boundary.
Verified on all 138 run-file and 6 dataset VDFs in the corpus. The block1[7]
invariant is tightened from +/-2 to exact, with no fixture exemptions.

Lookup-only descriptors (#597). A bare graphical function is a TABLE indexed by
an explicit input, not a time series, so Vensim saves no series for it -- only a
descriptor record whose f[11] is a section-6 lookup-record index. The reader now
DROPS standalone lookup-only descriptors (like overlapping ones) instead of
reconstructing a series at their f[11]-as-OT-start stock ghost block; their
values, where they matter, are carried by the consumer variables that call them,
emitted as ordinary owners under their own names. This shrinks the C-LEARN
EXPECTED_VDF_RESIDUAL from 21 to 13: six lookup-only bases drop out of the
comparison, and the nine that remain are lookup tables the model-free reader
cannot safely distinguish from a real owner (the rs_hfc* family forward-links to
a wider 2-D consumer; one scalar forward-links to Time/0), so it still emits a
ghost column the comparator flags. The deeper bug is in the engine: lowering a
bare lookup to gf(Time) synthesises a phantom series (a table is not generally a
function of time; using Time as the index is a unit error). Fixing that removes
the remaining nine from the matched set entirely.
@bpowers bpowers force-pushed the vdf-deterministic-decode branch from dd96f12 to 8c7bdb9 Compare May 21, 2026 05:39
@claude
Copy link
Copy Markdown

claude Bot commented May 21, 2026

Review: deterministic VDF slot-table decode + drop lookup-only descriptors

I reviewed the engine changes (vdf.rs, vdf/record_results.rs) and the test updates. The implementation is correct and well-tested; I did not find a blocking bug. Notes below.

Verified

  • slot_table_from_header bounds — the short-circuit chain pins every read in-bounds: terminator_offset + 4 != name_table_offset + terminator_offset + 4 > data.len() guarantee the terminator read_u32 is within data, and the slot reads stay inside [slot_start, terminator_offset). Bail-out to (0, empty) degrades gracefully to the record-based path.
  • parse_name_table_extended skip-vs-break — the loop is still bounded by pos + len > parse_end and the len > 256 guard, always advances pos, and cannot read past parse_end (clamped to data.len()). The existing test_parse_name_table_extended_stops_on_invalid_data still holds.
  • standalone_lookup_only_descriptors — the span_len == 0 guard prevents a vacuously-true ghost_all_stock; the arrayed width gate (word[11] == span_len) and the forward-block owner check are consistent with the new/renamed unit tests. output_width() reads words[11], safe given words: [u32; 13].
  • rebinds / emit_descriptor_rebinds removal — no dangling references remain; HashMap is still used elsewhere in the module. The only names[slot_table.len()..] slice is in a corpus test, so the strict slot_count == block1[7] (which can exceed names.len() in principle) carries no production panic risk; production consumers use slot_table only as a HashSet.

Non-blocking (P3): stale module doc

src/simlin-engine/CLAUDE.md (lines 5 and 80) still describes record_results.rs::standalone_descriptor_rebinds as re-binding a standalone descriptor "to its forward-link evaluated-output OT ... scalar only, arrayed standalone lookups are deferred." This PR renames it to standalone_lookup_only_descriptors, inverts the behavior (drop, not rebind), and adds arrayed handling. Per the file's own maintenance note, this bullet and the "Last updated" summary item (5) should be updated to match. Not part of the diff, so flagging here rather than inline.

Overall correctness: correct

Existing tests and behavior are preserved; the change is free of blocking issues. The doc drift above is documentation-only and does not affect correctness.

@claude
Copy link
Copy Markdown

claude Bot commented May 21, 2026

Code review — PR #605 (deterministic VDF slot-table decode + drop lookup-only descriptors)

I reviewed the slot-table determinism change, the name-table stale-entry skip, and the descriptor rebind→drop conversion for correctness and memory safety.

Verified sound

  • slot_table_from_header is memory-safe: the || cross-check short-circuits terminator_offset + 4 > data.len() before the read_u32(data, terminator_offset), and the slot-read loop is bounded by terminator_offset < name_table_offset <= data.len(). The count-word read is guarded by count_word_offset + 4 > data.len().
  • Section indexing in both call sites is guarded (dataset: sections.len() != 5; simulation: sections.get(1) + a valid name_section_idx).
  • standalone_lookup_only_descriptors uses safe .get() for the forward-link/width lookups; n_lookups == lookup_word10.len() (both derive from section6_lookup_records()), and words: [u32; 13] makes output_width()'s words[11] in bounds.
  • No dangling references remain to the removed DescriptorIdentification::rebinds / emit_descriptor_rebinds. Issue refs (engine: VDF reader mis-decodes arrayed standalone graphical-function ("lookup-only") descriptor columns (Ref.vdf) #597, engine: standalone lookup-only variable is lowered to gf(Time) (unit error; phantom Time-indexed series for a bare graphical function) #606) exist and match. The EXPECTED_VDF_RESIDUAL list has exactly 13 entries, matching the updated comment.
  • The name-parser change (skip-and-continue vs. break) cannot loop forever (pos advances ≥2 each iteration) and only grows the recovered name set, so it does not regress the slot_table.len() <= names.len() assumption.

Finding

[P3] Stale module doc: record_results.rs description in src/simlin-engine/CLAUDE.md not updated for the rename + behavior inversionsrc/simlin-engine/CLAUDE.md:80 (and the "Last updated" note at line 5) still describe standalone_descriptor_rebinds as re-binding a standalone lookup-only descriptor to its forward-link OT, "scalar only, arrayed standalone lookups are deferred." This PR renames that function to standalone_lookup_only_descriptors, inverts the behavior (it now drops the descriptor rather than re-binding/emitting a column), and adds arrayed handling with the word[11] width gate. The repo's documented hard rule ("Keep this file up to date when adding, removing, or reorganizing modules") makes this worth updating in the same commit. This is documentation-only and outside the diff'd files, so it does not affect correctness.

Overall correctness verdict

Correct. The patch is memory-safe, logically consistent, and the behavioral changes (deterministic header-driven slot decode; dropping lookup-only descriptors) are intentional and verified by the updated unit/integration tests and clearn_residual_exactness (grew == [], residual 21→13) with no regressions. The only item above is a non-blocking documentation-consistency nit.

@codecov
Copy link
Copy Markdown

codecov Bot commented May 21, 2026

Codecov Report

❌ Patch coverage is 93.50649% with 5 lines in your changes missing coverage. Please review.
✅ Project coverage is 82.86%. Comparing base (59e48a9) to head (8c7bdb9).
⚠️ Report is 1 commits behind head on main.

Files with missing lines Patch % Lines
src/simlin-engine/src/vdf.rs 86.95% 3 Missing ⚠️
...c/simlin-engine/tests/vdf_structural_invariants.rs 77.77% 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #605      +/-   ##
==========================================
- Coverage   82.87%   82.86%   -0.01%     
==========================================
  Files         260      261       +1     
  Lines       69576    69813     +237     
==========================================
+ Hits        57659    57850     +191     
- Misses      11917    11963      +46     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@bpowers bpowers merged commit 7b2cf1a into main May 21, 2026
15 checks passed
@bpowers bpowers deleted the vdf-deterministic-decode branch May 21, 2026 12:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

1 participant