Skip to content

Close the C-LEARN simulation residual; add CLI GET DIRECT external-data support#600

Merged
bpowers merged 21 commits into
mainfrom
clearn-residual
May 20, 2026
Merged

Close the C-LEARN simulation residual; add CLI GET DIRECT external-data support#600
bpowers merged 21 commits into
mainfrom
clearn-residual

Conversation

@bpowers
Copy link
Copy Markdown
Owner

@bpowers bpowers commented May 20, 2026

Summary

Closes the C-LEARN model's simulation residual vs Vensim Ref.vdf as general engine/CLI primitives (no C-LEARN special-casing). Five phases:

  • Lookup-only variables (standalone graphical functions) now lower uniformly to gf(Time) across scalar / arrayed / apply-to-all, replacing the inconsistent gf(0)-vs-literal-0 lowering.
  • Shadowed builtin macros: a user :MACRO: RAMP FROM TO(...) is no longer linearized at import time and now resolves through the compile-time macro path, so its exponential branch runs.
  • Passthrough INITIAL macros (:MACRO: INIT(x) = INITIAL(x)) collapse to the LoadInitial opcode at the call site; plus a general dt-runlist chain-break fix for stock submodel outputs (salsa path now at parity with the legacy path).
  • Runlist determinism: fixed a pre-existing HashMap-iteration-order nondeterminism in the initials runlist, so the same model always produces the same runlist and output.
  • Residual attribution + gate: EXPECTED_VDF_RESIDUAL tightened from 32 to 21 bases, each with a sourced category (0 engine-genuine remain; the rest are VDF-reader decode artifacts, benign-near-zero, or :NA:-arithmetic boundaries). Both --ignored C-LEARN gates pass deterministically, with comparator tolerances/floors unchanged (guard-tested). Fixed the scalar VDF-reader decode of standalone graphical-function descriptor columns.
  • CLI external data: simlin-cli now resolves Vensim GET DIRECT * external data via a FilesystemDataProvider rooted at the model path (stdin uses the null provider).

The engine value is provably correct for every remaining residual base (the divergences are reference-side, in our Ref.vdf test-reader). Follow-ups filed: #595 (init-runlist soundness gap), #597 (arrayed VDF-reader decode), #598 (WASM data-supply API), tech-debt #64. #590 / #591 updated with final disposition.

Test plan

  • cargo test -p simlin-engine and cargo test -p simlin-cli (default suites) pass.
  • Release gates pass deterministically: cargo test -p simlin-engine --features file_io --release --test simulate -- --ignored simulates_clearn clearn_residual_exactness.
  • Human verification steps in docs/test-plans/2026-05-19-clearn-residual.md.

bpowers added 21 commits May 20, 2026 15:58
Adds the pure predicate `is_lookup_only(equation, has_tables)` plus unit
tests in compiler/mod.rs. A lookup-only variable is a standalone
graphical-function holder: it has a graphical function and its equation
is empty or the MDL lookup sentinel (no functional input expression).

The empty-equation arm is load-bearing: an XMILE-sourced lookup-only
variable carries an empty/absent equation plus a gf, while an
MDL-sourced one carries LOOKUP_SENTINEL plus a gf -- the sentinel string
is MDL-format-internal, not a stable datamodel contract. WITH LOOKUP
(tables + a real input expr) and an ordinary aux (no tables) both return
false, mirroring mdl::writer::is_lookup_only_equation.

The helper is unused in non-test code in this commit (carrying a
transient allow(dead_code)); the following commit wires it into the
scalar/arrayed/A2A lowering sites and removes the allow.
… semantics

A standalone graphical-function ("lookup-only") variable -- one whose
entire equation is an inline table with no functional argument -- was
lowered inconsistently: the scalar path wrapped it as LOOKUP(self, 0)
(a constant gf(0)), while the arrayed and apply-to-all hoisting paths
emitted a literal 0 and never consulted the attached tables. None
matched genuine Vensim.

Detect lookup-only variables (the committed is_lookup_only atom,
composed by var_is_lookup_only over the datamodel Equation shape) and
lower all three shapes uniformly through the existing scalar Lookup
opcode at the determined index. Phase 1 determined that index
empirically to be the current simulation time -- gf(Time): applied
consumers of C-LEARN's year-indexed lookup tables byte-match gf(Time)
against Ref.vdf (with clamp-to-endpoint outside the x-domain), and the
importer's MdlEquation::Implicit arm already lowers an unspecified-input
gf to equation = "TIME". The index is centralized in
lookup_only_index_expr so the three sites stay in lock-step.

The arrayed case has per-element tables (tables.len() == n) and wraps
each element's own offset off+i (codegen derives elem_off = i,
table_count = n); the A2A case has one shared variable-level table
(tables.len() == 1) and must wrap the BASE offset off (elem_off 0,
table_count 1) -- wrapping off+i there would push the VM Lookup bounds
check out of range for i>0 and return NaN. WITH LOOKUP (tables + a real
input) keeps its LOOKUP(self, input) lowering; ordinary auxes and stocks
are untouched. No new datamodel field, protobuf change, or VM/codegen
primitive; the per-element table layout (build_tables /
reorder_arrayed_element_tables) and the Lookup opcode are reused. The
production salsa-incremental path reuses compiler::Var::new, so the fix
applies to both the legacy and incremental paths.

Adds C-LEARN-independent generality fixtures (lookup_only_tests.rs) for
the scalar, arrayed (non-alphabetical declared order), A2A, and applied
(LOOKUP(g, idx) with idx distinct from Time) cases. These were RED
before this lowering -- the arrayed/A2A standalone cases produced
[0,0,0] and the scalar standalone case produced a flat constant gf(0)
[11,11,11] -- and are GREEN with it. Removes the transient
allow(dead_code) on is_lookup_only now that production references it, and
pins the regular Lookup opcode's out-of-range clamp / NaN-index behavior
(unchanged by this fix, AC1.6) with a focused vm test.
…eview)

The arrayed lookup-only lowering (`expand_arrayed_with_hoisting`) wrapped
`Expr::Var(off + i)` unconditionally, which is only correct when the table
layout is per-element. `variable.rs::build_tables` produces one table per
element (`tables.len() == n`) ONLY when at least one element carries its own
gf; an `Equation::Arrayed` whose element equations are all empty/sentinel with
a single variable-level gf falls through to the shared-table branch
(`tables.len() == 1`, the same shape A2A always has). In that case
`codegen::extract_table_info` derives `elem_off = i` while `table_count == 1`,
so the VM's `Lookup` bounds check pushes NaN for every element with i > 0.

Make the offset depend on the actual table layout, mirroring how
`build_tables` decides it: wrap `off + i` for the per-element layout and the
BASE `off` for the single shared table (exactly like `expand_a2a_with_hoisting`).
The two layouts are encoded as `Option<LookupOnlyLayout>` so the invalid
"lookup-only but neither layout" state is unrepresentable and the caller
derives the variant from `tables.len()`. No name/path special-casing; the
existing `Lookup` opcode is reused.

The MDL/XMILE importers never emit this shape (they attach per-element gfs to
an arrayed lookup), but `datamodel::Project` is a public compile input that
protobuf/JSON/serde/MCP/pysimlin/libsimlin all preserve verbatim, so a
directly-constructed project can reach it. Pinned by a new fixture
(`arrayed_lookup_only_variable_level_gf_shares_one_table_at_time`): every
element now reads the one shared table at gf(Time) instead of NaN for i > 0.
The MDL->datamodel expression formatter (`xmile_compat.rs`) had a
`format_call_ctx` arm that rewrote any 4+-arg `RAMP FROM TO(...)` into a
hardcoded linear `(xfrom) + RAMP(slope, tstart, tend)` string at import
time, discarding the 5th `islinear` selector and the entire macro body,
*before* compile-time macro resolution runs. `RAMP FROM TO` is not a
native builtin, so a model that uses it must define a same-named
`:MACRO:` -- which the macro-shadows-everything precedence in
`builtins_visitor.rs` resolves correctly. The import-time rewrite
silently pre-empted that path, so a macro selecting its exponential
branch (`islinear = 0`) always ran the linear ramp instead.

Deleting the arm lets the call survive import as `RAMP_FROM_TO(...)`,
which canonicalizes to `ramp_from_to` and resolves through the existing
macro path. No new builtin/opcode/codegen primitive is added. Line 555's
`"ramp from to" => "RAMP_FROM_TO"` name mapping is intentionally kept --
it is the active formatting path once the arm is gone (redundant with the
catch-all, but self-documenting).

Covers clearn-residual.AC2.1/AC2.2/AC2.3/AC2.4. The new discriminating
test `macro_ramp_from_to_runs_selected_branch` (exp branch uses double
slope so it is provably distinct from the linear branch mid-ramp) and the
inverted formatter test `test_ramp_from_to_survives_as_macro_call` were
both RED before the arm deletion and are GREEN after. The pre-existing
no-regression guard `simulates_macro_clearn_ramp_from_to_mdl`
(`islinear = 1`, same linear series via the macro path) still passes.

Evidence (the ignored, release-only `clearn_residual_exactness` gate, NOT
pruned here -- Phase 4 reconciles `EXPECTED_VDF_RESIDUAL`): five excluded
bases no longer fail (`shrank`): im_3_emissions, im_3_emissions_vs_rs,
im_3_ff_co2, relative_emissions_to_equity,
relative_emissions_to_equity_target. Two bases newly enter the residual
(rs_co2_ff, rs_gdp_in_trillions) -- Phase 4 re-measures and reconciles the
gate after Phases 1-3.
Record the audit conclusion above `format_call_ctx`'s match: arms that
*restructure* a builtin-named call into a different expression at import
time pre-empt any same-named user `:MACRO:`, because the rewrite runs
before compile-time macro resolution. The `ramp from to` arm was the one
C-LEARN tripped (now removed); the other restructuring arms
(`sample if true`, `pulse`, `pulse train`, `modulo`, `zidz`, `xidz`,
`get data between times`, etc.) carry the same latent hazard but are not
currently shadowed by any in-repo model -- if a future model defines such
a macro, the arm must be guarded the same way.

Name-preserving renames are safe and need no change: `SSHAPE` is handled
only by `format_function_name` (a genuine 3-arg builtin, not restructured
here), so a same-named macro shadows it cleanly. The existing
`ac5_4_macro_shadows_ramp_from_to_builtin` and
`ac5_4_macro_shadows_sshape_builtin` tests already pin both the
multi-word and single-word shadowing cases, so no new test is added.

Documentation only; no runtime behavior change. Covers
clearn-residual.AC2.5.
Add the pure structural classifier classify_passthrough and its
PassthroughBuiltin result type. A genuine passthrough macro is a
single-parameter, single-output macro whose primary-output body is
exactly out = BUILTIN(param) where BUILTIN canonicalizes to the same
renamed-builtin-collision name as the macro itself -- the self-call
shape the MDL importer's INITIAL -> INIT rename produces for
:MACRO: INIT(x) = INITIAL(x).

This is the Functional-Core decision a later commit uses to collapse
such a macro directly to its proven opcode (LoadInitial) at the call
site, bypassing the buggy per-element synthetic module. The strict
structural match (single param; no additional outputs; body exactly a
single one-argument call; the argument is the bare parameter; the call
canonicalizes to the macro name; that name is a renamed-builtin
collision) cannot misfire on a non-passthrough macro that merely shares
a builtin name. The items carry a temporary allow(dead_code) until the
next commit threads them onto ModuleFunctionDescriptor.
Thread the passthrough classification onto ModuleFunctionDescriptor as a
new passthrough: Option<PassthroughBuiltin> field, populated in
MacroRegistry::build via the new classify_macro_passthrough bridge: it
locates the primary-output body variable, parses its single scalar
equation (the only place each macro body is parsed for this), and calls
the pure classify_passthrough. Stdlib descriptors and non-passthrough
macros -- including Phase 2's RAMP FROM TO, which is not a passthrough --
get None.

The field participates in PartialEq/Eq (PassthroughBuiltin derives them)
so salsa invalidation of project_macro_registry stays correct, and the
salsa::Update derivation is preserved. This commit adds the field and
its classification only; the call site does not consume it yet, so there
is no runtime behavior change.
…the call site

A user macro :MACRO: INIT(x) = INITIAL(x) collides (after the importer
renames Vensim INITIAL -> INIT) with the renamed builtin and, under
macro-shadows-everything precedence, expands every INITIAL call into a
buggy per-element synthetic module whose value is mis-ordered /
mis-propagated for an element-wise recurrence (#591-c1). The call site
now collapses a GENUINE passthrough macro (descriptor.passthrough set by
the registry classifier) directly to the proven opcode by skipping
expand_module_function and falling through to the existing
renamed-builtin intrinsic routing -- init -> LoadInitial with the
existing make_temp_arg hoisting. This generalizes the #554 self-call
exception from inside the macro body to the call site. The self-call
invariant the classifier guarantees (canonicalize(call) ==
canonicalize(macro_name) AND is_renamed_builtin_macro_collision) makes
the fall-through route func to the right intrinsic.

equation_is_module_call gets the same treatment: a passthrough macro is
no longer pre-classified as a module call, so collect_module_idents
agrees with the call site (the variable is the scalar opcode it collapses
to, not a multi-slot module). This is the same module-vs-opcode decision
made consistently at both layers -- a passthrough INIT macro now behaves
exactly like the bare INIT builtin it collapses to (which is likewise not
a module call), the way the proven bare-INITIAL recurrence already does.

No NA-arithmetic change (float.rs::NA untouched).

The two fixtures added here were RED before this collapse and are GREEN
after: the element-wise macro_init_recurrence fixture (helper_recurrence
+ the :MACRO: INIT block) dropped to ecc=0 at t>=1 via the synthetic
module and now holds 1/2/4 across all saved steps (AC3.2); the inline
scalar capture holds INITIAL TIME constant (AC3.1). The bare-INITIAL
helper_recurrence path is unchanged (AC3.3).
A non-module variable that reads a synthetic module's STOCK output (e.g.
`v = SMOOTH(input, dt)` -> `v` reads the SMOOTH's INTEG `output`) kept a
spurious `v -> module` dt dependency in the salsa runlist: `normalize_deps`
collapses the `module·output` reference to the bare module name, and unlike
the legacy `model.rs::module_output_deps` path (which omits a stock output's
module dep in the dt phase) `build_var_info` only applied that stock-output
chain-break to MODULE readers, not ordinary ones.

Because a module is a sink in the cycle relation (`dt_walk_successors`) but
carries its input `src` as a direct dep in `dt_dependencies`, the spurious
`reader -> module` edge closes an ordering cycle that cycle DETECTION never
sees. `topo_sort_str` then broke that cycle arbitrarily (HashMap-iteration
dependent), sometimes emitting the SMOOTH module BEFORE its input helper. The
mis-ordered element read a stale (0) input every flows step and decayed to 0
at t>=1 while matching at t0 -- the C-LEARN `emissions_with_stopped_growth`
(SAMPLE IF TRUE -> PREVIOUS-self + SMOOTH whose input feeds back through the
emissions chain) #591-c1 residual, and a latent nondeterministic runlist for
any such stock-output feedback.

The fix moves the existing stock-submodel-output filter so it applies to
every dt reader (mirroring the legacy gate), and resolves the (synthesized,
implicit) module-instance name to its model to test the output's kind. Only
`dt_deps` is filtered; the init phase keeps the edge (a stock's init value is
computed during init, so it is a real init dependency). Adds an inline-MDL
regression asserting every element of an arrayed SAMPLE-IF-TRUE+SMOOTH with a
dt-only input feedback holds its smoothed constant at every saved step.
…stdoc)

- simulate.rs (Minor 1): correct the scalar AC3.1 test's doc comment. It is
  NOT a value RED->GREEN discriminator for the Task 4 synthetic-module
  collapse (the scalar non-recurrence INITIAL-capture is handled correctly
  by the synthetic-module path; the implementation-time "RED" was a
  NotSimulatable compile inconsistency, not a wrong value). The comment now
  states it asserts the AC3.1 held-constant invariant plus the
  compile-consistency the model.rs passthrough-exclusion enables, and points
  to the element-wise test as the genuine value discriminator. Assertions
  are unchanged.
- module_functions.rs (Minor 2): add classify_passthrough gate-5 unit test
  (single-arg previous(x) in an init-named macro -> None) so the self-call
  name check is exercised in isolation; the existing different-call-name
  test uses a TWO-arg body and is rejected earlier at the single-arg gate.
- model.rs (Minor 3): add the passthrough-exclusion caveat to the
  equation_is_module_call rustdoc so it aligns with the inline comment -- a
  genuine passthrough macro is not a module call here and behaves like the
  bare builtin it collapses to.
The Phase 3 call-site collapse (#591-c1) added a special case to
equation_is_module_call: a genuine passthrough macro was excluded from
the module-call classification, with a comment claiming that otherwise
collect_module_idents would mark the variable module-backed while the
walk emits a scalar opcode, so the fragments would fail to compile.

Investigation falsifies that claim. The module_idents set this
classification feeds is consumed in builtins_visitor only via
is_module_backed_ident, which gates whether a *referencing* PREVIOUS/INIT
synthesizes a scalar temp arg. A passthrough macro collapses to a plain
flat-slot variable, so that temp-arg copy is value-identical to reading
the slot directly -- the classification changes no observable result.
Toggling the exclusion (active vs neutralized) on a scalar and an arrayed
fixture where a passthrough-collapsed value is referenced inside a
downstream PREVIOUS and INITIAL produced byte-identical series in both
states, and the full default + file_io simulate suites stay green either
way. No shape exercises the exclusion.

Per "simple, general, testable code is better," remove the exclusion so a
passthrough caller is classified like any other resolved macro, and
rewrite the model.rs comment and the AC3.1 scalar-test doc to state only
verifiable facts: the scalar test asserts the held-constant invariant and
is not the value RED/GREEN discriminator for the collapse (the
synthetic-module path already produces the constant for the scalar case);
the element-wise recurrence test remains the genuine discriminator (it
drops to 0 with the collapse neutralized). No NA-arithmetic change, no
C-LEARN special-casing, EXPECTED_VDF_RESIDUAL untouched.
A lookup-only variable that Vensim saves *only* as a graphical-function
descriptor record (no separate consumer-owner record) has its `f[11]`
holding a section-6 lookup-record index, not an OT-block start. When that
index value reinterpreted as an OT-start lands in `[1, OT_count)` on a slot
with an owner class code, `decoded_record_spans` emits a span for it; and
because `identify_descriptor_records` only peels descriptors that sit in an
*overlapping* OT component, a standalone descriptor (which collides with
nothing) slips through as an owner and decodes at that spurious
`f[11]`-as-OT-start ghost slot -- a class-0x08 stock slot holding 0/garbage.
The real series is the forward-linked evaluated-output OT
`lookup_record[f[11]].word[10]` (proven byte-exact vs the engine for
"Global Emissions from graph LOOKUP" -> OT 1612).

The fix adds a pure functional-core `standalone_descriptor_rebinds` that
recognises such a record and re-binds it to its forward OT, gated to avoid
disturbing legitimate owners: the span is non-overlapping; `f[11]` is a
valid lookup-record index; the span is scalar (an arrayed descriptor
re-bound to one forward OT would scalarize and lose its element columns --
deferred); the `f[11]`-as-OT-start slot carries the STOCK class code (a
graphical-function variable is never a stock, so a stock slot is the
spurious-owner telltale -- a legitimate scalar owner whose `f[11]` is
coincidentally a small lookup index carries a non-stock 0x11 code and is
left untouched); and the forward link is a valid data OT (never Time/0).
`build_record_result_columns` emits the re-binds as scalar columns at the
forward OT, sharing it with any real consumer-owner (Vensim stores one
evaluated series consumed under several names) rather than claiming it.

On C-LEARN this reconciles exactly two scalar lookup-only bases that decode
byte-exact vs the engine -- `global_emissions_from_graph_lookup` and
`other_forcings_composite_plus_rcp85` -- moving them into the
`clearn_residual_exactness` `shrank` set. The other named scalar lookup-only
bases (`oc,_bc,_and_bio_aerosol_forcings`, `other_forcings_smooth_plus_rcp85`,
`ozone_precursor_forcings`, `ref_global_emissions_from_graph_lookup`) are NOT
reconciled: their forward links point at a foreign/shared consumer OT or at
Time, which differs from the engine's lookup-only output -- Vensim stores no
distinct column for them, a genuine residual. The arrayed lookup-only subset
(`rs_*`, `historical_*_lookup`) is explicitly deferred (the scalar gate
leaves their decode unchanged) because it needs element-order info the VDF
format does not store on disk; a follow-up issue will track it.

`EXPECTED_VDF_RESIDUAL` is untouched here (that pruning is the gate-tightening
task), so `clearn_residual_exactness` still fails overall with the two
newly-reconciled bases reported in `shrank`. The existing VDF reader corpus
tests (`vdf`, `vdf_alias_decoder`, `vdf_multidim`) stay green. Covered by a
non-ignored synthetic regression test on a minimal record set independent of
any C-LEARN name.
The Initials runlist was a function of the per-process HashMap
RandomState seed, not of the model: `model_dependency_graph_impl`
materialized its candidate set (`init_list`) from a `HashSet` and handed
it straight to `topo_sort_str`, which emits `names` in visit order and
breaks ties -- variables with no ordering dependency between them, e.g.
independent constants, stocks, and the lagged-input-stripped
`PREVIOUS()`/`INITIAL()` helpers -- by exactly that order. Two compiles
of the SAME model therefore produced different init orderings, and for
any unordered pair that feeds the init value of a `PREVIOUS()`/`INITIAL()`
variable, different (one possibly wrong) initial values. The Flows and
Stocks runlists were already deterministic because they filter the
pre-sorted `var_names`; only the initials phase leaked hash order.

Fix: sort `init_list` before `topo_sort_str`, matching the flows/stocks
contract. With a sorted `names` argument and BTreeSet-stored dependency
edges, `topo_sort_str`'s entire traversal -- including how it breaks any
residual ordering cycle -- is a deterministic function of the model. This
is general (deterministic simulation for all models), not C-LEARN
specific, and changes nothing about NA-arithmetic or tolerances.

Pre-existing, not introduced by this branch: the
`init_set.into_iter().collect()` pattern dates to the original salsa
incremental-compilation work (PR #289), long before the C-LEARN residual
branch. It only became observable now because C-LEARN's residual cells
sit near the 1% cross-simulator tolerance, where a tiny order-induced
floating-point delta flips a verdict. Empirically: across 12 fresh-DB
compiles of C-LEARN the initials runlist had 12 distinct orderings before
this change and 1 after (flows/stocks were 1 both times), and the live
residual set is now byte-identical run to run.

Relation to GH #595: this resolves the HashMap-iteration-order
nondeterminism #595 identifies as the proximate mechanism (the arbitrary,
hash-dependent init tie-break is now a deterministic function of the
model). It does NOT close #595's deeper soundness gap -- the
detection-vs-ordering edge-set inconsistency that lets an init ordering
cycle through a synthetic-module stock output exist undetected; such a
cycle is now ordered deterministically rather than reported. No in-repo
model triggers that shape today.

A regression test pins the property directly (no probability):
`initials_runlist_is_deterministic_across_fresh_databases` asserts a
byte-identical runlist across 32 freshly-seeded databases, and
`initials_runlist_is_sorted_topological_order` asserts the order equals
the stable (sorted-name tie-break) topological sort reconstructed from the
engine's own `initial_dependencies`. Both fail if the sort is removed.
Phases 1-3 plus the init-runlist-determinism fix (e24b008) reconciled the
bulk of the C-LEARN-vs-Ref.vdf residual. Re-measuring the live failing set
through the unchanged classify_vdf_ident comparator (run twice, byte-identical:
3482 idents matched; 21 live residual bases) shrinks EXPECTED_VDF_RESIDUAL from
32 base variables to exactly 21, each now carrying a one-line sourced reason
under a five-category taxonomy (no "unknown"):

  - engine-genuine-tracked (0): NONE. The only engine-genuine divergence,
    init-runlist nondeterminism (which permuted depth_at_bottom's per-layer
    init values), was fixed in e24b008; #595's deeper init-ordering soundness
    gap remains tracked.
  - VDF-decode-artifact (17): engine output proven correct (gf(Time) matches
    the model tables; applied consumers match Ref.vdf exactly); our Ref.vdf
    reader mis-decodes the standalone graphical-function descriptor columns.
    Scalar half fixed in d69754b; the arrayed element-order half (13 bases)
    and the 4 scalar no-distinct-column cases are tracked in #597.
  - benign-near-zero (2): co2eq_gap_closing_percentage (ratio of near-equal
    small values, peak ~1.1e-2) and diffusion_flux (~2% on a small early
    transient, matches late) -- cross-simulator noise on near-zero magnitudes.
  - NaN-vs-:NA: (0): NONE by construction. Where Vensim writes literal IEEE NaN
    and Simlin writes the finite :NA: sentinel (slr_inches_from_2000), the
    comparator NaN-skips those cells and the finite tail matches, so the series
    is neither all-NaN nor failing and never enters the failure set.
  - boundary (2): historical_gdp and last_set_target_year -- match everywhere
    except :NA:-arithmetic cells (Vensim -2*NA = -1.298e33 vs Simlin's bare
    sentinel -6.49e32). NA-arithmetic is confirmed correct and out of scope.

The comparator's five constants (VDF_RTOL, K_ATOL, MIN_MATCHED_FRACTION,
MIN_MATCHED_ABSOLUTE, MAX_NAN_SKIPPED_FRACTION) and the matched-floor logic are
byte-identical to before -- this is a membership tightening of the documented
exclusion, never a tolerance loosening. Two new non-ignored guard tests pin
that, on synthetic inputs independent of C-LEARN:
  - vdf_comparator_constants_and_floor_are_pinned: pins the five constants and
    proves a below-floor matched count still PANICS (the gate, which checks the
    floor AFTER exclusion, cannot pass vacuously).
  - classify_vdf_ident_nan_vs_na_skips_without_failing: proves a NaN-vs-:NA:
    series yields nan_skipped cells and zero failures, so it never enters the
    residual set.

Both --ignored gates pass deterministically: simulates_clearn (3360 matched
after exclusion vs a 348 floor, ~9.7x, no tolerance violations) and
clearn_residual_exactness (live set == the 21-base list, no grew/no shrank,
identical across two runs).

Issue dispositions posted via gh:
  - #590: the engine 0+0 graph-lookup zeroing is FIXED (Phase 1, lookup-only
    lowers to gf(Time)); the residual is now a VDF-reader decode artifact
    (engine proven correct), scalar half fixed (d69754b), arrayed half tracked
    in #597. Recommended closing #590 (left for a human to confirm).
  - #591: per-cluster final disposition -- RAMP FROM TO (Phase 2), passthrough
    INIT + dt-runlist ordering (Phase 3), and runlist determinism (e24b008)
    reconciled the bulk; the remainder is benign-near-zero (2) plus the
    :NA:-arithmetic boundary (2, documented genuine remainder).
  - #595: the nondeterminism half is fixed by e24b008 (deterministic init
    runlist); the deeper init-ordering soundness gap remains.
Address three Phase 4 review items; all are test-quality fixes, no
production logic changes.

- VDF standalone-descriptor rebind: add a negative unit test
  (legit_dynamic_owner_blocked_only_by_stock_slot_guard) where the
  STOCK-slot guard is the SOLE condition rejecting the re-bind -- the
  span is non-overlapping, scalar, has a valid f[11] lookup index, and
  its forward link resolves to a valid in-range owner OT, so guards
  1/2/3/5 all pass and only the 0x08 STOCK requirement at span.start
  rejects. The pre-existing legit_dynamic_owner test had the forward
  link out of range, so guard 5 rejected first and the STOCK guard was
  never load-bearing. Mutation-verified: removing guard 4 flips the new
  test to a non-empty re-bind ({0: 2}) while all other tests stay green.

- initials_runlist_is_sorted_topological_order: document that the DFS
  post-order reference reconstruction assumes an acyclic initials
  runlist (no resolved init SCC); a resolved SCC would be emitted
  SCC-contiguous by topo_sort_str's condensation and diverge.

- classify_vdf_ident_nan_vs_na_skips_without_failing: clarify the test
  covers only the partial-NaN / finite-tail case, and make the boundary
  executable with an assertion that an entirely-NaN VDF series flips
  all_nan true and WOULD enter the residual set (the separately-guarded
  exception).
The CLI's Vensim open path called the plain open_vensim(contents), which
uses the null DataProvider, so any model with a GET DIRECT DATA/CONSTANTS/
LOOKUPS/SUBSCRIPT reference failed to open from simlin-cli with "external
data file '...' referenced but no DataProvider configured" -- the engine
treats a missing provider as a hard error, not a silent zero.

Extract the InputFormat::Vensim arm body into open_vensim_model(path,
contents). When the model is a real file on disk, build a
FilesystemDataProvider rooted at the file's parent directory and call
open_vensim_with_data, mirroring the engine's simulate_mdl_path_with_data
test helper; relative references like 'data/a.csv' then resolve to a
sibling of the model file (Vensim's relative-to-model semantics). stdin
(/dev/stdin, a character device) and any non-regular-file path fall back
to the null provider, so /dev is never used as a data root.

The extraction landed first keeping the null-provider behavior, so the
data-driven test failed RED on behavior ("no DataProvider configured")
before the provider was wired, then passed GREEN after.

AC5.1/AC5.2: a #[cfg(test)] module exercises the directconst fixture
(GET DIRECT CONSTANTS reading data/a.csv cell B2 = 2050) through
open_vensim_model and the incremental-salsa run path, asserting the
data-driven value (not zero/NaN). AC5.3: a model referencing a missing
CSV yields a clear file-level diagnostic ("cannot resolve data file
'...'") rather than a silent zero. WASM/libsimlin default paths are
untouched (still on the null provider; libsimlin already exposes
simlin_project_open_vensim_with_data for the data case).
…data

open_vensim_model previously decided whether to build a
FilesystemDataProvider via path.is_file() on the input path, which for
stdin is the synthesized /dev/stdin sentinel. That guard holds for a TTY
or pipe (is_file() == false -> null provider), but is wrong when stdin is
redirected from a regular file: `simlin simulate < model.mdl` makes
/dev/stdin resolve to a regular file, so is_file() == true and a provider
gets built rooted at /dev/stdin's parent (/dev). The rustdoc and inline
comment claimed "no provider is built" for stdin, which the code did not
actually guarantee for the `< file` case.

Key the decision on the user's intent -- whether a real model path was
passed on the command line -- rather than filesystem metadata of the
sentinel. open_vensim_model now takes Option<&Path>: Some(model_path)
builds a provider rooted at the model's parent (bare filename -> CWD, as
before), None (stdin, whether a pipe or a `< file` redirection) uses the
null provider. The call site threads input.path (genuinely None for
stdin) straight through. Behavior for real path arguments is unchanged.

A unit test pins the stdin branch: open_vensim_model(None, ..) on a
GET DIRECT model's contents must surface the "no DataProvider configured"
error, proving no provider is built for stdin. The doc now states only
what the code guarantees.
The C-LEARN residual work (#590/#591) landed five general
import/simulation primitives whose contracts the per-component
CLAUDE.md files did not yet reflect:

- engine: standalone lookup-only variables lower uniformly to
  gf(Time) (LookupOnlyLayout PerElement/Shared); genuine passthrough
  macros collapse to the builtin opcode (classify_passthrough +
  PassthroughBuiltin); the initials runlist is now deterministic
  (sort before topo_sort_str, #595) and the dt stock-submodel-output
  chain-break applies to all readers; xmile_compat carries a
  macro-shadowing audit (RAMP FROM TO arm removed); the VDF reader
  re-binds standalone graphical-function descriptors.
- cli: GET DIRECT external data resolves via a FilesystemDataProvider
  rooted at the model path (stdin uses the null provider).

Bumps the engine freshness lead to 2026-05-20 and adds a freshness
date plus the lookup_only_tests.rs entry. Docs-only.
@bpowers bpowers force-pushed the clearn-residual branch from cb3bbb0 to 666bd0c Compare May 20, 2026 23:05
@bpowers bpowers merged commit 59e48a9 into main May 20, 2026
13 checks passed
@bpowers bpowers deleted the clearn-residual branch May 20, 2026 23:10
@claude
Copy link
Copy Markdown

claude Bot commented May 20, 2026

Code review — no blocking findings

Reviewed all production changes (CLI external-data resolution, lookup-only lowering, passthrough-macro collapse, dep-graph determinism + dt chain-break, RAMP FROM TO macro-shadowing, and the VDF standalone-descriptor rebind). I traced each change against the surrounding source and could not identify a discrete, introduced bug that meaningfully impacts correctness.

Notes from the verification I ran (all checked out as correct):

  • Lookup-only lowering (compiler/mod.rs): the scalar/A2A/arrayed offset logic is sound — Shared wraps base off (element_offset == 0, table_count == 1) and PerElement wraps off + i (matches build_tables' row-major declared-index layout). Confirmed is_lookup_only fires on the MDL Lookup sentinel ("0+0") while Implicit ("TIME") already reaches gf(Time) via the normal path, so the fix targets exactly the case that was wrong. WITH LOOKUP correctly stays non-lookup-only.
  • dt stock-submodel-output chain-break (db_dep_graph.rs): extending keep_dt_dep to all readers (not just module-kind) drops only submodel·subvar edges where subvar is a Stock, matching the legacy model.rs::module_output_deps gate; stock outputs are prior-timestep reads, so removing the same-step edge is correct. Init phase keeps the edge. dt_deps/VariableDeps/ImplicitVarDeps types (BTreeSet<String>) align.
  • Deterministic init runlist: sorting init_list before topo_sort_str is a pure determinism improvement.
  • VDF rebinds (record_results.rs): descriptor_indices and rebinds are consistently keyed by rec_idx (matching the owner-emit filter at the consumption site); emit_descriptor_rebinds correctly avoids shadowing existing owners and dedupes by lowest rec_idx.
  • RAMP FROM TO arm removal (xmile_compat.rs): an intentional, documented change that matches genuine Vensim semantics (the function is macro-defined, not native), with the test updated accordingly.

One minor non-blocking observation: the empty-equation arm of is_lookup_only (documented as "load-bearing") is unreachable in the scalar/A2A lowering path because an empty equation yields ast == None, which hits the EmptyEquation early-return before var_is_lookup_only is consulted. This is defensive rather than functional — no MDL import produces empty-equation + tables (they get "0+0" or "TIME"), and it is not a regression — so it does not affect any real input.

Overall correctness verdict: correct. The patch is well-tested, internally consistent, and does not appear to break existing code or behavior.

@codecov
Copy link
Copy Markdown

codecov Bot commented May 21, 2026

Codecov Report

❌ Patch coverage is 93.92097% with 20 lines in your changes missing coverage. Please review.
✅ Project coverage is 82.86%. Comparing base (37f9d6b) to head (666bd0c).
⚠️ Report is 2 commits behind head on main.

Files with missing lines Patch % Lines
src/simlin-engine/tests/simulate.rs 84.37% 15 Missing ⚠️
src/simlin-cli/src/main.rs 80.00% 2 Missing ⚠️
src/simlin-engine/src/compiler/mod.rs 98.21% 1 Missing ⚠️
src/simlin-engine/src/module_functions.rs 97.05% 1 Missing ⚠️
src/simlin-engine/src/vdf/record_results.rs 97.50% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #600      +/-   ##
==========================================
+ Coverage   82.79%   82.86%   +0.06%     
==========================================
  Files         260      260              
  Lines       69286    69576     +290     
==========================================
+ Hits        57367    57653     +286     
- Misses      11919    11923       +4     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant