Skip to content

engine: per-element graphical functions are mapped by Arrayed-Vec (sorted) position instead of element-name -> dimension index #589

@bpowers

Description

@bpowers

Summary

A variable with a per-element (arrayed) graphical function has its per-element table list built positionally -- the i-th elems entry's GF is placed at table-index i -- but the runtime, the LTM polarity analysis, and the closed-#502 static-polarity path all select the per-element table by dimension index (the flat array offset). For any dimension whose declared element order differs from sorted/alphabetical order, every per-element GF feeds the wrong element's table to each dimension slot. This is a general, silent runtime-numeric correctness bug; it is broader than C-LEARN, which is just where it was caught.

Root cause

datamodel::Equation::Arrayed carries Vec<(ElementName, _, _, Option<GraphicalFunction>)>. The per-element Vec<Table> is built by iterating that Vec in order and pushing each element's table, discarding the ElementName, at two sites:

  • src/simlin-engine/src/variable.rs:336-351 -- build_tables: for (_, _, _, elem_gf) in elements { tables.push(parse_table(elem_gf)) }. (Its own comment claims "table[element_offset] corresponds to the correct element" -- that alignment is exactly what is not guaranteed.)
  • src/simlin-engine/src/db.rs:2923-2937 -- extract_tables_from_source_var (the production salsa path): elements.iter().map(|e| ... e.gf ...).collect(), same positional layout, name discarded.

Meanwhile the MDL importer sorts the elems Vec alphabetically by element name:

  • src/simlin-engine/src/mdl/convert/variables.rs:545 -- elements.sort_by(|a, b| a.0.cmp(&b.0));

But the runtime selects the per-element table by dimension index (the flat array offset, 0-based declared order via Dimension::get_offset, src/simlin-engine/src/dimensions.rs:127):

  • src/simlin-engine/src/vm.rs:1519 (Opcode::Lookup): gf_idx = (*base_gf as usize) + (element_offset as usize) (element_offset is the popped flat-array offset).
  • src/simlin-engine/src/vm.rs:2415 (LookupArray): &context.graphical_functions[*base_gf as usize + elem_off] where elem_off = input_view.flat_offset(&indices) (row-major flat offset).

So whenever the declared element order differs from alphabetical order, table-index i (= sorted position) is read for dimension index i (= declared position) -- a mismatch.

Concrete C-LEARN manifestation

In test/xmutil_test_models/C-LEARN v77 for Vensim.mdl, the COP dimension is declared in this (non-alphabetical) order:

OECD US, OECD EU, G77 China, G77 India, Remaining Developed, Remaining Developing A, COP Developing B

After the alphabetical sort, elems[0] is COP Developing B's table, but dimension index 0 is OECD US. So e.g. un_population_high[oecd_us] evaluates COP Developing B's table (elems[0]) instead of OECD US's. This was the dominant numeric divergence vs the genuine-Vensim Ref.vdf: a Phase 7 diagnostic measured ~28% of matched cells diverging, almost all cascading from this one bug through population -> GDP -> emissions.

Also affects LTM link polarity

src/simlin-engine/src/ltm/polarity.rs:765-774 already indexes the per-element table list by dimension index:

tables.get(dim.get_offset(elem)).map(analyze_graphical_function_polarity)

so per-element GF static link polarity is also silently wrong for non-sorted dimensions today (the table at get_offset(elem) belongs to a different element). This is the consumer added by the now-closed #502 ("support per-element graphical functions in static link polarity analysis"): #502 made the analysis index by dimension offset on the assumption the table layout was offset-aligned -- this issue is that the layout is in fact sorted-position-aligned, so #502's correct dimension-index access reads the wrong table.

Why it matters

  • Correctness: silently wrong simulation output for any arrayed graphical function over a dimension whose declared element order isn't alphabetical -- no crash, no diagnostic, just wrong numbers, cascading through every downstream variable.
  • Also corrupts LTM static link polarity for the same models (wrong loop sign / classification).

Why it was not caught earlier

Every existing arrayed-GF unit test uses a dimension whose declared order is already alphabetical (A1, A2, A3 / A, B, C), where positional layout == dimension index, so the bug is invisible. The simulating integration tests assert only order-invariant aggregates (sums), which are unchanged when two elements' tables are swapped.

Components affected

  • src/simlin-engine/src/variable.rs (build_tables, positional build site)
  • src/simlin-engine/src/db.rs (extract_tables_from_source_var, the production salsa build site)
  • src/simlin-engine/src/mdl/convert/variables.rs (the alphabetical elements.sort_by)
  • src/simlin-engine/src/ltm/polarity.rs (the per-element-GF polarity consumer, indexes by get_offset)
  • src/simlin-engine/src/dimensions.rs (Dimension::get_offset, the 0-based declared-order index primitive)

Fix direction (in progress as element-cycle-resolution Phase 7 Task 5)

At lowering/compile time, reorder the per-element table layout so element e's table lands at Dimension::get_offset(e) (0-based declared-order dimension index), in both build sites (variable.rs and db.rs). Handle the multi-dimensional case via row-major flattening across all the variable's dimensions. Leave Table::empty() in slots for elements without a GF (unchanged today). This is compile-time only: the VM hot path (opcode + handler) is unchanged -- no per-step name lookup. Baseline correctness tests plus a compile-time structural-layout assertion are added before the fix (TDD).

Note: the right fix is compile-time table reordering keyed on get_offset, not removing the importer's sort_by -- the sort is fine for the equation/element ordering; what's wrong is that table layout is keyed on that sorted position instead of the dimension index. (Both the equation strings and the tables need to agree with the dimension index; the simplest correct approach is to make the table layout authoritative on get_offset.)

Context / lineage

Confirmed end-to-end (read at every cited site) during the element-level cycle resolution work (branch clearn-hero-model). Being fixed now as Phase 7 Task 5; filing for the record because it is a general correctness bug, not C-LEARN-specific. Lineage: element-cycle-resolution AC8.1 (C-LEARN numeric finalization).

Related (distinct) issues:

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugSomething isn't workingengineIssues with the rust-based simulation engine

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions