diff --git a/docs/design/vdf.md b/docs/design/vdf.md index 704fd4905..7a84b76c0 100644 --- a/docs/design/vdf.md +++ b/docs/design/vdf.md @@ -120,10 +120,12 @@ the next section's magic; the last section runs to end-of-file. **Identify sections by index, not by `field4`** -- `field4` values vary across files. `field1` is a 1-based word pointer from the section magic: -`section.file_offset + 4 * (field1 - 1)`. For section 6 this points at the OT -class-code array start (`header[0x58] - OT_count`); for section 7 it points at -`header[0x60]`; for section 5 it points at the section's final word -(`region_end - 4`). +`section.file_offset + 4 * (field1 - 1)`. For **section 1** it points at the +first entry of the slot table (see "Slot table"); for section 6 it points at +the OT class-code array start (`header[0x58] - OT_count`); for section 7 it +points at `header[0x60]`; for section 5 it points at the section's final word +(`region_end - 4`). (In dataset files the record/header area and slot table +shift into section 0, so section 0's `field1` plays section 1's role.) | Index | Role | |---|---| @@ -157,6 +159,8 @@ Three words in the preamble are stable cross-corpus invariants: - `data[8..12] == count of section-6 lookup mapping records` Block 1 also satisfies `block1[10] >> 16 == block1[11]` on every observed file. +**`block1[7]` is the slot count** -- the number of slot-table entries -- on +every run-file and dataset VDF in the corpus (see "Slot table"). The first variable record starts at `sec1.data_offset() + 204` and records follow on 64-byte strides until just before the slot table. A few files leave a @@ -190,12 +194,28 @@ owner spans form a clean, non-overlapping OT partition. ### Slot table -An array of N `u32` values located between the last record and the name table, -one per name table entry. Each value is a byte offset into section-1 data. The +An array of `N` `u32` values, each a byte offset into section-1 data. The pairing is direct: `slot_table[i]` belongs to `names[i]`. (`#`-signature internal-helper names sit past the slotted prefix in the name table and have no -slot-table entry.) The parser finds the table by scanning backward from the -name table for the largest run of unique, in-range, 4-byte-aligned offsets. +slot-table entry, so `N <= name_count`.) + +The table's location is **fully determined by the header** -- no scan or stride +heuristic is needed: + +- **start** = section 1's `field1` 1-based word pointer + (`sec1.file_offset + 4 * (field1 - 1)`); +- **count** `N` = `block1[7]` (the actively-written slot count); +- the table is followed by a single `0x00430000` terminator word, then the + name-table section magic. + +These three facts over-determine each other -- `start + (N + 1) * 4` equals the +name-table section's file offset on every run-file and dataset VDF in the corpus +(138 run files + 6 datasets, zero exceptions). A reader can take any two and +cross-check the third; `vdf::slot_table_from_header` reads `field1` + `block1[7]` +and verifies the terminator and boundary. (An earlier reader scanned backward +for "the largest run of unique, in-range, 4-byte-aligned offsets"; that heuristic +under-counted on edited files whose name table contained stale entries and +over-counted by one elsewhere. It has been replaced by the structural decode.) ## Section 2: name table @@ -464,6 +484,38 @@ Reconstructing the result set is a single pass over the section-1 records: alphabetical" ordering visible in the OT array is a consequence of Vensim's compiler allocation, not a rule a reader needs. +### Standalone graphical-function ("lookup-only") descriptors + +A lookup-only variable is a **graphical function = a table indexed by an +explicit input** (`y = lookup(input)`). A *bare* lookup -- a table with no +call site of its own -- is **not a time series**, so Vensim saves no data block +for it: only a descriptor record exists, with no separate consumer-owner record. +The overlap-pruning step above never sees it (it collides with nothing), so it +would otherwise decode at its `f[11]`-as-OT-start ghost slot (a class-`0x08` +stock slot holding `0`/garbage). The reader recognises it structurally (its +ghost slots all carry the stock class code -- a lookup is never a stock -- its +`f[11]` is a valid lookup-record index, and the forward link +`lookup_record[f[11]].word[10]` is a valid owner OT, with `word[11]` matching the +descriptor's element count for the arrayed case) and **drops it**, exactly like +an overlapping descriptor (`record_results::standalone_lookup_only_descriptors`). +The table's values, where they matter, are carried by the **consumer** variables +that call it with a real input -- those are ordinary owners the reader emits +under their own names. + +This is why the reader does not (and should not) reconstruct a series for a +lookup-only variable: the variable's value is `lookup(input)` for whatever input +the model passes, which the VDF does not store. The forward link only points at +*a* consumer, and the model defines how that consumer relates to the lookup -- +on `Ref.vdf`: an identity pass-through (`Historical GDP[COP] = IF Time<=cutoff +THEN Historical GDP LOOKUP(Time/One year) ELSE :NA:`), a unit-scaled copy +(`RS GDP = RS GDP in trillions(...) * million per trillion dollars`), a fixed-time +snapshot (`Forestry emissions at start year = Historical forestry LOOKUP(start +year)`), or one row of a wider 2-D consumer (`rs_hfc125` is the `HFC125` column +of `RS HFC[COP, HFC type]`). Recovering the lookup variable's own series from any +of these needs the model, not the VDF. (A `gf(Time)` lowering for such a bare +lookup is an engine bug -- a table is not generally a function of time; see +#597.) + ### Worked example: a `SMOOTH1` call One `SMOOTH(in, t)` call (= `SMOOTH1`) adds: **+1 OT** (the level, class `0x08`, diff --git a/src/simlin-engine/src/vdf.rs b/src/simlin-engine/src/vdf.rs index e9a20f2c7..fc29554ff 100644 --- a/src/simlin-engine/src/vdf.rs +++ b/src/simlin-engine/src/vdf.rs @@ -229,6 +229,19 @@ pub const FILE_HEADER_SIZE: usize = 0x80; /// VDF fixture. Expressed explicitly as `12 + 3 * 64 = 204`. pub const RECORD_REGION_START_OFFSET: usize = 12 + 3 * RECORD_SIZE; +/// Byte offset, within the header section's data, of the `u32` slot count +/// (`block1[7]`): word 7 of the *second* 64-byte header block -- past the +/// 12-byte preamble and the first 64-byte block. Vensim writes the +/// actively-saved slot count here, and it equals the slot-table entry count on +/// every observed run-file and dataset VDF. +pub const SLOT_COUNT_WORD_OFFSET: usize = 12 + RECORD_SIZE + 7 * 4; + +/// Sentinel `u32` written immediately after the slot table, separating it from +/// the name-table section magic. Constant (`0x00430000`) across every observed +/// run-file and dataset VDF; used to cross-check the deterministic slot-table +/// decode. +pub const SLOT_TABLE_TERMINATOR: u32 = 0x0043_0000; + /// Vensim system variable names that appear in every VDF name table. pub const SYSTEM_NAMES: [&str; 5] = ["Time", "INITIAL TIME", "FINAL TIME", "TIME STEP", "SAVEPER"]; @@ -548,10 +561,21 @@ pub struct VdfSection6LookupRecord { } impl VdfSection6LookupRecord { - /// OT index for this lookup table definition. + /// OT index of the lookup's evaluated-output block (word[10]). + /// + /// This is the OT of a *consumer* of the lookup, not the lookup-def name's + /// own record; it can be shared by several lookup records (see + /// `docs/design/vdf.md`, "Lookup mapping records"). pub fn ot_index(&self) -> usize { self.words[10] as usize } + + /// Width of the evaluated-output block (word[11]): the number of OT slots + /// the consumer occupies starting at [`Self::ot_index`]. `1` for a scalar + /// consumer; the element count for an arrayed one. + pub fn output_width(&self) -> usize { + self.words[11] as usize + } } /// A contiguous OT index range implied by record start indices (field[11]). @@ -722,9 +746,11 @@ impl VdfDatasetFile { // section 0 behaves like the string/record area and section 1 carries // the printable name table. let names = parse_name_table_extended(&data, §ions[1], sections[1].region_end); - let section0_data_size = sections[0].region_data_size(); + // Dataset VDFs shift the record/header area into section 0, so the + // slot-table header (`field1` + `block1[7]`) is section 0 while the + // name table is section 1. let (slot_table_offset, slot_table) = - find_slot_table(&data, §ions[1], names.len(), section0_data_size); + slot_table_from_header(&data, §ions[0], sections[1].file_offset); // Dataset VDFs share the "12-byte preamble + three header blocks, // then records" layout with simulation VDFs. The only difference is @@ -885,17 +911,15 @@ impl VdfFile { }) .unwrap_or_default(); - // Section at index 1 is the string table section. Its field4 - // value varies across VDF versions (2, 42, etc.) so we identify it - // by position rather than field4. - let sec1_data_size = sections.get(1).map(|s| s.region_data_size()).unwrap_or(0); - - let (slot_table_offset, slot_table) = name_section_idx - .map(|ns_idx| { - let ns = §ions[ns_idx]; - find_slot_table(&data, ns, names.len(), sec1_data_size) - }) - .unwrap_or((0, Vec::new())); + // The slot table's start pointer and entry count live in the + // section-1 header (`field1` + `block1[7]`). Section 1 is identified + // by position because its `field4` varies across VDF versions. + let (slot_table_offset, slot_table) = match (sections.get(1), name_section_idx) { + (Some(sec1), Some(ns_idx)) => { + slot_table_from_header(&data, sec1, sections[ns_idx].file_offset) + } + _ => (0, Vec::new()), + }; // Find records. The record region lives at a fixed offset within // section 1's data: `RECORD_REGION_START_OFFSET` bytes past @@ -1747,11 +1771,14 @@ pub fn parse_name_table_extended(data: &[u8], section: &Section, parse_end: usiz .map(|&b| b as char) .collect(); - if s.is_empty() || !s.chars().all(|c| c.is_ascii_graphic() || c == ' ') { - break; + // Edited Vensim files leave stale/deleted entries: a valid u16 length + // prefix over non-printable binary payload. Vensim's reader skips them + // by the declared length and keeps going -- the table does not end at + // the first such entry. Only the out-of-bounds and implausible-length + // guards above stop the table. (See docs/design/vdf.md, section 2.) + if !s.is_empty() && s.chars().all(|c| c.is_ascii_graphic() || c == ' ') { + names.push(s); } - - names.push(s); pos += len; } @@ -1787,60 +1814,61 @@ pub fn find_name_table_section_idx(data: &[u8], sections: &[Section]) -> Option< None } -/// Find the slot lookup table. This is an array of u32 values (typically one per -/// name) located just before the name table section. The values are byte offsets -/// into section 1 data. For small models they have uniform stride (e.g., 16); for -/// larger models the stride may vary (variable-length slot metadata). +/// Decode the slot table deterministically from the section header. +/// +/// The slot table is an array of `u32` byte-offsets into the header section's +/// data, one per slotted name. Vensim records its location exactly, so no +/// backward scan or stride heuristic is needed: +/// +/// - **start** is the 1-based word pointer in the header section's `field1` +/// (`header_section.file_offset + 4 * (field1 - 1)`) -- the same field-1 +/// convention sections 6 and 7 use for their payload pointers; +/// - **count** is `block1[7]` (`SLOT_COUNT_WORD_OFFSET` into the section's +/// data), the actively-written slot count; +/// - the table is followed by a single [`SLOT_TABLE_TERMINATOR`] word and then +/// the name-table section magic. +/// +/// `header_section` is section 1 for simulation-run files and section 0 for +/// dataset files (whose record/header area shifts one section left). +/// `name_table_offset` is the table's upper boundary -- the name-table +/// section's file offset. /// -/// Returns (file_offset_of_table, values). -pub fn find_slot_table( +/// These three facts over-determine each other and were verified to agree on +/// every run-file and dataset VDF in the corpus. The decode therefore +/// cross-checks the layout (`start + (count + 1) * 4 == name_table_offset`, +/// terminator word present) and returns `(0, empty)` if it does not hold, +/// rather than emitting a mis-decoded table. +pub fn slot_table_from_header( data: &[u8], - name_table_section: &Section, - max_name_count: usize, - section1_data_size: usize, + header_section: &Section, + name_table_offset: usize, ) -> (usize, Vec) { - if max_name_count == 0 { + if header_section.field1 == 0 { return (0, Vec::new()); } - let end = name_table_section.file_offset; - - for gap in 0..20 { - // Prefer the largest structurally valid table for this gap. - for name_count in (1..=max_name_count).rev() { - let table_size_bytes = name_count * 4; - if end < gap + table_size_bytes { - continue; - } - let table_start = end - gap - table_size_bytes; - - let values: Vec = (0..name_count) - .map(|i| read_u32(data, table_start + i * 4)) - .collect(); - - let mut sorted = values.clone(); - sorted.sort(); - sorted.dedup(); - - if sorted.len() != name_count { - continue; - } - - let all_valid = sorted - .iter() - .all(|&v| v % 4 == 0 && v > 0 && (v as usize) < section1_data_size); - if !all_valid { - continue; - } - - let strides: Vec = sorted.windows(2).map(|pair| pair[1] - pair[0]).collect(); - let min_stride = strides.iter().copied().min().unwrap_or(0); - if min_stride >= 4 { - return (table_start, values); - } - } + let slot_start = header_section.file_offset + 4 * (header_section.field1 as usize - 1); + let count_word_offset = header_section.data_offset() + SLOT_COUNT_WORD_OFFSET; + if count_word_offset + 4 > data.len() { + return (0, Vec::new()); + } + let slot_count = read_u32(data, count_word_offset) as usize; + + // Cross-check the over-determined layout: `slot_count` slot words, then one + // terminator word, then the name table. Bail out (no slots) if the file + // does not match, rather than emit a mis-decoded table. + let terminator_offset = slot_start + slot_count * 4; + if slot_start >= name_table_offset + || terminator_offset + 4 != name_table_offset + || terminator_offset + 4 > data.len() + || read_u32(data, terminator_offset) != SLOT_TABLE_TERMINATOR + { + return (0, Vec::new()); } - (0, Vec::new()) + let slot_table = (0..slot_count) + .map(|i| read_u32(data, slot_start + i * 4)) + .collect(); + (slot_start, slot_table) } /// Find 64-byte variable records between `search_start` (inclusive) and @@ -2311,6 +2339,84 @@ mod tests { assert_eq!(names[1], "abc"); } + #[test] + fn test_parse_name_table_extended_skips_stale_nonprintable_entries() { + // Edited Vensim files leave stale/deleted name-table entries: a valid + // u16 length prefix followed by non-printable binary payload (often a + // 4-byte value plus a fragment of the deleted name). Vensim's reader + // skips them by the declared byte count and keeps going; the table does + // not end there. (Real example from `risk2.vdf`: len=10, payload + // `91 01 00 00 63 79 20 30 00 00`.) Verify the parser skips rather than + // stops, so the names after the stale entry are still recovered. + let mut data = vec![0u8; 256]; + data[0..4].copy_from_slice(&VDF_SECTION_MAGIC); + data[20..24].copy_from_slice(&(6u32 << 16).to_le_bytes()); + + // First name at 24: "Time\0\0" (len 6, no u16 prefix). + data[24..28].copy_from_slice(b"Time"); + + // Name 2 at 30: u16=6, "abc". + data[30..32].copy_from_slice(&6u16.to_le_bytes()); + data[32..35].copy_from_slice(b"abc"); + + // STALE entry at 38: u16=10, non-printable payload (in-bounds, len<=256). + data[38..40].copy_from_slice(&10u16.to_le_bytes()); + data[40..50].copy_from_slice(&[0x91, 0x01, 0x00, 0x00, b'c', b'y', b' ', b'0', 0, 0]); + + // Name 3 at 50: u16=6, "def". + data[50..52].copy_from_slice(&6u16.to_le_bytes()); + data[52..55].copy_from_slice(b"def"); + + let section = Section { + file_offset: 0, + field1: 0, + region_end: 80, + field3: 500, + field4: 0, + field5: 6u32 << 16, + }; + + let names = parse_name_table_extended(&data, §ion, 80); + assert_eq!(names, vec!["Time", "abc", "def"]); + } + + #[test] + fn test_slot_table_deterministic_matches_header() { + // The slot table is decoded deterministically from the section-1 + // header: it starts at the `field1` 1-based word pointer, has + // `block1[7]` entries, and is followed by a single `0x00430000` + // terminator word before the name table. The previous heuristic + // scanner under-counted on edited files (name-parser cap) and + // over-counted by one elsewhere (a spurious leading word); the + // deterministic decode matches the header exactly. See the + // corpus-wide invariant in tests/vdf_structural_invariants.rs. + for path in [ + "../../test/bobby/vdf/econ/risk2.vdf", + "../../test/metasd/WRLD3-03/SCEN01.VDF", + "../../test/metasd/social-network-valuation/optimistic.vdf", + "../../test/bobby/vdf/econ/risk.vdf", + "../../test/bobby/vdf/water/Current.vdf", + ] { + let vdf = vdf_file(path); + let sec1 = &vdf.sections[1]; + // block1[7]: 12-byte preamble + 64-byte block0 + 7 words into block1. + let block1_word7 = read_u32(&vdf.data, sec1.data_offset() + 12 + RECORD_SIZE + 28); + let field1_start = sec1.file_offset + 4 * (sec1.field1 as usize - 1); + assert_eq!( + vdf.slot_table.len() as u32, + block1_word7, + "{path}: slot count must equal block1[7]" + ); + assert_eq!( + vdf.slot_table_offset, field1_start, + "{path}: slot table must start at the field1 word pointer" + ); + // The word immediately after the slot table is the terminator. + let terminator = read_u32(&vdf.data, field1_start + vdf.slot_table.len() * 4); + assert_eq!(terminator, 0x0043_0000, "{path}: terminator word"); + } + } + fn vdf_file(path: &str) -> VdfFile { let data = std::fs::read(path) .unwrap_or_else(|e| panic!("failed to read VDF file {}: {}", path, e)); @@ -3642,20 +3748,18 @@ mod tests { } #[test] - fn test_to_results_via_records_handles_records_exceeding_slots() { - // WRLD3-03/SCEN01.VDF emits more records than slot-table entries. - // The direct f[2] key path should not care about that cardinality - // relationship; it decodes each record's section-2 name pointer - // independently and keeps the partial coverage available on this - // still-ambiguous large fixture. + fn test_to_results_via_records_decodes_large_ambiguous_fixture() { + // WRLD3-03/SCEN01.VDF is a large, edited fixture. The direct f[2] key + // path decodes each record's section-2 name pointer independently of + // the slot table, keeping substantial (partial) coverage on this + // still-ambiguous file. (Before the deterministic slot-table decode + // this fixture reported fewer slots than records -- a slot-under-count + // artifact of the old name-parser/scanner, not a property the + // record-based mapping ever relied on.) let vdf = vdf_file("../../test/metasd/WRLD3-03/SCEN01.VDF"); - assert!( - vdf.records.len() > vdf.slot_table.len(), - "test assumes records > slots for WRLD3 SCEN01", - ); let results = vdf .to_results_via_records() - .expect("WRLD3 SCEN01: record-based mapping should not error when records > slots"); + .expect("WRLD3 SCEN01: record-based mapping should not error"); let non_time_named = results.offsets.len().saturating_sub(1); assert!( non_time_named >= 150, diff --git a/src/simlin-engine/src/vdf/record_results.rs b/src/simlin-engine/src/vdf/record_results.rs index d6d735c2b..def39a367 100644 --- a/src/simlin-engine/src/vdf/record_results.rs +++ b/src/simlin-engine/src/vdf/record_results.rs @@ -181,15 +181,15 @@ pub(super) fn decoded_record_spans( /// Result of identifying graphical-function descriptor records. /// -/// `descriptor_indices` are the records (by `rec_idx`) that must NOT be -/// emitted at their `f[11]`-as-OT-start slot. Two sub-cases: -/// - **Overlapping descriptors** are dropped entirely: their consuming owner -/// record exists separately in the same OT component and carries the series. -/// - **Standalone descriptors** (a lookup-only variable Vensim saves *only* as -/// a descriptor, no separate consumer-owner record) are re-bound: they -/// appear additionally in `rebinds` mapping `rec_idx -> forward-link OT` -/// (`lookup_record[f[11]].word[10]`), and the caller emits them there -/// instead of dropping them. +/// `descriptor_indices` are the records (by `rec_idx`) that are dropped -- NOT +/// emitted at their `f[11]`-as-OT-start slot, because they are graphical-function +/// definitions (tables), not saved time series. Two sub-cases: +/// - **Overlapping descriptors**: their consuming owner record exists separately +/// in the same OT component and carries the series. +/// - **Standalone descriptors** (a lookup-only variable Vensim saves *only* as a +/// descriptor, no separate consumer-owner record): a bare lookup is a table, +/// so it has no series of its own; the consumer variables that call it are +/// separately-emitted owners. Both sub-cases are simply dropped. /// /// `used_f10_fallback` records when the descriptor peeling step had to /// resort to the highest-`f[10]` tie-break because the lexical @@ -199,9 +199,6 @@ pub(super) fn decoded_record_spans( #[derive(Clone, Debug, Default)] pub(super) struct DescriptorIdentification { pub(super) descriptor_indices: HashSet, - /// `rec_idx -> forward-link OT` for standalone descriptors re-bound to - /// their evaluated-output OT. A subset of `descriptor_indices`. - pub(super) rebinds: HashMap, #[allow(dead_code)] pub(super) used_f10_fallback: bool, } @@ -371,89 +368,108 @@ pub(super) fn identify_descriptor_records( } // Standalone (non-overlapping) descriptors: a lookup-only variable Vensim - // saves only as a descriptor record. The overlap path above never sees it - // (it collides with nothing), so it would otherwise decode at its spurious - // `f[11]`-as-OT-start ghost slot. Recognise it and re-bind to its - // forward-link evaluated-output OT. - let lookup_word10: Vec = vdf - .section6_lookup_records() + // saves only as a descriptor record (a graphical-function definition). The + // overlap path above never sees it (it collides with nothing), so it would + // otherwise decode at its spurious `f[11]`-as-OT-start ghost slot. A bare + // lookup is a table, not a time series, so recognise it and DROP it -- its + // values, where they matter, are carried by the consumer variables that + // call it (separately-emitted owners). + let lookup_records = vdf.section6_lookup_records(); + let lookup_word10: Vec = lookup_records + .as_ref() .map(|recs| recs.iter().map(|r| r.ot_index()).collect()) .unwrap_or_default(); + let lookup_word11: Vec = lookup_records + .as_ref() + .map(|recs| recs.iter().map(|r| r.output_width()).collect()) + .unwrap_or_default(); let class_codes = vdf.section6_ot_class_codes().unwrap_or_default(); let f11_by_span: Vec = spans .iter() .map(|s| vdf.records[s.rec_idx].fields[11]) .collect(); - let rebinds = standalone_descriptor_rebinds( + let standalone = standalone_lookup_only_descriptors( spans, &f11_by_span, &overlapping, n_lookups, &lookup_word10, + &lookup_word11, &class_codes, vdf.offset_table_count, ); - // A re-bound standalone descriptor is also a descriptor: it must not be - // emitted at its `f[11]`-as-OT-start slot. - descriptor_indices.extend(rebinds.keys().copied()); + descriptor_indices.extend(standalone); DescriptorIdentification { descriptor_indices, - rebinds, used_f10_fallback, } } /// Identify *standalone* (non-overlapping) graphical-function descriptor -/// records and compute their forward-link re-bind OT. +/// records to DROP. /// -/// `identify_descriptor_records` only peels descriptors that sit in an -/// overlapping OT component, because a descriptor that collides with a real -/// owner is recognised by the collision. A lookup-only variable that Vensim -/// saves *only* as a descriptor record (no separate consumer-owner record) -/// does not overlap anything, so it slips through as an owner and decodes at -/// its `f[11]`-as-OT-start slot -- a ghost stock slot holding `0`/garbage (see -/// `docs/design/vdf.md`, "Descriptor pruning"). Its real series is the -/// forward-linked evaluated-output OT `lookup_record[f[11]].word[10]`. +/// A bare graphical function is a **table, not a time series**: Vensim saves no +/// series for it, only a descriptor record whose `f[11]` is a section-6 +/// lookup-record index (not an OT start). `identify_descriptor_records` only +/// peels descriptors that sit in an *overlapping* OT component (a collision +/// with their consuming owner reveals them). A lookup-only variable saved +/// *only* as a descriptor collides with nothing, so it would otherwise decode +/// at its spurious `f[11]`-as-OT-start ghost slot (a stock slot holding +/// `0`/garbage; see `docs/design/vdf.md`, "Standalone graphical-function +/// descriptors"). This recognises such a record and returns its `rec_idx` so +/// the caller drops it -- exactly like an overlapping descriptor. The table's +/// values, where they matter, are carried by the consumer variables that call +/// it (e.g. `Historical GDP[COP] = IF Time<=cutoff THEN Historical GDP +/// LOOKUP(Time/One year) ELSE :NA:`), which the reader emits as ordinary +/// owners under their own names. /// -/// This pure function (functional core) recognises such a record and returns -/// its `rec_idx -> forward OT` re-bind, gated to avoid disturbing legitimate -/// owners: +/// This pure function (functional core) detects the descriptor conservatively +/// to avoid dropping a real owner: /// - the span is NOT in `overlapping` (the connected-component peeling path /// owns the overlapping case); /// - its `f[11]` (`f11_by_span[i]`) is a valid section-6 lookup-record index /// (`< n_lookups`) -- the structural pre-condition for the forward link; -/// - its `f[11]`-as-OT-start slot (`span.start`) carries the **stock** class -/// code (`0x08`). A graphical-function/lookup variable is never a stock, so -/// landing on a stock slot is the spurious-owner telltale. A legitimate -/// scalar owner whose `f[11]` is coincidentally `< n_lookups` carries a -/// non-stock data code (`0x11` dynamic etc.) and is left untouched; +/// - **every** `f[11]`-as-OT-start ghost slot (`span.start .. span.end`) +/// carries the **stock** class code (`0x08`). A graphical function is never a +/// stock, so landing on stock slots is the spurious-owner telltale; a +/// legitimate non-stock owner whose `f[11]` is coincidentally `< n_lookups` +/// carries a `0x11` (dynamic) etc. code and is left untouched; /// - the forward link `lookup_record[f[11]].word[10]` is a valid data OT -/// (`1 <= ot < ot_count` with an owner class code -- never Time/0). +/// (`1 <= ot < ot_count`) with owner class codes across +/// `[word[10], word[10] + span_len)`, and for an **arrayed** descriptor +/// (`span_len > 1`) the forward width (`word[11]`) equals the element count. +/// These confirm `f[11]` really indexes this variable's graphical function +/// rather than coincidentally landing in the lookup-index range. /// -/// When the forward link is not a valid data OT (e.g. it points at Time/0, the -/// "no saved consumer" case), the record is NOT re-bound: it has no recoverable -/// series, and re-binding to Time would be worse than leaving it. Such records -/// remain a genuine residual. -fn standalone_descriptor_rebinds( +/// Not matched here (the model-free reader cannot safely distinguish them from +/// real owners, so they are left as-is): the `rs_hfc*` family, whose forward +/// link is a *wider* shared 2-D consumer (`word[11] != span_len`), and a +/// descriptor whose forward link is Time/`0`. The engine should not be +/// synthesising a `gf(Time)` series for any of these lookup tables in the first +/// place; see #597. +// Functional core: takes pre-extracted slices (the section-6 lookup forward +// links, OT class codes) rather than `&VdfFile`, so the detection is unit +// testable on synthetic inputs with no fixture. That decoupling is the reason +// for the parameter count. +#[allow(clippy::too_many_arguments)] +fn standalone_lookup_only_descriptors( spans: &[DecodedRecordSpan], f11_by_span: &[u32], overlapping: &HashSet, n_lookups: usize, lookup_word10: &[usize], + lookup_word11: &[usize], class_codes: &[u8], ot_count: usize, -) -> HashMap { - let mut rebinds = HashMap::new(); +) -> HashSet { + let mut dropped = HashSet::new(); for (i, span) in spans.iter().enumerate() { if overlapping.contains(&i) { continue; } - // Scalar only. An arrayed descriptor re-bound to a single forward OT - // would scalarize and lose its element columns; the arrayed lookup-only - // case needs element-order info the VDF does not store on disk and is - // deferred (see `docs/design/vdf.md` and the C-LEARN residual plan). - if span.length() != 1 { + let span_len = span.length(); + if span_len == 0 { continue; } let f11 = match f11_by_span.get(i) { @@ -464,9 +480,13 @@ fn standalone_descriptor_rebinds( if f11 >= n_lookups { continue; } - // The f[11]-as-OT-start slot must be a STOCK slot -- the spurious-owner - // telltale. (`span.start` is exactly the `f[11]`-as-OT-start.) - if class_codes.get(span.start).copied() != Some(VDF_SECTION6_OT_CODE_STOCK) { + // Every f[11]-as-OT-start ghost slot must carry the STOCK class code -- + // the spurious-owner telltale (a graphical-function/lookup variable is + // never a stock). For an arrayed descriptor all `span_len` ghost slots + // are checked; for a scalar one this is the single `span.start` slot. + let ghost_all_stock = (span.start..span.end) + .all(|ot| class_codes.get(ot).copied() == Some(VDF_SECTION6_OT_CODE_STOCK)); + if !ghost_all_stock { continue; } // Resolve the forward link and require it be a valid data OT. @@ -477,17 +497,38 @@ fn standalone_descriptor_rebinds( if fwd == 0 || fwd >= ot_count { continue; } - let fwd_is_owner = class_codes - .get(fwd) - .copied() - .map(is_owner_ot_class_code) - .unwrap_or(false); - if !fwd_is_owner { + // Confirmation gate for arrayed descriptors: the forward link's output + // width (`word[11]`) must equal the descriptor's element count, i.e. the + // forward consumer has this variable's exact shape. This is what lets us + // be confident `f[11]` indexes this variable's graphical function rather + // than coincidentally landing in the lookup-index range -- so it is safe + // to drop. A wider shared consumer (the `rs_hfc*` family forwarding to + // one 63-wide block) has `width != span_len` and is NOT matched here + // (the model-free reader can't safely tell it from an arrayed owner). + // Scalar descriptors (no width gate) rely on the forward-link guards. + if span_len > 1 { + match lookup_word11.get(f11) { + Some(&width) if width == span_len => {} + _ => continue, + } + } + // The whole forward block must carry real-data owner class codes. + if fwd + span_len > ot_count { + continue; + } + let fwd_block_all_owner = (fwd..fwd + span_len).all(|ot| { + class_codes + .get(ot) + .copied() + .map(is_owner_ot_class_code) + .unwrap_or(false) + }); + if !fwd_block_all_owner { continue; } - rebinds.insert(span.rec_idx, fwd); + dropped.insert(span.rec_idx); } - rebinds + dropped } #[cfg(test)] @@ -514,12 +555,12 @@ mod standalone_descriptor_tests { /// A standalone graphical-function descriptor whose `f[11]` is a valid /// section-6 lookup-record index and whose `f[11]`-as-OT-start lands on a - /// STOCK (0x08) ghost slot holding the wrong value must be re-bound to its - /// forward link `lookup_record[f[11]].word[10]`, not emitted at the ghost - /// slot. Reproduces the `Ref.vdf` standalone-lookup-only mis-decode on a - /// minimal synthetic record set (NOT keyed on any C-LEARN name). + /// STOCK (0x08) ghost slot must be DROPPED (recognised as a lookup-only + /// table), not emitted at the ghost slot. Reproduces the `Ref.vdf` + /// standalone-lookup-only mis-decode on a minimal synthetic record set + /// (NOT keyed on any C-LEARN name). #[test] - fn standalone_lookup_descriptor_rebinds_to_forward_link() { + fn standalone_lookup_descriptor_is_dropped() { // OT layout (class codes): 0=Time, 1=dynamic owner (the real GF output // the descriptor must resolve to), 2=stock-coded GHOST slot the // descriptor's f[11]-as-OT-start spuriously lands on. @@ -534,6 +575,7 @@ mod standalone_descriptor_tests { // record[1], whose word[10] (evaluated-output OT) == 1. let lookup_word10 = [9usize, 1usize]; let n_lookups = lookup_word10.len(); + let lookup_word11 = vec![1usize; n_lookups]; // One standalone descriptor span: its f[11] == 1 (a valid lookup // index), but as an OT-start it lands on OT 2 (the stock ghost). It is @@ -542,27 +584,31 @@ mod standalone_descriptor_tests { let f11_by_span = [1u32]; let overlapping: HashSet = HashSet::new(); - let rebinds = standalone_descriptor_rebinds( + let dropped = standalone_lookup_only_descriptors( &spans, &f11_by_span, &overlapping, n_lookups, &lookup_word10, + &lookup_word11, &class_codes, ot_count, ); - // The descriptor (rec_idx 0) must be re-bound to forward OT 1, NOT left - // at its ghost f[11]-as-OT-start slot (OT 2). - assert_eq!(rebinds.get(&0).copied(), Some(1)); + // The descriptor (rec_idx 0) must be dropped (recognised as a + // lookup-only table), NOT emitted at its ghost f[11]-as-OT-start slot. + assert!( + dropped.contains(&0), + "lookup-only descriptor must be dropped" + ); } /// A legitimate scalar owner whose data lives at its `f[11]`-as-OT-start - /// slot (a DYNAMIC 0x11 slot) must NOT be re-bound, even if `f[11]` happens + /// slot (a DYNAMIC 0x11 slot) must NOT be dropped, even if `f[11]` happens /// to be a valid lookup index. This guards the two `Ref.vdf` /// `*_conc_change_at_impact_year` owners (class 0x11) the fix must preserve. #[test] - fn legit_dynamic_owner_with_small_f11_is_not_rebound() { + fn legit_dynamic_owner_with_small_f11_is_not_dropped() { let class_codes = [ VDF_SECTION6_OT_CODE_TIME, VDF_SECTION6_OT_CODE_DYNAMIC, // OT 1: the owner's real data @@ -570,41 +616,43 @@ mod standalone_descriptor_tests { let ot_count = class_codes.len(); let lookup_word10 = [9usize, 9usize]; let n_lookups = lookup_word10.len(); + let lookup_word11 = vec![1usize; n_lookups]; // f[11] == 1 is both the owner's OT start (dynamic, holds its data) AND // coincidentally a valid lookup index. It must stay an owner. let spans = [span(0, "Some Concentration", 1)]; let f11_by_span = [1u32]; let overlapping: HashSet = HashSet::new(); - let rebinds = standalone_descriptor_rebinds( + let dropped = standalone_lookup_only_descriptors( &spans, &f11_by_span, &overlapping, n_lookups, &lookup_word10, + &lookup_word11, &class_codes, ot_count, ); assert!( - rebinds.is_empty(), - "a dynamic-coded owner must not be re-bound: {rebinds:?}" + dropped.is_empty(), + "a dynamic-coded owner must not be dropped: {dropped:?}" ); } /// Pins the STOCK-slot guard as the *sole* load-bearing reason a legitimate /// dynamic owner is left untouched. /// - /// `legit_dynamic_owner_with_small_f11_is_not_rebound` (above) also exercises + /// `legit_dynamic_owner_with_small_f11_is_not_dropped` (above) also exercises /// a dynamic owner, but there the forward link `lookup_word10[f11] == 9` is - /// out of OT range, so the "valid forward data OT" guard rejects the re-bind + /// out of OT range, so the "valid forward data OT" guard rejects the drop /// *first* and the STOCK-slot guard never decides. This case constructs a /// span where every *other* precondition passes -- it is non-overlapping, /// scalar, its `f[11]` is a valid lookup index, and its forward link resolves /// to a valid in-range owner OT -- so the ONLY condition standing between the - /// owner and a (wrong) re-bind is the STOCK-slot requirement: its + /// owner and a (wrong) drop is the STOCK-slot requirement: its /// `f[11]`-as-OT-start lands on a DYNAMIC (0x11) slot, not a STOCK (0x08) /// ghost. Removing or broadening that guard would flip this test to a - /// non-empty re-bind (verified by mutation), so it enforces the docstring + /// non-empty drop (verified by mutation), so it enforces the docstring /// promise to preserve legitimate 0x11 owners. #[test] fn legit_dynamic_owner_blocked_only_by_stock_slot_guard() { @@ -623,31 +671,33 @@ mod standalone_descriptor_tests { // span.start (OT 1, DYNAMIC) can reject. let lookup_word10 = [9usize, 2usize]; let n_lookups = lookup_word10.len(); + let lookup_word11 = vec![1usize; n_lookups]; let spans = [span(0, "Some Dynamic Owner", 1)]; let f11_by_span = [1u32]; let overlapping: HashSet = HashSet::new(); - let rebinds = standalone_descriptor_rebinds( + let dropped = standalone_lookup_only_descriptors( &spans, &f11_by_span, &overlapping, n_lookups, &lookup_word10, + &lookup_word11, &class_codes, ot_count, ); assert!( - rebinds.is_empty(), + dropped.is_empty(), "a dynamic owner whose only disqualifier is the non-stock slot must \ - not be re-bound (the STOCK-slot guard is the sole gate here): {rebinds:?}" + not be dropped (the STOCK-slot guard is the sole gate here): {dropped:?}" ); } /// A standalone descriptor whose forward link `word[10]` points at Time - /// (OT 0) has no valid evaluated-output OT; it must NOT be re-bound (Time is + /// (OT 0) has no valid evaluated-output OT; it must NOT be dropped (Time is /// never a data owner). Guards the `Ref Global Emissions ... LOOKUP` case. #[test] - fn standalone_descriptor_with_time_forward_link_is_not_rebound() { + fn standalone_descriptor_with_time_forward_link_is_not_dropped() { let class_codes = [ VDF_SECTION6_OT_CODE_TIME, VDF_SECTION6_OT_CODE_STOCK, // OT 1: ghost stock slot @@ -656,27 +706,29 @@ mod standalone_descriptor_tests { // lookup record[1].word[10] == 0 -> Time, an invalid evaluated output. let lookup_word10 = [9usize, 0usize]; let n_lookups = lookup_word10.len(); + let lookup_word11 = vec![1usize; n_lookups]; let spans = [span(0, "Ref graph LOOKUP", 1)]; let f11_by_span = [1u32]; let overlapping: HashSet = HashSet::new(); - let rebinds = standalone_descriptor_rebinds( + let dropped = standalone_lookup_only_descriptors( &spans, &f11_by_span, &overlapping, n_lookups, &lookup_word10, + &lookup_word11, &class_codes, ot_count, ); assert!( - rebinds.is_empty(), - "a Time forward-link must not be re-bound: {rebinds:?}" + dropped.is_empty(), + "a Time forward-link must not be dropped: {dropped:?}" ); } /// An OVERLAPPING descriptor (already handled by the connected-component - /// peeling path) must NOT be re-bound by the standalone path -- it is the + /// peeling path) must NOT be dropped by the standalone path -- it is the /// existing path's responsibility to drop it in favor of its colliding /// consumer owner. #[test] @@ -685,63 +737,118 @@ mod standalone_descriptor_tests { let ot_count = class_codes.len(); let lookup_word10 = [9usize, 1usize]; let n_lookups = lookup_word10.len(); + let lookup_word11 = vec![1usize; n_lookups]; let spans = [span(0, "Overlapping graph", 1)]; let f11_by_span = [1u32]; // Mark span 0 as overlapping. let mut overlapping: HashSet = HashSet::new(); overlapping.insert(0); - let rebinds = standalone_descriptor_rebinds( + let dropped = standalone_lookup_only_descriptors( + &spans, + &f11_by_span, + &overlapping, + n_lookups, + &lookup_word10, + &lookup_word11, + &class_codes, + ot_count, + ); + assert!( + dropped.is_empty(), + "an overlapping descriptor must be left to the component path: {dropped:?}" + ); + } + + /// An ARRAYED standalone descriptor (span length > 1) IS dropped to its + /// forward OT-block start when the forward link's output width + /// (`word[11]`) equals the descriptor's element count -- the clean case + /// where the forward block is this variable's own per-element series. The + /// caller emits the block with section-3 element labels. Mirrors the + /// `Ref.vdf` `historical_*_lookup` / `rs_ch4` (COP, 7) bases. + #[test] + fn arrayed_standalone_descriptor_is_dropped_when_width_matches() { + // OT layout: 0=Time, [1,4) = 3 dynamic owners (the forward block the + // descriptor should resolve to), [4,7) = 3 stock GHOST slots its + // f[11]-as-OT-start spuriously covers. + let class_codes = [ + VDF_SECTION6_OT_CODE_TIME, + VDF_SECTION6_OT_CODE_DYNAMIC, + VDF_SECTION6_OT_CODE_DYNAMIC, + VDF_SECTION6_OT_CODE_DYNAMIC, + VDF_SECTION6_OT_CODE_STOCK, + VDF_SECTION6_OT_CODE_STOCK, + VDF_SECTION6_OT_CODE_STOCK, + ]; + let ot_count = class_codes.len(); + // lookup record[1]: word[10] == 1 (forward block start), word[11] == 3 + // (output width == the descriptor's 3 elements). + let lookup_word10 = [9usize, 1usize]; + let lookup_word11 = [0usize, 3usize]; + let n_lookups = lookup_word10.len(); + // A 3-element arrayed descriptor over the stock ghost span [4,7). + let spans = [span_with_len(0, "RS arrayed graph", 4, 3)]; + let f11_by_span = [1u32]; + let overlapping: HashSet = HashSet::new(); + + let dropped = standalone_lookup_only_descriptors( &spans, &f11_by_span, &overlapping, n_lookups, &lookup_word10, + &lookup_word11, &class_codes, ot_count, ); + // Recognised as a lookup-only table and dropped. assert!( - rebinds.is_empty(), - "an overlapping descriptor must be left to the component path: {rebinds:?}" + dropped.contains(&0), + "arrayed lookup-only descriptor must be dropped" ); } - /// An ARRAYED descriptor (span length > 1) must NOT be re-bound by the - /// scalar standalone path: re-binding it to a single forward OT would - /// scalarize and lose its element columns. The arrayed lookup-only case is - /// deferred (it needs element-order info the VDF format does not store on - /// disk). This guards the deferred C-LEARN `rs_*` / `historical_*_lookup` - /// arrayed descriptors against accidental scalarization. + /// An arrayed descriptor whose forward link points at a WIDER shared + /// consumer (`word[11] != span_len`) must NOT be dropped: the forward + /// block is not this variable's own series. Pins the width gate as the + /// sole disqualifier here -- every other precondition passes. Mirrors the + /// `Ref.vdf` `rs_hfc*` family, where eight 7-element descriptors all + /// forward-link to one 63-wide consumer block. #[test] - fn arrayed_standalone_descriptor_is_not_rebound() { - // OT layout: 0=Time, 1=dynamic (forward link), 2..5 = stock ghost span. + fn arrayed_standalone_descriptor_with_wider_shared_consumer_is_not_dropped() { let class_codes = [ VDF_SECTION6_OT_CODE_TIME, VDF_SECTION6_OT_CODE_DYNAMIC, + VDF_SECTION6_OT_CODE_DYNAMIC, + VDF_SECTION6_OT_CODE_DYNAMIC, VDF_SECTION6_OT_CODE_STOCK, VDF_SECTION6_OT_CODE_STOCK, VDF_SECTION6_OT_CODE_STOCK, ]; let ot_count = class_codes.len(); + // Forward block start (OT 1) and an in-range owner block [1,4) are both + // fine; ONLY the width mismatch (5 != 3) rejects the drop. let lookup_word10 = [9usize, 1usize]; + let lookup_word11 = [0usize, 5usize]; let n_lookups = lookup_word10.len(); - // A 3-element arrayed descriptor (start 2, len 3), f[11] == 1. - let spans = [span_with_len(0, "RS arrayed graph", 2, 3)]; + let spans = [span_with_len(0, "RS HFC arrayed graph", 4, 3)]; let f11_by_span = [1u32]; let overlapping: HashSet = HashSet::new(); - let rebinds = standalone_descriptor_rebinds( + let dropped = standalone_lookup_only_descriptors( &spans, &f11_by_span, &overlapping, n_lookups, &lookup_word10, + &lookup_word11, &class_codes, ot_count, ); assert!( - rebinds.is_empty(), - "an arrayed descriptor must not be scalar-re-bound: {rebinds:?}" + dropped.is_empty(), + "an arrayed descriptor with a wider shared consumer must not be \ + dropped (the width gate is the sole disqualifier here): {dropped:?}" ); } } @@ -959,57 +1066,15 @@ pub(super) fn build_record_result_columns( } } - // Emit standalone-descriptor re-binds last, each as one scalar column at - // its forward-link evaluated-output OT. These deliberately do NOT consult - // `claimed_ot`: the forward OT is frequently a *consumer* OT already owned - // by another variable (Vensim stores one evaluated series consumed under - // several names), and the lookup-only variable legitimately shares it. The - // names are distinct, so this adds a distinct column rather than a - // duplicate. Determinism: iterate in `rec_idx` order so a name that maps to - // multiple descriptor records resolves to the lowest `rec_idx` consistently. - emit_descriptor_rebinds(&spans, &desc_id.rebinds, &mut ordered); + // Standalone lookup-only descriptors are graphical-function tables, not + // saved series, so `identify_descriptor_records` puts them in + // `descriptor_indices` and they are dropped above (never emitted). Their + // values, where they matter, are carried by the consumer variables that + // call them, which appear here as ordinary owners under their own names. ordered } -/// Emit re-bound standalone descriptors as scalar columns at their forward OT. -/// -/// One column per distinct canonical name (lowest `rec_idx` wins on the rare -/// duplicate-name case), pushed onto `ordered`. Skips a name already present -/// in `ordered` so a re-bind never shadows a real owner column of the same -/// canonical identity. -fn emit_descriptor_rebinds( - spans: &[DecodedRecordSpan], - rebinds: &HashMap, - ordered: &mut Vec<(Ident, usize)>, -) { - let span_by_rec: HashMap = - spans.iter().map(|s| (s.rec_idx, s)).collect(); - let already: HashSet> = ordered.iter().map(|(id, _)| id.clone()).collect(); - - let mut entries: Vec<(usize, usize)> = rebinds - .iter() - .map(|(&rec_idx, &fwd_ot)| (rec_idx, fwd_ot)) - .collect(); - entries.sort_unstable(); - - let mut emitted: HashSet> = HashSet::new(); - for (rec_idx, fwd_ot) in entries { - let Some(span) = span_by_rec.get(&rec_idx) else { - continue; - }; - let key = if span.name.starts_with('#') { - Ident::::from_str_unchecked(&span.name) - } else { - Ident::::new(&span.name) - }; - if already.contains(&key) || !emitted.insert(key.clone()) { - continue; - } - ordered.push((key, fwd_ot)); - } -} - /// Append one owner span's columns to `ordered`, marking the OT slots in /// `claimed_ot`. The span has already been validated as an owner record /// (post descriptor identification). Element labels are resolved via diff --git a/src/simlin-engine/tests/simulate.rs b/src/simlin-engine/tests/simulate.rs index 7f950e1bc..d913e9116 100644 --- a/src/simlin-engine/tests/simulate.rs +++ b/src/simlin-engine/tests/simulate.rs @@ -1752,35 +1752,39 @@ const EXPECTED_VDF_RESIDUAL: &[&str] = &[ // value on a meaningful magnitude. (#595 tracks the remaining deeper init- // ordering soundness gap; its nondeterminism half is fixed by `e24b0080`.) - // ===== VDF-decode-artifact (17): engine output proven CORRECT; the ===== - // `Ref.vdf` REFERENCE column is mis-decoded by our reader. For every base - // here the engine's `gf(Time)` matches the model tables and applied - // consumers match `Ref.vdf` exactly; the divergence is entirely in the - // reference-side decode of standalone graphical-function ("lookup-only") - // descriptor columns. The SCALAR half was fixed in `d69754bc`; the ARRAYED - // element-order half and the 4 scalar no-distinct-column cases are tracked - // in #597 (which supersedes #590's "engine 0+0" framing -- the engine is - // correct, the reader is not). - // - // -- Arrayed descriptor block, element-order mis-decode (#597) -- - "historical_forestry_lookup", // VDF col mis-decoded; engine gf(Time) correct. - "historical_gdp_lookup", // e.g. [g77_india] vdf 2.6e-4 vs engine 1.43e5 (correct). - "rs_ch4", // VDF descriptor col decodes to all-zero ghost; engine correct. - "rs_co2_ff", // arrayed lookup-only; VDF col mis-decoded, engine correct. - "rs_gdp_in_trillions", // arrayed lookup-only; VDF col mis-decoded, engine correct. - "rs_hfc125", // arrayed lookup-only; VDF col mis-decoded, engine correct. - "rs_hfc134a", // arrayed lookup-only; VDF col mis-decoded, engine correct. - "rs_hfc143a", // arrayed lookup-only; VDF col mis-decoded, engine correct. - "rs_hfc152a", // arrayed lookup-only; VDF col mis-decoded, engine correct. - "rs_hfc227ea", // arrayed lookup-only; VDF col mis-decoded, engine correct. - "rs_hfc23", // arrayed lookup-only; VDF col mis-decoded, engine correct. - "rs_hfc245ca", // arrayed lookup-only; VDF col mis-decoded, engine correct. - "rs_hfc32", // arrayed lookup-only; VDF col mis-decoded, engine correct. - // -- Scalar descriptor with NO distinct saved VDF column (#597) -- - "ref_global_emissions_from_graph_lookup", // descriptor forward-links to Time; no distinct col. - "ozone_precursor_forcings", // forward-links to a shared "other forcings" OT (>1% off). - "oc,_bc,_and_bio_aerosol_forcings", // forward-links to the same shared OT; no distinct col. - "other_forcings_smooth_plus_rcp85", // shared-OT forward link; only 131/251 cells match. + // ===== lookup-only tables the model-free reader cannot drop (9) ===== + // A bare graphical function is a *table indexed by an explicit input*, not a + // time series. The reader therefore DROPS standalone lookup-only descriptors + // rather than reconstruct a series for them + // (`record_results::standalone_lookup_only_descriptors`); most of C-LEARN's + // lookup-only variables -- `historical_gdp_lookup`, `historical_forestry_lookup`, + // `rs_gdp_in_trillions`, `rs_ch4`, `rs_co2_ff`, the ozone/forcings scalars -- + // are now dropped and gone from this list (their data, where it matters, is + // carried by the consumer variables that call them, which are emitted as + // ordinary owners). The 9 below are lookup-only tables the model-free reader + // cannot safely distinguish from a real owner, so it still emits a ghost + // column the comparator flags: + // - rs_hfc* (8): the descriptor forward-links to the WIDER 2-D consumer + // `RS HFC[COP, HFC type]` (forward width 63 != the descriptor's 7), so the + // conservative width gate declines to drop it. + // - ref_global_emissions_from_graph_lookup: its forward link is Time/0, so + // the forward-link guard declines to drop it. + // The deeper bug is in the ENGINE: a bare lookup is a table, and lowering it + // to `gf(Time)` synthesises a phantom series that the comparator then flags + // against these ghosts. Fixing that (so the engine produces no series for a + // bare lookup) removes them from the matched set entirely -- tracked by #606 + // (the engine `gf(Time)` lowering, introduced for #590). #597 is the general + // reader fix (drop lookup-only descriptors); these nine evade it only because + // the model-free reader can't safely distinguish them from real owners. + "rs_hfc125", + "rs_hfc134a", + "rs_hfc143a", + "rs_hfc152a", + "rs_hfc227ea", + "rs_hfc23", + "rs_hfc245ca", + "rs_hfc32", + "ref_global_emissions_from_graph_lookup", // ===== benign-near-zero (2): divergence ONLY on near-zero magnitudes ===== // (cross-simulator f32/integration noise), never on a meaningful value. // Tracked in #591 cluster 2. @@ -1798,7 +1802,10 @@ const EXPECTED_VDF_RESIDUAL: &[&str] = &[ // Vensim computes `-2*NA = -1.298e33` while Simlin keeps the bare `:NA:` // sentinel `-6.49e32` (`crate::float::NA`); NA-arithmetic is confirmed // CORRECT and explicitly out of scope (see `float::NA`). The genuine - // documented remainder, tracked in #591 cluster 1. + // documented remainder, tracked in #591 cluster 1. (`historical_gdp_lookup` + // -- the lookup table whose only divergence was this same -2*NA tail -- is + // now dropped as a lookup-only table, so only the genuine consumer + // `historical_gdp` remains here.) "historical_gdp", // non-:NA: cells match exactly (e.g. [oecd_us] 6.667e4); only -2*NA cells diverge. "last_set_target_year", // INIT(VECTOR SELECT(...)); every cell is -2*NA (vdf -1.298e33 vs sim -6.49e32 sentinel). ]; @@ -1823,17 +1830,20 @@ const EXPECTED_VDF_RESIDUAL: &[&str] = &[ // `DoesNotExist` blockers (incl. the `"goal 1.5 for temperature"` quoted- // period ident, #559) are likewise cleared. // -// What remains is a short, fully-attributed residual of 21 base variables -// (Phases 1-3 reconciled the rest), explicitly excluded via -// `EXPECTED_VDF_RESIDUAL` under a five-category taxonomy: 17 VDF-reader -// decode-artifacts where the engine is PROVEN correct (#597, supersedes #590's -// "engine 0+0" framing -- the #590 engine bug is fixed), 2 benign-near-zero, -// and 2 `:NA:`-arithmetic boundary cells (#591); the engine-genuine and -// NaN-vs-`:NA:` categories are empty (the sole engine-genuine divergence, init- -// runlist nondeterminism, was fixed in `e24b0080`). The exclusion is a -// transparent, documented, tracked carve-out -- NOT a tolerance loosening; the -// hard 1% gate holds for every non-excluded variable and the matched floor is -// checked after exclusion. Run with: cargo test --release -- --ignored simulates_clearn +// What remains is a short, fully-attributed residual of 13 base variables +// (Phases 1-3 reconciled the rest; the reader now DROPS standalone lookup-only +// descriptors -- bare graphical-function tables, not series -- so most of them +// leave the comparison entirely), explicitly excluded via `EXPECTED_VDF_RESIDUAL` +// under a taxonomy: 9 lookup-only tables the model-free reader cannot safely +// distinguish from a real owner and so still emits a ghost column for (the +// engine's `gf(Time)` lowering of a bare lookup is the deeper bug, tracked by #606); +// 2 benign-near-zero; and 2 `:NA:`-arithmetic boundary series (#591). The engine +// is PROVEN correct on all of them; the engine-genuine and NaN-vs-`:NA:` +// categories are empty (the sole engine-genuine divergence, init-runlist +// nondeterminism, was fixed in `e24b0080`). The exclusion is a transparent, +// documented, tracked carve-out -- NOT a tolerance loosening; the hard 1% gate +// holds for every non-excluded variable and the matched floor is checked after +// exclusion. Run with: cargo test --release -- --ignored simulates_clearn #[test] #[ignore] fn simulates_clearn() { diff --git a/src/simlin-engine/tests/vdf_alias_decoder.rs b/src/simlin-engine/tests/vdf_alias_decoder.rs index 33edb19eb..e48fbfcdb 100644 --- a/src/simlin-engine/tests/vdf_alias_decoder.rs +++ b/src/simlin-engine/tests/vdf_alias_decoder.rs @@ -295,16 +295,17 @@ fn old_style_alias_to_output_sig_pair_by_file_order() { } /// WRLD3 SCEN01's MDL-declared stdlib-call aliases and its `#FUNC(...)#` -/// output signatures do NOT pair 1:1. MDL has more aliases than the VDF -/// has old-style output sigs -- this pins the upper-bound nature of the -/// pairing claim so a future regression doesn't silently assume 1:1. +/// output signatures agree in count (12 each) once the full name table is +/// recovered. /// -/// The root cause is not yet fully characterised: some aliases may map -/// to SAMPLE/PREVIOUS/INIT forms that do not emit a `#FUNC(args)#` -/// signature, and some may share a sig via module reuse. This test -/// documents the observed mismatch rather than enforcing equality. +/// This previously looked like a non-1:1 mismatch (MDL 12 aliases vs VDF 8 +/// sigs), attributed speculatively to SAMPLE/PREVIOUS/INIT forms or module +/// reuse. The real cause was the name-parser stopping at the first stale +/// entry, truncating the table and hiding 4 of the output signatures (GH +/// #470/#549). With the skip fix the complete table recovers all 12, and the +/// mismatch disappears -- so the counts match exactly. #[test] -fn wrld3_scen01_alias_to_sig_pairing_is_not_1to1() { +fn wrld3_scen01_alias_and_output_sig_counts_agree() { let vdf_path = "../../test/metasd/WRLD3-03/SCEN01.VDF"; let mdl_path = "../../test/metasd/WRLD3-03/wrld3-03.mdl"; if !Path::new(vdf_path).exists() || !Path::new(mdl_path).exists() { @@ -312,12 +313,6 @@ fn wrld3_scen01_alias_to_sig_pairing_is_not_1to1() { } let vdf = parse_vdf(vdf_path); let output_sigs = vdf.output_signatures(); - // Observed: 8 old-style `#FUNC(args)#` output sigs on SCEN01. - assert!( - output_sigs.len() >= 6 && output_sigs.len() <= 10, - "SCEN01: output_sigs count pinned near 8, got {}", - output_sigs.len() - ); let datamodel_project = load_mdl(mdl_path); let model = datamodel_project.models.first().unwrap(); @@ -336,11 +331,12 @@ fn wrld3_scen01_alias_to_sig_pairing_is_not_1to1() { }) .filter(|text| equation_starts_with_stdlib_call(text)) .count(); - assert!( - mdl_alias_count > output_sigs.len(), - "WRLD3 SCEN01: MDL has {mdl_alias_count} stdlib-call aliases but \ - VDF only emits {} output sigs; the 1:1 pairing claim is an upper \ - bound, not a guarantee", + assert_eq!( + mdl_alias_count, + output_sigs.len(), + "WRLD3 SCEN01: the MDL's {mdl_alias_count} stdlib-call aliases should \ + match the old-style output sigs recovered from the complete name \ + table (got {})", output_sigs.len() ); } diff --git a/src/simlin-engine/tests/vdf_multidim.rs b/src/simlin-engine/tests/vdf_multidim.rs index b588f486a..f090c9120 100644 --- a/src/simlin-engine/tests/vdf_multidim.rs +++ b/src/simlin-engine/tests/vdf_multidim.rs @@ -124,6 +124,54 @@ fn ref_vdf_record_results_use_section3_axis_refs_for_array_labels() { } } +#[test] +fn ref_vdf_arrayed_lookup_only_descriptor_is_dropped() { + // `historical_gdp_lookup` / `rs_ch4` are arrayed (COP, 7) standalone + // graphical-function ("lookup-only") variables. A bare lookup is a *table*, + // not a time series: Vensim saves no series for it, only a descriptor + // record whose `f[11]` is a section-6 lookup-record index (not an OT start). + // The reader must therefore DROP it -- like an overlapping descriptor -- + // rather than reconstruct a series at its `f[11]`-as-OT-start stock ghost + // block or at a forward-linked consumer. The data, where it matters, is + // carried by the consumer that actually calls the table (e.g. + // `Historical GDP[COP] = IF Time<=cutoff THEN Historical GDP LOOKUP(...) ELSE :NA:`, + // `CH4 anthro emissions[COP] = ... RS CH4(Time/One year) * ...`), which is a + // normal owner the reader emits under its own name. See GH #597. + let ref_vdf = load_ref_vdf(); + let results = ref_vdf + .to_results_via_records() + .expect("record-based mapping should produce Ref.vdf columns"); + let vdf_data = ref_vdf.extract_data().unwrap(); + + // The consumers (real owners) are still emitted under their own names, in + // declared COP order, carrying the data blocks the lookups feed. + for (name, ot) in [ + ("Historical GDP[OECD US]", 1824), + ("Historical GDP[COP Developing B]", 1830), + ("CH4 anthro emissions[OECD US]", 911), + ("CH4 anthro emissions[COP Developing B]", 917), + ] { + assert_result_column_matches_ot(&results, &vdf_data, name, ot); + } + + // The lookup-only variables themselves are dropped: neither a numeric ghost + // column nor a labeled series is emitted for them. + for absent in [ + "historical_gdp_lookup[OECD US]", + "historical_gdp_lookup[COP Developing B]", + "historical_gdp_lookup[0]", + "rs_ch4[OECD US]", + "rs_ch4[0]", + ] { + assert!( + !results + .offsets + .contains_key(&Ident::::new(absent)), + "lookup-only variable must be dropped, not emitted as a series: {absent}" + ); + } +} + #[test] fn ref_vdf_record_results_do_not_emit_numeric_array_fallbacks() { let ref_vdf = load_ref_vdf(); diff --git a/src/simlin-engine/tests/vdf_structural_invariants.rs b/src/simlin-engine/tests/vdf_structural_invariants.rs index 02d486388..e3290bd5c 100644 --- a/src/simlin-engine/tests/vdf_structural_invariants.rs +++ b/src/simlin-engine/tests/vdf_structural_invariants.rs @@ -287,64 +287,27 @@ fn test_block1_word10_high_equals_word11_across_corpus() { ); } -/// Whether the Rust `find_slot_table` scanner is known to under-count -/// the slot table on this fixture. The Python `tools/vdf_xray.py` parser -/// consumes a broader leading-extra-slot layout and returns a larger -/// slot count that matches `block1[7]`; the Rust scanner uses a stricter -/// `min_stride >= 4` rule and misses a chunk of entries on these -/// specific files. The Python-observed delta (per the memory-regions -/// audit) is 0 on risk2 and 1 on SCEN01; the metasd -/// `social-network-valuation/optimistic.vdf` and `pessimistic.vdf` -/// fixtures show the same class of Rust under-count at a much larger -/// magnitude (delta -23), tracked in GH #549. The Rust under-count is -/// orthogonal to the format invariant tested here, so we explicitly -/// exempt those fixtures and track the parser gap separately. -fn rust_slot_table_undercount_known(path: &Path) -> bool { - let file_name = path.file_name().and_then(|n| n.to_str()).unwrap_or(""); - let parent = path - .parent() - .and_then(|p| p.file_name()) - .and_then(|n| n.to_str()) - .unwrap_or(""); - matches!( - (parent, file_name), - ("econ", "risk2.vdf") - | ("WRLD3-03", "SCEN01.VDF") - | ("social-network-valuation", "optimistic.vdf") - | ("social-network-valuation", "pessimistic.vdf") - ) -} - -/// Pin `block1[7]` as an "actively-written slot count" diagnostic that -/// is always within 2 of `slot_table.len()` on the `test/` corpus of -/// simulation VDFs. -/// -/// This is *not* a pinned equality: the investigation found deltas of 0, -/// 1, and 2 across the corpus. The bound is diagnostic only, but having -/// it executable means any regression that breaks the relationship shows -/// up immediately. +/// Pin the deterministic slot-table decode against `block1[7]` across the +/// `test/` corpus of simulation VDFs: `slot_table.len()` must equal `block1[7]` +/// **exactly**, and the table must start at the section-1 `field1` word +/// pointer. /// -/// Scope: the assertion runs across the `test/bobby/vdf`, -/// `test/metasd`, and `test/xmutil_test_models` roots. The zambaqui -/// third-party corpus is excluded from this test -- the audit notes -/// `zambaqui/old runs/Pop-6.vdf` has delta=-1, and additional zambaqui -/// files trigger a deeper Rust slot-finder under-count. Those -/// discrepancies indicate a Rust parser gap, not a format-level -/// violation. +/// The slot table is decoded structurally (start = section-1 `field1` word +/// pointer, count = `block1[7]`, followed by a `0x00430000` terminator word +/// before the name table -- see `vdf::slot_table_from_header`), so this is now +/// a pinned equality rather than the old +/-2 diagnostic window. The previous +/// heuristic scanner under-counted on edited files (because the name parser +/// stopped at the first stale entry) and over-counted by one elsewhere; +/// `econ/risk2.vdf`, `WRLD3-03/SCEN01.VDF`, and the metasd +/// `social-network-valuation` pair (GH #470/#549) used to be exempted and now +/// decode exactly. /// -/// A small set of in-scope fixtures is exempted: Rust's slot-table -/// scanner is known to return a short count on `econ/risk2.vdf` and -/// `WRLD3-03/SCEN01.VDF` because its `min_stride >= 4` rule rejects -/// valid leading-extra-slot layouts that the Python `vdf_xray` parser -/// accepts; on those two the format-level invariant still holds -/// (block1[7] matches the Python-observed slot count). The metasd -/// `social-network-valuation` `optimistic.vdf` / `pessimistic.vdf` pair -/// is exempted for the same class of Rust under-count at a larger -/// magnitude (delta -23), tracked in GH #549. We flag these Rust parser -/// discrepancies separately rather than hide them behind a relaxed -/// bound. +/// Scope: `test/bobby/vdf`, `test/metasd`, and `test/xmutil_test_models`. The +/// relationship was verified to hold on every run-file VDF across `test/` and +/// `third_party/`, but the third-party corpus is excluded here to keep the +/// test hermetic. #[test] -fn test_block1_word7_matches_slot_count_within_small_delta() { +fn test_slot_count_matches_block1_word7_exactly() { const ROOTS: &[&str] = &[ "../../test/bobby/vdf", "../../test/metasd", @@ -352,8 +315,6 @@ fn test_block1_word7_matches_slot_count_within_small_delta() { ]; let mut checked = 0usize; - let mut exempted = 0usize; - let mut observed_deltas: std::collections::BTreeSet = std::collections::BTreeSet::new(); for root in ROOTS { let root_path = Path::new(root); if !root_path.exists() { @@ -372,21 +333,22 @@ fn test_block1_word7_matches_slot_count_within_small_delta() { let Some(block1) = read_sec1_block1(&vdf) else { continue; }; - let block1_word7 = block1[7] as i64; - let slot_count = vdf.slot_table.len() as i64; - let delta = slot_count - block1_word7; - if rust_slot_table_undercount_known(&path) { - exempted += 1; - continue; - } - observed_deltas.insert(delta); - assert!( - delta.abs() <= 2, - "{}: |slot_count - block1[7]| must be <= 2, got slot_count={} block1[7]={} delta={}", - path.display(), - slot_count, + let block1_word7 = block1[7] as usize; + assert_eq!( + vdf.slot_table.len(), block1_word7, - delta, + "{}: slot_table.len() must equal block1[7] exactly", + path.display(), + ); + // The deterministic decode must land on the section-1 field1 word + // pointer (1-based). A non-zero count guarantees field1 != 0. + let sec1 = &vdf.sections[1]; + let field1_start = sec1.file_offset + 4 * (sec1.field1 as usize - 1); + assert_eq!( + vdf.slot_table_offset, + field1_start, + "{}: slot table must start at the field1 word pointer", + path.display(), ); checked += 1; } @@ -396,21 +358,6 @@ fn test_block1_word7_matches_slot_count_within_small_delta() { checked >= 30, "expected to cross-check at least 30 simulation fixtures, got {checked}" ); - // Exempted fixtures should still be present in the corpus so the - // exemption list does not silently become a no-op. - assert!( - exempted >= 1, - "expected at least one exempted fixture to still be tracked; found {exempted}" - ); - - // Record the observed delta set as part of the test output so that a - // regression that shifts the distribution is visible without masking - // it behind a strict equality assertion. - let deltas: Vec = observed_deltas.into_iter().collect(); - assert!( - deltas.iter().all(|d| d.abs() <= 2), - "observed deltas outside the [-2, 2] diagnostic window: {deltas:?}", - ); } /// Assert that every simulation-result VDF fixture carries at least one