engine: VM vs wasm diverge on getValue after run_to(target) past FINAL_TIME #634

@bpowers

Summary

After run_to(target) is called with target strictly greater than FINAL_TIME, the bytecode VM and the wasm simulation blob leave different live "current" state. This is a supported, FFI-reachable clamp case (simlin_sim_run_to forwards time unclamped). As a result, a subsequent mid-run getValue / get_value_now diverges between the two backends for both stocks and non-stock variables.

This is the narrower residual left after #632 (which closed #625): #632 fixed the common mid-interval getValue parity and fixed a panic in this past-end path, but deliberately left the past-FINAL_TIME case divergent.

Verified mechanism

VM (src/simlin-engine/src/vm.rs run_to): there is no slab-full guard. The chunk ring holds n_chunks saved rows + 2 working chunks, so the loop steps ~1-2 chunks past FINAL_TIME before the chunk-ring exhaustion break fires in save_advance! (the if self.next_chunk + 1 >= n_chunks + 2 { break; } arm, which sets curr_chunk = next_chunk before breaking). This leaves curr on the overshoot working chunk, with stock values integrated past FINAL_TIME and flow/aux slots stale.

The post-loop flow re-eval (added in #632 for #625) is deliberately skipped in this exhausted-slab case -- it is guarded on curr_chunk != next_chunk (vm.rs:879), because borrow_two with two equal chunk indices would slice out of bounds and panic. See the explanatory comment at vm.rs:871-878 and the existing regression test run_to_past_final_time_does_not_panic (vm.rs:3796), which only asserts no-panic, not parity.

wasm blob (src/simlin-engine/src/wasmgen/module.rs emit_run_to): has a top-of-loop saved >= n_chunks guard (added in #630), so it clamps the run at the slab end. curr rests at the consistent FINAL_TIME state (stocks @ stop, flow/aux @ stop).
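The divergence can be reproduced in a toy model. The sketch below is not the engine's code: State, eval_flow, N_ROWS and the clamp_at_slab_end flag are all hypothetical names, the model is a single stock with a level-dependent inflow under Euler integration, and the placement of the clamp is simplified relative to the real top-of-loop guard. It only aims to show how "break after overshooting, skip the re-eval" and "break at the slab end" leave different resting states.

```rust
// Toy model only; none of these names exist in simlin-engine.
#[derive(Debug, Clone, Copy, PartialEq)]
struct State {
    time: f64,
    stock: f64,
    flow: f64,
}

const DT: f64 = 1.0;
// Rows are saved for t = 0, 1, 2, so FINAL_TIME is 2.0 in this toy.
const N_ROWS: usize = 3;

fn eval_flow(stock: f64) -> f64 {
    0.5 * stock
}

fn run_to(target: f64, clamp_at_slab_end: bool) -> State {
    let mut curr = State { time: 0.0, stock: 100.0, flow: 0.0 };
    let mut saved = 0usize;
    let mut exhausted = false;
    while curr.time < target {
        curr.flow = eval_flow(curr.stock);
        // The next row carries the integrated stock but a copy of the old flow.
        let next = State {
            time: curr.time + DT,
            stock: curr.stock + curr.flow * DT,
            flow: curr.flow,
        };
        if saved == N_ROWS {
            // Unclamped path: the slab is already full, so move onto the
            // overshoot row and give up without re-evaluating flows.
            curr = next;
            exhausted = true;
            break;
        }
        saved += 1; // "save" the current row
        if clamp_at_slab_end && saved == N_ROWS {
            break; // clamped path: rest on the last saved row
        }
        curr = next;
    }
    if !exhausted {
        // Post-loop flow re-eval, skipped in the exhausted case.
        curr.flow = eval_flow(curr.stock);
    }
    curr
}
```

In this toy, run_to(10.0, true) rests at time 2.0 with stock 225.0 and flow 112.5, while run_to(10.0, false) rests at time 4.0 with stock 506.25 and a stale flow of 168.75. For targets at or below 2.0 the two variants return the same state.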

Consequence

getValue(var) after runTo(target > stop):

  • VM returns the overshoot/stale state (stocks integrated past FINAL_TIME, flow/aux stale).
  • wasm returns the clamped, self-consistent FINAL_TIME state.

The two backends disagree for both stocks and non-stock variables.

Why it matters

The wasm backend exists for interactive scrubbing, and #632 established a contract that mid-run getValue is byte-identical VM-vs-wasm for every variable. This past-end path is the one remaining hole in that contract: a caller that does an out-of-range run_to through the FFI gets a backend-dependent answer.

Severity is low: it is not reachable via normal interactive scrubbing (the slider range is [start, stop]); only via an explicit out-of-range run_to through the FFI (simlin_sim_run_to forwards time unclamped).

Component(s) affected

  • src/simlin-engine/src/vm.rs -- run_to (the chunk-exhaustion break in save_advance!, and the post-loop flow re-eval guarded on curr_chunk != next_chunk at :879)
  • src/simlin-engine/src/wasmgen/module.rs -- emit_run_to (the saved >= n_chunks top guard)

Suggested fix

Make the VM's run_to clamp at the slab boundary like the wasm's #630 top guard -- break before stepping past a full slab, so curr rests at the last consistent saved (FINAL_TIME) row, matching the wasm. With that change the VM's post-loop flow re-eval would run (curr_chunk != next_chunk would hold) and the two backends would agree for past-end targets too. The existing run_to_past_final_time_does_not_panic test should be widened to assert VM/wasm curr parity for this case.
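A widened test could compare live values bit-for-bit across the two backends. The following is a minimal sketch under stated assumptions: the SimBackend trait, parity_mismatches and the Toy stand-in backend are hypothetical and do not reflect the real VM or wasm APIs in simlin-engine. Only the comparison via f64::to_bits is meant to carry over, since the contract from #632 is byte-identical values.

```rust
/// Hypothetical minimal view of a backend; the real engine APIs differ.
trait SimBackend {
    fn run_to(&mut self, target: f64);
    fn get_value(&self, var: usize) -> f64;
}

/// Runs both backends to `target` and returns the indices of variables
/// whose live values are not bit-identical.
fn parity_mismatches(
    a: &mut dyn SimBackend,
    b: &mut dyn SimBackend,
    target: f64,
    n_vars: usize,
) -> Vec<usize> {
    a.run_to(target);
    b.run_to(target);
    (0..n_vars)
        .filter(|&i| a.get_value(i).to_bits() != b.get_value(i).to_bits())
        .collect()
}

/// Stand-in backend for illustration: variable 0 is time, variable 1 is
/// a value derived from time. `clamp` selects whether run_to stops at
/// `final_time` or follows the target past it.
struct Toy {
    final_time: f64,
    clamp: bool,
    time: f64,
}

impl SimBackend for Toy {
    fn run_to(&mut self, target: f64) {
        self.time = if self.clamp {
            target.min(self.final_time)
        } else {
            target
        };
    }

    fn get_value(&self, var: usize) -> f64 {
        match var {
            0 => self.time,
            _ => 10.0 * self.time,
        }
    }
}
```

With one clamping and one non-clamping Toy, a past-end target reports both variables as mismatched, and an in-range target reports none. Comparing to_bits is stricter than ==: it distinguishes 0.0 from -0.0 and treats NaNs with the same bit pattern as equal.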

How it was discovered

Identified during review of the wasm simulation backend's resumable run_to ABI (branch engine-wasm-sim), tracing why #632's post-loop re-eval is guarded on curr_chunk != next_chunk. Related: #625 (closed by #632), #630 (wasm slab-full clamp), #632 (mid-interval reconciliation + past-end panic fix).
