engine: reconcile mid-run getValue across VM and wasm (#625) #632

Merged
bpowers merged 3 commits into main from engine-midrun-getvalue-parity
May 23, 2026
Conversation

@bpowers
Owner

@bpowers bpowers commented May 23, 2026

Summary

Closes #625. After a runTo(t) that stops mid-interval, the bytecode VM and the wasm blob left different content in their live curr chunk for non-stock variables, so a mid-run getValue(name) agreed between backends only for stocks + reserved time vars. Verified still live on main (post-#630) and actually worse than the issue described -- the VM returns 0 for constants mid-run:

| variable | kind | VM getValue | wasm getValue |
|---|---|---|---|
| teacup_temperature | stock | (agree) | (agree) |
| room_temperature | constant | 0 | 70 |
| characteristic_time | constant | 0 | 10 |
| heat_loss_to_room | flow | 0 | 2.43… (one dt behind) |

Per the chosen resolution (reconcile, not document), this makes mid-run getValue self-consistent and identical across backends for every variable.

Root cause

Both integration loops break on curr[TIME] > target after the save+advance tail. The advance steps time and integrates stocks but does not recompute flow/aux/constants for the advanced time:

  • VM: Euler's wholesale curr.copy_from_slice(next) (and the chunk-ring advance) lands curr on a chunk whose non-stock slots are stale -- 0 for a constant the overshoot chunk never held.
  • wasm: emit_save_advance copies only the stock offsets next -> curr, leaving flow/aux one step behind the advanced time + stocks.

So curr's stocks/time are at the resting point but its non-stock slots are not -- internally inconsistent on both, and divergent between them.
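The lag can be reproduced in a toy Euler loop. This is a minimal sketch, not the engine's code: the slot layout, `eval_flows`, and this `run_to` are hypothetical, and the model (a stock `level` fed by a constant `inflow_rate`, plus an aux `doubled = level * 2`) is chosen only to make the stale slot visible. The `reeval` flag shows what a single post-loop flow evaluation changes.

```rust
// Slot layout for a hypothetical four-variable model (not the engine's real layout).
const TIME: usize = 0;
const LEVEL: usize = 1; // stock
const INFLOW_RATE: usize = 2; // constant
const DOUBLED: usize = 3; // flow-phase aux: level * 2
const N_SLOTS: usize = 4;
const DT: f64 = 1.0;

/// Recompute every non-stock slot from the stocks and time already in `curr`.
fn eval_flows(curr: &mut [f64; N_SLOTS]) {
    curr[INFLOW_RATE] = 2.0;
    curr[DOUBLED] = curr[LEVEL] * 2.0;
}

/// Toy Euler loop that breaks on `curr[TIME] > target` after the advance.
/// With `reeval == false` the non-stock slots are left one step behind.
fn run_to(target: f64, reeval: bool) -> [f64; N_SLOTS] {
    let mut curr = [0.0; N_SLOTS];
    loop {
        eval_flows(&mut curr);
        // (a real engine would save the results row here)
        // Advance: integrate stocks and step time, nothing else.
        curr[LEVEL] += curr[INFLOW_RATE] * DT;
        curr[TIME] += DT;
        if curr[TIME] > target {
            break;
        }
    }
    if reeval {
        // One extra flow evaluation at the resting point.
        eval_flows(&mut curr);
    }
    curr
}
```

Calling `run_to(4.0, false)` rests at time 5 with `level` 10 but `doubled` still 16; `run_to(4.0, true)` rests at the same time and stock with `doubled` 20.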

Fix

After the run_to loop, re-evaluate root flows once at the resting curr on both backends (vm.rs reuses the same StepPart::Flows eval the RK paths already use; wasmgen/module.rs emits one flows(0) call after the loop). The live curr chunk becomes fully self-consistent -- "the value at the current time" -- and byte-identical VM-vs-wasm for every variable.

Well-scoped, by construction:

  • Saved series unaffected: every results row was already committed; the VM's get_series reads chunks [0, curr_chunk), excluding the resting chunk.
  • Full run unaffected: runToEnd reads the last results row, and its break path already leaves a freshly-evaluated curr, so the re-eval is idempotent (a resumed run_to on a full slab stays the #630 no-op).
  • Resume unaffected: a resumed run_to re-evaluates from scratch, and the re-eval does not re-snapshot prev_values, so PREVIOUS still sees the last completed step.

Applies uniformly to Euler/RK2/RK4. No TS/host change -- the host just reads curr.

Tests (TDD)

  • wasmgen/module.rs: mid_run_curr_is_self_consistent_and_matches_vm -- a flow-phase aux doubled = level * 2 exposes the one-step lag; asserts every var's mid-run curr equals the VM's get_value_now, and that doubled == level * 2 at the resting time (20, not the lagged 16). Watched fail first (inflow_rate wasm=2 vs vm=0).
  • wasm-backend.test.ts: AC2.2 widened from stocks+time to every variable. Watched fail first (constant diff of 10) against the pre-fix wasm, green after the rebuild.

Full pre-commit gate passes: cargo test --workspace (incl. the always-on corpus wasm_parity_hook), pnpm -C src/engine test (412), clippy, tsc, pysimlin.

After a runTo(t) that stops mid-interval, both executors' integration
loops break on curr[TIME] > target *after* an advance, so the live curr
chunk held the resting-point stocks and time but stale non-stock slots:
the VM's wholesale curr.copy_from_slice(next) (and chunk-ring advance)
left flow/aux/constants stale -- a mid-run getValue of a constant read 0 --
and the wasm advance copies only the stock offsets, leaving flow/aux one
step behind. A mid-run getValue therefore agreed between backends only for
stocks and the reserved time vars.

Re-evaluate root flows once at the resting curr after the loop, on both
backends, so the live curr chunk is fully self-consistent ("the value at
the current time") and identical VM-vs-wasm for every variable.

The re-eval touches only the live curr chunk: every results row was
already saved (and the VM's get_series reads chunks [0, curr_chunk),
excluding this one), a resumed run_to re-evaluates from scratch, and a
full runToEnd reads the last results row -- so saved series, resume, and
full-run getValue are unchanged. It does not re-snapshot prev_values, so a
resumed run's PREVIOUS still sees the last completed step. The full-run
and reset break paths already leave a freshly-evaluated curr, so the
re-eval is idempotent there (a resumed run_to on a full slab stays a
no-op).

Tests: a Rust mid-run parity + self-consistency test (a flow-phase aux
doubled = level*2 exposes the one-step lag), and the TS AC2.2 parity test
widened from stocks+time to every variable.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 82193345f7

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

// to resume, and to a full run. It does NOT re-snapshot `prev_values`, so a
// resume's `PREVIOUS` still sees the last completed step.
{
let (curr, next) = borrow_two(&mut data, n_slots, self.curr_chunk, self.next_chunk);

[P1] Skip post-loop re-eval when curr and next chunks alias

The new unconditional post-loop borrow_two call can panic when run_to(end) is invoked with end beyond the simulation horizon (a supported clamp case): in save_advance!, the chunk-exhaustion branch sets curr_chunk = next_chunk and breaks before advancing next_chunk, so borrow_two is later called with equal indices and slices out of bounds. This turns a previously successful run_to(very_large_time) into a runtime crash in the VM path.

@claude

claude Bot commented May 23, 2026

Review: mid-run getValue parity (#625)

Verdict: not correct — the new VM post-loop re-eval panics when run_to exits via chunk-ring exhaustion.

[P1] run_to past FINAL_TIME now panics in the VM (borrow_two aliases the same chunk)

src/simlin-engine/src/vm.rs:870-882

The new re-eval block calls borrow_two(&mut data, n_slots, self.curr_chunk, self.next_chunk) unconditionally after the loop. When the loop exits via the chunk-ring exhaustion break in save_advance! (vm.rs:699-705), that branch runs self.curr_chunk = self.next_chunk before break, so it leaves curr_chunk == next_chunk == n_chunks+1. borrow_two with equal indices computes split = (n_chunks+1)*n_slots and then slices left[(n_chunks+1)*n_slots..(n_chunks+2)*n_slots] out of a left of length (n_chunks+1)*n_slots, panicking with "range end index out of range." This is reachable whenever run_to(target) is called with target past FINAL_TIME (e.g. the FFI simlin_sim_run_to forwards time unclamped, and the wasm side's own comment at module.rs:1168-1205 documents "a target past FINAL_TIME" / "interactive scrubbing that stays at the end" as a supported case it clamps to the slab end "exactly like the VM's chunk-ring exhaustion"). Before this commit that path returned Ok gracefully; now the VM aborts while the wasm backend (guarded by saved >= n_chunks and a fixed CURR_BASE) handles it — a new VM/wasm divergence and a panic across the C FFI boundary. Guarding the re-eval on curr_chunk != next_chunk (or otherwise handling the exhausted-slab case) restores the prior behavior.

Minor: the "idempotent for the full-run/run/reset paths because they break with a freshly-evaluated curr already" note (vm.rs:867-869, module.rs:1252-1253) is imprecise: on a full run_to(stop) the resting curr is the chunk that was only ever written as next (stocks + time only), so its flow/aux slots are stale and the re-eval does change them — it is not a no-op there (only a resumed run_to on a full slab is). The observable series is still fine, but the comment's premise does not hold.

@codecov

codecov Bot commented May 23, 2026

Codecov Report

❌ Patch coverage is 70.83333% with 7 lines in your changes missing coverage. Please review.
✅ Project coverage is 83.57%. Comparing base (d1b2492) to head (03ea843).
⚠️ Report is 1 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/simlin-engine/src/vm.rs | 22.22% | 7 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #632      +/-   ##
==========================================
- Coverage   83.58%   83.57%   -0.02%     
==========================================
  Files         275      276       +1     
  Lines       75395    75419      +24     
==========================================
+ Hits        63019    63029      +10     
- Misses      12376    12390      +14     

Both PR #632 reviewers flagged a panic: the post-loop root-flow re-eval
added for #625 called borrow_two(curr_chunk, next_chunk) unconditionally,
but run_to(target) with target past FINAL_TIME exits the loop via the
chunk-ring exhaustion break in save_advance!, which sets
curr_chunk = next_chunk before breaking. borrow_two with two equal chunk
indices slices left[a*n_slots..(a+1)*n_slots] out of a left of length
a*n_slots and panics -- aborting across the FFI boundary on what was
previously a graceful clamp (e.g. an unclamped simlin_sim_run_to).

Guard the re-eval on curr_chunk != next_chunk. The exhausted-slab case is
exactly the one a mid-interval read never reaches (a full slab means time
has reached FINAL_TIME, not mid-interval), so skipping the re-eval there
is correct and restores the pre-#625 graceful behavior. Every normal
run_to (mid-interval included) breaks via the curr[TIME] > end path with
distinct chunks, so the re-eval still runs and #625 stays fixed.
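The aliasing panic and the guard can be sketched in isolation. This is a hypothetical reconstruction, assuming `borrow_two` splits a flat slab with `split_at_mut` and that `a <= b`; the real helper's signature and body may differ, and `reeval_if_advanced` is an illustrative name, not one from vm.rs.

```rust
/// Hypothetical stand-in for the VM's two-chunk borrow over a flat slab.
/// Assumes `a <= b`; the real helper may be written differently.
fn borrow_two(data: &mut [f64], n_slots: usize, a: usize, b: usize) -> (&mut [f64], &mut [f64]) {
    let (left, right) = data.split_at_mut(b * n_slots);
    // When a == b, `left` has length a * n_slots, so this range is out of bounds.
    (&mut left[a * n_slots..(a + 1) * n_slots], &mut right[..n_slots])
}

/// Post-loop re-eval, skipped when the two chunk indices alias.
/// In this toy layout slot 0 plays a stock and slot 1 an aux equal to stock * 2.
/// Returns whether the re-eval ran.
fn reeval_if_advanced(data: &mut [f64], n_slots: usize, curr_chunk: usize, next_chunk: usize) -> bool {
    if curr_chunk == next_chunk {
        // Exhausted slab: curr was not advanced, so there is nothing to refresh.
        return false;
    }
    let (curr, _next) = borrow_two(data, n_slots, curr_chunk, next_chunk);
    curr[1] = curr[0] * 2.0;
    true
}
```

With distinct indices the re-eval runs as before; with equal indices it returns early instead of reaching the out-of-range slice.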

Also correct the imprecise "idempotent for the full-run path" note: on the
VM a full run leaves curr on the next-written overshoot chunk (stale
flow/aux), so the re-eval does change it there -- but that chunk is beyond
the saved results, so it stays invisible to a full run. The wasm note is
tightened too (it does break with a consistent curr, and has no chunk
aliasing since curr is the fixed CURR_BASE region).

Adds a VM regression test (run_to 10x past the end returns Ok, not panic).
The new test pushed vm.rs over the 6000-line file lint, so the
vm_reset_and_run_initials_tests module is extracted to a sibling file
included via #[path] (the existing vm_set_value_tests.rs pattern), so
use super::* still resolves vm's private items.

The residual VM-vs-wasm divergence for run_to(past FINAL_TIME) (VM
overshoots the slab; wasm clamps) is tracked as #634 -- not reachable via
in-range scrubbing.
@bpowers
Owner Author

bpowers commented May 23, 2026

@codex review

@chatgpt-codex-connector

Codex Review: Didn't find any major issues. 🚀

@claude

claude Bot commented May 23, 2026

Code review

I traced both backends' break paths and the getValue read paths. The in-range scrubbing fix (partial run_to(target < stop)) is correct and consistent across VM and wasm, including for PREVIOUS auxes. The test-file split is complete (all 17 original tests preserved + the new panic regression). One issue stands out.

[P2] wasm post-loop flows(0) makes PREVIOUS auxes stale-by-one after a full run, diverging from the VM and from getSeries

src/simlin-engine/src/wasmgen/module.rs:1258-1259

The post-loop flows(0) re-eval is unconditional, so it also runs after the slab-exhaustion break that run_to_end (and any run_to(t >= stop)) takes. In that break curr was not advanced — it is the just-stepped row at t = stop, and prev_values was already snapshotted to that same row (the per-step snapshot in emit_prev_snapshot runs after the step's flows). The re-eval therefore resolves PREVIOUS(x) against curr's own snapshot, yielding x(stop) instead of x(stop - dt). Since the wasm engine's simGetValue reads the live curr (direct-backend.ts:611), getValue of a PREVIOUS-using variable after a full run returns the wrong timestep and diverges from the VM (whose simGetValue reads the last committed results row once run_to_end consumes the VM, simulation.rs:427-430) and from the wasm backend's own getSeries (committed row = x(stop - dt)). The comment's "the re-eval is idempotent there" holds only for non-PREVIOUS slots; for PREVIOUS it is not. This is pre-existing-correct behavior the PR regresses, but only in this combination: a model with a PREVIOUS() variable, the wasm engine, and a getValue read after a full/at-stop run — which is why the teacup-based parity tests (no PREVIOUS aux) and the new mid-interval test don't catch it. Guarding the re-eval on saved < n_chunks (skip on a full slab) would mirror the VM's curr_chunk != next_chunk guard and fix it while still re-evaluating the in-range scrubbing case.

Overall correctness

The patch is correct for its stated target (in-range mid-interval getValue parity) and the chunk-ring panic guard is sound. The one finding above is a narrow but genuine wasm-only divergence for PREVIOUS variables read via getValue after a full run, contradicting the PR's parity goal; I'd consider the patch not fully correct until that case is guarded or explicitly folded into the tracked #634 divergence.

PR #632 review (Claude) found a wasm-only regression: the post-loop
flows(0) re-eval added for #625 ran unconditionally, including after a
full / at-stop run. A full run breaks via the slab-exhaustion path, which
does NOT advance curr -- curr is the just-saved t=stop row, and
prev_values was already snapshotted to that same row (the per-step
snapshot runs after the step's flows). The re-eval then resolved
PREVIOUS(x) against curr's own snapshot, yielding x(stop) instead of
x(stop-dt), corrupting the live curr that the host's getValue reads. So
getValue of a PREVIOUS-using variable after a full run diverged from the
committed series and from the VM (which reads the last results row).

Guard the re-eval on saved < n_chunks: skip it exactly on the break paths
that do not advance curr (the full-slab exhaustion break and the resumed
full-slab top guard), where curr is already the correct last-saved row.
This mirrors the VM's curr_chunk != next_chunk guard ("re-eval only when
curr was advanced"). The mid-interval case (saved < n_chunks, curr
advanced) still re-evals, and there prev_values holds the genuinely
previous step, so PREVIOUS resolves correctly -- #625 stays fixed,
including for PREVIOUS auxes. Skipping also makes a resumed run_to on a
full slab a strict no-op rather than an idempotent re-eval.
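A toy model of the wasm-side loop makes the stale-by-one effect concrete. Everything here is an assumption for illustration: the `Sim` and `Row` types, the single `prev_snapshot` field standing in for prev_values, and the ordering of the two slab-full breaks are modelled on the description above rather than taken from wasmgen/module.rs. The `prev_level` field plays the role of an aux defined as `PREVIOUS(level)`.

```rust
const DT: f64 = 1.0;

#[derive(Clone, Copy, Debug, PartialEq)]
struct Row {
    time: f64,
    level: f64,      // stock
    prev_level: f64, // aux: PREVIOUS(level)
}

struct Sim {
    curr: Row,
    prev_snapshot: f64, // stand-in for prev_values
    saved: Vec<Row>,
    n_chunks: usize,
}

impl Sim {
    fn new(n_chunks: usize) -> Self {
        let curr = Row { time: 0.0, level: 0.0, prev_level: 0.0 };
        Sim { curr, prev_snapshot: 0.0, saved: Vec::new(), n_chunks }
    }

    fn eval_flows(&mut self) {
        self.curr.prev_level = self.prev_snapshot;
    }

    fn run_to(&mut self, target: f64, guarded: bool) {
        loop {
            if self.saved.len() >= self.n_chunks {
                break; // resumed run on a full slab
            }
            self.eval_flows();
            self.saved.push(self.curr);
            self.prev_snapshot = self.curr.level; // per-step snapshot, after flows
            if self.saved.len() >= self.n_chunks {
                break; // slab exhaustion: curr is NOT advanced
            }
            self.curr.level += 2.0 * DT;
            self.curr.time += DT;
            if self.curr.time > target {
                break; // mid-interval stop: curr was advanced
            }
        }
        // Re-eval only when curr was advanced (saved < n_chunks), if guarded.
        if !guarded || self.saved.len() < self.n_chunks {
            self.eval_flows();
        }
    }
}
```

With `n_chunks = 3`, an unguarded `run_to(2.0, false)` leaves `curr.prev_level` at 4 (the level at stop) while the last saved row holds 2; the guarded run keeps `curr` equal to that saved row. A mid-interval `run_to(1.0, true)` still re-evaluates and yields the level from one step earlier.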

The VM needs no change here: its full run advances curr (it breaks via
curr[TIME] > end after the advance, distinct chunks), so its re-eval reads
a genuine previous snapshot, and getValue after run_to_end reads the
results row regardless.

Tests: a new full-run regression (a PREVIOUS aux's live curr must equal
the last committed series row and the VM, = level at t-dt, not t), and the
mid-run parity test's fixture gains a PREVIOUS aux so the all-variable
mid-interval comparison covers it too.
@bpowers
Owner Author

bpowers commented May 23, 2026

@codex review

@chatgpt-codex-connector

Codex Review: Didn't find any major issues. Keep it up!

@claude

claude Bot commented May 23, 2026

Review (Claude)

Reviewed the mid-run getValue reconciliation. I traced both backends' integration-loop break paths and confirmed the two guards are correct.

  • VM (vm.rs:879): curr_chunk != next_chunk correctly skips the post-loop flow re-eval on the chunk-ring exhaustion break, where save_advance! sets curr_chunk = next_chunk and borrow_two with two equal indices would slice left[a*n..(a+1)*n] out of a left of length a*n and panic. Verified run_to(target > stop) reaches that exhaustion break (aliased chunks) rather than the curr[TIME] > end break — covered by the new run_to_past_final_time_does_not_panic.
  • wasm (wasmgen/module.rs:1270): saved < n_chunks skips the flows(0) re-eval exactly on the two slab-full break paths (post-save exhaustion + top-of-loop guard), where curr is the just-saved t=stop row and prev_values was already snapshotted to that same row, so a re-eval would resolve PREVIOUS(x) to x(stop) instead of x(stop-dt). The mid-interval path (curr advanced, prev_values = last completed step) still re-evals, so PREVIOUS resolves to x(t-dt) correctly. The guard mirrors the VM's "re-eval only when curr was advanced".

Confirmed full-run getValue parity is unaffected: simlin_sim_run_to_end consumes the VM into results (so the host reads the last results row, not the re-eval'd overshoot chunk), while a mid-run run_to leaves the VM live and reads the re-eval'd resting chunk. The wasm host reads its live curr (the last saved row after a full run, the resting chunk mid-run). The new mid_run_curr_is_self_consistent_and_matches_vm and full_run_previous_aux_curr_matches_series_and_vm tests pin both regimes, including a PREVIOUS aux.

The residual run_to(past FINAL_TIME) VM-vs-wasm divergence (VM overshoots the slab via get_value_now; wasm clamps) is pre-existing for stocks/time and is tracked as #634.

Overall correctness: correct. No blocking issues found.

@bpowers bpowers merged commit 5106a28 into main May 23, 2026
13 checks passed
@bpowers bpowers deleted the engine-midrun-getvalue-parity branch May 23, 2026 14:26
Development

Successfully merging this pull request may close these issues.

engine: mid-run getValue diverges between VM and wasm for non-stock vars (live curr-chunk overshoot handling)