Skip to content

engine: init runlist ordering is nondeterministic for an init-time algebraic cycle through a synthetic-module stock output #595

@bpowers

Description

@bpowers

Summary

The init-phase dependency runlist can be nondeterministic (HashMap-iteration-order dependent) when a model has a genuine init-time algebraic cycle that passes through a synthetic-module stock output. This is the init-phase analogue of the dt-phase ordering bug fixed in commit eeace429 ("engine: chain-break stock submodel-output deps in the dt runlist"), which only repaired the dt phase and explicitly left the init phase carrying the edge.

This is a latent correctness bug: the model compiles (cycle detection reports has_cycle == false), but the chosen init order is HashMap-iteration-order dependent and can place a module before its own input, producing nondeterministic / incorrect initial values.

Mechanism

A synthetic module is treated as a sink by the init-phase cycle relation -- init_walk_successors (src/simlin-engine/src/db_dep_graph.rs, the if info.is_module { return Vec::new(); } early-return) returns no successors for a module. But the ordering data follows the real edge module -> input (module·output references are collapsed to the bare module name by normalize_deps, and the module carries its input src as a direct dep).

In the dt phase this same shape was resolved by chain-breaking the stock-submodel-output dependency: a stock's dt value does not depend on its inputs within the same step, so the spurious reader -> module edge can be dropped. See commit eeace429 and the legacy gate model.rs::module_output_deps.

In the INIT phase that chain-break is NOT valid: a stock's INIT value is a real init dependency. init_walk_successors's own rustdoc (db_dep_graph.rs ~143-153) documents this asymmetry deliberately -- "a Stock is NOT an init-phase sink", "no stock filter and no stock sink (a stock-targeted init dep is a real init dependency, kept)". So the edge cannot simply be removed the way it was in the dt phase.

The result:

  • Cycle detection never sees a cycle, because init_walk_successors makes the module a sink -> has_cycle == false -> the model compiles.
  • But the init ordering data (the initial_deps field consumed by the topo_sort_str closure, db_dep_graph.rs ~2149, called for the initials runlist at ~2311) still has the module -> input edge plus, in the cyclic shape, a reader -> module edge that closes an ordering cycle invisible to detection.
  • topo_sort_str then breaks that ordering cycle arbitrarily -- and, as the dt-phase commit message documents for the analogous case, "arbitrarily" means HashMap-iteration dependent. The module can be emitted before its own input helper => nondeterministic init runlist => potentially nondeterministic / incorrect initial values.

The dt-phase commit message states the boundary explicitly: "Only dt_deps is filtered; the init phase keeps the edge (a stock's init value is computed during init, so it is a real init dependency)." That correctly preserves init correctness for the non-cyclic case, but leaves the cyclic-shape ordering soundness gap open.

Scope / trigger

  • Triggered by a model with a non-init-broken feedback through a synthetic-module stock output -- e.g. a SMOOTH / SAMPLE IF TRUE module whose input feeds back to a reader of the module's stock output, where the feedback is NOT broken by ACTIVE INITIAL or an explicit init value.
  • C-LEARN's emissions_with_stopped_growth does not trigger it, because its feedback is init-broken by ACTIVE INITIAL (so t0 always matched). That is why the C-LEARN residual work did not surface it.
  • No known in-repo model triggers it today, so there is currently no failing test -- it is a latent soundness gap, not an observed regression.

Why it matters

  • Correctness / determinism: for the specific cyclic shape above, the salsa init runlist ordering is not a function of the model alone -- it depends on HashMap iteration order. Two runs (or two compilations) of the same model can produce different init orderings and therefore different initial values, one of which is wrong.
  • It is a general defect in the init-phase ordering analysis for module/macro stock outputs, not specific to any one model.

Components affected

  • src/simlin-engine/src/db_dep_graph.rs:
    • build_var_info -- where the dt-phase chain-break fix landed (and where an init-phase resolution would need to live, with different semantics).
    • init_walk_successors -- the init-phase cycle relation that makes a module a sink (so detection misses the cycle) while ordering data keeps the module -> input edge.
    • the initial_deps VarInfo field and the topo_sort_str closure (~2149, invoked for the initials runlist ~2311) -- where the arbitrary, HashMap-order-dependent tie-break happens.
  • Contrast with the dt-phase fix in commit eeace429 and the legacy model.rs::module_output_deps gate.

Possible approaches for resolution

The dt-phase fix (drop the edge) cannot be reused, because the init edge is a genuine dependency. Candidate directions (need design discussion, not papered over):

  • Make the init ordering consistent with what init detection assumes: since init_walk_successors already treats the module as a sink, ensure the init-phase ordering graph either (a) does not carry a back-edge that closes an ordering cycle detection cannot see, or (b) makes any such residual cycle a detected CircularDependency rather than a silently arbitrary tie-break -- whichever is the correct semantics for "stock init value feeds back through a synthetic-module output". The key invariant to restore: the set of edges topo_sort_str orders on must be exactly the set the cycle detector walked, so a real init cycle is reported and a non-cycle is deterministically ordered.
  • Whatever the fix, it must be a general structural fix keyed on the module-stock-output-in-an-init-cycle shape, and should land with a regression fixture exhibiting a non-init-broken SMOOTH/SAMPLE IF TRUE feedback whose init order is currently HashMap-dependent (mirroring the inline-MDL dt-phase regression added in eeace429).

Context / lineage

Discovered during C-LEARN residual Phase 3 Task 5 (branch clearn-residual), which fixed only the dt-phase analogue (commit eeace429). The init-phase gap is out of scope for that task and is a candidate for a dedicated fix or a Phase 4 follow-up.

Severity: latent correctness bug (nondeterministic / incorrect init values for a specific cyclic shape); no known in-repo model triggers it today, but it is a real soundness gap in the salsa init runlist ordering.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions