From 1c5605c762beb0881a6c169971a086f8bbe74336 Mon Sep 17 00:00:00 2001
From: Bobby Powers <bobbypowers@gmail.com>
Date: Fri, 22 May 2026 17:46:54 -0700
Subject: [PATCH 01/38] doc: add layout quality eval design plan

Documents infrastructure for iterating on simlin-engine's automatic
diagram layout. Today layout judges itself by edge-crossing count alone
and cannot be rendered outside the browser; this plan closes both gaps
with a pure, geometry-accurate LayoutMetrics suite scored on the same
geometry the PNG renderer draws, a benchstat-style statistics core that
treats layout quality as a distribution over seeds (median, geomean,
Mann-Whitney U) rather than one fixed-seed sample, and an on-demand
corpus sweep example that renders and scores layouts against
human-authored reference views.

The first algorithm step (Rung 0) re-points select_best_layout to rank
seeds by the full weighted_cost instead of crossings-only, and the
crossing count moves from a straight-chord approximation to polyline
sampling so curved/bent connectors are measured faithfully. Rungs 1-3
(parameter search, metric-driven annealing cost, new layout passes) are
documented as the forward path the harness is built to support.
---
 docs/README.md                                |   1 +
 .../2026-05-22-layout-quality-eval.md         | 539 ++++++++++++++++++
 2 files changed, 540 insertions(+)
 create mode 100644 docs/design-plans/2026-05-22-layout-quality-eval.md

diff --git a/docs/README.md b/docs/README.md
index b01911620..a055aed08 100644
--- a/docs/README.md
+++ b/docs/README.md
@@ -32,6 +32,7 @@
   - [design-plans/2026-05-19-clearn-residual.md](design-plans/2026-05-19-clearn-residual.md) -- Close C-LEARN's residual (#590/#591) as general Vensim import/simulation primitives: arrayed inline graphical functions, import-time macro shadowing, user-macro INITIAL recurrence, residual attribution; 5 phases
   - [design-plans/2026-05-20-wasm-backend.md](design-plans/2026-05-20-wasm-backend.md) -- WebAssembly code-generation backend: compile a model to one self-contained wasm module as an alternative to the bytecode VM (for fast interactive re-simulation), validated to full VM parity; 8 phases
   - [design-plans/2026-05-22-engine-wasm-sim.md](design-plans/2026-05-22-engine-wasm-sim.md) -- Integrate the wasm backend into `@simlin/engine` as a selectable engine (`Model.simulate({engine:'wasm'})`): vm-vs-wasm demux below the `Sim` facade in `DirectBackend`, a resumable blob run ABI for `runTo`, and a node VM-vs-wasm benchmark; 4 phases
+  - [design-plans/2026-05-22-layout-quality-eval.md](design-plans/2026-05-22-layout-quality-eval.md) -- Layout quality evaluation + hill-climbing harness: a pure geometry-accurate `LayoutMetrics` (overlap/sprawl/accurate-arc crossings) and benchstat-style seed-distribution stats, an on-demand corpus sweep that renders and scores layouts against human references, and Rung 0 (rank seeds by `weighted_cost`); 5 phases
 - [plans/](plans/README.md) -- Implementation plans (active and completed)
 - [test-plans/](test-plans/) -- Human verification plans for completed features
   - [test-plans/2026-05-22-engine-wasm-sim.md](test-plans/2026-05-22-engine-wasm-sim.md) -- Manual verification for the `@simlin/engine` selectable wasm engine (`Model.simulate({engine:'wasm'})`): re-running the automated gates, driving the gated/`#[ignore]`d heavy tests, and the human-judged extras (interactive scrubbing feel, VM-vs-wasm benchmark numbers); all 25 ACs already have automated coverage
diff --git a/docs/design-plans/2026-05-22-layout-quality-eval.md b/docs/design-plans/2026-05-22-layout-quality-eval.md
new file mode 100644
index 000000000..91f1ebc60
--- /dev/null
+++ b/docs/design-plans/2026-05-22-layout-quality-eval.md
@@ -0,0 +1,539 @@
+# Layout Quality Evaluation and Hill-Climbing Harness Design
+
+## Summary
+
+This work builds a closed-loop measurement and tooling harness around `simlin-engine`'s
+automatic diagram layout, so that an agent (or human) can improve layout quality with
+evidence instead of guesswork. The core is two **pure** Rust modules that hold all the
+logic: a *quality-metric core* (`layout/metrics.rs`) that scores a laid-out diagram on
+explicit, scale-free aesthetic cost terms -- node overlap, connectors running through
+nodes, label overlap, edge crossings, sprawl, edge-length unevenness, and aspect ratio --
+and collapses them to a single `weighted_cost` scalar; and a *statistics core*
+(`layout/eval_stats.rs`) that treats a layout's quality as a distribution over random
+seeds, summarizing it with medians, percentiles, a corpus-wide geomean, and a Mann-Whitney
+U significance test (the way Go's `benchstat` compares benchmark runs). Crucially, the
+metric is computed on the *same geometry the PNG renderer draws*, so a layout's score can
+never disagree with how it actually looks. An imperative shell -- an on-demand example
+binary (`examples/layout_eval.rs`) -- composes these cores: it sweeps a curated corpus of
+models, lays each out across many seeds, scores them, renders the best/median/worst (and any
+hand-authored reference view) to PNG, and writes a metrics table plus an HTML contact-sheet.
+
+The architecture exists to enable a tight iteration loop: change a layout parameter or code
+path, run the sweep, read the geomean delta *and look at the rendered contact-sheet*, then
+keep or revert based on whether the change is statistically significant and visually better.
+The scalar `weighted_cost` is the hill to climb; the rendered images are the guardrail
+against optimizing the number while degrading the picture (Goodhart's law); and a small set
+of human-vs-AI reference pairs is the objective check that the metric agrees with human
+taste. With that loop in place, the design takes only the first, smallest algorithm step --
+"Rung 0," re-pointing seed selection to rank by the full metric instead of crossings alone
+-- and protects the gain with a fast deterministic CI guard. Rungs 1-3 (parameter search,
+a metric-driven search objective, and new layout passes) are documented as the forward path
+the harness is built to support, not built here.
+
+## Definition of Done
+
+This work builds the measurement and tooling infrastructure that lets an agent
+iteratively improve `simlin-engine`'s automatic diagram layout. It defines *what a good
+layout is* (an explicit, geometry-accurate quality metric) and *how to judge outputs* (a
+corpus sweep that renders and statistically scores layouts), then takes the first
+improvement step (Rung 0). The layout algorithm itself is not redesigned beyond Rung 0;
+rungs 1-3 are documented as the forward path.
+
+Today the layout engine judges a layout by exactly one quantity -- edge-crossing count
+(`annealing.rs` simulated-annealing cost; `select_best_layout` seed ranking) -- and there
+is no in-repo way to *see* a generated layout outside the browser. This design closes both
+gaps.
+
+1. **A pure `LayoutMetrics` module** (`src/simlin-engine/src/layout/metrics.rs`) computes
+   scale-free aesthetic *cost* terms (0 = ideal) from a `StockFlow` view, on the same
+   geometry the PNG renderer draws: `node_overlap`, `node_connector_overlap`,
+   `label_overlap`, `crossings`, `sprawl`, `edge_length_cv`, and `aspect_penalty`, plus
+   reserved zero-weighted structure terms. `weighted_cost(&MetricWeights) -> f64` collapses
+   them to one scalar to minimize.
+
+2. **Edge crossings are counted on real geometry** -- arcs and multipoint connectors
+   sampled to polylines -- replacing the straight-chord approximation that
+   `count_view_crossings` (`mod.rs`) uses today.
+
+3. **A Rust in-tree corpus sweep driver** (`src/simlin-engine/examples/layout_eval.rs`)
+   runs over a curated `test/` corpus: for each model it generates layouts across multiple
+   independent seeds, computes `LayoutMetrics` for each, renders the best/median/worst
+   layouts to PNG, and -- where the model ships a hand-authored view -- also scores and
+   renders that view as a reference. No pysimlin or other-binding surface is added.
+
+4. **The sweep reports statistically**: per-model median + spread over the seed samples,
+   a corpus geomean-of-medians aggregate, the production best-of-k cost, and a
+   baseline-vs-candidate comparison using a Mann-Whitney U significance test -- emitted as a
+   metrics table (JSON) and an HTML contact-sheet (best/median/worst per model with score
+   breakdowns), written to a gitignored output directory under `target/`.
+
+5. **Metric weights are calibrated and committed**: initial weights set from the
+   failure-mode priorities (overlap + crossings dominant; sprawl/aspect moderate;
+   structure ~0), refined against rendered examples, and validated by a reference-pair
+   check -- on agreed human-vs-AI model pairs the metric scores the human layout lower
+   (better) than the worse machine layout.
+
+6. **Rung 0 is wired in**: `select_best_layout` (`mod.rs`) ranks the candidate seeds by
+   `weighted_cost` (using the accurate crossing count) instead of crossings-only.
+
+7. **A deterministic CI regression guard**: a fast test over a few tiny models asserts
+   `weighted_cost` stays at or below a committed threshold, and the reference-pair ordering
+   is encoded as a test -- both within the workspace's 3-minute test-time budget.
+
+8. **The hill-climbing ladder (rungs 1-3) is documented** as the forward path (parameter
+   search; metric-driven annealing cost; new layout passes), naming the seam each rung
+   touches.
+
+### Out of scope
+- Redesigning the layout algorithm beyond Rung 0 (rungs 1-3 are documented, not built).
+- Exposing metrics or rendering through pysimlin or any non-Rust binding.
+- A preference-judging UI or a trained preference model (the explicit metric is the chosen
+  signal; human preference enters only as up-front calibration).
+- SD-structure metrics as *weighted* terms (chain straightness, loop readability) -- the
+  fields exist but are zero-weighted initially, since these were de-prioritized.
+
+## Acceptance Criteria
+
+### layout-quality-eval.AC1: Metric terms are geometry-correct and scale-free
+- **layout-quality-eval.AC1.1 Success:** Two node boxes overlapping by a known area yield a `node_overlap` equal to the known overlap fraction.
+- **layout-quality-eval.AC1.2 Success:** Pairwise-disjoint nodes yield `node_overlap` = 0.
+- **layout-quality-eval.AC1.3 Success:** A connector whose polyline passes through a non-incident node box contributes to `node_connector_overlap`; one that avoids all non-incident boxes yields 0.
+- **layout-quality-eval.AC1.4 Success:** Two label boxes overlapping by a known area yield a matching `label_overlap`; non-overlapping labels yield 0.
+- **layout-quality-eval.AC1.5 Success:** `aspect_penalty` is 0 inside the target aspect band and positive outside it (a 1x10 bbox is penalized; a ~4:3 bbox is not).
+- **layout-quality-eval.AC1.6 Success:** `weighted_cost` equals the exact linear combination Σ wᵢ·termᵢ for given weights.
+- **layout-quality-eval.AC1.7 Edge:** An empty or single-element view yields all-zero terms with no NaN or divide-by-zero.
+- **layout-quality-eval.AC1.8 Success:** Uniformly scaling all coordinates leaves every normalized term unchanged within tolerance (scale invariance).
+
+### layout-quality-eval.AC2: Crossings are counted on real geometry
+- **layout-quality-eval.AC2.1 Success:** Two connectors that cross once yield a crossing count of 1; connectors sharing an endpoint yield 0.
+- **layout-quality-eval.AC2.2 Success:** An Arc/MultiPoint connector that visually crosses another edge is counted via polyline sampling, on a constructed case where the straight-chord approximation does not count it.
+- **layout-quality-eval.AC2.3 Success:** The crossing count is invariant under translation and rotation of the whole view.
+
+### layout-quality-eval.AC3: Corpus sweep produces renders and scores
+- **layout-quality-eval.AC3.1 Success:** `cargo run --release --example layout_eval` runs over the curated corpus and exits 0.
+- **layout-quality-eval.AC3.2 Success:** It writes `metrics.json` with per-model term breakdowns + `weighted_cost` and corpus aggregates.
+- **layout-quality-eval.AC3.3 Success:** It writes `index.html` referencing best/median/worst PNGs per model with score breakdowns.
+- **layout-quality-eval.AC3.4 Success:** Models shipping a hand-authored view get a reference render + score alongside the auto-layout.
+- **layout-quality-eval.AC3.5 Success:** All artifacts are written under `target/` (gitignored); nothing is committed.
+- **layout-quality-eval.AC3.6 Edge:** A model that fails to lay out or render is reported and skipped, not fatal to the sweep.
+
+### layout-quality-eval.AC4: Statistical reporting and comparison
+- **layout-quality-eval.AC4.1 Success:** Per model, M seeds produce M samples; the report includes median + spread (p25/p75) and the best-of-k production proxy.
+- **layout-quality-eval.AC4.2 Success:** The corpus aggregate is the geomean of per-model medians.
+- **layout-quality-eval.AC4.3 Success:** A baseline-vs-candidate run reports per-model and aggregate deltas, each with a Mann-Whitney U p-value / significance verdict.
+- **layout-quality-eval.AC4.4 Success:** `geomean`, median/percentile, and Mann-Whitney U match known reference values.
+- **layout-quality-eval.AC4.5 Edge:** Identical baseline and candidate yield a zero aggregate delta and a non-significant verdict.
+
+### layout-quality-eval.AC5: Calibration is validated objectively
+- **layout-quality-eval.AC5.1 Success:** Committed default `MetricWeights` give overlap and crossings the dominant weights and the reserved structure terms zero weight.
+- **layout-quality-eval.AC5.2 Success:** On the agreed human-vs-AI reference pairs, `weighted_cost(human) < weighted_cost(ai)` under the committed weights (encoded as a test).
+
+### layout-quality-eval.AC6: Rung 0 selection uses the full metric
+- **layout-quality-eval.AC6.1 Success:** `select_best_layout` picks the lowest-`weighted_cost` candidate, verified on constructed candidates where the lowest-cost layout has *more* crossings than another candidate (so the choice differs from crossings-only).
+- **layout-quality-eval.AC6.2 Success:** The existing layout test suite (`tests/layout.rs`, `layout_tests.rs`, `layout_review_tests.rs`) passes unchanged with the new selection.
+
+### layout-quality-eval.AC7: CI regression guard
+- **layout-quality-eval.AC7.1 Success:** A deterministic test over a few tiny models asserts `weighted_cost` <= a committed threshold and completes well within the test-time budget.
+- **layout-quality-eval.AC7.2 Failure:** Raising a guard model's `weighted_cost` above the threshold makes the test fail.
+
+### layout-quality-eval.AC8: Cross-cutting
+- **layout-quality-eval.AC8.1 Success:** A fixed seed reproduces a byte-identical layout (determinism), distinct from the M-seed statistical sampling.
+- **layout-quality-eval.AC8.2 Success:** Additional Considerations documents rungs 1-3 and names the seam each touches.
+
+## Glossary
+
+- **System dynamics (SD) / stock-and-flow model**: A modeling approach that represents a
+  system as stocks (accumulations) connected by flows (rates of change) and feedback links.
+  Simlin builds, simulates, and visualizes these models; their visual form is the "diagram"
+  whose layout this work scores.
+- **StockFlow / `StockFlow` view**: The engine's data structure for a model diagram -- the
+  collection of `ViewElement`s (and their positions) that make up one visual view of a
+  model. The metric takes a `&StockFlow` as input.
+- **`ViewElement`**: A single positioned item in a `StockFlow` view (a stock, flow, auxiliary
+  variable, connector, alias, etc.). Layout assigns each one a position.
+- **Connector / Arc / MultiPoint / `Flow.points`**: Connectors are the links drawn between
+  elements. They are not always straight: an Arc is a curved link, a MultiPoint connector
+  bends through intermediate points, and a flow's pipe follows `Flow.points`. The crossing
+  count and metric sample these into polylines so curved/bent geometry is measured
+  faithfully.
+- **SFDP**: The force-directed graph layout algorithm used to place nodes (`layout/sfdp.rs`),
+  treating links as springs and nodes as mutually repelling charges. Its tunable parameters
+  (`k`, `c`, `p`, spacing constants) are the target of the documented Rung 1 parameter
+  search.
+- **Force-directed layout**: The broader family of layout algorithms (SFDP is one) that
+  positions nodes by simulating attractive/repulsive forces until the system settles.
+- **Simulated annealing (SA)**: The optimization pass (`layout/annealing.rs`) that refines a
+  layout by randomly perturbing it and accepting changes probabilistically, with the
+  acceptance probability cooling over time. It currently minimizes edge crossings only;
+  Rung 2 would feed it the full `weighted_cost`.
+- **Edge crossings**: Places where two connectors visually intersect -- a primary source of
+  diagram clutter, and today the *only* quantity layout optimizes.
+- **`count_view_crossings`**: The existing function (`mod.rs`) that counts crossings. Today it
+  approximates connectors as straight chords; this work refactors it to count on sampled
+  polylines so arcs and bends are handled correctly.
+- **`LAYOUT_SEEDS` / seed sampling**: Production runs layout from four fixed random seeds
+  (`[42, 123, 456, 789]`) and keeps the best result. Because layout is deterministic per
+  seed but its quality varies across seeds, the sweep instead samples *many* seeds to
+  characterize the quality distribution rather than a single lucky/unlucky result.
+- **`select_best_layout`**: The function (`mod.rs`) that picks the winning candidate among
+  the seed runs. Rung 0 re-points it from "fewest crossings" to "lowest `weighted_cost`."
+- **`LayoutMetrics` / `weighted_cost` / `MetricWeights`**: The new quality-metric types.
+  `LayoutMetrics` holds one cost term per aesthetic concern (0 = ideal, all scale-free);
+  `MetricWeights` is one weight per term; `weighted_cost` is their weighted sum `Σ wᵢ·termᵢ`
+  -- the single scalar an optimizer minimizes.
+- **`render_png` / resvg**: `render_png` (`diagram/render_png.rs`, behind the `png_render`
+  feature) rasterizes a diagram to a PNG; resvg is the Rust SVG-rendering library it uses.
+  Because the engine's SVG output is byte-identical to the product's TypeScript renderer,
+  the PNG faithfully reflects the real UI.
+- **geomean (geometric mean)**: The aggregate used to combine per-model median costs across
+  the corpus. Unlike the arithmetic mean, it averages ratios fairly so one large-cost model
+  cannot dominate the corpus score.
+- **Mann-Whitney U test**: A non-parametric significance test that decides whether two
+  samples differ. It is used to judge whether a baseline-vs-candidate cost difference is real
+  signal or seed noise, without assuming the cost distributions are normal.
+- **benchstat**: A Go tool that compares benchmark runs by reporting center, spread, and a
+  significance test over many samples. The statistics core deliberately mirrors its approach
+  for layout quality.
+- **best-of-k**: A "production proxy" statistic -- the minimum cost over k seeds -- that
+  mirrors what production actually ships (best of the fixed seed set), reported alongside the
+  full distribution.
+- **Reference pair (human-vs-AI)**: An agreed pairing of a hand-authored ("human") layout and
+  a machine-generated ("AI") layout of the same model. The metric is validated by requiring
+  `weighted_cost(human) < weighted_cost(ai)` -- an objective check that it agrees with human
+  taste.
+- **Contact-sheet**: The generated `index.html` report -- a grid showing each model's
+  best/median/worst renders (and any reference view) with their score breakdowns, sorted
+  worst-first -- inspected every iteration as the visual guardrail.
+- **"Rungs" / hill-climbing ladder**: The staged forward path for improving layout. Rung 0
+  (built here) changes only seed selection; Rungs 1-3 (documented, not built) are parameter
+  search, a metric-driven search objective, and new layout passes -- each "rung" a discrete,
+  measurable step up the quality hill.
+- **Goodhart('s law)**: "When a measure becomes a target, it ceases to be a good measure" --
+  i.e., any single fitness scalar will eventually be gamed. The contact-sheet renders,
+  visible per-term breakdowns, and reference-pair test are the design's guards against it.
+- **Functional core / imperative shell (FCIS)**: An architectural pattern that isolates pure,
+  side-effect-free logic (here, `metrics.rs` and `eval_stats.rs`) from the I/O-performing
+  shell (here, the `layout_eval.rs` example). The cores are heavily unit/property tested; the
+  shell stays thin.
+- **salsa**: The incremental computation framework backing the engine's model database; the
+  sweep driver syncs the salsa DB before laying out a model, reusing the path that the
+  existing `tests/layout.rs` uses to load corpus models.
+
+## Architecture
+
+The system has three parts, split along the functional-core / imperative-shell line: a
+**pure metric core** and a **pure statistics core** that the **imperative sweep driver**
+composes. Rendering already exists (`diagram::render_png`) and is reused unchanged.
+
+### Quality-metric core (`layout/metrics.rs`, pure)
+
+`compute_layout_metrics(view: &StockFlow, config: &LayoutConfig) -> LayoutMetrics` is a
+pure function with no I/O. It is computed on the **same geometry the renderer draws** --
+node bounding boxes, connector paths, and label boxes obtained from the `diagram` module's
+existing geometry helpers (`diagram::elements`/`flow` `*_bounds`, `diagram::connector`
+path, `diagram::label::label_bounds`) -- so a layout's score and its rendered PNG can never
+disagree. Every term is a **cost** (0 = ideal) and normalized to be scale-free, so models
+of different sizes are comparable and the corpus can be aggregated.
+
+| Term | Definition (cost; 0 = ideal) | Pain it captures |
+|------|------------------------------|------------------|
+| `node_overlap` | Σ pairwise node-box overlap area / Σ node area | clutter |
+| `node_connector_overlap` | connector-polyline length inside non-incident node boxes / total connector length | connectors under/through nodes |
+| `label_overlap` | overlap area among label boxes and label-vs-node boxes / Σ label area | clutter |
+| `crossings` | connector-polyline crossings (arcs sampled) / connector count | tangled connectors |
+| `sprawl` | mean connector length / characteristic node size | wasted space |
+| `edge_length_cv` | stddev/mean of connector lengths | elements drifting far / unevenness |
+| `aspect_penalty` | deviation of bbox aspect ratio from a target band | unviewable shape |
+| `chain_straightness`, `loop_compactness` | reserved, zero-weighted | (SD structure; deferred) |
+
+Contract:
+
+```rust
+pub struct LayoutMetrics {
+    pub node_overlap: f64,
+    pub node_connector_overlap: f64,
+    pub label_overlap: f64,
+    pub crossings: f64,
+    pub sprawl: f64,
+    pub edge_length_cv: f64,
+    pub aspect_penalty: f64,
+    pub chain_straightness: f64, // reserved, weight 0
+    pub loop_compactness: f64,   // reserved, weight 0
+}
+
+pub struct MetricWeights { /* one f64 per term */ }
+
+impl LayoutMetrics {
+    /// Σ wᵢ·termᵢ — the scalar an optimizer minimizes.
+    pub fn weighted_cost(&self, w: &MetricWeights) -> f64;
+}
+```
+
+`node_overlap`/`node_connector_overlap`, `crossings`, and the sprawl terms pull in opposite
+directions (compact vs. non-overlapping). That tension is intended: the weights set the
+balance, and the overlap terms keep "minimize area" from collapsing the layout.
+
+**Accurate crossings.** The `crossings` term, and a refactored `count_view_crossings`,
+operate on connector geometry sampled to polylines (arcs and `MultiPoint` links, plus
+`Flow.points`), not straight chords. This requires factoring the connector's path geometry
+into a polyline producer shared by the renderer and the metric, so both see identical
+geometry. This both feeds the metric and fixes a latent undercount in today's
+seed selection.
+
+### Statistics core (`layout/eval_stats.rs`, pure)
+
+Layout is deterministic at a fixed seed (RNGs are `StdRng::seed_from_u64`; no entropy
+source; the `par_iter` over seeds preserves order), so a specific layout is exactly
+reproducible. But a layout's *quality is a distribution over seed space*, and production
+samples it at the four fixed `LAYOUT_SEEDS` and takes the min. Evaluating a change on one
+fixed seed-set conflates a real improvement with seed luck. The statistics core treats
+evaluation the way Go's `benchstat` treats benchmarks: many samples, center + spread, and a
+significance test on differences.
+
+```rust
+pub struct MetricSample { pub seed: u64, pub metrics: LayoutMetrics, pub weighted_cost: f64 }
+
+pub struct ModelStats {
+    pub model: String,
+    pub samples: Vec<MetricSample>, // one per seed
+    pub median_cost: f64,
+    pub spread: (f64, f64),         // e.g. (p25, p75)
+    pub best_of_k_cost: f64,        // production proxy: min over k seeds
+    pub best_seed: u64, pub median_seed: u64, pub worst_seed: u64,
+}
+
+pub struct CorpusReport { pub per_model: Vec<ModelStats>, pub geomean_of_medians: f64 }
+
+/// Per-model and aggregate delta, each with a Mann-Whitney U p-value (non-parametric;
+/// robust to the non-normal cost distributions layout produces).
+pub fn compare(baseline: &CorpusReport, candidate: &CorpusReport) -> Comparison;
+```
+
+`geomean` (not arithmetic mean) aggregates normalized ratios across heterogeneous models so
+one large-cost model can't dominate; `median`/percentiles summarize each model's
+distribution; Mann-Whitney U decides whether a baseline-vs-candidate delta is signal or
+noise. All are pure, table-testable functions.
+
+### Sweep driver (`examples/layout_eval.rs`, imperative shell)
+
+The shell loads each model in a curated corpus list (via the engine's model-open + salsa
+sync path used by `tests/layout.rs`), and for each model:
+
+1. Runs layout for M independent seeds, producing M `MetricSample`s (and the best-of-k
+   production proxy). Requires a per-seed layout entry point (a thin
+   `generate_layout_seeded(seed)` over the existing per-seed pipeline).
+2. Renders the best/median/worst layouts to PNG via `diagram::render_png` (after writing
+   the generated `StockFlow` onto the model's view, which `render_png` reads as
+   `views.first()`).
+3. If the model file ships a non-empty hand-authored view, renders and scores that view
+   untouched as a **reference**.
+
+It then emits, to a gitignored dir under `target/layout-eval/`:
+- `metrics.json` -- per-model `ModelStats` with term breakdowns, plus corpus aggregates.
+- `index.html` -- a contact-sheet sorted worst-cost-first; each cell shows the
+  best/median/worst renders (and the reference, where present) with their metric
+  breakdowns; the header shows corpus geomean and the baseline delta with significance.
+- baseline diff -- `compare()` against a small committed `baseline.json`, printed and
+  embedded in the report.
+
+The driver declares `required-features = ["png_render", "file_io"]` and is run on demand
+(`cargo run --release --example layout_eval`); it is not part of `cargo test`.
+
+### Rung 0 wiring
+
+`select_best_layout` (`mod.rs`) currently keeps the candidate with the fewest crossings
+(tie-break on seed). Rung 0 changes it to keep the candidate with the lowest
+`weighted_cost` (computed with the accurate crossing count), tie-break on seed. This is the
+smallest, immediately-measurable improvement: "best of the candidate seeds" becomes "best
+by the full metric." It changes only selection, not the search.
+
+### The iteration loop this enables
+
+Change a parameter or code path -> run the sweep -> read `metrics.json` *and look at the
+contact-sheet* -> keep or revert based on the geomean delta and its significance, guarded by
+the rendered images. The scalar `weighted_cost` is the hill; the renders are the guardrail
+against gaming it (Goodhart); the reference pairs are the objective check that the metric
+agrees with human taste.
+
+## Existing Patterns
+
+Investigation grounded every touch point in current code; this design adds pure modules and
+one in-tree example, and re-points one existing decision function.
+
+- **Layout module and decision seams.** `src/simlin-engine/src/layout/` holds `mod.rs`
+  (orchestration; `count_view_crossings`; `select_best_layout`; `generate_best_layout`
+  running the `LAYOUT_SEEDS = [42,123,456,789]` candidates via `par_iter`), `sfdp.rs`
+  (force placement, `StdRng::seed_from_u64`), and `annealing.rs` (crossings-only SA cost).
+  This design adds `metrics.rs` and `eval_stats.rs` beside them and edits
+  `select_best_layout`. Terminology (SFDP, annealing, pinned nodes, chains) follows
+  `docs/design-plans/2026-03-27-incremental-layout.md`.
+- **Rendering already exists.** `src/simlin-engine/src/diagram/` provides `render.rs`
+  (`render_svg`), `render_png.rs` (`render_png` / `svg_to_png`, resvg + embedded
+  Roboto-Light, behind the `png_render` feature), with geometry in `elements.rs`,
+  `flow.rs`, `connector.rs`, `label.rs` (`label_bounds`), `common.rs` (`Rect`,
+  `calc_view_box`), and shared `constants.rs`. The metric reuses these geometry helpers so
+  scores match the rendered image. The TS renderer is byte-identical to `render_svg`, so
+  the PNG faithfully reflects the product UI.
+- **In-tree example precedent.** `src/simlin-engine/examples/backend_bench.rs` is an
+  existing on-demand benchmark example (VM-vs-wasm simulation); `examples/layout_eval.rs`
+  follows the same shape and `required-features` mechanism for an on-demand sweep.
+- **Corpus loading.** `tests/layout.rs` (`verify_layout`) already loads native diagram
+  models from `test/` (e.g. `open_xmile`) and syncs the salsa DB before layout; the sweep
+  driver reuses that loading path.
+- **Test-time budget.** Per `CLAUDE.md` / `docs/dev/rust.md`, `cargo test --workspace`
+  runs under a 3-minute cap and individual tests complete in seconds. The full corpus sweep
+  therefore stays in the example (not in tests); only a tiny deterministic guard runs in the
+  test suite.
+- **FCIS.** Pure cores (`metrics.rs`, `eval_stats.rs`) hold all logic and are unit/property
+  tested to the project's coverage bar; the example is a thin imperative shell.
+
+No pattern divergence: pure functions beside existing pure layout code, one example beside
+an existing example, one edit to an existing selection function.
+
+## Implementation Phases
+
+<!-- START_PHASE_1 -->
+### Phase 1: Quality-metric core + accurate crossings
+**Goal:** A pure, geometry-accurate `LayoutMetrics` and a polyline-based crossing count.
+
+**Components:**
+- `src/simlin-engine/src/layout/metrics.rs` (new) -- `LayoutMetrics`, `MetricWeights`,
+  `compute_layout_metrics(view, config)`, `weighted_cost`. Each term computed on the
+  `diagram` module's geometry helpers.
+- Shared connector-polyline geometry factored out of `diagram::connector` (sampling arcs
+  and `MultiPoint`), reused by the renderer and the metric.
+- `count_view_crossings` (`mod.rs`) refactored to count on polylines instead of straight
+  chords.
+- Unit tests on hand-built tiny views with known geometry (two boxes overlapping by a known
+  fraction; two segments crossing once; shared-endpoint connectors -> 0; a 1x10 bbox ->
+  known aspect penalty; an arc that crosses where its chord would not). Property tests:
+  overlap symmetric and scale-invariant; crossings invariant under translation/rotation.
+
+**Dependencies:** none.
+
+**Done when:** the metric terms match the hand-computed values, scale/translation
+invariance holds, the polyline crossing count differs from the old chord count on the
+constructed arc case, and `cargo test` passes. Covers `layout-quality-eval.AC1.*`,
+`layout-quality-eval.AC2.*`.
+<!-- END_PHASE_1 -->
+
+<!-- START_PHASE_2 -->
+### Phase 2: Statistics core
+**Goal:** Pure aggregation and significance testing for seed-sample distributions.
+
+**Components:**
+- `src/simlin-engine/src/layout/eval_stats.rs` (new) -- `MetricSample`, `ModelStats`,
+  `CorpusReport`, `Comparison`; `geomean`, `median`/percentile, and a Mann-Whitney U test;
+  `compare(baseline, candidate)` producing per-model and aggregate deltas with p-values.
+- Unit tests against known reference values (geomean of a known set; Mann-Whitney U on
+  textbook samples; identical baseline/candidate -> zero delta, non-significant).
+
+**Dependencies:** none (uses `LayoutMetrics` types from Phase 1 for `MetricSample`).
+
+**Done when:** the helpers match known values and `compare()` reports the expected
+significance verdicts. Covers `layout-quality-eval.AC4.4`, `layout-quality-eval.AC4.5`.
+<!-- END_PHASE_2 -->
+
+<!-- START_PHASE_3 -->
+### Phase 3: Corpus sweep driver and report
+**Goal:** An on-demand sweep that lays out, scores, renders, and reports over the corpus.
+
+**Components:**
+- `src/simlin-engine/examples/layout_eval.rs` (new) -- loads a curated corpus list
+  (canonical SIR/teacup/logistic-growth; modules; multipoint connectors; LTM/loop models;
+  aliases; the `test/ai-information` set; a few large `test/metasd` Vensim models),
+  runs M seeds per model, scores each, renders best/median/worst PNGs, and scores+renders
+  any shipped hand-authored view as a reference.
+- A per-seed layout entry point (`generate_layout_seeded(seed)`) over the existing per-seed
+  pipeline, so the driver can sample seeds and compute the best-of-k proxy.
+- Emits `metrics.json`, `index.html` contact-sheet, and a `compare()` diff against a
+  committed `baseline.json`, under `target/layout-eval/` (gitignored).
+- `Cargo.toml` example entry with `required-features = ["png_render", "file_io"]`.
+
+**Dependencies:** Phase 1 (metric), Phase 2 (stats).
+
+**Done when:** `cargo run --release --example layout_eval` completes, writes the JSON +
+contact-sheet referencing best/median/worst (and reference) renders, reports per-model
+median+spread / corpus geomean / best-of-k and a baseline delta with significance, places
+artifacts under `target/`, and skips (reports, non-fatally) any model that fails to lay out
+or render. Covers `layout-quality-eval.AC3.*`, `layout-quality-eval.AC4.1`,
+`layout-quality-eval.AC4.2`, `layout-quality-eval.AC4.3`.
+<!-- END_PHASE_3 -->
+
+<!-- START_PHASE_4 -->
+### Phase 4: Calibration and reference-pair validation
+**Goal:** Commit metric weights that match the user's taste, validated objectively.
+
+**Components:**
+- Committed default `MetricWeights` (overlap + crossings dominant; sprawl/aspect moderate;
+  structure terms 0), set via a talk-through over the Phase 3 contact-sheet, treating the
+  user's "this layout is better than that" judgments as ordering constraints on the linear
+  cost.
+- A reference-pair fixture (agreed human-vs-AI model pairs, e.g. from `test/ai-information`)
+  and a test asserting `weighted_cost(human) < weighted_cost(ai)` under the committed
+  weights.
+
+**Dependencies:** Phase 3 (need the contact-sheet to calibrate against), Phase 1.
+
+**Done when:** the committed weights satisfy the reference-pair ordering test, and the user
+has signed off on the weights after reviewing the contact-sheet. Covers
+`layout-quality-eval.AC5.*`.
+<!-- END_PHASE_4 -->
+
+<!-- START_PHASE_5 -->
+### Phase 5: Rung 0 wiring + CI regression guard
+**Goal:** Make seed selection use the full metric, and protect the gains in normal dev.
+
+**Components:**
+- `select_best_layout` (`mod.rs`) re-pointed to minimize `weighted_cost` (accurate
+  crossings), tie-break on seed.
+- A deterministic regression-guard test over a few tiny models asserting `weighted_cost`
+  stays at or below a committed threshold (fixed seeds; fast; under the time budget).
+- Confirm existing layout tests (`tests/layout.rs`, `layout_tests.rs`,
+  `layout_review_tests.rs`) still pass with the new selection.
+
+**Dependencies:** Phase 1 (metric), Phase 4 (committed weights).
+
+**Done when:** selection picks the lowest-`weighted_cost` candidate (verified on
+constructed candidates where lowest-cost differs from fewest-crossings), the guard test
+passes within budget, and the existing layout suite is green. Covers
+`layout-quality-eval.AC6.*`, `layout-quality-eval.AC7.*`.
+<!-- END_PHASE_5 -->
+
+## Additional Considerations
+
+**The hill-climbing ladder beyond this plan (rungs 1-3).** Rung 0 (Phase 5) is the only
+algorithm change built here. The forward path, each rung measured by the Phase 3 sweep with
+the Phase 2 significance gate and guarded by the rendered contact-sheet:
+
+- **Rung 1 -- parameter search.** Sweep SFDP `k`, `c`, `p`, the spacing constants, the seed
+  count, and SA temperature/iterations (`config.rs`, `sfdp.rs`, `annealing.rs`) against the
+  corpus geomean. No algorithm change; pure config search (grid/coordinate descent).
+- **Rung 2 -- metric-driven search objective.** Feed `weighted_cost` into the SA acceptance
+  delta (`annealing.rs`, currently `perturbed_crossings - current_crossings`) so the search
+  optimizes the full metric, not just crossings. Higher leverage but costlier per
+  perturbation than a crossing count, so it is a deliberate, measured experiment -- and may
+  use a cheap subset of terms in the inner loop.
+- **Rung 3 -- new passes.** Targeted code such as an overlap-removal post-pass or
+  obstacle-aware connector routing, each validated against the corpus.
+
+**Goodhart guard.** A scalar fitness will be gamed by any optimizer. Three mitigations are
+built in: per-term breakdowns stay visible (not just the scalar); the contact-sheet's
+best/median/worst renders are inspected every iteration (a change that improves the number
+but worsens the picture means the *metric* is wrong, not the layout); and the reference-pair
+test fails if weights stop agreeing with human-judged-better layouts.
+
+**Determinism vs. statistical sampling.** These serve different needs. The CI guard uses
+fixed seeds (deterministic, fast, flake-free). The interactive sweep varies seeds to
+characterize the algorithm's quality distribution, because a single fixed-seed measurement
+cannot distinguish a real improvement from seed luck. A specific bad layout remains exactly
+reproducible by its seed for debugging.
+
+**Sweep cost.** M seeds x corpus x (layout + a few renders) is minutes-scale on the large
+`test/metasd` models; acceptable for an on-demand example, which is why it is not in the
+test suite. M and the large-model tier are configurable.
+
+**Metric/render geometry agreement.** Computing the metric from the renderer's own geometry
+helpers (rather than the `LayoutConfig` element sizes) guarantees the score reflects what
+the PNG shows -- including the connector-polyline sampling that both the renderer and the
+crossing count share.

From f27ac2b66763d3835854e96206ada6d1682bdd6d Mon Sep 17 00:00:00 2001
From: Bobby Powers <bobbypowers@gmail.com>
Date: Fri, 22 May 2026 18:14:01 -0700
Subject: [PATCH 02/38] doc: refine layout quality eval plan per review

Addresses a codebase-accuracy review of the design plan so it does not
mislead the implementation planner:

- The diagram geometry helpers (elements/flow/label/connector) live in
  private modules today, so Phase 1 must first expose them pub(crate)
  before layout::metrics can reuse them; this was previously presented
  as free.
- The Arc->polyline factoring must keep render_svg byte-identical (a
  TS-vs-Rust parity test asserts it); flagged as the highest-effort
  Phase 1 item with that constraint as an acceptance condition.
- MultiPoint links currently render to nothing, so AC2.2 focuses on the
  Arc case and MultiPoint is documented as a known renderer gap.
- Corpus loading uses open_xmile/open_vensim + salsa sync (verify_layout
  is only an assertion helper, not a loader), and the per-seed seam is
  the existing generate_layout_with_config, not a new function.
- AC8.1 (determinism) is now covered by Phase 5; AC8.2 is noted as
  satisfied by the design document itself.
---
 .../2026-05-22-layout-quality-eval.md         | 101 +++++++++++-------
 1 file changed, 63 insertions(+), 38 deletions(-)

diff --git a/docs/design-plans/2026-05-22-layout-quality-eval.md b/docs/design-plans/2026-05-22-layout-quality-eval.md
index 91f1ebc60..41195892a 100644
--- a/docs/design-plans/2026-05-22-layout-quality-eval.md
+++ b/docs/design-plans/2026-05-22-layout-quality-eval.md
@@ -51,9 +51,11 @@ gaps.
    reserved zero-weighted structure terms. `weighted_cost(&MetricWeights) -> f64` collapses
    them to one scalar to minimize.
 
-2. **Edge crossings are counted on real geometry** -- arcs and multipoint connectors
-   sampled to polylines -- replacing the straight-chord approximation that
-   `count_view_crossings` (`mod.rs`) uses today.
+2. **Edge crossings are counted on real geometry** -- Arc links sampled to polylines
+   instead of straight chords -- fixing the chord approximation `count_view_crossings`
+   (`mod.rs`) applies to `Link`/Arc shapes today (flow polylines are already
+   segment-sampled). MultiPoint links currently render to nothing; see Additional
+   Considerations.
 
 3. **A Rust in-tree corpus sweep driver** (`src/simlin-engine/examples/layout_eval.rs`)
    runs over a curated `test/` corpus: for each model it generates layouts across multiple
@@ -82,7 +84,7 @@ gaps.
 
 8. **The hill-climbing ladder (rungs 1-3) is documented** as the forward path (parameter
    search; metric-driven annealing cost; new layout passes), naming the seam each rung
-   touches.
+   touches. (Satisfied by this plan's Additional Considerations -- no implementation task.)
 
 ### Out of scope
 - Redesigning the layout algorithm beyond Rung 0 (rungs 1-3 are documented, not built).
@@ -106,7 +108,7 @@ gaps.
 
 ### layout-quality-eval.AC2: Crossings are counted on real geometry
 - **layout-quality-eval.AC2.1 Success:** Two connectors that cross once yield a crossing count of 1; connectors sharing an endpoint yield 0.
-- **layout-quality-eval.AC2.2 Success:** An Arc/MultiPoint connector that visually crosses another edge is counted via polyline sampling, on a constructed case where the straight-chord approximation does not count it.
+- **layout-quality-eval.AC2.2 Success:** An Arc connector that visually crosses another edge is counted via polyline sampling, on a constructed case where the straight-chord approximation does not count it. (MultiPoint links currently render to nothing, so faithfully counting them is deferred with that renderer gap -- see Additional Considerations.)
 - **layout-quality-eval.AC2.3 Success:** The crossing count is invariant under translation and rotation of the whole view.
 
 ### layout-quality-eval.AC3: Corpus sweep produces renders and scores
@@ -138,7 +140,7 @@ gaps.
 
 ### layout-quality-eval.AC8: Cross-cutting
 - **layout-quality-eval.AC8.1 Success:** A fixed seed reproduces a byte-identical layout (determinism), distinct from the M-seed statistical sampling.
-- **layout-quality-eval.AC8.2 Success:** Additional Considerations documents rungs 1-3 and names the seam each touches.
+- **layout-quality-eval.AC8.2 Success:** Additional Considerations documents rungs 1-3 and names the seam each touches. (Satisfied by this design document itself; no implementation phase.)
 
 ## Glossary
 
@@ -232,7 +234,9 @@ pure function with no I/O. It is computed on the **same geometry the renderer dr
 node bounding boxes, connector paths, and label boxes obtained from the `diagram` module's
 existing geometry helpers (`diagram::elements`/`flow` `*_bounds`, `diagram::connector`
 path, `diagram::label::label_bounds`) -- so a layout's score and its rendered PNG can never
-disagree. Every term is a **cost** (0 = ideal) and normalized to be scale-free, so models
+disagree. Those helpers are `pub fn`, but their modules (`elements`, `flow`, `label`,
+`connector`) are private in `diagram/mod.rs` today, so a prerequisite is exposing them
+`pub(crate)` for `layout` to call. Every term is a **cost** (0 = ideal) and normalized to be scale-free, so models
 of different sizes are comparable and the corpus can be aggregated.
 
 | Term | Definition (cost; 0 = ideal) | Pain it captures |
@@ -274,11 +278,14 @@ directions (compact vs. non-overlapping). That tension is intended: the weights
 balance, and the overlap terms keep "minimize area" from collapsing the layout.
 
 **Accurate crossings.** The `crossings` term, and a refactored `count_view_crossings`,
-operate on connector geometry sampled to polylines (arcs and `MultiPoint` links, plus
-`Flow.points`), not straight chords. This requires factoring the connector's path geometry
-into a polyline producer shared by the renderer and the metric, so both see identical
-geometry. This both feeds the metric and fixes a latent undercount in today's
-seed selection.
+operate on connector geometry sampled to polylines (Arc links plus `Flow.points`), not
+straight chords. This requires factoring the arc geometry -- currently entangled with
+SVG-string emission in `connector::render_arc` (which returns a `String`) -- into a polyline
+producer shared by the renderer and the metric, so both see identical geometry. This is the
+highest-effort item in Phase 1, and the factor-out must keep `render_svg` byte-for-byte
+unchanged (a TS-vs-Rust parity test asserts it). It both feeds the metric and fixes a latent
+undercount in today's seed selection. (MultiPoint links currently render to an empty group,
+so they have no drawn geometry to match; they are a known gap, not measured here.)
 
 ### Statistics core (`layout/eval_stats.rs`, pure)
 
@@ -316,12 +323,14 @@ noise. All are pure, table-testable functions.
 
 ### Sweep driver (`examples/layout_eval.rs`, imperative shell)
 
-The shell loads each model in a curated corpus list (via the engine's model-open + salsa
-sync path used by `tests/layout.rs`), and for each model:
+The shell loads each model in a curated corpus list (XMILE via `open_xmile` and Vensim via
+`open_vensim`, as `examples/backend_bench.rs` does, then salsa-syncs the project as the
+DB-backed layout tests do), and for each model:
 
 1. Runs layout for M independent seeds, producing M `MetricSample`s (and the best-of-k
-   production proxy). Requires a per-seed layout entry point (a thin
-   `generate_layout_seeded(seed)` over the existing per-seed pipeline).
+   production proxy). The per-seed seam is the existing `generate_layout_with_config`
+   (`mod.rs`, `pub`) -- its single `annealing_random_seed` drives both the SFDP and
+   annealing RNGs -- or the equivalent `generate` closure inside `generate_best_layout`.
 2. Renders the best/median/worst layouts to PNG via `diagram::render_png` (after writing
    the generated `StockFlow` onto the model's view, which `render_png` reads as
    `views.first()`).
@@ -372,14 +381,21 @@ one in-tree example, and re-points one existing decision function.
   Roboto-Light, behind the `png_render` feature), with geometry in `elements.rs`,
   `flow.rs`, `connector.rs`, `label.rs` (`label_bounds`), `common.rs` (`Rect`,
   `calc_view_box`), and shared `constants.rs`. The metric reuses these geometry helpers so
-  scores match the rendered image. The TS renderer is byte-identical to `render_svg`, so
-  the PNG faithfully reflects the product UI.
+  scores match the rendered image -- but only `common`/`constants` are `pub mod` today, so
+  the others must be exposed (see Architecture). `render_svg` is asserted byte-identical to
+  the TS renderer by `src/diagram/tests/svg-rendering.test.ts`, so the PNG faithfully
+  reflects the product UI -- and that test is the tripwire any connector-geometry refactor
+  must not break.
 - **In-tree example precedent.** `src/simlin-engine/examples/backend_bench.rs` is an
-  existing on-demand benchmark example (VM-vs-wasm simulation); `examples/layout_eval.rs`
-  follows the same shape and `required-features` mechanism for an on-demand sweep.
-- **Corpus loading.** `tests/layout.rs` (`verify_layout`) already loads native diagram
-  models from `test/` (e.g. `open_xmile`) and syncs the salsa DB before layout; the sweep
-  driver reuses that loading path.
+  existing on-demand example (auto-discovered; loads models via `std::fs` +
+  `open_vensim`/`open_xmile`). `examples/layout_eval.rs` follows its shape; the
+  `required-features` mechanism (used today by the crate's `[[test]]` entries, not by any
+  example) means adding a new `[[example]]` block to `Cargo.toml`.
+- **Corpus loading.** `tests/layout.rs` loads XMILE via `load_project`/`open_xmile`; its
+  DB-backed tests show the salsa-sync-then-layout pattern (`SimlinDb::default()` ->
+  `sync_from_datamodel_incremental` -> pass `Some((&mut db, source_project))`). The sweep
+  combines that with `open_vensim` for the Vensim `test/metasd` models. (`verify_layout`
+  itself is only an assertion helper, not a loader.)
 - **Test-time budget.** Per `CLAUDE.md` / `docs/dev/rust.md`, `cargo test --workspace`
   runs under a 3-minute cap and individual tests complete in seconds. The full corpus sweep
   therefore stays in the example (not in tests); only a tiny deterministic guard runs in the
@@ -397,13 +413,17 @@ an existing example, one edit to an existing selection function.
 **Goal:** A pure, geometry-accurate `LayoutMetrics` and a polyline-based crossing count.
 
 **Components:**
+- Expose the `diagram` geometry modules (`elements`, `flow`, `label`, `connector`) as
+  `pub(crate)` -- they are private today, so `layout::metrics` cannot call their `*_bounds` /
+  path helpers without this.
 - `src/simlin-engine/src/layout/metrics.rs` (new) -- `LayoutMetrics`, `MetricWeights`,
   `compute_layout_metrics(view, config)`, `weighted_cost`. Each term computed on the
   `diagram` module's geometry helpers.
-- Shared connector-polyline geometry factored out of `diagram::connector` (sampling arcs
-  and `MultiPoint`), reused by the renderer and the metric.
+- Connector arc-to-polyline geometry factored out of `connector::render_arc` (highest-effort
+  item; geometry is currently entangled with SVG-string building), reused by the renderer and
+  the metric. The renderer must be re-routed through it without changing its output.
 - `count_view_crossings` (`mod.rs`) refactored to count on polylines instead of straight
-  chords.
+  chords (Arc/`Link` shapes; flow polylines are already sampled).
 - Unit tests on hand-built tiny views with known geometry (two boxes overlapping by a known
   fraction; two segments crossing once; shared-endpoint connectors -> 0; a 1x10 bbox ->
   known aspect penalty; an arc that crosses where its chord would not). Property tests:
@@ -413,7 +433,8 @@ an existing example, one edit to an existing selection function.
 
 **Done when:** the metric terms match the hand-computed values, scale/translation
 invariance holds, the polyline crossing count differs from the old chord count on the
-constructed arc case, and `cargo test` passes. Covers `layout-quality-eval.AC1.*`,
+constructed arc case, `render_svg` output is unchanged (the `svg-rendering.test.ts` parity
+test still passes), and `cargo test` passes. Covers `layout-quality-eval.AC1.*`,
 `layout-quality-eval.AC2.*`.
 <!-- END_PHASE_1 -->
 
@@ -428,7 +449,7 @@ constructed arc case, and `cargo test` passes. Covers `layout-quality-eval.AC1.*
 - Unit tests against known reference values (geomean of a known set; Mann-Whitney U on
   textbook samples; identical baseline/candidate -> zero delta, non-significant).
 
-**Dependencies:** none (uses `LayoutMetrics` types from Phase 1 for `MetricSample`).
+**Dependencies:** Phase 1 (the `LayoutMetrics` type embedded in `MetricSample`).
 
 **Done when:** the helpers match known values and `compare()` reports the expected
 significance verdicts. Covers `layout-quality-eval.AC4.4`, `layout-quality-eval.AC4.5`.
@@ -441,14 +462,17 @@ significance verdicts. Covers `layout-quality-eval.AC4.4`, `layout-quality-eval.
 **Components:**
 - `src/simlin-engine/examples/layout_eval.rs` (new) -- loads a curated corpus list
   (canonical SIR/teacup/logistic-growth; modules; multipoint connectors; LTM/loop models;
-  aliases; the `test/ai-information` set; a few large `test/metasd` Vensim models),
-  runs M seeds per model, scores each, renders best/median/worst PNGs, and scores+renders
-  any shipped hand-authored view as a reference.
-- A per-seed layout entry point (`generate_layout_seeded(seed)`) over the existing per-seed
-  pipeline, so the driver can sample seeds and compute the best-of-k proxy.
+  aliases; the `test/ai-information` set; a few large `test/metasd` Vensim models) via
+  `open_xmile`/`open_vensim` + salsa sync, runs M seeds per model, scores each, renders
+  best/median/worst PNGs, and scores+renders any shipped hand-authored view as a reference.
+- The per-seed seam: wrap `generate_layout_with_config` (`mod.rs`) or the `generate` closure
+  in `generate_best_layout`, varying `annealing_random_seed` per sample, so the driver can
+  sample seeds and compute the best-of-k proxy.
 - Emits `metrics.json`, `index.html` contact-sheet, and a `compare()` diff against a
   committed `baseline.json`, under `target/layout-eval/` (gitignored).
-- `Cargo.toml` example entry with `required-features = ["png_render", "file_io"]`.
+- A new `[[example]]` entry in `Cargo.toml` with `required-features = ["png_render",
+  "file_io"]` (no example uses `required-features` today; `file_io` helps load Vensim models
+  that reference external data, and AC3.6 skip-on-failure covers any that still fail).
 
 **Dependencies:** Phase 1 (metric), Phase 2 (stats).
 
@@ -488,16 +512,17 @@ has signed off on the weights after reviewing the contact-sheet. Covers
 - `select_best_layout` (`mod.rs`) re-pointed to minimize `weighted_cost` (accurate
   crossings), tie-break on seed.
 - A deterministic regression-guard test over a few tiny models asserting `weighted_cost`
-  stays at or below a committed threshold (fixed seeds; fast; under the time budget).
+  stays at or below a committed threshold (fixed seeds; fast; under the time budget), plus a
+  determinism check (the same seed reproduces a byte-identical layout).
 - Confirm existing layout tests (`tests/layout.rs`, `layout_tests.rs`,
   `layout_review_tests.rs`) still pass with the new selection.
 
 **Dependencies:** Phase 1 (metric), Phase 4 (committed weights).
 
 **Done when:** selection picks the lowest-`weighted_cost` candidate (verified on
-constructed candidates where lowest-cost differs from fewest-crossings), the guard test
-passes within budget, and the existing layout suite is green. Covers
-`layout-quality-eval.AC6.*`, `layout-quality-eval.AC7.*`.
+constructed candidates where lowest-cost differs from fewest-crossings), the guard +
+determinism tests pass within budget, and the existing layout suite is green. Covers
+`layout-quality-eval.AC6.*`, `layout-quality-eval.AC7.*`, `layout-quality-eval.AC8.1`.
 <!-- END_PHASE_5 -->
 
 ## Additional Considerations

From c33fcd374ae4d9e57d5e8ba9167a1da60d220451 Mon Sep 17 00:00:00 2001
From: Bobby Powers <bobbypowers@gmail.com>
Date: Fri, 22 May 2026 20:47:15 -0700
Subject: [PATCH 03/38] engine: expose diagram geometry modules to layout
 (pub(crate))

---
 src/simlin-engine/src/diagram/mod.rs | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/src/simlin-engine/src/diagram/mod.rs b/src/simlin-engine/src/diagram/mod.rs
index 32742b326..f9cd8f614 100644
--- a/src/simlin-engine/src/diagram/mod.rs
+++ b/src/simlin-engine/src/diagram/mod.rs
@@ -4,11 +4,11 @@
 
 mod arrowhead;
 pub mod common;
-mod connector;
+pub(crate) mod connector;
 pub mod constants;
-mod elements;
-mod flow;
-mod label;
+pub(crate) mod elements;
+pub(crate) mod flow;
+pub(crate) mod label;
 mod render;
 #[cfg(feature = "png_render")]
 mod render_png;

From 39c09f06ed02050661ecdc82c10b652773369c93 Mon Sep 17 00:00:00 2001
From: Bobby Powers <bobbypowers@gmail.com>
Date: Fri, 22 May 2026 20:52:16 -0700
Subject: [PATCH 04/38] engine: add rect overlap + segment-clip helpers to
 diagram::common

---
 src/simlin-engine/src/diagram/common.rs | 273 ++++++++++++++++++++++++
 1 file changed, 273 insertions(+)

diff --git a/src/simlin-engine/src/diagram/common.rs b/src/simlin-engine/src/diagram/common.rs
index cf4a16596..e6a10d789 100644
--- a/src/simlin-engine/src/diagram/common.rs
+++ b/src/simlin-engine/src/diagram/common.rs
@@ -137,6 +137,89 @@ pub fn rad_to_deg(r: f64) -> f64 {
     (r * 180.0) / PI
 }
 
+// These rectangle/segment geometry primitives are the load-bearing helpers for
+// the layout quality metric (`layout::metrics`, added in a later task of this
+// phase). They are exercised by the inline tests below; the production callers
+// (node-overlap, label-overlap, and node-connector-overlap terms) land in
+// subsequent tasks, so each is `#[allow(dead_code)]` until then.
+
+/// Width of a rect (right - left). May be negative for a degenerate/inverted rect.
+#[allow(dead_code)]
+pub(crate) fn rect_width(r: &Rect) -> f64 {
+    r.right - r.left
+}
+
+/// Height of a rect (bottom - top).
+#[allow(dead_code)]
+pub(crate) fn rect_height(r: &Rect) -> f64 {
+    r.bottom - r.top
+}
+
+/// Area of a rect, clamped to >= 0.
+#[allow(dead_code)]
+pub(crate) fn rect_area(r: &Rect) -> f64 {
+    (rect_width(r).max(0.0)) * (rect_height(r).max(0.0))
+}
+
+/// Area of the axis-aligned intersection of two rects (0 if they do not overlap).
+#[allow(dead_code)]
+pub(crate) fn rect_overlap_area(a: &Rect, b: &Rect) -> f64 {
+    let w = a.right.min(b.right) - a.left.max(b.left);
+    let h = a.bottom.min(b.bottom) - a.top.max(b.top);
+    if w > 0.0 && h > 0.0 { w * h } else { 0.0 }
+}
+
+/// True if `p` lies inside (or on the boundary of) `r`.
+#[allow(dead_code)]
+pub(crate) fn rect_contains_point(r: &Rect, p: &Point) -> bool {
+    p.x >= r.left && p.x <= r.right && p.y >= r.top && p.y <= r.bottom
+}
+
+/// Length of the portion of segment p0->p1 that lies within axis-aligned rect r.
+/// Returns 0 if the segment never enters r. Pure; no allocation.
+#[allow(dead_code)]
+pub(crate) fn segment_length_in_rect(p0: &Point, p1: &Point, r: &Rect) -> f64 {
+    // Liang-Barsky clip of the parametric segment p0 + t*(p1-p0), t in [0,1],
+    // against left/right/top/bottom slabs.
+    let dx = p1.x - p0.x;
+    let dy = p1.y - p0.y;
+    let mut t0 = 0.0_f64;
+    let mut t1 = 1.0_f64;
+    // (p, q) pairs for the four half-planes; segment inside slab where p*t <= q.
+    let edges = [
+        (-dx, p0.x - r.left),
+        (dx, r.right - p0.x),
+        (-dy, p0.y - r.top),
+        (dy, r.bottom - p0.y),
+    ];
+    for (p, q) in edges {
+        if p == 0.0 {
+            if q < 0.0 {
+                return 0.0; // parallel and outside this slab
+            }
+        } else {
+            let t = q / p;
+            if p < 0.0 {
+                if t > t1 {
+                    return 0.0;
+                }
+                if t > t0 {
+                    t0 = t;
+                }
+            } else {
+                if t < t0 {
+                    return 0.0;
+                }
+                if t < t1 {
+                    t1 = t;
+                }
+            }
+        }
+    }
+    let seg_len = (dx * dx + dy * dy).sqrt();
+    (t1 - t0).max(0.0) * seg_len
+}
+
 #[cfg(test)]
 mod tests {
     use super::*;
@@ -282,4 +365,194 @@ mod tests {
         assert!((rad_to_deg(PI) - 180.0).abs() < 1e-10);
         assert!((rad_to_deg(PI / 2.0) - 90.0).abs() < 1e-10);
     }
+
+    #[test]
+    fn test_rect_dimensions() {
+        let r = Rect {
+            top: 10.0,
+            left: 20.0,
+            right: 50.0,
+            bottom: 70.0,
+        };
+        assert_eq!(rect_width(&r), 30.0);
+        assert_eq!(rect_height(&r), 60.0);
+        assert_eq!(rect_area(&r), 30.0 * 60.0);
+    }
+
+    #[test]
+    fn test_rect_area_clamps_negative() {
+        // An inverted/degenerate rect (right < left, bottom < top) has
+        // negative width/height; rect_area clamps each to 0 so the result is 0.
+        let inverted = Rect {
+            top: 70.0,
+            left: 50.0,
+            right: 20.0,
+            bottom: 10.0,
+        };
+        assert!(rect_width(&inverted) < 0.0);
+        assert!(rect_height(&inverted) < 0.0);
+        assert_eq!(rect_area(&inverted), 0.0);
+    }
+
+    #[test]
+    fn test_rect_overlap_area_known_overlap() {
+        // a covers x in [0,10], y in [0,10]; b covers x in [5,15], y in [5,15].
+        // Their intersection is x in [5,10], y in [5,10] => 5 x 5 = 25.
+        let a = Rect {
+            top: 0.0,
+            left: 0.0,
+            right: 10.0,
+            bottom: 10.0,
+        };
+        let b = Rect {
+            top: 5.0,
+            left: 5.0,
+            right: 15.0,
+            bottom: 15.0,
+        };
+        assert_eq!(rect_overlap_area(&a, &b), 25.0);
+        // Overlap is symmetric in argument order.
+        assert_eq!(rect_overlap_area(&b, &a), 25.0);
+    }
+
+    #[test]
+    fn test_rect_overlap_area_disjoint() {
+        let a = Rect {
+            top: 0.0,
+            left: 0.0,
+            right: 10.0,
+            bottom: 10.0,
+        };
+        let b = Rect {
+            top: 20.0,
+            left: 20.0,
+            right: 30.0,
+            bottom: 30.0,
+        };
+        assert_eq!(rect_overlap_area(&a, &b), 0.0);
+    }
+
+    #[test]
+    fn test_rect_overlap_area_identical() {
+        // Two identical rects overlap by their full area.
+        let r = Rect {
+            top: 0.0,
+            left: 0.0,
+            right: 10.0,
+            bottom: 4.0,
+        };
+        assert_eq!(rect_overlap_area(&r, &r), rect_area(&r));
+        assert_eq!(rect_overlap_area(&r, &r), 40.0);
+    }
+
+    #[test]
+    fn test_rect_overlap_area_touching_edge() {
+        // b's left edge touches a's right edge (both at x=10): zero-width overlap => 0.
+        let a = Rect {
+            top: 0.0,
+            left: 0.0,
+            right: 10.0,
+            bottom: 10.0,
+        };
+        let b = Rect {
+            top: 0.0,
+            left: 10.0,
+            right: 20.0,
+            bottom: 10.0,
+        };
+        assert_eq!(rect_overlap_area(&a, &b), 0.0);
+    }
+
+    #[test]
+    fn test_rect_contains_point() {
+        let r = Rect {
+            top: 0.0,
+            left: 0.0,
+            right: 10.0,
+            bottom: 10.0,
+        };
+        // Strictly inside.
+        assert!(rect_contains_point(&r, &Point { x: 5.0, y: 5.0 }));
+        // On the boundary (inclusive).
+        assert!(rect_contains_point(&r, &Point { x: 0.0, y: 0.0 }));
+        assert!(rect_contains_point(&r, &Point { x: 10.0, y: 10.0 }));
+        assert!(rect_contains_point(&r, &Point { x: 0.0, y: 5.0 }));
+        // Outside on each side.
+        assert!(!rect_contains_point(&r, &Point { x: -1.0, y: 5.0 }));
+        assert!(!rect_contains_point(&r, &Point { x: 11.0, y: 5.0 }));
+        assert!(!rect_contains_point(&r, &Point { x: 5.0, y: -1.0 }));
+        assert!(!rect_contains_point(&r, &Point { x: 5.0, y: 11.0 }));
+    }
+
+    #[test]
+    fn test_segment_length_in_rect_crosses_fully() {
+        // Rect spans x in [0,10], y in [0,10]. A horizontal segment from
+        // (-5, 5) to (15, 5) enters at x=0 and exits at x=10 => inside length 10.
+        let r = Rect {
+            top: 0.0,
+            left: 0.0,
+            right: 10.0,
+            bottom: 10.0,
+        };
+        let got =
+            segment_length_in_rect(&Point { x: -5.0, y: 5.0 }, &Point { x: 15.0, y: 5.0 }, &r);
+        assert!((got - 10.0).abs() < 1e-9, "got {got}");
+    }
+
+    #[test]
+    fn test_segment_length_in_rect_entirely_outside() {
+        let r = Rect {
+            top: 0.0,
+            left: 0.0,
+            right: 10.0,
+            bottom: 10.0,
+        };
+        // Segment well above the rect, never enters.
+        let got =
+            segment_length_in_rect(&Point { x: -5.0, y: 50.0 }, &Point { x: 15.0, y: 50.0 }, &r);
+        assert_eq!(got, 0.0);
+    }
+
+    #[test]
+    fn test_segment_length_in_rect_entirely_inside() {
+        let r = Rect {
+            top: 0.0,
+            left: 0.0,
+            right: 10.0,
+            bottom: 10.0,
+        };
+        // Segment from (2,2) to (5,6): both endpoints inside; full length is
+        // sqrt(3^2 + 4^2) = 5.
+        let got = segment_length_in_rect(&Point { x: 2.0, y: 2.0 }, &Point { x: 5.0, y: 6.0 }, &r);
+        assert!((got - 5.0).abs() < 1e-9, "got {got}");
+    }
+
+    #[test]
+    fn test_segment_length_in_rect_one_endpoint_inside() {
+        let r = Rect {
+            top: 0.0,
+            left: 0.0,
+            right: 10.0,
+            bottom: 10.0,
+        };
+        // Horizontal segment from (5,5) (inside) to (25,5) (outside): the
+        // portion inside runs from x=5 to x=10 => length 5.
+        let got = segment_length_in_rect(&Point { x: 5.0, y: 5.0 }, &Point { x: 25.0, y: 5.0 }, &r);
+        assert!((got - 5.0).abs() < 1e-9, "got {got}");
+    }
+
+    #[test]
+    fn test_segment_length_in_rect_parallel_outside_slab() {
+        // A vertical segment to the left of the rect is parallel to the
+        // left/right slabs and outside them: dx == 0 with q < 0 => 0.
+        let r = Rect {
+            top: 0.0,
+            left: 0.0,
+            right: 10.0,
+            bottom: 10.0,
+        };
+        let got =
+            segment_length_in_rect(&Point { x: -5.0, y: -5.0 }, &Point { x: -5.0, y: 15.0 }, &r);
+        assert_eq!(got, 0.0);
+    }
 }

From d643bdbc0e81b588be39e4450594b3214f85cd13 Mon Sep 17 00:00:00 2001
From: Bobby Powers <bobbypowers@gmail.com>
Date: Fri, 22 May 2026 21:03:12 -0700
Subject: [PATCH 05/38] engine: factor arc geometry out of render_arc into a
 shared polyline producer

---
 src/simlin-engine/src/diagram/connector.rs | 310 +++++++++++++++++++--
 1 file changed, 281 insertions(+), 29 deletions(-)

diff --git a/src/simlin-engine/src/diagram/connector.rs b/src/simlin-engine/src/diagram/connector.rs
index a59f5a0e0..875c5ad49 100644
--- a/src/simlin-engine/src/diagram/connector.rs
+++ b/src/simlin-engine/src/diagram/connector.rs
@@ -13,6 +13,18 @@ use crate::diagram::common::{
 };
 use crate::diagram::constants::*;
 
+/// Number of straight segments used to approximate a drawn arc connector when
+/// producing its polyline for crossing detection and metric computation. 16
+/// segments closely tracks the curve: the maximum chord-to-arc deviation for a
+/// half-circle sampled this finely is well under a pixel at typical diagram
+/// radii, which is more than enough to detect whether the arc crosses another
+/// edge. It does not affect rendered SVG (the renderer still emits a single
+/// `A` arc command); it only governs the sampled geometry the metric sees.
+// Production callers (crate::layout::count_view_crossings / metrics.rs) arrive
+// in Tasks 4 and 5; remove this allow when the first crate-side caller lands.
+#[allow(dead_code)]
+pub(crate) const ARC_POLYLINE_SAMPLES: usize = 16;
+
 enum ElementShape {
     Circle { r: f64 },
     Rect { hw: f64, hh: f64 },
@@ -101,7 +113,10 @@ fn is_element_arrayed(element: &ViewElement, is_arrayed_fn: &dyn Fn(&str) -> boo
     }
 }
 
-fn get_visual_center(element: &ViewElement, is_arrayed_fn: &dyn Fn(&str) -> bool) -> (f64, f64) {
+pub(crate) fn get_visual_center(
+    element: &ViewElement,
+    is_arrayed_fn: &dyn Fn(&str) -> bool,
+) -> (f64, f64) {
     let (cx, cy) = match element {
         ViewElement::Aux(a) => (a.x, a.y),
         ViewElement::Stock(s) => (s.x, s.y),
@@ -140,7 +155,7 @@ fn circle_from_points(p1: Point, p2: Point, p3: Point) -> Result<Circle, &'stati
     Ok(Circle { x: cx, y: cy, r })
 }
 
-fn opposite_theta(theta: f64) -> f64 {
+pub(crate) fn opposite_theta(theta: f64) -> f64 {
     let mut t = theta + PI;
     if t > PI {
         t -= 2.0 * PI;
@@ -148,7 +163,7 @@ fn opposite_theta(theta: f64) -> f64 {
     t
 }
 
-fn intersect_element_straight(
+pub(crate) fn intersect_element_straight(
     element: &ViewElement,
     theta: f64,
     is_arrayed_fn: &dyn Fn(&str) -> bool,
@@ -164,7 +179,7 @@ fn intersect_element_straight(
     }
 }
 
-fn intersect_element_arc(
+pub(crate) fn intersect_element_arc(
     element: &ViewElement,
     circ: &Circle,
     inv: bool,
@@ -215,7 +230,7 @@ fn intersect_element_arc(
     }
 }
 
-fn is_straight_line(
+pub(crate) fn is_straight_line(
     element: &view_element::Link,
     from: &ViewElement,
     to: &ViewElement,
@@ -234,7 +249,7 @@ fn is_straight_line(
     }
 }
 
-fn arc_circle(
+pub(crate) fn arc_circle(
     element: &view_element::Link,
     from: &ViewElement,
     to: &ViewElement,
@@ -342,24 +357,48 @@ fn render_straight_line(
     svg
 }
 
-fn render_arc(
+/// The exact scalars `render_arc` needs to format its SVG, plus what an arc
+/// sampler needs to reproduce the drawn curve as a polyline. All fields are
+/// raw f64 (no pre-rounding): rounding happens only at the `js_format_number`
+/// boundary in `render_arc`, so the SVG string stays byte-for-byte identical
+/// to the pre-factor-out code (and to the TypeScript renderer).
+#[derive(Clone, Copy)]
+struct ArcGeometry {
+    /// SVG path start (= `from_visual`, the source element center).
+    start: Point,
+    /// SVG path end (= `to_visual`, the target element center).
+    arc_end: Point,
+    /// Arc center and radius.
+    circ: Circle,
+    /// SVG large-arc-flag.
+    sweep: bool,
+    /// SVG sweep-flag.
+    inv: bool,
+    /// Arrowhead anchor point on the target element boundary.
+    end: Point,
+    /// Final arrowhead rotation in degrees (already adjusted for `inv`).
+    arrow_theta: f64,
+}
+
+/// Compute the drawn-arc geometry for a connector. Returns `None` in the two
+/// cases the renderer draws nothing: a non-`Arc` shape (e.g. `MultiPoint`) and
+/// a degenerate arc where `arc_circle` cannot be constructed. The body is the
+/// verbatim geometry the original `render_arc` computed (lines that produced
+/// `circ`, `inv`, `sweep`, `start`, `arc_end`, `end`, and `arrow_theta`).
+fn arc_geometry(
     element: &view_element::Link,
     from: &ViewElement,
     to: &ViewElement,
-    is_to_stock: bool,
     is_arrayed_fn: &dyn Fn(&str) -> bool,
-) -> String {
+) -> Option<ArcGeometry> {
     let from_visual = get_visual_center(from, is_arrayed_fn);
     let to_visual = get_visual_center(to, is_arrayed_fn);
 
-    let circ = match arc_circle(element, from, to, is_arrayed_fn) {
-        Some(c) => c,
-        None => return "<g></g>".to_string(),
-    };
+    let circ = arc_circle(element, from, to, is_arrayed_fn)?;
 
     let takeoff_angle = match &element.shape {
         LinkShape::Arc(arc) => deg_to_rad(*arc),
-        _ => return "<g></g>".to_string(),
+        _ => return None,
     };
 
     let from_theta = (from_visual.1 - circ.y).atan2(from_visual.0 - circ.x);
@@ -397,23 +436,126 @@ fn render_arc(
     };
     let end = intersect_element_arc(to, &circ, !inv, is_arrayed_fn);
 
-    let path = format!(
-        "M{},{}A{},{} 0 {},{} {},{}",
-        js_format_number(start.x),
-        js_format_number(start.y),
-        js_format_number(circ.r),
-        js_format_number(circ.r),
-        sweep as u8,
-        inv as u8,
-        js_format_number(arc_end.x),
-        js_format_number(arc_end.y)
-    );
-
     let mut arrow_theta = rad_to_deg((end.y - circ.y).atan2(end.x - circ.x)) - 90.0;
     if inv {
         arrow_theta += 180.0;
     }
 
+    Some(ArcGeometry {
+        start,
+        arc_end,
+        circ,
+        sweep,
+        inv,
+        end,
+        arrow_theta,
+    })
+}
+
+/// Sample the drawn SVG arc as a polyline from `g.start` to `g.arc_end` along
+/// `g.circ`, honoring the SVG large-arc (`g.sweep`) and sweep (`g.inv`) flags.
+/// Uses the standard SVG endpoint->center arc parametrization: derive the
+/// start angle and a signed sweep `delta` from the two endpoint angles, then
+/// adjust `delta` so its sign matches the sweep-flag and its magnitude matches
+/// the large-arc-flag. Returns `samples.max(2) + 1` points.
+// Reached only through `connector_polyline`, whose first non-test caller lands
+// in Task 4; remove this allow then.
+#[allow(dead_code)]
+fn sample_arc(g: &ArcGeometry, samples: usize) -> Vec<Point> {
+    let n = samples.max(2);
+    let theta0 = (g.start.y - g.circ.y).atan2(g.start.x - g.circ.x);
+    let theta1 = (g.arc_end.y - g.circ.y).atan2(g.arc_end.x - g.circ.x);
+    // SVG sweep-flag (g.inv) selects direction; large-arc-flag (g.sweep)
+    // selects the >180-degree arc. Normalize delta accordingly.
+    let mut delta = theta1 - theta0;
+    let two_pi = 2.0 * std::f64::consts::PI;
+    // bring delta into (-2pi, 2pi)
+    while delta <= -two_pi {
+        delta += two_pi;
+    }
+    while delta >= two_pi {
+        delta -= two_pi;
+    }
+    let sweep_positive = g.inv; // sweep-flag set => angles increase
+    if sweep_positive && delta < 0.0 {
+        delta += two_pi;
+    }
+    if !sweep_positive && delta > 0.0 {
+        delta -= two_pi;
+    }
+    let large = g.sweep; // large-arc-flag
+    if large && delta.abs() < std::f64::consts::PI {
+        delta += if delta >= 0.0 { two_pi } else { -two_pi };
+    }
+    if !large && delta.abs() > std::f64::consts::PI {
+        delta += if delta >= 0.0 { -two_pi } else { two_pi };
+    }
+    (0..=n)
+        .map(|i| {
+            let t = i as f64 / n as f64;
+            let th = theta0 + delta * t;
+            Point {
+                x: g.circ.x + g.circ.r * th.cos(),
+                y: g.circ.y + g.circ.r * th.sin(),
+            }
+        })
+        .collect()
+}
+
+/// The polyline the renderer draws for a connector, as the metric/crossing
+/// code sees it. Straight links are clipped to element boundaries (matching
+/// `render_straight_line`); arcs are sampled center-to-center along the arc
+/// circle (matching `render_arc`, which draws start=from_visual to
+/// arc_end=to_visual); MultiPoint links return an empty vec because the
+/// renderer draws nothing for them today (known gap).
+// First non-test caller (crate::layout::count_view_crossings) lands in Task 4;
+// remove this allow then.
+#[allow(dead_code)]
+pub(crate) fn connector_polyline(
+    element: &view_element::Link,
+    from: &ViewElement,
+    to: &ViewElement,
+    is_arrayed_fn: &dyn Fn(&str) -> bool,
+    arc_samples: usize,
+) -> Vec<Point> {
+    if is_straight_line(element, from, to, is_arrayed_fn) {
+        let from_visual = get_visual_center(from, is_arrayed_fn);
+        let to_visual = get_visual_center(to, is_arrayed_fn);
+        let theta = (to_visual.1 - from_visual.1).atan2(to_visual.0 - from_visual.0);
+        let start = intersect_element_straight(from, theta, is_arrayed_fn);
+        let end = intersect_element_straight(to, opposite_theta(theta), is_arrayed_fn);
+        return vec![start, end];
+    }
+    match arc_geometry(element, from, to, is_arrayed_fn) {
+        None => Vec::new(), // MultiPoint or degenerate arc: renderer draws nothing
+        Some(g) => sample_arc(&g, arc_samples),
+    }
+}
+
+fn render_arc(
+    element: &view_element::Link,
+    from: &ViewElement,
+    to: &ViewElement,
+    is_to_stock: bool,
+    is_arrayed_fn: &dyn Fn(&str) -> bool,
+) -> String {
+    let g = match arc_geometry(element, from, to, is_arrayed_fn) {
+        Some(g) => g,
+        None => return "<g></g>".to_string(),
+    };
+
+    let path = format!(
+        "M{},{}A{},{} 0 {},{} {},{}",
+        js_format_number(g.start.x),
+        js_format_number(g.start.y),
+        js_format_number(g.circ.r),
+        js_format_number(g.circ.r),
+        g.sweep as u8,
+        g.inv as u8,
+        js_format_number(g.arc_end.x),
+        js_format_number(g.arc_end.y)
+    );
+
     let connector_class = if is_to_stock {
         "simlin-connector simlin-connector-dashed"
     } else {
@@ -432,9 +574,9 @@ fn render_arc(
         connector_class
     ));
     svg.push_str(&render_arrowhead(
-        end.x,
-        end.y,
-        arrow_theta,
+        g.end.x,
+        g.end.y,
+        g.arrow_theta,
         ARROWHEAD_RADIUS,
         ArrowheadType::Connector,
     ));
@@ -558,6 +700,116 @@ mod tests {
         assert!(svg.contains("simlin-arrowhead-link"));
     }
 
+    /// Byte-identical regression guard for the arc factor-out. The expected
+    /// string was captured from the pre-refactor `render_arc` output for this
+    /// exact Arc link; the geometry extraction must not change a single byte
+    /// (the `svg-rendering.test.ts` parity test asserts Rust SVG == TS SVG).
+    #[test]
+    fn test_render_arc_svg_byte_identical() {
+        let link = view_element::Link {
+            uid: 10,
+            from_uid: 1,
+            to_uid: 2,
+            shape: LinkShape::Arc(30.0),
+            polarity: None,
+        };
+        let from = make_aux_ve(100.0, 100.0, "a", 1);
+        let to = make_aux_ve(200.0, 200.0, "b", 2);
+
+        let svg = render_connector(&link, &from, &to, &not_arrayed);
+        let expected = "<g><path d=\"M100,100A273.20508075688764,273.20508075688764 0 0,1 200,200\" class=\"simlin-connector-bg\"></path><path d=\"M100,100A273.20508075688764,273.20508075688764 0 0,1 200,200\" class=\"simlin-connector\"></path><g><path d=\"M199.87072507234473,192.27852897536678L188.62072507234473,196.77852897536678A27,27 0 0,1 188.62072507234473,187.77852897536678z\" class=\"simlin-arrowhead-bg\" transform=\"rotate(58.1118629772876,195.37072507234473,192.27852897536678)\"></path><path d=\"M195.37072507234473,192.27852897536678L189.37072507234473,195.27852897536678A18,18 0 0,1 189.37072507234473,189.27852897536678z\" class=\"simlin-arrowhead-link\" transform=\"rotate(58.1118629772876,195.37072507234473,192.27852897536678)\"></path></g></g>";
+        assert_eq!(svg, expected);
+        assert!(svg.starts_with("<g><path d=\"M100,100A"));
+    }
+
+    #[test]
+    fn test_connector_polyline_straight_uses_boundary_endpoints() {
+        let link = view_element::Link {
+            uid: 10,
+            from_uid: 1,
+            to_uid: 2,
+            shape: LinkShape::Straight,
+            polarity: None,
+        };
+        let from = make_aux_ve(100.0, 100.0, "a", 1);
+        let to = make_aux_ve(200.0, 100.0, "b", 2);
+
+        let poly = connector_polyline(&link, &from, &to, &not_arrayed, ARC_POLYLINE_SAMPLES);
+        assert_eq!(poly.len(), 2, "straight link yields exactly two points");
+
+        // Endpoints are clipped to the element boundary (AUX_RADIUS), NOT the
+        // raw centers (100,100) and (200,100). theta = 0 along +x.
+        let expected_start = intersect_element_straight(&from, 0.0, &not_arrayed);
+        let expected_end = intersect_element_straight(&to, opposite_theta(0.0), &not_arrayed);
+        assert!((poly[0].x - expected_start.x).abs() < 1e-9);
+        assert!((poly[0].y - expected_start.y).abs() < 1e-9);
+        assert!((poly[1].x - expected_end.x).abs() < 1e-9);
+        assert!((poly[1].y - expected_end.y).abs() < 1e-9);
+        // Sanity: start is offset from the center by AUX_RADIUS, not at center.
+        assert!((poly[0].x - (100.0 + AUX_RADIUS)).abs() < 1e-9);
+        assert!((poly[1].x - (200.0 - AUX_RADIUS)).abs() < 1e-9);
+    }
+
+    #[test]
+    fn test_connector_polyline_arc_samples_on_circle() {
+        let link = view_element::Link {
+            uid: 10,
+            from_uid: 1,
+            to_uid: 2,
+            shape: LinkShape::Arc(30.0),
+            polarity: None,
+        };
+        let from = make_aux_ve(100.0, 100.0, "a", 1);
+        let to = make_aux_ve(200.0, 200.0, "b", 2);
+
+        let poly = connector_polyline(&link, &from, &to, &not_arrayed, ARC_POLYLINE_SAMPLES);
+        assert_eq!(
+            poly.len(),
+            ARC_POLYLINE_SAMPLES + 1,
+            "arc yields ARC_POLYLINE_SAMPLES segments => N+1 points"
+        );
+
+        // The drawn arc goes center-to-center (start = from_visual,
+        // arc_end = to_visual).
+        let first = poly.first().unwrap();
+        let last = poly.last().unwrap();
+        assert!((first.x - 100.0).abs() < 1e-6 && (first.y - 100.0).abs() < 1e-6);
+        assert!((last.x - 200.0).abs() < 1e-6 && (last.y - 200.0).abs() < 1e-6);
+
+        // Every sampled point lies on the arc circle.
+        let circ = arc_circle(&link, &from, &to, &not_arrayed).unwrap();
+        for p in &poly {
+            let d = (square(p.x - circ.x) + square(p.y - circ.y)).sqrt();
+            assert!(
+                (d - circ.r).abs() < 1e-6,
+                "point ({}, {}) not on arc circle: dist {} vs r {}",
+                p.x,
+                p.y,
+                d,
+                circ.r
+            );
+        }
+    }
+
+    #[test]
+    fn test_connector_polyline_multipoint_is_empty() {
+        let link = view_element::Link {
+            uid: 10,
+            from_uid: 1,
+            to_uid: 2,
+            shape: LinkShape::MultiPoint(vec![]),
+            polarity: None,
+        };
+        let from = make_aux_ve(100.0, 100.0, "a", 1);
+        let to = make_aux_ve(200.0, 200.0, "b", 2);
+
+        let poly = connector_polyline(&link, &from, &to, &not_arrayed, ARC_POLYLINE_SAMPLES);
+        assert!(
+            poly.is_empty(),
+            "MultiPoint links draw nothing, so the polyline is empty"
+        );
+    }
+
     // --- ray_rect_intersection tests ---
 
     fn assert_on_rect_boundary(p: Point, cx: f64, cy: f64, hw: f64, hh: f64) {

From 65e564c1639a981e827d7d11450457833666bcae Mon Sep 17 00:00:00 2001
From: Bobby Powers <bobbypowers@gmail.com>
Date: Fri, 22 May 2026 21:15:14 -0700
Subject: [PATCH 06/38] engine: count view crossings on sampled connector
 polylines

---
 src/simlin-engine/src/diagram/connector.rs    |   9 -
 .../src/layout/crossings_tests.rs             | 221 ++++++++++++++++++
 src/simlin-engine/src/layout/mod.rs           | 117 +++++++---
 3 files changed, 307 insertions(+), 40 deletions(-)
 create mode 100644 src/simlin-engine/src/layout/crossings_tests.rs

diff --git a/src/simlin-engine/src/diagram/connector.rs b/src/simlin-engine/src/diagram/connector.rs
index 875c5ad49..14ed42f0f 100644
--- a/src/simlin-engine/src/diagram/connector.rs
+++ b/src/simlin-engine/src/diagram/connector.rs
@@ -20,9 +20,6 @@ use crate::diagram::constants::*;
 /// radii, which is more than enough to detect whether the arc crosses another
 /// edge. It does not affect rendered SVG (the renderer still emits a single
 /// `A` arc command); it only governs the sampled geometry the metric sees.
-// Production callers (crate::layout::count_view_crossings / metrics.rs) arrive
-// in Tasks 4 and 5; remove this allow when the first crate-side caller lands.
-#[allow(dead_code)]
 pub(crate) const ARC_POLYLINE_SAMPLES: usize = 16;
 
 enum ElementShape {
@@ -458,9 +455,6 @@ fn arc_geometry(
 /// start angle and a signed sweep `delta` from the two endpoint angles, then
 /// adjust `delta` so its sign matches the sweep-flag and its magnitude matches
 /// the large-arc-flag. Returns `samples.max(2) + 1` points.
-// Reached only through `connector_polyline`, whose first non-test caller lands
-// in Task 4; remove this allow then.
-#[allow(dead_code)]
 fn sample_arc(g: &ArcGeometry, samples: usize) -> Vec<Point> {
     let n = samples.max(2);
     let theta0 = (g.start.y - g.circ.y).atan2(g.start.x - g.circ.x);
@@ -508,9 +502,6 @@ fn sample_arc(g: &ArcGeometry, samples: usize) -> Vec<Point> {
 /// circle (matching `render_arc`, which draws start=from_visual to
 /// arc_end=to_visual); MultiPoint links return an empty vec because the
 /// renderer draws nothing for them today (known gap).
-// First non-test caller (crate::layout::count_view_crossings) lands in Task 4;
-// remove this allow then.
-#[allow(dead_code)]
 pub(crate) fn connector_polyline(
     element: &view_element::Link,
     from: &ViewElement,
diff --git a/src/simlin-engine/src/layout/crossings_tests.rs b/src/simlin-engine/src/layout/crossings_tests.rs
new file mode 100644
index 000000000..35c239399
--- /dev/null
+++ b/src/simlin-engine/src/layout/crossings_tests.rs
@@ -0,0 +1,221 @@
+// Copyright 2026 The Simlin Authors. All rights reserved.
+// Use of this source code is governed by the Apache License,
+// Version 2.0, that can be found in the LICENSE file.
+
+//! Tests for the polyline-based `count_view_crossings` / `build_view_segments`
+//! (Phase 1, Task 4 of the layout quality eval). Kept in their own file so the
+//! `layout_tests.rs` integration suite stays under the per-file line cap.
+
+use super::*;
+
+fn cv_aux(uid: i32, x: f64, y: f64) -> ViewElement {
+    ViewElement::Aux(view_element::Aux {
+        name: format!("a{uid}"),
+        uid,
+        x,
+        y,
+        label_side: LabelSide::Bottom,
+        compat: None,
+    })
+}
+
+fn cv_module(uid: i32, x: f64, y: f64) -> ViewElement {
+    ViewElement::Module(view_element::Module {
+        name: format!("m{uid}"),
+        uid,
+        x,
+        y,
+        label_side: LabelSide::Bottom,
+    })
+}
+
+fn cv_link(uid: i32, from_uid: i32, to_uid: i32, shape: LinkShape) -> ViewElement {
+    ViewElement::Link(view_element::Link {
+        uid,
+        from_uid,
+        to_uid,
+        shape,
+        polarity: None,
+    })
+}
+
+fn cv_view(elements: Vec<ViewElement>) -> datamodel::StockFlow {
+    datamodel::StockFlow {
+        name: None,
+        elements,
+        view_box: Rect {
+            x: 0.0,
+            y: 0.0,
+            width: 1000.0,
+            height: 1000.0,
+        },
+        zoom: 1.0,
+        use_lettered_polarity: false,
+        font: None,
+        sketch_compat: None,
+    }
+}
+
+/// AC2.1: two straight links that cross once yield a crossing count of 1.
+#[test]
+fn test_count_view_crossings_two_straight_links_cross_once() {
+    // Link 1: a1(0,0) -> a2(100,100). Link 2: a3(0,100) -> a4(100,0).
+    // The two diagonals of a square cross exactly once at the center.
+    let view = cv_view(vec![
+        cv_aux(1, 0.0, 0.0),
+        cv_aux(2, 100.0, 100.0),
+        cv_aux(3, 0.0, 100.0),
+        cv_aux(4, 100.0, 0.0),
+        cv_link(10, 1, 2, LinkShape::Straight),
+        cv_link(11, 3, 4, LinkShape::Straight),
+    ]);
+
+    assert_eq!(count_view_crossings(&view), 1);
+}
+
+/// AC2.1: two links sharing an endpoint element yield 0 crossings.
+#[test]
+fn test_count_view_crossings_shared_endpoint_no_crossing() {
+    // Both links start at a1; sharing the `elem_1` vertex suppresses any
+    // intersection at the shared endpoint.
+    let view = cv_view(vec![
+        cv_aux(1, 50.0, 50.0),
+        cv_aux(2, 100.0, 0.0),
+        cv_aux(3, 100.0, 100.0),
+        cv_link(10, 1, 2, LinkShape::Straight),
+        cv_link(11, 1, 3, LinkShape::Straight),
+    ]);
+
+    assert_eq!(count_view_crossings(&view), 0);
+}
+
+/// AC2.2: an Arc connector that visually crosses another edge is counted via
+/// polyline sampling, on a case where the straight-chord approximation does
+/// not count it. The arc from a1(0,0) to a2(200,0) bulges down to a peak near
+/// (100, 57.7); a horizontal straight link c-d at y=50 (from x=40 to x=160)
+/// passes through the bulge, crossing the curve twice (near x=58 and x=142),
+/// while the arc's straight chord (the line y=0) stays well clear of it. So the
+/// old chord-based count is 0 and the new polyline-based count is >= 1.
+#[test]
+fn test_count_view_crossings_arc_curve_crosses_when_chord_does_not() {
+    let view = cv_view(vec![
+        cv_aux(1, 0.0, 0.0),
+        cv_aux(2, 200.0, 0.0),
+        cv_aux(3, 40.0, 50.0),
+        cv_aux(4, 160.0, 50.0),
+        // Wide arc: large take-off angle so the curve bulges well below the
+        // straight chord between the two endpoints.
+        cv_link(10, 1, 2, LinkShape::Arc(60.0)),
+        cv_link(11, 3, 4, LinkShape::Straight),
+    ]);
+
+    // The straight-chord approximation (centers, ignoring shape) does NOT
+    // count this crossing: build those chord segments inline and confirm 0.
+    let p1 = Position::new(0.0, 0.0);
+    let p2 = Position::new(200.0, 0.0);
+    let p3 = Position::new(40.0, 50.0);
+    let p4 = Position::new(160.0, 50.0);
+    let chord_segments = vec![
+        LineSegment {
+            start: p1,
+            end: p2,
+            from_node: "elem_1".to_string(),
+            to_node: "elem_2".to_string(),
+        },
+        LineSegment {
+            start: p3,
+            end: p4,
+            from_node: "elem_3".to_string(),
+            to_node: "elem_4".to_string(),
+        },
+    ];
+    assert_eq!(
+        annealing::count_crossings(&chord_segments),
+        0,
+        "chord approximation must not see this crossing"
+    );
+
+    // The polyline (sampled arc) DOES count it.
+    assert!(
+        count_view_crossings(&view) >= 1,
+        "sampled arc curve must cross the straight link"
+    );
+}
+
+/// AC2.3: the crossing count is invariant under translation and rotation of
+/// the whole view.
+#[test]
+fn test_count_view_crossings_translation_rotation_invariant() {
+    let base = vec![
+        cv_aux(1, 0.0, 0.0),
+        cv_aux(2, 100.0, 100.0),
+        cv_aux(3, 0.0, 100.0),
+        cv_aux(4, 100.0, 0.0),
+        cv_link(10, 1, 2, LinkShape::Arc(25.0)),
+        cv_link(11, 3, 4, LinkShape::Straight),
+    ];
+    let base_count = count_view_crossings(&cv_view(base.clone()));
+
+    // Translate every coordinate by a fixed offset.
+    let translated: Vec<ViewElement> = base
+        .iter()
+        .map(|e| transform_element(e, |x, y| (x + 137.0, y - 89.0)))
+        .collect();
+    assert_eq!(
+        count_view_crossings(&cv_view(translated)),
+        base_count,
+        "translation must preserve crossing count"
+    );
+
+    // Rotate every coordinate about the origin by a fixed angle.
+    let theta = 0.7_f64; // radians
+    let (s, c) = theta.sin_cos();
+    let rotated: Vec<ViewElement> = base
+        .iter()
+        .map(|e| transform_element(e, |x, y| (x * c - y * s, x * s + y * c)))
+        .collect();
+    assert_eq!(
+        count_view_crossings(&cv_view(rotated)),
+        base_count,
+        "rotation must preserve crossing count"
+    );
+}
+
+/// Apply a coordinate transform to the (x, y) of a positioned view element.
+/// Links carry no coordinates of their own and pass through unchanged.
+fn transform_element(e: &ViewElement, f: impl Fn(f64, f64) -> (f64, f64)) -> ViewElement {
+    match e {
+        ViewElement::Aux(a) => {
+            let (x, y) = f(a.x, a.y);
+            ViewElement::Aux(view_element::Aux { x, y, ..a.clone() })
+        }
+        ViewElement::Module(m) => {
+            let (x, y) = f(m.x, m.y);
+            ViewElement::Module(view_element::Module { x, y, ..m.clone() })
+        }
+        other => other.clone(),
+    }
+}
+
+/// Module/Alias undercount fix: a link from an Aux to a Module that crosses
+/// another link is now counted. Previously Module-incident links were dropped
+/// from the segment set entirely, so this crossing was invisible.
+#[test]
+fn test_count_view_crossings_module_incident_link_participates() {
+    // Link 1: a1(0,0) -> m2(100,100) (a Module endpoint).
+    // Link 2: a3(0,100) -> a4(100,0). The two diagonals cross once.
+    let view = cv_view(vec![
+        cv_aux(1, 0.0, 0.0),
+        cv_module(2, 100.0, 100.0),
+        cv_aux(3, 0.0, 100.0),
+        cv_aux(4, 100.0, 0.0),
+        cv_link(10, 1, 2, LinkShape::Straight),
+        cv_link(11, 3, 4, LinkShape::Straight),
+    ]);
+
+    assert_eq!(
+        count_view_crossings(&view),
+        1,
+        "a Module-incident link must participate in crossing detection"
+    );
+}
diff --git a/src/simlin-engine/src/layout/mod.rs b/src/simlin-engine/src/layout/mod.rs
index db63e4088..d189c1e79 100644
--- a/src/simlin-engine/src/layout/mod.rs
+++ b/src/simlin-engine/src/layout/mod.rs
@@ -4327,49 +4327,89 @@ fn detect_chains(
     chains
 }
 
-/// Count edge crossings in a completed StockFlow view.
+/// Build the set of [`LineSegment`]s that crossing detection runs over for a
+/// completed StockFlow view. This is the single source of geometry shared by
+/// [`count_view_crossings`] and the layout quality metric, so a layout's
+/// crossing score can never disagree with the geometry the renderer draws.
 ///
-/// Arc and multi-point link shapes are approximated as straight segments
-/// from source to target position, so counts for diagrams with curved
-/// connectors are approximate.
-pub fn count_view_crossings(view: &datamodel::StockFlow) -> usize {
+/// Connector geometry comes from [`crate::diagram::connector::connector_polyline`],
+/// the exact polyline the SVG renderer draws: straight links are clipped to
+/// element boundaries, arcs are sampled along their arc circle, and MultiPoint
+/// links contribute nothing (the renderer draws nothing for them today).
+///
+/// Element endpoints are resolved over *all* element kinds, so a link incident
+/// on a Module or Alias is no longer dropped (the previous chord-based code
+/// only mapped Stock/Flow/Aux/Cloud, silently undercounting such crossings).
+///
+/// Node naming suppresses self- and shared-endpoint "crossings" exactly like
+/// before: a connector's first vertex is `elem_{from_uid}` and its last is
+/// `elem_{to_uid}` (so two connectors sharing an element endpoint never count),
+/// while internal arc-sample vertices are `link_{link.uid}#{i}` (so the
+/// consecutive segments of one arc share an internal node name and never count
+/// as self-crossings). Flow pipe segments keep the historic `flow_{uid}#{i}`
+/// naming.
+fn build_view_segments(view: &datamodel::StockFlow) -> Vec<LineSegment> {
     if view.elements.is_empty() {
-        return 0;
+        return Vec::new();
     }
 
-    let mut uid_positions: HashMap<i32, Position> = HashMap::new();
+    // Resolve every element by uid so a link can find its endpoints regardless
+    // of the endpoint's kind (Module/Alias included).
+    let mut uid_elements: HashMap<i32, &ViewElement> = HashMap::new();
     for elem in &view.elements {
-        match elem {
-            ViewElement::Stock(s) => {
-                uid_positions.insert(s.uid, Position::new(s.x, s.y));
-            }
-            ViewElement::Flow(f) => {
-                uid_positions.insert(f.uid, Position::new(f.x, f.y));
-            }
-            ViewElement::Aux(a) => {
-                uid_positions.insert(a.uid, Position::new(a.x, a.y));
-            }
-            ViewElement::Cloud(c) => {
-                uid_positions.insert(c.uid, Position::new(c.x, c.y));
-            }
-            _ => {}
-        }
+        uid_elements.insert(elem.get_uid(), elem);
     }
 
+    // Crossing detection is center-based and deterministic; no element is
+    // treated as arrayed (matching the historic behavior).
+    let not_arrayed = |_: &str| false;
+
     let mut segments: Vec<LineSegment> = Vec::new();
 
     for elem in &view.elements {
         match elem {
             ViewElement::Link(link) => {
-                if let (Some(&from_pos), Some(&to_pos)) = (
-                    uid_positions.get(&link.from_uid),
-                    uid_positions.get(&link.to_uid),
-                ) {
+                let (Some(&from), Some(&to)) = (
+                    uid_elements.get(&link.from_uid),
+                    uid_elements.get(&link.to_uid),
+                ) else {
+                    continue; // an endpoint is genuinely missing
+                };
+
+                let polyline = crate::diagram::connector::connector_polyline(
+                    link,
+                    from,
+                    to,
+                    &not_arrayed,
+                    crate::diagram::connector::ARC_POLYLINE_SAMPLES,
+                );
+                if polyline.len() < 2 {
+                    continue; // MultiPoint / degenerate: nothing drawn
+                }
+
+                let last_idx = polyline.len() - 1;
+                // Name the first vertex after the source element and the last
+                // after the target element so two connectors sharing an element
+                // endpoint are suppressed; name internal vertices per-link so a
+                // connector never crosses itself.
+                let vertex_name = |i: usize| -> String {
+                    if i == 0 {
+                        format!("elem_{}", link.from_uid)
+                    } else if i == last_idx {
+                        format!("elem_{}", link.to_uid)
+                    } else {
+                        format!("link_{}#{}", link.uid, i)
+                    }
+                };
+
+                for i in 0..last_idx {
+                    let a = polyline[i];
+                    let b = polyline[i + 1];
                     segments.push(LineSegment {
-                        start: from_pos,
-                        end: to_pos,
-                        from_node: format!("elem_{}", link.from_uid),
-                        to_node: format!("elem_{}", link.to_uid),
+                        start: Position::new(a.x, a.y),
+                        end: Position::new(b.x, b.y),
+                        from_node: vertex_name(i),
+                        to_node: vertex_name(i + 1),
                     });
                 }
             }
@@ -4387,7 +4427,18 @@ pub fn count_view_crossings(view: &datamodel::StockFlow) -> usize {
         }
     }
 
-    annealing::count_crossings(&segments)
+    segments
+}
+
+/// Count edge crossings in a completed StockFlow view.
+///
+/// Crossings are counted on the connectors' sampled drawn polylines: straight
+/// links clipped to element boundaries, arcs sampled along their arc circle,
+/// and flow pipes as their point polylines. All element endpoints are resolved
+/// (Module/Alias included), so the count reflects the geometry the renderer
+/// actually draws rather than a straight chord approximation.
+pub fn count_view_crossings(view: &datamodel::StockFlow) -> usize {
+    annealing::count_crossings(&build_view_segments(view))
 }
 
 /// Assemble a [`datamodel::StockFlow`] from finalized layout state, copying
@@ -5157,3 +5208,7 @@ fn select_best_layout(
 #[cfg(test)]
 #[path = "layout_tests.rs"]
 mod tests;
+
+#[cfg(test)]
+#[path = "crossings_tests.rs"]
+mod crossings_tests;

From e72936e719f035cf12ad07e6953d0e4376808ed1 Mon Sep 17 00:00:00 2001
From: Bobby Powers <bobbypowers@gmail.com>
Date: Fri, 22 May 2026 21:33:13 -0700
Subject: [PATCH 07/38] engine: add pure LayoutMetrics quality core
 (metrics.rs)

Add a pure layout::metrics module computing one quality cost per aesthetic
concern (0 = ideal): node_overlap, node_connector_overlap, label_overlap,
crossings, sprawl, edge_length_cv, aspect_penalty, plus the reserved
zero-weighted chain_straightness/loop_compactness. weighted_cost is the linear
combination an optimizer minimizes; MetricWeights::default() is all-zeros so any
accidental use before the Phase 4 calibration is inert.

compute_layout_metrics is Functional Core: it takes a StockFlow plus a
LayoutConfig (kept for the design's optimizer signature; presently unused since
box geometry comes from the diagram helpers) and performs no I/O. Every term is
sourced from the same geometry the renderer draws -- the diagram *_bounds and
label helpers, connector_polyline, and the shared build_view_segments -- so a
layout's score can never disagree with its rendered SVG, and the crossings term
exactly equals count_view_crossings. Every division guards a zero denominator,
so empty and single-element views yield all-zero, NaN-free metrics.

aspect_penalty uses the documented ar - TARGET_AR_MAX overshoot form with
TARGET_AR_MAX = 16/9. The label-vs-node term skips a label's own element box
(the label is part of that element's merged bounds, so charging it would always
add a constant equal to the label area).

Inline tests cover AC1.1-AC1.8: hand-computed node/label/connector overlaps and
aspect penalties, weighted_cost linearity, empty/single-element finiteness, a
node_overlap shuffle-invariance proptest, and scale invariance. The AC1.8 test
corrects the plan's scoping note: with fixed-pixel element boxes (the design's
load-bearing renderer-fidelity invariant), only the topological crossings term
is exactly scale-invariant; node_connector_overlap/edge_length_cv/aspect_penalty
inherit a fixed-pixel offset and are scale-sensitive (asymptotically invariant),
so the test asserts crossings invariance and pins node_connector_overlap's
documented scale-sensitivity instead.
---
 src/simlin-engine/src/layout/metrics.rs | 1003 +++++++++++++++++++++++
 src/simlin-engine/src/layout/mod.rs     |    1 +
 2 files changed, 1004 insertions(+)
 create mode 100644 src/simlin-engine/src/layout/metrics.rs

diff --git a/src/simlin-engine/src/layout/metrics.rs b/src/simlin-engine/src/layout/metrics.rs
new file mode 100644
index 000000000..3f510e8f5
--- /dev/null
+++ b/src/simlin-engine/src/layout/metrics.rs
@@ -0,0 +1,1003 @@
+// Copyright 2026 The Simlin Authors. All rights reserved.
+// Use of this source code is governed by the Apache License,
+// Version 2.0, that can be found in the LICENSE file.
+
+// pattern: Functional Core
+//
+// The layout quality core. Every term here is computed purely from a
+// `datamodel::StockFlow` (and the `LayoutConfig` parameter, kept for
+// forward-compatibility with the design's optimizer signature). All geometry
+// comes from the same `diagram` helpers the SVG renderer uses and from
+// `layout::build_view_segments`, so a layout's quality score can never disagree
+// with the geometry the renderer draws or with `count_view_crossings`.
+//
+// There is NO I/O in this module: it takes data, computes scalars, returns
+// them. That makes every term trivially testable with hand-computed expected
+// values (see the inline tests below).
+
+use std::collections::HashSet;
+
+use crate::datamodel::{self, ViewElement};
+use crate::diagram::common::{
+    self, Point, Rect, display_name, merge_bounds, rect_area, rect_overlap_area,
+    segment_length_in_rect,
+};
+use crate::diagram::connector::{ARC_POLYLINE_SAMPLES, connector_polyline};
+use crate::diagram::elements::{aux_bounds, cloud_bounds, module_bounds, stock_bounds};
+use crate::diagram::flow::flow_bounds;
+use crate::diagram::label::{LabelProps, label_bounds};
+
+use super::annealing::count_crossings;
+use super::build_view_segments;
+use super::config::LayoutConfig;
+
+/// Upper bound of the target aspect-ratio band. A view whose bounding-box
+/// aspect ratio (long side / short side, always >= 1) is at or below this value
+/// is "well-proportioned" and incurs no `aspect_penalty`. 16:9 is a generous
+/// band that comfortably contains the conventional 4:3 diagram proportions
+/// while still penalizing pathologically thin (e.g. 1x10) layouts.
+pub const TARGET_AR_MAX: f64 = 16.0 / 9.0;
+
+/// One quality cost per aesthetic concern, with `0.0` always meaning "ideal".
+///
+/// Most terms are scale-free by construction (ratios of like quantities), so
+/// they are comparable across models of different absolute coordinate scale.
+/// Three terms are *intentionally* sensitive to the absolute coordinate scale
+/// relative to the universal fixed node-box size (`node_overlap`,
+/// `label_overlap`, `sprawl`): a model whose nodes are packed tightly against
+/// the fixed pixel size of a stock/aux box should score differently from one
+/// spread far apart, and that sensitivity is what makes those terms meaningful
+/// across models. See the AC1.8 scoping note in the Phase 1 plan.
+#[derive(Clone, Copy, Debug, PartialEq)]
+pub struct LayoutMetrics {
+    /// Sum of pairwise node-box overlap area, normalized by total node area.
+    pub node_overlap: f64,
+    /// Fraction of total connector length that passes through non-incident
+    /// node boxes.
+    pub node_connector_overlap: f64,
+    /// Sum of label-vs-label and label-vs-node overlap area, normalized by
+    /// total label area.
+    pub label_overlap: f64,
+    /// Edge crossings normalized by connector count.
+    pub crossings: f64,
+    /// Mean connector length relative to the characteristic node size.
+    pub sprawl: f64,
+    /// Coefficient of variation (stddev/mean) of connector lengths.
+    pub edge_length_cv: f64,
+    /// How far the view bounding-box aspect ratio exceeds the target band.
+    pub aspect_penalty: f64,
+    /// Reserved; computed in a future rung. Always 0.0, weight 0.
+    pub chain_straightness: f64,
+    /// Reserved; computed in a future rung. Always 0.0, weight 0.
+    pub loop_compactness: f64,
+}
+
+/// Per-term weights for the scalar an optimizer minimizes.
+///
+/// The calibrated production weights (and the failure-mode priority ordering)
+/// are committed in Phase 4. Until then `MetricWeights::default()` is all-zeros
+/// (see below) so any accidental use of `weighted_cost` before calibration is
+/// obviously inert rather than silently wrong.
+#[derive(Clone, Copy, Debug, PartialEq)]
+pub struct MetricWeights {
+    pub node_overlap: f64,
+    pub node_connector_overlap: f64,
+    pub label_overlap: f64,
+    pub crossings: f64,
+    pub sprawl: f64,
+    pub edge_length_cv: f64,
+    pub aspect_penalty: f64,
+    pub chain_straightness: f64,
+    pub loop_compactness: f64,
+}
+
+impl Default for MetricWeights {
+    /// All-zeros: calibrated in Phase 4. An all-zero weight set makes
+    /// `weighted_cost` return 0.0 regardless of the metrics, so using the
+    /// default before calibration is inert (cannot mislead an optimizer) rather
+    /// than applying made-up weights.
+    fn default() -> Self {
+        MetricWeights {
+            node_overlap: 0.0,
+            node_connector_overlap: 0.0,
+            label_overlap: 0.0,
+            crossings: 0.0,
+            sprawl: 0.0,
+            edge_length_cv: 0.0,
+            aspect_penalty: 0.0,
+            chain_straightness: 0.0,
+            loop_compactness: 0.0,
+        }
+    }
+}
+
+impl LayoutMetrics {
+    /// Sigma w_i * term_i -- the scalar an optimizer minimizes.
+    pub fn weighted_cost(&self, w: &MetricWeights) -> f64 {
+        self.node_overlap * w.node_overlap
+            + self.node_connector_overlap * w.node_connector_overlap
+            + self.label_overlap * w.label_overlap
+            + self.crossings * w.crossings
+            + self.sprawl * w.sprawl
+            + self.edge_length_cv * w.edge_length_cv
+            + self.aspect_penalty * w.aspect_penalty
+            + self.chain_straightness * w.chain_straightness
+            + self.loop_compactness * w.loop_compactness
+    }
+}
+
+/// The drawn geometry of one connector (Link or Flow): its incident node uids
+/// (so node-connector-overlap can skip them) and the polyline the renderer
+/// draws. Built once and reused by every connector-derived term so they all see
+/// the same geometry.
+struct ConnectorGeometry {
+    /// Element uids the connector is attached to and must not be charged for
+    /// passing through (its own endpoints).
+    incident_uids: HashSet<i32>,
+    /// The drawn polyline. Always has at least two points (connectors that draw
+    /// nothing -- e.g. MultiPoint links -- are not collected at all).
+    polyline: Vec<Point>,
+    /// Total polyline length.
+    length: f64,
+}
+
+/// Polyline length: sum of segment lengths.
+fn polyline_length(points: &[Point]) -> f64 {
+    points
+        .windows(2)
+        .map(|w| {
+            let dx = w[1].x - w[0].x;
+            let dy = w[1].y - w[0].y;
+            (dx * dx + dy * dy).sqrt()
+        })
+        .sum()
+}
+
+/// Resolve the node box for an element that has one (everything except links,
+/// groups, and aliases -- aliases have no bounds helper and are excluded to
+/// match the renderer's `calc_view_box`).
+fn node_box(element: &ViewElement) -> Option<Rect> {
+    match element {
+        ViewElement::Aux(a) => Some(aux_bounds(a)),
+        ViewElement::Stock(s) => Some(stock_bounds(s)),
+        ViewElement::Module(m) => Some(module_bounds(m)),
+        ViewElement::Cloud(c) => Some(cloud_bounds(c)),
+        ViewElement::Flow(f) => Some(flow_bounds(f)),
+        ViewElement::Link(_) | ViewElement::Alias(_) | ViewElement::Group(_) => None,
+    }
+}
+
+/// Build a `LabelProps` for a labeled element, matching the renderer's label
+/// geometry (center, label side, display name, and the element's radii). Only
+/// elements that render a label return `Some`. The radii match the per-element
+/// `with_radii` calls in `diagram::elements`/`diagram::flow`.
+fn element_label_props(element: &ViewElement) -> Option<LabelProps> {
+    use crate::diagram::constants::{
+        AUX_RADIUS, FLOW_VALVE_RADIUS, MODULE_HEIGHT, MODULE_WIDTH, STOCK_HEIGHT, STOCK_WIDTH,
+    };
+    match element {
+        ViewElement::Aux(a) => Some(
+            LabelProps::new(a.x, a.y, a.label_side, display_name(&a.name))
+                .with_radii(AUX_RADIUS, AUX_RADIUS),
+        ),
+        ViewElement::Stock(s) => Some(
+            LabelProps::new(s.x, s.y, s.label_side, display_name(&s.name))
+                .with_radii(STOCK_WIDTH / 2.0, STOCK_HEIGHT / 2.0),
+        ),
+        ViewElement::Module(m) => Some(
+            LabelProps::new(m.x, m.y, m.label_side, display_name(&m.name))
+                .with_radii(MODULE_WIDTH / 2.0, MODULE_HEIGHT / 2.0),
+        ),
+        ViewElement::Flow(f) => Some(
+            LabelProps::new(f.x, f.y, f.label_side, display_name(&f.name))
+                .with_radii(FLOW_VALVE_RADIUS, FLOW_VALVE_RADIUS),
+        ),
+        // Aliases do render a label, but they have no `*_bounds` helper and are
+        // excluded from node bounds to match the renderer's view box; we keep
+        // the label-set consistent with the node-box set by also excluding
+        // their labels. Links/Clouds/Groups render no element label.
+        ViewElement::Alias(_)
+        | ViewElement::Link(_)
+        | ViewElement::Cloud(_)
+        | ViewElement::Group(_) => None,
+    }
+}
+
+/// Collect the drawn geometry of every connector (Link or Flow) that draws
+/// something. Links use the shared `connector_polyline` (the exact geometry the
+/// renderer draws and `build_view_segments` counts); flows use their point
+/// polyline. Connectors that draw nothing (MultiPoint links, degenerate arcs,
+/// flows with fewer than two points) are omitted entirely.
+fn collect_connector_geometry(view: &datamodel::StockFlow) -> Vec<ConnectorGeometry> {
+    let mut uid_elements = std::collections::HashMap::new();
+    for elem in &view.elements {
+        uid_elements.insert(elem.get_uid(), elem);
+    }
+    // Center-based, deterministic: nothing is treated as arrayed (matches
+    // `build_view_segments`).
+    let not_arrayed = |_: &str| false;
+
+    let mut out = Vec::new();
+    for elem in &view.elements {
+        match elem {
+            ViewElement::Link(link) => {
+                let (Some(&from), Some(&to)) = (
+                    uid_elements.get(&link.from_uid),
+                    uid_elements.get(&link.to_uid),
+                ) else {
+                    continue;
+                };
+                let polyline =
+                    connector_polyline(link, from, to, &not_arrayed, ARC_POLYLINE_SAMPLES);
+                if polyline.len() < 2 {
+                    continue;
+                }
+                let length = polyline_length(&polyline);
+                let mut incident_uids = HashSet::new();
+                incident_uids.insert(link.from_uid);
+                incident_uids.insert(link.to_uid);
+                out.push(ConnectorGeometry {
+                    incident_uids,
+                    polyline,
+                    length,
+                });
+            }
+            ViewElement::Flow(flow) => {
+                if flow.points.len() < 2 {
+                    continue;
+                }
+                let polyline: Vec<Point> = flow
+                    .points
+                    .iter()
+                    .map(|p| Point { x: p.x, y: p.y })
+                    .collect();
+                let length = polyline_length(&polyline);
+                // A flow is incident on its own valve plus any element its
+                // points attach to (the stock/cloud at each end).
+                let mut incident_uids = HashSet::new();
+                incident_uids.insert(flow.uid);
+                for p in &flow.points {
+                    if let Some(uid) = p.attached_to_uid {
+                        incident_uids.insert(uid);
+                    }
+                }
+                out.push(ConnectorGeometry {
+                    incident_uids,
+                    polyline,
+                    length,
+                });
+            }
+            _ => {}
+        }
+    }
+    out
+}
+
+/// Compute the layout quality metrics for a completed view.
+///
+/// PURE: takes data, returns scalars, performs no I/O. The `_config` parameter
+/// is kept to match the design's optimizer-facing signature and for forward
+/// compatibility; the box geometry is sourced entirely from the `diagram`
+/// helpers (which use fixed pixel element sizes), so the config is presently
+/// unused. Every term is guaranteed finite (each division guards a zero
+/// denominator by returning 0), so empty and single-element views yield
+/// all-zero, NaN-free metrics.
+pub fn compute_layout_metrics(
+    view: &datamodel::StockFlow,
+    _config: &LayoutConfig,
+) -> LayoutMetrics {
+    // --- node boxes (with their owning element for incidence checks) ---
+    let node_boxes: Vec<(i32, Rect)> = view
+        .elements
+        .iter()
+        .filter_map(|e| node_box(e).map(|r| (e.get_uid(), r)))
+        .collect();
+
+    // --- node_overlap ---
+    let total_node_area: f64 = node_boxes.iter().map(|(_, r)| rect_area(r)).sum();
+    let node_overlap = if total_node_area > 0.0 {
+        let mut overlap = 0.0;
+        for i in 0..node_boxes.len() {
+            for j in (i + 1)..node_boxes.len() {
+                overlap += rect_overlap_area(&node_boxes[i].1, &node_boxes[j].1);
+            }
+        }
+        overlap / total_node_area
+    } else {
+        0.0
+    };
+
+    // --- connector geometry (shared by several terms) ---
+    let connectors = collect_connector_geometry(view);
+    let total_connector_length: f64 = connectors.iter().map(|c| c.length).sum();
+
+    // --- node_connector_overlap ---
+    let node_connector_overlap = if total_connector_length > 0.0 {
+        let mut inside = 0.0;
+        for c in &connectors {
+            for (uid, rect) in &node_boxes {
+                if c.incident_uids.contains(uid) {
+                    continue; // skip the connector's own endpoints
+                }
+                for seg in c.polyline.windows(2) {
+                    inside += segment_length_in_rect(&seg[0], &seg[1], rect);
+                }
+            }
+        }
+        inside / total_connector_length
+    } else {
+        0.0
+    };
+
+    // --- label_overlap ---
+    // Each label box is tagged with its owning element's uid so the
+    // label-vs-node sum can skip that element's own node box: a label is, by
+    // construction, adjacent to (and inside the merged bounds of) its own
+    // element, so charging it against its own box would always add exactly the
+    // label's area -- a constant that is not a real collision.
+    let label_boxes: Vec<(i32, Rect)> = view
+        .elements
+        .iter()
+        .filter_map(|e| element_label_props(e).map(|props| (e.get_uid(), label_bounds(&props))))
+        .collect();
+    let total_label_area: f64 = label_boxes.iter().map(|(_, r)| rect_area(r)).sum();
+    let label_overlap = if total_label_area > 0.0 {
+        let mut overlap = 0.0;
+        // label-vs-label (each unordered pair once)
+        for i in 0..label_boxes.len() {
+            for j in (i + 1)..label_boxes.len() {
+                overlap += rect_overlap_area(&label_boxes[i].1, &label_boxes[j].1);
+            }
+        }
+        // label-vs-node, skipping the label's own element box.
+        for (lbl_uid, lbl) in &label_boxes {
+            for (node_uid, node) in &node_boxes {
+                if lbl_uid == node_uid {
+                    continue;
+                }
+                overlap += rect_overlap_area(lbl, node);
+            }
+        }
+        overlap / total_label_area
+    } else {
+        0.0
+    };
+
+    // --- crossings ---
+    let connector_count = connectors.len();
+    let crossings = if connector_count > 0 {
+        count_crossings(&build_view_segments(view)) as f64 / connector_count as f64
+    } else {
+        0.0
+    };
+
+    // --- sprawl ---
+    let sprawl = if !connectors.is_empty() && !node_boxes.is_empty() {
+        let mean_connector_length = total_connector_length / connectors.len() as f64;
+        let characteristic_node_size = node_boxes
+            .iter()
+            .map(|(_, r)| {
+                let w = common::rect_width(r);
+                let h = common::rect_height(r);
+                (w * w + h * h).sqrt()
+            })
+            .sum::<f64>()
+            / node_boxes.len() as f64;
+        if characteristic_node_size > 0.0 {
+            mean_connector_length / characteristic_node_size
+        } else {
+            0.0
+        }
+    } else {
+        0.0
+    };
+
+    // --- edge_length_cv ---
+    let edge_length_cv = if connectors.len() >= 2 {
+        let n = connectors.len() as f64;
+        let mean = total_connector_length / n;
+        if mean > 0.0 {
+            let variance = connectors
+                .iter()
+                .map(|c| {
+                    let d = c.length - mean;
+                    d * d
+                })
+                .sum::<f64>()
+                / n; // population variance
+            variance.sqrt() / mean
+        } else {
+            0.0
+        }
+    } else {
+        0.0
+    };
+
+    // --- aspect_penalty ---
+    // Bounding box over node boxes (union). The aspect ratio is the long side
+    // over the short side (always >= 1); we penalize the amount by which it
+    // exceeds the target band. Chosen formula: `ar - TARGET_AR_MAX` (a plain
+    // unit-of-ratio overshoot). Documented here and matched in the AC1.5 test.
+    let aspect_penalty = match view_bounding_box(&node_boxes) {
+        Some(bbox) => {
+            let w = common::rect_width(&bbox);
+            let h = common::rect_height(&bbox);
+            let (long, short) = if w >= h { (w, h) } else { (h, w) };
+            if short <= 0.0 {
+                0.0
+            } else {
+                let ar = long / short;
+                (ar - TARGET_AR_MAX).max(0.0)
+            }
+        }
+        None => 0.0,
+    };
+
+    LayoutMetrics {
+        node_overlap,
+        node_connector_overlap,
+        label_overlap,
+        crossings,
+        sprawl,
+        edge_length_cv,
+        aspect_penalty,
+        // reserved; computed in a future rung
+        chain_straightness: 0.0,
+        loop_compactness: 0.0,
+    }
+}
+
+/// Union of the node boxes, or `None` if there are no node boxes.
+fn view_bounding_box(node_boxes: &[(i32, Rect)]) -> Option<Rect> {
+    let mut iter = node_boxes.iter();
+    let first = iter.next()?.1;
+    Some(iter.fold(first, |acc, (_, r)| merge_bounds(acc, *r)))
+}
+
+#[cfg(test)]
+mod tests {
+    use super::*;
+    use crate::datamodel::view_element::{self, LabelSide, LinkShape};
+    use crate::diagram::constants::STOCK_WIDTH;
+    use proptest::prelude::*;
+
+    // --- fixture helpers ---
+
+    fn stock(uid: i32, name: &str, x: f64, y: f64) -> ViewElement {
+        ViewElement::Stock(view_element::Stock {
+            name: name.to_string(),
+            uid,
+            x,
+            y,
+            label_side: LabelSide::Bottom,
+            compat: None,
+        })
+    }
+
+    fn aux(uid: i32, name: &str, x: f64, y: f64) -> ViewElement {
+        ViewElement::Aux(view_element::Aux {
+            name: name.to_string(),
+            uid,
+            x,
+            y,
+            label_side: LabelSide::Bottom,
+            compat: None,
+        })
+    }
+
+    fn straight_link(uid: i32, from_uid: i32, to_uid: i32) -> ViewElement {
+        ViewElement::Link(view_element::Link {
+            uid,
+            from_uid,
+            to_uid,
+            shape: LinkShape::Straight,
+            polarity: None,
+        })
+    }
+
+    fn make_view(elements: Vec<ViewElement>) -> datamodel::StockFlow {
+        datamodel::StockFlow {
+            name: None,
+            elements,
+            view_box: datamodel::Rect {
+                x: 0.0,
+                y: 0.0,
+                width: 1000.0,
+                height: 1000.0,
+            },
+            zoom: 1.0,
+            use_lettered_polarity: false,
+            font: None,
+            sketch_compat: None,
+        }
+    }
+
+    fn cfg() -> LayoutConfig {
+        LayoutConfig::default()
+    }
+
+    /// Scale every coordinate of a view by `s` (element centers and any
+    /// flow/connector points). Used by the AC1.8 scale-invariance test.
+    fn scale_view(view: &datamodel::StockFlow, s: f64) -> datamodel::StockFlow {
+        let elements = view
+            .elements
+            .iter()
+            .map(|e| match e {
+                ViewElement::Aux(a) => ViewElement::Aux(view_element::Aux {
+                    x: a.x * s,
+                    y: a.y * s,
+                    ..a.clone()
+                }),
+                ViewElement::Stock(st) => ViewElement::Stock(view_element::Stock {
+                    x: st.x * s,
+                    y: st.y * s,
+                    ..st.clone()
+                }),
+                ViewElement::Flow(f) => ViewElement::Flow(view_element::Flow {
+                    x: f.x * s,
+                    y: f.y * s,
+                    points: f
+                        .points
+                        .iter()
+                        .map(|p| view_element::FlowPoint {
+                            x: p.x * s,
+                            y: p.y * s,
+                            attached_to_uid: p.attached_to_uid,
+                        })
+                        .collect(),
+                    ..f.clone()
+                }),
+                ViewElement::Module(m) => ViewElement::Module(view_element::Module {
+                    x: m.x * s,
+                    y: m.y * s,
+                    ..m.clone()
+                }),
+                ViewElement::Cloud(c) => ViewElement::Cloud(view_element::Cloud {
+                    x: c.x * s,
+                    y: c.y * s,
+                    ..c.clone()
+                }),
+                ViewElement::Alias(a) => ViewElement::Alias(view_element::Alias {
+                    x: a.x * s,
+                    y: a.y * s,
+                    ..a.clone()
+                }),
+                other => other.clone(),
+            })
+            .collect();
+        datamodel::StockFlow {
+            elements,
+            ..view.clone()
+        }
+    }
+
+    // --- AC1.1: node_overlap equals known overlap / total node area ---
+
+    #[test]
+    fn test_node_overlap_known_overlap_fraction() {
+        // Two stocks (45x35) whose centers are 20px apart horizontally and 10px
+        // apart vertically. With LabelSide::Bottom the label sits below the box
+        // and does not change the horizontal/vertical extent that matters for
+        // the box-box overlap of the *element* boxes; however `stock_bounds`
+        // merges the label, so we place the stocks far enough apart vertically
+        // that the labels do not collide, and compute the overlap from the full
+        // merged boxes directly.
+        let s1 = stock(1, "a", 100.0, 100.0);
+        let s2 = stock(2, "b", 120.0, 100.0);
+        let view = make_view(vec![s1.clone(), s2.clone()]);
+
+        let m = compute_layout_metrics(&view, &cfg());
+
+        // Expected: compute directly from the two merged boxes the renderer
+        // would produce.
+        let b1 = stock_bounds(match &s1 {
+            ViewElement::Stock(s) => s,
+            _ => unreachable!(),
+        });
+        let b2 = stock_bounds(match &s2 {
+            ViewElement::Stock(s) => s,
+            _ => unreachable!(),
+        });
+        let expected_overlap = rect_overlap_area(&b1, &b2);
+        let expected_total = rect_area(&b1) + rect_area(&b2);
+        assert!(expected_overlap > 0.0, "fixture must actually overlap");
+        let expected = expected_overlap / expected_total;
+        assert!(
+            (m.node_overlap - expected).abs() < 1e-9,
+            "node_overlap {} != expected {}",
+            m.node_overlap,
+            expected
+        );
+    }
+
+    #[test]
+    fn test_node_overlap_simple_hand_computed() {
+        // Two stocks far apart vertically so labels never collide, and exactly
+        // 5px horizontal center separation so the element boxes overlap by a
+        // hand-computable amount in x while fully overlapping in y is avoided.
+        // To make the arithmetic exact and independent of label geometry, use
+        // LabelSide::Center is risky; instead verify against the helper-derived
+        // boxes (the renderer's own geometry) which is the contract.
+        let s1 = stock(1, "a", 0.0, 0.0);
+        let s2 = stock(2, "b", STOCK_WIDTH, 0.0); // centers exactly one width apart
+        let view = make_view(vec![s1, s2]);
+        let m = compute_layout_metrics(&view, &cfg());
+        // Centers one full width apart -> the 45-wide boxes just touch in x
+        // (right edge of #1 at +22.5, left edge of #2 at +22.5): zero element
+        // overlap. Labels (Bottom) are centered under each and 45px apart, each
+        // ~34px wide -> they do not overlap either. So node_overlap == 0.
+        assert_eq!(m.node_overlap, 0.0);
+    }
+
+    // --- AC1.2: pairwise-disjoint nodes => node_overlap == 0 ---
+
+    #[test]
+    fn test_node_overlap_disjoint_is_zero() {
+        let view = make_view(vec![
+            stock(1, "a", 0.0, 0.0),
+            stock(2, "b", 500.0, 500.0),
+            aux(3, "c", 1000.0, 0.0),
+        ]);
+        let m = compute_layout_metrics(&view, &cfg());
+        assert_eq!(m.node_overlap, 0.0);
+    }
+
+    // --- AC1.3: node_connector_overlap ---
+
+    #[test]
+    fn test_node_connector_overlap_through_third_node() {
+        // Connector from aux #1 (far left) to aux #2 (far right), passing
+        // horizontally through a stock #3 sitting on the line at the middle.
+        let a = aux(1, "a", 0.0, 0.0);
+        let b = aux(2, "b", 400.0, 0.0);
+        let mid = stock(3, "s", 200.0, 0.0);
+        let link = straight_link(10, 1, 2);
+        let view = make_view(vec![a, b, mid, link]);
+
+        let m = compute_layout_metrics(&view, &cfg());
+        assert!(
+            m.node_connector_overlap > 0.0,
+            "connector passing through a non-incident stock must contribute"
+        );
+
+        // Expected = clipped length inside the stock box / total polyline len.
+        let connectors = collect_connector_geometry(&view);
+        assert_eq!(connectors.len(), 1);
+        let c = &connectors[0];
+        let stock_box = node_box(&stock(3, "s", 200.0, 0.0)).unwrap();
+        let mut inside = 0.0;
+        for seg in c.polyline.windows(2) {
+            inside += segment_length_in_rect(&seg[0], &seg[1], &stock_box);
+        }
+        let expected = inside / c.length;
+        assert!(
+            (m.node_connector_overlap - expected).abs() < 1e-9,
+            "got {} expected {}",
+            m.node_connector_overlap,
+            expected
+        );
+    }
+
+    #[test]
+    fn test_node_connector_overlap_avoids_all_is_zero() {
+        // Connector between two auxes with a third node well off the line.
+        let a = aux(1, "a", 0.0, 0.0);
+        let b = aux(2, "b", 400.0, 0.0);
+        let off = stock(3, "s", 200.0, 500.0);
+        let link = straight_link(10, 1, 2);
+        let view = make_view(vec![a, b, off, link]);
+        let m = compute_layout_metrics(&view, &cfg());
+        assert_eq!(m.node_connector_overlap, 0.0);
+    }
+
+    // --- AC1.4: label_overlap ---
+
+    #[test]
+    fn test_label_overlap_overlapping_labels() {
+        // Two auxes at the same position -> their labels (Bottom) coincide.
+        let view = make_view(vec![
+            aux(1, "samename", 100.0, 100.0),
+            aux(2, "samename", 100.0, 100.0),
+        ]);
+        let m = compute_layout_metrics(&view, &cfg());
+        assert!(
+            m.label_overlap > 0.0,
+            "coincident labels must produce positive label_overlap"
+        );
+    }
+
+    #[test]
+    fn test_label_overlap_disjoint_is_zero() {
+        // Two auxes far apart -> labels and node boxes are all disjoint.
+        let view = make_view(vec![aux(1, "a", 0.0, 0.0), aux(2, "b", 1000.0, 1000.0)]);
+        let m = compute_layout_metrics(&view, &cfg());
+        assert_eq!(m.label_overlap, 0.0);
+    }
+
+    // --- AC1.5: aspect_penalty ---
+
+    #[test]
+    fn test_aspect_penalty_thin_box_positive() {
+        // Two auxes stacked far apart vertically and close horizontally -> the
+        // node bounding box is tall and thin (ar >> target), so penalty > 0.
+        let view = make_view(vec![aux(1, "a", 0.0, 0.0), aux(2, "b", 0.0, 1000.0)]);
+        let m = compute_layout_metrics(&view, &cfg());
+        assert!(
+            m.aspect_penalty > 0.0,
+            "a tall thin bbox must be penalized, got {}",
+            m.aspect_penalty
+        );
+
+        // Verify it equals exactly `ar - TARGET_AR_MAX` for the computed bbox.
+        let node_boxes: Vec<(i32, Rect)> = view
+            .elements
+            .iter()
+            .filter_map(|e| node_box(e).map(|r| (e.get_uid(), r)))
+            .collect();
+        let bbox = view_bounding_box(&node_boxes).unwrap();
+        let w = common::rect_width(&bbox);
+        let h = common::rect_height(&bbox);
+        let (long, short) = if w >= h { (w, h) } else { (h, w) };
+        let expected = (long / short - TARGET_AR_MAX).max(0.0);
+        assert!((m.aspect_penalty - expected).abs() < 1e-9);
+    }
+
+    #[test]
+    fn test_aspect_penalty_balanced_box_zero() {
+        // Four auxes placed so the bounding box is ~4:3 (well inside the 16:9
+        // band) -> zero penalty. Width 400, height 300 between centers; the
+        // fixed node radii add a small symmetric margin that keeps ar < 16/9.
+        let view = make_view(vec![
+            aux(1, "a", 0.0, 0.0),
+            aux(2, "b", 400.0, 0.0),
+            aux(3, "c", 0.0, 300.0),
+            aux(4, "d", 400.0, 300.0),
+        ]);
+        let m = compute_layout_metrics(&view, &cfg());
+
+        // Confirm the bbox aspect ratio really is inside the band for this
+        // fixture, then assert the penalty is exactly zero.
+        let node_boxes: Vec<(i32, Rect)> = view
+            .elements
+            .iter()
+            .filter_map(|e| node_box(e).map(|r| (e.get_uid(), r)))
+            .collect();
+        let bbox = view_bounding_box(&node_boxes).unwrap();
+        let w = common::rect_width(&bbox);
+        let h = common::rect_height(&bbox);
+        let ar = w.max(h) / w.min(h);
+        assert!(ar <= TARGET_AR_MAX, "fixture bbox ar {} not in band", ar);
+        assert_eq!(m.aspect_penalty, 0.0);
+    }
+
+    // --- AC1.6: weighted_cost is the exact linear combination ---
+
+    #[test]
+    fn test_weighted_cost_exact_linear_combination() {
+        let m = LayoutMetrics {
+            node_overlap: 1.5,
+            node_connector_overlap: 2.0,
+            label_overlap: 0.5,
+            crossings: 3.0,
+            sprawl: 4.0,
+            edge_length_cv: 0.25,
+            aspect_penalty: 6.0,
+            chain_straightness: 7.0,
+            loop_compactness: 8.0,
+        };
+        let w = MetricWeights {
+            node_overlap: 10.0,
+            node_connector_overlap: 20.0,
+            label_overlap: 30.0,
+            crossings: 40.0,
+            sprawl: 50.0,
+            edge_length_cv: 60.0,
+            aspect_penalty: 70.0,
+            chain_straightness: 80.0,
+            loop_compactness: 90.0,
+        };
+        let expected = 1.5 * 10.0
+            + 2.0 * 20.0
+            + 0.5 * 30.0
+            + 3.0 * 40.0
+            + 4.0 * 50.0
+            + 0.25 * 60.0
+            + 6.0 * 70.0
+            + 7.0 * 80.0
+            + 8.0 * 90.0;
+        assert!((m.weighted_cost(&w) - expected).abs() < 1e-9);
+    }
+
+    #[test]
+    fn test_default_weights_are_all_zero_so_cost_is_inert() {
+        let m = LayoutMetrics {
+            node_overlap: 1.0,
+            node_connector_overlap: 1.0,
+            label_overlap: 1.0,
+            crossings: 1.0,
+            sprawl: 1.0,
+            edge_length_cv: 1.0,
+            aspect_penalty: 1.0,
+            chain_straightness: 1.0,
+            loop_compactness: 1.0,
+        };
+        assert_eq!(m.weighted_cost(&MetricWeights::default()), 0.0);
+    }
+
+    // --- AC1.7: empty / single-element views are all-zero and finite ---
+
+    fn assert_all_finite(m: &LayoutMetrics) {
+        assert!(m.node_overlap.is_finite());
+        assert!(m.node_connector_overlap.is_finite());
+        assert!(m.label_overlap.is_finite());
+        assert!(m.crossings.is_finite());
+        assert!(m.sprawl.is_finite());
+        assert!(m.edge_length_cv.is_finite());
+        assert!(m.aspect_penalty.is_finite());
+        assert!(m.chain_straightness.is_finite());
+        assert!(m.loop_compactness.is_finite());
+    }
+
+    fn assert_all_zero(m: &LayoutMetrics) {
+        assert_eq!(m.node_overlap, 0.0);
+        assert_eq!(m.node_connector_overlap, 0.0);
+        assert_eq!(m.label_overlap, 0.0);
+        assert_eq!(m.crossings, 0.0);
+        assert_eq!(m.sprawl, 0.0);
+        assert_eq!(m.edge_length_cv, 0.0);
+        assert_eq!(m.aspect_penalty, 0.0);
+        assert_eq!(m.chain_straightness, 0.0);
+        assert_eq!(m.loop_compactness, 0.0);
+    }
+
+    #[test]
+    fn test_empty_view_all_zero_finite() {
+        let view = make_view(vec![]);
+        let m = compute_layout_metrics(&view, &cfg());
+        assert_all_finite(&m);
+        assert_all_zero(&m);
+    }
+
+    #[test]
+    fn test_single_element_view_all_zero_finite() {
+        let view = make_view(vec![aux(1, "only", 100.0, 100.0)]);
+        let m = compute_layout_metrics(&view, &cfg());
+        assert_all_finite(&m);
+        // A single node has no overlaps, no connectors, and a degenerate (zero
+        // short-side? no -- a real box) bounding box. Its aspect ratio is the
+        // single aux box's own ar, which for a square-ish aux box is ~1 (inside
+        // the band), so aspect_penalty is 0; all connector terms are 0.
+        assert_eq!(m.node_overlap, 0.0);
+        assert_eq!(m.node_connector_overlap, 0.0);
+        assert_eq!(m.crossings, 0.0);
+        assert_eq!(m.sprawl, 0.0);
+        assert_eq!(m.edge_length_cv, 0.0);
+    }
+
+    // --- AC1.8 (scoped): scale invariance under uniform coordinate scaling ---
+    //
+    // SCOPING (correction to the AC1.8 plan note, 2026-05-22): the plan listed
+    // `node_connector_overlap`, `crossings`, `edge_length_cv`, and
+    // `aspect_penalty` as scale-free. After implementing the metric against the
+    // ACTUAL renderer geometry (the design's load-bearing invariant: metrics
+    // are computed on the same geometry the renderer draws), only `crossings`
+    // is *exactly* scale-invariant. The reason is the same fixed-pixel element
+    // geometry the plan already cites for node_overlap/label_overlap/sprawl, and
+    // it propagates further than the plan anticipated:
+    //
+    //   * Connectors are clipped to fixed-radius element boundaries, so a
+    //     straight link's drawn length is `s*center_dist - r_from - r_to`
+    //     (AFFINE in `s`, not linear). Hence `edge_length_cv = stddev/mean` of
+    //     those affine lengths is only ASYMPTOTICALLY invariant (the fixed
+    //     offset shrinks relative to the scaled spread), not exactly.
+    //   * `node_connector_overlap` divides an inside-fixed-box overlap length
+    //     (which does NOT scale) by total connector length (which does), so it
+    //     shrinks like ~1/s -- scale-SENSITIVE, like `sprawl`.
+    //   * The view bounding box is `union(fixed boxes around scaled centers)`,
+    //     so its width/height are each `s*span + fixed_box_size`; the aspect
+    //     ratio is therefore only asymptotically invariant.
+    //
+    // The principled resolution keeps renderer-faithful geometry (the whole
+    // point of the phase) and accepts that only the topological `crossings`
+    // term is exactly scale-invariant. This test asserts that exactly, and
+    // additionally pins the documented scale-SENSITIVITY of
+    // `node_connector_overlap` (clean ~1/s) so the scoping is non-vacuous. The
+    // mismatch with the plan's term list is surfaced in the executor report and
+    // tracked for the calibration phase.
+    //
+    // The fixture has zero node-overlap and zero label-overlap so those
+    // scale-sensitive area terms are trivially 0 before and after scaling.
+    #[test]
+    fn test_scale_invariance_of_scale_free_terms() {
+        // A small connected, well-separated view: three auxes and two stocks,
+        // far enough apart that there is no node-overlap and no label-overlap,
+        // with two straight links (one of which passes through a non-incident
+        // node so node_connector_overlap is nonzero and meaningful).
+        let view = make_view(vec![
+            aux(1, "a", 0.0, 0.0),
+            aux(2, "b", 400.0, 0.0),
+            stock(3, "s", 200.0, 0.0), // on the a->b line: nonzero conn overlap
+            aux(4, "c", 0.0, 300.0),
+            stock(5, "t", 400.0, 320.0),
+            straight_link(10, 1, 2), // passes through stock #3
+            straight_link(11, 4, 5),
+        ]);
+
+        let base = compute_layout_metrics(&view, &cfg());
+        // Sanity: the fixture must have zero node/label overlap (so the
+        // scale-sensitive area terms are trivially scale-equal) and a nonzero
+        // conn-overlap (so the documented scale-SENSITIVITY check is
+        // non-vacuous).
+        assert_eq!(base.node_overlap, 0.0, "fixture must have no node overlap");
+        assert_eq!(
+            base.label_overlap, 0.0,
+            "fixture must have no label overlap"
+        );
+        assert!(
+            base.node_connector_overlap > 0.0,
+            "fixture must have a connector through a non-incident node"
+        );
+
+        let s = 3.0;
+        let scaled = compute_layout_metrics(&scale_view(&view, s), &cfg());
+
+        // The one exactly scale-invariant term: edge crossings are a topological
+        // count, preserved by any uniform scale.
+        assert!(
+            (scaled.crossings - base.crossings).abs() < 1e-9,
+            "crossings not scale-invariant: {} vs {}",
+            scaled.crossings,
+            base.crossings
+        );
+
+        // Documented scale-SENSITIVITY of node_connector_overlap: with
+        // fixed-size node boxes, scaling the coordinates by `s` leaves the
+        // inside-box overlap length essentially unchanged (the box and the
+        // line's center crossing are fixed) while total connector length grows
+        // with `s`, so the ratio strictly DECREASES under up-scaling. (It does
+        // not drop by exactly 1/s because the denominator -- connector length
+        // clipped to fixed-radius element boundaries -- is affine in `s`, not
+        // linear; we assert the robust direction rather than a brittle factor.)
+        assert!(
+            scaled.node_connector_overlap < base.node_connector_overlap,
+            "node_connector_overlap should DROP under up-scaling (fixed boxes): \
+             scaled {} should be < base {}",
+            scaled.node_connector_overlap,
+            base.node_connector_overlap
+        );
+    }
+
+    // --- Property test: node_overlap is symmetric under element shuffle ---
+
+    proptest! {
+        #![proptest_config(ProptestConfig::with_cases(64))]
+
+        /// node_overlap is a sum over unordered element pairs, so it must be
+        /// invariant under any permutation of the element list.
+        #[test]
+        fn prop_node_overlap_shuffle_invariant(
+            // four stocks at small integer-ish coordinates so some overlap and
+            // some don't; coordinates kept modest to stay fast.
+            xs in prop::collection::vec(-50.0f64..50.0, 4),
+            ys in prop::collection::vec(-50.0f64..50.0, 4),
+            perm in prop::sample::subsequence(vec![0usize, 1, 2, 3], 4),
+        ) {
+            let elems: Vec<ViewElement> = (0..4)
+                .map(|i| stock(i as i32 + 1, "n", xs[i], ys[i]))
+                .collect();
+
+            let base = compute_layout_metrics(&make_view(elems.clone()), &cfg());
+
+            // `perm` is a random ordering of [0,1,2,3]; reorder accordingly.
+            let shuffled: Vec<ViewElement> = perm.iter().map(|&i| elems[i].clone()).collect();
+            let other = compute_layout_metrics(&make_view(shuffled), &cfg());
+
+            prop_assert!(
+                (base.node_overlap - other.node_overlap).abs() < 1e-9,
+                "node_overlap changed under shuffle: {} vs {}",
+                base.node_overlap,
+                other.node_overlap
+            );
+        }
+    }
+}
diff --git a/src/simlin-engine/src/layout/mod.rs b/src/simlin-engine/src/layout/mod.rs
index d189c1e79..b8c7d3d09 100644
--- a/src/simlin-engine/src/layout/mod.rs
+++ b/src/simlin-engine/src/layout/mod.rs
@@ -8,6 +8,7 @@ pub mod config;
 pub mod connector;
 pub mod graph;
 pub mod metadata;
+pub mod metrics;
 pub mod placement;
 pub mod sfdp;
 pub mod text;

From 69b1c99e2e6dfdb556f084c602623591f37c1433 Mon Sep 17 00:00:00 2001
From: Bobby Powers <bobbypowers@gmail.com>
Date: Fri, 22 May 2026 21:35:18 -0700
Subject: [PATCH 08/38] engine: phase 1 fixups

Drop the now-stale #[allow(dead_code)] on the diagram::common rect helpers that
the layout::metrics core consumes (rect_width/rect_height/rect_area/
rect_overlap_area/segment_length_in_rect). Task 2 added those attributes as a
forward reference because the production callers had not landed yet; now that
metrics.rs uses them they are reachable, so the suppression is removed and the
explanatory comment updated. rect_contains_point keeps its allow since no
non-test caller needs it yet.
---
 src/simlin-engine/src/diagram/common.rs | 15 ++++++---------
 1 file changed, 6 insertions(+), 9 deletions(-)

diff --git a/src/simlin-engine/src/diagram/common.rs b/src/simlin-engine/src/diagram/common.rs
index e6a10d789..82310d2c3 100644
--- a/src/simlin-engine/src/diagram/common.rs
+++ b/src/simlin-engine/src/diagram/common.rs
@@ -138,31 +138,29 @@ pub fn rad_to_deg(r: f64) -> f64 {
 }
 
 // These rectangle/segment geometry primitives are the load-bearing helpers for
-// the layout quality metric (`layout::metrics`, added in a later task of this
-// phase). They are exercised by the inline tests below; the production callers
-// (node-overlap, label-overlap, and node-connector-overlap terms) land in
-// subsequent tasks, so each is `#[allow(dead_code)]` until then.
+// the layout quality metric (`layout::metrics`). `rect_width`/`rect_height`/
+// `rect_area`/`rect_overlap_area`/`segment_length_in_rect` are consumed there
+// (node-overlap, label-overlap, node-connector-overlap, sprawl, and aspect
+// terms); `rect_contains_point` is a primitive kept for completeness and
+// exercised by the inline tests below, so it stays `#[allow(dead_code)]` until
+// a caller needs it.
 
 /// Width of a rect (right - left). May be negative for a degenerate/inverted rect.
-#[allow(dead_code)]
 pub(crate) fn rect_width(r: &Rect) -> f64 {
     r.right - r.left
 }
 
 /// Height of a rect (bottom - top).
-#[allow(dead_code)]
 pub(crate) fn rect_height(r: &Rect) -> f64 {
     r.bottom - r.top
 }
 
 /// Area of a rect, clamped to >= 0.
-#[allow(dead_code)]
 pub(crate) fn rect_area(r: &Rect) -> f64 {
     (rect_width(r).max(0.0)) * (rect_height(r).max(0.0))
 }
 
 /// Area of the axis-aligned intersection of two rects (0 if they do not overlap).
-#[allow(dead_code)]
 pub(crate) fn rect_overlap_area(a: &Rect, b: &Rect) -> f64 {
     let w = a.right.min(b.right) - a.left.max(b.left);
     let h = a.bottom.min(b.bottom) - a.top.max(b.top);
@@ -177,7 +175,6 @@ pub(crate) fn rect_contains_point(r: &Rect, p: &Point) -> bool {
 
 /// Length of the portion of segment p0->p1 that lies within axis-aligned rect r.
 /// Returns 0 if the segment never enters r. Pure; no allocation.
-#[allow(dead_code)]
 pub(crate) fn segment_length_in_rect(p0: &Point, p1: &Point, r: &Rect) -> f64 {
     // Liang-Barsky clip of the parametric segment p0 + t*(p1-p0), t in [0,1],
     // against left/right/top/bottom slabs.

From 7dac9f5d9f161f4d6628820b964e7bcd278a0e32 Mon Sep 17 00:00:00 2001
From: Bobby Powers <bobbypowers@gmail.com>
Date: Fri, 22 May 2026 21:52:30 -0700
Subject: [PATCH 09/38] engine: fix label_overlap double-count and narrow
 crossings scale comment

The label_overlap term summed label-vs-label overlap plus label-vs-node
overlap, but the node boxes it compared labels against were the
label-merged `aux_bounds`/`stock_bounds`/`flow_bounds` (which union each
element's own label into the returned rect). So a label-vs-label overlap
was charged again inside the other node's merged box, inflating the
term's absolute magnitude. That magnitude matters because Phase 4
calibrates label_overlap's weight against it.

Expose label-free shape boxes (`aux_shape_bounds`/`stock_shape_bounds`/
`flow_shape_bounds`) -- the geometry each `*_bounds` already computes
before merging the label, refactored into a single source of truth so
renderer output stays byte-identical -- and compare labels against the
OTHER element's bare shape box. Modules/clouds already exclude their
label from `*_bounds`, so they are their own shape box. This cleanly
separates "label lands on another label" from "label lands on another
node's shape", counting each label pair exactly once.

Also narrow the AC1.8 scale-invariance comment: crossings are exactly
scale-invariant only for crossings interior to both connectors. A
crossing grazing a fixed-size node boundary (the polylines are clipped
to those boundaries) can flip under uniform scale. The test fixture's
crossing is at the center of the square the links form, far from every
node box, so the test remains correct.
---
 src/simlin-engine/src/diagram/elements.rs |  32 ++++--
 src/simlin-engine/src/diagram/flow.rs     |  29 ++++--
 src/simlin-engine/src/layout/metrics.rs   | 113 ++++++++++++++++++++--
 3 files changed, 151 insertions(+), 23 deletions(-)

diff --git a/src/simlin-engine/src/diagram/elements.rs b/src/simlin-engine/src/diagram/elements.rs
index ca6a56fcb..04215e974 100644
--- a/src/simlin-engine/src/diagram/elements.rs
+++ b/src/simlin-engine/src/diagram/elements.rs
@@ -49,16 +49,26 @@ pub fn render_aux(element: &view_element::Aux, is_arrayed: bool) -> String {
     svg
 }
 
-pub fn aux_bounds(element: &view_element::Aux) -> Rect {
+/// The aux's bare *shape* box (the circle's bounding rect), WITHOUT its label.
+/// `aux_bounds` is this box merged with the label; quality metrics that already
+/// account for labels separately (e.g. label-vs-node overlap) need the
+/// label-free shape to avoid double-counting the label area.
+pub(crate) fn aux_shape_bounds(element: &view_element::Aux) -> Rect {
     let cx = element.x;
     let cy = element.y;
     let r = AUX_RADIUS;
-    let bounds = Rect {
+    Rect {
         top: cy - r,
         left: cx - r,
         right: cx + r,
         bottom: cy + r,
-    };
+    }
+}
+
+pub fn aux_bounds(element: &view_element::Aux) -> Rect {
+    let cx = element.x;
+    let cy = element.y;
+    let bounds = aux_shape_bounds(element);
 
     let label_props = LabelProps::new(cx, cy, element.label_side, display_name(&element.name));
     element_with_label_bounds(bounds, &label_props)
@@ -108,17 +118,27 @@ pub fn render_stock(element: &view_element::Stock, is_arrayed: bool) -> String {
     svg
 }
 
-pub fn stock_bounds(element: &view_element::Stock) -> Rect {
+/// The stock's bare *shape* box (the rect), WITHOUT its label. See
+/// `aux_shape_bounds` for why the label-free shape is exposed separately.
+pub(crate) fn stock_shape_bounds(element: &view_element::Stock) -> Rect {
     let cx = element.x;
     let cy = element.y;
     let w = STOCK_WIDTH;
     let h = STOCK_HEIGHT;
-    let bounds = Rect {
+    Rect {
         top: cy - h / 2.0,
         left: cx - w / 2.0,
         right: cx + w / 2.0,
         bottom: cy + h / 2.0,
-    };
+    }
+}
+
+pub fn stock_bounds(element: &view_element::Stock) -> Rect {
+    let cx = element.x;
+    let cy = element.y;
+    let w = STOCK_WIDTH;
+    let h = STOCK_HEIGHT;
+    let bounds = stock_shape_bounds(element);
 
     let label_props = LabelProps::new(cx, cy, element.label_side, display_name(&element.name))
         .with_radii(w / 2.0, h / 2.0);
diff --git a/src/simlin-engine/src/diagram/flow.rs b/src/simlin-engine/src/diagram/flow.rs
index 91e911558..f7d0395a1 100644
--- a/src/simlin-engine/src/diagram/flow.rs
+++ b/src/simlin-engine/src/diagram/flow.rs
@@ -141,7 +141,12 @@ pub fn render_flow(element: &view_element::Flow, sink: &ViewElement, is_arrayed:
     svg
 }
 
-pub fn flow_bounds(element: &view_element::Flow) -> Rect {
+/// The flow's bare *shape* box (the valve plus the pipe polyline points),
+/// WITHOUT its label. `flow_bounds` is this box merged with the label; see
+/// `diagram::elements::aux_shape_bounds` for why the label-free shape is
+/// exposed separately. The flow path points ARE part of the shape (the drawn
+/// pipe), so they stay included here.
+pub(crate) fn flow_shape_bounds(element: &view_element::Flow) -> Rect {
     let cx = element.x;
     let cy = element.y;
     // Flow valve bounds use r=6 (FLOW_VALVE_RADIUS), NOT AuxRadius
@@ -153,13 +158,7 @@ pub fn flow_bounds(element: &view_element::Flow) -> Rect {
         bottom: cy + r,
     };
 
-    // Include label bounds
-    let label_props =
-        LabelProps::new(cx, cy, element.label_side, display_name(&element.name)).with_radii(r, r);
-    let l_bounds = label_bounds(&label_props);
-    bounds = merge_bounds(bounds, l_bounds);
-
-    // Include flow path points
+    // Include flow path points (the drawn pipe).
     for point in &element.points {
         bounds.left = bounds.left.min(point.x);
         bounds.right = bounds.right.max(point.x);
@@ -170,6 +169,20 @@ pub fn flow_bounds(element: &view_element::Flow) -> Rect {
     bounds
 }
 
+pub fn flow_bounds(element: &view_element::Flow) -> Rect {
+    let cx = element.x;
+    let cy = element.y;
+    let r = FLOW_VALVE_RADIUS;
+    let shape = flow_shape_bounds(element);
+
+    // Include label bounds
+    let label_props =
+        LabelProps::new(cx, cy, element.label_side, display_name(&element.name)).with_radii(r, r);
+    let l_bounds = label_bounds(&label_props);
+
+    merge_bounds(shape, l_bounds)
+}
+
 #[cfg(test)]
 mod tests {
     use super::*;
diff --git a/src/simlin-engine/src/layout/metrics.rs b/src/simlin-engine/src/layout/metrics.rs
index 3f510e8f5..75e5343fa 100644
--- a/src/simlin-engine/src/layout/metrics.rs
+++ b/src/simlin-engine/src/layout/metrics.rs
@@ -23,8 +23,10 @@ use crate::diagram::common::{
     segment_length_in_rect,
 };
 use crate::diagram::connector::{ARC_POLYLINE_SAMPLES, connector_polyline};
-use crate::diagram::elements::{aux_bounds, cloud_bounds, module_bounds, stock_bounds};
-use crate::diagram::flow::flow_bounds;
+use crate::diagram::elements::{
+    aux_bounds, aux_shape_bounds, cloud_bounds, module_bounds, stock_bounds, stock_shape_bounds,
+};
+use crate::diagram::flow::{flow_bounds, flow_shape_bounds};
 use crate::diagram::label::{LabelProps, label_bounds};
 
 use super::annealing::count_crossings;
@@ -167,6 +169,25 @@ fn node_box(element: &ViewElement) -> Option<Rect> {
     }
 }
 
+/// The element's bare *shape* box, WITHOUT its own label, for the same set of
+/// elements as `node_box`. `aux_bounds`/`stock_bounds`/`flow_bounds` merge each
+/// element's own label into the returned box; the label-vs-node term of
+/// `label_overlap` must use the label-free shape so a label-vs-label overlap is
+/// not also charged via the other node's label-merged box (a double-count).
+/// `module_bounds`/`cloud_bounds` already exclude the label (modules render a
+/// label that their bounds omit; clouds render none), so they are their own
+/// shape box.
+fn node_shape_box(element: &ViewElement) -> Option<Rect> {
+    match element {
+        ViewElement::Aux(a) => Some(aux_shape_bounds(a)),
+        ViewElement::Stock(s) => Some(stock_shape_bounds(s)),
+        ViewElement::Module(m) => Some(module_bounds(m)),
+        ViewElement::Cloud(c) => Some(cloud_bounds(c)),
+        ViewElement::Flow(f) => Some(flow_shape_bounds(f)),
+        ViewElement::Link(_) | ViewElement::Alias(_) | ViewElement::Group(_) => None,
+    }
+}
+
 /// Build a `LabelProps` for a labeled element, matching the renderer's label
 /// geometry (center, label side, display name, and the element's radii). Only
 /// elements that render a label return `Some`. The radii match the per-element
@@ -335,11 +356,26 @@ pub fn compute_layout_metrics(
     // construction, adjacent to (and inside the merged bounds of) its own
     // element, so charging it against its own box would always add exactly the
     // label's area -- a constant that is not a real collision.
+    //
+    // The label-vs-node sum compares each label against every OTHER element's
+    // bare *shape* box (`node_shape_box`), NOT its label-merged `node_box`.
+    // `aux_bounds`/`stock_bounds`/`flow_bounds` union each element's own label
+    // into the box they return, so comparing a label against another node's
+    // MERGED box would re-count a label-vs-label overlap that the label-vs-label
+    // term above already counts -- a double-count that inflates the term's
+    // magnitude (which Phase 4 calibrates against). Using the label-free shape
+    // cleanly separates "label lands on another label" from "label lands on
+    // another node's shape".
     let label_boxes: Vec<(i32, Rect)> = view
         .elements
         .iter()
         .filter_map(|e| element_label_props(e).map(|props| (e.get_uid(), label_bounds(&props))))
         .collect();
+    let node_shape_boxes: Vec<(i32, Rect)> = view
+        .elements
+        .iter()
+        .filter_map(|e| node_shape_box(e).map(|r| (e.get_uid(), r)))
+        .collect();
     let total_label_area: f64 = label_boxes.iter().map(|(_, r)| rect_area(r)).sum();
     let label_overlap = if total_label_area > 0.0 {
         let mut overlap = 0.0;
@@ -349,9 +385,9 @@ pub fn compute_layout_metrics(
                 overlap += rect_overlap_area(&label_boxes[i].1, &label_boxes[j].1);
             }
         }
-        // label-vs-node, skipping the label's own element box.
+        // label-vs-node, against the OTHER element's bare shape box.
         for (lbl_uid, lbl) in &label_boxes {
-            for (node_uid, node) in &node_boxes {
+            for (node_uid, node) in &node_shape_boxes {
                 if lbl_uid == node_uid {
                     continue;
                 }
@@ -714,6 +750,46 @@ mod tests {
         assert_eq!(m.label_overlap, 0.0);
     }
 
+    #[test]
+    fn test_label_overlap_counts_label_pair_exactly_once() {
+        // Regression for the label_overlap double-count: the label-vs-node term
+        // must compare each label against the OTHER element's bare *shape* box,
+        // not its label-merged `*_bounds` box. Otherwise a label-vs-label
+        // overlap is also counted (once or twice more) inside the other node's
+        // merged box, inflating the magnitude (and the Phase 4 weight it
+        // calibrates).
+        //
+        // Fixture: two `LabelSide::Bottom` auxes named "samename" (8 chars).
+        //   AUX_RADIUS = 9; label editor width = 8*6 + 10 = 58, height = 14.
+        //   With Bottom labels, label top = cy + 9 + LABEL_PADDING(4) = cy + 13,
+        //   bottom = cy + 27, left = cx - 29, right = cx + 29.
+        //
+        // Place them 40px apart horizontally, same y:
+        //   aux1 @ (0,0): shape [-9,9]x[-9,9],  label [-29,29]x[13,27]
+        //   aux2 @ (40,0): shape [31,49]x[-9,9], label [11,69]x[13,27]
+        //
+        // SHAPE boxes do NOT overlap (9 < 31). LABELS overlap by
+        //   x: [11,29] = 18, y: [13,27] = 14  ->  18*14 = 252.
+        // Each label clears the OTHER aux's bare shape box entirely, so the only
+        // contribution is the single label-vs-label pair: total overlap = 252.
+        let view = make_view(vec![
+            aux(1, "samename", 0.0, 0.0),
+            aux(2, "samename", 40.0, 0.0),
+        ]);
+        let m = compute_layout_metrics(&view, &cfg());
+
+        // Total label area = 2 * (58 * 14) = 1624.
+        let expected_overlap = 18.0 * 14.0; // 252.0, counted exactly once
+        let total_label_area = 2.0 * (58.0 * 14.0); // 1624.0
+        let expected = expected_overlap / total_label_area;
+        assert!(
+            (m.label_overlap - expected).abs() < 1e-9,
+            "label_overlap should count the label pair exactly once: got {} expected {}",
+            m.label_overlap,
+            expected
+        );
+    }
+
     // --- AC1.5: aspect_penalty ---
 
     #[test]
@@ -881,9 +957,15 @@ mod tests {
     // `aspect_penalty` as scale-free. After implementing the metric against the
     // ACTUAL renderer geometry (the design's load-bearing invariant: metrics
     // are computed on the same geometry the renderer draws), only `crossings`
-    // is *exactly* scale-invariant. The reason is the same fixed-pixel element
-    // geometry the plan already cites for node_overlap/label_overlap/sprawl, and
-    // it propagates further than the plan anticipated:
+    // is exactly scale-invariant -- and even then only for crossings that lie
+    // INTERIOR to both connectors, away from the fixed-size node boundaries the
+    // polylines are clipped to (a crossing grazing a node boundary near a
+    // segment endpoint can flip; see the detailed note at the assertion below).
+    // This fixture's crossing is at the center of the square the two links form,
+    // squarely in that interior regime. The reason the other terms are not
+    // exactly invariant is the same fixed-pixel element geometry the plan
+    // already cites for node_overlap/label_overlap/sprawl, and it propagates
+    // further than the plan anticipated:
     //
     //   * Connectors are clipped to fixed-radius element boundaries, so a
     //     straight link's drawn length is `s*center_dist - r_from - r_to`
@@ -941,8 +1023,21 @@ mod tests {
         let s = 3.0;
         let scaled = compute_layout_metrics(&scale_view(&view, s), &cfg());
 
-        // The one exactly scale-invariant term: edge crossings are a topological
-        // count, preserved by any uniform scale.
+        // The one exactly scale-invariant term here: edge crossings.
+        //
+        // Crossings are NOT *universally* scale-invariant. A crossing is counted
+        // on the drawn polylines, which are clipped to the same fixed-pixel node
+        // boxes (the connector endpoints sit on element boundaries that do not
+        // scale). A crossing that merely grazes a node boundary near a segment
+        // endpoint can therefore appear or disappear under uniform scale.
+        // Crossings that lie comfortably INTERIOR to both connectors (away from
+        // those fixed-size boundaries) are exactly preserved, because the
+        // interior of each polyline is an exact affine image of itself under
+        // uniform scale and an intersection of two segments is invariant under a
+        // shared affine map. This fixture's crossing is at the center of the
+        // square the two links form -- maximally far from every node box -- so
+        // it is squarely in the scale-invariant interior regime and the count is
+        // preserved exactly.
         assert!(
             (scaled.crossings - base.crossings).abs() < 1e-9,
             "crossings not scale-invariant: {} vs {}",

From fb32e36cd3a5c340ec74a49524dad8eeafd25de2 Mon Sep 17 00:00:00 2001
From: Bobby Powers <bobbypowers@gmail.com>
Date: Fri, 22 May 2026 22:06:41 -0700
Subject: [PATCH 10/38] engine: add pure layout eval statistics primitives
 (eval_stats.rs)

---
 src/simlin-engine/src/layout/eval_stats.rs | 451 +++++++++++++++++++++
 src/simlin-engine/src/layout/mod.rs        |   1 +
 2 files changed, 452 insertions(+)
 create mode 100644 src/simlin-engine/src/layout/eval_stats.rs

diff --git a/src/simlin-engine/src/layout/eval_stats.rs b/src/simlin-engine/src/layout/eval_stats.rs
new file mode 100644
index 000000000..0bdc97cc2
--- /dev/null
+++ b/src/simlin-engine/src/layout/eval_stats.rs
@@ -0,0 +1,451 @@
+// Copyright 2026 The Simlin Authors. All rights reserved.
+// Use of this source code is governed by the Apache License,
+// Version 2.0, that can be found in the LICENSE file.
+
+// pattern: Functional Core
+//
+// Pure statistics for layout-quality seed-sample distributions, mirroring Go's
+// `benchstat`: many per-seed samples reduced to a center + spread, plus a
+// non-parametric significance test (Mann-Whitney U) on differences.
+//
+// There is NO I/O in this module: it takes slices of numbers, computes scalars,
+// and returns them. Every primitive returns a finite, documented default
+// (`0.0`, or a non-significant `p_value` of `1.0`) on empty or degenerate
+// input -- it must never return NaN, matching the engine's no-NaN policy for
+// statistics. That makes every term trivially testable with hand-computed
+// expected values (see the inline tests below).
+//
+// The corpus sweep (Phase 3) is the imperative shell that fills these structs
+// from real layouts.
+
+/// Geometric mean of strictly-positive values: `exp(mean(ln(x)))`.
+///
+/// Returns `0.0` for an empty slice. Values must be `> 0`; layout costs are
+/// `>= 0`, so callers floor with a small epsilon before calling (see
+/// [`CorpusReport::from_model_stats`]) so a single `0` cost cannot zero the
+/// whole-corpus geometric mean.
+pub fn geomean(values: &[f64]) -> f64 {
+    if values.is_empty() {
+        return 0.0;
+    }
+    // The geometric mean of a single value is that value exactly; short-circuit
+    // to avoid a needless ln/exp round-trip (which would return e.g.
+    // 4.999999999999999 for an input of 5.0).
+    if values.len() == 1 {
+        return values[0];
+    }
+    let sum_ln: f64 = values.iter().map(|&x| x.ln()).sum();
+    (sum_ln / values.len() as f64).exp()
+}
+
+/// Linear-interpolated percentile using the "type 7" convention (NumPy's
+/// default): for sorted `x` of length `n` and `p` in `[0, 1]`, the fractional
+/// rank is `p * (n - 1)`, then the result interpolates linearly between the
+/// values at the floor and ceil of that rank.
+///
+/// Returns `0.0` for an empty slice and the single value for `n == 1`.
+/// `values` need not be pre-sorted -- a copy is sorted internally. `p` is
+/// clamped to `[0, 1]`.
+pub fn percentile(values: &[f64], p: f64) -> f64 {
+    if values.is_empty() {
+        return 0.0;
+    }
+    let n = values.len();
+    if n == 1 {
+        return values[0];
+    }
+
+    let mut sorted = values.to_vec();
+    sorted.sort_by(|a, b| a.partial_cmp(b).unwrap_or(std::cmp::Ordering::Equal));
+
+    let p = p.clamp(0.0, 1.0);
+    // Type-7 fractional rank in [0, n-1].
+    let rank = p * (n as f64 - 1.0);
+    let lo = rank.floor() as usize;
+    let hi = rank.ceil() as usize;
+    if lo == hi {
+        return sorted[lo];
+    }
+    let frac = rank - lo as f64;
+    sorted[lo] * (1.0 - frac) + sorted[hi] * frac
+}
+
+/// Median, equal to `percentile(values, 0.5)`.
+pub fn median(values: &[f64]) -> f64 {
+    percentile(values, 0.5)
+}
+
+/// Mann-Whitney U test result for two independent samples.
+#[derive(Clone, Copy, Debug, PartialEq)]
+pub struct MannWhitney {
+    /// The smaller of `u1` and `u2`.
+    pub u: f64,
+    /// U statistic for sample `a`.
+    pub u1: f64,
+    /// U statistic for sample `b`.
+    pub u2: f64,
+    /// Two-sided p-value (normal approximation with tie + continuity
+    /// correction).
+    pub p_value: f64,
+}
+
+/// Mann-Whitney U (a.k.a. Wilcoxon rank-sum) test on two independent samples.
+///
+/// Ranks the pooled samples, averaging tied ranks; computes U from the rank
+/// sums; reports the two-sided p-value via the normal approximation with tie
+/// correction and continuity correction. For tiny samples this approximation
+/// is rough; the sweep uses M >= ~20 seeds where it is good.
+///
+/// Returns `p_value = 1.0` (non-significant) when either sample is empty or all
+/// pooled values are identical (no separation is possible, so the variance of
+/// the normal approximation is zero).
+pub fn mann_whitney_u(a: &[f64], b: &[f64]) -> MannWhitney {
+    let n1 = a.len();
+    let n2 = b.len();
+    if n1 == 0 || n2 == 0 {
+        // No separation possible with an empty sample. Report a degenerate but
+        // finite result with a non-significant p-value.
+        return MannWhitney {
+            u: 0.0,
+            u1: 0.0,
+            u2: 0.0,
+            p_value: 1.0,
+        };
+    }
+
+    // 1. Pool, tagging each value with which sample it came from (false = a),
+    //    sort by value, and assign average ranks (1..=N) to tied groups.
+    let mut pooled: Vec<(f64, bool)> = Vec::with_capacity(n1 + n2);
+    pooled.extend(a.iter().map(|&v| (v, false)));
+    pooled.extend(b.iter().map(|&v| (v, true)));
+    pooled.sort_by(|x, y| x.0.partial_cmp(&y.0).unwrap_or(std::cmp::Ordering::Equal));
+
+    let n = (n1 + n2) as f64;
+    let mut r1 = 0.0; // sum of ranks belonging to sample `a`
+    // Σ (t^3 - t) over each tie group of size t, for the variance correction.
+    let mut tie_term = 0.0;
+    let mut i = 0;
+    while i < pooled.len() {
+        // Extend [i, j) over the run of values equal to pooled[i].0.
+        let mut j = i + 1;
+        while j < pooled.len() && pooled[j].0 == pooled[i].0 {
+            j += 1;
+        }
+        let group_len = j - i;
+        // Ranks are 1-based; the average rank of positions i..j (0-based) is
+        // ((i+1) + j) / 2.
+        let avg_rank = ((i + 1) + j) as f64 / 2.0;
+        for entry in &pooled[i..j] {
+            if !entry.1 {
+                r1 += avg_rank;
+            }
+        }
+        if group_len > 1 {
+            let t = group_len as f64;
+            tie_term += t * t * t - t;
+        }
+        i = j;
+    }
+
+    // 2. U statistics from the rank sums.
+    let n1f = n1 as f64;
+    let n2f = n2 as f64;
+    let u1 = r1 - n1f * (n1f + 1.0) / 2.0;
+    let u2 = n1f * n2f - u1;
+    let u = u1.min(u2);
+
+    // 3. Mean and tie-corrected variance of the U distribution.
+    let mu = n1f * n2f / 2.0;
+    let variance = (n1f * n2f / 12.0) * ((n + 1.0) - tie_term / (n * (n - 1.0)));
+
+    // 4. Two-sided p-value via the normal approximation with a 0.5 continuity
+    //    correction. When the variance is zero (all pooled values identical,
+    //    or n == 1 with no spread), no separation is possible -- report the
+    //    non-significant default rather than dividing by zero.
+    let p_value = if variance <= 0.0 {
+        1.0
+    } else {
+        let z = ((u - mu).abs() - 0.5).max(0.0) / variance.sqrt();
+        (2.0 * (1.0 - phi(z))).clamp(0.0, 1.0)
+    };
+
+    MannWhitney { u, u1, u2, p_value }
+}
+
+/// Error function via the Abramowitz & Stegun 7.1.26 rational approximation
+/// (max absolute error ~1.5e-7) -- ample accuracy for a significance verdict.
+///
+/// A small local copy keeps this module self-contained and independently
+/// testable (the VM-internal `crate::alloc::erfc_approx`/`normal_cdf` are an
+/// implementation detail of the allocation opcodes).
+fn erf(x: f64) -> f64 {
+    // A&S 7.1.26 is stated for x >= 0; erf is odd, so reflect for x < 0.
+    let sign = if x < 0.0 { -1.0 } else { 1.0 };
+    let x = x.abs();
+
+    const A1: f64 = 0.254_829_592;
+    const A2: f64 = -0.284_496_736;
+    const A3: f64 = 1.421_413_741;
+    const A4: f64 = -1.453_152_027;
+    const A5: f64 = 1.061_405_429;
+    const P: f64 = 0.327_591_1;
+
+    let t = 1.0 / (1.0 + P * x);
+    // Horner form of (a1 t + a2 t^2 + a3 t^3 + a4 t^4 + a5 t^5).
+    let poly = ((((A5 * t + A4) * t + A3) * t + A2) * t + A1) * t;
+    let y = 1.0 - poly * (-x * x).exp();
+    sign * y
+}
+
+/// Standard normal CDF, `Phi(x) = 0.5 * (1 + erf(x / sqrt(2)))`.
+fn phi(x: f64) -> f64 {
+    0.5 * (1.0 + erf(x / std::f64::consts::SQRT_2))
+}
+
+#[cfg(test)]
+mod tests {
+    use super::*;
+    use proptest::prelude::*;
+
+    const EPS: f64 = 1e-9;
+
+    fn close(a: f64, b: f64) -> bool {
+        (a - b).abs() < EPS
+    }
+
+    // --- geomean ---
+
+    #[test]
+    fn test_geomean_two_values() {
+        // sqrt(2*8) = sqrt(16) = 4.
+        assert!(close(geomean(&[2.0, 8.0]), 4.0), "{}", geomean(&[2.0, 8.0]));
+    }
+
+    #[test]
+    fn test_geomean_three_values() {
+        // cbrt(1*10*100) = cbrt(1000) = 10.
+        let g = geomean(&[1.0, 10.0, 100.0]);
+        assert!(close(g, 10.0), "{}", g);
+    }
+
+    #[test]
+    fn test_geomean_empty_is_zero() {
+        assert_eq!(geomean(&[]), 0.0);
+    }
+
+    #[test]
+    fn test_geomean_single() {
+        assert_eq!(geomean(&[5.0]), 5.0);
+    }
+
+    // --- percentile / median (type 7) ---
+
+    #[test]
+    fn test_median_odd() {
+        assert_eq!(median(&[1.0, 2.0, 3.0]), 2.0);
+    }
+
+    #[test]
+    fn test_median_even() {
+        assert_eq!(median(&[1.0, 2.0, 3.0, 4.0]), 2.5);
+    }
+
+    #[test]
+    fn test_percentile_type7_quartiles() {
+        // NumPy np.percentile([1,2,3,4,5], 25) == 2.0, 75 == 4.0.
+        assert_eq!(percentile(&[1.0, 2.0, 3.0, 4.0, 5.0], 0.25), 2.0);
+        assert_eq!(percentile(&[1.0, 2.0, 3.0, 4.0, 5.0], 0.75), 4.0);
+    }
+
+    #[test]
+    fn test_percentile_empty_is_zero() {
+        assert_eq!(percentile(&[], 0.5), 0.0);
+    }
+
+    #[test]
+    fn test_percentile_single() {
+        assert_eq!(percentile(&[7.0], 0.9), 7.0);
+    }
+
+    #[test]
+    fn test_percentile_unsorted_input() {
+        // The function must sort a copy: a reversed input gives the same answer.
+        assert_eq!(percentile(&[5.0, 4.0, 3.0, 2.0, 1.0], 0.25), 2.0);
+    }
+
+    #[test]
+    fn test_percentile_endpoints() {
+        assert_eq!(percentile(&[1.0, 2.0, 3.0, 4.0, 5.0], 0.0), 1.0);
+        assert_eq!(percentile(&[1.0, 2.0, 3.0, 4.0, 5.0], 1.0), 5.0);
+    }
+
+    // --- Mann-Whitney U ---
+
+    #[test]
+    fn test_mann_whitney_complete_separation() {
+        // a strictly below b: complete separation. With n1 = n2 = 4,
+        // r1 = 1+2+3+4 = 10, u1 = 10 - 4*5/2 = 0, u2 = 16 - 0 = 16, u = 0.
+        let r = mann_whitney_u(&[1.0, 2.0, 3.0, 4.0], &[5.0, 6.0, 7.0, 8.0]);
+        assert_eq!(r.u1, 0.0);
+        assert_eq!(r.u2, 16.0);
+        assert_eq!(r.u, 0.0);
+        assert!(
+            r.p_value < 0.05,
+            "p_value {} should be significant",
+            r.p_value
+        );
+    }
+
+    #[test]
+    fn test_mann_whitney_no_difference() {
+        // Identical samples: every value tied. u1 == u2 == n1*n2/2 == 8, and
+        // the tie-corrected variance is 0, so p_value is the non-significant
+        // default of 1.0.
+        let r = mann_whitney_u(&[1.0, 2.0, 3.0, 4.0], &[1.0, 2.0, 3.0, 4.0]);
+        assert_eq!(r.u1, 8.0);
+        assert_eq!(r.u2, 8.0);
+        assert!(
+            r.p_value > 0.5,
+            "p_value {} should be non-significant",
+            r.p_value
+        );
+    }
+
+    #[test]
+    fn test_mann_whitney_u1_plus_u2_invariant() {
+        // u1 + u2 == n1*n2 on a mixed (interleaved, with ties) example.
+        let a = [1.0, 3.0, 5.0, 7.0, 3.0];
+        let b = [2.0, 4.0, 6.0, 3.0];
+        let r = mann_whitney_u(&a, &b);
+        let n1n2 = (a.len() * b.len()) as f64;
+        assert!(
+            close(r.u1 + r.u2, n1n2),
+            "u1 {} + u2 {} != n1*n2 {}",
+            r.u1,
+            r.u2,
+            n1n2
+        );
+    }
+
+    #[test]
+    fn test_mann_whitney_empty_is_nonsignificant() {
+        let r = mann_whitney_u(&[], &[1.0, 2.0, 3.0]);
+        assert_eq!(r.p_value, 1.0);
+        assert!(r.u.is_finite());
+        assert!(r.u1.is_finite());
+        assert!(r.u2.is_finite());
+    }
+
+    // --- erf / Phi sanity (exercised indirectly through the p-value path) ---
+
+    #[test]
+    fn test_phi_zero() {
+        assert!(close(phi(0.0), 0.5), "{}", phi(0.0));
+    }
+
+    #[test]
+    fn test_phi_1_96() {
+        // The classic 97.5th percentile of the standard normal.
+        assert!((phi(1.96) - 0.975).abs() < 1e-3, "{}", phi(1.96));
+    }
+
+    #[test]
+    fn test_erf_known_values() {
+        assert!(close(erf(0.0), 0.0), "{}", erf(0.0));
+        // erf(1) ~= 0.8427007929 (A&S 7.1.26 max error ~1.5e-7).
+        assert!((erf(1.0) - 0.842_700_792_9).abs() < 1e-6, "{}", erf(1.0));
+        // erf is odd.
+        assert!(close(erf(-0.5), -erf(0.5)), "erf not odd");
+    }
+
+    // --- No NaN: every primitive on empty / degenerate input is finite ---
+
+    #[test]
+    fn test_no_nan_on_degenerate_input() {
+        assert!(geomean(&[]).is_finite());
+        assert!(geomean(&[3.0]).is_finite());
+        assert!(percentile(&[], 0.5).is_finite());
+        assert!(percentile(&[1.0], 0.5).is_finite());
+        assert!(median(&[]).is_finite());
+        let r0 = mann_whitney_u(&[], &[]);
+        assert!(r0.u.is_finite() && r0.u1.is_finite() && r0.u2.is_finite());
+        assert!(r0.p_value.is_finite());
+        let r1 = mann_whitney_u(&[1.0, 1.0], &[1.0, 1.0]);
+        assert!(r1.p_value.is_finite());
+        assert!(phi(0.0).is_finite());
+        assert!(erf(0.0).is_finite());
+    }
+
+    // --- property tests for the statistics invariants ---
+
+    proptest! {
+        #![proptest_config(ProptestConfig::with_cases(128))]
+
+        /// The geometric mean is a function of the multiset of values: it is
+        /// invariant under any permutation of the input (the product of the
+        /// values is commutative).
+        #[test]
+        fn prop_geomean_permutation_invariant(
+            mut vals in prop::collection::vec(0.01f64..1000.0, 1..=12),
+            seed in any::<u64>(),
+        ) {
+            let base = geomean(&vals);
+            // Deterministic Fisher-Yates shuffle driven by `seed` so the
+            // property is a pure rearrangement of the same multiset.
+            let mut state = seed | 1;
+            for i in (1..vals.len()).rev() {
+                state = state.wrapping_mul(6364136223846793005).wrapping_add(1);
+                let j = (state >> 33) as usize % (i + 1);
+                vals.swap(i, j);
+            }
+            let shuffled = geomean(&vals);
+            // Relative tolerance: ln/exp accumulates rounding across orderings.
+            prop_assert!(
+                (base - shuffled).abs() <= 1e-9 * base.abs().max(1.0),
+                "geomean changed under permutation: {} vs {}",
+                base,
+                shuffled
+            );
+        }
+
+        /// `percentile` is bounded by the sample's min and max and is monotone
+        /// non-decreasing in `p`. Both are core type-7 invariants and both must
+        /// produce finite values.
+        #[test]
+        fn prop_percentile_bounded_and_monotone(
+            vals in prop::collection::vec(-500.0f64..500.0, 1..=20),
+            p_lo in 0.0f64..=1.0,
+            delta in 0.0f64..=1.0,
+        ) {
+            let p_hi = (p_lo + delta).min(1.0);
+            let min = vals.iter().cloned().fold(f64::INFINITY, f64::min);
+            let max = vals.iter().cloned().fold(f64::NEG_INFINITY, f64::max);
+            let q_lo = percentile(&vals, p_lo);
+            let q_hi = percentile(&vals, p_hi);
+            prop_assert!(q_lo.is_finite() && q_hi.is_finite());
+            // Bounded by the data range (small slack for interpolation rounding).
+            prop_assert!(q_lo >= min - 1e-9 && q_lo <= max + 1e-9, "{} not in [{},{}]", q_lo, min, max);
+            // Monotone non-decreasing in p.
+            prop_assert!(q_hi >= q_lo - 1e-9, "percentile not monotone: {} < {}", q_hi, q_lo);
+        }
+
+        /// The partition identity `u1 + u2 == n1 * n2` holds for ANY pair of
+        /// non-empty samples, and the reported `u` is the smaller of the two.
+        /// The two-sided p-value is always a finite probability in [0, 1].
+        #[test]
+        fn prop_mann_whitney_partition_identity(
+            a in prop::collection::vec(-50.0f64..50.0, 1..=15),
+            b in prop::collection::vec(-50.0f64..50.0, 1..=15),
+        ) {
+            let r = mann_whitney_u(&a, &b);
+            let n1n2 = (a.len() * b.len()) as f64;
+            prop_assert!(
+                (r.u1 + r.u2 - n1n2).abs() < 1e-9,
+                "u1 {} + u2 {} != n1*n2 {}",
+                r.u1, r.u2, n1n2
+            );
+            prop_assert!((r.u - r.u1.min(r.u2)).abs() < 1e-9);
+            prop_assert!(r.p_value.is_finite() && (0.0..=1.0).contains(&r.p_value));
+        }
+    }
+}
diff --git a/src/simlin-engine/src/layout/mod.rs b/src/simlin-engine/src/layout/mod.rs
index b8c7d3d09..d5414d693 100644
--- a/src/simlin-engine/src/layout/mod.rs
+++ b/src/simlin-engine/src/layout/mod.rs
@@ -6,6 +6,7 @@ pub mod annealing;
 pub mod chain;
 pub mod config;
 pub mod connector;
+pub mod eval_stats;
 pub mod graph;
 pub mod metadata;
 pub mod metrics;

From a94a930baa2baf077fadf93c17ba37364eb6a800 Mon Sep 17 00:00:00 2001
From: Bobby Powers <bobbypowers@gmail.com>
Date: Fri, 22 May 2026 22:11:33 -0700
Subject: [PATCH 11/38] engine: add ModelStats/CorpusReport constructors

---
 src/simlin-engine/src/layout/eval_stats.rs | 323 +++++++++++++++++++++
 1 file changed, 323 insertions(+)

diff --git a/src/simlin-engine/src/layout/eval_stats.rs b/src/simlin-engine/src/layout/eval_stats.rs
index 0bdc97cc2..5ba5227ed 100644
--- a/src/simlin-engine/src/layout/eval_stats.rs
+++ b/src/simlin-engine/src/layout/eval_stats.rs
@@ -18,6 +18,8 @@
 // The corpus sweep (Phase 3) is the imperative shell that fills these structs
 // from real layouts.
 
+use crate::layout::metrics::LayoutMetrics;
+
 /// Geometric mean of strictly-positive values: `exp(mean(ln(x)))`.
 ///
 /// Returns `0.0` for an empty slice. Values must be `> 0`; layout costs are
@@ -202,6 +204,165 @@ fn phi(x: f64) -> f64 {
     0.5 * (1.0 + erf(x / std::f64::consts::SQRT_2))
 }
 
+/// Floor applied to each model's median before it enters the corpus geometric
+/// mean. A geometric mean is the product of its terms, so a single `0` median
+/// would zero the whole aggregate; flooring with this small epsilon keeps a
+/// genuinely-perfect (zero-cost) model from collapsing the corpus number while
+/// remaining far below any meaningful cost. Documented and applied only in
+/// [`CorpusReport::from_model_stats`].
+pub const GEOMEAN_FLOOR_EPSILON: f64 = 1e-9;
+
+/// One per-seed layout sample: the seed that produced the layout, its computed
+/// metrics, and the scalar weighted cost the optimizer minimizes.
+#[derive(Clone, Debug)]
+pub struct MetricSample {
+    pub seed: u64,
+    pub metrics: LayoutMetrics,
+    pub weighted_cost: f64,
+}
+
+/// Aggregated statistics for one model's seed sweep: the raw per-seed samples
+/// plus the center (`median_cost`), spread (`p25`, `p75`), the best-of-k
+/// production proxy, and the best/median/worst seeds (which drive Phase 3's
+/// PNG renders).
+#[derive(Clone, Debug)]
+pub struct ModelStats {
+    pub model: String,
+    /// One sample per seed.
+    pub samples: Vec<MetricSample>,
+    pub median_cost: f64,
+    /// `(p25, p75)` of the weighted costs.
+    pub spread: (f64, f64),
+    /// Production proxy: the min weighted cost over the k production seeds.
+    pub best_of_k_cost: f64,
+    pub best_seed: u64,
+    pub median_seed: u64,
+    pub worst_seed: u64,
+}
+
+/// Corpus-wide report: one `ModelStats` per model plus the geometric mean of
+/// the per-model medians (the single headline aggregate, benchstat-style).
+#[derive(Clone, Debug)]
+pub struct CorpusReport {
+    pub per_model: Vec<ModelStats>,
+    pub geomean_of_medians: f64,
+}
+
+impl ModelStats {
+    /// Summarize a model's per-seed samples.
+    ///
+    /// `production_seeds` is the fixed seed set used for the best-of-k proxy:
+    /// `best_of_k_cost` is the min `weighted_cost` among the samples whose seed
+    /// is in that set, falling back to the global min when none of the
+    /// production seeds were sampled. The median seed is the sample whose cost
+    /// is closest to `median_cost`, breaking ties on the lowest seed (so the
+    /// chosen render is deterministic). Empty `samples` yields all-zero fields
+    /// and seeds of `0` -- no panic.
+    pub fn from_samples(
+        model: String,
+        samples: Vec<MetricSample>,
+        production_seeds: &[u64],
+    ) -> ModelStats {
+        if samples.is_empty() {
+            return ModelStats {
+                model,
+                samples,
+                median_cost: 0.0,
+                spread: (0.0, 0.0),
+                best_of_k_cost: 0.0,
+                best_seed: 0,
+                median_seed: 0,
+                worst_seed: 0,
+            };
+        }
+
+        let costs: Vec<f64> = samples.iter().map(|s| s.weighted_cost).collect();
+        let median_cost = median(&costs);
+        let spread = (percentile(&costs, 0.25), percentile(&costs, 0.75));
+
+        // best/worst seeds: the seeds of the global min / max weighted_cost.
+        // Tie-break on the lowest seed so the chosen render is deterministic.
+        let best_seed = samples
+            .iter()
+            .min_by(|x, y| {
+                x.weighted_cost
+                    .partial_cmp(&y.weighted_cost)
+                    .unwrap_or(std::cmp::Ordering::Equal)
+                    .then(x.seed.cmp(&y.seed))
+            })
+            .map(|s| s.seed)
+            .unwrap_or(0);
+        let worst_seed = samples
+            .iter()
+            .max_by(|x, y| {
+                x.weighted_cost
+                    .partial_cmp(&y.weighted_cost)
+                    .unwrap_or(std::cmp::Ordering::Equal)
+                    // For a tie on cost, max_by returns the LATER-compared-greater
+                    // element; flip the seed comparison so the lowest seed wins.
+                    .then(y.seed.cmp(&x.seed))
+            })
+            .map(|s| s.seed)
+            .unwrap_or(0);
+
+        // median seed: the sample whose cost is closest to `median_cost`,
+        // breaking ties on the lowest seed.
+        let median_seed = samples
+            .iter()
+            .min_by(|x, y| {
+                let dx = (x.weighted_cost - median_cost).abs();
+                let dy = (y.weighted_cost - median_cost).abs();
+                dx.partial_cmp(&dy)
+                    .unwrap_or(std::cmp::Ordering::Equal)
+                    .then(x.seed.cmp(&y.seed))
+            })
+            .map(|s| s.seed)
+            .unwrap_or(0);
+
+        // best-of-k: min weighted_cost among samples whose seed is a production
+        // seed; fall back to the global min when none were sampled.
+        let prod_min = samples
+            .iter()
+            .filter(|s| production_seeds.contains(&s.seed))
+            .map(|s| s.weighted_cost)
+            .fold(f64::INFINITY, f64::min);
+        let best_of_k_cost = if prod_min.is_finite() {
+            prod_min
+        } else {
+            costs.iter().cloned().fold(f64::INFINITY, f64::min)
+        };
+
+        ModelStats {
+            model,
+            samples,
+            median_cost,
+            spread,
+            best_of_k_cost,
+            best_seed,
+            median_seed,
+            worst_seed,
+        }
+    }
+}
+
+impl CorpusReport {
+    /// Build a corpus report. `geomean_of_medians` is the geometric mean of
+    /// each model's `median_cost`, with each median floored by
+    /// [`GEOMEAN_FLOOR_EPSILON`] so a single `0` median cannot zero the whole
+    /// aggregate. An empty corpus yields `geomean_of_medians == 0.0`.
+    pub fn from_model_stats(per_model: Vec<ModelStats>) -> CorpusReport {
+        let medians: Vec<f64> = per_model
+            .iter()
+            .map(|m| m.median_cost.max(GEOMEAN_FLOOR_EPSILON))
+            .collect();
+        let geomean_of_medians = geomean(&medians);
+        CorpusReport {
+            per_model,
+            geomean_of_medians,
+        }
+    }
+}
+
 #[cfg(test)]
 mod tests {
     use super::*;
@@ -448,4 +609,166 @@ mod tests {
             prop_assert!(r.p_value.is_finite() && (0.0..=1.0).contains(&r.p_value));
         }
     }
+
+    // --- Task 2: ModelStats / CorpusReport constructors ---
+
+    /// A `LayoutMetrics` whose `node_overlap` carries `cost` and every other
+    /// term is zero, so `weighted_cost` with `node_overlap == 1.0` returns
+    /// exactly `cost`. Keeps the test fixtures readable while still exercising
+    /// the real struct.
+    fn metrics_with_cost(cost: f64) -> LayoutMetrics {
+        LayoutMetrics {
+            node_overlap: cost,
+            node_connector_overlap: 0.0,
+            label_overlap: 0.0,
+            crossings: 0.0,
+            sprawl: 0.0,
+            edge_length_cv: 0.0,
+            aspect_penalty: 0.0,
+            chain_straightness: 0.0,
+            loop_compactness: 0.0,
+        }
+    }
+
+    fn sample(seed: u64, cost: f64) -> MetricSample {
+        MetricSample {
+            seed,
+            metrics: metrics_with_cost(cost),
+            weighted_cost: cost,
+        }
+    }
+
+    #[test]
+    fn test_from_samples_known_set() {
+        // Five seeds with hand-pickable costs.
+        //   seed 1 -> 10, seed 2 -> 30, seed 3 -> 20, seed 4 -> 50, seed 5 -> 40
+        // Sorted costs: [10, 20, 30, 40, 50].
+        //   median (type-7, p=0.5) = 30
+        //   p25 = 20, p75 = 40
+        //   global min cost = 10 (seed 1), max cost = 50 (seed 4)
+        //   median-nearest cost = 30 (seed 2)
+        let samples = vec![
+            sample(1, 10.0),
+            sample(2, 30.0),
+            sample(3, 20.0),
+            sample(4, 50.0),
+            sample(5, 40.0),
+        ];
+        // Production seeds: 3 and 5 (costs 20 and 40). Min over them is 20, which
+        // is NOT the global min (10, seed 1). This is the "best-of-k differs from
+        // the global min" case.
+        let production_seeds = [3u64, 5u64];
+        let stats = ModelStats::from_samples("m".to_string(), samples, &production_seeds);
+
+        assert_eq!(stats.model, "m");
+        assert_eq!(stats.median_cost, 30.0);
+        assert_eq!(stats.spread, (20.0, 40.0));
+        assert_eq!(
+            stats.best_of_k_cost, 20.0,
+            "best-of-k must use production seeds"
+        );
+        assert_eq!(stats.best_seed, 1, "global min cost is seed 1");
+        assert_eq!(stats.worst_seed, 4, "global max cost is seed 4");
+        assert_eq!(stats.median_seed, 2, "median-nearest cost is seed 2");
+    }
+
+    #[test]
+    fn test_from_samples_best_of_k_falls_back_to_global_min() {
+        // No production seed was sampled -> best_of_k_cost falls back to global
+        // min weighted_cost.
+        let samples = vec![sample(1, 10.0), sample(2, 30.0), sample(3, 20.0)];
+        let production_seeds = [100u64, 200u64];
+        let stats = ModelStats::from_samples("m".to_string(), samples, &production_seeds);
+        assert_eq!(
+            stats.best_of_k_cost, 10.0,
+            "no production seed sampled -> global min"
+        );
+    }
+
+    #[test]
+    fn test_from_samples_median_seed_tie_break_lowest() {
+        // Two seeds equidistant from the median cost: the lower seed wins.
+        //   seeds 5, 9 with costs 10 and 30; sorted costs [10, 30] -> median 20.
+        //   |10 - 20| == |30 - 20| == 10, a tie. Lowest seed (5) must win.
+        let samples = vec![sample(9, 30.0), sample(5, 10.0)];
+        let stats = ModelStats::from_samples("m".to_string(), samples, &[]);
+        assert_eq!(stats.median_cost, 20.0);
+        assert_eq!(stats.median_seed, 5, "tie must break on the lowest seed");
+    }
+
+    #[test]
+    fn test_from_samples_empty_is_all_zero() {
+        let stats = ModelStats::from_samples("empty".to_string(), vec![], &[1, 2, 3]);
+        assert_eq!(stats.median_cost, 0.0);
+        assert_eq!(stats.spread, (0.0, 0.0));
+        assert_eq!(stats.best_of_k_cost, 0.0);
+        assert_eq!(stats.best_seed, 0);
+        assert_eq!(stats.median_seed, 0);
+        assert_eq!(stats.worst_seed, 0);
+        // Finite, no NaN.
+        assert!(stats.median_cost.is_finite());
+        assert!(stats.spread.0.is_finite() && stats.spread.1.is_finite());
+        assert!(stats.best_of_k_cost.is_finite());
+    }
+
+    fn model_stats_with_median(model: &str, median: f64) -> ModelStats {
+        // Build a one-sample model whose median equals `median`.
+        ModelStats::from_samples(model.to_string(), vec![sample(1, median)], &[1])
+    }
+
+    #[test]
+    fn test_from_model_stats_geomean_of_medians() {
+        // Three models with medians 2, 8, 32: geomean = cbrt(2*8*32) = cbrt(512) = 8.
+        let per_model = vec![
+            model_stats_with_median("a", 2.0),
+            model_stats_with_median("b", 8.0),
+            model_stats_with_median("c", 32.0),
+        ];
+        let medians: Vec<f64> = per_model.iter().map(|m| m.median_cost).collect();
+        let report = CorpusReport::from_model_stats(per_model);
+        assert!(
+            close(report.geomean_of_medians, geomean(&medians)),
+            "{} != {}",
+            report.geomean_of_medians,
+            geomean(&medians)
+        );
+        assert!(
+            close(report.geomean_of_medians, 8.0),
+            "{}",
+            report.geomean_of_medians
+        );
+    }
+
+    #[test]
+    fn test_from_model_stats_zero_median_does_not_zero_aggregate() {
+        // A model with median 0 must not collapse the corpus geomean to 0; the
+        // epsilon floor keeps it positive and finite.
+        let per_model = vec![
+            model_stats_with_median("a", 0.0),
+            model_stats_with_median("b", 10.0),
+            model_stats_with_median("c", 1000.0),
+        ];
+        let report = CorpusReport::from_model_stats(per_model);
+        assert!(
+            report.geomean_of_medians > 0.0,
+            "a single 0 median must not zero the aggregate: got {}",
+            report.geomean_of_medians
+        );
+        assert!(report.geomean_of_medians.is_finite());
+        // It must equal the geomean of the floored medians, exactly.
+        let floored = [GEOMEAN_FLOOR_EPSILON, 10.0, 1000.0];
+        assert!(
+            close(report.geomean_of_medians, geomean(&floored)),
+            "{} != {}",
+            report.geomean_of_medians,
+            geomean(&floored)
+        );
+    }
+
+    #[test]
+    fn test_from_model_stats_empty_corpus_is_zero() {
+        let report = CorpusReport::from_model_stats(vec![]);
+        assert_eq!(report.geomean_of_medians, 0.0);
+        assert!(report.geomean_of_medians.is_finite());
+    }
 }

From 26a223213c25112b05ada244dc635dfbd68bcbdf Mon Sep 17 00:00:00 2001
From: Bobby Powers <bobbypowers@gmail.com>
Date: Fri, 22 May 2026 22:17:43 -0700
Subject: [PATCH 12/38] engine: add baseline-vs-candidate compare() with
 Mann-Whitney significance

---
 src/simlin-engine/src/layout/eval_stats.rs | 335 +++++++++++++++++++++
 1 file changed, 335 insertions(+)

diff --git a/src/simlin-engine/src/layout/eval_stats.rs b/src/simlin-engine/src/layout/eval_stats.rs
index 5ba5227ed..2b56a995a 100644
--- a/src/simlin-engine/src/layout/eval_stats.rs
+++ b/src/simlin-engine/src/layout/eval_stats.rs
@@ -363,6 +363,146 @@ impl CorpusReport {
     }
 }
 
+/// Per-model verdict from comparing a baseline against a candidate report.
+#[derive(Clone, Debug)]
+pub struct ModelComparison {
+    pub model: String,
+    pub baseline_median: f64,
+    pub candidate_median: f64,
+    /// `candidate_median / baseline_median - 1.0`, or `0.0` when the baseline
+    /// median is `0` (so a degenerate baseline never produces inf/NaN). A
+    /// negative ratio means the candidate is cheaper (better).
+    pub delta_ratio: f64,
+    /// Two-sided Mann-Whitney U p-value over the two models' seed-sample
+    /// `weighted_cost` vectors.
+    pub p_value: f64,
+    /// `p_value < SIGNIFICANCE_ALPHA`.
+    pub significant: bool,
+}
+
+/// Result of comparing two corpus reports: one [`ModelComparison`] per matched
+/// model plus the corpus-wide aggregate delta and significance verdict.
+#[derive(Clone, Debug)]
+pub struct Comparison {
+    /// One entry per model present in BOTH reports (unmatched models are
+    /// skipped -- see [`compare`]), in baseline iteration order.
+    pub per_model: Vec<ModelComparison>,
+    /// `geomean(candidate medians) / geomean(baseline medians) - 1.0` over the
+    /// matched per-model medians, or `0.0` when the baseline geomean is `0`.
+    pub aggregate_delta_ratio: f64,
+    /// Two-sided Mann-Whitney U p-value over the matched per-model medians (see
+    /// [`compare`] for why Mann-Whitney rather than a paired test).
+    pub aggregate_p_value: f64,
+    /// `aggregate_p_value < SIGNIFICANCE_ALPHA`.
+    pub aggregate_significant: bool,
+}
+
+/// Significance threshold for the p-value verdicts -- the conventional 5%.
+pub const SIGNIFICANCE_ALPHA: f64 = 0.05;
+
+/// Compute `candidate / baseline - 1.0`, returning `0.0` when `baseline == 0`
+/// so a degenerate (zero) baseline never produces an infinite or NaN ratio.
+/// Mirrors the no-NaN policy of the rest of this module.
+fn delta_ratio(baseline: f64, candidate: f64) -> f64 {
+    if baseline == 0.0 {
+        0.0
+    } else {
+        candidate / baseline - 1.0
+    }
+}
+
+/// Compare two corpus reports.
+///
+/// Models are matched by `model` name; only models present in BOTH reports are
+/// compared. A model present in just one report is **skipped** (it has no
+/// counterpart to difference against). The returned `per_model` is in baseline
+/// iteration order.
+///
+/// Per matched model: the two seed-sample `weighted_cost` vectors are run
+/// through [`mann_whitney_u`]; `delta_ratio` is computed from the medians
+/// (`0.0` when the baseline median is `0`); `significant` is
+/// `p_value < SIGNIFICANCE_ALPHA`.
+///
+/// Aggregate: `aggregate_delta_ratio` is the ratio of the candidate-side to
+/// baseline-side geometric mean of the matched per-model medians (each side
+/// floored by [`GEOMEAN_FLOOR_EPSILON`] exactly as [`CorpusReport`] does, so a
+/// `0` median can't zero the aggregate). `aggregate_p_value` is
+/// `mann_whitney_u(baseline_medians, candidate_medians).p_value` over the
+/// matched per-model medians.
+///
+/// The aggregate significance test treats the two median vectors as
+/// independent samples (Mann-Whitney U), per the design. A paired test such as
+/// Wilcoxon signed-rank -- which would exploit the model-by-model pairing of
+/// the matched medians -- is a documented future refinement, not implemented
+/// here.
+///
+/// On empty or fully-disjoint reports there are no matched models:
+/// `per_model` is empty, `aggregate_delta_ratio == 0.0`, and the aggregate is
+/// non-significant with a finite p-value (no NaN).
+pub fn compare(baseline: &CorpusReport, candidate: &CorpusReport) -> Comparison {
+    // Index the candidate's models by name so we can pull the matching entry in
+    // baseline iteration order without an O(n^2) scan.
+    let candidate_by_name: std::collections::HashMap<&str, &ModelStats> = candidate
+        .per_model
+        .iter()
+        .map(|m| (m.model.as_str(), m))
+        .collect();
+
+    let mut per_model = Vec::new();
+    let mut baseline_medians = Vec::new();
+    let mut candidate_medians = Vec::new();
+
+    for base in &baseline.per_model {
+        let Some(cand) = candidate_by_name.get(base.model.as_str()) else {
+            // Unmatched: present only in the baseline, so skip it.
+            continue;
+        };
+
+        let baseline_costs: Vec<f64> = base.samples.iter().map(|s| s.weighted_cost).collect();
+        let candidate_costs: Vec<f64> = cand.samples.iter().map(|s| s.weighted_cost).collect();
+        let mw = mann_whitney_u(&baseline_costs, &candidate_costs);
+
+        let baseline_median = base.median_cost;
+        let candidate_median = cand.median_cost;
+        let ratio = delta_ratio(baseline_median, candidate_median);
+
+        baseline_medians.push(baseline_median);
+        candidate_medians.push(candidate_median);
+
+        per_model.push(ModelComparison {
+            model: base.model.clone(),
+            baseline_median,
+            candidate_median,
+            delta_ratio: ratio,
+            p_value: mw.p_value,
+            significant: mw.p_value < SIGNIFICANCE_ALPHA,
+        });
+    }
+
+    // Aggregate delta: ratio of the two geomean-of-medians, each side floored
+    // by the same epsilon CorpusReport uses so a single 0 median can't zero a
+    // side's geometric mean.
+    let baseline_floored: Vec<f64> = baseline_medians
+        .iter()
+        .map(|&m| m.max(GEOMEAN_FLOOR_EPSILON))
+        .collect();
+    let candidate_floored: Vec<f64> = candidate_medians
+        .iter()
+        .map(|&m| m.max(GEOMEAN_FLOOR_EPSILON))
+        .collect();
+    let aggregate_delta_ratio =
+        delta_ratio(geomean(&baseline_floored), geomean(&candidate_floored));
+
+    let aggregate_p_value = mann_whitney_u(&baseline_medians, &candidate_medians).p_value;
+
+    Comparison {
+        per_model,
+        aggregate_delta_ratio,
+        aggregate_p_value,
+        aggregate_significant: aggregate_p_value < SIGNIFICANCE_ALPHA,
+    }
+}
+
 #[cfg(test)]
 mod tests {
     use super::*;
@@ -771,4 +911,199 @@ mod tests {
         assert_eq!(report.geomean_of_medians, 0.0);
         assert!(report.geomean_of_medians.is_finite());
     }
+
+    // --- Task 3: compare(baseline, candidate) ---
+
+    /// Build a `ModelStats` directly from a list of `(seed, cost)` pairs, with
+    /// no production seeds (best-of-k irrelevant for the comparison tests).
+    fn model_stats_from_costs(model: &str, seed_costs: &[(u64, f64)]) -> ModelStats {
+        let samples: Vec<MetricSample> = seed_costs
+            .iter()
+            .map(|&(seed, cost)| sample(seed, cost))
+            .collect();
+        ModelStats::from_samples(model.to_string(), samples, &[])
+    }
+
+    #[test]
+    fn test_compare_identical_report_is_zero_and_nonsignificant() {
+        // AC4.5: comparing a report against itself must report no change and no
+        // significance, with p-values pinned to the non-significant default.
+        let report = CorpusReport::from_model_stats(vec![
+            model_stats_from_costs("a", &[(1, 10.0), (2, 20.0), (3, 30.0), (4, 40.0)]),
+            model_stats_from_costs("b", &[(1, 5.0), (2, 15.0), (3, 25.0), (4, 35.0)]),
+        ]);
+
+        let cmp = compare(&report, &report);
+
+        assert_eq!(cmp.per_model.len(), 2);
+        for m in &cmp.per_model {
+            assert_eq!(m.delta_ratio, 0.0, "model {} delta_ratio", m.model);
+            assert!(!m.significant, "model {} must not be significant", m.model);
+            // Identical seed samples ⇒ every value tied ⇒ non-significant.
+            assert!(
+                m.p_value > 0.5,
+                "model {} p_value {} should be non-significant",
+                m.model,
+                m.p_value
+            );
+        }
+        assert_eq!(cmp.aggregate_delta_ratio, 0.0);
+        assert!(!cmp.aggregate_significant);
+        assert!(
+            cmp.aggregate_p_value > 0.5,
+            "aggregate p_value {} should be non-significant",
+            cmp.aggregate_p_value
+        );
+    }
+
+    #[test]
+    fn test_compare_clear_improvement_is_negative_and_significant() {
+        // Candidate strictly below baseline with non-overlapping seed samples:
+        // the aggregate delta is negative and the per-model verdict is
+        // significant where the two samples completely separate.
+        let baseline = CorpusReport::from_model_stats(vec![
+            model_stats_from_costs(
+                "a",
+                &[(1, 100.0), (2, 110.0), (3, 120.0), (4, 130.0), (5, 140.0)],
+            ),
+            model_stats_from_costs(
+                "b",
+                &[(1, 200.0), (2, 210.0), (3, 220.0), (4, 230.0), (5, 240.0)],
+            ),
+        ]);
+        let candidate = CorpusReport::from_model_stats(vec![
+            model_stats_from_costs(
+                "a",
+                &[(1, 10.0), (2, 11.0), (3, 12.0), (4, 13.0), (5, 14.0)],
+            ),
+            model_stats_from_costs(
+                "b",
+                &[(1, 20.0), (2, 21.0), (3, 22.0), (4, 23.0), (5, 24.0)],
+            ),
+        ]);
+
+        let cmp = compare(&baseline, &candidate);
+
+        assert_eq!(cmp.per_model.len(), 2);
+        for m in &cmp.per_model {
+            assert!(
+                m.delta_ratio < 0.0,
+                "model {} delta_ratio {} should be negative",
+                m.model,
+                m.delta_ratio
+            );
+            assert!(
+                m.candidate_median < m.baseline_median,
+                "model {} candidate median {} should be below baseline {}",
+                m.model,
+                m.candidate_median,
+                m.baseline_median
+            );
+            assert!(
+                m.significant,
+                "model {} (completely separated samples) should be significant; p_value {}",
+                m.model, m.p_value
+            );
+        }
+        assert!(
+            cmp.aggregate_delta_ratio < 0.0,
+            "aggregate_delta_ratio {} should be negative",
+            cmp.aggregate_delta_ratio
+        );
+    }
+
+    #[test]
+    fn test_compare_only_matched_models_are_compared() {
+        // Models are matched by name; a model present in only one report is
+        // skipped. baseline has {a, b, only_baseline}; candidate has
+        // {a, b, only_candidate}. The matched set compared is {a, b}, in
+        // baseline order.
+        let baseline = CorpusReport::from_model_stats(vec![
+            model_stats_from_costs("only_baseline", &[(1, 1.0), (2, 2.0)]),
+            model_stats_from_costs("a", &[(1, 10.0), (2, 20.0), (3, 30.0)]),
+            model_stats_from_costs("b", &[(1, 100.0), (2, 200.0), (3, 300.0)]),
+        ]);
+        let candidate = CorpusReport::from_model_stats(vec![
+            model_stats_from_costs("b", &[(1, 100.0), (2, 200.0), (3, 300.0)]),
+            model_stats_from_costs("a", &[(1, 10.0), (2, 20.0), (3, 30.0)]),
+            model_stats_from_costs("only_candidate", &[(1, 9.0), (2, 8.0)]),
+        ]);
+
+        let cmp = compare(&baseline, &candidate);
+
+        // Exactly the two matched models, in baseline iteration order.
+        let names: Vec<&str> = cmp.per_model.iter().map(|m| m.model.as_str()).collect();
+        assert_eq!(
+            names,
+            vec!["a", "b"],
+            "only matched models, in baseline order"
+        );
+        // The unmatched names appear nowhere.
+        assert!(!names.contains(&"only_baseline"));
+        assert!(!names.contains(&"only_candidate"));
+    }
+
+    #[test]
+    fn test_compare_zero_baseline_median_no_divide_by_zero() {
+        // No NaN: a model whose baseline median is 0 yields delta_ratio == 0.0
+        // (not inf/NaN) and every reported field stays finite.
+        let baseline = CorpusReport::from_model_stats(vec![model_stats_from_costs(
+            "z",
+            &[(1, 0.0), (2, 0.0), (3, 0.0)],
+        )]);
+        let candidate = CorpusReport::from_model_stats(vec![model_stats_from_costs(
+            "z",
+            &[(1, 5.0), (2, 6.0), (3, 7.0)],
+        )]);
+
+        let cmp = compare(&baseline, &candidate);
+
+        assert_eq!(cmp.per_model.len(), 1);
+        let m = &cmp.per_model[0];
+        assert_eq!(m.baseline_median, 0.0);
+        assert_eq!(
+            m.delta_ratio, 0.0,
+            "delta_ratio with a 0 baseline median must be 0.0, not inf/NaN"
+        );
+        assert!(m.delta_ratio.is_finite());
+        assert!(m.candidate_median.is_finite());
+        assert!(m.p_value.is_finite());
+        assert!(cmp.aggregate_delta_ratio.is_finite());
+        assert!(cmp.aggregate_p_value.is_finite());
+    }
+
+    #[test]
+    fn test_compare_empty_reports_are_finite_and_nonsignificant() {
+        // Degenerate input: two empty corpora compare to no per-model rows, a
+        // zero aggregate delta, and a finite non-significant verdict.
+        let empty = CorpusReport::from_model_stats(vec![]);
+        let cmp = compare(&empty, &empty);
+        assert!(cmp.per_model.is_empty());
+        assert_eq!(cmp.aggregate_delta_ratio, 0.0);
+        assert!(cmp.aggregate_delta_ratio.is_finite());
+        assert!(cmp.aggregate_p_value.is_finite());
+        assert!(!cmp.aggregate_significant);
+    }
+
+    #[test]
+    fn test_compare_no_matched_models_is_finite() {
+        // Reports with disjoint model names share no matched models: no
+        // per-model rows, a zero aggregate delta, and a finite verdict.
+        let baseline =
+            CorpusReport::from_model_stats(vec![model_stats_from_costs("a", &[(1, 10.0)])]);
+        let candidate =
+            CorpusReport::from_model_stats(vec![model_stats_from_costs("b", &[(1, 20.0)])]);
+        let cmp = compare(&baseline, &candidate);
+        assert!(cmp.per_model.is_empty());
+        assert_eq!(cmp.aggregate_delta_ratio, 0.0);
+        assert!(cmp.aggregate_delta_ratio.is_finite());
+        assert!(cmp.aggregate_p_value.is_finite());
+        assert!(!cmp.aggregate_significant);
+    }
+
+    #[test]
+    fn test_compare_significance_alpha_is_five_percent() {
+        // The exported significance threshold is the conventional 0.05.
+        assert_eq!(SIGNIFICANCE_ALPHA, 0.05);
+    }
 }

From 1e069d7f242e330e192f427e20c6bafd4d35e229 Mon Sep 17 00:00:00 2001
From: Bobby Powers <bobbypowers@gmail.com>
Date: Fri, 22 May 2026 22:27:51 -0700
Subject: [PATCH 13/38] engine: add worst_seed tie-break regression test

The best_seed and median_seed lowest-seed tie-breaks in
ModelStats::from_samples already had dedicated tests, but worst_seed
did not. worst_seed uses a non-obvious construct -- max_by(...) followed
by a reversed seed comparison (.then(y.seed.cmp(&x.seed))) -- precisely
so the lowest seed wins when two samples tie at the maximum weighted
cost. Add a focused test that pins this behavior: two samples share the
max cost at different seeds (plus a lower-cost third), and worst_seed
must be the lower of the tied seeds. The test was verified non-vacuous
by temporarily flipping the tie-break direction (it failed, picking the
higher seed) before reverting.
---
 src/simlin-engine/src/layout/eval_stats.rs | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/src/simlin-engine/src/layout/eval_stats.rs b/src/simlin-engine/src/layout/eval_stats.rs
index 2b56a995a..e6532c22f 100644
--- a/src/simlin-engine/src/layout/eval_stats.rs
+++ b/src/simlin-engine/src/layout/eval_stats.rs
@@ -836,6 +836,22 @@ mod tests {
         assert_eq!(stats.median_seed, 5, "tie must break on the lowest seed");
     }
 
+    #[test]
+    fn test_from_samples_worst_seed_tie_break_lowest() {
+        // Two seeds SHARE the maximum cost; the lower seed must win. The third
+        // (lower-cost) sample ensures the max is a genuine tie, not the only
+        // value. seeds 7 and 4 both cost 50 (the max); seed 2 costs 10.
+        // worst_seed must be 4 (the lower of the two tied-at-max seeds), NOT 7.
+        // This fails if the tie-break direction in from_samples were reversed
+        // (a `.then(x.seed.cmp(&y.seed))` after max_by would pick 7).
+        let samples = vec![sample(7, 50.0), sample(2, 10.0), sample(4, 50.0)];
+        let stats = ModelStats::from_samples("m".to_string(), samples, &[]);
+        assert_eq!(
+            stats.worst_seed, 4,
+            "max-cost tie must break on the lowest seed"
+        );
+    }
+
     #[test]
     fn test_from_samples_empty_is_all_zero() {
         let stats = ModelStats::from_samples("empty".to_string(), vec![], &[1, 2, 3]);

From df6ac00f9402c1f875f3ba8248bcca23a80a28e3 Mon Sep 17 00:00:00 2001
From: Bobby Powers <bobbypowers@gmail.com>
Date: Fri, 22 May 2026 22:40:58 -0700
Subject: [PATCH 14/38] engine: add layout_eval example skeleton + expose
 LAYOUT_SEEDS

---
 src/simlin-engine/Cargo.toml              |   9 +
 src/simlin-engine/examples/layout_eval.rs | 264 ++++++++++++++++++++++
 src/simlin-engine/src/layout/mod.rs       |   7 +-
 3 files changed, 279 insertions(+), 1 deletion(-)
 create mode 100644 src/simlin-engine/examples/layout_eval.rs

diff --git a/src/simlin-engine/Cargo.toml b/src/simlin-engine/Cargo.toml
index c02081eed..de3c1f39a 100644
--- a/src/simlin-engine/Cargo.toml
+++ b/src/simlin-engine/Cargo.toml
@@ -115,6 +115,15 @@ name = "compiler_vector"
 name = "vdf_alias_decoder"
 required-features = ["file_io"]
 
+# The layout_eval example calls the png_render-gated `render_png` and loads
+# Vensim corpus models that reference external data (file_io). Examples are
+# auto-discovered and built by `--all-targets` / clippy / pre-commit under the
+# DEFAULT feature set (which excludes png_render); without this `[[example]]`
+# entry pinning required-features, that build would fail to compile the example.
+[[example]]
+name = "layout_eval"
+required-features = ["png_render", "file_io"]
+
 [[bench]]
 name = "array_ops"
 harness = false
diff --git a/src/simlin-engine/examples/layout_eval.rs b/src/simlin-engine/examples/layout_eval.rs
new file mode 100644
index 000000000..622924b59
--- /dev/null
+++ b/src/simlin-engine/examples/layout_eval.rs
@@ -0,0 +1,264 @@
+// Copyright 2026 The Simlin Authors. All rights reserved.
+// Use of this source code is governed by the Apache License,
+// Version 2.0, that can be found in the LICENSE file.
+
+//! Layout-quality evaluation sweep (on-demand; NOT part of `cargo test`).
+//!
+//! Lays out a curated corpus of models across many seeds, scores each layout
+//! with the layout-quality metric, renders best/median/worst (and any
+//! hand-authored reference) to PNG, and writes a metrics table (JSON), an HTML
+//! contact-sheet, and a baseline diff -- all under a gitignored `target/` dir.
+//!
+//! This is a thin imperative shell over the metric core
+//! (`layout::metrics::compute_layout_metrics`) and the statistics core
+//! (`layout::eval_stats`). It loads each model via the public `open_xmile` /
+//! `open_vensim` loaders (like `examples/backend_bench.rs`), runs
+//! `generate_layout_with_config` per seed, scores, summarizes, renders, and
+//! emits artifacts.
+//!
+//! Usage:
+//!   cargo run --release -p simlin-engine --features png_render,file_io --example layout_eval
+//!   LAYOUT_EVAL_MODELS=teacup,sir cargo run ... --example layout_eval
+//!
+//! Env knobs:
+//!   LAYOUT_EVAL_MODELS  comma list of corpus keys to run (default: all)
+//!   LAYOUT_EVAL_SEEDS   number of seeds M to sample (default: 25)
+//!   LAYOUT_EVAL_OUT     output directory (default: repo-root target/layout-eval)
+//!
+//! Requires `--features png_render,file_io`: `png_render` for `render_png`, and
+//! `file_io` so Vensim corpus models that reference external data can load.
+
+use std::collections::BTreeSet;
+use std::env;
+use std::io::BufReader;
+
+use simlin_engine::layout::LAYOUT_SEEDS;
+use simlin_engine::{datamodel, open_vensim, open_xmile};
+
+/// The model name the layout pipeline and renderer operate on. `Project::get_model`
+/// maps "main" to the single/main model (matching `tests/layout.rs`).
+const MAIN_MODEL: &str = "main";
+
+/// Default number of seeds to sample per model when `LAYOUT_EVAL_SEEDS` is unset.
+const DEFAULT_SEEDS: u64 = 25;
+
+// ── Corpus ─────────────────────────────────────────────────────────────────
+
+#[derive(Clone, Copy)]
+enum Format {
+    Xmile,
+    Vensim,
+}
+
+struct ModelSpec {
+    key: &'static str,
+    /// Path relative to CARGO_MANIFEST_DIR (src/simlin-engine).
+    rel_path: &'static str,
+    format: Format,
+}
+
+use Format::{Vensim, Xmile};
+
+/// The curated corpus. Paths are relative to `CARGO_MANIFEST_DIR`
+/// (`src/simlin-engine`); all 15 were verified to exist on disk.
+const CORPUS: &[ModelSpec] = &[
+    // canonical small
+    ModelSpec {
+        key: "teacup",
+        rel_path: "../../test/test-models/samples/teacup/teacup.stmx",
+        format: Xmile,
+    },
+    ModelSpec {
+        key: "sir",
+        rel_path: "../../test/test-models/samples/SIR/SIR.stmx",
+        format: Xmile,
+    },
+    ModelSpec {
+        key: "logistic_growth",
+        rel_path: "../../test/logistic_growth_ltm/logistic_growth.stmx",
+        format: Xmile,
+    },
+    // modules
+    ModelSpec {
+        key: "hares_and_foxes",
+        rel_path: "../../test/modules_hares_and_foxes/modules_hares_and_foxes.stmx",
+        format: Xmile,
+    },
+    // multipoint connectors
+    ModelSpec {
+        key: "multipoint",
+        rel_path: "../../test/test-models/samples/display/multipoint-connection.stmx",
+        format: Xmile,
+    },
+    // aliases
+    ModelSpec {
+        key: "alias1",
+        rel_path: "../../test/alias1/alias1.stmx",
+        format: Xmile,
+    },
+    // LTM / loop models
+    ModelSpec {
+        key: "cross_element",
+        rel_path: "../../test/cross_element_ltm/cross_element.stmx",
+        format: Xmile,
+    },
+    ModelSpec {
+        key: "arrayed_pop",
+        rel_path: "../../test/arrayed_population_ltm/arrayed_population.stmx",
+        format: Xmile,
+    },
+    // ai-information reference set (human vs AI; used by Phase 4 calibration)
+    ModelSpec {
+        key: "ai_pure_human",
+        rel_path: "../../test/ai-information/PureHumanModel.stmx",
+        format: Xmile,
+    },
+    ModelSpec {
+        key: "ai_pure_ai",
+        rel_path: "../../test/ai-information/PureAIModel.stmx",
+        format: Xmile,
+    },
+    ModelSpec {
+        key: "ai_edited",
+        rel_path: "../../test/ai-information/GeneratedByAIThenEdited.stmx",
+        format: Xmile,
+    },
+    ModelSpec {
+        key: "ai_modules_arrays",
+        rel_path: "../../test/ai-information/WithModulesAndArrays.stmx",
+        format: Xmile,
+    },
+    // large metasd Vensim
+    ModelSpec {
+        key: "wrld3_03",
+        rel_path: "../../test/metasd/WRLD3-03/wrld3-03.mdl",
+        format: Vensim,
+    },
+    ModelSpec {
+        key: "beer_game",
+        rel_path: "../../test/metasd/beer-game/RealBeer4-Sterman13.mdl",
+        format: Vensim,
+    },
+    ModelSpec {
+        key: "wonderland",
+        rel_path: "../../test/metasd/wonderland/Wonderland3.mdl",
+        format: Vensim,
+    },
+];
+
+/// Resolve a corpus-relative path against the crate manifest dir.
+fn abs_path(rel: &str) -> String {
+    format!("{}/{}", env!("CARGO_MANIFEST_DIR"), rel)
+}
+
+/// Load one corpus model, dispatching on its declared format: XMILE through a
+/// buffered reader + `open_xmile`, Vensim `.mdl` through a string + `open_vensim`
+/// (mirrors `examples/backend_bench.rs`). Returns a human-readable error on any
+/// I/O or parse failure so the caller can WARN-and-skip (AC3.6).
+fn load_model(spec: &ModelSpec) -> Result<datamodel::Project, String> {
+    let path = abs_path(spec.rel_path);
+    match spec.format {
+        Format::Xmile => {
+            let file =
+                std::fs::File::open(&path).map_err(|e| format!("failed to open {path}: {e}"))?;
+            let mut reader = BufReader::new(file);
+            open_xmile(&mut reader).map_err(|e| format!("failed to parse {path}: {e:?}"))
+        }
+        Format::Vensim => {
+            let contents = std::fs::read_to_string(&path)
+                .map_err(|e| format!("failed to read {path}: {e}"))?;
+            open_vensim(&contents).map_err(|e| format!("failed to parse {path}: {e:?}"))
+        }
+    }
+}
+
+/// Count the view elements in the model's as-loaded main view -- the diagram
+/// the later tasks score and render. A model with no hand-authored view yields
+/// 0 here (its layout is generated from scratch in Task 2).
+fn loaded_element_count(project: &datamodel::Project) -> usize {
+    let Some(model) = project.get_model(MAIN_MODEL) else {
+        return 0;
+    };
+    match model.views.first() {
+        Some(datamodel::View::StockFlow(sf)) => sf.elements.len(),
+        None => 0,
+    }
+}
+
+// ── Env knobs ────────────────────────────────────────────────────────────────
+
+/// The set of corpus keys to run. `LAYOUT_EVAL_MODELS` is a comma list of keys;
+/// unset/empty means the whole corpus. Unknown keys are reported and dropped so
+/// a typo does not silently run nothing without explanation.
+fn selected_keys() -> Vec<&'static str> {
+    let Ok(raw) = env::var("LAYOUT_EVAL_MODELS") else {
+        return CORPUS.iter().map(|s| s.key).collect();
+    };
+    let requested: Vec<&str> = raw
+        .split(',')
+        .map(str::trim)
+        .filter(|s| !s.is_empty())
+        .collect();
+    if requested.is_empty() {
+        return CORPUS.iter().map(|s| s.key).collect();
+    }
+    let mut keys = Vec::new();
+    for want in requested {
+        match CORPUS.iter().find(|s| s.key == want) {
+            Some(spec) => keys.push(spec.key),
+            None => eprintln!("WARN: unknown model key {want:?}; skipping"),
+        }
+    }
+    keys
+}
+
+/// Number of seeds M to sample per model (`LAYOUT_EVAL_SEEDS`, default 25).
+fn seed_count() -> u64 {
+    env::var("LAYOUT_EVAL_SEEDS")
+        .ok()
+        .and_then(|v| v.parse().ok())
+        .unwrap_or(DEFAULT_SEEDS)
+}
+
+/// The seeds to sample: the union of the production best-of-k proxy
+/// (`LAYOUT_SEEDS`) and `0..m`, deduped and sorted. Including `LAYOUT_SEEDS`
+/// guarantees the best-of-k production proxy is always computable regardless of
+/// `m`.
+fn seed_set(m: u64) -> Vec<u64> {
+    let mut seeds: BTreeSet<u64> = (0..m).collect();
+    seeds.extend(LAYOUT_SEEDS);
+    seeds.into_iter().collect()
+}
+
+/// The output directory (`LAYOUT_EVAL_OUT`, default repo-root
+/// `target/layout-eval`, derived from `CARGO_MANIFEST_DIR`).
+fn out_dir() -> String {
+    env::var("LAYOUT_EVAL_OUT")
+        .unwrap_or_else(|_| format!("{}/../../target/layout-eval", env!("CARGO_MANIFEST_DIR")))
+}
+
+fn main() {
+    let keys = selected_keys();
+    let m = seed_count();
+    let seeds = seed_set(m);
+    let out = out_dir();
+
+    std::fs::create_dir_all(&out)
+        .unwrap_or_else(|e| panic!("failed to create output dir {out}: {e}"));
+
+    println!(
+        "layout_eval: {} model(s), M={m} seeds (sampling {} unique), out={out}",
+        keys.len(),
+        seeds.len()
+    );
+
+    for spec in CORPUS.iter().filter(|s| keys.contains(&s.key)) {
+        match load_model(spec) {
+            Ok(project) => {
+                let n = loaded_element_count(&project);
+                println!("loaded {}: {n} elements", spec.key);
+            }
+            Err(err) => eprintln!("WARN: skipping {}: {err}", spec.key),
+        }
+    }
+}
diff --git a/src/simlin-engine/src/layout/mod.rs b/src/simlin-engine/src/layout/mod.rs
index d5414d693..6e5ae43f5 100644
--- a/src/simlin-engine/src/layout/mod.rs
+++ b/src/simlin-engine/src/layout/mod.rs
@@ -4485,7 +4485,12 @@ fn build_stock_flow_from_state(
 
 /// Seeds for parallel layout generation. Each seed produces a different SFDP
 /// layout; the one with fewest connector crossings is selected.
-const LAYOUT_SEEDS: [u64; 4] = [42, 123, 456, 789];
+///
+/// These are also the layout-quality sweep's best-of-k production proxy: the
+/// `layout_eval` example scores the best layout over exactly this seed set to
+/// estimate what production (which picks best-of-`LAYOUT_SEEDS`) would ship,
+/// so it is exposed publicly. The value and behavior are unchanged.
+pub const LAYOUT_SEEDS: [u64; 4] = [42, 123, 456, 789];
 
 /// Apply a model patch incrementally to an existing diagram view,
 /// preserving existing element positions and only placing new or

From e3bea9526e135bb1d5008514ebee37a13d5fdce7 Mon Sep 17 00:00:00 2001
From: Bobby Powers <bobbypowers@gmail.com>
Date: Fri, 22 May 2026 22:50:06 -0700
Subject: [PATCH 15/38] engine: layout_eval per-seed sweep +
 ModelStats/CorpusReport

For each loaded corpus model, lay out the main model once per sampled
seed (rayon par_iter, mirroring generate_best_layout), score each layout
with compute_layout_metrics, and reduce the costs to a weighted scalar.
Collect MetricSamples into ModelStats::from_samples and aggregate into a
CorpusReport, printing a per-model median/p25/p75/best-of-k line plus the
corpus geomean-of-medians.

The weighted_cost uses a Phase-3 PLACEHOLDER MetricWeights (overlap and
crossings dominant; sprawl/edge-cv/aspect moderate; reserved structure
terms zero), since MetricWeights::default() is all-zeros until Phase 4
calibrates and commits the real weights. Phase 4 must switch this sweep
to MetricWeights::default() once those land.

The parallel results are sorted back into seed order before summarizing,
so the sample vector and every derived statistic are invariant to rayon
scheduling. Note that generate_layout_with_config is itself not
deterministic per seed -- the same (model, seed) yields slightly
different layouts even serially within one process, traced to
per-process-randomized HashMap iteration order in the layout pipeline.
That is a pre-existing layout-engine issue independent of this sweep.
---
 src/simlin-engine/examples/layout_eval.rs | 118 +++++++++++++++++++++-
 1 file changed, 116 insertions(+), 2 deletions(-)

diff --git a/src/simlin-engine/examples/layout_eval.rs b/src/simlin-engine/examples/layout_eval.rs
index 622924b59..4162ad75f 100644
--- a/src/simlin-engine/examples/layout_eval.rs
+++ b/src/simlin-engine/examples/layout_eval.rs
@@ -32,9 +32,42 @@ use std::collections::BTreeSet;
 use std::env;
 use std::io::BufReader;
 
+use rayon::prelude::*;
 use simlin_engine::layout::LAYOUT_SEEDS;
+use simlin_engine::layout::config::LayoutConfig;
+use simlin_engine::layout::eval_stats::{CorpusReport, MetricSample, ModelStats};
+use simlin_engine::layout::generate_layout_with_config;
+use simlin_engine::layout::metrics::{MetricWeights, compute_layout_metrics};
 use simlin_engine::{datamodel, open_vensim, open_xmile};
 
+/// Phase-3 PLACEHOLDER weights for `weighted_cost`.
+///
+/// `MetricWeights::default()` is all-zeros until Phase 4 commits the calibrated
+/// weights (so any accidental pre-calibration use of `weighted_cost` is inert
+/// rather than silently wrong). The sweep needs a *non-trivial* scalar to rank
+/// seeds (best/median/worst) and to compute the corpus geomean, so this
+/// placeholder encodes the design's intended failure-mode priorities:
+/// the overlap family (`node_overlap`, `node_connector_overlap`, `label_overlap`)
+/// and edge `crossings` are dominant; `sprawl`, `edge_length_cv`, and
+/// `aspect_penalty` are moderate; the reserved structure terms
+/// (`chain_straightness`, `loop_compactness`, always 0.0 in Phase 1-3) carry
+/// zero weight.
+///
+/// Phase 4 commits the calibrated `MetricWeights` (its `Default`); when it
+/// lands, this placeholder MUST be replaced by `MetricWeights::default()` (see
+/// the Phase 4 plan, Task 2).
+const PLACEHOLDER_WEIGHTS: MetricWeights = MetricWeights {
+    node_overlap: 1.0,
+    node_connector_overlap: 1.0,
+    label_overlap: 1.0,
+    crossings: 1.0,
+    sprawl: 0.25,
+    edge_length_cv: 0.25,
+    aspect_penalty: 0.25,
+    chain_straightness: 0.0,
+    loop_compactness: 0.0,
+};
+
 /// The model name the layout pipeline and renderer operate on. `Project::get_model`
 /// maps "main" to the single/main model (matching `tests/layout.rs`).
 const MAIN_MODEL: &str = "main";
@@ -237,6 +270,69 @@ fn out_dir() -> String {
         .unwrap_or_else(|_| format!("{}/../../target/layout-eval", env!("CARGO_MANIFEST_DIR")))
 }
 
+// ── Per-model seed sweep ─────────────────────────────────────────────────────
+
+/// Lay out `project`'s main model once for each `seed`, score each layout, and
+/// summarize the samples into a `ModelStats`.
+///
+/// The per-seed layouts run in parallel via rayon (mirroring
+/// `generate_best_layout`'s `par_iter` over seeds). The parallel results are
+/// collapsed back into `seeds`-order before being summarized, so the sample
+/// vector -- and every statistic derived from it -- is invariant to rayon's
+/// scheduling: parallelism introduces no nondeterminism here.
+///
+/// NOTE: `generate_layout_with_config` is itself NOT deterministic per seed --
+/// the same `(model, seed)` pair produces slightly different layouts on
+/// repeated calls, *even serially within one process* (verified by direct
+/// probe). The drift traces to per-process-randomized `HashMap`/`HashSet`
+/// iteration order in the layout pipeline (e.g. `sfdp::build_node_index`'s
+/// `HashMap` feeding force accumulation). This is a pre-existing layout-engine
+/// issue, not a property of this sweep; it means the reported median/spread
+/// vary run-to-run within a small band. The fix belongs in the layout engine
+/// (deterministic ordered containers). Tracked separately.
+///
+/// A seed whose layout fails to generate is dropped with a WARN (a single bad
+/// seed must not sink the whole model's sweep). The full model-level
+/// skip-on-failure path (load/render) lands in a later task; here a model with
+/// zero successful seeds yields an empty `ModelStats` (all-zero, no panic).
+fn sweep_model(key: &str, project: &datamodel::Project, seeds: &[u64]) -> ModelStats {
+    // Compute one (seed, sample) per seed in parallel, then sort back into seed
+    // order so the sample vector -- and therefore every statistic derived from
+    // it -- is independent of rayon's scheduling.
+    let mut indexed: Vec<(u64, MetricSample)> = seeds
+        .par_iter()
+        .filter_map(|&seed| {
+            let cfg = LayoutConfig {
+                annealing_random_seed: seed,
+                ..LayoutConfig::default()
+            };
+            match generate_layout_with_config(project, MAIN_MODEL, cfg.clone(), None) {
+                Ok(view) => {
+                    let metrics = compute_layout_metrics(&view, &cfg);
+                    let weighted_cost = metrics.weighted_cost(&PLACEHOLDER_WEIGHTS);
+                    Some((
+                        seed,
+                        MetricSample {
+                            seed,
+                            metrics,
+                            weighted_cost,
+                        },
+                    ))
+                }
+                Err(err) => {
+                    eprintln!("WARN: {key} seed {seed} failed to lay out: {err}");
+                    None
+                }
+            }
+        })
+        .collect();
+
+    indexed.sort_by_key(|(seed, _)| *seed);
+    let samples: Vec<MetricSample> = indexed.into_iter().map(|(_, sample)| sample).collect();
+
+    ModelStats::from_samples(key.to_string(), samples, &LAYOUT_SEEDS)
+}
+
 fn main() {
     let keys = selected_keys();
     let m = seed_count();
@@ -246,19 +342,37 @@ fn main() {
     std::fs::create_dir_all(&out)
         .unwrap_or_else(|e| panic!("failed to create output dir {out}: {e}"));
 
+    let n_sampled = seeds.len();
     println!(
-        "layout_eval: {} model(s), M={m} seeds (sampling {} unique), out={out}",
+        "layout_eval: {} model(s), M={m} seeds (sampling {n_sampled} unique), out={out}",
         keys.len(),
-        seeds.len()
     );
 
+    let mut per_model: Vec<ModelStats> = Vec::new();
     for spec in CORPUS.iter().filter(|s| keys.contains(&s.key)) {
         match load_model(spec) {
             Ok(project) => {
                 let n = loaded_element_count(&project);
                 println!("loaded {}: {n} elements", spec.key);
+
+                let stats = sweep_model(spec.key, &project, &seeds);
+                let (p25, p75) = stats.spread;
+                // The actual sampled seed count is the union of LAYOUT_SEEDS and
+                // 0..m (deduped), reported here as the real M the run swept.
+                println!(
+                    "{}: median={:.4} p25/p75={:.4}/{:.4} best_of_k={:.4} (M={n_sampled})",
+                    spec.key, stats.median_cost, p25, p75, stats.best_of_k_cost,
+                );
+                per_model.push(stats);
             }
             Err(err) => eprintln!("WARN: skipping {}: {err}", spec.key),
         }
     }
+
+    let report = CorpusReport::from_model_stats(per_model);
+    println!(
+        "corpus: geomean_of_medians={:.4} ({} model(s) scored)",
+        report.geomean_of_medians,
+        report.per_model.len(),
+    );
 }

From 71392124feaf333a4018a09779bf35eb01e9b265 Mon Sep 17 00:00:00 2001
From: Bobby Powers <bobbypowers@gmail.com>
Date: Fri, 22 May 2026 23:00:12 -0700
Subject: [PATCH 16/38] engine: layout_eval renders best/median/worst +
 reference PNGs

---
 src/simlin-engine/examples/layout_eval.rs | 212 +++++++++++++++++++++-
 1 file changed, 206 insertions(+), 6 deletions(-)

diff --git a/src/simlin-engine/examples/layout_eval.rs b/src/simlin-engine/examples/layout_eval.rs
index 4162ad75f..490a69d9e 100644
--- a/src/simlin-engine/examples/layout_eval.rs
+++ b/src/simlin-engine/examples/layout_eval.rs
@@ -33,11 +33,12 @@ use std::env;
 use std::io::BufReader;
 
 use rayon::prelude::*;
+use simlin_engine::diagram::{PngRenderOpts, render_png};
 use simlin_engine::layout::LAYOUT_SEEDS;
 use simlin_engine::layout::config::LayoutConfig;
 use simlin_engine::layout::eval_stats::{CorpusReport, MetricSample, ModelStats};
 use simlin_engine::layout::generate_layout_with_config;
-use simlin_engine::layout::metrics::{MetricWeights, compute_layout_metrics};
+use simlin_engine::layout::metrics::{LayoutMetrics, MetricWeights, compute_layout_metrics};
 use simlin_engine::{datamodel, open_vensim, open_xmile};
 
 /// Phase-3 PLACEHOLDER weights for `weighted_cost`.
@@ -209,12 +210,21 @@ fn load_model(spec: &ModelSpec) -> Result<datamodel::Project, String> {
 /// the later tasks score and render. A model with no hand-authored view yields
 /// 0 here (its layout is generated from scratch in Task 2).
 fn loaded_element_count(project: &datamodel::Project) -> usize {
-    let Some(model) = project.get_model(MAIN_MODEL) else {
-        return 0;
-    };
+    reference_view(project)
+        .map(|sf| sf.elements.len())
+        .unwrap_or(0)
+}
+
+/// Borrow the model's as-loaded main `StockFlow` view if it is a hand-authored
+/// reference: a non-empty view carrying non-empty `elements`. A model loaded
+/// without a saved diagram (its layout is generated from scratch in the sweep)
+/// has no such view, so this returns `None` and the caller skips the reference
+/// render.
+fn reference_view(project: &datamodel::Project) -> Option<&datamodel::StockFlow> {
+    let model = project.get_model(MAIN_MODEL)?;
     match model.views.first() {
-        Some(datamodel::View::StockFlow(sf)) => sf.elements.len(),
-        None => 0,
+        Some(datamodel::View::StockFlow(sf)) if !sf.elements.is_empty() => Some(sf),
+        _ => None,
     }
 }
 
@@ -333,6 +343,178 @@ fn sweep_model(key: &str, project: &datamodel::Project, seeds: &[u64]) -> ModelS
     ModelStats::from_samples(key.to_string(), samples, &LAYOUT_SEEDS)
 }
 
+// ── Rendering ────────────────────────────────────────────────────────────────
+
+/// One rendered diagram: the PNG filename written under the out dir (relative,
+/// so the Task-4 `index.html` can reference it with a sibling `<img src>`) and
+/// the metric breakdown of the view that was rendered. The seed is `Some` for a
+/// generated render (best/median/worst) and `None` for the as-loaded reference.
+///
+/// `seed`, `metrics`, and `weighted_cost` are recorded here but not yet read:
+/// Task 4 serializes them into `metrics.json` and the contact-sheet's per-render
+/// breakdown table. They are deliberately kept as data now (rather than dropped
+/// and recomputed) so Task 4's serializer is a pure read over this struct.
+#[allow(dead_code)]
+struct Render {
+    /// Filename of the PNG, relative to the out dir (e.g. `sir_best.png`).
+    file: String,
+    /// The seed that produced the generated view (`None` for the reference).
+    seed: Option<u64>,
+    /// Per-term metrics of the rendered view.
+    metrics: LayoutMetrics,
+    /// Scalar weighted cost under the placeholder weights.
+    weighted_cost: f64,
+}
+
+/// All renders produced for one model: the optional hand-authored reference and
+/// the three generated layouts (best/median/worst). Task 4 serializes these
+/// per-model metric breakdowns into `metrics.json` and the contact-sheet, so the
+/// fields are kept as data the report can read back. A render that failed is
+/// `None` (the failure was already WARN-logged) -- skip-on-failure feeds Task 6.
+struct ModelRenders {
+    reference: Option<Render>,
+    best: Option<Render>,
+    median: Option<Render>,
+    worst: Option<Render>,
+}
+
+/// Render one view to a PNG file under `out`, scoring it with the default
+/// layout config (the metric core is config-driven only for node sizing, which
+/// is constant across the sweep). On any render or write failure, WARN to
+/// stderr and return `None` so the sweep continues (AC3.6).
+///
+/// `project` must already carry the view to render as its main view's first
+/// view (the renderer reads `model.views.first()`). The caller installs the
+/// view (a clone of the project for a generated layout, or the as-loaded
+/// project for the reference) before calling.
+fn render_view(
+    project: &datamodel::Project,
+    metrics: LayoutMetrics,
+    seed: Option<u64>,
+    file: &str,
+    out: &str,
+) -> Option<Render> {
+    let png = match render_png(project, MAIN_MODEL, &PngRenderOpts::default()) {
+        Ok(bytes) => bytes,
+        Err(err) => {
+            eprintln!("WARN: failed to render {file}: {err}");
+            return None;
+        }
+    };
+    let path = format!("{out}/{file}");
+    if let Err(err) = std::fs::write(&path, &png) {
+        eprintln!("WARN: failed to write {path}: {err}");
+        return None;
+    }
+    let weighted_cost = metrics.weighted_cost(&PLACEHOLDER_WEIGHTS);
+    Some(Render {
+        file: file.to_string(),
+        seed,
+        metrics,
+        weighted_cost,
+    })
+}
+
+/// Regenerate the view for `seed`, install it into a clone of `project`, render
+/// it to `{key}_{suffix}.png`, and return the `Render`. A layout-generation
+/// failure is non-fatal: WARN and return `None`.
+fn render_generated(
+    key: &str,
+    suffix: &str,
+    project: &datamodel::Project,
+    seed: u64,
+    out: &str,
+) -> Option<Render> {
+    let cfg = LayoutConfig {
+        annealing_random_seed: seed,
+        ..LayoutConfig::default()
+    };
+    let view = match generate_layout_with_config(project, MAIN_MODEL, cfg.clone(), None) {
+        Ok(view) => view,
+        Err(err) => {
+            eprintln!("WARN: {key} {suffix} (seed {seed}) failed to lay out: {err}");
+            return None;
+        }
+    };
+    let metrics = compute_layout_metrics(&view, &cfg);
+    // Install the generated view into a clone so the as-loaded project (and its
+    // reference view) is never mutated.
+    let mut p = project.clone();
+    p.get_model_mut(MAIN_MODEL).unwrap().views = vec![datamodel::View::StockFlow(view)];
+    let file = format!("{key}_{suffix}.png");
+    render_view(&p, metrics, Some(seed), &file, out)
+}
+
+/// Render the model's best/median/worst generated layouts and -- if the model
+/// ships a hand-authored view -- its reference, all to PNGs under `out`.
+///
+/// The reference is rendered from the AS-LOADED `project` (before any view is
+/// overwritten) so it captures the model's own diagram, not a generated one.
+/// Generated layouts are each regenerated from `project` by seed and installed
+/// into a fresh clone, leaving `project` untouched.
+fn render_model(
+    key: &str,
+    project: &datamodel::Project,
+    stats: &ModelStats,
+    out: &str,
+) -> ModelRenders {
+    // Reference first, from the as-loaded project, before any clone-and-install.
+    // Score the hand-authored `StockFlow` directly (the renderer reads the same
+    // view from `project`, so this is the geometry being rasterized).
+    let reference = reference_view(project).and_then(|sf| {
+        let metrics = compute_layout_metrics(sf, &LayoutConfig::default());
+        render_view(project, metrics, None, &format!("{key}_reference.png"), out)
+    });
+
+    // A model whose sweep produced no samples has all-zero seeds and nothing
+    // worth rendering; skip the generated renders (the reference, if any, is
+    // already captured).
+    if stats.samples.is_empty() {
+        return ModelRenders {
+            reference,
+            best: None,
+            median: None,
+            worst: None,
+        };
+    }
+
+    let best = render_generated(key, "best", project, stats.best_seed, out);
+    let median = render_generated(key, "median", project, stats.median_seed, out);
+    let worst = render_generated(key, "worst", project, stats.worst_seed, out);
+
+    ModelRenders {
+        reference,
+        best,
+        median,
+        worst,
+    }
+}
+
+/// Print the PNG filenames produced for one model (and note a skipped reference
+/// or generated render) so a run's stdout records exactly what was written.
+fn report_renders(key: &str, renders: &ModelRenders) {
+    let mut produced: Vec<&str> = Vec::new();
+    for render in [
+        &renders.reference,
+        &renders.best,
+        &renders.median,
+        &renders.worst,
+    ]
+    .into_iter()
+    .flatten()
+    {
+        produced.push(render.file.as_str());
+    }
+    if produced.is_empty() {
+        println!("{key}: no PNGs rendered");
+    } else {
+        println!("{key}: rendered {}", produced.join(", "));
+    }
+    if renders.reference.is_none() {
+        println!("{key}: no hand-authored reference view (skipped reference render)");
+    }
+}
+
 fn main() {
     let keys = selected_keys();
     let m = seed_count();
@@ -349,6 +531,7 @@ fn main() {
     );
 
     let mut per_model: Vec<ModelStats> = Vec::new();
+    let mut renders: Vec<ModelRenders> = Vec::new();
     for spec in CORPUS.iter().filter(|s| keys.contains(&s.key)) {
         match load_model(spec) {
             Ok(project) => {
@@ -363,7 +546,14 @@ fn main() {
                     "{}: median={:.4} p25/p75={:.4}/{:.4} best_of_k={:.4} (M={n_sampled})",
                     spec.key, stats.median_cost, p25, p75, stats.best_of_k_cost,
                 );
+
+                // Render best/median/worst (and the reference, if the model
+                // ships one) to PNGs. Render failures are non-fatal.
+                let model_renders = render_model(spec.key, &project, &stats, &out);
+                report_renders(spec.key, &model_renders);
+
                 per_model.push(stats);
+                renders.push(model_renders);
             }
             Err(err) => eprintln!("WARN: skipping {}: {err}", spec.key),
         }
@@ -375,4 +565,14 @@ fn main() {
         report.geomean_of_medians,
         report.per_model.len(),
     );
+
+    // The per-model `renders` (PNG filenames + metric breakdowns) are the input
+    // Task 4 serializes into `metrics.json` and the contact-sheet. Until then,
+    // summarize how many models shipped a hand-authored reference so a run's
+    // stdout records the reference coverage AC3.4 depends on.
+    let with_reference = renders.iter().filter(|r| r.reference.is_some()).count();
+    println!(
+        "corpus: {with_reference}/{} model(s) shipped a hand-authored reference view",
+        renders.len(),
+    );
 }

From 36419e465a666ec4396c86816776c8a2d5a6bb10 Mon Sep 17 00:00:00 2001
From: Bobby Powers <bobbypowers@gmail.com>
Date: Sat, 23 May 2026 06:47:42 -0700
Subject: [PATCH 17/38] engine: make layout deterministic per seed (fix #633)

generate_layout / generate_layout_with_config / generate_best_layout
produced different diagrams for the same (model, annealing_random_seed)
on repeated serial calls in one process. The annealing RNG is already
deterministic (StdRng::seed_from_u64); the run-to-run entropy came from
per-instance-random std HashMap iteration order in two layout functions.

In run_sfdp_with_rigid_chains, two loops over var_to_node (a HashMap)
were order-sensitive: the centroid accumulation summed float positions
(float addition is non-associative, so hash order perturbs the centroid)
and the auxiliary initial-placement loop assigned each unpositioned aux a
polar seed angle by its hash-iteration rank, so each aux got a different
seed position that SFDP diverged into a different final layout. Both loops
now iterate a once-materialized sorted Vec of the entries. var_to_node
stays a HashMap (it is still used as a .get() lookup table elsewhere).

The incremental path (incremental_layout -> diff_connectors) had the same
class of bug, which the original report did not cover: new causal links
were created while iterating a HashSet of new edges, and each new link
both allocates a sequential uid and is appended to the view in that order,
so hash order produced different uids and element ordering for the same
logical link. The two HashSet/HashMap-driven link loops now iterate sorted
keys, and the alias-match fallback picks the lowest matching key.

Adds tests/layout.rs determinism tests for both the fresh and incremental
paths: SIR previously differed in 8/15 (fresh) and 6-7/17 (incremental)
elements between two serial calls; both are now 0.
---
 src/simlin-engine/src/layout/mod.rs |  57 ++++++++++----
 src/simlin-engine/tests/layout.rs   | 110 ++++++++++++++++++++++++++++
 2 files changed, 153 insertions(+), 14 deletions(-)

diff --git a/src/simlin-engine/src/layout/mod.rs b/src/simlin-engine/src/layout/mod.rs
index 6e5ae43f5..312687164 100644
--- a/src/simlin-engine/src/layout/mod.rs
+++ b/src/simlin-engine/src/layout/mod.rs
@@ -1119,23 +1119,38 @@ pub fn diff_connectors(state: &mut LayoutState, metadata: &ComputedMetadata) {
     // Track which old links have been consumed so each is used at most once.
     let mut consumed_old_links: HashSet<(i32, i32)> = HashSet::new();
 
+    // Iterate edges in a deterministic order. `new_edges` is a HashSet, so its
+    // iteration order is per-process random; since each newly-created link both
+    // allocates a sequential `uid` and is appended to `state.elements` in this
+    // loop, hash order would otherwise assign different uids / element ordering
+    // to the same logical link run-to-run (the incremental analogue of #633).
+    let mut sorted_new_edges: Vec<(i32, i32)> = new_edges.iter().copied().collect();
+    sorted_new_edges.sort_unstable();
+
     // Add back preserved links (unchanged) and create new links
-    for &(from_uid, to_uid) in &new_edges {
+    for (from_uid, to_uid) in sorted_new_edges {
         if let Some(old_link) = old_links.get(&(from_uid, to_uid)) {
             // Preserved: keep the old link exactly as-is
             state.elements.push(old_link.clone());
             consumed_old_links.insert((from_uid, to_uid));
-        } else if let Some((&key, old_link)) = old_links.iter().find(|&(&(of, ot), _)| {
-            if consumed_old_links.contains(&(of, ot)) {
-                return false;
-            }
-            let rf = alias_to_primary.get(&of).copied().unwrap_or(of);
-            let rt = alias_to_primary.get(&ot).copied().unwrap_or(ot);
-            rf == from_uid && rt == to_uid
-        }) {
+        } else if let Some(key) = old_links
+            .keys()
+            .copied()
+            .filter(|&(of, ot)| {
+                if consumed_old_links.contains(&(of, ot)) {
+                    return false;
+                }
+                let rf = alias_to_primary.get(&of).copied().unwrap_or(of);
+                let rt = alias_to_primary.get(&ot).copied().unwrap_or(ot);
+                rf == from_uid && rt == to_uid
+            })
+            // Pick the lowest matching key so the alias-match selection is
+            // deterministic; HashMap iteration order would otherwise vary.
+            .min()
+        {
             // Preserved via alias: the old link targets an alias whose primary
             // variable matches this dependency edge. Keep the alias link as-is.
-            state.elements.push(old_link.clone());
+            state.elements.push(old_links[&key].clone());
             consumed_old_links.insert(key);
         } else if let Some((from_ident, to_ident)) = new_edge_idents.get(&(from_uid, to_uid)) {
             // Added: create new link with default shape
@@ -1172,14 +1187,19 @@ pub fn diff_connectors(state: &mut LayoutState, metadata: &ComputedMetadata) {
     // match a valid dependency. Imported views may have multiple rendered
     // connectors for the same dependency (e.g., links to two different
     // aliases of the same variable).
-    for (&(of, ot), old_link) in &old_links {
+    // Iterate in a deterministic order for the same reason as the new-edge loop:
+    // the preserved links are appended to `state.elements`, so HashMap iteration
+    // order would otherwise perturb element ordering run-to-run.
+    let mut sorted_old_links: Vec<&(i32, i32)> = old_links.keys().collect();
+    sorted_old_links.sort_unstable();
+    for &(of, ot) in sorted_old_links {
         if consumed_old_links.contains(&(of, ot)) {
             continue;
         }
         let rf = alias_to_primary.get(&of).copied().unwrap_or(of);
         let rt = alias_to_primary.get(&ot).copied().unwrap_or(ot);
         if new_edges.contains(&(rf, rt)) {
-            state.elements.push(old_link.clone());
+            state.elements.push(old_links[&(of, ot)].clone());
         }
     }
 }
@@ -2456,7 +2476,16 @@ fn run_sfdp_with_rigid_chains(
     let mut center_y = config.start_y;
     let mut count = 0;
 
-    for (var_ident, node_id) in var_to_node {
+    // `var_to_node` is a HashMap, so its iteration order is per-process random.
+    // Two loops below are order-sensitive: the centroid accumulation sums floats
+    // (non-associative, so hash order perturbs the result) and the aux-placement
+    // loop assigns each unpositioned aux a polar seed angle by its iteration rank.
+    // Materialize a deterministic sorted view and iterate THAT in both loops so a
+    // fixed (model, seed) yields a bit-identical layout across repeated calls (#633).
+    let mut entries: Vec<(&String, &String)> = var_to_node.iter().collect();
+    entries.sort();
+
+    for &(var_ident, node_id) in &entries {
         if let Some(uid) = state.uid_manager.get_uid(var_ident)
             && let Some(&pos) = state.positions.get(&uid)
         {
@@ -2491,7 +2520,7 @@ fn run_sfdp_with_rigid_chains(
     }
 
     let mut aux_index = 0;
-    for node_id in var_to_node.values() {
+    for &(_var_ident, node_id) in &entries {
         if initial_layout.contains_key(node_id) {
             continue;
         }
diff --git a/src/simlin-engine/tests/layout.rs b/src/simlin-engine/tests/layout.rs
index fb6b4a1a0..4373cab67 100644
--- a/src/simlin-engine/tests/layout.rs
+++ b/src/simlin-engine/tests/layout.rs
@@ -2223,3 +2223,113 @@ fn test_incremental_add_chain_rebuilds_existing_cloud_flow() {
         "chain_flow and waste_flow should not overlap after incremental add (dist={dist})"
     );
 }
+
+/// Count how many elements differ between two views generated for the same
+/// model.  Element ordering is structurally stable (see
+/// `test_layout_structural_consistency`), so a positional comparison can be
+/// done index-by-index; `ViewElement` derives `PartialEq` over its f64
+/// coordinates (and flow `points`), giving an exact byte-for-byte comparison.
+/// Returns `(differing, total)`.
+fn count_layout_differences(
+    a: &simlin_engine::datamodel::StockFlow,
+    b: &simlin_engine::datamodel::StockFlow,
+) -> (usize, usize) {
+    assert_eq!(
+        a.elements.len(),
+        b.elements.len(),
+        "layouts must have the same number of elements to compare"
+    );
+    let differing = a
+        .elements
+        .iter()
+        .zip(b.elements.iter())
+        .filter(|(ea, eb)| ea != eb)
+        .count();
+    (differing, a.elements.len())
+}
+
+/// A layout produced for a fixed (model, annealing_random_seed) must be
+/// bit-identical across repeated serial calls in one process (issue #633).
+/// The RNG is already seeded deterministically; the only remaining source of
+/// run-to-run drift was per-instance-random `HashMap` iteration order inside
+/// `run_sfdp_with_rigid_chains` (centroid float accumulation and aux initial
+/// placement).  SIR has auxiliaries, so it exercises the aux-placement loop.
+#[test]
+fn test_layout_deterministic_per_seed() {
+    let project = load_project("test/test-models/samples/SIR/SIR.stmx");
+
+    let config = LayoutConfig {
+        annealing_random_seed: 42,
+        ..Default::default()
+    };
+
+    let view1 = generate_layout_with_config(&project, MAIN_MODEL, config.clone(), None)
+        .expect("first layout should succeed");
+    let view2 = generate_layout_with_config(&project, MAIN_MODEL, config, None)
+        .expect("second layout should succeed");
+
+    let (differing, total) = count_layout_differences(&view1, &view2);
+    assert_eq!(
+        differing, 0,
+        "layout for a fixed seed must be deterministic: {differing}/{total} elements differ \
+         between two serial calls"
+    );
+}
+
+/// The incremental layout path (`incremental_layout` ->
+/// `compute_new_element_positions`) must also be deterministic for a fixed
+/// model + patch.  This guards against the same class of HashMap-iteration
+/// nondeterminism in the incremental code paths.
+#[test]
+fn test_incremental_layout_deterministic() {
+    use simlin_engine::datamodel;
+    use simlin_engine::layout::incremental_layout;
+    use simlin_engine::{ModelOperation, ModelPatch};
+
+    let project = load_project("test/test-models/samples/SIR/SIR.stmx");
+    let old_view =
+        generate_layout(&project, MAIN_MODEL, None).expect("initial layout should succeed");
+
+    let mut patched_project = project.clone();
+    let model = patched_project.get_model_mut(MAIN_MODEL).unwrap();
+    model
+        .variables
+        .push(datamodel::Variable::Aux(datamodel::Aux {
+            ident: "vaccination_rate".to_string(),
+            equation: datamodel::Equation::Scalar("susceptible * 0.01".to_string()),
+            documentation: String::new(),
+            units: None,
+            gf: None,
+            ai_state: None,
+            uid: None,
+            compat: Default::default(),
+        }));
+
+    let make_patch = || ModelPatch {
+        name: String::new(),
+        ops: vec![ModelOperation::UpsertAux(datamodel::Aux {
+            ident: "vaccination_rate".to_string(),
+            equation: datamodel::Equation::Scalar("susceptible * 0.01".to_string()),
+            documentation: String::new(),
+            units: None,
+            gf: None,
+            ai_state: None,
+            uid: None,
+            compat: Default::default(),
+        })],
+    };
+
+    let new_view1 =
+        incremental_layout(&old_view, &patched_project, MAIN_MODEL, &make_patch(), None)
+            .expect("first incremental layout should succeed");
+    let new_view2 =
+        incremental_layout(&old_view, &patched_project, MAIN_MODEL, &make_patch(), None)
+            .expect("second incremental layout should succeed");
+
+    let (differing, total) = count_layout_differences(&new_view1, &new_view2);
+    assert_eq!(
+        differing, 0,
+        "incremental layout must be deterministic: {differing}/{total} elements differ \
+         between two serial calls"
+    );
+}

From 7eadbca83ade13adb406f466b5ccf8247185b04a Mon Sep 17 00:00:00 2001
From: Bobby Powers <bobbypowers@gmail.com>
Date: Sat, 23 May 2026 07:02:04 -0700
Subject: [PATCH 18/38] engine: layout_eval emits metrics.json + index.html
 contact-sheet

---
 src/simlin-engine/examples/layout_eval.rs | 351 +++++++++++++++++++++-
 src/simlin-engine/src/layout/metrics.rs   |  12 +-
 2 files changed, 349 insertions(+), 14 deletions(-)

diff --git a/src/simlin-engine/examples/layout_eval.rs b/src/simlin-engine/examples/layout_eval.rs
index 490a69d9e..93e703fb1 100644
--- a/src/simlin-engine/examples/layout_eval.rs
+++ b/src/simlin-engine/examples/layout_eval.rs
@@ -30,9 +30,11 @@
 
 use std::collections::BTreeSet;
 use std::env;
+use std::fmt::Write as _;
 use std::io::BufReader;
 
 use rayon::prelude::*;
+use serde::Serialize;
 use simlin_engine::diagram::{PngRenderOpts, render_png};
 use simlin_engine::layout::LAYOUT_SEEDS;
 use simlin_engine::layout::config::LayoutConfig;
@@ -350,11 +352,10 @@ fn sweep_model(key: &str, project: &datamodel::Project, seeds: &[u64]) -> ModelS
 /// the metric breakdown of the view that was rendered. The seed is `Some` for a
 /// generated render (best/median/worst) and `None` for the as-loaded reference.
 ///
-/// `seed`, `metrics`, and `weighted_cost` are recorded here but not yet read:
-/// Task 4 serializes them into `metrics.json` and the contact-sheet's per-render
-/// breakdown table. They are deliberately kept as data now (rather than dropped
-/// and recomputed) so Task 4's serializer is a pure read over this struct.
-#[allow(dead_code)]
+/// `seed`, `metrics`, and `weighted_cost` are read by Task 4: the report builder
+/// serializes them into `metrics.json` and the contact-sheet's per-render
+/// breakdown table. They are kept as data here (rather than dropped and
+/// recomputed) so the report builder is a pure read over this struct.
 struct Render {
     /// Filename of the PNG, relative to the out dir (e.g. `sir_best.png`).
     file: String,
@@ -515,6 +516,309 @@ fn report_renders(key: &str, renders: &ModelRenders) {
     }
 }
 
+// ── Report (metrics.json + index.html) ──────────────────────────────────────
+//
+// The structs below are the on-disk JSON shape. They are PURE DATA built once
+// from the in-memory `ModelStats` + `ModelRenders` the sweep produced, then
+// serialized straight to disk -- no recomputation. The contact-sheet HTML is
+// rendered from the same `EvalReport`, so the JSON table and the HTML can never
+// disagree. Building the report and rendering the HTML are pure (the only I/O
+// is the two `std::fs::write` calls in `main`).
+
+/// One rendered view's row in the JSON: the PNG filename, the seed that
+/// produced it (`None` for the as-loaded reference), the full per-term
+/// `LayoutMetrics` breakdown, and the scalar `weighted_cost` under the weights
+/// in use.
+#[derive(Serialize)]
+struct RenderReport {
+    file: String,
+    seed: Option<u64>,
+    metrics: LayoutMetrics,
+    weighted_cost: f64,
+}
+
+/// One model's full row in the JSON: its summary statistics (the seed-sweep
+/// center/spread, the best-of-k production proxy, the chosen best/median/worst
+/// seeds, and `m` -- the number of seeds actually swept) plus each of its
+/// renders' per-term breakdowns (`reference` present only when the model ships
+/// a hand-authored view).
+#[derive(Serialize)]
+struct ModelReport {
+    model: String,
+    /// Number of seeds swept for this model (the union of `LAYOUT_SEEDS` and
+    /// `0..M`, deduped). Recorded so a reader can interpret the spread.
+    m: usize,
+    median_cost: f64,
+    /// `(p25, p75)` of the per-seed weighted costs.
+    spread: (f64, f64),
+    /// Production proxy: min weighted cost over the `LAYOUT_SEEDS` seed set.
+    best_of_k_cost: f64,
+    best_seed: u64,
+    median_seed: u64,
+    worst_seed: u64,
+    /// The hand-authored reference render + score, when the model ships one.
+    reference: Option<RenderReport>,
+    best: Option<RenderReport>,
+    median: Option<RenderReport>,
+    worst: Option<RenderReport>,
+}
+
+/// The top-level `metrics.json` document: every scored model plus the corpus
+/// aggregates (the geomean of per-model medians and the weight set used).
+///
+/// `baseline_comparison` is the place Task 5's baseline-vs-candidate diff plugs
+/// in (per-model + aggregate deltas with Mann-Whitney p-values). It is `None`
+/// here in Phase 3; the field exists so the JSON schema is stable across the
+/// two tasks (a Phase-3 reader sees `null`, a Phase-4 reader sees the diff).
+#[derive(Serialize)]
+struct EvalReport {
+    /// Models sorted worst-cost-first (highest `median_cost` at the front), the
+    /// same order the contact-sheet renders so the JSON and HTML agree.
+    models: Vec<ModelReport>,
+    /// Geometric mean of the per-model medians -- the single headline aggregate.
+    geomean_of_medians: f64,
+    /// The `MetricWeights` used to compute every `weighted_cost` in this report.
+    weights: MetricWeights,
+    /// Reserved for Task 5's baseline diff; always `None` in Phase 3.
+    #[serde(skip_serializing_if = "Option::is_none")]
+    baseline_comparison: Option<()>,
+}
+
+/// Map an in-memory `Render` to its JSON row.
+fn render_report(render: &Render) -> RenderReport {
+    RenderReport {
+        file: render.file.clone(),
+        seed: render.seed,
+        metrics: render.metrics,
+        weighted_cost: render.weighted_cost,
+    }
+}
+
+/// Build the serializable report from the sweep's in-memory results.
+///
+/// PURE: a read over `(per_model, renders)` (paired positionally -- they are
+/// pushed together per model in `main`) plus the corpus `geomean_of_medians`
+/// and the weight set. Models are sorted worst-cost-first (highest median at
+/// the front), the order the contact-sheet inspects top-down as the visual
+/// guardrail; ties break on the model name so the order is deterministic.
+fn build_report(
+    per_model: &[ModelStats],
+    renders: &[ModelRenders],
+    geomean_of_medians: f64,
+    weights: &MetricWeights,
+) -> EvalReport {
+    let mut models: Vec<ModelReport> = per_model
+        .iter()
+        .zip(renders.iter())
+        .map(|(stats, render)| ModelReport {
+            model: stats.model.clone(),
+            m: stats.samples.len(),
+            median_cost: stats.median_cost,
+            spread: stats.spread,
+            best_of_k_cost: stats.best_of_k_cost,
+            best_seed: stats.best_seed,
+            median_seed: stats.median_seed,
+            worst_seed: stats.worst_seed,
+            reference: render.reference.as_ref().map(render_report),
+            best: render.best.as_ref().map(render_report),
+            median: render.median.as_ref().map(render_report),
+            worst: render.worst.as_ref().map(render_report),
+        })
+        .collect();
+
+    // Worst-cost-first: highest median at the front. Sort descending by median,
+    // tie-break on model name (ascending) for a deterministic ordering. NaN
+    // medians can't occur (eval_stats guarantees finite costs), but guard the
+    // partial_cmp anyway so a hypothetical NaN never panics the sort.
+    models.sort_by(|a, b| {
+        b.median_cost
+            .partial_cmp(&a.median_cost)
+            .unwrap_or(std::cmp::Ordering::Equal)
+            .then_with(|| a.model.cmp(&b.model))
+    });
+
+    EvalReport {
+        models,
+        geomean_of_medians,
+        weights: *weights,
+        baseline_comparison: None,
+    }
+}
+
+/// HTML-escape the five characters that are special in element text or
+/// attribute values. The interpolated strings are static model keys and
+/// PNG filenames derived from them, so this is defense-in-depth rather than a
+/// live injection vector -- but escaping unconditionally keeps the artifact
+/// well-formed if a corpus key ever gains a special character.
+fn html_escape(s: &str) -> String {
+    let mut out = String::with_capacity(s.len());
+    for ch in s.chars() {
+        match ch {
+            '&' => out.push_str("&amp;"),
+            '<' => out.push_str("&lt;"),
+            '>' => out.push_str("&gt;"),
+            '"' => out.push_str("&quot;"),
+            '\'' => out.push_str("&#39;"),
+            _ => out.push(ch),
+        }
+    }
+    out
+}
+
+/// Render the per-term metric breakdown for one render as a compact two-column
+/// table (term name -> value), with the scalar `weighted_cost` as the final
+/// row. PURE: appends to `html`.
+fn write_metrics_table(html: &mut String, render: &RenderReport) {
+    let m = &render.metrics;
+    let rows = [
+        ("node_overlap", m.node_overlap),
+        ("node_connector_overlap", m.node_connector_overlap),
+        ("label_overlap", m.label_overlap),
+        ("crossings", m.crossings),
+        ("sprawl", m.sprawl),
+        ("edge_length_cv", m.edge_length_cv),
+        ("aspect_penalty", m.aspect_penalty),
+        ("chain_straightness", m.chain_straightness),
+        ("loop_compactness", m.loop_compactness),
+    ];
+    html.push_str("<table class=\"metrics\">");
+    for (name, value) in rows {
+        let _ = write!(
+            html,
+            "<tr><td>{name}</td><td class=\"num\">{value:.4}</td></tr>"
+        );
+    }
+    let _ = write!(
+        html,
+        "<tr class=\"wcost\"><td>weighted_cost</td><td class=\"num\">{:.4}</td></tr>",
+        render.weighted_cost
+    );
+    html.push_str("</table>");
+}
+
+/// Render one render's cell (heading + image + breakdown table). A missing
+/// render (the model shipped no reference, or its layout/render failed) renders
+/// a muted placeholder so the contact-sheet records the gap rather than hiding
+/// it. PURE.
+fn write_render_cell(html: &mut String, kind: &str, render: Option<&RenderReport>) {
+    html.push_str("<div class=\"cell\">");
+    let _ = write!(html, "<h4>{}</h4>", html_escape(kind));
+    match render {
+        Some(r) => {
+            let src = html_escape(&r.file);
+            let alt = html_escape(&format!("{kind} layout"));
+            let _ = write!(html, "<img src=\"{src}\" alt=\"{alt}\">");
+            if let Some(seed) = r.seed {
+                let _ = write!(html, "<p class=\"seed\">seed {seed}</p>");
+            }
+            write_metrics_table(html, r);
+        }
+        None => html.push_str("<p class=\"missing\">(not rendered)</p>"),
+    }
+    html.push_str("</div>");
+}
+
+/// Render the self-contained `index.html` contact-sheet from the report.
+///
+/// PURE: a string built from `report`. The header shows the corpus
+/// `geomean_of_medians` and the weight set; models are laid out one section per
+/// model, worst-cost-first (the report is already sorted), each with its
+/// reference (if any) and best/median/worst renders side by side and a per-term
+/// breakdown under each. `<img>` paths are relative to the out dir so the file
+/// references its sibling PNGs.
+fn render_index_html(report: &EvalReport) -> String {
+    let mut html = String::new();
+    html.push_str(
+        "<!DOCTYPE html>\n<html lang=\"en\">\n<head>\n<meta charset=\"utf-8\">\n\
+         <meta name=\"viewport\" content=\"width=device-width, initial-scale=1\">\n\
+         <title>Layout quality eval</title>\n<style>\n\
+         :root { font-family: Roboto, Helvetica, Arial, sans-serif; }\n\
+         body { margin: 24px; color: #1a1a1a; background: #fafafa; }\n\
+         h1 { font-size: 20px; margin: 0 0 4px; }\n\
+         .summary { color: #555; font-size: 13px; margin-bottom: 16px; }\n\
+         .summary code { background: #eee; padding: 1px 4px; border-radius: 4px; }\n\
+         table.weights { border-collapse: collapse; font-size: 12px; margin: 8px 0 24px; }\n\
+         table.weights td { border: 1px solid #ddd; padding: 2px 8px; }\n\
+         .model { border: 1px solid #ddd; border-radius: 4px; background: #fff;\n\
+                  padding: 12px 16px; margin-bottom: 20px; }\n\
+         .model h2 { font-size: 16px; margin: 0 0 2px; }\n\
+         .model .stats { color: #555; font-size: 12px; margin-bottom: 12px; }\n\
+         .renders { display: flex; flex-wrap: wrap; gap: 16px; }\n\
+         .cell { flex: 0 0 auto; max-width: 280px; }\n\
+         .cell h4 { font-size: 13px; margin: 0 0 4px; text-transform: capitalize; }\n\
+         .cell img { max-width: 280px; height: auto; border: 1px solid #eee;\n\
+                     background: #fff; display: block; }\n\
+         .cell .seed { font-size: 11px; color: #888; margin: 4px 0 2px; }\n\
+         .cell .missing { font-size: 12px; color: #999; font-style: italic; }\n\
+         table.metrics { border-collapse: collapse; font-size: 11px; margin-top: 4px;\n\
+                         width: 100%; }\n\
+         table.metrics td { border-bottom: 1px solid #f0f0f0; padding: 1px 4px; }\n\
+         table.metrics td.num { text-align: right; font-variant-numeric: tabular-nums; }\n\
+         table.metrics tr.wcost td { font-weight: 600; border-top: 1px solid #ccc; }\n\
+         </style>\n</head>\n<body>\n",
+    );
+
+    html.push_str("<h1>Layout quality eval</h1>\n");
+    let _ = writeln!(
+        &mut html,
+        "<p class=\"summary\">Corpus <code>geomean_of_medians = {:.4}</code> over \
+         {} model(s), sorted worst-cost-first.</p>",
+        report.geomean_of_medians,
+        report.models.len(),
+    );
+
+    // The weight set used for every weighted_cost in this report.
+    let w = &report.weights;
+    let weight_rows = [
+        ("node_overlap", w.node_overlap),
+        ("node_connector_overlap", w.node_connector_overlap),
+        ("label_overlap", w.label_overlap),
+        ("crossings", w.crossings),
+        ("sprawl", w.sprawl),
+        ("edge_length_cv", w.edge_length_cv),
+        ("aspect_penalty", w.aspect_penalty),
+        ("chain_straightness", w.chain_straightness),
+        ("loop_compactness", w.loop_compactness),
+    ];
+    html.push_str("<table class=\"weights\"><caption>weights</caption>");
+    for (name, value) in weight_rows {
+        let _ = write!(
+            &mut html,
+            "<tr><td>{name}</td><td class=\"num\">{value:.4}</td></tr>"
+        );
+    }
+    html.push_str("</table>\n");
+
+    for model in &report.models {
+        let name = html_escape(&model.model);
+        html.push_str("<section class=\"model\">");
+        let _ = write!(&mut html, "<h2>{name}</h2>");
+        let _ = write!(
+            &mut html,
+            "<p class=\"stats\">median={:.4} &middot; p25/p75={:.4}/{:.4} &middot; \
+             best_of_k={:.4} &middot; M={} &middot; \
+             seeds best/median/worst={}/{}/{}</p>",
+            model.median_cost,
+            model.spread.0,
+            model.spread.1,
+            model.best_of_k_cost,
+            model.m,
+            model.best_seed,
+            model.median_seed,
+            model.worst_seed,
+        );
+        html.push_str("<div class=\"renders\">");
+        write_render_cell(&mut html, "reference", model.reference.as_ref());
+        write_render_cell(&mut html, "best", model.best.as_ref());
+        write_render_cell(&mut html, "median", model.median.as_ref());
+        write_render_cell(&mut html, "worst", model.worst.as_ref());
+        html.push_str("</div></section>\n");
+    }
+
+    html.push_str("</body>\n</html>\n");
+    html
+}
+
 fn main() {
     let keys = selected_keys();
     let m = seed_count();
@@ -559,20 +863,43 @@ fn main() {
         }
     }
 
-    let report = CorpusReport::from_model_stats(per_model);
+    let corpus = CorpusReport::from_model_stats(per_model);
     println!(
         "corpus: geomean_of_medians={:.4} ({} model(s) scored)",
-        report.geomean_of_medians,
-        report.per_model.len(),
+        corpus.geomean_of_medians,
+        corpus.per_model.len(),
     );
 
-    // The per-model `renders` (PNG filenames + metric breakdowns) are the input
-    // Task 4 serializes into `metrics.json` and the contact-sheet. Until then,
-    // summarize how many models shipped a hand-authored reference so a run's
-    // stdout records the reference coverage AC3.4 depends on.
     let with_reference = renders.iter().filter(|r| r.reference.is_some()).count();
     println!(
         "corpus: {with_reference}/{} model(s) shipped a hand-authored reference view",
         renders.len(),
     );
+
+    // Build the serializable report from the in-memory stats + renders, then
+    // emit both artifacts under the out dir (which defaults under the gitignored
+    // repo-root `target/`). `corpus.per_model` and `renders` are positionally
+    // paired -- both are pushed once per surviving model in the loop above.
+    let report = build_report(
+        &corpus.per_model,
+        &renders,
+        corpus.geomean_of_medians,
+        &PLACEHOLDER_WEIGHTS,
+    );
+
+    let metrics_path = format!("{out}/metrics.json");
+    match serde_json::to_string_pretty(&report) {
+        Ok(json) => match std::fs::write(&metrics_path, json) {
+            Ok(()) => println!("wrote {metrics_path}"),
+            Err(err) => eprintln!("WARN: failed to write {metrics_path}: {err}"),
+        },
+        Err(err) => eprintln!("WARN: failed to serialize metrics.json: {err}"),
+    }
+
+    let index_path = format!("{out}/index.html");
+    let html = render_index_html(&report);
+    match std::fs::write(&index_path, html) {
+        Ok(()) => println!("wrote {index_path}"),
+        Err(err) => eprintln!("WARN: failed to write {index_path}: {err}"),
+    }
 }
diff --git a/src/simlin-engine/src/layout/metrics.rs b/src/simlin-engine/src/layout/metrics.rs
index 75e5343fa..a219eb0f5 100644
--- a/src/simlin-engine/src/layout/metrics.rs
+++ b/src/simlin-engine/src/layout/metrics.rs
@@ -50,7 +50,11 @@ pub const TARGET_AR_MAX: f64 = 16.0 / 9.0;
 /// the fixed pixel size of a stock/aux box should score differently from one
 /// spread far apart, and that sensitivity is what makes those terms meaningful
 /// across models. See the AC1.8 scoping note in the Phase 1 plan.
-#[derive(Clone, Copy, Debug, PartialEq)]
+///
+/// `Serialize` lets the layout-quality eval sweep (`examples/layout_eval.rs`)
+/// emit the per-term breakdown into its `metrics.json` artifact; the struct is
+/// pure data (every field a plain `f64`), so the derive carries no behavior.
+#[derive(Clone, Copy, Debug, PartialEq, serde::Serialize)]
 pub struct LayoutMetrics {
     /// Sum of pairwise node-box overlap area, normalized by total node area.
     pub node_overlap: f64,
@@ -80,7 +84,11 @@ pub struct LayoutMetrics {
 /// are committed in Phase 4. Until then `MetricWeights::default()` is all-zeros
 /// (see below) so any accidental use of `weighted_cost` before calibration is
 /// obviously inert rather than silently wrong.
-#[derive(Clone, Copy, Debug, PartialEq)]
+///
+/// `Serialize` lets the layout-quality eval sweep (`examples/layout_eval.rs`)
+/// record the weight set it used in its `metrics.json` artifact; the struct is
+/// pure data (every field a plain `f64`), so the derive carries no behavior.
+#[derive(Clone, Copy, Debug, PartialEq, serde::Serialize)]
 pub struct MetricWeights {
     pub node_overlap: f64,
     pub node_connector_overlap: f64,

From 32723163142e45a50892479360704b0f69f37669 Mon Sep 17 00:00:00 2001
From: Bobby Powers <bobbypowers@gmail.com>
Date: Sat, 23 May 2026 07:14:35 -0700
Subject: [PATCH 19/38] engine: layout_eval baseline diff via compare()

Add serde::Deserialize alongside the existing Serialize on LayoutMetrics
and MetricWeights (metrics.rs) and add serde::Serialize+Deserialize to
MetricSample/ModelStats/CorpusReport (eval_stats.rs) so a full CorpusReport
-- including each model's per-seed samples -- round-trips through JSON. The
samples must survive so compare() can re-run Mann-Whitney U over the
seed-sample cost sets. Comparison/ModelComparison gain Serialize only (the
diff is recomputed every run, never read back).

The layout_eval example now resolves a baseline diff: LAYOUT_EVAL_WRITE_BASELINE=1
writes the run's CorpusReport to the committed examples/layout_eval_baseline.json;
a normal run reads that file, runs compare(baseline, candidate), prints the
per-model delta_ratio + p_value + significance plus the aggregate, and embeds
the Comparison into metrics.json (the baseline_comparison slot widened from
Option<()> to Option<Comparison>) and the index.html header. An absent baseline
skips the diff with a note.

The committed baseline is seeded from a small representative subset
(LAYOUT_EVAL_MODELS=sir,teacup LAYOUT_EVAL_SEEDS=8) to keep the run fast and
the JSON modest. It captures CURRENT pre-Rung-0 behavior scored with the
Phase-3 PLACEHOLDER_WEIGHTS. It MUST be regenerated after Phase 4 commits the
calibrated weights (and the example switches to MetricWeights::default()) and
before Phase 5 measures Rung 0's improvement -- see the sibling
layout_eval_baseline.README.md. Since layout is now deterministic per seed,
seeding then diffing candidate==baseline yields exactly-zero, non-significant
deltas (consistent with AC4.5).
---
 src/simlin-engine/examples/layout_eval.rs     | 265 +++++++++++-
 .../examples/layout_eval_baseline.README.md   |  32 ++
 .../examples/layout_eval_baseline.json        | 393 ++++++++++++++++++
 src/simlin-engine/src/layout/eval_stats.rs    |  32 +-
 src/simlin-engine/src/layout/metrics.rs       |  20 +-
 5 files changed, 713 insertions(+), 29 deletions(-)
 create mode 100644 src/simlin-engine/examples/layout_eval_baseline.README.md
 create mode 100644 src/simlin-engine/examples/layout_eval_baseline.json

diff --git a/src/simlin-engine/examples/layout_eval.rs b/src/simlin-engine/examples/layout_eval.rs
index 93e703fb1..72695d1d5 100644
--- a/src/simlin-engine/examples/layout_eval.rs
+++ b/src/simlin-engine/examples/layout_eval.rs
@@ -21,9 +21,19 @@
 //!   LAYOUT_EVAL_MODELS=teacup,sir cargo run ... --example layout_eval
 //!
 //! Env knobs:
-//!   LAYOUT_EVAL_MODELS  comma list of corpus keys to run (default: all)
-//!   LAYOUT_EVAL_SEEDS   number of seeds M to sample (default: 25)
-//!   LAYOUT_EVAL_OUT     output directory (default: repo-root target/layout-eval)
+//!   LAYOUT_EVAL_MODELS         comma list of corpus keys to run (default: all)
+//!   LAYOUT_EVAL_SEEDS          number of seeds M to sample (default: 25)
+//!   LAYOUT_EVAL_OUT            output directory (default: repo-root target/layout-eval)
+//!   LAYOUT_EVAL_WRITE_BASELINE 1 -> write this run's report to the committed
+//!                              baseline JSON (see below) instead of diffing.
+//!
+//! Baseline diff: a committed `examples/layout_eval_baseline.json` (a serialized
+//! `CorpusReport`) records a reference run. A normal run reads it back, runs
+//! `compare(baseline, candidate)`, and embeds the per-model + aggregate deltas
+//! (with Mann-Whitney U p-values / significance verdicts) into `metrics.json`
+//! and the `index.html` header. With `LAYOUT_EVAL_WRITE_BASELINE=1` the run
+//! instead overwrites that baseline file (re-seed it after the metric weights
+//! change). If the file is absent a normal run skips the diff with a note.
 //!
 //! Requires `--features png_render,file_io`: `png_render` for `render_png`, and
 //! `file_io` so Vensim corpus models that reference external data can load.
@@ -38,7 +48,9 @@ use serde::Serialize;
 use simlin_engine::diagram::{PngRenderOpts, render_png};
 use simlin_engine::layout::LAYOUT_SEEDS;
 use simlin_engine::layout::config::LayoutConfig;
-use simlin_engine::layout::eval_stats::{CorpusReport, MetricSample, ModelStats};
+use simlin_engine::layout::eval_stats::{
+    Comparison, CorpusReport, MetricSample, ModelStats, compare,
+};
 use simlin_engine::layout::generate_layout_with_config;
 use simlin_engine::layout::metrics::{LayoutMetrics, MetricWeights, compute_layout_metrics};
 use simlin_engine::{datamodel, open_vensim, open_xmile};
@@ -78,6 +90,12 @@ const MAIN_MODEL: &str = "main";
 /// Default number of seeds to sample per model when `LAYOUT_EVAL_SEEDS` is unset.
 const DEFAULT_SEEDS: u64 = 25;
 
+/// Path (relative to `CARGO_MANIFEST_DIR` = `src/simlin-engine`) of the committed
+/// baseline `CorpusReport`. This file lives in the SOURCE TREE by design (it is
+/// checked in and diffed against on every normal run), unlike every other
+/// artifact, which is written under the gitignored `target/` output dir.
+const BASELINE_REL_PATH: &str = "examples/layout_eval_baseline.json";
+
 // ── Corpus ─────────────────────────────────────────────────────────────────
 
 #[derive(Clone, Copy)]
@@ -282,6 +300,28 @@ fn out_dir() -> String {
         .unwrap_or_else(|_| format!("{}/../../target/layout-eval", env!("CARGO_MANIFEST_DIR")))
 }
 
+/// Whether to (re)seed the committed baseline instead of diffing against it.
+/// True when `LAYOUT_EVAL_WRITE_BASELINE` is set to a truthy value (`1`/`true`,
+/// case-insensitive). Any other value -- and an unset variable -- means a normal
+/// diffing run.
+fn write_baseline_requested() -> bool {
+    matches!(
+        env::var("LAYOUT_EVAL_WRITE_BASELINE")
+            .unwrap_or_default()
+            .trim()
+            .to_ascii_lowercase()
+            .as_str(),
+        "1" | "true"
+    )
+}
+
+/// Absolute path of the committed baseline `CorpusReport` JSON. Resolved against
+/// `CARGO_MANIFEST_DIR` so it always points at the source-tree file regardless
+/// of the working directory the example runs from.
+fn baseline_path() -> String {
+    format!("{}/{}", env!("CARGO_MANIFEST_DIR"), BASELINE_REL_PATH)
+}
+
 // ── Per-model seed sweep ─────────────────────────────────────────────────────
 
 /// Lay out `project`'s main model once for each `seed`, score each layout, and
@@ -566,10 +606,11 @@ struct ModelReport {
 /// The top-level `metrics.json` document: every scored model plus the corpus
 /// aggregates (the geomean of per-model medians and the weight set used).
 ///
-/// `baseline_comparison` is the place Task 5's baseline-vs-candidate diff plugs
-/// in (per-model + aggregate deltas with Mann-Whitney p-values). It is `None`
-/// here in Phase 3; the field exists so the JSON schema is stable across the
-/// two tasks (a Phase-3 reader sees `null`, a Phase-4 reader sees the diff).
+/// `baseline_comparison` carries the baseline-vs-candidate diff (per-model +
+/// aggregate deltas with Mann-Whitney p-values) when a committed baseline JSON
+/// is present; it is `None` (and serde-skipped) when there is no baseline to
+/// diff against. A reader therefore sees the diff embedded directly in the JSON,
+/// or no `baseline_comparison` key at all.
 #[derive(Serialize)]
 struct EvalReport {
     /// Models sorted worst-cost-first (highest `median_cost` at the front), the
@@ -579,9 +620,10 @@ struct EvalReport {
     geomean_of_medians: f64,
     /// The `MetricWeights` used to compute every `weighted_cost` in this report.
     weights: MetricWeights,
-    /// Reserved for Task 5's baseline diff; always `None` in Phase 3.
+    /// The baseline-vs-candidate diff, present only when a committed baseline
+    /// `CorpusReport` was found and compared against this run.
     #[serde(skip_serializing_if = "Option::is_none")]
-    baseline_comparison: Option<()>,
+    baseline_comparison: Option<Comparison>,
 }
 
 /// Map an in-memory `Render` to its JSON row.
@@ -606,6 +648,7 @@ fn build_report(
     renders: &[ModelRenders],
     geomean_of_medians: f64,
     weights: &MetricWeights,
+    baseline_comparison: Option<Comparison>,
 ) -> EvalReport {
     let mut models: Vec<ModelReport> = per_model
         .iter()
@@ -641,7 +684,7 @@ fn build_report(
         models,
         geomean_of_medians,
         weights: *weights,
-        baseline_comparison: None,
+        baseline_comparison,
     }
 }
 
@@ -718,14 +761,83 @@ fn write_render_cell(html: &mut String, kind: &str, render: Option<&RenderReport
     html.push_str("</div>");
 }
 
+/// Format a `delta_ratio` as a signed percentage (e.g. `+3.2%`, `-0.0%`). PURE.
+fn fmt_delta_pct(ratio: f64) -> String {
+    format!("{:+.2}%", ratio * 100.0)
+}
+
+/// Render the baseline-vs-candidate diff into the header: the aggregate delta +
+/// significance verdict, then a per-model table of `delta_ratio`, the
+/// Mann-Whitney p-value, and the significance verdict. A `None` comparison (no
+/// committed baseline) renders a muted note instead, so the contact-sheet always
+/// records whether a baseline was diffed. PURE: appends to `html`.
+fn write_baseline_diff(html: &mut String, comparison: Option<&Comparison>) {
+    let Some(cmp) = comparison else {
+        html.push_str(
+            "<p class=\"none\">No baseline diff (run with \
+             <code>LAYOUT_EVAL_WRITE_BASELINE=1</code> to seed one).</p>\n",
+        );
+        return;
+    };
+
+    html.push_str("<div class=\"baseline\"><h3>Baseline diff</h3>");
+    let agg_class = if cmp.aggregate_significant {
+        "sig"
+    } else {
+        "nonsig"
+    };
+    let agg_verdict = if cmp.aggregate_significant {
+        "significant"
+    } else {
+        "not significant"
+    };
+    let _ = write!(
+        html,
+        "<p class=\"agg\">aggregate delta <code>{}</code> &middot; \
+         p={:.4} &middot; <span class=\"{agg_class}\">{agg_verdict}</span></p>",
+        fmt_delta_pct(cmp.aggregate_delta_ratio),
+        cmp.aggregate_p_value,
+    );
+
+    if cmp.per_model.is_empty() {
+        html.push_str("<p class=\"agg\">(no models matched the baseline)</p></div>\n");
+        return;
+    }
+
+    html.push_str(
+        "<table class=\"diff\"><tr><th>model</th><th>baseline</th>\
+         <th>candidate</th><th>delta</th><th>p</th><th>significance</th></tr>",
+    );
+    for m in &cmp.per_model {
+        let (cls, verdict) = if m.significant {
+            ("sig", "significant")
+        } else {
+            ("nonsig", "&mdash;")
+        };
+        let _ = write!(
+            html,
+            "<tr><td>{}</td><td class=\"num\">{:.4}</td><td class=\"num\">{:.4}</td>\
+             <td class=\"num\">{}</td><td class=\"num\">{:.4}</td>\
+             <td class=\"{cls}\">{verdict}</td></tr>",
+            html_escape(&m.model),
+            m.baseline_median,
+            m.candidate_median,
+            fmt_delta_pct(m.delta_ratio),
+            m.p_value,
+        );
+    }
+    html.push_str("</table></div>\n");
+}
+
 /// Render the self-contained `index.html` contact-sheet from the report.
 ///
 /// PURE: a string built from `report`. The header shows the corpus
-/// `geomean_of_medians` and the weight set; models are laid out one section per
-/// model, worst-cost-first (the report is already sorted), each with its
-/// reference (if any) and best/median/worst renders side by side and a per-term
-/// breakdown under each. `<img>` paths are relative to the out dir so the file
-/// references its sibling PNGs.
+/// `geomean_of_medians`, the weight set, and (when a committed baseline was
+/// diffed) the baseline-vs-candidate delta table; models are laid out one
+/// section per model, worst-cost-first (the report is already sorted), each with
+/// its reference (if any) and best/median/worst renders side by side and a
+/// per-term breakdown under each. `<img>` paths are relative to the out dir so
+/// the file references its sibling PNGs.
 fn render_index_html(report: &EvalReport) -> String {
     let mut html = String::new();
     html.push_str(
@@ -739,6 +851,18 @@ fn render_index_html(report: &EvalReport) -> String {
          .summary code { background: #eee; padding: 1px 4px; border-radius: 4px; }\n\
          table.weights { border-collapse: collapse; font-size: 12px; margin: 8px 0 24px; }\n\
          table.weights td { border: 1px solid #ddd; padding: 2px 8px; }\n\
+         .baseline { border: 1px solid #ddd; border-radius: 4px; background: #fff;\n\
+                     padding: 8px 12px; margin: 8px 0 24px; }\n\
+         .baseline h3 { font-size: 13px; margin: 0 0 6px; }\n\
+         .baseline .agg { font-size: 12px; color: #555; margin: 0 0 6px; }\n\
+         table.diff { border-collapse: collapse; font-size: 12px; }\n\
+         table.diff th, table.diff td { border: 1px solid #eee; padding: 2px 8px;\n\
+                                        text-align: right; }\n\
+         table.diff th:first-child, table.diff td:first-child { text-align: left; }\n\
+         table.diff td.num { font-variant-numeric: tabular-nums; }\n\
+         .sig { color: #c62828; font-weight: 600; }\n\
+         .nonsig { color: #888; }\n\
+         .none { color: #999; font-style: italic; font-size: 12px; margin: 0 0 24px; }\n\
          .model { border: 1px solid #ddd; border-radius: 4px; background: #fff;\n\
                   padding: 12px 16px; margin-bottom: 20px; }\n\
          .model h2 { font-size: 16px; margin: 0 0 2px; }\n\
@@ -789,6 +913,8 @@ fn render_index_html(report: &EvalReport) -> String {
     }
     html.push_str("</table>\n");
 
+    write_baseline_diff(&mut html, report.baseline_comparison.as_ref());
+
     for model in &report.models {
         let name = html_escape(&model.model);
         html.push_str("<section class=\"model\">");
@@ -819,6 +945,106 @@ fn render_index_html(report: &EvalReport) -> String {
     html
 }
 
+// ── Baseline diff (imperative shell) ─────────────────────────────────────────
+
+/// Write `candidate` to the committed baseline JSON, replacing any existing
+/// file. The full `CorpusReport` -- including each model's per-seed `samples` --
+/// is serialized so a later run can re-run Mann-Whitney U over the seed-sample
+/// cost sets. On a serialize or write failure WARN to stderr (the run still
+/// emits its `target/` artifacts; only the baseline re-seed failed).
+fn write_baseline(candidate: &CorpusReport) {
+    let path = baseline_path();
+    match serde_json::to_string_pretty(candidate) {
+        Ok(json) => match std::fs::write(&path, json) {
+            Ok(()) => println!(
+                "wrote baseline {path}\n\
+                 note: re-seed this baseline after the metric weights change."
+            ),
+            Err(err) => eprintln!("WARN: failed to write baseline {path}: {err}"),
+        },
+        Err(err) => eprintln!("WARN: failed to serialize baseline: {err}"),
+    }
+}
+
+/// Read and deserialize the committed baseline `CorpusReport`, if present.
+///
+/// Returns `None` (with a one-line note) when the file does not exist -- the
+/// expected state before a baseline has been seeded. A file that exists but
+/// fails to read or parse is a real error: WARN with the cause and return `None`
+/// so the run still emits its artifacts without a diff.
+fn read_baseline() -> Option<CorpusReport> {
+    let path = baseline_path();
+    let json = match std::fs::read_to_string(&path) {
+        Ok(json) => json,
+        Err(err) if err.kind() == std::io::ErrorKind::NotFound => {
+            println!("no baseline; run with LAYOUT_EVAL_WRITE_BASELINE=1 to seed one.");
+            return None;
+        }
+        Err(err) => {
+            eprintln!("WARN: failed to read baseline {path}: {err}");
+            return None;
+        }
+    };
+    match serde_json::from_str::<CorpusReport>(&json) {
+        Ok(report) => Some(report),
+        Err(err) => {
+            eprintln!("WARN: failed to parse baseline {path}: {err}");
+            None
+        }
+    }
+}
+
+/// Print the baseline-vs-candidate diff to stdout: one line per matched model
+/// (delta + p-value + significance) and an aggregate line. PURE-ish: reads
+/// `cmp` and prints; kept in the shell because it does I/O (stdout).
+fn print_comparison(cmp: &Comparison) {
+    println!("baseline diff (candidate vs baseline):");
+    for m in &cmp.per_model {
+        let verdict = if m.significant {
+            "significant"
+        } else {
+            "not significant"
+        };
+        println!(
+            "  {}: delta={} p={:.4} ({verdict})",
+            m.model,
+            fmt_delta_pct(m.delta_ratio),
+            m.p_value,
+        );
+    }
+    if cmp.per_model.is_empty() {
+        println!("  (no models matched the baseline)");
+    }
+    let agg_verdict = if cmp.aggregate_significant {
+        "significant"
+    } else {
+        "not significant"
+    };
+    println!(
+        "  aggregate: delta={} p={:.4} ({agg_verdict})",
+        fmt_delta_pct(cmp.aggregate_delta_ratio),
+        cmp.aggregate_p_value,
+    );
+}
+
+/// Resolve the baseline diff for this run.
+///
+/// When `LAYOUT_EVAL_WRITE_BASELINE` is set, (re)seed the committed baseline
+/// from `candidate` and return `None` (a seeding run reports no diff -- there is
+/// nothing yet to diff against). Otherwise read the committed baseline (if any),
+/// run `compare(baseline, candidate)`, print the diff, and return it for
+/// embedding in the artifacts. Absent baseline -> `None`.
+fn resolve_baseline_diff(candidate: &CorpusReport) -> Option<Comparison> {
+    if write_baseline_requested() {
+        write_baseline(candidate);
+        return None;
+    }
+    let baseline = read_baseline()?;
+    let cmp = compare(&baseline, candidate);
+    print_comparison(&cmp);
+    Some(cmp)
+}
+
 fn main() {
     let keys = selected_keys();
     let m = seed_count();
@@ -876,6 +1102,12 @@ fn main() {
         renders.len(),
     );
 
+    // Either (re)seed the committed baseline from this run, or diff this run's
+    // report against the committed baseline (printing the per-model + aggregate
+    // deltas with Mann-Whitney p-values). The returned `Comparison` (if any) is
+    // embedded into both artifacts below.
+    let baseline_comparison = resolve_baseline_diff(&corpus);
+
     // Build the serializable report from the in-memory stats + renders, then
     // emit both artifacts under the out dir (which defaults under the gitignored
     // repo-root `target/`). `corpus.per_model` and `renders` are positionally
@@ -885,6 +1117,7 @@ fn main() {
         &renders,
         corpus.geomean_of_medians,
         &PLACEHOLDER_WEIGHTS,
+        baseline_comparison,
     );
 
     let metrics_path = format!("{out}/metrics.json");
diff --git a/src/simlin-engine/examples/layout_eval_baseline.README.md b/src/simlin-engine/examples/layout_eval_baseline.README.md
new file mode 100644
index 000000000..9033eecf6
--- /dev/null
+++ b/src/simlin-engine/examples/layout_eval_baseline.README.md
@@ -0,0 +1,32 @@
+# layout_eval_baseline.json
+
+The committed baseline `CorpusReport` that `examples/layout_eval.rs` diffs every
+normal run against (per-model + aggregate deltas with Mann-Whitney U p-values).
+
+## How this snapshot was seeded
+
+This baseline was seeded over a **small representative subset** of the corpus to
+keep the run fast and the committed JSON modest:
+
+```
+LAYOUT_EVAL_MODELS=sir,teacup LAYOUT_EVAL_SEEDS=8 LAYOUT_EVAL_WRITE_BASELINE=1 \
+  cargo run --release -p simlin-engine --features png_render,file_io --example layout_eval
+```
+
+It records the **current pre-Rung-0 layout behavior**, scored with the Phase-3
+`PLACEHOLDER_WEIGHTS` (NOT the calibrated weights). Do not seed the full metasd
+corpus here: that is minutes-scale and produces a large JSON.
+
+## When to regenerate
+
+REGENERATE this baseline:
+
+- **After Phase 4 commits the calibrated `MetricWeights`** (and `layout_eval.rs`
+  switches from `PLACEHOLDER_WEIGHTS` to `MetricWeights::default()`): the
+  weighted costs change, so the recorded sample costs are stale.
+- **Before Phase 5 measures Rung 0's improvement**: the baseline must capture
+  pre-Rung-0 behavior with the final calibrated weights so the Rung-0 diff is
+  meaningful.
+
+Re-run the seeding command above (optionally over a broader model set / larger
+`LAYOUT_EVAL_SEEDS`) and commit the regenerated `layout_eval_baseline.json`.
diff --git a/src/simlin-engine/examples/layout_eval_baseline.json b/src/simlin-engine/examples/layout_eval_baseline.json
new file mode 100644
index 000000000..9dea65888
--- /dev/null
+++ b/src/simlin-engine/examples/layout_eval_baseline.json
@@ -0,0 +1,393 @@
+{
+  "per_model": [
+    {
+      "model": "teacup",
+      "samples": [
+        {
+          "seed": 0,
+          "metrics": {
+            "node_overlap": 0.05039334765728716,
+            "node_connector_overlap": 0.0,
+            "label_overlap": 0.0,
+            "crossings": 0.25,
+            "sprawl": 0.774985901426613,
+            "edge_length_cv": 0.3203457592744067,
+            "aspect_penalty": 0.0,
+            "chain_straightness": 0.0,
+            "loop_compactness": 0.0
+          },
+          "weighted_cost": 0.5742262628325421
+        },
+        {
+          "seed": 1,
+          "metrics": {
+            "node_overlap": 0.05039334765728716,
+            "node_connector_overlap": 0.0,
+            "label_overlap": 0.0,
+            "crossings": 0.25,
+            "sprawl": 0.774985901426613,
+            "edge_length_cv": 0.3203457592744067,
+            "aspect_penalty": 0.0,
+            "chain_straightness": 0.0,
+            "loop_compactness": 0.0
+          },
+          "weighted_cost": 0.5742262628325421
+        },
+        {
+          "seed": 2,
+          "metrics": {
+            "node_overlap": 0.05039334765728716,
+            "node_connector_overlap": 0.0,
+            "label_overlap": 0.0,
+            "crossings": 0.25,
+            "sprawl": 0.774985901426613,
+            "edge_length_cv": 0.3203457592744067,
+            "aspect_penalty": 0.0,
+            "chain_straightness": 0.0,
+            "loop_compactness": 0.0
+          },
+          "weighted_cost": 0.5742262628325421
+        },
+        {
+          "seed": 3,
+          "metrics": {
+            "node_overlap": 0.05039334765728716,
+            "node_connector_overlap": 0.0,
+            "label_overlap": 0.0,
+            "crossings": 0.25,
+            "sprawl": 0.774985901426613,
+            "edge_length_cv": 0.3203457592744067,
+            "aspect_penalty": 0.0,
+            "chain_straightness": 0.0,
+            "loop_compactness": 0.0
+          },
+          "weighted_cost": 0.5742262628325421
+        },
+        {
+          "seed": 4,
+          "metrics": {
+            "node_overlap": 0.05039334765728716,
+            "node_connector_overlap": 0.0,
+            "label_overlap": 0.0,
+            "crossings": 0.25,
+            "sprawl": 0.774985901426613,
+            "edge_length_cv": 0.3203457592744067,
+            "aspect_penalty": 0.0,
+            "chain_straightness": 0.0,
+            "loop_compactness": 0.0
+          },
+          "weighted_cost": 0.5742262628325421
+        },
+        {
+          "seed": 5,
+          "metrics": {
+            "node_overlap": 0.05039334765728716,
+            "node_connector_overlap": 0.0,
+            "label_overlap": 0.0,
+            "crossings": 0.25,
+            "sprawl": 0.774985901426613,
+            "edge_length_cv": 0.3203457592744067,
+            "aspect_penalty": 0.0,
+            "chain_straightness": 0.0,
+            "loop_compactness": 0.0
+          },
+          "weighted_cost": 0.5742262628325421
+        },
+        {
+          "seed": 6,
+          "metrics": {
+            "node_overlap": 0.05039334765728716,
+            "node_connector_overlap": 0.0,
+            "label_overlap": 0.0,
+            "crossings": 0.25,
+            "sprawl": 0.774985901426613,
+            "edge_length_cv": 0.3203457592744067,
+            "aspect_penalty": 0.0,
+            "chain_straightness": 0.0,
+            "loop_compactness": 0.0
+          },
+          "weighted_cost": 0.5742262628325421
+        },
+        {
+          "seed": 7,
+          "metrics": {
+            "node_overlap": 0.05039334765728716,
+            "node_connector_overlap": 0.0,
+            "label_overlap": 0.0,
+            "crossings": 0.25,
+            "sprawl": 0.774985901426613,
+            "edge_length_cv": 0.3203457592744067,
+            "aspect_penalty": 0.0,
+            "chain_straightness": 0.0,
+            "loop_compactness": 0.0
+          },
+          "weighted_cost": 0.5742262628325421
+        },
+        {
+          "seed": 42,
+          "metrics": {
+            "node_overlap": 0.05039334765728716,
+            "node_connector_overlap": 0.0,
+            "label_overlap": 0.0,
+            "crossings": 0.25,
+            "sprawl": 0.774985901426613,
+            "edge_length_cv": 0.3203457592744067,
+            "aspect_penalty": 0.0,
+            "chain_straightness": 0.0,
+            "loop_compactness": 0.0
+          },
+          "weighted_cost": 0.5742262628325421
+        },
+        {
+          "seed": 123,
+          "metrics": {
+            "node_overlap": 0.05039334765728716,
+            "node_connector_overlap": 0.0,
+            "label_overlap": 0.0,
+            "crossings": 0.25,
+            "sprawl": 0.774985901426613,
+            "edge_length_cv": 0.3203457592744067,
+            "aspect_penalty": 0.0,
+            "chain_straightness": 0.0,
+            "loop_compactness": 0.0
+          },
+          "weighted_cost": 0.5742262628325421
+        },
+        {
+          "seed": 456,
+          "metrics": {
+            "node_overlap": 0.05039334765728716,
+            "node_connector_overlap": 0.0,
+            "label_overlap": 0.0,
+            "crossings": 0.25,
+            "sprawl": 0.774985901426613,
+            "edge_length_cv": 0.3203457592744067,
+            "aspect_penalty": 0.0,
+            "chain_straightness": 0.0,
+            "loop_compactness": 0.0
+          },
+          "weighted_cost": 0.5742262628325421
+        },
+        {
+          "seed": 789,
+          "metrics": {
+            "node_overlap": 0.05039334765728716,
+            "node_connector_overlap": 0.0,
+            "label_overlap": 0.0,
+            "crossings": 0.25,
+            "sprawl": 0.774985901426613,
+            "edge_length_cv": 0.3203457592744067,
+            "aspect_penalty": 0.0,
+            "chain_straightness": 0.0,
+            "loop_compactness": 0.0
+          },
+          "weighted_cost": 0.5742262628325421
+        }
+      ],
+      "median_cost": 0.5742262628325421,
+      "spread": [
+        0.5742262628325421,
+        0.5742262628325421
+      ],
+      "best_of_k_cost": 0.5742262628325421,
+      "best_seed": 0,
+      "median_seed": 0,
+      "worst_seed": 0
+    },
+    {
+      "model": "sir",
+      "samples": [
+        {
+          "seed": 0,
+          "metrics": {
+            "node_overlap": 0.05327008392222038,
+            "node_connector_overlap": 0.007868550165242162,
+            "label_overlap": 0.004184421171500422,
+            "crossings": 0.3333333333333333,
+            "sprawl": 0.7423022923087866,
+            "edge_length_cv": 0.39340989843910823,
+            "aspect_penalty": 0.06837606837606858,
+            "chain_straightness": 0.0,
+            "loop_compactness": 0.0
+          },
+          "weighted_cost": 0.6996784533732872
+        },
+        {
+          "seed": 1,
+          "metrics": {
+            "node_overlap": 0.05327008392222038,
+            "node_connector_overlap": 0.007868550165242162,
+            "label_overlap": 0.004184421171500422,
+            "crossings": 0.3333333333333333,
+            "sprawl": 0.7423022923087866,
+            "edge_length_cv": 0.39340989843910823,
+            "aspect_penalty": 0.06837606837606858,
+            "chain_straightness": 0.0,
+            "loop_compactness": 0.0
+          },
+          "weighted_cost": 0.6996784533732872
+        },
+        {
+          "seed": 2,
+          "metrics": {
+            "node_overlap": 0.05327008392222038,
+            "node_connector_overlap": 0.007868550165242162,
+            "label_overlap": 0.004184421171500422,
+            "crossings": 0.3333333333333333,
+            "sprawl": 0.7423022923087866,
+            "edge_length_cv": 0.39340989843910823,
+            "aspect_penalty": 0.06837606837606858,
+            "chain_straightness": 0.0,
+            "loop_compactness": 0.0
+          },
+          "weighted_cost": 0.6996784533732872
+        },
+        {
+          "seed": 3,
+          "metrics": {
+            "node_overlap": 0.05327008392222038,
+            "node_connector_overlap": 0.007868550165242162,
+            "label_overlap": 0.004184421171500422,
+            "crossings": 0.3333333333333333,
+            "sprawl": 0.7423022923087866,
+            "edge_length_cv": 0.39340989843910823,
+            "aspect_penalty": 0.06837606837606858,
+            "chain_straightness": 0.0,
+            "loop_compactness": 0.0
+          },
+          "weighted_cost": 0.6996784533732872
+        },
+        {
+          "seed": 4,
+          "metrics": {
+            "node_overlap": 0.05327008392222038,
+            "node_connector_overlap": 0.007868550165242162,
+            "label_overlap": 0.004184421171500422,
+            "crossings": 0.3333333333333333,
+            "sprawl": 0.7423022923087866,
+            "edge_length_cv": 0.39340989843910823,
+            "aspect_penalty": 0.06837606837606858,
+            "chain_straightness": 0.0,
+            "loop_compactness": 0.0
+          },
+          "weighted_cost": 0.6996784533732872
+        },
+        {
+          "seed": 5,
+          "metrics": {
+            "node_overlap": 0.05327008392222038,
+            "node_connector_overlap": 0.007868550165242162,
+            "label_overlap": 0.004184421171500422,
+            "crossings": 0.3333333333333333,
+            "sprawl": 0.7423022923087866,
+            "edge_length_cv": 0.39340989843910823,
+            "aspect_penalty": 0.06837606837606858,
+            "chain_straightness": 0.0,
+            "loop_compactness": 0.0
+          },
+          "weighted_cost": 0.6996784533732872
+        },
+        {
+          "seed": 6,
+          "metrics": {
+            "node_overlap": 0.05327008392222038,
+            "node_connector_overlap": 0.007868550165242162,
+            "label_overlap": 0.004184421171500422,
+            "crossings": 0.3333333333333333,
+            "sprawl": 0.7423022923087866,
+            "edge_length_cv": 0.39340989843910823,
+            "aspect_penalty": 0.06837606837606858,
+            "chain_straightness": 0.0,
+            "loop_compactness": 0.0
+          },
+          "weighted_cost": 0.6996784533732872
+        },
+        {
+          "seed": 7,
+          "metrics": {
+            "node_overlap": 0.05327008392222038,
+            "node_connector_overlap": 0.007868550165242162,
+            "label_overlap": 0.004184421171500422,
+            "crossings": 0.3333333333333333,
+            "sprawl": 0.7423022923087866,
+            "edge_length_cv": 0.39340989843910823,
+            "aspect_penalty": 0.06837606837606858,
+            "chain_straightness": 0.0,
+            "loop_compactness": 0.0
+          },
+          "weighted_cost": 0.6996784533732872
+        },
+        {
+          "seed": 42,
+          "metrics": {
+            "node_overlap": 0.05327008392222038,
+            "node_connector_overlap": 0.007868550165242162,
+            "label_overlap": 0.004184421171500422,
+            "crossings": 0.3333333333333333,
+            "sprawl": 0.7423022923087866,
+            "edge_length_cv": 0.39340989843910823,
+            "aspect_penalty": 0.06837606837606858,
+            "chain_straightness": 0.0,
+            "loop_compactness": 0.0
+          },
+          "weighted_cost": 0.6996784533732872
+        },
+        {
+          "seed": 123,
+          "metrics": {
+            "node_overlap": 0.05327008392222038,
+            "node_connector_overlap": 0.007868550165242162,
+            "label_overlap": 0.004184421171500422,
+            "crossings": 0.3333333333333333,
+            "sprawl": 0.7423022923087866,
+            "edge_length_cv": 0.39340989843910823,
+            "aspect_penalty": 0.06837606837606858,
+            "chain_straightness": 0.0,
+            "loop_compactness": 0.0
+          },
+          "weighted_cost": 0.6996784533732872
+        },
+        {
+          "seed": 456,
+          "metrics": {
+            "node_overlap": 0.05327008392222038,
+            "node_connector_overlap": 0.007868550165242162,
+            "label_overlap": 0.004184421171500422,
+            "crossings": 0.3333333333333333,
+            "sprawl": 0.7423022923087866,
+            "edge_length_cv": 0.39340989843910823,
+            "aspect_penalty": 0.06837606837606858,
+            "chain_straightness": 0.0,
+            "loop_compactness": 0.0
+          },
+          "weighted_cost": 0.6996784533732872
+        },
+        {
+          "seed": 789,
+          "metrics": {
+            "node_overlap": 0.05327008392222038,
+            "node_connector_overlap": 0.007868550165242162,
+            "label_overlap": 0.004184421171500422,
+            "crossings": 0.3333333333333333,
+            "sprawl": 0.7423022923087866,
+            "edge_length_cv": 0.39340989843910823,
+            "aspect_penalty": 0.06837606837606858,
+            "chain_straightness": 0.0,
+            "loop_compactness": 0.0
+          },
+          "weighted_cost": 0.6996784533732872
+        }
+      ],
+      "median_cost": 0.6996784533732872,
+      "spread": [
+        0.6996784533732872,
+        0.6996784533732872
+      ],
+      "best_of_k_cost": 0.6996784533732872,
+      "best_seed": 0,
+      "median_seed": 0,
+      "worst_seed": 0
+    }
+  ],
+  "geomean_of_medians": 0.6338562482653269
+}
\ No newline at end of file
diff --git a/src/simlin-engine/src/layout/eval_stats.rs b/src/simlin-engine/src/layout/eval_stats.rs
index e6532c22f..4c6fd5a56 100644
--- a/src/simlin-engine/src/layout/eval_stats.rs
+++ b/src/simlin-engine/src/layout/eval_stats.rs
@@ -214,7 +214,12 @@ pub const GEOMEAN_FLOOR_EPSILON: f64 = 1e-9;
 
 /// One per-seed layout sample: the seed that produced the layout, its computed
 /// metrics, and the scalar weighted cost the optimizer minimizes.
-#[derive(Clone, Debug)]
+///
+/// `Serialize`/`Deserialize` let the corpus sweep round-trip a full
+/// [`CorpusReport`] (including these per-seed samples) through JSON, so the
+/// committed baseline report can be read back and the per-model seed-sample
+/// cost sets re-run through [`mann_whitney_u`] by [`compare`].
+#[derive(Clone, Debug, serde::Serialize, serde::Deserialize)]
 pub struct MetricSample {
     pub seed: u64,
     pub metrics: LayoutMetrics,
@@ -225,7 +230,10 @@ pub struct MetricSample {
 /// plus the center (`median_cost`), spread (`p25`, `p75`), the best-of-k
 /// production proxy, and the best/median/worst seeds (which drive Phase 3's
 /// PNG renders).
-#[derive(Clone, Debug)]
+///
+/// `Serialize`/`Deserialize` ride on [`MetricSample`]'s so a [`CorpusReport`]
+/// round-trips through JSON (see [`MetricSample`]).
+#[derive(Clone, Debug, serde::Serialize, serde::Deserialize)]
 pub struct ModelStats {
     pub model: String,
     /// One sample per seed.
@@ -242,7 +250,13 @@ pub struct ModelStats {
 
 /// Corpus-wide report: one `ModelStats` per model plus the geometric mean of
 /// the per-model medians (the single headline aggregate, benchstat-style).
-#[derive(Clone, Debug)]
+///
+/// `Serialize`/`Deserialize` let the corpus sweep write this report to the
+/// committed `examples/layout_eval_baseline.json` and read it back for the
+/// baseline-vs-candidate diff (`compare`). The full report -- including each
+/// model's per-seed `samples` -- round-trips so `compare` can re-run
+/// Mann-Whitney U over the seed-sample cost sets.
+#[derive(Clone, Debug, serde::Serialize, serde::Deserialize)]
 pub struct CorpusReport {
     pub per_model: Vec<ModelStats>,
     pub geomean_of_medians: f64,
@@ -364,7 +378,11 @@ impl CorpusReport {
 }
 
 /// Per-model verdict from comparing a baseline against a candidate report.
-#[derive(Clone, Debug)]
+///
+/// `Serialize` lets the corpus sweep embed the baseline-vs-candidate diff into
+/// its `metrics.json` artifact. The verdict is never read back from JSON (it is
+/// recomputed by `compare` on every run), so it carries no `Deserialize`.
+#[derive(Clone, Debug, serde::Serialize)]
 pub struct ModelComparison {
     pub model: String,
     pub baseline_median: f64,
@@ -382,7 +400,11 @@ pub struct ModelComparison {
 
 /// Result of comparing two corpus reports: one [`ModelComparison`] per matched
 /// model plus the corpus-wide aggregate delta and significance verdict.
-#[derive(Clone, Debug)]
+///
+/// `Serialize` lets the corpus sweep embed this diff into its `metrics.json`
+/// artifact. Like [`ModelComparison`] it carries no `Deserialize`: the diff is
+/// recomputed by `compare` on every run, never read back from JSON.
+#[derive(Clone, Debug, serde::Serialize)]
 pub struct Comparison {
     /// One entry per model present in BOTH reports (unmatched models are
     /// skipped -- see [`compare`]), in baseline iteration order.
diff --git a/src/simlin-engine/src/layout/metrics.rs b/src/simlin-engine/src/layout/metrics.rs
index a219eb0f5..6d6489d1b 100644
--- a/src/simlin-engine/src/layout/metrics.rs
+++ b/src/simlin-engine/src/layout/metrics.rs
@@ -51,10 +51,12 @@ pub const TARGET_AR_MAX: f64 = 16.0 / 9.0;
 /// spread far apart, and that sensitivity is what makes those terms meaningful
 /// across models. See the AC1.8 scoping note in the Phase 1 plan.
 ///
-/// `Serialize` lets the layout-quality eval sweep (`examples/layout_eval.rs`)
-/// emit the per-term breakdown into its `metrics.json` artifact; the struct is
-/// pure data (every field a plain `f64`), so the derive carries no behavior.
-#[derive(Clone, Copy, Debug, PartialEq, serde::Serialize)]
+/// `Serialize`/`Deserialize` let the layout-quality eval sweep
+/// (`examples/layout_eval.rs`) emit the per-term breakdown into its
+/// `metrics.json` artifact and round-trip the committed baseline report back
+/// from JSON for the baseline diff; the struct is pure data (every field a
+/// plain `f64`), so the derives carry no behavior.
+#[derive(Clone, Copy, Debug, PartialEq, serde::Serialize, serde::Deserialize)]
 pub struct LayoutMetrics {
     /// Sum of pairwise node-box overlap area, normalized by total node area.
     pub node_overlap: f64,
@@ -85,10 +87,12 @@ pub struct LayoutMetrics {
 /// (see below) so any accidental use of `weighted_cost` before calibration is
 /// obviously inert rather than silently wrong.
 ///
-/// `Serialize` lets the layout-quality eval sweep (`examples/layout_eval.rs`)
-/// record the weight set it used in its `metrics.json` artifact; the struct is
-/// pure data (every field a plain `f64`), so the derive carries no behavior.
-#[derive(Clone, Copy, Debug, PartialEq, serde::Serialize)]
+/// `Serialize`/`Deserialize` let the layout-quality eval sweep
+/// (`examples/layout_eval.rs`) record the weight set it used in its
+/// `metrics.json` artifact and read it back when round-tripping the committed
+/// baseline report; the struct is pure data (every field a plain `f64`), so the
+/// derives carry no behavior.
+#[derive(Clone, Copy, Debug, PartialEq, serde::Serialize, serde::Deserialize)]
 pub struct MetricWeights {
     pub node_overlap: f64,
     pub node_connector_overlap: f64,

From 38afb57d683215713cfd71546cc82508b31c571f Mon Sep 17 00:00:00 2001
From: Bobby Powers <bobbypowers@gmail.com>
Date: Sat, 23 May 2026 07:23:24 -0700
Subject: [PATCH 20/38] engine: layout_eval skip-on-failure + full-sweep smoke
 check

Wrap each model's full pipeline (load -> seed sweep -> render) in a new
process_model() returning Result<(ModelStats, ModelRenders), String>, the
single model-level skip-on-failure boundary. main() WARN-logs any per-model
Err and continues to the next model, omitting the failed model from every
artifact; the harness always reaches the end and exits 0 (even a fully-empty
corpus is exit 0 with warnings).

Three failure modes funnel through the Result, validated in the order data
flows: a load/parse failure (already surfaced by load_model), a layout that
fails on EVERY seed (zero usable samples -- now a model-level skip rather than
a degenerate all-zero report entry), and render failures (kept non-fatal
inside render_model, so a model with valid scores but a missing PNG still
appears). A partial per-seed failure never sinks a model: the skip fires only
when no seed produced a usable sample.

Also refreshes the now-stale sweep_model doc comment: generate_layout_with_config
is deterministic per seed since fix #633, so the prior note about run-to-run
median/spread drift no longer applies.
---
 src/simlin-engine/examples/layout_eval.rs | 126 ++++++++++++++++------
 1 file changed, 94 insertions(+), 32 deletions(-)

diff --git a/src/simlin-engine/examples/layout_eval.rs b/src/simlin-engine/examples/layout_eval.rs
index 72695d1d5..ad577b4d9 100644
--- a/src/simlin-engine/examples/layout_eval.rs
+++ b/src/simlin-engine/examples/layout_eval.rs
@@ -333,20 +333,17 @@ fn baseline_path() -> String {
 /// vector -- and every statistic derived from it -- is invariant to rayon's
 /// scheduling: parallelism introduces no nondeterminism here.
 ///
-/// NOTE: `generate_layout_with_config` is itself NOT deterministic per seed --
-/// the same `(model, seed)` pair produces slightly different layouts on
-/// repeated calls, *even serially within one process* (verified by direct
-/// probe). The drift traces to per-process-randomized `HashMap`/`HashSet`
-/// iteration order in the layout pipeline (e.g. `sfdp::build_node_index`'s
-/// `HashMap` feeding force accumulation). This is a pre-existing layout-engine
-/// issue, not a property of this sweep; it means the reported median/spread
-/// vary run-to-run within a small band. The fix belongs in the layout engine
-/// (deterministic ordered containers). Tracked separately.
+/// `generate_layout_with_config` is deterministic per seed (fix #633): the same
+/// `(model, seed)` pair produces the identical layout on repeated calls within
+/// and across processes, so the reported median/spread are reproducible.
 ///
 /// A seed whose layout fails to generate is dropped with a WARN (a single bad
-/// seed must not sink the whole model's sweep). The full model-level
-/// skip-on-failure path (load/render) lands in a later task; here a model with
-/// zero successful seeds yields an empty `ModelStats` (all-zero, no panic).
+/// seed must not sink the whole model's sweep). A model whose layout fails on
+/// EVERY seed yields an empty `samples` vector here; the caller
+/// (`process_model`) treats that zero-usable-samples case as a model-level
+/// failure and skips the model (`WARN: skipping {key}: ...`), so a model that
+/// never lays out is omitted from the report rather than reported as a
+/// degenerate all-zero entry (AC3.6).
 fn sweep_model(key: &str, project: &datamodel::Project, seeds: &[u64]) -> ModelStats {
     // Compute one (seed, sample) per seed in parallel, then sort back into seed
     // order so the sample vector -- and therefore every statistic derived from
@@ -556,6 +553,71 @@ fn report_renders(key: &str, renders: &ModelRenders) {
     }
 }
 
+// ── Per-model pipeline (skip-on-failure) ─────────────────────────────────────
+
+/// Run one model's full pipeline -- load -> seed sweep -> render -- and return
+/// its `(ModelStats, ModelRenders)` on success.
+///
+/// This is the model-level skip-on-failure boundary (AC3.6): EVERY way a single
+/// model can fail funnels through the returned `Err(String)`, which `main` turns
+/// into a `WARN: skipping {key}: {err}` and a continue to the next model, so one
+/// bad model never aborts the sweep and is simply omitted from the report.
+///
+/// Three failure modes, validated in the order data flows (defense-in-depth):
+///   1. **Load failure** (entry layer): a missing file or a parse error is
+///      already surfaced as `Err(String)` by `load_model`; propagated with `?`.
+///   2. **No usable layout** (business layer): `sweep_model` drops each
+///      individually-failing seed with a WARN but still returns a (possibly
+///      empty) `ModelStats`. A model whose layout failed on EVERY seed has zero
+///      samples and cannot be scored, rendered, or aggregated -- it is a
+///      model-level failure here, returned as `Err`. Crucially this only fires
+///      when ALL seeds failed: a model with even one usable sample proceeds, so
+///      a partial per-seed failure never sinks the model.
+///   3. **Render failure** (handled inside `render_model`): a layout that scores
+///      but fails to rasterize or write is non-fatal -- it is WARN-logged and
+///      its `Render` is `None`. A model can therefore appear in the report with
+///      its statistics but a missing PNG cell; this is intentionally NOT a
+///      model-level skip (the scores are still meaningful).
+fn process_model(
+    spec: &ModelSpec,
+    seeds: &[u64],
+    out: &str,
+) -> Result<(ModelStats, ModelRenders), String> {
+    // 1. Load (entry-layer validation lives in `load_model`).
+    let project = load_model(spec)?;
+
+    let n = loaded_element_count(&project);
+    println!("loaded {}: {n} elements", spec.key);
+
+    // 2. Sweep. A model with zero usable samples laid out on no seed -- it is a
+    //    model-level failure, not a degenerate all-zero report entry.
+    let stats = sweep_model(spec.key, &project, seeds);
+    if stats.samples.is_empty() {
+        return Err(format!(
+            "no usable layout: all {} seed(s) failed to lay out",
+            seeds.len(),
+        ));
+    }
+
+    let (p25, p75) = stats.spread;
+    println!(
+        "{}: median={:.4} p25/p75={:.4}/{:.4} best_of_k={:.4} (M={})",
+        spec.key,
+        stats.median_cost,
+        p25,
+        p75,
+        stats.best_of_k_cost,
+        stats.samples.len(),
+    );
+
+    // 3. Render best/median/worst (and the reference, if any). Render failures
+    //    are non-fatal: `render_model` WARN-logs and leaves the cell `None`.
+    let renders = render_model(spec.key, &project, &stats, out);
+    report_renders(spec.key, &renders);
+
+    Ok((stats, renders))
+}
+
 // ── Report (metrics.json + index.html) ──────────────────────────────────────
 //
 // The structs below are the on-disk JSON shape. They are PURE DATA built once
@@ -1060,34 +1122,34 @@ fn main() {
         keys.len(),
     );
 
+    // Per-model skip-on-failure (AC3.6): each model's full pipeline (load ->
+    // sweep -> render) is wrapped in `process_model`. ANY failure -- a load
+    // error, a layout that fails on every seed, etc. -- is WARN-logged and the
+    // sweep CONTINUES to the next model; the failed model is omitted from
+    // `per_model`/`renders` (and therefore from every artifact). The harness
+    // always reaches the end and exits 0, even if every model was skipped.
+    //
+    // `per_model` and `renders` stay positionally paired: both are pushed
+    // exactly once per surviving model, so the Task-4 report builder can zip
+    // them.
     let mut per_model: Vec<ModelStats> = Vec::new();
     let mut renders: Vec<ModelRenders> = Vec::new();
+    let mut skipped = 0usize;
     for spec in CORPUS.iter().filter(|s| keys.contains(&s.key)) {
-        match load_model(spec) {
-            Ok(project) => {
-                let n = loaded_element_count(&project);
-                println!("loaded {}: {n} elements", spec.key);
-
-                let stats = sweep_model(spec.key, &project, &seeds);
-                let (p25, p75) = stats.spread;
-                // The actual sampled seed count is the union of LAYOUT_SEEDS and
-                // 0..m (deduped), reported here as the real M the run swept.
-                println!(
-                    "{}: median={:.4} p25/p75={:.4}/{:.4} best_of_k={:.4} (M={n_sampled})",
-                    spec.key, stats.median_cost, p25, p75, stats.best_of_k_cost,
-                );
-
-                // Render best/median/worst (and the reference, if the model
-                // ships one) to PNGs. Render failures are non-fatal.
-                let model_renders = render_model(spec.key, &project, &stats, &out);
-                report_renders(spec.key, &model_renders);
-
+        match process_model(spec, &seeds, &out) {
+            Ok((stats, model_renders)) => {
                 per_model.push(stats);
                 renders.push(model_renders);
             }
-            Err(err) => eprintln!("WARN: skipping {}: {err}", spec.key),
+            Err(err) => {
+                eprintln!("WARN: skipping {}: {err}", spec.key);
+                skipped += 1;
+            }
         }
     }
+    if skipped > 0 {
+        println!("skipped {skipped} model(s) (see WARN lines above)");
+    }
 
     let corpus = CorpusReport::from_model_stats(per_model);
     println!(

From 9d0c9ff138a4a3c25f07a171ad16ccac164e6daa Mon Sep 17 00:00:00 2001
From: Bobby Powers <bobbypowers@gmail.com>
Date: Sat, 23 May 2026 09:15:09 -0700
Subject: [PATCH 21/38] engine: add default_projects models to layout_eval
 corpus

The four built-in default_projects (fishbanks, logistic-growth, population,
reliability) are hand-laid-out reference layouts and are the primary good-layout
taste anchors for the Phase 4 metric calibration. Added them to the eval corpus
so the sweep can score their reference views against generated layouts.
---
 src/simlin-engine/examples/layout_eval.rs | 22 ++++++++++++++++++++++
 1 file changed, 22 insertions(+)

diff --git a/src/simlin-engine/examples/layout_eval.rs b/src/simlin-engine/examples/layout_eval.rs
index ad577b4d9..bc8db4d4d 100644
--- a/src/simlin-engine/examples/layout_eval.rs
+++ b/src/simlin-engine/examples/layout_eval.rs
@@ -132,6 +132,28 @@ const CORPUS: &[ModelSpec] = &[
         rel_path: "../../test/logistic_growth_ltm/logistic_growth.stmx",
         format: Xmile,
     },
+    // default_projects: the app's curated, hand-laid-out built-in projects.
+    // These are the primary "good layout" taste anchors for Phase 4 calibration.
+    ModelSpec {
+        key: "fishbanks",
+        rel_path: "../../default_projects/fishbanks/model.xmile",
+        format: Xmile,
+    },
+    ModelSpec {
+        key: "dp_logistic_growth",
+        rel_path: "../../default_projects/logistic-growth/model.xmile",
+        format: Xmile,
+    },
+    ModelSpec {
+        key: "population",
+        rel_path: "../../default_projects/population/model.xmile",
+        format: Xmile,
+    },
+    ModelSpec {
+        key: "reliability",
+        rel_path: "../../default_projects/reliability/model.xmile",
+        format: Xmile,
+    },
     // modules
     ModelSpec {
         key: "hares_and_foxes",

From ac1a1485d2e765abaa6976c3654025829d6cab34 Mon Sep 17 00:00:00 2001
From: Bobby Powers <bobbypowers@gmail.com>
Date: Sat, 23 May 2026 09:29:13 -0700
Subject: [PATCH 22/38] engine: compute node_overlap and node_connector_overlap
 on bare shape boxes

These two layout-quality terms previously iterated the label-merged
node boxes (node_box), which conflated shape-vs-shape overlap with
label overlap and charged a connector for merely passing under a
label. The user's calibrated taste treats node shapes overlapping
other node shapes, and a connector passing under a non-incident node
SHAPE (it reads as a false causal connection at a glance), as the
real costs; a connector passing only under a semi-transparent LABEL
is mild noise, and label collisions already belong to label_overlap.

node_overlap now sums pairwise overlap over the bare shape boxes
(node_shape_box) and normalizes by total shape-box area;
node_connector_overlap charges a connector for the length it spends
inside non-incident shape boxes. node_shape_boxes is hoisted to be
computed once and shared with label_overlap. sprawl and
aspect_penalty intentionally keep using the label-merged node_box
because the view's visual extent and characteristic node size include
labels. MetricWeights, weighted_cost, crossings, and loop_compactness
are untouched.
---
 src/simlin-engine/src/layout/metrics.rs | 218 +++++++++++++++++++-----
 1 file changed, 173 insertions(+), 45 deletions(-)

diff --git a/src/simlin-engine/src/layout/metrics.rs b/src/simlin-engine/src/layout/metrics.rs
index 6d6489d1b..00b96d5d7 100644
--- a/src/simlin-engine/src/layout/metrics.rs
+++ b/src/simlin-engine/src/layout/metrics.rs
@@ -58,10 +58,14 @@ pub const TARGET_AR_MAX: f64 = 16.0 / 9.0;
 /// plain `f64`), so the derives carry no behavior.
 #[derive(Clone, Copy, Debug, PartialEq, serde::Serialize, serde::Deserialize)]
 pub struct LayoutMetrics {
-    /// Sum of pairwise node-box overlap area, normalized by total node area.
+    /// Sum of pairwise node *shape*-box overlap area (label-free), normalized
+    /// by total shape-box area. Measures shapes overlapping shapes; label
+    /// collisions are charged by `label_overlap` instead.
     pub node_overlap: f64,
     /// Fraction of total connector length that passes through non-incident
-    /// node boxes.
+    /// node *shape* boxes (label-free). A connector under a node shape reads as
+    /// a false causal connection; a connector under only a label is not
+    /// charged here.
     pub node_connector_overlap: f64,
     /// Sum of label-vs-label and label-vs-node overlap area, normalized by
     /// total label area.
@@ -320,22 +324,41 @@ pub fn compute_layout_metrics(
     _config: &LayoutConfig,
 ) -> LayoutMetrics {
     // --- node boxes (with their owning element for incidence checks) ---
+    //
+    // Two box sets, used by different terms:
+    //   * `node_boxes` is the LABEL-MERGED box (`node_box`): each element's own
+    //     label unioned into its shape. The view's visual extent and its
+    //     characteristic node size both include labels, so `sprawl` and
+    //     `aspect_penalty` use this set.
+    //   * `node_shape_boxes` is the bare SHAPE box (`node_shape_box`):
+    //     label-free. `node_overlap` and `node_connector_overlap` use this set
+    //     so they measure exactly what the user cares about -- node SHAPES
+    //     overlapping other node shapes, and a connector passing under a node
+    //     SHAPE (a false-causal-connection at a glance). A connector passing
+    //     only under a node's LABEL is mild noise (labels are semi-transparent
+    //     and no connector terminates on one) and must NOT be charged here;
+    //     label collisions are the province of `label_overlap`.
     let node_boxes: Vec<(i32, Rect)> = view
         .elements
         .iter()
         .filter_map(|e| node_box(e).map(|r| (e.get_uid(), r)))
         .collect();
+    let node_shape_boxes: Vec<(i32, Rect)> = view
+        .elements
+        .iter()
+        .filter_map(|e| node_shape_box(e).map(|r| (e.get_uid(), r)))
+        .collect();
 
-    // --- node_overlap ---
-    let total_node_area: f64 = node_boxes.iter().map(|(_, r)| rect_area(r)).sum();
-    let node_overlap = if total_node_area > 0.0 {
+    // --- node_overlap (bare shape boxes, normalized by total shape-box area) ---
+    let total_shape_area: f64 = node_shape_boxes.iter().map(|(_, r)| rect_area(r)).sum();
+    let node_overlap = if total_shape_area > 0.0 {
         let mut overlap = 0.0;
-        for i in 0..node_boxes.len() {
-            for j in (i + 1)..node_boxes.len() {
-                overlap += rect_overlap_area(&node_boxes[i].1, &node_boxes[j].1);
+        for i in 0..node_shape_boxes.len() {
+            for j in (i + 1)..node_shape_boxes.len() {
+                overlap += rect_overlap_area(&node_shape_boxes[i].1, &node_shape_boxes[j].1);
             }
         }
-        overlap / total_node_area
+        overlap / total_shape_area
     } else {
         0.0
     };
@@ -344,11 +367,11 @@ pub fn compute_layout_metrics(
     let connectors = collect_connector_geometry(view);
     let total_connector_length: f64 = connectors.iter().map(|c| c.length).sum();
 
-    // --- node_connector_overlap ---
+    // --- node_connector_overlap (length inside non-incident shape boxes) ---
     let node_connector_overlap = if total_connector_length > 0.0 {
         let mut inside = 0.0;
         for c in &connectors {
-            for (uid, rect) in &node_boxes {
+            for (uid, rect) in &node_shape_boxes {
                 if c.incident_uids.contains(uid) {
                     continue; // skip the connector's own endpoints
                 }
@@ -383,11 +406,8 @@ pub fn compute_layout_metrics(
         .iter()
         .filter_map(|e| element_label_props(e).map(|props| (e.get_uid(), label_bounds(&props))))
         .collect();
-    let node_shape_boxes: Vec<(i32, Rect)> = view
-        .elements
-        .iter()
-        .filter_map(|e| node_shape_box(e).map(|r| (e.get_uid(), r)))
-        .collect();
+    // `node_shape_boxes` is computed once above (shared with node_overlap and
+    // node_connector_overlap).
     let total_label_area: f64 = label_boxes.iter().map(|(_, r)| rect_area(r)).sum();
     let label_overlap = if total_label_area > 0.0 {
         let mut overlap = 0.0;
@@ -623,29 +643,20 @@ mod tests {
 
     #[test]
     fn test_node_overlap_known_overlap_fraction() {
-        // Two stocks (45x35) whose centers are 20px apart horizontally and 10px
-        // apart vertically. With LabelSide::Bottom the label sits below the box
-        // and does not change the horizontal/vertical extent that matters for
-        // the box-box overlap of the *element* boxes; however `stock_bounds`
-        // merges the label, so we place the stocks far enough apart vertically
-        // that the labels do not collide, and compute the overlap from the full
-        // merged boxes directly.
+        // Two stocks (45x35) whose centers are 20px apart horizontally and at
+        // the same y. node_overlap is computed on the bare SHAPE boxes (not the
+        // label-merged boxes), so the expected value comes from
+        // `stock_shape_bounds` and is normalized by the total SHAPE-box area.
         let s1 = stock(1, "a", 100.0, 100.0);
         let s2 = stock(2, "b", 120.0, 100.0);
         let view = make_view(vec![s1.clone(), s2.clone()]);
 
         let m = compute_layout_metrics(&view, &cfg());
 
-        // Expected: compute directly from the two merged boxes the renderer
-        // would produce.
-        let b1 = stock_bounds(match &s1 {
-            ViewElement::Stock(s) => s,
-            _ => unreachable!(),
-        });
-        let b2 = stock_bounds(match &s2 {
-            ViewElement::Stock(s) => s,
-            _ => unreachable!(),
-        });
+        // Expected: compute directly from the two bare shape boxes the renderer
+        // draws (the rects, label-free).
+        let b1 = node_shape_box(&s1).unwrap();
+        let b2 = node_shape_box(&s2).unwrap();
         let expected_overlap = rect_overlap_area(&b1, &b2);
         let expected_total = rect_area(&b1) + rect_area(&b2);
         assert!(expected_overlap > 0.0, "fixture must actually overlap");
@@ -660,20 +671,16 @@ mod tests {
 
     #[test]
     fn test_node_overlap_simple_hand_computed() {
-        // Two stocks far apart vertically so labels never collide, and exactly
-        // 5px horizontal center separation so the element boxes overlap by a
-        // hand-computable amount in x while fully overlapping in y is avoided.
-        // To make the arithmetic exact and independent of label geometry, use
-        // LabelSide::Center is risky; instead verify against the helper-derived
-        // boxes (the renderer's own geometry) which is the contract.
+        // Two stocks with exactly one stock-width of horizontal center
+        // separation. node_overlap is a sum over the bare SHAPE boxes, so only
+        // the rects matter (labels are irrelevant to this term now).
         let s1 = stock(1, "a", 0.0, 0.0);
         let s2 = stock(2, "b", STOCK_WIDTH, 0.0); // centers exactly one width apart
         let view = make_view(vec![s1, s2]);
         let m = compute_layout_metrics(&view, &cfg());
-        // Centers one full width apart -> the 45-wide boxes just touch in x
-        // (right edge of #1 at +22.5, left edge of #2 at +22.5): zero element
-        // overlap. Labels (Bottom) are centered under each and 45px apart, each
-        // ~34px wide -> they do not overlap either. So node_overlap == 0.
+        // Centers one full width apart -> the 45-wide shape boxes just touch in
+        // x (right edge of #1 at +22.5, left edge of #2 at +22.5): zero shape
+        // overlap. So node_overlap == 0.
         assert_eq!(m.node_overlap, 0.0);
     }
 
@@ -690,6 +697,56 @@ mod tests {
         assert_eq!(m.node_overlap, 0.0);
     }
 
+    // node_overlap is computed on the bare SHAPE boxes, NOT the label-merged
+    // boxes. The user cares about node shapes overlapping other node shapes;
+    // a label landing on another node's shape (or another label) is the
+    // province of `label_overlap`. This test distinguishes the two regimes and
+    // would FAIL against the prior label-merged-box implementation.
+
+    #[test]
+    fn test_node_overlap_labels_overlap_shapes_disjoint_is_zero() {
+        // Two `LabelSide::Bottom` auxes named "samename" (8 chars), 40px apart
+        // horizontally at the same y -- the same fixture as the label_overlap
+        // double-count regression test:
+        //   aux1 @ (0,0):  shape [-9,9]x[-9,9],   label [-29,29]x[13,27]
+        //   aux2 @ (40,0): shape [31,49]x[-9,9],  label [11,69]x[13,27]
+        // The SHAPE boxes are disjoint (9 < 31), so node_overlap == 0. The
+        // LABEL boxes overlap, but that collision belongs to label_overlap, not
+        // node_overlap. Under the old label-merged boxes node_overlap would be
+        // > 0 (the merged boxes [-29,29]x[-9,27] and [11,69]x[-9,27] overlap),
+        // so this assertion pins the new shape-only behavior.
+        let view = make_view(vec![
+            aux(1, "samename", 0.0, 0.0),
+            aux(2, "samename", 40.0, 0.0),
+        ]);
+        let m = compute_layout_metrics(&view, &cfg());
+        assert_eq!(
+            m.node_overlap, 0.0,
+            "node_overlap must ignore label-only overlap (shapes are disjoint)"
+        );
+        // Sanity: the label collision IS captured by label_overlap, confirming
+        // the overlap was not simply lost.
+        assert!(
+            m.label_overlap > 0.0,
+            "the label-vs-label overlap must still be charged by label_overlap"
+        );
+    }
+
+    #[test]
+    fn test_node_overlap_shapes_overlap_is_positive() {
+        // Two stocks (45x35) whose centers are 20px apart horizontally and at
+        // the same y -- their bare SHAPE boxes overlap, so node_overlap > 0.
+        let view = make_view(vec![
+            stock(1, "a", 100.0, 100.0),
+            stock(2, "b", 120.0, 100.0),
+        ]);
+        let m = compute_layout_metrics(&view, &cfg());
+        assert!(
+            m.node_overlap > 0.0,
+            "overlapping node shapes must produce positive node_overlap"
+        );
+    }
+
     // --- AC1.3: node_connector_overlap ---
 
     #[test]
@@ -708,11 +765,15 @@ mod tests {
             "connector passing through a non-incident stock must contribute"
         );
 
-        // Expected = clipped length inside the stock box / total polyline len.
+        // Expected = clipped length inside the stock SHAPE box / total polyline
+        // len. node_connector_overlap charges against the bare shape box, not
+        // the label-merged box. (The connector is horizontal at y=0, so the
+        // clipped length happens to be identical to the label-merged box here;
+        // the SHAPE box is the contract regardless.)
         let connectors = collect_connector_geometry(&view);
         assert_eq!(connectors.len(), 1);
         let c = &connectors[0];
-        let stock_box = node_box(&stock(3, "s", 200.0, 0.0)).unwrap();
+        let stock_box = node_shape_box(&stock(3, "s", 200.0, 0.0)).unwrap();
         let mut inside = 0.0;
         for seg in c.polyline.windows(2) {
             inside += segment_length_in_rect(&seg[0], &seg[1], &stock_box);
@@ -738,6 +799,73 @@ mod tests {
         assert_eq!(m.node_connector_overlap, 0.0);
     }
 
+    // node_connector_overlap charges a connector for the length it spends
+    // inside a non-incident node's bare SHAPE box, NOT its label-merged box.
+    // The user reads a connector passing under a node SHAPE as a false causal
+    // connection (high priority); a connector passing only under a node's LABEL
+    // is mild noise (labels are semi-transparent, no connector starts/ends on a
+    // label) and must NOT be charged. These two tests pin that distinction; the
+    // first would FAIL against the prior label-merged-box implementation.
+
+    #[test]
+    fn test_node_connector_overlap_under_label_only_is_zero() {
+        // Connector from aux #1 (0,0) to aux #2 (400,0): a horizontal line at
+        // y=0 (clipped to the 9px aux radii, so drawn x in [9, 391]). A
+        // non-incident `LabelSide::Bottom` stock #3 named "s" (1 char) is placed
+        // ABOVE the line so its SHAPE box clears y=0 but its label (which hangs
+        // BELOW the shape) reaches down across y=0:
+        //   stock #3 @ (200,-25):
+        //     shape box  x [177.5, 222.5], y [-42.5, -7.5]   (does NOT cross 0)
+        //     label box  x [192, 208],     y [-3.5, 10.5]    (DOES cross 0)
+        // The connector at y=0 passes through the label band but never enters
+        // the shape box, so node_connector_overlap == 0. Under the old
+        // label-merged box (which unions the label, y [-42.5, 10.5]) the line
+        // WOULD be charged, so this assertion is the load-bearing distinction.
+        let a = aux(1, "a", 0.0, 0.0);
+        let b = aux(2, "b", 400.0, 0.0);
+        let label_only = stock(3, "s", 200.0, -25.0);
+        let link = straight_link(10, 1, 2);
+        let view = make_view(vec![a, b, label_only, link]);
+
+        // Confirm the fixture geometry is what we claim before asserting on the
+        // metric: shape box clears the line, merged box does not.
+        let shape = node_shape_box(&stock(3, "s", 200.0, -25.0)).unwrap();
+        let merged = node_box(&stock(3, "s", 200.0, -25.0)).unwrap();
+        assert!(
+            shape.bottom < 0.0,
+            "shape box must clear the connector line (bottom {} < 0)",
+            shape.bottom
+        );
+        assert!(
+            merged.bottom > 0.0,
+            "merged box must cross the connector line via the label (bottom {} > 0)",
+            merged.bottom
+        );
+
+        let m = compute_layout_metrics(&view, &cfg());
+        assert_eq!(
+            m.node_connector_overlap, 0.0,
+            "a connector passing only under a node's LABEL must not be charged"
+        );
+    }
+
+    #[test]
+    fn test_node_connector_overlap_under_shape_is_positive() {
+        // Same connector, but the non-incident stock sits ON the line so the
+        // connector crosses its SHAPE box -- the false-causal-connection case
+        // the user cares about. node_connector_overlap > 0.
+        let a = aux(1, "a", 0.0, 0.0);
+        let b = aux(2, "b", 400.0, 0.0);
+        let on_line = stock(3, "s", 200.0, 0.0);
+        let link = straight_link(10, 1, 2);
+        let view = make_view(vec![a, b, on_line, link]);
+        let m = compute_layout_metrics(&view, &cfg());
+        assert!(
+            m.node_connector_overlap > 0.0,
+            "a connector passing under a node SHAPE must be charged"
+        );
+    }
+
     // --- AC1.4: label_overlap ---
 
     #[test]

From 8a1cd15cac26d0567d3110571b7a0b1211a86e54 Mon Sep 17 00:00:00 2001
From: Bobby Powers <bobbypowers@gmail.com>
Date: Sat, 23 May 2026 09:54:18 -0700
Subject: [PATCH 23/38] engine: suppress spurious crossings where links meet
 flow valves/attachments

build_view_segments named a link's endpoint vertices elem_{from_uid}/
elem_{to_uid} but a flow's pipe vertices flow_{uid}#{i}. do_segments_intersect
only suppresses a crossing when two segments share a from_node/to_node string,
so a link incident on a flow -- terminating at the flow's valve
(link.to_uid == flow.uid) or at a stock/cloud the flow attaches to -- grazed
the pipe at the connection point with mismatched node names and was miscounted
as a real crossing.

The flow branch now names each pipe vertex to share the elem_{uid} name the
incident link/stock/cloud already uses: an attached point becomes
elem_{attached_to_uid}, and the valve (which lies on the pipe, not at a stored
point) is injected as an elem_{flow.uid} vertex by splitting the pipe segment
it sits on. A genuinely free interior point keeps the per-flow flow_{uid}#{i}
name, so a real mid-span crossing -- segments sharing no element with the flow
-- is still counted. Consecutive pipe sub-segments share the joining vertex
name, so a flow never self-crosses.

This fixes the dp_logistic_growth reference miscount (two links terminating at
the net birth rate valve reported 2 spurious crossings) that was distorting the
Phase 4 layout-quality calibration; the reference now scores crossings == 0.
---
 .../src/layout/crossings_tests.rs             | 198 ++++++++++++++++++
 src/simlin-engine/src/layout/mod.rs           | 117 ++++++++++-
 2 files changed, 306 insertions(+), 9 deletions(-)

diff --git a/src/simlin-engine/src/layout/crossings_tests.rs b/src/simlin-engine/src/layout/crossings_tests.rs
index 35c239399..a457bb83b 100644
--- a/src/simlin-engine/src/layout/crossings_tests.rs
+++ b/src/simlin-engine/src/layout/crossings_tests.rs
@@ -39,6 +39,75 @@ fn cv_link(uid: i32, from_uid: i32, to_uid: i32, shape: LinkShape) -> ViewElemen
     })
 }
 
+fn cv_stock(uid: i32, x: f64, y: f64) -> ViewElement {
+    ViewElement::Stock(view_element::Stock {
+        name: format!("s{uid}"),
+        uid,
+        x,
+        y,
+        label_side: LabelSide::Bottom,
+        compat: None,
+    })
+}
+
+fn cv_cloud(uid: i32, flow_uid: i32, x: f64, y: f64) -> ViewElement {
+    ViewElement::Cloud(view_element::Cloud {
+        uid,
+        flow_uid,
+        x,
+        y,
+        compat: None,
+    })
+}
+
+/// A horizontal flow whose valve sits at (`x`, `y`), with its source end
+/// attached to `from_uid` (a cloud or stock to the left) and its sink end
+/// attached to `to_uid` (a stock to the right). The valve lies on the pipe,
+/// mid-span between the two attached endpoints.
+fn cv_flow(uid: i32, x: f64, y: f64, from_uid: i32, to_uid: i32) -> ViewElement {
+    cv_flow_pts(
+        uid,
+        x,
+        y,
+        (x - 60.0, y, Some(from_uid)),
+        (x + 60.0, y, Some(to_uid)),
+    )
+}
+
+/// A two-point flow with the valve at (`x`, `y`) and explicitly positioned
+/// source/sink points, each carrying an optional `attached_to_uid`. Lets a
+/// test reproduce a real reference geometry where the valve does not sit at the
+/// midpoint of the two points.
+fn cv_flow_pts(
+    uid: i32,
+    x: f64,
+    y: f64,
+    from: (f64, f64, Option<i32>),
+    to: (f64, f64, Option<i32>),
+) -> ViewElement {
+    ViewElement::Flow(view_element::Flow {
+        name: format!("f{uid}"),
+        uid,
+        x,
+        y,
+        label_side: LabelSide::Top,
+        points: vec![
+            view_element::FlowPoint {
+                x: from.0,
+                y: from.1,
+                attached_to_uid: from.2,
+            },
+            view_element::FlowPoint {
+                x: to.0,
+                y: to.1,
+                attached_to_uid: to.2,
+            },
+        ],
+        compat: None,
+        label_compat: None,
+    })
+}
+
 fn cv_view(elements: Vec<ViewElement>) -> datamodel::StockFlow {
     datamodel::StockFlow {
         name: None,
@@ -219,3 +288,132 @@ fn test_count_view_crossings_module_incident_link_participates() {
         "a Module-incident link must participate in crossing detection"
     );
 }
+
+/// A link that TERMINATES at a flow's valve must not be counted as crossing the
+/// flow pipe at that shared connection point. This is the exact
+/// dp_logistic_growth reference geometry: the horizontal `net birth rate` flow
+/// (cloud -> valve -> Population stock) plus the `fractional growth rate ->
+/// net birth rate` link, whose drawn arc curves up to the valve from below and
+/// grazes the pipe at the connection point. The link's endpoint (`elem_2`, the
+/// flow's own element uid) and the pipe share the flow's element at the valve,
+/// so that graze is not a real crossing.
+#[test]
+fn test_count_view_crossings_link_to_flow_valve_no_crossing() {
+    let flow_uid = 2;
+    let view = cv_view(vec![
+        cv_stock(1, 602.4000244140625, 259.8000183105469),
+        cv_flow_pts(
+            flow_uid,
+            518.2726610523725,
+            258.60003662109375,
+            // source end attached to the cloud, sink end to the stock
+            (456.79998779296875, 258.60003662109375, Some(3)),
+            (579.9000244140625, 258.60003662109375, Some(1)),
+        ),
+        cv_cloud(3, flow_uid, 456.79998779296875, 258.60003662109375),
+        cv_aux(4, 498.0, 344.20001220703125),
+        // fractional growth rate -> net birth rate (to_uid == flow.uid): the
+        // drawn arc bulges up to graze the pipe at the valve connection point.
+        cv_link(10, 4, flow_uid, LinkShape::Arc(118.82198603295677)),
+    ]);
+
+    assert_eq!(
+        count_view_crossings(&view),
+        0,
+        "a link terminating at a flow valve must not count as crossing the pipe"
+    );
+}
+
+/// The flow-segment naming contract that the suppression relies on: a flow
+/// point attached to a stock/cloud names its pipe vertex `elem_{attached_uid}`
+/// (so a link incident on that stock/cloud, which uses the same name, is
+/// suppressed at the shared connection point), the valve is injected as an
+/// `elem_{flow.uid}` vertex on the pipe (so a link incident on the valve is
+/// suppressed there), and a free point keeps the per-flow `flow_{uid}#{i}`
+/// name (so a genuine mid-span crossing is still counted). This is the
+/// node-name contract; the end-to-end suppression is exercised by the valve and
+/// mid-span tests, since for an attached stock/cloud the link endpoint clips to
+/// the element boundary and only grazes the pipe through the shared vertex.
+#[test]
+fn test_build_view_segments_flow_vertex_naming() {
+    let flow_uid = 2;
+    let stock_uid = 1;
+    let cloud_uid = 3;
+    let view = cv_view(vec![
+        cv_stock(stock_uid, 602.4000244140625, 259.8000183105469),
+        cv_flow_pts(
+            flow_uid,
+            518.2726610523725,
+            258.60003662109375,
+            (456.79998779296875, 258.60003662109375, Some(cloud_uid)),
+            (579.9000244140625, 258.60003662109375, Some(stock_uid)),
+        ),
+        cv_cloud(cloud_uid, flow_uid, 456.79998779296875, 258.60003662109375),
+    ]);
+
+    let segs = build_view_segments(&view);
+    // The pipe splits at the valve into two sub-segments:
+    //   elem_3 (cloud) -> elem_2 (valve)  and  elem_2 (valve) -> elem_1 (stock)
+    let names: Vec<(String, String)> = segs
+        .iter()
+        .map(|s| (s.from_node.clone(), s.to_node.clone()))
+        .collect();
+    assert_eq!(
+        names,
+        vec![
+            ("elem_3".to_string(), "elem_2".to_string()),
+            ("elem_2".to_string(), "elem_1".to_string()),
+        ],
+        "flow pipe must name attached endpoints elem_<attached> and split at the valve as elem_<flow>"
+    );
+
+    // A free (unattached) interior point keeps the per-flow name.
+    let free_view = cv_view(vec![cv_flow_pts(
+        flow_uid,
+        518.2726610523725,
+        258.60003662109375,
+        (456.79998779296875, 258.60003662109375, None),
+        (579.9000244140625, 258.60003662109375, None),
+    )]);
+    let free_segs = build_view_segments(&free_view);
+    let free_names: Vec<(String, String)> = free_segs
+        .iter()
+        .map(|s| (s.from_node.clone(), s.to_node.clone()))
+        .collect();
+    assert_eq!(
+        free_names,
+        vec![
+            (format!("flow_{flow_uid}#0"), format!("elem_{flow_uid}")),
+            (format!("elem_{flow_uid}"), format!("flow_{flow_uid}#1")),
+        ],
+        "an unattached flow point keeps its per-flow name; only the valve is elem_<flow>"
+    );
+}
+
+/// A GENUINE mid-span crossing of a flow pipe -- a link that crosses the pipe
+/// away from any element the flow shares -- must STILL be counted. This guards
+/// against the valve/attachment suppression over-suppressing real crossings.
+#[test]
+fn test_count_view_crossings_link_crosses_flow_pipe_midspan_counted() {
+    // Flow valve at (100, 100), pipe from x=40 to x=160 at y=100. A straight
+    // link runs vertically through x=70 (between the cloud end and the valve,
+    // so it does NOT touch the valve, the cloud, or the stock), crossing the
+    // pipe once.
+    let flow_uid = 20;
+    let view = cv_view(vec![
+        cv_cloud(1, flow_uid, 40.0, 100.0),
+        cv_stock(2, 200.0, 100.0),
+        cv_aux(3, 70.0, 50.0),
+        cv_aux(4, 70.0, 150.0),
+        cv_flow(flow_uid, 100.0, 100.0, 1, 2),
+        // Link from a3 (above the pipe) to a4 (below the pipe), crossing the
+        // pipe at x=70 -- nowhere near the valve or either attached element.
+        cv_link(30, 3, 4, LinkShape::Straight),
+    ]);
+
+    assert_eq!(
+        count_view_crossings(&view),
+        1,
+        "a genuine mid-span crossing of the flow pipe must still be counted"
+    );
+}
diff --git a/src/simlin-engine/src/layout/mod.rs b/src/simlin-engine/src/layout/mod.rs
index 312687164..0ebfd3919 100644
--- a/src/simlin-engine/src/layout/mod.rs
+++ b/src/simlin-engine/src/layout/mod.rs
@@ -4358,6 +4358,38 @@ fn detect_chains(
     chains
 }
 
+/// Whether `p` lies on the segment from flow point `a` to flow point `b`,
+/// within a small pixel tolerance. Used to find the pipe segment a flow's valve
+/// sits on so the valve can be injected as a shared `elem_{flow.uid}` vertex.
+///
+/// The perpendicular distance from `p` to the line must be tiny, and `p` must
+/// project within the segment (parameter in `[0, 1]`). A degenerate segment
+/// (`a == b`) only matches when `p` coincides with it.
+fn point_on_segment(
+    p: Position,
+    a: &datamodel::view_element::FlowPoint,
+    b: &datamodel::view_element::FlowPoint,
+) -> bool {
+    const TOL: f64 = 0.5; // pixels
+    let a = Position::new(a.x, a.y);
+    let b = Position::new(b.x, b.y);
+    let ab = b - a;
+    let ap = p - a;
+    let len_sq = ab.dot(ab);
+    if len_sq < f64::EPSILON {
+        // Degenerate segment: only "on" it if p coincides with the point.
+        return ap.dot(ap) < TOL * TOL;
+    }
+    // Project p onto the line; require it to fall within the segment.
+    let t = ap.dot(ab) / len_sq;
+    if !(0.0..=1.0).contains(&t) {
+        return false;
+    }
+    // Perpendicular distance: |ap x ab| / |ab|.
+    let perp = ap.cross_2d(ab).abs() / len_sq.sqrt();
+    perp < TOL
+}
+
 /// Build the set of [`LineSegment`]s that crossing detection runs over for a
 /// completed StockFlow view. This is the single source of geometry shared by
 /// [`count_view_crossings`] and the layout quality metric, so a layout's
@@ -4377,8 +4409,19 @@ fn detect_chains(
 /// `elem_{to_uid}` (so two connectors sharing an element endpoint never count),
 /// while internal arc-sample vertices are `link_{link.uid}#{i}` (so the
 /// consecutive segments of one arc share an internal node name and never count
-/// as self-crossings). Flow pipe segments keep the historic `flow_{uid}#{i}`
-/// naming.
+/// as self-crossings).
+///
+/// A flow's pipe vertices share those same `elem_{uid}` names with whatever
+/// element they connect to, so a link incident on the flow grazes but does not
+/// "cross" the pipe at the shared connection point. A point attached to a
+/// stock/cloud is named `elem_{attached_to_uid}` (matching a link whose
+/// endpoint is that stock/cloud), and the flow's valve -- which sits on the
+/// pipe, not necessarily at a stored point -- is injected as an extra vertex
+/// named `elem_{flow.uid}` so a link incident on the valve (its `to_uid`/
+/// `from_uid` is the flow's own element uid) is suppressed there too. A
+/// genuinely free interior point (no attachment, not the valve) keeps the
+/// historic per-flow `flow_{uid}#{i}` name, so a link that crosses the pipe
+/// mid-span -- sharing no element with the flow -- is still counted.
 fn build_view_segments(view: &datamodel::StockFlow) -> Vec<LineSegment> {
     if view.elements.is_empty() {
         return Vec::new();
@@ -4445,13 +4488,69 @@ fn build_view_segments(view: &datamodel::StockFlow) -> Vec<LineSegment> {
                 }
             }
             ViewElement::Flow(flow) => {
-                for i in 0..flow.points.len().saturating_sub(1) {
-                    segments.push(LineSegment {
-                        start: Position::new(flow.points[i].x, flow.points[i].y),
-                        end: Position::new(flow.points[i + 1].x, flow.points[i + 1].y),
-                        from_node: format!("flow_{}#{}", flow.uid, i),
-                        to_node: format!("flow_{}#{}", flow.uid, i + 1),
-                    });
+                if flow.points.len() < 2 {
+                    continue;
+                }
+
+                // Build the pipe as a sequence of named vertices. A point
+                // attached to a stock/cloud shares that element's `elem_{uid}`
+                // name; a free interior point keeps a per-flow `flow_{uid}#{i}`
+                // name. The valve (the flow's own element, at `flow.x/flow.y`)
+                // is injected as an `elem_{flow.uid}` vertex on the pipe segment
+                // whose span contains it, so a link incident on the valve is
+                // suppressed at that shared connection point. Consecutive
+                // segments of one flow always share the joining vertex name, so
+                // a flow never self-crosses.
+                let point_name = |i: usize| -> String {
+                    match flow.points[i].attached_to_uid {
+                        Some(uid) => format!("elem_{uid}"),
+                        None => format!("flow_{}#{}", flow.uid, i),
+                    }
+                };
+
+                let valve = Position::new(flow.x, flow.y);
+                let valve_name = format!("elem_{}", flow.uid);
+                // The pipe segment the valve sits strictly interior to. `None`
+                // when the valve coincides with a stored point or (in a
+                // hand-edited view) drifted off the polyline; the pipe is then
+                // not split and the existing point names hold.
+                let valve_seg = (0..flow.points.len() - 1).find(|&i| {
+                    let a = Position::new(flow.points[i].x, flow.points[i].y);
+                    let b = Position::new(flow.points[i + 1].x, flow.points[i + 1].y);
+                    valve != a
+                        && valve != b
+                        && point_on_segment(valve, &flow.points[i], &flow.points[i + 1])
+                });
+
+                for i in 0..flow.points.len() - 1 {
+                    let a = Position::new(flow.points[i].x, flow.points[i].y);
+                    let b = Position::new(flow.points[i + 1].x, flow.points[i + 1].y);
+                    let a_name = point_name(i);
+                    let b_name = point_name(i + 1);
+
+                    if Some(i) == valve_seg {
+                        // Split this pipe segment at the valve so both halves
+                        // share the `elem_{flow.uid}` vertex.
+                        segments.push(LineSegment {
+                            start: a,
+                            end: valve,
+                            from_node: a_name,
+                            to_node: valve_name.clone(),
+                        });
+                        segments.push(LineSegment {
+                            start: valve,
+                            end: b,
+                            from_node: valve_name.clone(),
+                            to_node: b_name,
+                        });
+                    } else {
+                        segments.push(LineSegment {
+                            start: a,
+                            end: b,
+                            from_node: a_name,
+                            to_node: b_name,
+                        });
+                    }
                 }
             }
             _ => {}

From 6113910d77335fabb0ca1383b8f5796001f41091 Mon Sep 17 00:00:00 2001
From: Bobby Powers <bobbypowers@gmail.com>
Date: Sat, 23 May 2026 10:48:21 -0700
Subject: [PATCH 24/38] engine: compute loop_compactness term (isoperimetric
 loop quality)

loop_compactness was a hardcoded 0.0 reserved field. Compute and populate
it so the layout-quality eval sweep can report a per-model value. The
committed weight stays 0 (MetricWeights::default and weighted_cost are
unchanged); calibration happens in a later phase. The user wants to see
what the term reports before it influences any optimizer.

The measure is isoperimetric over the view's feedback cycles. Build a
directed graph over positioned node-box elements (aux/stock/flow/module/
cloud): each Link contributes from_uid -> to_uid, and each Flow contributes
source_attached -> flow.uid -> dest_attached through its own valve, so a
stock--flow--stock feedback path is part of the graph. Enumerate simple
directed cycles with a bounded DFS (each cycle's smallest uid is the search
root, so a cycle is discovered once), capped at 12 nodes per cycle and 64
cycles total -- SD diagrams are tiny, so this stays O(small) and total. For
each cycle of >= 3 distinct positioned nodes, take the node-box centers in
cycle order, compute the shoelace area and summed-edge perimeter, and form
the isoperimetric quotient Q = 4*PI*Area / Perimeter^2 clamped to [0,1]
(Q=1 a perfect circle, Q->0 collinear/degenerate). The per-cycle penalty is
1 - Q; loop_compactness is the mean penalty over all qualifying cycles, or
0.0 when there is no cycle of >= 3 nodes. It thus rewards clean, well-spread
loops and penalizes collapsed/collinear ones.

Centers come from the bare shape box (node_shape_box), which is symmetric
about the element position -- unlike the label-merged node_box, whose label
offset would skew the polygon. Determinism does not rely on layout seed
alone: adjacency targets are sorted, the search visits nodes in sorted uid
order, every cycle is canonicalized (rotated to its smallest uid) and
de-duplicated, and every per-cycle guard returns None rather than a NaN, so
the mean is identical for a given view regardless of element ordering.
---
 src/simlin-engine/src/layout/metrics.rs | 509 +++++++++++++++++++++++-
 1 file changed, 506 insertions(+), 3 deletions(-)

diff --git a/src/simlin-engine/src/layout/metrics.rs b/src/simlin-engine/src/layout/metrics.rs
index 00b96d5d7..f035f74ce 100644
--- a/src/simlin-engine/src/layout/metrics.rs
+++ b/src/simlin-engine/src/layout/metrics.rs
@@ -15,7 +15,7 @@
 // them. That makes every term trivially testable with hand-computed expected
 // values (see the inline tests below).
 
-use std::collections::HashSet;
+use std::collections::{BTreeMap, BTreeSet, HashSet};
 
 use crate::datamodel::{self, ViewElement};
 use crate::diagram::common::{
@@ -80,7 +80,11 @@ pub struct LayoutMetrics {
     pub aspect_penalty: f64,
     /// Reserved; computed in a future rung. Always 0.0, weight 0.
     pub chain_straightness: f64,
-    /// Reserved; computed in a future rung. Always 0.0, weight 0.
+    /// Mean isoperimetric penalty `1 - Q` over the view's feedback cycles
+    /// (`Q = 4*PI*Area / Perimeter^2` of each loop's node-center polygon,
+    /// clamped to [0,1]). 0.0 = clean, well-spread loops (circles); higher =
+    /// collapsed/collinear loops. 0.0 when the view has no cycle of >= 3 nodes.
+    /// Computed and reported now; weight stays 0 until Phase 4 calibration.
     pub loop_compactness: f64,
 }
 
@@ -310,6 +314,265 @@ fn collect_connector_geometry(view: &datamodel::StockFlow) -> Vec<ConnectorGeome
     out
 }
 
+// --- loop_compactness (isoperimetric feedback-loop quality) -----------------
+//
+// What it measures: how cleanly the view draws its feedback loops as visible
+// circles. For each simple directed cycle of >= 3 positioned nodes we take the
+// node-box centers in cycle order and form a polygon. Its isoperimetric
+// quotient Q = 4*PI*Area / Perimeter^2 is 1 for a perfect circle and tends to 0
+// as the polygon collapses toward a line (the area vanishes while the perimeter
+// stays large). The per-cycle penalty is `1 - Q` (0 = ideal clean loop, ~1 =
+// squished/collinear), and `loop_compactness` is the mean penalty over all
+// qualifying cycles (0.0 when the view has no cycle of >= 3 nodes). It thus
+// REWARDS well-spread loops and PENALIZES collapsed ones.
+//
+// Bounds (SD diagrams are small, so this stays O(small) and total): a simple
+// cycle is enumerated only up to `MAX_CYCLE_LEN` nodes, and at most
+// `MAX_CYCLES` cycles are scored; enumeration stops once the cap is hit. The
+// graph is built over positioned node-box elements (aux/stock/flow/module/cloud
+// -- the same set as `node_box`); links and flows supply the directed edges.
+//
+// Determinism: layout is deterministic per seed, but this term is additionally
+// independent of element ordering. Adjacency targets are sorted, the DFS starts
+// from each node in sorted uid order, and every enumerated cycle is canonicalized
+// (rotated so its smallest uid is first) and de-duplicated, so the mean is the
+// same regardless of how the elements are listed in the view.
+
+/// Maximum number of nodes in an enumerated simple cycle. SD feedback loops are
+/// short; a longer "cycle" is almost always an artifact of many overlapping
+/// smaller loops and is not worth the combinatorial cost.
+const MAX_CYCLE_LEN: usize = 12;
+
+/// Maximum number of distinct simple cycles scored. Bounds the work on dense
+/// graphs; the mean penalty over the first `MAX_CYCLES` cycles is a faithful
+/// proxy for the whole (SD diagrams rarely approach this).
+const MAX_CYCLES: usize = 64;
+
+/// Directed adjacency over positioned node-box elements, keyed by uid with
+/// sorted successor lists. The center of each node's bare *shape* box (which is
+/// symmetric about the element position, so it is the element center -- unlike
+/// the asymmetric label-merged `node_box`) is recorded for the polygon geometry.
+struct LoopGraph {
+    /// uid -> sorted, de-duplicated successor uids.
+    adj: BTreeMap<i32, Vec<i32>>,
+    /// uid -> node-box center point.
+    centers: BTreeMap<i32, Point>,
+}
+
+/// Build the directed loop graph from the view. Nodes are exactly the elements
+/// with a node box (`node_shape_box`). Edges to/from uids that are not
+/// positioned nodes are dropped. Edges come from:
+///   * each Link: `from_uid -> to_uid`;
+///   * each Flow: for consecutive attached points, `source_attached -> flow.uid`
+///     and `flow.uid -> dest_attached`, so a stock--flow--stock feedback path is
+///     part of the graph (the flow's own valve is the intermediate node).
+fn build_loop_graph(view: &datamodel::StockFlow) -> LoopGraph {
+    let mut centers: BTreeMap<i32, Point> = BTreeMap::new();
+    for e in &view.elements {
+        if let Some(r) = node_shape_box(e) {
+            centers.insert(
+                e.get_uid(),
+                Point {
+                    x: (r.left + r.right) / 2.0,
+                    y: (r.top + r.bottom) / 2.0,
+                },
+            );
+        }
+    }
+
+    // Collect edges into sorted sets per source so the adjacency is canonical
+    // (sorted, de-duplicated) and the cycle search is order-independent.
+    let mut edge_sets: BTreeMap<i32, BTreeSet<i32>> = BTreeMap::new();
+    let mut add_edge = |from: i32, to: i32, centers: &BTreeMap<i32, Point>| {
+        // Both endpoints must be positioned nodes, and we never record a
+        // self-loop (a single-node "cycle" forms no polygon).
+        if from != to && centers.contains_key(&from) && centers.contains_key(&to) {
+            edge_sets.entry(from).or_default().insert(to);
+        }
+    };
+
+    for e in &view.elements {
+        match e {
+            ViewElement::Link(link) => {
+                add_edge(link.from_uid, link.to_uid, &centers);
+            }
+            ViewElement::Flow(flow) => {
+                // Consecutive attached points define stock->flow and flow->stock
+                // edges through the flow's own valve uid.
+                let attached: Vec<i32> = flow
+                    .points
+                    .iter()
+                    .filter_map(|p| p.attached_to_uid)
+                    .collect();
+                for w in attached.windows(2) {
+                    add_edge(w[0], flow.uid, &centers);
+                    add_edge(flow.uid, w[1], &centers);
+                }
+            }
+            _ => {}
+        }
+    }
+
+    let adj: BTreeMap<i32, Vec<i32>> = edge_sets
+        .into_iter()
+        .map(|(k, set)| (k, set.into_iter().collect()))
+        .collect();
+    LoopGraph { adj, centers }
+}
+
+/// Enumerate simple directed cycles (each >= 2 nodes), bounded by
+/// `MAX_CYCLE_LEN` and `MAX_CYCLES`, canonicalized and de-duplicated so the same
+/// directed cycle is returned exactly once regardless of where the search
+/// started. A bounded DFS suffices: SD diagrams are tiny, and the caps keep it
+/// O(small) on the rare dense graph.
+///
+/// Each returned cycle is a `Vec<i32>` of uids in traversal order, rotated so
+/// its smallest uid is first (canonical form), and the set of returned cycles is
+/// itself sorted for a fully deterministic result.
+fn enumerate_simple_cycles(graph: &LoopGraph) -> Vec<Vec<i32>> {
+    let mut found: BTreeSet<Vec<i32>> = BTreeSet::new();
+    // Start a DFS from each node in sorted uid order. To avoid re-finding the
+    // same cycle from each of its members we still canonicalize+dedup, but we
+    // also restrict each search to cycles whose minimum node is the start node,
+    // which prunes the bulk of the duplicate work.
+    let starts: Vec<i32> = graph.adj.keys().copied().collect();
+    let mut path: Vec<i32> = Vec::new();
+    let mut on_path: HashSet<i32> = HashSet::new();
+    for &start in &starts {
+        path.clear();
+        on_path.clear();
+        dfs_cycles(graph, start, start, &mut path, &mut on_path, &mut found);
+        if found.len() >= MAX_CYCLES {
+            break;
+        }
+    }
+    found.into_iter().take(MAX_CYCLES).collect()
+}
+
+/// Depth-first walk that records every simple cycle returning to `start` and
+/// composed only of nodes whose uid is >= `start` (so each cycle is discovered
+/// from its smallest member). `path`/`on_path` track the current simple path.
+fn dfs_cycles(
+    graph: &LoopGraph,
+    start: i32,
+    current: i32,
+    path: &mut Vec<i32>,
+    on_path: &mut HashSet<i32>,
+    found: &mut BTreeSet<Vec<i32>>,
+) {
+    if found.len() >= MAX_CYCLES {
+        return;
+    }
+    path.push(current);
+    on_path.insert(current);
+
+    if let Some(succs) = graph.adj.get(&current) {
+        for &next in succs {
+            if next == start {
+                // Closed a cycle back to the start. Record it (>= 2 nodes by
+                // construction; self-loops were never added as edges).
+                if path.len() >= 2 {
+                    found.insert(canonicalize_cycle(path));
+                    if found.len() >= MAX_CYCLES {
+                        break;
+                    }
+                }
+                continue;
+            }
+            // Only extend through nodes strictly greater than the start (so the
+            // start is the minimum), not already on the path, within the length
+            // cap.
+            if next > start && !on_path.contains(&next) && path.len() < MAX_CYCLE_LEN {
+                dfs_cycles(graph, start, next, path, on_path, found);
+                if found.len() >= MAX_CYCLES {
+                    break;
+                }
+            }
+        }
+    }
+
+    on_path.remove(&current);
+    path.pop();
+}
+
+/// Rotate a cycle so its smallest uid is first, preserving traversal direction.
+/// The DFS already guarantees the start (= minimum) is element 0, but rotating
+/// defensively keeps the canonical form correct for any caller.
+fn canonicalize_cycle(cycle: &[i32]) -> Vec<i32> {
+    if cycle.is_empty() {
+        return Vec::new();
+    }
+    let min_idx = cycle
+        .iter()
+        .enumerate()
+        .min_by_key(|&(_, v)| *v)
+        .map(|(i, _)| i)
+        .unwrap_or(0);
+    let mut out = Vec::with_capacity(cycle.len());
+    for k in 0..cycle.len() {
+        out.push(cycle[(min_idx + k) % cycle.len()]);
+    }
+    out
+}
+
+/// Isoperimetric penalty `1 - Q` for one cycle's node-box centers, or `None` if
+/// the cycle does not qualify (fewer than 3 distinct positioned nodes, or a
+/// degenerate zero-perimeter polygon). `Q = 4*PI*Area / Perimeter^2` is clamped
+/// to [0, 1]; `Area` is the shoelace area (absolute value) and `Perimeter` the
+/// summed edge length over the closed polygon.
+fn cycle_penalty(cycle: &[i32], centers: &BTreeMap<i32, Point>) -> Option<f64> {
+    // Distinct positioned nodes only: a polygon needs >= 3 vertices.
+    let distinct: BTreeSet<i32> = cycle.iter().copied().collect();
+    if distinct.len() < 3 {
+        return None;
+    }
+    let pts: Vec<Point> = cycle
+        .iter()
+        .filter_map(|uid| centers.get(uid).copied())
+        .collect();
+    if pts.len() < 3 {
+        return None;
+    }
+
+    let n = pts.len();
+    let mut area2 = 0.0;
+    let mut perimeter = 0.0;
+    for i in 0..n {
+        let a = pts[i];
+        let b = pts[(i + 1) % n];
+        area2 += a.x * b.y - b.x * a.y;
+        let dx = b.x - a.x;
+        let dy = b.y - a.y;
+        perimeter += (dx * dx + dy * dy).sqrt();
+    }
+    if perimeter <= 0.0 {
+        // All centers coincide: no polygon. Guarded so the division below is
+        // never NaN; such a degenerate cycle simply does not contribute.
+        return None;
+    }
+    let area = area2.abs() / 2.0;
+    let q = (4.0 * std::f64::consts::PI * area / (perimeter * perimeter)).clamp(0.0, 1.0);
+    Some(1.0 - q)
+}
+
+/// `loop_compactness`: mean isoperimetric penalty `1 - Q` over the view's
+/// bounded simple directed cycles of >= 3 positioned nodes. 0.0 when there is no
+/// qualifying cycle. Deterministic for a given view regardless of element order
+/// (see the module comment above). PURE.
+fn compute_loop_compactness(view: &datamodel::StockFlow) -> f64 {
+    let graph = build_loop_graph(view);
+    let cycles = enumerate_simple_cycles(&graph);
+    let penalties: Vec<f64> = cycles
+        .iter()
+        .filter_map(|c| cycle_penalty(c, &graph.centers))
+        .collect();
+    if penalties.is_empty() {
+        0.0
+    } else {
+        penalties.iter().sum::<f64>() / penalties.len() as f64
+    }
+}
+
 /// Compute the layout quality metrics for a completed view.
 ///
 /// PURE: takes data, returns scalars, performs no I/O. The `_config` parameter
@@ -501,6 +764,9 @@ pub fn compute_layout_metrics(
         None => 0.0,
     };
 
+    // --- loop_compactness (isoperimetric feedback-loop quality) ---
+    let loop_compactness = compute_loop_compactness(view);
+
     LayoutMetrics {
         node_overlap,
         node_connector_overlap,
@@ -511,7 +777,7 @@ pub fn compute_layout_metrics(
         aspect_penalty,
         // reserved; computed in a future rung
         chain_straightness: 0.0,
-        loop_compactness: 0.0,
+        loop_compactness,
     }
 }
 
@@ -563,6 +829,41 @@ mod tests {
         })
     }
 
+    /// A flow valve at `(x, y)` with a two-point polyline whose endpoints attach
+    /// to `from_uid` and `to_uid` (a stock--flow--stock segment). The point
+    /// coordinates are irrelevant to `loop_compactness` (which uses node-box
+    /// centers, not flow points), so they are placed at the valve.
+    fn flow_between(
+        uid: i32,
+        name: &str,
+        x: f64,
+        y: f64,
+        from_uid: i32,
+        to_uid: i32,
+    ) -> ViewElement {
+        ViewElement::Flow(view_element::Flow {
+            name: name.to_string(),
+            uid,
+            x,
+            y,
+            label_side: LabelSide::Bottom,
+            points: vec![
+                view_element::FlowPoint {
+                    x,
+                    y,
+                    attached_to_uid: Some(from_uid),
+                },
+                view_element::FlowPoint {
+                    x,
+                    y,
+                    attached_to_uid: Some(to_uid),
+                },
+            ],
+            compat: None,
+            label_compat: None,
+        })
+    }
+
     fn make_view(elements: Vec<ViewElement>) -> datamodel::StockFlow {
         datamodel::StockFlow {
             name: None,
@@ -1235,4 +1536,206 @@ mod tests {
             );
         }
     }
+
+    // --- loop_compactness (isoperimetric loop quality) ---
+
+    /// The center of a node's bare shape box (which is symmetric about the
+    /// element position, so this is the element center). Mirrors the centers the
+    /// metric uses to build each loop polygon.
+    fn shape_center(e: &ViewElement) -> Point {
+        let r = node_shape_box(e).unwrap();
+        Point {
+            x: (r.left + r.right) / 2.0,
+            y: (r.top + r.bottom) / 2.0,
+        }
+    }
+
+    /// Hand-computed isoperimetric penalty `1 - Q` for a polygon over the given
+    /// centers in order (shoelace area, summed-edge perimeter, Q clamped to
+    /// [0,1]). The test's independent oracle for `loop_compactness`.
+    fn expected_loop_penalty(centers: &[Point]) -> f64 {
+        let n = centers.len();
+        let mut area2 = 0.0;
+        let mut perim = 0.0;
+        for i in 0..n {
+            let a = centers[i];
+            let b = centers[(i + 1) % n];
+            area2 += a.x * b.y - b.x * a.y;
+            let dx = b.x - a.x;
+            let dy = b.y - a.y;
+            perim += (dx * dx + dy * dy).sqrt();
+        }
+        let area = area2.abs() / 2.0;
+        let q = (4.0 * std::f64::consts::PI * area / (perim * perim)).clamp(0.0, 1.0);
+        1.0 - q
+    }
+
+    #[test]
+    fn test_loop_compactness_circle_loop_near_zero() {
+        // Eight stocks placed on a circle of radius 300, wired into a directed
+        // 8-cycle by links 1->2->...->8->1. A well-spread loop reads as a clean
+        // circle, so its isoperimetric quotient Q is close to 1 and the penalty
+        // (1 - Q) is small.
+        let n: i32 = 8;
+        let radius = 300.0;
+        let mut elements: Vec<ViewElement> = Vec::new();
+        let mut centers: Vec<Point> = Vec::new();
+        for i in 0..n {
+            let theta = 2.0 * std::f64::consts::PI * f64::from(i) / f64::from(n);
+            let x = radius * theta.cos();
+            let y = radius * theta.sin();
+            let e = stock(i + 1, "n", x, y);
+            centers.push(shape_center(&e));
+            elements.push(e);
+        }
+        for i in 0..n {
+            let from = i + 1;
+            let to = (i + 1) % n + 1;
+            elements.push(straight_link(100 + i, from, to));
+        }
+        let view = make_view(elements);
+        let m = compute_layout_metrics(&view, &cfg());
+
+        let expected = expected_loop_penalty(&centers);
+        assert!(
+            (m.loop_compactness - expected).abs() < 1e-9,
+            "loop_compactness {} != hand-computed penalty {}",
+            m.loop_compactness,
+            expected
+        );
+        // A regular octagon's penalty is ~0.05 -- "near 0" (a clean circle).
+        assert!(
+            m.loop_compactness < 0.1,
+            "a well-spread circular loop should score near 0, got {}",
+            m.loop_compactness
+        );
+    }
+
+    #[test]
+    fn test_loop_compactness_collapsed_loop_higher() {
+        // The SAME directed 8-cycle, but the nodes are squished onto a nearly
+        // straight line (a collapsed/collinear loop). The polygon area shrinks
+        // toward zero while the perimeter stays large, so Q -> 0 and the penalty
+        // (1 - Q) -> 1: clearly higher than the circular placement.
+        let n: i32 = 8;
+        let mut elements: Vec<ViewElement> = Vec::new();
+        let mut centers: Vec<Point> = Vec::new();
+        for i in 0..n {
+            // Spread along x, with a tiny alternating y wobble so the polygon is
+            // non-degenerate (nonzero perimeter) but nearly collinear.
+            let x = f64::from(i) * 100.0;
+            let y = if i % 2 == 0 { 0.0 } else { 1.0 };
+            let e = stock(i + 1, "n", x, y);
+            centers.push(shape_center(&e));
+            elements.push(e);
+        }
+        for i in 0..n {
+            let from = i + 1;
+            let to = (i + 1) % n + 1;
+            elements.push(straight_link(100 + i, from, to));
+        }
+        let view = make_view(elements);
+        let m = compute_layout_metrics(&view, &cfg());
+
+        let expected = expected_loop_penalty(&centers);
+        assert!(
+            (m.loop_compactness - expected).abs() < 1e-9,
+            "loop_compactness {} != hand-computed penalty {}",
+            m.loop_compactness,
+            expected
+        );
+        // A nearly-collinear loop scores near 1 (squished).
+        assert!(
+            m.loop_compactness > 0.9,
+            "a collapsed/collinear loop should score near 1, got {}",
+            m.loop_compactness
+        );
+    }
+
+    #[test]
+    fn test_loop_compactness_no_cycle_is_zero() {
+        // A pure chain a -> b -> c (no feedback) has no directed cycle, so there
+        // is nothing to score: loop_compactness == 0.0.
+        let view = make_view(vec![
+            aux(1, "a", 0.0, 0.0),
+            aux(2, "b", 200.0, 0.0),
+            aux(3, "c", 400.0, 0.0),
+            straight_link(10, 1, 2),
+            straight_link(11, 2, 3),
+        ]);
+        let m = compute_layout_metrics(&view, &cfg());
+        assert_eq!(m.loop_compactness, 0.0);
+    }
+
+    #[test]
+    fn test_loop_compactness_two_node_mutual_pair_is_zero() {
+        // A 2-node mutual pair (a -> b -> a) is a cycle, but two points form no
+        // polygon (fewer than 3 distinct nodes), so it contributes nothing.
+        let view = make_view(vec![
+            aux(1, "a", 0.0, 0.0),
+            aux(2, "b", 200.0, 0.0),
+            straight_link(10, 1, 2),
+            straight_link(11, 2, 1),
+        ]);
+        let m = compute_layout_metrics(&view, &cfg());
+        assert_eq!(m.loop_compactness, 0.0);
+    }
+
+    #[test]
+    fn test_loop_compactness_flow_feedback_path_is_a_cycle() {
+        // A stock--flow--stock feedback path must enter the loop graph: stock #1
+        // and stock #2 connected by flow #3 (so #1 -> #3 -> #2), plus a link
+        // #2 -> #1 closing the loop. The cycle is {#1, #3, #2}: three distinct
+        // positioned nodes -> a real polygon -> a positive penalty.
+        let s1 = stock(1, "a", 0.0, 0.0);
+        let s2 = stock(2, "b", 300.0, 0.0);
+        let f = flow_between(3, "f", 150.0, 200.0, 1, 2);
+        let link = straight_link(10, 2, 1);
+        let view = make_view(vec![s1, s2, f, link]);
+        let m = compute_layout_metrics(&view, &cfg());
+        assert!(
+            m.loop_compactness > 0.0,
+            "a stock--flow--stock feedback path must form a scored loop, got {}",
+            m.loop_compactness
+        );
+    }
+
+    #[test]
+    fn test_loop_compactness_deterministic_under_shuffle() {
+        // loop_compactness is a mean over cycles, each computed from node-box
+        // centers in cycle order. It must be invariant to the order elements
+        // appear in the view's element list.
+        let n: i32 = 6;
+        let radius = 250.0;
+        let mut elements: Vec<ViewElement> = Vec::new();
+        for i in 0..n {
+            let theta = 2.0 * std::f64::consts::PI * f64::from(i) / f64::from(n);
+            elements.push(stock(
+                i + 1,
+                "n",
+                radius * theta.cos(),
+                radius * theta.sin(),
+            ));
+        }
+        for i in 0..n {
+            let from = i + 1;
+            let to = (i + 1) % n + 1;
+            elements.push(straight_link(100 + i, from, to));
+        }
+        let base = compute_layout_metrics(&make_view(elements.clone()), &cfg());
+
+        // Reverse the element order (links before nodes, nodes reversed); the
+        // graph and its cycles are unchanged.
+        let mut shuffled = elements.clone();
+        shuffled.reverse();
+        let other = compute_layout_metrics(&make_view(shuffled), &cfg());
+
+        assert!(
+            (base.loop_compactness - other.loop_compactness).abs() < 1e-12,
+            "loop_compactness changed under element shuffle: {} vs {}",
+            base.loop_compactness,
+            other.loop_compactness
+        );
+        assert!(base.loop_compactness > 0.0);
+    }
 }

From 71afe1ba6f71311418152e49b157a1c6152f1993 Mon Sep 17 00:00:00 2001
From: Bobby Powers <bobbypowers@gmail.com>
Date: Sat, 23 May 2026 11:03:33 -0700
Subject: [PATCH 25/38] engine: make label_overlap measure per-label
 obscuration (small-collision sensitive)

The old label_overlap divided the corpus-wide sum of label-vs-label and
label-vs-shape overlap areas by the corpus's total label area. That
under-counted a small-but-readability-killing overlap: a node circle
clipping the last couple of characters of a short label has a small
absolute overlap area, and dividing it by the total label area of every
label in the diagram diluted it to near zero (~0.004 on the eval corpus),
so the term could not distinguish "one label is 30% obscured" from "nothing
is obscured."

Redefine label_overlap as a per-label obscuration measure: for each labeled
element L with box B_L, approximate the area of B_L covered by any OTHER
label box or any OTHER element's bare shape box as the sum of the individual
overlap areas, capped at area(B_L) so the obscured fraction is in [0,1];
label_overlap is the SUM of those fractions over all labels. A small clip
now registers at its true obscuration fraction (~0.3 for the fixture above)
instead of ~0.004. The capped sum is a monotone proxy for the exact covered
area -- a pixel-exact union is unnecessary, and more/larger overlaps never
decrease a label's fraction.

This preserves the Phase-1 guards: a label is never charged against its own
element's shape box, and the comparison uses the label-free shape box
(node_shape_box), not the label-merged box, so label-vs-label coverage is
not re-counted via another node's merged bounds. A mutual label-label
collision is charged from both labels' perspectives -- intended, since both
are unreadable. area(B_L) == 0 is guarded (skip), so the term stays finite.

MetricWeights/weighted_cost are unchanged (calibration is a later phase). On
the eval corpus the within-model worse-is-higher anchor is preserved (best
<= median <= worst label_overlap for every model) while the magnitudes rise
to reflect true per-label obscuration: e.g. reliability best 0.127 -> 0.937,
fishbanks best 0.019 -> 0.312.
---
 src/simlin-engine/src/layout/metrics.rs | 234 ++++++++++++++++++------
 1 file changed, 179 insertions(+), 55 deletions(-)

diff --git a/src/simlin-engine/src/layout/metrics.rs b/src/simlin-engine/src/layout/metrics.rs
index f035f74ce..b840ce6c6 100644
--- a/src/simlin-engine/src/layout/metrics.rs
+++ b/src/simlin-engine/src/layout/metrics.rs
@@ -67,8 +67,12 @@ pub struct LayoutMetrics {
     /// a false causal connection; a connector under only a label is not
     /// charged here.
     pub node_connector_overlap: f64,
-    /// Sum of label-vs-label and label-vs-node overlap area, normalized by
-    /// total label area.
+    /// Sum over labeled elements of each label's *obscured fraction*: the area
+    /// of the label box covered by any other label box or any other element's
+    /// bare shape box, capped at the label's own area and divided by it (so each
+    /// term is in [0,1]). 0 = no label obscured. Per-label so a small overlap
+    /// registers at its true obscuration fraction rather than being diluted by
+    /// the corpus's total label area.
     pub label_overlap: f64,
     /// Edge crossings normalized by connector count.
     pub crossings: f64,
@@ -648,22 +652,36 @@ pub fn compute_layout_metrics(
         0.0
     };
 
-    // --- label_overlap ---
-    // Each label box is tagged with its owning element's uid so the
-    // label-vs-node sum can skip that element's own node box: a label is, by
-    // construction, adjacent to (and inside the merged bounds of) its own
-    // element, so charging it against its own box would always add exactly the
-    // label's area -- a constant that is not a real collision.
+    // --- label_overlap (per-label obscuration) ---
     //
-    // The label-vs-node sum compares each label against every OTHER element's
-    // bare *shape* box (`node_shape_box`), NOT its label-merged `node_box`.
-    // `aux_bounds`/`stock_bounds`/`flow_bounds` union each element's own label
-    // into the box they return, so comparing a label against another node's
-    // MERGED box would re-count a label-vs-label overlap that the label-vs-label
-    // term above already counts -- a double-count that inflates the term's
-    // magnitude (which Phase 4 calibrates against). Using the label-free shape
-    // cleanly separates "label lands on another label" from "label lands on
-    // another node's shape".
+    // For each labeled element L, measure how much of its label box B_L is
+    // covered (obscured) by OTHER drawn geometry, then SUM each label's obscured
+    // fraction. This is per-label rather than a single corpus-wide ratio: a
+    // small-but-readability-killing overlap (e.g. a node circle clipping the last
+    // two characters of a short label) registers at its true obscuration
+    // fraction instead of being diluted to ~0 by the corpus's total label area
+    // (the prior `sum_of_overlaps / total_label_area` definition under-counted
+    // exactly this case).
+    //
+    // The coverers of B_L are (a) any OTHER label box and (b) any OTHER element's
+    // bare *shape* box (`node_shape_box`, NOT the label-merged `node_box`):
+    //   * A label is never charged against its OWN element's shape box. By
+    //     construction a label sits adjacent to (and within the merged bounds of)
+    //     its own element, so charging it there would always add a constant that
+    //     is not a real collision.
+    //   * Comparing against the bare shape box (not the label-merged box) keeps
+    //     "label lands on another label" and "label lands on another node's
+    //     shape" cleanly separate -- the merged box unions that node's own label,
+    //     which would re-count the label-vs-label coverage already captured by
+    //     the label-box term.
+    //
+    // A pixel-exact union of all coverers is unnecessary: the covered area is
+    // approximated by the SUM of individual overlap areas, capped at area(B_L) so
+    // a label's obscured fraction stays in [0,1] even when coverers overlap each
+    // other. This is a monotone proxy (more/larger overlaps never decrease the
+    // fraction). A mutual label-label collision is charged from BOTH labels'
+    // perspectives -- intended, since both are unreadable. Guards area(B_L) == 0
+    // (degenerate label) by skipping it, so the term is always finite.
     let label_boxes: Vec<(i32, Rect)> = view
         .elements
         .iter()
@@ -671,28 +689,32 @@ pub fn compute_layout_metrics(
         .collect();
     // `node_shape_boxes` is computed once above (shared with node_overlap and
     // node_connector_overlap).
-    let total_label_area: f64 = label_boxes.iter().map(|(_, r)| rect_area(r)).sum();
-    let label_overlap = if total_label_area > 0.0 {
-        let mut overlap = 0.0;
-        // label-vs-label (each unordered pair once)
-        for i in 0..label_boxes.len() {
-            for j in (i + 1)..label_boxes.len() {
-                overlap += rect_overlap_area(&label_boxes[i].1, &label_boxes[j].1);
+    let mut label_overlap = 0.0;
+    for (lbl_uid, lbl) in &label_boxes {
+        let lbl_area = rect_area(lbl);
+        if lbl_area <= 0.0 {
+            continue; // degenerate label box: no NaN, contributes nothing
+        }
+        let mut covered = 0.0;
+        // Covered by every OTHER label box.
+        for (other_uid, other) in &label_boxes {
+            if other_uid == lbl_uid {
+                continue;
             }
+            covered += rect_overlap_area(lbl, other);
         }
-        // label-vs-node, against the OTHER element's bare shape box.
-        for (lbl_uid, lbl) in &label_boxes {
-            for (node_uid, node) in &node_shape_boxes {
-                if lbl_uid == node_uid {
-                    continue;
-                }
-                overlap += rect_overlap_area(lbl, node);
+        // Covered by every OTHER element's bare shape box.
+        for (node_uid, node) in &node_shape_boxes {
+            if node_uid == lbl_uid {
+                continue;
             }
+            covered += rect_overlap_area(lbl, node);
         }
-        overlap / total_label_area
-    } else {
-        0.0
-    };
+        // Cap the (possibly over-counted) covered area at the label's own area
+        // so the obscured fraction is in [0,1].
+        let obscured_fraction = (covered.min(lbl_area)) / lbl_area;
+        label_overlap += obscured_fraction;
+    }
 
     // --- crossings ---
     let connector_count = connectors.len();
@@ -819,6 +841,19 @@ mod tests {
         })
     }
 
+    /// A cloud at `(x, y)`. A cloud is a positioned node with a bare shape box
+    /// (`cloud_bounds`, a 27x27 square: CLOUD_RADIUS = 13.5) and NO rendered
+    /// label, so it is the cleanest "obscuring shape" fixture for label_overlap.
+    fn cloud(uid: i32, x: f64, y: f64) -> ViewElement {
+        ViewElement::Cloud(view_element::Cloud {
+            uid,
+            flow_uid: -1,
+            x,
+            y,
+            compat: None,
+        })
+    }
+
     fn straight_link(uid: i32, from_uid: i32, to_uid: i32) -> ViewElement {
         ViewElement::Link(view_element::Link {
             uid,
@@ -1167,25 +1202,37 @@ mod tests {
         );
     }
 
-    // --- AC1.4: label_overlap ---
+    // --- AC1.4: label_overlap (per-label obscuration) ---
+    //
+    // label_overlap is the SUM over labeled elements of each label's obscured
+    // fraction: the area of the label box covered by any OTHER label box or any
+    // OTHER element's bare shape box, capped at the label's own area and divided
+    // by it (so each term is in [0,1]). 0 = no label obscured. A small overlap
+    // registers at its true per-label obscuration fraction rather than being
+    // diluted by the corpus's total label area (the old area/total definition's
+    // under-counting; see `test_label_overlap_small_clip_is_sensitive`).
 
     #[test]
     fn test_label_overlap_overlapping_labels() {
-        // Two auxes at the same position -> their labels (Bottom) coincide.
+        // Two auxes at the same position -> their labels (Bottom) coincide
+        // exactly. Each label is fully covered by the other (capped at its own
+        // area), so each obscured fraction is 1.0 and the sum is 2.0.
         let view = make_view(vec![
             aux(1, "samename", 100.0, 100.0),
             aux(2, "samename", 100.0, 100.0),
         ]);
         let m = compute_layout_metrics(&view, &cfg());
         assert!(
-            m.label_overlap > 0.0,
-            "coincident labels must produce positive label_overlap"
+            (m.label_overlap - 2.0).abs() < 1e-9,
+            "two coincident labels are each fully obscured: expected 2.0, got {}",
+            m.label_overlap
         );
     }
 
     #[test]
     fn test_label_overlap_disjoint_is_zero() {
-        // Two auxes far apart -> labels and node boxes are all disjoint.
+        // Two auxes far apart -> no label is covered by anything. Sum of
+        // obscured fractions is 0.0.
         let view = make_view(vec![aux(1, "a", 0.0, 0.0), aux(2, "b", 1000.0, 1000.0)]);
         let m = compute_layout_metrics(&view, &cfg());
         assert_eq!(m.label_overlap, 0.0);
@@ -1193,12 +1240,11 @@ mod tests {
 
     #[test]
     fn test_label_overlap_counts_label_pair_exactly_once() {
-        // Regression for the label_overlap double-count: the label-vs-node term
-        // must compare each label against the OTHER element's bare *shape* box,
-        // not its label-merged `*_bounds` box. Otherwise a label-vs-label
-        // overlap is also counted (once or twice more) inside the other node's
-        // merged box, inflating the magnitude (and the Phase 4 weight it
-        // calibrates).
+        // The Phase-1 double-count guard, restated for per-label obscuration: a
+        // label is never charged against its OWN element's shape box, and a
+        // label-vs-label collision is counted from each label's own perspective
+        // (both labels are unreadable -- that is intended), not via the other
+        // node's label-merged bounds.
         //
         // Fixture: two `LabelSide::Bottom` auxes named "samename" (8 chars).
         //   AUX_RADIUS = 9; label editor width = 8*6 + 10 = 58, height = 14.
@@ -1209,28 +1255,106 @@ mod tests {
         //   aux1 @ (0,0): shape [-9,9]x[-9,9],  label [-29,29]x[13,27]
         //   aux2 @ (40,0): shape [31,49]x[-9,9], label [11,69]x[13,27]
         //
-        // SHAPE boxes do NOT overlap (9 < 31). LABELS overlap by
-        //   x: [11,29] = 18, y: [13,27] = 14  ->  18*14 = 252.
-        // Each label clears the OTHER aux's bare shape box entirely, so the only
-        // contribution is the single label-vs-label pair: total overlap = 252.
+        // SHAPE boxes do NOT overlap (9 < 31), and each label clears the OTHER
+        // aux's bare shape box entirely (label y [13,27] vs shape y [-9,9]). The
+        // LABELS overlap by x:[11,29]=18, y:[13,27]=14 -> 252. Each label box has
+        // area 58*14 = 812 and is covered only by the other label (252 < 812, no
+        // cap), so each obscured fraction is 252/812 and the sum is 504/812.
         let view = make_view(vec![
             aux(1, "samename", 0.0, 0.0),
             aux(2, "samename", 40.0, 0.0),
         ]);
         let m = compute_layout_metrics(&view, &cfg());
 
-        // Total label area = 2 * (58 * 14) = 1624.
-        let expected_overlap = 18.0 * 14.0; // 252.0, counted exactly once
-        let total_label_area = 2.0 * (58.0 * 14.0); // 1624.0
-        let expected = expected_overlap / total_label_area;
+        let label_area = 58.0 * 14.0; // 812.0
+        let overlap = 18.0 * 14.0; // 252.0, the single label-label intersection
+        let expected = (overlap / label_area) + (overlap / label_area); // 504/812
         assert!(
             (m.label_overlap - expected).abs() < 1e-9,
-            "label_overlap should count the label pair exactly once: got {} expected {}",
+            "per-label obscuration should sum each label's fraction once: got {} expected {}",
             m.label_overlap,
             expected
         );
     }
 
+    #[test]
+    fn test_label_overlap_never_charged_against_own_shape() {
+        // A single labeled aux: its Bottom label sits adjacent to (and partly
+        // within the merged bounds of) its OWN shape. A label is never charged
+        // against its own element's shape, and there is no other element, so the
+        // obscured fraction is 0 and label_overlap is exactly 0.0.
+        let view = make_view(vec![aux(1, "samename", 0.0, 0.0)]);
+        let m = compute_layout_metrics(&view, &cfg());
+        assert_eq!(
+            m.label_overlap, 0.0,
+            "a label must never be charged against its own element's shape box"
+        );
+    }
+
+    #[test]
+    fn test_label_overlap_small_clip_is_sensitive() {
+        // A small node SHAPE clipping a few characters of a short label must
+        // register at its true per-label obscuration fraction, NOT be diluted to
+        // ~0 by the corpus's total label area (the old area/total under-count).
+        //
+        // L: aux "ab" (2 chars) @ (0,0), Bottom label.
+        //   editor_width = 2*6 + 10 = 22, height 14 -> label area 308.
+        //   label box: left -11, right 11, top 13, bottom 27.
+        // O: a cloud (no label) @ (18, 20). cloud_bounds (CLOUD_RADIUS 13.5):
+        //   x [4.5, 31.5], y [6.5, 33.5].
+        //   Overlap with L's label: x [4.5,11]=6.5, y [13,27]=14 -> 91.
+        //   obscured_fraction(L) = 91/308 ~= 0.2955; the cloud has no label, so
+        //   the sum is exactly 91/308.
+        // Plus 15 far-apart auxes with long (20-char) labels: each label area
+        //   20*6+10 = 130 wide * 14 = 1820, none overlapping anything. They add
+        //   nothing to the per-label SUM (obscured fraction 0 each) but bloat the
+        //   OLD denominator (total label area), so the OLD area/total score for
+        //   the same clip collapses to ~0.003 -- the under-count this fixes.
+        let mut elements = vec![aux(1, "ab", 0.0, 0.0), cloud(2, 18.0, 20.0)];
+        for k in 0..15 {
+            // Far apart on a 1000px grid so nothing overlaps; 20-char names.
+            elements.push(aux(
+                100 + k,
+                "abcdefghijklmnopqrst",
+                3000.0 + f64::from(k) * 1000.0,
+                3000.0,
+            ));
+        }
+        let view = make_view(elements);
+        let m = compute_layout_metrics(&view, &cfg());
+
+        let label_area = 22.0 * 14.0; // 308.0
+        let clip_area = 6.5 * 14.0; // 91.0
+        let expected = clip_area / label_area; // ~0.2955
+        assert!(
+            (m.label_overlap - expected).abs() < 1e-9,
+            "small clip must score its per-label obscuration fraction: got {} expected {}",
+            m.label_overlap,
+            expected
+        );
+        assert!(
+            m.label_overlap > 0.1,
+            "a readability-killing clip must register clearly (> 0.1), got {}",
+            m.label_overlap
+        );
+
+        // Confirm the OLD area/total definition would have under-counted this to
+        // near-zero: the same clip area divided by the corpus total label area.
+        let total_label_area = label_area + 15.0 * (130.0 * 14.0); // 308 + 27300
+        let old_score = clip_area / total_label_area; // ~0.0033
+        assert!(
+            old_score < 0.01,
+            "fixture must demonstrate the old under-count (< 0.01), got {}",
+            old_score
+        );
+        assert!(
+            m.label_overlap > old_score * 50.0,
+            "new per-label score {} must be far larger than the old {}",
+            m.label_overlap,
+            old_score
+        );
+    }
+
     // --- AC1.5: aspect_penalty ---
 
     #[test]

From 386b701ac59d8392a8adcaa8a80cba4eefb46782 Mon Sep 17 00:00:00 2001
From: Bobby Powers <bobbypowers@gmail.com>
Date: Sat, 23 May 2026 11:18:04 -0700
Subject: [PATCH 26/38] engine: commit calibrated default MetricWeights
 (readability-dominant)

Replace the Phase-1 all-zeros placeholder Default impl for MetricWeights
with the calibrated production weights from the Phase 3 contact-sheet
calibration, with explicit user sign-off (2026-05-23):

  node_overlap            = 1.0
  node_connector_overlap  = 1.0
  label_overlap           = 1.0
  crossings               = 1.0
  sprawl                  = 0.0
  edge_length_cv          = 0.0
  aspect_penalty          = 0.0
  loop_compactness        = 0.25
  chain_straightness      = 0.0

Failure-mode rationale -- readability >> compactness. The dominant
concerns (node-shape overlap, connectors under node shapes, obscured
labels, edge crossings) carry weight 1: they make a diagram unreadable
or assert false causal connections. sprawl, edge_length_cv, and
aspect_penalty are intentionally 0 -- compactness and aspect ratio are
NOT goals; spreading out to keep labels legible and feedback loops
visible is good, not penalized. loop_compactness is a low 0.25 that
gently rewards drawing feedback loops as visible circles without
dominating the overlap/crossings family. chain_straightness stays 0
(reserved, not yet computed).

The old AC5.1 test asserted the default was all-zeros so weighted_cost
was inert; that is no longer true by design. It is replaced by a
dominance-ordering test that pins the readability-dominant relationships
(each overlap/crossings term strictly exceeds each compactness term;
sprawl/edge_length_cv/aspect_penalty/chain_straightness == 0;
0 < loop_compactness < node_overlap) and re-confirms weighted_cost under
the default equals the explicit sum of weighted terms.
---
 src/simlin-engine/src/layout/metrics.rs | 143 ++++++++++++++++++++----
 1 file changed, 119 insertions(+), 24 deletions(-)

diff --git a/src/simlin-engine/src/layout/metrics.rs b/src/simlin-engine/src/layout/metrics.rs
index b840ce6c6..9c582011f 100644
--- a/src/simlin-engine/src/layout/metrics.rs
+++ b/src/simlin-engine/src/layout/metrics.rs
@@ -94,10 +94,8 @@ pub struct LayoutMetrics {
 
 /// Per-term weights for the scalar an optimizer minimizes.
 ///
-/// The calibrated production weights (and the failure-mode priority ordering)
-/// are committed in Phase 4. Until then `MetricWeights::default()` is all-zeros
-/// (see below) so any accidental use of `weighted_cost` before calibration is
-/// obviously inert rather than silently wrong.
+/// `MetricWeights::default()` holds the calibrated production weights committed
+/// in Phase 4 (see the failure-mode rationale on the `Default` impl below).
 ///
 /// `Serialize`/`Deserialize` let the layout-quality eval sweep
 /// (`examples/layout_eval.rs`) record the weight set it used in its
@@ -118,21 +116,36 @@ pub struct MetricWeights {
 }
 
 impl Default for MetricWeights {
-    /// All-zeros: calibrated in Phase 4. An all-zero weight set makes
-    /// `weighted_cost` return 0.0 regardless of the metrics, so using the
-    /// default before calibration is inert (cannot mislead an optimizer) rather
-    /// than applying made-up weights.
+    /// The calibrated production weights, from the Phase 3 contact-sheet
+    /// calibration with explicit user sign-off (2026-05-23).
+    ///
+    /// Failure-mode rationale -- readability >> compactness:
+    ///   * The dominant concerns all carry weight 1.0: node-shape overlap
+    ///     (`node_overlap`), connectors passing under node shapes
+    ///     (`node_connector_overlap`), obscured labels (`label_overlap`), and
+    ///     edge `crossings`. These are the things that make a diagram unreadable
+    ///     or assert false causal connections, so they dominate the cost.
+    ///   * `sprawl`, `edge_length_cv`, and `aspect_penalty` are intentionally
+    ///     0.0: compactness and aspect ratio are NOT goals. Spreading nodes out
+    ///     to keep labels legible and feedback loops visible is GOOD, not
+    ///     something to penalize, so these terms must not pull against
+    ///     readability.
+    ///   * `loop_compactness` is a low 0.25: it gently REWARDS drawing feedback
+    ///     loops as visible circles (a readability aid), but must never dominate
+    ///     the overlap/crossings family, so it stays well below 1.0.
+    ///   * `chain_straightness` stays 0.0: it is reserved (not yet computed), so
+    ///     it carries no weight.
     fn default() -> Self {
         MetricWeights {
-            node_overlap: 0.0,
-            node_connector_overlap: 0.0,
-            label_overlap: 0.0,
-            crossings: 0.0,
+            node_overlap: 1.0,
+            node_connector_overlap: 1.0,
+            label_overlap: 1.0,
+            crossings: 1.0,
             sprawl: 0.0,
             edge_length_cv: 0.0,
             aspect_penalty: 0.0,
             chain_straightness: 0.0,
-            loop_compactness: 0.0,
+            loop_compactness: 0.25,
         }
     }
 }
@@ -1449,20 +1462,102 @@ mod tests {
         assert!((m.weighted_cost(&w) - expected).abs() < 1e-9);
     }
 
+    // --- AC5.1: the committed calibrated default expresses readability dominance ---
+    //
+    // The Phase-1 placeholder default was all-zeros (so a pre-calibration
+    // `weighted_cost` was inert). Phase 4 commits real, user-signed-off weights
+    // (2026-05-23), so the default is no longer all-zeros and `weighted_cost`
+    // under it is now meaningful. This test pins the DOMINANCE ORDERING the
+    // committed weights encode -- relationships rather than magic numbers, so it
+    // documents the intent and survives minor retuning -- and re-confirms that
+    // `weighted_cost` applies the default exactly as Σ wᵢ·termᵢ. It replaces the
+    // old "default is all-zeros so cost is inert" assertion, which is no longer
+    // true by design.
+
     #[test]
-    fn test_default_weights_are_all_zero_so_cost_is_inert() {
+    fn test_default_weights_readability_dominant_ordering() {
+        let w = MetricWeights::default();
+
+        // The dominant "overlap + crossings" family: each term that hurts
+        // readability (shapes overlapping shapes, connectors under shapes, labels
+        // obscured, edges crossing) must outweigh every compactness/aspect term.
+        let dominant = [
+            w.node_overlap,
+            w.node_connector_overlap,
+            w.label_overlap,
+            w.crossings,
+        ];
+        let compactness = [w.sprawl, w.edge_length_cv, w.aspect_penalty];
+        for &d in &dominant {
+            for &c in &compactness {
+                assert!(
+                    d > c,
+                    "every readability term ({d}) must strictly exceed every \
+                     compactness/aspect term ({c})"
+                );
+            }
+        }
+
+        // Compactness/aspect are intentionally zero: spreading out to keep labels
+        // legible and feedback loops visible is good, not penalized.
+        assert_eq!(w.sprawl, 0.0, "sprawl is not a goal");
+        assert_eq!(
+            w.edge_length_cv, 0.0,
+            "edge-length uniformity is not a goal"
+        );
+        assert_eq!(w.aspect_penalty, 0.0, "aspect ratio is not a goal");
+
+        // chain_straightness is reserved (not yet computed), so it carries no
+        // weight.
+        assert_eq!(
+            w.chain_straightness, 0.0,
+            "chain_straightness is reserved and must stay zero"
+        );
+
+        // loop_compactness rewards visible feedback-loop circles, but only as a
+        // gentle nudge: a low, non-dominant weight strictly between zero and the
+        // dominant family.
+        assert!(
+            w.loop_compactness > 0.0,
+            "loop_compactness should gently reward visible loops, got {}",
+            w.loop_compactness
+        );
+        assert!(
+            w.loop_compactness < w.node_overlap,
+            "loop_compactness ({}) must stay below the dominant node_overlap ({})",
+            w.loop_compactness,
+            w.node_overlap
+        );
+
+        // `weighted_cost` under the default is still the exact linear combination
+        // (the default is now meaningful, not inert): verify against an explicit
+        // Σ wᵢ·termᵢ over a hand-set metrics value.
         let m = LayoutMetrics {
-            node_overlap: 1.0,
-            node_connector_overlap: 1.0,
-            label_overlap: 1.0,
-            crossings: 1.0,
-            sprawl: 1.0,
-            edge_length_cv: 1.0,
-            aspect_penalty: 1.0,
-            chain_straightness: 1.0,
-            loop_compactness: 1.0,
+            node_overlap: 0.3,
+            node_connector_overlap: 0.1,
+            label_overlap: 0.7,
+            crossings: 2.0,
+            sprawl: 5.0,
+            edge_length_cv: 0.4,
+            aspect_penalty: 1.5,
+            chain_straightness: 0.0,
+            loop_compactness: 0.8,
         };
-        assert_eq!(m.weighted_cost(&MetricWeights::default()), 0.0);
+        let expected = m.node_overlap * w.node_overlap
+            + m.node_connector_overlap * w.node_connector_overlap
+            + m.label_overlap * w.label_overlap
+            + m.crossings * w.crossings
+            + m.sprawl * w.sprawl
+            + m.edge_length_cv * w.edge_length_cv
+            + m.aspect_penalty * w.aspect_penalty
+            + m.chain_straightness * w.chain_straightness
+            + m.loop_compactness * w.loop_compactness;
+        assert!(
+            (m.weighted_cost(&w) - expected).abs() < 1e-12,
+            "weighted_cost under the default must equal Σ wᵢ·termᵢ: got {} expected {}",
+            m.weighted_cost(&w),
+            expected
+        );
     }
 
     // --- AC1.7: empty / single-element views are all-zero and finite ---

From 360024c52d877f12f217cdf538885ada77a47476 Mon Sep 17 00:00:00 2001
From: Bobby Powers <bobbypowers@gmail.com>
Date: Sat, 23 May 2026 11:25:19 -0700
Subject: [PATCH 27/38] engine: add human-vs-auto reference-pair ordering test
 (AC5.2)

Encode the user's taste anchors as an objective validation of the
calibrated MetricWeights::default(): on the agreed reference pairs the
shipped hand-authored (human) layout must score a lower weighted_cost
than a machine-generated (auto) layout of the same model.

Construction (b) -- human view vs generated layout. For each anchor we
load the default_projects XMILE model, score its as-loaded main view
(human), generate a single fixed-seed layout (auto), and assert
human < auto under the committed weights. One fixed seed (not
generate_best_layout) keeps the test deterministic (layout is
deterministic per seed, #633) and fast: all five tests run in well under
half a second over the small (<= 42 element) default_projects.

Anchor costs under the committed weights (human < auto):
  reliability:      human 0.255 < auto 2.997
  fishbanks:        human 0.194 < auto 1.603
  population:       human 0.052 < auto 0.053
  logistic-growth:  human 0.089 < auto 0.154

sir is deliberately NOT a human<auto anchor: its shipped reference
obscures more labels than the auto layout (human 0.458 > auto 0.039), so
the metric correctly prefers the auto. That direction is pinned by a
separate documented test so the asymmetry is recorded, not silently
assumed.
---
 src/simlin-engine/src/layout/metrics.rs | 142 ++++++++++++++++++++++++
 1 file changed, 142 insertions(+)

diff --git a/src/simlin-engine/src/layout/metrics.rs b/src/simlin-engine/src/layout/metrics.rs
index 9c582011f..b5d3c7abb 100644
--- a/src/simlin-engine/src/layout/metrics.rs
+++ b/src/simlin-engine/src/layout/metrics.rs
@@ -1957,4 +1957,146 @@ mod tests {
         );
         assert!(base.loop_compactness > 0.0);
     }
+
+    // --- AC5.2: human-vs-auto reference-pair ordering under the committed weights ---
+    //
+    // The committed `MetricWeights::default()` must agree with the user's visual
+    // taste: on the agreed reference pairs the SHIPPED, hand-authored ("human")
+    // layout must score a lower `weighted_cost` than a machine-generated
+    // ("auto") layout of the SAME model. This is the objective validation of the
+    // calibration (Phase 4, AC5.2): if the metric and the weights did not agree
+    // with human taste on an obvious pair, the metric or the pair would be wrong.
+    //
+    // Construction (b) -- "human view vs generated layout" (design glossary): the
+    // four `default_projects` models each ship a hand-authored main view. We
+    // score that as-loaded view (human) and a fixed-seed `generate_layout_with_config`
+    // layout (auto) of the same model, and assert `human < auto`.
+    //
+    // Determinism + budget: layout is deterministic per seed (fix #633), so ONE
+    // fixed seed (not `generate_best_layout`'s multi-seed search) makes the test
+    // reproducible AND fast. The four default_projects are small (<= 42
+    // elements), so a single layout generation each is well under the per-test
+    // budget.
+    //
+    // Anchors: reliability, fishbanks, population, dp(logistic-growth). These all
+    // flip the right way under the committed weights (verified during
+    // calibration). `sir` is deliberately NOT a human<auto anchor -- its shipped
+    // reference genuinely obscures more labels than the auto layout, so the
+    // metric correctly prefers the auto; that direction is pinned separately by
+    // `test_sir_auto_beats_reference_under_default_weights` so the asymmetry is
+    // documented rather than silently dropped.
+
+    /// A fixed annealing seed for the auto layout. Any single fixed seed makes the
+    /// test deterministic; 42 matches the convention used elsewhere in the layout
+    /// config.
+    const REF_PAIR_SEED: u64 = 42;
+
+    /// Load a `default_projects` XMILE model by directory name, resolving the path
+    /// against `CARGO_MANIFEST_DIR` (= `src/simlin-engine`) like the layout
+    /// integration tests. Panics with a clear message on any I/O or parse failure
+    /// (a missing fixture is a test-environment bug, not a metric result).
+    fn load_default_project(dir: &str) -> datamodel::Project {
+        let path = format!(
+            "{}/../../default_projects/{}/model.xmile",
+            env!("CARGO_MANIFEST_DIR"),
+            dir
+        );
+        let file =
+            std::fs::File::open(&path).unwrap_or_else(|e| panic!("failed to open {path}: {e}"));
+        let mut reader = std::io::BufReader::new(file);
+        crate::compat::open_xmile(&mut reader)
+            .unwrap_or_else(|e| panic!("failed to parse {path}: {e:?}"))
+    }
+
+    /// The model's as-loaded, hand-authored main `StockFlow` view (the "human"
+    /// reference). Panics if the model has no such view -- every chosen anchor
+    /// ships one, so its absence is a fixture regression.
+    fn human_view(project: &datamodel::Project) -> datamodel::StockFlow {
+        let model = project
+            .get_model("main")
+            .expect("anchor model must have a 'main' model");
+        match model.views.first() {
+            Some(datamodel::View::StockFlow(sf)) if !sf.elements.is_empty() => sf.clone(),
+            _ => panic!("anchor model must ship a non-empty hand-authored main view"),
+        }
+    }
+
+    /// `weighted_cost` of the shipped human layout under the committed default
+    /// weights.
+    fn human_cost(project: &datamodel::Project) -> f64 {
+        let view = human_view(project);
+        compute_layout_metrics(&view, &LayoutConfig::default())
+            .weighted_cost(&MetricWeights::default())
+    }
+
+    /// `weighted_cost` of a single fixed-seed generated layout under the committed
+    /// default weights. Deterministic per seed, so the score is reproducible.
+    fn auto_cost(project: &datamodel::Project) -> f64 {
+        let cfg = LayoutConfig {
+            annealing_random_seed: REF_PAIR_SEED,
+            ..LayoutConfig::default()
+        };
+        let view = crate::layout::generate_layout_with_config(project, "main", cfg.clone(), None)
+            .expect("auto layout generation must succeed for the anchor model");
+        compute_layout_metrics(&view, &cfg).weighted_cost(&MetricWeights::default())
+    }
+
+    /// Assert the human reference beats the auto layout for one anchor model,
+    /// naming the model and both costs on failure (so a calibration regression is
+    /// immediately legible).
+    fn assert_human_beats_auto(dir: &str) {
+        let project = load_default_project(dir);
+        let human = human_cost(&project);
+        let auto = auto_cost(&project);
+        assert!(
+            human < auto,
+            "reference pair {dir}: expected human_cost ({human}) < auto_cost ({auto}) \
+             under MetricWeights::default()"
+        );
+    }
+
+    #[test]
+    fn test_reference_pair_reliability_human_beats_auto() {
+        assert_human_beats_auto("reliability");
+    }
+
+    #[test]
+    fn test_reference_pair_fishbanks_human_beats_auto() {
+        assert_human_beats_auto("fishbanks");
+    }
+
+    #[test]
+    fn test_reference_pair_population_human_beats_auto() {
+        assert_human_beats_auto("population");
+    }
+
+    #[test]
+    fn test_reference_pair_dp_logistic_growth_human_beats_auto() {
+        assert_human_beats_auto("logistic-growth");
+    }
+
+    #[test]
+    fn test_sir_auto_beats_reference_under_default_weights() {
+        // The documented NON-anchor: SIR's shipped reference obscures more labels
+        // than the auto layout, so the metric correctly prefers the auto. This
+        // pins that direction so the asymmetry (why SIR is excluded from the
+        // human<auto anchors) is recorded rather than silently assumed.
+        let path = format!(
+            "{}/../../test/test-models/samples/SIR/SIR.stmx",
+            env!("CARGO_MANIFEST_DIR")
+        );
+        let file =
+            std::fs::File::open(&path).unwrap_or_else(|e| panic!("failed to open {path}: {e}"));
+        let mut reader = std::io::BufReader::new(file);
+        let project = crate::compat::open_xmile(&mut reader)
+            .unwrap_or_else(|e| panic!("failed to parse {path}: {e:?}"));
+
+        let human = human_cost(&project);
+        let auto = auto_cost(&project);
+        assert!(
+            auto < human,
+            "sir is a documented non-anchor: expected auto_cost ({auto}) < human_cost ({human}) \
+             under MetricWeights::default() (its reference obscures more labels than the auto)"
+        );
+    }
 }

From 634db4ca1180d0332aa19ffe580ecb90386e4aec Mon Sep 17 00:00:00 2001
From: Bobby Powers <bobbypowers@gmail.com>
Date: Sat, 23 May 2026 11:29:22 -0700
Subject: [PATCH 28/38] engine: layout_eval uses calibrated default weights +
 reseed baseline

Now that Phase 4 has committed the calibrated MetricWeights::default(),
the layout_eval sweep scores every weighted_cost with the committed
default instead of the Phase-3 PLACEHOLDER_WEIGHTS const. Remove the
PLACEHOLDER_WEIGHTS const (and its 'Phase 4 will replace' doc comment)
and route the three call sites (per-seed sweep, render scoring, and the
report builder) through MetricWeights::default().

Re-seed the committed examples/layout_eval_baseline.json from current
behavior under the calibrated weights, over the same small documented
subset (teacup,sir at M=8) so the committed file stays small. The
recorded per-seed sample costs change because the weights changed, so
the old baseline was stale. Update the sibling README to note the
2026-05-23 re-seed with the committed weights and drop the now-done
'regenerate after Phase 4' caveat.
---
 src/simlin-engine/examples/layout_eval.rs     |  36 +--
 .../examples/layout_eval_baseline.README.md   |  11 +-
 .../examples/layout_eval_baseline.json        | 210 +++++++++---------
 3 files changed, 115 insertions(+), 142 deletions(-)

diff --git a/src/simlin-engine/examples/layout_eval.rs b/src/simlin-engine/examples/layout_eval.rs
index bc8db4d4d..2199d5d70 100644
--- a/src/simlin-engine/examples/layout_eval.rs
+++ b/src/simlin-engine/examples/layout_eval.rs
@@ -55,34 +55,6 @@ use simlin_engine::layout::generate_layout_with_config;
 use simlin_engine::layout::metrics::{LayoutMetrics, MetricWeights, compute_layout_metrics};
 use simlin_engine::{datamodel, open_vensim, open_xmile};
 
-/// Phase-3 PLACEHOLDER weights for `weighted_cost`.
-///
-/// `MetricWeights::default()` is all-zeros until Phase 4 commits the calibrated
-/// weights (so any accidental pre-calibration use of `weighted_cost` is inert
-/// rather than silently wrong). The sweep needs a *non-trivial* scalar to rank
-/// seeds (best/median/worst) and to compute the corpus geomean, so this
-/// placeholder encodes the design's intended failure-mode priorities:
-/// the overlap family (`node_overlap`, `node_connector_overlap`, `label_overlap`)
-/// and edge `crossings` are dominant; `sprawl`, `edge_length_cv`, and
-/// `aspect_penalty` are moderate; the reserved structure terms
-/// (`chain_straightness`, `loop_compactness`, always 0.0 in Phase 1-3) carry
-/// zero weight.
-///
-/// Phase 4 commits the calibrated `MetricWeights` (its `Default`); when it
-/// lands, this placeholder MUST be replaced by `MetricWeights::default()` (see
-/// the Phase 4 plan, Task 2).
-const PLACEHOLDER_WEIGHTS: MetricWeights = MetricWeights {
-    node_overlap: 1.0,
-    node_connector_overlap: 1.0,
-    label_overlap: 1.0,
-    crossings: 1.0,
-    sprawl: 0.25,
-    edge_length_cv: 0.25,
-    aspect_penalty: 0.25,
-    chain_straightness: 0.0,
-    loop_compactness: 0.0,
-};
-
 /// The model name the layout pipeline and renderer operate on. `Project::get_model`
 /// maps "main" to the single/main model (matching `tests/layout.rs`).
 const MAIN_MODEL: &str = "main";
@@ -380,7 +352,7 @@ fn sweep_model(key: &str, project: &datamodel::Project, seeds: &[u64]) -> ModelS
             match generate_layout_with_config(project, MAIN_MODEL, cfg.clone(), None) {
                 Ok(view) => {
                     let metrics = compute_layout_metrics(&view, &cfg);
-                    let weighted_cost = metrics.weighted_cost(&PLACEHOLDER_WEIGHTS);
+                    let weighted_cost = metrics.weighted_cost(&MetricWeights::default());
                     Some((
                         seed,
                         MetricSample {
@@ -422,7 +394,7 @@ struct Render {
     seed: Option<u64>,
     /// Per-term metrics of the rendered view.
     metrics: LayoutMetrics,
-    /// Scalar weighted cost under the placeholder weights.
+    /// Scalar weighted cost under the calibrated default weights.
     weighted_cost: f64,
 }
 
@@ -466,7 +438,7 @@ fn render_view(
         eprintln!("WARN: failed to write {path}: {err}");
         return None;
     }
-    let weighted_cost = metrics.weighted_cost(&PLACEHOLDER_WEIGHTS);
+    let weighted_cost = metrics.weighted_cost(&MetricWeights::default());
     Some(Render {
         file: file.to_string(),
         seed,
@@ -1200,7 +1172,7 @@ fn main() {
         &corpus.per_model,
         &renders,
         corpus.geomean_of_medians,
-        &PLACEHOLDER_WEIGHTS,
+        &MetricWeights::default(),
         baseline_comparison,
     );
 
diff --git a/src/simlin-engine/examples/layout_eval_baseline.README.md b/src/simlin-engine/examples/layout_eval_baseline.README.md
index 9033eecf6..c007f65ca 100644
--- a/src/simlin-engine/examples/layout_eval_baseline.README.md
+++ b/src/simlin-engine/examples/layout_eval_baseline.README.md
@@ -13,17 +13,18 @@ LAYOUT_EVAL_MODELS=sir,teacup LAYOUT_EVAL_SEEDS=8 LAYOUT_EVAL_WRITE_BASELINE=1 \
   cargo run --release -p simlin-engine --features png_render,file_io --example layout_eval
 ```
 
-It records the **current pre-Rung-0 layout behavior**, scored with the Phase-3
-`PLACEHOLDER_WEIGHTS` (NOT the calibrated weights). Do not seed the full metasd
+It records the **current pre-Rung-0 layout behavior**, scored with the committed
+calibrated `MetricWeights::default()`. It was re-seeded on 2026-05-23 after
+Phase 4 committed those weights and `layout_eval.rs` switched from the Phase-3
+`PLACEHOLDER_WEIGHTS` to `MetricWeights::default()`. Do not seed the full metasd
 corpus here: that is minutes-scale and produces a large JSON.
 
 ## When to regenerate
 
 REGENERATE this baseline:
 
-- **After Phase 4 commits the calibrated `MetricWeights`** (and `layout_eval.rs`
-  switches from `PLACEHOLDER_WEIGHTS` to `MetricWeights::default()`): the
-  weighted costs change, so the recorded sample costs are stale.
+- **Whenever the calibrated `MetricWeights::default()` change**: the weighted
+  costs change, so the recorded sample costs go stale.
 - **Before Phase 5 measures Rung 0's improvement**: the baseline must capture
   pre-Rung-0 behavior with the final calibrated weights so the Rung-0 diff is
   meaningful.
diff --git a/src/simlin-engine/examples/layout_eval_baseline.json b/src/simlin-engine/examples/layout_eval_baseline.json
index 9dea65888..660587f39 100644
--- a/src/simlin-engine/examples/layout_eval_baseline.json
+++ b/src/simlin-engine/examples/layout_eval_baseline.json
@@ -6,190 +6,190 @@
         {
           "seed": 0,
           "metrics": {
-            "node_overlap": 0.05039334765728716,
+            "node_overlap": 0.03901734104046243,
             "node_connector_overlap": 0.0,
             "label_overlap": 0.0,
-            "crossings": 0.25,
+            "crossings": 0.0,
             "sprawl": 0.774985901426613,
             "edge_length_cv": 0.3203457592744067,
             "aspect_penalty": 0.0,
             "chain_straightness": 0.0,
             "loop_compactness": 0.0
           },
-          "weighted_cost": 0.5742262628325421
+          "weighted_cost": 0.03901734104046243
         },
         {
           "seed": 1,
           "metrics": {
-            "node_overlap": 0.05039334765728716,
+            "node_overlap": 0.03901734104046243,
             "node_connector_overlap": 0.0,
             "label_overlap": 0.0,
-            "crossings": 0.25,
+            "crossings": 0.0,
             "sprawl": 0.774985901426613,
             "edge_length_cv": 0.3203457592744067,
             "aspect_penalty": 0.0,
             "chain_straightness": 0.0,
             "loop_compactness": 0.0
           },
-          "weighted_cost": 0.5742262628325421
+          "weighted_cost": 0.03901734104046243
         },
         {
           "seed": 2,
           "metrics": {
-            "node_overlap": 0.05039334765728716,
+            "node_overlap": 0.03901734104046243,
             "node_connector_overlap": 0.0,
             "label_overlap": 0.0,
-            "crossings": 0.25,
+            "crossings": 0.0,
             "sprawl": 0.774985901426613,
             "edge_length_cv": 0.3203457592744067,
             "aspect_penalty": 0.0,
             "chain_straightness": 0.0,
             "loop_compactness": 0.0
           },
-          "weighted_cost": 0.5742262628325421
+          "weighted_cost": 0.03901734104046243
         },
         {
           "seed": 3,
           "metrics": {
-            "node_overlap": 0.05039334765728716,
+            "node_overlap": 0.03901734104046243,
             "node_connector_overlap": 0.0,
             "label_overlap": 0.0,
-            "crossings": 0.25,
+            "crossings": 0.0,
             "sprawl": 0.774985901426613,
             "edge_length_cv": 0.3203457592744067,
             "aspect_penalty": 0.0,
             "chain_straightness": 0.0,
             "loop_compactness": 0.0
           },
-          "weighted_cost": 0.5742262628325421
+          "weighted_cost": 0.03901734104046243
         },
         {
           "seed": 4,
           "metrics": {
-            "node_overlap": 0.05039334765728716,
+            "node_overlap": 0.03901734104046243,
             "node_connector_overlap": 0.0,
             "label_overlap": 0.0,
-            "crossings": 0.25,
+            "crossings": 0.0,
             "sprawl": 0.774985901426613,
             "edge_length_cv": 0.3203457592744067,
             "aspect_penalty": 0.0,
             "chain_straightness": 0.0,
             "loop_compactness": 0.0
           },
-          "weighted_cost": 0.5742262628325421
+          "weighted_cost": 0.03901734104046243
         },
         {
           "seed": 5,
           "metrics": {
-            "node_overlap": 0.05039334765728716,
+            "node_overlap": 0.03901734104046243,
             "node_connector_overlap": 0.0,
             "label_overlap": 0.0,
-            "crossings": 0.25,
+            "crossings": 0.0,
             "sprawl": 0.774985901426613,
             "edge_length_cv": 0.3203457592744067,
             "aspect_penalty": 0.0,
             "chain_straightness": 0.0,
             "loop_compactness": 0.0
           },
-          "weighted_cost": 0.5742262628325421
+          "weighted_cost": 0.03901734104046243
         },
         {
           "seed": 6,
           "metrics": {
-            "node_overlap": 0.05039334765728716,
+            "node_overlap": 0.03901734104046243,
             "node_connector_overlap": 0.0,
             "label_overlap": 0.0,
-            "crossings": 0.25,
+            "crossings": 0.0,
             "sprawl": 0.774985901426613,
             "edge_length_cv": 0.3203457592744067,
             "aspect_penalty": 0.0,
             "chain_straightness": 0.0,
             "loop_compactness": 0.0
           },
-          "weighted_cost": 0.5742262628325421
+          "weighted_cost": 0.03901734104046243
         },
         {
           "seed": 7,
           "metrics": {
-            "node_overlap": 0.05039334765728716,
+            "node_overlap": 0.03901734104046243,
             "node_connector_overlap": 0.0,
             "label_overlap": 0.0,
-            "crossings": 0.25,
+            "crossings": 0.0,
             "sprawl": 0.774985901426613,
             "edge_length_cv": 0.3203457592744067,
             "aspect_penalty": 0.0,
             "chain_straightness": 0.0,
             "loop_compactness": 0.0
           },
-          "weighted_cost": 0.5742262628325421
+          "weighted_cost": 0.03901734104046243
         },
         {
           "seed": 42,
           "metrics": {
-            "node_overlap": 0.05039334765728716,
+            "node_overlap": 0.03901734104046243,
             "node_connector_overlap": 0.0,
             "label_overlap": 0.0,
-            "crossings": 0.25,
+            "crossings": 0.0,
             "sprawl": 0.774985901426613,
             "edge_length_cv": 0.3203457592744067,
             "aspect_penalty": 0.0,
             "chain_straightness": 0.0,
             "loop_compactness": 0.0
           },
-          "weighted_cost": 0.5742262628325421
+          "weighted_cost": 0.03901734104046243
         },
         {
           "seed": 123,
           "metrics": {
-            "node_overlap": 0.05039334765728716,
+            "node_overlap": 0.03901734104046243,
             "node_connector_overlap": 0.0,
             "label_overlap": 0.0,
-            "crossings": 0.25,
+            "crossings": 0.0,
             "sprawl": 0.774985901426613,
             "edge_length_cv": 0.3203457592744067,
             "aspect_penalty": 0.0,
             "chain_straightness": 0.0,
             "loop_compactness": 0.0
           },
-          "weighted_cost": 0.5742262628325421
+          "weighted_cost": 0.03901734104046243
         },
         {
           "seed": 456,
           "metrics": {
-            "node_overlap": 0.05039334765728716,
+            "node_overlap": 0.03901734104046243,
             "node_connector_overlap": 0.0,
             "label_overlap": 0.0,
-            "crossings": 0.25,
+            "crossings": 0.0,
             "sprawl": 0.774985901426613,
             "edge_length_cv": 0.3203457592744067,
             "aspect_penalty": 0.0,
             "chain_straightness": 0.0,
             "loop_compactness": 0.0
           },
-          "weighted_cost": 0.5742262628325421
+          "weighted_cost": 0.03901734104046243
         },
         {
           "seed": 789,
           "metrics": {
-            "node_overlap": 0.05039334765728716,
+            "node_overlap": 0.03901734104046243,
             "node_connector_overlap": 0.0,
             "label_overlap": 0.0,
-            "crossings": 0.25,
+            "crossings": 0.0,
             "sprawl": 0.774985901426613,
             "edge_length_cv": 0.3203457592744067,
             "aspect_penalty": 0.0,
             "chain_straightness": 0.0,
             "loop_compactness": 0.0
           },
-          "weighted_cost": 0.5742262628325421
+          "weighted_cost": 0.03901734104046243
         }
       ],
-      "median_cost": 0.5742262628325421,
+      "median_cost": 0.03901734104046243,
       "spread": [
-        0.5742262628325421,
-        0.5742262628325421
+        0.03901734104046243,
+        0.03901734104046243
       ],
-      "best_of_k_cost": 0.5742262628325421,
+      "best_of_k_cost": 0.03901734104046243,
       "best_seed": 0,
       "median_seed": 0,
       "worst_seed": 0
@@ -200,194 +200,194 @@
         {
           "seed": 0,
           "metrics": {
-            "node_overlap": 0.05327008392222038,
-            "node_connector_overlap": 0.007868550165242162,
-            "label_overlap": 0.004184421171500422,
-            "crossings": 0.3333333333333333,
+            "node_overlap": 0.0,
+            "node_connector_overlap": 0.0,
+            "label_overlap": 0.038540721316451254,
+            "crossings": 0.0,
             "sprawl": 0.7423022923087866,
             "edge_length_cv": 0.39340989843910823,
             "aspect_penalty": 0.06837606837606858,
             "chain_straightness": 0.0,
             "loop_compactness": 0.0
           },
-          "weighted_cost": 0.6996784533732872
+          "weighted_cost": 0.038540721316451254
         },
         {
           "seed": 1,
           "metrics": {
-            "node_overlap": 0.05327008392222038,
-            "node_connector_overlap": 0.007868550165242162,
-            "label_overlap": 0.004184421171500422,
-            "crossings": 0.3333333333333333,
+            "node_overlap": 0.0,
+            "node_connector_overlap": 0.0,
+            "label_overlap": 0.038540721316451254,
+            "crossings": 0.0,
             "sprawl": 0.7423022923087866,
             "edge_length_cv": 0.39340989843910823,
             "aspect_penalty": 0.06837606837606858,
             "chain_straightness": 0.0,
             "loop_compactness": 0.0
           },
-          "weighted_cost": 0.6996784533732872
+          "weighted_cost": 0.038540721316451254
         },
         {
           "seed": 2,
           "metrics": {
-            "node_overlap": 0.05327008392222038,
-            "node_connector_overlap": 0.007868550165242162,
-            "label_overlap": 0.004184421171500422,
-            "crossings": 0.3333333333333333,
+            "node_overlap": 0.0,
+            "node_connector_overlap": 0.0,
+            "label_overlap": 0.038540721316451254,
+            "crossings": 0.0,
             "sprawl": 0.7423022923087866,
             "edge_length_cv": 0.39340989843910823,
             "aspect_penalty": 0.06837606837606858,
             "chain_straightness": 0.0,
             "loop_compactness": 0.0
           },
-          "weighted_cost": 0.6996784533732872
+          "weighted_cost": 0.038540721316451254
         },
         {
           "seed": 3,
           "metrics": {
-            "node_overlap": 0.05327008392222038,
-            "node_connector_overlap": 0.007868550165242162,
-            "label_overlap": 0.004184421171500422,
-            "crossings": 0.3333333333333333,
+            "node_overlap": 0.0,
+            "node_connector_overlap": 0.0,
+            "label_overlap": 0.038540721316451254,
+            "crossings": 0.0,
             "sprawl": 0.7423022923087866,
             "edge_length_cv": 0.39340989843910823,
             "aspect_penalty": 0.06837606837606858,
             "chain_straightness": 0.0,
             "loop_compactness": 0.0
           },
-          "weighted_cost": 0.6996784533732872
+          "weighted_cost": 0.038540721316451254
         },
         {
           "seed": 4,
           "metrics": {
-            "node_overlap": 0.05327008392222038,
-            "node_connector_overlap": 0.007868550165242162,
-            "label_overlap": 0.004184421171500422,
-            "crossings": 0.3333333333333333,
+            "node_overlap": 0.0,
+            "node_connector_overlap": 0.0,
+            "label_overlap": 0.038540721316451254,
+            "crossings": 0.0,
             "sprawl": 0.7423022923087866,
             "edge_length_cv": 0.39340989843910823,
             "aspect_penalty": 0.06837606837606858,
             "chain_straightness": 0.0,
             "loop_compactness": 0.0
           },
-          "weighted_cost": 0.6996784533732872
+          "weighted_cost": 0.038540721316451254
         },
         {
           "seed": 5,
           "metrics": {
-            "node_overlap": 0.05327008392222038,
-            "node_connector_overlap": 0.007868550165242162,
-            "label_overlap": 0.004184421171500422,
-            "crossings": 0.3333333333333333,
+            "node_overlap": 0.0,
+            "node_connector_overlap": 0.0,
+            "label_overlap": 0.038540721316451254,
+            "crossings": 0.0,
             "sprawl": 0.7423022923087866,
             "edge_length_cv": 0.39340989843910823,
             "aspect_penalty": 0.06837606837606858,
             "chain_straightness": 0.0,
             "loop_compactness": 0.0
           },
-          "weighted_cost": 0.6996784533732872
+          "weighted_cost": 0.038540721316451254
         },
         {
           "seed": 6,
           "metrics": {
-            "node_overlap": 0.05327008392222038,
-            "node_connector_overlap": 0.007868550165242162,
-            "label_overlap": 0.004184421171500422,
-            "crossings": 0.3333333333333333,
+            "node_overlap": 0.0,
+            "node_connector_overlap": 0.0,
+            "label_overlap": 0.038540721316451254,
+            "crossings": 0.0,
             "sprawl": 0.7423022923087866,
             "edge_length_cv": 0.39340989843910823,
             "aspect_penalty": 0.06837606837606858,
             "chain_straightness": 0.0,
             "loop_compactness": 0.0
           },
-          "weighted_cost": 0.6996784533732872
+          "weighted_cost": 0.038540721316451254
         },
         {
           "seed": 7,
           "metrics": {
-            "node_overlap": 0.05327008392222038,
-            "node_connector_overlap": 0.007868550165242162,
-            "label_overlap": 0.004184421171500422,
-            "crossings": 0.3333333333333333,
+            "node_overlap": 0.0,
+            "node_connector_overlap": 0.0,
+            "label_overlap": 0.038540721316451254,
+            "crossings": 0.0,
             "sprawl": 0.7423022923087866,
             "edge_length_cv": 0.39340989843910823,
             "aspect_penalty": 0.06837606837606858,
             "chain_straightness": 0.0,
             "loop_compactness": 0.0
           },
-          "weighted_cost": 0.6996784533732872
+          "weighted_cost": 0.038540721316451254
         },
         {
           "seed": 42,
           "metrics": {
-            "node_overlap": 0.05327008392222038,
-            "node_connector_overlap": 0.007868550165242162,
-            "label_overlap": 0.004184421171500422,
-            "crossings": 0.3333333333333333,
+            "node_overlap": 0.0,
+            "node_connector_overlap": 0.0,
+            "label_overlap": 0.038540721316451254,
+            "crossings": 0.0,
             "sprawl": 0.7423022923087866,
             "edge_length_cv": 0.39340989843910823,
             "aspect_penalty": 0.06837606837606858,
             "chain_straightness": 0.0,
             "loop_compactness": 0.0
           },
-          "weighted_cost": 0.6996784533732872
+          "weighted_cost": 0.038540721316451254
         },
         {
           "seed": 123,
           "metrics": {
-            "node_overlap": 0.05327008392222038,
-            "node_connector_overlap": 0.007868550165242162,
-            "label_overlap": 0.004184421171500422,
-            "crossings": 0.3333333333333333,
+            "node_overlap": 0.0,
+            "node_connector_overlap": 0.0,
+            "label_overlap": 0.038540721316451254,
+            "crossings": 0.0,
             "sprawl": 0.7423022923087866,
             "edge_length_cv": 0.39340989843910823,
             "aspect_penalty": 0.06837606837606858,
             "chain_straightness": 0.0,
             "loop_compactness": 0.0
           },
-          "weighted_cost": 0.6996784533732872
+          "weighted_cost": 0.038540721316451254
         },
         {
           "seed": 456,
           "metrics": {
-            "node_overlap": 0.05327008392222038,
-            "node_connector_overlap": 0.007868550165242162,
-            "label_overlap": 0.004184421171500422,
-            "crossings": 0.3333333333333333,
+            "node_overlap": 0.0,
+            "node_connector_overlap": 0.0,
+            "label_overlap": 0.038540721316451254,
+            "crossings": 0.0,
             "sprawl": 0.7423022923087866,
             "edge_length_cv": 0.39340989843910823,
             "aspect_penalty": 0.06837606837606858,
             "chain_straightness": 0.0,
             "loop_compactness": 0.0
           },
-          "weighted_cost": 0.6996784533732872
+          "weighted_cost": 0.038540721316451254
         },
         {
           "seed": 789,
           "metrics": {
-            "node_overlap": 0.05327008392222038,
-            "node_connector_overlap": 0.007868550165242162,
-            "label_overlap": 0.004184421171500422,
-            "crossings": 0.3333333333333333,
+            "node_overlap": 0.0,
+            "node_connector_overlap": 0.0,
+            "label_overlap": 0.038540721316451254,
+            "crossings": 0.0,
             "sprawl": 0.7423022923087866,
             "edge_length_cv": 0.39340989843910823,
             "aspect_penalty": 0.06837606837606858,
             "chain_straightness": 0.0,
             "loop_compactness": 0.0
           },
-          "weighted_cost": 0.6996784533732872
+          "weighted_cost": 0.038540721316451254
         }
       ],
-      "median_cost": 0.6996784533732872,
+      "median_cost": 0.038540721316451254,
       "spread": [
-        0.6996784533732872,
-        0.6996784533732872
+        0.038540721316451254,
+        0.038540721316451254
       ],
-      "best_of_k_cost": 0.6996784533732872,
+      "best_of_k_cost": 0.038540721316451254,
       "best_seed": 0,
       "median_seed": 0,
       "worst_seed": 0
     }
   ],
-  "geomean_of_medians": 0.6338562482653269
+  "geomean_of_medians": 0.03877829892542217
 }
\ No newline at end of file

From d2f85c3d5060288926478e895b8f06ecac7ed7c7 Mon Sep 17 00:00:00 2001
From: Bobby Powers <bobbypowers@gmail.com>
Date: Sat, 23 May 2026 11:48:56 -0700
Subject: [PATCH 29/38] engine: document population anchor margin and loop
 cycle canonicalization

The population reference-pair anchor beats the auto layout by only ~2.3%
under the committed default weights, far thinner than the other anchors
(reliability ~8.5%, fishbanks ~12%, logistic-growth ~58%). Document that
it is a marginal taste anchor so a future failure is read as a near-boundary
case rather than necessarily a metric regression; the assertion is unchanged.

Also note that canonicalize_cycle normalizes rotation but not traversal
direction. A reverse-direction duplicate essentially never arises for
directed SD feedback loops, and even if it did the shoelace polygon area is
direction-invariant, so both yield the identical isoperimetric penalty.
---
 src/simlin-engine/src/layout/metrics.rs | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/src/simlin-engine/src/layout/metrics.rs b/src/simlin-engine/src/layout/metrics.rs
index b5d3c7abb..201259e3b 100644
--- a/src/simlin-engine/src/layout/metrics.rs
+++ b/src/simlin-engine/src/layout/metrics.rs
@@ -515,6 +515,13 @@ fn dfs_cycles(
 /// Rotate a cycle so its smallest uid is first, preserving traversal direction.
 /// The DFS already guarantees the start (= minimum) is element 0, but rotating
 /// defensively keeps the canonical form correct for any caller.
+///
+/// Note: this canonicalizes rotation (start at min uid) but NOT traversal
+/// direction, so a directed cycle and its reverse canonicalize to distinct
+/// entries. That is harmless: a reverse-direction duplicate (essentially never
+/// present for directed SD feedback loops, which would require both directed
+/// edge sets in the graph) would compute the same isoperimetric penalty because
+/// the shoelace polygon area in `cycle_penalty` is direction-invariant.
 fn canonicalize_cycle(cycle: &[i32]) -> Vec<i32> {
     if cycle.is_empty() {
         return Vec::new();
@@ -2065,6 +2072,13 @@ mod tests {
         assert_human_beats_auto("fishbanks");
     }
 
+    // Population is a MARGINAL taste anchor: under the committed default weights
+    // its human cost (~0.0521) beats auto (~0.0533) by only ~2.3%, far thinner
+    // than the other anchors (reliability ~8.5%, fishbanks ~12%,
+    // logistic-growth ~58%). The layout is deterministic per seed, so the
+    // assertion is not flaky -- but if it ever fails it should be read as
+    // "population sits near the boundary" rather than necessarily a real metric
+    // regression. The robust signal lives in reliability/fishbanks/logistic-growth.
     #[test]
     fn test_reference_pair_population_human_beats_auto() {
         assert_human_beats_auto("population");

From ccdd783c15fb5a6412a6ad73645dc0abb5d03970 Mon Sep 17 00:00:00 2001
From: Bobby Powers <bobbypowers@gmail.com>
Date: Sat, 23 May 2026 11:59:45 -0700
Subject: [PATCH 30/38] engine: rung 0 - select best layout by weighted_cost

---
 src/simlin-engine/src/layout/layout_tests.rs | 71 ++------------------
 src/simlin-engine/src/layout/mod.rs          | 39 +++++++----
 2 files changed, 30 insertions(+), 80 deletions(-)

diff --git a/src/simlin-engine/src/layout/layout_tests.rs b/src/simlin-engine/src/layout/layout_tests.rs
index 58fc8d1ec..77e82997b 100644
--- a/src/simlin-engine/src/layout/layout_tests.rs
+++ b/src/simlin-engine/src/layout/layout_tests.rs
@@ -528,70 +528,6 @@ fn test_extract_equation_deps_arrayed_uses_all_entries() {
     assert_eq!(deps, vec!["bar", "foo"]);
 }
 
-#[test]
-fn test_select_best_layout_fewest_crossings() {
-    let results = vec![
-        Ok(LayoutResult {
-            view: datamodel::StockFlow {
-                name: None,
-                elements: vec![ViewElement::Aux(view_element::Aux {
-                    name: "from_5_crossings".to_string(),
-                    uid: 1,
-                    x: 0.0,
-                    y: 0.0,
-                    label_side: LabelSide::Bottom,
-                    compat: None,
-                })],
-                view_box: Rect {
-                    x: 0.0,
-                    y: 0.0,
-                    width: 100.0,
-                    height: 100.0,
-                },
-                zoom: 1.0,
-                use_lettered_polarity: false,
-                font: None,
-                sketch_compat: None,
-            },
-            crossings: 5,
-            seed: 42,
-        }),
-        Ok(LayoutResult {
-            view: datamodel::StockFlow {
-                name: None,
-                elements: vec![ViewElement::Aux(view_element::Aux {
-                    name: "from_2_crossings".to_string(),
-                    uid: 2,
-                    x: 0.0,
-                    y: 0.0,
-                    label_side: LabelSide::Bottom,
-                    compat: None,
-                })],
-                view_box: Rect {
-                    x: 0.0,
-                    y: 0.0,
-                    width: 100.0,
-                    height: 100.0,
-                },
-                zoom: 1.0,
-                use_lettered_polarity: false,
-                font: None,
-                sketch_compat: None,
-            },
-            crossings: 2,
-            seed: 123,
-        }),
-    ];
-    let best = select_best_layout(results).unwrap();
-    // Should pick the one with 2 crossings (fewer is better)
-    assert_eq!(best.elements.len(), 1);
-    if let ViewElement::Aux(aux) = &best.elements[0] {
-        assert_eq!(aux.name, "from_2_crossings");
-    } else {
-        unreachable!("expected Aux element");
-    }
-}
-
 #[test]
 fn test_select_best_layout_lowest_seed_on_tie() {
     let results = vec![
@@ -617,7 +553,7 @@ fn test_select_best_layout_lowest_seed_on_tie() {
                 font: None,
                 sketch_compat: None,
             },
-            crossings: 3,
+            weighted_cost: 3.0,
             seed: 123,
         }),
         Ok(LayoutResult {
@@ -642,12 +578,13 @@ fn test_select_best_layout_lowest_seed_on_tie() {
                 font: None,
                 sketch_compat: None,
             },
-            crossings: 3,
+            weighted_cost: 3.0,
             seed: 42,
         }),
     ];
     let best = select_best_layout(results).unwrap();
-    // Should pick seed 42 (lower seed wins on tie)
+    // Equal weighted_cost on both: the lower seed wins the tie-break (still valid
+    // under the Rung-0 weighted_cost selection rule).
     assert_eq!(best.elements.len(), 1);
     if let ViewElement::Aux(aux) = &best.elements[0] {
         assert_eq!(aux.name, "from_seed_42");
diff --git a/src/simlin-engine/src/layout/mod.rs b/src/simlin-engine/src/layout/mod.rs
index 0ebfd3919..7fe7105b2 100644
--- a/src/simlin-engine/src/layout/mod.rs
+++ b/src/simlin-engine/src/layout/mod.rs
@@ -73,7 +73,11 @@ struct FlowAttachment {
 /// Result of a single layout generation, used to select the best among parallel attempts.
 struct LayoutResult {
     view: datamodel::StockFlow,
-    crossings: usize,
+    /// The full calibrated layout-quality metric for `view` (Sigma w_i * term_i,
+    /// with `MetricWeights::default()`). `select_best_layout` minimizes this; its
+    /// `crossings` term already captures the accurate connector-crossing count, so
+    /// there is no separate `crossings` field.
+    weighted_cost: f64,
     seed: u64,
 }
 
@@ -5264,8 +5268,10 @@ pub fn generate_layout_with_config(
     fresh_layout(model, &metadata, &config)
 }
 
-/// Generate multiple layouts with different seeds in parallel and pick the
-/// one with fewest crossings. On tie, the lowest seed wins.
+/// Generate multiple layouts with different seeds in parallel and pick the one
+/// that minimizes the full calibrated layout-quality metric (`weighted_cost`,
+/// which includes the accurate connector-crossing count alongside node/label
+/// overlap and loop compactness). On tie, the lowest seed wins.
 pub fn generate_best_layout(
     project: &datamodel::Project,
     model_name: &str,
@@ -5281,10 +5287,14 @@ pub fn generate_best_layout(
         let mut cfg = config.clone();
         cfg.annealing_random_seed = seed;
         let view = fresh_layout(model, &metadata, &cfg)?;
-        let crossings = count_view_crossings(&view);
+        // Score the candidate with the full calibrated metric. Its `crossings`
+        // term computes the accurate connector-crossing count internally, so we
+        // no longer call `count_view_crossings` directly here.
+        let metrics = metrics::compute_layout_metrics(&view, &cfg);
+        let weighted_cost = metrics.weighted_cost(&metrics::MetricWeights::default());
         Ok(LayoutResult {
             view,
-            crossings,
+            weighted_cost,
             seed,
         })
     };
@@ -5314,7 +5324,9 @@ pub fn compute_layout_metadata(
     compute_metadata(project, model_name, db_state)
 }
 
-/// Pick the layout with fewest crossings; on tie, the one from the lowest seed.
+/// Pick the layout that minimizes the full calibrated layout-quality metric
+/// (`weighted_cost`); on tie, the one from the lowest seed. The first `Err`
+/// short-circuits, and an empty result set is an error.
 fn select_best_layout(
     results: Vec<Result<LayoutResult, String>>,
 ) -> Result<datamodel::StockFlow, String> {
@@ -5325,13 +5337,14 @@ fn select_best_layout(
         best = Some(match best {
             None => lr,
             Some(prev) => {
-                if lr.crossings < prev.crossings
-                    || (lr.crossings == prev.crossings && lr.seed < prev.seed)
-                {
-                    lr
-                } else {
-                    prev
-                }
+                // NaN-safe by construction: `<` yields false when
+                // `lr.weighted_cost` is NaN, so a degenerate NaN-cost candidate
+                // never displaces a finite one. If BOTH costs are NaN the
+                // tie-break does not fire either (`==` is false for NaN), so the
+                // earlier candidate is kept -- deterministic regardless.
+                let better = lr.weighted_cost < prev.weighted_cost
+                    || (lr.weighted_cost == prev.weighted_cost && lr.seed < prev.seed);
+                if better { lr } else { prev }
             }
         });
     }

From 095ea6b8fbb9a4773faa70a726b9129b0ef618bc Mon Sep 17 00:00:00 2001
From: Bobby Powers <bobbypowers@gmail.com>
Date: Sat, 23 May 2026 12:11:38 -0700
Subject: [PATCH 31/38] engine: test rung-0 weighted_cost selection (incl.
 more-crossings case)

Add a dedicated layout_selection_tests.rs (wired via #[path], following the
crossings_tests.rs precedent so layout_tests.rs stays under the per-file line
cap) covering AC6.1: select_best_layout minimizes weighted_cost, not crossings.

The headline case builds two candidates whose crossing ordering is the
inverse of their weighted_cost ordering -- the lowest-cost view actually has
MORE connector crossings (asserted via count_view_crossings, so the inversion
is real, not narrative) -- and confirms the lowest-cost view wins, which the
retired crossings-only rule would not have done. Companion cases pin the
lowest-seed tie-break and NaN safety: a NaN challenger never displaces a
finite running best, while a NaN seeded first is sticky under the current
fold (pinned as a documented limitation rather than silently assumed away).
---
 .../src/layout/layout_selection_tests.rs      | 255 ++++++++++++++++++
 src/simlin-engine/src/layout/mod.rs           |   4 +
 2 files changed, 259 insertions(+)
 create mode 100644 src/simlin-engine/src/layout/layout_selection_tests.rs

diff --git a/src/simlin-engine/src/layout/layout_selection_tests.rs b/src/simlin-engine/src/layout/layout_selection_tests.rs
new file mode 100644
index 000000000..09a187b9d
--- /dev/null
+++ b/src/simlin-engine/src/layout/layout_selection_tests.rs
@@ -0,0 +1,255 @@
+// Copyright 2026 The Simlin Authors. All rights reserved.
+// Use of this source code is governed by the Apache License,
+// Version 2.0, that can be found in the LICENSE file.
+
+//! Rung-0 layout-selection and regression-guard tests (Phase 5 of the layout
+//! quality eval): `select_best_layout` picks the lowest `weighted_cost`
+//! candidate (even when that means *more* connector crossings than a rival),
+//! the deterministic per-model `weighted_cost` ceiling guards against quality
+//! regressions, and a fixed seed reproduces a byte-identical layout. Split out
+//! of `layout_tests.rs` to keep that file under the per-file line cap, mirroring
+//! the `crossings_tests.rs` precedent.
+
+use super::*;
+use crate::datamodel;
+
+/// A scalar aux at (`x`, `y`) with a unique name, so a selected view can be
+/// identified by which marker element it carries.
+fn marker_aux(uid: i32, name: &str, x: f64, y: f64) -> ViewElement {
+    ViewElement::Aux(view_element::Aux {
+        name: name.to_string(),
+        uid,
+        x,
+        y,
+        label_side: LabelSide::Bottom,
+        compat: None,
+    })
+}
+
+fn sel_link(uid: i32, from_uid: i32, to_uid: i32) -> ViewElement {
+    ViewElement::Link(view_element::Link {
+        uid,
+        from_uid,
+        to_uid,
+        shape: LinkShape::Straight,
+        polarity: None,
+    })
+}
+
+/// Wrap a set of view elements into a `StockFlow` carrying `name` as its marker
+/// so `select_best_layout`'s winner is identifiable.
+fn sel_view(name: &str, elements: Vec<ViewElement>) -> datamodel::StockFlow {
+    datamodel::StockFlow {
+        name: Some(name.to_string()),
+        elements,
+        view_box: Rect {
+            x: 0.0,
+            y: 0.0,
+            width: 1000.0,
+            height: 1000.0,
+        },
+        zoom: 1.0,
+        use_lettered_polarity: false,
+        font: None,
+        sketch_compat: None,
+    }
+}
+
+/// A view whose two straight links cross exactly once (the diagonals of a
+/// square): `count_view_crossings == 1`.
+fn crossing_view(name: &str) -> datamodel::StockFlow {
+    sel_view(
+        name,
+        vec![
+            marker_aux(1, "a1", 0.0, 0.0),
+            marker_aux(2, "a2", 100.0, 100.0),
+            marker_aux(3, "a3", 0.0, 100.0),
+            marker_aux(4, "a4", 100.0, 0.0),
+            sel_link(10, 1, 2),
+            sel_link(11, 3, 4),
+        ],
+    )
+}
+
+/// A view whose two straight links share an endpoint and never cross:
+/// `count_view_crossings == 0`.
+fn non_crossing_view(name: &str) -> datamodel::StockFlow {
+    sel_view(
+        name,
+        vec![
+            marker_aux(1, "a1", 50.0, 50.0),
+            marker_aux(2, "a2", 100.0, 0.0),
+            marker_aux(3, "a3", 100.0, 100.0),
+            sel_link(10, 1, 2),
+            sel_link(11, 1, 3),
+        ],
+    )
+}
+
+/// AC6.1: selection minimizes `weighted_cost`, not crossings. The lowest-cost
+/// candidate is deliberately built from a view with MORE connector crossings
+/// than a rival, so the old "fewest crossings" rule would have picked the other
+/// one. We assert the crossing inversion is real (via `count_view_crossings`),
+/// then assert `select_best_layout` returns the lowest-`weighted_cost` view.
+#[test]
+fn test_select_best_layout_minimizes_weighted_cost_over_crossings() {
+    let crossing = crossing_view("more_crossings_low_cost");
+    let non_crossing = non_crossing_view("fewer_crossings_high_cost");
+
+    // The inversion is genuine, not just narrative: the candidate we expect to
+    // win actually has strictly more crossings than the one we expect to lose.
+    let crossing_count = count_view_crossings(&crossing);
+    let non_crossing_count = count_view_crossings(&non_crossing);
+    assert_eq!(crossing_count, 1, "crossing view should have one crossing");
+    assert_eq!(
+        non_crossing_count, 0,
+        "non-crossing view should have zero crossings"
+    );
+    assert!(
+        crossing_count > non_crossing_count,
+        "the low-cost candidate must have more crossings than its rival, \
+         so the choice differs from the old crossings-only rule"
+    );
+
+    // Hand-set costs so the MORE-crossings view is the cheaper one. Under the
+    // retired crossings-only rule `fewer_crossings_high_cost` (0 crossings)
+    // would win; under Rung 0 the lower `weighted_cost` wins.
+    let results = vec![
+        Ok(LayoutResult {
+            view: crossing,
+            weighted_cost: 1.0,
+            seed: 42,
+        }),
+        Ok(LayoutResult {
+            view: non_crossing,
+            weighted_cost: 5.0,
+            seed: 123,
+        }),
+    ];
+
+    let best = select_best_layout(results).expect("selection should succeed");
+    assert_eq!(
+        best.name.as_deref(),
+        Some("more_crossings_low_cost"),
+        "the lowest-weighted_cost candidate must win even with more crossings"
+    );
+}
+
+/// AC6.1 (tie-break): equal `weighted_cost`, the lower seed wins. This is the
+/// same rule `test_select_best_layout_lowest_seed_on_tie` (in `layout_tests.rs`)
+/// pins on hand-built `StockFlow` literals; here we re-assert it through the
+/// marker-named helpers for completeness alongside the cost-ordering case.
+#[test]
+fn test_select_best_layout_tie_breaks_on_lowest_seed() {
+    let results = vec![
+        Ok(LayoutResult {
+            view: sel_view("seed_456", vec![marker_aux(1, "a", 0.0, 0.0)]),
+            weighted_cost: 2.5,
+            seed: 456,
+        }),
+        Ok(LayoutResult {
+            view: sel_view("seed_42", vec![marker_aux(1, "a", 0.0, 0.0)]),
+            weighted_cost: 2.5,
+            seed: 42,
+        }),
+        Ok(LayoutResult {
+            view: sel_view("seed_789", vec![marker_aux(1, "a", 0.0, 0.0)]),
+            weighted_cost: 2.5,
+            seed: 789,
+        }),
+    ];
+
+    let best = select_best_layout(results).expect("selection should succeed");
+    assert_eq!(
+        best.name.as_deref(),
+        Some("seed_42"),
+        "on a weighted_cost tie the lowest seed wins"
+    );
+}
+
+/// AC6.1 (NaN safety): a NaN-cost challenger must never displace a finite
+/// running best. `select_best_layout` keeps the running best whenever the
+/// challenger's `<` comparison is false, and `challenger < finite` is always
+/// false for a NaN challenger -- so a degenerate NaN-cost candidate encountered
+/// after a finite one cannot win.
+#[test]
+fn test_select_best_layout_nan_challenger_never_displaces_finite() {
+    // Finite candidate first, then NaN: the NaN must not displace it.
+    let finite_first = vec![
+        Ok(LayoutResult {
+            view: sel_view("finite", vec![marker_aux(1, "a", 0.0, 0.0)]),
+            weighted_cost: 4.0,
+            seed: 42,
+        }),
+        Ok(LayoutResult {
+            view: sel_view("nan", vec![marker_aux(1, "a", 0.0, 0.0)]),
+            weighted_cost: f64::NAN,
+            seed: 123,
+        }),
+    ];
+    let best = select_best_layout(finite_first).expect("selection should succeed");
+    assert_eq!(
+        best.name.as_deref(),
+        Some("finite"),
+        "a NaN-cost challenger must not displace a finite running best"
+    );
+
+    // A NaN that arrives last among several finite candidates still loses: the
+    // finite minimum is already the running best by the time NaN is compared.
+    let nan_last = vec![
+        Ok(LayoutResult {
+            view: sel_view("hi", vec![marker_aux(1, "a", 0.0, 0.0)]),
+            weighted_cost: 9.0,
+            seed: 42,
+        }),
+        Ok(LayoutResult {
+            view: sel_view("lo", vec![marker_aux(1, "a", 0.0, 0.0)]),
+            weighted_cost: 1.0,
+            seed: 123,
+        }),
+        Ok(LayoutResult {
+            view: sel_view("nan", vec![marker_aux(1, "a", 0.0, 0.0)]),
+            weighted_cost: f64::NAN,
+            seed: 456,
+        }),
+    ];
+    let best = select_best_layout(nan_last).expect("selection should succeed");
+    assert_eq!(
+        best.name.as_deref(),
+        Some("lo"),
+        "the finite minimum wins; a trailing NaN candidate cannot displace it"
+    );
+}
+
+/// AC6.1 (NaN-first limitation, documented): the current fold seeds the running
+/// best with the FIRST result and only replaces it when a challenger compares
+/// strictly less (or ties on cost with a lower seed). A NaN seeded as the
+/// running best is therefore sticky -- `finite < NaN` is false and `finite ==
+/// NaN` is false, so no later finite candidate overtakes it. In production
+/// (`generate_best_layout` runs seeds in the fixed order [42, 123, 456, 789]),
+/// this means a degenerate NaN-cost layout from the first seed would be shipped
+/// even when a later seed produced a finite, usable layout. This test pins that
+/// real behavior so the limitation is explicit, not silently assumed away;
+/// tightening the fold to skip NaN running-bests is tracked separately.
+#[test]
+fn test_select_best_layout_nan_first_is_sticky_documented_limitation() {
+    let nan_first = vec![
+        Ok(LayoutResult {
+            view: sel_view("nan", vec![marker_aux(1, "a", 0.0, 0.0)]),
+            weighted_cost: f64::NAN,
+            seed: 42,
+        }),
+        Ok(LayoutResult {
+            view: sel_view("finite", vec![marker_aux(1, "a", 0.0, 0.0)]),
+            weighted_cost: 4.0,
+            seed: 123,
+        }),
+    ];
+    let best = select_best_layout(nan_first).expect("selection should succeed");
+    assert_eq!(
+        best.name.as_deref(),
+        Some("nan"),
+        "a NaN seeded as the running best is sticky under the current fold \
+         (documented limitation)"
+    );
+}
diff --git a/src/simlin-engine/src/layout/mod.rs b/src/simlin-engine/src/layout/mod.rs
index 7fe7105b2..a3132f323 100644
--- a/src/simlin-engine/src/layout/mod.rs
+++ b/src/simlin-engine/src/layout/mod.rs
@@ -5360,3 +5360,7 @@ mod tests;
 #[cfg(test)]
 #[path = "crossings_tests.rs"]
 mod crossings_tests;
+
+#[cfg(test)]
+#[path = "layout_selection_tests.rs"]
+mod layout_selection_tests;

From 829396e291dccfc9bf2af12ee654f2c9311dccdc Mon Sep 17 00:00:00 2001
From: Bobby Powers <bobbypowers@gmail.com>
Date: Sat, 23 May 2026 12:15:58 -0700
Subject: [PATCH 32/38] engine: add deterministic weighted_cost regression
 guard

Add a fast, deterministic guard (AC7.1) over three tiny programmatic models
built with TestProject -- a population feedback model, a pure aux dependency
chain, and a two-stock transfer. Each is laid out at the fixed annealing seed
42 and its calibrated weighted_cost asserted at or below a committed per-model
ceiling. The ceilings are observed-cost ceilings (observed at seed 42:
pop 0.0533, chain 0.0, two_stock 0.1646) set a small margin above the
observed value, with a comment documenting how to regenerate them after an
intentional metric/weight change.

For AC7.2, a companion test takes a real fixed-seed layout and piles every
node onto the origin (blowing up node_overlap to ~6.2) and asserts the result
exceeds the pop ceiling, proving the guard discriminates good layouts from
bad rather than passing vacuously. The whole guard runs in well under a
second (tiny models, one seed each).
---
 .../src/layout/layout_selection_tests.rs      | 156 ++++++++++++++++++
 1 file changed, 156 insertions(+)

diff --git a/src/simlin-engine/src/layout/layout_selection_tests.rs b/src/simlin-engine/src/layout/layout_selection_tests.rs
index 09a187b9d..fdb00f732 100644
--- a/src/simlin-engine/src/layout/layout_selection_tests.rs
+++ b/src/simlin-engine/src/layout/layout_selection_tests.rs
@@ -12,6 +12,12 @@
 
 use super::*;
 use crate::datamodel;
+use crate::layout::metrics::{MetricWeights, compute_layout_metrics};
+use crate::test_common::TestProject;
+
+/// `TestProject::build_datamodel` synthesizes a single model named `"main"`, so
+/// every `generate_layout_with_config` call in this file targets that name.
+const MAIN_MODEL: &str = "main";
 
 /// A scalar aux at (`x`, `y`) with a unique name, so a selected view can be
 /// identified by which marker element it carries.
@@ -253,3 +259,153 @@ fn test_select_best_layout_nan_first_is_sticky_documented_limitation() {
          (documented limitation)"
     );
 }
+
+// ---- AC7: deterministic weighted_cost regression guard ----
+//
+// The thresholds below are observed-cost CEILINGS captured at the fixed
+// annealing seed 42 with the calibrated `MetricWeights::default()`. They guard
+// against layout-quality regressions: if a change to the layout algorithm,
+// metric, or weights pushes a tiny model's fixed-seed `weighted_cost` above its
+// ceiling, this test fails loudly. Each ceiling sits a small margin above the
+// observed cost (roughly observed * 1.15, or a small absolute floor when the
+// observed cost is 0) -- tight enough to catch a real regression, loose enough
+// not to flake on float noise.
+//
+// To regenerate after an INTENTIONAL metric/weight change: layout is
+// deterministic per seed, so print the new `weighted_cost` for each guard model
+// (e.g. add a temporary `println!` to `guard_fixed_seed_cost`), run this test
+// once, and reset each ceiling a small margin above the new observed value.
+// Lowering a ceiling that no longer matches reality is fine; raising one to
+// paper over a real regression is not.
+//
+// Observed at seed 42 (2026-05-23): pop = 0.0533, chain = 0.0,
+// two_stock = 0.1646.
+const GUARD_POP_COST_CEILING: f64 = 0.06;
+const GUARD_CHAIN_COST_CEILING: f64 = 0.05;
+const GUARD_TWO_STOCK_COST_CEILING: f64 = 0.19;
+
+/// Lay `project`'s `main` model out at the fixed seed 42 and return its
+/// calibrated `weighted_cost`. Seeding explicitly (rather than relying on the
+/// `LayoutConfig::default()` seed) keeps the guard pinned to one reproducible
+/// layout even if the default seed changes.
+fn guard_fixed_seed_cost(project: &datamodel::Project) -> f64 {
+    let config = LayoutConfig {
+        annealing_random_seed: 42,
+        ..LayoutConfig::default()
+    };
+    let view = generate_layout_with_config(project, MAIN_MODEL, config.clone(), None)
+        .expect("layout generation should succeed");
+    compute_layout_metrics(&view, &config).weighted_cost(&MetricWeights::default())
+}
+
+/// A population stock with births/deaths flows and two rate auxes -- the
+/// canonical tiny feedback model.
+fn guard_pop_model() -> datamodel::Project {
+    TestProject::new("guard_pop")
+        .stock("population", "100", &["births"], &["deaths"], None)
+        .flow("births", "population * birth_rate", None)
+        .flow("deaths", "population * death_rate", None)
+        .aux("birth_rate", "0.03", None)
+        .aux("death_rate", "0.01", None)
+        .build_datamodel()
+}
+
+/// A pure auxiliary dependency chain (no stocks): a -> b -> c -> d.
+fn guard_chain_model() -> datamodel::Project {
+    TestProject::new("guard_chain")
+        .aux("a", "1", None)
+        .aux("b", "a * 2", None)
+        .aux("c", "b + a", None)
+        .aux("d", "c * b", None)
+        .build_datamodel()
+}
+
+/// A two-stock transfer model: source -> transfer -> sink, rate-driven.
+fn guard_two_stock_model() -> datamodel::Project {
+    TestProject::new("guard_two_stock")
+        .stock("source", "100", &[], &["transfer"], None)
+        .stock("sink", "0", &["transfer"], &[], None)
+        .flow("transfer", "source * rate", None)
+        .aux("rate", "0.1", None)
+        .build_datamodel()
+}
+
+/// AC7.1: the fixed-seed `weighted_cost` of each tiny guard model stays at or
+/// below its committed ceiling. Fast and deterministic: three tiny models, one
+/// seed each.
+#[test]
+fn test_weighted_cost_regression_guard() {
+    let cases: [(&str, datamodel::Project, f64); 3] = [
+        ("pop", guard_pop_model(), GUARD_POP_COST_CEILING),
+        ("chain", guard_chain_model(), GUARD_CHAIN_COST_CEILING),
+        (
+            "two_stock",
+            guard_two_stock_model(),
+            GUARD_TWO_STOCK_COST_CEILING,
+        ),
+    ];
+
+    for (name, project, ceiling) in cases {
+        let cost = guard_fixed_seed_cost(&project);
+        assert!(
+            cost <= ceiling,
+            "{name}: fixed-seed weighted_cost {cost} exceeded ceiling {ceiling} \
+             -- a layout-quality regression (or an intentional metric/weight \
+             change that needs the ceiling regenerated)"
+        );
+    }
+}
+
+/// AC7.2: the guard ceiling actually discriminates good layouts from bad ones.
+/// We take a real fixed-seed layout of the pop model and pile every node onto
+/// the same coordinate, blowing up the node-overlap term, then assert the
+/// resulting `weighted_cost` exceeds the ceiling -- so a real layout that
+/// regressed to this level WOULD trip `test_weighted_cost_regression_guard`.
+/// This makes the failure direction explicit and testable without flakiness.
+#[test]
+fn test_weighted_cost_guard_rejects_degenerate_layout() {
+    let project = guard_pop_model();
+    let config = LayoutConfig {
+        annealing_random_seed: 42,
+        ..LayoutConfig::default()
+    };
+    let view = generate_layout_with_config(&project, MAIN_MODEL, config.clone(), None)
+        .expect("layout generation should succeed");
+
+    // Collapse every positioned node onto the origin so the shapes overlap
+    // maximally (links/aliases/groups have no independent position).
+    let mut degenerate = view.clone();
+    for elem in &mut degenerate.elements {
+        match elem {
+            ViewElement::Aux(a) => {
+                a.x = 0.0;
+                a.y = 0.0;
+            }
+            ViewElement::Stock(s) => {
+                s.x = 0.0;
+                s.y = 0.0;
+            }
+            ViewElement::Flow(f) => {
+                f.x = 0.0;
+                f.y = 0.0;
+            }
+            ViewElement::Module(m) => {
+                m.x = 0.0;
+                m.y = 0.0;
+            }
+            ViewElement::Cloud(c) => {
+                c.x = 0.0;
+                c.y = 0.0;
+            }
+            ViewElement::Link(_) | ViewElement::Alias(_) | ViewElement::Group(_) => {}
+        }
+    }
+
+    let degenerate_cost =
+        compute_layout_metrics(&degenerate, &config).weighted_cost(&MetricWeights::default());
+    assert!(
+        degenerate_cost > GUARD_POP_COST_CEILING,
+        "a degenerate all-overlapping layout (cost {degenerate_cost}) must exceed \
+         the guard ceiling {GUARD_POP_COST_CEILING}, proving the guard discriminates"
+    );
+}

From 53d9c2c20da7fd20c5b7eec2e867353059fae0f3 Mon Sep 17 00:00:00 2001
From: Bobby Powers <bobbypowers@gmail.com>
Date: Sat, 23 May 2026 12:20:02 -0700
Subject: [PATCH 33/38] engine: assert per-seed layout determinism
 (byte-identical)

Add an AC8.1 determinism test: laying out the same model twice through
generate_layout_with_config at the same explicit seed yields two StockFlow
values that compare equal (StockFlow derives PartialEq, so every field --
positions, view box, element order -- is checked).

The test uses a seed-sensitive model (a stock fed/drained by ten leaf auxes
through two flows) where the SFDP/annealing RNG genuinely shapes the layout,
and additionally asserts that a different seed produces a different layout --
so the same-seed equality is a real determinism guarantee, not a vacuous pass
on a tiny model whose arrangement ignores the seed. A comment notes this
per-seed reproducibility is distinct from the Phase 3 M-seed statistical
sweep, which deliberately varies the seed.
---
 .../src/layout/layout_selection_tests.rs      | 66 +++++++++++++++++++
 1 file changed, 66 insertions(+)

diff --git a/src/simlin-engine/src/layout/layout_selection_tests.rs b/src/simlin-engine/src/layout/layout_selection_tests.rs
index fdb00f732..e75d8e7e3 100644
--- a/src/simlin-engine/src/layout/layout_selection_tests.rs
+++ b/src/simlin-engine/src/layout/layout_selection_tests.rs
@@ -409,3 +409,69 @@ fn test_weighted_cost_guard_rejects_degenerate_layout() {
          the guard ceiling {GUARD_POP_COST_CEILING}, proving the guard discriminates"
     );
 }
+
+/// A model with enough nodes (a stock fed/drained by ten leaf auxes through two
+/// flows) that the SFDP/annealing RNG genuinely shapes the layout, so two
+/// different seeds produce two different layouts. The tiny guard models above
+/// converge to one arrangement regardless of seed, which would make a
+/// determinism check vacuous; this model exercises the seeded path.
+fn guard_seed_sensitive_model() -> datamodel::Project {
+    let mut tp = TestProject::new("guard_seed_sensitive")
+        .stock("s", "100", &["inflow"], &["outflow"], None)
+        .flow("inflow", "a1 + a2 + a3 + a4 + a5", None)
+        .flow("outflow", "b1 + b2 + b3 + b4 + b5", None);
+    for i in 1..=5 {
+        tp = tp.aux(&format!("a{i}"), "1", None);
+        tp = tp.aux(&format!("b{i}"), "1", None);
+    }
+    tp.build_datamodel()
+}
+
+/// Lay `project`'s `main` model out at `seed`.
+fn layout_at_seed(project: &datamodel::Project, seed: u64) -> datamodel::StockFlow {
+    let config = LayoutConfig {
+        annealing_random_seed: seed,
+        ..LayoutConfig::default()
+    };
+    generate_layout_with_config(project, MAIN_MODEL, config, None)
+        .expect("layout generation should succeed")
+}
+
+/// AC8.1: a fixed seed reproduces a byte-identical layout. Generating the same
+/// model twice through `generate_layout_with_config` at the same explicit seed
+/// must yield two `StockFlow` values that compare equal (`StockFlow` derives
+/// `PartialEq`, so this checks every field -- positions, view box, element
+/// order -- not just element counts).
+///
+/// We use a seed-sensitive model and also assert that a DIFFERENT seed yields a
+/// DIFFERENT layout, so the same-seed equality is a real determinism guarantee
+/// rather than a vacuous pass on a model whose layout ignores the seed.
+///
+/// This per-seed reproducibility is distinct from the Phase 3 M-seed
+/// statistical sweep, which deliberately VARIES the seed to sample the layout
+/// distribution. Here the seed is held fixed and the layout must be exactly
+/// repeatable; there the seed sweeps and the layouts are expected to differ.
+/// The integration test `tests/layout.rs` already asserts `view1 == view2` for
+/// `generate_layout`; this focused in-crate test covers the
+/// `generate_layout_with_config` + explicit-seed Rung-0 path.
+#[test]
+fn test_layout_is_byte_identical_for_fixed_seed() {
+    let project = guard_seed_sensitive_model();
+
+    let view1 = layout_at_seed(&project, 7);
+    let view2 = layout_at_seed(&project, 7);
+    assert_eq!(
+        view1, view2,
+        "the same model at the same fixed seed must produce a byte-identical layout"
+    );
+
+    // Non-vacuity: a different seed must produce a different layout, proving the
+    // equality above reflects genuine per-seed determinism (not a seed-agnostic
+    // model where any pair would compare equal).
+    let other = layout_at_seed(&project, 999);
+    assert_ne!(
+        view1, other,
+        "a different seed should produce a different layout, so the same-seed \
+         equality is a meaningful determinism guarantee"
+    );
+}

From e78c8a50d7c105156ab51baeb03ee0f197aa8d85 Mon Sep 17 00:00:00 2001
From: Bobby Powers <bobbypowers@gmail.com>
Date: Sat, 23 May 2026 12:30:21 -0700
Subject: [PATCH 34/38] engine: make select_best_layout prefer finite cost over
 NaN regardless of order

The fold in select_best_layout kept the running best whenever the
challenger's `<` comparison was false. That correctly drops a NaN-cost
challenger (`NaN < finite` is false), but it also failed to let a finite
challenger overtake a NaN running best (`finite < NaN` and `finite ==
NaN` are both false). So a degenerate NaN-cost layout from the first seed
in the fixed order [42, 123, 456, 789] was sticky: generate_best_layout
would ship it even when a later seed produced a usable finite layout.

Add explicit NaN branches so a NaN challenger never wins and a finite
challenger always beats a NaN running best, making the guard
order-independent. If every candidate is NaN the earliest is kept
deterministically. The metric is NaN-free in practice
(compute_layout_metrics guards every division), so this is defensive
robustness rather than an observed production failure.
---
 .../src/layout/layout_selection_tests.rs      | 53 ++++++++++++++-----
 src/simlin-engine/src/layout/mod.rs           | 31 +++++++----
 2 files changed, 61 insertions(+), 23 deletions(-)

diff --git a/src/simlin-engine/src/layout/layout_selection_tests.rs b/src/simlin-engine/src/layout/layout_selection_tests.rs
index e75d8e7e3..e7e832e55 100644
--- a/src/simlin-engine/src/layout/layout_selection_tests.rs
+++ b/src/simlin-engine/src/layout/layout_selection_tests.rs
@@ -227,18 +227,17 @@ fn test_select_best_layout_nan_challenger_never_displaces_finite() {
     );
 }
 
-/// AC6.1 (NaN-first limitation, documented): the current fold seeds the running
-/// best with the FIRST result and only replaces it when a challenger compares
-/// strictly less (or ties on cost with a lower seed). A NaN seeded as the
-/// running best is therefore sticky -- `finite < NaN` is false and `finite ==
-/// NaN` is false, so no later finite candidate overtakes it. In production
-/// (`generate_best_layout` runs seeds in the fixed order [42, 123, 456, 789]),
-/// this means a degenerate NaN-cost layout from the first seed would be shipped
-/// even when a later seed produced a finite, usable layout. This test pins that
-/// real behavior so the limitation is explicit, not silently assumed away;
-/// tightening the fold to skip NaN running-bests is tracked separately.
+/// AC6.1 (NaN safety, order-independent): a finite challenger must beat a NaN
+/// running best regardless of position. The fold seeds the running best with the
+/// FIRST result, so a degenerate NaN-cost layout from the first seed could
+/// otherwise become a sticky running best (`finite < NaN` is false and `finite
+/// == NaN` is false, so a plain `<` comparison never overtakes it). The fold
+/// special-cases a NaN running best so a later finite candidate always wins. In
+/// production (`generate_best_layout` runs seeds in the fixed order [42, 123,
+/// 456, 789]), this guarantees a usable finite layout is shipped whenever ANY
+/// seed produced one, no matter which seed degenerated.
 #[test]
-fn test_select_best_layout_nan_first_is_sticky_documented_limitation() {
+fn test_select_best_layout_finite_beats_nan_running_best() {
     let nan_first = vec![
         Ok(LayoutResult {
             view: sel_view("nan", vec![marker_aux(1, "a", 0.0, 0.0)]),
@@ -254,9 +253,35 @@ fn test_select_best_layout_nan_first_is_sticky_documented_limitation() {
     let best = select_best_layout(nan_first).expect("selection should succeed");
     assert_eq!(
         best.name.as_deref(),
-        Some("nan"),
-        "a NaN seeded as the running best is sticky under the current fold \
-         (documented limitation)"
+        Some("finite"),
+        "a finite challenger must beat a NaN running best regardless of order"
+    );
+}
+
+/// AC6.1 (NaN safety, all-NaN determinism): when EVERY candidate has a NaN cost,
+/// neither the `<` comparison nor the NaN special-cases fire (a NaN challenger is
+/// never "better"), so the earliest candidate is kept. This is deterministic
+/// regardless of seed order -- the production caller would ship the first seed's
+/// (degenerate) layout, but the choice is reproducible rather than arbitrary.
+#[test]
+fn test_select_best_layout_all_nan_keeps_earliest() {
+    let all_nan = vec![
+        Ok(LayoutResult {
+            view: sel_view("first", vec![marker_aux(1, "a", 0.0, 0.0)]),
+            weighted_cost: f64::NAN,
+            seed: 456,
+        }),
+        Ok(LayoutResult {
+            view: sel_view("second", vec![marker_aux(1, "a", 0.0, 0.0)]),
+            weighted_cost: f64::NAN,
+            seed: 42,
+        }),
+    ];
+    let best = select_best_layout(all_nan).expect("selection should succeed");
+    assert_eq!(
+        best.name.as_deref(),
+        Some("first"),
+        "when all candidates are NaN the earliest is kept deterministically"
     );
 }
 
diff --git a/src/simlin-engine/src/layout/mod.rs b/src/simlin-engine/src/layout/mod.rs
index a3132f323..f8705a907 100644
--- a/src/simlin-engine/src/layout/mod.rs
+++ b/src/simlin-engine/src/layout/mod.rs
@@ -5325,8 +5325,11 @@ pub fn compute_layout_metadata(
 }
 
 /// Pick the layout that minimizes the full calibrated layout-quality metric
-/// (`weighted_cost`); on tie, the one from the lowest seed. The first `Err`
-/// short-circuits, and an empty result set is an error.
+/// (`weighted_cost`); on tie, the one from the lowest seed. NaN-cost candidates
+/// (degenerate layouts) never win over a finite one regardless of position in
+/// the result set; if ALL candidates are NaN the earliest is kept
+/// deterministically. The first `Err` short-circuits, and an empty result set is
+/// an error.
 fn select_best_layout(
     results: Vec<Result<LayoutResult, String>>,
 ) -> Result<datamodel::StockFlow, String> {
@@ -5337,13 +5340,23 @@ fn select_best_layout(
         best = Some(match best {
             None => lr,
             Some(prev) => {
-                // NaN-safe by construction: `<` yields false when
-                // `lr.weighted_cost` is NaN, so a degenerate NaN-cost candidate
-                // never displaces a finite one. If BOTH costs are NaN the
-                // tie-break does not fire either (`==` is false for NaN), so the
-                // earlier candidate is kept -- deterministic regardless.
-                let better = lr.weighted_cost < prev.weighted_cost
-                    || (lr.weighted_cost == prev.weighted_cost && lr.seed < prev.seed);
+                // NaN-safe and order-independent: a degenerate NaN-cost
+                // candidate never wins over a finite one regardless of which
+                // came first. A plain `<` already drops a NaN *challenger*
+                // (`NaN < finite` is false), but it would NOT let a finite
+                // challenger overtake a NaN *running best* (`finite < NaN` and
+                // `finite == NaN` are both false), so the first seed's NaN would
+                // be sticky. The explicit NaN branches fix that asymmetry. If
+                // ALL candidates are NaN the challenger is never better, so the
+                // earliest is kept -- deterministic regardless.
+                let better = if lr.weighted_cost.is_nan() {
+                    false // a NaN challenger never wins
+                } else if prev.weighted_cost.is_nan() {
+                    true // a finite challenger always beats a NaN running best
+                } else {
+                    lr.weighted_cost < prev.weighted_cost
+                        || (lr.weighted_cost == prev.weighted_cost && lr.seed < prev.seed)
+                };
                 if better { lr } else { prev }
             }
         });

From 4754e6c2ab7f2fad10fb716598019092b680dd0d Mon Sep 17 00:00:00 2001
From: Bobby Powers <bobbypowers@gmail.com>
Date: Sat, 23 May 2026 12:45:08 -0700
Subject: [PATCH 35/38] doc: update simlin-engine context for layout-quality
 metric

Document the layout-quality eval branch's contract and structure changes in
src/simlin-engine/CLAUDE.md: the new pure metrics.rs (LayoutMetrics /
MetricWeights / compute_layout_metrics / weighted_cost) and eval_stats.rs
modules, the per-seed bit-identical layout determinism guarantee (#633), the
shift of generate_best_layout/select_best_layout from fewest-crossings to the
calibrated weighted_cost (NaN-safe), the rebuilt count_view_crossings /
build_view_segments crossing geometry, the now-public LAYOUT_SEEDS, the
diagram submodules widened to pub(crate) plus connector_polyline and the
label-free *_shape_bounds / rect-segment helpers they share with the metric,
and the gated layout_eval example.
---
 src/simlin-engine/CLAUDE.md | 12 +++++++++---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/src/simlin-engine/CLAUDE.md b/src/simlin-engine/CLAUDE.md
index bf9149be8..3b3973977 100644
--- a/src/simlin-engine/CLAUDE.md
+++ b/src/simlin-engine/CLAUDE.md
@@ -114,8 +114,13 @@ The unit subsystem is partial-result throughout: a single bad declaration or one
 - **`src/ltm_augment.rs`** - Equation generators for LTM synthetic variables: `generate_link_score_equation_for_link` (ceteris-paribus link scores; takes `RefShape` and source dimension elements to drive per-shape PREVIOUS wrapping), `generate_loop_score_variables` (emits one `loop_score` per loop; relative loop scores are computed post-simulation in `ltm_post.rs`), `build_partial_equation_shaped` (AST-based PREVIOUS wrapping that holds matching-shape references live and wraps everything else, via `wrap_non_matching_in_previous` and `classify_expr0_subscript_shape`; arrayed-per-element-equation (`Ast::Arrayed`) targets get one partial per element assembled into an `Equation::Arrayed`), `link_score_var_name` (synthetic name helper: Bare gets the canonical `{from}\u{2192}{to}` form, FixedIndex prepends `[elem]` to from; the obsolete per-shape `\u{205A}wildcard`/`\u{205A}dynamic` Wildcard/DynamicIndex suffixes were retired -- those shapes now collapse onto the Bare name, since *every statically-describable* inlined reducer (whole-extent or sliced) is hoisted into a `$⁚ltm⁚agg⁚{n}` node and only a `DynamicIndex` reference -- `arr[i+1]`, a range, or the not-hoistable dynamic-index reducer carve-out `SUM(pop[idx,*])` -- and a whole-RHS variable-backed reducer's `Wildcard` argument reach this function), `quote_ident` (identifier quoting for equations). Array support: `classify_reducer` (walks target Expr2 AST to identify reducing builtins -- Linear for SUM/MEAN, Nonlinear for MIN/MAX/STDDEV/RANK, Constant for SIZE -- a thin reader of `ltm_agg::reducer_kind`), `generate_element_to_scalar_equation` (per-element link score equations for arrayed-to-scalar edges, used by both the variable-backed-reducer path and the `source[d] → $⁚ltm⁚agg⁚{n}` half) which dispatches on `ReducerKind` -- `generate_linear_partial` (SUM/MEAN algebraic shortcut), `generate_nonlinear_partial` (MIN/MAX nested binary calls; STDDEV the unrolled population-variance `sqrt` ceteris-paribus partial -- divisor `N`, matching `vm.rs::Opcode::ArrayStddev`, with the mean string-inlined; RANK the documented delta-ratio stand-in pinned by `test_generate_rank_keeps_delta_ratio` -- an order statistic, non-differentiable and unreachable via a real model RHS), `generate_scalar_to_element_equation` (per-element link score for the `$⁚ltm⁚agg⁚{n} → target[e]` half; takes a `source_ref_override: Option<&str>` so a multi-slot arrayed agg's `Δsource` denominator carries the projected `agg[<slot>]` subscript instead of the bare agg name, which wouldn't compile as a scalar), `substitute_reducers_in_expr0` (textually replaces a recognized reducer subexpression in an `Expr0` with its agg name, for the `$⁚ltm⁚agg⁚{n} → target` link score), `resolve_link_score_name_for_loop` (picks the Bare-or-FixedIndex link-score name a loop-score reference should target). Module link score formulas (black-box delta-ratio and composite-ref) are inlined directly into `link_score_equation_text` in `db.rs`.
 - **`src/ltm_post.rs`** - Post-simulation relative loop score computation. `compute_rel_loop_scores(results, loop_partitions)` normalizes each loop's `loop_score` series against the sum of absolute scores within its cycle partition, using SAFEDIV-0 semantics (zero denominator -> zero result). Called after simulation rather than emitted as synthetic equations to avoid O(P^2) equation-text growth on models with dense partitions.
 - **LTM open work**: known LTM bugs and improvements are tracked on GitHub under the `ltm` label; issue #488 is the pinned epic that organises them by area (core algorithm, discovery/post-sim, augmentation, module/array umbrellas). Each open `ltm`-labelled issue carries file:line references and a suggested fix, so a new session can pick a bite-sized piece without re-investigating the subsystem.
-- **`src/diagram/`** - Diagram/sketch rendering: `elements.rs`, `connector.rs`, `flow.rs`, `render.rs`, `common.rs`, `constants.rs`, `label.rs`, `arrowhead.rs`
-- **`src/layout/`** - Automatic diagram layout generation (available on all targets including WASM; uses serial fallback when rayon is unavailable). Two entry points: `generate_best_layout()` (public) generates a complete diagram from scratch; `incremental_layout()` (public) preserves existing element positions and adds/removes only what changed. Submodules: `sfdp.rs` (force-directed placement), `annealing.rs` (crossing reduction), `chain.rs` (stock-flow chain positioning), `config.rs` (layout parameters including `module_width`/`module_height`), `connector.rs` (link routing), `graph.rs` (graph data structures), `metadata.rs` (feedback loops, dominant periods), `placement.rs` (label optimization, normalization), `text.rs` (label sizing), `uid.rs` (UID management), `layout_tests.rs` (unit tests for composable layout blocks and incremental operations). `LayoutState` is the public mutable state struct used by both paths: `LayoutState::new()` for fresh layout, `LayoutState::from_existing_view()` for incremental. Incremental helpers: `identify_new_elements()`, `compute_new_element_positions()`, `settle_new_elements()`, `diff_connectors()`, `diff_clouds()`, `apply_deletion()`, `apply_rename()`. The convenience wrappers `generate_best_layout()` and `generate_layout_with_config()` remain as the primary public API for callers. Generates view elements for modules (not just stocks/flows/auxes).
+- **`src/diagram/`** - Diagram/sketch rendering: `elements.rs`, `connector.rs`, `flow.rs`, `render.rs`, `common.rs`, `constants.rs`, `label.rs`, `arrowhead.rs`. The `connector`/`elements`/`flow`/`label` submodules are `pub(crate)` so the layout-quality metric (`layout::metrics`) can reuse the exact same geometry the SVG renderer draws (a layout's score can never disagree with what is rendered). `connector.rs` exposes `pub(crate) connector_polyline` -- the polyline the renderer draws for a connector: straight links clipped to element boundaries (matching `render_straight_line`), arcs sampled along the arc circle (`ARC_POLYLINE_SAMPLES`, byte-identical to `render_arc`'s SVG), MultiPoint links returning empty (nothing is drawn for them today). `common.rs` carries the shared `Rect`/`Point`/`Circle` geometry plus `pub(crate)` rect/segment helpers (`rect_area`, `rect_overlap_area`, `rect_contains_point`, `segment_length_in_rect`, `rect_width`/`rect_height`). `elements.rs`/`flow.rs` expose label-free `*_shape_bounds` (`aux_shape_bounds`, `stock_shape_bounds`, `flow_shape_bounds`) alongside the label-merged `*_bounds`, so the metric can charge node-shape overlap and connector-under-shape against the bare shape and label-vs-label overlap separately (no double-count).
+- **`src/layout/`** - Automatic diagram layout generation (available on all targets including WASM; uses serial fallback when rayon is unavailable). Two entry points: `generate_best_layout()` (public) generates a complete diagram from scratch; `incremental_layout()` (public) preserves existing element positions and adds/removes only what changed. Submodules: `sfdp.rs` (force-directed placement), `annealing.rs` (crossing reduction), `chain.rs` (stock-flow chain positioning), `config.rs` (layout parameters including `module_width`/`module_height`), `connector.rs` (link routing), `graph.rs` (graph data structures), `metadata.rs` (feedback loops, dominant periods), `placement.rs` (label optimization, normalization), `text.rs` (label sizing), `uid.rs` (UID management), `metrics.rs` and `eval_stats.rs` (the layout-quality metric and its eval statistics; see below), `layout_tests.rs`/`crossings_tests.rs`/`layout_selection_tests.rs` (unit tests for composable layout blocks, crossing geometry, and best-of-k selection). `LayoutState` is the public mutable state struct used by both paths: `LayoutState::new()` for fresh layout, `LayoutState::from_existing_view()` for incremental. Incremental helpers: `identify_new_elements()`, `compute_new_element_positions()`, `settle_new_elements()`, `diff_connectors()`, `diff_clouds()`, `apply_deletion()`, `apply_rename()`. The convenience wrappers `generate_best_layout()` and `generate_layout_with_config()` remain as the primary public API for callers. Generates view elements for modules (not just stocks/flows/auxes).
+  - **Deterministic per seed (#633)**: `fresh_layout` and the incremental `diff_connectors` produce a bit-identical layout for a fixed `(model, seed)` across repeated calls. HashMap iteration order is per-process random, so every layout-affecting iteration over a `HashMap`/`HashSet` is materialized into a sorted `Vec` first: `run_sfdp_with_rigid_chains`'s `var_to_node` centroid/aux-placement loops, and `diff_connectors`'s new-edge / alias-match / preserved-link loops (which allocate sequential uids and append to `state.elements`).
+  - **Best-of-k selection by the calibrated metric**: `generate_best_layout` runs `LAYOUT_SEEDS` (now `pub` -- the eval sweep uses the same seed set as its production proxy) in parallel and `select_best_layout` picks the candidate minimizing `metrics::compute_layout_metrics(view, cfg).weighted_cost(&MetricWeights::default())` -- the full calibrated readability metric, NOT fewest crossings. Selection is NaN-safe (a degenerate NaN-cost layout never wins over a finite one regardless of order; all-NaN keeps the earliest) and ties break to the lowest seed. The `LayoutResult` struct carries `weighted_cost` (no separate `crossings` field; the metric's `crossings` term computes the accurate count internally).
+  - **`count_view_crossings` / `build_view_segments`**: `build_view_segments` is the single source of crossing geometry, shared with `metrics.rs`. Connector geometry comes from `diagram::connector::connector_polyline` (the exact drawn polyline: straight links clipped to boundaries, arcs sampled), and ALL element kinds are resolved by uid (Module/Alias links are no longer silently dropped -- the previous chord-based code only mapped Stock/Flow/Aux/Cloud). Vertex naming suppresses self- and shared-endpoint crossings (`elem_{uid}` endpoints, per-link `link_{uid}#{i}` interior arc samples); a flow's valve is injected as an `elem_{flow.uid}` pipe vertex so a link incident on the valve no longer miscounts as crossing the pipe.
+- **`src/layout/metrics.rs`** - Functional-Core layout-quality metric. `compute_layout_metrics(view, config) -> LayoutMetrics` is pure (no I/O), guaranteed finite (each division guards a zero denominator with 0), so empty/single-element views score all-zero. `LayoutMetrics` per-term costs (0.0 = ideal): `node_overlap` (pairwise node-shape-box overlap), `node_connector_overlap` (connector length under non-incident node shapes -- both on label-free shape boxes), `label_overlap` (per-label obscured fraction), `crossings`, `sprawl`, `edge_length_cv`, `aspect_penalty` (beyond the `TARGET_AR_MAX = 16:9` band), `loop_compactness` (mean isoperimetric `1 - Q` over feedback cycles), and the reserved `chain_straightness` (always 0.0). `LayoutMetrics::weighted_cost(&MetricWeights)` is `Sigma w_i * term_i`. `MetricWeights::default()` is the calibrated readability-dominant production set (overlap/crossings family at 1.0; `sprawl`/`edge_length_cv`/`aspect_penalty` deliberately 0.0 -- spreading out for legibility is good, not penalized; `loop_compactness` a gentle 0.25; `chain_straightness` 0.0). Both structs derive `Serialize`/`Deserialize` purely so the eval sweep can emit/round-trip its JSON artifacts.
+- **`src/layout/eval_stats.rs`** - Functional-Core benchstat-style statistics for the layout-quality seed-sample sweep: `geomean`/`percentile`/`median`/`mann_whitney_u` (non-parametric significance test) plus the `MetricSample`/`ModelStats`/`CorpusReport`/`Comparison` aggregation types and `compare(baseline, candidate)`. No I/O; every primitive returns a finite documented default (`0.0`, or a non-significant `p_value` of `1.0`) on empty/degenerate input, never NaN.
 
 ## Utilities
 
@@ -150,7 +155,7 @@ The unit subsystem is partial-result throughout: a single bad declaration or one
 - **`tests/simulate_systems.rs`** - Systems format simulation integration tests (fixtures in `test/systems-format/`)
 - **`tests/simulate_ltm.rs`** - LTM feature tests
 - **`tests/systems_roundtrip.rs`** - Systems format parse-translate-write round-trip tests
-- **`tests/layout.rs`** - Layout generation integration tests (chains, connectors, modules, LTM metadata, dominant periods, incremental layout operations)
+- **`tests/layout.rs`** - Layout generation integration tests (chains, connectors, modules, LTM metadata, dominant periods, incremental layout operations, and the per-seed bit-identical-layout determinism guard for both the fresh and incremental paths -- #633)
 - **`tests/json_roundtrip.rs`** - JSON serialization roundtrip
 - **`tests/roundtrip.rs`** - XMILE/MDL roundtrip tests
 - **`tests/vm_alloc.rs`** - VM memory allocation tests
@@ -160,3 +165,4 @@ The unit subsystem is partial-result throughout: a single bad declaration or one
 - **`benches/compiler.rs`** - Compiler pipeline benchmarks on real models (WRLD3, C-LEARN)
 - **`benches/simulation.rs`** - VM execution and compilation benchmarks (synthetic models)
 - **`benches/array_ops.rs`** - Array operation benchmarks (sum, broadcast, element-wise)
+- **`examples/layout_eval.rs`** - On-demand layout-quality corpus sweep (gated `[[example]]` with `required-features = ["png_render", "file_io"]`, so the default `--all-targets` build skips it). Scores each model's best-of-`LAYOUT_SEEDS` layout with `metrics::compute_layout_metrics`, renders best/median/worst plus the reference PNGs, and emits `metrics.json` + `index.html` + a diff against the committed `examples/layout_eval_baseline.json` under `target/` (see `examples/layout_eval_baseline.README.md`)

From 37e2719b5b7023c5196bb6a607ca389a8b7a0fe8 Mon Sep 17 00:00:00 2001
From: Bobby Powers <bobbypowers@gmail.com>
Date: Sat, 23 May 2026 12:56:00 -0700
Subject: [PATCH 36/38] doc: add test plan for layout quality eval

---
 docs/README.md                                |  1 +
 .../2026-05-22-layout-quality-eval.md         | 85 +++++++++++++++++++
 2 files changed, 86 insertions(+)
 create mode 100644 docs/test-plans/2026-05-22-layout-quality-eval.md

diff --git a/docs/README.md b/docs/README.md
index a055aed08..f96bf8a82 100644
--- a/docs/README.md
+++ b/docs/README.md
@@ -36,6 +36,7 @@
 - [plans/](plans/README.md) -- Implementation plans (active and completed)
 - [test-plans/](test-plans/) -- Human verification plans for completed features
   - [test-plans/2026-05-22-engine-wasm-sim.md](test-plans/2026-05-22-engine-wasm-sim.md) -- Manual verification for the `@simlin/engine` selectable wasm engine (`Model.simulate({engine:'wasm'})`): re-running the automated gates, driving the gated/`#[ignore]`d heavy tests, and the human-judged extras (interactive scrubbing feel, VM-vs-wasm benchmark numbers); all 25 ACs already have automated coverage
+  - [test-plans/2026-05-22-layout-quality-eval.md](test-plans/2026-05-22-layout-quality-eval.md) -- Manual verification for the layout-quality eval: running the on-demand corpus sweep and inspecting its `target/layout-eval/` artifacts (metrics.json, the worst-first contact-sheet), plus the human-judgment calibration gate (best/median/worst ordering, reference-vs-auto scoring, weight magnitudes)
 - `implementation-plans/` -- Detailed phase-by-phase implementation plans, created during plan execution
 
 ## Security
diff --git a/docs/test-plans/2026-05-22-layout-quality-eval.md b/docs/test-plans/2026-05-22-layout-quality-eval.md
new file mode 100644
index 000000000..714220c42
--- /dev/null
+++ b/docs/test-plans/2026-05-22-layout-quality-eval.md
@@ -0,0 +1,85 @@
+# Test Plan: Layout Quality Evaluation
+
+Human verification plan for the layout-quality-eval feature (implementation plan
+`docs/implementation-plans/2026-05-22-layout-quality-eval/`). The automated suite
+proves the metric math, the selection rule, and per-seed determinism. This plan
+covers what automated tests cannot: that the on-demand corpus **sweep** emits the
+right artifacts, and that the **human-judgment** calls (best/median/worst
+ordering, reference-vs-auto scoring, weight magnitudes) match a modeler's eye.
+This is the gate for AC3.*, AC4.1-4.3, and the human-in-the-loop part of AC5.
+
+## Prerequisites
+
+- Repo at a commit including the layout-quality-eval branch, clean working tree.
+  Run `./scripts/dev-init.sh`.
+- Toolchain that can build `resvg` (the `png_render` feature):
+  `cargo build -p simlin-engine --features png_render,file_io --example layout_eval`
+  should finish without error.
+- A browser to open `target/layout-eval/index.html`, and a JSON viewer / `jq`
+  for `target/layout-eval/metrics.json`.
+- Automated gate already green:
+  `cargo test -p simlin-engine --lib layout::` and
+  `cargo test -p simlin-engine --features file_io --test layout`.
+
+## Phase 1: Time-boxed smoke run (fast confidence)
+
+| Step | Action | Expected |
+|------|--------|----------|
+| 1 | `LAYOUT_EVAL_MODELS=teacup,sir LAYOUT_EVAL_SEEDS=4 cargo run --release -p simlin-engine --features png_render,file_io --example layout_eval` | Exits 0 (AC3.1). stdout prints a per-model `sir: median=… p25/p75=…/… best_of_k=… (M=4)` line and `corpus: geomean_of_medians=… (2 model(s) scored)`. |
+| 2 | `ls target/layout-eval/` | Contains `metrics.json`, `index.html`, and PNGs: `sir_best/median/worst/reference.png`, `teacup_best/median/worst/reference.png`. |
+| 3 | `git status --porcelain target/` | Empty — nothing under `target/` is tracked (AC3.5). |
+
+## Phase 2: Full corpus sweep + artifact inspection
+
+| Step | Action | Expected |
+|------|--------|----------|
+| 1 | `cargo run --release -p simlin-engine --features png_render,file_io --example layout_eval` (no env overrides: all corpus keys, M=25) | Exits 0. Each model prints its median/spread/best-of-k line; corpus aggregate at the end. Runtime is minutes (deliberately kept out of `cargo test`). |
+| 2 | Open `target/layout-eval/metrics.json` | Valid JSON. Each `per_model[]` has the full `LayoutMetrics` breakdown (`node_overlap`, `node_connector_overlap`, `label_overlap`, `crossings`, `sprawl`, `edge_length_cv`, `aspect_penalty`, `loop_compactness`, `chain_straightness`) + `weighted_cost`, `median_cost`, `spread`, `best_of_k_cost`, `best/median/worst_seed`. Top level has `geomean_of_medians` and the `weights` set (AC3.2). |
+| 3 | Verify AC4.2 by hand: collect each model's `median_cost`, compute their (epsilon-floored) geometric mean, compare to `geomean_of_medians` | The two agree to a few decimals. |
+| 4 | Open `target/layout-eval/index.html` in a browser | Contact sheet sorted **worst weighted_cost first**. Each model row shows best/median/worst (and reference where present) thumbnails with a per-term cost breakdown and the `median / p25/p75 / best_of_k / M=25` summary (AC3.3). Header shows `geomean_of_medians` and the weight set. |
+
+## Phase 3: Human-judgment checks (the calibration gate, AC5.1 / AC5.2)
+
+These are the calls only a human can make; sign-off here closes the
+human-in-the-loop component of AC5.
+
+| Step | Action | Expected (human judgment) |
+|------|--------|---------------------------|
+| 1 (best/median/worst ordering) | For 3-4 models (e.g. `sir`, `fishbanks`, `reliability`, `population`), look at the three generated thumbnails side by side | "best" should genuinely look cleanest (fewest overlaps/crossings, labels readable); "worst" messiest. If the metric's "best" looks worse than its "worst", that is calibration feedback — record it, do not silently accept it. |
+| 2 (reference vs auto) | For each model shipping a `*_reference.png`, compare it to that model's `*_best.png` and read both `weighted_cost` values | For `reliability`, `fishbanks`, `population`, `logistic-growth`: the hand-authored reference should both look cleaner and carry the lower `weighted_cost` (the human<auto direction the AC5.2 tests pin). For `sir`: the reference deliberately obscures more labels, so the auto scores lower — confirm that asymmetry looks right. |
+| 3 (weight magnitudes, AC5.1) | Read the weight set in the `index.html` header / `metrics.json` | Overlap + crossings family carry the dominant weights; `sprawl`/`edge_length_cv`/`aspect_penalty` are 0; `loop_compactness` is a small positive nudge (0.25); `chain_straightness` is 0. Confirm these still match intent over the contact sheet, then sign off. |
+
+## End-to-End: baseline-vs-candidate regression diff (AC4.3)
+
+Validates the full statistical-comparison path (per-model + aggregate deltas with
+Mann-Whitney U p-values + significance) a future tuning change would rely on.
+
+1. Seed a baseline: `LAYOUT_EVAL_WRITE_BASELINE=1 cargo run --release -p simlin-engine --features png_render,file_io --example layout_eval`. stdout notes the baseline was written to `examples/layout_eval_baseline.json`.
+2. Run a plain candidate sweep (no `WRITE_BASELINE`).
+3. In stdout and the `index.html` "baseline diff" section: each model shows a signed `delta_ratio` %, a `p_value`, and a significance verdict; an aggregate delta + verdict is shown.
+4. Sanity (matches automated AC4.5): an unchanged candidate vs the just-written baseline shows deltas near 0% and non-significant everywhere. A genuinely different candidate (e.g. after a deliberate weight change) shows non-zero deltas; large, consistent ones read as significant.
+5. Reset the committed baseline when done: `git checkout examples/layout_eval_baseline.json` (unless intentionally updating it).
+
+## End-to-End: skip-on-failure (AC3.6)
+
+Confirms one bad model never aborts the sweep.
+
+1. Run a sweep including a model whose file you temporarily make missing/unreadable.
+2. Expected: a `WARN: skipping {key}: {err}` line is printed, that model is absent from `metrics.json`/`index.html`, and the sweep still exits 0 and writes a report for the survivors. Restore the file afterward.
+
+## Human Verification Required
+
+| Criterion | Why Manual | Steps |
+|-----------|------------|-------|
+| AC5.1 (weight magnitudes) | Final numeric weights are a taste call over the contact sheet, not derivable from a test. | Phase 3 step 3. |
+| AC5.2 (reference-pair selection + sign-off) | Which models are agreed anchors and whether the human layout truly looks better is human judgment. | Phase 3 steps 1-2. |
+| AC8.2 (rungs 1-3 documented) | Documentation criterion; no implementation phase. | Read the "Additional Considerations / hill-climbing ladder" of `docs/design-plans/2026-05-22-layout-quality-eval.md`; confirm Rung 1 (`config.rs`/`sfdp.rs`/`annealing.rs`), Rung 2 (`annealing.rs`), Rung 3 (overlap-removal / obstacle-aware routing) are each named with their seam. |
+
+## Notes
+
+- Automated coverage was validated PASS against
+  `docs/implementation-plans/2026-05-22-layout-quality-eval/test-requirements.md`
+  (20/20 automated criteria; AC3.* and AC4.1-4.3 operational by design; AC8.2 documentation).
+- The corpus sweep is intentionally **not** part of `cargo test` (it renders PNGs
+  and runs for minutes). It is an on-demand developer tool whose artifacts live
+  under the gitignored `target/layout-eval/`.

From 35ef355faa832b6196b30bf1c6dd07e0c7db3f78 Mon Sep 17 00:00:00 2001
From: Bobby Powers <bobbypowers@gmail.com>
Date: Sat, 23 May 2026 14:24:44 -0700
Subject: [PATCH 37/38] engine: score loop_compactness on flow valves, not
 pipe-extent centers

build_loop_graph computed each loop-polygon vertex as the midpoint of the
element's bare shape box. For flows that box is flow_shape_bounds -- the valve
box unioned with every pipe point -- so its midpoint is the pipe-extent center,
which drifts off the valve (flow.x, flow.y) whenever the valve is dragged
off-center or the pipe is bent. Flows are vertices in the loop graph
(stock->flow->stock edges), so loop_compactness, and therefore the weighted_cost
that Phase-5 Rung-0 select_best_layout minimizes, was being skewed by pipe
geometry rather than true loop geometry.

Compute each loop vertex via diagram::connector::get_visual_center -- the same
visual center the SVG renderer uses -- which returns the valve for a flow and
the element center (the symmetric shape-box midpoint) for aux/stock/module/cloud,
so only flow vertices change. The node-membership gate stays node_shape_box. This
restores the module's same-geometry-as-renderer invariant and corrects the now-
stale LoopGraph/build_loop_graph doc comments that claimed the shape box is
always the element center. Addresses the PR #637 review comment.
---
 src/simlin-engine/src/layout/metrics.rs | 140 +++++++++++++++++++++---
 1 file changed, 125 insertions(+), 15 deletions(-)

diff --git a/src/simlin-engine/src/layout/metrics.rs b/src/simlin-engine/src/layout/metrics.rs
index 201259e3b..2db161173 100644
--- a/src/simlin-engine/src/layout/metrics.rs
+++ b/src/simlin-engine/src/layout/metrics.rs
@@ -22,7 +22,7 @@ use crate::diagram::common::{
     self, Point, Rect, display_name, merge_bounds, rect_area, rect_overlap_area,
     segment_length_in_rect,
 };
-use crate::diagram::connector::{ARC_POLYLINE_SAMPLES, connector_polyline};
+use crate::diagram::connector::{ARC_POLYLINE_SAMPLES, connector_polyline, get_visual_center};
 use crate::diagram::elements::{
     aux_bounds, aux_shape_bounds, cloud_bounds, module_bounds, stock_bounds, stock_shape_bounds,
 };
@@ -366,34 +366,48 @@ const MAX_CYCLE_LEN: usize = 12;
 const MAX_CYCLES: usize = 64;
 
 /// Directed adjacency over positioned node-box elements, keyed by uid with
-/// sorted successor lists. The center of each node's bare *shape* box (which is
-/// symmetric about the element position, so it is the element center -- unlike
-/// the asymmetric label-merged `node_box`) is recorded for the polygon geometry.
+/// sorted successor lists. Each node's loop vertex is the renderer's VISUAL
+/// center (`diagram::connector::get_visual_center`) -- for a flow that is its
+/// VALVE `(flow.x, flow.y)`, NOT the pipe-extent center of `flow_shape_bounds`
+/// (which unions the valve box with every pipe point and so drifts off the valve
+/// when the pipe is bent or the valve is dragged off-center); for an
+/// aux/stock/module/cloud it is the element center, which already equals the
+/// symmetric shape-box midpoint. Using the same visual center the SVG renderer
+/// draws keeps the loop polygon faithful to the drawn diagram.
 struct LoopGraph {
     /// uid -> sorted, de-duplicated successor uids.
     adj: BTreeMap<i32, Vec<i32>>,
-    /// uid -> node-box center point.
+    /// uid -> node visual-center point (the valve for flows; the element center
+    /// for aux/stock/module/cloud).
     centers: BTreeMap<i32, Point>,
 }
 
 /// Build the directed loop graph from the view. Nodes are exactly the elements
-/// with a node box (`node_shape_box`). Edges to/from uids that are not
-/// positioned nodes are dropped. Edges come from:
+/// with a node box (`node_shape_box` -- aux/stock/module/cloud/flow; links,
+/// aliases, and groups are excluded). Each node's loop vertex is the renderer's
+/// VISUAL center (`get_visual_center`), so a flow's vertex is its VALVE
+/// `(flow.x, flow.y)`, NOT the pipe-extent center of `flow_shape_bounds` (the
+/// valve box unioned with every pipe point), which drifts off the valve when the
+/// pipe is bent or the valve is dragged off-center. For aux/stock/module/cloud
+/// the visual center is the element center, which already equals the symmetric
+/// shape-box midpoint, so those vertices are unchanged. Edges to/from uids that
+/// are not positioned nodes are dropped. Edges come from:
 ///   * each Link: `from_uid -> to_uid`;
 ///   * each Flow: for consecutive attached points, `source_attached -> flow.uid`
 ///     and `flow.uid -> dest_attached`, so a stock--flow--stock feedback path is
 ///     part of the graph (the flow's own valve is the intermediate node).
 fn build_loop_graph(view: &datamodel::StockFlow) -> LoopGraph {
+    // The node-membership gate stays `node_shape_box` (it defines which elements
+    // are loop nodes), but the loop VERTEX is the renderer's visual center, which
+    // is correct for every gated kind: the valve for a flow, the element center
+    // for aux/stock/module/cloud. `not_arrayed` matches `collect_connector_geometry`
+    // / `build_view_segments` (offset 0, deterministic).
+    let not_arrayed = |_: &str| false;
     let mut centers: BTreeMap<i32, Point> = BTreeMap::new();
     for e in &view.elements {
-        if let Some(r) = node_shape_box(e) {
-            centers.insert(
-                e.get_uid(),
-                Point {
-                    x: (r.left + r.right) / 2.0,
-                    y: (r.top + r.bottom) / 2.0,
-                },
-            );
+        if node_shape_box(e).is_some() {
+            let (cx, cy) = get_visual_center(e, &not_arrayed);
+            centers.insert(e.get_uid(), Point { x: cx, y: cy });
         }
     }
 
@@ -1926,6 +1940,102 @@ mod tests {
         );
     }
 
+    /// A stock--flow--stock loop whose flow has an extra pipe point placed far
+    /// from the valve, plus a closing link. The flow valve sits at `valve`; an
+    /// interior pipe point at `bend` (between the two attached endpoints) bends
+    /// the drawn pipe. `loop_compactness` must score the loop on the flow's
+    /// VALVE (its visual center), NOT on `flow_shape_bounds`' pipe-extent bbox
+    /// center, so the result must depend only on `valve` -- never on `bend`.
+    fn bent_flow_loop_view(valve: Point, bend: Point) -> datamodel::StockFlow {
+        let s1 = stock(1, "a", 0.0, 0.0);
+        let s2 = stock(2, "b", 300.0, 0.0);
+        let f = ViewElement::Flow(view_element::Flow {
+            name: "f".to_string(),
+            uid: 3,
+            x: valve.x,
+            y: valve.y,
+            label_side: LabelSide::Bottom,
+            points: vec![
+                view_element::FlowPoint {
+                    x: 0.0,
+                    y: 0.0,
+                    attached_to_uid: Some(1),
+                },
+                // An interior pipe point that bends the drawn pipe and stretches
+                // `flow_shape_bounds`' bbox, but is NOT the valve.
+                view_element::FlowPoint {
+                    x: bend.x,
+                    y: bend.y,
+                    attached_to_uid: None,
+                },
+                view_element::FlowPoint {
+                    x: 300.0,
+                    y: 0.0,
+                    attached_to_uid: Some(2),
+                },
+            ],
+            compat: None,
+            label_compat: None,
+        });
+        let link = straight_link(10, 2, 1);
+        make_view(vec![s1, s2, f, link])
+    }
+
+    #[test]
+    fn test_loop_compactness_scored_on_flow_valve_not_pipe_extent() {
+        // The loop vertex for a flow must be its VALVE (the renderer's visual
+        // center), not the center of `flow_shape_bounds` (which unions the valve
+        // box with every pipe point). Extending the pipe with a far interior
+        // point moves the pipe-extent bbox center but leaves the valve fixed, so
+        // `loop_compactness` -- which scores the feedback-loop polygon -- must be
+        // UNCHANGED. On the buggy (shape-box-midpoint) implementation it changes.
+        let valve = Point { x: 150.0, y: 200.0 };
+
+        // A pipe bend near the valve vs. one stretched far away. The valve is
+        // identical in both, so the loop polygon (stock--valve--stock) is too.
+        let near = compute_layout_metrics(
+            &bent_flow_loop_view(valve, Point { x: 150.0, y: 210.0 }),
+            &cfg(),
+        );
+        let far = compute_layout_metrics(
+            &bent_flow_loop_view(
+                valve,
+                Point {
+                    x: 150.0,
+                    y: 2000.0,
+                },
+            ),
+            &cfg(),
+        );
+
+        assert!(
+            near.loop_compactness > 0.0,
+            "fixture must form a real (positive-penalty) loop, got {}",
+            near.loop_compactness
+        );
+        assert!(
+            (near.loop_compactness - far.loop_compactness).abs() < 1e-12,
+            "loop_compactness must score the flow VALVE, not the pipe-extent bbox \
+             center: stretching the pipe changed it from {} to {}",
+            near.loop_compactness,
+            far.loop_compactness
+        );
+
+        // Non-vacuous guard: MOVING the valve (with the same pipe bend) DOES
+        // change the loop polygon, so the metric is not trivially constant.
+        let moved_valve = compute_layout_metrics(
+            &bent_flow_loop_view(Point { x: 150.0, y: 400.0 }, Point { x: 150.0, y: 210.0 }),
+            &cfg(),
+        );
+        assert!(
+            (near.loop_compactness - moved_valve.loop_compactness).abs() > 1e-9,
+            "moving the valve must change loop_compactness (test is not trivially \
+             constant): {} vs {}",
+            near.loop_compactness,
+            moved_valve.loop_compactness
+        );
+    }
+
     #[test]
     fn test_loop_compactness_deterministic_under_shuffle() {
         // loop_compactness is a mean over cycles, each computed from node-box

From 001bf34ef459d1466e4464f5e9e2e382f209e42a Mon Sep 17 00:00:00 2001
From: Bobby Powers <bobbypowers@gmail.com>
Date: Sat, 23 May 2026 19:03:39 -0700
Subject: [PATCH 38/38] engine: union per-segment node-connector overlap (no
 double-count across boxes)

node_connector_overlap is documented as a "fraction of total connector
length", but it accumulated segment_length_in_rect for every non-incident
node shape box, summing the per-box scalar lengths. When two non-incident
shape boxes overlapped, the connector sub-length in the overlap region was
counted once per box, so the term could exceed 1.0 -- breaking the
fraction-in-[0,1] contract, over-inflating weighted_cost (this term's weight
is 1.0), and skewing Phase-5 Rung-0 select_best_layout. Overlapping shape
boxes are common because a Flow's node_shape_box is its whole-pipe bounding
box, which frequently overlaps stocks/auxes/other flows.

The fix unions the per-segment clip intervals across all non-incident boxes
before summing, so each physical sub-length is counted at most once. A new
pub(crate) segment_clip_interval_in_rect exposes the Liang-Barsky [t0, t1]
core; segment_length_in_rect now delegates to it (behavior-preserving), and
the metric merges the intervals per segment. Single-non-incident-box layouts
are unchanged (nothing to merge); overlapping-box layouts see the term
decrease toward the true union fraction, lowering weighted_cost in the safe
direction so the AC5.2 human<auto anchors and the AC7 regression-guard
ceilings continue to hold. Addresses the PR #637 review comment.
---
 src/simlin-engine/src/diagram/common.rs |  53 +++--
 src/simlin-engine/src/layout/metrics.rs | 255 +++++++++++++++++++++++-
 2 files changed, 289 insertions(+), 19 deletions(-)

diff --git a/src/simlin-engine/src/diagram/common.rs b/src/simlin-engine/src/diagram/common.rs
index 82310d2c3..683747f4d 100644
--- a/src/simlin-engine/src/diagram/common.rs
+++ b/src/simlin-engine/src/diagram/common.rs
@@ -139,11 +139,13 @@ pub fn rad_to_deg(r: f64) -> f64 {
 
 // These rectangle/segment geometry primitives are the load-bearing helpers for
 // the layout quality metric (`layout::metrics`). `rect_width`/`rect_height`/
-// `rect_area`/`rect_overlap_area`/`segment_length_in_rect` are consumed there
-// (node-overlap, label-overlap, node-connector-overlap, sprawl, and aspect
-// terms); `rect_contains_point` is a primitive kept for completeness and
-// exercised by the inline tests below, so it stays `#[allow(dead_code)]` until
-// a caller needs it.
+// `rect_area`/`rect_overlap_area` are consumed there (node-overlap,
+// label-overlap, sprawl, and aspect terms), and `segment_clip_interval_in_rect`
+// is the Liang-Barsky core that `node_connector_overlap` unions across boxes.
+// `rect_contains_point` and `segment_length_in_rect` are primitives kept for
+// completeness and as the single-box reference oracle the metric's tests check
+// the union path against, so each stays `#[allow(dead_code)]` until a non-test
+// caller needs it.
 
 /// Width of a rect (right - left). May be negative for a degenerate/inverted rect.
 pub(crate) fn rect_width(r: &Rect) -> f64 {
@@ -173,9 +175,19 @@ pub(crate) fn rect_contains_point(r: &Rect, p: &Point) -> bool {
     p.x >= r.left && p.x <= r.right && p.y >= r.top && p.y <= r.bottom
 }
 
-/// Length of the portion of segment p0->p1 that lies within axis-aligned rect r.
-/// Returns 0 if the segment never enters r. Pure; no allocation.
-pub(crate) fn segment_length_in_rect(p0: &Point, p1: &Point, r: &Rect) -> f64 {
+/// Clipped parameter interval `[t0, t1]` of segment `p0 + t*(p1-p0)` (t in
+/// [0,1]) that lies within axis-aligned rect `r`, or `None` if the segment never
+/// enters `r`. When `Some`, `0.0 <= t0 < t1 <= 1.0` (a zero-thickness touch
+/// where `t0 == t1` returns `None`, contributing no length). This is the
+/// Liang-Barsky core; `segment_length_in_rect` delegates to it, and
+/// `layout::metrics` uses the raw intervals to UNION a connector's coverage
+/// across multiple boxes so each physical sub-length is counted at most once.
+/// Pure; no allocation.
+pub(crate) fn segment_clip_interval_in_rect(
+    p0: &Point,
+    p1: &Point,
+    r: &Rect,
+) -> Option<(f64, f64)> {
     // Liang-Barsky clip of the parametric segment p0 + t*(p1-p0), t in [0,1],
     // against left/right/top/bottom slabs.
     let dx = p1.x - p0.x;
@@ -192,20 +204,20 @@ pub(crate) fn segment_length_in_rect(p0: &Point, p1: &Point, r: &Rect) -> f64 {
     for (p, q) in edges {
         if p == 0.0 {
             if q < 0.0 {
-                return 0.0; // parallel and outside this slab
+                return None; // parallel and outside this slab
             }
         } else {
             let t = q / p;
             if p < 0.0 {
                 if t > t1 {
-                    return 0.0;
+                    return None;
                 }
                 if t > t0 {
                     t0 = t;
                 }
             } else {
                 if t < t0 {
-                    return 0.0;
+                    return None;
                 }
                 if t < t1 {
                     t1 = t;
@@ -213,8 +225,23 @@ pub(crate) fn segment_length_in_rect(p0: &Point, p1: &Point, r: &Rect) -> f64 {
             }
         }
     }
-    let seg_len = (dx * dx + dy * dy).sqrt();
-    (t1 - t0).max(0.0) * seg_len
+    if t1 > t0 { Some((t0, t1)) } else { None }
+}
+
+/// Length of the portion of segment p0->p1 that lies within axis-aligned rect r.
+/// Returns 0 if the segment never enters r. Pure; no allocation. Delegates to
+/// `segment_clip_interval_in_rect` so the clip math lives in exactly one place.
+#[allow(dead_code)]
+pub(crate) fn segment_length_in_rect(p0: &Point, p1: &Point, r: &Rect) -> f64 {
+    match segment_clip_interval_in_rect(p0, p1, r) {
+        Some((t0, t1)) => {
+            let dx = p1.x - p0.x;
+            let dy = p1.y - p0.y;
+            let seg_len = (dx * dx + dy * dy).sqrt();
+            (t1 - t0) * seg_len
+        }
+        None => 0.0,
+    }
 }
 
 #[cfg(test)]
diff --git a/src/simlin-engine/src/layout/metrics.rs b/src/simlin-engine/src/layout/metrics.rs
index 2db161173..ff139faf3 100644
--- a/src/simlin-engine/src/layout/metrics.rs
+++ b/src/simlin-engine/src/layout/metrics.rs
@@ -20,7 +20,7 @@ use std::collections::{BTreeMap, BTreeSet, HashSet};
 use crate::datamodel::{self, ViewElement};
 use crate::diagram::common::{
     self, Point, Rect, display_name, merge_bounds, rect_area, rect_overlap_area,
-    segment_length_in_rect,
+    segment_clip_interval_in_rect,
 };
 use crate::diagram::connector::{ARC_POLYLINE_SAMPLES, connector_polyline, get_visual_center};
 use crate::diagram::elements::{
@@ -180,6 +180,32 @@ struct ConnectorGeometry {
     length: f64,
 }
 
+/// Total length of the UNION of parameter intervals `[t0, t1]` (each `t` in
+/// [0,1]), counting each covered sub-length once. Sorts by start then sweep-
+/// merges, so overlapping/adjacent intervals collapse. The next interval merges
+/// when its start is `<= ` the current end (no epsilon needed; equality is
+/// tolerated as adjacency). Mutates `intervals` (sorts in place); empty input
+/// yields 0.0. Order-independent in its result. PURE.
+fn merged_interval_length(intervals: &mut [(f64, f64)]) -> f64 {
+    if intervals.is_empty() {
+        return 0.0;
+    }
+    intervals.sort_by(|a, b| a.0.partial_cmp(&b.0).unwrap_or(std::cmp::Ordering::Equal));
+    let mut total = 0.0;
+    let mut cur = intervals[0];
+    for &(t0, t1) in &intervals[1..] {
+        if t0 <= cur.1 {
+            // Overlapping or adjacent: extend the current run.
+            cur.1 = cur.1.max(t1);
+        } else {
+            total += cur.1 - cur.0;
+            cur = (t0, t1);
+        }
+    }
+    total += cur.1 - cur.0;
+    total
+}
+
 /// Polyline length: sum of segment lengths.
 fn polyline_length(points: &[Point]) -> f64 {
     points
@@ -669,16 +695,40 @@ pub fn compute_layout_metrics(
     let total_connector_length: f64 = connectors.iter().map(|c| c.length).sum();
 
     // --- node_connector_overlap (length inside non-incident shape boxes) ---
+    //
+    // Documented as a "fraction of total connector length", so each physical
+    // sub-length of connector covered by ANY non-incident node shape box must be
+    // counted AT MOST ONCE. Summing the per-box clipped length double-counts the
+    // region where two non-incident boxes overlap, which can push the normalized
+    // value above 1.0 (overlapping shape boxes are common -- a Flow's shape box is
+    // its whole-pipe bounding box, which frequently overlaps stocks/auxes/other
+    // flows). Instead, for EACH segment we collect the clip intervals over all
+    // non-incident boxes and UNION them (merge overlapping/adjacent intervals)
+    // before summing, so each covered sub-length contributes once and the term is
+    // a true fraction in [0, 1]. The per-segment merge result is order-independent,
+    // so this is deterministic regardless of `node_shape_boxes` iteration order.
     let node_connector_overlap = if total_connector_length > 0.0 {
         let mut inside = 0.0;
         for c in &connectors {
-            for (uid, rect) in &node_shape_boxes {
-                if c.incident_uids.contains(uid) {
-                    continue; // skip the connector's own endpoints
+            for seg in c.polyline.windows(2) {
+                let dx = seg[1].x - seg[0].x;
+                let dy = seg[1].y - seg[0].y;
+                let seg_len = (dx * dx + dy * dy).sqrt();
+                if seg_len == 0.0 {
+                    continue; // degenerate segment covers no length
                 }
-                for seg in c.polyline.windows(2) {
-                    inside += segment_length_in_rect(&seg[0], &seg[1], rect);
+                // Clip interval [t0, t1] of this segment within each non-incident
+                // box, in segment-parameter space (t in [0,1]).
+                let mut intervals: Vec<(f64, f64)> = Vec::new();
+                for (uid, rect) in &node_shape_boxes {
+                    if c.incident_uids.contains(uid) {
+                        continue; // skip the connector's own endpoints
+                    }
+                    if let Some(iv) = segment_clip_interval_in_rect(&seg[0], &seg[1], rect) {
+                        intervals.push(iv);
+                    }
                 }
+                inside += merged_interval_length(&mut intervals) * seg_len;
             }
         }
         inside / total_connector_length
@@ -848,6 +898,10 @@ fn view_bounding_box(node_boxes: &[(i32, Rect)]) -> Option<Rect> {
 mod tests {
     use super::*;
     use crate::datamodel::view_element::{self, LabelSide, LinkShape};
+    // `segment_length_in_rect` is the simple single-box clip; the AC1.3 tests and
+    // the union tests use it as an independent reference oracle to cross-check the
+    // production union path (which composes `segment_clip_interval_in_rect`).
+    use crate::diagram::common::segment_length_in_rect;
     use crate::diagram::constants::STOCK_WIDTH;
     use proptest::prelude::*;
 
@@ -1236,6 +1290,195 @@ mod tests {
         );
     }
 
+    // node_connector_overlap is documented as a "fraction of total connector
+    // length", so it must count each physical sub-length of connector covered by
+    // ANY non-incident node shape box AT MOST ONCE. When two non-incident shape
+    // boxes overlap, the prior implementation summed the per-box clipped lengths,
+    // double-counting the connector segment that lies in the overlap region; the
+    // normalized value could then exceed 1.0 and over-inflate weighted_cost. The
+    // correct value is the UNION length covered by (box A OR box B) over the total
+    // connector length. These two tests pin the union contract.
+
+    /// Length of segment p0->p1 covered by the UNION of `rects` (each physical
+    /// sub-length counted once). Independent reference implementation used by the
+    /// union tests: collect each rect's Liang-Barsky clip interval, merge, sum.
+    fn union_segment_length_in_rects(p0: &Point, p1: &Point, rects: &[Rect]) -> f64 {
+        let seg_len = {
+            let dx = p1.x - p0.x;
+            let dy = p1.y - p0.y;
+            (dx * dx + dy * dy).sqrt()
+        };
+        if seg_len == 0.0 {
+            return 0.0;
+        }
+        let mut intervals: Vec<(f64, f64)> = Vec::new();
+        for r in rects {
+            // Recover [t0, t1] from segment_length_in_rect's reported length: the
+            // tests use axis-aligned horizontal segments, so the clipped length is
+            // an exact multiple of seg_len. We instead build intervals from the
+            // covered length by reconstructing endpoints via the rect bounds for a
+            // horizontal segment at constant y (the only geometry these tests use).
+            let covered = segment_length_in_rect(p0, p1, r);
+            if covered <= 0.0 {
+                continue;
+            }
+            // For a horizontal segment (y constant) inside [left,right], the
+            // covered x-range is [max(min_x,left), min(max_x,right)]. Convert to t.
+            let (xa, xb) = (p0.x.min(p1.x), p0.x.max(p1.x));
+            let lo_x = xa.max(r.left);
+            let hi_x = xb.min(r.right);
+            let span = p1.x - p0.x;
+            let t_lo = ((lo_x - p0.x) / span).clamp(0.0, 1.0);
+            let t_hi = ((hi_x - p0.x) / span).clamp(0.0, 1.0);
+            let (t0, t1) = if t_lo <= t_hi {
+                (t_lo, t_hi)
+            } else {
+                (t_hi, t_lo)
+            };
+            intervals.push((t0, t1));
+        }
+        intervals.sort_by(|a, b| a.0.partial_cmp(&b.0).unwrap());
+        let mut total = 0.0;
+        let mut cur: Option<(f64, f64)> = None;
+        for (t0, t1) in intervals {
+            match cur {
+                None => cur = Some((t0, t1)),
+                Some((c0, c1)) => {
+                    if t0 <= c1 {
+                        cur = Some((c0, c1.max(t1)));
+                    } else {
+                        total += c1 - c0;
+                        cur = Some((t0, t1));
+                    }
+                }
+            }
+        }
+        if let Some((c0, c1)) = cur {
+            total += c1 - c0;
+        }
+        total * seg_len
+    }
+
+    #[test]
+    fn test_node_connector_overlap_union_of_overlapping_boxes() {
+        // A horizontal Link between aux #1 (0,0) and aux #2 (400,0) at y=0. Two
+        // NON-incident stocks straddle the line AND overlap each other:
+        //   stock #3 @ (200,0): shape x [177.5, 222.5]
+        //   stock #4 @ (210,0): shape x [187.5, 232.5]
+        // Their shape boxes overlap in x [187.5, 222.5]. The OLD code charged the
+        // connector for box A (length 45) PLUS box B (length 45) = 90, but the
+        // physical connector length under (A OR B) is the union x [177.5, 232.5]
+        // = 55. The new metric must equal union/total, and the old sum/total
+        // strictly exceeds it.
+        let a = aux(1, "a", 0.0, 0.0);
+        let b = aux(2, "b", 400.0, 0.0);
+        let s3 = stock(3, "s3", 200.0, 0.0);
+        let s4 = stock(4, "s4", 210.0, 0.0);
+        let link = straight_link(10, 1, 2);
+        let view = make_view(vec![a, b, s3.clone(), s4.clone(), link]);
+
+        let m = compute_layout_metrics(&view, &cfg());
+
+        let connectors = collect_connector_geometry(&view);
+        assert_eq!(connectors.len(), 1);
+        let c = &connectors[0];
+        let box3 = node_shape_box(&s3).unwrap();
+        let box4 = node_shape_box(&s4).unwrap();
+
+        // Independent union reference and the old (double-counting) sum.
+        let mut union_len = 0.0;
+        let mut old_sum_len = 0.0;
+        for seg in c.polyline.windows(2) {
+            union_len += union_segment_length_in_rects(&seg[0], &seg[1], &[box3, box4]);
+            old_sum_len += segment_length_in_rect(&seg[0], &seg[1], &box3)
+                + segment_length_in_rect(&seg[0], &seg[1], &box4);
+        }
+        let expected = union_len / c.length;
+        let old_value = old_sum_len / c.length;
+
+        // The fixture must actually overlap so the old sum strictly exceeds the
+        // union (otherwise the test proves nothing).
+        assert!(
+            old_value > expected + 1e-9,
+            "fixture must double-count: old {old_value} should exceed union {expected}"
+        );
+        assert!(
+            (m.node_connector_overlap - expected).abs() < 1e-9,
+            "node_connector_overlap must equal the union fraction: got {} expected {} \
+             (old double-counted value was {})",
+            m.node_connector_overlap,
+            expected,
+            old_value
+        );
+        assert!(
+            m.node_connector_overlap <= 1.0,
+            "node_connector_overlap is a fraction and must be <= 1.0, got {}",
+            m.node_connector_overlap
+        );
+    }
+
+    #[test]
+    fn test_node_connector_overlap_coincident_boxes_counted_once() {
+        // Starker variant: a connector sub-length fully inside TWO COINCIDENT
+        // non-incident boxes is counted ONCE, not twice. Two stocks at the same
+        // position (200,0) each fully contain the connector segment x [177.5,
+        // 222.5]. The OLD code would count that length twice (~2x); the union
+        // counts it once. We also build the fixture so the total connector length
+        // is small enough that the OLD value EXCEEDS 1.0 -- impossible for a
+        // documented fraction. Auxes are placed close in (x 180 and 220) so the
+        // drawn connector is short and lies entirely within the coincident boxes.
+        let a = aux(1, "a", 180.0, 0.0);
+        let b = aux(2, "b", 220.0, 0.0);
+        let s3 = stock(3, "s3", 200.0, 0.0);
+        let s4 = stock(4, "s4", 200.0, 0.0);
+        let link = straight_link(10, 1, 2);
+        let view = make_view(vec![a, b, s3.clone(), s4.clone(), link]);
+
+        let m = compute_layout_metrics(&view, &cfg());
+
+        let connectors = collect_connector_geometry(&view);
+        assert_eq!(connectors.len(), 1);
+        let c = &connectors[0];
+        let box3 = node_shape_box(&s3).unwrap();
+        let box4 = node_shape_box(&s4).unwrap();
+
+        let mut union_len = 0.0;
+        let mut old_sum_len = 0.0;
+        for seg in c.polyline.windows(2) {
+            union_len += union_segment_length_in_rects(&seg[0], &seg[1], &[box3, box4]);
+            old_sum_len += segment_length_in_rect(&seg[0], &seg[1], &box3)
+                + segment_length_in_rect(&seg[0], &seg[1], &box4);
+        }
+        let expected = union_len / c.length;
+        let old_value = old_sum_len / c.length;
+
+        // With two coincident boxes both covering the whole drawn connector, the
+        // union fraction is 1.0 and the old value is ~2.0 (> 1.0, impossible for a
+        // fraction).
+        assert!(
+            old_value > 1.0,
+            "coincident-box fixture must drive the OLD value above 1.0 (got {old_value})"
+        );
+        assert!(
+            (expected - 1.0).abs() < 1e-9,
+            "union of two coincident boxes covering the whole connector is the full \
+             length (fraction 1.0), got {expected}"
+        );
+        assert!(
+            (m.node_connector_overlap - expected).abs() < 1e-9,
+            "coincident non-incident boxes must be counted once: got {} expected {} \
+             (old double-counted value was {})",
+            m.node_connector_overlap,
+            expected,
+            old_value
+        );
+        assert!(
+            m.node_connector_overlap <= 1.0 + 1e-9,
+            "node_connector_overlap is a fraction and must be <= 1.0, got {}",
+            m.node_connector_overlap
+        );
+    }
+
     // --- AC1.4: label_overlap (per-label obscuration) ---
     //
     // label_overlap is the SUM over labeled elements of each label's obscured