diff --git a/docs/README.md b/docs/README.md
index b01911620..f96bf8a82 100644
--- a/docs/README.md
+++ b/docs/README.md
@@ -32,9 +32,11 @@
   - [design-plans/2026-05-19-clearn-residual.md](design-plans/2026-05-19-clearn-residual.md) -- Close C-LEARN's residual (#590/#591) as general Vensim import/simulation primitives: arrayed inline graphical functions, import-time macro shadowing, user-macro INITIAL recurrence, residual attribution; 5 phases
   - [design-plans/2026-05-20-wasm-backend.md](design-plans/2026-05-20-wasm-backend.md) -- WebAssembly code-generation backend: compile a model to one self-contained wasm module as an alternative to the bytecode VM (for fast interactive re-simulation), validated to full VM parity; 8 phases
   - [design-plans/2026-05-22-engine-wasm-sim.md](design-plans/2026-05-22-engine-wasm-sim.md) -- Integrate the wasm backend into `@simlin/engine` as a selectable engine (`Model.simulate({engine:'wasm'})`): vm-vs-wasm demux below the `Sim` facade in `DirectBackend`, a resumable blob run ABI for `runTo`, and a node VM-vs-wasm benchmark; 4 phases
+  - [design-plans/2026-05-22-layout-quality-eval.md](design-plans/2026-05-22-layout-quality-eval.md) -- Layout quality evaluation + hill-climbing harness: a pure geometry-accurate `LayoutMetrics` (overlap/sprawl/accurate-arc crossings) and benchstat-style seed-distribution stats, an on-demand corpus sweep that renders and scores layouts against human references, and Rung 0 (rank seeds by `weighted_cost`); 5 phases
 - [plans/](plans/README.md) -- Implementation plans (active and completed)
 - [test-plans/](test-plans/) -- Human verification plans for completed features
   - [test-plans/2026-05-22-engine-wasm-sim.md](test-plans/2026-05-22-engine-wasm-sim.md) -- Manual verification for the `@simlin/engine` selectable wasm engine (`Model.simulate({engine:'wasm'})`): re-running the automated gates, driving the gated/`#[ignore]`d heavy tests, and the human-judged extras (interactive scrubbing feel, VM-vs-wasm benchmark numbers); all 25 ACs already have automated coverage
+  - [test-plans/2026-05-22-layout-quality-eval.md](test-plans/2026-05-22-layout-quality-eval.md) -- Manual verification for the layout-quality eval: running the on-demand corpus sweep and inspecting its `target/layout-eval/` artifacts (metrics.json, the worst-first contact-sheet), plus the human-judgment calibration gate (best/median/worst ordering, reference-vs-auto scoring, weight magnitudes)
 - `implementation-plans/` -- Detailed phase-by-phase implementation plans, created during plan execution
 
 ## Security
diff --git a/docs/design-plans/2026-05-22-layout-quality-eval.md b/docs/design-plans/2026-05-22-layout-quality-eval.md
new file mode 100644
index 000000000..41195892a
--- /dev/null
+++ b/docs/design-plans/2026-05-22-layout-quality-eval.md
@@ -0,0 +1,564 @@
+# Layout Quality Evaluation and Hill-Climbing Harness Design
+
+## Summary
+
+This work builds a closed-loop measurement and tooling harness around `simlin-engine`'s
+automatic diagram layout, so that an agent (or human) can improve layout quality with
+evidence instead of guesswork. The core is two **pure** Rust modules that hold all the
+logic: a *quality-metric core* (`layout/metrics.rs`) that scores a laid-out diagram on
+explicit, scale-free aesthetic cost terms -- node overlap, connectors running through
+nodes, label overlap, edge crossings, sprawl, edge-length unevenness, and aspect ratio --
+and collapses them to a single `weighted_cost` scalar; and a *statistics core*
+(`layout/eval_stats.rs`) that treats a layout's quality as a distribution over random
+seeds, summarizing it with medians, percentiles, a corpus-wide geomean, and a Mann-Whitney
+U significance test (the way Go's `benchstat` compares benchmark runs). Crucially, the
+metric is computed on the *same geometry the PNG renderer draws*, so a layout's score can
+never disagree with how it actually looks. An imperative shell -- an on-demand example
+binary (`examples/layout_eval.rs`) -- composes these cores: it sweeps a curated corpus of
+models, lays each out across many seeds, scores them, renders the best/median/worst (and any
+hand-authored reference view) to PNG, and writes a metrics table plus an HTML contact-sheet.
+
+The architecture exists to enable a tight iteration loop: change a layout parameter or code
+path, run the sweep, read the geomean delta *and look at the rendered contact-sheet*, then
+keep or revert based on whether the change is statistically significant and visually better.
+The scalar `weighted_cost` is the hill to climb; the rendered images are the guardrail
+against optimizing the number while degrading the picture (Goodhart's law); and a small set
+of human-vs-AI reference pairs is the objective check that the metric agrees with human
+taste. With that loop in place, the design takes only the first, smallest algorithm step --
+"Rung 0," re-pointing seed selection to rank by the full metric instead of crossings alone
+-- and protects the gain with a fast deterministic CI guard. Rungs 1-3 (parameter search,
+a metric-driven search objective, and new layout passes) are documented as the forward path
+the harness is built to support, not built here.
+
+## Definition of Done
+
+This work builds the measurement and tooling infrastructure that lets an agent
+iteratively improve `simlin-engine`'s automatic diagram layout. It defines *what a good
+layout is* (an explicit, geometry-accurate quality metric) and *how to judge outputs* (a
+corpus sweep that renders and statistically scores layouts), then takes the first
+improvement step (Rung 0). The layout algorithm itself is not redesigned beyond Rung 0;
+rungs 1-3 are documented as the forward path.
+
+Today the layout engine judges a layout by exactly one quantity -- edge-crossing count
+(`annealing.rs` simulated-annealing cost; `select_best_layout` seed ranking) -- and there
+is no in-repo way to *see* a generated layout outside the browser. This design closes both
+gaps.
+
+1. **A pure `LayoutMetrics` module** (`src/simlin-engine/src/layout/metrics.rs`) computes
+   scale-free aesthetic *cost* terms (0 = ideal) from a `StockFlow` view, on the same
+   geometry the PNG renderer draws: `node_overlap`, `node_connector_overlap`,
+   `label_overlap`, `crossings`, `sprawl`, `edge_length_cv`, and `aspect_penalty`, plus
+   reserved zero-weighted structure terms. `weighted_cost(&MetricWeights) -> f64` collapses
+   them to one scalar to minimize.
+
+2. **Edge crossings are counted on real geometry** -- Arc links sampled to polylines
+   instead of straight chords -- fixing the straight-chord approximation that
+   `count_view_crossings` (`mod.rs`) applies to `Link`/Arc shapes today (flow polylines are
+   already segment-sampled). MultiPoint links currently render to nothing; see Additional
+   Considerations.
+
+3. **A Rust in-tree corpus sweep driver** (`src/simlin-engine/examples/layout_eval.rs`)
+   runs over a curated `test/` corpus: for each model it generates layouts across multiple
+   independent seeds, computes `LayoutMetrics` for each, renders the best/median/worst
+   layouts to PNG, and -- where the model ships a hand-authored view -- also scores and
+   renders that view as a reference. No pysimlin or other-binding surface is added.
+
+4. **The sweep reports statistically**: per-model median + spread over the seed samples,
+   a corpus geomean-of-medians aggregate, the production best-of-k cost, and a
+   baseline-vs-candidate comparison using a Mann-Whitney U significance test -- emitted as a
+   metrics table (JSON) and an HTML contact-sheet (best/median/worst per model with score
+   breakdowns), written to a gitignored output directory under `target/`.
+
+5. **Metric weights are calibrated and committed**: initial weights set from the
+   failure-mode priorities (overlap + crossings dominant; sprawl/aspect moderate;
+   structure ~0), refined against rendered examples, and validated by a reference-pair
+   check -- on agreed human-vs-AI model pairs the metric scores the human layout lower
+   (better) than the worse machine layout.
+
+6. **Rung 0 is wired in**: `select_best_layout` (`mod.rs`) ranks the candidate seeds by
+   `weighted_cost` (using the accurate crossing count) instead of crossings-only.
+
+7. **A deterministic CI regression guard**: a fast test over a few tiny models asserts
+   `weighted_cost` stays at or below a committed threshold, and the reference-pair ordering
+   is encoded as a test -- both within the workspace's 3-minute test-time budget.
+
+8. **The hill-climbing ladder (rungs 1-3) is documented** as the forward path (parameter
+   search; metric-driven annealing cost; new layout passes), naming the seam each rung
+   touches. (Satisfied by this plan's Additional Considerations -- no implementation task.)
+
+### Out of scope
+- Redesigning the layout algorithm beyond Rung 0 (rungs 1-3 are documented, not built).
+- Exposing metrics or rendering through pysimlin or any non-Rust binding.
+- A preference-judging UI or a trained preference model (the explicit metric is the chosen
+  signal; human preference enters only as up-front calibration).
+- SD-structure metrics as *weighted* terms (chain straightness, loop readability) -- the
+  fields exist but are zero-weighted initially, since these were de-prioritized.
+
+## Acceptance Criteria
+
+### layout-quality-eval.AC1: Metric terms are geometry-correct and scale-free
+- **layout-quality-eval.AC1.1 Success:** Two node boxes overlapping by a known area yield a `node_overlap` equal to the known overlap fraction.
+- **layout-quality-eval.AC1.2 Success:** Pairwise-disjoint nodes yield `node_overlap` = 0.
+- **layout-quality-eval.AC1.3 Success:** A connector whose polyline passes through a non-incident node box contributes to `node_connector_overlap`; one that avoids all non-incident boxes yields 0.
+- **layout-quality-eval.AC1.4 Success:** Two label boxes overlapping by a known area yield a matching `label_overlap`; non-overlapping labels yield 0.
+- **layout-quality-eval.AC1.5 Success:** `aspect_penalty` is 0 inside the target aspect band and positive outside it (a 1x10 bbox is penalized; a ~4:3 bbox is not).
+- **layout-quality-eval.AC1.6 Success:** `weighted_cost` equals the exact linear combination Σ wᵢ·termᵢ for given weights.
+- **layout-quality-eval.AC1.7 Edge:** An empty or single-element view yields all-zero terms with no NaN or divide-by-zero.
+- **layout-quality-eval.AC1.8 Success:** Uniformly scaling all coordinates leaves every normalized term unchanged within tolerance (scale invariance).
+
+### layout-quality-eval.AC2: Crossings are counted on real geometry
+- **layout-quality-eval.AC2.1 Success:** Two connectors that cross once yield a crossing count of 1; connectors sharing an endpoint yield 0.
+- **layout-quality-eval.AC2.2 Success:** An Arc connector that visually crosses another edge is counted via polyline sampling, on a constructed case where the straight-chord approximation does not count it. (MultiPoint links currently render to nothing, so faithfully counting them is deferred until that renderer gap is closed; see Additional Considerations.)
+- **layout-quality-eval.AC2.3 Success:** The crossing count is invariant under translation and rotation of the whole view.
+
+### layout-quality-eval.AC3: Corpus sweep produces renders and scores
+- **layout-quality-eval.AC3.1 Success:** `cargo run --release --example layout_eval` runs over the curated corpus and exits 0.
+- **layout-quality-eval.AC3.2 Success:** It writes `metrics.json` with per-model term breakdowns + `weighted_cost` and corpus aggregates.
+- **layout-quality-eval.AC3.3 Success:** It writes `index.html` referencing best/median/worst PNGs per model with score breakdowns.
+- **layout-quality-eval.AC3.4 Success:** Models shipping a hand-authored view get a reference render + score alongside the auto-layout.
+- **layout-quality-eval.AC3.5 Success:** All artifacts are written under `target/` (gitignored); nothing is committed.
+- **layout-quality-eval.AC3.6 Edge:** A model that fails to lay out or render is reported and skipped, not fatal to the sweep.
+
+### layout-quality-eval.AC4: Statistical reporting and comparison
+- **layout-quality-eval.AC4.1 Success:** Per model, M seeds produce M samples; the report includes median + spread (p25/p75) and the best-of-k production proxy.
+- **layout-quality-eval.AC4.2 Success:** The corpus aggregate is the geomean of per-model medians.
+- **layout-quality-eval.AC4.3 Success:** A baseline-vs-candidate run reports per-model and aggregate deltas, each with a Mann-Whitney U p-value / significance verdict.
+- **layout-quality-eval.AC4.4 Success:** `geomean`, median/percentile, and Mann-Whitney U match known reference values.
+- **layout-quality-eval.AC4.5 Edge:** Identical baseline and candidate yield a zero aggregate delta and a non-significant verdict.
+
+### layout-quality-eval.AC5: Calibration is validated objectively
+- **layout-quality-eval.AC5.1 Success:** Committed default `MetricWeights` give overlap and crossings the dominant weights and the reserved structure terms zero weight.
+- **layout-quality-eval.AC5.2 Success:** On the agreed human-vs-AI reference pairs, `weighted_cost(human) < weighted_cost(ai)` under the committed weights (encoded as a test).
+
+### layout-quality-eval.AC6: Rung 0 selection uses the full metric
+- **layout-quality-eval.AC6.1 Success:** `select_best_layout` picks the lowest-`weighted_cost` candidate, verified on constructed candidates where the lowest-cost layout has *more* crossings than another candidate (so the choice differs from crossings-only).
+- **layout-quality-eval.AC6.2 Success:** The existing layout test suite (`tests/layout.rs`, `layout_tests.rs`, `layout_review_tests.rs`) passes unchanged with the new selection.
+
+### layout-quality-eval.AC7: CI regression guard
+- **layout-quality-eval.AC7.1 Success:** A deterministic test over a few tiny models asserts `weighted_cost` <= a committed threshold and completes well within the test-time budget.
+- **layout-quality-eval.AC7.2 Failure:** Raising a guard model's `weighted_cost` above the threshold makes the test fail.
+
+### layout-quality-eval.AC8: Cross-cutting
+- **layout-quality-eval.AC8.1 Success:** A fixed seed reproduces a byte-identical layout (determinism), distinct from the M-seed statistical sampling.
+- **layout-quality-eval.AC8.2 Success:** Additional Considerations documents rungs 1-3 and names the seam each touches. (Satisfied by this design document itself; no implementation phase.)
+
+## Glossary
+
+- **System dynamics (SD) / stock-and-flow model**: A modeling approach that represents a
+  system as stocks (accumulations) connected by flows (rates of change) and feedback links.
+  Simlin builds, simulates, and visualizes these models; their visual form is the "diagram"
+  whose layout this work scores.
+- **StockFlow / `StockFlow` view**: The engine's data structure for a model diagram -- the
+  collection of `ViewElement`s (and their positions) that make up one visual view of a
+  model. The metric takes a `&StockFlow` as input.
+- **`ViewElement`**: A single positioned item in a `StockFlow` view (a stock, flow, auxiliary
+  variable, connector, alias, etc.). Layout assigns each one a position.
+- **Connector / Arc / MultiPoint / `Flow.points`**: Connectors are the links drawn between
+  elements. They are not always straight: an Arc is a curved link, a MultiPoint connector
+  bends through intermediate points, and a flow's pipe follows `Flow.points`. The crossing
+  count and metric sample Arc links and flow pipes into polylines so curved/bent geometry
+  is measured faithfully; MultiPoint links currently render to nothing and are a known gap.
+- **SFDP**: The force-directed graph layout algorithm used to place nodes (`layout/sfdp.rs`),
+  treating links as springs and nodes as mutually repelling charges. Its tunable parameters
+  (`k`, `c`, `p`, spacing constants) are the target of the documented Rung 1 parameter
+  search.
+- **Force-directed layout**: The broader family of layout algorithms (SFDP is one) that
+  positions nodes by simulating attractive/repulsive forces until the system settles.
+- **Simulated annealing (SA)**: The optimization pass (`layout/annealing.rs`) that refines a
+  layout by randomly perturbing it and accepting changes probabilistically, with the
+  acceptance probability cooling over time. It currently minimizes edge crossings only;
+  Rung 2 would feed it the full `weighted_cost`.
+- **Edge crossings**: Places where two connectors visually intersect -- a primary source of
+  diagram clutter, and today the *only* quantity layout optimizes.
+- **`count_view_crossings`**: The existing function (`mod.rs`) that counts crossings. Today it
+  approximates connectors as straight chords; this work refactors it to count on sampled
+  polylines so arcs and bends are handled correctly.
+- **`LAYOUT_SEEDS` / seed sampling**: Production runs layout from four fixed random seeds
+  (`[42, 123, 456, 789]`) and keeps the best result. Because layout is deterministic per
+  seed but its quality varies across seeds, the sweep instead samples *many* seeds to
+  characterize the quality distribution rather than a single lucky/unlucky result.
+- **`select_best_layout`**: The function (`mod.rs`) that picks the winning candidate among
+  the seed runs. Rung 0 re-points it from "fewest crossings" to "lowest `weighted_cost`."
+- **`LayoutMetrics` / `weighted_cost` / `MetricWeights`**: The new quality-metric types.
+  `LayoutMetrics` holds one cost term per aesthetic concern (0 = ideal, all scale-free);
+  `MetricWeights` is one weight per term; `weighted_cost` is their weighted sum `Σ wᵢ·termᵢ`
+  -- the single scalar an optimizer minimizes.
+- **`render_png` / resvg**: `render_png` (`diagram/render_png.rs`, behind the `png_render`
+  feature) rasterizes a diagram to a PNG; resvg is the Rust SVG-rendering library it uses.
+  Because the engine's SVG output is byte-identical to the product's TypeScript renderer,
+  the PNG faithfully reflects the real UI.
+- **geomean (geometric mean)**: The aggregate used to combine per-model median costs across
+  the corpus. Because it averages in log space, the same *relative* change moves the
+  aggregate equally whichever model it occurs in, so one large-cost model cannot dominate
+  the corpus score the way it would an arithmetic mean.
+- **Mann-Whitney U test**: A non-parametric, rank-based test of whether one of two samples
+  tends to take larger values than the other. It is used to judge whether a
+  baseline-vs-candidate cost difference is real signal or seed noise, without assuming the
+  cost distributions are normal.
+- **benchstat**: A Go tool that compares benchmark runs by reporting center, spread, and a
+  significance test over many samples. The statistics core deliberately mirrors its approach
+  for layout quality.
+- **best-of-k**: A "production proxy" statistic -- the minimum cost over k seeds -- that
+  mirrors what production actually ships (best of the fixed seed set), reported alongside the
+  full distribution.
+- **Reference pair (human-vs-AI)**: An agreed pairing of a hand-authored ("human") layout and
+  a machine-generated ("AI") layout of the same model. The metric is validated by requiring
+  `weighted_cost(human) < weighted_cost(ai)` -- an objective check that it agrees with human
+  taste.
+- **Contact-sheet**: The generated `index.html` report -- a grid showing each model's
+  best/median/worst renders (and any reference view) with their score breakdowns, sorted
+  worst-first -- inspected every iteration as the visual guardrail.
+- **"Rungs" / hill-climbing ladder**: The staged forward path for improving layout. Rung 0
+  (built here) changes only seed selection; Rungs 1-3 (documented, not built) are parameter
+  search, a metric-driven search objective, and new layout passes -- each "rung" a discrete,
+  measurable step up the quality hill.
+- **Goodhart('s law)**: "When a measure becomes a target, it ceases to be a good measure" --
+  i.e., any single fitness scalar will eventually be gamed. The contact-sheet renders,
+  visible per-term breakdowns, and reference-pair test are the design's guards against it.
+- **Functional core / imperative shell (FCIS)**: An architectural pattern that isolates pure,
+  side-effect-free logic (here, `metrics.rs` and `eval_stats.rs`) from the I/O-performing
+  shell (here, the `layout_eval.rs` example). The cores are heavily unit/property tested; the
+  shell stays thin.
+- **salsa**: The incremental computation framework backing the engine's model database; the
+  sweep driver syncs the salsa DB before laying out a model, reusing the path that the
+  existing `tests/layout.rs` uses to load corpus models.
+
+## Architecture
+
+The system has three parts, split along the functional-core / imperative-shell line: a
+**pure metric core** and a **pure statistics core** that the **imperative sweep driver**
+composes. Rendering already exists (`diagram::render_png`) and is reused unchanged.
+
+### Quality-metric core (`layout/metrics.rs`, pure)
+
+`compute_layout_metrics(view: &StockFlow, config: &LayoutConfig) -> LayoutMetrics` is a
+pure function with no I/O. It is computed on the **same geometry the renderer draws** --
+node bounding boxes, connector paths, and label boxes obtained from the `diagram` module's
+existing geometry helpers (`diagram::elements`/`flow` `*_bounds`, `diagram::connector`
+path, `diagram::label::label_bounds`) -- so a layout's score and its rendered PNG can never
+disagree. Those helpers are `pub fn`, but their modules (`elements`, `flow`, `label`,
+`connector`) are private in `diagram/mod.rs` today, so a prerequisite is exposing them
+`pub(crate)` for `layout` to call. Every term is a **cost** (0 = ideal) and normalized to
+be scale-free, so models of different sizes are comparable and the corpus can be aggregated.
+
+| Term | Definition (cost; 0 = ideal) | Pain it captures |
+|------|------------------------------|------------------|
+| `node_overlap` | Σ pairwise node-box overlap area / Σ node area | clutter |
+| `node_connector_overlap` | connector-polyline length inside non-incident node boxes / total connector length | connectors under/through nodes |
+| `label_overlap` | overlap area among label boxes and label-vs-node boxes / Σ label area | clutter |
+| `crossings` | connector-polyline crossings (arcs sampled) / connector count | tangled connectors |
+| `sprawl` | mean connector length / characteristic node size | wasted space |
+| `edge_length_cv` | stddev/mean of connector lengths | elements drifting far / unevenness |
+| `aspect_penalty` | deviation of bbox aspect ratio from a target band | unviewable shape |
+| `chain_straightness`, `loop_compactness` | reserved, zero-weighted | (SD structure; deferred) |
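+
+The geometric terms in the table can be sketched as follows. This is a minimal illustration
+under stated assumptions, not the planned implementation. `Rect` is a hypothetical
+top-left-origin stand-in for the `diagram` geometry types. Connectors arrive as
+already-sampled polylines tagged with the indices of their two incident nodes. The aspect
+band `[lo, hi]` is a caller-supplied parameter because the design does not fix its bounds.
+`node_connector_overlap` clips each segment against every non-incident box using
+Liang-Barsky clipping.
+
+```rust
+pub type Point = (f64, f64);
+
+/// Axis-aligned box, top-left origin; a hypothetical stand-in for the diagram geometry type.
+#[derive(Debug, Clone, Copy)]
+pub struct Rect { pub x: f64, pub y: f64, pub w: f64, pub h: f64 }
+
+impl Rect {
+    fn area(&self) -> f64 {
+        self.w.max(0.0) * self.h.max(0.0)
+    }
+
+    /// Area shared with `o`; 0 when the boxes are disjoint or only touch.
+    fn overlap_area(&self, o: &Rect) -> f64 {
+        let dx = (self.x + self.w).min(o.x + o.w) - self.x.max(o.x);
+        let dy = (self.y + self.h).min(o.y + o.h) - self.y.max(o.y);
+        dx.max(0.0) * dy.max(0.0)
+    }
+
+    /// Length of segment p-q that lies inside this box (Liang-Barsky clipping).
+    fn clipped_length(&self, p: Point, q: Point) -> f64 {
+        let (dx, dy) = (q.0 - p.0, q.1 - p.1);
+        let (mut t0, mut t1) = (0.0_f64, 1.0_f64);
+        let edges = [
+            (-dx, p.0 - self.x),
+            (dx, self.x + self.w - p.0),
+            (-dy, p.1 - self.y),
+            (dy, self.y + self.h - p.1),
+        ];
+        for (pk, qk) in edges {
+            if pk == 0.0 {
+                if qk < 0.0 {
+                    return 0.0; // parallel to this edge and outside it
+                }
+            } else if pk < 0.0 {
+                t0 = t0.max(qk / pk); // entering the box
+            } else {
+                t1 = t1.min(qk / pk); // leaving the box
+            }
+        }
+        if t0 >= t1 { 0.0 } else { (t1 - t0) * dx.hypot(dy) }
+    }
+}
+
+/// Σ pairwise overlap area / Σ node area (0 for empty, single-node, or zero-area views).
+pub fn node_overlap(nodes: &[Rect]) -> f64 {
+    let total: f64 = nodes.iter().map(Rect::area).sum();
+    if total <= 0.0 {
+        return 0.0;
+    }
+    let mut overlap = 0.0;
+    for (i, a) in nodes.iter().enumerate() {
+        for b in &nodes[i + 1..] {
+            overlap += a.overlap_area(b);
+        }
+    }
+    overlap / total
+}
+
+/// A sampled connector polyline plus the indices of its two incident nodes.
+pub struct Connector { pub from: usize, pub to: usize, pub points: Vec<Point> }
+
+/// Connector length inside non-incident node boxes / total connector length.
+pub fn node_connector_overlap(connectors: &[Connector], nodes: &[Rect]) -> f64 {
+    let (mut inside, mut total) = (0.0, 0.0);
+    for c in connectors {
+        for seg in c.points.windows(2) {
+            total += (seg[1].0 - seg[0].0).hypot(seg[1].1 - seg[0].1);
+            for (i, node) in nodes.iter().enumerate() {
+                if i != c.from && i != c.to {
+                    inside += node.clipped_length(seg[0], seg[1]);
+                }
+            }
+        }
+    }
+    if total > 0.0 { inside / total } else { 0.0 }
+}
+
+/// Log-distance of the bbox aspect ratio (w/h) outside the band [lo, hi]; 0 inside it.
+pub fn aspect_penalty(bbox: &Rect, lo: f64, hi: f64) -> f64 {
+    if bbox.w <= 0.0 || bbox.h <= 0.0 {
+        return 0.0;
+    }
+    let r = bbox.w / bbox.h;
+    if r < lo { (lo / r).ln() } else if r > hi { (r / hi).ln() } else { 0.0 }
+}
+```
+
+If several boxes overlap the same stretch of connector, each one counts that stretch, so in
+pathological layouts this sketch's `node_connector_overlap` can exceed 1.
+
+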
+
+Contract:
+
+```rust
+pub struct LayoutMetrics {
+    pub node_overlap: f64,
+    pub node_connector_overlap: f64,
+    pub label_overlap: f64,
+    pub crossings: f64,
+    pub sprawl: f64,
+    pub edge_length_cv: f64,
+    pub aspect_penalty: f64,
+    pub chain_straightness: f64, // reserved, weight 0
+    pub loop_compactness: f64,   // reserved, weight 0
+}
+
+pub struct MetricWeights { /* one f64 per term */ }
+
+impl LayoutMetrics {
+    /// Σ wᵢ·termᵢ — the scalar an optimizer minimizes.
+    pub fn weighted_cost(&self, w: &MetricWeights) -> f64;
+}
+```
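+
+One way to fill in the contract above is shown as a self-contained sketch. It assumes
+`MetricWeights` mirrors `LayoutMetrics` field for field (the contract says only "one f64 per
+term"). The `Default` derives are for convenience; no default weight values are implied.
+
+```rust
+/// One cost term per aesthetic concern (0 = ideal, scale-free).
+#[derive(Debug, Clone, Copy, Default, PartialEq)]
+pub struct LayoutMetrics {
+    pub node_overlap: f64,
+    pub node_connector_overlap: f64,
+    pub label_overlap: f64,
+    pub crossings: f64,
+    pub sprawl: f64,
+    pub edge_length_cv: f64,
+    pub aspect_penalty: f64,
+    pub chain_straightness: f64, // reserved, weight 0
+    pub loop_compactness: f64,   // reserved, weight 0
+}
+
+/// Hypothetical field-for-field weights for `LayoutMetrics`.
+#[derive(Debug, Clone, Copy, Default, PartialEq)]
+pub struct MetricWeights {
+    pub node_overlap: f64,
+    pub node_connector_overlap: f64,
+    pub label_overlap: f64,
+    pub crossings: f64,
+    pub sprawl: f64,
+    pub edge_length_cv: f64,
+    pub aspect_penalty: f64,
+    pub chain_straightness: f64,
+    pub loop_compactness: f64,
+}
+
+impl LayoutMetrics {
+    /// Σ wᵢ·termᵢ: the scalar an optimizer minimizes.
+    pub fn weighted_cost(&self, w: &MetricWeights) -> f64 {
+        // No `..` in this pattern: a new field is a compile error here until it is weighted.
+        let LayoutMetrics {
+            node_overlap,
+            node_connector_overlap,
+            label_overlap,
+            crossings,
+            sprawl,
+            edge_length_cv,
+            aspect_penalty,
+            chain_straightness,
+            loop_compactness,
+        } = *self;
+        w.node_overlap * node_overlap
+            + w.node_connector_overlap * node_connector_overlap
+            + w.label_overlap * label_overlap
+            + w.crossings * crossings
+            + w.sprawl * sprawl
+            + w.edge_length_cv * edge_length_cv
+            + w.aspect_penalty * aspect_penalty
+            + w.chain_straightness * chain_straightness
+            + w.loop_compactness * loop_compactness
+    }
+}
+```
+
+Destructuring `*self` without `..` is a deliberate choice in this sketch: if a term is added
+to `LayoutMetrics`, the code stops compiling until `weighted_cost` accounts for it.
+
+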
+
+The overlap terms (`node_overlap`/`node_connector_overlap`) and `crossings` pull against the
+sprawl terms. Spreading elements out removes overlaps and gives connectors room to untangle,
+while `sprawl` rewards compactness. That tension is intended: the weights set the balance,
+and the overlap terms keep "minimize area" from collapsing the layout.
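+
+The two compactness terms can be sketched as follows. The sketch assumes connectors are
+sampled polylines. It takes "characteristic node size" to be the square root of the mean node
+area, which is an assumption because the table does not define it. Both outputs are ratios of
+lengths, so uniformly scaling the view leaves them unchanged.
+
+```rust
+pub type Point = (f64, f64);
+
+fn polyline_length(points: &[Point]) -> f64 {
+    points.windows(2).map(|s| (s[1].0 - s[0].0).hypot(s[1].1 - s[0].1)).sum()
+}
+
+/// Returns `(sprawl, edge_length_cv)`.
+/// Assumption: characteristic node size = sqrt(mean node area).
+pub fn sprawl_and_cv(connectors: &[Vec<Point>], node_areas: &[f64]) -> (f64, f64) {
+    let lengths: Vec<f64> = connectors.iter().map(|c| polyline_length(c)).collect();
+    if lengths.is_empty() || node_areas.is_empty() {
+        return (0.0, 0.0);
+    }
+    let n = lengths.len() as f64;
+    let mean = lengths.iter().sum::<f64>() / n;
+    if mean <= 0.0 {
+        return (0.0, 0.0);
+    }
+    // Population variance of connector lengths.
+    let var = lengths.iter().map(|l| (l - mean).powi(2)).sum::<f64>() / n;
+    let node_size = (node_areas.iter().sum::<f64>() / node_areas.len() as f64).sqrt();
+    let sprawl = if node_size > 0.0 { mean / node_size } else { 0.0 };
+    (sprawl, var.sqrt() / mean)
+}
+```
+
+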
+
+**Accurate crossings.** The `crossings` term, and a refactored `count_view_crossings`,
+operate on connector geometry sampled to polylines (Arc links plus `Flow.points`), not
+straight chords. This requires factoring the arc geometry -- currently entangled with
+SVG-string emission in `connector::render_arc` (which returns a `String`) -- into a polyline
+producer shared by the renderer and the metric, so both see identical geometry. This is the
+highest-effort item in Phase 1, and the factor-out must keep `render_svg` byte-for-byte
+unchanged (a TS-vs-Rust parity test asserts it). It both feeds the metric and fixes a latent
+miscount in today's seed selection: a chord can miss a crossing the drawn arc makes, or report
+one the arc curves around. (MultiPoint links currently render to an empty group,
+so they have no drawn geometry to match; they are a known gap, not measured here.)
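+
+The polyline crossing count can be sketched as follows. The arc parameterization (center,
+radius, start angle, signed sweep) and the function names are hypothetical; they do not
+reflect the shape of `connector::render_arc`. The sketch shows only the technique: sample
+arcs into segments, then test every segment pair with an orientation (cross-product) check.
+Only proper crossings count, so two connectors that merely share an endpoint contribute 0,
+as AC2.1 requires.
+
+```rust
+pub type Point = (f64, f64);
+
+/// Sample a circular arc into `segments` straight pieces (hypothetical parameterization).
+pub fn sample_arc(center: Point, radius: f64, start: f64, sweep: f64, segments: usize) -> Vec<Point> {
+    let segments = segments.max(1);
+    (0..=segments)
+        .map(|k| {
+            let a = start + sweep * k as f64 / segments as f64;
+            (center.0 + radius * a.cos(), center.1 + radius * a.sin())
+        })
+        .collect()
+}
+
+/// Sign of the turn a -> b -> c (positive = counter-clockwise).
+fn orient(a: Point, b: Point, c: Point) -> f64 {
+    (b.0 - a.0) * (c.1 - a.1) - (b.1 - a.1) * (c.0 - a.0)
+}
+
+/// True only for a proper crossing; endpoint touches and collinear overlap do not count.
+fn segments_cross(p: Point, q: Point, r: Point, s: Point) -> bool {
+    orient(p, q, r) * orient(p, q, s) < 0.0 && orient(r, s, p) * orient(r, s, q) < 0.0
+}
+
+/// Total pairwise crossings between connector polylines.
+pub fn count_polyline_crossings(polylines: &[Vec<Point>]) -> usize {
+    let mut count = 0;
+    for (i, a) in polylines.iter().enumerate() {
+        for b in &polylines[i + 1..] {
+            for sa in a.windows(2) {
+                for sb in b.windows(2) {
+                    if segments_cross(sa[0], sa[1], sb[0], sb[1]) {
+                        count += 1;
+                    }
+                }
+            }
+        }
+    }
+    count
+}
+```
+
+In the sketch, a crossing that passes exactly through a sample vertex yields a zero
+orientation and is ignored. A real implementation may want a tolerance-aware test for that
+case.
+
+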
+
+### Statistics core (`layout/eval_stats.rs`, pure)
+
+Layout is deterministic at a fixed seed (RNGs are `StdRng::seed_from_u64`; no entropy
+source; the `par_iter` over seeds preserves order), so a specific layout is exactly
+reproducible. But a layout's *quality is a distribution over seed space*, and production
+samples it at the four fixed `LAYOUT_SEEDS` and takes the min. Evaluating a change on one
+fixed seed-set conflates a real improvement with seed luck. The statistics core treats
+evaluation the way Go's `benchstat` treats benchmarks: many samples, center + spread, and a
+significance test on differences.
+
+```rust
+pub struct MetricSample { pub seed: u64, pub metrics: LayoutMetrics, pub weighted_cost: f64 }
+
+pub struct ModelStats {
+    pub model: String,
+    pub samples: Vec<MetricSample>, // one per seed
+    pub median_cost: f64,
+    pub spread: (f64, f64),         // e.g. (p25, p75)
+    pub best_of_k_cost: f64,        // production proxy: min over k seeds
+    pub best_seed: u64, pub median_seed: u64, pub worst_seed: u64,
+}
+
+pub struct CorpusReport { pub per_model: Vec<ModelStats>, pub geomean_of_medians: f64 }
+
+/// Per-model and aggregate delta, each with a Mann-Whitney U p-value (non-parametric;
+/// robust to the non-normal cost distributions layout produces).
+pub fn compare(baseline: &CorpusReport, candidate: &CorpusReport) -> Comparison;
+```
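+
+The significance test inside `compare` could look like the following sketch: a two-sided
+Mann-Whitney U test that uses the normal approximation with tie and continuity corrections.
+Rust's standard library has no `erf`, so `erfc` uses the Abramowitz and Stegun 7.1.26
+approximation (absolute error below 1.5e-7). The function name and the `(u, p)` return shape
+are assumptions. For very small M, an exact U distribution is more accurate than this
+approximation.
+
+```rust
+/// Complementary error function, Abramowitz & Stegun 7.1.26 (|error| < 1.5e-7).
+fn erfc(x: f64) -> f64 {
+    let z = x.abs();
+    let t = 1.0 / (1.0 + 0.327_591_1 * z);
+    let poly = t * (0.254_829_592
+        + t * (-0.284_496_736 + t * (1.421_413_741 + t * (-1.453_152_027 + t * 1.061_405_429))));
+    let r = poly * (-z * z).exp();
+    if x >= 0.0 { r } else { 2.0 - r }
+}
+
+/// Two-sided Mann-Whitney U test (normal approximation). Returns (U for `a`, p-value).
+pub fn mann_whitney_u(a: &[f64], b: &[f64]) -> (f64, f64) {
+    if a.is_empty() || b.is_empty() {
+        return (0.0, 1.0);
+    }
+    let (n1, n2) = (a.len() as f64, b.len() as f64);
+    // Pool both samples, remembering which one each value came from.
+    let mut pooled: Vec<(f64, bool)> =
+        a.iter().map(|&v| (v, true)).chain(b.iter().map(|&v| (v, false))).collect();
+    pooled.sort_by(|x, y| x.0.total_cmp(&y.0));
+    let n = pooled.len();
+    let (mut rank_sum_a, mut tie_term) = (0.0, 0.0);
+    let mut i = 0;
+    while i < n {
+        // Find the run of tied values pooled[i..=j]; each gets the average 1-based rank.
+        let mut j = i;
+        while j + 1 < n && pooled[j + 1].0 == pooled[i].0 {
+            j += 1;
+        }
+        let avg_rank = (i + j) as f64 / 2.0 + 1.0;
+        let t = (j - i + 1) as f64;
+        tie_term += t * t * t - t;
+        rank_sum_a += avg_rank * pooled[i..=j].iter().filter(|p| p.1).count() as f64;
+        i = j + 1;
+    }
+    let u = rank_sum_a - n1 * (n1 + 1.0) / 2.0;
+    let mean = n1 * n2 / 2.0;
+    let nf = n as f64;
+    let var = n1 * n2 / 12.0 * ((nf + 1.0) - tie_term / (nf * (nf - 1.0)));
+    if var <= 0.0 {
+        return (u, 1.0); // every value identical: no evidence of a difference
+    }
+    let z = ((u - mean).abs() - 0.5).max(0.0) / var.sqrt(); // continuity-corrected
+    (u, erfc(z / std::f64::consts::SQRT_2).min(1.0))
+}
+```
+
+For identical baseline and candidate samples, U equals its mean, z is 0, and p is
+(approximately) 1. That is the non-significant verdict AC4.5 expects.
+
+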
+
+`geomean` (not arithmetic mean) aggregates normalized ratios across heterogeneous models so
+one large-cost model can't dominate; `median`/percentiles summarize each model's
+distribution; Mann-Whitney U decides whether a baseline-vs-candidate delta is signal or
+noise. All are pure, table-testable functions.
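+
+The aggregation helpers are small; a sketch follows. Every term is a cost with 0 as the
+ideal, so a plain geomean collapses to 0 as soon as one model scores 0. This sketch therefore
+exposes a `shift` and computes a shifted geometric mean. That parameter is an assumption, not
+part of the design, and its value changes how strongly near-zero models weigh in.
+Percentiles use linear interpolation between closest ranks.
+
+```rust
+/// Shifted geometric mean: exp(mean(ln(x + shift))) - shift. shift = 0 gives the plain geomean.
+pub fn geomean(values: &[f64], shift: f64) -> Option<f64> {
+    if values.is_empty() {
+        return None;
+    }
+    let mean_ln = values.iter().map(|v| (v + shift).ln()).sum::<f64>() / values.len() as f64;
+    Some(mean_ln.exp() - shift)
+}
+
+/// Percentile by linear interpolation between closest ranks; `q` is clamped to [0, 1].
+pub fn percentile(values: &[f64], q: f64) -> Option<f64> {
+    if values.is_empty() {
+        return None;
+    }
+    let mut sorted = values.to_vec();
+    sorted.sort_by(f64::total_cmp);
+    let pos = q.clamp(0.0, 1.0) * (sorted.len() - 1) as f64;
+    let (lo, hi) = (pos.floor() as usize, pos.ceil() as usize);
+    Some(sorted[lo] + (sorted[hi] - sorted[lo]) * (pos - lo as f64))
+}
+
+pub fn median(values: &[f64]) -> Option<f64> {
+    percentile(values, 0.5)
+}
+```
+
+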
+
+### Sweep driver (`examples/layout_eval.rs`, imperative shell)
+
+The shell loads each model in a curated corpus list (XMILE via `open_xmile` and Vensim via
+`open_vensim`, as `examples/backend_bench.rs` does, then salsa-syncs the project as the
+DB-backed layout tests do), and for each model:
+
+1. Runs layout for M independent seeds, producing M `MetricSample`s (and the best-of-k
+   production proxy). The per-seed seam is the existing `generate_layout_with_config`
+   (`mod.rs`, `pub`) -- its single `annealing_random_seed` drives both the SFDP and
+   annealing RNGs -- or the equivalent `generate` closure inside `generate_best_layout`.
+2. Renders the best/median/worst layouts to PNG via `diagram::render_png` (after writing
+   the generated `StockFlow` onto the model's view, which `render_png` reads as
+   `views.first()`).
+3. If the model file ships a non-empty hand-authored view, renders and scores that view
+   untouched as a **reference**.
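+
+Summarizing one model's samples, so that step 1 feeds the best/median/worst picks of step 2,
+might look like the sketch below. `Sample` and `SeedSummary` are hypothetical, slimmed-down
+versions of `MetricSample` and `ModelStats`. Sorting by cost with a seed tie-break keeps the
+chosen seeds deterministic. `best_of_k_cost` assumes the first `k` sampled seeds are the
+production seed set.
+
+```rust
+#[derive(Debug, Clone, Copy)]
+pub struct Sample {
+    pub seed: u64,
+    pub weighted_cost: f64,
+}
+
+#[derive(Debug, PartialEq)]
+pub struct SeedSummary {
+    pub median_cost: f64,
+    pub spread: (f64, f64), // (p25, p75)
+    pub best_of_k_cost: f64,
+    pub best_seed: u64,
+    pub median_seed: u64,
+    pub worst_seed: u64,
+}
+
+pub fn summarize(samples: &[Sample], k: usize) -> Option<SeedSummary> {
+    if samples.is_empty() {
+        return None;
+    }
+    // Sort by cost, then seed, so the representative seeds are deterministic.
+    let mut sorted = samples.to_vec();
+    sorted.sort_by(|a, b| a.weighted_cost.total_cmp(&b.weighted_cost).then(a.seed.cmp(&b.seed)));
+    let costs: Vec<f64> = sorted.iter().map(|s| s.weighted_cost).collect();
+    // Linear-interpolated percentile over the sorted costs.
+    let pct = |q: f64| {
+        let pos = q * (costs.len() - 1) as f64;
+        let (lo, hi) = (pos.floor() as usize, pos.ceil() as usize);
+        costs[lo] + (costs[hi] - costs[lo]) * (pos - lo as f64)
+    };
+    // Assumption: the first k samples were run with the production seeds.
+    let best_of_k_cost = samples
+        .iter()
+        .take(k.max(1))
+        .map(|s| s.weighted_cost)
+        .fold(f64::INFINITY, f64::min);
+    let last = sorted.len() - 1;
+    Some(SeedSummary {
+        median_cost: pct(0.5),
+        spread: (pct(0.25), pct(0.75)),
+        best_of_k_cost,
+        best_seed: sorted[0].seed,
+        median_seed: sorted[last / 2].seed,
+        worst_seed: sorted[last].seed,
+    })
+}
+```
+
+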
+
+It then emits, to a gitignored dir under `target/layout-eval/`:
+- `metrics.json` -- per-model `ModelStats` with term breakdowns, plus corpus aggregates.
+- `index.html` -- a contact-sheet sorted worst-cost-first; each cell shows the
+  best/median/worst renders (and the reference, where present) with their metric
+  breakdowns; the header shows corpus geomean and the baseline delta with significance.
+- baseline diff -- `compare()` against a small committed `baseline.json`, printed and
+  embedded in the report.
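+
+The contact-sheet ordering and a single cell could be produced as sketched below.
+`SheetRow`, `cell_html`, and the `<model>/<best|median|worst>.png` file naming are
+hypothetical. Model names are HTML-escaped because the report does not control corpus file
+names.
+
+```rust
+/// A contact-sheet row; a hypothetical subset of the per-model report.
+pub struct SheetRow {
+    pub model: String,
+    pub median_cost: f64,
+}
+
+/// Escape the characters that matter inside HTML text and attribute values.
+fn escape_html(s: &str) -> String {
+    let mut out = String::with_capacity(s.len());
+    for ch in s.chars() {
+        match ch {
+            '&' => out.push_str("&amp;"),
+            '<' => out.push_str("&lt;"),
+            '>' => out.push_str("&gt;"),
+            '"' => out.push_str("&quot;"),
+            _ => out.push(ch),
+        }
+    }
+    out
+}
+
+/// Worst (highest median cost) first; ties broken by model name for stable output.
+pub fn worst_first(rows: &mut [SheetRow]) {
+    rows.sort_by(|a, b| {
+        b.median_cost
+            .total_cmp(&a.median_cost)
+            .then_with(|| a.model.cmp(&b.model))
+    });
+}
+
+/// One cell: caption plus best/median/worst renders (hypothetical file naming).
+pub fn cell_html(row: &SheetRow) -> String {
+    let m = escape_html(&row.model);
+    let cost = row.median_cost;
+    let imgs: String = ["best", "median", "worst"]
+        .iter()
+        .map(|kind| format!("<img src=\"{m}/{kind}.png\" alt=\"{m} {kind}\">"))
+        .collect();
+    format!("<div class=\"cell\"><h3>{m}</h3><p>median cost {cost:.3}</p>{imgs}</div>")
+}
+```
+
+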
+
+The driver declares `required-features = ["png_render", "file_io"]` and is run on demand
+(`cargo run --release --example layout_eval`); it is not part of `cargo test`.
+
+### Rung 0 wiring
+
+`select_best_layout` (`mod.rs`) currently keeps the candidate with the fewest crossings
+(tie-break on seed). Rung 0 changes it to keep the candidate with the lowest
+`weighted_cost` (computed with the accurate crossing count), tie-break on seed. This is the
+smallest, immediately-measurable improvement: "best of the candidate seeds" becomes "best
+by the full metric." It changes only selection, not the search.
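+
+Rung 0's change fits in one comparator. This sketch uses a hypothetical `Candidate` record in
+place of the real candidate layouts. It assumes "tie-break on seed" means the lower seed wins
+and keeps the old crossings-only rule alongside for contrast. `f64::total_cmp` imposes a total
+order, so a NaN cost can neither panic nor destabilize the comparison.
+
+```rust
+#[derive(Debug, Clone, PartialEq)]
+pub struct Candidate {
+    pub seed: u64,
+    pub crossings: usize,
+    pub weighted_cost: f64,
+}
+
+/// Today's rule: fewest crossings, tie-break on the lower seed.
+pub fn select_fewest_crossings(cands: &[Candidate]) -> Option<&Candidate> {
+    cands.iter().min_by(|a, b| a.crossings.cmp(&b.crossings).then(a.seed.cmp(&b.seed)))
+}
+
+/// Rung 0: lowest weighted_cost, tie-break on the lower seed.
+pub fn select_lowest_cost(cands: &[Candidate]) -> Option<&Candidate> {
+    cands
+        .iter()
+        .min_by(|a, b| a.weighted_cost.total_cmp(&b.weighted_cost).then(a.seed.cmp(&b.seed)))
+}
+```
+
+Consider two candidates: `(seed 42, 0 crossings, cost 5.0)` and `(seed 123, 2 crossings,
+cost 1.0)`. The crossings-only rule keeps seed 42, while the cost rule keeps seed 123. This
+is exactly the situation AC6.1 asks a test to construct.
+
+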
+
+### The iteration loop this enables
+
+Change a parameter or code path -> run the sweep -> read `metrics.json` *and look at the
+contact-sheet* -> keep or revert based on the geomean delta and its significance, guarded by
+the rendered images. The scalar `weighted_cost` is the hill; the renders are the guardrail
+against gaming it (Goodhart); the reference pairs are the objective check that the metric
+agrees with human taste.
+
+## Existing Patterns
+
+Every touch point below was checked against the current code; this design adds pure modules
+and one in-tree example, and re-points one existing decision function.
+
+- **Layout module and decision seams.** `src/simlin-engine/src/layout/` holds `mod.rs`
+  (orchestration; `count_view_crossings`; `select_best_layout`; `generate_best_layout`
+  running the `LAYOUT_SEEDS = [42,123,456,789]` candidates via `par_iter`), `sfdp.rs`
+  (force placement, `StdRng::seed_from_u64`), and `annealing.rs` (crossings-only SA cost).
+  This design adds `metrics.rs` and `eval_stats.rs` beside them and edits
+  `select_best_layout`. Terminology (SFDP, annealing, pinned nodes, chains) follows
+  `docs/design-plans/2026-03-27-incremental-layout.md`.
+- **Rendering already exists.** `src/simlin-engine/src/diagram/` provides `render.rs`
+  (`render_svg`), `render_png.rs` (`render_png` / `svg_to_png`, resvg + embedded
+  Roboto-Light, behind the `png_render` feature), with geometry in `elements.rs`,
+  `flow.rs`, `connector.rs`, `label.rs` (`label_bounds`), `common.rs` (`Rect`,
+  `calc_view_box`), and shared `constants.rs`. The metric reuses these geometry helpers so
+  scores match the rendered image -- but only `common`/`constants` are `pub mod` today, so
+  the others must be exposed (see Architecture). `render_svg` is asserted byte-identical to
+  the TS renderer by `src/diagram/tests/svg-rendering.test.ts`, so the PNG faithfully
+  reflects the product UI -- and that test is the tripwire any connector-geometry refactor
+  must not break.
+- **In-tree example precedent.** `src/simlin-engine/examples/backend_bench.rs` is an
+  existing on-demand example (auto-discovered; loads models via `std::fs` +
+  `open_vensim`/`open_xmile`). `examples/layout_eval.rs` follows its shape; the
+  `required-features` mechanism (used today by the crate's `[[test]]` entries, not by any
+  example) means adding a new `[[example]]` block to `Cargo.toml`.
+- **Corpus loading.** `tests/layout.rs` loads XMILE via `load_project`/`open_xmile`; its
+  DB-backed tests show the salsa-sync-then-layout pattern (`SimlinDb::default()` ->
+  `sync_from_datamodel_incremental` -> pass `Some((&mut db, source_project))`). The sweep
+  combines that with `open_vensim` for the Vensim `test/metasd` models. (`verify_layout`
+  itself is only an assertion helper, not a loader.)
+- **Test-time budget.** Per `CLAUDE.md` / `docs/dev/rust.md`, `cargo test --workspace`
+  runs under a 3-minute cap and individual tests complete in seconds. The full corpus sweep
+  therefore stays in the example (not in tests); only a tiny deterministic guard runs in the
+  test suite.
+- **FCIS.** Pure cores (`metrics.rs`, `eval_stats.rs`) hold all logic and are unit/property
+  tested to the project's coverage bar; the example is a thin imperative shell.
+
+No pattern divergence: pure functions beside existing pure layout code, one example beside
+an existing example, one edit to an existing selection function.
+
+## Implementation Phases
+
+<!-- START_PHASE_1 -->
+### Phase 1: Quality-metric core + accurate crossings
+**Goal:** A pure, geometry-accurate `LayoutMetrics` and a polyline-based crossing count.
+
+**Components:**
+- Expose the `diagram` geometry modules (`elements`, `flow`, `label`, `connector`) as
+  `pub(crate)` -- they are private today, so `layout::metrics` cannot call their `*_bounds` /
+  path helpers without this.
+- `src/simlin-engine/src/layout/metrics.rs` (new) -- `LayoutMetrics`, `MetricWeights`,
+  `compute_layout_metrics(view, config)`, `weighted_cost`. Each term is computed from the
+  `diagram` module's geometry helpers.
+- Connector arc-to-polyline geometry factored out of `connector::render_arc` (highest-effort
+  item; geometry is currently entangled with SVG-string building), reused by the renderer and
+  the metric. The renderer must be re-routed through it without changing its output.
+- `count_view_crossings` (`mod.rs`) refactored to count on polylines instead of straight
+  chords (Arc/`Link` shapes; flow polylines are already sampled).
+- Unit tests on hand-built tiny views with known geometry (two boxes overlapping by a known
+  fraction; two segments crossing once; shared-endpoint connectors -> 0; a 1x10 bbox ->
+  known aspect penalty; an arc that crosses where its chord would not). Property tests:
+  overlap symmetric and scale-invariant; crossings invariant under translation/rotation.
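+
+The overlap and aspect fixtures above can be checked against closed-form values. This is a
+minimal sketch built on several assumptions:
+
+- `Rect` is a hypothetical box with `left/top/right/bottom` fields. The real type is
+  `diagram::common::Rect`, whose fields may differ.
+- Overlap is normalized by the smaller box's area.
+- The aspect penalty is the fractional excess of the long/short ratio over 16:9.
+
+This plan fixes neither normalization. Under these choices, two 10x10 boxes offset by half a
+width overlap by 0.5. A 1x10 bbox scores 10 / (16/9) - 1 = 4.625.
+
+```rust
+/// Hypothetical axis-aligned box; the real type is `diagram::common::Rect`.
+#[derive(Clone, Copy, Debug)]
+pub struct Rect {
+    pub left: f64,
+    pub top: f64,
+    pub right: f64,
+    pub bottom: f64,
+}
+
+impl Rect {
+    pub fn width(&self) -> f64 {
+        (self.right - self.left).max(0.0)
+    }
+
+    pub fn height(&self) -> f64 {
+        (self.bottom - self.top).max(0.0)
+    }
+
+    pub fn area(&self) -> f64 {
+        self.width() * self.height()
+    }
+
+    /// Area of the intersection of two boxes; 0.0 when they are disjoint.
+    pub fn overlap_area(&self, other: &Rect) -> f64 {
+        let w = self.right.min(other.right) - self.left.max(other.left);
+        let h = self.bottom.min(other.bottom) - self.top.max(other.top);
+        w.max(0.0) * h.max(0.0)
+    }
+}
+
+/// Overlap as a fraction of the smaller box (assumed normalization), which makes the
+/// term symmetric and scale-invariant. A zero denominator yields 0.0 so it stays finite.
+pub fn overlap_fraction(a: &Rect, b: &Rect) -> f64 {
+    let denom = a.area().min(b.area());
+    if denom > 0.0 {
+        a.overlap_area(b) / denom
+    } else {
+        0.0
+    }
+}
+
+/// Hypothetical aspect penalty: fractional excess of the long/short side ratio over
+/// the 16:9 band (0.0 inside the band or for a degenerate box).
+pub fn aspect_penalty(bbox: &Rect) -> f64 {
+    const TARGET_AR_MAX: f64 = 16.0 / 9.0;
+    let (w, h) = (bbox.width(), bbox.height());
+    if w <= 0.0 || h <= 0.0 {
+        return 0.0;
+    }
+    let ratio = w.max(h) / w.min(h);
+    (ratio / TARGET_AR_MAX - 1.0).max(0.0)
+}
+```
+
+Dividing by the smaller area means a small label fully inside a large stock counts as 100%
+overlap. Dividing by the union would dilute that case.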
+
+**Dependencies:** none.
+
+**Done when:** the metric terms match the hand-computed values, scale/translation
+invariance holds, the polyline crossing count differs from the old chord count on the
+constructed arc case, `render_svg` output is unchanged (the `svg-rendering.test.ts` parity
+test still passes), and `cargo test` passes. Covers `layout-quality-eval.AC1.*`,
+`layout-quality-eval.AC2.*`.
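+
+The arc case in the done-when list can be illustrated with a proper-crossing test over
+sampled polylines. This is a sketch, not the `build_view_segments` code.
+`sample_arc`, `segments_cross` and `count_polyline_crossings` are hypothetical names.
+Touching or collinear contacts count as non-crossings, which is one simple way to get the
+"shared endpoint -> 0" behavior. In the example, an upper half-arc from (1, 0) to (-1, 0)
+crosses a short vertical spoke above the origin once. Its straight chord never reaches
+that spoke.
+
+```rust
+type Pt = (f64, f64);
+
+/// Signed area of the triangle (a, b, c): > 0 left turn, < 0 right turn, 0 collinear.
+fn orient(a: Pt, b: Pt, c: Pt) -> f64 {
+    (b.0 - a.0) * (c.1 - a.1) - (b.1 - a.1) * (c.0 - a.0)
+}
+
+/// True only for a strict interior crossing; a shared endpoint gives a zero orientation
+/// and is therefore not counted.
+fn segments_cross(p1: Pt, p2: Pt, q1: Pt, q2: Pt) -> bool {
+    let d1 = orient(q1, q2, p1);
+    let d2 = orient(q1, q2, p2);
+    let d3 = orient(p1, p2, q1);
+    let d4 = orient(p1, p2, q2);
+    d1 * d2 < 0.0 && d3 * d4 < 0.0
+}
+
+/// Counts pairwise segment crossings between two polylines.
+fn count_polyline_crossings(a: &[Pt], b: &[Pt]) -> usize {
+    let mut n = 0;
+    for sa in a.windows(2) {
+        for sb in b.windows(2) {
+            if segments_cross(sa[0], sa[1], sb[0], sb[1]) {
+                n += 1;
+            }
+        }
+    }
+    n
+}
+
+/// Samples the arc of the circle (center, radius) from angle `start` to `end` (radians)
+/// into `samples + 1` points, both endpoints included.
+fn sample_arc(center: Pt, radius: f64, start: f64, end: f64, samples: usize) -> Vec<Pt> {
+    (0..=samples)
+        .map(|i| {
+            let t = start + (end - start) * (i as f64) / (samples as f64);
+            (center.0 + radius * t.cos(), center.1 + radius * t.sin())
+        })
+        .collect()
+}
+```
+
+An odd sample count (15 in the check) keeps every arc vertex off the spoke's line. That
+avoids a fragile vertex-on-segment case in the demonstration.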
+<!-- END_PHASE_1 -->
+
+<!-- START_PHASE_2 -->
+### Phase 2: Statistics core
+**Goal:** Pure aggregation and significance testing for seed-sample distributions.
+
+**Components:**
+- `src/simlin-engine/src/layout/eval_stats.rs` (new) -- `MetricSample`, `ModelStats`,
+  `CorpusReport`, `Comparison`; `geomean`, `median`/percentile, and a Mann-Whitney U test;
+  `compare(baseline, candidate)` producing per-model and aggregate deltas with p-values.
+- Unit tests against known reference values (geomean of a known set; Mann-Whitney U on
+  textbook samples; identical baseline/candidate -> zero delta, non-significant).
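+
+A sketch of the percentile and Mann-Whitney U helpers, under these assumptions:
+
+- Percentiles use linear interpolation.
+- Tied values receive average ranks.
+- The p-value is two-sided, from the normal approximation without the tie-corrected
+  variance.
+- `erfc` comes from the Abramowitz-Stegun 7.1.26 polynomial, because stable Rust `std`
+  has no `erf`.
+
+The real `eval_stats` signatures may differ. Identical samples give U = n1*n2/2 and p near
+1, which is the AC4.5 "zero delta, non-significant" case. Fully separated samples of three
+give U = 0.
+
+```rust
+/// Linear-interpolation percentile, `q` in [0, 1]; 0.0 on empty input.
+pub fn percentile(xs: &[f64], q: f64) -> f64 {
+    if xs.is_empty() {
+        return 0.0;
+    }
+    let mut v = xs.to_vec();
+    v.sort_by(|a, b| a.total_cmp(b));
+    let pos = q.clamp(0.0, 1.0) * (v.len() - 1) as f64;
+    let (lo, hi) = (pos.floor() as usize, pos.ceil() as usize);
+    v[lo] + (v[hi] - v[lo]) * (pos - lo as f64)
+}
+
+pub fn median(xs: &[f64]) -> f64 {
+    percentile(xs, 0.5)
+}
+
+/// Mann-Whitney U for `a` versus `b`: returns (U_a, two-sided p). Empty input is
+/// reported as non-significant (p = 1.0).
+pub fn mann_whitney_u(a: &[f64], b: &[f64]) -> (f64, f64) {
+    if a.is_empty() || b.is_empty() {
+        return (0.0, 1.0);
+    }
+    // Pool both samples, tagging each value with whether it came from `a`.
+    let mut pooled: Vec<(f64, bool)> = a
+        .iter()
+        .map(|&x| (x, true))
+        .chain(b.iter().map(|&x| (x, false)))
+        .collect();
+    pooled.sort_by(|x, y| x.0.total_cmp(&y.0));
+    // Walk runs of equal values and give each member the run's average 1-based rank.
+    let mut rank_sum_a = 0.0;
+    let mut i = 0;
+    while i < pooled.len() {
+        let mut j = i;
+        while j + 1 < pooled.len() && pooled[j + 1].0 == pooled[i].0 {
+            j += 1;
+        }
+        let avg_rank = (i + j) as f64 / 2.0 + 1.0;
+        let from_a = pooled[i..=j].iter().filter(|p| p.1).count();
+        rank_sum_a += avg_rank * from_a as f64;
+        i = j + 1;
+    }
+    let (n1, n2) = (a.len() as f64, b.len() as f64);
+    let u_a = rank_sum_a - n1 * (n1 + 1.0) / 2.0;
+    let sd = (n1 * n2 * (n1 + n2 + 1.0) / 12.0).sqrt();
+    let z = (u_a - n1 * n2 / 2.0) / sd;
+    let p = erfc(z.abs() / std::f64::consts::SQRT_2);
+    (u_a, p.min(1.0))
+}
+
+/// Abramowitz & Stegun 7.1.26 approximation of erfc(x) for x >= 0 (|error| < 1.5e-7).
+fn erfc(x: f64) -> f64 {
+    let t = 1.0 / (1.0 + 0.3275911 * x);
+    let poly = t * (0.254829592
+        + t * (-0.284496736 + t * (1.421413741 + t * (-1.453152027 + t * 1.061405429))));
+    poly * (-x * x).exp()
+}
+```
+
+The normal approximation is rough for very small samples. At the sweep's default M = 25
+seeds per model it is the usual benchstat-style choice.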
+
+**Dependencies:** Phase 1 (the `LayoutMetrics` type embedded in `MetricSample`).
+
+**Done when:** the helpers match known values and `compare()` reports the expected
+significance verdicts. Covers `layout-quality-eval.AC4.4`, `layout-quality-eval.AC4.5`.
+<!-- END_PHASE_2 -->
+
+<!-- START_PHASE_3 -->
+### Phase 3: Corpus sweep driver and report
+**Goal:** An on-demand sweep that lays out, scores, renders, and reports over the corpus.
+
+**Components:**
+- `src/simlin-engine/examples/layout_eval.rs` (new) -- loads a curated corpus list
+  (canonical SIR/teacup/logistic-growth; modules; multipoint connectors; LTM/loop models;
+  aliases; the `test/ai-information` set; a few large `test/metasd` Vensim models) via
+  `open_xmile`/`open_vensim` + salsa sync, runs M seeds per model, scores each, renders
+  best/median/worst PNGs, and scores+renders any shipped hand-authored view as a reference.
+- The per-seed seam: wrap `generate_layout_with_config` (`mod.rs`) or the `generate` closure
+  in `generate_best_layout`, varying `annealing_random_seed` per sample, so the driver can
+  sample seeds and compute the best-of-k proxy.
+- Emits `metrics.json`, `index.html` contact-sheet, and a `compare()` diff against a
+  committed `baseline.json`, under `target/layout-eval/` (gitignored).
+- A new `[[example]]` entry in `Cargo.toml` with `required-features = ["png_render",
+  "file_io"]` (no example uses `required-features` today; `file_io` helps load Vensim models
+  that reference external data, and AC3.6 skip-on-failure covers any that still fail).
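+
+For each model, the driver needs two things: the best, median and worst seeds for the
+renders, and the best-of-k proxy. This sketch works over hypothetical `SeedCost` records.
+Samples are sorted by cost, with the lower seed winning ties. Best-of-k is taken as the
+minimum cost among the first k sampled seeds. That reading of "best-of-k" is an assumption
+here, meant to model production running k seeds and keeping the cheapest.
+
+```rust
+/// One scored layout: the seed that produced it and its weighted cost.
+#[derive(Clone, Copy, Debug, PartialEq)]
+pub struct SeedCost {
+    pub seed: u64,
+    pub cost: f64,
+}
+
+/// (best, median, worst) by cost, ties broken toward the lower seed; None when empty.
+pub fn best_median_worst(samples: &[SeedCost]) -> Option<(SeedCost, SeedCost, SeedCost)> {
+    if samples.is_empty() {
+        return None;
+    }
+    let mut v = samples.to_vec();
+    v.sort_by(|a, b| a.cost.total_cmp(&b.cost).then(a.seed.cmp(&b.seed)));
+    Some((v[0], v[(v.len() - 1) / 2], v[v.len() - 1]))
+}
+
+/// Hypothetical best-of-k proxy: cheapest cost among the first `k` samples.
+pub fn best_of_k(samples: &[SeedCost], k: usize) -> Option<f64> {
+    samples.iter().take(k).map(|s| s.cost).min_by(|a, b| a.total_cmp(b))
+}
+```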
+
+**Dependencies:** Phase 1 (metric), Phase 2 (stats).
+
+**Done when:** `cargo run --release --example layout_eval` completes, writes the JSON +
+contact-sheet referencing best/median/worst (and reference) renders, reports per-model
+median+spread / corpus geomean / best-of-k and a baseline delta with significance, places
+artifacts under `target/`, and skips (reports, non-fatally) any model that fails to lay out
+or render. Covers `layout-quality-eval.AC3.*`, `layout-quality-eval.AC4.1`,
+`layout-quality-eval.AC4.2`, `layout-quality-eval.AC4.3`.
+<!-- END_PHASE_3 -->
+
+<!-- START_PHASE_4 -->
+### Phase 4: Calibration and reference-pair validation
+**Goal:** Commit metric weights that match the user's taste, validated objectively.
+
+**Components:**
+- Committed default `MetricWeights` (overlap + crossings dominant; sprawl/aspect moderate;
+  structure terms 0), set via a talk-through over the Phase 3 contact-sheet, treating the
+  user's "this layout is better than that" judgments as ordering constraints on the linear
+  cost.
+- A reference-pair fixture (agreed human-vs-AI model pairs, e.g. from `test/ai-information`)
+  and a test asserting `weighted_cost(human) < weighted_cost(ai)` under the committed
+  weights.
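+
+Because the cost is linear, each judgment "layout A looks better than layout B" becomes the
+constraint w . (terms(A) - terms(B)) < 0. The reference-pair test then checks that every
+pair satisfies that constraint. This sketch uses a hypothetical three-term `Terms`/`Weights`
+subset; the real `LayoutMetrics` and `MetricWeights` carry more terms. The example pair
+shows how a sprawl weight can turn a human-preferred, more spread-out layout into a
+violation.
+
+```rust
+/// Illustrative subset of metric terms (lower is better).
+#[derive(Clone, Copy, Debug)]
+pub struct Terms {
+    pub overlap: f64,
+    pub crossings: f64,
+    pub sprawl: f64,
+}
+
+#[derive(Clone, Copy, Debug)]
+pub struct Weights {
+    pub overlap: f64,
+    pub crossings: f64,
+    pub sprawl: f64,
+}
+
+/// Linear cost: sum of weight * term.
+pub fn weighted_cost(t: &Terms, w: &Weights) -> f64 {
+    w.overlap * t.overlap + w.crossings * t.crossings + w.sprawl * t.sprawl
+}
+
+/// Indices of (better, worse) pairs whose ordering the weights fail to reproduce.
+pub fn violated_pairs(pairs: &[(Terms, Terms)], w: &Weights) -> Vec<usize> {
+    pairs
+        .iter()
+        .enumerate()
+        .filter(|(_, (better, worse))| weighted_cost(better, w) >= weighted_cost(worse, w))
+        .map(|(i, _)| i)
+        .collect()
+}
+```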
+
+**Dependencies:** Phase 3 (need the contact-sheet to calibrate against), Phase 1.
+
+**Done when:** the committed weights satisfy the reference-pair ordering test, and the user
+has signed off on the weights after reviewing the contact-sheet. Covers
+`layout-quality-eval.AC5.*`.
+<!-- END_PHASE_4 -->
+
+<!-- START_PHASE_5 -->
+### Phase 5: Rung 0 wiring + CI regression guard
+**Goal:** Make seed selection use the full metric, and protect the gains in normal dev.
+
+**Components:**
+- `select_best_layout` (`mod.rs`) re-pointed to minimize `weighted_cost` (which uses the
+  accurate crossing count), breaking ties toward the lower seed.
+- A deterministic regression-guard test over a few tiny models asserting `weighted_cost`
+  stays at or below a committed threshold (fixed seeds; fast; under the time budget), plus a
+  determinism check (the same seed reproduces a byte-identical layout).
+- Confirm existing layout tests (`tests/layout.rs`, `layout_tests.rs`,
+  `layout_review_tests.rs`) still pass with the new selection.
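+
+The re-pointed selection must still pick the minimum when some candidate's cost is
+degenerate. This sketch works over hypothetical `(seed, cost)` pairs. It assumes a NaN cost
+should never win over a finite one and that equal costs break toward the lower seed. A NaN
+candidate never displaces the incumbent, and any finite cost displaces a NaN one. So
+whenever at least one candidate is finite, the winner does not depend on candidate order.
+
+```rust
+/// Index of the lowest-cost candidate; None for an empty slice. All-NaN keeps the first.
+pub fn select_best(candidates: &[(u64, f64)]) -> Option<usize> {
+    let mut best: Option<usize> = None;
+    for (i, &(seed, cost)) in candidates.iter().enumerate() {
+        let replace = match best {
+            None => true,
+            Some(b) => {
+                let (best_seed, best_cost) = candidates[b];
+                match (cost.is_nan(), best_cost.is_nan()) {
+                    // A NaN candidate never displaces anything.
+                    (true, _) => false,
+                    // Any finite cost beats a NaN incumbent.
+                    (false, true) => true,
+                    // Lower cost wins; equal cost falls back to the lower seed.
+                    (false, false) => {
+                        cost < best_cost || (cost == best_cost && seed < best_seed)
+                    }
+                }
+            }
+        };
+        if replace {
+            best = Some(i);
+        }
+    }
+    best
+}
+```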
+
+**Dependencies:** Phase 1 (metric), Phase 4 (committed weights).
+
+**Done when:** selection picks the lowest-`weighted_cost` candidate (verified on
+constructed candidates where lowest-cost differs from fewest-crossings), the guard +
+determinism tests pass within budget, and the existing layout suite is green. Covers
+`layout-quality-eval.AC6.*`, `layout-quality-eval.AC7.*`, `layout-quality-eval.AC8.1`.
+<!-- END_PHASE_5 -->
+
+## Additional Considerations
+
+**The hill-climbing ladder beyond this plan (rungs 1-3).** Rung 0 (Phase 5) is the only
+algorithm change built here. The forward path follows; each rung is measured by the Phase 3
+sweep, gated by the Phase 2 significance test, and checked against the rendered contact-sheet:
+
+- **Rung 1 -- parameter search.** Sweep SFDP `k`, `c`, `p`, the spacing constants, the seed
+  count, and SA temperature/iterations (`config.rs`, `sfdp.rs`, `annealing.rs`) against the
+  corpus geomean. No algorithm change; pure config search (grid/coordinate descent).
+- **Rung 2 -- metric-driven search objective.** Feed `weighted_cost` into the SA acceptance
+  delta (`annealing.rs`, currently `perturbed_crossings - current_crossings`) so the search
+  optimizes the full metric, not just crossings. Higher leverage but costlier per
+  perturbation than a crossing count, so it is a deliberate, measured experiment -- and may
+  use a cheap subset of terms in the inner loop.
+- **Rung 3 -- new passes.** Targeted code such as an overlap-removal post-pass or
+  obstacle-aware connector routing, each validated against the corpus.
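+
+For Rung 2, the change is the value fed into the acceptance test, not the test itself. This
+is a sketch of the standard Metropolis criterion on a cost delta. It assumes `annealing.rs`
+uses this rule; the plan only names the crossing delta. `u` stands in for a draw from the
+run's seeded RNG, so acceptance stays reproducible per seed. Rung 2 would pass
+`weighted_cost` values where the current search passes crossing counts.
+
+```rust
+/// Metropolis acceptance on a cost delta (lower is better). `u` is uniform in [0, 1).
+pub fn accept(current_cost: f64, perturbed_cost: f64, temperature: f64, u: f64) -> bool {
+    let delta = perturbed_cost - current_cost;
+    if delta <= 0.0 {
+        return true; // improvements and sideways moves are always taken
+    }
+    if temperature <= 0.0 {
+        return false; // frozen: behave greedily
+    }
+    // Worse moves pass with probability exp(-delta / T).
+    u < (-delta / temperature).exp()
+}
+```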
+
+**Goodhart guard.** A scalar fitness will be gamed by any optimizer. Three mitigations are
+built in: per-term breakdowns stay visible (not just the scalar); the contact-sheet's
+best/median/worst renders are inspected every iteration (a change that improves the number
+but worsens the picture means the *metric* is wrong, not the layout); and the reference-pair
+test fails if weights stop agreeing with human-judged-better layouts.
+
+**Determinism vs. statistical sampling.** These serve different needs. The CI guard uses
+fixed seeds (deterministic, fast, flake-free). The interactive sweep varies seeds to
+characterize the algorithm's quality distribution, because a single fixed-seed measurement
+cannot distinguish a real improvement from seed luck. A specific bad layout remains exactly
+reproducible by its seed for debugging.
+
+**Sweep cost.** A sweep runs M seeds x corpus x (layout + a few renders), which takes
+minutes on the large `test/metasd` models. That is acceptable for an on-demand example and
+is why the sweep is not in the test suite. M and the large-model tier are configurable.
+
+**Metric/render geometry agreement.** Computing the metric from the renderer's own geometry
+helpers (rather than the `LayoutConfig` element sizes) guarantees the score reflects what
+the PNG shows -- including the connector-polyline sampling that both the renderer and the
+crossing count share.
diff --git a/docs/test-plans/2026-05-22-layout-quality-eval.md b/docs/test-plans/2026-05-22-layout-quality-eval.md
new file mode 100644
index 000000000..714220c42
--- /dev/null
+++ b/docs/test-plans/2026-05-22-layout-quality-eval.md
@@ -0,0 +1,85 @@
+# Test Plan: Layout Quality Evaluation
+
+Human verification plan for the layout-quality-eval feature (implementation plan
+`docs/implementation-plans/2026-05-22-layout-quality-eval/`). The automated suite
+proves the metric math, the selection rule, and per-seed determinism. This plan
+covers what automated tests cannot: that the on-demand corpus **sweep** emits the
+right artifacts, and that the **human-judgment** calls (best/median/worst
+ordering, reference-vs-auto scoring, weight magnitudes) match a modeler's eye.
+This is the gate for AC3.*, AC4.1-4.3, and the human-in-the-loop part of AC5.
+
+## Prerequisites
+
+- Repo at a commit including the layout-quality-eval branch, clean working tree.
+  Run `./scripts/dev-init.sh`.
+- Toolchain that can build `resvg` (the `png_render` feature):
+  `cargo build -p simlin-engine --features png_render,file_io --example layout_eval`
+  should finish without error.
+- A browser to open `target/layout-eval/index.html`, and a JSON viewer / `jq`
+  for `target/layout-eval/metrics.json`.
+- Automated gate already green:
+  `cargo test -p simlin-engine --lib layout::` and
+  `cargo test -p simlin-engine --features file_io --test layout`.
+
+## Phase 1: Time-boxed smoke run (fast confidence)
+
+| Step | Action | Expected |
+|------|--------|----------|
+| 1 | `LAYOUT_EVAL_MODELS=teacup,sir LAYOUT_EVAL_SEEDS=4 cargo run --release -p simlin-engine --features png_render,file_io --example layout_eval` | Exits 0 (AC3.1). stdout prints one summary line per model (e.g. `sir: median=… p25/p75=…/… best_of_k=… (M=4)`) and a final `corpus: geomean_of_medians=… (2 model(s) scored)` line. |
+| 2 | `ls target/layout-eval/` | Contains `metrics.json`, `index.html`, and PNGs: `sir_best/median/worst/reference.png`, `teacup_best/median/worst/reference.png`. |
+| 3 | `git status --porcelain target/` | Empty — nothing under `target/` is tracked (AC3.5). |
+
+## Phase 2: Full corpus sweep + artifact inspection
+
+| Step | Action | Expected |
+|------|--------|----------|
+| 1 | `cargo run --release -p simlin-engine --features png_render,file_io --example layout_eval` (no env overrides: all corpus keys, M=25) | Exits 0. Each model prints its median/spread/best-of-k line; corpus aggregate at the end. Runtime is minutes (deliberately kept out of `cargo test`). |
+| 2 | Open `target/layout-eval/metrics.json` | Valid JSON. Each `per_model[]` has the full `LayoutMetrics` breakdown (`node_overlap`, `node_connector_overlap`, `label_overlap`, `crossings`, `sprawl`, `edge_length_cv`, `aspect_penalty`, `loop_compactness`, `chain_straightness`) + `weighted_cost`, `median_cost`, `spread`, `best_of_k_cost`, `best/median/worst_seed`. Top level has `geomean_of_medians` and the `weights` set (AC3.2). |
+| 3 | Verify AC4.2 by hand: collect each model's `median_cost`, compute their (epsilon-floored) geometric mean, compare to `geomean_of_medians` | The two agree to a few decimals. |
+| 4 | Open `target/layout-eval/index.html` in a browser | Contact sheet sorted **worst weighted_cost first**. Each model row shows best/median/worst (and reference where present) thumbnails with a per-term cost breakdown and the `median / p25/p75 / best_of_k / M=25` summary (AC3.3). Header shows `geomean_of_medians` and the weight set. |
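+
+Step 3 can also be scripted. This is a sketch of an epsilon-floored geometric mean computed
+in log space. `eps` is a hypothetical parameter: use whatever floor `eval_stats::geomean`
+applies, since this plan does not state it. Feed it the `median_cost` values from
+`metrics.json`.
+
+```rust
+/// Geometric mean with each value floored at `eps`, so a perfect (0.0) model cannot
+/// zero out the product; summing logs avoids overflow. Returns 0.0 on empty input.
+pub fn geomean_floored(medians: &[f64], eps: f64) -> f64 {
+    if medians.is_empty() {
+        return 0.0;
+    }
+    let log_sum: f64 = medians.iter().map(|&m| m.max(eps).ln()).sum();
+    (log_sum / medians.len() as f64).exp()
+}
+```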
+
+## Phase 3: Human-judgment checks (the calibration gate, AC5.1 / AC5.2)
+
+These are the calls only a human can make; sign-off here closes the
+human-in-the-loop component of AC5.
+
+| Step | Action | Expected (human judgment) |
+|------|--------|---------------------------|
+| 1 (best/median/worst ordering) | For 3-4 models (e.g. `sir`, `fishbanks`, `reliability`, `population`), look at the three generated thumbnails side by side | "best" should genuinely look cleanest (fewest overlaps/crossings, labels readable); "worst" messiest. If the metric's "best" looks worse than its "worst", that is calibration feedback — record it, do not silently accept it. |
+| 2 (reference vs auto) | For each model shipping a `*_reference.png`, compare it to that model's `*_best.png` and read both `weighted_cost` values | For `reliability`, `fishbanks`, `population`, `logistic-growth`: the hand-authored reference should both look cleaner and carry the lower `weighted_cost` (the human<auto direction the AC5.2 tests pin). For `sir`: the reference deliberately obscures more labels, so the auto scores lower — confirm that asymmetry looks right. |
+| 3 (weight magnitudes, AC5.1) | Read the weight set in the `index.html` header / `metrics.json` | Overlap + crossings family carry the dominant weights; `sprawl`/`edge_length_cv`/`aspect_penalty` are 0; `loop_compactness` is a small positive nudge (0.25); `chain_straightness` is 0. Confirm these still match intent over the contact sheet, then sign off. |
+
+## End-to-End: baseline-vs-candidate regression diff (AC4.3)
+
+Validates the full statistical-comparison path (per-model + aggregate deltas with
+Mann-Whitney U p-values + significance) a future tuning change would rely on.
+
+1. Seed a baseline: `LAYOUT_EVAL_WRITE_BASELINE=1 cargo run --release -p simlin-engine --features png_render,file_io --example layout_eval`. stdout notes the baseline was written to `examples/layout_eval_baseline.json`.
+2. Run a plain candidate sweep (no `WRITE_BASELINE`).
+3. In stdout and the `index.html` "baseline diff" section: each model shows a signed `delta_ratio` %, a `p_value`, and a significance verdict; an aggregate delta + verdict is shown.
+4. Sanity (matches automated AC4.5): an unchanged candidate vs the just-written baseline shows deltas near 0% and non-significant everywhere. A genuinely different candidate (e.g. after a deliberate weight change) shows non-zero deltas; large, consistent ones read as significant.
+5. Reset the committed baseline when done: `git checkout examples/layout_eval_baseline.json` (unless intentionally updating it).
+
+## End-to-End: skip-on-failure (AC3.6)
+
+Confirms one bad model never aborts the sweep.
+
+1. Run a sweep including a model whose file you temporarily make missing/unreadable.
+2. Expected: a `WARN: skipping {key}: {err}` line is printed, that model is absent from `metrics.json`/`index.html`, and the sweep still exits 0 and writes a report for the survivors. Restore the file afterward.
+
+## Human Verification Required
+
+| Criterion | Why Manual | Steps |
+|-----------|------------|-------|
+| AC5.1 (weight magnitudes) | Final numeric weights are a taste call over the contact sheet, not derivable from a test. | Phase 3 step 3. |
+| AC5.2 (reference-pair selection + sign-off) | Which models are agreed anchors and whether the human layout truly looks better is human judgment. | Phase 3 steps 1-2. |
+| AC8.2 (rungs 1-3 documented) | Documentation criterion; no implementation phase. | Read the "Additional Considerations / hill-climbing ladder" of `docs/design-plans/2026-05-22-layout-quality-eval.md`; confirm Rung 1 (`config.rs`/`sfdp.rs`/`annealing.rs`), Rung 2 (`annealing.rs`), Rung 3 (overlap-removal / obstacle-aware routing) are each named with their seam. |
+
+## Notes
+
+- Automated coverage was validated PASS against
+  `docs/implementation-plans/2026-05-22-layout-quality-eval/test-requirements.md`
+  (20/20 automated criteria; AC3.* and AC4.1-4.3 operational by design; AC8.2 documentation).
+- The corpus sweep is intentionally **not** part of `cargo test` (it renders PNGs
+  and runs for minutes). It is an on-demand developer tool whose artifacts live
+  under the gitignored `target/layout-eval/`.
diff --git a/src/simlin-engine/CLAUDE.md b/src/simlin-engine/CLAUDE.md
index bf9149be8..3b3973977 100644
--- a/src/simlin-engine/CLAUDE.md
+++ b/src/simlin-engine/CLAUDE.md
@@ -114,8 +114,13 @@ The unit subsystem is partial-result throughout: a single bad declaration or one
 - **`src/ltm_augment.rs`** - Equation generators for LTM synthetic variables: `generate_link_score_equation_for_link` (ceteris-paribus link scores; takes `RefShape` and source dimension elements to drive per-shape PREVIOUS wrapping), `generate_loop_score_variables` (emits one `loop_score` per loop; relative loop scores are computed post-simulation in `ltm_post.rs`), `build_partial_equation_shaped` (AST-based PREVIOUS wrapping that holds matching-shape references live and wraps everything else, via `wrap_non_matching_in_previous` and `classify_expr0_subscript_shape`; arrayed-per-element-equation (`Ast::Arrayed`) targets get one partial per element assembled into an `Equation::Arrayed`), `link_score_var_name` (synthetic name helper: Bare gets the canonical `{from}\u{2192}{to}` form, FixedIndex prepends `[elem]` to from; the obsolete per-shape `\u{205A}wildcard`/`\u{205A}dynamic` Wildcard/DynamicIndex suffixes were retired -- those shapes now collapse onto the Bare name, since *every statically-describable* inlined reducer (whole-extent or sliced) is hoisted into a `$⁚ltm⁚agg⁚{n}` node and only a `DynamicIndex` reference -- `arr[i+1]`, a range, or the not-hoistable dynamic-index reducer carve-out `SUM(pop[idx,*])` -- and a whole-RHS variable-backed reducer's `Wildcard` argument reach this function), `quote_ident` (identifier quoting for equations). 
Array support: `classify_reducer` (walks target Expr2 AST to identify reducing builtins -- Linear for SUM/MEAN, Nonlinear for MIN/MAX/STDDEV/RANK, Constant for SIZE -- a thin reader of `ltm_agg::reducer_kind`), `generate_element_to_scalar_equation` (per-element link score equations for arrayed-to-scalar edges, used by both the variable-backed-reducer path and the `source[d] → $⁚ltm⁚agg⁚{n}` half) which dispatches on `ReducerKind` -- `generate_linear_partial` (SUM/MEAN algebraic shortcut), `generate_nonlinear_partial` (MIN/MAX nested binary calls; STDDEV the unrolled population-variance `sqrt` ceteris-paribus partial -- divisor `N`, matching `vm.rs::Opcode::ArrayStddev`, with the mean string-inlined; RANK the documented delta-ratio stand-in pinned by `test_generate_rank_keeps_delta_ratio` -- an order statistic, non-differentiable and unreachable via a real model RHS), `generate_scalar_to_element_equation` (per-element link score for the `$⁚ltm⁚agg⁚{n} → target[e]` half; takes a `source_ref_override: Option<&str>` so a multi-slot arrayed agg's `Δsource` denominator carries the projected `agg[<slot>]` subscript instead of the bare agg name, which wouldn't compile as a scalar), `substitute_reducers_in_expr0` (textually replaces a recognized reducer subexpression in an `Expr0` with its agg name, for the `$⁚ltm⁚agg⁚{n} → target` link score), `resolve_link_score_name_for_loop` (picks the Bare-or-FixedIndex link-score name a loop-score reference should target). Module link score formulas (black-box delta-ratio and composite-ref) are inlined directly into `link_score_equation_text` in `db.rs`.
 - **`src/ltm_post.rs`** - Post-simulation relative loop score computation. `compute_rel_loop_scores(results, loop_partitions)` normalizes each loop's `loop_score` series against the sum of absolute scores within its cycle partition, using SAFEDIV-0 semantics (zero denominator -> zero result). Called after simulation rather than emitted as synthetic equations to avoid O(P^2) equation-text growth on models with dense partitions.
 - **LTM open work**: known LTM bugs and improvements are tracked on GitHub under the `ltm` label; issue #488 is the pinned epic that organises them by area (core algorithm, discovery/post-sim, augmentation, module/array umbrellas). Each open `ltm`-labelled issue carries file:line references and a suggested fix, so a new session can pick a bite-sized piece without re-investigating the subsystem.
-- **`src/diagram/`** - Diagram/sketch rendering: `elements.rs`, `connector.rs`, `flow.rs`, `render.rs`, `common.rs`, `constants.rs`, `label.rs`, `arrowhead.rs`
-- **`src/layout/`** - Automatic diagram layout generation (available on all targets including WASM; uses serial fallback when rayon is unavailable). Two entry points: `generate_best_layout()` (public) generates a complete diagram from scratch; `incremental_layout()` (public) preserves existing element positions and adds/removes only what changed. Submodules: `sfdp.rs` (force-directed placement), `annealing.rs` (crossing reduction), `chain.rs` (stock-flow chain positioning), `config.rs` (layout parameters including `module_width`/`module_height`), `connector.rs` (link routing), `graph.rs` (graph data structures), `metadata.rs` (feedback loops, dominant periods), `placement.rs` (label optimization, normalization), `text.rs` (label sizing), `uid.rs` (UID management), `layout_tests.rs` (unit tests for composable layout blocks and incremental operations). `LayoutState` is the public mutable state struct used by both paths: `LayoutState::new()` for fresh layout, `LayoutState::from_existing_view()` for incremental. Incremental helpers: `identify_new_elements()`, `compute_new_element_positions()`, `settle_new_elements()`, `diff_connectors()`, `diff_clouds()`, `apply_deletion()`, `apply_rename()`. The convenience wrappers `generate_best_layout()` and `generate_layout_with_config()` remain as the primary public API for callers. Generates view elements for modules (not just stocks/flows/auxes).
+- **`src/diagram/`** - Diagram/sketch rendering: `elements.rs`, `connector.rs`, `flow.rs`, `render.rs`, `common.rs`, `constants.rs`, `label.rs`, `arrowhead.rs`. The `connector`/`elements`/`flow`/`label` submodules are `pub(crate)` so the layout-quality metric (`layout::metrics`) can reuse the exact same geometry the SVG renderer draws (a layout's score can never disagree with what is rendered). `connector.rs` exposes `pub(crate) connector_polyline` -- the polyline the renderer draws for a connector: straight links clipped to element boundaries (matching `render_straight_line`), arcs sampled along the arc circle (`ARC_POLYLINE_SAMPLES`, byte-identical to `render_arc`'s SVG), MultiPoint links returning empty (nothing is drawn for them today). `common.rs` carries the shared `Rect`/`Point`/`Circle` geometry plus `pub(crate)` rect/segment helpers (`rect_area`, `rect_overlap_area`, `rect_contains_point`, `segment_length_in_rect`, `rect_width`/`rect_height`). `elements.rs`/`flow.rs` expose label-free `*_shape_bounds` (`aux_shape_bounds`, `stock_shape_bounds`, `flow_shape_bounds`) alongside the label-merged `*_bounds`, so the metric can charge node-shape overlap and connector-under-shape against the bare shape and label-vs-label overlap separately (no double-count).
+- **`src/layout/`** - Automatic diagram layout generation (available on all targets including WASM; uses serial fallback when rayon is unavailable). Two entry points: `generate_best_layout()` (public) generates a complete diagram from scratch; `incremental_layout()` (public) preserves existing element positions and adds/removes only what changed. Submodules: `sfdp.rs` (force-directed placement), `annealing.rs` (crossing reduction), `chain.rs` (stock-flow chain positioning), `config.rs` (layout parameters including `module_width`/`module_height`), `connector.rs` (link routing), `graph.rs` (graph data structures), `metadata.rs` (feedback loops, dominant periods), `placement.rs` (label optimization, normalization), `text.rs` (label sizing), `uid.rs` (UID management), `metrics.rs` and `eval_stats.rs` (the layout-quality metric and its eval statistics; see below), `layout_tests.rs`/`crossings_tests.rs`/`layout_selection_tests.rs` (unit tests for composable layout blocks, crossing geometry, and best-of-k selection). `LayoutState` is the public mutable state struct used by both paths: `LayoutState::new()` for fresh layout, `LayoutState::from_existing_view()` for incremental. Incremental helpers: `identify_new_elements()`, `compute_new_element_positions()`, `settle_new_elements()`, `diff_connectors()`, `diff_clouds()`, `apply_deletion()`, `apply_rename()`. The convenience wrappers `generate_best_layout()` and `generate_layout_with_config()` remain as the primary public API for callers. Generates view elements for modules (not just stocks/flows/auxes).
+  - **Deterministic per seed (#633)**: `fresh_layout` and the incremental `diff_connectors` produce a bit-identical layout for a fixed `(model, seed)` across repeated calls. HashMap iteration order is per-process random, so every layout-affecting iteration over a `HashMap`/`HashSet` is materialized into a sorted `Vec` first: `run_sfdp_with_rigid_chains`'s `var_to_node` centroid/aux-placement loops, and `diff_connectors`'s new-edge / alias-match / preserved-link loops (which allocate sequential uids and append to `state.elements`).
+  - **Best-of-k selection by the calibrated metric**: `generate_best_layout` runs `LAYOUT_SEEDS` (now `pub` -- the eval sweep uses the same seed set as its production proxy) in parallel and `select_best_layout` picks the candidate minimizing `metrics::compute_layout_metrics(view, cfg).weighted_cost(&MetricWeights::default())` -- the full calibrated readability metric, NOT fewest crossings. Selection is NaN-safe (a degenerate NaN-cost layout never wins over a finite one regardless of order; all-NaN keeps the earliest) and ties break to the lowest seed. The `LayoutResult` struct carries `weighted_cost` (no separate `crossings` field; the metric's `crossings` term computes the accurate count internally).
+  - **`count_view_crossings` / `build_view_segments`**: `build_view_segments` is the single source of crossing geometry, shared with `metrics.rs`. Connector geometry comes from `diagram::connector::connector_polyline` (the exact drawn polyline: straight links clipped to boundaries, arcs sampled), and ALL element kinds are resolved by uid (Module/Alias links are no longer silently dropped -- the previous chord-based code only mapped Stock/Flow/Aux/Cloud). Vertex naming suppresses self- and shared-endpoint crossings (`elem_{uid}` endpoints, per-link `link_{uid}#{i}` interior arc samples); a flow's valve is injected as an `elem_{flow.uid}` pipe vertex so a link incident on the valve no longer miscounts as crossing the pipe.
+- **`src/layout/metrics.rs`** - Functional-Core layout-quality metric. `compute_layout_metrics(view, config) -> LayoutMetrics` is pure (no I/O), guaranteed finite (each division guards a zero denominator with 0), so empty/single-element views score all-zero. `LayoutMetrics` per-term costs (0.0 = ideal): `node_overlap` (pairwise node-shape-box overlap), `node_connector_overlap` (connector length under non-incident node shapes -- both on label-free shape boxes), `label_overlap` (per-label obscured fraction), `crossings`, `sprawl`, `edge_length_cv`, `aspect_penalty` (beyond the `TARGET_AR_MAX = 16:9` band), `loop_compactness` (mean isoperimetric `1 - Q` over feedback cycles), and the reserved `chain_straightness` (always 0.0). `LayoutMetrics::weighted_cost(&MetricWeights)` is `Sigma w_i * term_i`. `MetricWeights::default()` is the calibrated readability-dominant production set (overlap/crossings family at 1.0; `sprawl`/`edge_length_cv`/`aspect_penalty` deliberately 0.0 -- spreading out for legibility is good, not penalized; `loop_compactness` a gentle 0.25; `chain_straightness` 0.0). Both structs derive `Serialize`/`Deserialize` purely so the eval sweep can emit/round-trip its JSON artifacts.
+- **`src/layout/eval_stats.rs`** - Functional-Core benchstat-style statistics for the layout-quality seed-sample sweep: `geomean`/`percentile`/`median`/`mann_whitney_u` (non-parametric significance test) plus the `MetricSample`/`ModelStats`/`CorpusReport`/`Comparison` aggregation types and `compare(baseline, candidate)`. No I/O; every primitive returns a finite documented default (`0.0`, or a non-significant `p_value` of `1.0`) on empty/degenerate input, never NaN.
 
 ## Utilities
 
@@ -150,7 +155,7 @@ The unit subsystem is partial-result throughout: a single bad declaration or one
 - **`tests/simulate_systems.rs`** - Systems format simulation integration tests (fixtures in `test/systems-format/`)
 - **`tests/simulate_ltm.rs`** - LTM feature tests
 - **`tests/systems_roundtrip.rs`** - Systems format parse-translate-write round-trip tests
-- **`tests/layout.rs`** - Layout generation integration tests (chains, connectors, modules, LTM metadata, dominant periods, incremental layout operations)
+- **`tests/layout.rs`** - Layout generation integration tests (chains, connectors, modules, LTM metadata, dominant periods, incremental layout operations, and the per-seed bit-identical-layout determinism guard for both the fresh and incremental paths -- #633)
 - **`tests/json_roundtrip.rs`** - JSON serialization roundtrip
 - **`tests/roundtrip.rs`** - XMILE/MDL roundtrip tests
 - **`tests/vm_alloc.rs`** - VM memory allocation tests
@@ -160,3 +165,4 @@ The unit subsystem is partial-result throughout: a single bad declaration or one
 - **`benches/compiler.rs`** - Compiler pipeline benchmarks on real models (WRLD3, C-LEARN)
 - **`benches/simulation.rs`** - VM execution and compilation benchmarks (synthetic models)
 - **`benches/array_ops.rs`** - Array operation benchmarks (sum, broadcast, element-wise)
+- **`examples/layout_eval.rs`** - On-demand layout-quality corpus sweep (gated `[[example]]` with `required-features = ["png_render", "file_io"]`, so the default `--all-targets` build skips it). Scores each model's best-of-`LAYOUT_SEEDS` layout with `metrics::compute_layout_metrics`, renders best/median/worst plus the reference PNGs, and emits `metrics.json` + `index.html` + a diff against the committed `examples/layout_eval_baseline.json` under `target/` (see `examples/layout_eval_baseline.README.md`)
diff --git a/src/simlin-engine/Cargo.toml b/src/simlin-engine/Cargo.toml
index c02081eed..de3c1f39a 100644
--- a/src/simlin-engine/Cargo.toml
+++ b/src/simlin-engine/Cargo.toml
@@ -115,6 +115,15 @@ name = "compiler_vector"
 name = "vdf_alias_decoder"
 required-features = ["file_io"]
 
+# The layout_eval example calls the png_render-gated `render_png` and loads
+# Vensim corpus models that reference external data (file_io). Examples are
+# auto-discovered and built by `--all-targets` / clippy / pre-commit under the
+# DEFAULT feature set (which excludes png_render); without this `[[example]]`
+# entry pinning required-features, that build would fail to compile the example.
+[[example]]
+name = "layout_eval"
+required-features = ["png_render", "file_io"]
+
 [[bench]]
 name = "array_ops"
 harness = false
diff --git a/src/simlin-engine/examples/layout_eval.rs b/src/simlin-engine/examples/layout_eval.rs
new file mode 100644
index 000000000..2199d5d70
--- /dev/null
+++ b/src/simlin-engine/examples/layout_eval.rs
@@ -0,0 +1,1194 @@
+// Copyright 2026 The Simlin Authors. All rights reserved.
+// Use of this source code is governed by the Apache License,
+// Version 2.0, that can be found in the LICENSE file.
+
+//! Layout-quality evaluation sweep (on-demand; NOT part of `cargo test`).
+//!
+//! Lays out a curated corpus of models across many seeds, scores each layout
+//! with the layout-quality metric, renders best/median/worst (and any
+//! hand-authored reference) to PNG, and writes a metrics table (JSON), an HTML
+//! contact-sheet, and a baseline diff -- all under a gitignored `target/` dir.
+//!
+//! This is a thin imperative shell over the metric core
+//! (`layout::metrics::compute_layout_metrics`) and the statistics core
+//! (`layout::eval_stats`). It loads each model via the public `open_xmile` /
+//! `open_vensim` loaders (like `examples/backend_bench.rs`), runs
+//! `generate_layout_with_config` per seed, scores, summarizes, renders, and
+//! emits artifacts.
+//!
+//! Usage:
+//!   cargo run --release -p simlin-engine --features png_render,file_io --example layout_eval
+//!   LAYOUT_EVAL_MODELS=teacup,sir cargo run ... --example layout_eval
+//!
+//! Env knobs:
+//!   LAYOUT_EVAL_MODELS         comma list of corpus keys to run (default: all)
+//!   LAYOUT_EVAL_SEEDS          seeds 0..M to sample, plus LAYOUT_SEEDS (default M: 25)
+//!   LAYOUT_EVAL_OUT            output directory (default: repo-root target/layout-eval)
+//!   LAYOUT_EVAL_WRITE_BASELINE 1 -> write this run's report to the committed
+//!                              baseline JSON (see below) instead of diffing.
+//!
+//! Baseline diff: a committed `examples/layout_eval_baseline.json` (a serialized
+//! `CorpusReport`) records a reference run. A normal run reads it back, runs
+//! `compare(baseline, candidate)`, and embeds the per-model + aggregate deltas
+//! (with Mann-Whitney U p-values / significance verdicts) into `metrics.json`
+//! and the `index.html` header. With `LAYOUT_EVAL_WRITE_BASELINE=1` the run
+//! instead overwrites that baseline file (re-seed it after the metric weights
+//! change). If the file is absent a normal run skips the diff with a note.
+//!
+//! Requires `--features png_render,file_io`: `png_render` for `render_png`, and
+//! `file_io` so Vensim corpus models that reference external data can load.
+
+use std::collections::BTreeSet;
+use std::env;
+use std::fmt::Write as _;
+use std::io::BufReader;
+
+use rayon::prelude::*;
+use serde::Serialize;
+use simlin_engine::diagram::{PngRenderOpts, render_png};
+use simlin_engine::layout::LAYOUT_SEEDS;
+use simlin_engine::layout::config::LayoutConfig;
+use simlin_engine::layout::eval_stats::{
+    Comparison, CorpusReport, MetricSample, ModelStats, compare,
+};
+use simlin_engine::layout::generate_layout_with_config;
+use simlin_engine::layout::metrics::{LayoutMetrics, MetricWeights, compute_layout_metrics};
+use simlin_engine::{datamodel, open_vensim, open_xmile};
+
+/// The model name the layout pipeline and renderer operate on. `Project::get_model`
+/// maps "main" to the single/main model (matching `tests/layout.rs`).
+const MAIN_MODEL: &str = "main";
+
+/// Default number of seeds to sample per model when `LAYOUT_EVAL_SEEDS` is unset.
+const DEFAULT_SEEDS: u64 = 25;
+
+/// Path (relative to `CARGO_MANIFEST_DIR` = `src/simlin-engine`) of the committed
+/// baseline `CorpusReport`. This file lives in the SOURCE TREE by design (it is
+/// checked in and diffed against on every normal run), unlike every other
+/// artifact, which is written under the gitignored `target/` output dir.
+const BASELINE_REL_PATH: &str = "examples/layout_eval_baseline.json";
+
+// ── Corpus ─────────────────────────────────────────────────────────────────
+
+#[derive(Clone, Copy)]
+enum Format {
+    Xmile,
+    Vensim,
+}
+
+struct ModelSpec {
+    key: &'static str,
+    /// Path relative to CARGO_MANIFEST_DIR (src/simlin-engine).
+    rel_path: &'static str,
+    format: Format,
+}
+
+use Format::{Vensim, Xmile};
+
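+/// The curated corpus (19 models). Paths are relative to `CARGO_MANIFEST_DIR`
+/// (`src/simlin-engine`). A missing or unparseable file is WARN-logged and the
+/// model is skipped at run time (see `process_model`), so the sweep never aborts.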
+const CORPUS: &[ModelSpec] = &[
+    // canonical small
+    ModelSpec {
+        key: "teacup",
+        rel_path: "../../test/test-models/samples/teacup/teacup.stmx",
+        format: Xmile,
+    },
+    ModelSpec {
+        key: "sir",
+        rel_path: "../../test/test-models/samples/SIR/SIR.stmx",
+        format: Xmile,
+    },
+    ModelSpec {
+        key: "logistic_growth",
+        rel_path: "../../test/logistic_growth_ltm/logistic_growth.stmx",
+        format: Xmile,
+    },
+    // default_projects: the app's curated, hand-laid-out built-in projects.
+    // These are the primary "good layout" taste anchors for Phase 4 calibration.
+    ModelSpec {
+        key: "fishbanks",
+        rel_path: "../../default_projects/fishbanks/model.xmile",
+        format: Xmile,
+    },
+    ModelSpec {
+        key: "dp_logistic_growth",
+        rel_path: "../../default_projects/logistic-growth/model.xmile",
+        format: Xmile,
+    },
+    ModelSpec {
+        key: "population",
+        rel_path: "../../default_projects/population/model.xmile",
+        format: Xmile,
+    },
+    ModelSpec {
+        key: "reliability",
+        rel_path: "../../default_projects/reliability/model.xmile",
+        format: Xmile,
+    },
+    // modules
+    ModelSpec {
+        key: "hares_and_foxes",
+        rel_path: "../../test/modules_hares_and_foxes/modules_hares_and_foxes.stmx",
+        format: Xmile,
+    },
+    // multipoint connectors
+    ModelSpec {
+        key: "multipoint",
+        rel_path: "../../test/test-models/samples/display/multipoint-connection.stmx",
+        format: Xmile,
+    },
+    // aliases
+    ModelSpec {
+        key: "alias1",
+        rel_path: "../../test/alias1/alias1.stmx",
+        format: Xmile,
+    },
+    // LTM / loop models
+    ModelSpec {
+        key: "cross_element",
+        rel_path: "../../test/cross_element_ltm/cross_element.stmx",
+        format: Xmile,
+    },
+    ModelSpec {
+        key: "arrayed_pop",
+        rel_path: "../../test/arrayed_population_ltm/arrayed_population.stmx",
+        format: Xmile,
+    },
+    // ai-information reference set (human vs AI; used by Phase 4 calibration)
+    ModelSpec {
+        key: "ai_pure_human",
+        rel_path: "../../test/ai-information/PureHumanModel.stmx",
+        format: Xmile,
+    },
+    ModelSpec {
+        key: "ai_pure_ai",
+        rel_path: "../../test/ai-information/PureAIModel.stmx",
+        format: Xmile,
+    },
+    ModelSpec {
+        key: "ai_edited",
+        rel_path: "../../test/ai-information/GeneratedByAIThenEdited.stmx",
+        format: Xmile,
+    },
+    ModelSpec {
+        key: "ai_modules_arrays",
+        rel_path: "../../test/ai-information/WithModulesAndArrays.stmx",
+        format: Xmile,
+    },
+    // large metasd Vensim
+    ModelSpec {
+        key: "wrld3_03",
+        rel_path: "../../test/metasd/WRLD3-03/wrld3-03.mdl",
+        format: Vensim,
+    },
+    ModelSpec {
+        key: "beer_game",
+        rel_path: "../../test/metasd/beer-game/RealBeer4-Sterman13.mdl",
+        format: Vensim,
+    },
+    ModelSpec {
+        key: "wonderland",
+        rel_path: "../../test/metasd/wonderland/Wonderland3.mdl",
+        format: Vensim,
+    },
+];
+
+/// Resolve a corpus-relative path against the crate manifest dir.
+fn abs_path(rel: &str) -> String {
+    format!("{}/{}", env!("CARGO_MANIFEST_DIR"), rel)
+}
+
+/// Load one corpus model, dispatching on its declared format: XMILE through a
+/// buffered reader + `open_xmile`, Vensim `.mdl` through a string + `open_vensim`
+/// (mirrors `examples/backend_bench.rs`). Returns a human-readable error on any
+/// I/O or parse failure so the caller can WARN-and-skip (AC3.6).
+fn load_model(spec: &ModelSpec) -> Result<datamodel::Project, String> {
+    let path = abs_path(spec.rel_path);
+    match spec.format {
+        Format::Xmile => {
+            let file =
+                std::fs::File::open(&path).map_err(|e| format!("failed to open {path}: {e}"))?;
+            let mut reader = BufReader::new(file);
+            open_xmile(&mut reader).map_err(|e| format!("failed to parse {path}: {e:?}"))
+        }
+        Format::Vensim => {
+            let contents = std::fs::read_to_string(&path)
+                .map_err(|e| format!("failed to read {path}: {e}"))?;
+            open_vensim(&contents).map_err(|e| format!("failed to parse {path}: {e:?}"))
+        }
+    }
+}
+
+/// Count the view elements in the model's as-loaded main view -- the
+/// hand-authored diagram that is scored and rendered as the reference. A model
+/// with no hand-authored view yields 0 here (its layouts are generated from
+/// scratch by the seed sweep).
+fn loaded_element_count(project: &datamodel::Project) -> usize {
+    reference_view(project)
+        .map(|sf| sf.elements.len())
+        .unwrap_or(0)
+}
+
+/// Borrow the model's as-loaded main `StockFlow` view if it is a hand-authored
+/// reference, i.e. the model's first view is a `StockFlow` with non-empty
+/// `elements`. A model loaded without a saved diagram (whose layouts the sweep
+/// generates from scratch) has no such view, so this returns `None` and the
+/// caller skips the reference render.
+fn reference_view(project: &datamodel::Project) -> Option<&datamodel::StockFlow> {
+    let model = project.get_model(MAIN_MODEL)?;
+    match model.views.first() {
+        Some(datamodel::View::StockFlow(sf)) if !sf.elements.is_empty() => Some(sf),
+        _ => None,
+    }
+}
+
+// ── Env knobs ────────────────────────────────────────────────────────────────
+
+/// The set of corpus keys to run. `LAYOUT_EVAL_MODELS` is a comma list of keys;
+/// unset/empty means the whole corpus. Unknown keys are WARN-logged and dropped,
+/// so a typo is reported rather than silently ignored.
+fn selected_keys() -> Vec<&'static str> {
+    let Ok(raw) = env::var("LAYOUT_EVAL_MODELS") else {
+        return CORPUS.iter().map(|s| s.key).collect();
+    };
+    let requested: Vec<&str> = raw
+        .split(',')
+        .map(str::trim)
+        .filter(|s| !s.is_empty())
+        .collect();
+    if requested.is_empty() {
+        return CORPUS.iter().map(|s| s.key).collect();
+    }
+    let mut keys = Vec::new();
+    for want in requested {
+        match CORPUS.iter().find(|s| s.key == want) {
+            Some(spec) => keys.push(spec.key),
+            None => eprintln!("WARN: unknown model key {want:?}; skipping"),
+        }
+    }
+    keys
+}
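The key-selection rules in `selected_keys` are easiest to check as a pure function. The sketch below is a hypothetical refactor, `parse_keys`, that takes the raw variable as an `Option<&str>` and the corpus keys as a slice instead of reading the environment, so the behavior (blank-trimming, whole-corpus fallback, warn-and-drop for unknown keys) can be exercised without touching process state. It is an illustration, not code from this PR.

```rust
/// Hypothetical pure counterpart of `selected_keys`: `raw` stands in for the
/// `LAYOUT_EVAL_MODELS` value (`None` = unset) and `corpus` for the corpus keys.
fn parse_keys<'a>(raw: Option<&str>, corpus: &[&'a str]) -> Vec<&'a str> {
    // Split on commas, trim whitespace, and drop empty entries.
    let requested: Vec<&str> = raw
        .unwrap_or("")
        .split(',')
        .map(str::trim)
        .filter(|s| !s.is_empty())
        .collect();
    // Unset or all-blank selects the whole corpus, in corpus order.
    if requested.is_empty() {
        return corpus.to_vec();
    }
    // Keep known keys in the order requested; warn about and drop the rest.
    let mut keys = Vec::new();
    for want in requested {
        match corpus.iter().find(|&&k| k == want) {
            Some(&k) => keys.push(k),
            None => eprintln!("WARN: unknown model key {want:?}; skipping"),
        }
    }
    keys
}

fn main() {
    let corpus = ["teacup", "sir", "wrld3_03"];
    // "tecup" is a typo: it is reported on stderr and dropped.
    println!("{:?}", parse_keys(Some(" sir , tecup"), &corpus));
}
```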
+
+/// Number of seeds M to sample per model (`LAYOUT_EVAL_SEEDS`, default 25; an
+/// unparseable value also falls back to the default).
+fn seed_count() -> u64 {
+    env::var("LAYOUT_EVAL_SEEDS")
+        .ok()
+        .and_then(|v| v.parse().ok())
+        .unwrap_or(DEFAULT_SEEDS)
+}
+
+/// The seeds to sample: the union of the production best-of-k proxy
+/// (`LAYOUT_SEEDS`) and `0..m`, deduped and sorted. Including `LAYOUT_SEEDS`
+/// guarantees the best-of-k production proxy is always computable regardless of
+/// `m`.
+fn seed_set(m: u64) -> Vec<u64> {
+    let mut seeds: BTreeSet<u64> = (0..m).collect();
+    seeds.extend(LAYOUT_SEEDS);
+    seeds.into_iter().collect()
+}
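To make the union concrete, here is a self-contained copy of `seed_set` with a placeholder `LAYOUT_SEEDS`. The values `[7, 42, 1234]` are hypothetical (the real constant lives in `simlin_engine::layout` and is not shown in this diff); the point is that production seeds already inside `0..m` are deduplicated, and any outside it are merged in sorted order.

```rust
use std::collections::BTreeSet;

/// HYPOTHETICAL placeholder for `simlin_engine::layout::LAYOUT_SEEDS`.
const LAYOUT_SEEDS: [u64; 3] = [7, 42, 1234];

/// Union of `0..m` and the production seeds; the `BTreeSet` dedupes and sorts.
fn seed_set(m: u64) -> Vec<u64> {
    let mut seeds: BTreeSet<u64> = (0..m).collect();
    seeds.extend(LAYOUT_SEEDS);
    seeds.into_iter().collect()
}

fn main() {
    println!("{:?}", seed_set(3));
}
```

Even with `m = 0` the production seeds are present, which is what keeps the best-of-k proxy computable.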
+
+/// The output directory (`LAYOUT_EVAL_OUT`, default repo-root
+/// `target/layout-eval`, derived from `CARGO_MANIFEST_DIR`).
+fn out_dir() -> String {
+    env::var("LAYOUT_EVAL_OUT")
+        .unwrap_or_else(|_| format!("{}/../../target/layout-eval", env!("CARGO_MANIFEST_DIR")))
+}
+
+/// Whether to (re)seed the committed baseline instead of diffing against it.
+/// True when `LAYOUT_EVAL_WRITE_BASELINE` is set to a truthy value (`1`/`true`,
+/// case-insensitive). Any other value -- and an unset variable -- means a normal
+/// diffing run.
+fn write_baseline_requested() -> bool {
+    matches!(
+        env::var("LAYOUT_EVAL_WRITE_BASELINE")
+            .unwrap_or_default()
+            .trim()
+            .to_ascii_lowercase()
+            .as_str(),
+        "1" | "true"
+    )
+}
+
+/// Absolute path of the committed baseline `CorpusReport` JSON. Resolved against
+/// `CARGO_MANIFEST_DIR` so it always points at the source-tree file regardless
+/// of the working directory the example runs from.
+fn baseline_path() -> String {
+    format!("{}/{}", env!("CARGO_MANIFEST_DIR"), BASELINE_REL_PATH)
+}
+
+// ── Per-model seed sweep ─────────────────────────────────────────────────────
+
+/// Lay out `project`'s main model once for each `seed`, score each layout, and
+/// summarize the samples into a `ModelStats`.
+///
+/// The per-seed layouts run in parallel via rayon (mirroring
+/// `generate_best_layout`'s `par_iter` over seeds). The parallel results are
+/// collapsed back into `seeds`-order before being summarized, so the sample
+/// vector -- and every statistic derived from it -- is invariant to rayon's
+/// scheduling: parallelism introduces no nondeterminism here.
+///
+/// `generate_layout_with_config` is deterministic per seed (fix #633): the same
+/// `(model, seed)` pair produces the identical layout on repeated calls within
+/// and across processes, so the reported median/spread are reproducible.
+///
+/// A seed whose layout fails to generate is dropped with a WARN (a single bad
+/// seed must not sink the whole model's sweep). A model whose layout fails on
+/// EVERY seed yields an empty `samples` vector here; the caller
+/// (`process_model`) treats that zero-usable-samples case as a model-level
+/// failure and skips the model (`WARN: skipping {key}: ...`), so a model that
+/// never lays out is omitted from the report rather than reported as a
+/// degenerate all-zero entry (AC3.6).
+fn sweep_model(key: &str, project: &datamodel::Project, seeds: &[u64]) -> ModelStats {
+    // Compute one (seed, sample) per seed in parallel, then sort back into seed
+    // order so the sample vector -- and therefore every statistic derived from
+    // it -- is independent of rayon's scheduling.
+    let mut indexed: Vec<(u64, MetricSample)> = seeds
+        .par_iter()
+        .filter_map(|&seed| {
+            let cfg = LayoutConfig {
+                annealing_random_seed: seed,
+                ..LayoutConfig::default()
+            };
+            match generate_layout_with_config(project, MAIN_MODEL, cfg.clone(), None) {
+                Ok(view) => {
+                    let metrics = compute_layout_metrics(&view, &cfg);
+                    let weighted_cost = metrics.weighted_cost(&MetricWeights::default());
+                    Some((
+                        seed,
+                        MetricSample {
+                            seed,
+                            metrics,
+                            weighted_cost,
+                        },
+                    ))
+                }
+                Err(err) => {
+                    eprintln!("WARN: {key} seed {seed} failed to lay out: {err}");
+                    None
+                }
+            }
+        })
+        .collect();
+
+    indexed.sort_by_key(|(seed, _)| *seed);
+    let samples: Vec<MetricSample> = indexed.into_iter().map(|(_, sample)| sample).collect();
+
+    ModelStats::from_samples(key.to_string(), samples, &LAYOUT_SEEDS)
+}
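The "collect in any order, then sort by seed" pattern in `sweep_model` can be demonstrated with the standard library alone. In this hedged sketch, threads plus an `mpsc` channel stand in for rayon, and `fake_cost` is a hypothetical deterministic stand-in for laying out and scoring one seed. Results arrive in a scheduling-dependent order, but sorting by seed makes the returned vector identical on every run.

```rust
use std::sync::mpsc;
use std::thread;

/// HYPOTHETICAL stand-in for "lay out with this seed and score it": a pure
/// function of the seed, mirroring a layout that is deterministic per seed.
fn fake_cost(seed: u64) -> f64 {
    (seed.wrapping_mul(2_654_435_761) % 1000) as f64 / 1000.0
}

/// Score every seed on its own thread, receive results in ARRIVAL order
/// (which depends on scheduling), then sort by seed so the output is stable.
fn sweep(seeds: &[u64]) -> Vec<(u64, f64)> {
    let (tx, rx) = mpsc::channel();
    for &seed in seeds {
        let tx = tx.clone();
        thread::spawn(move || {
            tx.send((seed, fake_cost(seed))).expect("receiver is alive");
        });
    }
    // Drop our own sender so `rx` ends once every worker has sent and exited.
    drop(tx);
    let mut results: Vec<(u64, f64)> = rx.into_iter().collect();
    results.sort_by_key(|&(seed, _)| seed);
    results
}

fn main() {
    println!("{:?}", sweep(&[5, 3, 9, 1, 7]));
}
```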
+
+// ── Rendering ────────────────────────────────────────────────────────────────
+
+/// One rendered diagram: the PNG filename written under the out dir (relative,
+/// so `index.html` can reference it with a sibling `<img src>`) and the metric
+/// breakdown of the view that was rendered. The seed is `Some` for a generated
+/// render (best/median/worst) and `None` for the as-loaded reference.
+///
+/// The report builder reads `seed`, `metrics`, and `weighted_cost` and
+/// serializes them into `metrics.json` and the contact sheet's per-render
+/// breakdown table. They are kept as data here (rather than dropped and
+/// recomputed) so the report builder is a pure read over this struct.
+struct Render {
+    /// Filename of the PNG, relative to the out dir (e.g. `sir_best.png`).
+    file: String,
+    /// The seed that produced the generated view (`None` for the reference).
+    seed: Option<u64>,
+    /// Per-term metrics of the rendered view.
+    metrics: LayoutMetrics,
+    /// Scalar weighted cost under the calibrated default weights.
+    weighted_cost: f64,
+}
+
+/// All renders produced for one model: the optional hand-authored reference and
+/// the three generated layouts (best/median/worst). The report builder
+/// serializes these per-model metric breakdowns into `metrics.json` and the
+/// contact sheet, so the fields are kept as data it can read back. A render that
+/// failed is `None` (the failure was already WARN-logged), so one failed render
+/// leaves a gap in the report instead of aborting the sweep (AC3.6).
+struct ModelRenders {
+    reference: Option<Render>,
+    best: Option<Render>,
+    median: Option<Render>,
+    worst: Option<Render>,
+}
+
+/// Render one view to a PNG file under `out` and pair it with its precomputed
+/// `metrics` and the resulting `weighted_cost` under the default weights.
+/// (Callers score with a `LayoutConfig` that differs from the default only in
+/// its annealing seed; the metric core reads the config only for node sizing,
+/// which is constant across the sweep.) On any render or write failure, WARN to
+/// stderr and return `None` so the sweep continues (AC3.6).
+///
+/// `project` must already carry the view to render as the main model's first
+/// view (the renderer reads `model.views.first()`). The caller installs the
+/// view (into a clone of the project for a generated layout, or uses the
+/// as-loaded project for the reference) before calling.
+fn render_view(
+    project: &datamodel::Project,
+    metrics: LayoutMetrics,
+    seed: Option<u64>,
+    file: &str,
+    out: &str,
+) -> Option<Render> {
+    let png = match render_png(project, MAIN_MODEL, &PngRenderOpts::default()) {
+        Ok(bytes) => bytes,
+        Err(err) => {
+            eprintln!("WARN: failed to render {file}: {err}");
+            return None;
+        }
+    };
+    let path = format!("{out}/{file}");
+    if let Err(err) = std::fs::write(&path, &png) {
+        eprintln!("WARN: failed to write {path}: {err}");
+        return None;
+    }
+    let weighted_cost = metrics.weighted_cost(&MetricWeights::default());
+    Some(Render {
+        file: file.to_string(),
+        seed,
+        metrics,
+        weighted_cost,
+    })
+}
+
+/// Regenerate the view for `seed`, install it into a clone of `project`, render
+/// it to `{key}_{suffix}.png`, and return the `Render`. A layout-generation
+/// failure is non-fatal: WARN and return `None`.
+fn render_generated(
+    key: &str,
+    suffix: &str,
+    project: &datamodel::Project,
+    seed: u64,
+    out: &str,
+) -> Option<Render> {
+    let cfg = LayoutConfig {
+        annealing_random_seed: seed,
+        ..LayoutConfig::default()
+    };
+    let view = match generate_layout_with_config(project, MAIN_MODEL, cfg.clone(), None) {
+        Ok(view) => view,
+        Err(err) => {
+            eprintln!("WARN: {key} {suffix} (seed {seed}) failed to lay out: {err}");
+            return None;
+        }
+    };
+    let metrics = compute_layout_metrics(&view, &cfg);
+    // Install the generated view into a clone so the as-loaded project (and its
+    // reference view) is never mutated.
+    let mut p = project.clone();
+    p.get_model_mut(MAIN_MODEL).unwrap().views = vec![datamodel::View::StockFlow(view)];
+    let file = format!("{key}_{suffix}.png");
+    render_view(&p, metrics, Some(seed), &file, out)
+}
+
+/// Render the model's best/median/worst generated layouts and -- if the model
+/// ships a hand-authored view -- its reference, all to PNGs under `out`.
+///
+/// The reference is rendered from the AS-LOADED `project` (before any view is
+/// overwritten) so it captures the model's own diagram, not a generated one.
+/// Generated layouts are each regenerated from `project` by seed and installed
+/// into a fresh clone, leaving `project` untouched.
+fn render_model(
+    key: &str,
+    project: &datamodel::Project,
+    stats: &ModelStats,
+    out: &str,
+) -> ModelRenders {
+    // Reference first, from the as-loaded project, before any clone-and-install.
+    // Score the hand-authored `StockFlow` directly (the renderer reads the same
+    // view from `project`, so this is the geometry being rasterized).
+    let reference = reference_view(project).and_then(|sf| {
+        let metrics = compute_layout_metrics(sf, &LayoutConfig::default());
+        render_view(project, metrics, None, &format!("{key}_reference.png"), out)
+    });
+
+    // A model whose sweep produced no samples has all-zero seeds and nothing
+    // worth rendering; skip the generated renders (the reference, if any, is
+    // already captured).
+    if stats.samples.is_empty() {
+        return ModelRenders {
+            reference,
+            best: None,
+            median: None,
+            worst: None,
+        };
+    }
+
+    let best = render_generated(key, "best", project, stats.best_seed, out);
+    let median = render_generated(key, "median", project, stats.median_seed, out);
+    let worst = render_generated(key, "worst", project, stats.worst_seed, out);
+
+    ModelRenders {
+        reference,
+        best,
+        median,
+        worst,
+    }
+}
+
+/// Print the PNG filenames produced for one model (and note a skipped reference
+/// or generated render) so a run's stdout records exactly what was written.
+fn report_renders(key: &str, renders: &ModelRenders) {
+    let mut produced: Vec<&str> = Vec::new();
+    for render in [
+        &renders.reference,
+        &renders.best,
+        &renders.median,
+        &renders.worst,
+    ]
+    .into_iter()
+    .flatten()
+    {
+        produced.push(render.file.as_str());
+    }
+    if produced.is_empty() {
+        println!("{key}: no PNGs rendered");
+    } else {
+        println!("{key}: rendered {}", produced.join(", "));
+    }
+    if renders.reference.is_none() {
+        println!("{key}: no hand-authored reference view (skipped reference render)");
+    }
+}
+
+// ── Per-model pipeline (skip-on-failure) ─────────────────────────────────────
+
+/// Run one model's full pipeline -- load -> seed sweep -> render -- and return
+/// its `(ModelStats, ModelRenders)` on success.
+///
+/// This is the model-level skip-on-failure boundary (AC3.6): EVERY way a single
+/// model can fail funnels through the returned `Err(String)`, which `main` turns
+/// into a `WARN: skipping {key}: {err}` and a continue to the next model, so one
+/// bad model never aborts the sweep and is simply omitted from the report.
+///
+/// Three failure modes, validated in the order data flows (defense-in-depth):
+///   1. **Load failure** (entry layer): a missing file or a parse error is
+///      already surfaced as `Err(String)` by `load_model`; propagated with `?`.
+///   2. **No usable layout** (business layer): `sweep_model` drops each
+///      individually-failing seed with a WARN but still returns a (possibly
+///      empty) `ModelStats`. A model whose layout failed on EVERY seed has zero
+///      samples and cannot be scored, rendered, or aggregated -- it is a
+///      model-level failure here, returned as `Err`. Crucially this only fires
+///      when ALL seeds failed: a model with even one usable sample proceeds, so
+///      a partial per-seed failure never sinks the model.
+///   3. **Render failure** (handled inside `render_model`): a layout that scores
+///      but fails to rasterize or write is non-fatal -- it is WARN-logged and
+///      its `Render` is `None`. A model can therefore appear in the report with
+///      its statistics but a missing PNG cell; this is intentionally NOT a
+///      model-level skip (the scores are still meaningful).
+fn process_model(
+    spec: &ModelSpec,
+    seeds: &[u64],
+    out: &str,
+) -> Result<(ModelStats, ModelRenders), String> {
+    // 1. Load (entry-layer validation lives in `load_model`).
+    let project = load_model(spec)?;
+
+    let n = loaded_element_count(&project);
+    println!("loaded {}: {n} elements", spec.key);
+
+    // 2. Sweep. A model with zero usable samples failed to lay out on every
+    //    seed; that is a model-level failure, not a degenerate all-zero entry.
+    let stats = sweep_model(spec.key, &project, seeds);
+    if stats.samples.is_empty() {
+        return Err(format!(
+            "no usable layout: all {} seed(s) failed to lay out",
+            seeds.len(),
+        ));
+    }
+
+    let (p25, p75) = stats.spread;
+    println!(
+        "{}: median={:.4} p25/p75={:.4}/{:.4} best_of_k={:.4} (M={})",
+        spec.key,
+        stats.median_cost,
+        p25,
+        p75,
+        stats.best_of_k_cost,
+        stats.samples.len(),
+    );
+
+    // 3. Render best/median/worst (and the reference, if any). Render failures
+    //    are non-fatal: `render_model` WARN-logs and leaves the cell `None`.
+    let renders = render_model(spec.key, &project, &stats, out);
+    report_renders(spec.key, &renders);
+
+    Ok((stats, renders))
+}
+
+// ── Report (metrics.json + index.html) ──────────────────────────────────────
+//
+// The structs below are the on-disk JSON shape. They are PURE DATA built once
+// from the in-memory `ModelStats` + `ModelRenders` the sweep produced, then
+// serialized straight to disk -- no recomputation. The contact-sheet HTML is
+// rendered from the same `EvalReport`, so the JSON table and the HTML can never
+// disagree. Building the report and rendering the HTML are pure (the only I/O
+// is the two `std::fs::write` calls in `main`).
+
+/// One rendered view's row in the JSON: the PNG filename, the seed that
+/// produced it (`None` for the as-loaded reference), the full per-term
+/// `LayoutMetrics` breakdown, and the scalar `weighted_cost` under the weights
+/// in use.
+#[derive(Serialize)]
+struct RenderReport {
+    file: String,
+    seed: Option<u64>,
+    metrics: LayoutMetrics,
+    weighted_cost: f64,
+}
+
+/// One model's full row in the JSON: its summary statistics (the seed-sweep
+/// center/spread, the best-of-k production proxy, the chosen best/median/worst
+/// seeds, and `m` -- the number of seeds that produced a usable sample) plus
+/// each of its renders' per-term breakdowns (`reference` present only when the
+/// model ships a hand-authored view).
+#[derive(Serialize)]
+struct ModelReport {
+    model: String,
+    /// Number of usable samples: the size of the swept seed set (the union of
+    /// `LAYOUT_SEEDS` and `0..M`, deduped) minus any seed that failed to lay
+    /// out. Recorded so a reader can interpret the spread.
+    m: usize,
+    median_cost: f64,
+    /// `(p25, p75)` of the per-seed weighted costs.
+    spread: (f64, f64),
+    /// Production proxy: min weighted cost over the `LAYOUT_SEEDS` seed set.
+    best_of_k_cost: f64,
+    best_seed: u64,
+    median_seed: u64,
+    worst_seed: u64,
+    /// The hand-authored reference render + score, when the model ships one.
+    reference: Option<RenderReport>,
+    best: Option<RenderReport>,
+    median: Option<RenderReport>,
+    worst: Option<RenderReport>,
+}
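`ModelStats::from_samples` (in `layout::eval_stats`, not shown in this diff) fills `median_cost`, `spread`, and `best_of_k_cost`. Here is a minimal sketch of what those fields mean, assuming linear interpolation between order statistics for the percentiles (the crate's exact percentile convention may differ) and a hypothetical `Sample` type carrying only a seed and its weighted cost.

```rust
/// HYPOTHETICAL per-seed sample: the seed and its scalar weighted cost.
struct Sample {
    seed: u64,
    cost: f64,
}

/// Linearly interpolated quantile `q` (0.0..=1.0) of an ascending, non-empty slice.
fn quantile(sorted: &[f64], q: f64) -> f64 {
    let pos = q * (sorted.len() - 1) as f64;
    let (lo, hi) = (pos.floor() as usize, pos.ceil() as usize);
    sorted[lo] + (sorted[hi] - sorted[lo]) * (pos - lo as f64)
}

/// Returns `(median, (p25, p75), best_of_k)`, or `None` for an empty sweep.
/// `best_of_k` is the minimum cost among samples whose seed is in
/// `production_seeds`; it is `f64::INFINITY` if none of them produced a sample.
fn summarize(samples: &[Sample], production_seeds: &[u64]) -> Option<(f64, (f64, f64), f64)> {
    if samples.is_empty() {
        return None;
    }
    let mut costs: Vec<f64> = samples.iter().map(|s| s.cost).collect();
    costs.sort_by(f64::total_cmp);
    let best_of_k = samples
        .iter()
        .filter(|s| production_seeds.contains(&s.seed))
        .map(|s| s.cost)
        .fold(f64::INFINITY, f64::min);
    Some((
        quantile(&costs, 0.5),
        (quantile(&costs, 0.25), quantile(&costs, 0.75)),
        best_of_k,
    ))
}

fn main() {
    let samples: Vec<Sample> = [5.0, 1.0, 4.0, 2.0, 3.0]
        .iter()
        .enumerate()
        .map(|(i, &cost)| Sample { seed: i as u64, cost })
        .collect();
    println!("{:?}", summarize(&samples, &[1, 3]));
}
```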
+
+/// The top-level `metrics.json` document: every scored model plus the corpus
+/// aggregates (the geomean of per-model medians and the weight set used).
+///
+/// `baseline_comparison` carries the baseline-vs-candidate diff (per-model +
+/// aggregate deltas with Mann-Whitney p-values) when a committed baseline JSON
+/// is present; it is `None` (and serde-skipped) when there is no baseline to
+/// diff against. A reader therefore sees the diff embedded directly in the JSON,
+/// or no `baseline_comparison` key at all.
+#[derive(Serialize)]
+struct EvalReport {
+    /// Models sorted worst-cost-first (highest `median_cost` at the front), the
+    /// same order the contact-sheet renders so the JSON and HTML agree.
+    models: Vec<ModelReport>,
+    /// Geometric mean of the per-model medians -- the single headline aggregate.
+    geomean_of_medians: f64,
+    /// The `MetricWeights` used to compute every `weighted_cost` in this report.
+    weights: MetricWeights,
+    /// The baseline-vs-candidate diff, present only when a committed baseline
+    /// `CorpusReport` was found and compared against this run.
+    #[serde(skip_serializing_if = "Option::is_none")]
+    baseline_comparison: Option<Comparison>,
+}
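`geomean_of_medians` is the single headline aggregate. A minimal sketch of a geometric mean computed in log space (exponentiating the mean of the logarithms, which avoids overflow or underflow on long products) follows. It assumes every median is strictly positive and returns `None` otherwise, since how `eval_stats` treats a zero-cost median is not shown in this diff; `geomean` is a hypothetical name.

```rust
/// HYPOTHETICAL helper: geometric mean of strictly positive values, computed
/// as exp(mean(ln v)). Returns `None` for an empty slice or any value <= 0.
fn geomean(values: &[f64]) -> Option<f64> {
    if values.is_empty() || values.iter().any(|&v| v <= 0.0) {
        return None;
    }
    let mean_ln = values.iter().map(|v| v.ln()).sum::<f64>() / values.len() as f64;
    Some(mean_ln.exp())
}

fn main() {
    println!("{:?}", geomean(&[2.0, 8.0, 4.0]));
}
```

A geometric mean weighs relative changes equally: halving one model's median scales the aggregate by the same factor whether that model is small or large, so no single large model dominates the headline number.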
+
+/// Map an in-memory `Render` to its JSON row.
+fn render_report(render: &Render) -> RenderReport {
+    RenderReport {
+        file: render.file.clone(),
+        seed: render.seed,
+        metrics: render.metrics,
+        weighted_cost: render.weighted_cost,
+    }
+}
+
+/// Build the serializable report from the sweep's in-memory results.
+///
+/// PURE: a read over `(per_model, renders)` (paired positionally -- they are
+/// pushed together per model in `main`) plus the corpus `geomean_of_medians`
+/// and the weight set. Models are sorted worst-cost-first (highest median at
+/// the front), the order the contact-sheet inspects top-down as the visual
+/// guardrail; ties break on the model name so the order is deterministic.
+fn build_report(
+    per_model: &[ModelStats],
+    renders: &[ModelRenders],
+    geomean_of_medians: f64,
+    weights: &MetricWeights,
+    baseline_comparison: Option<Comparison>,
+) -> EvalReport {
+    let mut models: Vec<ModelReport> = per_model
+        .iter()
+        .zip(renders.iter())
+        .map(|(stats, render)| ModelReport {
+            model: stats.model.clone(),
+            m: stats.samples.len(),
+            median_cost: stats.median_cost,
+            spread: stats.spread,
+            best_of_k_cost: stats.best_of_k_cost,
+            best_seed: stats.best_seed,
+            median_seed: stats.median_seed,
+            worst_seed: stats.worst_seed,
+            reference: render.reference.as_ref().map(render_report),
+            best: render.best.as_ref().map(render_report),
+            median: render.median.as_ref().map(render_report),
+            worst: render.worst.as_ref().map(render_report),
+        })
+        .collect();
+
+    // Worst-cost-first: highest median at the front. Sort descending by median,
+    // tie-break on model name (ascending) for a deterministic ordering. NaN
+    // medians can't occur (eval_stats guarantees finite costs), but `total_cmp`
+    // keeps the comparator a total order even if one did; a `partial_cmp` that
+    // maps NaN to `Equal` is not transitive, and `sort_by` may panic on that.
+    models.sort_by(|a, b| {
+        b.median_cost
+            .total_cmp(&a.median_cost)
+            .then_with(|| a.model.cmp(&b.model))
+    });
+
+    EvalReport {
+        models,
+        geomean_of_medians,
+        weights: *weights,
+        baseline_comparison,
+    }
+}
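The worst-first ordering can be checked in isolation on a hypothetical `Row` type that carries only the two fields the comparator reads. This sketch uses `f64::total_cmp`, which is a total order over every `f64` value, so the comparator stays consistent and the result is deterministic.

```rust
/// HYPOTHETICAL report row: just the fields the ordering reads.
struct Row {
    model: String,
    median_cost: f64,
}

/// Worst-cost-first: descending by median cost, ties broken by model name
/// (ascending).
fn sort_worst_first(rows: &mut [Row]) {
    rows.sort_by(|a, b| {
        b.median_cost
            .total_cmp(&a.median_cost)
            .then_with(|| a.model.cmp(&b.model))
    });
}

fn main() {
    let mut rows: Vec<Row> = [("b", 2.0), ("c", 3.0), ("a", 3.0), ("z", 1.0)]
        .into_iter()
        .map(|(m, c)| Row { model: m.to_string(), median_cost: c })
        .collect();
    sort_worst_first(&mut rows);
    let order: Vec<&str> = rows.iter().map(|r| r.model.as_str()).collect();
    println!("{order:?}");
}
```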
+
+/// HTML-escape the five characters that are special in element text or
+/// attribute values. The interpolated strings are static model keys and
+/// PNG filenames derived from them, so this is defense-in-depth rather than a
+/// live injection vector -- but escaping unconditionally keeps the artifact
+/// well-formed if a corpus key ever gains a special character.
+fn html_escape(s: &str) -> String {
+    let mut out = String::with_capacity(s.len());
+    for ch in s.chars() {
+        match ch {
+            '&' => out.push_str("&amp;"),
+            '<' => out.push_str("&lt;"),
+            '>' => out.push_str("&gt;"),
+            '"' => out.push_str("&quot;"),
+            '\'' => out.push_str("&#39;"),
+            _ => out.push(ch),
+        }
+    }
+    out
+}
+
+/// Render the per-term metric breakdown for one render as a compact two-column
+/// table (term name -> value), with the scalar `weighted_cost` as the final
+/// row. PURE: appends to `html`.
+fn write_metrics_table(html: &mut String, render: &RenderReport) {
+    let m = &render.metrics;
+    let rows = [
+        ("node_overlap", m.node_overlap),
+        ("node_connector_overlap", m.node_connector_overlap),
+        ("label_overlap", m.label_overlap),
+        ("crossings", m.crossings),
+        ("sprawl", m.sprawl),
+        ("edge_length_cv", m.edge_length_cv),
+        ("aspect_penalty", m.aspect_penalty),
+        ("chain_straightness", m.chain_straightness),
+        ("loop_compactness", m.loop_compactness),
+    ];
+    html.push_str("<table class=\"metrics\">");
+    for (name, value) in rows {
+        let _ = write!(
+            html,
+            "<tr><td>{name}</td><td class=\"num\">{value:.4}</td></tr>"
+        );
+    }
+    let _ = write!(
+        html,
+        "<tr class=\"wcost\"><td>weighted_cost</td><td class=\"num\">{:.4}</td></tr>",
+        render.weighted_cost
+    );
+    html.push_str("</table>");
+}
+
+/// Render the cell for one render (heading + image + breakdown table). A
+/// missing render (the model shipped no reference, or its layout/render
+/// failed) gets a muted placeholder so the contact-sheet records the gap
+/// rather than hiding it. PURE: appends to `html`.
+fn write_render_cell(html: &mut String, kind: &str, render: Option<&RenderReport>) {
+    html.push_str("<div class=\"cell\">");
+    let _ = write!(html, "<h4>{}</h4>", html_escape(kind));
+    match render {
+        Some(r) => {
+            let src = html_escape(&r.file);
+            let alt = html_escape(&format!("{kind} layout"));
+            let _ = write!(html, "<img src=\"{src}\" alt=\"{alt}\">");
+            if let Some(seed) = r.seed {
+                let _ = write!(html, "<p class=\"seed\">seed {seed}</p>");
+            }
+            write_metrics_table(html, r);
+        }
+        None => html.push_str("<p class=\"missing\">(not rendered)</p>"),
+    }
+    html.push_str("</div>");
+}
+
+/// Format a `delta_ratio` as a signed percentage with two decimals (e.g.
+/// `+3.20%`, `-0.00%`). PURE.
+fn fmt_delta_pct(ratio: f64) -> String {
+    format!("{:+.2}%", ratio * 100.0)
+}
+
+/// Render the baseline-vs-candidate diff into the header: the aggregate delta,
+/// p-value, and significance verdict, then a per-model table of the baseline
+/// and candidate medians, `delta_ratio`, the Mann-Whitney p-value, and the
+/// significance verdict. A `None` comparison (no baseline was diffed, e.g. none
+/// is committed or this run re-seeded it) renders a muted note instead, so the
+/// contact-sheet always records whether a baseline was diffed. PURE: appends to
+/// `html`.
+fn write_baseline_diff(html: &mut String, comparison: Option<&Comparison>) {
+    let Some(cmp) = comparison else {
+        html.push_str(
+            "<p class=\"none\">No baseline diff (run with \
+             <code>LAYOUT_EVAL_WRITE_BASELINE=1</code> to seed one).</p>\n",
+        );
+        return;
+    };
+
+    html.push_str("<div class=\"baseline\"><h3>Baseline diff</h3>");
+    let agg_class = if cmp.aggregate_significant {
+        "sig"
+    } else {
+        "nonsig"
+    };
+    let agg_verdict = if cmp.aggregate_significant {
+        "significant"
+    } else {
+        "not significant"
+    };
+    let _ = write!(
+        html,
+        "<p class=\"agg\">aggregate delta <code>{}</code> &middot; \
+         p={:.4} &middot; <span class=\"{agg_class}\">{agg_verdict}</span></p>",
+        fmt_delta_pct(cmp.aggregate_delta_ratio),
+        cmp.aggregate_p_value,
+    );
+
+    if cmp.per_model.is_empty() {
+        html.push_str("<p class=\"agg\">(no models matched the baseline)</p></div>\n");
+        return;
+    }
+
+    html.push_str(
+        "<table class=\"diff\"><tr><th>model</th><th>baseline</th>\
+         <th>candidate</th><th>delta</th><th>p</th><th>significance</th></tr>",
+    );
+    for m in &cmp.per_model {
+        let (cls, verdict) = if m.significant {
+            ("sig", "significant")
+        } else {
+            ("nonsig", "&mdash;")
+        };
+        let _ = write!(
+            html,
+            "<tr><td>{}</td><td class=\"num\">{:.4}</td><td class=\"num\">{:.4}</td>\
+             <td class=\"num\">{}</td><td class=\"num\">{:.4}</td>\
+             <td class=\"{cls}\">{verdict}</td></tr>",
+            html_escape(&m.model),
+            m.baseline_median,
+            m.candidate_median,
+            fmt_delta_pct(m.delta_ratio),
+            m.p_value,
+        );
+    }
+    html.push_str("</table></div>\n");
+}
+
+/// Render the self-contained `index.html` contact-sheet from the report.
+///
+/// PURE: a string built from `report`. The header shows the corpus
+/// `geomean_of_medians`, the weight set, and (when a committed baseline was
+/// diffed) the baseline-vs-candidate delta table; models are laid out one
+/// section per model, worst-cost-first (the report is already sorted), each with
+/// its reference (if any) and best/median/worst renders side by side and a
+/// per-term breakdown under each. `<img>` paths are relative to the out dir so
+/// the file references its sibling PNGs.
+fn render_index_html(report: &EvalReport) -> String {
+    let mut html = String::new();
+    html.push_str(
+        "<!DOCTYPE html>\n<html lang=\"en\">\n<head>\n<meta charset=\"utf-8\">\n\
+         <meta name=\"viewport\" content=\"width=device-width, initial-scale=1\">\n\
+         <title>Layout quality eval</title>\n<style>\n\
+         :root { font-family: Roboto, Helvetica, Arial, sans-serif; }\n\
+         body { margin: 24px; color: #1a1a1a; background: #fafafa; }\n\
+         h1 { font-size: 20px; margin: 0 0 4px; }\n\
+         .summary { color: #555; font-size: 13px; margin-bottom: 16px; }\n\
+         .summary code { background: #eee; padding: 1px 4px; border-radius: 4px; }\n\
+         table.weights { border-collapse: collapse; font-size: 12px; margin: 8px 0 24px; }\n\
+         table.weights td { border: 1px solid #ddd; padding: 2px 8px; }\n\
+         .baseline { border: 1px solid #ddd; border-radius: 4px; background: #fff;\n\
+                     padding: 8px 12px; margin: 8px 0 24px; }\n\
+         .baseline h3 { font-size: 13px; margin: 0 0 6px; }\n\
+         .baseline .agg { font-size: 12px; color: #555; margin: 0 0 6px; }\n\
+         table.diff { border-collapse: collapse; font-size: 12px; }\n\
+         table.diff th, table.diff td { border: 1px solid #eee; padding: 2px 8px;\n\
+                                        text-align: right; }\n\
+         table.diff th:first-child, table.diff td:first-child { text-align: left; }\n\
+         table.diff td.num { font-variant-numeric: tabular-nums; }\n\
+         .sig { color: #c62828; font-weight: 600; }\n\
+         .nonsig { color: #888; }\n\
+         .none { color: #999; font-style: italic; font-size: 12px; margin: 0 0 24px; }\n\
+         .model { border: 1px solid #ddd; border-radius: 4px; background: #fff;\n\
+                  padding: 12px 16px; margin-bottom: 20px; }\n\
+         .model h2 { font-size: 16px; margin: 0 0 2px; }\n\
+         .model .stats { color: #555; font-size: 12px; margin-bottom: 12px; }\n\
+         .renders { display: flex; flex-wrap: wrap; gap: 16px; }\n\
+         .cell { flex: 0 0 auto; max-width: 280px; }\n\
+         .cell h4 { font-size: 13px; margin: 0 0 4px; text-transform: capitalize; }\n\
+         .cell img { max-width: 280px; height: auto; border: 1px solid #eee;\n\
+                     background: #fff; display: block; }\n\
+         .cell .seed { font-size: 11px; color: #888; margin: 4px 0 2px; }\n\
+         .cell .missing { font-size: 12px; color: #999; font-style: italic; }\n\
+         table.metrics { border-collapse: collapse; font-size: 11px; margin-top: 4px;\n\
+                         width: 100%; }\n\
+         table.metrics td { border-bottom: 1px solid #f0f0f0; padding: 1px 4px; }\n\
+         table.metrics td.num { text-align: right; font-variant-numeric: tabular-nums; }\n\
+         table.metrics tr.wcost td { font-weight: 600; border-top: 1px solid #ccc; }\n\
+         </style>\n</head>\n<body>\n",
+    );
+
+    html.push_str("<h1>Layout quality eval</h1>\n");
+    let _ = writeln!(
+        &mut html,
+        "<p class=\"summary\">Corpus <code>geomean_of_medians = {:.4}</code> over \
+         {} model(s), sorted worst-cost-first.</p>",
+        report.geomean_of_medians,
+        report.models.len(),
+    );
+
+    // The weight set used for every weighted_cost in this report.
+    let w = &report.weights;
+    let weight_rows = [
+        ("node_overlap", w.node_overlap),
+        ("node_connector_overlap", w.node_connector_overlap),
+        ("label_overlap", w.label_overlap),
+        ("crossings", w.crossings),
+        ("sprawl", w.sprawl),
+        ("edge_length_cv", w.edge_length_cv),
+        ("aspect_penalty", w.aspect_penalty),
+        ("chain_straightness", w.chain_straightness),
+        ("loop_compactness", w.loop_compactness),
+    ];
+    html.push_str("<table class=\"weights\"><caption>weights</caption>");
+    for (name, value) in weight_rows {
+        let _ = write!(
+            &mut html,
+            "<tr><td>{name}</td><td class=\"num\">{value:.4}</td></tr>"
+        );
+    }
+    html.push_str("</table>\n");
+
+    write_baseline_diff(&mut html, report.baseline_comparison.as_ref());
+
+    for model in &report.models {
+        let name = html_escape(&model.model);
+        html.push_str("<section class=\"model\">");
+        let _ = write!(&mut html, "<h2>{name}</h2>");
+        let _ = write!(
+            &mut html,
+            "<p class=\"stats\">median={:.4} &middot; p25/p75={:.4}/{:.4} &middot; \
+             best_of_k={:.4} &middot; M={} &middot; \
+             seeds best/median/worst={}/{}/{}</p>",
+            model.median_cost,
+            model.spread.0,
+            model.spread.1,
+            model.best_of_k_cost,
+            model.m,
+            model.best_seed,
+            model.median_seed,
+            model.worst_seed,
+        );
+        html.push_str("<div class=\"renders\">");
+        write_render_cell(&mut html, "reference", model.reference.as_ref());
+        write_render_cell(&mut html, "best", model.best.as_ref());
+        write_render_cell(&mut html, "median", model.median.as_ref());
+        write_render_cell(&mut html, "worst", model.worst.as_ref());
+        html.push_str("</div></section>\n");
+    }
+
+    html.push_str("</body>\n</html>\n");
+    html
+}
+
+// ── Baseline diff (imperative shell) ─────────────────────────────────────────
+
+/// Write `candidate` to the committed baseline JSON, replacing any existing
+/// file. The full `CorpusReport` -- including each model's per-seed `samples` --
+/// is serialized so a later run can re-run Mann-Whitney U over the seed-sample
+/// cost sets. On a serialize or write failure, WARN to stderr (the run still
+/// emits its `target/` artifacts; only the baseline re-seed failed).
+fn write_baseline(candidate: &CorpusReport) {
+    let path = baseline_path();
+    match serde_json::to_string_pretty(candidate) {
+        Ok(json) => match std::fs::write(&path, json) {
+            Ok(()) => println!(
+                "wrote baseline {path}\n\
+                 note: re-seed this baseline after the metric weights change."
+            ),
+            Err(err) => eprintln!("WARN: failed to write baseline {path}: {err}"),
+        },
+        Err(err) => eprintln!("WARN: failed to serialize baseline: {err}"),
+    }
+}
+
+/// Read and deserialize the committed baseline `CorpusReport`, if present.
+///
+/// Returns `None` (with a one-line note) when the file does not exist -- the
+/// expected state before a baseline has been seeded. A file that exists but
+/// fails to read or parse is a real error: WARN with the cause and return `None`
+/// so the run still emits its artifacts without a diff.
+fn read_baseline() -> Option<CorpusReport> {
+    let path = baseline_path();
+    let json = match std::fs::read_to_string(&path) {
+        Ok(json) => json,
+        Err(err) if err.kind() == std::io::ErrorKind::NotFound => {
+            println!("no baseline; run with LAYOUT_EVAL_WRITE_BASELINE=1 to seed one.");
+            return None;
+        }
+        Err(err) => {
+            eprintln!("WARN: failed to read baseline {path}: {err}");
+            return None;
+        }
+    };
+    match serde_json::from_str::<CorpusReport>(&json) {
+        Ok(report) => Some(report),
+        Err(err) => {
+            eprintln!("WARN: failed to parse baseline {path}: {err}");
+            None
+        }
+    }
+}
+
+/// Print the baseline-vs-candidate diff to stdout: one line per matched model
+/// (delta + p-value + significance) and an aggregate line. PURE-ish: reads
+/// `cmp` and prints; kept in the shell because it does I/O (stdout).
+fn print_comparison(cmp: &Comparison) {
+    println!("baseline diff (candidate vs baseline):");
+    for m in &cmp.per_model {
+        let verdict = if m.significant {
+            "significant"
+        } else {
+            "not significant"
+        };
+        println!(
+            "  {}: delta={} p={:.4} ({verdict})",
+            m.model,
+            fmt_delta_pct(m.delta_ratio),
+            m.p_value,
+        );
+    }
+    if cmp.per_model.is_empty() {
+        println!("  (no models matched the baseline)");
+    }
+    let agg_verdict = if cmp.aggregate_significant {
+        "significant"
+    } else {
+        "not significant"
+    };
+    println!(
+        "  aggregate: delta={} p={:.4} ({agg_verdict})",
+        fmt_delta_pct(cmp.aggregate_delta_ratio),
+        cmp.aggregate_p_value,
+    );
+}
+
+/// Resolve the baseline diff for this run.
+///
+/// When `LAYOUT_EVAL_WRITE_BASELINE` is set, (re)seed the committed baseline
+/// from `candidate` and return `None` (a seeding run reports no diff -- there is
+/// nothing yet to diff against). Otherwise read the committed baseline (if any),
+/// run `compare(baseline, candidate)`, print the diff, and return it for
+/// embedding in the artifacts. Absent baseline -> `None`.
+fn resolve_baseline_diff(candidate: &CorpusReport) -> Option<Comparison> {
+    if write_baseline_requested() {
+        write_baseline(candidate);
+        return None;
+    }
+    let baseline = read_baseline()?;
+    let cmp = compare(&baseline, candidate);
+    print_comparison(&cmp);
+    Some(cmp)
+}
+
+fn main() {
+    let keys = selected_keys();
+    let m = seed_count();
+    let seeds = seed_set(m);
+    let out = out_dir();
+
+    std::fs::create_dir_all(&out)
+        .unwrap_or_else(|e| panic!("failed to create output dir {out}: {e}"));
+
+    let n_sampled = seeds.len();
+    println!(
+        "layout_eval: {} model(s), M={m} seeds (sampling {n_sampled} unique), out={out}",
+        keys.len(),
+    );
+
+    // Per-model skip-on-failure (AC3.6): each model's full pipeline (load ->
+    // sweep -> render) is wrapped in `process_model`. ANY failure -- a load
+    // error, a layout that fails on every seed, etc. -- is WARN-logged and the
+    // sweep CONTINUES to the next model; the failed model is omitted from
+    // `per_model`/`renders` (and therefore from every artifact). The harness
+    // always reaches the end and exits 0, even if every model was skipped.
+    //
+    // `per_model` and `renders` stay positionally paired: both are pushed
+    // exactly once per surviving model, so the Task-4 report builder can zip
+    // them.
+    let mut per_model: Vec<ModelStats> = Vec::new();
+    let mut renders: Vec<ModelRenders> = Vec::new();
+    let mut skipped = 0usize;
+    for spec in CORPUS.iter().filter(|s| keys.contains(&s.key)) {
+        match process_model(spec, &seeds, &out) {
+            Ok((stats, model_renders)) => {
+                per_model.push(stats);
+                renders.push(model_renders);
+            }
+            Err(err) => {
+                eprintln!("WARN: skipping {}: {err}", spec.key);
+                skipped += 1;
+            }
+        }
+    }
+    if skipped > 0 {
+        println!("skipped {skipped} model(s) (see WARN lines above)");
+    }
+
+    let corpus = CorpusReport::from_model_stats(per_model);
+    println!(
+        "corpus: geomean_of_medians={:.4} ({} model(s) scored)",
+        corpus.geomean_of_medians,
+        corpus.per_model.len(),
+    );
+
+    let with_reference = renders.iter().filter(|r| r.reference.is_some()).count();
+    println!(
+        "corpus: {with_reference}/{} model(s) shipped a hand-authored reference view",
+        renders.len(),
+    );
+
+    // Either (re)seed the committed baseline from this run, or diff this run's
+    // report against the committed baseline (printing the per-model + aggregate
+    // deltas with Mann-Whitney p-values). The returned `Comparison` (if any) is
+    // embedded into both artifacts below.
+    let baseline_comparison = resolve_baseline_diff(&corpus);
+
+    // Build the serializable report from the in-memory stats + renders, then
+    // emit both artifacts under the out dir (which defaults under the gitignored
+    // repo-root `target/`). `corpus.per_model` and `renders` are positionally
+    // paired -- both are pushed once per surviving model in the loop above.
+    let report = build_report(
+        &corpus.per_model,
+        &renders,
+        corpus.geomean_of_medians,
+        &MetricWeights::default(),
+        baseline_comparison,
+    );
+
+    let metrics_path = format!("{out}/metrics.json");
+    match serde_json::to_string_pretty(&report) {
+        Ok(json) => match std::fs::write(&metrics_path, json) {
+            Ok(()) => println!("wrote {metrics_path}"),
+            Err(err) => eprintln!("WARN: failed to write {metrics_path}: {err}"),
+        },
+        Err(err) => eprintln!("WARN: failed to serialize metrics.json: {err}"),
+    }
+
+    let index_path = format!("{out}/index.html");
+    let html = render_index_html(&report);
+    match std::fs::write(&index_path, html) {
+        Ok(()) => println!("wrote {index_path}"),
+        Err(err) => eprintln!("WARN: failed to write {index_path}: {err}"),
+    }
+}
diff --git a/src/simlin-engine/examples/layout_eval_baseline.README.md b/src/simlin-engine/examples/layout_eval_baseline.README.md
new file mode 100644
index 000000000..c007f65ca
--- /dev/null
+++ b/src/simlin-engine/examples/layout_eval_baseline.README.md
@@ -0,0 +1,33 @@
+# layout_eval_baseline.json
+
+The committed baseline `CorpusReport` that `examples/layout_eval.rs` diffs every
+normal run against (per-model + aggregate deltas with Mann-Whitney U p-values).
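The heart of that comparison is the Mann-Whitney U statistic over two cost samples (baseline seeds vs candidate seeds). The sketch below shows only the statistic, computed from rank sums with midranks for ties. The name `mann_whitney_u` is hypothetical: this is not the code in `layout_eval.rs`, and it omits the p-value step that turns U into the reported significance.

```rust
/// Mann-Whitney U of sample `a` against sample `b`: the number of pairs
/// `(a_i, b_j)` with `a_i > b_j`, each tie counting one half. Computed from the
/// rank sum of `a` in the pooled, sorted samples, giving tied values the
/// average (mid) rank of the positions they occupy.
fn mann_whitney_u(a: &[f64], b: &[f64]) -> f64 {
    // Pool both samples, tagging each value with whether it came from `a`.
    let mut pooled: Vec<(f64, bool)> = a
        .iter()
        .map(|&x| (x, true))
        .chain(b.iter().map(|&x| (x, false)))
        .collect();
    pooled.sort_by(|x, y| x.0.total_cmp(&y.0));

    let mut rank_sum_a = 0.0;
    let mut i = 0;
    while i < pooled.len() {
        // Extend `j` over the run of values tied with `pooled[i]`.
        let mut j = i;
        while j + 1 < pooled.len() && pooled[j + 1].0 == pooled[i].0 {
            j += 1;
        }
        // 0-based positions i..=j hold 1-based ranks i+1..=j+1; their mean:
        let midrank = (i + j + 2) as f64 / 2.0;
        let from_a = pooled[i..=j].iter().filter(|p| p.1).count();
        rank_sum_a += midrank * from_a as f64;
        i = j + 1;
    }
    let n_a = a.len() as f64;
    // Subtracting the smallest possible rank sum of `a` leaves the pair count.
    rank_sum_a - n_a * (n_a + 1.0) / 2.0
}
```

If both sample sets collapse to one shared value (each model's committed samples are currently identical across seeds, so an unchanged layout would do exactly this), every value ties and U sits at the no-effect midpoint `n_a * n_b / 2`. The tie-corrected normal-approximation variance is then exactly zero, so any p-value step built on this sketch needs a guard for that case.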
+
+## How this snapshot was seeded
+
+This baseline was seeded over a **small representative subset** of the corpus to
+keep the run fast and the committed JSON modest:
+
+```
+LAYOUT_EVAL_MODELS=sir,teacup LAYOUT_EVAL_SEEDS=8 LAYOUT_EVAL_WRITE_BASELINE=1 \
+  cargo run --release -p simlin-engine --features png_render,file_io --example layout_eval
+```
+
+It records the **current pre-Rung-0 layout behavior**, scored with the committed
+calibrated `MetricWeights::default()`. It was re-seeded on 2026-05-23 after
+Phase 4 committed those weights and `layout_eval.rs` switched from the Phase-3
+`PLACEHOLDER_WEIGHTS` to `MetricWeights::default()`. Do not seed the full metasd
+corpus here: that is minutes-scale and produces a large JSON.
+
+## When to regenerate
+
+REGENERATE this baseline:
+
+- **Whenever the calibrated `MetricWeights::default()` weights change**: the
+  weighted costs change, so the recorded sample costs go stale.
+- **Before Phase 5 measures Rung 0's improvement**: the baseline must capture
+  pre-Rung-0 behavior with the final calibrated weights so the Rung-0 diff is
+  meaningful.
+
+Re-run the seeding command above (optionally over a broader model set / larger
+`LAYOUT_EVAL_SEEDS`) and commit the regenerated `layout_eval_baseline.json`.
diff --git a/src/simlin-engine/examples/layout_eval_baseline.json b/src/simlin-engine/examples/layout_eval_baseline.json
new file mode 100644
index 000000000..660587f39
--- /dev/null
+++ b/src/simlin-engine/examples/layout_eval_baseline.json
@@ -0,0 +1,393 @@
+{
+  "per_model": [
+    {
+      "model": "teacup",
+      "samples": [
+        {
+          "seed": 0,
+          "metrics": {
+            "node_overlap": 0.03901734104046243,
+            "node_connector_overlap": 0.0,
+            "label_overlap": 0.0,
+            "crossings": 0.0,
+            "sprawl": 0.774985901426613,
+            "edge_length_cv": 0.3203457592744067,
+            "aspect_penalty": 0.0,
+            "chain_straightness": 0.0,
+            "loop_compactness": 0.0
+          },
+          "weighted_cost": 0.03901734104046243
+        },
+        {
+          "seed": 1,
+          "metrics": {
+            "node_overlap": 0.03901734104046243,
+            "node_connector_overlap": 0.0,
+            "label_overlap": 0.0,
+            "crossings": 0.0,
+            "sprawl": 0.774985901426613,
+            "edge_length_cv": 0.3203457592744067,
+            "aspect_penalty": 0.0,
+            "chain_straightness": 0.0,
+            "loop_compactness": 0.0
+          },
+          "weighted_cost": 0.03901734104046243
+        },
+        {
+          "seed": 2,
+          "metrics": {
+            "node_overlap": 0.03901734104046243,
+            "node_connector_overlap": 0.0,
+            "label_overlap": 0.0,
+            "crossings": 0.0,
+            "sprawl": 0.774985901426613,
+            "edge_length_cv": 0.3203457592744067,
+            "aspect_penalty": 0.0,
+            "chain_straightness": 0.0,
+            "loop_compactness": 0.0
+          },
+          "weighted_cost": 0.03901734104046243
+        },
+        {
+          "seed": 3,
+          "metrics": {
+            "node_overlap": 0.03901734104046243,
+            "node_connector_overlap": 0.0,
+            "label_overlap": 0.0,
+            "crossings": 0.0,
+            "sprawl": 0.774985901426613,
+            "edge_length_cv": 0.3203457592744067,
+            "aspect_penalty": 0.0,
+            "chain_straightness": 0.0,
+            "loop_compactness": 0.0
+          },
+          "weighted_cost": 0.03901734104046243
+        },
+        {
+          "seed": 4,
+          "metrics": {
+            "node_overlap": 0.03901734104046243,
+            "node_connector_overlap": 0.0,
+            "label_overlap": 0.0,
+            "crossings": 0.0,
+            "sprawl": 0.774985901426613,
+            "edge_length_cv": 0.3203457592744067,
+            "aspect_penalty": 0.0,
+            "chain_straightness": 0.0,
+            "loop_compactness": 0.0
+          },
+          "weighted_cost": 0.03901734104046243
+        },
+        {
+          "seed": 5,
+          "metrics": {
+            "node_overlap": 0.03901734104046243,
+            "node_connector_overlap": 0.0,
+            "label_overlap": 0.0,
+            "crossings": 0.0,
+            "sprawl": 0.774985901426613,
+            "edge_length_cv": 0.3203457592744067,
+            "aspect_penalty": 0.0,
+            "chain_straightness": 0.0,
+            "loop_compactness": 0.0
+          },
+          "weighted_cost": 0.03901734104046243
+        },
+        {
+          "seed": 6,
+          "metrics": {
+            "node_overlap": 0.03901734104046243,
+            "node_connector_overlap": 0.0,
+            "label_overlap": 0.0,
+            "crossings": 0.0,
+            "sprawl": 0.774985901426613,
+            "edge_length_cv": 0.3203457592744067,
+            "aspect_penalty": 0.0,
+            "chain_straightness": 0.0,
+            "loop_compactness": 0.0
+          },
+          "weighted_cost": 0.03901734104046243
+        },
+        {
+          "seed": 7,
+          "metrics": {
+            "node_overlap": 0.03901734104046243,
+            "node_connector_overlap": 0.0,
+            "label_overlap": 0.0,
+            "crossings": 0.0,
+            "sprawl": 0.774985901426613,
+            "edge_length_cv": 0.3203457592744067,
+            "aspect_penalty": 0.0,
+            "chain_straightness": 0.0,
+            "loop_compactness": 0.0
+          },
+          "weighted_cost": 0.03901734104046243
+        },
+        {
+          "seed": 42,
+          "metrics": {
+            "node_overlap": 0.03901734104046243,
+            "node_connector_overlap": 0.0,
+            "label_overlap": 0.0,
+            "crossings": 0.0,
+            "sprawl": 0.774985901426613,
+            "edge_length_cv": 0.3203457592744067,
+            "aspect_penalty": 0.0,
+            "chain_straightness": 0.0,
+            "loop_compactness": 0.0
+          },
+          "weighted_cost": 0.03901734104046243
+        },
+        {
+          "seed": 123,
+          "metrics": {
+            "node_overlap": 0.03901734104046243,
+            "node_connector_overlap": 0.0,
+            "label_overlap": 0.0,
+            "crossings": 0.0,
+            "sprawl": 0.774985901426613,
+            "edge_length_cv": 0.3203457592744067,
+            "aspect_penalty": 0.0,
+            "chain_straightness": 0.0,
+            "loop_compactness": 0.0
+          },
+          "weighted_cost": 0.03901734104046243
+        },
+        {
+          "seed": 456,
+          "metrics": {
+            "node_overlap": 0.03901734104046243,
+            "node_connector_overlap": 0.0,
+            "label_overlap": 0.0,
+            "crossings": 0.0,
+            "sprawl": 0.774985901426613,
+            "edge_length_cv": 0.3203457592744067,
+            "aspect_penalty": 0.0,
+            "chain_straightness": 0.0,
+            "loop_compactness": 0.0
+          },
+          "weighted_cost": 0.03901734104046243
+        },
+        {
+          "seed": 789,
+          "metrics": {
+            "node_overlap": 0.03901734104046243,
+            "node_connector_overlap": 0.0,
+            "label_overlap": 0.0,
+            "crossings": 0.0,
+            "sprawl": 0.774985901426613,
+            "edge_length_cv": 0.3203457592744067,
+            "aspect_penalty": 0.0,
+            "chain_straightness": 0.0,
+            "loop_compactness": 0.0
+          },
+          "weighted_cost": 0.03901734104046243
+        }
+      ],
+      "median_cost": 0.03901734104046243,
+      "spread": [
+        0.03901734104046243,
+        0.03901734104046243
+      ],
+      "best_of_k_cost": 0.03901734104046243,
+      "best_seed": 0,
+      "median_seed": 0,
+      "worst_seed": 0
+    },
+    {
+      "model": "sir",
+      "samples": [
+        {
+          "seed": 0,
+          "metrics": {
+            "node_overlap": 0.0,
+            "node_connector_overlap": 0.0,
+            "label_overlap": 0.038540721316451254,
+            "crossings": 0.0,
+            "sprawl": 0.7423022923087866,
+            "edge_length_cv": 0.39340989843910823,
+            "aspect_penalty": 0.06837606837606858,
+            "chain_straightness": 0.0,
+            "loop_compactness": 0.0
+          },
+          "weighted_cost": 0.038540721316451254
+        },
+        {
+          "seed": 1,
+          "metrics": {
+            "node_overlap": 0.0,
+            "node_connector_overlap": 0.0,
+            "label_overlap": 0.038540721316451254,
+            "crossings": 0.0,
+            "sprawl": 0.7423022923087866,
+            "edge_length_cv": 0.39340989843910823,
+            "aspect_penalty": 0.06837606837606858,
+            "chain_straightness": 0.0,
+            "loop_compactness": 0.0
+          },
+          "weighted_cost": 0.038540721316451254
+        },
+        {
+          "seed": 2,
+          "metrics": {
+            "node_overlap": 0.0,
+            "node_connector_overlap": 0.0,
+            "label_overlap": 0.038540721316451254,
+            "crossings": 0.0,
+            "sprawl": 0.7423022923087866,
+            "edge_length_cv": 0.39340989843910823,
+            "aspect_penalty": 0.06837606837606858,
+            "chain_straightness": 0.0,
+            "loop_compactness": 0.0
+          },
+          "weighted_cost": 0.038540721316451254
+        },
+        {
+          "seed": 3,
+          "metrics": {
+            "node_overlap": 0.0,
+            "node_connector_overlap": 0.0,
+            "label_overlap": 0.038540721316451254,
+            "crossings": 0.0,
+            "sprawl": 0.7423022923087866,
+            "edge_length_cv": 0.39340989843910823,
+            "aspect_penalty": 0.06837606837606858,
+            "chain_straightness": 0.0,
+            "loop_compactness": 0.0
+          },
+          "weighted_cost": 0.038540721316451254
+        },
+        {
+          "seed": 4,
+          "metrics": {
+            "node_overlap": 0.0,
+            "node_connector_overlap": 0.0,
+            "label_overlap": 0.038540721316451254,
+            "crossings": 0.0,
+            "sprawl": 0.7423022923087866,
+            "edge_length_cv": 0.39340989843910823,
+            "aspect_penalty": 0.06837606837606858,
+            "chain_straightness": 0.0,
+            "loop_compactness": 0.0
+          },
+          "weighted_cost": 0.038540721316451254
+        },
+        {
+          "seed": 5,
+          "metrics": {
+            "node_overlap": 0.0,
+            "node_connector_overlap": 0.0,
+            "label_overlap": 0.038540721316451254,
+            "crossings": 0.0,
+            "sprawl": 0.7423022923087866,
+            "edge_length_cv": 0.39340989843910823,
+            "aspect_penalty": 0.06837606837606858,
+            "chain_straightness": 0.0,
+            "loop_compactness": 0.0
+          },
+          "weighted_cost": 0.038540721316451254
+        },
+        {
+          "seed": 6,
+          "metrics": {
+            "node_overlap": 0.0,
+            "node_connector_overlap": 0.0,
+            "label_overlap": 0.038540721316451254,
+            "crossings": 0.0,
+            "sprawl": 0.7423022923087866,
+            "edge_length_cv": 0.39340989843910823,
+            "aspect_penalty": 0.06837606837606858,
+            "chain_straightness": 0.0,
+            "loop_compactness": 0.0
+          },
+          "weighted_cost": 0.038540721316451254
+        },
+        {
+          "seed": 7,
+          "metrics": {
+            "node_overlap": 0.0,
+            "node_connector_overlap": 0.0,
+            "label_overlap": 0.038540721316451254,
+            "crossings": 0.0,
+            "sprawl": 0.7423022923087866,
+            "edge_length_cv": 0.39340989843910823,
+            "aspect_penalty": 0.06837606837606858,
+            "chain_straightness": 0.0,
+            "loop_compactness": 0.0
+          },
+          "weighted_cost": 0.038540721316451254
+        },
+        {
+          "seed": 42,
+          "metrics": {
+            "node_overlap": 0.0,
+            "node_connector_overlap": 0.0,
+            "label_overlap": 0.038540721316451254,
+            "crossings": 0.0,
+            "sprawl": 0.7423022923087866,
+            "edge_length_cv": 0.39340989843910823,
+            "aspect_penalty": 0.06837606837606858,
+            "chain_straightness": 0.0,
+            "loop_compactness": 0.0
+          },
+          "weighted_cost": 0.038540721316451254
+        },
+        {
+          "seed": 123,
+          "metrics": {
+            "node_overlap": 0.0,
+            "node_connector_overlap": 0.0,
+            "label_overlap": 0.038540721316451254,
+            "crossings": 0.0,
+            "sprawl": 0.7423022923087866,
+            "edge_length_cv": 0.39340989843910823,
+            "aspect_penalty": 0.06837606837606858,
+            "chain_straightness": 0.0,
+            "loop_compactness": 0.0
+          },
+          "weighted_cost": 0.038540721316451254
+        },
+        {
+          "seed": 456,
+          "metrics": {
+            "node_overlap": 0.0,
+            "node_connector_overlap": 0.0,
+            "label_overlap": 0.038540721316451254,
+            "crossings": 0.0,
+            "sprawl": 0.7423022923087866,
+            "edge_length_cv": 0.39340989843910823,
+            "aspect_penalty": 0.06837606837606858,
+            "chain_straightness": 0.0,
+            "loop_compactness": 0.0
+          },
+          "weighted_cost": 0.038540721316451254
+        },
+        {
+          "seed": 789,
+          "metrics": {
+            "node_overlap": 0.0,
+            "node_connector_overlap": 0.0,
+            "label_overlap": 0.038540721316451254,
+            "crossings": 0.0,
+            "sprawl": 0.7423022923087866,
+            "edge_length_cv": 0.39340989843910823,
+            "aspect_penalty": 0.06837606837606858,
+            "chain_straightness": 0.0,
+            "loop_compactness": 0.0
+          },
+          "weighted_cost": 0.038540721316451254
+        }
+      ],
+      "median_cost": 0.038540721316451254,
+      "spread": [
+        0.038540721316451254,
+        0.038540721316451254
+      ],
+      "best_of_k_cost": 0.038540721316451254,
+      "best_seed": 0,
+      "median_seed": 0,
+      "worst_seed": 0
+    }
+  ],
+  "geomean_of_medians": 0.03877829892542217
+}
\ No newline at end of file
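The snapshot's `geomean_of_medians` can be cross-checked from its two per-model `median_cost` values. A minimal sketch, assuming the field is the plain geometric mean of the per-model medians; `geomean` is a hypothetical helper, not the engine's function, and it works in log space so a long corpus of small costs cannot underflow a running product.

```rust
/// Geometric mean via the mean of natural logs. Returns `None` for an empty
/// slice or any negative value. A zero entry yields 0.0, because
/// `ln(0) = -inf` and `exp(-inf) = 0`.
fn geomean(values: &[f64]) -> Option<f64> {
    if values.is_empty() || values.iter().any(|&v| v < 0.0) {
        return None;
    }
    let mean_ln = values.iter().map(|v| v.ln()).sum::<f64>() / values.len() as f64;
    Some(mean_ln.exp())
}
```

With the snapshot's medians (`0.03901734104046243` for `teacup`, `0.038540721316451254` for `sir`) this agrees with the recorded `0.03877829892542217` to within 1e-9.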
diff --git a/src/simlin-engine/src/diagram/common.rs b/src/simlin-engine/src/diagram/common.rs
index cf4a16596..683747f4d 100644
--- a/src/simlin-engine/src/diagram/common.rs
+++ b/src/simlin-engine/src/diagram/common.rs
@@ -137,6 +137,113 @@ pub fn rad_to_deg(r: f64) -> f64 {
     (r * 180.0) / PI
 }
 
+// These rectangle/segment geometry primitives are the load-bearing helpers for
+// the layout quality metric (`layout::metrics`). `rect_width`/`rect_height`/
+// `rect_area`/`rect_overlap_area` are consumed there (node-overlap,
+// label-overlap, sprawl, and aspect terms), and `segment_clip_interval_in_rect`
+// is the Liang-Barsky core that `node_connector_overlap` unions across boxes.
+// `rect_contains_point` and `segment_length_in_rect` are primitives kept for
+// completeness and as the single-box reference oracle the metric's tests check
+// the union path against, so each stays `#[allow(dead_code)]` until a non-test
+// caller needs it.
+
+/// Width of a rect (right - left). May be negative for a degenerate/inverted rect.
+pub(crate) fn rect_width(r: &Rect) -> f64 {
+    r.right - r.left
+}
+
+/// Height of a rect (bottom - top). May be negative for a degenerate/inverted rect.
+pub(crate) fn rect_height(r: &Rect) -> f64 {
+    r.bottom - r.top
+}
+
+/// Area of a rect, clamped to >= 0.
+pub(crate) fn rect_area(r: &Rect) -> f64 {
+    (rect_width(r).max(0.0)) * (rect_height(r).max(0.0))
+}
+
+/// Area of the axis-aligned intersection of two rects (0 if disjoint or only edge-touching).
+pub(crate) fn rect_overlap_area(a: &Rect, b: &Rect) -> f64 {
+    let w = a.right.min(b.right) - a.left.max(b.left);
+    let h = a.bottom.min(b.bottom) - a.top.max(b.top);
+    if w > 0.0 && h > 0.0 { w * h } else { 0.0 }
+}
+
+/// True if `p` lies inside (or on the boundary of) `r`.
+#[allow(dead_code)]
+pub(crate) fn rect_contains_point(r: &Rect, p: &Point) -> bool {
+    p.x >= r.left && p.x <= r.right && p.y >= r.top && p.y <= r.bottom
+}
+
+/// Clipped parameter interval `[t0, t1]` of segment `p0 + t*(p1-p0)` (t in
+/// [0,1]) that lies within axis-aligned rect `r`, or `None` if the segment never
+/// enters `r`. When `Some`, `0.0 <= t0 < t1 <= 1.0` (a zero-thickness touch
+/// where `t0 == t1` returns `None`, contributing no length). This is the
+/// Liang-Barsky core; `segment_length_in_rect` delegates to it, and
+/// `layout::metrics` uses the raw intervals to UNION a connector's coverage
+/// across multiple boxes so each physical sub-length is counted at most once.
+/// Pure; no allocation.
+pub(crate) fn segment_clip_interval_in_rect(
+    p0: &Point,
+    p1: &Point,
+    r: &Rect,
+) -> Option<(f64, f64)> {
+    // Liang-Barsky clip of the parametric segment p0 + t*(p1-p0), t in [0,1],
+    // against left/right/top/bottom slabs.
+    let dx = p1.x - p0.x;
+    let dy = p1.y - p0.y;
+    let mut t0 = 0.0_f64;
+    let mut t1 = 1.0_f64;
+    // (p, q) pairs for the four half-planes; segment inside slab where p*t <= q.
+    let edges = [
+        (-dx, p0.x - r.left),
+        (dx, r.right - p0.x),
+        (-dy, p0.y - r.top),
+        (dy, r.bottom - p0.y),
+    ];
+    for (p, q) in edges {
+        if p == 0.0 {
+            if q < 0.0 {
+                return None; // parallel and outside this slab
+            }
+        } else {
+            let t = q / p;
+            if p < 0.0 {
+                if t > t1 {
+                    return None;
+                }
+                if t > t0 {
+                    t0 = t;
+                }
+            } else {
+                if t < t0 {
+                    return None;
+                }
+                if t < t1 {
+                    t1 = t;
+                }
+            }
+        }
+    }
+    if t1 > t0 { Some((t0, t1)) } else { None }
+}
+
+/// Length of the portion of segment p0->p1 that lies within axis-aligned rect r.
+/// Returns 0 if the segment never enters r. Pure; no allocation. Delegates to
+/// `segment_clip_interval_in_rect` so the clip math lives in exactly one place.
+#[allow(dead_code)]
+pub(crate) fn segment_length_in_rect(p0: &Point, p1: &Point, r: &Rect) -> f64 {
+    match segment_clip_interval_in_rect(p0, p1, r) {
+        Some((t0, t1)) => {
+            let dx = p1.x - p0.x;
+            let dy = p1.y - p0.y;
+            let seg_len = (dx * dx + dy * dy).sqrt();
+            (t1 - t0) * seg_len
+        }
+        None => 0.0,
+    }
+}
+
 #[cfg(test)]
 mod tests {
     use super::*;
@@ -282,4 +389,194 @@ mod tests {
         assert!((rad_to_deg(PI) - 180.0).abs() < 1e-10);
         assert!((rad_to_deg(PI / 2.0) - 90.0).abs() < 1e-10);
     }
+
+    #[test]
+    fn test_rect_dimensions() {
+        let r = Rect {
+            top: 10.0,
+            left: 20.0,
+            right: 50.0,
+            bottom: 70.0,
+        };
+        assert_eq!(rect_width(&r), 30.0);
+        assert_eq!(rect_height(&r), 60.0);
+        assert_eq!(rect_area(&r), 30.0 * 60.0);
+    }
+
+    #[test]
+    fn test_rect_area_clamps_negative() {
+        // An inverted/degenerate rect (right < left, bottom < top) has
+        // negative width/height; rect_area clamps each to 0 so the result is 0.
+        let inverted = Rect {
+            top: 70.0,
+            left: 50.0,
+            right: 20.0,
+            bottom: 10.0,
+        };
+        assert!(rect_width(&inverted) < 0.0);
+        assert!(rect_height(&inverted) < 0.0);
+        assert_eq!(rect_area(&inverted), 0.0);
+    }
+
+    #[test]
+    fn test_rect_overlap_area_known_overlap() {
+        // a covers x in [0,10], y in [0,10]; b covers x in [5,15], y in [5,15].
+        // Their intersection is x in [5,10], y in [5,10] => 5 x 5 = 25.
+        let a = Rect {
+            top: 0.0,
+            left: 0.0,
+            right: 10.0,
+            bottom: 10.0,
+        };
+        let b = Rect {
+            top: 5.0,
+            left: 5.0,
+            right: 15.0,
+            bottom: 15.0,
+        };
+        assert_eq!(rect_overlap_area(&a, &b), 25.0);
+        // Overlap is symmetric in argument order.
+        assert_eq!(rect_overlap_area(&b, &a), 25.0);
+    }
+
+    #[test]
+    fn test_rect_overlap_area_disjoint() {
+        let a = Rect {
+            top: 0.0,
+            left: 0.0,
+            right: 10.0,
+            bottom: 10.0,
+        };
+        let b = Rect {
+            top: 20.0,
+            left: 20.0,
+            right: 30.0,
+            bottom: 30.0,
+        };
+        assert_eq!(rect_overlap_area(&a, &b), 0.0);
+    }
+
+    #[test]
+    fn test_rect_overlap_area_identical() {
+        // Two identical rects overlap by their full area.
+        let r = Rect {
+            top: 0.0,
+            left: 0.0,
+            right: 10.0,
+            bottom: 4.0,
+        };
+        assert_eq!(rect_overlap_area(&r, &r), rect_area(&r));
+        assert_eq!(rect_overlap_area(&r, &r), 40.0);
+    }
+
+    #[test]
+    fn test_rect_overlap_area_touching_edge() {
+        // b's left edge touches a's right edge (both at x=10): zero-width overlap => 0.
+        let a = Rect {
+            top: 0.0,
+            left: 0.0,
+            right: 10.0,
+            bottom: 10.0,
+        };
+        let b = Rect {
+            top: 0.0,
+            left: 10.0,
+            right: 20.0,
+            bottom: 10.0,
+        };
+        assert_eq!(rect_overlap_area(&a, &b), 0.0);
+    }
+
+    #[test]
+    fn test_rect_contains_point() {
+        let r = Rect {
+            top: 0.0,
+            left: 0.0,
+            right: 10.0,
+            bottom: 10.0,
+        };
+        // Strictly inside.
+        assert!(rect_contains_point(&r, &Point { x: 5.0, y: 5.0 }));
+        // On the boundary (inclusive).
+        assert!(rect_contains_point(&r, &Point { x: 0.0, y: 0.0 }));
+        assert!(rect_contains_point(&r, &Point { x: 10.0, y: 10.0 }));
+        assert!(rect_contains_point(&r, &Point { x: 0.0, y: 5.0 }));
+        // Outside on each side.
+        assert!(!rect_contains_point(&r, &Point { x: -1.0, y: 5.0 }));
+        assert!(!rect_contains_point(&r, &Point { x: 11.0, y: 5.0 }));
+        assert!(!rect_contains_point(&r, &Point { x: 5.0, y: -1.0 }));
+        assert!(!rect_contains_point(&r, &Point { x: 5.0, y: 11.0 }));
+    }
+
+    #[test]
+    fn test_segment_length_in_rect_crosses_fully() {
+        // Rect spans x in [0,10], y in [0,10]. A horizontal segment from
+        // (-5, 5) to (15, 5) enters at x=0 and exits at x=10 => inside length 10.
+        let r = Rect {
+            top: 0.0,
+            left: 0.0,
+            right: 10.0,
+            bottom: 10.0,
+        };
+        let got =
+            segment_length_in_rect(&Point { x: -5.0, y: 5.0 }, &Point { x: 15.0, y: 5.0 }, &r);
+        assert!((got - 10.0).abs() < 1e-9, "got {got}");
+    }
+
+    #[test]
+    fn test_segment_length_in_rect_entirely_outside() {
+        let r = Rect {
+            top: 0.0,
+            left: 0.0,
+            right: 10.0,
+            bottom: 10.0,
+        };
+        // Segment well above the rect, never enters.
+        let got =
+            segment_length_in_rect(&Point { x: -5.0, y: 50.0 }, &Point { x: 15.0, y: 50.0 }, &r);
+        assert_eq!(got, 0.0);
+    }
+
+    #[test]
+    fn test_segment_length_in_rect_entirely_inside() {
+        let r = Rect {
+            top: 0.0,
+            left: 0.0,
+            right: 10.0,
+            bottom: 10.0,
+        };
+        // Segment from (2,2) to (5,6): both endpoints inside; full length is
+        // sqrt(3^2 + 4^2) = 5.
+        let got = segment_length_in_rect(&Point { x: 2.0, y: 2.0 }, &Point { x: 5.0, y: 6.0 }, &r);
+        assert!((got - 5.0).abs() < 1e-9, "got {got}");
+    }
+
+    #[test]
+    fn test_segment_length_in_rect_one_endpoint_inside() {
+        let r = Rect {
+            top: 0.0,
+            left: 0.0,
+            right: 10.0,
+            bottom: 10.0,
+        };
+        // Horizontal segment from (5,5) (inside) to (25,5) (outside): the
+        // portion inside runs from x=5 to x=10 => length 5.
+        let got = segment_length_in_rect(&Point { x: 5.0, y: 5.0 }, &Point { x: 25.0, y: 5.0 }, &r);
+        assert!((got - 5.0).abs() < 1e-9, "got {got}");
+    }
+
+    #[test]
+    fn test_segment_length_in_rect_parallel_outside_slab() {
+        // A vertical segment to the left of the rect is parallel to the
+        // left/right slabs and outside them: dx == 0 with q < 0 => 0.
+        let r = Rect {
+            top: 0.0,
+            left: 0.0,
+            right: 10.0,
+            bottom: 10.0,
+        };
+        let got =
+            segment_length_in_rect(&Point { x: -5.0, y: -5.0 }, &Point { x: -5.0, y: 15.0 }, &r);
+        assert_eq!(got, 0.0);
+    }
 }
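The doc comment on `segment_clip_interval_in_rect` says that `layout::metrics` unions a connector's per-box clip intervals. This way, a sub-length covered by two overlapping boxes is counted only once. Below is a minimal sketch of that union step. It assumes the input is the collected `Some((t0, t1))` results from every box. `covered_fraction` is a hypothetical name, not the function in `layout::metrics`.

```rust
/// Hypothetical sketch: total parameter length covered by the union of clip
/// intervals `[t0, t1]` (each within [0, 1]). Multiply the result by the
/// segment's Euclidean length to get the covered length in pixels.
fn covered_fraction(mut intervals: Vec<(f64, f64)>) -> f64 {
    // Sort by start so overlapping intervals become adjacent.
    intervals.sort_by(|a, b| a.0.total_cmp(&b.0));
    let mut total = 0.0;
    let mut current: Option<(f64, f64)> = None;
    for (t0, t1) in intervals {
        match current {
            // Overlaps or touches the running interval: extend it.
            Some((s, e)) if t0 <= e => current = Some((s, e.max(t1))),
            // Disjoint: flush the running interval and start a new one.
            Some((s, e)) => {
                total += e - s;
                current = Some((t0, t1));
            }
            None => current = Some((t0, t1)),
        }
    }
    if let Some((s, e)) = current {
        total += e - s;
    }
    total
}
```

Summing `segment_length_in_rect` over each box separately would double-count the overlap region. That is why the single-box helper serves as a test oracle rather than the metric's code path.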
diff --git a/src/simlin-engine/src/diagram/connector.rs b/src/simlin-engine/src/diagram/connector.rs
index a59f5a0e0..14ed42f0f 100644
--- a/src/simlin-engine/src/diagram/connector.rs
+++ b/src/simlin-engine/src/diagram/connector.rs
@@ -13,6 +13,15 @@ use crate::diagram::common::{
 };
 use crate::diagram::constants::*;
 
+/// Number of straight segments used to approximate a drawn arc connector when
+/// producing its polyline for crossing detection and metric computation. With
+/// n = 16 segments over an arc spanning `delta` radians, the maximum chord-to-arc
+/// deviation (sagitta) is r * (1 - cos(delta / 32)); even for a half-circle that
+/// is about 0.005 * r, i.e. under a pixel for radii up to roughly 200, which is
+/// ample for crossing detection. It does not affect rendered SVG (the renderer
+/// still emits a single `A` arc command); it only governs the sampled geometry.
+pub(crate) const ARC_POLYLINE_SAMPLES: usize = 16;
+
 enum ElementShape {
     Circle { r: f64 },
     Rect { hw: f64, hh: f64 },
@@ -101,7 +110,10 @@ fn is_element_arrayed(element: &ViewElement, is_arrayed_fn: &dyn Fn(&str) -> boo
     }
 }
 
-fn get_visual_center(element: &ViewElement, is_arrayed_fn: &dyn Fn(&str) -> bool) -> (f64, f64) {
+pub(crate) fn get_visual_center(
+    element: &ViewElement,
+    is_arrayed_fn: &dyn Fn(&str) -> bool,
+) -> (f64, f64) {
     let (cx, cy) = match element {
         ViewElement::Aux(a) => (a.x, a.y),
         ViewElement::Stock(s) => (s.x, s.y),
@@ -140,7 +152,7 @@ fn circle_from_points(p1: Point, p2: Point, p3: Point) -> Result<Circle, &'stati
     Ok(Circle { x: cx, y: cy, r })
 }
 
-fn opposite_theta(theta: f64) -> f64 {
+pub(crate) fn opposite_theta(theta: f64) -> f64 {
     let mut t = theta + PI;
     if t > PI {
         t -= 2.0 * PI;
@@ -148,7 +160,7 @@ fn opposite_theta(theta: f64) -> f64 {
     t
 }
 
-fn intersect_element_straight(
+pub(crate) fn intersect_element_straight(
     element: &ViewElement,
     theta: f64,
     is_arrayed_fn: &dyn Fn(&str) -> bool,
@@ -164,7 +176,7 @@ fn intersect_element_straight(
     }
 }
 
-fn intersect_element_arc(
+pub(crate) fn intersect_element_arc(
     element: &ViewElement,
     circ: &Circle,
     inv: bool,
@@ -215,7 +227,7 @@ fn intersect_element_arc(
     }
 }
 
-fn is_straight_line(
+pub(crate) fn is_straight_line(
     element: &view_element::Link,
     from: &ViewElement,
     to: &ViewElement,
@@ -234,7 +246,7 @@ fn is_straight_line(
     }
 }
 
-fn arc_circle(
+pub(crate) fn arc_circle(
     element: &view_element::Link,
     from: &ViewElement,
     to: &ViewElement,
@@ -342,24 +354,48 @@ fn render_straight_line(
     svg
 }
 
-fn render_arc(
+/// The exact scalars `render_arc` needs to format its SVG, plus what an arc
+/// sampler needs to reproduce the drawn curve as a polyline. All coordinates are
+/// raw f64 (no pre-rounding): rounding happens only at the `js_format_number`
+/// boundary in `render_arc`, so the SVG string stays byte-for-byte identical
+/// to the pre-factor-out code (and to the TypeScript renderer).
+#[derive(Clone, Copy)]
+struct ArcGeometry {
+    /// SVG path start (= `from_visual`, the source element center).
+    start: Point,
+    /// SVG path end (= `to_visual`, the target element center).
+    arc_end: Point,
+    /// Arc center and radius.
+    circ: Circle,
+    /// SVG large-arc-flag.
+    sweep: bool,
+    /// SVG sweep-flag.
+    inv: bool,
+    /// Arrowhead anchor point on the target element boundary.
+    end: Point,
+    /// Final arrowhead rotation in degrees (already adjusted for `inv`).
+    arrow_theta: f64,
+}
+
+/// Compute the drawn-arc geometry for a connector. Returns `None` in the two
+/// cases the renderer draws nothing: a non-`Arc` shape (e.g. `MultiPoint`) and
+/// a degenerate arc where `arc_circle` cannot be constructed. The body is the
+/// verbatim geometry the original `render_arc` computed (lines that produced
+/// `circ`, `inv`, `sweep`, `start`, `arc_end`, `end`, and `arrow_theta`).
+fn arc_geometry(
     element: &view_element::Link,
     from: &ViewElement,
     to: &ViewElement,
-    is_to_stock: bool,
     is_arrayed_fn: &dyn Fn(&str) -> bool,
-) -> String {
+) -> Option<ArcGeometry> {
     let from_visual = get_visual_center(from, is_arrayed_fn);
     let to_visual = get_visual_center(to, is_arrayed_fn);
 
-    let circ = match arc_circle(element, from, to, is_arrayed_fn) {
-        Some(c) => c,
-        None => return "<g></g>".to_string(),
-    };
+    let circ = arc_circle(element, from, to, is_arrayed_fn)?;
 
     let takeoff_angle = match &element.shape {
         LinkShape::Arc(arc) => deg_to_rad(*arc),
-        _ => return "<g></g>".to_string(),
+        _ => return None,
     };
 
     let from_theta = (from_visual.1 - circ.y).atan2(from_visual.0 - circ.x);
@@ -397,23 +433,120 @@ fn render_arc(
     };
     let end = intersect_element_arc(to, &circ, !inv, is_arrayed_fn);
 
-    let path = format!(
-        "M{},{}A{},{} 0 {},{} {},{}",
-        js_format_number(start.x),
-        js_format_number(start.y),
-        js_format_number(circ.r),
-        js_format_number(circ.r),
-        sweep as u8,
-        inv as u8,
-        js_format_number(arc_end.x),
-        js_format_number(arc_end.y)
-    );
-
     let mut arrow_theta = rad_to_deg((end.y - circ.y).atan2(end.x - circ.x)) - 90.0;
     if inv {
         arrow_theta += 180.0;
     }
 
+    Some(ArcGeometry {
+        start,
+        arc_end,
+        circ,
+        sweep,
+        inv,
+        end,
+        arrow_theta,
+    })
+}
+
+/// Sample the drawn SVG arc as a polyline from `g.start` to `g.arc_end` along
+/// `g.circ`, honoring the SVG large-arc (`g.sweep`) and sweep (`g.inv`) flags.
+/// Uses the standard SVG endpoint->center arc parametrization: derive the
+/// start angle and a signed sweep `delta` from the two endpoint angles, then
+/// adjust `delta` so its sign matches the sweep-flag and its magnitude matches
+/// the large-arc-flag. Returns `samples.max(2) + 1` points.
+fn sample_arc(g: &ArcGeometry, samples: usize) -> Vec<Point> {
+    let n = samples.max(2);
+    let theta0 = (g.start.y - g.circ.y).atan2(g.start.x - g.circ.x);
+    let theta1 = (g.arc_end.y - g.circ.y).atan2(g.arc_end.x - g.circ.x);
+    // SVG sweep-flag (g.inv) selects direction; large-arc-flag (g.sweep)
+    // selects the >180-degree arc. Normalize delta accordingly.
+    let mut delta = theta1 - theta0;
+    let two_pi = 2.0 * std::f64::consts::PI;
+    // bring delta into (-2pi, 2pi)
+    while delta <= -two_pi {
+        delta += two_pi;
+    }
+    while delta >= two_pi {
+        delta -= two_pi;
+    }
+    let sweep_positive = g.inv; // sweep-flag set => angles increase
+    if sweep_positive && delta < 0.0 {
+        delta += two_pi;
+    }
+    if !sweep_positive && delta > 0.0 {
+        delta -= two_pi;
+    }
+    let large = g.sweep; // large-arc-flag
+    if large && delta.abs() < std::f64::consts::PI {
+        delta += if delta >= 0.0 { two_pi } else { -two_pi };
+    }
+    if !large && delta.abs() > std::f64::consts::PI {
+        delta += if delta >= 0.0 { -two_pi } else { two_pi };
+    }
+    (0..=n)
+        .map(|i| {
+            let t = i as f64 / n as f64;
+            let th = theta0 + delta * t;
+            Point {
+                x: g.circ.x + g.circ.r * th.cos(),
+                y: g.circ.y + g.circ.r * th.sin(),
+            }
+        })
+        .collect()
+}
+
+/// The polyline the renderer draws for a connector, as the metric/crossing
+/// code sees it. Straight links are clipped to element boundaries (matching
+/// `render_straight_line`); arcs are sampled center-to-center along the arc
+/// circle (matching `render_arc`, which draws start=from_visual to
+/// arc_end=to_visual); MultiPoint links return an empty vec because the
+/// renderer draws nothing for them today (known gap).
+pub(crate) fn connector_polyline(
+    element: &view_element::Link,
+    from: &ViewElement,
+    to: &ViewElement,
+    is_arrayed_fn: &dyn Fn(&str) -> bool,
+    arc_samples: usize,
+) -> Vec<Point> {
+    if is_straight_line(element, from, to, is_arrayed_fn) {
+        let from_visual = get_visual_center(from, is_arrayed_fn);
+        let to_visual = get_visual_center(to, is_arrayed_fn);
+        let theta = (to_visual.1 - from_visual.1).atan2(to_visual.0 - from_visual.0);
+        let start = intersect_element_straight(from, theta, is_arrayed_fn);
+        let end = intersect_element_straight(to, opposite_theta(theta), is_arrayed_fn);
+        return vec![start, end];
+    }
+    match arc_geometry(element, from, to, is_arrayed_fn) {
+        None => Vec::new(), // MultiPoint or degenerate arc: renderer draws nothing
+        Some(g) => sample_arc(&g, arc_samples),
+    }
+}
+
+fn render_arc(
+    element: &view_element::Link,
+    from: &ViewElement,
+    to: &ViewElement,
+    is_to_stock: bool,
+    is_arrayed_fn: &dyn Fn(&str) -> bool,
+) -> String {
+    let g = match arc_geometry(element, from, to, is_arrayed_fn) {
+        Some(g) => g,
+        None => return "<g></g>".to_string(),
+    };
+
+    let path = format!(
+        "M{},{}A{},{} 0 {},{} {},{}",
+        js_format_number(g.start.x),
+        js_format_number(g.start.y),
+        js_format_number(g.circ.r),
+        js_format_number(g.circ.r),
+        g.sweep as u8,
+        g.inv as u8,
+        js_format_number(g.arc_end.x),
+        js_format_number(g.arc_end.y)
+    );
+
     let connector_class = if is_to_stock {
         "simlin-connector simlin-connector-dashed"
     } else {
@@ -432,9 +565,9 @@ fn render_arc(
         connector_class
     ));
     svg.push_str(&render_arrowhead(
-        end.x,
-        end.y,
-        arrow_theta,
+        g.end.x,
+        g.end.y,
+        g.arrow_theta,
         ARROWHEAD_RADIUS,
         ArrowheadType::Connector,
     ));
@@ -558,6 +691,116 @@ mod tests {
         assert!(svg.contains("simlin-arrowhead-link"));
     }
 
+    /// Byte-identical regression guard for the arc factor-out. The expected
+    /// string was captured from the pre-refactor `render_arc` output for this
+    /// exact Arc link; the geometry extraction must not change a single byte
+    /// (the `svg-rendering.test.ts` parity test asserts Rust SVG == TS SVG).
+    #[test]
+    fn test_render_arc_svg_byte_identical() {
+        let link = view_element::Link {
+            uid: 10,
+            from_uid: 1,
+            to_uid: 2,
+            shape: LinkShape::Arc(30.0),
+            polarity: None,
+        };
+        let from = make_aux_ve(100.0, 100.0, "a", 1);
+        let to = make_aux_ve(200.0, 200.0, "b", 2);
+
+        let svg = render_connector(&link, &from, &to, &not_arrayed);
+        let expected = "<g><path d=\"M100,100A273.20508075688764,273.20508075688764 0 0,1 200,200\" class=\"simlin-connector-bg\"></path><path d=\"M100,100A273.20508075688764,273.20508075688764 0 0,1 200,200\" class=\"simlin-connector\"></path><g><path d=\"M199.87072507234473,192.27852897536678L188.62072507234473,196.77852897536678A27,27 0 0,1 188.62072507234473,187.77852897536678z\" class=\"simlin-arrowhead-bg\" transform=\"rotate(58.1118629772876,195.37072507234473,192.27852897536678)\"></path><path d=\"M195.37072507234473,192.27852897536678L189.37072507234473,195.27852897536678A18,18 0 0,1 189.37072507234473,189.27852897536678z\" class=\"simlin-arrowhead-link\" transform=\"rotate(58.1118629772876,195.37072507234473,192.27852897536678)\"></path></g></g>";
+        assert_eq!(svg, expected);
+        assert!(svg.starts_with("<g><path d=\"M100,100A"));
+    }
+
+    #[test]
+    fn test_connector_polyline_straight_uses_boundary_endpoints() {
+        let link = view_element::Link {
+            uid: 10,
+            from_uid: 1,
+            to_uid: 2,
+            shape: LinkShape::Straight,
+            polarity: None,
+        };
+        let from = make_aux_ve(100.0, 100.0, "a", 1);
+        let to = make_aux_ve(200.0, 100.0, "b", 2);
+
+        let poly = connector_polyline(&link, &from, &to, &not_arrayed, ARC_POLYLINE_SAMPLES);
+        assert_eq!(poly.len(), 2, "straight link yields exactly two points");
+
+        // Endpoints are clipped to the element boundary (AUX_RADIUS), NOT the
+        // raw centers (100,100) and (200,100). theta = 0 along +x.
+        let expected_start = intersect_element_straight(&from, 0.0, &not_arrayed);
+        let expected_end = intersect_element_straight(&to, opposite_theta(0.0), &not_arrayed);
+        assert!((poly[0].x - expected_start.x).abs() < 1e-9);
+        assert!((poly[0].y - expected_start.y).abs() < 1e-9);
+        assert!((poly[1].x - expected_end.x).abs() < 1e-9);
+        assert!((poly[1].y - expected_end.y).abs() < 1e-9);
+        // Sanity: start is offset from the center by AUX_RADIUS, not at center.
+        assert!((poly[0].x - (100.0 + AUX_RADIUS)).abs() < 1e-9);
+        assert!((poly[1].x - (200.0 - AUX_RADIUS)).abs() < 1e-9);
+    }
+
+    #[test]
+    fn test_connector_polyline_arc_samples_on_circle() {
+        let link = view_element::Link {
+            uid: 10,
+            from_uid: 1,
+            to_uid: 2,
+            shape: LinkShape::Arc(30.0),
+            polarity: None,
+        };
+        let from = make_aux_ve(100.0, 100.0, "a", 1);
+        let to = make_aux_ve(200.0, 200.0, "b", 2);
+
+        let poly = connector_polyline(&link, &from, &to, &not_arrayed, ARC_POLYLINE_SAMPLES);
+        assert_eq!(
+            poly.len(),
+            ARC_POLYLINE_SAMPLES + 1,
+            "arc yields ARC_POLYLINE_SAMPLES segments => N+1 points"
+        );
+
+        // The drawn arc goes center-to-center (start = from_visual,
+        // arc_end = to_visual).
+        let first = poly.first().unwrap();
+        let last = poly.last().unwrap();
+        assert!((first.x - 100.0).abs() < 1e-6 && (first.y - 100.0).abs() < 1e-6);
+        assert!((last.x - 200.0).abs() < 1e-6 && (last.y - 200.0).abs() < 1e-6);
+
+        // Every sampled point lies on the arc circle.
+        let circ = arc_circle(&link, &from, &to, &not_arrayed).unwrap();
+        for p in &poly {
+            let d = (square(p.x - circ.x) + square(p.y - circ.y)).sqrt();
+            assert!(
+                (d - circ.r).abs() < 1e-6,
+                "point ({}, {}) not on arc circle: dist {} vs r {}",
+                p.x,
+                p.y,
+                d,
+                circ.r
+            );
+        }
+    }
+
+    #[test]
+    fn test_connector_polyline_multipoint_is_empty() {
+        let link = view_element::Link {
+            uid: 10,
+            from_uid: 1,
+            to_uid: 2,
+            shape: LinkShape::MultiPoint(vec![]),
+            polarity: None,
+        };
+        let from = make_aux_ve(100.0, 100.0, "a", 1);
+        let to = make_aux_ve(200.0, 200.0, "b", 2);
+
+        let poly = connector_polyline(&link, &from, &to, &not_arrayed, ARC_POLYLINE_SAMPLES);
+        assert!(
+            poly.is_empty(),
+            "MultiPoint links draw nothing, so the polyline is empty"
+        );
+    }
+
     // --- ray_rect_intersection tests ---
 
     fn assert_on_rect_boundary(p: Point, cx: f64, cy: f64, hw: f64, hh: f64) {
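`sample_arc` converts the two endpoint angles and the SVG sweep-flag into a signed angular span, then samples points along it. Below is a self-contained sketch of that direction step. It uses its own minimal `Point`/`Circle` stand-ins rather than the crate's types, and `sample_arc_sketch` is a hypothetical name. The original's large-arc adjustment is left out: once the circle and direction are fixed, the span between the two endpoints is already determined.

```rust
use std::f64::consts::PI;

/// Minimal stand-ins for the crate's geometry types (assumed shapes).
#[derive(Clone, Copy)]
struct Point {
    x: f64,
    y: f64,
}

#[derive(Clone, Copy)]
struct Circle {
    x: f64,
    y: f64,
    r: f64,
}

/// Hypothetical sketch: sample the arc of `c` from `start` to `end` with `n`
/// segments (n + 1 points). When `sweep_positive` is set (SVG sweep-flag = 1),
/// the angle increases, which is clockwise on a y-down screen.
fn sample_arc_sketch(c: Circle, start: Point, end: Point, sweep_positive: bool, n: usize) -> Vec<Point> {
    let n = n.max(2);
    let theta0 = (start.y - c.y).atan2(start.x - c.x);
    let theta1 = (end.y - c.y).atan2(end.x - c.x);
    // atan2 returns values in [-PI, PI], so the raw difference is in [-2PI, 2PI].
    let mut delta = theta1 - theta0;
    // Force the sign of the span to match the requested direction.
    if sweep_positive && delta < 0.0 {
        delta += 2.0 * PI;
    }
    if !sweep_positive && delta > 0.0 {
        delta -= 2.0 * PI;
    }
    (0..=n)
        .map(|i| {
            let th = theta0 + delta * (i as f64 / n as f64);
            Point { x: c.x + c.r * th.cos(), y: c.y + c.r * th.sin() }
        })
        .collect()
}
```

Take a unit circle with a quarter turn from (1, 0) to (0, 1). A positive sweep passes through the first quadrant. A negative sweep takes the long way, three quarters of a turn through (-0.707, -0.707).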
diff --git a/src/simlin-engine/src/diagram/elements.rs b/src/simlin-engine/src/diagram/elements.rs
index ca6a56fcb..04215e974 100644
--- a/src/simlin-engine/src/diagram/elements.rs
+++ b/src/simlin-engine/src/diagram/elements.rs
@@ -49,16 +49,26 @@ pub fn render_aux(element: &view_element::Aux, is_arrayed: bool) -> String {
     svg
 }
 
-pub fn aux_bounds(element: &view_element::Aux) -> Rect {
+/// The aux's bare *shape* box (the circle's bounding rect), WITHOUT its label.
+/// `aux_bounds` is this box merged with the label; quality metrics that already
+/// account for labels separately (e.g. label-vs-node overlap) need the
+/// label-free shape to avoid double-counting the label area.
+pub(crate) fn aux_shape_bounds(element: &view_element::Aux) -> Rect {
     let cx = element.x;
     let cy = element.y;
     let r = AUX_RADIUS;
-    let bounds = Rect {
+    Rect {
         top: cy - r,
         left: cx - r,
         right: cx + r,
         bottom: cy + r,
-    };
+    }
+}
+
+pub fn aux_bounds(element: &view_element::Aux) -> Rect {
+    let cx = element.x;
+    let cy = element.y;
+    let bounds = aux_shape_bounds(element);
 
     let label_props = LabelProps::new(cx, cy, element.label_side, display_name(&element.name));
     element_with_label_bounds(bounds, &label_props)
@@ -108,17 +118,27 @@ pub fn render_stock(element: &view_element::Stock, is_arrayed: bool) -> String {
     svg
 }
 
-pub fn stock_bounds(element: &view_element::Stock) -> Rect {
+/// The stock's bare *shape* box (the rect), WITHOUT its label. See
+/// `aux_shape_bounds` for why the label-free shape is exposed separately.
+pub(crate) fn stock_shape_bounds(element: &view_element::Stock) -> Rect {
     let cx = element.x;
     let cy = element.y;
     let w = STOCK_WIDTH;
     let h = STOCK_HEIGHT;
-    let bounds = Rect {
+    Rect {
         top: cy - h / 2.0,
         left: cx - w / 2.0,
         right: cx + w / 2.0,
         bottom: cy + h / 2.0,
-    };
+    }
+}
+
+pub fn stock_bounds(element: &view_element::Stock) -> Rect {
+    let cx = element.x;
+    let cy = element.y;
+    let w = STOCK_WIDTH;
+    let h = STOCK_HEIGHT;
+    let bounds = stock_shape_bounds(element);
 
     let label_props = LabelProps::new(cx, cy, element.label_side, display_name(&element.name))
         .with_radii(w / 2.0, h / 2.0);
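The `*_shape_bounds` doc comments explain why a metric that scores labels separately should measure node-vs-node overlap on the label-free shape boxes: otherwise, label area is counted twice. Below is a minimal sketch of a pairwise overlap sum over such boxes. It uses a local `Rect` with the same `top/left/right/bottom` fields and the same intersection formula as `rect_overlap_area` in `diagram::common`. `total_pairwise_overlap` is hypothetical, and the real `node_overlap` term may normalize its result differently.

```rust
#[derive(Clone, Copy)]
struct Rect {
    top: f64,
    left: f64,
    right: f64,
    bottom: f64,
}

/// Same formula as `rect_overlap_area`: intersection area, 0 if the rects
/// are disjoint or only touch along an edge.
fn rect_overlap_area(a: &Rect, b: &Rect) -> f64 {
    let w = a.right.min(b.right) - a.left.max(b.left);
    let h = a.bottom.min(b.bottom) - a.top.max(b.top);
    if w > 0.0 && h > 0.0 { w * h } else { 0.0 }
}

/// Hypothetical: sum of overlap areas over every unordered pair of shape
/// boxes (each pair visited once, so symmetric overlaps are not doubled).
fn total_pairwise_overlap(shapes: &[Rect]) -> f64 {
    let mut total = 0.0;
    for (i, a) in shapes.iter().enumerate() {
        for b in &shapes[i + 1..] {
            total += rect_overlap_area(a, b);
        }
    }
    total
}
```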
diff --git a/src/simlin-engine/src/diagram/flow.rs b/src/simlin-engine/src/diagram/flow.rs
index 91e911558..f7d0395a1 100644
--- a/src/simlin-engine/src/diagram/flow.rs
+++ b/src/simlin-engine/src/diagram/flow.rs
@@ -141,7 +141,12 @@ pub fn render_flow(element: &view_element::Flow, sink: &ViewElement, is_arrayed:
     svg
 }
 
-pub fn flow_bounds(element: &view_element::Flow) -> Rect {
+/// The flow's bare *shape* box (the valve plus the pipe polyline points),
+/// WITHOUT its label. `flow_bounds` is this box merged with the label; see
+/// `diagram::elements::aux_shape_bounds` for why the label-free shape is
+/// exposed separately. The flow path points ARE part of the shape (the drawn
+/// pipe), so they stay included here.
+pub(crate) fn flow_shape_bounds(element: &view_element::Flow) -> Rect {
     let cx = element.x;
     let cy = element.y;
     // Flow valve bounds use r=6 (FLOW_VALVE_RADIUS), NOT AuxRadius
@@ -153,13 +158,7 @@ pub fn flow_bounds(element: &view_element::Flow) -> Rect {
         bottom: cy + r,
     };
 
-    // Include label bounds
-    let label_props =
-        LabelProps::new(cx, cy, element.label_side, display_name(&element.name)).with_radii(r, r);
-    let l_bounds = label_bounds(&label_props);
-    bounds = merge_bounds(bounds, l_bounds);
-
-    // Include flow path points
+    // Include flow path points (the drawn pipe).
     for point in &element.points {
         bounds.left = bounds.left.min(point.x);
         bounds.right = bounds.right.max(point.x);
@@ -170,6 +169,20 @@ pub fn flow_bounds(element: &view_element::Flow) -> Rect {
     bounds
 }
 
+pub fn flow_bounds(element: &view_element::Flow) -> Rect {
+    let cx = element.x;
+    let cy = element.y;
+    let r = FLOW_VALVE_RADIUS;
+    let shape = flow_shape_bounds(element);
+
+    // Include label bounds
+    let label_props =
+        LabelProps::new(cx, cy, element.label_side, display_name(&element.name)).with_radii(r, r);
+    let l_bounds = label_bounds(&label_props);
+
+    merge_bounds(shape, l_bounds)
+}
+
 #[cfg(test)]
 mod tests {
     use super::*;
diff --git a/src/simlin-engine/src/diagram/mod.rs b/src/simlin-engine/src/diagram/mod.rs
index 32742b326..f9cd8f614 100644
--- a/src/simlin-engine/src/diagram/mod.rs
+++ b/src/simlin-engine/src/diagram/mod.rs
@@ -4,11 +4,11 @@
 
 mod arrowhead;
 pub mod common;
-mod connector;
+pub(crate) mod connector;
 pub mod constants;
-mod elements;
-mod flow;
-mod label;
+pub(crate) mod elements;
+pub(crate) mod flow;
+pub(crate) mod label;
 mod render;
 #[cfg(feature = "png_render")]
 mod render_png;
diff --git a/src/simlin-engine/src/layout/crossings_tests.rs b/src/simlin-engine/src/layout/crossings_tests.rs
new file mode 100644
index 000000000..a457bb83b
--- /dev/null
+++ b/src/simlin-engine/src/layout/crossings_tests.rs
@@ -0,0 +1,419 @@
+// Copyright 2026 The Simlin Authors. All rights reserved.
+// Use of this source code is governed by the Apache License,
+// Version 2.0, that can be found in the LICENSE file.
+
+//! Tests for the polyline-based `count_view_crossings` / `build_view_segments`
+//! (Phase 1, Task 4 of the layout quality eval). Kept in their own file so the
+//! `layout_tests.rs` integration suite stays under the per-file line cap.
+
+use super::*;
+
+fn cv_aux(uid: i32, x: f64, y: f64) -> ViewElement {
+    ViewElement::Aux(view_element::Aux {
+        name: format!("a{uid}"),
+        uid,
+        x,
+        y,
+        label_side: LabelSide::Bottom,
+        compat: None,
+    })
+}
+
+fn cv_module(uid: i32, x: f64, y: f64) -> ViewElement {
+    ViewElement::Module(view_element::Module {
+        name: format!("m{uid}"),
+        uid,
+        x,
+        y,
+        label_side: LabelSide::Bottom,
+    })
+}
+
+fn cv_link(uid: i32, from_uid: i32, to_uid: i32, shape: LinkShape) -> ViewElement {
+    ViewElement::Link(view_element::Link {
+        uid,
+        from_uid,
+        to_uid,
+        shape,
+        polarity: None,
+    })
+}
+
+fn cv_stock(uid: i32, x: f64, y: f64) -> ViewElement {
+    ViewElement::Stock(view_element::Stock {
+        name: format!("s{uid}"),
+        uid,
+        x,
+        y,
+        label_side: LabelSide::Bottom,
+        compat: None,
+    })
+}
+
+fn cv_cloud(uid: i32, flow_uid: i32, x: f64, y: f64) -> ViewElement {
+    ViewElement::Cloud(view_element::Cloud {
+        uid,
+        flow_uid,
+        x,
+        y,
+        compat: None,
+    })
+}
+
+/// A horizontal flow whose valve sits at (`x`, `y`), with its source end
+/// attached to `from_uid` (a cloud or stock to the left) and its sink end
+/// attached to `to_uid` (a stock to the right). The valve lies on the pipe,
+/// mid-span between the two attached endpoints.
+fn cv_flow(uid: i32, x: f64, y: f64, from_uid: i32, to_uid: i32) -> ViewElement {
+    cv_flow_pts(
+        uid,
+        x,
+        y,
+        (x - 60.0, y, Some(from_uid)),
+        (x + 60.0, y, Some(to_uid)),
+    )
+}
+
+/// A two-point flow with the valve at (`x`, `y`) and explicitly positioned
+/// source/sink points, each carrying an optional `attached_to_uid`. Lets a
+/// test reproduce a real reference geometry where the valve does not sit at the
+/// midpoint of the two points.
+fn cv_flow_pts(
+    uid: i32,
+    x: f64,
+    y: f64,
+    from: (f64, f64, Option<i32>),
+    to: (f64, f64, Option<i32>),
+) -> ViewElement {
+    ViewElement::Flow(view_element::Flow {
+        name: format!("f{uid}"),
+        uid,
+        x,
+        y,
+        label_side: LabelSide::Top,
+        points: vec![
+            view_element::FlowPoint {
+                x: from.0,
+                y: from.1,
+                attached_to_uid: from.2,
+            },
+            view_element::FlowPoint {
+                x: to.0,
+                y: to.1,
+                attached_to_uid: to.2,
+            },
+        ],
+        compat: None,
+        label_compat: None,
+    })
+}
+
+fn cv_view(elements: Vec<ViewElement>) -> datamodel::StockFlow {
+    datamodel::StockFlow {
+        name: None,
+        elements,
+        view_box: Rect {
+            x: 0.0,
+            y: 0.0,
+            width: 1000.0,
+            height: 1000.0,
+        },
+        zoom: 1.0,
+        use_lettered_polarity: false,
+        font: None,
+        sketch_compat: None,
+    }
+}
+
+/// AC2.1: two straight links that cross once yield a crossing count of 1.
+#[test]
+fn test_count_view_crossings_two_straight_links_cross_once() {
+    // Link 1: a1(0,0) -> a2(100,100). Link 2: a3(0,100) -> a4(100,0).
+    // The two diagonals of a square cross exactly once at the center.
+    let view = cv_view(vec![
+        cv_aux(1, 0.0, 0.0),
+        cv_aux(2, 100.0, 100.0),
+        cv_aux(3, 0.0, 100.0),
+        cv_aux(4, 100.0, 0.0),
+        cv_link(10, 1, 2, LinkShape::Straight),
+        cv_link(11, 3, 4, LinkShape::Straight),
+    ]);
+
+    assert_eq!(count_view_crossings(&view), 1);
+}
+
+/// AC2.1: two links sharing an endpoint element yield 0 crossings.
+#[test]
+fn test_count_view_crossings_shared_endpoint_no_crossing() {
+    // Both links start at a1; sharing the `elem_1` vertex suppresses any
+    // intersection at the shared endpoint.
+    let view = cv_view(vec![
+        cv_aux(1, 50.0, 50.0),
+        cv_aux(2, 100.0, 0.0),
+        cv_aux(3, 100.0, 100.0),
+        cv_link(10, 1, 2, LinkShape::Straight),
+        cv_link(11, 1, 3, LinkShape::Straight),
+    ]);
+
+    assert_eq!(count_view_crossings(&view), 0);
+}
+
+/// AC2.2: an Arc connector that visually crosses another edge is counted via
+/// polyline sampling, on a case where the straight-chord approximation does
+/// not count it. The arc from a1(0,0) to a2(200,0) bulges down to a peak near
+/// (100, 57.7); a horizontal straight link c-d at y=50 (from x=40 to x=160)
+/// passes through the bulge, crossing the curve twice (near x=58 and x=142),
+/// while the arc's straight chord (the line y=0) stays well clear of it. So the
+/// old chord-based count is 0 and the new polyline-based count is >= 1.
+#[test]
+fn test_count_view_crossings_arc_curve_crosses_when_chord_does_not() {
+    let view = cv_view(vec![
+        cv_aux(1, 0.0, 0.0),
+        cv_aux(2, 200.0, 0.0),
+        cv_aux(3, 40.0, 50.0),
+        cv_aux(4, 160.0, 50.0),
+        // Wide arc: large take-off angle so the curve bulges well below the
+        // straight chord between the two endpoints.
+        cv_link(10, 1, 2, LinkShape::Arc(60.0)),
+        cv_link(11, 3, 4, LinkShape::Straight),
+    ]);
+
+    // The straight-chord approximation (centers, ignoring shape) does NOT
+    // count this crossing: build those chord segments inline and confirm 0.
+    let p1 = Position::new(0.0, 0.0);
+    let p2 = Position::new(200.0, 0.0);
+    let p3 = Position::new(40.0, 50.0);
+    let p4 = Position::new(160.0, 50.0);
+    let chord_segments = vec![
+        LineSegment {
+            start: p1,
+            end: p2,
+            from_node: "elem_1".to_string(),
+            to_node: "elem_2".to_string(),
+        },
+        LineSegment {
+            start: p3,
+            end: p4,
+            from_node: "elem_3".to_string(),
+            to_node: "elem_4".to_string(),
+        },
+    ];
+    assert_eq!(
+        annealing::count_crossings(&chord_segments),
+        0,
+        "chord approximation must not see this crossing"
+    );
+
+    // The polyline (sampled arc) DOES count it.
+    assert!(
+        count_view_crossings(&view) >= 1,
+        "sampled arc curve must cross the straight link"
+    );
+}
+
+/// AC2.3: the crossing count is invariant under translation and rotation of
+/// the whole view.
+#[test]
+fn test_count_view_crossings_translation_rotation_invariant() {
+    let base = vec![
+        cv_aux(1, 0.0, 0.0),
+        cv_aux(2, 100.0, 100.0),
+        cv_aux(3, 0.0, 100.0),
+        cv_aux(4, 100.0, 0.0),
+        cv_link(10, 1, 2, LinkShape::Arc(25.0)),
+        cv_link(11, 3, 4, LinkShape::Straight),
+    ];
+    let base_count = count_view_crossings(&cv_view(base.clone()));
+
+    // Translate every coordinate by a fixed offset.
+    let translated: Vec<ViewElement> = base
+        .iter()
+        .map(|e| transform_element(e, |x, y| (x + 137.0, y - 89.0)))
+        .collect();
+    assert_eq!(
+        count_view_crossings(&cv_view(translated)),
+        base_count,
+        "translation must preserve crossing count"
+    );
+
+    // Rotate every coordinate about the origin by a fixed angle.
+    let theta = 0.7_f64; // radians
+    let (s, c) = theta.sin_cos();
+    let rotated: Vec<ViewElement> = base
+        .iter()
+        .map(|e| transform_element(e, |x, y| (x * c - y * s, x * s + y * c)))
+        .collect();
+    assert_eq!(
+        count_view_crossings(&cv_view(rotated)),
+        base_count,
+        "rotation must preserve crossing count"
+    );
+}
+
+/// Apply a coordinate transform to the (x, y) of an Aux or Module element. Every
+/// other element (including links, which carry no coordinates) is cloned unchanged.
+fn transform_element(e: &ViewElement, f: impl Fn(f64, f64) -> (f64, f64)) -> ViewElement {
+    match e {
+        ViewElement::Aux(a) => {
+            let (x, y) = f(a.x, a.y);
+            ViewElement::Aux(view_element::Aux { x, y, ..a.clone() })
+        }
+        ViewElement::Module(m) => {
+            let (x, y) = f(m.x, m.y);
+            ViewElement::Module(view_element::Module { x, y, ..m.clone() })
+        }
+        other => other.clone(),
+    }
+}
+
+/// Module/Alias undercount fix: a link from an Aux to a Module that crosses
+/// another link is now counted. Previously Module-incident links were dropped
+/// from the segment set entirely, so this crossing was invisible.
+#[test]
+fn test_count_view_crossings_module_incident_link_participates() {
+    // Link 1: a1(0,0) -> m2(100,100) (a Module endpoint).
+    // Link 2: a3(0,100) -> a4(100,0). The two diagonals cross once.
+    let view = cv_view(vec![
+        cv_aux(1, 0.0, 0.0),
+        cv_module(2, 100.0, 100.0),
+        cv_aux(3, 0.0, 100.0),
+        cv_aux(4, 100.0, 0.0),
+        cv_link(10, 1, 2, LinkShape::Straight),
+        cv_link(11, 3, 4, LinkShape::Straight),
+    ]);
+
+    assert_eq!(
+        count_view_crossings(&view),
+        1,
+        "a Module-incident link must participate in crossing detection"
+    );
+}
+
+/// A link that TERMINATES at a flow's valve must not be counted as crossing the
+/// flow pipe at that shared connection point. This is the exact
+/// dp_logistic_growth reference geometry: the horizontal `net birth rate` flow
+/// (cloud -> valve -> Population stock) plus the `fractional growth rate ->
+/// net birth rate` link, whose drawn arc curves up to the valve from below and
+/// grazes the pipe at the connection point. The link's endpoint node (`elem_2`,
+/// named for the flow's own uid) is also the vertex the pipe is split at, so
+/// the two share an element and the graze is not a real crossing.
+#[test]
+fn test_count_view_crossings_link_to_flow_valve_no_crossing() {
+    let flow_uid = 2;
+    let view = cv_view(vec![
+        cv_stock(1, 602.4000244140625, 259.8000183105469),
+        cv_flow_pts(
+            flow_uid,
+            518.2726610523725,
+            258.60003662109375,
+            // source end attached to the cloud, sink end to the stock
+            (456.79998779296875, 258.60003662109375, Some(3)),
+            (579.9000244140625, 258.60003662109375, Some(1)),
+        ),
+        cv_cloud(3, flow_uid, 456.79998779296875, 258.60003662109375),
+        cv_aux(4, 498.0, 344.20001220703125),
+        // fractional growth rate -> net birth rate (to_uid == flow.uid): the
+        // drawn arc bulges up to graze the pipe at the valve connection point.
+        cv_link(10, 4, flow_uid, LinkShape::Arc(118.82198603295677)),
+    ]);
+
+    assert_eq!(
+        count_view_crossings(&view),
+        0,
+        "a link terminating at a flow valve must not count as crossing the pipe"
+    );
+}
+
+/// The flow-segment naming contract that the suppression relies on: a flow
+/// point attached to a stock/cloud names its pipe vertex `elem_{attached_uid}`
+/// (so a link incident on that stock/cloud, which uses the same name, is
+/// suppressed at the shared connection point), the valve is injected as an
+/// `elem_{flow.uid}` vertex on the pipe (so a link incident on the valve is
+/// suppressed there), and a free point keeps the per-flow `flow_{uid}#{i}`
+/// name (so a genuine mid-span crossing is still counted). This is the
+/// node-name contract; the end-to-end suppression is exercised by the valve and
+/// mid-span tests, since for an attached stock/cloud the link endpoint clips to
+/// the element boundary and only grazes the pipe through the shared vertex.
+#[test]
+fn test_build_view_segments_flow_vertex_naming() {
+    let flow_uid = 2;
+    let stock_uid = 1;
+    let cloud_uid = 3;
+    let view = cv_view(vec![
+        cv_stock(stock_uid, 602.4000244140625, 259.8000183105469),
+        cv_flow_pts(
+            flow_uid,
+            518.2726610523725,
+            258.60003662109375,
+            (456.79998779296875, 258.60003662109375, Some(cloud_uid)),
+            (579.9000244140625, 258.60003662109375, Some(stock_uid)),
+        ),
+        cv_cloud(cloud_uid, flow_uid, 456.79998779296875, 258.60003662109375),
+    ]);
+
+    let segs = build_view_segments(&view);
+    // The pipe splits at the valve into two sub-segments:
+    //   elem_3 (cloud) -> elem_2 (valve)  and  elem_2 (valve) -> elem_1 (stock)
+    let names: Vec<(String, String)> = segs
+        .iter()
+        .map(|s| (s.from_node.clone(), s.to_node.clone()))
+        .collect();
+    assert_eq!(
+        names,
+        vec![
+            ("elem_3".to_string(), "elem_2".to_string()),
+            ("elem_2".to_string(), "elem_1".to_string()),
+        ],
+        "flow pipe must name attached endpoints elem_<attached> and split at the valve as elem_<flow>"
+    );
+
+    // A free (unattached) interior point keeps the per-flow name.
+    let free_view = cv_view(vec![cv_flow_pts(
+        flow_uid,
+        518.2726610523725,
+        258.60003662109375,
+        (456.79998779296875, 258.60003662109375, None),
+        (579.9000244140625, 258.60003662109375, None),
+    )]);
+    let free_segs = build_view_segments(&free_view);
+    let free_names: Vec<(String, String)> = free_segs
+        .iter()
+        .map(|s| (s.from_node.clone(), s.to_node.clone()))
+        .collect();
+    assert_eq!(
+        free_names,
+        vec![
+            (format!("flow_{flow_uid}#0"), format!("elem_{flow_uid}")),
+            (format!("elem_{flow_uid}"), format!("flow_{flow_uid}#1")),
+        ],
+        "an unattached flow point keeps its per-flow name; only the valve is elem_<flow>"
+    );
+}
+
+/// A GENUINE mid-span crossing of a flow pipe -- a link that crosses the pipe
+/// away from any element the flow shares -- must STILL be counted. This guards
+/// against the valve/attachment suppression over-suppressing real crossings.
+#[test]
+fn test_count_view_crossings_link_crosses_flow_pipe_midspan_counted() {
+    // Flow valve at (100, 100), pipe from x=40 to x=160 at y=100. A straight
+    // link runs vertically through x=70 (between the cloud end and the valve,
+    // so it does NOT touch the valve, the cloud, or the stock), crossing the
+    // pipe once.
+    let flow_uid = 20;
+    let view = cv_view(vec![
+        cv_cloud(1, flow_uid, 40.0, 100.0),
+        cv_stock(2, 200.0, 100.0),
+        cv_aux(3, 70.0, 50.0),
+        cv_aux(4, 70.0, 150.0),
+        cv_flow(flow_uid, 100.0, 100.0, 1, 2),
+        // Link from a3 (above the pipe) to a4 (below the pipe), crossing the
+        // pipe at x=70 -- nowhere near the valve or either attached element.
+        cv_link(30, 3, 4, LinkShape::Straight),
+    ]);
+
+    assert_eq!(
+        count_view_crossings(&view),
+        1,
+        "a genuine mid-span crossing of the flow pipe must still be counted"
+    );
+}
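The `eval_stats.rs` module that follows computes Mann-Whitney U by ranking the pooled samples with averaged tie ranks, deriving U from the rank sum, and applying a tie-corrected normal approximation with a 0.5 continuity correction. A condensed standalone sketch of just the U and z steps (stopping before the erf-based p-value) is below. `mw_u_z` is a hypothetical name, and it sorts with `f64::total_cmp` rather than the module's `partial_cmp` fallback, an assumption that only matters for NaN input.

```rust
/// Hypothetical helper: returns (u1, u2, z) for samples `a` and `b`, where
/// `z` uses the tie-corrected variance and a 0.5 continuity correction.
/// `z` is 0.0 when either sample is empty or all pooled values are equal.
fn mw_u_z(a: &[f64], b: &[f64]) -> (f64, f64, f64) {
    if a.is_empty() || b.is_empty() {
        return (0.0, 0.0, 0.0);
    }
    let (n1, n2) = (a.len() as f64, b.len() as f64);

    // Pool the values, tagging membership in `a`, and sort ascending.
    let mut pooled: Vec<(f64, bool)> = a
        .iter()
        .map(|&v| (v, true))
        .chain(b.iter().map(|&v| (v, false)))
        .collect();
    pooled.sort_by(|x, y| x.0.total_cmp(&y.0));

    let n = n1 + n2;
    let mut r1 = 0.0; // rank sum of sample `a`
    let mut tie_term = 0.0; // sum of (t^3 - t) over tie groups
    let mut i = 0;
    while i < pooled.len() {
        let mut j = i + 1;
        while j < pooled.len() && pooled[j].0 == pooled[i].0 {
            j += 1;
        }
        // The 1-based ranks i+1..=j all receive their average.
        let avg_rank = (i + 1 + j) as f64 / 2.0;
        let in_a = pooled[i..j].iter().filter(|e| e.1).count() as f64;
        r1 += avg_rank * in_a;
        let t = (j - i) as f64;
        tie_term += t * t * t - t;
        i = j;
    }

    let u1 = r1 - n1 * (n1 + 1.0) / 2.0;
    let u2 = n1 * n2 - u1;
    let variance = n1 * n2 / 12.0 * ((n + 1.0) - tie_term / (n * (n - 1.0)));
    if variance <= 0.0 {
        return (u1, u2, 0.0);
    }
    let mu = n1 * n2 / 2.0;
    let z = ((u1.min(u2) - mu).abs() - 0.5).max(0.0) / variance.sqrt();
    (u1, u2, z)
}
```

For `a = [1, 2, 3]` and `b = [4, 5, 6]` the samples do not overlap, so `u1 = 0` and `u2 = 9`; with no ties the variance is `9/12 * 7 = 5.25`, giving `z = (4.5 - 0.5) / sqrt(5.25)`, about 1.75.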
diff --git a/src/simlin-engine/src/layout/eval_stats.rs b/src/simlin-engine/src/layout/eval_stats.rs
new file mode 100644
index 000000000..4c6fd5a56
--- /dev/null
+++ b/src/simlin-engine/src/layout/eval_stats.rs
@@ -0,0 +1,1147 @@
+// Copyright 2026 The Simlin Authors. All rights reserved.
+// Use of this source code is governed by the Apache License,
+// Version 2.0, that can be found in the LICENSE file.
+
+// pattern: Functional Core
+//
+// Pure statistics for layout-quality seed-sample distributions, mirroring Go's
+// `benchstat`: many per-seed samples reduced to a center + spread, plus a
+// non-parametric significance test (Mann-Whitney U) on differences.
+//
+// There is NO I/O in this module: it takes slices of numbers, computes scalars,
+// and returns them. Every primitive returns a finite, documented default
+// (`0.0`, or a non-significant `p_value` of `1.0`) on empty or degenerate
+// input -- it must never return NaN, matching the engine's no-NaN policy for
+// statistics. That makes every term trivially testable with hand-computed
+// expected values (see the inline tests below).
+//
+// The corpus sweep (Phase 3) is the imperative shell that fills these structs
+// from real layouts.
+
+use crate::layout::metrics::LayoutMetrics;
+
+/// Geometric mean of strictly-positive values: `exp(mean(ln(x)))`.
+///
+/// Returns `0.0` for an empty slice. Values must be `> 0`: a `0` sends `ln` to
+/// `-inf` and collapses the result to `0.0`, and a negative value can yield NaN.
+/// Layout costs are `>= 0`, so callers first floor them with a small epsilon
+/// (see [`CorpusReport::from_model_stats`]); this keeps a single `0` cost from
+/// zeroing the whole-corpus geometric mean.
+pub fn geomean(values: &[f64]) -> f64 {
+    if values.is_empty() {
+        return 0.0;
+    }
+    // The geometric mean of a single value is that value exactly; short-circuit
+    // to avoid a needless ln/exp round-trip (which would return e.g.
+    // 4.999999999999999 for an input of 5.0).
+    if values.len() == 1 {
+        return values[0];
+    }
+    let sum_ln: f64 = values.iter().map(|&x| x.ln()).sum();
+    (sum_ln / values.len() as f64).exp()
+}
+
+/// Linear-interpolated percentile using the "type 7" convention (NumPy's
+/// default): for sorted `x` of length `n` and `p` in `[0, 1]`, the fractional
+/// rank is `p * (n - 1)`, then the result interpolates linearly between the
+/// values at the floor and ceil of that rank.
+///
+/// Returns `0.0` for an empty slice and the single value for `n == 1`.
+/// `values` need not be pre-sorted -- a copy is sorted internally. `p` is
+/// clamped to `[0, 1]`.
+pub fn percentile(values: &[f64], p: f64) -> f64 {
+    if values.is_empty() {
+        return 0.0;
+    }
+    let n = values.len();
+    if n == 1 {
+        return values[0];
+    }
+
+    let mut sorted = values.to_vec();
+    sorted.sort_by(|a, b| a.partial_cmp(b).unwrap_or(std::cmp::Ordering::Equal));
+
+    let p = p.clamp(0.0, 1.0);
+    // Type-7 fractional rank in [0, n-1].
+    let rank = p * (n as f64 - 1.0);
+    let lo = rank.floor() as usize;
+    let hi = rank.ceil() as usize;
+    if lo == hi {
+        return sorted[lo];
+    }
+    let frac = rank - lo as f64;
+    sorted[lo] * (1.0 - frac) + sorted[hi] * frac
+}
+
+/// Median, equal to `percentile(values, 0.5)`.
+pub fn median(values: &[f64]) -> f64 {
+    percentile(values, 0.5)
+}
+
+/// Mann-Whitney U test result for two independent samples.
+#[derive(Clone, Copy, Debug, PartialEq)]
+pub struct MannWhitney {
+    /// The smaller of `u1` and `u2`.
+    pub u: f64,
+    /// U statistic for sample `a`.
+    pub u1: f64,
+    /// U statistic for sample `b`.
+    pub u2: f64,
+    /// Two-sided p-value (normal approximation with tie + continuity
+    /// correction).
+    pub p_value: f64,
+}
+
+/// Mann-Whitney U (a.k.a. Wilcoxon rank-sum) test on two independent samples.
+///
+/// Ranks the pooled samples, averaging tied ranks; computes U from the rank
+/// sums; reports the two-sided p-value via the normal approximation with tie
+/// correction and continuity correction. For tiny samples this approximation
+/// is rough; the corpus sweep uses roughly 20 or more seeds per model, where it
+/// is adequate.
+///
+/// Returns `p_value = 1.0` (non-significant) when either sample is empty or all
+/// pooled values are identical (no separation is possible, so the variance of
+/// the normal approximation is zero).
+pub fn mann_whitney_u(a: &[f64], b: &[f64]) -> MannWhitney {
+    let n1 = a.len();
+    let n2 = b.len();
+    if n1 == 0 || n2 == 0 {
+        // No separation possible with an empty sample. Report a degenerate but
+        // finite result with a non-significant p-value.
+        return MannWhitney {
+            u: 0.0,
+            u1: 0.0,
+            u2: 0.0,
+            p_value: 1.0,
+        };
+    }
+
+    // 1. Pool, tagging each value with which sample it came from (false = a),
+    //    sort by value, and assign average ranks (1..=N) to tied groups.
+    let mut pooled: Vec<(f64, bool)> = Vec::with_capacity(n1 + n2);
+    pooled.extend(a.iter().map(|&v| (v, false)));
+    pooled.extend(b.iter().map(|&v| (v, true)));
+    pooled.sort_by(|x, y| x.0.partial_cmp(&y.0).unwrap_or(std::cmp::Ordering::Equal));
+
+    let n = (n1 + n2) as f64;
+    let mut r1 = 0.0; // sum of ranks belonging to sample `a`
+    // Σ (t^3 - t) over each tie group of size t, for the variance correction.
+    let mut tie_term = 0.0;
+    let mut i = 0;
+    while i < pooled.len() {
+        // Extend [i, j) over the run of values equal to pooled[i].0.
+        let mut j = i + 1;
+        while j < pooled.len() && pooled[j].0 == pooled[i].0 {
+            j += 1;
+        }
+        let group_len = j - i;
+        // Ranks are 1-based; the average rank of positions i..j (0-based) is
+        // ((i+1) + j) / 2.
+        let avg_rank = ((i + 1) + j) as f64 / 2.0;
+        for entry in &pooled[i..j] {
+            if !entry.1 {
+                r1 += avg_rank;
+            }
+        }
+        if group_len > 1 {
+            let t = group_len as f64;
+            tie_term += t * t * t - t;
+        }
+        i = j;
+    }
+
+    // 2. U statistics from the rank sums.
+    let n1f = n1 as f64;
+    let n2f = n2 as f64;
+    let u1 = r1 - n1f * (n1f + 1.0) / 2.0;
+    let u2 = n1f * n2f - u1;
+    let u = u1.min(u2);
+
+    // 3. Mean and tie-corrected variance of the U distribution.
+    let mu = n1f * n2f / 2.0;
+    let variance = (n1f * n2f / 12.0) * ((n + 1.0) - tie_term / (n * (n - 1.0)));
+
+    // 4. Two-sided p-value via the normal approximation with a 0.5 continuity
+    //    correction. When the variance is zero (with both samples non-empty,
+    //    that happens exactly when all pooled values are identical), no
+    //    separation is possible, so report the non-significant default rather
+    //    than dividing by zero.
+    let p_value = if variance <= 0.0 {
+        1.0
+    } else {
+        let z = ((u - mu).abs() - 0.5).max(0.0) / variance.sqrt();
+        (2.0 * (1.0 - phi(z))).clamp(0.0, 1.0)
+    };
+
+    MannWhitney { u, u1, u2, p_value }
+}
+
+/// Error function via the Abramowitz & Stegun 7.1.26 rational approximation
+/// (max absolute error ~1.5e-7) -- ample accuracy for a significance verdict.
+///
+/// A small local copy keeps this module self-contained and independently
+/// testable (the VM-internal `crate::alloc::erfc_approx`/`normal_cdf` are an
+/// implementation detail of the allocation opcodes).
+fn erf(x: f64) -> f64 {
+    // A&S 7.1.26 is stated for x >= 0; erf is odd, so reflect for x < 0.
+    let sign = if x < 0.0 { -1.0 } else { 1.0 };
+    let x = x.abs();
+
+    const A1: f64 = 0.254_829_592;
+    const A2: f64 = -0.284_496_736;
+    const A3: f64 = 1.421_413_741;
+    const A4: f64 = -1.453_152_027;
+    const A5: f64 = 1.061_405_429;
+    const P: f64 = 0.327_591_1;
+
+    let t = 1.0 / (1.0 + P * x);
+    // Horner form of (a1 t + a2 t^2 + a3 t^3 + a4 t^4 + a5 t^5).
+    let poly = ((((A5 * t + A4) * t + A3) * t + A2) * t + A1) * t;
+    let y = 1.0 - poly * (-x * x).exp();
+    sign * y
+}
+
+/// Standard normal CDF, `Phi(x) = 0.5 * (1 + erf(x / sqrt(2)))`.
+fn phi(x: f64) -> f64 {
+    0.5 * (1.0 + erf(x / std::f64::consts::SQRT_2))
+}
+
+/// Floor applied to each model's median before it enters the corpus geometric
+/// mean. A geometric mean is the product of its terms, so a single `0` median
+/// would zero the whole aggregate; flooring with this small epsilon keeps a
+/// genuinely-perfect (zero-cost) model from collapsing the corpus number while
+/// remaining far below any meaningful cost. Documented and applied only in
+/// [`CorpusReport::from_model_stats`].
+pub const GEOMEAN_FLOOR_EPSILON: f64 = 1e-9;
+
+/// One per-seed layout sample: the seed that produced the layout, its computed
+/// metrics, and the scalar weighted cost the optimizer minimizes.
+///
+/// `Serialize`/`Deserialize` let the corpus sweep round-trip a full
+/// [`CorpusReport`] (including these per-seed samples) through JSON, so the
+/// committed baseline report can be read back and the per-model seed-sample
+/// cost sets re-run through [`mann_whitney_u`] by [`compare`].
+#[derive(Clone, Debug, serde::Serialize, serde::Deserialize)]
+pub struct MetricSample {
+    pub seed: u64,
+    pub metrics: LayoutMetrics,
+    pub weighted_cost: f64,
+}
+
+/// Aggregated statistics for one model's seed sweep: the raw per-seed samples
+/// plus the center (`median_cost`), spread (`p25`, `p75`), the best-of-k
+/// production proxy, and the best/median/worst seeds (which drive Phase 3's
+/// PNG renders).
+///
+/// `Serialize`/`Deserialize` ride on [`MetricSample`]'s so a [`CorpusReport`]
+/// round-trips through JSON (see [`MetricSample`]).
+#[derive(Clone, Debug, serde::Serialize, serde::Deserialize)]
+pub struct ModelStats {
+    pub model: String,
+    /// One sample per seed.
+    pub samples: Vec<MetricSample>,
+    pub median_cost: f64,
+    /// `(p25, p75)` of the weighted costs.
+    pub spread: (f64, f64),
+    /// Production proxy: the min weighted cost over the k production seeds.
+    pub best_of_k_cost: f64,
+    pub best_seed: u64,
+    pub median_seed: u64,
+    pub worst_seed: u64,
+}
+
+/// Corpus-wide report: one `ModelStats` per model plus the geometric mean of
+/// the per-model medians (the single headline aggregate, benchstat-style).
+///
+/// `Serialize`/`Deserialize` let the corpus sweep write this report to the
+/// committed `examples/layout_eval_baseline.json` and read it back for the
+/// baseline-vs-candidate diff (`compare`). The full report -- including each
+/// model's per-seed `samples` -- round-trips so `compare` can re-run
+/// Mann-Whitney U over the seed-sample cost sets.
+#[derive(Clone, Debug, serde::Serialize, serde::Deserialize)]
+pub struct CorpusReport {
+    pub per_model: Vec<ModelStats>,
+    pub geomean_of_medians: f64,
+}
+
+impl ModelStats {
+    /// Summarize a model's per-seed samples.
+    ///
+    /// `production_seeds` is the fixed seed set used for the best-of-k proxy:
+    /// `best_of_k_cost` is the min `weighted_cost` among the samples whose seed
+    /// is in that set, falling back to the global min when none of the
+    /// production seeds were sampled. The median seed is the sample whose cost
+    /// is closest to `median_cost`, breaking ties on the lowest seed (so the
+    /// chosen render is deterministic). Empty `samples` yields all-zero fields
+    /// and seeds of `0` -- no panic.
+    pub fn from_samples(
+        model: String,
+        samples: Vec<MetricSample>,
+        production_seeds: &[u64],
+    ) -> ModelStats {
+        if samples.is_empty() {
+            return ModelStats {
+                model,
+                samples,
+                median_cost: 0.0,
+                spread: (0.0, 0.0),
+                best_of_k_cost: 0.0,
+                best_seed: 0,
+                median_seed: 0,
+                worst_seed: 0,
+            };
+        }
+
+        let costs: Vec<f64> = samples.iter().map(|s| s.weighted_cost).collect();
+        let median_cost = median(&costs);
+        let spread = (percentile(&costs, 0.25), percentile(&costs, 0.75));
+
+        // best/worst seeds: the seeds of the global min / max weighted_cost.
+        // Tie-break on the lowest seed so the chosen render is deterministic.
+        let best_seed = samples
+            .iter()
+            .min_by(|x, y| {
+                x.weighted_cost
+                    .partial_cmp(&y.weighted_cost)
+                    .unwrap_or(std::cmp::Ordering::Equal)
+                    .then(x.seed.cmp(&y.seed))
+            })
+            .map(|s| s.seed)
+            .unwrap_or(0);
+        let worst_seed = samples
+            .iter()
+            .max_by(|x, y| {
+                x.weighted_cost
+                    .partial_cmp(&y.weighted_cost)
+                    .unwrap_or(std::cmp::Ordering::Equal)
+                    // On a cost tie, plain max_by would keep the LAST equal
+                    // maximum; reversing the seed comparison makes the lowest
+                    // seed compare greatest, so it wins.
+                    .then(y.seed.cmp(&x.seed))
+            })
+            .map(|s| s.seed)
+            .unwrap_or(0);
+
+        // median seed: the sample whose cost is closest to `median_cost`,
+        // breaking ties on the lowest seed.
+        let median_seed = samples
+            .iter()
+            .min_by(|x, y| {
+                let dx = (x.weighted_cost - median_cost).abs();
+                let dy = (y.weighted_cost - median_cost).abs();
+                dx.partial_cmp(&dy)
+                    .unwrap_or(std::cmp::Ordering::Equal)
+                    .then(x.seed.cmp(&y.seed))
+            })
+            .map(|s| s.seed)
+            .unwrap_or(0);
+
+        // best-of-k: min weighted_cost among samples whose seed is a production
+        // seed; fall back to the global min when none were sampled.
+        let prod_min = samples
+            .iter()
+            .filter(|s| production_seeds.contains(&s.seed))
+            .map(|s| s.weighted_cost)
+            .fold(f64::INFINITY, f64::min);
+        let best_of_k_cost = if prod_min.is_finite() {
+            prod_min
+        } else {
+            costs.iter().cloned().fold(f64::INFINITY, f64::min)
+        };
+
+        ModelStats {
+            model,
+            samples,
+            median_cost,
+            spread,
+            best_of_k_cost,
+            best_seed,
+            median_seed,
+            worst_seed,
+        }
+    }
+}
+
+impl CorpusReport {
+    /// Build a corpus report. `geomean_of_medians` is the geometric mean of
+    /// each model's `median_cost`, with each median floored by
+    /// [`GEOMEAN_FLOOR_EPSILON`] so a single `0` median cannot zero the whole
+    /// aggregate. An empty corpus yields `geomean_of_medians == 0.0`.
+    pub fn from_model_stats(per_model: Vec<ModelStats>) -> CorpusReport {
+        let medians: Vec<f64> = per_model
+            .iter()
+            .map(|m| m.median_cost.max(GEOMEAN_FLOOR_EPSILON))
+            .collect();
+        let geomean_of_medians = geomean(&medians);
+        CorpusReport {
+            per_model,
+            geomean_of_medians,
+        }
+    }
+}
+
+/// Per-model verdict from comparing a baseline against a candidate report.
+///
+/// `Serialize` lets the corpus sweep embed the baseline-vs-candidate diff into
+/// its `metrics.json` artifact. The verdict is never read back from JSON (it is
+/// recomputed by `compare` on every run), so it carries no `Deserialize`.
+#[derive(Clone, Debug, serde::Serialize)]
+pub struct ModelComparison {
+    pub model: String,
+    pub baseline_median: f64,
+    pub candidate_median: f64,
+    /// `candidate_median / baseline_median - 1.0`, or `0.0` when the baseline
+    /// median is `0` (so a degenerate baseline never produces inf/NaN). A
+    /// negative ratio means the candidate is cheaper (better).
+    pub delta_ratio: f64,
+    /// Two-sided Mann-Whitney U p-value over the two models' seed-sample
+    /// `weighted_cost` vectors.
+    pub p_value: f64,
+    /// `p_value < SIGNIFICANCE_ALPHA`.
+    pub significant: bool,
+}
+
+/// Result of comparing two corpus reports: one [`ModelComparison`] per matched
+/// model plus the corpus-wide aggregate delta and significance verdict.
+///
+/// `Serialize` lets the corpus sweep embed this diff into its `metrics.json`
+/// artifact. Like [`ModelComparison`] it carries no `Deserialize`: the diff is
+/// recomputed by `compare` on every run, never read back from JSON.
+#[derive(Clone, Debug, serde::Serialize)]
+pub struct Comparison {
+    /// One entry per model present in BOTH reports (unmatched models are
+    /// skipped -- see [`compare`]), in baseline iteration order.
+    pub per_model: Vec<ModelComparison>,
+    /// `geomean(candidate medians) / geomean(baseline medians) - 1.0` over the
+    /// matched per-model medians, or `0.0` when the baseline geomean is `0`.
+    pub aggregate_delta_ratio: f64,
+    /// Two-sided Mann-Whitney U p-value over the matched per-model medians (see
+    /// [`compare`] for why Mann-Whitney rather than a paired test).
+    pub aggregate_p_value: f64,
+    /// `aggregate_p_value < SIGNIFICANCE_ALPHA`.
+    pub aggregate_significant: bool,
+}
+
+/// Significance threshold for the p-value verdicts -- the conventional 5%.
+pub const SIGNIFICANCE_ALPHA: f64 = 0.05;
+
+/// Compute `candidate / baseline - 1.0`, returning `0.0` when `baseline == 0`
+/// so a degenerate (zero) baseline never produces an infinite or NaN ratio.
+/// Mirrors the no-NaN policy of the rest of this module.
+fn delta_ratio(baseline: f64, candidate: f64) -> f64 {
+    if baseline == 0.0 {
+        0.0
+    } else {
+        candidate / baseline - 1.0
+    }
+}
+
+/// Compare two corpus reports.
+///
+/// Models are matched by `model` name; only models present in BOTH reports are
+/// compared. A model present in just one report is **skipped** (it has no
+/// counterpart to difference against). The returned `per_model` is in baseline
+/// iteration order.
+///
+/// Per matched model: the two seed-sample `weighted_cost` vectors are run
+/// through [`mann_whitney_u`]; `delta_ratio` is computed from the medians
+/// (`0.0` when the baseline median is `0`); `significant` is
+/// `p_value < SIGNIFICANCE_ALPHA`.
+///
+/// Aggregate: `aggregate_delta_ratio` is the candidate-side geometric mean of
+/// the matched per-model medians divided by the baseline-side one, minus `1.0`
+/// (each median floored by [`GEOMEAN_FLOOR_EPSILON`] exactly as [`CorpusReport`]
+/// does, so a `0` median can't zero the aggregate). `aggregate_p_value` is
+/// `mann_whitney_u(baseline_medians, candidate_medians).p_value` over the
+/// matched per-model medians.
+///
+/// The aggregate significance test treats the two median vectors as
+/// independent samples (Mann-Whitney U), per the design. A paired test such as
+/// Wilcoxon signed-rank -- which would exploit the model-by-model pairing of
+/// the matched medians -- is a documented future refinement, not implemented
+/// here.
+///
+/// On empty or fully-disjoint reports there are no matched models:
+/// `per_model` is empty, `aggregate_delta_ratio == 0.0`, and the aggregate is
+/// non-significant with a finite p-value (no NaN).
+pub fn compare(baseline: &CorpusReport, candidate: &CorpusReport) -> Comparison {
+    // Index the candidate's models by name so we can pull the matching entry in
+    // baseline iteration order without an O(n^2) scan.
+    let candidate_by_name: std::collections::HashMap<&str, &ModelStats> = candidate
+        .per_model
+        .iter()
+        .map(|m| (m.model.as_str(), m))
+        .collect();
+
+    let mut per_model = Vec::new();
+    let mut baseline_medians = Vec::new();
+    let mut candidate_medians = Vec::new();
+
+    for base in &baseline.per_model {
+        let Some(cand) = candidate_by_name.get(base.model.as_str()) else {
+            // Unmatched: present only in the baseline, so skip it.
+            continue;
+        };
+
+        let baseline_costs: Vec<f64> = base.samples.iter().map(|s| s.weighted_cost).collect();
+        let candidate_costs: Vec<f64> = cand.samples.iter().map(|s| s.weighted_cost).collect();
+        let mw = mann_whitney_u(&baseline_costs, &candidate_costs);
+
+        let baseline_median = base.median_cost;
+        let candidate_median = cand.median_cost;
+        let ratio = delta_ratio(baseline_median, candidate_median);
+
+        baseline_medians.push(baseline_median);
+        candidate_medians.push(candidate_median);
+
+        per_model.push(ModelComparison {
+            model: base.model.clone(),
+            baseline_median,
+            candidate_median,
+            delta_ratio: ratio,
+            p_value: mw.p_value,
+            significant: mw.p_value < SIGNIFICANCE_ALPHA,
+        });
+    }
+
+    // Aggregate delta: ratio of the two geomean-of-medians, each side floored
+    // by the same epsilon CorpusReport uses so a single 0 median can't zero a
+    // side's geometric mean.
+    let baseline_floored: Vec<f64> = baseline_medians
+        .iter()
+        .map(|&m| m.max(GEOMEAN_FLOOR_EPSILON))
+        .collect();
+    let candidate_floored: Vec<f64> = candidate_medians
+        .iter()
+        .map(|&m| m.max(GEOMEAN_FLOOR_EPSILON))
+        .collect();
+    let aggregate_delta_ratio =
+        delta_ratio(geomean(&baseline_floored), geomean(&candidate_floored));
+
+    let aggregate_p_value = mann_whitney_u(&baseline_medians, &candidate_medians).p_value;
+
+    Comparison {
+        per_model,
+        aggregate_delta_ratio,
+        aggregate_p_value,
+        aggregate_significant: aggregate_p_value < SIGNIFICANCE_ALPHA,
+    }
+}
+
+#[cfg(test)]
+mod tests {
+    use super::*;
+    use proptest::prelude::*;
+
+    const EPS: f64 = 1e-9;
+
+    fn close(a: f64, b: f64) -> bool {
+        (a - b).abs() < EPS
+    }
+
+    // --- geomean ---
+
+    #[test]
+    fn test_geomean_two_values() {
+        // sqrt(2*8) = sqrt(16) = 4.
+        assert!(close(geomean(&[2.0, 8.0]), 4.0), "{}", geomean(&[2.0, 8.0]));
+    }
+
+    #[test]
+    fn test_geomean_three_values() {
+        // cbrt(1*10*100) = cbrt(1000) = 10.
+        let g = geomean(&[1.0, 10.0, 100.0]);
+        assert!(close(g, 10.0), "{}", g);
+    }
+
+    #[test]
+    fn test_geomean_empty_is_zero() {
+        assert_eq!(geomean(&[]), 0.0);
+    }
+
+    #[test]
+    fn test_geomean_single() {
+        assert_eq!(geomean(&[5.0]), 5.0);
+    }
+
+    // --- percentile / median (type 7) ---
+
+    #[test]
+    fn test_median_odd() {
+        assert_eq!(median(&[1.0, 2.0, 3.0]), 2.0);
+    }
+
+    #[test]
+    fn test_median_even() {
+        assert_eq!(median(&[1.0, 2.0, 3.0, 4.0]), 2.5);
+    }
+
+    #[test]
+    fn test_percentile_type7_quartiles() {
+        // NumPy np.percentile([1,2,3,4,5], 25) == 2.0, 75 == 4.0.
+        assert_eq!(percentile(&[1.0, 2.0, 3.0, 4.0, 5.0], 0.25), 2.0);
+        assert_eq!(percentile(&[1.0, 2.0, 3.0, 4.0, 5.0], 0.75), 4.0);
+    }
+
+    #[test]
+    fn test_percentile_empty_is_zero() {
+        assert_eq!(percentile(&[], 0.5), 0.0);
+    }
+
+    #[test]
+    fn test_percentile_single() {
+        assert_eq!(percentile(&[7.0], 0.9), 7.0);
+    }
+
+    #[test]
+    fn test_percentile_unsorted_input() {
+        // The function must sort a copy: a reversed input gives the same answer.
+        assert_eq!(percentile(&[5.0, 4.0, 3.0, 2.0, 1.0], 0.25), 2.0);
+    }
+
+    #[test]
+    fn test_percentile_endpoints() {
+        assert_eq!(percentile(&[1.0, 2.0, 3.0, 4.0, 5.0], 0.0), 1.0);
+        assert_eq!(percentile(&[1.0, 2.0, 3.0, 4.0, 5.0], 1.0), 5.0);
+    }
+
+    // --- Mann-Whitney U ---
+
+    #[test]
+    fn test_mann_whitney_complete_separation() {
+        // a strictly below b: complete separation. With n1 = n2 = 4,
+        // r1 = 1+2+3+4 = 10, u1 = 10 - 4*5/2 = 0, u2 = 16 - 0 = 16, u = 0.
+        let r = mann_whitney_u(&[1.0, 2.0, 3.0, 4.0], &[5.0, 6.0, 7.0, 8.0]);
+        assert_eq!(r.u1, 0.0);
+        assert_eq!(r.u2, 16.0);
+        assert_eq!(r.u, 0.0);
+        assert!(
+            r.p_value < 0.05,
+            "p_value {} should be significant",
+            r.p_value
+        );
+    }
+
+    #[test]
+    fn test_mann_whitney_no_difference() {
+        // Identical samples: each value is tied with its twin in the other
+        // sample. u1 == u2 == n1*n2/2 == 8, so U sits exactly at its null mean
+        // and the two-sided p-value is non-significant.
+        let r = mann_whitney_u(&[1.0, 2.0, 3.0, 4.0], &[1.0, 2.0, 3.0, 4.0]);
+        assert_eq!(r.u1, 8.0);
+        assert_eq!(r.u2, 8.0);
+        assert!(
+            r.p_value > 0.5,
+            "p_value {} should be non-significant",
+            r.p_value
+        );
+    }
+
+    #[test]
+    fn test_mann_whitney_u1_plus_u2_invariant() {
+        // u1 + u2 == n1*n2 on a mixed (interleaved, with ties) example.
+        let a = [1.0, 3.0, 5.0, 7.0, 3.0];
+        let b = [2.0, 4.0, 6.0, 3.0];
+        let r = mann_whitney_u(&a, &b);
+        let n1n2 = (a.len() * b.len()) as f64;
+        assert!(
+            close(r.u1 + r.u2, n1n2),
+            "u1 {} + u2 {} != n1*n2 {}",
+            r.u1,
+            r.u2,
+            n1n2
+        );
+    }
+
+    #[test]
+    fn test_mann_whitney_empty_is_nonsignificant() {
+        let r = mann_whitney_u(&[], &[1.0, 2.0, 3.0]);
+        assert_eq!(r.p_value, 1.0);
+        assert!(r.u.is_finite());
+        assert!(r.u1.is_finite());
+        assert!(r.u2.is_finite());
+    }
+
+    // --- erf / Phi sanity (exercised indirectly through the p-value path) ---
+
+    #[test]
+    fn test_phi_zero() {
+        assert!(close(phi(0.0), 0.5), "{}", phi(0.0));
+    }
+
+    #[test]
+    fn test_phi_1_96() {
+        // The classic 97.5th percentile of the standard normal.
+        assert!((phi(1.96) - 0.975).abs() < 1e-3, "{}", phi(1.96));
+    }
+
+    #[test]
+    fn test_erf_known_values() {
+        assert!(close(erf(0.0), 0.0), "{}", erf(0.0));
+        // erf(1) ~= 0.8427007929 (A&S 7.1.26 max error ~1.5e-7).
+        assert!((erf(1.0) - 0.842_700_792_9).abs() < 1e-6, "{}", erf(1.0));
+        // erf is odd.
+        assert!(close(erf(-0.5), -erf(0.5)), "erf not odd");
+    }
+
+    // --- No NaN: every primitive on empty / degenerate input is finite ---
+
+    #[test]
+    fn test_no_nan_on_degenerate_input() {
+        assert!(geomean(&[]).is_finite());
+        assert!(geomean(&[3.0]).is_finite());
+        assert!(percentile(&[], 0.5).is_finite());
+        assert!(percentile(&[1.0], 0.5).is_finite());
+        assert!(median(&[]).is_finite());
+        let r0 = mann_whitney_u(&[], &[]);
+        assert!(r0.u.is_finite() && r0.u1.is_finite() && r0.u2.is_finite());
+        assert!(r0.p_value.is_finite());
+        let r1 = mann_whitney_u(&[1.0, 1.0], &[1.0, 1.0]);
+        assert!(r1.p_value.is_finite());
+        assert!(phi(0.0).is_finite());
+        assert!(erf(0.0).is_finite());
+    }
+
+    // --- property tests for the statistics invariants ---
+
+    proptest! {
+        #![proptest_config(ProptestConfig::with_cases(128))]
+
+        /// The geometric mean is a function of the multiset of values: it is
+        /// invariant under any permutation of the input (multiplication is
+        /// commutative).
+        #[test]
+        fn prop_geomean_permutation_invariant(
+            mut vals in prop::collection::vec(0.01f64..1000.0, 1..=12),
+            seed in any::<u64>(),
+        ) {
+            let base = geomean(&vals);
+            // Deterministic Fisher-Yates shuffle driven by `seed` so the
+            // property is a pure rearrangement of the same multiset.
+            let mut state = seed | 1;
+            for i in (1..vals.len()).rev() {
+                state = state.wrapping_mul(6364136223846793005).wrapping_add(1);
+                let j = (state >> 33) as usize % (i + 1);
+                vals.swap(i, j);
+            }
+            let shuffled = geomean(&vals);
+            // Relative tolerance: ln/exp accumulates rounding across orderings.
+            prop_assert!(
+                (base - shuffled).abs() <= 1e-9 * base.abs().max(1.0),
+                "geomean changed under permutation: {} vs {}",
+                base,
+                shuffled
+            );
+        }
+
+        /// `percentile` is bounded by the sample's min and max and is monotone
+        /// non-decreasing in `p`. Both are core type-7 invariants, and every
+        /// result must be finite.
+        #[test]
+        fn prop_percentile_bounded_and_monotone(
+            vals in prop::collection::vec(-500.0f64..500.0, 1..=20),
+            p_lo in 0.0f64..=1.0,
+            delta in 0.0f64..=1.0,
+        ) {
+            let p_hi = (p_lo + delta).min(1.0);
+            let min = vals.iter().cloned().fold(f64::INFINITY, f64::min);
+            let max = vals.iter().cloned().fold(f64::NEG_INFINITY, f64::max);
+            let q_lo = percentile(&vals, p_lo);
+            let q_hi = percentile(&vals, p_hi);
+            prop_assert!(q_lo.is_finite() && q_hi.is_finite());
+            // Bounded by the data range (small slack for interpolation rounding).
+            prop_assert!(q_lo >= min - 1e-9 && q_lo <= max + 1e-9, "{} not in [{},{}]", q_lo, min, max);
+            // Monotone non-decreasing in p.
+            prop_assert!(q_hi >= q_lo - 1e-9, "percentile not monotone: {} < {}", q_hi, q_lo);
+        }
+
+        /// The partition identity `u1 + u2 == n1 * n2` holds for ANY pair of
+        /// non-empty samples, and the reported `u` is the smaller of the two.
+        /// The two-sided p-value is always a finite probability in [0, 1].
+        #[test]
+        fn prop_mann_whitney_partition_identity(
+            a in prop::collection::vec(-50.0f64..50.0, 1..=15),
+            b in prop::collection::vec(-50.0f64..50.0, 1..=15),
+        ) {
+            let r = mann_whitney_u(&a, &b);
+            let n1n2 = (a.len() * b.len()) as f64;
+            prop_assert!(
+                (r.u1 + r.u2 - n1n2).abs() < 1e-9,
+                "u1 {} + u2 {} != n1*n2 {}",
+                r.u1, r.u2, n1n2
+            );
+            prop_assert!((r.u - r.u1.min(r.u2)).abs() < 1e-9);
+            prop_assert!(r.p_value.is_finite() && (0.0..=1.0).contains(&r.p_value));
+        }
+    }
+
+    // --- Task 2: ModelStats / CorpusReport constructors ---
+
+    /// A `LayoutMetrics` whose `node_overlap` carries `cost` and every other
+    /// term is zero, so `weighted_cost` under weights with a `node_overlap`
+    /// weight of `1.0` returns exactly `cost`. Keeps the test fixtures readable
+    /// while still exercising the real struct.
+    fn metrics_with_cost(cost: f64) -> LayoutMetrics {
+        LayoutMetrics {
+            node_overlap: cost,
+            node_connector_overlap: 0.0,
+            label_overlap: 0.0,
+            crossings: 0.0,
+            sprawl: 0.0,
+            edge_length_cv: 0.0,
+            aspect_penalty: 0.0,
+            chain_straightness: 0.0,
+            loop_compactness: 0.0,
+        }
+    }
+
+    fn sample(seed: u64, cost: f64) -> MetricSample {
+        MetricSample {
+            seed,
+            metrics: metrics_with_cost(cost),
+            weighted_cost: cost,
+        }
+    }
+
+    #[test]
+    fn test_from_samples_known_set() {
+        // Five seeds with hand-checkable costs.
+        //   seed 1 -> 10, seed 2 -> 30, seed 3 -> 20, seed 4 -> 50, seed 5 -> 40
+        // Sorted costs: [10, 20, 30, 40, 50].
+        //   median (type-7, p=0.5) = 30
+        //   p25 = 20, p75 = 40
+        //   global min cost = 10 (seed 1), max cost = 50 (seed 4)
+        //   median-nearest cost = 30 (seed 2)
+        let samples = vec![
+            sample(1, 10.0),
+            sample(2, 30.0),
+            sample(3, 20.0),
+            sample(4, 50.0),
+            sample(5, 40.0),
+        ];
+        // Production seeds: 3 and 5 (costs 20 and 40). Min over them is 20, which
+        // is NOT the global min (10, seed 1). This is the "best-of-k differs from
+        // the global min" case.
+        let production_seeds = [3u64, 5u64];
+        let stats = ModelStats::from_samples("m".to_string(), samples, &production_seeds);
+
+        assert_eq!(stats.model, "m");
+        assert_eq!(stats.median_cost, 30.0);
+        assert_eq!(stats.spread, (20.0, 40.0));
+        assert_eq!(
+            stats.best_of_k_cost, 20.0,
+            "best-of-k must use production seeds"
+        );
+        assert_eq!(stats.best_seed, 1, "global min cost is seed 1");
+        assert_eq!(stats.worst_seed, 4, "global max cost is seed 4");
+        assert_eq!(stats.median_seed, 2, "median-nearest cost is seed 2");
+    }
+
+    #[test]
+    fn test_from_samples_best_of_k_falls_back_to_global_min() {
+        // No production seed was sampled -> best_of_k_cost falls back to global
+        // min weighted_cost.
+        let samples = vec![sample(1, 10.0), sample(2, 30.0), sample(3, 20.0)];
+        let production_seeds = [100u64, 200u64];
+        let stats = ModelStats::from_samples("m".to_string(), samples, &production_seeds);
+        assert_eq!(
+            stats.best_of_k_cost, 10.0,
+            "no production seed sampled -> global min"
+        );
+    }
+
+    #[test]
+    fn test_from_samples_median_seed_tie_break_lowest() {
+        // Two seeds equidistant from the median cost: the lower seed wins.
+        //   seeds 5, 9 with costs 10 and 30; sorted costs [10, 30] -> median 20.
+        //   |10 - 20| == |30 - 20| == 10, a tie. Lowest seed (5) must win.
+        let samples = vec![sample(9, 30.0), sample(5, 10.0)];
+        let stats = ModelStats::from_samples("m".to_string(), samples, &[]);
+        assert_eq!(stats.median_cost, 20.0);
+        assert_eq!(stats.median_seed, 5, "tie must break on the lowest seed");
+    }
+
+    #[test]
+    fn test_from_samples_worst_seed_tie_break_lowest() {
+        // Two seeds SHARE the maximum cost; the lower seed must win. The third,
+        // lower-cost sample makes this a tie at the maximum rather than a
+        // sample where every value is equal. Seeds 7 and 4 both cost 50 (the
+        // max); seed 2 costs 10. worst_seed must be 4 (the lower tied seed), NOT 7.
+        // This fails if the tie-break direction in from_samples were reversed
+        // (a `.then(x.seed.cmp(&y.seed))` after max_by would pick 7).
+        let samples = vec![sample(7, 50.0), sample(2, 10.0), sample(4, 50.0)];
+        let stats = ModelStats::from_samples("m".to_string(), samples, &[]);
+        assert_eq!(
+            stats.worst_seed, 4,
+            "max-cost tie must break on the lowest seed"
+        );
+    }
+
+    #[test]
+    fn test_from_samples_empty_is_all_zero() {
+        let stats = ModelStats::from_samples("empty".to_string(), vec![], &[1, 2, 3]);
+        assert_eq!(stats.median_cost, 0.0);
+        assert_eq!(stats.spread, (0.0, 0.0));
+        assert_eq!(stats.best_of_k_cost, 0.0);
+        assert_eq!(stats.best_seed, 0);
+        assert_eq!(stats.median_seed, 0);
+        assert_eq!(stats.worst_seed, 0);
+        // Finite, no NaN.
+        assert!(stats.median_cost.is_finite());
+        assert!(stats.spread.0.is_finite() && stats.spread.1.is_finite());
+        assert!(stats.best_of_k_cost.is_finite());
+    }
+
+    fn model_stats_with_median(model: &str, median: f64) -> ModelStats {
+        // Build a one-sample model whose median equals `median`.
+        ModelStats::from_samples(model.to_string(), vec![sample(1, median)], &[1])
+    }
+
+    #[test]
+    fn test_from_model_stats_geomean_of_medians() {
+        // Three models with medians 2, 8, 32: geomean = cbrt(2*8*32) = cbrt(512) = 8.
+        let per_model = vec![
+            model_stats_with_median("a", 2.0),
+            model_stats_with_median("b", 8.0),
+            model_stats_with_median("c", 32.0),
+        ];
+        let medians: Vec<f64> = per_model.iter().map(|m| m.median_cost).collect();
+        let report = CorpusReport::from_model_stats(per_model);
+        assert!(
+            close(report.geomean_of_medians, geomean(&medians)),
+            "{} != {}",
+            report.geomean_of_medians,
+            geomean(&medians)
+        );
+        assert!(
+            close(report.geomean_of_medians, 8.0),
+            "{}",
+            report.geomean_of_medians
+        );
+    }
+
+    #[test]
+    fn test_from_model_stats_zero_median_does_not_zero_aggregate() {
+        // A model with median 0 must not collapse the corpus geomean to 0; the
+        // epsilon floor keeps it positive and finite.
+        let per_model = vec![
+            model_stats_with_median("a", 0.0),
+            model_stats_with_median("b", 10.0),
+            model_stats_with_median("c", 1000.0),
+        ];
+        let report = CorpusReport::from_model_stats(per_model);
+        assert!(
+            report.geomean_of_medians > 0.0,
+            "a single 0 median must not zero the aggregate: got {}",
+            report.geomean_of_medians
+        );
+        assert!(report.geomean_of_medians.is_finite());
+        // It must equal the geomean of the floored medians (to within EPS).
+        let floored = [GEOMEAN_FLOOR_EPSILON, 10.0, 1000.0];
+        assert!(
+            close(report.geomean_of_medians, geomean(&floored)),
+            "{} != {}",
+            report.geomean_of_medians,
+            geomean(&floored)
+        );
+    }
+
+    #[test]
+    fn test_from_model_stats_empty_corpus_is_zero() {
+        let report = CorpusReport::from_model_stats(vec![]);
+        assert_eq!(report.geomean_of_medians, 0.0);
+        assert!(report.geomean_of_medians.is_finite());
+    }
+
+    // --- Task 3: compare(baseline, candidate) ---
+
+    /// Build a `ModelStats` directly from a list of `(seed, cost)` pairs, with
+    /// no production seeds (best-of-k is irrelevant to the comparison tests).
+    fn model_stats_from_costs(model: &str, seed_costs: &[(u64, f64)]) -> ModelStats {
+        let samples: Vec<MetricSample> = seed_costs
+            .iter()
+            .map(|&(seed, cost)| sample(seed, cost))
+            .collect();
+        ModelStats::from_samples(model.to_string(), samples, &[])
+    }
+
+    #[test]
+    fn test_compare_identical_report_is_zero_and_nonsignificant() {
+        // AC4.5: comparing a report against itself must report no change and no
+        // significance, with p-values well above the significance threshold.
+        let report = CorpusReport::from_model_stats(vec![
+            model_stats_from_costs("a", &[(1, 10.0), (2, 20.0), (3, 30.0), (4, 40.0)]),
+            model_stats_from_costs("b", &[(1, 5.0), (2, 15.0), (3, 25.0), (4, 35.0)]),
+        ]);
+
+        let cmp = compare(&report, &report);
+
+        assert_eq!(cmp.per_model.len(), 2);
+        for m in &cmp.per_model {
+            assert_eq!(m.delta_ratio, 0.0, "model {} delta_ratio", m.model);
+            assert!(!m.significant, "model {} must not be significant", m.model);
+            // Identical seed samples ⇒ each value tied with its twin ⇒ non-significant.
+            assert!(
+                m.p_value > 0.5,
+                "model {} p_value {} should be non-significant",
+                m.model,
+                m.p_value
+            );
+        }
+        assert_eq!(cmp.aggregate_delta_ratio, 0.0);
+        assert!(!cmp.aggregate_significant);
+        assert!(
+            cmp.aggregate_p_value > 0.5,
+            "aggregate p_value {} should be non-significant",
+            cmp.aggregate_p_value
+        );
+    }
+
+    #[test]
+    fn test_compare_clear_improvement_is_negative_and_significant() {
+        // Candidate strictly below baseline with non-overlapping seed samples:
+        // the aggregate delta is negative and the per-model verdict is
+        // significant where the two samples completely separate.
+        let baseline = CorpusReport::from_model_stats(vec![
+            model_stats_from_costs(
+                "a",
+                &[(1, 100.0), (2, 110.0), (3, 120.0), (4, 130.0), (5, 140.0)],
+            ),
+            model_stats_from_costs(
+                "b",
+                &[(1, 200.0), (2, 210.0), (3, 220.0), (4, 230.0), (5, 240.0)],
+            ),
+        ]);
+        let candidate = CorpusReport::from_model_stats(vec![
+            model_stats_from_costs(
+                "a",
+                &[(1, 10.0), (2, 11.0), (3, 12.0), (4, 13.0), (5, 14.0)],
+            ),
+            model_stats_from_costs(
+                "b",
+                &[(1, 20.0), (2, 21.0), (3, 22.0), (4, 23.0), (5, 24.0)],
+            ),
+        ]);
+
+        let cmp = compare(&baseline, &candidate);
+
+        assert_eq!(cmp.per_model.len(), 2);
+        for m in &cmp.per_model {
+            assert!(
+                m.delta_ratio < 0.0,
+                "model {} delta_ratio {} should be negative",
+                m.model,
+                m.delta_ratio
+            );
+            assert!(
+                m.candidate_median < m.baseline_median,
+                "model {} candidate median {} should be below baseline {}",
+                m.model,
+                m.candidate_median,
+                m.baseline_median
+            );
+            assert!(
+                m.significant,
+                "model {} (completely separated samples) should be significant; p_value {}",
+                m.model, m.p_value
+            );
+        }
+        assert!(
+            cmp.aggregate_delta_ratio < 0.0,
+            "aggregate_delta_ratio {} should be negative",
+            cmp.aggregate_delta_ratio
+        );
+    }
+
+    #[test]
+    fn test_compare_only_matched_models_are_compared() {
+        // Models are matched by name; a model present in only one report is
+        // skipped. baseline has {a, b, only_baseline}; candidate has
+        // {a, b, only_candidate}. The matched set compared is {a, b}, in
+        // baseline order.
+        let baseline = CorpusReport::from_model_stats(vec![
+            model_stats_from_costs("only_baseline", &[(1, 1.0), (2, 2.0)]),
+            model_stats_from_costs("a", &[(1, 10.0), (2, 20.0), (3, 30.0)]),
+            model_stats_from_costs("b", &[(1, 100.0), (2, 200.0), (3, 300.0)]),
+        ]);
+        let candidate = CorpusReport::from_model_stats(vec![
+            model_stats_from_costs("b", &[(1, 100.0), (2, 200.0), (3, 300.0)]),
+            model_stats_from_costs("a", &[(1, 10.0), (2, 20.0), (3, 30.0)]),
+            model_stats_from_costs("only_candidate", &[(1, 9.0), (2, 8.0)]),
+        ]);
+
+        let cmp = compare(&baseline, &candidate);
+
+        // Exactly the two matched models, in baseline iteration order.
+        let names: Vec<&str> = cmp.per_model.iter().map(|m| m.model.as_str()).collect();
+        assert_eq!(
+            names,
+            vec!["a", "b"],
+            "only matched models, in baseline order"
+        );
+        // The unmatched names appear nowhere.
+        assert!(!names.contains(&"only_baseline"));
+        assert!(!names.contains(&"only_candidate"));
+    }
+
+    #[test]
+    fn test_compare_zero_baseline_median_no_divide_by_zero() {
+        // No NaN: a model whose baseline median is 0 yields delta_ratio == 0.0
+        // (not inf/NaN) and every reported field stays finite.
+        let baseline = CorpusReport::from_model_stats(vec![model_stats_from_costs(
+            "z",
+            &[(1, 0.0), (2, 0.0), (3, 0.0)],
+        )]);
+        let candidate = CorpusReport::from_model_stats(vec![model_stats_from_costs(
+            "z",
+            &[(1, 5.0), (2, 6.0), (3, 7.0)],
+        )]);
+
+        let cmp = compare(&baseline, &candidate);
+
+        assert_eq!(cmp.per_model.len(), 1);
+        let m = &cmp.per_model[0];
+        assert_eq!(m.baseline_median, 0.0);
+        assert_eq!(
+            m.delta_ratio, 0.0,
+            "delta_ratio with a 0 baseline median must be 0.0, not inf/NaN"
+        );
+        assert!(m.delta_ratio.is_finite());
+        assert!(m.candidate_median.is_finite());
+        assert!(m.p_value.is_finite());
+        assert!(cmp.aggregate_delta_ratio.is_finite());
+        assert!(cmp.aggregate_p_value.is_finite());
+    }
+
+    #[test]
+    fn test_compare_empty_reports_are_finite_and_nonsignificant() {
+        // Degenerate input: two empty corpora compare to no per-model rows, a
+        // zero aggregate delta, and a finite non-significant verdict.
+        let empty = CorpusReport::from_model_stats(vec![]);
+        let cmp = compare(&empty, &empty);
+        assert!(cmp.per_model.is_empty());
+        assert_eq!(cmp.aggregate_delta_ratio, 0.0);
+        assert!(cmp.aggregate_delta_ratio.is_finite());
+        assert!(cmp.aggregate_p_value.is_finite());
+        assert!(!cmp.aggregate_significant);
+    }
+
+    #[test]
+    fn test_compare_no_matched_models_is_finite() {
+        // Reports with disjoint model names share no matched models: no
+        // per-model rows, a zero aggregate delta, and a finite verdict.
+        let baseline =
+            CorpusReport::from_model_stats(vec![model_stats_from_costs("a", &[(1, 10.0)])]);
+        let candidate =
+            CorpusReport::from_model_stats(vec![model_stats_from_costs("b", &[(1, 20.0)])]);
+        let cmp = compare(&baseline, &candidate);
+        assert!(cmp.per_model.is_empty());
+        assert_eq!(cmp.aggregate_delta_ratio, 0.0);
+        assert!(cmp.aggregate_delta_ratio.is_finite());
+        assert!(cmp.aggregate_p_value.is_finite());
+        assert!(!cmp.aggregate_significant);
+    }
+
+    #[test]
+    fn test_compare_significance_alpha_is_five_percent() {
+        // The exported significance threshold is the conventional 0.05.
+        assert_eq!(SIGNIFICANCE_ALPHA, 0.05);
+    }
+}
diff --git a/src/simlin-engine/src/layout/layout_selection_tests.rs b/src/simlin-engine/src/layout/layout_selection_tests.rs
new file mode 100644
index 000000000..e7e832e55
--- /dev/null
+++ b/src/simlin-engine/src/layout/layout_selection_tests.rs
@@ -0,0 +1,502 @@
+// Copyright 2026 The Simlin Authors. All rights reserved.
+// Use of this source code is governed by the Apache License,
+// Version 2.0, that can be found in the LICENSE file.
+
+//! Rung-0 layout-selection and regression-guard tests (Phase 5 of the layout
+//! quality eval): `select_best_layout` picks the lowest `weighted_cost`
+//! candidate (even when that means *more* connector crossings than a rival),
+//! the deterministic per-model `weighted_cost` ceiling guards against quality
+//! regressions, and a fixed seed reproduces a byte-identical layout. Split out
+//! of `layout_tests.rs` to keep that file under the per-file line cap, mirroring
+//! the `crossings_tests.rs` precedent.
+
+use super::*;
+use crate::datamodel;
+use crate::layout::metrics::{MetricWeights, compute_layout_metrics};
+use crate::test_common::TestProject;
+
+/// `TestProject::build_datamodel` synthesizes a single model named `"main"`, so
+/// every `generate_layout_with_config` call in this file targets that name.
+const MAIN_MODEL: &str = "main";
+
+/// A scalar aux at (`x`, `y`) with a unique name, so a selected view can be
+/// identified by which marker element it carries.
+fn marker_aux(uid: i32, name: &str, x: f64, y: f64) -> ViewElement {
+    ViewElement::Aux(view_element::Aux {
+        name: name.to_string(),
+        uid,
+        x,
+        y,
+        label_side: LabelSide::Bottom,
+        compat: None,
+    })
+}
+
+fn sel_link(uid: i32, from_uid: i32, to_uid: i32) -> ViewElement {
+    ViewElement::Link(view_element::Link {
+        uid,
+        from_uid,
+        to_uid,
+        shape: LinkShape::Straight,
+        polarity: None,
+    })
+}
+
+/// Wrap a set of view elements into a `StockFlow` carrying `name` as its marker
+/// so `select_best_layout`'s winner is identifiable.
+fn sel_view(name: &str, elements: Vec<ViewElement>) -> datamodel::StockFlow {
+    datamodel::StockFlow {
+        name: Some(name.to_string()),
+        elements,
+        view_box: Rect {
+            x: 0.0,
+            y: 0.0,
+            width: 1000.0,
+            height: 1000.0,
+        },
+        zoom: 1.0,
+        use_lettered_polarity: false,
+        font: None,
+        sketch_compat: None,
+    }
+}
+
+/// A view whose two straight links cross exactly once (the diagonals of a
+/// square): `count_view_crossings == 1`.
+fn crossing_view(name: &str) -> datamodel::StockFlow {
+    sel_view(
+        name,
+        vec![
+            marker_aux(1, "a1", 0.0, 0.0),
+            marker_aux(2, "a2", 100.0, 100.0),
+            marker_aux(3, "a3", 0.0, 100.0),
+            marker_aux(4, "a4", 100.0, 0.0),
+            sel_link(10, 1, 2),
+            sel_link(11, 3, 4),
+        ],
+    )
+}
+
+/// A view whose two straight links share an endpoint and never cross:
+/// `count_view_crossings == 0`.
+fn non_crossing_view(name: &str) -> datamodel::StockFlow {
+    sel_view(
+        name,
+        vec![
+            marker_aux(1, "a1", 50.0, 50.0),
+            marker_aux(2, "a2", 100.0, 0.0),
+            marker_aux(3, "a3", 100.0, 100.0),
+            sel_link(10, 1, 2),
+            sel_link(11, 1, 3),
+        ],
+    )
+}
+
+/// AC6.1: selection minimizes `weighted_cost`, not crossings. The lowest-cost
+/// candidate is deliberately built from a view with MORE connector crossings
+/// than a rival, so the old "fewest crossings" rule would have picked the other
+/// one. We assert the crossing inversion is real (via `count_view_crossings`),
+/// then assert `select_best_layout` returns the lowest-`weighted_cost` view.
+#[test]
+fn test_select_best_layout_minimizes_weighted_cost_over_crossings() {
+    let crossing = crossing_view("more_crossings_low_cost");
+    let non_crossing = non_crossing_view("fewer_crossings_high_cost");
+
+    // The inversion is genuine, not just narrative: the candidate we expect to
+    // win actually has strictly more crossings than the one we expect to lose.
+    let crossing_count = count_view_crossings(&crossing);
+    let non_crossing_count = count_view_crossings(&non_crossing);
+    assert_eq!(crossing_count, 1, "crossing view should have one crossing");
+    assert_eq!(
+        non_crossing_count, 0,
+        "non-crossing view should have zero crossings"
+    );
+    assert!(
+        crossing_count > non_crossing_count,
+        "the low-cost candidate must have more crossings than its rival, \
+         so the choice differs from the old crossings-only rule"
+    );
+
+    // Hand-set costs so the MORE-crossings view is the cheaper one. Under the
+    // retired crossings-only rule `fewer_crossings_high_cost` (0 crossings)
+    // would win; under Rung 0 the lower `weighted_cost` wins.
+    let results = vec![
+        Ok(LayoutResult {
+            view: crossing,
+            weighted_cost: 1.0,
+            seed: 42,
+        }),
+        Ok(LayoutResult {
+            view: non_crossing,
+            weighted_cost: 5.0,
+            seed: 123,
+        }),
+    ];
+
+    let best = select_best_layout(results).expect("selection should succeed");
+    assert_eq!(
+        best.name.as_deref(),
+        Some("more_crossings_low_cost"),
+        "the lowest-weighted_cost candidate must win even with more crossings"
+    );
+}
+
+/// AC6.1 (tie-break): on equal `weighted_cost`, the lower seed wins. This is the
+/// same rule `test_select_best_layout_lowest_seed_on_tie` (in `layout_tests.rs`)
+/// pins on hand-built `StockFlow` literals; here we re-assert it through the
+/// marker-named helpers for completeness alongside the cost-ordering case.
+#[test]
+fn test_select_best_layout_tie_breaks_on_lowest_seed() {
+    let results = vec![
+        Ok(LayoutResult {
+            view: sel_view("seed_456", vec![marker_aux(1, "a", 0.0, 0.0)]),
+            weighted_cost: 2.5,
+            seed: 456,
+        }),
+        Ok(LayoutResult {
+            view: sel_view("seed_42", vec![marker_aux(1, "a", 0.0, 0.0)]),
+            weighted_cost: 2.5,
+            seed: 42,
+        }),
+        Ok(LayoutResult {
+            view: sel_view("seed_789", vec![marker_aux(1, "a", 0.0, 0.0)]),
+            weighted_cost: 2.5,
+            seed: 789,
+        }),
+    ];
+
+    let best = select_best_layout(results).expect("selection should succeed");
+    assert_eq!(
+        best.name.as_deref(),
+        Some("seed_42"),
+        "on a weighted_cost tie the lowest seed wins"
+    );
+}
+
+/// AC6.1 (NaN safety): a NaN-cost challenger must never displace a finite
+/// running best. `select_best_layout` keeps the running best whenever the
+/// challenger's `<` comparison is false, and `challenger < finite` is always
+/// false for a NaN challenger -- so a degenerate NaN-cost candidate encountered
+/// after a finite one cannot win.
+#[test]
+fn test_select_best_layout_nan_challenger_never_displaces_finite() {
+    // Finite candidate first, then NaN: the NaN must not displace it.
+    let finite_first = vec![
+        Ok(LayoutResult {
+            view: sel_view("finite", vec![marker_aux(1, "a", 0.0, 0.0)]),
+            weighted_cost: 4.0,
+            seed: 42,
+        }),
+        Ok(LayoutResult {
+            view: sel_view("nan", vec![marker_aux(1, "a", 0.0, 0.0)]),
+            weighted_cost: f64::NAN,
+            seed: 123,
+        }),
+    ];
+    let best = select_best_layout(finite_first).expect("selection should succeed");
+    assert_eq!(
+        best.name.as_deref(),
+        Some("finite"),
+        "a NaN-cost challenger must not displace a finite running best"
+    );
+
+    // A NaN that arrives last among several finite candidates still loses: the
+    // finite minimum is already the running best by the time NaN is compared.
+    let nan_last = vec![
+        Ok(LayoutResult {
+            view: sel_view("hi", vec![marker_aux(1, "a", 0.0, 0.0)]),
+            weighted_cost: 9.0,
+            seed: 42,
+        }),
+        Ok(LayoutResult {
+            view: sel_view("lo", vec![marker_aux(1, "a", 0.0, 0.0)]),
+            weighted_cost: 1.0,
+            seed: 123,
+        }),
+        Ok(LayoutResult {
+            view: sel_view("nan", vec![marker_aux(1, "a", 0.0, 0.0)]),
+            weighted_cost: f64::NAN,
+            seed: 456,
+        }),
+    ];
+    let best = select_best_layout(nan_last).expect("selection should succeed");
+    assert_eq!(
+        best.name.as_deref(),
+        Some("lo"),
+        "the finite minimum wins; a trailing NaN candidate cannot displace it"
+    );
+}
+
+/// AC6.1 (NaN safety, order-independent): a finite challenger must beat a NaN
+/// running best regardless of position. The fold seeds the running best with the
+/// FIRST result, so a degenerate NaN-cost layout from the first seed could
+/// otherwise become a sticky running best (`finite < NaN` is false and `finite
+/// == NaN` is false, so a plain `<` comparison never overtakes it). The fold
+/// special-cases a NaN running best so a later finite candidate always wins. In
+/// production (`generate_best_layout` runs seeds in the fixed order [42, 123,
+/// 456, 789]), this guarantees a usable finite layout is shipped whenever ANY
+/// seed produced one, no matter which seed degenerated.
+#[test]
+fn test_select_best_layout_finite_beats_nan_running_best() {
+    let nan_first = vec![
+        Ok(LayoutResult {
+            view: sel_view("nan", vec![marker_aux(1, "a", 0.0, 0.0)]),
+            weighted_cost: f64::NAN,
+            seed: 42,
+        }),
+        Ok(LayoutResult {
+            view: sel_view("finite", vec![marker_aux(1, "a", 0.0, 0.0)]),
+            weighted_cost: 4.0,
+            seed: 123,
+        }),
+    ];
+    let best = select_best_layout(nan_first).expect("selection should succeed");
+    assert_eq!(
+        best.name.as_deref(),
+        Some("finite"),
+        "a finite challenger must beat a NaN running best regardless of order"
+    );
+}
+
+/// AC6.1 (NaN safety, all-NaN determinism): when EVERY candidate has a NaN cost,
+/// neither the `<` comparison nor the NaN special cases fire (a NaN challenger is
+/// never "better"), so the earliest candidate is kept. This is deterministic
+/// regardless of seed order -- the production caller would ship the first seed's
+/// (degenerate) layout, but the choice is reproducible rather than arbitrary.
+#[test]
+fn test_select_best_layout_all_nan_keeps_earliest() {
+    let all_nan = vec![
+        Ok(LayoutResult {
+            view: sel_view("first", vec![marker_aux(1, "a", 0.0, 0.0)]),
+            weighted_cost: f64::NAN,
+            seed: 456,
+        }),
+        Ok(LayoutResult {
+            view: sel_view("second", vec![marker_aux(1, "a", 0.0, 0.0)]),
+            weighted_cost: f64::NAN,
+            seed: 42,
+        }),
+    ];
+    let best = select_best_layout(all_nan).expect("selection should succeed");
+    assert_eq!(
+        best.name.as_deref(),
+        Some("first"),
+        "when all candidates are NaN the earliest is kept deterministically"
+    );
+}
+
+// ---- AC7: deterministic weighted_cost regression guard ----
+//
+// The thresholds below are observed-cost CEILINGS captured at the fixed
+// annealing seed 42 with the calibrated `MetricWeights::default()`. They guard
+// against layout-quality regressions: if a change to the layout algorithm,
+// metric, or weights pushes a tiny model's fixed-seed `weighted_cost` above its
+// ceiling, this test fails loudly. Each ceiling sits a small margin above the
+// observed cost (roughly 10-15% above it, or a small absolute floor when the
+// observed cost is 0) -- tight enough to catch a real regression, loose enough
+// not to flake on float noise.
+//
+// To regenerate after an INTENTIONAL metric/weight change: layout is
+// deterministic per seed, so print the new `weighted_cost` for each guard model
+// (e.g. add a temporary `println!` to `guard_fixed_seed_cost`), run this test
+// once, and reset each ceiling a small margin above the new observed value.
+// Lowering a ceiling that no longer matches reality is fine; raising one to
+// paper over a real regression is not.
+//
+// Observed at seed 42 (2026-05-23): pop = 0.0533, chain = 0.0,
+// two_stock = 0.1646.
+const GUARD_POP_COST_CEILING: f64 = 0.06;
+const GUARD_CHAIN_COST_CEILING: f64 = 0.05;
+const GUARD_TWO_STOCK_COST_CEILING: f64 = 0.19;
+
+/// Lay `project`'s `main` model out at the fixed seed 42 and return its
+/// calibrated `weighted_cost`. Seeding explicitly (rather than relying on the
+/// `LayoutConfig::default()` seed) keeps the guard pinned to one reproducible
+/// layout even if the default seed changes.
+fn guard_fixed_seed_cost(project: &datamodel::Project) -> f64 {
+    let config = LayoutConfig {
+        annealing_random_seed: 42,
+        ..LayoutConfig::default()
+    };
+    let view = generate_layout_with_config(project, MAIN_MODEL, config.clone(), None)
+        .expect("layout generation should succeed");
+    compute_layout_metrics(&view, &config).weighted_cost(&MetricWeights::default())
+}
+
+/// A population stock with births/deaths flows and two rate auxes -- the
+/// canonical tiny feedback model.
+fn guard_pop_model() -> datamodel::Project {
+    TestProject::new("guard_pop")
+        .stock("population", "100", &["births"], &["deaths"], None)
+        .flow("births", "population * birth_rate", None)
+        .flow("deaths", "population * death_rate", None)
+        .aux("birth_rate", "0.03", None)
+        .aux("death_rate", "0.01", None)
+        .build_datamodel()
+}
+
+/// A pure auxiliary dependency chain (no stocks): a -> b -> c -> d.
+fn guard_chain_model() -> datamodel::Project {
+    TestProject::new("guard_chain")
+        .aux("a", "1", None)
+        .aux("b", "a * 2", None)
+        .aux("c", "b + a", None)
+        .aux("d", "c * b", None)
+        .build_datamodel()
+}
+
+/// A two-stock transfer model: source -> transfer -> sink, rate-driven.
+fn guard_two_stock_model() -> datamodel::Project {
+    TestProject::new("guard_two_stock")
+        .stock("source", "100", &[], &["transfer"], None)
+        .stock("sink", "0", &["transfer"], &[], None)
+        .flow("transfer", "source * rate", None)
+        .aux("rate", "0.1", None)
+        .build_datamodel()
+}
+
+/// AC7.1: the fixed-seed `weighted_cost` of each tiny guard model stays at or
+/// below its committed ceiling. Fast and deterministic: three tiny models, one
+/// seed each.
+#[test]
+fn test_weighted_cost_regression_guard() {
+    let cases: [(&str, datamodel::Project, f64); 3] = [
+        ("pop", guard_pop_model(), GUARD_POP_COST_CEILING),
+        ("chain", guard_chain_model(), GUARD_CHAIN_COST_CEILING),
+        (
+            "two_stock",
+            guard_two_stock_model(),
+            GUARD_TWO_STOCK_COST_CEILING,
+        ),
+    ];
+
+    for (name, project, ceiling) in cases {
+        let cost = guard_fixed_seed_cost(&project);
+        assert!(
+            cost <= ceiling,
+            "{name}: fixed-seed weighted_cost {cost} exceeded ceiling {ceiling} \
+             -- a layout-quality regression (or an intentional metric/weight \
+             change that needs the ceiling regenerated)"
+        );
+    }
+}
+
+/// AC7.2: the guard ceiling actually discriminates good layouts from bad ones.
+/// We take a real fixed-seed layout of the pop model and pile every node onto
+/// the same coordinate, blowing up the node-overlap term, then assert the
+/// resulting `weighted_cost` exceeds the ceiling -- so a real layout that
+/// regressed to this level WOULD trip `test_weighted_cost_regression_guard`.
+/// This makes the failure direction explicit and testable without flakiness.
+#[test]
+fn test_weighted_cost_guard_rejects_degenerate_layout() {
+    let project = guard_pop_model();
+    let config = LayoutConfig {
+        annealing_random_seed: 42,
+        ..LayoutConfig::default()
+    };
+    let view = generate_layout_with_config(&project, MAIN_MODEL, config.clone(), None)
+        .expect("layout generation should succeed");
+
+    // Collapse every positioned node onto the origin so the shapes overlap
+    // maximally (links have no position; aliases/groups are not scored as nodes).
+    let mut degenerate = view.clone();
+    for elem in &mut degenerate.elements {
+        match elem {
+            ViewElement::Aux(a) => {
+                a.x = 0.0;
+                a.y = 0.0;
+            }
+            ViewElement::Stock(s) => {
+                s.x = 0.0;
+                s.y = 0.0;
+            }
+            ViewElement::Flow(f) => {
+                f.x = 0.0;
+                f.y = 0.0;
+            }
+            ViewElement::Module(m) => {
+                m.x = 0.0;
+                m.y = 0.0;
+            }
+            ViewElement::Cloud(c) => {
+                c.x = 0.0;
+                c.y = 0.0;
+            }
+            ViewElement::Link(_) | ViewElement::Alias(_) | ViewElement::Group(_) => {}
+        }
+    }
+
+    let degenerate_cost =
+        compute_layout_metrics(&degenerate, &config).weighted_cost(&MetricWeights::default());
+    assert!(
+        degenerate_cost > GUARD_POP_COST_CEILING,
+        "a degenerate all-overlapping layout (cost {degenerate_cost}) must exceed \
+         the guard ceiling {GUARD_POP_COST_CEILING}, proving the guard discriminates"
+    );
+}
+
+/// A model with enough nodes (a stock fed/drained by ten leaf auxes through two
+/// flows) that the SFDP/annealing RNG genuinely shapes the layout, so two
+/// different seeds produce two different layouts. The tiny guard models above
+/// converge to one arrangement regardless of seed, which would make a
+/// determinism check vacuous; this model exercises the seeded path.
+fn guard_seed_sensitive_model() -> datamodel::Project {
+    let mut tp = TestProject::new("guard_seed_sensitive")
+        .stock("s", "100", &["inflow"], &["outflow"], None)
+        .flow("inflow", "a1 + a2 + a3 + a4 + a5", None)
+        .flow("outflow", "b1 + b2 + b3 + b4 + b5", None);
+    for i in 1..=5 {
+        tp = tp.aux(&format!("a{i}"), "1", None);
+        tp = tp.aux(&format!("b{i}"), "1", None);
+    }
+    tp.build_datamodel()
+}
+
+/// Lay `project`'s `main` model out at `seed`.
+fn layout_at_seed(project: &datamodel::Project, seed: u64) -> datamodel::StockFlow {
+    let config = LayoutConfig {
+        annealing_random_seed: seed,
+        ..LayoutConfig::default()
+    };
+    generate_layout_with_config(project, MAIN_MODEL, config, None)
+        .expect("layout generation should succeed")
+}
+
+/// AC8.1: a fixed seed reproduces a byte-identical layout. Generating the same
+/// model twice through `generate_layout_with_config` at the same explicit seed
+/// must yield two `StockFlow` values that compare equal (`StockFlow` derives
+/// `PartialEq`, so this checks every field -- positions, view box, element
+/// order -- not just element counts).
+///
+/// We use a seed-sensitive model and also assert that a DIFFERENT seed yields a
+/// DIFFERENT layout, so the same-seed equality is a real determinism guarantee
+/// rather than a vacuous pass on a model whose layout ignores the seed.
+///
+/// This per-seed reproducibility is distinct from the Phase 3 M-seed
+/// statistical sweep, which deliberately VARIES the seed to sample the layout
+/// distribution. Here the seed is held fixed and the layout must be exactly
+/// repeatable; there the seed sweeps and the layouts are expected to differ.
+/// The integration test `tests/layout.rs` already asserts `view1 == view2` for
+/// `generate_layout`; this focused in-crate test covers the
+/// `generate_layout_with_config` + explicit-seed Rung-0 path.
+#[test]
+fn test_layout_is_byte_identical_for_fixed_seed() {
+    let project = guard_seed_sensitive_model();
+
+    let view1 = layout_at_seed(&project, 7);
+    let view2 = layout_at_seed(&project, 7);
+    assert_eq!(
+        view1, view2,
+        "the same model at the same fixed seed must produce a byte-identical layout"
+    );
+
+    // Non-vacuity: a different seed must produce a different layout, proving the
+    // equality above reflects genuine per-seed determinism (not a seed-agnostic
+    // model where any pair would compare equal).
+    let other = layout_at_seed(&project, 999);
+    assert_ne!(
+        view1, other,
+        "a different seed should produce a different layout, so the same-seed \
+         equality is a meaningful determinism guarantee"
+    );
+}
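The NaN-safe selection rule that the AC6.1 tests above pin down can be sketched as a plain fold. This is a hypothetical, simplified stand-in, not the crate's `select_best_layout`: `Candidate`, `beats`, and `pick_best` are illustrative names, and the sketch drops the `Result` wrapping and the `StockFlow` payload so only the ordering logic remains. It assumes the rules exactly as the tests state them: lower `weighted_cost` wins, the lower seed breaks an exact tie, a NaN challenger never wins, and a finite challenger always displaces a NaN running best.

```rust
/// Hypothetical stand-in for one scored layout candidate (not the crate's
/// `LayoutResult`; the view payload is reduced to a name).
#[derive(Debug, Clone, PartialEq)]
struct Candidate {
    name: &'static str,
    weighted_cost: f64,
    seed: u64,
}

/// Returns true when `challenger` should replace the running `best`.
fn beats(challenger: &Candidate, best: &Candidate) -> bool {
    match (challenger.weighted_cost.is_nan(), best.weighted_cost.is_nan()) {
        // A finite challenger always displaces a NaN running best.
        (false, true) => true,
        // A NaN challenger never displaces anything, including another NaN,
        // so an all-NaN input keeps the earliest candidate.
        (true, _) => false,
        // Both finite: lower cost wins; on an exact tie, the lower seed wins.
        (false, false) => {
            challenger.weighted_cost < best.weighted_cost
                || (challenger.weighted_cost == best.weighted_cost
                    && challenger.seed < best.seed)
        }
    }
}

/// Fold over the candidates, seeding the running best with the first one.
/// Returns `None` only for an empty input.
fn pick_best(candidates: Vec<Candidate>) -> Option<Candidate> {
    let mut iter = candidates.into_iter();
    let first = iter.next()?;
    Some(iter.fold(first, |best, c| if beats(&c, &best) { c } else { best }))
}
```

Folding from the first candidate rather than from an `f64::INFINITY` sentinel is what keeps the all-NaN case deterministic in this sketch: the earliest candidate survives because a NaN challenger never counts as better.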
diff --git a/src/simlin-engine/src/layout/layout_tests.rs b/src/simlin-engine/src/layout/layout_tests.rs
index 58fc8d1ec..77e82997b 100644
--- a/src/simlin-engine/src/layout/layout_tests.rs
+++ b/src/simlin-engine/src/layout/layout_tests.rs
@@ -528,70 +528,6 @@ fn test_extract_equation_deps_arrayed_uses_all_entries() {
     assert_eq!(deps, vec!["bar", "foo"]);
 }
 
-#[test]
-fn test_select_best_layout_fewest_crossings() {
-    let results = vec![
-        Ok(LayoutResult {
-            view: datamodel::StockFlow {
-                name: None,
-                elements: vec![ViewElement::Aux(view_element::Aux {
-                    name: "from_5_crossings".to_string(),
-                    uid: 1,
-                    x: 0.0,
-                    y: 0.0,
-                    label_side: LabelSide::Bottom,
-                    compat: None,
-                })],
-                view_box: Rect {
-                    x: 0.0,
-                    y: 0.0,
-                    width: 100.0,
-                    height: 100.0,
-                },
-                zoom: 1.0,
-                use_lettered_polarity: false,
-                font: None,
-                sketch_compat: None,
-            },
-            crossings: 5,
-            seed: 42,
-        }),
-        Ok(LayoutResult {
-            view: datamodel::StockFlow {
-                name: None,
-                elements: vec![ViewElement::Aux(view_element::Aux {
-                    name: "from_2_crossings".to_string(),
-                    uid: 2,
-                    x: 0.0,
-                    y: 0.0,
-                    label_side: LabelSide::Bottom,
-                    compat: None,
-                })],
-                view_box: Rect {
-                    x: 0.0,
-                    y: 0.0,
-                    width: 100.0,
-                    height: 100.0,
-                },
-                zoom: 1.0,
-                use_lettered_polarity: false,
-                font: None,
-                sketch_compat: None,
-            },
-            crossings: 2,
-            seed: 123,
-        }),
-    ];
-    let best = select_best_layout(results).unwrap();
-    // Should pick the one with 2 crossings (fewer is better)
-    assert_eq!(best.elements.len(), 1);
-    if let ViewElement::Aux(aux) = &best.elements[0] {
-        assert_eq!(aux.name, "from_2_crossings");
-    } else {
-        unreachable!("expected Aux element");
-    }
-}
-
 #[test]
 fn test_select_best_layout_lowest_seed_on_tie() {
     let results = vec![
@@ -617,7 +553,7 @@ fn test_select_best_layout_lowest_seed_on_tie() {
                 font: None,
                 sketch_compat: None,
             },
-            crossings: 3,
+            weighted_cost: 3.0,
             seed: 123,
         }),
         Ok(LayoutResult {
@@ -642,12 +578,13 @@ fn test_select_best_layout_lowest_seed_on_tie() {
                 font: None,
                 sketch_compat: None,
             },
-            crossings: 3,
+            weighted_cost: 3.0,
             seed: 42,
         }),
     ];
     let best = select_best_layout(results).unwrap();
-    // Should pick seed 42 (lower seed wins on tie)
+    // Equal weighted_cost on both: the lower seed wins the tie-break (still valid
+    // under the Rung-0 weighted_cost selection rule).
     assert_eq!(best.elements.len(), 1);
     if let ViewElement::Aux(aux) = &best.elements[0] {
         assert_eq!(aux.name, "from_seed_42");
diff --git a/src/simlin-engine/src/layout/metrics.rs b/src/simlin-engine/src/layout/metrics.rs
new file mode 100644
index 000000000..ff139faf3
--- /dev/null
+++ b/src/simlin-engine/src/layout/metrics.rs
@@ -0,0 +1,2469 @@
+// Copyright 2026 The Simlin Authors. All rights reserved.
+// Use of this source code is governed by the Apache License,
+// Version 2.0, that can be found in the LICENSE file.
+
+// pattern: Functional Core
+//
+// The layout quality core. Every term here is computed purely from a
+// `datamodel::StockFlow` (and the `LayoutConfig` parameter, kept for
+// forward-compatibility with the design's optimizer signature). All geometry
+// comes from the same `diagram` helpers the SVG renderer uses and from
+// `layout::build_view_segments`, so a layout's quality score can never disagree
+// with the geometry the renderer draws or with `count_view_crossings`.
+//
+// There is NO I/O in this module: it takes data, computes scalars, returns
+// them. That makes every term trivially testable with hand-computed expected
+// values (see the inline tests below).
+
+use std::collections::{BTreeMap, BTreeSet, HashSet};
+
+use crate::datamodel::{self, ViewElement};
+use crate::diagram::common::{
+    self, Point, Rect, display_name, merge_bounds, rect_area, rect_overlap_area,
+    segment_clip_interval_in_rect,
+};
+use crate::diagram::connector::{ARC_POLYLINE_SAMPLES, connector_polyline, get_visual_center};
+use crate::diagram::elements::{
+    aux_bounds, aux_shape_bounds, cloud_bounds, module_bounds, stock_bounds, stock_shape_bounds,
+};
+use crate::diagram::flow::{flow_bounds, flow_shape_bounds};
+use crate::diagram::label::{LabelProps, label_bounds};
+
+use super::annealing::count_crossings;
+use super::build_view_segments;
+use super::config::LayoutConfig;
+
+/// Upper bound of the target aspect-ratio band. A view whose bounding-box
+/// aspect ratio (long side / short side, always >= 1) is at or below this value
+/// is "well-proportioned" and incurs no `aspect_penalty`. 16:9 is a generous
+/// band that comfortably contains the conventional 4:3 diagram proportions
+/// while still penalizing pathologically thin (e.g. 1x10) layouts.
+pub const TARGET_AR_MAX: f64 = 16.0 / 9.0;
+
+/// One quality cost per aesthetic concern, with `0.0` always meaning "ideal".
+///
+/// Most terms are scale-free by construction (ratios of like quantities), so
+/// they are comparable across models of different absolute coordinate scale.
+/// Three terms are *intentionally* sensitive to the absolute coordinate scale
+/// relative to the universal fixed node-box size (`node_overlap`,
+/// `label_overlap`, `sprawl`): a model whose nodes are packed tightly against
+/// the fixed pixel size of a stock/aux box should score differently from one
+/// spread far apart, and that sensitivity is what makes those terms meaningful
+/// across models. See the AC1.8 scoping note in the Phase 1 plan.
+///
+/// `Serialize`/`Deserialize` let the layout-quality eval sweep
+/// (`examples/layout_eval.rs`) emit the per-term breakdown into its
+/// `metrics.json` artifact and round-trip the committed baseline report back
+/// from JSON for the baseline diff; the struct is pure data (every field a
+/// plain `f64`), so the derives carry no behavior.
+#[derive(Clone, Copy, Debug, PartialEq, serde::Serialize, serde::Deserialize)]
+pub struct LayoutMetrics {
+    /// Sum of pairwise node *shape*-box overlap area (label-free), normalized
+    /// by total shape-box area. Measures shapes overlapping shapes; label
+    /// collisions are charged by `label_overlap` instead.
+    pub node_overlap: f64,
+    /// Fraction of total connector length that passes through non-incident
+    /// node *shape* boxes (label-free). A connector under a node shape reads as
+    /// a false causal connection; a connector under only a label is not
+    /// charged here.
+    pub node_connector_overlap: f64,
+    /// Sum over labeled elements of each label's *obscured fraction*: the area
+    /// of the label box covered by any other label box or any other element's
+    /// bare shape box, capped at the label's own area and divided by it (so each
+/// term is in [0,1]). 0 = no label obscured. Computed per label so a small
+/// overlap registers at its true obscuration fraction rather than being diluted
+/// by the view's total label area.
+    pub label_overlap: f64,
+    /// Edge crossings normalized by connector count.
+    pub crossings: f64,
+    /// Mean connector length relative to the characteristic node size.
+    pub sprawl: f64,
+    /// Coefficient of variation (stddev/mean) of connector lengths.
+    pub edge_length_cv: f64,
+    /// How far the view bounding-box aspect ratio exceeds the target band.
+    pub aspect_penalty: f64,
+    /// Reserved; computed in a future rung. Always 0.0, weight 0.
+    pub chain_straightness: f64,
+    /// Mean isoperimetric penalty `1 - Q` over the view's feedback cycles
+    /// (`Q = 4*PI*Area / Perimeter^2` of each loop's node-center polygon,
+    /// clamped to [0,1]). 0.0 = clean, well-spread loops (circles); higher =
+    /// collapsed/collinear loops. 0.0 when the view has no cycle of >= 3 nodes.
+    /// Carries a low 0.25 weight in the calibrated `MetricWeights::default()`.
+    pub loop_compactness: f64,
+}
+
+/// Per-term weights for the scalar an optimizer minimizes.
+///
+/// `MetricWeights::default()` holds the calibrated production weights committed
+/// in Phase 4 (see the failure-mode rationale on the `Default` impl below).
+///
+/// `Serialize`/`Deserialize` let the layout-quality eval sweep
+/// (`examples/layout_eval.rs`) record the weight set it used in its
+/// `metrics.json` artifact and read it back when round-tripping the committed
+/// baseline report; the struct is pure data (every field a plain `f64`), so the
+/// derives carry no behavior.
+#[derive(Clone, Copy, Debug, PartialEq, serde::Serialize, serde::Deserialize)]
+pub struct MetricWeights {
+    pub node_overlap: f64,
+    pub node_connector_overlap: f64,
+    pub label_overlap: f64,
+    pub crossings: f64,
+    pub sprawl: f64,
+    pub edge_length_cv: f64,
+    pub aspect_penalty: f64,
+    pub chain_straightness: f64,
+    pub loop_compactness: f64,
+}
+
+impl Default for MetricWeights {
+    /// The calibrated production weights, from the Phase 3 contact-sheet
+    /// calibration with explicit user sign-off (2026-05-23).
+    ///
+    /// Failure-mode rationale -- readability >> compactness:
+    ///   * The dominant concerns all carry weight 1.0: node-shape overlap
+    ///     (`node_overlap`), connectors passing under node shapes
+    ///     (`node_connector_overlap`), obscured labels (`label_overlap`), and
+    ///     edge `crossings`. These are the things that make a diagram unreadable
+    ///     or assert false causal connections, so they dominate the cost.
+    ///   * `sprawl`, `edge_length_cv`, and `aspect_penalty` are intentionally
+    ///     0.0: compactness and aspect ratio are NOT goals. Spreading nodes out
+    ///     to keep labels legible and feedback loops visible is GOOD, not
+    ///     something to penalize, so these terms must not pull against
+    ///     readability.
+    ///   * `loop_compactness` is a low 0.25: it gently REWARDS drawing feedback
+    ///     loops as visible circles (a readability aid), but must never dominate
+    ///     the overlap/crossings family, so it stays well below 1.0.
+    ///   * `chain_straightness` stays 0.0: it is reserved (not yet computed), so
+    ///     it carries no weight.
+    fn default() -> Self {
+        MetricWeights {
+            node_overlap: 1.0,
+            node_connector_overlap: 1.0,
+            label_overlap: 1.0,
+            crossings: 1.0,
+            sprawl: 0.0,
+            edge_length_cv: 0.0,
+            aspect_penalty: 0.0,
+            chain_straightness: 0.0,
+            loop_compactness: 0.25,
+        }
+    }
+}
+
+impl LayoutMetrics {
+    /// Sigma w_i * term_i -- the scalar an optimizer minimizes.
+    pub fn weighted_cost(&self, w: &MetricWeights) -> f64 {
+        self.node_overlap * w.node_overlap
+            + self.node_connector_overlap * w.node_connector_overlap
+            + self.label_overlap * w.label_overlap
+            + self.crossings * w.crossings
+            + self.sprawl * w.sprawl
+            + self.edge_length_cv * w.edge_length_cv
+            + self.aspect_penalty * w.aspect_penalty
+            + self.chain_straightness * w.chain_straightness
+            + self.loop_compactness * w.loop_compactness
+    }
+}
+
+/// The drawn geometry of one connector (Link or Flow): its incident node uids
+/// (so node-connector-overlap can skip them) and the polyline the renderer
+/// draws. Built once and reused by every connector-derived term so they all see
+/// the same geometry.
+struct ConnectorGeometry {
+    /// Element uids the connector is attached to and must not be charged for
+    /// passing through (its own endpoints).
+    incident_uids: HashSet<i32>,
+    /// The drawn polyline. Always has at least two points (connectors that draw
+    /// nothing -- e.g. MultiPoint links -- are not collected at all).
+    polyline: Vec<Point>,
+    /// Total polyline length.
+    length: f64,
+}
+
+/// Total length of the UNION of parameter intervals `[t0, t1]` (each `t` in
+/// [0,1]), counting each covered sub-length once. Sorts by start, then
+/// sweep-merges, so overlapping or adjacent intervals collapse. The next
+/// interval merges when its start is `<=` the current end (no epsilon needed;
+/// equality counts as adjacency). Mutates `intervals` (sorts in place); empty
+/// input yields 0.0. The result is independent of input order. PURE.
+fn merged_interval_length(intervals: &mut [(f64, f64)]) -> f64 {
+    if intervals.is_empty() {
+        return 0.0;
+    }
+    intervals.sort_by(|a, b| a.0.partial_cmp(&b.0).unwrap_or(std::cmp::Ordering::Equal));
+    let mut total = 0.0;
+    let mut cur = intervals[0];
+    for &(t0, t1) in &intervals[1..] {
+        if t0 <= cur.1 {
+            // Overlapping or adjacent: extend the current run.
+            cur.1 = cur.1.max(t1);
+        } else {
+            total += cur.1 - cur.0;
+            cur = (t0, t1);
+        }
+    }
+    total += cur.1 - cur.0;
+    total
+}
+
+/// Polyline length: sum of segment lengths.
+fn polyline_length(points: &[Point]) -> f64 {
+    points
+        .windows(2)
+        .map(|w| {
+            let dx = w[1].x - w[0].x;
+            let dy = w[1].y - w[0].y;
+            (dx * dx + dy * dy).sqrt()
+        })
+        .sum()
+}
+
+/// Resolve the node box for an element that has one (everything except links,
+/// groups, and aliases -- aliases have no bounds helper and are excluded to
+/// match the renderer's `calc_view_box`).
+fn node_box(element: &ViewElement) -> Option<Rect> {
+    match element {
+        ViewElement::Aux(a) => Some(aux_bounds(a)),
+        ViewElement::Stock(s) => Some(stock_bounds(s)),
+        ViewElement::Module(m) => Some(module_bounds(m)),
+        ViewElement::Cloud(c) => Some(cloud_bounds(c)),
+        ViewElement::Flow(f) => Some(flow_bounds(f)),
+        ViewElement::Link(_) | ViewElement::Alias(_) | ViewElement::Group(_) => None,
+    }
+}
+
+/// The element's bare *shape* box, WITHOUT its own label, for the same set of
+/// elements as `node_box`. `aux_bounds`/`stock_bounds`/`flow_bounds` merge each
+/// element's own label into the returned box; the label-vs-node term of
+/// `label_overlap` must use the label-free shape so a label-vs-label overlap is
+/// not also charged via the other node's label-merged box (a double-count).
+/// `module_bounds`/`cloud_bounds` already exclude the label (modules render a
+/// label that their bounds omit; clouds render none), so they are their own
+/// shape box.
+fn node_shape_box(element: &ViewElement) -> Option<Rect> {
+    match element {
+        ViewElement::Aux(a) => Some(aux_shape_bounds(a)),
+        ViewElement::Stock(s) => Some(stock_shape_bounds(s)),
+        ViewElement::Module(m) => Some(module_bounds(m)),
+        ViewElement::Cloud(c) => Some(cloud_bounds(c)),
+        ViewElement::Flow(f) => Some(flow_shape_bounds(f)),
+        ViewElement::Link(_) | ViewElement::Alias(_) | ViewElement::Group(_) => None,
+    }
+}
+
+/// Build a `LabelProps` for a labeled element, matching the renderer's label
+/// geometry (center, label side, display name, and the element's radii). Only
+/// elements that render a label return `Some`. The radii match the per-element
+/// `with_radii` calls in `diagram::elements`/`diagram::flow`.
+fn element_label_props(element: &ViewElement) -> Option<LabelProps> {
+    use crate::diagram::constants::{
+        AUX_RADIUS, FLOW_VALVE_RADIUS, MODULE_HEIGHT, MODULE_WIDTH, STOCK_HEIGHT, STOCK_WIDTH,
+    };
+    match element {
+        ViewElement::Aux(a) => Some(
+            LabelProps::new(a.x, a.y, a.label_side, display_name(&a.name))
+                .with_radii(AUX_RADIUS, AUX_RADIUS),
+        ),
+        ViewElement::Stock(s) => Some(
+            LabelProps::new(s.x, s.y, s.label_side, display_name(&s.name))
+                .with_radii(STOCK_WIDTH / 2.0, STOCK_HEIGHT / 2.0),
+        ),
+        ViewElement::Module(m) => Some(
+            LabelProps::new(m.x, m.y, m.label_side, display_name(&m.name))
+                .with_radii(MODULE_WIDTH / 2.0, MODULE_HEIGHT / 2.0),
+        ),
+        ViewElement::Flow(f) => Some(
+            LabelProps::new(f.x, f.y, f.label_side, display_name(&f.name))
+                .with_radii(FLOW_VALVE_RADIUS, FLOW_VALVE_RADIUS),
+        ),
+        // Aliases do render a label, but they have no `*_bounds` helper and are
+        // excluded from node bounds to match the renderer's view box; we keep
+        // the label-set consistent with the node-box set by also excluding
+        // their labels. Links/Clouds/Groups render no element label.
+        ViewElement::Alias(_)
+        | ViewElement::Link(_)
+        | ViewElement::Cloud(_)
+        | ViewElement::Group(_) => None,
+    }
+}
+
+/// Collect the drawn geometry of every connector (Link or Flow) that draws
+/// something. Links use the shared `connector_polyline` (the exact geometry the
+/// renderer draws and `build_view_segments` counts); flows use their point
+/// polyline. Connectors that draw nothing (MultiPoint links, degenerate arcs,
+/// flows with fewer than two points) and links whose `from_uid`/`to_uid` is not
+/// in the view are omitted entirely.
+fn collect_connector_geometry(view: &datamodel::StockFlow) -> Vec<ConnectorGeometry> {
+    let mut uid_elements = std::collections::HashMap::new();
+    for elem in &view.elements {
+        uid_elements.insert(elem.get_uid(), elem);
+    }
+    // Center-based, deterministic: nothing is treated as arrayed (matches
+    // `build_view_segments`).
+    let not_arrayed = |_: &str| false;
+
+    let mut out = Vec::new();
+    for elem in &view.elements {
+        match elem {
+            ViewElement::Link(link) => {
+                let (Some(&from), Some(&to)) = (
+                    uid_elements.get(&link.from_uid),
+                    uid_elements.get(&link.to_uid),
+                ) else {
+                    continue;
+                };
+                let polyline =
+                    connector_polyline(link, from, to, &not_arrayed, ARC_POLYLINE_SAMPLES);
+                if polyline.len() < 2 {
+                    continue;
+                }
+                let length = polyline_length(&polyline);
+                let mut incident_uids = HashSet::new();
+                incident_uids.insert(link.from_uid);
+                incident_uids.insert(link.to_uid);
+                out.push(ConnectorGeometry {
+                    incident_uids,
+                    polyline,
+                    length,
+                });
+            }
+            ViewElement::Flow(flow) => {
+                if flow.points.len() < 2 {
+                    continue;
+                }
+                let polyline: Vec<Point> = flow
+                    .points
+                    .iter()
+                    .map(|p| Point { x: p.x, y: p.y })
+                    .collect();
+                let length = polyline_length(&polyline);
+                // A flow is incident on its own valve plus any element its
+                // points attach to (the stock/cloud at each end).
+                let mut incident_uids = HashSet::new();
+                incident_uids.insert(flow.uid);
+                for p in &flow.points {
+                    if let Some(uid) = p.attached_to_uid {
+                        incident_uids.insert(uid);
+                    }
+                }
+                out.push(ConnectorGeometry {
+                    incident_uids,
+                    polyline,
+                    length,
+                });
+            }
+            _ => {}
+        }
+    }
+    out
+}
+
+// --- loop_compactness (isoperimetric feedback-loop quality) -----------------
+//
+// What it measures: how cleanly the view draws its feedback loops as visible
+// circles. For each simple directed cycle of >= 3 positioned nodes we take the
+// nodes' visual centers (see `LoopGraph`) in cycle order and form a polygon.
+// Its isoperimetric quotient Q = 4*PI*Area / Perimeter^2 is 1 for a perfect
+// circle and tends to 0 as the polygon collapses toward a line (the area
+// vanishes while the perimeter stays large). The per-cycle penalty is `1 - Q`
+// (0 = ideal clean loop, ~1 = squished/collinear), and `loop_compactness` is
+// the mean penalty over all qualifying cycles (0.0 when the view has no cycle
+// of >= 3 nodes). It thus REWARDS well-spread loops and PENALIZES collapsed
+// ones.
+//
+// Bounds: SD diagrams are small, so the search stays cheap and always
+// terminates. A simple cycle is enumerated only up to `MAX_CYCLE_LEN` nodes,
+// and enumeration stops once `MAX_CYCLES` cycles have been found (2-node
+// cycles count toward this cap even though they are later discarded as
+// non-polygons). The graph is built over positioned node-box elements
+// (aux/stock/flow/module/cloud, the same set as `node_box`); links and flows
+// supply the directed edges.
+//
+// Determinism: layout is deterministic per seed, but this term is additionally
+// independent of element ordering. Adjacency targets are sorted, the DFS starts
+// from each node in sorted uid order, and every enumerated cycle is canonicalized
+// (rotated so its smallest uid is first) and de-duplicated, so the mean is the
+// same regardless of how the elements are listed in the view.
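+//
+// Worked example (illustrative arithmetic, not taken from a test): among all
+// n-gons the regular one maximizes Q, and for it Q = PI / (n * tan(PI / n)).
+// So even a perfectly drawn loop has a penalty floor that depends on its node
+// count:
+//   n = 3 (equilateral triangle): Q = PI*sqrt(3)/9 ~= 0.605, penalty ~= 0.395
+//   n = 4 (square):               Q = PI/4         ~= 0.785, penalty ~= 0.215
+//   n = 6 (regular hexagon):      Q = PI*sqrt(3)/6 ~= 0.907, penalty ~= 0.093
+// Three collinear vertices enclose zero area, so Q = 0 and the penalty is 1.
+// Q is dimensionless, so uniformly scaling a loop leaves its penalty unchanged.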
+
+/// Maximum number of nodes in an enumerated simple cycle. SD feedback loops are
+/// short; a longer "cycle" is usually a composite threading through several
+/// overlapping smaller loops and is not worth the combinatorial cost.
+const MAX_CYCLE_LEN: usize = 12;
+
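+/// Maximum number of distinct simple cycles enumerated (2-node cycles included,
+/// though only cycles of >= 3 nodes are scored). Bounds the work on dense
+/// graphs: there, the mean penalty over the first `MAX_CYCLES` cycles found (in
+/// uid-sorted DFS order) stands in for the whole, an approximation SD diagrams
+/// rarely need since they seldom approach the cap.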
+const MAX_CYCLES: usize = 64;
+
+/// Directed adjacency over positioned node-box elements, keyed by uid with
+/// sorted successor lists. Each node's loop vertex is the renderer's VISUAL
+/// center (`diagram::connector::get_visual_center`) -- for a flow that is its
+/// VALVE `(flow.x, flow.y)`, NOT the pipe-extent center of `flow_shape_bounds`
+/// (which unions the valve box with every pipe point and so drifts off the valve
+/// when the pipe is bent or the valve is dragged off-center); for an
+/// aux/stock/module/cloud it is the element center, which already equals the
+/// symmetric shape-box midpoint. Using the same visual center the SVG renderer
+/// draws keeps the loop polygon faithful to the drawn diagram.
+struct LoopGraph {
+    /// uid -> sorted, de-duplicated successor uids.
+    adj: BTreeMap<i32, Vec<i32>>,
+    /// uid -> node visual-center point (the valve for flows; the element center
+    /// for aux/stock/module/cloud).
+    centers: BTreeMap<i32, Point>,
+}
+
+/// Build the directed loop graph from the view. Nodes are exactly the elements
+/// with a node box (`node_shape_box`: aux/stock/module/cloud/flow; links,
+/// aliases, and groups are excluded), and each node's loop vertex is the
+/// renderer's visual center (`get_visual_center`; the valve for a flow, see
+/// `LoopGraph`). Edges to/from uids that are not positioned nodes are dropped.
+/// Edges come from:
+///   * each Link: `from_uid -> to_uid`;
+///   * each Flow: for consecutive attached points, `source_attached -> flow.uid`
+///     and `flow.uid -> dest_attached`, so a stock--flow--stock feedback path is
+///     part of the graph (the flow's own valve is the intermediate node).
+fn build_loop_graph(view: &datamodel::StockFlow) -> LoopGraph {
+    // The node-membership gate stays `node_shape_box` (it defines which elements
+    // are loop nodes), but the loop VERTEX is the renderer's visual center, which
+    // is correct for every gated kind: the valve for a flow, the element center
+    // for aux/stock/module/cloud. `not_arrayed` matches `collect_connector_geometry`
+    // / `build_view_segments` (offset 0, deterministic).
+    let not_arrayed = |_: &str| false;
+    let mut centers: BTreeMap<i32, Point> = BTreeMap::new();
+    for e in &view.elements {
+        if node_shape_box(e).is_some() {
+            let (cx, cy) = get_visual_center(e, &not_arrayed);
+            centers.insert(e.get_uid(), Point { x: cx, y: cy });
+        }
+    }
+
+    // Collect edges into sorted sets per source so the adjacency is canonical
+    // (sorted, de-duplicated) and the cycle search is order-independent.
+    let mut edge_sets: BTreeMap<i32, BTreeSet<i32>> = BTreeMap::new();
+    let mut add_edge = |from: i32, to: i32, centers: &BTreeMap<i32, Point>| {
+        // Both endpoints must be positioned nodes, and we never record a
+        // self-loop (a single-node "cycle" forms no polygon).
+        if from != to && centers.contains_key(&from) && centers.contains_key(&to) {
+            edge_sets.entry(from).or_default().insert(to);
+        }
+    };
+
+    for e in &view.elements {
+        match e {
+            ViewElement::Link(link) => {
+                add_edge(link.from_uid, link.to_uid, &centers);
+            }
+            ViewElement::Flow(flow) => {
+                // Consecutive attached points define stock->flow and flow->stock
+                // edges through the flow's own valve uid.
+                let attached: Vec<i32> = flow
+                    .points
+                    .iter()
+                    .filter_map(|p| p.attached_to_uid)
+                    .collect();
+                for w in attached.windows(2) {
+                    add_edge(w[0], flow.uid, &centers);
+                    add_edge(flow.uid, w[1], &centers);
+                }
+            }
+            _ => {}
+        }
+    }
+
+    let adj: BTreeMap<i32, Vec<i32>> = edge_sets
+        .into_iter()
+        .map(|(k, set)| (k, set.into_iter().collect()))
+        .collect();
+    LoopGraph { adj, centers }
+}
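+
+// Worked example for `build_loop_graph` above (hypothetical uids, for
+// illustration only): stock 1 drains into stock 2 through flow 10 (flow points
+// attached to 1, then 2), and links 2 -> 3 (an aux) and 3 -> 10 feed stock 2
+// back into the flow rate. The flow yields edges 1 -> 10 and 10 -> 2, and the
+// links yield 2 -> 3 and 3 -> 10, so adj = {1: [10], 2: [3], 3: [10], 10: [2]}.
+// The only cycle is 2 -> 3 -> 10 -> 2, whose vertices are stock 2's center,
+// aux 3's center, and flow 10's valve.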
+
+/// Enumerate simple directed cycles (each >= 2 nodes), bounded by
+/// `MAX_CYCLE_LEN` and `MAX_CYCLES`, canonicalized and de-duplicated so the same
+/// directed cycle is returned exactly once regardless of where the search
+/// started. A bounded DFS suffices: SD diagrams are tiny, and the caps keep it
+/// O(small) on the rare dense graph.
+///
+/// Each returned cycle is a `Vec<i32>` of uids in traversal order, rotated so
+/// its smallest uid is first (canonical form), and the set of returned cycles is
+/// itself sorted for a fully deterministic result.
+fn enumerate_simple_cycles(graph: &LoopGraph) -> Vec<Vec<i32>> {
+    let mut found: BTreeSet<Vec<i32>> = BTreeSet::new();
+    // Start a DFS from each node in sorted uid order. Each search only extends
+    // through nodes greater than its start, so every cycle is found exactly
+    // once, from its smallest member; canonicalize+dedup is kept as a safety
+    // net.
+    let starts: Vec<i32> = graph.adj.keys().copied().collect();
+    let mut path: Vec<i32> = Vec::new();
+    let mut on_path: HashSet<i32> = HashSet::new();
+    for &start in &starts {
+        path.clear();
+        on_path.clear();
+        dfs_cycles(graph, start, start, &mut path, &mut on_path, &mut found);
+        if found.len() >= MAX_CYCLES {
+            break;
+        }
+    }
+    found.into_iter().take(MAX_CYCLES).collect()
+}
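+
+// Worked trace for `enumerate_simple_cycles` above (hypothetical graph): with
+// adj = {1: [2], 2: [1, 3], 3: [1]}, the DFS from 1 closes 1 -> 2 -> 1 and
+// 1 -> 2 -> 3 -> 1, returning [[1, 2], [1, 2, 3]]. The searches from 2 and 3
+// find nothing, because closing a cycle would require stepping to uid 1, which
+// is smaller than their start. `compute_loop_compactness` later discards
+// [1, 2] (only 2 distinct nodes), so only the triangle is scored, yet both
+// entries counted toward `MAX_CYCLES`.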
+
+/// Depth-first walk that records every simple cycle returning to `start` and
+/// composed only of nodes whose uid is >= `start` (so each cycle is discovered
+/// from its smallest member). `path`/`on_path` track the current simple path.
+fn dfs_cycles(
+    graph: &LoopGraph,
+    start: i32,
+    current: i32,
+    path: &mut Vec<i32>,
+    on_path: &mut HashSet<i32>,
+    found: &mut BTreeSet<Vec<i32>>,
+) {
+    if found.len() >= MAX_CYCLES {
+        return;
+    }
+    path.push(current);
+    on_path.insert(current);
+
+    if let Some(succs) = graph.adj.get(&current) {
+        for &next in succs {
+            if next == start {
+                // Closed a cycle back to the start. Record it (>= 2 nodes by
+                // construction; self-loops were never added as edges).
+                if path.len() >= 2 {
+                    found.insert(canonicalize_cycle(path));
+                    if found.len() >= MAX_CYCLES {
+                        break;
+                    }
+                }
+                continue;
+            }
+            // Only extend through nodes strictly greater than the start (so the
+            // start is the minimum), not already on the path, within the length
+            // cap.
+            if next > start && !on_path.contains(&next) && path.len() < MAX_CYCLE_LEN {
+                dfs_cycles(graph, start, next, path, on_path, found);
+                if found.len() >= MAX_CYCLES {
+                    break;
+                }
+            }
+        }
+    }
+
+    on_path.remove(&current);
+    path.pop();
+}
+
+/// Rotate a cycle so its smallest uid is first, preserving traversal direction.
+/// The DFS already guarantees the start (= minimum) is element 0, but rotating
+/// defensively keeps the canonical form correct for any caller.
+///
+/// Note: this canonicalizes rotation (start at min uid) but NOT traversal
+/// direction, so a directed cycle and its reverse canonicalize to distinct
+/// entries. A reverse-direction cycle exists only when every edge of the loop
+/// is also present in the opposite direction, which is rare for SD feedback
+/// loops. When it does occur, it yields the same isoperimetric penalty (the
+/// shoelace area magnitude and the perimeter in `cycle_penalty` are
+/// direction-invariant), so its only effect is to count that loop twice in the
+/// mean.
+fn canonicalize_cycle(cycle: &[i32]) -> Vec<i32> {
+    if cycle.is_empty() {
+        return Vec::new();
+    }
+    let min_idx = cycle
+        .iter()
+        .enumerate()
+        .min_by_key(|&(_, v)| *v)
+        .map(|(i, _)| i)
+        .unwrap_or(0);
+    let mut out = Vec::with_capacity(cycle.len());
+    for k in 0..cycle.len() {
+        out.push(cycle[(min_idx + k) % cycle.len()]);
+    }
+    out
+}
+
+/// Isoperimetric penalty `1 - Q` for one cycle's vertex centers
+/// (`LoopGraph::centers`), or `None` if the cycle does not qualify (fewer than
+/// 3 distinct positioned nodes, or a degenerate zero-perimeter polygon).
+/// `Q = 4*PI*Area / Perimeter^2` is clamped to [0, 1]; `Area` is the shoelace
+/// area (absolute value) and `Perimeter` the summed edge length over the closed
+/// polygon.
+fn cycle_penalty(cycle: &[i32], centers: &BTreeMap<i32, Point>) -> Option<f64> {
+    // Distinct positioned nodes only: a polygon needs >= 3 vertices.
+    let distinct: BTreeSet<i32> = cycle.iter().copied().collect();
+    if distinct.len() < 3 {
+        return None;
+    }
+    let pts: Vec<Point> = cycle
+        .iter()
+        .filter_map(|uid| centers.get(uid).copied())
+        .collect();
+    if pts.len() < 3 {
+        return None;
+    }
+
+    let n = pts.len();
+    let mut area2 = 0.0;
+    let mut perimeter = 0.0;
+    for i in 0..n {
+        let a = pts[i];
+        let b = pts[(i + 1) % n];
+        area2 += a.x * b.y - b.x * a.y;
+        let dx = b.x - a.x;
+        let dy = b.y - a.y;
+        perimeter += (dx * dx + dy * dy).sqrt();
+    }
+    if perimeter <= 0.0 {
+        // All centers coincide: no polygon. Guarded so the division below is
+        // never NaN; such a degenerate cycle simply does not contribute.
+        return None;
+    }
+    let area = area2.abs() / 2.0;
+    let q = (4.0 * std::f64::consts::PI * area / (perimeter * perimeter)).clamp(0.0, 1.0);
+    Some(1.0 - q)
+}
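+
+// Worked example for `cycle_penalty` above (illustrative arithmetic): vertices
+// (0, 0), (100, 0), (0, 100). The shoelace terms are 0*0 - 100*0 = 0,
+// 100*100 - 0*0 = 10000, and 0*0 - 0*100 = 0, so area2 = 10000 and Area = 5000.
+// Perimeter = 100 + 100*sqrt(2) + 100 ~= 341.42, so
+// Q = 4*PI*5000 / 341.42^2 ~= 0.539 and the penalty is ~= 0.461.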
+
+/// `loop_compactness`: mean isoperimetric penalty `1 - Q` over the view's
+/// bounded simple directed cycles of >= 3 positioned nodes. 0.0 when there is no
+/// qualifying cycle. Deterministic for a given view regardless of element order
+/// (see the `loop_compactness` section comment above). PURE.
+fn compute_loop_compactness(view: &datamodel::StockFlow) -> f64 {
+    let graph = build_loop_graph(view);
+    let cycles = enumerate_simple_cycles(&graph);
+    let penalties: Vec<f64> = cycles
+        .iter()
+        .filter_map(|c| cycle_penalty(c, &graph.centers))
+        .collect();
+    if penalties.is_empty() {
+        0.0
+    } else {
+        penalties.iter().sum::<f64>() / penalties.len() as f64
+    }
+}
+
+/// Compute the layout quality metrics for a completed view.
+///
+/// PURE: takes data, returns scalars, performs no I/O. The `_config` parameter
+/// is kept to match the design's optimizer-facing signature and for forward
+/// compatibility; the box geometry is sourced entirely from the `diagram`
+/// helpers (which use fixed pixel element sizes), so the config is presently
+/// unused. Every term is guaranteed finite (each division guards a zero
+/// denominator by returning 0), so empty and single-element views yield
+/// all-zero, NaN-free metrics.
+pub fn compute_layout_metrics(
+    view: &datamodel::StockFlow,
+    _config: &LayoutConfig,
+) -> LayoutMetrics {
+    // --- node boxes (with their owning element for incidence checks) ---
+    //
+    // Two box sets, used by different terms:
+    //   * `node_boxes` is the LABEL-MERGED box (`node_box`): each element's own
+    //     label unioned into its shape. The view's visual extent and its
+    //     characteristic node size both include labels, so `sprawl` and
+    //     `aspect_penalty` use this set.
+    //   * `node_shape_boxes` is the bare SHAPE box (`node_shape_box`):
+    //     label-free. `node_overlap` and `node_connector_overlap` use this set
+    //     so they measure exactly what the user cares about -- node SHAPES
+    //     overlapping other node shapes, and a connector passing under a node
+    //     SHAPE (which, at a glance, suggests a false causal connection). A connector passing
+    //     only under a node's LABEL is mild noise (labels are semi-transparent
+    //     and no connector terminates on one) and must NOT be charged here;
+    //     label collisions are the province of `label_overlap`.
+    let node_boxes: Vec<(i32, Rect)> = view
+        .elements
+        .iter()
+        .filter_map(|e| node_box(e).map(|r| (e.get_uid(), r)))
+        .collect();
+    let node_shape_boxes: Vec<(i32, Rect)> = view
+        .elements
+        .iter()
+        .filter_map(|e| node_shape_box(e).map(|r| (e.get_uid(), r)))
+        .collect();
+
+    // --- node_overlap (bare shape boxes, normalized by total shape-box area) ---
+    let total_shape_area: f64 = node_shape_boxes.iter().map(|(_, r)| rect_area(r)).sum();
+    let node_overlap = if total_shape_area > 0.0 {
+        let mut overlap = 0.0;
+        for i in 0..node_shape_boxes.len() {
+            for j in (i + 1)..node_shape_boxes.len() {
+                overlap += rect_overlap_area(&node_shape_boxes[i].1, &node_shape_boxes[j].1);
+            }
+        }
+        overlap / total_shape_area
+    } else {
+        0.0
+    };
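+    // Worked example for node_overlap above (illustrative numbers): two 40x40
+    // shape boxes offset horizontally by 20 overlap in a 20x40 = 800 region.
+    // With a total shape area of 2 * 1600 = 3200, the term is 800 / 3200 = 0.25.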
+
+    // --- connector geometry (shared by several terms) ---
+    let connectors = collect_connector_geometry(view);
+    let total_connector_length: f64 = connectors.iter().map(|c| c.length).sum();
+
+    // --- node_connector_overlap (length inside non-incident shape boxes) ---
+    //
+    // Documented as a "fraction of total connector length", so each physical
+    // sub-length of connector covered by ANY non-incident node shape box must be
+    // counted AT MOST ONCE. Summing the per-box clipped length double-counts the
+    // region where two non-incident boxes overlap, which can push the normalized
+    // value above 1.0 (overlapping shape boxes are common -- a Flow's shape box is
+    // its whole-pipe bounding box, which frequently overlaps stocks/auxes/other
+    // flows). Instead, for EACH segment we collect the clip intervals over all
+    // non-incident boxes and UNION them (merge overlapping/adjacent intervals)
+    // before summing, so each covered sub-length contributes once and the term is
+    // a true fraction in [0, 1]. The per-segment merge result is order-independent,
+    // so this is deterministic regardless of `node_shape_boxes` iteration order.
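+    //
+    // Worked example (illustrative numbers): a 100-unit segment is clipped by
+    // two overlapping non-incident boxes to t in [0.2, 0.5] and t in [0.4, 0.7].
+    // Summing per box would charge (0.3 + 0.3) * 100 = 60 units; the union
+    // [0.2, 0.7] charges 0.5 * 100 = 50 units, the length actually covered.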
+    let node_connector_overlap = if total_connector_length > 0.0 {
+        let mut inside = 0.0;
+        for c in &connectors {
+            for seg in c.polyline.windows(2) {
+                let dx = seg[1].x - seg[0].x;
+                let dy = seg[1].y - seg[0].y;
+                let seg_len = (dx * dx + dy * dy).sqrt();
+                if seg_len == 0.0 {
+                    continue; // degenerate segment covers no length
+                }
+                // Clip interval [t0, t1] of this segment within each non-incident
+                // box, in segment-parameter space (t in [0,1]).
+                let mut intervals: Vec<(f64, f64)> = Vec::new();
+                for (uid, rect) in &node_shape_boxes {
+                    if c.incident_uids.contains(uid) {
+                        continue; // skip the connector's own endpoints
+                    }
+                    if let Some(iv) = segment_clip_interval_in_rect(&seg[0], &seg[1], rect) {
+                        intervals.push(iv);
+                    }
+                }
+                inside += merged_interval_length(&mut intervals) * seg_len;
+            }
+        }
+        inside / total_connector_length
+    } else {
+        0.0
+    };
+
+    // --- label_overlap (per-label obscuration) ---
+    //
+    // For each labeled element L, measure how much of its label box B_L is
+    // covered (obscured) by OTHER drawn geometry, then SUM each label's obscured
+    // fraction. This is per-label rather than a single corpus-wide ratio: a
+    // small-but-readability-killing overlap (e.g. a node circle clipping the last
+    // two characters of a short label) registers at its true obscuration
+    // fraction instead of being diluted to ~0 by the corpus's total label area
+    // (the prior `sum_of_overlaps / total_label_area` definition under-counted
+    // exactly this case).
+    //
+    // The coverers of B_L are (a) any OTHER label box and (b) any OTHER element's
+    // bare *shape* box (`node_shape_box`, NOT the label-merged `node_box`):
+    //   * A label is never charged against its OWN element's shape box. By
+    //     construction a label sits adjacent to (and within the merged bounds of)
+    //     its own element, so charging it there would always add a constant that
+    //     is not a real collision.
+    //   * Comparing against the bare shape box (not the label-merged box) keeps
+    //     "label lands on another label" and "label lands on another node's
+    //     shape" cleanly separate -- the merged box unions that node's own label,
+    //     which would re-count the label-vs-label coverage already captured by
+    //     the label-box term.
+    //
+    // A pixel-exact union of all coverers is unnecessary: the covered area is
+    // approximated by the SUM of individual overlap areas, capped at area(B_L) so
+    // a label's obscured fraction stays in [0,1] even when coverers overlap each
+    // other. This is a monotone proxy (more/larger overlaps never decrease the
+    // fraction). A mutual label-label collision is charged from BOTH labels'
+    // perspectives -- intended, since both are unreadable. Guards area(B_L) == 0
+    // (degenerate label) by skipping it, so the term is always finite.
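+    //
+    // Worked example (illustrative numbers): a label box of area 400 that is
+    // overlapped by 100 of another label and by 350 of another node's shape sums
+    // to 450, is capped at 400, and contributes 1.0. A second label with 50 of
+    // its 400 covered contributes 0.125, so the term is 1.125. Because it is a
+    // sum over labels, it can exceed 1 when several labels are obscured.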
+    let label_boxes: Vec<(i32, Rect)> = view
+        .elements
+        .iter()
+        .filter_map(|e| element_label_props(e).map(|props| (e.get_uid(), label_bounds(&props))))
+        .collect();
+    // `node_shape_boxes` is computed once above (shared with node_overlap and
+    // node_connector_overlap).
+    let mut label_overlap = 0.0;
+    for (lbl_uid, lbl) in &label_boxes {
+        let lbl_area = rect_area(lbl);
+        if lbl_area <= 0.0 {
+            continue; // degenerate label box: no NaN, contributes nothing
+        }
+        let mut covered = 0.0;
+        // Covered by every OTHER label box.
+        for (other_uid, other) in &label_boxes {
+            if other_uid == lbl_uid {
+                continue;
+            }
+            covered += rect_overlap_area(lbl, other);
+        }
+        // Covered by every OTHER element's bare shape box.
+        for (node_uid, node) in &node_shape_boxes {
+            if node_uid == lbl_uid {
+                continue;
+            }
+            covered += rect_overlap_area(lbl, node);
+        }
+        // Cap the (possibly over-counted) covered area at the label's own area
+        // so the obscured fraction is in [0,1].
+        let obscured_fraction = covered.min(lbl_area) / lbl_area;
+        label_overlap += obscured_fraction;
+    }
+
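+    // --- crossings ---
+    // Crossing count from `count_crossings` over `build_view_segments`,
+    // normalized by the number of drawn connectors.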
+    let connector_count = connectors.len();
+    let crossings = if connector_count > 0 {
+        count_crossings(&build_view_segments(view)) as f64 / connector_count as f64
+    } else {
+        0.0
+    };
+
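+    // --- sprawl ---
+    // Mean connector length divided by the mean diagonal of the label-merged
+    // node boxes; 0.0 when there are no connectors or no node boxes.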
+    let sprawl = if !connectors.is_empty() && !node_boxes.is_empty() {
+        let mean_connector_length = total_connector_length / connectors.len() as f64;
+        let characteristic_node_size = node_boxes
+            .iter()
+            .map(|(_, r)| {
+                let w = common::rect_width(r);
+                let h = common::rect_height(r);
+                (w * w + h * h).sqrt()
+            })
+            .sum::<f64>()
+            / node_boxes.len() as f64;
+        if characteristic_node_size > 0.0 {
+            mean_connector_length / characteristic_node_size
+        } else {
+            0.0
+        }
+    } else {
+        0.0
+    };
+
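+    // --- edge_length_cv ---
+    // Coefficient of variation of connector lengths (population std-dev / mean);
+    // 0.0 with fewer than two connectors or a zero mean length.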
+    let edge_length_cv = if connectors.len() >= 2 {
+        let n = connectors.len() as f64;
+        let mean = total_connector_length / n;
+        if mean > 0.0 {
+            let variance = connectors
+                .iter()
+                .map(|c| {
+                    let d = c.length - mean;
+                    d * d
+                })
+                .sum::<f64>()
+                / n; // population variance
+            variance.sqrt() / mean
+        } else {
+            0.0
+        }
+    } else {
+        0.0
+    };
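+    // Worked example for edge_length_cv above (illustrative numbers): connector
+    // lengths 100 and 200 give mean 150 and population variance
+    // ((-50)^2 + 50^2) / 2 = 2500, so the std-dev is 50 and the CV is
+    // 50 / 150 ~= 0.333.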
+
+    // --- aspect_penalty ---
+    // Bounding box over node boxes (union). The aspect ratio is the long side
+    // over the short side (always >= 1); we penalize the amount by which it
+    // exceeds the target band. Chosen formula: `ar - TARGET_AR_MAX` (a plain
+    // unit-of-ratio overshoot). Documented here and matched in the AC1.5 test.
+    let aspect_penalty = match view_bounding_box(&node_boxes) {
+        Some(bbox) => {
+            let w = common::rect_width(&bbox);
+            let h = common::rect_height(&bbox);
+            let (long, short) = if w >= h { (w, h) } else { (h, w) };
+            if short <= 0.0 {
+                0.0
+            } else {
+                let ar = long / short;
+                (ar - TARGET_AR_MAX).max(0.0)
+            }
+        }
+        None => 0.0,
+    };
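+    // Worked example for aspect_penalty above (illustrative numbers): a 600x200
+    // node bounding box and a 200x600 one both have ar = 600 / 200 = 3, so the
+    // penalty is max(3 - TARGET_AR_MAX, 0) regardless of orientation. With a
+    // hypothetical TARGET_AR_MAX of 2.0 (not necessarily the real constant),
+    // that would be 1.0.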
+
+    // --- loop_compactness (isoperimetric feedback-loop quality) ---
+    let loop_compactness = compute_loop_compactness(view);
+
+    LayoutMetrics {
+        node_overlap,
+        node_connector_overlap,
+        label_overlap,
+        crossings,
+        sprawl,
+        edge_length_cv,
+        aspect_penalty,
+        // reserved; computed in a future rung
+        chain_straightness: 0.0,
+        loop_compactness,
+    }
+}
+
+/// Union of the node boxes, or `None` if there are no node boxes.
+fn view_bounding_box(node_boxes: &[(i32, Rect)]) -> Option<Rect> {
+    let mut iter = node_boxes.iter();
+    let first = iter.next()?.1;
+    Some(iter.fold(first, |acc, (_, r)| merge_bounds(acc, *r)))
+}
+
+#[cfg(test)]
+mod tests {
+    use super::*;
+    use crate::datamodel::view_element::{self, LabelSide, LinkShape};
+    // `segment_length_in_rect` is the simple single-box clip; the AC1.3 tests and
+    // the union tests use it as an independent reference oracle to cross-check the
+    // production union path (which composes `segment_clip_interval_in_rect`).
+    use crate::diagram::common::segment_length_in_rect;
+    use crate::diagram::constants::STOCK_WIDTH;
+    use proptest::prelude::*;
+
+    // --- fixture helpers ---
+
+    fn stock(uid: i32, name: &str, x: f64, y: f64) -> ViewElement {
+        ViewElement::Stock(view_element::Stock {
+            name: name.to_string(),
+            uid,
+            x,
+            y,
+            label_side: LabelSide::Bottom,
+            compat: None,
+        })
+    }
+
+    fn aux(uid: i32, name: &str, x: f64, y: f64) -> ViewElement {
+        ViewElement::Aux(view_element::Aux {
+            name: name.to_string(),
+            uid,
+            x,
+            y,
+            label_side: LabelSide::Bottom,
+            compat: None,
+        })
+    }
+
+    /// A cloud at `(x, y)`. A cloud is a positioned node with a bare shape box
+    /// (`cloud_bounds`, a 27x27 square: CLOUD_RADIUS = 13.5) and NO rendered
+    /// label, so it is the cleanest "obscuring shape" fixture for label_overlap.
+    fn cloud(uid: i32, x: f64, y: f64) -> ViewElement {
+        ViewElement::Cloud(view_element::Cloud {
+            uid,
+            flow_uid: -1,
+            x,
+            y,
+            compat: None,
+        })
+    }
+
+    fn straight_link(uid: i32, from_uid: i32, to_uid: i32) -> ViewElement {
+        ViewElement::Link(view_element::Link {
+            uid,
+            from_uid,
+            to_uid,
+            shape: LinkShape::Straight,
+            polarity: None,
+        })
+    }
+
+    /// A flow valve at `(x, y)` with a two-point polyline whose endpoints attach
+    /// to `from_uid` and `to_uid` (a stock--flow--stock segment). The point
+    /// coordinates are irrelevant to `loop_compactness`, which uses each node's
+    /// visual center (the valve `(x, y)` for a flow) rather than the flow
+    /// points. Both points are therefore placed at the valve, which makes the
+    /// flow's drawn polyline zero-length.
+    fn flow_between(
+        uid: i32,
+        name: &str,
+        x: f64,
+        y: f64,
+        from_uid: i32,
+        to_uid: i32,
+    ) -> ViewElement {
+        ViewElement::Flow(view_element::Flow {
+            name: name.to_string(),
+            uid,
+            x,
+            y,
+            label_side: LabelSide::Bottom,
+            points: vec![
+                view_element::FlowPoint {
+                    x,
+                    y,
+                    attached_to_uid: Some(from_uid),
+                },
+                view_element::FlowPoint {
+                    x,
+                    y,
+                    attached_to_uid: Some(to_uid),
+                },
+            ],
+            compat: None,
+            label_compat: None,
+        })
+    }
+
+    fn make_view(elements: Vec<ViewElement>) -> datamodel::StockFlow {
+        datamodel::StockFlow {
+            name: None,
+            elements,
+            view_box: datamodel::Rect {
+                x: 0.0,
+                y: 0.0,
+                width: 1000.0,
+                height: 1000.0,
+            },
+            zoom: 1.0,
+            use_lettered_polarity: false,
+            font: None,
+            sketch_compat: None,
+        }
+    }
+
+    fn cfg() -> LayoutConfig {
+        LayoutConfig::default()
+    }
+
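+    /// Scale a view by `s`: every positioned element's center and every flow
+    /// point. Links (and any other element kinds) are cloned unchanged, so any
+    /// explicit connector points are NOT scaled. Used by the AC1.8
+    /// scale-invariance test.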
+    fn scale_view(view: &datamodel::StockFlow, s: f64) -> datamodel::StockFlow {
+        let elements = view
+            .elements
+            .iter()
+            .map(|e| match e {
+                ViewElement::Aux(a) => ViewElement::Aux(view_element::Aux {
+                    x: a.x * s,
+                    y: a.y * s,
+                    ..a.clone()
+                }),
+                ViewElement::Stock(st) => ViewElement::Stock(view_element::Stock {
+                    x: st.x * s,
+                    y: st.y * s,
+                    ..st.clone()
+                }),
+                ViewElement::Flow(f) => ViewElement::Flow(view_element::Flow {
+                    x: f.x * s,
+                    y: f.y * s,
+                    points: f
+                        .points
+                        .iter()
+                        .map(|p| view_element::FlowPoint {
+                            x: p.x * s,
+                            y: p.y * s,
+                            attached_to_uid: p.attached_to_uid,
+                        })
+                        .collect(),
+                    ..f.clone()
+                }),
+                ViewElement::Module(m) => ViewElement::Module(view_element::Module {
+                    x: m.x * s,
+                    y: m.y * s,
+                    ..m.clone()
+                }),
+                ViewElement::Cloud(c) => ViewElement::Cloud(view_element::Cloud {
+                    x: c.x * s,
+                    y: c.y * s,
+                    ..c.clone()
+                }),
+                ViewElement::Alias(a) => ViewElement::Alias(view_element::Alias {
+                    x: a.x * s,
+                    y: a.y * s,
+                    ..a.clone()
+                }),
+                other => other.clone(),
+            })
+            .collect();
+        datamodel::StockFlow {
+            elements,
+            ..view.clone()
+        }
+    }
+
+    // --- AC1.1: node_overlap equals known overlap / total node area ---
+
+    #[test]
+    fn test_node_overlap_known_overlap_fraction() {
+        // Two stocks (45x35) whose centers are 20px apart horizontally and at
+        // the same y. node_overlap is computed on the bare SHAPE boxes (not the
+        // label-merged boxes), so the expected value comes from
+        // `stock_shape_bounds` and is normalized by the total SHAPE-box area.
+        let s1 = stock(1, "a", 100.0, 100.0);
+        let s2 = stock(2, "b", 120.0, 100.0);
+        let view = make_view(vec![s1.clone(), s2.clone()]);
+
+        let m = compute_layout_metrics(&view, &cfg());
+
+        // Expected: compute directly from the two bare shape boxes the renderer
+        // draws (the rects, label-free).
+        let b1 = node_shape_box(&s1).unwrap();
+        let b2 = node_shape_box(&s2).unwrap();
+        let expected_overlap = rect_overlap_area(&b1, &b2);
+        let expected_total = rect_area(&b1) + rect_area(&b2);
+        assert!(expected_overlap > 0.0, "fixture must actually overlap");
+        let expected = expected_overlap / expected_total;
+        assert!(
+            (m.node_overlap - expected).abs() < 1e-9,
+            "node_overlap {} != expected {}",
+            m.node_overlap,
+            expected
+        );
+    }
+
+    #[test]
+    fn test_node_overlap_simple_hand_computed() {
+        // Two stocks with exactly one stock-width of horizontal center
+        // separation. node_overlap is computed on the bare SHAPE boxes, so only
+        // the rects matter (labels do not affect this term).
+        let s1 = stock(1, "a", 0.0, 0.0);
+        let s2 = stock(2, "b", STOCK_WIDTH, 0.0); // centers exactly one width apart
+        let view = make_view(vec![s1, s2]);
+        let m = compute_layout_metrics(&view, &cfg());
+        // Centers one full width apart -> the 45-wide shape boxes just touch in
+        // x (right edge of #1 at +22.5, left edge of #2 at +22.5): zero shape
+        // overlap. So node_overlap == 0.
+        assert_eq!(m.node_overlap, 0.0);
+    }
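+
+    // Hand arithmetic behind the two AC1.1 fixtures above, as a standalone
+    // sketch. It ASSUMES what test_node_overlap_known_overlap_fraction checks
+    // for two nodes: node_overlap = shape-box overlap area / total shape-box
+    // area, with 45x35 stock shape boxes. `sketch_two_stock_overlap_fraction`
+    // is a hypothetical helper for illustration, not part of the metrics module.
+    fn sketch_two_stock_overlap_fraction(dx: f64) -> f64 {
+        let (w, h) = (45.0_f64, 35.0_f64); // assumed stock shape box size
+        // Equal boxes at the same y whose centers are `dx` apart overlap by
+        // (w - |dx|) in x and fully (h) in y; each box has area w * h.
+        let overlap = (w - dx.abs()).max(0.0) * h;
+        overlap / (2.0 * w * h)
+    }
+
+    #[test]
+    fn sketch_two_stock_overlap_fraction_by_hand() {
+        // 20px apart: 25 * 35 = 875 of 2 * 1575 = 3150, i.e. 5/18.
+        let f = sketch_two_stock_overlap_fraction(20.0);
+        assert!((f - 875.0 / 3150.0).abs() < 1e-12);
+        // One full width (45px) apart: the boxes only touch, so zero.
+        assert_eq!(sketch_two_stock_overlap_fraction(45.0), 0.0);
+    }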
+
+    // --- AC1.2: pairwise-disjoint nodes => node_overlap == 0 ---
+
+    #[test]
+    fn test_node_overlap_disjoint_is_zero() {
+        let view = make_view(vec![
+            stock(1, "a", 0.0, 0.0),
+            stock(2, "b", 500.0, 500.0),
+            aux(3, "c", 1000.0, 0.0),
+        ]);
+        let m = compute_layout_metrics(&view, &cfg());
+        assert_eq!(m.node_overlap, 0.0);
+    }
+
+    // node_overlap is computed on the bare SHAPE boxes, NOT the label-merged
+    // boxes. The user cares about node shapes overlapping other node shapes;
+    // a label landing on another node's shape (or another label) is the
+    // province of `label_overlap`. This test distinguishes the two regimes and
+    // would FAIL against the prior label-merged-box implementation.
+
+    #[test]
+    fn test_node_overlap_labels_overlap_shapes_disjoint_is_zero() {
+        // Two `LabelSide::Bottom` auxes named "samename" (8 chars), 40px apart
+        // horizontally at the same y -- the same fixture as the label_overlap
+        // double-count regression test:
+        //   aux1 @ (0,0):  shape [-9,9]x[-9,9],   label [-29,29]x[13,27]
+        //   aux2 @ (40,0): shape [31,49]x[-9,9],  label [11,69]x[13,27]
+        // The SHAPE boxes are disjoint (9 < 31), so node_overlap == 0. The
+        // LABEL boxes overlap, but that collision belongs to label_overlap, not
+        // node_overlap. Under the old label-merged boxes node_overlap would be
+        // > 0 (the merged boxes [-29,29]x[-9,27] and [11,69]x[-9,27] overlap),
+        // so this assertion pins the new shape-only behavior.
+        let view = make_view(vec![
+            aux(1, "samename", 0.0, 0.0),
+            aux(2, "samename", 40.0, 0.0),
+        ]);
+        let m = compute_layout_metrics(&view, &cfg());
+        assert_eq!(
+            m.node_overlap, 0.0,
+            "node_overlap must ignore label-only overlap (shapes are disjoint)"
+        );
+        // Sanity: the label collision IS captured by label_overlap, confirming
+        // the overlap was not simply lost.
+        assert!(
+            m.label_overlap > 0.0,
+            "the label-vs-label overlap must still be charged by label_overlap"
+        );
+    }
+
+    #[test]
+    fn test_node_overlap_shapes_overlap_is_positive() {
+        // Two stocks (45x35) whose centers are 20px apart horizontally and at
+        // the same y -- their bare SHAPE boxes overlap, so node_overlap > 0.
+        let view = make_view(vec![
+            stock(1, "a", 100.0, 100.0),
+            stock(2, "b", 120.0, 100.0),
+        ]);
+        let m = compute_layout_metrics(&view, &cfg());
+        assert!(
+            m.node_overlap > 0.0,
+            "overlapping node shapes must produce positive node_overlap"
+        );
+    }
+
+    // --- AC1.3: node_connector_overlap ---
+
+    #[test]
+    fn test_node_connector_overlap_through_third_node() {
+        // Connector from aux #1 (far left) to aux #2 (far right), passing
+        // horizontally through a stock #3 sitting on the line at the middle.
+        let a = aux(1, "a", 0.0, 0.0);
+        let b = aux(2, "b", 400.0, 0.0);
+        let mid = stock(3, "s", 200.0, 0.0);
+        let link = straight_link(10, 1, 2);
+        let view = make_view(vec![a, b, mid, link]);
+
+        let m = compute_layout_metrics(&view, &cfg());
+        assert!(
+            m.node_connector_overlap > 0.0,
+            "connector passing through a non-incident stock must contribute"
+        );
+
+        // Expected = clipped length inside the stock SHAPE box / total polyline
+        // length. node_connector_overlap charges against the bare shape box, not
+        // the label-merged box. (The connector is horizontal at y=0, so the
+        // clipped length happens to equal the label-merged box's here; the SHAPE
+        // box is the contract regardless.)
+        let connectors = collect_connector_geometry(&view);
+        assert_eq!(connectors.len(), 1);
+        let c = &connectors[0];
+        let stock_box = node_shape_box(&stock(3, "s", 200.0, 0.0)).unwrap();
+        let mut inside = 0.0;
+        for seg in c.polyline.windows(2) {
+            inside += segment_length_in_rect(&seg[0], &seg[1], &stock_box);
+        }
+        let expected = inside / c.length;
+        assert!(
+            (m.node_connector_overlap - expected).abs() < 1e-9,
+            "got {} expected {}",
+            m.node_connector_overlap,
+            expected
+        );
+    }
+
+    #[test]
+    fn test_node_connector_overlap_avoids_all_is_zero() {
+        // Connector between two auxes with a third node well off the line.
+        let a = aux(1, "a", 0.0, 0.0);
+        let b = aux(2, "b", 400.0, 0.0);
+        let off = stock(3, "s", 200.0, 500.0);
+        let link = straight_link(10, 1, 2);
+        let view = make_view(vec![a, b, off, link]);
+        let m = compute_layout_metrics(&view, &cfg());
+        assert_eq!(m.node_connector_overlap, 0.0);
+    }
+
+    // node_connector_overlap charges a connector for the length it spends
+    // inside a non-incident node's bare SHAPE box, NOT its label-merged box.
+    // The user reads a connector passing under a node SHAPE as a false causal
+    // connection (high priority); a connector passing only under a node's LABEL
+    // is mild noise (labels are semi-transparent, no connector starts/ends on a
+    // label) and must NOT be charged. These two tests pin that distinction; the
+    // first would FAIL against the prior label-merged-box implementation.
+
+    #[test]
+    fn test_node_connector_overlap_under_label_only_is_zero() {
+        // Connector from aux #1 (0,0) to aux #2 (400,0): a horizontal line at
+        // y=0 (clipped to the 9px aux radii, so drawn x in [9, 391]). A
+        // non-incident `LabelSide::Bottom` stock #3 named "s" (1 char) is placed
+        // ABOVE the line so its SHAPE box clears y=0 but its label (which hangs
+        // BELOW the shape) reaches down across y=0:
+        //   stock #3 @ (200,-25):
+        //     shape box  x [177.5, 222.5], y [-42.5, -7.5]   (does NOT cross 0)
+        //     label box  x [192, 208],     y [-3.5, 10.5]    (DOES cross 0)
+        // The connector at y=0 passes through the label band but never enters
+        // the shape box, so node_connector_overlap == 0. Under the old
+        // label-merged box (which unions the label, y [-42.5, 10.5]) the line
+        // WOULD be charged, so this assertion is the load-bearing distinction.
+        let a = aux(1, "a", 0.0, 0.0);
+        let b = aux(2, "b", 400.0, 0.0);
+        let label_only = stock(3, "s", 200.0, -25.0);
+        let link = straight_link(10, 1, 2);
+        let view = make_view(vec![a, b, label_only, link]);
+
+        // Confirm the fixture geometry is what we claim before asserting on the
+        // metric: shape box clears the line, merged box does not.
+        let shape = node_shape_box(&stock(3, "s", 200.0, -25.0)).unwrap();
+        let merged = node_box(&stock(3, "s", 200.0, -25.0)).unwrap();
+        assert!(
+            shape.bottom < 0.0,
+            "shape box must clear the connector line (bottom {} < 0)",
+            shape.bottom
+        );
+        assert!(
+            merged.bottom > 0.0,
+            "merged box must cross the connector line via the label (bottom {} > 0)",
+            merged.bottom
+        );
+
+        let m = compute_layout_metrics(&view, &cfg());
+        assert_eq!(
+            m.node_connector_overlap, 0.0,
+            "a connector passing only under a node's LABEL must not be charged"
+        );
+    }
+
+    #[test]
+    fn test_node_connector_overlap_under_shape_is_positive() {
+        // Same connector, but the non-incident stock sits ON the line so the
+        // connector crosses its SHAPE box -- the false-causal-connection case
+        // the user cares about. node_connector_overlap > 0.
+        let a = aux(1, "a", 0.0, 0.0);
+        let b = aux(2, "b", 400.0, 0.0);
+        let on_line = stock(3, "s", 200.0, 0.0);
+        let link = straight_link(10, 1, 2);
+        let view = make_view(vec![a, b, on_line, link]);
+        let m = compute_layout_metrics(&view, &cfg());
+        assert!(
+            m.node_connector_overlap > 0.0,
+            "a connector passing under a node SHAPE must be charged"
+        );
+    }
+
+    // node_connector_overlap is documented as a "fraction of total connector
+    // length", so it must count each physical sub-length of connector covered by
+    // ANY non-incident node shape box AT MOST ONCE. When two non-incident shape
+    // boxes overlap, the prior implementation summed the per-box clipped lengths,
+    // double-counting the connector segment that lies in the overlap region; the
+    // normalized value could then exceed 1.0 and over-inflate weighted_cost. The
+    // correct value is the UNION length covered by (box A OR box B) over the total
+    // connector length. These two tests pin the union contract.
+
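+    /// Length of segment p0->p1 covered by the UNION of `rects` (each physical
+    /// sub-length counted once). Independent reference implementation used by the
+    /// union tests: collect each rect's covered parameter interval [t0, t1] along
+    /// the segment, merge overlapping intervals, and sum. Unlike a general
+    /// Liang-Barsky clip, it supports only horizontal segments (constant y), the
+    /// only geometry those tests use.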
+    fn union_segment_length_in_rects(p0: &Point, p1: &Point, rects: &[Rect]) -> f64 {
+        let seg_len = {
+            let dx = p1.x - p0.x;
+            let dy = p1.y - p0.y;
+            (dx * dx + dy * dy).sqrt()
+        };
+        if seg_len == 0.0 {
+            return 0.0;
+        }
+        let mut intervals: Vec<(f64, f64)> = Vec::new();
+        for r in rects {
+            // `segment_length_in_rect` only decides whether this rect covers any
+            // of the segment; the covered interval itself is reconstructed from
+            // the rect's x-bounds below, which is valid because the segment is
+            // horizontal (constant y).
+            let covered = segment_length_in_rect(p0, p1, r);
+            if covered <= 0.0 {
+                continue;
+            }
+            // For a horizontal segment (y constant) inside [left,right], the
+            // covered x-range is [max(min_x,left), min(max_x,right)]. Convert to t.
+            let (xa, xb) = (p0.x.min(p1.x), p0.x.max(p1.x));
+            let lo_x = xa.max(r.left);
+            let hi_x = xb.min(r.right);
+            let span = p1.x - p0.x;
+            let t_lo = ((lo_x - p0.x) / span).clamp(0.0, 1.0);
+            let t_hi = ((hi_x - p0.x) / span).clamp(0.0, 1.0);
+            let (t0, t1) = if t_lo <= t_hi {
+                (t_lo, t_hi)
+            } else {
+                (t_hi, t_lo)
+            };
+            intervals.push((t0, t1));
+        }
+        intervals.sort_by(|a, b| a.0.partial_cmp(&b.0).unwrap());
+        let mut total = 0.0;
+        let mut cur: Option<(f64, f64)> = None;
+        for (t0, t1) in intervals {
+            match cur {
+                None => cur = Some((t0, t1)),
+                Some((c0, c1)) => {
+                    if t0 <= c1 {
+                        cur = Some((c0, c1.max(t1)));
+                    } else {
+                        total += c1 - c0;
+                        cur = Some((t0, t1));
+                    }
+                }
+            }
+        }
+        if let Some((c0, c1)) = cur {
+            total += c1 - c0;
+        }
+        total * seg_len
+    }
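+
+    // The helper above supports only horizontal segments. As a hedged sketch of
+    // the general case, each rect's covered parameter interval can instead come
+    // from a Liang-Barsky clip (valid for any segment direction) before the same
+    // sort-and-merge. These `sketch_*` helpers are hypothetical, for illustration
+    // only; rects are (left, top, right, bottom) tuples with y growing downward.
+    fn sketch_liang_barsky_interval(
+        p0: (f64, f64),
+        p1: (f64, f64),
+        r: (f64, f64, f64, f64),
+    ) -> Option<(f64, f64)> {
+        let (dx, dy) = (p1.0 - p0.0, p1.1 - p0.1);
+        let (left, top, right, bottom) = r;
+        let (mut t0, mut t1) = (0.0_f64, 1.0_f64);
+        // One (p, q) pair per rect edge: the point at t is inside iff p * t <= q.
+        for (p, q) in [
+            (-dx, p0.0 - left),
+            (dx, right - p0.0),
+            (-dy, p0.1 - top),
+            (dy, bottom - p0.1),
+        ] {
+            if p == 0.0 {
+                if q < 0.0 {
+                    return None; // parallel to this edge and outside it
+                }
+            } else if p < 0.0 {
+                t0 = t0.max(q / p); // entering through this edge
+            } else {
+                t1 = t1.min(q / p); // leaving through this edge
+            }
+        }
+        (t0 < t1).then_some((t0, t1))
+    }
+
+    /// Union length of p0->p1 covered by `rects`: clip, sort, merge, sum.
+    fn sketch_union_covered_length(
+        p0: (f64, f64),
+        p1: (f64, f64),
+        rects: &[(f64, f64, f64, f64)],
+    ) -> f64 {
+        let len = (p1.0 - p0.0).hypot(p1.1 - p0.1);
+        let mut intervals: Vec<(f64, f64)> = rects
+            .iter()
+            .filter_map(|&r| sketch_liang_barsky_interval(p0, p1, r))
+            .collect();
+        intervals.sort_by(|a, b| a.0.total_cmp(&b.0));
+        let mut total = 0.0;
+        let mut cur: Option<(f64, f64)> = None;
+        for (a, b) in intervals {
+            cur = match cur {
+                Some((c0, c1)) if a <= c1 => Some((c0, c1.max(b))),
+                Some((c0, c1)) => {
+                    total += c1 - c0;
+                    Some((a, b))
+                }
+                None => Some((a, b)),
+            };
+        }
+        if let Some((c0, c1)) = cur {
+            total += c1 - c0;
+        }
+        total * len
+    }
+
+    #[test]
+    fn sketch_liang_barsky_union_matches_hand_values() {
+        // The union-test geometry: shape boxes x [177.5, 222.5] and [187.5, 232.5]
+        // (y [-17.5, 17.5]) cover x [177.5, 232.5] of the drawn connector once:
+        // 55, not 45 + 45 = 90.
+        let boxes = [(177.5, -17.5, 222.5, 17.5), (187.5, -17.5, 232.5, 17.5)];
+        let len = sketch_union_covered_length((9.0, 0.0), (391.0, 0.0), &boxes);
+        assert!((len - 55.0).abs() < 1e-9);
+        // A diagonal segment (0,0)->(10,10) crosses a 5x5 box along its diagonal.
+        let diag =
+            sketch_union_covered_length((0.0, 0.0), (10.0, 10.0), &[(0.0, 0.0, 5.0, 5.0)]);
+        assert!((diag - 50.0_f64.sqrt()).abs() < 1e-9);
+    }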
+
+    #[test]
+    fn test_node_connector_overlap_union_of_overlapping_boxes() {
+        // A horizontal Link between aux #1 (0,0) and aux #2 (400,0) at y=0. Two
+        // NON-incident stocks straddle the line AND overlap each other:
+        //   stock #3 @ (200,0): shape x [177.5, 222.5]
+        //   stock #4 @ (210,0): shape x [187.5, 232.5]
+        // Their shape boxes overlap in x [187.5, 222.5]. The OLD code charged the
+        // connector for box A (length 45) PLUS box B (length 45) = 90, but the
+        // physical connector length under (A OR B) is the union x [177.5, 232.5]
+        // = 55. The new metric must equal union/total, and the old sum/total
+        // strictly exceeds it.
+        let a = aux(1, "a", 0.0, 0.0);
+        let b = aux(2, "b", 400.0, 0.0);
+        let s3 = stock(3, "s3", 200.0, 0.0);
+        let s4 = stock(4, "s4", 210.0, 0.0);
+        let link = straight_link(10, 1, 2);
+        let view = make_view(vec![a, b, s3.clone(), s4.clone(), link]);
+
+        let m = compute_layout_metrics(&view, &cfg());
+
+        let connectors = collect_connector_geometry(&view);
+        assert_eq!(connectors.len(), 1);
+        let c = &connectors[0];
+        let box3 = node_shape_box(&s3).unwrap();
+        let box4 = node_shape_box(&s4).unwrap();
+
+        // Independent union reference and the old (double-counting) sum.
+        let mut union_len = 0.0;
+        let mut old_sum_len = 0.0;
+        for seg in c.polyline.windows(2) {
+            union_len += union_segment_length_in_rects(&seg[0], &seg[1], &[box3, box4]);
+            old_sum_len += segment_length_in_rect(&seg[0], &seg[1], &box3)
+                + segment_length_in_rect(&seg[0], &seg[1], &box4);
+        }
+        let expected = union_len / c.length;
+        let old_value = old_sum_len / c.length;
+
+        // The fixture must actually overlap so the old sum strictly exceeds the
+        // union (otherwise the test proves nothing).
+        assert!(
+            old_value > expected + 1e-9,
+            "fixture must double-count: old {old_value} should exceed union {expected}"
+        );
+        assert!(
+            (m.node_connector_overlap - expected).abs() < 1e-9,
+            "node_connector_overlap must equal the union fraction: got {} expected {} \
+             (old double-counted value was {})",
+            m.node_connector_overlap,
+            expected,
+            old_value
+        );
+        assert!(
+            m.node_connector_overlap <= 1.0,
+            "node_connector_overlap is a fraction and must be <= 1.0, got {}",
+            m.node_connector_overlap
+        );
+    }
+
+    #[test]
+    fn test_node_connector_overlap_coincident_boxes_counted_once() {
+        // Starker variant: a connector sub-length fully inside TWO COINCIDENT
+        // non-incident boxes is counted ONCE, not twice. Two stocks at the same
+        // position (200,0) have identical shape boxes spanning x [177.5, 222.5].
+        // The auxes are placed close in (x 180 and 220), so the drawn connector
+        // (clipped to the 9px aux radii: x [189, 211]) lies entirely inside both
+        // boxes. The OLD code counted that length twice, so its value was 2.0 --
+        // above 1.0, impossible for a documented fraction; the union counts it
+        // once (exactly 1.0).
+        let a = aux(1, "a", 180.0, 0.0);
+        let b = aux(2, "b", 220.0, 0.0);
+        let s3 = stock(3, "s3", 200.0, 0.0);
+        let s4 = stock(4, "s4", 200.0, 0.0);
+        let link = straight_link(10, 1, 2);
+        let view = make_view(vec![a, b, s3.clone(), s4.clone(), link]);
+
+        let m = compute_layout_metrics(&view, &cfg());
+
+        let connectors = collect_connector_geometry(&view);
+        assert_eq!(connectors.len(), 1);
+        let c = &connectors[0];
+        let box3 = node_shape_box(&s3).unwrap();
+        let box4 = node_shape_box(&s4).unwrap();
+
+        let mut union_len = 0.0;
+        let mut old_sum_len = 0.0;
+        for seg in c.polyline.windows(2) {
+            union_len += union_segment_length_in_rects(&seg[0], &seg[1], &[box3, box4]);
+            old_sum_len += segment_length_in_rect(&seg[0], &seg[1], &box3)
+                + segment_length_in_rect(&seg[0], &seg[1], &box4);
+        }
+        let expected = union_len / c.length;
+        let old_value = old_sum_len / c.length;
+
+        // With two coincident boxes both covering the whole drawn connector, the
+        // union fraction is 1.0 and the old value is ~2.0 (> 1.0, impossible for a
+        // fraction).
+        assert!(
+            old_value > 1.0,
+            "coincident-box fixture must drive the OLD value above 1.0 (got {old_value})"
+        );
+        assert!(
+            (expected - 1.0).abs() < 1e-9,
+            "union of two coincident boxes covering the whole connector is the full \
+             length (fraction 1.0), got {expected}"
+        );
+        assert!(
+            (m.node_connector_overlap - expected).abs() < 1e-9,
+            "coincident non-incident boxes must be counted once: got {} expected {} \
+             (old double-counted value was {})",
+            m.node_connector_overlap,
+            expected,
+            old_value
+        );
+        assert!(
+            m.node_connector_overlap <= 1.0 + 1e-9,
+            "node_connector_overlap is a fraction and must be <= 1.0, got {}",
+            m.node_connector_overlap
+        );
+    }
+
+    // --- AC1.4: label_overlap (per-label obscuration) ---
+    //
+    // label_overlap is the SUM over labeled elements of each label's obscured
+    // fraction: the area of the label box covered by any OTHER label box or any
+    // OTHER element's bare shape box, capped at the label's own area and divided
+    // by it (so each term is in [0,1], while the sum can exceed 1). 0 = no label
+    // obscured. A small overlap registers at its true per-label obscuration
+    // fraction instead of being diluted by the view's total label area, which is
+    // how the old area/total definition under-counted (see
+    // `test_label_overlap_small_clip_is_sensitive`).
+
+    #[test]
+    fn test_label_overlap_overlapping_labels() {
+        // Two auxes at the same position -> their labels (Bottom) coincide
+        // exactly. Each label is fully covered by the other (capped at its own
+        // area), so each obscured fraction is 1.0 and the sum is 2.0.
+        let view = make_view(vec![
+            aux(1, "samename", 100.0, 100.0),
+            aux(2, "samename", 100.0, 100.0),
+        ]);
+        let m = compute_layout_metrics(&view, &cfg());
+        assert!(
+            (m.label_overlap - 2.0).abs() < 1e-9,
+            "two coincident labels are each fully obscured: expected 2.0, got {}",
+            m.label_overlap
+        );
+    }
+
+    #[test]
+    fn test_label_overlap_disjoint_is_zero() {
+        // Two auxes far apart -> no label is covered by anything. Sum of
+        // obscured fractions is 0.0.
+        let view = make_view(vec![aux(1, "a", 0.0, 0.0), aux(2, "b", 1000.0, 1000.0)]);
+        let m = compute_layout_metrics(&view, &cfg());
+        assert_eq!(m.label_overlap, 0.0);
+    }
+
+    #[test]
+    fn test_label_overlap_counts_label_pair_exactly_once() {
+        // The Phase-1 double-count guard, restated for per-label obscuration: a
+        // label is never charged against its OWN element's shape box, and a
+        // label-vs-label collision is counted from each label's own perspective
+        // (both labels are unreadable -- that is intended), not via the other
+        // node's label-merged bounds.
+        //
+        // Fixture: two `LabelSide::Bottom` auxes named "samename" (8 chars).
+        //   AUX_RADIUS = 9; label editor width = 8*6 + 10 = 58, height = 14.
+        //   With Bottom labels, label top = cy + 9 + LABEL_PADDING(4) = cy + 13,
+        //   bottom = cy + 27, left = cx - 29, right = cx + 29.
+        //
+        // Place them 40px apart horizontally, same y:
+        //   aux1 @ (0,0): shape [-9,9]x[-9,9],  label [-29,29]x[13,27]
+        //   aux2 @ (40,0): shape [31,49]x[-9,9], label [11,69]x[13,27]
+        //
+        // SHAPE boxes do NOT overlap (9 < 31), and each label clears the OTHER
+        // aux's bare shape box entirely (label y [13,27] vs shape y [-9,9]). The
+        // LABELS overlap by x:[11,29]=18, y:[13,27]=14 -> 252. Each label box has
+        // area 58*14 = 812 and is covered only by the other label (252 < 812, no
+        // cap), so each obscured fraction is 252/812 and the sum is 504/812.
+        let view = make_view(vec![
+            aux(1, "samename", 0.0, 0.0),
+            aux(2, "samename", 40.0, 0.0),
+        ]);
+        let m = compute_layout_metrics(&view, &cfg());
+
+        let label_area = 58.0 * 14.0; // 812.0
+        let overlap = 18.0 * 14.0; // 252.0, the single label-label intersection
+        let expected = (overlap / label_area) + (overlap / label_area); // 504/812
+        assert!(
+            (m.label_overlap - expected).abs() < 1e-9,
+            "per-label obscuration should sum each label's fraction once: got {} expected {}",
+            m.label_overlap,
+            expected
+        );
+    }
+
+    #[test]
+    fn test_label_overlap_never_charged_against_own_shape() {
+        // A single labeled aux: its Bottom label sits just below its OWN shape
+        // (both lie inside the element's label-merged bounds). A label is never
+        // charged against its own element's shape, and there is no other element,
+        // so the obscured fraction is 0 and label_overlap is exactly 0.0.
+        let view = make_view(vec![aux(1, "samename", 0.0, 0.0)]);
+        let m = compute_layout_metrics(&view, &cfg());
+        assert_eq!(
+            m.label_overlap, 0.0,
+            "a label must never be charged against its own element's shape box"
+        );
+    }
+
+    #[test]
+    fn test_label_overlap_small_clip_is_sensitive() {
+        // A small node SHAPE clipping a few characters of a short label must
+        // register at its true per-label obscuration fraction, NOT be diluted to
+        // ~0 by the view's total label area (the old area/total under-count).
+        //
+        // L: aux "ab" (2 chars) @ (0,0), Bottom label.
+        //   editor_width = 2*6 + 10 = 22, height 14 -> label area 308.
+        //   label box: left -11, right 11, top 13, bottom 27.
+        // O: a cloud (no label) @ (18, 20). cloud_bounds (CLOUD_RADIUS 13.5):
+        //   x [4.5, 31.5], y [6.5, 33.5].
+        //   Overlap with L's label: x [4.5,11]=6.5, y [13,27]=14 -> 91.
+        //   obscured_fraction(L) = 91/308 ~= 0.2955; the cloud has no label, so
+        //   the sum is exactly 91/308.
+        // Plus 15 far-apart auxes with long (20-char) labels: each label area
+        //   20*6+10 = 130 wide * 14 = 1820, none overlapping anything. They add
+        //   nothing to the per-label SUM (obscured fraction 0 each) but bloat the
+        //   OLD denominator (total label area), so the OLD area/total score for
+        //   the same clip collapses to ~0.003 -- the under-count this fixes.
+        let mut elements = vec![aux(1, "ab", 0.0, 0.0), cloud(2, 18.0, 20.0)];
+        for k in 0..15 {
+            // Spaced 1000px apart in one row so nothing overlaps; 20-char names.
+            elements.push(aux(
+                100 + k,
+                "abcdefghijklmnopqrst",
+                3000.0 + f64::from(k) * 1000.0,
+                3000.0,
+            ));
+        }
+        let view = make_view(elements);
+        let m = compute_layout_metrics(&view, &cfg());
+
+        let label_area = 22.0 * 14.0; // 308.0
+        let clip_area = 6.5 * 14.0; // 91.0
+        let expected = clip_area / label_area; // ~0.2955
+        assert!(
+            (m.label_overlap - expected).abs() < 1e-9,
+            "small clip must score its per-label obscuration fraction: got {} expected {}",
+            m.label_overlap,
+            expected
+        );
+        assert!(
+            m.label_overlap > 0.1,
+            "a readability-killing clip must register clearly (> 0.1), got {}",
+            m.label_overlap
+        );
+
+        // Confirm the OLD area/total definition would have under-counted this to
+        // near-zero: the same clip area divided by the view's total label area.
+        let total_label_area = label_area + 15.0 * (130.0 * 14.0); // 308 + 27300
+        let old_score = clip_area / total_label_area; // ~0.0033
+        assert!(
+            old_score < 0.01,
+            "fixture must demonstrate the old under-count (< 0.01), got {}",
+            old_score
+        );
+        assert!(
+            m.label_overlap > old_score * 50.0,
+            "new per-label score {} must be far larger than the old {}",
+            m.label_overlap,
+            old_score
+        );
+    }
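+
+    // Hand arithmetic behind the label_overlap fixtures above, as a standalone
+    // sketch. Its geometry is taken from the fixture comments, not from the
+    // metrics module: a Bottom label box is `chars * 6 + 10` wide and 14 tall,
+    // centered on x, with its top LABEL_PADDING (4) below the shape. The obscured
+    // fraction follows the documented definition: summed overlap with the
+    // obscuring boxes, capped at the label's area, divided by that area. All
+    // `sketch_*` names are hypothetical. Boxes are (left, top, right, bottom).
+    fn sketch_bottom_label_box(
+        cx: f64,
+        cy: f64,
+        shape_half_h: f64,
+        name: &str,
+    ) -> (f64, f64, f64, f64) {
+        let w = name.chars().count() as f64 * 6.0 + 10.0;
+        let top = cy + shape_half_h + 4.0;
+        (cx - w / 2.0, top, cx + w / 2.0, top + 14.0)
+    }
+
+    fn sketch_box_area(b: (f64, f64, f64, f64)) -> f64 {
+        // Empty (inverted) boxes clamp to zero area.
+        (b.2 - b.0).max(0.0) * (b.3 - b.1).max(0.0)
+    }
+
+    fn sketch_obscured_fraction(
+        label: (f64, f64, f64, f64),
+        obscurers: &[(f64, f64, f64, f64)],
+    ) -> f64 {
+        let covered: f64 = obscurers
+            .iter()
+            .map(|o| {
+                let inter = (
+                    label.0.max(o.0),
+                    label.1.max(o.1),
+                    label.2.min(o.2),
+                    label.3.min(o.3),
+                );
+                sketch_box_area(inter)
+            })
+            .sum();
+        let area = sketch_box_area(label);
+        covered.min(area) / area // cap at the label's own area
+    }
+
+    #[test]
+    fn sketch_label_obscuration_matches_hand_values() {
+        // "samename" pair 40px apart: each label loses 18 * 14 = 252 of 812.
+        let l1 = sketch_bottom_label_box(0.0, 0.0, 9.0, "samename");
+        let l2 = sketch_bottom_label_box(40.0, 0.0, 9.0, "samename");
+        assert_eq!(l1, (-29.0, 13.0, 29.0, 27.0));
+        let pair = sketch_obscured_fraction(l1, &[l2]) + sketch_obscured_fraction(l2, &[l1]);
+        assert!((pair - 504.0 / 812.0).abs() < 1e-12);
+        // Coincident labels are each fully covered: 1.0 per label.
+        assert_eq!(sketch_obscured_fraction(l1, &[l1]), 1.0);
+        // "ab" label clipped by the cloud box x [4.5, 31.5], y [6.5, 33.5]: 91/308.
+        let ab = sketch_bottom_label_box(0.0, 0.0, 9.0, "ab");
+        let f = sketch_obscured_fraction(ab, &[(4.5, 6.5, 31.5, 33.5)]);
+        assert!((f - 91.0 / 308.0).abs() < 1e-12);
+        // The 1-char stock label at (200, -25) (shape half-height 17.5).
+        assert_eq!(
+            sketch_bottom_label_box(200.0, -25.0, 17.5, "s"),
+            (192.0, -3.5, 208.0, 10.5)
+        );
+    }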
+
+    // --- AC1.5: aspect_penalty ---
+
+    #[test]
+    fn test_aspect_penalty_thin_box_positive() {
+        // Two auxes at the same x, 1000px apart vertically -> the node bounding
+        // box is tall and thin (ar >> TARGET_AR_MAX), so penalty > 0.
+        let view = make_view(vec![aux(1, "a", 0.0, 0.0), aux(2, "b", 0.0, 1000.0)]);
+        let m = compute_layout_metrics(&view, &cfg());
+        assert!(
+            m.aspect_penalty > 0.0,
+            "a tall thin bbox must be penalized, got {}",
+            m.aspect_penalty
+        );
+
+        // Verify it equals exactly `ar - TARGET_AR_MAX` for the computed bbox.
+        let node_boxes: Vec<(i32, Rect)> = view
+            .elements
+            .iter()
+            .filter_map(|e| node_box(e).map(|r| (e.get_uid(), r)))
+            .collect();
+        let bbox = view_bounding_box(&node_boxes).unwrap();
+        let w = common::rect_width(&bbox);
+        let h = common::rect_height(&bbox);
+        let (long, short) = if w >= h { (w, h) } else { (h, w) };
+        let expected = (long / short - TARGET_AR_MAX).max(0.0);
+        assert!((m.aspect_penalty - expected).abs() < 1e-9);
+    }
+
+    #[test]
+    fn test_aspect_penalty_balanced_box_zero() {
+        // Four auxes at the corners of a 400x300 rectangle (between centers), so
+        // the bounding box is ~4:3, well inside the 16:9 band -> zero penalty.
+        // The node radii and labels add a small margin (not symmetric: the Bottom
+        // labels extend only the lower edge), which still keeps ar < 16/9.
+        let view = make_view(vec![
+            aux(1, "a", 0.0, 0.0),
+            aux(2, "b", 400.0, 0.0),
+            aux(3, "c", 0.0, 300.0),
+            aux(4, "d", 400.0, 300.0),
+        ]);
+        let m = compute_layout_metrics(&view, &cfg());
+
+        // Confirm the bbox aspect ratio really is inside the band for this
+        // fixture, then assert the penalty is exactly zero.
+        let node_boxes: Vec<(i32, Rect)> = view
+            .elements
+            .iter()
+            .filter_map(|e| node_box(e).map(|r| (e.get_uid(), r)))
+            .collect();
+        let bbox = view_bounding_box(&node_boxes).unwrap();
+        let w = common::rect_width(&bbox);
+        let h = common::rect_height(&bbox);
+        let ar = w.max(h) / w.min(h);
+        assert!(ar <= TARGET_AR_MAX, "fixture bbox ar {} not in band", ar);
+        assert_eq!(m.aspect_penalty, 0.0);
+    }
+
+    // --- AC1.6: weighted_cost is the exact linear combination ---
+
+    #[test]
+    fn test_weighted_cost_exact_linear_combination() {
+        let m = LayoutMetrics {
+            node_overlap: 1.5,
+            node_connector_overlap: 2.0,
+            label_overlap: 0.5,
+            crossings: 3.0,
+            sprawl: 4.0,
+            edge_length_cv: 0.25,
+            aspect_penalty: 6.0,
+            chain_straightness: 7.0,
+            loop_compactness: 8.0,
+        };
+        let w = MetricWeights {
+            node_overlap: 10.0,
+            node_connector_overlap: 20.0,
+            label_overlap: 30.0,
+            crossings: 40.0,
+            sprawl: 50.0,
+            edge_length_cv: 60.0,
+            aspect_penalty: 70.0,
+            chain_straightness: 80.0,
+            loop_compactness: 90.0,
+        };
+        let expected = 1.5 * 10.0
+            + 2.0 * 20.0
+            + 0.5 * 30.0
+            + 3.0 * 40.0
+            + 4.0 * 50.0
+            + 0.25 * 60.0
+            + 6.0 * 70.0
+            + 7.0 * 80.0
+            + 8.0 * 90.0;
+        assert!((m.weighted_cost(&w) - expected).abs() < 1e-9);
+    }
+
+    // --- AC5.1: the committed calibrated default expresses readability dominance ---
+    //
+    // The Phase-1 placeholder default was all-zeros (so a pre-calibration
+    // `weighted_cost` was inert). Phase 4 commits real, user-signed-off weights
+    // (2026-05-23), so the default is no longer all-zeros and `weighted_cost`
+    // under it is now meaningful. This test pins the DOMINANCE ORDERING the
+    // committed weights encode -- relationships rather than magic numbers, so it
+    // documents the intent and survives minor retuning -- and re-confirms that
+    // `weighted_cost` applies the default exactly as Σ wᵢ·termᵢ. It replaces the
+    // old "default is all-zeros so cost is inert" assertion, which is no longer
+    // true by design.
+
+    #[test]
+    fn test_default_weights_readability_dominant_ordering() {
+        let w = MetricWeights::default();
+
+        // The dominant "overlap + crossings" family: each term that hurts
+        // readability (shapes overlapping shapes, connectors under shapes, labels
+        // obscured, edges crossing) must outweigh every compactness/aspect term.
+        let dominant = [
+            w.node_overlap,
+            w.node_connector_overlap,
+            w.label_overlap,
+            w.crossings,
+        ];
+        let compactness = [w.sprawl, w.edge_length_cv, w.aspect_penalty];
+        for &d in &dominant {
+            for &c in &compactness {
+                assert!(
+                    d > c,
+                    "every readability term ({d}) must strictly exceed every \
+                     compactness/aspect term ({c})"
+                );
+            }
+        }
+
+        // Compactness/aspect are intentionally zero: spreading out to keep labels
+        // legible and feedback loops visible is good, not penalized.
+        assert_eq!(w.sprawl, 0.0, "sprawl is not a goal");
+        assert_eq!(
+            w.edge_length_cv, 0.0,
+            "edge-length uniformity is not a goal"
+        );
+        assert_eq!(w.aspect_penalty, 0.0, "aspect ratio is not a goal");
+
+        // chain_straightness is reserved (not yet computed), so it carries no
+        // weight.
+        assert_eq!(
+            w.chain_straightness, 0.0,
+            "chain_straightness is reserved and must stay zero"
+        );
+
+        // loop_compactness rewards visible feedback-loop circles, but only as a
+        // gentle nudge: a low, non-dominant weight strictly between zero and the
+        // dominant node_overlap weight.
+        assert!(
+            w.loop_compactness > 0.0,
+            "loop_compactness should gently reward visible loops, got {}",
+            w.loop_compactness
+        );
+        assert!(
+            w.loop_compactness < w.node_overlap,
+            "loop_compactness ({}) must stay below the dominant node_overlap ({})",
+            w.loop_compactness,
+            w.node_overlap
+        );
+
+        // `weighted_cost` under the default is still the exact linear combination
+        // (the default is now meaningful, not inert): verify against an explicit
+        // Σ wᵢ·termᵢ over a hand-set metrics value.
+        let m = LayoutMetrics {
+            node_overlap: 0.3,
+            node_connector_overlap: 0.1,
+            label_overlap: 0.7,
+            crossings: 2.0,
+            sprawl: 5.0,
+            edge_length_cv: 0.4,
+            aspect_penalty: 1.5,
+            chain_straightness: 0.0,
+            loop_compactness: 0.8,
+        };
+        let expected = m.node_overlap * w.node_overlap
+            + m.node_connector_overlap * w.node_connector_overlap
+            + m.label_overlap * w.label_overlap
+            + m.crossings * w.crossings
+            + m.sprawl * w.sprawl
+            + m.edge_length_cv * w.edge_length_cv
+            + m.aspect_penalty * w.aspect_penalty
+            + m.chain_straightness * w.chain_straightness
+            + m.loop_compactness * w.loop_compactness;
+        assert!(
+            (m.weighted_cost(&w) - expected).abs() < 1e-12,
+            "weighted_cost under the default must equal Σ wᵢ·termᵢ: got {} expected {}",
+            m.weighted_cost(&w),
+            expected
+        );
+    }
+
+    // --- AC1.7: empty / single-element views are all-zero and finite ---
+
+    fn assert_all_finite(m: &LayoutMetrics) {
+        assert!(m.node_overlap.is_finite());
+        assert!(m.node_connector_overlap.is_finite());
+        assert!(m.label_overlap.is_finite());
+        assert!(m.crossings.is_finite());
+        assert!(m.sprawl.is_finite());
+        assert!(m.edge_length_cv.is_finite());
+        assert!(m.aspect_penalty.is_finite());
+        assert!(m.chain_straightness.is_finite());
+        assert!(m.loop_compactness.is_finite());
+    }
+
+    fn assert_all_zero(m: &LayoutMetrics) {
+        assert_eq!(m.node_overlap, 0.0);
+        assert_eq!(m.node_connector_overlap, 0.0);
+        assert_eq!(m.label_overlap, 0.0);
+        assert_eq!(m.crossings, 0.0);
+        assert_eq!(m.sprawl, 0.0);
+        assert_eq!(m.edge_length_cv, 0.0);
+        assert_eq!(m.aspect_penalty, 0.0);
+        assert_eq!(m.chain_straightness, 0.0);
+        assert_eq!(m.loop_compactness, 0.0);
+    }
+
+    #[test]
+    fn test_empty_view_all_zero_finite() {
+        let view = make_view(vec![]);
+        let m = compute_layout_metrics(&view, &cfg());
+        assert_all_finite(&m);
+        assert_all_zero(&m);
+    }
+
+    #[test]
+    fn test_single_element_view_all_zero_finite() {
+        let view = make_view(vec![aux(1, "only", 100.0, 100.0)]);
+        let m = compute_layout_metrics(&view, &cfg());
+        assert_all_finite(&m);
+        // A single node has nothing to overlap and no connectors, so the pairwise
+        // and connector terms are 0. Its bounding box is not degenerate: it is
+        // the aux's own box, whose aspect ratio is ~1 for a square-ish aux
+        // (inside the band), so aspect_penalty is also expected to be 0.
+        assert_eq!(m.node_overlap, 0.0);
+        assert_eq!(m.node_connector_overlap, 0.0);
+        assert_eq!(m.crossings, 0.0);
+        assert_eq!(m.sprawl, 0.0);
+        assert_eq!(m.edge_length_cv, 0.0);
+    }
+
+    // --- AC1.8 (scoped): scale invariance under uniform coordinate scaling ---
+    //
+    // SCOPING (correction to the AC1.8 plan note, 2026-05-22): the plan listed
+    // `node_connector_overlap`, `crossings`, `edge_length_cv`, and
+    // `aspect_penalty` as scale-free. After implementing the metric against the
+    // ACTUAL renderer geometry (the design's load-bearing invariant: metrics
+    // are computed on the same geometry the renderer draws), only `crossings`
+    // is exactly scale-invariant -- and even then only for crossings that lie
+    // INTERIOR to both connectors, away from the fixed-size node boundaries the
+    // polylines are clipped to (a crossing grazing a node boundary near a
+    // segment endpoint can flip; see the detailed note at the assertion below).
+    // In this fixture the two links (a->b along y = 0, c->t from y = 300 to
+    // y = 320) are far apart and do not cross, so the check pins that uniform
+    // scaling introduces no spurious crossing. The reason the other terms are not
+    // exactly invariant is the same fixed-pixel element geometry the plan
+    // already cites for node_overlap/label_overlap/sprawl, and it propagates
+    // further than the plan anticipated:
+    //
+    //   * Connectors are clipped to fixed-radius element boundaries, so a
+    //     straight link's drawn length is `s*center_dist - r_from - r_to`
+    //     (AFFINE in `s`, not linear). Hence `edge_length_cv = stddev/mean` of
+    //     those affine lengths is only ASYMPTOTICALLY invariant (the fixed
+    //     offset shrinks relative to the scaled spread), not exactly.
+    //   * `node_connector_overlap` divides an inside-fixed-box overlap length
+    //     (which does NOT scale) by total connector length (which does), so it
+    //     shrinks like ~1/s -- scale-SENSITIVE, like `sprawl`.
+    //   * The view bounding box is `union(fixed boxes around scaled centers)`,
+    //     so its width/height are each `s*span + fixed_box_size`; the aspect
+    //     ratio is therefore only asymptotically invariant.
+    //
+    // The principled resolution keeps renderer-faithful geometry (the whole
+    // point of the phase) and accepts that only the topological `crossings`
+    // term is exactly scale-invariant. This test asserts that exactly, and
+    // additionally pins the documented scale-SENSITIVITY of
+    // `node_connector_overlap` (it must drop under up-scaling) so the scoping
+    // is non-vacuous. The
+    // mismatch with the plan's term list is surfaced in the executor report and
+    // tracked for the calibration phase.
+    //
+    // The fixture has zero node-overlap and zero label-overlap so those
+    // scale-sensitive area terms are trivially 0 before and after scaling.
+    #[test]
+    fn test_scale_invariance_of_scale_free_terms() {
+        // A small connected, well-separated view: three auxes and two stocks,
+        // far enough apart that there is no node-overlap and no label-overlap,
+        // with two straight links (one of which passes through a non-incident
+        // node so node_connector_overlap is nonzero and meaningful).
+        let view = make_view(vec![
+            aux(1, "a", 0.0, 0.0),
+            aux(2, "b", 400.0, 0.0),
+            stock(3, "s", 200.0, 0.0), // on the a->b line: nonzero conn overlap
+            aux(4, "c", 0.0, 300.0),
+            stock(5, "t", 400.0, 320.0),
+            straight_link(10, 1, 2), // passes through stock #3
+            straight_link(11, 4, 5),
+        ]);
+
+        let base = compute_layout_metrics(&view, &cfg());
+        // Sanity: the fixture must have zero node/label overlap (so the
+        // scale-sensitive area terms are trivially scale-equal) and a nonzero
+        // conn-overlap (so the documented scale-SENSITIVITY check is
+        // non-vacuous).
+        assert_eq!(base.node_overlap, 0.0, "fixture must have no node overlap");
+        assert_eq!(
+            base.label_overlap, 0.0,
+            "fixture must have no label overlap"
+        );
+        assert!(
+            base.node_connector_overlap > 0.0,
+            "fixture must have a connector through a non-incident node"
+        );
+
+        let s = 3.0;
+        let scaled = compute_layout_metrics(&scale_view(&view, s), &cfg());
+
+        // The one exactly scale-invariant term here: edge crossings.
+        //
+        // Crossings are NOT *universally* scale-invariant. A crossing is counted
+        // on the drawn polylines, which are clipped to the same fixed-pixel node
+        // boxes (the connector endpoints sit on element boundaries that do not
+        // scale). A crossing that merely grazes a node boundary near a segment
+        // endpoint can therefore appear or disappear under uniform scale.
+        // Crossings that lie comfortably INTERIOR to both connectors (away from
+        // those fixed-size boundaries) are exactly preserved, because the
+        // interior of each polyline is an exact affine image of itself under
+        // uniform scale and an intersection of two segments is invariant under a
+        // shared affine map. This fixture's two links are roughly 300 px apart
+        // and never intersect, so there is no crossing near any node box that
+        // could flip, and the count (zero) is preserved exactly.
+        assert!(
+            (scaled.crossings - base.crossings).abs() < 1e-9,
+            "crossings not scale-invariant: {} vs {}",
+            scaled.crossings,
+            base.crossings
+        );
+
+        // Documented scale-SENSITIVITY of node_connector_overlap: with
+        // fixed-size node boxes, scaling the coordinates by `s` leaves the
+        // inside-box overlap length essentially unchanged (the box keeps its
+        // size and the line still passes through its center, so the chord is
+        // the same) while total connector length grows with `s`, so the ratio
+        // strictly DECREASES under up-scaling. (It does
+        // not drop by exactly 1/s because the denominator -- connector length
+        // clipped to fixed-radius element boundaries -- is affine in `s`, not
+        // linear; we assert the robust direction rather than a brittle factor.)
+        assert!(
+            scaled.node_connector_overlap < base.node_connector_overlap,
+            "node_connector_overlap should DROP under up-scaling (fixed boxes): \
+             scaled {} should be < base {}",
+            scaled.node_connector_overlap,
+            base.node_connector_overlap
+        );
+    }
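+
+    // Illustrative sketch of the AC1.8 scoping argument above. It is std only,
+    // and the helper below is hypothetical. If each drawn link length is
+    // `s * d - r_clip` (center distance `d` scaled by `s`, minus a fixed
+    // clipping offset), then stddev/mean is NOT exactly scale-invariant; it only
+    // approaches the unclipped CV as `s` grows. ASSUMPTION: the 40 px offset is
+    // a stand-in for `r_from + r_to`, not the renderer's real radii.
+    fn sketch_cv(xs: &[f64]) -> f64 {
+        let n = xs.len() as f64;
+        let mean = xs.iter().sum::<f64>() / n;
+        let var = xs.iter().map(|x| (x - mean).powi(2)).sum::<f64>() / n;
+        var.sqrt() / mean
+    }
+
+    #[test]
+    fn sketch_affine_clipped_lengths_cv_only_asymptotically_invariant() {
+        let center_dists = [200.0, 400.0, 600.0];
+        let r_clip = 40.0;
+        // CV of the clipped (affine-in-s) lengths at a given uniform scale.
+        let cv_at = |s: f64| {
+            let lens: Vec<f64> = center_dists.iter().map(|d| s * d - r_clip).collect();
+            sketch_cv(&lens)
+        };
+        let unclipped = sketch_cv(&center_dists);
+        let (cv1, cv3, cv100) = (cv_at(1.0), cv_at(3.0), cv_at(100.0));
+        // Scaling changes the CV, so it is not exactly invariant...
+        assert!((cv1 - cv3).abs() > 1e-3);
+        // ...but it converges toward the unclipped CV as s grows.
+        assert!((cv3 - unclipped).abs() < (cv1 - unclipped).abs());
+        assert!((cv100 - unclipped).abs() < (cv3 - unclipped).abs());
+    }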
+
+    // --- Property test: node_overlap is symmetric under element shuffle ---
+
+    proptest! {
+        #![proptest_config(ProptestConfig::with_cases(64))]
+
+        /// node_overlap is a sum over unordered element pairs, so it must be
+        /// invariant under any permutation of the element list.
+        #[test]
+        fn prop_node_overlap_shuffle_invariant(
+            // Four stocks at small coordinates in [-50, 50), close enough that
+            // some pairs overlap and some don't.
+            xs in prop::collection::vec(-50.0f64..50.0, 4),
+            ys in prop::collection::vec(-50.0f64..50.0, 4),
+            // A uniformly random permutation of [0, 1, 2, 3].
+            perm in Just(vec![0usize, 1, 2, 3]).prop_shuffle(),
+        ) {
+            let elems: Vec<ViewElement> = (0..4)
+                .map(|i| stock(i as i32 + 1, "n", xs[i], ys[i]))
+                .collect();
+
+            let base = compute_layout_metrics(&make_view(elems.clone()), &cfg());
+
+            // `perm` is a random ordering of [0,1,2,3]; reorder accordingly.
+            let shuffled: Vec<ViewElement> = perm.iter().map(|&i| elems[i].clone()).collect();
+            let other = compute_layout_metrics(&make_view(shuffled), &cfg());
+
+            prop_assert!(
+                (base.node_overlap - other.node_overlap).abs() < 1e-9,
+                "node_overlap changed under shuffle: {} vs {}",
+                base.node_overlap,
+                other.node_overlap
+            );
+        }
+    }
+
+    // --- loop_compactness (isoperimetric loop quality) ---
+
+    /// The center of a node's bare shape box (which is symmetric about the
+    /// element position, so this is the element center). Mirrors the centers the
+    /// metric uses to build each loop polygon.
+    fn shape_center(e: &ViewElement) -> Point {
+        let r = node_shape_box(e).unwrap();
+        Point {
+            x: (r.left + r.right) / 2.0,
+            y: (r.top + r.bottom) / 2.0,
+        }
+    }
+
+    /// Hand-computed isoperimetric penalty `1 - Q` for a polygon over the given
+    /// centers in order (shoelace area, summed-edge perimeter, Q clamped to
+    /// [0,1]). The test's independent oracle for `loop_compactness`.
+    fn expected_loop_penalty(centers: &[Point]) -> f64 {
+        let n = centers.len();
+        let mut area2 = 0.0;
+        let mut perim = 0.0;
+        for i in 0..n {
+            let a = centers[i];
+            let b = centers[(i + 1) % n];
+            area2 += a.x * b.y - b.x * a.y;
+            let dx = b.x - a.x;
+            let dy = b.y - a.y;
+            perim += (dx * dx + dy * dy).sqrt();
+        }
+        let area = area2.abs() / 2.0;
+        let q = (4.0 * std::f64::consts::PI * area / (perim * perim)).clamp(0.0, 1.0);
+        1.0 - q
+    }
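+
+    // Worked check of the "~0.05" figure used by the circle-loop test below.
+    // This is a std-only sketch, and the helper name is hypothetical. For a
+    // regular n-gon the isoperimetric quotient has the closed form
+    // Q = 4*pi*A / P^2 = pi / (n * tan(pi / n)), so the penalty is
+    // 1 - pi / (n * tan(pi / n)). That is about 0.0519 for n = 8 and shrinks
+    // toward 0 as n grows, because the polygon approaches a circle.
+    fn sketch_regular_ngon_penalty(n: u32, radius: f64) -> (f64, f64) {
+        use std::f64::consts::PI;
+        let pts: Vec<(f64, f64)> = (0..n)
+            .map(|i| {
+                let t = 2.0 * PI * f64::from(i) / f64::from(n);
+                (radius * t.cos(), radius * t.sin())
+            })
+            .collect();
+        // Shoelace area and summed-edge perimeter, as in the oracle above.
+        let (mut area2, mut perim) = (0.0, 0.0);
+        for i in 0..pts.len() {
+            let (ax, ay) = pts[i];
+            let (bx, by) = pts[(i + 1) % pts.len()];
+            area2 += ax * by - bx * ay;
+            perim += ((bx - ax).powi(2) + (by - ay).powi(2)).sqrt();
+        }
+        let shoelace = 1.0 - 4.0 * PI * (area2.abs() / 2.0) / (perim * perim);
+        let closed_form = 1.0 - PI / (f64::from(n) * (PI / f64::from(n)).tan());
+        (shoelace, closed_form)
+    }
+
+    #[test]
+    fn sketch_regular_octagon_penalty_matches_closed_form() {
+        let (shoelace, closed) = sketch_regular_ngon_penalty(8, 300.0);
+        assert!((shoelace - closed).abs() < 1e-9);
+        assert!(closed > 0.05 && closed < 0.055);
+        // More sides -> closer to a circle -> smaller penalty.
+        assert!(sketch_regular_ngon_penalty(32, 300.0).1 < closed);
+    }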
+
+    #[test]
+    fn test_loop_compactness_circle_loop_near_zero() {
+        // Eight stocks placed on a circle of radius 300, wired into a directed
+        // 8-cycle by links 1->2->...->8->1. A well-spread loop reads as a clean
+        // circle, so its isoperimetric quotient Q is close to 1 and the penalty
+        // (1 - Q) is small.
+        let n: i32 = 8;
+        let radius = 300.0;
+        let mut elements: Vec<ViewElement> = Vec::new();
+        let mut centers: Vec<Point> = Vec::new();
+        for i in 0..n {
+            let theta = 2.0 * std::f64::consts::PI * f64::from(i) / f64::from(n);
+            let x = radius * theta.cos();
+            let y = radius * theta.sin();
+            let e = stock(i + 1, "n", x, y);
+            centers.push(shape_center(&e));
+            elements.push(e);
+        }
+        for i in 0..n {
+            let from = i + 1;
+            let to = (i + 1) % n + 1;
+            elements.push(straight_link(100 + i, from, to));
+        }
+        let view = make_view(elements);
+        let m = compute_layout_metrics(&view, &cfg());
+
+        let expected = expected_loop_penalty(&centers);
+        assert!(
+            (m.loop_compactness - expected).abs() < 1e-9,
+            "loop_compactness {} != hand-computed penalty {}",
+            m.loop_compactness,
+            expected
+        );
+        // A regular octagon's penalty is ~0.05 -- "near 0" (a clean circle).
+        assert!(
+            m.loop_compactness < 0.1,
+            "a well-spread circular loop should score near 0, got {}",
+            m.loop_compactness
+        );
+    }
+
+    #[test]
+    fn test_loop_compactness_collapsed_loop_higher() {
+        // The SAME directed 8-cycle, but the nodes are squished onto a nearly
+        // straight line (a collapsed/collinear loop). The polygon area shrinks
+        // toward zero while the perimeter stays large, so Q -> 0 and the penalty
+        // (1 - Q) -> 1: clearly higher than the circular placement.
+        let n: i32 = 8;
+        let mut elements: Vec<ViewElement> = Vec::new();
+        let mut centers: Vec<Point> = Vec::new();
+        for i in 0..n {
+            // Spread along x with a tiny alternating y wobble, so the points are
+            // not exactly collinear but the polygon encloses negligible area
+            // against a ~1400 px perimeter.
+            let x = f64::from(i) * 100.0;
+            let y = if i % 2 == 0 { 0.0 } else { 1.0 };
+            let e = stock(i + 1, "n", x, y);
+            centers.push(shape_center(&e));
+            elements.push(e);
+        }
+        for i in 0..n {
+            let from = i + 1;
+            let to = (i + 1) % n + 1;
+            elements.push(straight_link(100 + i, from, to));
+        }
+        let view = make_view(elements);
+        let m = compute_layout_metrics(&view, &cfg());
+
+        let expected = expected_loop_penalty(&centers);
+        assert!(
+            (m.loop_compactness - expected).abs() < 1e-9,
+            "loop_compactness {} != hand-computed penalty {}",
+            m.loop_compactness,
+            expected
+        );
+        // A nearly-collinear loop scores near 1 (squished).
+        assert!(
+            m.loop_compactness > 0.9,
+            "a collapsed/collinear loop should score near 1, got {}",
+            m.loop_compactness
+        );
+    }
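+
+    // Worked check of why the zig-zag fixture above hits the maximum penalty
+    // under the hand-computed oracle. This is a std-only sketch, and the helper
+    // name is hypothetical. Visiting the eight wobbled points in order and
+    // closing back to the first traces a self-overlapping outline whose signed
+    // shoelace terms cancel: 0 - 200 + 200 - 400 + 400 - 600 + 600 + 0 = 0.
+    // With zero area and a ~1400 px perimeter, Q = 0 and 1 - Q is exactly 1.
+    fn sketch_shoelace_twice_area(pts: &[(f64, f64)]) -> f64 {
+        (0..pts.len())
+            .map(|i| {
+                let (ax, ay) = pts[i];
+                let (bx, by) = pts[(i + 1) % pts.len()];
+                ax * by - bx * ay
+            })
+            .sum()
+    }
+
+    #[test]
+    fn sketch_zigzag_loop_has_zero_signed_area() {
+        // Same x spacing and 0/1 y wobble as the collapsed-loop fixture.
+        let pts: Vec<(f64, f64)> = (0..8)
+            .map(|i: i32| (f64::from(i) * 100.0, if i % 2 == 0 { 0.0 } else { 1.0 }))
+            .collect();
+        assert_eq!(sketch_shoelace_twice_area(&pts), 0.0);
+    }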
+
+    #[test]
+    fn test_loop_compactness_no_cycle_is_zero() {
+        // A pure chain a -> b -> c (no feedback) has no directed cycle, so there
+        // is nothing to score: loop_compactness == 0.0.
+        let view = make_view(vec![
+            aux(1, "a", 0.0, 0.0),
+            aux(2, "b", 200.0, 0.0),
+            aux(3, "c", 400.0, 0.0),
+            straight_link(10, 1, 2),
+            straight_link(11, 2, 3),
+        ]);
+        let m = compute_layout_metrics(&view, &cfg());
+        assert_eq!(m.loop_compactness, 0.0);
+    }
+
+    #[test]
+    fn test_loop_compactness_two_node_mutual_pair_is_zero() {
+        // A 2-node mutual pair (a -> b -> a) is a cycle, but two points form no
+        // polygon (fewer than 3 distinct nodes), so it contributes nothing.
+        let view = make_view(vec![
+            aux(1, "a", 0.0, 0.0),
+            aux(2, "b", 200.0, 0.0),
+            straight_link(10, 1, 2),
+            straight_link(11, 2, 1),
+        ]);
+        let m = compute_layout_metrics(&view, &cfg());
+        assert_eq!(m.loop_compactness, 0.0);
+    }
+
+    #[test]
+    fn test_loop_compactness_flow_feedback_path_is_a_cycle() {
+        // A stock--flow--stock feedback path must enter the loop graph: stock #1
+        // and stock #2 connected by flow #3 (so #1 -> #3 -> #2), plus a link
+        // #2 -> #1 closing the loop. The cycle is {#1, #3, #2}: three distinct
+        // positioned nodes -> a real polygon -> a positive penalty.
+        let s1 = stock(1, "a", 0.0, 0.0);
+        let s2 = stock(2, "b", 300.0, 0.0);
+        let f = flow_between(3, "f", 150.0, 200.0, 1, 2);
+        let link = straight_link(10, 2, 1);
+        let view = make_view(vec![s1, s2, f, link]);
+        let m = compute_layout_metrics(&view, &cfg());
+        assert!(
+            m.loop_compactness > 0.0,
+            "a stock--flow--stock feedback path must form a scored loop, got {}",
+            m.loop_compactness
+        );
+    }
+
+    /// A stock--flow--stock loop whose flow has an extra pipe point placed far
+    /// from the valve, plus a closing link. The flow valve sits at `valve`; an
+    /// interior pipe point at `bend` (between the two attached endpoints) bends
+    /// the drawn pipe. `loop_compactness` must score the loop on the flow's
+    /// VALVE (its visual center), NOT on `flow_shape_bounds`' pipe-extent bbox
+    /// center, so the result must depend only on `valve` -- never on `bend`.
+    fn bent_flow_loop_view(valve: Point, bend: Point) -> datamodel::StockFlow {
+        let s1 = stock(1, "a", 0.0, 0.0);
+        let s2 = stock(2, "b", 300.0, 0.0);
+        let f = ViewElement::Flow(view_element::Flow {
+            name: "f".to_string(),
+            uid: 3,
+            x: valve.x,
+            y: valve.y,
+            label_side: LabelSide::Bottom,
+            points: vec![
+                view_element::FlowPoint {
+                    x: 0.0,
+                    y: 0.0,
+                    attached_to_uid: Some(1),
+                },
+                // An interior pipe point that bends the drawn pipe and stretches
+                // `flow_shape_bounds`' bbox, but is NOT the valve.
+                view_element::FlowPoint {
+                    x: bend.x,
+                    y: bend.y,
+                    attached_to_uid: None,
+                },
+                view_element::FlowPoint {
+                    x: 300.0,
+                    y: 0.0,
+                    attached_to_uid: Some(2),
+                },
+            ],
+            compat: None,
+            label_compat: None,
+        });
+        let link = straight_link(10, 2, 1);
+        make_view(vec![s1, s2, f, link])
+    }
+
+    #[test]
+    fn test_loop_compactness_scored_on_flow_valve_not_pipe_extent() {
+        // The loop vertex for a flow must be its VALVE (the renderer's visual
+        // center), not the center of `flow_shape_bounds` (which unions the valve
+        // box with every pipe point). Extending the pipe with a far interior
+        // point moves the pipe-extent bbox center but leaves the valve fixed, so
+        // `loop_compactness` -- which scores the feedback-loop polygon -- must be
+        // UNCHANGED. On the buggy (shape-box-midpoint) implementation it changes.
+        let valve = Point { x: 150.0, y: 200.0 };
+
+        // A pipe bend near the valve vs. one stretched far away. The valve is
+        // identical in both, so the loop polygon (stock--valve--stock) is too.
+        let near = compute_layout_metrics(
+            &bent_flow_loop_view(valve, Point { x: 150.0, y: 210.0 }),
+            &cfg(),
+        );
+        let far = compute_layout_metrics(
+            &bent_flow_loop_view(
+                valve,
+                Point {
+                    x: 150.0,
+                    y: 2000.0,
+                },
+            ),
+            &cfg(),
+        );
+
+        assert!(
+            near.loop_compactness > 0.0,
+            "fixture must form a real (positive-penalty) loop, got {}",
+            near.loop_compactness
+        );
+        assert!(
+            (near.loop_compactness - far.loop_compactness).abs() < 1e-12,
+            "loop_compactness must score the flow VALVE, not the pipe-extent bbox \
+             center: stretching the pipe changed it from {} to {}",
+            near.loop_compactness,
+            far.loop_compactness
+        );
+
+        // Non-vacuous guard: MOVING the valve (with the same pipe bend) DOES
+        // change the loop polygon, so the metric is not trivially constant.
+        let moved_valve = compute_layout_metrics(
+            &bent_flow_loop_view(Point { x: 150.0, y: 400.0 }, Point { x: 150.0, y: 210.0 }),
+            &cfg(),
+        );
+        assert!(
+            (near.loop_compactness - moved_valve.loop_compactness).abs() > 1e-9,
+            "moving the valve must change loop_compactness (test is not trivially \
+             constant): {} vs {}",
+            near.loop_compactness,
+            moved_valve.loop_compactness
+        );
+    }
+
+    #[test]
+    fn test_loop_compactness_deterministic_under_shuffle() {
+        // loop_compactness is a mean over cycles, each computed from node-box
+        // centers in cycle order. It must be invariant to the order elements
+        // appear in the view's element list.
+        let n: i32 = 6;
+        let radius = 250.0;
+        let mut elements: Vec<ViewElement> = Vec::new();
+        for i in 0..n {
+            let theta = 2.0 * std::f64::consts::PI * f64::from(i) / f64::from(n);
+            elements.push(stock(
+                i + 1,
+                "n",
+                radius * theta.cos(),
+                radius * theta.sin(),
+            ));
+        }
+        for i in 0..n {
+            let from = i + 1;
+            let to = (i + 1) % n + 1;
+            elements.push(straight_link(100 + i, from, to));
+        }
+        let base = compute_layout_metrics(&make_view(elements.clone()), &cfg());
+
+        // Reverse the element order (links before nodes, nodes reversed); the
+        // graph and its cycles are unchanged.
+        let mut shuffled = elements.clone();
+        shuffled.reverse();
+        let other = compute_layout_metrics(&make_view(shuffled), &cfg());
+
+        assert!(
+            (base.loop_compactness - other.loop_compactness).abs() < 1e-12,
+            "loop_compactness changed under element shuffle: {} vs {}",
+            base.loop_compactness,
+            other.loop_compactness
+        );
+        assert!(base.loop_compactness > 0.0);
+    }
+
+    // --- AC5.2: human-vs-auto reference-pair ordering under the committed weights ---
+    //
+    // The committed `MetricWeights::default()` must agree with the user's visual
+    // taste: on the agreed reference pairs the SHIPPED, hand-authored ("human")
+    // layout must score a lower `weighted_cost` than a machine-generated
+    // ("auto") layout of the SAME model. This is the objective validation of the
+    // calibration (Phase 4, AC5.2): if the metric and the weights did not agree
+    // with human taste on an obvious pair, the metric or the pair would be wrong.
+    //
+    // Construction (b) -- "human view vs generated layout" (design glossary): the
+    // four `default_projects` models each ship a hand-authored main view. We
+    // score that as-loaded view (human) and a fixed-seed `generate_layout_with_config`
+    // layout (auto) of the same model, and assert `human < auto`.
+    //
+    // Determinism + budget: layout is deterministic per seed (fix #633), so ONE
+    // fixed seed (not `generate_best_layout`'s multi-seed search) makes the test
+    // reproducible AND fast. The four default_projects are small (<= 42
+    // elements), so a single layout generation each is well under the per-test
+    // budget.
+    //
+    // Anchors: reliability, fishbanks, population, dp (logistic-growth). These all
+    // flip the right way under the committed weights (verified during
+    // calibration). `sir` is deliberately NOT a human<auto anchor -- its shipped
+    // reference genuinely obscures more labels than the auto layout, so the
+    // metric correctly prefers the auto; that direction is pinned separately by
+    // `test_sir_auto_beats_reference_under_default_weights` so the asymmetry is
+    // documented rather than silently dropped.
+
+    /// A fixed annealing seed for the auto layout. Any single fixed seed makes the
+    /// test deterministic; 42 matches the convention used elsewhere in the layout
+    /// config.
+    const REF_PAIR_SEED: u64 = 42;
+
+    /// Load a `default_projects` XMILE model by directory name, resolving the path
+    /// against `CARGO_MANIFEST_DIR` (= `src/simlin-engine`) like the layout
+    /// integration tests. Panics with a clear message on any I/O or parse failure
+    /// (a missing fixture is a test-environment bug, not a metric result).
+    fn load_default_project(dir: &str) -> datamodel::Project {
+        let path = format!(
+            "{}/../../default_projects/{}/model.xmile",
+            env!("CARGO_MANIFEST_DIR"),
+            dir
+        );
+        let file =
+            std::fs::File::open(&path).unwrap_or_else(|e| panic!("failed to open {path}: {e}"));
+        let mut reader = std::io::BufReader::new(file);
+        crate::compat::open_xmile(&mut reader)
+            .unwrap_or_else(|e| panic!("failed to parse {path}: {e:?}"))
+    }
+
+    /// The model's as-loaded, hand-authored main `StockFlow` view (the "human"
+    /// reference). Panics if the model has no such view -- every chosen anchor
+    /// ships one, so its absence is a fixture regression.
+    fn human_view(project: &datamodel::Project) -> datamodel::StockFlow {
+        let model = project
+            .get_model("main")
+            .expect("anchor model must have a 'main' model");
+        match model.views.first() {
+            Some(datamodel::View::StockFlow(sf)) if !sf.elements.is_empty() => sf.clone(),
+            _ => panic!("anchor model must ship a non-empty hand-authored main view"),
+        }
+    }
+
+    /// `weighted_cost` of the shipped human layout under the committed default
+    /// weights.
+    fn human_cost(project: &datamodel::Project) -> f64 {
+        let view = human_view(project);
+        compute_layout_metrics(&view, &LayoutConfig::default())
+            .weighted_cost(&MetricWeights::default())
+    }
+
+    /// `weighted_cost` of a single fixed-seed generated layout under the committed
+    /// default weights. Deterministic per seed, so the score is reproducible.
+    fn auto_cost(project: &datamodel::Project) -> f64 {
+        let cfg = LayoutConfig {
+            annealing_random_seed: REF_PAIR_SEED,
+            ..LayoutConfig::default()
+        };
+        let view = crate::layout::generate_layout_with_config(project, "main", cfg.clone(), None)
+            .expect("auto layout generation must succeed for the anchor model");
+        compute_layout_metrics(&view, &cfg).weighted_cost(&MetricWeights::default())
+    }
+
+    /// Assert the human reference beats the auto layout for one anchor model,
+    /// naming the model and both costs on failure (so a calibration regression is
+    /// immediately legible).
+    fn assert_human_beats_auto(dir: &str) {
+        let project = load_default_project(dir);
+        let human = human_cost(&project);
+        let auto = auto_cost(&project);
+        assert!(
+            human < auto,
+            "reference pair {dir}: expected human_cost ({human}) < auto_cost ({auto}) \
+             under MetricWeights::default()"
+        );
+    }
+
+    #[test]
+    fn test_reference_pair_reliability_human_beats_auto() {
+        assert_human_beats_auto("reliability");
+    }
+
+    #[test]
+    fn test_reference_pair_fishbanks_human_beats_auto() {
+        assert_human_beats_auto("fishbanks");
+    }
+
+    // Population is a MARGINAL taste anchor: under the committed default weights
+    // its human cost (~0.0521) beats auto (~0.0533) by only ~2.3%, far thinner
+    // than the other anchors (reliability ~8.5%, fishbanks ~12%,
+    // logistic-growth ~58%). The layout is deterministic per seed, so the
+    // assertion is not flaky -- but if it ever fails it should be read as
+    // "population sits near the boundary" rather than necessarily a real metric
+    // regression. The robust signal lives in reliability/fishbanks/logistic-growth.
+    #[test]
+    fn test_reference_pair_population_human_beats_auto() {
+        assert_human_beats_auto("population");
+    }
+
+    #[test]
+    fn test_reference_pair_dp_logistic_growth_human_beats_auto() {
+        assert_human_beats_auto("logistic-growth");
+    }
+
+    #[test]
+    fn test_sir_auto_beats_reference_under_default_weights() {
+        // The documented NON-anchor: SIR's shipped reference obscures more labels
+        // than the auto layout, so the metric correctly prefers the auto. This
+        // pins that direction so the asymmetry (why SIR is excluded from the
+        // human<auto anchors) is recorded rather than silently assumed.
+        let path = format!(
+            "{}/../../test/test-models/samples/SIR/SIR.stmx",
+            env!("CARGO_MANIFEST_DIR")
+        );
+        let file =
+            std::fs::File::open(&path).unwrap_or_else(|e| panic!("failed to open {path}: {e}"));
+        let mut reader = std::io::BufReader::new(file);
+        let project = crate::compat::open_xmile(&mut reader)
+            .unwrap_or_else(|e| panic!("failed to parse {path}: {e:?}"));
+
+        let human = human_cost(&project);
+        let auto = auto_cost(&project);
+        assert!(
+            auto < human,
+            "sir is a documented non-anchor: expected auto_cost ({auto}) < human_cost ({human}) \
+             under MetricWeights::default() (its reference obscures more labels than the auto)"
+        );
+    }
+}
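The reference-pair comments above quote each anchor's margin as a percentage. Here is a minimal sketch of that arithmetic. It assumes the margin is measured relative to the auto cost, because the source does not say which denominator it uses. `relative_margin` is a hypothetical name, not an engine API.

```rust
/// Hypothetical helper: the fractional margin by which `human` beats `auto`,
/// measured relative to the auto cost (an assumption; the tests only quote
/// percentages). A positive value means the human reference has the lower cost.
fn relative_margin(human: f64, auto: f64) -> f64 {
    (auto - human) / auto
}

fn main() {
    // Approximate population costs quoted in the test comment above.
    let margin = relative_margin(0.0521, 0.0533);
    println!("population margin: {:.2}%", margin * 100.0);
}
```

With these rounded costs the result is about 2.25%, which matches the quoted ~2.3%. Measuring against the human cost instead gives about 2.30%, so the comment's figure holds under either convention.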
diff --git a/src/simlin-engine/src/layout/mod.rs b/src/simlin-engine/src/layout/mod.rs
index db63e4088..f8705a907 100644
--- a/src/simlin-engine/src/layout/mod.rs
+++ b/src/simlin-engine/src/layout/mod.rs
@@ -6,8 +6,10 @@ pub mod annealing;
 pub mod chain;
 pub mod config;
 pub mod connector;
+pub mod eval_stats;
 pub mod graph;
 pub mod metadata;
+pub mod metrics;
 pub mod placement;
 pub mod sfdp;
 pub mod text;
@@ -71,7 +73,11 @@ struct FlowAttachment {
 /// Result of a single layout generation, used to select the best among parallel attempts.
 struct LayoutResult {
     view: datamodel::StockFlow,
-    crossings: usize,
+    /// The full calibrated layout-quality metric for `view` (Sigma w_i * term_i,
+    /// with `MetricWeights::default()`). `select_best_layout` minimizes this; its
+    /// `crossings` term already captures the accurate connector-crossing count, so
+    /// there is no separate `crossings` field.
+    weighted_cost: f64,
     seed: u64,
 }
 
@@ -1117,23 +1123,38 @@ pub fn diff_connectors(state: &mut LayoutState, metadata: &ComputedMetadata) {
     // Track which old links have been consumed so each is used at most once.
     let mut consumed_old_links: HashSet<(i32, i32)> = HashSet::new();
 
+    // Iterate edges in a deterministic order. `new_edges` is a HashSet, so its
+    // iteration order is per-process random; since each newly-created link both
+    // allocates a sequential `uid` and is appended to `state.elements` in this
+    // loop, hash order would otherwise assign different uids / element ordering
+    // to the same logical link run-to-run (the incremental analogue of #633).
+    let mut sorted_new_edges: Vec<(i32, i32)> = new_edges.iter().copied().collect();
+    sorted_new_edges.sort_unstable();
+
     // Add back preserved links (unchanged) and create new links
-    for &(from_uid, to_uid) in &new_edges {
+    for (from_uid, to_uid) in sorted_new_edges {
         if let Some(old_link) = old_links.get(&(from_uid, to_uid)) {
             // Preserved: keep the old link exactly as-is
             state.elements.push(old_link.clone());
             consumed_old_links.insert((from_uid, to_uid));
-        } else if let Some((&key, old_link)) = old_links.iter().find(|&(&(of, ot), _)| {
-            if consumed_old_links.contains(&(of, ot)) {
-                return false;
-            }
-            let rf = alias_to_primary.get(&of).copied().unwrap_or(of);
-            let rt = alias_to_primary.get(&ot).copied().unwrap_or(ot);
-            rf == from_uid && rt == to_uid
-        }) {
+        } else if let Some(key) = old_links
+            .keys()
+            .copied()
+            .filter(|&(of, ot)| {
+                if consumed_old_links.contains(&(of, ot)) {
+                    return false;
+                }
+                let rf = alias_to_primary.get(&of).copied().unwrap_or(of);
+                let rt = alias_to_primary.get(&ot).copied().unwrap_or(ot);
+                rf == from_uid && rt == to_uid
+            })
+            // Pick the lowest matching key so the alias-match selection is
+            // deterministic; HashMap iteration order would otherwise vary.
+            .min()
+        {
             // Preserved via alias: the old link targets an alias whose primary
             // variable matches this dependency edge. Keep the alias link as-is.
-            state.elements.push(old_link.clone());
+            state.elements.push(old_links[&key].clone());
             consumed_old_links.insert(key);
         } else if let Some((from_ident, to_ident)) = new_edge_idents.get(&(from_uid, to_uid)) {
             // Added: create new link with default shape
@@ -1170,14 +1191,19 @@ pub fn diff_connectors(state: &mut LayoutState, metadata: &ComputedMetadata) {
     // match a valid dependency. Imported views may have multiple rendered
     // connectors for the same dependency (e.g., links to two different
     // aliases of the same variable).
-    for (&(of, ot), old_link) in &old_links {
+    // Iterate in a deterministic order for the same reason as the new-edge loop:
+    // the preserved links are appended to `state.elements`, so HashMap iteration
+    // order would otherwise perturb element ordering run-to-run.
+    let mut sorted_old_links: Vec<&(i32, i32)> = old_links.keys().collect();
+    sorted_old_links.sort_unstable();
+    for &(of, ot) in sorted_old_links {
         if consumed_old_links.contains(&(of, ot)) {
             continue;
         }
         let rf = alias_to_primary.get(&of).copied().unwrap_or(of);
         let rt = alias_to_primary.get(&ot).copied().unwrap_or(ot);
         if new_edges.contains(&(rf, rt)) {
-            state.elements.push(old_link.clone());
+            state.elements.push(old_links[&(of, ot)].clone());
         }
     }
 }
@@ -2454,7 +2480,16 @@ fn run_sfdp_with_rigid_chains(
     let mut center_y = config.start_y;
     let mut count = 0;
 
-    for (var_ident, node_id) in var_to_node {
+    // `var_to_node` is a HashMap, so its iteration order is per-process random.
+    // Two loops below are order-sensitive: the centroid accumulation sums floats
+    // (non-associative, so hash order perturbs the result) and the aux-placement
+    // loop assigns each unpositioned aux a polar seed angle by its iteration rank.
+    // Materialize a deterministic sorted view and iterate THAT in both loops so a
+    // fixed (model, seed) yields a bit-identical layout across repeated calls (#633).
+    let mut entries: Vec<(&String, &String)> = var_to_node.iter().collect();
+    entries.sort();
+
+    for &(var_ident, node_id) in &entries {
         if let Some(uid) = state.uid_manager.get_uid(var_ident)
             && let Some(&pos) = state.positions.get(&uid)
         {
@@ -2489,7 +2524,7 @@ fn run_sfdp_with_rigid_chains(
     }
 
     let mut aux_index = 0;
-    for node_id in var_to_node.values() {
+    for &(_var_ident, node_id) in &entries {
         if initial_layout.contains_key(node_id) {
             continue;
         }
@@ -4327,67 +4362,217 @@ fn detect_chains(
     chains
 }
 
-/// Count edge crossings in a completed StockFlow view.
+/// Whether `p` lies on the segment from flow point `a` to flow point `b`,
+/// within a small pixel tolerance. Used to find the pipe segment a flow's valve
+/// sits on so the valve can be injected as a shared `elem_{flow.uid}` vertex.
 ///
-/// Arc and multi-point link shapes are approximated as straight segments
-/// from source to target position, so counts for diagrams with curved
-/// connectors are approximate.
-pub fn count_view_crossings(view: &datamodel::StockFlow) -> usize {
+/// The perpendicular distance from `p` to the line must be tiny, and `p` must
+/// project within the segment (parameter in `[0, 1]`). A degenerate segment
+/// (`a == b`) only matches when `p` coincides with it.
+fn point_on_segment(
+    p: Position,
+    a: &datamodel::view_element::FlowPoint,
+    b: &datamodel::view_element::FlowPoint,
+) -> bool {
+    const TOL: f64 = 0.5; // pixels
+    let a = Position::new(a.x, a.y);
+    let b = Position::new(b.x, b.y);
+    let ab = b - a;
+    let ap = p - a;
+    let len_sq = ab.dot(ab);
+    if len_sq < f64::EPSILON {
+        // Degenerate segment: only "on" it if p coincides with the point.
+        return ap.dot(ap) < TOL * TOL;
+    }
+    // Project p onto the line; require it to fall within the segment.
+    let t = ap.dot(ab) / len_sq;
+    if !(0.0..=1.0).contains(&t) {
+        return false;
+    }
+    // Perpendicular distance: |ap x ab| / |ab|.
+    let perp = ap.cross_2d(ab).abs() / len_sq.sqrt();
+    perp < TOL
+}
+
+/// Build the set of [`LineSegment`]s that crossing detection runs over for a
+/// completed StockFlow view. This is the single source of geometry shared by
+/// [`count_view_crossings`] and the layout quality metric, so a layout's
+/// crossing score can never disagree with the geometry the renderer draws.
+///
+/// Connector geometry comes from [`crate::diagram::connector::connector_polyline`],
+/// the exact polyline the SVG renderer draws: straight links are clipped to
+/// element boundaries, arcs are sampled along their arc circle, and MultiPoint
+/// links contribute nothing (the renderer draws nothing for them today).
+///
+/// Element endpoints are resolved over *all* element kinds, so a link incident
+/// on a Module or Alias is no longer dropped (the previous chord-based code
+/// only mapped Stock/Flow/Aux/Cloud, silently undercounting such crossings).
+///
+/// Node naming suppresses self- and shared-endpoint "crossings" exactly like
+/// before: a connector's first vertex is `elem_{from_uid}` and its last is
+/// `elem_{to_uid}` (so two connectors sharing an element endpoint never count),
+/// while internal arc-sample vertices are `link_{link.uid}#{i}` (so the
+/// consecutive segments of one arc share an internal node name and never count
+/// as self-crossings).
+///
+/// A flow's pipe vertices share those same `elem_{uid}` names with whatever
+/// element they connect to, so a link incident on the flow grazes but does not
+/// "cross" the pipe at the shared connection point. A point attached to a
+/// stock/cloud is named `elem_{attached_to_uid}` (matching a link whose
+/// endpoint is that stock/cloud), and the flow's valve -- which sits on the
+/// pipe, not necessarily at a stored point -- is injected as an extra vertex
+/// named `elem_{flow.uid}` so a link incident on the valve (its `to_uid`/
+/// `from_uid` is the flow's own element uid) is suppressed there too. A
+/// genuinely free interior point (no attachment, not the valve) keeps the
+/// historic per-flow `flow_{uid}#{i}` name, so a link that crosses the pipe
+/// mid-span -- sharing no element with the flow -- is still counted.
+fn build_view_segments(view: &datamodel::StockFlow) -> Vec<LineSegment> {
     if view.elements.is_empty() {
-        return 0;
+        return Vec::new();
     }
 
-    let mut uid_positions: HashMap<i32, Position> = HashMap::new();
+    // Resolve every element by uid so a link can find its endpoints regardless
+    // of the endpoint's kind (Module/Alias included).
+    let mut uid_elements: HashMap<i32, &ViewElement> = HashMap::new();
     for elem in &view.elements {
-        match elem {
-            ViewElement::Stock(s) => {
-                uid_positions.insert(s.uid, Position::new(s.x, s.y));
-            }
-            ViewElement::Flow(f) => {
-                uid_positions.insert(f.uid, Position::new(f.x, f.y));
-            }
-            ViewElement::Aux(a) => {
-                uid_positions.insert(a.uid, Position::new(a.x, a.y));
-            }
-            ViewElement::Cloud(c) => {
-                uid_positions.insert(c.uid, Position::new(c.x, c.y));
-            }
-            _ => {}
-        }
+        uid_elements.insert(elem.get_uid(), elem);
     }
 
+    // Crossing detection is center-based and deterministic; no element is
+    // treated as arrayed (matching the historic behavior).
+    let not_arrayed = |_: &str| false;
+
     let mut segments: Vec<LineSegment> = Vec::new();
 
     for elem in &view.elements {
         match elem {
             ViewElement::Link(link) => {
-                if let (Some(&from_pos), Some(&to_pos)) = (
-                    uid_positions.get(&link.from_uid),
-                    uid_positions.get(&link.to_uid),
-                ) {
+                let (Some(&from), Some(&to)) = (
+                    uid_elements.get(&link.from_uid),
+                    uid_elements.get(&link.to_uid),
+                ) else {
+                    continue; // an endpoint is genuinely missing
+                };
+
+                let polyline = crate::diagram::connector::connector_polyline(
+                    link,
+                    from,
+                    to,
+                    &not_arrayed,
+                    crate::diagram::connector::ARC_POLYLINE_SAMPLES,
+                );
+                if polyline.len() < 2 {
+                    continue; // MultiPoint / degenerate: nothing drawn
+                }
+
+                let last_idx = polyline.len() - 1;
+                // Name the first vertex after the source element and the last
+                // after the target element so two connectors sharing an element
+                // endpoint are suppressed; name internal vertices per-link so a
+                // connector never crosses itself.
+                let vertex_name = |i: usize| -> String {
+                    if i == 0 {
+                        format!("elem_{}", link.from_uid)
+                    } else if i == last_idx {
+                        format!("elem_{}", link.to_uid)
+                    } else {
+                        format!("link_{}#{}", link.uid, i)
+                    }
+                };
+
+                for i in 0..last_idx {
+                    let a = polyline[i];
+                    let b = polyline[i + 1];
                     segments.push(LineSegment {
-                        start: from_pos,
-                        end: to_pos,
-                        from_node: format!("elem_{}", link.from_uid),
-                        to_node: format!("elem_{}", link.to_uid),
+                        start: Position::new(a.x, a.y),
+                        end: Position::new(b.x, b.y),
+                        from_node: vertex_name(i),
+                        to_node: vertex_name(i + 1),
                     });
                 }
             }
             ViewElement::Flow(flow) => {
-                for i in 0..flow.points.len().saturating_sub(1) {
-                    segments.push(LineSegment {
-                        start: Position::new(flow.points[i].x, flow.points[i].y),
-                        end: Position::new(flow.points[i + 1].x, flow.points[i + 1].y),
-                        from_node: format!("flow_{}#{}", flow.uid, i),
-                        to_node: format!("flow_{}#{}", flow.uid, i + 1),
-                    });
+                if flow.points.len() < 2 {
+                    continue;
+                }
+
+                // Build the pipe as a sequence of named vertices. A point
+                // attached to a stock/cloud shares that element's `elem_{uid}`
+                // name; a free interior point keeps a per-flow `flow_{uid}#{i}`
+                // name. The valve (the flow's own element, at `flow.x/flow.y`)
+                // is injected as an `elem_{flow.uid}` vertex on the pipe segment
+                // whose span contains it, so a link incident on the valve is
+                // suppressed at that shared connection point. Consecutive
+                // segments of one flow always share the joining vertex name, so
+                // a flow never self-crosses.
+                let point_name = |i: usize| -> String {
+                    match flow.points[i].attached_to_uid {
+                        Some(uid) => format!("elem_{uid}"),
+                        None => format!("flow_{}#{}", flow.uid, i),
+                    }
+                };
+
+                let valve = Position::new(flow.x, flow.y);
+                let valve_name = format!("elem_{}", flow.uid);
+                // The pipe segment the valve sits strictly interior to. `None`
+                // when the valve coincides with a stored point or (in a
+                // hand-edited view) drifted off the polyline; the pipe is then
+                // not split and the existing point names hold.
+                let valve_seg = (0..flow.points.len() - 1).find(|&i| {
+                    let a = Position::new(flow.points[i].x, flow.points[i].y);
+                    let b = Position::new(flow.points[i + 1].x, flow.points[i + 1].y);
+                    valve != a
+                        && valve != b
+                        && point_on_segment(valve, &flow.points[i], &flow.points[i + 1])
+                });
+
+                for i in 0..flow.points.len() - 1 {
+                    let a = Position::new(flow.points[i].x, flow.points[i].y);
+                    let b = Position::new(flow.points[i + 1].x, flow.points[i + 1].y);
+                    let a_name = point_name(i);
+                    let b_name = point_name(i + 1);
+
+                    if Some(i) == valve_seg {
+                        // Split this pipe segment at the valve so both halves
+                        // share the `elem_{flow.uid}` vertex.
+                        segments.push(LineSegment {
+                            start: a,
+                            end: valve,
+                            from_node: a_name,
+                            to_node: valve_name.clone(),
+                        });
+                        segments.push(LineSegment {
+                            start: valve,
+                            end: b,
+                            from_node: valve_name.clone(),
+                            to_node: b_name,
+                        });
+                    } else {
+                        segments.push(LineSegment {
+                            start: a,
+                            end: b,
+                            from_node: a_name,
+                            to_node: b_name,
+                        });
+                    }
                 }
             }
             _ => {}
         }
     }
 
-    annealing::count_crossings(&segments)
+    segments
+}
+
+/// Count edge crossings in a completed StockFlow view.
+///
+/// Crossings are counted on the drawn connector polylines: straight links
+/// clipped to element boundaries, arcs sampled along their arc circle, and
+/// flow pipes as their point polylines. All link endpoints are resolved
+/// (Module/Alias included), so the count reflects the geometry the renderer
+/// actually draws rather than a straight-chord approximation.
+pub fn count_view_crossings(view: &datamodel::StockFlow) -> usize {
+    annealing::count_crossings(&build_view_segments(view))
 }
 
 /// Assemble a [`datamodel::StockFlow`] from finalized layout state, copying
@@ -4432,7 +4617,12 @@ fn build_stock_flow_from_state(
 
 /// Seeds for parallel layout generation. Each seed produces a different SFDP
-/// layout; the one with fewest connector crossings is selected.
+/// layout; the one with the lowest `weighted_cost` is selected.
-const LAYOUT_SEEDS: [u64; 4] = [42, 123, 456, 789];
+///
+/// These are also the layout-quality sweep's best-of-k production proxy: the
+/// `layout_eval` example scores the best layout over exactly this seed set to
+/// estimate what production (which picks best-of-`LAYOUT_SEEDS`) would ship,
+/// so it is exposed publicly. The value and behavior are unchanged.
+pub const LAYOUT_SEEDS: [u64; 4] = [42, 123, 456, 789];
 
 /// Apply a model patch incrementally to an existing diagram view,
 /// preserving existing element positions and only placing new or
@@ -5078,8 +5268,10 @@ pub fn generate_layout_with_config(
     fresh_layout(model, &metadata, &config)
 }
 
-/// Generate multiple layouts with different seeds in parallel and pick the
-/// one with fewest crossings. On tie, the lowest seed wins.
+/// Generate multiple layouts with different seeds in parallel and pick the one
+/// that minimizes the full calibrated layout-quality metric (`weighted_cost`,
+/// which includes the accurate connector-crossing count alongside node/label
+/// overlap and loop compactness). On tie, the lowest seed wins.
 pub fn generate_best_layout(
     project: &datamodel::Project,
     model_name: &str,
@@ -5095,10 +5287,14 @@ pub fn generate_best_layout(
         let mut cfg = config.clone();
         cfg.annealing_random_seed = seed;
         let view = fresh_layout(model, &metadata, &cfg)?;
-        let crossings = count_view_crossings(&view);
+        // Score the candidate with the full calibrated metric. Its `crossings`
+        // term computes the accurate connector-crossing count internally, so we
+        // no longer call `count_view_crossings` directly here.
+        let metrics = metrics::compute_layout_metrics(&view, &cfg);
+        let weighted_cost = metrics.weighted_cost(&metrics::MetricWeights::default());
         Ok(LayoutResult {
             view,
-            crossings,
+            weighted_cost,
             seed,
         })
     };
@@ -5128,7 +5324,12 @@ pub fn compute_layout_metadata(
     compute_metadata(project, model_name, db_state)
 }
 
-/// Pick the layout with fewest crossings; on tie, the one from the lowest seed.
+/// Pick the layout that minimizes the full calibrated layout-quality metric
+/// (`weighted_cost`); on tie, the one from the lowest seed. NaN-cost candidates
+/// (degenerate layouts) never win over a finite one regardless of position in
+/// the result set; if ALL candidates are NaN the earliest is kept
+/// deterministically. The first `Err` short-circuits, and an empty result set is
+/// an error.
 fn select_best_layout(
     results: Vec<Result<LayoutResult, String>>,
 ) -> Result<datamodel::StockFlow, String> {
@@ -5139,13 +5340,24 @@ fn select_best_layout(
         best = Some(match best {
             None => lr,
             Some(prev) => {
-                if lr.crossings < prev.crossings
-                    || (lr.crossings == prev.crossings && lr.seed < prev.seed)
-                {
-                    lr
+                // NaN-safe and order-independent: a degenerate NaN-cost
+                // candidate never wins over a finite one regardless of which
+                // came first. A plain `<` already drops a NaN *challenger*
+                // (`NaN < finite` is false), but it would NOT let a finite
+                // challenger overtake a NaN *running best* (`finite < NaN` and
+                // `finite == NaN` are both false), so the first seed's NaN would
+                // be sticky. The explicit NaN branches fix that asymmetry. If
+                // ALL candidates are NaN the challenger is never better, so the
+                // earliest is kept -- deterministic regardless.
+                let better = if lr.weighted_cost.is_nan() {
+                    false // a NaN challenger never wins
+                } else if prev.weighted_cost.is_nan() {
+                    true // a finite challenger always beats a NaN running best
                 } else {
-                    prev
-                }
+                    lr.weighted_cost < prev.weighted_cost
+                        || (lr.weighted_cost == prev.weighted_cost && lr.seed < prev.seed)
+                };
+                if better { lr } else { prev }
             }
         });
     }
@@ -5157,3 +5369,11 @@ fn select_best_layout(
 #[cfg(test)]
 #[path = "layout_tests.rs"]
 mod tests;
+
+#[cfg(test)]
+#[path = "crossings_tests.rs"]
+mod crossings_tests;
+
+#[cfg(test)]
+#[path = "layout_selection_tests.rs"]
+mod layout_selection_tests;
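The NaN handling in `select_best_layout` is easiest to see in isolation. The sketch below re-implements only its comparison rule. It uses a hypothetical `Candidate` struct as a stand-in for the private `LayoutResult` and omits the `view` and `Err` handling. It illustrates the rule and is not the engine code.

```rust
/// Hypothetical stand-in for the engine's private `LayoutResult`
/// (the real struct also carries the generated view).
#[derive(Debug, Clone, Copy)]
struct Candidate {
    weighted_cost: f64,
    seed: u64,
}

/// True if `challenger` should replace `best`: a NaN challenger never wins,
/// a finite challenger always beats a NaN running best, and equal finite
/// costs fall back to the lower seed.
fn is_better(challenger: &Candidate, best: &Candidate) -> bool {
    if challenger.weighted_cost.is_nan() {
        false
    } else if best.weighted_cost.is_nan() {
        true
    } else {
        challenger.weighted_cost < best.weighted_cost
            || (challenger.weighted_cost == best.weighted_cost && challenger.seed < best.seed)
    }
}

/// Fold the candidates down to the best one; `None` for an empty input.
fn select_best(candidates: &[Candidate]) -> Option<Candidate> {
    candidates.iter().copied().fold(None, |best, c| match best {
        None => Some(c),
        Some(prev) => Some(if is_better(&c, &prev) { c } else { prev }),
    })
}

fn main() {
    let cands = [
        Candidate { weighted_cost: f64::NAN, seed: 42 },
        Candidate { weighted_cost: 0.5, seed: 456 },
        Candidate { weighted_cost: 0.5, seed: 123 },
    ];
    // The leading NaN is not sticky, and the tie at 0.5 goes to the lower seed.
    println!("{:?}", select_best(&cands).map(|c| c.seed)); // Some(123)
}
```

Without the explicit `prev` NaN branch, the first seed's NaN would survive every comparison, because `finite < NaN` and `finite == NaN` are both false.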
diff --git a/src/simlin-engine/tests/layout.rs b/src/simlin-engine/tests/layout.rs
index fb6b4a1a0..4373cab67 100644
--- a/src/simlin-engine/tests/layout.rs
+++ b/src/simlin-engine/tests/layout.rs
@@ -2223,3 +2223,113 @@ fn test_incremental_add_chain_rebuilds_existing_cloud_flow() {
         "chain_flow and waste_flow should not overlap after incremental add (dist={dist})"
     );
 }
+
+/// Count how many elements differ between two views generated for the same
+/// model.  Element ordering is structurally stable (see
+/// `test_layout_structural_consistency`), so a positional comparison can be
+/// done index-by-index; `ViewElement` derives `PartialEq` over its f64
+/// coordinates (and flow `points`), so the comparison is exact f64 equality.
+/// Returns `(differing, total)`.
+fn count_layout_differences(
+    a: &simlin_engine::datamodel::StockFlow,
+    b: &simlin_engine::datamodel::StockFlow,
+) -> (usize, usize) {
+    assert_eq!(
+        a.elements.len(),
+        b.elements.len(),
+        "layouts must have the same number of elements to compare"
+    );
+    let differing = a
+        .elements
+        .iter()
+        .zip(b.elements.iter())
+        .filter(|(ea, eb)| ea != eb)
+        .count();
+    (differing, a.elements.len())
+}
+
+/// A layout produced for a fixed (model, annealing_random_seed) must be
+/// bit-identical across repeated serial calls in one process (issue #633).
+/// The RNG is already seeded deterministically; the only remaining source of
+/// run-to-run drift was per-instance-random `HashMap` iteration order inside
+/// `run_sfdp_with_rigid_chains` (centroid float accumulation and aux initial
+/// placement).  SIR has auxiliaries, so it exercises the aux-placement loop.
+#[test]
+fn test_layout_deterministic_per_seed() {
+    let project = load_project("test/test-models/samples/SIR/SIR.stmx");
+
+    let config = LayoutConfig {
+        annealing_random_seed: 42,
+        ..Default::default()
+    };
+
+    let view1 = generate_layout_with_config(&project, MAIN_MODEL, config.clone(), None)
+        .expect("first layout should succeed");
+    let view2 = generate_layout_with_config(&project, MAIN_MODEL, config, None)
+        .expect("second layout should succeed");
+
+    let (differing, total) = count_layout_differences(&view1, &view2);
+    assert_eq!(
+        differing, 0,
+        "layout for a fixed seed must be deterministic: {differing}/{total} elements differ \
+         between two serial calls"
+    );
+}
+
+/// The incremental layout path (`incremental_layout` ->
+/// `compute_new_element_positions`) must also be deterministic for a fixed
+/// model + patch.  This guards against the same class of HashMap-iteration
+/// nondeterminism in the incremental code paths.
+#[test]
+fn test_incremental_layout_deterministic() {
+    use simlin_engine::datamodel;
+    use simlin_engine::layout::incremental_layout;
+    use simlin_engine::{ModelOperation, ModelPatch};
+
+    let project = load_project("test/test-models/samples/SIR/SIR.stmx");
+    let old_view =
+        generate_layout(&project, MAIN_MODEL, None).expect("initial layout should succeed");
+
+    let mut patched_project = project.clone();
+    let model = patched_project.get_model_mut(MAIN_MODEL).unwrap();
+    model
+        .variables
+        .push(datamodel::Variable::Aux(datamodel::Aux {
+            ident: "vaccination_rate".to_string(),
+            equation: datamodel::Equation::Scalar("susceptible * 0.01".to_string()),
+            documentation: String::new(),
+            units: None,
+            gf: None,
+            ai_state: None,
+            uid: None,
+            compat: Default::default(),
+        }));
+
+    let make_patch = || ModelPatch {
+        name: String::new(),
+        ops: vec![ModelOperation::UpsertAux(datamodel::Aux {
+            ident: "vaccination_rate".to_string(),
+            equation: datamodel::Equation::Scalar("susceptible * 0.01".to_string()),
+            documentation: String::new(),
+            units: None,
+            gf: None,
+            ai_state: None,
+            uid: None,
+            compat: Default::default(),
+        })],
+    };
+
+    let new_view1 =
+        incremental_layout(&old_view, &patched_project, MAIN_MODEL, &make_patch(), None)
+            .expect("first incremental layout should succeed");
+    let new_view2 =
+        incremental_layout(&old_view, &patched_project, MAIN_MODEL, &make_patch(), None)
+            .expect("second incremental layout should succeed");
+
+    let (differing, total) = count_layout_differences(&new_view1, &new_view2);
+    assert_eq!(
+        differing, 0,
+        "incremental layout must be deterministic: {differing}/{total} elements differ \
+         between two serial calls"
+    );
+}
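The determinism fix behind these tests (#633) comes down to keeping `HashMap` iteration order away from order-sensitive computations. Below is a minimal sketch of the pattern used in `run_sfdp_with_rigid_chains`: collect the entries, sort them, and only then accumulate floats. The `deterministic_centroid` helper and the positions are hypothetical.

```rust
use std::collections::HashMap;

/// Centroid of the positions in `map`, accumulated in sorted key order so the
/// floating-point sum (and therefore the result) does not depend on the
/// HashMap's per-instance iteration order.
fn deterministic_centroid(map: &HashMap<String, (f64, f64)>) -> Option<(f64, f64)> {
    if map.is_empty() {
        return None;
    }
    let mut entries: Vec<(&String, &(f64, f64))> = map.iter().collect();
    // Keys are unique, so sorting by key alone is a total order.
    entries.sort_by(|a, b| a.0.cmp(b.0));
    let (mut sx, mut sy) = (0.0, 0.0);
    for (_, pos) in &entries {
        sx += pos.0;
        sy += pos.1;
    }
    let n = entries.len() as f64;
    Some((sx / n, sy / n))
}

fn main() {
    // Float addition is not associative, which is why summation order matters.
    assert_ne!((0.1 + 0.2) + 0.3, 0.1 + (0.2 + 0.3));

    let pts = [
        ("susceptible", (10.0, 0.1)),
        ("infected", (20.0, 0.2)),
        ("recovered", (30.0, 0.3)),
    ];
    let forward: HashMap<String, (f64, f64)> =
        pts.iter().map(|&(k, p)| (k.to_string(), p)).collect();
    let reverse: HashMap<String, (f64, f64)> =
        pts.iter().rev().map(|&(k, p)| (k.to_string(), p)).collect();

    // Each map has its own randomly seeded hasher, yet the sorted sums agree bit for bit.
    let a = deterministic_centroid(&forward).unwrap();
    let b = deterministic_centroid(&reverse).unwrap();
    assert_eq!(a.0.to_bits(), b.0.to_bits());
    assert_eq!(a.1.to_bits(), b.1.to_bits());
    println!("centroid = {a:?}");
}
```

Sorting by key is enough here because the keys are unique. The engine code sorts the `(&String, &String)` pairs directly, which yields the same order for the same reason.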