diff --git a/docs/README.md b/docs/README.md index b01911620..f96bf8a82 100644 --- a/docs/README.md +++ b/docs/README.md @@ -32,9 +32,11 @@ - [design-plans/2026-05-19-clearn-residual.md](design-plans/2026-05-19-clearn-residual.md) -- Close C-LEARN's residual (#590/#591) as general Vensim import/simulation primitives: arrayed inline graphical functions, import-time macro shadowing, user-macro INITIAL recurrence, residual attribution; 5 phases - [design-plans/2026-05-20-wasm-backend.md](design-plans/2026-05-20-wasm-backend.md) -- WebAssembly code-generation backend: compile a model to one self-contained wasm module as an alternative to the bytecode VM (for fast interactive re-simulation), validated to full VM parity; 8 phases - [design-plans/2026-05-22-engine-wasm-sim.md](design-plans/2026-05-22-engine-wasm-sim.md) -- Integrate the wasm backend into `@simlin/engine` as a selectable engine (`Model.simulate({engine:'wasm'})`): vm-vs-wasm demux below the `Sim` facade in `DirectBackend`, a resumable blob run ABI for `runTo`, and a node VM-vs-wasm benchmark; 4 phases + - [design-plans/2026-05-22-layout-quality-eval.md](design-plans/2026-05-22-layout-quality-eval.md) -- Layout quality evaluation + hill-climbing harness: a pure geometry-accurate `LayoutMetrics` (overlap/sprawl/accurate-arc crossings) and benchstat-style seed-distribution stats, an on-demand corpus sweep that renders and scores layouts against human references, and Rung 0 (rank seeds by `weighted_cost`); 5 phases - [plans/](plans/README.md) -- Implementation plans (active and completed) - [test-plans/](test-plans/) -- Human verification plans for completed features - [test-plans/2026-05-22-engine-wasm-sim.md](test-plans/2026-05-22-engine-wasm-sim.md) -- Manual verification for the `@simlin/engine` selectable wasm engine (`Model.simulate({engine:'wasm'})`): re-running the automated gates, driving the gated/`#[ignore]`d heavy tests, and the human-judged extras (interactive scrubbing feel, VM-vs-wasm benchmark 
numbers); all 25 ACs already have automated coverage + - [test-plans/2026-05-22-layout-quality-eval.md](test-plans/2026-05-22-layout-quality-eval.md) -- Manual verification for the layout-quality eval: running the on-demand corpus sweep and inspecting its `target/layout-eval/` artifacts (metrics.json, the worst-first contact-sheet), plus the human-judgment calibration gate (best/median/worst ordering, reference-vs-auto scoring, weight magnitudes) - `implementation-plans/` -- Detailed phase-by-phase implementation plans, created during plan execution ## Security diff --git a/docs/design-plans/2026-05-22-layout-quality-eval.md b/docs/design-plans/2026-05-22-layout-quality-eval.md new file mode 100644 index 000000000..41195892a --- /dev/null +++ b/docs/design-plans/2026-05-22-layout-quality-eval.md @@ -0,0 +1,564 @@ +# Layout Quality Evaluation and Hill-Climbing Harness Design + +## Summary + +This work builds a closed-loop measurement and tooling harness around `simlin-engine`'s +automatic diagram layout, so that an agent (or human) can improve layout quality with +evidence instead of guesswork. The core is two **pure** Rust modules that hold all the +logic: a *quality-metric core* (`layout/metrics.rs`) that scores a laid-out diagram on +explicit, scale-free aesthetic cost terms -- node overlap, connectors running through +nodes, label overlap, edge crossings, sprawl, edge-length unevenness, and aspect ratio -- +and collapses them to a single `weighted_cost` scalar; and a *statistics core* +(`layout/eval_stats.rs`) that treats a layout's quality as a distribution over random +seeds, summarizing it with medians, percentiles, a corpus-wide geomean, and a Mann-Whitney +U significance test (the way Go's `benchstat` compares benchmark runs). Crucially, the +metric is computed on the *same geometry the PNG renderer draws*, so a layout's score can +never disagree with how it actually looks. 
An imperative shell -- an on-demand example +binary (`examples/layout_eval.rs`) -- composes these cores: it sweeps a curated corpus of +models, lays each out across many seeds, scores them, renders the best/median/worst (and any +hand-authored reference view) to PNG, and writes a metrics table plus an HTML contact-sheet. + +The architecture exists to enable a tight iteration loop: change a layout parameter or code +path, run the sweep, read the geomean delta *and look at the rendered contact-sheet*, then +keep or revert based on whether the change is statistically significant and visually better. +The scalar `weighted_cost` is the hill to climb; the rendered images are the guardrail +against optimizing the number while degrading the picture (Goodhart's law); and a small set +of human-vs-AI reference pairs is the objective check that the metric agrees with human +taste. With that loop in place, the design takes only the first, smallest algorithm step -- +"Rung 0," re-pointing seed selection to rank by the full metric instead of crossings alone +-- and protects the gain with a fast deterministic CI guard. Rungs 1-3 (parameter search, +a metric-driven search objective, and new layout passes) are documented as the forward path +the harness is built to support, not built here. + +## Definition of Done + +This work builds the measurement and tooling infrastructure that lets an agent +iteratively improve `simlin-engine`'s automatic diagram layout. It defines *what a good +layout is* (an explicit, geometry-accurate quality metric) and *how to judge outputs* (a +corpus sweep that renders and statistically scores layouts), then takes the first +improvement step (Rung 0). The layout algorithm itself is not redesigned beyond Rung 0; +rungs 1-3 are documented as the forward path. 
+ +Today the layout engine judges a layout by exactly one quantity -- edge-crossing count +(`annealing.rs` simulated-annealing cost; `select_best_layout` seed ranking) -- and there +is no in-repo way to *see* a generated layout outside the browser. This design closes both +gaps. + +1. **A pure `LayoutMetrics` module** (`src/simlin-engine/src/layout/metrics.rs`) computes + scale-free aesthetic *cost* terms (0 = ideal) from a `StockFlow` view, on the same + geometry the PNG renderer draws: `node_overlap`, `node_connector_overlap`, + `label_overlap`, `crossings`, `sprawl`, `edge_length_cv`, and `aspect_penalty`, plus + reserved zero-weighted structure terms. `weighted_cost(&MetricWeights) -> f64` collapses + them to one scalar to minimize. + +2. **Edge crossings are counted on real geometry** -- Arc links sampled to polylines + instead of straight chords -- fixing the chord approximation `count_view_crossings` + (`mod.rs`) applies to `Link`/Arc shapes today (flow polylines are already + segment-sampled). MultiPoint links currently render to nothing; see Additional + Considerations. + +3. **A Rust in-tree corpus sweep driver** (`src/simlin-engine/examples/layout_eval.rs`) + runs over a curated `test/` corpus: for each model it generates layouts across multiple + independent seeds, computes `LayoutMetrics` for each, renders the best/median/worst + layouts to PNG, and -- where the model ships a hand-authored view -- also scores and + renders that view as a reference. No pysimlin or other-binding surface is added. + +4. **The sweep reports statistically**: per-model median + spread over the seed samples, + a corpus geomean-of-medians aggregate, the production best-of-k cost, and a + baseline-vs-candidate comparison using a Mann-Whitney U significance test -- emitted as a + metrics table (JSON) and an HTML contact-sheet (best/median/worst per model with score + breakdowns), written to a gitignored output directory under `target/`. + +5. 
**Metric weights are calibrated and committed**: initial weights set from the + failure-mode priorities (overlap + crossings dominant; sprawl/aspect moderate; + structure ~0), refined against rendered examples, and validated by a reference-pair + check -- on agreed human-vs-AI model pairs the metric scores the human layout lower + (better) than the worse machine layout. + +6. **Rung 0 is wired in**: `select_best_layout` (`mod.rs`) ranks the candidate seeds by + `weighted_cost` (using the accurate crossing count) instead of crossings-only. + +7. **A deterministic CI regression guard**: a fast test over a few tiny models asserts + `weighted_cost` stays at or below a committed threshold, and the reference-pair ordering + is encoded as a test -- both within the workspace's 3-minute test-time budget. + +8. **The hill-climbing ladder (rungs 1-3) is documented** as the forward path (parameter + search; metric-driven annealing cost; new layout passes), naming the seam each rung + touches. (Satisfied by this plan's Additional Considerations -- no implementation task.) + +### Out of scope +- Redesigning the layout algorithm beyond Rung 0 (rungs 1-3 are documented, not built). +- Exposing metrics or rendering through pysimlin or any non-Rust binding. +- A preference-judging UI or a trained preference model (the explicit metric is the chosen + signal; human preference enters only as up-front calibration). +- SD-structure metrics as *weighted* terms (chain straightness, loop readability) -- the + fields exist but are zero-weighted initially, since these were de-prioritized. + +## Acceptance Criteria + +### layout-quality-eval.AC1: Metric terms are geometry-correct and scale-free +- **layout-quality-eval.AC1.1 Success:** Two node boxes overlapping by a known area yield a `node_overlap` equal to the known overlap fraction. +- **layout-quality-eval.AC1.2 Success:** Pairwise-disjoint nodes yield `node_overlap` = 0. 
+- **layout-quality-eval.AC1.3 Success:** A connector whose polyline passes through a non-incident node box contributes to `node_connector_overlap`; one that avoids all non-incident boxes yields 0. +- **layout-quality-eval.AC1.4 Success:** Two label boxes overlapping by a known area yield a matching `label_overlap`; non-overlapping labels yield 0. +- **layout-quality-eval.AC1.5 Success:** `aspect_penalty` is 0 inside the target aspect band and positive outside it (a 1x10 bbox is penalized; a ~4:3 bbox is not). +- **layout-quality-eval.AC1.6 Success:** `weighted_cost` equals the exact linear combination Σ wᵢ·termᵢ for given weights. +- **layout-quality-eval.AC1.7 Edge:** An empty or single-element view yields all-zero terms with no NaN or divide-by-zero. +- **layout-quality-eval.AC1.8 Success:** Uniformly scaling all coordinates leaves every normalized term unchanged within tolerance (scale invariance). + +### layout-quality-eval.AC2: Crossings are counted on real geometry +- **layout-quality-eval.AC2.1 Success:** Two connectors that cross once yield a crossing count of 1; connectors sharing an endpoint yield 0. +- **layout-quality-eval.AC2.2 Success:** An Arc connector that visually crosses another edge is counted via polyline sampling, on a constructed case where the straight-chord approximation does not count it. (MultiPoint links currently render to nothing, so faithfully counting them is deferred with that renderer gap -- see Additional Considerations.) +- **layout-quality-eval.AC2.3 Success:** The crossing count is invariant under translation and rotation of the whole view. + +### layout-quality-eval.AC3: Corpus sweep produces renders and scores +- **layout-quality-eval.AC3.1 Success:** `cargo run --release --example layout_eval` runs over the curated corpus and exits 0. +- **layout-quality-eval.AC3.2 Success:** It writes `metrics.json` with per-model term breakdowns + `weighted_cost` and corpus aggregates. 
+- **layout-quality-eval.AC3.3 Success:** It writes `index.html` referencing best/median/worst PNGs per model with score breakdowns. +- **layout-quality-eval.AC3.4 Success:** Models shipping a hand-authored view get a reference render + score alongside the auto-layout. +- **layout-quality-eval.AC3.5 Success:** All artifacts are written under `target/` (gitignored); nothing is committed. +- **layout-quality-eval.AC3.6 Edge:** A model that fails to lay out or render is reported and skipped, not fatal to the sweep. + +### layout-quality-eval.AC4: Statistical reporting and comparison +- **layout-quality-eval.AC4.1 Success:** Per model, M seeds produce M samples; the report includes median + spread (p25/p75) and the best-of-k production proxy. +- **layout-quality-eval.AC4.2 Success:** The corpus aggregate is the geomean of per-model medians. +- **layout-quality-eval.AC4.3 Success:** A baseline-vs-candidate run reports per-model and aggregate deltas, each with a Mann-Whitney U p-value / significance verdict. +- **layout-quality-eval.AC4.4 Success:** `geomean`, median/percentile, and Mann-Whitney U match known reference values. +- **layout-quality-eval.AC4.5 Edge:** Identical baseline and candidate yield a zero aggregate delta and a non-significant verdict. + +### layout-quality-eval.AC5: Calibration is validated objectively +- **layout-quality-eval.AC5.1 Success:** Committed default `MetricWeights` give overlap and crossings the dominant weights and the reserved structure terms zero weight. +- **layout-quality-eval.AC5.2 Success:** On the agreed human-vs-AI reference pairs, `weighted_cost(human) < weighted_cost(ai)` under the committed weights (encoded as a test). 
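AC1.6 and AC5.2 can be made concrete with a minimal sketch. The `LayoutMetrics` fields follow this design's contract; the `MetricWeights` field names, the `illustrative()` constructor, its placeholder magnitudes, and the `reference_pair_holds` helper are all assumptions made for illustration, not the committed calibration.

```rust
/// Cost terms mirroring the `LayoutMetrics` contract in this design (0 = ideal).
#[derive(Debug, Clone, Copy, Default)]
pub struct LayoutMetrics {
    pub node_overlap: f64,
    pub node_connector_overlap: f64,
    pub label_overlap: f64,
    pub crossings: f64,
    pub sprawl: f64,
    pub edge_length_cv: f64,
    pub aspect_penalty: f64,
    pub chain_straightness: f64, // reserved, weight 0
    pub loop_compactness: f64,   // reserved, weight 0
}

/// ASSUMPTION: one weight per term, reusing the term's field name.
#[derive(Debug, Clone, Copy)]
pub struct MetricWeights {
    pub node_overlap: f64,
    pub node_connector_overlap: f64,
    pub label_overlap: f64,
    pub crossings: f64,
    pub sprawl: f64,
    pub edge_length_cv: f64,
    pub aspect_penalty: f64,
    pub chain_straightness: f64,
    pub loop_compactness: f64,
}

impl MetricWeights {
    /// Placeholder magnitudes for illustration only, NOT the calibrated weights:
    /// overlap and crossings dominant, sprawl/aspect moderate, structure zero.
    pub fn illustrative() -> Self {
        MetricWeights {
            node_overlap: 10.0,
            node_connector_overlap: 10.0,
            label_overlap: 10.0,
            crossings: 10.0,
            sprawl: 1.0,
            edge_length_cv: 1.0,
            aspect_penalty: 1.0,
            chain_straightness: 0.0,
            loop_compactness: 0.0,
        }
    }
}

impl LayoutMetrics {
    /// Exact linear combination: the sum over terms of weight * term (AC1.6).
    pub fn weighted_cost(&self, w: &MetricWeights) -> f64 {
        w.node_overlap * self.node_overlap
            + w.node_connector_overlap * self.node_connector_overlap
            + w.label_overlap * self.label_overlap
            + w.crossings * self.crossings
            + w.sprawl * self.sprawl
            + w.edge_length_cv * self.edge_length_cv
            + w.aspect_penalty * self.aspect_penalty
            + w.chain_straightness * self.chain_straightness
            + w.loop_compactness * self.loop_compactness
    }
}

/// Reference-pair ordering from AC5.2: the human layout must score strictly lower.
pub fn reference_pair_holds(human: &LayoutMetrics, ai: &LayoutMetrics, w: &MetricWeights) -> bool {
    human.weighted_cost(w) < ai.weighted_cost(w)
}
```

Under these placeholder weights, a human layout that is slightly more spread out but free of overlap and crossings still scores lower than a compact machine layout with a few crossings, which is the ordering the reference-pair test is meant to pin down.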
+ +### layout-quality-eval.AC6: Rung 0 selection uses the full metric +- **layout-quality-eval.AC6.1 Success:** `select_best_layout` picks the lowest-`weighted_cost` candidate, verified on constructed candidates where the lowest-cost layout has *more* crossings than another candidate (so the choice differs from crossings-only). +- **layout-quality-eval.AC6.2 Success:** The existing layout test suite (`tests/layout.rs`, `layout_tests.rs`, `layout_review_tests.rs`) passes unchanged with the new selection. + +### layout-quality-eval.AC7: CI regression guard +- **layout-quality-eval.AC7.1 Success:** A deterministic test over a few tiny models asserts `weighted_cost` <= a committed threshold and completes well within the test-time budget. +- **layout-quality-eval.AC7.2 Failure:** Raising a guard model's `weighted_cost` above the threshold makes the test fail. + +### layout-quality-eval.AC8: Cross-cutting +- **layout-quality-eval.AC8.1 Success:** A fixed seed reproduces a byte-identical layout (determinism), distinct from the M-seed statistical sampling. +- **layout-quality-eval.AC8.2 Success:** Additional Considerations documents rungs 1-3 and names the seam each touches. (Satisfied by this design document itself; no implementation phase.) + +## Glossary + +- **System dynamics (SD) / stock-and-flow model**: A modeling approach that represents a + system as stocks (accumulations) connected by flows (rates of change) and feedback links. + Simlin builds, simulates, and visualizes these models; their visual form is the "diagram" + whose layout this work scores. +- **StockFlow / `StockFlow` view**: The engine's data structure for a model diagram -- the + collection of `ViewElement`s (and their positions) that make up one visual view of a + model. The metric takes a `&StockFlow` as input. +- **`ViewElement`**: A single positioned item in a `StockFlow` view (a stock, flow, auxiliary + variable, connector, alias, etc.). Layout assigns each one a position. 
+- **Connector / Arc / MultiPoint / `Flow.points`**: Connectors are the links drawn between + elements. They are not always straight: an Arc is a curved link, a MultiPoint connector + bends through intermediate points, and a flow's pipe follows `Flow.points`. The crossing + count and metric sample Arc links and flow pipes into polylines so curved geometry is + measured faithfully; MultiPoint links currently render to nothing and are a known gap + (see Additional Considerations). +- **SFDP**: The force-directed graph layout algorithm used to place nodes (`layout/sfdp.rs`), + treating links as springs and nodes as mutually repelling charges. Its tunable parameters + (`k`, `c`, `p`, spacing constants) are the target of the documented Rung 1 parameter + search. +- **Force-directed layout**: The broader family of layout algorithms (SFDP is one) that + positions nodes by simulating attractive/repulsive forces until the system settles. +- **Simulated annealing (SA)**: The optimization pass (`layout/annealing.rs`) that refines a + layout by randomly perturbing it and accepting changes probabilistically, with the + acceptance probability cooling over time. It currently minimizes edge crossings only; + Rung 2 would feed it the full `weighted_cost`. +- **Edge crossings**: Places where two connectors visually intersect -- a primary source of + diagram clutter, and today the *only* quantity layout optimizes. +- **`count_view_crossings`**: The existing function (`mod.rs`) that counts crossings. Today it + approximates `Link`/Arc shapes as straight chords (flow polylines are already + segment-sampled); this work refactors it to count on sampled polylines so arcs are + handled correctly. +- **`LAYOUT_SEEDS` / seed sampling**: Production runs layout from four fixed random seeds + (`[42, 123, 456, 789]`) and keeps the best result. Because layout is deterministic per + seed but its quality varies across seeds, the sweep instead samples *many* seeds to + characterize the quality distribution rather than a single lucky/unlucky result. 
+- **`select_best_layout`**: The function (`mod.rs`) that picks the winning candidate among + the seed runs. Rung 0 re-points it from "fewest crossings" to "lowest `weighted_cost`." +- **`LayoutMetrics` / `weighted_cost` / `MetricWeights`**: The new quality-metric types. + `LayoutMetrics` holds one cost term per aesthetic concern (0 = ideal, all scale-free); + `MetricWeights` is one weight per term; `weighted_cost` is their weighted sum `Σ wᵢ·termᵢ` + -- the single scalar an optimizer minimizes. +- **`render_png` / resvg**: `render_png` (`diagram/render_png.rs`, behind the `png_render` + feature) rasterizes a diagram to a PNG; resvg is the Rust SVG-rendering library it uses. + Because the engine's SVG output is byte-identical to the product's TypeScript renderer, + the PNG faithfully reflects the real UI. +- **geomean (geometric mean)**: The aggregate used to combine per-model median costs across + the corpus. Unlike the arithmetic mean, it averages ratios fairly so one large-cost model + cannot dominate the corpus score. +- **Mann-Whitney U test**: A non-parametric significance test that decides whether two + samples differ. It is used to judge whether a baseline-vs-candidate cost difference is real + signal or seed noise, without assuming the cost distributions are normal. +- **benchstat**: A Go tool that compares benchmark runs by reporting center, spread, and a + significance test over many samples. The statistics core deliberately mirrors its approach + for layout quality. +- **best-of-k**: A "production proxy" statistic -- the minimum cost over k seeds -- that + mirrors what production actually ships (best of the fixed seed set), reported alongside the + full distribution. +- **Reference pair (human-vs-AI)**: An agreed pairing of a hand-authored ("human") layout and + a machine-generated ("AI") layout of the same model. The metric is validated by requiring + `weighted_cost(human) < weighted_cost(ai)` -- an objective check that it agrees with human + taste. 
+- **Contact-sheet**: The generated `index.html` report -- a grid showing each model's + best/median/worst renders (and any reference view) with their score breakdowns, sorted + worst-first -- inspected every iteration as the visual guardrail. +- **"Rungs" / hill-climbing ladder**: The staged forward path for improving layout. Rung 0 + (built here) changes only seed selection; Rungs 1-3 (documented, not built) are parameter + search, a metric-driven search objective, and new layout passes -- each "rung" a discrete, + measurable step up the quality hill. +- **Goodhart('s law)**: "When a measure becomes a target, it ceases to be a good measure" -- + i.e., any single fitness scalar will eventually be gamed. The contact-sheet renders, + visible per-term breakdowns, and reference-pair test are the design's guards against it. +- **Functional core / imperative shell (FCIS)**: An architectural pattern that isolates pure, + side-effect-free logic (here, `metrics.rs` and `eval_stats.rs`) from the I/O-performing + shell (here, the `layout_eval.rs` example). The cores are heavily unit/property tested; the + shell stays thin. +- **salsa**: The incremental computation framework backing the engine's model database; the + sweep driver syncs the salsa DB before laying out a model, reusing the path that the + existing `tests/layout.rs` uses to load corpus models. + +## Architecture + +The system has three parts, split along the functional-core / imperative-shell line: a +**pure metric core** and a **pure statistics core** that the **imperative sweep driver** +composes. Rendering already exists (`diagram::render_png`) and is reused unchanged. + +### Quality-metric core (`layout/metrics.rs`, pure) + +`compute_layout_metrics(view: &StockFlow, config: &LayoutConfig) -> LayoutMetrics` is a +pure function with no I/O. 
It is computed on the **same geometry the renderer draws** -- +node bounding boxes, connector paths, and label boxes obtained from the `diagram` module's +existing geometry helpers (`diagram::elements`/`flow` `*_bounds`, `diagram::connector` +path, `diagram::label::label_bounds`) -- so a layout's score and its rendered PNG can never +disagree. Those helpers are `pub fn`, but their modules (`elements`, `flow`, `label`, +`connector`) are private in `diagram/mod.rs` today, so a prerequisite is exposing them +`pub(crate)` for `layout` to call. Every term is a **cost** (0 = ideal) and normalized to be scale-free, so models +of different sizes are comparable and the corpus can be aggregated. + +| Term | Definition (cost; 0 = ideal) | Pain it captures | +|------|------------------------------|------------------| +| `node_overlap` | Σ pairwise node-box overlap area / Σ node area | clutter | +| `node_connector_overlap` | connector-polyline length inside non-incident node boxes / total connector length | connectors under/through nodes | +| `label_overlap` | overlap area among label boxes and label-vs-node boxes / Σ label area | clutter | +| `crossings` | connector-polyline crossings (arcs sampled) / connector count | tangled connectors | +| `sprawl` | mean connector length / characteristic node size | wasted space | +| `edge_length_cv` | stddev/mean of connector lengths | elements drifting far / unevenness | +| `aspect_penalty` | deviation of bbox aspect ratio from a target band | unviewable shape | +| `chain_straightness`, `loop_compactness` | reserved, zero-weighted | (SD structure; deferred) | + +Contract: + +```rust +pub struct LayoutMetrics { + pub node_overlap: f64, + pub node_connector_overlap: f64, + pub label_overlap: f64, + pub crossings: f64, + pub sprawl: f64, + pub edge_length_cv: f64, + pub aspect_penalty: f64, + pub chain_straightness: f64, // reserved, weight 0 + pub loop_compactness: f64, // reserved, weight 0 +} + +pub struct MetricWeights { /* one f64 per 
term */ } + +impl LayoutMetrics { + /// Σ wᵢ·termᵢ — the scalar an optimizer minimizes. + pub fn weighted_cost(&self, w: &MetricWeights) -> f64; +} +``` + +`node_overlap`/`node_connector_overlap`, `crossings`, and the sprawl terms pull in opposite +directions (compact vs. non-overlapping). That tension is intended: the weights set the +balance, and the overlap terms keep "minimize area" from collapsing the layout. + +**Accurate crossings.** The `crossings` term, and a refactored `count_view_crossings`, +operate on connector geometry sampled to polylines (Arc links plus `Flow.points`), not +straight chords. This requires factoring the arc geometry -- currently entangled with +SVG-string emission in `connector::render_arc` (which returns a `String`) -- into a polyline +producer shared by the renderer and the metric, so both see identical geometry. This is the +highest-effort item in Phase 1, and the factor-out must keep `render_svg` byte-for-byte +unchanged (a TS-vs-Rust parity test asserts it). It both feeds the metric and fixes a latent +undercount in today's seed selection. (MultiPoint links currently render to an empty group, +so they have no drawn geometry to match; they are a known gap, not measured here.) + +### Statistics core (`layout/eval_stats.rs`, pure) + +Layout is deterministic at a fixed seed (RNGs are `StdRng::seed_from_u64`; no entropy +source; the `par_iter` over seeds preserves order), so a specific layout is exactly +reproducible. But a layout's *quality is a distribution over seed space*, and production +samples it at the four fixed `LAYOUT_SEEDS` and takes the min. Evaluating a change on one +fixed seed-set conflates a real improvement with seed luck. The statistics core treats +evaluation the way Go's `benchstat` treats benchmarks: many samples, center + spread, and a +significance test on differences. 
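The "significance test on differences" is Mann-Whitney U in this design. A minimal sketch of that test follows, assuming a two-sided normal approximation with tie correction and no continuity correction; the design does not say whether the implementation uses exact tables or an approximation, so the `MannWhitney` struct, the `mann_whitney_u` signature, and the `erfc` helper are hypothetical. Costs are assumed to be finite.

```rust
/// Hypothetical result type: the U statistic, its z-score, and a two-sided p-value.
#[derive(Debug, Clone, Copy)]
pub struct MannWhitney {
    pub u: f64,
    pub z: f64,
    pub p_value: f64,
}

/// Complementary error function for x >= 0 (Abramowitz-Stegun 7.1.26,
/// absolute error around 1.5e-7). The Rust standard library has no `erf`.
fn erfc(x: f64) -> f64 {
    let t = 1.0 / (1.0 + 0.3275911 * x);
    let poly = t
        * (0.254829592
            + t * (-0.284496736
                + t * (1.421413741 + t * (-1.453152027 + t * 1.061405429))));
    poly * (-x * x).exp()
}

/// Two-sided Mann-Whitney U via the normal approximation.
/// Returns `None` when either sample is empty.
pub fn mann_whitney_u(a: &[f64], b: &[f64]) -> Option<MannWhitney> {
    if a.is_empty() || b.is_empty() {
        return None;
    }
    let (n1, n2) = (a.len() as f64, b.len() as f64);
    let n = n1 + n2;

    // Pool both samples, remembering which group each value came from.
    let mut pooled: Vec<(f64, bool)> = a
        .iter()
        .map(|&v| (v, true))
        .chain(b.iter().map(|&v| (v, false)))
        .collect();
    pooled.sort_by(|x, y| x.0.total_cmp(&y.0));

    // Assign average ranks to runs of tied values and accumulate the tie term.
    let mut rank_sum_a = 0.0;
    let mut tie_term = 0.0;
    let mut i = 0;
    while i < pooled.len() {
        let mut j = i + 1;
        while j < pooled.len() && pooled[j].0 == pooled[i].0 {
            j += 1;
        }
        let t = (j - i) as f64;
        let avg_rank = (i + 1 + j) as f64 / 2.0; // mean of 1-based ranks i+1..=j
        let in_a = pooled[i..j].iter().filter(|p| p.1).count() as f64;
        rank_sum_a += avg_rank * in_a;
        tie_term += t * t * t - t;
        i = j;
    }

    let u_a = rank_sum_a - n1 * (n1 + 1.0) / 2.0;
    let u = u_a.min(n1 * n2 - u_a);
    let mean = n1 * n2 / 2.0;
    let variance = n1 * n2 / 12.0 * ((n + 1.0) - tie_term / (n * (n - 1.0)));
    if variance <= 0.0 {
        // Every value is tied: there is no evidence of a difference.
        return Some(MannWhitney { u, z: 0.0, p_value: 1.0 });
    }
    let z = (u - mean) / variance.sqrt();
    let p_value = erfc(z.abs() / std::f64::consts::SQRT_2).min(1.0);
    Some(MannWhitney { u, z, p_value })
}
```

With identical baseline and candidate samples the statistic sits exactly at its mean, so the p-value is about 1 and the verdict is non-significant, which is the behavior AC4.5 asks for. The normal approximation is rough for very small sample counts, so a sweep would want enough seeds per model for the verdict to be meaningful.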
+ +```rust +pub struct MetricSample { pub seed: u64, pub metrics: LayoutMetrics, pub weighted_cost: f64 } + +pub struct ModelStats { + pub model: String, + pub samples: Vec<MetricSample>, // one per seed + pub median_cost: f64, + pub spread: (f64, f64), // e.g. (p25, p75) + pub best_of_k_cost: f64, // production proxy: min over k seeds + pub best_seed: u64, pub median_seed: u64, pub worst_seed: u64, +} + +pub struct CorpusReport { pub per_model: Vec<ModelStats>, pub geomean_of_medians: f64 } + +/// Per-model and aggregate delta, each with a Mann-Whitney U p-value (non-parametric; +/// robust to the non-normal cost distributions layout produces). +pub fn compare(baseline: &CorpusReport, candidate: &CorpusReport) -> Comparison; +``` + +`geomean` (not arithmetic mean) aggregates normalized ratios across heterogeneous models so +one large-cost model can't dominate; `median`/percentiles summarize each model's +distribution; Mann-Whitney U decides whether a baseline-vs-candidate delta is signal or +noise. All are pure, table-testable functions. + +### Sweep driver (`examples/layout_eval.rs`, imperative shell) + +The shell loads each model in a curated corpus list (XMILE via `open_xmile` and Vensim via +`open_vensim`, as `examples/backend_bench.rs` does, then salsa-syncs the project as the +DB-backed layout tests do), and for each model: + +1. Runs layout for M independent seeds, producing M `MetricSample`s (and the best-of-k + production proxy). The per-seed seam is the existing `generate_layout_with_config` + (`mod.rs`, `pub`) -- its single `annealing_random_seed` drives both the SFDP and + annealing RNGs -- or the equivalent `generate` closure inside `generate_best_layout`. +2. Renders the best/median/worst layouts to PNG via `diagram::render_png` (after writing + the generated `StockFlow` onto the model's view, which `render_png` reads as + `views.first()`). +3. If the model file ships a non-empty hand-authored view, renders and scores that view + untouched as a **reference**. 
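Once step 1 has produced the M per-seed costs for a model, the summary numbers in `ModelStats` and `CorpusReport` reduce to a few pure helpers. The sketch below is illustrative only: the function names and signatures are hypothetical, the percentile uses linear interpolation between order statistics (the design does not fix a convention), and `geomean` returns `None` for non-positive inputs because the design does not say how a zero-cost model enters the geometric mean.

```rust
/// Linear-interpolated quantile of `samples` for `q` in [0, 1].
/// ASSUMPTION: interpolation between neighboring order statistics.
pub fn percentile(samples: &[f64], q: f64) -> Option<f64> {
    if samples.is_empty() || !(0.0..=1.0).contains(&q) {
        return None;
    }
    let mut sorted = samples.to_vec();
    sorted.sort_by(|a, b| a.total_cmp(b));
    let pos = q * (sorted.len() - 1) as f64;
    let lo = pos.floor() as usize;
    let hi = pos.ceil() as usize;
    let frac = pos - lo as f64;
    Some(sorted[lo] + (sorted[hi] - sorted[lo]) * frac)
}

/// Median as the 50th percentile.
pub fn median(samples: &[f64]) -> Option<f64> {
    percentile(samples, 0.5)
}

/// Geometric mean computed as exp(mean(ln v)).
/// ASSUMPTION: every value must be finite and strictly positive.
pub fn geomean(values: &[f64]) -> Option<f64> {
    if values.is_empty() || values.iter().any(|&v| !(v > 0.0) || !v.is_finite()) {
        return None;
    }
    let mean_ln = values.iter().map(|v| v.ln()).sum::<f64>() / values.len() as f64;
    Some(mean_ln.exp())
}

/// Production proxy: the minimum cost over the first `k` samples.
pub fn best_of_k(samples: &[f64], k: usize) -> Option<f64> {
    samples.iter().take(k).copied().min_by(|a, b| a.total_cmp(b))
}
```

A driver following this sketch would call `median` and `percentile(.., 0.25)` / `percentile(.., 0.75)` per model, then `geomean` over the per-model medians for the corpus aggregate.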
+ +It then emits, to a gitignored dir under `target/layout-eval/`: +- `metrics.json` -- per-model `ModelStats` with term breakdowns, plus corpus aggregates. +- `index.html` -- a contact-sheet sorted worst-cost-first; each cell shows the + best/median/worst renders (and the reference, where present) with their metric + breakdowns; the header shows corpus geomean and the baseline delta with significance. +- baseline diff -- `compare()` against a small committed `baseline.json`, printed and + embedded in the report. + +The driver declares `required-features = ["png_render", "file_io"]` and is run on demand +(`cargo run --release --example layout_eval`); it is not part of `cargo test`. + +### Rung 0 wiring + +`select_best_layout` (`mod.rs`) currently keeps the candidate with the fewest crossings +(tie-break on seed). Rung 0 changes it to keep the candidate with the lowest +`weighted_cost` (computed with the accurate crossing count), tie-break on seed. This is the +smallest, immediately-measurable improvement: "best of the candidate seeds" becomes "best +by the full metric." It changes only selection, not the search. + +### The iteration loop this enables + +Change a parameter or code path -> run the sweep -> read `metrics.json` *and look at the +contact-sheet* -> keep or revert based on the geomean delta and its significance, guarded by +the rendered images. The scalar `weighted_cost` is the hill; the renders are the guardrail +against gaming it (Goodhart); the reference pairs are the objective check that the metric +agrees with human taste. + +## Existing Patterns + +Investigation grounded every touch point in current code; this design adds pure modules and +one in-tree example, and re-points one existing decision function. 
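The one re-pointed decision function is `select_best_layout` (Rung 0). As a rough sketch of the before/after behavior, with a hypothetical `Candidate` type and function names (the real signature in `mod.rs` is not shown in this plan) and assuming "tie-break on seed" means the lower seed wins:

```rust
/// Hypothetical stand-in for one seed's layout result.
#[derive(Debug, Clone, PartialEq)]
pub struct Candidate {
    pub seed: u64,
    pub crossings: usize,
    pub weighted_cost: f64,
}

/// Pre-Rung-0 behavior: fewest crossings, tie-break on seed.
pub fn select_by_crossings(candidates: &[Candidate]) -> Option<&Candidate> {
    candidates.iter().min_by_key(|c| (c.crossings, c.seed))
}

/// Rung 0 behavior: lowest weighted cost, tie-break on seed.
/// `total_cmp` gives f64 a total order, so the comparison never panics.
pub fn select_by_weighted_cost(candidates: &[Candidate]) -> Option<&Candidate> {
    candidates.iter().min_by(|a, b| {
        a.weighted_cost
            .total_cmp(&b.weighted_cost)
            .then(a.seed.cmp(&b.seed))
    })
}
```

The two selectors disagree exactly in the situation AC6.1 constructs: a candidate with more crossings but less overlap or sprawl can have the lower weighted cost, and only the second selector picks it.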
+ +- **Layout module and decision seams.** `src/simlin-engine/src/layout/` holds `mod.rs` + (orchestration; `count_view_crossings`; `select_best_layout`; `generate_best_layout` + running the `LAYOUT_SEEDS = [42,123,456,789]` candidates via `par_iter`), `sfdp.rs` + (force placement, `StdRng::seed_from_u64`), and `annealing.rs` (crossings-only SA cost). + This design adds `metrics.rs` and `eval_stats.rs` beside them and edits + `select_best_layout`. Terminology (SFDP, annealing, pinned nodes, chains) follows + `docs/design-plans/2026-03-27-incremental-layout.md`. +- **Rendering already exists.** `src/simlin-engine/src/diagram/` provides `render.rs` + (`render_svg`), `render_png.rs` (`render_png` / `svg_to_png`, resvg + embedded + Roboto-Light, behind the `png_render` feature), with geometry in `elements.rs`, + `flow.rs`, `connector.rs`, `label.rs` (`label_bounds`), `common.rs` (`Rect`, + `calc_view_box`), and shared `constants.rs`. The metric reuses these geometry helpers so + scores match the rendered image -- but only `common`/`constants` are `pub mod` today, so + the others must be exposed (see Architecture). `render_svg` is asserted byte-identical to + the TS renderer by `src/diagram/tests/svg-rendering.test.ts`, so the PNG faithfully + reflects the product UI -- and that test is the tripwire any connector-geometry refactor + must not break. +- **In-tree example precedent.** `src/simlin-engine/examples/backend_bench.rs` is an + existing on-demand example (auto-discovered; loads models via `std::fs` + + `open_vensim`/`open_xmile`). `examples/layout_eval.rs` follows its shape; the + `required-features` mechanism (used today by the crate's `[[test]]` entries, not by any + example) means adding a new `[[example]]` block to `Cargo.toml`. 
+- **Corpus loading.** `tests/layout.rs` loads XMILE via `load_project`/`open_xmile`; its + DB-backed tests show the salsa-sync-then-layout pattern (`SimlinDb::default()` -> + `sync_from_datamodel_incremental` -> pass `Some((&mut db, source_project))`). The sweep + combines that with `open_vensim` for the Vensim `test/metasd` models. (`verify_layout` + itself is only an assertion helper, not a loader.) +- **Test-time budget.** Per `CLAUDE.md` / `docs/dev/rust.md`, `cargo test --workspace` + runs under a 3-minute cap and individual tests complete in seconds. The full corpus sweep + therefore stays in the example (not in tests); only a tiny deterministic guard runs in the + test suite. +- **FCIS.** Pure cores (`metrics.rs`, `eval_stats.rs`) hold all logic and are unit/property + tested to the project's coverage bar; the example is a thin imperative shell. + +No pattern divergence: pure functions beside existing pure layout code, one example beside +an existing example, one edit to an existing selection function. + +## Implementation Phases + + +### Phase 1: Quality-metric core + accurate crossings +**Goal:** A pure, geometry-accurate `LayoutMetrics` and a polyline-based crossing count. + +**Components:** +- Expose the `diagram` geometry modules (`elements`, `flow`, `label`, `connector`) as + `pub(crate)` -- they are private today, so `layout::metrics` cannot call their `*_bounds` / + path helpers without this. +- `src/simlin-engine/src/layout/metrics.rs` (new) -- `LayoutMetrics`, `MetricWeights`, + `compute_layout_metrics(view, config)`, `weighted_cost`. Each term computed on the + `diagram` module's geometry helpers. +- Connector arc-to-polyline geometry factored out of `connector::render_arc` (highest-effort + item; geometry is currently entangled with SVG-string building), reused by the renderer and + the metric. The renderer must be re-routed through it without changing its output. 
+- `count_view_crossings` (`mod.rs`) refactored to count on polylines instead of straight + chords (Arc/`Link` shapes; flow polylines are already sampled). +- Unit tests on hand-built tiny views with known geometry (two boxes overlapping by a known + fraction; two segments crossing once; shared-endpoint connectors -> 0; a 1x10 bbox -> + known aspect penalty; an arc that crosses where its chord would not). Property tests: + overlap symmetric and scale-invariant; crossings invariant under translation/rotation. + +**Dependencies:** none. + +**Done when:** the metric terms match the hand-computed values, scale/translation +invariance holds, the polyline crossing count differs from the old chord count on the +constructed arc case, `render_svg` output is unchanged (the `svg-rendering.test.ts` parity +test still passes), and `cargo test` passes. Covers `layout-quality-eval.AC1.*`, +`layout-quality-eval.AC2.*`. + + + +### Phase 2: Statistics core +**Goal:** Pure aggregation and significance testing for seed-sample distributions. + +**Components:** +- `src/simlin-engine/src/layout/eval_stats.rs` (new) -- `MetricSample`, `ModelStats`, + `CorpusReport`, `Comparison`; `geomean`, `median`/percentile, and a Mann-Whitney U test; + `compare(baseline, candidate)` producing per-model and aggregate deltas with p-values. +- Unit tests against known reference values (geomean of a known set; Mann-Whitney U on + textbook samples; identical baseline/candidate -> zero delta, non-significant). + +**Dependencies:** Phase 1 (the `LayoutMetrics` type embedded in `MetricSample`). + +**Done when:** the helpers match known values and `compare()` reports the expected +significance verdicts. Covers `layout-quality-eval.AC4.4`, `layout-quality-eval.AC4.5`. + + + +### Phase 3: Corpus sweep driver and report +**Goal:** An on-demand sweep that lays out, scores, renders, and reports over the corpus. 
+ +**Components:** +- `src/simlin-engine/examples/layout_eval.rs` (new) -- loads a curated corpus list + (canonical SIR/teacup/logistic-growth; modules; multipoint connectors; LTM/loop models; + aliases; the `test/ai-information` set; a few large `test/metasd` Vensim models) via + `open_xmile`/`open_vensim` + salsa sync, runs M seeds per model, scores each, renders + best/median/worst PNGs, and scores+renders any shipped hand-authored view as a reference. +- The per-seed seam: wrap `generate_layout_with_config` (`mod.rs`) or the `generate` closure + in `generate_best_layout`, varying `annealing_random_seed` per sample, so the driver can + sample seeds and compute the best-of-k proxy. +- Emits `metrics.json`, `index.html` contact-sheet, and a `compare()` diff against a + committed `baseline.json`, under `target/layout-eval/` (gitignored). +- A new `[[example]]` entry in `Cargo.toml` with `required-features = ["png_render", + "file_io"]` (no example uses `required-features` today; `file_io` helps load Vensim models + that reference external data, and AC3.6 skip-on-failure covers any that still fail). + +**Dependencies:** Phase 1 (metric), Phase 2 (stats). + +**Done when:** `cargo run --release --example layout_eval` completes, writes the JSON + +contact-sheet referencing best/median/worst (and reference) renders, reports per-model +median+spread / corpus geomean / best-of-k and a baseline delta with significance, places +artifacts under `target/`, and skips (reports, non-fatally) any model that fails to lay out +or render. Covers `layout-quality-eval.AC3.*`, `layout-quality-eval.AC4.1`, +`layout-quality-eval.AC4.2`, `layout-quality-eval.AC4.3`. + + + +### Phase 4: Calibration and reference-pair validation +**Goal:** Commit metric weights that match the user's taste, validated objectively. 
+ +**Components:** +- Committed default `MetricWeights` (overlap + crossings dominant; sprawl/aspect moderate; + structure terms 0), set via a talk-through over the Phase 3 contact-sheet, treating the + user's "this layout is better than that" judgments as ordering constraints on the linear + cost. +- A reference-pair fixture (agreed human-vs-AI model pairs, e.g. from `test/ai-information`) + and a test asserting `weighted_cost(human) < weighted_cost(ai)` under the committed + weights. + +**Dependencies:** Phase 3 (need the contact-sheet to calibrate against), Phase 1. + +**Done when:** the committed weights satisfy the reference-pair ordering test, and the user +has signed off on the weights after reviewing the contact-sheet. Covers +`layout-quality-eval.AC5.*`. + + + +### Phase 5: Rung 0 wiring + CI regression guard +**Goal:** Make seed selection use the full metric, and protect the gains in normal dev. + +**Components:** +- `select_best_layout` (`mod.rs`) re-pointed to minimize `weighted_cost` (accurate + crossings), tie-break on seed. +- A deterministic regression-guard test over a few tiny models asserting `weighted_cost` + stays at or below a committed threshold (fixed seeds; fast; under the time budget), plus a + determinism check (the same seed reproduces a byte-identical layout). +- Confirm existing layout tests (`tests/layout.rs`, `layout_tests.rs`, + `layout_review_tests.rs`) still pass with the new selection. + +**Dependencies:** Phase 1 (metric), Phase 4 (committed weights). + +**Done when:** selection picks the lowest-`weighted_cost` candidate (verified on +constructed candidates where lowest-cost differs from fewest-crossings), the guard + +determinism tests pass within budget, and the existing layout suite is green. Covers +`layout-quality-eval.AC6.*`, `layout-quality-eval.AC7.*`, `layout-quality-eval.AC8.1`. 
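The Phase 5 selection rule can be sketched in isolation. This is a minimal sketch under stated assumptions, not the `select_best_layout` implementation: `Candidate`, `compare_candidates`, and `select_lowest_cost` are hypothetical names, and ranking NaN costs after every non-NaN cost is an assumed policy added for illustration.

```rust
use std::cmp::Ordering;

/// Hypothetical stand-in for one scored layout candidate.
#[derive(Debug, Clone, PartialEq)]
struct Candidate {
    seed: u64,
    weighted_cost: f64,
}

/// Order candidates by (is-NaN, cost, seed): non-NaN costs sort before NaN
/// (assumed policy), lower cost wins, and equal costs fall back to the
/// lower seed.
fn compare_candidates(a: &Candidate, b: &Candidate) -> Ordering {
    a.weighted_cost
        .is_nan()
        .cmp(&b.weighted_cost.is_nan())
        .then_with(|| {
            // Both NaN (or otherwise incomparable) is treated as a tie.
            a.weighted_cost
                .partial_cmp(&b.weighted_cost)
                .unwrap_or(Ordering::Equal)
        })
        .then_with(|| a.seed.cmp(&b.seed))
}

/// Pick the candidate with the lowest cost; `None` for an empty slice.
fn select_lowest_cost(candidates: &[Candidate]) -> Option<&Candidate> {
    candidates.iter().min_by(|a, b| compare_candidates(a, b))
}
```

Because the seed is the final comparison key, the result does not depend on the order in which parallel candidates arrive, which is the property the tie-break is meant to provide.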
+ + +## Additional Considerations + +**The hill-climbing ladder beyond this plan (rungs 1-3).** Rung 0 (Phase 5) is the only +algorithm change built here. The forward path, each rung measured by the Phase 3 sweep with +the Phase 2 significance gate and guarded by the rendered contact-sheet: + +- **Rung 1 -- parameter search.** Sweep SFDP `k`, `c`, `p`, the spacing constants, the seed + count, and SA temperature/iterations (`config.rs`, `sfdp.rs`, `annealing.rs`) against the + corpus geomean. No algorithm change; pure config search (grid/coordinate descent). +- **Rung 2 -- metric-driven search objective.** Feed `weighted_cost` into the SA acceptance + delta (`annealing.rs`, currently `perturbed_crossings - current_crossings`) so the search + optimizes the full metric, not just crossings. Higher leverage but costlier per + perturbation than a crossing count, so it is a deliberate, measured experiment -- and may + use a cheap subset of terms in the inner loop. +- **Rung 3 -- new passes.** Targeted code such as an overlap-removal post-pass or + obstacle-aware connector routing, each validated against the corpus. + +**Goodhart guard.** A scalar fitness will be gamed by any optimizer. Three mitigations are +built in: per-term breakdowns stay visible (not just the scalar); the contact-sheet's +best/median/worst renders are inspected every iteration (a change that improves the number +but worsens the picture means the *metric* is wrong, not the layout); and the reference-pair +test fails if weights stop agreeing with human-judged-better layouts. + +**Determinism vs. statistical sampling.** These serve different needs. The CI guard uses +fixed seeds (deterministic, fast, flake-free). The interactive sweep varies seeds to +characterize the algorithm's quality distribution, because a single fixed-seed measurement +cannot distinguish a real improvement from seed luck. A specific bad layout remains exactly +reproducible by its seed for debugging. 
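To make the Rung 2 idea concrete, the sketch below feeds a full-cost delta into a simulated-annealing acceptance test. It assumes a Metropolis-style rule, which this plan does not confirm is what `annealing.rs` uses; `accept_move` and its parameters are hypothetical. The uniform sample `u` is passed in rather than drawn inside the function, so a fixed seed upstream yields the same accept/reject sequence on every run.

```rust
/// Hypothetical Metropolis-style acceptance on a cost delta.
///
/// `u` is a uniform sample in [0, 1) supplied by the caller's seeded RNG,
/// which keeps this function pure and exactly reproducible per seed.
fn accept_move(current_cost: f64, perturbed_cost: f64, temperature: f64, u: f64) -> bool {
    let delta = perturbed_cost - current_cost;
    if delta <= 0.0 {
        // Improvements and neutral moves are always taken.
        return true;
    }
    if temperature <= 0.0 {
        // A fully cooled search only accepts non-worsening moves.
        return false;
    }
    // Worsening moves are taken with probability exp(-delta / T).
    // A NaN delta reaches this line and is rejected, because any
    // comparison against NaN is false.
    u < (-delta / temperature).exp()
}
```

Swapping a crossing count for `weighted_cost` changes only how the two cost arguments are produced; the acceptance rule itself is unchanged, which is why the per-perturbation cost of the metric is the main thing to measure.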
+ +**Sweep cost.** M seeds x corpus x (layout + a few renders) is minutes-scale on the large +`test/metasd` models; acceptable for an on-demand example, which is why it is not in the +test suite. M and the large-model tier are configurable. + +**Metric/render geometry agreement.** Computing the metric from the renderer's own geometry +helpers (rather than the `LayoutConfig` element sizes) guarantees the score reflects what +the PNG shows -- including the connector-polyline sampling that both the renderer and the +crossing count share. diff --git a/docs/test-plans/2026-05-22-layout-quality-eval.md b/docs/test-plans/2026-05-22-layout-quality-eval.md new file mode 100644 index 000000000..714220c42 --- /dev/null +++ b/docs/test-plans/2026-05-22-layout-quality-eval.md @@ -0,0 +1,85 @@ +# Test Plan: Layout Quality Evaluation + +Human verification plan for the layout-quality-eval feature (implementation plan +`docs/implementation-plans/2026-05-22-layout-quality-eval/`). The automated suite +proves the metric math, the selection rule, and per-seed determinism. This plan +covers what automated tests cannot: that the on-demand corpus **sweep** emits the +right artifacts, and that the **human-judgment** calls (best/median/worst +ordering, reference-vs-auto scoring, weight magnitudes) match a modeler's eye. +This is the gate for AC3.*, AC4.1-4.3, and the human-in-the-loop part of AC5. + +## Prerequisites + +- Repo at a commit including the layout-quality-eval branch, clean working tree. + Run `./scripts/dev-init.sh`. +- Toolchain that can build `resvg` (the `png_render` feature): + `cargo build -p simlin-engine --features png_render,file_io --example layout_eval` + should finish without error. +- A browser to open `target/layout-eval/index.html`, and a JSON viewer / `jq` + for `target/layout-eval/metrics.json`. +- Automated gate already green: + `cargo test -p simlin-engine --lib layout::` and + `cargo test -p simlin-engine --features file_io --test layout`. 
+ +## Phase 1: Time-boxed smoke run (fast confidence) + +| Step | Action | Expected | +|------|--------|----------| +| 1 | `LAYOUT_EVAL_MODELS=teacup,sir LAYOUT_EVAL_SEEDS=4 cargo run --release -p simlin-engine --features png_render,file_io --example layout_eval` | Exits 0 (AC3.1). stdout prints a per-model `sir: median=… p25/p75=…/… best_of_k=… (M=4)` line and `corpus: geomean_of_medians=… (2 model(s) scored)`. | +| 2 | `ls target/layout-eval/` | Contains `metrics.json`, `index.html`, and PNGs: `sir_best/median/worst/reference.png`, `teacup_best/median/worst/reference.png`. | +| 3 | `git status --porcelain target/` | Empty — nothing under `target/` is tracked (AC3.5). | + +## Phase 2: Full corpus sweep + artifact inspection + +| Step | Action | Expected | +|------|--------|----------| +| 1 | `cargo run --release -p simlin-engine --features png_render,file_io --example layout_eval` (no env overrides: all corpus keys, M=25) | Exits 0. Each model prints its median/spread/best-of-k line; corpus aggregate at the end. Runtime is minutes (deliberately kept out of `cargo test`). | +| 2 | Open `target/layout-eval/metrics.json` | Valid JSON. Each `per_model[]` has the full `LayoutMetrics` breakdown (`node_overlap`, `node_connector_overlap`, `label_overlap`, `crossings`, `sprawl`, `edge_length_cv`, `aspect_penalty`, `loop_compactness`, `chain_straightness`) + `weighted_cost`, `median_cost`, `spread`, `best_of_k_cost`, `best/median/worst_seed`. Top level has `geomean_of_medians` and the `weights` set (AC3.2). | +| 3 | Verify AC4.2 by hand: collect each model's `median_cost`, compute their (epsilon-floored) geometric mean, compare to `geomean_of_medians` | The two agree to a few decimals. | +| 4 | Open `target/layout-eval/index.html` in a browser | Contact sheet sorted **worst weighted_cost first**. Each model row shows best/median/worst (and reference where present) thumbnails with a per-term cost breakdown and the `median / p25/p75 / best_of_k / M=25` summary (AC3.3). 
Header shows `geomean_of_medians` and the weight set. | + +## Phase 3: Human-judgment checks (the calibration gate, AC5.1 / AC5.2) + +These are the calls only a human can make; sign-off here closes the +human-in-the-loop component of AC5. + +| Step | Action | Expected (human judgment) | +|------|--------|---------------------------| +| 1 (best/median/worst ordering) | For 3-4 models (e.g. `sir`, `fishbanks`, `reliability`, `population`), look at the three generated thumbnails side by side | "best" should genuinely look cleanest (fewest overlaps/crossings, labels readable); "worst" messiest. If the metric's "best" looks worse than its "worst", that is calibration feedback — record it, do not silently accept it. | +| 2 (reference vs auto) | For each model shipping a `*_reference.png`, compare it to that model's `*_best.png` and read both `weighted_cost` values | For `reliability`, `fishbanks`, `population`, `logistic-growth`: the hand-authored reference should both look cleaner and carry the lower `weighted_cost` (the human` so a multi-slot arrayed agg's `Δsource` denominator carries the projected `agg[]` subscript instead of the bare agg name, which wouldn't compile as a scalar), `substitute_reducers_in_expr0` (textually replaces a recognized reducer subexpression in an `Expr0` with its agg name, for the `$⁚ltm⁚agg⁚{n} → target` link score), `resolve_link_score_name_for_loop` (picks the Bare-or-FixedIndex link-score name a loop-score reference should target). Module link score formulas (black-box delta-ratio and composite-ref) are inlined directly into `link_score_equation_text` in `db.rs`. - **`src/ltm_post.rs`** - Post-simulation relative loop score computation. `compute_rel_loop_scores(results, loop_partitions)` normalizes each loop's `loop_score` series against the sum of absolute scores within its cycle partition, using SAFEDIV-0 semantics (zero denominator -> zero result). 
Called after simulation rather than emitted as synthetic equations to avoid O(P^2) equation-text growth on models with dense partitions. - **LTM open work**: known LTM bugs and improvements are tracked on GitHub under the `ltm` label; issue #488 is the pinned epic that organises them by area (core algorithm, discovery/post-sim, augmentation, module/array umbrellas). Each open `ltm`-labelled issue carries file:line references and a suggested fix, so a new session can pick a bite-sized piece without re-investigating the subsystem. -- **`src/diagram/`** - Diagram/sketch rendering: `elements.rs`, `connector.rs`, `flow.rs`, `render.rs`, `common.rs`, `constants.rs`, `label.rs`, `arrowhead.rs` -- **`src/layout/`** - Automatic diagram layout generation (available on all targets including WASM; uses serial fallback when rayon is unavailable). Two entry points: `generate_best_layout()` (public) generates a complete diagram from scratch; `incremental_layout()` (public) preserves existing element positions and adds/removes only what changed. Submodules: `sfdp.rs` (force-directed placement), `annealing.rs` (crossing reduction), `chain.rs` (stock-flow chain positioning), `config.rs` (layout parameters including `module_width`/`module_height`), `connector.rs` (link routing), `graph.rs` (graph data structures), `metadata.rs` (feedback loops, dominant periods), `placement.rs` (label optimization, normalization), `text.rs` (label sizing), `uid.rs` (UID management), `layout_tests.rs` (unit tests for composable layout blocks and incremental operations). `LayoutState` is the public mutable state struct used by both paths: `LayoutState::new()` for fresh layout, `LayoutState::from_existing_view()` for incremental. Incremental helpers: `identify_new_elements()`, `compute_new_element_positions()`, `settle_new_elements()`, `diff_connectors()`, `diff_clouds()`, `apply_deletion()`, `apply_rename()`. 
The convenience wrappers `generate_best_layout()` and `generate_layout_with_config()` remain as the primary public API for callers. Generates view elements for modules (not just stocks/flows/auxes). +- **`src/diagram/`** - Diagram/sketch rendering: `elements.rs`, `connector.rs`, `flow.rs`, `render.rs`, `common.rs`, `constants.rs`, `label.rs`, `arrowhead.rs`. The `connector`/`elements`/`flow`/`label` submodules are `pub(crate)` so the layout-quality metric (`layout::metrics`) can reuse the exact same geometry the SVG renderer draws (a layout's score can never disagree with what is rendered). `connector.rs` exposes `pub(crate) connector_polyline` -- the polyline the renderer draws for a connector: straight links clipped to element boundaries (matching `render_straight_line`), arcs sampled along the arc circle (`ARC_POLYLINE_SAMPLES`, byte-identical to `render_arc`'s SVG), MultiPoint links returning empty (nothing is drawn for them today). `common.rs` carries the shared `Rect`/`Point`/`Circle` geometry plus `pub(crate)` rect/segment helpers (`rect_area`, `rect_overlap_area`, `rect_contains_point`, `segment_length_in_rect`, `rect_width`/`rect_height`). `elements.rs`/`flow.rs` expose label-free `*_shape_bounds` (`aux_shape_bounds`, `stock_shape_bounds`, `flow_shape_bounds`) alongside the label-merged `*_bounds`, so the metric can charge node-shape overlap and connector-under-shape against the bare shape and label-vs-label overlap separately (no double-count). +- **`src/layout/`** - Automatic diagram layout generation (available on all targets including WASM; uses serial fallback when rayon is unavailable). Two entry points: `generate_best_layout()` (public) generates a complete diagram from scratch; `incremental_layout()` (public) preserves existing element positions and adds/removes only what changed. 
Submodules: `sfdp.rs` (force-directed placement), `annealing.rs` (crossing reduction), `chain.rs` (stock-flow chain positioning), `config.rs` (layout parameters including `module_width`/`module_height`), `connector.rs` (link routing), `graph.rs` (graph data structures), `metadata.rs` (feedback loops, dominant periods), `placement.rs` (label optimization, normalization), `text.rs` (label sizing), `uid.rs` (UID management), `metrics.rs` and `eval_stats.rs` (the layout-quality metric and its eval statistics; see below), `layout_tests.rs`/`crossings_tests.rs`/`layout_selection_tests.rs` (unit tests for composable layout blocks, crossing geometry, and best-of-k selection). `LayoutState` is the public mutable state struct used by both paths: `LayoutState::new()` for fresh layout, `LayoutState::from_existing_view()` for incremental. Incremental helpers: `identify_new_elements()`, `compute_new_element_positions()`, `settle_new_elements()`, `diff_connectors()`, `diff_clouds()`, `apply_deletion()`, `apply_rename()`. The convenience wrappers `generate_best_layout()` and `generate_layout_with_config()` remain as the primary public API for callers. Generates view elements for modules (not just stocks/flows/auxes). + - **Deterministic per seed (#633)**: `fresh_layout` and the incremental `diff_connectors` produce a bit-identical layout for a fixed `(model, seed)` across repeated calls. HashMap iteration order is per-process random, so every layout-affecting iteration over a `HashMap`/`HashSet` is materialized into a sorted `Vec` first: `run_sfdp_with_rigid_chains`'s `var_to_node` centroid/aux-placement loops, and `diff_connectors`'s new-edge / alias-match / preserved-link loops (which allocate sequential uids and append to `state.elements`). 
+ - **Best-of-k selection by the calibrated metric**: `generate_best_layout` runs `LAYOUT_SEEDS` (now `pub` -- the eval sweep uses the same seed set as its production proxy) in parallel and `select_best_layout` picks the candidate minimizing `metrics::compute_layout_metrics(view, cfg).weighted_cost(&MetricWeights::default())` -- the full calibrated readability metric, NOT fewest crossings. Selection is NaN-safe (a degenerate NaN-cost layout never wins over a finite one regardless of order; all-NaN keeps the earliest) and ties break to the lowest seed. The `LayoutResult` struct carries `weighted_cost` (no separate `crossings` field; the metric's `crossings` term computes the accurate count internally). + - **`count_view_crossings` / `build_view_segments`**: `build_view_segments` is the single source of crossing geometry, shared with `metrics.rs`. Connector geometry comes from `diagram::connector::connector_polyline` (the exact drawn polyline: straight links clipped to boundaries, arcs sampled), and ALL element kinds are resolved by uid (Module/Alias links are no longer silently dropped -- the previous chord-based code only mapped Stock/Flow/Aux/Cloud). Vertex naming suppresses self- and shared-endpoint crossings (`elem_{uid}` endpoints, per-link `link_{uid}#{i}` interior arc samples); a flow's valve is injected as an `elem_{flow.uid}` pipe vertex so a link incident on the valve no longer miscounts as crossing the pipe. +- **`src/layout/metrics.rs`** - Functional-Core layout-quality metric. `compute_layout_metrics(view, config) -> LayoutMetrics` is pure (no I/O), guaranteed finite (each division guards a zero denominator with 0), so empty/single-element views score all-zero. 
`LayoutMetrics` per-term costs (0.0 = ideal): `node_overlap` (pairwise node-shape-box overlap), `node_connector_overlap` (connector length under non-incident node shapes -- both on label-free shape boxes), `label_overlap` (per-label obscured fraction), `crossings`, `sprawl`, `edge_length_cv`, `aspect_penalty` (beyond the `TARGET_AR_MAX = 16:9` band), `loop_compactness` (mean isoperimetric `1 - Q` over feedback cycles), and the reserved `chain_straightness` (always 0.0). `LayoutMetrics::weighted_cost(&MetricWeights)` is `Sigma w_i * term_i`. `MetricWeights::default()` is the calibrated readability-dominant production set (overlap/crossings family at 1.0; `sprawl`/`edge_length_cv`/`aspect_penalty` deliberately 0.0 -- spreading out for legibility is good, not penalized; `loop_compactness` a gentle 0.25; `chain_straightness` 0.0). Both structs derive `Serialize`/`Deserialize` purely so the eval sweep can emit/round-trip its JSON artifacts. +- **`src/layout/eval_stats.rs`** - Functional-Core benchstat-style statistics for the layout-quality seed-sample sweep: `geomean`/`percentile`/`median`/`mann_whitney_u` (non-parametric significance test) plus the `MetricSample`/`ModelStats`/`CorpusReport`/`Comparison` aggregation types and `compare(baseline, candidate)`. No I/O; every primitive returns a finite documented default (`0.0`, or a non-significant `p_value` of `1.0`) on empty/degenerate input, never NaN. 
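As an illustration of the "finite documented default" contract described for `eval_stats.rs`, here is a minimal sketch of an epsilon-floored geometric mean. It is not the crate's `geomean`: the `EPSILON` value and the exact flooring policy are assumptions chosen for the example.

```rust
/// Assumed floor so that a zero cost does not send `ln` to negative infinity.
const EPSILON: f64 = 1e-9;

/// Geometric mean of `values`, flooring each sample at `EPSILON`.
/// Returns the documented default 0.0 for empty input.
fn geomean(values: &[f64]) -> f64 {
    if values.is_empty() {
        return 0.0;
    }
    // Sum logs instead of multiplying, to avoid overflow/underflow on long
    // sample vectors. `f64::max` returns the non-NaN operand, so a NaN
    // sample is floored to EPSILON here as well.
    let log_sum: f64 = values.iter().map(|v| v.max(EPSILON).ln()).sum();
    (log_sum / values.len() as f64).exp()
}
```

The floor matters for this metric in particular: an ideal layout scores 0.0 on every term, and without it a single perfect model would collapse the corpus aggregate to zero.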
## Utilities @@ -150,7 +155,7 @@ The unit subsystem is partial-result throughout: a single bad declaration or one - **`tests/simulate_systems.rs`** - Systems format simulation integration tests (fixtures in `test/systems-format/`) - **`tests/simulate_ltm.rs`** - LTM feature tests - **`tests/systems_roundtrip.rs`** - Systems format parse-translate-write round-trip tests -- **`tests/layout.rs`** - Layout generation integration tests (chains, connectors, modules, LTM metadata, dominant periods, incremental layout operations) +- **`tests/layout.rs`** - Layout generation integration tests (chains, connectors, modules, LTM metadata, dominant periods, incremental layout operations, and the per-seed bit-identical-layout determinism guard for both the fresh and incremental paths -- #633) - **`tests/json_roundtrip.rs`** - JSON serialization roundtrip - **`tests/roundtrip.rs`** - XMILE/MDL roundtrip tests - **`tests/vm_alloc.rs`** - VM memory allocation tests @@ -160,3 +165,4 @@ The unit subsystem is partial-result throughout: a single bad declaration or one - **`benches/compiler.rs`** - Compiler pipeline benchmarks on real models (WRLD3, C-LEARN) - **`benches/simulation.rs`** - VM execution and compilation benchmarks (synthetic models) - **`benches/array_ops.rs`** - Array operation benchmarks (sum, broadcast, element-wise) +- **`examples/layout_eval.rs`** - On-demand layout-quality corpus sweep (gated `[[example]]` with `required-features = ["png_render", "file_io"]`, so the default `--all-targets` build skips it). 
Scores each model's best-of-`LAYOUT_SEEDS` layout with `metrics::compute_layout_metrics`, renders best/median/worst plus the reference PNGs, and emits `metrics.json` + `index.html` + a diff against the committed `examples/layout_eval_baseline.json` under `target/` (see `examples/layout_eval_baseline.README.md`) diff --git a/src/simlin-engine/Cargo.toml b/src/simlin-engine/Cargo.toml index c02081eed..de3c1f39a 100644 --- a/src/simlin-engine/Cargo.toml +++ b/src/simlin-engine/Cargo.toml @@ -115,6 +115,15 @@ name = "compiler_vector" name = "vdf_alias_decoder" required-features = ["file_io"] +# The layout_eval example calls the png_render-gated `render_png` and loads +# Vensim corpus models that reference external data (file_io). Examples are +# auto-discovered and built by `--all-targets` / clippy / pre-commit under the +# DEFAULT feature set (which excludes png_render); without this `[[example]]` +# entry pinning required-features, that build would fail to compile the example. +[[example]] +name = "layout_eval" +required-features = ["png_render", "file_io"] + [[bench]] name = "array_ops" harness = false diff --git a/src/simlin-engine/examples/layout_eval.rs b/src/simlin-engine/examples/layout_eval.rs new file mode 100644 index 000000000..2199d5d70 --- /dev/null +++ b/src/simlin-engine/examples/layout_eval.rs @@ -0,0 +1,1194 @@ +// Copyright 2026 The Simlin Authors. All rights reserved. +// Use of this source code is governed by the Apache License, +// Version 2.0, that can be found in the LICENSE file. + +//! Layout-quality evaluation sweep (on-demand; NOT part of `cargo test`). +//! +//! Lays out a curated corpus of models across many seeds, scores each layout +//! with the layout-quality metric, renders best/median/worst (and any +//! hand-authored reference) to PNG, and writes a metrics table (JSON), an HTML +//! contact-sheet, and a baseline diff -- all under a gitignored `target/` dir. +//! +//! This is a thin imperative shell over the metric core +//! 
(`layout::metrics::compute_layout_metrics`) and the statistics core +//! (`layout::eval_stats`). It loads each model via the public `open_xmile` / +//! `open_vensim` loaders (like `examples/backend_bench.rs`), runs +//! `generate_layout_with_config` per seed, scores, summarizes, renders, and +//! emits artifacts. +//! +//! Usage: +//! cargo run --release -p simlin-engine --features png_render,file_io --example layout_eval +//! LAYOUT_EVAL_MODELS=teacup,sir cargo run ... --example layout_eval +//! +//! Env knobs: +//! LAYOUT_EVAL_MODELS comma list of corpus keys to run (default: all) +//! LAYOUT_EVAL_SEEDS number of seeds M to sample (default: 25) +//! LAYOUT_EVAL_OUT output directory (default: repo-root target/layout-eval) +//! LAYOUT_EVAL_WRITE_BASELINE 1 -> write this run's report to the committed +//! baseline JSON (see below) instead of diffing. +//! +//! Baseline diff: a committed `examples/layout_eval_baseline.json` (a serialized +//! `CorpusReport`) records a reference run. A normal run reads it back, runs +//! `compare(baseline, candidate)`, and embeds the per-model + aggregate deltas +//! (with Mann-Whitney U p-values / significance verdicts) into `metrics.json` +//! and the `index.html` header. With `LAYOUT_EVAL_WRITE_BASELINE=1` the run +//! instead overwrites that baseline file (re-seed it after the metric weights +//! change). If the file is absent a normal run skips the diff with a note. +//! +//! Requires `--features png_render,file_io`: `png_render` for `render_png`, and +//! `file_io` so Vensim corpus models that reference external data can load. 
+ +use std::collections::BTreeSet; +use std::env; +use std::fmt::Write as _; +use std::io::BufReader; + +use rayon::prelude::*; +use serde::Serialize; +use simlin_engine::diagram::{PngRenderOpts, render_png}; +use simlin_engine::layout::LAYOUT_SEEDS; +use simlin_engine::layout::config::LayoutConfig; +use simlin_engine::layout::eval_stats::{ + Comparison, CorpusReport, MetricSample, ModelStats, compare, +}; +use simlin_engine::layout::generate_layout_with_config; +use simlin_engine::layout::metrics::{LayoutMetrics, MetricWeights, compute_layout_metrics}; +use simlin_engine::{datamodel, open_vensim, open_xmile}; + +/// The model name the layout pipeline and renderer operate on. `Project::get_model` +/// maps "main" to the single/main model (matching `tests/layout.rs`). +const MAIN_MODEL: &str = "main"; + +/// Default number of seeds to sample per model when `LAYOUT_EVAL_SEEDS` is unset. +const DEFAULT_SEEDS: u64 = 25; + +/// Path (relative to `CARGO_MANIFEST_DIR` = `src/simlin-engine`) of the committed +/// baseline `CorpusReport`. This file lives in the SOURCE TREE by design (it is +/// checked in and diffed against on every normal run), unlike every other +/// artifact, which is written under the gitignored `target/` output dir. +const BASELINE_REL_PATH: &str = "examples/layout_eval_baseline.json"; + +// ── Corpus ───────────────────────────────────────────────────────────────── + +#[derive(Clone, Copy)] +enum Format { + Xmile, + Vensim, +} + +struct ModelSpec { + key: &'static str, + /// Path relative to CARGO_MANIFEST_DIR (src/simlin-engine). + rel_path: &'static str, + format: Format, +} + +use Format::{Vensim, Xmile}; + +/// The curated corpus. Paths are relative to `CARGO_MANIFEST_DIR` +/// (`src/simlin-engine`); all entries were verified to exist on disk. 
+const CORPUS: &[ModelSpec] = &[ + // canonical small + ModelSpec { + key: "teacup", + rel_path: "../../test/test-models/samples/teacup/teacup.stmx", + format: Xmile, + }, + ModelSpec { + key: "sir", + rel_path: "../../test/test-models/samples/SIR/SIR.stmx", + format: Xmile, + }, + ModelSpec { + key: "logistic_growth", + rel_path: "../../test/logistic_growth_ltm/logistic_growth.stmx", + format: Xmile, + }, + // default_projects: the app's curated, hand-laid-out built-in projects. + // These are the primary "good layout" taste anchors for Phase 4 calibration. + ModelSpec { + key: "fishbanks", + rel_path: "../../default_projects/fishbanks/model.xmile", + format: Xmile, + }, + ModelSpec { + key: "dp_logistic_growth", + rel_path: "../../default_projects/logistic-growth/model.xmile", + format: Xmile, + }, + ModelSpec { + key: "population", + rel_path: "../../default_projects/population/model.xmile", + format: Xmile, + }, + ModelSpec { + key: "reliability", + rel_path: "../../default_projects/reliability/model.xmile", + format: Xmile, + }, + // modules + ModelSpec { + key: "hares_and_foxes", + rel_path: "../../test/modules_hares_and_foxes/modules_hares_and_foxes.stmx", + format: Xmile, + }, + // multipoint connectors + ModelSpec { + key: "multipoint", + rel_path: "../../test/test-models/samples/display/multipoint-connection.stmx", + format: Xmile, + }, + // aliases + ModelSpec { + key: "alias1", + rel_path: "../../test/alias1/alias1.stmx", + format: Xmile, + }, + // LTM / loop models + ModelSpec { + key: "cross_element", + rel_path: "../../test/cross_element_ltm/cross_element.stmx", + format: Xmile, + }, + ModelSpec { + key: "arrayed_pop", + rel_path: "../../test/arrayed_population_ltm/arrayed_population.stmx", + format: Xmile, + }, + // ai-information reference set (human vs AI; used by Phase 4 calibration) + ModelSpec { + key: "ai_pure_human", + rel_path: "../../test/ai-information/PureHumanModel.stmx", + format: Xmile, + }, + ModelSpec { + key: "ai_pure_ai", + 
 rel_path: "../../test/ai-information/PureAIModel.stmx", + format: Xmile, + }, + ModelSpec { + key: "ai_edited", + rel_path: "../../test/ai-information/GeneratedByAIThenEdited.stmx", + format: Xmile, + }, + ModelSpec { + key: "ai_modules_arrays", + rel_path: "../../test/ai-information/WithModulesAndArrays.stmx", + format: Xmile, + }, + // large metasd Vensim + ModelSpec { + key: "wrld3_03", + rel_path: "../../test/metasd/WRLD3-03/wrld3-03.mdl", + format: Vensim, + }, + ModelSpec { + key: "beer_game", + rel_path: "../../test/metasd/beer-game/RealBeer4-Sterman13.mdl", + format: Vensim, + }, + ModelSpec { + key: "wonderland", + rel_path: "../../test/metasd/wonderland/Wonderland3.mdl", + format: Vensim, + }, +]; + +/// Resolve a corpus-relative path against the crate manifest dir. +fn abs_path(rel: &str) -> String { + format!("{}/{}", env!("CARGO_MANIFEST_DIR"), rel) +} + +/// Load one corpus model, dispatching on its declared format: XMILE through a +/// buffered reader + `open_xmile`, Vensim `.mdl` through a string + `open_vensim` +/// (mirrors `examples/backend_bench.rs`). Returns a human-readable error on any +/// I/O or parse failure so the caller can WARN-and-skip (AC3.6). +fn load_model(spec: &ModelSpec) -> Result<datamodel::Project, String> { + let path = abs_path(spec.rel_path); + match spec.format { + Format::Xmile => { + let file = + std::fs::File::open(&path).map_err(|e| format!("failed to open {path}: {e}"))?; + let mut reader = BufReader::new(file); + open_xmile(&mut reader).map_err(|e| format!("failed to parse {path}: {e:?}")) + } + Format::Vensim => { + let contents = std::fs::read_to_string(&path) + .map_err(|e| format!("failed to read {path}: {e}"))?; + open_vensim(&contents).map_err(|e| format!("failed to parse {path}: {e:?}")) + } + } +} + +/// Count the view elements in the model's as-loaded main view -- the diagram +/// the later tasks score and render. A model with no hand-authored view yields +/// 0 here (its layout is generated from scratch in Task 2). 
+fn loaded_element_count(project: &datamodel::Project) -> usize { + reference_view(project) + .map(|sf| sf.elements.len()) + .unwrap_or(0) +} + +/// Borrow the model's as-loaded main `StockFlow` view if it is a hand-authored +/// reference: a non-empty view carrying non-empty `elements`. A model loaded +/// without a saved diagram (its layout is generated from scratch in the sweep) +/// has no such view, so this returns `None` and the caller skips the reference +/// render. +fn reference_view(project: &datamodel::Project) -> Option<&datamodel::StockFlow> { + let model = project.get_model(MAIN_MODEL)?; + match model.views.first() { + Some(datamodel::View::StockFlow(sf)) if !sf.elements.is_empty() => Some(sf), + _ => None, + } +} + +// ── Env knobs ──────────────────────────────────────────────────────────────── + +/// The set of corpus keys to run. `LAYOUT_EVAL_MODELS` is a comma list of keys; +/// unset/empty means the whole corpus. Unknown keys are reported and dropped so +/// a typo does not silently run nothing without explanation. +fn selected_keys() -> Vec<&'static str> { + let Ok(raw) = env::var("LAYOUT_EVAL_MODELS") else { + return CORPUS.iter().map(|s| s.key).collect(); + }; + let requested: Vec<&str> = raw + .split(',') + .map(str::trim) + .filter(|s| !s.is_empty()) + .collect(); + if requested.is_empty() { + return CORPUS.iter().map(|s| s.key).collect(); + } + let mut keys = Vec::new(); + for want in requested { + match CORPUS.iter().find(|s| s.key == want) { + Some(spec) => keys.push(spec.key), + None => eprintln!("WARN: unknown model key {want:?}; skipping"), + } + } + keys +} + +/// Number of seeds M to sample per model (`LAYOUT_EVAL_SEEDS`, default 25). +fn seed_count() -> u64 { + env::var("LAYOUT_EVAL_SEEDS") + .ok() + .and_then(|v| v.parse().ok()) + .unwrap_or(DEFAULT_SEEDS) +} + +/// The seeds to sample: the union of the production best-of-k proxy +/// (`LAYOUT_SEEDS`) and `0..m`, deduped and sorted. 
Including `LAYOUT_SEEDS` +/// guarantees the best-of-k production proxy is always computable regardless of +/// `m`. +fn seed_set(m: u64) -> Vec<u64> { + let mut seeds: BTreeSet<u64> = (0..m).collect(); + seeds.extend(LAYOUT_SEEDS); + seeds.into_iter().collect() +} + +/// The output directory (`LAYOUT_EVAL_OUT`, default repo-root +/// `target/layout-eval`, derived from `CARGO_MANIFEST_DIR`). +fn out_dir() -> String { + env::var("LAYOUT_EVAL_OUT") + .unwrap_or_else(|_| format!("{}/../../target/layout-eval", env!("CARGO_MANIFEST_DIR"))) +} + +/// Whether to (re)seed the committed baseline instead of diffing against it. +/// True when `LAYOUT_EVAL_WRITE_BASELINE` is set to a truthy value (`1`/`true`, +/// case-insensitive). Any other value -- and an unset variable -- means a normal +/// diffing run. +fn write_baseline_requested() -> bool { + matches!( + env::var("LAYOUT_EVAL_WRITE_BASELINE") + .unwrap_or_default() + .trim() + .to_ascii_lowercase() + .as_str(), + "1" | "true" + ) +} + +/// Absolute path of the committed baseline `CorpusReport` JSON. Resolved against +/// `CARGO_MANIFEST_DIR` so it always points at the source-tree file regardless +/// of the working directory the example runs from. +fn baseline_path() -> String { + format!("{}/{}", env!("CARGO_MANIFEST_DIR"), BASELINE_REL_PATH) +} + +// ── Per-model seed sweep ───────────────────────────────────────────────────── + +/// Lay out `project`'s main model once for each `seed`, score each layout, and +/// summarize the samples into a `ModelStats`. +/// +/// The per-seed layouts run in parallel via rayon (mirroring +/// `generate_best_layout`'s `par_iter` over seeds). The parallel results are +/// collapsed back into `seeds`-order before being summarized, so the sample +/// vector -- and every statistic derived from it -- is invariant to rayon's +/// scheduling: parallelism introduces no nondeterminism here. 
+/// +/// `generate_layout_with_config` is deterministic per seed (fix #633): the same +/// `(model, seed)` pair produces the identical layout on repeated calls within +/// and across processes, so the reported median/spread are reproducible. +/// +/// A seed whose layout fails to generate is dropped with a WARN (a single bad +/// seed must not sink the whole model's sweep). A model whose layout fails on +/// EVERY seed yields an empty `samples` vector here; the caller +/// (`process_model`) treats that zero-usable-samples case as a model-level +/// failure and skips the model (`WARN: skipping {key}: ...`), so a model that +/// never lays out is omitted from the report rather than reported as a +/// degenerate all-zero entry (AC3.6). +fn sweep_model(key: &str, project: &datamodel::Project, seeds: &[u64]) -> ModelStats { + // Compute one (seed, sample) per seed in parallel, then sort back into seed + // order so the sample vector -- and therefore every statistic derived from + // it -- is independent of rayon's scheduling. 
+ let mut indexed: Vec<(u64, MetricSample)> = seeds + .par_iter() + .filter_map(|&seed| { + let cfg = LayoutConfig { + annealing_random_seed: seed, + ..LayoutConfig::default() + }; + match generate_layout_with_config(project, MAIN_MODEL, cfg.clone(), None) { + Ok(view) => { + let metrics = compute_layout_metrics(&view, &cfg); + let weighted_cost = metrics.weighted_cost(&MetricWeights::default()); + Some(( + seed, + MetricSample { + seed, + metrics, + weighted_cost, + }, + )) + } + Err(err) => { + eprintln!("WARN: {key} seed {seed} failed to lay out: {err}"); + None + } + } + }) + .collect(); + + indexed.sort_by_key(|(seed, _)| *seed); + let samples: Vec<MetricSample> = indexed.into_iter().map(|(_, sample)| sample).collect(); + + ModelStats::from_samples(key.to_string(), samples, &LAYOUT_SEEDS) +} + +// ── Rendering ──────────────────────────────────────────────────────────────── + +/// One rendered diagram: the PNG filename written under the out dir (relative, +/// so the Task-4 `index.html` can reference it with a sibling `<img>`) and +/// the metric breakdown of the view that was rendered. The seed is `Some` for a +/// generated render (best/median/worst) and `None` for the as-loaded reference. +/// +/// `seed`, `metrics`, and `weighted_cost` are read by Task 4: the report builder +/// serializes them into `metrics.json` and the contact-sheet's per-render +/// breakdown table. They are kept as data here (rather than dropped and +/// recomputed) so the report builder is a pure read over this struct. +struct Render { + /// Filename of the PNG, relative to the out dir (e.g. `sir_best.png`). + file: String, + /// The seed that produced the generated view (`None` for the reference). + seed: Option<u64>, + /// Per-term metrics of the rendered view. + metrics: LayoutMetrics, + /// Scalar weighted cost under the calibrated default weights. 
+ weighted_cost: f64, +} + +/// All renders produced for one model: the optional hand-authored reference and +/// the three generated layouts (best/median/worst). Task 4 serializes these +/// per-model metric breakdowns into `metrics.json` and the contact-sheet, so the +/// fields are kept as data the report can read back. A render that failed is +/// `None` (the failure was already WARN-logged) -- skip-on-failure feeds Task 6. +struct ModelRenders { + reference: Option<Render>, + best: Option<Render>, + median: Option<Render>, + worst: Option<Render>, +} + +/// Render one view to a PNG file under `out`, scoring it with the default +/// layout config (the metric core is config-driven only for node sizing, which +/// is constant across the sweep). On any render or write failure, WARN to +/// stderr and return `None` so the sweep continues (AC3.6). +/// +/// `project` must already carry the view to render as its main model's first +/// view (the renderer reads `model.views.first()`). The caller installs the +/// view (a clone of the project for a generated layout, or the as-loaded +/// project for the reference) before calling. +fn render_view( + project: &datamodel::Project, + metrics: LayoutMetrics, + seed: Option<u64>, + file: &str, + out: &str, +) -> Option<Render> { + let png = match render_png(project, MAIN_MODEL, &PngRenderOpts::default()) { + Ok(bytes) => bytes, + Err(err) => { + eprintln!("WARN: failed to render {file}: {err}"); + return None; + } + }; + let path = format!("{out}/{file}"); + if let Err(err) = std::fs::write(&path, &png) { + eprintln!("WARN: failed to write {path}: {err}"); + return None; + } + let weighted_cost = metrics.weighted_cost(&MetricWeights::default()); + Some(Render { + file: file.to_string(), + seed, + metrics, + weighted_cost, + }) +} + +/// Regenerate the view for `seed`, install it into a clone of `project`, render +/// it to `{key}_{suffix}.png`, and return the `Render`. A layout-generation +/// failure is non-fatal: WARN and return `None`. 
+fn render_generated( + key: &str, + suffix: &str, + project: &datamodel::Project, + seed: u64, + out: &str, +) -> Option<Render> { + let cfg = LayoutConfig { + annealing_random_seed: seed, + ..LayoutConfig::default() + }; + let view = match generate_layout_with_config(project, MAIN_MODEL, cfg.clone(), None) { + Ok(view) => view, + Err(err) => { + eprintln!("WARN: {key} {suffix} (seed {seed}) failed to lay out: {err}"); + return None; + } + }; + let metrics = compute_layout_metrics(&view, &cfg); + // Install the generated view into a clone so the as-loaded project (and its + // reference view) is never mutated. + let mut p = project.clone(); + p.get_model_mut(MAIN_MODEL).unwrap().views = vec![datamodel::View::StockFlow(view)]; + let file = format!("{key}_{suffix}.png"); + render_view(&p, metrics, Some(seed), &file, out) +} + +/// Render the model's best/median/worst generated layouts and -- if the model +/// ships a hand-authored view -- its reference, all to PNGs under `out`. +/// +/// The reference is rendered from the AS-LOADED `project` (before any view is +/// overwritten) so it captures the model's own diagram, not a generated one. +/// Generated layouts are each regenerated from `project` by seed and installed +/// into a fresh clone, leaving `project` untouched. +fn render_model( + key: &str, + project: &datamodel::Project, + stats: &ModelStats, + out: &str, +) -> ModelRenders { + // Reference first, from the as-loaded project, before any clone-and-install. + // Score the hand-authored `StockFlow` directly (the renderer reads the same + // view from `project`, so this is the geometry being rasterized). 
+ let reference = reference_view(project).and_then(|sf| { + let metrics = compute_layout_metrics(sf, &LayoutConfig::default()); + render_view(project, metrics, None, &format!("{key}_reference.png"), out) + }); + + // A model whose sweep produced no samples has all-zero seeds and nothing + // worth rendering; skip the generated renders (the reference, if any, is + // already captured). + if stats.samples.is_empty() { + return ModelRenders { + reference, + best: None, + median: None, + worst: None, + }; + } + + let best = render_generated(key, "best", project, stats.best_seed, out); + let median = render_generated(key, "median", project, stats.median_seed, out); + let worst = render_generated(key, "worst", project, stats.worst_seed, out); + + ModelRenders { + reference, + best, + median, + worst, + } +} + +/// Print the PNG filenames produced for one model (and note a skipped reference +/// or generated render) so a run's stdout records exactly what was written. +fn report_renders(key: &str, renders: &ModelRenders) { + let mut produced: Vec<&str> = Vec::new(); + for render in [ + &renders.reference, + &renders.best, + &renders.median, + &renders.worst, + ] + .into_iter() + .flatten() + { + produced.push(render.file.as_str()); + } + if produced.is_empty() { + println!("{key}: no PNGs rendered"); + } else { + println!("{key}: rendered {}", produced.join(", ")); + } + if renders.reference.is_none() { + println!("{key}: no hand-authored reference view (skipped reference render)"); + } +} + +// ── Per-model pipeline (skip-on-failure) ───────────────────────────────────── + +/// Run one model's full pipeline -- load -> seed sweep -> render -- and return +/// its `(ModelStats, ModelRenders)` on success. 
+/// +/// This is the model-level skip-on-failure boundary (AC3.6): EVERY way a single +/// model can fail funnels through the returned `Err(String)`, which `main` turns +/// into a `WARN: skipping {key}: {err}` and a continue to the next model, so one +/// bad model never aborts the sweep and is simply omitted from the report. +/// +/// Three failure modes, validated in the order data flows (defense-in-depth): +/// 1. **Load failure** (entry layer): a missing file or a parse error is +/// already surfaced as `Err(String)` by `load_model`; propagated with `?`. +/// 2. **No usable layout** (business layer): `sweep_model` drops each +/// individually-failing seed with a WARN but still returns a (possibly +/// empty) `ModelStats`. A model whose layout failed on EVERY seed has zero +/// samples and cannot be scored, rendered, or aggregated -- it is a +/// model-level failure here, returned as `Err`. Crucially this only fires +/// when ALL seeds failed: a model with even one usable sample proceeds, so +/// a partial per-seed failure never sinks the model. +/// 3. **Render failure** (handled inside `render_model`): a layout that scores +/// but fails to rasterize or write is non-fatal -- it is WARN-logged and +/// its `Render` is `None`. A model can therefore appear in the report with +/// its statistics but a missing PNG cell; this is intentionally NOT a +/// model-level skip (the scores are still meaningful). +fn process_model( + spec: &ModelSpec, + seeds: &[u64], + out: &str, +) -> Result<(ModelStats, ModelRenders), String> { + // 1. Load (entry-layer validation lives in `load_model`). + let project = load_model(spec)?; + + let n = loaded_element_count(&project); + println!("loaded {}: {n} elements", spec.key); + + // 2. Sweep. A model with zero usable samples laid out on no seed -- it is a + // model-level failure, not a degenerate all-zero report entry. 
+ let stats = sweep_model(spec.key, &project, seeds); + if stats.samples.is_empty() { + return Err(format!( + "no usable layout: all {} seed(s) failed to lay out", + seeds.len(), + )); + } + + let (p25, p75) = stats.spread; + println!( + "{}: median={:.4} p25/p75={:.4}/{:.4} best_of_k={:.4} (M={})", + spec.key, + stats.median_cost, + p25, + p75, + stats.best_of_k_cost, + stats.samples.len(), + ); + + // 3. Render best/median/worst (and the reference, if any). Render failures + // are non-fatal: `render_model` WARN-logs and leaves the cell `None`. + let renders = render_model(spec.key, &project, &stats, out); + report_renders(spec.key, &renders); + + Ok((stats, renders)) +} + +// ── Report (metrics.json + index.html) ────────────────────────────────────── +// +// The structs below are the on-disk JSON shape. They are PURE DATA built once +// from the in-memory `ModelStats` + `ModelRenders` the sweep produced, then +// serialized straight to disk -- no recomputation. The contact-sheet HTML is +// rendered from the same `EvalReport`, so the JSON table and the HTML can never +// disagree. Building the report and rendering the HTML are pure (the only I/O +// is the two `std::fs::write` calls in `main`). + +/// One rendered view's row in the JSON: the PNG filename, the seed that +/// produced it (`None` for the as-loaded reference), the full per-term +/// `LayoutMetrics` breakdown, and the scalar `weighted_cost` under the weights +/// in use. +#[derive(Serialize)] +struct RenderReport { + file: String, + seed: Option<u64>, + metrics: LayoutMetrics, + weighted_cost: f64, +} + +/// One model's full row in the JSON: its summary statistics (the seed-sweep +/// center/spread, the best-of-k production proxy, the chosen best/median/worst +/// seeds, and `m` -- the number of seeds actually swept) plus each of its +/// renders' per-term breakdowns (`reference` present only when the model ships +/// a hand-authored view). 
+#[derive(Serialize)] +struct ModelReport { + model: String, + /// Number of seeds swept for this model (the union of `LAYOUT_SEEDS` and + /// `0..M`, deduped). Recorded so a reader can interpret the spread. + m: usize, + median_cost: f64, + /// `(p25, p75)` of the per-seed weighted costs. + spread: (f64, f64), + /// Production proxy: min weighted cost over the `LAYOUT_SEEDS` seed set. + best_of_k_cost: f64, + best_seed: u64, + median_seed: u64, + worst_seed: u64, + /// The hand-authored reference render + score, when the model ships one. + reference: Option<RenderReport>, + best: Option<RenderReport>, + median: Option<RenderReport>, + worst: Option<RenderReport>, +} + +/// The top-level `metrics.json` document: every scored model plus the corpus +/// aggregates (the geomean of per-model medians and the weight set used). +/// +/// `baseline_comparison` carries the baseline-vs-candidate diff (per-model + +/// aggregate deltas with Mann-Whitney p-values) when a committed baseline JSON +/// is present; it is `None` (and serde-skipped) when there is no baseline to +/// diff against. A reader therefore sees the diff embedded directly in the JSON, +/// or no `baseline_comparison` key at all. +#[derive(Serialize)] +struct EvalReport { + /// Models sorted worst-cost-first (highest `median_cost` at the front), the + /// same order the contact-sheet renders so the JSON and HTML agree. + models: Vec<ModelReport>, + /// Geometric mean of the per-model medians -- the single headline aggregate. + geomean_of_medians: f64, + /// The `MetricWeights` used to compute every `weighted_cost` in this report. + weights: MetricWeights, + /// The baseline-vs-candidate diff, present only when a committed baseline + /// `CorpusReport` was found and compared against this run. + #[serde(skip_serializing_if = "Option::is_none")] + baseline_comparison: Option<Comparison>, +} + +/// Map an in-memory `Render` to its JSON row. 
+fn render_report(render: &Render) -> RenderReport { + RenderReport { + file: render.file.clone(), + seed: render.seed, + metrics: render.metrics, + weighted_cost: render.weighted_cost, + } +} + +/// Build the serializable report from the sweep's in-memory results. +/// +/// PURE: a read over `(per_model, renders)` (paired positionally -- they are +/// pushed together per model in `main`) plus the corpus `geomean_of_medians` +/// and the weight set. Models are sorted worst-cost-first (highest median at +/// the front), the order the contact-sheet inspects top-down as the visual +/// guardrail; ties break on the model name so the order is deterministic. +fn build_report( + per_model: &[ModelStats], + renders: &[ModelRenders], + geomean_of_medians: f64, + weights: &MetricWeights, + baseline_comparison: Option<Comparison>, +) -> EvalReport { + let mut models: Vec<ModelReport> = per_model + .iter() + .zip(renders.iter()) + .map(|(stats, render)| ModelReport { + model: stats.model.clone(), + m: stats.samples.len(), + median_cost: stats.median_cost, + spread: stats.spread, + best_of_k_cost: stats.best_of_k_cost, + best_seed: stats.best_seed, + median_seed: stats.median_seed, + worst_seed: stats.worst_seed, + reference: render.reference.as_ref().map(render_report), + best: render.best.as_ref().map(render_report), + median: render.median.as_ref().map(render_report), + worst: render.worst.as_ref().map(render_report), + }) + .collect(); + + // Worst-cost-first: highest median at the front. Sort descending by median, + // tie-break on model name (ascending) for a deterministic ordering. NaN + // medians can't occur (eval_stats guarantees finite costs), but guard the + // partial_cmp anyway so a hypothetical NaN never panics the sort. 
+ models.sort_by(|a, b| { + b.median_cost + .partial_cmp(&a.median_cost) + .unwrap_or(std::cmp::Ordering::Equal) + .then_with(|| a.model.cmp(&b.model)) + }); + + EvalReport { + models, + geomean_of_medians, + weights: *weights, + baseline_comparison, + } +} + +/// HTML-escape the five characters that are special in element text or +/// attribute values. The interpolated strings are static model keys and +/// PNG filenames derived from them, so this is defense-in-depth rather than a +/// live injection vector -- but escaping unconditionally keeps the artifact +/// well-formed if a corpus key ever gains a special character. +fn html_escape(s: &str) -> String { + let mut out = String::with_capacity(s.len()); + for ch in s.chars() { + match ch { + '&' => out.push_str("&amp;"), + '<' => out.push_str("&lt;"), + '>' => out.push_str("&gt;"), + '"' => out.push_str("&quot;"), + '\'' => out.push_str("&#39;"), + _ => out.push(ch), + } + } + out +} + +/// Render the per-term metric breakdown for one render as a compact two-column +/// table (term name -> value), with the scalar `weighted_cost` as the final +/// row. PURE: appends to `html`. +fn write_metrics_table(html: &mut String, render: &RenderReport) { + let m = &render.metrics; + let rows = [ + ("node_overlap", m.node_overlap), + ("node_connector_overlap", m.node_connector_overlap), + ("label_overlap", m.label_overlap), + ("crossings", m.crossings), + ("sprawl", m.sprawl), + ("edge_length_cv", m.edge_length_cv), + ("aspect_penalty", m.aspect_penalty), + ("chain_straightness", m.chain_straightness), + ("loop_compactness", m.loop_compactness), + ]; + html.push_str(""); + for (name, value) in rows { + let _ = write!( + html, + "" + ); + } + let _ = write!( + html, + "", + render.weighted_cost + ); + html.push_str("
{name}{value:.4}
weighted_cost{:.4}
"); +} + +/// Render one render's cell (heading + image + breakdown table). A missing +/// render (the model shipped no reference, or its layout/render failed) renders +/// a muted placeholder so the contact-sheet records the gap rather than hiding +/// it. PURE. +fn write_render_cell(html: &mut String, kind: &str, render: Option<&RenderReport>) { + html.push_str("
"); + let _ = write!(html, "

{}

", html_escape(kind)); + match render { + Some(r) => { + let src = html_escape(&r.file); + let alt = html_escape(&format!("{kind} layout")); + let _ = write!(html, "\"{alt}\""); + if let Some(seed) = r.seed { + let _ = write!(html, "

seed {seed}

"); + } + write_metrics_table(html, r); + } + None => html.push_str("

(not rendered)

"), + } + html.push_str("
"); +} + +/// Format a `delta_ratio` as a signed percentage (e.g. `+3.2%`, `-0.0%`). PURE. +fn fmt_delta_pct(ratio: f64) -> String { + format!("{:+.2}%", ratio * 100.0) +} + +/// Render the baseline-vs-candidate diff into the header: the aggregate delta + +/// significance verdict, then a per-model table of `delta_ratio`, the +/// Mann-Whitney p-value, and the significance verdict. A `None` comparison (no +/// committed baseline) renders a muted note instead, so the contact-sheet always +/// records whether a baseline was diffed. PURE: appends to `html`. +fn write_baseline_diff(html: &mut String, comparison: Option<&Comparison>) { + let Some(cmp) = comparison else { + html.push_str( + "

No baseline diff (run with \ + LAYOUT_EVAL_WRITE_BASELINE=1 to seed one).

\n", + ); + return; + }; + + html.push_str("

Baseline diff

"); + let agg_class = if cmp.aggregate_significant { + "sig" + } else { + "nonsig" + }; + let agg_verdict = if cmp.aggregate_significant { + "significant" + } else { + "not significant" + }; + let _ = write!( + html, + "

aggregate delta {} · \ + p={:.4} · {agg_verdict}

", + fmt_delta_pct(cmp.aggregate_delta_ratio), + cmp.aggregate_p_value, + ); + + if cmp.per_model.is_empty() { + html.push_str("

(no models matched the baseline)

\n"); + return; + } + + html.push_str( + "\ + ", + ); + for m in &cmp.per_model { + let (cls, verdict) = if m.significant { + ("sig", "significant") + } else { + ("nonsig", "—") + }; + let _ = write!( + html, + "\ + \ + ", + html_escape(&m.model), + m.baseline_median, + m.candidate_median, + fmt_delta_pct(m.delta_ratio), + m.p_value, + ); + } + html.push_str("
modelbaselinecandidatedeltapsignificance
{}{:.4}{:.4}{}{:.4}{verdict}
\n"); +} + +/// Render the self-contained `index.html` contact-sheet from the report. +/// +/// PURE: a string built from `report`. The header shows the corpus +/// `geomean_of_medians`, the weight set, and (when a committed baseline was +/// diffed) the baseline-vs-candidate delta table; models are laid out one +/// section per model, worst-cost-first (the report is already sorted), each with +/// its reference (if any) and best/median/worst renders side by side and a +/// per-term breakdown under each. `` paths are relative to the out dir so +/// the file references its sibling PNGs. +fn render_index_html(report: &EvalReport) -> String { + let mut html = String::new(); + html.push_str( + "\n\n\n\n\ + \n\ + Layout quality eval\n\n\n\n", + ); + + html.push_str("

Layout quality eval

\n"); + let _ = writeln!( + &mut html, + "

Corpus geomean_of_medians = {:.4} over \ + {} model(s), sorted worst-cost-first.

", + report.geomean_of_medians, + report.models.len(), + ); + + // The weight set used for every weighted_cost in this report. + let w = &report.weights; + let weight_rows = [ + ("node_overlap", w.node_overlap), + ("node_connector_overlap", w.node_connector_overlap), + ("label_overlap", w.label_overlap), + ("crossings", w.crossings), + ("sprawl", w.sprawl), + ("edge_length_cv", w.edge_length_cv), + ("aspect_penalty", w.aspect_penalty), + ("chain_straightness", w.chain_straightness), + ("loop_compactness", w.loop_compactness), + ]; + html.push_str(""); + for (name, value) in weight_rows { + let _ = write!( + &mut html, + "" + ); + } + html.push_str("
weights
{name}{value:.4}
\n"); + + write_baseline_diff(&mut html, report.baseline_comparison.as_ref()); + + for model in &report.models { + let name = html_escape(&model.model); + html.push_str("
"); + let _ = write!(&mut html, "

{name}

"); + let _ = write!( + &mut html, + "

median={:.4} · p25/p75={:.4}/{:.4} · \ + best_of_k={:.4} · M={} · \ + seeds best/median/worst={}/{}/{}

", + model.median_cost, + model.spread.0, + model.spread.1, + model.best_of_k_cost, + model.m, + model.best_seed, + model.median_seed, + model.worst_seed, + ); + html.push_str("
"); + write_render_cell(&mut html, "reference", model.reference.as_ref()); + write_render_cell(&mut html, "best", model.best.as_ref()); + write_render_cell(&mut html, "median", model.median.as_ref()); + write_render_cell(&mut html, "worst", model.worst.as_ref()); + html.push_str("
\n"); + } + + html.push_str("\n\n"); + html +} + +// ── Baseline diff (imperative shell) ───────────────────────────────────────── + +/// Write `candidate` to the committed baseline JSON, replacing any existing +/// file. The full `CorpusReport` -- including each model's per-seed `samples` -- +/// is serialized so a later run can re-run Mann-Whitney U over the seed-sample +/// cost sets. On a serialize or write failure WARN to stderr (the run still +/// emits its `target/` artifacts; only the baseline re-seed failed). +fn write_baseline(candidate: &CorpusReport) { + let path = baseline_path(); + match serde_json::to_string_pretty(candidate) { + Ok(json) => match std::fs::write(&path, json) { + Ok(()) => println!( + "wrote baseline {path}\n\ + note: re-seed this baseline after the metric weights change." + ), + Err(err) => eprintln!("WARN: failed to write baseline {path}: {err}"), + }, + Err(err) => eprintln!("WARN: failed to serialize baseline: {err}"), + } +} + +/// Read and deserialize the committed baseline `CorpusReport`, if present. +/// +/// Returns `None` (with a one-line note) when the file does not exist -- the +/// expected state before a baseline has been seeded. A file that exists but +/// fails to read or parse is a real error: WARN with the cause and return `None` +/// so the run still emits its artifacts without a diff. 
+fn read_baseline() -> Option<CorpusReport> { + let path = baseline_path(); + let json = match std::fs::read_to_string(&path) { + Ok(json) => json, + Err(err) if err.kind() == std::io::ErrorKind::NotFound => { + println!("no baseline; run with LAYOUT_EVAL_WRITE_BASELINE=1 to seed one."); + return None; + } + Err(err) => { + eprintln!("WARN: failed to read baseline {path}: {err}"); + return None; + } + }; + match serde_json::from_str::<CorpusReport>(&json) { + Ok(report) => Some(report), + Err(err) => { + eprintln!("WARN: failed to parse baseline {path}: {err}"); + None + } + } +} + +/// Print the baseline-vs-candidate diff to stdout: one line per matched model +/// (delta + p-value + significance) and an aggregate line. PURE-ish: reads +/// `cmp` and prints; kept in the shell because it does I/O (stdout). +fn print_comparison(cmp: &Comparison) { + println!("baseline diff (candidate vs baseline):"); + for m in &cmp.per_model { + let verdict = if m.significant { + "significant" + } else { + "not significant" + }; + println!( + " {}: delta={} p={:.4} ({verdict})", + m.model, + fmt_delta_pct(m.delta_ratio), + m.p_value, + ); + } + if cmp.per_model.is_empty() { + println!(" (no models matched the baseline)"); + } + let agg_verdict = if cmp.aggregate_significant { + "significant" + } else { + "not significant" + }; + println!( + " aggregate: delta={} p={:.4} ({agg_verdict})", + fmt_delta_pct(cmp.aggregate_delta_ratio), + cmp.aggregate_p_value, + ); +} + +/// Resolve the baseline diff for this run. +/// +/// When `LAYOUT_EVAL_WRITE_BASELINE` is set, (re)seed the committed baseline +/// from `candidate` and return `None` (a seeding run reports no diff -- there is +/// nothing yet to diff against). Otherwise read the committed baseline (if any), +/// run `compare(baseline, candidate)`, print the diff, and return it for +/// embedding in the artifacts. Absent baseline -> `None`. 
+fn resolve_baseline_diff(candidate: &CorpusReport) -> Option<Comparison> { + if write_baseline_requested() { + write_baseline(candidate); + return None; + } + let baseline = read_baseline()?; + let cmp = compare(&baseline, candidate); + print_comparison(&cmp); + Some(cmp) +} + +fn main() { + let keys = selected_keys(); + let m = seed_count(); + let seeds = seed_set(m); + let out = out_dir(); + + std::fs::create_dir_all(&out) + .unwrap_or_else(|e| panic!("failed to create output dir {out}: {e}")); + + let n_sampled = seeds.len(); + println!( + "layout_eval: {} model(s), M={m} seeds (sampling {n_sampled} unique), out={out}", + keys.len(), + ); + + // Per-model skip-on-failure (AC3.6): each model's full pipeline (load -> + // sweep -> render) is wrapped in `process_model`. ANY failure -- a load + // error, a layout that fails on every seed, etc. -- is WARN-logged and the + // sweep CONTINUES to the next model; the failed model is omitted from + // `per_model`/`renders` (and therefore from every artifact). The harness + // always reaches the end and exits 0, even if every model was skipped. + // + // `per_model` and `renders` stay positionally paired: both are pushed + // exactly once per surviving model, so the Task-4 report builder can zip + // them. 
+ let mut per_model: Vec<ModelStats> = Vec::new(); + let mut renders: Vec<ModelRenders> = Vec::new(); + let mut skipped = 0usize; + for spec in CORPUS.iter().filter(|s| keys.contains(&s.key)) { + match process_model(spec, &seeds, &out) { + Ok((stats, model_renders)) => { + per_model.push(stats); + renders.push(model_renders); + } + Err(err) => { + eprintln!("WARN: skipping {}: {err}", spec.key); + skipped += 1; + } + } + } + if skipped > 0 { + println!("skipped {skipped} model(s) (see WARN lines above)"); + } + + let corpus = CorpusReport::from_model_stats(per_model); + println!( + "corpus: geomean_of_medians={:.4} ({} model(s) scored)", + corpus.geomean_of_medians, + corpus.per_model.len(), + ); + + let with_reference = renders.iter().filter(|r| r.reference.is_some()).count(); + println!( + "corpus: {with_reference}/{} model(s) shipped a hand-authored reference view", + renders.len(), + ); + + // Either (re)seed the committed baseline from this run, or diff this run's + // report against the committed baseline (printing the per-model + aggregate + // deltas with Mann-Whitney p-values). The returned `Comparison` (if any) is + // embedded into both artifacts below. + let baseline_comparison = resolve_baseline_diff(&corpus); + + // Build the serializable report from the in-memory stats + renders, then + // emit both artifacts under the out dir (which defaults under the gitignored + // repo-root `target/`). `corpus.per_model` and `renders` are positionally + // paired -- both are pushed once per surviving model in the loop above. 
+ let report = build_report( + &corpus.per_model, + &renders, + corpus.geomean_of_medians, + &MetricWeights::default(), + baseline_comparison, + ); + + let metrics_path = format!("{out}/metrics.json"); + match serde_json::to_string_pretty(&report) { + Ok(json) => match std::fs::write(&metrics_path, json) { + Ok(()) => println!("wrote {metrics_path}"), + Err(err) => eprintln!("WARN: failed to write {metrics_path}: {err}"), + }, + Err(err) => eprintln!("WARN: failed to serialize metrics.json: {err}"), + } + + let index_path = format!("{out}/index.html"); + let html = render_index_html(&report); + match std::fs::write(&index_path, html) { + Ok(()) => println!("wrote {index_path}"), + Err(err) => eprintln!("WARN: failed to write {index_path}: {err}"), + } +} diff --git a/src/simlin-engine/examples/layout_eval_baseline.README.md b/src/simlin-engine/examples/layout_eval_baseline.README.md new file mode 100644 index 000000000..c007f65ca --- /dev/null +++ b/src/simlin-engine/examples/layout_eval_baseline.README.md @@ -0,0 +1,33 @@ +# layout_eval_baseline.json + +The committed baseline `CorpusReport` that `examples/layout_eval.rs` diffs every +normal run against (per-model + aggregate deltas with Mann-Whitney U p-values). + +## How this snapshot was seeded + +This baseline was seeded over a **small representative subset** of the corpus to +keep the run fast and the committed JSON modest: + +``` +LAYOUT_EVAL_MODELS=sir,teacup LAYOUT_EVAL_SEEDS=8 LAYOUT_EVAL_WRITE_BASELINE=1 \ + cargo run --release -p simlin-engine --features png_render,file_io --example layout_eval +``` + +It records the **current pre-Rung-0 layout behavior**, scored with the committed +calibrated `MetricWeights::default()`. It was re-seeded on 2026-05-23 after +Phase 4 committed those weights and `layout_eval.rs` switched from the Phase-3 +`PLACEHOLDER_WEIGHTS` to `MetricWeights::default()`. Do not seed the full metasd +corpus here: that is minutes-scale and produces a large JSON. 
+ +## When to regenerate + +REGENERATE this baseline: + +- **Whenever the calibrated `MetricWeights::default()` change**: the weighted + costs change, so the recorded sample costs go stale. +- **Before Phase 5 measures Rung 0's improvement**: the baseline must capture + pre-Rung-0 behavior with the final calibrated weights so the Rung-0 diff is + meaningful. + +Re-run the seeding command above (optionally over a broader model set / larger +`LAYOUT_EVAL_SEEDS`) and commit the regenerated `layout_eval_baseline.json`. diff --git a/src/simlin-engine/examples/layout_eval_baseline.json b/src/simlin-engine/examples/layout_eval_baseline.json new file mode 100644 index 000000000..660587f39 --- /dev/null +++ b/src/simlin-engine/examples/layout_eval_baseline.json @@ -0,0 +1,393 @@ +{ + "per_model": [ + { + "model": "teacup", + "samples": [ + { + "seed": 0, + "metrics": { + "node_overlap": 0.03901734104046243, + "node_connector_overlap": 0.0, + "label_overlap": 0.0, + "crossings": 0.0, + "sprawl": 0.774985901426613, + "edge_length_cv": 0.3203457592744067, + "aspect_penalty": 0.0, + "chain_straightness": 0.0, + "loop_compactness": 0.0 + }, + "weighted_cost": 0.03901734104046243 + }, + { + "seed": 1, + "metrics": { + "node_overlap": 0.03901734104046243, + "node_connector_overlap": 0.0, + "label_overlap": 0.0, + "crossings": 0.0, + "sprawl": 0.774985901426613, + "edge_length_cv": 0.3203457592744067, + "aspect_penalty": 0.0, + "chain_straightness": 0.0, + "loop_compactness": 0.0 + }, + "weighted_cost": 0.03901734104046243 + }, + { + "seed": 2, + "metrics": { + "node_overlap": 0.03901734104046243, + "node_connector_overlap": 0.0, + "label_overlap": 0.0, + "crossings": 0.0, + "sprawl": 0.774985901426613, + "edge_length_cv": 0.3203457592744067, + "aspect_penalty": 0.0, + "chain_straightness": 0.0, + "loop_compactness": 0.0 + }, + "weighted_cost": 0.03901734104046243 + }, + { + "seed": 3, + "metrics": { + "node_overlap": 0.03901734104046243, + "node_connector_overlap": 0.0, + 
"label_overlap": 0.0, + "crossings": 0.0, + "sprawl": 0.774985901426613, + "edge_length_cv": 0.3203457592744067, + "aspect_penalty": 0.0, + "chain_straightness": 0.0, + "loop_compactness": 0.0 + }, + "weighted_cost": 0.03901734104046243 + }, + { + "seed": 4, + "metrics": { + "node_overlap": 0.03901734104046243, + "node_connector_overlap": 0.0, + "label_overlap": 0.0, + "crossings": 0.0, + "sprawl": 0.774985901426613, + "edge_length_cv": 0.3203457592744067, + "aspect_penalty": 0.0, + "chain_straightness": 0.0, + "loop_compactness": 0.0 + }, + "weighted_cost": 0.03901734104046243 + }, + { + "seed": 5, + "metrics": { + "node_overlap": 0.03901734104046243, + "node_connector_overlap": 0.0, + "label_overlap": 0.0, + "crossings": 0.0, + "sprawl": 0.774985901426613, + "edge_length_cv": 0.3203457592744067, + "aspect_penalty": 0.0, + "chain_straightness": 0.0, + "loop_compactness": 0.0 + }, + "weighted_cost": 0.03901734104046243 + }, + { + "seed": 6, + "metrics": { + "node_overlap": 0.03901734104046243, + "node_connector_overlap": 0.0, + "label_overlap": 0.0, + "crossings": 0.0, + "sprawl": 0.774985901426613, + "edge_length_cv": 0.3203457592744067, + "aspect_penalty": 0.0, + "chain_straightness": 0.0, + "loop_compactness": 0.0 + }, + "weighted_cost": 0.03901734104046243 + }, + { + "seed": 7, + "metrics": { + "node_overlap": 0.03901734104046243, + "node_connector_overlap": 0.0, + "label_overlap": 0.0, + "crossings": 0.0, + "sprawl": 0.774985901426613, + "edge_length_cv": 0.3203457592744067, + "aspect_penalty": 0.0, + "chain_straightness": 0.0, + "loop_compactness": 0.0 + }, + "weighted_cost": 0.03901734104046243 + }, + { + "seed": 42, + "metrics": { + "node_overlap": 0.03901734104046243, + "node_connector_overlap": 0.0, + "label_overlap": 0.0, + "crossings": 0.0, + "sprawl": 0.774985901426613, + "edge_length_cv": 0.3203457592744067, + "aspect_penalty": 0.0, + "chain_straightness": 0.0, + "loop_compactness": 0.0 + }, + "weighted_cost": 0.03901734104046243 + }, + { + "seed": 
123, + "metrics": { + "node_overlap": 0.03901734104046243, + "node_connector_overlap": 0.0, + "label_overlap": 0.0, + "crossings": 0.0, + "sprawl": 0.774985901426613, + "edge_length_cv": 0.3203457592744067, + "aspect_penalty": 0.0, + "chain_straightness": 0.0, + "loop_compactness": 0.0 + }, + "weighted_cost": 0.03901734104046243 + }, + { + "seed": 456, + "metrics": { + "node_overlap": 0.03901734104046243, + "node_connector_overlap": 0.0, + "label_overlap": 0.0, + "crossings": 0.0, + "sprawl": 0.774985901426613, + "edge_length_cv": 0.3203457592744067, + "aspect_penalty": 0.0, + "chain_straightness": 0.0, + "loop_compactness": 0.0 + }, + "weighted_cost": 0.03901734104046243 + }, + { + "seed": 789, + "metrics": { + "node_overlap": 0.03901734104046243, + "node_connector_overlap": 0.0, + "label_overlap": 0.0, + "crossings": 0.0, + "sprawl": 0.774985901426613, + "edge_length_cv": 0.3203457592744067, + "aspect_penalty": 0.0, + "chain_straightness": 0.0, + "loop_compactness": 0.0 + }, + "weighted_cost": 0.03901734104046243 + } + ], + "median_cost": 0.03901734104046243, + "spread": [ + 0.03901734104046243, + 0.03901734104046243 + ], + "best_of_k_cost": 0.03901734104046243, + "best_seed": 0, + "median_seed": 0, + "worst_seed": 0 + }, + { + "model": "sir", + "samples": [ + { + "seed": 0, + "metrics": { + "node_overlap": 0.0, + "node_connector_overlap": 0.0, + "label_overlap": 0.038540721316451254, + "crossings": 0.0, + "sprawl": 0.7423022923087866, + "edge_length_cv": 0.39340989843910823, + "aspect_penalty": 0.06837606837606858, + "chain_straightness": 0.0, + "loop_compactness": 0.0 + }, + "weighted_cost": 0.038540721316451254 + }, + { + "seed": 1, + "metrics": { + "node_overlap": 0.0, + "node_connector_overlap": 0.0, + "label_overlap": 0.038540721316451254, + "crossings": 0.0, + "sprawl": 0.7423022923087866, + "edge_length_cv": 0.39340989843910823, + "aspect_penalty": 0.06837606837606858, + "chain_straightness": 0.0, + "loop_compactness": 0.0 + }, + "weighted_cost": 
0.038540721316451254 + }, + { + "seed": 2, + "metrics": { + "node_overlap": 0.0, + "node_connector_overlap": 0.0, + "label_overlap": 0.038540721316451254, + "crossings": 0.0, + "sprawl": 0.7423022923087866, + "edge_length_cv": 0.39340989843910823, + "aspect_penalty": 0.06837606837606858, + "chain_straightness": 0.0, + "loop_compactness": 0.0 + }, + "weighted_cost": 0.038540721316451254 + }, + { + "seed": 3, + "metrics": { + "node_overlap": 0.0, + "node_connector_overlap": 0.0, + "label_overlap": 0.038540721316451254, + "crossings": 0.0, + "sprawl": 0.7423022923087866, + "edge_length_cv": 0.39340989843910823, + "aspect_penalty": 0.06837606837606858, + "chain_straightness": 0.0, + "loop_compactness": 0.0 + }, + "weighted_cost": 0.038540721316451254 + }, + { + "seed": 4, + "metrics": { + "node_overlap": 0.0, + "node_connector_overlap": 0.0, + "label_overlap": 0.038540721316451254, + "crossings": 0.0, + "sprawl": 0.7423022923087866, + "edge_length_cv": 0.39340989843910823, + "aspect_penalty": 0.06837606837606858, + "chain_straightness": 0.0, + "loop_compactness": 0.0 + }, + "weighted_cost": 0.038540721316451254 + }, + { + "seed": 5, + "metrics": { + "node_overlap": 0.0, + "node_connector_overlap": 0.0, + "label_overlap": 0.038540721316451254, + "crossings": 0.0, + "sprawl": 0.7423022923087866, + "edge_length_cv": 0.39340989843910823, + "aspect_penalty": 0.06837606837606858, + "chain_straightness": 0.0, + "loop_compactness": 0.0 + }, + "weighted_cost": 0.038540721316451254 + }, + { + "seed": 6, + "metrics": { + "node_overlap": 0.0, + "node_connector_overlap": 0.0, + "label_overlap": 0.038540721316451254, + "crossings": 0.0, + "sprawl": 0.7423022923087866, + "edge_length_cv": 0.39340989843910823, + "aspect_penalty": 0.06837606837606858, + "chain_straightness": 0.0, + "loop_compactness": 0.0 + }, + "weighted_cost": 0.038540721316451254 + }, + { + "seed": 7, + "metrics": { + "node_overlap": 0.0, + "node_connector_overlap": 0.0, + "label_overlap": 0.038540721316451254, + 
"crossings": 0.0, + "sprawl": 0.7423022923087866, + "edge_length_cv": 0.39340989843910823, + "aspect_penalty": 0.06837606837606858, + "chain_straightness": 0.0, + "loop_compactness": 0.0 + }, + "weighted_cost": 0.038540721316451254 + }, + { + "seed": 42, + "metrics": { + "node_overlap": 0.0, + "node_connector_overlap": 0.0, + "label_overlap": 0.038540721316451254, + "crossings": 0.0, + "sprawl": 0.7423022923087866, + "edge_length_cv": 0.39340989843910823, + "aspect_penalty": 0.06837606837606858, + "chain_straightness": 0.0, + "loop_compactness": 0.0 + }, + "weighted_cost": 0.038540721316451254 + }, + { + "seed": 123, + "metrics": { + "node_overlap": 0.0, + "node_connector_overlap": 0.0, + "label_overlap": 0.038540721316451254, + "crossings": 0.0, + "sprawl": 0.7423022923087866, + "edge_length_cv": 0.39340989843910823, + "aspect_penalty": 0.06837606837606858, + "chain_straightness": 0.0, + "loop_compactness": 0.0 + }, + "weighted_cost": 0.038540721316451254 + }, + { + "seed": 456, + "metrics": { + "node_overlap": 0.0, + "node_connector_overlap": 0.0, + "label_overlap": 0.038540721316451254, + "crossings": 0.0, + "sprawl": 0.7423022923087866, + "edge_length_cv": 0.39340989843910823, + "aspect_penalty": 0.06837606837606858, + "chain_straightness": 0.0, + "loop_compactness": 0.0 + }, + "weighted_cost": 0.038540721316451254 + }, + { + "seed": 789, + "metrics": { + "node_overlap": 0.0, + "node_connector_overlap": 0.0, + "label_overlap": 0.038540721316451254, + "crossings": 0.0, + "sprawl": 0.7423022923087866, + "edge_length_cv": 0.39340989843910823, + "aspect_penalty": 0.06837606837606858, + "chain_straightness": 0.0, + "loop_compactness": 0.0 + }, + "weighted_cost": 0.038540721316451254 + } + ], + "median_cost": 0.038540721316451254, + "spread": [ + 0.038540721316451254, + 0.038540721316451254 + ], + "best_of_k_cost": 0.038540721316451254, + "best_seed": 0, + "median_seed": 0, + "worst_seed": 0 + } + ], + "geomean_of_medians": 0.03877829892542217 +} \ No newline at end 
of file diff --git a/src/simlin-engine/src/diagram/common.rs b/src/simlin-engine/src/diagram/common.rs index cf4a16596..683747f4d 100644 --- a/src/simlin-engine/src/diagram/common.rs +++ b/src/simlin-engine/src/diagram/common.rs @@ -137,6 +137,113 @@ pub fn rad_to_deg(r: f64) -> f64 { (r * 180.0) / PI } +// These rectangle/segment geometry primitives are the load-bearing helpers for +// the layout quality metric (`layout::metrics`). `rect_width`/`rect_height`/ +// `rect_area`/`rect_overlap_area` are consumed there (node-overlap, +// label-overlap, sprawl, and aspect terms), and `segment_clip_interval_in_rect` +// is the Liang-Barsky core that `node_connector_overlap` unions across boxes. +// `rect_contains_point` and `segment_length_in_rect` are primitives kept for +// completeness and as the single-box reference oracle the metric's tests check +// the union path against, so each stays `#[allow(dead_code)]` until a non-test +// caller needs it. + +/// Width of a rect (right - left). May be negative for a degenerate/inverted rect. +pub(crate) fn rect_width(r: &Rect) -> f64 { + r.right - r.left +} + +/// Height of a rect (bottom - top). +pub(crate) fn rect_height(r: &Rect) -> f64 { + r.bottom - r.top +} + +/// Area of a rect, clamped to >= 0. +pub(crate) fn rect_area(r: &Rect) -> f64 { + (rect_width(r).max(0.0)) * (rect_height(r).max(0.0)) +} + +/// Area of the axis-aligned intersection of two rects (0 if they do not overlap). +pub(crate) fn rect_overlap_area(a: &Rect, b: &Rect) -> f64 { + let w = a.right.min(b.right) - a.left.max(b.left); + let h = a.bottom.min(b.bottom) - a.top.max(b.top); + if w > 0.0 && h > 0.0 { w * h } else { 0.0 } +} + +/// True if `p` lies inside (or on the boundary of) `r`. 
+#[allow(dead_code)] +pub(crate) fn rect_contains_point(r: &Rect, p: &Point) -> bool { + p.x >= r.left && p.x <= r.right && p.y >= r.top && p.y <= r.bottom +} + +/// Clipped parameter interval `[t0, t1]` of segment `p0 + t*(p1-p0)` (t in +/// [0,1]) that lies within axis-aligned rect `r`, or `None` if the segment never +/// enters `r`. When `Some`, `0.0 <= t0 < t1 <= 1.0` (a zero-thickness touch +/// where `t0 == t1` returns `None`, contributing no length). This is the +/// Liang-Barsky core; `segment_length_in_rect` delegates to it, and +/// `layout::metrics` uses the raw intervals to UNION a connector's coverage +/// across multiple boxes so each physical sub-length is counted at most once. +/// Pure; no allocation. +pub(crate) fn segment_clip_interval_in_rect( + p0: &Point, + p1: &Point, + r: &Rect, +) -> Option<(f64, f64)> { + // Liang-Barsky clip of the parametric segment p0 + t*(p1-p0), t in [0,1], + // against left/right/top/bottom slabs. + let dx = p1.x - p0.x; + let dy = p1.y - p0.y; + let mut t0 = 0.0_f64; + let mut t1 = 1.0_f64; + // (p, q) pairs for the four half-planes; segment inside slab where p*t <= q. + let edges = [ + (-dx, p0.x - r.left), + (dx, r.right - p0.x), + (-dy, p0.y - r.top), + (dy, r.bottom - p0.y), + ]; + for (p, q) in edges { + if p == 0.0 { + if q < 0.0 { + return None; // parallel and outside this slab + } + } else { + let t = q / p; + if p < 0.0 { + if t > t1 { + return None; + } + if t > t0 { + t0 = t; + } + } else { + if t < t0 { + return None; + } + if t < t1 { + t1 = t; + } + } + } + } + if t1 > t0 { Some((t0, t1)) } else { None } +} + +/// Length of the portion of segment p0->p1 that lies within axis-aligned rect r. +/// Returns 0 if the segment never enters r. Pure; no allocation. Delegates to +/// `segment_clip_interval_in_rect` so the clip math lives in exactly one place. 
+#[allow(dead_code)] +pub(crate) fn segment_length_in_rect(p0: &Point, p1: &Point, r: &Rect) -> f64 { + match segment_clip_interval_in_rect(p0, p1, r) { + Some((t0, t1)) => { + let dx = p1.x - p0.x; + let dy = p1.y - p0.y; + let seg_len = (dx * dx + dy * dy).sqrt(); + (t1 - t0) * seg_len + } + None => 0.0, + } +} + #[cfg(test)] mod tests { use super::*; @@ -282,4 +389,194 @@ mod tests { assert!((rad_to_deg(PI) - 180.0).abs() < 1e-10); assert!((rad_to_deg(PI / 2.0) - 90.0).abs() < 1e-10); } + + #[test] + fn test_rect_dimensions() { + let r = Rect { + top: 10.0, + left: 20.0, + right: 50.0, + bottom: 70.0, + }; + assert_eq!(rect_width(&r), 30.0); + assert_eq!(rect_height(&r), 60.0); + assert_eq!(rect_area(&r), 30.0 * 60.0); + } + + #[test] + fn test_rect_area_clamps_negative() { + // An inverted/degenerate rect (right < left, bottom < top) has + // negative width/height; rect_area clamps each to 0 so the result is 0. + let inverted = Rect { + top: 70.0, + left: 50.0, + right: 20.0, + bottom: 10.0, + }; + assert!(rect_width(&inverted) < 0.0); + assert!(rect_height(&inverted) < 0.0); + assert_eq!(rect_area(&inverted), 0.0); + } + + #[test] + fn test_rect_overlap_area_known_overlap() { + // a covers x in [0,10], y in [0,10]; b covers x in [5,15], y in [5,15]. + // Their intersection is x in [5,10], y in [5,10] => 5 x 5 = 25. + let a = Rect { + top: 0.0, + left: 0.0, + right: 10.0, + bottom: 10.0, + }; + let b = Rect { + top: 5.0, + left: 5.0, + right: 15.0, + bottom: 15.0, + }; + assert_eq!(rect_overlap_area(&a, &b), 25.0); + // Overlap is symmetric in argument order. 
+ assert_eq!(rect_overlap_area(&b, &a), 25.0); + } + + #[test] + fn test_rect_overlap_area_disjoint() { + let a = Rect { + top: 0.0, + left: 0.0, + right: 10.0, + bottom: 10.0, + }; + let b = Rect { + top: 20.0, + left: 20.0, + right: 30.0, + bottom: 30.0, + }; + assert_eq!(rect_overlap_area(&a, &b), 0.0); + } + + #[test] + fn test_rect_overlap_area_identical() { + // Two identical rects overlap by their full area. + let r = Rect { + top: 0.0, + left: 0.0, + right: 10.0, + bottom: 4.0, + }; + assert_eq!(rect_overlap_area(&r, &r), rect_area(&r)); + assert_eq!(rect_overlap_area(&r, &r), 40.0); + } + + #[test] + fn test_rect_overlap_area_touching_edge() { + // b's left edge touches a's right edge (both at x=10): zero-width overlap => 0. + let a = Rect { + top: 0.0, + left: 0.0, + right: 10.0, + bottom: 10.0, + }; + let b = Rect { + top: 0.0, + left: 10.0, + right: 20.0, + bottom: 10.0, + }; + assert_eq!(rect_overlap_area(&a, &b), 0.0); + } + + #[test] + fn test_rect_contains_point() { + let r = Rect { + top: 0.0, + left: 0.0, + right: 10.0, + bottom: 10.0, + }; + // Strictly inside. + assert!(rect_contains_point(&r, &Point { x: 5.0, y: 5.0 })); + // On the boundary (inclusive). + assert!(rect_contains_point(&r, &Point { x: 0.0, y: 0.0 })); + assert!(rect_contains_point(&r, &Point { x: 10.0, y: 10.0 })); + assert!(rect_contains_point(&r, &Point { x: 0.0, y: 5.0 })); + // Outside on each side. + assert!(!rect_contains_point(&r, &Point { x: -1.0, y: 5.0 })); + assert!(!rect_contains_point(&r, &Point { x: 11.0, y: 5.0 })); + assert!(!rect_contains_point(&r, &Point { x: 5.0, y: -1.0 })); + assert!(!rect_contains_point(&r, &Point { x: 5.0, y: 11.0 })); + } + + #[test] + fn test_segment_length_in_rect_crosses_fully() { + // Rect spans x in [0,10], y in [0,10]. A horizontal segment from + // (-5, 5) to (15, 5) enters at x=0 and exits at x=10 => inside length 10. 
+ let r = Rect { + top: 0.0, + left: 0.0, + right: 10.0, + bottom: 10.0, + }; + let got = + segment_length_in_rect(&Point { x: -5.0, y: 5.0 }, &Point { x: 15.0, y: 5.0 }, &r); + assert!((got - 10.0).abs() < 1e-9, "got {got}"); + } + + #[test] + fn test_segment_length_in_rect_entirely_outside() { + let r = Rect { + top: 0.0, + left: 0.0, + right: 10.0, + bottom: 10.0, + }; + // Segment well above the rect, never enters. + let got = + segment_length_in_rect(&Point { x: -5.0, y: 50.0 }, &Point { x: 15.0, y: 50.0 }, &r); + assert_eq!(got, 0.0); + } + + #[test] + fn test_segment_length_in_rect_entirely_inside() { + let r = Rect { + top: 0.0, + left: 0.0, + right: 10.0, + bottom: 10.0, + }; + // Segment from (2,2) to (5,6): both endpoints inside; full length is + // sqrt(3^2 + 4^2) = 5. + let got = segment_length_in_rect(&Point { x: 2.0, y: 2.0 }, &Point { x: 5.0, y: 6.0 }, &r); + assert!((got - 5.0).abs() < 1e-9, "got {got}"); + } + + #[test] + fn test_segment_length_in_rect_one_endpoint_inside() { + let r = Rect { + top: 0.0, + left: 0.0, + right: 10.0, + bottom: 10.0, + }; + // Horizontal segment from (5,5) (inside) to (25,5) (outside): the + // portion inside runs from x=5 to x=10 => length 5. + let got = segment_length_in_rect(&Point { x: 5.0, y: 5.0 }, &Point { x: 25.0, y: 5.0 }, &r); + assert!((got - 5.0).abs() < 1e-9, "got {got}"); + } + + #[test] + fn test_segment_length_in_rect_parallel_outside_slab() { + // A vertical segment to the left of the rect is parallel to the + // left/right slabs and outside them: dx == 0 with q < 0 => 0. 
+ let r = Rect { + top: 0.0, + left: 0.0, + right: 10.0, + bottom: 10.0, + }; + let got = + segment_length_in_rect(&Point { x: -5.0, y: -5.0 }, &Point { x: -5.0, y: 15.0 }, &r); + assert_eq!(got, 0.0); + } } diff --git a/src/simlin-engine/src/diagram/connector.rs b/src/simlin-engine/src/diagram/connector.rs index a59f5a0e0..14ed42f0f 100644 --- a/src/simlin-engine/src/diagram/connector.rs +++ b/src/simlin-engine/src/diagram/connector.rs @@ -13,6 +13,15 @@ use crate::diagram::common::{ }; use crate::diagram::constants::*; +/// Number of straight segments used to approximate a drawn arc connector when +/// producing its polyline for crossing detection and metric computation. 16 +/// segments closely tracks the curve: the maximum chord-to-arc deviation for a +/// half-circle sampled this finely is well under a pixel at typical diagram +/// radii, which is more than enough to detect whether the arc crosses another +/// edge. It does not affect rendered SVG (the renderer still emits a single +/// `A` arc command); it only governs the sampled geometry the metric sees. 
+pub(crate) const ARC_POLYLINE_SAMPLES: usize = 16; + enum ElementShape { Circle { r: f64 }, Rect { hw: f64, hh: f64 }, @@ -101,7 +110,10 @@ fn is_element_arrayed(element: &ViewElement, is_arrayed_fn: &dyn Fn(&str) -> boo } } -fn get_visual_center(element: &ViewElement, is_arrayed_fn: &dyn Fn(&str) -> bool) -> (f64, f64) { +pub(crate) fn get_visual_center( + element: &ViewElement, + is_arrayed_fn: &dyn Fn(&str) -> bool, +) -> (f64, f64) { let (cx, cy) = match element { ViewElement::Aux(a) => (a.x, a.y), ViewElement::Stock(s) => (s.x, s.y), @@ -140,7 +152,7 @@ fn circle_from_points(p1: Point, p2: Point, p3: Point) -> Result f64 { +pub(crate) fn opposite_theta(theta: f64) -> f64 { let mut t = theta + PI; if t > PI { t -= 2.0 * PI; @@ -148,7 +160,7 @@ fn opposite_theta(theta: f64) -> f64 { t } -fn intersect_element_straight( +pub(crate) fn intersect_element_straight( element: &ViewElement, theta: f64, is_arrayed_fn: &dyn Fn(&str) -> bool, @@ -164,7 +176,7 @@ fn intersect_element_straight( } } -fn intersect_element_arc( +pub(crate) fn intersect_element_arc( element: &ViewElement, circ: &Circle, inv: bool, @@ -215,7 +227,7 @@ fn intersect_element_arc( } } -fn is_straight_line( +pub(crate) fn is_straight_line( element: &view_element::Link, from: &ViewElement, to: &ViewElement, @@ -234,7 +246,7 @@ fn is_straight_line( } } -fn arc_circle( +pub(crate) fn arc_circle( element: &view_element::Link, from: &ViewElement, to: &ViewElement, @@ -342,24 +354,48 @@ fn render_straight_line( svg } -fn render_arc( +/// The exact scalars `render_arc` needs to format its SVG, plus what an arc +/// sampler needs to reproduce the drawn curve as a polyline. All fields are +/// raw f64 (no pre-rounding): rounding happens only at the `js_format_number` +/// boundary in `render_arc`, so the SVG string stays byte-for-byte identical +/// to the pre-factor-out code (and to the TypeScript renderer). 
+#[derive(Clone, Copy)] +struct ArcGeometry { + /// SVG path start (= `from_visual`, the source element center). + start: Point, + /// SVG path end (= `to_visual`, the target element center). + arc_end: Point, + /// Arc center and radius. + circ: Circle, + /// SVG large-arc-flag. + sweep: bool, + /// SVG sweep-flag. + inv: bool, + /// Arrowhead anchor point on the target element boundary. + end: Point, + /// Final arrowhead rotation in degrees (already adjusted for `inv`). + arrow_theta: f64, +} + +/// Compute the drawn-arc geometry for a connector. Returns `None` in the two +/// cases the renderer draws nothing: a non-`Arc` shape (e.g. `MultiPoint`) and +/// a degenerate arc where `arc_circle` cannot be constructed. The body is the +/// verbatim geometry the original `render_arc` computed (lines that produced +/// `circ`, `inv`, `sweep`, `start`, `arc_end`, `end`, and `arrow_theta`). +fn arc_geometry( element: &view_element::Link, from: &ViewElement, to: &ViewElement, - is_to_stock: bool, is_arrayed_fn: &dyn Fn(&str) -> bool, -) -> String { +) -> Option<ArcGeometry> { let from_visual = get_visual_center(from, is_arrayed_fn); let to_visual = get_visual_center(to, is_arrayed_fn); - let circ = match arc_circle(element, from, to, is_arrayed_fn) { - Some(c) => c, - None => return "".to_string(), - }; + let circ = arc_circle(element, from, to, is_arrayed_fn)?; let takeoff_angle = match &element.shape { LinkShape::Arc(arc) => deg_to_rad(*arc), - _ => return "".to_string(), + _ => return None, }; let from_theta = (from_visual.1 - circ.y).atan2(from_visual.0 - circ.x); @@ -397,23 +433,120 @@ fn render_arc( }; let end = intersect_element_arc(to, &circ, !inv, is_arrayed_fn); - let path = format!( - "M{},{}A{},{} 0 {},{} {},{}", - js_format_number(start.x), - js_format_number(start.y), - js_format_number(circ.r), - js_format_number(circ.r), - sweep as u8, - inv as u8, - js_format_number(arc_end.x), - js_format_number(arc_end.y) - ); - let mut arrow_theta = rad_to_deg((end.y - 
circ.y).atan2(end.x - circ.x)) - 90.0; if inv { arrow_theta += 180.0; } + Some(ArcGeometry { + start, + arc_end, + circ, + sweep, + inv, + end, + arrow_theta, + }) +} + +/// Sample the drawn SVG arc as a polyline from `g.start` to `g.arc_end` along +/// `g.circ`, honoring the SVG large-arc (`g.sweep`) and sweep (`g.inv`) flags. +/// Uses the standard SVG endpoint->center arc parametrization: derive the +/// start angle and a signed sweep `delta` from the two endpoint angles, then +/// adjust `delta` so its sign matches the sweep-flag and its magnitude matches +/// the large-arc-flag. Returns `samples.max(2) + 1` points. +fn sample_arc(g: &ArcGeometry, samples: usize) -> Vec<Point> { + let n = samples.max(2); + let theta0 = (g.start.y - g.circ.y).atan2(g.start.x - g.circ.x); + let theta1 = (g.arc_end.y - g.circ.y).atan2(g.arc_end.x - g.circ.x); + // SVG sweep-flag (g.inv) selects direction; large-arc-flag (g.sweep) + // selects the >180-degree arc. Normalize delta accordingly. + let mut delta = theta1 - theta0; + let two_pi = 2.0 * std::f64::consts::PI; + // bring delta into (-2pi, 2pi) + while delta <= -two_pi { + delta += two_pi; + } + while delta >= two_pi { + delta -= two_pi; + } + let sweep_positive = g.inv; // sweep-flag set => angles increase + if sweep_positive && delta < 0.0 { + delta += two_pi; + } + if !sweep_positive && delta > 0.0 { + delta -= two_pi; + } + let large = g.sweep; // large-arc-flag + if large && delta.abs() < std::f64::consts::PI { + delta += if delta >= 0.0 { two_pi } else { -two_pi }; + } + if !large && delta.abs() > std::f64::consts::PI { + delta += if delta >= 0.0 { -two_pi } else { two_pi }; + } + (0..=n) + .map(|i| { + let t = i as f64 / n as f64; + let th = theta0 + delta * t; + Point { + x: g.circ.x + g.circ.r * th.cos(), + y: g.circ.y + g.circ.r * th.sin(), + } + }) + .collect() +} + +/// The polyline the renderer draws for a connector, as the metric/crossing +/// code sees it. 
Straight links are clipped to element boundaries (matching +/// `render_straight_line`); arcs are sampled center-to-center along the arc +/// circle (matching `render_arc`, which draws start=from_visual to +/// arc_end=to_visual); MultiPoint links return an empty vec because the +/// renderer draws nothing for them today (known gap). +pub(crate) fn connector_polyline( + element: &view_element::Link, + from: &ViewElement, + to: &ViewElement, + is_arrayed_fn: &dyn Fn(&str) -> bool, + arc_samples: usize, +) -> Vec<Point> { + if is_straight_line(element, from, to, is_arrayed_fn) { + let from_visual = get_visual_center(from, is_arrayed_fn); + let to_visual = get_visual_center(to, is_arrayed_fn); + let theta = (to_visual.1 - from_visual.1).atan2(to_visual.0 - from_visual.0); + let start = intersect_element_straight(from, theta, is_arrayed_fn); + let end = intersect_element_straight(to, opposite_theta(theta), is_arrayed_fn); + return vec![start, end]; + } + match arc_geometry(element, from, to, is_arrayed_fn) { + None => Vec::new(), // MultiPoint or degenerate arc: renderer draws nothing + Some(g) => sample_arc(&g, arc_samples), + } +} + +fn render_arc( + element: &view_element::Link, + from: &ViewElement, + to: &ViewElement, + is_to_stock: bool, + is_arrayed_fn: &dyn Fn(&str) -> bool, +) -> String { + let g = match arc_geometry(element, from, to, is_arrayed_fn) { + Some(g) => g, + None => return "".to_string(), + }; + + let path = format!( + "M{},{}A{},{} 0 {},{} {},{}", + js_format_number(g.start.x), + js_format_number(g.start.y), + js_format_number(g.circ.r), + js_format_number(g.circ.r), + g.sweep as u8, + g.inv as u8, + js_format_number(g.arc_end.x), + js_format_number(g.arc_end.y) + ); + let connector_class = if is_to_stock { "simlin-connector simlin-connector-dashed" } else { @@ -432,9 +565,9 @@ fn render_arc( connector_class )); svg.push_str(&render_arrowhead( - end.x, - end.y, - arrow_theta, + g.end.x, + g.end.y, + g.arrow_theta, ARROWHEAD_RADIUS, 
ArrowheadType::Connector, )); @@ -558,6 +691,116 @@ mod tests { assert!(svg.contains("simlin-arrowhead-link")); } + /// Byte-identical regression guard for the arc factor-out. The expected + /// string was captured from the pre-refactor `render_arc` output for this + /// exact Arc link; the geometry extraction must not change a single byte + /// (the `svg-rendering.test.ts` parity test asserts Rust SVG == TS SVG). + #[test] + fn test_render_arc_svg_byte_identical() { + let link = view_element::Link { + uid: 10, + from_uid: 1, + to_uid: 2, + shape: LinkShape::Arc(30.0), + polarity: None, + }; + let from = make_aux_ve(100.0, 100.0, "a", 1); + let to = make_aux_ve(200.0, 200.0, "b", 2); + + let svg = render_connector(&link, &from, &to, &not_arrayed); + let expected = ""; + assert_eq!(svg, expected); + assert!(svg.starts_with(" N+1 points" + ); + + // The drawn arc goes center-to-center (start = from_visual, + // arc_end = to_visual). + let first = poly.first().unwrap(); + let last = poly.last().unwrap(); + assert!((first.x - 100.0).abs() < 1e-6 && (first.y - 100.0).abs() < 1e-6); + assert!((last.x - 200.0).abs() < 1e-6 && (last.y - 200.0).abs() < 1e-6); + + // Every sampled point lies on the arc circle. 
+ let circ = arc_circle(&link, &from, &to, &not_arrayed).unwrap(); + for p in &poly { + let d = (square(p.x - circ.x) + square(p.y - circ.y)).sqrt(); + assert!( + (d - circ.r).abs() < 1e-6, + "point ({}, {}) not on arc circle: dist {} vs r {}", + p.x, + p.y, + d, + circ.r + ); + } + } + + #[test] + fn test_connector_polyline_multipoint_is_empty() { + let link = view_element::Link { + uid: 10, + from_uid: 1, + to_uid: 2, + shape: LinkShape::MultiPoint(vec![]), + polarity: None, + }; + let from = make_aux_ve(100.0, 100.0, "a", 1); + let to = make_aux_ve(200.0, 200.0, "b", 2); + + let poly = connector_polyline(&link, &from, &to, &not_arrayed, ARC_POLYLINE_SAMPLES); + assert!( + poly.is_empty(), + "MultiPoint links draw nothing, so the polyline is empty" + ); + } + // --- ray_rect_intersection tests --- fn assert_on_rect_boundary(p: Point, cx: f64, cy: f64, hw: f64, hh: f64) { diff --git a/src/simlin-engine/src/diagram/elements.rs b/src/simlin-engine/src/diagram/elements.rs index ca6a56fcb..04215e974 100644 --- a/src/simlin-engine/src/diagram/elements.rs +++ b/src/simlin-engine/src/diagram/elements.rs @@ -49,16 +49,26 @@ pub fn render_aux(element: &view_element::Aux, is_arrayed: bool) -> String { svg } -pub fn aux_bounds(element: &view_element::Aux) -> Rect { +/// The aux's bare *shape* box (the circle's bounding rect), WITHOUT its label. +/// `aux_bounds` is this box merged with the label; quality metrics that already +/// account for labels separately (e.g. label-vs-node overlap) need the +/// label-free shape to avoid double-counting the label area. 
+pub(crate) fn aux_shape_bounds(element: &view_element::Aux) -> Rect { let cx = element.x; let cy = element.y; let r = AUX_RADIUS; - let bounds = Rect { + Rect { top: cy - r, left: cx - r, right: cx + r, bottom: cy + r, - }; + } +} + +pub fn aux_bounds(element: &view_element::Aux) -> Rect { + let cx = element.x; + let cy = element.y; + let bounds = aux_shape_bounds(element); let label_props = LabelProps::new(cx, cy, element.label_side, display_name(&element.name)); element_with_label_bounds(bounds, &label_props) @@ -108,17 +118,27 @@ pub fn render_stock(element: &view_element::Stock, is_arrayed: bool) -> String { svg } -pub fn stock_bounds(element: &view_element::Stock) -> Rect { +/// The stock's bare *shape* box (the rect), WITHOUT its label. See +/// `aux_shape_bounds` for why the label-free shape is exposed separately. +pub(crate) fn stock_shape_bounds(element: &view_element::Stock) -> Rect { let cx = element.x; let cy = element.y; let w = STOCK_WIDTH; let h = STOCK_HEIGHT; - let bounds = Rect { + Rect { top: cy - h / 2.0, left: cx - w / 2.0, right: cx + w / 2.0, bottom: cy + h / 2.0, - }; + } +} + +pub fn stock_bounds(element: &view_element::Stock) -> Rect { + let cx = element.x; + let cy = element.y; + let w = STOCK_WIDTH; + let h = STOCK_HEIGHT; + let bounds = stock_shape_bounds(element); let label_props = LabelProps::new(cx, cy, element.label_side, display_name(&element.name)) .with_radii(w / 2.0, h / 2.0); diff --git a/src/simlin-engine/src/diagram/flow.rs b/src/simlin-engine/src/diagram/flow.rs index 91e911558..f7d0395a1 100644 --- a/src/simlin-engine/src/diagram/flow.rs +++ b/src/simlin-engine/src/diagram/flow.rs @@ -141,7 +141,12 @@ pub fn render_flow(element: &view_element::Flow, sink: &ViewElement, is_arrayed: svg } -pub fn flow_bounds(element: &view_element::Flow) -> Rect { +/// The flow's bare *shape* box (the valve plus the pipe polyline points), +/// WITHOUT its label. 
`flow_bounds` is this box merged with the label; see +/// `diagram::elements::aux_shape_bounds` for why the label-free shape is +/// exposed separately. The flow path points ARE part of the shape (the drawn +/// pipe), so they stay included here. +pub(crate) fn flow_shape_bounds(element: &view_element::Flow) -> Rect { let cx = element.x; let cy = element.y; // Flow valve bounds use r=6 (FLOW_VALVE_RADIUS), NOT AuxRadius @@ -153,13 +158,7 @@ pub fn flow_bounds(element: &view_element::Flow) -> Rect { bottom: cy + r, }; - // Include label bounds - let label_props = - LabelProps::new(cx, cy, element.label_side, display_name(&element.name)).with_radii(r, r); - let l_bounds = label_bounds(&label_props); - bounds = merge_bounds(bounds, l_bounds); - - // Include flow path points + // Include flow path points (the drawn pipe). for point in &element.points { bounds.left = bounds.left.min(point.x); bounds.right = bounds.right.max(point.x); @@ -170,6 +169,20 @@ pub fn flow_bounds(element: &view_element::Flow) -> Rect { bounds } +pub fn flow_bounds(element: &view_element::Flow) -> Rect { + let cx = element.x; + let cy = element.y; + let r = FLOW_VALVE_RADIUS; + let shape = flow_shape_bounds(element); + + // Include label bounds + let label_props = + LabelProps::new(cx, cy, element.label_side, display_name(&element.name)).with_radii(r, r); + let l_bounds = label_bounds(&label_props); + + merge_bounds(shape, l_bounds) +} + #[cfg(test)] mod tests { use super::*; diff --git a/src/simlin-engine/src/diagram/mod.rs b/src/simlin-engine/src/diagram/mod.rs index 32742b326..f9cd8f614 100644 --- a/src/simlin-engine/src/diagram/mod.rs +++ b/src/simlin-engine/src/diagram/mod.rs @@ -4,11 +4,11 @@ mod arrowhead; pub mod common; -mod connector; +pub(crate) mod connector; pub mod constants; -mod elements; -mod flow; -mod label; +pub(crate) mod elements; +pub(crate) mod flow; +pub(crate) mod label; mod render; #[cfg(feature = "png_render")] mod render_png; diff --git 
a/src/simlin-engine/src/layout/crossings_tests.rs b/src/simlin-engine/src/layout/crossings_tests.rs new file mode 100644 index 000000000..a457bb83b --- /dev/null +++ b/src/simlin-engine/src/layout/crossings_tests.rs @@ -0,0 +1,419 @@ +// Copyright 2026 The Simlin Authors. All rights reserved. +// Use of this source code is governed by the Apache License, +// Version 2.0, that can be found in the LICENSE file. + +//! Tests for the polyline-based `count_view_crossings` / `build_view_segments` +//! (Phase 1, Task 4 of the layout quality eval). Kept in their own file so the +//! `layout_tests.rs` integration suite stays under the per-file line cap. + +use super::*; + +fn cv_aux(uid: i32, x: f64, y: f64) -> ViewElement { + ViewElement::Aux(view_element::Aux { + name: format!("a{uid}"), + uid, + x, + y, + label_side: LabelSide::Bottom, + compat: None, + }) +} + +fn cv_module(uid: i32, x: f64, y: f64) -> ViewElement { + ViewElement::Module(view_element::Module { + name: format!("m{uid}"), + uid, + x, + y, + label_side: LabelSide::Bottom, + }) +} + +fn cv_link(uid: i32, from_uid: i32, to_uid: i32, shape: LinkShape) -> ViewElement { + ViewElement::Link(view_element::Link { + uid, + from_uid, + to_uid, + shape, + polarity: None, + }) +} + +fn cv_stock(uid: i32, x: f64, y: f64) -> ViewElement { + ViewElement::Stock(view_element::Stock { + name: format!("s{uid}"), + uid, + x, + y, + label_side: LabelSide::Bottom, + compat: None, + }) +} + +fn cv_cloud(uid: i32, flow_uid: i32, x: f64, y: f64) -> ViewElement { + ViewElement::Cloud(view_element::Cloud { + uid, + flow_uid, + x, + y, + compat: None, + }) +} + +/// A horizontal flow whose valve sits at (`x`, `y`), with its source end +/// attached to `from_uid` (a cloud or stock to the left) and its sink end +/// attached to `to_uid` (a stock to the right). The valve lies on the pipe, +/// mid-span between the two attached endpoints. 
+fn cv_flow(uid: i32, x: f64, y: f64, from_uid: i32, to_uid: i32) -> ViewElement { + cv_flow_pts( + uid, + x, + y, + (x - 60.0, y, Some(from_uid)), + (x + 60.0, y, Some(to_uid)), + ) +} + +/// A two-point flow with the valve at (`x`, `y`) and explicitly positioned +/// source/sink points, each carrying an optional `attached_to_uid`. Lets a +/// test reproduce a real reference geometry where the valve does not sit at the +/// midpoint of the two points. +fn cv_flow_pts( + uid: i32, + x: f64, + y: f64, + from: (f64, f64, Option<i32>), + to: (f64, f64, Option<i32>), +) -> ViewElement { + ViewElement::Flow(view_element::Flow { + name: format!("f{uid}"), + uid, + x, + y, + label_side: LabelSide::Top, + points: vec![ + view_element::FlowPoint { + x: from.0, + y: from.1, + attached_to_uid: from.2, + }, + view_element::FlowPoint { + x: to.0, + y: to.1, + attached_to_uid: to.2, + }, + ], + compat: None, + label_compat: None, + }) +} + +fn cv_view(elements: Vec<ViewElement>) -> datamodel::StockFlow { + datamodel::StockFlow { + name: None, + elements, + view_box: Rect { + x: 0.0, + y: 0.0, + width: 1000.0, + height: 1000.0, + }, + zoom: 1.0, + use_lettered_polarity: false, + font: None, + sketch_compat: None, + } +} + +/// AC2.1: two straight links that cross once yield a crossing count of 1. +#[test] +fn test_count_view_crossings_two_straight_links_cross_once() { + // Link 1: a1(0,0) -> a2(100,100). Link 2: a3(0,100) -> a4(100,0). + // The two diagonals of a square cross exactly once at the center. + let view = cv_view(vec![ + cv_aux(1, 0.0, 0.0), + cv_aux(2, 100.0, 100.0), + cv_aux(3, 0.0, 100.0), + cv_aux(4, 100.0, 0.0), + cv_link(10, 1, 2, LinkShape::Straight), + cv_link(11, 3, 4, LinkShape::Straight), + ]); + + assert_eq!(count_view_crossings(&view), 1); +} + +/// AC2.1: two links sharing an endpoint element yield 0 crossings. 
+#[test] +fn test_count_view_crossings_shared_endpoint_no_crossing() { + // Both links start at a1; sharing the `elem_1` vertex suppresses any + // intersection at the shared endpoint. + let view = cv_view(vec![ + cv_aux(1, 50.0, 50.0), + cv_aux(2, 100.0, 0.0), + cv_aux(3, 100.0, 100.0), + cv_link(10, 1, 2, LinkShape::Straight), + cv_link(11, 1, 3, LinkShape::Straight), + ]); + + assert_eq!(count_view_crossings(&view), 0); +} + +/// AC2.2: an Arc connector that visually crosses another edge is counted via +/// polyline sampling, on a case where the straight-chord approximation does +/// not count it. The arc from a1(0,0) to a2(200,0) bulges down to a peak near +/// (100, 57.7); a horizontal straight link c-d at y=50 (from x=40 to x=160) +/// passes through the bulge, crossing the curve twice (near x=58 and x=142), +/// while the arc's straight chord (the line y=0) stays well clear of it. So the +/// old chord-based count is 0 and the new polyline-based count is >= 1. +#[test] +fn test_count_view_crossings_arc_curve_crosses_when_chord_does_not() { + let view = cv_view(vec![ + cv_aux(1, 0.0, 0.0), + cv_aux(2, 200.0, 0.0), + cv_aux(3, 40.0, 50.0), + cv_aux(4, 160.0, 50.0), + // Wide arc: large take-off angle so the curve bulges well below the + // straight chord between the two endpoints. + cv_link(10, 1, 2, LinkShape::Arc(60.0)), + cv_link(11, 3, 4, LinkShape::Straight), + ]); + + // The straight-chord approximation (centers, ignoring shape) does NOT + // count this crossing: build those chord segments inline and confirm 0. 
+ let p1 = Position::new(0.0, 0.0); + let p2 = Position::new(200.0, 0.0); + let p3 = Position::new(40.0, 50.0); + let p4 = Position::new(160.0, 50.0); + let chord_segments = vec![ + LineSegment { + start: p1, + end: p2, + from_node: "elem_1".to_string(), + to_node: "elem_2".to_string(), + }, + LineSegment { + start: p3, + end: p4, + from_node: "elem_3".to_string(), + to_node: "elem_4".to_string(), + }, + ]; + assert_eq!( + annealing::count_crossings(&chord_segments), + 0, + "chord approximation must not see this crossing" + ); + + // The polyline (sampled arc) DOES count it. + assert!( + count_view_crossings(&view) >= 1, + "sampled arc curve must cross the straight link" + ); +} + +/// AC2.3: the crossing count is invariant under translation and rotation of +/// the whole view. +#[test] +fn test_count_view_crossings_translation_rotation_invariant() { + let base = vec![ + cv_aux(1, 0.0, 0.0), + cv_aux(2, 100.0, 100.0), + cv_aux(3, 0.0, 100.0), + cv_aux(4, 100.0, 0.0), + cv_link(10, 1, 2, LinkShape::Arc(25.0)), + cv_link(11, 3, 4, LinkShape::Straight), + ]; + let base_count = count_view_crossings(&cv_view(base.clone())); + + // Translate every coordinate by a fixed offset. + let translated: Vec<ViewElement> = base + .iter() + .map(|e| transform_element(e, |x, y| (x + 137.0, y - 89.0))) + .collect(); + assert_eq!( + count_view_crossings(&cv_view(translated)), + base_count, + "translation must preserve crossing count" + ); + + // Rotate every coordinate about the origin by a fixed angle. + let theta = 0.7_f64; // radians + let (s, c) = theta.sin_cos(); + let rotated: Vec<ViewElement> = base + .iter() + .map(|e| transform_element(e, |x, y| (x * c - y * s, x * s + y * c))) + .collect(); + assert_eq!( + count_view_crossings(&cv_view(rotated)), + base_count, + "rotation must preserve crossing count" + ); +} + +/// Apply a coordinate transform to the (x, y) of a positioned view element. +/// Links carry no coordinates of their own and pass through unchanged. 
+fn transform_element(e: &ViewElement, f: impl Fn(f64, f64) -> (f64, f64)) -> ViewElement { + match e { + ViewElement::Aux(a) => { + let (x, y) = f(a.x, a.y); + ViewElement::Aux(view_element::Aux { x, y, ..a.clone() }) + } + ViewElement::Module(m) => { + let (x, y) = f(m.x, m.y); + ViewElement::Module(view_element::Module { x, y, ..m.clone() }) + } + other => other.clone(), + } +} + +/// Module/Alias undercount fix: a link from an Aux to a Module that crosses +/// another link is now counted. Previously Module-incident links were dropped +/// from the segment set entirely, so this crossing was invisible. +#[test] +fn test_count_view_crossings_module_incident_link_participates() { + // Link 1: a1(0,0) -> m2(100,100) (a Module endpoint). + // Link 2: a3(0,100) -> a4(100,0). The two diagonals cross once. + let view = cv_view(vec![ + cv_aux(1, 0.0, 0.0), + cv_module(2, 100.0, 100.0), + cv_aux(3, 0.0, 100.0), + cv_aux(4, 100.0, 0.0), + cv_link(10, 1, 2, LinkShape::Straight), + cv_link(11, 3, 4, LinkShape::Straight), + ]); + + assert_eq!( + count_view_crossings(&view), + 1, + "a Module-incident link must participate in crossing detection" + ); +} + +/// A link that TERMINATES at a flow's valve must not be counted as crossing the +/// flow pipe at that shared connection point. This is the exact +/// dp_logistic_growth reference geometry: the horizontal `net birth rate` flow +/// (cloud -> valve -> Population stock) plus the `fractional growth rate -> +/// net birth rate` link, whose drawn arc curves up to the valve from below and +/// grazes the pipe at the connection point. The link's endpoint (`elem_2`, the +/// flow's own element uid) and the pipe share the flow's element at the valve, +/// so that graze is not a real crossing. 
+#[test] +fn test_count_view_crossings_link_to_flow_valve_no_crossing() { + let flow_uid = 2; + let view = cv_view(vec![ + cv_stock(1, 602.4000244140625, 259.8000183105469), + cv_flow_pts( + flow_uid, + 518.2726610523725, + 258.60003662109375, + // source end attached to the cloud, sink end to the stock + (456.79998779296875, 258.60003662109375, Some(3)), + (579.9000244140625, 258.60003662109375, Some(1)), + ), + cv_cloud(3, flow_uid, 456.79998779296875, 258.60003662109375), + cv_aux(4, 498.0, 344.20001220703125), + // fractional growth rate -> net birth rate (to_uid == flow.uid): the + // drawn arc bulges up to graze the pipe at the valve connection point. + cv_link(10, 4, flow_uid, LinkShape::Arc(118.82198603295677)), + ]); + + assert_eq!( + count_view_crossings(&view), + 0, + "a link terminating at a flow valve must not count as crossing the pipe" + ); +} + +/// The flow-segment naming contract that the suppression relies on: a flow +/// point attached to a stock/cloud names its pipe vertex `elem_{attached_uid}` +/// (so a link incident on that stock/cloud, which uses the same name, is +/// suppressed at the shared connection point), the valve is injected as an +/// `elem_{flow.uid}` vertex on the pipe (so a link incident on the valve is +/// suppressed there), and a free point keeps the per-flow `flow_{uid}#{i}` +/// name (so a genuine mid-span crossing is still counted). This is the +/// node-name contract; the end-to-end suppression is exercised by the valve and +/// mid-span tests, since for an attached stock/cloud the link endpoint clips to +/// the element boundary and only grazes the pipe through the shared vertex. 
+#[test] +fn test_build_view_segments_flow_vertex_naming() { + let flow_uid = 2; + let stock_uid = 1; + let cloud_uid = 3; + let view = cv_view(vec![ + cv_stock(stock_uid, 602.4000244140625, 259.8000183105469), + cv_flow_pts( + flow_uid, + 518.2726610523725, + 258.60003662109375, + (456.79998779296875, 258.60003662109375, Some(cloud_uid)), + (579.9000244140625, 258.60003662109375, Some(stock_uid)), + ), + cv_cloud(cloud_uid, flow_uid, 456.79998779296875, 258.60003662109375), + ]); + + let segs = build_view_segments(&view); + // The pipe splits at the valve into two sub-segments: + // elem_3 (cloud) -> elem_2 (valve) and elem_2 (valve) -> elem_1 (stock) + let names: Vec<(String, String)> = segs + .iter() + .map(|s| (s.from_node.clone(), s.to_node.clone())) + .collect(); + assert_eq!( + names, + vec![ + ("elem_3".to_string(), "elem_2".to_string()), + ("elem_2".to_string(), "elem_1".to_string()), + ], + "flow pipe must name attached endpoints elem_ and split at the valve as elem_" + ); + + // A free (unattached) interior point keeps the per-flow name. + let free_view = cv_view(vec![cv_flow_pts( + flow_uid, + 518.2726610523725, + 258.60003662109375, + (456.79998779296875, 258.60003662109375, None), + (579.9000244140625, 258.60003662109375, None), + )]); + let free_segs = build_view_segments(&free_view); + let free_names: Vec<(String, String)> = free_segs + .iter() + .map(|s| (s.from_node.clone(), s.to_node.clone())) + .collect(); + assert_eq!( + free_names, + vec![ + (format!("flow_{flow_uid}#0"), format!("elem_{flow_uid}")), + (format!("elem_{flow_uid}"), format!("flow_{flow_uid}#1")), + ], + "an unattached flow point keeps its per-flow name; only the valve is elem_" + ); +} + +/// A GENUINE mid-span crossing of a flow pipe -- a link that crosses the pipe +/// away from any element the flow shares -- must STILL be counted. This guards +/// against the valve/attachment suppression over-suppressing real crossings. 
+#[test] +fn test_count_view_crossings_link_crosses_flow_pipe_midspan_counted() { + // Flow valve at (100, 100), pipe from x=40 to x=160 at y=100. A straight + // link runs vertically through x=70 (between the cloud end and the valve, + // so it does NOT touch the valve, the cloud, or the stock), crossing the + // pipe once. + let flow_uid = 20; + let view = cv_view(vec![ + cv_cloud(1, flow_uid, 40.0, 100.0), + cv_stock(2, 200.0, 100.0), + cv_aux(3, 70.0, 50.0), + cv_aux(4, 70.0, 150.0), + cv_flow(flow_uid, 100.0, 100.0, 1, 2), + // Link from a3 (above the pipe) to a4 (below the pipe), crossing the + // pipe at x=70 -- nowhere near the valve or either attached element. + cv_link(30, 3, 4, LinkShape::Straight), + ]); + + assert_eq!( + count_view_crossings(&view), + 1, + "a genuine mid-span crossing of the flow pipe must still be counted" + ); +} diff --git a/src/simlin-engine/src/layout/eval_stats.rs b/src/simlin-engine/src/layout/eval_stats.rs new file mode 100644 index 000000000..4c6fd5a56 --- /dev/null +++ b/src/simlin-engine/src/layout/eval_stats.rs @@ -0,0 +1,1147 @@ +// Copyright 2026 The Simlin Authors. All rights reserved. +// Use of this source code is governed by the Apache License, +// Version 2.0, that can be found in the LICENSE file. + +// pattern: Functional Core +// +// Pure statistics for layout-quality seed-sample distributions, mirroring Go's +// `benchstat`: many per-seed samples reduced to a center + spread, plus a +// non-parametric significance test (Mann-Whitney U) on differences. +// +// There is NO I/O in this module: it takes slices of numbers, computes scalars, +// and returns them. Every primitive returns a finite, documented default +// (`0.0`, or a non-significant `p_value` of `1.0`) on empty or degenerate +// input -- it must never return NaN, matching the engine's no-NaN policy for +// statistics. That makes every term trivially testable with hand-computed +// expected values (see the inline tests below). 
+// +// The corpus sweep (Phase 3) is the imperative shell that fills these structs +// from real layouts. + +use crate::layout::metrics::LayoutMetrics; + +/// Geometric mean of strictly-positive values: `exp(mean(ln(x)))`. +/// +/// Returns `0.0` for an empty slice. Values must be `> 0`; layout costs are +/// `>= 0`, so callers floor with a small epsilon before calling (see +/// [`CorpusReport::from_model_stats`]) so a single `0` cost cannot zero the +/// whole-corpus geometric mean. +pub fn geomean(values: &[f64]) -> f64 { + if values.is_empty() { + return 0.0; + } + // The geometric mean of a single value is that value exactly; short-circuit + // to avoid a needless ln/exp round-trip (which would return e.g. + // 4.999999999999999 for an input of 5.0). + if values.len() == 1 { + return values[0]; + } + let sum_ln: f64 = values.iter().map(|&x| x.ln()).sum(); + (sum_ln / values.len() as f64).exp() +} + +/// Linear-interpolated percentile using the "type 7" convention (NumPy's +/// default): for sorted `x` of length `n` and `p` in `[0, 1]`, the fractional +/// rank is `p * (n - 1)`, then the result interpolates linearly between the +/// values at the floor and ceil of that rank. +/// +/// Returns `0.0` for an empty slice and the single value for `n == 1`. +/// `values` need not be pre-sorted -- a copy is sorted internally. `p` is +/// clamped to `[0, 1]`. +pub fn percentile(values: &[f64], p: f64) -> f64 { + if values.is_empty() { + return 0.0; + } + let n = values.len(); + if n == 1 { + return values[0]; + } + + let mut sorted = values.to_vec(); + sorted.sort_by(|a, b| a.partial_cmp(b).unwrap_or(std::cmp::Ordering::Equal)); + + let p = p.clamp(0.0, 1.0); + // Type-7 fractional rank in [0, n-1]. 
+ let rank = p * (n as f64 - 1.0); + let lo = rank.floor() as usize; + let hi = rank.ceil() as usize; + if lo == hi { + return sorted[lo]; + } + let frac = rank - lo as f64; + sorted[lo] * (1.0 - frac) + sorted[hi] * frac +} + +/// Median, equal to `percentile(values, 0.5)`. +pub fn median(values: &[f64]) -> f64 { + percentile(values, 0.5) +} + +/// Mann-Whitney U test result for two independent samples. +#[derive(Clone, Copy, Debug, PartialEq)] +pub struct MannWhitney { + /// The smaller of `u1` and `u2`. + pub u: f64, + /// U statistic for sample `a`. + pub u1: f64, + /// U statistic for sample `b`. + pub u2: f64, + /// Two-sided p-value (normal approximation with tie + continuity + /// correction). + pub p_value: f64, +} + +/// Mann-Whitney U (a.k.a. Wilcoxon rank-sum) test on two independent samples. +/// +/// Ranks the pooled samples, averaging tied ranks; computes U from the rank +/// sums; reports the two-sided p-value via the normal approximation with tie +/// correction and continuity correction. For tiny samples this approximation +/// is rough; the sweep uses M >= ~20 seeds where it is good. +/// +/// Returns `p_value = 1.0` (non-significant) when either sample is empty or all +/// pooled values are identical (no separation is possible, so the variance of +/// the normal approximation is zero). +pub fn mann_whitney_u(a: &[f64], b: &[f64]) -> MannWhitney { + let n1 = a.len(); + let n2 = b.len(); + if n1 == 0 || n2 == 0 { + // No separation possible with an empty sample. Report a degenerate but + // finite result with a non-significant p-value. + return MannWhitney { + u: 0.0, + u1: 0.0, + u2: 0.0, + p_value: 1.0, + }; + } + + // 1. Pool, tagging each value with which sample it came from (false = a), + // sort by value, and assign average ranks (1..=N) to tied groups. 
+ let mut pooled: Vec<(f64, bool)> = Vec::with_capacity(n1 + n2); + pooled.extend(a.iter().map(|&v| (v, false))); + pooled.extend(b.iter().map(|&v| (v, true))); + pooled.sort_by(|x, y| x.0.partial_cmp(&y.0).unwrap_or(std::cmp::Ordering::Equal)); + + let n = (n1 + n2) as f64; + let mut r1 = 0.0; // sum of ranks belonging to sample `a` + // Σ (t^3 - t) over each tie group of size t, for the variance correction. + let mut tie_term = 0.0; + let mut i = 0; + while i < pooled.len() { + // Extend [i, j) over the run of values equal to pooled[i].0. + let mut j = i + 1; + while j < pooled.len() && pooled[j].0 == pooled[i].0 { + j += 1; + } + let group_len = j - i; + // Ranks are 1-based; the average rank of positions i..j (0-based) is + // ((i+1) + j) / 2. + let avg_rank = ((i + 1) + j) as f64 / 2.0; + for entry in &pooled[i..j] { + if !entry.1 { + r1 += avg_rank; + } + } + if group_len > 1 { + let t = group_len as f64; + tie_term += t * t * t - t; + } + i = j; + } + + // 2. U statistics from the rank sums. + let n1f = n1 as f64; + let n2f = n2 as f64; + let u1 = r1 - n1f * (n1f + 1.0) / 2.0; + let u2 = n1f * n2f - u1; + let u = u1.min(u2); + + // 3. Mean and tie-corrected variance of the U distribution. + let mu = n1f * n2f / 2.0; + let variance = (n1f * n2f / 12.0) * ((n + 1.0) - tie_term / (n * (n - 1.0))); + + // 4. Two-sided p-value via the normal approximation with a 0.5 continuity + // correction. When the variance is zero (all pooled values identical, + // or n == 1 with no spread), no separation is possible -- report the + // non-significant default rather than dividing by zero. + let p_value = if variance <= 0.0 { + 1.0 + } else { + let z = ((u - mu).abs() - 0.5).max(0.0) / variance.sqrt(); + (2.0 * (1.0 - phi(z))).clamp(0.0, 1.0) + }; + + MannWhitney { u, u1, u2, p_value } +} + +/// Error function via the Abramowitz & Stegun 7.1.26 rational approximation +/// (max absolute error ~1.5e-7) -- ample accuracy for a significance verdict. 
+/// +/// A small local copy keeps this module self-contained and independently +/// testable (the VM-internal `crate::alloc::erfc_approx`/`normal_cdf` are an +/// implementation detail of the allocation opcodes). +fn erf(x: f64) -> f64 { + // A&S 7.1.26 is stated for x >= 0; erf is odd, so reflect for x < 0. + let sign = if x < 0.0 { -1.0 } else { 1.0 }; + let x = x.abs(); + + const A1: f64 = 0.254_829_592; + const A2: f64 = -0.284_496_736; + const A3: f64 = 1.421_413_741; + const A4: f64 = -1.453_152_027; + const A5: f64 = 1.061_405_429; + const P: f64 = 0.327_591_1; + + let t = 1.0 / (1.0 + P * x); + // Horner form of (a1 t + a2 t^2 + a3 t^3 + a4 t^4 + a5 t^5). + let poly = ((((A5 * t + A4) * t + A3) * t + A2) * t + A1) * t; + let y = 1.0 - poly * (-x * x).exp(); + sign * y +} + +/// Standard normal CDF, `Phi(x) = 0.5 * (1 + erf(x / sqrt(2)))`. +fn phi(x: f64) -> f64 { + 0.5 * (1.0 + erf(x / std::f64::consts::SQRT_2)) +} + +/// Floor applied to each model's median before it enters the corpus geometric +/// mean. A geometric mean is the product of its terms, so a single `0` median +/// would zero the whole aggregate; flooring with this small epsilon keeps a +/// genuinely-perfect (zero-cost) model from collapsing the corpus number while +/// remaining far below any meaningful cost. Documented and applied only in +/// [`CorpusReport::from_model_stats`]. +pub const GEOMEAN_FLOOR_EPSILON: f64 = 1e-9; + +/// One per-seed layout sample: the seed that produced the layout, its computed +/// metrics, and the scalar weighted cost the optimizer minimizes. +/// +/// `Serialize`/`Deserialize` let the corpus sweep round-trip a full +/// [`CorpusReport`] (including these per-seed samples) through JSON, so the +/// committed baseline report can be read back and the per-model seed-sample +/// cost sets re-run through [`mann_whitney_u`] by [`compare`]. 
+#[derive(Clone, Debug, serde::Serialize, serde::Deserialize)] +pub struct MetricSample { + pub seed: u64, + pub metrics: LayoutMetrics, + pub weighted_cost: f64, +} + +/// Aggregated statistics for one model's seed sweep: the raw per-seed samples +/// plus the center (`median_cost`), spread (`p25`, `p75`), the best-of-k +/// production proxy, and the best/median/worst seeds (which drive Phase 3's +/// PNG renders). +/// +/// `Serialize`/`Deserialize` ride on [`MetricSample`]'s so a [`CorpusReport`] +/// round-trips through JSON (see [`MetricSample`]). +#[derive(Clone, Debug, serde::Serialize, serde::Deserialize)] +pub struct ModelStats { + pub model: String, + /// One sample per seed. + pub samples: Vec<MetricSample>, + pub median_cost: f64, + /// `(p25, p75)` of the weighted costs. + pub spread: (f64, f64), + /// Production proxy: the min weighted cost over the k production seeds. + pub best_of_k_cost: f64, + pub best_seed: u64, + pub median_seed: u64, + pub worst_seed: u64, +} + +/// Corpus-wide report: one `ModelStats` per model plus the geometric mean of +/// the per-model medians (the single headline aggregate, benchstat-style). +/// +/// `Serialize`/`Deserialize` let the corpus sweep write this report to the +/// committed `examples/layout_eval_baseline.json` and read it back for the +/// baseline-vs-candidate diff (`compare`). The full report -- including each +/// model's per-seed `samples` -- round-trips so `compare` can re-run +/// Mann-Whitney U over the seed-sample cost sets. +#[derive(Clone, Debug, serde::Serialize, serde::Deserialize)] +pub struct CorpusReport { + pub per_model: Vec<ModelStats>, + pub geomean_of_medians: f64, +} + +impl ModelStats { + /// Summarize a model's per-seed samples. 
+ /// + /// `production_seeds` is the fixed seed set used for the best-of-k proxy: + /// `best_of_k_cost` is the min `weighted_cost` among the samples whose seed + /// is in that set, falling back to the global min when none of the + /// production seeds were sampled. 
The median seed is the sample whose cost + /// is closest to `median_cost`, breaking ties on the lowest seed (so the + /// chosen render is deterministic). Empty `samples` yields all-zero fields + /// and seeds of `0` -- no panic. + pub fn from_samples( + model: String, + samples: Vec<MetricSample>, + production_seeds: &[u64], + ) -> ModelStats { + if samples.is_empty() { + return ModelStats { + model, + samples, + median_cost: 0.0, + spread: (0.0, 0.0), + best_of_k_cost: 0.0, + best_seed: 0, + median_seed: 0, + worst_seed: 0, + }; + } + + let costs: Vec<f64> = samples.iter().map(|s| s.weighted_cost).collect(); + let median_cost = median(&costs); + let spread = (percentile(&costs, 0.25), percentile(&costs, 0.75)); + + // best/worst seeds: the seeds of the global min / max weighted_cost. + // Tie-break on the lowest seed so the chosen render is deterministic. + let best_seed = samples + .iter() + .min_by(|x, y| { + x.weighted_cost + .partial_cmp(&y.weighted_cost) + .unwrap_or(std::cmp::Ordering::Equal) + .then(x.seed.cmp(&y.seed)) + }) + .map(|s| s.seed) + .unwrap_or(0); + let worst_seed = samples + .iter() + .max_by(|x, y| { + x.weighted_cost + .partial_cmp(&y.weighted_cost) + .unwrap_or(std::cmp::Ordering::Equal) + // For a tie on cost, max_by returns the LATER-compared-greater + // element; flip the seed comparison so the lowest seed wins. + .then(y.seed.cmp(&x.seed)) + }) + .map(|s| s.seed) + .unwrap_or(0); + + // median seed: the sample whose cost is closest to `median_cost`, + // breaking ties on the lowest seed. + let median_seed = samples + .iter() + .min_by(|x, y| { + let dx = (x.weighted_cost - median_cost).abs(); + let dy = (y.weighted_cost - median_cost).abs(); + dx.partial_cmp(&dy) + .unwrap_or(std::cmp::Ordering::Equal) + .then(x.seed.cmp(&y.seed)) + }) + .map(|s| s.seed) + .unwrap_or(0); + + // best-of-k: min weighted_cost among samples whose seed is a production + // seed; fall back to the global min when none were sampled. 
+ let prod_min = samples + .iter() + .filter(|s| production_seeds.contains(&s.seed)) + .map(|s| s.weighted_cost) + .fold(f64::INFINITY, f64::min); + let best_of_k_cost = if prod_min.is_finite() { + prod_min + } else { + costs.iter().cloned().fold(f64::INFINITY, f64::min) + }; + + ModelStats { + model, + samples, + median_cost, + spread, + best_of_k_cost, + best_seed, + median_seed, + worst_seed, + } + } +} + +impl CorpusReport { + /// Build a corpus report. `geomean_of_medians` is the geometric mean of + /// each model's `median_cost`, with each median floored by + /// [`GEOMEAN_FLOOR_EPSILON`] so a single `0` median cannot zero the whole + /// aggregate. An empty corpus yields `geomean_of_medians == 0.0`. + pub fn from_model_stats(per_model: Vec<ModelStats>) -> CorpusReport { + let medians: Vec<f64> = per_model + .iter() + .map(|m| m.median_cost.max(GEOMEAN_FLOOR_EPSILON)) + .collect(); + let geomean_of_medians = geomean(&medians); + CorpusReport { + per_model, + geomean_of_medians, + } + } +} + +/// Per-model verdict from comparing a baseline against a candidate report. +/// +/// `Serialize` lets the corpus sweep embed the baseline-vs-candidate diff into +/// its `metrics.json` artifact. The verdict is never read back from JSON (it is +/// recomputed by `compare` on every run), so it carries no `Deserialize`. +#[derive(Clone, Debug, serde::Serialize)] +pub struct ModelComparison { + pub model: String, + pub baseline_median: f64, + pub candidate_median: f64, + /// `candidate_median / baseline_median - 1.0`, or `0.0` when the baseline + /// median is `0` (so a degenerate baseline never produces inf/NaN). A + /// negative ratio means the candidate is cheaper (better). + pub delta_ratio: f64, + /// Two-sided Mann-Whitney U p-value over the two models' seed-sample + /// `weighted_cost` vectors. + pub p_value: f64, + /// `p_value < SIGNIFICANCE_ALPHA`. 
+ pub significant: bool, +} + +/// Result of comparing two corpus reports: one [`ModelComparison`] per matched +/// model plus the corpus-wide aggregate delta and significance verdict. +/// +/// `Serialize` lets the corpus sweep embed this diff into its `metrics.json` +/// artifact. Like [`ModelComparison`] it carries no `Deserialize`: the diff is +/// recomputed by `compare` on every run, never read back from JSON. +#[derive(Clone, Debug, serde::Serialize)] +pub struct Comparison { + /// One entry per model present in BOTH reports (unmatched models are + /// skipped -- see [`compare`]), in baseline iteration order. + pub per_model: Vec<ModelComparison>, + /// `geomean(candidate medians) / geomean(baseline medians) - 1.0` over the + /// matched per-model medians, or `0.0` when the baseline geomean is `0`. + pub aggregate_delta_ratio: f64, + /// Two-sided Mann-Whitney U p-value over the matched per-model medians (see + /// [`compare`] for why Mann-Whitney rather than a paired test). + pub aggregate_p_value: f64, + /// `aggregate_p_value < SIGNIFICANCE_ALPHA`. + pub aggregate_significant: bool, +} + +/// Significance threshold for the p-value verdicts -- the conventional 5%. +pub const SIGNIFICANCE_ALPHA: f64 = 0.05; + +/// Compute `candidate / baseline - 1.0`, returning `0.0` when `baseline == 0` +/// so a degenerate (zero) baseline never produces an infinite or NaN ratio. +/// Mirrors the no-NaN policy of the rest of this module. +fn delta_ratio(baseline: f64, candidate: f64) -> f64 { + if baseline == 0.0 { + 0.0 + } else { + candidate / baseline - 1.0 + } +} + +/// Compare two corpus reports. +/// +/// Models are matched by `model` name; only models present in BOTH reports are +/// compared. A model present in just one report is **skipped** (it has no +/// counterpart to difference against). The returned `per_model` is in baseline +/// iteration order. 
+/// +/// Per matched model: the two seed-sample `weighted_cost` vectors are run +/// through [`mann_whitney_u`]; `delta_ratio` is computed from the medians +/// (`0.0` when the baseline median is `0`); `significant` is +/// `p_value < SIGNIFICANCE_ALPHA`. +/// +/// Aggregate: `aggregate_delta_ratio` is the ratio of the candidate-side to +/// baseline-side geometric mean of the matched per-model medians (each side +/// floored by [`GEOMEAN_FLOOR_EPSILON`] exactly as [`CorpusReport`] does, so a +/// `0` median can't zero the aggregate). `aggregate_p_value` is +/// `mann_whitney_u(baseline_medians, candidate_medians).p_value` over the +/// matched per-model medians. +/// +/// The aggregate significance test treats the two median vectors as +/// independent samples (Mann-Whitney U), per the design. A paired test such as +/// Wilcoxon signed-rank -- which would exploit the model-by-model pairing of +/// the matched medians -- is a documented future refinement, not implemented +/// here. +/// +/// On empty or fully-disjoint reports there are no matched models: +/// `per_model` is empty, `aggregate_delta_ratio == 0.0`, and the aggregate is +/// non-significant with a finite p-value (no NaN). +pub fn compare(baseline: &CorpusReport, candidate: &CorpusReport) -> Comparison { + // Index the candidate's models by name so we can pull the matching entry in + // baseline iteration order without an O(n^2) scan. + let candidate_by_name: std::collections::HashMap<&str, &ModelStats> = candidate + .per_model + .iter() + .map(|m| (m.model.as_str(), m)) + .collect(); + + let mut per_model = Vec::new(); + let mut baseline_medians = Vec::new(); + let mut candidate_medians = Vec::new(); + + for base in &baseline.per_model { + let Some(cand) = candidate_by_name.get(base.model.as_str()) else { + // Unmatched: present only in the baseline, so skip it. 
+ continue; + }; + + let baseline_costs: Vec<f64> = base.samples.iter().map(|s| s.weighted_cost).collect(); + let candidate_costs: Vec<f64> = cand.samples.iter().map(|s| s.weighted_cost).collect(); + let mw = mann_whitney_u(&baseline_costs, &candidate_costs); + + let baseline_median = base.median_cost; + let candidate_median = cand.median_cost; + let ratio = delta_ratio(baseline_median, candidate_median); + + baseline_medians.push(baseline_median); + candidate_medians.push(candidate_median); + + per_model.push(ModelComparison { + model: base.model.clone(), + baseline_median, + candidate_median, + delta_ratio: ratio, + p_value: mw.p_value, + significant: mw.p_value < SIGNIFICANCE_ALPHA, + }); + } + + // Aggregate delta: ratio of the two geomean-of-medians, each side floored + // by the same epsilon CorpusReport uses so a single 0 median can't zero a + // side's geometric mean. + let baseline_floored: Vec<f64> = baseline_medians + .iter() + .map(|&m| m.max(GEOMEAN_FLOOR_EPSILON)) + .collect(); + let candidate_floored: Vec<f64> = candidate_medians + .iter() + .map(|&m| m.max(GEOMEAN_FLOOR_EPSILON)) + .collect(); + let aggregate_delta_ratio = + delta_ratio(geomean(&baseline_floored), geomean(&candidate_floored)); + + let aggregate_p_value = mann_whitney_u(&baseline_medians, &candidate_medians).p_value; + + Comparison { + per_model, + aggregate_delta_ratio, + aggregate_p_value, + aggregate_significant: aggregate_p_value < SIGNIFICANCE_ALPHA, + } +} + +#[cfg(test)] +mod tests { + use super::*; + use proptest::prelude::*; + + const EPS: f64 = 1e-9; + + fn close(a: f64, b: f64) -> bool { + (a - b).abs() < EPS + } + + // --- geomean --- + + #[test] + fn test_geomean_two_values() { + // sqrt(2*8) = sqrt(16) = 4. + assert!(close(geomean(&[2.0, 8.0]), 4.0), "{}", geomean(&[2.0, 8.0])); + } + + #[test] + fn test_geomean_three_values() { + // cbrt(1*10*100) = cbrt(1000) = 10.
+ let g = geomean(&[1.0, 10.0, 100.0]); + assert!(close(g, 10.0), "{}", g); + } + + #[test] + fn test_geomean_empty_is_zero() { + assert_eq!(geomean(&[]), 0.0); + } + + #[test] + fn test_geomean_single() { + assert_eq!(geomean(&[5.0]), 5.0); + } + + // --- percentile / median (type 7) --- + + #[test] + fn test_median_odd() { + assert_eq!(median(&[1.0, 2.0, 3.0]), 2.0); + } + + #[test] + fn test_median_even() { + assert_eq!(median(&[1.0, 2.0, 3.0, 4.0]), 2.5); + } + + #[test] + fn test_percentile_type7_quartiles() { + // NumPy np.percentile([1,2,3,4,5], 25) == 2.0, 75 == 4.0. + assert_eq!(percentile(&[1.0, 2.0, 3.0, 4.0, 5.0], 0.25), 2.0); + assert_eq!(percentile(&[1.0, 2.0, 3.0, 4.0, 5.0], 0.75), 4.0); + } + + #[test] + fn test_percentile_empty_is_zero() { + assert_eq!(percentile(&[], 0.5), 0.0); + } + + #[test] + fn test_percentile_single() { + assert_eq!(percentile(&[7.0], 0.9), 7.0); + } + + #[test] + fn test_percentile_unsorted_input() { + // The function must sort a copy: a reversed input gives the same answer. + assert_eq!(percentile(&[5.0, 4.0, 3.0, 2.0, 1.0], 0.25), 2.0); + } + + #[test] + fn test_percentile_endpoints() { + assert_eq!(percentile(&[1.0, 2.0, 3.0, 4.0, 5.0], 0.0), 1.0); + assert_eq!(percentile(&[1.0, 2.0, 3.0, 4.0, 5.0], 1.0), 5.0); + } + + // --- Mann-Whitney U --- + + #[test] + fn test_mann_whitney_complete_separation() { + // a strictly below b: complete separation. With n1 = n2 = 4, + // r1 = 1+2+3+4 = 10, u1 = 10 - 4*5/2 = 0, u2 = 16 - 0 = 16, u = 0. + let r = mann_whitney_u(&[1.0, 2.0, 3.0, 4.0], &[5.0, 6.0, 7.0, 8.0]); + assert_eq!(r.u1, 0.0); + assert_eq!(r.u2, 16.0); + assert_eq!(r.u, 0.0); + assert!( + r.p_value < 0.05, + "p_value {} should be significant", + r.p_value + ); + } + + #[test] + fn test_mann_whitney_no_difference() { + // Identical samples: each value in one sample ties its counterpart in + // the other, so u1 == u2 == n1*n2/2 == 8. U sits exactly at its null + // mean, so the two-sided p-value is non-significant.
+ let r = mann_whitney_u(&[1.0, 2.0, 3.0, 4.0], &[1.0, 2.0, 3.0, 4.0]); + assert_eq!(r.u1, 8.0); + assert_eq!(r.u2, 8.0); + assert!( + r.p_value > 0.5, + "p_value {} should be non-significant", + r.p_value + ); + } + + #[test] + fn test_mann_whitney_u1_plus_u2_invariant() { + // u1 + u2 == n1*n2 on a mixed (interleaved, with ties) example. + let a = [1.0, 3.0, 5.0, 7.0, 3.0]; + let b = [2.0, 4.0, 6.0, 3.0]; + let r = mann_whitney_u(&a, &b); + let n1n2 = (a.len() * b.len()) as f64; + assert!( + close(r.u1 + r.u2, n1n2), + "u1 {} + u2 {} != n1*n2 {}", + r.u1, + r.u2, + n1n2 + ); + } + + #[test] + fn test_mann_whitney_empty_is_nonsignificant() { + let r = mann_whitney_u(&[], &[1.0, 2.0, 3.0]); + assert_eq!(r.p_value, 1.0); + assert!(r.u.is_finite()); + assert!(r.u1.is_finite()); + assert!(r.u2.is_finite()); + } + + // --- erf / Phi sanity (exercised indirectly through the p-value path) --- + + #[test] + fn test_phi_zero() { + assert!(close(phi(0.0), 0.5), "{}", phi(0.0)); + } + + #[test] + fn test_phi_1_96() { + // The classic 97.5th percentile of the standard normal. + assert!((phi(1.96) - 0.975).abs() < 1e-3, "{}", phi(1.96)); + } + + #[test] + fn test_erf_known_values() { + assert!(close(erf(0.0), 0.0), "{}", erf(0.0)); + // erf(1) ~= 0.8427007929 (A&S 7.1.26 max error ~1.5e-7). + assert!((erf(1.0) - 0.842_700_792_9).abs() < 1e-6, "{}", erf(1.0)); + // erf is odd. 
+ assert!(close(erf(-0.5), -erf(0.5)), "erf not odd"); + } + + // --- No NaN: every primitive on empty / degenerate input is finite --- + + #[test] + fn test_no_nan_on_degenerate_input() { + assert!(geomean(&[]).is_finite()); + assert!(geomean(&[3.0]).is_finite()); + assert!(percentile(&[], 0.5).is_finite()); + assert!(percentile(&[1.0], 0.5).is_finite()); + assert!(median(&[]).is_finite()); + let r0 = mann_whitney_u(&[], &[]); + assert!(r0.u.is_finite() && r0.u1.is_finite() && r0.u2.is_finite()); + assert!(r0.p_value.is_finite()); + let r1 = mann_whitney_u(&[1.0, 1.0], &[1.0, 1.0]); + assert!(r1.p_value.is_finite()); + assert!(phi(0.0).is_finite()); + assert!(erf(0.0).is_finite()); + } + + // --- property tests for the statistics invariants --- + + proptest! { + #![proptest_config(ProptestConfig::with_cases(128))] + + /// The geometric mean is a function of the multiset of values: it is + /// invariant under any permutation of the input (the product of the + /// values is commutative). + #[test] + fn prop_geomean_permutation_invariant( + mut vals in prop::collection::vec(0.01f64..1000.0, 1..=12), + seed in any::<u64>(), + ) { + let base = geomean(&vals); + // Deterministic Fisher-Yates shuffle driven by `seed` so the + // property is a pure rearrangement of the same multiset. + let mut state = seed | 1; + for i in (1..vals.len()).rev() { + state = state.wrapping_mul(6364136223846793005).wrapping_add(1); + let j = (state >> 33) as usize % (i + 1); + vals.swap(i, j); + } + let shuffled = geomean(&vals); + // Relative tolerance: ln/exp accumulates rounding across orderings. + prop_assert!( + (base - shuffled).abs() <= 1e-9 * base.abs().max(1.0), + "geomean changed under permutation: {} vs {}", + base, + shuffled + ); + } + + /// `percentile` is bounded by the sample's min and max and is monotone + /// non-decreasing in `p`. Both are core type-7 invariants and both must + /// produce finite values.
+ #[test] + fn prop_percentile_bounded_and_monotone( + vals in prop::collection::vec(-500.0f64..500.0, 1..=20), + p_lo in 0.0f64..=1.0, + delta in 0.0f64..=1.0, + ) { + let p_hi = (p_lo + delta).min(1.0); + let min = vals.iter().cloned().fold(f64::INFINITY, f64::min); + let max = vals.iter().cloned().fold(f64::NEG_INFINITY, f64::max); + let q_lo = percentile(&vals, p_lo); + let q_hi = percentile(&vals, p_hi); + prop_assert!(q_lo.is_finite() && q_hi.is_finite()); + // Bounded by the data range (small slack for interpolation rounding). + prop_assert!(q_lo >= min - 1e-9 && q_lo <= max + 1e-9, "{} not in [{},{}]", q_lo, min, max); + // Monotone non-decreasing in p. + prop_assert!(q_hi >= q_lo - 1e-9, "percentile not monotone: {} < {}", q_hi, q_lo); + } + + /// The partition identity `u1 + u2 == n1 * n2` holds for ANY pair of + /// non-empty samples, and the reported `u` is the smaller of the two. + /// The two-sided p-value is always a finite probability in [0, 1]. + #[test] + fn prop_mann_whitney_partition_identity( + a in prop::collection::vec(-50.0f64..50.0, 1..=15), + b in prop::collection::vec(-50.0f64..50.0, 1..=15), + ) { + let r = mann_whitney_u(&a, &b); + let n1n2 = (a.len() * b.len()) as f64; + prop_assert!( + (r.u1 + r.u2 - n1n2).abs() < 1e-9, + "u1 {} + u2 {} != n1*n2 {}", + r.u1, r.u2, n1n2 + ); + prop_assert!((r.u - r.u1.min(r.u2)).abs() < 1e-9); + prop_assert!(r.p_value.is_finite() && (0.0..=1.0).contains(&r.p_value)); + } + } + + // --- Task 2: ModelStats / CorpusReport constructors --- + + /// A `LayoutMetrics` whose `node_overlap` carries `cost` and every other + /// term is zero, so `weighted_cost` with `node_overlap == 1.0` returns + /// exactly `cost`. Keeps the test fixtures readable while still exercising + /// the real struct. 
+ fn metrics_with_cost(cost: f64) -> LayoutMetrics { + LayoutMetrics { + node_overlap: cost, + node_connector_overlap: 0.0, + label_overlap: 0.0, + crossings: 0.0, + sprawl: 0.0, + edge_length_cv: 0.0, + aspect_penalty: 0.0, + chain_straightness: 0.0, + loop_compactness: 0.0, + } + } + + fn sample(seed: u64, cost: f64) -> MetricSample { + MetricSample { + seed, + metrics: metrics_with_cost(cost), + weighted_cost: cost, + } + } + + #[test] + fn test_from_samples_known_set() { + // Five seeds with hand-pickable costs. + // seed 1 -> 10, seed 2 -> 30, seed 3 -> 20, seed 4 -> 50, seed 5 -> 40 + // Sorted costs: [10, 20, 30, 40, 50]. + // median (type-7, p=0.5) = 30 + // p25 = 20, p75 = 40 + // global min cost = 10 (seed 1), max cost = 50 (seed 4) + // median-nearest cost = 30 (seed 2) + let samples = vec![ + sample(1, 10.0), + sample(2, 30.0), + sample(3, 20.0), + sample(4, 50.0), + sample(5, 40.0), + ]; + // Production seeds: 3 and 5 (costs 20 and 40). Min over them is 20, which + // is NOT the global min (10, seed 1). This is the "best-of-k differs from + // the global min" case. + let production_seeds = [3u64, 5u64]; + let stats = ModelStats::from_samples("m".to_string(), samples, &production_seeds); + + assert_eq!(stats.model, "m"); + assert_eq!(stats.median_cost, 30.0); + assert_eq!(stats.spread, (20.0, 40.0)); + assert_eq!( + stats.best_of_k_cost, 20.0, + "best-of-k must use production seeds" + ); + assert_eq!(stats.best_seed, 1, "global min cost is seed 1"); + assert_eq!(stats.worst_seed, 4, "global max cost is seed 4"); + assert_eq!(stats.median_seed, 2, "median-nearest cost is seed 2"); + } + + #[test] + fn test_from_samples_best_of_k_falls_back_to_global_min() { + // No production seed was sampled -> best_of_k_cost falls back to global + // min weighted_cost. 
+ let samples = vec![sample(1, 10.0), sample(2, 30.0), sample(3, 20.0)]; + let production_seeds = [100u64, 200u64]; + let stats = ModelStats::from_samples("m".to_string(), samples, &production_seeds); + assert_eq!( + stats.best_of_k_cost, 10.0, + "no production seed sampled -> global min" + ); + } + + #[test] + fn test_from_samples_median_seed_tie_break_lowest() { + // Two seeds equidistant from the median cost: the lower seed wins. + // seeds 5, 9 with costs 10 and 30; sorted costs [10, 30] -> median 20. + // |10 - 20| == |30 - 20| == 10, a tie. Lowest seed (5) must win. + let samples = vec![sample(9, 30.0), sample(5, 10.0)]; + let stats = ModelStats::from_samples("m".to_string(), samples, &[]); + assert_eq!(stats.median_cost, 20.0); + assert_eq!(stats.median_seed, 5, "tie must break on the lowest seed"); + } + + #[test] + fn test_from_samples_worst_seed_tie_break_lowest() { + // Two seeds SHARE the maximum cost; the lower seed must win. The third + // (lower-cost) sample ensures the max is a genuine tie, not the only + // value. seeds 7 and 4 both cost 50 (the max); seed 2 costs 10. + // worst_seed must be 4 (the lower of the two tied-at-max seeds), NOT 7. + // This fails if the tie-break direction in from_samples were reversed + // (a `.then(x.seed.cmp(&y.seed))` after max_by would pick 7). + let samples = vec![sample(7, 50.0), sample(2, 10.0), sample(4, 50.0)]; + let stats = ModelStats::from_samples("m".to_string(), samples, &[]); + assert_eq!( + stats.worst_seed, 4, + "max-cost tie must break on the lowest seed" + ); + } + + #[test] + fn test_from_samples_empty_is_all_zero() { + let stats = ModelStats::from_samples("empty".to_string(), vec![], &[1, 2, 3]); + assert_eq!(stats.median_cost, 0.0); + assert_eq!(stats.spread, (0.0, 0.0)); + assert_eq!(stats.best_of_k_cost, 0.0); + assert_eq!(stats.best_seed, 0); + assert_eq!(stats.median_seed, 0); + assert_eq!(stats.worst_seed, 0); + // Finite, no NaN. 
+ assert!(stats.median_cost.is_finite()); + assert!(stats.spread.0.is_finite() && stats.spread.1.is_finite()); + assert!(stats.best_of_k_cost.is_finite()); + } + + fn model_stats_with_median(model: &str, median: f64) -> ModelStats { + // Build a one-sample model whose median equals `median`. + ModelStats::from_samples(model.to_string(), vec![sample(1, median)], &[1]) + } + + #[test] + fn test_from_model_stats_geomean_of_medians() { + // Three models with medians 2, 8, 32: geomean = cbrt(2*8*32) = cbrt(512) = 8. + let per_model = vec![ + model_stats_with_median("a", 2.0), + model_stats_with_median("b", 8.0), + model_stats_with_median("c", 32.0), + ]; + let medians: Vec<f64> = per_model.iter().map(|m| m.median_cost).collect(); + let report = CorpusReport::from_model_stats(per_model); + assert!( + close(report.geomean_of_medians, geomean(&medians)), + "{} != {}", + report.geomean_of_medians, + geomean(&medians) + ); + assert!( + close(report.geomean_of_medians, 8.0), + "{}", + report.geomean_of_medians + ); + } + + #[test] + fn test_from_model_stats_zero_median_does_not_zero_aggregate() { + // A model with median 0 must not collapse the corpus geomean to 0; the + // epsilon floor keeps it positive and finite. + let per_model = vec![ + model_stats_with_median("a", 0.0), + model_stats_with_median("b", 10.0), + model_stats_with_median("c", 1000.0), + ]; + let report = CorpusReport::from_model_stats(per_model); + assert!( + report.geomean_of_medians > 0.0, + "a single 0 median must not zero the aggregate: got {}", + report.geomean_of_medians + ); + assert!(report.geomean_of_medians.is_finite()); + // It must equal the geomean of the floored medians, exactly.
+ let floored = [GEOMEAN_FLOOR_EPSILON, 10.0, 1000.0]; + assert!( + close(report.geomean_of_medians, geomean(&floored)), + "{} != {}", + report.geomean_of_medians, + geomean(&floored) + ); + } + + #[test] + fn test_from_model_stats_empty_corpus_is_zero() { + let report = CorpusReport::from_model_stats(vec![]); + assert_eq!(report.geomean_of_medians, 0.0); + assert!(report.geomean_of_medians.is_finite()); + } + + // --- Task 3: compare(baseline, candidate) --- + + /// Build a `ModelStats` directly from a list of `(seed, cost)` pairs, with + /// no production seeds (best-of-k irrelevant for the comparison tests). + fn model_stats_from_costs(model: &str, seed_costs: &[(u64, f64)]) -> ModelStats { + let samples: Vec<MetricSample> = seed_costs + .iter() + .map(|&(seed, cost)| sample(seed, cost)) + .collect(); + ModelStats::from_samples(model.to_string(), samples, &[]) + } + + #[test] + fn test_compare_identical_report_is_zero_and_nonsignificant() { + // AC4.5: comparing a report against itself must report no change and no + // significance, with p-values pinned to the non-significant default. + let report = CorpusReport::from_model_stats(vec![ + model_stats_from_costs("a", &[(1, 10.0), (2, 20.0), (3, 30.0), (4, 40.0)]), + model_stats_from_costs("b", &[(1, 5.0), (2, 15.0), (3, 25.0), (4, 35.0)]), + ]); + + let cmp = compare(&report, &report); + + assert_eq!(cmp.per_model.len(), 2); + for m in &cmp.per_model { + assert_eq!(m.delta_ratio, 0.0, "model {} delta_ratio", m.model); + assert!(!m.significant, "model {} must not be significant", m.model); + // Identical seed samples ⇒ every value tied ⇒ non-significant.
+ assert!( + m.p_value > 0.5, + "model {} p_value {} should be non-significant", + m.model, + m.p_value + ); + } + assert_eq!(cmp.aggregate_delta_ratio, 0.0); + assert!(!cmp.aggregate_significant); + assert!( + cmp.aggregate_p_value > 0.5, + "aggregate p_value {} should be non-significant", + cmp.aggregate_p_value + ); + } + + #[test] + fn test_compare_clear_improvement_is_negative_and_significant() { + // Candidate strictly below baseline with non-overlapping seed samples: + // the aggregate delta is negative and the per-model verdict is + // significant where the two samples completely separate. + let baseline = CorpusReport::from_model_stats(vec![ + model_stats_from_costs( + "a", + &[(1, 100.0), (2, 110.0), (3, 120.0), (4, 130.0), (5, 140.0)], + ), + model_stats_from_costs( + "b", + &[(1, 200.0), (2, 210.0), (3, 220.0), (4, 230.0), (5, 240.0)], + ), + ]); + let candidate = CorpusReport::from_model_stats(vec![ + model_stats_from_costs( + "a", + &[(1, 10.0), (2, 11.0), (3, 12.0), (4, 13.0), (5, 14.0)], + ), + model_stats_from_costs( + "b", + &[(1, 20.0), (2, 21.0), (3, 22.0), (4, 23.0), (5, 24.0)], + ), + ]); + + let cmp = compare(&baseline, &candidate); + + assert_eq!(cmp.per_model.len(), 2); + for m in &cmp.per_model { + assert!( + m.delta_ratio < 0.0, + "model {} delta_ratio {} should be negative", + m.model, + m.delta_ratio + ); + assert!( + m.candidate_median < m.baseline_median, + "model {} candidate median {} should be below baseline {}", + m.model, + m.candidate_median, + m.baseline_median + ); + assert!( + m.significant, + "model {} (completely separated samples) should be significant; p_value {}", + m.model, m.p_value + ); + } + assert!( + cmp.aggregate_delta_ratio < 0.0, + "aggregate_delta_ratio {} should be negative", + cmp.aggregate_delta_ratio + ); + } + + #[test] + fn test_compare_only_matched_models_are_compared() { + // Models are matched by name; a model present in only one report is + // skipped. 
baseline has {a, b, only_baseline}; candidate has + // {a, b, only_candidate}. The matched set compared is {a, b}, in + // baseline order. + let baseline = CorpusReport::from_model_stats(vec![ + model_stats_from_costs("only_baseline", &[(1, 1.0), (2, 2.0)]), + model_stats_from_costs("a", &[(1, 10.0), (2, 20.0), (3, 30.0)]), + model_stats_from_costs("b", &[(1, 100.0), (2, 200.0), (3, 300.0)]), + ]); + let candidate = CorpusReport::from_model_stats(vec![ + model_stats_from_costs("b", &[(1, 100.0), (2, 200.0), (3, 300.0)]), + model_stats_from_costs("a", &[(1, 10.0), (2, 20.0), (3, 30.0)]), + model_stats_from_costs("only_candidate", &[(1, 9.0), (2, 8.0)]), + ]); + + let cmp = compare(&baseline, &candidate); + + // Exactly the two matched models, in baseline iteration order. + let names: Vec<&str> = cmp.per_model.iter().map(|m| m.model.as_str()).collect(); + assert_eq!( + names, + vec!["a", "b"], + "only matched models, in baseline order" + ); + // The unmatched names appear nowhere. + assert!(!names.contains(&"only_baseline")); + assert!(!names.contains(&"only_candidate")); + } + + #[test] + fn test_compare_zero_baseline_median_no_divide_by_zero() { + // No NaN: a model whose baseline median is 0 yields delta_ratio == 0.0 + // (not inf/NaN) and every reported field stays finite. 
+ let baseline = CorpusReport::from_model_stats(vec![model_stats_from_costs( + "z", + &[(1, 0.0), (2, 0.0), (3, 0.0)], + )]); + let candidate = CorpusReport::from_model_stats(vec![model_stats_from_costs( + "z", + &[(1, 5.0), (2, 6.0), (3, 7.0)], + )]); + + let cmp = compare(&baseline, &candidate); + + assert_eq!(cmp.per_model.len(), 1); + let m = &cmp.per_model[0]; + assert_eq!(m.baseline_median, 0.0); + assert_eq!( + m.delta_ratio, 0.0, + "delta_ratio with a 0 baseline median must be 0.0, not inf/NaN" + ); + assert!(m.delta_ratio.is_finite()); + assert!(m.candidate_median.is_finite()); + assert!(m.p_value.is_finite()); + assert!(cmp.aggregate_delta_ratio.is_finite()); + assert!(cmp.aggregate_p_value.is_finite()); + } + + #[test] + fn test_compare_empty_reports_are_finite_and_nonsignificant() { + // Degenerate input: two empty corpora compare to no per-model rows, a + // zero aggregate delta, and a finite non-significant verdict. + let empty = CorpusReport::from_model_stats(vec![]); + let cmp = compare(&empty, &empty); + assert!(cmp.per_model.is_empty()); + assert_eq!(cmp.aggregate_delta_ratio, 0.0); + assert!(cmp.aggregate_delta_ratio.is_finite()); + assert!(cmp.aggregate_p_value.is_finite()); + assert!(!cmp.aggregate_significant); + } + + #[test] + fn test_compare_no_matched_models_is_finite() { + // Reports with disjoint model names share no matched models: no + // per-model rows, a zero aggregate delta, and a finite verdict. 
+ let baseline = + CorpusReport::from_model_stats(vec![model_stats_from_costs("a", &[(1, 10.0)])]); + let candidate = + CorpusReport::from_model_stats(vec![model_stats_from_costs("b", &[(1, 20.0)])]); + let cmp = compare(&baseline, &candidate); + assert!(cmp.per_model.is_empty()); + assert_eq!(cmp.aggregate_delta_ratio, 0.0); + assert!(cmp.aggregate_delta_ratio.is_finite()); + assert!(cmp.aggregate_p_value.is_finite()); + assert!(!cmp.aggregate_significant); + } + + #[test] + fn test_compare_significance_alpha_is_five_percent() { + // The exported significance threshold is the conventional 0.05. + assert_eq!(SIGNIFICANCE_ALPHA, 0.05); + } +} diff --git a/src/simlin-engine/src/layout/layout_selection_tests.rs b/src/simlin-engine/src/layout/layout_selection_tests.rs new file mode 100644 index 000000000..e7e832e55 --- /dev/null +++ b/src/simlin-engine/src/layout/layout_selection_tests.rs @@ -0,0 +1,502 @@ +// Copyright 2026 The Simlin Authors. All rights reserved. +// Use of this source code is governed by the Apache License, +// Version 2.0, that can be found in the LICENSE file. + +//! Rung-0 layout-selection and regression-guard tests (Phase 5 of the layout +//! quality eval): `select_best_layout` picks the lowest `weighted_cost` +//! candidate (even when that means *more* connector crossings than a rival), +//! the deterministic per-model `weighted_cost` ceiling guards against quality +//! regressions, and a fixed seed reproduces a byte-identical layout. Split out +//! of `layout_tests.rs` to keep that file under the per-file line cap, mirroring +//! the `crossings_tests.rs` precedent. + +use super::*; +use crate::datamodel; +use crate::layout::metrics::{MetricWeights, compute_layout_metrics}; +use crate::test_common::TestProject; + +/// `TestProject::build_datamodel` synthesizes a single model named `"main"`, so +/// every `generate_layout_with_config` call in this file targets that name. 
+const MAIN_MODEL: &str = "main"; + +/// A scalar aux at (`x`, `y`) with a unique name, so a selected view can be +/// identified by which marker element it carries. +fn marker_aux(uid: i32, name: &str, x: f64, y: f64) -> ViewElement { + ViewElement::Aux(view_element::Aux { + name: name.to_string(), + uid, + x, + y, + label_side: LabelSide::Bottom, + compat: None, + }) +} + +fn sel_link(uid: i32, from_uid: i32, to_uid: i32) -> ViewElement { + ViewElement::Link(view_element::Link { + uid, + from_uid, + to_uid, + shape: LinkShape::Straight, + polarity: None, + }) +} + +/// Wrap a set of view elements into a `StockFlow` carrying `name` as its marker +/// so `select_best_layout`'s winner is identifiable. +fn sel_view(name: &str, elements: Vec<ViewElement>) -> datamodel::StockFlow { + datamodel::StockFlow { + name: Some(name.to_string()), + elements, + view_box: Rect { + x: 0.0, + y: 0.0, + width: 1000.0, + height: 1000.0, + }, + zoom: 1.0, + use_lettered_polarity: false, + font: None, + sketch_compat: None, + } +} + +/// A view whose two straight links cross exactly once (the diagonals of a +/// square): `count_view_crossings == 1`. +fn crossing_view(name: &str) -> datamodel::StockFlow { + sel_view( + name, + vec![ + marker_aux(1, "a1", 0.0, 0.0), + marker_aux(2, "a2", 100.0, 100.0), + marker_aux(3, "a3", 0.0, 100.0), + marker_aux(4, "a4", 100.0, 0.0), + sel_link(10, 1, 2), + sel_link(11, 3, 4), + ], + ) +} + +/// A view whose two straight links share an endpoint and never cross: +/// `count_view_crossings == 0`. +fn non_crossing_view(name: &str) -> datamodel::StockFlow { + sel_view( + name, + vec![ + marker_aux(1, "a1", 50.0, 50.0), + marker_aux(2, "a2", 100.0, 0.0), + marker_aux(3, "a3", 100.0, 100.0), + sel_link(10, 1, 2), + sel_link(11, 1, 3), + ], + ) +} + +/// AC6.1: selection minimizes `weighted_cost`, not crossings.
The lowest-cost +/// candidate is deliberately built from a view with MORE connector crossings +/// than a rival, so the old "fewest crossings" rule would have picked the other +/// one. We assert the crossing inversion is real (via `count_view_crossings`), +/// then assert `select_best_layout` returns the lowest-`weighted_cost` view. +#[test] +fn test_select_best_layout_minimizes_weighted_cost_over_crossings() { + let crossing = crossing_view("more_crossings_low_cost"); + let non_crossing = non_crossing_view("fewer_crossings_high_cost"); + + // The inversion is genuine, not just narrative: the candidate we expect to + // win actually has strictly more crossings than the one we expect to lose. + let crossing_count = count_view_crossings(&crossing); + let non_crossing_count = count_view_crossings(&non_crossing); + assert_eq!(crossing_count, 1, "crossing view should have one crossing"); + assert_eq!( + non_crossing_count, 0, + "non-crossing view should have zero crossings" + ); + assert!( + crossing_count > non_crossing_count, + "the low-cost candidate must have more crossings than its rival, \ + so the choice differs from the old crossings-only rule" + ); + + // Hand-set costs so the MORE-crossings view is the cheaper one. Under the + // retired crossings-only rule `fewer_crossings_high_cost` (0 crossings) + // would win; under Rung 0 the lower `weighted_cost` wins. + let results = vec![ + Ok(LayoutResult { + view: crossing, + weighted_cost: 1.0, + seed: 42, + }), + Ok(LayoutResult { + view: non_crossing, + weighted_cost: 5.0, + seed: 123, + }), + ]; + + let best = select_best_layout(results).expect("selection should succeed"); + assert_eq!( + best.name.as_deref(), + Some("more_crossings_low_cost"), + "the lowest-weighted_cost candidate must win even with more crossings" + ); +} + +/// AC6.1 (tie-break): equal `weighted_cost`, the lower seed wins. 
This is the +/// same rule `test_select_best_layout_lowest_seed_on_tie` (in `layout_tests.rs`) +/// pins on hand-built `StockFlow` literals; here we re-assert it through the +/// marker-named helpers for completeness alongside the cost-ordering case. +#[test] +fn test_select_best_layout_tie_breaks_on_lowest_seed() { + let results = vec![ + Ok(LayoutResult { + view: sel_view("seed_456", vec![marker_aux(1, "a", 0.0, 0.0)]), + weighted_cost: 2.5, + seed: 456, + }), + Ok(LayoutResult { + view: sel_view("seed_42", vec![marker_aux(1, "a", 0.0, 0.0)]), + weighted_cost: 2.5, + seed: 42, + }), + Ok(LayoutResult { + view: sel_view("seed_789", vec![marker_aux(1, "a", 0.0, 0.0)]), + weighted_cost: 2.5, + seed: 789, + }), + ]; + + let best = select_best_layout(results).expect("selection should succeed"); + assert_eq!( + best.name.as_deref(), + Some("seed_42"), + "on a weighted_cost tie the lowest seed wins" + ); +} + +/// AC6.1 (NaN safety): a NaN-cost challenger must never displace a finite +/// running best. `select_best_layout` keeps the running best whenever the +/// challenger's `<` comparison is false, and `challenger < finite` is always +/// false for a NaN challenger -- so a degenerate NaN-cost candidate encountered +/// after a finite one cannot win. +#[test] +fn test_select_best_layout_nan_challenger_never_displaces_finite() { + // Finite candidate first, then NaN: the NaN must not displace it. 
+ let finite_first = vec![ + Ok(LayoutResult { + view: sel_view("finite", vec![marker_aux(1, "a", 0.0, 0.0)]), + weighted_cost: 4.0, + seed: 42, + }), + Ok(LayoutResult { + view: sel_view("nan", vec![marker_aux(1, "a", 0.0, 0.0)]), + weighted_cost: f64::NAN, + seed: 123, + }), + ]; + let best = select_best_layout(finite_first).expect("selection should succeed"); + assert_eq!( + best.name.as_deref(), + Some("finite"), + "a NaN-cost challenger must not displace a finite running best" + ); + + // A NaN that arrives last among several finite candidates still loses: the + // finite minimum is already the running best by the time NaN is compared. + let nan_last = vec![ + Ok(LayoutResult { + view: sel_view("hi", vec![marker_aux(1, "a", 0.0, 0.0)]), + weighted_cost: 9.0, + seed: 42, + }), + Ok(LayoutResult { + view: sel_view("lo", vec![marker_aux(1, "a", 0.0, 0.0)]), + weighted_cost: 1.0, + seed: 123, + }), + Ok(LayoutResult { + view: sel_view("nan", vec![marker_aux(1, "a", 0.0, 0.0)]), + weighted_cost: f64::NAN, + seed: 456, + }), + ]; + let best = select_best_layout(nan_last).expect("selection should succeed"); + assert_eq!( + best.name.as_deref(), + Some("lo"), + "the finite minimum wins; a trailing NaN candidate cannot displace it" + ); +} + +/// AC6.1 (NaN safety, order-independent): a finite challenger must beat a NaN +/// running best regardless of position. The fold seeds the running best with the +/// FIRST result, so a degenerate NaN-cost layout from the first seed could +/// otherwise become a sticky running best (`finite < NaN` is false and `finite +/// == NaN` is false, so a plain `<` comparison never overtakes it). The fold +/// special-cases a NaN running best so a later finite candidate always wins. In +/// production (`generate_best_layout` runs seeds in the fixed order [42, 123, +/// 456, 789]), this guarantees a usable finite layout is shipped whenever ANY +/// seed produced one, no matter which seed degenerated. 
+#[test] +fn test_select_best_layout_finite_beats_nan_running_best() { + let nan_first = vec![ + Ok(LayoutResult { + view: sel_view("nan", vec![marker_aux(1, "a", 0.0, 0.0)]), + weighted_cost: f64::NAN, + seed: 42, + }), + Ok(LayoutResult { + view: sel_view("finite", vec![marker_aux(1, "a", 0.0, 0.0)]), + weighted_cost: 4.0, + seed: 123, + }), + ]; + let best = select_best_layout(nan_first).expect("selection should succeed"); + assert_eq!( + best.name.as_deref(), + Some("finite"), + "a finite challenger must beat a NaN running best regardless of order" + ); +} + +/// AC6.1 (NaN safety, all-NaN determinism): when EVERY candidate has a NaN cost, +/// neither the `<` comparison nor the NaN special-cases fire (a NaN challenger is +/// never "better"), so the earliest candidate is kept. This is deterministic +/// regardless of seed order -- the production caller would ship the first seed's +/// (degenerate) layout, but the choice is reproducible rather than arbitrary. +#[test] +fn test_select_best_layout_all_nan_keeps_earliest() { + let all_nan = vec![ + Ok(LayoutResult { + view: sel_view("first", vec![marker_aux(1, "a", 0.0, 0.0)]), + weighted_cost: f64::NAN, + seed: 456, + }), + Ok(LayoutResult { + view: sel_view("second", vec![marker_aux(1, "a", 0.0, 0.0)]), + weighted_cost: f64::NAN, + seed: 42, + }), + ]; + let best = select_best_layout(all_nan).expect("selection should succeed"); + assert_eq!( + best.name.as_deref(), + Some("first"), + "when all candidates are NaN the earliest is kept deterministically" + ); +} + +// ---- AC7: deterministic weighted_cost regression guard ---- +// +// The thresholds below are observed-cost CEILINGS captured at the fixed +// annealing seed 42 with the calibrated `MetricWeights::default()`. They guard +// against layout-quality regressions: if a change to the layout algorithm, +// metric, or weights pushes a tiny model's fixed-seed `weighted_cost` above its +// ceiling, this test fails loudly. 
Each ceiling sits a small margin above the +// observed cost (roughly observed * 1.15, or a small absolute floor when the +// observed cost is 0) -- tight enough to catch a real regression, loose enough +// not to flake on float noise. +// +// To regenerate after an INTENTIONAL metric/weight change: layout is +// deterministic per seed, so print the new `weighted_cost` for each guard model +// (e.g. add a temporary `println!` to `guard_fixed_seed_cost`), run this test +// once, and reset each ceiling a small margin above the new observed value. +// Lowering a ceiling that no longer matches reality is fine; raising one to +// paper over a real regression is not. +// +// Observed at seed 42 (2026-05-23): pop = 0.0533, chain = 0.0, +// two_stock = 0.1646. +const GUARD_POP_COST_CEILING: f64 = 0.06; +const GUARD_CHAIN_COST_CEILING: f64 = 0.05; +const GUARD_TWO_STOCK_COST_CEILING: f64 = 0.19; + +/// Lay `project`'s `main` model out at the fixed seed 42 and return its +/// calibrated `weighted_cost`. Seeding explicitly (rather than relying on the +/// `LayoutConfig::default()` seed) keeps the guard pinned to one reproducible +/// layout even if the default seed changes. +fn guard_fixed_seed_cost(project: &datamodel::Project) -> f64 { + let config = LayoutConfig { + annealing_random_seed: 42, + ..LayoutConfig::default() + }; + let view = generate_layout_with_config(project, MAIN_MODEL, config.clone(), None) + .expect("layout generation should succeed"); + compute_layout_metrics(&view, &config).weighted_cost(&MetricWeights::default()) +} + +/// A population stock with births/deaths flows and two rate auxes -- the +/// canonical tiny feedback model. 
+fn guard_pop_model() -> datamodel::Project { + TestProject::new("guard_pop") + .stock("population", "100", &["births"], &["deaths"], None) + .flow("births", "population * birth_rate", None) + .flow("deaths", "population * death_rate", None) + .aux("birth_rate", "0.03", None) + .aux("death_rate", "0.01", None) + .build_datamodel() +} + +/// A pure auxiliary dependency chain (no stocks): a -> b -> c -> d. +fn guard_chain_model() -> datamodel::Project { + TestProject::new("guard_chain") + .aux("a", "1", None) + .aux("b", "a * 2", None) + .aux("c", "b + a", None) + .aux("d", "c * b", None) + .build_datamodel() +} + +/// A two-stock transfer model: source -> transfer -> sink, rate-driven. +fn guard_two_stock_model() -> datamodel::Project { + TestProject::new("guard_two_stock") + .stock("source", "100", &[], &["transfer"], None) + .stock("sink", "0", &["transfer"], &[], None) + .flow("transfer", "source * rate", None) + .aux("rate", "0.1", None) + .build_datamodel() +} + +/// AC7.1: the fixed-seed `weighted_cost` of each tiny guard model stays at or +/// below its committed ceiling. Fast and deterministic: three tiny models, one +/// seed each. +#[test] +fn test_weighted_cost_regression_guard() { + let cases: [(&str, datamodel::Project, f64); 3] = [ + ("pop", guard_pop_model(), GUARD_POP_COST_CEILING), + ("chain", guard_chain_model(), GUARD_CHAIN_COST_CEILING), + ( + "two_stock", + guard_two_stock_model(), + GUARD_TWO_STOCK_COST_CEILING, + ), + ]; + + for (name, project, ceiling) in cases { + let cost = guard_fixed_seed_cost(&project); + assert!( + cost <= ceiling, + "{name}: fixed-seed weighted_cost {cost} exceeded ceiling {ceiling} \ + -- a layout-quality regression (or an intentional metric/weight \ + change that needs the ceiling regenerated)" + ); + } +} + +/// AC7.2: the guard ceiling actually discriminates good layouts from bad ones. 
+/// We take a real fixed-seed layout of the pop model and pile every node onto +/// the same coordinate, blowing up the node-overlap term, then assert the +/// resulting `weighted_cost` exceeds the ceiling -- so a real layout that +/// regressed to this level WOULD trip `test_weighted_cost_regression_guard`. +/// This makes the failure direction explicit and testable without flakiness. +#[test] +fn test_weighted_cost_guard_rejects_degenerate_layout() { + let project = guard_pop_model(); + let config = LayoutConfig { + annealing_random_seed: 42, + ..LayoutConfig::default() + }; + let view = generate_layout_with_config(&project, MAIN_MODEL, config.clone(), None) + .expect("layout generation should succeed"); + + // Collapse every positioned node onto the origin so the shapes overlap + // maximally (links/aliases/groups have no independent position). + let mut degenerate = view.clone(); + for elem in &mut degenerate.elements { + match elem { + ViewElement::Aux(a) => { + a.x = 0.0; + a.y = 0.0; + } + ViewElement::Stock(s) => { + s.x = 0.0; + s.y = 0.0; + } + ViewElement::Flow(f) => { + f.x = 0.0; + f.y = 0.0; + } + ViewElement::Module(m) => { + m.x = 0.0; + m.y = 0.0; + } + ViewElement::Cloud(c) => { + c.x = 0.0; + c.y = 0.0; + } + ViewElement::Link(_) | ViewElement::Alias(_) | ViewElement::Group(_) => {} + } + } + + let degenerate_cost = + compute_layout_metrics(&degenerate, &config).weighted_cost(&MetricWeights::default()); + assert!( + degenerate_cost > GUARD_POP_COST_CEILING, + "a degenerate all-overlapping layout (cost {degenerate_cost}) must exceed \ + the guard ceiling {GUARD_POP_COST_CEILING}, proving the guard discriminates" + ); +} + +/// A model with enough nodes (a stock fed/drained by ten leaf auxes through two +/// flows) that the SFDP/annealing RNG genuinely shapes the layout, so two +/// different seeds produce two different layouts.
The tiny guard models above +/// converge to one arrangement regardless of seed, which would make a +/// determinism check vacuous; this model exercises the seeded path. +fn guard_seed_sensitive_model() -> datamodel::Project { + let mut tp = TestProject::new("guard_seed_sensitive") + .stock("s", "100", &["inflow"], &["outflow"], None) + .flow("inflow", "a1 + a2 + a3 + a4 + a5", None) + .flow("outflow", "b1 + b2 + b3 + b4 + b5", None); + for i in 1..=5 { + tp = tp.aux(&format!("a{i}"), "1", None); + tp = tp.aux(&format!("b{i}"), "1", None); + } + tp.build_datamodel() +} + +/// Lay `project`'s `main` model out at `seed`. +fn layout_at_seed(project: &datamodel::Project, seed: u64) -> datamodel::StockFlow { + let config = LayoutConfig { + annealing_random_seed: seed, + ..LayoutConfig::default() + }; + generate_layout_with_config(project, MAIN_MODEL, config, None) + .expect("layout generation should succeed") +} + +/// AC8.1: a fixed seed reproduces a byte-identical layout. Generating the same +/// model twice through `generate_layout_with_config` at the same explicit seed +/// must yield two `StockFlow` values that compare equal (`StockFlow` derives +/// `PartialEq`, so this checks every field -- positions, view box, element +/// order -- not just element counts). +/// +/// We use a seed-sensitive model and also assert that a DIFFERENT seed yields a +/// DIFFERENT layout, so the same-seed equality is a real determinism guarantee +/// rather than a vacuous pass on a model whose layout ignores the seed. +/// +/// This per-seed reproducibility is distinct from the Phase 3 M-seed +/// statistical sweep, which deliberately VARIES the seed to sample the layout +/// distribution. Here the seed is held fixed and the layout must be exactly +/// repeatable; there the seed sweeps and the layouts are expected to differ. 
+/// The integration test `tests/layout.rs` already asserts `view1 == view2` for +/// `generate_layout`; this focused in-crate test covers the +/// `generate_layout_with_config` + explicit-seed Rung-0 path. +#[test] +fn test_layout_is_byte_identical_for_fixed_seed() { + let project = guard_seed_sensitive_model(); + + let view1 = layout_at_seed(&project, 7); + let view2 = layout_at_seed(&project, 7); + assert_eq!( + view1, view2, + "the same model at the same fixed seed must produce a byte-identical layout" + ); + + // Non-vacuity: a different seed must produce a different layout, proving the + // equality above reflects genuine per-seed determinism (not a seed-agnostic + // model where any pair would compare equal). + let other = layout_at_seed(&project, 999); + assert_ne!( + view1, other, + "a different seed should produce a different layout, so the same-seed \ + equality is a meaningful determinism guarantee" + ); +} diff --git a/src/simlin-engine/src/layout/layout_tests.rs b/src/simlin-engine/src/layout/layout_tests.rs index 58fc8d1ec..77e82997b 100644 --- a/src/simlin-engine/src/layout/layout_tests.rs +++ b/src/simlin-engine/src/layout/layout_tests.rs @@ -528,70 +528,6 @@ fn test_extract_equation_deps_arrayed_uses_all_entries() { assert_eq!(deps, vec!["bar", "foo"]); } -#[test] -fn test_select_best_layout_fewest_crossings() { - let results = vec![ - Ok(LayoutResult { - view: datamodel::StockFlow { - name: None, - elements: vec![ViewElement::Aux(view_element::Aux { - name: "from_5_crossings".to_string(), - uid: 1, - x: 0.0, - y: 0.0, - label_side: LabelSide::Bottom, - compat: None, - })], - view_box: Rect { - x: 0.0, - y: 0.0, - width: 100.0, - height: 100.0, - }, - zoom: 1.0, - use_lettered_polarity: false, - font: None, - sketch_compat: None, - }, - crossings: 5, - seed: 42, - }), - Ok(LayoutResult { - view: datamodel::StockFlow { - name: None, - elements: vec![ViewElement::Aux(view_element::Aux { - name: "from_2_crossings".to_string(), - uid: 2, - x: 0.0, - 
y: 0.0, - label_side: LabelSide::Bottom, - compat: None, - })], - view_box: Rect { - x: 0.0, - y: 0.0, - width: 100.0, - height: 100.0, - }, - zoom: 1.0, - use_lettered_polarity: false, - font: None, - sketch_compat: None, - }, - crossings: 2, - seed: 123, - }), - ]; - let best = select_best_layout(results).unwrap(); - // Should pick the one with 2 crossings (fewer is better) - assert_eq!(best.elements.len(), 1); - if let ViewElement::Aux(aux) = &best.elements[0] { - assert_eq!(aux.name, "from_2_crossings"); - } else { - unreachable!("expected Aux element"); - } -} - #[test] fn test_select_best_layout_lowest_seed_on_tie() { let results = vec![ @@ -617,7 +553,7 @@ fn test_select_best_layout_lowest_seed_on_tie() { font: None, sketch_compat: None, }, - crossings: 3, + weighted_cost: 3.0, seed: 123, }), Ok(LayoutResult { @@ -642,12 +578,13 @@ fn test_select_best_layout_lowest_seed_on_tie() { font: None, sketch_compat: None, }, - crossings: 3, + weighted_cost: 3.0, seed: 42, }), ]; let best = select_best_layout(results).unwrap(); - // Should pick seed 42 (lower seed wins on tie) + // Equal weighted_cost on both: the lower seed wins the tie-break (still valid + // under the Rung-0 weighted_cost selection rule). assert_eq!(best.elements.len(), 1); if let ViewElement::Aux(aux) = &best.elements[0] { assert_eq!(aux.name, "from_seed_42"); diff --git a/src/simlin-engine/src/layout/metrics.rs b/src/simlin-engine/src/layout/metrics.rs new file mode 100644 index 000000000..ff139faf3 --- /dev/null +++ b/src/simlin-engine/src/layout/metrics.rs @@ -0,0 +1,2469 @@ +// Copyright 2026 The Simlin Authors. All rights reserved. +// Use of this source code is governed by the Apache License, +// Version 2.0, that can be found in the LICENSE file. + +// pattern: Functional Core +// +// The layout quality core. Every term here is computed purely from a +// `datamodel::StockFlow` (and the `LayoutConfig` parameter, kept for +// forward-compatibility with the design's optimizer signature). 
All geometry +// comes from the same `diagram` helpers the SVG renderer uses and from +// `layout::build_view_segments`, so a layout's quality score can never disagree +// with the geometry the renderer draws or with `count_view_crossings`. +// +// There is NO I/O in this module: it takes data, computes scalars, returns +// them. That makes every term trivially testable with hand-computed expected +// values (see the inline tests below). + +use std::collections::{BTreeMap, BTreeSet, HashSet}; + +use crate::datamodel::{self, ViewElement}; +use crate::diagram::common::{ + self, Point, Rect, display_name, merge_bounds, rect_area, rect_overlap_area, + segment_clip_interval_in_rect, +}; +use crate::diagram::connector::{ARC_POLYLINE_SAMPLES, connector_polyline, get_visual_center}; +use crate::diagram::elements::{ + aux_bounds, aux_shape_bounds, cloud_bounds, module_bounds, stock_bounds, stock_shape_bounds, +}; +use crate::diagram::flow::{flow_bounds, flow_shape_bounds}; +use crate::diagram::label::{LabelProps, label_bounds}; + +use super::annealing::count_crossings; +use super::build_view_segments; +use super::config::LayoutConfig; + +/// Upper bound of the target aspect-ratio band. A view whose bounding-box +/// aspect ratio (long side / short side, always >= 1) is at or below this value +/// is "well-proportioned" and incurs no `aspect_penalty`. 16:9 is a generous +/// band that comfortably contains the conventional 4:3 diagram proportions +/// while still penalizing pathologically thin (e.g. 1x10) layouts. +pub const TARGET_AR_MAX: f64 = 16.0 / 9.0; + +/// One quality cost per aesthetic concern, with `0.0` always meaning "ideal". +/// +/// Most terms are scale-free by construction (ratios of like quantities), so +/// they are comparable across models of different absolute coordinate scale. 
+/// Three terms are *intentionally* sensitive to the absolute coordinate scale +/// relative to the universal fixed node-box size (`node_overlap`, +/// `label_overlap`, `sprawl`): a model whose nodes are packed tightly against +/// the fixed pixel size of a stock/aux box should score differently from one +/// spread far apart, and that sensitivity is what makes those terms meaningful +/// across models. See the AC1.8 scoping note in the Phase 1 plan. +/// +/// `Serialize`/`Deserialize` let the layout-quality eval sweep +/// (`examples/layout_eval.rs`) emit the per-term breakdown into its +/// `metrics.json` artifact and round-trip the committed baseline report back +/// from JSON for the baseline diff; the struct is pure data (every field a +/// plain `f64`), so the derives carry no behavior. +#[derive(Clone, Copy, Debug, PartialEq, serde::Serialize, serde::Deserialize)] +pub struct LayoutMetrics { + /// Sum of pairwise node *shape*-box overlap area (label-free), normalized + /// by total shape-box area. Measures shapes overlapping shapes; label + /// collisions are charged by `label_overlap` instead. + pub node_overlap: f64, + /// Fraction of total connector length that passes through non-incident + /// node *shape* boxes (label-free). A connector under a node shape reads as + /// a false causal connection; a connector under only a label is not + /// charged here. + pub node_connector_overlap: f64, + /// Sum over labeled elements of each label's *obscured fraction*: the area + /// of the label box covered by any other label box or any other element's + /// bare shape box, capped at the label's own area and divided by it (so each + /// term is in [0,1]). 0 = no label obscured. Per-label so a small overlap + /// registers at its true obscuration fraction rather than being diluted by + /// the corpus's total label area. + pub label_overlap: f64, + /// Edge crossings normalized by connector count. 
+ pub crossings: f64, + /// Mean connector length relative to the characteristic node size. + pub sprawl: f64, + /// Coefficient of variation (stddev/mean) of connector lengths. + pub edge_length_cv: f64, + /// How far the view bounding-box aspect ratio exceeds the target band. + pub aspect_penalty: f64, + /// Reserved; computed in a future rung. Always 0.0, weight 0. + pub chain_straightness: f64, + /// Mean isoperimetric penalty `1 - Q` over the view's feedback cycles + /// (`Q = 4*PI*Area / Perimeter^2` of each loop's node-center polygon, + /// clamped to [0,1]). 0.0 = clean, well-spread loops (circles); higher = + /// collapsed/collinear loops. 0.0 when the view has no cycle of >= 3 nodes. + /// Computed and reported now; weight stays 0 until Phase 4 calibration. + pub loop_compactness: f64, +} + +/// Per-term weights for the scalar an optimizer minimizes. +/// +/// `MetricWeights::default()` holds the calibrated production weights committed +/// in Phase 4 (see the failure-mode rationale on the `Default` impl below). +/// +/// `Serialize`/`Deserialize` let the layout-quality eval sweep +/// (`examples/layout_eval.rs`) record the weight set it used in its +/// `metrics.json` artifact and read it back when round-tripping the committed +/// baseline report; the struct is pure data (every field a plain `f64`), so the +/// derives carry no behavior. +#[derive(Clone, Copy, Debug, PartialEq, serde::Serialize, serde::Deserialize)] +pub struct MetricWeights { + pub node_overlap: f64, + pub node_connector_overlap: f64, + pub label_overlap: f64, + pub crossings: f64, + pub sprawl: f64, + pub edge_length_cv: f64, + pub aspect_penalty: f64, + pub chain_straightness: f64, + pub loop_compactness: f64, +} + +impl Default for MetricWeights { + /// The calibrated production weights, from the Phase 3 contact-sheet + /// calibration with explicit user sign-off (2026-05-23). 
+ /// + /// Failure-mode rationale -- readability >> compactness: + /// * The dominant concerns all carry weight 1.0: node-shape overlap + /// (`node_overlap`), connectors passing under node shapes + /// (`node_connector_overlap`), obscured labels (`label_overlap`), and + /// edge `crossings`. These are the things that make a diagram unreadable + /// or assert false causal connections, so they dominate the cost. + /// * `sprawl`, `edge_length_cv`, and `aspect_penalty` are intentionally + /// 0.0: compactness and aspect ratio are NOT goals. Spreading nodes out + /// to keep labels legible and feedback loops visible is GOOD, not + /// something to penalize, so these terms must not pull against + /// readability. + /// * `loop_compactness` is a low 0.25: it gently REWARDS drawing feedback + /// loops as visible circles (a readability aid), but must never dominate + /// the overlap/crossings family, so it stays well below 1.0. + /// * `chain_straightness` stays 0.0: it is reserved (not yet computed), so + /// it carries no weight. + fn default() -> Self { + MetricWeights { + node_overlap: 1.0, + node_connector_overlap: 1.0, + label_overlap: 1.0, + crossings: 1.0, + sprawl: 0.0, + edge_length_cv: 0.0, + aspect_penalty: 0.0, + chain_straightness: 0.0, + loop_compactness: 0.25, + } + } +} + +impl LayoutMetrics { + /// Sigma w_i * term_i -- the scalar an optimizer minimizes. 
+ pub fn weighted_cost(&self, w: &MetricWeights) -> f64 { + self.node_overlap * w.node_overlap + + self.node_connector_overlap * w.node_connector_overlap + + self.label_overlap * w.label_overlap + + self.crossings * w.crossings + + self.sprawl * w.sprawl + + self.edge_length_cv * w.edge_length_cv + + self.aspect_penalty * w.aspect_penalty + + self.chain_straightness * w.chain_straightness + + self.loop_compactness * w.loop_compactness + } +} + +/// The drawn geometry of one connector (Link or Flow): its incident node uids +/// (so node-connector-overlap can skip them) and the polyline the renderer +/// draws. Built once and reused by every connector-derived term so they all see +/// the same geometry. +struct ConnectorGeometry { + /// Element uids the connector is attached to and must not be charged for + /// passing through (its own endpoints). + incident_uids: HashSet<i32>, + /// The drawn polyline. Always has at least two points (connectors that draw + /// nothing -- e.g. MultiPoint links -- are not collected at all). + polyline: Vec<Point>, + /// Total polyline length. + length: f64, +} + +/// Total length of the UNION of parameter intervals `[t0, t1]` (each `t` in +/// [0,1]), counting each covered sub-length once. Sorts by start then sweep- +/// merges, so overlapping/adjacent intervals collapse. The next interval merges +/// when its start is `<= ` the current end (no epsilon needed; equality is +/// tolerated as adjacency). Mutates `intervals` (sorts in place); empty input +/// yields 0.0. Order-independent in its result. PURE. +fn merged_interval_length(intervals: &mut [(f64, f64)]) -> f64 { + if intervals.is_empty() { + return 0.0; + } + intervals.sort_by(|a, b| a.0.partial_cmp(&b.0).unwrap_or(std::cmp::Ordering::Equal)); + let mut total = 0.0; + let mut cur = intervals[0]; + for &(t0, t1) in &intervals[1..] { + if t0 <= cur.1 { + // Overlapping or adjacent: extend the current run.
+ cur.1 = cur.1.max(t1); + } else { + total += cur.1 - cur.0; + cur = (t0, t1); + } + } + total += cur.1 - cur.0; + total +} + +/// Polyline length: sum of segment lengths. +fn polyline_length(points: &[Point]) -> f64 { + points + .windows(2) + .map(|w| { + let dx = w[1].x - w[0].x; + let dy = w[1].y - w[0].y; + (dx * dx + dy * dy).sqrt() + }) + .sum() +} + +/// Resolve the node box for an element that has one (everything except links, +/// groups, and aliases -- aliases have no bounds helper and are excluded to +/// match the renderer's `calc_view_box`). +fn node_box(element: &ViewElement) -> Option<Rect> { + match element { + ViewElement::Aux(a) => Some(aux_bounds(a)), + ViewElement::Stock(s) => Some(stock_bounds(s)), + ViewElement::Module(m) => Some(module_bounds(m)), + ViewElement::Cloud(c) => Some(cloud_bounds(c)), + ViewElement::Flow(f) => Some(flow_bounds(f)), + ViewElement::Link(_) | ViewElement::Alias(_) | ViewElement::Group(_) => None, + } +} + +/// The element's bare *shape* box, WITHOUT its own label, for the same set of +/// elements as `node_box`. `aux_bounds`/`stock_bounds`/`flow_bounds` merge each +/// element's own label into the returned box; the label-vs-node term of +/// `label_overlap` must use the label-free shape so a label-vs-label overlap is +/// not also charged via the other node's label-merged box (a double-count). +/// `module_bounds`/`cloud_bounds` already exclude the label (modules render a +/// label that their bounds omit; clouds render none), so they are their own +/// shape box.
+fn node_shape_box(element: &ViewElement) -> Option<Rect> { + match element { + ViewElement::Aux(a) => Some(aux_shape_bounds(a)), + ViewElement::Stock(s) => Some(stock_shape_bounds(s)), + ViewElement::Module(m) => Some(module_bounds(m)), + ViewElement::Cloud(c) => Some(cloud_bounds(c)), + ViewElement::Flow(f) => Some(flow_shape_bounds(f)), + ViewElement::Link(_) | ViewElement::Alias(_) | ViewElement::Group(_) => None, + } +} + +/// Build a `LabelProps` for a labeled element, matching the renderer's label +/// geometry (center, label side, display name, and the element's radii). Only +/// elements that render a label return `Some`. The radii match the per-element +/// `with_radii` calls in `diagram::elements`/`diagram::flow`. +fn element_label_props(element: &ViewElement) -> Option<LabelProps> { + use crate::diagram::constants::{ + AUX_RADIUS, FLOW_VALVE_RADIUS, MODULE_HEIGHT, MODULE_WIDTH, STOCK_HEIGHT, STOCK_WIDTH, + }; + match element { + ViewElement::Aux(a) => Some( + LabelProps::new(a.x, a.y, a.label_side, display_name(&a.name)) + .with_radii(AUX_RADIUS, AUX_RADIUS), + ), + ViewElement::Stock(s) => Some( + LabelProps::new(s.x, s.y, s.label_side, display_name(&s.name)) + .with_radii(STOCK_WIDTH / 2.0, STOCK_HEIGHT / 2.0), + ), + ViewElement::Module(m) => Some( + LabelProps::new(m.x, m.y, m.label_side, display_name(&m.name)) + .with_radii(MODULE_WIDTH / 2.0, MODULE_HEIGHT / 2.0), + ), + ViewElement::Flow(f) => Some( + LabelProps::new(f.x, f.y, f.label_side, display_name(&f.name)) + .with_radii(FLOW_VALVE_RADIUS, FLOW_VALVE_RADIUS), + ), + // Aliases do render a label, but they have no `*_bounds` helper and are + // excluded from node bounds to match the renderer's view box; we keep + // the label-set consistent with the node-box set by also excluding + // their labels. Links/Clouds/Groups render no element label.
+ ViewElement::Alias(_) + | ViewElement::Link(_) + | ViewElement::Cloud(_) + | ViewElement::Group(_) => None, + } +} + +/// Collect the drawn geometry of every connector (Link or Flow) that draws +/// something. Links use the shared `connector_polyline` (the exact geometry the +/// renderer draws and `build_view_segments` counts); flows use their point +/// polyline. Connectors that draw nothing (MultiPoint links, degenerate arcs, +/// flows with fewer than two points) are omitted entirely. +fn collect_connector_geometry(view: &datamodel::StockFlow) -> Vec<ConnectorGeometry> { + let mut uid_elements = std::collections::HashMap::new(); + for elem in &view.elements { + uid_elements.insert(elem.get_uid(), elem); + } + // Center-based, deterministic: nothing is treated as arrayed (matches + // `build_view_segments`). + let not_arrayed = |_: &str| false; + + let mut out = Vec::new(); + for elem in &view.elements { + match elem { + ViewElement::Link(link) => { + let (Some(&from), Some(&to)) = ( + uid_elements.get(&link.from_uid), + uid_elements.get(&link.to_uid), + ) else { + continue; + }; + let polyline = + connector_polyline(link, from, to, &not_arrayed, ARC_POLYLINE_SAMPLES); + if polyline.len() < 2 { + continue; + } + let length = polyline_length(&polyline); + let mut incident_uids = HashSet::new(); + incident_uids.insert(link.from_uid); + incident_uids.insert(link.to_uid); + out.push(ConnectorGeometry { + incident_uids, + polyline, + length, + }); + } + ViewElement::Flow(flow) => { + if flow.points.len() < 2 { + continue; + } + let polyline: Vec<Point> = flow + .points + .iter() + .map(|p| Point { x: p.x, y: p.y }) + .collect(); + let length = polyline_length(&polyline); + // A flow is incident on its own valve plus any element its + // points attach to (the stock/cloud at each end).
+ let mut incident_uids = HashSet::new(); + incident_uids.insert(flow.uid); + for p in &flow.points { + if let Some(uid) = p.attached_to_uid { + incident_uids.insert(uid); + } + } + out.push(ConnectorGeometry { + incident_uids, + polyline, + length, + }); + } + _ => {} + } + } + out +} + +// --- loop_compactness (isoperimetric feedback-loop quality) ----------------- +// +// What it measures: how cleanly the view draws its feedback loops as visible +// circles. For each simple directed cycle of >= 3 positioned nodes we take the +// node-box centers in cycle order and form a polygon. Its isoperimetric +// quotient Q = 4*PI*Area / Perimeter^2 is 1 for a perfect circle and tends to 0 +// as the polygon collapses toward a line (the area vanishes while the perimeter +// stays large). The per-cycle penalty is `1 - Q` (0 = ideal clean loop, ~1 = +// squished/collinear), and `loop_compactness` is the mean penalty over all +// qualifying cycles (0.0 when the view has no cycle of >= 3 nodes). It thus +// REWARDS well-spread loops and PENALIZES collapsed ones. +// +// Bounds (SD diagrams are small, so this stays O(small) and total): a simple +// cycle is enumerated only up to `MAX_CYCLE_LEN` nodes, and at most +// `MAX_CYCLES` cycles are scored; enumeration stops once the cap is hit. The +// graph is built over positioned node-box elements (aux/stock/flow/module/cloud +// -- the same set as `node_box`); links and flows supply the directed edges. +// +// Determinism: layout is deterministic per seed, but this term is additionally +// independent of element ordering. Adjacency targets are sorted, the DFS starts +// from each node in sorted uid order, and every enumerated cycle is canonicalized +// (rotated so its smallest uid is first) and de-duplicated, so the mean is the +// same regardless of how the elements are listed in the view. + +/// Maximum number of nodes in an enumerated simple cycle. 
SD feedback loops are +/// short; a longer "cycle" is almost always an artifact of many overlapping +/// smaller loops and is not worth the combinatorial cost. +const MAX_CYCLE_LEN: usize = 12; + +/// Maximum number of distinct simple cycles scored. Bounds the work on dense +/// graphs; the mean penalty over the first `MAX_CYCLES` cycles is a faithful +/// proxy for the whole (SD diagrams rarely approach this). +const MAX_CYCLES: usize = 64; + +/// Directed adjacency over positioned node-box elements, keyed by uid with +/// sorted successor lists. Each node's loop vertex is the renderer's VISUAL +/// center (`diagram::connector::get_visual_center`) -- for a flow that is its +/// VALVE `(flow.x, flow.y)`, NOT the pipe-extent center of `flow_shape_bounds` +/// (which unions the valve box with every pipe point and so drifts off the valve +/// when the pipe is bent or the valve is dragged off-center); for an +/// aux/stock/module/cloud it is the element center, which already equals the +/// symmetric shape-box midpoint. Using the same visual center the SVG renderer +/// draws keeps the loop polygon faithful to the drawn diagram. +struct LoopGraph { + /// uid -> sorted, de-duplicated successor uids. + adj: BTreeMap<i32, Vec<i32>>, + /// uid -> node visual-center point (the valve for flows; the element center + /// for aux/stock/module/cloud). + centers: BTreeMap<i32, Point>, +} + +/// Build the directed loop graph from the view. Nodes are exactly the elements +/// with a node box (`node_shape_box` -- aux/stock/module/cloud/flow; links, +/// aliases, and groups are excluded). Each node's loop vertex is the renderer's +/// VISUAL center (`get_visual_center`), so a flow's vertex is its VALVE +/// `(flow.x, flow.y)`, NOT the pipe-extent center of `flow_shape_bounds` (the +/// valve box unioned with every pipe point), which drifts off the valve when the +/// pipe is bent or the valve is dragged off-center.
For aux/stock/module/cloud +/// the visual center is the element center, which already equals the symmetric +/// shape-box midpoint, so those vertices are unchanged. Edges to/from uids that +/// are not positioned nodes are dropped. Edges come from: +/// * each Link: `from_uid -> to_uid`; +/// * each Flow: for consecutive attached points, `source_attached -> flow.uid` +/// and `flow.uid -> dest_attached`, so a stock--flow--stock feedback path is +/// part of the graph (the flow's own valve is the intermediate node). +fn build_loop_graph(view: &datamodel::StockFlow) -> LoopGraph { + // The node-membership gate stays `node_shape_box` (it defines which elements + // are loop nodes), but the loop VERTEX is the renderer's visual center, which + // is correct for every gated kind: the valve for a flow, the element center + // for aux/stock/module/cloud. `not_arrayed` matches `collect_connector_geometry` + // / `build_view_segments` (offset 0, deterministic). + let not_arrayed = |_: &str| false; + let mut centers: BTreeMap<i32, Point> = BTreeMap::new(); + for e in &view.elements { + if node_shape_box(e).is_some() { + let (cx, cy) = get_visual_center(e, &not_arrayed); + centers.insert(e.get_uid(), Point { x: cx, y: cy }); + } + } + + // Collect edges into sorted sets per source so the adjacency is canonical + // (sorted, de-duplicated) and the cycle search is order-independent. + let mut edge_sets: BTreeMap<i32, BTreeSet<i32>> = BTreeMap::new(); + let mut add_edge = |from: i32, to: i32, centers: &BTreeMap<i32, Point>| { + // Both endpoints must be positioned nodes, and we never record a + // self-loop (a single-node "cycle" forms no polygon).
+ if from != to && centers.contains_key(&from) && centers.contains_key(&to) { + edge_sets.entry(from).or_default().insert(to); + } + }; + + for e in &view.elements { + match e { + ViewElement::Link(link) => { + add_edge(link.from_uid, link.to_uid, &centers); + } + ViewElement::Flow(flow) => { + // Consecutive attached points define stock->flow and flow->stock + // edges through the flow's own valve uid. + let attached: Vec<i32> = flow + .points + .iter() + .filter_map(|p| p.attached_to_uid) + .collect(); + for w in attached.windows(2) { + add_edge(w[0], flow.uid, &centers); + add_edge(flow.uid, w[1], &centers); + } + } + _ => {} + } + } + + let adj: BTreeMap<i32, Vec<i32>> = edge_sets + .into_iter() + .map(|(k, set)| (k, set.into_iter().collect())) + .collect(); + LoopGraph { adj, centers } +} + +/// Enumerate simple directed cycles (each >= 2 nodes), bounded by +/// `MAX_CYCLE_LEN` and `MAX_CYCLES`, canonicalized and de-duplicated so the same +/// directed cycle is returned exactly once regardless of where the search +/// started. A bounded DFS suffices: SD diagrams are tiny, and the caps keep it +/// O(small) on the rare dense graph. +/// +/// Each returned cycle is a `Vec<i32>` of uids in traversal order, rotated so +/// its smallest uid is first (canonical form), and the set of returned cycles is +/// itself sorted for a fully deterministic result. +fn enumerate_simple_cycles(graph: &LoopGraph) -> Vec<Vec<i32>> { + let mut found: BTreeSet<Vec<i32>> = BTreeSet::new(); + // Start a DFS from each node in sorted uid order. To avoid re-finding the + // same cycle from each of its members we still canonicalize+dedup, but we + // also restrict each search to cycles whose minimum node is the start node, + // which prunes the bulk of the duplicate work.
+ let starts: Vec<i32> = graph.adj.keys().copied().collect(); + let mut path: Vec<i32> = Vec::new(); + let mut on_path: HashSet<i32> = HashSet::new(); + for &start in &starts { + path.clear(); + on_path.clear(); + dfs_cycles(graph, start, start, &mut path, &mut on_path, &mut found); + if found.len() >= MAX_CYCLES { + break; + } + } + found.into_iter().take(MAX_CYCLES).collect() +} + +/// Depth-first walk that records every simple cycle returning to `start` and +/// composed only of nodes whose uid is >= `start` (so each cycle is discovered +/// from its smallest member). `path`/`on_path` track the current simple path. +fn dfs_cycles( + graph: &LoopGraph, + start: i32, + current: i32, + path: &mut Vec<i32>, + on_path: &mut HashSet<i32>, + found: &mut BTreeSet<Vec<i32>>, +) { + if found.len() >= MAX_CYCLES { + return; + } + path.push(current); + on_path.insert(current); + + if let Some(succs) = graph.adj.get(&current) { + for &next in succs { + if next == start { + // Closed a cycle back to the start. Record it (>= 2 nodes by + // construction; self-loops were never added as edges). + if path.len() >= 2 { + found.insert(canonicalize_cycle(path)); + if found.len() >= MAX_CYCLES { + break; + } + } + continue; + } + // Only extend through nodes strictly greater than the start (so the + // start is the minimum), not already on the path, within the length + // cap. + if next > start && !on_path.contains(&next) && path.len() < MAX_CYCLE_LEN { + dfs_cycles(graph, start, next, path, on_path, found); + if found.len() >= MAX_CYCLES { + break; + } + } + } + } + + on_path.remove(&current); + path.pop(); +} + +/// Rotate a cycle so its smallest uid is first, preserving traversal direction. +/// The DFS already guarantees the start (= minimum) is element 0, but rotating +/// defensively keeps the canonical form correct for any caller. +/// +/// Note: this canonicalizes rotation (start at min uid) but NOT traversal +/// direction, so a directed cycle and its reverse canonicalize to distinct +/// entries.
That is harmless: a reverse-direction duplicate (essentially never +/// present for directed SD feedback loops, which would require both directed +/// edge sets in the graph) would compute the same isoperimetric penalty because +/// the shoelace polygon area in `cycle_penalty` is direction-invariant. +fn canonicalize_cycle(cycle: &[i32]) -> Vec<i32> { + if cycle.is_empty() { + return Vec::new(); + } + let min_idx = cycle + .iter() + .enumerate() + .min_by_key(|&(_, v)| *v) + .map(|(i, _)| i) + .unwrap_or(0); + let mut out = Vec::with_capacity(cycle.len()); + for k in 0..cycle.len() { + out.push(cycle[(min_idx + k) % cycle.len()]); + } + out +} + +/// Isoperimetric penalty `1 - Q` for one cycle's node-box centers, or `None` if +/// the cycle does not qualify (fewer than 3 distinct positioned nodes, or a +/// degenerate zero-perimeter polygon). `Q = 4*PI*Area / Perimeter^2` is clamped +/// to [0, 1]; `Area` is the shoelace area (absolute value) and `Perimeter` the +/// summed edge length over the closed polygon. +fn cycle_penalty(cycle: &[i32], centers: &BTreeMap<i32, Point>) -> Option<f64> { + // Distinct positioned nodes only: a polygon needs >= 3 vertices. + let distinct: BTreeSet<i32> = cycle.iter().copied().collect(); + if distinct.len() < 3 { + return None; + } + let pts: Vec<Point> = cycle + .iter() + .filter_map(|uid| centers.get(uid).copied()) + .collect(); + if pts.len() < 3 { + return None; + } + + let n = pts.len(); + let mut area2 = 0.0; + let mut perimeter = 0.0; + for i in 0..n { + let a = pts[i]; + let b = pts[(i + 1) % n]; + area2 += a.x * b.y - b.x * a.y; + let dx = b.x - a.x; + let dy = b.y - a.y; + perimeter += (dx * dx + dy * dy).sqrt(); + } + if perimeter <= 0.0 { + // All centers coincide: no polygon. Guarded so the division below is + // never NaN; such a degenerate cycle simply does not contribute.
+ return None; + } + let area = area2.abs() / 2.0; + let q = (4.0 * std::f64::consts::PI * area / (perimeter * perimeter)).clamp(0.0, 1.0); + Some(1.0 - q) +} + +/// `loop_compactness`: mean isoperimetric penalty `1 - Q` over the view's +/// bounded simple directed cycles of >= 3 positioned nodes. 0.0 when there is no +/// qualifying cycle. Deterministic for a given view regardless of element order +/// (see the module comment above). PURE. +fn compute_loop_compactness(view: &datamodel::StockFlow) -> f64 { + let graph = build_loop_graph(view); + let cycles = enumerate_simple_cycles(&graph); + let penalties: Vec<f64> = cycles + .iter() + .filter_map(|c| cycle_penalty(c, &graph.centers)) + .collect(); + if penalties.is_empty() { + 0.0 + } else { + penalties.iter().sum::<f64>() / penalties.len() as f64 + } +} + +/// Compute the layout quality metrics for a completed view. +/// +/// PURE: takes data, returns scalars, performs no I/O. The `_config` parameter +/// is kept to match the design's optimizer-facing signature and for forward +/// compatibility; the box geometry is sourced entirely from the `diagram` +/// helpers (which use fixed pixel element sizes), so the config is presently +/// unused. Every term is guaranteed finite (each division guards a zero +/// denominator by returning 0), so empty and single-element views yield +/// all-zero, NaN-free metrics. +pub fn compute_layout_metrics( + view: &datamodel::StockFlow, + _config: &LayoutConfig, +) -> LayoutMetrics { + // --- node boxes (with their owning element for incidence checks) --- + // + // Two box sets, used by different terms: + // * `node_boxes` is the LABEL-MERGED box (`node_box`): each element's own + // label unioned into its shape. The view's visual extent and its + // characteristic node size both include labels, so `sprawl` and + // `aspect_penalty` use this set. + // * `node_shape_boxes` is the bare SHAPE box (`node_shape_box`): + // label-free.
`node_overlap` and `node_connector_overlap` use this set + // so they measure exactly what the user cares about -- node SHAPES + // overlapping other node shapes, and a connector passing under a node + // SHAPE (a false-causal-connection at a glance). A connector passing + // only under a node's LABEL is mild noise (labels are semi-transparent + // and no connector terminates on one) and must NOT be charged here; + // label collisions are the province of `label_overlap`. + let node_boxes: Vec<(i32, Rect)> = view + .elements + .iter() + .filter_map(|e| node_box(e).map(|r| (e.get_uid(), r))) + .collect(); + let node_shape_boxes: Vec<(i32, Rect)> = view + .elements + .iter() + .filter_map(|e| node_shape_box(e).map(|r| (e.get_uid(), r))) + .collect(); + + // --- node_overlap (bare shape boxes, normalized by total shape-box area) --- + let total_shape_area: f64 = node_shape_boxes.iter().map(|(_, r)| rect_area(r)).sum(); + let node_overlap = if total_shape_area > 0.0 { + let mut overlap = 0.0; + for i in 0..node_shape_boxes.len() { + for j in (i + 1)..node_shape_boxes.len() { + overlap += rect_overlap_area(&node_shape_boxes[i].1, &node_shape_boxes[j].1); + } + } + overlap / total_shape_area + } else { + 0.0 + }; + + // --- connector geometry (shared by several terms) --- + let connectors = collect_connector_geometry(view); + let total_connector_length: f64 = connectors.iter().map(|c| c.length).sum(); + + // --- node_connector_overlap (length inside non-incident shape boxes) --- + // + // Documented as a "fraction of total connector length", so each physical + // sub-length of connector covered by ANY non-incident node shape box must be + // counted AT MOST ONCE. Summing the per-box clipped length double-counts the + // region where two non-incident boxes overlap, which can push the normalized + // value above 1.0 (overlapping shape boxes are common -- a Flow's shape box is + // its whole-pipe bounding box, which frequently overlaps stocks/auxes/other + // flows). 
Instead, for EACH segment we collect the clip intervals over all + // non-incident boxes and UNION them (merge overlapping/adjacent intervals) + // before summing, so each covered sub-length contributes once and the term is + // a true fraction in [0, 1]. The per-segment merge result is order-independent, + // so this is deterministic regardless of `node_shape_boxes` iteration order. + let node_connector_overlap = if total_connector_length > 0.0 { + let mut inside = 0.0; + for c in &connectors { + for seg in c.polyline.windows(2) { + let dx = seg[1].x - seg[0].x; + let dy = seg[1].y - seg[0].y; + let seg_len = (dx * dx + dy * dy).sqrt(); + if seg_len == 0.0 { + continue; // degenerate segment covers no length + } + // Clip interval [t0, t1] of this segment within each non-incident + // box, in segment-parameter space (t in [0,1]). + let mut intervals: Vec<(f64, f64)> = Vec::new(); + for (uid, rect) in &node_shape_boxes { + if c.incident_uids.contains(uid) { + continue; // skip the connector's own endpoints + } + if let Some(iv) = segment_clip_interval_in_rect(&seg[0], &seg[1], rect) { + intervals.push(iv); + } + } + inside += merged_interval_length(&mut intervals) * seg_len; + } + } + inside / total_connector_length + } else { + 0.0 + }; + + // --- label_overlap (per-label obscuration) --- + // + // For each labeled element L, measure how much of its label box B_L is + // covered (obscured) by OTHER drawn geometry, then SUM each label's obscured + // fraction. This is per-label rather than a single corpus-wide ratio: a + // small-but-readability-killing overlap (e.g. a node circle clipping the last + // two characters of a short label) registers at its true obscuration + // fraction instead of being diluted to ~0 by the corpus's total label area + // (the prior `sum_of_overlaps / total_label_area` definition under-counted + // exactly this case). 
+ // + // The coverers of B_L are (a) any OTHER label box and (b) any OTHER element's + // bare *shape* box (`node_shape_box`, NOT the label-merged `node_box`): + // * A label is never charged against its OWN element's shape box. By + // construction a label sits adjacent to (and within the merged bounds of) + // its own element, so charging it there would always add a constant that + // is not a real collision. + // * Comparing against the bare shape box (not the label-merged box) keeps + // "label lands on another label" and "label lands on another node's + // shape" cleanly separate -- the merged box unions that node's own label, + // which would re-count the label-vs-label coverage already captured by + // the label-box term. + // + // A pixel-exact union of all coverers is unnecessary: the covered area is + // approximated by the SUM of individual overlap areas, capped at area(B_L) so + // a label's obscured fraction stays in [0,1] even when coverers overlap each + // other. This is a monotone proxy (more/larger overlaps never decrease the + // fraction). A mutual label-label collision is charged from BOTH labels' + // perspectives -- intended, since both are unreadable. Guards area(B_L) == 0 + // (degenerate label) by skipping it, so the term is always finite. + let label_boxes: Vec<(i32, Rect)> = view + .elements + .iter() + .filter_map(|e| element_label_props(e).map(|props| (e.get_uid(), label_bounds(&props)))) + .collect(); + // `node_shape_boxes` is computed once above (shared with node_overlap and + // node_connector_overlap). + let mut label_overlap = 0.0; + for (lbl_uid, lbl) in &label_boxes { + let lbl_area = rect_area(lbl); + if lbl_area <= 0.0 { + continue; // degenerate label box: no NaN, contributes nothing + } + let mut covered = 0.0; + // Covered by every OTHER label box. 
+ for (other_uid, other) in &label_boxes { + if other_uid == lbl_uid { + continue; + } + covered += rect_overlap_area(lbl, other); + } + // Covered by every OTHER element's bare shape box. + for (node_uid, node) in &node_shape_boxes { + if node_uid == lbl_uid { + continue; + } + covered += rect_overlap_area(lbl, node); + } + // Cap the (possibly over-counted) covered area at the label's own area + // so the obscured fraction is in [0,1]. + let obscured_fraction = (covered.min(lbl_area)) / lbl_area; + label_overlap += obscured_fraction; + } + + // --- crossings --- + let connector_count = connectors.len(); + let crossings = if connector_count > 0 { + count_crossings(&build_view_segments(view)) as f64 / connector_count as f64 + } else { + 0.0 + }; + + // --- sprawl --- + let sprawl = if !connectors.is_empty() && !node_boxes.is_empty() { + let mean_connector_length = total_connector_length / connectors.len() as f64; + let characteristic_node_size = node_boxes + .iter() + .map(|(_, r)| { + let w = common::rect_width(r); + let h = common::rect_height(r); + (w * w + h * h).sqrt() + }) + .sum::<f64>() + / node_boxes.len() as f64; + if characteristic_node_size > 0.0 { + mean_connector_length / characteristic_node_size + } else { + 0.0 + } + } else { + 0.0 + }; + + // --- edge_length_cv --- + let edge_length_cv = if connectors.len() >= 2 { + let n = connectors.len() as f64; + let mean = total_connector_length / n; + if mean > 0.0 { + let variance = connectors + .iter() + .map(|c| { + let d = c.length - mean; + d * d + }) + .sum::<f64>() + / n; // population variance + variance.sqrt() / mean + } else { + 0.0 + } + } else { + 0.0 + }; + + // --- aspect_penalty --- + // Bounding box over node boxes (union). The aspect ratio is the long side + // over the short side (always >= 1); we penalize the amount by which it + // exceeds the target band. Chosen formula: `ar - TARGET_AR_MAX` (a plain + // unit-of-ratio overshoot). Documented here and matched in the AC1.5 test.
+ let aspect_penalty = match view_bounding_box(&node_boxes) { + Some(bbox) => { + let w = common::rect_width(&bbox); + let h = common::rect_height(&bbox); + let (long, short) = if w >= h { (w, h) } else { (h, w) }; + if short <= 0.0 { + 0.0 + } else { + let ar = long / short; + (ar - TARGET_AR_MAX).max(0.0) + } + } + None => 0.0, + }; + + // --- loop_compactness (isoperimetric feedback-loop quality) --- + let loop_compactness = compute_loop_compactness(view); + + LayoutMetrics { + node_overlap, + node_connector_overlap, + label_overlap, + crossings, + sprawl, + edge_length_cv, + aspect_penalty, + // reserved; computed in a future rung + chain_straightness: 0.0, + loop_compactness, + } +} + +/// Union of the node boxes, or `None` if there are no node boxes. +fn view_bounding_box(node_boxes: &[(i32, Rect)]) -> Option<Rect> { + let mut iter = node_boxes.iter(); + let first = iter.next()?.1; + Some(iter.fold(first, |acc, (_, r)| merge_bounds(acc, *r))) +} + +#[cfg(test)] +mod tests { + use super::*; + use crate::datamodel::view_element::{self, LabelSide, LinkShape}; + // `segment_length_in_rect` is the simple single-box clip; the AC1.3 tests and + // the union tests use it as an independent reference oracle to cross-check the + // production union path (which composes `segment_clip_interval_in_rect`). + use crate::diagram::common::segment_length_in_rect; + use crate::diagram::constants::STOCK_WIDTH; + use proptest::prelude::*; + + // --- fixture helpers --- + + fn stock(uid: i32, name: &str, x: f64, y: f64) -> ViewElement { + ViewElement::Stock(view_element::Stock { + name: name.to_string(), + uid, + x, + y, + label_side: LabelSide::Bottom, + compat: None, + }) + } + + fn aux(uid: i32, name: &str, x: f64, y: f64) -> ViewElement { + ViewElement::Aux(view_element::Aux { + name: name.to_string(), + uid, + x, + y, + label_side: LabelSide::Bottom, + compat: None, + }) + } + + /// A cloud at `(x, y)`.
A cloud is a positioned node with a bare shape box + /// (`cloud_bounds`, a 27x27 square: CLOUD_RADIUS = 13.5) and NO rendered + /// label, so it is the cleanest "obscuring shape" fixture for label_overlap. + fn cloud(uid: i32, x: f64, y: f64) -> ViewElement { + ViewElement::Cloud(view_element::Cloud { + uid, + flow_uid: -1, + x, + y, + compat: None, + }) + } + + fn straight_link(uid: i32, from_uid: i32, to_uid: i32) -> ViewElement { + ViewElement::Link(view_element::Link { + uid, + from_uid, + to_uid, + shape: LinkShape::Straight, + polarity: None, + }) + } + + /// A flow valve at `(x, y)` with a two-point polyline whose endpoints attach + /// to `from_uid` and `to_uid` (a stock--flow--stock segment). The point + /// coordinates are irrelevant to `loop_compactness` (which uses node-box + /// centers, not flow points), so they are placed at the valve. + fn flow_between( + uid: i32, + name: &str, + x: f64, + y: f64, + from_uid: i32, + to_uid: i32, + ) -> ViewElement { + ViewElement::Flow(view_element::Flow { + name: name.to_string(), + uid, + x, + y, + label_side: LabelSide::Bottom, + points: vec![ + view_element::FlowPoint { + x, + y, + attached_to_uid: Some(from_uid), + }, + view_element::FlowPoint { + x, + y, + attached_to_uid: Some(to_uid), + }, + ], + compat: None, + label_compat: None, + }) + } + + fn make_view(elements: Vec<ViewElement>) -> datamodel::StockFlow { + datamodel::StockFlow { + name: None, + elements, + view_box: datamodel::Rect { + x: 0.0, + y: 0.0, + width: 1000.0, + height: 1000.0, + }, + zoom: 1.0, + use_lettered_polarity: false, + font: None, + sketch_compat: None, + } + } + + fn cfg() -> LayoutConfig { + LayoutConfig::default() + } + + /// Scale every coordinate of a view by `s` (element centers and any + /// flow/connector points). Used by the AC1.8 scale-invariance test.
+ fn scale_view(view: &datamodel::StockFlow, s: f64) -> datamodel::StockFlow { + let elements = view + .elements + .iter() + .map(|e| match e { + ViewElement::Aux(a) => ViewElement::Aux(view_element::Aux { + x: a.x * s, + y: a.y * s, + ..a.clone() + }), + ViewElement::Stock(st) => ViewElement::Stock(view_element::Stock { + x: st.x * s, + y: st.y * s, + ..st.clone() + }), + ViewElement::Flow(f) => ViewElement::Flow(view_element::Flow { + x: f.x * s, + y: f.y * s, + points: f + .points + .iter() + .map(|p| view_element::FlowPoint { + x: p.x * s, + y: p.y * s, + attached_to_uid: p.attached_to_uid, + }) + .collect(), + ..f.clone() + }), + ViewElement::Module(m) => ViewElement::Module(view_element::Module { + x: m.x * s, + y: m.y * s, + ..m.clone() + }), + ViewElement::Cloud(c) => ViewElement::Cloud(view_element::Cloud { + x: c.x * s, + y: c.y * s, + ..c.clone() + }), + ViewElement::Alias(a) => ViewElement::Alias(view_element::Alias { + x: a.x * s, + y: a.y * s, + ..a.clone() + }), + other => other.clone(), + }) + .collect(); + datamodel::StockFlow { + elements, + ..view.clone() + } + } + + // --- AC1.1: node_overlap equals known overlap / total node area --- + + #[test] + fn test_node_overlap_known_overlap_fraction() { + // Two stocks (45x35) whose centers are 20px apart horizontally and at + // the same y. node_overlap is computed on the bare SHAPE boxes (not the + // label-merged boxes), so the expected value comes from + // `stock_shape_bounds` and is normalized by the total SHAPE-box area. + let s1 = stock(1, "a", 100.0, 100.0); + let s2 = stock(2, "b", 120.0, 100.0); + let view = make_view(vec![s1.clone(), s2.clone()]); + + let m = compute_layout_metrics(&view, &cfg()); + + // Expected: compute directly from the two bare shape boxes the renderer + // draws (the rects, label-free). 
+ let b1 = node_shape_box(&s1).unwrap(); + let b2 = node_shape_box(&s2).unwrap(); + let expected_overlap = rect_overlap_area(&b1, &b2); + let expected_total = rect_area(&b1) + rect_area(&b2); + assert!(expected_overlap > 0.0, "fixture must actually overlap"); + let expected = expected_overlap / expected_total; + assert!( + (m.node_overlap - expected).abs() < 1e-9, + "node_overlap {} != expected {}", + m.node_overlap, + expected + ); + } + + #[test] + fn test_node_overlap_simple_hand_computed() { + // Two stocks with exactly one stock-width of horizontal center + // separation. node_overlap is a sum over the bare SHAPE boxes, so only + // the rects matter (labels are irrelevant to this term now). + let s1 = stock(1, "a", 0.0, 0.0); + let s2 = stock(2, "b", STOCK_WIDTH, 0.0); // centers exactly one width apart + let view = make_view(vec![s1, s2]); + let m = compute_layout_metrics(&view, &cfg()); + // Centers one full width apart -> the 45-wide shape boxes just touch in + // x (right edge of #1 at +22.5, left edge of #2 at +22.5): zero shape + // overlap. So node_overlap == 0. + assert_eq!(m.node_overlap, 0.0); + } + + // --- AC1.2: pairwise-disjoint nodes => node_overlap == 0 --- + + #[test] + fn test_node_overlap_disjoint_is_zero() { + let view = make_view(vec![ + stock(1, "a", 0.0, 0.0), + stock(2, "b", 500.0, 500.0), + aux(3, "c", 1000.0, 0.0), + ]); + let m = compute_layout_metrics(&view, &cfg()); + assert_eq!(m.node_overlap, 0.0); + } + + // node_overlap is computed on the bare SHAPE boxes, NOT the label-merged + // boxes. The user cares about node shapes overlapping other node shapes; + // a label landing on another node's shape (or another label) is the + // province of `label_overlap`. This test distinguishes the two regimes and + // would FAIL against the prior label-merged-box implementation. 
+ + #[test] + fn test_node_overlap_labels_overlap_shapes_disjoint_is_zero() { + // Two `LabelSide::Bottom` auxes named "samename" (8 chars), 40px apart + // horizontally at the same y -- the same fixture as the label_overlap + // double-count regression test: + // aux1 @ (0,0): shape [-9,9]x[-9,9], label [-29,29]x[13,27] + // aux2 @ (40,0): shape [31,49]x[-9,9], label [11,69]x[13,27] + // The SHAPE boxes are disjoint (9 < 31), so node_overlap == 0. The + // LABEL boxes overlap, but that collision belongs to label_overlap, not + // node_overlap. Under the old label-merged boxes node_overlap would be + // > 0 (the merged boxes [-29,29]x[-9,27] and [11,69]x[-9,27] overlap), + // so this assertion pins the new shape-only behavior. + let view = make_view(vec![ + aux(1, "samename", 0.0, 0.0), + aux(2, "samename", 40.0, 0.0), + ]); + let m = compute_layout_metrics(&view, &cfg()); + assert_eq!( + m.node_overlap, 0.0, + "node_overlap must ignore label-only overlap (shapes are disjoint)" + ); + // Sanity: the label collision IS captured by label_overlap, confirming + // the overlap was not simply lost. + assert!( + m.label_overlap > 0.0, + "the label-vs-label overlap must still be charged by label_overlap" + ); + } + + #[test] + fn test_node_overlap_shapes_overlap_is_positive() { + // Two stocks (45x35) whose centers are 20px apart horizontally and at + // the same y -- their bare SHAPE boxes overlap, so node_overlap > 0. + let view = make_view(vec![ + stock(1, "a", 100.0, 100.0), + stock(2, "b", 120.0, 100.0), + ]); + let m = compute_layout_metrics(&view, &cfg()); + assert!( + m.node_overlap > 0.0, + "overlapping node shapes must produce positive node_overlap" + ); + } + + // --- AC1.3: node_connector_overlap --- + + #[test] + fn test_node_connector_overlap_through_third_node() { + // Connector from aux #1 (far left) to aux #2 (far right), passing + // horizontally through a stock #3 sitting on the line at the middle. 
+ let a = aux(1, "a", 0.0, 0.0); + let b = aux(2, "b", 400.0, 0.0); + let mid = stock(3, "s", 200.0, 0.0); + let link = straight_link(10, 1, 2); + let view = make_view(vec![a, b, mid, link]); + + let m = compute_layout_metrics(&view, &cfg()); + assert!( + m.node_connector_overlap > 0.0, + "connector passing through a non-incident stock must contribute" + ); + + // Expected = clipped length inside the stock SHAPE box / total polyline + // len. node_connector_overlap charges against the bare shape box, not + // the label-merged box. (The connector is horizontal at y=0, so the + // clipped length happens to be identical to the label-merged box here; + // the SHAPE box is the contract regardless.) + let connectors = collect_connector_geometry(&view); + assert_eq!(connectors.len(), 1); + let c = &connectors[0]; + let stock_box = node_shape_box(&stock(3, "s", 200.0, 0.0)).unwrap(); + let mut inside = 0.0; + for seg in c.polyline.windows(2) { + inside += segment_length_in_rect(&seg[0], &seg[1], &stock_box); + } + let expected = inside / c.length; + assert!( + (m.node_connector_overlap - expected).abs() < 1e-9, + "got {} expected {}", + m.node_connector_overlap, + expected + ); + } + + #[test] + fn test_node_connector_overlap_avoids_all_is_zero() { + // Connector between two auxes with a third node well off the line. + let a = aux(1, "a", 0.0, 0.0); + let b = aux(2, "b", 400.0, 0.0); + let off = stock(3, "s", 200.0, 500.0); + let link = straight_link(10, 1, 2); + let view = make_view(vec![a, b, off, link]); + let m = compute_layout_metrics(&view, &cfg()); + assert_eq!(m.node_connector_overlap, 0.0); + } + + // node_connector_overlap charges a connector for the length it spends + // inside a non-incident node's bare SHAPE box, NOT its label-merged box. 
+ // The user reads a connector passing under a node SHAPE as a false causal + // connection (high priority); a connector passing only under a node's LABEL + // is mild noise (labels are semi-transparent, no connector starts/ends on a + // label) and must NOT be charged. These two tests pin that distinction; the + // first would FAIL against the prior label-merged-box implementation. + + #[test] + fn test_node_connector_overlap_under_label_only_is_zero() { + // Connector from aux #1 (0,0) to aux #2 (400,0): a horizontal line at + // y=0 (clipped to the 9px aux radii, so drawn x in [9, 391]). A + // non-incident `LabelSide::Bottom` stock #3 named "s" (1 char) is placed + // ABOVE the line so its SHAPE box clears y=0 but its label (which hangs + // BELOW the shape) reaches down across y=0: + // stock #3 @ (200,-25): + // shape box x [177.5, 222.5], y [-42.5, -7.5] (does NOT cross 0) + // label box x [192, 208], y [-3.5, 10.5] (DOES cross 0) + // The connector at y=0 passes through the label band but never enters + // the shape box, so node_connector_overlap == 0. Under the old + // label-merged box (which unions the label, y [-42.5, 10.5]) the line + // WOULD be charged, so this assertion is the load-bearing distinction. + let a = aux(1, "a", 0.0, 0.0); + let b = aux(2, "b", 400.0, 0.0); + let label_only = stock(3, "s", 200.0, -25.0); + let link = straight_link(10, 1, 2); + let view = make_view(vec![a, b, label_only, link]); + + // Confirm the fixture geometry is what we claim before asserting on the + // metric: shape box clears the line, merged box does not. 
+ let shape = node_shape_box(&stock(3, "s", 200.0, -25.0)).unwrap(); + let merged = node_box(&stock(3, "s", 200.0, -25.0)).unwrap(); + assert!( + shape.bottom < 0.0, + "shape box must clear the connector line (bottom {} < 0)", + shape.bottom + ); + assert!( + merged.bottom > 0.0, + "merged box must cross the connector line via the label (bottom {} > 0)", + merged.bottom + ); + + let m = compute_layout_metrics(&view, &cfg()); + assert_eq!( + m.node_connector_overlap, 0.0, + "a connector passing only under a node's LABEL must not be charged" + ); + } + + #[test] + fn test_node_connector_overlap_under_shape_is_positive() { + // Same connector, but the non-incident stock sits ON the line so the + // connector crosses its SHAPE box -- the false-causal-connection case + // the user cares about. node_connector_overlap > 0. + let a = aux(1, "a", 0.0, 0.0); + let b = aux(2, "b", 400.0, 0.0); + let on_line = stock(3, "s", 200.0, 0.0); + let link = straight_link(10, 1, 2); + let view = make_view(vec![a, b, on_line, link]); + let m = compute_layout_metrics(&view, &cfg()); + assert!( + m.node_connector_overlap > 0.0, + "a connector passing under a node SHAPE must be charged" + ); + } + + // node_connector_overlap is documented as a "fraction of total connector + // length", so it must count each physical sub-length of connector covered by + // ANY non-incident node shape box AT MOST ONCE. When two non-incident shape + // boxes overlap, the prior implementation summed the per-box clipped lengths, + // double-counting the connector segment that lies in the overlap region; the + // normalized value could then exceed 1.0 and over-inflate weighted_cost. The + // correct value is the UNION length covered by (box A OR box B) over the total + // connector length. These two tests pin the union contract. + + /// Length of segment p0->p1 covered by the UNION of `rects` (each physical + /// sub-length counted once). 
Independent reference implementation used by the + /// union tests: collect each rect's Liang-Barsky clip interval, merge, sum. + fn union_segment_length_in_rects(p0: &Point, p1: &Point, rects: &[Rect]) -> f64 { + let seg_len = { + let dx = p1.x - p0.x; + let dy = p1.y - p0.y; + (dx * dx + dy * dy).sqrt() + }; + if seg_len == 0.0 { + return 0.0; + } + let mut intervals: Vec<(f64, f64)> = Vec::new(); + for r in rects { + // Recover [t0, t1] from segment_length_in_rect's reported length: the + // tests use axis-aligned horizontal segments, so the clipped length is + // an exact multiple of seg_len. We instead build intervals from the + // covered length by reconstructing endpoints via the rect bounds for a + // horizontal segment at constant y (the only geometry these tests use). + let covered = segment_length_in_rect(p0, p1, r); + if covered <= 0.0 { + continue; + } + // For a horizontal segment (y constant) inside [left,right], the + // covered x-range is [max(min_x,left), min(max_x,right)]. Convert to t. + let (xa, xb) = (p0.x.min(p1.x), p0.x.max(p1.x)); + let lo_x = xa.max(r.left); + let hi_x = xb.min(r.right); + let span = p1.x - p0.x; + let t_lo = ((lo_x - p0.x) / span).clamp(0.0, 1.0); + let t_hi = ((hi_x - p0.x) / span).clamp(0.0, 1.0); + let (t0, t1) = if t_lo <= t_hi { + (t_lo, t_hi) + } else { + (t_hi, t_lo) + }; + intervals.push((t0, t1)); + } + intervals.sort_by(|a, b| a.0.partial_cmp(&b.0).unwrap()); + let mut total = 0.0; + let mut cur: Option<(f64, f64)> = None; + for (t0, t1) in intervals { + match cur { + None => cur = Some((t0, t1)), + Some((c0, c1)) => { + if t0 <= c1 { + cur = Some((c0, c1.max(t1))); + } else { + total += c1 - c0; + cur = Some((t0, t1)); + } + } + } + } + if let Some((c0, c1)) = cur { + total += c1 - c0; + } + total * seg_len + } + + #[test] + fn test_node_connector_overlap_union_of_overlapping_boxes() { + // A horizontal Link between aux #1 (0,0) and aux #2 (400,0) at y=0. 
Two + // NON-incident stocks straddle the line AND overlap each other: + // stock #3 @ (200,0): shape x [177.5, 222.5] + // stock #4 @ (210,0): shape x [187.5, 232.5] + // Their shape boxes overlap in x [187.5, 222.5]. The OLD code charged the + // connector for box A (length 45) PLUS box B (length 45) = 90, but the + // physical connector length under (A OR B) is the union x [177.5, 232.5] + // = 55. The new metric must equal union/total, and the old sum/total + // strictly exceeds it. + let a = aux(1, "a", 0.0, 0.0); + let b = aux(2, "b", 400.0, 0.0); + let s3 = stock(3, "s3", 200.0, 0.0); + let s4 = stock(4, "s4", 210.0, 0.0); + let link = straight_link(10, 1, 2); + let view = make_view(vec![a, b, s3.clone(), s4.clone(), link]); + + let m = compute_layout_metrics(&view, &cfg()); + + let connectors = collect_connector_geometry(&view); + assert_eq!(connectors.len(), 1); + let c = &connectors[0]; + let box3 = node_shape_box(&s3).unwrap(); + let box4 = node_shape_box(&s4).unwrap(); + + // Independent union reference and the old (double-counting) sum. + let mut union_len = 0.0; + let mut old_sum_len = 0.0; + for seg in c.polyline.windows(2) { + union_len += union_segment_length_in_rects(&seg[0], &seg[1], &[box3, box4]); + old_sum_len += segment_length_in_rect(&seg[0], &seg[1], &box3) + + segment_length_in_rect(&seg[0], &seg[1], &box4); + } + let expected = union_len / c.length; + let old_value = old_sum_len / c.length; + + // The fixture must actually overlap so the old sum strictly exceeds the + // union (otherwise the test proves nothing). 
+ assert!( + old_value > expected + 1e-9, + "fixture must double-count: old {old_value} should exceed union {expected}" + ); + assert!( + (m.node_connector_overlap - expected).abs() < 1e-9, + "node_connector_overlap must equal the union fraction: got {} expected {} \ + (old double-counted value was {})", + m.node_connector_overlap, + expected, + old_value + ); + assert!( + m.node_connector_overlap <= 1.0, + "node_connector_overlap is a fraction and must be <= 1.0, got {}", + m.node_connector_overlap + ); + } + + #[test] + fn test_node_connector_overlap_coincident_boxes_counted_once() { + // Starker variant: a connector sub-length fully inside TWO COINCIDENT + // non-incident boxes is counted ONCE, not twice. Two stocks at the same + // position (200,0) each fully contain the connector segment x [177.5, + // 222.5]. The OLD code would count that length twice (~2x); the union + // counts it once. We also build the fixture so the total connector length + // is small enough that the OLD value EXCEEDS 1.0 -- impossible for a + // documented fraction. Auxes are placed close in (x 180 and 220) so the + // drawn connector is short and lies entirely within the coincident boxes. 
+ let a = aux(1, "a", 180.0, 0.0); + let b = aux(2, "b", 220.0, 0.0); + let s3 = stock(3, "s3", 200.0, 0.0); + let s4 = stock(4, "s4", 200.0, 0.0); + let link = straight_link(10, 1, 2); + let view = make_view(vec![a, b, s3.clone(), s4.clone(), link]); + + let m = compute_layout_metrics(&view, &cfg()); + + let connectors = collect_connector_geometry(&view); + assert_eq!(connectors.len(), 1); + let c = &connectors[0]; + let box3 = node_shape_box(&s3).unwrap(); + let box4 = node_shape_box(&s4).unwrap(); + + let mut union_len = 0.0; + let mut old_sum_len = 0.0; + for seg in c.polyline.windows(2) { + union_len += union_segment_length_in_rects(&seg[0], &seg[1], &[box3, box4]); + old_sum_len += segment_length_in_rect(&seg[0], &seg[1], &box3) + + segment_length_in_rect(&seg[0], &seg[1], &box4); + } + let expected = union_len / c.length; + let old_value = old_sum_len / c.length; + + // With two coincident boxes both covering the whole drawn connector, the + // union fraction is 1.0 and the old value is ~2.0 (> 1.0, impossible for a + // fraction). 
+ assert!( + old_value > 1.0, + "coincident-box fixture must drive the OLD value above 1.0 (got {old_value})" + ); + assert!( + (expected - 1.0).abs() < 1e-9, + "union of two coincident boxes covering the whole connector is the full \ + length (fraction 1.0), got {expected}" + ); + assert!( + (m.node_connector_overlap - expected).abs() < 1e-9, + "coincident non-incident boxes must be counted once: got {} expected {} \ + (old double-counted value was {})", + m.node_connector_overlap, + expected, + old_value + ); + assert!( + m.node_connector_overlap <= 1.0 + 1e-9, + "node_connector_overlap is a fraction and must be <= 1.0, got {}", + m.node_connector_overlap + ); + } + + // --- AC1.4: label_overlap (per-label obscuration) --- + // + // label_overlap is the SUM over labeled elements of each label's obscured + // fraction: the area of the label box covered by any OTHER label box or any + // OTHER element's bare shape box, capped at the label's own area and divided + // by it (so each term is in [0,1]). 0 = no label obscured. A small overlap + // registers at its true per-label obscuration fraction rather than being + // diluted by the corpus's total label area (the old area/total definition's + // under-counting; see `test_label_overlap_small_clip_is_sensitive`). + + #[test] + fn test_label_overlap_overlapping_labels() { + // Two auxes at the same position -> their labels (Bottom) coincide + // exactly. Each label is fully covered by the other (capped at its own + // area), so each obscured fraction is 1.0 and the sum is 2.0. + let view = make_view(vec![ + aux(1, "samename", 100.0, 100.0), + aux(2, "samename", 100.0, 100.0), + ]); + let m = compute_layout_metrics(&view, &cfg()); + assert!( + (m.label_overlap - 2.0).abs() < 1e-9, + "two coincident labels are each fully obscured: expected 2.0, got {}", + m.label_overlap + ); + } + + #[test] + fn test_label_overlap_disjoint_is_zero() { + // Two auxes far apart -> no label is covered by anything. 
Sum of + // obscured fractions is 0.0. + let view = make_view(vec![aux(1, "a", 0.0, 0.0), aux(2, "b", 1000.0, 1000.0)]); + let m = compute_layout_metrics(&view, &cfg()); + assert_eq!(m.label_overlap, 0.0); + } + + #[test] + fn test_label_overlap_counts_label_pair_exactly_once() { + // The Phase-1 double-count guard, restated for per-label obscuration: a + // label is never charged against its OWN element's shape box, and a + // label-vs-label collision is counted from each label's own perspective + // (both labels are unreadable -- that is intended), not via the other + // node's label-merged bounds. + // + // Fixture: two `LabelSide::Bottom` auxes named "samename" (8 chars). + // AUX_RADIUS = 9; label editor width = 8*6 + 10 = 58, height = 14. + // With Bottom labels, label top = cy + 9 + LABEL_PADDING(4) = cy + 13, + // bottom = cy + 27, left = cx - 29, right = cx + 29. + // + // Place them 40px apart horizontally, same y: + // aux1 @ (0,0): shape [-9,9]x[-9,9], label [-29,29]x[13,27] + // aux2 @ (40,0): shape [31,49]x[-9,9], label [11,69]x[13,27] + // + // SHAPE boxes do NOT overlap (9 < 31), and each label clears the OTHER + // aux's bare shape box entirely (label y [13,27] vs shape y [-9,9]). The + // LABELS overlap by x:[11,29]=18, y:[13,27]=14 -> 252. Each label box has + // area 58*14 = 812 and is covered only by the other label (252 < 812, no + // cap), so each obscured fraction is 252/812 and the sum is 504/812. 
+ let view = make_view(vec![ + aux(1, "samename", 0.0, 0.0), + aux(2, "samename", 40.0, 0.0), + ]); + let m = compute_layout_metrics(&view, &cfg()); + + let label_area = 58.0 * 14.0; // 812.0 + let overlap = 18.0 * 14.0; // 252.0, the single label-label intersection + let expected = (overlap / label_area) + (overlap / label_area); // 504/812 + assert!( + (m.label_overlap - expected).abs() < 1e-9, + "per-label obscuration should sum each label's fraction once: got {} expected {}", + m.label_overlap, + expected + ); + } + + #[test] + fn test_label_overlap_never_charged_against_own_shape() { + // A single labeled aux: its Bottom label sits adjacent to (and partly + // within the merged bounds of) its OWN shape. A label is never charged + // against its own element's shape, and there is no other element, so the + // obscured fraction is 0 and label_overlap is exactly 0.0. + let view = make_view(vec![aux(1, "samename", 0.0, 0.0)]); + let m = compute_layout_metrics(&view, &cfg()); + assert_eq!( + m.label_overlap, 0.0, + "a label must never be charged against its own element's shape box" + ); + } + + #[test] + fn test_label_overlap_small_clip_is_sensitive() { + // A small node SHAPE clipping a few characters of a short label must + // register at its true per-label obscuration fraction, NOT be diluted to + // ~0 by the corpus's total label area (the old area/total under-count). + // + // L: aux "ab" (2 chars) @ (0,0), Bottom label. + // editor_width = 2*6 + 10 = 22, height 14 -> label area 308. + // label box: left -11, right 11, top 13, bottom 27. + // O: a cloud (no label) @ (18, 20). cloud_bounds (CLOUD_RADIUS 13.5): + // x [4.5, 31.5], y [6.5, 33.5]. + // Overlap with L's label: x [4.5,11]=6.5, y [13,27]=14 -> 91. + // obscured_fraction(L) = 91/308 ~= 0.2955; the cloud has no label, so + // the sum is exactly 91/308. + // Plus 15 far-apart auxes with long (20-char) labels: each label area + // 20*6+10 = 130 wide * 14 = 1820, none overlapping anything. 
They add + // nothing to the per-label SUM (obscured fraction 0 each) but bloat the + // OLD denominator (total label area), so the OLD area/total score for + // the same clip collapses to ~0.003 -- the under-count this fixes. + let mut elements = vec![aux(1, "ab", 0.0, 0.0), cloud(2, 18.0, 20.0)]; + for k in 0..15 { + // Far apart on a 1000px grid so nothing overlaps; 20-char names. + elements.push(aux( + 100 + k, + "abcdefghijklmnopqrst", + 3000.0 + f64::from(k) * 1000.0, + 3000.0, + )); + } + let view = make_view(elements); + let m = compute_layout_metrics(&view, &cfg()); + + let label_area = 22.0 * 14.0; // 308.0 + let clip_area = 6.5 * 14.0; // 91.0 + let expected = clip_area / label_area; // ~0.2955 + assert!( + (m.label_overlap - expected).abs() < 1e-9, + "small clip must score its per-label obscuration fraction: got {} expected {}", + m.label_overlap, + expected + ); + assert!( + m.label_overlap > 0.1, + "a readability-killing clip must register clearly (> 0.1), got {}", + m.label_overlap + ); + + // Confirm the OLD area/total definition would have under-counted this to + // near-zero: the same clip area divided by the corpus total label area. + let total_label_area = label_area + 15.0 * (130.0 * 14.0); // 308 + 27300 + let old_score = clip_area / total_label_area; // ~0.0033 + assert!( + old_score < 0.01, + "fixture must demonstrate the old under-count (< 0.01), got {}", + old_score + ); + assert!( + m.label_overlap > old_score * 50.0, + "new per-label score {} must be far larger than the old {}", + m.label_overlap, + old_score + ); + } + + // --- AC1.5: aspect_penalty --- + + #[test] + fn test_aspect_penalty_thin_box_positive() { + // Two auxes stacked far apart vertically and close horizontally -> the + // node bounding box is tall and thin (ar >> target), so penalty > 0. 
+ let view = make_view(vec![aux(1, "a", 0.0, 0.0), aux(2, "b", 0.0, 1000.0)]); + let m = compute_layout_metrics(&view, &cfg()); + assert!( + m.aspect_penalty > 0.0, + "a tall thin bbox must be penalized, got {}", + m.aspect_penalty + ); + + // Verify it equals exactly `ar - TARGET_AR_MAX` for the computed bbox. + let node_boxes: Vec<(i32, Rect)> = view + .elements + .iter() + .filter_map(|e| node_box(e).map(|r| (e.get_uid(), r))) + .collect(); + let bbox = view_bounding_box(&node_boxes).unwrap(); + let w = common::rect_width(&bbox); + let h = common::rect_height(&bbox); + let (long, short) = if w >= h { (w, h) } else { (h, w) }; + let expected = (long / short - TARGET_AR_MAX).max(0.0); + assert!((m.aspect_penalty - expected).abs() < 1e-9); + } + + #[test] + fn test_aspect_penalty_balanced_box_zero() { + // Four auxes placed so the bounding box is ~4:3 (well inside the 16:9 + // band) -> zero penalty. Width 400, height 300 between centers; the + // fixed node radii add a small symmetric margin that keeps ar < 16/9. + let view = make_view(vec![ + aux(1, "a", 0.0, 0.0), + aux(2, "b", 400.0, 0.0), + aux(3, "c", 0.0, 300.0), + aux(4, "d", 400.0, 300.0), + ]); + let m = compute_layout_metrics(&view, &cfg()); + + // Confirm the bbox aspect ratio really is inside the band for this + // fixture, then assert the penalty is exactly zero. 
+ let node_boxes: Vec<(i32, Rect)> = view + .elements + .iter() + .filter_map(|e| node_box(e).map(|r| (e.get_uid(), r))) + .collect(); + let bbox = view_bounding_box(&node_boxes).unwrap(); + let w = common::rect_width(&bbox); + let h = common::rect_height(&bbox); + let ar = w.max(h) / w.min(h); + assert!(ar <= TARGET_AR_MAX, "fixture bbox ar {} not in band", ar); + assert_eq!(m.aspect_penalty, 0.0); + } + + // --- AC1.6: weighted_cost is the exact linear combination --- + + #[test] + fn test_weighted_cost_exact_linear_combination() { + let m = LayoutMetrics { + node_overlap: 1.5, + node_connector_overlap: 2.0, + label_overlap: 0.5, + crossings: 3.0, + sprawl: 4.0, + edge_length_cv: 0.25, + aspect_penalty: 6.0, + chain_straightness: 7.0, + loop_compactness: 8.0, + }; + let w = MetricWeights { + node_overlap: 10.0, + node_connector_overlap: 20.0, + label_overlap: 30.0, + crossings: 40.0, + sprawl: 50.0, + edge_length_cv: 60.0, + aspect_penalty: 70.0, + chain_straightness: 80.0, + loop_compactness: 90.0, + }; + let expected = 1.5 * 10.0 + + 2.0 * 20.0 + + 0.5 * 30.0 + + 3.0 * 40.0 + + 4.0 * 50.0 + + 0.25 * 60.0 + + 6.0 * 70.0 + + 7.0 * 80.0 + + 8.0 * 90.0; + assert!((m.weighted_cost(&w) - expected).abs() < 1e-9); + } + + // --- AC5.1: the committed calibrated default expresses readability dominance --- + // + // The Phase-1 placeholder default was all-zeros (so a pre-calibration + // `weighted_cost` was inert). Phase 4 commits real, user-signed-off weights + // (2026-05-23), so the default is no longer all-zeros and `weighted_cost` + // under it is now meaningful. This test pins the DOMINANCE ORDERING the + // committed weights encode -- relationships rather than magic numbers, so it + // documents the intent and survives minor retuning -- and re-confirms that + // `weighted_cost` applies the default exactly as Σ wᵢ·termᵢ. It replaces the + // old "default is all-zeros so cost is inert" assertion, which is no longer + // true by design. 
+ + #[test] + fn test_default_weights_readability_dominant_ordering() { + let w = MetricWeights::default(); + + // The dominant "overlap + crossings" family: each term that hurts + // readability (shapes overlapping shapes, connectors under shapes, labels + // obscured, edges crossing) must outweigh every compactness/aspect term. + let dominant = [ + w.node_overlap, + w.node_connector_overlap, + w.label_overlap, + w.crossings, + ]; + let compactness = [w.sprawl, w.edge_length_cv, w.aspect_penalty]; + for &d in &dominant { + for &c in &compactness { + assert!( + d > c, + "every readability term ({d}) must strictly exceed every \ + compactness/aspect term ({c})" + ); + } + } + + // Compactness/aspect are intentionally zero: spreading out to keep labels + // legible and feedback loops visible is good, not penalized. + assert_eq!(w.sprawl, 0.0, "sprawl is not a goal"); + assert_eq!( + w.edge_length_cv, 0.0, + "edge-length uniformity is not a goal" + ); + assert_eq!(w.aspect_penalty, 0.0, "aspect ratio is not a goal"); + + // chain_straightness is reserved (not yet computed), so it carries no + // weight. + assert_eq!( + w.chain_straightness, 0.0, + "chain_straightness is reserved and must stay zero" + ); + + // loop_compactness rewards visible feedback-loop circles, but only as a + // gentle nudge: a low, non-dominant weight strictly between zero and the + // dominant family. + assert!( + w.loop_compactness > 0.0, + "loop_compactness should gently reward visible loops, got {}", + w.loop_compactness + ); + assert!( + w.loop_compactness < w.node_overlap, + "loop_compactness ({}) must stay below the dominant node_overlap ({})", + w.loop_compactness, + w.node_overlap + ); + + // `weighted_cost` under the default is still the exact linear combination + // (the default is now meaningful, not inert): verify against an explicit + // Σ wᵢ·termᵢ over a hand-set metrics value. 
+ let m = LayoutMetrics { + node_overlap: 0.3, + node_connector_overlap: 0.1, + label_overlap: 0.7, + crossings: 2.0, + sprawl: 5.0, + edge_length_cv: 0.4, + aspect_penalty: 1.5, + chain_straightness: 0.0, + loop_compactness: 0.8, + }; + let expected = m.node_overlap * w.node_overlap + + m.node_connector_overlap * w.node_connector_overlap + + m.label_overlap * w.label_overlap + + m.crossings * w.crossings + + m.sprawl * w.sprawl + + m.edge_length_cv * w.edge_length_cv + + m.aspect_penalty * w.aspect_penalty + + m.chain_straightness * w.chain_straightness + + m.loop_compactness * w.loop_compactness; + assert!( + (m.weighted_cost(&w) - expected).abs() < 1e-12, + "weighted_cost under the default must equal Σ wᵢ·termᵢ: got {} expected {}", + m.weighted_cost(&w), + expected + ); + } + + // --- AC1.7: empty / single-element views are all-zero and finite --- + + fn assert_all_finite(m: &LayoutMetrics) { + assert!(m.node_overlap.is_finite()); + assert!(m.node_connector_overlap.is_finite()); + assert!(m.label_overlap.is_finite()); + assert!(m.crossings.is_finite()); + assert!(m.sprawl.is_finite()); + assert!(m.edge_length_cv.is_finite()); + assert!(m.aspect_penalty.is_finite()); + assert!(m.chain_straightness.is_finite()); + assert!(m.loop_compactness.is_finite()); + } + + fn assert_all_zero(m: &LayoutMetrics) { + assert_eq!(m.node_overlap, 0.0); + assert_eq!(m.node_connector_overlap, 0.0); + assert_eq!(m.label_overlap, 0.0); + assert_eq!(m.crossings, 0.0); + assert_eq!(m.sprawl, 0.0); + assert_eq!(m.edge_length_cv, 0.0); + assert_eq!(m.aspect_penalty, 0.0); + assert_eq!(m.chain_straightness, 0.0); + assert_eq!(m.loop_compactness, 0.0); + } + + #[test] + fn test_empty_view_all_zero_finite() { + let view = make_view(vec![]); + let m = compute_layout_metrics(&view, &cfg()); + assert_all_finite(&m); + assert_all_zero(&m); + } + + #[test] + fn test_single_element_view_all_zero_finite() { + let view = make_view(vec![aux(1, "only", 100.0, 100.0)]); + let m = 
compute_layout_metrics(&view, &cfg()); + assert_all_finite(&m); + // A single node has no overlaps, no connectors, and a degenerate (zero + // short-side? no -- a real box) bounding box. Its aspect ratio is the + // single aux box's own ar, which for a square-ish aux box is ~1 (inside + // the band), so aspect_penalty is 0; all connector terms are 0. + assert_eq!(m.node_overlap, 0.0); + assert_eq!(m.node_connector_overlap, 0.0); + assert_eq!(m.crossings, 0.0); + assert_eq!(m.sprawl, 0.0); + assert_eq!(m.edge_length_cv, 0.0); + } + + // --- AC1.8 (scoped): scale invariance under uniform coordinate scaling --- + // + // SCOPING (correction to the AC1.8 plan note, 2026-05-22): the plan listed + // `node_connector_overlap`, `crossings`, `edge_length_cv`, and + // `aspect_penalty` as scale-free. After implementing the metric against the + // ACTUAL renderer geometry (the design's load-bearing invariant: metrics + // are computed on the same geometry the renderer draws), only `crossings` + // is exactly scale-invariant -- and even then only for crossings that lie + // INTERIOR to both connectors, away from the fixed-size node boundaries the + // polylines are clipped to (a crossing grazing a node boundary near a + // segment endpoint can flip; see the detailed note at the assertion below). + // This fixture's crossing is at the center of the square the two links form, + // squarely in that interior regime. The reason the other terms are not + // exactly invariant is the same fixed-pixel element geometry the plan + // already cites for node_overlap/label_overlap/sprawl, and it propagates + // further than the plan anticipated: + // + // * Connectors are clipped to fixed-radius element boundaries, so a + // straight link's drawn length is `s*center_dist - r_from - r_to` + // (AFFINE in `s`, not linear). 
Hence `edge_length_cv = stddev/mean` of + // those affine lengths is only ASYMPTOTICALLY invariant (the fixed + // offset shrinks relative to the scaled spread), not exactly. + // * `node_connector_overlap` divides an inside-fixed-box overlap length + // (which does NOT scale) by total connector length (which does), so it + // shrinks like ~1/s -- scale-SENSITIVE, like `sprawl`. + // * The view bounding box is `union(fixed boxes around scaled centers)`, + // so its width/height are each `s*span + fixed_box_size`; the aspect + // ratio is therefore only asymptotically invariant. + // + // The principled resolution keeps renderer-faithful geometry (the whole + // point of the phase) and accepts that only the topological `crossings` + // term is exactly scale-invariant. This test asserts that exactly, and + // additionally pins the documented scale-SENSITIVITY of + // `node_connector_overlap` (clean ~1/s) so the scoping is non-vacuous. The + // mismatch with the plan's term list is surfaced in the executor report and + // tracked for the calibration phase. + // + // The fixture has zero node-overlap and zero label-overlap so those + // scale-sensitive area terms are trivially 0 before and after scaling. + #[test] + fn test_scale_invariance_of_scale_free_terms() { + // A small connected, well-separated view: three auxes and two stocks, + // far enough apart that there is no node-overlap and no label-overlap, + // with two straight links (one of which passes through a non-incident + // node so node_connector_overlap is nonzero and meaningful). 
+ let view = make_view(vec![ + aux(1, "a", 0.0, 0.0), + aux(2, "b", 400.0, 0.0), + stock(3, "s", 200.0, 0.0), // on the a->b line: nonzero conn overlap + aux(4, "c", 0.0, 300.0), + stock(5, "t", 400.0, 320.0), + straight_link(10, 1, 2), // passes through stock #3 + straight_link(11, 4, 5), + ]); + + let base = compute_layout_metrics(&view, &cfg()); + // Sanity: the fixture must have zero node/label overlap (so the + // scale-sensitive area terms are trivially scale-equal) and a nonzero + // conn-overlap (so the documented scale-SENSITIVITY check is + // non-vacuous). + assert_eq!(base.node_overlap, 0.0, "fixture must have no node overlap"); + assert_eq!( + base.label_overlap, 0.0, + "fixture must have no label overlap" + ); + assert!( + base.node_connector_overlap > 0.0, + "fixture must have a connector through a non-incident node" + ); + + let s = 3.0; + let scaled = compute_layout_metrics(&scale_view(&view, s), &cfg()); + + // The one exactly scale-invariant term here: edge crossings. + // + // Crossings are NOT *universally* scale-invariant. A crossing is counted + // on the drawn polylines, which are clipped to the same fixed-pixel node + // boxes (the connector endpoints sit on element boundaries that do not + // scale). A crossing that merely grazes a node boundary near a segment + // endpoint can therefore appear or disappear under uniform scale. + // Crossings that lie comfortably INTERIOR to both connectors (away from + // those fixed-size boundaries) are exactly preserved, because the + // interior of each polyline is an exact affine image of itself under + // uniform scale and an intersection of two segments is invariant under a + // shared affine map. This fixture's crossing is at the center of the + // square the two links form -- maximally far from every node box -- so + // it is squarely in the scale-invariant interior regime and the count is + // preserved exactly. 
+ assert!( + (scaled.crossings - base.crossings).abs() < 1e-9, + "crossings not scale-invariant: {} vs {}", + scaled.crossings, + base.crossings + ); + + // Documented scale-SENSITIVITY of node_connector_overlap: with + // fixed-size node boxes, scaling the coordinates by `s` leaves the + // inside-box overlap length essentially unchanged (the box and the + // line's center crossing are fixed) while total connector length grows + // with `s`, so the ratio strictly DECREASES under up-scaling. (It does + // not drop by exactly 1/s because the denominator -- connector length + // clipped to fixed-radius element boundaries -- is affine in `s`, not + // linear; we assert the robust direction rather than a brittle factor.) + assert!( + scaled.node_connector_overlap < base.node_connector_overlap, + "node_connector_overlap should DROP under up-scaling (fixed boxes): \ + scaled {} should be < base {}", + scaled.node_connector_overlap, + base.node_connector_overlap + ); + } + + // --- Property test: node_overlap is symmetric under element shuffle --- + + proptest! { + #![proptest_config(ProptestConfig::with_cases(64))] + + /// node_overlap is a sum over unordered element pairs, so it must be + /// invariant under any permutation of the element list. + #[test] + fn prop_node_overlap_shuffle_invariant( + // four stocks at small integer-ish coordinates so some overlap and + // some don't; coordinates kept modest to stay fast. + xs in prop::collection::vec(-50.0f64..50.0, 4), + ys in prop::collection::vec(-50.0f64..50.0, 4), + perm in Just(vec![0usize, 1, 2, 3]).prop_shuffle(), + ) { + let elems: Vec<ViewElement> = (0..4) + .map(|i| stock(i as i32 + 1, "n", xs[i], ys[i])) + .collect(); + + let base = compute_layout_metrics(&make_view(elems.clone()), &cfg()); + + // `perm` is a random ordering of [0,1,2,3]; reorder accordingly.
+ let shuffled: Vec<ViewElement> = perm.iter().map(|&i| elems[i].clone()).collect(); + let other = compute_layout_metrics(&make_view(shuffled), &cfg()); + + prop_assert!( + (base.node_overlap - other.node_overlap).abs() < 1e-9, + "node_overlap changed under shuffle: {} vs {}", + base.node_overlap, + other.node_overlap + ); + } + } + + // --- loop_compactness (isoperimetric loop quality) --- + + /// The center of a node's bare shape box (which is symmetric about the + /// element position, so this is the element center). Mirrors the centers the + /// metric uses to build each loop polygon. + fn shape_center(e: &ViewElement) -> Point { + let r = node_shape_box(e).unwrap(); + Point { + x: (r.left + r.right) / 2.0, + y: (r.top + r.bottom) / 2.0, + } + } + + /// Hand-computed isoperimetric penalty `1 - Q` for a polygon over the given + /// centers in order (shoelace area, summed-edge perimeter, Q clamped to + /// [0,1]). The test's independent oracle for `loop_compactness`. + fn expected_loop_penalty(centers: &[Point]) -> f64 { + let n = centers.len(); + let mut area2 = 0.0; + let mut perim = 0.0; + for i in 0..n { + let a = centers[i]; + let b = centers[(i + 1) % n]; + area2 += a.x * b.y - b.x * a.y; + let dx = b.x - a.x; + let dy = b.y - a.y; + perim += (dx * dx + dy * dy).sqrt(); + } + let area = area2.abs() / 2.0; + let q = (4.0 * std::f64::consts::PI * area / (perim * perim)).clamp(0.0, 1.0); + 1.0 - q + } + + #[test] + fn test_loop_compactness_circle_loop_near_zero() { + // Eight stocks placed on a circle of radius 300, wired into a directed + // 8-cycle by links 1->2->...->8->1. A well-spread loop reads as a clean + // circle, so its isoperimetric quotient Q is close to 1 and the penalty + // (1 - Q) is small.
+ let n: i32 = 8; + let radius = 300.0; + let mut elements: Vec<ViewElement> = Vec::new(); + let mut centers: Vec<Point> = Vec::new(); + for i in 0..n { + let theta = 2.0 * std::f64::consts::PI * f64::from(i) / f64::from(n); + let x = radius * theta.cos(); + let y = radius * theta.sin(); + let e = stock(i + 1, "n", x, y); + centers.push(shape_center(&e)); + elements.push(e); + } + for i in 0..n { + let from = i + 1; + let to = (i + 1) % n + 1; + elements.push(straight_link(100 + i, from, to)); + } + let view = make_view(elements); + let m = compute_layout_metrics(&view, &cfg()); + + let expected = expected_loop_penalty(&centers); + assert!( + (m.loop_compactness - expected).abs() < 1e-9, + "loop_compactness {} != hand-computed penalty {}", + m.loop_compactness, + expected + ); + // A regular octagon's penalty is ~0.05 -- "near 0" (a clean circle). + assert!( + m.loop_compactness < 0.1, + "a well-spread circular loop should score near 0, got {}", + m.loop_compactness + ); + } + + #[test] + fn test_loop_compactness_collapsed_loop_higher() { + // The SAME directed 8-cycle, but the nodes are squished onto a nearly + // straight line (a collapsed/collinear loop). The polygon area shrinks + // toward zero while the perimeter stays large, so Q -> 0 and the penalty + // (1 - Q) -> 1: clearly higher than the circular placement. + let n: i32 = 8; + let mut elements: Vec<ViewElement> = Vec::new(); + let mut centers: Vec<Point> = Vec::new(); + for i in 0..n { + // Spread along x, with a tiny alternating y wobble so the polygon is + // non-degenerate (nonzero perimeter) but nearly collinear.
+ let x = f64::from(i) * 100.0; + let y = if i % 2 == 0 { 0.0 } else { 1.0 }; + let e = stock(i + 1, "n", x, y); + centers.push(shape_center(&e)); + elements.push(e); + } + for i in 0..n { + let from = i + 1; + let to = (i + 1) % n + 1; + elements.push(straight_link(100 + i, from, to)); + } + let view = make_view(elements); + let m = compute_layout_metrics(&view, &cfg()); + + let expected = expected_loop_penalty(&centers); + assert!( + (m.loop_compactness - expected).abs() < 1e-9, + "loop_compactness {} != hand-computed penalty {}", + m.loop_compactness, + expected + ); + // A nearly-collinear loop scores near 1 (squished). + assert!( + m.loop_compactness > 0.9, + "a collapsed/collinear loop should score near 1, got {}", + m.loop_compactness + ); + } + + #[test] + fn test_loop_compactness_no_cycle_is_zero() { + // A pure chain a -> b -> c (no feedback) has no directed cycle, so there + // is nothing to score: loop_compactness == 0.0. + let view = make_view(vec![ + aux(1, "a", 0.0, 0.0), + aux(2, "b", 200.0, 0.0), + aux(3, "c", 400.0, 0.0), + straight_link(10, 1, 2), + straight_link(11, 2, 3), + ]); + let m = compute_layout_metrics(&view, &cfg()); + assert_eq!(m.loop_compactness, 0.0); + } + + #[test] + fn test_loop_compactness_two_node_mutual_pair_is_zero() { + // A 2-node mutual pair (a -> b -> a) is a cycle, but two points form no + // polygon (fewer than 3 distinct nodes), so it contributes nothing. + let view = make_view(vec![ + aux(1, "a", 0.0, 0.0), + aux(2, "b", 200.0, 0.0), + straight_link(10, 1, 2), + straight_link(11, 2, 1), + ]); + let m = compute_layout_metrics(&view, &cfg()); + assert_eq!(m.loop_compactness, 0.0); + } + + #[test] + fn test_loop_compactness_flow_feedback_path_is_a_cycle() { + // A stock--flow--stock feedback path must enter the loop graph: stock #1 + // and stock #2 connected by flow #3 (so #1 -> #3 -> #2), plus a link + // #2 -> #1 closing the loop.
The cycle is {#1, #3, #2}: three distinct + // positioned nodes -> a real polygon -> a positive penalty. + let s1 = stock(1, "a", 0.0, 0.0); + let s2 = stock(2, "b", 300.0, 0.0); + let f = flow_between(3, "f", 150.0, 200.0, 1, 2); + let link = straight_link(10, 2, 1); + let view = make_view(vec![s1, s2, f, link]); + let m = compute_layout_metrics(&view, &cfg()); + assert!( + m.loop_compactness > 0.0, + "a stock--flow--stock feedback path must form a scored loop, got {}", + m.loop_compactness + ); + } + + /// A stock--flow--stock loop whose flow has an extra pipe point placed far + /// from the valve, plus a closing link. The flow valve sits at `valve`; an + /// interior pipe point at `bend` (between the two attached endpoints) bends + /// the drawn pipe. `loop_compactness` must score the loop on the flow's + /// VALVE (its visual center), NOT on `flow_shape_bounds`' pipe-extent bbox + /// center, so the result must depend only on `valve` -- never on `bend`. + fn bent_flow_loop_view(valve: Point, bend: Point) -> datamodel::StockFlow { + let s1 = stock(1, "a", 0.0, 0.0); + let s2 = stock(2, "b", 300.0, 0.0); + let f = ViewElement::Flow(view_element::Flow { + name: "f".to_string(), + uid: 3, + x: valve.x, + y: valve.y, + label_side: LabelSide::Bottom, + points: vec![ + view_element::FlowPoint { + x: 0.0, + y: 0.0, + attached_to_uid: Some(1), + }, + // An interior pipe point that bends the drawn pipe and stretches + // `flow_shape_bounds`' bbox, but is NOT the valve. 
+ view_element::FlowPoint { + x: bend.x, + y: bend.y, + attached_to_uid: None, + }, + view_element::FlowPoint { + x: 300.0, + y: 0.0, + attached_to_uid: Some(2), + }, + ], + compat: None, + label_compat: None, + }); + let link = straight_link(10, 2, 1); + make_view(vec![s1, s2, f, link]) + } + + #[test] + fn test_loop_compactness_scored_on_flow_valve_not_pipe_extent() { + // The loop vertex for a flow must be its VALVE (the renderer's visual + // center), not the center of `flow_shape_bounds` (which unions the valve + // box with every pipe point). Extending the pipe with a far interior + // point moves the pipe-extent bbox center but leaves the valve fixed, so + // `loop_compactness` -- which scores the feedback-loop polygon -- must be + // UNCHANGED. On the buggy (shape-box-midpoint) implementation it changes. + let valve = Point { x: 150.0, y: 200.0 }; + + // A pipe bend near the valve vs. one stretched far away. The valve is + // identical in both, so the loop polygon (stock--valve--stock) is too. + let near = compute_layout_metrics( + &bent_flow_loop_view(valve, Point { x: 150.0, y: 210.0 }), + &cfg(), + ); + let far = compute_layout_metrics( + &bent_flow_loop_view( + valve, + Point { + x: 150.0, + y: 2000.0, + }, + ), + &cfg(), + ); + + assert!( + near.loop_compactness > 0.0, + "fixture must form a real (positive-penalty) loop, got {}", + near.loop_compactness + ); + assert!( + (near.loop_compactness - far.loop_compactness).abs() < 1e-12, + "loop_compactness must score the flow VALVE, not the pipe-extent bbox \ + center: stretching the pipe changed it from {} to {}", + near.loop_compactness, + far.loop_compactness + ); + + // Non-vacuous guard: MOVING the valve (with the same pipe bend) DOES + // change the loop polygon, so the metric is not trivially constant. 
+ let moved_valve = compute_layout_metrics( + &bent_flow_loop_view(Point { x: 150.0, y: 400.0 }, Point { x: 150.0, y: 210.0 }), + &cfg(), + ); + assert!( + (near.loop_compactness - moved_valve.loop_compactness).abs() > 1e-9, + "moving the valve must change loop_compactness (test is not trivially \ + constant): {} vs {}", + near.loop_compactness, + moved_valve.loop_compactness + ); + } + + #[test] + fn test_loop_compactness_deterministic_under_shuffle() { + // loop_compactness is a mean over cycles, each computed from node-box + // centers in cycle order. It must be invariant to the order elements + // appear in the view's element list. + let n: i32 = 6; + let radius = 250.0; + let mut elements: Vec<ViewElement> = Vec::new(); + for i in 0..n { + let theta = 2.0 * std::f64::consts::PI * f64::from(i) / f64::from(n); + elements.push(stock( + i + 1, + "n", + radius * theta.cos(), + radius * theta.sin(), + )); + } + for i in 0..n { + let from = i + 1; + let to = (i + 1) % n + 1; + elements.push(straight_link(100 + i, from, to)); + } + let base = compute_layout_metrics(&make_view(elements.clone()), &cfg()); + + // Reverse the element order (links before nodes, nodes reversed); the + // graph and its cycles are unchanged. + let mut shuffled = elements.clone(); + shuffled.reverse(); + let other = compute_layout_metrics(&make_view(shuffled), &cfg()); + + assert!( + (base.loop_compactness - other.loop_compactness).abs() < 1e-12, + "loop_compactness changed under element shuffle: {} vs {}", + base.loop_compactness, + other.loop_compactness + ); + assert!(base.loop_compactness > 0.0); + } + + // --- AC5.2: human-vs-auto reference-pair ordering under the committed weights --- + // + // The committed `MetricWeights::default()` must agree with the user's visual + // taste: on the agreed reference pairs the SHIPPED, hand-authored ("human") + // layout must score a lower `weighted_cost` than a machine-generated + // ("auto") layout of the SAME model.
This is the objective validation of the + // calibration (Phase 4, AC5.2): if the metric and the weights did not agree + // with human taste on an obvious pair, the metric or the pair would be wrong. + // + // Construction (b) -- "human view vs generated layout" (design glossary): the + // four `default_projects` models each ship a hand-authored main view. We + // score that as-loaded view (human) and a fixed-seed `generate_layout_with_config` + // layout (auto) of the same model, and assert `human < auto`. + // + // Determinism + budget: layout is deterministic per seed (fix #633), so ONE + // fixed seed (not `generate_best_layout`'s multi-seed search) makes the test + // reproducible AND fast. The four default_projects are small (<= 42 + // elements), so a single layout generation each is well under the per-test + // budget. + // + // Anchors: reliability, fishbanks, population, dp(logistic-growth). These all + // flip the right way under the committed weights (verified during + // calibration). `sir` is deliberately NOT a human datamodel::Project { + let path = format!( + "{}/../../default_projects/{}/model.xmile", + env!("CARGO_MANIFEST_DIR"), + dir + ); + let file = + std::fs::File::open(&path).unwrap_or_else(|e| panic!("failed to open {path}: {e}")); + let mut reader = std::io::BufReader::new(file); + crate::compat::open_xmile(&mut reader) + .unwrap_or_else(|e| panic!("failed to parse {path}: {e:?}")) + } + + /// The model's as-loaded, hand-authored main `StockFlow` view (the "human" + /// reference). Panics if the model has no such view -- every chosen anchor + /// ships one, so its absence is a fixture regression. 
+ fn human_view(project: &datamodel::Project) -> datamodel::StockFlow { + let model = project + .get_model("main") + .expect("anchor model must have a 'main' model"); + match model.views.first() { + Some(datamodel::View::StockFlow(sf)) if !sf.elements.is_empty() => sf.clone(), + _ => panic!("anchor model must ship a non-empty hand-authored main view"), + } + } + + /// `weighted_cost` of the shipped human layout under the committed default + /// weights. + fn human_cost(project: &datamodel::Project) -> f64 { + let view = human_view(project); + compute_layout_metrics(&view, &LayoutConfig::default()) + .weighted_cost(&MetricWeights::default()) + } + + /// `weighted_cost` of a single fixed-seed generated layout under the committed + /// default weights. Deterministic per seed, so the score is reproducible. + fn auto_cost(project: &datamodel::Project) -> f64 { + let cfg = LayoutConfig { + annealing_random_seed: REF_PAIR_SEED, + ..LayoutConfig::default() + }; + let view = crate::layout::generate_layout_with_config(project, "main", cfg.clone(), None) + .expect("auto layout generation must succeed for the anchor model"); + compute_layout_metrics(&view, &cfg).weighted_cost(&MetricWeights::default()) + } + + /// Assert the human reference beats the auto layout for one anchor model, + /// naming the model and both costs on failure (so a calibration regression is + /// immediately legible). 
+ fn assert_human_beats_auto(dir: &str) { + let project = load_default_project(dir); + let human = human_cost(&project); + let auto = auto_cost(&project); + assert!( + human < auto, + "reference pair {dir}: expected human_cost ({human}) < auto_cost ({auto}) \ + under MetricWeights::default()" + ); + } + + #[test] + fn test_reference_pair_reliability_human_beats_auto() { + assert_human_beats_auto("reliability"); + } + + #[test] + fn test_reference_pair_fishbanks_human_beats_auto() { + assert_human_beats_auto("fishbanks"); + } + + // Population is a MARGINAL taste anchor: under the committed default weights + // its human cost (~0.0521) beats auto (~0.0533) by only ~2.3%, far thinner + // than the other anchors (reliability ~8.5%, fishbanks ~12%, + // logistic-growth ~58%). The layout is deterministic per seed, so the + // assertion is not flaky -- but if it ever fails it should be read as + // "population sits near the boundary" rather than necessarily a real metric + // regression. The robust signal lives in reliability/fishbanks/logistic-growth. + #[test] + fn test_reference_pair_population_human_beats_auto() { + assert_human_beats_auto("population"); + } + + #[test] + fn test_reference_pair_dp_logistic_growth_human_beats_auto() { + assert_human_beats_auto("logistic-growth"); + } + + #[test] + fn test_sir_auto_beats_reference_under_default_weights() { + // The documented NON-anchor: SIR's shipped reference obscures more labels + // than the auto layout, so the metric correctly prefers the auto. This + // pins that direction so the asymmetry (why SIR is excluded from the + // human = HashSet::new(); + // Iterate edges in a deterministic order. 
`new_edges` is a HashSet, so its + // iteration order is per-process random; since each newly-created link both + // allocates a sequential `uid` and is appended to `state.elements` in this + // loop, hash order would otherwise assign different uids / element ordering + // to the same logical link run-to-run (the incremental analogue of #633). + let mut sorted_new_edges: Vec<(i32, i32)> = new_edges.iter().copied().collect(); + sorted_new_edges.sort_unstable(); + // Add back preserved links (unchanged) and create new links - for &(from_uid, to_uid) in &new_edges { + for (from_uid, to_uid) in sorted_new_edges { if let Some(old_link) = old_links.get(&(from_uid, to_uid)) { // Preserved: keep the old link exactly as-is state.elements.push(old_link.clone()); consumed_old_links.insert((from_uid, to_uid)); - } else if let Some((&key, old_link)) = old_links.iter().find(|&(&(of, ot), _)| { - if consumed_old_links.contains(&(of, ot)) { - return false; - } - let rf = alias_to_primary.get(&of).copied().unwrap_or(of); - let rt = alias_to_primary.get(&ot).copied().unwrap_or(ot); - rf == from_uid && rt == to_uid - }) { + } else if let Some(key) = old_links + .keys() + .copied() + .filter(|&(of, ot)| { + if consumed_old_links.contains(&(of, ot)) { + return false; + } + let rf = alias_to_primary.get(&of).copied().unwrap_or(of); + let rt = alias_to_primary.get(&ot).copied().unwrap_or(ot); + rf == from_uid && rt == to_uid + }) + // Pick the lowest matching key so the alias-match selection is + // deterministic; HashMap iteration order would otherwise vary. + .min() + { // Preserved via alias: the old link targets an alias whose primary // variable matches this dependency edge. Keep the alias link as-is. 
- state.elements.push(old_link.clone()); + state.elements.push(old_links[&key].clone()); consumed_old_links.insert(key); } else if let Some((from_ident, to_ident)) = new_edge_idents.get(&(from_uid, to_uid)) { // Added: create new link with default shape @@ -1170,14 +1191,19 @@ pub fn diff_connectors(state: &mut LayoutState, metadata: &ComputedMetadata) { // match a valid dependency. Imported views may have multiple rendered // connectors for the same dependency (e.g., links to two different // aliases of the same variable). - for (&(of, ot), old_link) in &old_links { + // Iterate in a deterministic order for the same reason as the new-edge loop: + // the preserved links are appended to `state.elements`, so HashMap iteration + // order would otherwise perturb element ordering run-to-run. + let mut sorted_old_links: Vec<&(i32, i32)> = old_links.keys().collect(); + sorted_old_links.sort_unstable(); + for &(of, ot) in sorted_old_links { if consumed_old_links.contains(&(of, ot)) { continue; } let rf = alias_to_primary.get(&of).copied().unwrap_or(of); let rt = alias_to_primary.get(&ot).copied().unwrap_or(ot); if new_edges.contains(&(rf, rt)) { - state.elements.push(old_link.clone()); + state.elements.push(old_links[&(of, ot)].clone()); } } } @@ -2454,7 +2480,16 @@ fn run_sfdp_with_rigid_chains( let mut center_y = config.start_y; let mut count = 0; - for (var_ident, node_id) in var_to_node { + // `var_to_node` is a HashMap, so its iteration order is per-process random. + // Two loops below are order-sensitive: the centroid accumulation sums floats + // (non-associative, so hash order perturbs the result) and the aux-placement + // loop assigns each unpositioned aux a polar seed angle by its iteration rank. + // Materialize a deterministic sorted view and iterate THAT in both loops so a + // fixed (model, seed) yields a bit-identical layout across repeated calls (#633). 
+ let mut entries: Vec<(&String, &String)> = var_to_node.iter().collect(); + entries.sort(); + + for &(var_ident, node_id) in &entries { if let Some(uid) = state.uid_manager.get_uid(var_ident) && let Some(&pos) = state.positions.get(&uid) { @@ -2489,7 +2524,7 @@ fn run_sfdp_with_rigid_chains( } let mut aux_index = 0; - for node_id in var_to_node.values() { + for &(_var_ident, node_id) in &entries { if initial_layout.contains_key(node_id) { continue; } @@ -4327,67 +4362,217 @@ fn detect_chains( chains } -/// Count edge crossings in a completed StockFlow view. +/// Whether `p` lies on the segment from flow point `a` to flow point `b`, +/// within a small pixel tolerance. Used to find the pipe segment a flow's valve +/// sits on so the valve can be injected as a shared `elem_{flow.uid}` vertex. /// -/// Arc and multi-point link shapes are approximated as straight segments -/// from source to target position, so counts for diagrams with curved -/// connectors are approximate. -pub fn count_view_crossings(view: &datamodel::StockFlow) -> usize { +/// The perpendicular distance from `p` to the line must be tiny, and `p` must +/// project within the segment (parameter in `[0, 1]`). A degenerate segment +/// (`a == b`) only matches when `p` coincides with it. +fn point_on_segment( + p: Position, + a: &datamodel::view_element::FlowPoint, + b: &datamodel::view_element::FlowPoint, +) -> bool { + const TOL: f64 = 0.5; // pixels + let a = Position::new(a.x, a.y); + let b = Position::new(b.x, b.y); + let ab = b - a; + let ap = p - a; + let len_sq = ab.dot(ab); + if len_sq < f64::EPSILON { + // Degenerate segment: only "on" it if p coincides with the point. + return ap.dot(ap) < TOL * TOL; + } + // Project p onto the line; require it to fall within the segment. + let t = ap.dot(ab) / len_sq; + if !(0.0..=1.0).contains(&t) { + return false; + } + // Perpendicular distance: |ap x ab| / |ab|. 
+ let perp = ap.cross_2d(ab).abs() / len_sq.sqrt(); + perp < TOL +} + +/// Build the set of [`LineSegment`]s that crossing detection runs over for a +/// completed StockFlow view. This is the single source of geometry shared by +/// [`count_view_crossings`] and the layout quality metric, so a layout's +/// crossing score can never disagree with the geometry the renderer draws. +/// +/// Connector geometry comes from [`crate::diagram::connector::connector_polyline`], +/// the exact polyline the SVG renderer draws: straight links are clipped to +/// element boundaries, arcs are sampled along their arc circle, and MultiPoint +/// links contribute nothing (the renderer draws nothing for them today). +/// +/// Element endpoints are resolved over *all* element kinds, so a link incident +/// on a Module or Alias is no longer dropped (the previous chord-based code +/// only mapped Stock/Flow/Aux/Cloud, silently undercounting such crossings). +/// +/// Node naming suppresses self- and shared-endpoint "crossings" exactly like +/// before: a connector's first vertex is `elem_{from_uid}` and its last is +/// `elem_{to_uid}` (so two connectors sharing an element endpoint never count), +/// while internal arc-sample vertices are `link_{link.uid}#{i}` (so the +/// consecutive segments of one arc share an internal node name and never count +/// as self-crossings). +/// +/// A flow's pipe vertices share those same `elem_{uid}` names with whatever +/// element they connect to, so a link incident on the flow grazes but does not +/// "cross" the pipe at the shared connection point. A point attached to a +/// stock/cloud is named `elem_{attached_to_uid}` (matching a link whose +/// endpoint is that stock/cloud), and the flow's valve -- which sits on the +/// pipe, not necessarily at a stored point -- is injected as an extra vertex +/// named `elem_{flow.uid}` so a link incident on the valve (its `to_uid`/ +/// `from_uid` is the flow's own element uid) is suppressed there too. 
A +/// genuinely free interior point (no attachment, not the valve) keeps the +/// historic per-flow `flow_{uid}#{i}` name, so a link that crosses the pipe +/// mid-span -- sharing no element with the flow -- is still counted. +fn build_view_segments(view: &datamodel::StockFlow) -> Vec<LineSegment> { if view.elements.is_empty() { - return 0; + return Vec::new(); } - let mut uid_positions: HashMap<i32, Position> = HashMap::new(); + // Resolve every element by uid so a link can find its endpoints regardless + // of the endpoint's kind (Module/Alias included). + let mut uid_elements: HashMap<i32, &ViewElement> = HashMap::new(); for elem in &view.elements { - match elem { - ViewElement::Stock(s) => { - uid_positions.insert(s.uid, Position::new(s.x, s.y)); - } - ViewElement::Flow(f) => { - uid_positions.insert(f.uid, Position::new(f.x, f.y)); - } - ViewElement::Aux(a) => { - uid_positions.insert(a.uid, Position::new(a.x, a.y)); - } - ViewElement::Cloud(c) => { - uid_positions.insert(c.uid, Position::new(c.x, c.y)); - } - _ => {} - } + uid_elements.insert(elem.get_uid(), elem); } + // Crossing detection is center-based and deterministic; no element is + // treated as arrayed (matching the historic behavior). 
+ let not_arrayed = |_: &str| false; + let mut segments: Vec<LineSegment> = Vec::new(); for elem in &view.elements { match elem { ViewElement::Link(link) => { - if let (Some(&from_pos), Some(&to_pos)) = ( - uid_positions.get(&link.from_uid), - uid_positions.get(&link.to_uid), - ) { + let (Some(&from), Some(&to)) = ( + uid_elements.get(&link.from_uid), + uid_elements.get(&link.to_uid), + ) else { + continue; // an endpoint is genuinely missing + }; + + let polyline = crate::diagram::connector::connector_polyline( + link, + from, + to, + &not_arrayed, + crate::diagram::connector::ARC_POLYLINE_SAMPLES, + ); + if polyline.len() < 2 { + continue; // MultiPoint / degenerate: nothing drawn + } + + let last_idx = polyline.len() - 1; + // Name the first vertex after the source element and the last + // after the target element so two connectors sharing an element + // endpoint are suppressed; name internal vertices per-link so a + // connector never crosses itself. + let vertex_name = |i: usize| -> String { + if i == 0 { + format!("elem_{}", link.from_uid) + } else if i == last_idx { + format!("elem_{}", link.to_uid) + } else { + format!("link_{}#{}", link.uid, i) + } + }; + + for i in 0..last_idx { + let a = polyline[i]; + let b = polyline[i + 1]; segments.push(LineSegment { - start: from_pos, - end: to_pos, - from_node: format!("elem_{}", link.from_uid), - to_node: format!("elem_{}", link.to_uid), + start: Position::new(a.x, a.y), + end: Position::new(b.x, b.y), + from_node: vertex_name(i), + to_node: vertex_name(i + 1), }); } } ViewElement::Flow(flow) => { - for i in 0..flow.points.len().saturating_sub(1) { - segments.push(LineSegment { - start: Position::new(flow.points[i].x, flow.points[i].y), - end: Position::new(flow.points[i + 1].x, flow.points[i + 1].y), - from_node: format!("flow_{}#{}", flow.uid, i), - to_node: format!("flow_{}#{}", flow.uid, i + 1), - }); + if flow.points.len() < 2 { + continue; + } + + // Build the pipe as a sequence of named vertices. 
A point + // attached to a stock/cloud shares that element's `elem_{uid}` + // name; a free interior point keeps a per-flow `flow_{uid}#{i}` + // name. The valve (the flow's own element, at `flow.x/flow.y`) + // is injected as an `elem_{flow.uid}` vertex on the pipe segment + // whose span contains it, so a link incident on the valve is + // suppressed at that shared connection point. Consecutive + // segments of one flow always share the joining vertex name, so + // a flow never self-crosses. + let point_name = |i: usize| -> String { + match flow.points[i].attached_to_uid { + Some(uid) => format!("elem_{uid}"), + None => format!("flow_{}#{}", flow.uid, i), + } + }; + + let valve = Position::new(flow.x, flow.y); + let valve_name = format!("elem_{}", flow.uid); + // The pipe segment the valve sits strictly interior to. `None` + // when the valve coincides with a stored point or (in a + // hand-edited view) drifted off the polyline; the pipe is then + // not split and the existing point names hold. + let valve_seg = (0..flow.points.len() - 1).find(|&i| { + let a = Position::new(flow.points[i].x, flow.points[i].y); + let b = Position::new(flow.points[i + 1].x, flow.points[i + 1].y); + valve != a + && valve != b + && point_on_segment(valve, &flow.points[i], &flow.points[i + 1]) + }); + + for i in 0..flow.points.len() - 1 { + let a = Position::new(flow.points[i].x, flow.points[i].y); + let b = Position::new(flow.points[i + 1].x, flow.points[i + 1].y); + let a_name = point_name(i); + let b_name = point_name(i + 1); + + if Some(i) == valve_seg { + // Split this pipe segment at the valve so both halves + // share the `elem_{flow.uid}` vertex. 
+ segments.push(LineSegment { + start: a, + end: valve, + from_node: a_name, + to_node: valve_name.clone(), + }); + segments.push(LineSegment { + start: valve, + end: b, + from_node: valve_name.clone(), + to_node: b_name, + }); + } else { + segments.push(LineSegment { + start: a, + end: b, + from_node: a_name, + to_node: b_name, + }); + } } } _ => {} } } - annealing::count_crossings(&segments) + segments +} + +/// Count edge crossings in a completed StockFlow view. +/// +/// Crossings are counted on the connectors' sampled drawn polylines: straight +/// links clipped to element boundaries, arcs sampled along their arc circle, +/// and flow pipes as their point polylines. All element endpoints are resolved +/// (Module/Alias included), so the count reflects the geometry the renderer +/// actually draws rather than a straight chord approximation. +pub fn count_view_crossings(view: &datamodel::StockFlow) -> usize { + annealing::count_crossings(&build_view_segments(view)) } /// Assemble a [`datamodel::StockFlow`] from finalized layout state, copying @@ -4432,7 +4617,12 @@ fn build_stock_flow_from_state( /// Seeds for parallel layout generation. Each seed produces a different SFDP /// layout; the one with fewest connector crossings is selected. -const LAYOUT_SEEDS: [u64; 4] = [42, 123, 456, 789]; +/// +/// These are also the layout-quality sweep's best-of-k production proxy: the +/// `layout_eval` example scores the best layout over exactly this seed set to +/// estimate what production (which picks best-of-`LAYOUT_SEEDS`) would ship, +/// so it is exposed publicly. The value and behavior are unchanged. 
+pub const LAYOUT_SEEDS: [u64; 4] = [42, 123, 456, 789]; /// Apply a model patch incrementally to an existing diagram view, /// preserving existing element positions and only placing new or @@ -5078,8 +5268,10 @@ pub fn generate_layout_with_config( fresh_layout(model, &metadata, &config) } -/// Generate multiple layouts with different seeds in parallel and pick the -/// one with fewest crossings. On tie, the lowest seed wins. +/// Generate multiple layouts with different seeds in parallel and pick the one +/// that minimizes the full calibrated layout-quality metric (`weighted_cost`, +/// which includes the accurate connector-crossing count alongside node/label +/// overlap and loop compactness). On tie, the lowest seed wins. pub fn generate_best_layout( project: &datamodel::Project, model_name: &str, @@ -5095,10 +5287,14 @@ pub fn generate_best_layout( let mut cfg = config.clone(); cfg.annealing_random_seed = seed; let view = fresh_layout(model, &metadata, &cfg)?; - let crossings = count_view_crossings(&view); + // Score the candidate with the full calibrated metric. Its `crossings` + // term computes the accurate connector-crossing count internally, so we + // no longer call `count_view_crossings` directly here. + let metrics = metrics::compute_layout_metrics(&view, &cfg); + let weighted_cost = metrics.weighted_cost(&metrics::MetricWeights::default()); Ok(LayoutResult { view, - crossings, + weighted_cost, seed, }) }; @@ -5128,7 +5324,12 @@ pub fn compute_layout_metadata( compute_metadata(project, model_name, db_state) } -/// Pick the layout with fewest crossings; on tie, the one from the lowest seed. +/// Pick the layout that minimizes the full calibrated layout-quality metric +/// (`weighted_cost`); on tie, the one from the lowest seed. NaN-cost candidates +/// (degenerate layouts) never win over a finite one regardless of position in +/// the result set; if ALL candidates are NaN the earliest is kept +/// deterministically. 
The first `Err` short-circuits, and an empty result set is +/// an error. fn select_best_layout( results: Vec>, ) -> Result { @@ -5139,13 +5340,24 @@ fn select_best_layout( best = Some(match best { None => lr, Some(prev) => { - if lr.crossings < prev.crossings - || (lr.crossings == prev.crossings && lr.seed < prev.seed) - { - lr + // NaN-safe and order-independent: a degenerate NaN-cost + // candidate never wins over a finite one regardless of which + // came first. A plain `<` already drops a NaN *challenger* + // (`NaN < finite` is false), but it would NOT let a finite + // challenger overtake a NaN *running best* (`finite < NaN` and + // `finite == NaN` are both false), so the first seed's NaN would + // be sticky. The explicit NaN branches fix that asymmetry. If + // ALL candidates are NaN the challenger is never better, so the + // earliest is kept -- deterministic regardless. + let better = if lr.weighted_cost.is_nan() { + false // a NaN challenger never wins + } else if prev.weighted_cost.is_nan() { + true // a finite challenger always beats a NaN running best } else { - prev - } + lr.weighted_cost < prev.weighted_cost + || (lr.weighted_cost == prev.weighted_cost && lr.seed < prev.seed) + }; + if better { lr } else { prev } } }); } @@ -5157,3 +5369,11 @@ fn select_best_layout( #[cfg(test)] #[path = "layout_tests.rs"] mod tests; + +#[cfg(test)] +#[path = "crossings_tests.rs"] +mod crossings_tests; + +#[cfg(test)] +#[path = "layout_selection_tests.rs"] +mod layout_selection_tests; diff --git a/src/simlin-engine/tests/layout.rs b/src/simlin-engine/tests/layout.rs index fb6b4a1a0..4373cab67 100644 --- a/src/simlin-engine/tests/layout.rs +++ b/src/simlin-engine/tests/layout.rs @@ -2223,3 +2223,113 @@ fn test_incremental_add_chain_rebuilds_existing_cloud_flow() { "chain_flow and waste_flow should not overlap after incremental add (dist={dist})" ); } + +/// Count how many elements differ between two views generated for the same +/// model. 
Element ordering is structurally stable (see +/// `test_layout_structural_consistency`), so a positional comparison can be +/// done index-by-index; `ViewElement` derives `PartialEq` over its f64 +/// coordinates (and flow `points`), giving an exact byte-for-byte comparison. +/// Returns `(differing, total)`. +fn count_layout_differences( + a: &simlin_engine::datamodel::StockFlow, + b: &simlin_engine::datamodel::StockFlow, +) -> (usize, usize) { + assert_eq!( + a.elements.len(), + b.elements.len(), + "layouts must have the same number of elements to compare" + ); + let differing = a + .elements + .iter() + .zip(b.elements.iter()) + .filter(|(ea, eb)| ea != eb) + .count(); + (differing, a.elements.len()) +} + +/// A layout produced for a fixed (model, annealing_random_seed) must be +/// bit-identical across repeated serial calls in one process (issue #633). +/// The RNG is already seeded deterministically; the only remaining source of +/// run-to-run drift was per-instance-random `HashMap` iteration order inside +/// `run_sfdp_with_rigid_chains` (centroid float accumulation and aux initial +/// placement). SIR has auxiliaries, so it exercises the aux-placement loop. 
+#[test] +fn test_layout_deterministic_per_seed() { + let project = load_project("test/test-models/samples/SIR/SIR.stmx"); + + let config = LayoutConfig { + annealing_random_seed: 42, + ..Default::default() + }; + + let view1 = generate_layout_with_config(&project, MAIN_MODEL, config.clone(), None) + .expect("first layout should succeed"); + let view2 = generate_layout_with_config(&project, MAIN_MODEL, config, None) + .expect("second layout should succeed"); + + let (differing, total) = count_layout_differences(&view1, &view2); + assert_eq!( + differing, 0, + "layout for a fixed seed must be deterministic: {differing}/{total} elements differ \ + between two serial calls" + ); +} + +/// The incremental layout path (`incremental_layout` -> +/// `compute_new_element_positions`) must also be deterministic for a fixed +/// model + patch. This guards against the same class of HashMap-iteration +/// nondeterminism in the incremental code paths. +#[test] +fn test_incremental_layout_deterministic() { + use simlin_engine::datamodel; + use simlin_engine::layout::incremental_layout; + use simlin_engine::{ModelOperation, ModelPatch}; + + let project = load_project("test/test-models/samples/SIR/SIR.stmx"); + let old_view = + generate_layout(&project, MAIN_MODEL, None).expect("initial layout should succeed"); + + let mut patched_project = project.clone(); + let model = patched_project.get_model_mut(MAIN_MODEL).unwrap(); + model + .variables + .push(datamodel::Variable::Aux(datamodel::Aux { + ident: "vaccination_rate".to_string(), + equation: datamodel::Equation::Scalar("susceptible * 0.01".to_string()), + documentation: String::new(), + units: None, + gf: None, + ai_state: None, + uid: None, + compat: Default::default(), + })); + + let make_patch = || ModelPatch { + name: String::new(), + ops: vec![ModelOperation::UpsertAux(datamodel::Aux { + ident: "vaccination_rate".to_string(), + equation: datamodel::Equation::Scalar("susceptible * 0.01".to_string()), + documentation: 
String::new(), + units: None, + gf: None, + ai_state: None, + uid: None, + compat: Default::default(), + })], + }; + + let new_view1 = + incremental_layout(&old_view, &patched_project, MAIN_MODEL, &make_patch(), None) + .expect("first incremental layout should succeed"); + let new_view2 = + incremental_layout(&old_view, &patched_project, MAIN_MODEL, &make_patch(), None) + .expect("second incremental layout should succeed"); + + let (differing, total) = count_layout_differences(&new_view1, &new_view2); + assert_eq!( + differing, 0, + "incremental layout must be deterministic: {differing}/{total} elements differ \ + between two serial calls" + ); +}
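
The determinism tests above guard against `HashMap` iteration order leaking into layout output. As a standalone illustration of that pattern (not the engine's actual code), the sketch below materializes a map's entries in sorted-key order before two order-sensitive steps: a float accumulation and a rank-based polar angle assignment. The helper names `sorted_entries`, `mean_x`, and `seed_angles` are hypothetical and exist only for this example; the real fix lives in `run_sfdp_with_rigid_chains` and may differ in detail.

```rust
use std::collections::HashMap;

/// Hypothetical helper: collect a map's entries and sort them by key, so
/// callers iterate in a stable order instead of hash order.
fn sorted_entries<V>(map: &HashMap<String, V>) -> Vec<(&String, &V)> {
    let mut entries: Vec<(&String, &V)> = map.iter().collect();
    entries.sort_by(|a, b| a.0.cmp(b.0));
    entries
}

/// Mean of the x coordinates, summed in sorted-key order. Float addition is
/// not associative, so a fixed summation order keeps the result reproducible.
/// Returns `None` for an empty map.
fn mean_x(positions: &HashMap<String, f64>) -> Option<f64> {
    if positions.is_empty() {
        return None;
    }
    let sum: f64 = sorted_entries(positions).iter().map(|(_, x)| **x).sum();
    Some(sum / positions.len() as f64)
}

/// Assign each key a polar angle from its rank in sorted-key order, so the
/// same key always receives the same angle regardless of hash order.
fn seed_angles(positions: &HashMap<String, f64>) -> Vec<(String, f64)> {
    let n = positions.len() as f64;
    sorted_entries(positions)
        .iter()
        .enumerate()
        .map(|(rank, (name, _))| {
            let theta = 2.0 * std::f64::consts::PI * rank as f64 / n;
            ((*name).clone(), theta)
        })
        .collect()
}
```

Sorting a borrowed view costs O(n log n) per call but leaves the map itself untouched, which is presumably why the diff sorts at the iteration site rather than switching container types; a `BTreeMap` would be an alternative if every consumer needed ordered iteration.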