release: v0.3.0 - Never a Silent Wrong Number by ebootheee · Pull Request #65 · ebootheee/excel-to-engine

ebootheee · 2026-06-10T01:17:34Z

v0.3.0 — Never a Silent Wrong Number

The first tagged release of excel-to-engine. v0.2.0 was about access — getting a
6M-cell Excel model into a form a CLI (and an AI assistant) can navigate. v0.3.0 is
about trust: a correctness campaign against a real ~6M-cell PE platform model,
closing nine root-caused defect classes across the parser, the transpiler, and the
convergence machinery — each pinned by a negative-controlled regression that
reproduces the bug on the old code before the fix counts.

Headline: the parser was corrupting 30% of a real model

The campaign's centerpiece (#57 → PR #62): calamine 0.26's shared-formula
expansion was $-blind. When an .xlsx stores one master formula and fills it
across a range, anchor semantics must be respected on re-expansion — instead,
$AO17 (column-absolute) was shifted as if relative and L$7 (row-anchored) was
frozen as if absolute. Plain-relative and fully-absolute references survived by
accident, which is why 13/23 named outputs looked fine while exactly the
mixed-anchor financial idioms (AVERAGEIFS/SUMIFS windows over date axes, ratio
rows against pinned denominators) collapsed. Blast radius: 1,745,461 of 4.69M
shared-formula member cells — 30% of the model — corrupted since the project
began, with the visible symptom (a zeroed carry waterfall) three sheets downstream
of the cause.

Found by the warm-ground-truth recompute diagnostic: seed every cell with
Excel's own answer, run one compute pass, diff every write. A faithful formula
must reproduce ground truth from a warm seed, so every divergence is a transpiler
defect by definition. The corruption announced itself as a column-shift signature
(got(Y17) == GT(N17) — each cell holding the value from exactly the fill offset
away).
Missed by every prior test because SheetJS — the library generating the
synthetic fixtures — never writes shared formulas. The whole bug class was
structurally invisible to the suite. The new regression
(test-shared-formula-anchors.mjs) hand-zips an .xlsx with real
<f t="shared"> groups and was verified RED on calamine 0.26 with the predicted
wrong values before the upgrade.
Fixed by calamine 0.26 → 0.35, plus hardening our own tokenizer against the
sibling bug it shared (LOG10( parsed as cell LOG10, not a function).
Result: the returns sheet went from 9,819 warm-recompute divergences to
1 (a cosmetic CELL("filename") label); the promote sheet from 4.62M
divergent writes to 30 (cosmetic TEXT() labels). Zero numeric
divergence — carry total, every equity-class basis, and all 14 IRR cells now
reproduce Excel exactly.

The honesty contract

A financial engine must never return a confident wrong number. v0.3.0 enforces
that in layers:

#DIV/0! propagates like an error (Honesty hole: SUM/AVERAGE/SUMPRODUCT/SUMIFS silently DROP a 0/0 NaN (#DIV/0! → confident wrong number); _div canonicalization [gated on real-A-1 re-measure] #60). The old aggregate reducer
(+x||0) made =SUM(100, 0/0, 250) return 350. Division errors now
collapse to a canonical NaN sentinel that flows through
SUM/SUMIFS/SUMPRODUCT/AVERAGE/SUBTOTAL the way #DIV/0! flows through
Excel — while text stays ignored, as Excel ignores it. IFERROR/ISERROR/
ISNUMBER treat ±Infinity and NaN as errors (fix(transpiler): treat #DIV/0! (±Infinity) as an Excel error in IFERROR/ISERROR/ISNUMBER #55).
Honest convergence (Lock-grade convergence: bare (non-IFERROR) divisions leak Infinity → A-1 returns cluster never converges #57/fix(per-sheet-eval): true lockstep — NaN-fill on non-convergence + engine-faithful warming delta (#57 follow-up) #59/Convergence follow-ups (correctness-safe): per-sheet-eval divergence detector; engine divergent+structural churn to MAX_ITER; dead pre-#57 cluster loop #61). Circular clusters (returns ↔ debt ↔ fees)
iterate to a fixed point with transient tolerance — a divide-by-cold-zero
that warms as the cluster solves no longer aborts it — but non-finiteness is
judged at the fixed point: a cluster that doesn't converge reports
converged: false and NaN-fills its cells. Detectably unusable beats
confidently stale. Divergence detection (monotone-up / flat-hot) and a
non-finite churn cap bound the cost of a hopeless cluster.
The measuring stick obeys the same rules (fix(per-sheet-eval): true lockstep — NaN-fill on non-convergence + engine-faithful warming delta (#57 follow-up) #59). eval/per-sheet-eval.mjs
now applies the engine's exact contract in lockstep — same warming-delta
baseline, same NaN-fill — so the accuracy it reports is the accuracy of what
ships, not of a warm-seeded fiction.
Fidelity down to the quirks (Date-axis float drift: monthly date series emitted as literal +30.44 accumulation → exact-match SUMIFS date keys miss → MIP/returns cone collapses to 0/NaN (A-1, post-#43) #47/fix(parser): calamine 0.26-to-0.35 — $-blind shared-formula expansion corrupted 1.75M A-1 cells (#57 structural root) #62/fix(lib): CLI-side XIRR uses Excel's 365-day basis (was 365.25) #63). Dates are integer Excel
day-serials including the phantom 1900-02-29 epoch quirk
(EOMONTH(0,1) = 59, load-bearing on a real sheet). XIRR/XNPV use Excel's
365-day basis in both the engine helpers and the CLI's lib/irr.mjs — the old
365.25 basis was a polite, invisible drift in the fourth decimal of every IRR.

What re-measuring found (and the bug it caught same-day)

Booking this release meant re-measuring the real ~6M-cell model on the fixed
build. The measurement chain itself tells the story of how this project works:

The canonical lockstep eval can't hold the full 17-sheet cluster on a
31 GB box — it OOM'd a 16 GB heap after 61 minutes (it carries a duplicate
ground-truth copy, a ~4.7M-string write-set, and per-cell snapshots). That's
the Row-chunk monster sheets: cluster-bound returns cone limits --lazy-engine pruning #33 scale wall, now characterized as memory-bound, not just time-bound.
So the warm-ground-truth sweep ran per sheet instead — every one of the
17 cluster sheets, fresh GT seed, one pass, every write diffed. Pre-fix
baseline: 1/17 sheets clean, ~5.9M numeric divergences. Post-fix(parser): calamine 0.26-to-0.35 — $-blind shared-formula expansion corrupted 1.75M A-1 cells (#57 structural root) #62:
9/17 sheets exactly clean (the entire returns chain — Equity, GPP
Promote, Future Owned Acquisitions, TRS, Brokered & AMA, Headcount, G&A,
Existing Leased, Assumptions — went from millions of divergences to zero),
~391k residual (−93%).
The residual had a signature: Valuation!G7 computed serial 45106 where
Excel says 45107 — one day early — and downstream sheets showed integer
counts off by one and 0/1 flags flipped. One look at the emitted code found
it: the YEAR/MONTH/DAY lowerings used JavaScript's local-time date
getters, so any engine runtime west of UTC read every Excel serial one day
early. The engine's answers depended on the machine's timezone (fix(transpiler): YEAR/MONTH/DAY use UTC serial math (local-time getters read dates a day early west of UTC) #64, fixed
same day — routed through the UTC/epoch-quirk-aware serial helper, with the
regression running the engine in TZ-pinned child processes so it
discriminates even on UTC CI runners; red pre-fix with exactly the
A-1-observed values).
After the fix and a rebuild: 11/17 cluster sheets reproduce ground truth
exactly from a warm seed (vs 1/17 before the campaign), 13/17 within
float noise (the two others' residuals are ≤1.5e-6). Total numeric
divergence across the 17-sheet circular cluster: ~5.9M → 383k (−93.5%),
with the entire returns / promote / equity chain — every sheet a named
output lives on — at zero.

The remaining four sheets (Technology 285k, Owned Asset PP&E 84k, Lease
Amortization 7.3k, Debt 6.7k) carry a third, distinct defect class: a
formula-structure mismatch — with every input read at its exact Excel
value, the transpiled expression computes a different result (e.g.
Technology!CG14: inputs exact, ours 1, Excel 0), so the transpiled AST
cannot match the workbook's actual formula. That's the next campaign (tracked
in the issues). Until those sheets are exact, a full-cluster convergence
measurement stays moot — warm-seeded iteration walks away from ground truth
through them. Crucially, the honesty contract held: the engine's divergence
detector returns converged: false and NaN for the cluster instead of
fabricating numbers.

Also in this release

ete lite (ADR-027) — right-sized extraction: a four-tier ladder
(closed-form → fitted surrogate → scoped cone → full engine) emits the smallest
artifact meeting the precision budget, with signed provenance and an honesty
gate that force-escalates anything with a breakpoint/kink rather than shipping a
smooth approximation of a cliff.
Downstream contract artifacts — named-outputs.json / named-inputs.json /
cell-types.json / build-manifest.json (+ npm run golden gate): named,
spot-checkable bindings with base-case values and input→output closures, so a
downstream app fails its build on drift instead of rendering it.
Guided onboarding — the analyst-first journey (convert → sanity-check →
ete verify → hand off INTEGRATION.md + example.mjs), model-family
templates (ete init --template, ete manifest export), manifest
doctor/invariants/aggregate refs, and --reuse-parse for 2-second manifest
iteration.
Scale machinery — --lazy-engine (on-demand sheet loading with
cone-scoped load()/runScoped()), compact dependency graphs (0.5 GB instead
of 37 GB on the real models), streamed emit, and experimental scoped cone
emission (--emit-cones, ADR-026).
Fixes — YEAR/MONTH/DAY timezone independence (fix(transpiler): YEAR/MONTH/DAY use UTC serial math (local-time getters read dates a day early west of UTC) #64: local-time
Date getters read every serial a day early on machines west of UTC — the
engine's answers depended on the runtime's timezone; now pure-UTC integer
math, with the regression run in TZ-pinned child processes); ete help no
longer crashes on a nested-backtick template literal (with a smoke test so it
can't recur); COL/ROW reference transpilation (fix(transpiler): parse absolute-row mixed refs — root-cause of the real-A1 * COL`` NaN (T-078) #43, 240,973 NaN cells →
0); per-sheet-eval _sheetConvergence crash (fix(per-sheet-eval): initialize _sheetConvergence in the child eval ctx #56); package repository URL.

Design principles

Ground truth is the spec. Every conversion ships every value Excel
computed; every claim is checkable against it.
Never a silent wrong number. Errors propagate, non-convergence NaN-fills,
approximations escalate at kinks. Plausible-but-wrong is the only unacceptable
failure mode.
Negative-controlled regressions. A fix doesn't exist until a test
reproduces the bug — red on the old code, with the predicted wrong values —
on a synthetic model that runs in seconds, not the 70-minute real rebuild.
Lockstep measurement. Any fast harness must implement the shipped engine's
exact contract, or its numbers are fiction.
Small reviewable merges. The campaign landed as nine PRs, each one fix +
its regression, fast-forward merged green.

Known remaining walls

Row-chunk monster sheets: cluster-bound returns cone limits --lazy-engine pruning #33/Post-#43 A-2 rebuild: Debt sheet transpiled code 3x (104→315 MB) → V8 'invalid size' crash (609 MB string) on multi-pass recompute #46 — the scale wall for a cold-start converged base case on the
largest models: a 17-sheet circular cluster at ~777 MB of modules, with
per-pass cost dominated by three monster sheets; A-2's Debt module emit
(315 MB) exceeds V8's string limit on multi-pass recompute. See the re-measure
above for current status.
Cosmetic: TEXT() format codes unimplemented (labels only),
CELL("filename") placeholder.

Full test suite: 37 suites green (npm test), cargo test green, blind eval
149/150 (99.3%) across 15.5M cells.

First tagged release. Books the correctness campaign: #55/#56/#57(A)/#59/ #60/#61/#62/#63/#64 all merged, #47 closed, the per-sheet warm-GT sweep as the standing fidelity gate (A-1: 1/17 -> 11/17 sheets exactly clean, -93.5% numeric divergence, returns chain at zero). - docs/releases/v0.3.0.md: full release notes (the calamine story, the honesty contract, the TZ bug, design principles, known walls) - README: ete lite, templates, --reuse-parse, "Correctness & honesty" section, warm-GT fidelity gate, refreshed test counts - cli/index.mjs: fix printHelp crash (nested backticks parsed as a tagged template -> TypeError on every bare/unknown invocation) + stale cone help text; help smoke test in test-cli.mjs - package.json: 0.2.0 -> 0.3.0; repository URL pointed at a nonexistent org - CHANGELOG/PLAN/ROADMAP: release entry + re-measure findings + next wave (residual formula-structure mismatch on 4 sheets) Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

ebootheee merged commit b307a62 into main Jun 10, 2026
2 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

release: v0.3.0 - Never a Silent Wrong Number#65

release: v0.3.0 - Never a Silent Wrong Number#65
ebootheee merged 1 commit into
mainfrom
release/v0.3.0

ebootheee commented Jun 10, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

ebootheee commented Jun 10, 2026

v0.3.0 — Never a Silent Wrong Number

Headline: the parser was corrupting 30% of a real model

The honesty contract

What re-measuring found (and the bug it caught same-day)

Also in this release

Design principles

Known remaining walls

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant