release: v0.3.0 - Never a Silent Wrong Number #65
Merged
First tagged release. Books the correctness campaign: #55/#56/#57(A)/#59/#60/#61/#62/#63/#64 all merged, #47 closed, the per-sheet warm-GT sweep as the standing fidelity gate (A-1: 1/17 -> 11/17 sheets exactly clean, -93.5% numeric divergence, returns chain at zero).

- docs/releases/v0.3.0.md: full release notes (the calamine story, the honesty contract, the TZ bug, design principles, known walls)
- README: ete lite, templates, --reuse-parse, "Correctness & honesty" section, warm-GT fidelity gate, refreshed test counts
- cli/index.mjs: fix printHelp crash (nested backticks parsed as a tagged template -> TypeError on every bare/unknown invocation) + stale cone help text; help smoke test in test-cli.mjs
- package.json: 0.2.0 -> 0.3.0; repository URL pointed at a nonexistent org
- CHANGELOG/PLAN/ROADMAP: release entry + re-measure findings + next wave (residual formula-structure mismatch on 4 sheets)

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
v0.3.0 — Never a Silent Wrong Number
The first tagged release of excel-to-engine. v0.2.0 was about access — getting a
6M-cell Excel model into a form a CLI (and an AI assistant) can navigate. v0.3.0 is
about trust: a correctness campaign against a real ~6M-cell PE platform model,
closing nine root-caused defect classes across the parser, the transpiler, and the
convergence machinery — each pinned by a negative-controlled regression that
reproduces the bug on the old code before the fix counts.
Headline: the parser was corrupting 30% of a real model
The campaign's centerpiece (#57 → PR #62): calamine 0.26's shared-formula
expansion was `$`-blind. When an `.xlsx` stores one master formula and fills it
across a range, anchor semantics must be respected on re-expansion. Instead,
`$AO17` (column-absolute) was shifted as if relative and `L$7` (row-anchored) was
frozen as if absolute. Plain-relative and fully-absolute references survived by
accident, which is why 13/23 named outputs looked fine while exactly the
mixed-anchor financial idioms (`AVERAGEIFS`/`SUMIFS` windows over date axes, ratio
rows against pinned denominators) collapsed. Blast radius: 1,745,461 of 4.69M
shared-formula member cells (30% of the model) corrupted since the project
began, with the visible symptom (a zeroed carry waterfall) three sheets downstream
of the cause.
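The anchor rule described above can be sketched in a few lines. This is only an illustration of `$`-aware re-expansion, not calamine's implementation; `shiftRef`, `colToIndex`, and `indexToCol` are hypothetical names, and the sketch handles single-cell A1 references with non-negative offsets only.

```javascript
// Hypothetical helpers (not from calamine or excel-to-engine).
// Convert a column label to a 1-based index: A -> 1, Z -> 26, AA -> 27.
function colToIndex(col) {
  let n = 0;
  for (const ch of col) n = n * 26 + (ch.charCodeAt(0) - 64);
  return n;
}

// Inverse of colToIndex for n >= 1.
function indexToCol(n) {
  let s = "";
  while (n > 0) {
    const rem = (n - 1) % 26;
    s = String.fromCharCode(65 + rem) + s;
    n = Math.floor((n - 1) / 26);
  }
  return s;
}

// Shift one A1 reference by the fill offset of a shared-formula member.
// A `$` before the column or the row pins that axis; only unpinned axes move.
function shiftRef(ref, dCol, dRow) {
  const m = /^(\$?)([A-Z]+)(\$?)(\d+)$/.exec(ref);
  if (!m) throw new Error(`not an A1 reference: ${ref}`);
  const [, colAbs, col, rowAbs, row] = m;
  const newCol = colAbs ? col : indexToCol(colToIndex(col) + dCol);
  const newRow = rowAbs ? row : String(Number(row) + dRow);
  return `${colAbs}${newCol}${rowAbs}${newRow}`;
}

// A member cell 3 columns right and 2 rows down from the master:
console.log(shiftRef("$AO17", 3, 2)); // $AO19 (column pinned, row moves)
console.log(shiftRef("L$7", 3, 2));   // O$7   (row pinned, column moves)
```

A `$`-blind expander treats both axes of a mixed reference the same way, which is why only plain-relative and fully-absolute references come out right.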
Excel's own answer, run one compute pass, diff every write. A faithful formula
must reproduce ground truth from a warm seed, so every divergence is a transpiler
defect by definition. The corruption announced itself as a column-shift signature
(`got(Y17) == GT(N17)`: each cell holding the value from exactly the fill offset
away).
synthetic fixtures — never writes shared formulas. The whole bug class was
structurally invisible to the suite. The new regression
(`test-shared-formula-anchors.mjs`) hand-zips an `.xlsx` with real `<f t="shared">`
groups and was verified RED on calamine 0.26 with the predicted
wrong values before the upgrade.
sibling bug it shared (`LOG10(` parsed as cell `LOG10`, not a function).
1 (a cosmetic `CELL("filename")` label); the promote sheet from 4.62M
divergent writes to 30 (cosmetic `TEXT()` labels). Zero numeric
divergence: carry total, every equity-class basis, and all 14 IRR cells now
reproduce Excel exactly.
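The column-shift signature mentioned above (`got(Y17) == GT(N17)`) lends itself to a mechanical check. The following is a hypothetical diagnostic, not the project's sweep code: `findColumnShift` takes one row of computed values and one row of ground truth (arrays indexed by column) and looks for a constant offset `k` with `got[c] === gt[c - k]` for every comparable column. A row of constant values would match trivially, so a real check would also want to confirm that the row actually diverges.

```javascript
// Hypothetical diagnostic sketch: detect a constant column shift in one row.
// Returns the smallest k in [1, maxShift] such that every got[c] equals
// gt[c - k], or null when no such offset exists.
function findColumnShift(got, gt, maxShift) {
  for (let k = 1; k <= maxShift; k++) {
    let compared = 0;
    let matches = 0;
    for (let c = k; c < got.length; c++) {
      compared++;
      if (got[c] === gt[c - k]) matches++;
    }
    if (compared > 0 && matches === compared) return k;
  }
  return null;
}

const gt = [10, 20, 30, 40, 50, 60];
const got = [0, 0, 10, 20, 30, 40]; // each cell holds the value from 2 columns left
console.log(findColumnShift(got, gt, 3)); // 2
console.log(findColumnShift(gt, gt, 3));  // null (no shift)
```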
The honesty contract
A financial engine must never return a confident wrong number. v0.3.0 enforces
that in layers:
`#DIV/0!` propagates like an error (#60). The old aggregate reducer `(+x||0)` made
`=SUM(100, 0/0, 250)` return 350. Division errors now collapse to a canonical NaN
sentinel that flows through `SUM`/`SUMIFS`/`SUMPRODUCT`/`AVERAGE`/`SUBTOTAL` the way
`#DIV/0!` flows through Excel, while text stays ignored, as Excel ignores it.
`IFERROR`/`ISERROR`/`ISNUMBER` treat ±Infinity and NaN as errors (#55).
iterate to a fixed point with transient tolerance (a divide-by-cold-zero
that warms as the cluster solves no longer aborts it), but non-finiteness is
judged at the fixed point: a cluster that doesn't converge reports
`converged: false` and NaN-fills its cells. Detectably unusable beats
confidently stale. Divergence detection (monotone-up / flat-hot) and a
non-finite churn cap bound the cost of a hopeless cluster.
`eval/per-sheet-eval.mjs` now applies the engine's exact contract in lockstep
(same warming-delta baseline, same NaN-fill), so the accuracy it reports is the
accuracy of what ships, not of a warm-seeded fiction.
day-serials including the phantom 1900-02-29 epoch quirk
(`EOMONTH(0,1) = 59`, load-bearing on a real sheet).
`XIRR`/`XNPV` use Excel's 365-day basis in both the engine helpers and the CLI's
`lib/irr.mjs`; the old 365.25 basis was a polite, invisible drift in the fourth
decimal of every IRR.
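To make the `#DIV/0!` item concrete, here is a minimal sketch contrasting the old `(+x||0)` reducer quoted above with an error-propagating one. The names `div`, `sumLossy`, and `sumHonest` are hypothetical and are not the engine's helpers; the sketch assumes that division errors are canonicalized to NaN and that non-numeric values count as text.

```javascript
// Hypothetical division helper: any non-finite quotient (0/0, x/0)
// collapses to a single NaN sentinel standing in for #DIV/0!.
function div(a, b) {
  const q = a / b;
  return Number.isFinite(q) ? q : NaN;
}

// The old behavior described in the notes: (+x || 0) maps NaN to 0,
// so an error silently drops out of the total.
function sumLossy(values) {
  return values.reduce((acc, x) => acc + (+x || 0), 0);
}

// Sketch of the contract: numeric errors poison the aggregate,
// while text (any non-number here) is skipped.
function sumHonest(values) {
  let total = 0;
  for (const x of values) {
    if (typeof x !== "number") continue; // text is ignored
    if (!Number.isFinite(x)) return NaN; // error propagates
    total += x;
  }
  return total;
}

console.log(sumLossy([100, div(0, 0), 250]));  // 350 (confident wrong number)
console.log(sumHonest([100, div(0, 0), 250])); // NaN (error stays visible)
console.log(sumHonest([100, "n/a", 250]));     // 350 (text ignored)
```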
What re-measuring found (and the bug it caught same-day)
Booking this release meant re-measuring the real ~6M-cell model on the fixed
build. The measurement chain itself tells the story of how this project works:
31 GB box — it OOM'd a 16 GB heap after 61 minutes (it carries a duplicate
ground-truth copy, a ~4.7M-string write-set, and per-cell snapshots). That's
the #33 scale wall, now characterized as memory-bound, not just time-bound.
17 cluster sheets, fresh GT seed, one pass, every write diffed. Pre-fix
baseline: 1/17 sheets clean, ~5.9M numeric divergences. Post-#62:
9/17 sheets exactly clean (the entire returns chain — Equity, GPP
Promote, Future Owned Acquisitions, TRS, Brokered & AMA, Headcount, G&A,
Existing Leased, Assumptions — went from millions of divergences to zero),
~391k residual (−93%).
`Valuation!G7` computed serial 45106 where Excel says 45107 (one day early), and
downstream sheets showed integer counts off by one and 0/1 flags flipped. One
look at the emitted code found it: the `YEAR`/`MONTH`/`DAY` lowerings used
JavaScript's local-time date getters, so any engine runtime west of UTC read
every Excel serial one day early. The engine's answers depended on the
machine's timezone (#64, fixed same day: routed through the
UTC/epoch-quirk-aware serial helper, with the regression running the engine in
TZ-pinned child processes so it discriminates even on UTC CI runners; red
pre-fix with exactly the A-1-observed values).
exactly from a warm seed (vs 1/17 before the campaign), 13/17 within
float noise (the two others' residuals are ≤1.5e-6). Total numeric
divergence across the 17-sheet circular cluster: ~5.9M → 383k (−93.5%),
with the entire returns / promote / equity chain — every sheet a named
output lives on — at zero.
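The timezone defect in this list is easy to reproduce in miniature: a `Date` built at UTC midnight reads as the previous calendar day through local-time getters on any machine west of UTC. The sketch below shows timezone-independent serial decoding under the 1900 date system. `serialToYMD` is a hypothetical name, not the engine's helper, and treating serial 60 as the phantom 1900-02-29 is an assumption about the intended behavior.

```javascript
const MS_PER_DAY = 86400000;

// Hypothetical sketch: decode an Excel 1900-system serial using UTC getters only.
function serialToYMD(serial) {
  const s = Math.floor(serial);
  // Excel counts a 1900-02-29 that never existed; it occupies serial 60.
  if (s === 60) return { year: 1900, month: 2, day: 29 };
  // After the phantom day, serials run one ahead of the real calendar,
  // so the effective epoch moves back by one day.
  const epoch = s > 60 ? Date.UTC(1899, 11, 30) : Date.UTC(1899, 11, 31);
  const d = new Date(epoch + s * MS_PER_DAY);
  // getUTC* getters ignore the machine's timezone; getFullYear(), getMonth(),
  // and getDate() would not.
  return {
    year: d.getUTCFullYear(),
    month: d.getUTCMonth() + 1,
    day: d.getUTCDate(),
  };
}

console.log(serialToYMD(45107)); // { year: 2023, month: 6, day: 30 }
console.log(serialToYMD(59));    // { year: 1900, month: 2, day: 28 }
console.log(serialToYMD(61));    // { year: 1900, month: 3, day: 1 }
```

Because nothing here consults the local timezone, the result is the same on every runner, which is the property the TZ-pinned regression checks.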
The remaining four sheets (Technology 285k, Owned Asset PP&E 84k, Lease
Amortization 7.3k, Debt 6.7k) carry a third, distinct defect class: a
formula-structure mismatch — with every input read at its exact Excel
value, the transpiled expression computes a different result (e.g.
`Technology!CG14`: inputs exact, ours 1, Excel 0), so the transpiled AST
cannot match the workbook's actual formula. That's the next campaign (tracked
in the issues). Until those sheets are exact, a full-cluster convergence
measurement stays moot — warm-seeded iteration walks away from ground truth
through them. Crucially, the honesty contract held: the engine's divergence
detector returns
`converged: false` and NaN for the cluster instead of
fabricating numbers.
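The `converged: false` behavior can be illustrated with a small fixed-point loop. This is a sketch under simplifying assumptions, not the engine's solver: `solveCluster` and its options are hypothetical, a cluster is modeled as a function from one array of cell values to the next, and the divergence detection and churn cap mentioned earlier are replaced by a plain iteration budget.

```javascript
// Hypothetical sketch: iterate a circular cluster to a fixed point.
// Non-finite values are tolerated mid-iteration and judged only at the end.
function solveCluster(step, seed, { maxIter = 100, tol = 1e-9 } = {}) {
  let cur = seed.slice();
  for (let i = 0; i < maxIter; i++) {
    const next = step(cur);
    // Settled means every cell is finite and moved by at most `tol`.
    const settled = next.every(
      (v, j) =>
        Number.isFinite(v) &&
        Number.isFinite(cur[j]) &&
        Math.abs(v - cur[j]) <= tol
    );
    cur = next;
    if (settled) return { converged: true, values: cur };
  }
  // No fixed point within budget: say so and NaN-fill rather than
  // hand back the last (stale) pass.
  return { converged: false, values: cur.map(() => NaN) };
}

// Cell a is cold (0) on the first pass, then 2; cell b = 10 / a.
const warm = solveCluster(([a]) => [2, 10 / a], [0, 0]);
console.log(warm.converged, warm.values); // true [ 2, 5 ]

// A cluster that grows without bound never settles.
const hopeless = solveCluster(([x]) => [x * 2 + 1], [1], { maxIter: 20 });
console.log(hopeless.converged); // false
```

In the first example the first pass produces `Infinity` for `b`, which would abort a solver that checks finiteness on every pass; judging at the fixed point lets it warm up.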
Also in this release
`ete lite` (ADR-027): right-sized extraction. A four-tier ladder
(closed-form → fitted surrogate → scoped cone → full engine) emits the smallest
artifact meeting the precision budget, with signed provenance and an honesty
gate that force-escalates anything with a breakpoint/kink rather than shipping a
smooth approximation of a cliff.
`named-outputs.json` / `named-inputs.json` / `cell-types.json` /
`build-manifest.json` (+ npm run goldengate): named,
spot-checkable bindings with base-case values and input→output closures, so a
downstream app fails its build on drift instead of rendering it.
`ete verify` → hand off `INTEGRATION.md` + `example.mjs`), model-family
templates (`ete init --template`, `ete manifest export`), manifest
doctor/invariants/aggregate refs, and `--reuse-parse` for 2-second manifest
iteration.
`--lazy-engine` (on-demand sheet loading with
cone-scoped `load()`/`runScoped()`), compact dependency graphs (0.5 GB instead
of 37 GB on the real models), streamed emit, and experimental scoped cone
emission (`--emit-cones`, ADR-026).
`YEAR`/`MONTH`/`DAY` timezone independence (#64: local-time `Date` getters read
every serial a day early on machines west of UTC, so the
engine's answers depended on the runtime's timezone; now pure-UTC integer
math, with the regression run in TZ-pinned child processes);
`ete` help no longer crashes on a nested-backtick template literal (with a smoke
test so it can't recur);
COL/ROW reference transpilation (#43, 240,973 NaN cells → 0); per-sheet-eval
`_sheetConvergence` crash (#56); package repository URL.
Design principles
computed; every claim is checkable against it.
approximations escalate at kinks. Plausible-but-wrong is the only unacceptable
failure mode.
reproduces the bug — red on the old code, with the predicted wrong values —
on a synthetic model that runs in seconds, not the 70-minute real rebuild.
exact contract, or its numbers are fiction.
its regression, fast-forward merged green.
Known remaining walls
largest models: a 17-sheet circular cluster at ~777 MB of modules, with
per-pass cost dominated by three monster sheets; A-2's Debt module emit
(315 MB) exceeds V8's string limit on multi-pass recompute. See the re-measure
above for current status.
`TEXT()` format codes unimplemented (labels only), `CELL("filename")` placeholder.
Full test suite: 37 suites green (`npm test`), `cargo test` green, blind eval
149/150 (99.3%) across 15.5M cells.