release: v0.3.0 - Never a Silent Wrong Number #65

Merged: ebootheee merged 1 commit into main from release/v0.3.0 on Jun 10, 2026
@ebootheee (Owner)

v0.3.0 — Never a Silent Wrong Number

The first tagged release of excel-to-engine. v0.2.0 was about access — getting a
6M-cell Excel model into a form a CLI (and an AI assistant) can navigate. v0.3.0 is
about trust: a correctness campaign against a real ~6M-cell PE platform model,
closing nine root-caused defect classes across the parser, the transpiler, and the
convergence machinery — each pinned by a negative-controlled regression that
reproduces the bug on the old code before the fix counts.

Headline: the parser was corrupting 30% of a real model

The campaign's centerpiece (#57 → PR #62): calamine 0.26's shared-formula
expansion was $-blind. When an .xlsx stores one master formula and fills it
across a range, anchor semantics must be respected on re-expansion — instead,
$AO17 (column-absolute) was shifted as if relative and L$7 (row-anchored) was
frozen as if absolute. Plain-relative and fully-absolute references survived by
accident, which is why 13/23 named outputs looked fine while exactly the
mixed-anchor financial idioms (AVERAGEIFS/SUMIFS windows over date axes, ratio
rows against pinned denominators) collapsed. Blast radius: 1,745,461 of 4.69M
shared-formula member cells (30% of the model) corrupted since the project
began, with the visible symptom (a zeroed carry waterfall) three sheets downstream
of the cause.
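
To make the anchor rule concrete, here is a minimal sketch of anchor-aware re-expansion for a single A1-style reference. The helper names (`colToNum`, `numToCol`, `shiftRef`) are hypothetical and this is not calamine's code; it only illustrates that a `$`-anchored column or row must stay fixed while the unanchored part moves by the fill offset.

```javascript
// Hypothetical helpers for illustration only, not the parser's actual code.
function colToNum(col) {
  let n = 0;
  for (const ch of col) n = n * 26 + (ch.charCodeAt(0) - 64); // 'A' -> 1
  return n;
}

function numToCol(n) {
  let s = '';
  while (n > 0) {
    const r = (n - 1) % 26;
    s = String.fromCharCode(65 + r) + s;
    n = Math.floor((n - 1) / 26);
  }
  return s;
}

// Shift one reference by (dCol, dRow); $-anchored parts are left untouched.
function shiftRef(ref, dCol, dRow) {
  const m = /^(\$?)([A-Z]{1,3})(\$?)(\d+)$/.exec(ref);
  if (!m) throw new Error(`not an A1 reference: ${ref}`);
  const [, colAbs, col, rowAbs, row] = m;
  const colNum = colAbs ? colToNum(col) : colToNum(col) + dCol;
  const rowNum = rowAbs ? Number(row) : Number(row) + dRow;
  if (colNum < 1 || rowNum < 1) throw new Error(`shift leaves the sheet: ${ref}`);
  return `${colAbs}${numToCol(colNum)}${rowAbs}${rowNum}`;
}

console.log(shiftRef('$AO17', 11, 0)); // $AO17 (column is anchored)
console.log(shiftRef('L$7', 11, 0));   // W$7   (column moves, row is anchored)
console.log(shiftRef('N17', 11, 0));   // Y17   (plain relative)
```

A `$`-blind expander would get the first two cases wrong in opposite directions, while plain-relative and fully-absolute references would still come out right.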

  • Found by the warm-ground-truth recompute diagnostic: seed every cell with
    Excel's own answer, run one compute pass, diff every write. A faithful formula
    must reproduce ground truth from a warm seed, so every divergence is a transpiler
    defect by definition. The corruption announced itself as a column-shift signature
    (got(Y17) == GT(N17) — each cell holding the value from exactly the fill offset
    away).
  • Missed by every prior test because SheetJS — the library generating the
    synthetic fixtures — never writes shared formulas. The whole bug class was
    structurally invisible to the suite. The new regression
    (test-shared-formula-anchors.mjs) hand-zips an .xlsx with real
    <f t="shared"> groups and was verified RED on calamine 0.26 with the predicted
    wrong values before the upgrade.
  • Fixed by calamine 0.26 → 0.35, plus hardening our own tokenizer against the
    sibling bug it shared (LOG10( parsed as cell LOG10, not a function).
  • Result: the returns sheet went from 9,819 warm-recompute divergences to
    1 (a cosmetic CELL("filename") label); the promote sheet from 4.62M
    divergent writes to 30 (cosmetic TEXT() labels). Zero numeric divergence:
    carry total, every equity-class basis, and all 14 IRR cells now
    reproduce Excel exactly.
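
The warm-ground-truth idea from the first bullet can be sketched in a few lines. This is an illustrative reduction, not the project's harness: `warmRecomputeDiff`, the `Map` shapes, and the toy cells are all assumptions, and here every read comes straight from the seed so each formula is checked in isolation.

```javascript
// groundTruth: Map<address, value> holding Excel's own answers.
// formulas:    Map<address, (get) => value>, the transpiled formulas under test.
function warmRecomputeDiff(groundTruth, formulas, tol = 0) {
  const divergences = [];
  const get = (addr) => groundTruth.get(addr); // every input is Excel's value
  for (const [addr, fn] of formulas) {
    const got = fn(get);
    const want = groundTruth.get(addr);
    const same =
      typeof got === 'number' && typeof want === 'number'
        ? Math.abs(got - want) <= tol
        : Object.is(got, want);
    if (!same) divergences.push({ addr, got, want });
  }
  return divergences;
}

// Toy model: row 17 should copy row 10 of the same column.
const groundTruth = new Map([['N10', 100], ['Y10', 250], ['N17', 100], ['Y17', 250]]);
const formulas = new Map([
  ['N17', (get) => get('N10')],
  ['Y17', (get) => get('N10')], // corrupted: reads the master's column
]);

const diffs = warmRecomputeDiff(groundTruth, formulas);
console.log(diffs); // [ { addr: 'Y17', got: 100, want: 250 } ]
```

In the toy case the divergent cell holds exactly the ground-truth value of `N17`, which is the same column-shift signature described above.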

The honesty contract

A financial engine must never return a confident wrong number. v0.3.0 enforces
that in layers:

What re-measuring found (and the bug it caught same-day)

Booking this release meant re-measuring the real ~6M-cell model on the fixed
build. The measurement chain itself tells the story of how this project works:

  1. The canonical lockstep eval can't hold the full 17-sheet cluster on a
    31 GB box: it OOM'd a 16 GB heap after 61 minutes (it carries a duplicate
    ground-truth copy, a ~4.7M-string write-set, and per-cell snapshots). That's
    the #33 scale wall ("Row-chunk monster sheets: cluster-bound returns cone
    limits --lazy-engine pruning"), now characterized as memory-bound, not just
    time-bound.
  2. So the warm-ground-truth sweep ran per sheet instead — every one of the
    17 cluster sheets, fresh GT seed, one pass, every write diffed. Pre-fix
    baseline: 1/17 sheets clean, ~5.9M numeric divergences. Post-#62 (the
    calamine 0.26-to-0.35 parser fix for $-blind shared-formula expansion):
    9/17 sheets exactly clean (the entire returns chain — Equity, GPP
    Promote, Future Owned Acquisitions, TRS, Brokered & AMA, Headcount, G&A,
    Existing Leased, Assumptions — went from millions of divergences to zero),
    ~391k residual (−93%).
  3. The residual had a signature: Valuation!G7 computed serial 45106 where
    Excel says 45107 — one day early — and downstream sheets showed integer
    counts off by one and 0/1 flags flipped. One look at the emitted code found
    it: the YEAR/MONTH/DAY lowerings used JavaScript's local-time date
    getters, so any engine runtime west of UTC read every Excel serial one day
    early. The engine's answers depended on the machine's timezone (#64, fixed
    same day — routed through the UTC/epoch-quirk-aware serial helper, with the
    regression running the engine in TZ-pinned child processes so it
    discriminates even on UTC CI runners; red pre-fix with exactly the
    A-1-observed values).
  4. After the fix and a rebuild: 11/17 cluster sheets reproduce ground truth
    exactly from a warm seed (vs 1/17 before the campaign), 13/17 within
    float noise (the two others' residuals are ≤1.5e-6). Total numeric
    divergence across the 17-sheet circular cluster: ~5.9M → 383k (−93.5%),
    with the entire returns / promote / equity chain — every sheet a named
    output lives on — at zero.
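
As an illustration of the timezone issue in step 3, the sketch below derives year, month, and day from an Excel serial using only UTC getters. The function name `serialToYMD` and the constants are assumptions for this example, not the engine's actual helper; the 1900 handling reflects Excel's well-known fictitious 1900-02-29 (serial 60).

```javascript
const MS_PER_DAY = 86400000;
// Excel serial of 1970-01-01 in the 1900 date system.
const UNIX_EPOCH_SERIAL = 25569;

// Hypothetical helper: timezone-independent Y/M/D from an Excel serial.
function serialToYMD(serial) {
  const whole = Math.floor(serial); // drop the time-of-day fraction
  // Excel treats 1900 as a leap year; serial 60 is the nonexistent Feb 29.
  if (whole === 60) return { year: 1900, month: 2, day: 29 };
  // Serials before the phantom day sit one day later on the real calendar.
  const adjusted = whole < 60 ? whole + 1 : whole;
  const d = new Date((adjusted - UNIX_EPOCH_SERIAL) * MS_PER_DAY);
  // getUTC* getters: the result no longer depends on the machine's TZ.
  // getFullYear()/getMonth()/getDate() would read this instant in local
  // time, which falls on the previous calendar day anywhere west of UTC.
  return {
    year: d.getUTCFullYear(),
    month: d.getUTCMonth() + 1,
    day: d.getUTCDate(),
  };
}

console.log(serialToYMD(45107)); // { year: 2023, month: 6, day: 30 }
```

Because the `Date` is built at UTC midnight, any local-time getter on a machine with a negative UTC offset sees the evening of June 29 instead.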

The remaining four sheets (Technology 285k, Owned Asset PP&E 84k, Lease
Amortization 7.3k, Debt 6.7k) carry a third, distinct defect class: a
formula-structure mismatch — with every input read at its exact Excel
value, the transpiled expression computes a different result (e.g.
Technology!CG14: inputs exact, ours 1, Excel 0), so the transpiled AST
cannot match the workbook's actual formula. That's the next campaign (tracked
in the issues). Until those sheets are exact, a full-cluster convergence
measurement stays moot — warm-seeded iteration walks away from ground truth
through them. Crucially, the honesty contract held: the engine's divergence
detector returns converged: false and NaN for the cluster instead of
fabricating numbers.
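
The behavior described here, reporting non-convergence rather than returning the last iterate, can be sketched as a fixed-point loop. `solveCluster`, its options, and the result shape are hypothetical; only `converged: false` and the NaN fill come from the description above.

```javascript
// Hypothetical sketch: iterate a circular cluster to a fixed point and
// refuse to return numbers if it does not settle.
// step: (values) => next values, both plain objects keyed by cell name.
function solveCluster(step, seed, { maxIter = 100, tol = 1e-9 } = {}) {
  let current = { ...seed };
  for (let i = 1; i <= maxIter; i++) {
    const next = step(current);
    let maxDelta = 0;
    for (const key of Object.keys(next)) {
      const delta = Math.abs(next[key] - current[key]);
      // A NaN delta must count as "not converged", never as "no change".
      maxDelta = Number.isNaN(delta) ? Infinity : Math.max(maxDelta, delta);
    }
    if (maxDelta <= tol) return { converged: true, iterations: i, values: next };
    current = next;
  }
  // Honesty contract: no plausible-looking partial answer, only NaN.
  const values = Object.fromEntries(Object.keys(current).map((k) => [k, NaN]));
  return { converged: false, iterations: maxIter, values };
}

const ok = solveCluster(({ x }) => ({ x: 0.5 * x + 1 }), { x: 0 });
const bad = solveCluster(({ x }) => ({ x: 2 * x + 1 }), { x: 1 });
console.log(ok.converged, bad.converged, bad.values.x); // true false NaN
```

The design choice is that a caller cannot accidentally consume a half-iterated value: every arithmetic use of the NaN-filled result stays NaN downstream.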

Also in this release

  • ete lite (ADR-027) — right-sized extraction: a four-tier ladder
    (closed-form → fitted surrogate → scoped cone → full engine) emits the smallest
    artifact meeting the precision budget, with signed provenance and an honesty
    gate that force-escalates anything with a breakpoint/kink rather than shipping a
    smooth approximation of a cliff.
  • Downstream contract artifacts (named-outputs.json / named-inputs.json /
    cell-types.json / build-manifest.json, plus the npm run golden gate): named,
    spot-checkable bindings with base-case values and input→output closures, so a
    downstream app fails its build on drift instead of rendering it.
  • Guided onboarding — the analyst-first journey (convert → sanity-check →
    ete verify → hand off INTEGRATION.md + example.mjs), model-family
    templates (ete init --template, ete manifest export), manifest
    doctor/invariants/aggregate refs, and --reuse-parse for 2-second manifest
    iteration.
  • Scale machinery: --lazy-engine (on-demand sheet loading with
    cone-scoped load()/runScoped()), compact dependency graphs (0.5 GB instead
    of 37 GB on the real models), streamed emit, and experimental scoped cone
    emission (--emit-cones, ADR-026).
  • Fixes: YEAR/MONTH/DAY timezone independence (#64: local-time
    Date getters read every serial a day early on machines west of UTC, so the
    engine's answers depended on the runtime's timezone; now pure-UTC integer
    math, with the regression run in TZ-pinned child processes); ete help no
    longer crashes on a nested-backtick template literal (with a smoke test so it
    can't recur); COL/ROW reference transpilation (#43, 240,973 NaN cells →
    0); per-sheet-eval _sheetConvergence crash (#56); package repository URL.
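
To illustrate what "fails its build on drift" could look like for the contract artifacts, here is a minimal drift check. The object shape (`cell`, `baseValue`), the function name `checkGolden`, and the sheet and cell names are assumptions for this sketch; the real named-outputs.json schema and the npm run golden implementation are not shown in these notes.

```javascript
// Hypothetical shape: { outputName: { cell, baseValue } }.
// computed: { cellAddress: number } as produced by a fresh engine run.
function checkGolden(namedOutputs, computed, relTol = 1e-9) {
  const drift = [];
  for (const [name, { cell, baseValue }] of Object.entries(namedOutputs)) {
    const got = computed[cell];
    const scale = Math.max(1, Math.abs(baseValue));
    // Missing, NaN, or non-numeric results count as drift, never as a pass.
    const ok =
      typeof got === 'number' && Math.abs(got - baseValue) <= relTol * scale;
    if (!ok) drift.push({ name, cell, expected: baseValue, got });
  }
  return drift;
}

// Illustrative data only.
const namedOutputs = {
  irr: { cell: 'Returns!B2', baseValue: 0.15 },
  carry: { cell: 'Promote!C9', baseValue: 1000000 },
};
const computed = { 'Returns!B2': 0.15, 'Promote!C9': 0 };

const drift = checkGolden(namedOutputs, computed);
console.log(drift.map((d) => d.name)); // [ 'carry' ]
```

A build script could exit non-zero whenever the returned list is non-empty, so a zeroed output stops the build instead of being rendered.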

Design principles

  1. Ground truth is the spec. Every conversion ships every value Excel
    computed; every claim is checkable against it.
  2. Never a silent wrong number. Errors propagate, non-convergence NaN-fills,
    approximations escalate at kinks. Plausible-but-wrong is the only unacceptable
    failure mode.
  3. Negative-controlled regressions. A fix doesn't exist until a test
    reproduces the bug — red on the old code, with the predicted wrong values —
    on a synthetic model that runs in seconds, not the 70-minute real rebuild.
  4. Lockstep measurement. Any fast harness must implement the shipped engine's
    exact contract, or its numbers are fiction.
  5. Small reviewable merges. The campaign landed as nine PRs, each one fix +
    its regression, fast-forward merged green.
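
Principle 3 can be sketched as a small pattern: the test passes only if the old code produces the predicted wrong value and the new code produces the right one. The example reuses the LOG10( tokenizer case from this release, but `classifyOld` and `classifyNew` are hypothetical reconstructions for illustration, not the project's tokenizer.

```javascript
const CELL_SHAPE = /^[A-Z]{1,3}[0-9]+$/; // LOG10 looks like column LOG, row 10

// Hypothetical pre-fix behavior: shape alone decides.
function classifyOld(token, nextChar) {
  return CELL_SHAPE.test(token) ? 'cell' : 'name';
}

// Hypothetical post-fix behavior: a token directly followed by "(" is a call.
function classifyNew(token, nextChar) {
  if (nextChar === '(') return 'function';
  return CELL_SHAPE.test(token) ? 'cell' : 'name';
}

// A regression only counts if it is red on the old code with the
// predicted wrong value, and green on the new code.
function negativeControl({ oldImpl, newImpl, input, predictedWrong, expected }) {
  const oldGot = oldImpl(...input);
  const newGot = newImpl(...input);
  return {
    redOnOld: oldGot === predictedWrong && oldGot !== expected,
    greenOnNew: newGot === expected,
  };
}

const verdict = negativeControl({
  oldImpl: classifyOld,
  newImpl: classifyNew,
  input: ['LOG10', '('],
  predictedWrong: 'cell',
  expected: 'function',
});
console.log(verdict); // { redOnOld: true, greenOnNew: true }
```

Requiring the predicted wrong value, rather than just any failure, guards against a test that fails on old code for an unrelated reason.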

Known remaining walls

Full test suite: 37 suites green (npm test), cargo test green, blind eval
149/150 (99.3%) across 15.5M cells.

First tagged release. Books the correctness campaign: #55/#56/#57(A)/#59/
#60/#61/#62/#63/#64 all merged, #47 closed, the per-sheet warm-GT sweep as
the standing fidelity gate (A-1: 1/17 -> 11/17 sheets exactly clean, -93.5%
numeric divergence, returns chain at zero).

- docs/releases/v0.3.0.md: full release notes (the calamine story, the
  honesty contract, the TZ bug, design principles, known walls)
- README: ete lite, templates, --reuse-parse, "Correctness & honesty"
  section, warm-GT fidelity gate, refreshed test counts
- cli/index.mjs: fix printHelp crash (nested backticks parsed as a tagged
  template -> TypeError on every bare/unknown invocation) + stale cone help
  text; help smoke test in test-cli.mjs
- package.json: 0.2.0 -> 0.3.0; repository URL pointed at a nonexistent org
- CHANGELOG/PLAN/ROADMAP: release entry + re-measure findings + next wave
  (residual formula-structure mismatch on 4 sheets)

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
@ebootheee ebootheee merged commit b307a62 into main Jun 10, 2026
2 checks passed