Math Claim-Backing Audit Log

Started 2026-04-15. Running via /loop 15m (cron b69bfea8). Audits math/ subdirs using the same structural methodology applied to non-math (55-fire campaign closed 2026-04-15; case study at ~/sigma/case_studies/claim_backing_audit_61_fires_001.md).

The standard shifts from non-math

For non-math, the question was "does the claim cite a source?" For math, it's "does the claim hold up AS math?"

New verification axes

Named-theorem verification: grep for claimed Lean theorems in their cited files. (Same method as Fire 41 for t1dm Lean.)
Sorry count: count actual sorry tactics in authored Lean files (excluding .lake/packages/).
Internal consistency: cross-document numbers must agree (theorem counts, sorry counts, certificate values).
Proof-strategy honesty: "PROVEN" in synthesis docs must match actual Lean state (0 sorry in chained goal, or clearly labeled as conditional/partial).
Published-math accuracy: specific named theorems cited from literature (Perelman W-entropy, Liu-Pass 2020, Williams 2011, etc.) must be real and correctly attributed.

The audit's reflexive question

Does the gold-standard reference meet its own gold standard? Since all non-math audits used math/ns_blowup/attempts/attempt_849_ frobenius_ratio_gap.md as the standard, the math audit has to test whether math/ actually meets the standard it set.

Severity ladder

🔴 RED: internal contradiction, claimed theorem doesn't exist, sorry in a chained "proven" step, misattributed published result
🟡 YELLOW: cross-document version drift, number precision mismatch, uncited published theorem, "proven" weaker than asserted
🟢 GREEN: internal consistency, Lean theorem verified present, sorry count matches claim, well-attributed published result

Fire log

#	Date	Scope	Result
9	2026-04-15	Cross-document claim verification (W_NS + Liu-Pass + Williams + RH certs + YM 66σ)	Wrote `attempt_009_audit_cross_doc_claim_verification.md`. 1 🔴 R6 (resolves Y2): SEVEN_WALLS.md RH row L151 has stale "668 zeros T=1000, Li n≤200" — Lean source of truth is 689 zeros + Li n≤1000 (per NumericalVerificationDepth.lean L11-12, L48 `candidates_checked := 689`). CLAY_PROBLEMS matches Lean; SEVEN_WALLS is the stale doc. 8 🟢: W_NS formula VERIFIED end-to-end — UNDERGROUND_CONNECTIONS cites "W_NS(u,f,τ) = ∫[τ(
8	2026-04-15	yang_mills attempts (76 files, largest math subdir)	Wrote `attempt_008_audit_yang_mills_attempts.md`. 0 🔴. 3 🟡 (naming convention legend missing; THEWALL snapshot stale "50 attempts, 16 Lean"→ actual 76/23; attempt_063 Phase 5 label ambiguity). 11 🟢: 76 attempts (largest math research output — active theory frontier); THEWALL "What We Proved" 8-row table per-result-file-method citation; THEWALL "What We Killed" 6-row table with kill-reasons + attempt refs (Lee-Yang/spin-foam/Balaban/FKG-SU(2)/convexity/Faddeev-Niemi) = Maps-Include-Noise; attempt_063 recent active work (dated 2026-04-15, A₄(⟨Tr(P)·Tr(Q)⟩) counting lemma closing strong-coupling GC>0); attempt theme diversity (gap-analysis, Peter-Weyl, center-decomp, interpolation, plaqprod-counting, combinatorial lemmas, two-loop); attempts feed Lean (attempt_028 ref'd for (5.15) step n₀); 76 attempts × 138L avg = substantial depth. Research frontier: A₄ counting lemma (063-065) + weak-coupling (056) + Option 1 HoeffdingCert → full mass gap closure.
7	2026-04-15	liouville + poincare + RH batched (3 subdirs, theorem counts verified)	Wrote `attempt_007_audit_liouville_poincare_rh.md`. 0 🔴. 4 🟡 (RH eclectic naming; liouville Prelude 0-thm purpose; poincare 005/009 missing; liouville 4 claimed vs 1 actual sorry). 12 🟢: all 3 subdirs' Lean theorem counts match CLAY_PROBLEMS exactly (liouville 7, poincare 64, RH 19); liouville 10 continuous attempts; liouville NS-sub-campaign labeling consistent; poincare 3 top-level docs + 8 Lean files matching SOLVED; poincare 1 sorry = def placeholder SimplyConnected (Fire 1 R2 resolved to "SOLVED proof-level; topology library pending"); RH 4 sorry = def placeholders (ζ/λ_n/σ/Λ) not proof gaps; RH 5 Lean files mapping to CLAY claims; poincare 12/12 blind rediscovery trajectory preserved (001 blind_start → 010 closing_step9).
6	2026-04-15	hodge + p_vs_np + prime_numbers batched (3 subdirs)	Wrote `attempt_006_audit_hodge_pvsnp_primes.md`. 0 🔴. 3 🟡 (p_vs_np 3 attempts vs 14 Lean files ratio; hodge even-numbered convention undocumented; prime_numbers cert-driven methodology not labeled). 10 🟢: hodge 9-attempt thematic progression 002-016; hodge 20 theorems matching CLAY; hodge 0 live sorry; p_vs_np 14 Lean + Liu-Pass formalization; p_vs_np 2 sorry = TM placeholders; prime_numbers README explicitly "not Clay" labeling; 29 certs × 6 target categories; sieve_core.py infrastructure; Skewes historical reference.
5	2026-04-15	birch_swinnerton_dyer full subdir pass	Wrote `attempt_005_audit_birch_swinnerton_dyer.md`. 0 🔴. 3 🟡 (attempts 001/003-005 missing; empty final_proof/; Lean axiom-scaffolding undocumented). 8 🟢: one-sentence wall ("Heegner gives one point, nobody can give two"); per-rank status table; 5-mountains framework (arith geometry / analytic moments / topology / physics / computation) with per-mountain tools+gap+underground; Kolyvagin/Kato/Gross-Zagier/Bhargava cited with years; "higher Gross-Zagier" research target named; RankTwoStructure.lean structural chain; 0 live sorry.
4	2026-04-15	Corpus-wide Lean sorry recount (all 9 math subdirs) — README numbers ready	Wrote `math/attempts/attempt_004_audit_corpus_wide_sorry_recount.md`. MAJOR finding: 1 🔴 R5 consolidates R3/R4 into corpus-wide claim: CLAY_PROBLEMS "86 per-problem / 90 subtotal / 6 remaining" sorry counts are grep-artifact inflated 4.5×. Actual corpus state: 117 authored Lean files / 19 live sorry categorized as 9 proof-tactic (ALL in ns_blowup: Blowup.lean 6 + Challenge.lean 3) and 10 definition/axiom placeholders (infrastructure holes). Per-subdir actual counts: BSD 0 (claimed 1), Hodge 0 (claimed 4), liouville 1 R_crit def, ns_blowup 9 proof-tactic, poincare 1 SimplyConnected def (not 9 as claimed — R2 resolves), prime_numbers 0 consistent, p_vs_np 2 TM def placeholders, RH 4 def placeholders (ζ, λ_n, σ, Λ — not proof gaps), yang_mills 2 Bessel axiom type-holes. 1 🟡 (Poincaré "SOLVED with 9 sorry" resolves cleanly to "SOLVED proof-level; 1 def placeholder pending topology library"). 8 🟢: BSD 0 live; Hodge 0 live; Prime numbers consistent; Poincaré 1 def not 9 proof-gaps; RH 4 def placeholders not proof gaps; p_vs_np 2 TM placeholders; liouville 1 R_crit; entire math corpus proof-tactic sorry = 9, all concentrated in ns_blowup Blowup.lean+Challenge.lean. Key implication: the math corpus is at "Lean closure modulo infrastructure" across 8 of 9 problems — only ns_blowup has active proof-tactic work remaining. Research frontier is concrete: 9 specific Lean goals in 2 files. README-update numbers READY: 117 files / 9 proof-tactic sorry / 10 infrastructure placeholders; per-problem actuals NS 9 / YM 2 / Hodge 0 / RH 4 / BSD 0 / P-vs-NP 2 / Poincaré 1 / Liouville 1 / Primes 0. Next fire options: audit attempts/ content per-subdir (BSD 4 smallest / yang_mills 76 largest), or begin final synthesis + README update preparation.
3	2026-04-15	yang_mills Lean state + named-theorem verification	Wrote `math/attempts/attempt_003_audit_yang_mills_lean.md`. 1 🔴 R4: CLAY_PROBLEMS yang_mills "18 sorry" is 9× overcount — actual live sorry: 2 (both `(sorry : ℝ)` type-holes in one axiom `character_coeff_bounds` at MKDecimation.lean L34 for Bessel coefficient bounds `c_j(β) = I_{2j+1}(β)/I_1(β) ∈ [0,1]` pending Bessel library); 20 total string occurrences across authored files dominated by "0 sorry" self-reports + 1 commented-out theorem (YangMills.lean L72 prefixed `--`). 2 🟡 (ProofChain.lean 3 of 9 theorems are `True := by trivial` routing placeholders with real content in comments — proof-strategy-honesty concern, rename to ProofChainDocumentation or use `have` delegation; 66σ claim cross-verification). 10 🟢: gc_mf_positive_all_beta VERIFIED (BesselBound.lean L116, real proof with tactic chain using besselI_pos + div_pos + bessel_ratio_in_unit_interval); gc_positive_if_overlap present (WeakStrongCoupling L72); full 3-theorem proof chain in WeakStrongCoupling (gc_positive_if_overlap + mc_implies_positive_intermediate + tomboulis_mass_gap); IntermediateBetaGap 6 theorems covering 4 options + gap_closed_by_any_option; HoeffdingCertificate 10 numerical-certificate theorems (iron_fortress_L6_b40, all_measurements_positive_L4_b40, xi_uniform_bound, hoeffding_exponent_negative, option1_activated_by_numerics, numerical_track_delivers_option1); 23 files / 2 live sorry = ~0.01% sorry density at theorem level; multi-regime coverage (strong / intermediate / weak / joining / numerical); ProofChain as documentation artifact (Maps-Include-Noise with comment pointers); 66σ claim formally backed. Generalized overcount pattern confirmed: NS 4.5× (Fire 2), YM 9× (Fire 3). The more disciplined the per-file documentation ("0 sorry" self-reports), the LARGER the grep-count artifact. Applied corpus-wide, stricter regex would reveal the math Lean state is substantially stronger than synthesis layer reports. Next fire: poincare + remaining subdirs to finalize corpus-wide sorry recount for README update.
2	2026-04-15	ns_blowup gold standard reference (attempt_849 + authored Lean sorry count)	Wrote `math/attempts/attempt_002_audit_ns_blowup_lean_and_849.md`. attempt_849 PASSES reflexive verification: all cited Lean files exist (TraceFreeAlignment.lean, SelfAnnihilation.lean, CrossModeBound.lean), all 3 named theorems (trace_free_largest_eigenvalue_bound at L116, trace_free_smallest_eigenvalue_bound at L37, trace_free_operator_norm_bound at L126) present at correct lines with 0 actual sorry tactics in TraceFreeAlignment.lean; numerics scripts (alignment_anatomy.py, analytical_S2e_bound.py) exist; 0.66 measurement cross-verified at gap.md L101. Gold standard holds up. 1 🔴 R3 (refines R1): CLAY_PROBLEMS "NS: 41 sorry" is grep-artifact inflation — actual sorry tactics: 9 (6 in Blowup.lean + 3 in Challenge.lean) not 41 (43 string occurrences mostly from self-referential "0 sorry" comments like `N4WorstCase.lean:168 "Total: 9 proved + 1 axiom, 0 sorry"`); "all in NS Blowup.lean" clause FALSE (misses Challenge.lean 3 sorry); overcount factor 4.5×. 1 🟡 (other per-problem sorry counts likely have same inflation). 11 🟢: attempt_849 verified end-to-end (Lean files + theorems + numerics + cross-ref); ns_blowup 54 authored Lean files / 9 actual sorry tactics / 485 theorems = 1.9% open (strong formalization state, materially better than CLAY_PROBLEMS 8.5% implied); per-theorem "0 sorry" docstrings are inline self-reporting explaining grep overcount; attempt_849's 6 claim-backing checklist features all present (specific file+theorem citations, numerics by file+measurement+config count, exact inequalities with margins, proven-vs-remains, self-audit note, dead ends preserved). Key finding: the real ns_blowup state is materially better than the synthesis claims — `grep -c sorry` is unreliable for Lean state audits because per-theorem docstrings include "0 sorry" self-reports. Stricter regex `^\ssorry\s$
1	2026-04-15	math/ top-level synthesis (CLAY_PROBLEMS + SEVEN_WALLS + QUANTIFIED_GAPS + UNDERGROUND_CONNECTIONS L1-80)	See `attempts/attempt_001_audit_toplevel_synthesis.md`. 2 🔴 (internal sorry-count contradiction in CLAY_PROBLEMS.md L113-114 "Remaining sorry: 6" vs table sum 86 sorry across 7 problems; "Poincaré SOLVED" with 9 sorry in table — contradictory labels). 4 🟡 (CLAY_PROBLEMS top-of-file "824 thms / 129 files" vs Statistics "862 thms / 118 files" vs "math Lean files: 105" — three inconsistent totals; cross-doc RH numerics mismatch: CLAY "Turing 689 zeros, Li n≤1000" vs SEVEN_WALLS "668 zeros, Li n≤200"; NS c(4) precision drift: "0.561" vs "0.5608"; math subtotal "755 thms+lemmas, 90 sorry" doesn't reconcile with per-problem table). 9 🟢: 5-wall taxonomy (Quantitative/Structural/Conceptual/Existential/Meta) with per-type systematic-approach effectiveness %; "The gap is a NUMBER" principle with per-problem number identified; cross-problem transfer map (Poincaré→NS via W-entropy analogy; YM→RH via spectral gap; YM→Selberg λ₁ transfer concrete action); per-problem wall type labeled; monotone-quantities-win pattern (Poincaré W, YM GC, NS enstrophy); thermodynamic-analogy pattern identified across 3 problems; group-symmetry-reduces-to-finite-computation pattern; "one genius insight bottleneck" honest per-problem labeling; "what the method IS vs does NOT" explicit scope statement (L121-140).

Queue

math/ top-level docs (1 of 1 batches done)

✅ CLAY_PROBLEMS.md + SEVEN_WALLS.md + QUANTIFIED_GAPS.md + UNDERGROUND_CONNECTIONS.md (L1-80)

math/ subdirs (0 of 9 done)

ns_blowup/ (gold standard reference — audit first)
- attempts (19 files)
- Lean (~50 authored files, check sorry count)
- cross-refs: attempt_849 used as non-math audit standard
yang_mills/ (largest — 76 attempts + 23 Lean)
riemann_hypothesis/ (10 attempts + 5 Lean)
liouville_conjecture/ (10 attempts + 5 Lean)
hodge_conjecture/ (9 attempts + 5 Lean)
poincare_conjecture/ (8 attempts + 8 Lean) — SOLVED status
birch_swinnerton_dyer/ (4 attempts + 2 Lean)
p_vs_np/ (3 attempts + 14 Lean)
prime_numbers/ (0 attempts + 1 Lean) — Phase 1 stub

Cross-cutting verification

Sorry count across all authored Lean files (exclude .lake/)
Theorem/file count reconciliation (CLAY_PROBLEMS has 3 different totals)
Named-theorem grep verification for 7 synthesis-cited theorems (SurgerySurvival, WeakStrongCoupling, TannakianReformulation, RankTwoStructure, CertificateEquivalence, MetaComplexity, FinalKeyLemma)
Published-theorem attribution check (Perelman, Liu-Pass, Williams, OS78, Polymath 15)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Math Claim-Backing Audit Log

The standard shifts from non-math

New verification axes

The audit's reflexive question

Severity ladder

Fire log

Queue

math/ top-level docs (1 of 1 batches done)

math/ subdirs (0 of 9 done)

Cross-cutting verification

FilesExpand file tree

AUDIT_LOG.md

Latest commit

History

AUDIT_LOG.md

File metadata and controls

Math Claim-Backing Audit Log

The standard shifts from non-math

New verification axes

The audit's reflexive question

Severity ladder

Fire log

Queue

math/ top-level docs (1 of 1 batches done)

math/ subdirs (0 of 9 done)

Cross-cutting verification