feat(papers): adversarial methods-grounded multi-category matrix audit #33
benjibromberg wants to merge 6 commits into
Add Phase 4 of the Zotero⇄CAAIL lifecycle: a methods-grounded re-audit of the Papers.md matrix itself.

- caail-classification-reviewer agent: read-only, full-text-grounded reviewer of (method × area) placement, distinct from the bibliographic citation reviewer. Verdicts are DEFENSIBLE / MISPLACED / UNSUPPORTED per cell, plus MISSING-CELL recommendations and a NOT-PRIMARY flag.
- matrix-classification-audit skill + extract_matrix_corpus.py: parses the matrix and references out of Papers.md, indexes both Zotero groups by DOI and URL, and pulls each matrix paper's methods section from the PDF full-text cache into a per-ref corpus for adversarial review.
- Register the new skill (Phase 4) and reviewer in CLAUDE.md; cross-reference it from zotero-to-caail-sync; gitignore the corpus build artifact.
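The DOI + URL indexing step described in the commit above could look roughly like the sketch below. This is not the code in extract_matrix_corpus.py: the function names, the `DOI` / `url` item keys, and the normalization rules (lowercasing, stripping resolver prefixes, `www.`, fragments, and trailing slashes) are all assumptions chosen for illustration.

```python
from typing import Optional

# Hypothetical resolver prefixes to strip; the real script may use a different set.
DOI_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def normalize_doi(raw: Optional[str]) -> Optional[str]:
    """Lowercase a DOI and strip a leading resolver prefix (assumed convention)."""
    if not raw:
        return None
    doi = raw.strip().lower()
    for prefix in DOI_PREFIXES:
        if doi.startswith(prefix):
            doi = doi[len(prefix):]
            break
    return doi.strip() or None


def normalize_url(raw: Optional[str]) -> Optional[str]:
    """Drop scheme, 'www.', fragment, and trailing slash so near-identical URLs match.

    Lowercasing the whole URL is a simplification: it can conflate
    case-sensitive paths.
    """
    if not raw:
        return None
    url = raw.strip().lower().split("#", 1)[0]
    for scheme in ("https://", "http://"):
        if url.startswith(scheme):
            url = url[len(scheme):]
            break
    if url.startswith("www."):
        url = url[4:]
    return url.rstrip("/") or None


def build_index(items: list) -> dict:
    """Map 'doi:<id>' and 'url:<id>' keys to items; the first item wins on collisions."""
    index = {}
    for item in items:
        doi = normalize_doi(item.get("DOI"))
        url = normalize_url(item.get("url"))
        if doi:
            index.setdefault(f"doi:{doi}", item)
        if url:
            index.setdefault(f"url:{url}", item)
    return index


def lookup(index: dict, doi: Optional[str] = None, url: Optional[str] = None):
    """Prefer a DOI match and fall back to the URL; return None if neither hits."""
    d = normalize_doi(doi)
    if d and f"doi:{d}" in index:
        return index[f"doi:{d}"]
    u = normalize_url(url)
    if u:
        return index.get(f"url:{u}")
    return None
```

Items from both Zotero groups would be concatenated before calling `build_index`, so a reference in Papers.md resolves regardless of which group holds it.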
Held audit proposals: for human triage

The methods-grounded audit behind this PR also surfaced 9 moves, 27 removes, and 14 “leave the matrix” proposals. They were deliberately not applied here because they challenge CAAIL's curatorial choice to catalogue general/foundational methods (the strict reviewer wanted papers like scGPT, GEARS, UCE, SWE-bench, GPQA out of cell-ag columns). Each cleared a majority of 3 independent adversarial skeptics, but each is a curatorial call: tick the ones to action in a follow-up.

They split into two natures: (A) method-accuracy fixes (the cell names the wrong technique) and (B) scope/philosophy calls (the method is right, but the application isn't cell-ag-specific). A-type fixes are the safer subset.
Moves — reclassify an existing cell (9)
Removes — drop an existing cell (27)
Not-primary — proposed to leave the matrix entirely (14)
Add 18 cross-listings where a paper substantively applies more than one AI/ML method (verified against each paper's methods section via the matrix-classification-audit workflow; each survived independent adversarial review). No reference text changed; matrix cells only.

- ref 11 Shen 2024 → Ensemble Learning (InfoGAN + dynamically-weighted base classifiers)
- ref 20 Rafieyan 2024 → Ensemble Learning (XGBoost/GBM/RF/LightGBM)
- ref 26 Sun 2023, ref 28 Sun 2026 → SVM / Ensemble (LS-SVM, RF/GBDT/SVC)
- ref 32 Roell 2022 → Deep Learning / Ensemble / K-Nearest Neighbors (seven model families benchmarked for bioprocess prediction)
- ref 61 Wang 2025b, ref 93 Tang 2026 → Agent Infrastructure (LangGraph / hybrid knowledge frameworks)
- ref 68 Li 2024 → GNN (GEM-as-graph submodule)
- ref 117 Cui 2024, ref 120 Rizvi 2026 → Cell-State & Perturbation Prediction; ref 120 also Reinforcement Learning (GRPO)
- ref 161 Narayanan 2025 → Reinforcement Learning (RL-trained chemistry model)
- ref 169 Hashizume & Ying 2025 → Ensemble Learning / Genetic Algorithms
- ref 182 King 2004 → Active Learning (experiment-selection strategy)
ref 72 trains 18 models including SVM and MLP/Bayesian neural networks for sensory (flavor) prediction under 10-fold CV — confirmed by full-text re-verification as the paper's own applied methods, not a background enumeration. Adds (SVM × Sensory) and (Deep Learning × Sensory) alongside its existing Ensemble Learning placement.
Force-pushed from 568dfc5 to 2fe066d.
…movals

The audit's #33 false positive (it proposed deleting a CNN-surrogate-CFD paper from Bioprocess control, which ResearchAreas/Bioprocess.md already cites) traced to two gaps: the reviewer read only the paper (never CAAIL's own curation context), and a destructive removal carried no more burden than an additive placement. This bakes an asymmetric, context-aware burden on scope removals into the durable tooling.

- extract_matrix_corpus.py: add per-ref cited_in_research_areas (scan ResearchAreas/*.md by surname+year / DOI), an intentional-placement KEEP prior (correctly flags #33 -> Bioprocess control).
- caail-classification-reviewer: read the ResearchAreas/<Area>.md scope and honor that prior before any scope call; tag every verdict nature=method-accuracy|scope; default a general-method scope concern to a MOVE to AI Tooling / Methodology, not a removal; method-absent papers stay a firm method-accuracy flag; never hedge a non-fitting paper into a destination-less move.
- .claude/workflows/matrix-classification-audit.js: durable named workflow: propose -> skeptics -> (scope only) steelman defender -> gated domain-relevance web grounding. method-accuracy + additive changes bypass the heavy layers. Self-bootstraps inputs from matrix-corpus.json (args is not reliably delivered); fan-out pinned to Sonnet.
- SKILL.md / CLAUDE.md: document the asymmetric burden, the layers, and the named-workflow invocation.

Behavioral mini-eval (#33, #151, #152, #155): #33 now kept; SWE-bench (#155) correctly flagged NOT-PRIMARY by the defender; #152 scope-removal overturned via the curator-citation prior. No Papers.md content change.
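As a rough illustration of the cited_in_research_areas prior described above, the sketch below matches a paper against area pages by surname+year or DOI. It is an assumption-laden approximation, not the logic in extract_matrix_corpus.py: the function names, the 40-character window between surname and year, and the case-insensitive DOI substring match are all hypothetical choices.

```python
import re
from pathlib import Path
from typing import Optional


def load_area_pages(areas_dir: Path) -> dict:
    """Read every *.md page in the directory into {page stem: text}."""
    return {
        page.stem: page.read_text(encoding="utf-8")
        for page in sorted(areas_dir.glob("*.md"))
    }


def find_citing_areas(
    surname: str, year: int, doi: Optional[str], pages: dict
) -> list:
    """Return the area names whose page mentions the paper.

    A page counts as citing the paper if it contains the DOI, or the surname
    followed on the same line (within 40 characters) by the year, optionally
    with a letter suffix such as "2025b".
    """
    name_year = re.compile(
        rf"\b{re.escape(surname)}\b[^\n]{{0,40}}?\b{year}[a-z]?\b",
        re.IGNORECASE,
    )
    hits = []
    for area in sorted(pages):
        text = pages[area]
        by_doi = bool(doi) and doi.lower() in text.lower()
        if by_doi or name_year.search(text):
            hits.append(area)
    return hits
```

A non-empty result would act as the KEEP prior: the reviewer sees that curators already cite the paper in that area before it makes any scope call. Surname+year matching can produce false positives for common surnames, so a DOI hit is the stronger signal.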
…racy

Closes the residual gap from 7fe068c: the method-accuracy path bypassed the defender, so a method-accuracy verdict on a paper the curators cite in a ResearchAreas page could apply a removal of its only cell, orphaning it and severing the live cross-reference (the exact risk the defender flagged for #152). A wrong method row on a cited paper is now a re-row, not a deletion:

- workflow: proposer reports cited_by_curators; adjudicate() routes any removal (unsupported / not_primary) of a cited paper through the steelman defender regardless of nature. A re-row MOVE or an uncited method-accuracy fix still needs only skeptics; scope removals still reach the defender (gated, not blanket).
- reviewer agent: a curator-cited paper is never UNSUPPORTED/NOT-PRIMARY; a wrong method row is a MISPLACED re-row.

Verified by a deterministic truth-table check of the routing guard (7/7) plus the behavioral mini-eval (#33 kept; cited #151/#152 received no applied removal). No Papers.md change.
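The routing rule in this commit can be read as a small predicate. The sketch below is one possible reading, not the actual adjudicate() in the workflow: the `Proposal` type, the lowercase verdict strings, and the choice to treat a scope MOVE as skeptics-only are assumptions made for illustration.

```python
from dataclasses import dataclass

# Verdicts that delete a cell (assumed spellings, following the commit text).
REMOVAL_VERDICTS = {"unsupported", "not_primary"}


@dataclass(frozen=True)
class Proposal:
    verdict: str             # e.g. "misplaced" (a MOVE), "unsupported", "not_primary", "missing_cell"
    nature: str              # "method-accuracy" or "scope"
    cited_by_curators: bool  # paper is cited in a ResearchAreas page


def needs_defender(p: Proposal) -> bool:
    """A removal reaches the steelman defender if it is a scope call
    or if it targets a curator-cited paper, whatever its nature."""
    if p.verdict not in REMOVAL_VERDICTS:
        return False  # moves and additive changes need only skeptics
    return p.nature == "scope" or p.cited_by_curators


def route(p: Proposal) -> list:
    """Return the ordered review stages a proposal must clear."""
    stages = ["skeptics"]
    if needs_defender(p):
        stages.append("defender")
    return stages
```

Under this reading, an uncited method-accuracy removal is the only kind of removal that skips the defender, which matches the asymmetric burden the commit describes: the more destructive and context-dependent a change is, the more layers it has to clear.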
Hardened re-scrutiny of the held proposals

Re-ran all 32 held-proposal papers through the hardened pipeline (propose → skeptics → steelman defender for scope/cited removals → gated domain grounding). The over-strict scope/philosophy deletions are gone:
✅ Apply — method-accuracy fixes, orphan-safe (5)
Adds a non-destructive taxonomy_gap verdict so the classification audit can keep a paper that applies a real AI/ML method whose matrix row/column does not yet exist, and surface a proposed new row/column for curator decision instead of forcing a wrong cell or orphaning the paper.

- reviewer: taxonomy_gap verdict + precedence ladder (gap is the last resort, after re-row into an existing label); method-family precision notes (Bayesian Optimization vs Bayesian inference; GNN vs classical network propagation) so a step-2 re-row does not grab a superficially-similar row and bury the real gap.
- workflow: taxonomy_gaps schema; per-ref collection that never enters the adjudicated change set (so a gap can never become an applied removal); a Taxonomy phase that clusters pooled gaps and adversarially verifies clusters of >=2 papers into proposed new rows/columns (singletons are parked).
- verify_routing.mjs: deterministic guard covering the asymmetric-burden routing truth-table, the non-orphan structural invariant, and the >=2 cluster gate.
- docs: SKILL.md (verdict, non-orphan guarantee, run-full-corpus note, human-applied rows/columns) and the CLAUDE.md lifecycle entry.

Verified: verify_routing 20/20; behavioral mini-eval over refs 59, 60, 34, 121 (taxonomy gaps for 59 and 60, re-row for 34, drop-redundant-cell for 121) kept all four papers (no orphaning removal).
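The >=2 cluster gate from the Taxonomy phase can be sketched as follows. This is a simplified stand-in: the real workflow clusters pooled gaps by its own means, whereas this sketch groups them by a whitespace- and case-normalized label, and the names `gate_clusters` and `MIN_CLUSTER_SIZE` are hypothetical.

```python
from collections import defaultdict

# Clusters below this size are parked rather than verified (per the commit text).
MIN_CLUSTER_SIZE = 2


def gate_clusters(gaps: list) -> tuple:
    """Split pooled taxonomy gaps into clusters to verify and parked singletons.

    `gaps` is a list of (ref_id, proposed_label) pairs. Labels are grouped by
    a normalized form; this exact-match grouping is an assumption standing in
    for the workflow's real clustering.
    """
    by_label = defaultdict(set)
    for ref_id, label in gaps:
        key = " ".join(label.lower().split())
        by_label[key].add(ref_id)  # a set, so one paper counts once per label

    to_verify, parked = {}, {}
    for label, refs in by_label.items():
        target = to_verify if len(refs) >= MIN_CLUSTER_SIZE else parked
        target[label] = sorted(refs)
    return to_verify, parked
```

Counting distinct papers rather than raw gap records keeps a single paper that reports the same gap twice from promoting itself past the gate. Neither output feeds the adjudicated change set, so a gap can only ever become a proposal for curators.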
Summary
Re-audits the Papers.md matrix against each paper's methods section (pulled from the caail Zotero full-text cache) and adds multi-category placements where a paper substantively applies more than one AI/ML method. Two parts:
1. Reusable tooling (Phase 4 of the Zotero⇄CAAIL lifecycle)

- .claude/agents/caail-classification-reviewer.md: read-only, full-text-grounded reviewer of (method × area) placement (distinct from the bibliographic caail-citation-reviewer).
- .claude/skills/matrix-classification-audit/: SKILL.md + extract_matrix_corpus.py (reuses scope.py's Zotero helpers; indexes both groups by DOI + URL; pulls each matrix paper's methods text into a per-ref corpus for adversarial review).
- Registered in CLAUDE.md (3→4 skill lifecycle + reviewer-agents list) and cross-referenced from zotero-to-caail-sync. Corpus is gitignored.

2. 20 verified multi-category cross-listings (matrix cells only: no reference text changed, no IDs renumbered)

Each was proposed by a methods-reading agent and survived 3 independent adversarial skeptics; the lower-confidence ones were re-checked by a fresh agent. Highlights:

- #32 Roell 2022 → Deep Learning / Ensemble / K-Nearest Neighbors (7-model bioprocess benchmark)
- #169 Hashizume & Ying 2025 → Ensemble / Genetic Algorithms
- #117 Cui 2024 + #120 Rizvi 2026 → Cell-State & Perturbation Prediction; #120 also Reinforcement Learning
- #11/#20/#26/#28/#68/#61/#93/#161/#182/#72 → see commits for the per-paper method spans

The audit also surfaced 9 moves / 27 removes / 14 "leave the matrix" proposals. These are deliberately not included here: they challenge CAAIL's curatorial choice to catalogue general/foundational methods (e.g. it wanted scGPT, GEARS, SWE-bench out of cell-ag columns), so they're held for human triage rather than auto-applied.
How the matrix supports this
Multi-cell classification needed zero code changes: the parser already accumulates methods[]/areas[] per reference across cells, so this PR only adds matrix anchors.

Test Plan

- pnpm --dir site lint:papers: 0 hard errors (no dangling anchors, no orphaned primary refs)
- pnpm --dir site test: 289/289 pass (incl. multi-cell parser test)
- pnpm --dir site parse: generate-data.ts cross-tally assertions pass
… #16)

🤖 Generated with Claude Code