feat(papers): adversarial methods-grounded multi-category matrix audit by benjibromberg · Pull Request #33 · tucca-cellag/caail

benjibromberg · 2026-06-02T18:02:17Z

Summary

Re-audits the Papers.md matrix against each paper's methods section (pulled
from the caail Zotero full-text cache) and adds multi-category placements where
a paper substantively applies more than one AI/ML method. Two parts:

1. Reusable tooling — Phase 4 of the Zotero⇄CAAIL lifecycle

- .claude/agents/caail-classification-reviewer.md: read-only, full-text-grounded reviewer of (method × area) placement (distinct from the bibliographic caail-citation-reviewer).
- .claude/skills/matrix-classification-audit/: SKILL.md + extract_matrix_corpus.py (reuses scope.py's Zotero helpers; indexes both groups by DOI + URL; pulls each matrix paper's methods text into a per-ref corpus for adversarial review).
- Registered in CLAUDE.md (3→4 skill lifecycle + reviewer-agents list) and cross-referenced from zotero-to-caail-sync. Corpus is gitignored.
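
The DOI + URL indexing step can be pictured roughly as follows. This is a minimal sketch, not the code in extract_matrix_corpus.py: the item shape (dicts with `key`, `DOI`, `url`) and the helper names `normalize_doi`, `normalize_url`, `build_index`, and `lookup` are assumptions made for illustration.

```python
from urllib.parse import urlsplit

# Resolver prefixes stripped before comparison (assumed convention).
DOI_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def normalize_doi(doi):
    """Lowercase a DOI and strip a leading resolver prefix, if any."""
    doi = doi.strip().lower()
    for prefix in DOI_PREFIXES:
        if doi.startswith(prefix):
            return doi[len(prefix):]
    return doi


def normalize_url(url):
    """Compare URLs by host + path only, ignoring scheme, query, and trailing slash."""
    parts = urlsplit(url.strip())
    return (parts.netloc.lower() + parts.path).rstrip("/")


def build_index(groups):
    """Index the items of every group by DOI and by URL (first occurrence wins)."""
    by_doi, by_url = {}, {}
    for items in groups:
        for item in items:
            if item.get("DOI"):
                by_doi.setdefault(normalize_doi(item["DOI"]), item)
            if item.get("url"):
                by_url.setdefault(normalize_url(item["url"]), item)
    return by_doi, by_url


def lookup(ref, by_doi, by_url):
    """Prefer a DOI match and fall back to the URL."""
    if ref.get("DOI"):
        hit = by_doi.get(normalize_doi(ref["DOI"]))
        if hit:
            return hit
    if ref.get("url"):
        return by_url.get(normalize_url(ref["url"]))
    return None
```

Keeping two indexes means a matrix reference that carries only a URL (for example a preprint without a DOI) can still be matched to its full text.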

2. 20 verified multi-category cross-listings (matrix cells only: no reference text changed, no IDs renumbered)

Each was proposed by a methods-reading agent and survived 3 independent adversarial skeptics; the lower-confidence ones were then re-checked by a fresh agent. Highlights:

- #32 Roell 2022 → Deep Learning / Ensemble / K-Nearest Neighbors (7-model bioprocess benchmark)
- #169 Hashizume & Ying 2025 → Ensemble / Genetic Algorithms
- #117 Cui 2024 + #120 Rizvi 2026 → Cell-State & Perturbation Prediction; #120 also Reinforcement Learning
- #11/#20/#26/#28/#68/#61/#93/#161/#182/#72 → see commits for the per-paper method spans

The audit also surfaced 9 moves / 27 removes / 14 "leave the matrix" proposals. These are deliberately not included here: they challenge CAAIL's curatorial choice to catalogue general/foundational methods (e.g. the strict reviewer wanted scGPT, GEARS, and SWE-bench out of the cell-ag columns), so they are held for human triage rather than auto-applied.

How the matrix supports this

Multi-cell classification needed zero code changes — the parser already accumulates methods[]/areas[] per reference across cells; this PR is purely additive matrix anchors.
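
For readers unfamiliar with the parser, the accumulation behaviour described above amounts to something like the sketch below. It is written in Python for illustration only (the site's real parser lives in the pnpm/TypeScript toolchain), and the function name `accumulate_placements` and the `(method, area, ref_ids)` cell shape are assumptions.

```python
from collections import defaultdict


def accumulate_placements(cells):
    """Collect every method and area under which a reference appears.

    `cells` is an iterable of (method, area, ref_ids) tuples, one per matrix cell.
    A reference listed in several cells ends up with several methods and/or areas.
    """
    placements = defaultdict(lambda: {"methods": [], "areas": []})
    for method, area, ref_ids in cells:
        for ref_id in ref_ids:
            entry = placements[ref_id]
            # Preserve first-seen order and avoid duplicates.
            if method not in entry["methods"]:
                entry["methods"].append(method)
            if area not in entry["areas"]:
                entry["areas"].append(area)
    return dict(placements)
```

Because each cell only appends to a per-reference record, adding an anchor to a second cell is enough to cross-list a paper; nothing else in the pipeline has to know that the paper now has two methods.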

Test Plan

pnpm --dir site lint:papers — 0 hard errors (no dangling anchors, no orphaned primary refs)
pnpm --dir site test — 289/289 pass (incl. multi-cell parser test)
pnpm --dir site parse — generate-data.ts cross-tally assertions pass
Fresh-agent adversarial re-check on the lowest-confidence adds (caught + reverted one over-eager placement, #16)

🤖 Generated with Claude Code

Add Phase 4 of the Zotero⇄CAAIL lifecycle: a methods-grounded re-audit of the Papers.md matrix itself.

- caail-classification-reviewer agent: read-only, full-text-grounded reviewer of (method × area) placement, distinct from the bibliographic citation reviewer. Verdicts DEFENSIBLE / MISPLACED / UNSUPPORTED per cell, plus MISSING-CELL recommendations and a NOT-PRIMARY flag.
- matrix-classification-audit skill + extract_matrix_corpus.py: parses the matrix and references out of Papers.md, indexes both Zotero groups by DOI and URL, and pulls each matrix paper's methods section from the PDF full-text cache into a per-ref corpus for adversarial review.
- Register the new skill (Phase 4) and reviewer in CLAUDE.md; cross-reference it from zotero-to-caail-sync; gitignore the corpus build artifact.

benjibromberg · 2026-06-02T18:06:24Z

Held audit proposals — for human triage

The methods-grounded audit behind this PR also surfaced 9 moves, 27 removes, and 14 “leave the matrix” proposals. They were deliberately not applied here because they challenge CAAIL's curatorial choice to catalogue general/foundational methods (the strict reviewer wanted papers like scGPT, GEARS, UCE, SWE-bench, GPQA out of cell-ag columns). Each cleared a majority of 3 independent adversarial skeptics, but each is a curatorial call — tick the ones to action in a follow-up.

They split into two natures: (A) method-accuracy fixes (the cell names the wrong technique) and (B) scope/philosophy calls (method is right, but the application isn't cell-ag-specific). A-type fixes are the safer subset.

Update — #33 overturned. A full-text + domain-literature re-review (stirred-tank mixing CFD is core to cultivated-meat bioreactor scale-up; ResearchAreas/Bioprocess.md already cites this paper) reversed both #33 proposals — it stays in CNN × Bioprocess control. Its struck entries below are kept for the record. The over-strictness that flagged #33 likely affects other (B) scope/philosophy removals too — re-check before actioning.

Moves — reclassify an existing cell (9)

#6 · Ji et al. 2021 — DNABERT: pre-trained Bidirectional Encoder Representations from Transformers model for DNA-language in genome (method+scope)
- Deep Learning × Cellular Engineering → Foundation Models: Masked Language Modeling × AI Tooling / Methodology
- DNABERT follows the same training process as BERT... we significantly modified the pretraining process from the original BERT implementation by removing next sentence prediction, adjusting the sequence length and forcing the model to predict contiguous k tokens adapting to DNA scenario. During pre-t…
#13 · Wang et al. 2021 — scGNN is a novel graph neural network framework for single-cell RNA-Seq analyses (scope/philosophy)
- GNN × Cellular Engineering → GNN × AI Tooling / Methodology
- scGNN integrates three iterative multi-modal autoencoders and outperforms existing tools for gene imputation and cell clustering on four benchmark scRNA-Seq datasets. In an Alzheimer's disease study with 13,214 single nuclei from postmortem brain tissues, scGNN successfully illustrated disease-relat…
#17 · Cosenza & Block 2021 — A generalizable hybrid search framework for optimizing expensive design problems using surrogate models (method+scope)
- Genetic Algorithms × Media Optimization → Genetic Algorithms × AI Tooling / Methodology
- The NNGA algorithm is based on an RBF-assisted genetic algorithm. The NNGA uses an RBF model to suggest points that are close to but not directly on top of optima, using a truncated genetic algorithm (TGA).
#34 · Andrews et al. 2025 — Designing cultured tissue moulds using evolutionary strategies (method-accuracy)
- SVM × Scaffolding → Genetic Algorithms × Scaffolding
- Genetic algorithms (GA) are used here to find optimal mould designs. They are a form of optimisation algorithm that can be used to find solutions of complex or abstract problems. Used as a design tool, they constitute a form of artificial or computational intelligence.
#35 · Andrews et al. 2023 — Rapid prediction of lab-grown tissue properties using deep learning (method-accuracy)
- Deep Learning × Scaffolding → GAN / VAE × Scaffolding
- We use the TensorFlow framework for machine learning to implement the pix2pix conditional GAN (cGAN) described in [25].
#40 · Gao et al. 2025 — TxAgent: An AI Agent for Therapeutic Reasoning Across a Universe of Tools (method-accuracy)
- General-Purpose Biomedical Agents × AI Tooling / Methodology → Domain-Specific Biomedical Agents × AI Tooling / Methodology
- TOOLUNIVERSE has 211 biomedical tools, covering the following categories: adverse events, risks, safety; addiction and abuse; drug patient populations; drug administration and handling; pharmacology; drug use, mechanism, composition; ID and labeling tools; general clinical annotations; clinical labo…
#53 · Liu et al. 2026 — Advancing AI Research Assistants with Expert-Involved Learning (method-accuracy)
- Scientific Literature & Discovery Agents × AI Tooling / Methodology → Benchmarks & Evaluation Frameworks × AI Tooling / Methodology
- we propose a new dataset designed for evaluating the ability of FMs for long document summarization and scientific figure understanding... we collected the ground truth information paired with model outputs and performed a quantitative assessment with various metrics
#118 · Rosen et al. 2024 — Toward universal cell embeddings: integrating single-cell RNA-seq datasets across species with SATURN (scope/philosophy)
- Foundation Models: LM + Biological Priors × Cellular Engineering → Foundation Models: LM + Biological Priors × AI Tooling / Methodology
- Applying SATURN to three species whole-organism atlases and frog and zebrafish embryogenesis datasets, we show that SATURN can effectively transfer annotations across species, even when they are evolutionarily remote. We also demonstrate that SATURN can be used to find potentially divergent gene fun…
#126 · Youngblut et al. 2025 — scBaseCount: an AI agent-curated, uniformly processed, and autonomously updated single cell data repository (scope/philosophy)
- Domain-Specific Biomedical Agents × Cellular Engineering → Domain-Specific Biomedical Agents × AI Tooling / Methodology
- SRAgent is a Python package that utilizes LangGraph for constructing the agentic workflows... To comply with NCBI API rate limits, jobs are triggered every 1-5 minutes, processing 3-5 datasets per run... All extracted metadata is stored in a GCP SQL database for downstream processing.

Removes — drop an existing cell (27)

Not-primary — proposed to leave the matrix entirely (14)

Generated by the matrix-classification-audit workflow (run wf_810da7cd-742); each item cleared ≥2/3 adversarial skeptics. Not auto-applied — these are curatorial calls.
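
The ≥2/3 gate mentioned above is a plain majority check. A minimal sketch, assuming each skeptic returns a boolean "proposal upheld" vote (the function name `clears_skeptics` is hypothetical, not part of the workflow):

```python
def clears_skeptics(votes, required=2):
    """Return True when at least `required` skeptics uphold the proposal."""
    return sum(1 for vote in votes if vote) >= required
```

With three skeptics and `required=2`, a single dissent does not block a proposal, which is why clearing the gate is treated here as grounds for triage rather than for auto-applying.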

Add 18 cross-listings where a paper substantively applies more than one AI/ML method (verified against each paper's methods section via the matrix-classification-audit workflow; each survived independent adversarial review). No reference text changed; matrix cells only.

- ref 11 Shen 2024 → Ensemble Learning (InfoGAN + dynamically-weighted base classifiers)
- ref 20 Rafieyan 2024 → Ensemble Learning (XGBoost/GBM/RF/LightGBM)
- ref 26 Sun 2023, ref 28 Sun 2026 → SVM / Ensemble (LS-SVM, RF/GBDT/SVC)
- ref 32 Roell 2022 → Deep Learning / Ensemble / K-Nearest Neighbors (seven model families benchmarked for bioprocess prediction)
- ref 61 Wang 2025b, ref 93 Tang 2026 → Agent Infrastructure (LangGraph / hybrid knowledge frameworks)
- ref 68 Li 2024 → GNN (GEM-as-graph submodule)
- ref 117 Cui 2024, ref 120 Rizvi 2026 → Cell-State & Perturbation Prediction; ref 120 also Reinforcement Learning (GRPO)
- ref 161 Narayanan 2025 → Reinforcement Learning (RL-trained chemistry model)
- ref 169 Hashizume & Ying 2025 → Ensemble Learning / Genetic Algorithms
- ref 182 King 2004 → Active Learning (experiment-selection strategy)

ref 72 trains 18 models including SVM and MLP/Bayesian neural networks for sensory (flavor) prediction under 10-fold CV — confirmed by full-text re-verification as the paper's own applied methods, not a background enumeration. Adds (SVM × Sensory) and (Deep Learning × Sensory) alongside its existing Ensemble Learning placement.

…movals

The audit's #33 false positive (it proposed deleting a CNN-surrogate-CFD paper from Bioprocess control, which ResearchAreas/Bioprocess.md already cites) traced to two gaps: the reviewer read only the paper (never CAAIL's own curation context), and a destructive removal carried no more burden than an additive placement. This bakes an asymmetric, context-aware burden on scope removals into the durable tooling.

- extract_matrix_corpus.py: add per-ref cited_in_research_areas (scan ResearchAreas/*.md by surname+year / DOI), an intentional-placement KEEP prior (correctly flags #33 -> Bioprocess control).
- caail-classification-reviewer: read the ResearchAreas/<Area>.md scope and honor that prior before any scope call; tag every verdict nature=method-accuracy|scope; default a general-method scope concern to a MOVE to AI Tooling / Methodology, not a removal; method-absent papers stay a firm method-accuracy flag; never hedge a non-fitting paper into a destination-less move.
- .claude/workflows/matrix-classification-audit.js: durable named workflow: propose -> skeptics -> (scope only) steelman defender -> gated domain-relevance web grounding. method-accuracy + additive changes bypass the heavy layers. Self-bootstraps inputs from matrix-corpus.json (args is not reliably delivered); fan-out pinned to Sonnet.
- SKILL.md / CLAUDE.md: document the asymmetric burden, the layers, and the named-workflow invocation.

Behavioral mini-eval (#33,#151,#152,#155): #33 now kept; SWE-bench (#155) correctly flagged NOT-PRIMARY by the defender; #152 scope-removal overturned via the curator-citation prior. No Papers.md content change.
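
The cited_in_research_areas prior could be computed along these lines. This is a sketch under stated assumptions, not the code added to extract_matrix_corpus.py: the function signatures, the 40-character surname-to-year window, and the `{page name: text}` mapping are all illustrative choices.

```python
import re
from pathlib import Path


def load_pages(directory):
    """Read every *.md file in `directory` into a {file stem: text} mapping."""
    return {p.stem: p.read_text(encoding="utf-8") for p in Path(directory).glob("*.md")}


def cited_in_research_areas(surname, year, doi, pages):
    """Return the sorted names of pages that mention the paper.

    A page counts as citing the paper when it contains the DOI, or the
    surname followed on the same line, within 40 characters, by the year
    (e.g. "Doe et al., 2022").
    """
    pattern = re.compile(
        rf"\b{re.escape(surname)}\b[^\n]{{0,40}}?\b{year}\b", re.IGNORECASE
    )
    hits = []
    for name, text in pages.items():
        doi_hit = bool(doi) and doi.lower() in text.lower()
        if doi_hit or pattern.search(text):
            hits.append(name)
    return sorted(hits)
```

A non-empty result would then act as the KEEP prior: the reviewer sees that the curators placed the paper deliberately before it makes any scope call.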

…racy

Closes the residual gap from 7fe068c: the method-accuracy path bypassed the defender, so a method-accuracy verdict on a paper the curators cite in a ResearchAreas page could apply a removal of its only cell, orphaning it and severing the live cross-reference (the exact risk the defender flagged for #152). A wrong method row on a cited paper is now a re-row, not a deletion:

- workflow: proposer reports cited_by_curators; adjudicate() routes any removal (unsupported / not_primary) of a cited paper through the steelman defender regardless of nature. A re-row MOVE or an uncited method-accuracy fix still needs only skeptics; scope removals still reach the defender (gated, not blanket).
- reviewer agent: a curator-cited paper is never UNSUPPORTED/NOT-PRIMARY; a wrong method row is a MISPLACED re-row.

Verified by a deterministic truth-table check of the routing guard (7/7) plus the behavioral mini-eval (#33 kept; cited #151/#152 received no applied removal). No Papers.md change.
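
The routing rule described above can be written as a small predicate and checked row by row. The sketch below is not the workflow's adjudicate(): the `Proposal` dataclass, the `kind` vocabulary, and `needs_defender` are hypothetical names. How scope-natured moves are routed is not spelled out in the commit message; this sketch sends them to the defender, which is an assumption.

```python
from dataclasses import dataclass

# Verdicts that delete a cell (names taken from the commit message).
REMOVAL_KINDS = {"unsupported", "not_primary"}


@dataclass(frozen=True)
class Proposal:
    kind: str                # "add", "move", "unsupported", or "not_primary"
    nature: str              # "method-accuracy" or "scope"
    cited_by_curators: bool  # paper is cited on a ResearchAreas page


def needs_defender(p):
    """True when a proposal must face the steelman defender, not just the skeptics."""
    if p.kind == "add":
        return False  # additive placements bypass the heavy layers
    if p.kind in REMOVAL_KINDS and p.cited_by_curators:
        return True   # any removal of a curator-cited paper, regardless of nature
    # Remaining cases: scope concerns go to the defender (assumed to include
    # scope moves); method-accuracy re-rows and uncited fixes need only skeptics.
    return p.nature == "scope"
```

Writing the guard as a pure function is what makes a deterministic truth-table check possible: every combination of kind, nature, and citation status can be enumerated and asserted.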

benjibromberg · 2026-06-03T13:53:04Z

Hardened re-scrutiny of the held proposals

Re-ran all 32 held-proposal papers through the hardened pipeline (propose → skeptics → steelman defender for scope/cited removals → gated domain grounding). The over-strict scope/philosophy deletions are gone:

| | original audit | hardened re-scrutiny |
|---|---|---|
| scope removals / not-primary | 41 | 0 applied (6 overturned by the defender) |
| total changes | ~50 | 7 (6 method-accuracy + 1 additive) |

✅ Apply — method-accuracy fixes, orphan-safe (5)

#6 · Ji et al. 2021 — DNABERT: pre-trained Bidirectional Encoder Representations from Transformers model for DNA-language in genome — ADD Foundation Models: Masked Language Modeling × Cellular Engineering (skeptic-verified; paper keeps ≥1 cell)
#34 · Andrews et al. 2025 — Designing cultured tissue moulds using evolutionary strategies — re-row SVM × Scaffolding → Genetic Algorithms × Scaffolding (skeptic-verified; paper keeps ≥1 cell)
#35 · Andrews et al. 2023 — Rapid prediction of lab-grown tissue properties using deep learning — re-row Deep Learning × Scaffolding → GAN / VAE × Scaffolding (skeptic-verified; paper keeps ≥1 cell)
#119 · Rosen, Y., Roohani, Y., Agrawal, A., Samotorčan, L., Tabula Sapiens Consortium, Quake, S. R., & Leskovec, J. 2026 — Universal Cell Embeddings: A Foundation Model for Cell Biology — remove Foundation Models: Masked Language Modeling × Cellular Engineering (skeptic-verified; paper keeps ≥1 cell)
#121 · Roohani et al. 2024 — Predicting transcriptional outcomes of novel multigene perturbations with GEARS — remove Foundation Models: Cell-State & Perturbation Prediction × Cellular Engineering (skeptic-verified; paper keeps ≥1 cell)

⚠️ Needs your decision — correct method-fix, but would orphan the paper (2)

Each removes the paper's only cell, and its actual technique has no matrix row — a re-row / not-primary / keep-as-approximation call, not an auto-apply:

#59 · Antonakoudis & Richelle 2026 — Systematic data-driven genome-scale metabolic model reduction for bioprocess modeling: CHO culture case study — remove Bayesian Optimization × Bioprocess control (true method is not a matrix row). The paper applies Bayesian flux estimation (Bayesian statistical inference via the MetRaC framework) to derive uncertainty-aware rate bounds from exo-metabolomics data. This is Bayesian inference/prob…
#60 · Mathieu et al. 2025 — Integrative multi-omics modeling for cultivated meat production, quality, and safety — remove Deep Learning × Cellular Engineering (true method is not a matrix row). The methods text is explicit about the paper's computational approach: 'The causal analysis algorithm employed in this paper scores and ranks interactome nodes based on random walk network propagation…
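
The "would orphan the paper" condition is a set check on the paper's current cells. A minimal sketch, assuming cells are stored as (method, area) tuples keyed by reference ID; `would_orphan` is a hypothetical helper, not a function from the workflow:

```python
def would_orphan(cells_by_ref, ref_id, cell_to_remove):
    """Return True if removing `cell_to_remove` leaves `ref_id` with no matrix cell."""
    remaining = set(cells_by_ref.get(ref_id, ())) - {cell_to_remove}
    return not remaining
```

For #59, whose only cell is Bayesian Optimization × Bioprocess control, this returns True for that cell, which is why the removal is listed for a decision rather than applied.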

↩︎ Overturned by the defender — KEEP (6)

Original scope/method removals rejected because the paper is curator-cited and/or the method label is actually correct:

#1 · Nikkhah et al. 2023 — Toward sustainable culture media: Using artificial intelligence to optimize reduced-serum formulations for cultivated meat — proposed remove Deep Learning × Media Optimization, kept (cited_by_curators=True)
#7 · Tamburini et al. 2014 — Monitoring Key Parameters in Bioprocesses Using Near-Infrared Technology — proposed remove Deep Learning × Bioprocess control, kept (cited_by_curators=True)
#11 · Shen et al. 2024 — Chemometrics methods, sensory evaluation and intelligent sensory technologies combined with GAN-based integrated deep-learning framework to discriminate salted goose breeds — proposed remove CNN × Sensory Prediction, kept (cited_by_curators=True) — note: kept on the cited-prior, but the original "no CNN" concern is unresolved; worth a human re-row check
#17 · Cosenza & Block 2021 — A generalizable hybrid search framework for optimizing expensive design problems using surrogate models — proposed remove Deep Learning × Media Optimization, kept (cited_by_curators=True)
#18 · Cosenza 2022 — Sequential Learning Methods for the Experimental Optimization of Cell Culture Media for Cellular Agriculture — proposed remove Deep Learning × Media Optimization, kept (cited_by_curators=True)
#126 · Youngblut et al. 2025 — scBaseCount: an AI agent-curated, uniformly processed, and autonomously updated single cell data repository — proposed re-row Domain-Specific Biomedical Agents × Cellular Engineering, kept (cited_by_curators=False)

▪︎ Kept, no change (19)

All resolved to KEEP at the propose/skeptic stage — including every general-CS / general-biomedical paper the original audit wanted to delete (DESC #5, scGNN #13, SWE-bench #155, GPQA #156, MMLU-Pro #157, #110, #114, …). Pruning general-domain benchmarks would be a separate explicit curatorial decision — the hardened review (correctly) won't propose it on scope grounds.

#5, #13, #22, #33, #40, #53, #68, #90, #103, #110, #114, #118, #122, #151, #152, #155, #156, #157, #196

Hardened run wf_e54f3ded-9ce. Supersedes the un-hardened held-proposals comment above.

Adds a non-destructive taxonomy_gap verdict so the classification audit can keep a paper that applies a real AI/ML method whose matrix row/column does not yet exist, and surface a proposed new row/column for curator decision instead of forcing a wrong cell or orphaning the paper.

- reviewer: taxonomy_gap verdict + precedence ladder (gap is the last resort, after re-row into an existing label); method-family precision notes (Bayesian Optimization vs Bayesian inference; GNN vs classical network propagation) so a step-2 re-row does not grab a superficially-similar row and bury the real gap.
- workflow: taxonomy_gaps schema; per-ref collection that never enters the adjudicated change set (so a gap can never become an applied removal); a Taxonomy phase that clusters pooled gaps and adversarially verifies clusters of >=2 papers into proposed new rows/columns (singletons are parked).
- verify_routing.mjs: deterministic guard covering the asymmetric-burden routing truth-table, the non-orphan structural invariant, and the >=2 cluster gate.
- docs: SKILL.md (verdict, non-orphan guarantee, run-full-corpus note, human-applied rows/columns) and the CLAUDE.md lifecycle entry.

Verified: verify_routing 20/20; behavioral mini-eval over refs 59, 60, 34, 121 (taxonomy gaps for 59 and 60, re-row for 34, drop-redundant-cell for 121) kept all four papers (no orphaning removal).
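
The >=2 cluster gate can be illustrated as below. Note the swapped-in technique: the workflow clusters pooled gaps and verifies the clusters adversarially, whereas this sketch simply groups gaps that share an identical proposed label. The function name `cluster_gaps` and the `(ref_id, proposed_label)` input shape are assumptions.

```python
from collections import defaultdict


def cluster_gaps(gaps, min_papers=2):
    """Split pooled taxonomy gaps into proposed clusters and parked singletons.

    `gaps` is an iterable of (ref_id, proposed_label) pairs. Labels backed by at
    least `min_papers` distinct papers are proposed; the rest are parked.
    """
    by_label = defaultdict(set)
    for ref_id, label in gaps:
        by_label[label].add(ref_id)
    proposed = {
        label: sorted(refs) for label, refs in by_label.items() if len(refs) >= min_papers
    }
    parked = {
        label: sorted(refs) for label, refs in by_label.items() if len(refs) < min_papers
    }
    return proposed, parked
```

Neither output feeds the adjudicated change set, so a gap can surface a candidate row for the curators without ever turning into a removal.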

benjibromberg added 2 commits June 2, 2026 14:20

benjibromberg force-pushed the worktree-feat+matrix-classification-audit branch from 568dfc5 to 2fe066d Compare June 2, 2026 18:20

benjibromberg added 2 commits June 3, 2026 09:19

benjibromberg wants to merge 6 commits into main from worktree-feat+matrix-classification-audit
