feat(papers): adversarial methods-grounded multi-category matrix audit #33

Open
benjibromberg wants to merge 6 commits into main from worktree-feat+matrix-classification-audit

benjibromberg commented Jun 2, 2026

Member

Summary

Re-audits the Papers.md matrix against each paper's methods section (pulled
from the caail Zotero full-text cache) and adds multi-category placements where
a paper substantively applies more than one AI/ML method. Two parts:

1. Reusable tooling — Phase 4 of the Zotero⇄CAAIL lifecycle

  • .claude/agents/caail-classification-reviewer.md — read-only, full-text-grounded reviewer of (method × area) placement (distinct from the bibliographic caail-citation-reviewer).
  • .claude/skills/matrix-classification-audit/SKILL.md + extract_matrix_corpus.py (reuses scope.py's Zotero helpers; indexes both groups by DOI + URL; pulls each matrix paper's methods text into a per-ref corpus for adversarial review).
  • Registered in CLAUDE.md (3→4 skill lifecycle + reviewer-agents list) and cross-referenced from zotero-to-caail-sync. Corpus is gitignored.

2. 20 verified multi-category cross-listings (matrix cells only; no reference text changed, no IDs renumbered). Each was proposed by a methods-reading agent and survived 3 independent adversarial skeptics; the lower-confidence ones were then re-checked by a fresh agent. Highlights:

  • #32 Roell 2022 → Deep Learning / Ensemble / K-Nearest Neighbors (7-model bioprocess benchmark)
  • #169 Hashizume & Ying 2025 → Ensemble / Genetic Algorithms
  • #117 Cui 2024 + #120 Rizvi 2026 → Cell-State & Perturbation Prediction; #120 also Reinforcement Learning
  • #11/#20/#26/#28/#68/#61/#93/#161/#182/#72 → see commits for the per-paper method spans

The audit also surfaced 9 moves / 27 removes / 14 "leave the matrix" proposals. These are deliberately not included here — they challenge CAAIL's curatorial choice to catalogue general/foundational methods (e.g. it wanted scGPT, GEARS, SWE-bench out of cell-ag columns), so they're held for human triage rather than auto-applied.

How the matrix supports this

Multi-cell classification needed zero code changes — the parser already accumulates methods[]/areas[] per reference across cells; this PR is purely additive matrix anchors.
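
The accumulation described above can be sketched in a few lines of Python. This is an illustration only: the real parser is part of the site tooling and is not shown in this PR, and the `accumulate` name, the `(method, area, ref_ids)` cell layout, and the area assigned to ref 72 are all assumptions.

```python
from collections import defaultdict


def accumulate(cells):
    """Collect every method and area a reference appears under.

    `cells` is an iterable of (method, area, ref_ids) tuples, one per
    matrix cell. This layout is hypothetical, not the real parser input.
    """
    refs = defaultdict(lambda: {"methods": [], "areas": []})
    for method, area, ref_ids in cells:
        for ref_id in ref_ids:
            entry = refs[ref_id]
            # Append only unseen values so a cross-listed reference keeps
            # one entry per distinct method and area, in first-seen order.
            if method not in entry["methods"]:
                entry["methods"].append(method)
            if area not in entry["areas"]:
                entry["areas"].append(area)
    return dict(refs)


# Illustrative cells for ref 72 (the area name is assumed).
cells = [
    ("Ensemble Learning", "Sensory Prediction", [72]),
    ("SVM", "Sensory Prediction", [72]),
    ("Deep Learning", "Sensory Prediction", [72]),
]
result = accumulate(cells)
```

Because a reference is keyed once and its lists grow per cell, adding a new matrix anchor for an existing reference only extends `methods` or `areas`; nothing else has to change.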

Test Plan

  • pnpm --dir site lint:papers — 0 hard errors (no dangling anchors, no orphaned primary refs)
  • pnpm --dir site test — 289/289 pass (incl. multi-cell parser test)
  • pnpm --dir site parse: generate-data.ts cross-tally assertions pass
  • Fresh-agent adversarial re-check on the lowest-confidence adds (caught + reverted one over-eager placement, #16)

🤖 Generated with Claude Code

Add Phase 4 of the Zotero⇄CAAIL lifecycle: a methods-grounded re-audit of
the Papers.md matrix itself.

- caail-classification-reviewer agent: read-only, full-text-grounded reviewer
  of (method × area) placement, distinct from the bibliographic citation
  reviewer. Verdicts DEFENSIBLE / MISPLACED / UNSUPPORTED per cell, plus
  MISSING-CELL recommendations and a NOT-PRIMARY flag.
- matrix-classification-audit skill + extract_matrix_corpus.py: parses the
  matrix and references out of Papers.md, indexes both Zotero groups by DOI
  and URL, and pulls each matrix paper's methods section from the PDF
  full-text cache into a per-ref corpus for adversarial review.
- Register the new skill (Phase 4) and reviewer in CLAUDE.md; cross-reference
  it from zotero-to-caail-sync; gitignore the corpus build artifact.
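
A rough sketch of what indexing Zotero items by DOI and URL might look like. It does not reproduce extract_matrix_corpus.py or scope.py; the function names, the item dict shape (`key`, `doi`, `url`), and the normalization rules are assumptions made for illustration.

```python
import re


def normalize_doi(doi):
    """Lowercase a DOI and strip a resolver prefix such as https://doi.org/."""
    doi = doi.strip().lower()
    return re.sub(r"^(https?://(dx\.)?doi\.org/|doi:)", "", doi)


def normalize_url(url):
    """Lowercase a URL, drop the scheme and any trailing slash."""
    url = re.sub(r"^https?://", "", url.strip().lower())
    return url.rstrip("/")


def build_index(items):
    """Map normalized DOI and URL keys to item keys (first item wins)."""
    index = {}
    for item in items:
        if item.get("doi"):
            index.setdefault("doi:" + normalize_doi(item["doi"]), item["key"])
        if item.get("url"):
            index.setdefault("url:" + normalize_url(item["url"]), item["key"])
    return index


def lookup(index, doi=None, url=None):
    """Prefer a DOI match, then fall back to the URL."""
    if doi:
        hit = index.get("doi:" + normalize_doi(doi))
        if hit:
            return hit
    if url:
        return index.get("url:" + normalize_url(url))
    return None


# Made-up items; real entries would come from the two Zotero groups.
items = [
    {"key": "ITEM0001", "doi": "10.1000/Example.1", "url": "https://example.org/paper/"},
    {"key": "ITEM0002", "doi": "", "url": "https://example.org/preprint"},
]
index = build_index(items)
```

Keeping a URL fallback matters for references that carry no DOI, such as preprints or proceedings pages.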

benjibromberg commented Jun 2, 2026

Member Author

Held audit proposals — for human triage

The methods-grounded audit behind this PR also surfaced 9 moves, 27 removes, and 14 “leave the matrix” proposals. They were deliberately not applied here because they challenge CAAIL's curatorial choice to catalogue general/foundational methods (the strict reviewer wanted papers like scGPT, GEARS, UCE, SWE-bench, GPQA out of cell-ag columns). Each cleared a majority of 3 independent adversarial skeptics, but each is a curatorial call — tick the ones to action in a follow-up.

They split into two natures: (A) method-accuracy fixes (the cell names the wrong technique) and (B) scope/philosophy calls (method is right, but the application isn't cell-ag-specific). A-type fixes are the safer subset.

Update — #33 overturned. A full-text + domain-literature re-review (stirred-tank mixing CFD is core to cultivated-meat bioreactor scale-up; ResearchAreas/Bioprocess.md already cites this paper) reversed both #33 proposals — it stays in CNN × Bioprocess control. Its struck entries below are kept for the record. The over-strictness that flagged #33 likely affects other (B) scope/philosophy removals too — re-check before actioning.

Moves — reclassify an existing cell (9)

  • #6 · Ji et al. 2021 · DNABERT: pre-trained Bidirectional Encoder Representations from Transformers model for DNA-language in genome (method+scope)
    • Deep Learning × Cellular Engineering → Foundation Models: Masked Language Modeling × AI Tooling / Methodology
    • DNABERT follows the same training process as BERT... we significantly modified the pretraining process from the original BERT implementation by removing next sentence prediction, adjusting the sequence length and forcing the model to predict contiguous k tokens adapting to DNA scenario. During pre-t…
  • #13 · Wang et al. 2021 · scGNN is a novel graph neural network framework for single-cell RNA-Seq analyses (scope/philosophy)
    • GNN × Cellular Engineering → GNN × AI Tooling / Methodology
    • scGNN integrates three iterative multi-modal autoencoders and outperforms existing tools for gene imputation and cell clustering on four benchmark scRNA-Seq datasets. In an Alzheimer's disease study with 13,214 single nuclei from postmortem brain tissues, scGNN successfully illustrated disease-relat…
  • #17 · Cosenza & Block 2021 · A generalizable hybrid search framework for optimizing expensive design problems using surrogate models (method+scope)
    • Genetic Algorithms × Media Optimization → Genetic Algorithms × AI Tooling / Methodology
    • The NNGA algorithm is based on an RBF-assisted genetic algorithm. The NNGA uses an RBF model to suggest points that are close to but not directly on top of optima, using a truncated genetic algorithm (TGA).
  • #34 · Andrews et al. 2025 · Designing cultured tissue moulds using evolutionary strategies (method-accuracy)
    • SVM × Scaffolding → Genetic Algorithms × Scaffolding
    • Genetic algorithms (GA) are used here to find optimal mould designs. They are a form of optimisation algorithm that can be used to find solutions of complex or abstract problems. Used as a design tool, they constitute a form of artificial or computational intelligence.
  • #35 · Andrews et al. 2023 · Rapid prediction of lab-grown tissue properties using deep learning (method-accuracy)
    • Deep Learning × Scaffolding → GAN / VAE × Scaffolding
    • We use the TensorFlow framework for machine learning to implement the pix2pix conditional GAN (cGAN) described in [25].
  • #40 · Gao et al. 2025 · TxAgent: An AI Agent for Therapeutic Reasoning Across a Universe of Tools (method-accuracy)
    • General-Purpose Biomedical Agents × AI Tooling / Methodology → Domain-Specific Biomedical Agents × AI Tooling / Methodology
    • TOOLUNIVERSE has 211 biomedical tools, covering the following categories: adverse events, risks, safety; addiction and abuse; drug patient populations; drug administration and handling; pharmacology; drug use, mechanism, composition; ID and labeling tools; general clinical annotations; clinical labo…
  • #53 · Liu et al. 2026 · Advancing AI Research Assistants with Expert-Involved Learning (method-accuracy)
    • Scientific Literature & Discovery Agents × AI Tooling / Methodology → Benchmarks & Evaluation Frameworks × AI Tooling / Methodology
    • we propose a new dataset designed for evaluating the ability of FMs for long document summarization and scientific figure understanding... we collected the ground truth information paired with model outputs and performed a quantitative assessment with various metrics
  • #118 · Rosen et al. 2024 · Toward universal cell embeddings: integrating single-cell RNA-seq datasets across species with SATURN (scope/philosophy)
    • Foundation Models: LM + Biological Priors × Cellular Engineering → Foundation Models: LM + Biological Priors × AI Tooling / Methodology
    • Applying SATURN to three species whole-organism atlases and frog and zebrafish embryogenesis datasets, we show that SATURN can effectively transfer annotations across species, even when they are evolutionarily remote. We also demonstrate that SATURN can be used to find potentially divergent gene fun…
  • #126 · Youngblut et al. 2025 · scBaseCount: an AI agent-curated, uniformly processed, and autonomously updated single cell data repository (scope/philosophy)
    • Domain-Specific Biomedical Agents × Cellular Engineering → Domain-Specific Biomedical Agents × AI Tooling / Methodology
    • SRAgent is a Python package that utilizes LangGraph for constructing the agentic workflows... To comply with NCBI API rate limits, jobs are triggered every 1-5 minutes, processing 3-5 datasets per run... All extracted metadata is stored in a GCP SQL database for downstream processing.

Removes — drop an existing cell (27)

  • #1 · Nikkhah et al. 2023 · Toward sustainable culture media: Using artificial intelligence to optimize reduced-serum formulations for cultivated meat (method-accuracy)
    • Deep Learning × Media Optimization
    • The paper uses a Radial Basis Function (RBF) neural network, which is explicitly a shallow, single-hidden-layer architecture. The methods state: 'RBF has fewer parameters requiring optimization compared to the widely used multilayer perceptron (MLP) neural networks, as it has only one hidden layer a…
  • #5 · Li et al. 2020 · Deep learning enables accurate clustering with batch effect removal in single-cell RNA-seq analysis (scope/philosophy)
    • Deep Learning × Cellular Engineering
    • The paper presents DESC, a general-purpose deep autoencoder-based scRNA-seq clustering tool applied exclusively to macaque retina bipolar cells, human pancreatic islet cells, and human PBMCs from lupus patients. None of these are cellular agriculture contexts. The methods text contains no mention of…
  • #7 · Tamburini et al. 2014 · Monitoring Key Parameters in Bioprocesses Using Near-Infrared Technology (method-accuracy)
    • Deep Learning × Bioprocess control
    • No deep learning is used anywhere in this paper. The methods section describes exclusively classical chemometric and statistical techniques: MLR (Multiple Linear Regression) for on-line monitoring ('an MLR analysis was carried out'), PCA ('PCA was carried out on the acquired spectra in order to sele…
  • #11 · Shen et al. 2024 · Chemometrics methods, sensory evaluation and intelligent sensory technologies combined with GAN-based integrated deep-learning framework to discriminate salted goose breeds (method-accuracy)
    • CNN × Sensory Prediction
    • No CNN is described anywhere in the methods text. Section 2.8 and the abstract describe an InfoGAN for data augmentation and 'several base classifiers' fused via dynamic weighting, but no convolutional neural network architecture is mentioned or used. The full methods excerpt (84,909 chars available…
  • #17 · Cosenza & Block 2021 · A generalizable hybrid search framework for optimizing expensive design problems using surrogate models (method+scope)
    • Deep Learning × Media Optimization
    • The paper contains no neural network or deep learning architecture. Section 2.2 explicitly states: 'The NNGA algorithm is based on an RBF-assisted genetic algorithm.' The RBF surrogate model (Section 2.1) is a classical radial basis function interpolation (cubic RBF with a linear tail), not a deep l…
  • #17 · Cosenza & Block 2021 · A generalizable hybrid search framework for optimizing expensive design problems using surrogate models (method+scope)
    • Genetic Algorithms × Media Optimization
    • The paper does apply a genetic algorithm (the TGA in NNGA, Section 2.2: 'ranking, pairing, crossover and mutation steps'), but all experiments are on mathematical benchmark test functions (Ackley, Rastrigin, Griewank, Levy, Michalewicz, Rosenbrock, etc. — Table 1, Section 2.5). No media optimization…
  • #18 · Cosenza 2022 · Sequential Learning Methods for the Experimental Optimization of Cell Culture Media for Cellular Agriculture (method-accuracy)
    • Deep Learning × Media Optimization
    • No deep learning method is applied by the dissertation author. The only reference to neural networks appears in the background survey: 'neural networks have been used to optimize bioreactor cultures [46] and multi-objective protein storage conditions [68]' — attributing this work to other cited auth…
  • #22 · Lao et al. 2022 · Global coordination of the mutation and growth rates across the genetic and nutritional variety in Escherichia coli (scope/philosophy)
    • SVM × Cellular Engineering
    • The SVM in this paper is applied to classify E. coli genotype categories (wild-type vs. reduced-genome vs. mutator strains) and media types, and to predict mutation/growth rates in a basic microbiology study of E. coli mutation-growth-rate trade-offs. There is no cellular engineering application in …
  • #33 · Rojek et al. 2021 · AI-Accelerated CFD Simulation Based on OpenFOAM and CPU/GPU Computing. In M. Paszynski, D. Kranzlmüller, V. V. Krzhizhanovskaya, J. J. Dongarra, & P. M. A. Sloot (Eds.), (scope/philosophy)
    • Overturned → KEEP CNN × Bioprocess control. Full-text re-review + domain analysis: stirred-tank mixing CFD is the core engineering challenge of cultivated-meat bioreactor scale-up (STRs dominate; impeller shear is the central animal-cell constraint), the method is general OpenFOAM stirred-tank CFD, and CAAIL's own ResearchAreas/Bioprocess.md already cites this paper. The “out of scope” call was over-strict.
  • #34 · Andrews et al. 2025 · Designing cultured tissue moulds using evolutionary strategies (method-accuracy)
    • SVM × Scaffolding
    • There is no mention of SVM anywhere in the paper. The methods are a genetic algorithm combined with the RAPTOR deep-learning tissue-organisation model and CONDOR biophysical simulations. SVM does not appear as a model, baseline, or comparator.
  • #59 · Antonakoudis & Richelle 2026 · Systematic data-driven genome-scale metabolic model reduction for bioprocess modeling: CHO culture case study (method-accuracy)
    • Bayesian Optimization × Bioprocess control
    • The paper applies Bayesian flux estimation (the MetRaC framework) to derive uncertainty-aware uptake/secretion rate bounds for metabolic model reduction — this is Bayesian statistical inference, not Bayesian Optimization. Bayesian Optimization requires a surrogate model (e.g., Gaussian Process) and …
  • #60 · Mathieu et al. 2025 · Integrative multi-omics modeling for cultivated meat production, quality, and safety (scope/philosophy)
    • Deep Learning × Cellular Engineering
    • No deep learning method is used anywhere in the paper. The methods describe a directed-graph interactome model with random walk network propagation and z-score statistical testing for causal analysis: 'The causal analysis algorithm employed in this paper scores and ranks interactome nodes based on r…
  • #68 · Li et al. 2024 · Leveraging large language models for metabolic engineering design (method-accuracy)
    • Domain-Specific Biomedical Agents × AI Tooling / Methodology
    • The paper's LLM usage is for supervised NER and relation extraction (fine-tuned Qwen1.5 with LoRA), not an autonomous agent system with tool use or agentic reasoning. Methods state: 'we used the Qwen Lora to extract strain ID and gene entities from segments of research papers along with correspondin…
  • #90 · Yu et al. 2026 · GenCellAgent: Generalizable, Training-Free Cellular Image Segmentation via Large Language Model Agents (scope/philosophy)
    • Domain-Specific Biomedical Agents × Cellular Engineering
    • The paper concerns generalizable microscopy image segmentation for general cell biology (mitochondria, ER, Golgi, diverse cell types from mouse brain, human pancreas, plant roots). The HTML full text confirms 'zero references to cellular agriculture, cultivated meat, food science, or bioengineering …
  • #103 · Margulis et al. 2021 · Intense bitterness of molecules: Machine learning for expediting drug discovery (scope/philosophy)
    • Ensemble Learning × Sensory Prediction
    • The paper develops BitterIntense, an XGBoost classifier that predicts intense bitterness of pharmaceutical molecules to aid drug discovery. The application domain is entirely pharmaceutical compliance (pediatric/geriatric drug palatability) with no connection to cellular agriculture. The CAAIL 'Sens…
  • #110 · Sze & Hassoun 2024 · Evaluation of search-enabled pretrained Large Language Models on retrieval tasks for the PubChem database (scope/philosophy)
    • Benchmarks & Evaluation Frameworks × AI Evaluation & Benchmarking
    • The paper does not create a benchmark dataset or evaluation framework as a reusable artifact. It is an evaluation study of GPT-4o on eight pre-existing PubChem retrieval protocols. The methods describe adapting existing protocols into prompts and prompt-engineering them ('we develop a methodology fo…
  • #114 · Yang et al. 2024 · Reply to: Deeper evaluation of a single-cell foundation model (scope/philosophy)
    • Benchmarks & Evaluation Frameworks × AI Evaluation & Benchmarking
    • This paper is a 'Matters arising' reply letter defending the original scBERT paper against the Boiarsky et al. critique. It runs a limited set of defensive comparison experiments (scBERT vs. L1 logistic regression on cross-organ cell-type annotation) but does not create, propose, or release any benc…
  • #119 · Rosen, Y., Roohani, Y., Agrawal, A., Samotorčan, L., Tabula Sapiens Consortium, Quake, S. R., & Leskovec, J. 2026 · Universal Cell Embeddings: A Foundation Model for Cell Biology (method-accuracy)
    • Foundation Models: Masked Language Modeling × Cellular Engineering
    • No span in the methods_text or abstract identifies UCE's pre-training objective as masked language modeling. The methods excerpt compares UCE to Geneformer and scGPT (implicitly distinguishing UCE's approach from theirs) but never describes UCE as using a masked LM objective. UCE is described as 'co…
  • #121 · Roohani et al. 2024 · Predicting transcriptional outcomes of novel multigene perturbations with GEARS (method-accuracy)
    • Foundation Models: Cell-State & Perturbation Prediction × Cellular Engineering
    • GEARS is a task-specific GNN architecture trained end-to-end from scratch on perturbation datasets, not a pre-trained foundation model. The methods describe two GNN encoders (fpert and fgene) plus MLP components trained with an autofocus direction-aware loss — there is no large pre-trained model, no…
  • #122 · Magnusson, J. P., Roohani, Y., Stauber, D., Situ, Y., Teba, P. R. de C., Sandberg, R., Leskovec, J., & Qi, L. S. 2024 · PreciCE: Precision engineering of cell fates via data-driven multi-gene control of transcriptional networks (method-accuracy)
    • Deep Learning × Cellular Engineering
    • The abstract states 'a machine learning-based computational algorithm that uses single-cell RNA sequencing data to predict multi-gene perturbation sets' but does not specify deep learning. The methods_text excerpt covers only wet-lab CRISPR/scRNA-seq protocols and contains no description of a neural…
  • #122 · Magnusson, J. P., Roohani, Y., Stauber, D., Situ, Y., Teba, P. R. de C., Sandberg, R., Leskovec, J., & Qi, L. S. 2024 · PreciCE: Precision engineering of cell fates via data-driven multi-gene control of transcriptional networks (method-accuracy)
    • Foundation Models: Cell-State & Perturbation Prediction × Cellular Engineering
    • The abstract describes 'a machine learning-based computational algorithm' but never mentions a foundation model, pre-trained model, transformer, or any architecture associated with cell-state or perturbation prediction foundation models (e.g., scGPT, Geneformer, GEARS). The methods_text excerpt cove…
  • #151 · Gu et al. 2024 · How Do Analysts Understand and Verify AI-Assisted Data Analyses? (scope/philosophy)
    • Domain-Specific Biomedical Agents × AI Tooling / Methodology
    • The paper is a CHI 2024 HCI user study examining how human data analysts verify AI-generated analyses using a purpose-built design probe. It does not develop or apply any biomedical agent — the AI system studied is a generic code-interpreter assistant, and the datasets used are retail, movie, and fl…
  • #152 · Gu et al. 2024 · How Do Data Analysts Respond to AI Assistance? A Wizard-of-Oz Study (scope/philosophy)
    • Domain-Specific Biomedical Agents × AI Tooling / Methodology
    • The paper is a CHI Wizard-of-Oz user study about how data analysts respond to AI planning assistance. It is neither biomedical nor domain-specific to biomedicine — the running example throughout the methods is a soccer referee/skin-tone dataset. No actual AI agent is built or deployed; the 'wizard' …
  • #155 · Jimenez et al. 2024 · SWE-bench: Can Language Models Resolve Real-World GitHub Issues? (scope/philosophy)
    • Benchmarks & Evaluation Frameworks × AI Evaluation & Benchmarking
    • SWE-bench is an evaluation framework for software engineering tasks (resolving GitHub issues in Python OSS repos: astropy, django, matplotlib, etc.). The methods section covers BM25 retrieval for code files, context window limits, patch generation, and model performance on software debugging — with …
  • #156 · Rein et al. 2023 · GPQA: A Graduate-Level Google-Proof Q&A Benchmark (scope/philosophy)
    • Benchmarks & Evaluation Frameworks × AI Evaluation & Benchmarking
    • GPQA is a general scientific reasoning benchmark covering biology (Molecular Biology, Genetics), physics (Astrophysics, Quantum Mechanics), and chemistry (Organic Chemistry, General Chemistry). The methods text describes a question-writing protocol, expert/non-expert validation stages, and domain br…
  • #157 · Wang et al. 2024 · MMLU-Pro: A More Robust and Challenging Multi-Task Language Understanding Benchmark (scope/philosophy)
    • Benchmarks & Evaluation Frameworks × AI Evaluation & Benchmarking
    • MMLU-Pro is a general-purpose academic language understanding benchmark spanning 14 subjects (Math, Physics, Engineering, History, Law, Psychology, etc.) with zero cellular agriculture content. The methods text confirms the benchmark tests models on reasoning across generic academic disciplines. The…
  • #196 · Gyening et al. 2025 · MeatScan: An image dataset for machine learning-based classification of fresh and spoiled cow meat (scope/philosophy)
    • Benchmarks & Evaluation Frameworks × AI Evaluation & Benchmarking
    • The paper is a Data in Brief dataset descriptor for MeatScan, an image dataset of conventional (slaughtered) cow meat in Ghanaian markets. The benchmark framing is aspirational and secondary: the methods state the dataset 'could also serve as a benchmark dataset for evaluating the performance and ro…

Not-primary — proposed to leave the matrix entirely (14)

  • #5 · Li et al. 2020 · Deep learning enables accurate clustering with batch effect removal in single-cell RNA-seq analysis (scope/philosophy)
    • → Software.md
    • DESC is a general-purpose bioinformatics tool paper for scRNA-seq clustering. Its applications are entirely biomedical (macaque retina, human pancreas, human PBMCs/lupus). It does not apply an AI method to any cellular agriculture research problem. The methods text confirms: 'we analyzed a scRNA-seq…
  • #7 · Tamburini et al. 2014 · Monitoring Key Parameters in Bioprocesses Using Near-Infrared Technology (method-accuracy)
    • → Reviews & Perspectives
    • The paper applies classical chemometric and statistical methods (MLR, PCA, PLS) paired with NIR spectroscopy to bioprocess monitoring — not any of the AI/ML methods in the matrix's valid row vocabulary. The methods section states: 'an MLR analysis was carried out' (on-line set), 'PCA was carried out…
  • #22 · Lao et al. 2022 · Global coordination of the mutation and growth rates across the genetic and nutritional variety in Escherichia coli (scope/philosophy)
    • → Reviews & Perspectives
    • Although this paper applies SVM to a scientific problem, it has no cellular agriculture relevance. The study examines E. coli mutation and growth rates across genetic variants (reduced-genome and mutator strains) and nutritional media as a fundamental microbiology/evolutionary biology investigation.…
  • #33 · Rojek et al. 2021 · AI-Accelerated CFD Simulation Based on OpenFOAM and CPU/GPU Computing. In M. Paszynski, D. Kranzlmüller, V. V. Krzhizhanovskaya, J. J. Dongarra, & P. M. A. Sloot (Eds.), (scope/philosophy)
    • Overturned → KEEP CNN × Bioprocess control. Full-text re-review + domain analysis: stirred-tank mixing CFD is the core engineering challenge of cultivated-meat bioreactor scale-up (STRs dominate; impeller shear is the central animal-cell constraint), the method is general OpenFOAM stirred-tank CFD, and CAAIL's own ResearchAreas/Bioprocess.md already cites this paper. The “out of scope” call was over-strict.
  • #60 · Mathieu et al. 2025 · Integrative multi-omics modeling for cultivated meat production, quality, and safety (scope/philosophy)
    • → Reviews & Perspectives
    • This is a perspective/framework paper proposing an integrative multi-omics methodology rather than a primary research paper applying a specific AI/ML method. The abstract states 'we discuss the potential of an integrative multi-omics approach' — the language of a perspective, not an experimental app…
  • #103 · Margulis et al. 2021 · Intense bitterness of molecules: Machine learning for expediting drug discovery (scope/philosophy)
    • → Software.md (as a bitterness prediction tool for pharma/food science, if deemed relevant to cell-ag taste engineering at all) or removed from CAAIL entirely given the absence of any cellular agriculture application
    • This paper applies an AI method (XGBoost ensemble) to a concrete prediction problem, but the problem domain — predicting pharmaceutical drug bitterness for drug discovery compliance — has no connection to cellular agriculture. The paper does not apply AI to any cell-ag research area (Media Optimizat…
  • #110 · Sze & Hassoun 2024 · Evaluation of search-enabled pretrained Large Language Models on retrieval tasks for the PubChem database (scope/philosophy)
    • → Reviews & Perspectives
    • This paper evaluates GPT-4o on PubChem database retrieval tasks (pharmaceutical chemistry: compound similarity, bioactivity, gene-protein interactions). It does not apply any AI method to a cellular agriculture problem. The evaluation domain is general biomedical/cheminformatics LLM capability, with…
  • #114 · Yang et al. 2024 · Reply to: Deeper evaluation of a single-cell foundation model (scope/philosophy)
    • → Reviews & Perspectives
    • This is a 'Matters arising' reply/correspondence letter (the paper explicitly labels itself 'Matters arising') authored by the original scBERT team in response to a critique by Boiarsky et al. It does not apply any AI/ML method to a cellular agriculture research problem as primary work. The full tex…
  • #151 · Gu et al. 2024 · How Do Analysts Understand and Verify AI-Assisted Data Analyses? (scope/philosophy)
    • → OtherResources.md
    • This is a CHI 2024 human-computer interaction user study that investigates analyst verification workflows when using AI-assisted data analysis tools. It applies no AI/ML method to any research problem — it studies human behavior around AI outputs using a design probe methodology. It has no connectio…
  • #152 · Gu et al. 2024 · How Do Data Analysts Respond to AI Assistance? A Wizard-of-Oz Study (scope/philosophy)
    • → Reviews & Perspectives
    • This is a CHI 2024 HCI user study (Wizard-of-Oz methodology) that produces design guidelines for LLM-supported data analysis planning assistants. It does not apply any AI/ML method to a cellular-agriculture research problem. The core contribution is an empirical study of human responses to AI assist…
  • #155 · Jimenez et al. 2024 · SWE-bench: Can Language Models Resolve Real-World GitHub Issues? (scope/philosophy)
    • → Reviews & Perspectives
    • SWE-bench does not apply any AI method to a cellular agriculture problem. It is a general software engineering benchmark that evaluates whether LLMs can resolve GitHub issues in Python repositories (astropy, django, matplotlib, seaborn, flask, requests, xarray, pylint, pytest, scikit-learn, sphinx, …
  • #156 · Rein et al. 2023 · GPQA: A Graduate-Level Google-Proof Q&A Benchmark (scope/philosophy)
    • → Datasets/Benchmarks.md
    • GPQA does not apply any AI/ML method to a cellular agriculture research area. It is a benchmark dataset paper that constructs a graduate-level Q&A evaluation set across general biology, physics, and chemistry domains. The methods text confirms the paper is entirely about question construction, exper…
  • #157 · Wang et al. 2024 · MMLU-Pro: A More Robust and Challenging Multi-Task Language Understanding Benchmark (scope/philosophy)
    • → Datasets/Benchmarks.md
    • MMLU-Pro creates and evaluates a general-purpose LLM benchmark with no cellular agriculture scope whatsoever. The abstract states it extends MMLU 'across diverse domains' (Math, Physics, Engineering, History, Law, Psychology). The methods section describes 5-shot CoT prompting and regex-based answer…
  • #196 · Gyening et al. 2025 · MeatScan: An image dataset for machine learning-based classification of fresh and spoiled cow meat (scope/philosophy)
    • → Remove from Papers.md entirely (out of CAAIL scope); if kept, data artifact only in Datasets/Cow.md
    • MeatScan is a Data in Brief dataset descriptor whose primary contribution is a curated image dataset (11,000 RGB images of fresh/spoiled slaughtered cow meat from Ghanaian markets). The only AI experiment is an explicitly labelled 'baseline experiment' using MobileNetV2 'to demonstrate that the Meat…

Generated by the matrix-classification-audit workflow (run wf_810da7cd-742); each item cleared ≥2/3 adversarial skeptics. Not auto-applied — these are curatorial calls.
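
The ≥2/3 skeptic threshold amounts to a simple majority filter. A minimal sketch, assuming each skeptic returns a boolean "proposal stands" vote; the `clears_skeptics` name and the example proposal labels are hypothetical, not taken from the workflow.

```python
def clears_skeptics(votes, required=2):
    """Return True when at least `required` skeptics uphold the proposal."""
    return sum(1 for v in votes if v) >= required


# Made-up votes from three independent skeptics per proposal.
proposals = {
    "example-a": [True, True, False],
    "example-b": [True, False, False],
    "example-c": [True, True, True],
}
held = [name for name, votes in proposals.items() if clears_skeptics(votes)]
```

Clearing the threshold only queues a proposal for human triage; as noted above, nothing is applied automatically.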

Add 18 cross-listings where a paper substantively applies more than one
AI/ML method (verified against each paper's methods section via the
matrix-classification-audit workflow; each survived independent adversarial
review). No reference text changed; matrix cells only.

- ref 11 Shen 2024 → Ensemble Learning (InfoGAN + dynamically-weighted base
  classifiers)
- ref 20 Rafieyan 2024 → Ensemble Learning (XGBoost/GBM/RF/LightGBM)
- ref 26 Sun 2023, ref 28 Sun 2026 → SVM / Ensemble (LS-SVM, RF/GBDT/SVC)
- ref 32 Roell 2022 → Deep Learning / Ensemble / K-Nearest Neighbors
  (seven model families benchmarked for bioprocess prediction)
- ref 61 Wang 2025b, ref 93 Tang 2026 → Agent Infrastructure (LangGraph /
  hybrid knowledge frameworks)
- ref 68 Li 2024 → GNN (GEM-as-graph submodule)
- ref 117 Cui 2024, ref 120 Rizvi 2026 → Cell-State & Perturbation Prediction;
  ref 120 also Reinforcement Learning (GRPO)
- ref 161 Narayanan 2025 → Reinforcement Learning (RL-trained chemistry model)
- ref 169 Hashizume & Ying 2025 → Ensemble Learning / Genetic Algorithms
- ref 182 King 2004 → Active Learning (experiment-selection strategy)
ref 72 trains 18 models including SVM and MLP/Bayesian neural networks for
sensory (flavor) prediction under 10-fold CV — confirmed by full-text
re-verification as the paper's own applied methods, not a background
enumeration. Adds (SVM × Sensory) and (Deep Learning × Sensory) alongside
its existing Ensemble Learning placement.
@benjibromberg force-pushed the worktree-feat+matrix-classification-audit branch from 568dfc5 to 2fe066d (June 2, 2026 18:20)
…movals

The audit's #33 false positive (it proposed deleting a CNN-surrogate-CFD
paper from Bioprocess control, which ResearchAreas/Bioprocess.md already
cites) traced to two gaps: the reviewer read only the paper (never CAAIL's
own curation context), and a destructive removal carried no more burden than
an additive placement. This bakes an asymmetric, context-aware burden on
scope removals into the durable tooling.

- extract_matrix_corpus.py: add per-ref cited_in_research_areas (scan
  ResearchAreas/*.md by surname+year / DOI) — an intentional-placement KEEP
  prior (correctly flags #33 -> Bioprocess control).
- caail-classification-reviewer: read the ResearchAreas/<Area>.md scope and
  honor that prior before any scope call; tag every verdict
  nature=method-accuracy|scope; default a general-method scope concern to a
  MOVE to AI Tooling / Methodology, not a removal; method-absent papers stay a
  firm method-accuracy flag; never hedge a non-fitting paper into a
  destination-less move.
- .claude/workflows/matrix-classification-audit.js: durable named workflow —
  propose -> skeptics -> (scope only) steelman defender -> gated domain-relevance
  web grounding. method-accuracy + additive changes bypass the heavy layers.
  Self-bootstraps inputs from matrix-corpus.json (args is not reliably
  delivered); fan-out pinned to Sonnet.
- SKILL.md / CLAUDE.md: document the asymmetric burden, the layers, and the
  named-workflow invocation.
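
The `cited_in_research_areas` prior described above can be sketched roughly as follows. This is a minimal illustration, not the code in `extract_matrix_corpus.py`: the function signature, the 40-character surname-to-year window, and the case-insensitive DOI match are all assumptions made for the example.

```python
import re
from pathlib import Path


def cited_in_research_areas(surname, year, doi, areas_dir):
    """Return the stems of ResearchAreas pages that appear to cite a paper.

    A page counts as a hit if it contains the DOI, or the first author's
    surname followed shortly by the year (e.g. "Smith et al. 2023").
    """
    # Assumed matching rule: surname, then at most 40 same-line characters, then the year.
    name_year = re.compile(
        rf"\b{re.escape(surname)}\b[^\n]{{0,40}}?\b{year}\b", re.IGNORECASE
    )
    hits = []
    for page in sorted(Path(areas_dir).glob("*.md")):
        text = page.read_text(encoding="utf-8")
        by_doi = bool(doi) and doi.lower() in text.lower()
        if by_doi or name_year.search(text):
            hits.append(page.stem)  # e.g. "Bioprocess" for ResearchAreas/Bioprocess.md
    return hits
```

A non-empty result would act as the KEEP prior: the reviewer treats the placement as intentional before making any scope call.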

Behavioral mini-eval (#33,#151,#152,#155): #33 now kept; SWE-bench (#155)
correctly flagged NOT-PRIMARY by the defender; #152 scope-removal overturned
via the curator-citation prior. No Papers.md content change.
…racy

Closes the residual gap from 7fe068c: the method-accuracy path bypassed the
defender, so a method-accuracy verdict on a paper the curators cite in a
ResearchAreas page could apply a removal of its only cell — orphaning it and
severing the live cross-reference (the exact risk the defender flagged for #152).

A wrong method row on a cited paper is now a re-row, not a deletion:
- workflow: proposer reports cited_by_curators; adjudicate() routes any removal
  (unsupported / not_primary) of a cited paper through the steelman defender
  regardless of nature. A re-row MOVE or an uncited method-accuracy fix still
  needs only skeptics; scope removals still reach the defender (gated, not blanket).
- reviewer agent: a curator-cited paper is never UNSUPPORTED/NOT-PRIMARY — a wrong
  method row is a MISPLACED re-row.

Verified by a deterministic truth-table check of the routing guard (7/7) plus the
behavioral mini-eval (#33 kept; cited #151/#152 received no applied removal). No
Papers.md change.
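
For readers who want the routing rule in one place, here is a minimal Python sketch of the guard as this commit describes it. The real guard lives in the JavaScript workflow; the function name `needs_defender`, the `"add"` and `"misplaced"` verdict strings, and the 16-row enumeration are assumptions for illustration and do not reproduce the 7 cases of the actual check.

```python
from itertools import product

# Verdicts that remove a cell outright (names taken from the commit text).
REMOVAL_VERDICTS = {"unsupported", "not_primary"}


def needs_defender(verdict, nature, cited_by_curators):
    """Return True if a proposal must pass the steelman defender."""
    if verdict not in REMOVAL_VERDICTS:
        return False  # additive placements and re-row moves need only skeptics
    if cited_by_curators:
        return True  # any removal of a curator-cited paper, regardless of nature
    return nature == "scope"  # uncited papers: only scope removals reach the defender


def truth_table():
    """Enumerate every (verdict, nature, cited) combination and its routing."""
    verdicts = ["add", "misplaced", "unsupported", "not_primary"]  # "add" is a hypothetical label
    natures = ["method-accuracy", "scope"]
    return {
        (v, n, c): needs_defender(v, n, c)
        for v, n, c in product(verdicts, natures, [False, True])
    }
```

Enumerating the table makes the asymmetry visible: an uncited method-accuracy removal skips the defender, while the same verdict on a cited paper does not.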
@benjibromberg

Member Author

Hardened re-scrutiny of the held proposals

Re-ran all 32 held-proposal papers through the hardened pipeline (propose → skeptics → steelman defender for scope/cited removals → gated domain grounding). The over-strict scope/philosophy deletions are gone:

| | original audit | hardened re-scrutiny |
|---|---|---|
| scope removals / not-primary | 41 | 0 applied (6 overturned by the defender) |
| total changes | ~50 | 7 (6 method-accuracy + 1 additive) |

✅ Apply — method-accuracy fixes, orphan-safe (5)

  • #6 · Ji et al. 2021 — DNABERT: pre-trained Bidirectional Encoder Representations from Transformers model for DNA-language in genome — ADD Foundation Models: Masked Language Modeling × Cellular Engineering (skeptic-verified; paper keeps ≥1 cell)
  • #34 · Andrews et al. 2025 — Designing cultured tissue moulds using evolutionary strategies — re-row SVM × Scaffolding → Genetic Algorithms × Scaffolding (skeptic-verified; paper keeps ≥1 cell)
  • #35 · Andrews et al. 2023 — Rapid prediction of lab-grown tissue properties using deep learning — re-row Deep Learning × Scaffolding → GAN / VAE × Scaffolding (skeptic-verified; paper keeps ≥1 cell)
  • #119 · Rosen et al. 2026 — Universal Cell Embeddings: A Foundation Model for Cell Biology — remove Foundation Models: Masked Language Modeling × Cellular Engineering (skeptic-verified; paper keeps ≥1 cell)
  • #121 · Roohani et al. 2024 — Predicting transcriptional outcomes of novel multigene perturbations with GEARS — remove Foundation Models: Cell-State & Perturbation Prediction × Cellular Engineering (skeptic-verified; paper keeps ≥1 cell)

⚠️ Needs your decision — correct method-fix, but would orphan the paper (2)

Each removes the paper's only cell, and its actual technique has no matrix row — a re-row / not-primary / keep-as-approximation call, not an auto-apply:

  • #59 · Antonakoudis & Richelle 2026 — Systematic data-driven genome-scale metabolic model reduction for bioprocess modeling: CHO culture case study — remove Bayesian Optimization × Bioprocess control (true method is not a matrix row). The paper applies Bayesian flux estimation (Bayesian statistical inference via the MetRaC framework) to derive uncertainty-aware rate bounds from exo-metabolomics data. This is Bayesian inference/prob…
  • #60 · Mathieu et al. 2025 — Integrative multi-omics modeling for cultivated meat production, quality, and safety — remove Deep Learning × Cellular Engineering (true method is not a matrix row). The methods text is explicit about the paper's computational approach: 'The causal analysis algorithm employed in this paper scores and ranks interactome nodes based on random walk network propagation…
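
The split between orphan-safe fixes and the two items above comes down to whether a removal leaves the paper with any matrix cell. A minimal sketch of that check, assuming cells are represented as `(method, area)` tuples; the function name and data layout are hypothetical, and the second cell shown for ref 121 is a placeholder because this comment states only that the paper keeps at least one cell.

```python
def triage_removals(cells_by_ref, removals_by_ref):
    """Split proposed removals into orphan-safe ones and ones needing a human.

    cells_by_ref:    {ref_id: set of (method, area) cells currently in the matrix}
    removals_by_ref: {ref_id: set of (method, area) cells proposed for removal}
    """
    orphan_safe, needs_decision = [], []
    for ref, removed in sorted(removals_by_ref.items()):
        remaining = set(cells_by_ref.get(ref, ())) - set(removed)
        if remaining:
            orphan_safe.append(ref)  # paper keeps at least one cell
        else:
            needs_decision.append(ref)  # removal would orphan the paper
    return orphan_safe, needs_decision


# Illustrative data only; the placeholder cell for ref 121 is not from the matrix.
cells = {
    59: {("Bayesian Optimization", "Bioprocess control")},
    121: {
        ("Foundation Models: Cell-State & Perturbation Prediction", "Cellular Engineering"),
        ("<other existing cell>", "<area>"),
    },
}
removals = {
    59: {("Bayesian Optimization", "Bioprocess control")},
    121: {("Foundation Models: Cell-State & Perturbation Prediction", "Cellular Engineering")},
}
print(triage_removals(cells, removals))  # ([121], [59])
```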

↩︎ Overturned by the defender — KEEP (6)

Original scope/method removals rejected because the paper is curator-cited and/or the method label is actually correct:

  • #1 · Nikkhah et al. 2023 — Toward sustainable culture media: Using artificial intelligence to optimize reduced-serum formulations for cultivated meat — proposed remove Deep Learning × Media Optimization, kept (cited_by_curators=True)
  • #7 · Tamburini et al. 2014 — Monitoring Key Parameters in Bioprocesses Using Near-Infrared Technology — proposed remove Deep Learning × Bioprocess control, kept (cited_by_curators=True)
  • #11 · Shen et al. 2024 — Chemometrics methods, sensory evaluation and intelligent sensory technologies combined with GAN-based integrated deep-learning framework to discriminate salted goose breeds — proposed remove CNN × Sensory Prediction, kept (cited_by_curators=True) — note: kept on the cited-prior, but the original "no CNN" concern is unresolved; worth a human re-row check
  • #17 · Cosenza & Block 2021 — A generalizable hybrid search framework for optimizing expensive design problems using surrogate models — proposed remove Deep Learning × Media Optimization, kept (cited_by_curators=True)
  • #18 · Cosenza 2022 — Sequential Learning Methods for the Experimental Optimization of Cell Culture Media for Cellular Agriculture — proposed remove Deep Learning × Media Optimization, kept (cited_by_curators=True)
  • #126 · Youngblut et al. 2025 — scBaseCount: an AI agent-curated, uniformly processed, and autonomously updated single cell data repository — proposed re-row Domain-Specific Biomedical Agents × Cellular Engineering, kept (cited_by_curators=False)

▪︎ Kept, no change (19)

All resolved to KEEP at the propose/skeptic stage — including every general-CS / general-biomedical paper the original audit wanted to delete (DESC #5, scGNN #13, SWE-bench #155, GPQA #156, MMLU-Pro #157, #110, #114, …). Pruning general-domain benchmarks would be a separate explicit curatorial decision — the hardened review (correctly) won't propose it on scope grounds.

#5, #13, #22, #33, #40, #53, #68, #90, #103, #110, #114, #118, #122, #151, #152, #155, #156, #157, #196


Hardened run wf_e54f3ded-9ce. Supersedes the un-hardened held-proposals comment above.

Adds a non-destructive taxonomy_gap verdict so the classification audit can
keep a paper that applies a real AI/ML method whose matrix row/column does not
yet exist, and surface a proposed new row/column for curator decision instead
of forcing a wrong cell or orphaning the paper.

- reviewer: taxonomy_gap verdict + precedence ladder (gap is the last resort,
  after re-row into an existing label); method-family precision notes (Bayesian
  Optimization vs Bayesian inference; GNN vs classical network propagation) so a
  step-2 re-row does not grab a superficially-similar row and bury the real gap.
- workflow: taxonomy_gaps schema; per-ref collection that never enters the
  adjudicated change set (so a gap can never become an applied removal); a
  Taxonomy phase that clusters pooled gaps and adversarially verifies clusters
  of >=2 papers into proposed new rows/columns (singletons are parked).
- verify_routing.mjs: deterministic guard — the asymmetric-burden routing
  truth-table, the non-orphan structural invariant, and the >=2 cluster gate.
- docs: SKILL.md (verdict, non-orphan guarantee, run-full-corpus note,
  human-applied rows/columns) and the CLAUDE.md lifecycle entry.

Verified: verify_routing 20/20; behavioral mini-eval over refs 59, 60, 34, 121
(taxonomy gaps for 59 and 60, re-row for 34, drop-redundant-cell for 121) kept
all four papers (no orphaning removal).
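
The >=2 cluster gate can be illustrated with a short Python sketch. It assumes pooled gaps arrive as `(ref, proposed_label)` pairs and that grouping by exact label stands in for the workflow's clustering step, which is likely more involved; the function name and the example labels are hypothetical.

```python
from collections import defaultdict


def gate_gap_clusters(gaps, min_papers=2):
    """Group taxonomy gaps by proposed label and apply the cluster-size gate.

    gaps: iterable of (ref_id, proposed_label) pairs.
    Returns (to_verify, parked): clusters with at least `min_papers` distinct
    papers go on to adversarial verification; smaller ones are parked.
    """
    clusters = defaultdict(set)
    for ref, label in gaps:
        clusters[label].add(ref)  # a set, so a repeated ref counts once
    to_verify = {k: sorted(v) for k, v in clusters.items() if len(v) >= min_papers}
    parked = {k: sorted(v) for k, v in clusters.items() if len(v) < min_papers}
    return to_verify, parked
```

Because the gate only ever returns proposals for new rows or columns, nothing it produces can be applied as a removal, which matches the non-orphan guarantee described above.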