Add agent-discoverable metadata and spatial-mapping skill for coding agents #442

Open
vitkl wants to merge 7 commits into master from agent-discoverability-and-spatial-mapping-skill

vitkl (Contributor) commented May 21, 2026

Summary

Make cell2location easier for coding agents (Claude Code, Cursor, Aider, Copilot, Codex) to find and use correctly. Adds four layers of agent-friendliness, all in .claude/skills/ and repo-root metadata files. No Python source under cell2location/ or tests/ is touched. Zero CI risk.

Four layers

  1. AGENTS.md at repo root — single agent landing page. Trigger phrases ("spatial mapping", "spatial deconvolution", etc.), related-tools routing to sibling packages (gerstung-lab/BaSISS, BayraktarLab/GBMspace, BayraktarLab/cell2fate, vitkl/regularizedvi, vitkl/SpaceJam), dual-skill pointer, NO-CODE refusal block.
  2. Discoverability metadata — README tagline + "For coding agents" subsection; setup.cfg PyPI keywords (16) and classifiers (6); CITATION.cff for the 2022 Nature Biotechnology paper.
  3. .claude/skills/spatial-mapping/ — main operating manual. Single skill, format-plan-style (instructions + <reference> tag), dual-mode (interactive AskUserQuestion / autonomous data-driven). Walks the user through 10 phases (mode + data, reference signatures, spatial QC, N_cells_per_location Fig S27 decision tree, detection_alpha, chunking, branch selection master vs hires_sliding_window, model hyperparameters, training + posterior export, launch, aggregation). Bundled reference materials: Fig S1 + Fig S27 PNG extracts, paraphrased supplement §1.2-§1.4 + §2, paraphrased issue corpus from ~25 recurring maintainer answers, full supplement PDF for deeper questions.
  4. .claude/skills/cell2location-troubleshooting/ — companion skill. Matches user symptoms against the harvested issue corpus, gh search fallback for newer issues, drafts (does NOT submit) gh issue create bodies with the diagnostic metadata vitkl normally asks for. Routes biology-interpretation questions to discourse.scverse.org.

Bundled templates

.claude/skills/spatial-mapping/templates/ contains three papermill-parametrised notebooks based on the cell2state_embryo workflow (the only published workflow with correct stratified per-sample chunking and nuclei-occupancy model wiring), simplified for general use and stripped of embryo specifics:

  • step1_reference_signatures.ipynb: RegressionModel for batch-corrected signatures.
  • step2_spatial_mapping.ipynb — per-chunk Cell2location; supports both master and hires_sliding_window via runtime-conditional kwargs.
  • step2_aggregate_chunks.ipynb — combine per-chunk outputs.
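
As a rough illustration of what "runtime-conditional kwargs" could look like, here is a minimal sketch, not the code in the template notebook. `N_cells_per_location` and `detection_alpha` are the hyperparameters named in this PR; the helper name `build_model_kwargs` and the `hires_kwargs` argument are hypothetical, and the branch-specific argument names are deliberately left to the caller because this description does not list them.

```python
from typing import Any, Dict, Optional

BRANCHES = ("master", "hires_sliding_window")


def build_model_kwargs(
    branch: str,
    n_cells_per_location: float,
    detection_alpha: float,
    hires_kwargs: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Assemble model keyword arguments that depend on the selected branch.

    `hires_kwargs` is a hypothetical pass-through for options that only the
    hires_sliding_window branch understands; its keys are NOT taken from the PR.
    """
    if branch not in BRANCHES:
        raise ValueError(f"unknown branch {branch!r}; expected one of {BRANCHES}")

    # Hyperparameters shared by both branches.
    kwargs: Dict[str, Any] = {
        "N_cells_per_location": n_cells_per_location,
        "detection_alpha": detection_alpha,
    }

    # Branch-only options are merged in only when that branch is selected,
    # so the master branch never receives arguments it does not accept.
    if branch == "hires_sliding_window" and hires_kwargs:
        kwargs.update(hires_kwargs)
    return kwargs
```

The resulting dictionary would then be unpacked into the model constructor inside the notebook, so a single parameters cell can drive either branch.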

Plus three launchers (LSF bsub.sh, Slurm sbatch.sh, local run_local.sh) with the same parameter contract, and data/download_mouse_brain.py that fetches the published Kleshchevnikov 2022 mouse-brain dataset (5 Visium + paired snRNA reference) from the public Sanger object store.
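
A shared parameter contract means every launcher ends up issuing the same kind of papermill call. The sketch below shows one way to build that call from a parameter dictionary; it uses papermill's `-p name value` CLI form, as in the smoke test quoted in this PR. The helper name `papermill_command` and the placeholder path are assumptions for illustration, not part of the launchers.

```python
import shlex
from typing import Dict, List, Union

ParamValue = Union[str, int, float]


def papermill_command(
    template: str, output: str, params: Dict[str, ParamValue]
) -> List[str]:
    """Build a papermill argv list: papermill IN OUT -p name value ..."""
    cmd = ["papermill", template, output]
    for name, value in params.items():
        # Each parameter becomes one "-p name value" triple.
        cmd += ["-p", name, str(value)]
    return cmd


if __name__ == "__main__":
    cmd = papermill_command(
        "templates/step1_reference_signatures.ipynb",
        "out.ipynb",
        # "path/to/sc.h5ad" is a placeholder, not the real dataset location.
        {"ref_h5ad_path": "path/to/sc.h5ad", "max_epochs": 100},
    )
    # Print a shell-quoted version that a launcher script could execute.
    print(shlex.join(cmd))
```

Keeping the command as a list until the last moment avoids quoting problems when a path contains spaces.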

Commit structure

  • A `65d6dcc` — AGENTS.md, README.md tagline, setup.cfg keywords/classifiers, CITATION.cff.
  • B `a0c7338` — spatial-mapping SKILL.md + skill README.md + Fig S1+S27 PNGs + paraphrase markdowns + supplement PDF.
  • C `a9cfeaf` — 3 template notebooks + 3 launchers + mouse-brain download helper.
  • D `79ca9f8` — cell2location-troubleshooting SKILL.md + README.md.

Test plan

  • No Python source under cell2location/ or tests/ modified (CI flake8/black/isort/pytest pass trivially).
  • All three template notebooks: valid nbformat 4.5 JSON, exactly one parameters-tagged cell each.
  • All three launchers: valid bash syntax (bash -n clean).
  • download_mouse_brain.py: clean Python compile.
  • CITATION.cff: parses, contains 2022 Nature Biotechnology DOI.
  • setup.cfg: parses, 16 keywords + 6 classifiers added; existing fields untouched.
  • Embryo-specifics grep clean: grep -rE "cell_type_lvl7|FFPE_Cytassist|/nfs/team283|/nemo/lab|/lustre|sectionsRef|suspensionRef|FraqLim|CS17" .claude/skills/ returns no matches.
  • End-to-end smoke test on the published mouse-brain dataset (run locally before merge): python templates/data/download_mouse_brain.py && papermill templates/step1_reference_signatures.ipynb out.ipynb -p ref_h5ad_path .../sc.h5ad -p max_epochs 100 — slow (~20 min on CPU), documented here rather than added to CI to keep CI fast.
  • Optional follow-up PR to scverse/ecosystem-packages: enrich the existing packages/cell2location/meta.yaml description with the skill mention and add tags (spatial-mapping, spatial-deconvolution, visium, visium-hd, agent-friendly).
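
The notebook check in the test plan (nbformat 4.5, exactly one parameters-tagged cell) could be scripted roughly as follows. This is a sketch using only the standard library and the documented notebook JSON layout (`nbformat`, `nbformat_minor`, `cells[*].metadata.tags`); the function name `notebook_problems` is hypothetical, and this is not necessarily how the check was run for this PR.

```python
import json
from pathlib import Path
from typing import List


def notebook_problems(path: Path) -> List[str]:
    """Return a list of human-readable problems found in one notebook file."""
    nb = json.loads(Path(path).read_text(encoding="utf-8"))
    problems: List[str] = []

    # nbformat 4.5 is stored as two integer fields in the notebook JSON.
    version = (nb.get("nbformat"), nb.get("nbformat_minor"))
    if version != (4, 5):
        problems.append(f"expected nbformat 4.5, found {version}")

    # papermill injects parameters after the cell tagged "parameters".
    tagged = [
        cell
        for cell in nb.get("cells", [])
        if "parameters" in cell.get("metadata", {}).get("tags", [])
    ]
    if len(tagged) != 1:
        problems.append(
            f"expected exactly one parameters-tagged cell, found {len(tagged)}"
        )
    return problems
```

An empty list means the notebook passes both checks; running it over the three templates gives a quick pre-merge gate without adding anything to CI.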

What this does NOT change

  • No Python module under cell2location/.
  • No test under tests/.
  • No dependency added.
  • No CI workflow modified.
  • No existing README content moved, reworded, or restructured (only two additions).

🤖 Generated with Claude Code

Vitalii and others added 7 commits May 21, 2026 00:33
Make cell2location easier to find and use for coding agents (Claude Code, Cursor,
Aider, Copilot, Codex). Adds: AGENTS.md (single agent landing page with trigger
phrases, related-tools routing to sibling packages, dual-skill pointer, NO-CODE
refusal block); README tagline + "For coding agents" pointer subsection; PyPI
keywords/classifiers for cross-repo discovery; CITATION.cff for the 2022 Nature
Biotechnology paper.

Main operating skill at .claude/skills/spatial-mapping/SKILL.md walks users
through ten phases (mode + data, reference signatures, spatial QC, N_cells_per_location
Fig S27 decision tree, detection_alpha, chunking, branch selection master vs
hires_sliding_window, model hyperparameters, training + posterior export, launch,
aggregation). Supports both interactive AskUserQuestion mode and autonomous
data-driven mode. Forces explicit decisions on the four hyperparameters the
maintainer routinely answers on the issue tracker. Reference materials bundled:
Fig S1 + Fig S27 PNG extracts, paraphrased supplement §1.2-§1.4 + §2,
paraphrased issue corpus from ~25 recurring vitkl answers, full supplement PDF.

…brain data helper

Three papermill-parametrised templates based on the cell2state_embryo
workflow (the only published cell2location workflow with correct stratified
per-sample chunking and nuclei-occupancy model wiring), simplified for
general use and stripped of embryo specifics:
  - step1_reference_signatures.ipynb (RegressionModel)
  - step2_spatial_mapping.ipynb (per-chunk Cell2location; supports both master
    and hires_sliding_window via runtime-conditional kwargs)
  - step2_aggregate_chunks.ipynb (combine per-chunk outputs)

Three launchers (LSF bsub, Slurm sbatch, local single-GPU) with the same
parameter contract — bsub.sh is derived from the embryo bsub.sh.

download_mouse_brain.py fetches the published Kleshchevnikov 2022 mouse-brain
dataset (5 Visium + paired snRNA reference) from the public Sanger object
store, letting users validate against published results before applying to
their own data.

Companion to the main spatial-mapping skill. Three-phase workflow:
(1) match user's symptom against the harvested vitkl-guidance corpus shared
with the main skill; (2) gh search GitHub issues for matches the corpus
snapshot may have missed; (3) draft (not submit) a clean `gh issue create`
body with the diagnostic checklist vitkl normally asks for — environment,
data shape, hyperparameters used, ELBO trajectory, error trace. Routes
biology-interpretation questions to discourse.scverse.org instead of the
issue tracker. Refuses to auto-submit; refuses to include raw user data
in drafted bodies.
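
To make the drafting step concrete, here is a minimal sketch of how an issue body could be assembled from the diagnostic checklist above (environment, data shape, hyperparameters used, ELBO trajectory, error trace). The function name, the dictionary keys, and the Markdown layout are assumptions; the skill itself is a set of agent instructions, and this code only returns text without calling `gh` or submitting anything.

```python
from typing import Dict

# Checklist items from the commit message, paired with hypothetical dict keys.
DIAGNOSTIC_SECTIONS = [
    ("environment", "Environment"),
    ("data_shape", "Data shape"),
    ("hyperparameters", "Hyperparameters used"),
    ("elbo_trajectory", "ELBO trajectory"),
    ("error_trace", "Error trace"),
]


def draft_issue_body(symptom: str, diagnostics: Dict[str, str]) -> str:
    """Return a Markdown issue body; missing items are flagged, not invented."""
    lines = ["## Symptom", symptom.strip(), ""]
    for key, title in DIAGNOSTIC_SECTIONS:
        value = diagnostics.get(key, "").strip()
        # Summaries only: raw user data should never be pasted into the draft.
        lines += [f"## {title}", value or "_not provided_", ""]
    return "\n".join(lines).rstrip() + "\n"
```

The returned text can be shown to the user for review and, if they agree, passed to `gh issue create` by hand.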

…private GBMspace link

Three fixes raised by the initial PR's CI run + post-merge text review:

1. download_mouse_brain.py: remove unused `hashlib`/`os` imports (F401) and the
   trailing `f""` without placeholder (F541) — flake8 now clean.

2. setup.cfg: pin `lightning != 2.6.2, != 2.6.3` (and mirror on `pytorch-lightning`)
   per the supply-chain attack disclosed 2026-04-30 (CVE-2026-44484 /
   GHSA-w37p-236h-pfx3). The compromised versions have been yanked from PyPI but
   the explicit exclusion protects users with stale mirrors or cached wheels.
   scvi-tools pulls lightning transitively; the cell2location pin is defensive.

3. AGENTS.md: drop the `BayraktarLab/GBMspace` link from the related-tools
   routing — the repo is private, so an external user clicking the link from
   public AGENTS.md would 404. BaSISS stays as the cancer-clone routing target.
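
For readers who want to confirm that an installed version falls outside the exclusion in item 2, the following standard-library sketch interprets a specifier made only of `!=` clauses. It compares version strings literally, which is an assumed simplification; full PEP 440 semantics are what the `packaging` library implements. The function names are hypothetical.

```python
from typing import Set


def parse_exclusions(spec: str) -> Set[str]:
    """Collect the versions named in '!=' clauses of a comma-separated spec."""
    excluded: Set[str] = set()
    for clause in spec.split(","):
        clause = clause.strip()
        if clause.startswith("!="):
            excluded.add(clause[2:].strip())
    return excluded


def is_allowed(version: str, spec: str) -> bool:
    """True when `version` is not one of the explicitly excluded versions."""
    # Literal string comparison: "2.6.2" and "2.6.2.0" are treated as different.
    return version.strip() not in parse_exclusions(spec)


if __name__ == "__main__":
    pin = "!= 2.6.2, != 2.6.3"
    for candidate in ("2.6.1", "2.6.2", "2.6.3", "2.6.4"):
        print(candidate, "allowed" if is_allowed(candidate, pin) else "excluded")
```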

Inserts a scientific-scope interview as the very first step of the spatial-mapping
workflow and a technical-completeness sweep just before launch. Persists answers in
SPATIAL_MAPPING_CONTEXT.md so future runs (and the troubleshooting skill) inherit
the user's goal, reference, target populations, and failure criteria.

- New skill .claude/skills/cell2location-context/ owns the persistent file
  (auto-discovery across cwd / .claude/ / ~/.claude/plans/; first-creation asks
  where to save). Two modes: --science (7-group rubric) and --technical
  (Phase 1-8 slot sweep + scope-vs-decision cross-check).
- spatial-mapping/SKILL.md: new Phase 0a invokes --science before any technical
  decision; new Phase 8.5 invokes --technical before Phase 9 launch and can block
  launch on hard cross-check failures (e.g. detection_alpha=200 vs failure
  criterion about 10x within-sample variation).
- cell2location-troubleshooting/SKILL.md: new Phase -1 reads BOTH the scope
  (especially declared failure criteria) AND the technical-decisions block before
  classifying the symptom; Phase 3 pre-fills the gh-issue diagnostic template
  from the context file instead of re-asking the user.
- Skip path: users can opt out (recommended copy explains why answering helps)
  or import a prior handoff document as free-form scope.
- Autonomous mode: skips the interview and emits a notebook markdown cell
  documenting the missing scope.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
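
The auto-discovery of SPATIAL_MAPPING_CONTEXT.md described in the commit above could be sketched as below. The three locations come from the commit message; the search order, the reading of `.claude/` as relative to the working directory, and the function name are assumptions.

```python
from pathlib import Path
from typing import List, Optional

CONTEXT_FILENAME = "SPATIAL_MAPPING_CONTEXT.md"


def candidate_paths(cwd: Path, home: Path) -> List[Path]:
    """Locations to search, in an assumed priority order (most local first)."""
    return [
        cwd / CONTEXT_FILENAME,
        cwd / ".claude" / CONTEXT_FILENAME,
        home / ".claude" / "plans" / CONTEXT_FILENAME,
    ]


def find_context_file(cwd: Path, home: Path) -> Optional[Path]:
    """Return the first existing context file, or None if none is found."""
    for path in candidate_paths(cwd, home):
        if path.is_file():
            return path
    # No file yet: per the commit message, the caller asks where to save it.
    return None
```

Calling `find_context_file(Path.cwd(), Path.home())` covers the normal case; passing the directories explicitly keeps the function easy to test against temporary folders.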

Pip-installed cell2location previously left the bundled Claude / Cursor / Aider
skills inaccessible -- agents only auto-discover skills in cwd or in
~/.claude/skills/. This change makes /spatial-mapping, /cell2location-context,
and /cell2location-troubleshooting slash commands available across all projects
after a one-time:

    cell2location install-skills           # copy
    cell2location install-skills --symlink # or symlink so pip -U flows through

How it works:
- setup.py mirrors .claude/skills/ -> cell2location/_bundled_skills/ at build
  time so the wheel/sdist always ships the skills.
- cell2location/_cli.py resolves the bundled dir first, falls back to the
  source .claude/skills/ tree for editable installs.
- New console_scripts entry point cell2location -> cell2location._cli:main
  exposes list-skills / install-skills (--symlink --force --dry-run) /
  uninstall-skills.
- Installed entries are namespaced cell2location-<skill> in ~/.claude/skills/
  to avoid collisions and make provenance obvious.
- MANIFEST.in + setup.cfg [options.package_data] put _bundled_skills/ in the
  wheel; .gitignore keeps the build-time mirror out of the repo.
- AGENTS.md and README.md document the install flow.
- tests/test_cli.py covers list / dry-run install / install+reinstall+uninstall
  against a temp $HOME (loads _cli.py via importlib so the test runs even when
  the surrounding cell2location import is broken in the local env).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
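
The copy-or-symlink install flow described in the commit above can be sketched as follows. This is not the contents of `cell2location/_cli.py`; the function signature is hypothetical, and skipping the prefix for skills whose names already start with `cell2location-` is an assumption about how the namespacing handles them.

```python
import shutil
from pathlib import Path
from typing import List

PREFIX = "cell2location-"


def install_skills(
    source_dir: Path,
    target_dir: Path,
    symlink: bool = False,
    force: bool = False,
    dry_run: bool = False,
) -> List[Path]:
    """Copy or symlink each skill directory into target_dir under a namespaced name."""
    installed: List[Path] = []
    for skill in sorted(p for p in source_dir.iterdir() if p.is_dir()):
        # Assumption: avoid producing "cell2location-cell2location-...".
        name = skill.name if skill.name.startswith(PREFIX) else PREFIX + skill.name
        dest = target_dir / name

        if dest.exists() or dest.is_symlink():
            if not force:
                raise FileExistsError(f"{dest} exists; pass force=True to replace")
            if not dry_run:
                # A symlink must be unlinked; rmtree refuses to follow it.
                if dest.is_symlink() or dest.is_file():
                    dest.unlink()
                else:
                    shutil.rmtree(dest)

        if not dry_run:
            target_dir.mkdir(parents=True, exist_ok=True)
            if symlink:
                dest.symlink_to(skill.resolve(), target_is_directory=True)
            else:
                shutil.copytree(skill, dest)
        installed.append(dest)
    return installed
```

With `symlink=True` the installed entries point back at the package directory, which is why a later `pip -U` would flow through without re-running the install.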