BioDSA is an open-source framework for rapidly prototyping, optimizing, and benchmarking AI agents for biomedical tasks — from data analysis and literature research to clinical trial matching and drug discovery.
Describe what you want in natural language. Get a working agent in minutes.
Building AI agents for biomedicine is hard. A typical agent needs LLM orchestration, access to domain-specific knowledge bases (PubMed, ChEMBL, ClinicalTrials.gov, ...), safe code execution, multi-step reasoning, and structured output — all wired together correctly. Starting from scratch every time is slow and error-prone.
BioDSA solves this by providing:
- A `BaseAgent` foundation with built-in LLM support (OpenAI, Anthropic, Azure, Google), Docker-sandboxed code execution, and retry handling — so you focus on the agent logic, not the plumbing
- LangGraph workflows for composing agent logic as state graphs with conditional edges — supporting ReAct loops, multi-stage pipelines, and multi-agent orchestration (a minimal sketch follows this list)
- 17+ biomedical knowledge base integrations (PubMed, ChEMBL, UniProt, Open Targets, Ensembl, cBioPortal, Reactome, ...) as plug-and-play tools
- 10 benchmarks with 1,900+ tasks for systematic evaluation
- Two skill libraries that teach AI coding assistants (Cursor, Claude Code, Codex, Gemini, OpenClaw) to both create new agents and run existing ones — so you can vibe-prototype or execute agents in minutes
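The LangGraph piece is an ordinary state graph: nodes do the reasoning and tool calls, and a conditional edge decides whether to loop or stop. The sketch below shows the pattern in plain LangGraph; the state fields, node names, and routing logic are illustrative placeholders, not BioDSA's internal classes, and a real agent would call an LLM and knowledge-base tools inside the nodes.

```python
from typing import TypedDict

from langgraph.graph import StateGraph, START, END


class AgentState(TypedDict):
    question: str
    steps: list[str]


def reason(state: AgentState) -> AgentState:
    # In a real agent, this node calls the LLM to decide the next action.
    state["steps"].append("reason")
    return state


def call_tool(state: AgentState) -> AgentState:
    # In a real agent, this node queries a knowledge base (e.g. PubMed) or runs sandboxed code.
    state["steps"].append("tool")
    return state


def should_continue(state: AgentState) -> str:
    # Route back to the tool node until a stop condition is met.
    return "tool" if len(state["steps"]) < 4 else "finish"


graph = StateGraph(AgentState)
graph.add_node("reason", reason)
graph.add_node("tool", call_tool)
graph.add_edge(START, "reason")
graph.add_conditional_edges("reason", should_continue, {"tool": "tool", "finish": END})
graph.add_edge("tool", "reason")
app = graph.compile()

final_state = app.invoke({"question": "What drives EGFR inhibitor resistance?", "steps": []})
```

The conditional edge is what turns a linear pipeline into a ReAct-style loop: the router keeps returning control to the tool node until it decides the agent has enough to answer.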
8 specialized agents have been built and published on BioDSA, spanning data analysis, deep research, literature review, clinical matching, and more:
| Agent | Type | Description | Paper | README | Tutorial |
|---|---|---|---|---|---|
| DSWizard | Single | Two-phase data science agent (planning → implementation) for biomedical data analysis | Nature BME | README | Tutorial |
| DeepEvidence | Multi-agent | Hierarchical orchestrator + BFS/DFS sub-agents for deep research across 17+ knowledge bases | arXiv | README | Tutorial |
| TrialMind-SLR | Multi-stage | Systematic literature review with 4-stage workflow (search, screen, extract, synthesize) | npj Digit. Med. | README | Tutorial |
| InformGen | Multi-stage | Clinical document generation with iterative write-review-revise workflow | JAMIA | README | Tutorial |
| TrialGPT | Multi-stage | Patient-to-trial matching with retrieval and eligibility scoring | Nature Comm. | README | Tutorial |
| AgentMD | Pipeline | Clinical risk prediction using large-scale toolkit of 2,164+ clinical calculators | Nature Comm. | README | Tutorial |
| GeneAgent | Single | Self-verification agent for gene set analysis with database-backed claim verification | Nature Methods | README | Tutorial |
| Virtual Lab | Multi-participant | Multi-agent meeting system for AI-powered scientific research discussions | Nature | README | Tutorial |
BioDSA supports three paths: manual (write code yourself), vibe-prototyping (let an AI assistant build a new agent), and vibe-executing (let an AI assistant run an existing agent on your task). The two AI-assisted workflows are diagrammed below, prototyping first, then executing.
┌──────────────────────────────────────────────────────────────┐
│ 1. INSTALL SKILLS │
│ ./install-cursor.sh (or claude-code/codex/gemini) │
└──────────────────┬───────────────────────────────────────────┘
▼
┌──────────────────────────────────────────────────────────────┐
│ 2. DESCRIBE YOUR AGENT │
│ "Build an agent that searches PubMed and ClinicalTrials │
│ to find competing trials for a drug candidate" │
│ │
│ Optionally attach: reference paper, design docs, │
│ or point to a benchmark dataset │
└──────────────────┬───────────────────────────────────────────┘
▼
┌──────────────────────────────────────────────────────────────┐
│ 3. REVIEW THE DESIGN PROPOSAL │
│ AI proposes: pattern, workflow diagram, tools, state │
│ You: confirm, adjust, or ask questions │
└──────────────────┬───────────────────────────────────────────┘
▼
┌──────────────────────────────────────────────────────────────┐
│ 4. AI GENERATES THE AGENT │
│ biodsa/agents/<name>/ │
│ ├── agent.py, state.py, prompt.py, tools.py │
│ ├── README.md + DESIGN.md (with Mermaid diagrams) │
│ run_<name>.py │
└──────────────────┬───────────────────────────────────────────┘
▼
┌──────────────────────────────────────────────────────────────┐
│ 5. RUN & ITERATE │
│ python run_<name>.py │
│ Evaluate on benchmarks, refine prompts/tools/logic │
└──────────────────────────────────────────────────────────────┘
┌──────────────────────────────────────────────────────────────┐
│ 1. INSTALL SKILLS (same as above) │
│ ./install-cursor.sh (or claude-code/codex/gemini) │
└──────────────────┬───────────────────────────────────────────┘
▼
┌──────────────────────────────────────────────────────────────┐
│ 2. DESCRIBE YOUR TASK │
│ "Run DeepEvidenceAgent to research EGFR inhibitor │
│ resistance mechanisms in lung cancer" │
│ │
│ "Write a batch eval script for SLRMetaAgent on my │
│ benchmark dataset at benchmarks/TrialPanoramaBench/" │
└──────────────────┬───────────────────────────────────────────┘
▼
┌──────────────────────────────────────────────────────────────┐
│ 3. AI PICKS THE AGENT & WRITES THE SCRIPT │
│ Selects the right agent, configures it, handles output │
│ → run_task.py (single or batch execution) │
└──────────────────┬───────────────────────────────────────────┘
▼
┌──────────────────────────────────────────────────────────────┐
│ 4. COLLECT DELIVERABLES │
│ JSON results, PDF report, downloaded artifacts │
│ python run_task.py │
└──────────────────────────────────────────────────────────────┘
```bash
./install-cursor.sh        # Cursor (project-level)
./install-claude-code.sh   # Claude Code (global)
./install-codex.sh         # Codex CLI (global)
./install-gemini.sh        # Gemini CLI (global)
./install-openclaw.sh      # OpenClaw (global)
```

Each installer installs both skill sets (agent development + agent execution). All installers support `--project`, `--uninstall`, `--dry-run`, and `--verbose` flags.
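For example (assuming `--project` switches a global installer to a project-level install; flag names as listed above):

```bash
# Preview what the installer would change without writing anything
./install-claude-code.sh --dry-run --verbose

# Install into the current project instead of globally
./install-claude-code.sh --project

# Remove previously installed skills
./install-cursor.sh --uninstall
```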
Manual installation & uninstall
Copy the .md files from both skill source directories to your tool's skills directory:
| Tool | Target Base Directory |
|---|---|
| Cursor | `<project>/.cursor/skills/` |
| Claude Code (global) | `~/.claude/skills/` |
| Claude Code (project) | `<project>/.claude/skills/` |
| Codex CLI (global) | `~/.codex/skills/` |
| Gemini CLI (global) | `~/.gemini/skills/` |
| OpenClaw (global) | `~/.openclaw/skills/` |
Inside the target base, create two folders:
- `biodsa-agent-development/` — copy files from `biodsa-agent-dev-skills/`
- `biodsa-agent-execution/` — copy files from `biodsa-agent-exec-skills/`
To uninstall, run any installer with `--uninstall`, or delete both folders from your tool's skills directory.
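As a concrete example, a manual install for Claude Code (global) might look like the following, assuming the skill files sit at the top level of each source directory:

```bash
# Manual install for Claude Code (global); swap the target base for other tools
mkdir -p ~/.claude/skills/biodsa-agent-development ~/.claude/skills/biodsa-agent-execution
cp biodsa-agent-dev-skills/*.md  ~/.claude/skills/biodsa-agent-development/
cp biodsa-agent-exec-skills/*.md ~/.claude/skills/biodsa-agent-execution/

# Manual uninstall: remove both skill folders
rm -rf ~/.claude/skills/biodsa-agent-development ~/.claude/skills/biodsa-agent-execution
```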
Creating new agents (uses dev skills):
"Create an agent called DrugRepurposing that searches PubMed, ChEMBL,
and Open Targets for drug repurposing opportunities."
"Here is a paper on clinical evidence synthesis (~/papers/synthesis.pdf).
Build the agent and evaluate it on benchmarks/TrialPanoramaBench/"
"Build a multi-agent system where an orchestrator delegates gene analysis
to a BFS sub-agent and pathway analysis to a DFS sub-agent."
Running existing agents (uses exec skills):
"Run DeepEvidenceAgent to research EGFR inhibitor resistance in NSCLC"
"Write a script that uses DSWizardAgent to analyze the cBioPortal BRCA
dataset and generate a PDF report."
"Batch-evaluate SLRMetaAgent on 10 systematic review questions and
collect results as JSON."
"Use TrialGPTAgent to match this patient note to clinical trials."
```bash
git clone https://github.com/RyanWangZf/BioDSA.git
cd BioDSA
pip install pipenv && pipenv install && pipenv shell
```

Create a `.env` file with your API keys:

```
OPENAI_API_KEY=your_key_here
# Or: AZURE_OPENAI_API_KEY, ANTHROPIC_API_KEY, GOOGLE_API_KEY
```

Then run one of the published agents, or extend `BaseAgent` and define your own workflow as a LangGraph state graph. For example, running `DSWizardAgent` on a cBioPortal dataset:
```python
import os

from biodsa.agents import DSWizardAgent

agent = DSWizardAgent(
    model_name="gpt-5",
    api_type="openai",
    api_key=os.environ["OPENAI_API_KEY"],
)
agent.register_workspace("./biomedical_data/cBioPortal/datasets/acbc_mskcc_2015")
results = agent.go("Perform survival analysis for TP53 mutant vs wild-type patients")
```

See `tutorials/` for Jupyter notebooks covering each agent.
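Published agents share the `BaseAgent` interface, so switching agents is mostly a one-line change. A sketch with `DeepEvidenceAgent`, assuming it accepts the same constructor arguments as `DSWizardAgent` (check its README under `biodsa/agents/` for the exact options):

```python
import os

from biodsa.agents import DeepEvidenceAgent  # import path assumed to mirror DSWizardAgent

agent = DeepEvidenceAgent(
    model_name="gpt-5",
    api_type="openai",
    api_key=os.environ["OPENAI_API_KEY"],
)
results = agent.go("Research EGFR inhibitor resistance mechanisms in NSCLC")
print(results.final_response)
```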
Every agent returns an `ExecutionResults` object with a structured trace of the full run:

```python
results = agent.go("Analyze TP53 mutation patterns in breast cancer")

# The agent's final answer
print(results.final_response)

# Full conversation trace (all LLM calls, tool outputs, reasoning steps)
print(results.message_history)

# Any code the agent wrote and executed in the sandbox
print(results.code_execution_results)

# Export a PDF report with figures, code, and narrative
results.to_pdf(output_dir="reports")

# Export structured JSON
results.to_json(output_path="results/analysis.json")

# Download generated artifacts (plots, tables, etc.)
results.download_artifacts(output_dir="artifacts")
```

The PDF report includes the agent's reasoning, executed code blocks, generated figures, and final conclusions — ready to share with collaborators.
Evaluate agents on 10 benchmarks covering hypothesis validation, code generation, reasoning, and evidence synthesis:
| Benchmark | Tasks | Type |
|---|---|---|
| BioDSA-1K | 1,029 | Hypothesis validation |
| BioDSBench (Python + R) | 293 | Code generation |
| HLE-Biomedicine / Medicine | 70 | Hard reasoning QA |
| LabBench | 75 | Literature & database QA |
| SuperGPQA | 172 | Expert-level QA |
| TrialPanoramaBench | 50 | Evidence synthesis |
| TRQA-lit | 172 | Translational research QA |
See `benchmarks/README.md` for dataset details and loading instructions.
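A batch evaluation is usually just a loop over a benchmark's task file. A minimal sketch, assuming the tasks live in a JSON list with a `question` field per task; the actual file layout and field names vary by benchmark, so follow `benchmarks/README.md`:

```python
import json
import os

from biodsa.agents import DSWizardAgent

# Hypothetical task file and field name; see benchmarks/README.md for the real layout
with open("benchmarks/BioDSA-1K/tasks.json") as f:
    tasks = json.load(f)

agent = DSWizardAgent(
    model_name="gpt-5",
    api_type="openai",
    api_key=os.environ["OPENAI_API_KEY"],
)

for i, task in enumerate(tasks[:10]):  # start with a small slice
    results = agent.go(task["question"])
    results.to_json(output_path=f"results/biodsa1k_{i:04d}.json")
```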
Repository layout:

```
BioDSA/
├── biodsa/ # Core framework
│ ├── agents/ # Agent implementations (8 published + base classes)
│ ├── tools/ # Low-level API tools (17+ knowledge bases)
│ ├── tool_wrappers/ # LangChain tool wrappers
│ ├── sandbox/ # Docker sandbox & ExecutionResults
│ └── memory/ # Memory graph system
├── benchmarks/ # 10 evaluation benchmarks (1,900+ tasks)
├── tutorials/ # Jupyter notebook tutorials for each agent
├── scripts/ # Example run scripts
├── biodsa-agent-dev-skills/ # Skill library: creating new agents
├── biodsa-agent-exec-skills/ # Skill library: running existing agents
├── install-*.sh # One-command installers (Cursor, Claude, Codex, Gemini, OpenClaw)
├── biodsa_env/ # Docker sandbox build files
├── tests/ # Tool and integration tests
└── biomedical_data/           # Example datasets (cBioPortal, Open Targets)
```
If you use BioDSA in your research, please cite:
```bibtex
@article{wang2026reliable,
title={Making large language models reliable data science programming copilots for biomedical research},
author={Wang, Zifeng and Danek, Benjamin and Yang, Ziwei and Chen, Zheng and Sun, Jimeng},
journal={Nature Biomedical Engineering},
year={2026},
doi={10.1038/s41551-025-01587-2}
}
@article{wang2026deepevidence,
title={DeepEvidence: Empowering Biomedical Discovery with Deep Knowledge Graph Research},
author={Wang, Zifeng and Chen, Zheng and Yang, Ziwei and Wang, Xuan and Jin, Qiao and Peng, Yifan and Lu, Zhiyong and Sun, Jimeng},
journal={arXiv preprint arXiv:2601.11560},
year={2026}
}
```

Documentation: tutorials/ | biodsa-agent-dev-skills/ | biodsa-agent-exec-skills/ | benchmarks/ | biodsa_env/
Links: biodsa.github.io | Keiji AI | BioDSA-1K | DeepEvidence | TrialReviewBench
License: LICENSE