BioDSA is an open-source framework for rapidly prototyping, optimizing, and benchmarking AI agents for biomedical tasks — from data analysis and literature research to clinical trial matching and drug discovery.
Describe what you want in natural language. Get a working agent in minutes.
Building AI agents for biomedicine is hard. A typical agent needs LLM orchestration, access to domain-specific knowledge bases (PubMed, ChEMBL, ClinicalTrials.gov, ...), safe code execution, multi-step reasoning, and structured output — all wired together correctly. Starting from scratch every time is slow and error-prone.
BioDSA solves this by providing:
- A `BaseAgent` foundation with built-in LLM support (OpenAI, Anthropic, Azure, Google), Docker-sandboxed code execution, and retry handling — so you focus on the agent logic, not the plumbing
- LangGraph workflows for composing agent logic as state graphs with conditional edges — supporting ReAct loops, multi-stage pipelines, and multi-agent orchestration (a minimal sketch follows this list)
- 17+ biomedical knowledge base integrations (PubMed, ChEMBL, UniProt, Open Targets, Ensembl, cBioPortal, Reactome, ...) as plug-and-play tools
- 10 benchmarks with 1,900+ tasks for systematic evaluation
- Two skill libraries that teach AI coding assistants (Cursor, Claude Code, Codex, Gemini, OpenClaw) to both create new agents and run existing ones — so you can vibe-prototype or execute agents in minutes
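The LangGraph piece is an ordinary state graph: nodes do the reasoning and tool calls, and a conditional edge decides whether to loop or stop. The sketch below shows the pattern in plain LangGraph; the state fields, node names, and routing logic are illustrative placeholders, not BioDSA's internal classes, and a real agent would call an LLM and knowledge-base tools inside the nodes.

```python
from typing import TypedDict

from langgraph.graph import StateGraph, START, END


class AgentState(TypedDict):
    question: str
    steps: list[str]


def reason(state: AgentState) -> AgentState:
    # In a real agent, this node calls the LLM to decide the next action.
    state["steps"].append("reason")
    return state


def call_tool(state: AgentState) -> AgentState:
    # In a real agent, this node queries a knowledge base (e.g. PubMed) or runs sandboxed code.
    state["steps"].append("tool")
    return state


def should_continue(state: AgentState) -> str:
    # Route back to the tool node until a stop condition is met.
    return "tool" if len(state["steps"]) < 4 else "finish"


graph = StateGraph(AgentState)
graph.add_node("reason", reason)
graph.add_node("tool", call_tool)
graph.add_edge(START, "reason")
graph.add_conditional_edges("reason", should_continue, {"tool": "tool", "finish": END})
graph.add_edge("tool", "reason")
app = graph.compile()

final_state = app.invoke({"question": "What drives EGFR inhibitor resistance?", "steps": []})
```

The conditional edge is what turns a linear pipeline into a ReAct-style loop: the router keeps returning control to the tool node until it decides the agent has enough to answer.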
8 specialized agents have been built and published on BioDSA, spanning data analysis, deep research, literature review, clinical matching, and more:
| Agent | Type | Description | Paper | README | Tutorial |
|---|---|---|---|---|---|
| DSWizard | Single | Two-phase data science agent (planning → implementation) for biomedical data analysis | Nature BME | README | Tutorial |
| DeepEvidence | Multi-agent | Hierarchical orchestrator + BFS/DFS sub-agents for deep research across 17+ knowledge bases | arXiv | README | Tutorial |
| TrialMind-SLR | Multi-stage | Systematic literature review with 4-stage workflow (search, screen, extract, synthesize) | npj Digit. Med. | README | Tutorial |
| InformGen | Multi-stage | Clinical document generation with iterative write-review-revise workflow | JAMIA | README | Tutorial |
| TrialGPT | Multi-stage | Patient-to-trial matching with retrieval and eligibility scoring | Nature Comm. | README | Tutorial |
| AgentMD | Pipeline | Clinical risk prediction using large-scale toolkit of 2,164+ clinical calculators | Nature Comm. | README | Tutorial |
| GeneAgent | Single | Self-verification agent for gene set analysis with database-backed claim verification | Nature Methods | README | Tutorial |
| Virtual Lab | Multi-participant | Multi-agent meeting system for AI-powered scientific research discussions | Nature | README | Tutorial |
BioDSA supports three paths: manual (write code yourself), vibe-prototyping (let an AI assistant build a new agent), and vibe-executing (let an AI assistant run an existing agent on your task). The two AI-assisted workflows are diagrammed below, prototyping first, then executing.
┌──────────────────────────────────────────────────────────────┐
│ 1. INSTALL SKILLS │
│ ./install-cursor.sh (or claude-code/codex/gemini) │
└──────────────────┬───────────────────────────────────────────┘
▼
┌──────────────────────────────────────────────────────────────┐
│ 2. DESCRIBE YOUR AGENT │
│ "Build an agent that searches PubMed and ClinicalTrials │
│ to find competing trials for a drug candidate" │
│ │
│ Optionally attach: reference paper, design docs, │
│ or point to a benchmark dataset │
└──────────────────┬───────────────────────────────────────────┘
▼
┌──────────────────────────────────────────────────────────────┐
│ 3. REVIEW THE DESIGN PROPOSAL │
│ AI proposes: pattern, workflow diagram, tools, state │
│ You: confirm, adjust, or ask questions │
└──────────────────┬───────────────────────────────────────────┘
▼
┌──────────────────────────────────────────────────────────────┐
│ 4. AI GENERATES THE AGENT │
│ biodsa/agents/<name>/ │
│ ├── agent.py, state.py, prompt.py, tools.py │
│ ├── README.md + DESIGN.md (with Mermaid diagrams) │
│ run_<name>.py │
└──────────────────┬───────────────────────────────────────────┘
▼
┌──────────────────────────────────────────────────────────────┐
│ 5. RUN & ITERATE │
│ python run_<name>.py │
│ Evaluate on benchmarks, refine prompts/tools/logic │
└──────────────────────────────────────────────────────────────┘
┌──────────────────────────────────────────────────────────────┐
│ 1. INSTALL SKILLS (same as above) │
│ ./install-cursor.sh (or claude-code/codex/gemini) │
└──────────────────┬───────────────────────────────────────────┘
▼
┌──────────────────────────────────────────────────────────────┐
│ 2. DESCRIBE YOUR TASK │
│ "Run DeepEvidenceAgent to research EGFR inhibitor │
│ resistance mechanisms in lung cancer" │
│ │
│ "Write a batch eval script for SLRMetaAgent on my │
│ benchmark dataset at benchmarks/TrialPanoramaBench/" │
└──────────────────┬───────────────────────────────────────────┘
▼
┌──────────────────────────────────────────────────────────────┐
│ 3. AI PICKS THE AGENT & WRITES THE SCRIPT │
│ Selects the right agent, configures it, handles output │
│ → run_task.py (single or batch execution) │
└──────────────────┬───────────────────────────────────────────┘
▼
┌──────────────────────────────────────────────────────────────┐
│ 4. COLLECT DELIVERABLES │
│ JSON results, PDF report, downloaded artifacts │
│ python run_task.py │
└──────────────────────────────────────────────────────────────┘
```bash
./install-cursor.sh        # Cursor (project-level)
./install-claude-code.sh   # Claude Code (global)
./install-codex.sh         # Codex CLI (global)
./install-gemini.sh        # Gemini CLI (global)
./install-openclaw.sh      # OpenClaw (global)
```

Each installer installs both skill sets (agent development + agent execution). All installers support `--project`, `--uninstall`, `--dry-run`, and `--verbose` flags.
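For example (assuming `--project` switches a global installer to a project-level install; flag names as listed above):

```bash
# Preview what the installer would change without writing anything
./install-claude-code.sh --dry-run --verbose

# Install into the current project instead of globally
./install-claude-code.sh --project

# Remove previously installed skills
./install-cursor.sh --uninstall
```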
Manual installation & uninstall
Copy the .md files from both skill source directories to your tool's skills directory:
| Tool | Target Base Directory |
|---|---|
| Cursor | `<project>/.cursor/skills/` |
| Claude Code (global) | `~/.claude/skills/` |
| Claude Code (project) | `<project>/.claude/skills/` |
| Codex CLI (global) | `~/.codex/skills/` |
| Gemini CLI (global) | `~/.gemini/skills/` |
| OpenClaw (global) | `~/.openclaw/skills/` |
Inside the target base, create two folders:
- `biodsa-agent-development/` — copy files from `biodsa-agent-dev-skills/`
- `biodsa-agent-execution/` — copy files from `biodsa-agent-exec-skills/`
To uninstall, run any installer with `--uninstall`, or delete both folders from your tool's skills directory.
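As a concrete example, a manual install for Claude Code (global) might look like the following, assuming the skill files sit at the top level of each source directory:

```bash
# Manual install for Claude Code (global); swap the target base for other tools
mkdir -p ~/.claude/skills/biodsa-agent-development ~/.claude/skills/biodsa-agent-execution
cp biodsa-agent-dev-skills/*.md  ~/.claude/skills/biodsa-agent-development/
cp biodsa-agent-exec-skills/*.md ~/.claude/skills/biodsa-agent-execution/

# Manual uninstall: remove both skill folders
rm -rf ~/.claude/skills/biodsa-agent-development ~/.claude/skills/biodsa-agent-execution
```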
Creating new agents (uses dev skills):
"Create an agent called DrugRepurposing that searches PubMed, ChEMBL,
and Open Targets for drug repurposing opportunities."
"Here is a paper on clinical evidence synthesis (~/papers/synthesis.pdf).
Build the agent and evaluate it on benchmarks/TrialPanoramaBench/"
"Build a multi-agent system where an orchestrator delegates gene analysis
to a BFS sub-agent and pathway analysis to a DFS sub-agent."
Running existing agents (uses exec skills):
"Run DeepEvidenceAgent to research EGFR inhibitor resistance in NSCLC"
"Write a script that uses DSWizardAgent to analyze the cBioPortal BRCA
dataset and generate a PDF report."
"Batch-evaluate SLRMetaAgent on 10 systematic review questions and
collect results as JSON."
"Use TrialGPTAgent to match this patient note to clinical trials."
```bash
git clone https://github.com/RyanWangZf/BioDSA.git
cd BioDSA
pip install pipenv && pipenv install && pipenv shell
```

Create a `.env` file with your API keys:

```
OPENAI_API_KEY=your_key_here
# Or: AZURE_OPENAI_API_KEY, ANTHROPIC_API_KEY, GOOGLE_API_KEY
```

Then run one of the published agents, or extend `BaseAgent` and define your own workflow as a LangGraph state graph. For example, running `DSWizardAgent` on a cBioPortal dataset:
```python
import os

from biodsa.agents import DSWizardAgent

agent = DSWizardAgent(
    model_name="gpt-5",
    api_type="openai",
    api_key=os.environ["OPENAI_API_KEY"],
)
agent.register_workspace("./biomedical_data/cBioPortal/datasets/acbc_mskcc_2015")
results = agent.go("Perform survival analysis for TP53 mutant vs wild-type patients")
```

See `tutorials/` for Jupyter notebooks covering each agent.
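Published agents share the `BaseAgent` interface, so switching agents is mostly a one-line change. A sketch with `DeepEvidenceAgent`, assuming it accepts the same constructor arguments as `DSWizardAgent` (check its README under `biodsa/agents/` for the exact options):

```python
import os

from biodsa.agents import DeepEvidenceAgent  # import path assumed to mirror DSWizardAgent

agent = DeepEvidenceAgent(
    model_name="gpt-5",
    api_type="openai",
    api_key=os.environ["OPENAI_API_KEY"],
)
results = agent.go("Research EGFR inhibitor resistance mechanisms in NSCLC")
print(results.final_response)
```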
Every agent returns an `ExecutionResults` object with a structured trace of the full run:

```python
results = agent.go("Analyze TP53 mutation patterns in breast cancer")

# The agent's final answer
print(results.final_response)

# Full conversation trace (all LLM calls, tool outputs, reasoning steps)
print(results.message_history)

# Any code the agent wrote and executed in the sandbox
print(results.code_execution_results)

# Export a PDF report with figures, code, and narrative
results.to_pdf(output_dir="reports")

# Export structured JSON
results.to_json(output_path="results/analysis.json")

# Download generated artifacts (plots, tables, etc.)
results.download_artifacts(output_dir="artifacts")
```

The PDF report includes the agent's reasoning, executed code blocks, generated figures, and final conclusions — ready to share with collaborators.
Evaluate agents on 10 benchmarks covering hypothesis validation, code generation, reasoning, and evidence synthesis:
| Benchmark | Tasks | Type |
|---|---|---|
| BioDSA-1K | 1,029 | Hypothesis validation |
| BioDSBench (Python + R) | 293 | Code generation |
| HLE-Biomedicine / Medicine | 70 | Hard reasoning QA |
| LabBench | 75 | Literature & database QA |
| SuperGPQA | 172 | Expert-level QA |
| TrialPanoramaBench | 50 | Evidence synthesis |
| TRQA-lit | 172 | Translational research QA |
See `benchmarks/README.md` for dataset details and loading instructions.
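A batch evaluation is usually just a loop over a benchmark's task file. A minimal sketch, assuming the tasks live in a JSON list with a `question` field per task; the actual file layout and field names vary by benchmark, so follow `benchmarks/README.md`:

```python
import json
import os

from biodsa.agents import DSWizardAgent

# Hypothetical task file and field name; see benchmarks/README.md for the real layout
with open("benchmarks/BioDSA-1K/tasks.json") as f:
    tasks = json.load(f)

agent = DSWizardAgent(
    model_name="gpt-5",
    api_type="openai",
    api_key=os.environ["OPENAI_API_KEY"],
)

for i, task in enumerate(tasks[:10]):  # start with a small slice
    results = agent.go(task["question"])
    results.to_json(output_path=f"results/biodsa1k_{i:04d}.json")
```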
Repository layout:

```
BioDSA/
├── biodsa/ # Core framework
│ ├── agents/ # Agent implementations (8 published + base classes)
│ ├── tools/ # Low-level API tools (17+ knowledge bases)
│ ├── tool_wrappers/ # LangChain tool wrappers
│ ├── sandbox/ # Docker sandbox & ExecutionResults
│ └── memory/ # Memory graph system
├── benchmarks/ # 10 evaluation benchmarks (1,900+ tasks)
├── tutorials/ # Jupyter notebook tutorials for each agent
├── scripts/ # Example run scripts
├── biodsa-agent-dev-skills/ # Skill library: creating new agents
├── biodsa-agent-exec-skills/ # Skill library: running existing agents
├── install-*.sh # One-command installers (Cursor, Claude, Codex, Gemini, OpenClaw)
├── biodsa_env/ # Docker sandbox build files
├── tests/ # Tool and integration tests
└── biomedical_data/           # Example datasets (cBioPortal, Open Targets)
```
If you use BioDSA in your research, please cite:
```bibtex
@article{wang2026reliable,
title={Making large language models reliable data science programming copilots for biomedical research},
author={Wang, Zifeng and Danek, Benjamin and Yang, Ziwei and Chen, Zheng and Sun, Jimeng},
journal={Nature Biomedical Engineering},
year={2026},
doi={10.1038/s41551-025-01587-2}
}
@article{wang2026deepevidence,
title={DeepEvidence: Empowering Biomedical Discovery with Deep Knowledge Graph Research},
author={Wang, Zifeng and Chen, Zheng and Yang, Ziwei and Wang, Xuan and Jin, Qiao and Peng, Yifan and Lu, Zhiyong and Sun, Jimeng},
journal={arXiv preprint arXiv:2601.11560},
year={2026}
}
```

Documentation: tutorials/ | biodsa-agent-dev-skills/ | biodsa-agent-exec-skills/ | benchmarks/ | biodsa_env/
Links: biodsa.github.io | Keiji AI | BioDSA-1K | DeepEvidence | TrialReviewBench
License: LICENSE