

BioDSA: Vibe-Prototype AI Agents for Biomedicine

BioDSA is an open-source framework for rapidly prototyping, optimizing, and benchmarking AI agents for biomedical tasks — from data analysis and literature research to clinical trial matching and drug discovery.

Describe what you want in natural language. Get a working agent in minutes.


Motivation

Building AI agents for biomedicine is hard. A typical agent needs LLM orchestration, access to domain-specific knowledge bases (PubMed, ChEMBL, ClinicalTrials.gov, ...), safe code execution, multi-step reasoning, and structured output — all wired together correctly. Starting from scratch every time is slow and error-prone.

BioDSA solves this by providing:

  • A BaseAgent foundation with built-in LLM support (OpenAI, Anthropic, Azure, Google), Docker-sandboxed code execution, and retry handling — so you focus on the agent logic, not the plumbing
  • LangGraph workflows for composing agent logic as state graphs with conditional edges — supporting ReAct loops, multi-stage pipelines, and multi-agent orchestration
  • 17+ biomedical knowledge base integrations (PubMed, ChEMBL, UniProt, Open Targets, Ensembl, cBioPortal, Reactome, ...) as plug-and-play tools
  • 10 benchmarks with 1,900+ tasks for systematic evaluation
  • Two skill libraries that teach AI coding assistants (Cursor, Claude Code, Codex, Gemini, OpenClaw) to both create new agents and run existing ones — so you can vibe-prototype or execute agents in minutes

Implemented Agents

8 specialized agents are implemented and published in BioDSA, spanning data analysis, deep research, literature review, clinical matching, and more:

  • DSWizard (Single): Two-phase data science agent (planning → implementation) for biomedical data analysis. Paper: Nature BME. Docs: README | Tutorial
  • DeepEvidence (Multi-agent): Hierarchical orchestrator + BFS/DFS sub-agents for deep research across 17+ knowledge bases. Paper: arXiv. Docs: README | Tutorial
  • TrialMind-SLR (Multi-stage): Systematic literature review with 4-stage workflow (search, screen, extract, synthesize). Paper: npj Digit. Med. Docs: README | Tutorial
  • InformGen (Multi-stage): Clinical document generation with iterative write-review-revise workflow. Paper: JAMIA. Docs: README | Tutorial
  • TrialGPT (Multi-stage): Patient-to-trial matching with retrieval and eligibility scoring. Paper: Nature Comm. Docs: README | Tutorial
  • AgentMD (Pipeline): Clinical risk prediction using a large-scale toolkit of 2,164+ clinical calculators. Paper: Nature Comm. Docs: README | Tutorial
  • GeneAgent (Single): Self-verification agent for gene set analysis with database-backed claim verification. Paper: Nature Methods. Docs: README | Tutorial
  • Virtual Lab (Multi-participant): Multi-agent meeting system for AI-powered scientific research discussions. Paper: Nature. Docs: README | Tutorial

Flow: From Idea to Working Agent

BioDSA supports three paths — manual (write code yourself), vibe-prototyping (let an AI assistant build a new agent), and vibe-executing (let an AI assistant run an existing agent on your task).

Path A: Vibe-Prototype a New Agent

 ┌──────────────────────────────────────────────────────────────┐
 │  1. INSTALL SKILLS                                          │
 │     ./install-cursor.sh   (or claude-code/codex/gemini)     │
 └──────────────────┬───────────────────────────────────────────┘
                    ▼
 ┌──────────────────────────────────────────────────────────────┐
 │  2. DESCRIBE YOUR AGENT                                     │
 │     "Build an agent that searches PubMed and ClinicalTrials │
 │      to find competing trials for a drug candidate"         │
 │                                                             │
 │     Optionally attach: reference paper, design docs,        │
 │     or point to a benchmark dataset                         │
 └──────────────────┬───────────────────────────────────────────┘
                    ▼
 ┌──────────────────────────────────────────────────────────────┐
 │  3. REVIEW THE DESIGN PROPOSAL                              │
 │     AI proposes: pattern, workflow diagram, tools, state     │
 │     You: confirm, adjust, or ask questions                  │
 └──────────────────┬───────────────────────────────────────────┘
                    ▼
 ┌──────────────────────────────────────────────────────────────┐
 │  4. AI GENERATES THE AGENT                                  │
 │     biodsa/agents/<name>/                                   │
 │       ├── agent.py, state.py, prompt.py, tools.py           │
 │       ├── README.md + DESIGN.md (with Mermaid diagrams)     │
 │     run_<name>.py                                           │
 └──────────────────┬───────────────────────────────────────────┘
                    ▼
 ┌──────────────────────────────────────────────────────────────┐
 │  5. RUN & ITERATE                                           │
 │     python run_<name>.py                                    │
 │     Evaluate on benchmarks, refine prompts/tools/logic      │
 └──────────────────────────────────────────────────────────────┘

Path B: Vibe-Execute an Existing Agent

 ┌──────────────────────────────────────────────────────────────┐
 │  1. INSTALL SKILLS (same as above)                          │
 │     ./install-cursor.sh   (or claude-code/codex/gemini)     │
 └──────────────────┬───────────────────────────────────────────┘
                    ▼
 ┌──────────────────────────────────────────────────────────────┐
 │  2. DESCRIBE YOUR TASK                                      │
 │     "Run DeepEvidenceAgent to research EGFR inhibitor       │
 │      resistance mechanisms in lung cancer"                  │
 │                                                             │
 │     "Write a batch eval script for SLRMetaAgent on my       │
 │      benchmark dataset at benchmarks/TrialPanoramaBench/"   │
 └──────────────────┬───────────────────────────────────────────┘
                    ▼
 ┌──────────────────────────────────────────────────────────────┐
 │  3. AI PICKS THE AGENT & WRITES THE SCRIPT                  │
 │     Selects the right agent, configures it, handles output  │
 │     → run_task.py  (single or batch execution)              │
 └──────────────────┬───────────────────────────────────────────┘
                    ▼
 ┌──────────────────────────────────────────────────────────────┐
 │  4. COLLECT DELIVERABLES                                    │
 │     JSON results, PDF report, downloaded artifacts          │
 │     python run_task.py                                      │
 └──────────────────────────────────────────────────────────────┘

Install Skills

./install-cursor.sh        # Cursor (project-level)
./install-claude-code.sh   # Claude Code (global)
./install-codex.sh         # Codex CLI (global)
./install-gemini.sh        # Gemini CLI (global)
./install-openclaw.sh      # OpenClaw (global)

Each script installs both skill sets (agent development and agent execution). All installers support the --project, --uninstall, --dry-run, and --verbose flags.

Manual installation & uninstall

Copy the .md files from both skill source directories to your tool's skills directory:

Tool                    Target Base Directory
Cursor                  <project>/.cursor/skills/
Claude Code (global)    ~/.claude/skills/
Claude Code (project)   <project>/.claude/skills/
Codex CLI (global)      ~/.codex/skills/
Gemini CLI (global)     ~/.gemini/skills/
OpenClaw (global)       ~/.openclaw/skills/

Inside the target base, create two folders:

  • biodsa-agent-development/ — copy files from biodsa-agent-dev-skills/
  • biodsa-agent-execution/ — copy files from biodsa-agent-exec-skills/
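
For example, a by-hand global install for Claude Code, assuming the skill .md files sit at the top level of each source directory:

mkdir -p ~/.claude/skills/biodsa-agent-development ~/.claude/skills/biodsa-agent-execution
cp biodsa-agent-dev-skills/*.md ~/.claude/skills/biodsa-agent-development/
cp biodsa-agent-exec-skills/*.md ~/.claude/skills/biodsa-agent-execution/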

To uninstall, run any installer with --uninstall, or delete both folders from your tool's skills directory.

Example Prompts

Creating new agents (uses dev skills):

"Create an agent called DrugRepurposing that searches PubMed, ChEMBL,
 and Open Targets for drug repurposing opportunities."

"Here is a paper on clinical evidence synthesis (~/papers/synthesis.pdf).
 Build the agent and evaluate it on benchmarks/TrialPanoramaBench/"

"Build a multi-agent system where an orchestrator delegates gene analysis
 to a BFS sub-agent and pathway analysis to a DFS sub-agent."

Running existing agents (uses exec skills):

"Run DeepEvidenceAgent to research EGFR inhibitor resistance in NSCLC"

"Write a script that uses DSWizardAgent to analyze the cBioPortal BRCA
 dataset and generate a PDF report."

"Batch-evaluate SLRMetaAgent on 10 systematic review questions and
 collect results as JSON."

"Use TrialGPTAgent to match this patient note to clinical trials."

Path C: Build Manually

git clone https://github.com/RyanWangZf/BioDSA.git
cd BioDSA
pip install pipenv && pipenv install && pipenv shell

Create a .env file with your API keys:

OPENAI_API_KEY=your_key_here
# Or: AZURE_OPENAI_API_KEY, ANTHROPIC_API_KEY, GOOGLE_API_KEY

Then run a published agent directly, or extend BaseAgent and define your workflow as a LangGraph state graph. For example, running DSWizard on a cBioPortal dataset:

import os
from biodsa.agents import DSWizardAgent

# Pick a published agent and configure the LLM backend
agent = DSWizardAgent(
    model_name="gpt-5",
    api_type="openai",
    api_key=os.environ["OPENAI_API_KEY"]
)

# Register a local dataset directory as the agent's workspace
agent.register_workspace("./biomedical_data/cBioPortal/datasets/acbc_mskcc_2015")

# Run the task end to end; returns an ExecutionResults object
results = agent.go("Perform survival analysis for TP53 mutant vs wild-type patients")
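
For a fully custom agent, the workflow itself is a LangGraph state graph. Below is a minimal, self-contained sketch of that pattern using the standard LangGraph API; the state fields and node bodies are illustrative placeholders, and how BaseAgent wires in the compiled graph is left to the conventions documented in biodsa-agent-dev-skills/:

from typing import TypedDict

from langgraph.graph import StateGraph, START, END

class SearchState(TypedDict):
    question: str
    hits: list[str]
    answer: str

def search_pubmed(state: SearchState) -> dict:
    # Placeholder: call a PubMed tool wrapper here
    return {"hits": [f"stub result for: {state['question']}"]}

def synthesize(state: SearchState) -> dict:
    # Placeholder: call the LLM to summarize the hits
    return {"answer": "; ".join(state["hits"])}

# Compose the nodes into a state graph with explicit edges
builder = StateGraph(SearchState)
builder.add_node("search", search_pubmed)
builder.add_node("synthesize", synthesize)
builder.add_edge(START, "search")
builder.add_edge("search", "synthesize")
builder.add_edge("synthesize", END)
graph = builder.compile()

print(graph.invoke({"question": "TP53 in breast cancer", "hits": [], "answer": ""}))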

See tutorials/ for Jupyter notebooks covering each agent.


Output Example

Every agent returns an ExecutionResults object with a structured trace of the full run:

results = agent.go("Analyze TP53 mutation patterns in breast cancer")

# The agent's final answer
print(results.final_response)

# Full conversation trace (all LLM calls, tool outputs, reasoning steps)
print(results.message_history)

# Any code the agent wrote and executed in the sandbox
print(results.code_execution_results)

# Export a PDF report with figures, code, and narrative
results.to_pdf(output_dir="reports")

# Export structured JSON
results.to_json(output_path="results/analysis.json")

# Download generated artifacts (plots, tables, etc.)
results.download_artifacts(output_dir="artifacts")

The PDF report includes the agent's reasoning, executed code blocks, generated figures, and final conclusions — ready to share with collaborators.

Benchmarking

Evaluate agents on 10 benchmarks covering hypothesis validation, code generation, reasoning, and evidence synthesis:

Benchmark                    Tasks   Type
BioDSA-1K                    1,029   Hypothesis validation
BioDSBench (Python + R)        293   Code generation
HLE-Biomedicine / Medicine      70   Hard reasoning QA
LabBench                        75   Literature & database QA
SuperGPQA                      172   Expert-level QA
TrialPanoramaBench              50   Evidence synthesis
TRQA-lit                       172   Translational research QA

See benchmarks/README.md for dataset details and loading instructions.
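
As a rough sketch of a batch run, the loop below chains the documented agent.go() and to_json() calls; the task file path and schema are hypothetical stand-ins for the real per-benchmark formats and loaders described in benchmarks/README.md:

import json
import os

from biodsa.agents import DSWizardAgent

agent = DSWizardAgent(
    model_name="gpt-5",
    api_type="openai",
    api_key=os.environ["OPENAI_API_KEY"],
)

# Assumed format: a JSON list of {"question": ...} task records
with open("benchmarks/BioDSA-1K/tasks.json") as f:
    tasks = json.load(f)

os.makedirs("results", exist_ok=True)
for i, task in enumerate(tasks):
    results = agent.go(task["question"])
    results.to_json(output_path=f"results/task_{i:04d}.json")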


Repository Structure

BioDSA/
├── biodsa/                          # Core framework
│   ├── agents/                      #   Agent implementations (8 published + base classes)
│   ├── tools/                       #   Low-level API tools (17+ knowledge bases)
│   ├── tool_wrappers/               #   LangChain tool wrappers
│   ├── sandbox/                     #   Docker sandbox & ExecutionResults
│   └── memory/                      #   Memory graph system
├── benchmarks/                      # 10 evaluation benchmarks (1,900+ tasks)
├── tutorials/                       # Jupyter notebook tutorials for each agent
├── scripts/                         # Example run scripts
├── biodsa-agent-dev-skills/         # Skill library: creating new agents
├── biodsa-agent-exec-skills/        # Skill library: running existing agents
├── install-*.sh                     # One-command installers (Cursor, Claude, Codex, Gemini, OpenClaw)
├── biodsa_env/                      # Docker sandbox build files
├── tests/                           # Tool and integration tests
└── biomedical_data/                 # Example datasets (cBioPortal, Open Targets)

Reference

If you use BioDSA in your research, please cite:

@article{wang2026reliable,
  title={Making large language models reliable data science programming copilots for biomedical research},
  author={Wang, Zifeng and Danek, Benjamin and Yang, Ziwei and Chen, Zheng and Sun, Jimeng},
  journal={Nature Biomedical Engineering},
  year={2026},
  doi={10.1038/s41551-025-01587-2}
}

@article{wang2026deepevidence,
  title={DeepEvidence: Empowering Biomedical Discovery with Deep Knowledge Graph Research},
  author={Wang, Zifeng and Chen, Zheng and Yang, Ziwei and Wang, Xuan and Jin, Qiao and Peng, Yifan and Lu, Zhiyong and Sun, Jimeng},
  journal={arXiv preprint arXiv:2601.11560},
  year={2026}
}

Documentation: tutorials/ | biodsa-agent-dev-skills/ | biodsa-agent-exec-skills/ | benchmarks/ | biodsa_env/

Links: biodsa.github.io | Keiji AI | BioDSA-1K | DeepEvidence | TrialReviewBench

License: LICENSE
