Structuring Large Language Models as Agentic Workflows Improves Diagnostic Performance and Physician Assistance

This repository contains the code accompanying the paper:
"Structuring Large Language Models as Agentic Workflows Improves Diagnostic Performance and Physician Assistance".

Requirements

The code is released under an open-source license and enables full replication of our analyses once access to the MIMIC-IV-CDM dataset has been obtained. To reproduce the experiments, two requirements must be fulfilled: 1.Dataset access: Users must obtain independent access to the MIMIC-IV-CDM dataset via PhysioNet, in compliance with the data use agreement. 2.Cloud service configuration: Some components of our workflow require connecting to a private cloud service API in order to enable large language model (LLM) interaction with the MIMIC-IV-CDM dataset. This API service must adhere to the PhysioNet data use policy, including disabling any content retention, logging, or review functions that would result in secondary storage or inspection of protected data.

Main Components

Agentic Diagnostic Workflow (ADW)

The main implementation of ADW is in AGAP/auto_diagnosis.py.
subset_ids_1.csv is one of the sampled case id subset for reproducing the experiment easier.

At lines 699–700, the workflow is initialized and executed:

graph = AgentGraph(nodes=nodes, edges=edges, prompt_mode='strict')
graph.run_subset(seed=1)

The parameter prompt_mode controls the prompting variant. Available options:
- 'strict'
- 'lenient'
- 'own'

Full Dataset Execution

AGAP_own_full_dataset.py: Runs ADW on the full dataset with the specified settings.
prompts.py: All the LLM method prompts are here.

Baseline Comparisons

diagnosis_with_full_information.py: Runs Chain-of-Thought (CoT) and Vanilla Prompting baselines.

Deep Learning Baselines

lm_classification/: Contains code for Clinical Longformer and Longformer baselines.
data_processing.ipynb: Handles preprocessing of the dataset for classification models.

Human-in-the-Loop Experiment

First, set up the data in path: /Human_Examination/data, detailed see file: streamlit_app.spec(or streamlit_wo_ADW_app.spec).
Second, install the application with env ins_human_interface: pyinstaller streamlit_app.spec

Citation

If you use this code, please cite our paper:

@article{,
  title={Structuring Large Language Models as Agentic Workflows Improves Diagnostic Performance and Physician Assistance},
  author={...},
  journal={...},
  year={2025}
}

Name		Name	Last commit message	Last commit date
Latest commit History 7 Commits
AGAP		AGAP
GEE		GEE
Human_Examination		Human_Examination
lm_classification		lm_classification
statistics		statistics
.gitignore		.gitignore
ADW_env.yaml		ADW_env.yaml
Appendicitis_diagnostic_criteria.txt		Appendicitis_diagnostic_criteria.txt
Cholecystitis_diagnostic_criteria.txt		Cholecystitis_diagnostic_criteria.txt
Diverticulitis_diagnostic_criteria.txt		Diverticulitis_diagnostic_criteria.txt
Pancreatitis_diagnostic_criteria.txt		Pancreatitis_diagnostic_criteria.txt
README.md		README.md
diag_differ.txt		diag_differ.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Structuring Large Language Models as Agentic Workflows Improves Diagnostic Performance and Physician Assistance

Requirements

Main Components

Agentic Diagnostic Workflow (ADW)

Full Dataset Execution

Baseline Comparisons

Deep Learning Baselines

Human-in-the-Loop Experiment

Citation

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Structuring Large Language Models as Agentic Workflows Improves Diagnostic Performance and Physician Assistance

Requirements

Main Components

Agentic Diagnostic Workflow (ADW)

Full Dataset Execution

Baseline Comparisons

Deep Learning Baselines

Human-in-the-Loop Experiment

Citation

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages