
AI-powered predictive pipeline to forecast medication adherence risk using real-world refill data from chronic disease patients. Built for research, explainability, and publication.


mathachew7/MedAdhereAI


🧠 MedAdhereAI: Predicting Medication Adherence Risk Using Real-World Data

Python License: MIT

MedAdhereAI is a research-grade machine learning pipeline built to predict the risk of medication non-adherence among patients with chronic conditions like diabetes and hypertension. Using real-world claims and refill data from a developing nation's healthcare system, the project aims to deliver interpretable, deployable, and publishable adherence prediction models.


📄 Preprint Published on medRxiv
🆕 Citation:
Subash Yadav, Saijal Rajbhandari. MedAdhereAI: An Interpretable Machine Learning Pipeline for Predicting Medication Non-Adherence in Chronic Disease Patients Using Real-World Refill Data. medRxiv 2025.07.01.25330675; doi: 10.1101/2025.07.01.25330675
🔗 Read the Preprint


📌 Key Objectives

  • 📊 Forecast whether a patient will adhere to prescribed medications
  • ⚙️ Engineer features from real claim-level refill data
  • 🌍 Focus on chronic conditions in resource-constrained settings
  • 🔎 Use interpretable AI via SHAP to understand drivers of non-adherence
  • 📝 Support reproducible publication with notebooks + scripts

🧠 Current Progress (Complete)

✅ Phase 1: Exploratory Data Analysis

  • Loaded and cleaned the raw diabetes adherence dataset
  • Created a binary adherence target using a domain threshold (≥ 8)
  • Converted date columns to datetime for time-based feature creation
  • Engineered temporal features: time between service, assess, and refill dates
  • Computed refill gaps per patient and visualized refill behavior trends
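
The Phase 1 steps above can be sketched with pandas roughly as follows. This is a minimal illustration on toy data, not the notebook's actual code: the column names `PATIENT_ID`, `REFILL_DATE`, `ADHERENCE_SCORE`, and `REFILL_GAP_DAYS` are assumptions made for the example, and only `ADHERENT_BINARY` and the ≥ 8 threshold come from this README.

```python
import pandas as pd

# Toy claim-level records; every column name except ADHERENT_BINARY is hypothetical.
claims = pd.DataFrame({
    "PATIENT_ID": [1, 1, 1, 2, 2, 3],
    "REFILL_DATE": ["2023-01-01", "2023-02-03", "2023-03-01",
                    "2023-01-10", "2023-03-15", "2023-02-20"],
    "ADHERENCE_SCORE": [9, 8, 7, 5, 8, 10],
})

# Binary target: 1 when the score meets the >= 8 threshold, else 0.
claims["ADHERENT_BINARY"] = (claims["ADHERENCE_SCORE"] >= 8).astype(int)

# Parse dates so that time differences can be computed.
claims["REFILL_DATE"] = pd.to_datetime(claims["REFILL_DATE"])

# Days since the same patient's previous refill (NaN for a patient's first record).
claims = claims.sort_values(["PATIENT_ID", "REFILL_DATE"])
claims["REFILL_GAP_DAYS"] = (
    claims.groupby("PATIENT_ID")["REFILL_DATE"].diff().dt.days
)

print(claims)
```

Sorting before `diff()` matters here: the gap is only meaningful when each patient's refills are in chronological order.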

✅ Phase 2: Feature Engineering

  • Aggregated refill behavior features per patient:
    • avg_refill_gap, max_refill_gap, total_visits
  • Merged most recent binary adherence label (ADHERENT_BINARY) per patient
  • Enriched dataset with demographic features: GENDER and AGE
  • Handled missing values:
    • Refill gaps filled with 0.0 for single-visit patients
    • Dropped intermediate date fields after transformation
  • Exported cleaned dataset as .csv and .pkl for modeling
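
One way the per-patient aggregation described above might look in pandas is shown below. It is a sketch under assumptions: the input table and the `PATIENT_ID`, `REFILL_DATE`, and `refill_gap` columns are hypothetical, while `avg_refill_gap`, `max_refill_gap`, `total_visits`, `ADHERENT_BINARY`, `GENDER`, and `AGE` are the names used in this README.

```python
import pandas as pd

# Hypothetical per-refill table, sorted so "last" means "most recent".
refills = pd.DataFrame({
    "PATIENT_ID": [1, 1, 1, 2, 2, 3],
    "REFILL_DATE": pd.to_datetime(["2023-01-01", "2023-02-03", "2023-03-01",
                                   "2023-01-10", "2023-03-15", "2023-02-20"]),
    "ADHERENT_BINARY": [1, 1, 0, 0, 1, 1],
    "GENDER": ["F", "F", "F", "M", "M", "F"],
    "AGE": [61, 61, 61, 54, 54, 47],
}).sort_values(["PATIENT_ID", "REFILL_DATE"])

# Gap in days between consecutive refills of the same patient.
refills["refill_gap"] = refills.groupby("PATIENT_ID")["REFILL_DATE"].diff().dt.days

# One row per patient: refill behavior, most recent label, demographics.
features = refills.groupby("PATIENT_ID").agg(
    avg_refill_gap=("refill_gap", "mean"),
    max_refill_gap=("refill_gap", "max"),
    total_visits=("REFILL_DATE", "count"),
    ADHERENT_BINARY=("ADHERENT_BINARY", "last"),
    GENDER=("GENDER", "last"),
    AGE=("AGE", "last"),
)

# Single-visit patients have no gap to measure, so fill with 0.0.
gap_cols = ["avg_refill_gap", "max_refill_gap"]
features[gap_cols] = features[gap_cols].fillna(0.0)

print(features)
```

Filling single-visit gaps with 0.0 makes those patients look like perfectly regular refillers to the model, so `total_visits` is what lets it tell the two cases apart.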

✅ Phase 3: Model Building & Evaluation

  • Trained baseline models:
    • Logistic Regression (ROC AUC: 0.82)
    • Random Forest (ROC AUC: 0.77)
  • Performed evaluation with accuracy, F1-score, and ROC AUC
  • Validated model stability using 5-fold cross-validation
  • Assessed probability calibration via Brier score and calibration curve
  • Applied SHAP for local and global explainability
  • Exported trained models for deployment (.pkl format)
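
The evaluation steps above can be sketched with scikit-learn as follows. The data is synthetic (generated to roughly match a 78% / 22% class split), so the printed numbers will not reproduce the AUC or Brier values reported in this README. The sketch also scores a constant prediction at the class prevalence p, which gives a Brier score of about p(1 − p); for a 78/22 split that is 0.78 × 0.22 ≈ 0.172, a useful reference point when reading any Brier score.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import brier_score_loss, roc_auc_score
from sklearn.model_selection import cross_val_score, train_test_split

# Synthetic stand-in for the patient-level table (about 78% positive class).
X, y = make_classification(n_samples=1000, n_features=5, n_informative=3,
                           weights=[0.22, 0.78], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
proba = model.predict_proba(X_test)[:, 1]

# Discrimination (ROC AUC) and probability accuracy (Brier score).
auc = roc_auc_score(y_test, proba)
brier = brier_score_loss(y_test, proba)

# Reference: always predicting the training prevalence.
baseline = brier_score_loss(y_test, np.full(len(y_test), y_train.mean()))

# Stability check: ROC AUC across 5 cross-validation folds.
cv_auc = cross_val_score(LogisticRegression(max_iter=1000), X, y,
                         cv=5, scoring="roc_auc")

print(f"AUC={auc:.3f}  Brier={brier:.3f}  baseline Brier={baseline:.3f}")
print(f"5-fold AUC: {cv_auc.mean():.3f} +/- {cv_auc.std():.3f}")
```

A calibration curve for the same predictions could be drawn with `sklearn.calibration.calibration_curve(y_test, proba, n_bins=10)`.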

✅ Phase 4: Visualization, Public Health Framing, and Real-World Deployment

  • All visualizations (SHAP plots, feature importance, calibration curve) completed
  • Public health impact framing and interpretation added
  • Ready for research publication, deployment, and stakeholder engagement

🎉 All project phases are complete as per the attached documentation and deliverables.


πŸ“ Project Structure

MedAdhereAI/
├── dataset/
│   └── raw/                          # Real-world data (CSV files, not committed)
├── notebooks/
│   ├── 01_data_exploration.ipynb     # ✅ EDA + target engineering (complete)
│   ├── 02_feature_engineering.ipynb  # ✅ Feature aggregation (complete)
│   ├── 03_model_training.ipynb       # ✅ Model building (complete)
│   └── 04_model_explainability.ipynb # ✅ SHAP analysis (complete)
├── scripts/
│   ├── data_cleaning.py              # Placeholder for modular code
│   ├── feature_engineering.py
│   ├── model_utils.py
│   └── shap_explainer.py
├── reports/
│   └── figures/                      # Output graphs, charts
├── requirements.txt                 # Python packages
├── .gitignore
├── README.md                        # This file
└── LICENSE                          # MIT License

πŸ› οΈ Tech Stack

  • Python 3.11
  • Pandas, NumPy
  • scikit-learn, XGBoost
  • SHAP
  • Jupyter Notebook

🚀 Getting Started

# 1. Clone the repo
git clone https://github.com/mathachew7/MedAdhereAI.git
cd MedAdhereAI

# 2. Create & activate virtual environment
python3.11 -m venv .venv
source .venv/bin/activate

# 3. Install dependencies
pip install --upgrade pip
pip install -r requirements.txt

# 4. Launch notebooks
jupyter notebook

📊 Sample Outputs

  • ADHERENT_BINARY label (78% adherent / 22% non-adherent)
  • Logistic Regression AUC: 0.82 | Random Forest AUC: 0.77
  • Brier Score: 0.1749 (well-calibrated)
  • SHAP summary: total_visits, AGE, refill_gap as top predictors
  • Logistic Regression Coefficients: Bar plot
  • Random Forest Feature Importance: Bar plot
  • SHAP Local Explanations: Individual-level interpretability

All outputs, visuals, and impact framing have been generated and included in the documentation.
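
To give a sense of what the local and global explanations contain, the sketch below computes SHAP-style attributions for a logistic regression by hand. It does not call the `shap` library and is not the project's `shap_explainer.py`. It uses the closed form that SHAP values take for a linear model when features are treated as independent: the contribution of feature j for patient i is w_j · (x_ij − mean_j), in log-odds units. The data and the relationship between features and label are invented for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
feature_names = ["total_visits", "AGE", "avg_refill_gap"]

# Synthetic data; the label rule below is invented purely for illustration.
X = np.column_stack([
    rng.integers(1, 20, 500),   # total_visits
    rng.integers(30, 85, 500),  # AGE
    rng.uniform(0, 90, 500),    # avg_refill_gap (days)
])
logit = 0.15 * X[:, 0] - 0.04 * X[:, 2] + rng.normal(0, 1, 500)
y = (logit > 0).astype(int)

model = LogisticRegression(max_iter=5000).fit(X, y)

# Linear-model attributions in log-odds units: phi_ij = w_j * (x_ij - mean_j).
background_mean = X.mean(axis=0)
phi = model.coef_[0] * (X - background_mean)
base_value = model.intercept_[0] + model.coef_[0] @ background_mean

# Local explanation for the first patient, and a global ranking by mean |phi|.
local = dict(zip(feature_names, phi[0].round(3)))
global_rank = sorted(zip(feature_names, np.abs(phi).mean(axis=0)),
                     key=lambda t: -t[1])

print("local:", local)
print("global:", global_rank)
```

The attributions are additive: `base_value + phi[i].sum()` equals the model's log-odds output for patient i, which is the property that makes per-patient explanations readable as "what pushed this prediction up or down".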


💡 Public Health Impact

Medication non-adherence contributes to over $300 billion in preventable U.S. healthcare costs annually.
This project provides an interpretable system to flag patients at risk of skipping essential medications using refill behavior and minimal demographic data.

This supports:

  • Early risk stratification
  • Targeted outreach and follow-ups
  • Clinically explainable, data-driven care optimization

📄 License

This project is licensed under the MIT License.


🙌 Credits

  • Dataset by researchers on Mendeley Data
  • Built by Subash Yadav for real-world predictive health research

💬 Contact

For collaboration or publication inquiries:
📧 subashyadav7@outlook.com
🔗 LinkedIn
