Vigodang/Infection-Risk-Prediction-ML

Infection Risk Prediction using Machine Learning

A Complete End-to-End Predictive Analytics Case Study


Why This Project Matters

This project shows how I build and deliver a complete machine learning solution, from raw data to actionable insights. It showcases my ability to:

  • Develop a fully operational end-to-end ML pipeline
  • Perform data cleaning, feature engineering, and model training
  • Transform messy, real-world datasets into useful analytical outputs
  • Apply predictive modeling techniques to a public health domain
  • Communicate findings clearly through a structured analytical workflow

This repository reflects my approach to real-world analytics: clarity, reproducibility, and practical value.


1. Project Overview

This project predicts whether an individual is at high risk of infection based on demographic, behavioral, and medical attributes.
The goal is to support early detection and healthcare prioritization.

The workflow includes:

  • Exploratory Data Analysis (EDA)
  • Data preprocessing
  • Feature engineering
  • Model training and comparison
  • Performance evaluation
  • Report summarization

2. Key Skills Demonstrated

  • Supervised machine learning (classification)
  • Data cleaning & feature engineering
  • Model evaluation (Accuracy, Precision, Recall, F1, ROC-AUC)
  • Exploratory Data Analysis (visualization + statistics)
  • Python: Pandas, NumPy, Scikit-learn, Matplotlib, Seaborn
  • Reproducible and modular ML project structure
  • Clear communication of insights

These skills align strongly with roles such as:

  • Data Scientist
  • Machine Learning Engineer
  • Data Analyst
  • Health Analytics Specialist

3. Methods & Approach

Exploratory Data Analysis (EDA)

  • Summary statistics
  • Missing value analysis
  • Outlier detection
  • Correlation matrices
  • Feature distribution plots
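The EDA steps listed above could be sketched roughly as follows. This is a minimal illustration, not the repository's actual code: the data is synthetic, and the column names (`age`, `bmi`, `smoker`) and helper functions (`missing_report`, `iqr_outliers`) are hypothetical stand-ins.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the real dataset; all column names are hypothetical.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "age": rng.integers(18, 90, size=200).astype(float),
    "bmi": rng.normal(26, 4, size=200),
    "smoker": rng.choice(["yes", "no"], size=200),
})
df.loc[::25, "bmi"] = np.nan  # inject missing values in every 25th row
df.loc[3, "age"] = 250.0      # inject one obvious outlier


def missing_report(frame):
    """Count and share of missing values per column."""
    counts = frame.isna().sum()
    return pd.DataFrame({"n_missing": counts, "share": counts / len(frame)})


def iqr_outliers(series, k=1.5):
    """Boolean mask of values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = series.quantile([0.25, 0.75])
    iqr = q3 - q1
    return (series < q1 - k * iqr) | (series > q3 + k * iqr)


summary = df.describe()                   # summary statistics
missing = missing_report(df)              # missing value analysis
age_outliers = iqr_outliers(df["age"])    # outlier detection (IQR rule)
corr = df.select_dtypes("number").corr()  # correlation matrix

print(missing)
print(df[age_outliers])
```

The IQR rule is only one of several reasonable outlier checks; for plots, `df.hist()` or Seaborn's `histplot` would cover the feature distributions.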

Preprocessing

  • Handling missing/inconsistent values
  • Encoding categorical variables
  • Scaling numerical features
  • Train–test split
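One way to wire the preprocessing steps above together with scikit-learn is sketched below. It assumes a hypothetical target column `high_risk` and hypothetical feature columns, and uses synthetic data. The split happens before fitting so that imputation and scaling statistics come from the training set only.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic data; column names and the target name are assumptions.
rng = np.random.default_rng(42)
n = 300
df = pd.DataFrame({
    "age": rng.integers(18, 90, size=n).astype(float),
    "bmi": rng.normal(26, 4, size=n),
    "smoker": rng.choice(["yes", "no"], size=n),
    "high_risk": rng.integers(0, 2, size=n),
})
df.loc[::20, "bmi"] = np.nan  # simulate missing values

numeric_cols = ["age", "bmi"]
categorical_cols = ["smoker"]

preprocess = ColumnTransformer([
    # Numeric: fill gaps with the median, then standardize.
    ("num", Pipeline([
        ("impute", SimpleImputer(strategy="median")),
        ("scale", StandardScaler()),
    ]), numeric_cols),
    # Categorical: fill gaps with the mode, then one-hot encode.
    ("cat", Pipeline([
        ("impute", SimpleImputer(strategy="most_frequent")),
        ("encode", OneHotEncoder(handle_unknown="ignore")),
    ]), categorical_cols),
])

X = df.drop(columns="high_risk")
y = df["high_risk"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Fit on the training split only, then apply the same transform to the test split.
X_train_prepared = preprocess.fit_transform(X_train)
X_test_prepared = preprocess.transform(X_test)
print(X_train_prepared.shape, X_test_prepared.shape)
```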

Feature Engineering

  • Derived features
  • Domain-based transformations
  • Normalization / standardization
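As an illustration of derived features and domain-based transformations, the sketch below builds three example features. The input columns (`weight_kg`, `height_cm`, `daily_contacts`), the age bands, and the function name `add_derived_features` are all hypothetical; the repository's actual features may differ.

```python
import numpy as np
import pandas as pd


def add_derived_features(frame):
    """Return a copy of `frame` with a few illustrative derived columns."""
    out = frame.copy()
    # Domain-based binning: age bands are left-closed, e.g. [18, 40).
    out["age_group"] = pd.cut(
        out["age"],
        bins=[0, 18, 40, 65, np.inf],
        labels=["child", "young_adult", "middle_aged", "senior"],
        right=False,
    )
    # Derived feature: body mass index from weight and height.
    out["bmi"] = out["weight_kg"] / (out["height_cm"] / 100) ** 2
    # Log transform to compress a right-skewed count; log1p handles zeros.
    out["log_contacts"] = np.log1p(out["daily_contacts"])
    return out


sample = pd.DataFrame({
    "age": [12, 35, 70],
    "weight_kg": [40.0, 80.0, 65.0],
    "height_cm": [150.0, 180.0, 160.0],
    "daily_contacts": [0, 9, 99],
})
features = add_derived_features(sample)
print(features[["age_group", "bmi", "log_contacts"]])
```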

Models Evaluated

  • Logistic Regression
  • Random Forest
  • Gradient Boosting (optional)
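A comparison of the three model families above might look like the following sketch. It uses a synthetic imbalanced dataset from `make_classification` and 5-fold stratified cross-validation scored by ROC-AUC; the hyperparameters are scikit-learn defaults or arbitrary choices, not values taken from this project.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic binary problem with roughly 20% positives (an assumption).
X, y = make_classification(
    n_samples=400, n_features=8, n_informative=4,
    weights=[0.8, 0.2], random_state=0,
)

models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "gradient_boosting": GradientBoostingClassifier(random_state=0),
}

# Stratified folds keep the class ratio similar in every split.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
results = {
    name: cross_val_score(model, X, y, cv=cv, scoring="roc_auc").mean()
    for name, model in models.items()
}

for name, score in sorted(results.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: mean ROC-AUC = {score:.3f}")
```

Scoring every model on the same folds keeps the comparison fair; on real data, preprocessing would normally sit inside a `Pipeline` so it is refit within each fold.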

Evaluation Metrics

  • Accuracy
  • Precision & Recall
  • F1-score
  • ROC-AUC
  • Confusion Matrix

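Reporting precision, recall, F1-score, and ROC-AUC alongside accuracy gives a more robust evaluation, especially on imbalanced datasets, where accuracy alone can look high for a model that rarely flags the high-risk class.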
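The metrics listed above can all be computed with `sklearn.metrics`. The sketch below uses a synthetic dataset with about 10% positives and a plain logistic regression, both chosen only for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (
    accuracy_score, confusion_matrix, f1_score,
    precision_score, recall_score, roc_auc_score,
)
from sklearn.model_selection import train_test_split

# Synthetic imbalanced data (assumption: ~10% high-risk cases).
X, y = make_classification(
    n_samples=1000, n_features=8, n_informative=4,
    weights=[0.9, 0.1], random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
y_pred = model.predict(X_test)               # hard labels at the 0.5 threshold
y_score = model.predict_proba(X_test)[:, 1]  # positive-class probabilities

metrics = {
    "accuracy": accuracy_score(y_test, y_pred),
    "precision": precision_score(y_test, y_pred, zero_division=0),
    "recall": recall_score(y_test, y_pred, zero_division=0),
    "f1": f1_score(y_test, y_pred, zero_division=0),
    "roc_auc": roc_auc_score(y_test, y_score),  # needs scores, not labels
}
cm = confusion_matrix(y_test, y_pred)  # rows: true class, columns: predicted

for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
print(cm)
```

Note that ROC-AUC is computed from predicted probabilities, while the other metrics depend on the chosen decision threshold.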


About

A complete end-to-end machine learning workflow for predicting infection risk using demographic and health-related features. This repository covers exploratory data analysis (EDA), feature engineering, model building, evaluation metrics, and reporting for an academic case study.
