This project demonstrates how I build and deliver a complete machine learning solution, from raw data to actionable insights. It shows my ability to:
- Develop a fully operational end-to-end ML pipeline
- Perform data cleaning, feature engineering, and model training
- Transform messy, real-world datasets into useful analytical outputs
- Apply predictive modeling techniques to a public health domain
- Communicate findings clearly through a structured analytical workflow
This repository reflects my approach to real-world analytics: clarity, reproducibility, and practical value.
This project predicts whether an individual is at high risk of infection based on demographic, behavioral, and medical attributes.
The goal is to support early detection and healthcare prioritization.
The workflow includes:
- Exploratory Data Analysis (EDA)
- Data preprocessing
- Feature engineering
- Model training and comparison
- Performance evaluation
- Report summarization
Skills demonstrated:
- Supervised machine learning (classification)
- Data cleaning & feature engineering
- Model evaluation (Accuracy, Precision, Recall, F1, ROC-AUC)
- Exploratory Data Analysis (visualization + statistics)
- Python: pandas, NumPy, scikit-learn, Matplotlib, Seaborn
- Reproducible and modular ML project structure
- Clear communication of insights
These skills align strongly with roles such as:
- Data Scientist
- Machine Learning Engineer
- Data Analyst
- Health Analytics Specialist
Exploratory Data Analysis covers:
- Summary statistics
- Missing value analysis
- Outlier detection
- Correlation matrices
- Feature distribution plots
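A minimal sketch of the tabular EDA checks listed above, assuming the data is already loaded into a pandas DataFrame. The column names (`age`, `num_contacts`, `region`) and values are hypothetical placeholders. The 1.5 * IQR outlier rule is one common convention, not necessarily the one this project uses. Plots are left out to keep the example short.

```python
import numpy as np
import pandas as pd


def eda_summary(df: pd.DataFrame) -> dict:
    """Collect basic EDA tables for a DataFrame (plots omitted)."""
    numeric = df.select_dtypes(include="number")

    # IQR rule (one common convention): flag values more than
    # 1.5 * IQR below Q1 or above Q3. NaNs compare False, so they are not counted.
    q1 = numeric.quantile(0.25)
    q3 = numeric.quantile(0.75)
    iqr = q3 - q1
    outside = (numeric < q1 - 1.5 * iqr) | (numeric > q3 + 1.5 * iqr)

    return {
        "summary": df.describe(include="all"),
        "missing_share": df.isna().mean().sort_values(ascending=False),
        "outlier_counts": outside.sum(),
        "correlation": numeric.corr(),
    }


# Hypothetical example data; column names are placeholders, not the project schema.
df = pd.DataFrame({
    "age": [23, 35, 41, 29, 95, np.nan],
    "num_contacts": [2, 5, 3, 4, 40, 1],
    "region": ["north", "south", "south", None, "north", "east"],
})
report = eda_summary(df)
print(report["missing_share"])
print(report["outlier_counts"])
```

Feature distribution plots can be drawn from the same DataFrame, for example with `df.hist()` or Seaborn's `histplot`.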
Data preprocessing covers:
- Handling missing/inconsistent values
- Encoding categorical variables
- Scaling numerical features
- Train–test split
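The sketch below shows one way to implement these preprocessing steps with scikit-learn. Several choices are assumptions rather than the project's actual settings: the column lists, the median and most-frequent imputation strategies, and the synthetic data. The important design point is that the imputers, encoder, and scaler are fitted on the training split only, so no test-set statistics leak into training.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical column names for illustration only.
NUMERIC = ["age", "num_contacts"]
CATEGORICAL = ["region"]

preprocess = ColumnTransformer([
    # Numeric: fill gaps with the median, then standardize.
    ("num", Pipeline([
        ("impute", SimpleImputer(strategy="median")),
        ("scale", StandardScaler()),
    ]), NUMERIC),
    # Categorical: fill gaps with the most frequent label, then one-hot encode.
    ("cat", Pipeline([
        ("impute", SimpleImputer(strategy="most_frequent")),
        ("onehot", OneHotEncoder(handle_unknown="ignore")),
    ]), CATEGORICAL),
])

# Synthetic stand-in data; not the project dataset.
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "age": rng.integers(18, 90, n).astype(float),
    "num_contacts": rng.poisson(4, n).astype(float),
    "region": rng.choice(["north", "south", "east"], n),
})
df.loc[::10, "age"] = np.nan  # inject some missing values
y = rng.integers(0, 2, n)

# Stratify so the class ratio is preserved in both splits.
X_train, X_test, y_train, y_test = train_test_split(
    df, y, test_size=0.25, stratify=y, random_state=42
)

# Fit imputers, scaler, and encoder on the training split only (avoids leakage).
X_train_t = preprocess.fit_transform(X_train)
X_test_t = preprocess.transform(X_test)
print(X_train_t.shape, X_test_t.shape)
```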
Feature engineering covers:
- Derived features
- Domain-based transformations
- Normalization / standardization
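Here is an illustrative sketch of derived features and domain-based transformations. Every column name here (`age`, `weight_kg`, `height_cm`, `num_contacts`) is hypothetical. The specific features, a BMI ratio, age bands, and a log transform of a skewed count, are examples of the technique and may not be the project's own features. Standardization is usually left to a fitted scaler such as scikit-learn's `StandardScaler`, so that its statistics come from the training split only.

```python
import numpy as np
import pandas as pd


def add_derived_features(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with illustrative derived features.

    All input column names are hypothetical placeholders.
    """
    out = df.copy()
    # Ratio feature: body-mass index from weight (kg) and height (cm).
    out["bmi"] = out["weight_kg"] / (out["height_cm"] / 100) ** 2
    # Domain-based binning: coarse age bands instead of raw age.
    # right=False makes the bins [0, 18), [18, 40), [40, 65), [65, 120).
    out["age_band"] = pd.cut(
        out["age"],
        bins=[0, 18, 40, 65, 120],
        labels=["child", "adult", "middle_age", "senior"],
        right=False,
    )
    # log1p compresses a right-skewed count and maps 0 to 0.
    out["log_contacts"] = np.log1p(out["num_contacts"])
    return out


# Hypothetical example rows.
df = pd.DataFrame({
    "age": [12, 30, 50, 70],
    "weight_kg": [40.0, 70.0, 80.0, 65.0],
    "height_cm": [150.0, 175.0, 180.0, 160.0],
    "num_contacts": [0, 3, 10, 1],
})
features = add_derived_features(df)
print(features[["bmi", "age_band", "log_contacts"]])
```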
Models trained and compared:
- Logistic Regression
- Random Forest
- Gradient Boosting (optional)
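Below is a sketch of comparing the three model families with stratified 5-fold cross-validation on ROC-AUC. It assumes synthetic data from `make_classification` in place of the project dataset, and the hyperparameters are illustrative defaults, not tuned values.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic, imbalanced stand-in data (about 10% positives); not project data.
X, y = make_classification(
    n_samples=1000, n_features=12, n_informative=5,
    weights=[0.9, 0.1], random_state=0,
)

models = {
    # Scaling matters for the linear model but not for tree ensembles.
    "logistic_regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "random_forest": RandomForestClassifier(n_estimators=200, random_state=0),
    "gradient_boosting": GradientBoostingClassifier(random_state=0),
}

# Stratified folds keep the minority-class share similar in every fold.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
results = {}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=cv, scoring="roc_auc")
    results[name] = scores.mean()
    print(f"{name}: ROC-AUC {scores.mean():.3f} ± {scores.std():.3f}")
```

Wrapping the scaler inside the pipeline means it is refitted on each training fold, so the cross-validation scores are not inflated by leakage.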
Evaluation metrics:
- Accuracy
- Precision & Recall
- F1-score
- ROC-AUC
- Confusion Matrix
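Accuracy alone can be misleading when the classes are imbalanced, because a model that always predicts the majority class still scores high. Precision, recall, F1, ROC-AUC, and the confusion matrix are therefore reported alongside it.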
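A sketch that computes each metric listed above on synthetic imbalanced data, which stands in for the real dataset. It contrasts a majority-class `DummyClassifier` baseline with logistic regression. The baseline reaches high accuracy, but its recall is zero and its ROC-AUC is 0.5, which illustrates why accuracy is not reported on its own.

```python
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (
    accuracy_score, confusion_matrix, f1_score,
    precision_score, recall_score, roc_auc_score,
)
from sklearn.model_selection import train_test_split

# Synthetic data with roughly 5% positives (high-risk cases); not project data.
X, y = make_classification(
    n_samples=2000, n_features=10, n_informative=4,
    weights=[0.95, 0.05], random_state=1,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=1
)


def evaluate(model):
    """Fit the model and return the listed metrics on the test split."""
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    proba = model.predict_proba(X_test)[:, 1]  # score for the positive class
    return {
        "accuracy": accuracy_score(y_test, pred),
        # zero_division=0 returns 0 instead of warning when nothing is predicted positive.
        "precision": precision_score(y_test, pred, zero_division=0),
        "recall": recall_score(y_test, pred),
        "f1": f1_score(y_test, pred, zero_division=0),
        "roc_auc": roc_auc_score(y_test, proba),
        "confusion_matrix": confusion_matrix(y_test, pred),
    }


baseline = evaluate(DummyClassifier(strategy="most_frequent"))
logreg = evaluate(LogisticRegression(max_iter=1000))
for name, m in [("baseline", baseline), ("logistic_regression", logreg)]:
    print(name, {k: v for k, v in m.items() if k != "confusion_matrix"})
```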