Loan Domain MLOps Pipeline

End-to-End Machine Learning Project with CI/CD, Docker, AWS & MongoDB


An End-to-End MLOps project that demonstrates how a real-world machine learning system is built, deployed, and automated using modern industry tools.

This project implements a complete ML lifecycle pipeline, including:

  • Data ingestion from MongoDB Atlas
  • Data validation and transformation
  • Model training and evaluation
  • Model versioning in AWS S3
  • Containerization with Docker
  • Automated CI/CD using GitHub Actions
  • Deployment on AWS EC2

The project simulates a production-grade ML system architecture.


📌 Project Architecture

Data Source (MongoDB Atlas)
        │
        ▼
Data Ingestion
        │
        ▼
Data Validation
        │
        ▼
Data Transformation
        │
        ▼
Model Trainer
        │
        ▼
Model Evaluation
        │
        ▼
Model Pusher (AWS S3)
        │
        ▼
Prediction Pipeline
        │
        ▼
Web Application (Streamlit)
        │
        ▼
Docker Container
        │
        ▼
CI/CD (GitHub Actions)
        │
        ▼
Deployment (AWS EC2)

βš™οΈ Tech Stack

Programming

  • Python 3.10

Machine Learning

  • Scikit-Learn
  • Pandas
  • NumPy

MLOps

  • Docker
  • GitHub Actions
  • CI/CD Automation

Cloud

  • AWS S3 (Model Registry)
  • AWS EC2 (Deployment)
  • AWS ECR (Docker Registry)

Database

  • MongoDB Atlas

Web Application

  • Streamlit

📂 Project Structure

Loan-Approval-Domain
│
├── notebook/
│   ├── mongoDB_demo.ipynb
│   └── EDA.ipynb
│
├── src/
│   ├── components/
│   │   ├── data_ingestion.py
│   │   ├── data_validation.py
│   │   ├── data_transformation.py
│   │   ├── model_trainer.py
│   │   ├── model_evaluation.py
│   │   └── model_pusher.py
│   │
│   ├── configuration/
│   │   ├── mongo_db_connection.py
│   │   └── aws_connection.py
│   │
│   ├── entity/
│   │   ├── config_entity.py
│   │   ├── artifact_entity.py
│   │   ├── estimator.py
│   │   └── s3_estimator.py
│   │
│   └── utils/
│       └── main_utils.py
│
├── pipeline/
│   ├── training_pipeline.py
│   └── prediction_pipeline.py
│
├── app.py
├── requirements.txt
├── setup.py
├── pyproject.toml
├── Dockerfile
├── .dockerignore
└── README.md

🚀 Getting Started

📊 Data Ingestion Pipeline

The ingestion pipeline:

  • Connects to MongoDB Atlas
  • Fetches raw dataset
  • Converts key-value records into a Pandas DataFrame
  • Stores the dataset inside the artifact directory

Main modules involved:

data_access/
configuration/
entity/
components/data_ingestion.py
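
A minimal sketch of the record-to-DataFrame step (the helper name and sample records below are illustrative, not the repo's actual code):

```python
import pandas as pd

def records_to_dataframe(records: list[dict]) -> pd.DataFrame:
    """Convert raw MongoDB key-value records into a Pandas DataFrame."""
    df = pd.DataFrame(records)
    # MongoDB adds an internal _id field to every document; drop it if present.
    return df.drop(columns=["_id"], errors="ignore")

# In the real pipeline the records come from MongoDB Atlas via pymongo,
# e.g. records = list(MongoClient(uri)[db][collection].find()).
sample = [
    {"_id": 1, "income": 52000, "loan_status": "Approved"},
    {"_id": 2, "income": 31000, "loan_status": "Rejected"},
]
df = records_to_dataframe(sample)
print(df.columns.tolist())  # ['income', 'loan_status']
```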

πŸ” Data Validation

Validation checks include:

  • Schema validation
  • Column type validation
  • Missing values
  • Data consistency

Configuration defined in:

config/schema.yaml
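
The checks above can be sketched as a small validator (the SCHEMA dict here is a hypothetical stand-in for the contents of config/schema.yaml):

```python
import pandas as pd

# Illustrative schema; the real one is loaded from config/schema.yaml.
SCHEMA = {"income": "int64", "loan_status": "object"}

def validate(df: pd.DataFrame, schema: dict) -> list[str]:
    """Return a list of validation errors; an empty list means the data is valid."""
    errors = []
    for col, dtype in schema.items():
        if col not in df.columns:
            errors.append(f"missing column: {col}")
        elif str(df[col].dtype) != dtype:
            errors.append(f"{col}: expected {dtype}, got {df[col].dtype}")
        elif df[col].isna().any():
            errors.append(f"{col}: contains missing values")
    return errors

df = pd.DataFrame(
    {"income": [52000, 31000], "loan_status": ["Approved", "Rejected"]}
).astype({"income": "int64"})
print(validate(df, SCHEMA))  # []
```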

🔧 Data Transformation

Feature engineering and preprocessing steps:

  • Handling missing values
  • Feature scaling
  • Encoding categorical variables
  • Preparing training dataset
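
A minimal scikit-learn sketch of such a preprocessing step (the feature names are hypothetical, not the dataset's actual columns):

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

numeric = ["income", "loan_amount"]   # imputed with the median, then scaled
categorical = ["employment_type"]     # imputed with the mode, then one-hot encoded

preprocessor = ColumnTransformer([
    ("num", Pipeline([("impute", SimpleImputer(strategy="median")),
                      ("scale", StandardScaler())]), numeric),
    ("cat", Pipeline([("impute", SimpleImputer(strategy="most_frequent")),
                      ("encode", OneHotEncoder(handle_unknown="ignore"))]), categorical),
])

df = pd.DataFrame({
    "income": [52000, np.nan, 31000],
    "loan_amount": [12000, 8000, 5000],
    "employment_type": ["salaried", "self-employed", np.nan],
})
X = preprocessor.fit_transform(df)
print(X.shape)  # (3, 4): 2 scaled numeric columns + 2 one-hot columns
```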

🤖 Model Training

The model trainer:

  • Splits dataset
  • Trains ML models
  • Selects best performing model
  • Saves trained model artifact
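
A simplified sketch of the train-and-select loop, using a synthetic dataset in place of the real loan data:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the prepared loan dataset.
X, y = make_classification(n_samples=400, n_features=8, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

candidates = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(random_state=42),
}

# Fit every candidate and keep the one with the best held-out accuracy.
scores = {}
for name, model in candidates.items():
    model.fit(X_train, y_train)
    scores[name] = accuracy_score(y_test, model.predict(X_test))

best_name = max(scores, key=scores.get)
print(f"best model: {best_name} ({scores[best_name]:.3f})")
```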

📈 Model Evaluation

Compares the newly trained model with the previous production model.

Threshold defined in constants:

MODEL_EVALUATION_CHANGED_THRESHOLD_SCORE = 0.02

If performance improves → the model is pushed to the S3 Model Registry


🔄 CI/CD Pipeline (GitHub Actions)

CI/CD automatically performs:

1️⃣ Build the Docker image
2️⃣ Push the image to AWS ECR
3️⃣ Deploy to the EC2 instance

Secrets required in GitHub:

AWS_ACCESS_KEY_ID
AWS_SECRET_ACCESS_KEY
AWS_DEFAULT_REGION
ECR_REPO

🖥️ EC2 Deployment

Launch EC2 instance:

Ubuntu Server 24.04
Instance: t2.medium
Storage: 30GB

Install Docker:

curl -fsSL https://get.docker.com -o get-docker.sh
sudo sh get-docker.sh
sudo usermod -aG docker ubuntu
newgrp docker

🔗 Self-Hosted Runner (GitHub → EC2)

Steps:

GitHub → Settings → Actions → Runners → New self-hosted runner

Run the runner setup commands on the EC2 instance (GitHub shows the exact ./config.sh invocation, including your repository URL and a registration token):

./config.sh
./run.sh

🌐 Application Access

Allow port 5000 in EC2 security group.

Access application:

http://<EC2-PUBLIC-IP>:5000

Training endpoint:

/training

📸 Application Features

✔ Train the ML model from the UI
✔ Predict using the trained model
✔ Fully automated CI/CD deployment
✔ Cloud model registry


🧠 Key Learning Outcomes

This project demonstrates real production ML system design including:

  • End-to-End ML pipelines
  • Cloud infrastructure
  • Model versioning
  • CI/CD automation
  • Docker containerization
  • Scalable deployment architecture

πŸ‘¨β€πŸ’» Author

Lipu Daman

Machine Learning | MLOps | Data Science


⭐ If you like this project, please star the repository!
