🔒 Network Security – Phishing Detection System

A production-grade machine learning system for detecting phishing websites using 30 security features.

✨ Features • 🏗️ Architecture • 🚀 Quick Start • 🔬 How It Works • 📚 Examples

🎯 Overview

Network Security is a comprehensive ML-based phishing detection system that analyzes websites using 30 engineered features to identify potential phishing threats.

It includes:

A FastAPI backend for real-time and batch predictions
A Streamlit frontend for interactive analysis
A full ML pipeline for training, validation, and evaluation
MLflow integration for experiment tracking and model registry
Dockerized deployment for production use

From this:

Upload a CSV with website features

To this:

Prediction Results
------------------
Website 1 → Legitimate
Website 2 → Phishing
Website 3 → Suspicious

precision: 0.9890399837629389
recall_score: 0.9926665308616827
f1_score: 0.9908499389995933

⏱️ Prediction time: milliseconds per sample

✅ Status: Production-ready ML system

💡 Why I Built This

The Problem

Phishing websites continue to multiply, and manual inspection is slow, inconsistent, and error-prone.

Goal

Build an end-to-end machine learning system that can:

Automatically detect phishing websites
Support real-time and batch predictions
Track experiments and model versions with MLflow
Be deployed easily using Docker

What I Learned

Designing modular ML pipelines improves maintainability
MLflow simplifies experiment tracking and versioning
FastAPI is ideal for low-latency ML APIs
Streamlit accelerates building data apps
Docker makes ML systems reproducible and portable

✨ Key Features

🎯 Core Functionality

Batch prediction using CSV uploads
Manual feature input for single-website analysis
Real-time phishing detection
RESTful FastAPI backend
Interactive Streamlit dashboard

📊 Analysis Capabilities

30 website features analyzed
Risk assessment scoring
Accuracy, Precision, Recall, F1-Score
Feature-wise breakdown
Exportable CSV results

🔧 ML Pipeline

Automated data ingestion from MongoDB
Data validation and transformation
Hyperparameter tuning using GridSearchCV
Multiple models evaluated
MLflow experiment tracking
Model versioning and registry

🏗️ Architecture

High-Level Flow

User Input / CSV
        │
        ▼
 Streamlit UI
        │
        ▼
 FastAPI Backend
        │
        ▼
 Preprocessing
        │
        ▼
 Trained ML Model
        │
        ▼
 Predictions + Metrics

Project Structure

Network_Security/
├── app.py                      # FastAPI backend
├── streamlit_app.py            # Streamlit frontend
├── main.py                     # Training pipeline
├── Dockerfile                  # Docker configuration
├── requirements.txt            # Dependencies
├── setup.py                    # Package setup
├── .env                        # Environment variables
│
├── networksecurity/
│   ├── components/
│   │   ├── data_ingestion.py
│   │   ├── data_validation.py
│   │   ├── data_transformation.py
│   │   └── model_trainer.py
│   │
│   ├── entity/
│   │   ├── config_entity.py
│   │   └── artifact_entity.py
│   │
│   ├── pipeline/
│   │   ├── training_pipeline.py
│   │   └── batch_prediction.py
│   │
│   ├── utils/
│   ├── exception/
│   ├── logging/
│   └── constant/
│
├── final_model/
├── Artifacts/
├── Network_data/
├── prediction_output/
└── templates/

🛠️ Tech Stack

Component	Technology
Backend	FastAPI, Uvicorn
Frontend	Streamlit
ML	scikit-learn, NumPy, Pandas
MLOps	MLflow, DagsHub
Database	MongoDB, PyMongo
Visualization	Plotly, Matplotlib
Deployment	Docker, Docker Compose

🚀 Quick Start

Prerequisites

Python 3.10+
MongoDB (local or Atlas)
Git

Installation

git clone <repository-url>
cd Network_Security

python -m venv myenv
source myenv/bin/activate  # Linux/Mac
myenv\Scripts\activate     # Windows

pip install -r requirements.txt
pip install -e .

Configuration

Create a .env file:

MONGODB_URL_KEY=mongodb+srv://username:password@cluster.mongodb.net/?appName=Cluster0

MLflow settings (in model_trainer.py):

os.environ["MLFLOW_TRACKING_URI"] = "https://dagshub.com/your-username/Network-Security.mlflow"
os.environ["MLFLOW_TRACKING_USERNAME"] = "your-username"
os.environ["MLFLOW_TRACKING_PASSWORD"] = "your-token"
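
As an alternative to hardcoding credentials in model_trainer.py, the same values can be read from the environment at startup. The sketch below uses only the standard library; `Settings` and `load_settings` are hypothetical names that do not come from this repository, and only the variable names `MONGODB_URL_KEY` and `MLFLOW_TRACKING_URI` are taken from the configuration above.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    """Connection settings (hypothetical container, not part of the project)."""
    mongodb_url: str
    mlflow_tracking_uri: Optional[str]


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read connection settings from environment variables.

    Fails early with a clear message when the MongoDB URL is missing,
    instead of failing later inside the database driver.
    """
    mongodb_url = env.get("MONGODB_URL_KEY")
    if not mongodb_url:
        raise RuntimeError("MONGODB_URL_KEY is not set; export it or add it to .env")
    return Settings(
        mongodb_url=mongodb_url,
        mlflow_tracking_uri=env.get("MLFLOW_TRACKING_URI"),  # optional
    )
```

Python does not read a .env file on its own. `docker run --env-file .env` exports the values into the container; for a local run they need to be exported in the shell or loaded by a helper library.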

🔬 How It Works

Phase 1: Data Ingestion

Load phishing dataset from MongoDB
Store raw data in artifacts
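
A minimal sketch of this phase, assuming the dataset lives in a single MongoDB collection. The function names, the database and collection arguments, and the output path are hypothetical; the actual logic in data_ingestion.py may differ.

```python
import csv
from pathlib import Path
from typing import Dict, Iterable, List


def fetch_documents(mongo_url: str, database: str, collection: str) -> List[dict]:
    """Fetch every document from one MongoDB collection."""
    # Imported here so the CSV helper below works without pymongo installed.
    from pymongo import MongoClient

    client = MongoClient(mongo_url)
    try:
        return list(client[database][collection].find())
    finally:
        client.close()


def save_raw_csv(documents: Iterable[Dict], path: Path) -> int:
    """Drop MongoDB's `_id` field and write the rows to a CSV artifact."""
    rows = [{k: v for k, v in doc.items() if k != "_id"} for doc in documents]
    if not rows:
        raise ValueError("no documents to export")
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)
```

Removing `_id` matters because MongoDB adds it to every document and it carries no signal for the model.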

Phase 2: Data Validation

Check schema
Handle missing values
Validate feature ranges
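
The three checks can be sketched in plain Python as follows. `validate_rows` is a hypothetical helper, and it assumes values have already been parsed to integers and that every feature must fall in the set {-1, 0, 1} described under Feature Values.

```python
from typing import Dict, List, Sequence

ALLOWED_VALUES = {-1, 0, 1}  # assumed valid range for every feature


def validate_rows(rows: Sequence[Dict[str, object]],
                  expected_columns: Sequence[str]) -> List[str]:
    """Return human-readable problems; an empty list means the data passed."""
    problems: List[str] = []
    expected = set(expected_columns)
    for index, row in enumerate(rows):
        present = set(row)
        # Schema check: every expected column present, nothing unexpected.
        if expected - present:
            problems.append(f"row {index}: missing columns {sorted(expected - present)}")
        if present - expected:
            problems.append(f"row {index}: unexpected columns {sorted(present - expected)}")
        for column in expected_columns:
            if column not in row:
                continue
            value = row[column]
            if value is None or value == "":
                # Missing-value check.
                problems.append(f"row {index}: {column} is empty")
            elif value not in ALLOWED_VALUES:
                # Range check.
                problems.append(f"row {index}: {column}={value!r} is out of range")
    return problems
```

Collecting every problem before failing makes it easier to fix a bad upload in one pass than stopping at the first error.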

Phase 3: Data Transformation

Feature preprocessing
Train-test split
Scaling and encoding
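
A short sketch of the split and scaling steps with scikit-learn. The data is synthetic, and the 80/20 split, the stratification, and the choice of `StandardScaler` are assumptions for illustration rather than the settings used in data_transformation.py.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the real dataset: 200 rows, 30 features in {-1, 0, 1}.
rng = np.random.default_rng(seed=0)
X = rng.choice([-1, 0, 1], size=(200, 30))
y = rng.choice([-1, 1], size=200)

# Hold out 20% of the rows, keeping the class balance the same in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # learn mean/std from training rows only
X_test_scaled = scaler.transform(X_test)        # reuse those statistics on held-out rows
```

Fitting the scaler on the training rows alone keeps information from the test set out of the preprocessing step.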

Phase 4: Model Training

Train multiple models
Perform GridSearchCV
Select best model
Log metrics to MLflow
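
The search-and-select loop might look like the sketch below. It uses synthetic data, two of the listed model families, and deliberately tiny grids; the real grids, scoring metric, and fold count in model_trainer.py are not shown in this README, so every value here is an assumption.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic binary problem with 30 features, standing in for the real data.
X, y = make_classification(n_samples=300, n_features=30, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Candidate estimators, each with its own (illustrative) parameter grid.
candidates = {
    "random_forest": (RandomForestClassifier(random_state=0), {"n_estimators": [25, 50]}),
    "logistic_regression": (LogisticRegression(max_iter=1000), {"C": [0.1, 1.0]}),
}

searches = {}
for name, (estimator, grid) in candidates.items():
    search = GridSearchCV(estimator, grid, scoring="f1", cv=3)
    search.fit(X_train, y_train)
    searches[name] = search

# Pick the model family with the best cross-validated score, then score it
# once on the held-out rows.
best_name = max(searches, key=lambda n: searches[n].best_score_)
best_model = searches[best_name].best_estimator_
test_f1 = f1_score(y_test, best_model.predict(X_test))

# With MLflow configured, this is where a call such as
# mlflow.log_metric("f1_score", test_f1) would record the result.
```

Selecting on the cross-validated score and reporting the held-out score separately avoids an optimistic estimate of the chosen model.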

Phase 5: Prediction

Load trained model
Apply preprocessing
Generate predictions
Return risk score
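
One way to read these steps, sketched with a scikit-learn pipeline that bundles preprocessing with the classifier. `predict_with_risk` is a hypothetical helper, the stand-in model is trained on synthetic data, and treating the risk score as the predicted probability of the phishing class (encoded as -1) is an assumption about how the project defines it.

```python
import pickle

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def predict_with_risk(model, features):
    """Return (predicted labels, probability of the phishing class) per row."""
    features = np.asarray(features, dtype=float)
    labels = model.predict(features)
    phishing_column = list(model.classes_).index(-1)  # assumes -1 encodes phishing
    risk = model.predict_proba(features)[:, phishing_column]
    return labels, risk


# Stand-in for the trained artifact: fit a small pipeline on synthetic data,
# then round-trip it through pickle to mimic saving and loading a model file.
rng = np.random.default_rng(0)
X = rng.choice([-1, 0, 1], size=(100, 30))
y = np.where(X[:, 0] + X[:, 1] > 0, 1, -1)
trained = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)).fit(X, y)
model = pickle.loads(pickle.dumps(trained))

labels, risk = predict_with_risk(model, X[:5])
```

Keeping the scaler inside the pipeline means the same preprocessing is applied at prediction time without a separate step.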

📚 Example Usage

Start Backend

python app.py

Docs: http://localhost:8000/docs

Start Frontend

streamlit run streamlit_app.py

UI: http://localhost:8501

🌐 API Endpoints

GET /

Redirects to docs

GET /train

Triggers model training

curl -X GET http://localhost:8000/train

POST /predict

Runs batch prediction on an uploaded CSV file

curl -X POST http://localhost:8000/predict \
  -F "file=@data.csv"
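
The same upload can be made from Python with the `requests` library. This is a sketch: the form field name `file` and the URL come from the curl command above, while the helper names are hypothetical and the response format is not documented here, so the response is returned unparsed.

```python
import requests


def build_predict_request(csv_bytes: bytes,
                          base_url: str = "http://localhost:8000") -> requests.PreparedRequest:
    """Build the multipart POST that uploads a CSV to /predict."""
    request = requests.Request(
        "POST",
        f"{base_url}/predict",
        files={"file": ("data.csv", csv_bytes, "text/csv")},
    )
    return request.prepare()


def send(prepared: requests.PreparedRequest, timeout: float = 60.0) -> requests.Response:
    """Send the prepared request and raise on HTTP error status codes."""
    with requests.Session() as session:
        response = session.send(prepared, timeout=timeout)
    response.raise_for_status()
    return response
```

With the backend running, `send(build_predict_request(open("data.csv", "rb").read()))` performs the upload.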

🎓 Model Training

Models Evaluated

Random Forest
Decision Tree
Gradient Boosting
Logistic Regression
AdaBoost

Feature Values

-1 → phishing
0 → suspicious
1 → legitimate
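
For display, the numeric codes can be mapped back to labels. The sketch below assumes the same codes are used for the predicted class, as the sample output in the Overview suggests; `format_results` is a hypothetical helper.

```python
from typing import Iterable, List

LABELS = {-1: "Phishing", 0: "Suspicious", 1: "Legitimate"}


def format_results(codes: Iterable[int]) -> List[str]:
    """Turn numeric codes into lines shaped like the sample prediction output."""
    lines = []
    for position, code in enumerate(codes, start=1):
        if code not in LABELS:
            raise ValueError(f"unexpected code {code!r}; expected one of {sorted(LABELS)}")
        lines.append(f"Website {position} → {LABELS[code]}")
    return lines
```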

🐳 Docker Deployment

docker build -t networksecurity:latest .
docker run -p 8000:8000 --env-file .env networksecurity:latest

Docker Compose:

version: '3.8'
services:
  api:
    build: .
    ports:
      - "8000:8000"
    env_file:
      - .env

📊 Model Performance

Accuracy: ~95.2%
Precision: ~94.8%
Recall: ~93.5%
F1-Score: ~94.1%
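
Because F1 is the harmonic mean of precision and recall, the reported F1-score can be cross-checked against the other two figures. A small sketch; `f1_from` is a hypothetical helper name.

```python
def f1_from(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


print(round(f1_from(0.948, 0.935), 3))  # 0.941
```

The result agrees with the ~94.1% F1-score listed above.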

🤝 Contributing

Fork the repository
Create feature branch
Commit changes
Push and open PR

📄 License

👨‍💻 Author

Vivek Kumar Gupta
AI Engineering Student | MLOps & ML Systems

GitHub: @vivek34561
LinkedIn: vivek-gupta-0400452b6

🙏 Acknowledgments

UCI Machine Learning Repository
MLflow
FastAPI
Streamlit

Made with ❤️ for Cybersecurity

Built to explore end-to-end machine learning systems
