vivek34561/Network-Security

🔒 Network Security – Phishing Detection System

A production-grade machine learning system for detecting phishing websites using 30 security features.

✨ Features • 🏗️ Architecture • 🚀 Quick Start • 🔬 How It Works • 📚 Examples


🎯 Overview

Network Security is a comprehensive ML-based phishing detection system that analyzes websites using 30 engineered features to identify potential phishing threats.

It includes:

  • A FastAPI backend for real-time and batch predictions
  • A Streamlit frontend for interactive analysis
  • A full ML pipeline for training, validation, and evaluation
  • MLflow integration for experiment tracking and model registry
  • Dockerized deployment for production use

From this:

Upload a CSV with website features

To this:

Prediction Results
------------------
Website 1 → Legitimate
Website 2 → Phishing
Website 3 → Suspicious

precision: 0.9890399837629389
recall_score: 0.9926665308616827
f1_score: 0.9908499389995933

⏱️ Prediction time: milliseconds per sample
✅ Status: Production-ready ML system


💡 Why I Built This

The Problem: Phishing websites continue to grow rapidly, and manual inspection is slow, inconsistent, and error-prone.

Goal: Build an end-to-end machine learning system that can:

  • Automatically detect phishing websites
  • Support real-time and batch predictions
  • Track experiments and models professionally
  • Be deployed easily using Docker

What I Learned

  • Designing modular ML pipelines improves maintainability
  • MLflow simplifies experiment tracking and versioning
  • FastAPI is ideal for low-latency ML APIs
  • Streamlit accelerates building data apps
  • Docker makes ML systems reproducible and portable

✨ Key Features

🎯 Core Functionality

  • Batch prediction using CSV uploads
  • Manual feature input for single-website analysis
  • Real-time phishing detection
  • RESTful FastAPI backend
  • Interactive Streamlit dashboard

📊 Analysis Capabilities

  • 30 website features analyzed
  • Risk assessment scoring
  • Accuracy, precision, recall, and F1-score metrics
  • Feature-wise breakdown
  • Exportable CSV results

🔧 ML Pipeline

  • Automated data ingestion from MongoDB
  • Data validation and transformation
  • Hyperparameter tuning using GridSearchCV
  • Multiple models evaluated
  • MLflow experiment tracking
  • Model versioning and registry

πŸ—οΈ Architecture

High-Level Flow

User Input / CSV
        │
        ▼
 Streamlit UI
        │
        ▼
 FastAPI Backend
        │
        ▼
 Preprocessing
        │
        ▼
 Trained ML Model
        │
        ▼
 Predictions + Metrics

Project Structure

Network_Security/
├── app.py                      # FastAPI backend
├── streamlit_app.py            # Streamlit frontend
├── main.py                     # Training pipeline
├── Dockerfile                  # Docker configuration
├── requirements.txt            # Dependencies
├── setup.py                    # Package setup
├── .env                        # Environment variables
│
├── networksecurity/
│   ├── components/
│   │   ├── data_ingestion.py
│   │   ├── data_validation.py
│   │   ├── data_transformation.py
│   │   └── model_trainer.py
│   │
│   ├── entity/
│   │   ├── config_entity.py
│   │   └── artifact_entity.py
│   │
│   ├── pipeline/
│   │   ├── training_pipeline.py
│   │   └── batch_prediction.py
│   │
│   ├── utils/
│   ├── exception/
│   ├── logging/
│   └── constant/
│
├── final_model/
├── Artifacts/
├── Network_data/
├── prediction_output/
└── templates/

πŸ› οΈ Tech Stack

| Component     | Technology                 |
|---------------|----------------------------|
| Backend       | FastAPI, Uvicorn           |
| Frontend      | Streamlit                  |
| ML            | scikit-learn, NumPy, Pandas |
| MLOps         | MLflow, DagsHub            |
| Database      | MongoDB, PyMongo           |
| Visualization | Plotly, Matplotlib         |
| Deployment    | Docker, Docker Compose     |

🚀 Quick Start

Prerequisites

  • Python 3.10+
  • MongoDB (local or Atlas)
  • Git

Installation

git clone <repository-url>
cd Network_Security

python -m venv myenv
source myenv/bin/activate  # Linux/Mac
myenv\Scripts\activate     # Windows

pip install -r requirements.txt
pip install -e .

Configuration

Create a .env file:

MONGODB_URL_KEY=mongodb+srv://username:password@cluster.mongodb.net/?appName=Cluster0

MLflow settings (in model_trainer.py):

os.environ["MLFLOW_TRACKING_URI"] = "https://dagshub.com/your-username/Network-Security.mlflow"
os.environ["MLFLOW_TRACKING_USERNAME"] = "your-username"
os.environ["MLFLOW_TRACKING_PASSWORD"] = "your-token"
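Hard-coding a tracking token in source makes it easy to commit by accident. As a minimal sketch, assuming the four variable names shown above have already been exported or loaded from .env, the settings could be read and checked in one place. The helper `load_settings` and the tuple `REQUIRED_VARS` are hypothetical names, not part of the repository.

```python
import os

# Variable names come from the configuration above; the helper is hypothetical.
REQUIRED_VARS = (
    "MONGODB_URL_KEY",
    "MLFLOW_TRACKING_URI",
    "MLFLOW_TRACKING_USERNAME",
    "MLFLOW_TRACKING_PASSWORD",
)


def load_settings(environ=None):
    """Return the required settings as a dict, or raise if any are missing."""
    env = os.environ if environ is None else environ
    # Treat unset and empty values the same way: both are unusable.
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_VARS}
```

Calling `load_settings()` at startup fails fast with a readable message instead of surfacing later as a connection or authentication error.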

🔬 How It Works

Phase 1: Data Ingestion

  • Load phishing dataset from MongoDB
  • Store raw data in artifacts
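The ingestion step can be sketched roughly as follows. This is an illustration rather than the code in data_ingestion.py: the function name `export_documents_to_csv` is hypothetical, and it accepts any iterable of dicts so that it can be fed the cursor returned by a PyMongo `collection.find()` call or a plain list.

```python
import csv


def export_documents_to_csv(documents, csv_path):
    """Write an iterable of dict records to a CSV file and return the row count.

    MongoDB's `_id` field is dropped because it is a database key,
    not a website feature.
    """
    rows = [{k: v for k, v in doc.items() if k != "_id"} for doc in documents]
    if not rows:
        raise ValueError("No documents to export")
    # Use the first record's keys as the CSV header.
    fieldnames = list(rows[0].keys())
    with open(csv_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)
```

Writing the raw export to disk before any cleaning keeps an untouched copy that later pipeline stages can be re-run against.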

Phase 2: Data Validation

  • Check schema
  • Handle missing values
  • Validate feature ranges
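A minimal sketch of these three checks, assuming every column uses the -1 / 0 / 1 encoding described in this README. The function `validate_rows` and its report format are hypothetical; the project's own logic lives in data_validation.py.

```python
ALLOWED_VALUES = {-1, 0, 1}  # encoding described in this README


def validate_rows(rows, expected_columns):
    """Return a list of human-readable problems found in `rows`.

    `rows` is a list of dicts (one per website); `expected_columns` is the
    schema, i.e. the column names every row must contain.
    """
    problems = []
    expected = set(expected_columns)
    for index, row in enumerate(rows):
        present = set(row)
        # Schema check: columns that are missing or not part of the schema.
        if expected - present:
            problems.append(f"row {index}: missing columns {sorted(expected - present)}")
        if present - expected:
            problems.append(f"row {index}: unexpected columns {sorted(present - expected)}")
        for name in sorted(expected & present):
            value = row[name]
            if value is None or value == "":
                # Missing-value check.
                problems.append(f"row {index}: {name} is empty")
            elif value not in ALLOWED_VALUES:
                # Range check.
                problems.append(f"row {index}: {name}={value!r} outside {sorted(ALLOWED_VALUES)}")
    return problems
```

An empty list means the batch passed. Returning every problem at once, rather than raising on the first, makes a bad upload easier to fix in a single pass.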

Phase 3: Data Transformation

  • Feature preprocessing
  • Train-test split
  • Scaling and encoding
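To illustrate the train-test split, here is a dependency-free sketch of a seeded hold-out split. The function name and defaults are hypothetical. scikit-learn's `train_test_split` does the same job and also supports stratification.

```python
import random


def train_test_split_rows(rows, test_fraction=0.2, seed=42):
    """Shuffle a copy of `rows` and split it into (train, test) lists."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(rows)
    # A private Random instance keeps the split reproducible without
    # touching the global random state.
    random.Random(seed).shuffle(shuffled)
    test_size = max(1, round(len(shuffled) * test_fraction))
    return shuffled[test_size:], shuffled[:test_size]
```

Fixing the seed means repeated training runs evaluate on the same held-out rows, so metric changes reflect the model rather than the split.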

Phase 4: Model Training

  • Train multiple models
  • Perform GridSearchCV
  • Select best model
  • Log metrics to MLflow
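The selection loop can be sketched as below. The dataset is synthetic (generated with `make_classification` as a stand-in for the 30-feature phishing data), and the two candidates and their parameter grids are illustrative assumptions, not the grids used in model_trainer.py. MLflow logging is left out to keep the example self-contained.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the 30-feature phishing dataset.
X, y = make_classification(n_samples=300, n_features=30, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Hypothetical candidates and grids, chosen only for illustration.
candidates = {
    "Decision Tree": (DecisionTreeClassifier(random_state=0), {"max_depth": [3, 5, None]}),
    "Logistic Regression": (LogisticRegression(max_iter=1000), {"C": [0.1, 1.0, 10.0]}),
}

results = {}
for name, (estimator, grid) in candidates.items():
    # Cross-validated search over the grid, optimizing F1.
    search = GridSearchCV(estimator, grid, scoring="f1", cv=3)
    search.fit(X_train, y_train)
    # Accuracy of the refit best estimator on the held-out split.
    test_accuracy = search.best_estimator_.score(X_test, y_test)
    results[name] = (test_accuracy, search.best_params_)

best_name = max(results, key=lambda name: results[name][0])
print(best_name, results[best_name])
```

Tuning happens on the training split only; the held-out split is used once per candidate to compare the tuned models.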

Phase 5: Prediction

  • Load trained model
  • Apply preprocessing
  • Generate predictions
  • Return risk score

📚 Example Usage

Start Backend

python app.py

Docs: http://localhost:8000/docs

Start Frontend

streamlit run streamlit_app.py

UI: http://localhost:8501


🌐 API Endpoints

GET /

Redirects to docs

GET /train

Triggers model training

curl -X GET http://localhost:8000/train

POST /predict

Batch prediction

curl -X POST http://localhost:8000/predict \
  -F "file=@data.csv"
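The same upload can be made from Python using only the standard library. This is a sketch: the form field name `file` and the URL come from the curl command above, while `build_multipart` and `post_csv` are hypothetical helpers. The response format is not documented here, so the raw bytes are returned as-is.

```python
import urllib.request
import uuid


def build_multipart(field_name, filename, payload, content_type="text/csv"):
    """Encode one file as multipart/form-data; return (body, header_value)."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field_name}"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + payload + tail, f"multipart/form-data; boundary={boundary}"


def post_csv(url, csv_path):
    """Upload a CSV file to the given URL and return the raw response body."""
    with open(csv_path, "rb") as handle:
        body, content_type = build_multipart("file", "data.csv", handle.read())
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": content_type}, method="POST"
    )
    with urllib.request.urlopen(request) as response:
        return response.read()


# Example (requires the backend to be running):
# print(post_csv("http://localhost:8000/predict", "data.csv")[:200])
```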

🎓 Model Training

Models Evaluated

  • Random Forest
  • Decision Tree
  • Gradient Boosting
  • Logistic Regression
  • AdaBoost

Feature Values

  • -1 → phishing
  • 0 → suspicious
  • 1 → legitimate
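Assuming the model's predictions use the same -1 / 0 / 1 encoding, turning them into the report lines shown in the overview takes a small lookup. The names `LABELS` and `describe_predictions` are hypothetical.

```python
# Encoding taken from the list above.
LABELS = {-1: "Phishing", 0: "Suspicious", 1: "Legitimate"}


def describe_predictions(predictions):
    """Turn encoded predictions into one report line per website."""
    lines = []
    for position, value in enumerate(predictions, start=1):
        # int() also accepts NumPy integers and floats such as 1.0.
        label = LABELS.get(int(value), "Unknown")
        lines.append(f"Website {position} → {label}")
    return lines
```

For example, `describe_predictions([1, -1, 0])` yields the three lines shown under "Prediction Results".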

🐳 Docker Deployment

docker build -t networksecurity:latest .
docker run -p 8000:8000 --env-file .env networksecurity:latest

Docker Compose:

version: '3.8'
services:
  api:
    build: .
    ports:
      - "8000:8000"
    env_file:
      - .env

📊 Model Performance

  • Accuracy: ~95.2%
  • Precision: ~94.8%
  • Recall: ~93.5%
  • F1-Score: ~94.1%
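For reference, the four metrics follow from confusion-matrix counts as sketched below. The function name is hypothetical, and any counts passed to it here are illustrative, not results from this project.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, recall, and F1 from confusion-matrix counts."""
    total = tp + fp + fn + tn
    # Guard each ratio against a zero denominator.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denominator = precision + recall
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / denominator if denominator else 0.0,
    }
```

F1 is the harmonic mean of precision and recall. Plugging in the figures listed above gives 2 × 0.948 × 0.935 / (0.948 + 0.935) ≈ 0.941, which agrees with the reported F1-score.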

🤝 Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Commit your changes
  4. Push the branch and open a PR

📄 License

MIT License © 2025 Vivek Kumar Gupta


πŸ‘¨β€πŸ’» Author

Vivek Kumar Gupta AI Engineering Student | MLOps & ML Systems

GitHub: @vivek34561 LinkedIn: vivek-gupta-0400452b6


πŸ™ Acknowledgments

  • UCI Machine Learning Repository
  • MLflow
  • FastAPI
  • Streamlit

Made with ❤️ for Cybersecurity

Built to explore end-to-end machine learning systems
