A production-grade machine learning system for detecting phishing websites using 30 security features.
Features • Architecture • Quick Start • How It Works • Examples
Network Security is a comprehensive ML-based phishing detection system that analyzes websites using 30 engineered features to identify potential phishing threats.
It includes:
- A FastAPI backend for real-time and batch predictions
- A Streamlit frontend for interactive analysis
- A full ML pipeline for training, validation, and evaluation
- MLflow integration for experiment tracking and model registry
- Dockerized deployment for production use
From this:
Upload a CSV with website features
To this:
Prediction Results
------------------
Website 1 → Legitimate
Website 2 → Phishing
Website 3 → Suspicious
precision: 0.9890399837629389
recall_score: 0.9926665308616827
f1_score: 0.9908499389995933
Prediction time: milliseconds per sample
Status: Production-ready ML system
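The precision, recall, and F1 values above are standard classification metrics. As a minimal, dependency-free sketch of how they are defined (the project lists scikit-learn, which provides equivalent functions; the helper name `binary_metrics`, the toy labels, and treating `1` as the positive class are assumptions made for illustration):

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Return (precision, recall, f1) with one class treated as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Toy labels for illustration only (1 = legitimate, -1 = phishing).
y_true = [1, 1, 1, -1, -1, 1]
y_pred = [1, 1, -1, -1, 1, 1]
precision, recall, f1 = binary_metrics(y_true, y_pred)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

Passing `positive=-1` would report the same metrics for the phishing class instead, which may be the more useful view when missed phishing sites are the main concern.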
The Problem: Phishing websites continue to grow rapidly, and manual inspection is slow, inconsistent, and error-prone.
Goal: Build an end-to-end machine learning system that can:
- Automatically detect phishing websites
- Support real-time and batch predictions
- Track experiments and models professionally
- Be deployed easily using Docker
What I Learned
- Designing modular ML pipelines improves maintainability
- MLflow simplifies experiment tracking and versioning
- FastAPI is ideal for low-latency ML APIs
- Streamlit accelerates building data apps
- Docker makes ML systems reproducible and portable
- Batch prediction using CSV uploads
- Manual feature input for single-website analysis
- Real-time phishing detection
- RESTful FastAPI backend
- Interactive Streamlit dashboard
- 30 website features analyzed
- Risk assessment scoring
- Accuracy, Precision, Recall, F1-Score
- Feature-wise breakdown
- Exportable CSV results
- Automated data ingestion from MongoDB
- Data validation and transformation
- Hyperparameter tuning using GridSearchCV
- Multiple models evaluated
- MLflow experiment tracking
- Model versioning and registry
User Input / CSV
        │
        ▼
Streamlit UI
        │
        ▼
FastAPI Backend
        │
        ▼
Preprocessing
        │
        ▼
Trained ML Model
        │
        ▼
Predictions + Metrics
Network_Security/
├── app.py                  # FastAPI backend
├── streamlit_app.py        # Streamlit frontend
├── main.py                 # Training pipeline
├── Dockerfile              # Docker configuration
├── requirements.txt        # Dependencies
├── setup.py                # Package setup
├── .env                    # Environment variables
│
├── networksecurity/
│   ├── components/
│   │   ├── data_ingestion.py
│   │   ├── data_validation.py
│   │   ├── data_transformation.py
│   │   └── model_trainer.py
│   │
│   ├── entity/
│   │   ├── config_entity.py
│   │   └── artifact_entity.py
│   │
│   ├── pipeline/
│   │   ├── training_pipeline.py
│   │   └── batch_prediction.py
│   │
│   ├── utils/
│   ├── exception/
│   ├── logging/
│   └── constant/
│
├── final_model/
├── Artifacts/
├── Network_data/
├── prediction_output/
└── templates/
| Component | Technology |
|---|---|
| Backend | FastAPI, Uvicorn |
| Frontend | Streamlit |
| ML | scikit-learn, NumPy, Pandas |
| MLOps | MLflow, DagsHub |
| Database | MongoDB, PyMongo |
| Visualization | Plotly, Matplotlib |
| Deployment | Docker, Docker Compose |
- Python 3.10+
- MongoDB (local or Atlas)
- Git
git clone <repository-url>
cd Network_Security
python -m venv myenv
source myenv/bin/activate # Linux/Mac
myenv\Scripts\activate # Windows
pip install -r requirements.txt
pip install -e .

Create a .env file:
MONGODB_URL_KEY=mongodb+srv://username:password@cluster.mongodb.net/?appName=Cluster0

MLflow settings (in model_trainer.py):
os.environ["MLFLOW_TRACKING_URI"] = "https://dagshub.com/your-username/Network-Security.mlflow"
os.environ["MLFLOW_TRACKING_USERNAME"] = "your-username"
os.environ["MLFLOW_TRACKING_PASSWORD"] = "your-token"

- Load phishing dataset from MongoDB
- Store raw data in artifacts
- Check schema
- Handle missing values
- Validate feature ranges
- Feature preprocessing
- Train-test split
- Scaling and encoding
- Train multiple models
- Perform GridSearchCV
- Select best model
- Log metrics to MLflow
- Load trained model
- Apply preprocessing
- Generate predictions
- Return risk score
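The validation steps listed above (schema check, missing values, feature ranges) can be sketched with the standard library alone. This is an illustration, not the contents of the repository's `data_validation.py`: the helper `validate_rows`, the column names in the sample, and the assumption that every value must be one of -1, 0, or 1 are hypothetical.

```python
import csv
import io


def validate_rows(csv_text, expected_columns, allowed_values=(-1, 0, 1)):
    """Return a list of human-readable problems found in the CSV text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    problems = []

    # Schema check: the header must match the expected columns exactly.
    found = list(reader.fieldnames or [])
    if found != list(expected_columns):
        problems.append(
            f"schema mismatch: expected {list(expected_columns)}, got {found}"
        )
        return problems

    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        for column in expected_columns:
            raw = (row[column] or "").strip()
            if raw == "":
                problems.append(f"line {line_no}: missing value in {column}")
                continue
            try:
                value = int(raw)
            except ValueError:
                problems.append(f"line {line_no}: non-integer {raw!r} in {column}")
                continue
            # Range check against the assumed set of valid feature values.
            if value not in allowed_values:
                problems.append(f"line {line_no}: {column}={value} outside {allowed_values}")
    return problems


# Hypothetical three-column sample; the real dataset has 30 features.
columns = ["having_ip", "url_length", "result"]
sample = "having_ip,url_length,result\n1,-1,1\n0,,1\n1,5,-1\n"
problems = validate_rows(sample, columns)
for problem in problems:
    print(problem)
```

Collecting every problem instead of stopping at the first one makes it easier to fix an uploaded CSV in a single pass.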
python app.py
Docs: http://localhost:8000/docs
streamlit run streamlit_app.py
Redirects to docs
Triggers model training
curl -X GET http://localhost:8000/train
Batch prediction
curl -X POST http://localhost:8000/predict \
  -F "file=@data.csv"

- Random Forest
- Decision Tree
- Gradient Boosting
- Logistic Regression
- AdaBoost
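
Comparing several of the models listed above with GridSearchCV and keeping the best one might look roughly like the sketch below. It is not the code in `model_trainer.py`: the synthetic data from `make_classification`, the parameter grids, F1 scoring, and 3-fold cross-validation are all assumptions, only three of the five models are included to keep it short, and MLflow logging is omitted.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the 30-feature phishing dataset.
X, y = make_classification(n_samples=300, n_features=30, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Candidate models paired with small, illustrative parameter grids.
candidates = {
    "Decision Tree": (DecisionTreeClassifier(random_state=42), {"max_depth": [3, 5]}),
    "Random Forest": (RandomForestClassifier(random_state=42), {"n_estimators": [16, 32]}),
    "Logistic Regression": (LogisticRegression(max_iter=1000), {"C": [0.1, 1.0]}),
}

results = {}
for name, (estimator, grid) in candidates.items():
    search = GridSearchCV(estimator, grid, scoring="f1", cv=3)
    search.fit(X_train, y_train)
    # Keep the cross-validated score and the refitted best estimator.
    results[name] = (search.best_score_, search.best_estimator_)

# Select the model with the highest cross-validated F1 score.
best_name = max(results, key=lambda n: results[n][0])
best_model = results[best_name][1]
test_accuracy = best_model.score(X_test, y_test)
print(best_name, round(test_accuracy, 3))
```

Selecting on the cross-validated score and only then scoring once on the held-out split keeps the test set out of the model-selection loop.
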
- -1 → phishing
- 0 → suspicious
- 1 → legitimate
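
A small helper can turn these numeric labels into a readable report. The names `LABEL_NAMES` and `describe_predictions` are hypothetical and are not taken from the repository; this is only a sketch of the mapping described above.

```python
# Mapping from the model's numeric output to a human-readable class name.
LABEL_NAMES = {-1: "Phishing", 0: "Suspicious", 1: "Legitimate"}


def describe_predictions(predictions):
    """Turn raw model outputs into lines like 'Website 1 -> Legitimate'."""
    lines = []
    for index, label in enumerate(predictions, start=1):
        # Unknown labels are reported rather than raising an error.
        name = LABEL_NAMES.get(int(label), "Unknown")
        lines.append(f"Website {index} -> {name}")
    return lines


report = describe_predictions([1, -1, 0])
print("\n".join(report))
```
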
docker build -t networksecurity:latest .
docker run -p 8000:8000 --env-file .env networksecurity:latest

Docker Compose:
version: '3.8'
services:
  api:
    build: .
    ports:
      - "8000:8000"
    env_file:
      - .env

- Accuracy: ~95.2%
- Precision: ~94.8%
- Recall: ~93.5%
- F1-Score: ~94.1%
- Fork the repository
- Create feature branch
- Commit changes
- Push and open PR
MIT License © 2025 Vivek Kumar Gupta
Vivek Kumar Gupta
AI Engineering Student | MLOps & ML Systems
GitHub: @vivek34561
LinkedIn: vivek-gupta-0400452b6
- UCI Machine Learning Repository
- MLflow
- FastAPI
- Streamlit
Made with ❤️ for Cybersecurity
Built to explore end-to-end machine learning systems