📧 Spam Message Detection System

An end-to-end machine learning pipeline that classifies SMS and email messages as spam or ham (not spam). It covers TF-IDF vectorization, a comparison of four models, evaluation metrics, and a FastAPI endpoint for deployment.

📌 Features

✅ Full NLP preprocessing (lowercasing, stemming, stopword removal, URL/phone normalization)
✅ TF-IDF vectorization with unigrams + bigrams
✅ 4 models compared: Naive Bayes, Logistic Regression, Random Forest, Linear SVM
✅ Evaluation: Accuracy, F1, ROC-AUC, Confusion Matrix, ROC Curves
✅ Feature importance visualization
✅ 5-fold cross-validation
✅ FastAPI REST endpoint for real-time prediction
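
The preprocessing steps listed above can be sketched roughly as follows. This is a minimal, standard-library-only illustration and not the contents of src/preprocess.py: the stopword list, the `urltoken`/`phonetoken` placeholders, and the `naive_stem` helper are all hypothetical stand-ins (the project itself lists NLTK, which provides a full stopword list and proper stemmers).

```python
import re

# Tiny illustrative stopword list (assumption); NLTK ships a much fuller one.
STOPWORDS = {"a", "an", "the", "is", "are", "to", "of", "and", "you", "your", "for", "in", "on", "at"}

# Hypothetical placeholder tokens used to normalize URLs and phone numbers.
PLACEHOLDERS = {"urltoken", "phonetoken"}

URL_RE = re.compile(r"(?:https?://|www\.)\S+")
PHONE_RE = re.compile(r"\+?\d[\d\s\-()]{6,}\d")
TOKEN_RE = re.compile(r"[a-z]+")


def naive_stem(token: str) -> str:
    """Crude suffix stripper standing in for a real stemmer (assumption)."""
    for suffix in ("ing", "ed", "ly", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text: str) -> str:
    """Lowercase, normalize URLs and phone numbers, drop stopwords, stem."""
    text = text.lower()
    text = URL_RE.sub(" urltoken ", text)
    text = PHONE_RE.sub(" phonetoken ", text)
    tokens = TOKEN_RE.findall(text)
    kept = [
        tok if tok in PLACEHOLDERS else naive_stem(tok)
        for tok in tokens
        if tok not in STOPWORDS
    ]
    return " ".join(kept)


if __name__ == "__main__":
    print(preprocess("Call 0800 123 4567 NOW! Visit https://win.example.com for winning prizes"))
```

Replacing URLs and phone numbers with a shared placeholder lets the model learn "contains a link" or "contains a number to call" as a signal, instead of treating every distinct URL as a separate rare token.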

📁 Project Structure

spam-detector/
│
├── spam_detection.ipynb     # Main Jupyter / Colab notebook
├── src/
│   ├── preprocess.py        # Text cleaning & preprocessing
│   ├── train.py             # Model training & saving
│   ├── predict.py           # Prediction function
│   └── api.py               # FastAPI REST API
│
├── models/                  # Saved model files (auto-created on train)
│   ├── spam_model.pkl
│   └── tfidf_vectorizer.pkl
│
├── data/
│   └── README.md            # Dataset download instructions
│
├── requirements.txt
├── .gitignore
└── README.md

🚀 Quick Start

1. Clone the repo

git clone https://github.com/MithraKL/spam-detector.git
cd spam-detector

2. Create a virtual environment

python -m venv venv
source venv/bin/activate        # Windows: venv\Scripts\activate

3. Install dependencies

pip install -r requirements.txt

4. Get the dataset

Download the SMS Spam Collection dataset from Kaggle and place spam.csv in the data/ folder.
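
Once the file is in place, loading it might look something like the sketch below. The column names `v1` (label) and `v2` (message text) and the `latin-1` encoding are assumptions about the Kaggle export, not something this README states, so check them against your copy of spam.csv. The `parse_sms_rows` and `load_sms_dataset` helpers are hypothetical names.

```python
import csv
from pathlib import Path


def parse_sms_rows(lines, label_col="v1", text_col="v2"):
    """Parse CSV lines into (texts, labels), with 1 = spam and 0 = ham.

    The v1/v2 column names are an assumption about the Kaggle export.
    """
    texts, labels = [], []
    for row in csv.DictReader(lines):
        label = (row.get(label_col) or "").strip().lower()
        text = (row.get(text_col) or "").strip()
        if label not in {"spam", "ham"} or not text:
            continue  # skip malformed or empty rows
        texts.append(text)
        labels.append(1 if label == "spam" else 0)
    return texts, labels


def load_sms_dataset(path="data/spam.csv", encoding="latin-1"):
    """Read the CSV from disk; latin-1 is assumed in case the file is not valid UTF-8."""
    with Path(path).open(newline="", encoding=encoding) as handle:
        return parse_sms_rows(handle)
```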

5. Train the model

python src/train.py
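
The contents of src/train.py are not reproduced in this README. To illustrate the "TF-IDF with unigrams + bigrams" step from the feature list, here is a small pure-Python sketch of the arithmetic. It is not the project's code: in practice scikit-learn's `TfidfVectorizer(ngram_range=(1, 2))` would do this work. The smoothed IDF formula below follows what scikit-learn documents as its default, and the helper names (`ngrams`, `fit_idf`, `transform`) are hypothetical.

```python
import math
from collections import Counter


def ngrams(tokens, n_min=1, n_max=2):
    """Yield space-joined n-grams for every n in [n_min, n_max]."""
    for n in range(n_min, n_max + 1):
        for i in range(len(tokens) - n + 1):
            yield " ".join(tokens[i : i + n])


def fit_idf(docs):
    """Smoothed inverse document frequency: ln((1 + N) / (1 + df)) + 1."""
    n_docs = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(ngrams(doc.split())))  # count each term once per document
    return {term: math.log((1 + n_docs) / (1 + count)) + 1.0 for term, count in df.items()}


def transform(doc, idf):
    """Return an L2-normalized {term: weight} vector; unseen terms are dropped."""
    counts = Counter(t for t in ngrams(doc.split()) if t in idf)
    weights = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(w * w for w in weights.values()))
    return {t: w / norm for t, w in weights.items()} if norm else {}


if __name__ == "__main__":
    corpus = ["free prize now", "see you now"]
    idf = fit_idf(corpus)
    print(transform("free prize now", idf))
```

Bigrams such as "free prize" capture short phrases that individual words miss, and the IDF term down-weights words like "now" that appear in both spam and ham.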

6. Run the API

uvicorn src.api:app --reload

Then test it:

curl -X POST http://localhost:8000/predict \
     -H "Content-Type: application/json" \
     -d '{"text": "Congratulations! You won a FREE iPhone. Click now!"}'

🌐 API Endpoints

| Method | Endpoint | Description |
|--------|----------|-------------|
| GET | `/` | Health check |
| POST | `/predict` | Predict spam or ham |

Request body:

{ "text": "Your message here" }

Response:

{ "label": "spam", "confidence": 0.9871 }
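
For callers who prefer Python over curl, a small standard-library client could look like the sketch below. It assumes the API is running at the default uvicorn address from Quick Start and that `label` is always either "spam" or "ham"; the function names are hypothetical and not part of this repo.

```python
import json
import urllib.request

API_URL = "http://localhost:8000/predict"  # assumed default uvicorn address


def build_predict_request(text, url=API_URL):
    """Build the POST request carrying the JSON body {"text": ...}."""
    payload = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_prediction(raw_body):
    """Extract (label, confidence) from the JSON response body."""
    data = json.loads(raw_body)
    label = data["label"]
    confidence = float(data["confidence"])
    if label not in {"spam", "ham"}:  # assumption: only these two labels exist
        raise ValueError(f"unexpected label: {label!r}")
    return label, confidence


def predict(text, url=API_URL, timeout=5.0):
    """Send the message to the running API and return (label, confidence)."""
    with urllib.request.urlopen(build_predict_request(text, url), timeout=timeout) as resp:
        return parse_prediction(resp.read())
```

With the server running, `predict("Congratulations! You won a FREE iPhone. Click now!")` should return a `(label, confidence)` tuple.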

📊 Model Performance (on SMS Spam Collection)

| Model | Accuracy | ROC-AUC |
|-------|----------|---------|
| Naive Bayes | ~97.4% | ~0.985 |
| Logistic Regression | ~98.1% | ~0.992 |
| Random Forest | ~97.8% | ~0.991 |
| Linear SVM | ~98.3% | ~0.994 |
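
For reference, the metrics reported here can be computed from true labels and model scores as in this pure-Python sketch. It is only meant to show the definitions, with 1 = spam and 0 = ham assumed as the label encoding; the project itself would more likely rely on scikit-learn's metric functions, and the toy labels and scores in the demo are made up for illustration.

```python
def confusion_counts(y_true, y_pred):
    """Return (tn, fp, fn, tp), treating 1 (spam) as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tn, fp, fn, tp


def accuracy(y_true, y_pred):
    """Fraction of messages classified correctly."""
    tn, fp, fn, tp = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + tn + fp + fn)


def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall, written as 2TP / (2TP + FP + FN)."""
    _, fp, fn, tp = confusion_counts(y_true, y_pred)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def roc_auc(y_true, scores):
    """Probability that a random spam message outscores a random ham one (ties count half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("ROC-AUC needs at least one example of each class")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy data, invented for the demo only.
    y_true = [1, 0, 1, 0, 0, 1]
    scores = [0.9, 0.2, 0.4, 0.6, 0.1, 0.8]
    y_pred = [1 if s >= 0.5 else 0 for s in scores]
    print(confusion_counts(y_true, y_pred), accuracy(y_true, y_pred))
    print(f1_score(y_true, y_pred), roc_auc(y_true, scores))
```

Accuracy depends on the chosen decision threshold, while ROC-AUC summarizes ranking quality across all thresholds, which is why the two columns can order models slightly differently.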

🛠 Tech Stack

Language: Python 3.10+
ML/NLP: scikit-learn, NLTK
Visualization: Matplotlib, Seaborn, WordCloud
API: FastAPI, Uvicorn
Serialization: Joblib

📄 License

This project is licensed under the MIT License.
