📬 Spam Detection App

A machine learning-powered application that classifies SMS messages as Spam or Ham (Not Spam) using Natural Language Processing (NLP). This project includes:

✅ Command-line interface (CLI)
✅ Interactive Streamlit web app
✅ Trained model using Multinomial Naive Bayes
✅ TF-IDF-based text preprocessing
✅ K-Fold evaluation metrics

🖼️ Preview

Live Link

🤖 Model Overview

Vectorizer: TfidfVectorizer(stop_words='english')
Classifier: MultinomialNB()
Validation Strategy: 10-Fold Cross-Validation
Evaluation Metrics (averaged over the 10 folds):
- Accuracy: ~97.15%
- Precision: ~99.66%
- Recall: ~78.98%
- F1 Score: ~88.10%
- Geometric Mean (√(recall × specificity)): ~88.84%
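All of these metrics come from a binary confusion matrix with spam as the positive class. The sketch below computes them from raw counts, assuming the standard definitions (geometric mean taken as √(recall × specificity)). The function name is hypothetical, and the counts in the demo are made up for illustration, not the project's results.

```python
import math


def binary_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Compute classification metrics with 'spam' as the positive class."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    # Precision: share of messages flagged as spam that really are spam.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall (sensitivity): share of real spam that gets caught.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # Specificity: share of ham that is correctly left alone.
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    # Geometric mean balances performance on the spam and ham classes.
    g_mean = math.sqrt(recall * specificity)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "specificity": specificity,
        "g_mean": g_mean,
    }


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    for name, value in binary_metrics(tp=8, fp=1, fn=2, tn=89).items():
        print(f"{name:>11}: {value:.4f}")
```

High precision paired with lower recall, as in the figures above, means the model rarely flags ham by mistake but lets some spam through.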

🧠 Why Naive Bayes?

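Multinomial Naive Bayes is efficient and performs well on text classification tasks such as spam detection, where the input features are word counts or TF-IDF scores. The multinomial model is defined for integer counts, but in practice fractional TF-IDF values also tend to work well.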
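To illustrate why word counts suit this model, here is a toy, from-scratch multinomial Naive Bayes over raw word counts with add-one (Laplace) smoothing. It is a teaching sketch only: the project itself uses scikit-learn's MultinomialNB on TF-IDF features, and the tokenizer, class name, and training messages here are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into simple word tokens (an assumption)."""
    return re.findall(r"[a-z0-9$]+", text.lower())


class ToyMultinomialNB:
    """Multinomial Naive Bayes over raw word counts with add-alpha smoothing."""

    def __init__(self, alpha: float = 1.0) -> None:
        self.alpha = alpha

    def fit(self, texts: list[str], labels: list[str]) -> "ToyMultinomialNB":
        self.classes = sorted(set(labels))
        self.word_counts: dict[str, Counter] = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for c in self.classes for w in self.word_counts[c]}
        doc_counts = Counter(labels)
        # log P(class): fraction of training messages in each class.
        self.log_prior = {c: math.log(doc_counts[c] / len(labels))
                          for c in self.classes}
        # Total token count per class (denominator of P(word | class)).
        self.total_words = {c: sum(self.word_counts[c].values())
                            for c in self.classes}
        return self

    def log_score(self, text: str, c: str) -> float:
        """log P(class) plus log P(word | class) for every token in the text."""
        denom = self.total_words[c] + self.alpha * len(self.vocab)
        score = self.log_prior[c]
        for word in tokenize(text):
            if word in self.vocab:  # words unseen in training are skipped
                score += math.log((self.word_counts[c][word] + self.alpha) / denom)
        return score

    def predict(self, text: str) -> str:
        # Pick the class with the highest log posterior score.
        return max(self.classes, key=lambda c: self.log_score(text, c))


if __name__ == "__main__":
    texts = ["win a free prize now", "free cash claim now",
             "are we meeting today", "see you at lunch today"]
    labels = ["spam", "spam", "ham", "ham"]
    model = ToyMultinomialNB().fit(texts, labels)
    print(model.predict("claim your free prize"))  # spam
    print(model.predict("lunch meeting today?"))   # ham
```

Working in log space avoids numeric underflow when many small word probabilities are multiplied, and smoothing keeps a single unseen-in-class word from zeroing out a class.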

🚀 How to Run Locally

1. Clone the Repository

git clone https://github.com/mayankraj052/SpamdetectionApp.git
cd SpamdetectionApp

2. Create a Virtual Environment

Windows:

python -m venv venv
venv\Scripts\activate

macOS/Linux:

python3 -m venv venv
source venv/bin/activate

3. Install Requirements

pip install -r requirements.txt

4. Run Streamlit App

streamlit run app.py

5. Run Command-Line Tool

python spam_check_cli.py
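The contents of spam_check_cli.py are not shown in this README. The following is a minimal sketch of what such a tool could look like, assuming spam_pipeline.pkl was written with joblib.dump and that the pipeline predicts the dataset's string labels (spam / ham). The function names are hypothetical, and unlike the real script (which may prompt interactively) this sketch takes messages as command-line arguments.

```python
import sys


def load_pipeline(path: str = "spam_pipeline.pkl"):
    """Load the saved TF-IDF + Naive Bayes pipeline from disk."""
    # Imported lazily so classify() can be used without joblib installed.
    # Assumption: the file was saved with joblib.dump.
    import joblib
    return joblib.load(path)


def classify(pipeline, message: str) -> str:
    """Return 'Spam' or 'Ham' for a single message."""
    # predict() expects a list of documents, so wrap the single message.
    label = pipeline.predict([message])[0]
    # Assumes the pipeline was trained on the dataset's 'spam'/'ham' labels.
    return "Spam" if str(label).lower() == "spam" else "Ham"


def main(argv: list[str]) -> None:
    if not argv:
        print('usage: python spam_check_cli.py "message" ["message" ...]')
        return
    pipeline = load_pipeline()
    for message in argv:
        print(f"{classify(pipeline, message)}: {message}")


if __name__ == "__main__":
    main(sys.argv[1:])
```

Because the saved pipeline bundles the vectorizer with the classifier, the tool can pass raw text straight to predict().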

🧪 Example Predictions

"Hey John, I thought you might like this opportunity — earn $500/day working from home!" → Spam

"Are we still meeting at 6 PM today?" → Ham

🗂 Dataset Used

SMS Spam Collection Dataset (UCI ML Repository)
Data Set Link
Format: one message per line, consisting of the label (spam or ham), a tab, then the message text
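A minimal sketch of reading this format into parallel label and message lists, assuming tab-separated lines with no header row. The function name, the file name in the commented example, and the UTF-8 encoding are assumptions. Splitting on the first tab by hand sidesteps CSV quote handling, which can trip over messages that contain quote characters.

```python
import io
from collections import Counter
from typing import TextIO


def load_sms_collection(handle: TextIO) -> tuple[list[str], list[str]]:
    """Parse 'label<TAB>message' lines into parallel label and message lists."""
    labels: list[str] = []
    messages: list[str] = []
    for line_no, raw in enumerate(handle, start=1):
        line = raw.rstrip("\r\n")
        if not line:
            continue  # tolerate blank lines, e.g. a trailing newline
        # Split on the first tab only, so any tab inside a message is kept.
        label, sep, message = line.partition("\t")
        if not sep or label not in {"spam", "ham"}:
            raise ValueError(f"line {line_no}: unexpected format: {line!r}")
        labels.append(label)
        messages.append(message)
    return labels, messages


if __name__ == "__main__":
    # Two inline sample lines keep the demo self-contained.
    sample = io.StringIO(
        "ham\tAre we still meeting at 6 PM today?\n"
        "spam\tEarn $500/day working from home!\n"
    )
    labels, messages = load_sms_collection(sample)
    print(Counter(labels))
    # For the real data (file name and encoding are assumptions):
    # with open("SMSSpamCollection", encoding="utf-8") as f:
    #     labels, messages = load_sms_collection(f)
```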

📦 Model Pipeline

The model pipeline is saved in spam_pipeline.pkl and contains:

TF-IDF Vectorizer (preprocessing)
Multinomial Naive Bayes Classifier

```python
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB

spam_pipeline = Pipeline([
    ('tfidf', TfidfVectorizer(stop_words='english')),
    ('model', MultinomialNB())
])
```
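A minimal sketch of how a pipeline like this could be trained and written to a .pkl file, assuming it is saved with joblib (listed under Tools & Libraries). The toy messages stand in for the real dataset, and the notebook's actual training code may differ. The demo writes to a temporary directory so it does not overwrite the repository's model file.

```python
import os
import tempfile

import joblib
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Toy training data standing in for the SMS Spam Collection.
messages = [
    "WINNER! Claim your free prize now",
    "Earn $500/day working from home, reply YES",
    "Free entry to win cash, text WIN today",
    "Are we still meeting at 6 PM today?",
    "Can you pick up milk on the way home?",
    "Running late, see you at lunch",
]
labels = ["spam", "spam", "spam", "ham", "ham", "ham"]

spam_pipeline = Pipeline([
    ("tfidf", TfidfVectorizer(stop_words="english")),
    ("model", MultinomialNB()),
])
# fit() learns the vocabulary and IDF weights, then trains Naive Bayes
# on the resulting TF-IDF matrix, all in one call.
spam_pipeline.fit(messages, labels)

# Persist the whole pipeline (vectorizer + model) as a single file.
path = os.path.join(tempfile.mkdtemp(), "spam_pipeline.pkl")
joblib.dump(spam_pipeline, path)

# Reloading restores both steps, so raw text can be passed straight in.
restored = joblib.load(path)
print(restored.predict(["Claim your free cash prize", "Are we meeting at lunch?"]))
```

Saving the whole Pipeline, rather than the vectorizer and classifier separately, lets the CLI and the Streamlit app call predict() on raw text without repeating the preprocessing.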

📈 Evaluation via K-Fold Cross-Validation

10-fold cross-validation was used to check model performance across 10 different train/test splits. Accuracy, precision, recall, F1 score, specificity, and geometric mean were computed on each fold and then averaged.

```python
from sklearn.model_selection import KFold
from sklearn.metrics import confusion_matrix
```
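A minimal sketch of that procedure, assuming shuffled KFold, spam as the positive class, and the geometric mean computed as √(recall × specificity). The seed, helper names, and toy data are assumptions. A fresh pipeline is fitted on each training split so that no vocabulary or IDF statistics leak in from the held-out fold.

```python
import math

import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import KFold
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline


def safe_div(num: float, den: float) -> float:
    return num / den if den else 0.0


def cross_validate(messages, labels, n_splits=10, seed=42):
    """Average per-fold metrics over K folds, with 'spam' as positive class."""
    X = np.asarray(messages, dtype=object)
    y = np.asarray(labels)
    kf = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    scores = []
    for train_idx, test_idx in kf.split(X):
        # New, unfitted pipeline per fold.
        pipe = Pipeline([
            ("tfidf", TfidfVectorizer(stop_words="english")),
            ("model", MultinomialNB()),
        ])
        pipe.fit(X[train_idx], y[train_idx])
        pred = pipe.predict(X[test_idx])
        # Fixed label order gives a 2x2 matrix even if a fold lacks a class.
        tn, fp, fn, tp = confusion_matrix(
            y[test_idx], pred, labels=["ham", "spam"]).ravel()
        precision = safe_div(tp, tp + fp)
        recall = safe_div(tp, tp + fn)
        specificity = safe_div(tn, tn + fp)
        scores.append({
            "accuracy": (tp + tn) / (tp + tn + fp + fn),
            "precision": precision,
            "recall": recall,
            "f1": safe_div(2 * precision * recall, precision + recall),
            "specificity": specificity,
            "g_mean": math.sqrt(recall * specificity),
        })
    # Mean of each metric across the folds.
    return {k: float(np.mean([s[k] for s in scores])) for k in scores[0]}


if __name__ == "__main__":
    # Toy data: 10 spam-like and 10 ham-like messages.
    spam = [f"Free prize {i}: claim cash, text WIN" for i in range(10)]
    ham = [f"Meeting moved to room {i}, notes from lunch" for i in range(10)]
    results = cross_validate(spam + ham, ["spam"] * 10 + ["ham"] * 10)
    for name, value in results.items():
        print(f"{name:>11}: {value:.3f}")
```

Plain KFold does not preserve class ratios, so on small or very imbalanced data a fold could contain little or no spam. scikit-learn's StratifiedKFold is an alternative that keeps the spam/ham ratio similar in every fold (its split() also takes the labels).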

🛠 Tools & Libraries

Python 3.11+
Scikit-learn
Pandas & NumPy
Streamlit
Joblib
Git for version control

📜 License

This project is open-source and available under the MIT License.

🙋‍♂️ Author

Made with ❤️ by Mayank Raj
