A Streamlit web app for detecting cyberbullying in tweets using a trained XGBoost classifier.
The project preprocesses tweet text, transforms it with a TF-IDF vectorizer, and predicts one of the following categories:
- gender (gender-based bullying)
- religion (religion-based bullying)
- age (age-based bullying)
- ethnicity (ethnicity-based bullying)
- other_cyberbullying (other bullying content)
- not_cyberbullying (not cyberbullying)
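
The prediction flow described above (vectorize the cleaned text, classify it, then map the class index back to a label) could look roughly like the sketch below. The function name `predict_category` is hypothetical, and the sketch assumes the three objects expose the usual scikit-learn style methods (`transform`, `predict_proba`, `inverse_transform`); the actual code in app.py may differ.

```python
def predict_category(clean_text, model, vectorizer, encoder):
    """Return (label, confidence) for one already-preprocessed tweet.

    Assumes scikit-learn style objects: a fitted TF-IDF vectorizer,
    a classifier with predict_proba, and a fitted label encoder.
    """
    # The vectorizer expects an iterable of documents, so wrap the text in a list.
    features = vectorizer.transform([clean_text])

    # One row of class probabilities for the single input document.
    probabilities = list(model.predict_proba(features)[0])

    # Index of the most likely class.
    best_index = max(range(len(probabilities)), key=probabilities.__getitem__)

    # Map the numeric class index back to its category name.
    label = encoder.inverse_transform([best_index])[0]
    return label, float(probabilities[best_index])
```

The returned confidence is the probability of the top class, which is the kind of value a confidence chart would plot for each category.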
- Clean and normalize tweet text
- Remove URLs, mentions, hashtags, punctuation, numbers, and stop words
- Lemmatize tokens using NLTK
- Predict cyberbullying category with model confidence visualization
- Display model metadata and metrics
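
The cleaning steps listed above can be sketched with the standard library alone. This is a simplified illustration, not the app's actual code: the function name `clean_tweet` is hypothetical, the small `STOP_WORDS` set stands in for NLTK's English stop-word list, and lemmatization is left out so the example runs without downloading NLTK data.

```python
import re
import string

# Tiny illustrative stop-word set (assumption); the app uses NLTK's list instead.
STOP_WORDS = {"a", "an", "the", "is", "are", "you", "so", "and", "to", "of"}


def clean_tweet(text: str) -> str:
    """Lowercase a tweet and strip URLs, mentions, hashtags, numbers,
    punctuation, and stop words. Lemmatization is intentionally omitted."""
    text = text.lower()
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)  # URLs
    text = re.sub(r"[@#]\w+", " ", text)                # mentions and hashtags
    text = re.sub(r"\d+", " ", text)                    # numbers
    # Drop all ASCII punctuation characters.
    text = text.translate(str.maketrans("", "", string.punctuation))
    tokens = [tok for tok in text.split() if tok not in STOP_WORDS]
    return " ".join(tokens)


print(clean_tweet("@user You are SO annoying!!! 123 http://t.co/abc #fail"))
# prints "annoying"
```

URLs are removed before punctuation so that the URL pattern still matches; stripping punctuation first would break `http://` apart.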
- Model: XGBoost Classifier
- Accuracy: 83.52%
- F1 Macro: 0.8316
These values are loaded from model_metadata.json.
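
A minimal sketch of reading that file and formatting the metrics for display follows. The key names (`model_name`, `accuracy`, `f1_macro`) and the assumption that accuracy is stored as a fraction between 0 and 1 are guesses for illustration; check model_metadata.json for the actual schema.

```python
import json
from pathlib import Path


def load_metadata(path: str = "model_metadata.json") -> dict:
    """Return the metadata as a dict, or an empty dict if the file is missing."""
    file = Path(path)
    if not file.exists():
        return {}
    with file.open(encoding="utf-8") as handle:
        return json.load(handle)


def format_metrics(meta: dict) -> list[str]:
    """Build display lines from the metadata.

    Key names are hypothetical; accuracy is assumed to be a 0-1 fraction.
    """
    lines = [f"Model: {meta.get('model_name', 'unknown')}"]
    if "accuracy" in meta:
        lines.append(f"Accuracy: {meta['accuracy']:.2%}")
    if "f1_macro" in meta:
        lines.append(f"F1 Macro: {meta['f1_macro']:.4f}")
    return lines
```

Using `.get` and membership checks keeps the display code from failing if a key is absent from the file.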
- Python 3.10+ (recommended)
- pip package manager
- Install the dependencies: pip install -r requirements.txt
- Start the app: streamlit run app.py

Then open the local Streamlit URL shown in the terminal.
- app.py: Streamlit application entry point
- requirements.txt: Python dependencies
- best_model.pkl: serialized trained model
- tfidf_vectorizer.pkl: serialized TF-IDF vectorizer
- label_encoder.pkl: serialized label encoder
- model_metadata.json: metadata for the trained model
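
The three .pkl files can be loaded together at startup. The sketch below assumes they were written with Python's `pickle` module; if they were saved with joblib instead, `joblib.load` would be the matching call. The helper name `load_artifacts` and the dictionary keys are hypothetical.

```python
import pickle
from pathlib import Path

# File names as listed in this repository; the dict keys are illustrative.
ARTIFACT_FILES = {
    "model": "best_model.pkl",
    "vectorizer": "tfidf_vectorizer.pkl",
    "encoder": "label_encoder.pkl",
}


def load_artifacts(base_dir: str = ".") -> dict:
    """Unpickle the model, vectorizer, and label encoder from base_dir.

    Raises FileNotFoundError if any of the files is missing.
    """
    base = Path(base_dir)
    artifacts = {}
    for key, filename in ARTIFACT_FILES.items():
        with (base / filename).open("rb") as handle:
            artifacts[key] = pickle.load(handle)
    return artifacts
```

Unpickling executes code embedded in the file, so only load .pkl files from a source you trust. The libraries that defined the pickled objects (for example xgboost and scikit-learn) must also be installed for loading to succeed.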
- At startup, the app checks for the NLTK resources used for stop words and lemmatization and downloads them automatically if they are not already installed.
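
A check-then-download step of this kind might be sketched as follows. The resource names (`stopwords`, `wordnet`) are assumptions based on the stop-word and lemmatization features; the app may request additional packages. The `find` and `download` parameters are hypothetical hooks that default to `nltk.data.find` and `nltk.download`, so the logic can be exercised without network access.

```python
# Resource name -> lookup path inside the NLTK data directory (assumed set).
NLTK_RESOURCES = {
    "stopwords": "corpora/stopwords",
    "wordnet": "corpora/wordnet",
}


def ensure_nltk_resources(find=None, download=None):
    """Download any missing NLTK resources and return the names fetched."""
    if find is None or download is None:
        import nltk  # imported lazily so the helper can be tested without NLTK

        find = find or nltk.data.find
        download = download or nltk.download

    fetched = []
    for name, path in NLTK_RESOURCES.items():
        try:
            find(path)  # raises LookupError when the resource is absent
        except LookupError:
            download(name, quiet=True)
            fetched.append(name)
    return fetched
```

Calling `ensure_nltk_resources()` once at startup keeps repeated runs from re-downloading data that is already present.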
This repository does not include a license file. Add one if you want to publish or share the project publicly.