Movie Genre Classification using TF-IDF and Machine Learning
A Machine Learning project that predicts the genre of a movie based on its plot summary using TF-IDF and classifiers like Naive Bayes, Logistic Regression, and Linear SVM.
Create a model that predicts the genre of a movie based on its plot summary using NLP techniques.
- Source: Kaggle - Genre Classification Dataset IMDB
- Files: `train_data.txt`, `test_data.txt`, `test_data_solution.txt`
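
A minimal sketch of loading these files, assuming each line has the form `ID ::: TITLE ::: GENRE ::: DESCRIPTION` (and that `test_data.txt` omits the genre field). The separator, the field order, the helper name `parse_genre_file`, and the two sample rows are assumptions for illustration, so check them against the downloaded files before relying on this.

```python
import io


def parse_genre_file(handle, has_genre=True):
    """Parse lines assumed to look like 'ID ::: TITLE ::: GENRE ::: DESCRIPTION'."""
    expected = 4 if has_genre else 3
    rows = []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        # Limit the number of splits so a ':::' inside the description is kept.
        parts = [p.strip() for p in line.split(":::", expected - 1)]
        if len(parts) != expected:
            continue  # skip malformed lines
        if has_genre:
            movie_id, title, genre, description = parts
        else:
            movie_id, title, description = parts
            genre = None
        rows.append(
            {"id": movie_id, "title": title, "genre": genre, "description": description}
        )
    return rows


# Made-up sample rows standing in for train_data.txt (not real dataset entries).
sample = io.StringIO(
    "1 ::: Example Movie (2001) ::: drama ::: A family copes with loss.\n"
    "2 ::: Another Film (1999) ::: comedy ::: Two friends open a bakery.\n"
)
train_rows = parse_genre_file(sample)
```

With the real files, the same function can be given an open file handle, for example `open("train_data.txt", encoding="utf-8")`. If `test_data.txt` has no genre column, pass `has_genre=False`.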
- Python
- Google Colab
- Scikit-learn
- TF-IDF Vectorizer
- Pandas, NumPy, Matplotlib, Seaborn
| Model | Accuracy |
|---|---|
| Naive Bayes | ~52% |
| Logistic Regression | ~58% |
| Linear SVM | ~57% |
- Load & explore the dataset
- Clean and preprocess text
- TF-IDF Vectorization
- Train multiple ML models
- Compare accuracies
- Predict genre for new descriptions
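
The steps above can be sketched roughly as follows. This is not the notebook's code: the cleaning rules, the toy plot summaries, `ngram_range=(1, 2)` as the reading of "bigrams", and the helper names `clean_text` and `predict_genre` are all assumptions made for illustration. Accuracies on the toy data say nothing about the real results.

```python
import re

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.svm import LinearSVC


def clean_text(text):
    """Lowercase, replace non-letter characters with spaces, collapse whitespace."""
    text = text.lower()
    text = re.sub(r"[^a-z\s]", " ", text)
    return re.sub(r"\s+", " ", text).strip()


# Toy data standing in for the Kaggle files (made up for this sketch).
train_texts = [
    "A widow struggles to keep her family together after a tragedy.",
    "Two brothers confront their painful past in a small town.",
    "A young woman faces a difficult choice about her marriage.",
    "This film explores the history of jazz through interviews.",
    "An inside look at the lives of deep sea fishermen.",
    "Interviews and archive footage tell the story of a famous expedition.",
]
train_labels = ["drama", "drama", "drama", "documentary", "documentary", "documentary"]
test_texts = [
    "A father and son rebuild their relationship after a tragedy.",
    "Archive footage and interviews explore the history of a city.",
]
test_labels = ["drama", "documentary"]

# TF-IDF with at most 10,000 features over unigrams and bigrams (assumed setup).
vectorizer = TfidfVectorizer(max_features=10000, ngram_range=(1, 2))
X_train = vectorizer.fit_transform([clean_text(t) for t in train_texts])
X_test = vectorizer.transform([clean_text(t) for t in test_texts])

models = {
    "Naive Bayes": MultinomialNB(),
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Linear SVM": LinearSVC(),
}

# Train each model on the same features and record its test accuracy.
accuracies = {}
for name, model in models.items():
    model.fit(X_train, train_labels)
    accuracies[name] = accuracy_score(test_labels, model.predict(X_test))


def predict_genre(description, model_name="Logistic Regression"):
    """Predict a genre for a new plot description with one of the fitted models."""
    features = vectorizer.transform([clean_text(description)])
    return str(models[model_name].predict(features)[0])


for name, acc in accuracies.items():
    print(f"{name}: {acc:.2f}")
print(predict_genre("Interviews with fishermen about their lives at sea."))
```

Fitting the vectorizer on the training text only, then reusing it for the test set and for new descriptions, keeps the feature space identical across all three models.
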
- Open `movie_genre_classification.ipynb` in Google Colab
- Download the dataset from Kaggle
- Run all cells step by step
- Or run the notebook directly from the Google Colab link: https://colab.research.google.com/drive/1mMlIgvHBdjgu6fUP3qm2Sw4oTdqlWQLy?usp=sharing
- Best model: Logistic Regression with ~58% accuracy
- Dataset is imbalanced (Drama & Documentary dominate)
- All 3 models used TF-IDF (10,000 features, bigrams)
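
One way to put the imbalance in context is to compare the accuracies against a majority-class baseline, i.e. the accuracy obtained by always predicting the most frequent genre. The sketch below uses a made-up label list, not the real genre distribution, and `majority_baseline` is a hypothetical helper name.

```python
from collections import Counter


def majority_baseline(labels):
    """Return the most frequent label and the accuracy of always predicting it."""
    counts = Counter(labels)
    genre, count = counts.most_common(1)[0]
    return genre, count / len(labels)


# Made-up labels for illustration only (not the dataset's actual distribution).
labels = ["drama"] * 5 + ["documentary"] * 4 + ["comedy"] * 1
genre_counts = Counter(labels)
top_genre, baseline_accuracy = majority_baseline(labels)

print(genre_counts.most_common())
print(f"Always predicting '{top_genre}' gives {baseline_accuracy:.0%} accuracy")
```

Running the same function on the genre column of the training data shows how much of each reported accuracy could be reached by guessing the dominant genre alone.
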
- Suhani Parveen
- GitHub: Suhani-ai-dev