GitHub - udarshcodes/imdb-sentiment

IMDb Sentiment Classification

This project trains a text classification model on the IMDb Movie Review dataset to predict whether a review is positive or negative.

## Dataset

The dataset contains two columns:

- review → text of the movie review
- sentiment → label (positive or negative)
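
The two-column layout described above can be checked with pandas before any modeling. The sketch below is illustrative only: it reads a tiny in-memory CSV with made-up rows instead of the real imdb.csv, and it assumes the header names are exactly `review` and `sentiment`.

```python
import io

import pandas as pd

# Hypothetical stand-in for imdb.csv: two made-up rows with the same columns.
sample_csv = io.StringIO(
    "review,sentiment\n"
    '"A wonderful, moving film.",positive\n'
    '"Dull plot and wooden acting.",negative\n'
)

df = pd.read_csv(sample_csv)

# Confirm the expected columns and look at the label distribution.
columns = df.columns.tolist()
label_counts = df["sentiment"].value_counts().to_dict()
print(columns)
print(label_counts)
```

With the real file, `pd.read_csv("imdb.csv")` would replace the in-memory buffer.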

## Preprocessing Steps

Text Cleaning & Stopword Removal: used CountVectorizer(stop_words='english') to remove common stopwords (e.g., "the", "is", "and").

## Text Vectorization

Converted raw text into a bag-of-words representation using CountVectorizer. Each review is represented as a sparse numeric vector of word counts.
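
As a small illustration of stopword removal and the bag-of-words representation, the sketch below vectorizes two made-up sentences. The sentences and variable names are assumptions for the example and are not taken from imdb_sentiment.py.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Two made-up reviews, used only to show what the vectorizer produces.
reviews = [
    "The movie is great and the acting is great",
    "The plot is boring",
]

# stop_words="english" drops common words such as "the", "is", "and".
vectorizer = CountVectorizer(stop_words="english")
X = vectorizer.fit_transform(reviews)  # sparse matrix: one row per review

# vocabulary_ maps each kept word to its column index in X.
vocab = vectorizer.vocabulary_
print(sorted(vocab))
print(X.toarray())
```

Each row of `X` holds the word counts for one review, so a word that appears twice in a review gets a count of 2 in its column.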

## Train/Test Split

Split the dataset into 80% training and 20% testing using train_test_split.
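
A minimal sketch of the 80/20 split on placeholder data follows. The `random_state=42` value is an assumption added for reproducibility; the README does not say whether the script fixes a seed.

```python
from sklearn.model_selection import train_test_split

# Placeholder data: ten made-up reviews with alternating labels.
reviews = [f"review number {i}" for i in range(10)]
labels = ["positive" if i % 2 == 0 else "negative" for i in range(10)]

# test_size=0.2 holds out 20% of the rows for testing.
# random_state is an assumed value, used here only to make the split repeatable.
X_train, X_test, y_train, y_test = train_test_split(
    reviews, labels, test_size=0.2, random_state=42
)

print(len(X_train), len(X_test))
```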

## Models Used

Logistic Regression (LogisticRegression(max_iter=1000)): trained on the vectorized reviews. Achieved 88.3% accuracy on the IMDb dataset.
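
The training step can be sketched on toy data as follows. The reviews are made up, and fitting the vectorizer on the training reviews only is an assumption about good practice rather than a confirmed detail of imdb_sentiment.py; the accuracy printed here says nothing about the 88.3% figure.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Made-up training and test reviews, for illustration only.
train_reviews = [
    "great wonderful film",
    "loved the brilliant acting",
    "terrible boring film",
    "hated the awful plot",
]
train_labels = ["positive", "positive", "negative", "negative"]
test_reviews = ["wonderful brilliant film", "awful boring plot"]
test_labels = ["positive", "negative"]

# Learn the vocabulary from the training reviews, then reuse it for the test reviews.
vectorizer = CountVectorizer(stop_words="english")
X_train = vectorizer.fit_transform(train_reviews)
X_test = vectorizer.transform(test_reviews)

# max_iter=1000 matches the setting named in the README.
model = LogisticRegression(max_iter=1000)
model.fit(X_train, train_labels)

predictions = model.predict(X_test)
accuracy = accuracy_score(test_labels, predictions)
print(f"Accuracy of Logistic Regression model: {accuracy:.3f}")
```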

## Results

Logistic Regression → 88.3% accuracy

## How to Run

Install dependencies: `pip install pandas scikit-learn`

Run the script: `python imdb_sentiment.py`

Output: Accuracy of Logistic Regression model: 0.883
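
For readers who want to see how the steps fit together, here is a rough end-to-end sketch. It is not the contents of imdb_sentiment.py: the `evaluate` helper, the use of `make_pipeline`, the seed, and the toy DataFrame are all assumptions made for illustration.

```python
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline


def evaluate(df, test_size=0.2, seed=42):
    """Hypothetical helper: split, vectorize, train, and return test accuracy."""
    X_train, X_test, y_train, y_test = train_test_split(
        df["review"], df["sentiment"], test_size=test_size, random_state=seed
    )
    # The pipeline applies the vectorizer, then the classifier, in one object.
    pipeline = make_pipeline(
        CountVectorizer(stop_words="english"),
        LogisticRegression(max_iter=1000),
    )
    pipeline.fit(X_train, y_train)
    return pipeline.score(X_test, y_test)  # mean accuracy on the held-out rows


# Toy stand-in for the real data; swap in pd.read_csv("imdb.csv") to use the actual file.
toy_df = pd.DataFrame(
    {
        "review": ["great wonderful film"] * 10 + ["terrible boring film"] * 10,
        "sentiment": ["positive"] * 10 + ["negative"] * 10,
    }
)

toy_accuracy = evaluate(toy_df)
print(f"Accuracy of Logistic Regression model: {toy_accuracy:.3f}")
```

Wrapping the vectorizer and classifier in a pipeline keeps the vocabulary from being learned on the test rows, which is one reasonable way to structure the script.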

Repository files: LICENSE, README.md, imdb.csv, imdb_sentiment.py
