🛒 Women's E-Commerce Clothing Reviews

Project Description

In this project, we use transformers to predict product ratings and to determine the tone of the review based on preprocessed customer text reviews.

Files

ClothingReviewsClassification.ipynb : Jupyter Notebook with text preprocessing and model training
/binary : Binary classification model's files
/multiclass : Multiclass classification model's files

Dataset

We use the Women’s Clothing E-Commerce dataset, which consists of reviews written by customers.

This dataset includes 23486 rows and 10 feature variables. Each row corresponds to a customer review, and includes the variables:

Feature	Type	Description
`Clothing ID`	Integer	Variable that refers to the specific piece being reviewed.
`Age`	Integer	Variable of the reviewer's age.
`Title`	String	Variable for the title of the review.
`Review Text`	String	Variable for the review body.
`Rating`	Integer	Variable for the product score granted by the customer, from 1 (worst) to 5 (best).
`Recommended IND`	Integer	Variable stating whether the customer recommends the product, where 1 is recommended and 0 is not recommended.
`Positive Feedback Count`	Integer	Variable documenting the number of other customers who found this review positive.
`Division Name`	String	Categorical name of the product's high-level division.
`Department Name`	String	Categorical name of the product's department.
`Class Name`	String	Categorical name of the product's class.

Preprocessing

We preprocess the text with methods such as tokenization, POS tagging, and lemmatization, all carried out with the nltk library. We also remove stop words and convert the text to lowercase.
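The steps above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the function names `penn_to_wordnet` and `preprocess`, the order of the steps, and the dropping of non-alphabetic tokens are all assumptions. The tokenizer, tagger, and lemmatizer are passed in as arguments so the sketch runs without downloading nltk data; the trailing comment shows how the nltk functions could be plugged in.

```python
from typing import Callable, Iterable, List, Set, Tuple

# Penn Treebank tag prefix -> WordNet POS letter (the values that nltk's
# WordNetLemmatizer.lemmatize accepts as its `pos` argument).
_PENN_TO_WORDNET = {"J": "a", "V": "v", "N": "n", "R": "r"}


def penn_to_wordnet(tag: str) -> str:
    """Map a Penn Treebank tag such as 'VBD' to a WordNet POS; default to noun."""
    return _PENN_TO_WORDNET.get(tag[:1], "n")


def preprocess(
    text: str,
    tokenize: Callable[[str], List[str]],
    pos_tag: Callable[[List[str]], Iterable[Tuple[str, str]]],
    lemmatize: Callable[[str, str], str],
    stop_words: Set[str],
) -> List[str]:
    """Lowercase, tokenize, POS-tag, lemmatize, then drop stop words.

    The step order and the removal of non-alphabetic tokens are assumptions
    made for this sketch.
    """
    tokens = tokenize(text.lower())
    tagged = pos_tag(tokens)
    lemmas = [lemmatize(word, penn_to_wordnet(tag)) for word, tag in tagged]
    return [w for w in lemmas if w.isalpha() and w not in stop_words]


# Possible wiring with nltk (requires the relevant nltk resources to be
# downloaded first, e.g. tokenizer, tagger, stopwords, and wordnet data):
#
#   from nltk import word_tokenize, pos_tag
#   from nltk.corpus import stopwords
#   from nltk.stem import WordNetLemmatizer
#
#   lemmatizer = WordNetLemmatizer()
#   tokens = preprocess(
#       "I loved the dresses",
#       word_tokenize,
#       pos_tag,
#       lemmatizer.lemmatize,
#       set(stopwords.words("english")),
#   )
```

Passing the POS tag to the lemmatizer matters because WordNet lemmatization treats every word as a noun by default, so verb forms such as "loved" would otherwise be left unchanged.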

Binary Classification

We use bert-base-uncased for binary classification, predicting whether a review is positive or negative.

F1 score: 0.8507015684469733
Number of epochs: 10

The classifier performs well. For a review to be classified correctly, it needs to contain a sufficient number of explicitly evaluative words.
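For reference, the F1 score reported above combines precision and recall into a single number. The sketch below shows one way to compute it for binary labels in plain Python. The function name `f1_binary`, the choice of `1` as the positive label, and the toy labels are assumptions for illustration; the notebook may compute the metric differently (for example, with a library function or a different averaging mode).

```python
from typing import Sequence


def f1_binary(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> float:
    """F1 = harmonic mean of precision and recall for the `positive` label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision and recall are both zero (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy labels invented for this example (1 = positive review, 0 = negative).
toy_true = [1, 1, 1, 0, 0]
toy_pred = [1, 1, 0, 1, 0]
toy_f1 = f1_binary(toy_true, toy_pred)
```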

Multiclass Classification

We use distilroberta-base for multiclass classification, predicting the rating of the product.

F1 score: 0.6033866825602262
Number of epochs: 20

Accuracy for each class:

Class 0: 0.24
Class 1: 0.31
Class 2: 0.34
Class 3: 0.35
Class 4: 0.81

The classifier tends to confuse classes that are close in value, for example class 2 with classes 1 and 3. It separates distant classes well: class 4 is rarely confused with class 0. Most likely, this result is due to the unbalanced dataset.
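Per-class figures and confusions like the ones described above can be read off a confusion matrix. The following sketch assumes that "accuracy for each class" means the fraction of samples of a given true class that were predicted correctly, and that class labels are integers starting at 0. The helper names and the toy labels are made up for illustration and are not taken from the notebook.

```python
from typing import List, Sequence


def confusion_matrix(y_true: Sequence[int], y_pred: Sequence[int], num_classes: int) -> List[List[int]]:
    """matrix[t][p] counts samples with true class t that were predicted as p."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix


def per_class_accuracy(matrix: List[List[int]]) -> List[float]:
    """Diagonal count divided by row total; 0.0 for classes with no samples."""
    result = []
    for i, row in enumerate(matrix):
        total = sum(row)
        result.append(row[i] / total if total else 0.0)
    return result


# Toy labels invented for this example, with three classes.
toy_true = [0, 0, 1, 1, 2, 2, 2, 2]
toy_pred = [0, 1, 1, 1, 2, 2, 2, 0]
toy_matrix = confusion_matrix(toy_true, toy_pred, num_classes=3)
toy_accuracy = per_class_accuracy(toy_matrix)
```

Off-diagonal entries next to the diagonal, such as `toy_matrix[0][1]`, correspond to confusions between neighbouring classes, while entries far from the diagonal correspond to confusions between distant classes.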

Credits

Author

Maxim Ivanov - GitHub, Telegram

Course

This project was completed as part of the "Practical Deep Learning" ("Практический Deep Learning") course offered by AI Education.

License

This project is licensed under the MIT license. For more information, see the LICENSE file.
