Skip to content

govind-arora/WordEmbeddings

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

7 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

WordEmbeddings

Applied word embedding using fasttext

Included the Data-set from the given link :https://www.kaggle.com/uciml/sms-spam-collection-dataset

Initially preprocessed the data using the following command:

cat spam.csv | sed -e "s/([.!?,'/()])/ \1 /g" | tr "[:upper:]" "[:lower:]" > spam.csv

Then, divide the data into two parts:

Total Sample Data ( 5572) = Training Data (4072) + Test Data (1500) head -n 4072 spam.csv > Train.csv tail -n 1500 spam.csv > Test.csv

Classify the data using supervised learning through fasttext.

classifier= fasttext.supervised('Train.csv', 'model', label_prefix='label_', lr=1.0, dim=100, epoch=25)

Once the model was trained, it got 98% of precision onto the test data.

Now run input.py, take the input.

I have clasified on the basis of 60:40, 60% weightage to maximum iterations and 40% weightage to the probability calculation.

About

Applied word embedding using fasttext

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages