Text Mining

Here you can find my projects related to text mining.

Lab1

Using the re library
Working with regex (regular expressions)
Removing numbers, HTML characters and punctuation from text
Extracting hashtags and emoticons
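The regex steps above can be sketched with Python's re module as follows. The sample string and all patterns are assumptions for illustration, not the ones used in the lab; in particular, the emoticon pattern only covers a few western forms such as `:)`, `;-)` and `:D`.

```python
import re

# Hypothetical sample text for illustration
TEXT = "<p>Loving the #TextMining lab :) 100% worth it!!! #python ;-)</p>"

# Assumed patterns; the emoticon one only covers forms like :) ;-) :D :P
EMOTICON_RE = re.compile(r"[:;=]-?[)(DPp]")
HASHTAG_RE = re.compile(r"#\w+")
HTML_RE = re.compile(r"<[^>]+>")   # tags such as <p> and </p>
NUMBER_RE = re.compile(r"\d+")
PUNCT_RE = re.compile(r"[^\w\s]")  # any character that is neither a word char nor whitespace


def extract_hashtags(text: str) -> list[str]:
    return HASHTAG_RE.findall(text)


def extract_emoticons(text: str) -> list[str]:
    return EMOTICON_RE.findall(text)


def strip_noise(text: str) -> str:
    """Remove HTML tags, numbers and punctuation, then collapse whitespace."""
    for pattern in (HTML_RE, NUMBER_RE, PUNCT_RE):
        text = pattern.sub(" ", text)
    return " ".join(text.split())


print(extract_hashtags(TEXT))   # ['#TextMining', '#python']
print(extract_emoticons(TEXT))  # [':)', ';-)']
print(strip_noise(TEXT))        # Loving the TextMining lab worth it python
```

Hashtags and emoticons are extracted from the raw text, because the punctuation-removal step destroys both.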

Lab2

Creating a function that cleans text (pulling out emoticons, converting letters to lower case, removing numbers, HTML tags and punctuation, and collapsing excess whitespace)
Using the stopwords library
Implementing a function that removes stop words
Using the nltk library
Implementing stemming based on Porter's algorithm
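A minimal sketch of this cleaning pipeline, assuming nltk's `PorterStemmer` for stemming and a small hypothetical `STOP_WORDS` set in place of the full list from the stop-words library. The helper names `clean_text`, `remove_stop_words` and `stem_words` are illustrative, not the lab's own function names, and this `clean_text` simply drops emoticons (the Lab1 extractor can be run first if they should be kept).

```python
import re

from nltk.stem import PorterStemmer  # the Porter stemmer needs no extra nltk data download

# Hypothetical stop-word set; the lab's stop-words library supplies a much longer list
STOP_WORDS = {"a", "an", "and", "are", "in", "is", "it", "of", "on", "the", "this", "to", "was"}

EMOTICON_RE = re.compile(r"[:;=]-?[)(DPp]")
_stemmer = PorterStemmer()


def clean_text(text: str) -> str:
    """Pull out emoticons, lower-case, drop HTML tags, numbers and punctuation."""
    text = EMOTICON_RE.sub(" ", text)             # before lower-casing, so :D and :P still match
    text = re.sub(r"<[^>]+>", " ", text).lower()
    text = re.sub(r"\d+", " ", text)
    text = re.sub(r"[^\w\s]", " ", text)
    return " ".join(text.split())                  # collapse excess whitespace


def remove_stop_words(text: str) -> list[str]:
    return [word for word in text.split() if word not in STOP_WORDS]


def stem_words(words: list[str]) -> list[str]:
    return [_stemmer.stem(word) for word in words]


SAMPLE = "<b>The cats ARE running :) in 2 gardens!</b>"
cleaned = clean_text(SAMPLE)          # 'the cats are running in gardens'
tokens = remove_stop_words(cleaned)   # ['cats', 'running', 'gardens']
print(stem_words(tokens))             # ['cat', 'run', 'garden']
```

Stop words are removed before stemming so that the stop-word list can be matched against whole, unstemmed words.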

Lab3

Working on cleaned text using the previously created functions
Creating a bag of words (BoW)
Generating a word cloud with WordCloud (a graphical visualisation of the words in a text, sized by their frequency of occurrence)
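A bag of words can be sketched with `collections.Counter`. The token lists below are hypothetical stand-ins for documents already cleaned and stemmed by the Lab2 functions.

```python
from collections import Counter

# Hypothetical documents, already cleaned and stemmed into token lists
DOCS = [
    ["cat", "run", "garden"],
    ["cat", "sleep", "sofa"],
    ["dog", "run", "garden", "garden"],
]


def bag_of_words(docs: list[list[str]]) -> Counter:
    """Map every token to its number of occurrences across all documents."""
    bow = Counter()
    for tokens in docs:
        bow.update(tokens)
    return bow


bow = bag_of_words(DOCS)
print(bow.most_common(3))  # [('garden', 3), ('cat', 2), ('run', 2)]
```

Because a `Counter` is a token-to-frequency mapping, it can be passed straight to the wordcloud package, e.g. `WordCloud().generate_from_frequencies(bow)`, and the result shown with `plt.imshow`.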

Lab4

Using the sklearn (scikit-learn) library
Measures for assessing word importance (count, binary, TF-IDF)
Creating a text_tokenizer function that accepts cleaned, stemmed text and keeps only words longer than 3 characters
Creating a vectorizer instance: vectorizer = TfidfVectorizer(tokenizer=text_tokenizer)
Vectorizing the text to be processed: X_transform = vectorizer.fit_transform(X)
Printing the resulting matrix X_transform
Extracting terms with vectorizer.get_feature_names_out()
Extracting the top 10 most frequent tokens
Finding the top 10 documents that contain the most tokens

Lab5

Using the matplotlib library
Visualising the top 10 tokens
Displaying the top 10 tokens as tables with prettytable
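A possible matplotlib version of the top-10 chart is sketched below. The `(token, score)` pairs are made-up values for illustration only, and `plot_top_tokens` is a hypothetical helper name; the Agg backend is selected so the sketch runs without a display.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend, so the sketch runs without a display
import matplotlib.pyplot as plt

# Made-up (token, score) pairs for illustration only
TOP10 = [
    ("garden", 2.0), ("sleep", 2.0), ("flight", 1.8), ("crew", 1.5), ("love", 1.2),
    ("morn", 1.0), ("prefer", 0.9), ("sofa", 0.8), ("staff", 0.6), ("seat", 0.5),
]


def plot_top_tokens(pairs, title="Top 10 tokens"):
    """Horizontal bar chart with the highest-scoring token at the top."""
    tokens = [token for token, _ in pairs]
    scores = [score for _, score in pairs]
    fig, ax = plt.subplots(figsize=(8, 4))
    ax.barh(tokens[::-1], scores[::-1])  # barh draws bottom-up, so reverse the order
    ax.set_xlabel("Score")
    ax.set_title(title)
    fig.tight_layout()
    return fig


fig = plot_top_tokens(TOP10)
fig.savefig("top10_tokens.png")  # or plt.show() in an interactive session
```

For the table view, the same pairs can be printed with prettytable by creating `PrettyTable(field_names=["Token", "Score"])` and calling `add_row` once per pair.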

Lab6

Joining DataFrames
Using the sklearn library
Splitting the data into training and test sets
Vectorizing both sets using the previously written functions
Training classifiers with DecisionTree, RandomForest, SVM, AdaBoost and Bagging
Evaluating the classification results
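A compact sketch of this workflow follows. The toy DataFrames, the shared `id` key, the label names and the model settings are all assumptions; scikit-learn's `TfidfVectorizer` stands in for the previously written vectorizing functions, and `LinearSVC` is used as the SVM.

```python
import pandas as pd
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier, RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC
from sklearn.tree import DecisionTreeClassifier

# Hypothetical toy data: texts and labels stored in two DataFrames sharing an "id" key
positive = ["great flight", "friendly crew", "comfortable seats", "on time arrival", "helpful staff"]
negative = ["delayed flight", "lost luggage", "rude staff", "cancelled flight", "long queue"]
texts = pd.DataFrame({"id": range(20), "text": positive * 2 + negative * 2})
labels = pd.DataFrame({"id": range(20), "label": ["positive"] * 10 + ["negative"] * 10})

df = texts.merge(labels, on="id")  # join the two DataFrames on the shared key

X_train, X_test, y_train, y_test = train_test_split(
    df["text"], df["label"], test_size=0.3, random_state=42, stratify=df["label"]
)

vectorizer = TfidfVectorizer()                   # stand-in for the Lab4 vectorizer
X_train_vec = vectorizer.fit_transform(X_train)  # fit the vocabulary on training data only
X_test_vec = vectorizer.transform(X_test)

models = {
    "DecisionTree": DecisionTreeClassifier(random_state=42),
    "RandomForest": RandomForestClassifier(random_state=42),
    "SVM": LinearSVC(),
    "AdaBoost": AdaBoostClassifier(random_state=42),
    "Bagging": BaggingClassifier(random_state=42),
}

results = {}
for name, model in models.items():
    model.fit(X_train_vec, y_train)
    y_pred = model.predict(X_test_vec)
    results[name] = accuracy_score(y_test, y_pred)
    print(f"--- {name}: accuracy {results[name]:.2f}")
    print(classification_report(y_test, y_pred, zero_division=0))
```

Fitting the vectorizer on the training set only keeps test-set vocabulary and IDF statistics out of training, so the evaluation is not optimistically biased.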
