This project classifies whether a given tweet is about a disaster or not. The task is solved with a transformer-based classifier combined with the AdaBoost technique.
- Data Loading and Preprocessing
- Analyzing and displaying the most popular tweet locations on an interactive map
- Analyzing the class distribution in the training set
- Data processing - removing punctuation, special characters, and emojis
- Utilizing an external file for word vectorization
- Tokenizing tweets and creating word vectors
- Splitting data into training and validation sets
- Implementing the AdaBoost mechanism
- Implementing a transformer-based classifier
- Training the classifier and evaluating the results
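The cleaning step listed above (removing punctuation, special characters, and emojis) can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the helper name `clean_tweet` is hypothetical, and lowercasing the text is an assumption added here.

```python
import re

# Matches every character that is not an ASCII letter, digit, or whitespace.
# This covers punctuation, special characters, and emojis in one pass.
NON_ALNUM_RE = re.compile(r"[^A-Za-z0-9\s]")


def clean_tweet(text: str) -> str:
    """Hypothetical helper: strip punctuation, special characters, and emojis."""
    text = NON_ALNUM_RE.sub(" ", text)
    # Lowercasing is an assumption, not something the project description states.
    text = text.lower()
    # Collapse the runs of whitespace left behind by the substitution.
    return " ".join(text.split())


print(clean_tweet("Forest fire near La Ronge!!! 🔥 #wildfire"))
```

Note that this pattern also drops non-ASCII letters, which may or may not be desirable depending on the tweets in the dataset.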
After training the classifier, we achieved an accuracy of approximately 80% on the validation set.
To run this project, you need the following libraries and tools:
- pandas
- matplotlib
- numpy
- re (part of the Python standard library, no installation needed)
- geopy
- folium
- nltk
- keras
- tensorflow
- scikit-learn (imported as sklearn)
- livelossplot
The training and test datasets are provided in the files train.csv and test.csv. Additionally, you need to download the file glove.twitter.27B.200d.txt and place it in the project directory.
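As a rough sketch of how the GloVe file might be read into memory, assuming the usual GloVe text layout of one token per line followed by its space-separated vector components: the function name `load_glove` and its `dim` parameter are hypothetical and not taken from the notebook.

```python
def load_glove(path, dim=200):
    """Hypothetical loader: map each token to its vector (a list of floats)."""
    embeddings = {}
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            # Split from the right so a token that itself contains spaces
            # is kept whole in parts[0].
            parts = line.rstrip().rsplit(" ", dim)
            if len(parts) != dim + 1:
                continue  # skip malformed or truncated lines
            embeddings[parts[0]] = [float(v) for v in parts[1:]]
    return embeddings


# Example call, assuming the file sits in the project directory:
# vectors = load_glove("glove.twitter.27B.200d.txt", dim=200)
```

Plain Python lists keep the sketch dependency-free. For the full file, converting each vector to a numpy `float32` array would likely use much less memory.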
To run this project, follow these steps:
- Install the required libraries if you haven't already.
- Download the training and test datasets from the appropriate sources and save them as train.csv and test.csv.
- Run the Jupyter Notebook file switch_transformers.ipynb.