Binary long text classification with Deep Learning models.
You only need a dataframe with 3 columns:
- text: the text to classify (list of words)
- label: the label of the text (0 or 1)
- id: the id of the text (int)
Place your dataset in the data/corpus directory of the project, then update the config.yaml file with the file path, the column names and the parameters of the model you want to train.
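As an illustration of the expected layout, here is a minimal sketch that builds and checks a dataframe with the three columns described above. It assumes pandas; the helper names `build_dataset` and `validate_dataset` and the toy records are hypothetical and are not part of this repository. The on-disk format expected in data/corpus is not stated here, so saving is left out.

```python
import pandas as pd

# Column names taken from the description above.
REQUIRED_COLUMNS = ["text", "label", "id"]


def build_dataset(records):
    """Build a dataframe with the three expected columns (hypothetical helper)."""
    return pd.DataFrame(records, columns=REQUIRED_COLUMNS)


def validate_dataset(df):
    """Raise ValueError if the dataframe does not match the described layout."""
    missing = [c for c in REQUIRED_COLUMNS if c not in df.columns]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    # text: each entry is a list of words
    if not df["text"].map(lambda t: isinstance(t, list)).all():
        raise ValueError("every 'text' entry must be a list of words")
    # label: binary, 0 or 1
    if not df["label"].isin([0, 1]).all():
        raise ValueError("'label' must contain only 0 or 1")
    # id: integer identifier
    if not pd.api.types.is_integer_dtype(df["id"]):
        raise ValueError("'id' must be an integer column")
    return df


# Toy records, invented for illustration only.
records = [
    {"text": ["a", "first", "long", "document"], "label": 0, "id": 1},
    {"text": ["a", "second", "long", "document"], "label": 1, "id": 2},
]
df = validate_dataset(build_dataset(records))
print(df)
```

If your column names differ from these, the config.yaml file is where the actual names are declared.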
This repository is particularly suited to classifying small datasets of long texts.
We have used it to classify PTSD from text.
We provide an example corpus in the data/corpus folder, based on French presidential campaign speeches. You can inspect it and create classifiers using this online tool: http://hyperbase.unice.fr/hyperbase/
The process is the following:
- Clone this repository to your local machine.
- Install the required packages:
    pip install -r requirements.txt
- Place your dataset in the project directory and update the file path in the script.
- Run the main script:
    python train_model.py 10
  The argument is the number of models to train, one per random seed (here, 10 models for 10 random seeds).
- Analyze the results in the results folder and in the log file deep_classification_text.log.
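To make the multi-seed idea concrete, here is a minimal sketch of training one model per seed and summarizing the scores. It is not the code of train_model.py: `train_one` is a hypothetical stand-in that returns a fake score, the seeds are assumed to be 0 to n-1, and the format of the files written to the results folder is not assumed.

```python
import random
import statistics
from typing import Callable, Dict


def train_one(seed: int) -> float:
    """Hypothetical stand-in for one training run; returns a fake score."""
    rng = random.Random(seed)  # seeded generator, so a given seed is reproducible
    return 0.5 + rng.random() * 0.5


def run_seeds(n_models: int, train_fn: Callable[[int], float] = train_one) -> Dict[int, float]:
    """Run one training per seed (assumed seeds: 0 .. n_models - 1)."""
    return {seed: train_fn(seed) for seed in range(n_models)}


def summarize(scores: Dict[int, float]) -> Dict[str, float]:
    """Mean and sample standard deviation of the per-seed scores."""
    values = list(scores.values())
    return {
        "mean": statistics.mean(values),
        "std": statistics.stdev(values) if len(values) > 1 else 0.0,
    }


scores = run_seeds(10)  # mirrors the "10" passed on the command line
summary = summarize(scores)
print(f"mean={summary['mean']:.3f} std={summary['std']:.3f}")
```

Reporting the mean and spread over several seeds gives a more stable picture than a single run, which matters on small datasets.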
This project receives support from the following organizations:

