The codebase accompanying the paper "Estimating the Level of Dialectness Predicts Interannotator Agreement in Multi-dialectal Arabic Datasets", accepted to ACL 2024.
conda create -n "ALDI_IAA" python=3.10
pip install -r requirements.txt
camel_data -i defaults
| | Dataset | Link |
|---|---|---|
| 1 | MPOLD | GitHub |
| 2 | YouTube Cyberbullying | OneDrive |
| 3 | DCD | Personal Site |
| 4 | ArSAS | Personal Site |
| 5 | ArSarcasm-v1 | Provided by the authors |
| 6 | iSarcasm | GitHub |
| 7 | DART | Dropbox |
| 8 | Mawqif | Provided by the authors |
| 9 | ASAD | Provided by the authors |
conda activate ALDI_IAA
# 1) Manually download the dataset files to `data/raw_data/`
# 2) Augment the dataset files with ALDi scores and dialect labels
python prepare_datasets.py
# 3) Generate the agreement plots
python compute_agreement_percentages.py
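
As a rough illustration of the kind of quantity step 3 reports, the sketch below bins samples by ALDi score and computes the percentage of samples in each bin on which all annotators agree. It is not the code in `compute_agreement_percentages.py`: the function name `agreement_by_aldi_bin`, the input layout (pairs of score and annotator labels), the choice of four equal-width bins, and the toy data are all assumptions made for this example, and ALDi scores are assumed to lie in [0, 1].

```python
from collections import defaultdict


def agreement_by_aldi_bin(samples, n_bins=4):
    """Return {bin_index: percentage of samples with full annotator agreement}.

    `samples` is an iterable of (aldi_score, labels) pairs, where `aldi_score`
    is assumed to be in [0, 1] and `labels` holds the individual annotator
    labels for that sample. Both the layout and the binning are hypothetical.
    """
    totals = defaultdict(int)
    full_agreement = defaultdict(int)
    for aldi, labels in samples:
        if not 0.0 <= aldi <= 1.0:
            raise ValueError(f"ALDi score out of range: {aldi}")
        # Equal-width bins; a score of exactly 1.0 falls into the last bin.
        bin_index = min(int(aldi * n_bins), n_bins - 1)
        totals[bin_index] += 1
        # Full agreement means every annotator chose the same label.
        if len(set(labels)) == 1:
            full_agreement[bin_index] += 1
    return {b: 100.0 * full_agreement[b] / totals[b] for b in sorted(totals)}


# Toy data, invented purely for illustration.
toy_samples = [
    (0.05, ["pos", "pos", "pos"]),
    (0.10, ["neg", "neg", "neg"]),
    (0.60, ["pos", "neg", "pos"]),
    (0.70, ["neg", "neg", "neg"]),
    (0.95, ["pos", "neg", "neu"]),
    (1.00, ["pos", "pos", "neg"]),
]
result = agreement_by_aldi_bin(toy_samples)
print(result)
```

Bins that contain no samples are omitted from the result rather than reported as zero, so an empty bin is not mistaken for a bin with no agreement.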