mainlp/WiLoVa-QA
WIkipedia LOcal VAriety Question Answering (WILOVA-QA)

Data and code for "Information Asymmetry across Language Varieties: A Case Study on Cantonese-Mandarin and Bavarian-German QA"

The WILOVA-QA dataset and the data used to generate prompts are compressed as .zip files to prevent direct leakage. The password for decompressing the files is: wilovaqa

Pipeline

Generate prompts -> Run the LLM generation -> Run the LLM-as-a-judge -> Evaluation

Generate prompts

Run python generate_prompts.py <language_id> <source_type> to generate a .pkl file of prompts, which is a dictionary of the form dict[str, dict[str, dict]].

'<language_id>' can be 'deu' (deu-bar) or 'zho' (cmn-yue).

'<source_type>' can be 'dialectqa' or 'eclektic'. Manually edit the list of settings inside generate_prompts.py to include the desired prompt settings.

Usage example: python generate_prompts.py zho dialectqa
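The generated .pkl file can be inspected with Python's pickle module. A minimal sketch of the stated dict[str, dict[str, dict]] shape (the setting and question keys below are hypothetical illustrations, not names from the repo):

```python
import pickle

# Toy prompts dict of the stated shape dict[str, dict[str, dict]];
# the outer/inner key names here are hypothetical examples.
prompts = {
    "setting_a": {                  # hypothetical prompt-setting name
        "q001": {"prompt": "..."},  # hypothetical question id and fields
    },
}

# Round-trip through pickle, as generate_prompts.py saves a .pkl file.
blob = pickle.dumps(prompts)
loaded = pickle.loads(blob)

# Verify the nested dict[str, dict[str, dict]] structure survives.
assert all(
    isinstance(inner, dict)
    for outer in loaded.values()
    for inner in outer.values()
)
```

In practice you would read the real file with pickle.load(open(path, "rb")) and iterate the two dictionary levels the same way.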

Run the LLM generation

Run python3 -u dialectqa.py <GPU_id(s)> <path_to_pkl_file_of_prompts> <model_name> <tokenizer_path> <model_path> to run the LLM generation. The results will be saved as a .pkl file, which is a dictionary of the form: dict[str, dict[str, dict]].

<GPU_id(s)> may be 1 GPU id for smaller models, or 4 GPU ids for larger models like llama3_70b and qwen2.5_72b.

Run the LLM-as-a-judge

After obtaining the results generated by the LLM, run python3 -u dialectqa.py <GPU_id(s)> <path_to_pkl_file_of_results> <model_name> <tokenizer_path> <model_path> to use another LLM to evaluate the generated results. The LLM-as-a-judge scores are appended to the existing results and saved as a .pkl file, again a dictionary of the form dict[str, dict[str, dict]].
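Conceptually, the judge pass adds its verdict alongside each existing entry in the same nested dictionary. A sketch under the assumption of field names like "generation" and "judge_score" (the actual keys used by dialectqa.py may differ):

```python
import pickle

# Toy results dict in the stated dict[str, dict[str, dict]] shape;
# "generation" is a hypothetical field name for the model output.
results = {"setting_a": {"q001": {"generation": "Munich"}}}

# The judge pass appends its verdict to every existing entry
# ("judge_score" is an assumed field name, not from the repo).
for setting in results.values():
    for entry in setting.values():
        entry["judge_score"] = 1  # e.g. 1 = judged correct, 0 = incorrect

# Serialize the augmented results, as the pipeline saves a new .pkl file.
blob = pickle.dumps(results)
```

The key point is that the judge output is merged into the existing structure rather than written to a separate file.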

Evaluation

After obtaining the LLM-as-a-judge results, run python evaluation.py <path_to_pkl_file_of_LLM-as-a-judge_results> to evaluate all the results (including metrics other than LLM-as-a-judge). The evaluation scores are printed to the screen.
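For intuition, aggregating judge verdicts over the nested results dict might look like the sketch below (the "judge_score" field name and 0/1 scoring are assumptions; evaluation.py also reports other metrics):

```python
# Toy judged results in the stated dict[str, dict[str, dict]] shape.
judged = {
    "setting_a": {"q001": {"judge_score": 1}, "q002": {"judge_score": 0}},
    "setting_b": {"q003": {"judge_score": 1}},
}

# Mean judge score per prompt setting ("judge_score" is an assumed key).
means = {}
for setting, entries in judged.items():
    scores = [e["judge_score"] for e in entries.values()]
    means[setting] = sum(scores) / len(scores)
    print(f"{setting}: {means[setting]:.2f}")
```

This yields one accuracy-style number per prompt setting, which is the kind of per-setting comparison the paper's information-asymmetry analysis relies on.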
