Add multilingual task suites #2

Open: rakkit wants to merge 8 commits into opt_moe from opt_moe_multilingual

rakkit commented Mar 28, 2026

See TASK_NAMING_Multilingual.md.

rakkit force-pushed the opt_moe_multilingual branch 7 times, most recently from 9811252 to 709ac24 (April 4, 2026 03:00)
…tions

Refactor existing multilingual tasks and add new ones to match the English
task conventions (BPB merged into CF, explicit few-shot counts, English
explicitly excluded from all multilingual task registrations).

Refactored:
- mlmm_arc_challenge: cf+mcf variants, 26 langs, hf_revision pinned
- global_mmlu: cf+mcf variants, 33 langs (English removed), both formulations
- mlmm_hellaswag: cf only, 33 langs, few_shots_split=train added
- mgsm: :gen suffix, generation_size=512, both expr_gold + multilingual_quasi_em

New tasks:
- global_mmlu_lite: CohereLabs/Global-MMLU-Lite, 17 langs, cf+mcf
- mmlu_prox (multilingual): li-lab/MMLU-ProX, 28 langs, 10-option cf+mcf
- mmlu_prox (English): li-lab/MMLU-ProX English subset in tasks/tasks/
- wmt24pp: google/wmt24pp, 24 lang pairs × 2 directions, 0-shot gen

Tooling:
- scripts/multilingual_aggregate.py: cross-language average post-processor
- TASK_NAMING_Multilingual.md: naming conventions and language inventory doc

 add comet22 metrics
rakkit force-pushed the opt_moe_multilingual branch from 709ac24 to 9cf5ef8 (April 4, 2026 14:57)
"choices": [str(line["answer_number"])],
},
),
hf_repo="juletxara/mgsm",
Member

Let's switch to https://huggingface.co/datasets/CohereLabs/global-mgsm

It covers more languages and may have been cleaned. Can you update accordingly? That means replacing line["answer_number"] with line["answer"], extending _LANGUAGES, and possibly something else I'm missing.
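
A minimal sketch of the gold-answer lookup this change implies, assuming (per the comment above) that CohereLabs/global-mgsm stores the numeric answer under `answer`. The helper name `mgsm_gold_choice` is hypothetical and not part of the PR. It reads `answer_number` first because, to my knowledge, juletxara/mgsm also carries an `answer` field holding worked reasoning text, so the lookup order matters if both datasets need to be supported during the switch.

```python
def mgsm_gold_choice(line: dict) -> list[str]:
    """Return the gold answer as a one-element list of strings.

    Hypothetical helper: uses `answer_number` when the row has it
    (juletxara/mgsm) and otherwise falls back to `answer`
    (assumed schema of CohereLabs/global-mgsm).
    """
    for key in ("answer_number", "answer"):
        value = line.get(key)
        if value is not None:
            return [str(value).strip()]
    raise KeyError("no gold answer field found in dataset row")
```

Once the old dataset is dropped, the fallback can be collapsed to a single `str(line["answer"])`.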

"""
source = (doc.specific or {}).get("source_text", "")
golds = as_list(doc.get_golds())
return COMETCorpusMetricInput(source=source, hyp=model_response.final_text, ref=golds)
Member

Handle the case where preds is a list (this is what the downstream typing actually assumes).

Suggested change
- return COMETCorpusMetricInput(source=source, hyp=model_response.final_text, ref=golds)
+ preds = model_response.final_text
+ if len(preds) > 1:
+     logger.warning("Multiple predictions present, keeping only the first prediction (for COMET).")
+ return COMETCorpusMetricInput(source=source, hyp=preds[0], ref=golds)
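
The suggestion above indexes `preds[0]`, which is correct when `final_text` is a list. If a plain string were ever passed, `len()` and `[0]` would operate on characters instead. A defensive sketch that tolerates both shapes is shown below; the helper name `first_prediction` and the choice to return an empty string for an empty list are assumptions, not part of the PR.

```python
import logging

logger = logging.getLogger(__name__)


def first_prediction(preds) -> str:
    """Return a single hypothesis string from a str or a sequence of str."""
    if isinstance(preds, str):
        # Already a single hypothesis; do not index into its characters.
        return preds
    preds = list(preds)
    if not preds:
        # Assumption: score an empty hypothesis rather than raise.
        return ""
    if len(preds) > 1:
        logger.warning("Multiple predictions present, keeping only the first prediction (for COMET).")
    return preds[0]
```

With such a helper, the return statement would pass `hyp=first_prediction(model_response.final_text)`.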

MultilingualQuasiExactMatchMetric(language, "full"),
],
- stop_sequence=("\n",),
+ stop_sequence=["\n"],
Member

ofivite commented Apr 10, 2026

I would remove "\n", because the model sometimes inserts line breaks between reasoning steps.

Instead, use ["Question:", "Answer:"] as stop sequences in all _LANGUAGES, so that generation stops when the model starts looping.
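
To illustrate the difference, here is a small sketch of how the two stop-sequence choices would cut the same generation. The `truncate_at_stop` function is a hypothetical stand-in for whatever the generation backend does with stop sequences, and the sample text is invented for the example.

```python
def truncate_at_stop(text: str, stop_sequences: list[str]) -> str:
    """Cut `text` at the earliest occurrence of any stop sequence."""
    cut = len(text)
    for stop in stop_sequences:
        idx = text.find(stop)
        if idx != -1:
            cut = min(cut, idx)
    return text[:cut]


# Values taken from the review comment.
STOP_SEQUENCES = ["Question:", "Answer:"]

# Invented sample: multi-line reasoning followed by the start of a loop.
generation = (
    "Step 1: 3 + 4 = 7\n"
    "Step 2: 7 * 2 = 14\n"
    "The answer is 14.\n"
    "Question: next problem"
)

kept_with_markers = truncate_at_stop(generation, STOP_SEQUENCES)
kept_with_newline = truncate_at_stop(generation, ["\n"])
```

With the marker-based stops, all three reasoning lines survive and only the looping "Question:" tail is dropped. With "\n" as the stop, only "Step 1: 3 + 4 = 7" survives and the final answer is lost.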

- generation_size=25,
+ generation_size=512,
metrics=[
Metrics.expr_gold_metric,
Member

This metric is English-only. When I tested it out of the box, it worked for only about half of the languages (I don't remember how I fixed it). It may or may not extract answers correctly, so we need to test it and fix it if needed.
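
One way to run the test asked for here is a per-language smoke check over a handful of hand-written model outputs. The sketch below uses an illustrative integer extractor, not `expr_gold_metric` itself; in practice the real metric's extraction step would be plugged in where `extract_last_int` is called. The function names and sample sentences are assumptions made for the example.

```python
import re
from typing import Optional

# Assumption: MGSM-style gold answers are integers, and "," is only ever a
# thousands separator. Python's \d also matches non-ASCII decimal digits.
_INT = re.compile(r"-?\d+(?:,\d{3})*")


def extract_last_int(text: str) -> Optional[int]:
    """Return the last integer mentioned in `text`, or None if there is none."""
    matches = _INT.findall(text)
    if not matches:
        return None
    return int(matches[-1].replace(",", ""))


def per_language_accuracy(samples: dict) -> dict:
    """Map each language to the fraction of samples whose answer is recovered."""
    report = {}
    for lang, pairs in samples.items():
        hits = sum(extract_last_int(text) == gold for text, gold in pairs)
        report[lang] = hits / len(pairs)
    return report


# Invented samples: (model output, gold answer).
SAMPLES = {
    "en": [("The answer is 1,234.", 1234)],
    "de": [("Die Antwort ist 18.", 18)],
    "bn": [("উত্তর হল ৪২", 42)],
}

report = per_language_accuracy(SAMPLES)
```

Locale-specific number formatting is the kind of failure such a check surfaces: with this extractor, "Die Antwort ist 1.234" yields 234 rather than 1234, because "." is not treated as a thousands separator.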

_arc_adapter,
formulation=formulation,
),
hf_repo="jon-tow/okapi_arc_challenge",
Member

Maybe not for this PR, but in general we should extend ARC and the other tasks to cover at least the core languages, and maybe the major ones too: https://huggingface.co/collections/Eurolingua/evaluation-suite

Comment on lines +1451 to +1452
TRANSLATION_LITERALS[_language].question_word = "question"
TRANSLATION_LITERALS[_language].answer = "answer"
Member

Can you fix this to use proper translations instead of the English fallback? At least for Korean and Lithuanian.
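
A sketch of what per-language overrides could look like, with the English strings kept only as a last resort. The `Literals` dataclass is a hypothetical stand-in for the entries of TRANSLATION_LITERALS, the language keys are placeholders, and the Korean and Lithuanian strings are my best-guess translations that a native speaker should confirm before merging.

```python
from dataclasses import dataclass


@dataclass
class Literals:
    """Hypothetical stand-in for one per-language literals entry."""
    question_word: str = "question"  # English fallback
    answer: str = "answer"           # English fallback


# Assumed translations; verify with native speakers.
OVERRIDES = {
    "korean": {"question_word": "질문", "answer": "답"},
    "lithuanian": {"question_word": "klausimas", "answer": "atsakymas"},
}


def build_literals(language: str) -> Literals:
    """Use the translated literals when available, else the English fallback."""
    return Literals(**OVERRIDES.get(language, {}))
```

Keeping the overrides in one table makes it easy to see which languages still rely on the fallback.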

hf_avail_splits=("dev", "devtest"),
evaluation_splits=("devtest",),
few_shots_split="dev",
few_shots_select="random_sampling_from_train",
Member

Remove this for clarity (there is no "train" split).
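
A quick consistency check along these lines can catch task configs that refer to splits the dataset does not have. The function name `missing_splits` is hypothetical; the split names come from the diff context above.

```python
def missing_splits(hf_avail_splits, evaluation_splits, few_shots_split=None):
    """Return the split names that are referenced but not available."""
    available = set(hf_avail_splits)
    missing = [s for s in evaluation_splits if s not in available]
    if few_shots_split is not None and few_shots_split not in available:
        missing.append(few_shots_split)
    return missing


# Splits as configured in the diff: few-shot examples come from "dev".
ok = missing_splits(("dev", "devtest"), ("devtest",), "dev")

# What a reference to a nonexistent "train" split would look like.
bad = missing_splits(("dev", "devtest"), ("devtest",), "train")
```

Here `ok` is empty, while `bad` contains "train".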

Member

ofivite commented Apr 14, 2026

Also, please add this fix: 9649aff
