feat/automated-submission-code-review by mlederbauer · Pull Request #67 · pluskal-lab/MassSpecGym

mlederbauer · 2026-05-11T00:23:22Z

This PR restructures the leaderboard update workflow. To update the leaderboard, contributors must submit a PR containing a model card that follows submissions/SUBMISSION_GUIDE.md. All model cards are stored in submissions/ and are used to update the leaderboard. The PR also includes model cards for all models previously on the leaderboard.

Asking for a review in particular on the following aspects:

For future PRs, the leaderboard will be updated on merge into main from the model cards rather than directly from the results CSVs (the results CSVs are regenerated under the hood by scripts/leaderboard/generate_results_csvs.py).
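As a rough illustration of the model-card-to-CSV step described above, here is a minimal sketch. It is not the contents of scripts/leaderboard/generate_results_csvs.py: the function name `cards_to_csv`, the card layout (a `model` name plus a `metrics` mapping), and the metric names are all hypothetical, and the cards are assumed to be parsed into dictionaries already.

```python
import csv
import io


def cards_to_csv(cards, metric_names):
    """Render already-parsed model cards as CSV text, one row per model.

    Hypothetical helper: the real card schema and CSV columns may differ.
    """
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=["model", *metric_names], lineterminator="\n"
    )
    writer.writeheader()
    # Sort by model name so regenerated files produce stable diffs.
    for card in sorted(cards, key=lambda c: c["model"]):
        row = {"model": card["model"]}
        for name in metric_names:
            # A metric missing from a card is left blank rather than guessed.
            row[name] = card.get("metrics", {}).get(name, "")
        writer.writerow(row)
    return buffer.getvalue()


# Made-up cards and metric names, for illustration only.
example_cards = [
    {"model": "model-b", "metrics": {"hit_rate_at_1": 0.12}},
    {"model": "model-a", "metrics": {"hit_rate_at_1": 0.10, "mces_at_1": 20.5}},
]
csv_text = cards_to_csv(example_cards, ["hit_rate_at_1", "mces_at_1"])
print(csv_text)
```

Treating the model cards as the single source of truth and the CSVs as derived output means a regenerated CSV can always be checked against the cards in review.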

Note that for this workflow to work in the main repo, the repository secret ANTHROPIC_API_KEY must be set.
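A review script could fail fast when the secret is missing instead of failing later during the API call. The sketch below assumes the workflow maps the repository secret to an environment variable of the same name; the helper `require_secret` is hypothetical and is not taken from this PR.

```python
import os


def require_secret(name, environ=None):
    """Return the value of a required environment variable or raise.

    Hypothetical helper. An unset GitHub Actions secret is exposed as an
    empty string, so empty values are treated as missing too.
    """
    environ = os.environ if environ is None else environ
    value = environ.get(name, "").strip()
    if not value:
        raise RuntimeError(
            f"{name} is not set; add it as a repository secret and pass it "
            "to the job through the workflow's env mapping."
        )
    return value


# Usage with an explicit mapping, so the example does not depend on the real environment.
fake_env = {"ANTHROPIC_API_KEY": "dummy-value"}
api_key = require_secret("ANTHROPIC_API_KEY", fake_env)
```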

* adding MIST and SpecBridge Evals
* Add DreaMS-MSG support.
* feat: skill to build model with MSG ABCs
* feat: skill for review for common issues
* chore: cosmetic changes
* chore: remove implementation script since we focus on review
* chore: update review "skill" to be used as maintainer guide
* feat: add submissions guide and model template card
* feat: add submission review script
* feat: update leaderboard GH action (to be triggered after manual approval)
* feat: add "submissions" label to test workflow
* chore: add pyarrow dependency
* chore: move pyarrow to setup deps
* tmp: add feature branch to github action sorkflow
* Merge pull request #4 from mlederbauer/submission/diffms feat: level 1 model card impl
* feat: add mist molforge submission
* tmp chore: adaptations to review scripts
* tmp feat: test out with diffms (v0 model card)
* chore: update model card
* fix: suibmission dir
* cho9re: remove pubication field
* chore: remove superfluous comments
* chore: add .claude to gitignore
* chore: update scripts in llm skills
* feat: diffms implementation
* feat: add mist molforge submission
* tmp fix: remove model card for llm skills
* chore: remocve mist molforge
* choe: only model card for ttt-msms
* Revert "choe: only model card for ttt-msms" This reverts commit dae836a.
* feat v: llm review w local repo
* feat: update system pormpt to use unified SKILL.md
* chore: remove emojis from review
* feat: update results metrics in model card instead of csv
* chore: remove reference section
* chore: refine warning gravity and red/yellow labeling for revbiew
* fix: detect mist bug
* Delete checkpoints/README.md
* chore: remove superfluous code for LLM review submission
* chore: remover superfluous code for llm skills submission
* chore: remove superfluous code
* chore: move leaderboard update scripts
* chore: keep llm skills as CI branch for testing
* chore: update workflow to edit csvs during PR
* chore: cosmetic changes
* feat: create model cards for baselines
* fix: update trigger to also include feature branch
* chore: create PR for leaderboard update

Co-authored-by: harrylaucngd <harrylaucngd@gmail.com>

harrylaucngd and others added 5 commits March 15, 2026 17:09

v1.5 model zoo implementation.

8864393

Update prepare_submission.yml

22b9986

Update branches for review submission workflow

ba21f4d

Restrict review submission workflow to main branch only.

mlederbauer marked this pull request as ready for review May 11, 2026 01:40

mlederbauer changed the title ~~(WIP) feat/automated-submission-code-review~~ feat/automated-submission-code-review May 30, 2026


mlederbauer wants to merge 5 commits into pluskal-lab:main from mlederbauer:main

2 participants
