feat/automated-submission-code-review#67

Open
mlederbauer wants to merge 5 commits into pluskal-lab:main from mlederbauer:main

@mlederbauer mlederbauer commented May 11, 2026

This PR restructures the leaderboard update workflow. To update the leaderboard, contributors must submit a PR containing a model card that follows submissions/SUBMISSION_GUIDE.md. All model cards are stored in submissions/ and are used to update the leaderboard. The PR also includes model cards for all models previously on the leaderboard.

Asking for a review in particular on the following aspects:

For future PRs, upon merge into main, the leaderboard will be updated from the model cards rather than directly from the results CSVs (the results CSVs are updated under the hood by scripts/leaderboard/generate_results_csvs.py).
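To make the card-to-CSV flow concrete, here is a minimal sketch of how results rows could be derived from model cards. The card layout (a `model:` line followed by `- metric: value` bullets) and the function names `parse_model_card` and `cards_to_csv` are assumptions for illustration only; the actual logic lives in scripts/leaderboard/generate_results_csvs.py and may differ.

```python
import csv
import io
import re

# Hypothetical card layout: a "model:" line plus "- metric: value" bullets.
METRIC_LINE = re.compile(
    r"^-\s*(?P<name>[\w@.\- ]+?):\s*(?P<value>-?\d+(?:\.\d+)?)\s*$"
)


def parse_model_card(text):
    """Return (model_name, {metric: float}) from one hypothetical model card."""
    model = None
    metrics = {}
    for line in text.splitlines():
        line = line.strip()
        if line.lower().startswith("model:"):
            model = line.split(":", 1)[1].strip()
            continue
        match = METRIC_LINE.match(line)
        if match:
            metrics[match.group("name").strip()] = float(match.group("value"))
    if model is None:
        raise ValueError("model card has no 'model:' line")
    return model, metrics


def cards_to_csv(cards):
    """Render parsed cards as CSV text, one row per model, sorted by name."""
    parsed = [parse_model_card(card) for card in cards]
    # Union of all metric names, so cards with missing metrics get empty cells.
    columns = sorted({name for _, metrics in parsed for name in metrics})
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["model", *columns])
    for model, metrics in sorted(parsed, key=lambda item: item[0]):
        writer.writerow([model, *(metrics.get(col, "") for col in columns)])
    return buffer.getvalue()
```

Treating the cards as the single source of truth means the CSVs can always be regenerated, so a reviewer only needs to inspect the card diff in a submission PR.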

Note that for this workflow to work in the main repo, the repository secret ANTHROPIC_API_KEY must be set.
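As an illustration of the secret requirement, a review script could fail fast when the key is missing. This is a sketch under assumptions: the helper name `require_api_key` is hypothetical, and it presumes the workflow exposes the repository secret to the script as an environment variable of the same name.

```python
import os


def require_api_key(env=None, name="ANTHROPIC_API_KEY"):
    """Return the API key from the environment, or raise a clear error."""
    # Accept an explicit mapping so the check can be tested without
    # touching the real process environment.
    env = os.environ if env is None else env
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(
            f"{name} is not set; add it as a repository secret "
            "and expose it to the workflow step."
        )
    return key
```

Failing before any review logic runs gives maintainers an explicit message instead of an authentication error deep inside the job.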

harrylaucngd and others added 5 commits March 15, 2026 17:09
* adding MIST and SpecBridge Evals

* Add DreaMS-MSG support.

* feat: skill to build model with MSG ABCs

* feat: skill for review for common issues

* chore: cosmetic changes

* chore: remove implementation script since we focus on review

* chore: update review "skill" to be used as maintainer guide

* feat: add submissions guide and model template card

* feat: add submission review script

* feat: update leaderboard GH action (to be triggered after manual approval)

* feat: add "submissions" label to test workflow

* chore: add pyarrow dependency

* chore: move pyarrow to setup deps

* tmp: add feature branch to github action workflow

* Merge pull request #4 from mlederbauer/submission/diffms

feat: level 1 model card impl

* feat: add mist molforge submission

* tmp chore: adaptations to review scripts

* tmp feat: test out with diffms (v0 model card)

* chore: update model card

* fix: submission dir

* chore: remove publication field

* chore: remove superfluous comments

* chore: add .claude to gitignore

* chore: update scripts in llm skills

* feat: diffms implementation

* feat: add mist molforge submission

* tmp fix: remove model card for llm skills

* chore: remove mist molforge

* chore: only model card for ttt-msms

* Revert "chore: only model card for ttt-msms"

This reverts commit dae836a.

* feat v: llm review w local repo

* feat: update system prompt to use unified SKILL.md

* chore: remove emojis from review

* feat: update results metrics in model card instead of csv

* chore: remove reference section

* chore: refine warning gravity and red/yellow labeling for review

* fix: detect mist bug

* Delete checkpoints/README.md

* chore: remove superfluous code for LLM review submission

* chore: remove superfluous code for llm skills submission

* chore: remove superfluous code

* chore: move leaderboard update scripts

* chore: keep llm skills as CI branch for testing

* chore: update workflow to edit csvs during PR

* chore: cosmetic changes

* feat: create model cards for baselines

* fix: update trigger to also include feature branch

---------

Co-authored-by: harrylaucngd <harrylaucngd@gmail.com>
* adding MIST and SpecBridge Evals

* Add DreaMS-MSG support.

* feat: skill to build model with MSG ABCs

* feat: skill for review for common issues

* chore: cosmetic changes

* chore: remove implementation script since we focus on review

* chore: update review "skill" to be used as maintainer guide

* feat: add submissions guide and model template card

* feat: add submission review script

* feat: update leaderboard GH action (to be triggered after manual approval)

* feat: add "submissions" label to test workflow

* chore: add pyarrow dependency

* chore: move pyarrow to setup deps

* tmp: add feature branch to github action workflow

* Merge pull request #4 from mlederbauer/submission/diffms

feat: level 1 model card impl

* feat: add mist molforge submission

* tmp chore: adaptations to review scripts

* tmp feat: test out with diffms (v0 model card)

* chore: update model card

* fix: submission dir

* chore: remove publication field

* chore: remove superfluous comments

* chore: add .claude to gitignore

* chore: update scripts in llm skills

* feat: diffms implementation

* feat: add mist molforge submission

* tmp fix: remove model card for llm skills

* chore: remove mist molforge

* chore: only model card for ttt-msms

* Revert "chore: only model card for ttt-msms"

This reverts commit dae836a.

* feat v: llm review w local repo

* feat: update system prompt to use unified SKILL.md

* chore: remove emojis from review

* feat: update results metrics in model card instead of csv

* chore: remove reference section

* chore: refine warning gravity and red/yellow labeling for review

* fix: detect mist bug

* Delete checkpoints/README.md

* chore: remove superfluous code for LLM review submission

* chore: remove superfluous code for llm skills submission

* chore: remove superfluous code

* chore: move leaderboard update scripts

* chore: keep llm skills as CI branch for testing

* chore: update workflow to edit csvs during PR

* chore: cosmetic changes

* feat: create model cards for baselines

* fix: update trigger to also include feature branch

* chore: create PR for leaderboard update

---------

Co-authored-by: harrylaucngd <harrylaucngd@gmail.com>
Restrict review submission workflow to main branch only.
@mlederbauer mlederbauer marked this pull request as ready for review May 11, 2026 01:40
@mlederbauer mlederbauer changed the title from "(WIP) feat/automated-submission-code-review" to "feat/automated-submission-code-review" May 30, 2026

2 participants