feat/automated-submission-code-review#67
Open
mlederbauer wants to merge 5 commits into
* adding MIST and SpecBridge Evals
* Add DreaMS-MSG support.
* feat: skill to build model with MSG ABCs
* feat: skill for review for common issues
* chore: cosmetic changes
* chore: remove implementation script since we focus on review
* chore: update review "skill" to be used as maintainer guide
* feat: add submissions guide and model template card
* feat: add submission review script
* feat: update leaderboard GH action (to be triggered after manual approval)
* feat: add "submissions" label to test workflow
* chore: add pyarrow dependency
* chore: move pyarrow to setup deps
* tmp: add feature branch to github action sorkflow
* Merge pull request #4 from mlederbauer/submission/diffms feat: level 1 model card impl
* feat: add mist molforge submission
* tmp chore: adaptations to review scripts
* tmp feat: test out with diffms (v0 model card)
* chore: update model card
* fix: suibmission dir
* cho9re: remove pubication field
* chore: remove superfluous comments
* chore: add .claude to gitignore
* chore: update scripts in llm skills
* feat: diffms implementation
* feat: add mist molforge submission
* tmp fix: remove model card for llm skills
* chore: remocve mist molforge
* choe: only model card for ttt-msms
* Revert "choe: only model card for ttt-msms" This reverts commit dae836a.
* feat v: llm review w local repo
* feat: update system pormpt to use unified SKILL.md
* chore: remove emojis from review
* feat: update results metrics in model card instead of csv
* chore: remove reference section
* chore: refine warning gravity and red/yellow labeling for revbiew
* fix: detect mist bug
* Delete checkpoints/README.md
* chore: remove superfluous code for LLM review submission
* chore: remover superfluous code for llm skills submission
* chore: remove superfluous code
* chore: move leaderboard update scripts
* chore: keep llm skills as CI branch for testing
* chore: update workflow to edit csvs during PR
* chore: cosmetic changes
* feat: create model cards for baselines
* fix: update trigger to also include feature branch
* chore: create PR for leaderboard update

---------

Co-authored-by: harrylaucngd <harrylaucngd@gmail.com>
Restrict review submission workflow to main branch only.
This PR restructures the leaderboard update workflow. To update the leaderboard, a PR with a model card must be submitted according to `submissions/SUBMISSION_GUIDE.md`. All model cards are stored in `submissions/` and used to update the leaderboard. The PR also includes model cards for all models previously on the leaderboard.

Asking for a review in particular on the following aspects:
---> Note that these are the LLM-review-only submissions. To be included in the leaderboard, the label `ready-to-merge` must be added by the maintainers. Adding it closes the PR and creates a new PR that contains ONLY the model card and the updated .csv file. Reason: this way, we can handle PRs to the upstream repo, main branches on forks, and feature branches on forks alike. The superfluous source code is also removed automatically. An example of such a "final merge request" (which can simply be merged to main) is shown here: [prepared] feat: submit source code for MVP mlederbauer/MassSpecGym#20

---> Note that models with "hard fails" (e.g., missing metrics, see SpectraLLM) cannot be merged. All submissions require maintainer approval with the `ready-to-merge` label.
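
The merge rule in the two notes above can be sketched roughly as follows. This is an illustration only, not the logic in `scripts/leaderboard/review_submission.py`: the `Submission` class, the helper names, and the metric names in `REQUIRED_METRICS` are hypothetical placeholders, and only the `ready-to-merge` label name comes from this PR.

```python
from dataclasses import dataclass, field

# Hypothetical metric names; the real required set is whatever the
# submission guide and model card template define.
REQUIRED_METRICS = ("metric_a", "metric_b")

# Label name as stated in the PR description.
APPROVAL_LABEL = "ready-to-merge"


@dataclass
class Submission:
    """Hypothetical stand-in for a parsed model card plus its PR labels."""
    name: str
    metrics: dict
    labels: set = field(default_factory=set)


def hard_fails(sub):
    """Return the required metrics that are missing or empty in the model card."""
    return [m for m in REQUIRED_METRICS if sub.metrics.get(m) is None]


def can_merge(sub):
    """Mergeable only if there are no hard fails AND a maintainer added the label."""
    return not hard_fails(sub) and APPROVAL_LABEL in sub.labels


if __name__ == "__main__":
    approved = Submission("model-x", {"metric_a": 0.1, "metric_b": 0.2}, {APPROVAL_LABEL})
    incomplete = Submission("model-y", {"metric_a": 0.1}, {APPROVAL_LABEL})
    print(can_merge(approved), can_merge(incomplete), hard_fails(incomplete))
```

In this sketch, a missing metric blocks the merge even when the label is present, which mirrors the statement that hard fails cannot be merged.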
`skills/review/SKILL.md` and `scripts/leaderboard/review_submission.py`. Approved for the NeurIPS submission.

For future PRs, upon merge with `main`, the leaderboard will be updated from the model cards and no longer directly from the results csvs (the results csvs are updated under the hood with `scripts/leaderboard/generate_results_csvs.py`).

Note that for this workflow to work in the main repo, the repository secret `ANTHROPIC_API_KEY` must be set.
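
A review script could fail early with a clear message when the secret is not available. The following is a minimal sketch, assuming the workflow exposes the repository secret to the script as an environment variable of the same name; the function `require_api_key` is hypothetical and not part of this PR.

```python
import os


def require_api_key(env=None):
    """Return the API key, or raise with a clear message if it is not set.

    `env` defaults to the process environment; passing a dict makes the
    check easy to exercise without touching real secrets.
    """
    env = os.environ if env is None else env
    key = env.get("ANTHROPIC_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "ANTHROPIC_API_KEY is not set; add it as a repository secret "
            "and expose it to the review step."
        )
    return key


if __name__ == "__main__":
    try:
        require_api_key()
        print("API key found")
    except RuntimeError as err:
        print(f"Cannot run LLM review: {err}")
```

Secrets are typically not passed to workflows triggered from forks, so an explicit check like this makes the failure easier to diagnose than an authentication error further down.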