Dev/v1.5 MassSpecGym#68

Open
harrylaucngd wants to merge 41 commits into pluskal-lab:main from harrylaucngd:dev/v1.5

harrylaucngd commented May 12, 2026

This PR merges dev/v1.5 into pluskal-lab/MassSpecGym. On top of upstream main, it adds the v1.5 benchmark and model zoo, data loading and download paths, implementations of encoders, retrieval, de novo generation, simulation, and oracles, plus scripts, dependencies, and tests.

What changed (by area)

  1. Data & Hub

    • massspecgym/data/: download, fp2mol_dataset, MIST-related modules, subformulae, transforms, sanity_check, etc., for v1.5-style loading and inferred-formula workflows.
    • New data/exclude_inchikeys.csv.
    • Supporting scripts/download_data.py, scripts/convert_to_parquet.py.
  2. Model zoo / v1.5

    • De novo (FP2Mol): diffms, frigid, molforge, shared fp2mol base.
    • Encoders: MIST, DreaMS.
    • Retrieval: generative_retrieval, iceberg_retrieval, mist_retrieval, updates to retrieval/base and related code.
    • Oracles: Iceberg, MIST-CF (predict, fast_filter, sirius, etc.).
    • Simulation (Iceberg): adapter, DAG, intensity/joint models, Magma fragmentation.
  3. Scripts

    • Updates to scripts/run.py, run_simulation.py, etc., for v1.5 runs.
  4. Packaging & docs

    • setup.py (e.g. pyarrow and related deps).
    • README.md, checkpoints/README.md.

harrylaucngd and others added 30 commits February 5, 2026 10:10
Merge the MIST-CF branch history while keeping the v1.5 closure tree focused on the corrected oracle implementation.

Co-authored-by: Cursor <cursoragent@cursor.com>
v1.5 model zoo implementation.
* tmp feat: test out with diffms (v0 model card)

* chore: update model card

* fix: submission dir

* chore: remove publication field

* chore: remove superfluous comments

* chore: add .claude to gitignore

* chore: update scripts in llm skills

* feat: diffms implementation

* feat: add mist molforge submission

* tmp fix: remove model card for llm skills

3 participants