Review: Condensed matter generation task (Jeffrey) #44

Closed
doncamilom wants to merge 44 commits into `main` from `condmatgen-jeffrey`

Conversation

@doncamilom
Contributor

Reproducibility Assessment — condmatgen-jeffrey

Author: Jeffrey | Commits: 41 | Files changed: 11 | +395 / -10 lines

Status: NOT READY for reviewer reproducibility. Needs significant cleanup before merge.


What this contributes

A new GRPO task — Conditional Material Generation (condmatgen) — where the model proposes a novel crystalline compound (element list + space group) given a set of chemical elements.

New files:

  • src/open_r1/tasks/condmatgen/condmatgen.py (325 lines) — ConditionalMaterialGeneration task class
  • src/open_r1/tasks/condmatgen/comps_used_in_sft.json — placeholder for seen-compositions dedup (currently empty [])
  • recipes/condmatgen.yaml — GRPO training recipe

Reward logic (multi-signal, ~170 lines):

  1. Format reward: checks <think>/<answer> tag presence and ordering
  2. Reasoning length penalty: -5 if <think> content < 500 chars
  3. Space group validity: must be 1–230
  4. Element precision: penalizes extra elements not in prompt
  5. SMACT validity: uses smact.screening.smact_validity() via pymatgen
  6. Novelty bonus: +2 if composition not seen before (tracked in self.seen_comps_set)
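A minimal sketch of signals 1, 2, 3, 4, and 6, using the thresholds quoted above (-5 below 500 chars, space groups 1-230, +2 novelty bonus). The SMACT validity signal is omitted here because it needs `smact`/`pymatgen`; per the PR it calls `smact.screening.smact_validity()`. All function names below are illustrative, not the branch's actual code:

```python
import re

def format_reward(text: str) -> bool:
    """Signal 1: <think>...</think> must appear and precede <answer>...</answer>."""
    return re.search(r"<think>(.*?)</think>\s*<answer>(.*?)</answer>",
                     text, re.DOTALL) is not None

def reasoning_length_penalty(text: str, min_chars: int = 500) -> float:
    """Signal 2: -5 if the <think> content is shorter than min_chars."""
    m = re.search(r"<think>(.*?)</think>", text, re.DOTALL)
    think = m.group(1) if m else ""
    return -5.0 if len(think) < min_chars else 0.0

def space_group_valid(sg: int) -> bool:
    """Signal 3: crystallographic space groups are numbered 1 through 230."""
    return 1 <= sg <= 230

def element_precision(proposed: set, allowed: set) -> float:
    """Signal 4: fraction of proposed elements that were in the prompt,
    so extra elements lower the score."""
    if not proposed:
        return 0.0
    return len(proposed & allowed) / len(proposed)

def novelty_bonus(comp: str, seen: set) -> float:
    """Signal 6: +2 the first time a composition appears; also records it."""
    if comp in seen:
        return 0.0
    seen.add(comp)
    return 2.0
```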

Breaks / Blockers

| Issue | Severity |
| --- | --- |
| Global `"stop": ["</answer>"]` added to `SamplingParams` in `utils.py`; affects ALL tasks, not just condmatgen, and could truncate other tasks' generations | Breaking |
| `random_print` rate in `base.py` changed from 0.01 to 0.1, i.e. 10x more debug output for ALL tasks | Breaking |
| `launch_CSCS.slurm` overwritten with Jeffrey-specific paths (a131 account, personal dir) | Breaking for shared infra |
| `model_paths.txt` adds entries pointing to personal CSCS storage | Non-portable |
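One way to resolve the stop-token blocker is to scope stop sequences per task instead of mutating the shared default. A minimal sketch, assuming a task-keyed sampling-kwargs helper (the names `DEFAULT_SAMPLING` and `sampling_kwargs_for` are illustrative, not the repo's actual API):

```python
# Shared defaults stay untouched; only condmatgen opts into the stop token.
DEFAULT_SAMPLING = {"temperature": 1.0, "max_tokens": 2048}

def sampling_kwargs_for(task_name: str) -> dict:
    kwargs = dict(DEFAULT_SAMPLING)  # copy, never mutate the global
    if task_name == "condmatgen":
        # Only this task stops generation at the closing answer tag.
        kwargs["stop"] = ["</answer>"]
    return kwargs
```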

Reproducibility Gaps

| Gap | Details |
| --- | --- |
| 5+ hardcoded absolute paths | `condmatgen.py` line 57: `/capstor/store/cscs/swissai/a131/jmeng/sink/...`; recipe: `/capstor/.../a131/jmeng/sink/src/open_r1/dataset/`; SLURM script: 3 occurrences |
| Missing dataset | `NatureLM_conditional_v2.json` is not in the repo, not on HuggingFace, and has no download script or instructions |
| Undeclared dependencies | `smact` and `pymatgen` are imported but listed in neither `setup.py` nor any requirements file |
| No `__init__.py` in `condmatgen/` directory | May cause import issues in some Python setups |
| `comps_used_in_sft.json` is empty | The novelty bonus is always available, which may not reflect the intended training dynamics |
| No tests | Zero unit tests for the reward functions |
| No documentation | No `.rst` file in `docs/source/tasks/` |
| ~170 lines of commented-out code | Element/space-group overuse counters, debug prints |
| Unused imports | `requests`, `Optional`, `rdkit.Chem`, `pd` (pandas) |
| 38 unsquashed debug commits | "added debugging code" x7, "try this...", "see if its coz of..." |

What's needed for reviewer reproducibility

  • Replace all hardcoded /capstor/... paths with ${MIST_DATA_DIR} or os.path.dirname(__file__)
  • Revert the global "stop" token change in utils.py — make it task-specific
  • Revert random_print rate change in base.py
  • Revert personal changes to launch_CSCS.slurm and model_paths.txt
  • Add dataset download/preparation instructions or script
  • Add smact and pymatgen to project dependencies
  • Add docs/source/tasks/condmatgen.rst
  • Add unit tests for accuracy_reward
  • Remove unused imports and commented-out code
  • Squash or clean up the 38 debug commits
  • Add demo fixture data for smoke testing
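The path fix in the first item could look like the following sketch. `MIST_DATA_DIR` is the environment variable suggested above; `dataset_path` is a hypothetical helper, not code from the branch:

```python
import os

def dataset_path(filename: str) -> str:
    """Resolve a data file portably: prefer the MIST_DATA_DIR env var,
    fall back to the directory containing this module."""
    base = os.environ.get("MIST_DATA_DIR")
    if base is None:
        base = os.path.dirname(os.path.abspath(__file__))
    return os.path.join(base, filename)
```

With this in place, the recipe and SLURM script only need to export `MIST_DATA_DIR` rather than carry account-specific `/capstor/` paths.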


@doncamilom
Contributor Author

Paper-to-Code Mapping Update

This task is described in the paper as CMG (Conditional Material Generation).

Paper claims for CMG:

  • Table 4: CMG accuracy goes from 40.6% (base) to 64.9% direct / 70.5% reasoning after RL
  • Reward function described: R = α1·Validity + α2·Precision + α3·Novelty + α4·Format
  • Dataset: Materials Project, 1000 samples with constituent-element prompts
  • One of 3 inorganic chemistry tasks
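For reference, the paper's weighted reward above could be combined as in this sketch; the alpha values are not stated in this review, so the defaults here are placeholders:

```python
def combined_reward(validity: float, precision: float, novelty: float,
                    fmt: float, alphas=(1.0, 1.0, 1.0, 1.0)) -> float:
    """Weighted sum R = a1*Validity + a2*Precision + a3*Novelty + a4*Format.
    The alpha weights are placeholders, not values from the paper."""
    a1, a2, a3, a4 = alphas
    return a1 * validity + a2 * precision + a3 * novelty + a4 * fmt
```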

What this branch provides:

  • The GRPO training task implementation (matches paper's reward design)
  • The accuracy_reward with SMACT validity + element precision + novelty bonus aligns with paper description

What is MISSING to back the paper's claims:

  1. Dataset not available -- the NatureLM_conditional_v2.json data is not public and has no download script. dataset_components.csv lists this as "needs export, planned"
  2. No evaluation harness -- paper reports accuracy numbers but there is no standalone evaluation script to reproduce Table 4 results
  3. Two implementations exist -- this branch (condmatgen-jeffrey) and PR #33 (branch binary_compound_Ruizhi, "Binary compound") both implement CMG, and they differ. Which one was used for the paper results?
  4. Trained model checkpoint not released -- needed to verify reported accuracies

Priority for reviewer reproducibility: HIGH

This task is a headline result in the paper (Table 4). Without the dataset, evaluation script, and a working task implementation in main, the CMG results cannot be reproduced.

@doncamilom doncamilom mentioned this pull request Apr 6, 2026
@doncamilom
Contributor Author

Final Classification

Paper task: CMG (Conditional Material Generation)
Priority: HIGH
Paper relevance: Headline result -- Table 4 reports CMG accuracy 40.6% -> 64.9%/70.5% after RL

Verdict

This PR provides one of two competing CMG implementations (the other is in PR #33). It must be determined which implementation produced the paper results. Once resolved, the chosen implementation needs: bug fixes (global stop token, global print rate change), dataset release, evaluation harness, tests, and documentation. The other implementation should be closed.

Work needed for peer review

| Item | Effort |
| --- | --- |
| Determine which CMG implementation (this or PR #33) matches the paper results | Decision |
| Revert the global stop-token and `random_print` changes in `utils.py`/`base.py` | Small |
| Replace 5+ hardcoded `/capstor/` paths with `${MIST_DATA_DIR}` | Small |
| Release the `NatureLM_conditional_v2.json` dataset (or provide a generation script) | Medium |
| Add `smact` and `pymatgen` to `setup.py` | Small |
| Write an evaluation script to reproduce the Table 4 CMG numbers | Medium |
| Add unit tests for the reward functions | Medium |
| Add `docs/source/tasks/condmatgen.rst` | Medium |
| Add a demo fixture for smoke testing | Small |
| Remove ~170 lines of commented-out code and the unused imports | Small |
| Squash the 38 debug commits | Small |

@doncamilom added labels on Apr 6, 2026: priority: high (Important for paper claims), effort: medium (A day or two of work), paper-task (Implements a task described in the paper), has-bugs (Contains known runtime bugs)
@doncamilom force-pushed the condmatgen-jeffrey branch from 0fbadd7 to 9d6efd2 on April 6, 2026 13:00
doncamilom and others added 2 commits April 6, 2026 16:41
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@doncamilom doncamilom closed this Apr 6, 2026