Review: Condensed matter generation task (Jeffrey) #44

Closed
doncamilom wants to merge 44 commits into `main` from `condmatgen-jeffrey`

Conversation

@doncamilom
Contributor

Reproducibility Assessment — condmatgen-jeffrey

Author: Jeffrey | Commits: 41 | Files changed: 11 | +395 / -10 lines

Status: NOT READY for reviewer reproducibility. Needs significant cleanup before merge.


What this contributes

A new GRPO task — Conditional Material Generation (condmatgen) — where the model proposes a novel crystalline compound (element list + space group) given a set of chemical elements.

New files:

  • src/open_r1/tasks/condmatgen/condmatgen.py (325 lines) — ConditionalMaterialGeneration task class
  • src/open_r1/tasks/condmatgen/comps_used_in_sft.json — placeholder for seen-compositions dedup (currently empty [])
  • recipes/condmatgen.yaml — GRPO training recipe

Reward logic (multi-signal, ~170 lines):

  1. Format reward: checks <think>/<answer> tag presence and ordering
  2. Reasoning length penalty: -5 if <think> content < 500 chars
  3. Space group validity: must be 1–230
  4. Element precision: penalizes extra elements not in prompt
  5. SMACT validity: uses smact.screening.smact_validity() via pymatgen
  6. Novelty bonus: +2 if composition not seen before (tracked in self.seen_comps_set)
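A minimal sketch of signals 1, 2, 3, 4, and 6, using the thresholds quoted above (-5 below 500 chars, space groups 1-230, +2 novelty bonus). The SMACT validity signal is omitted here because it needs `smact`/`pymatgen`; per the PR it calls `smact.screening.smact_validity()`. All function names below are illustrative, not the branch's actual code:

```python
import re

def format_reward(text: str) -> bool:
    """Signal 1: <think>...</think> must appear and precede <answer>...</answer>."""
    return re.search(r"<think>(.*?)</think>\s*<answer>(.*?)</answer>",
                     text, re.DOTALL) is not None

def reasoning_length_penalty(text: str, min_chars: int = 500) -> float:
    """Signal 2: -5 if the <think> content is shorter than min_chars."""
    m = re.search(r"<think>(.*?)</think>", text, re.DOTALL)
    think = m.group(1) if m else ""
    return -5.0 if len(think) < min_chars else 0.0

def space_group_valid(sg: int) -> bool:
    """Signal 3: crystallographic space groups are numbered 1 through 230."""
    return 1 <= sg <= 230

def element_precision(proposed: set, allowed: set) -> float:
    """Signal 4: fraction of proposed elements that were in the prompt,
    so extra elements lower the score."""
    if not proposed:
        return 0.0
    return len(proposed & allowed) / len(proposed)

def novelty_bonus(comp: str, seen: set) -> float:
    """Signal 6: +2 the first time a composition appears; also records it."""
    if comp in seen:
        return 0.0
    seen.add(comp)
    return 2.0
```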

Breaks / Blockers

| Issue | Severity |
| --- | --- |
| Global `"stop": ["</answer>"]` added to `SamplingParams` in `utils.py`; affects ALL tasks, not just condmatgen, and could truncate other tasks' generations | Breaking |
| `random_print` rate in `base.py` changed from 0.01 to 0.1, i.e. 10x more debug output for ALL tasks | Breaking |
| `launch_CSCS.slurm` overwritten with Jeffrey-specific paths (a131 account, personal dir) | Breaking for shared infra |
| `model_paths.txt` adds entries pointing to personal CSCS storage | Non-portable |
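One way to resolve the stop-token blocker is to scope stop sequences per task instead of mutating the shared default. A minimal sketch, assuming a task-keyed sampling-kwargs helper (the names `DEFAULT_SAMPLING` and `sampling_kwargs_for` are illustrative, not the repo's actual API):

```python
# Shared defaults stay untouched; only condmatgen opts into the stop token.
DEFAULT_SAMPLING = {"temperature": 1.0, "max_tokens": 2048}

def sampling_kwargs_for(task_name: str) -> dict:
    kwargs = dict(DEFAULT_SAMPLING)  # copy, never mutate the global
    if task_name == "condmatgen":
        # Only this task stops generation at the closing answer tag.
        kwargs["stop"] = ["</answer>"]
    return kwargs
```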

Reproducibility Gaps

| Gap | Details |
| --- | --- |
| 5+ hardcoded absolute paths | `condmatgen.py` line 57: `/capstor/store/cscs/swissai/a131/jmeng/sink/...`; recipe: `/capstor/.../a131/jmeng/sink/src/open_r1/dataset/`; SLURM script: 3 occurrences |
| Missing dataset | `NatureLM_conditional_v2.json` is not in the repo, not on HuggingFace, and has no download script or instructions |
| Undeclared dependencies | `smact` and `pymatgen` are imported but listed in neither `setup.py` nor any requirements file |
| No `__init__.py` in `condmatgen/` directory | May cause import issues in some Python setups |
| `comps_used_in_sft.json` is empty | The novelty bonus is always available, which may not reflect the intended training dynamics |
| No tests | Zero unit tests for the reward functions |
| No documentation | No `.rst` file in `docs/source/tasks/` |
| ~170 lines of commented-out code | Element/space-group overuse counters, debug prints |
| Unused imports | `requests`, `Optional`, `rdkit.Chem`, `pd` (pandas) |
| 38 unsquashed debug commits | "added debugging code" x7, "try this...", "see if its coz of..." |

What's needed for reviewer reproducibility

  • Replace all hardcoded /capstor/... paths with ${MIST_DATA_DIR} or os.path.dirname(__file__)
  • Revert the global "stop" token change in utils.py — make it task-specific
  • Revert random_print rate change in base.py
  • Revert personal changes to launch_CSCS.slurm and model_paths.txt
  • Add dataset download/preparation instructions or script
  • Add smact and pymatgen to project dependencies
  • Add docs/source/tasks/condmatgen.rst
  • Add unit tests for accuracy_reward
  • Remove unused imports and commented-out code
  • Squash or clean up the 38 debug commits
  • Add demo fixture data for smoke testing
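The path fix in the first item could look like the following sketch. `MIST_DATA_DIR` is the environment variable suggested above; `dataset_path` is a hypothetical helper, not code from the branch:

```python
import os

def dataset_path(filename: str) -> str:
    """Resolve a data file portably: prefer the MIST_DATA_DIR env var,
    fall back to the directory containing this module."""
    base = os.environ.get("MIST_DATA_DIR")
    if base is None:
        base = os.path.dirname(os.path.abspath(__file__))
    return os.path.join(base, filename)
```

With this in place, the recipe and SLURM script only need to export `MIST_DATA_DIR` rather than carry account-specific `/capstor/` paths.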


@doncamilom
Contributor Author

Paper-to-Code Mapping Update

This task is described in the paper as CMG (Conditional Material Generation).

Paper claims for CMG:

  • Table 4: CMG accuracy goes from 40.6% (base) to 64.9% direct / 70.5% reasoning after RL
  • Reward function described: R = α1·Validity + α2·Precision + α3·Novelty + α4·Format
  • Dataset: Materials Project, 1000 samples with constituent-element prompts
  • One of 3 inorganic chemistry tasks
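For reference, the paper's weighted reward above could be combined as in this sketch; the alpha values are not stated in this review, so the defaults here are placeholders:

```python
def combined_reward(validity: float, precision: float, novelty: float,
                    fmt: float, alphas=(1.0, 1.0, 1.0, 1.0)) -> float:
    """Weighted sum R = a1*Validity + a2*Precision + a3*Novelty + a4*Format.
    The alpha weights are placeholders, not values from the paper."""
    a1, a2, a3, a4 = alphas
    return a1 * validity + a2 * precision + a3 * novelty + a4 * fmt
```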

What this branch provides:

  • The GRPO training task implementation (matches paper's reward design)
  • The accuracy_reward with SMACT validity + element precision + novelty bonus aligns with paper description

What is MISSING to back the paper's claims:

  1. Dataset not available -- the NatureLM_conditional_v2.json data is not public and has no download script. dataset_components.csv lists this as "needs export, planned"
  2. No evaluation harness -- paper reports accuracy numbers but there is no standalone evaluation script to reproduce Table 4 results
  3. Two implementations exist -- this branch (condmatgen-jeffrey) and PR #33 (branch binary_compound_Ruizhi, "Binary compound") both implement CMG, and they differ. Which one was used for the paper results?
  4. Trained model checkpoint not released -- needed to verify reported accuracies

Priority for reviewer reproducibility: HIGH

This task is a headline result in the paper (Table 4). Without the dataset, evaluation script, and a working task implementation in main, the CMG results cannot be reproduced.

@doncamilom doncamilom mentioned this pull request Apr 6, 2026
@doncamilom
Contributor Author

Final Classification

Paper task: CMG (Conditional Material Generation)
Priority: HIGH
Paper relevance: Headline result -- Table 4 reports CMG accuracy 40.6% -> 64.9%/70.5% after RL

Verdict

This PR provides one of two competing CMG implementations (the other is in PR #33). It must be determined which implementation produced the paper results. Once resolved, the chosen implementation needs: bug fixes (global stop token, global print rate change), dataset release, evaluation harness, tests, and documentation. The other implementation should be closed.

Work needed for peer review

| Item | Effort |
| --- | --- |
| Determine which CMG implementation (this or PR #33) matches the paper results | Decision |
| Revert the global stop-token and `random_print` changes in `utils.py`/`base.py` | Small |
| Replace 5+ hardcoded `/capstor/` paths with `${MIST_DATA_DIR}` | Small |
| Release the `NatureLM_conditional_v2.json` dataset (or provide a generation script) | Medium |
| Add `smact` and `pymatgen` to `setup.py` | Small |
| Write an evaluation script to reproduce the Table 4 CMG numbers | Medium |
| Add unit tests for the reward functions | Medium |
| Add `docs/source/tasks/condmatgen.rst` | Medium |
| Add a demo fixture for smoke testing | Small |
| Remove ~170 lines of commented-out code and the unused imports | Small |
| Squash the 38 debug commits | Small |

@doncamilom added labels on Apr 6, 2026: priority: high (Important for paper claims), effort: medium (A day or two of work), paper-task (Implements a task described in the paper), has-bugs (Contains known runtime bugs)
@doncamilom force-pushed the condmatgen-jeffrey branch from 0fbadd7 to 9d6efd2 on April 6, 2026 13:00
doncamilom and others added 2 commits April 6, 2026 16:41
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@doncamilom doncamilom closed this Apr 6, 2026