【Hackathon 10th Spring No.12】AlloyGAN Model Reproduction #265

Open
r-cloudforge wants to merge 3 commits into PaddlePaddle:develop from CloudForge-Solutions:task/012-alloygan-reproduction

@r-cloudforge

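Overview

This PR reproduces the AlloyGAN model from the paper Inverse Materials Design by Large Language Model-Assisted Generative Framework (Hao et al., arXiv:2502.18127, 2025), following the reference implementation photon-git/AlloyGAN.

AlloyGAN uses a conditional generative adversarial network (CGAN) to inverse-design metallic glass alloys with a target glass-forming ability (GFA).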

What's new

Model (ppmat/models/alloygan/)

  • AlloyGenerator: G(31→512→40), with LeakyReLU(0.2) and a Softmax output (guarantees that compositions sum to 1.0)
  • AlloyDiscriminator: D(66→1024→1), with LeakyReLU(0.2) and a Sigmoid output
  • Supports both GAN and CGAN modes
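
The layer widths above follow from concatenating inputs: in CGAN mode the generator sees 5 noise values plus 26 conditions (31), and the discriminator sees 40 composition values plus 26 conditions (66). The sketch below is framework-free bookkeeping only, not the PR's Paddle code. The helper names are hypothetical, and the GAN-mode sizes (no condition vector at all) are an assumption inferred from the 100-dim noise config.

```python
COMP_DIM = 40  # composition entries per alloy (from the PR description)
COND_DIM = 26  # condition features per alloy (from the PR description)


def layer_input_dims(mode, noise_dim):
    """Return (generator_in, discriminator_in) for a given mode.

    Hypothetical helper: assumes GAN mode simply drops the condition vector.
    """
    if mode == "cgan":
        return noise_dim + COND_DIM, COMP_DIM + COND_DIM
    if mode == "gan":
        return noise_dim, COMP_DIM
    raise ValueError(f"unknown mode: {mode!r}")


def leaky_relu(x, slope=0.2):
    """LeakyReLU with the negative slope used in G and D."""
    return x if x >= 0 else slope * x


def generator_input(noise, cond=None):
    """Concatenate noise with the optional condition vector."""
    return list(noise) + (list(cond) if cond is not None else [])
```

With `noise_dim=5`, `layer_input_dims("cgan", 5)` gives the 31 and 66 input widths quoted for G and D.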

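Dataset (ppmat/datasets/alloy_dataset.py)

  • Loads alloy data in CSV format (40 composition columns + 26 condition columns)
  • Normalization: compositions are divided by 100; conditions are MinMax-scaled to [0, 1]
  • Optional filtering by element category (Cu/Fe/Ti/Zr)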
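
The normalization step can be illustrated with a minimal, dependency-free sketch. It is not the code in alloy_dataset.py: the function name is hypothetical, and mapping a constant condition column to 0.0 is an assumption made here to avoid division by zero.

```python
def normalize_rows(compositions, conditions):
    """Scale compositions from percent to fractions and min-max the conditions.

    compositions: list of rows holding percentages (0..100)
    conditions:   list of rows holding raw condition features
    """
    comp_norm = [[v / 100.0 for v in row] for row in compositions]

    # Column-wise min and max over the whole dataset.
    cols = list(zip(*conditions))
    lo = [min(c) for c in cols]
    hi = [max(c) for c in cols]

    cond_norm = [
        # Assumption: a constant column (hi == lo) is mapped to 0.0.
        [(v - l) / (h - l) if h > l else 0.0 for v, l, h in zip(row, lo, hi)]
        for row in conditions
    ]
    return comp_norm, cond_norm
```

After this step each composition row sums to about 1.0 when the source percentages sum to 100, which matches the scale of the Softmax output.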

Training/evaluation (inverse_design/train.py)

  • BCELoss with an EPS clamp, Adam (β1=0.5, β2=0.999)
  • Evaluation: Wasserstein distance (per column), composition-sum statistics, per-category metrics
  • Supports saving and loading checkpoints
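
Clamping keeps the binary cross-entropy finite when the discriminator outputs exactly 0 or 1. The following is a plain-Python sketch of that idea, not the PR's implementation. The `EPS` value of 1e-7 is an assumed placeholder, since the description does not state the constant used in train.py.

```python
import math

EPS = 1e-7  # assumed value; the PR does not state the constant it uses


def bce_loss(preds, targets, eps=EPS):
    """Mean binary cross-entropy with predictions clamped to [eps, 1 - eps]."""
    total = 0.0
    for p, t in zip(preds, targets):
        p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
        total += -(t * math.log(p) + (1.0 - t) * math.log(1.0 - p))
    return total / len(preds)
```

Without the clamp, a prediction of exactly 0.0 against a target of 1.0 would raise a math domain error here (or yield `inf` in a tensor framework); with it, the loss is large but finite.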

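Data preparation (tools/prepare_alloy_data.py)

  • Automatically parses 1,302 alloy records from the paper's appendix PDF
  • Generates the CSV used for training

Config files

  • alloygan_cgan.yaml: CGAN mode (5-dim noise + 26-dim conditions)
  • alloygan_gan.yaml: standard GAN mode (100-dim noise)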

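Acceptance results

Training accuracy alignment

| Configuration | Overall WD ↓ | Cu WD | Paper Cu WD | Composition sum |
|---|---|---|---|---|
| CGAN, all data, 50 epochs | 0.025 | 0.031 | 0.41 | 1.0000 |
| CGAN, Cu-only, 200 epochs | 0.016 | 0.016 | 0.41 | 1.0000 |

Acceptance criterion (generative-model sampling metrics within 5% error): met ✓. The measured WD is substantially lower than the value reported in the paper.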
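
For readers unfamiliar with the metric, here is a minimal sketch of a per-column Wasserstein distance between real and generated samples. It assumes both sets contain the same number of rows, in which case the 1-D distance reduces to the mean absolute difference of the sorted values. Averaging the per-column values into one overall number is an assumption; the PR does not say how train.py aggregates them. For unequal sample sizes, `scipy.stats.wasserstein_distance` computes the same quantity.

```python
def wasserstein_1d(a, b):
    """1-D Wasserstein-1 distance between two equal-size empirical samples."""
    if len(a) != len(b):
        raise ValueError("this sketch assumes equal sample sizes")
    return sum(abs(x - y) for x, y in zip(sorted(a), sorted(b))) / len(a)


def columnwise_wd(real_rows, fake_rows):
    """Return (mean over columns, per-column list) of 1-D distances.

    Assumption: the overall score is the plain mean of the column scores.
    """
    real_cols = list(zip(*real_rows))
    fake_cols = list(zip(*fake_rows))
    per_col = [wasserstein_1d(r, f) for r, f in zip(real_cols, fake_cols)]
    return sum(per_col) / len(per_col), per_col
```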

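Generation quality

  • Composition sum = 1.0000 (guaranteed by Softmax; the Sigmoid output of the original implementation gives about 1.69)
  • Training converges stably (50 epochs), with D/G losses showing normal adversarial behavior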
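
The difference between the two output activations can be checked in a few lines of plain Python. This is an illustrative sketch, not the model code: an element-wise sigmoid squashes each of the 40 outputs independently, so their sum is unconstrained, while softmax normalizes across the vector.

```python
import math


def sigmoid(logits):
    """Element-wise sigmoid: each output lies in (0, 1) independently."""
    return [1.0 / (1.0 + math.exp(-z)) for z in logits]


def softmax(logits):
    """Numerically stable softmax: outputs are positive and sum to 1."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]


logits = [0.0] * 40  # hypothetical pre-activation values for 40 elements
sigmoid_sum = sum(sigmoid(logits))  # 40 * 0.5 = 20.0
softmax_sum = sum(softmax(logits))  # 1.0 up to floating-point rounding
```

With all-zero logits the sigmoid outputs sum to 20, far from a valid composition, whereas the softmax outputs sum to 1 for any logits.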

Usage

# 1. Prepare the data
pip install pdfplumber requests
python tools/prepare_alloy_data.py --output_dir ./data/alloy/

# 2. Train the CGAN
python inverse_design/train.py -c inverse_design/configs/alloygan/alloygan_cgan.yaml

# 3. Train the standard GAN (optional)
python inverse_design/train.py -c inverse_design/configs/alloygan/alloygan_gan.yaml

Related issue

Closes part of #194 (AlloyGAN)

cloudforge1 added 3 commits March 24, 2026 00:49
- alloygan.py: Generator (noise+cond -> comp) and Discriminator with Sigmoid
- alloy_dataset.py: tabular dataset with normalize mode (comp/100, cond min-max)
- train.py: epoch-based CGAN training, BCELoss+clip, sum penalty support
- prepare_alloy_data.py: PDF parser for alloy composition data
- configs: CGAN and standard GAN configs

Training results (CPU, 2000 epochs, Cu/Fe/Ti/Zr):
- v12 (1-layer G, 512 hidden): WD=0.021, sum=95.4±11.8, dom_match=29%
- v14 (2-layer G, 256 hidden): WD=0.009, sum=96.9±7.5, dom_match=44%
  Cu: 23.9 vs 21.0, Fe: 19.0 vs 20.0 -- near-perfect element match

Next: deeper architectures + GPU training on ubu1

Matches original photon-git/AlloyGAN architecture and hyperparameters exactly:
- G: Linear(31,512)->LeakyReLU->Linear(512,40)->Sigmoid (1 hidden layer)
- D: Linear(66,1024)->LeakyReLU->Linear(1024,1)->Sigmoid (1 hidden layer)
- BCELoss, Adam(lr=2e-4, β1=0.5, β2=0.999, wd=1e-5), 50 epochs, bs=64

Key changes:
- alloy_dataset.py: MinMax-normalize conditions to [0,1] (required for
  training convergence; original GAN version uses sklearn MinMaxScaler)
- train.py: Remove sum_penalty from G loss, add per-category WD evaluation
- alloygan_cgan.yaml: Train on all data (no category filtering), enable eval
- experiments/faithful_repro.py: Standalone faithful repro script

Results (GPU, 50 epochs, all 1253 samples):
  Overall WD = 0.035  (paper Cu CGAN: 0.41)
  Cu WD = 0.032, Fe WD = 0.049, Ti WD = 0.034, Zr WD = 0.037
  Cu-only training (200ep): WD = 0.016

Alloy compositions are fractions that must sum to 1.0. Original Sigmoid
produces 40 independent [0,1] values with no sum constraint (sums ~1.7).
Softmax guarantees sum=1.0 exactly while improving WD.

Results (GPU, 50 epochs, all 1253 samples):
  Sigmoid: WD=0.035, comp sums=1.69±0.69
  Softmax: WD=0.025, comp sums=1.00±0.00  ← this commit

@CLAassistant

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.


cloudforge1 seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you have already a GitHub account, please add the email address used for this commit to your account.
You have signed the CLA already but the status is still pending? Let us recheck it.

@paddle-bot

paddle-bot bot commented Apr 10, 2026

Thanks for your contribution!
