
[Hackathon 10th Spring No.11] TrinityLLM model reproduction #269

Open
r-cloudforge wants to merge 5 commits into PaddlePaddle:develop from CloudForge-Solutions:task/011-trinityllm-reproduction

@r-cloudforge

This PR reproduces TrinityLLM in PaddlePaddle by porting the https://github.com/ningliu-iga/TrinityLLM project.

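Main changes

  1. Add the ppmat/models/trinityllm/ model directory: a complete TrinityLLM implementation, including a SMILES tokenizer and an LLM for material property prediction
  2. Add training configs under property_prediction/configs/trinityllm/
  3. Add test code under test/trinityllm/: 13 unit tests

Model features

  • SMILES-based large language model for materials
  • Custom SMILES tokenizer with support for parsing molecular expressions
  • Attention with RoPE (rotary position embeddings)
  • SwiGLU feed-forward network
  • RMSNorm normalization
  • Built on the standard PaddleMaterials API
  • Registered in ppmat/models/__init__.py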
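
For readers unfamiliar with the three building blocks named in the feature list, here is a minimal pure-Python sketch of RMSNorm, a SwiGLU feed-forward layer, and RoPE applied to a single vector. It is not taken from this PR: the function names, the `eps` and `base` defaults, and the interleaved pairing of dimensions in `rope` are all assumptions chosen for illustration, and the actual Paddle implementation may differ (for example by pairing the first and second halves of the vector instead).

```python
import math


def rms_norm(x, weight, eps=1e-6):
    """RMSNorm: divide x by its root mean square, then apply a learned gain."""
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [w * v / rms for v, w in zip(x, weight)]


def silu(v):
    """SiLU (swish) activation: v * sigmoid(v)."""
    return v / (1.0 + math.exp(-v))


def matvec(matrix, x):
    """Multiply a row-major matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, x)) for row in matrix]


def swiglu_ffn(x, w_gate, w_up, w_down):
    """SwiGLU feed-forward: w_down @ (silu(w_gate @ x) * (w_up @ x))."""
    gate = [silu(v) for v in matvec(w_gate, x)]
    up = matvec(w_up, x)
    return matvec(w_down, [g * u for g, u in zip(gate, up)])


def rope(x, position, base=10000.0):
    """Rotate each pair (x[i], x[i+1]), i even, by position * base**(-i/d)."""
    d = len(x)
    if d % 2 != 0:
        raise ValueError("RoPE needs an even feature dimension")
    out = list(x)
    for i in range(0, d, 2):
        theta = position * base ** (-i / d)
        c, s = math.cos(theta), math.sin(theta)
        out[i] = x[i] * c - x[i + 1] * s
        out[i + 1] = x[i] * s + x[i + 1] * c
    return out
```

Because `rope` is a pure rotation, the dot product of a rotated query and a rotated key depends only on the difference of their positions, which is the property that makes it useful inside attention.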

Tests

cd test && python -m pytest trinityllm/ -v
# 13 passed

Reference links

cloudforge1 and others added 4 commits March 28, 2026 10:00
Implement MoLFormer-based SMILES language model with Rotary Position
Embeddings, PropertyHead with skip connections, SMILESTokenizer, and
13 unit tests covering model + tokenizer.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
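
As an illustration of what a SMILES tokenizer does, the sketch below splits a SMILES string with a widely used atom-level regular expression. This is an assumption for illustration only: the PR's SMILESTokenizer is not shown in this conversation and may use a different pattern or vocabulary, and the names `SMILES_PATTERN` and `tokenize_smiles` are hypothetical.

```python
import re

# Atom-level SMILES pattern commonly used in molecular language models.
# Bracket atoms such as [Na+] and two-letter halogens (Br, Cl) stay whole.
SMILES_PATTERN = re.compile(
    r"(\[[^\]]+]|Br?|Cl?|N|O|S|P|F|I|b|c|n|o|s|p|\(|\)|\.|=|#|-|\+"
    r"|\\|\/|:|~|@|\?|>|\*|\$|\%[0-9]{2}|[0-9])"
)


def tokenize_smiles(smiles):
    """Split a SMILES string into tokens, rejecting unrecognized characters."""
    tokens = SMILES_PATTERN.findall(smiles)
    # findall silently skips characters it cannot match, so check coverage.
    if "".join(tokens) != smiles:
        raise ValueError(f"unrecognized characters in SMILES: {smiles!r}")
    return tokens
```

For example, `tokenize_smiles("[Na+].[Cl-]")` keeps each bracketed ion as a single token, and the round-trip check makes unsupported characters fail loudly instead of being dropped.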
@CLAassistant

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.


cloudforge1 seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you have already a GitHub account, please add the email address used for this commit to your account.
You have signed the CLA already but the status is still pending? Let us recheck it.

@paddle-bot

paddle-bot bot commented Apr 11, 2026

Thanks for your contribution!

@paddle-bot paddle-bot bot added the contributor External developers label Apr 11, 2026
- Implement SMILESDataset for CSV-based SMILES property prediction
- Register SMILESDataset in ppmat/datasets/__init__.py
- Make pad_id configurable in TrinityLLM model (was hardcoded to 2)
- Update config YAML with pad_id and proper dataset params

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
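
To show why a configurable pad_id matters, here is a small sketch that builds a padding mask from token ids and uses it to average only the non-padding positions. How the TrinityLLM model actually consumes pad_id is not shown in this conversation, so the helpers `padding_mask` and `masked_mean` are hypothetical and the pooling step is an assumption.

```python
def padding_mask(token_ids, pad_id):
    """Return True for real tokens and False for padding, per sequence."""
    return [[tok != pad_id for tok in row] for row in token_ids]


def masked_mean(hidden, mask):
    """Average the hidden vectors of one sequence over unmasked positions.

    hidden: list of per-token feature vectors, shape [seq][dim]
    mask:   list of booleans, shape [seq]
    """
    kept = [h for h, keep in zip(hidden, mask) if keep]
    if not kept:
        raise ValueError("sequence contains only padding")
    dim = len(kept[0])
    return [sum(h[j] for h in kept) / len(kept) for j in range(dim)]
```

If the pad id baked into the model disagrees with the tokenizer's vocabulary, the mask treats a real token as padding (or the reverse), which is the kind of mismatch that passing pad_id through the config avoids.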