Large Language Diffusion with Ordered Unmasking (LLaDOU)

Implementation of "Reinforcing the Diffusion Chain of Lateral Thought with Diffusion Language Models" [NeurIPS 2025]

We introduce Large Language Diffusion with Ordered Unmasking (LLaDOU), trained by reinforcing a new reasoning paradigm for diffusion language models: the Diffusion Chain of Lateral Thought (DCoLT).

Compared to standard CoT, DCoLT is distinguished by several notable features:

  • Bidirectional Reasoning: allows global refinement throughout generation via bidirectional self-attention masks.
  • Format-Free Reasoning: imposes no strict grammatical rules on intermediate steps of thought.
  • Nonlinear Generation: generates tokens at varying positions across different steps (see the toy sketch below).
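
To make the nonlinear, order-free decoding concrete, here is a minimal toy sketch of confidence-ordered unmasking. It is not the repository's sampler: the mask id, the fixed per-step schedule, and the argmax-confidence criterion are all illustrative assumptions standing in for the learned unmasking policy.

import torch

def toy_ordered_unmask(logits_fn, seq_len, num_steps=4):
    MASK = -1  # hypothetical mask id for this toy example
    tokens = torch.full((seq_len,), MASK, dtype=torch.long)
    per_step = max(1, seq_len // num_steps)  # positions to reveal per step
    for _ in range(num_steps):
        masked = (tokens == MASK).nonzero(as_tuple=True)[0]
        if masked.numel() == 0:
            break
        probs = logits_fn(tokens).softmax(dim=-1)  # (seq_len, vocab_size)
        conf, pred = probs.max(dim=-1)             # per-position confidence and argmax token
        # unmask the k most confident masked positions this step, wherever they are
        k = min(per_step, masked.numel())
        top = masked[conf[masked].topk(k).indices]
        tokens[top] = pred[top]
    return tokens

# Stand-in "model": random logits over a 100-token vocabulary.
print(toy_ordered_unmask(lambda t: torch.randn(t.numel(), 100), seq_len=8))

Because every step rescores all masked positions, tokens can appear at any position in any step, which is the nonlinear generation property described above.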

(Figure: Demonstration of DCoLT)

Getting Started

Inference

import torch
from transformers import AutoTokenizer
from networks.lladou_v0 import LLaDOUModelLM, sample

# Load the tokenizer and the LLaDOU checkpoint in bfloat16 on the GPU.
tokenizer = AutoTokenizer.from_pretrained("models/LLaDOU-v0-Math")
model = LLaDOUModelLM.from_pretrained(
    pretrained_model_name_or_path="models/LLaDOU-v0-Math",
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
    device_map="cuda",
)

# Run the diffusion sampler on a problem and print the decoded response.
problem = "What is the answer to 1+1?"
outputs = sample(
    model,
    problem,
    tokenizer,
    device="cuda",
)
response = outputs["responses"][0]
print(response)

Training

We provide an example of training LLaDOU on the GSM8K dataset; feel free to adjust the configuration file!

accelerate launch --num_processes 8 --config_file configs/accelerate/fsdp.yaml train.py --config configs/gsm8k_64step_example.yaml
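
Before launching, it can help to glance at the configuration being passed in. A minimal sketch, assuming PyYAML is installed; the field names inside the file depend on the repository and are not listed here:

import yaml

# Load and print the training config referenced by the command above,
# so its fields can be inspected and edited before launching.
with open("configs/gsm8k_64step_example.yaml") as f:
    cfg = yaml.safe_load(f)

for key, value in cfg.items():
    print(f"{key}: {value}")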

Evaluation

Prepare the datasets as follows:

├── datasets
│   ├── gsm8k
│   │   └── ...
│   ├── MATH
│   │   └── ...
│   ├── mbpp.jsonl
│   ├── mbpp_test.jsonl
│   └── HumanEval.jsonl.gz
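
As a quick sanity check before running evaluation, the expected paths from the tree above can be verified with a short script (the paths are the only assumption carried over from the listing):

from pathlib import Path

# Paths taken directly from the directory tree above.
expected = [
    "datasets/gsm8k",
    "datasets/MATH",
    "datasets/mbpp.jsonl",
    "datasets/mbpp_test.jsonl",
    "datasets/HumanEval.jsonl.gz",
]
for p in map(Path, expected):
    print(f"{'ok     ' if p.exists() else 'MISSING'} {p}")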
Evaluation Metrics

Citation

If this repository helps with your work, please consider giving it a star and citing the paper:

@inproceedings{huang2025reinforcing,
  title={Reinforcing the Diffusion Chain of Lateral Thought with Diffusion Language Models},
  author={Zemin Huang and Zhiyang Chen and Zijun Wang and Tiancheng Li and Guo-Jun Qi},
  booktitle={The Thirty-ninth Annual Conference on Neural Information Processing Systems},
  year={2025}
}
