Sync-R1 Core Training Logic

This repository snapshot keeps only the current paper-aligned GRPO training path for Sync-R1.

Included Code

The GitHub-facing version intentionally keeps a single public entrypoint and its direct dependencies:

train_grpo_paper.py: training launcher
grpo_paper.py: GRPO implementation that follows the paper more closely
ref_model.py: reference-side rollout helper
pdata.py, utils.py, clip_eval.py, glm_api.py: runtime helpers
models/, training/, llava/: model and training modules
configs/: config files
requirements.txt: minimal dependency list

Older draft variants are intentionally excluded to avoid ambiguity for users.

Paths To Fill In

The config files use placeholder paths:

path/to/show-o-512x512
path/to/show-o
path/to/magvitv2
path/to/phi-1_5
path/to/checkpoints

Update them in:

configs/showo_demo_512x512.yaml
configs/showo_demo.yaml

The training command also expects:

--data_root path/to/unictokens_data
--pre_trained_ckpt_name path/to/second_stage_checkpoint_dir
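
Before launching, it can help to confirm that no placeholder is left in the two config files. The sketch below is not part of the repository: it is a plain text scan for the `path/to/` prefix, and the helper names `find_placeholders` and `check_configs` are hypothetical.

```python
from pathlib import Path

# Prefix shared by every placeholder listed in this README.
PLACEHOLDER_PREFIX = "path/to/"


def find_placeholders(text: str) -> list[tuple[int, str]]:
    """Return (line_number, stripped_line) pairs that still contain a placeholder."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        # Drop anything after '#' so commented-out hints are not reported.
        content = line.split("#", 1)[0]
        if PLACEHOLDER_PREFIX in content:
            hits.append((number, line.strip()))
    return hits


def check_configs(paths):
    """Map each existing config file to its unresolved placeholder lines."""
    report = {}
    for path in map(Path, paths):
        if path.is_file():
            found = find_placeholders(path.read_text(encoding="utf-8"))
            if found:
                report[str(path)] = found
    return report


if __name__ == "__main__":
    configs = ["configs/showo_demo_512x512.yaml", "configs/showo_demo.yaml"]
    for name, hits in check_configs(configs).items():
        for number, line in hits:
            print(f"{name}:{number}: {line}")
```

Run it from Sync-R1/ so the relative config paths resolve. No output means no placeholder was found in the files that exist.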

External Assets Not Included

This trimmed repo does not include:

training data
pretrained checkpoints
CLIP weights
facenet weights
generated images and logs

You need to provide them locally before training.

Launch

Run from Sync-R1/. The script initializes torch.distributed, so torchrun is the recommended launcher.

3 GPUs

torchrun --nproc_per_node=3 train_grpo_paper.py \
  --num_gpus 3 \
  --config_file configs/showo_demo_512x512.yaml \
  --data_root path/to/unictokens_data \
  --pre_trained_ckpt_name path/to/second_stage_checkpoint_dir \
  --concept adrien_brody \
  --save_dir ./tmp_result_accelerate/ \
  --epoch_to_load 15 \
  --batch_num 10 \
  --batch_size 1 \
  --num_gen 9 \
  --llm glm \
  --accelerate True \
  --semantic True

1 GPU

torchrun --nproc_per_node=1 train_grpo_paper.py \
  --num_gpus 1 \
  --config_file configs/showo_demo_512x512.yaml \
  --data_root path/to/unictokens_data \
  --pre_trained_ckpt_name path/to/second_stage_checkpoint_dir \
  --concept adrien_brody \
  --save_dir ./tmp_result_accelerate/ \
  --epoch_to_load 15 \
  --batch_num 10 \
  --batch_size 1 \
  --num_gen 3 \
  --llm glm \
  --accelerate True \
  --semantic True
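
The two commands above differ only in the GPU count and --num_gen. A small wrapper can build the argument list from one GPU count so that --nproc_per_node and --num_gpus cannot drift apart. This is a sketch, not a script shipped with the repo; `build_launch_command` is a hypothetical helper, and every flag value is copied from the commands above.

```python
import shlex


def build_launch_command(num_gpus, num_gen, data_root, ckpt_dir,
                         concept="adrien_brody"):
    """Return the torchrun argument list with both GPU counts kept identical."""
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    return [
        "torchrun", f"--nproc_per_node={num_gpus}", "train_grpo_paper.py",
        "--num_gpus", str(num_gpus),  # same value as --nproc_per_node
        "--config_file", "configs/showo_demo_512x512.yaml",
        "--data_root", data_root,
        "--pre_trained_ckpt_name", ckpt_dir,
        "--concept", concept,
        "--save_dir", "./tmp_result_accelerate/",
        "--epoch_to_load", "15",
        "--batch_num", "10",
        "--batch_size", "1",
        "--num_gen", str(num_gen),
        "--llm", "glm",
        "--accelerate", "True",
        "--semantic", "True",
    ]


if __name__ == "__main__":
    cmd = build_launch_command(
        num_gpus=3,
        num_gen=9,
        data_root="path/to/unictokens_data",
        ckpt_dir="path/to/second_stage_checkpoint_dir",
    )
    # Print a shell-quoted command line; pass `cmd` to subprocess.run to execute it.
    print(shlex.join(cmd))
```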

Runtime Notes

--num_gen is the rollout group size for a single prompt.
The current training loop assumes --batch_size 1 with multiple rollouts per prompt (--num_gen greater than 1).
--num_gpus should match --nproc_per_node.
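
For context on why --num_gen matters: in GRPO as it is usually described, each rollout's reward is normalized against the other rollouts sampled for the same prompt, so a group of one carries no learning signal. The following is a minimal sketch of that normalization, not the code in grpo_paper.py. The function name, the eps value, and the use of the population standard deviation are assumptions made for illustration.

```python
from statistics import fmean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize the rewards of one prompt's rollout group to zero mean.

    `rewards` holds one scalar per rollout, so len(rewards) plays the
    role of --num_gen.
    """
    if len(rewards) < 2:
        raise ValueError("a rollout group needs at least two samples")
    mean = fmean(rewards)
    std = pstdev(rewards)  # population std; an assumption, see lead-in
    # eps keeps the division finite when every rollout gets the same reward.
    return [(r - mean) / (std + eps) for r in rewards]


if __name__ == "__main__":
    # Three rollouts for a single prompt, as in the 1 GPU command.
    for adv in group_relative_advantages([1.0, 2.0, 3.0]):
        print(f"{adv:+.3f}")
```

When all rollouts in a group receive the same reward, every advantage is zero and that prompt contributes no gradient, which is one reason larger groups are generally preferred when memory allows.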

Optional Environment Variables

If you use the LLM-based scoring paths, configure credentials through environment variables instead of hardcoding them:

ZAI_API_KEY
VERTEXAI_PROJECT
VERTEXAI_LOCATION
GOOGLE_APPLICATION_CREDENTIALS
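
A quick pre-flight check can report which of these variables are unset before a long run starts. This is a sketch with a hypothetical helper name (`missing_vars`). Which variables are actually required depends on the scoring path you enable, so treat the output as a hint rather than a hard failure.

```python
import os

# Variable names taken from the list above.
CREDENTIAL_VARS = (
    "ZAI_API_KEY",
    "VERTEXAI_PROJECT",
    "VERTEXAI_LOCATION",
    "GOOGLE_APPLICATION_CREDENTIALS",
)


def missing_vars(names, environ=None):
    """Return the names that are unset or empty in the given environment."""
    env = os.environ if environ is None else environ
    return [name for name in names if not env.get(name)]


if __name__ == "__main__":
    missing = missing_vars(CREDENTIAL_VARS)
    if missing:
        print("Unset credential variables:", ", ".join(missing))
    else:
        print("All credential variables are set.")
```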

Scope

This trimmed repo is meant for:

reading the current GRPO training logic
reproducing the implementation that follows the paper more closely
auditing or modifying the Sync-R1 training path

It is not a plug-and-play full training package until you attach the required local datasets, checkpoints, and external model assets.
