This repository snapshot keeps only the current paper-aligned GRPO training path for Sync-R1.
The GitHub-facing version intentionally keeps a single public entrypoint and its direct dependencies:
- train_grpo_paper.py: training launcher
- grpo_paper.py: GRPO implementation that follows the paper more closely
- ref_model.py: reference-side rollout helper
- pdata.py, utils.py, clip_eval.py, glm_api.py: runtime helpers
- models/, training/, llava/: model and training modules
- configs/: config files
- requirements.txt: minimal dependency list
Older draft variants are intentionally excluded to avoid ambiguity for users.
The config files use placeholder paths:
- path/to/show-o-512x512
- path/to/show-o
- path/to/magvitv2
- path/to/phi-1_5
- path/to/checkpoints
Update them in:
- configs/showo_demo_512x512.yaml
- configs/showo_demo.yaml
The training command also expects:
- --data_root path/to/unictokens_data
- --pre_trained_ckpt_name path/to/second_stage_checkpoint_dir
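Before launching, it can help to confirm that no placeholder is left in the config files. The following is a minimal sketch, not part of the repository: it assumes every placeholder still contains the literal prefix path/to/ and scans the two config files as plain text, so it does not depend on their YAML structure. The function names find_placeholders and check_configs are hypothetical.

```python
from __future__ import annotations

from pathlib import Path

# Assumption: unedited placeholders still contain this literal prefix.
PLACEHOLDER_PREFIX = "path/to/"


def find_placeholders(text: str) -> list[tuple[int, str]]:
    """Return (line_number, stripped_line) for lines that still hold a placeholder."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        # Skip comment-only lines; YAML comments start with '#'.
        if line.lstrip().startswith("#"):
            continue
        if PLACEHOLDER_PREFIX in line:
            hits.append((lineno, line.strip()))
    return hits


def check_configs(paths) -> dict[str, list[tuple[int, str]]]:
    """Map each existing config file to its remaining placeholder lines."""
    remaining = {}
    for path in map(Path, paths):
        if not path.is_file():
            # Missing files are skipped here; run this from Sync-R1/.
            continue
        hits = find_placeholders(path.read_text(encoding="utf-8"))
        if hits:
            remaining[str(path)] = hits
    return remaining


if __name__ == "__main__":
    configs = ["configs/showo_demo_512x512.yaml", "configs/showo_demo.yaml"]
    for name, hits in check_configs(configs).items():
        for lineno, line in hits:
            print(f"{name}:{lineno}: {line}")
```

If the script prints nothing, neither config contains the path/to/ prefix any more. The --data_root and --pre_trained_ckpt_name values are passed on the command line, so they are not covered by this check.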
This trimmed repo does not include:
- training data
- pretrained checkpoints
- CLIP weights
- facenet weights
- generated images and logs
You need to provide them locally before training.
Run from Sync-R1/. The script initializes torch.distributed, so torchrun is the recommended launcher.
Example with 3 GPUs:

    torchrun --nproc_per_node=3 train_grpo_paper.py \
        --num_gpus 3 \
        --config_file configs/showo_demo_512x512.yaml \
        --data_root path/to/unictokens_data \
        --pre_trained_ckpt_name path/to/second_stage_checkpoint_dir \
        --concept adrien_brody \
        --save_dir ./tmp_result_accelerate/ \
        --epoch_to_load 15 \
        --batch_num 10 \
        --batch_size 1 \
        --num_gen 9 \
        --llm glm \
        --accelerate True \
        --semantic True

Single-GPU example:

    torchrun --nproc_per_node=1 train_grpo_paper.py \
        --num_gpus 1 \
        --config_file configs/showo_demo_512x512.yaml \
        --data_root path/to/unictokens_data \
        --pre_trained_ckpt_name path/to/second_stage_checkpoint_dir \
        --concept adrien_brody \
        --save_dir ./tmp_result_accelerate/ \
        --epoch_to_load 15 \
        --batch_num 10 \
        --batch_size 1 \
        --num_gen 3 \
        --llm glm \
        --accelerate True \
        --semantic True

Notes:

- --num_gen is the rollout group size for a single prompt.
- The current training loop assumes batch_size=1 and multiple rollouts per prompt.
- --num_gpus should match --nproc_per_node.
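To illustrate why the loop needs several rollouts per prompt, here is a minimal sketch of the group-relative advantage commonly used in GRPO: each rollout's reward is normalized against the mean and standard deviation of its own group of --num_gen rollouts. This is a generic illustration, not the code in grpo_paper.py; the function name, the eps value, and the use of the population standard deviation are assumptions.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize rewards within one rollout group (one prompt, num_gen rollouts).

    Assumption: population standard deviation plus a small eps for stability;
    an actual implementation may use a different estimator or eps.
    """
    if len(rewards) < 2:
        # A single rollout has no group baseline to compare against.
        raise ValueError("need at least two rollouts per prompt")
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    # Three hypothetical rewards for one prompt, i.e. --num_gen 3.
    print(group_relative_advantages([1.0, 0.0, 0.5]))
```

Rollouts that score above the group mean get a positive advantage and those below get a negative one. If every rollout in the group receives the same reward, all advantages are zero and that prompt contributes no learning signal.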
If you use the LLM-based scoring paths, configure credentials through environment variables instead of hardcoding them:
- ZAI_API_KEY
- VERTEXAI_PROJECT
- VERTEXAI_LOCATION
- GOOGLE_APPLICATION_CREDENTIALS
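A small pre-flight check can report which of these variables are unset before a long run starts. The sketch below is not part of the repository and the helper name missing_env_vars is hypothetical. Which variables a given scoring path actually needs is not stated here, so the check only reports what is missing and leaves the decision to you.

```python
import os

# Variable names taken from the list above.
SCORING_ENV_VARS = (
    "ZAI_API_KEY",
    "VERTEXAI_PROJECT",
    "VERTEXAI_LOCATION",
    "GOOGLE_APPLICATION_CREDENTIALS",
)


def missing_env_vars(names=SCORING_ENV_VARS, environ=None):
    """Return the names that are unset or blank in the given environment."""
    env = os.environ if environ is None else environ
    return [name for name in names if not env.get(name, "").strip()]


if __name__ == "__main__":
    missing = missing_env_vars()
    if missing:
        print("Unset scoring variables: " + ", ".join(missing))
    else:
        print("All scoring variables are set.")
```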
This trimmed repo is meant for:
- reading the current GRPO training logic
- reproducing the implementation that follows the paper more closely
- auditing or modifying the Sync-R1 training path
It is not a plug-and-play full training package until you attach the required local datasets, checkpoints, and external model assets.