After training a LoRA on Z-Image-Turbo, the images produced by 8-step inference are very blurry; only at around 50 steps do they become barely acceptable. Below is a before/after-training comparison with the same prompt and the same seed:
Below are the two staged training commands (data preprocessing, then training):
accelerate launch DiffSynth-Studio/examples/z_image/model_training/train.py \
  --dataset_base_path autodl-tmp/dataset \
  --dataset_metadata_path autodl-tmp/dataset/metadata.csv \
  --max_pixels 1048576 \
  --dataset_repeat 1 \
  --model_paths '[
    [
      "autodl-tmp/Z-Image-Turbo/transformer/diffusion_pytorch_model-00001-of-00003.safetensors",
      "autodl-tmp/Z-Image-Turbo/transformer/diffusion_pytorch_model-00002-of-00003.safetensors",
      "autodl-tmp/Z-Image-Turbo/transformer/diffusion_pytorch_model-00003-of-00003.safetensors"
    ],
    [
      "autodl-tmp/Z-Image-Turbo/text_encoder/model-00001-of-00003.safetensors",
      "autodl-tmp/Z-Image-Turbo/text_encoder/model-00002-of-00003.safetensors",
      "autodl-tmp/Z-Image-Turbo/text_encoder/model-00003-of-00003.safetensors"
    ],
    "autodl-tmp/Z-Image-Turbo/vae/diffusion_pytorch_model.safetensors"
  ]' \
  --learning_rate 1e-4 \
  --num_epochs 5 \
  --remove_prefix_in_ckpt "pipe.dit." \
  --output_path "autodl-tmp/z-image-turbo-cache" \
  --lora_base_model "dit" \
  --lora_target_modules "to_q,to_k,to_v,to_out.0,w1,w2,w3" \
  --lora_rank 32 \
  --use_gradient_checkpointing \
  --dataset_num_workers 8 \
  --task "sft:data_process"
accelerate launch DiffSynth-Studio/examples/z_image/model_training/train.py \
  --dataset_base_path autodl-tmp/z-image-turbo-cache \
  --max_pixels 1048576 \
  --dataset_repeat 1 \
  --model_paths '[
    [
      "autodl-tmp/Z-Image-Turbo/transformer/diffusion_pytorch_model-00001-of-00003.safetensors",
      "autodl-tmp/Z-Image-Turbo/transformer/diffusion_pytorch_model-00002-of-00003.safetensors",
      "autodl-tmp/Z-Image-Turbo/transformer/diffusion_pytorch_model-00003-of-00003.safetensors"
    ]
  ]' \
  --learning_rate 1e-4 \
  --num_epochs 5 \
  --remove_prefix_in_ckpt "pipe.dit." \
  --output_path "autodl-tmp/z-image-turbo-lora" \
  --lora_base_model "dit" \
  --lora_target_modules "to_q,to_k,to_v,to_out.0,w1,w2,w3" \
  --lora_rank 32 \
  --use_gradient_checkpointing \
  --dataset_num_workers 8 \
  --task "sft:train"
I suspect that the underlying training script still trains the model as if it were a standard many-step model (i.e., sampling timesteps across the full schedule), and that this is what causes the problem; if so, it defeats the purpose of this few-step distilled model.
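For context on that suspicion: a standard flow-matching fine-tuning objective draws training timesteps over the entire noise schedule, independent of how many inference steps the distilled model is meant to run with, so plain LoRA fine-tuning can plausibly wash out the few-step behavior. A generic sketch of such a training step is shown below; this is an illustration under that assumption, not DiffSynth-Studio's actual code, and the model call signature is hypothetical:

```python
# Generic flow-matching fine-tuning step (illustration only, NOT the actual
# DiffSynth-Studio implementation): the timestep t is sampled uniformly over
# [0, 1], regardless of how many inference steps the turbo model targets.
import torch

def flow_matching_loss(model, x0, prompt_emb):
    t = torch.rand(x0.shape[0], device=x0.device)      # uniform over the full schedule
    noise = torch.randn_like(x0)
    t_ = t.view(-1, *([1] * (x0.dim() - 1)))
    x_t = (1 - t_) * x0 + t_ * noise                    # linear interpolation path
    target = noise - x0                                  # flow-matching velocity target
    pred = model(x_t, t, prompt_emb)                     # hypothetical model signature
    return torch.nn.functional.mse_loss(pred, target)
```

If this is indeed how the sft:train task optimizes the LoRA, is there a recommended way to fine-tune Z-Image-Turbo so that 8-step inference quality is preserved?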