The gradient norm is too large #21

Dear authors, I'm new to DPO and was drawn to the impressive results in this paper, so I tried training the model with the open-sourced code. However, I found that the L2 norm of the gradient is very large (see below). Is this normal in diffusion DPO training? I'm looking forward to your reply, and thanks for your help.

<img width="382" alt="Image" src="https://github.com/user-attachments/assets/ec12e433-6dc4-4a3d-8303-da660c6a74a0" />
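For context, the gradient norm that is usually logged during training is the global L2 norm over all trainable parameters. The sketch below assumes the training script logs the value returned by PyTorch's `torch.nn.utils.clip_grad_norm_`, as many diffusers-style scripts do. This is not confirmed for this repository. That function returns the norm *before* clipping, so a large logged value does not by itself mean the applied update was large. The helpers `global_grad_norm` and `clip_by_global_norm` are hypothetical and only mirror that computation on plain lists of floats.

```python
import math


def global_grad_norm(grads):
    """Hypothetical helper: global L2 norm over all parameter gradients.

    `grads` is a list of per-parameter gradients, each a flat list of floats.
    """
    return math.sqrt(sum(g * g for grad in grads for g in grad))


def clip_by_global_norm(grads, max_norm, eps=1e-6):
    """Hypothetical helper mirroring torch.nn.utils.clip_grad_norm_.

    Every gradient is scaled by the same factor so the global norm is at most
    `max_norm`. Returns (clipped_grads, norm_before_clipping); the second value
    is what a training log would typically show.
    """
    total = global_grad_norm(grads)
    # Same clipping coefficient as PyTorch: max_norm / (total + eps), capped at 1.
    scale = min(1.0, max_norm / (total + eps))
    clipped = [[g * scale for g in grad] for grad in grads]
    return clipped, total
```

For example, gradients `[[3.0, 4.0], [12.0]]` have a global norm of 13.0. With `max_norm=1.0` every entry is scaled by roughly 1/13, yet the reported norm is still 13.0.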
