BETA-Tuned Timestep Distribution #1225
Conversation
Does this apply to all models? Only diffusion models are Beta-sampled during inference. Flow matching models are sampled with linear sigmas and often with timestep-shifting ("Flux-shift"). Is that correct? Did #1124 also only apply to diffusion, not to flow matching?
It’s a tunable distribution, but it’s specifically intended for diffusion models (SD, SDXL, etc.).
Here are examples:
The issue is that #1124 lacks a theoretical basis (it's more of a heuristic method), but it functions similarly. Also, while it supports flow matching by accepting sigmas, requiring both betas and sigmas added too much code.
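For context on the discussion above: the "Flux-shift" applied to flow matching sigmas is commonly written as sigma' = s·sigma / (1 + (s − 1)·sigma). A minimal sketch of that schedule (function names are illustrative, not from this codebase):

```python
import numpy as np

def linear_sigmas(n_steps: int) -> np.ndarray:
    """Flow matching sigmas, spaced linearly from 1 (pure noise) to 0 (clean)."""
    return np.linspace(1.0, 0.0, n_steps + 1)

def shift_sigmas(sigmas: np.ndarray, shift: float = 3.0) -> np.ndarray:
    """Timestep shift as commonly used for Flux/SD3-style models:
    sigma' = s * sigma / (1 + (s - 1) * sigma).
    This warps sampling toward the high-noise end while keeping the
    endpoints sigma = 0 and sigma = 1 fixed."""
    return shift * sigmas / (1.0 + (shift - 1.0) * sigmas)
```

Note the contrast with this PR: the shift above is an inference/training schedule warp for flow matching models, whereas the Beta distribution here reweights which timesteps are drawn during diffusion training.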
Do we have any results of our own showing this actually works on SD1.5 and SDXL, and not only on these specific datasets? The paper only covers training at 32x32, 128x128, and 256x256, which are not resolutions either model can do.
It is a known observation in diffusion papers that the later timesteps are relatively easy for the model compared to others (since most of the image is still noise).
So we haven't tried it for any training at all?
You mean testing? Yes, I tested it in my recent runs (SDXL, 1024) and they went very well.
Isn't the opposite the case? Later (= low) timesteps are hard, and very late timesteps are impossible (which is what MIN_SNR_GAMMA attempted to solve).
I'm hesitant with this PR for two more reasons:
Outdated models have many flaws that recent papers try to address. This is one of those cases; I've read about five papers proposing a similar method (sampling more heavily from 'hard' timesteps). However, I'll close this PR if you aren't planning to support SD/SDXL-specific features anymore.



This PR implements the timestep distribution proposed in the paper:
Beta-Tuned Timestep Diffusion Model
This method aims to align timestep sampling with the diffusion model's forward pass, resulting in faster convergence and improved training performance. The paper observes that the data distribution changes most significantly during the initial timesteps, rendering standard uniform sampling sub-optimal.
Usage
- Set Timestep Distribution to BETA.
- Set Noising bias to 1 (corresponds to Beta in the paper; recommended: 1).
- Set Noising weight to < 1 (corresponds to Alpha in the paper; recommended: 0.8).

Note: This is compatible with existing loss weighting strategies (e.g., Min-SNR, Debiased, etc.).
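A minimal sketch of how Beta-distributed timestep sampling can look, assuming Noising weight maps to the Beta distribution's alpha and Noising bias to its beta (parameter and function names here are illustrative; the actual wiring in this PR may differ):

```python
import torch

def sample_beta_timesteps(
    batch_size: int,
    num_train_timesteps: int = 1000,
    alpha: float = 0.8,  # "Noising weight" in the UI; < 1 recommended
    beta: float = 1.0,   # "Noising bias" in the UI; 1 recommended
) -> torch.Tensor:
    """Draw u ~ Beta(alpha, beta) on [0, 1) and scale to integer timesteps.

    With alpha < 1 and beta = 1 the density is proportional to u**(alpha - 1),
    which concentrates samples at the early timesteps, where (per the paper)
    the data distribution changes most during the forward pass."""
    u = torch.distributions.Beta(alpha, beta).sample((batch_size,))
    t = (u * num_train_timesteps).long()
    return t.clamp(0, num_train_timesteps - 1)
```

With alpha = beta = 1 the Beta distribution reduces to the uniform distribution, so the standard sampler is recovered as a special case.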