feat: enable immediate saving on UI stop and enhance optimizer state backup#727

Open
avan06 wants to merge 2 commits into ostris:main from avan06:save-checkpoint-on-stop
Conversation

avan06 commented Feb 27, 2026

This PR improves the training interruption workflow by ensuring that training progress is captured immediately when a user requests a stop via the UI. It also introduces a backup mechanism for the optimizer state to provide users with more flexibility when resuming or rolling back training.

Key Changes

  1. Immediate Save on UI Stop (DiffusionTrainer)
    Modified maybe_stop to trigger self.save() as soon as a stop signal is detected.
    Added an _is_saving flag to track the saving state, preventing infinite recursion or redundant calls during emergency saves.
    Benefit: Users can now resume training from the precise step of interruption rather than being forced to roll back to the last scheduled checkpoint, so LoRA weights, metadata, and optimizer state stay synchronized at the exit point.

  2. Optimizer State Rotation (BaseSDTrainProcess)
    Implemented a backup system for the optimizer state. When saving, the existing optimizer.pt is moved to optimizer_prev.pt instead of being directly overwritten by the new state.
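The stop-and-save flow from change 1 can be sketched roughly as below. This is a minimal illustration, not the PR's actual code: apart from maybe_stop, save, and _is_saving, the names (request_stop, save_count) are hypothetical stand-ins, and the real DiffusionTrainer does far more work in save().

```python
# Minimal sketch of the save-on-stop pattern (hypothetical simplification of
# the real DiffusionTrainer; only maybe_stop/save/_is_saving come from the PR).
class DiffusionTrainer:
    def __init__(self):
        self._stop_requested = False
        self._is_saving = False  # guards against re-entrant saves
        self.save_count = 0      # stand-in for the real checkpoint logic

    def request_stop(self):
        # Called from the UI when the user presses "stop".
        self._stop_requested = True

    def save(self):
        if self._is_saving:
            return  # already saving: avoid recursion and redundant saves
        self._is_saving = True
        try:
            self.save_count += 1  # real code writes weights + optimizer here
        finally:
            self._is_saving = False

    def maybe_stop(self):
        # Checked once per training step.
        if self._stop_requested:
            self.save()  # capture state at the exact interruption step
            raise KeyboardInterrupt("training stopped by user")
```

The _is_saving guard matters because an emergency save triggered from inside the training loop could otherwise re-enter save() (e.g. via hooks that themselves check the stop signal).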

Example Scenario (How to roll back)

If a user decides they prefer the results from a previous scheduled save over the final interrupted save:

  1. Current State: You have a scheduled save at Step 500 (train_000000500.safetensors and optimizer_prev.pt) and an interruption save at Step 666 (train_000000666.safetensors and optimizer.pt).
  2. Rollback Process:
  • Delete the interruption files: train_000000666.safetensors and optimizer.pt.
  • Rename optimizer_prev.pt to optimizer.pt.
  3. Result: The trainer will successfully resume training from Step 500 using the correct historical optimizer state.
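In shell terms, the rollback above amounts to the following. The demo uses empty stand-in files with the names from the scenario; in a real run you would operate on the actual files in your training output folder.

```shell
# Demo of the rollback procedure with empty stand-in files
# (use your real training output folder in practice).
mkdir -p rollback_demo
touch rollback_demo/train_000000500.safetensors rollback_demo/train_000000666.safetensors
touch rollback_demo/optimizer.pt rollback_demo/optimizer_prev.pt

# 1) Delete the interruption-save artifacts (Step 666)
rm rollback_demo/train_000000666.safetensors rollback_demo/optimizer.pt

# 2) Restore the Step 500 optimizer state
mv rollback_demo/optimizer_prev.pt rollback_demo/optimizer.pt
```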

These changes have been verified in a local environment and are confirmed to be working as intended.

- Modified maybe_stop in DiffusionTrainer to trigger self.save() when a stop signal is detected.
- Added an _is_saving flag to manage saving state and prevent infinite recursion during emergency saves.
- Ensures LoRA weights, metadata, and optimizer state are synchronized at the exact exit step.
- Enables users to resume training from the precise interruption point instead of rolling back to the last scheduled save.
- Now maintains a backup of the previous optimizer state: when saving, the existing optimizer.pt is moved to optimizer_prev.pt rather than being overwritten.
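The rotation step can be sketched as a small helper. This is a hypothetical simplification: the PR implements the rotation inside BaseSDTrainProcess and saves real torch optimizer state, whereas this sketch writes raw bytes to show only the move-then-write ordering.

```python
# Sketch of the optimizer-state rotation; rotate_and_save is a hypothetical
# helper (the PR does this inside BaseSDTrainProcess with torch.save).
import os

def rotate_and_save(state: bytes, save_dir: str) -> None:
    current = os.path.join(save_dir, "optimizer.pt")
    backup = os.path.join(save_dir, "optimizer_prev.pt")
    if os.path.exists(current):
        # Keep the previous state as a backup instead of overwriting it.
        os.replace(current, backup)
    with open(current, "wb") as f:
        f.write(state)
```

os.replace is atomic on a single filesystem, so a crash mid-save leaves at least one intact optimizer state on disk.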