built from the official code implementation of "Concept Sliders: LoRA Adaptors for Precise Control in Diffusion Models", European Conference on Computer Vision (ECCV 2024).
updated for cuda 12.8 and pytorch 2.7.1 and also, believe it or not, the gluon optimizer!
uv init
uv venv --seed
#uv add torch torchvision torchaudio --index pytorch=https://download.pytorch.org/whl/cu124
uv sync --extra cuda
#uv add flash-attn... if you dare...
add `"gluon-experiment @ git+https://github.com/sqcu/gluon-experiment.git"` to the dependencies in ur pyproject.toml...
uv run trainscripts/imagesliders/train_lora-scale-xl.py *args ...
if you know how to install uv, i trust you. i trust you to understand how to reach inside of the pyproject.toml and lockfile... and add the appropriate lines to use the right indices for bsd, apple-mps, linux, and even system-v. you're gonna make it. it's gonna be okay.
remember that pyprojects and lockfiles are there to increase the scope of support and reproducibility of software projects, not to induce playground arguments about whether ALGOLS or LISPS are gonna be the machines of the future.
gluon update!
use these hyperparameters as a starting point to get a reasonable assurance of a gluon orthogonalized optimization run.
gluon requires a warmup run, so you train 2 adapters. the first one runs for like idk 240 to 360 steps with grad accum of at least 8 (necessary for gluon statistics, sorry, diffusion has very bouncy noisy gradient stats).
the warmup run leaves logfiles in a relative directory. read them if you want! running the same script a second time trains the second adapter: it will parse those files to initialize the l0 and l1 terms of the gluon algorithm and start up a muon-optimizer-like training run which will probably converge.
gluon is stable at far higher learning rates than adam, but 1.0 LR still blows up.
network:
  type: "c3lier" # "c3lier" or "lierla"
  rank: 64
  alpha: 16.0
  training_method: "noxattn"
train:
  precision: "bfloat16"
  noise_scheduler: "ddim" # or "ddpm", "lms", "euler_a"
  iterations: 1200
  lr: 0.01 # different semantic meaning in gluon
  optimizer: "gluondist"
  lr_scheduler: "cosine" # or "constant"
  max_denoising_steps: 50
  grad_accum: 8
  grad_clip: 0.1
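for anyone wondering what 'orthogonalized' and 'muon-optimizer-like' mean here: below is a minimal sketch of the newton-schulz iteration that the public muon reference implementation uses to push the singular values of a gradient matrix toward 1 before applying it as an update. this illustrates the general technique only; it is an assumption that gluon-experiment does something in this family, and nothing here is copied from it. `orthogonalize`, `matmul` and `transpose` are hypothetical helper names, and the plain-python matrices are just there to keep the sketch dependency-free.

```python
import random

# Quintic coefficients from the public Muon reference implementation.
NS_A, NS_B, NS_C = 3.4445, -4.7750, 2.0315

def matmul(x, y):
    """Plain-Python matrix product of two lists of rows."""
    cols = list(zip(*y))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in x]

def transpose(x):
    return [list(col) for col in zip(*x)]

def orthogonalize(grad, steps=5, eps=1e-7):
    """Approximately orthogonalize `grad` with Newton-Schulz steps."""
    tall = len(grad) > len(grad[0])
    # Work on the wide orientation so the Gram matrix stays small.
    x = transpose(grad) if tall else [row[:] for row in grad]
    # Frobenius normalization keeps every singular value at or below 1.
    norm = sum(v * v for row in x for v in row) ** 0.5 + eps
    x = [[v / norm for v in row] for row in x]
    for _ in range(steps):
        gram = matmul(x, transpose(x))   # X X^T
        gram2 = matmul(gram, gram)       # (X X^T)^2
        poly = [[NS_B * g + NS_C * g2 for g, g2 in zip(r1, r2)]
                for r1, r2 in zip(gram, gram2)]
        px = matmul(poly, x)
        # X <- a X + (b XX^T + c (XX^T)^2) X
        x = [[NS_A * v + p for v, p in zip(r1, r2)] for r1, r2 in zip(x, px)]
    return transpose(x) if tall else x

random.seed(0)
grad = [[random.gauss(0.0, 1.0) for _ in range(8)] for _ in range(4)]
update = orthogonalize(grad)
gram = matmul(update, transpose(update))  # should sit loosely near identity
```

the result is only roughly orthogonal (the iteration settles singular values into a band around 1 rather than exactly at 1), which is the usual trade for doing it in a handful of matmuls.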
edit f"config-xl-{your_experiment}.yaml" to:
network:
  type: "c3lier" # "c3lier" or "lierla"
  rank: 64
  alpha: 16.0
  training_method: "noxattn"
train:
  precision: "bfloat16"
  noise_scheduler: "ddim" # or "ddpm", "lms", "euler_a"
  iterations: 4000
  lr: 0.00012
  optimizer: "AdamW"
  lr_scheduler: "cosine" # or "constant"
  max_denoising_steps: 850
edit f"prompts-xl-dilora-{your_experiment}.yaml" to be either:
- target: "" # what word for erasing the positive concept from
  positive: "" # concept to erase
  unconditional: "" # word to take the difference from the positive concept
  neutral: "" # starting point for conditioning the target
  action: "enhance" # erase or enhance
  guidance_scale: 4
  resolution: 1024
  dynamic_resolution: false
  batch_size: 1
or
- target: f"{invariant in images}" # what word for erasing the positive concept from
  positive: f"{invariant in images}, {thing ur varying on purpose}" # concept to erase
  unconditional: f"{invariant in images}" # word to take the difference from the positive concept
  neutral: f"{invariant in images}" # starting point for conditioning the target
  action: "enhance" # erase or enhance
  guidance_scale: 4
  resolution: 1024
  dynamic_resolution: false
  batch_size: 1
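the f"..." notation in the second template is shorthand, not something yaml understands. a minimal sketch of filling it in from python, assuming you then write the dict out with whatever yaml writer you like: `dilora_prompt` is a hypothetical helper, and the two example strings are made up.

```python
def dilora_prompt(invariant, varied, action="enhance"):
    """Build one prompts-file entry following the second template."""
    return {
        "target": invariant,
        "positive": f"{invariant}, {varied}",
        "unconditional": invariant,
        "neutral": invariant,
        "action": action,
        "guidance_scale": 4,
        "resolution": 1024,
        "dynamic_resolution": False,
        "batch_size": 1,
    }

# Hypothetical strings, only here to show the shape of the result.
entry = dilora_prompt("a person in a robe", "ornate embroidery")
print(entry["positive"])  # a person in a robe, ornate embroidery
```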
datasets must be made of images with identical filenames in every folder you list in the `--folders` operand to trainscripts/imagesliders/train_lora-scale-xl.py
refactoring that weird choice (to other sorts of datasets made of directed graphs) is a refactor so obvious and tempting we are totally refusing to do it. other deadlines press...
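since mismatched filenames are an easy way to waste a run, here is a small pre-flight check you could run before training. it is a sketch under assumptions: `check_paired_folders` is a hypothetical helper that is not part of this repo, and it assumes every folder sits directly under the `--folder_main` directory.

```python
from pathlib import Path

def check_paired_folders(folder_main, folders):
    """Return the shared filenames, or raise if any folder disagrees."""
    names = {
        folder: {p.name for p in (Path(folder_main) / folder).iterdir() if p.is_file()}
        for folder in folders
    }
    reference = names[folders[0]]
    for folder, found in names.items():
        if found != reference:
            missing = sorted(reference - found)
            extra = sorted(found - reference)
            raise ValueError(f"{folder}: missing {missing}, unexpected {extra}")
    return sorted(reference)
```

call it with the same strings you plan to pass on the command line, e.g. `check_paired_folders('datasets/assym_dilora', ['base_one_minus', 'base_one', 'base_one_plus', 'base_k'])`.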
our suggested training template:
--name 'sldr_dilora_frsht_robe_III'
--rank 96 --alpha 48
--config_file 'trainscripts/imagesliders/data/config-xl-{your_experiment}.yaml'
--folder_main 'datasets/assym_dilora' --folders 'base_one_minus, base_one, base_one_plus, base_k' --scales '0.67, 1, 1.3, 2.0'
the names are helpful clues for how to use this approach but are not necessary to program operation.
the 'scales' are semantically meaningful: changing the interval between these numbers changes how the 'slider' learns to separate and fuse visual ideas in very definite ways.
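to make the folder/scale pairing concrete, here is a sketch of how the two comma-separated strings line up. it assumes the script splits both on commas and matches them by position; `pair_folders_with_scales` is a hypothetical name, not a function from the training script.

```python
def pair_folders_with_scales(folders_arg, scales_arg):
    """Match each folder name to the scale in the same position."""
    folders = [name.strip() for name in folders_arg.split(",")]
    scales = [float(value) for value in scales_arg.split(",")]
    if len(folders) != len(scales):
        raise ValueError(f"{len(folders)} folders but {len(scales)} scales")
    return list(zip(folders, scales))

pairs = pair_folders_with_scales(
    "base_one_minus, base_one, base_one_plus, base_k",
    "0.67, 1, 1.3, 2.0",
)
for folder, scale in pairs:
    print(f"{folder}: {scale}")
```

a length check like this catches the case where a folder is added without a matching scale.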
argparse operands override everything that looks like a related parameter in the yaml config files. this is inherited from upstream. think of this as a bug-for-bug reproduction of the upstream sliders implementation, to make it more obvious how little must be changed to extend the behavior of this sort of loss function.
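the override order can be sketched like this. it is an illustration of the precedence only, assuming `--rank` and `--alpha` map onto the `network` section of the yaml; `apply_overrides` is a hypothetical helper and the real script may wire this differently.

```python
import argparse

def apply_overrides(config, argv):
    """Return a copy of `config` with any CLI values winning over it."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--rank", type=int, default=None)
    parser.add_argument("--alpha", type=float, default=None)
    args = parser.parse_args(argv)
    merged = {section: dict(values) for section, values in config.items()}
    if args.rank is not None:
        merged["network"]["rank"] = args.rank
    if args.alpha is not None:
        merged["network"]["alpha"] = args.alpha
    return merged

yaml_config = {"network": {"type": "c3lier", "rank": 64, "alpha": 16.0}}
merged = apply_overrides(yaml_config, ["--rank", "96", "--alpha", "48"])
print(merged["network"])  # {'type': 'c3lier', 'rank': 96, 'alpha': 48.0}
```

so with the suggested template, the adapter trains at rank 96 / alpha 48 even though the yaml says 64 / 16.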
if you use comfyui i am very sorry and i hope you recover, someday, somehow. maybe if just 12 more video essayists and 3 more dormant non-programmer patreons pick up your preferred 'workflow' you'll finally figure out the obvious and very smart deployment case that makes your text2image so unique and interesting and different from everyone else's all this time...?
for everyone else: the dynamic prompts extension to the automatic-like webuis supports really easy scripting.
`{slider:0|slider:0.8|slider:1.1|slider:2.7}` w/ fixed seeds and combinatorial generation will make it very easy to sample your 'fractional differences' & explore the transitions between slider-multiplier-conditioned model behavior.
if you are rolling your own inference, this is even easier! i think i've said it all with 'combinatorial generation and fixed seeds', haven't i?
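in case it helps, 'combinatorial generation and fixed seeds' boils down to something like the sketch below. `sampling_grid` is a hypothetical helper and the seed values are arbitrary; how you feed each seed and slider scale into your pipeline is up to your inference code.

```python
from itertools import product

def sampling_grid(multipliers, seeds):
    """Yield one job per (seed, slider multiplier) combination."""
    for seed, multiplier in product(seeds, multipliers):
        yield {"seed": seed, "slider_scale": multiplier}

jobs = list(sampling_grid([0, 0.8, 1.1, 2.7], [1234, 5678]))
for job in jobs:
    print(job)
```

holding the seed fixed while the multiplier changes means any difference between neighboring images comes from the slider, not from the noise.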
The upstream's preprint can be cited as follows:
@inproceedings{gandikota2023erasing,
title={Concept Sliders: LoRA Adaptors for Precise Control in Diffusion Models},
author={Rohit Gandikota and Joanna Materzy\'nska and Tingrui Zhou and Antonio Torralba and David Bau},
booktitle={Proceedings of the 2024 IEEE European Conference on Computer Vision},
note={arXiv preprint arXiv:2311.12092},
year={2024}
}
huh?