SQCU/sliders

 
 


fractional visual semantic offsets and beyond: a messy extension of Concept Sliders

built from the official code implementation of "Concept Sliders: LoRA Adaptors for Precise Control in Diffusion Models", European Conference on Computer Vision (ECCV 2024).

setup:

updated for cuda 12.8 and pytorch 2.7.1 and also, believe it or not, the gluon optimizer!

uv init
uv venv --seed 
#uv add torch torchvision torchaudio --index pytorch=https://download.pytorch.org/whl/cu124
uv sync --extra cuda
#uv add flash-attn... if you dare...
# add "gluon-experiment @ git+https://github.com/sqcu/gluon-experiment.git" to the dependency list in ur pyproject.toml...

uv run trainscripts/imagesliders/train_lora-scale-xl.py *args

... if you know how to install uv, i trust you. i trust you to understand how to reach inside of the pyproject.toml and lockfile... and add the appropriate lines to use the right indices for bsd, apple-mps, linux, and even system-v. you're gonna make it. it's gonna be okay.

remember that pyprojects and lockfiles are there to increase the scope of support and reproducibility of software projects, not to induce playground arguments about whether ALGOLS or LISPS are gonna be the machines of the future.

config:

gluon update!

use the hyperparameters below as a starting point for a gluon orthogonalized optimization run that is reasonably likely to work.

gluon requires a warmup run, so you train 2 adapters. the first is the warmup: run it for like idk 240 to 360 steps with grad accum of at least 8 (necessary for gluon statistics, sorry, diffusion has very bouncy noisy gradient stats).

the warmup run leaves logfiles in a relative directory. read them if you want! running the same script a second time trains the second adapter: it parses those files to initialize the l0 and l1 terms of the gluon algorithm and starts up a muon-optimizer-like training run which will probably converge.

gluon is stable at far higher learning rates than adam, but 1.0 LR still blows up.

network:
  type: "c3lier" # or "lierla"
  rank: 64
  alpha: 16.0
  training_method: "noxattn"
train:
  precision: "bfloat16"
  noise_scheduler: "ddim" # or "ddpm", "lms", "euler_a"
  iterations: 1200
  lr: 0.01  # different semantic meaning in gluon
  optimizer: "gluondist"
  lr_scheduler: "cosine"  # or "constant"
  max_denoising_steps: 50
  grad_accum: 8
  grad_clip: 0.1

for a plain AdamW run (no gluon), edit f"config-xl-{your_experiment}.yaml" to:

network:
  type: "c3lier" # or "lierla"
  rank: 64
  alpha: 16.0
  training_method: "noxattn"
train:
  precision: "bfloat16"
  noise_scheduler: "ddim" # or "ddpm", "lms", "euler_a"
  iterations: 4000
  lr: 0.00012
  optimizer: "AdamW"
  lr_scheduler: "cosine"  # or "constant"
  max_denoising_steps: 850

edit f"prompts-xl-dilora-{your_experiment}.yaml" to be either:

- target: "" # what word for erasing the positive concept from
  positive: "" # concept to erase
  unconditional: "" # word to take the difference from the positive concept
  neutral: "" # starting point for conditioning the target
  action: "enhance" # erase or enhance
  guidance_scale: 4
  resolution: 1024
  dynamic_resolution: false
  batch_size: 1

or

- target: f"{invariant in images}" # what word for erasing the positive concept from
  positive: f"{invariant in images}, {thing ur varying on purpose}" # concept to erase
  unconditional: f"{invariant in images}" # word to take the difference from the positive concept
  neutral: f"{invariant in images}" # starting point for conditioning the target
  action: f"enhance" # erase or enhance
  guidance_scale: 4
  resolution: 1024
  dynamic_resolution: false
  batch_size: 1

datasets must be made of images with identical filenames in every folder: each folder you list in the argparse operands to trainscripts/imagesliders/train_lora-scale-xl.py holds one variant of every image, saved under the same filename.

refactoring that weird choice (to other sorts of datasets made of directed graphs) is a refactor so obvious and tempting we are totally refusing to do it. other deadlines press...
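since the training script expects the same filenames in every folder, a quick check before a long run can save a wasted warmup. this is a minimal sketch and not part of the repo: `mismatched_files`, `image_names`, and the extension list are all hypothetical names and assumptions.

```python
from pathlib import Path

# assumption: which extensions count as images; adjust to your data
IMAGE_EXTS = {".png", ".jpg", ".jpeg", ".webp"}


def image_names(folder):
    """Return the set of image filenames directly inside `folder`."""
    return {p.name for p in Path(folder).iterdir() if p.suffix.lower() in IMAGE_EXTS}


def mismatched_files(folder_main, folders):
    """Map each folder to the filenames it lacks, relative to the union of all folders.

    An empty dict means every folder holds the same set of filenames.
    """
    names = {f: image_names(Path(folder_main) / f) for f in folders}
    union = set().union(*names.values())
    return {f: sorted(union - found) for f, found in names.items() if union - found}
```

calling `mismatched_files('datasets/assym_dilora', ['base_one_minus', 'base_one', 'base_one_plus', 'base_k'])` should return an empty dict when the dataset is laid out the way the script wants.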

our suggested training template:

--name 'sldr_dilora_frsht_robe_III' 
--rank 96 --alpha 48 
--config_file 'trainscripts/imagesliders/data/config-xl-{your_experiment}.yaml'
--folder_main 'datasets/assym_dilora' --folders 'base_one_minus, base_one, base_one_plus, base_k' --scales '0.67, 1, 1.3, 2.0'

the names are helpful clues for how to use this approach but are not necessary to program operation.

the 'scales' are semantically meaningful: changing the interval between these numbers changes how the 'slider' learns to separate and fuse visual ideas in very definite ways.
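to see the intervals you are actually asking for, it can help to pair each folder with its scale and print the gaps. this sketch assumes the comma-separated string format from the template above and a one-to-one pairing of folders and scales; `parse_list` and `pair_folders_with_scales` are hypothetical helpers, and the script's real parsing may differ.

```python
def parse_list(arg, cast=str):
    """Split a comma-separated CLI string and cast each stripped, non-empty item."""
    return [cast(item.strip()) for item in arg.split(",") if item.strip()]


def pair_folders_with_scales(folders_arg, scales_arg):
    """Pair each folder with its scale; assumes one scale per folder, in order."""
    folders = parse_list(folders_arg)
    scales = parse_list(scales_arg, float)
    if len(folders) != len(scales):
        raise ValueError(f"{len(folders)} folders but {len(scales)} scales")
    return list(zip(folders, scales))


pairs = pair_folders_with_scales(
    "base_one_minus, base_one, base_one_plus, base_k",
    "0.67, 1, 1.3, 2.0",
)
# gaps between neighbouring scales, rounded to hide float noise
intervals = [round(b - a, 2) for (_, a), (_, b) in zip(pairs, pairs[1:])]
for folder, scale in pairs:
    print(f"{folder}: {scale}")
print("intervals:", intervals)
```

with the template values the gaps are uneven (0.33, 0.3, 0.7), which is the kind of spacing choice the paragraph above is talking about.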

argparse operands override everything that looks like a related parameter in the yaml config files. this is inherited from upstream. think of this as a bug-for-bug reproduction of the upstream sliders implementation, to make it more obvious how little must be changed to extend the behavior of this sort of loss function.
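the override order can be sketched like this. it is an illustration of the precedence only, not the script's actual code: the plain dict stands in for the parsed yaml, and `apply_overrides` is a hypothetical helper.

```python
import argparse


def apply_overrides(config, args):
    """Return a copy of `config` where CLI values that were actually given win over yaml values."""
    merged = {section: dict(values) for section, values in config.items()}
    if args.rank is not None:
        merged["network"]["rank"] = args.rank
    if args.alpha is not None:
        merged["network"]["alpha"] = args.alpha
    return merged


parser = argparse.ArgumentParser()
# default=None lets us tell "not given" apart from a real value
parser.add_argument("--rank", type=int, default=None)
parser.add_argument("--alpha", type=float, default=None)

# stand-in for the parsed yaml config (assumption: same keys as the config above)
yaml_config = {"network": {"type": "c3lier", "rank": 64, "alpha": 16.0}}
args = parser.parse_args(["--rank", "96", "--alpha", "48"])
effective = apply_overrides(yaml_config, args)
print(effective["network"])
```

so with the suggested template, the run uses rank 96 and alpha 48 even though the yaml says 64 and 16.0.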

eval and inference:

if you use comfyui i am very sorry and i hope you recover, someday, somehow. maybe if just 12 more video essayists and 3 more dormant non-programmer patreons pick up your preferred 'workflow' you'll finally figure out the obvious and very smart deployment case that makes your text2image so unique and interesting and different from everyone else's all this time...?

for everyone else: the dynamic prompts extension to the automatic-like webuis supports really easy scripting. `{slider:0|slider:0.8|slider:1.1|slider:2.7}` w/ fixed seeds and combinatorial generation will make it very easy to sample your 'fractional differences' & explore the transitions between slider-multiplier-conditioned model behavior.

if you are rolling your own inference, this is even easier! i think i've said it all with 'combinatorial generation and fixed seeds', haven't i?
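for the roll-your-own case, 'combinatorial generation and fixed seeds' boils down to a seed x multiplier grid. a minimal sketch, assuming nothing about your pipeline: `sampling_grid`, the prompt, and the seeds are made up for illustration, and each job dict is meant to be handed to whatever generate call you already have.

```python
from itertools import product

# multipliers from the dynamic prompts example above
SLIDER_SCALES = [0, 0.8, 1.1, 2.7]
# arbitrary seeds; what matters is that they are reused across every multiplier
SEEDS = [1234, 5678]


def sampling_grid(prompt, seeds, scales):
    """Build one job per (seed, slider multiplier) pair, seeds varying slowest."""
    return [
        {"prompt": prompt, "seed": seed, "slider_scale": scale}
        for seed, scale in product(seeds, scales)
    ]


grid = sampling_grid("a person wearing a robe", SEEDS, SLIDER_SCALES)
for job in grid:
    print(job)
```

because each seed appears once per multiplier, any difference between images that share a seed comes from the slider multiplier alone.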

Citing the upstream work:

The upstream's preprint can be cited as follows:

@inproceedings{gandikota2023erasing,
  title={Concept Sliders: LoRA Adaptors for Precise Control in Diffusion Models},
  author={Rohit Gandikota and Joanna Materzy\'nska and Tingrui Zhou and Antonio Torralba and David Bau},
  booktitle={Proceedings of the European Conference on Computer Vision (ECCV)},
  note={arXiv preprint arXiv:2311.12092},
  year={2024}
}

citing this work:

huh?
