fractional visual semantic offsets and beyond: a messy extension of Concept Sliders

Upstream Project Website | Arxiv Preprint |

built from the official code implementation of "Concept Sliders: LoRA Adaptors for Precise Control in Diffusion Models", European Conference on Computer Vision (ECCV 2024).

setup:

updated for cuda 12.8 and pytorch 2.7.1 and also, believe it or not, the gluon optimizer!

uv init
uv venv --seed 
#uv add torch torchvision torchaudio --index pytorch=https://download.pytorch.org/whl/cu124
uv sync --extra cuda
#uv add flash-attn... if you dare...
`"gluon-experiment @ git+https://github.com/sqcu/gluon-experiment.git"` in ur pyproject.toml...

uv run trainscripts/imagesliders/train_lora-scale-xl.py *args ...

if you know how to install uv, i trust you. i trust you to understand how to reach inside of the pyproject.toml and lockfile... and add the appropriate lines to use the right indices for bsd, apple-mps, linux, and even system-v. you're gonna make it. it's gonna be okay.

remember that pyprojects and lockfiles are there to increase the scope of support and reproducibility of software projects, not to induce playground arguments about whether ALGOLS or LISPS are gonna be the machines of the future.

config:

gluon update!

use these hyperparameters as a starting point to get a reasonable assurance of a gluon orthogonalized optimization run.

gluon requires a warmup run, so you end up training 2 adapters. the first one (the warmup) runs for like idk 240 to 360 steps with grad accum of at least 8 (necessary for gluon statistics, sorry, diffusion has very bouncy noisy gradient stats).

the warmup leaves log files in a relative directory. read them if you want! running the same script a second time parses those files to initialize the l0 and l1 terms of the gluon algorithm and starts up a muon-optimizer-like training run (the second adapter), which will probably converge.
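a minimal sketch of what "initialize the l0 and l1 terms" could mean, assuming the terms are read the way (L0, L1)-smoothness is usually written: local smoothness of a layer is roughly l0 + l1 * (gradient norm). the log format of this repo is not documented here, so the input pairs, the function name `fit_l0_l1`, and the sample numbers below are all made up for illustration. this is not the parser the training script uses.

```python
from statistics import mean


def fit_l0_l1(grad_norms, smoothness_estimates):
    """Least-squares fit of smoothness ~= l0 + l1 * grad_norm, clamped at zero."""
    if len(grad_norms) != len(smoothness_estimates) or len(grad_norms) < 2:
        raise ValueError("need at least two matched samples")
    g_bar = mean(grad_norms)
    s_bar = mean(smoothness_estimates)
    var_g = sum((g - g_bar) ** 2 for g in grad_norms)
    if var_g == 0.0:
        # every gradient norm is identical: no slope information, so put it all in l0
        return max(s_bar, 0.0), 0.0
    cov = sum(
        (g - g_bar) * (s - s_bar)
        for g, s in zip(grad_norms, smoothness_estimates)
    )
    l1 = max(cov / var_g, 0.0)
    l0 = max(s_bar - l1 * g_bar, 0.0)
    return l0, l1


# hypothetical warmup statistics for a single layer (assumption, not real log data)
grad_norms = [0.5, 1.0, 1.5, 2.0]
smoothness = [2.0, 3.0, 4.0, 5.0]  # constructed as 1.0 + 2.0 * grad_norm
l0, l1 = fit_l0_l1(grad_norms, smoothness)
```

noisy gradient stats make a fit like this jumpy, which is one way to see why the warmup wants grad accum of at least 8 before anything gets estimated.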

gluon is stable at far higher learning rates than adam, but 1.0 LR still blows up.

network:
  type: "c3lier" # or "lierla"
  rank: 64
  alpha: 16.0
  training_method: "noxattn"
train:
  precision: "bfloat16"
  noise_scheduler: "ddim" # or "ddpm", "lms", "euler_a"
  iterations: 1200
  lr: 0.01  # different semantic meaning in gluon
  optimizer: "gluondist"
  lr_scheduler: "cosine"  # or "constant"
  max_denoising_steps: 50
  grad_accum: 8
  grad_clip: 0.1

for a plain AdamW run instead, edit f"config-xl-{your_experiment}.yaml" to:

network:
  type: "c3lier" # or "lierla"
  rank: 64
  alpha: 16.0
  training_method: "noxattn"
train:
  precision: "bfloat16"
  noise_scheduler: "ddim" # or "ddpm", "lms", "euler_a"
  iterations: 4000
  lr: 0.00012
  optimizer: "AdamW"
  lr_scheduler: "cosine"  # or "constant"
  max_denoising_steps: 850

edit f"prompts-xl-dilora-{your_experiment}.yaml" to be either:

- target: "" # what word for erasing the positive concept from
  positive: "" # concept to erase
  unconditional: "" # word to take the difference from the positive concept
  neutral: "" # starting point for conditioning the target
  action: "enhance" # erase or enhance
  guidance_scale: 4
  resolution: 1024
  dynamic_resolution: false
  batch_size: 1

or

- target: f"{invariant in images}" # what word for erasing the positive concept from
  positive: f"{invariant in images}, {thing ur varying on purpose}" # concept to erase
  unconditional: f"{invariant in images}" # word to take the difference from the positive concept
  neutral: f"{invariant in images}" # starting point for conditioning the target
  action: "enhance" # erase or enhance
  guidance_scale: 4
  resolution: 1024
  dynamic_resolution: false
  batch_size: 1
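the second template is just string substitution, so here is a tiny sketch that builds one entry from the two pieces you actually choose. `make_prompt_entry` and the example strings are hypothetical, not something shipped in this repo; the fixed values are copied from the template above.

```python
def make_prompt_entry(invariant, varied, action="enhance"):
    """Fill the prompt template: the invariant goes everywhere, the varied part only in `positive`."""
    if action not in ("erase", "enhance"):
        raise ValueError(f"action must be 'erase' or 'enhance', got {action!r}")
    return {
        "target": invariant,
        "positive": f"{invariant}, {varied}",
        "unconditional": invariant,
        "neutral": invariant,
        "action": action,
        "guidance_scale": 4,
        "resolution": 1024,
        "dynamic_resolution": False,
        "batch_size": 1,
    }


# example strings are placeholders, swap in whatever your dataset holds constant
entry = make_prompt_entry("a portrait photo of a person", "wearing a robe")
```

the point of the template is that `positive` is the only field carrying the thing you vary on purpose, so the difference against `unconditional` isolates it.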

datasets must be made of images with identical filenames spread across every folder you include in a list of argparse operands to trainscripts/imagesliders/train_lora-scale-xl.py
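a quick pre-flight check for that layout, as a sketch: it only compares filenames across folders and says nothing about image contents. `check_matching_filenames` is a hypothetical helper, not part of the training script, and the demo builds a throwaway directory so it runs anywhere.

```python
import tempfile
from pathlib import Path


def check_matching_filenames(folder_main, folders):
    """Return the shared sorted filename list, or raise if any folder differs."""
    root = Path(folder_main)
    names_per_folder = {
        name: {p.name for p in (root / name).iterdir() if p.is_file()}
        for name in folders
    }
    reference_name = folders[0]
    reference = names_per_folder[reference_name]
    for name, names in names_per_folder.items():
        if names != reference:
            diff = sorted(names ^ reference)
            raise ValueError(f"{name} and {reference_name} disagree on: {diff}")
    return sorted(reference)


# self-contained demo on a temporary directory tree with empty placeholder files
with tempfile.TemporaryDirectory() as tmp:
    for folder in ("base_one", "base_k"):
        (Path(tmp) / folder).mkdir()
        for filename in ("0001.png", "0002.png"):
            (Path(tmp) / folder / filename).touch()
    shared = check_matching_filenames(tmp, ["base_one", "base_k"])
```

running something like this before a long training job is cheaper than finding out about a missing file partway through.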

refactoring that weird choice (to other sorts of datasets made of directed graphs) is a refactor so obvious and tempting we are totally refusing to do it. other deadlines press...

our suggested training template:

--name 'sldr_dilora_frsht_robe_III' 
--rank 96 --alpha 48 
--config_file 'trainscripts/imagesliders/data/config-xl-{your_experiment}.yaml'
--folder_main 'datasets/assym_dilora' --folders 'base_one_minus, base_one, base_one_plus, base_k' --scales '0.67, 1, 1.3, 2.0'

the names are helpful clues for how to use this approach but are not necessary to program operation.

the 'scales' are semantically meaningful: changing the interval between these numbers changes how the 'slider' learns to separate and fuse visual ideas in very definite ways.
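to make the "interval between these numbers" concrete, here is a sketch that pairs each folder with its scale and prints the gaps between neighbours. it assumes both operands arrive as comma-separated strings (the upstream convention, assumed here rather than checked against this fork); `pair_folders_with_scales` is a hypothetical name.

```python
def pair_folders_with_scales(folders_arg, scales_arg):
    """Split the two comma-separated operands and zip them into (folder, scale) pairs."""
    folders = [f.strip() for f in folders_arg.split(",") if f.strip()]
    scales = [float(s) for s in scales_arg.split(",") if s.strip()]
    if len(folders) != len(scales):
        raise ValueError(
            f"{len(folders)} folders but {len(scales)} scales; they must line up one to one"
        )
    return list(zip(folders, scales))


pairs = pair_folders_with_scales(
    "base_one_minus, base_one, base_one_plus, base_k",
    "0.67, 1, 1.3, 2.0",
)
# gap between each scale and the next one, rounded for readability
intervals = [round(b - a, 2) for (_, a), (_, b) in zip(pairs, pairs[1:])]
```

with the suggested template the gaps are uneven (0.33, 0.3, 0.7), which is the knob the paragraph above is talking about.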

argparse operands override everything that looks like a related parameter in the yaml config files. this is inherited from upstream. think of this as a bug-for-bug reproduction of the upstream sliders implementation, to make it more obvious how little must be changed to extend the behavior of this sort of loss function.
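a sketch of that precedence rule, assuming only `--rank` and `--alpha` for brevity (both appear in the training template; which other operands the real script overrides is not shown here). `apply_cli_overrides` is a hypothetical helper and the dict stands in for the parsed yaml.

```python
import argparse


def apply_cli_overrides(config, cli_args):
    """Copy the config, then let any operand that was actually passed win over the yaml value."""
    merged = {section: dict(values) for section, values in config.items()}
    if cli_args.rank is not None:
        merged["network"]["rank"] = cli_args.rank
    if cli_args.alpha is not None:
        merged["network"]["alpha"] = cli_args.alpha
    return merged


parser = argparse.ArgumentParser()
parser.add_argument("--rank", type=int, default=None)
parser.add_argument("--alpha", type=float, default=None)

# stand-in for the `network:` section of the yaml config
yaml_config = {"network": {"type": "c3lier", "rank": 64, "alpha": 16.0}}
args = parser.parse_args(["--rank", "96", "--alpha", "48"])
effective = apply_cli_overrides(yaml_config, args)
```

so with the suggested template, the yaml's rank 64 / alpha 16.0 are silently replaced by rank 96 / alpha 48.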

eval and inference:

if you use comfyui i am very sorry and i hope you recover, someday, somehow. maybe if just 12 more video essayists and 3 more dormant non-programmer patreons pick up your preferred 'workflow' you'll finally figure out the obvious and very smart deployment case that makes your text2image so unique and interesting and different from everyone else's all this time...?

for everyone else: the dynamic prompts extension to the automatic-like webuis supports really easy scripting. `{slider:0|slider:0.8|slider:1.1|slider:2.7}` w/ fixed seeds and combinatorial generation will make it very easy to sample your 'fractional differences' & explore the transitions between slider-multiplier-conditioned model behavior.

if you are rolling your own inference, this is even easier! i think i've said it all with 'combinatorial generation and fixed seeds', haven't i?
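spelled out as a sketch anyway: enumerate every (seed, slider multiplier) pair so each seed gets rendered at every multiplier. how a job's `slider_scale` and `seed` get fed into your pipeline is left out on purpose, and `build_jobs` plus the prompt and seed values are hypothetical.

```python
from itertools import product


def build_jobs(prompt, slider_scales, seeds):
    """One job per (seed, scale) pair, grouped by seed so transitions are easy to compare."""
    return [
        {"prompt": prompt, "slider_scale": scale, "seed": seed}
        for seed, scale in product(seeds, slider_scales)
    ]


# the multipliers match the dynamic prompts example; prompt and seeds are placeholders
jobs = build_jobs("a portrait photo", [0.0, 0.8, 1.1, 2.7], [1234, 5678])
```

holding the seed fixed across a row means any visible change comes from the slider multiplier and not from the noise.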

Citing the upstream work:

The upstream's preprint can be cited as follows:

@inproceedings{gandikota2023erasing,
  title={Concept Sliders: LoRA Adaptors for Precise Control in Diffusion Models},
  author={Rohit Gandikota and Joanna Materzy\'nska and Tingrui Zhou and Antonio Torralba and David Bau},
  booktitle={Proceedings of the European Conference on Computer Vision (ECCV)},
  note={arXiv preprint arXiv:2311.12092},
  year={2024}
}

citing this work:

huh?
