Hi @zhengkw18 @jt-zhang, thanks for your great efforts!
Regarding `turbodiffusion/scripts/merge_models.py`: could it be used as a general practice for merging step-distillation weights (e.g., DMD, discrete / continuous CD, adversarial distillation, etc.) with sparse-attention weights (e.g., VSA, STA, radial attention, etc.) in order to combine their strengths?
Or is it just a nice property of rCM combined with SLA? Thanks in advance.