SQCU/futudiffu

futudiffu

future diffusers.

what is this repository?

the future of diffusers!

no, seriously?

modern deep learning models are trained by unsupervised learning on lots of different data.

the more they see, the more they learn.

but modern deep learning models are not just pretrained and then released.

there are other things you have to do besides 'pretraining' to make a usable machine learning model that people can deploy and run.

this repository fills several important gaps in 'midtraining' and 'posttraining', enabling task adaptation of diffusion models.

major features:

  • various kernels you shouldn't need to look at
  • verifiable reward functions as usage examples, promoting reward design as ordinary software development
  • pairwise ranking reward model training code to teach unsupervised models to 'look' for visual features
  • two demonstration BTRM (Bradley-Terry reward model) heads for the PINKIFY and THISNOTTHAT rankings
  • total liberation from comfyui; we're all free now, you never need to drag the nodes/noodles around again.
  • todo: step-count and activation-quantization distillation reward models, as an alternative to reward-weighted odds-maximization distillation
  • DRGPO for denoising diffusion (porting in progress)
  • todo: total replacement of the buggy first-pass shim codebase
  • todo: quantization-aware distillation training for the SSDIT text encoder
  • todo: VLM-as-judge RLVR support (super advanced feature: requires cross-integration with primeintellect environments to train judge VLMs)
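
The pairwise ranking objective behind BTRM-style reward training can be sketched with the standard Bradley-Terry loss. This is a minimal scalar sketch that assumes BTRM means a Bradley-Terry reward model; `bradley_terry_loss`, `batch_bt_loss`, and `preference_probability` are hypothetical names, not this repo's API.

```python
import math


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    if x > 0:
        return x + math.log1p(math.exp(-x))
    return math.log1p(math.exp(x))


def bradley_terry_loss(r_preferred: float, r_rejected: float) -> float:
    """-log sigmoid(r_preferred - r_rejected), computed as softplus(r_rejected - r_preferred)."""
    return softplus(r_rejected - r_preferred)


def batch_bt_loss(pairs):
    """Mean Bradley-Terry loss over (r_preferred, r_rejected) reward-score pairs."""
    losses = [bradley_terry_loss(rp, rr) for rp, rr in pairs]
    return sum(losses) / len(losses)


def preference_probability(r_a: float, r_b: float) -> float:
    """Model probability that item a is preferred over item b: sigmoid(r_a - r_b)."""
    return math.exp(-softplus(r_b - r_a))
```

Only score differences enter the loss, so a head trained this way is determined only up to an additive constant.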

r_theta validation

This is a compact demonstration that reward models implemented as low-rank adapters over pretrained models reuse the existing residual stream and the feature circuits learned by unsupervised objectives.
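
A low-rank adapter keeps the pretrained weight frozen and learns only a small update, so a reward head on top of it reads features the base model already computes. The pure-Python sketch below illustrates that idea under stated assumptions: `LoRALinear`, `reward_head`, the zero-initialized `B`, the `alpha / rank` scaling, and the mean-pooling are illustrative choices, not this repo's implementation.

```python
import random


def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector (list)."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


class LoRALinear:
    """Frozen base weight plus a trainable low-rank update: W x + scale * B (A x)."""

    def __init__(self, weight, rank, alpha=1.0, seed=0):
        rng = random.Random(seed)
        d_out, d_in = len(weight), len(weight[0])
        self.weight = weight  # pretrained d_out x d_in weight, never updated
        # A: rank x d_in with small random init; B: d_out x rank with zero init,
        # so at initialization the adapter is an exact no-op on the base layer.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(d_out)]
        self.scale = alpha / rank

    def __call__(self, x):
        base = matvec(self.weight, x)
        delta = matvec(self.B, matvec(self.A, x))
        return [b + self.scale * d for b, d in zip(base, delta)]


def reward_head(hidden_states, w):
    """Hypothetical scalar head: mean-pool token states, then dot with weights w."""
    n = len(hidden_states)
    pooled = [sum(col) / n for col in zip(*hidden_states)]
    return sum(p * wi for p, wi in zip(pooled, w))
```

Training only `A`, `B`, and the head weights leaves the pretrained features untouched, which is the property the validation below probes.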

  • A reward adapter (r_theta) is trained via BTRM to predict whether an image is more or less pink, and whether it is more like reference_image_a and less like reference_image_b.
  • The composites below show reference-model (no adapter) sampling trajectories on the left and r_theta-intervened model trajectories on the right.
  • Plots show per-step BTRM scores from both the pinkify and thisnotthat reward heads, for both the reference model's sampling trajectories and the reward-intervened model's sampling trajectories.
  • Reward models trained to detect pinkness don't make sampled images more pink; reward adapters are not policy adapters.

Laser shark composite -- reference vs r_theta intervention, 30-step trajectory with BTRM scores

Portrait composite -- reference vs r_theta intervention, 30-step trajectory with BTRM scores

Shrimp field composite -- reference vs r_theta intervention, 30-step trajectory with BTRM scores
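
Because a pinkness target is cheap to check in code, preference pairs for BTRM training can be labeled automatically. The sketch below assumes images are flat lists of RGB tuples in [0, 1] and uses a made-up `pinkness` heuristic; `pinkness`, `preference_pairs`, and `min_gap` are hypothetical, and the repo's actual pinkify labeling may differ.

```python
from itertools import combinations


def pinkness(image):
    """Hypothetical verifiable score: mean of (R + B) / 2 - G over RGB pixels in [0, 1]."""
    total = 0.0
    for r, g, b in image:
        total += (r + b) / 2.0 - g
    return total / len(image)


def preference_pairs(images, min_gap=0.0):
    """Return (preferred_index, rejected_index) pairs ranked by pinkness.

    Pairs whose absolute score gap is not above min_gap are dropped as ties.
    """
    scores = [pinkness(img) for img in images]
    pairs = []
    for i, j in combinations(range(len(images)), 2):
        gap = scores[i] - scores[j]
        if abs(gap) <= min_gap:
            continue
        pairs.append((i, j) if gap > 0 else (j, i))
    return pairs
```

Pairs labeled this way train a head that scores pinkness; on its own, that does not make sampled images pinker.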

policy intervention validation

scripts_ii/validate_policy_intervention.py is a resumable, incremental-persistence script that compares DDGRPO-trained policy adapters against a BTRM-only reference across prompts, seeds, and resolutions.

Garden 512x512 -- ref / v2 / v2b policy comparison with BTRM score trajectories

Cabin 1280x832 -- ref / v2 / v2b policy comparison with BTRM score trajectories
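
The resume-and-append pattern described for scripts_ii/validate_policy_intervention.py can be sketched as follows. This is an illustration under assumptions, not the script's code: the JSONL layout, `run_key`, `run_grid`, and the `evaluate` callback are hypothetical. It only shows how a grid over prompts, seeds, resolutions, and policy variants can skip cells already persisted on disk.

```python
import itertools
import json
import os


def run_key(prompt, seed, resolution, variant):
    """Stable string key for one (variant, seed, resolution, prompt) grid cell."""
    width, height = resolution
    return f"{variant}|{seed}|{width}x{height}|{prompt}"


def load_done(path):
    """Return the keys already persisted in the JSONL results file."""
    done = set()
    if not os.path.exists(path):
        return done
    with open(path, encoding="utf-8") as f:
        for line in f:
            try:
                done.add(json.loads(line)["key"])
            except (json.JSONDecodeError, KeyError):
                continue  # blank or truncated line from an interrupted run
    return done


def _needs_newline(path):
    """True if the file exists, is non-empty, and its last byte is not a newline."""
    if not os.path.exists(path) or os.path.getsize(path) == 0:
        return False
    with open(path, "rb") as f:
        f.seek(-1, os.SEEK_END)
        return f.read(1) != b"\n"


def run_grid(path, prompts, seeds, resolutions, variants, evaluate):
    """Evaluate every grid cell missing from `path`; return how many were new."""
    done = load_done(path)
    needs_newline = _needs_newline(path)
    n_new = 0
    with open(path, "a", encoding="utf-8") as f:
        if needs_newline:
            f.write("\n")  # isolate a truncated tail so new records stay parseable
        for prompt, seed, res, variant in itertools.product(prompts, seeds, resolutions, variants):
            key = run_key(prompt, seed, res, variant)
            if key in done:
                continue  # finished by an earlier run
            result = evaluate(prompt, seed, res, variant)
            f.write(json.dumps({"key": key, "result": result}) + "\n")
            f.flush()  # hand each record to the OS before starting the next cell
            done.add(key)
            n_new += 1
    return n_new
```

Appending one record per cell means an interrupted run loses at most the cell in progress, and rerunning with a larger grid only evaluates the new cells.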

if this is a diffusion model repo, where do i click on buttons and write 'prompts'?

start the inference server:

uv run python scripts_ii\launch_server.py

then launch the yeetums ui, pointed at that server:

uv run python scripts_ii\launch_yeetums.py --inference-url http://localhost:8000 --port 8079

why?

brain hurt after cramming for MATS / Anthropic Fellows code screens in ancient dead languages no longer used in ml; needed a cooldown exercise.