Compare our Stable Diffusion implementation against Hugging Face Candle's reference implementation, stage by stage, to locate where the incorrect images originate.
Setup:
- Reference implementation cloned to .hf/candle/
- Candle Stable Diffusion example built and ready
- Both implementations use the same model: runwayml/stable-diffusion-v1-5
- Test prompt: "a cat on a beach"
CLIP text encoder:
- Run both implementations with the same prompt
- Compare the (77, 768) text embeddings (77 token positions × 768 hidden units)
- Verify tokenization, attention, and MLP computations
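A minimal sketch of the embedding comparison, assuming both (77, 768) tensors have already been exported as flat, row-major f32 slices. The struct and function names are hypothetical helpers, not part of either codebase. A global cosine similarity near 1 with a small maximum difference suggests the text encoder agrees; the per-token variant shows whether disagreement is concentrated in particular positions (for example the padding after the end-of-text token), which would point at tokenization or padding rather than attention or MLP math.

```rust
/// How far two flattened embedding tensors diverge (hypothetical helper).
#[derive(Debug)]
pub struct EmbeddingDiff {
    pub max_abs_diff: f32,
    pub mean_abs_diff: f32,
    pub cosine_similarity: f32,
}

/// Compares two embeddings stored row-major as flat slices (e.g. 77 * 768 values).
/// Accumulates in f64 so the comparison itself adds negligible rounding error.
pub fn compare_embeddings(ours: &[f32], reference: &[f32]) -> EmbeddingDiff {
    assert_eq!(ours.len(), reference.len(), "embedding shapes differ");
    let (mut dot, mut norm_a, mut norm_b) = (0.0f64, 0.0f64, 0.0f64);
    let (mut max_abs, mut sum_abs) = (0.0f64, 0.0f64);
    for (&a, &b) in ours.iter().zip(reference) {
        let (a, b) = (a as f64, b as f64);
        dot += a * b;
        norm_a += a * a;
        norm_b += b * b;
        let d = (a - b).abs();
        max_abs = max_abs.max(d);
        sum_abs += d;
    }
    // Guard against division by zero for all-zero inputs.
    let denom = (norm_a.sqrt() * norm_b.sqrt()).max(f64::EPSILON);
    EmbeddingDiff {
        max_abs_diff: max_abs as f32,
        mean_abs_diff: (sum_abs / ours.len().max(1) as f64) as f32,
        cosine_similarity: (dot / denom) as f32,
    }
}

/// Cosine similarity for each token row of width `hidden` (768 for SD v1.5's CLIP).
pub fn per_token_cosine(ours: &[f32], reference: &[f32], hidden: usize) -> Vec<f32> {
    ours.chunks(hidden)
        .zip(reference.chunks(hidden))
        .map(|(a, b)| compare_embeddings(a, b).cosine_similarity)
        .collect()
}
```

Calling per_token_cosine(&ours, &candle, 768) yields 77 values; a sharp drop after a specific index is a quick hint about where the two pipelines start to disagree.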
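Our implementation:
cargo run --release -- generate --prompt "a cat on a beach" --steps 1 --output /tmp/ours_clip_test.png
Candle reference (the example defaults to Stable Diffusion 2.1, so v1.5 must be selected explicitly; --cpu is an argument of the example binary, so it goes after the -- separator):
cd .hf/candle && cargo run --example stable-diffusion --release -- --cpu --sd-version v1-5 --prompt "a cat on a beach" --n-steps 1 --final-image /tmp/candle_clip_test.png
Noise schedule:
- Verify that the schedule values match (SD v1.5 uses a scaled-linear beta schedule, not a cosine one)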
- Check timestep embeddings
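A sketch of the scaled-linear schedule for checking our noise-test output, assuming SD v1.5's published scheduler settings (beta_start = 0.00085, beta_end = 0.012, 1000 training timesteps). Betas are spaced linearly in square-root space and then squared; alpha_bar_t is the running product of (1 - beta_t). The function names are hypothetical.

```rust
/// Scaled-linear beta schedule: linspace(sqrt(start), sqrt(end), n), then squared.
pub fn scaled_linear_betas(beta_start: f64, beta_end: f64, n: usize) -> Vec<f64> {
    let (s, e) = (beta_start.sqrt(), beta_end.sqrt());
    (0..n)
        .map(|i| {
            // Fraction of the way through the schedule, in [0, 1].
            let frac = if n > 1 { i as f64 / (n - 1) as f64 } else { 0.0 };
            let b = s + (e - s) * frac;
            b * b
        })
        .collect()
}

/// alpha_bar_t = prod_{i <= t} (1 - beta_i).
pub fn alphas_cumprod(betas: &[f64]) -> Vec<f64> {
    let mut acc = 1.0;
    betas
        .iter()
        .map(|b| {
            acc *= 1.0 - b;
            acc
        })
        .collect()
}
```

Printing a handful of entries (say t = 0, 1, 500, 999) from both implementations is usually enough; a cosine or plain-linear schedule would already differ visibly at t = 500.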
Our noise schedule values:
cargo run --release -- noise-test
Diffusion loop:
- Run both with the same seed (if possible)
- Compare latent outputs at key timesteps
- Check that raw UNet predictions look like noise (mean near 0, std on the order of 1)
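Because the two implementations almost certainly use different random number generators, the same seed will not give the same initial latent; a more reliable check is to feed identical latents and identical UNet outputs into each scheduler step. The sketch below assumes a DDIM scheduler with eta = 0 and an epsilon-predicting UNet (SD v1.5's prediction type), plus standard classifier-free guidance. The function names are hypothetical.

```rust
/// Classifier-free guidance: eps = eps_uncond + scale * (eps_text - eps_uncond).
/// Swapping the unconditional and text halves of a batched UNet call changes the result.
pub fn apply_cfg(eps_uncond: &[f32], eps_text: &[f32], scale: f32) -> Vec<f32> {
    eps_uncond
        .iter()
        .zip(eps_text)
        .map(|(&u, &c)| u + scale * (c - u))
        .collect()
}

/// One deterministic DDIM update (eta = 0), assuming epsilon prediction.
/// `alpha_bar_t` / `alpha_bar_prev` are cumulative-product schedule values.
pub fn ddim_step(latent: &[f32], eps: &[f32], alpha_bar_t: f64, alpha_bar_prev: f64) -> Vec<f32> {
    assert_eq!(latent.len(), eps.len());
    let (sa_t, s1a_t) = (alpha_bar_t.sqrt(), (1.0 - alpha_bar_t).sqrt());
    let (sa_p, s1a_p) = (alpha_bar_prev.sqrt(), (1.0 - alpha_bar_prev).sqrt());
    latent
        .iter()
        .zip(eps)
        .map(|(&x, &e)| {
            let (x, e) = (x as f64, e as f64);
            // Invert x_t = sqrt(a_t) * x0 + sqrt(1 - a_t) * eps to estimate x0.
            let x0 = (x - s1a_t * e) / sa_t;
            // Re-noise the x0 estimate to the previous timestep along the same eps.
            (sa_p * x0 + s1a_p * e) as f32
        })
        .collect()
}
```

If both schedulers produce the same output from the same inputs, any remaining divergence in the latents must come from the UNet itself.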
VAE decoder:
- Compare upsampling logic
- Check channel projections (4 latent channels → 3 RGB channels)
- Verify that the output is mapped from the decoder's [-1, 1] range to [0, 1]
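A sketch of the steps around the VAE decoder call, assuming SD v1.x's latent scaling factor of 0.18215 and a channel-first (3, H, W) decoder output. Latents are divided by the scaling factor before decoding, the output is mapped from [-1, 1] to [0, 1], and the planar CHW buffer is interleaved into HWC RGB bytes; skipping the division or writing CHW data as if it were HWC both produce distorted images. All names here are hypothetical.

```rust
/// Latent scaling factor for SD v1.x (assumed; SDXL uses a different value).
pub const VAE_SCALE: f32 = 0.18215;

/// Undo the latent scaling before passing latents to the VAE decoder.
pub fn unscale_latents(latents: &[f32]) -> Vec<f32> {
    latents.iter().map(|&v| v / VAE_SCALE).collect()
}

/// Map decoder output from roughly [-1, 1] to [0, 1], clamping outliers.
pub fn decoder_to_unit(x: &[f32]) -> Vec<f32> {
    x.iter().map(|&v| (v / 2.0 + 0.5).clamp(0.0, 1.0)).collect()
}

/// Convert a planar (3, h, w) image in [0, 1] into interleaved RGB bytes (h, w, 3).
pub fn chw_to_rgb8(img: &[f32], h: usize, w: usize) -> Vec<u8> {
    assert_eq!(img.len(), 3 * h * w, "expected a (3, h, w) buffer");
    let mut out = Vec::with_capacity(3 * h * w);
    for y in 0..h {
        for x in 0..w {
            for c in 0..3 {
                let v = img[c * h * w + y * w + x];
                out.push((v * 255.0).round() as u8);
            }
        }
    }
    out
}
```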
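Expected behavior:
- CLIP Embeddings: Should have a plausible range and distribution and closely match Candle's (cosine similarity near 1)
- Noise Schedule: Values should match Candle's exactly, up to floating-point precision
- Timestep Embeddings: Should encode the timestep the same way Candle does (same frequencies, same sin/cos ordering)
- UNet Output: Should be a noise (epsilon) prediction, not unstructured garbage
- VAE Upsampling: Should upsample (64, 64) → (512, 512), a factor of 8 per spatial dimension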
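A sketch of the sinusoidal timestep embedding for comparing this stage, modeled on diffusers' get_timestep_embedding and assuming SD v1.5's UNet settings (flip_sin_to_cos = true, freq_shift = 0, embedding width 320 from the first block's channel count). With the flip, the cosine half comes first; a swapped ordering or an off-by-one in the frequency denominator is easy to miss because both variants still look like valid embeddings. The function name is hypothetical.

```rust
/// Sinusoidal timestep embedding (assumed SD v1.5 settings: cos half first, no freq shift).
pub fn timestep_embedding(t: f64, dim: usize) -> Vec<f32> {
    assert!(dim % 2 == 0, "dim must be even");
    let half = dim / 2;
    let max_period: f64 = 10_000.0;
    // freq_i = exp(-ln(max_period) * i / half) for i in 0..half.
    let freqs: Vec<f64> = (0..half)
        .map(|i| (-max_period.ln() * i as f64 / half as f64).exp())
        .collect();
    let cos = freqs.iter().map(|f| (t * f).cos() as f32);
    let sin = freqs.iter().map(|f| (t * f).sin() as f32);
    // flip_sin_to_cos = true: concatenate [cos, sin].
    cos.chain(sin).collect()
}
```

Comparing timestep_embedding(981.0, 320) element by element with the corresponding tensor in each implementation isolates this stage from the rest of the UNet.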
Debugging approach:
- Extract CLIP embeddings from both implementations and save them as tensors
- Compute statistics (mean, std, min, max)
- Compare attention weights
- Compare the diffusion latents after each step
- Visualize intermediate latents
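A sketch of the dump-and-summarize workflow, assuming a hypothetical raw format (little-endian f32 values, shape tracked separately) chosen only because both implementations can write it with the standard library. The statistics use the population standard deviation (divide by n) and also count NaNs, since a single NaN silently poisons every later stage.

```rust
use std::fs::File;
use std::io::{self, BufWriter, Read, Write};
use std::path::Path;

#[derive(Debug)]
pub struct Stats {
    pub mean: f64,
    pub std: f64, // population std (divides by n)
    pub min: f32,
    pub max: f32,
    pub nan_count: usize,
}

/// Mean, std, min, max, and NaN count of a flattened tensor.
pub fn stats(x: &[f32]) -> Stats {
    assert!(!x.is_empty(), "empty tensor");
    let n = x.len() as f64;
    let mean = x.iter().map(|&v| v as f64).sum::<f64>() / n;
    let var = x.iter().map(|&v| (v as f64 - mean).powi(2)).sum::<f64>() / n;
    Stats {
        mean,
        std: var.sqrt(),
        // f32::min / f32::max ignore NaN operands, so these reflect finite values.
        min: x.iter().copied().fold(f32::INFINITY, f32::min),
        max: x.iter().copied().fold(f32::NEG_INFINITY, f32::max),
        nan_count: x.iter().filter(|v| v.is_nan()).count(),
    }
}

/// Write a tensor as raw little-endian f32 values (hypothetical interchange format).
pub fn write_f32_le(path: &Path, data: &[f32]) -> io::Result<()> {
    let mut w = BufWriter::new(File::create(path)?);
    for v in data {
        w.write_all(&v.to_le_bytes())?;
    }
    w.flush()
}

/// Read back a file written by `write_f32_le`; trailing partial values are ignored.
pub fn read_f32_le(path: &Path) -> io::Result<Vec<f32>> {
    let mut bytes = Vec::new();
    File::open(path)?.read_to_end(&mut bytes)?;
    Ok(bytes
        .chunks_exact(4)
        .map(|c| f32::from_le_bytes([c[0], c[1], c[2], c[3]]))
        .collect())
}
```

Printing stats for the CLIP output, each step's latent, and the decoded image from both pipelines gives a compact table; the first stage where the numbers diverge is the one to inspect in detail.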