Apply DreamBooth finetuning for pose conditioning T2I (ControlNet)


BFCmath/PoseDreamBooth

DreamBooth & ControlNet: Pose-Conditioned Identity Preservation

This repository contains the implementation, experiments, and research notes for an AI Residency project focused on fine-tuning Stable Diffusion models for concurrent identity preservation and pose guidance.


πŸ“‚ Repository Navigation

  • dreambooth/: Core DreamBooth implementation for subject identity learning.
  • controlnet/: Advanced fine-tuning of ControlNet integrated with DreamBooth architectures.
  • docs/: Project documentation, task descriptions, and technical concepts.
    • πŸ“„ Full Project Report
  • paper/: Detailed research summaries and pseudocode for relevant SOTA methods (HyperHuman, MagicPose, etc.).

πŸ”¬ Research & Implementation Summary

This project explores the fine-tuning of Stable Diffusion v1.5 to generate specific subjects (identity) in user-defined configurations (pose).

1. Phase I: Baseline and Human Subjects

  • DreamBooth Reproduction: Successfully learned specific subjects with minimal data; identified prompt fidelity vs. identity trade-offs.
  • Human Pose Integration: Combined ControlNet with DreamBooth. Discovered that 200–600 steps (avg. 400) and LoRA ranks $\geq$ 16 provide the optimal balance for identity preservation.
  • Key Finding: Fine-tuning the Text Encoder is critical for learning complex human identities but risks "catastrophic forgetting" of structural concepts.
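The identity/prompt-fidelity trade-off above stems from DreamBooth's two-term objective: a reconstruction loss on the subject images plus a prior-preservation loss on generic class images. A minimal PyTorch sketch of combining the two terms (the batch layout and the `prior_weight` value follow the common diffusers DreamBooth convention and are illustrative, not taken from this repo):

```python
import torch
import torch.nn.functional as F

def dreambooth_loss(model_pred, target, prior_weight=1.0):
    """Combine the instance and prior-preservation loss terms.

    Assumes the batch is [instance_samples | class_samples],
    concatenated along dim 0 in equal halves.
    """
    # Split the concatenated batch back into its two halves.
    pred_inst, pred_prior = torch.chunk(model_pred, 2, dim=0)
    tgt_inst, tgt_prior = torch.chunk(target, 2, dim=0)

    instance_loss = F.mse_loss(pred_inst, tgt_inst)   # learn the subject
    prior_loss = F.mse_loss(pred_prior, tgt_prior)    # preserve the class prior
    return instance_loss + prior_weight * prior_loss

# Toy usage with random noise predictions (4 = 2 instance + 2 class samples):
pred = torch.randn(4, 4, 8, 8)
target = torch.randn(4, 4, 8, 8)
loss = dreambooth_loss(pred, target, prior_weight=1.0)
```

Setting `prior_weight` to zero recovers naive fine-tuning, which is where the "catastrophic forgetting" noted above tends to appear.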

2. Phase II: The Humanoid Challenge (Unitree G1 Robot)

Transitioned to a significantly harder domain: a robot subject with non-human morphology and sparse training data (3–6 images).

  • Constraint: OpenPose struggles with the robot's non-human joint structure, limiting the quality of pose annotations for the training set.
  • ControlNet Bias: Pre-trained ControlNets exhibit a strong "human bias," making it difficult to maintain robot morphology in extreme or unusual poses.
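Since OpenPose reports a confidence score per detected keypoint, one practical mitigation for the dataset-quality constraint is to keep only frames where enough joints are detected confidently. A small sketch (the 0.3 confidence threshold and 60% coverage requirement are illustrative assumptions, not values from this repo):

```python
def usable_pose(keypoints, conf_threshold=0.3, min_coverage=0.6):
    """keypoints: list of (x, y, confidence) triples, one per joint.

    Returns True if enough joints were detected confidently for the
    frame to serve as a ControlNet conditioning image.
    """
    if not keypoints:
        return False
    confident = sum(1 for (_, _, c) in keypoints if c >= conf_threshold)
    return confident / len(keypoints) >= min_coverage

# A robot frame where only 4 of 18 joints are confident is rejected:
robot_pose = [(0.5, 0.5, 0.9)] * 4 + [(0.0, 0.0, 0.1)] * 14
human_pose = [(0.5, 0.5, 0.8)] * 18
usable_pose(robot_pose)   # → False
usable_pose(human_pose)   # → True
```

With only 3–6 usable images to begin with, this kind of filter mainly helps decide which frames are worth hand-correcting rather than discarding.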

3. Advanced Methodologies

To address overfitting and structural bias, several research-backed techniques were implemented:

  • Custom Diffusion Optimization:
    • K/V Attention Tuning: Trained only the Key (K) and Value (V) projections in cross-attention layers.
    • Embedding Training: Optimized the [V] rare-token embedding exclusively, which reduced structural forgetting but resulted in lower identity fidelity.
  • Multi-Stage Training (MagicPose Style):
    • Stage 1 (Appearance): Isolated identity training without ControlNet interference.
    • Stage 2 (Pose): Structural guidance training with the identity-aware Text Encoder frozen.
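The K/V-only tuning above amounts to freezing every parameter except the `to_k`/`to_v` cross-attention projections, following Custom Diffusion. A minimal PyTorch sketch on a toy attention block (the module and attribute names mirror diffusers' `to_q`/`to_k`/`to_v` convention; the block itself is illustrative, not this repo's model):

```python
import torch.nn as nn

class CrossAttention(nn.Module):
    """Toy stand-in for a diffusers cross-attention layer."""
    def __init__(self, dim=64, ctx_dim=128):
        super().__init__()
        self.to_q = nn.Linear(dim, dim)
        self.to_k = nn.Linear(ctx_dim, dim)  # keyed on text embeddings
        self.to_v = nn.Linear(ctx_dim, dim)
        self.to_out = nn.Linear(dim, dim)

def freeze_all_but_kv(model):
    """Leave only Key/Value projections trainable (Custom Diffusion style)."""
    for name, param in model.named_parameters():
        param.requires_grad = name.startswith(("to_k", "to_v"))
    return sorted(n for n, p in model.named_parameters() if p.requires_grad)

attn = CrossAttention()
trainable = freeze_all_but_kv(attn)
# trainable → ['to_k.bias', 'to_k.weight', 'to_v.bias', 'to_v.weight']
```

Because only the projections that read the text embedding are updated, the query pathway and the rest of the U-Net keep their pre-trained structural knowledge, which is what reduces the forgetting described above.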

🏁 Key Conclusions

  1. Pose-Identity Conflict: Precise subject generation in "hard" (non-human) poses remains a major challenge due to the inherent human-centric bias in pre-trained spatial adapters.
  2. Overfitting vs. Generalization: Naive DreamBooth training often causes the model to "forget" structural flexibility. Strategic dropout and targeted parameter tuning (e.g., K/V attention) are essential for maintaining pose adherence.
  3. Future Work: Bridging the gap between the specific morphology of non-human subjects and general spatial conditioning models.

Tip: Refer to IDEA.md for a deep dive into the technical papers that inspired these implementations.
