ComfyUI custom nodes for using NVIDIA PiD as a pixel diffusion decoder.
PiD is not a normal ComfyUI VAE. It needs a latent, a prompt/caption, a sigma value, and optionally a native decoder baseline image:
LATENT + caption + sigma + optional baseline IMAGE -> PiD -> IMAGE
For the official latent-conditioned PiD checkpoints, this node can infer the baseline size from the latent and skip the extra VAE/baseline image path to reduce VRAM use.
- Direct PiD Decode node that returns a ComfyUI IMAGE.
- Staged low-VRAM workflow: PiD Prepare → PiD Sample → PiD Finalize.
- PiD Sample runs in a subprocess so CUDA memory is released after sampling.
- PiD KSampler Capture for grabbing an intermediate latent and matching sigma.
- Lazy setup: PiD source, checkpoints, and required assets are prepared on first run when auto_download=true.
- Optional sequential block offload for lower VRAM at the cost of speed.
Clone into ComfyUI/custom_nodes:
cd ComfyUI/custom_nodes
git clone https://github.com/Merserk/ComfyUI-PiD.git
cd ComfyUI-PiD
python -m pip install -r requirements.txt
Restart ComfyUI.
Requirements:
- Python >=3.10
- NVIDIA CUDA GPU
- Working ComfyUI install
- Enough VRAM for PiD, especially for 2kto4k or large output scales
requirements.txt does not install PyTorch because ComfyUI usually provides it.
| Node | Purpose |
|---|---|
| PiD Decode | One-node PiD decode from latent to image. |
| PiD Text Prompt | One prompt box with text for CLIP and caption for PiD. |
| PiD KSampler Capture | KSampler-compatible sampler that returns final latent, captured PiD latent, and sigma. |
| PiD Prepare | Prepares latent, caption, checkpoint, assets, and metadata on CPU. |
| PiD Sample | Runs the heavy PiD sampling step in a subprocess. |
| PiD Finalize | Converts sampled PiD output back to ComfyUI IMAGE. |
| PiD Decode (Staged) | Convenience wrapper around the staged path. |
| Value | Backbone | Latent channels | Checkpoints |
|---|---|---|---|
| zimage | Z-Image / Flux-compatible | 16 | 2k, 2kto4k |
| flux | Flux | 16 | 2k, 2kto4k |
| flux2 | Flux2 | 128 | 2k, 2kto4k |
| sd3 | Stable Diffusion 3 | 16 | 2k, 2kto4k |
| dinov2 | DINOv2 RAE | 768 | 2k |
| siglip | SigLIP Scale-RAE | 1152 | 2k |
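The table can double as a pre-flight check before a heavy decode. The sketch below is illustrative only and is not part of the node pack: the helper name check_backbone is hypothetical, and it assumes the latent is laid out as [batch, channels, height, width], as ComfyUI latents usually are.

```python
# Values copied from the backbone table; the helper itself is a hypothetical sketch.
BACKBONES = {
    "zimage": {"channels": 16, "checkpoints": ("2k", "2kto4k")},
    "flux": {"channels": 16, "checkpoints": ("2k", "2kto4k")},
    "flux2": {"channels": 128, "checkpoints": ("2k", "2kto4k")},
    "sd3": {"channels": 16, "checkpoints": ("2k", "2kto4k")},
    "dinov2": {"channels": 768, "checkpoints": ("2k",)},
    "siglip": {"channels": 1152, "checkpoints": ("2k",)},
}


def check_backbone(backbone: str, pid_ckpt_type: str, latent_shape: tuple) -> None:
    """Raise ValueError if the backbone, checkpoint type, and latent shape disagree."""
    if backbone not in BACKBONES:
        raise ValueError(f"unknown backbone: {backbone!r}")
    spec = BACKBONES[backbone]
    if pid_ckpt_type not in spec["checkpoints"]:
        raise ValueError(f"{backbone} has no {pid_ckpt_type!r} checkpoint")
    # Assumption: latent_shape is (batch, channels, height, width).
    if len(latent_shape) != 4:
        raise ValueError(f"expected a 4D latent shape, got {latent_shape}")
    if latent_shape[1] != spec["channels"]:
        raise ValueError(
            f"{backbone} expects {spec['channels']} latent channels, "
            f"got {latent_shape[1]}"
        )


# A 16-channel latent passes for zimage but would fail for flux2.
check_backbone("zimage", "2k", (1, 16, 64, 64))
```

A mismatch here typically means the latent came from a different model family than the selected backbone.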
scale=0 uses NVIDIA's default scale for the selected checkpoint: usually 4x, or 8x for SigLIP Scale-RAE.
For Z-Image / Flux-style workflows:
PiD Text Prompt text -> CLIP Text Encode
PiD Text Prompt caption -> PiD Decode caption
KSampler latent -> PiD Decode latent
PiD Decode image -> Save Image
Recommended first test settings:
backbone = zimage
pid_ckpt_type = 2k
pid_steps = 4
scale = 1 or 2
cfg_scale = 1.0
sigma = 0.0
auto_download = true
unload_comfy_before_pid = true
aggressive_cleanup = true
sequential_offload = disabled
For official latent-conditioned checkpoints, leave vae and baseline_image disconnected unless you specifically need an external baseline size.
Use the staged nodes when VRAM is tight:
PiD KSampler Capture pid_latent -> PiD Prepare latent
PiD Text Prompt caption -> PiD Prepare caption
PiD Prepare -> PiD Sample
PiD Sample -> PiD Finalize
PiD Finalize image -> Save Image
Recommended Z-Image capture settings:
steps = 50
sampler_name = euler
scheduler = beta
capture_step = 46
PiD Sample runs in a separate Python process, so its CUDA context is destroyed after the sample is finished.
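The general pattern can be sketched with the standard library alone. This is not the node's actual implementation: the worker body, the job fields, and the run_in_subprocess name are placeholders. The point is that any GPU memory a child process allocates is returned to the driver when that process exits.

```python
import json
import subprocess
import sys

# Placeholder worker: a real one would load the model and sample on the GPU.
WORKER_SOURCE = """
import json, sys
job = json.load(sys.stdin)
result = {"steps_run": job["pid_steps"]}
json.dump(result, sys.stdout)
"""


def run_in_subprocess(job: dict) -> dict:
    """Run the worker in a fresh Python process and return its JSON result."""
    proc = subprocess.run(
        [sys.executable, "-c", WORKER_SOURCE],
        input=json.dumps(job),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if the worker fails
    )
    # The child has exited by now, so everything it allocated is released.
    return json.loads(proc.stdout)


result = run_in_subprocess({"pid_steps": 4})
```

The trade-off is startup cost: each run pays for process creation and model loading again in exchange for a clean memory state afterwards.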
512x512 base + 2k + scale 4 -> 2048x2048
1024x1024 base + 2kto4k + scale 4 -> 4096x4096
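The arithmetic behind these examples is base size times scale. A minimal sketch follows; the function names are hypothetical, and the scale=0 branch simplifies "usually 4x" to always 4x for every backbone except siglip, which may not hold for every checkpoint.

```python
def resolve_scale(scale: int, backbone: str = "zimage") -> int:
    """Map scale=0 to an assumed default: 8 for siglip, otherwise 4."""
    if scale < 0:
        raise ValueError("scale must be >= 0")
    if scale == 0:
        return 8 if backbone == "siglip" else 4
    return scale


def output_size(base_w: int, base_h: int, scale: int, backbone: str = "zimage"):
    """Return the (width, height) of the PiD output for a given base size."""
    s = resolve_scale(scale, backbone)
    return base_w * s, base_h * s


print(output_size(512, 512, 4))    # (2048, 2048)
print(output_size(1024, 1024, 4))  # (4096, 4096)
```

Pixel count grows with the square of the scale, so dropping from scale 4 to scale 2 cuts the output to a quarter of the pixels.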
Large outputs can require a lot of VRAM. If a run fails, try:
- Lower scale.
- Use a smaller base latent.
- Keep cleanup options enabled.
- Try sequential_blocks, then sequential_blocks_aggressive.
- Restart ComfyUI after CUDA allocator crashes.
By default, the node uses:
ComfyUI/custom_nodes/ComfyUI-PiD/vendor/PiD
You can override the PiD source location with:
- the pid_source_dir node input
- PID_REPO_DIR
- COMFYUI_PID_REPO_DIR
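One plausible way to combine these overrides is sketched below. Two things here are assumptions rather than documented behavior: that PID_REPO_DIR and COMFYUI_PID_REPO_DIR are read as environment variables, and that they are checked in the order listed, after the node input. The function name is hypothetical.

```python
import os
from pathlib import Path

# Default location from this README, relative to the ComfyUI parent directory.
DEFAULT_PID_DIR = Path("ComfyUI/custom_nodes/ComfyUI-PiD/vendor/PiD")


def resolve_pid_source_dir(pid_source_dir: str = "", env=None) -> Path:
    """Pick the first non-empty override, falling back to the vendored default.

    Assumed precedence: node input, then PID_REPO_DIR, then COMFYUI_PID_REPO_DIR.
    """
    env = os.environ if env is None else env
    candidates = [
        pid_source_dir,
        env.get("PID_REPO_DIR", ""),
        env.get("COMFYUI_PID_REPO_DIR", ""),
    ]
    for candidate in candidates:
        if candidate and candidate.strip():
            return Path(candidate.strip()).expanduser()
    return DEFAULT_PID_DIR


# With no overrides set, the vendored default is used.
print(resolve_pid_source_dir("", env={}))
```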
When auto_download=true, the node downloads missing PiD source/checkpoints/assets as needed.
A template workflow is included in:
example_workflows/image_z_image_pid.json
After restarting ComfyUI, open it from the workflow templates or load the JSON manually.
- This is a community wrapper around NVIDIA's public PiD code, not an official NVIDIA or ComfyUI project.
- PiD outputs IMAGE, not a ComfyUI VAE.
- NVIDIA's PiD weights may have separate license/usage terms. Check the model card before commercial use.
- Final latents with sigma=0.0 can work, but captured intermediate latents usually better match the official PiD recipe.
This project is released under the MIT License.