feat: Support NVIDIA PixelDiT and PiD (CORE-201) #14103

Merged: comfyanonymous merged 23 commits into Comfy-Org:master from kijai:pixeldit on May 27, 2026.


@kijai (Collaborator) commented May 25, 2026

Adds support for the NVIDIA PixelDiT T2I image model, as well as the new PiD models.

Models (NSCLv1 license):

https://huggingface.co/Comfy-Org/PixelDiT


PixelDiT text to image

pixeldit_test_01.json
[Image: Screenshot 2026-05-25 202425]


PiD

Encode - decode example:

pid_512-2048_flux1_upscale_example_02.json

Z-image to 4096 example:

z_image_turbo_to_pid_03.json

@coderabbitai (Bot) commented May 25, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 88ec5108-8a46-4d54-a2cc-0210aa722af0

📥 Commits

Reviewing files that changed from the base of the PR and between adcf14d and 81ba159.

📒 Files selected for processing (1)
  • comfy/ldm/pixeldit/pid.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • comfy/ldm/pixeldit/pid.py

📝 Walkthrough

This pull request adds PixelDiT, a pixel-space multimodal diffusion model, and PiD, a pixel-diffusion decoder variant, to ComfyUI. The implementation includes new pixel- and patch-level transformer modules, a Gemma2-2B-based text encoder/tokenizer, PiD low-quality latent injection and gating, updates to model detection/registration, model wrappers, and a ComfyUI node for PiD conditioning.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 5.71%, which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title 'feat: Support NVIDIA PixelDiT and PiD (CORE-201)' accurately describes the main change (adding support for two NVIDIA image models) and is clear and specific. |
| Description check | ✅ Passed | The description is directly related to the changeset, referencing the NVIDIA PixelDiT and PiD models being added, providing context about licensing, and including workflow examples. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


@iperide commented May 25, 2026

Hi, quick question about NVIDIA licensing.
For the new PixelDiT model on HuggingFace, NVIDIA explicitly lists NSCLv1 as the license, which allows redistribution as long as it remains non‑commercial:

  • redistribution allowed under the same license
  • derivative works allowed
  • non‑commercial use only

I didn’t find the restrictive clause (“may not be distributed, deployed…”) on the HF page or in the LICENSE file.

Since NVIDIA sometimes applies multiple overlapping licenses to their research models, could you clarify whether we should always assume the more restrictive NVIDIA Research license applies, even when the HuggingFace page only shows NSCLv1?

Just trying to understand the policy you follow for ComfyUI integration.

Thanks.

@jprsyt5 commented May 25, 2026

Really want to try PiD, but I can't get access to the TE model yet.
Does it also work with Gemma 2 2B from lumina repackaged? https://huggingface.co/Comfy-Org/Lumina_Image_2.0_Repackaged/tree/main/split_files/text_encoders

@kijai (Collaborator, Author) commented May 25, 2026

These are two different repos. The new PiD model uses PixelDiT as its base, and this PR implements both.

PixelDiT license: https://huggingface.co/nvidia/PixelDiT-1300M-1024px/blob/main/LICENSE

3.1 Redistribution. You may reproduce or distribute the Work only if (a) you do so under this license, (b) you include a complete copy of this license with your distribution, and (c) you retain without modification any copyright, patent, trademark, or attribution notices that are present in the Work.

So it's okay to repackage and share as long as the license is included.


PiD license however is this: https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-internal-scientific-research-and-development-model-license/

3.1 The Model and any Derivative Model may not be distributed, deployed, sublicensed, publicly displayed, publicly performed, or sublicensed by You. You may not use the Model or a Derivative Model in a production environment or for the purpose of generating works for sale or distribution.

Edit: They have now changed the license to match PixelDiT: https://huggingface.co/nvidia/PiD/commit/b87dba45e5a2b2a18bac9515fca883f52b957558

@kijai (Collaborator, Author) commented May 25, 2026

I've made the repo public now: https://huggingface.co/Comfy-Org/PixelDiT

@jprsyt5 commented May 25, 2026

Thank you!

Just tested it, but I'm still a bit confused.

Is PiD sensitive to the dimensions/image size?

I tried a vertical image with a source size of 1088×1440, then downscaled it by 0.5 using the Upscale Image By node.

After that, I used another Image Resize node and set it to 2048×2048 with the method set to keep proportion, and used the height & width output from this node to adjust the latent width/height, but I got weird results.

[Images: python_V0fRpe3ANL, ComfyUI_temp_vpiip_00002_]

If I increase it to something like 2880×2880, the image looks fine.

[Image: python_0G5Yr8PNmJ]

I also noticed it always adds a green tint at the bottom of the image, even though I don't see that in your example results. Any idea why?

[Images: ComfyUI_temp_vpiip_00004_, ComfyUI_temp_vpiip_00005_]

@kijai (Collaborator, Author) commented May 25, 2026

I added a better explanation of the different models. It's important to choose the one closest to your resolution, as their naming is a bit confusing.

@TheNeObr commented

I wrote my own implementation, but for some reason yours just doesn't work...
[Images: FLUX, FLUX2]

Any idea what might be causing this problem?

@kijai (Collaborator, Author) commented May 25, 2026

Maybe the input image is too large? That selected model works with 512x512 inputs.

Also make sure to use the ELM variant of Gemma2; at least with PixelDiT T2I, the standard Gemma2 gives broken outputs.

@ssugar008-maker commented

I used the same workflow (z_image_turbo_to_pid_02.json), but it keeps returning an error about incorrect dimensions. Even after the PiD conditioning, it does not work:
weight [512, 128, 3, 3] → PiD expects 128 channels
input [1, 16, 256, 256] → Z-Image latent has 16 channels

Not sure what the exact issue might be, thanks.
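The two shapes in that error can be read using PyTorch's Conv2d weight layout, (out_channels, in_channels / groups, kH, kW): a [512, 128, 3, 3] weight consumes 128-channel input, while the Z-Image (Flux.1 VAE) latent has 16 channels. The sketch below is a hypothetical pre-flight check (`check_conv_input` is not part of ComfyUI) that turns such a mismatch into a readable message instead of a backend shape error.

```python
from typing import Sequence


def check_conv_input(weight_shape: Sequence[int], input_shape: Sequence[int], groups: int = 1) -> None:
    """Raise a readable error if an NCHW input cannot feed a Conv2d weight.

    PyTorch stores Conv2d weights as (out_channels, in_channels // groups, kH, kW),
    so the expected input channel count is weight_shape[1] * groups.
    """
    expected = weight_shape[1] * groups
    got = input_shape[1]
    if got != expected:
        raise ValueError(
            f"conv expects {expected} input channels, got {got} "
            f"(weight {tuple(weight_shape)}, input {tuple(input_shape)})"
        )


if __name__ == "__main__":
    check_conv_input((512, 128, 3, 3), (1, 128, 64, 64))  # channels match: no error
    try:
        check_conv_input((512, 128, 3, 3), (1, 16, 256, 256))  # the shapes from the report
    except ValueError as err:
        print(err)
```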

@kijai (Collaborator, Author) commented May 26, 2026

Z-image uses the flux1 VAE; do you have that selected in the conditioning node?
And is the PiD model itself the flux1 version?

@ssugar008-maker commented May 26, 2026

Thank you for the prompt reply. Yes, the Flux1 VAE is selected in the conditioning node (latent_format = flux1) and the PiD model itself is flux1 (e.g. PiD_res2kto4k_sr4x_official_flux_distill_4step), but I get the same issue. Perhaps I'll reinstall ComfyUI and re-test, thanks!

@coderabbitai (Bot) left a review comment

Actionable comments posted: 6

🧹 Nitpick comments (2)
comfy/ldm/pixeldit/modules.py (1)

149-170: ⚡ Quick win

Avoid rebuilding the same RoPE table in every block call.

_fetch_pos() allocates a fresh position tensor for each PiTBlock.forward(). In this path the same (Hs, Ws, dtype, rope_options) values repeat across blocks and denoise steps, so this is just extra latency and GPU memory churn. Reusing a per-forward cache would pay off here. As per coding guidelines comfy/**: Core ML/diffusion engine. Focus on performance implications in hot paths.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@comfy/ldm/pixeldit/modules.py` around lines 149 - 170, PiTBlock.forward
currently calls _fetch_pos each block which calls precompute_freqs_cis_2d and
reallocates the same RoPE tensor repeatedly; add a small per-instance cache
keyed by (Hs, Ws, device, dtype, rope_options) (e.g., self._rope_cache = {}) and
modify _fetch_pos to check the cache and return the cached tensor when present,
otherwise compute, store, and return it; ensure the cache key includes
transformer_options.get("rope_options") and that tensors are moved to the
correct device/dtype if needed and invalidated when shapes or devices change to
avoid stale tensors.
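A per-instance cache along the lines the review suggests might look roughly like this. It is a sketch, not the merged code: `RopeCache` is a hypothetical name, `compute` stands in for whatever wraps `precompute_freqs_cis_2d`, and `rope_options` is assumed to be a flat dict with string keys and hashable values.

```python
from typing import Any, Callable, Dict, Optional, Tuple


class RopeCache:
    """Memoize 2D RoPE tables keyed by (Hs, Ws, device, dtype, rope_options)."""

    def __init__(self, compute: Callable[..., Any]):
        # compute(Hs, Ws, device, dtype, rope_options) builds the table (hypothetical signature)
        self._compute = compute
        self._cache: Dict[Tuple, Any] = {}

    def get(self, Hs: int, Ws: int, device: str, dtype: str, rope_options: Optional[dict] = None) -> Any:
        # Freeze the options dict so it can be part of the key (values assumed hashable).
        opts = tuple(sorted(rope_options.items())) if rope_options else ()
        key = (Hs, Ws, str(device), str(dtype), opts)
        if key not in self._cache:
            self._cache[key] = self._compute(Hs, Ws, device, dtype, rope_options)
        return self._cache[key]

    def clear(self) -> None:
        # Calling this once per model forward keeps the cache bounded ("per-forward" cache).
        self._cache.clear()
```

Because device and dtype are part of the key, a change in either simply builds a new entry, so a stale table is never returned for the wrong device.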
comfy/ldm/pixeldit/pid.py (1)

117-131: ⚡ Quick win

Add an explicit LQ latent channel check.

LQProjection2D is configured for a fixed latent_channels, but wrong latent formats currently hit the first conv and fail with a backend shape-mismatch error. A small upfront validation would make this failure deterministic and much easier to diagnose.

Suggested fix

```diff
     def _align_latent_to_patch_grid(self, lq_latent: torch.Tensor, pH: int, pW: int) -> torch.Tensor:
         B, z_dim = lq_latent.shape[:2]
+        if z_dim != self.latent_channels:
+            raise ValueError(f"Expected lq_latent with {self.latent_channels} channels, got {z_dim}")
         if self.z_to_patch_ratio >= 1:
             if lq_latent.shape[2] != pH or lq_latent.shape[3] != pW:
                 z_aligned = F.interpolate(lq_latent, size=(pH, pW), mode="nearest")
```
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@comfy/ldm/pixeldit/pid.py` around lines 117 - 131, Add a deterministic
channel validation at the start of _align_latent_to_patch_grid: check that
lq_latent.shape[1] (channels) equals the configured latent_channels (or
self.latent_channels / expected channel count used by LQProjection2D) and raise
a clear ValueError if it does not; this prevents an opaque backend
shape-mismatch later in the forward pass and directs the user to the expected
tensor format before any interpolation or reshape logic runs.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@comfy_extras/nodes_pid.py`:
- Around line 37-40: When latent_format == "flux" explicitly validate the
channel count instead of treating all non-128 shapes as Flux1: check
samples.shape[1] and set fmt_cls = comfy.latent_formats.Flux2 when it equals
128, set fmt_cls = comfy.latent_formats.Flux when it equals the expected Flux
channel count (e.g., 64 or your project's Flux channels), and otherwise raise a
clear ValueError/RuntimeError referencing the invalid channel count and
latent_format so the caller fails fast; update the code that assigns fmt_cls
(the branch using latent_format, samples.shape[1], comfy.latent_formats.Flux,
and comfy.latent_formats.Flux2) to perform this validation and error handling.
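A minimal sketch of that fail-fast branch, under stated assumptions: it returns format labels instead of the real `comfy.latent_formats` classes, and `resolve_flux_latent_format` is a hypothetical name. It uses 16 channels for Flux.1, which is what the Z-Image (Flux.1 VAE) latent reported in this thread has, rather than the "e.g., 64" guessed in the review comment; the 128 channels for Flux.2 comes from the review comment itself.

```python
FLUX1_LATENT_CHANNELS = 16   # Flux.1 VAE latent, e.g. the Z-Image latent [1, 16, H, W]
FLUX2_LATENT_CHANNELS = 128  # Flux.2 latent, per the review comment


def resolve_flux_latent_format(latent_format: str, channels: int) -> str:
    """Map the node's "flux" option to a concrete format by channel count, or fail fast."""
    if latent_format != "flux":
        return latent_format  # e.g. "sd3" is passed through unchanged
    if channels == FLUX2_LATENT_CHANNELS:
        return "flux2"
    if channels == FLUX1_LATENT_CHANNELS:
        return "flux1"
    raise ValueError(
        f"latent_format='flux' expects {FLUX1_LATENT_CHANNELS} (Flux.1) or "
        f"{FLUX2_LATENT_CHANNELS} (Flux.2) channels, got {channels}"
    )
```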

In `@comfy/ldm/pixeldit/model.py`:
- Around line 200-206: The code silently drops border pixels when computing
patches (Hs = H // self.patch_size, Ws = W // self.patch_size and using F.unfold
with stride=self.patch_size), so reject inputs whose height or width are not
divisible by self.patch_size: in the beginning of the patchifying logic (around
where H, W are read and pos_img is computed and x_patches is created, and
likewise where F.fold is used to reconstruct) add a guard that checks H %
self.patch_size == 0 and W % self.patch_size == 0 and raise a clear ValueError
(mentioning expected patch_size and actual H/W) instead of proceeding; reference
the functions/variables _fetch_patch_pos, x_patches (F.unfold), and the
reconstruction code that uses F.fold to locate where to add the guard.
- Around line 221-223: The loop over self.patch_blocks is hardcoding the
attention mask as None when calling each block, so padded text tokens remain
visible; update the call inside the loop in model.py (the code invoking
self.patch_blocks and _pre_patch_block) to pass the attention_mask from the
outer forward signature into each block call instead of None (i.e., forward the
attention_mask to MMDiTJointAttention.forward via the existing
transformer_options/kwargs), ensuring every patch block receives the same
attention_mask.
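Both model.py suggestions are small guards. The sketch below uses plain Python so it runs without torch: `patch_grid` mirrors the `H // patch_size` arithmetic (the point where F.unfold with stride=patch_size silently drops leftover rows and columns), and `run_patch_blocks` forwards the same mask to every block instead of a hard-coded None. The function names, the block call signature, and the patch size of 16 used in the checks are illustrative assumptions.

```python
import math
from typing import Callable, List, Optional, Sequence, Tuple

Mask = List[List[float]]


def patch_grid(H: int, W: int, patch_size: int) -> Tuple[int, int]:
    """Return (Hs, Ws), refusing sizes that floor division would silently crop."""
    if H % patch_size or W % patch_size:
        raise ValueError(
            f"H and W must be divisible by patch_size={patch_size}; got H={H}, W={W} "
            f"({H % patch_size} rows / {W % patch_size} columns would be dropped)"
        )
    return H // patch_size, W // patch_size


def padding_mask(lengths: Sequence[int], max_len: int) -> Mask:
    """Additive key mask per sample: 0.0 for real text tokens, -inf for right padding."""
    return [[0.0 if j < n else -math.inf for j in range(max_len)] for n in lengths]


def run_patch_blocks(x, blocks: Sequence[Callable], attention_mask: Optional[Mask]):
    # Forward the same mask to every block rather than passing None.
    for block in blocks:
        x = block(x, attention_mask=attention_mask)
    return x
```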

In `@comfy/ldm/pixeldit/modules.py`:
- Around line 128-139: The module sizes compress_to_attn and expand_from_attn
using the constructor patch_size but forward() recomputes a caller-supplied
patch_size; add a fast-fail check to ensure they match: store the constructor
patch_size (or p2) on self (e.g., self._patch_size or self._p2) in __init__, and
in forward() assert the recomputed patch_size (or p2) equals the stored value
and raise a clear error referencing compress_to_attn/expand_from_attn if not;
apply the same guard for the related logic referenced around the other block
(lines ~155-159) to prevent opaque reshape failures.
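The suggested guard amounts to remembering the constructor's patch size and comparing it in forward(). A sketch, assuming a hypothetical `PatchAttnAdapter` class standing in for the module that owns compress_to_attn / expand_from_attn; the real layers are omitted.

```python
class PatchAttnAdapter:
    """Hypothetical stand-in for the module that owns compress_to_attn / expand_from_attn."""

    def __init__(self, patch_size: int):
        # The projection widths are derived from p2 = patch_size ** 2 at construction time.
        self._patch_size = patch_size
        self._p2 = patch_size * patch_size

    def check_patch_size(self, patch_size: int) -> int:
        """Fail fast if forward() is called with a different patch size than __init__."""
        p2 = patch_size * patch_size
        if p2 != self._p2:
            raise ValueError(
                f"compress_to_attn/expand_from_attn were built for patch_size={self._patch_size} "
                f"(p2={self._p2}), but forward() got patch_size={patch_size} (p2={p2})"
            )
        return p2
```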

In `@comfy/ldm/pixeldit/pid.py`:
- Around line 200-209: The _forward method currently calls degrade_sigma.to(...)
without checking for None and doesn't validate its length; update the beginning
of PidNet._forward to (1) raise a clear ValueError if degrade_sigma is None, (2)
ensure degrade_sigma is a tensor (or convert it) before calling .to(device=...,
dtype=...), and (3) validate that degrade_sigma.numel() is either 1 or equals
the batch size B, raising a ValueError on mismatch; keep the existing
expand(B).contiguous() behavior when numel()==1 so downstream gating logic
receives a 1-D tensor of length B on the correct device and dtype.
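The degrade_sigma checks can be sketched with a plain Python list standing in for the tensor: a None check, a length check mirroring `numel()`, and a broadcast mirroring `expand(B)`. `normalize_degrade_sigma` is a hypothetical helper, and the device/dtype move is left out.

```python
from numbers import Real
from typing import List, Optional, Sequence, Union

SigmaLike = Union[Real, Sequence[float]]


def normalize_degrade_sigma(degrade_sigma: Optional[SigmaLike], batch_size: int) -> List[float]:
    """Return one degrade sigma per batch item, or raise a clear ValueError."""
    if degrade_sigma is None:
        raise ValueError("degrade_sigma is required for PiD conditioning but was None")
    if isinstance(degrade_sigma, Real):
        values = [float(degrade_sigma)]
    else:
        values = [float(v) for v in degrade_sigma]
    if len(values) == 1:
        return values * batch_size  # broadcast a scalar, like tensor.expand(B)
    if len(values) != batch_size:
        raise ValueError(f"degrade_sigma has {len(values)} values; expected 1 or batch size {batch_size}")
    return values
```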

In `@comfy/model_detection.py`:
- Around line 466-481: The PiD detection only checks for an unprefixed lq_proj
key but later checks a core-prefixed pixel_embedder key, causing mis-detection
when checkpoints use a core.* prefix; update the detection to look for both
prefixed and unprefixed forms: when testing for the PiD marker, check both
'{}lq_proj.latent_proj.0.weight' and '{}core.lq_proj.latent_proj.0.weight' (and
build the gate prefix similarly from whichever matched), and when falling back
to PixelDiT T2I check both '{}core.pixel_embedder.proj.weight' and
'{}pixel_embedder.proj.weight', so the code correctly handles checkpoints with
or without the core. prefix.
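The prefix-tolerant detection can be sketched over a plain set of state-dict keys. `detect_pixeldit_variant` and its return labels are hypothetical; the key names are the ones quoted in the review comment. PiD is tried first on the assumption that, being built on PixelDiT, a PiD checkpoint may also contain pixel_embedder weights.

```python
from typing import Iterable, Optional, Tuple


def detect_pixeldit_variant(keys: Iterable[str], key_prefix: str = "") -> Optional[Tuple[str, str]]:
    """Return (variant, resolved_prefix) for PiD / PixelDiT T2I checkpoints, else None.

    Both marker keys are tried with and without a "core." prefix. PiD is checked
    first (assumption: PiD checkpoints may also carry pixel_embedder weights).
    """
    keys = set(keys)
    for core in ("", "core."):
        if f"{key_prefix}{core}lq_proj.latent_proj.0.weight" in keys:
            return "pid", f"{key_prefix}{core}"
    for core in ("core.", ""):
        if f"{key_prefix}{core}pixel_embedder.proj.weight" in keys:
            return "pixeldit_t2i", f"{key_prefix}{core}"
    return None
```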

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 1afcc0ec-af3d-48ac-849b-40d5b1da9892

📥 Commits

Reviewing files that changed from the base of the PR and between f9f54ca and 394fd1f.

📒 Files selected for processing (12)
  • comfy/latent_formats.py
  • comfy/ldm/modules/diffusionmodules/mmdit.py
  • comfy/ldm/pixeldit/model.py
  • comfy/ldm/pixeldit/modules.py
  • comfy/ldm/pixeldit/pid.py
  • comfy/model_base.py
  • comfy/model_detection.py
  • comfy/sd.py
  • comfy/supported_models.py
  • comfy/text_encoders/pixeldit.py
  • comfy_extras/nodes_pid.py
  • nodes.py

@lrzjason commented May 26, 2026

Hello kijai,
I am seeing a huge quality difference between pid_flux1_1024_to_4096_4step_mxfp8 and pid_flux2_1024_to_4096_4step_mxfp8.
Workflow (basically the same workflow, with only the model and latent format changed):
Compare Flux1 Pid and Flux2 Pid.json

The original paper lacks examples for flux2, so I am not sure whether it is a model issue or an implementation issue.
Could you give some advice?

@mariojuniortrab commented

Hello,

I am facing the following error:

"TypeError: HostBuffer.__init__() takes 2 positional arguments but 4 were given"

I know this error can happen when there is an incompatibility between some custom nodes and the installed ComfyUI version.

I had an updated version of ComfyUI installed, and then I did the following:

  • Went to my ComfyUI installation path
  • Ran git pull origin master
  • Created a new branch called test-pid
  • Merged this PR/branch into my new local branch

After that, I started getting this error in every workflow I try to run.

My guess is that this may be caused by a version mismatch between ComfyUI and one or more custom nodes/dependencies, but I am not sure which versions I should be using.

If anyone could help me identify what needs to be updated or downgraded, I would really appreciate it.

If not, no problem — I can wait until this PR is officially merged.

Thanks!

@kijai (Collaborator, Author) commented May 26, 2026

This is probably due to the PR code being new and requiring the latest comfy-aimdo; you'd have to update that.
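Before updating, it can help to confirm which comfy-aimdo version the Python environment that runs ComfyUI actually sees. This sketch uses only the standard library's importlib.metadata; the distribution name "comfy-aimdo" is an assumption based on the name used in this thread. The update itself is then typically done with that same interpreter, for example by reinstalling ComfyUI's requirements.txt, assuming the package is pinned there.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str = "comfy-aimdo") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version()
    print(found if found else "comfy-aimdo is not installed in this environment")
```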

@kijai (Collaborator, Author) commented May 26, 2026

The flux2 version is much more sensitive to sigmas, and I find the model doesn't do well with portrait resolutions.

This is from my test script that runs both the original and the ComfyUI implementation:

[Image: zimage_comparison_grid_2x4]

As you can see, the flux2 version seems to just perform poorly for the upscale task, at least on this image. It's a bit better in ComfyUI if you use manual sigmas instead of the simple schedule, or just a higher shift, but the lighting change seems inevitable.
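For readers wondering what "higher shift" does: flow-matching samplers (SD3-style, for example) commonly remap each sigma as sigma' = shift * sigma / (1 + (shift - 1) * sigma), so a larger shift keeps more of the schedule at high noise. The sketch applies that remap to a plain linear schedule; it is not ComfyUI's scheduler code, and whether the PiD flux2 model is sampled with exactly this remap is an assumption.

```python
from typing import List


def shift_sigma(sigma: float, shift: float) -> float:
    """Flow-matching time shift; shift=1.0 leaves sigma unchanged."""
    return shift * sigma / (1.0 + (shift - 1.0) * sigma)


def shifted_linear_schedule(steps: int, shift: float) -> List[float]:
    """Linear sigmas from 1.0 down to 0.0, each remapped by shift_sigma."""
    return [shift_sigma(1.0 - i / steps, shift) for i in range(steps + 1)]


if __name__ == "__main__":
    for s in (1.0, 3.0):
        print(s, [round(x, 3) for x in shifted_linear_schedule(4, s)])
```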

Review thread on comfy/text_encoders/pixeldit.py (outdated):


```python
def _build_padded_tokens(combined_text: str, spiece_tokenizer, pad_id: int, chi_token_count: int):
    # Right-pad to chi_token_count + 300 - 2 (matches upstream's max_length_all).
    ...  # body not shown in the review excerpt
```
Member:

Is this actually needed? SDTokenizer class already has functionality to pad the tokens.

@kijai (Collaborator, Author):

Indeed it's not. There were other redundant things in this file too, so I cleaned it up more.

@TheNeObr commented

Kijai, the key is to forget about implementing this directly in the Core and instead work with a centralized pipeline focused solely on the model. I created a set of nodes to run this workflow without integrating with other nodes, using a standalone upscaling and enhancement system. I also built a tile-based upscaling system that lets you increase the image quality as much as you want using the model’s multiplier, making it easier to handle high resolutions even at the 24GB limit.

If you're interested and would like to take a look at what I've done, I can send it to you, though keep in mind that I'm not a professional programmer, just an enthusiast who uses AI to code. =)


@kijai (Collaborator, Author) commented May 26, 2026

The key to what exactly? The original pipeline is highly inefficient: it doesn't allow using native memory management, doesn't allow using sageattn (which doubles the inference speed for me with negligible quality loss), and doesn't allow combining this model with numerous other things. When you start to add such features, you realize you're working backwards toward what the core already offers, and it just ends up being more work. And I say this as someone who used to write wrappers: that is not the way.

@TheNeObr commented May 26, 2026

I may be talking nonsense, but the way I understand how this model works is completely different from how we use other models, and I believe that running it in a workflow as we know it isn't practical.
I may be wrong—please correct me if I am—but this is how I see my day-to-day use. I created a node that runs the entire workflow in isolation, just as Supir did, or even SEEDVR2 does. I’m posting what I created; if it’s useful in any way to the community, that’s good enough.... I’m just trying to help within the limits of my knowledge.

I've posted the node package I developed. If it's of any help to the community or even sparks ideas for the official implementation, I'd be happy to help.

Cheers,

https://github.com/TheNeObr/Confyui-PiD

@comfyanonymous merged commit 28f4ef2 into Comfy-Org:master on May 27, 2026 (14 checks passed).
@zwukong commented May 27, 2026

Only 1024x1024 works; 9:16 adds green pixels to the bottom (using flux1-1024-4K). Flux2 failed in all cases; it changes the entire color.

[Images: 2026-05-27_152957, 2026-05-27_153006]

@kijai (Collaborator, Author) commented May 27, 2026

Well, that is completely wrong, especially for this model, which is best used as part of a pipeline, as the decode/upscale stage. The only reason SUPIR/SEEDVR2 were standalone was the effort/knowledge needed to integrate them into the core, not because they'd work better standalone; very much the opposite.

Also, I'd appreciate it if you didn't come here telling me I'm wasting my time and advertising your node pack (FYI, there are already multiple other wrappers for it too). It is not a better approach, just one that takes less effort.

@TheNeObr commented

I never meant to offend you or belittle your work—it was just a figure of speech. I believe what you say, and yes, if we commit to implementing it, things will certainly improve.... I apologize; I didn’t mean to offend you by posting the link to the node I created... I just wanted to help in some way by trying out the technology in an alternative way until the implementation is stable.

Cheers,

@kijai (Collaborator, Author) commented May 27, 2026

Ok, I appreciate the clarification. The implementation is stable and tested to match the original code, with additional memory optimizations; the rest is up to the workflows. It's not the easiest model to use due to its resolution and some aspect ratio restrictions. The green artifact, for example, comes from going outside the model's training (and yes, tiling can be one way to combat that). I know my examples here are just barebones basic ones and should only be taken as a starting point.

@kijai (Collaborator, Author) commented May 27, 2026

This PR adds preliminary context window support, so we can do sliding tiling along a 1D axis to avoid the aspect ratio issues of the model:

#14136 (comment)

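The idea behind 1D sliding tiling can be sketched as computing overlapping windows along the long axis, so each window stays close to a shape the model handles well. This is a generic sketch, not the code from #14136: the window and overlap sizes are arbitrary, the units can be pixels or latent cells, and blending of the overlapping regions is not shown.

```python
from typing import List, Tuple


def sliding_windows(length: int, window: int, overlap: int) -> List[Tuple[int, int]]:
    """Half-open (start, end) windows of size `window` covering [0, length) with `overlap`."""
    if window <= 0 or not 0 <= overlap < window:
        raise ValueError("need window > 0 and 0 <= overlap < window")
    if length <= window:
        return [(0, length)]
    stride = window - overlap
    starts = list(range(0, length - window + 1, stride))
    if starts[-1] + window < length:
        starts.append(length - window)  # last window sits flush with the end
    return [(s, s + window) for s in starts]


if __name__ == "__main__":
    # e.g. a tall image split along its height into windows of 1024 with 256 overlap
    print(sliding_windows(1820, 1024, 256))
```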

@jtreminio commented

Heads-up for anyone following along: the PiD Conditioning.latent_format dropdown options were changed to simply flux and sd3, removing flux1 and flux2.

@kijai (Collaborator, Author) commented May 27, 2026

Yes, sorry for not mentioning that here. In the merged version it's currently just "flux" and "sd3": since flux2 is easy to autodetect, this makes the node automatic for anything but sd3.

@jtreminio commented

Thanks @kijai, I believe the attached JSON workflows from the first post still use the old values.

@yeyingxian commented

Hi @kijai. Why does PiD for flux2 make the picture turn whitish? I used ERNIE-Image to generate the original picture.
[Image: QQ图片20260528101533]

@kijai (Collaborator, Author) commented May 28, 2026

It's just how the model is. I don't know why; it could be the distillation (undistilled models are not released yet) or just their training in general. All their examples are bright scenes too. The ComfyUI implementation matches what I get from their code:

[Image: 0107_comparison_grid]

@Heliumrich commented

What would give the most "real" detail: generating at 1536x1536 natively, or using PiD to go from 1024x1024 to 4096x4096?
I'm not talking about overall sharpness, but about real fine details rather than sharpened random blobs.

@kijai (Collaborator, Author) commented May 28, 2026

I suppose it depends on the model, but probably the best use is to generate at the given model's native resolution, end sampling early, and pass the still-noisy latent to PiD, then use degrade_sigma to finish and decode the image. I don't know the optimal workflow for this yet, but it seems to work great with Z-image at least.
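The "end early and hand off" idea boils down to splitting the base model's sigma schedule at some step, much like ComfyUI's SplitSigmas node, and treating the sigma at the split point as the noise level the latent still carries. A plain-Python sketch with a hypothetical `split_for_handoff` helper; feeding that sigma directly into PiD's degrade_sigma assumes both are on the same scale, which the thread does not confirm.

```python
from typing import List, Sequence, Tuple


def split_for_handoff(sigmas: Sequence[float], stop_step: int) -> Tuple[List[float], float]:
    """Split a descending sigma schedule for an early hand-off.

    Returns the sigmas the base model should sample with (ending at the split
    point) and the sigma it stops at, i.e. the noise level left in the latent.
    """
    if not 0 < stop_step < len(sigmas) - 1:
        raise ValueError("stop_step must leave at least one step on each side")
    return list(sigmas[: stop_step + 1]), float(sigmas[stop_step])
```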

@yifanlu0227 commented May 28, 2026

Hey guys, we have a new checkpoint for FLUX.2 2kto4k that solves the color drifting problem. We will release the fixed checkpoint soon!!

Thanks for merging PiD to ComfyUI!!
