Merged
173 commits
72bbf49
Add 'sigmas' to transformer_options so that downstream code can know …
Kosinkadink Dec 29, 2024
bf21be0
Merge branch 'master' into hooks_part2
Kosinkadink Dec 30, 2024
d44295e
Merge branch 'master' into hooks_part2
Kosinkadink Jan 4, 2025
5a2ad03
Cleaned up hooks.py, refactored Hook.should_register and add_hook_pat…
Kosinkadink Jan 4, 2025
776aa73
Refactor WrapperHook into TransformerOptionsHook, as there is no need…
Kosinkadink Jan 4, 2025
111fd0c
Refactored HookGroup to also store a dictionary of hooks separated by…
Kosinkadink Jan 4, 2025
6620d86
In inner_sample, change "sigmas" to "sampler_sigmas" in transformer_o…
Kosinkadink Jan 5, 2025
db2d7ad
Merge branch 'add_sample_sigmas' into hooks_part2
Kosinkadink Jan 5, 2025
8270ff3
Refactored 'registered' to be HookGroup instead of a list of Hooks, m…
Kosinkadink Jan 6, 2025
4446c86
Made hook clone code sane, made clear ObjectPatchHook and SetInjectio…
Kosinkadink Jan 6, 2025
03a97b6
Fix performance of hooks when hooks are appended via Cond Pair Set Pr…
Kosinkadink Jan 6, 2025
0a7e2ae
Filter only registered hooks on self.conds in CFGGuider.sample
Kosinkadink Jan 6, 2025
6463c39
Merge branch 'master' into hooks_part2
Kosinkadink Jan 6, 2025
f48f90e
Make hook_scope functional for TransformerOptionsHook
Kosinkadink Jan 6, 2025
2724ac4
Merge branch 'master' into hooks_part2
Kosinkadink Jan 6, 2025
1b38f5b
removed 4 whitespace lines to satisfy Ruff,
Kosinkadink Jan 6, 2025
58bf881
Add a get_injections function to ModelPatcher
Kosinkadink Jan 7, 2025
216fea1
Made TransformerOptionsHook contribute to registered hooks properly, …
Kosinkadink Jan 7, 2025
11c6d56
Merge branch 'master' into hooks_part2
Kosinkadink Jan 7, 2025
3cd4c5c
Rename AddModelsHooks to AdditionalModelsHook, rename SetInjectionsHo…
Kosinkadink Jan 7, 2025
7333281
Clean up a typehint
Kosinkadink Jan 7, 2025
66838eb
Merge branch 'comfyanonymous:master' into multigpu_support
Kosinkadink Jan 8, 2025
871258a
Add get_all_torch_devices to get detected devices intended for curren…
Kosinkadink Jan 8, 2025
7448f02
Initial proof of concept of giving splitting cond sampling between mu…
Kosinkadink Jan 8, 2025
d3cf2b7
Merge branch 'comfyanonymous:master' into multigpu_support
Kosinkadink Jan 11, 2025
e88c6c0
Fix cond_cat to not try to cast anything that doesn't have a 'to' fun…
Kosinkadink Jan 11, 2025
8d4b501
Merge branch 'master' into multigpu_support
Kosinkadink Jan 12, 2025
d508807
Make test node for multigpu instead of storing it in just a local __i…
Kosinkadink Jan 14, 2025
ec16ee2
Merge branch 'master' into multigpu_support
Kosinkadink Jan 14, 2025
198953c
Add nodes_multigpu.py to loaded nodes
Kosinkadink Jan 14, 2025
25818dc
Added a 'max_gpus' input
Kosinkadink Jan 14, 2025
2145a20
Merge branch 'master' into multigpu_support
Kosinkadink Jan 16, 2025
31f5458
Merge branch 'master' into multigpu_support
Kosinkadink Jan 17, 2025
bfce723
Initial work on multigpu_clone function, which will account for addit…
Kosinkadink Jan 17, 2025
6c9e94b
Merge branch 'master' into multigpu_support
Kosinkadink Jan 20, 2025
bdbcb85
Merge branch 'multigpu_support' of https://github.com/Kosinkadink/Com…
Kosinkadink Jan 20, 2025
328d4f1
Make WeightHooks compatible with MultiGPU, clean up some code
Kosinkadink Jan 20, 2025
ef137ac
Merge branch 'multigpu_support' of https://github.com/kosinkadink/Com…
Kosinkadink Jan 20, 2025
02a4d0a
Added unload_model_and_clones to model_management.py to allow unloadi…
Kosinkadink Jan 23, 2025
5db4277
Make sure additional_models are unloaded as well when perform
Kosinkadink Jan 24, 2025
46969c3
Initial MultiGPU support for controlnets
Kosinkadink Jan 24, 2025
51af7fa
Fix multigpu ControlBase get_models and cleanup calls to avoid multip…
Kosinkadink Jan 25, 2025
c7feef9
Cast transformer_options for multigpu
Kosinkadink Jan 26, 2025
e3298b8
Create proper MultiGPU Initialize node, create gpu_options to create …
Kosinkadink Jan 26, 2025
eda866b
Extracted multigpu core code into multigpu.py, added load_balance_dev…
Kosinkadink Jan 27, 2025
0b3233b
Merge remote-tracking branch 'origin/master' into multigpu_support
Kosinkadink Jan 28, 2025
02747cd
Carry over change from _calc_cond_batch into _calc_cond_batch_multigpu
Kosinkadink Jan 29, 2025
99a5c10
Merge branch 'master' into multigpu_support
Kosinkadink Feb 2, 2025
441cfd1
Merge branch 'master' into multigpu_support
Kosinkadink Feb 6, 2025
476aa79
Let --cuda-device take in a string to allow multiple devices (or devi…
Kosinkadink Feb 6, 2025
b03763b
Merge branch 'multigpu_support' into worksplit-multigpu
Kosinkadink Feb 7, 2025
d2504fb
Merge branch 'master' into worksplit-multigpu
Kosinkadink Feb 12, 2025
048f4f0
Merge branch 'master' into worksplit-multigpu
Kosinkadink Feb 18, 2025
605893d
Merge branch 'master' into worksplit-multigpu
Kosinkadink Feb 25, 2025
093914a
Made MultiGPU Work Units node more robust by forcing ModelPatcher clo…
Kosinkadink Mar 4, 2025
5080105
Merge branch 'master' into worksplit-multigpu
Kosinkadink Mar 4, 2025
6dca17b
Satisfy ruff linting
Kosinkadink Mar 4, 2025
6e144b9
Merge branch 'master' into worksplit-multigpu
Kosinkadink Mar 9, 2025
cc928a7
Merge branch 'master' into worksplit-multigpu
Kosinkadink Mar 14, 2025
c4ba399
Merge branch 'master' into worksplit-multigpu
Kosinkadink Mar 15, 2025
219d3cd
Merge branch 'master' into worksplit-multigpu
Kosinkadink Mar 17, 2025
5ccec33
Merge branch 'worksplit-multigpu' of https://github.com/comfyanonymou…
Kosinkadink Mar 17, 2025
4879b47
Merge branch 'master' into worksplit-multigpu
Kosinkadink Mar 19, 2025
a786ce5
Merge branch 'master' into worksplit-multigpu
Kosinkadink Mar 27, 2025
63567c0
Merge branch 'master' into worksplit-multigpu
Kosinkadink Mar 28, 2025
9ce9ff8
Allow chained MultiGPU Work Unit nodes to affect max_gpus present on …
Kosinkadink Mar 28, 2025
407a5a6
Rollback core of last commit due to weird behavior
Kosinkadink Mar 28, 2025
2fa9aff
Merge branch 'master' into worksplit-multigpu
Kosinkadink Apr 9, 2025
ccd5c01
Merge branch 'master' into worksplit-multigpu
Kosinkadink Apr 9, 2025
adc66c0
Merge branch 'master' into worksplit-multigpu
Kosinkadink Apr 16, 2025
ed6f92c
Merge branch 'master' into worksplit-multigpu
Kosinkadink Apr 16, 2025
2a54a90
Merge branch 'master' into worksplit-multigpu
Kosinkadink Apr 17, 2025
b5cccf1
Merge branch 'master' into worksplit-multigpu
Kosinkadink Apr 18, 2025
8be7117
Make unload_all_models account for all devices
Kosinkadink Apr 19, 2025
6211d2b
Merge branch 'master' into worksplit-multigpu
Kosinkadink Apr 19, 2025
272e8d4
Merge branch 'master' into worksplit-multigpu
Kosinkadink Apr 23, 2025
9726eac
Merge branch 'master' into worksplit-multigpu
Kosinkadink May 13, 2025
8ae2523
Merge branch 'master' into worksplit-multigpu
Kosinkadink May 21, 2025
0336b0a
Merge branch 'master' into worksplit-multigpu
Kosinkadink Jun 1, 2025
1ae9893
Merge branch 'master' into worksplit-multigpu
Kosinkadink Jun 17, 2025
44e053c
Improve error handling for multigpu threads
Kosinkadink Jun 24, 2025
431dec8
Merge branch 'worksplit-multigpu' of https://github.com/comfyanonymou…
Kosinkadink Jun 24, 2025
443a795
Merge branch 'master' into worksplit-multigpu
Kosinkadink Jun 24, 2025
d53479a
Merge branch 'master' into worksplit-multigpu
Kosinkadink Jul 1, 2025
9855baa
Merge branch 'master' into worksplit-multigpu
Kosinkadink Jul 9, 2025
3c41046
Merge branch 'master' into worksplit-multigpu-wip
Kosinkadink Jul 22, 2025
3b90a30
Merge branch 'master' into worksplit-multigpu-wip
Kosinkadink Jul 27, 2025
5d50242
Merge branch 'master' into worksplit-multigpu
Kosinkadink Jul 28, 2025
9cca36f
Merge branch 'master' into worksplit-multigpu
Kosinkadink Jul 29, 2025
382f84a
Merge branch 'master' into worksplit-multigpu
Kosinkadink Jul 30, 2025
67e906a
Merge branch 'master' into worksplit-multigpu
Kosinkadink Jul 31, 2025
df122a7
Merge branch 'master' into worksplit-multigpu
Kosinkadink Aug 1, 2025
b4f559b
Merge branch 'master' into worksplit-multigpu
Kosinkadink Aug 5, 2025
6ea6936
Merge branch 'master' into worksplit-multigpu
Kosinkadink Aug 8, 2025
962c3c8
Merge branch 'master' into worksplit-multigpu
Kosinkadink Aug 11, 2025
cfb63bf
Merge branch 'worksplit-multigpu' of https://github.com/comfyanonymou…
Kosinkadink Aug 11, 2025
3677943
Merge branch 'master' into worksplit-multigpu
Kosinkadink Aug 13, 2025
1489399
Merge branch 'master' into worksplit-multigpu
Kosinkadink Aug 14, 2025
b0741c7
Merge branch 'master' into worksplit-multigpu
Kosinkadink Aug 15, 2025
383f9b3
Merge branch 'master' into worksplit-multigpu
Kosinkadink Aug 17, 2025
2c8f485
Merge branch 'master' into worksplit-multigpu
Kosinkadink Aug 18, 2025
ac14ee6
Merge branch 'master' into worksplit-multigpu
Kosinkadink Aug 19, 2025
9e9c129
Merge remote-tracking branch 'origin/master' into worksplit-multigpu
Kosinkadink Aug 30, 2025
efcd828
Merge branch 'master' into worksplit-multigpu
Kosinkadink Sep 12, 2025
bb44c2e
Merge branch 'master' into worksplit-multigpu
Kosinkadink Sep 18, 2025
c2115a4
Merge branch 'master' into worksplit-multigpu
Kosinkadink Sep 25, 2025
8cbbf0b
Merge branch 'master' into worksplit-multigpu
Kosinkadink Oct 14, 2025
d89dd5f
Satisfy ruff
Kosinkadink Oct 14, 2025
b326a54
Merge branch 'master' into worksplit-multigpu
Kosinkadink Oct 16, 2025
4661d1d
Bring patches changes from _calc_cond_batch into _calc_cond_batch_mul…
Kosinkadink Oct 16, 2025
df2fd4c
Merge branch 'master' into worksplit-multigpu
Kosinkadink Feb 17, 2026
f4b99bc
Made multigpu deepclone load model from disk to avoid needing to deep…
Kosinkadink Feb 17, 2026
f410d28
Merge origin/master into worksplit-multigpu
Kosinkadink Mar 18, 2026
be35378
Merge branch 'master' into worksplit-multigpu
Kosinkadink Mar 30, 2026
84f465e
Set CUDA device at start of multigpu threads to avoid multithreading …
Kosinkadink Mar 30, 2026
d52dcbc
Rewrite multigpu nodes to V3 format
Kosinkadink Mar 30, 2026
5f4fcd1
Simplify multigpu nodes: default max_gpus=2, remove gpu_options input…
Kosinkadink Mar 30, 2026
1d8e379
Rename MultiGPU Work Units to MultiGPU CFG Split
Kosinkadink Mar 30, 2026
afdddce
Re-enable comfy-kitchen cuda backend for multigpu testing
Kosinkadink Mar 30, 2026
3fab720
Add debug logging for device mismatch in ModelPatcherDynamic.load
Kosinkadink Mar 30, 2026
2080374
Add detailed multigpu debug logging to load_models_gpu
Kosinkadink Mar 30, 2026
b418fb1
Fix device mismatch: update LoadedModel.device when _switch_parent sw…
Kosinkadink Mar 30, 2026
da38644
Merge remote-tracking branch 'origin/master' into worksplit-multigpu
Kosinkadink Apr 8, 2026
4b93c43
Implement persistent thread pool for multi-GPU CFG splitting (#13329)
Kosinkadink Apr 8, 2026
48deb15
Simplify multigpu dispatch: run all devices on pool threads (#13340)
Kosinkadink Apr 9, 2026
f0d550b
Minor updates for worksplit_gpu with comfy-aimdo (#13419)
rattus128 Apr 16, 2026
37deccb
Fix Hunyuan 3D 2.1 multi-GPU worksplit: use cond_or_uncond instead of…
Kosinkadink Apr 20, 2026
b502bcf
Merge remote-tracking branch 'origin/master' into worksplit-multigpu
Kosinkadink Apr 20, 2026
7b8b367
comfy-aimdo: 0.0.214 (#13532)
rattus128 Apr 24, 2026
aa464b3
Multi-GPU device selection for loader nodes + CUDA context fixes (#13…
Kosinkadink Apr 24, 2026
1b96430
Merge master into worksplit-multigpu (#13546)
Kosinkadink Apr 24, 2026
a61e2bb
Add device selection on Image Only Load Checkpoint (CORE-158) (#13748)
alexisrolland May 7, 2026
9e3ede1
Fix MultiGPU scheduler capacity accounting (#14000)
Kosinkadink May 20, 2026
819c7c0
Refactor MultiGPU scheduler for readability and termination safety (#…
Kosinkadink May 20, 2026
ff766e5
Merge remote-tracking branch 'origin/master' into merge-master-into-w…
Kosinkadink May 20, 2026
50d1dd6
Fix MultiGPU Options node discarding cloned GPUOptionsGroup
Kosinkadink May 20, 2026
9a681cc
Guard cached_patcher_init when output_model is False
Kosinkadink May 20, 2026
ba41775
Fix get_all_torch_devices for XPU/NPU and guard remove()
Kosinkadink May 20, 2026
dd85851
Prune inherited multigpu clones when max_gpus is lowered
Kosinkadink May 20, 2026
ac0a90c
Use cond_shapes in multigpu memory-fit check (parity with single-GPU …
Kosinkadink May 21, 2026
4d9106d
Document --cuda-device comma format and MultiGPU Options relative_spe…
Kosinkadink May 21, 2026
adde123
Restore prepare_state backward-compatible signature
May 21, 2026
9636216
Free QwenFunControlNet base_model reference in cleanup
May 21, 2026
a18dd21
Pass per-device model to multigpu control clones in pre_run_control
May 21, 2026
1417b71
Fix CodeRabbit findings in worksplit-multigpu (#14017)
Kosinkadink May 21, 2026
822a3ec
Note _calc_cond_batch and _calc_cond_batch_multigpu must stay in sync
May 21, 2026
019261e
Simplify Hunyuan 3D 2.1 swap_cfg_halves gate to a shape check
May 21, 2026
fd79f22
Backport Hunyuan 3D 2.1 attention batch-size fixes from #13699
May 21, 2026
d0b9dbb
Merge remote-tracking branch 'origin/master' into worksplit-multigpu
May 21, 2026
2ed396c
Mark non-NVIDIA multigpu gaps with TODOs in _handle_batch
Kosinkadink May 21, 2026
b649502
Report all torch devices from /system_stats
Kosinkadink May 21, 2026
df17b56
memory_management: replace thread refusal with mutex
rattus128 May 22, 2026
7a18f9a
comfy-aimdo 0.4.4
rattus128 May 22, 2026
74b0a82
Add UPSCALE_MODEL lane to MultiGPU CFG Split
pollockjj May 20, 2026
4d3d68e
Add tiled VAE lane to MultiGPU Work Units
pollockjj May 22, 2026
cb83c41
Merge pull request #14052 from rattus128/prs/worksplit-t-load-fix
Kosinkadink May 22, 2026
5dc4e38
Defer @pollockjj's tiled-VAE and UPSCALE_MODEL MultiGPU lanes (#14066)
Kosinkadink May 22, 2026
5ffea26
Fix single-GPU non-CUDA regressions on worksplit-multigpu
Kosinkadink May 23, 2026
e6c65fa
Merge pull request #14068 from Comfy-Org/fix/single-gpu-non-cuda
Kosinkadink May 23, 2026
2e5211e
Merge branch 'master' into worksplit-multigpu
Kosinkadink May 23, 2026
403ff49
Restore nodes_kling.py removal of max_poll_attempts=280 lost in merge
Kosinkadink May 23, 2026
2369eb0
Route aimdo init through get_all_torch_devices() instead of raw torch…
Kosinkadink May 23, 2026
711bb1b
Simplify aimdo init call - drop redundant type/index filter
Kosinkadink May 23, 2026
9a12a93
Revert per-loader device inputs from #13483 / #13748
Kosinkadink May 23, 2026
d770609
Add Select Model/CLIP/VAE Device passthrough nodes
Kosinkadink May 23, 2026
4e65005
SelectXDevice nodes: register new load_device with ModelPatcherDynamic
Kosinkadink May 23, 2026
9ee1540
SelectXDevice: use lowercase validate_inputs for V3 combo bypass
Kosinkadink May 23, 2026
b319c80
SelectXDevice: address code-review follow-ups
Kosinkadink May 23, 2026
5c2e34c
Merge branch 'master' into worksplit-multigpu
Kosinkadink May 23, 2026
bece6b2
multigpu: refactor deepclone_multigpu + register cached_patcher_init …
Kosinkadink May 24, 2026
ac5b7e8
multigpu: drop unused copy import; sync requirements.txt with master
Kosinkadink May 25, 2026
de487c1
Merge branch 'master' into worksplit-multigpu
Kosinkadink May 25, 2026
7d958e1
multigpu: fix CPU SelectModelDevice slowness + MGCS reuse stripping i…
Kosinkadink May 26, 2026
2 changes: 1 addition & 1 deletion comfy/cli_args.py
@@ -49,7 +49,7 @@ def __call__(self, parser, namespace, values, option_string=None):
parser.add_argument("--input-directory", type=str, default=None, help="Set the ComfyUI input directory. Overrides --base-directory.")
parser.add_argument("--auto-launch", action="store_true", help="Automatically launch ComfyUI in the default browser.")
parser.add_argument("--disable-auto-launch", action="store_true", help="Disable auto launching the browser.")
-parser.add_argument("--cuda-device", type=int, default=None, metavar="DEVICE_ID", help="Set the id of the cuda device this instance will use. All other devices will not be visible.")
+parser.add_argument("--cuda-device", type=str, default=None, metavar="DEVICE_ID", help="Set the ids of cuda devices this instance will use, as a comma-separated list (e.g. '0' or '0,1'). All other devices will not be visible.")
parser.add_argument("--default-device", type=int, default=None, metavar="DEFAULT_DEVICE_ID", help="Set the id of the default device, all other devices will stay visible.")
cm_group = parser.add_mutually_exclusive_group()
cm_group.add_argument("--cuda-malloc", action="store_true", help="Enable cudaMallocAsync (enabled by default for torch 2.0 and up).")
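
The new help text describes `--cuda-device` as a comma-separated list. A minimal sketch of how such a string could be split into device ids, assuming a hypothetical helper `parse_cuda_devices` that is not part of this diff (how ComfyUI actually consumes the value is not shown in this hunk):

```python
import argparse


def parse_cuda_devices(value):
    """Hypothetical helper: turn '0' or '0,1' into a list of integer ids."""
    if value is None:
        return []
    # Skip empty fragments so a trailing comma does not raise.
    return [int(part) for part in value.split(",") if part.strip()]


parser = argparse.ArgumentParser()
# Same argument shape as in the diff: the value now arrives as a string.
parser.add_argument("--cuda-device", type=str, default=None, metavar="DEVICE_ID")

args = parser.parse_args(["--cuda-device", "0,1"])
device_ids = parse_cuda_devices(args.cuda_device)
```

Keeping the argument as `type=str` means a single id such as `0` still parses, which is presumably why the change is backward compatible for existing launch scripts.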
65 changes: 60 additions & 5 deletions comfy/controlnet.py
@@ -15,13 +15,14 @@
You should have received a copy of the GNU General Public License
along with this program. If not, see <https://www.gnu.org/licenses/>.
"""

from __future__ import annotations

import torch
from enum import Enum
import math
import os
import logging
import copy
import comfy.utils
import comfy.model_management
import comfy.model_detection
@@ -38,7 +39,7 @@
 import comfy.ldm.flux.controlnet
 import comfy.ldm.qwen_image.controlnet
 import comfy.cldm.dit_embedder
-from typing import TYPE_CHECKING
+from typing import TYPE_CHECKING, Union
 if TYPE_CHECKING:
     from comfy.hooks import HookGroup

@@ -64,6 +65,18 @@ class StrengthType(Enum):
     CONSTANT = 1
     LINEAR_UP = 2

+class ControlIsolation:
+    '''Temporarily set a ControlBase object's previous_controlnet to None to prevent cascading calls.'''
+    def __init__(self, control: ControlBase):
+        self.control = control
+        self.orig_previous_controlnet = control.previous_controlnet
+
+    def __enter__(self):
+        self.control.previous_controlnet = None
+
+    def __exit__(self, *args):
+        self.control.previous_controlnet = self.orig_previous_controlnet
+
 class ControlBase:
     def __init__(self):
         self.cond_hint_original = None
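
`ControlIsolation` is a small context manager that detaches a link from its chain for the duration of a call and restores it afterwards. A self-contained sketch of the same pattern, using hypothetical `Link` and `Isolation` classes as stand-ins for `ControlBase` and `ControlIsolation`:

```python
class Link:
    """Hypothetical stand-in for a chained control object."""

    def __init__(self, name, previous=None):
        self.name = name
        self.previous = previous

    def names(self):
        # Walks the whole chain, like get_models() does in the diff.
        out = [self.name]
        if self.previous is not None:
            out += self.previous.names()
        return out


class Isolation:
    """Temporarily set link.previous to None, then restore it on exit."""

    def __init__(self, link):
        self.link = link
        self.orig_previous = link.previous

    def __enter__(self):
        self.link.previous = None

    def __exit__(self, *args):
        self.link.previous = self.orig_previous


base = Link("base")
top = Link("top", previous=base)

with Isolation(top):
    isolated = top.names()  # only the link itself is visited
restored = top.names()      # the chain is back after the block
```

Because `__exit__` also runs when the body raises, the chain is restored even if the isolated call fails.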
@@ -77,14 +90,15 @@ def __init__(self):
         self.compression_ratio = 8
         self.upscale_algorithm = 'nearest-exact'
         self.extra_args = {}
-        self.previous_controlnet = None
+        self.previous_controlnet: Union[ControlBase, None] = None
         self.extra_conds = []
         self.strength_type = StrengthType.CONSTANT
         self.concat_mask = False
         self.extra_concat_orig = []
         self.extra_concat = None
         self.extra_hooks: HookGroup = None
         self.preprocess_image = lambda a: a
+        self.multigpu_clones: dict[torch.device, ControlBase] = {}

     def set_cond_hint(self, cond_hint, strength=1.0, timestep_percent_range=(0.0, 1.0), vae=None, extra_concat=[]):
         self.cond_hint_original = cond_hint
@@ -111,17 +125,38 @@ def set_previous_controlnet(self, controlnet):
     def cleanup(self):
         if self.previous_controlnet is not None:
             self.previous_controlnet.cleanup()
-
+        for device_cnet in self.multigpu_clones.values():
+            with ControlIsolation(device_cnet):
+                device_cnet.cleanup()
         self.cond_hint = None
         self.extra_concat = None
         self.timestep_range = None

     def get_models(self):
         out = []
+        for device_cnet in self.multigpu_clones.values():
+            out += device_cnet.get_models_only_self()
         if self.previous_controlnet is not None:
             out += self.previous_controlnet.get_models()
         return out

+    def get_models_only_self(self):
+        'Calls get_models, but temporarily sets previous_controlnet to None.'
+        with ControlIsolation(self):
+            return self.get_models()
+
+    def get_instance_for_device(self, device):
+        'Returns instance of this Control object intended for selected device.'
+        return self.multigpu_clones.get(device, self)
+
+    def deepclone_multigpu(self, load_device, autoregister=False):
+        '''
+        Create deep clone of Control object where model(s) is set to other devices.
+
+        When autoregister is set to True, the deep clone is also added to multigpu_clones dict.
+        '''
+        raise NotImplementedError("Classes inheriting from ControlBase should define their own deepclone_multigpu function.")
+
     def get_extra_hooks(self):
         out = []
         if self.extra_hooks is not None:
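
`get_instance_for_device` falls back to the original object when no clone is registered for a device, and `deepclone_multigpu` only registers the clone when `autoregister=True`. A minimal sketch of that lookup, assuming a hypothetical `Control` class and plain strings in place of `torch.device` keys:

```python
class Control:
    """Hypothetical stand-in for ControlBase with a per-device clone registry."""

    def __init__(self, name):
        self.name = name
        self.multigpu_clones = {}

    def get_instance_for_device(self, device):
        # Fall back to self when the device has no registered clone.
        return self.multigpu_clones.get(device, self)

    def deepclone_multigpu(self, load_device, autoregister=False):
        clone = Control(f"{self.name}@{load_device}")
        if autoregister:
            self.multigpu_clones[load_device] = clone
        return clone


primary = Control("cnet")
registered = primary.deepclone_multigpu("cuda:1", autoregister=True)
unregistered = primary.deepclone_multigpu("cuda:2")
```

With this shape, callers on the primary device need no special case: an unknown device simply resolves to the original instance.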
@@ -130,7 +165,7 @@ def get_extra_hooks(self):
             out += self.previous_controlnet.get_extra_hooks()
         return out

-    def copy_to(self, c):
+    def copy_to(self, c: ControlBase):
         c.cond_hint_original = self.cond_hint_original
         c.strength = self.strength
         c.timestep_percent_range = self.timestep_percent_range
@@ -284,6 +319,14 @@ def copy(self):
         self.copy_to(c)
         return c

+    def deepclone_multigpu(self, load_device, autoregister=False):
+        c = self.copy()
+        c.control_model = copy.deepcopy(c.control_model)
+        c.control_model_wrapped = comfy.model_patcher.ModelPatcher(c.control_model, load_device=load_device, offload_device=comfy.model_management.unet_offload_device())
+        if autoregister:
+            self.multigpu_clones[load_device] = c
+        return c
Comment on lines +322 to +328
⚠️ Potential issue | 🟠 Major

Preserve the previous_controlnet chain in multigpu clones.

These new clone paths build c from copy(), but copy_to() does not carry previous_controlnet. Once get_instance_for_device() returns the per-device clone, stacked ControlNets/T2IAdapters on earlier links are silently dropped on secondary GPUs.

As per coding guidelines, comfy/** changes should focus on backward compatibility.

Also applies to: 952-958

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@comfy/controlnet.py` around lines 322 - 328, The multigpu clone path in
deepclone_multigpu currently builds c = self.copy() which does not carry the
previous_controlnet chain, causing stacked ControlNets/T2IAdapters to be lost on
secondary GPUs; update deepclone_multigpu to copy previous_controlnet (and any
linked .previous_controlnet chain) from self to c after c = self.copy() so that
the full chain is preserved, then continue deep-copying control_model and
wrapping it as before (ensure multigpu_clones[load_device] assignment remains
unchanged); apply the same preservation of previous_controlnet chaining to the
similar clone code paths that use copy_to()/get_instance_for_device() so all
per-device clones keep the full previous_controlnet chain.
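
One way the reviewer's suggestion might look, as a sketch only: the `Control` class below is hypothetical, and re-attaching the chain inside `deepclone_multigpu` is an assumption rather than the fix adopted in this PR (the sampling code may resolve per-device links elsewhere).

```python
import copy


class Control:
    """Hypothetical stand-in for a ControlNet-like object with a chain."""

    def __init__(self, model, previous_controlnet=None):
        self.model = model
        self.previous_controlnet = previous_controlnet
        self.multigpu_clones = {}

    def copy(self):
        # Mirrors the reviewed behavior: the chain is not carried over.
        return Control(self.model)

    def deepclone_multigpu(self, load_device, autoregister=False):
        c = self.copy()
        c.model = copy.deepcopy(c.model)
        # Assumption: re-attach the chain, preferring the earlier link's
        # clone for the same device and falling back to the original link.
        prev = self.previous_controlnet
        if prev is not None:
            c.previous_controlnet = prev.multigpu_clones.get(load_device, prev)
        if autoregister:
            self.multigpu_clones[load_device] = c
        return c


first = Control({"w": 1})
second = Control({"w": 2}, previous_controlnet=first)

first_clone = first.deepclone_multigpu("cuda:1", autoregister=True)
second_clone = second.deepclone_multigpu("cuda:1", autoregister=True)
```

Cloning the earliest link first lets later links pick up its per-device clone; if the order were reversed, this sketch would fall back to the original object on the primary device.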


     def get_models(self):
         out = super().get_models()
         out.append(self.control_model_wrapped)
@@ -314,6 +357,10 @@ def pre_run(self, model, percent_to_timestep_function):
         super().pre_run(model, percent_to_timestep_function)
         self.set_extra_arg("base_model", model.diffusion_model)

+    def cleanup(self):
+        self.extra_args.pop("base_model", None)
+        super().cleanup()
+
     def copy(self):
         c = QwenFunControlNet(None, global_average_pooling=self.global_average_pooling, load_device=self.load_device, manual_cast_dtype=self.manual_cast_dtype)
         c.control_model = self.control_model
@@ -906,6 +953,14 @@ def copy(self):
         self.copy_to(c)
         return c

+    def deepclone_multigpu(self, load_device, autoregister=False):
+        c = self.copy()
+        c.t2i_model = copy.deepcopy(c.t2i_model)
+        c.device = load_device
+        if autoregister:
+            self.multigpu_clones[load_device] = c
+        return c
+
 def load_t2i_adapter(t2i_data, model_options={}): #TODO: model_options
     compression_ratio = 8
     upscale_algorithm = 'nearest-exact'
20 changes: 12 additions & 8 deletions comfy/ldm/hunyuan3dv2_1/hunyuandit.py
@@ -607,9 +607,13 @@ def __init__(
     def forward(self, x, t, context, transformer_options = {}, **kwargs):

         x = x.movedim(-1, -2)
-        if context.shape[0] >= 2:
-            uncond_emb, cond_emb = context.chunk(2, dim = 0)
-            context = torch.cat([cond_emb, uncond_emb], dim = 0)
+
+        swap_cfg_halves = context.shape[0] >= 2
+
+        if swap_cfg_halves:
+            first_half, second_half = context.chunk(2, dim = 0)
+            context = torch.cat([second_half, first_half], dim = 0)
+
         main_condition = context

         t = 1.0 - t
@@ -657,8 +661,8 @@ def block_wrap(args):
         output = self.final_layer(combined)
         output = output.movedim(-2, -1) * (-1.0)

-        if output.shape[0] >= 2:
-            cond_emb, uncond_emb = output.chunk(2, dim = 0)
-            return torch.cat([uncond_emb, cond_emb])
-        else:
-            return output
+        if swap_cfg_halves:
+            first_half, second_half = output.chunk(2, dim = 0)
+            output = torch.cat([second_half, first_half], dim = 0)
+
+        return output
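
The change decides once, from the input batch, whether to swap the two halves, and then applies the same swap to the output so the caller sees the original order. A list-based sketch of that round trip, assuming even batch sizes and using hypothetical `swap_halves` and `forward` helpers in place of `torch.chunk` and `torch.cat`:

```python
def swap_halves(batch):
    """Move the second half of an even-length batch in front of the first."""
    half = len(batch) // 2
    return batch[half:] + batch[:half]


def forward(context):
    # Decide once from the input, as the diff does with swap_cfg_halves.
    swap_cfg_halves = len(context) >= 2
    if swap_cfg_halves:
        context = swap_halves(context)
    # Stand-in for the model body: one output per batch entry.
    output = [f"out({c})" for c in context]
    if swap_cfg_halves:
        output = swap_halves(output)
    return output


pair = forward(["a", "b"])
single = forward(["a"])
```

Reusing one flag for both swaps keeps the input and output decisions in agreement, which the previous code did not guarantee because it tested `context.shape[0]` and `output.shape[0]` separately.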
36 changes: 18 additions & 18 deletions comfy/memory_management.py
@@ -1,6 +1,5 @@
 import math
 import ctypes
-import threading
 import dataclasses
 import torch
 from typing import NamedTuple
@@ -10,7 +9,7 @@

 class TensorFileSlice(NamedTuple):
     file_ref: object
-    thread_id: int
+    lock: object
     offset: int
     size: int

@@ -43,7 +42,6 @@ def read_tensor_file_slice_into(tensor, destination, stream=None, destination2=N
     file_obj = info.file_ref
     if (destination.device.type != "cpu"
             or file_obj is None
-            or threading.get_ident() != info.thread_id
             or destination.numel() * destination.element_size() < info.size
             or tensor.numel() * tensor.element_size() != info.size
             or tensor.storage_offset() != 0
@@ -57,27 +55,29 @@ def read_tensor_file_slice_into(tensor, destination, stream=None, destination2=N
     if hostbuf is not None:
         stream_ptr = getattr(stream, "cuda_stream", 0) if stream is not None else 0
         device_ptr = destination2.data_ptr() if destination2 is not None else 0
-        hostbuf.read_file_slice(file_obj, info.offset, info.size,
-                                offset=destination.data_ptr() - hostbuf.get_raw_address(),
-                                stream=stream_ptr,
-                                device_ptr=device_ptr,
-                                device=None if destination2 is None else destination2.device.index)
+        with info.lock:
+            hostbuf.read_file_slice(file_obj, info.offset, info.size,
+                                    offset=destination.data_ptr() - hostbuf.get_raw_address(),
+                                    stream=stream_ptr,
+                                    device_ptr=device_ptr,
+                                    device=None if destination2 is None else destination2.device.index)
         return True

     buf_type = ctypes.c_ubyte * info.size
     view = memoryview(buf_type.from_address(destination.data_ptr()))

     try:
-        file_obj.seek(info.offset)
-        done = 0
-        while done < info.size:
-            try:
-                n = file_obj.readinto(view[done:])
-            except OSError:
-                return False
-            if n <= 0:
-                return False
-            done += n
+        with info.lock:
+            file_obj.seek(info.offset)
+            done = 0
+            while done < info.size:
+                try:
+                    n = file_obj.readinto(view[done:])
+                except OSError:
+                    return False
+                if n <= 0:
+                    return False
+                done += n
         return True
     finally:
         view.release()
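
The hunk above swaps the "only the owning thread may read" check for a lock stored alongside the file reference, so any thread can read as long as the `seek` and the following reads happen atomically. A minimal standard-library sketch of that idea, assuming a hypothetical `FileSlice` tuple and `read_slice` helper, with an in-memory `io.BytesIO` standing in for the real file:

```python
import io
import threading
from typing import NamedTuple


class FileSlice(NamedTuple):
    """Hypothetical analogue of TensorFileSlice with a shared lock."""
    file_ref: object
    lock: object
    offset: int
    size: int


def read_slice(info):
    """Read info.size bytes at info.offset; return None on a short read."""
    buf = bytearray(info.size)
    view = memoryview(buf)
    # The lock keeps seek + readinto atomic across threads sharing the file.
    with info.lock:
        info.file_ref.seek(info.offset)
        done = 0
        while done < info.size:
            n = info.file_ref.readinto(view[done:])
            if not n:
                return None
            done += n
    return bytes(buf)


data = bytes(range(256)) * 64  # 16 KiB of known content
shared_file = io.BytesIO(data)
shared_lock = threading.Lock()
slices = [FileSlice(shared_file, shared_lock, i * 1024, 1024) for i in range(16)]
results = [None] * len(slices)


def worker(index):
    results[index] = read_slice(slices[index])


threads = [threading.Thread(target=worker, args=(i,)) for i in range(len(slices))]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Without the lock, one thread's `seek` could land between another thread's `seek` and `readinto`, so a slice would be filled from the wrong offset; the earlier thread-id check avoided that by refusing cross-thread reads altogether.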