fix(nodes): restore non-quantized backups locally and skip super's mmap walk by Booyaka101 · Pull Request #445 · city96/ComfyUI-GGUF

Booyaka101 · 2026-05-10T06:34:47Z

Closes #444.

Symptom

UnetLoaderGGUF wired into a multi-output node (e.g. `(MODEL, STRING)`) causes ComfyUI to flush the cached patcher and call `unpatch_model`. The forwarded `unpatch_weights=True` reaches comfy core's `ModelPatcher.unpatch_model`, which ultimately runs `self.model.to(device_to)`. Walking `.to()` over the still-mmap-backed GGUF quantized tensors faults with a Windows `EXCEPTION_ACCESS_VIOLATION`.

Background

This is a known regression. The September 2024 mitigation in 6dbb4ba gated the device_to forward on `DISABLE_SMART_MEMORY`, sidestepping the `.to()` walk in normal use. It was reverted in 717a0e1 ("Seems to break VRAM estimation and needs more involved fix with upstream ComfyUI repo").

@stonerabit's proposed fix (forward `unpatch_weights=False` to super) avoids the crash, but it also skips comfy core's non-quantized backup-restore loop. Combined with `patch_weight_to_device:51` — `if key not in self.backup` only re-saves once — the next `patch_model` reads the still-patched live weight as `temp_weight` and `calculate_weight` re-applies the delta on top, so LoRA deltas compound across runs.

Fix

Restore non-quantized backups inside the GGUF override (mirroring comfy core's loop verbatim — `copy_to_param` / `set_attr_param`, `backup.clear()`, `current_weight_patches_uuid = None`, drop `comfy_patched_weights` attr), then forward `unpatch_weights=False` to super so it skips both the weight-restore block and the `self.model.to(device_to)` walk.

Net effect:

Quantized weights still get their `.patches` cleared (existing logic).
Non-quantized backups are restored, so LoRA round-trips correctly across runs.
super still runs `eject_model()` and `object_patches_backup` restoration (those live outside the `unpatch_weights` block).
The `self.model.to(device_to)` walk is bypassed — no access violation.

Verification

Cannot run real CUDA in the dev environment, so I built a stub-based reproducer (`verify_unpatch.py`) that synthesizes the relevant interfaces: an `mmap_backed` flag on a stub tensor that raises `OSError("EXCEPTION_ACCESS_VIOLATION")` from `.to(device)` (mirroring the Windows symptom), a minimal base `unpatch_model` mirroring the comfy core sequence, and three override variants applied to the same fixture:

```
Scenario 1: pre-fix override (forwards unpatch_weights=True)
PASS crashes with simulated Windows access violation
PASS matches reporter's symptom (#444)

Scenario 2: reporter's fix (super(unpatch_weights=False))
PASS avoids the access-violation crash
PASS LoRA-correctness regression: non-quantized backup NOT restored
PASS quantized patches still cleared

Scenario 3: proposed fix (manual backup restore + super(False))
PASS avoids the access-violation crash
PASS non-quantized backup restored to original
PASS quantized patches still cleared
PASS backup dict cleared post-restore
```

Scenario 1 reproduces the exact failure mode of the live unpatch path. Scenario 2 demonstrates the LoRA-correctness gap in the simpler patch. Scenario 3 covers what this PR ships.

Scope notes

The duplicated lines mirror comfy core lines 945-955 of `model_patcher.py`. If those drift, this override needs a follow-up sync — a comment marks the rationale.
Maintaining `device_to` propagation: super still receives `device_to=device_to` so `eject_model()` runs in the right context, but with `unpatch_weights=False` the `self.model.to(device_to)` line in super is gated on `if unpatch_weights` and won't run. VRAM accounting is preserved because we don't zero `model_loaded_weight_memory` (the model genuinely stays in memory; that's the point — comfy will re-evaluate residency on the next `partially_load`).
A more involved fix at the upstream comfy layer (skip mmap-backed tensors in the `model.to()` walk) would let this override go away. Out of scope for this PR.

Credit @stonerabit for the clean repro + first-pass diagnosis pinning the mmap-touching base patcher as the trigger.

…ap walk The override forwards unpatch_weights=True to comfy core's unpatch_model, which then calls self.model.to(device_to). Walking that .to() over the GGUF-quantized tensors that are still mmap-backed crashes with a Windows access violation (city96#444), surfacing whenever ComfyUI actually invokes unpatch_model — easiest repro is wrapping UnetLoaderGGUF in a node that returns multiple outputs, which flushes cache and triggers the unpatch path. The maintainer mitigated this in Sep 2024 (commit 6dbb4ba) by passing device_to=None to super, then reverted in 717a0e1 because that broke VRAM estimation. Reporter's proposed fix (passing unpatch_weights=False to super) avoids the crash but skips comfy core's non-quantized backup restoration — patch_weight_to_device's `if key not in self.backup` guard then never re-saves the original weight, and the next run's temp_weight = weight.to(...) reads the still-patched value, so LoRA deltas compound across runs. Restore the non-quantized backups in the override mirroring comfy core's loop (copy_to_param / set_attr_param + backup.clear() + comfy_patched_weights cleanup), then forward unpatch_weights=False so super skips its weight-restore + model.to(device_to) block. Keeps the crash fix without regressing LoRA correctness or VRAM accounting. Verified with a stub-based reproducer (verify_unpatch.py) covering three scenarios: pre-fix crashes on simulated mmap .to() — matches city96#444 reporter fix avoids crash, but non-quantized weight stays patched this fix avoids crash AND restores non-quantized backup correctly Closes city96#444 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

romybaby · 2026-05-23T20:51:05Z

Confirmed working on real CUDA. RTX 3060 (12GB), Windows 10, ComfyUI 0.22.2, PyTorch 2.8.0+cu129. Running Flux 2 Klein 9b, with both TE and diffusion model as GGUFs. Was facing an issue where any attempt to unload GGUF model weights (either manually or automatically, in my case triggered by normal between-run model caching) led to the same "Windows fatal exception: access violation" fault described above. After applying this patch, both manual and automatic unloads work and repeated runs are stable. LoRAs produce identical output across multiple runs with the same seed.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(nodes): restore non-quantized backups locally and skip super's mmap walk#445

fix(nodes): restore non-quantized backups locally and skip super's mmap walk#445
Booyaka101 wants to merge 1 commit into
city96:mainfrom
Booyaka101:fix/unpatch-model-mmap-access-violation

Booyaka101 commented May 10, 2026

Uh oh!

romybaby commented May 23, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

Booyaka101 commented May 10, 2026

Symptom

Background

Fix

Verification

Scope notes

Uh oh!

romybaby commented May 23, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants