ComfyUI fails to cache model to RAM, reloads from disk every run. #14076

@BobJohnson24

Custom Node Testing

Expected Behavior

Models evicted from VRAM should be cached in RAM as needed, so later runs do not have to read them from disk again.

Actual Behavior

ComfyUI reads models from disk again once they have been evicted from VRAM. With a workflow that uses multiple models that don't all fit into VRAM at once, this happens on every single run.

Steps to Reproduce

WanWorkflow.json

Run this workflow on a GPU with 24GB of VRAM. Marvel at endless disk loading every time you re-run the workflow.

Debug Logs

Downloads/ComfyUINew$ uv run main.py 
setup plugin alembic.autogenerate.schemas
setup plugin alembic.autogenerate.tables
setup plugin alembic.autogenerate.types
setup plugin alembic.autogenerate.constraints
setup plugin alembic.autogenerate.defaults
setup plugin alembic.autogenerate.comments
Found comfy_kitchen backend triton: {'available': True, 'disabled': True, 'unavailable_reason': None, 'capabilities': ['apply_rope', 'apply_rope1', 'dequantize_nvfp4', 'dequantize_per_tensor_fp8', 'quantize_mxfp8', 'quantize_nvfp4', 'quantize_per_tensor_fp8']}
Found comfy_kitchen backend cuda: {'available': True, 'disabled': False, 'unavailable_reason': None, 'capabilities': ['apply_rope', 'apply_rope1', 'dequantize_nvfp4', 'dequantize_per_tensor_fp8', 'quantize_mxfp8', 'quantize_nvfp4', 'quantize_per_tensor_fp8', 'scaled_mm_nvfp4']}
Found comfy_kitchen backend eager: {'available': True, 'disabled': False, 'unavailable_reason': None, 'capabilities': ['apply_rope', 'apply_rope1', 'dequantize_mxfp8', 'dequantize_nvfp4', 'dequantize_per_tensor_fp8', 'quantize_mxfp8', 'quantize_nvfp4', 'quantize_per_tensor_fp8', 'scaled_mm_mxfp8', 'scaled_mm_nvfp4']}
Checkpoint files will always be loaded safely.
Total VRAM 24105 MB, total RAM 63989 MB
pytorch version: 2.12.0+cu130
Set vram state to: NORMAL_VRAM
Device: cuda:0 NVIDIA GeForce RTX 3090 Ti : cudaMallocAsync
Using async weight offloading with 2 streams
Enabled pinned memory 57590.0
Using pytorch attention
aimdo: /project/src-posix/cuda-funchooks.c:52:DEBUG:aimdo_setup_hooks: hooks successfully installed
aimdo: /project/src/control.c:236:INFO:comfy-aimdo inited for GPU: NVIDIA GeForce RTX 3090 Ti (VRAM: 24105 MB)
DynamicVRAM support detected and enabled
Python version: 3.12.3 (main, Jan  8 2026, 11:30:50) [GCC 13.3.0]
ComfyUI version: 0.22.0
comfy-aimdo version: 0.4.3
comfy-kitchen version: 0.2.8
****** User settings have been changed to be stored on the server instead of browser storage. ******
****** For multi-user setups add the --multi-user CLI argument to enable multiple user profiles. ******
comfyui-frontend-package version: 1.43.18
comfyui-workflow-templates version: 0.9.82
comfyui-embedded-docs version: 0.5.0
comfy-kitchen version: 0.2.8
comfy-aimdo version: 0.4.3
[Prompt Server] web root: /home/il/Downloads/ComfyUINew/.venv/lib/python3.12/site-packages/comfyui_frontend_package/static
Asset seeder disabled

Import times for custom nodes:
   0.0 seconds: /home/il/Downloads/ComfyUINew/custom_nodes/websocket_image_save.py

Context impl SQLiteImpl.
Will assume non-transactional DDL.
Context impl SQLiteImpl.
Will assume non-transactional DDL.
Running upgrade  -> 0001_assets, Initial assets schema
Revision ID: 0001_assets
Revises: None
Create Date: 2025-12-10 00:00:00
Running upgrade 0001_assets -> 0002_merge_to_asset_references, Merge AssetInfo and AssetCacheState into unified asset_references table.
Running upgrade 0002_merge_to_asset_references -> 0003_add_metadata_job_id, Add system_metadata and job_id columns to asset_references.
Change preview_id FK from assets.id to asset_references.id.
Database upgraded from None to 0003_add_metadata_job_id
Using RAM pressure cache.
Starting server

To see the GUI go to: http://127.0.0.1:8188
got prompt
Using pytorch attention in VAE
Using pytorch attention in VAE
VAE load device: cuda:0, offload device: cpu, dtype: torch.bfloat16
Found quantization metadata version 1
Using MixedPrecisionOps for text encoder
CLIP/text encoder model load device: cuda:0, offload device: cpu, current: cpu, dtype: torch.float16
Requested to load WanTEModel
Model WanTEModel prepared for dynamic VRAM loading. 6419MB Staged. 0 patches attached. Force pre-loaded 73 weights: 488 KB.
Model WanTEModel prepared for dynamic VRAM loading. 6419MB Staged. 0 patches attached. Force pre-loaded 73 weights: 488 KB.
Requested to load WanVAE
Model WanVAE prepared for dynamic VRAM loading. 241MB Staged. 0 patches attached. Force pre-loaded 60 weights: 61 KB.
model weight dtype torch.float16, manual cast: None
model_type FLOW
Requested to load WAN21
Model WAN21 prepared for dynamic VRAM loading. 27251MB Staged. 0 patches attached. Force pre-loaded 160 weights: 1603 KB.
100%|█████████████████████████████████████████████| 2/2 [00:02<00:00,  1.32s/it]
model weight dtype torch.float16, manual cast: None
model_type FLOW
Requested to load WAN21
Model WAN21 prepared for dynamic VRAM loading. 27251MB Staged. 0 patches attached. Force pre-loaded 160 weights: 1603 KB.
100%|█████████████████████████████████████████████| 2/2 [00:02<00:00,  1.34s/it]
0 models unloaded.
Model WanVAE prepared for dynamic VRAM loading. 241MB Staged. 0 patches attached. Force pre-loaded 60 weights: 61 KB.
Prompt executed in 226.60 seconds
got prompt
Model WAN21 prepared for dynamic VRAM loading. 27251MB Staged. 0 patches attached. Force pre-loaded 160 weights: 1603 KB.
100%|█████████████████████████████████████████████| 2/2 [00:06<00:00,  3.03s/it]
Model WAN21 prepared for dynamic VRAM loading. 27251MB Staged. 0 patches attached. Force pre-loaded 160 weights: 1603 KB.
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:07<00:00,  3.60s/it]
0 models unloaded.
Model WanVAE prepared for dynamic VRAM loading. 241MB Staged. 0 patches attached. Force pre-loaded 60 weights: 61 KB.
Prompt executed in 174.92 seconds
got prompt
Model WAN21 prepared for dynamic VRAM loading. 27251MB Staged. 0 patches attached. Force pre-loaded 160 weights: 1603 KB.
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:02<00:00,  1.22s/it]
0 models unloaded.
Model WAN21 prepared for dynamic VRAM loading. 27251MB Staged. 0 patches attached. Force pre-loaded 160 weights: 1603 KB.
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:02<00:00,  1.43s/it]
0 models unloaded.
Model WanVAE prepared for dynamic VRAM loading. 241MB Staged. 0 patches attached. Force pre-loaded 60 weights: 61 KB.
Prompt executed in 167.58 seconds
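
To compare runs without eyeballing the log, the per-prompt timings can be pulled out with a few lines of Python. This is only a minimal sketch: the `prompt_times` helper and the `LOG` excerpt are hypothetical names for illustration, and the pattern assumes the "Prompt executed in … seconds" line format shown above.

```python
import re

# Excerpt of the log above (only the lines relevant to timing).
LOG = """\
got prompt
Prompt executed in 226.60 seconds
got prompt
Prompt executed in 174.92 seconds
got prompt
Prompt executed in 167.58 seconds
"""


def prompt_times(log_text: str) -> list[float]:
    """Return the duration of every 'Prompt executed in X seconds' line."""
    pattern = r"Prompt executed in ([\d.]+) seconds"
    return [float(value) for value in re.findall(pattern, log_text)]


times = prompt_times(LOG)
first = times[0]
for run, seconds in enumerate(times, start=1):
    # Ratio to the first (cold) run; with working RAM caching the
    # re-runs would be expected to drop well below the first run.
    print(f"run {run}: {seconds:.2f} s ({seconds / first:.0%} of first run)")
```

In this log the re-runs still take roughly three quarters of the first run's time, which is consistent with the models being read from disk again.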

Other

Might be related to #13978, but this happens without GGUF or any other custom nodes, on a brand-new install.

PS: This also happens when using the FP8 model for both, though host memory usage according to nvtop is now only 2.5GB, down from about 16GB in bf16. That again highlights the failure to cache to RAM.

Model WAN21 prepared for dynamic VRAM loading. 13627MB Staged. 0 patches attached. Force pre-loaded 160 weights: 1603 KB.
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:04<00:00,  2.32s/it]
model weight dtype torch.float8_e4m3fn, manual cast: torch.float16
model_type FLOW
Requested to load WAN21
Model WAN21 prepared for dynamic VRAM loading. 13627MB Staged. 0 patches attached. Force pre-loaded 160 weights: 1603 KB.
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:06<00:00,  3.18s/it]
0 models unloaded.
Model WanVAE prepared for dynamic VRAM loading. 241MB Staged. 0 patches attached. Force pre-loaded 60 weights: 61 KB.
Prompt executed in 256.09 seconds
got prompt
Model WAN21 prepared for dynamic VRAM loading. 13627MB Staged. 0 patches attached. Force pre-loaded 160 weights: 1603 KB.
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:07<00:00,  3.64s/it]
Model WAN21 prepared for dynamic VRAM loading. 13627MB Staged. 0 patches attached. Force pre-loaded 160 weights: 1603 KB.
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:04<00:00,  2.09s/it]
0 models unloaded.
Model WanVAE prepared for dynamic VRAM loading. 241MB Staged. 0 patches attached. Force pre-loaded 60 weights: 61 KB.
Prompt executed in 146.36 seconds
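
As a rough back-of-the-envelope check, the "Staged" sizes from the logs can be summed and compared against the host memory reported at startup. This is a sketch under assumptions, not a statement about how ComfyUI accounts for memory: it assumes the "Staged" figures approximate each model's host-memory footprint, that the workflow uses two WAN21 models plus the same text encoder and VAE in both runs, and that the "Enabled pinned memory 57590.0" figure is in MB. The helper name `total_staged_mb` is hypothetical.

```python
# Figures copied from the logs above (all in MB).
TOTAL_RAM_MB = 63989
PINNED_MB = 57590  # assumption: "Enabled pinned memory 57590.0" is in MB

# Assumption: two WAN21 models, same text encoder and VAE in both runs.
STAGED_FP16 = {"WanTEModel": 6419, "WanVAE": 241, "WAN21 #1": 27251, "WAN21 #2": 27251}
STAGED_FP8 = {"WanTEModel": 6419, "WanVAE": 241, "WAN21 #1": 13627, "WAN21 #2": 13627}


def total_staged_mb(staged: dict[str, int]) -> int:
    """Sum the staged sizes of all models in the workflow."""
    return sum(staged.values())


for label, staged in (("fp16", STAGED_FP16), ("fp8", STAGED_FP8)):
    total = total_staged_mb(staged)
    print(
        f"{label}: {total} MB staged, "
        f"fits in total RAM: {total <= TOTAL_RAM_MB}, "
        f"fits in pinned limit: {total <= PINNED_MB}"
    )
```

Under these assumptions the FP8 set is far smaller than the available host memory, so memory pressure alone would not obviously explain the reloads from disk in that case.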

Labels

Potential Bug: User is reporting a bug. This should be tested.
