Custom Node Testing
Expected Behavior
Cache models to RAM as needed.
Actual Behavior
Models are re-read from disk once they are evicted from VRAM. This happens on every single run of a workflow that uses multiple models which don't all fit into VRAM at once.
Steps to Reproduce
WanWorkflow.json
Run this workflow on 24GB VRAM, then re-run it. Marvel at the endless disk loading on every re-run.
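To put a number on the disk loading rather than eyeballing it, something like the following could be used on Linux. It is only a rough sketch: it diffs the cumulative "sectors read" counter from two snapshots of /proc/diskstats, one taken before queueing the prompt and one after it finishes. The helper names and the device name `nvme0n1` are made up for illustration and are not part of ComfyUI. The counter is system-wide, so other processes reading from the same disk are included.

```python
from pathlib import Path

SECTOR_BYTES = 512  # /proc/diskstats counts 512-byte sectors regardless of the device


def sectors_read(diskstats_text: str) -> dict[str, int]:
    """Map device name -> cumulative sectors read, parsed from /proc/diskstats text."""
    result: dict[str, int] = {}
    for line in diskstats_text.splitlines():
        fields = line.split()
        if len(fields) < 6:
            continue  # skip blank or malformed lines
        # fields: major, minor, name, reads completed, reads merged, sectors read, ...
        result[fields[2]] = int(fields[5])
    return result


def read_mb_between(before: str, after: str, device: str) -> float:
    """MB read from `device` between two /proc/diskstats snapshots."""
    delta = sectors_read(after)[device] - sectors_read(before)[device]
    return delta * SECTOR_BYTES / (1024 * 1024)


def snapshot() -> str:
    """Read the current /proc/diskstats contents (Linux only)."""
    return Path("/proc/diskstats").read_text()
```

Usage would be `before = snapshot()` right before queueing the workflow, `after = snapshot()` once "Prompt executed" shows up, then `read_mb_between(before, after, "nvme0n1")` with the device that actually holds the models. If models were served from RAM on a re-run, the result should be close to zero.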
Debug Logs
Downloads/ComfyUINew$ uv run main.py
setup plugin alembic.autogenerate.schemas
setup plugin alembic.autogenerate.tables
setup plugin alembic.autogenerate.types
setup plugin alembic.autogenerate.constraints
setup plugin alembic.autogenerate.defaults
setup plugin alembic.autogenerate.comments
Found comfy_kitchen backend triton: {'available': True, 'disabled': True, 'unavailable_reason': None, 'capabilities': ['apply_rope', 'apply_rope1', 'dequantize_nvfp4', 'dequantize_per_tensor_fp8', 'quantize_mxfp8', 'quantize_nvfp4', 'quantize_per_tensor_fp8']}
Found comfy_kitchen backend cuda: {'available': True, 'disabled': False, 'unavailable_reason': None, 'capabilities': ['apply_rope', 'apply_rope1', 'dequantize_nvfp4', 'dequantize_per_tensor_fp8', 'quantize_mxfp8', 'quantize_nvfp4', 'quantize_per_tensor_fp8', 'scaled_mm_nvfp4']}
Found comfy_kitchen backend eager: {'available': True, 'disabled': False, 'unavailable_reason': None, 'capabilities': ['apply_rope', 'apply_rope1', 'dequantize_mxfp8', 'dequantize_nvfp4', 'dequantize_per_tensor_fp8', 'quantize_mxfp8', 'quantize_nvfp4', 'quantize_per_tensor_fp8', 'scaled_mm_mxfp8', 'scaled_mm_nvfp4']}
Checkpoint files will always be loaded safely.
Total VRAM 24105 MB, total RAM 63989 MB
pytorch version: 2.12.0+cu130
Set vram state to: NORMAL_VRAM
Device: cuda:0 NVIDIA GeForce RTX 3090 Ti : cudaMallocAsync
Using async weight offloading with 2 streams
Enabled pinned memory 57590.0
Using pytorch attention
aimdo: /project/src-posix/cuda-funchooks.c:52:DEBUG:aimdo_setup_hooks: hooks successfully installed
aimdo: /project/src/control.c:236:INFO:comfy-aimdo inited for GPU: NVIDIA GeForce RTX 3090 Ti (VRAM: 24105 MB)
DynamicVRAM support detected and enabled
Python version: 3.12.3 (main, Jan 8 2026, 11:30:50) [GCC 13.3.0]
ComfyUI version: 0.22.0
comfy-aimdo version: 0.4.3
comfy-kitchen version: 0.2.8
****** User settings have been changed to be stored on the server instead of browser storage. ******
****** For multi-user setups add the --multi-user CLI argument to enable multiple user profiles. ******
comfyui-frontend-package version: 1.43.18
comfyui-workflow-templates version: 0.9.82
comfyui-embedded-docs version: 0.5.0
comfy-kitchen version: 0.2.8
comfy-aimdo version: 0.4.3
[Prompt Server] web root: /home/il/Downloads/ComfyUINew/.venv/lib/python3.12/site-packages/comfyui_frontend_package/static
Asset seeder disabled
Import times for custom nodes:
0.0 seconds: /home/il/Downloads/ComfyUINew/custom_nodes/websocket_image_save.py
Context impl SQLiteImpl.
Will assume non-transactional DDL.
Context impl SQLiteImpl.
Will assume non-transactional DDL.
Running upgrade -> 0001_assets, Initial assets schema
Revision ID: 0001_assets
Revises: None
Create Date: 2025-12-10 00:00:00
Running upgrade 0001_assets -> 0002_merge_to_asset_references, Merge AssetInfo and AssetCacheState into unified asset_references table.
Running upgrade 0002_merge_to_asset_references -> 0003_add_metadata_job_id, Add system_metadata and job_id columns to asset_references.
Change preview_id FK from assets.id to asset_references.id.
Database upgraded from None to 0003_add_metadata_job_id
Using RAM pressure cache.
Starting server
To see the GUI go to: http://127.0.0.1:8188
got prompt
Using pytorch attention in VAE
Using pytorch attention in VAE
VAE load device: cuda:0, offload device: cpu, dtype: torch.bfloat16
Found quantization metadata version 1
Using MixedPrecisionOps for text encoder
CLIP/text encoder model load device: cuda:0, offload device: cpu, current: cpu, dtype: torch.float16
Requested to load WanTEModel
Model WanTEModel prepared for dynamic VRAM loading. 6419MB Staged. 0 patches attached. Force pre-loaded 73 weights: 488 KB.
Model WanTEModel prepared for dynamic VRAM loading. 6419MB Staged. 0 patches attached. Force pre-loaded 73 weights: 488 KB.
Requested to load WanVAE
Model WanVAE prepared for dynamic VRAM loading. 241MB Staged. 0 patches attached. Force pre-loaded 60 weights: 61 KB.
model weight dtype torch.float16, manual cast: None
model_type FLOW
Requested to load WAN21
Model WAN21 prepared for dynamic VRAM loading. 27251MB Staged. 0 patches attached. Force pre-loaded 160 weights: 1603 KB.
100%|█████████████████████████████████████████████| 2/2 [00:02<00:00, 1.32s/it]
model weight dtype torch.float16, manual cast: None
model_type FLOW
Requested to load WAN21
Model WAN21 prepared for dynamic VRAM loading. 27251MB Staged. 0 patches attached. Force pre-loaded 160 weights: 1603 KB.
100%|█████████████████████████████████████████████| 2/2 [00:02<00:00, 1.34s/it]
0 models unloaded.
Model WanVAE prepared for dynamic VRAM loading. 241MB Staged. 0 patches attached. Force pre-loaded 60 weights: 61 KB.
Prompt executed in 226.60 seconds
got prompt
Model WAN21 prepared for dynamic VRAM loading. 27251MB Staged. 0 patches attached. Force pre-loaded 160 weights: 1603 KB.
100%|█████████████████████████████████████████████| 2/2 [00:06<00:00, 3.03s/it]
Model WAN21 prepared for dynamic VRAM loading. 27251MB Staged. 0 patches attached. Force pre-loaded 160 weights: 1603 KB.
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:07<00:00, 3.60s/it]
0 models unloaded.
Model WanVAE prepared for dynamic VRAM loading. 241MB Staged. 0 patches attached. Force pre-loaded 60 weights: 61 KB.
Prompt executed in 174.92 seconds
got prompt
Model WAN21 prepared for dynamic VRAM loading. 27251MB Staged. 0 patches attached. Force pre-loaded 160 weights: 1603 KB.
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:02<00:00, 1.22s/it]
0 models unloaded.
Model WAN21 prepared for dynamic VRAM loading. 27251MB Staged. 0 patches attached. Force pre-loaded 160 weights: 1603 KB.
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:02<00:00, 1.43s/it]
0 models unloaded.
Model WanVAE prepared for dynamic VRAM loading. 241MB Staged. 0 patches attached. Force pre-loaded 60 weights: 61 KB.
Prompt executed in 167.58 seconds
Other
Might be related to #13978, but this is without GGUF or any other custom nodes, on a brand new install.
PS: This also happens with the FP8 model for both, though host memory usage according to nvtop is now only 2.5GB, down from about 16GB in bf16, which again highlights the failure to cache to RAM.
Model WAN21 prepared for dynamic VRAM loading. 13627MB Staged. 0 patches attached. Force pre-loaded 160 weights: 1603 KB.
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:04<00:00, 2.32s/it]
model weight dtype torch.float8_e4m3fn, manual cast: torch.float16
model_type FLOW
Requested to load WAN21
Model WAN21 prepared for dynamic VRAM loading. 13627MB Staged. 0 patches attached. Force pre-loaded 160 weights: 1603 KB.
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:06<00:00, 3.18s/it]
0 models unloaded.
Model WanVAE prepared for dynamic VRAM loading. 241MB Staged. 0 patches attached. Force pre-loaded 60 weights: 61 KB.
Prompt executed in 256.09 seconds
got prompt
Model WAN21 prepared for dynamic VRAM loading. 13627MB Staged. 0 patches attached. Force pre-loaded 160 weights: 1603 KB.
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:07<00:00, 3.64s/it]
Model WAN21 prepared for dynamic VRAM loading. 13627MB Staged. 0 patches attached. Force pre-loaded 160 weights: 1603 KB.
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:04<00:00, 2.09s/it]
0 models unloaded.
Model WanVAE prepared for dynamic VRAM loading. 241MB Staged. 0 patches attached. Force pre-loaded 60 weights: 61 KB.
Prompt executed in 146.36 seconds
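For reference, here is the back-of-the-envelope arithmetic on whether everything could sit in RAM at all, using only the "Staged" sizes and memory totals from the logs above. Two things are assumptions on my part: that the two WAN21 loads per run are two distinct models of the same size, and that the "Enabled pinned memory 57590.0" figure is in MB. The function and constant names are just for illustration.

```python
# Sizes in MB, copied from the startup lines and "Staged" figures in the logs.
TOTAL_RAM_MB = 63989
PINNED_LIMIT_MB = 57590  # "Enabled pinned memory 57590.0"; unit assumed to be MB

TEXT_ENCODER_MB = 6419   # WanTEModel
VAE_MB = 241             # WanVAE
WAN_FP16_MB = 27251      # WAN21, torch.float16 run
WAN_FP8_MB = 13627       # WAN21, torch.float8_e4m3fn run


def working_set_mb(diffusion_model_mb: int, diffusion_model_count: int = 2) -> int:
    """Total staged size if all models of one run were kept in RAM together.

    Assumes `diffusion_model_count` distinct diffusion models of equal size.
    """
    return TEXT_ENCODER_MB + VAE_MB + diffusion_model_mb * diffusion_model_count


fp16_total = working_set_mb(WAN_FP16_MB)
fp8_total = working_set_mb(WAN_FP8_MB)

for label, total in (("fp16", fp16_total), ("fp8", fp8_total)):
    print(
        f"{label}: {total} MB staged, "
        f"fits in total RAM: {total <= TOTAL_RAM_MB}, "
        f"fits in pinned limit: {total <= PINNED_LIMIT_MB}"
    )
```

Under those assumptions the fp16 set comes to 61162 MB, which is under total RAM but over the pinned figure, while the FP8 set comes to 33914 MB and fits under both with plenty of room. I can't say whether the pinned limit has anything to do with the behavior, but the FP8 case at least suggests that running out of host memory is not the explanation.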