ComfyUI fails to cache model to RAM, reloads from disk every run.

### Custom Node Testing

- [x] I have tried disabling custom nodes and the issue persists (see [how to disable custom nodes](https://docs.comfy.org/troubleshooting/custom-node-issues#step-1%3A-test-with-all-custom-nodes-disabled) if you need help)

### Expected Behavior

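Models evicted from VRAM should be kept in system RAM, as far as RAM allows, so that re-running a workflow does not reload them from disk.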
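For illustration only, the expected behavior resembles a byte-budgeted LRU cache. A model that leaves VRAM stays in RAM while it fits, and only a cache miss goes to disk. Everything here is hypothetical (`RamModelCache`, `load_from_disk`, the budget, and `bytes` standing in for tensors). It is not how ComfyUI's "RAM pressure cache" is actually implemented.

```python
from collections import OrderedDict
from typing import Callable


class RamModelCache:
    """Hypothetical LRU cache of model weights held in system RAM.

    Entries are keyed by e.g. a checkpoint path. When the total size would
    exceed `budget_bytes`, the least recently used entries are dropped.
    A miss falls back to `load_from_disk` (also hypothetical).
    """

    def __init__(self, budget_bytes: int, load_from_disk: Callable[[str], bytes]):
        self.budget_bytes = budget_bytes
        self.load_from_disk = load_from_disk
        self._entries: "OrderedDict[str, bytes]" = OrderedDict()
        self._used = 0
        self.disk_loads = 0  # how many times we had to go to disk

    def get(self, key: str) -> bytes:
        if key in self._entries:
            # Hit: mark as most recently used; no disk access.
            self._entries.move_to_end(key)
            return self._entries[key]
        # Miss: read from disk, then try to keep the result in RAM.
        data = self.load_from_disk(key)
        self.disk_loads += 1
        self._insert(key, data)
        return data

    def _insert(self, key: str, data: bytes) -> None:
        size = len(data)
        if size > self.budget_bytes:
            return  # too large to cache at all
        # Evict least recently used entries until the new one fits.
        while self._used + size > self.budget_bytes:
            _, evicted = self._entries.popitem(last=False)
            self._used -= len(evicted)
        self._entries[key] = data
        self._used += size
```

With a budget that holds both models, alternating between them over three runs touches the disk only twice. With a budget that holds just one, the two models keep evicting each other and every access becomes a disk load, which resembles the behavior reported here.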

### Actual Behavior

Models are re-read from disk whenever they are evicted from VRAM. In practice this happens on every run of a workflow whose models do not all fit in VRAM at once.
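One way to confirm on Linux that the re-runs really hit storage is to sample the ComfyUI process's `/proc/<pid>/io` counters before and after a run. `read_bytes` counts bytes the kernel actually fetched from the storage layer on behalf of the process, so page-cache hits are excluded, while `rchar` counts all bytes passed through `read()`-style calls. This is a sketch, not part of ComfyUI; the pid is assumed to be that of the `main.py` process (for example from `pgrep -f main.py`).

```python
def parse_proc_io(text: str) -> dict:
    """Parse the 'key: value' lines of /proc/<pid>/io into a dict of ints."""
    counters = {}
    for line in text.splitlines():
        if ":" in line:
            key, value = line.split(":", 1)
            counters[key.strip()] = int(value)  # int() tolerates surrounding whitespace
    return counters


def read_bytes_from_storage(pid: int) -> int:
    """Bytes the kernel attributes to `pid` as fetched from storage (Linux only).

    Reading another user's process may require elevated privileges.
    """
    with open(f"/proc/{pid}/io") as f:
        return parse_proc_io(f.read())["read_bytes"]
```

Sampling this before and after a re-run should show a difference of tens of gigabytes if the weights are coming from disk, and close to zero if they are served from RAM.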

### Steps to Reproduce

[WanWorkflow.json](https://github.com/user-attachments/files/28176975/WanWorkflow.json)

Run this workflow on a 24 GB VRAM card (here an RTX 3090 Ti), then re-run it. Every re-run loads the models from disk again.

<img width="614" height="168" alt="Image" src="https://github.com/user-attachments/assets/719c2bc1-0740-4e81-a3ce-3d282e4f01f0" />
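To repeat the run without clicking Queue each time, the workflow can be queued over ComfyUI's HTTP API, which accepts a `POST /prompt` with a JSON body of the form `{"prompt": ...}`. This sketch assumes the workflow has been exported in API format (the attached `WanWorkflow.json` may be in the UI format instead). It also changes the sampler seed on every run, because ComfyUI caches node outputs and would otherwise skip sampling for an identical prompt. The node id and the `noise_seed` input name are hypothetical placeholders; check them in your own export.

```python
import copy
import json
import urllib.request

COMFY_URL = "http://127.0.0.1:8188"  # default address printed at startup


def with_seed(workflow_api: dict, node_id: str, input_name: str, seed: int) -> dict:
    """Return a copy of an API-format workflow with one sampler seed replaced.

    `node_id` and `input_name` are hypothetical; look them up in your export.
    """
    wf = copy.deepcopy(workflow_api)
    wf[node_id]["inputs"][input_name] = seed
    return wf


def build_prompt_request(workflow_api: dict, base_url: str = COMFY_URL) -> urllib.request.Request:
    """Build a POST /prompt request that queues one run of the workflow."""
    body = json.dumps({"prompt": workflow_api}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/prompt",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def queue_runs(path: str, node_id: str, input_name: str = "noise_seed", runs: int = 3) -> None:
    """Queue the same workflow `runs` times, each with a different seed."""
    with open(path, encoding="utf-8") as f:
        workflow_api = json.load(f)
    for seed in range(runs):
        req = build_prompt_request(with_seed(workflow_api, node_id, input_name, seed))
        with urllib.request.urlopen(req) as resp:
            print(resp.status, resp.read().decode("utf-8"))
```

For example, `queue_runs("WanWorkflow_api.json", node_id="57")`, where both the file name and the node id are placeholders.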

### Debug Logs

```
Downloads/ComfyUINew$ uv run main.py 
setup plugin alembic.autogenerate.schemas
setup plugin alembic.autogenerate.tables
setup plugin alembic.autogenerate.types
setup plugin alembic.autogenerate.constraints
setup plugin alembic.autogenerate.defaults
setup plugin alembic.autogenerate.comments
Found comfy_kitchen backend triton: {'available': True, 'disabled': True, 'unavailable_reason': None, 'capabilities': ['apply_rope', 'apply_rope1', 'dequantize_nvfp4', 'dequantize_per_tensor_fp8', 'quantize_mxfp8', 'quantize_nvfp4', 'quantize_per_tensor_fp8']}
Found comfy_kitchen backend cuda: {'available': True, 'disabled': False, 'unavailable_reason': None, 'capabilities': ['apply_rope', 'apply_rope1', 'dequantize_nvfp4', 'dequantize_per_tensor_fp8', 'quantize_mxfp8', 'quantize_nvfp4', 'quantize_per_tensor_fp8', 'scaled_mm_nvfp4']}
Found comfy_kitchen backend eager: {'available': True, 'disabled': False, 'unavailable_reason': None, 'capabilities': ['apply_rope', 'apply_rope1', 'dequantize_mxfp8', 'dequantize_nvfp4', 'dequantize_per_tensor_fp8', 'quantize_mxfp8', 'quantize_nvfp4', 'quantize_per_tensor_fp8', 'scaled_mm_mxfp8', 'scaled_mm_nvfp4']}
Checkpoint files will always be loaded safely.
Total VRAM 24105 MB, total RAM 63989 MB
pytorch version: 2.12.0+cu130
Set vram state to: NORMAL_VRAM
Device: cuda:0 NVIDIA GeForce RTX 3090 Ti : cudaMallocAsync
Using async weight offloading with 2 streams
Enabled pinned memory 57590.0
Using pytorch attention
aimdo: /project/src-posix/cuda-funchooks.c:52:DEBUG:aimdo_setup_hooks: hooks successfully installed
aimdo: /project/src/control.c:236:INFO:comfy-aimdo inited for GPU: NVIDIA GeForce RTX 3090 Ti (VRAM: 24105 MB)
DynamicVRAM support detected and enabled
Python version: 3.12.3 (main, Jan  8 2026, 11:30:50) [GCC 13.3.0]
ComfyUI version: 0.22.0
comfy-aimdo version: 0.4.3
comfy-kitchen version: 0.2.8
****** User settings have been changed to be stored on the server instead of browser storage. ******
****** For multi-user setups add the --multi-user CLI argument to enable multiple user profiles. ******
comfyui-frontend-package version: 1.43.18
comfyui-workflow-templates version: 0.9.82
comfyui-embedded-docs version: 0.5.0
comfy-kitchen version: 0.2.8
comfy-aimdo version: 0.4.3
[Prompt Server] web root: /home/il/Downloads/ComfyUINew/.venv/lib/python3.12/site-packages/comfyui_frontend_package/static
Asset seeder disabled

Import times for custom nodes:
   0.0 seconds: /home/il/Downloads/ComfyUINew/custom_nodes/websocket_image_save.py

Context impl SQLiteImpl.
Will assume non-transactional DDL.
Context impl SQLiteImpl.
Will assume non-transactional DDL.
Running upgrade  -> 0001_assets, Initial assets schema
Revision ID: 0001_assets
Revises: None
Create Date: 2025-12-10 00:00:00
Running upgrade 0001_assets -> 0002_merge_to_asset_references, Merge AssetInfo and AssetCacheState into unified asset_references table.
Running upgrade 0002_merge_to_asset_references -> 0003_add_metadata_job_id, Add system_metadata and job_id columns to asset_references.
Change preview_id FK from assets.id to asset_references.id.
Database upgraded from None to 0003_add_metadata_job_id
Using RAM pressure cache.
Starting server

To see the GUI go to: http://127.0.0.1:8188
got prompt
Using pytorch attention in VAE
Using pytorch attention in VAE
VAE load device: cuda:0, offload device: cpu, dtype: torch.bfloat16
Found quantization metadata version 1
Using MixedPrecisionOps for text encoder
CLIP/text encoder model load device: cuda:0, offload device: cpu, current: cpu, dtype: torch.float16
Requested to load WanTEModel
Model WanTEModel prepared for dynamic VRAM loading. 6419MB Staged. 0 patches attached. Force pre-loaded 73 weights: 488 KB.
Model WanTEModel prepared for dynamic VRAM loading. 6419MB Staged. 0 patches attached. Force pre-loaded 73 weights: 488 KB.
Requested to load WanVAE
Model WanVAE prepared for dynamic VRAM loading. 241MB Staged. 0 patches attached. Force pre-loaded 60 weights: 61 KB.
model weight dtype torch.float16, manual cast: None
model_type FLOW
Requested to load WAN21
Model WAN21 prepared for dynamic VRAM loading. 27251MB Staged. 0 patches attached. Force pre-loaded 160 weights: 1603 KB.
100%|█████████████████████████████████████████████| 2/2 [00:02<00:00,  1.32s/it]
model weight dtype torch.float16, manual cast: None
model_type FLOW
Requested to load WAN21
Model WAN21 prepared for dynamic VRAM loading. 27251MB Staged. 0 patches attached. Force pre-loaded 160 weights: 1603 KB.
100%|█████████████████████████████████████████████| 2/2 [00:02<00:00,  1.34s/it]
0 models unloaded.
Model WanVAE prepared for dynamic VRAM loading. 241MB Staged. 0 patches attached. Force pre-loaded 60 weights: 61 KB.
Prompt executed in 226.60 seconds
got prompt
Model WAN21 prepared for dynamic VRAM loading. 27251MB Staged. 0 patches attached. Force pre-loaded 160 weights: 1603 KB.
100%|█████████████████████████████████████████████| 2/2 [00:06<00:00,  3.03s/it]
Model WAN21 prepared for dynamic VRAM loading. 27251MB Staged. 0 patches attached. Force pre-loaded 160 weights: 1603 KB.
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:07<00:00,  3.60s/it]
0 models unloaded.
Model WanVAE prepared for dynamic VRAM loading. 241MB Staged. 0 patches attached. Force pre-loaded 60 weights: 61 KB.
Prompt executed in 174.92 seconds
got prompt
Model WAN21 prepared for dynamic VRAM loading. 27251MB Staged. 0 patches attached. Force pre-loaded 160 weights: 1603 KB.
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:02<00:00,  1.22s/it]
0 models unloaded.
Model WAN21 prepared for dynamic VRAM loading. 27251MB Staged. 0 patches attached. Force pre-loaded 160 weights: 1603 KB.
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:02<00:00,  1.43s/it]
0 models unloaded.
Model WanVAE prepared for dynamic VRAM loading. 241MB Staged. 0 patches attached. Force pre-loaded 60 weights: 61 KB.
Prompt executed in 167.58 seconds
```
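The per-prompt timings and model loads in the log above can be tallied with a small parser. It keys on two messages exactly as they are worded in this log ("Model X prepared for dynamic VRAM loading. NMB Staged" and "Prompt executed in N seconds"); other ComfyUI versions may phrase them differently. Applied to this log, it should report runs of 226.60 s, 174.92 s and 167.58 s, with both WAN21 loads repeated on every prompt while the text encoder (WanTEModel) is staged only during the first. The log alone does not say whether a staged model came from RAM or from disk.

```python
import re
from typing import List, Tuple

STAGED = re.compile(r"Model (\w+) prepared for dynamic VRAM loading\. (\d+)MB Staged")
EXECUTED = re.compile(r"Prompt executed in ([\d.]+) seconds")


def summarize_runs(log: str) -> List[Tuple[float, List[str]]]:
    """Split a ComfyUI console log at 'got prompt' and, for each completed
    prompt, return (execution time in seconds, models staged during it)."""
    runs = []
    for chunk in log.split("got prompt")[1:]:  # text before the first prompt is startup
        done = EXECUTED.search(chunk)
        if done is None:
            continue  # prompt still running or log truncated
        models = [m.group(1) for m in STAGED.finditer(chunk)]
        runs.append((float(done.group(1)), models))
    return runs
```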

### Other

This might be related to https://github.com/Comfy-Org/ComfyUI/issues/13978, but this report involves no GGUF or other custom nodes and a fresh install.

PS: This also happens with FP8 weights for both models, even though host memory usage reported by nvtop is now only 2.5 GB, down from about 16 GB with bf16. That low figure further highlights that the models are not being cached in RAM.

```
Model WAN21 prepared for dynamic VRAM loading. 13627MB Staged. 0 patches attached. Force pre-loaded 160 weights: 1603 KB.
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:04<00:00,  2.32s/it]
model weight dtype torch.float8_e4m3fn, manual cast: torch.float16
model_type FLOW
Requested to load WAN21
Model WAN21 prepared for dynamic VRAM loading. 13627MB Staged. 0 patches attached. Force pre-loaded 160 weights: 1603 KB.
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:06<00:00,  3.18s/it]
0 models unloaded.
Model WanVAE prepared for dynamic VRAM loading. 241MB Staged. 0 patches attached. Force pre-loaded 60 weights: 61 KB.
Prompt executed in 256.09 seconds
got prompt
Model WAN21 prepared for dynamic VRAM loading. 13627MB Staged. 0 patches attached. Force pre-loaded 160 weights: 1603 KB.
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:07<00:00,  3.64s/it]
Model WAN21 prepared for dynamic VRAM loading. 13627MB Staged. 0 patches attached. Force pre-loaded 160 weights: 1603 KB.
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:04<00:00,  2.09s/it]
0 models unloaded.
Model WanVAE prepared for dynamic VRAM loading. 241MB Staged. 0 patches attached. Force pre-loaded 60 weights: 61 KB.
Prompt executed in 146.36 seconds

```
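A rough estimate of how much host memory the whole workflow would need in order to stay cached, using the "Staged" sizes from the two logs above. This assumes that the staged size approximates each model's host-memory footprint and that the workflow uses two separate WAN21 models (the log requests WAN21 twice per prompt); memory used by the OS and by ComfyUI itself is ignored.

```python
# Staged sizes in MB, copied from the logs in this report.
TEXT_ENCODER_MB = 6419                     # WanTEModel
VAE_MB = 241                               # WanVAE
WAN21_MB = {"fp16": 27251, "fp8": 13627}   # one WAN21 model, per weight dtype
TOTAL_RAM_MB = 63989                       # "total RAM" from the startup log
PINNED_LIMIT_MB = 57590                    # "Enabled pinned memory" from the startup log


def working_set_mb(precision: str, wan_models: int = 2) -> int:
    """Approximate host memory needed to keep every model of the workflow in RAM."""
    return wan_models * WAN21_MB[precision] + TEXT_ENCODER_MB + VAE_MB


for precision in ("fp16", "fp8"):
    need = working_set_mb(precision)
    print(f"{precision}: ~{need} MB needed, {TOTAL_RAM_MB - need} MB left of "
          f"{TOTAL_RAM_MB} MB RAM (pinned limit {PINNED_LIMIT_MB} MB)")
```

By this estimate the 16-bit set needs about 61 GB, close to the 64 GB of RAM and above the pinned-memory limit reported at startup, while the FP8 set needs about 34 GB and should fit with roughly 30 GB to spare.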


ComfyUI fails to cache model to RAM, reloads from disk every run. #14076