Bypass safetensors mmap when --disable-mmap is set #13609
Load safetensors through a direct read path under --disable-mmap so unified-memory systems avoid retaining mmap-backed file pages alongside framework tensors. Made-with: Cursor
📝 Walkthrough
The change adds a new safetensors loader function that manually reads file headers and streams tensor payloads into memory, bypassing Rust mmap. This loader is integrated into
🚥 Pre-merge checks | ✅ 4 | ❌ 1
❌ Failed checks (1 warning)
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@comfy/utils.py`:
- Around line 149-156: The code reads header_size from the file and trusts it;
to prevent OOM attacks you must cap header_size to the same hard limit used by
safetensors_header() before calling f.read(header_size): validate that
header_size is an int within (0, MAX_HEADER_SIZE] (or the existing constant/name
used in safetensors_header), raise a ValueError if out of range, then proceed to
read and decode header_data and json.loads as before; update the logic around
header_size/header_data in the function containing this diff to reuse the
existing safetensors_header size limit and checks.
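A minimal sketch of the bounded header read this comment asks for, not the code in the PR. It relies only on the published safetensors layout (an 8-byte little-endian unsigned length followed by that many bytes of JSON). `MAX_HEADER_SIZE` and `read_safetensors_header` are hypothetical names, and the cap value is a placeholder; the real limit should be whatever `safetensors_header()` already enforces.

```python
import json
import struct

# Assumption: placeholder cap. Reuse the limit that safetensors_header()
# in comfy/utils.py already applies rather than this value.
MAX_HEADER_SIZE = 100 * 1024 * 1024


def read_safetensors_header(f, max_header_size=MAX_HEADER_SIZE):
    """Read and validate the JSON header from a binary safetensors stream."""
    prefix = f.read(8)
    if len(prefix) != 8:
        raise ValueError("file too short to contain a safetensors header length")
    # The header length is stored as an unsigned little-endian 64-bit integer.
    (header_size,) = struct.unpack("<Q", prefix)
    # Reject the size before allocating anything based on it.
    if not 0 < header_size <= max_header_size:
        raise ValueError(f"invalid safetensors header size: {header_size}")
    header_data = f.read(header_size)
    if len(header_data) != header_size:
        raise ValueError("truncated safetensors header")
    return json.loads(header_data.decode("utf-8"))
```

Checking the size before `f.read(header_size)` is the point: a corrupt or hostile file can then no longer trigger an arbitrarily large allocation.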
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: fc606afb-141f-472f-82af-a32d086fc2af
📒 Files selected for processing (1)
comfy/utils.py
|
Additional local validation for the temporary no-mmap path:
uv run --with torch --with safetensors --with packaging --with numpy --with pillow --with tqdm --with einops python - <<'PY'
# Creates a temporary safetensors file, forces comfy.utils.DISABLE_MMAP = True,
# monkeypatches comfy.utils.safetensors.safe_open to raise if called, then
# verifies load_torch_file(..., return_metadata=True/False) preserves tensors
# and metadata for f32/f16/bf16/i64/u8/empty tensors.
PY
Result: This specifically verifies that |
|
Additional comparison with Interpretation: with |
|
Clear mmap on/off comparison from local reproduction with a synthetic 256 MiB In ComfyUI terms:
Interpretation:
|
|
We just cut comfy-aimdo 0.3.0 with aarch64 support. Does it help? This should bring the dynamic-vram feature into play, which very deliberately avoids this copy. The thinking in aimdo is that the sole copy of the model is the mmap copy. If you have a look at load_safetensors, there is the notion of the _comfy_tensor_file_slice, a little piece of metadata that allows the model loader to bypass the mmap completely without needing the upfront model deep-copy either. Currently it's only used for pins; however, on DGX Spark it may actually make sense to load the entire model with this path, that is, in some new mode, try using the _comfy_tensor_file_slice to read to a temp buffer and then to the GPU. That would take your RAM footprint for the model to 0 and leave you only with the VRAM copy (which is how it should be on DGX Spark). |
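To make the "read to temp buffer then to GPU" idea concrete, here is a rough standard-library sketch. It is not aimdo's or ComfyUI's implementation: `stream_slices`, the `(offset, length)` tuples, and the `sink` callback are hypothetical stand-ins for the file-slice metadata and the host-to-device copy. The only point it illustrates is that one reusable staging buffer bounds the host-side footprint to the largest single tensor.

```python
def stream_slices(path, slices, sink):
    """Pass each (offset, length) file slice to `sink` via one staging buffer.

    Hypothetical helper: in a real loader the slices would come from the
    tensor's file-slice metadata and `sink` would copy to the device.
    """
    # Size the single staging buffer for the largest slice.
    staging = bytearray(max((length for _, length in slices), default=0))
    view = memoryview(staging)
    # buffering=0 gives a raw file object, so readinto writes directly
    # into the staging buffer without a Python-level buffer in between.
    with open(path, "rb", buffering=0) as f:
        for offset, length in slices:
            f.seek(offset)
            filled = 0
            while filled < length:  # raw reads may return fewer bytes
                n = f.readinto(view[filled:length])
                if not n:
                    raise ValueError("unexpected end of file")
                filled += n
            # The sink must consume the bytes before the buffer is reused.
            sink(view[:length])
```

For example, `stream_slices(path, [(0, 3), (5, 5)], lambda v: chunks.append(bytes(v)))` collects two slices while never holding more than five bytes in the staging buffer.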
That is good, I didn't know that. Let's have the community try it. |
Cold notwithstanding, finally got time to test PR #13609. Quick report. Setup: same Spark / GB10 / Comfy 0.20.1 / PyTorch 2.9.1+cu130 box from my earlier comment. Chroma1-HD bf16 + 6 LoRAs + T5xxl_fp16 + ae VAE. Ran with Loader works as advertised. Process settles after load: Logs confirm Two things came out of the testing that may be worth folding back:
|
This has fully addressed the issue. Thanks! |
|
@Balaxxe @johnnynunez-nv @johnnynunez , please take a look at #13802 . I am getting a big improvement in DGX Spark loads, as that PR takes the RAM footprint even lower again and seems to solve the slowness issues. Let me know either way if you try it, as if it's good we can push this to master. Note the custom aimdo wheel for DGX listed in the PR. |
Summary
- safetensors.safe_open() when --disable-mmap is set.
- readinto, and wraps the bytes with torch.frombuffer before moving to the requested device.
- tensor.to(..., copy=True) workaround, which still retained the mmap-backed storage and could increase peak memory instead of reducing it.
Motivation
safetensors.safe_open() currently always mmaps the underlying file. On unified CPU/GPU memory systems such as NVIDIA Grace Blackwell / DGX Spark, Apple Silicon, AMD APUs, Jetson/IGX, and integrated GPUs, mmap-backed file pages and device/framework tensors can live in the same physical memory pool. Loading large models can therefore require roughly two resident copies of the checkpoint and OOM even when the system has enough memory for one model copy.
This PR makes ComfyUI's existing --disable-mmap flag actually bypass safetensors mmap for .safetensors / .sft files. Peak memory becomes one model copy plus one tensor in flight while the upstream safetensors fix is being reviewed.
Related:
- --disable-mmap
- mmap=False: feat(python): add opt-in no-mmap loading safetensors/safetensors#759
Test plan
- python3 -c "import ast; ast.parse(open('comfy/utils.py').read()); print('syntax OK')"
- comfy/utils.py: no errors
- git diff --check -- comfy/utils.py
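The direct read path described in the summary can be sketched roughly as follows. This is an illustration under stated assumptions, not the PR's code: `load_safetensors_no_mmap` is a hypothetical name, the sketch stays dependency-free by returning raw `bytearray` payloads, and the `torch.frombuffer` wrapping and device move mentioned above are left out. It relies on the safetensors layout, where `data_offsets` are relative to the end of the JSON header.

```python
import json
import struct


def load_safetensors_no_mmap(path):
    """Return ({name: (dtype, shape, bytearray)}, metadata) using plain reads."""
    with open(path, "rb") as f:
        (header_size,) = struct.unpack("<Q", f.read(8))
        # A real loader should bound header_size before this read.
        header = json.loads(f.read(header_size).decode("utf-8"))
        data_start = 8 + header_size  # data_offsets are relative to this point
        metadata = header.pop("__metadata__", None)
        tensors = {}
        for name, info in header.items():
            begin, end = info["data_offsets"]
            buf = bytearray(end - begin)
            view = memoryview(buf)
            f.seek(data_start + begin)
            filled = 0
            # readinto fills the preallocated buffer in place, so the payload
            # is not first materialized as a separate bytes object.
            while filled < len(buf):
                n = f.readinto(view[filled:])
                if not n:
                    raise ValueError(f"truncated payload for tensor {name!r}")
                filled += n
            tensors[name] = (info["dtype"], info["shape"], buf)
        return tensors, metadata
```

In a torch-based loader, each `buf` would then be wrapped and moved to the target device one tensor at a time, which is where the "one model copy plus one tensor in flight" peak comes from.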