Bypass safetensors mmap when --disable-mmap is set by johnnynunez · Pull Request #13609 · Comfy-Org/ComfyUI

johnnynunez · 2026-04-29T00:53:57Z

Summary

Adds a temporary ComfyUI-side safetensors loader that does not call safetensors.safe_open() when --disable-mmap is set.
Parses the safetensors header directly, reads each tensor from disk with readinto, and wraps the bytes with torch.frombuffer before moving to the requested device.
Removes the previous tensor.to(..., copy=True) workaround, which still retained the mmap-backed storage and could increase peak memory instead of reducing it.

Motivation

safetensors.safe_open() currently always mmaps the underlying file. On unified CPU/GPU memory systems such as NVIDIA Grace Blackwell / DGX Spark, Apple Silicon, AMD APUs, Jetson/IGX, and integrated GPUs, mmap-backed file pages and device/framework tensors can live in the same physical memory pool. Loading large models can therefore require roughly two resident copies of the checkpoint and OOM even when the system has enough memory for one model copy.

This PR makes ComfyUI's existing --disable-mmap flag actually bypass safetensors mmap for .safetensors / .sft files. Peak memory becomes one model copy plus one tensor in flight while the upstream safetensors fix is being reviewed.

Fixes Loading .safetensors files requires double memory on DGX Spark #10896 when ComfyUI is launched with --disable-mmap
Upstream safetensors issue: safe_open() always mmaps: doubles peak memory on unified-memory devices (Grace Blackwell / DGX Spark / Apple Silicon, etc.) safetensors/safetensors#758
Upstream safetensors PR adding mmap=False: feat(python): add opt-in no-mmap loading safetensors/safetensors#759

Test plan

python3 -c "import ast; ast.parse(open('comfy/utils.py').read()); print('syntax OK')"
Cursor linter diagnostics for comfy/utils.py: no errors
git diff --check -- comfy/utils.py

Load safetensors through a direct read path under --disable-mmap so unified-memory systems avoid retaining mmap-backed file pages alongside framework tensors. Made-with: Cursor

coderabbitai · 2026-04-29T00:58:13Z

📝 Walkthrough

Walkthrough

The change adds a new safetensors loader function that manually reads file headers and streams tensor payloads into memory, bypassing Rust mmap. This loader is integrated into load_torch_file to activate when DISABLE_MMAP is enabled for .safetensors or .sft files, allowing tensors to be constructed from buffers and optionally transferred to requested devices. The prior workaround that copied tensors inside the safe_open retrieval loop is removed. The modification adds approximately 75 lines while removing 4.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title accurately describes the main change: adding logic to bypass safetensors mmap when --disable-mmap is set.
Linked Issues check	✅ Passed	The PR implements the required non-mmap safetensors loader and removes the ineffective copy=True workaround, directly addressing the memory duplication issue in `#10896` on unified-memory systems.
Out of Scope Changes check	✅ Passed	All changes are scoped to implementing the non-mmap safetensors loader and removing the previous workaround; no unrelated modifications detected.
Description check	✅ Passed	The pull request description clearly and specifically explains the changes, motivation, and test plan for implementing a non-mmap safetensors loader.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@comfy/utils.py`:
- Around line 149-156: The code reads header_size from the file and trusts it;
to prevent OOM attacks you must cap header_size to the same hard limit used by
safetensors_header() before calling f.read(header_size): validate that
header_size is an int within (0, MAX_HEADER_SIZE] (or the existing constant/name
used in safetensors_header), raise a ValueError if out of range, then proceed to
read and decode header_data and json.loads as before; update the logic around
header_size/header_data in the function containing this diff to reuse the
existing safetensors_header size limit and checks.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: fc606afb-141f-472f-82af-a32d086fc2af

📥 Commits

Reviewing files that changed from the base of the PR and between fce0398 and 1ff3648.

📒 Files selected for processing (1)

comfy/utils.py

johnnynunez · 2026-04-29T01:04:26Z

Additional local validation for the temporary no-mmap path:

uv run --with torch --with safetensors --with packaging --with numpy --with pillow --with tqdm --with einops python - <<'PY'
# Creates a temporary safetensors file, forces comfy.utils.DISABLE_MMAP = True,
# monkeypatches comfy.utils.safetensors.safe_open to raise if called, then
# verifies load_torch_file(..., return_metadata=True/False) preserves tensors
# and metadata for f32/f16/bf16/i64/u8/empty tensors.
PY

Result:

no-mmap load_torch_file functional test passed

This specifically verifies that --disable-mmap takes the manual parser path and does not reach safetensors.safe_open().

johnnynunez · 2026-04-29T01:07:02Z

Additional comparison with comfy.utils.DISABLE_MMAP = False on a local synthetic 256 MiB .safetensors tensor:

=== mmap ===
baseline                        194.4 MiB peak
after load_torch_file           195.6 MiB peak
after touching tensor           452.5 MiB peak

=== mmap_clone ===
baseline                        193.9 MiB peak
after load_torch_file           195.1 MiB peak
after touching tensor           452.0 MiB peak
after clone/copy                708.2 MiB peak

=== nommap ===
baseline                        193.5 MiB peak
after load_torch_file           450.0 MiB peak
after touching tensor           451.3 MiB peak

=== nommap_clone ===
baseline                        193.2 MiB peak
after load_torch_file           449.7 MiB peak
after touching tensor           451.0 MiB peak
after clone/copy                707.2 MiB peak

Interpretation: with DISABLE_MMAP=False, load_torch_file() goes through safetensors.safe_open(). The mmap-backed tensor is lazy at first, then becomes resident once touched. Holding that mmap-backed storage plus a second materialized tensor copy produces the expected ~2x tensor-size peak. The no-mmap path materializes the tensor immediately without retaining a file mapping; on CPU, explicitly cloning still creates a second copy as expected, while device loads can release the per-tensor staging buffer after .to(device).

johnnynunez · 2026-04-29T01:07:48Z

Clear mmap on/off comparison from local reproduction with a synthetic 256 MiB .safetensors tensor.

In ComfyUI terms:

mmap=True means comfy.utils.DISABLE_MMAP = False, so load_torch_file() uses safetensors.safe_open().
mmap=False means comfy.utils.DISABLE_MMAP = True, so this PR's manual no-mmap parser is used.

=== mmap=True / DISABLE_MMAP=False ===
baseline                        194.4 MiB peak
after load_torch_file           195.6 MiB peak
after touching tensor           452.5 MiB peak

=== mmap=True + second tensor copy ===
baseline                        193.9 MiB peak
after load_torch_file           195.1 MiB peak
after touching tensor           452.0 MiB peak
after clone/copy                708.2 MiB peak

=== mmap=False / DISABLE_MMAP=True ===
baseline                        193.5 MiB peak
after load_torch_file           450.0 MiB peak
after touching tensor           451.3 MiB peak

=== mmap=False + second tensor copy ===
baseline                        193.2 MiB peak
after load_torch_file           449.7 MiB peak
after touching tensor           451.0 MiB peak
after clone/copy                707.2 MiB peak

Interpretation:

With mmap=True, safe_open() is lazy: load_torch_file() barely changes RSS until the mmap-backed tensor is touched.
Once touched, the file-backed mapping becomes resident (~+256 MiB here).
If a second materialized tensor copy exists at the same time, peak becomes roughly baseline + 2x tensor size (~708 MiB here).
With mmap=False, this PR materializes one tensor copy directly and does not retain a file mapping. On CPU, explicitly cloning still creates a second copy as expected; for device loads, the per-tensor staging buffer can be released after .to(device).

rattus128 · 2026-04-29T08:36:51Z

We just cut comfy-aimdo 0.3.0 with aarch64 support. does it help? This should bring the dynamic-vram feature into play which very deliberately avoids this copy. The thinking in aimdo is the sole copy of the model is the mmap copy.

If you have a look at load_safetensors there is the notion of the _comfy_tensor_file_slice, which is a little piece of metadata, which allows the model loader to bypass the mmap completely without needing the upfront model deep-copy either. Currently its only used for pins, however on DGX spark it may actually make sense to load the entire model with this path, that is, in some new mode, try using the _comfy_tensor_file_slice to read to temp buffer then to GPU. That would take you RAM footprint for the model to 0 and leave you only with the VRAM copy (which is how it should be on DGX spark)

johnnynunez-nv · 2026-04-29T09:59:06Z

We just cut comfy-aimdo 0.3.0 with aarch64 support. does it help? This should bring the dynamic-vram feature into play which very deliberately avoids this copy. The thinking in aimdo is the sole copy of the model is the mmap copy.

If you have a look at load_safetensors there is the notion of the _comfy_tensor_file_slice, which is a little piece of metadata, which allows the model loader to bypass the mmap completely without needing the upfront model deep-copy either. Currently its only used for pins, however on DGX spark it may actually make sense to load the entire model with this path, that is, in some new mode, try using the _comfy_tensor_file_slice to read to temp buffer then to GPU. That would take you RAM footprint for the model to 0 and leave you only with the VRAM copy (which is how it should be on DGX spark)

that is good, i didn't know that. Let's community to try it

Balaxxe · 2026-04-29T18:06:23Z

Clear mmap on/off comparison from local reproduction with a synthetic 256 MiB .safetensors tensor.

In ComfyUI terms:

mmap=True means comfy.utils.DISABLE_MMAP = False, so load_torch_file() uses safetensors.safe_open().

mmap=False means comfy.utils.DISABLE_MMAP = True, so this PR's manual no-mmap parser is used.
=== mmap=True / DISABLE_MMAP=False ===
baseline                        194.4 MiB peak
after load_torch_file           195.6 MiB peak
after touching tensor           452.5 MiB peak

=== mmap=True + second tensor copy ===
baseline                        193.9 MiB peak
after load_torch_file           195.1 MiB peak
after touching tensor           452.0 MiB peak
after clone/copy                708.2 MiB peak

=== mmap=False / DISABLE_MMAP=True ===
baseline                        193.5 MiB peak
after load_torch_file           450.0 MiB peak
after touching tensor           451.3 MiB peak

=== mmap=False + second tensor copy ===
baseline                        193.2 MiB peak
after load_torch_file           449.7 MiB peak
after touching tensor           451.0 MiB peak
after clone/copy                707.2 MiB peak
Interpretation:

With mmap=True, safe_open() is lazy: load_torch_file() barely changes RSS until the mmap-backed tensor is touched.

Once touched, the file-backed mapping becomes resident (~+256 MiB here).

If a second materialized tensor copy exists at the same time, peak becomes roughly baseline + 2x tensor size (~708 MiB here).

With mmap=False, this PR materializes one tensor copy directly and does not retain a file mapping. On CPU, explicitly cloning still creates a second copy as expected; for device loads, the per-tensor staging buffer can be released after .to(device).

Cold notwithstanding, finally got time to test PR #13609. Quick report.

Setup: same Spark / GB10 / Comfy 0.20.1 / PyTorch 2.9.1+cu130 box from my earlier comment. Chroma1-HD bf16 + 6 LoRAs + T5xxl_fp16 + ae VAE. Ran with --disable-dynamic-vram --disable-mmap, which exercises the new load_safetensors_no_mmap path.

Loader works as advertised. Process settles after load:

RssAnon: 20.15 GiB   (single resident copy of all weights)
VmRSS:   21.00 GiB

Logs confirm load_safetensors_no_mmap was hit for every safetensors file in the workflow. So the file-backed mmap + materialized tensor coexistence from #10896 is gone on this path.

Two things came out of the testing that may be worth folding back:

Peak during load is still ~2x model size. VmHWM hit 34.75 GiB even though steady state is 21 GiB. Source isn't the file mmap any more, it's the tensor.to('cuda') step inside the model patcher creating a second resident copy on UMA before the CPU-side reference releases. Different doubling than the one this PR addresses, but it shows up in the same metric and confuses observation. This is what @rattus128 was pointing at with the _comfy_tensor_file_slice zero-RAM-footprint suggestion. PR Bypass safetensors mmap when --disable-mmap is set #13609 fixes the file-vs-tensor doubling cleanly; the CPU-vs-GPU doubling still needs a separate path.
Page cache hangs around after load. readinto populates the kernel page cache, and on Spark UMA those pages stay charged against the unified pool until something reclaims them. After loading Chroma + T5 + LoRAs I saw Cached: 32.5 GiB worth of safetensors files just sitting there. A one-line os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_DONTNEED) after each file's read drops it back to ~5 GiB. Patch I'm running locally:
```
try:
    os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_DONTNEED)
except (AttributeError, OSError):
    pass
```
Belongs upstream-ish (probably in safetensors PR Grids & combinatorial outputs #759 too) but easy to add here while the upstream review is pending.

Balaxxe · 2026-04-29T21:24:55Z

We just cut comfy-aimdo 0.3.0 with aarch64 support. does it help? This should bring the dynamic-vram feature into play which very deliberately avoids this copy. The thinking in aimdo is the sole copy of the model is the mmap copy.

If you have a look at load_safetensors there is the notion of the _comfy_tensor_file_slice, which is a little piece of metadata, which allows the model loader to bypass the mmap completely without needing the upfront model deep-copy either. Currently its only used for pins, however on DGX spark it may actually make sense to load the entire model with this path, that is, in some new mode, try using the _comfy_tensor_file_slice to read to temp buffer then to GPU. That would take you RAM footprint for the model to 0 and leave you only with the VRAM copy (which is how it should be on DGX spark)

This has fully addressed the issue. Thanks!

rattus128 · 2026-05-11T12:47:03Z

@Balaxxe @johnnynunez-nv @johnnynunez , please take a look at #13802 . I am getting a big improvement in DGX spark loads as that PR takes the RAM footprint even lower again and seems to solve slowness issues. Let me know either way if you try it as if its good we can push this to master. Note the custom aimdo wheel for DGX listed in the PR.

utils: bypass safetensors mmap when disabled

1ff3648

Load safetensors through a direct read path under --disable-mmap so unified-memory systems avoid retaining mmap-backed file pages alongside framework tensors. Made-with: Cursor

johnnynunez requested review from Kosinkadink, comfyanonymous and guill as code owners April 29, 2026 00:53

coderabbitai Bot reviewed Apr 29, 2026

View reviewed changes

Comment thread comfy/utils.py

rattus128 self-assigned this Apr 29, 2026

johnnynunez closed this May 11, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Bypass safetensors mmap when --disable-mmap is set#13609

Bypass safetensors mmap when --disable-mmap is set#13609
johnnynunez wants to merge 1 commit into
Comfy-Org:masterfrom
johnnynunez:fix/safetensors-no-mmap-loader

johnnynunez commented Apr 29, 2026 •

edited

Loading

Uh oh!

coderabbitai Bot commented Apr 29, 2026 •

edited

Loading

Walkthrough

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Uh oh!

Uh oh!

johnnynunez commented Apr 29, 2026

Uh oh!

johnnynunez commented Apr 29, 2026

Uh oh!

johnnynunez commented Apr 29, 2026

Uh oh!

rattus128 commented Apr 29, 2026

Uh oh!

johnnynunez-nv commented Apr 29, 2026

Uh oh!

Balaxxe commented Apr 29, 2026 •

edited

Loading

Uh oh!

Balaxxe commented Apr 29, 2026

Uh oh!

rattus128 commented May 11, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Conversation

johnnynunez commented Apr 29, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Motivation

Test plan

Uh oh!

coderabbitai Bot commented Apr 29, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

johnnynunez commented Apr 29, 2026

Uh oh!

johnnynunez commented Apr 29, 2026

Uh oh!

johnnynunez commented Apr 29, 2026

Uh oh!

rattus128 commented Apr 29, 2026

Uh oh!

johnnynunez-nv commented Apr 29, 2026

Uh oh!

Balaxxe commented Apr 29, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Balaxxe commented Apr 29, 2026

Uh oh!

rattus128 commented May 11, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

johnnynunez commented Apr 29, 2026 •

edited

Loading

coderabbitai Bot commented Apr 29, 2026 •

edited

Loading

Balaxxe commented Apr 29, 2026 •

edited

Loading