
Fix reflink cloning on filesystems with large block sizes such as ZFS #356

Merged: folbricht merged 2 commits into master from zfs-reflink-fallback on Jun 11, 2026

@folbricht

Owner

Fixes #353

Problem

Tests (and real extraction with seeds) fail on ZFS-backed storage with "invalid argument" or "resource temporarily unavailable" errors from CloneRange. There are two separate causes:

ZFS reflink semantics. OpenZFS 2.2+ implements FICLONERANGE, so the 0-byte CanClone probe succeeds, but real clones are subject to much stricter requirements than on btrfs/XFS: offsets must be aligned to the recordsize (128k by default, reported as st_blksize), and cloning a range that hasn't been committed to disk yet returns EAGAIN (with the default zfs_bclone_wait_dirty=0). That is exactly what happens when the null seed clones from its freshly written zero blockfile. On pre-2.2 ZFS the ioctl didn't exist, the probe failed, and everything quietly copied instead, which is why this failure is new.

A latent bug in the clone math, exposed by 128k blocks. fileSeedSegment.clone() and nullChunkSection.clone() assume every segment contains at least one full aligned block. That always holds with 4k blocks and ≥16k chunks, but not with 128k blocks. For segments smaller than a block:

  • alignLength underflows (uint64), producing absurd clone lengths,
  • the head/tail copies write outside the segment, clobbering neighboring data,
  • when alignLength == 0, FICLONERANGE is called with length 0, which the kernel defines as "clone to the end of the source file", so the target is silently overwritten far beyond the segment.

The last two can corrupt output on any filesystem given a large enough st_blksize.
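
The underflow and zero-length cases can be illustrated with a minimal Go sketch. It is not the code from this PR: the function names `naiveAlignedLength` and `alignedRange` are hypothetical, and the rounding scheme is an assumption about how such alignment math is typically written.

```go
package main

import "fmt"

// roundUp and roundDown align v to a multiple of blockSize.
func roundUp(v, blockSize uint64) uint64   { return (v + blockSize - 1) / blockSize * blockSize }
func roundDown(v, blockSize uint64) uint64 { return v / blockSize * blockSize }

// naiveAlignedLength (hypothetical) assumes the segment always contains at
// least one full aligned block. If it does not, the subtraction wraps around
// (uint64) or yields 0.
func naiveAlignedLength(offset, length, blockSize uint64) uint64 {
	return roundDown(offset+length, blockSize) - roundUp(offset, blockSize)
}

// alignedRange (hypothetical) returns the block-aligned sub-range of
// [offset, offset+length). ok is false when the segment contains no full
// aligned block, in which case the caller should copy instead of clone.
func alignedRange(offset, length, blockSize uint64) (start, n uint64, ok bool) {
	start = roundUp(offset, blockSize)
	end := roundDown(offset+length, blockSize)
	if end <= start {
		return 0, 0, false
	}
	return start, end - start, true
}

func main() {
	const zfsBlock = 128 * 1024 // default ZFS recordsize

	// Segment entirely inside one block: the naive length wraps to 2^64 - 131072.
	fmt.Println(naiveAlignedLength(70000, 20000, zfsBlock))

	// Segment straddling one block boundary: the naive length is 0, which
	// FICLONERANGE would treat as "clone to the end of the source file".
	fmt.Println(naiveAlignedLength(100000, 100000, zfsBlock))

	// The guarded version reports both cases as "do not clone".
	_, _, ok := alignedRange(70000, 20000, zfsBlock)
	fmt.Println(ok)

	// A segment that does contain full blocks still yields a clonable range.
	fmt.Println(alignedRange(100000, 300000, zfsBlock))
}
```

With 4k blocks and chunks of at least 16k, the `end <= start` branch can never be taken, which is consistent with the bug staying hidden until 128k blocks came into play.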

Fix

  • Ranges that contain no full aligned block are now copied instead of cloned (also avoids the underflow / zero-length cases).
  • Any CloneRange failure now falls back to copying that range instead of failing the extraction, the same approach coreutils cp takes. The null seed skips the copy entirely when the target is still blank.
  • seed.go gains a small cloneRange indirection so tests can simulate filesystems where the probe passes but cloning fails; new tests cover the fallback and small-segment geometry.
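
The clone-then-copy fallback and the test indirection might look roughly like the following Go sketch. It only illustrates the approach described above: the signature of `cloneRange`, the helpers `copyRange` and `cloneOrCopy`, and the `demo` function are assumptions, not the actual contents of seed.go. The default `cloneRange` here always fails so the sketch runs on any platform; a real build would issue FICLONERANGE instead.

```go
package main

import (
	"errors"
	"fmt"
	"io"
	"os"
	"path/filepath"
	"syscall"
)

// cloneRange is a hypothetical package-level indirection over the platform
// reflink call, so tests can swap in a failing implementation.
var cloneRange = func(dst, src *os.File, srcOff, length, dstOff int64) error {
	return errors.ErrUnsupported
}

// copyRange copies length bytes from src at srcOff to dst at dstOff without
// moving either file's seek position.
func copyRange(dst, src *os.File, srcOff, length, dstOff int64) error {
	n, err := io.Copy(io.NewOffsetWriter(dst, dstOff), io.NewSectionReader(src, srcOff, length))
	if err == nil && n != length {
		err = io.ErrUnexpectedEOF
	}
	return err
}

// cloneOrCopy tries to reflink the range and copies it when cloning fails
// for any reason (EINVAL, EAGAIN, unsupported, ...). It reports whether the
// range was actually cloned.
func cloneOrCopy(dst, src *os.File, srcOff, length, dstOff int64) (bool, error) {
	if length <= 0 {
		return false, nil // never hand a zero length to the clone call
	}
	if err := cloneRange(dst, src, srcOff, length, dstOff); err == nil {
		return true, nil
	}
	return false, copyRange(dst, src, srcOff, length, dstOff)
}

// demo simulates a filesystem where cloning returns EAGAIN and returns the
// resulting target contents.
func demo() (bool, string, error) {
	saved := cloneRange
	defer func() { cloneRange = saved }()
	cloneRange = func(_, _ *os.File, _, _, _ int64) error { return syscall.EAGAIN }

	dir, err := os.MkdirTemp("", "clone-fallback")
	if err != nil {
		return false, "", err
	}
	defer os.RemoveAll(dir)

	src, err := os.Create(filepath.Join(dir, "src"))
	if err != nil {
		return false, "", err
	}
	defer src.Close()
	dst, err := os.Create(filepath.Join(dir, "dst"))
	if err != nil {
		return false, "", err
	}
	defer dst.Close()

	if _, err := src.WriteString("0123456789"); err != nil {
		return false, "", err
	}
	if _, err := dst.WriteString("----------"); err != nil {
		return false, "", err
	}

	cloned, err := cloneOrCopy(dst, src, 2, 4, 3)
	if err != nil {
		return false, "", err
	}
	out, err := os.ReadFile(dst.Name())
	return cloned, string(out), err
}

func main() {
	cloned, got, err := demo()
	fmt.Println(cloned, got, err)
}
```

Swapping `cloneRange` inside `demo` mirrors the testing idea described in the last bullet: the probe can pass while the actual clone fails, and the range still arrives in the target through the copy path.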

Verification

Reproduced and verified on a real ZFS pool (Ubuntu 24.04 VM, zfs-2.2.2, zfs_bclone_enabled=1, 128k recordsize):

  • Unpatched binaries reproduce both reported failures with identical stack traces.
  • Patched binaries pass the full library suite and all TestExtractCommand subtests on ZFS.
  • Tracing FICLONERANGE calls confirms properly aligned ranges are still reflinked on ZFS (verified via zpool get bcloneused), so ZFS keeps the dedup/speed benefit rather than degrading to copy-only.

go test ./... also passes on ext4.

OpenZFS 2.2+ implements FICLONERANGE, so the CanClone probe now succeeds
on ZFS, but actual clones often fail: ZFS requires alignment to its
recordsize (128k by default, reported as st_blksize) and returns EAGAIN
when the source range hasn't been committed to disk yet. Extraction with
seeds then failed with "invalid argument" or "resource temporarily
unavailable" errors.

On top of that, the clone math in fileSeedSegment and nullChunkSection
assumed every segment contains at least one full aligned block, which is
always true for 4k blocks but not for 128k ones. Segments smaller than a
block made the aligned length underflow, made the head/tail copies write
outside the segment, and could call FICLONERANGE with a zero length,
which the kernel interprets as "clone to the end of the source file",
silently corrupting the target.

Guard against ranges that contain no full aligned block by copying them
instead, and fall back to copying the blocks whenever CloneRange fails.
Cloning failures are no longer fatal on any filesystem.

Verified against a real ZFS pool (zfs 2.2.2, zfs_bclone_enabled=1):
previously failing extraction tests now pass and properly aligned ranges
are still reflinked.

Fixes #353
@folbricht merged commit ab59502 into master on Jun 11, 2026
3 checks passed

Development

Successfully merging this pull request may close these issues.

Tests fail on ZFS-backed storage
