Support fuse with overlayfs by alexlarsson · Pull Request #328 · composefs/composefs-rs

alexlarsson · 2026-06-24T15:14:46Z

This adds support for a fuse version that only serves the erofs metadata, and helpers to mount an overlayfs on top of it using userxattrs. This allows rootless mounting of a composefs image where file content (but not metadata) is accessed directly from the regular fs by the kernel, and it should perform similarly to the rootful overlayfs mount.

cgwalters · 2026-06-24T15:30:53Z

Maybe some overlap with #306 ?

alexlarsson · 2026-06-25T07:53:08Z

@cgwalters I don't think there is necessarily an overlap, except perhaps that the passthrough support is intended to solve the same performance issue (but it is not useful here, as it is root-only). They are, however, complementary, in that readdir+ and multi-threading will improve metadata performance.

alexlarsson · 2026-06-25T15:38:36Z

That said, there might be code conflicts between the two; I'll have a look at that.

cgwalters

OK, now that I understand what this is doing - pretty cool! This makes a ton of sense.

One thing I did in the other PR is add some more integration tests for the FUSE path, which would probably be quite useful here.

cgwalters · 2026-06-25T20:16:42Z

+    /// When true, the server follows overlay redirects and serves file
+    /// content from the repository. When false, it synthesizes
+    /// `user.overlay.*` xattrs for use as an overlayfs lower layer.
+    /// Defaults to false.


But actually...I am not sure we considered this at all but - is there any reason at all to have the direct serving? I am not sure there is a strong one...I think the FUSE implementation was kind of intended mainly for unprivileged mounts, but since nowadays overlay is allowed in user namespaces, I think we should automatically set "follow_redirects" = false if running unprivileged right?

Actually, to say it a different way: don't we want to always use overlayfs, and only have the FUSE for unprivileged EROFS? In that case, when running privileged, we should not use user.overlay, right? Or I guess we still can, but there's no reason to?

Honestly I'm not sure here. There is one way that follow_redirects mode is useful, in that it allows any non-root user to mount a composefs image in the "root" user namespace.

Like, if you're in the shell you can just mount a cfs and have some other program look into the mount. You cannot do that with an overlay mount. For that you need to be in a new user namespace (where you have cap_sysadmin) so you can do the mount. And if you create a new user+mnt namespace, programs in the root namespace can't see your mount. So, non-follow_redirects is primarily useful for container-like tools, whereas follow_redirects is for traditional command-line work.

So, I do think it would make sense to support both of these modes. I wonder though if the current composefs fuse implementation is ideal for this kind of use. It starts by parsing the entire erofs into a filesystem tree, which adds quite some latency. An implementation that just reads the erofs file into memory and does metadata lookups directly from that would probably be faster at startup. And, for a metadata-only implementation (non-follow-redirects) that would probably be pretty easy to implement, as all you have to do is read inode info.

I guess that could be later work though.

+1 to make the metadata-only serving the default.

There is one way that follow_redirects mode is useful, in that it allows any non-root user to mount a composefs image in the "root" user namespace.

Yeah, but I'm not sure of any kind of "production" use case for that. Debugging can equally well just use APIs to inspect things.

follow_redirects is a confusing name, how about --standalone for the non-overlayfs case?

It starts by parsing the entire erofs into a filesystem tree, which adds quite some latency. An implementation that just reads the erofs file into memory and does metadata lookups directly from that would probably be faster at startup.

Yeah, agreed, we should do it that way.

Can we mmap it so that multiple containers using the same erofs will share the memory?

I'd think so.

At least if the image is fs-verity, because then we can trust that it doesn't change behind our back.

alexlarsson · 2026-06-26T14:34:08Z

Ok, I did some more work on this, picking up a bunch of changes from #306 with some modifications. This now has multi-threaded fuse, mount APIs to support mounting directly as well as via overlayfs with and without userxattrs, as well as verity=require. It also has mount CLI options for image/oci/ostree that mostly do the right thing by default (for example, it will use a non-overlay approach if we don't have cap_sysadmin).

I didn't look at the varlink part yet. Also this is still based on serving a filesystem, rather than directly serving from an erofs image.

cgwalters

Cool, overall looking good

cgwalters · 2026-06-26T14:36:17Z

+        fuse: bool,
+        /// Force kernel mount instead of auto-detecting
+        #[cfg(feature = "fuse")]
+        #[arg(long, conflicts_with = "fuse")]


I think an enum is better --fuse=auto|yes|no. Could also have a #[clap(flatten)] shared struct to dedup
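A minimal sketch of the tri-state suggested above, using only the standard library. The name `FuseMode` and the error text are hypothetical and not taken from the PR; with clap, the same enum could presumably derive `ValueEnum` instead of implementing `FromStr` by hand.

```rust
use std::str::FromStr;

/// Hypothetical tri-state that would replace the `--fuse` / `--no-fuse`
/// pair of booleans with a single `--fuse=auto|yes|no` option.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
pub enum FuseMode {
    /// Pick kernel or FUSE mount based on the available privileges.
    #[default]
    Auto,
    /// Always use FUSE.
    Yes,
    /// Always use the kernel mount.
    No,
}

impl FromStr for FuseMode {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "auto" => Ok(Self::Auto),
            "yes" => Ok(Self::Yes),
            "no" => Ok(Self::No),
            other => Err(format!(
                "invalid --fuse value {other:?}: expected auto, yes or no"
            )),
        }
    }
}
```

A single option makes the conflicting combination (`--fuse` together with `--no-fuse`) unrepresentable, so no `conflicts_with` attribute is needed.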

cgwalters · 2026-06-26T14:37:57Z

+    std::fs::read_to_string("/proc/self/uid_map")
+        .map(|s| s.trim() == "0          0 4294967295")


This feels hacky.

I think this is the canonical way to see if you're in the init user namespace. @giuseppe is there something better?

we have a similar function in containers/storage:

    // hasFullUsersMappings checks whether the current user namespace has all the IDs mapped.
    func hasFullUsersMappings() (bool, error) {
        content, err := os.ReadFile("/proc/self/uid_map")
        if err != nil {
            return false, err
        }
        // The kernel rejects attempts to create mappings where either starting
        // point is (u32)-1: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/kernel/user_namespace.c?id=af3e9579ecfb#n1006 .
        // So, if the uid_map contains 4294967295, the entire IDs space is available in the
        // user namespace, so it is likely the initial user namespace.
        return bytes.Contains(content, []byte("4294967295")), nil
    }

it will fail if you do something like: unshare --map-users 0:0:4294967295 ... but I guess that is not a common configuration
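One way to make the string comparison in the diff less sensitive to column padding is to compare the whitespace-separated fields instead. The sketch below is an assumption about how that could look, and `is_full_identity_map` is a hypothetical name; it takes the file contents as a string so it can be checked without touching `/proc`. As noted above, it still cannot tell the init namespace apart from a child namespace created with a full `0:0:4294967295` mapping.

```rust
/// Returns true if `uid_map` consists of exactly one mapping line whose
/// fields are `0 0 4294967295`, regardless of how the columns are padded.
pub fn is_full_identity_map(uid_map: &str) -> bool {
    let mut lines = uid_map.lines().filter(|l| !l.trim().is_empty());

    // There must be exactly one non-empty line.
    let Some(first) = lines.next() else {
        return false;
    };
    if lines.next().is_some() {
        return false;
    }

    let fields: Vec<&str> = first.split_whitespace().collect();
    fields == ["0", "0", "4294967295"]
}
```

A caller would pass the result of `std::fs::read_to_string("/proc/self/uid_map")` and treat a read error as "not the init namespace".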

cgwalters · 2026-06-26T14:38:33Z

    Ok(options)
 }

+#[cfg(feature = "fuse")]


I think it'd be nicer to have a separate mod fuse with all this stuff so we don't need a lot of #[cfg

cgwalters · 2026-06-26T14:39:13Z

+}
+
+#[cfg(feature = "fuse")]
+fn detect_mount_mode(force_fuse: bool, no_fuse: bool, has_upper: bool) -> MountMode {


instead of the first two bools an enum would be way clearer

cgwalters · 2026-06-26T14:40:17Z

+    } else if no_fuse {
+        false
+    } else {
+        !(getuid().is_root() && in_init_user_namespace())


Hmm do we really need to check the userns? I think we could just always check for has_cap_sys_admin?

wouldn't that be enough in a rootless environment?

cgwalters · 2026-06-26T14:41:37Z

+        std::mem::forget(work_fd);
+    }
+
+    clear_cloexec(&image_fd);


I think https://docs.rs/cap-std-ext/latest/cap_std_ext/cmdext/struct.CmdFds.html is better than this

cgwalters · 2026-06-26T14:41:56Z

+    #[arg(long, value_parser = ["fuse", "fuse-overlay"])]
+    mode: String,


Clearer as an enum right?

cgwalters · 2026-06-26T14:42:40Z

+pub fn run_internal_fuse_serve(args: InternalFuseServeArgs) -> Result<()> {
+    use std::os::fd::FromRawFd;
+
+    let image_fd = unsafe { std::os::fd::OwnedFd::from_raw_fd(args.image_fd) };


Perhaps we could use the socket activation protocol to add a little bit more safety?
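For reference, the systemd socket activation convention passes inherited descriptors starting at fd 3 and describes them with the `LISTEN_PID` and `LISTEN_FDS` environment variables. The following is only a sketch of the validation step under that convention; `passed_fds` is a hypothetical helper, not something in the PR, and it takes the environment values as parameters so the logic can be exercised in isolation.

```rust
use std::ops::Range;
use std::os::fd::RawFd;

/// First inherited descriptor under the socket activation convention.
const SD_LISTEN_FDS_START: RawFd = 3;

/// Validates the `LISTEN_PID` / `LISTEN_FDS` values and returns the range
/// of inherited descriptors, or `None` if they were not meant for us.
pub fn passed_fds(
    own_pid: u32,
    listen_pid: Option<&str>,
    listen_fds: Option<&str>,
) -> Option<Range<RawFd>> {
    // Ignore descriptors addressed to another process, for example when
    // the variables leaked through from a parent.
    let pid: u32 = listen_pid?.parse().ok()?;
    if pid != own_pid {
        return None;
    }

    let count: RawFd = listen_fds?.parse().ok()?;
    if count <= 0 {
        return None;
    }

    let end = SD_LISTEN_FDS_START.checked_add(count)?;
    Some(SD_LISTEN_FDS_START..end)
}
```

In a real process the arguments would come from `std::process::id()` and `std::env::var("LISTEN_PID").ok().as_deref()` (likewise for `LISTEN_FDS`). The PID check is the "little bit more safety" part: only after it succeeds would the raw descriptors be wrapped with `OwnedFd::from_raw_fd`.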

Rather than serving a tree we serve directly from an erofs image passed as an fd. This should allow much less latency at startup, as we don't have to parse the entire file. It also allows us to use memmap, which should be safe at least for fs-verity (i.e. readonly) files.

Fuse inodes are erofs nids, except the root inode which is always 1 in fuse. Fortunately erofs nids can't be 1, so we just map 1 <-> root nid.

fuse requires the memory to be Send and 'static, which is problematic for the self-referencing that happens if we store both Image and the owning buffer in ComposefsFuse. For now we just leak the erofs data chunk to make it 'static, as we expect the fuse process to keep it around until exit anyway.

Assisted-by: Claude Code (Opus 4.6)
Signed-off-by: Alexander Larsson <alexl@redhat.com>
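The inode mapping and the leak described in this commit message can be sketched as follows. `InoMap` and `leak_image` are hypothetical names chosen for illustration, and the real crate may structure this differently; the sketch only relies on the two facts stated above (the FUSE root inode is 1, and erofs nids are never 1).

```rust
/// FUSE always addresses the root directory as inode 1.
const FUSE_ROOT_INO: u64 = 1;

/// Translates between FUSE inode numbers and erofs nids. Every nid is
/// used as-is except the root, which is swapped with inode 1.
#[derive(Debug, Clone, Copy)]
pub struct InoMap {
    root_nid: u64,
}

impl InoMap {
    pub fn new(root_nid: u64) -> Self {
        // Defensive check of the invariant stated in the commit message.
        assert_ne!(root_nid, FUSE_ROOT_INO, "erofs nids are never 1");
        Self { root_nid }
    }

    /// erofs nid -> FUSE inode number.
    pub fn to_ino(self, nid: u64) -> u64 {
        if nid == self.root_nid { FUSE_ROOT_INO } else { nid }
    }

    /// FUSE inode number -> erofs nid.
    pub fn to_nid(self, ino: u64) -> u64 {
        if ino == FUSE_ROOT_INO { self.root_nid } else { ino }
    }
}

/// Leaks an owned image buffer so that borrowed views of it are `'static`.
/// The memory is never freed, which is acceptable only because the server
/// keeps the image for its whole lifetime.
pub fn leak_image(data: Vec<u8>) -> &'static [u8] {
    Box::leak(data.into_boxed_slice())
}
```

Because no nid equals 1, the mapping needs no lookup table: every non-root value passes through unchanged in both directions.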

Add mount_fuse_overlay() which creates an overlayfs on top of a FUSE mount, using userxattr mode and data-only lower layers for file content. The FUSE server must already be running before calling this, since overlayfs probes the lower layer during setup.

OverlayMountOptions controls the overlay configuration: upper/work dirs for writable mounts, read-write mode, and fs-verity enforcement.

This is needed if using mount_fuse() that doesn't follow redirects.

Assisted-by: Claude Code (Opus 4.6)
Signed-off-by: Alexander Larsson <alexl@redhat.com>

Add --fuse/--no-fuse flags to 'cfsctl mount' and 'cfsctl oci mount' with automatic mount mode detection based on privilege level:

- Root in the init user namespace: kernel composefs mount (default)
- Non-init namespace with CAP_SYS_ADMIN: FUSE with overlayfs (kernel reads data directly via data-only lower layer)
- Non-init namespace without CAP_SYS_ADMIN: plain FUSE

The --fuse flag forces FUSE, --no-fuse forces kernel mount, and omitting both auto-detects. When --upperdir is given, overlay mode is always used regardless of capabilities.

By default the FUSE server daemonizes by re-executing itself as --internal-fuse-serve, passing the repo, image, and overlay fds via inherited file descriptors. The parent waits on a pipe for mount readiness then returns, matching the kernel mount's fire-and-forget behaviour. Use --foreground to keep the server in the calling process (useful for tests and debugging).

Init namespace detection reads /proc/self/uid_map for the characteristic "0 0 4294967295" identity mapping.

The composefs-fuse crate is an optional dependency behind the 'fuse' feature (on by default).

MountOptions gains has_overlay(), read_write(), and into_overlay() accessors. serve_tree_fuse() gains an optional ready_fd parameter for signaling mount readiness.

Assisted-by: Claude Code (Opus 4.6)
Signed-off-by: Alexander Larsson <alexl@redhat.com>
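The detection rules in this commit message can be written as a small pure function. The sketch below is an illustration, not the PR's code: `FusePreference`, `Privileges`, and the `MountMode` variants are hypothetical names, and two points the message leaves open are filled in by assumption (a forced `--fuse` still picks between overlay and plain FUSE by capability, and `--upperdir` only affects the choice once FUSE has been selected).

```rust
/// Hypothetical replacement for the `force_fuse` / `no_fuse` bool pair.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum FusePreference {
    Auto,
    Force,
    Disable,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum MountMode {
    /// Kernel composefs mount.
    Kernel,
    /// Metadata-only FUSE used as an overlayfs lower layer.
    FuseOverlay,
    /// FUSE serving both metadata and file content.
    FusePlain,
}

/// Privilege facts gathered by the caller before deciding.
#[derive(Debug, Clone, Copy)]
pub struct Privileges {
    pub root_in_init_userns: bool,
    pub cap_sys_admin: bool,
}

pub fn detect_mount_mode(
    pref: FusePreference,
    privs: Privileges,
    has_upper: bool,
) -> MountMode {
    let use_fuse = match pref {
        FusePreference::Force => true,
        FusePreference::Disable => false,
        // Auto: only root in the init namespace gets the kernel mount.
        FusePreference::Auto => !privs.root_in_init_userns,
    };
    if !use_fuse {
        return MountMode::Kernel;
    }
    // Assumption: an upper dir forces overlay mode even without
    // CAP_SYS_ADMIN, per "regardless of capabilities" above.
    if has_upper || privs.cap_sys_admin {
        MountMode::FuseOverlay
    } else {
        MountMode::FusePlain
    }
}
```

Keeping the privilege probes outside the function makes each row of the decision table unit-testable without root or a user namespace.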

Add privileged_fuse_dumpfile_roundtrip test that validates the FUSE implementation by building a synthetic filesystem with diverse content (directories, inline files, external files, symlinks, xattrs, hardlinks, character devices, FIFOs), mounting it via `cfsctl mount --fuse --foreground`, and comparing the dumpfile output against the expected canonical form.

The test uses --foreground so the FUSE server runs as a child process that the test can manage directly (kill + unmount on cleanup).

The test also reads external file content from the FUSE mount to verify the repository object serving path works correctly.

Based-on-work-by: Colin Walters <walters@verbum.org>
Assisted-by: Claude Code (Opus 4.6)
Signed-off-by: Alexander Larsson <alexl@redhat.com>

alexlarsson · 2026-06-26T16:04:32Z

Ok, completely new fuse reimplementation. Much smaller and should start faster. I'd like to do some more reviewing of it next week, and handle some of the comments above too. But, it looks pretty sweet to me.

cgwalters

Overall looks good to me; you didn't address a few of my comments, was that intentional?

cgwalters · 2026-06-26T16:35:45Z

+}
+
+/// Check if an fd has fs-verity enabled, meaning its contents cannot change.
+fn is_safe_to_mmap(fd: &impl AsFd) -> bool {


We could argue to move this check into the memmap2 crate, then we wouldn't need any unsafe here.

It's the same with sealed memfds

alexlarsson · 2026-06-26T16:40:35Z

Overall looks good to me; you didn't address a few of my comments, was that intentional?

I meant to look at those comments next week, as they were mainly in a different area than this change. I just wanted people to have a look at the new approach earlier.

cgwalters reviewed Jun 25, 2026

alexlarsson force-pushed the fuse-with-overlayfs branch 2 times, most recently from 3c4c36b to 588a7f0 Compare June 26, 2026 14:28

alexlarsson force-pushed the fuse-with-overlayfs branch from 588a7f0 to 98989b4 Compare June 26, 2026 14:36

cgwalters requested changes Jun 26, 2026

cgwalters and others added 4 commits June 26, 2026 18:02

alexlarsson force-pushed the fuse-with-overlayfs branch from 98989b4 to d82e9ab Compare June 26, 2026 16:03

giuseppe reviewed Jun 26, 2026

Comment thread crates/composefs-fuse/src/lib.rs

cgwalters approved these changes Jun 26, 2026
