Hijack namespace requests + other fixes to make bwrap working by sylirre · Pull Request #359 · termux/proot

sylirre · 2026-05-23T18:12:46Z

The goal is to make it possible to run programs under bwrap, which is used by Glycin (gdk-pixbuf).

~~Currently doesn't support bwrap's --unshare-net or --unshare-all~~ - fixed

Sandbox tools (bwrap, etc.) call unshare/setns/mount/pivot_root and write to /proc/self/{uid,gid}_map to set up a namespace; under proot these all fail because we have no real namespace. Pretend they succeed and use proot's binding system to make the resulting paths actually accessible.

- syscall/enter.c: void unshare/setns/umount; turn mount into a runtime binding (bind/proc/sysfs/tmpfs); turn pivot_root into a root-binding swap that re-exposes the old root at put_old; redirect open of /proc/<pid|self>/{uid_map,gid_map,setgroups} to /dev/null; also fix prctl(PR_SET_DUMPABLE) to actually return 0 (previously leaked -ENOSYS).
- syscall/seccomp.c: filter PR_unshare/PR_setns so the handlers above run under seccomp mode 2.
- path/canon.c: when a symlink's guest path aliases /proc via a binding (e.g. /oldroot/proc/self), route through readlink_proc so "self" resolves to the tracee's pid, not proot's.
- extension/mountinfo: append synthesized lines for runtime bindings so bwrap's parse_mountinfo finds the mounts it just asked for.
- path/temp.c: skip chmod on symlinks during temp-dir cleanup; bwrap leaves /dev/{stdin,fd,...} symlinks pointing at /proc/self/fd/N inside the emulated tmpfs dirs.
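
To illustrate the /proc/<pid|self>/{uid_map,gid_map,setgroups} redirect, here is a minimal sketch of the path check in C. The helper names `is_idmap_path` and `redirect_idmap_open` are hypothetical and not taken from the PR; the real code works on the tracee's canonicalized path inside the sysenter handler.

```c
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: true when `path` names
 * /proc/<pid|self>/{uid_map,gid_map,setgroups}. */
static bool is_idmap_path(const char *path)
{
    static const char *const leaves[] = { "uid_map", "gid_map", "setgroups" };
    const char *p;
    size_t i;

    if (strncmp(path, "/proc/", 6) != 0)
        return false;
    p = path + 6;

    if (strncmp(p, "self/", 5) == 0) {
        p += 5;
    } else {
        /* Require at least one digit, followed by a slash. */
        if (!isdigit((unsigned char)*p))
            return false;
        while (isdigit((unsigned char)*p))
            p++;
        if (*p != '/')
            return false;
        p++;
    }

    for (i = 0; i < sizeof(leaves) / sizeof(leaves[0]); i++)
        if (strcmp(p, leaves[i]) == 0)
            return true;
    return false;
}

/* Hypothetical helper: the path the open should actually use. */
static const char *redirect_idmap_open(const char *path)
{
    return is_idmap_path(path) ? "/dev/null" : path;
}
```

With this check, the sandbox helper's writes to the id-map files land in /dev/null and appear to succeed, while every other path is left untouched.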

bubblewrap calls clone(CLONE_NEWNS|SIGCHLD) directly (without going through unshare) and the Android kernel rejects it with EPERM when unprivileged user namespaces are disabled. Drop the namespace flags before the kernel sees them so the fork/thread itself still succeeds; PRoot keeps tracking the child via PTRACE_EVENT_CLONE. clone3 takes its flags from a struct clone_args in tracee memory, so read/write via peek_word/poke_word. Add both syscalls to the seccomp filter so the enter handler runs under seccomp mode 2.
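
The flag stripping described above can be sketched as follows. The names `strip_namespace_flags`, `wants_private_mounts` and `NS_FLAGS`, and the exact set of flags being dropped, are assumptions for the example; the commit message only names CLONE_NEWNS. The real handler additionally has to fetch and store the flags through the tracee's registers, or through struct clone_args for clone3.

```c
#include <linux/sched.h>   /* CLONE_* flag values */
#include <signal.h>        /* SIGCHLD */

/* Assumed set of namespace flags to hide from the kernel. */
#define NS_FLAGS (CLONE_NEWNS | CLONE_NEWUSER | CLONE_NEWPID | CLONE_NEWNET | \
                  CLONE_NEWIPC | CLONE_NEWUTS | CLONE_NEWCGROUP)

/* Returns the flags with every namespace bit cleared; the exit signal
 * in the low byte and all other CLONE_* bits are preserved. */
static unsigned long strip_namespace_flags(unsigned long flags)
{
    return flags & ~(unsigned long)NS_FLAGS;
}

/* Returns 1 when the caller asked for a new mount namespace, so the
 * tracer can remember that before the flag is stripped. */
static int wants_private_mounts(unsigned long flags)
{
    return (flags & CLONE_NEWNS) != 0;
}
```

For bwrap's clone(CLONE_NEWNS|SIGCHLD), this leaves a plain fork-like clone with SIGCHLD as the exit signal.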

On some aarch64 kernels (notably Android) the SYSCALL_AVOIDER trick leaks -ENOSYS through to the tracee even though set_sysnum() ran: chdir under PROOT_NO_SECCOMP and bwrap's mount(NULL, "/", ...) both returned "Function not implemented". Add an exit-stage poke so the result is always 0 for these emulated syscalls, regardless of how the kernel handled SYSCALL_AVOIDER. Requires the syscalls to be filtered with FILTER_SYSEXIT under seccomp mode so the exit handler actually runs. Also poke SYSARG_RESULT=0 at enter for getcwd/chdir/fchdir, mirroring what mount/unshare/etc. already do.

Android's parent process installs a system-wide seccomp filter that traps mount/umount/pivot_root/unshare/setns with SIGSYS. Our regular sysenter handlers in enter.c never run for those syscalls because the kernel sends SIGSYS instead of executing the call, so bwrap was getting -ENOSYS from the SIGSYS handler's default branch. Add cases in handle_seccomp_event_common that pretend the syscall succeeded (mirroring what enter.c does), and apply the mount / pivot_root binding emulation so sandbox helpers like bubblewrap see the bindings they expect. The emulation helpers in enter.c are factored out into apply_emulated_mount() / apply_emulated_pivot_root() so the SIGSYS handler and the normal enter path share the same code.

bubblewrap reads /oldroot/proc/self/fd/<N> to verify the mount it just asked for. With only a single /oldroot binding pointing at the previous rootfs host path, /oldroot/proc resolved to <rootfs>/proc on the host (empty), not the real /proc, so the readlink failed. After installing the put_old binding, walk the existing non-root bindings and add a parallel <put_old>/<guest> binding for each. The host /proc bound at /proc thus also becomes reachable at /oldroot/proc, which is what bwrap (and similar sandbox helpers) expects.
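
A rough sketch of the mirroring step, using a flat array instead of PRoot's real binding structures. `struct mirror_binding`, `mirror_under_put_old` and `PATH_LEN` are hypothetical names introduced only for this example.

```c
#include <stdio.h>
#include <string.h>

#define PATH_LEN 256

/* Simplified stand-in for a binding: guest path -> host path. */
struct mirror_binding {
    char guest[PATH_LEN];
    char host[PATH_LEN];
};

/* For every non-root binding in `in`, writes a parallel binding whose
 * guest path is "<put_old><guest>" and whose host path is unchanged.
 * Returns the number of bindings written to `out` (at most `cap`). */
static size_t mirror_under_put_old(const struct mirror_binding *in, size_t count,
                                   const char *put_old,
                                   struct mirror_binding *out, size_t cap)
{
    size_t added = 0;
    size_t i;

    for (i = 0; i < count && added < cap; i++) {
        int n;

        /* The root binding is already exposed at put_old itself. */
        if (strcmp(in[i].guest, "/") == 0)
            continue;

        n = snprintf(out[added].guest, PATH_LEN, "%s%s", put_old, in[i].guest);
        if (n < 0 || n >= PATH_LEN)
            continue; /* result would not fit; skip this binding */

        memcpy(out[added].host, in[i].host, PATH_LEN);
        added++;
    }
    return added;
}
```

With put_old set to /oldroot, a binding of the host /proc at /proc yields a second binding of the host /proc at /oldroot/proc.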

Two related fixes for the bubblewrap-on-PRoot emulation:

1. Subsequent bwrap runs in the same shell were failing with "Creating newroot failed: No such file or directory" because the bindings added by the previous bwrap leaked into the parent. bubblewrap clones with CLONE_NEWNS (which we strip); remember that on the tracee, and in new_child() deep-copy the binding tree so emulated mount(2) calls in the child don't propagate back to the parent.
2. umount of a runtime bind was a silent no-op. Add emulate_umount() that removes the matching binding when its guest path exactly equals the unmount target, and call it from both the regular sysenter handler and the SIGSYS handler.
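
The exact-match rule for umount can be sketched like this. It is not the PR's emulate_umount(); `struct rt_binding` and `remove_exact_binding` are hypothetical names, and a flat array stands in for PRoot's binding list.

```c
#include <stddef.h>
#include <string.h>

/* Simplified stand-in for a runtime binding. */
struct rt_binding {
    const char *guest;
    const char *host;
};

/* Removes the first binding whose guest path is exactly `target`.
 * Prefix matches are deliberately ignored. Returns the new count. */
static size_t remove_exact_binding(struct rt_binding *table, size_t count,
                                   const char *target)
{
    size_t i;

    for (i = 0; i < count; i++) {
        if (strcmp(table[i].guest, target) == 0) {
            /* Close the gap left by the removed entry. */
            memmove(&table[i], &table[i + 1],
                    (count - i - 1) * sizeof(table[0]));
            return count - 1;
        }
    }
    return count; /* nothing matched; umount stays a no-op */
}
```

Requiring an exact match means that unmounting /newroot/usr does not disturb a separate binding at /newroot/usr/lib, and a target that was never bound changes nothing.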

Previously emulated mount of fstype "devtmpfs" or "devpts" got the same empty-tmpdir treatment as "tmpfs", which meant the tracee saw an empty directory instead of any real device. Bind the host /dev (for devtmpfs) and /dev/pts (for devpts) instead, so things like opening /dev/null or a pty inside the sandbox actually work.
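
In code, the fstype decision amounts to a small lookup. The function name `host_source_for_fstype` is hypothetical; only the devtmpfs and devpts mappings come from the description above.

```c
#include <stddef.h>
#include <string.h>

/* Returns the host directory to bind for an emulated mount of `fstype`,
 * or NULL when no host directory applies (e.g. "tmpfs", which keeps
 * the empty temporary directory treatment). */
static const char *host_source_for_fstype(const char *fstype)
{
    if (fstype == NULL)
        return NULL;
    if (strcmp(fstype, "devtmpfs") == 0)
        return "/dev";
    if (strcmp(fstype, "devpts") == 0)
        return "/dev/pts";
    return NULL;
}
```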

Bubblewrap's --unshare-net path calls loopback_setup(), which:

1. if_nametoindex("lo") -> ioctl(SIOCGIFINDEX, {ifr_name="lo"})
2. socket(PF_NETLINK, SOCK_RAW, NETLINK_ROUTE)
3. bind() the netlink socket
4. sendto/recv RTM_NEWADDR + RTM_NEWLINK

On Android the underlying syscalls return EACCES because the real caller lacks CAP_NET_ADMIN. We can't really set the loopback up but we can make bwrap think we did.

- ioctl(SIOCGIFINDEX) for "lo" is intercepted and filled with index 1.
- socket(AF_NETLINK, ...) is silently rewritten to socket(AF_UNIX, SOCK_DGRAM, ...). The resulting fd is tracked on the tracee.
- bind/sendto/recvfrom on a tracked fd is voided. sendto records the request's nlmsg_seq, recvfrom writes back a synthesised NLMSG_ERROR reply with error=0, nlmsg_seq from the request, and nlmsg_pid set to the tracee's pid (bwrap checks both).
- close() on a tracked fd removes it from the set so a reused fd number doesn't keep being intercepted.
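
For reference, a sketch of what the synthesised reply could look like when built in a local buffer. `build_netlink_ack` is a hypothetical name, and the embedded copy of the original request header inside struct nlmsgerr is left zeroed here as a simplification; in PRoot the bytes would then be written into the tracee's receive buffer.

```c
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <linux/netlink.h>

/* Fills `buf` with an NLMSG_ERROR message carrying error=0 (an ack)
 * for the given sequence number and pid. Returns the number of bytes
 * written, or 0 when `size` is too small. */
static size_t build_netlink_ack(void *buf, size_t size,
                                uint32_t seq, uint32_t pid)
{
    struct nlmsghdr hdr;
    struct nlmsgerr err;
    const size_t total = NLMSG_LENGTH(sizeof(err));

    if (size < total)
        return 0;

    memset(&hdr, 0, sizeof(hdr));
    hdr.nlmsg_len = (uint32_t)total;
    hdr.nlmsg_type = NLMSG_ERROR;
    hdr.nlmsg_seq = seq;   /* must echo the request's sequence number */
    hdr.nlmsg_pid = pid;   /* must equal the tracee's pid */

    memset(&err, 0, sizeof(err));
    err.error = 0;         /* 0 means the request was acknowledged */

    memcpy(buf, &hdr, sizeof(hdr));
    memcpy((char *)buf + NLMSG_HDRLEN, &err, sizeof(err));
    return total;
}
```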

Previously socket(AF_NETLINK, ...) was unconditionally rewritten to socket(AF_UNIX, SOCK_DGRAM, 0), which broke legitimate netlink users inside the rootfs (e.g. c-ares under dnf would observe a zero-byte recvmsg and abort with "Unexpected netlink response of size 0 on descriptor N (address family 1)"). The rewrite is only needed where the kernel refuses AF_NETLINK outright (typical on Android/Termux). Probe socket(AF_NETLINK, SOCK_RAW, NETLINK_ROUTE) once at first use and cache the outcome; only fall back to the AF_UNIX emulation when the probe fails. When the probe succeeds, no fd is tracked, so the dependent bind/sendto/recv intercepts stay inert.

Pulls in <unistd.h> for close(2) so the netlink probe builds without the implicit-declaration warning. While here, replace the magic -1/0/1 cache values with a named enum, fix the comment to call out the real reasons AF_NETLINK gets denied (SELinux, inherited seccomp, hardened containers) rather than implying it is always seccomp, and emit a VERBOSE note the first time the probe falls back so users can tell from -v output whether the AF_UNIX shim is active.
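
A minimal sketch of the probe-once-and-cache pattern with a named enum. The identifiers `probe_state`, `PROBE_UNKNOWN`, `PROBE_DENIED`, `PROBE_ALLOWED`, `netlink_state` and `probe_netlink` are made up for the example, and a plain fprintf stands in for PRoot's VERBOSE note.

```c
#include <stdio.h>
#include <sys/socket.h>
#include <unistd.h>          /* close(2) */
#include <linux/netlink.h>   /* NETLINK_ROUTE */

/* Named states instead of magic -1/0/1 values. */
enum probe_state {
    PROBE_UNKNOWN = -1,  /* probe has not run yet */
    PROBE_DENIED  = 0,   /* AF_NETLINK refused: use the AF_UNIX shim */
    PROBE_ALLOWED = 1,   /* AF_NETLINK works: leave sockets alone */
};

static enum probe_state netlink_state = PROBE_UNKNOWN;

/* Probes AF_NETLINK availability on first use and caches the outcome. */
static enum probe_state probe_netlink(void)
{
    int fd;

    if (netlink_state != PROBE_UNKNOWN)
        return netlink_state;

    fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_ROUTE);
    if (fd >= 0) {
        close(fd);
        netlink_state = PROBE_ALLOWED;
    } else {
        netlink_state = PROBE_DENIED;
        /* Stand-in for the one-time verbose note. */
        fprintf(stderr, "AF_NETLINK denied, falling back to AF_UNIX emulation\n");
    }
    return netlink_state;
}
```

Because the result is cached, the note is printed at most once and later socket() calls pay no extra syscall.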

sylirre added 6 commits May 23, 2026 18:01

sylirre mentioned this pull request May 23, 2026

[Bug]: After updating the packages in Termux on 10.02.26, many applications stopped running. termux/termux-packages#28421

Open

sylirre mentioned this pull request May 23, 2026

[Bug]: Arch Linux: XFCE4 fails due to bwrap/namespace incompatibility in proot termux/proot-distro#644

Open

sylirre force-pushed the bwrap-fix branch from 3479611 to aac060c Compare May 23, 2026 20:39

sylirre added 2 commits May 24, 2026 23:02

sylirre wants to merge 10 commits into master from bwrap-fix

1 participant
