feat(gateway): OpenShell gateway microVM with libkrun #100

Draft
drew wants to merge 14 commits into main from openclaw-vm/dn

Conversation

Collaborator

@drew drew commented Mar 4, 2026

Summary

  • Add openshell-vm library crate that boots k3s inside a libkrun microVM on macOS ARM64
  • Extract a standalone gateway binary with gvproxy networking, DHCP, and port forwarding
  • Auto-extract the kubeconfig to ~/.kube/gateway.yaml for immediate kubectl access
  • Add e2e integration tests for the gateway binary

Details

Boots a full k3s Kubernetes cluster inside an Apple Hypervisor.framework microVM via libkrun. Uses gvproxy for user-mode networking (virtio-net) with DHCP, bypassing libkrun's TSI (Transparent Socket Impersonation), which is incompatible with k3s loopback connections. Preserves containerd metadata across boots for fast startup.

New files

  • crates/openshell-vm/ — library crate with libkrun FFI bindings, VmConfig, launch()
  • crates/openshell-vm/src/main.rs — standalone gateway binary
  • crates/openshell-vm/scripts/ — rootfs build script, init script, debug helpers
  • scripts/bin/openshell — updated with codesigning and DYLD_FALLBACK_LIBRARY_PATH

Usage

# Build the rootfs (one-time)
./crates/openshell-vm/scripts/build-rootfs.sh

# Boot the gateway
cargo run --bin gateway

# Use it
export KUBECONFIG=~/.kube/gateway.yaml
kubectl get nodes

Prerequisites

  • brew tap slp/krun && brew install libkrun
  • Podman Desktop (for gvproxy at /opt/podman/bin/gvproxy)

@drew drew self-assigned this Mar 4, 2026

github-actions bot commented Mar 4, 2026

All contributors have signed the DCO ✍️ ✅
Posted by the DCO Assistant Lite bot.

@drew drew changed the title from "feat(vm): hello world, NemoClaw VM" to "feat(gateway): hello world, NemoClaw VM" Mar 4, 2026
@drew drew force-pushed the openclaw-vm/dn branch 2 times, most recently from 58d7858 to 6e6759e, March 6, 2026 21:30
drew added 5 commits March 16, 2026 10:41
Add a new navigator-vm library crate that boots k3s inside a libkrun
microVM, accessible from the host via gvproxy port forwarding.

Key components:
- FFI bindings to libkrun C API (krun_create_ctx, krun_add_net_unixgram, etc.)
- VmConfig with gateway() preset for k3s and custom exec mode
- gvproxy integration: virtio-net via unixgram, DHCP, native HTTP port forwarding
- gateway-init.sh: PID 1 init script with DHCP via udhcpc, mounts, k3s exec
- build-rootfs.sh: builds Ubuntu 22.04 arm64 rootfs with k3s + busybox-static
- Kubeconfig auto-extraction to ~/.kube/gateway.yaml
- CLI integration as 'ncl gateway' with --exec, --port, --net flags
- macOS codesigning and DYLD_FALLBACK_LIBRARY_PATH in ncl wrapper

Enable full NemoClaw control plane deployment inside the libkrun
microVM so e2e tests can run against the VM instead of Docker.

Build-time (build-rootfs.sh):
- Package helm chart and inject into k3s static charts directory
- Copy HelmChart CR and agent-sandbox manifests into rootfs
- Pull and save arm64 container images as tarballs for airgap boot

Boot-time (gateway-init.sh):
- Enable flannel CNI (remove --flannel-backend=none and related flags)
- Deploy bundled manifests to k3s auto-deploy directory
- Patch HelmChart CR for VM context (pullPolicy, SSH placeholders)
- Ensure DNS fallback when DHCP doesn't configure resolv.conf

Post-boot (lib.rs):
- Wait for navigator namespace created by Helm controller
- Generate PKI and apply TLS secrets via host kubectl
- Store cluster metadata and mTLS creds for CLI/SDK access
- Set 'gateway' as active cluster for e2e test discovery

Also bump VM to 8GB RAM / 4 vCPUs, add port 30051 forwarding,
fix nemoclaw wrapper fingerprint to include navigator-vm crate,
and add test:e2e:vm mise task.

Stop deleting meta.db in gateway-init.sh and include the native
snapshotter, content store, and metadata DB in the rootfs built by
build-rootfs.sh. Without meta.db, containerd re-extracts all image
layers on every boot (~2 min for navigator/server on virtio-fs),
causing kubelet CreateContainer timeouts. Also replace the etcd-snapshot
approach with direct SQLite cleanup of the kine DB to remove stale
pod/event/lease records.

Move the gateway VM launching out of `nemoclaw gateway` into its own
`gateway` binary built from the navigator-vm crate. The nemoclaw CLI
no longer links against libkrun or requires macOS hypervisor codesigning.

Add scripts/bin/gateway wrapper (build + codesign + exec) and clean up
scripts/bin/nemoclaw to remove navigator-vm artifacts.

Add two #[ignore] tests that require libkrun and a pre-built rootfs:
- gateway_boots_and_service_becomes_reachable: starts the full gateway
  and verifies the gRPC service on port 30051
- gateway_exec_runs_guest_command: runs /bin/true inside the VM via
  --exec and checks the exit code
@drew drew changed the title from "feat(gateway): hello world, NemoClaw VM" to "feat(gateway): OpenShell gateway microVM with libkrun" Mar 17, 2026
drew added 9 commits March 17, 2026 15:57
Move orphaned integration test from crates/navigator-vm/ to
crates/openshell-vm/tests/ and update all navigator_bootstrap
references to openshell_bootstrap, including renamed types
(ClusterMetadata -> GatewayMetadata) and functions.

openshell-vm links against libkrun which is only available on macOS
with Homebrew. Exclude it from cargo check, clippy, and test workspace
commands so CI passes on Linux runners.

…support

Enable Kubernetes-compatible networking in the gateway microVM by
building a custom libkrunfw kernel with CONFIG_BRIDGE, CONFIG_NETFILTER,
CONFIG_NF_CONNTRACK, CONFIG_IP_NF_IPTABLES, and CONFIG_VETH compiled in.

Key changes:
- Docker-based kernel build pipeline for macOS (build-custom-libkrunfw.sh)
- Kernel config fragment enabling bridge/netfilter/conntrack/NAT/IPVS
- Feature-flagged bridge CNI with auto-detection fallback to legacy ptp
- Runtime provenance tracking (SHA-256, build metadata, manifest validation)
- VM capability checker and host-side verification matrix scripts
- Mise tasks: vm:build-custom-runtime, vm:verify, vm:check-capabilities
- Architecture and operator documentation

…orking

Switch kube-proxy to nftables mode and add missing kernel config options
(NFT_NUMGEN, NFT_FIB_IPV4/6, NFT_LIMIT, NFT_REDIR, NFT_TPROXY) plus
xtables match modules required by CNI bridge masquerade. Add stale CNI
state cleanup on boot (cni0 bridge, veth pairs, IPAM allocations, pod
network namespaces, sandbox controller shim) to prevent 'route already
exists' errors from persistent rootfs. Remove dual bridge/legacy-vm-net
profile system in favor of bridge-only with fail-fast kernel validation.
Drop host-mapped port 6443 (kube-apiserver) since it is not needed for
normal gateway operation. Update bundle script to fall back to Homebrew
for libkrun.dylib (VMM) while still requiring custom libkrunfw (kernel).

…aligned pre-bake

Two issues caused the gateway service readiness check to time out:

1. Port mapping mismatch: gvproxy mapped host:30051 → VM:8080, but with
   bridge CNI the pod listens on 8080 inside its network namespace, not
   on the VM's root namespace. Changed to 30051:30051 so traffic flows
   through the NodePort service (kube-proxy nftables → pod:8080).

2. Pod cycling from helm upgrade: build-rootfs.sh pre-baked with
   hostNetwork=true and automountServiceAccountToken=false, but
   gateway-init.sh changed these at boot, triggering a HelmChart
   reconcile that killed the pre-baked pod ~90s in. Aligned pre-bake
   values (hostNetwork=false, automountServiceAccountToken=true) to
   match runtime, eliminating the manifest delta.

The previous commit (070bcca) dropped port 6443 from the gvproxy
port_map, breaking all host-side kubectl commands including the
readiness check and stale pod recovery. k3s runs the API server
with host networking so VM:6443 is directly reachable — restore
the 6443:6443 mapping alongside the 30051:30051 NodePort mapping.
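
The port mappings named in these commit messages (30051:30051, 6443:6443) pair a host port with a VM port. As a rough sketch only, the Rust below parses such a HOST:GUEST pair and renders a JSON body of the shape that gvproxy's HTTP forwarder endpoint (/services/forwarder/expose) is understood to accept. The names PortMapping, parse_port_mapping, and expose_request_body are hypothetical, and the guest address is an assumption supplied by the caller; none of this is taken from the crate's actual code.

```rust
use std::net::Ipv4Addr;

/// Hypothetical representation of one `HOST:GUEST` port_map entry.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct PortMapping {
    pub host: u16,
    pub guest: u16,
}

/// Parse a spec such as "30051:30051" into a PortMapping.
/// Returns a human-readable error for malformed or out-of-range input.
pub fn parse_port_mapping(spec: &str) -> Result<PortMapping, String> {
    let (host, guest) = spec
        .split_once(':')
        .ok_or_else(|| format!("expected HOST:GUEST, got {spec:?}"))?;
    let host = host
        .parse::<u16>()
        .map_err(|e| format!("bad host port {host:?}: {e}"))?;
    let guest = guest
        .parse::<u16>()
        .map_err(|e| format!("bad guest port {guest:?}: {e}"))?;
    Ok(PortMapping { host, guest })
}

/// Render the JSON body for an expose request.
/// Assumption: the forwarder takes {"local": ":PORT", "remote": "IP:PORT"}.
pub fn expose_request_body(mapping: PortMapping, guest_ip: Ipv4Addr) -> String {
    format!(
        r#"{{"local":":{}","remote":"{}:{}"}}"#,
        mapping.host, guest_ip, mapping.guest
    )
}
```

With a guest address of 192.168.127.2 (assumed here for illustration), parsing "30051:30051" and rendering it gives a body that forwards host port 30051 to the same port in the VM, which is the NodePort path described in the fix above.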
Remove all kubectl calls from the host-side boot sequence, eliminating
the need to forward port 6443 (kube-apiserver) outside the VM.

Changes:
- wait_for_gateway_service: TCP probe only (30051), no kubectl pod check
- bootstrap_gateway: cold boot writes TLS secret manifests via virtio-fs
  into k3s auto-deploy dir instead of kubectl apply
- bootstrap_gateway: warm boot skips namespace wait (TCP probe suffices)
- recover_stale_pods: removed entirely (gateway-init.sh already cleans
  containerd runtime/sandbox state, CNI state, and network namespaces)
- Kubeconfig copy moved to best-effort post-readiness (for debugging)
- Port 6443 removed from gvproxy port_map

Removed functions: recover_stale_pods, wait_for_namespace,
apply_tls_secrets, kubectl_apply.

Net: -362 lines, +147 lines. No kubectl binary required on host.
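
A TCP-only readiness probe of the kind described for wait_for_gateway_service can be sketched with the standard library alone. The helper below is a minimal illustration, not the crate's implementation: wait_for_tcp is a hypothetical name, and the one-second per-attempt cap is an assumed value.

```rust
use std::net::{SocketAddr, TcpStream};
use std::thread::sleep;
use std::time::{Duration, Instant};

/// Hypothetical helper: poll `addr` until a TCP connection succeeds or
/// `timeout` elapses. Returns true if the port became reachable in time.
pub fn wait_for_tcp(addr: SocketAddr, timeout: Duration, interval: Duration) -> bool {
    let deadline = Instant::now() + timeout;
    loop {
        let remaining = deadline.saturating_duration_since(Instant::now());
        if remaining.is_zero() {
            return false;
        }
        // Cap each attempt so an unresponsive port cannot stall past the deadline.
        // connect_timeout rejects a zero duration, which the check above rules out.
        let attempt = remaining.min(Duration::from_secs(1));
        if TcpStream::connect_timeout(&addr, attempt).is_ok() {
            return true;
        }
        // Sleep between attempts, but never beyond the deadline.
        let pause = interval.min(deadline.saturating_duration_since(Instant::now()));
        sleep(pause);
    }
}
```

A caller would pass the forwarded host address, for example 127.0.0.1:30051, with a timeout suited to cold boots. A successful connect only shows that something accepted the connection on that port, so a stricter check would also exercise the gRPC service.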