Optional Docker sandbox: whole-skvm-in-container via --sandbox #39

Open
lec77 wants to merge 40 commits into main from worktree-docker-sandbox-strategy-c

lec77 commented Jun 1, 2026

Collaborator

Linked issue

Closes #30

Summary

Adds an opt-in --sandbox flag that runs the entire skvm process inside an ephemeral Docker container, instead of containing execution per-adapter. Default behaviour is unchanged — without --sandbox, skvm runs on the host exactly as before. Raising the containment boundary above the adapter layer means adapter code is untouched and the per-surface complexity explored earlier (per-CLI HOME mounts, env-var flips, per-surface network defaults) disappears.

Changes

Launcher (new src/launcher/)

  • index.ts — runLauncher(): composes mounts/env/image, a hardened docker run argv, then execs docker. Parses --mount-extra, --debug-sandbox (key-redacted), --docker-image, --docker-network.
  • path-flags.ts — enumerates every path-shaped CLI flag with {kind, mode}, including comma-separated list flags.
  • mounts.ts — three default mounts (/workspace, /skvm-cache, /skvm-data) plus dynamic /extra/<idx>/ mounts for out-of-root paths, with path-prefix dedup and longest-prefix root matching.
  • env.ts — proxy passthrough, SKVM_CACHE/HOME markers, per-route key injection (SKVM_ROUTE_<id>_KEY), route-match collision detection.
  • config-sanitize.ts — writes a key-stripped config the container mounts, so cat skvm.config.json inside the container exposes no secrets.
  • image.ts — hybrid resolve: pull ghcr.io/sjtu-ipads/skvm-sandbox:<version>, fall back to a printed local docker build command.
  • docker-argv.ts — --cap-drop=ALL, no-new-privileges, pids/mem/cpu limits, host uid:gid, labels for stale-reap; rejects env values with newlines.
  • stale-reap.ts — reaps leaked containers and tmp dirs by host-pid label (timeout-bounded, best-effort).
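As a rough illustration of the hardening flags listed for docker-argv.ts, the sketch below assembles a `docker run` argv from a spec. The `SandboxRunSpec` shape, the `buildDockerArgv` name, and the `skvm.host-pid` label key are assumptions made for illustration. Only the Docker flags themselves are standard CLI options, and the PR's actual code may differ.

```typescript
// Hypothetical spec shape; field names are assumptions, not the PR's types.
interface SandboxRunSpec {
  image: string;
  uid: number;
  gid: number;
  hostPid: number;
  memory: string; // e.g. "4g"
  cpus: string; // e.g. "2"
  pidsLimit: number;
  network: string; // e.g. "bridge"
  mounts: { source: string; target: string; readOnly: boolean }[];
  env: Record<string, string>;
  command: string[];
}

function buildDockerArgv(spec: SandboxRunSpec): string[] {
  // Reject env values that contain newlines before they reach the docker CLI.
  for (const [key, value] of Object.entries(spec.env)) {
    if (/[\r\n]/.test(value)) {
      throw new Error(`env ${key} contains a newline`);
    }
  }
  const argv = [
    "run",
    "--rm",
    "--cap-drop=ALL",
    "--security-opt",
    "no-new-privileges",
    `--pids-limit=${spec.pidsLimit}`,
    `--memory=${spec.memory}`,
    `--cpus=${spec.cpus}`,
    `--network=${spec.network}`,
    "--user",
    `${spec.uid}:${spec.gid}`,
    // Assumed label key; a stale-reap pass could filter containers on it.
    "--label",
    `skvm.host-pid=${spec.hostPid}`,
  ];
  for (const m of spec.mounts) {
    const ro = m.readOnly ? ",readonly" : "";
    argv.push("--mount", `type=bind,source=${m.source},target=${m.target}${ro}`);
  }
  for (const [key, value] of Object.entries(spec.env)) {
    argv.push("-e", `${key}=${value}`);
  }
  argv.push(spec.image, ...spec.command);
  return argv;
}
```

Keeping the argv as a string array, rather than a shell string, means no value is ever re-parsed by a shell before docker sees it.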

Core wiring

  • src/index.ts — --sandbox flag, SKVM_IN_SANDBOX re-entry guard, launcher dispatch, config-command/native-mode guards, clean error output.
  • src/core/config.ts — getSandboxConfig / getDefaultSandboxMode, resolveRouteApiKey env fallback, native-mode-under-sandbox enforcement at the resolveAdapterConfigMode choke point.
  • src/core/types.ts — SandboxConfigSchema (validated memory/cpus/network).
  • src/providers/registry.ts — resolveRouteApiKey moved to core/config.ts (re-exported; behaviour preserved).
  • src/cli-config/index.ts — config init/show/doctor surface the sandbox slice plus docker/image checks.
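To make the re-entry guard and the env fallback concrete, here is a minimal sketch. The function names, the `"1"` marker value for SKVM_IN_SANDBOX, and the way a route id is mapped onto SKVM_ROUTE_<id>_KEY are all assumptions. The PR only states that these mechanisms exist.

```typescript
// Hypothetical helpers; names and the "1" marker value are assumptions.
type Env = Record<string, string | undefined>;

// Hand off to the launcher only when --sandbox is present and we are not
// already running inside the container (which would recurse forever).
function shouldEnterSandbox(argv: string[], env: Env): boolean {
  if (env["SKVM_IN_SANDBOX"] === "1") return false;
  return argv.includes("--sandbox");
}

// Assumed mapping from a route id to its env var name.
function routeKeyEnvName(routeId: string): string {
  const id = routeId.toUpperCase().replace(/[^A-Z0-9]/g, "_");
  return `SKVM_ROUTE_${id}_KEY`;
}

// Prefer a key from the config; fall back to the injected env var, which is
// the only source once the mounted config has been stripped of keys.
function resolveRouteApiKeySketch(
  route: { id: string; apiKey?: string },
  env: Env,
): string | undefined {
  return route.apiKey || env[routeKeyEnvName(route.id)];
}
```

On the host the config key wins as before. Inside the container the sanitized config carries no key, so the env var is what resolves.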

Image + docs

  • docker/skvm-sandbox.Dockerfile — Ubuntu 24.04 + Bun + opencode + claude-code + baked skvm binary.
  • README.md — --sandbox semantics, image build, cleanup recipes.

Adapter code (src/adapters/*.ts) is unchanged — the defining property.

Test plan

  • bunx tsc --noEmit
  • bun test — 1013 pass, 0 fail
  • Manual e2e on a Linux host with Docker:
bun run build:binary
docker build -f docker/skvm-sandbox.Dockerfile -t ghcr.io/sjtu-ipads/skvm-sandbox:$(bun run skvm --version) .
# real run through the container — resolves route, injects key, calls the model:
skvm run --sandbox --task=task.json --model=cheap_ipads/gpt-4o-mini   # -> HELLO_SANDBOX
skvm run --sandbox config doctor      # hard-errors (config is host-only)
skvm run --sandbox --adapter-config=native ...   # hard-errors (managed-only)
skvm run --sandbox --debug-sandbox ...           # argv printed, keys <redacted>

Verified end-to-end: image build, in-container tooling, launcher roundtrip, a real LLM run with artifacts persisted back to the host cache, and all guard / redaction paths.

Notes for reviewers

  • Security model: keys are stripped from the mounted config and injected via env (SKVM_ROUTE_<id>_KEY); the design accepts that env-readable keys remain visible to in-container code (network=bridge is required for LLM calls). --debug-sandbox redacts key values.
  • src/index.ts global error handler now prints error: <msg> by default (full stack under --verbose/SKVM_DEBUG). This affects all commands, not just sandbox — happy to split it out if you'd prefer the PR stay strictly sandbox-scoped.
  • Image footprint: the locally-built image is ~1.48 GB (Ubuntu 24.04 + apt curl/git/python3/nodejs/npm/jq/unzip + Bun + opencode binary + global @anthropic-ai/claude-code npm install + skvm binary). Reasonable for an agent sandbox but not lean. Multi-stage build, debian-slim base, and dropping the python3/npm/jq apt set are the obvious slimming levers — left as follow-ups so this PR stays scoped to the launcher mechanism.
  • Deferred follow-ups (intentional, not blockers): pi/hermes/openclaw in the image (only bare-agent/opencode/claude-code today); release pipeline for multi-arch image push; --mount-extra colon-in-path limitation; stripping non-route host paths (adapters/headlessAgent) from the mounted config; baseUrl-embedded-credentials stripping; pinning Bun in the Dockerfile.
  • This supersedes the earlier per-adapter direction; that branch is left unmerged as reference.
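The key-handling part of the security model above can be sketched in two small functions: one strips keys from the config that gets mounted, the other redacts injected key values when the argv is printed for debugging. The config shape (`routes` with an optional `apiKey`) and both function names are assumptions for illustration, not the PR's actual schema.

```typescript
// Hypothetical config shape; the real skvm.config.json schema may differ.
interface RouteConfig {
  id: string;
  baseUrl?: string;
  apiKey?: string;
}
interface ConfigSketch {
  routes: RouteConfig[];
}

// Return a copy of the config with every route's apiKey removed, suitable
// for writing to the file the container mounts.
function sanitizeConfig(config: ConfigSketch): ConfigSketch {
  return {
    ...config,
    routes: config.routes.map((route) => {
      const copy: RouteConfig = { ...route };
      delete copy.apiKey;
      return copy;
    }),
  };
}

// Replace the value of any NAME=VALUE argument whose name looks like
// SKVM_ROUTE_<id>_KEY, so a printed argv exposes no secrets.
function redactForDebug(argv: string[]): string[] {
  return argv.map((arg) =>
    arg.replace(/^(SKVM_ROUTE_\w+_KEY)=[\s\S]*$/, "$1=<redacted>"),
  );
}
```

This sketch only matches keys passed as a separate `NAME=VALUE` argument (for example after `-e`). A combined `--env=NAME=VALUE` form would need its own pattern.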

Checklist

lec77 added 30 commits May 27, 2026 02:51
… dup-guard test)

Add verified path-shaped CLI flags that were absent from PATH_FLAGS:
--profile (file, ro) used by aot-compile/pipeline/bench; --skill-list
(file, ro) for jit-optimize batch mode; --workdir (dir, rw) for run;
--target (dir, rw) for proposals accept; --custom (file, ro) for bench
YAML plans; --manifest (dir, ro) for bench judge; --output-dir (dir, rw)
for bench compare; --path (dir, ro) for bench import; --report (file, rw)
for bench merge-judge.

--logs and --failures (comma-separated path lists) are excluded from
PATH_FLAGS with a TODO(docker-sandbox) comment near their parsing site.

Add a "flag list has no duplicates" test to catch future drift.
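For readers skimming the commit, a table of this kind might look like the sketch below. The flag names, kinds, and modes are taken from the commit message above. The `PathFlag` field names and the `findDuplicateFlags` helper are assumptions about how the table and its test could be written.

```typescript
// Assumed entry shape: the flag, whether it names a file or a directory,
// and whether the container needs read-only or read-write access.
type PathFlag = { flag: string; kind: "file" | "dir"; mode: "ro" | "rw" };

const PATH_FLAGS_SKETCH: PathFlag[] = [
  { flag: "--profile", kind: "file", mode: "ro" },
  { flag: "--skill-list", kind: "file", mode: "ro" },
  { flag: "--workdir", kind: "dir", mode: "rw" },
  { flag: "--target", kind: "dir", mode: "rw" },
  { flag: "--custom", kind: "file", mode: "ro" },
  { flag: "--manifest", kind: "dir", mode: "ro" },
  { flag: "--output-dir", kind: "dir", mode: "rw" },
  { flag: "--path", kind: "dir", mode: "ro" },
  { flag: "--report", kind: "file", mode: "rw" },
];

// Return every flag name that appears more than once (each reported once).
function findDuplicateFlags(flags: PathFlag[]): string[] {
  const seen = new Set<string>();
  const dupes = new Set<string>();
  for (const { flag } of flags) {
    if (seen.has(flag)) dupes.add(flag);
    seen.add(flag);
  }
  return Array.from(dupes);
}
```

A drift-guard test then reduces to asserting that `findDuplicateFlags(PATH_FLAGS_SKETCH)` is empty.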
…le + dispatch guard)

Adds assertSandboxCompatible() exported from src/index.ts, which throws
a hard error when --sandbox is combined with either a config subcommand
(host-state management) or native adapter mode (imports host credentials).

Wires the config-command guard in the sandbox dispatch block inside
main(), right before runLauncher() is invoked. Per-command native-adapter
wiring (after resolveAdapterConfigMode) is a follow-up task.
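A guard of this shape could be as small as the following sketch. The options object and the error wording are assumptions. The commit only states that the function throws for config subcommands and for native adapter mode.

```typescript
// Hypothetical signature; the exported function in src/index.ts may take
// different arguments.
interface SandboxGuardInput {
  command: string; // top-level subcommand, e.g. "run" or "config"
  adapterConfigMode?: "managed" | "native";
}

function assertSandboxCompatible(input: SandboxGuardInput): void {
  if (input.command === "config") {
    // config subcommands manage host state, so they must run on the host.
    throw new Error("--sandbox cannot be combined with config subcommands");
  }
  if (input.adapterConfigMode === "native") {
    // native mode imports host credentials, which the sandbox withholds.
    throw new Error("--sandbox requires managed adapter config, not native");
  }
}
```

Throwing before the launcher is invoked keeps the failure on the host, so no container is started for an invocation that could never succeed.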
…e-code + skvm binary)

Pinned versions:
- opencode v1.4.3 (anomalyco/opencode GitHub release binary, linux-x64)
  SHA-256: 34d503ebb029853293be6fd4d441bbb2dbb03919bfa4525e88b1ca55d68f3e17
  (opencode is not on npm; installed from upstream release tarball)
- @anthropic-ai/claude-code@2.1.152 (npm)

pi/hermes/openclaw CLIs are deferred to follow-up commits.
dist/skvm must be a Linux x86_64 binary built on the host before docker build;
cross-compile with `bun run build:all` (CI) or a linux/amd64 bun target.
Docker daemon not available locally; build smoke test deferred to reck (Task 21).
lec77 added 4 commits June 1, 2026 06:11
--skill / --logs / --failures / --tasks / --test-tasks take comma-
separated path lists, but PATH_FLAGS modelled every flag as a single
path. Out-of-root list values were resolved as one nonexistent path (or
left unrewritten for --logs/--failures, which were absent entirely), so
log-source JIT and multi-skill runs broke silently inside the sandbox.

Add shape:"csv" and pathLikeOnly metadata to PathFlag; composeMounts now
splits csv values, resolves/rewrites/mounts each element independently,
and reassembles the arg. pathLikeOnly leaves bare bench task IDs in
--tasks/--test-tasks untouched while rewriting path elements.
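The split, rewrite, and reassemble step described in this commit can be sketched as follows. The `looksLikePath` heuristic and the `rewrite` callback are assumptions. The commit does not say how path-like elements are recognised or how each element is mapped to its in-container path.

```typescript
// Assumed metadata for a comma-separated path-list flag.
interface CsvFlagMeta {
  shape: "csv";
  pathLikeOnly: boolean;
}

// Assumed heuristic: anything containing a slash or starting with a dot is
// treated as a path; bare identifiers such as task IDs are not.
function looksLikePath(element: string): boolean {
  return element.includes("/") || element.startsWith(".");
}

// Split the value on commas, rewrite each element independently, and join
// the results back into a single flag value.
function rewriteCsvValue(
  value: string,
  meta: CsvFlagMeta,
  rewrite: (hostPath: string) => string,
): string {
  return value
    .split(",")
    .map((element) =>
      meta.pathLikeOnly && !looksLikePath(element) ? element : rewrite(element),
    )
    .join(",");
}
```

With `pathLikeOnly` set, a value such as `task-1,./tasks/a.json` keeps `task-1` as-is and rewrites only the second element.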
sandbox.docker.extraMounts bypassed the denylist that --mount-extra
enforces, so a config could mount /var/run/docker.sock or / and defeat
the sandbox. Route both escape hatches through a shared
assertExtraMountsAllowed before composing mounts.

Also mkdir -p the cache root before docker bind-mounts it: a missing
bind source is created by the daemon as root, leaving the container (run
as the host uid) unable to write /skvm-cache and the host with a
root-owned ~/.skvm.
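A shared denylist check along these lines would cover both escape hatches. This is a sketch under assumptions: the denylist contents beyond the two paths named above, the normalisation rule, and the function signature are illustrative, and inputs are assumed to be absolute paths that have already been resolved (no `..` segments or symlinks).

```typescript
// The two sources named in the commit message; a real denylist may be longer.
const DENIED_MOUNT_SOURCES = ["/", "/var/run/docker.sock"];

// Collapse repeated slashes and drop a trailing slash (except for root).
function normalizeAbsolute(p: string): string {
  const collapsed = p.replace(/\/+/g, "/");
  return collapsed.length > 1 && collapsed.endsWith("/")
    ? collapsed.slice(0, -1)
    : collapsed;
}

// Throw if any requested host source is on the denylist. Intended to be
// called with the union of CLI and config extra mounts before composing.
function assertExtraMountsAllowed(hostSources: string[]): void {
  for (const source of hostSources) {
    const normalized = normalizeAbsolute(source);
    if (DENIED_MOUNT_SOURCES.includes(normalized)) {
      throw new Error(`extra mount not allowed: ${source}`);
    }
  }
}
```

Routing both `--mount-extra` and `sandbox.docker.extraMounts` through one function means a new denylist entry cannot be enforced for one source and forgotten for the other.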
--no-auto-probe is stripped from argv on the host and re-expressed as
SKVM_AUTO_PROBE=0, but composeEnv's allowlist never forwarded it, so the
container re-enabled auto-probe despite the opt-out. Forward the var when
the host has it set.
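The bug and its fix are easy to see in a small sketch of an allowlist-based env composer. The function names and the proxy variable names in the allowlist are assumptions. Only `--no-auto-probe` and `SKVM_AUTO_PROBE` come from the commit message.

```typescript
type HostEnv = Record<string, string | undefined>;

// Assumed allowlist. Before the fix, SKVM_AUTO_PROBE was missing from the
// equivalent list, so the opt-out never reached the container.
const FORWARDED_ENV = ["HTTP_PROXY", "HTTPS_PROXY", "NO_PROXY", "SKVM_AUTO_PROBE"];

// Host side: remove the flag from argv and express it as an env var instead.
function stripNoAutoProbe(
  argv: string[],
  env: HostEnv,
): { argv: string[]; env: HostEnv } {
  if (!argv.includes("--no-auto-probe")) return { argv, env };
  return {
    argv: argv.filter((arg) => arg !== "--no-auto-probe"),
    env: { ...env, SKVM_AUTO_PROBE: "0" },
  };
}

// Forward only allowlisted variables that are actually set on the host.
function composeEnv(hostEnv: HostEnv): Record<string, string> {
  const out: Record<string, string> = {};
  for (const name of FORWARDED_ENV) {
    const value = hostEnv[name];
    if (value !== undefined) out[name] = value;
  }
  return out;
}
```

An allowlist is the safer default for a sandbox, at the cost that every host-side opt-out expressed through env has to be added to it explicitly.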
Two mount-composition fixes from review:

- Longest-prefix root matching. rewriteUnderFixedRoots matched cwd before
  skvmCache, so --skvm-cache=./.skvm (cache nested under the workspace)
  rewrote to /workspace/.skvm. Since the in-container --skvm-cache flag
  outranks SKVM_CACHE=/skvm-cache, the container then read the raw config
  via the workspace mount and bypassed the sanitized /skvm-cache overlay,
  exposing literal API keys. Match the most specific root instead.

- Pre-create managed rw mount sources. composeMounts only existence-
  checked required flags, so an optional output like --out=/tmp/new still
  produced a dynamic bind mount; Docker creates a missing source as root
  and the host-uid container cannot write it. composeMounts now reports
  ensureDirs (cache root + dynamic rw outputs, excluding user extra
  mounts) and the launcher mkdir -p's them. Subsumes the earlier explicit
  cache-root pre-create.
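The first fix can be illustrated with a minimal longest-prefix matcher. The `FixedRoot` shape and the function body are a sketch, not the PR's `rewriteUnderFixedRoots`. Host roots are assumed to be absolute, already normalised, and without a trailing slash.

```typescript
// A host directory and the container path it is mounted at.
interface FixedRoot {
  host: string;
  container: string;
}

// True when p is the root itself or lies beneath it. The "/" boundary check
// stops /home/u/proj from matching /home/u/project.
function isUnder(p: string, root: string): boolean {
  return p === root || p.startsWith(root + "/");
}

// Rewrite a host path under the most specific matching root, or return
// undefined when no fixed root contains it.
function rewriteUnderLongestRoot(
  hostPath: string,
  roots: FixedRoot[],
): string | undefined {
  let best: FixedRoot | undefined;
  for (const root of roots) {
    if (!isUnder(hostPath, root.host)) continue;
    if (best === undefined || root.host.length > best.host.length) best = root;
  }
  if (best === undefined) return undefined;
  return best.container + hostPath.slice(best.host.length);
}
```

With roots `/home/u/proj` mapped to `/workspace` and `/home/u/proj/.skvm` mapped to `/skvm-cache`, a path inside `.skvm` resolves under `/skvm-cache` regardless of the order in which the roots are listed.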
lec77 commented Jun 2, 2026

Collaborator Author

Pushed follow-up commits addressing the review feedback:

  • Path-list flags — --skill, --logs, --failures, --tasks, --test-tasks accept comma-separated lists. The launcher now splits these and resolves / mounts / rewrites each element independently instead of treating the whole value as one path. --tasks / --test-tasks leave bare task IDs untouched and rewrite only path-like elements.
  • Config extra mounts — sandbox.docker.extraMounts now go through the same denylist as --mount-extra (Docker socket and host root rejected) before any mount is composed.
  • --no-auto-probe — now propagates into the container; the opt-out is no longer dropped at the host boundary.
  • rw mount ownership — the cache root and dynamic out-of-root output dirs are created host-side (owned by the invoking user) before docker run, so the daemon no longer creates them as root and locks the host-uid container out.
  • Cache-root prefix precedence — root matching is now longest-prefix, so a cache nested under the workspace (e.g. --skvm-cache=./.skvm) resolves to /skvm-cache and reads through the sanitized config overlay rather than the raw config under /workspace.

Each fix ships with regression tests; tsc is clean and the suite passes.

On the intermittent runTasksForRound adapterPool concurrency bound test: it's a pre-existing timing-sensitive concurrency test and is independent of this change — the diff doesn't touch the jit-optimize / adapter-pool path, and it passes in isolation but flakes under full-suite parallelism. I'd suggest tracking it in a separate issue to make it deterministic rather than blocking this PR.

Development

Successfully merging this pull request may close these issues.

[feature] Optional Docker sandbox for skvm runs (--sandbox)