feat(execd): add isolation package with bwrap support#1008
Conversation
45245e8 to
03cbe0b
Compare
…e 1) - Add pkg/isolation/ package: Isolator interface, bwrap argv builder, startup probe, upper directory management, seccomp loading - Switch bwrap distribution from //go:embed to Dockerfile static build (musl-gcc) and init container injection alongside execd - Add isolation flags (upper root, max bytes, diff max bytes, allowed writable) with env var overrides - Add smoke test: Docker build, extract binaries, verify static link, bwrap namespace test, execd probe - Add smoke_bwrap.sh to CI workflow (ubuntu-latest only) - Defer diff/commit to Phase 2 (stub returning 503) Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
040edea to
19a3775
Compare
6163613 to
22d8a0f
Compare
- Add isolated session model types and /v1/isolated/* router (17 endpoints) - Implement session lifecycle: Create/Get/Run/Delete with bwrap+bash - SSE streaming via basicController writeSingleEvent (context-aware) - MergedView overlay filesystem with whiteout support (20 unit tests) - Filesystem proxy endpoints (10 handlers via MergedView) - Idle GC: background goroutine scavenges sessions past idle_timeout - Bwrap integration tests (43 tests, linux+bwrap build tag) - Isolated session unit tests (15 tests, stub isolator) - CI job bwrap-smoke: meson build bwrap v0.11.2 + sudo go test - Windows stubs for cross-platform compilation - Fix: cmd.Wait() zombie cleanup, context cancellation propagation, setpriv skip when uid=0, correct v0.11.x overlay syntax Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
26158ac to
29a6549
Compare
…ts (OSEP-0013) Seccomp BPF: - pkg/isolation/seccomp_gen.go — BPF generator (elastic/go-seccomp-bpf, pure Go) - Default-allow denylist: 40+ dangerous syscalls blocked (mount, ptrace, etc.) - BPF passed to bwrap via memfd + ExtraFiles fd Telemetry: - execd.isolation.session.count (gauge) - execd.isolation.run.duration (histogram, ms) - execd.isolation.upper.usage_bytes (gauge) - IsolationStatsProvider pattern for gauge callbacks OpenAPI Spec: - specs/execd-api.yaml: 17 endpoints, 10 schemas, ServiceUnavailable response Integration tests (64 total in bwrap_test/): - 5 seccomp tests (filter active, normal syscalls, ptrace block, mount block, persist) - 3 ExtraWritable tests (write, read-write roundtrip, multiple sessions) - 11 gap-coverage tests (stderr, recovery, cancellation, network iso, 100x stress, delete-recreate, bash builtins, large file, subprocess cleanup, buffer size, workspace isolation) Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
284f75f to
af39baa
Compare
17 endpoints, 10 new schemas, ServiceUnavailable response. FS proxy endpoints reuse same schemas as /files/* and /directories/*. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
584f722 to
bf1bfdd
Compare
Pangjiping
left a comment
There was a problem hiding this comment.
Review Summary
3 critical bugs, 3 high-severity issues, 4 medium items. See inline comments for details.
Critical (must fix before merge):
- stdin close goroutine kills session after every HTTP request
- Upper directory ID mismatch → directories never cleaned up
- Concurrent Run has no mutual exclusion on shared stdin/stdout
High:
4. InitCodeRunner() called twice in main.go — duplicate controller
5. ReadDir doesn't prefer upper over lower entry metadata
6. Dockerfile puts bwrap at /bwrap but findBwrap() doesn't look there
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 721bcd85ac
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| wrapOpts.EnvPassthrough.Mode = isolation.EnvMode(s.opts.EnvPassthroughMode) | ||
| wrapOpts.EnvPassthrough.Keys = s.opts.EnvPassthroughKeys |
There was a problem hiding this comment.
Apply strict env defaults before wrapping
When callers create the default/strict profile without an explicit env_passthrough, s.opts.EnvPassthroughMode is empty here, so bwrapEnvSegment adds no --unsetenv/--clearenv flags and the child inherits execd's full environment. That contradicts the strict profile default in oseps/0013-isolated-execution-api.md (deny + blacklist) and exposes parent secrets such as *_TOKEN to isolated code unless every client opts in manually.
Useful? React with 👍 / 👎.
| ctx, cancel := context.WithCancel(c.ctx.Request.Context()) | ||
| defer cancel() |
There was a problem hiding this comment.
For requests that set timeout_seconds, the handler still creates only a cancelable copy of the HTTP request context and never applies req.TimeoutSeconds, so a command like sleep 3600 keeps the isolated bash run and SSE response alive until the client disconnects. The new request model and OpenAPI schema expose this field, so the controller should wrap the context with a timeout when the value is positive before calling RunInIsolatedSession.
Useful? React with 👍 / 👎.
| func (c *IsolatedSessionController) getMergedView() (vfs.FS, error) { | ||
| sessionID := c.ctx.Param("sessionId") | ||
| mv, err := isolatedRunner.GetMergedView(sessionID) |
There was a problem hiding this comment.
Guard file endpoints when isolation is unavailable
When the bwrap probe fails at startup (or runner initialization is skipped), isolatedRunner remains nil; the create/run/get/delete handlers return 503 via probed(), but the file handlers call getMergedView() directly and this dereference panics instead of returning the documented service-unavailable response. This affects any /v1/isolated/session/:id/files* or /directories request in environments without working isolation.
Useful? React with 👍 / 👎.
| oldPath := c.ctx.Query("old_path") | ||
| newPath := c.ctx.Query("new_path") | ||
| if oldPath == "" || newPath == "" { |
There was a problem hiding this comment.
Parse file moves from the documented body
The new OpenAPI path /v1/isolated/session/{sessionId}/files/mv defines the same JSON body as /files/mv (an array of RenameFileItem with src/dest), but this handler ignores the body and requires old_path/new_path query parameters. Clients generated from specs/execd-api.yaml will send the documented JSON and consistently receive a 400 missing-query error, so the implementation and contract are not usable together.
Useful? React with 👍 / 👎.
| for _, allowed := range r.allowedWritable { | ||
| if strings.HasPrefix(p, allowed) { | ||
| found = true |
There was a problem hiding this comment.
Reject sibling paths in extra_writable allowlists
When an operator allowlists a directory such as /workspace/cache, this prefix check also accepts caller-supplied siblings like /workspace/cache-escape or /workspace/cache2, which are then passed to bwrap as writable binds. That lets a request escape the configured extra_writable boundary; the check should compare cleaned path components rather than raw string prefixes.
Useful? React with 👍 / 👎.
| if m.UpperDir != "" { | ||
| if data, err := os.ReadFile(m.resolveUpper(rel)); err == nil { | ||
| return data, nil | ||
| } | ||
| } | ||
| return os.ReadFile(m.resolveLower(rel)) |
There was a problem hiding this comment.
Honor overlay whiteouts when reading files
When code inside an overlay session deletes a lower-layer file, the upper layer records a .wh.<name> whiteout; ReadDir handles those whiteouts, but ReadFile falls back to the lower file whenever the upper path itself is absent. After rm file inside the session, the isolated download/read path can still return the deleted lower-layer content, so the API no longer reflects the session filesystem.
Useful? React with 👍 / 👎.
| filePath := c.ctx.Query("path") | ||
| if filePath == "" { | ||
| c.RespondError(http.StatusBadRequest, model.ErrorCodeMissingQuery, "path is required") |
There was a problem hiding this comment.
Read upload paths from multipart metadata
The new isolated upload endpoint is documented like /files/upload, with the destination path carried in each multipart metadata part, but this handler ignores metadata and requires a non-documented path query parameter. Generated clients following specs/execd-api.yaml will send metadata + file and consistently get path is required, making isolated uploads incompatible with the public contract.
Useful? React with 👍 / 👎.
| var exitCode int | ||
| scanDone := make(chan struct{}) | ||
| go func() { | ||
| defer close(scanDone) | ||
| for scanner.Scan() { | ||
| line := scanner.Text() |
There was a problem hiding this comment.
Require the end marker before reporting success
If user code exits the persistent shell before the appended marker runs, for example by sending exit 7 or exec some-process, scanner.Scan() reaches EOF, exitCode remains its zero value, and the handler reports a successful run even though the session process is gone. The scanner should treat EOF before seeing the marker as an execution error instead of returning code 0.
Useful? React with 👍 / 👎.
| func (m *UpperManager) Allocate() (sessionID, upperDir, workDir string, err error) { | ||
| id := newSessionID() | ||
|
|
||
| upperDir = filepath.Join(m.root, id, "upper") | ||
| workDir = filepath.Join(m.root, id, "work") | ||
|
|
||
| if err := os.MkdirAll(upperDir, 0o755); err != nil { |
There was a problem hiding this comment.
Enforce the configured upper size limit
UpperManager stores maxBytes, and the shipped config describes upper_max_bytes as a hard limit, but allocation and write paths never consult it; overlay sessions can therefore keep creating/writing upper directories until the pod/node filesystem fills. At minimum, allocation or writes need to reject once current usage plus the new session/write would exceed the configured limit.
Useful? React with 👍 / 👎.
| // ExtraFiles are assigned fds starting at 3 in the child process. | ||
| seccompFd = strconv.Itoa(3 + len(cmd.ExtraFiles)) | ||
| cmd.ExtraFiles = append(cmd.ExtraFiles, os.NewFile(uintptr(fd), "seccomp")) |
There was a problem hiding this comment.
Close seccomp memfds after starting sessions
Every wrapped command with seccomp enabled appends a new os.File for the memfd to cmd.ExtraFiles, but neither Wrap nor session cleanup closes that parent-side file. Long-running execd instances that create many isolated sessions will accumulate one open fd per session and can eventually hit the process fd limit even after the bwrap child has already consumed the filter.
Useful? React with 👍 / 👎.
…terface (OSEP-0013) Review fixes: - Fix session lifecycle: done channel prevents stdin close on normal return - Fix overlay mode: pass upper/work dirs to bwrap WrapOptions - Fix end marker mid-line concatenation with strings.Index - Extract scanUntilMarker() to reduce cognitive complexity - Add runMu serialization, upperID tracking, validateExtraWritable() - Delete dead seccomp.go (replaced by seccomp_gen.go + memfd) - Fix MergedView.ReadDir upper-takes-precedence via entryMap - Fix UpperManager.Collect delete-on-success only TOML config (replaces 4 --isolation-* flags): - New isolation.Config + LoadConfig() with go-toml/v2 - Seccomp override: [seccomp].deny fully replaces built-in denylist - configs/isolation.example.toml with documented defaults - Single --isolation-config flag + EXECD_ISOLATION_CONFIG env API simplification: - Flatten CreateIsolatedSessionRequest (remove isolation wrapper) - Delete dead PersistSpec, ArtifactURLs, IsolationSpec types - Wire up envs param with shell-escaped export prepend - Remove unused cwd from IsolatedRunRequest - Update OpenAPI spec to match VFS interface: - New pkg/vfs.FS interface for filesystem abstraction - MergedView satisfies vfs.FS (compile-time check) - Isolated file handlers use vfs.FS instead of concrete MergedView - TODO for FilesystemController migration Tests: - 6 end-to-end workflow tests (Run↔File API interop) - 2 seccomp config override e2e tests - 8 config loading tests - Document overlayfs API→Run limitation in MergedView Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
8261783 to
5ff9482
Compare
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 5ff9482fe5
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| cmd := exec.Command(p, | ||
| "--ro-bind", "/", "/", | ||
| "--proc", "/proc", | ||
| "--", "true", |
There was a problem hiding this comment.
Exercise the same namespaces during probing
In environments where bwrap can perform the simple bind/proc setup but cannot create the PID/UTS/IPC namespaces, this smoke test still marks isolation as available even though every real session later adds --unshare-pid, --unshare-uts, and --unshare-ipc in buildArgv. That leaves /v1/isolated/capabilities reporting availability and initializes the runner, but session creation then fails per request instead of being disabled up front; the probe should include the same namespace flags used for actual sessions.
Useful? React with 👍 / 👎.
| if err := os.MkdirAll(filepath.Dir(newUpper), 0o755); err != nil { | ||
| return err | ||
| } | ||
| return os.Rename(oldUpper, newUpper) |
There was a problem hiding this comment.
White out lower files when moving them
When oldPath exists only in the lower workspace, this path copies it into upper and then renames that upper copy to the destination, but it never creates a whiteout for the original lower path. In overlay sessions, mv lower.txt new.txt through the isolated file API therefore leaves lower.txt visible via Stat/ReadFile because lookups fall back to the lower layer after oldUpper is moved, so the move behaves like a copy.
Useful? React with 👍 / 👎.
| // File exists only in lower — cannot delete without whiteout. | ||
| lowerPath := m.resolveLower(rel) | ||
| if _, err := os.Stat(lowerPath); err == nil { | ||
| return fmt.Errorf("cannot remove file from read-only workspace lower: %s", path) |
There was a problem hiding this comment.
Hide lower-only files instead of failing delete
For overlay sessions, deleting a file that exists only in the lower workspace should create a whiteout in the upper layer so the merged view no longer exposes it. This branch instead returns an error for exactly that common case, so DELETE /v1/isolated/session/:id/files?path=... fails with a 500 and the file remains visible whenever the target was part of the original workspace.
Useful? React with 👍 / 👎.
| filePath := c.ctx.Query("path") | ||
| oldStr := c.ctx.Query("old") | ||
| newStr := c.ctx.Query("new") | ||
| if filePath == "" || oldStr == "" { | ||
| c.RespondError(http.StatusBadRequest, model.ErrorCodeMissingQuery, "path and old are required") | ||
| return |
There was a problem hiding this comment.
Parse replace requests from the JSON body
The OpenAPI contract for isolatedReplaceContent mirrors /files/replace and sends a JSON map of paths to ReplaceFileContentItem, but this handler ignores the body and requires undocumented path, old, and new query parameters. Generated clients that follow specs/execd-api.yaml will post the documented body and consistently receive path and old are required, making the isolated replace endpoint incompatible with the published API.
Useful? React with 👍 / 👎.
| var req model.CreateIsolatedSessionRequest | ||
| if err := c.bindJSON(&req); err != nil { | ||
| c.RespondError(http.StatusBadRequest, model.ErrorCodeInvalidRequest, err.Error()) | ||
| return |
There was a problem hiding this comment.
Validate isolated request models before using them
bindJSON only decodes JSON, so the validate tags and Validate methods on the new isolated request models never run. A create request such as {} reaches the runtime with an empty workspace and returns a 500, and a run request without code executes an empty command successfully despite the OpenAPI required fields; these should be rejected as 400s immediately after decoding.
Useful? React with 👍 / 👎.
| paths := c.ctx.QueryArray("path") | ||
| for _, p := range paths { | ||
| if err := mv.MkdirAll(p, 0o755); err != nil { | ||
| c.RespondError(http.StatusInternalServerError, model.ErrorCodeRuntimeError, err.Error()) | ||
| return | ||
| } | ||
| } | ||
| c.RespondSuccess(nil) |
There was a problem hiding this comment.
Read isolated directory requests from the body
Both isolated directory operations are documented with JSON request bodies, but this implementation reads only path query parameters and ignores the body. A generated client that posts the documented map of directories to permissions hits this code with an empty paths slice, receives 200, and no directories are created; the same query/body mismatch affects the delete handler below.
Useful? React with 👍 / 👎.
| paths := c.ctx.QueryArray("path") | ||
| resp := make(map[string]model.FileInfo) |
There was a problem hiding this comment.
Read file-info paths from the documented body
isolatedGetFilesInfo is published in specs/execd-api.yaml with a required JSON body containing the paths to stat, but this handler only looks at repeated path query parameters. Clients generated from the new spec will send the body, paths will be empty, and the endpoint returns an empty 200 response instead of the requested metadata.
Useful? React with 👍 / 👎.
| if s.upperID != "" { | ||
| r.upperMgr.Release(s.upperID) | ||
| } |
There was a problem hiding this comment.
Reclaim upper directories after session deletion
Deleting an overlay session only marks its upper entry as not in use, but no code path ever calls upperMgr.Collect() (the GC loop only calls CollectIdle). As a result, every deleted overlay session leaves its upper/work directories on disk until the pod is destroyed, which can steadily fill the execd filesystem in long-running pods.
Useful? React with 👍 / 👎.
| go func() { | ||
| select { | ||
| case <-ctx.Done(): | ||
| stdin.Close() | ||
| case <-done: |
There was a problem hiding this comment.
Remove canceled runs from the session map
When a client disconnects or a request context is canceled during a run, this goroutine closes the persistent bash stdin, which normally causes the session process to exit; however the session remains stored as active and its upper directory remains in use. The next run on that session will fail against a dead pipe/process while GET /v1/isolated/session/:id still reports an active session, so cancellation should stop/delete or mark the session destroyed.
Useful? React with 👍 / 👎.
| if err != nil { | ||
| telemetry.RecordIsolatedRun(ctx, "error", durationMs) | ||
| event := model.ServerStreamEvent{ | ||
| Type: model.StreamEventTypeError, | ||
| Text: err.Error(), | ||
| Timestamp: time.Now().UnixMilli(), | ||
| } | ||
| c.writeSingleEvent("IsolatedError", event.ToJSON(), true, event.Summary()) |
There was a problem hiding this comment.
Return the documented 404 for missing run sessions
If sessionId does not exist, RunInIsolatedSession returns ErrContextNotFound, but this branch converts every error into a 200 text/event-stream error frame. Callers using the new OpenAPI contract for /v1/isolated/session/{sessionId}/run cannot distinguish a missing session from an execution failure via the documented 404 response.
Useful? React with 👍 / 👎.
Summary
Testing
Breaking Changes
Checklist