Skip to content

feat(execd): add isolation package with bwrap support#1008

Open
Pangjiping wants to merge 5 commits into
opensandbox-group:mainfrom
Pangjiping:osep-0013-phase1-isolation-core
Open

feat(execd): add isolation package with bwrap support#1008
Pangjiping wants to merge 5 commits into
opensandbox-group:mainfrom
Pangjiping:osep-0013-phase1-isolation-core

Conversation

@Pangjiping

Copy link
Copy Markdown
Collaborator
  • Add pkg/isolation/ package: Isolator interface, bwrap argv builder, startup probe, upper directory management, seccomp loading
  • Switch bwrap distribution from //go:embed to Dockerfile static build (musl-gcc) and init container injection alongside execd
  • Add isolation flags (upper root, max bytes, diff max bytes, allowed writable) with env var overrides
  • Add smoke test: Docker build, extract binaries, verify static link, bwrap namespace test, execd probe
  • Add smoke_bwrap.sh to CI workflow (ubuntu-latest only)
  • Defer diff/commit to Phase 2 (stub returning 503)

Summary

  • What is changing and why?

Testing

  • Not run (explain why)
  • Unit tests
  • Integration tests
  • e2e / manual verification

Breaking Changes

  • None
  • Yes (describe impact and migration path)

Checklist

  • Linked Issue or clearly described motivation
  • Added/updated docs (if needed)
  • Added/updated tests (if needed)
  • Security impact considered
  • Backward compatibility considered

@Pangjiping Pangjiping force-pushed the osep-0013-phase1-isolation-core branch 14 times, most recently from 45245e8 to 03cbe0b Compare June 9, 2026 14:44
…e 1)

- Add pkg/isolation/ package: Isolator interface, bwrap argv builder,
  startup probe, upper directory management, seccomp loading
- Switch bwrap distribution from //go:embed to Dockerfile static build
  (musl-gcc) and init container injection alongside execd
- Add isolation flags (upper root, max bytes, diff max bytes, allowed
  writable) with env var overrides
- Add smoke test: Docker build, extract binaries, verify static link,
  bwrap namespace test, execd probe
- Add smoke_bwrap.sh to CI workflow (ubuntu-latest only)
- Defer diff/commit to Phase 2 (stub returning 503)

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@Pangjiping Pangjiping force-pushed the osep-0013-phase1-isolation-core branch 12 times, most recently from 040edea to 19a3775 Compare June 10, 2026 06:09
@Pangjiping Pangjiping force-pushed the osep-0013-phase1-isolation-core branch 3 times, most recently from 6163613 to 22d8a0f Compare June 10, 2026 09:59
- Add isolated session model types and /v1/isolated/* router (17 endpoints)
- Implement session lifecycle: Create/Get/Run/Delete with bwrap+bash
- SSE streaming via basicController writeSingleEvent (context-aware)
- MergedView overlay filesystem with whiteout support (20 unit tests)
- Filesystem proxy endpoints (10 handlers via MergedView)
- Idle GC: background goroutine scavenges sessions past idle_timeout
- Bwrap integration tests (43 tests, linux+bwrap build tag)
- Isolated session unit tests (15 tests, stub isolator)
- CI job bwrap-smoke: meson build bwrap v0.11.2 + sudo go test
- Windows stubs for cross-platform compilation
- Fix: cmd.Wait() zombie cleanup, context cancellation propagation,
  setpriv skip when uid=0, correct v0.11.x overlay syntax

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@Pangjiping Pangjiping force-pushed the osep-0013-phase1-isolation-core branch from 26158ac to 29a6549 Compare June 10, 2026 11:47
…ts (OSEP-0013)

Seccomp BPF:
- pkg/isolation/seccomp_gen.go — BPF generator (elastic/go-seccomp-bpf, pure Go)
- Default-allow denylist: 40+ dangerous syscalls blocked (mount, ptrace, etc.)
- BPF passed to bwrap via memfd + ExtraFiles fd

Telemetry:
- execd.isolation.session.count (gauge)
- execd.isolation.run.duration (histogram, ms)
- execd.isolation.upper.usage_bytes (gauge)
- IsolationStatsProvider pattern for gauge callbacks

OpenAPI Spec:
- specs/execd-api.yaml: 17 endpoints, 10 schemas, ServiceUnavailable response

Integration tests (64 total in bwrap_test/):
- 5 seccomp tests (filter active, normal syscalls, ptrace block, mount block, persist)
- 3 ExtraWritable tests (write, read-write roundtrip, multiple sessions)
- 11 gap-coverage tests (stderr, recovery, cancellation, network iso, 100x stress, delete-recreate, bash builtins, large file, subprocess cleanup, buffer size, workspace isolation)

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@Pangjiping Pangjiping force-pushed the osep-0013-phase1-isolation-core branch from 284f75f to af39baa Compare June 10, 2026 15:39
17 endpoints, 10 new schemas, ServiceUnavailable response.
FS proxy endpoints reuse same schemas as /files/* and /directories/*.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@Pangjiping Pangjiping force-pushed the osep-0013-phase1-isolation-core branch from 584f722 to bf1bfdd Compare June 10, 2026 15:57

@Pangjiping Pangjiping left a comment

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review Summary

3 critical bugs, 3 high-severity issues, 4 medium items. See inline comments for details.

Critical (must fix before merge):

  1. stdin close goroutine kills session after every HTTP request
  2. Upper directory ID mismatch → directories never cleaned up
  3. Concurrent Run has no mutual exclusion on shared stdin/stdout

High:
4. InitCodeRunner() called twice in main.go — duplicate controller
5. ReadDir doesn't prefer upper over lower entry metadata
6. Dockerfile puts bwrap at /bwrap but findBwrap() doesn't look there

Comment thread components/execd/pkg/runtime/isolated_session_ctrl.go
Comment thread components/execd/pkg/runtime/isolated_session_ctrl.go Outdated
Comment thread components/execd/pkg/runtime/isolated_session_ctrl.go
Comment thread components/execd/main.go
Comment thread components/execd/pkg/isolation/merged_view.go Outdated
Comment thread components/execd/Dockerfile Outdated
Comment thread components/execd/pkg/runtime/isolated_session_ctrl.go
Comment thread components/execd/pkg/isolation/upper.go Outdated
Comment thread components/execd/pkg/isolation/seccomp.go Outdated
Comment thread components/execd/pkg/runtime/isolated_session.go
@Pangjiping Pangjiping marked this pull request as ready for review June 12, 2026 15:12

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 721bcd85ac

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment on lines +100 to +101
wrapOpts.EnvPassthrough.Mode = isolation.EnvMode(s.opts.EnvPassthroughMode)
wrapOpts.EnvPassthrough.Keys = s.opts.EnvPassthroughKeys

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Apply strict env defaults before wrapping

When callers create the default/strict profile without an explicit env_passthrough, s.opts.EnvPassthroughMode is empty here, so bwrapEnvSegment adds no --unsetenv/--clearenv flags and the child inherits execd's full environment. That contradicts the strict profile default in oseps/0013-isolated-execution-api.md (deny + blacklist) and exposes parent secrets such as *_TOKEN to isolated code unless every client opts in manually.

Useful? React with 👍 / 👎.

Comment on lines +133 to +134
ctx, cancel := context.WithCancel(c.ctx.Request.Context())
defer cancel()

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Honor per-run timeout_seconds

For requests that set timeout_seconds, the handler still creates only a cancelable copy of the HTTP request context and never applies req.TimeoutSeconds, so a command like sleep 3600 keeps the isolated bash run and SSE response alive until the client disconnects. The new request model and OpenAPI schema expose this field, so the controller should wrap the context with a timeout when the value is positive before calling RunInIsolatedSession.

Useful? React with 👍 / 👎.

Comment on lines +31 to +33
func (c *IsolatedSessionController) getMergedView() (vfs.FS, error) {
sessionID := c.ctx.Param("sessionId")
mv, err := isolatedRunner.GetMergedView(sessionID)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Guard file endpoints when isolation is unavailable

When the bwrap probe fails at startup (or runner initialization is skipped), isolatedRunner remains nil; the create/run/get/delete handlers return 503 via probed(), but the file handlers call getMergedView() directly and this dereference panics instead of returning the documented service-unavailable response. This affects any /v1/isolated/session/:id/files* or /directories request in environments without working isolation.

Useful? React with 👍 / 👎.

Comment on lines +176 to +178
oldPath := c.ctx.Query("old_path")
newPath := c.ctx.Query("new_path")
if oldPath == "" || newPath == "" {

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Parse file moves from the documented body

The new OpenAPI path /v1/isolated/session/{sessionId}/files/mv defines the same JSON body as /files/mv (an array of RenameFileItem with src/dest), but this handler ignores the body and requires old_path/new_path query parameters. Clients generated from specs/execd-api.yaml will send the documented JSON and consistently receive a 400 missing-query error, so the implementation and contract are not usable together.

Useful? React with 👍 / 👎.

Comment on lines +403 to +405
for _, allowed := range r.allowedWritable {
if strings.HasPrefix(p, allowed) {
found = true

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Reject sibling paths in extra_writable allowlists

When an operator allowlists a directory such as /workspace/cache, this prefix check also accepts caller-supplied siblings like /workspace/cache-escape or /workspace/cache2, which are then passed to bwrap as writable binds. That lets a request escape the configured extra_writable boundary; the check should compare cleaned path components rather than raw string prefixes.

Useful? React with 👍 / 👎.

Comment on lines +154 to +159
if m.UpperDir != "" {
if data, err := os.ReadFile(m.resolveUpper(rel)); err == nil {
return data, nil
}
}
return os.ReadFile(m.resolveLower(rel))

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Honor overlay whiteouts when reading files

When code inside an overlay session deletes a lower-layer file, the upper layer records a .wh.<name> whiteout; ReadDir handles those whiteouts, but ReadFile falls back to the lower file whenever the upper path itself is absent. After rm file inside the session, the isolated download/read path can still return the deleted lower-layer content, so the API no longer reflects the session filesystem.

Useful? React with 👍 / 👎.

Comment on lines +128 to +130
filePath := c.ctx.Query("path")
if filePath == "" {
c.RespondError(http.StatusBadRequest, model.ErrorCodeMissingQuery, "path is required")

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Read upload paths from multipart metadata

The new isolated upload endpoint is documented like /files/upload, with the destination path carried in each multipart metadata part, but this handler ignores metadata and requires a non-documented path query parameter. Generated clients following specs/execd-api.yaml will send metadata + file and consistently get path is required, making isolated uploads incompatible with the public contract.

Useful? React with 👍 / 👎.

Comment on lines +273 to +278
var exitCode int
scanDone := make(chan struct{})
go func() {
defer close(scanDone)
for scanner.Scan() {
line := scanner.Text()

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Require the end marker before reporting success

If user code exits the persistent shell before the appended marker runs, for example by sending exit 7 or exec some-process, scanner.Scan() reaches EOF, exitCode remains its zero value, and the handler reports a successful run even though the session process is gone. The scanner should treat EOF before seeing the marker as an execution error instead of returning code 0.

Useful? React with 👍 / 👎.

Comment on lines +59 to +65
func (m *UpperManager) Allocate() (sessionID, upperDir, workDir string, err error) {
id := newSessionID()

upperDir = filepath.Join(m.root, id, "upper")
workDir = filepath.Join(m.root, id, "work")

if err := os.MkdirAll(upperDir, 0o755); err != nil {

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Enforce the configured upper size limit

UpperManager stores maxBytes, and the shipped config describes upper_max_bytes as a hard limit, but allocation and write paths never consult it; overlay sessions can therefore keep creating/writing upper directories until the pod/node filesystem fills. At minimum, allocation or writes need to reject once current usage plus the new session/write would exceed the configured limit.

Useful? React with 👍 / 👎.

Comment on lines +128 to +130
// ExtraFiles are assigned fds starting at 3 in the child process.
seccompFd = strconv.Itoa(3 + len(cmd.ExtraFiles))
cmd.ExtraFiles = append(cmd.ExtraFiles, os.NewFile(uintptr(fd), "seccomp"))

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Close seccomp memfds after starting sessions

Every wrapped command with seccomp enabled appends a new os.File for the memfd to cmd.ExtraFiles, but neither Wrap nor session cleanup closes that parent-side file. Long-running execd instances that create many isolated sessions will accumulate one open fd per session and can eventually hit the process fd limit even after the bwrap child has already consumed the filter.

Useful? React with 👍 / 👎.

…terface (OSEP-0013)

Review fixes:
- Fix session lifecycle: done channel prevents stdin close on normal return
- Fix overlay mode: pass upper/work dirs to bwrap WrapOptions
- Fix end marker mid-line concatenation with strings.Index
- Extract scanUntilMarker() to reduce cognitive complexity
- Add runMu serialization, upperID tracking, validateExtraWritable()
- Delete dead seccomp.go (replaced by seccomp_gen.go + memfd)
- Fix MergedView.ReadDir upper-takes-precedence via entryMap
- Fix UpperManager.Collect delete-on-success only

TOML config (replaces 4 --isolation-* flags):
- New isolation.Config + LoadConfig() with go-toml/v2
- Seccomp override: [seccomp].deny fully replaces built-in denylist
- configs/isolation.example.toml with documented defaults
- Single --isolation-config flag + EXECD_ISOLATION_CONFIG env

API simplification:
- Flatten CreateIsolatedSessionRequest (remove isolation wrapper)
- Delete dead PersistSpec, ArtifactURLs, IsolationSpec types
- Wire up envs param with shell-escaped export prepend
- Remove unused cwd from IsolatedRunRequest
- Update OpenAPI spec to match

VFS interface:
- New pkg/vfs.FS interface for filesystem abstraction
- MergedView satisfies vfs.FS (compile-time check)
- Isolated file handlers use vfs.FS instead of concrete MergedView
- TODO for FilesystemController migration

Tests:
- 6 end-to-end workflow tests (Run↔File API interop)
- 2 seccomp config override e2e tests
- 8 config loading tests
- Document overlayfs API→Run limitation in MergedView

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@Pangjiping Pangjiping force-pushed the osep-0013-phase1-isolation-core branch from 8261783 to 5ff9482 Compare June 12, 2026 15:32

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 5ff9482fe5

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment on lines +113 to +116
cmd := exec.Command(p,
"--ro-bind", "/", "/",
"--proc", "/proc",
"--", "true",

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Exercise the same namespaces during probing

In environments where bwrap can perform the simple bind/proc setup but cannot create the PID/UTS/IPC namespaces, this smoke test still marks isolation as available even though every real session later adds --unshare-pid, --unshare-uts, and --unshare-ipc in buildArgv. That leaves /v1/isolated/capabilities reporting availability and initializes the runner, but session creation then fails per request instead of being disabled up front; the probe should include the same namespace flags used for actual sessions.

Useful? React with 👍 / 👎.

if err := os.MkdirAll(filepath.Dir(newUpper), 0o755); err != nil {
return err
}
return os.Rename(oldUpper, newUpper)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge White out lower files when moving them

When oldPath exists only in the lower workspace, this path copies it into upper and then renames that upper copy to the destination, but it never creates a whiteout for the original lower path. In overlay sessions, mv lower.txt new.txt through the isolated file API therefore leaves lower.txt visible via Stat/ReadFile because lookups fall back to the lower layer after oldUpper is moved, so the move behaves like a copy.

Useful? React with 👍 / 👎.

Comment on lines +239 to +242
// File exists only in lower — cannot delete without whiteout.
lowerPath := m.resolveLower(rel)
if _, err := os.Stat(lowerPath); err == nil {
return fmt.Errorf("cannot remove file from read-only workspace lower: %s", path)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Hide lower-only files instead of failing delete

For overlay sessions, deleting a file that exists only in the lower workspace should create a whiteout in the upper layer so the merged view no longer exposes it. This branch instead returns an error for exactly that common case, so DELETE /v1/isolated/session/:id/files?path=... fails with a 500 and the file remains visible whenever the target was part of the original workspace.

Useful? React with 👍 / 👎.

Comment on lines +218 to +223
filePath := c.ctx.Query("path")
oldStr := c.ctx.Query("old")
newStr := c.ctx.Query("new")
if filePath == "" || oldStr == "" {
c.RespondError(http.StatusBadRequest, model.ErrorCodeMissingQuery, "path and old are required")
return

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Parse replace requests from the JSON body

The OpenAPI contract for isolatedReplaceContent mirrors /files/replace and sends a JSON map of paths to ReplaceFileContentItem, but this handler ignores the body and requires undocumented path, old, and new query parameters. Generated clients that follow specs/execd-api.yaml will post the documented body and consistently receive path and old are required, making the isolated replace endpoint incompatible with the published API.

Useful? React with 👍 / 👎.

Comment on lines +61 to +64
var req model.CreateIsolatedSessionRequest
if err := c.bindJSON(&req); err != nil {
c.RespondError(http.StatusBadRequest, model.ErrorCodeInvalidRequest, err.Error())
return

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Validate isolated request models before using them

bindJSON only decodes JSON, so the validate tags and Validate methods on the new isolated request models never run. A create request such as {} reaches the runtime with an empty workspace and returns a 500, and a run request without code executes an empty command successfully despite the OpenAPI required fields; these should be rejected as 400s immediately after decoding.

Useful? React with 👍 / 👎.

Comment on lines +243 to +250
paths := c.ctx.QueryArray("path")
for _, p := range paths {
if err := mv.MkdirAll(p, 0o755); err != nil {
c.RespondError(http.StatusInternalServerError, model.ErrorCodeRuntimeError, err.Error())
return
}
}
c.RespondSuccess(nil)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Read isolated directory requests from the body

Both isolated directory operations are documented with JSON request bodies, but this implementation reads only path query parameters and ignores the body. A generated client that posts the documented map of directories to permissions hits this code with an empty paths slice, receives 200, and no directories are created; the same query/body mismatch affects the delete handler below.

Useful? React with 👍 / 👎.

Comment on lines +55 to +56
paths := c.ctx.QueryArray("path")
resp := make(map[string]model.FileInfo)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Read file-info paths from the documented body

isolatedGetFilesInfo is published in specs/execd-api.yaml with a required JSON body containing the paths to stat, but this handler only looks at repeated path query parameters. Clients generated from the new spec will send the body, paths will be empty, and the endpoint returns an empty 200 response instead of the requested metadata.

Useful? React with 👍 / 👎.

Comment on lines +327 to +329
if s.upperID != "" {
r.upperMgr.Release(s.upperID)
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Reclaim upper directories after session deletion

Deleting an overlay session only marks its upper entry as not in use, but no code path ever calls upperMgr.Collect() (the GC loop only calls CollectIdle). As a result, every deleted overlay session leaves its upper/work directories on disk until the pod is destroyed, which can steadily fill the execd filesystem in long-running pods.

Useful? React with 👍 / 👎.

Comment on lines +239 to +243
go func() {
select {
case <-ctx.Done():
stdin.Close()
case <-done:

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Remove canceled runs from the session map

When a client disconnects or a request context is canceled during a run, this goroutine closes the persistent bash stdin, which normally causes the session process to exit; however the session remains stored as active and its upper directory remains in use. The next run on that session will fail against a dead pipe/process while GET /v1/isolated/session/:id still reports an active session, so cancellation should stop/delete or mark the session destroyed.

Useful? React with 👍 / 👎.

Comment on lines +153 to +160
if err != nil {
telemetry.RecordIsolatedRun(ctx, "error", durationMs)
event := model.ServerStreamEvent{
Type: model.StreamEventTypeError,
Text: err.Error(),
Timestamp: time.Now().UnixMilli(),
}
c.writeSingleEvent("IsolatedError", event.ToJSON(), true, event.Summary())

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Return the documented 404 for missing run sessions

If sessionId does not exist, RunInIsolatedSession returns ErrContextNotFound, but this branch converts every error into a 200 text/event-stream error frame. Callers using the new OpenAPI contract for /v1/isolated/session/{sessionId}/run cannot distinguish a missing session from an execution failure via the documented 404 response.

Useful? React with 👍 / 👎.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant