Skip to content

feat(shell): v0.4.0 — SDK bump, central config + hot-reload, LLM-friendly typed API, hardening + DX#238

Merged
andersonleal merged 4 commits into
mainfrom
feat/shell-sdk-bump-config-llm-api
Jun 9, 2026
Merged

feat(shell): v0.4.0 — SDK bump, central config + hot-reload, LLM-friendly typed API, hardening + DX#238
andersonleal merged 4 commits into
mainfrom
feat/shell-sdk-bump-config-llm-api

Conversation

@andersonleal

@andersonleal andersonleal commented Jun 9, 2026

Copy link
Copy Markdown
Collaborator

Summary

Shell worker v0.3.7 → v0.4.0 — a single breaking release. Bumps the SDK, moves the worker onto the central configuration worker with hot-reload, publishes typed JSON schemas for an LLM-friendly API, and adds a round of pre-landing hardening plus DX features driven by a real agent session.

Two commits: the v0.4.0 base, then the pre-landing hardening + DX additions.

SDK & observability

  • iii-sdk 0.16.0-next.2 → 0.19.1-next.1; add iii-observability
  • per-call OpenTelemetry (shell.calls, shell.call.duration_ms, shell.jobs.running gauge, shell.exec.output_truncated) + tracing spans

Central configuration

  • database-style central config via the configuration worker with serialized hot-reload (fetch → build → swap under one lock); fail-closed boot reconcile; payload-untrusted re-fetch; shell::config-status denied to agents
  • removed manifest.rs; restored ./config.yaml default; added config.collect.yaml for CI

LLM-friendly typed API

  • published request/response JSON schemas for all functions (engine tool listing shows field types)
  • S-codes surfaced as the wire error.code (agents branch on error.code instead of parsing messages); unified S-code table
  • per-call cwd + gated env for exec/exec_bg; structured, sandbox-safe fs::* responses

Pre-landing hardening (adversarial review)

  • PID-safe kill: host bg jobs terminate via a per-job notify channel owned by the drain task, not a bare-pid SIGKILL — closes a PID-reuse window that could kill an unrelated process when the worker runs as root
  • process-group spawn + group-kill on kill/timeout/shutdown (no orphaned grandchildren); bounded drain join; atomic host-bg finalize
  • client-side RPC timeout on sandbox bg forwards (a hung engine can't leak a concurrency slot)
  • fs: read cap + binary-skip in sed, byte-bounded grep line reads, recursive chmod/rm/ls/mv moved off the async executor
  • reject exec command paths that resolve inside the writable fs jail (closes an allowlist-basename → fs::write → exec host RCE)
  • broadened per-call env denylist (loader + interpreter startup-file keys); fixed the rejection message that named HOME/PATH as settable

LLM-DX API additions (from a real agent session)

  • inline shell::fs::write: content accepts a plain string (host target) — no streaming channel to construct; sandbox still requires a ContentRef
  • multi-file batch write: files: [{ path, content, mode?, parents? }] in one call, with per-file results
  • stdin on shell::exec / shell::exec_bg (host-only): pipe to tee/patch/filters without a shell heredoc
  • separate max_bg_timeout_ms (default 0 = unbounded), distinct from the foreground max_timeout_ms — long installs/builds/dev servers are no longer killed at the 30s foreground cap

Test plan

  • cargo build --all-targets — clean
  • cargo clippy --all-targets — clean
  • cargo fmt --all --check — clean
  • cargo test540 passing, 0 failing
  • adversarial verification of the concurrency core + the new write/stdin/bg-timeout features — SOUND
  • live-verified earlier on the running engine: cwd/env present in the published shell::exec schema; S-code returned as the top-level wire error.code
  • reviewer: exercise the sandbox round-trip for the new write/exec shapes against a live engine

Leaves 6 unrelated */Cargo.lock files (other workers) untouched.

Summary by CodeRabbit

  • New Features

    • Configuration hot-reload with a config-status endpoint and fail-closed boot.
    • Per-call exec overrides: cwd, env, and stdin (host-only; sandbox rejects).
    • Background-job hard-cap timeout (configurable max_bg_timeout_ms).
    • Inline single-file and multi-file batch filesystem writes; richer fs responses (path, entries_changed, per-file results).
    • Exec/FS telemetry and runtime metrics.
  • Chores

    • Release bumped to 0.4.0.
    • fs.allow_special_bits toggle and tightened command/env confinement.

…ndly typed API, hardening (v0.4.0)

BREAKING CHANGE: bumps iii-sdk to 0.19.1-next.1 and restructures the shell
worker around the central `configuration` worker with hot-reload. 0.3.7 -> 0.4.0.

SDK & observability
- iii-sdk 0.16.0-next.2 -> 0.19.1-next.1; add iii-observability 0.19.1-next.1
- per-call OpenTelemetry: shell.calls, shell.call.duration_ms,
  shell.jobs.running gauge, shell.exec.output_truncated; tracing spans

Central configuration
- database-style central config via the `configuration` worker with serialized
  hot-reload (fetch -> build -> swap under a single lock)
- fail-closed boot reconcile; re-fetch authoritative config (no payload trust);
  validate seed before persist; deny shell::on-config-change for agents
- remove manifest.rs; restore ./config.yaml --config default; config.collect.yaml

LLM-friendly typed API
- publish request/response JSON schemas (typed ExecRequest/ExecBgRequest, empty
  ListRequest/ConfigStatusRequest) so the engine tool listing shows field types
- surface S-codes as the wire error.code (From<FsError/ExecError> for
  IIIError::Remote) so agents can branch on error.code; unify the S-code table
- per-call cwd + gated env for exec/exec_bg (jail-confined cwd, allowed_env gating)
- structured sandbox-safe fs responses; discoverable error-code hints in descriptions

Hardening
- job lifecycle: reaper eviction, host-bg timeout, finalize-once, kill running
  host jobs by pid (on demand + on shutdown), reap rejected child
- fs host: setuid/setgid guard, regex size caps, grep/sed walk off the async executor
- deny HOME + glibc exec-vector env keys (GCONV_PATH, LD_*/DYLD_*, IFS, PATH ...) per call

522 tests passing; build/clippy/fmt clean.
…te, stdin, bg timeout)

Follow-on work on top of the v0.4.0 squash, adversarially verified
(build/clippy/fmt clean, 540 tests passing).

Pre-landing review hardening
- kill: terminate host bg jobs via a per-job notify channel owned by the drain
  task instead of a bare-pid SIGKILL — closes a PID-reuse window that could
  SIGKILL an unrelated process when the worker runs as root
- spawn children in their own process group; group-kill on kill/timeout/shutdown
  so grandchildren don't orphan; bounded drain join so a held pipe can't wedge a job
- atomic host-bg finalize (status + exit_code + output published together)
- client-side RPC timeout on sandbox bg forwards (a hung engine can't leak a slot)
- fs: enforce a read cap + binary-skip in sed, byte-bounded grep line reads, treat
  0 caps as defaults, move recursive chmod/rm/ls/mv off the async executor
- reject exec command paths that resolve inside the writable fs jail (closes an
  allowlist-basename -> fs::write -> exec host RCE)
- add interpreter startup-file keys (BASH_ENV/ENV/PYTHONSTARTUP/PERL5OPT/RUBYOPT/
  NODE_OPTIONS) to the per-call env denylist; stop the env-rejection message from
  naming HOME/PATH as "settable"

LLM-DX API additions (from a real agent session that hit the friction)
- shell::fs::write accepts inline string content (host target) in addition to a
  streaming ContentRef; sandbox still requires a ContentRef (inline -> S210)
- shell::fs::write batch form: files: [{path, content, mode?, parents?}] in one
  call, with per-file results
- shell::exec / shell::exec_bg gain an optional host-only stdin (pipe to tee/
  patch/filters without a shell heredoc)
- separate max_bg_timeout_ms (default 0 = unbounded) for host bg jobs, distinct
  from the foreground max_timeout_ms — long installs/builds/dev servers are no
  longer killed at the 30s foreground cap

Docs (README / ARCHITECTURE / SKILL / config.yaml / config.collect.yaml) updated
for all of the above.
@vercel

vercel Bot commented Jun 9, 2026

Copy link
Copy Markdown

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
workers Ready Ready Preview, Comment Jun 9, 2026 10:20pm

Request Review

@github-actions

github-actions Bot commented Jun 9, 2026

Copy link
Copy Markdown
Contributor

skill-check — worker

0 verified, 14 skipped (no docs/).

Layer Result
structure
vale
ai
render

Four for four. Nicely done.

@coderabbitai

coderabbitai Bot commented Jun 9, 2026

Copy link
Copy Markdown

Review Change Stack

📝 Walkthrough

Walkthrough

Centralizes shell configuration with hot-reload, enforces per-call exec policy (cwd/env/stdin), unifies S-code error mapping, reworks host background-job lifecycle, redesigns filesystem write/streaming and sandbox contracts, adds telemetry, and updates documentation and tests for v0.4.0.

Changes

Shell Worker v0.4.0 Complete Refactor

Layer / File(s) Summary
Release surface & operator docs
iii-permissions.yaml, shell/ARCHITECTURE.md, shell/README.md, shell/skills/SKILL.md, shell/Cargo.toml, shell/config*.yaml
Permissions denylist extended; CLI/docs updated for config-driven startup, new fields (max_bg_timeout_ms, fs.allow_special_bits), function contract changes (entries_changed, shell::config-status, write inline vs stream).
Configuration hot-reload & runtime wiring
shell/src/config.rs, shell/src/configuration.rs, shell/src/main.rs
ShellConfig gains JsonSchema and serialization helpers; configuration.rs builds prepared runtime, register/fetch/reload logic, serialized reloads, register trigger shell::on-config-change, and boot reconcile; main.rs builds AppState and registers handlers after fail-closed reconcile.
S-code mapping & typed error lifting
shell/src/scode.rs, shell/src/exec/error.rs, shell/src/fs/error.rs
Unified S-code canonicalization and recovery; ExecError/FsError lift into iii_sdk::IIIError::Remote preserving the original S-code and message; local stringified helpers removed.
Exec overrides & backend changes
shell/src/exec/policy.rs, shell/src/exec/mod.rs, shell/src/exec/backend.rs, shell/src/exec/host.rs, shell/src/exec/sandbox.rs, shell/src/functions/exec*.rs, shell/src/functions/types.rs
Adds ExecOverrides and build_overrides validation (cwd confinement, allowed_env + dangerous-key denylist, stdin); threads overrides into ExecBackend::run; host applies overrides; sandbox rejects non-empty overrides; handlers updated to use typed IIIError.
Background-job lifecycle & kill flow
shell/src/jobs.rs, shell/src/functions/exec_bg.rs, shell/src/functions/kill.rs, shell/src/functions/status.rs
JobHandle tracks host_pid, try_reserve_and_insert returns a RAII RunningJobGuard, kill-signal registry added, spawn_host_job accepts overrides, host hard-cap max_bg_timeout_ms enforced, join_drain for output, and typed status/kill errors introduced.
Filesystem wire & backend redesign
shell/src/fs/mod.rs, shell/src/fs/host.rs, shell/src/fs/sandbox.rs, shell/src/functions/fs_*.rs
Wire: WriteContent (Inline/Stream), WriteRequest::into_specs for single vs batch; enriched responses (entries_changed alias for updated, path/flags, per-file results). Host backend: confine_path, spawn_blocking offload, sed/grep caps, special-bits checks, inline vs streamed write limits. Sandbox write rejects inline content and forwards stream ContentRef.
Handlers & error mapping
shell/src/functions/*
Handlers now return typed iii_sdk::IIIError/ExecError, removed err_to_string helpers, adapt fs write batching and exec override building, and updated tests.
Telemetry & registration refactor
shell/src/telemetry.rs, shell/src/main.rs, shell/src/lib.rs
New OTel metrics and record_call wrapper; main initializes telemetry and uses AppState to source live runtime per-call; fs handler registration centralized.
Tests & E2E harness updates
shell/tests/*, shell/tests/e2e/workers/harness/src/*
Extensive unit and e2e updates: assert new response fields, typed S-codes, inline/batch write cases, exec stdin cases, bg job timeout cases, and harness expectError matching on wire code.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

Suggested reviewers

  • sergiofilhowz
  • ytallo

"🐰 I hopped through configs late at night,
tightened paths and made S-codes right,
per-call cages and writes that stream,
hot-reloads humming a careful dream,
tests green, docs tidy — hop, delight!"

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/shell-sdk-bump-config-llm-api

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 18

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@shell/config.yaml`:
- Line 2: Current config sets max_bg_timeout_ms: 0 which allows background jobs
to run indefinitely and can exhaust job slots; change the default to a finite
cap or implement a fallback to max_timeout_ms when max_bg_timeout_ms is
missing/zero. Update the configuration value for max_bg_timeout_ms to a sensible
non-zero default (e.g., same as max_timeout_ms) or update the job scheduler code
that reads max_bg_timeout_ms to treat 0/undefined as "use max_timeout_ms" and
enforce that timeout when scheduling background jobs; reference the config key
max_bg_timeout_ms, the fallback key max_timeout_ms, and the concurrency limiter
max_concurrent_jobs so the change prevents background jobs from holding slots
forever.

In `@shell/README.md`:
- Around line 158-166: The upgrade notes duplicate the "shell::fs::chmod
response field renamed `updated` → `entries_changed`" bullet; edit the README.md
upgrade list to remove the repeated instance so the chmod rename appears only
once (keep the first occurrence and delete the later duplicate referring to
shell::fs::chmod and `entries_changed`), ensuring the rest of the bullets and
ordering remain unchanged.
- Line 105: The README link for shell::exec points to the wrong fragment
`#per-call-cwd-and-env-host-target`; update the link so it matches the actual
section heading slug (which includes "stdin") or change the section title to
match the existing fragment. Concretely, in the shell::exec table entry replace
the fragment with the correct anchor (e.g.,
`#per-call-cwd-and-env-host-target-stdin` if the heading reads "Per-call `cwd`
and `env` (host-target, stdin)") or rename the heading to remove "stdin" so the
existing `#per-call-cwd-and-env-host-target` fragment resolves.

In `@shell/skills/SKILL.md`:
- Around line 66-68: The documentation incorrectly implies that all
sandbox-targeted calls to shell::exec are rejected; update the SKILL.md text for
shell::exec to state that only host-only override fields (e.g., providing a
host-only stdin string) are rejected with error S210 on sandbox targets, while
plain sandbox executions are allowed; specifically edit the sentence mentioning
"optional host-only `stdin`" to clarify that the host-only override is rejected
on sandbox targets (S210) but sandbox exec itself remains valid, and ensure the
wording references shell::exec and S210 for clarity.

In `@shell/src/config.rs`:
- Around line 245-260: The code currently only rejects commands with '/' when
self.fs.host_root is Some, letting an unjailed fs (host_root None +
self.fs.allow_unjailed == true) execute arbitrary host paths; modify the
cmd.contains('/') handling to also return an Err when
self.fs.host_root.is_none() && self.fs.allow_unjailed is true. Concretely,
inside the same branch that checks cmd.contains('/'), add a guard using
self.fs.allow_unjailed and self.fs.host_root to reject any path-form command
(cmd) when the FS is explicitly unjailed, producing a similar error message
referencing cmd and that executing files written via shell::fs::write or
arbitrary host paths is not allowed.

In `@shell/src/configuration.rs`:
- Around line 184-188: Modify try_get_config_value to stop string-matching RPC
errors and instead match the structured IIIError::Remote variant from
trigger_with_retry: ensure trigger_with_retry returns/propagates an IIIError
(not a String) so inside try_get_config_value you can match Err(IIIError::Remote
{ code, .. }) if code == "NOT_FOUND" => Ok(None), map Ok(resp) =>
Ok(resp.get("value").cloned()), and for other Err(e) return Err(e.to_string())
(or propagate the IIIError if you prefer changing the function error type);
reference try_get_config_value, trigger_with_retry, IIIError::Remote, and the
"configuration::get" / CONFIG_ID call when making the change.

In `@shell/src/exec/policy.rs`:
- Around line 53-88: The current is_dangerous_env_key only checks
DANGEROUS_ENV_KEYS for exact matches which misses other loader-related names;
update is_dangerous_env_key to return true if the key is exactly in
DANGEROUS_ENV_KEYS OR if it starts with the loader-family prefixes (e.g.
key.starts_with("LD_") || key.starts_with("DYLD_")), so any LD_/DYLD_ variant is
rejected; keep the existing DANGEROUS_ENV_KEYS for non-family names and
optionally remove redundant LD_*/DYLD_* entries from that constant to avoid
duplication.

In `@shell/src/fs/host.rs`:
- Around line 1243-1350: The loop in sed currently follows symlinks when reading
(std::fs::metadata/read_to_string) but then writes to the symlink path with
std::fs::rename(&tmp, p), which replaces the link with a regular file; fix by
detecting and handling symlinks consistently: call std::fs::symlink_metadata(p)
and check file_type().is_symlink() for each anchored_path (from
lexical_operand_with) and either push an FsSedFileResult error (e.g., "operand
is a symlink; skipped") to reject symlink operands, or else canonicalize the
operand once (std::fs::canonicalize) and use that canonical path for the
metadata/read_to_string/temp_sibling/write/rename so the same inode is read and
rewritten; update the code paths around the
metadata/read_to_string/tmp/temp_sibling/rename sections (the loop over
anchored, the use of looks_binary, temp_sibling, and write/rename) to use the
chosen consistent approach.
- Around line 249-260: The current canonicalize_with_fallback error mapping
treats any dangling symlink as S215, blocking link-preserving operations; change
canonicalize_with_fallback (or add an overload like
canonicalize_with_fallback_allow_dangling_leaf) to accept a flag
"allow_dangling_leaf" and only map dangling-symlink errors to S215 when that
flag is false, otherwise return a lexical-fallback style result (or a distinct
error code) so callers can switch back to lexical resolution; update callers for
link-preserving handlers (rm, mv, chmod, sed or their handler functions) to call
the new variant with allow_dangling_leaf=true, leaving other callers using the
default fully-resolved behavior (allow_dangling_leaf=false) that still maps to
S215 via FsError::new("S215", ...) while non-dangling errors keep S210.
- Around line 859-908: Replace the racy precheck using dst_p.exists() and move
the overwrite/no-overwrite decision into the blocking rename unit: when
req.overwrite is false, call an atomic no-replace move (use libc::renameat2 with
RENAME_NOREPLACE on Linux) inside the tokio::task::spawn_blocking closure
instead of the separate exists check; if renameat2 returns EEXIST convert to
FsError::new("S213", ...) as before, on ENOSYS fall back to the existing
std::fs::rename / EXDEV copy+rename+unlink path, and on success set overwrote =
false (derive the overwrote flag from whether the operation replaced an existing
dst rather than the removed dst_existed precheck); keep using temp_sibling,
std::fs::copy, and FsError::from_io in the fallback and map errors consistently.

In `@shell/src/fs/mod.rs`:
- Around line 485-506: The batch-path handling in WriteRequest::into_specs
currently only rejects when path or content are set but silently ignores
top-level mode and parents; update into_specs (the method on WriteRequest) so
that when self.files.is_some() it checks for any of the single-file fields
(self.path, self.content, self.mode (non-default/empty), and self.parents) and
returns an FsError "S210" if any are present, with a message like "provide
either the single-file `path`+`content`+`mode`+`parents` or a `files` array, not
both"; ensure you reference the WriteRequest fields (path, content, mode,
parents, files) so callers are rejected when mixing forms rather than silently
dropping mode/parents.

In `@shell/src/functions/exec_bg.rs`:
- Around line 55-63: The S210 error message returned in the
!overrides.is_empty() check should mention stdin as a host-only override as
well; update the error string in the exec_bg.rs block that checks overrides (the
!overrides.is_empty() branch) so it references "cwd/env/stdin" (or "cwd, env, or
stdin") and explains they are host-only and not forwarded by the sandbox exec
protocol (keep the S210 code). Ensure the updated message still directs callers
to drop those overrides or use target: host and preserve the existing error flow
using the same return Err(...) in that branch.
- Around line 121-126: The Err branch in exec_bg (the Err((running, mut handle))
path) only calls ch.start_kill() which kills the direct child but can leave
forked descendants running; update this branch (after taking handle.child into
ch and before awaiting ch.wait()) to also tear down the whole process group for
the child PID (send a SIGKILL to the child's process group, e.g. kill(-pid,
SIGKILL) on Unix or equivalent) so rejected spawns cannot leak descendants; keep
the existing start_kill() and await ch.wait() but add the process-group kill
using the child's pid (referencing ch, handle, start_kill, wait, and the exec_bg
error path).

In `@shell/src/functions/fs_write.rs`:
- Around line 30-45: The loop uses backend.write(spec).await and accumulates
WriteFileResult into files and total, but on the first backend error it maps to
iii_sdk::IIIError and returns immediately, discarding partial results; update
the code so that when backend.write fails you return an error variant or
response that includes the accumulated files Vec<WriteFileResult> and total
bytes (e.g., extend IIIError or introduce a BatchWriteError containing files and
bytes_written) instead of dropping them; adjust the return type of the function
(or map_err path) to propagate the partial WriteResponse (files and
bytes_written) alongside the error so callers can see which files succeeded and
how many bytes were written.

In `@shell/src/functions/kill.rs`:
- Around line 25-39: Branch 1 currently calls child.start_kill() which only
kills the direct child; change it to use the same process-group kill path used
in the timeout/notify flow so descendant processes are also terminated. Locate
the branch that checks h.child.as_mut() and replace the start_kill() call with
the process-group kill helper used elsewhere in this module (the code path used
for timeout/notify), ensuring you still update h.record.status to
JobStatus::Killed, set finished_at_ms, and return the same KillResponse
structure.
- Around line 214-228: This test acquires HOST_SWEEP_TEST_GUARD but not
GAUGE_TEST_GUARD, allowing a race on the RUNNING_JOBS gauge when
spawn_host_job() reserves a slot; to fix, acquire GAUGE_TEST_GUARD before
HOST_SWEEP_TEST_GUARD at the top of this test (same ordering used in exec_bg.rs)
so the running-gauge guard serializes with jobs.rs assertions around
RUNNING_JOBS while calling spawn_host_job().

In `@shell/src/functions/types.rs`:
- Around line 270-285: ListRequest and ConfigStatusRequest are empty no-arg
request types that currently accept unknown JSON fields; add serde unknown-field
rejection to match the published empty-object schema by annotating both structs
(ListRequest and ConfigStatusRequest) with #[serde(deny_unknown_fields)] so
deserialization fails on extra fields, ensuring runtime behavior aligns with the
JsonSchema contract.

In `@shell/src/scode.rs`:
- Around line 75-92: scan_s_code currently only checks the byte before the match
and can accept "S211foo" or "S2119"; add a right-hand boundary check so the
3-digit S-code is not immediately followed by an ASCII alphanumeric or
underscore. In the loop in scan_s_code (using bytes and index i), after
confirming bytes[i..i+4] matches, compute a followed_by_word flag by testing
whether i + 4 < bytes.len() and if so whether bytes[i +
4].is_ascii_alphanumeric() || bytes[i + 4] == b'_', and only return
Some(&s[i..i+4]) when !preceded_by_word && !followed_by_word (treat
end-of-string as OK).
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 513a10c3-22ad-409b-9b81-aaa780937836

📥 Commits

Reviewing files that changed from the base of the PR and between 1c75ee9 and 3e9f82a.

⛔ Files ignored due to path filters (1)
  • shell/Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (51)
  • iii-permissions.yaml
  • shell/ARCHITECTURE.md
  • shell/Cargo.toml
  • shell/README.md
  • shell/config.collect.yaml
  • shell/config.yaml
  • shell/skills/SKILL.md
  • shell/src/config.rs
  • shell/src/configuration.rs
  • shell/src/exec/backend.rs
  • shell/src/exec/error.rs
  • shell/src/exec/host.rs
  • shell/src/exec/mod.rs
  • shell/src/exec/policy.rs
  • shell/src/exec/sandbox.rs
  • shell/src/exec_dispatch.rs
  • shell/src/fs/error.rs
  • shell/src/fs/host.rs
  • shell/src/fs/mod.rs
  • shell/src/fs/sandbox.rs
  • shell/src/functions/exec.rs
  • shell/src/functions/exec_bg.rs
  • shell/src/functions/fs_chmod.rs
  • shell/src/functions/fs_dispatch.rs
  • shell/src/functions/fs_grep.rs
  • shell/src/functions/fs_ls.rs
  • shell/src/functions/fs_mkdir.rs
  • shell/src/functions/fs_mv.rs
  • shell/src/functions/fs_read.rs
  • shell/src/functions/fs_rm.rs
  • shell/src/functions/fs_sed.rs
  • shell/src/functions/fs_stat.rs
  • shell/src/functions/fs_write.rs
  • shell/src/functions/kill.rs
  • shell/src/functions/status.rs
  • shell/src/functions/types.rs
  • shell/src/jobs.rs
  • shell/src/lib.rs
  • shell/src/main.rs
  • shell/src/manifest.rs
  • shell/src/scode.rs
  • shell/src/telemetry.rs
  • shell/tests/e2e/run-tests-jailed.sh
  • shell/tests/e2e/run-tests.sh
  • shell/tests/e2e/workers/harness/src/cases-fs-host.ts
  • shell/tests/e2e/workers/harness/src/cases-fs-sandbox.ts
  • shell/tests/function_handlers.rs
  • shell/tests/host_fs_branches.rs
  • shell/tests/jobs_lifecycle.rs
  • shell/tests/sandbox_dispatch.rs
  • shell/tests/sandbox_exec_dispatch.rs
💤 Files with no reviewable changes (2)
  • shell/src/manifest.rs
  • shell/src/functions/fs_dispatch.rs

Comment thread shell/config.yaml
Comment thread shell/README.md Outdated
Comment thread shell/README.md
Comment thread shell/skills/SKILL.md Outdated
Comment thread shell/src/config.rs
Comment thread shell/src/functions/fs_write.rs
Comment thread shell/src/functions/kill.rs
Comment thread shell/src/functions/kill.rs Outdated
Comment thread shell/src/functions/types.rs
Comment thread shell/src/scode.rs
… & harness S-code matching

Adds live, over-the-engine e2e coverage for the four v0.4.0 features (inline +
batch fs::write, exec/exec_bg stdin, separate max_bg_timeout_ms). Running it
surfaced two real issues, fixed here:

Fixes
- fs::write: a single-element `files: [x]` batch wrongly collapsed to the flat
  single-file response (empty `files`). `WriteRequest::into_specs` now reports
  `is_batch`, so the `files` form always returns the per-file `files` array even
  for one entry — a caller that sends `files` always gets `files` back.
- e2e harness: the v0.4.0 change that surfaces S-codes as the structured wire
  `error.code` (so agents branch on it) had silently broken ~43 harness
  assertions that substring-matched the S-code in the error MESSAGE. The SDK
  rejects with the raw `{ code, message, stacktrace }` body, so `expectError`
  now matches the pattern against `e.code` AND `e.message`. Updated the few
  cases that did their own message inspection, and repurposed
  pbreak_write_content_wrong_type (a string `content` is now valid inline).

New e2e cases
- cases-fs-write-inline.ts: inline string write + on-disk round-trip, multi-file
  batch with per-file results, and the host-only/ambiguity rejections (both
  forms, empty files, missing content, inline-on-sandbox -> S210).
- cases-exec-stdin.ts: stdin piped to `cat` on exec/exec_bg, closed-stdin
  default, stdin-on-sandbox -> S210.
- cases-jobs-bg-timeout.ts: bg job killed at max_bg_timeout_ms (with the hard-cap
  note, proving it survived past the foreground cap) + under-cap completion.
- config.yaml / config-jailed.yaml: max_bg_timeout_ms: 6000 (> max_timeout_ms).

Verified: full e2e suite 167/167 pass against the engine; 238 Rust lib tests
pass; cargo fmt/clippy clean; harness tsc --noEmit clean.
CI fix: the jailed harness suite read S-codes from error.message, but
v0.4.0 moved them to the structured error.code field — update both
jailed cases to read e.code (only the jailed suite runs in CI).

Security/correctness (CodeRabbit review):
- config: reject command paths when fs is unjailed — closes an RCE where
  an agent could shell::fs::write /tmp/ls then exec command:"/tmp/ls"
  (basename allowlisted) and bypass the read-only allowlist. Unjailed
  mode has no writable boundary, so any path could run planted bytes;
  require bare PATH-resolved names. Jailed mode unchanged.
- policy: reject LD_*/DYLD_* env keys by family prefix, not exact name.
- exec_bg/kill: SIGKILL the whole process group on cap-rejection and on
  kill branch 1 (both still own the un-reaped child, so the pid is safe);
  prevents forked descendants leaking after killed:true.
- fs::write: batch form now rejects top-level mode/parents instead of
  silently dropping them (mode/parents are now Option).
- fs::sed: skip symlink operands (rename would replace the link with a
  regular file, destroying it); per-file error, link + target preserved.
- scode: require a right-hand boundary so S2119/S211foo don't parse as S211.
- configuration: case-insensitive NOT_FOUND match (real code is
  function_not_found).

Docs: fix broken README fragment link, drop a duplicate chmod migration
bullet, clarify SKILL.md sandbox exec semantics (only host-only overrides
are rejected on a sandbox target, not all sandbox exec), name stdin in the
sandbox override-rejection messages.

Declined: finite max_bg_timeout_ms default (intentional/documented),
dangling-symlink S215 relaxation and mv RENAME_NOREPLACE (heavy/risky,
fail-closed), deny_unknown_fields on no-arg requests — the latter is
actively harmful: the engine injects _caller_worker_id into every payload,
so a strict struct rejects all calls. Added a regression case pinning that
shell::list tolerates extra fields.

E2e: +new cases for the unjailed RCE guard, batch-write mode/parents
rejection, sed symlink skip, and the list extra-field contract. Default
171/171, jailed 2/2.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

♻️ Duplicate comments (1)
shell/src/fs/host.rs (1)

1259-1280: ⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Dangling symlink operands still fail before this skip path.

This only skips symlinks that survive validation. A broken link under host_root still gets rejected by the pre-validation loop at Line 1240, so one dangling operand aborts the entire sed call instead of producing the per-file "skipped" result this block is trying to return. If the intent is to make symlink operands non-fatal, sed needs the same kind of "allow dangling final component" confinement path before it reaches this branch.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@shell/src/fs/host.rs` around lines 1259 - 1280, The pre-validation loop
currently rejects dangling symlink operands before the later symlink-skip block;
change the pre-validation to accept a final component that is a symlink by using
std::fs::symlink_metadata (or by calling symlink_metadata when metadata fails)
and treating Ok(lmd) where lmd.file_type().is_symlink() as a valid operand
rather than a fatal error, so that the later branch that pushes an
FsSedFileResult with "operand is a symlink; skipped..." (the block that checks
std::fs::symlink_metadata(p) and calls results.push(...)) can handle the
dangling link per-file instead of aborting the entire sed call; keep existing
checks that still enforce confinement to host_root.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Duplicate comments:
In `@shell/src/fs/host.rs`:
- Around line 1259-1280: The pre-validation loop currently rejects dangling
symlink operands before the later symlink-skip block; change the pre-validation
to accept a final component that is a symlink by using std::fs::symlink_metadata
(or by calling symlink_metadata when metadata fails) and treating Ok(lmd) where
lmd.file_type().is_symlink() as a valid operand rather than a fatal error, so
that the later branch that pushes an FsSedFileResult with "operand is a symlink;
skipped..." (the block that checks std::fs::symlink_metadata(p) and calls
results.push(...)) can handle the dangling link per-file instead of aborting the
entire sed call; keep existing checks that still enforce confinement to
host_root.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: a339eb7d-d300-4ba2-bc56-42241bf29a24

📥 Commits

Reviewing files that changed from the base of the PR and between 5a89d63 and 7c261b4.

📒 Files selected for processing (18)
  • shell/README.md
  • shell/skills/SKILL.md
  • shell/src/config.rs
  • shell/src/configuration.rs
  • shell/src/exec/policy.rs
  • shell/src/exec/sandbox.rs
  • shell/src/fs/host.rs
  • shell/src/fs/mod.rs
  • shell/src/functions/exec_bg.rs
  • shell/src/functions/kill.rs
  • shell/src/functions/types.rs
  • shell/src/main.rs
  • shell/src/scode.rs
  • shell/tests/e2e/workers/harness/src/cases-fs-protocol-break.ts
  • shell/tests/e2e/workers/harness/src/cases-fs-write-inline.ts
  • shell/tests/e2e/workers/harness/src/cases-jobs-break.ts
  • shell/tests/e2e/workers/harness/src/cases-safety.ts
  • shell/tests/e2e/workers/harness/src/cases-vuln-repro-jailed.ts
🚧 Files skipped from review as they are similar to previous changes (9)
  • shell/tests/e2e/workers/harness/src/cases-fs-write-inline.ts
  • shell/src/functions/types.rs
  • shell/src/functions/exec_bg.rs
  • shell/skills/SKILL.md
  • shell/README.md
  • shell/src/exec/policy.rs
  • shell/src/configuration.rs
  • shell/src/main.rs
  • shell/src/exec/sandbox.rs

@andersonleal andersonleal merged commit 9f48415 into main Jun 9, 2026
13 of 15 checks passed
@andersonleal andersonleal deleted the feat/shell-sdk-bump-config-llm-api branch June 9, 2026 22:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants