diff --git a/coder/CHANGELOG.md b/coder/CHANGELOG.md
new file mode 100644
index 00000000..81d72b50
--- /dev/null
+++ b/coder/CHANGELOG.md
@@ -0,0 +1,212 @@
+# Changelog
+
+## 0.5.0
+
+Config source-of-truth moves to the `configuration` worker (the pattern
+`database` and `storage` already use). `--config` becomes a first-register
+seed, and the numeric tuning knobs hot-reload.
+
+### Changed — runtime config now lives in the `configuration` worker
+
+- **coder's config is sourced from the `configuration` worker under id
+ `coder`, not from a file read on every boot.** At startup coder registers its
+ JSON Schema, reads the live value via `configuration::get` (which env-expands
+ `${VAR}`), and binds a `configuration` trigger that re-fetches the
+ authoritative value on change (it never trusts the trigger payload). Persisted
+ values default to `./data/configuration/coder.yaml`; edit that file or call
+ `configuration::set id=coder` and the change propagates without re-reading the
+ seed file.
+- **`--config ` is now a first-register SEED, not the runtime source of
+ truth.** When present and no value is yet stored for id `coder`, the file's
+ contents are passed as `initial_value` on `configuration::register`. After the
+ first register the stored value is authoritative — re-running with `--config`
+ does NOT overwrite it. `${VAR}` is expanded only when the seed file is read;
+ `configuration::get` values are already expanded and are never expanded twice.
+- **`--config` no longer has a default and no longer auto-loads `./config.yaml`
+ on every boot.** With no seed and no stored value coder runs on the built-in
+ default jail (`base_paths` `["./", "/tmp"]`, **no** `non_accessible_globs`).
+ Migration: deployments that relied on an on-disk `./config.yaml` being
+ auto-loaded must pass it explicitly via `--config ./config.yaml` for the
+ first register (to preserve the shipped secret-glob protection), or set the
+ value once via `configuration::set id=coder`.
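The seed-versus-stored precedence described in the 0.5.0 notes above can be sketched as a small decision function. This is a minimal illustration under stated assumptions, not coder's actual code: `resolve_boot_config`, `ConfigSource`, and the `BUILT_IN_DEFAULT` JSON string are hypothetical names, and the stored value and the seed file are modeled as plain optional strings instead of calls to the `configuration` worker.

```rust
/// Where the effective config came from on this boot (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
pub enum ConfigSource {
    /// A value was already stored for id `coder`; it is authoritative.
    Stored,
    /// Nothing was stored, so the `--config` seed became the initial value.
    Seed,
    /// Neither a stored value nor a seed: fall back to the built-in default.
    BuiltInDefault,
}

/// Illustrative stand-in for the built-in default jail (assumed encoding).
pub const BUILT_IN_DEFAULT: &str = r#"{"base_paths":["./","/tmp"]}"#;

/// Pick the effective config: a stored value always wins, the seed is used
/// only on the first register, and the built-in default is the last resort.
pub fn resolve_boot_config(stored: Option<&str>, seed: Option<&str>) -> (ConfigSource, String) {
    match (stored, seed) {
        // Re-running with a seed does NOT overwrite an existing stored value.
        (Some(value), _) => (ConfigSource::Stored, value.to_string()),
        (None, Some(seed_value)) => (ConfigSource::Seed, seed_value.to_string()),
        (None, None) => (ConfigSource::BuiltInDefault, BUILT_IN_DEFAULT.to_string()),
    }
}
```

Calling `resolve_boot_config(Some(stored), Some(seed))` returns the stored value, which mirrors the rule that a later `--config` run never replaces what the first register persisted.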
+
+### Changed — split reload policy (jail restart-only, knobs hot-reload)
+
+- **The security jail is restart-required; the numeric budgets/limits
+ hot-reload.** On a `configuration:updated` event coder re-fetches and, if the
+ change alters any JAIL field — `base_paths` (and legacy `base_path`),
+ `non_accessible_globs`, `default_exclude_globs` — the reload is **refused**:
+ the running jail is kept and coder logs `restart coder to apply` (the
+ `PathResolver` is built once at boot and is never swapped at runtime, the same
+ way `storage` refuses a topology change). When the jail signature is
+ unchanged, the config snapshot is swapped live and handlers read the current
+ snapshot on their next call. The hot-reloadable knobs are `max_read_bytes`,
+ `max_write_bytes`, `tree_default_depth`, `tree_per_folder_limit`,
+ `list_default_page_size`, `list_max_page_size`, `search_default_max_matches`,
+ `search_default_max_line_bytes`, `search_response_budget_bytes`,
+ `batch_read_budget_bytes`, and `max_output_bytes`. `coder::info` keeps
+ reporting the boot-time roots until restart.
+
+### Docs
+
+- New [`config.yaml.example`](config.yaml.example) documents the
+ seed-on-first-register contract, the built-in default, and the
+ configuration-worker source of truth.
+- New [`config.collect.yaml`](config.collect.yaml) seeds a single `/tmp` root
+ so the registry-publish CI job can boot a throwaway coder and read back its
+ interface without depending on the runner's working directory.
+- `config.yaml` header now states it is a SEED for `initial_value`, not the
+ runtime source of truth, and spells out the restart-vs-hot-reload split.
+- README `## Configuration` documents the `configuration`-worker sourcing,
+ `--config` seed semantics, and the reload policy.
+
+## 0.4.1
+
+Fix wave driven by a live harness session failure (session q8x6g248).
+
+### Fixed — silent file corruption via undefined capture references
+
+- **Replace ops now validate every `$` capture reference in `replacement`
+ against the compiled pattern's actual groups, pre-write.** The regex crate
+ expands undefined references to the EMPTY STRING: a replacement carrying a
+ JS/TS template literal (`` `Hello, ${name}!` ``) against a pattern with no
+ group `name` silently wrote `` `Hello, !` `` to disk with `success: true` —
+ twice in the live session. An undefined reference now fails the entry with
+ a per-entry `C210` (file byte-identical on disk, other files in the batch
+ still apply) that names the offending reference, states what the pattern
+ defines, and teaches the corrective rewrites: escape literal `$` as `$$`
+ (`$${name}` outputs a literal `${name}`) or add the capture group. The
+ validator tokenizes exactly like `Captures::expand` (`$$` escape, longest
+ `[0-9A-Za-z_]` run for unbraced refs — `$1a` names group "1a", not group 1
+ — braced `${ref}`, literal `$` fallbacks), pinned by a validator-vs-expand
+ parity test. Valid rewrites (`$0`, `$1`, `$name`, `${1}a`) work unchanged.
+
+### Changed — multi-line replace echoes show the region's tail
+
+- **Replace-site echoes now show the FIRST and LAST line of the post-replace
+ region** (previously the first matched line only), with `elided` set to the
+ inner line count when the region spans more than 2 lines. Single-line
+ replacements are unchanged; sites stay capped at 5 and the ~4 KiB echo
+ budget still applies. In the production session the corrupted line sat in
+ the tail of a multi-line replacement and stayed invisible until a full
+ read — it is now visible directly in the mutation response.
+
+### Docs — request envelopes + replacement teaching
+
+- `coder::update-file`, `coder::create-file`, `coder::delete-file`, and
+ `coder::move` descriptions now open with their top-level request shape
+ (e.g. `{"files": [{"path": "...", "ops": [...]}]}`) — the live agent
+ guessed `{path, edits[]}` because no surface showed the envelope.
+- The `replacement` schema field documents capture-reference expansion, the
+ `$$` escape, the JS/TS template-literal collision, and the `$1a`
+ longest-run gotcha. SKILL.md and README updated to match.
+
+## 0.4.0
+
+### BREAKING
+
+- **Slim tree/list-folder wire shape**: per-node `path` is gone from
+ `coder::tree` nodes and `coder::list-folder` entries — nodes carry only
+ `name`. `coder::tree` now carries a top-level `path` (canonical absolute path
+ of the snapshotted folder; `coder::list-folder` already had one). Migration:
+ derive child paths by joining —
+ child path = parent path + `"/"` + `name`; the root node's path IS the
+ response's top-level `path` (do not join the root's own `name` onto it).
+ Derived paths re-validate through the jail on use.
+
+### Changed Defaults
+
+- **Noise excludes are ON by default** (`default_exclude_globs`, new config
+ key): `coder::tree` and `coder::search` now skip `**/.git/**`,
+ `**/node_modules/**`, `**/target/**`, `**/dist/**`, `**/.venv/**`, and
+ `**/__pycache__/**`. Excluded directories still appear in `coder::tree` as
+ childless stubs flagged `truncated` (reason `"default_exclude"`); excluded
+ files are omitted from `coder::search`. Hide-only — this grants no access
+ protection (that remains `non_accessible_globs`). Opt out per call with
+ `use_default_excludes: false`, or change the list via the
+ `default_exclude_globs` config key. `coder::info` reports the active globs.
+- **Single-path full reads are budgeted** (`max_output_bytes`, new config key,
+ default 128 KiB): a `coder::read-file` full read whose converted content
+ would exceed the budget now fails with a `C213` that carries the file's size
+ and line count plus the recovery calls (window with `line_from`/`line_to`,
+ probe with `stat: true`, or raise via the per-call `max_output_bytes`
+ override, clamped to `max_read_bytes`). Windowed reads and batch mode are
+ not governed by this key.
+
+### New Features
+
+- **`coder::search` context lines** (`context_lines_before` /
+ `context_lines_after`, ≤10 each): content matches carry `before[]`/`after[]`
+ arrays of surrounding lines, enabling the 2-call edit workflow (search with
+ context → edit) with no file read in between.
+- **`coder::search` response budget** (`search_response_budget_bytes`, default
+ 256 KiB): when the next match would exceed the budget the search stops
+ accumulating and sets `truncated: true` — it degrades, it never errors.
+- **`coder::read-file` `stat: true` probe**: size/mode/mtime plus
+ `total_lines`/`is_utf8` without reading content. Budget-free in batch mode —
+ probe many files, then window only what you need.
+- **`coder::read-file` `numbered: true`**: prefixes each content line with its
+ absolute 1-based file line number — the exact coordinates
+ `coder::update-file` line ops take. Prefix bytes count against byte budgets.
+- **Replace op `dot_matches_newline`**: lets `.` span newlines so two short
+ anchors joined by `.*?` can replace a multi-line region without quoting it.
+- **Replace op `expect_matches`**: pre-write guard — the op fails with a
+ per-entry `C210` (before anything is written) when the actual match count
+ differs. `expect_matches: 1` turns a silent multi-site clobber into a safe
+ error; `expect_matches: 0` asserts absence.
+- **`coder::info`** now reports `default_exclude_globs`, `max_output_bytes`,
+ and `search_response_budget_bytes` alongside the existing caps.
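The `expect_matches` pre-write guard listed above can be illustrated with a dependency-free sketch. To keep it self-contained it swaps the regex engine for literal substring matching, so it shows the guard's shape, not coder's matcher. `guarded_replace` and `MatchCountMismatch` are hypothetical names; per the notes above, the real worker reports a mismatch as a per-entry `C210`.

```rust
/// Returned when the actual match count differs from `expect_matches`
/// (hypothetical stand-in for the per-entry `C210` error).
#[derive(Debug, PartialEq, Eq)]
pub struct MatchCountMismatch {
    pub expected: usize,
    pub actual: usize,
}

/// Replace every occurrence of `needle` in `body`, but only after the
/// optional `expect_matches` guard has passed. On a mismatch no new body is
/// produced, so the caller has nothing to write.
/// Assumes `needle` is non-empty.
pub fn guarded_replace(
    body: &str,
    needle: &str,
    replacement: &str,
    expect_matches: Option<usize>,
) -> Result<String, MatchCountMismatch> {
    // Count non-overlapping literal matches (a stand-in for regex matching).
    let actual = body.matches(needle).count();
    if let Some(expected) = expect_matches {
        if expected != actual {
            return Err(MatchCountMismatch { expected, actual });
        }
    }
    Ok(body.replace(needle, replacement))
}
```

With `expect_matches` set to `Some(1)`, a body containing the needle twice yields an error instead of a two-site rewrite, and `Some(0)` succeeds only when the needle is absent.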
+
+### Fixes
+
+- **`clip_line` UTF-8 boundary panic**: clipping a search result line to
+ `max_line_bytes` now floors the clip point to the nearest UTF-8 character
+ boundary instead of panicking when the cap fell inside a multi-byte
+ character.
+
+## 0.3.0
+
+### BREAKING
+
+- **Multi-root jail** (`base_paths[]`): replaces the single `base_path` scalar.
+ Legacy `base_path` is still honored as a one-entry list; setting both keys is
+ a startup error (`C210`). Default roots: `["./", "/tmp"]`.
+- **Absolute paths inside any root are now accepted** (previously rejected as
+ `C210`). Paths outside every allowed root still return `C215`.
+- **All responses carry canonical absolute paths** — callers that compared
+ relative echo values must update.
+- **Structured per-entry errors**: `error` fields are now JSON objects
+ `{"code":"C2xx","message":"..."}` instead of JSON strings containing escaped
+ JSON. LLM agents can branch on `error.code` directly without string parsing.
+- **`coder::update-file` response shape**: the old `before`/`after` full-body
+ fields are gone. Each applied op now returns a bounded `OpEcho`
+ (`op_index`, `from_line`, `lines`, `elided`, `total_replacements`) with ±2
+ context lines for line ops and up to 5 per-match-site echoes for regex
+ replace. An `echoes_truncated` flag is set when the budget runs out.
+- **`coder::read-file` schema**: `path` and `paths[]` are now XOR fields
+ (`C210` when both or neither is set). Single-path response fields are
+ nullable; batch responses carry a top-level `results[]` array.
+
+### New Features
+
+- **`coder::info`** — pure discovery call: returns canonical allowed roots,
+ per-file byte caps, listing/search limits, and non-accessible glob patterns.
+ Agents that hit a path error should call this first to understand the jail
+ contract.
+- **`coder::move`** — batched move/rename with per-entry `overwrite` and
+ `parents` flags. Same-root moves use `rename` (per-file atomic); cross-root
+ file moves use copy+delete with rollback; cross-root directory moves are
+ rejected with a clear `C210`.
+- **`coder::read-file` line windows** (`line_from`/`line_to`): stream any slice
+ of a large file without reading past the byte cap. `more_lines` flags
+ remaining content; `total_lines` is set when the file was fully traversed.
+- **`coder::read-file` batch mode** (`paths[]`): read multiple files or windows
+ in one call against a shared `batch_read_budget_bytes` cap (default 1 MiB).
+ Per-entry `C211`/`C213` errors leave other entries unaffected.
+- **Prescriptive error messages** (C210/C211/C213/C215): every error names the
+ relevant config key or root list and includes a corrective-action sentence so
+ an LLM agent can repair its call without a human in the loop. The champion
+ case: agents repeatedly abandoned coder after a single opaque `C210` with no
+ hint about which config key was wrong or what values were legal.
+- **Canonical request examples** in every function schema — wire-contract,
+ golden-tested.
diff --git a/coder/Cargo.lock b/coder/Cargo.lock
index a2a52453..8a72fb77 100644
--- a/coder/Cargo.lock
+++ b/coder/Cargo.lock
@@ -204,7 +204,7 @@ checksum = "c8d4a3bb8b1e0c1050499d1815f5ab16d04f0959b233085fb31653fbfc9d98f9"
 [[package]]
 name = "coder"
-version = "0.1.1"
+version = "0.5.0"
 dependencies = [
  "anyhow",
  "clap",
diff --git a/coder/Cargo.toml b/coder/Cargo.toml
index 2eab1af3..40e905e3 100644
--- a/coder/Cargo.toml
+++ b/coder/Cargo.toml
@@ -2,7 +2,7 @@
 [package]
 name = "coder"
-version = "0.2.0"
+version = "0.5.0"
 edition = "2021"
 publish = false
diff --git a/coder/README.md b/coder/README.md
index 879bbac5..a651ae6d 100644
--- a/coder/README.md
+++ b/coder/README.md
@@ -1,11 +1,11 @@
 # coder
 A path-jailed code worker for iii agents.
`coder::*` lets agents read, -search, edit, create, and delete files inside a single configured -`base_path` — without ever escaping it via `..`, absolute paths, or -symlinks. A glob-based `non_accessible` list keeps sensitive files -(`.env`, `*.pem`, anything under `secrets/`) visible to directory -listings but unreadable and unwritable +search, edit, create, and delete files inside one or more configured +allowed roots — without ever escaping them via `..` or symlinks. A +glob-based `non_accessible` list keeps sensitive files (`.env`, `*.pem`, +anything under `secrets/`) visible to directory listings but unreadable +and unwritable. ## Install @@ -89,13 +89,15 @@ async fn main() -> anyhow::Result<()> { | Function id | What it does | |---|---| -| `coder::read-file` | Read a single file (capped at `max_read_bytes`). | -| `coder::search` | Search file contents (literal/regex) and/or paths under `base_path`. | -| `coder::update-file` | Apply batched `insert` / `remove` / `update_lines` / regex `replace` ops across one or more files. Line ops bottom-up; atomic per file. | +| `coder::info` | Discover the jail: canonical allowed roots, per-file size caps, response budgets, listing/search limits, `default_exclude_globs`, and non-accessible glob patterns. Call first when unsure where coder may read or write. | +| `coder::read-file` | Read a single file: `stat: true` probe (size + `total_lines`, no content), a streamed `line_from`/`line_to` window (`numbered: true` prefixes absolute line numbers), or a full read budgeted by `max_output_bytes`. Batch mode: pass `paths[]` to read multiple files in one call against a shared `batch_read_budget_bytes` cap; batch `stat` probes are budget-free. | +| `coder::search` | Search file contents (literal/regex) and/or paths inside the allowed roots. 
`context_lines_before`/`context_lines_after` (≤10) attach surrounding lines per match; responses are bounded by `max_matches` and `search_response_budget_bytes` (`truncated: true`, never an error). Noise dirs are skipped by default (`default_exclude_globs`). | +| `coder::update-file` | Apply batched `insert` / `remove` / `update_lines` / regex `replace` (`dot_matches_newline`, `expect_matches`) ops across one or more files. Line ops bottom-up; per-file atomic. Each applied op returns a bounded post-apply echo. | | `coder::create-file` | Create one or more files with `overwrite` and `parents` flags. | | `coder::delete-file` | Remove one or more paths; `recursive: true` required for non-empty dirs. | -| `coder::list-folder` | Paginated single-folder listing; non-accessible entries flagged. | -| `coder::tree` | Recursive snapshot bounded by `max_depth` and `per_folder_limit`. | +| `coder::move` | Move or rename one or more paths; same-root moves are per-file atomic; cross-root file moves use copy+delete with rollback. | +| `coder::list-folder` | Paginated single-folder listing; entries carry `name` only — join onto the response's top-level `path`; non-accessible entries flagged. | +| `coder::tree` | Recursive snapshot bounded by `max_depth` and `per_folder_limit`. Nodes carry `name` only (child path = parent path + "/" + name; the root node's path is the response's top-level `path`). Noise dirs surface as childless `truncated` stubs; pass `use_default_excludes: false` to look inside. | ### `coder::update-file` semantics @@ -103,9 +105,25 @@ Line ops (`insert`, `remove`, `update_lines`) use **1-based inclusive** line numbers and are applied **bottom-up** (highest affected line first), so each op still references the original line numbers from the caller's perspective. Overlapping line ops are rejected (`C210`). -Regex `replace` ops run after line ops on the full file body. 
The -whole batch is committed via a sibling temp file + rename, so a failure -mid-write leaves the original file intact. +Regex `replace` ops run after line ops on the full file body; +`dot_matches_newline: true` lets `.` span newlines (multi-line regions +via two short anchors joined by `.*?`), and `expect_matches: N` is a +pre-write guard that fails the entry with `C210` when the actual match +count differs (`0` asserts absence). `replacement` expands capture +references (`$1`, `$name`, `${name}`); a literal `$` must be written +`$$` (JS/TS template literals in a replacement are the classic +collision: `` `Hello, $${name}!` `` outputs `` `Hello, ${name}!` ``), +and a reference to a group the pattern does not define fails pre-write +with `C210` — nothing is written. The whole batch is committed via a +sibling temp file + rename, so a failure mid-write leaves the original +file intact. + +On success, each applied line op echoes a bounded post-apply window +(±2 context lines). Regex replace ops return up to 5 per-match-site +echoes, each showing the first and last line of its post-replace +region (inner line count reported via `elided`). An +`echoes_truncated` flag is set when the budget is exhausted before all +echoes could be returned. ```jsonc { @@ -123,27 +141,84 @@ mid-write leaves the original file intact. ### Error codes -All errors return as JSON strings of the form `{"code":"C2xx","message":"..."}`. +All errors return as JSON objects of the form `{"code":"C2xx","message":"..."}`. 
| Code | Meaning | |---|---| -| `C210` | Bad input (malformed payload, illegal line numbers, overlapping ops, absolute path, …) | +| `C210` | Bad input (malformed payload, illegal line numbers, overlapping ops, `expect_matches` mismatch, undefined `$` capture reference in a replacement, conflicting config keys, …) | | `C211` | Path not found OR matches a `non_accessible_globs` entry | -| `C213` | File exceeds `max_read_bytes` or `max_write_bytes` | -| `C215` | Path escapes `base_path` lexically or through a symlink | +| `C213` | File exceeds `max_read_bytes` or `max_write_bytes`; full read exceeds `max_output_bytes` (carries size/lines/recovery); batch budget exhausted | +| `C215` | Path is outside every allowed root (lexically or through a symlink); names all roots + recovery | | `C216` | Underlying I/O error | -| `C217` | `coder::create-file` saw an existing file with `overwrite=false` | +| `C217` | `coder::create-file` or `coder::move` saw an existing file with `overwrite=false` | ## Configuration +As of 0.5.0 coder's runtime config lives in the **`configuration` worker** +under id **`coder`** (the same pattern `database` and `storage` use). At boot +coder registers its JSON Schema, reads the live value via `configuration::get` +(the configuration worker env-expands `${VAR}`), and binds a `configuration` +trigger so it re-fetches on change. Persisted values default to +`./data/configuration/coder.yaml` — edit that file directly or call +`configuration::set id=coder`; both propagate without re-reading the seed file. + +```bash +iii trigger configuration::get id=coder +iii trigger configuration::set id=coder value='{"base_paths":["/srv/project"],"max_read_bytes":20971520}' +``` + +### `--config` is a first-register seed + +Pass `--config ` to supply a YAML seed file. When present AND no value is +yet stored for id `coder`, its contents are passed as `initial_value` on +`configuration::register`. 
After that first register the stored value is +authoritative — re-running with `--config` does NOT overwrite it. With no seed +and no stored value coder runs on the built-in default jail (`base_paths` +`["./", "/tmp"]`, **no** `non_accessible_globs` — seed the shipped +[`config.yaml`](config.yaml) to keep secret-file protection). See +[`config.yaml.example`](config.yaml.example). `${VAR}` placeholders are expanded +only when the seed file is read; `configuration::get` values are already +expanded, so they are never expanded twice. + +### Reload policy + +coder splits its keys the way `storage` splits its topology — the security +**jail** is restart-only, the numeric tuning knobs hot-reload: + +- **JAIL fields — RESTART-REQUIRED**: `base_paths` (and legacy `base_path`), + `non_accessible_globs`, `default_exclude_globs`. These four are everything + the `PathResolver` compiles. On a `configuration:updated` event coder + re-fetches the authoritative value (it never trusts the trigger payload); if + the change alters any jail field it is **refused** — the running jail is kept + and coder logs `restart coder to apply`. The `PathResolver` is built once at + boot and is never rebuilt at runtime. `coder::info` keeps reporting the + boot-time roots until restart. +- **HOT-RELOADABLE — numeric budgets/limits**: `max_read_bytes`, + `max_write_bytes`, `tree_default_depth`, `tree_per_folder_limit`, + `list_default_page_size`, `list_max_page_size`, `search_default_max_matches`, + `search_default_max_line_bytes`, `search_response_budget_bytes`, + `batch_read_budget_bytes`, `max_output_bytes`. When the jail signature is + unchanged, coder swaps the config snapshot live and handlers read the current + snapshot on their next call. Invalid configs are rejected and the previous + snapshot is kept. 
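The reload policy above boils down to comparing a "jail signature" before swapping the snapshot. The sketch below is a minimal model of that decision, assuming a simplified config struct: the field subset, `jail_signature`, `apply_update`, and `ReloadOutcome` are hypothetical names chosen for illustration, and validation of the incoming config is left out.

```rust
/// Simplified config snapshot: three jail fields plus two numeric knobs.
/// The real config has more knobs; this subset is an assumption.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct CoderConfig {
    pub base_paths: Vec<String>,
    pub non_accessible_globs: Vec<String>,
    pub default_exclude_globs: Vec<String>,
    pub max_read_bytes: u64,
    pub max_output_bytes: u64,
}

impl CoderConfig {
    /// The restart-only portion of the config, borrowed for comparison.
    pub fn jail_signature(&self) -> (&[String], &[String], &[String]) {
        (
            self.base_paths.as_slice(),
            self.non_accessible_globs.as_slice(),
            self.default_exclude_globs.as_slice(),
        )
    }
}

/// Result of handling one config-updated event (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
pub enum ReloadOutcome {
    /// Jail unchanged: the live snapshot was replaced.
    Swapped,
    /// Jail changed: the running config is kept until restart.
    RefusedRestartRequired,
}

/// Swap in `incoming` only when it leaves every jail field untouched.
pub fn apply_update(live: &mut CoderConfig, incoming: CoderConfig) -> ReloadOutcome {
    if live.jail_signature() != incoming.jail_signature() {
        // A jail change is refused as a whole, even if knobs changed too.
        return ReloadOutcome::RefusedRestartRequired;
    }
    *live = incoming;
    ReloadOutcome::Swapped
}
```

Refusing the whole update when the jail differs keeps the running snapshot consistent with the boot-time resolver; a caller would log the restart hint on `RefusedRestartRequired`.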
+ +### Value shape + ```yaml -base_path: ./ # root every coder::* call is scoped under +base_paths: ["./", "/tmp"] # allowed roots; first entry is the primary root non_accessible_globs: # listable but unreadable/unwritable - "**/.env" - "**/.env.*" - "**/*.pem" - "**/*.key" - "**/secrets/**" +default_exclude_globs: # noise filter for tree/search (hide-only) + - "**/.git/**" + - "**/node_modules/**" + - "**/target/**" + - "**/dist/**" + - "**/.venv/**" + - "**/__pycache__/**" max_read_bytes: 10485760 # per-file read cap (10 MiB) max_write_bytes: 10485760 # per-file create/update cap (10 MiB) tree_default_depth: 4 # coder::tree depth when unset @@ -152,26 +227,72 @@ list_default_page_size: 100 # coder::list-folder default page s list_max_page_size: 1000 # hard cap on coder::list-folder page_size search_default_max_matches: 1000 # coder::search match cap search_default_max_line_bytes: 4096 # per-line cap when scanning content +search_response_budget_bytes: 262144 # byte budget per search response (256 KiB) +batch_read_budget_bytes: 1048576 # aggregate cap for paths[] batch reads (1 MiB) +max_output_bytes: 131072 # budget for single-path FULL reads (128 KiB) ``` +`base_paths` is a list of allowed roots. The first entry is the +**primary root**: relative wire paths resolve against it. Absolute wire +paths are accepted when they canonicalize inside any listed root; outside +all roots → `C215`. Default when neither `base_paths` nor the legacy +`base_path` is set: `["./", "/tmp"]`. Legacy `base_path: ` is still +honored as a one-entry list; setting both keys is a startup error +(`C210`). + `non_accessible_globs` uses the same syntax as the `globset` crate (so `**/`, `*`, `?`, character classes, …). Matching is done against the -*relative path* from `base_path`, so `**/.env` blocks `.env`, -`a/.env`, and `a/b/.env`. +path *relative to its containing root*, so `**/.env` blocks `.env`, +`a/.env`, and `a/b/.env` in every allowed root. 
+ +`default_exclude_globs` (same syntax and root-relative matching as +`non_accessible_globs`) is a **hide-only** noise filter: `coder::tree` +skips descent into matching directories (they appear as childless nodes +flagged `truncated` with reason `"default_exclude"`) and `coder::search` +omits matching paths. It grants no access protection — that remains +`non_accessible_globs`. Callers opt out per call with +`use_default_excludes: false`. Default: `.git`, `node_modules`, +`target`, `dist`, `.venv`, `__pycache__`. + +`search_response_budget_bytes` bounds one `coder::search` response in +converted wire bytes (path + matched text + context lines). When the +next match would exceed the budget the search stops accumulating and +sets `truncated: true` — it degrades, it never errors. Default 256 KiB. + +`max_output_bytes` budgets single-path **full** reads in +`coder::read-file` (bytes of returned content after UTF-8 sanitization, +numbered prefixes included). Oversize full reads fail with a `C213` +carrying the file's size, line count, and recovery calls; callers can +raise it per call via the `max_output_bytes` request field (clamped to +`max_read_bytes`) or switch to `line_from`/`line_to` windows. Windowed +reads and batch mode (`batch_read_budget_bytes`) are not governed by +this key. Default 128 KiB. + +## Instrumentation + +Count C2xx errors by code and function over time with `scripts/error-frequency.py` — accepts session export `.md` files or queries the live engine directly. See the script header for usage and the baseline (session vqrfg31f, 3× C210 → tool abandonment, pre-0.3.0). ## Security boundary -- `base_path` is canonicalised at startup; the worker refuses to start - if it can't be reached. -- Every wire path must be **relative** to `base_path`; absolute paths - return `C210` rather than being silently re-jailed. +- Each allowed root is canonicalized at startup. 
Unreachable roots are + skipped with a warning; if zero roots remain the worker refuses to + start. The final canonical root list is logged at startup. +- Every wire path must be **relative** (resolves against the primary + root and must stay inside it → `C215`) or **absolute inside an + allowed root** (accepted as of 0.3.0 → `C215` if outside every root). - `..` and symlinks are resolved against the longest existing ancestor - and rejected if they leave `base_path` (`C215`). Dangling symlinks - in the tail are also rejected because the kernel would otherwise - follow them on the next syscall. + and rejected if they leave the containing root (`C215`). Dangling + symlinks in the tail are also rejected. - Non-accessible globs apply to reads as well as writes — the same glob hides the file from `coder::read-file`, `coder::update-file`, `coder::create-file`, `coder::delete-file`, and from `coder::search`'s content/path matches. - Recursive `coder::delete-file` refuses to descend through a subtree - that contains a non-accessible entry rather than removing it. + that contains a non-accessible entry. +- All responses carry canonical absolute paths so multi-root results + are unambiguous. +- **`/tmp` posture**: `/tmp` is world-writable and shared. It is + included in the default `base_paths` so agent tasks targeting `/tmp` + work out of the box on a trusted bus. Operators on multi-tenant hosts + should remove it and configure only the project root(s) they intend + to expose. diff --git a/coder/config.collect.yaml b/coder/config.collect.yaml new file mode 100644 index 00000000..3cfa9aa4 --- /dev/null +++ b/coder/config.collect.yaml @@ -0,0 +1,21 @@ +# Interface-collection config for the registry publish workflow. +# +# The publish job (.github/workflows/_publish-registry.yml) boots a throwaway +# copy of this worker purely to read back the functions and trigger types it +# registers with the engine — the published "interface". 
That interface is +# static: it does NOT depend on which roots coder is jailed to. +# +# The shipped config.yaml jails coder to ["./", "/tmp"]. On a fresh CI runner +# the checkout root happens to exist, but the publish job should not depend on +# the layout of whatever directory it runs in — coder refuses to start if zero +# roots canonicalize, and we do not want interface collection to hinge on the +# runner's working directory. +# +# This config jails the throwaway worker to "/tmp" only — a directory that +# always exists on every CI runner. The PathResolver canonicalizes it +# instantly, the worker registers its 9 functions plus the configuration +# trigger, interface collection completes, and the publish step runs normally. +# No file is ever read or written during collection, so a minimal single-root +# seed is sufficient and safe. + +base_paths: ["/tmp"] diff --git a/coder/config.yaml b/coder/config.yaml index a545ca34..eefefc6b 100644 --- a/coder/config.yaml +++ b/coder/config.yaml @@ -1,12 +1,53 @@ -# Root directory the worker operates inside. Every path argument in the -# coder::* API is resolved relative to this directory; absolute paths and -# `..` segments that would escape are rejected with C215. Required; the -# worker refuses to start if this can't be canonicalized. -base_path: ./ - -# Glob patterns (matched against the path relative to base_path) for files -# the worker will refuse to read, modify, or delete. Matching entries are -# still visible to coder::list-folder and coder::tree with a +# SEED FILE — NOT the runtime source of truth. +# +# As of 0.5.0 coder's runtime config lives in the `configuration` worker +# under id "coder". This file is only a SEED: when coder is launched with +# `--config ` AND no value is yet stored for id "coder", its +# contents are passed as `initial_value` on `configuration::register` (the +# configuration worker env-expands ${VAR}). 
After that first register the +# stored value is authoritative; edit it with `configuration::set id=coder` +# (or by editing the configuration worker's persisted file) and coder picks +# up the change without re-reading this file. Re-running with `--config` +# does NOT overwrite an already-stored value. +# +# RELOAD POLICY (see README "Configuration"): the JAIL fields below +# (base_paths / base_path, non_accessible_globs, default_exclude_globs) are +# RESTART-REQUIRED — a live config change that alters any of them is refused +# and the running jail is kept (log: "restart coder to apply"). The numeric +# budgets/limits below HOT-RELOAD on `configuration:updated`. +# +# See config.yaml.example for the seed-on-first-register contract. + +# Allowed root directories the worker operates inside. The FIRST entry is +# the primary root: relative wire paths resolve against it. Absolute wire +# paths are accepted when they canonicalize inside ANY listed root; a path +# outside every root is rejected with C215 naming the roots. +# +# Legacy single-root form `base_path: ` is still honored as a +# one-entry list. Setting BOTH `base_path` and `base_paths` is a startup +# error (C210); keep only `base_paths`. +# +# Default when neither key is set: ["./", "/tmp"] +# +# Startup behavior: +# - Each root is canonicalized; unreachable roots are skipped with a +# warning logged at startup. +# - If zero roots remain after skipping, the worker refuses to start. +# - Duplicate roots (post-canonicalization) are deduplicated with a +# warning. +# - The final canonical root list is logged at startup. +# +# SECURITY POSTURE — /tmp in the default list: +# /tmp is a world-writable shared directory. It is included in the +# default so agent tasks that target /tmp work out of the box on a +# trusted bus. Operators on multi-tenant hosts should remove /tmp from +# base_paths and configure only the project root(s) they intend to +# expose. 
+base_paths: ["./", "/tmp"] + +# Glob patterns (matched against the path relative to its containing root) +# for files the worker will refuse to read, modify, or delete. Matching +# entries are still visible to coder::list-folder and coder::tree with a # `non_accessible: true` flag so callers can see they exist. non_accessible_globs: - "**/.env" @@ -15,6 +56,21 @@ non_accessible_globs: - "**/*.key" - "**/secrets/**" +# Noise-exclusion globs (same syntax and root-relative matching as +# non_accessible_globs). coder::tree skips descent into matching directories +# (they appear as childless nodes flagged `truncated` with reason +# "default_exclude") and coder::search omits matching paths. HIDE-ONLY — this +# grants no access protection; that remains non_accessible_globs. Callers can +# opt out per call with `use_default_excludes: false`; coder::info reports +# the active list. +default_exclude_globs: + - "**/.git/**" + - "**/node_modules/**" + - "**/target/**" + - "**/dist/**" + - "**/.venv/**" + - "**/__pycache__/**" + # Per-file caps. read returns C213 when a file exceeds max_read_bytes; # create/update reject payloads larger than max_write_bytes. max_read_bytes: 10485760 @@ -34,3 +90,36 @@ list_max_page_size: 1000 # coder::search caps. truncated:true is set whenever either cap is hit. search_default_max_matches: 1000 search_default_max_line_bytes: 4096 + +# Aggregate byte budget for one coder::search response, measured in CONVERTED +# WIRE BYTES at accumulation time — the strings that will actually be +# serialized (path + matched text + context lines for content matches; path +# for path matches). When the next match would exceed the budget the search +# stops accumulating and sets truncated:true — it degrades, it never errors. +# Default: 256 KiB (262_144 bytes). 
+search_response_budget_bytes: 262144 + +# Aggregate budget for a single paths[] batch call to coder::read-file, +# measured in BYTES OF RETURNED CONTENT (after UTF-8 sanitization: invalid +# bytes expand to 3-byte U+FFFD replacements BEFORE they are counted, so the +# cap bounds what the caller actually receives — binary files can never +# deliver more than this budget). Entries are collected in request order; +# each entry may consume up to min(remaining_budget, max_read_bytes). An +# entry reached with zero remaining budget gets a per-entry C213 naming this +# key, its value, and the bytes already consumed, plus recovery guidance +# (use line_from/line_to windows, fewer entries, or raise this cap). +# Single-path FULL reads are budgeted by max_output_bytes instead and are +# unaffected by this cap. +# Default: 1 MiB (1_048_576 bytes). +batch_read_budget_bytes: 1048576 + +# Budget for single-path FULL reads in coder::read-file (no line_from/ +# line_to, stat:false), measured in BYTES OF RETURNED CONTENT after UTF-8 +# sanitization, numbered prefixes included — the same accounting unit as +# batch_read_budget_bytes. A full read whose converted content would exceed +# this budget fails with a C213 reporting the file's size and line count and +# naming the recovery calls (line_from/line_to window, stat:true probe, or a +# per-call max_output_bytes raise, clamped to max_read_bytes). Windowed reads +# and batch mode are NOT governed by this key. +# Default: 128 KiB (131_072 bytes). +max_output_bytes: 131072 diff --git a/coder/config.yaml.example b/coder/config.yaml.example new file mode 100644 index 00000000..760f9a68 --- /dev/null +++ b/coder/config.yaml.example @@ -0,0 +1,64 @@ +# Optional seed file for first-time registration (--config ./config.yaml.example). 
+# When omitted, the worker seeds the built-in default jail: +# +# base_paths: ["./", "/tmp"] # effective default — empty in CoderConfig, +# # materialized by the PathResolver +# (no non_accessible_globs, no default_exclude_globs, and the numeric +# budget/limit defaults baked into the binary) +# +# NOTE: the built-in default has NO non_accessible_globs, so a default-seeded +# coder gives NO secret-file protection. Seed the shipped config.yaml (which +# lists .env / *.pem / *.key / secrets/**) to keep that posture. +# +# After the first boot, the runtime source of truth is the `configuration` +# worker entry `coder`, persisted at ./data/configuration/coder.yaml. Edit +# that file directly or call `configuration::set id=coder` — both propagate +# without re-reading this seed file. Re-running with `--config` does NOT +# overwrite an already-stored value. +# +# Env placeholders (${VAR}) in THIS seed file are expanded once, when the +# file is read for registration. Values returned by `configuration::get` are +# already env-expanded by the configuration worker, so they are never +# expanded twice. +# +# RELOAD POLICY (see README "Configuration"): +# - JAIL fields (base_paths / base_path, non_accessible_globs, +# default_exclude_globs) are RESTART-REQUIRED. A live change that alters +# any of them is refused; the running jail is kept (log: "restart coder +# to apply"). The PathResolver is built once at boot and never swapped. +# - The numeric budgets/limits HOT-RELOAD on `configuration:updated`. + +# Allowed roots. First entry is the primary root (relative wire paths resolve +# against it). RESTART-REQUIRED. +base_paths: ["./", "/tmp"] + +# Listable but unreadable/unwritable. RESTART-REQUIRED. +non_accessible_globs: + - "**/.env" + - "**/.env.*" + - "**/*.pem" + - "**/*.key" + - "**/secrets/**" + +# Hide-only noise filter for tree/search (no access protection). +# RESTART-REQUIRED. 
+default_exclude_globs: + - "**/.git/**" + - "**/node_modules/**" + - "**/target/**" + - "**/dist/**" + - "**/.venv/**" + - "**/__pycache__/**" + +# Numeric tuning knobs below — these HOT-RELOAD on configuration:updated. +max_read_bytes: 10485760 # per-file read cap (10 MiB) +max_write_bytes: 10485760 # per-file create/update cap (10 MiB) +tree_default_depth: 4 # coder::tree depth when unset +tree_per_folder_limit: 50 # children before tree truncates a folder +list_default_page_size: 100 # coder::list-folder default page size +list_max_page_size: 1000 # hard cap on coder::list-folder page_size +search_default_max_matches: 1000 # coder::search match cap +search_default_max_line_bytes: 4096 # per-line cap when scanning content +search_response_budget_bytes: 262144 # byte budget per search response (256 KiB) +batch_read_budget_bytes: 1048576 # aggregate cap for paths[] batch reads (1 MiB) +max_output_bytes: 131072 # budget for single-path FULL reads (128 KiB) diff --git a/coder/iii.worker.yaml b/coder/iii.worker.yaml index d1f1f94f..d73ec341 100644 --- a/coder/iii.worker.yaml +++ b/coder/iii.worker.yaml @@ -4,7 +4,7 @@ language: rust deploy: binary manifest: Cargo.toml bin: coder -description: Path-jailed code worker — read/search/update/create/delete files plus paginated list-folder and tree, with non-accessible glob protection. +description: Path-jailed code worker — info/read/search/update/create/delete/move files plus paginated list-folder and tree, with non-accessible glob protection, default noise excludes (.git, node_modules, … — hide-only, per-call opt-out), and token-bounded read/search response budgets. targets: - x86_64-apple-darwin diff --git a/coder/scripts/error-frequency.py b/coder/scripts/error-frequency.py new file mode 100755 index 00000000..2cc58226 --- /dev/null +++ b/coder/scripts/error-frequency.py @@ -0,0 +1,518 @@ +#!/usr/bin/env python3 +""" +coder error-frequency instrumentation — counts C2xx errors by (code, function_id). 
+ +PURPOSE +------- +Continuous signal that coder 0.3.0 DX fixes hold. The target trend: C210 +rate on coder calls approaches zero for post-0.3.0 sessions; C211/C213 are +expected-recoverable and some recurrence is normal. + +BASELINE (pre-0.3.0) +-------------------- +Session vqrfg31f (2026-06-09): agent's first 3 coder::create-file calls all +returned C210 ("path must be relative to base_path: /tmp/..."). The agent +abandoned the tool entirely and fell back to shell::exec to write files. +Root cause: no absolute-path support and no prescriptive guidance — the error +gave no recovery instruction. Fixed in 0.3.0 (absolute paths now accepted when +inside an allowed root; C215 names all roots; C210 is structured + prescriptive). + +ERROR SHAPES HANDLED +-------------------- +Structured (0.3.0+): + entry.error = {"code": "C210", "message": "..."} # dict already parsed + entry.error = '{"code":"C210","message":"..."}' # JSON string + +Legacy (pre-0.3.0): + entry.error = "path must be relative to base_path: ..." # bare string, no code + +SUCCESS METRIC +-------------- + post-0.3.0 C210 rate on coder:: calls → 0 + C211 / C213 expected-recoverable; watch for spikes + +USAGE +----- +# Live engine (requires `iii` CLI + running engine): + python3 scripts/error-frequency.py --live [--sessions N] + +# Session export markdown files: + python3 scripts/error-frequency.py ~/Downloads/iii-session-*.md + +# Both: + python3 scripts/error-frequency.py --live ~/Downloads/iii-session-*.md + +# Built-in self-test (no external deps): + python3 scripts/error-frequency.py --self-test + +# Filter to one session via live engine: + python3 scripts/error-frequency.py --live --session-id console-vqrfg31f4zemq74s5ia + +LIMITATIONS +----------- +Live-mode attribution uses the last-seen coder function from the preceding +assistant message — session-tree function_result parts carry no call_id — so +two different coder::* functions dispatched in the same turn may be +misattributed to each other. 
Counts per code remain correct either way. + +SESSION-TREE LIVE-MODE RECIPE (for future automation) +------------------------------------------------------ + iii trigger session-tree::list + iii trigger session-tree::messages --json '{"session_id":""}' +Messages are returned as [{entry_id, message: {content: [{text, type}]}}]. +The `text` field is a JSON string; parse it to reach .results[].error. +The session-tree worker is "session" in the engine's worker list. +""" + +from __future__ import annotations + +import argparse +import json +import re +import subprocess +import sys +from collections import defaultdict +from typing import Iterator + +# --------------------------------------------------------------------------- +# Error extraction helpers +# --------------------------------------------------------------------------- + +_C2XX_RE = re.compile(r'"C2\d{2}"') + + +def _parse_error(raw_error) -> tuple[str, str]: + """Return (code, message) from any error shape coder has ever emitted.""" + if raw_error is None: + return ("?", "") + + # Already a dict — structured 0.3.0 shape + if isinstance(raw_error, dict): + return (str(raw_error.get("code", "?")), str(raw_error.get("message", ""))) + + if not isinstance(raw_error, str): + return ("?", str(raw_error)) + + s = raw_error.strip() + if not s: + return ("?", "") + + # JSON string containing structured error + if s.startswith("{"): + try: + obj = json.loads(s) + if isinstance(obj, dict) and "code" in obj: + return (str(obj["code"]), str(obj.get("message", ""))) + except json.JSONDecodeError: + pass + + # Legacy bare string — classify as LEGACY + return ("LEGACY", s[:120]) + + +def _extract_from_results(results, function_id: str) -> list[dict]: + """Walk results[] or files[] arrays; yield {code, message, function_id}.""" + hits = [] + if not isinstance(results, list): + return hits + for entry in results: + if not isinstance(entry, dict): + continue + err = entry.get("error") or entry.get("err") + if not err: + 
continue + code, msg = _parse_error(err) + if code.startswith("C2") or code == "LEGACY": + hits.append({"code": code, "message": msg, "function_id": function_id}) + return hits + + +def _try_parse_text(text: str) -> dict | list | None: + """Best-effort: parse a possibly-double-encoded JSON text field.""" + if not isinstance(text, str) or not text.strip(): + return None + t = text.strip() + try: + obj = json.loads(t) + if isinstance(obj, str): + # Double-encoded — parse once more + try: + obj = json.loads(obj) + except json.JSONDecodeError: + pass + return obj + except json.JSONDecodeError: + return None + + +# --------------------------------------------------------------------------- +# Source: live engine via `iii trigger` +# --------------------------------------------------------------------------- + +def _iii(*args: str, timeout: int = 30) -> dict | list | None: + """Run `iii trigger ` and return parsed JSON, or None on failure.""" + cmd = ["iii", "trigger", *args] + try: + result = subprocess.run( + cmd, capture_output=True, text=True, timeout=timeout + ) + except (FileNotFoundError, subprocess.TimeoutExpired) as exc: + print(f"[warn] iii CLI unavailable: {exc}", file=sys.stderr) + return None + + if result.returncode != 0: + print(f"[warn] iii trigger {args[0]} failed: {result.stderr[:200]}", file=sys.stderr) + return None + + try: + return json.loads(result.stdout) + except json.JSONDecodeError: + print(f"[warn] unparseable iii output for {args[0]}", file=sys.stderr) + return None + + +def _live_sessions(limit: int | None = None) -> list[dict]: + data = _iii("session-tree::list") + if not data or not isinstance(data, dict): + return [] + sessions = data.get("sessions", []) + if limit: + sessions = sessions[:limit] + return sessions + + +def _live_messages(session_id: str) -> list[dict]: + data = _iii("session-tree::messages", "--json", json.dumps({"session_id": session_id})) + if not data or not isinstance(data, dict): + return [] + return 
data.get("messages", []) + + +def _hits_from_live_messages(messages: list[dict], session_id: str) -> Iterator[dict]: + """Extract C2xx coder errors from a session-tree::messages response. + + Message ordering: assistant messages carry function_call parts that name the + target function (e.g. coder::create-file). The immediately following + function_result message carries the output, but its parts carry no call_id, + so each result is attributed to the most recently dispatched coder::* + function (see LIMITATIONS in the module docstring). + """ + # Map: call_id → target function_id, built from assistant messages. + # Recorded for future use only: result parts carry no call_id to look up. + call_to_fn: dict[str, str] = {} + # Most recently dispatched coder function (the attribution actually used) + last_coder_fn: str = "coder::?" + + for m in messages: + msg = m.get("message", {}) + role = msg.get("role", "") + content = msg.get("content", []) + if not isinstance(content, list): + continue + + if role == "assistant": + # Record function_call targets for future result attribution + for part in content: + if not isinstance(part, dict): + continue + if part.get("type") != "function_call": + continue + call_id = part.get("id", "") + # Guard: arguments may be missing or not a dict + arguments = part.get("arguments") + target = arguments.get("function", "") if isinstance(arguments, dict) else "" + if call_id: + call_to_fn[call_id] = target + if target.startswith("coder::"): + last_coder_fn = target + + elif role == "function_result": + for part in content: + if not isinstance(part, dict): + continue + if part.get("type") != "text": + continue + obj = _try_parse_text(part.get("text", "")) + if not isinstance(obj, dict): + continue + + # Infer which coder function produced this result. + # function_result messages in session-tree don't carry a call_id + # at the part level; use last_coder_fn as the best attribution.
+ fn_id = last_coder_fn + + for key in ("results", "files"): + for h in _extract_from_results(obj.get(key, []), fn_id): + h["session_id"] = session_id + yield h + + +def scan_live(session_ids: list[str] | None = None, limit: int | None = None) -> list[dict]: + """Query live engine; return list of {code, function_id, session_id} dicts.""" + if session_ids: + sessions = [{"session_id": sid} for sid in session_ids] + else: + sessions = _live_sessions(limit=limit) + + if not sessions: + print("[warn] no sessions returned from live engine", file=sys.stderr) + return [] + + all_hits = [] + for s in sessions: + sid = s.get("session_id", "") + messages = _live_messages(sid) + for h in _hits_from_live_messages(messages, sid): + all_hits.append(h) + + return all_hits + + +# --------------------------------------------------------------------------- +# Source: session export markdown files +# --------------------------------------------------------------------------- + +_TOOL_CALL_RE = re.compile(r"^##\s+Tool call\s+[—–-]\s+([\w::-]+)", re.MULTILINE) +_OUTPUT_BLOCK_RE = re.compile( + r"\*\*Output:\*\*\s*```(?:json)?\s*([\s\S]*?)```", re.MULTILINE +) + + +def _parse_markdown(path: str) -> Iterator[dict]: + """Yield {code, message, function_id, source_file} from a session export .md.""" + try: + with open(path, encoding="utf-8") as f: + text = f.read() + except OSError as exc: + print(f"[warn] cannot read {path}: {exc}", file=sys.stderr) + return + + # Split into sections by tool call headings; pair each with its output block + sections = list(_TOOL_CALL_RE.finditer(text)) + for i, match in enumerate(sections): + fn_id = match.group(1) + if not fn_id.startswith("coder::"): + continue + + # Region: from end of this heading to start of next (or EOF) + start = match.end() + end = sections[i + 1].start() if i + 1 < len(sections) else len(text) + section_text = text[start:end] + + for out_match in _OUTPUT_BLOCK_RE.finditer(section_text): + block = out_match.group(1).strip() + obj = 
_try_parse_text(block) + if not isinstance(obj, dict): + continue + + # Output block may be the outer {content:[{text:"JSON"}]} envelope + # or already the inner {results:[]} dict — handle both + inner = obj + content = obj.get("content") + if isinstance(content, list): + for item in content: + if isinstance(item, dict) and item.get("type") == "text": + parsed = _try_parse_text(item.get("text", "")) + if isinstance(parsed, dict): + inner = parsed + break + + # Prefer explicit .details over inner (avoids double-counting when + # the outer envelope has both content[].text and .details keys). + # Fall back to inner dict if no .details key is present. + details = obj.get("details") + if isinstance(details, dict): + source = details + else: + source = inner + + for key in ("results", "files"): + for h in _extract_from_results(source.get(key, []), fn_id): + h["source_file"] = path + yield h + + +def scan_files(paths: list[str]) -> list[dict]: + hits = [] + for p in paths: + for h in _parse_markdown(p): + hits.append(h) + return hits + + +# --------------------------------------------------------------------------- +# Reporting +# --------------------------------------------------------------------------- + +def _tabulate(hits: list[dict], source_label: str) -> None: + if not hits: + print(f"\n[{source_label}] no C2xx errors found\n") + return + + counts: dict[tuple[str, str], int] = defaultdict(int) + for h in hits: + counts[(h["code"], h["function_id"])] += 1 + + total = sum(counts.values()) + print(f"\n[{source_label}] {total} C2xx error(s) across {len(hits)} entries") + print(f" {'CODE':<8} {'FUNCTION':<30} COUNT") + print(f" {'-'*8} {'-'*30} -----") + for (code, fn), cnt in sorted(counts.items(), key=lambda x: (-x[1], x[0])): + print(f" {code:<8} {fn:<30} {cnt}") + print() + + +# --------------------------------------------------------------------------- +# Self-test +# --------------------------------------------------------------------------- + +SELF_TEST_MD = """\ 
+# Session: self-test + +- ID: `selftest` + +--- +## Tool call — coder::create-file +**Input:** +```json +{"files": [{"path": "/tmp/x", "content": "hi"}]} +``` +**Output:** +```json +{ + "content": [ + { + "text": "{\\"results\\":[{\\"bytes_written\\":0,\\"error\\":\\"{\\\\\\"code\\\\\\":\\\\\\"C210\\\\\\",\\\\\\"message\\\\\\":\\\\\\"path must be relative to base_path: /tmp/x\\\\\\"}\\",\\"path\\":\\"/tmp/x\\",\\"success\\":false},{\\"bytes_written\\":0,\\"error\\":\\"{\\\\\\"code\\\\\\":\\\\\\"C210\\\\\\",\\\\\\"message\\\\\\":\\\\\\"path must be relative to base_path: /tmp/y\\\\\\"}\\",\\"path\\":\\"/tmp/y\\",\\"success\\":false},{\\"bytes_written\\":0,\\"error\\":\\"{\\\\\\"code\\\\\\":\\\\\\"C210\\\\\\",\\\\\\"message\\\\\\":\\\\\\"path must be relative to base_path: /tmp/z\\\\\\"}\\",\\"path\\":\\"/tmp/z\\",\\"success\\":false}]}", + "type": "text" + } + ], + "details": { + "results": [ + {"bytes_written": 0, "error": "{\\"code\\":\\"C210\\",\\"message\\":\\"path must be relative to base_path: /tmp/x\\"}", "path": "/tmp/x", "success": false}, + {"bytes_written": 0, "error": "{\\"code\\":\\"C210\\",\\"message\\":\\"path must be relative to base_path: /tmp/y\\"}", "path": "/tmp/y", "success": false}, + {"bytes_written": 0, "error": "{\\"code\\":\\"C210\\",\\"message\\":\\"path must be relative to base_path: /tmp/z\\"}", "path": "/tmp/z", "success": false} + ] + }, + "terminate": false +} +``` + +## Tool call — coder::read-file +**Input:** +```json +{"path": "secret.pem"} +``` +**Output:** +```json +{ + "details": { + "results": [ + {"error": {"code": "C211", "message": "non_accessible: secret.pem"}, "path": "secret.pem", "success": false} + ] + } +} +``` +""" + + +def self_test() -> int: + import tempfile, os + + with tempfile.NamedTemporaryFile( + mode="w", suffix=".md", delete=False, encoding="utf-8" + ) as f: + f.write(SELF_TEST_MD) + tmp_path = f.name + + try: + hits = scan_files([tmp_path]) + finally: + os.unlink(tmp_path) + + c210 = [h for h in hits if 
h["code"] == "C210" and h["function_id"] == "coder::create-file"] + c211 = [h for h in hits if h["code"] == "C211" and h["function_id"] == "coder::read-file"] + + ok = True + if len(c210) != 3: + print(f"FAIL: expected 3 C210 on coder::create-file, got {len(c210)}") + print(" hits:", json.dumps(hits, indent=2)) + ok = False + if len(c211) != 1: + print(f"FAIL: expected 1 C211 on coder::read-file, got {len(c211)}") + ok = False + + if ok: + print("PASS: self-test — 3x C210 coder::create-file + 1x C211 coder::read-file detected") + _tabulate(hits, "self-test") + return 0 if ok else 1 + + +# --------------------------------------------------------------------------- +# CLI +# --------------------------------------------------------------------------- + +def main() -> int: + parser = argparse.ArgumentParser( + description="Count coder C2xx errors by (code, function_id) from session exports or live engine.", + formatter_class=argparse.RawDescriptionHelpFormatter, + epilog=__doc__.split("USAGE")[1].split("SESSION-TREE")[0] if "USAGE" in __doc__ else "", + ) + parser.add_argument( + "files", + nargs="*", + metavar="FILE", + help="Session export markdown file(s) (.md) — output of iii session export", + ) + parser.add_argument( + "--live", + action="store_true", + help="Query live engine via `iii trigger session-tree::*`", + ) + parser.add_argument( + "--sessions", + type=int, + default=20, + metavar="N", + help="Max sessions to scan from live engine (default: 20)", + ) + parser.add_argument( + "--session-id", + action="append", + dest="session_ids", + metavar="ID", + help="Scan a specific session id (repeatable; implies --live)", + ) + parser.add_argument( + "--self-test", + action="store_true", + help="Run built-in fixture test and exit (no external deps required)", + ) + args = parser.parse_args() + + if args.self_test: + return self_test() + + all_hits: list[dict] = [] + + if args.files: + file_hits = scan_files(args.files) + all_hits.extend(file_hits) + label = 
f"files: {', '.join(args.files)}" + _tabulate(file_hits, label) + + if args.live or args.session_ids: + live_hits = scan_live( + session_ids=args.session_ids, + limit=args.sessions if not args.session_ids else None, + ) + all_hits.extend(live_hits) + _tabulate(live_hits, "live engine") + + if not args.files and not args.live and not args.session_ids: + parser.print_help() + return 1 + + if args.files and (args.live or args.session_ids): + _tabulate(all_hits, "TOTAL (files + live)") + + return 0 + + +if __name__ == "__main__": + sys.exit(main()) diff --git a/coder/skills/SKILL.md b/coder/skills/SKILL.md index cb7e5dea..c4526b14 100644 --- a/coder/skills/SKILL.md +++ b/coder/skills/SKILL.md @@ -1,18 +1,22 @@ --- name: coder description: >- - Read, search, and edit files inside a path-jailed base_path — structured - filesystem ops for agents, with glob-based secret protection and atomic - per-file writes. + Read, search, and edit files inside path-jailed allowed roots — structured + filesystem ops for agents, with glob-based secret protection and per-file + atomic writes. --- # coder The coder worker is a path-jailed surface for filesystem work. Every `coder::*` -call resolves its `path` argument relative to a single operator-configured -`base_path` and refuses anything that would escape it — absolute inputs, `..` -segments, and crafted symlinks all return an error rather than being silently -re-jailed. A glob-based `non_accessible_globs` list keeps sensitive files +call is scoped to one or more operator-configured allowed roots. Call +`coder::info` first to discover the canonical allowed roots, size caps, +response budgets (`max_output_bytes`, `batch_read_budget_bytes`, +`search_response_budget_bytes`), the `default_exclude_globs` noise filter, and +non-accessible globs. Relative paths resolve against the primary root (index +0); absolute paths are accepted when they canonicalize inside any allowed root +— outside every root returns `C215`. 
`..` segments and escaping symlinks are +also rejected. A glob-based `non_accessible_globs` list keeps sensitive files (`.env`, `*.pem`, anything under `secrets/`) visible to directory listings but unreadable, unwritable, and unsearchable. @@ -20,8 +24,8 @@ The surface covers the whole read-explore-edit cycle: navigate with `coder::tree` and `coder::list-folder`, discover with `coder::search`, inspect with `coder::read-file`, then mutate through the batched `coder::create-file`, `coder::update-file`, and `coder::delete-file`. Add it with `iii worker add -coder`; operator caps on per-file read/write bytes, listing pages, and search -matches live in `config.yaml`. It is filesystem-only and never spawns a process. +coder`; operator caps and budgets live in `config.yaml`. It is filesystem-only +and never spawns a process. ## When to Use @@ -29,17 +33,85 @@ matches live in `config.yaml`. It is filesystem-only and never spawns a process. drill into folders flagged as truncated (`coder::list-folder`). - Find a string, symbol, or TODO across many files by content or path (`coder::search`). -- Read one file's full contents after a search hit (`coder::read-file`). +- Read a file window-first: `stat: true` probe, then just the lines you need + (`coder::read-file` — see Window-first reading). +- Make a targeted edit in TWO calls — search with context, edit directly — no + file read in between (the 2-call edit workflow below). - Scaffold a fresh file or subtree, or rewrite existing source line-by-line (`coder::create-file`, `coder::update-file`). - Remove stale files or directories (`coder::delete-file`). +## The 2-call edit workflow + +1. `coder::search {query, context_lines_before: 3, context_lines_after: 8}` — + the context lines plus 1-based line numbers are usually all you need. +2. Edit with `coder::update-file`: + - Regex `replace` when the region is uniquely anchorable: two short anchors + joined by `.*?`, with `dot_matches_newline: true` and `expect_matches: 1`. 
+ - `update_lines` when it is not (repeated code, nothing distinctive): take + exact coordinates from a `numbered: true` read window first. + +Wildcard economy: replace large regions WITHOUT quoting them — +`"anchor_start.*?anchor_end"` with `dot_matches_newline: true`. ALWAYS prefer +wildcards over pasting the block into the pattern. `expect_matches: 1` turns a +silent multi-site clobber into a safe pre-write error; `expect_matches: 0` +asserts absence. `$` in `replacement` is a capture reference (`$1`, `${name}`) +— write a literal `$` as `$$`; JS/TS template literals are the classic trap +(write `Hello, $${name}!` to output `Hello, ${name}!`), and a reference to a +group the pattern does not define fails pre-write with `C210`. Every applied +op returns bounded post-apply echoes (line ops: region ±2 lines; replace: up +to 5 sites, each showing the first AND last line of its replaced region with +inner lines counted in `elided`) — verify from the echoes instead of +re-reading the file. + +Bulk rename: `coder::search {query: "\bold_name\b", regex: true}` → ONE +`coder::update-file` call with a `\b`-anchored `replace` op per file, +`expect_matches` pinned to that file's count from the search results (in the +JSON wire payload write `\\b` — a lone `\b` is valid JSON but means the +BACKSPACE character, so the pattern silently matches nothing). Omitting +`expect_matches` replaces all matches in the file unconditionally — fine when +the search already showed you every site. + +## Window-first reading + +Never full-read a file you haven't probed. `stat: true` costs ~90 tokens and +returns size plus `total_lines`; then fetch only the window you need with +`line_from`/`line_to`. If `total_lines` > ~400, window it. Full reads larger +than `max_output_bytes` (default 128 KiB) refuse with a C213 that carries the +file's size, line count, and the recovery calls. 
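The probe-then-window rule above can be sketched as a small client-side planner. This is a minimal illustration, not part of coder: `StatProbe`, `plan_read`, and the `span` parameter are hypothetical names, the thresholds are copied from the prose (about 400 lines, 128 KiB default `max_output_bytes`), and comparing raw file size against the budget is only an approximation, because the real budget counts sanitized, optionally numbered output bytes.

```python
from dataclasses import dataclass

# Assumed thresholds, taken from the prose above rather than from coder itself.
WINDOW_LINE_THRESHOLD = 400         # "If total_lines > ~400, window it"
DEFAULT_MAX_OUTPUT_BYTES = 131072   # documented default for max_output_bytes


@dataclass
class StatProbe:
    """Hypothetical container for the two fields a `stat: true` probe returns."""
    size: int
    total_lines: int


def plan_read(probe, line_hint=None, max_output_bytes=DEFAULT_MAX_OUTPUT_BYTES, span=40):
    """Return read-file argument fragments: {} for a full read, else a window."""
    small = (probe.total_lines <= WINDOW_LINE_THRESHOLD
             and probe.size <= max_output_bytes)
    if small:
        # Small file: a full read is expected to fit within the output budget.
        return {}
    total = max(1, probe.total_lines)
    # Center the window on the hint (e.g. a search hit), clamped to the file.
    center = min(max(1, line_hint if line_hint is not None else 1), total)
    line_from = max(1, center - span)
    line_to = min(total, center + span)
    return {"line_from": line_from, "line_to": line_to, "numbered": True}
```

Requesting `numbered: true` for the window keeps the returned coordinates directly usable by `update_lines`. A file with very few but very long lines would still need a different strategy, which this sketch does not attempt.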
+ +- `numbered: true` recipe: a numbered window prefixes each line with its + ABSOLUTE 1-based file line number — the exact coordinates + `coder::update-file` line ops take, so go straight from window to + `update_lines`. Bottom-up application lets multiple ops in one call all use + original line numbers. +- Poor-man's outline: `coder::search {path: "src", regex: true, query: + "^\s*(pub |fn |class |def |func |impl |interface )", include_globs: + ["src/config.rs"]}` returns one file's structure with line numbers — the + `\s*` catches indented declarations (impl/class methods). `path` must be a + folder (scope it tight); the root-relative `include_globs` pins the file. +- Batch economy: batch related reads in ONE `paths[]` call (shared + `batch_read_budget_bytes`, 1 MiB); batch `stat` probes are budget-free; + batched mutators are per-entry isolated, so one bad path never aborts the + rest. + +## Tree and path notes + +- Nodes carry only `name`. The root node's path IS the response's top-level + `path` — start joining at the root's children: child path = parent path + + "/" + name. Same rule for `coder::list-folder` entries. +- Noise dirs (node_modules, .git, target, …) appear as childless `truncated` + stubs in `coder::tree` and are omitted from `coder::search`; pass + `use_default_excludes: false` to look inside. `coder::info` lists the + active globs. + ## Boundaries - Not for running processes — reach for `shell::exec` / `shell::exec_bg` in the `shell` worker to build, test, format, or run git. `coder::*` never shells out. -- Paths must be relative to `base_path`; absolute inputs, `..`, and escaping - symlinks are rejected rather than re-jailed. +- Relative paths resolve against the primary allowed root; absolute paths inside + any allowed root are accepted (as of 0.3.0). `..` and escaping symlinks are + rejected. Use `coder::info` to discover roots when a path is rejected. 
- `non_accessible_globs` blocks reads, writes, searches, and deletes — a denied path is folded with "not found" so callers can't probe for its existence. - Writes fire no engine triggers and emit no events; the only effect is the @@ -49,12 +121,17 @@ matches live in `config.yaml`. It is filesystem-only and never spawns a process. ## Functions -- `coder::tree` — recursive directory snapshot bounded by `max_depth` and a per-folder limit; folders that hit the cap are flagged for paginated drilldown. -- `coder::list-folder` — paginated single-folder listing sorted by name; non-accessible entries are still listed with a `non_accessible: true` flag. -- `coder::search` — literal or regex search over file content and/or paths, with include/exclude globs; non-accessible files are skipped entirely. -- `coder::read-file` — read one file as UTF-8 plus `size` / `mode` / `mtime`, capped by `max_read_bytes`. +- `coder::info` — the jail contract: allowed roots, size caps, budgets, limits, `default_exclude_globs`, and non-accessible globs. Call FIRST when unsure where coder may read or write. +- `coder::tree` — recursive snapshot bounded by `max_depth` and a per-folder limit; slim nodes (`name` only — join rule above); capped folders are flagged for `coder::list-folder` drilldown; noise dirs surface as childless `truncated` stubs. +- `coder::list-folder` — paginated single-folder listing sorted by name; entries carry `name` only; non-accessible entries flagged `non_accessible: true`. +- `coder::search` — literal or regex over content and/or paths with include/exclude globs; `context_lines_before`/`context_lines_after` (≤10) attach surrounding lines; capped by `max_matches` AND `search_response_budget_bytes` — on `truncated: true`, refine the query rather than paginate. 
+- `coder::read-file` — `stat: true` probe, `line_from`/`line_to` windows, `numbered: true` absolute line numbers; full reads budgeted by `max_output_bytes` (C213 carries size/lines/recovery); batch `paths[]` shares `batch_read_budget_bytes`. - `coder::create-file` — batched file creation with per-entry `overwrite` and `parents` flags. -- `coder::update-file` — batched `insert` / `remove` / `update_lines` / regex `replace` ops across one or more files. +- `coder::update-file` — batched `insert` / `remove` / `update_lines` / regex `replace` (`dot_matches_newline`, `expect_matches`) ops; every applied op echoes a bounded post-apply window for read-free verification. - `coder::delete-file` — batched removal; `recursive: true` is required for non-empty directories and missing paths are idempotent successes. +- `coder::move` — batched move/rename with per-entry `overwrite` and `parents` flags; same-root renames are per-file atomic; cross-root file moves use copy+delete with rollback; cross-root directory moves are unsupported. -The batched mutators return one result per input entry so a single bad path never aborts the rest of the call, `coder::update-file` line ops are 1-based and inclusive and applied bottom-up so each op still references the caller's original line numbers, and every file commits atomically via a temp file plus rename. +The batched mutators return one result per input entry so a single bad path +never aborts the rest of the call; line ops are 1-based, inclusive, applied +bottom-up against original line numbers; every file commits per-file atomically +via temp file plus rename. 
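The bottom-up rule for line ops can be sketched as follows. This is a minimal illustration of the general technique under stated assumptions, not coder's implementation: the `(line_from, line_to, replacement_lines)` tuple shape and both function names are hypothetical, and ranges are assumed not to overlap.

```python
def apply_line_ops_bottom_up(lines, ops):
    """Apply 1-based, inclusive line replacements using ORIGINAL line numbers.

    ops: list of (line_from, line_to, replacement_lines) tuples
    (hypothetical shape, not coder's wire format).
    """
    result = list(lines)
    previous_from = None
    # Highest starting line first: editing lower regions of the file never
    # shifts the line numbers of regions above them.
    for line_from, line_to, replacement in sorted(ops, key=lambda op: op[0], reverse=True):
        if not (1 <= line_from <= line_to <= len(lines)):
            raise ValueError(f"range {line_from}-{line_to} is outside the file")
        if previous_from is not None and line_to >= previous_from:
            raise ValueError("overlapping ranges are not supported in this sketch")
        result[line_from - 1:line_to] = replacement
        previous_from = line_from
    return result


def apply_line_ops_top_down_naive(lines, ops):
    """Contrast case: applying in file order lets earlier edits shift later ones."""
    result = list(lines)
    for line_from, line_to, replacement in sorted(ops, key=lambda op: op[0]):
        result[line_from - 1:line_to] = replacement
    return result


source = ["a", "b", "c", "d", "e"]
edits = [(2, 2, ["B1", "B2"]), (4, 5, ["DE"])]
bottom_up = apply_line_ops_bottom_up(source, edits)
top_down = apply_line_ops_top_down_naive(source, edits)
```

In the naive variant the first edit grows the file by one line, so the second edit lands one line too high; applying from the bottom avoids that without any offset bookkeeping.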
diff --git a/coder/src/config.rs b/coder/src/config.rs index 370301df..9a91b208 100644 --- a/coder/src/config.rs +++ b/coder/src/config.rs @@ -7,20 +7,42 @@ use std::path::PathBuf; use anyhow::{Context, Result}; +use schemars::JsonSchema; use serde::{Deserialize, Serialize}; +use serde_json::Value; -#[derive(Debug, Clone, Serialize, Deserialize)] +#[derive(Debug, Clone, Serialize, Deserialize, JsonSchema)] pub struct CoderConfig { - /// Root directory the worker operates inside. Every wire path is - /// resolved relative to this. - #[serde(default = "default_base_path")] - pub base_path: PathBuf, + /// Legacy single-root form. Honored as a one-entry `base_paths` list. + /// Setting BOTH `base_path` and `base_paths` is a startup error + /// (checked at `PathResolver` construction). + #[serde(default)] + pub base_path: Option<PathBuf>, + + /// Root directories the worker operates inside. The FIRST entry is + /// the primary root: relative wire paths resolve against it. Absolute + /// wire paths are accepted when they canonicalize inside ANY listed + /// root. When neither this nor `base_path` is set, the effective + /// default is `["./", "/tmp"]` (resolved at `PathResolver` + /// construction). + #[serde(default)] + pub base_paths: Vec<PathBuf>, - /// Glob patterns matched against the *relative* path. Matching files - /// can be listed but not read/written/deleted/created. + /// Glob patterns matched against the path *relative to its containing + /// root*. Matching files can be listed but not + /// read/written/deleted/created. #[serde(default)] pub non_accessible_globs: Vec<String>, + /// Noise-exclusion globs (matched against the path relative to its + /// containing root, same convention as `non_accessible_globs`). + /// `coder::tree` and `coder::search` suppress descent into matching + /// directories and omit matching files; callers opt out per call with + /// `use_default_excludes: false`. Unlike `non_accessible_globs` this + /// only HIDES results — it grants no access protection. 
+ #[serde(default = "default_default_exclude_globs")] + pub default_exclude_globs: Vec<String>, + #[serde(default = "default_max_read_bytes")] pub max_read_bytes: u64, @@ -44,10 +66,54 @@ pub struct CoderConfig { #[serde(default = "default_search_max_line_bytes")] pub search_default_max_line_bytes: u32, + + /// Aggregate budget across a single `paths[]` batch call to + /// `coder::read-file`, measured in BYTES OF RETURNED CONTENT (after + /// UTF-8 sanitization — invalid bytes expand to 3-byte U+FFFD + /// replacements before being counted, so the cap bounds what the + /// caller actually receives). Entries are collected in request order + /// until this budget is exhausted; an entry reached with zero budget + /// remaining gets a per-entry C213. Single-path FULL reads are + /// budgeted by `max_output_bytes` instead; `max_read_bytes` remains + /// the per-file IO ceiling in every mode. + #[serde(default = "default_batch_read_budget_bytes")] + pub batch_read_budget_bytes: u64, + + /// Context budget for single-path FULL reads in `coder::read-file` + /// (no `line_from`/`line_to`, `stat: false`), measured in BYTES OF + /// RETURNED CONTENT after UTF-8 sanitization (numbered prefixes + /// included) — the same accounting unit as `batch_read_budget_bytes`. + /// A full read whose converted content would exceed this budget + /// fails with a C213 that reports the file's size and line count and + /// names the recovery paths (window, stat probe, or per-call + /// `max_output_bytes` raise, clamped to `max_read_bytes`). Windowed + /// reads and batch mode are NOT governed by this key. 
+ #[serde(default = "default_max_output_bytes")] + pub max_output_bytes: u64, + + /// Aggregate byte budget for one `coder::search` response, measured + /// in CONVERTED WIRE BYTES at accumulation time — the bytes of the + /// strings that will actually be serialized (path + matched text + + /// context lines for content matches; path for path matches), the + /// same accounting philosophy as `batch_read_budget_bytes`. Exactness + /// is not required; monotone bounding is. When the next match would + /// exceed the budget the search stops accumulating and sets + /// `truncated: true` — it degrades, it never errors. + #[serde(default = "default_search_response_budget_bytes")] + pub search_response_budget_bytes: u64, } -fn default_base_path() -> PathBuf { - PathBuf::from("./") +fn default_default_exclude_globs() -> Vec<String> { + [ + "**/.git/**", + "**/node_modules/**", + "**/target/**", + "**/dist/**", + "**/.venv/**", + "**/__pycache__/**", + ] + .map(String::from) + .to_vec() } fn default_max_read_bytes() -> u64 { 10 * 1024 * 1024 @@ -73,12 +139,51 @@ fn default_search_max_matches() -> u32 { fn default_search_max_line_bytes() -> u32 { 4_096 } +fn default_batch_read_budget_bytes() -> u64 { + 1_048_576 +} +fn default_max_output_bytes() -> u64 { + 131_072 +} +fn default_search_response_budget_bytes() -> u64 { + 262_144 +} + +/// A signature of everything the boot-time security jail (`PathResolver`) and +/// the resolver-compiled noise filter depend on. See +/// [`CoderConfig::jail_signature`]. Two configs with an equal signature differ +/// only in numeric tuning knobs that can be hot-applied; any other difference +/// requires a worker restart because the `PathResolver` is built once at boot +/// and is NEVER rebuilt at runtime — it is the security boundary. +#[derive(Clone, PartialEq, Eq, Debug)] +pub struct JailSignature { + /// Legacy single-root form. 
A change re-roots the jail — restart-required + /// (`PathResolver` compiles the effective root set from this + `base_paths`). + pub base_path: Option<PathBuf>, + /// The root directories the jail confines all access to. A change to the + /// set (or order — the first entry is the primary root) moves the security + /// boundary, so it is restart-required: the `PathResolver` canonicalizes + /// these once at boot and refuses to swap them live. + pub base_paths: Vec<PathBuf>, + /// The access-deny globs. These are the read/write/delete protection layer + /// (e.g. `.env`, `*.pem`), compiled into the `PathResolver` at boot. A + /// change alters the security posture, so it is restart-required — never + /// relax the jail on a live process. + pub non_accessible_globs: Vec<String>, + /// The resolver-compiled noise filter used by `tree`/`search`. Unlike + /// `non_accessible_globs` this grants NO access protection — it only hides + /// results — but it is still compiled into the `PathResolver` at boot, so a + /// change is restart-required for symmetry with the other compiled globsets. + pub default_exclude_globs: Vec<String>, +} impl Default for CoderConfig { fn default() -> Self { Self { - base_path: default_base_path(), + base_path: None, + base_paths: Vec::new(), non_accessible_globs: Vec::new(), + default_exclude_globs: default_default_exclude_globs(), max_read_bytes: default_max_read_bytes(), max_write_bytes: default_max_write_bytes(), tree_default_depth: default_tree_default_depth(), @@ -87,10 +192,117 @@ impl Default for CoderConfig { list_max_page_size: default_list_max_page_size(), search_default_max_matches: default_search_max_matches(), search_default_max_line_bytes: default_search_max_line_bytes(), + batch_read_budget_bytes: default_batch_read_budget_bytes(), + max_output_bytes: default_max_output_bytes(), + search_response_budget_bytes: default_search_response_budget_bytes(), } } } +impl CoderConfig { + /// Parse a config from a YAML seed string. 
Expands `${NAME}` against the + /// process environment FIRST (the seed file is the only path that needs + /// expansion — values fetched from `configuration::get` are already + /// env-expanded by the configuration worker), then deserializes. Used only + /// by the `--config` seed path ([`from_file`]); never on the live fetch. + pub fn from_yaml(yaml: &str) -> Result<Self, String> { + let expanded = expand_env(yaml); + let cfg: CoderConfig = + serde_yaml::from_str(&expanded).map_err(|e| format!("yaml parse: {e}"))?; + Ok(cfg) + } + + /// Read and parse a YAML seed file (env-expanded — see [`from_yaml`]). + pub fn from_file(path: &str) -> Result<Self, String> { + let raw = std::fs::read_to_string(path).map_err(|e| format!("read {path}: {e}"))?; + Self::from_yaml(&raw) + } + + /// Parse a config from a JSON value already env-expanded by the + /// configuration worker. Does NOT run `expand_env` — double-expansion would + /// be a bug — and tolerates a zero-field object (serde defaults fill in). + pub fn from_json(value: &Value) -> Result<Self, String> { + let cfg: CoderConfig = + serde_json::from_value(value.clone()).map_err(|e| format!("json parse: {e}"))?; + Ok(cfg) + } + + pub fn to_json(&self) -> Value { + serde_json::to_value(self).expect("CoderConfig serializes") + } + + pub fn json_schema() -> Value { + let root = schemars::schema_for!(CoderConfig); + let mut schema = + serde_json::to_value(&root.schema).expect("CoderConfig JSON Schema serializes"); + if let Some(obj) = schema.as_object_mut() { + if !root.definitions.is_empty() { + obj.insert( + "definitions".into(), + serde_json::to_value(&root.definitions).expect("definitions serialize"), + ); + } + // Top-level example mirrors the shipped defaults so operators see a + // ready-to-edit multi-root config (same shape as database's schema). + obj.insert("example".into(), CoderConfig::default().to_json()); + } + schema + } + + /// Build the restart-required jail signature. 
These four fields are + /// EVERYTHING the `PathResolver` compiles: the root set (`base_path` + + /// `base_paths`) that bounds the security jail, the access-deny globs + /// (`non_accessible_globs`), and the resolver-compiled noise filter + /// (`default_exclude_globs`). A live config update that changes ANY of them + /// is refused on hot-reload (logged "restart coder to apply", previous + /// state kept) — the `PathResolver` is the security boundary and is never + /// rebuilt at runtime. Every OTHER field is a numeric tuning knob that + /// hot-applies. Compared by value; the signature owns cloned copies. + pub fn jail_signature(&self) -> JailSignature { + JailSignature { + base_path: self.base_path.clone(), + base_paths: self.base_paths.clone(), + non_accessible_globs: self.non_accessible_globs.clone(), + default_exclude_globs: self.default_exclude_globs.clone(), + } + } +} + +/// Expand `${NAME}` occurrences against the process environment. +/// Unknown variables expand to the empty string and emit a tracing warning. +/// Non-ASCII content outside `${...}` markers is preserved verbatim (the slice +/// boundary lands on the ASCII `$`, so this is UTF-8-safe), and an unterminated +/// `${` is treated as a literal. +fn expand_env(input: &str) -> String { + let mut out = String::with_capacity(input.len()); + let mut rest = input; + while let Some(start) = rest.find("${") { + // Push the prefix verbatim (UTF-8-safe slice — start is a char boundary + // because it points at an ASCII `$`). + out.push_str(&rest[..start]); + let after = &rest[start + 2..]; + match after.find('}') { + Some(end) => { + let name = &after[..end]; + match std::env::var(name) { + Ok(v) => out.push_str(&v), + Err(_) => { + tracing::warn!(var = %name, "config references undefined env var"); + } + } + rest = &after[end + 1..]; + } + None => { + // Unterminated `${`; treat as literal. 
+ out.push_str("${"); + rest = after; + } + } + } + out.push_str(rest); + out +} + pub fn load_config(path: &str) -> Result<CoderConfig> { let content = std::fs::read_to_string(path).with_context(|| format!("read {}", path))?; let cfg: CoderConfig = @@ -105,8 +317,20 @@ mod tests { #[test] fn empty_yaml_parses_to_defaults() { let cfg: CoderConfig = serde_yaml::from_str("{}").expect("empty yaml parses"); - assert_eq!(cfg.base_path, PathBuf::from("./")); + assert_eq!(cfg.base_path, None); + assert!(cfg.base_paths.is_empty()); assert!(cfg.non_accessible_globs.is_empty()); + assert_eq!( + cfg.default_exclude_globs, + vec![ + "**/.git/**", + "**/node_modules/**", + "**/target/**", + "**/dist/**", + "**/.venv/**", + "**/__pycache__/**", + ] + ); assert_eq!(cfg.max_read_bytes, 10 * 1024 * 1024); assert_eq!(cfg.max_write_bytes, 10 * 1024 * 1024); assert_eq!(cfg.tree_default_depth, 4); @@ -115,6 +339,9 @@ mod tests { assert_eq!(cfg.list_max_page_size, 1_000); assert_eq!(cfg.search_default_max_matches, 1_000); assert_eq!(cfg.search_default_max_line_bytes, 4_096); + assert_eq!(cfg.batch_read_budget_bytes, 1_048_576); + assert_eq!(cfg.max_output_bytes, 131_072); + assert_eq!(cfg.search_response_budget_bytes, 262_144); } #[test] @@ -127,12 +354,23 @@ mod tests { assert_eq!(a, b); } + #[test] + fn legacy_base_path_parses_as_option() { + let cfg: CoderConfig = serde_yaml::from_str("base_path: /tmp/legacy").unwrap(); + assert_eq!(cfg.base_path, Some(PathBuf::from("/tmp/legacy"))); + assert!(cfg.base_paths.is_empty()); + } + #[test] fn custom_yaml_overrides_each_field() { let yaml = r#" -base_path: /tmp/c +base_paths: + - /tmp/c + - /tmp/d non_accessible_globs: - "**/.env" +default_exclude_globs: + - "**/build/**" max_read_bytes: 42 max_write_bytes: 43 tree_default_depth: 7 @@ -141,10 +379,18 @@ list_default_page_size: 11 list_max_page_size: 13 search_default_max_matches: 17 search_default_max_line_bytes: 19 +batch_read_budget_bytes: 23 +max_output_bytes: 31 +search_response_budget_bytes: 29 
"#; let cfg: CoderConfig = serde_yaml::from_str(yaml).unwrap(); - assert_eq!(cfg.base_path, PathBuf::from("/tmp/c")); + assert_eq!(cfg.base_path, None); + assert_eq!( + cfg.base_paths, + vec![PathBuf::from("/tmp/c"), PathBuf::from("/tmp/d")] + ); assert_eq!(cfg.non_accessible_globs, vec!["**/.env".to_string()]); + assert_eq!(cfg.default_exclude_globs, vec!["**/build/**".to_string()]); assert_eq!(cfg.max_read_bytes, 42); assert_eq!(cfg.max_write_bytes, 43); assert_eq!(cfg.tree_default_depth, 7); @@ -153,6 +399,9 @@ search_default_max_line_bytes: 19 assert_eq!(cfg.list_max_page_size, 13); assert_eq!(cfg.search_default_max_matches, 17); assert_eq!(cfg.search_default_max_line_bytes, 19); + assert_eq!(cfg.batch_read_budget_bytes, 23); + assert_eq!(cfg.max_output_bytes, 31); + assert_eq!(cfg.search_response_budget_bytes, 29); } #[test] @@ -160,7 +409,202 @@ search_default_max_line_bytes: 19 let path = concat!(env!("CARGO_MANIFEST_DIR"), "/config.yaml"); let content = std::fs::read_to_string(path).expect("read config.yaml"); let cfg: CoderConfig = serde_yaml::from_str(&content).expect("config.yaml parses"); - assert_eq!(cfg.base_path, PathBuf::from("./")); + // T14: config.yaml now uses the multi-root form; legacy base_path is unset. 
+ assert_eq!(cfg.base_path, None); + assert_eq!( + cfg.base_paths, + vec![PathBuf::from("./"), PathBuf::from("/tmp")] + ); assert!(cfg.non_accessible_globs.iter().any(|g| g.contains(".env"))); } + + #[test] + fn json_schema_has_expected_properties() { + let schema = CoderConfig::json_schema(); + let props = schema + .get("properties") + .and_then(|p| p.as_object()) + .expect("schema has properties object"); + for field in [ + "base_path", + "base_paths", + "non_accessible_globs", + "default_exclude_globs", + "max_read_bytes", + "max_write_bytes", + "tree_default_depth", + "tree_per_folder_limit", + "list_default_page_size", + "list_max_page_size", + "search_default_max_matches", + "search_default_max_line_bytes", + "batch_read_budget_bytes", + "max_output_bytes", + "search_response_budget_bytes", + ] { + assert!( + props.get(field).is_some(), + "missing schema property {field}" + ); + } + // The field doc-comments survive as schema descriptions. + assert!(props["non_accessible_globs"].get("description").is_some()); + // The `#[schemars(example = ...)]` attribute surfaces a top-level example. + assert!(schema.get("example").is_some()); + } + + #[test] + fn from_json_round_trips_from_default() { + let cfg = CoderConfig::default(); + let json = cfg.to_json(); + let back = CoderConfig::from_json(&json).unwrap(); + let a = serde_json::to_value(&cfg).unwrap(); + let b = serde_json::to_value(&back).unwrap(); + assert_eq!(a, b); + } + + #[test] + fn from_json_round_trips_custom_values() { + let json = serde_json::json!({ + "base_paths": ["/tmp/x"], + "non_accessible_globs": ["**/.env"], + "max_read_bytes": 99, + "tree_default_depth": 2, + }); + let cfg = CoderConfig::from_json(&json).unwrap(); + assert_eq!(cfg.base_paths, vec![PathBuf::from("/tmp/x")]); + assert_eq!(cfg.non_accessible_globs, vec!["**/.env".to_string()]); + assert_eq!(cfg.max_read_bytes, 99); + assert_eq!(cfg.tree_default_depth, 2); + // Unspecified fields fall back to serde defaults. 
+ assert_eq!(cfg.max_write_bytes, 10 * 1024 * 1024); + } + + #[test] + fn from_json_tolerates_empty_object() { + let back = CoderConfig::from_json(&serde_json::json!({})).unwrap(); + let a = serde_json::to_value(&back).unwrap(); + let b = serde_json::to_value(CoderConfig::default()).unwrap(); + assert_eq!(a, b); + } + + #[test] + fn from_json_rejects_garbage() { + // Wrong type for a numeric field — serde rejects. + let err = CoderConfig::from_json(&serde_json::json!({ "max_read_bytes": "not-a-number" })) + .unwrap_err(); + assert!(err.contains("json parse"), "got: {err}"); + // A non-object value also fails. + let err = CoderConfig::from_json(&serde_json::json!("garbage")).unwrap_err(); + assert!(err.contains("json parse"), "got: {err}"); + } + + #[test] + fn to_json_round_trips_through_from_json() { + let yaml = r#" +base_paths: + - /tmp/a +max_output_bytes: 7 +search_response_budget_bytes: 11 +"#; + let cfg = CoderConfig::from_yaml(yaml).unwrap(); + let back = CoderConfig::from_json(&cfg.to_json()).unwrap(); + assert_eq!(back.base_paths, vec![PathBuf::from("/tmp/a")]); + assert_eq!(back.max_output_bytes, 7); + assert_eq!(back.search_response_budget_bytes, 11); + } + + #[test] + fn from_yaml_expands_env_var() { + std::env::set_var("CODER_TEST_ROOT", "/tmp/expanded-root"); + let yaml = "base_paths:\n - \"${CODER_TEST_ROOT}\"\n"; + let cfg = CoderConfig::from_yaml(yaml).unwrap(); + assert_eq!(cfg.base_paths, vec![PathBuf::from("/tmp/expanded-root")]); + std::env::remove_var("CODER_TEST_ROOT"); + } + + #[test] + fn from_yaml_preserves_unicode_outside_markers() { + // Guard against byte-iteration mojibake: a non-ASCII comment runs + // through expand_env verbatim before serde_yaml strips it. 
+ let yaml = "# café 日本語\nmax_read_bytes: 5\n"; + let cfg = CoderConfig::from_yaml(yaml).unwrap(); + assert_eq!(cfg.max_read_bytes, 5); + } + + #[test] + fn jail_signature_equal_when_only_numeric_fields_differ() { + // Two configs differing ONLY in numeric tuning knobs share a signature: + // a hot-reload between them is allowed (no PathResolver rebuild needed). + let base = CoderConfig { + base_paths: vec![PathBuf::from("/tmp/a")], + non_accessible_globs: vec!["**/.env".to_string()], + default_exclude_globs: vec!["**/target/**".to_string()], + ..CoderConfig::default() + }; + let tuned = CoderConfig { + max_read_bytes: base.max_read_bytes + 1, + max_write_bytes: base.max_write_bytes + 1, + tree_default_depth: base.tree_default_depth + 1, + tree_per_folder_limit: base.tree_per_folder_limit + 1, + list_default_page_size: base.list_default_page_size + 1, + list_max_page_size: base.list_max_page_size + 1, + search_default_max_matches: base.search_default_max_matches + 1, + search_default_max_line_bytes: base.search_default_max_line_bytes + 1, + batch_read_budget_bytes: base.batch_read_budget_bytes + 1, + max_output_bytes: base.max_output_bytes + 1, + search_response_budget_bytes: base.search_response_budget_bytes + 1, + ..base.clone() + }; + assert_eq!(base.jail_signature(), tuned.jail_signature()); + } + + #[test] + fn jail_signature_differs_when_base_path_changes() { + let a = CoderConfig::default(); + let b = CoderConfig { + base_path: Some(PathBuf::from("/tmp/legacy")), + ..CoderConfig::default() + }; + assert_ne!(a.jail_signature(), b.jail_signature()); + } + + #[test] + fn jail_signature_differs_when_base_paths_changes() { + let a = CoderConfig { + base_paths: vec![PathBuf::from("/tmp/a")], + ..CoderConfig::default() + }; + let b = CoderConfig { + base_paths: vec![PathBuf::from("/tmp/a"), PathBuf::from("/tmp/b")], + ..CoderConfig::default() + }; + assert_ne!(a.jail_signature(), b.jail_signature()); + } + + #[test] + fn 
jail_signature_differs_when_non_accessible_globs_change() { + let a = CoderConfig { + non_accessible_globs: vec!["**/.env".to_string()], + ..CoderConfig::default() + }; + let b = CoderConfig { + non_accessible_globs: vec!["**/.env".to_string(), "**/*.pem".to_string()], + ..CoderConfig::default() + }; + assert_ne!(a.jail_signature(), b.jail_signature()); + } + + #[test] + fn jail_signature_differs_when_default_exclude_globs_change() { + let a = CoderConfig { + default_exclude_globs: vec!["**/target/**".to_string()], + ..CoderConfig::default() + }; + let b = CoderConfig { + default_exclude_globs: vec!["**/build/**".to_string()], + ..CoderConfig::default() + }; + assert_ne!(a.jail_signature(), b.jail_signature()); + } } diff --git a/coder/src/configuration.rs b/coder/src/configuration.rs new file mode 100644 index 00000000..75112973 --- /dev/null +++ b/coder/src/configuration.rs @@ -0,0 +1,300 @@ +//! Integration with the `configuration` worker — register, fetch, and +//! hot-reload the `coder` configuration entry. +//! +//! coder's config splits into two halves on a live update: +//! +//! - The JAIL SIGNATURE (`base_path` + `base_paths` + `non_accessible_globs` + +//! `default_exclude_globs`) is everything the boot-time `PathResolver` +//! compiles. The `PathResolver` is the security boundary and is built ONCE at +//! boot and NEVER rebuilt at runtime; a config change that alters any of those +//! four fields is REFUSED on hot-reload (logged "restart coder to apply", the +//! previous snapshot kept) — mirroring storage's topology-change refusal. +//! - Every OTHER field is a numeric tuning knob (byte caps, page sizes, response +//! budgets). When a freshly-fetched config's jail signature matches the boot +//! signature, the snapshot is swapped live; handlers read the current snapshot +//! per call. 
+ +use std::sync::Arc; +use std::time::Duration; + +use iii_sdk::{IIIError, RegisterFunction, RegisterTriggerInput, TriggerRequest, III}; +use serde_json::{json, Value}; +use tokio::sync::RwLock; + +use crate::config::{CoderConfig, JailSignature}; + +/// Hot-swappable config snapshot shared with every cfg-taking handler. The +/// `Arc<RwLock<Arc<CoderConfig>>>` shape lets a handler take a `read().await` +/// and `clone()` the inner `Arc` out (a cheap refcount bump) without holding the +/// lock across its work, while `apply_config` whole-snapshot replaces the inner +/// `Arc` under the write lock. The `PathResolver` is NOT stored here — it is the +/// security jail, built once at boot and never swapped. +pub type ConfigCell = Arc<RwLock<Arc<CoderConfig>>>; + +pub const CONFIG_ID: &str = "coder"; +const CONFIG_FN_ID: &str = "coder::on-config-change"; +const CONFIG_TIMEOUT_MS: u64 = 5_000; +const CONFIG_RETRIES: u32 = 3; + +/// Register the `coder` configuration schema with the configuration worker. +/// When `seed` is present, its value is installed as `initial_value`. Otherwise, +/// the built-in default is seeded only when no stored value exists yet. +pub async fn register_config(iii: &III, seed: Option<&CoderConfig>) -> Result<(), String> { + let mut payload = json!({ + "id": CONFIG_ID, + "name": "Coder", + "description": "Path-jailed code file access for iii agents: read/search/edit/create/move files inside a fixed set of allowed roots, with non-accessible glob protection and token-bounded response budgets.", + "schema": CoderConfig::json_schema(), + }); + if let Some(seed) = seed { + payload["initial_value"] = seed.to_json(); + } else if should_seed_default_value(iii).await? { + payload["initial_value"] = CoderConfig::default().to_json(); + } + trigger_with_retry(iii, "configuration::register", payload).await?; + Ok(()) +} + +/// Read the live `coder` configuration (env-expanded by the configuration +/// worker — `from_json` does NOT re-expand). 
+pub async fn fetch_config(iii: &III) -> Result<CoderConfig, String> { + let value = get_config_value(iii).await?; + if value.is_null() { + tracing::info!("no configuration value found; using built-in default configuration"); + return Ok(CoderConfig::default()); + } + CoderConfig::from_json(&value) +} + +async fn should_seed_default_value(iii: &III) -> Result<bool, String> { + match try_get_config_value(iii).await? { + None => Ok(true), + Some(value) if value.is_null() => Ok(true), + Some(_) => Ok(false), + } +} + +async fn get_config_value(iii: &III) -> Result<Value, String> { + try_get_config_value(iii) + .await? + .ok_or_else(|| format!("configuration `{CONFIG_ID}` not found")) +} + +/// Returns `Ok(None)` when the entry does not exist (`NOT_FOUND`). +async fn try_get_config_value(iii: &III) -> Result<Option<Value>, String> { + match trigger_with_retry(iii, "configuration::get", json!({ "id": CONFIG_ID })).await { + Ok(resp) => Ok(resp.get("value").cloned()), + Err(e) if e.contains("NOT_FOUND") => Ok(None), + Err(e) => Err(e), + } +} + +/// Decide whether a freshly-fetched config can be hot-applied. Returns the +/// config when its jail signature matches the boot-time signature (only numeric +/// tuning knobs changed), or an error describing the jail change that requires a +/// worker restart. Mirrors storage's `reloadable` topology gate. +fn reloadable(cfg: CoderConfig, boot_sig: &JailSignature) -> Result<CoderConfig, String> { + if cfg.jail_signature() != *boot_sig { + return Err( + "configuration change alters the path jail (base_paths / non_accessible_globs / \ + default_exclude_globs) — these define the security boundary and the compiled \ + PathResolver; a worker restart is required to apply them" + .to_string(), + ); + } + Ok(cfg) +} + +/// Swap the config snapshot under the write lock. No resolver rebuild — the jail +/// is unchanged by construction (the caller has already passed `reloadable`). 
+pub async fn apply_config(cell: &ConfigCell, cfg: CoderConfig) { + *cell.write().await = Arc::new(cfg); +} + +/// Register the internal config-change handler and bind a `configuration` +/// trigger. +/// +/// `boot_sig` is the jail signature captured at startup; any reload that would +/// change it is refused (those require a worker restart because the +/// `PathResolver` is never rebuilt). +pub fn register_config_trigger( + iii: &III, + cell: ConfigCell, + boot_sig: JailSignature, +) -> Result<(), IIIError> { + let cell_for_fn = cell.clone(); + let engine = iii.clone(); + iii.register_function( + CONFIG_FN_ID, + RegisterFunction::new_async(move |_payload: Value| { + let cell = cell_for_fn.clone(); + let engine = engine.clone(); + let boot_sig = boot_sig.clone(); + async move { + on_config_change(&engine, &cell, &boot_sig).await; + Ok::(json!({ "ok": true })) + } + }) + .description( + "Internal: reload coder's tuning limits from the authoritative configuration when it \ + changes; jail-defining changes require a restart.", + ), + ); + + iii.register_trigger(RegisterTriggerInput { + trigger_type: "configuration".to_string(), + function_id: CONFIG_FN_ID.to_string(), + config: json!({ + "configuration_id": CONFIG_ID, + "event_types": ["configuration:updated"], + }), + metadata: None, + })?; + Ok(()) +} + +/// Reload coder's tuning limits from the AUTHORITATIVE configuration. +/// +/// The caller-supplied trigger payload is intentionally ignored: +/// `coder::on-config-change` is a discoverable bus function, so trusting +/// `payload.new_value` would let any caller inject arbitrary config (loosening +/// byte caps or — worse — the jail) without updating persisted state. Re-fetch +/// the stored value via `configuration::get` instead. A jail-changing update is +/// refused (it requires a restart); the previous snapshot is always kept on any +/// failure path. 
+async fn on_config_change(iii: &III, cell: &ConfigCell, boot_sig: &JailSignature) { + let cfg = match fetch_config(iii).await { + Ok(cfg) => cfg, + Err(e) => { + tracing::error!( + error = %e, + "config-change: failed to fetch authoritative configuration; keeping previous limits" + ); + return; + } + }; + let cfg = match reloadable(cfg, boot_sig) { + Ok(cfg) => cfg, + Err(reason) => { + tracing::warn!( + reason = %reason, + "config-change refused: jail change requires restart; keeping previous limits" + ); + return; + } + }; + apply_config(cell, cfg).await; + tracing::info!("coder tuning limits reloaded (jail unchanged)"); +} + +async fn trigger_with_retry(iii: &III, function_id: &str, payload: Value) -> Result<Value, String> { + let mut last_err = String::new(); + for attempt in 1..=CONFIG_RETRIES { + match iii + .trigger(TriggerRequest { + function_id: function_id.to_string(), + payload: payload.clone(), + action: None, + timeout_ms: Some(CONFIG_TIMEOUT_MS), + }) + .await + { + Ok(v) => return Ok(v), + Err(e) => { + last_err = e.to_string(); + if attempt < CONFIG_RETRIES { + tracing::warn!( + function_id, + attempt, + error = %last_err, + "configuration RPC failed; retrying" + ); + tokio::time::sleep(Duration::from_millis(250 * u64::from(attempt))).await; + } + } + } + } + Err(format!( + "{function_id} failed after {CONFIG_RETRIES} attempts: {last_err}" + )) +} + +#[cfg(test)] +mod tests { + use super::*; + use std::path::PathBuf; + + #[test] + fn reloadable_allows_numeric_only_change() { + // Two configs sharing a jail signature, differing only in a numeric + // tuning knob, hot-apply (no PathResolver rebuild needed). 
+ let boot = CoderConfig { + base_paths: vec![PathBuf::from("/tmp/a")], + non_accessible_globs: vec!["**/.env".to_string()], + default_exclude_globs: vec!["**/target/**".to_string()], + ..CoderConfig::default() + }; + let next = CoderConfig { + max_read_bytes: boot.max_read_bytes + 1, + ..boot.clone() + }; + let boot_sig = boot.jail_signature(); + let applied = reloadable(next, &boot_sig).expect("numeric-only change is reloadable"); + assert_eq!(applied.max_read_bytes, boot.max_read_bytes + 1); + } + + #[test] + fn reloadable_refuses_jail_change() { + let boot = CoderConfig { + base_paths: vec![PathBuf::from("/tmp/a")], + non_accessible_globs: vec!["**/.env".to_string()], + default_exclude_globs: vec!["**/target/**".to_string()], + ..CoderConfig::default() + }; + let boot_sig = boot.jail_signature(); + + // base_path change -> refused. + let changed_base_path = CoderConfig { + base_path: Some(PathBuf::from("/tmp/legacy")), + ..boot.clone() + }; + assert!(reloadable(changed_base_path, &boot_sig).is_err()); + + // base_paths change -> refused. + let changed_base_paths = CoderConfig { + base_paths: vec![PathBuf::from("/tmp/a"), PathBuf::from("/tmp/b")], + ..boot.clone() + }; + assert!(reloadable(changed_base_paths, &boot_sig).is_err()); + + // non_accessible_glob change -> refused. + let changed_non_accessible = CoderConfig { + non_accessible_globs: vec!["**/.env".to_string(), "**/*.pem".to_string()], + ..boot.clone() + }; + assert!(reloadable(changed_non_accessible, &boot_sig).is_err()); + + // default_exclude_glob change -> refused. 
+ let changed_default_exclude = CoderConfig { + default_exclude_globs: vec!["**/build/**".to_string()], + ..boot.clone() + }; + assert!(reloadable(changed_default_exclude, &boot_sig).is_err()); + } + + #[tokio::test] + async fn apply_config_swaps_snapshot() { + let cell: ConfigCell = Arc::new(RwLock::new(Arc::new(CoderConfig::default()))); + let before = cell.read().await.clone(); + assert_eq!(before.max_read_bytes, CoderConfig::default().max_read_bytes); + + let tuned = CoderConfig { + max_read_bytes: 7, + ..CoderConfig::default() + }; + apply_config(&cell, tuned).await; + + let after = cell.read().await.clone(); + assert_eq!(after.max_read_bytes, 7); + } +} diff --git a/coder/src/error.rs b/coder/src/error.rs index a0949e76..bc60b07d 100644 --- a/coder/src/error.rs +++ b/coder/src/error.rs @@ -4,10 +4,42 @@ //! //! Codes mirror `shell::fs::*`'s `S2xx` scheme so consumers can pattern //! against a stable prefix. +//! +//! MESSAGE STYLE — every error message must carry: +//! (a) what happened (the input + actual values), +//! (b) why it was rejected, +//! (c) the corrective next call, with enough detail that an LLM agent can +//! make a successful second call using ONLY the error text. +//! +//! REDACTION INVARIANT — C211 deliberately folds "not found" and "access +//! denied" into a single code and identical wording so callers cannot probe +//! for the existence of a protected file. Messages MUST NOT distinguish the +//! two cases and MUST NOT name filesystem paths the caller did not supply +//! (e.g. discovered child paths inside a recursive walk). +use schemars::JsonSchema; use serde::Serialize; use thiserror::Error; +/// Structured per-entry error as it appears on the wire. +/// +/// Use `code` for stable programmatic branching (e.g. `"C211"` for +/// not-found-or-denied). `message` carries the human/LLM-readable +/// problem description plus the corrective next call. 
+#[derive(Debug, Clone, PartialEq, Eq, Serialize, JsonSchema)] +pub struct WireError { + /// Stable error code, e.g. "C211". See the README error table. + pub code: String, + /// Human/LLM-readable message: problem + actual values + corrective next call. + pub message: String, +} + +/// The one allowed C211 recovery-hint suffix. Both `not_found_or_denied` +/// and the `From` NotFound arm build from this const, so the +/// missing and glob-denied wordings can never drift apart. +const C211_SUFFIX: &str = + "not found or not accessible. Verify the path with coder::list-folder or coder::tree."; + #[derive(Debug, Error, Serialize)] #[serde(tag = "code", content = "message")] pub enum CoderError { @@ -28,7 +60,7 @@ pub enum CoderError { #[serde(rename = "C213")] TooLarge(String), - /// Path escapes `base_path` lexically or through a symlink. + /// Path escapes every allowed root, lexically or through a symlink. #[error("C215: {0}")] #[serde(rename = "C215")] OutsideBase(String), @@ -63,12 +95,90 @@ impl CoderError { CoderError::AlreadyExists(_) => "C217", } } + + /// The inner message string (without the `C2xx: ` prefix that the + /// `Display` impl prepends). Used by `WireError` to populate the + /// `message` field without duplicating the code prefix. + pub fn message(&self) -> &str { + match self { + CoderError::BadInput(m) + | CoderError::NotFoundOrDenied(m) + | CoderError::TooLarge(m) + | CoderError::OutsideBase(m) + | CoderError::Io(m) + | CoderError::AlreadyExists(m) => m, + } + } + + /// Convert to the structured wire form used in per-entry batch results. + pub fn to_wire_error(&self) -> WireError { + WireError { + code: self.code().to_string(), + message: self.message().to_string(), + } + } + + /// The primary C211 wording. Single constructor so the missing and + /// glob-denied cases can never drift apart (REDACTION INVARIANT). + /// `not_found_or_denied_subtree` below is the ONLY other allowed + /// C211 shape. 
+ pub fn not_found_or_denied(path: &str) -> Self { + CoderError::NotFoundOrDenied(format!("{path}: {C211_SUFFIX}")) + } + + /// The ONLY other allowed C211 shape: a recursive delete refused + /// because the subtree contains non-accessible entries. Redaction-safe + /// because non-accessible entries' EXISTENCE is already public by + /// design — `list-folder`/`tree` show them with `non_accessible: true` + /// flags; only their content/identity is protected — and this message + /// names neither (`parent_path` is the path the CALLER supplied). + pub fn not_found_or_denied_subtree(parent_path: &str) -> Self { + CoderError::NotFoundOrDenied(format!( + "{parent_path}: subtree contains non-accessible entries; \ + refusing recursive delete." + )) + } + + /// Map an `io::Error` from an operation on a caller-supplied `path`. + /// EVERY arm names that path (MESSAGE STYLE: the error alone must tell + /// the caller which input to act on): NotFound folds into the + /// standardized C211 wording; all other kinds prefix the path onto the + /// `From` mapping (": "). Handlers MUST + /// use this (not bare `?`) whenever the wire path is in scope. + /// Redaction-safe by construction: `path` is caller-supplied at every + /// call site, never a discovered filesystem entry. + pub fn io_for_path(e: std::io::Error, path: &str) -> Self { + match Self::from(e) { + // From already standardized the C211 wording; rebuild + // through the sanctioned constructor so the path is prefixed. + CoderError::NotFoundOrDenied(_) => Self::not_found_or_denied(path), + CoderError::AlreadyExists(m) => CoderError::AlreadyExists(format!("{path}: {m}")), + CoderError::Io(m) => CoderError::Io(format!("{path}: {m}")), + // From only produces the three variants above; keep + // any future variants untouched rather than double-prefixing. 
+ other => other, + } + } +} + +impl From<&CoderError> for WireError { + fn from(e: &CoderError) -> Self { + e.to_wire_error() + } } impl From<std::io::Error> for CoderError { fn from(e: std::io::Error) -> Self { match e.kind() { - std::io::ErrorKind::NotFound => CoderError::NotFoundOrDenied(e.to_string()), + // Defense in depth for the REDACTION INVARIANT: a bare `?` on a + // fs op must NEVER leak raw OS text ("No such file or directory + // (os error 2)") — that wording is distinguishable from the + // glob-denied message and would let callers probe for protected + // files. The raw error detail is deliberately dropped. Handlers + // should prefer `CoderError::io_for_path` so the message also + // names the caller-supplied path; this generic arm is the + // fallback when no wire path is in scope. + std::io::ErrorKind::NotFound => CoderError::NotFoundOrDenied(C211_SUFFIX.to_string()), std::io::ErrorKind::AlreadyExists => CoderError::AlreadyExists(e.to_string()), _ => CoderError::Io(e.to_string()), } @@ -92,9 +202,63 @@ mod tests { } #[test] - fn io_not_found_maps_to_c211() { - let e: CoderError = std::io::Error::new(std::io::ErrorKind::NotFound, "x").into(); + fn io_not_found_maps_to_c211_without_raw_os_text() { + let e: CoderError = std::io::Error::from(std::io::ErrorKind::NotFound).into(); assert_eq!(e.code(), "C211"); + let msg = e.to_string(); + // REDACTION INVARIANT: the generic From<std::io::Error> arm must not leak OS error + // detail that would distinguish "missing" from "denied".
+ assert!( + !msg.contains("os error") && !msg.contains("No such file"), + "raw OS text leaked: {msg}" + ); + assert!( + msg.contains("not found or not accessible"), + "standardized wording missing: {msg}" + ); + } + + #[test] + fn io_for_path_not_found_uses_standardized_wording_with_path() { + let e = CoderError::io_for_path( + std::io::Error::from(std::io::ErrorKind::NotFound), + "some/file.txt", + ); + assert_eq!(e.code(), "C211"); + let msg = e.to_string(); + assert!(msg.contains("some/file.txt: not found or not accessible")); + assert!(!msg.contains("os error"), "raw OS text leaked: {msg}"); + } + + #[test] + fn io_for_path_non_not_found_maps_to_io_with_path_prefix() { + let e = CoderError::io_for_path( + std::io::Error::from(std::io::ErrorKind::PermissionDenied), + "some/file.txt", + ); + assert_eq!(e.code(), "C216"); + // MESSAGE STYLE: every io_for_path arm must name the caller path — + // a bare "Directory not empty (os error 66)" tells the caller + // nothing about which batch entry to act on. + let msg = e.message(); + assert!( + msg.starts_with("some/file.txt: "), + "C216 via io_for_path must prefix the caller path: {msg}" + ); + } + + #[test] + fn io_for_path_already_exists_maps_to_c217_with_path_prefix() { + let e = CoderError::io_for_path( + std::io::Error::new(std::io::ErrorKind::AlreadyExists, "exists"), + "some/file.txt", + ); + assert_eq!(e.code(), "C217"); + assert!( + e.message().starts_with("some/file.txt: "), + "C217 via io_for_path must prefix the caller path: {}", + e.message() + ); } #[test] @@ -118,4 +282,35 @@ mod tests { .collect(); assert_eq!(codes.len(), 6); } + + /// DRIFT PREVENTION: `to_wire_error()` (structured per-entry form) + /// and `to_wire_string()` (top-level Result<_,String> form) are two + /// renderings of the same error; their `code` and `message` must + /// never diverge for any variant. 
+ #[test] + fn wire_error_matches_wire_string_for_every_variant() { + let variants = [ + CoderError::BadInput("bad input msg".into()), + CoderError::NotFoundOrDenied("not found msg".into()), + CoderError::TooLarge("too large msg".into()), + CoderError::OutsideBase("outside base msg".into()), + CoderError::Io("io msg".into()), + CoderError::AlreadyExists("already exists msg".into()), + ]; + for v in &variants { + let wire = v.to_wire_error(); + let s: serde_json::Value = + serde_json::from_str(&v.to_wire_string()).expect("wire string is valid JSON"); + assert_eq!( + wire.code, + s["code"].as_str().unwrap(), + "code drift for variant {v:?}" + ); + assert_eq!( + wire.message, + s["message"].as_str().unwrap(), + "message drift for variant {v:?}" + ); + } + } } diff --git a/coder/src/functions/create_file.rs b/coder/src/functions/create_file.rs index d31a2692..7167a935 100644 --- a/coder/src/functions/create_file.rs +++ b/coder/src/functions/create_file.rs @@ -9,16 +9,22 @@ use schemars::JsonSchema; use serde::{Deserialize, Serialize}; use crate::config::CoderConfig; -use crate::error::{err_to_string, CoderError}; +use crate::error::{err_to_string, CoderError, WireError}; use crate::path::PathResolver; +// examples are wire-contract; goldens pin them. #[derive(Debug, Deserialize, JsonSchema)] +#[schemars(example = "example_create_file_input")] pub struct CreateFileInput { pub files: Vec, } #[derive(Debug, Deserialize, JsonSchema)] pub struct CreateFileSpec { + /// Path relative to the primary allowed root, or an absolute path inside + /// any allowed root. Call `coder::info` to see the allowed roots. Paths + /// outside every allowed root are rejected — use the shell worker's + /// `shell::fs::*` for host paths outside the jail. pub path: String, pub content: String, /// Octal permission bits as a string, e.g. "0644". Defaults to "0644". @@ -40,6 +46,24 @@ fn default_true() -> bool { true } +// examples are wire-contract; goldens pin them. 
+fn example_create_file_input() -> serde_json::Value { + serde_json::json!({ + "files": [ + { + "path": "src/lib.rs", + "content": "pub mod utils;\n", + "overwrite": false + }, + { + "path": "/tmp/scratch/notes.md", + "content": "# scratch notes\n", + "overwrite": true + } + ] + }) +} + #[derive(Debug, Serialize, JsonSchema)] pub struct CreateFileOutput { pub results: Vec<CreateFileResult>, @@ -47,11 +71,17 @@ pub struct CreateFileOutput { #[derive(Debug, Serialize, JsonSchema)] pub struct CreateFileResult { + /// Canonical absolute path (resolved through the jail); the caller's + /// input verbatim when resolution failed. pub path: String, pub success: bool, pub bytes_written: u64, + /// Structured error for this entry. `code` is stable for programmatic + /// branching (e.g. `"C217"` means already-exists; pass `overwrite=true` + /// to replace). `message` carries the corrective action an LLM agent + /// needs to make a successful second call. #[serde(skip_serializing_if = "Option::is_none")] - pub error: Option<String>, + pub error: Option<WireError>, } pub async fn handle( @@ -76,33 +106,46 @@ fn create_one( cfg: &CoderConfig, spec: CreateFileSpec, ) -> CreateFileResult { - let path = spec.path.clone(); - match try_create_one(resolver, cfg, spec) { + // Resolve up front: from here on every filesystem operation uses ONLY + // the resolver-returned path (never re-derived from the raw request), + // and the result echoes that canonical absolute path. When resolution + // fails there is no canonical path, so the input is echoed verbatim.
+ let abs = match resolver.require_writable(&spec.path) { + Ok(abs) => abs, + Err(e) => { + return CreateFileResult { + path: spec.path, + success: false, + bytes_written: 0, + error: Some((&e).into()), + } + } + }; + let wire_path = abs.display().to_string(); + match try_create_one(cfg, &abs, spec) { Ok(bytes) => CreateFileResult { - path, + path: wire_path, success: true, bytes_written: bytes, error: None, }, Err(e) => CreateFileResult { - path, + path: wire_path, success: false, bytes_written: 0, - error: Some(e.to_wire_string()), + error: Some((&e).into()), }, } } -fn try_create_one( - resolver: &PathResolver, - cfg: &CoderConfig, - spec: CreateFileSpec, -) -> Result { - let abs = resolver.require_writable(&spec.path)?; +fn try_create_one(cfg: &CoderConfig, abs: &Path, spec: CreateFileSpec) -> Result { let bytes = spec.content.as_bytes(); if (bytes.len() as u64) > cfg.max_write_bytes { return Err(CoderError::TooLarge(format!( - "{} bytes exceeds max_write_bytes {}", + "{} is {} bytes, which exceeds max_write_bytes ({}). \ + Split the content into smaller files or raise \ + max_write_bytes in coder config.", + spec.path, bytes.len(), cfg.max_write_bytes ))); @@ -115,11 +158,13 @@ fn try_create_one( } if spec.parents { if let Some(parent) = abs.parent() { - std::fs::create_dir_all(parent).map_err(CoderError::from)?; + // io_for_path names spec.path (caller-supplied, redaction-safe) + // rather than the derived parent directory. 
+ std::fs::create_dir_all(parent).map_err(|e| CoderError::io_for_path(e, &spec.path))?; } } - std::fs::write(&abs, bytes).map_err(CoderError::from)?; - apply_mode(&abs, &spec.mode)?; + std::fs::write(abs, bytes).map_err(|e| CoderError::io_for_path(e, &spec.path))?; + apply_mode(abs, &spec.mode)?; Ok(bytes.len() as u64) } @@ -145,7 +190,7 @@ mod tests { fn setup() -> (tempfile::TempDir, Arc, Arc) { let tmp = tempdir().unwrap(); let cfg = Arc::new(CoderConfig { - base_path: tmp.path().to_path_buf(), + base_paths: vec![tmp.path().to_path_buf()], non_accessible_globs: vec!["**/.env".to_string()], max_read_bytes: 1024 * 1024, max_write_bytes: 1024 * 1024, @@ -175,6 +220,15 @@ mod tests { .unwrap(); assert!(out.results[0].success); assert_eq!(out.results[0].bytes_written, 5); + // Successful entries echo the canonical absolute path. + assert_eq!( + out.results[0].path, + std::fs::canonicalize(tmp.path()) + .unwrap() + .join("a.txt") + .display() + .to_string() + ); assert_eq!( std::fs::read_to_string(tmp.path().join("a.txt")).unwrap(), "hello" @@ -223,7 +277,8 @@ mod tests { .await .unwrap(); assert!(!out.results[0].success); - assert!(out.results[0].error.as_deref().unwrap().contains("C217")); + let err = out.results[0].error.as_ref().unwrap(); + assert_eq!(err.code, "C217"); assert_eq!( std::fs::read_to_string(tmp.path().join("a.txt")).unwrap(), "old" @@ -275,14 +330,14 @@ mod tests { .await .unwrap(); assert!(!out.results[0].success); - assert!(out.results[0].error.as_deref().unwrap().contains("C211")); + assert_eq!(out.results[0].error.as_ref().unwrap().code, "C211"); } #[tokio::test] async fn refuses_oversize() { let (_tmp, r, _c) = setup(); let small_cfg = Arc::new(CoderConfig { - base_path: _tmp.path().to_path_buf(), + base_paths: vec![_tmp.path().to_path_buf()], non_accessible_globs: vec![], max_write_bytes: 4, ..CoderConfig::default() @@ -303,7 +358,7 @@ mod tests { .await .unwrap(); assert!(!out.results[0].success); - 
assert!(out.results[0].error.as_deref().unwrap().contains("C213")); + assert_eq!(out.results[0].error.as_ref().unwrap().code, "C213"); } #[tokio::test] @@ -338,4 +393,39 @@ mod tests { assert!(out.results[1].success); assert!(tmp.path().join("ok.txt").exists()); } + + /// The per-entry `error` field must serialize as a raw JSON object — + /// NOT a JSON string containing escaped JSON. An LLM agent reading + /// `"code":"C2` directly as an object key requires no mental + /// unescaping; the old wire shape `\"code\":\"C2` was a double-encode. + #[tokio::test] + async fn error_field_serializes_as_structured_object_not_escaped_string() { + let (_tmp, r, c) = setup(); + let out = handle( + r, + c, + CreateFileInput { + files: vec![CreateFileSpec { + path: ".env".into(), + content: "x".into(), + mode: "0644".into(), + parents: true, + overwrite: false, + }], + }, + ) + .await + .unwrap(); + let serialized = serde_json::to_string(&out.results[0]).unwrap(); + // Structured object key must appear raw. + assert!( + serialized.contains(r#""code":"C2"#), + "expected raw object key; got: {serialized}" + ); + // Double-encoded form must NOT appear. + assert!( + !serialized.contains(r#"\"code\""#), + "double-encoded JSON detected; got: {serialized}" + ); + } } diff --git a/coder/src/functions/delete_file.rs b/coder/src/functions/delete_file.rs index c13b8a1e..3d6f64d2 100644 --- a/coder/src/functions/delete_file.rs +++ b/coder/src/functions/delete_file.rs @@ -1,7 +1,7 @@ //! `coder::delete-file` — remove one or more paths. Per-path errors are //! reported in the result array rather than failing the whole batch. //! Directories require `recursive: true`. Non-accessible paths return -//! `C211`. Trying to delete `base_path` itself is rejected. +//! `C211`. Trying to delete an allowed root itself is rejected. 
use std::path::Path; use std::sync::Arc; @@ -9,17 +9,32 @@ use std::sync::Arc; use schemars::JsonSchema; use serde::{Deserialize, Serialize}; -use crate::error::{err_to_string, CoderError}; +use crate::error::{err_to_string, CoderError, WireError}; use crate::path::PathResolver; +// examples are wire-contract; goldens pin them. #[derive(Debug, Deserialize, JsonSchema)] +#[schemars(example = "example_delete_file_input")] pub struct DeleteFileInput { + /// Paths to remove. Each entry is relative to the primary allowed root, + /// or an absolute path inside any allowed root. Call `coder::info` to + /// see the allowed roots. Paths outside every allowed root are rejected + /// — use the shell worker's `shell::fs::*` for host paths outside the + /// jail. pub paths: Vec, /// Required for non-empty directories. Files and empty dirs ignore it. #[serde(default)] pub recursive: bool, } +// examples are wire-contract; goldens pin them. +fn example_delete_file_input() -> serde_json::Value { + serde_json::json!({ + "paths": ["src/old_module.rs", "build/artifacts"], + "recursive": true + }) +} + #[derive(Debug, Serialize, JsonSchema)] pub struct DeleteFileOutput { pub results: Vec, @@ -27,11 +42,17 @@ pub struct DeleteFileOutput { #[derive(Debug, Serialize, JsonSchema)] pub struct DeleteFileResult { + /// Canonical absolute path (resolved through the jail); the caller's + /// input verbatim when resolution failed. pub path: String, pub success: bool, pub removed: bool, + /// Structured error for this entry. `code` is stable for programmatic + /// branching (e.g. `"C211"` for not-found-or-denied; `"C210"` for + /// refusing to delete an allowed root). `message` carries the + /// corrective action an LLM agent needs to make a successful second call. 
+ #[serde(skip_serializing_if = "Option::is_none")] - pub error: Option<String>, + pub error: Option<WireError>, } pub async fn handle( @@ -51,30 +72,49 @@ pub async fn handle( } fn delete_one(resolver: &PathResolver, rel: &str, recursive: bool) -> DeleteFileResult { - match try_delete_one(resolver, rel, recursive) { + // Resolve up front: deletion operates ONLY on the resolver-returned + // path, and the result echoes that canonical absolute path. When + // resolution fails there is no canonical path, so the caller's input + // is echoed verbatim. + let abs = match resolver.require_writable(rel) { + Ok(abs) => abs, + Err(e) => { + return DeleteFileResult { + path: rel.to_string(), + success: false, + removed: false, + error: Some((&e).into()), + } + } + }; + let wire_path = abs.display().to_string(); + match try_delete_one(resolver, &abs, recursive) { Ok(removed) => DeleteFileResult { - path: rel.to_string(), + path: wire_path, success: true, removed, error: None, }, Err(e) => DeleteFileResult { - path: rel.to_string(), + path: wire_path, success: false, removed: false, - error: Some(e.to_wire_string()), + error: Some((&e).into()), }, } } -fn try_delete_one(resolver: &PathResolver, rel: &str, recursive: bool) -> Result<bool, CoderError> { - let abs = resolver.require_writable(rel)?; - if abs == resolver.base_root() { +fn try_delete_one( + resolver: &PathResolver, + abs: &Path, + recursive: bool, +) -> Result<bool, CoderError> { + if resolver.is_root(abs) { return Err(CoderError::BadInput( - "refusing to delete base_path itself".into(), + "refusing to delete an allowed root itself".into(), )); } - let md = match std::fs::symlink_metadata(&abs) { + let md = match std::fs::symlink_metadata(abs) { Ok(m) => m, Err(e) if e.kind() == std::io::ErrorKind::NotFound => { // Idempotent: missing target counts as "not removed, no error".
@@ -84,19 +124,19 @@ fn try_delete_one(resolver: &PathResolver, rel: &str, recursive: bool) -> Result }; if md.file_type().is_dir() { if recursive { - remove_dir_all_safe(&abs, resolver)?; + remove_dir_all_safe(abs, resolver)?; } else { - std::fs::remove_dir(&abs).map_err(CoderError::from)?; + std::fs::remove_dir(abs).map_err(CoderError::from)?; } } else { - std::fs::remove_file(&abs).map_err(CoderError::from)?; + std::fs::remove_file(abs).map_err(CoderError::from)?; } Ok(true) } /// `std::fs::remove_dir_all` plus a guard rail: refuse to descend through /// non-accessible entries. The resolver canonicalised `abs` already so -/// it's known to be inside `base_root`. +/// it's known to be inside an allowed root. fn remove_dir_all_safe(abs: &Path, resolver: &PathResolver) -> Result<(), CoderError> { for entry in walkdir::WalkDir::new(abs) .min_depth(1) @@ -105,11 +145,13 @@ fn remove_dir_all_safe(abs: &Path, resolver: &PathResolver) -> Result<(), CoderE .filter_map(|e| e.ok()) { if resolver.is_non_accessible(entry.path()) { - return Err(CoderError::NotFoundOrDenied(format!( - "recursive delete blocked: {} contains non-accessible {}", - abs.display(), - entry.path().display() - ))); + // REDACTION INVARIANT: do NOT name the discovered child path. + // Naming it would allow callers to enumerate protected entries + // by probing recursive deletes. The sanctioned constructor + // references only the caller-supplied `abs`. 
+ return Err(CoderError::not_found_or_denied_subtree( + &abs.display().to_string(), + )); } } std::fs::remove_dir_all(abs).map_err(CoderError::from) @@ -123,7 +165,7 @@ mod tests { fn setup() -> (tempfile::TempDir, Arc) { let tmp = tempdir().unwrap(); let cfg = Arc::new(crate::config::CoderConfig { - base_path: tmp.path().to_path_buf(), + base_paths: vec![tmp.path().to_path_buf()], non_accessible_globs: vec!["**/.env".to_string()], ..crate::config::CoderConfig::default() }); @@ -179,7 +221,7 @@ mod tests { .await .unwrap(); assert!(!out.results[0].success); - assert!(out.results[0].error.as_deref().unwrap().contains("C211")); + assert_eq!(out.results[0].error.as_ref().unwrap().code, "C211"); assert!(tmp.path().join(".env").exists()); } @@ -233,10 +275,47 @@ mod tests { .await .unwrap(); assert!(!out.results[0].success); - assert!(out.results[0].error.as_deref().unwrap().contains("C211")); + assert_eq!(out.results[0].error.as_ref().unwrap().code, "C211"); assert!(tmp.path().join("d/.env").exists()); } + // REDACTION INVARIANT: the error message for a recursive-delete blocked + // by a non-accessible child MUST NOT contain the child's filename. The + // caller supplied "d", so only "d" (its canonical absolute form) may + // appear — the child ".env" must be invisible to the caller. + #[tokio::test] + async fn recursive_blocked_error_does_not_leak_child_path() { + let (tmp, r) = setup(); + std::fs::create_dir(tmp.path().join("secrets")).unwrap(); + std::fs::write(tmp.path().join("secrets/.env"), "API_KEY=secret").unwrap(); + let out = handle( + r, + DeleteFileInput { + paths: vec!["secrets".into()], + recursive: true, + }, + ) + .await + .unwrap(); + assert!(!out.results[0].success); + let err = out.results[0].error.as_ref().unwrap(); + // Code must be C211. + assert_eq!(err.code, "C211", "expected C211, got: {:?}", err.code); + // The discovered child name must NOT appear in the error. 
+ assert!( + !err.message.contains(".env"), + "REDACTION INVARIANT violated: error leaks child '.env': {}", + err.message + ); + // The caller-supplied directory name IS allowed to appear (it was + // the input they gave us). + assert!( + err.message.contains("secrets"), + "expected caller path 'secrets' in error, got: {}", + err.message + ); + } + #[tokio::test] async fn refuses_to_delete_base_root() { let (_tmp, r) = setup(); @@ -250,6 +329,6 @@ mod tests { .await .unwrap(); assert!(!out.results[0].success); - assert!(out.results[0].error.as_deref().unwrap().contains("C210")); + assert_eq!(out.results[0].error.as_ref().unwrap().code, "C210"); } } diff --git a/coder/src/functions/info.rs b/coder/src/functions/info.rs new file mode 100644 index 00000000..5c974b7c --- /dev/null +++ b/coder/src/functions/info.rs @@ -0,0 +1,251 @@ +//! `coder::info` — report the jail contract: allowed roots, caps, +//! response budgets, default noise excludes (`default_exclude_globs`), +//! and non-accessible glob patterns. No I/O; pure read from the runtime +//! `PathResolver` and `CoderConfig`. Call this first when unsure where +//! coder may read or write, or when a path was rejected. + +use std::sync::Arc; + +use schemars::JsonSchema; +use serde::{Deserialize, Serialize}; + +use crate::config::CoderConfig; +use crate::path::PathResolver; + +/// No arguments — `coder::info` is a pure discovery call. +// examples are wire-contract; goldens pin them. +#[derive(Debug, Default, Deserialize, JsonSchema)] +#[schemars(example = "example_info_input")] +pub struct InfoInput {} + +// examples are wire-contract; goldens pin them. +fn example_info_input() -> serde_json::Value { + serde_json::json!({}) +} + +#[derive(Debug, Serialize, JsonSchema)] +pub struct InfoOutput { + /// Canonical absolute paths of the allowed roots, in configuration order. + /// The primary root (index 0) is where relative wire paths resolve; an + /// absolute path is accepted when it canonicalises inside ANY of these. 
+ /// Paths outside every root are rejected — use `shell::fs::*` instead. + pub base_paths: Vec, + + /// Convenience duplicate of `base_paths[0]` — the primary allowed root. + /// Relative paths resolve against this directory. + pub primary_root: String, + + /// Glob patterns matched per root (root-relative). Files whose + /// root-relative path matches are listable but not + /// readable/writable/deletable/creatable; they return C211. + pub non_accessible_globs: Vec, + + /// Noise-exclusion globs (root-relative, same matching as + /// `non_accessible_globs`): matching paths (node_modules, .git, …) + /// are omitted from `coder::search` results and pruned from + /// `coder::tree` descent — the directory surfaces as a childless + /// `truncated` stub. Hide-only — no access protection. Pass + /// `use_default_excludes: false` on those calls to look inside. + pub default_exclude_globs: Vec, + + /// Per-file IO ceiling for `coder::read-file`. Full reads of files + /// larger than this are rejected with C213; windowed reads cap the + /// returned window bytes instead, so larger files stay readable + /// window by window. Also the ceiling for `coder::search` content + /// scanning — larger files are silently skipped during search. + pub max_read_bytes: u64, + + /// Maximum bytes that `coder::create-file` or `coder::update-file` will + /// accept for a single file write. Larger writes are rejected with C213. + pub max_write_bytes: u64, + + /// Default `max_depth` used by `coder::tree` when the caller omits it. + pub tree_default_depth: u32, + + /// Maximum entries returned per folder node by `coder::tree`; folders + /// that exceed this are flagged `truncated`. + pub tree_per_folder_limit: u32, + + /// Default `page_size` used by `coder::list-folder` when the caller + /// omits it. + pub list_default_page_size: u32, + + /// Hard cap on `page_size` accepted by `coder::list-folder`. 
+ pub list_max_page_size: u32, + + /// Default `max_matches` used by `coder::search` when the caller omits + /// it. + pub search_default_max_matches: u32, + + /// Per-line byte cap in `coder::search`: matching considers at most + /// this many bytes of each line, and matched/context lines are + /// truncated to it. + pub search_default_max_line_bytes: u32, + + /// Aggregate byte budget for one `coder::search` response, measured + /// in payload bytes (paths + matched text + context lines). When the + /// budget is hit the response sets `truncated: true` — refine the + /// query or add `include_globs`. + pub search_response_budget_bytes: u64, + + /// Aggregate budget across a single `paths[]` batch call to + /// `coder::read-file`, measured in bytes of returned content (after + /// UTF-8 sanitization — invalid bytes expand to U+FFFD before being + /// counted, so the cap bounds what the caller actually receives). + /// Entries are collected in request order; each entry may consume up + /// to `min(remaining_budget, max_read_bytes)`. An entry reached with + /// zero remaining budget receives a per-entry C213 naming this key, + /// its value, and the bytes already consumed, with recovery guidance. + /// Budget topology: batch reads are governed by this key; single-path + /// full reads by `max_output_bytes`; windowed reads by `max_read_bytes` + /// applied per returned window — `max_read_bytes` is also the per-file + /// IO ceiling for all of them. + pub batch_read_budget_bytes: u64, + + /// Context budget for single-path FULL reads in `coder::read-file`, + /// in bytes of returned content. Full reads larger than this return + /// C213 with the file's size/line count and window/stat recovery + /// guidance; a per-call `max_output_bytes` override is available on + /// `coder::read-file` (clamped to `max_read_bytes`). + pub max_output_bytes: u64, + + /// Coder worker version (`CARGO_PKG_VERSION`). 
+ pub version: String, +} + +pub async fn handle( + resolver: Arc, + cfg: Arc, +) -> Result { + Ok(inner(&resolver, &cfg)) +} + +fn inner(resolver: &PathResolver, cfg: &CoderConfig) -> InfoOutput { + let base_paths: Vec = resolver + .roots() + .iter() + .map(|p| p.display().to_string()) + .collect(); + let primary_root = base_paths[0].clone(); + + InfoOutput { + base_paths, + primary_root, + non_accessible_globs: cfg.non_accessible_globs.clone(), + default_exclude_globs: cfg.default_exclude_globs.clone(), + max_read_bytes: cfg.max_read_bytes, + max_write_bytes: cfg.max_write_bytes, + tree_default_depth: cfg.tree_default_depth, + tree_per_folder_limit: cfg.tree_per_folder_limit, + list_default_page_size: cfg.list_default_page_size, + list_max_page_size: cfg.list_max_page_size, + search_default_max_matches: cfg.search_default_max_matches, + search_default_max_line_bytes: cfg.search_default_max_line_bytes, + search_response_budget_bytes: cfg.search_response_budget_bytes, + batch_read_budget_bytes: cfg.batch_read_budget_bytes, + max_output_bytes: cfg.max_output_bytes, + version: env!("CARGO_PKG_VERSION").to_string(), + } +} + +#[cfg(test)] +mod tests { + use super::*; + use std::path::PathBuf; + use tempfile::tempdir; + + fn make_resolver_cfg( + roots: Vec, + globs: Vec<&str>, + ) -> (Arc, Arc) { + let cfg = Arc::new(CoderConfig { + base_paths: roots, + non_accessible_globs: globs.into_iter().map(String::from).collect(), + ..CoderConfig::default() + }); + let resolver = Arc::new(PathResolver::new(&cfg).unwrap()); + (resolver, cfg) + } + + /// `base_paths` in the output must be CANONICAL, not the raw configured + /// form. On macOS /tmp is a symlink to /private/tmp — we verify by + /// using a path form that differs from its canonical counterpart. + #[test] + fn canonical_not_raw() { + let tmp = tempdir().unwrap(); + // Construct a non-canonical path form: append a "." segment. 
+ let non_canon = tmp.path().join("."); + let (resolver, cfg) = make_resolver_cfg(vec![non_canon], vec![]); + + let out = inner(&resolver, &cfg); + + let expected_canon = std::fs::canonicalize(tmp.path()) + .unwrap() + .display() + .to_string(); + assert_eq!( + out.base_paths, + vec![expected_canon.clone()], + "base_paths must be canonical, not raw configured form" + ); + assert_eq!( + out.primary_root, expected_canon, + "primary_root must equal base_paths[0]" + ); + } + + /// All caps, globs, and version must be populated from config; + /// primary_root must equal base_paths[0]; version must match + /// CARGO_PKG_VERSION. + #[test] + fn field_completeness() { + let tmp = tempdir().unwrap(); + let cfg = Arc::new(CoderConfig { + base_paths: vec![tmp.path().to_path_buf()], + non_accessible_globs: vec!["**/.env".to_string(), "**/*.pem".to_string()], + default_exclude_globs: vec!["**/build/**".to_string()], + max_read_bytes: 42, + max_write_bytes: 43, + tree_default_depth: 7, + tree_per_folder_limit: 9, + list_default_page_size: 11, + list_max_page_size: 13, + search_default_max_matches: 17, + search_default_max_line_bytes: 19, + search_response_budget_bytes: 29, + batch_read_budget_bytes: 23, + max_output_bytes: 31, + ..CoderConfig::default() + }); + let resolver = Arc::new(PathResolver::new(&cfg).unwrap()); + + let out = inner(&resolver, &cfg); + + // primary_root == base_paths[0] + assert!(!out.base_paths.is_empty()); + assert_eq!(out.primary_root, out.base_paths[0]); + + // globs + assert_eq!( + out.non_accessible_globs, + vec!["**/.env".to_string(), "**/*.pem".to_string()] + ); + assert_eq!(out.default_exclude_globs, vec!["**/build/**".to_string()]); + + // caps + assert_eq!(out.max_read_bytes, 42); + assert_eq!(out.max_write_bytes, 43); + assert_eq!(out.tree_default_depth, 7); + assert_eq!(out.tree_per_folder_limit, 9); + assert_eq!(out.list_default_page_size, 11); + assert_eq!(out.list_max_page_size, 13); + assert_eq!(out.search_default_max_matches, 17); + 
assert_eq!(out.search_default_max_line_bytes, 19); + assert_eq!(out.search_response_budget_bytes, 29); + assert_eq!(out.batch_read_budget_bytes, 23); + assert_eq!(out.max_output_bytes, 31); + + // version + assert_eq!(out.version, env!("CARGO_PKG_VERSION")); + } +} diff --git a/coder/src/functions/list_folder.rs b/coder/src/functions/list_folder.rs index 251051e2..8be382a6 100644 --- a/coder/src/functions/list_folder.rs +++ b/coder/src/functions/list_folder.rs @@ -12,9 +12,15 @@ use crate::config::CoderConfig; use crate::error::{err_to_string, CoderError}; use crate::path::PathResolver; +// examples are wire-contract; goldens pin them. #[derive(Debug, Deserialize, JsonSchema)] +#[schemars(example = "example_list_folder_input")] pub struct ListFolderInput { - /// Folder, relative to `base_path`. Defaults to `.` (the base itself). + /// Folder to list. Relative to the primary allowed root, or an absolute + /// path inside any allowed root. Defaults to `.` (the primary root + /// itself). Call `coder::info` to see the allowed roots. Paths outside + /// every allowed root are rejected — use the shell worker's + /// `shell::fs::*` for host paths outside the jail. #[serde(default = "default_path")] pub path: String, #[serde(default = "default_page")] @@ -32,8 +38,21 @@ fn default_page() -> u32 { 1 } +// examples are wire-contract; goldens pin them. +fn example_list_folder_input() -> serde_json::Value { + serde_json::json!({ + "path": "src", + "page": 1, + "page_size": 50 + }) +} + #[derive(Debug, Serialize, JsonSchema)] pub struct ListFolderOutput { + /// Canonical absolute path of the listed folder (resolved through the + /// jail). Entries carry only `name`; derive an entry's absolute path + /// by joining: entry path = this path + "/" + name. Operations on + /// derived paths re-validate through the jail. 
pub path: String, pub entries: Vec<DirEntry>, pub total: u64, @@ -44,6 +63,8 @@ pub struct ListFolderOutput { #[derive(Debug, Serialize, JsonSchema)] pub struct DirEntry { + /// Entry basename. The absolute path is derivable from the response's + /// `path`: entry path = folder path + "/" + name. pub name: String, pub kind: EntryKind, pub size: u64, @@ -79,7 +100,10 @@ fn inner( // list a directory that contains non-accessible *children* even if the // directory itself happened to match a glob. let abs = resolver.resolve(&req.path)?; - let md = std::fs::metadata(&abs)?; + // NotFound is intercepted with the wire path in scope so the C211 + // message names the path the caller supplied (standardized wording — + // REDACTION INVARIANT: identical to the glob-denied message). + let md = std::fs::metadata(&abs).map_err(|e| CoderError::io_for_path(e, &req.path))?; if !md.is_dir() { return Err(CoderError::BadInput(format!( "not a directory: {}", @@ -95,7 +119,8 @@ fn inner( let page = req.page.max(1); let mut all: Vec<DirEntry> = Vec::new(); - for entry in std::fs::read_dir(&abs)? 
{ + let read_dir = std::fs::read_dir(&abs).map_err(|e| CoderError::io_for_path(e, &req.path))?; + for entry in read_dir { let e = entry?; let name = e.file_name().to_string_lossy().into_owned(); let entry_md = match e.metadata() { @@ -124,7 +149,7 @@ fn inner( let has_more = (end as u64) < total; Ok(ListFolderOutput { - path: req.path, + path: abs.display().to_string(), entries, total, page, @@ -179,7 +204,7 @@ mod tests { fn setup() -> (tempfile::TempDir, Arc<PathResolver>, Arc<CoderConfig>) { let tmp = tempdir().unwrap(); let cfg = Arc::new(CoderConfig { - base_path: tmp.path().to_path_buf(), + base_paths: vec![tmp.path().to_path_buf()], non_accessible_globs: vec!["**/.env".to_string()], list_default_page_size: 100, list_max_page_size: 1000, @@ -210,6 +235,14 @@ mod tests { assert_eq!(names, vec!["a.txt", "b.txt", "dir"]); assert_eq!(out.total, 3); assert!(!out.has_more); + // The folder path is canonical-absolute (decision D2-eng); + // entries carry only names. + let base = std::fs::canonicalize(tmp.path()).unwrap(); + assert_eq!(out.path, base.display().to_string()); + // WIRE-CONTRACT PIN: the documented derivation rule (entry path = + // folder path + "/" + name) must reproduce the real fs path. + let derived = format!("{}/{}", out.path, out.entries[0].name); + assert_eq!(derived, base.join("a.txt").display().to_string()); } #[tokio::test] @@ -240,7 +273,7 @@ mod tests { let (tmp, r, _c) = setup(); std::fs::write(tmp.path().join("x.txt"), "x").unwrap(); let cfg = Arc::new(CoderConfig { - base_path: tmp.path().to_path_buf(), + base_paths: vec![tmp.path().to_path_buf()], list_default_page_size: 50, list_max_page_size: 5, ..CoderConfig::default() diff --git a/coder/src/functions/mod.rs b/coder/src/functions/mod.rs index 6ad73d46..6727f9c0 100644 --- a/coder/src/functions/mod.rs +++ b/coder/src/functions/mod.rs @@ -3,11 +3,20 @@ //! `register_` below uses `RegisterFunction::new_async` with typed //! `JsonSchema` request/response structs so the SDK can emit schemas for //! 
tools and docs (binary-worker.md §7). +//! +//! WIRE-SURFACE CATALOG — `catalog()` below is the single source of truth +//! for every function's id + registration description, plus the +//! schemars-derived request/response schemas. The golden test +//! `tests/golden_schemas.rs` snapshots each entry so ANY change to the +//! agent-facing wire surface shows up as an explicit, reviewed diff. pub mod create_file; pub mod delete_file; +pub mod info; pub mod list_folder; +pub mod move_file; pub mod read_file; +pub mod read_window; pub mod search; pub mod tree; pub mod update_file; @@ -16,107 +25,328 @@ use std::sync::Arc; use iii_sdk::{IIIError, RegisterFunction, III}; -use crate::config::CoderConfig; +use crate::configuration::ConfigCell; use crate::path::PathResolver; -pub fn register_all(iii: &III, resolver: Arc, cfg: Arc) { +// --------------------------------------------------------------------------- +// Function ids + registration descriptions (ONE place). +// +// The jail-contract sentence in every description below also appears in +// the schema field docs of each input struct. This duplication is +// deliberate: the catalog description and the schema docs reach agents +// through DIFFERENT surfaces — `functions::list` shows descriptions only, +// while the schema docs surface via `functions::info`. Do not "DRY" one +// into the other. +// --------------------------------------------------------------------------- + +const INFO_ID: &str = "coder::info"; +const INFO_DESC: &str = "Report the coder jail: canonical allowed roots (primary first), \ + per-file size caps, response budgets (max_output_bytes, \ + batch_read_budget_bytes, search_response_budget_bytes), \ + listing/search limits, the non-accessible glob patterns, and the \ + default_exclude_globs noise filter applied by tree/search. 
Call \ + this FIRST when unsure where coder may read or write, or when a \ + path was rejected — paths outside every allowed root need the \ + shell worker's shell::fs::* instead."; + +const READ_FILE_ID: &str = "coder::read-file"; +const READ_FILE_DESC: &str = "Read a file window-first: probe with stat: true (size/mtime/mode \ + plus total_lines, no content), then fetch just the lines you need \ + with line_from/line_to (1-based, inclusive) — windows keep files \ + larger than max_read_bytes readable window by window, with \ + more_lines/total_lines reporting what remains. numbered: true \ + prefixes each line with its absolute 1-based file line number, \ + matching coder::update-file's line ops exactly. Full reads are \ + budgeted by max_output_bytes (default 128 KiB; per-call override \ + clamped to max_read_bytes) — an over-budget full read fails with \ + a C213 carrying the file's size, line count, and the window/stat \ + recovery calls. Batch mode: pass paths[] (XOR path) \ + to read multiple files in one call — entries are processed in \ + request order against batch_read_budget_bytes, measured in \ + bytes of returned content (after UTF-8 sanitization); per-entry \ + errors (C211/C213) leave other entries unaffected. Paths are relative \ + to the primary allowed root or absolute inside any allowed root \ + (coder::info lists them); for host paths outside the jail use \ + shell::fs::*. Non-accessible paths return C211."; + +const SEARCH_ID: &str = "coder::search"; +const SEARCH_DESC: &str = "Search file contents and/or paths. Supports literal or regex \ + queries with include/exclude globs; non-accessible files are \ + excluded from both content and path results. Only the FIRST match \ + on each line is reported (one content match per matching line). \ + Optional context_lines_before/context_lines_after (max 10) attach \ + surrounding lines to each content match so many edits can go \ + straight to coder::update-file with no read in between. 
Noise \ + paths matching default_exclude_globs (.git, node_modules, target, \ + … — coder::info lists them) are skipped by default; pass \ + use_default_excludes: false to search inside them. Files larger \ + than max_read_bytes are silently skipped during content scanning. \ + Results are capped by max_matches AND a response byte budget \ + (search_response_budget_bytes); when truncated is true, refine \ + the query or add include_globs rather than paginate. Paths are relative \ + to the primary allowed root or absolute inside any allowed root \ + (coder::info lists them); for host paths outside the jail use \ + shell::fs::*."; + +const UPDATE_FILE_ID: &str = "coder::update-file"; +const UPDATE_FILE_DESC: &str = "Apply batched line-oriented and regex edits across one or more \ + files. Request shape: {\"files\": [{\"path\": \"...\", \"ops\": [...]}]}. \ + Line ops: { op: 'insert', at_line, content } | \ + { op: 'remove', from_line, to_line } | \ + { op: 'update_lines', from_line, to_line, content } — 1-based, \ + inclusive, applied bottom-up. Regex op: { op: 'replace', pattern, \ + replacement, ignore_case?, dot_matches_newline?, expect_matches? } \ + runs on the file body after line ops. Replace large regions \ + WITHOUT quoting them: two short anchors joined by .*? with \ + dot_matches_newline: true — always prefer wildcards over pasting \ + the block into the pattern. expect_matches: 1 turns a silent \ + multi-site clobber into a safe pre-write C210; expect_matches: 0 \ + asserts absence. In `replacement`, $1/${name} are capture \ + references and a literal $ must be written $$ (JS/TS template \ + literals: `Hello, $${name}!`); undefined references fail \ + pre-write with C210. Each file commits atomically via temp + \ + rename. 
On success each applied line op returns a bounded \ + post-apply echo (±2 context lines); regex replace ops return up \ + to 5 per-match-site echoes (first + last line of each replaced \ + region, inner lines elided) — verify from the echoes instead of \ + re-reading the file. Paths are relative to the primary \ + allowed root or absolute inside any allowed root (coder::info \ + lists them); for host paths outside the jail use shell::fs::*."; + +const CREATE_FILE_ID: &str = "coder::create-file"; +const CREATE_FILE_DESC: &str = "Create one or more files. Request shape: {\"files\": [{\"path\": \ + \"...\", \"content\": \"...\"}]}. Per-file `overwrite` and `parents` \ + flags; non-accessible paths return C211. Paths are relative to \ + the primary allowed root or absolute inside any allowed root \ + (coder::info lists them); for host paths outside the jail use \ + shell::fs::*."; + +const DELETE_FILE_ID: &str = "coder::delete-file"; +const DELETE_FILE_DESC: &str = "Remove one or more paths. Request shape: {\"paths\": [\"...\"]}. \ + Directories need `recursive: true`; \ + missing paths are idempotent successes; recursive removal \ + refuses to descend through non-accessible entries. Paths are \ + relative to the primary allowed root or absolute inside any \ + allowed root (coder::info lists them); for host paths outside \ + the jail use shell::fs::*."; + +const LIST_FOLDER_ID: &str = "coder::list-folder"; +const LIST_FOLDER_DESC: &str = "Paginated single-folder listing, sorted by name. Entries carry \ + only `name`; derive an entry's absolute path as the response's \ + `path` + '/' + name. Non-accessible entries are still listed with a \ + `non_accessible: true` flag. 
Paths are relative to the primary \ + allowed root or absolute inside any allowed root (coder::info \ + lists them); for host paths outside the jail use shell::fs::*."; + +const TREE_ID: &str = "coder::tree"; +const TREE_DESC: &str = "Recursive directory snapshot bounded by `max_depth` and a \ + `per_folder_limit`. Slim wire shape: nodes carry only `name` — \ + the root node's path IS the response's top-level `path`; derive \ + any child's path as parent path + '/' + name. Folders that hit \ + the limit are flagged `truncated` and the caller is pointed at \ + coder::list-folder for pagination. Noise directories matching \ + default_exclude_globs (.git, node_modules, target, … — \ + coder::info lists them) appear as childless `truncated` stubs; \ + pass use_default_excludes: false to descend into them. Paths are \ + relative to the primary allowed root or absolute inside any \ + allowed root (coder::info lists them); for host paths outside \ + the jail use shell::fs::*."; + +const MOVE_FILE_ID: &str = "coder::move"; +const MOVE_FILE_DESC: &str = "Move or rename one or more paths inside the jail. Request shape: \ + {\"files\": [{\"from\": \"...\", \"to\": \"...\"}]}. Paths are \ + relative to the primary allowed root or absolute inside any \ + allowed root (coder::info lists them); for host paths outside \ + the jail use shell::fs::*. Per-entry `overwrite` and `parents` \ + flags. Same-root moves use a per-file-atomic rename; cross-root \ + moves use copy+delete (files only — cross-root directory moves \ + are unsupported, move files individually). Copy+delete is \ + rollback-safe: if source deletion fails after a successful copy \ + the copy is removed and the error names the failure; if rollback \ + also fails the error names both states for manual cleanup."; + +/// One function's complete agent-facing wire surface: id, registration +/// description, and the schemars-derived request/response schemas. 
+pub struct FunctionSpec { + pub function_id: &'static str, + pub description: &'static str, + pub request_schema: schemars::schema::RootSchema, + pub response_schema: schemars::schema::RootSchema, +} + +/// Schema generation MUST mirror iii-sdk's internal `json_schema_for` +/// (`SchemaSettings::draft07()` on the handler's request/response types): +/// `RegisterFunction::new_async` auto-extracts schemas from the SAME +/// structs referenced here, with the same schemars 0.8 generator settings, +/// so a catalog snapshot pins exactly what registration emits. +fn schema_of() -> schemars::schema::RootSchema { + schemars::r#gen::SchemaSettings::draft07() + .into_generator() + .into_root_schema_for::() +} + +fn spec(function_id: &'static str, description: &'static str) -> FunctionSpec +where + Req: schemars::JsonSchema, + Resp: schemars::JsonSchema, +{ + FunctionSpec { + function_id, + description, + request_schema: schema_of::(), + response_schema: schema_of::(), + } +} + +/// The full wire-surface catalog, in registration order. Golden-tested in +/// `tests/golden_schemas.rs`; keep in lockstep with `register_all`. +pub fn catalog() -> Vec { + vec![ + spec::(INFO_ID, INFO_DESC), + spec::(READ_FILE_ID, READ_FILE_DESC), + spec::(SEARCH_ID, SEARCH_DESC), + spec::( + UPDATE_FILE_ID, + UPDATE_FILE_DESC, + ), + spec::( + CREATE_FILE_ID, + CREATE_FILE_DESC, + ), + spec::( + DELETE_FILE_ID, + DELETE_FILE_DESC, + ), + spec::( + LIST_FOLDER_ID, + LIST_FOLDER_DESC, + ), + spec::(TREE_ID, TREE_DESC), + spec::(MOVE_FILE_ID, MOVE_FILE_DESC), + ] +} + +pub fn register_all(iii: &III, resolver: Arc, cfg: ConfigCell) { + // DRIFT GUARD: the register_* calls below and the entries in + // `catalog()` must stay 1:1 — catalog() feeds the wire-schema goldens + // (tests/golden_schemas.rs). Adding a function to one list but not + // the other trips the debug_assert below (exercised engine-free by + // `tests::register_all_count_matches_catalog`). 
+ let mut registered: usize = 0; + register_info(iii, resolver.clone(), cfg.clone()); + registered += 1; register_read_file(iii, resolver.clone(), cfg.clone()); + registered += 1; register_search(iii, resolver.clone(), cfg.clone()); + registered += 1; register_update_file(iii, resolver.clone(), cfg.clone()); + registered += 1; register_create_file(iii, resolver.clone(), cfg.clone()); + registered += 1; register_delete_file(iii, resolver.clone()); + registered += 1; register_list_folder(iii, resolver.clone(), cfg.clone()); - register_tree(iii, resolver, cfg); - tracing::info!("coder registered 7 functions"); + registered += 1; + register_tree(iii, resolver.clone(), cfg.clone()); + registered += 1; + register_move_file(iii, resolver); + registered += 1; + debug_assert_eq!( + registered, + catalog().len(), + "register_all and catalog() drifted — register every catalog() \ + entry (and vice versa), then regenerate the wire-schema goldens \ + (UPDATE_GOLDENS=1 cargo test)" + ); + tracing::info!(count = registered, "coder registered functions"); +} + +fn register_info(iii: &III, resolver: Arc, cfg: ConfigCell) { + iii.register_function( + INFO_ID, + RegisterFunction::new_async(move |_req: info::InfoInput| { + let resolver = resolver.clone(); + let cfg = cfg.clone(); + async move { + let cfg = cfg.read().await.clone(); + info::handle(resolver, cfg).await.map_err(IIIError::from) + } + }) + .description(INFO_DESC), + ); } -fn register_read_file(iii: &III, resolver: Arc, cfg: Arc) { +fn register_read_file(iii: &III, resolver: Arc, cfg: ConfigCell) { iii.register_function( - "coder::read-file", + READ_FILE_ID, RegisterFunction::new_async(move |req: read_file::ReadFileInput| { let resolver = resolver.clone(); let cfg = cfg.clone(); async move { + let cfg = cfg.read().await.clone(); read_file::handle(resolver, cfg, req) .await .map_err(IIIError::from) } }) - .description( - "Read a file relative to base_path. Returns content plus \ - size/mtime/mode. 
Capped by max_read_bytes; non-accessible \ - paths return C211.", - ), + .description(READ_FILE_DESC), ); } -fn register_search(iii: &III, resolver: Arc, cfg: Arc) { +fn register_search(iii: &III, resolver: Arc, cfg: ConfigCell) { iii.register_function( - "coder::search", + SEARCH_ID, RegisterFunction::new_async(move |req: search::SearchInput| { let resolver = resolver.clone(); let cfg = cfg.clone(); async move { + let cfg = cfg.read().await.clone(); search::handle(resolver, cfg, req) .await .map_err(IIIError::from) } }) - .description( - "Search file contents and/or paths under base_path. Supports \ - literal or regex queries with include/exclude globs; \ - non-accessible files are excluded from both content and path \ - results.", - ), + .description(SEARCH_DESC), ); } -fn register_update_file(iii: &III, resolver: Arc, cfg: Arc) { +fn register_update_file(iii: &III, resolver: Arc, cfg: ConfigCell) { iii.register_function( - "coder::update-file", + UPDATE_FILE_ID, RegisterFunction::new_async(move |req: update_file::UpdateFileInput| { let resolver = resolver.clone(); let cfg = cfg.clone(); async move { + let cfg = cfg.read().await.clone(); update_file::handle(resolver, cfg, req) .await .map_err(IIIError::from) } }) - .description( - "Apply batched line-oriented and regex edits across one or more \ - files. Line ops: { op: 'insert', at_line, content } | \ - { op: 'remove', from_line, to_line } | \ - { op: 'update_lines', from_line, to_line, content } — 1-based, \ - inclusive, applied bottom-up. Regex op: { op: 'replace', pattern, \ - replacement, ignore_case? } runs on the file body after line \ - ops. 
Each file commits atomically via temp + rename.", - ), + .description(UPDATE_FILE_DESC), ); } -fn register_create_file(iii: &III, resolver: Arc, cfg: Arc) { +fn register_create_file(iii: &III, resolver: Arc, cfg: ConfigCell) { iii.register_function( - "coder::create-file", + CREATE_FILE_ID, RegisterFunction::new_async(move |req: create_file::CreateFileInput| { let resolver = resolver.clone(); let cfg = cfg.clone(); async move { + let cfg = cfg.read().await.clone(); create_file::handle(resolver, cfg, req) .await .map_err(IIIError::from) } }) - .description( - "Create one or more files. Per-file `overwrite` and `parents` \ - flags; non-accessible paths return C211.", - ), + .description(CREATE_FILE_DESC), ); } fn register_delete_file(iii: &III, resolver: Arc) { iii.register_function( - "coder::delete-file", + DELETE_FILE_ID, RegisterFunction::new_async(move |req: delete_file::DeleteFileInput| { let resolver = resolver.clone(); async move { @@ -125,51 +355,75 @@ fn register_delete_file(iii: &III, resolver: Arc) { .map_err(IIIError::from) } }) - .description( - "Remove one or more paths. Directories need `recursive: true`; \ - missing paths are idempotent successes; recursive removal \ - refuses to descend through non-accessible entries.", - ), + .description(DELETE_FILE_DESC), ); } -fn register_list_folder(iii: &III, resolver: Arc, cfg: Arc) { +fn register_list_folder(iii: &III, resolver: Arc, cfg: ConfigCell) { iii.register_function( - "coder::list-folder", + LIST_FOLDER_ID, RegisterFunction::new_async(move |req: list_folder::ListFolderInput| { let resolver = resolver.clone(); let cfg = cfg.clone(); async move { + let cfg = cfg.read().await.clone(); list_folder::handle(resolver, cfg, req) .await .map_err(IIIError::from) } }) - .description( - "Paginated single-folder listing, sorted by name. 
\ - Non-accessible entries are still listed with a \ - `non_accessible: true` flag.", - ), + .description(LIST_FOLDER_DESC), ); } -fn register_tree(iii: &III, resolver: Arc, cfg: Arc) { +fn register_tree(iii: &III, resolver: Arc, cfg: ConfigCell) { iii.register_function( - "coder::tree", + TREE_ID, RegisterFunction::new_async(move |req: tree::TreeInput| { let resolver = resolver.clone(); let cfg = cfg.clone(); async move { + let cfg = cfg.read().await.clone(); tree::handle(resolver, cfg, req) .await .map_err(IIIError::from) } }) - .description( - "Recursive directory snapshot bounded by `max_depth` and a \ - `per_folder_limit`. Folders that hit the limit are flagged \ - `truncated` and the caller is pointed at coder::list-folder \ - for pagination.", - ), + .description(TREE_DESC), ); } + +fn register_move_file(iii: &III, resolver: Arc) { + iii.register_function( + MOVE_FILE_ID, + RegisterFunction::new_async(move |req: move_file::MoveFileInput| { + let resolver = resolver.clone(); + async move { + move_file::handle(resolver, req) + .await + .map_err(IIIError::from) + } + }) + .description(MOVE_FILE_DESC), + ); +} + +#[cfg(test)] +mod tests { + use super::*; + + /// DRIFT GUARD execution: `III::new` only buffers registrations into a + /// channel (no connection, no runtime needed), so `register_all` runs + /// engine-free here and its debug_assert fires in `cargo test` when + /// the register_* calls and `catalog()` fall out of 1:1. 
+ #[test] + fn register_all_count_matches_catalog() { + use crate::config::CoderConfig; + use tokio::sync::RwLock; + let iii = III::new("ws://127.0.0.1:1"); + let cfg = CoderConfig::default(); + let resolver = Arc::new(PathResolver::new(&cfg).unwrap()); + let cell: ConfigCell = Arc::new(RwLock::new(Arc::new(cfg))); + register_all(&iii, resolver, cell); + } +} diff --git a/coder/src/functions/move_file.rs b/coder/src/functions/move_file.rs new file mode 100644 index 00000000..18acf87c --- /dev/null +++ b/coder/src/functions/move_file.rs @@ -0,0 +1,1091 @@ +//! `coder::move` — rename or move one or more paths inside the jail. +//! +//! Same-root moves use `std::fs::rename` (atomic, works for files and +//! directories). Cross-root moves are supported for FILES only: copy → verify +//! (length match) → delete source, with rollback when the source delete fails. +//! Cross-root directory moves are rejected (`C210`). Per-entry errors are +//! reported in the result array rather than failing the whole batch. + +use std::path::Path; +use std::sync::Arc; + +use schemars::JsonSchema; +use serde::{Deserialize, Serialize}; + +use crate::error::{err_to_string, CoderError, WireError}; +use crate::path::PathResolver; + +// --------------------------------------------------------------------------- +// Input / output types +// --------------------------------------------------------------------------- + +// examples are wire-contract; goldens pin them. +#[derive(Debug, Deserialize, JsonSchema)] +#[schemars(example = "example_move_file_input")] +pub struct MoveFileInput { + /// Entries to move. Each entry is processed independently so a single + /// failure never aborts the rest. + pub files: Vec<MoveFileSpec>, +} + +#[derive(Debug, Deserialize, JsonSchema)] +pub struct MoveFileSpec { + /// Source path: relative to the primary allowed root, or an absolute path + /// inside any allowed root. Call `coder::info` to see the allowed roots. 
+ /// Paths outside every allowed root are rejected — use the shell worker's + /// `shell::fs::*` for host paths outside the jail. + pub from: String, + + /// Destination path: relative to the primary allowed root, or an absolute + /// path inside any allowed root. Call `coder::info` to see the allowed + /// roots. Paths outside every allowed root are rejected — use the shell + /// worker's `shell::fs::*` for host paths outside the jail. + pub to: String, + + /// When false (the default), refuse to overwrite an existing destination. + /// Pass `overwrite: true` to replace an existing file at `to`. + #[serde(default)] + pub overwrite: bool, + + /// Create missing parent directories of the destination. Defaults to true. + #[serde(default = "default_true")] + pub parents: bool, +} + +fn default_true() -> bool { + true +} + +// examples are wire-contract; goldens pin them. +fn example_move_file_input() -> serde_json::Value { + serde_json::json!({ + "files": [ + { "from": "src/old_name.rs", "to": "src/new_name.rs" }, + { "from": "build/output.bin", + "to": "/tmp/coder-cache/output.bin", + "overwrite": true } + ] + }) +} + +#[derive(Debug, Serialize, JsonSchema)] +pub struct MoveFileOutput { + pub results: Vec<MoveFileResult>, +} + +#[derive(Debug, Serialize, JsonSchema)] +pub struct MoveFileResult { + /// Canonical absolute path of the source (resolved through the jail); + /// the caller's input verbatim when resolution failed. + pub from: String, + + /// Canonical absolute path of the destination (resolved through the jail); + /// the caller's input verbatim when resolution failed. + pub to: String, + + pub success: bool, + + /// True only when the move fully completed; false for a no-op + /// self-move (`from` and `to` resolve to the same file). + pub moved: bool, + + /// Structured error for this entry. `code` is stable for programmatic + /// branching (e.g. 
`"C217"` means destination exists; pass `overwrite=true` + /// to replace; `"C210"` for disallowed operations such as cross-root + /// directory moves, moving a root itself, or a destination that is a + /// directory — the message then names the corrected target path). + /// `message` carries the corrective action an LLM agent needs to make + /// a successful second call. + #[serde(skip_serializing_if = "Option::is_none")] + pub error: Option, +} + +// --------------------------------------------------------------------------- +// Handler +// --------------------------------------------------------------------------- + +pub async fn handle( + resolver: Arc, + req: MoveFileInput, +) -> Result { + if req.files.is_empty() { + return Err(err_to_string(CoderError::BadInput( + "`files` must not be empty".into(), + ))); + } + let mut results = Vec::with_capacity(req.files.len()); + for spec in req.files { + results.push(move_one(&resolver, spec)); + } + Ok(MoveFileOutput { results }) +} + +// --------------------------------------------------------------------------- +// Per-entry logic +// --------------------------------------------------------------------------- + +fn move_one(resolver: &PathResolver, spec: MoveFileSpec) -> MoveFileResult { + // Resolve source. + let abs_from = match resolver.require_writable(&spec.from) { + Ok(p) => p, + Err(e) => { + return MoveFileResult { + from: spec.from, + to: spec.to, + success: false, + moved: false, + error: Some((&e).into()), + } + } + }; + + // Resolve destination (may not exist yet — resolution via fallback is fine). + let abs_to = match resolver.require_writable(&spec.to) { + Ok(p) => p, + Err(e) => { + return MoveFileResult { + from: abs_from.display().to_string(), + to: spec.to, + success: false, + moved: false, + error: Some((&e).into()), + } + } + }; + + // SELF-MOVE GUARD — before ALL other checks. 
Allowed roots may NEST + // (containing_root is first-match-wins), so `from` and `to` can name + // the SAME file through different wire forms; if such a pair ever + // reached the cross-root copy+delete branch, fs::copy(src, dst) with + // src==dst would TRUNCATE the file before reading it and the + // follow-up delete would remove it entirely — data loss. A self-move + // is a no-op success: moved=false, no error. + if abs_from == abs_to { + let wire = abs_from.display().to_string(); + return MoveFileResult { + from: wire.clone(), + to: wire, + success: true, + moved: false, + error: None, + }; + } + + let wire_from = abs_from.display().to_string(); + let wire_to = abs_to.display().to_string(); + + match try_move_one( + resolver, + &abs_from, + &abs_to, + &spec.from, + &spec.to, + spec.overwrite, + spec.parents, + ) { + Ok(()) => MoveFileResult { + from: wire_from, + to: wire_to, + success: true, + moved: true, + error: None, + }, + Err(e) => MoveFileResult { + from: wire_from, + to: wire_to, + success: false, + moved: false, + error: Some((&e).into()), + }, + } +} + +fn try_move_one( + resolver: &PathResolver, + abs_from: &Path, + abs_to: &Path, + wire_from: &str, + wire_to: &str, + overwrite: bool, + parents: bool, +) -> Result<(), CoderError> { + // ------------------------------------------------------------------ + // Categorical rejections FIRST — everything in this block is + // read-only. A rejected entry must leave ZERO side effects, so no + // filesystem mutation (including parent-dir creation) may run before + // every rejection below has been ruled out. + // ------------------------------------------------------------------ + + // Guard: refuse to move an allowed root itself (mirrors delete's root guard). + if resolver.is_root(abs_from) { + return Err(CoderError::BadInput( + "refusing to move an allowed root itself".into(), + )); + } + + // Source must exist. 
+ let src_meta = + std::fs::symlink_metadata(abs_from).map_err(|e| CoderError::io_for_path(e, wire_from))?; + + let from_root = resolver.containing_root(abs_from); + let to_root = resolver.containing_root(abs_to); + let same_root = match (from_root, to_root) { + (Some(fr), Some(tr)) => fr == tr, + _ => false, + }; + + // Cross-root directory moves are categorically unsupported. + if !same_root && src_meta.is_dir() { + return Err(CoderError::BadInput( + "cross-root directory moves are unsupported; move files individually".into(), + )); + } + + // Destination conflict checks. + if let Ok(dst_meta) = std::fs::symlink_metadata(abs_to) { + if dst_meta.is_dir() && !src_meta.is_dir() { + // overwrite=true can't fix this (a file cannot replace a + // directory via rename), so don't send the caller down the + // C217 dead end — tell them the actual corrective call. + let fname = abs_from + .file_name() + .map(|f| f.to_string_lossy().into_owned()) + .unwrap_or_else(|| "".to_string()); + return Err(CoderError::BadInput(format!( + "{wire_to}: destination is a directory; name the target \ + file inside it (e.g. {wire_to}/{fname})" + ))); + } + if !overwrite { + return Err(CoderError::AlreadyExists(format!( + "{wire_to} already exists; pass overwrite=true to replace" + ))); + } + } + + // ------------------------------------------------------------------ + // MUTATIONS BEGIN — nothing above this line may touch the filesystem + // (pinned by rejected_*_leaves_dst_tree_absent test). + // ------------------------------------------------------------------ + + // Create destination parent directories if requested. + if parents { + if let Some(parent) = abs_to.parent() { + std::fs::create_dir_all(parent).map_err(|e| CoderError::io_for_path(e, wire_to))?; + } + } + + if same_root { + // Same containing root — attempt atomic rename. 
+ // For files: on cross-device errors (rename across mount points + // inside the same logical root, unusual but possible with + // bind-mounts), fall back to copy+delete. + // For directories: rename is the only safe option; surface the error. + if src_meta.is_dir() { + std::fs::rename(abs_from, abs_to).map_err(|e| CoderError::io_for_path(e, wire_from))?; + } else { + match std::fs::rename(abs_from, abs_to) { + Ok(()) => {} + Err(e) if e.kind() == std::io::ErrorKind::CrossesDevices => { + copy_and_delete(abs_from, abs_to, wire_from, wire_to)?; + } + Err(e) => return Err(CoderError::io_for_path(e, wire_from)), + } + } + } else { + // Cross-root FILE move (directories were rejected above): + // copy → verify → delete source, with rollback on failure. + copy_and_delete(abs_from, abs_to, wire_from, wire_to)?; + } + + Ok(()) +} + +/// Copy `src` to `dst`, verify byte count matches, then delete `src`. +/// On source-delete failure: attempt to remove the copied `dst` (rollback). +/// Structured error messages describe the resulting state for both +/// rollback-succeeded and double-fault cases. +fn copy_and_delete( + src: &Path, + dst: &Path, + wire_from: &str, + wire_to: &str, +) -> Result<(), CoderError> { + // Copy. + let copied_bytes = std::fs::copy(src, dst).map_err(|e| CoderError::io_for_path(e, wire_to))?; + + // Verify: compare source size against bytes written. + // TOCTOU caveat: lengths may spuriously mismatch if src is modified + // between copy() and metadata(); acceptable in a jailed single-agent + // context. + let src_len = match std::fs::metadata(src) { + Ok(m) => m.len(), + Err(e) => { + // Metadata read failed after copy — unusual; attempt rollback. + let _ = std::fs::remove_file(dst); + return Err(CoderError::io_for_path(e, wire_from)); + } + }; + + if copied_bytes != src_len { + // Size mismatch — copy incomplete; rollback. 
+ let _ = std::fs::remove_file(dst); + return Err(CoderError::Io(format!( + "copy incomplete: wrote {copied_bytes} bytes but source is {src_len} bytes; \ + copy removed, source unchanged" + ))); + } + + // Delete source. + if let Err(del_err) = std::fs::remove_file(src) { + // Source delete failed — attempt rollback by removing the copy. + match std::fs::remove_file(dst) { + Ok(()) => { + // Rollback succeeded: report the delete failure; state is clean. + return Err(CoderError::Io(format!( + "move rolled back; source unchanged: failed to delete source \ + after copy ({del_err}); the copy at {wire_to} was removed" + ))); + } + Err(rb_err) => { + // Double fault: both delete-src and rollback failed. + return Err(CoderError::Io(format!( + "copy exists at {wire_to}; source remains at {wire_from}; \ + manual cleanup needed: failed to delete source ({del_err}) and \ + rollback also failed ({rb_err})" + ))); + } + } + } + + Ok(()) +} + +// --------------------------------------------------------------------------- +// Tests +// --------------------------------------------------------------------------- + +#[cfg(test)] +mod tests { + use super::*; + use std::sync::Arc; + use tempfile::tempdir; + + fn setup_single() -> (tempfile::TempDir, Arc<PathResolver>) { + let tmp = tempdir().unwrap(); + let cfg = Arc::new(crate::config::CoderConfig { + base_paths: vec![tmp.path().to_path_buf()], + non_accessible_globs: vec!["**/.env".to_string()], + ..crate::config::CoderConfig::default() + }); + let resolver = Arc::new(PathResolver::new(&cfg).unwrap()); + (tmp, resolver) + } + + fn setup_two_roots() -> (tempfile::TempDir, tempfile::TempDir, Arc<PathResolver>) { + let tmp0 = tempdir().unwrap(); + let tmp1 = tempdir().unwrap(); + let cfg = Arc::new(crate::config::CoderConfig { + base_paths: vec![tmp0.path().to_path_buf(), tmp1.path().to_path_buf()], + non_accessible_globs: vec!["**/.env".to_string()], + ..crate::config::CoderConfig::default() + }); + let resolver = Arc::new(PathResolver::new(&cfg).unwrap()); + 
(tmp0, tmp1, resolver) + } + + // ------------------------------------------------------------------ + // Same-root: rename a file + // ------------------------------------------------------------------ + #[tokio::test] + async fn same_root_rename_file() { + let (tmp, r) = setup_single(); + std::fs::write(tmp.path().join("a.txt"), "hello").unwrap(); + let out = handle( + r, + MoveFileInput { + files: vec![MoveFileSpec { + from: "a.txt".into(), + to: "b.txt".into(), + overwrite: false, + parents: true, + }], + }, + ) + .await + .unwrap(); + assert!(out.results[0].success, "{:?}", out.results[0].error); + assert!(out.results[0].moved); + assert!(!tmp.path().join("a.txt").exists()); + assert_eq!( + std::fs::read_to_string(tmp.path().join("b.txt")).unwrap(), + "hello" + ); + // Echo: canonical absolute paths for both ends. + let canon = std::fs::canonicalize(tmp.path()).unwrap(); + assert_eq!( + out.results[0].from, + canon.join("a.txt").display().to_string() + ); + assert_eq!(out.results[0].to, canon.join("b.txt").display().to_string()); + } + + // ------------------------------------------------------------------ + // Same-root: rename a directory + // ------------------------------------------------------------------ + #[tokio::test] + async fn same_root_rename_directory() { + let (tmp, r) = setup_single(); + std::fs::create_dir(tmp.path().join("src_dir")).unwrap(); + std::fs::write(tmp.path().join("src_dir/file.txt"), "content").unwrap(); + let out = handle( + r, + MoveFileInput { + files: vec![MoveFileSpec { + from: "src_dir".into(), + to: "dst_dir".into(), + overwrite: false, + parents: true, + }], + }, + ) + .await + .unwrap(); + assert!(out.results[0].success, "{:?}", out.results[0].error); + assert!(!tmp.path().join("src_dir").exists()); + assert!(tmp.path().join("dst_dir/file.txt").exists()); + } + + // ------------------------------------------------------------------ + // SELF-MOVE GUARD: from == to is a no-op success (moved=false) and + // the file is 
untouched. Without the guard, a same-file pair routed + // to copy+delete would truncate-then-delete the file (data loss). + // ------------------------------------------------------------------ + #[tokio::test] + async fn self_move_is_noop_success_with_content_intact() { + let (tmp, r) = setup_single(); + std::fs::write(tmp.path().join("a.txt"), "precious").unwrap(); + for overwrite in [false, true] { + let out = handle( + r.clone(), + MoveFileInput { + files: vec![MoveFileSpec { + from: "a.txt".into(), + to: "a.txt".into(), + overwrite, + parents: true, + }], + }, + ) + .await + .unwrap(); + assert!(out.results[0].success, "overwrite={overwrite}"); + assert!(!out.results[0].moved, "self-move must report moved=false"); + assert!(out.results[0].error.is_none()); + // Both echoes are the same canonical path. + assert_eq!(out.results[0].from, out.results[0].to); + // File content INTACT. + assert_eq!( + std::fs::read_to_string(tmp.path().join("a.txt")).unwrap(), + "precious", + "overwrite={overwrite}: self-move must not touch the file" + ); + } + } + + /// NESTED ROOTS regression: root B lives INSIDE root A, so the same + /// file is reachable via A's relative form and via an absolute path + /// inside B. Both canonicalize to one path; the self-move guard must + /// catch the pair before any branch (the cross-root branch would + /// otherwise fs::copy(src, src) → truncation → delete → data loss). + #[tokio::test] + async fn self_move_through_nested_roots_is_noop_and_content_intact() { + let tmp = tempdir().unwrap(); + let nested = tmp.path().join("sub"); + std::fs::create_dir(&nested).unwrap(); + std::fs::write(nested.join("file.txt"), "do not lose me").unwrap(); + let cfg = Arc::new(crate::config::CoderConfig { + // Root B (= A/sub) nests inside root A; A is primary. 
+ base_paths: vec![tmp.path().to_path_buf(), nested.clone()], + non_accessible_globs: vec![], + ..crate::config::CoderConfig::default() + }); + let r = Arc::new(PathResolver::new(&cfg).unwrap()); + assert_eq!(r.roots().len(), 2, "both nested roots must be active"); + + // from: relative through primary root A; to: absolute inside B. + let to_abs = std::fs::canonicalize(&nested).unwrap().join("file.txt"); + let out = handle( + r, + MoveFileInput { + files: vec![MoveFileSpec { + from: "sub/file.txt".into(), + to: to_abs.display().to_string(), + overwrite: false, + parents: true, + }], + }, + ) + .await + .unwrap(); + assert!(out.results[0].success, "{:?}", out.results[0].error); + assert!(!out.results[0].moved, "same-file pair must be a no-op"); + assert!(out.results[0].error.is_none()); + // The file survives with its content intact. + assert_eq!( + std::fs::read_to_string(nested.join("file.txt")).unwrap(), + "do not lose me" + ); + } + + // ------------------------------------------------------------------ + // Cross-root: file copy+delete (verify content + src gone) + // ------------------------------------------------------------------ + #[tokio::test] + async fn cross_root_file_copy_and_delete() { + let (tmp0, tmp1, r) = setup_two_roots(); + std::fs::write(tmp0.path().join("src.txt"), "cross-root content").unwrap(); + let dst_abs = tmp1.path().join("dst.txt"); + let out = handle( + r, + MoveFileInput { + files: vec![MoveFileSpec { + from: "src.txt".into(), + to: dst_abs.display().to_string(), + overwrite: false, + parents: true, + }], + }, + ) + .await + .unwrap(); + assert!(out.results[0].success, "{:?}", out.results[0].error); + assert!(out.results[0].moved); + // Source gone. + assert!(!tmp0.path().join("src.txt").exists()); + // Destination has correct content. 
+ assert_eq!( + std::fs::read_to_string(&dst_abs).unwrap(), + "cross-root content" + ); + } + + // ------------------------------------------------------------------ + // Rollback on src-delete failure (via read-only parent post-copy) + // ------------------------------------------------------------------ + #[cfg(unix)] + #[tokio::test] + async fn rollback_on_src_delete_failure() { + // Skip when running as root — chmod is ineffective for root. + // Use `id -u` via the shell rather than libc to avoid adding a dependency. + let euid: u32 = std::process::Command::new("id") + .arg("-u") + .output() + .ok() + .and_then(|o| String::from_utf8(o.stdout).ok()) + .and_then(|s| s.trim().parse().ok()) + .unwrap_or(0); + if euid == 0 { + eprintln!("SKIP: rollback_on_src_delete_failure skipped when euid==0 (CI-as-root makes chmod ineffective)"); + return; + } + + let (tmp0, tmp1, r) = setup_two_roots(); + // Write source file under a parent directory we can make read-only. + std::fs::create_dir(tmp0.path().join("locked")).unwrap(); + std::fs::write(tmp0.path().join("locked/src.txt"), "rollback test").unwrap(); + + // Make parent read-only so remove_file(src) will fail after copy. + use std::os::unix::fs::PermissionsExt; + std::fs::set_permissions( + tmp0.path().join("locked"), + std::fs::Permissions::from_mode(0o555), + ) + .unwrap(); + + let dst_abs = tmp1.path().join("dst.txt"); + let src_abs = tmp0.path().join("locked/src.txt"); + + let out = handle( + r, + MoveFileInput { + files: vec![MoveFileSpec { + from: src_abs.display().to_string(), + to: dst_abs.display().to_string(), + overwrite: false, + parents: true, + }], + }, + ) + .await + .unwrap(); + + // Restore permissions for cleanup. 
+ let _ = std::fs::set_permissions( + tmp0.path().join("locked"), + std::fs::Permissions::from_mode(0o755), + ); + + assert!(!out.results[0].success); + let err = out.results[0].error.as_ref().unwrap(); + assert_eq!(err.code, "C216", "expected C216, got: {:?}", err.code); + // Error must mention rollback. + assert!( + err.message.contains("rolled back") || err.message.contains("rollback"), + "error should mention rollback: {}", + err.message + ); + // Source must still exist (rollback succeeded). + assert!(src_abs.exists(), "source must remain after rollback"); + // Copy must have been removed (rollback cleaned up dst). + assert!(!dst_abs.exists(), "dst copy must be removed after rollback"); + } + + // ------------------------------------------------------------------ + // overwrite=false → C217 + // ------------------------------------------------------------------ + #[tokio::test] + async fn overwrite_false_dst_exists_c217() { + let (tmp, r) = setup_single(); + std::fs::write(tmp.path().join("src.txt"), "new").unwrap(); + std::fs::write(tmp.path().join("dst.txt"), "old").unwrap(); + let out = handle( + r, + MoveFileInput { + files: vec![MoveFileSpec { + from: "src.txt".into(), + to: "dst.txt".into(), + overwrite: false, + parents: true, + }], + }, + ) + .await + .unwrap(); + assert!(!out.results[0].success); + let err = out.results[0].error.as_ref().unwrap(); + assert_eq!(err.code, "C217"); + // House C217 shape — identical format to create-file's (no colon). + assert_eq!( + err.message, + "dst.txt already exists; pass overwrite=true to replace" + ); + // Both files must remain untouched. 
+ assert!(tmp.path().join("src.txt").exists()); + assert_eq!( + std::fs::read_to_string(tmp.path().join("dst.txt")).unwrap(), + "old" + ); + } + + // ------------------------------------------------------------------ + // overwrite=true replaces existing destination + // ------------------------------------------------------------------ + #[tokio::test] + async fn overwrite_true_replaces_existing() { + let (tmp, r) = setup_single(); + std::fs::write(tmp.path().join("src.txt"), "new-content").unwrap(); + std::fs::write(tmp.path().join("dst.txt"), "old-content").unwrap(); + let out = handle( + r, + MoveFileInput { + files: vec![MoveFileSpec { + from: "src.txt".into(), + to: "dst.txt".into(), + overwrite: true, + parents: true, + }], + }, + ) + .await + .unwrap(); + assert!(out.results[0].success, "{:?}", out.results[0].error); + assert!(!tmp.path().join("src.txt").exists()); + assert_eq!( + std::fs::read_to_string(tmp.path().join("dst.txt")).unwrap(), + "new-content" + ); + } + + // ------------------------------------------------------------------ + // Missing source → C211 (standard redaction wording) + // ------------------------------------------------------------------ + #[tokio::test] + async fn missing_src_c211() { + let (_tmp, r) = setup_single(); + let out = handle( + r, + MoveFileInput { + files: vec![MoveFileSpec { + from: "no-such-file.txt".into(), + to: "dst.txt".into(), + overwrite: false, + parents: true, + }], + }, + ) + .await + .unwrap(); + assert!(!out.results[0].success); + let err = out.results[0].error.as_ref().unwrap(); + assert_eq!(err.code, "C211"); + // Redaction: must use the standard C211 suffix; must NOT leak OS text. 
+ assert!( + err.message.contains("not found or not accessible"), + "C211 must use standard wording: {}", + err.message + ); + assert!( + !err.message.contains("os error"), + "C211 must not leak OS error text: {}", + err.message + ); + } + + // ------------------------------------------------------------------ + // Glob-denied source (redaction: same C211 suffix as missing) + // ------------------------------------------------------------------ + #[tokio::test] + async fn glob_denied_src_c211_identical_suffix_to_missing() { + let (tmp, r) = setup_single(); + std::fs::write(tmp.path().join(".env"), "secret").unwrap(); + // Both resolve + deny and stat failure yield C211; suffix must match. + let out_denied = handle( + r.clone(), + MoveFileInput { + files: vec![MoveFileSpec { + from: ".env".into(), + to: "dst.txt".into(), + overwrite: false, + parents: true, + }], + }, + ) + .await + .unwrap(); + let out_missing = handle( + r, + MoveFileInput { + files: vec![MoveFileSpec { + from: "definitely-missing.txt".into(), + to: "dst2.txt".into(), + overwrite: false, + parents: true, + }], + }, + ) + .await + .unwrap(); + assert!(!out_denied.results[0].success); + assert!(!out_missing.results[0].success); + let denied_err = out_denied.results[0].error.as_ref().unwrap(); + let missing_err = out_missing.results[0].error.as_ref().unwrap(); + assert_eq!(denied_err.code, "C211"); + assert_eq!(missing_err.code, "C211"); + // Suffix after "path: " must be byte-identical (REDACTION INVARIANT). + let denied_suffix = denied_err + .message + .strip_prefix(".env: ") + .expect("denied message must start with the supplied path"); + let missing_suffix = missing_err + .message + .strip_prefix("definitely-missing.txt: ") + .expect("missing message must start with the supplied path"); + assert_eq!( + denied_suffix, missing_suffix, + "C211 missing vs glob-denied suffixes must be byte-identical" + ); + // Source must NOT have been moved. 
+ assert!(tmp.path().join(".env").exists()); + } + + // ------------------------------------------------------------------ + // Glob-denied destination + // ------------------------------------------------------------------ + #[tokio::test] + async fn glob_denied_dst_c211() { + let (tmp, r) = setup_single(); + std::fs::write(tmp.path().join("src.txt"), "x").unwrap(); + let out = handle( + r, + MoveFileInput { + files: vec![MoveFileSpec { + from: "src.txt".into(), + to: ".env".into(), + overwrite: false, + parents: true, + }], + }, + ) + .await + .unwrap(); + assert!(!out.results[0].success); + assert_eq!(out.results[0].error.as_ref().unwrap().code, "C211"); + // Source must remain. + assert!(tmp.path().join("src.txt").exists()); + } + + // ------------------------------------------------------------------ + // Move an allowed root itself → C210 + // ------------------------------------------------------------------ + #[tokio::test] + async fn move_root_rejected_c210() { + let (_tmp, r) = setup_single(); + let out = handle( + r, + MoveFileInput { + files: vec![MoveFileSpec { + from: ".".into(), + to: "other".into(), + overwrite: false, + parents: true, + }], + }, + ) + .await + .unwrap(); + assert!(!out.results[0].success); + assert_eq!(out.results[0].error.as_ref().unwrap().code, "C210"); + } + + // ------------------------------------------------------------------ + // Cross-root directory move → C210 + // ------------------------------------------------------------------ + #[tokio::test] + async fn cross_root_dir_rejected_c210() { + let (tmp0, tmp1, r) = setup_two_roots(); + std::fs::create_dir(tmp0.path().join("mydir")).unwrap(); + std::fs::write(tmp0.path().join("mydir/f.txt"), "x").unwrap(); + let dst_abs = tmp1.path().join("mydir"); + let out = handle( + r, + MoveFileInput { + files: vec![MoveFileSpec { + from: "mydir".into(), + to: dst_abs.display().to_string(), + overwrite: false, + parents: true, + }], + }, + ) + .await + .unwrap(); + 
assert!(!out.results[0].success); + let err = out.results[0].error.as_ref().unwrap(); + assert_eq!(err.code, "C210"); + assert!( + err.message.contains("cross-root"), + "error must mention cross-root: {}", + err.message + ); + } + + // ------------------------------------------------------------------ + // ZERO SIDE EFFECTS: a rejected entry must mutate nothing — in + // particular, the parents=true dir creation must NOT run before the + // categorical rejections (probe regression: empty parent dirs were + // created at dst for a rejected cross-root dir move). + // ------------------------------------------------------------------ + #[tokio::test] + async fn rejected_cross_root_dir_move_leaves_dst_tree_absent() { + let (tmp0, tmp1, r) = setup_two_roots(); + std::fs::create_dir(tmp0.path().join("mydir")).unwrap(); + std::fs::write(tmp0.path().join("mydir/f.txt"), "x").unwrap(); + // Destination nested under parents that don't exist yet. + let dst_abs = tmp1.path().join("a/b/mydir"); + let out = handle( + r, + MoveFileInput { + files: vec![MoveFileSpec { + from: "mydir".into(), + to: dst_abs.display().to_string(), + overwrite: false, + parents: true, + }], + }, + ) + .await + .unwrap(); + assert!(!out.results[0].success); + assert_eq!(out.results[0].error.as_ref().unwrap().code, "C210"); + // Source untouched. + assert!(tmp0.path().join("mydir/f.txt").exists()); + // No part of the dst tree may have been created. + assert!( + !tmp1.path().join("a").exists(), + "rejected entry must not create dst parent directories" + ); + } + + // ------------------------------------------------------------------ + // dst is a directory + src is a file → prescriptive C210 (not the + // C217 "pass overwrite=true" dead end — overwrite can't fix it). 
+ // ------------------------------------------------------------------ + #[tokio::test] + async fn dst_is_directory_src_is_file_prescriptive_c210() { + let (tmp, r) = setup_single(); + std::fs::write(tmp.path().join("src.txt"), "x").unwrap(); + std::fs::create_dir(tmp.path().join("destdir")).unwrap(); + // Both overwrite=false AND overwrite=true must get the same + // guidance: overwrite cannot make a file replace a directory. + for overwrite in [false, true] { + let out = handle( + r.clone(), + MoveFileInput { + files: vec![MoveFileSpec { + from: "src.txt".into(), + to: "destdir".into(), + overwrite, + parents: true, + }], + }, + ) + .await + .unwrap(); + assert!(!out.results[0].success, "overwrite={overwrite}"); + let err = out.results[0].error.as_ref().unwrap(); + assert_eq!(err.code, "C210", "overwrite={overwrite}: {}", err.message); + assert!( + err.message.contains("destination is a directory"), + "must explain dst is a dir: {}", + err.message + ); + // The e.g. hint must name the corrected target path. + assert!( + err.message.contains("destdir/src.txt"), + "must suggest the target file inside the dir: {}", + err.message + ); + // Source and destination untouched. 
+ assert!(tmp.path().join("src.txt").exists()); + assert!(tmp.path().join("destdir").is_dir()); + } + } + + // ------------------------------------------------------------------ + // parents=false + missing parent directory → error + // ------------------------------------------------------------------ + #[tokio::test] + async fn parents_false_missing_parent_errors() { + let (tmp, r) = setup_single(); + std::fs::write(tmp.path().join("src.txt"), "x").unwrap(); + let out = handle( + r, + MoveFileInput { + files: vec![MoveFileSpec { + from: "src.txt".into(), + to: "missing_parent/dst.txt".into(), + overwrite: false, + parents: false, + }], + }, + ) + .await + .unwrap(); + // rename itself may fail or succeed depending on whether the parent + // directory exists; since it doesn't, the move should fail. + // On Linux/macOS std::fs::rename returns an error when dst parent is absent. + assert!(!out.results[0].success); + // Source must still be there. + assert!(tmp.path().join("src.txt").exists()); + } + + // ------------------------------------------------------------------ + // Order preservation in a batch + // ------------------------------------------------------------------ + #[tokio::test] + async fn batch_order_preserved() { + let (tmp, r) = setup_single(); + std::fs::write(tmp.path().join("a.txt"), "A").unwrap(); + std::fs::write(tmp.path().join("b.txt"), "B").unwrap(); + let out = handle( + r, + MoveFileInput { + files: vec![ + MoveFileSpec { + from: "a.txt".into(), + to: "a2.txt".into(), + overwrite: false, + parents: true, + }, + MoveFileSpec { + from: "b.txt".into(), + to: "b2.txt".into(), + overwrite: false, + parents: true, + }, + ], + }, + ) + .await + .unwrap(); + assert_eq!(out.results.len(), 2); + assert!(out.results[0].success); + assert!(out.results[1].success); + // Results are in request order. 
+ assert!(out.results[0].from.contains("a.txt")); + assert!(out.results[1].from.contains("b.txt")); + } + + // ------------------------------------------------------------------ + // Echo rules: canonical when resolved, verbatim when not + // ------------------------------------------------------------------ + #[tokio::test] + async fn echo_verbatim_on_resolution_failure() { + let (_tmp, r) = setup_single(); + // A path that escapes the jail will fail resolution (C215). + let out = handle( + r, + MoveFileInput { + files: vec![MoveFileSpec { + from: "/etc/passwd".into(), + to: "dst.txt".into(), + overwrite: false, + parents: true, + }], + }, + ) + .await + .unwrap(); + assert!(!out.results[0].success); + // The `from` echo must be the caller's verbatim input, not a resolved path. + assert_eq!(out.results[0].from, "/etc/passwd"); + } + + // ------------------------------------------------------------------ + // Batch continues after a per-entry failure + // ------------------------------------------------------------------ + #[tokio::test] + async fn batch_continues_after_failure() { + let (tmp, r) = setup_single(); + std::fs::write(tmp.path().join("good.txt"), "ok").unwrap(); + let out = handle( + r, + MoveFileInput { + files: vec![ + // First entry: missing source → C211. + MoveFileSpec { + from: "missing.txt".into(), + to: "out1.txt".into(), + overwrite: false, + parents: true, + }, + // Second entry: valid move → should succeed. 
+ MoveFileSpec { + from: "good.txt".into(), + to: "moved.txt".into(), + overwrite: false, + parents: true, + }, + ], + }, + ) + .await + .unwrap(); + assert_eq!(out.results.len(), 2); + assert!(!out.results[0].success); + assert!(out.results[1].success); + assert!(!tmp.path().join("good.txt").exists()); + assert!(tmp.path().join("moved.txt").exists()); + } +} diff --git a/coder/src/functions/read_file.rs b/coder/src/functions/read_file.rs index aea4e269..2d3db319 100644 --- a/coder/src/functions/read_file.rs +++ b/coder/src/functions/read_file.rs @@ -1,39 +1,358 @@ -//! `coder::read-file` — return the file's content + metadata. Capped by -//! `max_read_bytes`; non-accessible paths return `C211`. The path is always -//! relative to `base_path`. +//! `coder::read-file` — return file content + metadata. +//! +//! **Single-path mode** (`path`): unchanged from T7 — full reads capped by +//! `max_read_bytes`; optional `line_from`/`line_to` switch to a windowed +//! streamed read. Non-accessible paths return C211. +//! +//! **Batch mode** (`paths[]`): read multiple files or windows in one call. +//! Each entry is a `ReadTarget`: a plain string (whole-file read) or an +//! object `{path, line_from?, line_to?}` (per-entry window). Entries are +//! processed in request order against a shared `batch_read_budget_bytes` +//! cap measured in BYTES OF RETURNED CONTENT (after UTF-8 sanitization — +//! invalid bytes expand to 3-byte U+FFFD replacements BEFORE they are +//! counted, so binary files can never deliver more than the budget). An +//! entry cut short by the remaining budget succeeds with `more_lines: +//! true`; an entry reached with zero budget gets a per-entry C213 (names +//! the config key + value, bytes consumed, and recovery guidance). +//! Per-entry resolution/glob/stat failures return per-entry C211; budget +//! is not consumed by failed entries. +//! +//! REDACTION INVARIANT (batch): error classification NEVER depends on +//! 
budget state — resolve + stat run BEFORE the zero-budget check, so a +//! missing path and a glob-denied path both return C211 (identical +//! wording, verbatim path echo) even after exhaustion. Only an existing, +//! accessible regular file may receive the budget C213. +//! +//! **XOR rule**: `path` XOR `paths` must be set; both or neither → C210. +//! +//! **S4 additions (v0.4.0)**: `stat: true` (single-path + per-entry) +//! returns metadata only — no content — with `total_lines`/`is_utf8` +//! counted via a bounded read (never more than `max_read_bytes`; null +//! beyond it, while size/mode/mtime still populate). `numbered: true` +//! prefixes each content line `N→` with its ABSOLUTE 1-based file line +//! number; prefix bytes are charged against every byte cap/budget. +//! Single-path FULL reads are additionally bounded by the +//! `max_output_bytes` config (per-call override clamped to +//! `max_read_bytes`); the C213 carries size + total_lines + the +//! corrective calls. REDACTION ORDERING everywhere: resolve → deny +//! (C211) → metadata syscalls → budget (C213) — classification must +//! never depend on budget state, and deny must precede any metadata +//! syscall so stat/budget can't become an existence or size oracle. +//! +//! LINE CONVENTION (shared with `coder::update-file`): a line is a +//! 0x0A-terminated or EOF-terminated byte segment. An empty file has 0 +//! lines; a trailing newline does NOT create a phantom last line. This +//! is exactly `str::lines()` counting — the same convention +//! `update_file::split_file` uses for its 1-based line ops — so line +//! numbers reported here address the same lines `update-file` edits. +//! (The two must not drift: agents read a window, then edit those line +//! numbers.) Windowed content keeps each line's raw terminator bytes. 
+use std::path::Path; use std::sync::Arc; use schemars::JsonSchema; use serde::{Deserialize, Serialize}; use crate::config::CoderConfig; -use crate::error::{err_to_string, CoderError}; +use crate::error::{err_to_string, CoderError, WireError}; use crate::path::PathResolver; +use super::read_window::{count_lines, lossy_utf8, number_lines, read_window, read_window_wire}; + +// --------------------------------------------------------------------------- +// Input types +// --------------------------------------------------------------------------- + +/// A single entry in a `paths[]` batch request. Pass either a bare file +/// path string (whole-file read, same cap as `max_read_bytes`) or an +/// object with optional per-entry `line_from`/`line_to` window parameters +/// (1-based, inclusive — same rules as the top-level `path` mode). #[derive(Debug, Deserialize, JsonSchema)] +#[serde(untagged)] +pub enum ReadTarget { + /// Bare path string: read the whole file (within remaining batch budget + /// and `max_read_bytes`). + Path(String), + /// Object form: path plus optional 1-based window parameters. Omit + /// `line_from` to start from line 1; omit `line_to` to read to EOF. + Window { + /// File to read. Same jail rules as the top-level `path` field. + path: String, + /// First line of the window, 1-based inclusive (must be >= 1; 0 + /// is rejected with C210 for this entry). Defaults to 1 when + /// only `line_to` is set. + #[serde(default)] + #[schemars(range(min = 1))] + line_from: Option, + /// Last line of the window, 1-based inclusive. Must be >= + /// `line_from` (C210 for this entry otherwise). Omit to read from + /// `line_from` to EOF. + #[serde(default)] + #[schemars(range(min = 1))] + line_to: Option, + /// Per-entry metadata probe: same semantics as the top-level + /// `stat` field — size/mode/mtime always, `total_lines`/`is_utf8` + /// when the file fits `max_read_bytes`, content null, no batch + /// budget consumed. 
C210 when combined with this entry's + /// `line_from`/`line_to` or `numbered`. + #[serde(default)] + stat: bool, + /// Prefix this entry's content lines with their absolute 1-based + /// file line numbers (`N→`) — same semantics as the top-level + /// `numbered` field. Prefix bytes are charged against + /// `batch_read_budget_bytes`. + #[serde(default)] + numbered: bool, + }, +} + +impl ReadTarget { + fn path(&self) -> &str { + match self { + ReadTarget::Path(p) => p, + ReadTarget::Window { path, .. } => path, + } + } + + fn window_params(&self) -> (Option, Option) { + match self { + ReadTarget::Path(_) => (None, None), + ReadTarget::Window { + line_from, line_to, .. + } => (*line_from, *line_to), + } + } + + /// `(stat, numbered)` for this entry; bare string targets carry + /// neither flag. + fn flags(&self) -> (bool, bool) { + match self { + ReadTarget::Path(_) => (false, false), + ReadTarget::Window { stat, numbered, .. } => (*stat, *numbered), + } + } +} + +// examples are wire-contract; goldens pin them. +#[derive(Debug, Default, Deserialize, JsonSchema)] +#[schemars( + example = "example_read_file_input", + example = "example_read_file_batch" +)] pub struct ReadFileInput { - /// File to read, relative to `base_path`. + /// Single file to read. Relative to the primary allowed root, or an + /// absolute path inside any allowed root. Call `coder::info` to see + /// the allowed roots. Paths outside every allowed root are rejected — + /// use the shell worker's `shell::fs::*` for host paths outside the + /// jail. Mutually exclusive with `paths` (XOR): pass either `path` or + /// `paths`, not both — C210 if both or neither is set. + #[serde(default)] + pub path: Option, + /// First line of the window, 1-based inclusive (must be >= 1; 0 is + /// rejected with C210). 
Setting `line_from` and/or `line_to` switches + /// to windowed mode: the file is streamed and only the requested + /// lines are returned, so files larger than `max_read_bytes` stay + /// readable slice by slice — the byte cap then applies to the + /// returned window, never the file size. Defaults to 1 when only + /// `line_to` is set. A window starting past EOF succeeds with empty + /// content and reports the file's `total_lines`. Only valid in + /// single-path mode (`path`); ignored when `paths` is set. Lines are + /// 0x0A- or EOF-terminated segments; a trailing newline does not + /// create a phantom line (same convention as `coder::update-file`). + #[serde(default)] + #[schemars(range(min = 1))] + pub line_from: Option<u64>, + /// Last line of the window, 1-based inclusive. Must be >= `line_from` + /// (C210 otherwise). Omit to read from `line_from` to end-of-file + /// (still bounded by `max_read_bytes` on the returned bytes). Only + /// valid in single-path mode (`path`); ignored when `paths` is set. + #[serde(default)] + #[schemars(range(min = 1))] + pub line_to: Option<u64>, + /// Metadata probe — the cheap "how big is it" call. When true the + /// response carries size/mode/mtime plus `total_lines` and `is_utf8` + /// (both null when the file exceeds `max_read_bytes` — size/mode/mtime + /// still populate, so stat on a huge file SUCCEEDS); `content` is + /// null, `lines_returned` 0, `more_lines` false. Probe BEFORE reading + /// an unknown file, then fetch just the slice you need with + /// `line_from`/`line_to`. Mutually exclusive with `line_from`, + /// `line_to`, `numbered`, and `max_output_bytes` (C210 — stat returns + /// no content for them to act on). Batch entries take a per-entry + /// `stat` field instead; this top-level flag is ignored when `paths` + /// is set. 
+ #[serde(default)] + pub stat: bool, + /// When true every returned content line is prefixed `N→`, where N is + /// the line's ABSOLUTE 1-based number in the file — a window starting + /// at `line_from: 40` is numbered from 40, not 1. Numbers match + /// `coder::update-file`'s 1-based line ops exactly, so you can go + /// from a numbered read straight to a line edit. Prefix bytes count + /// toward all byte caps and budgets (no hidden bypass). C210 with + /// `stat: true` (no content to number). Batch entries take a + /// per-entry `numbered` field instead; this top-level flag is ignored + /// when `paths` is set. + #[serde(default)] + pub numbered: bool, + /// Per-call override of the `max_output_bytes` config (default + /// 131072) that budgets single-path FULL reads, measured in returned + /// content bytes after UTF-8 conversion (numbered prefixes included). + /// Values above `max_read_bytes` are silently clamped to it. When the + /// full content would exceed the effective budget the call fails with + /// a C213 naming the file's size and `total_lines` — recover by + /// windowing with `line_from`/`line_to`, probing with `stat: true`, + /// or raising this field. Full reads only: combining it with + /// `line_from`/`line_to` is C210 (windows are bounded by + /// `max_read_bytes` instead); ignored when `paths` is set (batch mode + /// is governed by `batch_read_budget_bytes`). + #[serde(default)] + pub max_output_bytes: Option<u64>, + /// Batch of files (or windowed slices) to read in a single call. + /// Each entry is either a plain path string (whole-file read) or an + /// object `{path, line_from?, line_to?}` with per-entry window + /// parameters. Entries are processed in request order against a + /// shared `batch_read_budget_bytes` cap, measured in bytes of + /// returned content (after UTF-8 sanitization) — see `coder::info` + /// for the configured value. Results are returned in the `results` + /// field; top-level fields are null. 
Mutually exclusive with `path` + /// (XOR): pass either `path` or `paths`, not both — C210 if both or + /// neither is set. + #[serde(default)] + pub paths: Option<Vec<ReadTarget>>, +} + +// examples are wire-contract; goldens pin them. +fn example_read_file_input() -> serde_json::Value { + serde_json::json!({ + "path": "src/main.rs", + "line_from": 10, + "line_to": 50 + }) +} + +/// Batch form: mix bare path strings and {path, line_from, line_to} objects. +fn example_read_file_batch() -> serde_json::Value { + serde_json::json!({ + "paths": [ + "src/lib.rs", + { "path": "src/config.rs", "line_from": 1, "line_to": 30 } + ] + }) +} + +// --------------------------------------------------------------------------- +// Output types +// --------------------------------------------------------------------------- + +/// Per-entry result in a batch `paths[]` response. +#[derive(Debug, Serialize, JsonSchema)] +pub struct ReadEntryResult { + /// Canonical absolute path of the file (resolved through the jail). + /// If resolution failed, this echoes the caller's input verbatim. pub path: String, + /// `true` when the read succeeded (content/metadata fields are + /// populated); `false` when an error occurred (only `error` is set). + pub success: bool, + /// File content as a UTF-8 string — the whole file or the requested + /// window. Binary bytes are replaced by U+FFFD (`is_utf8: false`). + /// `null` on failure. + #[serde(skip_serializing_if = "Option::is_none")] + pub content: Option<String>, + /// Whether `content` survived UTF-8 conversion without losing bytes. + /// `null` on failure. + #[serde(skip_serializing_if = "Option::is_none")] + pub is_utf8: Option<bool>, + /// Number of lines returned in `content`. `null` on failure. + #[serde(skip_serializing_if = "Option::is_none")] + pub lines_returned: Option<u64>, + /// Total lines in the file; present when the stream reached EOF during + /// this entry's read. `null` when not traversed or on failure. 
+ #[serde(skip_serializing_if = "Option::is_none")] + pub total_lines: Option<u64>, + /// `true` when the file has content beyond what `content` includes + /// (window ended before EOF, or byte budget cut the window short). + /// `null` on failure. + #[serde(skip_serializing_if = "Option::is_none")] + pub more_lines: Option<bool>, + /// Size of the FILE in bytes (from metadata). `null` on failure or + /// when the entry budget was exhausted before the file was opened. + #[serde(skip_serializing_if = "Option::is_none")] + pub size: Option<u64>, + /// Unix permission bits (lower 9 bits of `st_mode`), e.g. 0o644. + /// `null` on failure. + #[serde(skip_serializing_if = "Option::is_none")] + pub mode: Option<u32>, + /// Last-modified time as a Unix epoch in seconds. `null` on failure. + #[serde(skip_serializing_if = "Option::is_none")] + pub mtime: Option<i64>, + /// Structured error — present only when `success: false`. + #[serde(skip_serializing_if = "Option::is_none")] + pub error: Option<WireError>, } #[derive(Debug, Serialize, JsonSchema)] pub struct ReadFileOutput { - /// The original `path` argument echoed back for caller correlation. - pub path: String, - /// File content as a UTF-8 string. Binary files are returned with - /// invalid bytes replaced by U+FFFD; use a future binary-aware - /// function if exact bytes matter. - pub content: String, - /// Whether `content` lost bytes to UTF-8 sanitisation. - pub is_utf8: bool, - pub size: u64, + /// Canonical absolute path of the file read (resolved through the + /// jail). **Single-path mode only; null when the request used + /// `paths[]`.** + #[serde(skip_serializing_if = "Option::is_none")] + pub path: Option<String>, + /// File content as a UTF-8 string — the whole file, or just the + /// requested window when `line_from`/`line_to` was given (window + /// lines keep their newline terminators). Binary content is returned + /// with invalid bytes replaced by U+FFFD; use a future binary-aware + /// function if exact bytes matter. 
**Single-path mode only; null when + /// the request used `paths[]`.** + #[serde(skip_serializing_if = "Option::is_none")] + pub content: Option<String>, + /// Whether `content` survived UTF-8 conversion without losing bytes. + /// Reflects the RETURNED content only: a clean window inside an + /// otherwise-binary file is still `true`. **Single-path mode only; + /// null when the request used `paths[]`.** + #[serde(skip_serializing_if = "Option::is_none")] + pub is_utf8: Option<bool>, + /// Number of lines in `content`. For full reads this equals the + /// file's total line count. **Single-path mode only; null when the + /// request used `paths[]`.** + #[serde(skip_serializing_if = "Option::is_none")] + pub lines_returned: Option<u64>, + /// Total number of lines in the file. Present only when the read + /// traversed the whole file: always for full reads; for windowed + /// reads only when the stream naturally reached EOF within the byte + /// cap. Never computed by forcing an extra full scan — absent means + /// the file was not fully traversed. **Single-path mode only; null + /// when the request used `paths[]`.** + #[serde(skip_serializing_if = "Option::is_none")] + pub total_lines: Option<u64>, + /// True when the file has content beyond what `content` includes: + /// the window ended before EOF, or the byte budget cut the window + /// short. Always false for full reads. **Single-path mode only; null + /// when the request used `paths[]`.** + #[serde(skip_serializing_if = "Option::is_none")] + pub more_lines: Option<bool>, + /// Size of the FILE in bytes (from metadata) — not the size of + /// `content`; in windowed mode the two differ. **Single-path mode + /// only; null when the request used `paths[]`.** + #[serde(skip_serializing_if = "Option::is_none")] + pub size: Option<u64>, /// Unix permission bits (lower 9 bits of `st_mode`), e.g. 0o644. - pub mode: u32, - /// Last-modified time as a Unix epoch in seconds. 
- pub mtime: i64, + /// **Single-path mode only; null when the request used `paths[]`.** + #[serde(skip_serializing_if = "Option::is_none")] + pub mode: Option<u32>, + /// Last-modified time as a Unix epoch in seconds. **Single-path mode + /// only; null when the request used `paths[]`.** + #[serde(skip_serializing_if = "Option::is_none")] + pub mtime: Option<i64>, + /// Per-entry results for a batch `paths[]` request. **Present only + /// when the request used `paths[]`; null in single-path mode.** + #[serde(skip_serializing_if = "Option::is_none")] + pub results: Option<Vec<ReadEntryResult>>, } +// --------------------------------------------------------------------------- +// Public handler +// --------------------------------------------------------------------------- + pub async fn handle( resolver: Arc<PathResolver>, cfg: Arc<CoderConfig>, @@ -42,44 +361,572 @@ pub async fn handle( inner(&resolver, &cfg, req).map_err(err_to_string) } +// --------------------------------------------------------------------------- +// Internal dispatch +// --------------------------------------------------------------------------- + fn inner( resolver: &PathResolver, cfg: &CoderConfig, req: ReadFileInput, ) -> Result<ReadFileOutput, CoderError> { - let abs = resolver.require_writable(&req.path)?; - let md = std::fs::metadata(&abs)?; + match (&req.path, &req.paths) { + // XOR: both set + (Some(_), Some(_)) => Err(CoderError::BadInput( + "pass either path or paths, not both; the two modes are mutually exclusive \ + (C210). Use path for a single-file read, or paths[] for a batch read." + .into(), + )), + // XOR: neither set + (None, None) => Err(CoderError::BadInput( + "either path or paths must be set (C210). \ + Use path for a single-file read, or paths[] for a batch read." 
+ .into(), + )), + // Single-path mode + (Some(p), None) => { + let p = p.clone(); + let single_req = SingleReadReq { + path: &p, + line_from: req.line_from, + line_to: req.line_to, + stat: req.stat, + numbered: req.numbered, + max_output_bytes: req.max_output_bytes, + }; + single_read(resolver, cfg, single_req).map(|o| ReadFileOutput { + path: Some(o.path), + content: o.content, + is_utf8: o.is_utf8, + lines_returned: Some(o.lines_returned), + total_lines: o.total_lines, + more_lines: Some(o.more_lines), + size: Some(o.size), + mode: Some(o.mode), + mtime: Some(o.mtime), + results: None, + }) + } + // Batch mode + (None, Some(targets)) => { + let results = batch_read(resolver, cfg, targets); + Ok(ReadFileOutput { + path: None, + content: None, + is_utf8: None, + lines_returned: None, + total_lines: None, + more_lines: None, + size: None, + mode: None, + mtime: None, + results: Some(results), + }) + } + } +} + +// --------------------------------------------------------------------------- +// Single-path mode (T7 + full reads) +// --------------------------------------------------------------------------- + +struct SingleReadReq<'a> { + path: &'a str, + line_from: Option<u64>, + line_to: Option<u64>, + stat: bool, + numbered: bool, + max_output_bytes: Option<u64>, +} + +/// Internal result for a single-path read before wrapping in +/// `ReadFileOutput`. `content`/`is_utf8` are `None` for stat probes. +struct SingleReadOut { + path: String, + content: Option<String>, + is_utf8: Option<bool>, + lines_returned: u64, + total_lines: Option<u64>, + more_lines: bool, + size: u64, + mode: u32, + mtime: i64, +} + +/// C210 for a field combined with `stat: true`. Prescriptive: stat +/// returns no content, so content-shaping fields cannot act. +fn stat_conflict(field: &str) -> CoderError { + CoderError::BadInput(format!( + "stat: true returns metadata only — no content — so {field} has \ + no effect (C210). Choose one: drop {field} to probe metadata, or \ + drop stat to read content." 
+ )) +} + +fn single_read( + resolver: &PathResolver, + cfg: &CoderConfig, + req: SingleReadReq<'_>, +) -> Result<SingleReadOut, CoderError> { + // Pure input validation first (C210), before any path is touched. + let window = parse_window(req.line_from, req.line_to)?; + if req.stat { + if window.is_some() { + return Err(stat_conflict("line_from/line_to")); + } + if req.numbered { + return Err(stat_conflict("numbered")); + } + if req.max_output_bytes.is_some() { + return Err(stat_conflict("max_output_bytes")); + } + } + if req.max_output_bytes.is_some() && window.is_some() { + return Err(CoderError::BadInput( + "max_output_bytes budgets FULL reads only; a line_from/line_to \ + window is already bounded by max_read_bytes (C210). Drop \ + line_from/line_to to apply the budget, or drop \ + max_output_bytes to read the window." + .into(), + )); + } + // REDACTION ORDERING: resolve + deny-check (C211) BEFORE any metadata + // syscall — stat on a denied path must be byte-identical to stat on a + // missing path, and no budget may reclassify either. + let abs = resolver.require_writable(req.path)?; + let md = std::fs::metadata(&abs).map_err(|e| CoderError::io_for_path(e, req.path))?; if !md.is_file() { return Err(CoderError::BadInput(format!( "not a regular file: {}", req.path ))); } + if req.stat { + return stat_read(&abs, req.path, cfg, &md); + } + match window { + None => full_read(&abs, req.path, cfg, &md, req.numbered, req.max_output_bytes), + Some((from, to)) => windowed_read(&abs, req.path, cfg, &md, from, to, req.numbered), + } +} + +/// Validate the window parameters. `Ok(None)` means full (non-windowed) +/// read; `Ok(Some((from, to)))` is the normalized 1-based inclusive +/// window (`from` defaults to 1, `to: None` means end-of-file). 
+fn parse_window( + line_from: Option<u64>, + line_to: Option<u64>, +) -> Result<Option<(u64, Option<u64>)>, CoderError> { + if line_from.is_none() && line_to.is_none() { + return Ok(None); + } + if line_from == Some(0) { + return Err(CoderError::BadInput( + "line_from must be >= 1 (line numbers are 1-based); got 0. \ + Use line_from=1 for the first line." + .into(), + )); + } + let from = line_from.unwrap_or(1); + if let Some(to) = line_to { + if to < from { + return Err(CoderError::BadInput(format!( + "line_to ({to}) must be >= line_from ({from}); the window \ + is 1-based and inclusive. Swap or widen the bounds." + ))); + } + } + Ok(Some((from, line_to))) +} + +/// Full (non-windowed) read: the whole file, pre-checked against +/// `max_read_bytes`, then against the `max_output_bytes` context budget +/// (converted wire bytes, numbered prefixes included). Both C213s are +/// recovery tools: they name the actual sizes and the corrective calls. +/// +/// ORDERING (REDACTION RULE): callers run resolve → deny → metadata +/// before reaching here, so by construction only an existing, accessible +/// regular file can ever receive either C213. +fn full_read( + abs: &Path, + wire_path: &str, + cfg: &CoderConfig, + md: &std::fs::Metadata, + numbered: bool, + max_output_override: Option<u64>, +) -> Result<SingleReadOut, CoderError> { if md.len() > cfg.max_read_bytes { return Err(CoderError::TooLarge(format!( - "{} is {} bytes; max_read_bytes is {}", - req.path, + "{} is {} bytes, which exceeds max_read_bytes ({}). 
\ + Read a smaller file, raise max_read_bytes in coder config, \ + or read a slice with line_from/line_to.", + wire_path, md.len(), cfg.max_read_bytes ))); } - let bytes = std::fs::read(&abs)?; - let (content, is_utf8) = match String::from_utf8(bytes.clone()) { - Ok(s) => (s, true), - Err(_) => (String::from_utf8_lossy(&bytes).into_owned(), false), + let bytes = std::fs::read(abs).map_err(|e| CoderError::io_for_path(e, wire_path))?; + let lines = count_lines(&bytes); + let (content, is_utf8) = lossy_utf8(bytes); + let content = if numbered { + number_lines(&content, 1) + } else { + content }; - let mode = unix_mode(&md); - let mtime = unix_mtime(&md); - Ok(ReadFileOutput { - path: req.path, - content, + // Per-call override clamps SILENTLY to max_read_bytes (documented on + // the input field); without an override the config value applies. + let budget = max_output_override + .map(|v| v.min(cfg.max_read_bytes)) + .unwrap_or(cfg.max_output_bytes); + if content.len() as u64 > budget { + // The error IS the recovery tool: it carries the stat facts + // (size, total_lines) plus every corrective call, so the agent's + // next call can succeed from the message alone. + return Err(CoderError::TooLarge(format!( + "{wire_path}: a full read would return {} bytes of content \ + (file is {} bytes, {lines} lines), which exceeds \ + max_output_bytes ({budget}). 
To recover: read a slice with \ + line_from/line_to, probe metadata cheaply with stat: true, or \ + re-call with a higher per-call max_output_bytes (values above \ + max_read_bytes are clamped).", + content.len(), + md.len(), + ))); + } + Ok(SingleReadOut { + path: abs.display().to_string(), + content: Some(content), + is_utf8: Some(is_utf8), + lines_returned: lines, + total_lines: Some(lines), + more_lines: false, + size: md.len(), + mode: unix_mode(md), + mtime: unix_mtime(md), + }) +} + +/// Metadata-only probe (`stat: true`): size/mode/mtime from `md`, plus +/// `total_lines`/`is_utf8` from a bounded read — never more than +/// `max_read_bytes` of work. Shared by single-path and batch modes. +/// +/// CALLERS MUST have run resolve → deny-check (C211) BEFORE calling — +/// the bounded read is a metadata syscall in the redaction-ordering +/// sense, and stat on a denied path must stay byte-identical to stat on +/// a missing one. +fn stat_read( + abs: &Path, + wire_path: &str, + cfg: &CoderConfig, + md: &std::fs::Metadata, +) -> Result<SingleReadOut, CoderError> { + let (total_lines, is_utf8) = stat_counts(abs, wire_path, md, cfg.max_read_bytes)?; + Ok(SingleReadOut { + path: abs.display().to_string(), + content: None, is_utf8, + lines_returned: 0, + total_lines, + more_lines: false, size: md.len(), - mode, - mtime, + mode: unix_mode(md), + mtime: unix_mtime(md), }) } +/// Count `total_lines` + whole-file UTF-8 validity for a stat probe, +/// reading at most `limit` bytes (the read is capped at limit+1 to +/// detect overflow — the same probe pattern `read_window` uses for +/// lines). `(None, None)` when the file exceeds `limit` (by metadata or +/// by an over-limit read after TOCTOU growth): stat on a big file still +/// SUCCEEDS for size/mode/mtime — that is its point. Line counting and +/// the UTF-8 verdict follow the full-read path exactly (`count_lines` + +/// strict validation before lossy conversion would kick in). 
+fn stat_counts( + abs: &Path, + wire_path: &str, + md: &std::fs::Metadata, + limit: u64, +) -> Result<(Option<u64>, Option<bool>), CoderError> { + if md.len() > limit { + return Ok((None, None)); + } + let file = std::fs::File::open(abs).map_err(|e| CoderError::io_for_path(e, wire_path))?; + let mut bytes = Vec::new(); + let mut bounded = std::io::Read::take(file, limit.saturating_add(1)); + std::io::Read::read_to_end(&mut bounded, &mut bytes) + .map_err(|e| CoderError::io_for_path(e, wire_path))?; + if bytes.len() as u64 > limit { + return Ok((None, None)); + } + let total = count_lines(&bytes); + let is_utf8 = std::str::from_utf8(&bytes).is_ok(); + Ok((Some(total), Some(is_utf8))) +} + +/// Windowed read (single-path mode): stream lines `from..=to` via +/// `BufReader`. The `max_read_bytes` cap bounds the COLLECTED window's +/// RAW bytes — the T7 contract — never the file size (windowed mode +/// never returns C213 for an oversize file). +fn windowed_read( + abs: &Path, + wire_path: &str, + cfg: &CoderConfig, + md: &std::fs::Metadata, + from: u64, + to: Option<u64>, + numbered: bool, +) -> Result<SingleReadOut, CoderError> { + let file = std::fs::File::open(abs).map_err(|e| CoderError::io_for_path(e, wire_path))?; + let mut reader = std::io::BufReader::new(file); + let w = read_window(&mut reader, from, to, cfg.max_read_bytes, numbered) + .map_err(|e| CoderError::io_for_path(e, wire_path))?; + let (content, is_utf8) = lossy_utf8(w.raw); + Ok(SingleReadOut { + path: abs.display().to_string(), + content: Some(content), + is_utf8: Some(is_utf8), + lines_returned: w.lines_returned, + total_lines: w.total_lines, + more_lines: w.more_lines, + size: md.len(), + mode: unix_mode(md), + mtime: unix_mtime(md), + }) +} + +/// Windowed read (batch mode): identical streaming/no-torn-lines +/// machinery, but the budget is measured in CONVERTED WIRE BYTES +/// (`content.len()` after UTF-8 sanitization) — the unit +/// `batch_read_budget_bytes` is defined in, so the aggregate cap bounds +/// what the caller's context 
actually receives even for binary files +/// whose invalid bytes expand to 3-byte U+FFFD replacements. +fn wire_windowed_read( + abs: &Path, + wire_path: &str, + wire_budget: u64, + md: &std::fs::Metadata, + from: u64, + to: Option<u64>, + numbered: bool, +) -> Result<SingleReadOut, CoderError> { + let file = std::fs::File::open(abs).map_err(|e| CoderError::io_for_path(e, wire_path))?; + let mut reader = std::io::BufReader::new(file); + let w = read_window_wire(&mut reader, from, to, wire_budget, numbered) + .map_err(|e| CoderError::io_for_path(e, wire_path))?; + Ok(SingleReadOut { + path: abs.display().to_string(), + content: Some(w.content), + is_utf8: Some(w.is_utf8), + lines_returned: w.lines_returned, + total_lines: w.total_lines, + more_lines: w.more_lines, + size: md.len(), + mode: unix_mode(md), + mtime: unix_mtime(md), + }) +} + +// --------------------------------------------------------------------------- +// Batch mode +// --------------------------------------------------------------------------- + +/// A failed batch entry: every content/metadata field null, `error` set. +fn entry_failure(path: String, error: WireError) -> ReadEntryResult { + ReadEntryResult { + path, + success: false, + content: None, + is_utf8: None, + lines_returned: None, + total_lines: None, + more_lines: None, + size: None, + mode: None, + mtime: None, + error: Some(error), + } +} + +/// Process `targets` in request order against the aggregate +/// `batch_read_budget_bytes` cap. The budget unit is CONVERTED WIRE +/// BYTES (each entry's `content.len()`), so what is accounted is exactly +/// what is delivered. +/// +/// ORDERING IS LOAD-BEARING (REDACTION INVARIANT): resolve, stat, and the +/// regular-file check all run BEFORE the zero-budget check, so error +/// classification never depends on budget state. Otherwise an agent +/// could exhaust the budget and then distinguish a missing path (which +/// would hit the budget C213) from a glob-denied one (C211 at resolve). 
+fn batch_read( + resolver: &PathResolver, + cfg: &CoderConfig, + targets: &[ReadTarget], +) -> Vec<ReadEntryResult> { + let mut remaining_budget: u64 = cfg.batch_read_budget_bytes; + let mut results = Vec::with_capacity(targets.len()); + + for target in targets { + let wire_path = target.path(); + let (lf, lt) = target.window_params(); + let (stat, numbered) = target.flags(); + + // Per-entry C210 for invalid window params before touching anything. + let window = match parse_window(lf, lt) { + Ok(w) => w, + Err(e) => { + results.push(entry_failure(wire_path.to_string(), e.to_wire_error())); + continue; + } + }; + // Per-entry C210 for stat conflicts — same rules as single-path. + if stat && window.is_some() { + results.push(entry_failure( + wire_path.to_string(), + stat_conflict("line_from/line_to").to_wire_error(), + )); + continue; + } + if stat && numbered { + results.push(entry_failure( + wire_path.to_string(), + stat_conflict("numbered").to_wire_error(), + )); + continue; + } + + // Resolve + accessibility check; failures echo the caller's input + // verbatim and do NOT consume budget. + let abs = match resolver.require_writable(wire_path) { + Ok(p) => p, + Err(e) => { + results.push(entry_failure(wire_path.to_string(), e.to_wire_error())); + continue; + } + }; + + // Stat BEFORE the zero-budget check (see fn docs). + let md = match std::fs::metadata(&abs) { + Ok(m) => m, + Err(e) => { + // NotFound folds to C211 and echoes the wire path VERBATIM — + // byte-indistinguishable from the glob-denied arm above + // (REDACTION INVARIANT). Other io errors (EIO, permission + // TOCTOU) echo the canonical path: resolution succeeded, so + // the canonical form is redaction-safe and more actionable. 
+ let echo = if e.kind() == std::io::ErrorKind::NotFound { + wire_path.to_string() + } else { + abs.display().to_string() + }; + let err = CoderError::io_for_path(e, wire_path); + results.push(entry_failure(echo, err.to_wire_error())); + continue; + } + }; + if !md.is_file() { + results.push(entry_failure( + abs.display().to_string(), + CoderError::BadInput(format!("not a regular file: {wire_path}")).to_wire_error(), + )); + continue; + } + + // Stat probe: metadata only, no content — consumes no budget and + // is deliberately exempt from the zero-budget C213 below (stat is + // the cheap probe; an exhausted batch can still size files). + // Resolve + deny + metadata already ran, so classification stays + // budget-independent (REDACTION INVARIANT). + if stat { + match stat_read(&abs, wire_path, cfg, &md) { + Ok(r) => results.push(ReadEntryResult { + path: r.path, + success: true, + content: None, + is_utf8: r.is_utf8, + lines_returned: Some(0), + total_lines: r.total_lines, + more_lines: Some(false), + size: Some(r.size), + mode: Some(r.mode), + mtime: Some(r.mtime), + error: None, + }), + Err(e) => results.push(entry_failure(abs.display().to_string(), e.to_wire_error())), + } + continue; + } + + // Accounted consumption so far, derived from the running budget. + // Computed BEFORE the guard so it stays correct if the guard + // condition ever changes; with today's `== 0` condition it is + // tautologically the full budget. + let consumed = cfg.batch_read_budget_bytes - remaining_budget; + + // Zero-budget check — only an existing, accessible regular file + // can reach this point, so C213 leaks nothing about protected or + // missing paths. The message reports the ACTUAL accounted + // consumption. 
+ if remaining_budget == 0 { + results.push(entry_failure( + abs.display().to_string(), + CoderError::TooLarge(format!( + "batch budget exhausted before reaching {wire_path}: \ + batch_read_budget_bytes is {} and earlier entries already \ + returned {consumed} bytes of content (after UTF-8 \ + sanitization). To recover: request fewer or smaller entries, \ + use per-entry line_from/line_to windows, or raise \ + batch_read_budget_bytes in coder config.", + cfg.batch_read_budget_bytes, + )) + .to_wire_error(), + )); + continue; + } + + // Effective per-entry budget: min(remaining, max_read_bytes), + // applied to the entry's converted wire bytes. A string target is + // a window from line 1 to EOF, so the no-torn-lines machinery + // applies uniformly. + let entry_budget = remaining_budget.min(cfg.max_read_bytes); + let (from, to) = window.unwrap_or((1, None)); + + match wire_windowed_read(&abs, wire_path, entry_budget, &md, from, to, numbered) { + Ok(r) => { + // Accounted consumption == delivered wire bytes (numbered + // prefixes included): the collection above already counted + // converted lengths. + let delivered = r.content.as_ref().map_or(0, String::len) as u64; + remaining_budget = remaining_budget.saturating_sub(delivered); + results.push(ReadEntryResult { + path: r.path, + success: true, + content: r.content, + is_utf8: r.is_utf8, + lines_returned: Some(r.lines_returned), + total_lines: r.total_lines, + more_lines: Some(r.more_lines), + size: Some(r.size), + mode: Some(r.mode), + mtime: Some(r.mtime), + error: None, + }); + } + Err(e) => { + // IO/other error — budget not consumed; resolution + // succeeded, so the canonical echo is redaction-safe. 
+ results.push(entry_failure(abs.display().to_string(), e.to_wire_error())); + } + } + } + + results +} + +// --------------------------------------------------------------------------- +// Platform helpers +// --------------------------------------------------------------------------- + #[cfg(unix)] fn unix_mode(md: &std::fs::Metadata) -> u32 { use std::os::unix::fs::PermissionsExt; @@ -99,96 +946,1239 @@ fn unix_mtime(md: &std::fs::Metadata) -> i64 { .unwrap_or(0) } +// --------------------------------------------------------------------------- +// Tests +// --------------------------------------------------------------------------- + #[cfg(test)] mod tests { use super::*; use tempfile::tempdir; + // ----------------------------------------------------------------------- + // Test helpers + // ----------------------------------------------------------------------- + fn setup() -> (tempfile::TempDir, Arc<PathResolver>, Arc<CoderConfig>) { + setup_with_cap(1024) + } + + /// Jail with a custom `max_read_bytes` (window byte-budget tests). 
+ fn setup_with_cap(cap: u64) -> (tempfile::TempDir, Arc<PathResolver>, Arc<CoderConfig>) { let tmp = tempdir().unwrap(); - let cfg = CoderConfig { - base_path: tmp.path().to_path_buf(), + let cfg = Arc::new(CoderConfig { + base_paths: vec![tmp.path().to_path_buf()], non_accessible_globs: vec!["**/.env".to_string()], - max_read_bytes: 1024, + max_read_bytes: cap, ..CoderConfig::default() - }; - let cfg = Arc::new(cfg); + }); + let resolver = Arc::new(PathResolver::new(&cfg).unwrap()); + (tmp, resolver, cfg) + } + + fn setup_with_batch_budget( + batch_budget: u64, + ) -> (tempfile::TempDir, Arc<PathResolver>, Arc<CoderConfig>) { + let tmp = tempdir().unwrap(); + let cfg = Arc::new(CoderConfig { + base_paths: vec![tmp.path().to_path_buf()], + non_accessible_globs: vec!["**/.env".to_string()], + max_read_bytes: 1024 * 1024, // 1 MiB per-entry cap — not the constraint + batch_read_budget_bytes: batch_budget, + ..CoderConfig::default() + }); let resolver = Arc::new(PathResolver::new(&cfg).unwrap()); (tmp, resolver, cfg) } + fn full(path: &str) -> ReadFileInput { + ReadFileInput { + path: Some(path.into()), + ..ReadFileInput::default() + } + } + + fn window_req(path: &str, from: Option<u64>, to: Option<u64>) -> ReadFileInput { + ReadFileInput { + path: Some(path.into()), + line_from: from, + line_to: to, + ..ReadFileInput::default() + } + } + + fn batch(paths: Vec<ReadTarget>) -> ReadFileInput { + ReadFileInput { + paths: Some(paths), + ..ReadFileInput::default() + } + } + + fn stat_req(path: &str) -> ReadFileInput { + ReadFileInput { + path: Some(path.into()), + stat: true, + ..ReadFileInput::default() + } + } + + /// Object-form batch target with only the window fields set. + fn target_window(path: &str, from: Option<u64>, to: Option<u64>) -> ReadTarget { + ReadTarget::Window { + path: path.into(), + line_from: from, + line_to: to, + stat: false, + numbered: false, + } + } + + /// Object-form batch target with only the stat flag set. 
+ fn target_stat(path: &str) -> ReadTarget { + ReadTarget::Window { + path: path.into(), + line_from: None, + line_to: None, + stat: true, + numbered: false, + } + } + + /// Object-form batch target with only the numbered flag set. + fn target_numbered(path: &str) -> ReadTarget { + ReadTarget::Window { + path: path.into(), + line_from: None, + line_to: None, + stat: false, + numbered: true, + } + } + + /// "L1\n" .. "L<n>\n" (trailing newline). + fn numbered_lines(n: u64) -> String { + (1..=n).map(|i| format!("L{i}\n")).collect() + } + + fn unwrap_single(out: ReadFileOutput) -> (String, String, bool, u64, Option<u64>, bool) { + ( + out.path.unwrap(), + out.content.unwrap(), + out.is_utf8.unwrap(), + out.lines_returned.unwrap(), + out.total_lines, + out.more_lines.unwrap(), + ) + } + + // ----------------------------------------------------------------------- + // XOR input validation + // ----------------------------------------------------------------------- + + #[tokio::test] + async fn both_path_and_paths_returns_c210() { + let (_tmp, r, c) = setup(); + let req = ReadFileInput { + path: Some("f.txt".into()), + paths: Some(vec![ReadTarget::Path("f.txt".into())]), + ..ReadFileInput::default() + }; + let err = handle(r, c, req).await.unwrap_err(); + assert!(err.contains("C210"), "got: {err}"); + assert!( + err.contains("not both"), + "error must name the XOR rule: {err}" + ); + } + + #[tokio::test] + async fn neither_path_nor_paths_returns_c210() { + let (_tmp, r, c) = setup(); + let req = ReadFileInput::default(); + let err = handle(r, c, req).await.unwrap_err(); + assert!(err.contains("C210"), "got: {err}"); + } + + // ----------------------------------------------------------------------- + // Single-path mode — regression (unchanged from T7) + // ----------------------------------------------------------------------- + #[tokio::test] async fn reads_existing_file() { let (tmp, r, c) = setup(); std::fs::write(tmp.path().join("hi.txt"), b"hello").unwrap(); + let out = 
handle(r, c, full("hi.txt")).await.unwrap(); + let (path, content, is_utf8, lines, total, more) = unwrap_single(out); + assert_eq!(content, "hello"); + assert!(is_utf8); + assert_eq!(lines, 1); + assert_eq!(total, Some(1)); + assert!(!more); + assert_eq!( + path, + std::fs::canonicalize(tmp.path()) + .unwrap() + .join("hi.txt") + .display() + .to_string() + ); + } + + #[tokio::test] + async fn refuses_non_accessible() { + let (tmp, r, c) = setup(); + std::fs::write(tmp.path().join(".env"), b"secret").unwrap(); + let err = handle(r, c, full(".env")).await.unwrap_err(); + assert!(err.contains("C211"), "got: {err}"); + } + + #[tokio::test] + async fn refuses_file_above_max_read_bytes_and_hints_window_params() { + let (tmp, r, c) = setup(); + std::fs::write(tmp.path().join("big.bin"), vec![0u8; 2048]).unwrap(); + let err = handle(r, c, full("big.bin")).await.unwrap_err(); + assert!(err.contains("C213"), "got: {err}"); + assert!(err.contains("line_from"), "got: {err}"); + assert!(err.contains("line_to"), "got: {err}"); + } + + #[tokio::test] + async fn rejects_directory_with_bad_input() { + let (tmp, r, c) = setup(); + std::fs::create_dir(tmp.path().join("d")).unwrap(); + let err = handle(r, c, full("d")).await.unwrap_err(); + assert!(err.contains("C210"), "got: {err}"); + } + + #[tokio::test] + async fn missing_file_returns_c211() { + let (_tmp, r, c) = setup(); + let err = handle(r, c, full("nope.txt")).await.unwrap_err(); + assert!(err.contains("C211"), "got: {err}"); + } + + // ----------------------------------------------------------------------- + // T7 — windowed reads (single-path, regression) + // ----------------------------------------------------------------------- + + #[tokio::test] + async fn window_in_range_returns_lines_and_counters() { + let (tmp, r, c) = setup(); + std::fs::write(tmp.path().join("f.txt"), numbered_lines(10)).unwrap(); + let out = handle(r, c, window_req("f.txt", Some(3), Some(5))) + .await + .unwrap(); + let (_, content, is_utf8, 
lines, total, more) = unwrap_single(out); + assert_eq!(content, "L3\nL4\nL5\n"); + assert_eq!(lines, 3); + assert!(more); + assert_eq!(total, None); + assert!(is_utf8); + } + + #[tokio::test] + async fn window_from_only_reads_to_eof_and_knows_total() { + let (tmp, r, c) = setup(); + std::fs::write(tmp.path().join("f.txt"), numbered_lines(10)).unwrap(); + let out = handle(r, c, window_req("f.txt", Some(8), None)) + .await + .unwrap(); + let (_, content, _, lines, total, more) = unwrap_single(out); + assert_eq!(content, "L8\nL9\nL10\n"); + assert_eq!(lines, 3); + assert!(!more); + assert_eq!(total, Some(10)); + } + + #[tokio::test] + async fn window_to_only_reads_from_line_one() { + let (tmp, r, c) = setup(); + std::fs::write(tmp.path().join("f.txt"), numbered_lines(10)).unwrap(); + let out = handle(r, c, window_req("f.txt", None, Some(2))) + .await + .unwrap(); + let (_, content, _, lines, total, more) = unwrap_single(out); + assert_eq!(content, "L1\nL2\n"); + assert_eq!(lines, 2); + assert!(more); + assert_eq!(total, None); + } + + #[tokio::test] + async fn window_past_eof_is_success_with_total() { + let (tmp, r, c) = setup(); + std::fs::write(tmp.path().join("f.txt"), numbered_lines(10)).unwrap(); + let out = handle(r, c, window_req("f.txt", Some(50), Some(60))) + .await + .unwrap(); + let (_, content, _, lines, total, more) = unwrap_single(out); + assert_eq!(content, ""); + assert_eq!(lines, 0); + assert!(!more); + assert_eq!(total, Some(10)); + } + + #[tokio::test] + async fn line_from_zero_rejected_with_c210() { + let (tmp, r, c) = setup(); + std::fs::write(tmp.path().join("f.txt"), "a\n").unwrap(); + let err = handle(r, c, window_req("f.txt", Some(0), Some(3))) + .await + .unwrap_err(); + assert!(err.contains("C210"), "got: {err}"); + assert!(err.contains("1-based"), "must name the rule: {err}"); + } + + #[tokio::test] + async fn inverted_window_rejected_with_c210() { + let (tmp, r, c) = setup(); + std::fs::write(tmp.path().join("f.txt"), 
"a\nb\nc\n").unwrap(); + let err = handle(r, c, window_req("f.txt", Some(5), Some(3))) + .await + .unwrap_err(); + assert!(err.contains("C210"), "got: {err}"); + assert!(err.contains("line_to"), "must name the rule: {err}"); + } + + #[tokio::test] + async fn to_only_zero_rejected_as_inverted() { + let (tmp, r, c) = setup(); + std::fs::write(tmp.path().join("f.txt"), "a\n").unwrap(); + let err = handle(r, c, window_req("f.txt", None, Some(0))) + .await + .unwrap_err(); + assert!(err.contains("C210"), "got: {err}"); + } + + #[tokio::test] + async fn window_exceeding_byte_cap_returns_partial_with_more_lines() { + let tmp = tempdir().unwrap(); + let cfg = Arc::new(CoderConfig { + base_paths: vec![tmp.path().to_path_buf()], + non_accessible_globs: vec![], + max_read_bytes: 10, + ..CoderConfig::default() + }); + let r = Arc::new(PathResolver::new(&cfg).unwrap()); + std::fs::write(tmp.path().join("f.txt"), "aaaa\naaaa\naaaa\naaaa\n").unwrap(); + let out = handle(r, cfg, window_req("f.txt", Some(1), Some(4))) + .await + .unwrap(); + let (_, content, _, lines, total, more) = unwrap_single(out); + assert_eq!(content, "aaaa\naaaa\n"); + assert_eq!(lines, 2); + assert!(more, "byte-budget cut must set more_lines"); + assert_eq!(total, None); + } + + #[tokio::test] + async fn single_line_exceeding_byte_cap_returns_empty_partial() { + let tmp = tempdir().unwrap(); + let cfg = Arc::new(CoderConfig { + base_paths: vec![tmp.path().to_path_buf()], + non_accessible_globs: vec![], + max_read_bytes: 4, + ..CoderConfig::default() + }); + let r = Arc::new(PathResolver::new(&cfg).unwrap()); + std::fs::write(tmp.path().join("f.txt"), "aaaaaaaa\nb\n").unwrap(); + let out = handle(r, cfg, window_req("f.txt", Some(1), Some(2))) + .await + .unwrap(); + let (_, content, _, lines, _, more) = unwrap_single(out); + assert_eq!(content, ""); + assert_eq!(lines, 0); + assert!(more); + } + + // ----------------------------------------------------------------------- + // Batch mode — ReadTarget parsing 
+ // ----------------------------------------------------------------------- + + #[tokio::test] + async fn batch_string_target_reads_whole_file() { + let (tmp, r, c) = setup(); + std::fs::write(tmp.path().join("a.txt"), "hello\n").unwrap(); + let out = handle(r, c, batch(vec![ReadTarget::Path("a.txt".into())])) + .await + .unwrap(); + assert!( + out.path.is_none(), + "single-path field must be null in batch" + ); + let results = out.results.unwrap(); + assert_eq!(results.len(), 1); + assert!(results[0].success); + assert_eq!(results[0].content.as_deref(), Some("hello\n")); + } + + #[tokio::test] + async fn batch_object_target_with_window() { + let (tmp, r, c) = setup(); + std::fs::write(tmp.path().join("b.txt"), numbered_lines(5)).unwrap(); + let out = handle(r, c, batch(vec![target_window("b.txt", Some(2), Some(3))])) + .await + .unwrap(); + let results = out.results.unwrap(); + assert!(results[0].success); + assert_eq!(results[0].content.as_deref(), Some("L2\nL3\n")); + assert_eq!(results[0].lines_returned, Some(2)); + } + + // ----------------------------------------------------------------------- + // Batch mode — order preservation + // ----------------------------------------------------------------------- + + #[tokio::test] + async fn batch_results_in_request_order() { + let (tmp, r, c) = setup(); + std::fs::write(tmp.path().join("x.txt"), "X\n").unwrap(); + std::fs::write(tmp.path().join("y.txt"), "Y\n").unwrap(); + std::fs::write(tmp.path().join("z.txt"), "Z\n").unwrap(); let out = handle( r, c, - ReadFileInput { - path: "hi.txt".into(), - }, + batch(vec![ + ReadTarget::Path("x.txt".into()), + ReadTarget::Path("y.txt".into()), + ReadTarget::Path("z.txt".into()), + ]), ) .await .unwrap(); - assert_eq!(out.content, "hello"); - assert_eq!(out.size, 5); - assert!(out.is_utf8); + let results = out.results.unwrap(); + assert_eq!(results[0].content.as_deref(), Some("X\n")); + assert_eq!(results[1].content.as_deref(), Some("Y\n")); + 
assert_eq!(results[2].content.as_deref(), Some("Z\n")); } + // ----------------------------------------------------------------------- + // Batch mode — per-entry failures don't consume budget + // ----------------------------------------------------------------------- + #[tokio::test] - async fn refuses_non_accessible() { + async fn batch_per_entry_c211_missing_does_not_consume_budget() { + let (tmp, r, c) = setup_with_batch_budget(20); // 20 bytes total + std::fs::write(tmp.path().join("ok.txt"), "hello\n").unwrap(); + let out = handle( + r, + c, + batch(vec![ + ReadTarget::Path("missing.txt".into()), // fails — no budget consumed + ReadTarget::Path("ok.txt".into()), // succeeds + ]), + ) + .await + .unwrap(); + let results = out.results.unwrap(); + assert!(!results[0].success, "missing entry must fail"); + let wire = results[0].error.as_ref().unwrap(); + assert_eq!(wire.code, "C211"); + assert!(results[1].success, "next entry must succeed"); + assert_eq!(results[1].content.as_deref(), Some("hello\n")); + } + + #[tokio::test] + async fn batch_per_entry_c211_glob_denied_does_not_consume_budget() { + let (tmp, r, c) = setup_with_batch_budget(20); + std::fs::write(tmp.path().join(".env"), "secret").unwrap(); + std::fs::write(tmp.path().join("ok.txt"), "hello\n").unwrap(); + let out = handle( + r, + c, + batch(vec![ + ReadTarget::Path(".env".into()), // denied — no budget consumed + ReadTarget::Path("ok.txt".into()), // succeeds + ]), + ) + .await + .unwrap(); + let results = out.results.unwrap(); + assert!(!results[0].success); + assert_eq!(results[0].error.as_ref().unwrap().code, "C211"); + assert!(results[1].success); + } + + // ----------------------------------------------------------------------- + // Batch mode — budget partial (more_lines mid-entry) + // ----------------------------------------------------------------------- + + #[tokio::test] + async fn batch_budget_partial_entry_has_more_lines_true() { + // Budget: 10 bytes. Each line is 5 bytes. 
First entry consumes 10 + // bytes (2 lines). Second entry has zero budget → C213. + let (tmp, r, c) = setup_with_batch_budget(10); + std::fs::write(tmp.path().join("a.txt"), "aaaa\nbbbb\ncccc\n").unwrap(); + std::fs::write(tmp.path().join("b.txt"), "data\n").unwrap(); + let out = handle( + r, + c, + batch(vec![ + ReadTarget::Path("a.txt".into()), + ReadTarget::Path("b.txt".into()), + ]), + ) + .await + .unwrap(); + let results = out.results.unwrap(); + // First entry: 2 lines fit (10 bytes), 3rd line cut → more_lines=true + assert!(results[0].success); + assert_eq!(results[0].more_lines, Some(true)); + assert_eq!(results[0].lines_returned, Some(2)); + // Second entry: zero budget → C213 + assert!(!results[1].success); + let wire = results[1].error.as_ref().unwrap(); + assert_eq!(wire.code, "C213"); + assert!( + wire.message.contains("batch_read_budget_bytes"), + "C213 must name the config key: {}", + wire.message + ); + assert!( + wire.message.contains("10"), + "C213 must name the budget value: {}", + wire.message + ); + } + + // ----------------------------------------------------------------------- + // Batch mode — zero-budget C213 details + // ----------------------------------------------------------------------- + + #[tokio::test] + async fn batch_zero_budget_entry_c213_names_key_and_value() { + let (tmp, r, c) = setup_with_batch_budget(5); + // First file: 5 bytes, consumes entire budget. 
+ std::fs::write(tmp.path().join("first.txt"), "abcde").unwrap(); + std::fs::write(tmp.path().join("second.txt"), "x").unwrap(); + let out = handle( + r, + c, + batch(vec![ + ReadTarget::Path("first.txt".into()), + ReadTarget::Path("second.txt".into()), + ]), + ) + .await + .unwrap(); + let results = out.results.unwrap(); + assert!(results[0].success); + assert!(!results[1].success); + let wire = results[1].error.as_ref().unwrap(); + assert_eq!(wire.code, "C213"); + assert!(wire.message.contains("batch_read_budget_bytes")); + assert!(wire.message.contains('5'), "must name the value"); + // Recovery guidance + assert!( + wire.message.contains("line_from") || wire.message.contains("raise"), + "must include recovery guidance: {}", + wire.message + ); + } + + // ----------------------------------------------------------------------- + // Batch mode — tiny budget (budget < first line → empty success) + // ----------------------------------------------------------------------- + + #[tokio::test] + async fn batch_budget_smaller_than_first_line_succeeds_with_empty_more_lines() { + // Budget of 2 bytes; first line is "aaaaaaaa\n" (9 bytes). + // No-torn-lines: empty content, more_lines=true — NOT a C213 error. 
+ let (tmp, r, c) = setup_with_batch_budget(2); + std::fs::write(tmp.path().join("f.txt"), "aaaaaaaa\nb\n").unwrap(); + let out = handle(r, c, batch(vec![ReadTarget::Path("f.txt".into())])) + .await + .unwrap(); + let results = out.results.unwrap(); + assert!( + results[0].success, + "budget < line length is still a success per no-torn-lines" + ); + assert_eq!(results[0].content.as_deref(), Some("")); + assert_eq!(results[0].lines_returned, Some(0)); + assert_eq!(results[0].more_lines, Some(true)); + } + + // ----------------------------------------------------------------------- + // Batch mode — budget unit is CONVERTED wire bytes (T8 review fix 1) + // ----------------------------------------------------------------------- + + #[tokio::test] + async fn batch_budget_counts_wire_bytes_not_raw() { + // REVIEWER REPRO: invalid bytes expand 3x under lossy conversion + // (1x0xFF → one 3-byte U+FFFD), so a raw-byte budget would let + // binary entries deliver up to 3x the configured cap. Two raw + // lines of 3x0xFF + '\n' (4 raw bytes) each convert to 10 wire + // bytes; budget 10 → exactly one converted line fits. + let (tmp, r, c) = setup_with_batch_budget(10); + std::fs::write(tmp.path().join("bin.dat"), b"\xFF\xFF\xFF\n\xFF\xFF\xFF\n").unwrap(); + std::fs::write(tmp.path().join("next.txt"), "x\n").unwrap(); + let out = handle( + r, + c, + batch(vec![ + ReadTarget::Path("bin.dat".into()), + ReadTarget::Path("next.txt".into()), + ]), + ) + .await + .unwrap(); + let results = out.results.unwrap(); + + // Entry 0: one converted line (exactly 10 wire bytes) fits; the + // second would exceed the budget → partial success per + // no-torn-lines on the CONVERTED form. 
+ assert!(results[0].success); + assert_eq!(results[0].content.as_ref().unwrap().len(), 10); + assert_eq!(results[0].lines_returned, Some(1)); + assert_eq!(results[0].more_lines, Some(true)); + assert_eq!(results[0].is_utf8, Some(false)); + + // Entry 1: zero wire budget remains → C213 reporting the ACTUAL + // accounted consumption, not a hardcoded value. + assert!(!results[1].success); + let wire = results[1].error.as_ref().unwrap(); + assert_eq!(wire.code, "C213"); + assert!( + wire.message.contains("batch_read_budget_bytes is 10"), + "C213 must name the key + value: {}", + wire.message + ); + assert!( + wire.message.contains("returned 10 bytes"), + "C213 must report actual accounted consumption: {}", + wire.message + ); + + // INVARIANT: total delivered wire bytes never exceed the budget. + let total: usize = results + .iter() + .filter_map(|e| e.content.as_ref().map(String::len)) + .sum(); + assert!(total <= 10, "delivered {total} wire bytes > budget 10"); + } + + #[tokio::test] + async fn batch_binary_line_over_wire_budget_is_empty_success_not_torn() { + // 10 raw 0xFF bytes = one EOF-terminated line converting to 30 + // wire bytes. Budget 10: the converted line cannot fit → empty + // SUCCESS with more_lines=true (no-torn-lines on the converted + // form), zero consumed — the raw accounting bug delivered all 30 + // wire bytes here, 3x the budget. + let (tmp, r, c) = setup_with_batch_budget(10); + std::fs::write(tmp.path().join("bin.dat"), vec![0xFFu8; 10]).unwrap(); + std::fs::write(tmp.path().join("ok.txt"), "hi\n").unwrap(); + let out = handle( + r, + c, + batch(vec![ + ReadTarget::Path("bin.dat".into()), + ReadTarget::Path("ok.txt".into()), + ]), + ) + .await + .unwrap(); + let results = out.results.unwrap(); + assert!(results[0].success); + assert_eq!(results[0].content.as_deref(), Some("")); + assert_eq!(results[0].more_lines, Some(true)); + // Nothing delivered → nothing consumed: the next entry reads fine. 
+ assert!(results[1].success); + assert_eq!(results[1].content.as_deref(), Some("hi\n")); + let total: usize = results + .iter() + .filter_map(|e| e.content.as_ref().map(String::len)) + .sum(); + assert!(total <= 10, "delivered {total} wire bytes > budget 10"); + } + + // ----------------------------------------------------------------------- + // Batch mode — redaction invariant survives exhaustion (T8 review fix 2) + // ----------------------------------------------------------------------- + + #[tokio::test] + async fn batch_post_exhaustion_missing_and_denied_indistinguishable() { + // After the budget hits zero, a missing path and a glob-denied + // path must BOTH return C211 with byte-identical message suffixes + // and verbatim path echoes — C213 may only reach an existing, + // accessible entry, or an agent could probe for protected files + // by exhausting the budget first. + let (tmp, r, c) = setup_with_batch_budget(5); + std::fs::write(tmp.path().join("eat.txt"), "abcde").unwrap(); + std::fs::write(tmp.path().join(".env"), "secret").unwrap(); + std::fs::write(tmp.path().join("exists.txt"), "x").unwrap(); + let out = handle( + r, + c, + batch(vec![ + ReadTarget::Path("eat.txt".into()), // consumes the whole budget + ReadTarget::Path("missing.txt".into()), // must be C211, NOT C213 + ReadTarget::Path(".env".into()), // C211 (glob-denied) + ReadTarget::Path("exists.txt".into()), // C213 — exists + accessible + ]), + ) + .await + .unwrap(); + let results = out.results.unwrap(); + assert!(results[0].success); + assert_eq!(results[0].content.as_deref(), Some("abcde")); + + let missing = results[1].error.as_ref().unwrap(); + let denied = results[2].error.as_ref().unwrap(); + assert_eq!( + missing.code, "C211", + "missing after exhaustion: {missing:?}" + ); + assert_eq!(denied.code, "C211", "denied after exhaustion: {denied:?}"); + // Byte-identical suffix after the caller-supplied path prefix + // (T3 suffix-comparison pattern — REDACTION INVARIANT). 
+ let m_suffix = missing + .message + .strip_prefix("missing.txt: ") + .expect("missing message starts with its wire path"); + let d_suffix = denied + .message + .strip_prefix(".env: ") + .expect("denied message starts with its wire path"); + assert_eq!( + m_suffix, d_suffix, + "C211 missing vs glob-denied suffixes must be byte-identical" + ); + // Both echo the caller's input verbatim (canonical echo for the + // missing case would itself distinguish the two). + assert_eq!(results[1].path, "missing.txt"); + assert_eq!(results[2].path, ".env"); + + // Only the existing, accessible entry receives the budget C213. + assert_eq!(results[3].error.as_ref().unwrap().code, "C213"); + } + + // ----------------------------------------------------------------------- + // Batch mode — per-entry window C210 propagates as per-entry error + // ----------------------------------------------------------------------- + + #[tokio::test] + async fn batch_per_entry_window_c210_fails_that_entry_others_proceed() { let (tmp, r, c) = setup(); - std::fs::write(tmp.path().join(".env"), b"secret").unwrap(); - let err = handle( + std::fs::write(tmp.path().join("a.txt"), numbered_lines(5)).unwrap(); + std::fs::write(tmp.path().join("b.txt"), "ok\n").unwrap(); + let out = handle( r, c, - ReadFileInput { - path: ".env".into(), - }, + batch(vec![ + // line_to < line_from → C210 + target_window("a.txt", Some(5), Some(2)), + ReadTarget::Path("b.txt".into()), + ]), ) .await - .unwrap_err(); - assert!(err.contains("C211"), "got: {err}"); + .unwrap(); + let results = out.results.unwrap(); + assert!(!results[0].success); + assert_eq!(results[0].error.as_ref().unwrap().code, "C210"); + assert!(results[1].success); + assert_eq!(results[1].content.as_deref(), Some("ok\n")); + } + + // ----------------------------------------------------------------------- + // Batch mode — empty batch succeeds with empty results + // ----------------------------------------------------------------------- + + #[tokio::test] 
+ async fn batch_empty_paths_returns_empty_results() { + let (_tmp, r, c) = setup(); + let out = handle(r, c, batch(vec![])).await.unwrap(); + assert!(out.results.unwrap().is_empty()); + } + + // ----------------------------------------------------------------------- + // S4 — stat probe (single-path) + // ----------------------------------------------------------------------- + + /// Jail with custom full-read output budget + read cap. + fn setup_with_output_budget( + max_output: u64, + max_read: u64, + ) -> (tempfile::TempDir, Arc<PathResolver>, Arc<CoderConfig>) { + let tmp = tempdir().unwrap(); + let cfg = Arc::new(CoderConfig { + base_paths: vec![tmp.path().to_path_buf()], + non_accessible_globs: vec!["**/.env".to_string()], + max_read_bytes: max_read, + max_output_bytes: max_output, + ..CoderConfig::default() + }); + let resolver = Arc::new(PathResolver::new(&cfg).unwrap()); + (tmp, resolver, cfg) + } + + /// Parse a top-level Err(String) wire JSON into (code, message). + fn parse_wire(err: &str) -> (String, String) { + let v: serde_json::Value = serde_json::from_str(err).expect("wire JSON"); + ( + v["code"].as_str().unwrap().to_string(), + v["message"].as_str().unwrap().to_string(), + ) } #[tokio::test] - async fn refuses_file_above_max_read_bytes() { + async fn stat_single_returns_metadata_without_content() { let (tmp, r, c) = setup(); - std::fs::write(tmp.path().join("big.bin"), vec![0u8; 2048]).unwrap(); - let err = handle( + std::fs::write(tmp.path().join("s.txt"), "hello\nworld\n").unwrap(); + let out = handle(r, c, stat_req("s.txt")).await.unwrap(); + assert!(out.content.is_none(), "stat must not return content"); + assert_eq!(out.lines_returned, Some(0)); + assert_eq!(out.more_lines, Some(false)); + assert_eq!(out.size, Some(12)); + assert_eq!(out.total_lines, Some(2)); + assert_eq!(out.is_utf8, Some(true)); + assert!(out.mode.is_some()); + assert!(out.mtime.is_some()); + } + + #[tokio::test] + async fn stat_batch_entry_returns_metadata_without_content() { + let (tmp, r, c) = 
setup(); + std::fs::write(tmp.path().join("s.txt"), "hello\nworld\n").unwrap(); + std::fs::write(tmp.path().join("t.txt"), "x\n").unwrap(); + let out = handle( r, c, - ReadFileInput { - path: "big.bin".into(), - }, + batch(vec![target_stat("s.txt"), ReadTarget::Path("t.txt".into())]), ) .await - .unwrap_err(); - assert!(err.contains("C213"), "got: {err}"); + .unwrap(); + let results = out.results.unwrap(); + assert!(results[0].success); + assert!(results[0].content.is_none()); + assert_eq!(results[0].lines_returned, Some(0)); + assert_eq!(results[0].more_lines, Some(false)); + assert_eq!(results[0].size, Some(12)); + assert_eq!(results[0].total_lines, Some(2)); + assert_eq!(results[0].is_utf8, Some(true)); + // Next entry reads normally — stat consumed no budget. + assert!(results[1].success); + assert_eq!(results[1].content.as_deref(), Some("x\n")); } + /// REDACTION ORDERING regression: stat resolves + deny-checks BEFORE + /// any metadata syscall, so stat on a denied path is byte-identical + /// to stat on a missing one. 
#[tokio::test] - async fn rejects_directory_with_bad_input() { + async fn stat_denied_byte_identical_to_missing_single() { let (tmp, r, c) = setup(); - std::fs::create_dir(tmp.path().join("d")).unwrap(); - let err = handle(r, c, ReadFileInput { path: "d".into() }) + std::fs::write(tmp.path().join(".env"), "secret-data").unwrap(); + let denied = handle(r.clone(), c.clone(), stat_req(".env")) .await .unwrap_err(); + let missing = handle(r, c, stat_req("missing.txt")).await.unwrap_err(); + let (d_code, d_msg) = parse_wire(&denied); + let (m_code, m_msg) = parse_wire(&missing); + assert_eq!(d_code, "C211"); + assert_eq!(m_code, "C211"); + let d_suffix = d_msg.strip_prefix(".env: ").expect("denied prefix"); + let m_suffix = m_msg.strip_prefix("missing.txt: ").expect("missing prefix"); + assert_eq!( + d_suffix, m_suffix, + "stat C211 suffixes must be byte-identical" + ); + } + + #[tokio::test] + async fn stat_denied_byte_identical_to_missing_batch_entries() { + let (tmp, r, c) = setup(); + std::fs::write(tmp.path().join(".env"), "secret-data").unwrap(); + let out = handle( + r, + c, + batch(vec![target_stat(".env"), target_stat("missing.txt")]), + ) + .await + .unwrap(); + let results = out.results.unwrap(); + let denied = results[0].error.as_ref().unwrap(); + let missing = results[1].error.as_ref().unwrap(); + assert_eq!(denied.code, "C211"); + assert_eq!(missing.code, "C211"); + let d_suffix = denied.message.strip_prefix(".env: ").unwrap(); + let m_suffix = missing.message.strip_prefix("missing.txt: ").unwrap(); + assert_eq!(d_suffix, m_suffix); + // Verbatim path echoes — canonical echo would itself distinguish. 
+ assert_eq!(results[0].path, ".env"); + assert_eq!(results[1].path, "missing.txt"); + } + + #[tokio::test] + async fn stat_big_file_returns_size_with_null_total_lines() { + let (tmp, r, c) = setup_with_cap(1024); + std::fs::write(tmp.path().join("big.bin"), vec![b'a'; 2048]).unwrap(); + let out = handle(r, c, stat_req("big.bin")).await.unwrap(); + assert_eq!(out.size, Some(2048), "stat on a big file SUCCEEDS"); + assert_eq!(out.total_lines, None, "not countable within max_read_bytes"); + assert_eq!(out.is_utf8, None, "not verifiable within max_read_bytes"); + assert!(out.content.is_none()); + assert!(out.mode.is_some()); + assert!(out.mtime.is_some()); + } + + #[tokio::test] + async fn stat_with_window_is_c210() { + let (tmp, r, c) = setup(); + std::fs::write(tmp.path().join("f.txt"), "a\n").unwrap(); + let req = ReadFileInput { + path: Some("f.txt".into()), + stat: true, + line_from: Some(1), + line_to: Some(2), + ..ReadFileInput::default() + }; + let err = handle(r, c, req).await.unwrap_err(); assert!(err.contains("C210"), "got: {err}"); + assert!(err.contains("stat"), "must name the conflict: {err}"); + assert!(err.contains("line_from"), "must name the field: {err}"); } #[tokio::test] - async fn missing_file_returns_c211() { - let (_tmp, r, c) = setup(); - let err = handle( + async fn stat_with_numbered_is_c210() { + let (tmp, r, c) = setup(); + std::fs::write(tmp.path().join("f.txt"), "a\n").unwrap(); + let req = ReadFileInput { + path: Some("f.txt".into()), + stat: true, + numbered: true, + ..ReadFileInput::default() + }; + let err = handle(r, c, req).await.unwrap_err(); + assert!(err.contains("C210"), "got: {err}"); + assert!(err.contains("numbered"), "must name the field: {err}"); + } + + #[tokio::test] + async fn stat_with_max_output_bytes_is_c210() { + let (tmp, r, c) = setup(); + std::fs::write(tmp.path().join("f.txt"), "a\n").unwrap(); + let req = ReadFileInput { + path: Some("f.txt".into()), + stat: true, + max_output_bytes: Some(64), + 
..ReadFileInput::default() + }; + let err = handle(r, c, req).await.unwrap_err(); + assert!(err.contains("C210"), "got: {err}"); + assert!( + err.contains("max_output_bytes"), + "must name the field: {err}" + ); + } + + #[tokio::test] + async fn batch_stat_entry_with_window_or_numbered_is_per_entry_c210() { + let (tmp, r, c) = setup(); + std::fs::write(tmp.path().join("a.txt"), "a\n").unwrap(); + std::fs::write(tmp.path().join("ok.txt"), "ok\n").unwrap(); + let out = handle( r, c, - ReadFileInput { - path: "nope.txt".into(), - }, + batch(vec![ + ReadTarget::Window { + path: "a.txt".into(), + line_from: Some(1), + line_to: Some(1), + stat: true, + numbered: false, + }, + ReadTarget::Window { + path: "a.txt".into(), + line_from: None, + line_to: None, + stat: true, + numbered: true, + }, + ReadTarget::Path("ok.txt".into()), + ]), ) .await - .unwrap_err(); - assert!(err.contains("C211"), "got: {err}"); + .unwrap(); + let results = out.results.unwrap(); + assert_eq!(results[0].error.as_ref().unwrap().code, "C210"); + assert_eq!(results[1].error.as_ref().unwrap().code, "C210"); + assert!( + results[1] + .error + .as_ref() + .unwrap() + .message + .contains("numbered"), + "per-entry C210 must name the field" + ); + assert!(results[2].success, "other entries proceed"); + } + + /// stat is the cheap probe: it stays available after the batch budget + /// is exhausted (no content, no budget interaction) — while error + /// classification for denied/missing paths stays C211 regardless. 
+ #[tokio::test] + async fn batch_stat_entry_succeeds_after_budget_exhaustion() { + let (tmp, r, c) = setup_with_batch_budget(5); + std::fs::write(tmp.path().join("eat.txt"), "abcde").unwrap(); + std::fs::write(tmp.path().join("probe.txt"), "p1\np2\n").unwrap(); + let out = handle( + r, + c, + batch(vec![ + ReadTarget::Path("eat.txt".into()), // consumes the whole budget + target_stat("probe.txt"), // still succeeds + ReadTarget::Path("probe.txt".into()), // C213 — budget gone + ]), + ) + .await + .unwrap(); + let results = out.results.unwrap(); + assert!(results[0].success); + assert!(results[1].success, "stat must not require budget"); + assert_eq!(results[1].total_lines, Some(2)); + assert_eq!(results[1].size, Some(6)); + assert!(results[1].content.is_none()); + assert_eq!(results[2].error.as_ref().unwrap().code, "C213"); + } + + // ----------------------------------------------------------------------- + // S4 — numbered reads + // ----------------------------------------------------------------------- + + #[tokio::test] + async fn numbered_full_read_prefixes_absolute_lines() { + let (tmp, r, c) = setup(); + std::fs::write(tmp.path().join("f.txt"), "a\nb\nc\n").unwrap(); + let req = ReadFileInput { + path: Some("f.txt".into()), + numbered: true, + ..ReadFileInput::default() + }; + let out = handle(r, c, req).await.unwrap(); + assert_eq!( + out.content.as_deref(), + Some("1\u{2192}a\n2\u{2192}b\n3\u{2192}c\n") + ); + // Counters keep their meaning — numbering changes bytes, not lines. 
+ assert_eq!(out.lines_returned, Some(3)); + assert_eq!(out.total_lines, Some(3)); + assert_eq!(out.more_lines, Some(false)); + } + + #[tokio::test] + async fn numbered_window_numbers_from_line_from() { + let (tmp, r, c) = setup(); + std::fs::write(tmp.path().join("f.txt"), numbered_lines(10)).unwrap(); + let req = ReadFileInput { + path: Some("f.txt".into()), + line_from: Some(3), + line_to: Some(5), + numbered: true, + ..ReadFileInput::default() + }; + let out = handle(r, c, req).await.unwrap(); + assert_eq!( + out.content.as_deref(), + Some("3\u{2192}L3\n4\u{2192}L4\n5\u{2192}L5\n"), + "numbering is ABSOLUTE: starts at line_from, not 1" + ); + assert_eq!(out.lines_returned, Some(3)); + } + + #[tokio::test] + async fn numbered_prefix_charged_to_batch_budget() { + // a.txt prefixed = "1→aaaa\n2→bbbb\n" = 18 wire bytes, exactly the + // budget; unprefixed it is only 10 and b.txt would still fit. The + // prefix bytes must consume the budget → b.txt gets C213. + let (tmp, r, c) = setup_with_batch_budget(18); + std::fs::write(tmp.path().join("a.txt"), "aaaa\nbbbb\n").unwrap(); + std::fs::write(tmp.path().join("b.txt"), "x\n").unwrap(); + let out = handle( + r.clone(), + c.clone(), + batch(vec![ + target_numbered("a.txt"), + ReadTarget::Path("b.txt".into()), + ]), + ) + .await + .unwrap(); + let results = out.results.unwrap(); + assert!(results[0].success); + assert_eq!( + results[0].content.as_deref(), + Some("1\u{2192}aaaa\n2\u{2192}bbbb\n") + ); + assert!(!results[1].success, "prefix bytes must consume budget"); + assert_eq!(results[1].error.as_ref().unwrap().code, "C213"); + + // Control: the same batch unprefixed fits both entries. 
+ let out = handle( + r, + c, + batch(vec![ + ReadTarget::Path("a.txt".into()), + ReadTarget::Path("b.txt".into()), + ]), + ) + .await + .unwrap(); + let results = out.results.unwrap(); + assert!(results[0].success && results[1].success); + } + + #[tokio::test] + async fn numbered_full_read_prefixes_converted_lossy_lines() { + let (tmp, r, c) = setup(); + std::fs::write(tmp.path().join("bin.dat"), b"\xFF\xFF\nok\n").unwrap(); + let req = ReadFileInput { + path: Some("bin.dat".into()), + numbered: true, + ..ReadFileInput::default() + }; + let out = handle(r, c, req).await.unwrap(); + assert_eq!( + out.content.as_deref(), + Some("1\u{2192}\u{FFFD}\u{FFFD}\n2\u{2192}ok\n"), + "prefix rides on the CONVERTED line" + ); + assert_eq!(out.is_utf8, Some(false)); + } + + // ----------------------------------------------------------------------- + // S4 — full-read context budget (max_output_bytes) + // ----------------------------------------------------------------------- + + #[tokio::test] + async fn full_read_over_output_budget_returns_recovery_c213() { + let (tmp, r, c) = setup_with_output_budget(10, 1024); + std::fs::write(tmp.path().join("big.txt"), "aaaa\nbbbb\ncccc\ndddd\n").unwrap(); + let err = handle(r, c, full("big.txt")).await.unwrap_err(); + let (code, msg) = parse_wire(&err); + assert_eq!(code, "C213"); + // The message is itself the recovery tool: size, total_lines, the + // config key + per-call override, and every corrective call. 
+ assert!(msg.contains("20 bytes"), "must carry file size: {msg}"); + assert!(msg.contains("4 lines"), "must carry total_lines: {msg}"); + assert!( + msg.matches("max_output_bytes").count() >= 2, + "must name the config key AND the per-call override: {msg}" + ); + assert!(msg.contains("line_from"), "window guidance: {msg}"); + assert!(msg.contains("stat: true"), "stat guidance: {msg}"); + } + + #[tokio::test] + async fn full_read_at_default_budget_boundary() { + // 128 KiB exactly passes; one byte more fails — under the DEFAULT + // config (no custom budget), pinning the 131072 default. + let (tmp, r, c) = setup_with_cap(10 * 1024 * 1024); + std::fs::write(tmp.path().join("fits.txt"), vec![b'a'; 131_072]).unwrap(); + std::fs::write(tmp.path().join("over.txt"), vec![b'a'; 131_073]).unwrap(); + let out = handle(r.clone(), c.clone(), full("fits.txt")) + .await + .unwrap(); + assert_eq!(out.content.unwrap().len(), 131_072); + let err = handle(r, c, full("over.txt")).await.unwrap_err(); + assert!(err.contains("C213"), "got: {err}"); + assert!(err.contains("max_output_bytes"), "got: {err}"); + } + + #[tokio::test] + async fn per_call_max_output_bytes_admits_larger_read() { + let (tmp, r, c) = setup_with_output_budget(10, 1024); + std::fs::write(tmp.path().join("big.txt"), "aaaa\nbbbb\ncccc\ndddd\n").unwrap(); + let req = ReadFileInput { + path: Some("big.txt".into()), + max_output_bytes: Some(100), + ..ReadFileInput::default() + }; + let out = handle(r, c, req).await.unwrap(); + assert_eq!(out.content.unwrap().len(), 20); + assert_eq!(out.total_lines, Some(4)); + } + + #[tokio::test] + async fn per_call_max_output_bytes_clamps_to_max_read_bytes() { + // File: 8 invalid bytes + '\n' = 9 raw bytes (under max_read_bytes + // 15) but 25 CONVERTED wire bytes. Per-call budget 1000 silently + // clamps to max_read_bytes (15) → 25 > 15 → C213. The config + // budget (1000) alone would have admitted it — the clamp applies + // to the per-call override. 
+ let (tmp, r, c) = setup_with_output_budget(1000, 15); + let mut body = vec![0xFFu8; 8]; + body.push(b'\n'); + std::fs::write(tmp.path().join("bin.dat"), body).unwrap(); + let out = handle(r.clone(), c.clone(), full("bin.dat")).await.unwrap(); + assert_eq!( + out.content.unwrap().len(), + 25, + "config budget admits the read" + ); + let req = ReadFileInput { + path: Some("bin.dat".into()), + max_output_bytes: Some(1000), + ..ReadFileInput::default() + }; + let err = handle(r, c, req).await.unwrap_err(); + assert!(err.contains("C213"), "clamped per-call must refuse: {err}"); + } + + #[tokio::test] + async fn windowed_read_unaffected_by_output_budget() { + // The same file whose FULL read exceeds max_output_bytes streams + // fine through a window — windows are governed by max_read_bytes. + let (tmp, r, c) = setup_with_output_budget(10, 1024); + std::fs::write(tmp.path().join("big.txt"), "aaaa\nbbbb\ncccc\ndddd\n").unwrap(); + let err = handle(r.clone(), c.clone(), full("big.txt")) + .await + .unwrap_err(); + assert!(err.contains("C213")); + let out = handle(r, c, window_req("big.txt", Some(1), Some(4))) + .await + .unwrap(); + assert_eq!(out.content.as_deref(), Some("aaaa\nbbbb\ncccc\ndddd\n")); + } + + #[tokio::test] + async fn max_output_bytes_with_window_is_c210() { + let (tmp, r, c) = setup(); + std::fs::write(tmp.path().join("f.txt"), "a\nb\n").unwrap(); + let req = ReadFileInput { + path: Some("f.txt".into()), + line_from: Some(1), + line_to: Some(2), + max_output_bytes: Some(64), + ..ReadFileInput::default() + }; + let err = handle(r, c, req).await.unwrap_err(); + assert!(err.contains("C210"), "got: {err}"); + assert!(err.contains("max_output_bytes"), "got: {err}"); + } + + /// REDACTION ORDERING regression: resolve → deny (C211) → size → budget + /// (C213). A denied or missing path must classify C211 no matter how + /// the budget relates to the file. 
+ #[tokio::test] + async fn denied_huge_file_is_c211_not_c213() { + let (tmp, r, c) = setup_with_output_budget(10, 1024); + std::fs::write(tmp.path().join(".env"), "aaaa\nbbbb\ncccc\ndddd\n").unwrap(); + let err = handle(r, c, full(".env")).await.unwrap_err(); + let (code, _) = parse_wire(&err); + assert_eq!(code, "C211", "deny must precede budget: {err}"); + } + + #[tokio::test] + async fn missing_file_with_budget_override_is_c211() { + let (_tmp, r, c) = setup_with_output_budget(10, 1024); + let req = ReadFileInput { + path: Some("missing.txt".into()), + max_output_bytes: Some(5), + ..ReadFileInput::default() + }; + let err = handle(r, c, req).await.unwrap_err(); + let (code, _) = parse_wire(&err); + assert_eq!(code, "C211", "budget must never reclassify C211: {err}"); } } diff --git a/coder/src/functions/read_window.rs b/coder/src/functions/read_window.rs new file mode 100644 index 00000000..12441511 --- /dev/null +++ b/coder/src/functions/read_window.rs @@ -0,0 +1,567 @@ +//! Windowed streaming read primitives shared between single-path and batch +//! modes of `coder::read-file`. +//! +//! Extracted from `read_file.rs` (T8 size-pressure extraction) so +//! `read_file.rs` stays focused on the two handler modes. +//! +//! Two budget units live here, deliberately: +//! - [`read_window`] (single-path mode) budgets RAW file bytes — the T7 +//! contract for `max_read_bytes`. +//! - [`read_window_wire`] (batch mode) budgets CONVERTED WIRE BYTES — +//! `content.len()` after lossy UTF-8 sanitization — because the batch +//! budget exists to bound what the caller's context actually receives, +//! and U+FFFD expansion (1 invalid byte → 3 wire bytes) would otherwise +//! let binary files deliver up to 3x the configured budget. + +use std::io::BufRead; + +/// `N→` prefix for line `n` when numbering is on; empty when off. 
The +/// prefix is injected AT COLLECTION TIME so its bytes are charged against +/// the same budget as the line itself — numbering can never smuggle bytes +/// past a cap. (`→` is U+2192: 3 UTF-8 bytes.) +fn line_prefix(numbered: bool, n: u64) -> String { + if numbered { + format!("{n}\u{2192}") + } else { + String::new() + } +} + +/// Prefix every line of an already-converted body with its 1-based line +/// number (`N→`), numbering from `start`. Lines follow the shared +/// convention (0x0A- or EOF-terminated segments; empty input has none), +/// so numbering here matches `count_lines` and `coder::update-file`'s +/// line ops exactly. Used by the full-read path, where the whole body is +/// materialized before numbering. +pub fn number_lines(content: &str, start: u64) -> String { + let mut out = String::with_capacity(content.len() + content.len() / 4); + for (n, segment) in (start..).zip(content.split_inclusive('\n')) { + out.push_str(&line_prefix(true, n)); + out.push_str(segment); + } + out +} + +/// Outcome of the shared skip phase: either the stream reached the window +/// start, or EOF arrived first (in which case the file's full line count +/// is known for free — agents probe past EOF; not an error). +enum SkipOutcome { + /// Lines `1..from` consumed; collection may begin. Carries the number + /// of lines consumed so far (`from - 1`). + Reached { consumed: u64 }, + /// EOF before the window starts; `total` is the file's line count. + Eof { total: u64 }, +} + +/// Skip lines `1..from`, chunk-wise, buffering nothing: consume buffer +/// chunks counting 0x0A bytes. Shared by [`read_window`] and +/// [`read_window_wire`] so the two budget flavors can never drift in how +/// they locate the window start. 
+fn skip_to_window<R: BufRead>(reader: &mut R, from: u64) -> std::io::Result<SkipOutcome> { + let mut consumed: u64 = 0; + let mut in_partial_line = false; + while consumed + 1 < from { + let available = reader.fill_buf()?; + if available.is_empty() { + return Ok(SkipOutcome::Eof { + total: consumed + u64::from(in_partial_line), + }); + } + match available.iter().position(|&b| b == b'\n') { + Some(idx) => { + reader.consume(idx + 1); + consumed += 1; + in_partial_line = false; + } + None => { + let len = available.len(); + reader.consume(len); + in_partial_line = true; + } + } + } + Ok(SkipOutcome::Reached { consumed }) +} + +/// Outcome of a streamed window read. `raw` holds the window's exact +/// bytes; lossy UTF-8 conversion happens ONCE on the whole collected +/// chunk, AFTER raw 0x0A line splitting, so an invalid multi-byte +/// sequence can never corrupt the line structure. +pub struct Window { + pub raw: Vec<u8>, + pub lines_returned: u64, + pub total_lines: Option<u64>, + pub more_lines: bool, +} + +/// Stream lines `from..=to` (`to: None` = to EOF) out of `reader` +/// without ever materializing the full body: +/// +/// - skip phase: consume buffer chunks counting 0x0A bytes, buffering +/// nothing; +/// - collect phase: buffer one line at a time, stopping when adding a +/// line would push the collected window over `max_bytes` — a partial +/// window is a SUCCESS with `more_lines: true`. A line that does not +/// fit is excluded entirely (the window never returns a torn line); +/// when even the first window line exceeds the budget, the window is +/// empty with `more_lines: true`. Per-line buffering is itself capped +/// at the remaining budget + 1 byte, so peak memory stays bounded by +/// ~2x `max_bytes` regardless of line length. +/// +/// `total_lines` is reported only when the stream naturally reached EOF +/// (skip phase past EOF, collect phase hitting EOF, or an exact-`to` +/// window whose post-window peek shows EOF) — never via a forced scan.
+/// +/// With `numbered: true` each collected line is prefixed `N→` (N = the +/// line's absolute 1-based number in the FILE — `consumed + 1`, so a +/// window starting at line 40 is numbered from 40). Prefix bytes count +/// toward `max_bytes`: a line is excluded when prefix + raw line would +/// exceed the remaining budget. +pub fn read_window<R: BufRead>( + reader: &mut R, + from: u64, + to: Option<u64>, + max_bytes: u64, + numbered: bool, +) -> std::io::Result<Window> { + // Lines fully consumed from the stream so far (skipped + collected); + // the next line to read is `consumed + 1`. + let mut consumed = match skip_to_window(reader, from)? { + SkipOutcome::Eof { total } => { + return Ok(Window { + raw: Vec::new(), + lines_returned: 0, + total_lines: Some(total), + more_lines: false, + }) + } + SkipOutcome::Reached { consumed } => consumed, + }; + + // --- collect phase: lines from..=to (or EOF / byte budget) ---------- + let mut raw: Vec<u8> = Vec::new(); + let mut lines_returned: u64 = 0; + let mut line_buf: Vec<u8> = Vec::new(); + loop { + if to.is_some_and(|t| consumed >= t) { + // Window complete. Peek (without consuming) to learn whether + // anything follows — EOF here means the whole file was + // traversed and the total line count is known for free. + let at_eof = reader.fill_buf()?.is_empty(); + return Ok(Window { + raw, + lines_returned, + total_lines: at_eof.then_some(consumed), + more_lines: !at_eof, + }); + } + let budget_left = max_bytes.saturating_sub(raw.len() as u64); + line_buf.clear(); + // Cap the line read at budget+1 bytes: enough to tell "fits" + // from "does not fit" without buffering an arbitrarily long line. + // UFCS pins `Self = &mut R` so the reader is reborrowed (not + // moved) into the `Take` adapter. + let n = std::io::Read::take(&mut *reader, budget_left.saturating_add(1)) + .read_until(b'\n', &mut line_buf)? as u64; + if n == 0 { + // Natural EOF at/under the byte cap: total known.
+ return Ok(Window { + raw, + lines_returned, + total_lines: Some(consumed), + more_lines: false, + }); + } + // The `N→` prefix is charged against the SAME budget as the line: + // a line that fits raw but not prefixed is excluded entirely. + let prefix = line_prefix(numbered, consumed + 1); + if n.saturating_add(prefix.len() as u64) > budget_left { + // Byte budget exhausted mid-window: partial window, success. + return Ok(Window { + raw, + lines_returned, + total_lines: None, + more_lines: true, + }); + } + raw.extend_from_slice(prefix.as_bytes()); + raw.extend_from_slice(&line_buf); + lines_returned += 1; + consumed += 1; + if line_buf.last() != Some(&b'\n') { + // EOF-terminated final segment: it IS a line (convention), + // and the stream is exhausted. + return Ok(Window { + raw, + lines_returned, + total_lines: Some(consumed), + more_lines: false, + }); + } + } +} + +/// Outcome of a wire-budgeted window read (batch mode). The budget unit +/// is CONVERTED WIRE BYTES — `content.len()` after lossy UTF-8 +/// sanitization — not raw file bytes. Each line is converted as it is +/// collected and counted in converted form, so binary input (whose +/// invalid bytes expand to 3-byte U+FFFD replacements) can never deliver +/// more than the budget. Raw 0x0A line splitting still happens BEFORE +/// conversion — 0x0A can never appear inside a multi-byte UTF-8 +/// sequence, so per-line conversion concatenates to exactly the string a +/// whole-body conversion would produce. +pub struct WireWindow { + pub content: String, + pub is_utf8: bool, + pub lines_returned: u64, + pub total_lines: Option<u64>, + pub more_lines: bool, +} + +/// [`read_window`], but budgeted in CONVERTED wire bytes (see +/// [`WireWindow`]).
Same skip phase and the same no-torn-lines rule, +/// applied to each line's CONVERTED form: a line whose converted length +/// exceeds the remaining budget is excluded entirely (`more_lines: +/// true`); when even the first line's converted form exceeds the budget, +/// the window is empty with `more_lines: true`. +/// +/// Memory bound: lossy conversion never SHRINKS a byte sequence (valid +/// bytes map 1:1; each invalid maximal subpart of 1-3 bytes becomes one +/// 3-byte U+FFFD), so capping the RAW per-line read at remaining+1 bytes +/// both bounds peak memory (~2x budget) and detects "cannot fit" early: +/// raw overflow already implies converted overflow. +/// +/// With `numbered: true` each line is prefixed `N→` (absolute 1-based +/// file line number); the prefix's bytes are charged against +/// `max_wire_bytes` together with the line's CONVERTED form. +pub fn read_window_wire<R: BufRead>( + reader: &mut R, + from: u64, + to: Option<u64>, + max_wire_bytes: u64, + numbered: bool, +) -> std::io::Result<WireWindow> { + let mut consumed = match skip_to_window(reader, from)? { + SkipOutcome::Eof { total } => { + return Ok(WireWindow { + content: String::new(), + is_utf8: true, + lines_returned: 0, + total_lines: Some(total), + more_lines: false, + }) + } + SkipOutcome::Reached { consumed } => consumed, + }; + + let mut content = String::new(); + let mut is_utf8 = true; + let mut lines_returned: u64 = 0; + let mut line_buf: Vec<u8> = Vec::new(); + loop { + if to.is_some_and(|t| consumed >= t) { + let at_eof = reader.fill_buf()?.is_empty(); + return Ok(WireWindow { + content, + is_utf8, + lines_returned, + total_lines: at_eof.then_some(consumed), + more_lines: !at_eof, + }); + } + let budget_left = max_wire_bytes.saturating_sub(content.len() as u64); + line_buf.clear(); + let n = std::io::Read::take(&mut *reader, budget_left.saturating_add(1)) + .read_until(b'\n', &mut line_buf)? as u64; + if n == 0 { + // Natural EOF at/under the wire cap: total known.
+ return Ok(WireWindow { + content, + is_utf8, + lines_returned, + total_lines: Some(consumed), + more_lines: false, + }); + } + if n > budget_left { + // The RAW form already exceeds the remaining wire budget; + // the converted form can only be larger. Excluded entirely. + return Ok(WireWindow { + content, + is_utf8, + lines_returned, + total_lines: None, + more_lines: true, + }); + } + // Raw fits — convert this line and apply the fit test to the + // CONVERTED length (the unit the budget is defined in), plus the + // `N→` prefix when numbering: prefix bytes are budget bytes too. + let ends_with_newline = line_buf.last() == Some(&b'\n'); + // mem::take re-allocates line_buf each iteration — intentional: + // lossy_utf8 wants ownership, and the loop is budget-bounded, not hot. + let (line, line_utf8) = lossy_utf8(std::mem::take(&mut line_buf)); + let prefix = line_prefix(numbered, consumed + 1); + if (line.len() as u64).saturating_add(prefix.len() as u64) > budget_left { + return Ok(WireWindow { + content, + is_utf8, + lines_returned, + total_lines: None, + more_lines: true, + }); + } + content.push_str(&prefix); + content.push_str(&line); + is_utf8 &= line_utf8; + lines_returned += 1; + consumed += 1; + if !ends_with_newline { + // EOF-terminated final segment: it IS a line (convention), + // and the stream is exhausted. + return Ok(WireWindow { + content, + is_utf8, + lines_returned, + total_lines: Some(consumed), + more_lines: false, + }); + } + } +} + +/// Count lines per the shared convention (see `read_file` module docs): +/// 0x0A- or EOF-terminated segments; empty input has 0 lines; a trailing +/// newline does NOT add a phantom line. Matches `str::lines()` counting, +/// which `coder::update-file` uses for its 1-based line ops. 
+pub fn count_lines(bytes: &[u8]) -> u64 { + let newlines = bytes.iter().filter(|&&b| b == b'\n').count() as u64; + match bytes.last() { + None => 0, + Some(&b'\n') => newlines, + Some(_) => newlines + 1, + } +} + +/// UTF-8 conversion with the documented lossy semantics: valid input +/// passes through unchanged (`true`); invalid bytes become U+FFFD +/// (`false`). +pub fn lossy_utf8(bytes: Vec<u8>) -> (String, bool) { + match String::from_utf8(bytes) { + Ok(s) => (s, true), + Err(e) => { + let bytes = e.into_bytes(); + (String::from_utf8_lossy(&bytes).into_owned(), false) + } + } +} + +#[cfg(test)] +mod tests { + use super::*; + + /// `count_lines` must match `str::lines()` counting — the same + /// convention `coder::update-file` uses for its 1-based line ops. + #[test] + fn count_lines_matches_update_file_convention() { + let cases: &[&[u8]] = &[ + b"", + b"\n", + b"a", + b"a\n", + b"a\nb", + b"a\nb\n", + b"a\n\n\nb\n", + b"\r\n", + b"a\r\nb\r\n", + ]; + for bytes in cases { + let via_str_lines = String::from_utf8_lossy(bytes).lines().count() as u64; + assert_eq!( + count_lines(bytes), + via_str_lines, + "convention drift for {bytes:?}" + ); + } + } + + #[test] + fn lossy_utf8_valid_is_true() { + let (s, ok) = lossy_utf8(b"hello".to_vec()); + assert_eq!(s, "hello"); + assert!(ok); + } + + #[test] + fn lossy_utf8_invalid_is_false_with_replacement() { + let (s, ok) = lossy_utf8(vec![0xFF, 0xFE]); + assert!(!ok); + assert!(s.contains('\u{FFFD}')); + } + + #[test] + fn read_window_full_file() { + let data = b"line1\nline2\nline3\n"; + let mut reader = std::io::BufReader::new(&data[..]); + let w = read_window(&mut reader, 1, None, 1024, false).unwrap(); + assert_eq!(w.lines_returned, 3); + assert_eq!(w.total_lines, Some(3)); + assert!(!w.more_lines); + assert_eq!(w.raw, data); + } + + #[test] + fn read_window_subset() { + let data = b"L1\nL2\nL3\nL4\nL5\n"; + let mut reader = std::io::BufReader::new(&data[..]); + let w = read_window(&mut reader, 2, Some(4), 1024, 
false).unwrap(); + assert_eq!(w.lines_returned, 3); + assert_eq!(&w.raw, b"L2\nL3\nL4\n"); + assert!(w.more_lines); + assert_eq!(w.total_lines, None); + } + + #[test] + fn read_window_budget_cuts_partial() { + // Lines are 5 bytes each; budget=10 → only 2 lines fit. + let data = b"aaaa\nbbbb\ncccc\n"; + let mut reader = std::io::BufReader::new(&data[..]); + let w = read_window(&mut reader, 1, None, 10, false).unwrap(); + assert_eq!(w.lines_returned, 2); + assert!(w.more_lines); + assert_eq!(w.total_lines, None); + } + + #[test] + fn read_window_past_eof_returns_empty_with_total() { + let data = b"L1\nL2\n"; + let mut reader = std::io::BufReader::new(&data[..]); + let w = read_window(&mut reader, 10, Some(20), 1024, false).unwrap(); + assert_eq!(w.lines_returned, 0); + assert!(!w.more_lines); + assert_eq!(w.total_lines, Some(2)); + } + + // ----------------------------------------------------------------- + // read_window_wire — converted-wire-byte budget + // ----------------------------------------------------------------- + + #[test] + fn wire_budget_counts_converted_not_raw_bytes() { + // Two raw lines of 3x0xFF + '\n' (4 raw bytes each); each converts + // to 3xU+FFFD + '\n' = 10 wire bytes. Wire budget 10: exactly one + // converted line fits even though BOTH raw lines (8 bytes) would + // have fit a raw budget of 10. + let data: &[u8] = b"\xFF\xFF\xFF\n\xFF\xFF\xFF\n"; + let mut reader = std::io::BufReader::new(data); + let w = read_window_wire(&mut reader, 1, None, 10, false).unwrap(); + assert_eq!(w.lines_returned, 1); + assert_eq!(w.content.len(), 10, "delivered wire bytes == budget"); + assert_eq!(w.content, "\u{FFFD}\u{FFFD}\u{FFFD}\n"); + assert!(!w.is_utf8); + assert!(w.more_lines); + assert_eq!(w.total_lines, None); + } + + #[test] + fn wire_budget_first_converted_line_over_budget_is_empty_partial() { + // 10 raw 0xFF bytes = one EOF-terminated line converting to 30 + // wire bytes. 
Budget 10: excluded entirely per no-torn-lines on + // the CONVERTED form → empty success, more_lines=true. (The raw + // accounting bug delivered all 30 wire bytes here.) + let data = [0xFFu8; 10]; + let mut reader = std::io::BufReader::new(&data[..]); + let w = read_window_wire(&mut reader, 1, None, 10, false).unwrap(); + assert_eq!(w.content, ""); + assert_eq!(w.lines_returned, 0); + assert!(w.more_lines); + assert_eq!(w.total_lines, None); + } + + #[test] + fn wire_budget_matches_raw_for_ascii() { + // For valid UTF-8 the two budget units coincide: identical output. + let data = b"aaaa\nbbbb\ncccc\n"; + let mut r1 = std::io::BufReader::new(&data[..]); + let raw = read_window(&mut r1, 1, None, 10, false).unwrap(); + let mut r2 = std::io::BufReader::new(&data[..]); + let wire = read_window_wire(&mut r2, 1, None, 10, false).unwrap(); + assert_eq!(wire.content.as_bytes(), &raw.raw[..]); + assert_eq!(wire.lines_returned, raw.lines_returned); + assert_eq!(wire.total_lines, raw.total_lines); + assert_eq!(wire.more_lines, raw.more_lines); + assert!(wire.is_utf8); + } + + #[test] + fn wire_window_past_eof_returns_empty_with_total() { + let data = b"L1\nL2\n"; + let mut reader = std::io::BufReader::new(&data[..]); + let w = read_window_wire(&mut reader, 10, Some(20), 1024, false).unwrap(); + assert_eq!(w.lines_returned, 0); + assert!(!w.more_lines); + assert_eq!(w.total_lines, Some(2)); + assert!(w.is_utf8, "empty window is vacuously clean"); + } + + // ----------------------------------------------------------------- + // numbered — `N→` prefixes at collection time + // ----------------------------------------------------------------- + + #[test] + fn number_lines_prefixes_each_segment_from_start() { + assert_eq!( + number_lines("a\nb\nc\n", 1), + "1\u{2192}a\n2\u{2192}b\n3\u{2192}c\n" + ); + // EOF-terminated final segment is a line too. + assert_eq!(number_lines("a\nb", 5), "5\u{2192}a\n6\u{2192}b"); + // Empty body has zero lines — nothing to number. 
+ assert_eq!(number_lines("", 1), ""); + } + + #[test] + fn numbered_window_uses_absolute_file_line_numbers() { + let data = b"L1\nL2\nL3\nL4\nL5\n"; + let mut reader = std::io::BufReader::new(&data[..]); + let w = read_window(&mut reader, 3, Some(4), 1024, true).unwrap(); + assert_eq!(&w.raw, "3\u{2192}L3\n4\u{2192}L4\n".as_bytes()); + assert_eq!(w.lines_returned, 2); + } + + #[test] + fn numbered_prefix_counts_toward_raw_budget() { + // Each raw line is 5 bytes; the "1→"/"2→" prefix adds 4 bytes + // (digit + 3-byte arrow) → 9 per numbered line. Budget 10: + // unnumbered fits 2 lines, numbered fits only 1. + let data = b"aaaa\nbbbb\ncccc\n"; + let mut reader = std::io::BufReader::new(&data[..]); + let w = read_window(&mut reader, 1, None, 10, true).unwrap(); + assert_eq!(&w.raw, "1\u{2192}aaaa\n".as_bytes()); + assert_eq!(w.lines_returned, 1); + assert!(w.more_lines, "prefix bytes must not bypass the cap"); + } + + #[test] + fn numbered_wire_prefix_counts_toward_wire_budget() { + let data = b"aaaa\nbbbb\ncccc\n"; + let mut reader = std::io::BufReader::new(&data[..]); + let w = read_window_wire(&mut reader, 1, None, 10, true).unwrap(); + assert_eq!(w.content, "1\u{2192}aaaa\n"); + assert_eq!(w.content.len(), 9); + assert_eq!(w.lines_returned, 1); + assert!(w.more_lines); + } + + #[test] + fn numbered_wire_prefixes_converted_lossy_lines() { + // Invalid bytes convert to U+FFFD; the prefix rides on the + // CONVERTED line and the combined length is what the budget sees. + let data: &[u8] = b"\xFF\xFF\n"; + let mut reader = std::io::BufReader::new(data); + let w = read_window_wire(&mut reader, 1, None, 64, true).unwrap(); + assert_eq!(w.content, "1\u{2192}\u{FFFD}\u{FFFD}\n"); + assert!(!w.is_utf8); + assert_eq!(w.total_lines, Some(1)); + } +} diff --git a/coder/src/functions/search.rs b/coder/src/functions/search.rs index 371ed2d8..db81a519 100644 --- a/coder/src/functions/search.rs +++ b/coder/src/functions/search.rs @@ -1,9 +1,23 @@ //! 
`coder::search` — combined path + content search. //! -//! Walks `base_path` with `walkdir`, filtering by include/exclude globs -//! and skipping non-accessible files entirely so the search can't reveal -//! their content. Path matches and content matches are reported in -//! separate arrays of one response. +//! Walks the resolved folder with `walkdir`, filtering by include/exclude +//! globs (matched against the path relative to its containing root) and +//! skipping non-accessible files entirely so the search can't reveal +//! their content. Noise paths matching `default_exclude_globs` are also +//! skipped by default — descent into matching directories is suppressed +//! and matching files are omitted; opt out per call with +//! `use_default_excludes: false`. That filter is hide-only and +//! independent of the `is_non_accessible` access control (REDACTION +//! INVARIANT) — never merge the two. Path matches and content matches are +//! reported in separate arrays of one response; result paths are +//! canonical-absolute. +//! +//! Each content match can carry `before`/`after` context lines (same +//! file only, capped at [`CONTEXT_LINES_CAP`]). The whole response is +//! bounded by `search_response_budget_bytes`, accounted in converted +//! wire bytes like `batch_read_budget_bytes`: when the next match would +//! exceed the budget, accumulation stops and `truncated` is set — the +//! search degrades, it never fails. use std::sync::Arc; @@ -14,25 +28,32 @@ use crate::config::CoderConfig; use crate::error::{err_to_string, CoderError}; use crate::path::PathResolver; +// examples are wire-contract; goldens pin them. #[derive(Debug, Deserialize, JsonSchema)] +#[schemars(example = "example_search_input")] pub struct SearchInput { /// Pattern to search for. Treated as a regex when `regex: true`, /// otherwise as a literal substring. pub query: String, - /// Folder, relative to `base_path`, scoping the walk. Defaults to `.` - /// (the base itself). 
Globs and result `path`s remain anchored at - /// `base_path` regardless of this value. + /// Folder scoping the walk. Relative to the primary allowed root, or an + /// absolute path inside any allowed root. Defaults to `.` (the primary + /// root itself). Globs are matched relative to the containing root; + /// result paths are absolute regardless of this value. Call + /// `coder::info` to see the allowed roots. Paths outside every allowed + /// root are rejected — use the shell worker's `shell::fs::*` for host + /// paths outside the jail. #[serde(default = "default_path")] pub path: String, #[serde(default)] pub regex: bool, #[serde(default)] pub ignore_case: bool, - /// Glob patterns (relative to `base_path`) that paths must match - /// to be considered. Empty = include everything. + /// Glob patterns (matched against the path relative to its containing + /// root) that paths must match to be considered. Empty = include + /// everything. #[serde(default)] pub include_globs: Vec<String>, - /// Glob patterns (relative to `base_path`) that exclude matching paths. + /// Glob patterns (same relative-to-root matching) that exclude paths. #[serde(default)] pub exclude_globs: Vec<String>, /// Optional explicit cap. Falls back to config when unset. @@ -42,6 +63,26 @@ pub struct SearchInput { /// truncated for the match snippet. #[serde(default)] pub max_line_bytes: Option, + /// Lines of context to return BEFORE each content match (same file + /// only, in file order), max 10 — larger values are rejected with + /// C210. With context lines many edit tasks can go straight from + /// search output to coder::update-file with no read in between. + /// Context lines are truncated to `max_line_bytes` like the matched + /// text and count toward the response byte budget. Unset = 0. + #[serde(default)] + pub context_lines_before: Option, + /// Lines of context to return AFTER each content match. Same max + /// (10), truncation, and budget rules as `context_lines_before`; + /// unset = 0. 
+ #[serde(default)] + pub context_lines_after: Option, + /// Apply the worker's `default_exclude_globs` config (noise paths + /// like .git, node_modules, target, dist — call coder::info for the + /// active list): the walk does not descend into matching directories + /// and matching files are omitted from BOTH content and path + /// results. Pass `false` to search inside them. + #[serde(default = "default_true")] + pub use_default_excludes: bool, /// Search file contents (default true). #[serde(default = "default_true")] pub search_content: bool, @@ -58,17 +99,48 @@ fn default_path() -> String { ".".to_string() } +// examples are wire-contract; goldens pin them. +fn example_search_input() -> serde_json::Value { + serde_json::json!({ + "query": "fn handle", + "path": "src", + "include_globs": ["**/*.rs"], + "context_lines_before": 2, + "context_lines_after": 2, + "search_content": true, + "search_paths": false + }) +} + #[derive(Debug, Serialize, JsonSchema)] pub struct ContentMatch { + /// Absolute path under the canonical parent; symlinks at the entry + /// itself are not resolved. Operations on it re-validate through the + /// jail. pub path: String, pub line: u32, pub column: u32, /// Matched line; truncated to `max_line_bytes` and never spans newlines. pub text: String, + /// Context lines immediately before the matched line — same file + /// only, in file order, each truncated to `max_line_bytes`. Omitted + /// when empty (no `context_lines_before` requested, or the match is + /// at the start of the file). + #[serde(skip_serializing_if = "Option::is_none")] + pub before: Option<Vec<String>>, + /// Context lines immediately after the matched line — same file + /// only, in file order, each truncated to `max_line_bytes`. Omitted + /// when empty (no `context_lines_after` requested, or the match is + /// at the end of the file). 
+ #[serde(skip_serializing_if = "Option::is_none")] + pub after: Option<Vec<String>>, } #[derive(Debug, Serialize, JsonSchema)] pub struct PathMatch { + /// Absolute path under the canonical parent; symlinks at the entry + /// itself are not resolved. Operations on it re-validate through the + /// jail. pub path: String, } @@ -76,7 +148,10 @@ pub struct PathMatch { pub struct SearchOutput { pub content_matches: Vec<ContentMatch>, pub path_matches: Vec<PathMatch>, - /// True if either match list was cut off at the configured cap. + /// True if results were cut off — either match list hit the + /// `max_matches` cap, or the response hit the + /// `search_response_budget_bytes` byte budget. When true, refine the + /// query or add include_globs rather than paginate. pub truncated: bool, } @@ -105,12 +180,17 @@ fn inner( let max_line_bytes = req .max_line_bytes .unwrap_or(cfg.search_default_max_line_bytes) as usize; + let ctx_before = validate_context_lines("context_lines_before", req.context_lines_before)?; + let ctx_after = validate_context_lines("context_lines_after", req.context_lines_after)?; // Use `resolve` rather than `require_writable` so a search rooted at // a folder that *contains* non-accessible children still works; the // per-file `is_non_accessible` filter below still guards their bytes. let walk_root = resolver.resolve(&req.path)?; - let md = std::fs::metadata(&walk_root)?; + // NotFound is intercepted with the wire path in scope so the C211 + // message names the path the caller supplied (standardized wording — + // REDACTION INVARIANT: identical to the glob-denied message). 
+ let md = std::fs::metadata(&walk_root).map_err(|e| CoderError::io_for_path(e, &req.path))?; if !md.is_dir() { return Err(CoderError::BadInput(format!( "not a directory: {}", @@ -132,12 +212,36 @@ fn inner( None }; + // Explicitly naming an excluded folder as the walk root expresses + // the caller's intent to see inside it: the default-exclude filter + // is disabled for that ENTIRE walk (mirrors coder::tree's behavior). + let use_default_excludes = + req.use_default_excludes && !resolver.is_default_excluded_dir(&walk_root); + let mut content_matches: Vec = Vec::new(); let mut path_matches: Vec = Vec::new(); let mut truncated = false; + // CONVERTED WIRE BYTES accounting (same philosophy as + // batch_read_budget_bytes in read_file.rs): charge the strings that + // will actually be serialized. Monotone bounding of the payload + // STRINGS only, not the JSON envelope — actual wire bytes can exceed + // the budget by structural overhead (keys, quotes, commas, line/column + // numbers) plus escape expansion. Exhaustion sets `truncated` — never + // an error. + let mut budget_remaining: u64 = cfg.search_response_budget_bytes; + let mut budget_exhausted = false; let walker = walkdir::WalkDir::new(&walk_root).follow_links(false); - for entry in walker.into_iter().filter_map(|e| e.ok()) { + // Suppress descent into default-excluded DIRECTORIES at the dir + // boundary (dir-companion set; files merely NAMED like an excluded + // directory are unaffected). Excluded FILES are skipped in the loop + // body below. + let entries = walker.into_iter().filter_entry(|e| { + !(use_default_excludes + && e.file_type().is_dir() + && resolver.is_default_excluded_dir(e.path())) + }); + for entry in entries.filter_map(|e| e.ok()) { if !entry.file_type().is_file() { continue; } @@ -148,9 +252,17 @@ fn inner( if rel.is_empty() { continue; } + // Access control (REDACTION INVARIANT): denied files are wholly + // absent from both result lists. 
Independent of the hide-only + // default-exclude filter below — keep the two checks separate. if resolver.is_non_accessible(abs) { continue; } + // Noise hiding: default-excluded FILES (configured globs only — + // the dir-boundary companions never apply to files). + if use_default_excludes && resolver.is_default_excluded(abs) { + continue; + } if let Some(set) = &include { if !set.is_match(&rel) { continue; @@ -162,15 +274,27 @@ fn inner( } } + // Matching runs on the root-relative form; emitted paths are the + // canonical absolute form (decision D2-eng). + let abs_wire = abs.display().to_string(); + if let Some(matcher) = &path_matcher { if matcher.is_match(&rel) { if path_matches.len() >= max_matches { truncated = true; + } else if charge(&mut budget_remaining, abs_wire.len()) { + path_matches.push(PathMatch { + path: abs_wire.clone(), + }); } else { - path_matches.push(PathMatch { path: rel.clone() }); + truncated = true; + budget_exhausted = true; } } } + if budget_exhausted { + break; + } if let Some(matcher) = &content_matcher { // Skip files larger than max_read_bytes during a search — we @@ -190,22 +314,51 @@ fn inner( continue; } let text = String::from_utf8_lossy(&bytes); - for (line_idx, line) in text.lines().enumerate() { - let truncated_line = if line.len() > max_line_bytes { - &line[..max_line_bytes] - } else { - line - }; + // The file is fully in memory already; collecting the lines + // lets context slices index into the same file (and ONLY the + // same file — context never crosses file boundaries). + let lines: Vec<&str> = text.lines().collect(); + for (line_idx, line) in lines.iter().enumerate() { + let truncated_line = clip_line(line, max_line_bytes); + // Only the FIRST match per line is reported (one content + // match per matching line) — long-standing wire behavior. 
                 if let Some(m) = matcher.find(truncated_line) {
                     if content_matches.len() >= max_matches {
                         truncated = true;
                         break;
                     }
+                    // Context slices over the collected lines, clipped at
+                    // the file edges; overlap between adjacent matches is
+                    // duplicated, not merged. Same per-line truncation as
+                    // the matched text.
+                    let before: Vec<String> = lines[line_idx.saturating_sub(ctx_before)..line_idx]
+                        .iter()
+                        .map(|l| clip_line(l, max_line_bytes).to_string())
+                        .collect();
+                    let after_end = (line_idx + 1).saturating_add(ctx_after).min(lines.len());
+                    let after: Vec<String> = lines[line_idx + 1..after_end]
+                        .iter()
+                        .map(|l| clip_line(l, max_line_bytes).to_string())
+                        .collect();
+                    let cost = abs_wire.len()
+                        + truncated_line.len()
+                        + before.iter().map(String::len).sum::<usize>()
+                        + after.iter().map(String::len).sum::<usize>();
+                    if !charge(&mut budget_remaining, cost) {
+                        truncated = true;
+                        budget_exhausted = true;
+                        break;
+                    }
                     content_matches.push(ContentMatch {
-                        path: rel.clone(),
+                        path: abs_wire.clone(),
                         line: (line_idx as u32) + 1,
                         column: (m.start as u32) + 1,
                         text: truncated_line.to_string(),
+                        // None when empty so the wire omits the field
+                        // (schema-wire consistency: nullable, not
+                        // required-but-sometimes-absent).
+                        before: (!before.is_empty()).then_some(before),
+                        after: (!after.is_empty()).then_some(after),
                     });
                 }
             }
@@ -222,6 +375,53 @@ fn inner(
     })
 }

+/// Per-request cap on `context_lines_before` / `context_lines_after`.
+/// Larger windows belong to `coder::read-file` line windows, not search.
+const CONTEXT_LINES_CAP: u32 = 10;
+
+/// Validate one context-lines knob against [`CONTEXT_LINES_CAP`].
+/// `None` means 0 (no context).
+fn validate_context_lines(field: &str, value: Option<u32>) -> Result<usize, CoderError> {
+    let v = value.unwrap_or(0);
+    if v > CONTEXT_LINES_CAP {
+        return Err(CoderError::BadInput(format!(
+            "{field} is {v} but the maximum is {CONTEXT_LINES_CAP}. \
+             Re-call with {field} <= {CONTEXT_LINES_CAP}; for a wider view \
+             read the file with coder::read-file line_from/line_to."
+        )));
+    }
+    Ok(v as usize)
+}
+
+/// Per-line truncation to `max_line_bytes` — one rule for the matched
+/// line and its context lines. The clip point is floored to the nearest
+/// UTF-8 char boundary (a mid-character byte slice panics), so a clipped
+/// line can come out up to 3 bytes short of the cap — never over it.
+/// (`str::floor_char_boundary` is nightly-only; walk back on stable.)
+fn clip_line(line: &str, max_line_bytes: usize) -> &str {
+    if line.len() <= max_line_bytes {
+        return line;
+    }
+    let mut end = max_line_bytes;
+    while end > 0 && !line.is_char_boundary(end) {
+        end -= 1;
+    }
+    &line[..end]
+}
+
+/// Charge `cost` wire bytes against the remaining response budget.
+/// Returns false (charging nothing) when the cost would overdraw — the
+/// caller stops accumulating and sets `truncated`; the search degrades,
+/// it never errors.
+fn charge(remaining: &mut u64, cost: usize) -> bool {
+    let cost = cost as u64;
+    if cost > *remaining {
+        return false;
+    }
+    *remaining -= cost;
+    true
+}
+
 fn build_globset(patterns: &[String]) -> Result<Option<GlobSet>, CoderError> {
     if patterns.is_empty() {
         return Ok(None);
@@ -297,7 +497,7 @@ mod tests {
     fn setup() -> (tempfile::TempDir, Arc<PathResolver>, Arc<CoderConfig>) {
         let tmp = tempdir().unwrap();
         let cfg = CoderConfig {
-            base_path: tmp.path().to_path_buf(),
+            base_paths: vec![tmp.path().to_path_buf()],
             non_accessible_globs: vec!["**/.env".to_string()],
             max_read_bytes: 1024 * 1024,
             search_default_max_matches: 1000,
@@ -309,6 +509,15 @@ mod tests {
         (tmp, resolver, cfg)
     }

+    /// Expected wire path: responses carry canonical absolute paths.
+ fn abs(tmp: &tempfile::TempDir, rel: &str) -> String { + std::fs::canonicalize(tmp.path()) + .unwrap() + .join(rel) + .display() + .to_string() + } + fn write(tmp: &tempfile::TempDir, rel: &str, body: &str) { let p = tmp.path().join(rel); if let Some(parent) = p.parent() { @@ -333,6 +542,9 @@ mod tests { exclude_globs: vec![], max_matches: None, max_line_bytes: None, + context_lines_before: None, + context_lines_after: None, + use_default_excludes: true, search_content: true, search_paths: false, }, @@ -341,11 +553,71 @@ mod tests { .unwrap(); assert_eq!(out.content_matches.len(), 1); let m = &out.content_matches[0]; - assert_eq!(m.path, "a.txt"); + assert_eq!(m.path, abs(&tmp, "a.txt")); assert_eq!(m.line, 2); assert_eq!(m.column, 6); } + /// Pins the SKILL.md "poor-man's outline" recipe: matching is + /// PER-LINE, so `^` anchors at each line start and a leading `\s*` + /// catches indented declarations (impl/class methods). `path` scopes + /// the walk to a folder; the root-relative include glob pins the one + /// file. The pattern consumes the indentation, so `column` stays 1 + /// even for indented hits. 
+ #[tokio::test] + async fn outline_recipe_returns_indented_declarations() { + let (tmp, r, c) = setup(); + write( + &tmp, + "src/lib.rs", + "pub struct Config {}\n\nimpl Config {\n pub fn load() -> Self {\n \ + Self {}\n }\n\n fn validate(&self) -> bool {\n true\n }\n}\n", + ); + write(&tmp, "src/other.rs", "fn excluded_by_glob() {}\n"); + let out = handle( + r, + c, + SearchInput { + query: r"^\s*(pub |fn |class |def |func |impl |interface )".into(), + path: "src".into(), + regex: true, + ignore_case: false, + include_globs: vec!["src/lib.rs".into()], + exclude_globs: vec![], + max_matches: None, + max_line_bytes: None, + context_lines_before: None, + context_lines_after: None, + use_default_excludes: true, + search_content: true, + search_paths: false, + }, + ) + .await + .unwrap(); + let hits: Vec<(u32, u32, &str)> = out + .content_matches + .iter() + .map(|m| (m.line, m.column, m.text.as_str())) + .collect(); + assert_eq!( + hits, + vec![ + (1, 1, "pub struct Config {}"), + (3, 1, "impl Config {"), + (4, 1, " pub fn load() -> Self {"), + (8, 1, " fn validate(&self) -> bool {"), + ], + "outline must include the indented impl methods (lines 4 and 8)" + ); + assert!( + !out.content_matches + .iter() + .any(|m| m.path.ends_with("other.rs")), + "include_globs must pin the single file" + ); + } + #[tokio::test] async fn regex_content_match() { let (tmp, r, c) = setup(); @@ -362,6 +634,9 @@ mod tests { exclude_globs: vec![], max_matches: None, max_line_bytes: None, + context_lines_before: None, + context_lines_after: None, + use_default_excludes: true, search_content: true, search_paths: false, }, @@ -389,6 +664,9 @@ mod tests { exclude_globs: vec![], max_matches: None, max_line_bytes: None, + context_lines_before: None, + context_lines_after: None, + use_default_excludes: true, search_content: false, search_paths: true, }, @@ -396,8 +674,8 @@ mod tests { .await .unwrap(); let paths: Vec<_> = out.path_matches.iter().map(|p| p.path.as_str()).collect(); - 
assert!(paths.contains(&"src/foo.rs")); - assert!(!paths.contains(&"src/bar.ts")); + assert!(paths.contains(&abs(&tmp, "src/foo.rs").as_str())); + assert!(!paths.contains(&abs(&tmp, "src/bar.ts").as_str())); } #[tokio::test] @@ -417,6 +695,9 @@ mod tests { exclude_globs: vec![], max_matches: None, max_line_bytes: None, + context_lines_before: None, + context_lines_after: None, + use_default_excludes: true, search_content: true, search_paths: true, }, @@ -424,11 +705,12 @@ mod tests { .await .unwrap(); // The .env file must not appear in either match list. + let env_abs = abs(&tmp, ".env"); for m in &out.content_matches { - assert_ne!(m.path, ".env", "non-accessible file leaked content"); + assert_ne!(m.path, env_abs, "non-accessible file leaked content"); } for m in &out.path_matches { - assert_ne!(m.path, ".env", "non-accessible file leaked path"); + assert_ne!(m.path, env_abs, "non-accessible file leaked path"); } } @@ -450,6 +732,9 @@ mod tests { exclude_globs: vec!["build/**".into()], max_matches: None, max_line_bytes: None, + context_lines_before: None, + context_lines_after: None, + use_default_excludes: true, search_content: true, search_paths: false, }, @@ -461,7 +746,7 @@ mod tests { .iter() .map(|m| m.path.as_str()) .collect(); - assert_eq!(paths, vec!["src/a.rs"]); + assert_eq!(paths, vec![abs(&tmp, "src/a.rs")]); } #[tokio::test] @@ -471,7 +756,7 @@ mod tests { write(&tmp, &format!("f{i}.txt"), "needle\n"); } let cfg = Arc::new(CoderConfig { - base_path: tmp.path().to_path_buf(), + base_paths: vec![tmp.path().to_path_buf()], non_accessible_globs: vec![], search_default_max_matches: 1000, max_read_bytes: 1024 * 1024, @@ -489,6 +774,9 @@ mod tests { exclude_globs: vec![], max_matches: Some(2), max_line_bytes: None, + context_lines_before: None, + context_lines_after: None, + use_default_excludes: true, search_content: true, search_paths: false, }, @@ -514,6 +802,9 @@ mod tests { exclude_globs: vec![], max_matches: None, max_line_bytes: None, + 
context_lines_before: None, + context_lines_after: None, + use_default_excludes: true, search_content: true, search_paths: true, }, @@ -539,6 +830,9 @@ mod tests { exclude_globs: vec![], max_matches: None, max_line_bytes: None, + context_lines_before: None, + context_lines_after: None, + use_default_excludes: true, search_content: true, search_paths: false, }, @@ -564,6 +858,9 @@ mod tests { exclude_globs: vec![], max_matches: None, max_line_bytes: None, + context_lines_before: None, + context_lines_after: None, + use_default_excludes: true, search_content: true, search_paths: false, }, @@ -590,6 +887,9 @@ mod tests { exclude_globs: vec![], max_matches: None, max_line_bytes: None, + context_lines_before: None, + context_lines_after: None, + use_default_excludes: true, search_content: true, search_paths: false, }, @@ -601,6 +901,475 @@ mod tests { .iter() .map(|m| m.path.as_str()) .collect(); - assert_eq!(paths, vec!["a/x.txt"]); + assert_eq!(paths, vec![abs(&tmp, "a/x.txt")]); + } + + // ----------------------------------------------------------------- + // S2 (v0.4.0): context lines, response byte budget, default excludes. + // ----------------------------------------------------------------- + + /// Request with serde defaults for everything but `query` — the same + /// defaults a wire caller gets. + fn base_input(query: &str) -> SearchInput { + SearchInput { + query: query.into(), + path: ".".into(), + regex: false, + ignore_case: false, + include_globs: vec![], + exclude_globs: vec![], + max_matches: None, + max_line_bytes: None, + context_lines_before: None, + context_lines_after: None, + use_default_excludes: true, + search_content: true, + search_paths: false, + } + } + + /// Expected wire context: `Some` of owned lines (the wire omits the + /// field entirely when no context exists — `None`, never `Some([])`). 
+    fn ctx(lines: &[&str]) -> Option<Vec<String>> {
+        Some(lines.iter().map(|s| s.to_string()).collect())
+    }
+
+    /// Jail with an explicit `search_response_budget_bytes`.
+    fn cfg_with_budget(tmp: &tempfile::TempDir, budget: u64) -> Arc<CoderConfig> {
+        Arc::new(CoderConfig {
+            base_paths: vec![tmp.path().to_path_buf()],
+            non_accessible_globs: vec![],
+            max_read_bytes: 1024 * 1024,
+            search_default_max_matches: 1000,
+            search_default_max_line_bytes: 4096,
+            search_response_budget_bytes: budget,
+            ..CoderConfig::default()
+        })
+    }
+
+    #[tokio::test]
+    async fn context_lines_in_file_order() {
+        let (tmp, r, c) = setup();
+        write(&tmp, "a.txt", "l1\nl2\nl3 needle\nl4\nl5\n");
+        let out = handle(
+            r,
+            c,
+            SearchInput {
+                context_lines_before: Some(2),
+                context_lines_after: Some(2),
+                ..base_input("needle")
+            },
+        )
+        .await
+        .unwrap();
+        assert_eq!(out.content_matches.len(), 1);
+        let m = &out.content_matches[0];
+        assert_eq!(m.before, ctx(&["l1", "l2"]));
+        assert_eq!(m.after, ctx(&["l4", "l5"]));
+    }
+
+    #[tokio::test]
+    async fn context_clipped_at_file_edges_and_duplicated_across_matches() {
+        let (tmp, r, c) = setup();
+        write(&tmp, "a.txt", "needle first\nmid\nneedle last\n");
+        let out = handle(
+            r,
+            c,
+            SearchInput {
+                context_lines_before: Some(5),
+                context_lines_after: Some(5),
+                ..base_input("needle")
+            },
+        )
+        .await
+        .unwrap();
+        assert_eq!(out.content_matches.len(), 2);
+        // First match: nothing before line 1; after clipped at EOF.
+        assert!(out.content_matches[0].before.is_none());
+        assert_eq!(out.content_matches[0].after, ctx(&["mid", "needle last"]));
+        // Second match: overlapping context is duplicated, not merged.
+ assert_eq!(out.content_matches[1].before, ctx(&["needle first", "mid"])); + assert!(out.content_matches[1].after.is_none()); + } + + #[tokio::test] + async fn context_never_crosses_file_boundaries() { + let (tmp, r, c) = setup(); + write(&tmp, "a.txt", "needle a\n"); + write(&tmp, "b.txt", "needle b\n"); + let out = handle( + r, + c, + SearchInput { + context_lines_before: Some(3), + context_lines_after: Some(3), + ..base_input("needle") + }, + ) + .await + .unwrap(); + assert_eq!(out.content_matches.len(), 2); + for m in &out.content_matches { + assert!(m.before.is_none(), "crossed file boundary: {:?}", m.before); + assert!(m.after.is_none(), "crossed file boundary: {:?}", m.after); + } + } + + #[tokio::test] + async fn context_lines_over_cap_rejected_c210() { + let (tmp, r, c) = setup(); + write(&tmp, "a.txt", "needle\n"); + let err = handle( + r.clone(), + c.clone(), + SearchInput { + context_lines_before: Some(11), + ..base_input("needle") + }, + ) + .await + .unwrap_err(); + assert!(err.contains("C210"), "got: {err}"); + assert!( + err.contains("11") && err.contains("10"), + "must name the actual value and the cap: {err}" + ); + let err = handle( + r, + c, + SearchInput { + context_lines_after: Some(99), + ..base_input("needle") + }, + ) + .await + .unwrap_err(); + assert!(err.contains("C210"), "got: {err}"); + assert!( + err.contains("99") && err.contains("10"), + "must name the actual value and the cap: {err}" + ); + } + + #[tokio::test] + async fn context_lines_subject_to_max_line_bytes() { + let (tmp, r, c) = setup(); + let long = "x".repeat(100); + write(&tmp, "a.txt", &format!("{long}\nneedle\n{long}\n")); + let out = handle( + r, + c, + SearchInput { + max_line_bytes: Some(10), + context_lines_before: Some(1), + context_lines_after: Some(1), + ..base_input("needle") + }, + ) + .await + .unwrap(); + assert_eq!(out.content_matches.len(), 1); + let m = &out.content_matches[0]; + assert_eq!(m.before, Some(vec!["x".repeat(10)])); + assert_eq!(m.after, 
Some(vec!["x".repeat(10)])); + } + + #[tokio::test] + async fn budget_truncates_without_error_deterministically() { + let (tmp, r, _c) = setup(); + write(&tmp, "a.txt", "needle one\nneedle two\n"); + // Budget fits exactly the first match (path + text), not both. + let cost1 = (abs(&tmp, "a.txt").len() + "needle one".len()) as u64; + let out = handle(r, cfg_with_budget(&tmp, cost1), base_input("needle")) + .await + .unwrap(); + assert_eq!( + out.content_matches.len(), + 1, + "deterministic cutoff after the first match" + ); + assert_eq!(out.content_matches[0].text, "needle one"); + assert!(out.truncated, "budget cutoff must set the truncated flag"); + } + + #[tokio::test] + async fn budget_accounting_includes_context_lines() { + let (tmp, r, _c) = setup(); + write( + &tmp, + "a.txt", + "ctx before\nneedle one\nctx after\nfiller\nneedle two\n", + ); + let path_len = abs(&tmp, "a.txt").len() as u64; + // Without context this budget fits both matches exactly… + let budget = 2 * (path_len + "needle one".len() as u64); + let out = handle( + r.clone(), + cfg_with_budget(&tmp, budget), + base_input("needle"), + ) + .await + .unwrap(); + assert_eq!(out.content_matches.len(), 2); + assert!(!out.truncated); + // …but with context lines charged, only the first match fits. + let out = handle( + r, + cfg_with_budget(&tmp, budget), + SearchInput { + context_lines_before: Some(1), + context_lines_after: Some(1), + ..base_input("needle") + }, + ) + .await + .unwrap(); + assert_eq!( + out.content_matches.len(), + 1, + "context lines must count toward the budget" + ); + assert!(out.truncated); + } + + #[tokio::test] + async fn budget_applies_to_path_matches_too() { + let (tmp, r, _c) = setup(); + write(&tmp, "needle1.txt", "x"); + write(&tmp, "needle2.txt", "x"); + // Both absolute paths have the same length; budget fits one. 
+ let p_len = abs(&tmp, "needle1.txt").len() as u64; + let out = handle( + r, + cfg_with_budget(&tmp, p_len), + SearchInput { + search_content: false, + search_paths: true, + ..base_input("needle") + }, + ) + .await + .unwrap(); + assert_eq!(out.path_matches.len(), 1); + assert!(out.truncated); + } + + #[tokio::test] + async fn default_excludes_skip_node_modules_in_content_and_path_results() { + let (tmp, r, c) = setup(); + write(&tmp, "node_modules/pkg/needle.js", "needle inside"); + write(&tmp, "src/needle.rs", "needle inside"); + let out = handle( + r, + c, + SearchInput { + search_paths: true, + ..base_input("needle") + }, + ) + .await + .unwrap(); + // No names, counts, or placeholders for excluded entries anywhere. + let serialized = serde_json::to_string(&out).unwrap(); + assert!( + !serialized.contains("node_modules"), + "excluded dir leaked: {serialized}" + ); + assert_eq!(out.content_matches.len(), 1); + assert_eq!(out.content_matches[0].path, abs(&tmp, "src/needle.rs")); + assert_eq!(out.path_matches.len(), 1); + assert_eq!(out.path_matches[0].path, abs(&tmp, "src/needle.rs")); + } + + #[tokio::test] + async fn use_default_excludes_false_searches_inside() { + let (tmp, r, c) = setup(); + write(&tmp, "node_modules/pkg/dep.js", "needle inside"); + let out = handle( + r, + c, + SearchInput { + use_default_excludes: false, + ..base_input("needle") + }, + ) + .await + .unwrap(); + assert_eq!(out.content_matches.len(), 1); + assert_eq!( + out.content_matches[0].path, + abs(&tmp, "node_modules/pkg/dep.js") + ); + } + + #[tokio::test] + async fn file_named_like_excluded_dir_still_searched() { + let (tmp, r, c) = setup(); + // A FILE named `dist`: dir-boundary companions apply to + // directories only, so this must still be scanned. 
+ write(&tmp, "dist", "needle in a file named dist"); + let out = handle(r, c, base_input("needle")).await.unwrap(); + assert_eq!(out.content_matches.len(), 1); + assert_eq!(out.content_matches[0].path, abs(&tmp, "dist")); + } + + #[tokio::test] + async fn explicitly_excluded_walk_root_disables_filter() { + let (tmp, r, c) = setup(); + write(&tmp, "node_modules/pkg/dep.js", "needle inside"); + // Naming the excluded folder as the walk root expresses intent to + // see inside it (mirrors coder::tree's S1-reviewed behavior). + let out = handle( + r, + c, + SearchInput { + path: "node_modules".into(), + ..base_input("needle") + }, + ) + .await + .unwrap(); + assert_eq!(out.content_matches.len(), 1); + } + + // REDACTION INVARIANT regression: denied files stay wholly absent + // from the whole serialized response even with the new options. + #[tokio::test] + async fn denied_files_wholly_absent_with_context_and_excludes_off() { + let (tmp, r, c) = setup(); + write(&tmp, ".env", "API_KEY=needle secret"); + write(&tmp, "ok.txt", "needle here"); + let out = handle( + r, + c, + SearchInput { + search_paths: true, + context_lines_before: Some(2), + context_lines_after: Some(2), + use_default_excludes: false, + ..base_input("needle") + }, + ) + .await + .unwrap(); + let serialized = serde_json::to_string(&out).unwrap(); + assert!( + !serialized.contains(".env"), + "denied path leaked: {serialized}" + ); + assert!( + !serialized.contains("API_KEY"), + "denied content leaked: {serialized}" + ); + assert_eq!(out.content_matches.len(), 1); + } + + // ----------------------------------------------------------------- + // clip_line UTF-8 boundary regressions: clipping must floor to a + // char boundary instead of panicking mid-character (ship-blocker + // found in S2 review). Three confirmed panic variants pinned below. + // ----------------------------------------------------------------- + + // a1: matched-text clip lands mid-'é' (2 bytes straddling the cap). 
+ #[tokio::test] + async fn clip_mid_char_in_matched_line_does_not_panic() { + let (tmp, r, c) = setup(); + write(&tmp, "a.txt", "abétail\n"); + let out = handle( + r, + c, + SearchInput { + max_line_bytes: Some(3), + ..base_input("ab") + }, + ) + .await + .unwrap(); + assert_eq!(out.content_matches.len(), 1); + // Cap 3 falls inside 'é' (bytes 2..4); floor to the boundary at 2. + assert_eq!(out.content_matches[0].text, "ab"); + } + + // a2: clean ASCII match poisoned by a multibyte CONTEXT neighbor — + // the cap lands mid-'😀' (4 bytes) in the before-line. + #[tokio::test] + async fn clip_mid_char_in_context_line_does_not_panic() { + let (tmp, r, c) = setup(); + write(&tmp, "a.txt", "abc😀x\nhit42\n"); + let out = handle( + r, + c, + SearchInput { + max_line_bytes: Some(5), + context_lines_before: Some(1), + ..base_input("hit") + }, + ) + .await + .unwrap(); + assert_eq!(out.content_matches.len(), 1); + assert_eq!(out.content_matches[0].text, "hit42"); + // Cap 5 falls inside '😀' (bytes 3..7); floor to the boundary at 3. + assert_eq!(out.content_matches[0].before, ctx(&["abc"])); + } + + // a3 (worst): clip_line runs on EVERY scanned line before matching, + // so a non-matching over-cap line with a multibyte char straddling + // the DEFAULT 4096-byte cap killed the whole handler with zero + // caller-supplied knobs — regardless of query. + #[tokio::test] + async fn clip_mid_char_in_non_matching_scanned_line_does_not_panic() { + let (tmp, r, c) = setup(); + // '😀' occupies bytes 4095..4099: the default + // search_default_max_line_bytes (4096) lands mid-character. 
+ let line = format!("{}😀tail", "a".repeat(4095)); + write(&tmp, "big.txt", &format!("{line}\n")); + let out = handle(r, c, base_input("zzz-no-match")).await.unwrap(); + assert!(out.content_matches.is_empty()); + assert!(!out.truncated); + } + + // Wire-shape pin: matches without context carry `None` (never + // `Some([])`), so the fields are omitted entirely and context-free + // responses keep the pre-S2 shape — matching the nullable, + // non-required schema. + #[tokio::test] + async fn empty_context_omitted_from_wire() { + let (tmp, r, c) = setup(); + write(&tmp, "a.txt", "needle\n"); + let out = handle(r, c, base_input("needle")).await.unwrap(); + assert!(out.content_matches[0].before.is_none()); + assert!(out.content_matches[0].after.is_none()); + let serialized = serde_json::to_string(&out).unwrap(); + assert!( + !serialized.contains("\"before\"") && !serialized.contains("\"after\""), + "empty context must be skipped on the wire: {serialized}" + ); + } + + // Clip-then-charge ordering pin: a line clipped at `max_line_bytes` + // must be charged its POST-clip byte length. A budget sized to the + // clipped text fits exactly; charging the raw line length would + // overdraw and wrongly return zero matches. + #[tokio::test] + async fn budget_charges_post_clip_length() { + let (tmp, r, _c) = setup(); + write(&tmp, "a.txt", &format!("ab{}\n", "é".repeat(50))); + // max_line_bytes 3 falls inside the first 'é' (bytes 2..4); the + // stored text floors to "ab" (2 bytes), not the 102-byte raw line. 
+ let budget = (abs(&tmp, "a.txt").len() + "ab".len()) as u64; + let out = handle( + r, + cfg_with_budget(&tmp, budget), + SearchInput { + max_line_bytes: Some(3), + ..base_input("ab") + }, + ) + .await + .unwrap(); + assert_eq!(out.content_matches.len(), 1); + assert_eq!(out.content_matches[0].text, "ab"); + assert!(!out.truncated); } } diff --git a/coder/src/functions/tree.rs b/coder/src/functions/tree.rs index 533b5118..dab1b929 100644 --- a/coder/src/functions/tree.rs +++ b/coder/src/functions/tree.rs @@ -3,6 +3,13 @@ //! a `truncated` block pointing the caller at `coder::list-folder` for //! pagination — matching the user's "if folder contains thousands of //! files, it should show an indication it loaded only 50" requirement. +//! +//! Noise folders matching `default_exclude_globs` (.git, node_modules, …) +//! surface as childless stub nodes flagged `truncated` with reason +//! `default_exclude` — never silently hidden; opt out per call with +//! `use_default_excludes: false`. Nodes carry only `name`; absolute paths +//! derive from the response's top-level `path` (child = parent + "/" + +//! name), which cuts thousands of redundant tokens from large snapshots. use std::path::Path; use std::sync::Arc; @@ -14,9 +21,15 @@ use crate::config::CoderConfig; use crate::error::{err_to_string, CoderError}; use crate::path::PathResolver; +// examples are wire-contract; goldens pin them. #[derive(Debug, Deserialize, JsonSchema)] +#[schemars(example = "example_tree_input")] pub struct TreeInput { - /// Base folder relative to `base_path`. Defaults to `.`. + /// Base folder for the snapshot. Relative to the primary allowed root, + /// or an absolute path inside any allowed root. Defaults to `.` (the + /// primary root itself). Call `coder::info` to see the allowed roots. + /// Paths outside every allowed root are rejected — use the shell + /// worker's `shell::fs::*` for host paths outside the jail. 
#[serde(default = "default_path")] pub path: String, /// Maximum depth to descend; the root node is depth 0. @@ -26,21 +39,49 @@ pub struct TreeInput { /// flagged `truncated` and callers should switch to `coder::list-folder`. #[serde(default)] pub per_folder_limit: Option, + /// Apply the worker's `default_exclude_globs` config (noise folders + /// like .git/node_modules/target — call `coder::info` for the active + /// list). Excluded directories still appear as childless nodes + /// flagged `truncated` with reason "default_exclude"; excluded files + /// are omitted. Pass `false` to list everything. + #[serde(default = "default_true")] + pub use_default_excludes: bool, } fn default_path() -> String { ".".to_string() } +fn default_true() -> bool { + true +} + +// examples are wire-contract; goldens pin them. +fn example_tree_input() -> serde_json::Value { + serde_json::json!({ + "path": ".", + "max_depth": 3 + }) +} + #[derive(Debug, Serialize, JsonSchema)] pub struct TreeOutput { + /// Canonical absolute path of the requested folder (resolved through + /// the jail). Nodes carry only `name`, and the root node's path IS + /// this `path` — do not join the root's `name` onto it; derive + /// children by joining from here: child path = parent path + "/" + + /// name. Operations on derived paths re-validate through the jail. + pub path: String, + /// Root node of the snapshot; its `name` is the folder's basename. pub root: TreeNode, } #[derive(Debug, Serialize, JsonSchema)] pub struct TreeNode { + /// Entry basename. The ROOT node's path is the response's top-level + /// `path` itself; every other node's path derives by joining from + /// there: child path = parent path + "/" + name. 
pub name: String, - pub path: String, pub kind: NodeKind, pub size: u64, pub mtime: i64, @@ -48,8 +89,9 @@ pub struct TreeNode { pub non_accessible: bool, #[serde(skip_serializing_if = "Option::is_none")] pub children: Option>, - /// Set on directories whose `children` was capped at `per_folder_limit` - /// or whose subtree was cut off by `max_depth`. + /// Set on directories whose `children` was capped at + /// `per_folder_limit`, whose subtree was cut off by `max_depth`, or + /// which matched `default_exclude_globs` (reason "default_exclude"). #[serde(skip_serializing_if = "Option::is_none")] pub truncated: Option, } @@ -65,12 +107,14 @@ pub enum NodeKind { #[derive(Debug, Serialize, JsonSchema)] pub struct TruncationInfo { - /// Reason this folder was truncated: hit `per_folder_limit` or - /// `max_depth`. + /// Reason this folder was truncated: hit `per_folder_limit`, cut off + /// by `max_depth`, or matched `default_exclude_globs` + /// (`default_exclude`). pub reason: String, /// Number of children actually returned. pub shown: u32, - /// Total number of children in the folder (only populated when + /// Total number of children eligible for listing in the folder, + /// counted after default-exclude filtering (only populated when /// `reason == "per_folder_limit"`; for depth truncation we don't /// peek into the folder). #[serde(skip_serializing_if = "Option::is_none")] @@ -92,32 +136,53 @@ fn inner( req: TreeInput, ) -> Result { let abs = resolver.resolve(&req.path)?; - let md = std::fs::metadata(&abs)?; + // NotFound is intercepted with the wire path in scope so the C211 + // message names the path the caller supplied (standardized wording — + // REDACTION INVARIANT: identical to the glob-denied message). 
+ let md = std::fs::metadata(&abs).map_err(|e| CoderError::io_for_path(e, &req.path))?; if !md.is_dir() { return Err(CoderError::BadInput(format!( "not a directory: {}", req.path ))); } - let max_depth = req.max_depth.unwrap_or(cfg.tree_default_depth); - let per_folder_limit = req - .per_folder_limit - .unwrap_or(cfg.tree_per_folder_limit) - .max(1); - - let root_rel = resolver.relative(&abs).unwrap_or_default(); - let root = walk_dir(resolver, &abs, root_rel, 0, max_depth, per_folder_limit)?; - Ok(TreeOutput { root }) + // Explicitly naming an excluded folder as the walk root expresses + // the caller's intent to see inside it: the default-exclude filter is + // disabled for that ENTIRE walk. Anything less returns an + // affirmatively false "empty" listing — direct file children omitted, + // subdirectories stubbed one level down. + let use_default_excludes = req.use_default_excludes && !resolver.is_default_excluded_dir(&abs); + let opts = WalkOpts { + max_depth: req.max_depth.unwrap_or(cfg.tree_default_depth), + per_folder_limit: req + .per_folder_limit + .unwrap_or(cfg.tree_per_folder_limit) + .max(1), + use_default_excludes, + }; + + let root = walk_dir(resolver, &abs, 0, &opts)?; + Ok(TreeOutput { + path: abs.display().to_string(), + root, + }) +} + +struct WalkOpts { + max_depth: u32, + per_folder_limit: u32, + use_default_excludes: bool, } fn walk_dir( resolver: &PathResolver, abs: &Path, - rel: String, depth: u32, - max_depth: u32, - per_folder_limit: u32, + opts: &WalkOpts, ) -> Result { + // Deliberately bare `?` (generic From fallback, no path in the message): + // `abs` here can be a DISCOVERED child during the walk — naming it would + // violate the REDACTION INVARIANT. Do not "fix" this to io_for_path. 
let md = std::fs::metadata(abs)?; let name = abs .file_name() @@ -126,7 +191,6 @@ fn walk_dir( let mut node = TreeNode { name, - path: rel.clone(), kind: NodeKind::Dir, size: md.len(), mtime: unix_mtime(&md), @@ -135,7 +199,23 @@ fn walk_dir( truncated: None, }; - if depth >= max_depth { + // `abs` is always a directory here (the walk only recurses into + // dirs), so the dir-boundary check applies. The excluded node still + // appears — never silently hidden. The root can't trip this: inner() + // disables the filter when the requested root is itself excluded. + if opts.use_default_excludes && resolver.is_default_excluded_dir(abs) { + node.truncated = Some(TruncationInfo { + reason: "default_exclude".to_string(), + shown: 0, + total: None, + hint: "folder matches default_exclude_globs (coder::info lists them); \ + re-call coder::tree with use_default_excludes: false to descend" + .into(), + }); + return Ok(node); + } + + if depth >= opts.max_depth { node.truncated = Some(TruncationInfo { reason: "max_depth".to_string(), shown: 0, @@ -158,10 +238,21 @@ fn walk_dir( return Ok(node); } }; + if opts.use_default_excludes { + // Excluded non-directory entries are omitted outright — matched + // against the configured globs ONLY (no dir companions), so a + // file or symlink merely NAMED like an excluded directory is + // kept. Directories stay regardless; excluded ones surface as + // childless stubs in the recursive call. 
+ entries.retain(|e| { + let is_dir = e.file_type().is_ok_and(|t| t.is_dir()); + is_dir || !resolver.is_default_excluded(&e.path()) + }); + } entries.sort_by_key(|a| a.file_name()); let total = entries.len() as u32; - let cap = per_folder_limit as usize; + let cap = opts.per_folder_limit as usize; let truncated_here = total as usize > cap; let visible = if truncated_here { &entries[..cap] @@ -172,21 +263,9 @@ fn walk_dir( let mut children = Vec::with_capacity(visible.len()); for e in visible { let child_abs = e.path(); - let child_rel = if rel.is_empty() { - e.file_name().to_string_lossy().into_owned() - } else { - format!("{}/{}", rel, e.file_name().to_string_lossy()) - }; let ft = e.file_type().ok(); if ft.as_ref().is_some_and(|t| t.is_dir()) { - let sub = walk_dir( - resolver, - &child_abs, - child_rel, - depth + 1, - max_depth, - per_folder_limit, - )?; + let sub = walk_dir(resolver, &child_abs, depth + 1, opts)?; children.push(sub); } else { let cmd = match e.metadata() { @@ -195,7 +274,6 @@ fn walk_dir( }; children.push(TreeNode { name: e.file_name().to_string_lossy().into_owned(), - path: child_rel, kind: classify(&cmd), size: cmd.len(), mtime: unix_mtime(&cmd), @@ -246,7 +324,7 @@ mod tests { fn setup() -> (tempfile::TempDir, Arc, Arc) { let tmp = tempdir().unwrap(); let cfg = Arc::new(CoderConfig { - base_path: tmp.path().to_path_buf(), + base_paths: vec![tmp.path().to_path_buf()], non_accessible_globs: vec!["**/.env".to_string()], tree_default_depth: 4, tree_per_folder_limit: 50, @@ -256,6 +334,15 @@ mod tests { (tmp, resolver, cfg) } + fn input(path: &str) -> TreeInput { + TreeInput { + path: path.into(), + max_depth: None, + per_folder_limit: None, + use_default_excludes: true, + } + } + #[tokio::test] async fn tree_with_nested_dirs() { let (tmp, r, c) = setup(); @@ -263,20 +350,10 @@ mod tests { std::fs::write(tmp.path().join("a/b/c.txt"), "hi").unwrap(); std::fs::write(tmp.path().join("z.txt"), "x").unwrap(); - let out = handle( - r, - c, - TreeInput 
{ - path: ".".into(), - max_depth: None, - per_folder_limit: None, - }, - ) - .await - .unwrap(); - let root = out.root; + let out = handle(r, c, input(".")).await.unwrap(); + let root = &out.root; assert!(matches!(root.kind, NodeKind::Dir)); - let children = root.children.unwrap(); + let children = root.children.as_ref().unwrap(); let names: Vec<_> = children.iter().map(|c| c.name.as_str()).collect(); assert_eq!(names, vec!["a", "z.txt"]); let a = &children[0]; @@ -284,6 +361,17 @@ mod tests { assert_eq!(a_children[0].name, "b"); let b_children = a_children[0].children.as_ref().unwrap(); assert_eq!(b_children[0].name, "c.txt"); + // The response's top-level path is canonical-absolute (decision + // D2-eng); nodes carry only names. + let base = std::fs::canonicalize(tmp.path()).unwrap(); + assert_eq!(out.path, base.display().to_string()); + // WIRE-CONTRACT PIN: the documented derivation rule (child path = + // parent path + "/" + name) must reproduce the real fs path. + let derived = format!( + "{}/{}/{}/{}", + out.path, a.name, a_children[0].name, b_children[0].name + ); + assert_eq!(derived, base.join("a/b/c.txt").display().to_string()); } #[tokio::test] @@ -292,22 +380,12 @@ mod tests { std::fs::create_dir_all(tmp.path().join("a/b/c")).unwrap(); std::fs::write(tmp.path().join("a/b/c/x.txt"), "x").unwrap(); let cfg = Arc::new(CoderConfig { - base_path: tmp.path().to_path_buf(), + base_paths: vec![tmp.path().to_path_buf()], tree_default_depth: 1, tree_per_folder_limit: 50, ..CoderConfig::default() }); - let out = handle( - r, - cfg, - TreeInput { - path: ".".into(), - max_depth: None, - per_folder_limit: None, - }, - ) - .await - .unwrap(); + let out = handle(r, cfg, input(".")).await.unwrap(); let a = &out.root.children.unwrap()[0]; // a is depth 1, which equals max_depth → should be truncated, no children loaded. 
assert!(a.children.is_none()); @@ -322,22 +400,12 @@ mod tests { std::fs::write(tmp.path().join(format!("f{i:02}.txt")), "x").unwrap(); } let cfg = Arc::new(CoderConfig { - base_path: tmp.path().to_path_buf(), + base_paths: vec![tmp.path().to_path_buf()], tree_default_depth: 4, tree_per_folder_limit: 3, ..CoderConfig::default() }); - let out = handle( - r, - cfg, - TreeInput { - path: ".".into(), - max_depth: None, - per_folder_limit: None, - }, - ) - .await - .unwrap(); + let out = handle(r, cfg, input(".")).await.unwrap(); let kids = out.root.children.as_ref().unwrap(); assert_eq!(kids.len(), 3); let trunc = out.root.truncated.as_ref().unwrap(); @@ -352,21 +420,138 @@ mod tests { let (tmp, r, c) = setup(); std::fs::write(tmp.path().join(".env"), "x").unwrap(); std::fs::write(tmp.path().join("a.txt"), "x").unwrap(); + let out = handle(r, c, input(".")).await.unwrap(); + let kids = out.root.children.unwrap(); + let env = kids.iter().find(|k| k.name == ".env").unwrap(); + assert!(env.non_accessible); + let a = kids.iter().find(|k| k.name == "a.txt").unwrap(); + assert!(!a.non_accessible); + } + + #[tokio::test] + async fn default_excluded_dir_appears_as_childless_stub() { + let (tmp, r, c) = setup(); + std::fs::create_dir_all(tmp.path().join("node_modules/pkg")).unwrap(); + std::fs::write(tmp.path().join("node_modules/pkg/index.js"), "x").unwrap(); + std::fs::write(tmp.path().join("main.rs"), "x").unwrap(); + + let out = handle(r, c, input(".")).await.unwrap(); + let kids = out.root.children.unwrap(); + let nm = kids + .iter() + .find(|k| k.name == "node_modules") + .expect("excluded dir must still appear, never silently hidden"); + assert!(matches!(nm.kind, NodeKind::Dir)); + assert!(nm.children.is_none(), "descent must be suppressed"); + let trunc = nm.truncated.as_ref().unwrap(); + assert_eq!(trunc.reason, "default_exclude"); + assert_eq!(trunc.shown, 0); + assert_eq!(trunc.total, None); + assert!( + trunc.hint.contains("use_default_excludes"), + "hint must teach 
the opt-out: {}", + trunc.hint + ); + } + + #[tokio::test] + async fn use_default_excludes_false_descends_into_excluded_dirs() { + let (tmp, r, c) = setup(); + std::fs::create_dir_all(tmp.path().join("node_modules/pkg")).unwrap(); + std::fs::write(tmp.path().join("node_modules/pkg/index.js"), "x").unwrap(); + let out = handle( r, c, TreeInput { - path: ".".into(), - max_depth: None, - per_folder_limit: None, + use_default_excludes: false, + ..input(".") }, ) .await .unwrap(); let kids = out.root.children.unwrap(); - let env = kids.iter().find(|k| k.name == ".env").unwrap(); - assert!(env.non_accessible); - let a = kids.iter().find(|k| k.name == "a.txt").unwrap(); - assert!(!a.non_accessible); + let nm = kids.iter().find(|k| k.name == "node_modules").unwrap(); + assert!(nm.truncated.is_none()); + let nm_kids = nm.children.as_ref().expect("opt-out must descend"); + assert_eq!(nm_kids[0].name, "pkg"); + } + + #[tokio::test] + async fn default_excluded_file_omitted_from_listing() { + let tmp = tempdir().unwrap(); + std::fs::write(tmp.path().join("debug.log"), "x").unwrap(); + std::fs::write(tmp.path().join("keep.txt"), "x").unwrap(); + let cfg = Arc::new(CoderConfig { + base_paths: vec![tmp.path().to_path_buf()], + default_exclude_globs: vec!["**/*.log".to_string()], + ..CoderConfig::default() + }); + let r = Arc::new(PathResolver::new(&cfg).unwrap()); + let out = handle(r, cfg, input(".")).await.unwrap(); + let kids = out.root.children.unwrap(); + let names: Vec<_> = kids.iter().map(|k| k.name.as_str()).collect(); + assert_eq!(names, vec!["keep.txt"]); + assert!( + out.root.truncated.is_none(), + "omitted files must not count toward per_folder_limit truncation" + ); + } + + fn assert_no_default_exclude_stubs(node: &TreeNode) { + if let Some(t) = &node.truncated { + assert_ne!( + t.reason, "default_exclude", + "unexpected default_exclude stub at node {:?}", + node.name + ); + } + for child in node.children.iter().flatten() { + assert_no_default_exclude_stubs(child); + 
} + } + + #[tokio::test] + async fn explicitly_requested_excluded_root_disables_filter_for_whole_walk() { + let (tmp, r, c) = setup(); + std::fs::create_dir_all(tmp.path().join("node_modules/pkg")).unwrap(); + std::fs::write(tmp.path().join("node_modules/.package-lock.json"), "x").unwrap(); + std::fs::write(tmp.path().join("node_modules/pkg/index.js"), "x").unwrap(); + + let out = handle(r, c, input("node_modules")).await.unwrap(); + assert!(out.root.truncated.is_none()); + let kids = out.root.children.as_ref().expect("explicit root must list"); + let names: Vec<_> = kids.iter().map(|k| k.name.as_str()).collect(); + // File children directly inside the excluded root must be visible… + assert_eq!(names, vec![".package-lock.json", "pkg"]); + // …and subdirectories must actually descend, not stub one level down. + let pkg = kids.iter().find(|k| k.name == "pkg").unwrap(); + assert!(pkg.truncated.is_none()); + assert_eq!(pkg.children.as_ref().unwrap()[0].name, "index.js"); + assert_no_default_exclude_stubs(&out.root); + } + + #[tokio::test] + async fn entries_merely_named_like_excluded_dirs_are_kept_as_leaves() { + let (tmp, r, c) = setup(); + std::fs::create_dir(tmp.path().join("real")).unwrap(); + std::fs::write(tmp.path().join("dist"), "not a dir").unwrap(); + std::os::unix::fs::symlink(tmp.path().join("real"), tmp.path().join("node_modules")) + .unwrap(); + + let out = handle(r, c, input(".")).await.unwrap(); + let kids = out.root.children.unwrap(); + let dist = kids + .iter() + .find(|k| k.name == "dist") + .expect("a FILE named dist must not be dropped by the dir companion"); + assert!(matches!(dist.kind, NodeKind::File)); + assert!(dist.truncated.is_none()); + let nm = kids + .iter() + .find(|k| k.name == "node_modules") + .expect("a SYMLINK named node_modules must not be dropped"); + assert!(matches!(nm.kind, NodeKind::Symlink)); + assert!(nm.truncated.is_none()); } } diff --git a/coder/src/functions/update_file.rs b/coder/src/functions/update_file.rs index 
3c791f66..dc6f7fa7 100644 --- a/coder/src/functions/update_file.rs +++ b/coder/src/functions/update_file.rs @@ -13,8 +13,13 @@ //! //! Content op: //! -//! - `{ op: "replace", pattern: "...", replacement: "...", ignore_case?: bool }` -//! — regex substitution on the full file body after line ops. +//! - `{ op: "replace", pattern: "...", replacement: "...", ignore_case?: bool, +//! dot_matches_newline?: bool, expect_matches?: u64 }` — regex substitution +//! on the full file body after line ops. `dot_matches_newline` lets `.` +//! cross newlines; `expect_matches` fails the whole file (C210, nothing +//! written) when the actual match count differs. Capture references in +//! `replacement` are validated pre-write: a reference to a group the +//! pattern does not define fails with C210 (write a literal `$` as `$$`). use std::path::Path; use std::sync::Arc; @@ -23,16 +28,22 @@ use schemars::JsonSchema; use serde::{Deserialize, Serialize}; use crate::config::CoderConfig; -use crate::error::{err_to_string, CoderError}; +use crate::error::{err_to_string, CoderError, WireError}; use crate::path::PathResolver; +// examples are wire-contract; goldens pin them. #[derive(Debug, Deserialize, JsonSchema)] +#[schemars(example = "example_update_file_input")] pub struct UpdateFileInput { pub files: Vec, } #[derive(Debug, Deserialize, JsonSchema)] pub struct UpdateFileSpec { + /// Path relative to the primary allowed root, or an absolute path inside + /// any allowed root. Call `coder::info` to see the allowed roots. Paths + /// outside every allowed root are rejected — use the shell worker's + /// `shell::fs::*` for host paths outside the jail. pub path: String, pub ops: Vec, } @@ -55,33 +66,134 @@ pub enum UpdateOp { /// Replace all regex matches in the file body (after line ops). Replace { pattern: String, + /// Substitution text for each match. Capture references expand: + /// `$1`/`${1}` by index (`$0` is the whole match) and + /// `$name`/`${name}` by name. 
A literal `$` MUST be written `$$` + /// — JS/TS template literals in a replacement are the classic + /// collision: write `Hello, $${name}!` to output `Hello, ${name}!`. + /// Unbraced references consume the longest `[0-9A-Za-z_]` run, + /// so `$1a` means a group named "1a", NOT group 1 then "a" (use + /// `${1}a`). A reference to a group the pattern does not define + /// fails pre-write with C210 — nothing is written. References + /// are validated even when the replacement goes unused (e.g. + /// `expect_matches: 0`): the replacement must be well-formed + /// even when unused. replacement: String, #[serde(default)] ignore_case: bool, + /// When true, `.` in `pattern` also matches `\n`, so a short + /// anchored pattern like `"fn parse_config\\(.*?\\n\\}"` replaces a + /// whole multi-line region without quoting it — prefer two short + /// anchors joined by `.*?` over pasting the block into the pattern. + /// Without this flag (the default), `.` does not cross newlines and + /// a multi-line pattern silently matches nothing. + #[serde(default)] + dot_matches_newline: bool, + /// Expected number of matches for this op. When set and the actual + /// count differs, this FILE fails with C210 and nothing is written + /// to it (other files in the batch still apply). Set + /// `expect_matches: 1` to make a targeted read-free edit safe — a + /// mismatch means the pattern is anchored too loosely or matches + /// nothing. Set `expect_matches: 0` to assert ABSENCE: the op + /// succeeds only when nothing matches (the replacement is unused). + /// Omit to replace all matches unconditionally. + #[serde(default)] + expect_matches: Option<u64>, }, } +// examples are wire-contract; goldens pin them.
+fn example_update_file_input() -> serde_json::Value { + serde_json::json!({ + "files": [{ + "path": "src/lib.rs", + "ops": [ + { "op": "insert", "at_line": 1, "content": "// generated by coder\n" }, + { "op": "update_lines", "from_line": 5, "to_line": 7, + "content": "pub fn hello() {\n println!(\"hello\");\n}\n" }, + { "op": "replace", "pattern": "// BEGIN legacy.*?// END legacy", + "replacement": "// removed", "dot_matches_newline": true, + "expect_matches": 1 } + ] + }] + }) +} + +/// Post-apply snapshot of the region affected by one op. Line ops echo the +/// affected region with ±2 context lines; regex `replace` ops emit one echo +/// per match site (up to 5, no context): the FIRST and LAST line of the +/// post-replace region, with `elided` set to the inner line count when the +/// region spans more than 2 lines (single-line replacements echo just that +/// line). Each site carries `total_replacements`. Provides just enough +/// context to verify the edit landed in the right place without flooding +/// the LLM context with the full file body. +#[derive(Debug, Serialize, JsonSchema, PartialEq)] +pub struct OpEcho { + /// Index of the op in the request's ops array (0-based). + pub op_index: u32, + /// 1-based line number of the first echoed line (after all ops applied). + pub from_line: u64, + /// The echoed lines, post-apply. When the region is large, middle lines + /// are elided and `elided` is set to indicate how many were skipped. + pub lines: Vec, + /// Number of middle lines elided from a large region: for line ops, + /// set when the affected region exceeded the per-echo cap; for + /// replace sites, the count of inner lines between the region's + /// echoed first and last line (set when the region spans >2 lines). + #[serde(skip_serializing_if = "Option::is_none")] + pub elided: Option, + /// Total number of replacements the regex op made across the whole + /// file (set only on replace-op site echoes, duplicated on each site). 
+ /// Sites are capped at 5 — when more matched, this count is the only + /// record of the extras. + #[serde(skip_serializing_if = "Option::is_none")] + pub total_replacements: Option, +} + #[derive(Debug, Serialize, JsonSchema)] pub struct UpdateFileOutput { pub results: Vec, } +/// Maximum lines echoed per op before elision kicks in (first 8 + last 8). +const ECHO_MAX_LINES: usize = 20; +/// Number of context lines shown above and below a line op's region. +/// Regex match sites echo the first and last line of the post-replace +/// region (context 0, inner lines elided). +const ECHO_CONTEXT: i64 = 2; +/// Max echoed lines in the head/tail of a large region. +const ECHO_HEAD_TAIL: usize = 8; +/// Approximate per-file echo budget in bytes (~4 KiB). +const ECHO_BUDGET_BYTES: usize = 4 * 1024; +/// Maximum number of match sites echoed for a `replace` op. +const ECHO_MAX_SITES: usize = 5; + #[derive(Debug, Serialize, JsonSchema)] pub struct UpdateFileResult { + /// Canonical absolute path (resolved through the jail); the caller's + /// input verbatim when resolution failed. pub path: String, pub success: bool, /// Number of operations applied (only meaningful when `success`). pub applied: u32, /// Final line count after applying (only meaningful when `success`). pub new_line_count: u64, - /// UTF-8 body before ops (only on success, capped by `max_read_bytes`). - #[serde(skip_serializing_if = "Option::is_none")] - pub before: Option, - /// UTF-8 body after ops (only on success, capped by `max_read_bytes`). + /// Per-op bounded post-apply echoes for edit verification; each applied + /// op returns a snapshot of the affected region (±2 context lines) so + /// the caller can confirm the edit landed at the right position without + /// receiving the full file body. See `OpEcho` for field semantics. + /// Empty on failure. Always present on the wire. + pub echoes: Vec, + /// True when the total echo budget (~4 KiB) was exhausted before all + /// op echoes could be emitted. 
Use `coder::read-file` to inspect the + /// full result if needed. Always present on the wire. + pub echoes_truncated: bool, + /// Structured error for this entry. `code` is stable for programmatic + /// branching (e.g. `"C211"` for not-found-or-denied; `"C210"` for bad + /// input such as overlapping ops). `message` carries the corrective + /// action an LLM agent needs to make a successful second call. #[serde(skip_serializing_if = "Option::is_none")] - pub after: Option, - #[serde(skip_serializing_if = "Option::is_none")] - pub error: Option, + pub error: Option, } pub async fn handle( @@ -106,36 +218,56 @@ fn update_one( cfg: &CoderConfig, spec: UpdateFileSpec, ) -> UpdateFileResult { - let path = spec.path.clone(); - match try_update_one(resolver, cfg, spec) { - Ok((applied, new_line_count, before, after)) => UpdateFileResult { - path, + // Resolve up front: the edit pipeline operates ONLY on the + // resolver-returned path, and the result echoes that canonical + // absolute path. When resolution fails there is no canonical path, + // so the caller's input is echoed verbatim. 
+ let abs = match resolver.require_writable(&spec.path) { + Ok(abs) => abs, + Err(e) => { + return UpdateFileResult { + path: spec.path, + success: false, + applied: 0, + new_line_count: 0, + echoes: vec![], + echoes_truncated: false, + error: Some((&e).into()), + } + } + }; + let wire_path = abs.display().to_string(); + match try_update_one(cfg, &abs, spec) { + Ok((applied, new_line_count, echoes, echoes_truncated)) => UpdateFileResult { + path: wire_path, success: true, applied, new_line_count, - before, - after, + echoes, + echoes_truncated, error: None, }, Err(e) => UpdateFileResult { - path, + path: wire_path, success: false, applied: 0, new_line_count: 0, - before: None, - after: None, - error: Some(e.to_wire_string()), + echoes: vec![], + echoes_truncated: false, + error: Some((&e).into()), }, } } fn try_update_one( - resolver: &PathResolver, cfg: &CoderConfig, + abs: &Path, spec: UpdateFileSpec, -) -> Result<(u32, u64, Option, Option), CoderError> { - let abs = resolver.require_writable(&spec.path)?; - let md = std::fs::metadata(&abs)?; +) -> Result<(u32, u64, Vec, bool), CoderError> { + // NotFound is intercepted with the wire path in scope so the C211 + // message names the path the caller supplied (standardized wording — + // REDACTION INVARIANT: identical to the glob-denied message). + let md = std::fs::metadata(abs).map_err(|e| CoderError::io_for_path(e, &spec.path))?; if !md.is_file() { return Err(CoderError::BadInput(format!( "not a regular file: {}", @@ -144,7 +276,9 @@ fn try_update_one( } if md.len() > cfg.max_write_bytes { return Err(CoderError::TooLarge(format!( - "current file size {} exceeds max_write_bytes {}", + "{} current file size is {} bytes, which exceeds max_write_bytes \ + ({}). 
Split the content or raise max_write_bytes in coder config.", + spec.path, md.len(), cfg.max_write_bytes ))); @@ -153,44 +287,668 @@ fn try_update_one( return Err(CoderError::BadInput("ops must not be empty".into())); } - let bytes = std::fs::read(&abs)?; - let before = utf8_snapshot(&bytes, cfg.max_read_bytes); + let bytes = std::fs::read(abs).map_err(|e| CoderError::io_for_path(e, &spec.path))?; let (mut lines, ending, has_trailing) = split_file(&bytes); let original_len = lines.len(); - let (line_ops, replace_ops): (Vec<&UpdateOp>, Vec<&UpdateOp>) = - spec.ops.iter().partition(|op| is_line_op(op)); + let line_ops: Vec<&UpdateOp> = spec.ops.iter().filter(|op| is_line_op(op)).collect(); validate_line_ops(&line_ops, original_len)?; apply_line_ops(&mut lines, &line_ops)?; + // Mutation-event timeline: every line-count-changing mutation is + // recorded in true application order so echo anchors can be mapped + // to FINAL-body coordinates (see `MutationEvent`/`map_through_events`). + let mut events: Vec = Vec::new(); + let mut anchors: Vec = Vec::new(); + record_line_op_events(&spec.ops, &mut events, &mut anchors); + let mut new_bytes = join_lines(&lines, ending, has_trailing); - new_bytes = apply_regex_replaces(new_bytes, &replace_ops)?; + new_bytes = apply_regex_ops(new_bytes, &spec.ops, &mut events, &mut anchors)?; let (final_lines, _, _) = split_file(&new_bytes); if (new_bytes.len() as u64) > cfg.max_write_bytes { return Err(CoderError::TooLarge(format!( - "new file size {} exceeds max_write_bytes {}", + "{} new file size after ops is {} bytes, which exceeds \ + max_write_bytes ({}). Split the content or raise \ + max_write_bytes in coder config.", + spec.path, new_bytes.len(), cfg.max_write_bytes ))); } - atomic_write(&abs, &new_bytes)?; - let after = utf8_snapshot(&new_bytes, cfg.max_read_bytes); + atomic_write(abs, &new_bytes)?; + + // Build per-op echoes by mapping each anchor through the events that + // applied after it, then extracting from the FINAL body. 
+ let (echoes, echoes_truncated) = build_echoes(&anchors, &events, &final_lines); + Ok(( spec.ops.len() as u32, final_lines.len() as u64, - before, - after, + echoes, + echoes_truncated, )) } -/// Include a UTF-8 snapshot in the wire response when the body fits `max_read_bytes`. -fn utf8_snapshot(bytes: &[u8], max_bytes: u64) -> Option { - if (bytes.len() as u64) > max_bytes { +// --------------------------------------------------------------------------- +// Echo construction — mutation-event timeline +// +// Every line-count-changing mutation is recorded as an event in TRUE +// application order: line ops bottom-up first (exactly as `apply_line_ops` +// applies them), then each regex op's replacements sequentially in match +// order. Echo anchors captured at event k map to the final body by +// shifting through every later event (`map_through_events`); content is +// always extracted from the FINAL body, clamped so a coordinate bug can +// only produce a bounded wrong echo, never a panic. +// --------------------------------------------------------------------------- + +/// One newline-count-changing mutation in true application order. `pos` is +/// the 1-based line of the mutation in ITS OWN input body (the body after +/// all earlier events applied); `delta` is the line-count change it caused. +/// Zero-delta mutations are not recorded — they cannot shift any position. +#[derive(Debug, Clone, Copy)] +struct MutationEvent { + pos: i64, + delta: i64, +} + +/// A pending echo region captured when its mutation applied, expressed in +/// the body as it existed right after `events[..events_after]` ran. It is +/// mapped to the final body by shifting through every later event. +#[derive(Debug)] +struct EchoAnchor { + op_index: u32, + /// Inclusive 1-based region in the post-mutation body at capture time. + first: i64, + last: i64, + /// `events.len()` at capture time: only later events shift this anchor. 
+ events_after: usize, + /// Context lines around the region (±2 for line ops, 0 for regex sites). + context: i64, + /// Regex sites echo only the region's FIRST and LAST line (inner + /// lines counted in `elided`); line ops echo the full region with + /// head/tail elision. + first_last_only: bool, + /// Regex sites carry the op's total replacement count. + total_replacements: Option<u64>, +} + +/// Map a 1-based line position forward through `later` mutation events. +/// Every event at a position ≤ `pos` shifts it by the event's delta; +/// events strictly below never move content above them. Each event's +/// position is expressed in its own input frame — exactly the frame `pos` +/// occupies after shifting through all earlier events in the slice — so a +/// single left-to-right pass is correct. +fn map_through_events(pos: i64, later: &[MutationEvent]) -> i64 { + let mut p = pos; + for ev in later { + if ev.pos <= p { + p += ev.delta; + } + } + p +} + +fn count_newlines(bytes: &[u8]) -> i64 { + bytes.iter().filter(|&&b| b == b'\n').count() as i64 +} + +/// Record one mutation event + echo anchor per line op, in true application +/// order (descending anchor — mirrors `apply_line_ops`). In each op's input +/// frame its region starts at its ORIGINAL anchor: ops already applied all +/// sit strictly below it (overlap-validated), so they never shift it.
+fn record_line_op_events( + ops: &[UpdateOp], + events: &mut Vec<MutationEvent>, + anchors: &mut Vec<EchoAnchor>, +) { + let mut line_ops: Vec<(usize, &UpdateOp)> = ops + .iter() + .enumerate() + .filter(|(_, op)| is_line_op(op)) + .collect(); + line_ops.sort_by_key(|b| std::cmp::Reverse(anchor(b.1))); + + for (op_index, op) in line_ops { + let (pos, delta, first, last) = match op { + UpdateOp::Insert { at_line, content } => { + let m = split_content(content).len() as i64; + let a = *at_line as i64; + (a, m, a, a + m - 1) + } + UpdateOp::Remove { from_line, to_line } => { + let a = *from_line as i64; + let consumed = *to_line as i64 - a + 1; + // The region is gone — echo the lines now surrounding the + // removal point (context expands the window). + (a, -consumed, a - 1, a - 1) + } + UpdateOp::UpdateLines { + from_line, + to_line, + content, + } => { + let m = split_content(content).len() as i64; + let a = *from_line as i64; + let consumed = *to_line as i64 - a + 1; + (a, m - consumed, a, a + m - 1) + } + UpdateOp::Replace { .. } => unreachable!("line ops only"), + }; + if delta != 0 { + events.push(MutationEvent { pos, delta }); + } + anchors.push(EchoAnchor { + op_index: op_index as u32, + first, + last, + events_after: events.len(), + context: ECHO_CONTEXT, + first_last_only: false, + total_replacements: None, + }); + } +} + +/// Apply every `replace` op in `ops` (in array order), each on the body +/// produced by all earlier ops, recording mutation events and up to +/// `ECHO_MAX_SITES` echo anchors per op. Match positions are located on +/// each op's OWN input body; earlier replacements within the same op shift +/// later sites through the event list like any other mutation.
+fn apply_regex_ops( + mut bytes: Vec<u8>, + ops: &[UpdateOp], + events: &mut Vec<MutationEvent>, + anchors: &mut Vec<EchoAnchor>, +) -> Result<Vec<u8>, CoderError> { + for (op_index, op) in ops.iter().enumerate() { + let UpdateOp::Replace { + pattern, + replacement, + ignore_case, + dot_matches_newline, + expect_matches, + } = op + else { + continue; + }; + if pattern.is_empty() { + return Err(CoderError::BadInput( + "replace.pattern must not be empty".into(), + )); + } + let mut builder = regex::RegexBuilder::new(pattern); + builder.case_insensitive(*ignore_case); + builder.dot_matches_new_line(*dot_matches_newline); + let re = builder + .build() + .map_err(|e| CoderError::BadInput(format!("bad regex {pattern:?}: {e}")))?; + // R1 (v0.4.1): pre-write guard — every `$` capture reference in + // the replacement must name a group the pattern actually defines. + // Runs right after compilation, before ANY replacement work, so + // an undefined reference fails the entry (C210) with the file + // byte-identical on disk. + validate_replacement_refs(&re, pattern, replacement)?; + + let op_events_start = events.len(); + // (site first/last line in its own frame, events.len() at + // capture); anchors are created after the op's total count is + // known. `last` = first + newline count of the expanded + // replacement: the full post-replace region of this site. + let mut sites: Vec<(i64, i64, usize)> = Vec::new(); + let mut total: u64 = 0; + // Incremental input-frame line counter for ascending match starts.
+ let mut scan_pos: usize = 0; + let mut scan_line: i64 = 0; + + let s = String::from_utf8_lossy(&bytes); + let out = re + .replace_all(&s, |caps: &regex::Captures<'_>| { + let m = caps.get(0).expect("regex group 0 always present"); + scan_line += count_newlines(s[scan_pos..m.start()].as_bytes()); + scan_pos = m.start(); + let mut expanded = String::new(); + caps.expand(replacement, &mut expanded); + let expanded_newlines = count_newlines(expanded.as_bytes()); + let delta = expanded_newlines - count_newlines(m.as_str().as_bytes()); + // Input-frame line -> this match's own frame (earlier + // matches of this op may already have shifted it). + // O(k) per match within this op (k = earlier delta-events); + // O(M^2) total for M newline-changing matches. Acceptable: + // max_write_bytes caps input and most replaces are delta-0. + let pos = map_through_events(scan_line + 1, &events[op_events_start..]); + if delta != 0 { + events.push(MutationEvent { pos, delta }); + } + if sites.len() < ECHO_MAX_SITES { + sites.push((pos, pos + expanded_newlines, events.len())); + } + total += 1; + expanded + }) + .into_owned(); + // expect_matches guard: validated BEFORE any filesystem mutation — + // `try_update_one` only reaches `atomic_write` after every op in + // the file's pipeline succeeded, so erroring here leaves the file + // byte-identical on disk (earlier ops' changes are in-memory only) + // and the per-entry failure carries no echoes. + // HAZARD: the failing op's MutationEvents are already pushed into + // `events` — safety hinges on the WHOLE frame being discarded on + // Err; a future resumable/per-op refactor must rewind them. 
+ if let Some(expected) = expect_matches { + if total != *expected { + return Err(expect_matches_mismatch(pattern, total, *expected)); + } + } + bytes = out.into_bytes(); + for (first, last, events_after) in sites { + anchors.push(EchoAnchor { + op_index: op_index as u32, + first, + last, + events_after, + context: 0, + first_last_only: true, + total_replacements: Some(total), + }); + } + } + Ok(bytes) +} + +/// Max pattern chars echoed in an `expect_matches` mismatch message — +/// enough to disambiguate WHICH replace op failed in a multi-op file +/// without flooding the error with a huge regex. +const PATTERN_SNIPPET_MAX_CHARS: usize = 60; + +/// Quote a pattern for an error message, truncating to +/// [`PATTERN_SNIPPET_MAX_CHARS`] (char-boundary safe) with a `…` marker. +fn pattern_snippet(pattern: &str) -> String { + if pattern.chars().count() <= PATTERN_SNIPPET_MAX_CHARS { + format!("\"{pattern}\"") + } else { + let head: String = pattern.chars().take(PATTERN_SNIPPET_MAX_CHARS).collect(); + format!("\"{head}…\"") + } +} + +/// "1 time" / "N times" — mismatch messages read like prose, and the +/// assert-absent failure commonly reports exactly one match. +fn times(n: u64) -> String { + if n == 1 { + "1 time".to_string() + } else { + format!("{n} times") + } +} + +/// Prescriptive C210 for an `expect_matches` mismatch: names the failing +/// pattern (truncated) plus the actual and expected counts, and the +/// corrective next call. Nothing matched → re-check anchors with +/// `coder::search` / `dot_matches_newline`. Expected 0 (assert-absent) +/// but something matched → also route to `coder::search`; suggesting +/// `expect_matches: {actual}` here would invert the caller's +/// assert-absent intent. Otherwise → tighten anchors or accept the +/// observed count. 
+fn expect_matches_mismatch(pattern: &str, actual: u64, expected: u64) -> CoderError { + let shown = pattern_snippet(pattern); + if actual == 0 { + CoderError::BadInput(format!( + "replace pattern {shown} matched 0 times, expected {expected} — \ + re-check the anchors with coder::search (the pattern may need \ + dot_matches_newline: true or different anchor text)" + )) + } else if expected == 0 { + CoderError::BadInput(format!( + "replace pattern {shown} matched {}, expected 0 — \ + the pattern asserted absent still matches; re-check the \ + match sites with coder::search before replacing", + times(actual) + )) + } else { + CoderError::BadInput(format!( + "replace pattern {shown} matched {}, expected {expected} — \ + anchor the regex more tightly (add surrounding context) or set \ + expect_matches: {actual}", + times(actual) + )) + } +} + +// --------------------------------------------------------------------------- +// R1 (v0.4.1) — pre-write validation of replacement capture references. +// +// The regex crate expands UNDEFINED `$` references to the EMPTY STRING +// (documented `Captures::expand` behavior). A replacement carrying +// literal `$…` text — JS/TS template literals being the production case +// (session q8x6g248: `Hello, ${name}!` silently became `Hello, !` with +// success: true) — therefore corrupts the file unless every reference is +// checked against the pattern's actual groups BEFORE any replacement +// work. The tokenizer below mirrors regex-automata's +// `util::interpolate::{string, find_cap_ref}` exactly; the parity test +// `r1_validator_matches_expand_semantics` pins both sides. A regex bump +// that breaks that test means interpolate semantics moved — re-mirror +// this tokenizer. +// --------------------------------------------------------------------------- + +/// A capture reference parsed from a replacement string, plus the exact +/// text it was written as (`$1a`, `${name}`, …) for error messages. 
+struct ReplacementRef<'a> { + /// The reference as written, including the leading `$`. + written: &'a str, + kind: RefKind<'a>, +} + +enum RefKind<'a> { + Index(usize), + Named(&'a str), +} + +/// Tokenize `replacement` EXACTLY like `Captures::expand`: +/// - `$$` is a literal-`$` escape (no reference); +/// - unbraced `$ref` consumes the longest run of `[0-9A-Za-z_]`; +/// - braced `${ref}` accepts ANY text up to the first `}` (`${}` +/// included — a reference to the empty name, which no pattern can +/// define); +/// - a `$` that cannot start a reference (`$ `, `$.`, trailing `$`, +/// unclosed `${`) is literal; +/// - refs that parse as `usize` are INDEX refs, everything else is +/// NAMED — so `$1a` names the group "1a", it is NOT `$1` then "a". +fn replacement_refs(replacement: &str) -> Vec<ReplacementRef<'_>> { + let mut refs = Vec::new(); + let mut s = replacement; + while let Some(i) = s.find('$') { + s = &s[i..]; + if s.as_bytes().get(1) == Some(&b'$') { + s = &s[2..]; // `$$` escape — literal `$`, no reference. + continue; + } + match parse_cap_ref(s) { + Some((kind, end)) => { + refs.push(ReplacementRef { + written: &s[..end], + kind, + }); + s = &s[end..]; + } + None => s = &s[1..], // literal `$` + } + } + refs +} + +/// Parse one possible capture reference at the start of `s` (which +/// begins with `$`). Returns the reference and its end offset, or None +/// when the `$` cannot start a reference (it is then literal). Mirrors +/// regex-automata's `find_cap_ref`/`find_cap_ref_braced`. +fn parse_cap_ref(s: &str) -> Option<(RefKind<'_>, usize)> { + let rep = s.as_bytes(); + if rep.len() <= 1 || rep[0] != b'$' { + return None; + } + if rep[1] == b'{' { + // Braced: anything up to the first `}`; unclosed brace → literal. + let close = s[2..].find('}')? 
+ 2; + return Some((classify_cap_ref(&s[2..close]), close + 1)); + } + let mut end = 1; + while rep.get(end).copied().is_some_and(is_cap_letter) { + end += 1; + } + if end == 1 { return None; } - std::str::from_utf8(bytes).ok().map(str::to_owned) + Some((classify_cap_ref(&s[1..end]), end)) +} + +/// usize-parseable refs are INDEX references; everything else is NAMED +/// (matching `find_cap_ref`'s `parse::<usize>()` fallback). +fn classify_cap_ref(name: &str) -> RefKind<'_> { + match name.parse::<usize>() { + Ok(i) => RefKind::Index(i), + Err(_) => RefKind::Named(name), + } +} + +fn is_cap_letter(b: u8) -> bool { + matches!(b, b'0'..=b'9' | b'a'..=b'z' | b'A'..=b'Z' | b'_') +} + +/// Validate every capture reference in `replacement` against the +/// compiled pattern's actual groups. First undefined reference → +/// prescriptive C210; nothing is written for the entry (the caller only +/// reaches `atomic_write` after every op succeeded). +fn validate_replacement_refs( + re: &regex::Regex, + pattern: &str, + replacement: &str, +) -> Result<(), CoderError> { + let group_count = re.captures_len(); // includes group 0 (whole match) + for r in replacement_refs(replacement) { + let defined = match r.kind { + RefKind::Index(i) => i < group_count, + RefKind::Named(name) => re.capture_names().flatten().any(|n| n == name), + }; + if !defined { + return Err(undefined_capture_ref(pattern, re, &r)); + } + } + Ok(()) +} + +/// Prescriptive C210 for an undefined capture reference: names the +/// offending reference as written, states what the pattern actually +/// defines, and gives the corrective rewrites (escape literal `$` as +/// `$$` — the JS/TS template-literal collision — or add the group). 
+fn undefined_capture_ref(pattern: &str, re: &regex::Regex, r: &ReplacementRef<'_>) -> CoderError { + let shown = pattern_snippet(pattern); + let explicit = re.captures_len() - 1; + let named: Vec<String> = re + .capture_names() + .flatten() + .map(|n| format!("`{n}`")) + .collect(); + let plural = if explicit == 1 { "" } else { "s" }; + let defines = match (&r.kind, explicit) { + (RefKind::Index(_), 0) => { + "defines 0 capture groups (only $0, the whole match, is valid)".to_string() + } + (RefKind::Index(_), n) => { + format!("defines {n} capture group{plural} (valid: $0, the whole match, through ${n})") + } + (RefKind::Named(name), 0) => { + format!("defines 0 capture groups and no group named `{name}`") + } + (RefKind::Named(name), n) if named.is_empty() => { + format!("defines {n} unnamed capture group{plural} and no group named `{name}`") + } + (RefKind::Named(name), n) => format!( + "defines {n} capture group{plural} (named: {}) and no group named `{name}`", + named.join(", ") + ), + }; + let written = r.written; + CoderError::BadInput(format!( + "replacement references capture group `{written}` but pattern {shown} \ + {defines} — the regex engine expands undefined references to the \ + EMPTY STRING, silently corrupting the file. Escape literal `$` as \ + `$$` (write `${written}` to output a literal `{written}` — common \ + when the replacement contains JS/TS template literals), or add the \ + capture group to the pattern" + )) +} + +/// Emit per-op echoes in op-index order (sites keep match order), mapping +/// each anchor to the final body and enforcing the per-file byte budget. 
+fn build_echoes( + anchors: &[EchoAnchor], + events: &[MutationEvent], + final_lines: &[String], +) -> (Vec<OpEcho>, bool) { + let mut order: Vec<usize> = (0..anchors.len()).collect(); + order.sort_by_key(|&i| anchors[i].op_index); + + let mut echoes: Vec<OpEcho> = Vec::new(); + let mut budget_bytes: usize = ECHO_BUDGET_BYTES; + for &i in &order { + let a = &anchors[i]; + let later = &events[a.events_after.min(events.len())..]; + let first = map_through_events(a.first, later); + let last = map_through_events(a.last, later); + let echo = if a.first_last_only { + build_site_echo(a.op_index, first, last, final_lines, a.total_replacements) + } else { + build_line_echo( + a.op_index, + first, + last, + a.context, + final_lines, + a.total_replacements, + ) + }; + if !append_echo(echo, &mut echoes, &mut budget_bytes) { + return (echoes, true); + } + } + (echoes, false) +} + +/// Build a replace-site `OpEcho` for the post-replace region +/// `[post_first..=post_last]` (1-based): single- and two-line regions +/// echo every line; larger regions echo only the FIRST and LAST line +/// with `elided` set to the inner line count, so the tail of a +/// multi-line replacement is always visible without flooding the budget +/// (sites are capped at [`ECHO_MAX_SITES`] per op). +/// +/// DEFENSIVE (panic guard): same clamping discipline as +/// [`build_line_echo`] — a wrong upstream coordinate degrades to a +/// bounded, possibly-off-position echo, never a panic. 
+fn build_site_echo( + op_index: u32, + post_first: i64, + post_last: i64, + final_lines: &[String], + total_replacements: Option<u64>, +) -> OpEcho { + let n = final_lines.len() as i64; + if n == 0 { + return OpEcho { + op_index, + from_line: 1, + lines: vec![], + elided: None, + total_replacements, + }; + } + let start = post_first.saturating_sub(1).clamp(0, n - 1); + let mut end = post_last.saturating_sub(1).clamp(0, n - 1); + if end < start { + // Inverted region (a collapse event pulled `last` above `first`): + // degrade to a single-line echo, mirroring build_line_echo. + end = start; + } + let (start, end) = (start as usize, end as usize); + let region_len = end - start + 1; // end >= start by construction + let from_line = (start + 1) as u64; + if region_len <= 2 { + OpEcho { + op_index, + from_line, + lines: final_lines[start..=end].to_vec(), + elided: None, + total_replacements, + } + } else { + OpEcho { + op_index, + from_line, + lines: vec![final_lines[start].clone(), final_lines[end].clone()], + elided: Some((region_len - 2) as u64), + total_replacements, + } + } +} + +/// Append an echo if it fits in the budget. Returns false when the budget +/// is exhausted (the first echo is always admitted). +fn append_echo(echo: OpEcho, echoes: &mut Vec<OpEcho>, budget_bytes: &mut usize) -> bool { + let estimate = echo.lines.iter().map(|l| l.len() + 1).sum::<usize>() + 64; // overhead + if *budget_bytes < estimate && !echoes.is_empty() { + return false; + } + *budget_bytes = budget_bytes.saturating_sub(estimate); + echoes.push(echo); + true +} + +/// Build an `OpEcho` for the post-apply region `[post_first..=post_last]` +/// (1-based), expanded by `context` lines on each side, eliding the middle +/// of large regions. +/// +/// DEFENSIVE (panic guard): all coordinates are saturating/clamped to the +/// final body, so a wrong upstream coordinate degrades to a bounded, +/// possibly-off-position echo — never an arithmetic or slice panic. 
+fn build_line_echo( + op_index: u32, + post_first: i64, + post_last: i64, + context: i64, + final_lines: &[String], + total_replacements: Option<u64>, +) -> OpEcho { + let n = final_lines.len() as i64; + if n == 0 { + return OpEcho { + op_index, + from_line: 1, + lines: vec![], + elided: None, + total_replacements, + }; + } + + let start = post_first + .saturating_sub(1) + .saturating_sub(context) + .clamp(0, n - 1); + let mut end = post_last + .saturating_sub(1) + .saturating_add(context) + .clamp(0, n - 1); + if end < start { + // Inverted region (e.g. empty insert, or a collapse event pulled + // `last` above `first`): degrade to a single-line echo. + // Known cosmetic (ADV-1): the echoed line may be off by one + // from the ideal collapse point; accepted as harmless. + end = start; + } + let (start, end) = (start as usize, end as usize); + let region_len = end - start + 1; // end >= start by construction + let from_line = (start + 1) as u64; + + if region_len <= ECHO_MAX_LINES { + OpEcho { + op_index, + from_line, + lines: final_lines[start..=end].to_vec(), + elided: None, + total_replacements, + } + } else { + // Large region: first ECHO_HEAD_TAIL + last ECHO_HEAD_TAIL lines. 
+ let mut lines = final_lines[start..start + ECHO_HEAD_TAIL].to_vec(); + let tail_start = (end + 1).saturating_sub(ECHO_HEAD_TAIL); + lines.extend_from_slice(&final_lines[tail_start..=end]); + OpEcho { + op_index, + from_line, + lines, + elided: Some((region_len - ECHO_HEAD_TAIL * 2) as u64), + total_replacements, + } + } } fn is_line_op(op: &UpdateOp) -> bool { @@ -239,31 +997,12 @@ fn apply_line_ops(lines: &mut Vec<String>, ops: &[&UpdateOp]) -> Result<(), Code Ok(()) } -fn apply_regex_replaces(mut bytes: Vec<u8>, ops: &[&UpdateOp]) -> Result<Vec<u8>, CoderError> { - for op in ops { - let UpdateOp::Replace { - pattern, - replacement, - ignore_case, - } = op - else { - continue; - }; - if pattern.is_empty() { - return Err(CoderError::BadInput( - "replace.pattern must not be empty".into(), - )); - } - let mut builder = regex::RegexBuilder::new(pattern); - builder.case_insensitive(*ignore_case); - let re = builder - .build() - .map_err(|e| CoderError::BadInput(format!("bad regex {pattern:?}: {e}")))?; - let s = String::from_utf8_lossy(&bytes); - let out = re.replace_all(&s, replacement.as_str()); - bytes = out.into_owned().into_bytes(); - } - Ok(bytes) +// Keep the old name for the unit tests that exercise regex semantics +// without caring about echo traces. 
+#[cfg(test)] +fn apply_regex_replaces(bytes: Vec<u8>, ops: &[&UpdateOp]) -> Result<Vec<u8>, CoderError> { + let owned: Vec<UpdateOp> = ops.iter().map(|op| (*op).clone()).collect(); + apply_regex_ops(bytes, &owned, &mut Vec::new(), &mut Vec::new()) } /// Apply line ops only — used by unit tests that exercise line semantics @@ -668,6 +1407,8 @@ mod tests { pattern: "foo".into(), replacement: "baz".into(), ignore_case: false, + dot_matches_newline: false, + expect_matches: None, }], ) .unwrap(); @@ -682,6 +1423,8 @@ mod tests { pattern: r"(\w+)=(\d+)".into(), replacement: "$1: $2".into(), ignore_case: false, + dot_matches_newline: false, + expect_matches: None, }], ) .unwrap(); @@ -696,6 +1439,8 @@ mod tests { pattern: "foo".into(), replacement: "bar".into(), ignore_case: true, + dot_matches_newline: false, + expect_matches: None, }], ) .unwrap(); @@ -710,6 +1455,8 @@ mod tests { pattern: "[unclosed".into(), replacement: "y".into(), ignore_case: false, + dot_matches_newline: false, + expect_matches: None, }], ) .unwrap_err(); @@ -724,6 +1471,8 @@ mod tests { pattern: String::new(), replacement: "y".into(), ignore_case: false, + dot_matches_newline: false, + expect_matches: None, }], ) .unwrap_err(); @@ -738,90 +1487,454 @@ mod tests { pattern: "missing".into(), replacement: "x".into(), ignore_case: false, + dot_matches_newline: false, + expect_matches: None, }], ) .unwrap(); assert_eq!(out, b"hello"); } -} - -#[cfg(test)] -mod handler_tests { - use super::*; - use tempfile::tempdir; - fn setup() -> (tempfile::TempDir, Arc<PathResolver>, Arc<CoderConfig>) { - let tmp = tempdir().unwrap(); - let cfg = Arc::new(CoderConfig { - base_path: tmp.path().to_path_buf(), - non_accessible_globs: vec!["**/.env".to_string()], - max_read_bytes: 1024 * 1024, - max_write_bytes: 1024 * 1024, - ..CoderConfig::default() - }); - let resolver = Arc::new(PathResolver::new(&cfg).unwrap()); - (tmp, resolver, cfg) - } + // ----------------------------------------------------------------------- + // S3 (v0.4.0): dot_matches_newline + 
expect_matches regex semantics. + // ----------------------------------------------------------------------- - #[tokio::test] - async fn end_to_end_single_file_update_lines_writes_atomically() { - let (tmp, r, c) = setup(); - std::fs::write(tmp.path().join("a.txt"), "1\n2\n3\n").unwrap(); - let out = handle( - r, - c, - UpdateFileInput { - files: vec![UpdateFileSpec { - path: "a.txt".into(), - ops: vec![UpdateOp::UpdateLines { - from_line: 2, - to_line: 2, - content: "TWO".into(), - }], - }], - }, + #[test] + fn regex_dot_does_not_cross_newlines_by_default() { + let out = apply_regex_replaces( + b"start\nmiddle\nend\n".to_vec(), + &[&UpdateOp::Replace { + pattern: "start.*?end".into(), + replacement: "X".into(), + ignore_case: false, + dot_matches_newline: false, + expect_matches: None, + }], ) - .await .unwrap(); - assert_eq!(out.results.len(), 1); - let r0 = &out.results[0]; - assert!(r0.success, "got: {:?}", r0.error); - assert_eq!(r0.applied, 1); - assert_eq!( - std::fs::read_to_string(tmp.path().join("a.txt")).unwrap(), - "1\nTWO\n3\n" - ); + assert_eq!(out, b"start\nmiddle\nend\n"); } - #[tokio::test] - async fn end_to_end_regex_replace() { - let (tmp, r, c) = setup(); - std::fs::write(tmp.path().join("a.txt"), "foo bar foo\n").unwrap(); - let out = handle( - r, - c, - UpdateFileInput { - files: vec![UpdateFileSpec { - path: "a.txt".into(), - ops: vec![UpdateOp::Replace { - pattern: "foo".into(), - replacement: "baz".into(), - ignore_case: false, - }], - }], - }, + #[test] + fn regex_dot_matches_newline_crosses_lines() { + let out = apply_regex_replaces( + b"start\nmiddle\nend\n".to_vec(), + &[&UpdateOp::Replace { + pattern: "start.*?end".into(), + replacement: "X".into(), + ignore_case: false, + dot_matches_newline: true, + expect_matches: None, + }], ) - .await .unwrap(); - assert!(out.results[0].success); - assert_eq!( - std::fs::read_to_string(tmp.path().join("a.txt")).unwrap(), - "baz bar baz\n" - ); + assert_eq!(out, b"X\n"); } - #[tokio::test] - async 
fn mixed_update_lines_then_regex_replace() { + #[test] + fn regex_expect_matches_exact_count_ok() { + let out = apply_regex_replaces( + b"foo bar".to_vec(), + &[&UpdateOp::Replace { + pattern: "foo".into(), + replacement: "baz".into(), + ignore_case: false, + dot_matches_newline: false, + expect_matches: Some(1), + }], + ) + .unwrap(); + assert_eq!(out, b"baz bar"); + } + + #[test] + fn regex_expect_matches_mismatch_is_c210() { + let err = apply_regex_replaces( + b"foo foo foo".to_vec(), + &[&UpdateOp::Replace { + pattern: "foo".into(), + replacement: "bar".into(), + ignore_case: false, + dot_matches_newline: false, + expect_matches: Some(1), + }], + ) + .unwrap_err(); + assert_eq!(err.code(), "C210"); + } + + // --------------------------------------------------------------------------- + // Echo primitive unit tests (the scenario coverage lives in + // handler_tests, driven through the real pipeline) + // --------------------------------------------------------------------------- + + #[test] + fn map_through_events_shifts_only_positions_at_or_below() { + let events = [ + MutationEvent { pos: 2, delta: 1 }, + MutationEvent { pos: 10, delta: -3 }, + ]; + // Above both events (after the first shifts it past the second). + assert_eq!(map_through_events(12, &events), 10); + // Between: only the first event is at/below. + assert_eq!(map_through_events(5, &events), 6); + // Below both: untouched. + assert_eq!(map_through_events(1, &events), 1); + // Exactly at an event position: shifted (<= rule). + assert_eq!(map_through_events(2, &events), 3); + } + + #[test] + fn map_through_events_sequential_frames() { + // Three line-collapsing replacements all at (current-frame) line 1: + // a position at line 4 must end up at line 1. 
+ let events = [ + MutationEvent { pos: 1, delta: -1 }, + MutationEvent { pos: 1, delta: -1 }, + MutationEvent { pos: 1, delta: -1 }, + ]; + assert_eq!(map_through_events(4, &events), 1); + } + + /// A1b panic guard: out-of-range and inverted regions must degrade to + /// bounded echoes, never panic (debug overflow or release slice OOB). + #[test] + fn build_line_echo_clamps_out_of_range_regions() { + let final_lines: Vec<String> = vec!["a".into(), "b".into()]; + // Region far beyond EOF. + let e = build_line_echo(0, 9, 9, 2, &final_lines, None); + assert!(e.lines.len() <= 2); + assert_eq!(e.from_line, 2); + // Inverted region (last << first). + let e = build_line_echo(0, 5, -3, 2, &final_lines, None); + assert!(e.lines.len() <= 2); + // Negative positions. + let e = build_line_echo(0, -10, -10, 2, &final_lines, None); + assert_eq!(e.from_line, 1); + // Empty final body. + let e = build_line_echo(0, 1, 1, 2, &[], None); + assert!(e.lines.is_empty()); + // i64 extremes must not overflow the saturating math. + let e = build_line_echo(0, i64::MIN, i64::MAX, 2, &final_lines, None); + assert!(e.lines.len() <= 2); + } + + /// R4 panic guard (mirrors the A1b test above): `build_site_echo` + /// claims the same clamping discipline as `build_line_echo` — pin it + /// at unit level: out-of-range and inverted regions degrade to + /// bounded echoes, never panic. + #[test] + fn build_site_echo_clamps_out_of_range_regions() { + let final_lines: Vec<String> = vec!["a".into(), "b".into()]; + // Region far beyond EOF. + let e = build_site_echo(0, 9, 9, &final_lines, None); + assert!(e.lines.len() <= 2); + assert_eq!(e.from_line, 2); + // Inverted region (last << first): degrades to a single line. + let e = build_site_echo(0, 5, -3, &final_lines, None); + assert_eq!(e.lines, vec!["b"]); + assert_eq!(e.elided, None); + // Negative positions. + let e = build_site_echo(0, -10, -10, &final_lines, None); + assert_eq!(e.from_line, 1); + // Empty final body. 
+ let e = build_site_echo(0, 1, 1, &[], None); + assert!(e.lines.is_empty()); + // i64 extremes must not overflow the saturating math; the huge + // clamped region still echoes only first + last with elision. + let e = build_site_echo(0, i64::MIN, i64::MAX, &final_lines, None); + assert!(e.lines.len() <= 2); + } + + #[test] + fn build_line_echo_elides_large_regions() { + let final_lines: Vec<String> = (1..=100).map(|i| format!("L{i}")).collect(); + let e = build_line_echo(0, 1, 100, 2, &final_lines, None); + assert_eq!(e.lines.len(), ECHO_HEAD_TAIL * 2); + assert_eq!(e.elided, Some(100 - ECHO_HEAD_TAIL as u64 * 2)); + assert_eq!(e.from_line, 1); + assert_eq!(e.lines[0], "L1"); + assert_eq!(e.lines[ECHO_HEAD_TAIL * 2 - 1], "L100"); + } + + // ----------------------------------------------------------------------- + // R1 (v0.4.1): validator-vs-expand parity. + // ----------------------------------------------------------------------- + + /// R1 — the validator must tokenize `replacement` EXACTLY like + /// `Captures::expand` (regex-automata `util::interpolate`). Each row + /// pins BOTH (a) the validator's verdict and (b) the actual + /// `expand()` output, so any divergence between our tokenizer and the + /// regex crate's interpolation shows up here. Every `valid: false` + /// row's expansion demonstrates the hazard the validator guards: the + /// undefined reference silently expands to the EMPTY STRING. + #[test] + fn r1_validator_matches_expand_semantics() { + struct Case { + pattern: &'static str, + haystack: &'static str, + replacement: &'static str, + valid: bool, + expanded: &'static str, + } + let cases = [ + // Index and named references to defined groups: valid. 
+ Case { + pattern: r"(\w+)=(\d+)", + haystack: "HOST=8080", + replacement: "$1: $2", + valid: true, + expanded: "HOST: 8080", + }, + Case { + pattern: r"(\w+)", + haystack: "abc", + replacement: "[$0]", + valid: true, + expanded: "[abc]", + }, + Case { + pattern: r"(?P<num>\d+)", + haystack: "42", + replacement: "n=$num", + valid: true, + expanded: "n=42", + }, + // `$$` is the literal-$ escape: no reference, valid. + Case { + pattern: "foo", + haystack: "foo", + replacement: "a $$ b", + valid: true, + expanded: "a $ b", + }, + Case { + pattern: "foo", + haystack: "foo", + replacement: "$${name}", + valid: true, + expanded: "${name}", + }, + // The production hazard: undefined named reference (JS/TS + // template literal) expands to the EMPTY STRING. + Case { + pattern: "foo", + haystack: "foo", + replacement: "`Hello, ${name}!`", + valid: false, + expanded: "`Hello, !`", + }, + // Unbraced refs consume the longest [0-9A-Za-z_] run: `$1a` + // names group "1a" (NOT group 1 followed by "a"). + Case { + pattern: "(x)", + haystack: "x", + replacement: "$1a", + valid: false, + expanded: "", + }, + Case { + pattern: "(x)", + haystack: "x", + replacement: "${1}a", + valid: true, + expanded: "xa", + }, + // `$1_` is also one named run ("1_"), undefined here; the + // following `$0` is a separate, defined reference. + Case { + pattern: "(x)", + haystack: "x", + replacement: "$1_$0", + valid: false, + expanded: "x", + }, + // Index out of range. + Case { + pattern: "(x)", + haystack: "x", + replacement: "$2", + valid: false, + expanded: "", + }, + // Leading zeros still parse as an index. + Case { + pattern: "(x)", + haystack: "x", + replacement: "$01", + valid: true, + expanded: "x", + }, + // A `$` that cannot start a reference is literal. 
+ Case { + pattern: "foo", + haystack: "foo", + replacement: "$ price", + valid: true, + expanded: "$ price", + }, + Case { + pattern: "foo", + haystack: "foo", + replacement: "100$", + valid: true, + expanded: "100$", + }, + Case { + pattern: "foo", + haystack: "foo", + replacement: "$.x", + valid: true, + expanded: "$.x", + }, + // Unclosed brace: literal, valid. + Case { + pattern: "foo", + haystack: "foo", + replacement: "${unclosed", + valid: true, + expanded: "${unclosed", + }, + // Empty braces ARE a reference (to the empty name) — a group + // can never be named "", so expand drops it: invalid. + Case { + pattern: "foo", + haystack: "foo", + replacement: "a${}b", + valid: false, + expanded: "ab", + }, + // `$` followed by non-ASCII: `é` is not a cap letter, so the + // `$` is literal (UTF-8 boundary handling in the tokenizer + // must not split the multi-byte char). + Case { + pattern: "foo", + haystack: "foo", + replacement: "$é", + valid: true, + expanded: "$é", + }, + // A numeric run too large for usize (2^64) falls back to a + // NAMED reference — undefined, expands to the empty string. + Case { + pattern: "foo", + haystack: "foo", + replacement: "$18446744073709551616", + valid: false, + expanded: "", + }, + // Defined-but-non-participating groups are VALID (they may be + // empty at runtime; that is not an undefined reference). 
+ Case { + pattern: "(a)|(b)", + haystack: "b", + replacement: "<$1$2>", + valid: true, + expanded: "<b>", + }, + ]; + for c in &cases { + let re = regex::Regex::new(c.pattern).unwrap(); + let verdict = validate_replacement_refs(&re, c.pattern, c.replacement); + assert_eq!( + verdict.is_ok(), + c.valid, + "validator verdict for replacement {:?} against pattern {:?} \ + (got: {verdict:?})", + c.replacement, + c.pattern + ); + let caps = re.captures(c.haystack).expect("haystack must match"); + let mut out = String::new(); + caps.expand(c.replacement, &mut out); + assert_eq!( + out, c.expanded, + "expand() output for replacement {:?} against pattern {:?}", + c.replacement, c.pattern + ); + } + } +} + +#[cfg(test)] +mod handler_tests { + use super::*; + use tempfile::tempdir; + + fn setup() -> (tempfile::TempDir, Arc<PathResolver>, Arc<CoderConfig>) { + let tmp = tempdir().unwrap(); + let cfg = Arc::new(CoderConfig { + base_paths: vec![tmp.path().to_path_buf()], + non_accessible_globs: vec!["**/.env".to_string()], + max_read_bytes: 1024 * 1024, + max_write_bytes: 1024 * 1024, + ..CoderConfig::default() + }); + let resolver = Arc::new(PathResolver::new(&cfg).unwrap()); + (tmp, resolver, cfg) + } + + #[tokio::test] + async fn end_to_end_single_file_update_lines_writes_atomically() { + let (tmp, r, c) = setup(); + std::fs::write(tmp.path().join("a.txt"), "1\n2\n3\n").unwrap(); + let out = handle( + r, + c, + UpdateFileInput { + files: vec![UpdateFileSpec { + path: "a.txt".into(), + ops: vec![UpdateOp::UpdateLines { + from_line: 2, + to_line: 2, + content: "TWO".into(), + }], + }], + }, + ) + .await + .unwrap(); + assert_eq!(out.results.len(), 1); + let r0 = &out.results[0]; + assert!(r0.success, "got: {:?}", r0.error); + assert_eq!(r0.applied, 1); + assert_eq!( + std::fs::read_to_string(tmp.path().join("a.txt")).unwrap(), + "1\nTWO\n3\n" + ); + } + + #[tokio::test] + async fn end_to_end_regex_replace() { + let (tmp, r, c) = setup(); + std::fs::write(tmp.path().join("a.txt"), "foo bar foo\n").unwrap(); 
+ let out = handle( + r, + c, + UpdateFileInput { + files: vec![UpdateFileSpec { + path: "a.txt".into(), + ops: vec![UpdateOp::Replace { + pattern: "foo".into(), + replacement: "baz".into(), + ignore_case: false, + dot_matches_newline: false, + expect_matches: None, + }], + }], + }, + ) + .await + .unwrap(); + assert!(out.results[0].success); + assert_eq!( + std::fs::read_to_string(tmp.path().join("a.txt")).unwrap(), + "baz bar baz\n" + ); + } + + #[tokio::test] + async fn mixed_update_lines_then_regex_replace() { let (tmp, r, c) = setup(); std::fs::write(tmp.path().join("a.txt"), "OLD\nkeep\nOLD\n").unwrap(); let out = handle( @@ -839,6 +1952,8 @@ mod handler_tests { pattern: "OLD".into(), replacement: "NEW".into(), ignore_case: false, + dot_matches_newline: false, + expect_matches: None, }, ], }], @@ -868,6 +1983,8 @@ mod handler_tests { pattern: "=".into(), replacement: "=\n".into(), ignore_case: false, + dot_matches_newline: false, + expect_matches: None, }], }], }, @@ -876,12 +1993,15 @@ mod handler_tests { .unwrap(); assert!(out.results[0].success); assert_eq!(out.results[0].new_line_count, 2); - assert_eq!(out.results[0].before.as_deref(), Some("a=b\n")); - assert_eq!(out.results[0].after.as_deref(), Some("a=\nb\n")); + // No before/after fields; verify via file content and echo. + assert_eq!( + std::fs::read_to_string(tmp.path().join("a.txt")).unwrap(), + "a=\nb\n" + ); } #[tokio::test] - async fn success_includes_utf8_before_and_after_snapshots() { + async fn success_includes_echo_not_full_body() { let (tmp, r, c) = setup(); std::fs::write(tmp.path().join("a.txt"), "old\n").unwrap(); let out = handle( @@ -900,8 +2020,81 @@ mod handler_tests { ) .await .unwrap(); - assert_eq!(out.results[0].before.as_deref(), Some("old\n")); - assert_eq!(out.results[0].after.as_deref(), Some("new\n")); + let r0 = &out.results[0]; + assert!(r0.success); + // No full-body fields; echoes present. 
+ assert!(!r0.echoes.is_empty(), "echo should be present on success"); + assert!(r0.echoes[0].lines.contains(&"new".to_string())); + } + + #[tokio::test] + async fn echo_insert_appears_in_handler_result() { + let (tmp, r, c) = setup(); + std::fs::write(tmp.path().join("a.txt"), "1\n2\n3\n").unwrap(); + let out = handle( + r, + c, + UpdateFileInput { + files: vec![UpdateFileSpec { + path: "a.txt".into(), + ops: vec![UpdateOp::Insert { + at_line: 2, + content: "inserted".into(), + }], + }], + }, + ) + .await + .unwrap(); + let r0 = &out.results[0]; + assert!(r0.success); + assert!(!r0.echoes.is_empty()); + let e = &r0.echoes[0]; + assert!( + e.lines.contains(&"inserted".to_string()), + "echo should contain inserted: {e:?}" + ); + } + + #[tokio::test] + async fn echo_regex_replace_site_appears_in_result() { + let (tmp, r, c) = setup(); + std::fs::write( + tmp.path().join("a.txt"), + "hello world\nfoo bar\nhello again\n", + ) + .unwrap(); + let out = handle( + r, + c, + UpdateFileInput { + files: vec![UpdateFileSpec { + path: "a.txt".into(), + ops: vec![UpdateOp::Replace { + pattern: "hello".into(), + replacement: "HI".into(), + ignore_case: false, + dot_matches_newline: false, + expect_matches: None, + }], + }], + }, + ) + .await + .unwrap(); + let r0 = &out.results[0]; + assert!(r0.success); + // Should have echo(es) showing match sites. 
+ let all_lines: Vec<&str> = r0 + .echoes + .iter() + .flat_map(|e| e.lines.iter().map(|s| s.as_str())) + .collect(); + assert!( + all_lines.iter().any(|l| l.contains("HI")), + "echo should contain replaced text; echoes: {:?}", + r0.echoes + ); } #[tokio::test] @@ -935,7 +2128,7 @@ mod handler_tests { .unwrap(); assert_eq!(out.results.len(), 2); assert!(!out.results[0].success); - assert!(out.results[0].error.as_deref().unwrap().contains("C211")); + assert_eq!(out.results[0].error.as_ref().unwrap().code, "C211"); assert!(out.results[1].success); assert_eq!( std::fs::read_to_string(tmp.path().join("a.txt")).unwrap(), @@ -943,6 +2136,59 @@ mod handler_tests { ); } + // C211 IDENTICAL-WORDING (REDACTION INVARIANT), tested end-to-end + // through the REAL handler: a genuinely missing file and a glob-denied + // file must produce byte-identical message suffixes after the + // caller-supplied path prefix, so callers cannot distinguish the two. + #[tokio::test] + async fn c211_wording_identical_for_missing_and_glob_denied() { + let (tmp, r, c) = setup(); + // ".env" exists on disk but matches the non-accessible glob; + // "missing.txt" does not exist at all. + std::fs::write(tmp.path().join(".env"), "secret").unwrap(); + let spec = |path: &str| UpdateFileSpec { + path: path.into(), + ops: vec![UpdateOp::Insert { + at_line: 1, + content: "x".into(), + }], + }; + let out = handle( + r, + c, + UpdateFileInput { + files: vec![spec("missing.txt"), spec(".env")], + }, + ) + .await + .unwrap(); + let missing_err = out.results[0].error.as_ref().expect("missing errors"); + let denied_err = out.results[1].error.as_ref().expect("denied errors"); + + assert_eq!(missing_err.code, "C211", "missing code wrong"); + assert_eq!(denied_err.code, "C211", "denied code wrong"); + + // Strip the ": " prefix; the remainder must be byte-identical. 
+ let m_msg = &missing_err.message; + let d_msg = &denied_err.message; + let m_suffix = m_msg + .strip_prefix("missing.txt: ") + .expect("missing message starts with its wire path"); + let d_suffix = d_msg + .strip_prefix(".env: ") + .expect("denied message starts with its wire path"); + assert_eq!( + m_suffix, d_suffix, + "C211 wording must not distinguish missing from denied; \ + missing: {m_msg}; denied: {d_msg}" + ); + // Neither message may carry raw OS error detail. + assert!( + !m_msg.contains("os error") && !d_msg.contains("os error"), + "raw OS text leaked; missing: {m_msg}; denied: {d_msg}" + ); + } + #[tokio::test] async fn original_file_untouched_when_ops_invalid() { let (tmp, r, c) = setup(); @@ -985,4 +2231,1083 @@ mod handler_tests { .unwrap_err(); assert!(err.contains("C210")); } + + // ----------------------------------------------------------------------- + // Echo scenarios, end-to-end through the real handler. + // ----------------------------------------------------------------------- + + /// Write `content`, run `ops` against it, return the single result. + async fn run_ops(content: &str, ops: Vec<UpdateOp>) -> UpdateFileResult { + let (tmp, r, c) = setup(); + std::fs::write(tmp.path().join("f.txt"), content).unwrap(); + let mut out = handle( + r, + c, + UpdateFileInput { + files: vec![UpdateFileSpec { + path: "f.txt".into(), + ops, + }], + }, + ) + .await + .unwrap(); + out.results.remove(0) + } + + #[tokio::test] + async fn echo_insert_correct_post_apply_region() { + // a..e, insert "X\nY" before line 3 → a,b,X,Y,c,d,e. + // Region [3,4] ±2 context → from_line 1, lines a..d. 
+ let r = run_ops( + "a\nb\nc\nd\ne\n", + vec![UpdateOp::Insert { + at_line: 3, + content: "X\nY".into(), + }], + ) + .await; + assert!(r.success, "got: {:?}", r.error); + assert!(!r.echoes_truncated); + assert_eq!(r.echoes.len(), 1); + let e = &r.echoes[0]; + assert_eq!(e.op_index, 0); + assert_eq!(e.from_line, 1); + assert_eq!(e.lines, vec!["a", "b", "X", "Y", "c", "d"]); + assert_eq!(e.elided, None); + assert_eq!(e.total_replacements, None); + } + + #[tokio::test] + async fn echo_remove_shows_context_around_gap() { + // a..e, remove line 3 → a,b,d,e. Gap anchor at line 2 ±2 → all 4. + let r = run_ops( + "a\nb\nc\nd\ne\n", + vec![UpdateOp::Remove { + from_line: 3, + to_line: 3, + }], + ) + .await; + assert!(r.success); + assert_eq!(r.echoes.len(), 1); + let e = &r.echoes[0]; + assert_eq!(e.from_line, 1); + assert_eq!(e.lines, vec!["a", "b", "d", "e"]); + } + + #[tokio::test] + async fn echo_update_lines_correct_content_and_position() { + // a,b,c,d → update 2..3 with X,Y,Z → a,X,Y,Z,d. + let r = run_ops( + "a\nb\nc\nd\n", + vec![UpdateOp::UpdateLines { + from_line: 2, + to_line: 3, + content: "X\nY\nZ".into(), + }], + ) + .await; + assert!(r.success); + assert_eq!(r.echoes.len(), 1); + let e = &r.echoes[0]; + assert_eq!(e.from_line, 1); + assert_eq!(e.lines, vec!["a", "X", "Y", "Z", "d"]); + } + + #[tokio::test] + async fn echo_two_line_ops_offset_correctness() { + // 1..10; op 0 inserts "X" before line 2 (delta +1), op 1 updates + // original line 5 → its post-apply position must be 6. 
+ let content = "1\n2\n3\n4\n5\n6\n7\n8\n9\n10\n"; + let r = run_ops( + content, + vec![ + UpdateOp::Insert { + at_line: 2, + content: "X".into(), + }, + UpdateOp::UpdateLines { + from_line: 5, + to_line: 5, + content: "FIVE".into(), + }, + ], + ) + .await; + assert!(r.success); + assert_eq!(r.echoes.len(), 2, "echoes: {:?}", r.echoes); + + let e0 = r.echoes.iter().find(|e| e.op_index == 0).unwrap(); + assert!(e0.lines.contains(&"X".to_string()), "echo0: {e0:?}"); + assert_eq!(e0.from_line, 1); + + // FIVE sits at post-apply line 6; ±2 context → window starts at 4. + let e1 = r.echoes.iter().find(|e| e.op_index == 1).unwrap(); + assert_eq!(e1.from_line, 4, "echo1: {e1:?}"); + assert_eq!(e1.lines, vec!["3", "4", "FIVE", "6", "7"]); + } + + #[tokio::test] + async fn echo_pathological_1000_line_op_stays_bounded() { + let big: Vec<String> = (0..1000).map(|i| format!("L{i}")).collect(); + let r = run_ops( + "old\n", + vec![UpdateOp::UpdateLines { + from_line: 1, + to_line: 1, + content: big.join("\n"), + }], + ) + .await; + assert!(r.success); + assert_eq!(r.echoes.len(), 1); + let e = &r.echoes[0]; + assert_eq!(e.lines.len(), ECHO_HEAD_TAIL * 2); + assert_eq!(e.elided, Some(1000 - ECHO_HEAD_TAIL as u64 * 2)); + } + + #[tokio::test] + async fn echo_budget_truncation_sets_flag() { + // 100 single-line update ops over ~205-byte lines: each echo is + // ~1 KiB (5 context lines), so the ~4 KiB budget exhausts after a + // few echoes and the flag MUST be set. 
+ let long = "X".repeat(200); + let content: String = (0..200).map(|i| format!("{i}:{long}\n")).collect(); + let ops: Vec<UpdateOp> = (0..100) + .map(|i| UpdateOp::UpdateLines { + from_line: i + 1, + to_line: i + 1, + content: format!("{i}:{long}"), + }) + .collect(); + let r = run_ops(&content, ops).await; + assert!(r.success); + assert!( + r.echoes_truncated, + "echoes_truncated must be set when the budget is exhausted; \ + emitted {} echoes", + r.echoes.len() + ); + assert!(!r.echoes.is_empty(), "first echo is always admitted"); + assert!(r.echoes.len() < 100, "remaining echoes must be dropped"); + } + + /// Wire contract: `echoes` and `echoes_truncated` are ALWAYS serialized, + /// matching the schema's required[] — including on failure results. + #[tokio::test] + async fn wire_always_emits_echo_fields() { + let r = run_ops( + "a\n", + vec![UpdateOp::Remove { + from_line: 5, + to_line: 9, + }], + ) + .await; + assert!(!r.success); + let v = serde_json::to_value(&r).unwrap(); + assert_eq!(v["echoes"], serde_json::json!([])); + assert_eq!(v["echoes_truncated"], serde_json::json!(false)); + } + + // ----------------------------------------------------------------------- + // T6 review regressions (A1–A4): permanent reviewer repros. + // ----------------------------------------------------------------------- + + /// A1: a regex whose matches span `\n` collapses many lines; the old + /// site indices pointed beyond the final body and panicked in the echo + /// arithmetic. Must never panic; echoes stay bounded and truthful. 
+ #[tokio::test] + async fn a1_multiline_collapsing_regex_does_not_panic() { + let content = "k1\nk2\nk3\nk4\nk5\nk6\nk7\nk8\nk9\nz\n"; + let r = run_ops( + content, + vec![UpdateOp::Replace { + pattern: r"k\d\n".into(), + replacement: "".into(), + ignore_case: false, + dot_matches_newline: false, + expect_matches: None, + }], + ) + .await; + assert!(r.success, "got: {:?}", r.error); + assert_eq!(r.new_line_count, 1); + // Sites capped at 5, all collapsed onto the surviving line. + assert_eq!(r.echoes.len(), ECHO_MAX_SITES); + for e in &r.echoes { + assert_eq!(e.op_index, 0); + assert_eq!(e.from_line, 1); + assert_eq!(e.lines, vec!["z"]); + assert_eq!(e.total_replacements, Some(9)); + } + + // Same regex deleting the ENTIRE body: bounded empty echoes. + let r = run_ops( + "k1\nk2\n", + vec![UpdateOp::Replace { + pattern: r"k\d\n".into(), + replacement: "".into(), + ignore_case: false, + dot_matches_newline: false, + expect_matches: None, + }], + ) + .await; + assert!(r.success); + assert_eq!(r.new_line_count, 0); + for e in &r.echoes { + assert!(e.lines.is_empty()); + assert_eq!(e.total_replacements, Some(2)); + } + } + + /// A2: replacements that ADD newlines shift later sites; each site must + /// report its POST-APPLY position (1/12/23), not its pre-replace line. + /// R4 (v0.4.1): each 11-line region echoes its FIRST and LAST line + /// with the 9 inner lines elided. + #[tokio::test] + async fn a2_regex_sites_report_post_apply_positions() { + // Replace each "X" with 11 lines (delta +10 per match). 
+ let repl: Vec<String> = (1..=11).map(|i| format!("L{i}")).collect(); + let r = run_ops( + "X\nX\nX\n", + vec![UpdateOp::Replace { + pattern: "X".into(), + replacement: repl.join("\n"), + ignore_case: false, + dot_matches_newline: false, + expect_matches: None, + }], + ) + .await; + assert!(r.success, "got: {:?}", r.error); + assert_eq!(r.new_line_count, 33); + assert_eq!(r.echoes.len(), 3); + let from_lines: Vec<_> = r.echoes.iter().map(|e| e.from_line).collect(); + assert_eq!(from_lines, vec![1, 12, 23], "echoes: {:?}", r.echoes); + for e in &r.echoes { + assert_eq!(e.op_index, 0); + assert_eq!( + e.lines, + vec!["L1", "L11"], + "site shows the region's first AND last line" + ); + assert_eq!(e.elided, Some(9), "inner lines elided"); + assert_eq!(e.total_replacements, Some(3)); + } + } + + /// A3: each regex op matches on its OWN input body. The second op's + /// pattern only exists AFTER the first op ran, so it must still find + /// (and echo) its site — the old single-pass site scan found nothing. + #[tokio::test] + async fn a3_sequential_regex_ops_attribute_sites_to_own_input() { + let r = run_ops( + "alpha\nmiddle\n", + vec![ + UpdateOp::Replace { + pattern: "alpha".into(), + replacement: "beta".into(), + ignore_case: false, + dot_matches_newline: false, + expect_matches: None, + }, + UpdateOp::Replace { + pattern: "beta".into(), + replacement: "gamma".into(), + ignore_case: false, + dot_matches_newline: false, + expect_matches: None, + }, + ], + ) + .await; + assert!(r.success); + let e0 = r + .echoes + .iter() + .find(|e| e.op_index == 0) + .expect("op 0 echo"); + let e1 = r.echoes.iter().find(|e| e.op_index == 1).expect( + "op 1 must get an echo: its pattern matches on its OWN input \ + body (which contains beta), not the pre-replace body", + ); + // Both sites are line 1; content comes from the FINAL body, so both + // truthfully show what the site looks like now. 
+ assert_eq!(e1.from_line, 1); + assert_eq!(e1.lines, vec!["gamma"]); + assert_eq!(e1.total_replacements, Some(1)); + assert_eq!(e0.from_line, 1); + assert_eq!(e0.lines, vec!["gamma"]); + assert_eq!(e0.total_replacements, Some(1)); + } + + /// A4: line-op echoes must shift through regex deltas too. An insert at + /// line 4 followed by a regex collapsing the 3 lines above it must be + /// echoed at its TRUE final position (line 1). + #[tokio::test] + async fn a4_insert_echo_shifts_through_regex_deltas() { + let r = run_ops( + "k1\nk2\nk3\nz\n", + vec![ + UpdateOp::Insert { + at_line: 4, + content: "INSERTED".into(), + }, + UpdateOp::Replace { + pattern: r"k\d\n".into(), + replacement: "".into(), + ignore_case: false, + dot_matches_newline: false, + expect_matches: None, + }, + ], + ) + .await; + assert!(r.success, "got: {:?}", r.error); + // Final body: INSERTED\nz\n + assert_eq!(r.new_line_count, 2); + let e0 = r + .echoes + .iter() + .find(|e| e.op_index == 0) + .expect("insert echo"); + assert_eq!( + e0.from_line, 1, + "insert region must map through the -3 regex delta" + ); + assert_eq!(e0.lines[0], "INSERTED", "echo: {e0:?}"); + } + + // ----------------------------------------------------------------------- + // S3 (v0.4.0): replace-op hardening — dot_matches_newline + expect_matches. + // ----------------------------------------------------------------------- + + /// Default `dot_matches_newline: false` keeps current behavior: `.` + /// does not cross newlines, the multi-line pattern matches nothing, + /// and 0 matches with no expectation is still a successful no-op + /// (no sites → no echoes, file byte-identical). 
+ #[tokio::test] + async fn s3_dot_default_false_multiline_pattern_matches_nothing() { + let (tmp, r, c) = setup(); + let original = "start\nmiddle\nend\n"; + std::fs::write(tmp.path().join("a.txt"), original).unwrap(); + let out = handle( + r, + c, + UpdateFileInput { + files: vec![UpdateFileSpec { + path: "a.txt".into(), + ops: vec![UpdateOp::Replace { + pattern: "start.*?end".into(), + replacement: "X".into(), + ignore_case: false, + dot_matches_newline: false, + expect_matches: None, + }], + }], + }, + ) + .await + .unwrap(); + let r0 = &out.results[0]; + assert!(r0.success, "0 matches without expectation: {:?}", r0.error); + assert!( + r0.echoes.is_empty(), + "no sites → no echoes: {:?}", + r0.echoes + ); + assert_eq!( + std::fs::read_to_string(tmp.path().join("a.txt")).unwrap(), + original + ); + } + + /// `dot_matches_newline: true` lets `start.*?end` span lines. The + /// replace collapses 4 lines into 1 (delta -3); both the regex site + /// echo and a line-op echo captured below it must land at FINAL-body + /// coordinates (extends the a1–a4 echo math to a multi-line-delta + /// replace driven by the new flag). 
+ #[tokio::test] + async fn s3_dot_matches_newline_replaces_region_with_correct_echoes() { + let (tmp, r, c) = setup(); + std::fs::write( + tmp.path().join("a.txt"), + "before\nstart\nmid1\nmid2\nend\nafter\n", + ) + .unwrap(); + let out = handle( + r, + c, + UpdateFileInput { + files: vec![UpdateFileSpec { + path: "a.txt".into(), + ops: vec![ + UpdateOp::Insert { + at_line: 7, + content: "TAIL".into(), + }, + UpdateOp::Replace { + pattern: "start.*?end".into(), + replacement: "ONE".into(), + ignore_case: false, + dot_matches_newline: true, + expect_matches: None, + }, + ], + }], + }, + ) + .await + .unwrap(); + let r0 = &out.results[0]; + assert!(r0.success, "got: {:?}", r0.error); + assert_eq!( + std::fs::read_to_string(tmp.path().join("a.txt")).unwrap(), + "before\nONE\nafter\nTAIL\n" + ); + assert_eq!(r0.new_line_count, 4); + // Regex site: matched at line 2 of its own input, context 0. + let site = r0 + .echoes + .iter() + .find(|e| e.op_index == 1) + .expect("replace site echo"); + assert_eq!(site.from_line, 2, "site: {site:?}"); + assert_eq!(site.lines, vec!["ONE"]); + assert_eq!(site.total_replacements, Some(1)); + // Insert echo captured at line 7 must shift through the -3 delta + // of the multi-line replace to final line 4 (±2 context → 2). + let ins = r0 + .echoes + .iter() + .find(|e| e.op_index == 0) + .expect("insert echo"); + assert!(ins.lines.contains(&"TAIL".to_string()), "echo: {ins:?}"); + assert_eq!(ins.from_line, 2, "echo: {ins:?}"); + } + + /// `expect_matches: 1` with exactly 1 actual match applies normally. 
+ #[tokio::test] + async fn s3_expect_matches_exact_count_applies() { + let (tmp, r, c) = setup(); + std::fs::write(tmp.path().join("a.txt"), "foo bar\n").unwrap(); + let out = handle( + r, + c, + UpdateFileInput { + files: vec![UpdateFileSpec { + path: "a.txt".into(), + ops: vec![UpdateOp::Replace { + pattern: "foo".into(), + replacement: "X".into(), + ignore_case: false, + dot_matches_newline: false, + expect_matches: Some(1), + }], + }], + }, + ) + .await + .unwrap(); + let r0 = &out.results[0]; + assert!(r0.success, "got: {:?}", r0.error); + assert_eq!(r0.echoes[0].total_replacements, Some(1)); + assert_eq!( + std::fs::read_to_string(tmp.path().join("a.txt")).unwrap(), + "X bar\n" + ); + } + + /// `expect_matches: 1` against 3 actual matches: the FILE fails with a + /// per-entry C210 naming both counts, nothing is written for it (bytes + /// identical), no echoes are emitted for it — and the OTHER file in + /// the same batch still applies with normal echoes. + #[tokio::test] + async fn s3_expect_matches_mismatch_fails_file_other_file_applies() { + let (tmp, r, c) = setup(); + let original = "foo\nfoo\nfoo\n"; + std::fs::write(tmp.path().join("a.txt"), original).unwrap(); + std::fs::write(tmp.path().join("b.txt"), "foo\n").unwrap(); + let replace = |expect: Option<_>| UpdateOp::Replace { + pattern: "foo".into(), + replacement: "bar".into(), + ignore_case: false, + dot_matches_newline: false, + expect_matches: expect, + }; + let out = handle( + r, + c, + UpdateFileInput { + files: vec![ + UpdateFileSpec { + path: "a.txt".into(), + ops: vec![replace(Some(1))], + }, + UpdateFileSpec { + path: "b.txt".into(), + ops: vec![replace(None)], + }, + ], + }, + ) + .await + .unwrap(); + let ra = &out.results[0]; + assert!(!ra.success); + let err = ra.error.as_ref().expect("a.txt errors"); + assert_eq!(err.code, "C210"); + assert!( + err.message.contains("\"foo\" matched 3 times, expected 1"), + "message must name the failing pattern plus actual and expected: {}", + 
err.message + ); + assert!( + err.message.contains("expect_matches: 3"), + "message must offer the corrective value: {}", + err.message + ); + assert!(ra.echoes.is_empty(), "failed file emits no echoes"); + assert_eq!( + std::fs::read(tmp.path().join("a.txt")).unwrap(), + original.as_bytes(), + "mismatching file must be byte-identical on disk" + ); + let rb = &out.results[1]; + assert!(rb.success, "per-entry isolation: {:?}", rb.error); + assert_eq!(rb.echoes[0].total_replacements, Some(1)); + assert_eq!( + std::fs::read_to_string(tmp.path().join("b.txt")).unwrap(), + "bar\n" + ); + } + + /// `expect_matches: 2` against 0 actual matches: C210 routes the + /// agent to coder::search / dot_matches_newline; file unchanged. + #[tokio::test] + async fn s3_expect_matches_zero_actual_routes_to_search() { + let (tmp, r, c) = setup(); + let original = "hello\n"; + std::fs::write(tmp.path().join("a.txt"), original).unwrap(); + let out = handle( + r, + c, + UpdateFileInput { + files: vec![UpdateFileSpec { + path: "a.txt".into(), + ops: vec![UpdateOp::Replace { + pattern: "missing".into(), + replacement: "x".into(), + ignore_case: false, + dot_matches_newline: false, + expect_matches: Some(2), + }], + }], + }, + ) + .await + .unwrap(); + let r0 = &out.results[0]; + assert!(!r0.success); + let err = r0.error.as_ref().expect("errors"); + assert_eq!(err.code, "C210"); + assert!( + err.message + .contains("\"missing\" matched 0 times, expected 2"), + "message must name the failing pattern plus the counts: {}", + err.message + ); + assert!( + err.message.contains("coder::search"), + "0-match mismatch must route to coder::search: {}", + err.message + ); + assert!( + err.message.contains("dot_matches_newline"), + "0-match mismatch must hint at dot_matches_newline: {}", + err.message + ); + assert_eq!( + std::fs::read(tmp.path().join("a.txt")).unwrap(), + original.as_bytes() + ); + } + + /// `expect_matches: 0` (assert-absent) against 1 actual match: C210 + /// routes to coder::search 
and must NOT suggest `expect_matches: 1` — + /// that would invert the caller's assert-absent intent. File unchanged. + #[tokio::test] + async fn s3_expect_matches_assert_absent_mismatch_routes_to_search() { + let (tmp, r, c) = setup(); + let original = "legacy_api()\n"; + std::fs::write(tmp.path().join("a.txt"), original).unwrap(); + let out = handle( + r, + c, + UpdateFileInput { + files: vec![UpdateFileSpec { + path: "a.txt".into(), + ops: vec![UpdateOp::Replace { + pattern: "legacy_api".into(), + replacement: "x".into(), + ignore_case: false, + dot_matches_newline: false, + expect_matches: Some(0), + }], + }], + }, + ) + .await + .unwrap(); + let r0 = &out.results[0]; + assert!(!r0.success); + let err = r0.error.as_ref().expect("errors"); + assert_eq!(err.code, "C210"); + assert!( + err.message + .contains("\"legacy_api\" matched 1 time, expected 0"), + "message must name the pattern and counts: {}", + err.message + ); + assert!( + err.message.contains("coder::search"), + "assert-absent mismatch must route to coder::search: {}", + err.message + ); + assert!( + !err.message.contains("expect_matches:"), + "must not suggest setting expect_matches to the observed \ + count — that inverts assert-absent intent: {}", + err.message + ); + assert_eq!( + std::fs::read(tmp.path().join("a.txt")).unwrap(), + original.as_bytes() + ); + } + + /// Long patterns are truncated in the mismatch message (first 60 chars + /// + `…`) so multi-op files stay disambiguated without flooding. + #[test] + fn pattern_snippet_truncates_long_patterns() { + let short = "a".repeat(PATTERN_SNIPPET_MAX_CHARS); + assert_eq!(pattern_snippet(&short), format!("\"{short}\"")); + let long = "b".repeat(PATTERN_SNIPPET_MAX_CHARS + 10); + let shown = pattern_snippet(&long); + assert_eq!( + shown, + format!("\"{}…\"", "b".repeat(PATTERN_SNIPPET_MAX_CHARS)) + ); + // Char-boundary safe on multi-byte input. 
+ let emoji = "é".repeat(PATTERN_SNIPPET_MAX_CHARS + 1); + assert_eq!( + pattern_snippet(&emoji), + format!("\"{}…\"", "é".repeat(PATTERN_SNIPPET_MAX_CHARS)) + ); + } + + /// Mismatch messages pluralize the match count correctly. + #[test] + fn times_pluralizes_match_counts() { + assert_eq!(times(0), "0 times"); + assert_eq!(times(1), "1 time"); + assert_eq!(times(3), "3 times"); + } + + /// Multiple ops in one file: op 0 would apply cleanly, but op 1's + /// expect_matches mismatch fails the WHOLE file — op 0's would-be + /// change must not reach disk (all-or-nothing per file). + #[tokio::test] + async fn s3_second_op_expect_mismatch_rolls_back_whole_file() { + let (tmp, r, c) = setup(); + let original = "alpha\nkeep\n"; + std::fs::write(tmp.path().join("a.txt"), original).unwrap(); + let out = handle( + r, + c, + UpdateFileInput { + files: vec![UpdateFileSpec { + path: "a.txt".into(), + ops: vec![ + UpdateOp::Replace { + pattern: "alpha".into(), + replacement: "beta".into(), + ignore_case: false, + dot_matches_newline: false, + expect_matches: Some(1), + }, + UpdateOp::Replace { + pattern: "nope".into(), + replacement: "x".into(), + ignore_case: false, + dot_matches_newline: false, + expect_matches: Some(1), + }, + ], + }], + }, + ) + .await + .unwrap(); + let r0 = &out.results[0]; + assert!(!r0.success, "second op's mismatch must fail the file"); + assert_eq!(r0.error.as_ref().unwrap().code, "C210"); + assert!(r0.echoes.is_empty()); + assert_eq!( + std::fs::read(tmp.path().join("a.txt")).unwrap(), + original.as_bytes(), + "op 0's would-be change must not land when op 1 fails" + ); + } + + // ----------------------------------------------------------------------- + // R1 (v0.4.1): pre-write validation of replacement capture references. 
+ // ----------------------------------------------------------------------- + + /// R1 — the EXACT production repro (live session q8x6g248): a harness + /// agent rewrote a JS handler with a replacement containing the + /// template literal `Hello, ${name}!`. `Captures::expand` treated + /// `${name}` as a capture reference, the pattern defines no group + /// `name`, and the regex crate expands undefined references to the + /// EMPTY STRING — the file got `Hello, !` with success: true (silent + /// corruption, twice in one session). The reference must now fail + /// pre-write with a C210 naming `name`; the file stays byte-identical + /// on disk and other files in the batch still apply. + #[tokio::test] + async fn r1_production_repro_template_literal_ref_rejected_pre_write() { + let (tmp, r, c) = setup(); + let original = "import { iii } from 'iii';\n\n\ + iii.registerFunction({\n \ + handler: () => ({ body: { message: 'hi' } }),\n});\n"; + std::fs::write(tmp.path().join("handler.js"), original).unwrap(); + std::fs::write(tmp.path().join("other.txt"), "foo\n").unwrap(); + let replacement = "iii.registerFunction({\n \ + handler: (req) => {\n \ + const name = req.query.name ?? 
'world';\n \ + return { status: 200, body: { message: `Hello, ${name}!` } };\n \ + },\n});\n"; + let out = handle( + r, + c, + UpdateFileInput { + files: vec![ + UpdateFileSpec { + path: "handler.js".into(), + ops: vec![UpdateOp::Replace { + pattern: r"iii\.registerFunction\(.*".into(), + replacement: replacement.into(), + ignore_case: false, + dot_matches_newline: true, + expect_matches: Some(1), + }], + }, + UpdateFileSpec { + path: "other.txt".into(), + ops: vec![UpdateOp::Replace { + pattern: "foo".into(), + replacement: "bar".into(), + ignore_case: false, + dot_matches_newline: false, + expect_matches: Some(1), + }], + }, + ], + }, + ) + .await + .unwrap(); + let r0 = &out.results[0]; + assert!(!r0.success, "undefined ${{name}} must fail pre-write"); + let err = r0.error.as_ref().expect("entry must carry an error"); + assert_eq!(err.code, "C210"); + assert!( + err.message.contains("`${name}`"), + "message must name the offending reference: {}", + err.message + ); + assert!( + err.message.contains("0 capture groups"), + "message must state what the pattern defines: {}", + err.message + ); + assert!( + err.message.contains("$$"), + "message must teach the $$ escape: {}", + err.message + ); + assert!( + err.message.contains("template literal"), + "message must name the JS/TS template-literal collision: {}", + err.message + ); + assert!(r0.echoes.is_empty(), "failed file emits no echoes"); + assert_eq!( + std::fs::read(tmp.path().join("handler.js")).unwrap(), + original.as_bytes(), + "file must be byte-identical on disk — NOTHING written" + ); + // Per-entry isolation: the other file in the batch still applies. + let r1 = &out.results[1]; + assert!(r1.success, "batch isolation: {:?}", r1.error); + assert_eq!( + std::fs::read_to_string(tmp.path().join("other.txt")).unwrap(), + "bar\n" + ); + } + + /// R1 — `$$` escape: the corrective rewrite from the C210 message + /// (`$${name}`) must write a literal `${name}` to disk. 
+ #[tokio::test] + async fn r1_dollar_dollar_escape_writes_literal_dollar() { + let (tmp, r, c) = setup(); + std::fs::write(tmp.path().join("a.js"), "MSG\n").unwrap(); + let out = handle( + r, + c, + UpdateFileInput { + files: vec![UpdateFileSpec { + path: "a.js".into(), + ops: vec![UpdateOp::Replace { + pattern: "MSG".into(), + replacement: "`Hello, $${name}!`".into(), + ignore_case: false, + dot_matches_newline: false, + expect_matches: Some(1), + }], + }], + }, + ) + .await + .unwrap(); + let r0 = &out.results[0]; + assert!(r0.success, "got: {:?}", r0.error); + assert_eq!( + std::fs::read_to_string(tmp.path().join("a.js")).unwrap(), + "`Hello, ${name}!`\n" + ); + } + + /// R1 — the unbraced longest-run gotcha: `$1a` is a reference to a + /// group NAMED "1a" (NOT group 1 followed by "a"); with only group 1 + /// defined it must fail C210 naming `1a`. The braced disambiguation + /// `${1}a` applies fine. + #[tokio::test] + async fn r1_unbraced_longest_run_gotcha_dollar_1a() { + let (tmp, r, c) = setup(); + let original = "x\n"; + std::fs::write(tmp.path().join("a.txt"), original).unwrap(); + let op = |replacement: &str| UpdateOp::Replace { + pattern: "(x)".into(), + replacement: replacement.into(), + ignore_case: false, + dot_matches_newline: false, + expect_matches: Some(1), + }; + let out = handle( + r.clone(), + c.clone(), + UpdateFileInput { + files: vec![UpdateFileSpec { + path: "a.txt".into(), + ops: vec![op("$1a")], + }], + }, + ) + .await + .unwrap(); + let r0 = &out.results[0]; + assert!(!r0.success, "$1a names group `1a`, which does not exist"); + let err = r0.error.as_ref().unwrap(); + assert_eq!(err.code, "C210"); + assert!( + err.message.contains("`$1a`") && err.message.contains("`1a`"), + "message must name the `1a` reference: {}", + err.message + ); + assert_eq!( + std::fs::read(tmp.path().join("a.txt")).unwrap(), + original.as_bytes() + ); + // The braced disambiguation works. 
+ let out = handle( + r, + c, + UpdateFileInput { + files: vec![UpdateFileSpec { + path: "a.txt".into(), + ops: vec![op("${1}a")], + }], + }, + ) + .await + .unwrap(); + assert!(out.results[0].success, "got: {:?}", out.results[0].error); + assert_eq!( + std::fs::read_to_string(tmp.path().join("a.txt")).unwrap(), + "xa\n" + ); + } + + /// R1 — valid rewrites keep working unchanged: `$0`, `$1` with a + /// group, `$name` with a named group. + #[tokio::test] + async fn r1_valid_capture_rewrites_still_apply() { + let (tmp, r, c) = setup(); + std::fs::write(tmp.path().join("a.txt"), "HOST=8080\n").unwrap(); + let out = handle( + r, + c, + UpdateFileInput { + files: vec![UpdateFileSpec { + path: "a.txt".into(), + ops: vec![UpdateOp::Replace { + pattern: r"(?P<key>\w+)=(\d+)".into(), + replacement: "$key: $2 (was $0)".into(), + ignore_case: false, + dot_matches_newline: false, + expect_matches: Some(1), + }], + }], + }, + ) + .await + .unwrap(); + let r0 = &out.results[0]; + assert!(r0.success, "got: {:?}", r0.error); + assert_eq!( + std::fs::read_to_string(tmp.path().join("a.txt")).unwrap(), + "HOST: 8080 (was HOST=8080)\n" + ); + } + + /// R1 — undefined-ref validation precedes the `expect_matches` + /// guard: even with `expect_matches: 0` (assert-absence, where the + /// replacement goes unused on success) an undefined reference fails + /// with the capture-ref C210 — the replacement must be well-formed + /// even when unused. + #[tokio::test] + async fn r1_undefined_ref_rejected_even_with_expect_matches_zero() { + let (tmp, r, c) = setup(); + let original = "plain\n"; + std::fs::write(tmp.path().join("a.txt"), original).unwrap(); + let out = handle( + r, + c, + UpdateFileInput { + files: vec![UpdateFileSpec { + path: "a.txt".into(), + ops: vec![UpdateOp::Replace { + // Matches nothing: expect_matches: 0 alone would + // succeed without ever using the replacement. 
+ pattern: "missing".into(), + replacement: "`Hi, ${name}!`".into(), + ignore_case: false, + dot_matches_newline: false, + expect_matches: Some(0), + }], + }], + }, + ) + .await + .unwrap(); + let r0 = &out.results[0]; + assert!(!r0.success, "undefined ref must fail even when unused"); + let err = r0.error.as_ref().expect("entry must carry an error"); + assert_eq!(err.code, "C210"); + assert!( + err.message.contains("references capture group `${name}`"), + "the capture-ref C210 (not the mismatch wording) must fire: {}", + err.message + ); + assert_eq!( + std::fs::read(tmp.path().join("a.txt")).unwrap(), + original.as_bytes() + ); + } + + // ----------------------------------------------------------------------- + // R4 (v0.4.1): multi-line replace site echoes show first AND last line. + // ----------------------------------------------------------------------- + + /// R4 — a replace whose post-replace region spans >2 lines echoes the + /// FIRST and LAST line of the region with `elided` set to the inner + /// line count. The old matched-first-line-only echo hid everything + /// below the region's first line: in production (session q8x6g248) + /// the corrupted "Hello, !" line sat in the tail of the replaced + /// region and stayed invisible until a full read-file — with this + /// echo shape the region's tail is visible in the mutation response. 
+ #[tokio::test] + async fn r4_multiline_replace_echo_shows_first_and_last_lines() { + let r = run_ops( + "before\nstart\nmid\nend\nafter\n", + vec![UpdateOp::Replace { + pattern: "start.*?end".into(), + replacement: "NEW_HEAD\ninner1\ninner2\nNEW_TAIL".into(), + ignore_case: false, + dot_matches_newline: true, + expect_matches: Some(1), + }], + ) + .await; + assert!(r.success, "got: {:?}", r.error); + assert_eq!(r.new_line_count, 6); + assert_eq!(r.echoes.len(), 1); + let e = &r.echoes[0]; + assert_eq!(e.op_index, 0); + assert_eq!(e.from_line, 2); + assert_eq!( + e.lines, + vec!["NEW_HEAD", "NEW_TAIL"], + "echo must show the region's first AND last line: {e:?}" + ); + assert_eq!(e.elided, Some(2), "inner line count elided"); + assert_eq!(e.total_replacements, Some(1)); + } + + /// R4 — a two-line region echoes both lines with no elision. + #[tokio::test] + async fn r4_two_line_replace_echo_shows_both_lines_no_elision() { + let r = run_ops( + "a\nTARGET\nz\n", + vec![UpdateOp::Replace { + pattern: "TARGET".into(), + replacement: "first\nlast".into(), + ignore_case: false, + dot_matches_newline: false, + expect_matches: Some(1), + }], + ) + .await; + assert!(r.success, "got: {:?}", r.error); + assert_eq!(r.echoes.len(), 1); + let e = &r.echoes[0]; + assert_eq!(e.from_line, 2); + assert_eq!(e.lines, vec!["first", "last"]); + assert_eq!(e.elided, None); + } + + /// R4 — single-line replacements are unchanged: one line, no elision. 
+ #[tokio::test] + async fn r4_single_line_replace_echo_unchanged() { + let r = run_ops( + "a\nTARGET tail\nz\n", + vec![UpdateOp::Replace { + pattern: "TARGET".into(), + replacement: "DONE".into(), + ignore_case: false, + dot_matches_newline: false, + expect_matches: Some(1), + }], + ) + .await; + assert!(r.success, "got: {:?}", r.error); + assert_eq!(r.echoes.len(), 1); + let e = &r.echoes[0]; + assert_eq!(e.from_line, 2); + assert_eq!(e.lines, vec!["DONE tail"]); + assert_eq!(e.elided, None); + } } diff --git a/coder/src/lib.rs b/coder/src/lib.rs index 575c4a47..3eae7f67 100644 --- a/coder/src/lib.rs +++ b/coder/src/lib.rs @@ -3,6 +3,7 @@ //! share source files via Cargo's two-target compile. pub mod config; +pub mod configuration; pub mod error; pub mod functions; pub mod manifest; diff --git a/coder/src/main.rs b/coder/src/main.rs index c196851c..9ed02fa6 100644 --- a/coder/src/main.rs +++ b/coder/src/main.rs @@ -1,10 +1,12 @@ use std::sync::Arc; -use anyhow::Result; +use anyhow::{Context, Result}; use clap::Parser; use iii_sdk::{register_worker, InitOptions, WorkerMetadata}; +use tokio::sync::RwLock; -use coder::config; +use coder::config::CoderConfig; +use coder::configuration::{self, ConfigCell}; use coder::functions; use coder::manifest; use coder::path::PathResolver; @@ -12,8 +14,12 @@ use coder::path::PathResolver; #[derive(Parser, Debug)] #[command(name = "coder", about = "Path-jailed code worker for iii agents")] struct Cli { - #[arg(long, default_value = "./config.yaml")] - config: String, + /// Optional seed config.yaml used to populate `initial_value` on first + /// register. The AUTHORITATIVE config is always fetched from the + /// `configuration` worker afterward; this file only seeds it. Keeps + /// `--config ./config.yaml` working for deployments that pass it. 
+ #[arg(long)] + config: Option<String>, #[arg(long, env = "III_URL", default_value = "ws://127.0.0.1:49134")] url: String, @@ -42,43 +48,8 @@ async fn main() -> Result<()> { return Ok(()); } - let cfg = match config::load_config(&cli.config) { - Ok(c) => { - tracing::info!( - base_path = %c.base_path.display(), - non_accessible_globs = c.non_accessible_globs.len(), - "loaded config from {}", - cli.config - ); - c - } - Err(e) => { - tracing::warn!( - error = %e, - path = %cli.config, - "failed to load config, using defaults" - ); - config::CoderConfig::default() - } - }; - let cfg = Arc::new(cfg); - - let resolver = match PathResolver::new(&cfg) { - Ok(r) => Arc::new(r), - Err(e) => { - // base_path canonicalization failure is operator config — - // refuse to start instead of degrading silently. - anyhow::bail!( - "failed to initialise PathResolver: {} (check `base_path` and `non_accessible_globs`)", - e.code() - ); - } - }; - tracing::info!( - base_root = %resolver.base_root().display(), - "path resolver ready" - ); - + // 2. register_worker FIRST so the configuration round-trip below runs over a + // live connection. tracing::info!(url = %cli.url, "connecting to III engine"); let iii = register_worker( &cli.url, @@ -96,7 +67,73 @@ async fn main() -> Result<()> { }, ); - functions::register_all(&iii, resolver.clone(), cfg.clone()); + // 3. Best-effort seed: a failed parse WARNS and falls through to None (does + // NOT abort) — the authoritative value comes from fetch_config. The seed + // file IS env-expanded (`${VAR}`) via CoderConfig::from_file. + let seed = match &cli.config { + Some(path) => match CoderConfig::from_file(path) { + Ok(cfg) => { + tracing::info!(path = %path, "loaded seed config for initial registration"); + Some(cfg) + } + Err(e) => { + tracing::warn!( + path = %path, + error = %e, + "failed to load seed config; relying on existing configuration entry" + ); + None + } + }, + None => None, + }; + + // 4.
Register the schema + (optional) seed with the configuration worker. + configuration::register_config(&iii, seed.as_ref()) + .await + .map_err(anyhow::Error::msg) + .context("registering coder configuration schema")?; + + // 5. Fetch the AUTHORITATIVE config (env-expanded by the configuration + // worker; falls back to default inside fetch_config when unset). + let cfg = configuration::fetch_config(&iii) + .await + .map_err(anyhow::Error::msg) + .context("loading coder configuration")?; + tracing::info!( + base_path = ?cfg.base_path, + base_paths = ?cfg.base_paths, + non_accessible_globs = cfg.non_accessible_globs.len(), + "coder configuration loaded" + ); + + // 6. Build the PathResolver from the authoritative cfg — SAME as today. The + // resolver is the security jail: built once here and NEVER rebuilt at + // runtime. Zero reachable roots / conflicting root config is an operator + // error — refuse to start instead of degrading silently. Capture the + // boot jail signature BEFORE moving cfg into the snapshot cell. + let resolver = match PathResolver::new(&cfg) { + Ok(r) => Arc::new(r), + Err(e) => { + anyhow::bail!( + "failed to initialise PathResolver: {} (check `base_paths`/`base_path` and `non_accessible_globs`)", + e + ); + } + }; + tracing::info!(roots = ?resolver.roots(), "path resolver ready"); + let boot_sig = cfg.jail_signature(); + + // 7. The hot-swappable config snapshot shared with every cfg-taking handler. + let cell: ConfigCell = Arc::new(RwLock::new(Arc::new(cfg))); + + // 8. Register the RPC functions (handlers read the live snapshot per call). + functions::register_all(&iii, resolver, cell.clone()); + + // 9. LAST: bind the configuration-change trigger so its handler closes over + // the fully-built snapshot cell + the boot jail signature. 
+ configuration::register_config_trigger(&iii, cell, boot_sig) + .context("registering configuration change trigger")?; tracing::info!("coder ready, waiting for invocations"); tokio::signal::ctrl_c().await?; diff --git a/coder/src/manifest.rs b/coder/src/manifest.rs index a3932c02..93b4d24c 100644 --- a/coder/src/manifest.rs +++ b/coder/src/manifest.rs @@ -23,7 +23,7 @@ pub fn build_manifest() -> ModuleManifest { version: env!("CARGO_PKG_VERSION").to_string(), description: DESCRIPTION.to_string(), default_config: serde_json::json!({ - "base_path": "./", + "base_paths": ["./", "/tmp"], "non_accessible_globs": [ "**/.env", "**/.env.*", diff --git a/coder/src/path/mod.rs b/coder/src/path/mod.rs index dbdcca6e..50bf5a42 100644 --- a/coder/src/path/mod.rs +++ b/coder/src/path/mod.rs @@ -1,16 +1,34 @@ //! Path resolution and access control. //! -//! All wire-facing paths are *relative to `base_path`*. `PathResolver` -//! canonicalises them (symlink-aware) and verifies they remain inside -//! `base_root` so `..` and crafted symlinks cannot escape. A `GlobSet` +//! The worker is jailed to a set of allowed roots (`base_paths`; the +//! legacy `base_path` is honored as a one-entry list). Relative wire +//! paths resolve against the FIRST root (the "primary"); absolute wire +//! paths are accepted when they canonicalise inside ANY allowed root. +//! `PathResolver` canonicalises inputs (symlink-aware) and verifies +//! containment so `..` and crafted symlinks cannot escape. A `GlobSet` //! built from `non_accessible_globs` further blocks read/write/delete on //! sensitive entries (`.env`, `*.pem`, …) while still allowing them to -//! appear in `list-folder`/`tree` listings. +//! appear in `list-folder`/`tree` listings; globs match the path +//! *relative to its containing root*. A second GlobSet compiled from +//! `default_exclude_globs` (same matching convention) is a hide-only +//! noise filter applied by `coder::tree` — opt-out per call, never +//! access control. +//! +//! 
Every `coder::*` response carries canonical ABSOLUTE paths (decision +//! D2-eng) so multi-root results are unambiguous, and handlers operate +//! ONLY on resolver-returned paths — never on operands re-derived from +//! the raw request after validation. //! //! The symlink-safe canonicalisation walks to the longest existing //! ancestor, canonicalises that, then lexically collapses the tail — //! mirroring [`shell/src/fs/host.rs`](../../../shell/src/fs/host.rs) //! `canonicalize_with_fallback`. +//! +//! MIRROR-INVARIANT: `canonicalize_with_fallback` + `normalize_lexical` +//! here and in `shell/src/fs/host.rs` implement the same jail-safety +//! algorithm and MUST evolve in lockstep — port any fix in one file to +//! the other. (Divergence between validated and operated-on paths caused +//! a jail escape in shell; see `lexical_operand` history there.) use std::path::{Component, Path, PathBuf}; @@ -21,109 +39,301 @@ use crate::error::CoderError; #[derive(Debug)] pub struct PathResolver { - base_root_canon: PathBuf, + /// Canonical allowed roots, in configuration order. Index 0 is the + /// primary root that relative wire paths resolve against. + roots_canon: Vec<PathBuf>, non_accessible: GlobSet, + /// Compiled `default_exclude_globs` exactly as configured. Used to + /// omit matching NON-directory entries from `coder::tree`. Hide-only + /// noise filter — never an access-control surface; that is + /// `non_accessible`'s job. + default_exclude: GlobSet, + /// `default_exclude_globs` plus `/**`-stripped dir-boundary + /// companions, so `**/node_modules/**` also catches the + /// `node_modules` directory itself. Checked against DIRECTORIES only + /// — on other entry kinds the companions would wrongly drop a file + /// or symlink merely NAMED like an excluded directory. + default_exclude_dirs: GlobSet, +} + +/// Effective roots when neither `base_paths` nor legacy `base_path` is +/// configured: the engine workspace cwd plus `/tmp` (a deliberate, +/// user-approved default).
+fn default_roots() -> Vec<PathBuf> { + vec![PathBuf::from("./"), PathBuf::from("/tmp")] +} + +/// Prefix of the roots listing inside C215 messages. The recovery-pair +/// test (`c215_error_text_alone_enables_successful_second_call`) parses +/// the allowed roots back out of the error text using this marker — keep +/// the format! sites and this const in lockstep. +const C215_ROOTS_PREFIX: &str = "Allowed roots: "; + +/// Standard re-route hint appended to C215 messages. The recovery-pair +/// test uses `". " + SHELL_FS_HINT` as the end-of-roots-list marker. +const SHELL_FS_HINT: &str = + "Use a path inside an allowed root, or the shell worker's shell::fs::* for other host paths."; + +/// `", "` display form for path lists in error messages. +fn display_paths(paths: &[PathBuf]) -> String { + paths + .iter() + .map(|p| p.display().to_string()) + .collect::<Vec<_>>() + .join(", ") } impl PathResolver { pub fn new(cfg: &CoderConfig) -> Result<Self, CoderError> { - let base_root_canon = std::fs::canonicalize(&cfg.base_path).map_err(|e| { - CoderError::Io(format!( - "base_path {} cannot be canonicalized: {e}", - cfg.base_path.display() - )) - })?; - let mut builder = GlobSetBuilder::new(); - for pat in &cfg.non_accessible_globs { - let g = Glob::new(pat).map_err(|e| { - CoderError::BadInput(format!("invalid non_accessible_glob {pat:?}: {e}")) - })?; - builder.add(g); + let configured: Vec<PathBuf> = match (&cfg.base_path, cfg.base_paths.as_slice()) { + (Some(_), [_, ..]) => { + return Err(CoderError::BadInput( + "both `base_path` and `base_paths` are set; set either \ + `base_path` or `base_paths` in config.yaml, not both. \ + Remove `base_path` and keep only `base_paths` \ + (legacy `base_path` is honored as a one-entry list)."
+ .into(), + )) + } + (Some(single), []) => vec![single.clone()], + (None, []) => default_roots(), + (None, many) => many.to_vec(), + }; + + let mut roots_canon: Vec<PathBuf> = Vec::with_capacity(configured.len()); + for root in &configured { + match std::fs::canonicalize(root) { + Ok(canon) if roots_canon.contains(&canon) => tracing::warn!( + root = %root.display(), + canonical = %canon.display(), + "dropping duplicate root: same canonical path already configured" + ), + Ok(canon) => roots_canon.push(canon), + Err(e) => tracing::warn!( + root = %root.display(), + error = %e, + "skipping unreachable root: cannot canonicalize" + ), + } + } + if roots_canon.is_empty() { + // C210 like the both-set case above: an operator config error + // detected at construction time, not a runtime I/O failure. + return Err(CoderError::BadInput(format!( + "no reachable roots: none of [{}] could be canonicalized. \ + Ensure the directories exist and are accessible, then set \ + `base_paths` in config.yaml to at least one reachable path.", + display_paths(&configured) + ))); } - let non_accessible = builder - .build() - .map_err(|e| CoderError::BadInput(format!("globset build failed: {e}")))?; + + let non_accessible = compile_globset(&cfg.non_accessible_globs, "non_accessible_glob")?; + let default_exclude = compile_globset(&cfg.default_exclude_globs, "default_exclude_glob")?; + let default_exclude_dirs = compile_globset( + &with_dir_companions(&cfg.default_exclude_globs), + "default_exclude_glob", + )?; + + tracing::info!(roots = ?roots_canon, "path resolver roots"); Ok(Self { - base_root_canon, + roots_canon, non_accessible, + default_exclude, + default_exclude_dirs, }) } + /// Primary root — the first configured (and reachable) root. Relative + /// wire paths resolve against it. pub fn base_root(&self) -> &Path { - &self.base_root_canon + &self.roots_canon[0] + } + + /// All canonical allowed roots, in configuration order.
+ pub fn roots(&self) -> &[PathBuf] { + &self.roots_canon + } + + /// True when `p` is exactly one of the allowed roots. + pub fn is_root(&self, p: &Path) -> bool { + self.roots_canon.iter().any(|r| r == p) + } + + /// The allowed root containing `canon`, if any. First match in + /// configuration order (only relevant when roots nest). + pub fn containing_root(&self, canon: &Path) -> Option<&Path> { + self.roots_canon + .iter() + .find(|r| canon.starts_with(r)) + .map(PathBuf::as_path) } - /// Resolve `rel` to a canonical absolute path under `base_root`. The - /// path need not exist; we walk to the longest existing ancestor, - /// canonicalise that, then lexically collapse the tail. Rejects: + /// Comma-separated display of all allowed roots, for C215 messages. + fn roots_list(&self) -> String { + display_paths(&self.roots_canon) + } + + /// Resolve a wire `path` to a canonical absolute path inside the + /// jail. The path need not exist; we walk to the longest existing + /// ancestor, canonicalise that, then lexically collapse the tail. 
/// - /// - absolute inputs → `C210` (the wire API is base-relative) - /// - results outside `base_root` → `C215` + /// - relative inputs resolve against the primary (first) root and + /// must stay inside it → escape is `C215` + /// - absolute inputs are accepted when they canonicalise inside ANY + /// allowed root; outside all roots → `C215` /// - dangling symlinks in the tail → `C215` - pub fn resolve(&self, rel: &str) -> Result<PathBuf, CoderError> { - let rel_path = Path::new(rel); - if rel_path.is_absolute() { - return Err(CoderError::BadInput(format!( - "path must be relative to base_path: {rel}" - ))); - } - let joined = self.base_root_canon.join(rel_path); + pub fn resolve(&self, path: &str) -> Result<PathBuf, CoderError> { + let wire = Path::new(path); + let is_absolute = wire.is_absolute(); + let joined = if is_absolute { + wire.to_path_buf() + } else { + self.base_root().join(wire) + }; let canon = canonicalize_with_fallback(&joined).map_err(|e| { let msg = e.to_string(); if msg.contains("dangling symlink in path") { - CoderError::OutsideBase(format!("{rel}: {msg}")) + // Dangling symlink: name the containing root context so the + // caller knows where they are allowed to work. The marker + // consts are parsed back out by the recovery-pair test. + CoderError::OutsideBase(format!( + "{path}: {msg}. {C215_ROOTS_PREFIX}{roots}. {SHELL_FS_HINT}", + roots = self.roots_list() + )) } else if e.kind() == std::io::ErrorKind::InvalidInput || e.kind() == std::io::ErrorKind::NotFound { - CoderError::NotFoundOrDenied(format!("{rel}: {msg}")) + // Not-found / invalid during ancestor walk — treated as + // C211 (not-found-or-denied) rather than C215. The single + // constructor guarantees the standardized wording.
+ CoderError::not_found_or_denied(path) } else { - CoderError::Io(format!("canonicalize {rel}: {e}")) + CoderError::Io(format!("canonicalize {path}: {e}")) } })?; - if !canon.starts_with(&self.base_root_canon) { - return Err(CoderError::OutsideBase(format!( - "path escapes base_path: {rel}" - ))); + let inside = if is_absolute { + self.containing_root(&canon).is_some() + } else { + canon.starts_with(self.base_root()) + }; + if !inside { + if is_absolute { + // Absolute path outside every allowed root. The marker + // consts are parsed back out by the recovery-pair test. + return Err(CoderError::OutsideBase(format!( + "path is outside every allowed root: {path}. \ + {C215_ROOTS_PREFIX}{roots}. {SHELL_FS_HINT}", + roots = self.roots_list() + ))); + } else { + // Relative path that escaped the primary root (e.g. via `..`). + let primary = self.base_root().display(); + return Err(CoderError::OutsideBase(format!( + "path escapes the primary allowed root {primary}: {path}. \ + Relative paths resolve against {primary}; \ + use an absolute path inside an allowed root instead." + ))); + } } Ok(canon) } - /// Path's location relative to `base_root` as a forward-slash string, - /// suitable for glob matching and wire responses. Returns `None` if - /// `abs` isn't under `base_root` (should never happen for paths that - /// came out of `resolve`). + /// Path's location relative to its CONTAINING root as a forward-slash + /// string, suitable for glob matching. Returns `None` if `abs` isn't + /// under any allowed root (should never happen for paths that came + /// out of `resolve`). pub fn relative(&self, abs: &Path) -> Option<String> { - abs.strip_prefix(&self.base_root_canon) + let root = self.containing_root(abs)?; + abs.strip_prefix(root) .ok() .map(|p| p.to_string_lossy().replace('\\', "/")) } - /// True if `abs`'s base-relative form matches any non-accessible glob. - /// `abs` is expected to be a path previously returned by `resolve`.
+ /// True if `abs`'s root-relative form matches any non-accessible + /// glob. `abs` is expected to be a path previously returned by + /// `resolve`. Glob semantics are per containing root, so a pattern + /// like `**/.env` blocks `.env` in every allowed root. pub fn is_non_accessible(&self, abs: &Path) -> bool { + self.matches_rel(&self.non_accessible, abs) + } + + /// True if `abs`'s root-relative form matches a + /// `default_exclude_globs` entry exactly as configured. `coder::tree` + /// omits matching NON-directory entries. Hide-only: it must never + /// gate access — `is_non_accessible` does that. + pub fn is_default_excluded(&self, abs: &Path) -> bool { + self.matches_rel(&self.default_exclude, abs) + } + + /// Directory-boundary form of [`is_default_excluded`]: additionally + /// matches the `/**`-stripped companions, so `**/node_modules/**` + /// catches the `node_modules` directory itself and descent can be + /// suppressed. Call this for DIRECTORIES only — on other entry kinds + /// the companions would match files/symlinks merely NAMED like an + /// excluded directory. + /// + /// [`is_default_excluded`]: Self::is_default_excluded + pub fn is_default_excluded_dir(&self, abs: &Path) -> bool { + self.matches_rel(&self.default_exclude_dirs, abs) + } + + fn matches_rel(&self, set: &GlobSet, abs: &Path) -> bool { let Some(rel) = self.relative(abs) else { return false; }; if rel.is_empty() { return false; } - self.non_accessible.is_match(&rel) + set.is_match(&rel) } /// Resolve and reject if the result is on the non-accessible list. /// Used by every mutating operation and by `read-file` so the same /// glob hides both reads and writes. + /// + /// The C211 message is intentionally identical in wording to the + /// not-found case (REDACTION INVARIANT: callers must not be able to + /// distinguish "denied" from "missing" by observing the error text). 
pub fn require_writable(&self, rel: &str) -> Result<PathBuf, CoderError> { let abs = self.resolve(rel)?; if self.is_non_accessible(&abs) { - return Err(CoderError::NotFoundOrDenied(format!( - "path is non-accessible per config: {rel}" - ))); + return Err(CoderError::not_found_or_denied(rel)); } Ok(abs) } } +fn compile_globset(patterns: &[String], key: &str) -> Result<GlobSet, CoderError> { + let mut builder = GlobSetBuilder::new(); + for pat in patterns { + let g = Glob::new(pat) + .map_err(|e| CoderError::BadInput(format!("invalid {key} {pat:?}: {e}")))?; + builder.add(g); + } + builder + .build() + .map_err(|e| CoderError::BadInput(format!("globset build failed: {e}"))) +} + +/// A pattern like `**/node_modules/**` matches paths INSIDE the directory +/// but not the directory itself, so descent suppression at the dir +/// boundary would never trigger; compile a `/**`-stripped companion +/// (`**/node_modules`) alongside each such pattern so the boundary +/// matches too. The degenerate pattern `/**` would strip to an empty +/// companion, which is dropped.
+fn with_dir_companions(patterns: &[String]) -> Vec<String> { + patterns + .iter() + .flat_map(|p| { + let companion = p + .strip_suffix("/**") + .filter(|s| !s.is_empty()) + .map(String::from); + std::iter::once(p.clone()).chain(companion) + }) + .collect() +} + fn canonicalize_with_fallback(p: &Path) -> std::io::Result<PathBuf> { if let Ok(c) = std::fs::canonicalize(p) { return Ok(c); @@ -174,20 +384,28 @@ mod tests { use super::*; use tempfile::tempdir; - fn cfg_with(base: PathBuf, globs: Vec<&str>) -> CoderConfig { + fn cfg_roots(roots: Vec<PathBuf>, globs: Vec<&str>) -> CoderConfig { CoderConfig { - base_path: base, + base_paths: roots, non_accessible_globs: globs.into_iter().map(String::from).collect(), ..CoderConfig::default() } } + fn cfg_with(base: PathBuf, globs: Vec<&str>) -> CoderConfig { + cfg_roots(vec![base], globs) + } + + fn canon(p: &Path) -> PathBuf { + std::fs::canonicalize(p).unwrap() + } + #[test] fn resolve_dot_returns_base_root() { let tmp = tempdir().unwrap(); let r = PathResolver::new(&cfg_with(tmp.path().to_path_buf(), vec![])).unwrap(); let got = r.resolve(".").unwrap(); - assert_eq!(got, std::fs::canonicalize(tmp.path()).unwrap()); + assert_eq!(got, canon(tmp.path())); } #[test] @@ -210,13 +428,52 @@ mod tests { } #[test] - fn resolve_absolute_input_rejected_as_bad_input() { + fn relative_resolves_against_primary_root_only() { + let a = tempdir().unwrap(); + let b = tempdir().unwrap(); + // The file only exists in the SECOND root; a relative wire path + // must still anchor at the primary (first) root.
+ std::fs::write(b.path().join("f.txt"), b"secondary").unwrap(); + let r = PathResolver::new(&cfg_roots( + vec![a.path().to_path_buf(), b.path().to_path_buf()], + vec![], + )) + .unwrap(); + let got = r.resolve("f.txt").unwrap(); + assert!(got.starts_with(canon(a.path()))); + assert!(!got.starts_with(canon(b.path()))); + } + + #[test] + fn absolute_inside_any_root_accepted() { + let a = tempdir().unwrap(); + let b = tempdir().unwrap(); + std::fs::write(a.path().join("x.txt"), b"x").unwrap(); + std::fs::write(b.path().join("y.txt"), b"y").unwrap(); + let r = PathResolver::new(&cfg_roots( + vec![a.path().to_path_buf(), b.path().to_path_buf()], + vec![], + )) + .unwrap(); + let in_a = r + .resolve(&a.path().join("x.txt").display().to_string()) + .unwrap(); + assert!(in_a.starts_with(canon(a.path()))); + let in_b = r + .resolve(&b.path().join("y.txt").display().to_string()) + .unwrap(); + assert!(in_b.starts_with(canon(b.path()))); + } + + #[test] + fn absolute_outside_all_roots_rejected_with_c215() { let tmp = tempdir().unwrap(); let r = PathResolver::new(&cfg_with(tmp.path().to_path_buf(), vec![])).unwrap(); let err = r.resolve("/etc/passwd").unwrap_err(); - assert_eq!(err.code(), "C210"); + assert_eq!(err.code(), "C215"); } + // REGRESSION PIN: `..` escapes must keep failing closed. #[test] fn resolve_dotdot_escape_rejected_as_outside_base() { let tmp = tempdir().unwrap(); @@ -225,6 +482,21 @@ mod tests { assert_eq!(err.code(), "C215"); } + #[test] + fn absolute_dotdot_escape_rejected_per_root() { + let a = tempdir().unwrap(); + let b = tempdir().unwrap(); + let r = PathResolver::new(&cfg_roots( + vec![a.path().to_path_buf(), b.path().to_path_buf()], + vec![], + )) + .unwrap(); + // `/../escape.txt` collapses to b's PARENT — outside both. 
+ let input = format!("{}/../escape.txt", b.path().display()); + let err = r.resolve(&input).unwrap_err(); + assert_eq!(err.code(), "C215"); + } + #[test] fn resolve_through_symlink_escape_rejected() { let tmp = tempdir().unwrap(); @@ -235,6 +507,156 @@ mod tests { assert_eq!(err.code(), "C215"); } + #[test] + fn dangling_symlink_in_tail_rejected_with_c215() { + let tmp = tempdir().unwrap(); + std::os::unix::fs::symlink(tmp.path().join("missing-target"), tmp.path().join("dangle")) + .unwrap(); + let r = PathResolver::new(&cfg_with(tmp.path().to_path_buf(), vec![])).unwrap(); + let err = r.resolve("dangle/child.txt").unwrap_err(); + assert_eq!(err.code(), "C215"); + } + + #[test] + fn root_and_input_both_canonicalize_before_containment_check() { + // On macOS `/tmp` is a symlink to `/private/tmp`; both the + // configured root and the wire input must be canonicalized before + // the starts_with comparison, or every absolute `/tmp/...` input + // would be rejected. The expectation is canonicalized too so this + // passes on Linux (where /tmp is already canonical). 
+ let r = PathResolver::new(&cfg_roots(vec![PathBuf::from("/tmp")], vec![])).unwrap(); + let name = format!("coder-multiroot-test-{}", std::process::id()); + let got = r.resolve(&format!("/tmp/{name}")).unwrap(); + let expected = std::fs::canonicalize("/tmp").unwrap().join(&name); + assert_eq!(got, expected); + } + + #[test] + fn same_filename_in_two_roots_resolves_to_each_own_root() { + let a = tempdir().unwrap(); + let b = tempdir().unwrap(); + std::fs::write(a.path().join("same.txt"), b"a").unwrap(); + std::fs::write(b.path().join("same.txt"), b"b").unwrap(); + let r = PathResolver::new(&cfg_roots( + vec![a.path().to_path_buf(), b.path().to_path_buf()], + vec![], + )) + .unwrap(); + let in_a = r + .resolve(&a.path().join("same.txt").display().to_string()) + .unwrap(); + let in_b = r + .resolve(&b.path().join("same.txt").display().to_string()) + .unwrap(); + assert_ne!(in_a, in_b, "absolute responses must disambiguate roots"); + assert!(in_a.starts_with(canon(a.path()))); + assert!(in_b.starts_with(canon(b.path()))); + } + + #[test] + fn both_base_path_and_base_paths_set_is_construction_error() { + let a = tempdir().unwrap(); + let b = tempdir().unwrap(); + let cfg = CoderConfig { + base_path: Some(a.path().to_path_buf()), + base_paths: vec![b.path().to_path_buf()], + ..CoderConfig::default() + }; + let err = PathResolver::new(&cfg).unwrap_err(); + assert_eq!(err.code(), "C210"); + } + + #[test] + fn zero_reachable_roots_is_construction_error() { + let cfg = cfg_roots( + vec![ + PathBuf::from("/this/does/not/exist/a-xyz123"), + PathBuf::from("/this/does/not/exist/b-xyz123"), + ], + vec![], + ); + let err = PathResolver::new(&cfg).unwrap_err(); + // C210: operator config error, same class as the both-set case. 
+ assert_eq!(err.code(), "C210"); + } + + #[test] + fn unreachable_root_among_several_is_skipped() { + let a = tempdir().unwrap(); + let cfg = cfg_roots( + vec![ + a.path().to_path_buf(), + PathBuf::from("/this/does/not/exist/xyz123"), + ], + vec![], + ); + let r = PathResolver::new(&cfg).unwrap(); + assert_eq!(r.roots().len(), 1); + assert_eq!(r.base_root(), canon(a.path())); + assert!(r.resolve(".").is_ok()); + } + + #[test] + fn duplicate_roots_deduped_after_canonicalization() { + // The same directory configured twice — once verbatim, once in its + // canonical form (on macOS tempdirs live under /var/folders, an + // alias of /private/var/folders, so these strings can differ) — + // must collapse to a single canonical root. + let tmp = tempdir().unwrap(); + let cfg = cfg_roots(vec![tmp.path().to_path_buf(), canon(tmp.path())], vec![]); + let r = PathResolver::new(&cfg).unwrap(); + assert_eq!(r.roots().len(), 1); + assert_eq!(r.base_root(), canon(tmp.path())); + } + + #[test] + fn non_accessible_glob_matches_per_containing_root() { + let a = tempdir().unwrap(); + let b = tempdir().unwrap(); + std::fs::write(a.path().join(".env"), b"x").unwrap(); + std::fs::write(b.path().join(".env"), b"x").unwrap(); + let r = PathResolver::new(&cfg_roots( + vec![a.path().to_path_buf(), b.path().to_path_buf()], + vec!["**/.env"], + )) + .unwrap(); + let abs_a = r + .resolve(&a.path().join(".env").display().to_string()) + .unwrap(); + let abs_b = r + .resolve(&b.path().join(".env").display().to_string()) + .unwrap(); + assert!(r.is_non_accessible(&abs_a), ".env in root[0] must match"); + assert!(r.is_non_accessible(&abs_b), ".env in root[1] must match"); + } + + #[test] + fn default_config_constructs_resolver() { + // CI interface collection boots the worker with zero config from a + // scratch cwd; the defaults are ["./", "/tmp"] and "./" must + // canonicalize from whatever cwd the process happens to have. 
+ let r = PathResolver::new(&CoderConfig::default()).unwrap(); + assert!(!r.roots().is_empty()); + assert_eq!( + r.base_root(), + std::fs::canonicalize(std::env::current_dir().unwrap()).unwrap() + ); + } + + #[test] + fn legacy_base_path_honored_as_single_root() { + let tmp = tempdir().unwrap(); + let cfg = CoderConfig { + base_path: Some(tmp.path().to_path_buf()), + ..CoderConfig::default() + }; + let r = PathResolver::new(&cfg).unwrap(); + assert_eq!(r.roots().len(), 1); + assert_eq!(r.resolve(".").unwrap(), canon(tmp.path())); + let err = r.resolve("/etc/passwd").unwrap_err(); + assert_eq!(err.code(), "C215"); + } + #[test] fn is_non_accessible_matches_root_dotenv() { let tmp = tempdir().unwrap(); @@ -284,12 +706,222 @@ mod tests { } #[test] - fn new_with_missing_base_path_returns_io_error() { + fn new_with_invalid_default_exclude_glob_returns_bad_input() { + let tmp = tempdir().unwrap(); + let cfg = CoderConfig { + base_paths: vec![tmp.path().to_path_buf()], + default_exclude_globs: vec!["[".to_string()], + ..CoderConfig::default() + }; + let err = PathResolver::new(&cfg).unwrap_err(); + assert_eq!(err.code(), "C210"); + assert!( + err.to_string().contains("default_exclude_glob"), + "message must name the config key: {err}" + ); + } + + // DIR-BOUNDARY PIN: `**/node_modules/**` only matches paths INSIDE + // the directory; the dir-set companion must catch the directory + // itself so descent suppression triggers at the boundary — while the + // plain set must NOT match the bare name, or files/symlinks merely + // named like an excluded directory would be dropped. 
+ #[test] + fn default_exclude_matches_dir_itself_not_just_children() { + let tmp = tempdir().unwrap(); + std::fs::create_dir_all(tmp.path().join("node_modules/pkg")).unwrap(); + std::fs::create_dir_all(tmp.path().join("sub/node_modules")).unwrap(); + let r = PathResolver::new(&cfg_with(tmp.path().to_path_buf(), vec![])).unwrap(); + let dir = r.resolve("node_modules").unwrap(); + assert!( + r.is_default_excluded_dir(&dir), + "the directory itself must match, not only its children" + ); + assert!( + !r.is_default_excluded(&dir), + "companions live only in the dir set: a non-directory entry \ + merely NAMED node_modules must not match the plain set" + ); + let child = r.resolve("node_modules/pkg").unwrap(); + assert!(r.is_default_excluded(&child)); + assert!(r.is_default_excluded_dir(&child)); + let nested = r.resolve("sub/node_modules").unwrap(); + assert!( + r.is_default_excluded_dir(&nested), + "nested dir boundary must match" + ); + } + + #[test] + fn default_exclude_false_for_ordinary_paths() { + let tmp = tempdir().unwrap(); + std::fs::create_dir_all(tmp.path().join("src")).unwrap(); + std::fs::write(tmp.path().join("src/main.rs"), b"x").unwrap(); + let r = PathResolver::new(&cfg_with(tmp.path().to_path_buf(), vec![])).unwrap(); + let f = r.resolve("src/main.rs").unwrap(); + assert!(!r.is_default_excluded(&f)); + let d = r.resolve("src").unwrap(); + assert!(!r.is_default_excluded(&d)); + assert!(!r.is_default_excluded_dir(&d)); + } + + #[test] + fn dir_companions_derived_only_for_slash_star_star_suffixes() { + let patterns = vec![ + "**/node_modules/**".to_string(), + "**/*.log".to_string(), + "/**".to_string(), + ]; + assert_eq!( + with_dir_companions(&patterns), + vec![ + "**/node_modules/**".to_string(), + "**/node_modules".to_string(), + "**/*.log".to_string(), + "/**".to_string(), + ], + "no companion for non-/** patterns; empty companion of the \ + degenerate /** must be dropped" + ); + } + + #[test] + fn 
degenerate_slash_star_star_exclude_still_constructs() { + let tmp = tempdir().unwrap(); + std::fs::create_dir_all(tmp.path().join("src")).unwrap(); let cfg = CoderConfig { - base_path: PathBuf::from("/this/does/not/exist/probably/xyz123"), + base_paths: vec![tmp.path().to_path_buf()], + default_exclude_globs: vec!["/**".to_string()], + ..CoderConfig::default() + }; + let r = PathResolver::new(&cfg).expect("empty companion must be dropped, not compiled"); + let d = r.resolve("src").unwrap(); + assert!(!r.is_default_excluded_dir(&d)); + } + + // REDACTION INVARIANT separation: the hide-only exclude set must not + // bleed into the access-control set. + #[test] + fn default_exclude_does_not_make_paths_non_accessible() { + let tmp = tempdir().unwrap(); + std::fs::create_dir_all(tmp.path().join("node_modules")).unwrap(); + let r = PathResolver::new(&cfg_with(tmp.path().to_path_buf(), vec![])).unwrap(); + let dir = r.resolve("node_modules").unwrap(); + assert!(!r.is_non_accessible(&dir)); + assert!(r.require_writable("node_modules/x.txt").is_ok()); + } + + #[test] + fn legacy_missing_base_path_is_construction_error() { + let cfg = CoderConfig { + base_path: Some(PathBuf::from("/this/does/not/exist/probably/xyz123")), ..CoderConfig::default() }; let err = PathResolver::new(&cfg).unwrap_err(); - assert_eq!(err.code(), "C216"); + // C210: operator config error, same class as the both-set case. + assert_eq!(err.code(), "C210"); + } + + // RECOVERY-PAIR TEST: parse the first allowed root out of the C215 error + // text, write a file there, then verify success. This proves the error + // message alone contains enough information for a caller to make a + // successful second call. + // + // The message format is: + // "... {C215_ROOTS_PREFIX}, . {SHELL_FS_HINT}" + // We parse using the SAME consts the format! sites use, so the test + // and the message can never drift apart. 
+ #[test] + fn c215_error_text_alone_enables_successful_second_call() { + let tmp = tempdir().unwrap(); + let r = PathResolver::new(&cfg_with(tmp.path().to_path_buf(), vec![])).unwrap(); + + // Trigger C215 with an absolute path outside the root. + let err = r.resolve("/etc/passwd").unwrap_err(); + assert_eq!(err.code(), "C215"); + let msg = err.to_string(); + + // Parse the first allowed root from the error text using the + // shared marker consts. + let after_prefix = msg + .split(C215_ROOTS_PREFIX) + .nth(1) + .expect("C215 message must contain C215_ROOTS_PREFIX"); + + // The roots list ends where the shell::fs re-route hint begins. + let hint_marker = format!(". {SHELL_FS_HINT}"); + let roots_list = after_prefix + .split(hint_marker.as_str()) + .next() + .expect("roots section must be followed by the shell::fs hint"); + + // The first root is everything up to the first ", " (or the whole + // string if there is only one root). + let first_root = roots_list + .split(", ") + .next() + .expect("at least one root") + .trim() + .to_string(); + + // Now write a file inside that first root and verify it resolves. + let target = format!("{first_root}/c215_recovery_test.txt"); + std::fs::write(&target, b"ok").unwrap(); + let resolved = r + .resolve(&target) + .expect("resolving a path parsed from the C215 error text must succeed"); + assert!(resolved.starts_with(canon(tmp.path()))); + // Cleanup. + let _ = std::fs::remove_file(&target); + } + + // NOTE: the C211 identical-wording invariant (missing vs glob-denied) + // is pinned end-to-end through a real handler call in + // functions::update_file::handler_tests:: + // c211_wording_identical_for_missing_and_glob_denied. + + // C215 messages name the primary root for relative-escape + // and name all roots for absolute-escape.
+ #[test] + fn c215_relative_escape_names_primary_root_in_message() { + let tmp = tempdir().unwrap(); + let r = PathResolver::new(&cfg_with(tmp.path().to_path_buf(), vec![])).unwrap(); + let err = r.resolve("../../../etc/passwd").unwrap_err(); + assert_eq!(err.code(), "C215"); + let msg = err.to_string(); + let primary = r.base_root().display().to_string(); + assert!( + msg.contains(&primary), + "C215 relative-escape message must name the primary root \ + ({primary}); got: {msg}" + ); + } + + #[test] + fn c215_absolute_outside_message_names_all_roots() { + let a = tempdir().unwrap(); + let b = tempdir().unwrap(); + let r = PathResolver::new(&cfg_roots( + vec![a.path().to_path_buf(), b.path().to_path_buf()], + vec![], + )) + .unwrap(); + let err = r.resolve("/etc/passwd").unwrap_err(); + assert_eq!(err.code(), "C215"); + let msg = err.to_string(); + let root_a = canon(a.path()).display().to_string(); + let root_b = canon(b.path()).display().to_string(); + assert!( + msg.contains(&root_a), + "C215 absolute-outside message must name root_a ({root_a}); got: {msg}" + ); + assert!( + msg.contains(&root_b), + "C215 absolute-outside message must name root_b ({root_b}); got: {msg}" + ); + assert!( + msg.contains(C215_ROOTS_PREFIX), + "C215 message must contain C215_ROOTS_PREFIX; got: {msg}" + ); } } diff --git a/coder/tests/common/helpers.rs b/coder/tests/common/helpers.rs index e0c27358..59c0fc15 100644 --- a/coder/tests/common/helpers.rs +++ b/coder/tests/common/helpers.rs @@ -103,8 +103,24 @@ pub fn last_err(world: &CoderWorld) -> String { /// Locate a batch result entry by `path` in `results[]`. Returns `None` /// when missing. Shared by create-file / update-file / delete-file /// assertions. +/// +/// Result `path`s are canonical-absolute when the input resolved inside +/// the jail and verbatim otherwise; features speak base-relative, so +/// accept either the verbatim form or the path anchored at the +/// scenario's (canonical) base. 
pub fn batch_result<'a>(world: &'a CoderWorld, path: &str) -> Option<&'a Value> { + let abs = world.base_path.as_ref().map(|b| b.join(path)); let v = world.stash.get(LAST_OK)?; let arr = v.get("results")?.as_array()?; - arr.iter().find(|e| e["path"].as_str() == Some(path)) + arr.iter().find(|e| { + let Some(got) = e["path"].as_str() else { + return false; + }; + // Component-wise Path comparison so "." anchors to the base + // itself rather than the literal "/." string. + got == path + || abs + .as_deref() + .is_some_and(|a| std::path::Path::new(got) == a) + }) } diff --git a/coder/tests/common/workers.rs b/coder/tests/common/workers.rs index 80e50479..2c5bc1cd 100644 --- a/coder/tests/common/workers.rs +++ b/coder/tests/common/workers.rs @@ -2,7 +2,7 @@ //! shared SDK handle. Re-uses the production entry point //! `coder::functions::register_all` so scenarios exercise identical //! code paths. Registration is idempotent (`OnceCell`); we wipe the -//! per-test `base_path` between scenarios so leftover files from one +//! per-test allowed root between scenarios so leftover files from one //! scenario don't pollute the next. use std::path::Path; @@ -14,8 +14,10 @@ use iii_sdk::III; use tokio::sync::OnceCell; use coder::config::CoderConfig; +use coder::configuration::ConfigCell; use coder::functions; use coder::path::PathResolver; +use tokio::sync::RwLock; /// Caps used by the shared in-process worker. Small enough that a few /// kilobytes of fixture content triggers `C213` for oversize scenarios, @@ -30,7 +32,7 @@ const TEST_LIST_MAX_PAGE_SIZE: u32 = 100; pub struct Shared { pub cfg: Arc, - /// Canonicalised `base_path` the worker reads + writes against. + /// Canonicalised primary allowed root the worker reads + writes against. /// Fixture writes from step defs go here. 
pub base_path: PathBuf, } @@ -54,7 +56,7 @@ pub async fn register_all(iii: &Arc) -> Result> { let base_path = std::fs::canonicalize(&base_path)?; let cfg = Arc::new(CoderConfig { - base_path: base_path.clone(), + base_paths: vec![base_path.clone()], non_accessible_globs: vec!["**/.env".to_string(), "**/*.pem".to_string()], max_read_bytes: TEST_MAX_READ_BYTES, max_write_bytes: TEST_MAX_WRITE_BYTES, @@ -64,7 +66,10 @@ pub async fn register_all(iii: &Arc) -> Result> { }); let resolver = Arc::new(PathResolver::new(&cfg)?); - functions::register_all(iii, resolver, cfg.clone()); + // The production handlers read a hot-swappable snapshot cell; the BDD + // harness never hot-reloads, so the cell just wraps the boot cfg. + let cell: ConfigCell = Arc::new(RwLock::new(cfg.clone())); + functions::register_all(iii, resolver, cell); // Give the SDK a beat to publish the function registrations before // scenarios start triggering them. diff --git a/coder/tests/features/delete_file.feature b/coder/tests/features/delete_file.feature index 3cd3227d..b9be42f8 100644 --- a/coder/tests/features/delete_file.feature +++ b/coder/tests/features/delete_file.feature @@ -3,7 +3,7 @@ Feature: coder::delete-file Per-path results in `results[]`. Missing paths are idempotent successes with `removed: false`. Directories need `recursive: true`; recursive deletes refuse to descend through non-accessible entries - and the worker refuses to delete `base_path` itself. + and the worker refuses to delete an allowed root itself. 
Background: Given the iii engine is reachable @@ -84,7 +84,7 @@ Feature: coder::delete-file """ Then the result for ".env" failed with code "C211" - Scenario: deleting base_path itself fails with C210 + Scenario: deleting an allowed root itself fails with C210 When I call coder::delete-file with payload: """ {"paths": ["."]} diff --git a/coder/tests/features/path_security.feature b/coder/tests/features/path_security.feature index 38e2e6f5..2e6bd3a7 100644 --- a/coder/tests/features/path_security.feature +++ b/coder/tests/features/path_security.feature @@ -1,9 +1,10 @@ @engine @security Feature: path-jail invariants - All wire paths are interpreted as relative to `base_path`. Absolute - paths are bad input (`C210`); `..` escapes and crafted symlinks - cannot leave the jail (`C215`); non-accessible globs block reads - even though the entry is still listed (`C211`). + Relative wire paths are interpreted against the primary root; absolute + wire paths are accepted only when they canonicalize inside an allowed + root. Absolute paths outside all roots, `..` escapes, and crafted + symlinks cannot leave the jail (`C215`); non-accessible globs block + reads even though the entry is still listed (`C211`). 
Background: Given the iii engine is reachable @@ -25,24 +26,24 @@ Feature: path-jail invariants Then the call succeeded And the result for "../escape.txt" failed with code "C215" - Scenario: an absolute path is rejected as C210 on read + Scenario: an absolute path outside all roots fails with C215 on read When I call coder::read-file with payload: """ {"path": "/etc/passwd"} """ - Then the call failed with code "C210" + Then the call failed with code "C215" - Scenario: an absolute path is rejected per item on create + Scenario: an absolute path outside all roots is rejected per item on create When I call coder::create-file with payload: """ {"files": [ {"path": "/tmp/abs.txt", "content": "x", "mode": "0644", "parents": false, "overwrite": false} ]} """ - Then the result for "/tmp/abs.txt" failed with code "C210" + Then the result for "/tmp/abs.txt" failed with code "C215" @unix - Scenario: a symlink whose target escapes base_path is rejected with C215 + Scenario: a symlink whose target escapes the allowed roots is rejected with C215 Given a symlink at "escape_link" pointing to a path outside base When I call coder::read-file with payload: """ diff --git a/coder/tests/features/search.feature b/coder/tests/features/search.feature index 2083241b..66ee72e8 100644 --- a/coder/tests/features/search.feature +++ b/coder/tests/features/search.feature @@ -1,10 +1,14 @@ @engine @search Feature: coder::search - Combined path + content search under `base_path`. Supports literal + Combined path + content search under the allowed roots. Supports literal and regex queries with include / exclude globs, returns content and path matches in separate arrays, and refuses to read non-accessible files. Binary files (NUL byte heuristic) and oversize files are - skipped silently. + skipped silently. 
Noise paths matching default_exclude_globs are + skipped by default (use_default_excludes: false searches inside); + optional context lines (max 10 each way) ride along with content + matches, and the response is bounded by a byte budget that flags + truncated instead of erroring. Background: Given the iii engine is reachable @@ -167,3 +171,52 @@ Feature: coder::search {"query": "x", "search_content": false, "search_paths": false} """ Then the call failed with code "C210" + + Scenario: context_lines_before above the cap is rejected with C210 + When I call coder::search with payload: + """ + {"query": "x", "context_lines_before": 11} + """ + Then the call failed with code "C210" + + Scenario: context lines within the cap still succeed + Given a file at "ctx.txt" with content: + """ + one + two needle + three + """ + When I call coder::search with payload: + """ + {"query": "needle", "context_lines_before": 1, "context_lines_after": 1} + """ + Then the call succeeded + And the search has a content match for "ctx.txt" at line 2 + + Scenario: default excludes hide node_modules from content and path results + Given a file at "node_modules/pkg/dep.js" with content: + """ + banana + """ + And a file at "src/ok.txt" with content: + """ + banana + """ + When I call coder::search with payload: + """ + {"query": "banana"} + """ + Then the search has a content match for "src/ok.txt" + And the search has no content match for "node_modules/pkg/dep.js" + And the search has no path match for "node_modules/pkg/dep.js" + + Scenario: use_default_excludes false searches inside excluded folders + Given a file at "node_modules/pkg/dep.js" with content: + """ + banana + """ + When I call coder::search with payload: + """ + {"query": "banana", "use_default_excludes": false} + """ + Then the search has a content match for "node_modules/pkg/dep.js" diff --git a/coder/tests/features/tree.feature b/coder/tests/features/tree.feature index ddf73d45..c00104de 100644 --- 
a/coder/tests/features/tree.feature +++ b/coder/tests/features/tree.feature @@ -93,6 +93,38 @@ Feature: coder::tree Then the tree has a node at "inside.txt" And the tree has no node at "outside.txt" + Scenario: default-excluded directories appear as childless stubs + Given a file at "node_modules/pkg/index.js" with content: + """ + x + """ + And a file at "src/main.rs" with content: + """ + fn main() {} + """ + When I call coder::tree with payload: + """ + {"path": "."} + """ + Then the call succeeded + And the tree has a node at "node_modules" + And the tree node at "node_modules" is truncated with reason "default_exclude" + And the tree node at "node_modules" hint mentions "use_default_excludes" + And the tree has no node at "node_modules/pkg" + And the tree has a node at "src/main.rs" + + Scenario: use_default_excludes false descends into excluded directories + Given a file at "node_modules/pkg/index.js" with content: + """ + x + """ + When I call coder::tree with payload: + """ + {"path": ".", "use_default_excludes": false} + """ + Then the call succeeded + And the tree has a node at "node_modules/pkg/index.js" + Scenario: tree on a missing folder fails with C211 When I call coder::tree with payload: """ diff --git a/coder/tests/features/update_file.feature b/coder/tests/features/update_file.feature index b3e5baf8..0ba0989d 100644 --- a/coder/tests/features/update_file.feature +++ b/coder/tests/features/update_file.feature @@ -249,3 +249,61 @@ Feature: coder::update-file NEW NEW """ + + Scenario: dot_matches_newline lets a short anchored pattern span lines + Given a file at "doc.txt" with content: + """ + before + start + middle + end + after + """ + When I call coder::update-file with payload: + """ + {"files": [ + {"path": "doc.txt", "ops": [ + {"op": "replace", "pattern": "start.*?end", "replacement": "ONE", + "dot_matches_newline": true} + ]} + ]} + """ + Then the result for "doc.txt" succeeded + And the result for "doc.txt" has line count 3 + When I call 
coder::read-file with payload: + """ + {"path": "doc.txt"} + """ + Then the read content equals: + """ + before + ONE + after + """ + + Scenario: expect_matches mismatch fails the file and leaves it unchanged + Given a file at "doc.txt" with content: + """ + foo + foo + foo + """ + When I call coder::update-file with payload: + """ + {"files": [ + {"path": "doc.txt", "ops": [ + {"op": "replace", "pattern": "foo", "replacement": "bar", "expect_matches": 1} + ]} + ]} + """ + Then the result for "doc.txt" failed with code "C210" + When I call coder::read-file with payload: + """ + {"path": "doc.txt"} + """ + Then the read content equals: + """ + foo + foo + foo + """ diff --git a/coder/tests/golden/errors.json b/coder/tests/golden/errors.json new file mode 100644 index 00000000..4c7f168d --- /dev/null +++ b/coder/tests/golden/errors.json @@ -0,0 +1,78 @@ +{ + "C210_config_both_root_forms_set": { + "code": "C210", + "message": "both `base_path` and `base_paths` are set; set either `base_path` or `base_paths` in config.yaml, not both. Remove `base_path` and keep only `base_paths` (legacy `base_path` is honored as a one-entry list)." + }, + "C210_create_bad_mode": { + "code": "C210", + "message": "bad mode \"9z9\": invalid digit found in string" + }, + "C210_move_cross_root_dir": { + "code": "C210", + "message": "cross-root directory moves are unsupported; move files individually" + }, + "C210_move_dst_is_directory": { + "code": "C210", + "message": "move-dst-dir: destination is a directory; name the target file inside it (e.g. move-dst-dir/move-dir-src.txt)" + }, + "C210_replace_undefined_capture_ref": { + "code": "C210", + "message": "replacement references capture group `${name}` but pattern \"iii\\.registerFunction\\(.*\" defines 0 capture groups and no group named `name` — the regex engine expands undefined references to the EMPTY STRING, silently corrupting the file. 
Escape literal `$` as `$$` (write `$${name}` to output a literal `${name}` — common when the replacement contains JS/TS template literals), or add the capture group to the pattern" + }, + "C211_delete_subtree_blocked": { + "code": "C211", + "message": "/blocked-dir: subtree contains non-accessible entries; refusing recursive delete." + }, + "C211_move_missing_src": { + "code": "C211", + "message": "no-such-move-src.txt: not found or not accessible. Verify the path with coder::list-folder or coder::tree." + }, + "C211_read_glob_denied": { + "code": "C211", + "message": ".env: not found or not accessible. Verify the path with coder::list-folder or coder::tree." + }, + "C211_read_missing": { + "code": "C211", + "message": "missing.txt: not found or not accessible. Verify the path with coder::list-folder or coder::tree." + }, + "C213_batch_budget_exhausted": { + "code": "C213", + "message": "batch budget exhausted before reaching budget-b.txt: batch_read_budget_bytes is 5 and earlier entries already returned 5 bytes of content (after UTF-8 sanitization). To recover: request fewer or smaller entries, use per-entry line_from/line_to windows, or raise batch_read_budget_bytes in coder config." + }, + "C213_full_read_output_budget_exceeded": { + "code": "C213", + "message": "over-output.txt: a full read would return 9 bytes of content (file is 9 bytes, 3 lines), which exceeds max_output_bytes (8). To recover: read a slice with line_from/line_to, probe metadata cheaply with stat: true, or re-call with a higher per-call max_output_bytes (values above max_read_bytes are clamped)." + }, + "C213_read_cap_exceeded": { + "code": "C213", + "message": "big.txt is 16 bytes, which exceeds max_read_bytes (8). Read a smaller file, raise max_read_bytes in coder config, or read a slice with line_from/line_to." + }, + "C213_write_cap_exceeded": { + "code": "C213", + "message": "big-create.txt is 16 bytes, which exceeds max_write_bytes (8). 
Split the content into smaller files or raise max_write_bytes in coder config." + }, + "C215_absolute_outside_all_roots": { + "code": "C215", + "message": "path is outside every allowed root: /etc/passwd. Allowed roots: , . Use a path inside an allowed root, or the shell worker's shell::fs::* for other host paths." + }, + "C215_dangling_symlink": { + "code": "C215", + "message": "dangle/child.txt: dangling symlink in path: /dangle. Allowed roots: , . Use a path inside an allowed root, or the shell worker's shell::fs::* for other host paths." + }, + "C215_relative_dotdot_escape": { + "code": "C215", + "message": "path escapes the primary allowed root : ../escape.txt. Relative paths resolve against ; use an absolute path inside an allowed root instead." + }, + "C216_io_passthrough": { + "code": "C216", + "message": "synthetic io failure" + }, + "C217_create_exists_without_overwrite": { + "code": "C217", + "message": "exists.txt already exists; pass overwrite=true to replace" + }, + "C217_move_dst_exists_without_overwrite": { + "code": "C217", + "message": "move-dst.txt already exists; pass overwrite=true to replace" + } +} diff --git a/coder/tests/golden/schemas/coder.create-file.json b/coder/tests/golden/schemas/coder.create-file.json new file mode 100644 index 00000000..f99a4ea7 --- /dev/null +++ b/coder/tests/golden/schemas/coder.create-file.json @@ -0,0 +1,138 @@ +{ + "description": "Create one or more files. Request shape: {\"files\": [{\"path\": \"...\", \"content\": \"...\"}]}. Per-file `overwrite` and `parents` flags; non-accessible paths return C211. 
Paths are relative to the primary allowed root or absolute inside any allowed root (coder::info lists them); for host paths outside the jail use shell::fs::*.", + "function_id": "coder::create-file", + "request_schema": { + "$schema": "http://json-schema.org/draft-07/schema#", + "definitions": { + "CreateFileSpec": { + "properties": { + "content": { + "type": "string" + }, + "mode": { + "default": "0644", + "description": "Octal permission bits as a string, e.g. \"0644\". Defaults to \"0644\".", + "type": "string" + }, + "overwrite": { + "default": false, + "description": "When false (the default), refuse to write if `path` already exists.", + "type": "boolean" + }, + "parents": { + "default": true, + "description": "Create missing parent directories. Defaults to true so a single `coder::create-file` call can scaffold a fresh subtree.", + "type": "boolean" + }, + "path": { + "description": "Path relative to the primary allowed root, or an absolute path inside any allowed root. Call `coder::info` to see the allowed roots. 
Paths outside every allowed root are rejected — use the shell worker's `shell::fs::*` for host paths outside the jail.", + "type": "string" + } + }, + "required": [ + "content", + "path" + ], + "type": "object" + } + }, + "examples": [ + { + "files": [ + { + "content": "pub mod utils;\n", + "overwrite": false, + "path": "src/lib.rs" + }, + { + "content": "# scratch notes\n", + "overwrite": true, + "path": "/tmp/scratch/notes.md" + } + ] + } + ], + "properties": { + "files": { + "items": { + "$ref": "#/definitions/CreateFileSpec" + }, + "type": "array" + } + }, + "required": [ + "files" + ], + "title": "CreateFileInput", + "type": "object" + }, + "response_schema": { + "$schema": "http://json-schema.org/draft-07/schema#", + "definitions": { + "CreateFileResult": { + "properties": { + "bytes_written": { + "format": "uint64", + "minimum": 0.0, + "type": "integer" + }, + "error": { + "anyOf": [ + { + "$ref": "#/definitions/WireError" + }, + { + "type": "null" + } + ], + "description": "Structured error for this entry. `code` is stable for programmatic branching (e.g. `\"C217\"` means already-exists; pass `overwrite=true` to replace). `message` carries the corrective action an LLM agent needs to make a successful second call." + }, + "path": { + "description": "Canonical absolute path (resolved through the jail); the caller's input verbatim when resolution failed.", + "type": "string" + }, + "success": { + "type": "boolean" + } + }, + "required": [ + "bytes_written", + "path", + "success" + ], + "type": "object" + }, + "WireError": { + "description": "Structured per-entry error as it appears on the wire.\n\nUse `code` for stable programmatic branching (e.g. `\"C211\"` for not-found-or-denied). `message` carries the human/LLM-readable problem description plus the corrective next call.", + "properties": { + "code": { + "description": "Stable error code, e.g. \"C211\". 
See the README error table.", + "type": "string" + }, + "message": { + "description": "Human/LLM-readable message: problem + actual values + corrective next call.", + "type": "string" + } + }, + "required": [ + "code", + "message" + ], + "type": "object" + } + }, + "properties": { + "results": { + "items": { + "$ref": "#/definitions/CreateFileResult" + }, + "type": "array" + } + }, + "required": [ + "results" + ], + "title": "CreateFileOutput", + "type": "object" + } +} diff --git a/coder/tests/golden/schemas/coder.delete-file.json b/coder/tests/golden/schemas/coder.delete-file.json new file mode 100644 index 00000000..a605c76b --- /dev/null +++ b/coder/tests/golden/schemas/coder.delete-file.json @@ -0,0 +1,102 @@ +{ + "description": "Remove one or more paths. Request shape: {\"paths\": [\"...\"]}. Directories need `recursive: true`; missing paths are idempotent successes; recursive removal refuses to descend through non-accessible entries. Paths are relative to the primary allowed root or absolute inside any allowed root (coder::info lists them); for host paths outside the jail use shell::fs::*.", + "function_id": "coder::delete-file", + "request_schema": { + "$schema": "http://json-schema.org/draft-07/schema#", + "examples": [ + { + "paths": [ + "src/old_module.rs", + "build/artifacts" + ], + "recursive": true + } + ], + "properties": { + "paths": { + "description": "Paths to remove. Each entry is relative to the primary allowed root, or an absolute path inside any allowed root. Call `coder::info` to see the allowed roots. Paths outside every allowed root are rejected — use the shell worker's `shell::fs::*` for host paths outside the jail.", + "items": { + "type": "string" + }, + "type": "array" + }, + "recursive": { + "default": false, + "description": "Required for non-empty directories. 
Files and empty dirs ignore it.", + "type": "boolean" + } + }, + "required": [ + "paths" + ], + "title": "DeleteFileInput", + "type": "object" + }, + "response_schema": { + "$schema": "http://json-schema.org/draft-07/schema#", + "definitions": { + "DeleteFileResult": { + "properties": { + "error": { + "anyOf": [ + { + "$ref": "#/definitions/WireError" + }, + { + "type": "null" + } + ], + "description": "Structured error for this entry. `code` is stable for programmatic branching (e.g. `\"C211\"` for not-found-or-denied; `\"C210\"` for refusing to delete an allowed root). `message` carries the corrective action an LLM agent needs to make a successful second call." + }, + "path": { + "description": "Canonical absolute path (resolved through the jail); the caller's input verbatim when resolution failed.", + "type": "string" + }, + "removed": { + "type": "boolean" + }, + "success": { + "type": "boolean" + } + }, + "required": [ + "path", + "removed", + "success" + ], + "type": "object" + }, + "WireError": { + "description": "Structured per-entry error as it appears on the wire.\n\nUse `code` for stable programmatic branching (e.g. `\"C211\"` for not-found-or-denied). `message` carries the human/LLM-readable problem description plus the corrective next call.", + "properties": { + "code": { + "description": "Stable error code, e.g. \"C211\". 
See the README error table.", + "type": "string" + }, + "message": { + "description": "Human/LLM-readable message: problem + actual values + corrective next call.", + "type": "string" + } + }, + "required": [ + "code", + "message" + ], + "type": "object" + } + }, + "properties": { + "results": { + "items": { + "$ref": "#/definitions/DeleteFileResult" + }, + "type": "array" + } + }, + "required": [ + "results" + ], + "title": "DeleteFileOutput", + "type": "object" + } +} diff --git a/coder/tests/golden/schemas/coder.info.json b/coder/tests/golden/schemas/coder.info.json new file mode 100644 index 00000000..9c0ebc61 --- /dev/null +++ b/coder/tests/golden/schemas/coder.info.json @@ -0,0 +1,133 @@ +{ + "description": "Report the coder jail: canonical allowed roots (primary first), per-file size caps, response budgets (max_output_bytes, batch_read_budget_bytes, search_response_budget_bytes), listing/search limits, the non-accessible glob patterns, and the default_exclude_globs noise filter applied by tree/search. Call this FIRST when unsure where coder may read or write, or when a path was rejected — paths outside every allowed root need the shell worker's shell::fs::* instead.", + "function_id": "coder::info", + "request_schema": { + "$schema": "http://json-schema.org/draft-07/schema#", + "description": "No arguments — `coder::info` is a pure discovery call.", + "examples": [ + {} + ], + "title": "InfoInput", + "type": "object" + }, + "response_schema": { + "$schema": "http://json-schema.org/draft-07/schema#", + "properties": { + "base_paths": { + "description": "Canonical absolute paths of the allowed roots, in configuration order. The primary root (index 0) is where relative wire paths resolve; an absolute path is accepted when it canonicalises inside ANY of these. 
Paths outside every root are rejected — use `shell::fs::*` instead.", + "items": { + "type": "string" + }, + "type": "array" + }, + "batch_read_budget_bytes": { + "description": "Aggregate budget across a single `paths[]` batch call to `coder::read-file`, measured in bytes of returned content (after UTF-8 sanitization — invalid bytes expand to U+FFFD before being counted, so the cap bounds what the caller actually receives). Entries are collected in request order; each entry may consume up to `min(remaining_budget, max_read_bytes)`. An entry reached with zero remaining budget receives a per-entry C213 naming this key, its value, and the bytes already consumed, with recovery guidance. Budget topology: batch reads are governed by this key; single-path full reads by `max_output_bytes`; windowed reads by `max_read_bytes` applied per returned window — `max_read_bytes` is also the per-file IO ceiling for all of them.", + "format": "uint64", + "minimum": 0.0, + "type": "integer" + }, + "default_exclude_globs": { + "description": "Noise-exclusion globs (root-relative, same matching as `non_accessible_globs`): matching paths (node_modules, .git, …) are omitted from `coder::search` results and pruned from `coder::tree` descent — the directory surfaces as a childless `truncated` stub. Hide-only — no access protection. Pass `use_default_excludes: false` on those calls to look inside.", + "items": { + "type": "string" + }, + "type": "array" + }, + "list_default_page_size": { + "description": "Default `page_size` used by `coder::list-folder` when the caller omits it.", + "format": "uint32", + "minimum": 0.0, + "type": "integer" + }, + "list_max_page_size": { + "description": "Hard cap on `page_size` accepted by `coder::list-folder`.", + "format": "uint32", + "minimum": 0.0, + "type": "integer" + }, + "max_output_bytes": { + "description": "Context budget for single-path FULL reads in `coder::read-file`, in bytes of returned content. 
Full reads larger than this return C213 with the file's size/line count and window/stat recovery guidance; a per-call `max_output_bytes` override is available on `coder::read-file` (clamped to `max_read_bytes`).", + "format": "uint64", + "minimum": 0.0, + "type": "integer" + }, + "max_read_bytes": { + "description": "Per-file IO ceiling for `coder::read-file`. Full reads of files larger than this are rejected with C213; windowed reads cap the returned window bytes instead, so larger files stay readable window by window. Also the ceiling for `coder::search` content scanning — larger files are silently skipped during search.", + "format": "uint64", + "minimum": 0.0, + "type": "integer" + }, + "max_write_bytes": { + "description": "Maximum bytes that `coder::create-file` or `coder::update-file` will accept for a single file write. Larger writes are rejected with C213.", + "format": "uint64", + "minimum": 0.0, + "type": "integer" + }, + "non_accessible_globs": { + "description": "Glob patterns matched per root (root-relative). Files whose root-relative path matches are listable but not readable/writable/deletable/creatable; they return C211.", + "items": { + "type": "string" + }, + "type": "array" + }, + "primary_root": { + "description": "Convenience duplicate of `base_paths[0]` — the primary allowed root. 
Relative paths resolve against this directory.", + "type": "string" + }, + "search_default_max_line_bytes": { + "description": "Per-line byte cap in `coder::search`: matching considers at most this many bytes of each line, and matched/context lines are truncated to it.", + "format": "uint32", + "minimum": 0.0, + "type": "integer" + }, + "search_default_max_matches": { + "description": "Default `max_matches` used by `coder::search` when the caller omits it.", + "format": "uint32", + "minimum": 0.0, + "type": "integer" + }, + "search_response_budget_bytes": { + "description": "Aggregate byte budget for one `coder::search` response, measured in payload bytes (paths + matched text + context lines). When the budget is hit the response sets `truncated: true` — refine the query or add `include_globs`.", + "format": "uint64", + "minimum": 0.0, + "type": "integer" + }, + "tree_default_depth": { + "description": "Default `max_depth` used by `coder::tree` when the caller omits it.", + "format": "uint32", + "minimum": 0.0, + "type": "integer" + }, + "tree_per_folder_limit": { + "description": "Maximum entries returned per folder node by `coder::tree`; folders that exceed this are flagged `truncated`.", + "format": "uint32", + "minimum": 0.0, + "type": "integer" + }, + "version": { + "description": "Coder worker version (`CARGO_PKG_VERSION`).", + "type": "string" + } + }, + "required": [ + "base_paths", + "batch_read_budget_bytes", + "default_exclude_globs", + "list_default_page_size", + "list_max_page_size", + "max_output_bytes", + "max_read_bytes", + "max_write_bytes", + "non_accessible_globs", + "primary_root", + "search_default_max_line_bytes", + "search_default_max_matches", + "search_response_budget_bytes", + "tree_default_depth", + "tree_per_folder_limit", + "version" + ], + "title": "InfoOutput", + "type": "object" + } +} diff --git a/coder/tests/golden/schemas/coder.list-folder.json b/coder/tests/golden/schemas/coder.list-folder.json new file mode 100644 index 
00000000..a5d8a14c --- /dev/null +++ b/coder/tests/golden/schemas/coder.list-folder.json @@ -0,0 +1,125 @@ +{ + "description": "Paginated single-folder listing, sorted by name. Entries carry only `name`; derive an entry's absolute path as the response's `path` + '/' + name. Non-accessible entries are still listed with a `non_accessible: true` flag. Paths are relative to the primary allowed root or absolute inside any allowed root (coder::info lists them); for host paths outside the jail use shell::fs::*.", + "function_id": "coder::list-folder", + "request_schema": { + "$schema": "http://json-schema.org/draft-07/schema#", + "examples": [ + { + "page": 1, + "page_size": 50, + "path": "src" + } + ], + "properties": { + "page": { + "default": 1, + "format": "uint32", + "minimum": 0.0, + "type": "integer" + }, + "page_size": { + "default": null, + "description": "Capped by `config.list_max_page_size`; falls back to `config.list_default_page_size` when omitted.", + "format": "uint32", + "minimum": 0.0, + "type": [ + "integer", + "null" + ] + }, + "path": { + "default": ".", + "description": "Folder to list. Relative to the primary allowed root, or an absolute path inside any allowed root. Defaults to `.` (the primary root itself). Call `coder::info` to see the allowed roots. Paths outside every allowed root are rejected — use the shell worker's `shell::fs::*` for host paths outside the jail.", + "type": "string" + } + }, + "title": "ListFolderInput", + "type": "object" + }, + "response_schema": { + "$schema": "http://json-schema.org/draft-07/schema#", + "definitions": { + "DirEntry": { + "properties": { + "kind": { + "$ref": "#/definitions/EntryKind" + }, + "mtime": { + "format": "int64", + "type": "integer" + }, + "name": { + "description": "Entry basename. 
The absolute path is derivable from the response's `path`: entry path = folder path + \"/\" + name.", + "type": "string" + }, + "non_accessible": { + "description": "True if this entry matches `non_accessible_globs` — caller cannot read/write/delete it via `coder::*` even though it shows up here.", + "type": "boolean" + }, + "size": { + "format": "uint64", + "minimum": 0.0, + "type": "integer" + } + }, + "required": [ + "kind", + "mtime", + "name", + "non_accessible", + "size" + ], + "type": "object" + }, + "EntryKind": { + "enum": [ + "file", + "dir", + "symlink", + "other" + ], + "type": "string" + } + }, + "properties": { + "entries": { + "items": { + "$ref": "#/definitions/DirEntry" + }, + "type": "array" + }, + "has_more": { + "type": "boolean" + }, + "page": { + "format": "uint32", + "minimum": 0.0, + "type": "integer" + }, + "page_size": { + "format": "uint32", + "minimum": 0.0, + "type": "integer" + }, + "path": { + "description": "Canonical absolute path of the listed folder (resolved through the jail). Entries carry only `name`; derive an entry's absolute path by joining: entry path = this path + \"/\" + name. Operations on derived paths re-validate through the jail.", + "type": "string" + }, + "total": { + "format": "uint64", + "minimum": 0.0, + "type": "integer" + } + }, + "required": [ + "entries", + "has_more", + "page", + "page_size", + "path", + "total" + ], + "title": "ListFolderOutput", + "type": "object" + } +} diff --git a/coder/tests/golden/schemas/coder.move.json b/coder/tests/golden/schemas/coder.move.json new file mode 100644 index 00000000..48f352e8 --- /dev/null +++ b/coder/tests/golden/schemas/coder.move.json @@ -0,0 +1,138 @@ +{ + "description": "Move or rename one or more paths inside the jail. Request shape: {\"files\": [{\"from\": \"...\", \"to\": \"...\"}]}. Paths are relative to the primary allowed root or absolute inside any allowed root (coder::info lists them); for host paths outside the jail use shell::fs::*. 
Per-entry `overwrite` and `parents` flags. Same-root moves use a per-file-atomic rename; cross-root moves use copy+delete (files only — cross-root directory moves are unsupported, move files individually). Copy+delete is rollback-safe: if source deletion fails after a successful copy the copy is removed and the error names the failure; if rollback also fails the error names both states for manual cleanup.", + "function_id": "coder::move", + "request_schema": { + "$schema": "http://json-schema.org/draft-07/schema#", + "definitions": { + "MoveFileSpec": { + "properties": { + "from": { + "description": "Source path: relative to the primary allowed root, or an absolute path inside any allowed root. Call `coder::info` to see the allowed roots. Paths outside every allowed root are rejected — use the shell worker's `shell::fs::*` for host paths outside the jail.", + "type": "string" + }, + "overwrite": { + "default": false, + "description": "When false (the default), refuse to overwrite an existing destination. Pass `overwrite: true` to replace an existing file at `to`.", + "type": "boolean" + }, + "parents": { + "default": true, + "description": "Create missing parent directories of the destination. Defaults to true.", + "type": "boolean" + }, + "to": { + "description": "Destination path: relative to the primary allowed root, or an absolute path inside any allowed root. Call `coder::info` to see the allowed roots. Paths outside every allowed root are rejected — use the shell worker's `shell::fs::*` for host paths outside the jail.", + "type": "string" + } + }, + "required": [ + "from", + "to" + ], + "type": "object" + } + }, + "examples": [ + { + "files": [ + { + "from": "src/old_name.rs", + "to": "src/new_name.rs" + }, + { + "from": "build/output.bin", + "overwrite": true, + "to": "/tmp/coder-cache/output.bin" + } + ] + } + ], + "properties": { + "files": { + "description": "Entries to move. 
Each entry is processed independently so a single failure never aborts the rest.", + "items": { + "$ref": "#/definitions/MoveFileSpec" + }, + "type": "array" + } + }, + "required": [ + "files" + ], + "title": "MoveFileInput", + "type": "object" + }, + "response_schema": { + "$schema": "http://json-schema.org/draft-07/schema#", + "definitions": { + "MoveFileResult": { + "properties": { + "error": { + "anyOf": [ + { + "$ref": "#/definitions/WireError" + }, + { + "type": "null" + } + ], + "description": "Structured error for this entry. `code` is stable for programmatic branching (e.g. `\"C217\"` means destination exists; pass `overwrite=true` to replace; `\"C210\"` for disallowed operations such as cross-root directory moves, moving a root itself, or a destination that is a directory — the message then names the corrected target path). `message` carries the corrective action an LLM agent needs to make a successful second call." + }, + "from": { + "description": "Canonical absolute path of the source (resolved through the jail); the caller's input verbatim when resolution failed.", + "type": "string" + }, + "moved": { + "description": "True only when the move fully completed; false for a no-op self-move (`from` and `to` resolve to the same file).", + "type": "boolean" + }, + "success": { + "type": "boolean" + }, + "to": { + "description": "Canonical absolute path of the destination (resolved through the jail); the caller's input verbatim when resolution failed.", + "type": "string" + } + }, + "required": [ + "from", + "moved", + "success", + "to" + ], + "type": "object" + }, + "WireError": { + "description": "Structured per-entry error as it appears on the wire.\n\nUse `code` for stable programmatic branching (e.g. `\"C211\"` for not-found-or-denied). `message` carries the human/LLM-readable problem description plus the corrective next call.", + "properties": { + "code": { + "description": "Stable error code, e.g. \"C211\". 
See the README error table.", + "type": "string" + }, + "message": { + "description": "Human/LLM-readable message: problem + actual values + corrective next call.", + "type": "string" + } + }, + "required": [ + "code", + "message" + ], + "type": "object" + } + }, + "properties": { + "results": { + "items": { + "$ref": "#/definitions/MoveFileResult" + }, + "type": "array" + } + }, + "required": [ + "results" + ], + "title": "MoveFileOutput", + "type": "object" + } +} diff --git a/coder/tests/golden/schemas/coder.read-file.json b/coder/tests/golden/schemas/coder.read-file.json new file mode 100644 index 00000000..a72ea4f7 --- /dev/null +++ b/coder/tests/golden/schemas/coder.read-file.json @@ -0,0 +1,343 @@ +{ + "description": "Read a file window-first: probe with stat: true (size/mtime/mode plus total_lines, no content), then fetch just the lines you need with line_from/line_to (1-based, inclusive) — windows keep files larger than max_read_bytes readable window by window, with more_lines/total_lines reporting what remains. numbered: true prefixes each line with its absolute 1-based file line number, matching coder::update-file's line ops exactly. Full reads are budgeted by max_output_bytes (default 128 KiB; per-call override clamped to max_read_bytes) — an over-budget full read fails with a C213 carrying the file's size, line count, and the window/stat recovery calls. Batch mode: pass paths[] (XOR path) to read multiple files in one call — entries are processed in request order against batch_read_budget_bytes, measured in bytes of returned content (after UTF-8 sanitization); per-entry errors (C211/C213) leave other entries unaffected. Paths are relative to the primary allowed root or absolute inside any allowed root (coder::info lists them); for host paths outside the jail use shell::fs::*. 
Non-accessible paths return C211.", + "function_id": "coder::read-file", + "request_schema": { + "$schema": "http://json-schema.org/draft-07/schema#", + "definitions": { + "ReadTarget": { + "anyOf": [ + { + "description": "Bare path string: read the whole file (within remaining batch budget and `max_read_bytes`).", + "type": "string" + }, + { + "description": "Object form: path plus optional 1-based window parameters. Omit `line_from` to start from line 1; omit `line_to` to read to EOF.", + "properties": { + "line_from": { + "default": null, + "description": "First line of the window, 1-based inclusive (must be >= 1; 0 is rejected with C210 for this entry). Defaults to 1 when only `line_to` is set.", + "format": "uint64", + "minimum": 1.0, + "type": [ + "integer", + "null" + ] + }, + "line_to": { + "default": null, + "description": "Last line of the window, 1-based inclusive. Must be >= `line_from` (C210 for this entry otherwise). Omit to read from `line_from` to EOF.", + "format": "uint64", + "minimum": 1.0, + "type": [ + "integer", + "null" + ] + }, + "numbered": { + "default": false, + "description": "Prefix this entry's content lines with their absolute 1-based file line numbers (`N→`) — same semantics as the top-level `numbered` field. Prefix bytes are charged against `batch_read_budget_bytes`.", + "type": "boolean" + }, + "path": { + "description": "File to read. Same jail rules as the top-level `path` field.", + "type": "string" + }, + "stat": { + "default": false, + "description": "Per-entry metadata probe: same semantics as the top-level `stat` field — size/mode/mtime always, `total_lines`/`is_utf8` when the file fits `max_read_bytes`, content null, no batch budget consumed. C210 when combined with this entry's `line_from`/`line_to` or `numbered`.", + "type": "boolean" + } + }, + "required": [ + "path" + ], + "type": "object" + } + ], + "description": "A single entry in a `paths[]` batch request. 
Pass either a bare file path string (whole-file read, same cap as `max_read_bytes`) or an object with optional per-entry `line_from`/`line_to` window parameters (1-based, inclusive — same rules as the top-level `path` mode)." + } + }, + "examples": [ + { + "line_from": 10, + "line_to": 50, + "path": "src/main.rs" + }, + { + "paths": [ + "src/lib.rs", + { + "line_from": 1, + "line_to": 30, + "path": "src/config.rs" + } + ] + } + ], + "properties": { + "line_from": { + "default": null, + "description": "First line of the window, 1-based inclusive (must be >= 1; 0 is rejected with C210). Setting `line_from` and/or `line_to` switches to windowed mode: the file is streamed and only the requested lines are returned, so files larger than `max_read_bytes` stay readable slice by slice — the byte cap then applies to the returned window, never the file size. Defaults to 1 when only `line_to` is set. A window starting past EOF succeeds with empty content and reports the file's `total_lines`. Only valid in single-path mode (`path`); ignored when `paths` is set. Lines are 0x0A- or EOF-terminated segments; a trailing newline does not create a phantom line (same convention as `coder::update-file`).", + "format": "uint64", + "minimum": 1.0, + "type": [ + "integer", + "null" + ] + }, + "line_to": { + "default": null, + "description": "Last line of the window, 1-based inclusive. Must be >= `line_from` (C210 otherwise). Omit to read from `line_from` to end-of-file (still bounded by `max_read_bytes` on the returned bytes). Only valid in single-path mode (`path`); ignored when `paths` is set.", + "format": "uint64", + "minimum": 1.0, + "type": [ + "integer", + "null" + ] + }, + "max_output_bytes": { + "default": null, + "description": "Per-call override of the `max_output_bytes` config (default 131072) that budgets single-path FULL reads, measured in returned content bytes after UTF-8 conversion (numbered prefixes included). Values above `max_read_bytes` are silently clamped to it. 
When the full content would exceed the effective budget the call fails with a C213 naming the file's size and `total_lines` — recover by windowing with `line_from`/`line_to`, probing with `stat: true`, or raising this field. Full reads only: combining it with `line_from`/`line_to` is C210 (windows are bounded by `max_read_bytes` instead); ignored when `paths` is set (batch mode is governed by `batch_read_budget_bytes`).", + "format": "uint64", + "minimum": 0.0, + "type": [ + "integer", + "null" + ] + }, + "numbered": { + "default": false, + "description": "When true every returned content line is prefixed `N→`, where N is the line's ABSOLUTE 1-based number in the file — a window starting at `line_from: 40` is numbered from 40, not 1. Numbers match `coder::update-file`'s 1-based line ops exactly, so you can go from a numbered read straight to a line edit. Prefix bytes count toward all byte caps and budgets (no hidden bypass). C210 with `stat: true` (no content to number). Batch entries take a per-entry `numbered` field instead; this top-level flag is ignored when `paths` is set.", + "type": "boolean" + }, + "path": { + "default": null, + "description": "Single file to read. Relative to the primary allowed root, or an absolute path inside any allowed root. Call `coder::info` to see the allowed roots. Paths outside every allowed root are rejected — use the shell worker's `shell::fs::*` for host paths outside the jail. Mutually exclusive with `paths` (XOR): pass either `path` or `paths`, not both — C210 if both or neither is set.", + "type": [ + "string", + "null" + ] + }, + "paths": { + "description": "Batch of files (or windowed slices) to read in a single call. Each entry is either a plain path string (whole-file read) or an object `{path, line_from?, line_to?}` with per-entry window parameters. 
Entries are processed in request order against a shared `batch_read_budget_bytes` cap, measured in bytes of returned content (after UTF-8 sanitization) — see `coder::info` for the configured value. Results are returned in the `results` field; top-level fields are null. Mutually exclusive with `path` (XOR): pass either `path` or `paths`, not both — C210 if both or neither is set.", + "items": { + "$ref": "#/definitions/ReadTarget" + }, + "type": [ + "array", + "null" + ] + }, + "stat": { + "default": false, + "description": "Metadata probe — the cheap \"how big is it\" call. When true the response carries size/mode/mtime plus `total_lines` and `is_utf8` (both null when the file exceeds `max_read_bytes` — size/mode/mtime still populate, so stat on a huge file SUCCEEDS); `content` is null, `lines_returned` 0, `more_lines` false. Probe BEFORE reading an unknown file, then fetch just the slice you need with `line_from`/`line_to`. Mutually exclusive with `line_from`, `line_to`, `numbered`, and `max_output_bytes` (C210 — stat returns no content for them to act on). Batch entries take a per-entry `stat` field instead; this top-level flag is ignored when `paths` is set.", + "type": "boolean" + } + }, + "title": "ReadFileInput", + "type": "object" + }, + "response_schema": { + "$schema": "http://json-schema.org/draft-07/schema#", + "definitions": { + "ReadEntryResult": { + "description": "Per-entry result in a batch `paths[]` response.", + "properties": { + "content": { + "description": "File content as a UTF-8 string — the whole file or the requested window. Binary bytes are replaced by U+FFFD (`is_utf8: false`). `null` on failure.", + "type": [ + "string", + "null" + ] + }, + "error": { + "anyOf": [ + { + "$ref": "#/definitions/WireError" + }, + { + "type": "null" + } + ], + "description": "Structured error — present only when `success: false`." + }, + "is_utf8": { + "description": "Whether `content` survived UTF-8 conversion without losing bytes. 
`null` on failure.", + "type": [ + "boolean", + "null" + ] + }, + "lines_returned": { + "description": "Number of lines returned in `content`. `null` on failure.", + "format": "uint64", + "minimum": 0.0, + "type": [ + "integer", + "null" + ] + }, + "mode": { + "description": "Unix permission bits (lower 9 bits of `st_mode`), e.g. 0o644. `null` on failure.", + "format": "uint32", + "minimum": 0.0, + "type": [ + "integer", + "null" + ] + }, + "more_lines": { + "description": "`true` when the file has content beyond what `content` includes (window ended before EOF, or byte budget cut the window short). `null` on failure.", + "type": [ + "boolean", + "null" + ] + }, + "mtime": { + "description": "Last-modified time as a Unix epoch in seconds. `null` on failure.", + "format": "int64", + "type": [ + "integer", + "null" + ] + }, + "path": { + "description": "Canonical absolute path of the file (resolved through the jail). If resolution failed, this echoes the caller's input verbatim.", + "type": "string" + }, + "size": { + "description": "Size of the FILE in bytes (from metadata). `null` on failure or when the entry budget was exhausted before the file was opened.", + "format": "uint64", + "minimum": 0.0, + "type": [ + "integer", + "null" + ] + }, + "success": { + "description": "`true` when the read succeeded (content/metadata fields are populated); `false` when an error occurred (only `error` is set).", + "type": "boolean" + }, + "total_lines": { + "description": "Total lines in the file; present when the stream reached EOF during this entry's read. `null` when not traversed or on failure.", + "format": "uint64", + "minimum": 0.0, + "type": [ + "integer", + "null" + ] + } + }, + "required": [ + "path", + "success" + ], + "type": "object" + }, + "WireError": { + "description": "Structured per-entry error as it appears on the wire.\n\nUse `code` for stable programmatic branching (e.g. `\"C211\"` for not-found-or-denied). 
`message` carries the human/LLM-readable problem description plus the corrective next call.", + "properties": { + "code": { + "description": "Stable error code, e.g. \"C211\". See the README error table.", + "type": "string" + }, + "message": { + "description": "Human/LLM-readable message: problem + actual values + corrective next call.", + "type": "string" + } + }, + "required": [ + "code", + "message" + ], + "type": "object" + } + }, + "properties": { + "content": { + "description": "File content as a UTF-8 string — the whole file, or just the requested window when `line_from`/`line_to` was given (window lines keep their newline terminators). Binary content is returned with invalid bytes replaced by U+FFFD; use a future binary-aware function if exact bytes matter. **Single-path mode only; null when the request used `paths[]`.**", + "type": [ + "string", + "null" + ] + }, + "is_utf8": { + "description": "Whether `content` survived UTF-8 conversion without losing bytes. Reflects the RETURNED content only: a clean window inside an otherwise-binary file is still `true`. **Single-path mode only; null when the request used `paths[]`.**", + "type": [ + "boolean", + "null" + ] + }, + "lines_returned": { + "description": "Number of lines in `content`. For full reads this equals the file's total line count. **Single-path mode only; null when the request used `paths[]`.**", + "format": "uint64", + "minimum": 0.0, + "type": [ + "integer", + "null" + ] + }, + "mode": { + "description": "Unix permission bits (lower 9 bits of `st_mode`), e.g. 0o644. **Single-path mode only; null when the request used `paths[]`.**", + "format": "uint32", + "minimum": 0.0, + "type": [ + "integer", + "null" + ] + }, + "more_lines": { + "description": "True when the file has content beyond what `content` includes: the window ended before EOF, or the byte budget cut the window short. Always false for full reads. 
**Single-path mode only; null when the request used `paths[]`.**", + "type": [ + "boolean", + "null" + ] + }, + "mtime": { + "description": "Last-modified time as a Unix epoch in seconds. **Single-path mode only; null when the request used `paths[]`.**", + "format": "int64", + "type": [ + "integer", + "null" + ] + }, + "path": { + "description": "Canonical absolute path of the file read (resolved through the jail). **Single-path mode only; null when the request used `paths[]`.**", + "type": [ + "string", + "null" + ] + }, + "results": { + "description": "Per-entry results for a batch `paths[]` request. **Present only when the request used `paths[]`; null in single-path mode.**", + "items": { + "$ref": "#/definitions/ReadEntryResult" + }, + "type": [ + "array", + "null" + ] + }, + "size": { + "description": "Size of the FILE in bytes (from metadata) — not the size of `content`; in windowed mode the two differ. **Single-path mode only; null when the request used `paths[]`.**", + "format": "uint64", + "minimum": 0.0, + "type": [ + "integer", + "null" + ] + }, + "total_lines": { + "description": "Total number of lines in the file. Present only when the read traversed the whole file: always for full reads; for windowed reads only when the stream naturally reached EOF within the byte cap. Never computed by forcing an extra full scan — absent means the file was not fully traversed. **Single-path mode only; null when the request used `paths[]`.**", + "format": "uint64", + "minimum": 0.0, + "type": [ + "integer", + "null" + ] + } + }, + "title": "ReadFileOutput", + "type": "object" + } +} diff --git a/coder/tests/golden/schemas/coder.search.json b/coder/tests/golden/schemas/coder.search.json new file mode 100644 index 00000000..dd20bf25 --- /dev/null +++ b/coder/tests/golden/schemas/coder.search.json @@ -0,0 +1,206 @@ +{ + "description": "Search file contents and/or paths. 
Supports literal or regex queries with include/exclude globs; non-accessible files are excluded from both content and path results. Only the FIRST match on each line is reported (one content match per matching line). Optional context_lines_before/context_lines_after (max 10) attach surrounding lines to each content match so many edits can go straight to coder::update-file with no read in between. Noise paths matching default_exclude_globs (.git, node_modules, target, … — coder::info lists them) are skipped by default; pass use_default_excludes: false to search inside them. Files larger than max_read_bytes are silently skipped during content scanning. Results are capped by max_matches AND a response byte budget (search_response_budget_bytes); when truncated is true, refine the query or add include_globs rather than paginate. Paths are relative to the primary allowed root or absolute inside any allowed root (coder::info lists them); for host paths outside the jail use shell::fs::*.", + "function_id": "coder::search", + "request_schema": { + "$schema": "http://json-schema.org/draft-07/schema#", + "examples": [ + { + "context_lines_after": 2, + "context_lines_before": 2, + "include_globs": [ + "**/*.rs" + ], + "path": "src", + "query": "fn handle", + "search_content": true, + "search_paths": false + } + ], + "properties": { + "context_lines_after": { + "default": null, + "description": "Lines of context to return AFTER each content match. Same max (10), truncation, and budget rules as `context_lines_before`; unset = 0.", + "format": "uint32", + "minimum": 0.0, + "type": [ + "integer", + "null" + ] + }, + "context_lines_before": { + "default": null, + "description": "Lines of context to return BEFORE each content match (same file only, in file order), max 10 — larger values are rejected with C210. With context lines many edit tasks can go straight from search output to coder::update-file with no read in between. 
Context lines are truncated to `max_line_bytes` like the matched text and count toward the response byte budget. Unset = 0.", + "format": "uint32", + "minimum": 0.0, + "type": [ + "integer", + "null" + ] + }, + "exclude_globs": { + "default": [], + "description": "Glob patterns (same relative-to-root matching) that exclude paths.", + "items": { + "type": "string" + }, + "type": "array" + }, + "ignore_case": { + "default": false, + "type": "boolean" + }, + "include_globs": { + "default": [], + "description": "Glob patterns (matched against the path relative to its containing root) that paths must match to be considered. Empty = include everything.", + "items": { + "type": "string" + }, + "type": "array" + }, + "max_line_bytes": { + "default": null, + "description": "Bytes per line to consider when scanning content; longer lines are truncated for the match snippet.", + "format": "uint32", + "minimum": 0.0, + "type": [ + "integer", + "null" + ] + }, + "max_matches": { + "default": null, + "description": "Optional explicit cap. Falls back to config when unset.", + "format": "uint32", + "minimum": 0.0, + "type": [ + "integer", + "null" + ] + }, + "path": { + "default": ".", + "description": "Folder scoping the walk. Relative to the primary allowed root, or an absolute path inside any allowed root. Defaults to `.` (the primary root itself). Globs are matched relative to the containing root; result paths are absolute regardless of this value. Call `coder::info` to see the allowed roots. Paths outside every allowed root are rejected — use the shell worker's `shell::fs::*` for host paths outside the jail.", + "type": "string" + }, + "query": { + "description": "Pattern to search for. 
Treated as a regex when `regex: true`, otherwise as a literal substring.", + "type": "string" + }, + "regex": { + "default": false, + "type": "boolean" + }, + "search_content": { + "default": true, + "description": "Search file contents (default true).", + "type": "boolean" + }, + "search_paths": { + "default": true, + "description": "Search file paths (default true).", + "type": "boolean" + }, + "use_default_excludes": { + "default": true, + "description": "Apply the worker's `default_exclude_globs` config (noise paths like .git, node_modules, target, dist — call coder::info for the active list): the walk does not descend into matching directories and matching files are omitted from BOTH content and path results. Pass `false` to search inside them.", + "type": "boolean" + } + }, + "required": [ + "query" + ], + "title": "SearchInput", + "type": "object" + }, + "response_schema": { + "$schema": "http://json-schema.org/draft-07/schema#", + "definitions": { + "ContentMatch": { + "properties": { + "after": { + "description": "Context lines immediately after the matched line — same file only, in file order, each truncated to `max_line_bytes`. Omitted when empty (no `context_lines_after` requested, or the match is at the end of the file).", + "items": { + "type": "string" + }, + "type": [ + "array", + "null" + ] + }, + "before": { + "description": "Context lines immediately before the matched line — same file only, in file order, each truncated to `max_line_bytes`. Omitted when empty (no `context_lines_before` requested, or the match is at the start of the file).", + "items": { + "type": "string" + }, + "type": [ + "array", + "null" + ] + }, + "column": { + "format": "uint32", + "minimum": 0.0, + "type": "integer" + }, + "line": { + "format": "uint32", + "minimum": 0.0, + "type": "integer" + }, + "path": { + "description": "Absolute path under the canonical parent; symlinks at the entry itself are not resolved. 
Operations on it re-validate through the jail.", + "type": "string" + }, + "text": { + "description": "Matched line; truncated to `max_line_bytes` and never spans newlines.", + "type": "string" + } + }, + "required": [ + "column", + "line", + "path", + "text" + ], + "type": "object" + }, + "PathMatch": { + "properties": { + "path": { + "description": "Absolute path under the canonical parent; symlinks at the entry itself are not resolved. Operations on it re-validate through the jail.", + "type": "string" + } + }, + "required": [ + "path" + ], + "type": "object" + } + }, + "properties": { + "content_matches": { + "items": { + "$ref": "#/definitions/ContentMatch" + }, + "type": "array" + }, + "path_matches": { + "items": { + "$ref": "#/definitions/PathMatch" + }, + "type": "array" + }, + "truncated": { + "description": "True if results were cut off — either match list hit the `max_matches` cap, or the response hit the `search_response_budget_bytes` byte budget. When true, refine the query or add include_globs rather than paginate.", + "type": "boolean" + } + }, + "required": [ + "content_matches", + "path_matches", + "truncated" + ], + "title": "SearchOutput", + "type": "object" + } +} diff --git a/coder/tests/golden/schemas/coder.tree.json b/coder/tests/golden/schemas/coder.tree.json new file mode 100644 index 00000000..3a1299c7 --- /dev/null +++ b/coder/tests/golden/schemas/coder.tree.json @@ -0,0 +1,164 @@ +{ + "description": "Recursive directory snapshot bounded by `max_depth` and a `per_folder_limit`. Slim wire shape: nodes carry only `name` — the root node's path IS the response's top-level `path`; derive any child's path as parent path + '/' + name. Folders that hit the limit are flagged `truncated` and the caller is pointed at coder::list-folder for pagination. Noise directories matching default_exclude_globs (.git, node_modules, target, … — coder::info lists them) appear as childless `truncated` stubs; pass use_default_excludes: false to descend into them. 
Paths are relative to the primary allowed root or absolute inside any allowed root (coder::info lists them); for host paths outside the jail use shell::fs::*.", + "function_id": "coder::tree", + "request_schema": { + "$schema": "http://json-schema.org/draft-07/schema#", + "examples": [ + { + "max_depth": 3, + "path": "." + } + ], + "properties": { + "max_depth": { + "default": null, + "description": "Maximum depth to descend; the root node is depth 0.", + "format": "uint32", + "minimum": 0.0, + "type": [ + "integer", + "null" + ] + }, + "path": { + "default": ".", + "description": "Base folder for the snapshot. Relative to the primary allowed root, or an absolute path inside any allowed root. Defaults to `.` (the primary root itself). Call `coder::info` to see the allowed roots. Paths outside every allowed root are rejected — use the shell worker's `shell::fs::*` for host paths outside the jail.", + "type": "string" + }, + "per_folder_limit": { + "default": null, + "description": "Maximum children listed per folder. When more exist, the folder is flagged `truncated` and callers should switch to `coder::list-folder`.", + "format": "uint32", + "minimum": 0.0, + "type": [ + "integer", + "null" + ] + }, + "use_default_excludes": { + "default": true, + "description": "Apply the worker's `default_exclude_globs` config (noise folders like .git/node_modules/target — call `coder::info` for the active list). Excluded directories still appear as childless nodes flagged `truncated` with reason \"default_exclude\"; excluded files are omitted. 
Pass `false` to list everything.", + "type": "boolean" + } + }, + "title": "TreeInput", + "type": "object" + }, + "response_schema": { + "$schema": "http://json-schema.org/draft-07/schema#", + "definitions": { + "NodeKind": { + "enum": [ + "file", + "dir", + "symlink", + "other" + ], + "type": "string" + }, + "TreeNode": { + "properties": { + "children": { + "items": { + "$ref": "#/definitions/TreeNode" + }, + "type": [ + "array", + "null" + ] + }, + "kind": { + "$ref": "#/definitions/NodeKind" + }, + "mtime": { + "format": "int64", + "type": "integer" + }, + "name": { + "description": "Entry basename. The ROOT node's path is the response's top-level `path` itself; every other node's path derives by joining from there: child path = parent path + \"/\" + name.", + "type": "string" + }, + "non_accessible": { + "type": "boolean" + }, + "size": { + "format": "uint64", + "minimum": 0.0, + "type": "integer" + }, + "truncated": { + "anyOf": [ + { + "$ref": "#/definitions/TruncationInfo" + }, + { + "type": "null" + } + ], + "description": "Set on directories whose `children` was capped at `per_folder_limit`, whose subtree was cut off by `max_depth`, or which matched `default_exclude_globs` (reason \"default_exclude\")." 
+ } + }, + "required": [ + "kind", + "mtime", + "name", + "non_accessible", + "size" + ], + "type": "object" + }, + "TruncationInfo": { + "properties": { + "hint": { + "type": "string" + }, + "reason": { + "description": "Reason this folder was truncated: hit `per_folder_limit`, cut off by `max_depth`, or matched `default_exclude_globs` (`default_exclude`).", + "type": "string" + }, + "shown": { + "description": "Number of children actually returned.", + "format": "uint32", + "minimum": 0.0, + "type": "integer" + }, + "total": { + "description": "Total number of children eligible for listing in the folder, counted after default-exclude filtering (only populated when `reason == \"per_folder_limit\"`; for depth truncation we don't peek into the folder).", + "format": "uint32", + "minimum": 0.0, + "type": [ + "integer", + "null" + ] + } + }, + "required": [ + "hint", + "reason", + "shown" + ], + "type": "object" + } + }, + "properties": { + "path": { + "description": "Canonical absolute path of the requested folder (resolved through the jail). Nodes carry only `name`, and the root node's path IS this `path` — do not join the root's `name` onto it; derive children by joining from here: child path = parent path + \"/\" + name. Operations on derived paths re-validate through the jail.", + "type": "string" + }, + "root": { + "allOf": [ + { + "$ref": "#/definitions/TreeNode" + } + ], + "description": "Root node of the snapshot; its `name` is the folder's basename." + } + }, + "required": [ + "path", + "root" + ], + "title": "TreeOutput", + "type": "object" + } +} diff --git a/coder/tests/golden/schemas/coder.update-file.json b/coder/tests/golden/schemas/coder.update-file.json new file mode 100644 index 00000000..c3652c47 --- /dev/null +++ b/coder/tests/golden/schemas/coder.update-file.json @@ -0,0 +1,338 @@ +{ + "description": "Apply batched line-oriented and regex edits across one or more files. Request shape: {\"files\": [{\"path\": \"...\", \"ops\": [...]}]}. 
Line ops: { op: 'insert', at_line, content } | { op: 'remove', from_line, to_line } | { op: 'update_lines', from_line, to_line, content } — 1-based, inclusive, applied bottom-up. Regex op: { op: 'replace', pattern, replacement, ignore_case?, dot_matches_newline?, expect_matches? } runs on the file body after line ops. Replace large regions WITHOUT quoting them: two short anchors joined by .*? with dot_matches_newline: true — always prefer wildcards over pasting the block into the pattern. expect_matches: 1 turns a silent multi-site clobber into a safe pre-write C210; expect_matches: 0 asserts absence. In `replacement`, $1/${name} are capture references and a literal $ must be written $$ (JS/TS template literals: `Hello, $${name}!`); undefined references fail pre-write with C210. Each file commits atomically via temp + rename. On success each applied line op returns a bounded post-apply echo (±2 context lines); regex replace ops return up to 5 per-match-site echoes (first + last line of each replaced region, inner lines elided) — verify from the echoes instead of re-reading the file. Paths are relative to the primary allowed root or absolute inside any allowed root (coder::info lists them); for host paths outside the jail use shell::fs::*.", + "function_id": "coder::update-file", + "request_schema": { + "$schema": "http://json-schema.org/draft-07/schema#", + "definitions": { + "UpdateFileSpec": { + "properties": { + "ops": { + "items": { + "$ref": "#/definitions/UpdateOp" + }, + "type": "array" + }, + "path": { + "description": "Path relative to the primary allowed root, or an absolute path inside any allowed root. Call `coder::info` to see the allowed roots. Paths outside every allowed root are rejected — use the shell worker's `shell::fs::*` for host paths outside the jail.", + "type": "string" + } + }, + "required": [ + "ops", + "path" + ], + "type": "object" + }, + "UpdateOp": { + "oneOf": [ + { + "description": "Insert `content` before line `at_line` (1-based). 
`at_line = lines+1` appends to end of file.", + "properties": { + "at_line": { + "format": "uint32", + "minimum": 0.0, + "type": "integer" + }, + "content": { + "type": "string" + }, + "op": { + "enum": [ + "insert" + ], + "type": "string" + } + }, + "required": [ + "at_line", + "content", + "op" + ], + "type": "object" + }, + { + "description": "Delete lines `from_line..=to_line` (1-based, inclusive).", + "properties": { + "from_line": { + "format": "uint32", + "minimum": 0.0, + "type": "integer" + }, + "op": { + "enum": [ + "remove" + ], + "type": "string" + }, + "to_line": { + "format": "uint32", + "minimum": 0.0, + "type": "integer" + } + }, + "required": [ + "from_line", + "op", + "to_line" + ], + "type": "object" + }, + { + "description": "Overwrite lines `from_line..=to_line` with `content`.", + "properties": { + "content": { + "type": "string" + }, + "from_line": { + "format": "uint32", + "minimum": 0.0, + "type": "integer" + }, + "op": { + "enum": [ + "update_lines" + ], + "type": "string" + }, + "to_line": { + "format": "uint32", + "minimum": 0.0, + "type": "integer" + } + }, + "required": [ + "content", + "from_line", + "op", + "to_line" + ], + "type": "object" + }, + { + "description": "Replace all regex matches in the file body (after line ops).", + "properties": { + "dot_matches_newline": { + "default": false, + "description": "When true, `.` in `pattern` also matches `\\n`, so a short anchored pattern like `\"fn parse_config\\\\(.*?\\\\n\\\\}\"` replaces a whole multi-line region without quoting it — prefer two short anchors joined by `.*?` over pasting the block into the pattern. Without this flag (the default), `.` does not cross newlines and a multi-line pattern silently matches nothing.", + "type": "boolean" + }, + "expect_matches": { + "default": null, + "description": "Expected number of matches for this op. When set and the actual count differs, this FILE fails with C210 and nothing is written to it (other files in the batch still apply). 
Set `expect_matches: 1` to make a targeted read-free edit safe — a mismatch means the pattern is anchored too loosely or matches nothing. Set `expect_matches: 0` to assert ABSENCE: the op succeeds only when nothing matches (the replacement is unused). Omit to replace all matches unconditionally.", + "format": "uint64", + "minimum": 0.0, + "type": [ + "integer", + "null" + ] + }, + "ignore_case": { + "default": false, + "type": "boolean" + }, + "op": { + "enum": [ + "replace" + ], + "type": "string" + }, + "pattern": { + "type": "string" + }, + "replacement": { + "description": "Substitution text for each match. Capture references expand: `$1`/`${1}` by index (`$0` is the whole match) and `$name`/`${name}` by name. A literal `$` MUST be written `$$` — JS/TS template literals in a replacement are the classic collision: write `Hello, $${name}!` to output `Hello, ${name}!`. Unbraced references consume the longest `[0-9A-Za-z_]` run, so `$1a` means a group named \"1a\", NOT group 1 then \"a\" (use `${1}a`). A reference to a group the pattern does not define fails pre-write with C210 — nothing is written. References are validated even when the replacement goes unused (e.g. 
`expect_matches: 0`): the replacement must be well-formed even when unused.", + "type": "string" + } + }, + "required": [ + "op", + "pattern", + "replacement" + ], + "type": "object" + } + ] + } + }, + "examples": [ + { + "files": [ + { + "ops": [ + { + "at_line": 1, + "content": "// generated by coder\n", + "op": "insert" + }, + { + "content": "pub fn hello() {\n println!(\"hello\");\n}\n", + "from_line": 5, + "op": "update_lines", + "to_line": 7 + }, + { + "dot_matches_newline": true, + "expect_matches": 1, + "op": "replace", + "pattern": "// BEGIN legacy.*?// END legacy", + "replacement": "// removed" + } + ], + "path": "src/lib.rs" + } + ] + } + ], + "properties": { + "files": { + "items": { + "$ref": "#/definitions/UpdateFileSpec" + }, + "type": "array" + } + }, + "required": [ + "files" + ], + "title": "UpdateFileInput", + "type": "object" + }, + "response_schema": { + "$schema": "http://json-schema.org/draft-07/schema#", + "definitions": { + "OpEcho": { + "description": "Post-apply snapshot of the region affected by one op. Line ops echo the affected region with ±2 context lines; regex `replace` ops emit one echo per match site (up to 5, no context): the FIRST and LAST line of the post-replace region, with `elided` set to the inner line count when the region spans more than 2 lines (single-line replacements echo just that line). Each site carries `total_replacements`. 
Provides just enough context to verify the edit landed in the right place without flooding the LLM context with the full file body.", + "properties": { + "elided": { + "description": "Number of middle lines elided from a large region: for line ops, set when the affected region exceeded the per-echo cap; for replace sites, the count of inner lines between the region's echoed first and last line (set when the region spans >2 lines).", + "format": "uint64", + "minimum": 0.0, + "type": [ + "integer", + "null" + ] + }, + "from_line": { + "description": "1-based line number of the first echoed line (after all ops applied).", + "format": "uint64", + "minimum": 0.0, + "type": "integer" + }, + "lines": { + "description": "The echoed lines, post-apply. When the region is large, middle lines are elided and `elided` is set to indicate how many were skipped.", + "items": { + "type": "string" + }, + "type": "array" + }, + "op_index": { + "description": "Index of the op in the request's ops array (0-based).", + "format": "uint32", + "minimum": 0.0, + "type": "integer" + }, + "total_replacements": { + "description": "Total number of replacements the regex op made across the whole file (set only on replace-op site echoes, duplicated on each site). Sites are capped at 5 — when more matched, this count is the only record of the extras.", + "format": "uint64", + "minimum": 0.0, + "type": [ + "integer", + "null" + ] + } + }, + "required": [ + "from_line", + "lines", + "op_index" + ], + "type": "object" + }, + "UpdateFileResult": { + "properties": { + "applied": { + "description": "Number of operations applied (only meaningful when `success`).", + "format": "uint32", + "minimum": 0.0, + "type": "integer" + }, + "echoes": { + "description": "Per-op bounded post-apply echoes for edit verification; each applied op returns a snapshot of the affected region (±2 context lines) so the caller can confirm the edit landed at the right position without receiving the full file body. 
See `OpEcho` for field semantics. Empty on failure. Always present on the wire.", + "items": { + "$ref": "#/definitions/OpEcho" + }, + "type": "array" + }, + "echoes_truncated": { + "description": "True when the total echo budget (~4 KiB) was exhausted before all op echoes could be emitted. Use `coder::read-file` to inspect the full result if needed. Always present on the wire.", + "type": "boolean" + }, + "error": { + "anyOf": [ + { + "$ref": "#/definitions/WireError" + }, + { + "type": "null" + } + ], + "description": "Structured error for this entry. `code` is stable for programmatic branching (e.g. `\"C211\"` for not-found-or-denied; `\"C210\"` for bad input such as overlapping ops). `message` carries the corrective action an LLM agent needs to make a successful second call." + }, + "new_line_count": { + "description": "Final line count after applying (only meaningful when `success`).", + "format": "uint64", + "minimum": 0.0, + "type": "integer" + }, + "path": { + "description": "Canonical absolute path (resolved through the jail); the caller's input verbatim when resolution failed.", + "type": "string" + }, + "success": { + "type": "boolean" + } + }, + "required": [ + "applied", + "echoes", + "echoes_truncated", + "new_line_count", + "path", + "success" + ], + "type": "object" + }, + "WireError": { + "description": "Structured per-entry error as it appears on the wire.\n\nUse `code` for stable programmatic branching (e.g. `\"C211\"` for not-found-or-denied). `message` carries the human/LLM-readable problem description plus the corrective next call.", + "properties": { + "code": { + "description": "Stable error code, e.g. \"C211\". 
See the README error table.", + "type": "string" + }, + "message": { + "description": "Human/LLM-readable message: problem + actual values + corrective next call.", + "type": "string" + } + }, + "required": [ + "code", + "message" + ], + "type": "object" + } + }, + "properties": { + "results": { + "items": { + "$ref": "#/definitions/UpdateFileResult" + }, + "type": "array" + } + }, + "required": [ + "results" + ], + "title": "UpdateFileOutput", + "type": "object" + } +} diff --git a/coder/tests/golden_errors.rs b/coder/tests/golden_errors.rs new file mode 100644 index 00000000..62f6e231 --- /dev/null +++ b/coder/tests/golden_errors.rs @@ -0,0 +1,545 @@ +//! GOLDEN FAMILY B — error-message format snapshots for every C2xx shape. +//! +//! Each case constructs its error through the REAL public paths +//! (`PathResolver` construction/resolution + the per-verb handlers) in a +//! fixed two-root temp layout, then normalizes machine-specific path +//! prefixes (the canonical tempdir roots become `` / ``) +//! before comparing against `tests/golden/errors.json` (map: case name -> +//! `{code, message}`). +//! +//! Pinned shapes (T3's prescriptive messages + redaction invariant): +//! - C210: both-set root config; bad create-file mode; move cross-root +//! directory; move destination-is-a-directory (prescriptive target hint); +//! undefined replacement capture reference (v0.4.1 pre-write guard — +//! the production silent-corruption repro from session q8x6g248) +//! - C211: missing path; glob-denied path (byte-identical suffix — +//! REDACTION INVARIANT); recursive-delete subtree-blocked; move missing +//! source +//! - C213: read cap; write cap; batch budget exhausted; full-read output +//! budget (S4's recovery-tool message: size + total_lines + corrective +//! calls) +//! - C215: relative `..` escape; absolute outside all roots; dangling +//! symlink +//! - C216: io passthrough (via the public `From<std::io::Error>` conversion — +//! 
see the case comment for why no handler drives this one) +//! - C217: create-file exists without overwrite; move destination exists +//! without overwrite (one house C217 shape) +//! +//! Regenerate with `UPDATE_GOLDENS=1 cargo test`. + +mod support; + +use std::collections::BTreeMap; +use std::path::PathBuf; +use std::sync::Arc; + +use coder::config::CoderConfig; +use coder::error::CoderError; +use coder::functions::{create_file, delete_file, move_file, read_file, update_file}; +use coder::path::PathResolver; + +/// Normalized `{code, message}` pair as written to the golden file. +#[derive(serde::Serialize)] +struct GoldenError { + code: String, + message: String, +} + +/// Fixed two-root jail for the error cases. `` is primary. +struct Jail { + _tmp0: tempfile::TempDir, + _tmp1: tempfile::TempDir, + root0: PathBuf, + root1: PathBuf, + resolver: Arc<PathResolver>, + cfg: Arc<CoderConfig>, +} + +impl Jail { + fn new() -> Self { + let tmp0 = tempfile::tempdir().unwrap(); + let tmp1 = tempfile::tempdir().unwrap(); + // cfg.base_paths holds the RAW tempdir forms; root0/root1 below are + // the CANONICAL forms the error messages name (PathResolver + // canonicalizes internally — on macOS the two differ: /var vs + // /private/var). + let cfg = Arc::new(CoderConfig { + base_paths: vec![tmp0.path().to_path_buf(), tmp1.path().to_path_buf()], + non_accessible_globs: vec!["**/.env".to_string()], + ..CoderConfig::default() + }); + let resolver = Arc::new(PathResolver::new(&cfg).unwrap()); + let root0 = std::fs::canonicalize(tmp0.path()).unwrap(); + let root1 = std::fs::canonicalize(tmp1.path()).unwrap(); + Self { + _tmp0: tmp0, + _tmp1: tmp1, + root0, + root1, + resolver, + cfg, + } + } + + /// Replace the canonical tempdir roots with stable tokens so the + /// golden file is machine-independent. Longest root first in case one + /// string is a prefix of the other. 
+ fn normalize(&self, msg: &str) -> String { + let mut subs = [ + (self.root0.display().to_string(), ""), + (self.root1.display().to_string(), ""), + ]; + subs.sort_by_key(|(raw, _)| std::cmp::Reverse(raw.len())); + let mut out = msg.to_string(); + for (raw, token) in subs { + out = out.replace(&raw, token); + } + out + } +} + +/// Parse a top-level handler `Err(String)` — the wire JSON +/// `{"code":"C2xx","message":"..."}` — into (code, message). +fn parse_wire_string(err: &str) -> (String, String) { + let v: serde_json::Value = serde_json::from_str(err) + .unwrap_or_else(|e| panic!("handler error is not wire JSON ({e}): {err}")); + ( + v["code"].as_str().expect("code").to_string(), + v["message"].as_str().expect("message").to_string(), + ) +} + +fn from_coder_error(e: &CoderError) -> (String, String) { + (e.code().to_string(), e.message().to_string()) +} + +async fn read_err(jail: &Jail, path: &str) -> (String, String) { + let err = read_file::handle( + jail.resolver.clone(), + jail.cfg.clone(), + read_file::ReadFileInput { + path: Some(path.into()), + ..read_file::ReadFileInput::default() + }, + ) + .await + .expect_err("read must fail"); + parse_wire_string(&err) +} + +async fn create_err( + jail: &Jail, + cfg: Arc<CoderConfig>, + spec: create_file::CreateFileSpec, +) -> (String, String) { + let out = create_file::handle( + jail.resolver.clone(), + cfg, + create_file::CreateFileInput { files: vec![spec] }, + ) + .await + .expect("create-file batches never fail top-level for per-entry errors"); + let wire = out.results[0] + .error + .as_ref() + .expect("entry must carry an error"); + (wire.code.clone(), wire.message.clone()) +} + +#[tokio::test] +async fn error_message_formats_match_golden() { + let jail = Jail::new(); + let mut cases: BTreeMap<String, GoldenError> = BTreeMap::new(); + let mut put = |name: &str, (code, message): (String, String), jail: &Jail| { + cases.insert( + name.to_string(), + GoldenError { + code, + message: jail.normalize(&message), + }, + ); + }; + + // --- C210: 
operator config error — both root forms set ------------- + { + let cfg = CoderConfig { + base_path: Some(jail.root0.clone()), + base_paths: vec![jail.root1.clone()], + ..CoderConfig::default() + }; + let err = PathResolver::new(&cfg).expect_err("both-set must fail"); + put( + "C210_config_both_root_forms_set", + from_coder_error(&err), + &jail, + ); + } + + // --- C210: bad input — malformed create-file mode ------------------ + { + let got = create_err( + &jail, + jail.cfg.clone(), + create_file::CreateFileSpec { + path: "bad-mode.txt".into(), + content: "x".into(), + mode: "9z9".into(), + parents: true, + overwrite: false, + }, + ) + .await; + put("C210_create_bad_mode", got, &jail); + } + + // --- C211: missing vs glob-denied (REDACTION INVARIANT) ------------ + std::fs::write(jail.root0.join(".env"), "secret").unwrap(); + let missing = read_err(&jail, "missing.txt").await; + let denied = read_err(&jail, ".env").await; + { + // Byte-identical suffix after the caller-supplied path prefix: + // callers must not be able to distinguish "missing" from "denied". 
+ let m_suffix = missing + .1 + .strip_prefix("missing.txt: ") + .expect("missing message starts with its wire path"); + let d_suffix = denied + .1 + .strip_prefix(".env: ") + .expect("denied message starts with its wire path"); + assert_eq!( + m_suffix, d_suffix, + "C211 missing vs glob-denied suffixes must be byte-identical" + ); + } + put("C211_read_missing", missing, &jail); + put("C211_read_glob_denied", denied, &jail); + + // --- C211: recursive delete refused on non-accessible subtree ------ + { + std::fs::create_dir_all(jail.root0.join("blocked-dir/nested")).unwrap(); + std::fs::write(jail.root0.join("blocked-dir/nested/.env"), "s").unwrap(); + let out = delete_file::handle( + jail.resolver.clone(), + delete_file::DeleteFileInput { + paths: vec!["blocked-dir".into()], + recursive: true, + }, + ) + .await + .unwrap(); + let wire = out.results[0].error.as_ref().expect("subtree must block"); + put( + "C211_delete_subtree_blocked", + (wire.code.clone(), wire.message.clone()), + &jail, + ); + } + + // --- C213: read cap and write cap ----------------------------------- + { + let tiny = Arc::new(CoderConfig { + base_paths: vec![jail.root0.clone(), jail.root1.clone()], + non_accessible_globs: vec!["**/.env".to_string()], + max_read_bytes: 8, + max_write_bytes: 8, + ..CoderConfig::default() + }); + std::fs::write(jail.root0.join("big.txt"), "0123456789ABCDEF").unwrap(); + let err = read_file::handle( + jail.resolver.clone(), + tiny.clone(), + read_file::ReadFileInput { + path: Some("big.txt".into()), + ..read_file::ReadFileInput::default() + }, + ) + .await + .expect_err("over-cap read must fail"); + put("C213_read_cap_exceeded", parse_wire_string(&err), &jail); + + let got = create_err( + &jail, + tiny, + create_file::CreateFileSpec { + path: "big-create.txt".into(), + content: "0123456789ABCDEF".into(), + mode: "0644".into(), + parents: true, + overwrite: false, + }, + ) + .await; + put("C213_write_cap_exceeded", got, &jail); + } + + // --- C213: batch budget 
exhausted ------------------------------------- + { + let tiny_budget = Arc::new(CoderConfig { + base_paths: vec![jail.root0.clone(), jail.root1.clone()], + non_accessible_globs: vec!["**/.env".to_string()], + max_read_bytes: 1024 * 1024, + batch_read_budget_bytes: 5, + ..CoderConfig::default() + }); + std::fs::write(jail.root0.join("budget-a.txt"), "abcde").unwrap(); + std::fs::write(jail.root0.join("budget-b.txt"), "next").unwrap(); + let out = read_file::handle( + jail.resolver.clone(), + tiny_budget, + read_file::ReadFileInput { + paths: Some(vec![ + read_file::ReadTarget::Path("budget-a.txt".into()), + read_file::ReadTarget::Path("budget-b.txt".into()), + ]), + ..read_file::ReadFileInput::default() + }, + ) + .await + .expect("batch never fails top-level"); + let results = out.results.unwrap(); + let wire = results[1].error.as_ref().expect("second entry must fail"); + put( + "C213_batch_budget_exhausted", + (wire.code.clone(), wire.message.clone()), + &jail, + ); + } + + // --- C213: full-read output budget (the recovery-tool message) ------- + { + let tiny_output = Arc::new(CoderConfig { + base_paths: vec![jail.root0.clone(), jail.root1.clone()], + non_accessible_globs: vec!["**/.env".to_string()], + max_output_bytes: 8, + ..CoderConfig::default() + }); + std::fs::write(jail.root0.join("over-output.txt"), "L1\nL2\nL3\n").unwrap(); + let err = read_file::handle( + jail.resolver.clone(), + tiny_output, + read_file::ReadFileInput { + path: Some("over-output.txt".into()), + ..read_file::ReadFileInput::default() + }, + ) + .await + .expect_err("over-output full read must fail"); + put( + "C213_full_read_output_budget_exceeded", + parse_wire_string(&err), + &jail, + ); + } + + // --- C215: relative `..` escape / absolute outside / dangling link -- + put( + "C215_relative_dotdot_escape", + read_err(&jail, "../escape.txt").await, + &jail, + ); + put( + "C215_absolute_outside_all_roots", + read_err(&jail, "/etc/passwd").await, + &jail, + ); + { + 
std::os::unix::fs::symlink(jail.root0.join("missing-target"), jail.root0.join("dangle")) + .unwrap(); + put( + "C215_dangling_symlink", + read_err(&jail, "dangle/child.txt").await, + &jail, + ); + } + + // --- C216: io passthrough ------------------------------------------- + // No handler drives this one deterministically: a real EACCES needs a + // chmod-000 directory, which silently stops failing when tests run as + // root (CI containers). The public `From<std::io::Error>` conversion in + // error.rs IS the path every handler's `?` takes for non-NotFound io + // errors, so pinning it pins the C216 wire shape: the io error text + // passes through verbatim under code C216. + { + let e = CoderError::from(std::io::Error::other("synthetic io failure")); + put("C216_io_passthrough", from_coder_error(&e), &jail); + } + + // --- C217: create-file exists without overwrite --------------------- + { + std::fs::write(jail.root0.join("exists.txt"), "old").unwrap(); + let got = create_err( + &jail, + jail.cfg.clone(), + create_file::CreateFileSpec { + path: "exists.txt".into(), + content: "new".into(), + mode: "0644".into(), + parents: true, + overwrite: false, + }, + ) + .await; + put("C217_create_exists_without_overwrite", got, &jail); + } + + // --- C217: move destination exists without overwrite ---------------- + { + std::fs::write(jail.root0.join("move-src.txt"), "src").unwrap(); + std::fs::write(jail.root0.join("move-dst.txt"), "dst").unwrap(); + let out = move_file::handle( + jail.resolver.clone(), + move_file::MoveFileInput { + files: vec![move_file::MoveFileSpec { + from: "move-src.txt".into(), + to: "move-dst.txt".into(), + overwrite: false, + parents: true, + }], + }, + ) + .await + .expect("move batches never fail top-level"); + let wire = out.results[0] + .error + .as_ref() + .expect("entry must carry an error"); + put( + "C217_move_dst_exists_without_overwrite", + (wire.code.clone(), wire.message.clone()), + &jail, + ); + } + + // --- C210: move — cross-root directory move 
rejected ---------------- + { + std::fs::create_dir_all(jail.root0.join("cross-dir")).unwrap(); + std::fs::write(jail.root0.join("cross-dir/f.txt"), "x").unwrap(); + let dst_abs = jail.root1.join("cross-dir"); + let out = move_file::handle( + jail.resolver.clone(), + move_file::MoveFileInput { + files: vec![move_file::MoveFileSpec { + from: "cross-dir".into(), + to: dst_abs.display().to_string(), + overwrite: false, + parents: true, + }], + }, + ) + .await + .expect("move batches never fail top-level"); + let wire = out.results[0] + .error + .as_ref() + .expect("entry must carry an error"); + put( + "C210_move_cross_root_dir", + (wire.code.clone(), jail.normalize(&wire.message)), + &jail, + ); + } + + // --- C210: move — destination is a directory (prescriptive) --------- + { + std::fs::write(jail.root0.join("move-dir-src.txt"), "x").unwrap(); + std::fs::create_dir_all(jail.root0.join("move-dst-dir")).unwrap(); + let out = move_file::handle( + jail.resolver.clone(), + move_file::MoveFileInput { + files: vec![move_file::MoveFileSpec { + from: "move-dir-src.txt".into(), + to: "move-dst-dir".into(), + overwrite: false, + parents: true, + }], + }, + ) + .await + .expect("move batches never fail top-level"); + let wire = out.results[0] + .error + .as_ref() + .expect("entry must carry an error"); + put( + "C210_move_dst_is_directory", + (wire.code.clone(), wire.message.clone()), + &jail, + ); + } + + // --- C210: update-file — undefined replacement capture reference ---- + // v0.4.1 pre-write guard, pinned with the EXACT production repro + // (session q8x6g248): a replacement carrying a JS template literal + // (`Hello, ${name}!`) against a pattern with no capture groups. The + // old behavior silently expanded `${name}` to the empty string and + // wrote the corrupted file with success: true. 
+ { + std::fs::write( + jail.root0.join("tpl-handler.js"), + "iii.registerFunction({ handler: () => 'hi' });\n", + ) + .unwrap(); + let out = + update_file::handle( + jail.resolver.clone(), + jail.cfg.clone(), + update_file::UpdateFileInput { + files: vec![update_file::UpdateFileSpec { + path: "tpl-handler.js".into(), + ops: vec![update_file::UpdateOp::Replace { + pattern: r"iii\.registerFunction\(.*".into(), + replacement: + "iii.registerFunction({ body: { message: `Hello, ${name}!` } });" + .into(), + ignore_case: false, + dot_matches_newline: true, + expect_matches: Some(1), + }], + }], + }, + ) + .await + .expect("update-file batches never fail top-level for per-entry errors"); + let wire = out.results[0] + .error + .as_ref() + .expect("entry must carry an error"); + put( + "C210_replace_undefined_capture_ref", + (wire.code.clone(), wire.message.clone()), + &jail, + ); + } + + // --- C211: move — missing source ------------------------------------ + { + let out = move_file::handle( + jail.resolver.clone(), + move_file::MoveFileInput { + files: vec![move_file::MoveFileSpec { + from: "no-such-move-src.txt".into(), + to: "no-such-dst.txt".into(), + overwrite: false, + parents: true, + }], + }, + ) + .await + .expect("move batches never fail top-level"); + let wire = out.results[0] + .error + .as_ref() + .expect("entry must carry an error"); + put( + "C211_move_missing_src", + (wire.code.clone(), wire.message.clone()), + &jail, + ); + } + + // --- compare against the committed golden --------------------------- + let mut pretty = serde_json::to_string_pretty(&cases).expect("cases serialize"); + pretty.push('\n'); + support::assert_golden("errors.json", &pretty); +} diff --git a/coder/tests/golden_schemas.rs b/coder/tests/golden_schemas.rs new file mode 100644 index 00000000..6ea076a7 --- /dev/null +++ b/coder/tests/golden_schemas.rs @@ -0,0 +1,298 @@ +//! GOLDEN FAMILY A — wire-schema snapshots for all 9 functions. +//! +//! 
`coder::functions::catalog()` is the single source of truth for each +//! function's id, registration description, and schemars-derived +//! request/response schemas (generated with the same +//! `SchemaSettings::draft07()` construction iii-sdk uses at registration, +//! from the same input/output structs). Each entry is serialized to +//! pretty JSON and compared against `tests/golden/schemas/.json` +//! (`::` in the function id maps to `.` in the filename). +//! +//! These snapshots ARE the product surface consumed by LLM agents — any +//! schema, description, or error-shape change must land as an explicit +//! golden diff. Regenerate with `UPDATE_GOLDENS=1 cargo test`. + +mod support; + +use coder::functions::{catalog, FunctionSpec}; + +/// `coder::read-file` -> `coder.read-file.json` (no `::` in filenames so +/// the goldens stay portable across filesystems). +fn golden_file_name(function_id: &str) -> String { + format!("schemas/{}.json", function_id.replace("::", ".")) +} + +fn spec_to_pretty_json(spec: &FunctionSpec) -> String { + let value = serde_json::json!({ + "function_id": spec.function_id, + "description": spec.description, + "request_schema": spec.request_schema, + "response_schema": spec.response_schema, + }); + let mut pretty = serde_json::to_string_pretty(&value).expect("spec serializes"); + pretty.push('\n'); + pretty +} + +/// The catalog must cover exactly the 9 registered functions, in +/// registration order (kept in lockstep with `register_all`). +#[test] +fn catalog_lists_all_nine_functions_in_registration_order() { + let ids: Vec<&str> = catalog().iter().map(|s| s.function_id).collect(); + assert_eq!( + ids, + vec![ + "coder::info", + "coder::read-file", + "coder::search", + "coder::update-file", + "coder::create-file", + "coder::delete-file", + "coder::list-folder", + "coder::tree", + "coder::move", + ] + ); +} + +/// Every catalog entry matches its committed golden. 
Mismatches are +/// collected across ALL functions before failing so one run shows the +/// full drift, not just the first file. +#[test] +fn wire_schema_snapshots_match_goldens() { + let mut failures = Vec::new(); + for spec in catalog() { + let rel = golden_file_name(spec.function_id); + let actual = spec_to_pretty_json(&spec); + if let Err(msg) = support::check_golden(&rel, &actual) { + failures.push(msg); + } + } + assert!( + failures.is_empty(), + "{} wire-schema golden(s) drifted:\n\n{}", + failures.len(), + failures.join("\n") + ); +} + +/// Every catalog entry's request_schema must carry at least one example. +/// New functions must ship a canonical request example — examples are +/// load-bearing wire-contract anchors that ground LLM agent payload shape. +#[test] +fn every_request_schema_has_at_least_one_example() { + let mut missing = Vec::new(); + for spec in catalog() { + let has_example = spec + .request_schema + .schema + .metadata + .as_ref() + .map(|m| !m.examples.is_empty()) + .unwrap_or(false); + if !has_example { + missing.push(spec.function_id); + } + } + assert!( + missing.is_empty(), + "request schemas missing at least one example (add \ + #[schemars(example = \"fn_path\")] to the input struct): {missing:?}" + ); +} + +// --------------------------------------------------------------------------- +// Example round-trip + key-fact validation. +// +// `from_value` alone is weak: none of the input structs use +// `deny_unknown_fields`, so a typo in an OPTIONAL field deserializes +// silently into the default. Each test below therefore re-asserts the +// example's key facts on the DESERIALIZED struct — pinning every example +// through the real wire deserialization path, not just "it parses". +// --------------------------------------------------------------------------- + +/// The examples actually embedded in the wire schema (request_schema +/// metadata) — validating these validates exactly what agents see. 
+fn request_examples(function_id: &str) -> Vec<serde_json::Value> { + let spec = catalog() + .into_iter() + .find(|s| s.function_id == function_id) + .unwrap_or_else(|| panic!("no catalog entry for {function_id}")); + spec.request_schema + .schema + .metadata + .map(|m| m.examples) + .unwrap_or_default() +} + +/// Deserialize example `idx` of `function_id` into the input type, with +/// a readable panic when the example does not round-trip. +fn example_as<T: serde::de::DeserializeOwned>(function_id: &str, idx: usize) -> T { + let examples = request_examples(function_id); + let value = examples + .get(idx) + .unwrap_or_else(|| panic!("{function_id} has no example at index {idx}")); + serde_json::from_value(value.clone()).unwrap_or_else(|e| { + panic!("example {idx} of {function_id} does not deserialize into the input type: {e}") + }) +} + +#[test] +fn info_example_round_trips() { + let _input: coder::functions::info::InfoInput = example_as("coder::info", 0); +} + +#[test] +fn read_file_single_example_round_trips_with_window() { + let input: coder::functions::read_file::ReadFileInput = example_as("coder::read-file", 0); + assert_eq!(input.path.as_deref(), Some("src/main.rs")); + assert_eq!(input.line_from, Some(10)); + assert_eq!(input.line_to, Some(50)); + assert!( + input.paths.is_none(), + "single-path example must not set paths" + ); +} + +#[test] +fn read_file_batch_example_round_trips_with_both_target_forms() { + use coder::functions::read_file::{ReadFileInput, ReadTarget}; + let input: ReadFileInput = example_as("coder::read-file", 1); + assert!(input.path.is_none(), "batch example must not set path"); + let targets = input.paths.expect("batch example sets paths"); + assert_eq!(targets.len(), 2); + assert!( + matches!(&targets[0], ReadTarget::Path(p) if p == "src/lib.rs"), + "first target must be the bare-string form" + ); + assert!( + matches!( + &targets[1], + ReadTarget::Window { + line_from: Some(1), + line_to: Some(30), + .. 
+ } + ), + "second target must be the windowed object form" + ); +} + +#[test] +fn search_example_round_trips() { + let input: coder::functions::search::SearchInput = example_as("coder::search", 0); + assert_eq!(input.query, "fn handle"); + assert_eq!(input.include_globs, vec!["**/*.rs".to_string()]); + assert_eq!(input.context_lines_before, Some(2)); + assert_eq!(input.context_lines_after, Some(2)); + assert!( + input.use_default_excludes, + "example leaves use_default_excludes at its serde default (true)" + ); + assert!(input.search_content); + assert!(!input.search_paths); +} + +#[test] +fn update_file_example_round_trips_with_three_op_kinds() { + use coder::functions::update_file::{UpdateFileInput, UpdateOp}; + let input: UpdateFileInput = example_as("coder::update-file", 0); + assert_eq!(input.files.len(), 1); + let ops = &input.files[0].ops; + assert_eq!(ops.len(), 3); + assert!(matches!(ops[0], UpdateOp::Insert { at_line: 1, .. })); + assert!(matches!( + ops[1], + UpdateOp::UpdateLines { + from_line: 5, + to_line: 7, + .. + } + )); + assert!(matches!(ops[2], UpdateOp::Replace { .. 
})); +} + +#[test] +fn create_file_example_round_trips_with_relative_and_absolute_entries() { + let input: coder::functions::create_file::CreateFileInput = example_as("coder::create-file", 0); + assert_eq!(input.files.len(), 2); + assert!( + !input.files[0].path.starts_with('/'), + "first entry must show the relative-path form" + ); + assert!( + input.files[1].path.starts_with("/tmp/"), + "second entry must show the in-default-root absolute form" + ); + assert!(!input.files[0].overwrite); + assert!(input.files[1].overwrite); +} + +#[test] +fn delete_file_example_round_trips() { + let input: coder::functions::delete_file::DeleteFileInput = example_as("coder::delete-file", 0); + assert_eq!(input.paths.len(), 2); + assert!( + input.recursive, + "example demonstrates recursive dir removal" + ); +} + +#[test] +fn list_folder_example_round_trips() { + let input: coder::functions::list_folder::ListFolderInput = example_as("coder::list-folder", 0); + assert_eq!(input.page_size, Some(50)); +} + +#[test] +fn tree_example_round_trips() { + let input: coder::functions::tree::TreeInput = example_as("coder::tree", 0); + assert_eq!(input.max_depth, Some(3)); + assert!( + input.use_default_excludes, + "omitting use_default_excludes must default to true (noise exclusion on)" + ); +} + +#[test] +fn move_example_round_trips_with_in_root_absolute_destination() { + let input: coder::functions::move_file::MoveFileInput = example_as("coder::move", 0); + assert_eq!(input.files.len(), 2); + assert!(!input.files[0].overwrite); + assert!(input.files[1].overwrite); + // The absolute destination must live inside a DEFAULT root ("/tmp") + // — a canonical example must not be rejected with C215 under the + // default config (T10 review finding). 
+ assert!( + input.files[1].to.starts_with("/tmp/"), + "cross-root example destination must be inside the default /tmp root, \ + got: {}", + input.files[1].to + ); +} + +/// No stale goldens: every file under tests/golden/schemas/ must +/// correspond to a current catalog entry (catches renames/removals that +/// forget to delete the old snapshot). +#[test] +fn no_orphan_schema_goldens() { + let dir = support::golden_root().join("schemas"); + let expected: Vec<String> = catalog() + .iter() + .map(|s| format!("{}.json", s.function_id.replace("::", "."))) + .collect(); + let entries = match std::fs::read_dir(&dir) { + Ok(e) => e, + // Directory absent only before first UPDATE_GOLDENS run; the + // snapshot test above already fails loudly in that case. + Err(_) => return, + }; + for entry in entries.filter_map(Result::ok) { + let name = entry.file_name().to_string_lossy().into_owned(); + assert!( + expected.iter().any(|e| e == &name), + "orphan golden tests/golden/schemas/{name}: no catalog entry \ + produces it. Delete it or fix the catalog." + ); + } +} diff --git a/coder/tests/integration.rs b/coder/tests/integration.rs index 8c9b87a5..b11c0171 100644 --- a/coder/tests/integration.rs +++ b/coder/tests/integration.rs @@ -229,9 +229,16 @@ async fn full_lifecycle_via_iii_sdk() { let content_matches = search["content_matches"] .as_array() .expect("content matches"); + // Response paths are canonical-absolute (decision D2-eng); the worker + // canonicalized its root, so anchor the expectation the same way. + let expected_match_path = std::fs::canonicalize(h.base.path()) + .expect("canonicalize harness base") + .join("hello.txt") + .display() + .to_string(); assert!(content_matches .iter() - .any(|m| m["path"] == "hello.txt" && m["line"] == 2)); + .any(|m| m["path"] == expected_match_path.as_str() && m["line"] == 2)); // 7. Non-accessible glob blocks reads even though list-folder shows it. 
std::fs::write(h.base.path().join(".env"), "API_KEY=secret\n").unwrap(); diff --git a/coder/tests/parity.rs b/coder/tests/parity.rs new file mode 100644 index 00000000..2c73a67f --- /dev/null +++ b/coder/tests/parity.rs @@ -0,0 +1,162 @@ +//! GOLDEN FAMILY C — canonicalization parity vectors. +//! +//! CANONICAL CASE MATRIX for the MIRROR-INVARIANT shared between +//! `coder/src/path/mod.rs` and `shell/src/fs/host.rs` +//! (`canonicalize_with_fallback` + `normalize_lexical` implement the same +//! jail-safety algorithm and MUST evolve in lockstep). +//! +//! True cross-crate execution is impossible — coder and shell are +//! separate Cargo workspaces, and a shared crate is deliberately deferred +//! (rule of three). Instead this VECTOR TABLE pins the behavioral +//! contract against coder's `PathResolver`; shell's host.rs points here +//! from its MIRROR-INVARIANT note. When you change either implementation, +//! port the fix to the other file AND extend this matrix with the case +//! that motivated the change. +//! +//! Case matrix: +//! 1. relative resolve -> Ok, inside primary root +//! 2. `.` resolve -> Ok, equals canonical primary root +//! 3. nonexistent inside base -> Ok via longest-existing-ancestor fallback +//! 4. relative `..` escape -> C215 +//! 5. symlink escape -> C215 (canonicalized BEFORE containment) +//! 6. dangling symlink in tail -> C215 +//! 7. absolute inside a root -> Ok, canonical absolute +//! 8. absolute outside all roots -> C215 + +use std::path::Path; + +use coder::config::CoderConfig; +use coder::path::PathResolver; + +/// What a vector expects from `PathResolver::resolve`. +enum Expect { + /// Ok; result starts with the canonical primary root and ends with + /// the given suffix. + OkInsidePrimary(&'static str), + /// Ok; result is exactly the canonical primary root. + OkEqualsPrimaryRoot, + /// Err with this error code. 
+ ErrCode(&'static str), +} + +struct Vector { + name: &'static str, + /// Prepare the jail contents and return the wire path to resolve. + /// Receives the RAW (pre-canonicalization) primary root. + arrange: fn(root: &Path) -> String, + expect: Expect, +} + +const VECTORS: &[Vector] = &[ + Vector { + name: "relative_resolve", + arrange: |root| { + std::fs::create_dir(root.join("sub")).unwrap(); + std::fs::write(root.join("sub/a.txt"), b"hi").unwrap(); + "sub/a.txt".into() + }, + expect: Expect::OkInsidePrimary("sub/a.txt"), + }, + Vector { + name: "dot_resolve", + arrange: |_root| ".".into(), + expect: Expect::OkEqualsPrimaryRoot, + }, + Vector { + name: "nonexistent_inside_base_via_fallback", + arrange: |_root| "does/not/exist.txt".into(), + expect: Expect::OkInsidePrimary("does/not/exist.txt"), + }, + Vector { + name: "dotdot_escape", + arrange: |_root| "../escape.txt".into(), + expect: Expect::ErrCode("C215"), + }, + Vector { + name: "symlink_escape", + arrange: |root| { + // Symlink inside the jail pointing OUTSIDE it. The lexical + // form stays inside the root; only canonicalization-before- + // containment catches the escape (the jail-escape vector the + // MIRROR-INVARIANT exists for). 
+ std::os::unix::fs::symlink("/", root.join("escape")).unwrap(); + "escape/etc/passwd".into() + }, + expect: Expect::ErrCode("C215"), + }, + Vector { + name: "dangling_symlink", + arrange: |root| { + std::os::unix::fs::symlink(root.join("missing-target"), root.join("dangle")).unwrap(); + "dangle/child.txt".into() + }, + expect: Expect::ErrCode("C215"), + }, + Vector { + name: "absolute_inside_root_accept", + arrange: |root| { + std::fs::write(root.join("abs.txt"), b"x").unwrap(); + root.join("abs.txt").display().to_string() + }, + expect: Expect::OkInsidePrimary("abs.txt"), + }, + Vector { + name: "absolute_outside_all_roots", + arrange: |_root| "/etc/passwd".into(), + expect: Expect::ErrCode("C215"), + }, +]; + +#[test] +fn canonicalization_parity_vectors() { + for v in VECTORS { + // Fresh jail per vector so arrangements can't interfere. + let tmp = tempfile::tempdir().unwrap(); + let cfg = CoderConfig { + base_paths: vec![tmp.path().to_path_buf()], + ..CoderConfig::default() + }; + let resolver = PathResolver::new(&cfg) + .unwrap_or_else(|e| panic!("[{}] resolver construction failed: {e}", v.name)); + let canon_root = std::fs::canonicalize(tmp.path()).unwrap(); + + let wire = (v.arrange)(tmp.path()); + let got = resolver.resolve(&wire); + + match (&v.expect, got) { + (Expect::OkInsidePrimary(suffix), Ok(p)) => { + assert!( + p.starts_with(&canon_root), + "[{}] {p:?} must start with canonical primary root {canon_root:?}", + v.name + ); + assert!( + p.ends_with(suffix), + "[{}] {p:?} must end with {suffix:?}", + v.name + ); + } + (Expect::OkEqualsPrimaryRoot, Ok(p)) => { + assert_eq!( + p, canon_root, + "[{}] must resolve to the canonical primary root", + v.name + ); + } + (Expect::ErrCode(code), Err(e)) => { + assert_eq!( + e.code(), + *code, + "[{}] wrong error code; message: {e}", + v.name + ); + } + (Expect::OkInsidePrimary(_), Err(e)) | (Expect::OkEqualsPrimaryRoot, Err(e)) => { + panic!("[{}] expected Ok, got {} ({e})", v.name, e.code()); + } + 
(Expect::ErrCode(code), Ok(p)) => { + panic!("[{}] expected {code}, got Ok({p:?})", v.name); + } + } + } +} diff --git a/coder/tests/path_jail.rs b/coder/tests/path_jail.rs index 8e0b3d52..83f7ca86 100644 --- a/coder/tests/path_jail.rs +++ b/coder/tests/path_jail.rs @@ -2,7 +2,8 @@ //! per-module unit tests already exercise most branches; these scenarios //! re-verify the cross-cutting invariants from outside the module: //! -//! - `..` escapes and absolute paths must never resolve outside `base_root`. +//! - `..` escapes must never resolve outside the allowed roots. +//! - Absolute paths are accepted only inside an allowed root. //! - Symlinks to outside the base must be rejected. //! - Non-accessible globs must block reading too (not just writing). @@ -16,7 +17,7 @@ use tempfile::tempdir; fn make_resolver(base: PathBuf, globs: Vec<&str>) -> (Arc, Arc) { let cfg = Arc::new(CoderConfig { - base_path: base, + base_paths: vec![base], non_accessible_globs: globs.into_iter().map(String::from).collect(), ..CoderConfig::default() }); @@ -33,11 +34,21 @@ fn dotdot_in_path_cannot_escape_base_root() { } #[test] -fn absolute_path_input_rejected_with_c210() { +fn absolute_path_outside_all_roots_rejected_with_c215() { let tmp = tempdir().unwrap(); let (r, _) = make_resolver(tmp.path().to_path_buf(), vec![]); let err = r.resolve("/etc/passwd").unwrap_err(); - assert_eq!(err.code(), "C210"); + assert_eq!(err.code(), "C215"); +} + +#[test] +fn absolute_path_inside_a_root_accepted() { + let tmp = tempdir().unwrap(); + std::fs::write(tmp.path().join("ok.txt"), b"x").unwrap(); + let (r, _) = make_resolver(tmp.path().to_path_buf(), vec![]); + let abs_input = tmp.path().join("ok.txt").display().to_string(); + let resolved = r.resolve(&abs_input).expect("absolute inside root"); + assert!(resolved.starts_with(r.base_root())); } #[test] @@ -73,7 +84,8 @@ fn non_accessible_glob_blocks_read() { r, c, ReadFileInput { - path: ".env".into(), + path: Some(".env".into()), + 
..ReadFileInput::default() }, )) .unwrap_err(); diff --git a/coder/tests/steps/common.rs b/coder/tests/steps/common.rs index 8fa717e4..d610a7bf 100644 --- a/coder/tests/steps/common.rs +++ b/coder/tests/steps/common.rs @@ -116,9 +116,11 @@ fn result_failed_with_code(world: &mut CoderWorld, path: String, code: String) { entry["success"], false, "expected failure for {path:?}; got: {entry:?}" ); - let err = entry["error"].as_str().unwrap_or(""); - assert!( - err.contains(&code), - "expected error for {path:?} to contain {code:?}; got: {err:?}" + // `error` is now a structured object `{"code":"C2xx","message":"..."}`; + // extract the `code` field for stable programmatic assertions. + let err_code = entry["error"]["code"].as_str().unwrap_or(""); + assert_eq!( + err_code, code, + "expected error code {code:?} for {path:?}; got entry: {entry:?}" ); } diff --git a/coder/tests/steps/read.rs b/coder/tests/steps/read.rs index 72b7601b..5d65cfe1 100644 --- a/coder/tests/steps/read.rs +++ b/coder/tests/steps/read.rs @@ -54,6 +54,15 @@ fn read_path_equals(world: &mut CoderWorld, expected: String) { if world.iii.is_none() { return; } + // Responses carry canonical absolute paths; features speak + // base-relative, so anchor the expectation at the scenario base. + let expected = world + .base_path + .as_ref() + .expect("base_path set") + .join(&expected) + .display() + .to_string(); let v = last_ok(world); let got = v["path"].as_str().unwrap_or(""); assert_eq!(got, expected, "path echo mismatch; got: {v}"); diff --git a/coder/tests/steps/search.rs b/coder/tests/steps/search.rs index 15015a43..07125c5e 100644 --- a/coder/tests/steps/search.rs +++ b/coder/tests/steps/search.rs @@ -22,19 +22,32 @@ fn path_matches(v: &Value) -> &[Value] { .unwrap_or(&[]) } +/// Responses carry canonical absolute paths; features speak +/// base-relative, so anchor expectations at the scenario base. 
+fn abs_path(world: &CoderWorld, rel: &str) -> String { + world + .base_path + .as_ref() + .expect("base_path set") + .join(rel) + .display() + .to_string() +} + #[then(regex = r#"^the search has a content match for "([^"]+)" at line (\d+)$"#)] fn search_has_content_match(world: &mut CoderWorld, path: String, line: u64) { if world.iii.is_none() { return; } + let expected = abs_path(world, &path); let v = last_ok(world); let arr = content_matches(v); let found = arr .iter() - .any(|m| m["path"].as_str() == Some(path.as_str()) && m["line"].as_u64() == Some(line)); + .any(|m| m["path"].as_str() == Some(expected.as_str()) && m["line"].as_u64() == Some(line)); assert!( found, - "expected content match for {path:?} at line {line}; got: {arr:?}" + "expected content match for {expected:?} at line {line}; got: {arr:?}" ); } @@ -43,14 +56,15 @@ fn search_has_content_match_any_line(world: &mut CoderWorld, path: String) { if world.iii.is_none() { return; } + let expected = abs_path(world, &path); let v = last_ok(world); let arr = content_matches(v); let found = arr .iter() - .any(|m| m["path"].as_str() == Some(path.as_str())); + .any(|m| m["path"].as_str() == Some(expected.as_str())); assert!( found, - "expected at least one content match for {path:?}; got: {arr:?}" + "expected at least one content match for {expected:?}; got: {arr:?}" ); } @@ -59,14 +73,15 @@ fn search_lacks_content_match(world: &mut CoderWorld, path: String) { if world.iii.is_none() { return; } + let expected = abs_path(world, &path); let v = last_ok(world); let arr = content_matches(v); let found = arr .iter() - .any(|m| m["path"].as_str() == Some(path.as_str())); + .any(|m| m["path"].as_str() == Some(expected.as_str())); assert!( !found, - "unexpected content match for {path:?}; got: {arr:?}" + "unexpected content match for {expected:?}; got: {arr:?}" ); } @@ -75,12 +90,13 @@ fn search_has_path_match(world: &mut CoderWorld, path: String) { if world.iii.is_none() { return; } + let expected = abs_path(world, 
&path); let v = last_ok(world); let arr = path_matches(v); let found = arr .iter() - .any(|m| m["path"].as_str() == Some(path.as_str())); - assert!(found, "expected path match for {path:?}; got: {arr:?}"); + .any(|m| m["path"].as_str() == Some(expected.as_str())); + assert!(found, "expected path match for {expected:?}; got: {arr:?}"); } #[then(regex = r#"^the search has no path match for "([^"]+)"$"#)] @@ -88,12 +104,16 @@ fn search_lacks_path_match(world: &mut CoderWorld, path: String) { if world.iii.is_none() { return; } + let expected = abs_path(world, &path); let v = last_ok(world); let arr = path_matches(v); let found = arr .iter() - .any(|m| m["path"].as_str() == Some(path.as_str())); - assert!(!found, "unexpected path match for {path:?}; got: {arr:?}"); + .any(|m| m["path"].as_str() == Some(expected.as_str())); + assert!( + !found, + "unexpected path match for {expected:?}; got: {arr:?}" + ); } #[then(regex = r#"^the search truncated is (true|false)$"#)] diff --git a/coder/tests/support/mod.rs b/coder/tests/support/mod.rs new file mode 100644 index 00000000..be1f590e --- /dev/null +++ b/coder/tests/support/mod.rs @@ -0,0 +1,106 @@ +//! Hand-rolled golden-file harness (deliberately no `insta`/snapshot +//! dependency). Goldens live under `tests/golden/` and are committed; +//! any wire-surface change must show up as an explicit, reviewed diff. +//! +//! Workflow: +//! - `cargo test` compares actual output against the committed goldens. +//! - `UPDATE_GOLDENS=1 cargo test` regenerates the files; review the git +//! diff, then commit the new goldens alongside the change that caused +//! them. + +#![allow(dead_code)] + +use std::fs; +use std::path::PathBuf; + +/// Root of the committed golden files. 
+pub fn golden_root() -> PathBuf { + PathBuf::from(env!("CARGO_MANIFEST_DIR")).join("tests/golden") +} + +fn update_mode() -> bool { + std::env::var("UPDATE_GOLDENS") + .map(|v| v == "1") + .unwrap_or(false) +} + +/// Compare `actual` against the golden file at `tests/golden/`. +/// Returns `Err(readable diff hint)` on mismatch or missing golden; +/// with `UPDATE_GOLDENS=1` the file is (re)written and the check passes. +pub fn check_golden(rel: &str, actual: &str) -> Result<(), String> { + let path = golden_root().join(rel); + if update_mode() { + if let Some(parent) = path.parent() { + fs::create_dir_all(parent).map_err(|e| format!("create {}: {e}", parent.display()))?; + } + fs::write(&path, actual).map_err(|e| format!("write {}: {e}", path.display()))?; + return Ok(()); + } + let expected = fs::read_to_string(&path).map_err(|e| { + format!( + "golden file {} unreadable ({e}).\n\ + Run `UPDATE_GOLDENS=1 cargo test` to (re)generate, then review \ + and commit the diff.", + path.display() + ) + })?; + if expected == actual { + return Ok(()); + } + Err(diff_hint(rel, &expected, actual)) +} + +/// Panicking wrapper around [`check_golden`]. +pub fn assert_golden(rel: &str, actual: &str) { + if let Err(msg) = check_golden(rel, actual) { + panic!("{msg}"); + } +} + +/// Readable first-divergence diff hint: line number, expected vs actual +/// around the mismatch, and the regeneration instructions. 
+fn diff_hint(rel: &str, expected: &str, actual: &str) -> String { + let exp_lines: Vec<&str> = expected.lines().collect(); + let act_lines: Vec<&str> = actual.lines().collect(); + let first_diff = exp_lines + .iter() + .zip(act_lines.iter()) + .position(|(e, a)| e != a) + .unwrap_or_else(|| exp_lines.len().min(act_lines.len())); + + const CONTEXT: usize = 3; + let lo = first_diff.saturating_sub(CONTEXT); + let hi = (first_diff + CONTEXT + 1).max(first_diff + 1); + + let mut out = format!( + "golden mismatch: tests/golden/{rel}\n\ + first divergence at line {} (expected {} lines, actual {} lines)\n", + first_diff + 1, + exp_lines.len(), + act_lines.len() + ); + out.push_str("--- expected (golden) ---\n"); + for (i, line) in exp_lines.iter().enumerate().skip(lo).take(hi - lo) { + let marker = if i == first_diff { ">" } else { " " }; + out.push_str(&format!("{marker} {:>4} | {line}\n", i + 1)); + } + out.push_str("--- actual ---\n"); + for (i, line) in act_lines.iter().enumerate().skip(lo).take(hi - lo) { + let marker = if i == first_diff { ">" } else { " " }; + out.push_str(&format!("{marker} {:>4} | {line}\n", i + 1)); + } + // All compared lines matched — one side is a strict prefix of the other. 
+ if exp_lines.len() != act_lines.len() && first_diff == exp_lines.len().min(act_lines.len()) { + out.push_str(&format!( + "(expected has {} lines, actual has {} — divergence is in \ + length, not content)\n", + exp_lines.len(), + act_lines.len() + )); + } + out.push_str( + "If this change is intentional, run `UPDATE_GOLDENS=1 cargo test`, \ + review the git diff, and commit the updated goldens.\n", + ); + out +} diff --git a/coder/tests/update_ops.rs b/coder/tests/update_ops.rs index d25efd7a..506dfe7b 100644 --- a/coder/tests/update_ops.rs +++ b/coder/tests/update_ops.rs @@ -15,7 +15,7 @@ use tempfile::tempdir; fn make(base: PathBuf, globs: Vec<&str>) -> (Arc, Arc) { let cfg = Arc::new(CoderConfig { - base_path: base, + base_paths: vec![base], non_accessible_globs: globs.into_iter().map(String::from).collect(), max_read_bytes: 1024 * 1024, max_write_bytes: 1024 * 1024, @@ -114,9 +114,9 @@ async fn batch_with_mix_of_success_and_failure_preserves_originals() { assert_eq!(out.results.len(), 3); assert!(out.results[0].success, "ok.txt should succeed"); assert!(!out.results[1].success, "bad.txt overlap must be rejected"); - assert!(out.results[1].error.as_deref().unwrap().contains("C210")); + assert_eq!(out.results[1].error.as_ref().unwrap().code, "C210"); assert!(!out.results[2].success, ".env must be denied"); - assert!(out.results[2].error.as_deref().unwrap().contains("C211")); + assert_eq!(out.results[2].error.as_ref().unwrap().code, "C211"); assert_eq!( std::fs::read_to_string(tmp.path().join("ok.txt")).unwrap(), @@ -172,6 +172,8 @@ async fn regex_replace_e2e() { pattern: "foo".into(), replacement: "baz".into(), ignore_case: false, + dot_matches_newline: false, + expect_matches: None, }], }], }, diff --git a/console/web/src/components/chat/FunctionCallMessage.tsx b/console/web/src/components/chat/FunctionCallMessage.tsx index e4bedb74..5d4747c1 100644 --- a/console/web/src/components/chat/FunctionCallMessage.tsx +++ 
b/console/web/src/components/chat/FunctionCallMessage.tsx @@ -112,7 +112,7 @@ function FunctionIdLabel({ functionId }: { functionId: string }) { if (WebToolView.isWebFunction(functionId)) { return } - if (CoderToolView.isCoderMutateFunction(functionId)) { + if (CoderToolView.isCoderFunction(functionId)) { return } if (SandboxToolView.isSandboxFunction(functionId)) { diff --git a/console/web/src/components/chat/coder/CoderDiff.tsx b/console/web/src/components/chat/coder/CoderDiff.tsx index 54fefd48..675c8430 100644 --- a/console/web/src/components/chat/coder/CoderDiff.tsx +++ b/console/web/src/components/chat/coder/CoderDiff.tsx @@ -14,12 +14,6 @@ function usePierreDiffOptions() { } } -interface CoderFileDiffProps { - path: string - oldContents: string - newContents: string -} - /** Input-derived diff: empty old side for new files. */ export function CoderNewFilePreview({ path, @@ -62,20 +56,3 @@ export function CoderOverwritePreview({ ) } - -export function CoderFileDiff({ - path, - oldContents, - newContents, -}: CoderFileDiffProps) { - const options = usePierreDiffOptions() - return ( -
- -
- ) -} diff --git a/console/web/src/components/chat/coder/CreateFileView.tsx b/console/web/src/components/chat/coder/CreateFileView.tsx index ea201d14..f9868915 100644 --- a/console/web/src/components/chat/coder/CreateFileView.tsx +++ b/console/web/src/components/chat/coder/CreateFileView.tsx @@ -1,9 +1,11 @@ /** * `coder::create-file` — input-derived diffs via @pierre/diffs. * - * UI-only: overwrite shows new content only (previous version unknown on wire). - * When the Rust handler later adds optional `before`/`after` snapshots, this - * view can switch to `CoderFileDiff` when those fields are present. + * UI-only: overwrite shows new content only (previous version unknown on + * wire). Entries are independent — one failure never aborts the batch, so + * each row gets its own success/error state. Per-entry errors are + * structured WireError {code, message}; the message names the corrective + * next call (e.g. C217 → retry with overwrite: true) and is shown verbatim. */ import { TriangleAlert } from 'lucide-react' import { formatBytes } from '@/components/chat/sandbox/format' @@ -59,21 +61,28 @@ export function CreateFileView({ {req.files.map((file, i) => { const result = resp?.results[i] if (result && !result.success) { + // Result path is canonical absolute (jail-resolved) unless + // resolution itself failed — then it's the caller's input + // verbatim, never absolute. Flag that so it reads as raw input. + const unresolved = !result.path.startsWith('/') return (
- {file.path} + {result.path} + {unresolved ? ( + · unresolved + ) : null} {result.error ? ( - err + {result.error.code} - {result.error} + {result.error.message} ) : ( err diff --git a/console/web/src/components/chat/coder/DeleteFileView.tsx b/console/web/src/components/chat/coder/DeleteFileView.tsx index 5810cbf2..dc0ae230 100644 --- a/console/web/src/components/chat/coder/DeleteFileView.tsx +++ b/console/web/src/components/chat/coder/DeleteFileView.tsx @@ -1,5 +1,10 @@ /** * `coder::delete-file` — path-level removal summary (no file body on wire). + * + * Missing paths are idempotent SUCCESSES: success + !removed renders as + * "already absent", never as a deletion. Per-entry errors are structured + * WireError {code, message}; C210 = refusing to delete an allowed root, + * C211 = not-found-or-denied (incl. non-accessible entries mid-recursion). */ import { TriangleAlert } from 'lucide-react' import { Chip } from '@/components/chat/sandbox/terminal/Terminal' @@ -62,44 +67,38 @@ export function DeleteFileView({ {req.paths.map((path, i) => { const result = resp?.results[i] - const removed = result?.removed - const outcomeLabel = - preview || running - ? 'pending' - : removed - ? 'removed' - : 'not removed' + const pending = preview || running + // success + !removed = idempotent no-op (path already gone). + const outcome = pending + ? { label: 'pending', tone: 'text-ink-faint' } + : !result + ? { label: '—', tone: 'text-ink-faint' } + : result.removed + ? { label: 'removed', tone: 'text-warn' } + : result.success + ? { label: 'already absent', tone: 'text-ink-ghost' } + : { label: 'failed', tone: 'text-ink-ghost' } return ( {path} - - {outcomeLabel} - + {outcome.label} - {preview || running ? ( + {pending || !result ? ( - ) : result?.success ? ( + ) : result.success ? ( ok - ) : result?.error ? ( + ) : result.error ? 
( - err + {result.error.code} - {result.error} + {result.error.message} ) : ( err diff --git a/console/web/src/components/chat/coder/InfoView.tsx b/console/web/src/components/chat/coder/InfoView.tsx new file mode 100644 index 00000000..c1a7db37 --- /dev/null +++ b/console/web/src/components/chat/coder/InfoView.tsx @@ -0,0 +1,248 @@ +/** + * `coder::info` — config/discovery panel: allowed roots (primary marked), + * byte budgets, list/search/tree defaults, and the two glob lists. + * Reference data, kept compact — this is the call to suggest whenever + * another coder call rejects a path. + */ +import { formatBytes } from '@/components/chat/sandbox/format' +import { Chip, FooterPill } from '@/components/chat/sandbox/terminal/Terminal' +import { + type InfoResponse, + infoRequestSchema, + infoResponseSchema, + safeParseRequest, + safeParseResponse, +} from './parsers' + +interface InfoViewProps { + input: unknown + output?: unknown + running?: boolean + preview?: boolean +} + +/** Row in a key→value config grid; labels are the exact wire field + names so humans can correlate with C213 budget errors verbatim. */ +interface ConfigRow { + label: string + value: string + note?: string +} + +/** `"1.0 MiB (1,048,576 B)"` — humanized plus the exact wire value, + since errors compare against exact byte counts. */ +function formatBytesWithRaw(bytes: number): string { + if (!Number.isFinite(bytes) || bytes < 0) return '—' + if (bytes < 1024) return `${bytes} B` + return `${formatBytes(bytes)} (${bytes.toLocaleString('en-US')} B)` +} + +/** Byte budgets — exceeding any of these surfaces as C213. 
*/ +function budgetRows(info: InfoResponse): ConfigRow[] { + return [ + { + label: 'batch_read_budget_bytes', + value: formatBytesWithRaw(info.batch_read_budget_bytes), + note: 'per paths[] batch read', + }, + { + label: 'max_output_bytes', + value: formatBytesWithRaw(info.max_output_bytes), + note: 'single-path full-read context', + }, + { + label: 'max_read_bytes', + value: formatBytesWithRaw(info.max_read_bytes), + note: 'per-file IO ceiling + search scan', + }, + { + label: 'max_write_bytes', + value: formatBytesWithRaw(info.max_write_bytes), + note: 'per single file write', + }, + { + label: 'search_response_budget_bytes', + value: formatBytesWithRaw(info.search_response_budget_bytes), + note: 'per search response', + }, + ] +} + +/** Defaults applied when a request omits the knob, plus hard caps. */ +function limitRows(info: InfoResponse): ConfigRow[] { + return [ + { + label: 'list_default_page_size', + value: String(info.list_default_page_size), + note: 'list-folder page_size default', + }, + { + label: 'list_max_page_size', + value: String(info.list_max_page_size), + note: 'page_size hard cap', + }, + { + label: 'search_default_max_matches', + value: String(info.search_default_max_matches), + note: 'search max_matches default', + }, + { + label: 'search_default_max_line_bytes', + value: formatBytesWithRaw(info.search_default_max_line_bytes), + note: 'per matched line', + }, + { + label: 'tree_default_depth', + value: String(info.tree_default_depth), + note: 'tree max_depth default', + }, + { + label: 'tree_per_folder_limit', + value: String(info.tree_per_folder_limit), + note: 'entries per tree folder', + }, + ] +} + +/** Allowed roots in config order; index 0 is where relative paths resolve. */ +function RootsList({ basePaths }: { basePaths: string[] }) { + return ( +
+
+ allowed roots +
+ {basePaths.length === 0 ? ( +
+ · none +
+ ) : ( +
+ {basePaths.map((root, i) => ( +
+ {root} + {i === 0 ? primary : null} +
+ ))} +
+ )} +
+ ) +} + +function ConfigSection({ title, rows }: { title: string; rows: ConfigRow[] }) { + return ( +
+
+ {title} +
+ + + {rows.map((row) => ( + + + + + ))} + +
+ {row.label} + + {row.value} + {row.note ? ( + · {row.note} + ) : null} +
+
+ ) +} + +/** Glob list with its semantic note — the two lists mean different + things (access protection vs noise filter), never conflate them. */ +function GlobList({ + title, + note, + globs, + warn, +}: { + title: string + note: string + globs: string[] + warn?: boolean +}) { + return ( +
+
+ {title} + · {note} +
+ {globs.length === 0 ? ( + · none + ) : ( +
+ {globs.map((glob) => ( + + {glob} + + ))} +
+ )} +
+ ) +} + +export function InfoView({ input, output, running, preview }: InfoViewProps) { + // Request is `{}` — trivially satisfiable, only bail on non-object junk. + const req = safeParseRequest(infoRequestSchema, input) + if (!req) return null + const resp = + output != null && !preview + ? safeParseResponse(infoResponseSchema, output) + : null + + return ( +
+
+ + coder + info + + {resp ? {resp.version} : null} + {resp ? {resp.base_paths.length} : null} + {running ? ( + + · querying… + + ) : null} +
+ + {resp ? ( + <> + + + + + + + ) : running ? null : ( +
+ · no config reported +
+ )} +
+ ) +} diff --git a/console/web/src/components/chat/coder/ListFolderView.tsx b/console/web/src/components/chat/coder/ListFolderView.tsx new file mode 100644 index 00000000..a9270d59 --- /dev/null +++ b/console/web/src/components/chat/coder/ListFolderView.tsx @@ -0,0 +1,150 @@ +/** + * `coder::list-folder` — paginated flat listing of one folder, the + * follow-up call for `coder::tree` per_folder_limit stubs. Wire notes + * (list_folder.rs): entries carry basenames only (full path = response + * `path` + "/" + name), `page_size` echoes the EFFECTIVE size after + * default fill / cap clamp (may differ from the request — show what was + * actually used), and `non_accessible` is a plain bool, always + * serialized, unlike tree's omit-when-false. + */ +import { formatBytes, formatMtime } from '@/components/chat/sandbox/format' +import { Chip, FooterPill } from '@/components/chat/sandbox/terminal/Terminal' +import { iconForEntry, LockedBadge } from './entryShared' +import { + type EntryKind, + joinEntryPath, + listFolderRequestSchema, + listFolderResponseSchema, + safeParseRequest, + safeParseResponse, +} from './parsers' + +interface ListFolderViewProps { + input: unknown + output?: unknown + running?: boolean +} + +export function ListFolderView({ + input, + output, + running, +}: ListFolderViewProps) { + const req = safeParseRequest(listFolderRequestSchema, input) + if (!req) return null + const resp = + output != null ? safeParseResponse(listFolderResponseSchema, output) : null + + return ( +
+
+ {resp?.path ?? req.path ?? '.'} + {resp ? ( + <> + + {pageLabel(resp.page, resp.page_size, resp.total)} + + {/* Effective size — the worker may clamp or default-fill it. */} + {resp.page_size} + {resp.total} + + ) : ( + <> + {req.page != null ? {req.page} : null} + {req.page_size != null ? ( + {req.page_size} + ) : null} + + )} + {running ? ( + + · listing… + + ) : null} +
+ + {resp ? ( + resp.entries.length === 0 ? ( +
+ {resp.total === 0 + ? '· directory is empty' + : '· no entries on this page'} +
+ ) : ( + + ) + ) : ( +
+ {running ? '· listing…' : '· no listing data'} +
+ )} + + {resp?.has_more ? ( +
+ more pages available +
+ ) : null} +
+ ) +} + +interface EntriesTableProps { + /** Canonical folder path — the prefix for every entry basename. */ + path: string + entries: { + name: string + kind: EntryKind + size: number + mtime: number + non_accessible: boolean + }[] +} + +function EntriesTable({ path, entries }: EntriesTableProps) { + return ( + + + {entries.map((e) => { + const Icon = iconForEntry(e.kind, e.name) + return ( + + + + + + + ) + })} + +
+ + + + {e.kind === 'dir' ? `${e.name}/` : e.name} + {e.non_accessible ? : null} + + + {e.kind === 'dir' ? '—' : formatBytes(e.size)} + + {formatMtime(e.mtime)} +
+ ) +} + +/** + * "page/totalPages" chip text. Both `total` and the EFFECTIVE + * `page_size` come from the response, so the page count reflects what + * the worker actually used (the requested size may have been clamped). + */ +export function pageLabel( + page: number, + pageSize: number, + total: number, +): string { + if (pageSize <= 0) return String(page) + const totalPages = Math.max(1, Math.ceil(total / pageSize)) + return `${page}/${totalPages}` +} diff --git a/console/web/src/components/chat/coder/MoveView.tsx b/console/web/src/components/chat/coder/MoveView.tsx new file mode 100644 index 00000000..f0fd189f --- /dev/null +++ b/console/web/src/components/chat/coder/MoveView.tsx @@ -0,0 +1,133 @@ +/** + * `coder::move` — batched `from → to` summary (no file body on wire). + * + * success + !moved is a no-op self-move (from and to resolve to the same + * file) — rendered "unchanged", never as a completed move. Failed entries + * also print the full WireError message inline: cross-root rollback + * failures name BOTH leftover states (copy/source) for manual cleanup, + * and C210 destination-is-directory messages carry a corrected target + * path — neither belongs hidden in a tooltip alone. + */ +import { TriangleAlert } from 'lucide-react' +import { Chip } from '@/components/chat/sandbox/terminal/Terminal' +import { + Tooltip, + TooltipContent, + TooltipTrigger, +} from '@/components/ui/Tooltip' +import { + moveFileRequestSchema, + moveFileResponseSchema, + safeParseRequest, + safeParseResponse, +} from './parsers' + +interface MoveViewProps { + input: unknown + output?: unknown + running?: boolean + preview?: boolean +} + +export function MoveView({ input, output, running, preview }: MoveViewProps) { + const req = safeParseRequest(moveFileRequestSchema, input) + if (!req) return null + const resp = + output != null && !preview + ? safeParseResponse(moveFileResponseSchema, output) + : null + + const pending = preview || running + + return ( +
+
+ {req.files.length} + {running ? ( + + · moving… + + ) : null} +
+ + + + + + + + + + + {req.files.map((spec, i) => { + const result = resp?.results[i] + // Canonical absolute paths once the wire responds (caller's + // input verbatim when resolution failed); request paths until. + const from = result?.from ?? spec.from + const to = result?.to ?? spec.to + const outcome = pending + ? { label: 'pending', tone: 'text-ink-faint' } + : !result + ? { label: '—', tone: 'text-ink-faint' } + : result.moved + ? { label: 'moved', tone: 'text-ink' } + : result.success + ? { label: 'unchanged', tone: 'text-ink-ghost' } + : { label: 'failed', tone: 'text-ink-ghost' } + + return ( + + + + + + ) + })} + +
from → to | outcome | status
+
+ {from} + + {to} + {spec.overwrite ? ( + + true + + ) : null} +
+ {result?.error ? ( +
+ {result.error.message} +
+ ) : null} +
+ {outcome.label} + + {pending || !result ? ( + + ) : result.success ? ( + ok + ) : result.error ? ( + + + + + {result.error.code} + + + {result.error.message} + + ) : ( + err + )} +
+
+ ) +} + +export function MovePreview({ input }: { input: unknown }) { + return +} diff --git a/console/web/src/components/chat/coder/ReadFileView.tsx b/console/web/src/components/chat/coder/ReadFileView.tsx new file mode 100644 index 00000000..f2a4c363 --- /dev/null +++ b/console/web/src/components/chat/coder/ReadFileView.tsx @@ -0,0 +1,453 @@ +/** + * `coder::read-file` — single-path scalars vs batch `paths[]`, full reads + * vs `line_from`/`line_to` windows, stat probes, and pre-numbered content. + * + * Wire: workers/coder/src/functions/read_file.rs (v0.4.1). Single-path + * responses populate the top-level scalars and omit `results`; batch + * responses populate only `results` (request order). `total_lines` + * ABSENT means "not fully traversed" — distinct from 0. + */ +import { TriangleAlert } from 'lucide-react' +import { + formatBytes, + formatMode, + formatMtime, + inferLangFromPath, +} from '@/components/chat/sandbox/format' +import { Chip, FooterPill } from '@/components/chat/sandbox/terminal/Terminal' +import { CodeHighlight } from '@/lib/syntax' +import { + type ReadEntryResult, + type ReadFileRequest, + type ReadFileResponse, + type ReadTarget, + readFileRequestSchema, + readFileResponseSchema, + safeParseRequest, + safeParseResponse, +} from './parsers' + +interface ReadFileViewProps { + input: unknown + output?: unknown + running?: boolean +} + +/* ---------------- pure derivation helpers (unit-tested) ---------------- */ + +/** Request-side options for one read target — collapses the batch + string/object shorthand and nullish wire fields into one shape. */ +export interface NormalizedReadTarget { + path: string + lineFrom: number | null + lineTo: number | null + numbered: boolean + stat: boolean +} + +export function normalizeReadTarget(target: ReadTarget): NormalizedReadTarget { + if (typeof target === 'string') { + // Bare string = whole-file read with handler defaults. 
+ return { + path: target, + lineFrom: null, + lineTo: null, + numbered: false, + stat: false, + } + } + return { + path: target.path, + lineFrom: target.line_from ?? null, + lineTo: target.line_to ?? null, + numbered: target.numbered ?? false, + stat: target.stat ?? false, + } +} + +/** + * One renderable unit: requested target + its wire result. Single-path + * responses are folded into a synthetic `ReadEntryResult` so both modes + * share a renderer. `result` stays null while running or when the + * response is missing/unparseable. + */ +export interface ReadEntryModel { + requested: NormalizedReadTarget + result: ReadEntryResult | null +} + +export function isBatchRequest(req: ReadFileRequest): boolean { + return Array.isArray(req.paths) +} + +/** Stable list key — the same path may appear twice with different + windows, so the key carries the whole target descriptor. */ +export function readTargetKey(t: NormalizedReadTarget): string { + return [ + t.path, + t.lineFrom ?? '', + t.lineTo ?? '', + t.numbered ? 'n' : '', + t.stat ? 's' : '', + ].join(':') +} + +export function deriveReadEntries( + req: ReadFileRequest, + resp: ReadFileResponse | null, +): ReadEntryModel[] { + if (isBatchRequest(req)) { + // Results arrive in request order — align by index. Per-entry window + // flags live on the targets; top-level window fields are ignored. + return (req.paths ?? []).map((target, i) => ({ + requested: normalizeReadTarget(target), + result: resp?.results?.[i] ?? null, + })) + } + const path = req.path ?? resp?.path + // Neither path nor paths — runtime rejects with C210; nothing to render. + if (path == null) return [] + const result: ReadEntryResult | null = resp + ? { + path: resp.path ?? path, + // Single-path failures arrive as top-level handler errors (the + // family error view renders those) — a parsed response = success. 
+ success: true, + content: resp.content, + is_utf8: resp.is_utf8, + lines_returned: resp.lines_returned, + total_lines: resp.total_lines, + more_lines: resp.more_lines, + size: resp.size, + mode: resp.mode, + mtime: resp.mtime, + } + : null + return [ + { + requested: { + path, + lineFrom: req.line_from ?? null, + lineTo: req.line_to ?? null, + numbered: req.numbered ?? false, + stat: req.stat ?? false, + }, + result, + }, + ] +} + +/** `"L10–50"` / `"L40–EOF"`. The wire defaults `line_from` to 1 when only + `line_to` is set. Null when the read isn't windowed. */ +export function windowLabel( + lineFrom: number | null, + lineTo: number | null, +): string | null { + if (lineFrom == null && lineTo == null) return null + const from = lineFrom ?? 1 + return lineTo == null ? `L${from}–EOF` : `L${from}–${lineTo}` +} + +/** First line of the follow-up window when `more_lines` is true. */ +export function nextWindowStart( + lineFrom: number | null, + linesReturned: number | null | undefined, +): number { + return (lineFrom ?? 1) + (linesReturned ?? 0) +} + +/** Numeric st_mode (lower 9 bits, e.g. 420 = 0o644) → `"rw-r--r--"`. + The sandbox `formatMode` takes octal STRINGS; coder sends integers. */ +export function formatNumericMode(mode: number): string { + if (!Number.isInteger(mode) || mode < 0) return '—' + return formatMode((mode & 0o777).toString(8).padStart(3, '0')) +} + +/* ---------------- chip rows ---------------- */ + +function RequestChips({ t }: { t: NormalizedReadTarget }) { + const window = windowLabel(t.lineFrom, t.lineTo) + return ( + <> + {window ? {window} : null} + {t.numbered ? true : null} + {t.stat ? true : null} + + ) +} + +/** size / lines / lossy badge for a successful read. `size` is the FILE + size from metadata — labelled apart from content size when windowed. 
*/ +function MetaChips({ entry }: { entry: ReadEntryModel }) { + const { requested, result } = entry + if (!result?.success || requested.stat) return null + const windowed = requested.lineFrom != null || requested.lineTo != null + return ( + <> + {result.size != null ? ( + + {formatBytes(result.size)} + + ) : null} + {result.total_lines != null ? ( + {result.total_lines} + ) : null} + {result.is_utf8 === false ? ( + // Binary bytes were replaced with U+FFFD in the returned content. + + lossy + + ) : null} + + ) +} + +/** `more_lines: true` — content beyond the returned window exists. Offer + the next `line_from` so the follow-up window is one copy away. */ +function MoreLinesPill({ entry }: { entry: ReadEntryModel }) { + const r = entry.result + if (r?.more_lines !== true) return null + return ( + + file continues + + next window from L + {nextWindowStart(entry.requested.lineFrom, r.lines_returned)} + + + ) +} + +/* ---------------- bodies ---------------- */ + +/** Verbatim wire message — C213 budget errors name the budget, the bytes + consumed, and the recovery call; never paraphrase. */ +function EntryErrorBlock({ error }: { error: ReadEntryResult['error'] }) { + return ( +
+ {error ? error.message : 'read failed'} +
+ ) +} + +/** Metadata-only probe (`stat: true`) — compact chip grid like FsStatView. + `total_lines` / `is_utf8` are absent for files over max_read_bytes; + the stat itself still SUCCEEDED. */ +function StatBody({ result }: { result: ReadEntryResult }) { + return ( +
+ + {result.size != null ? formatBytes(result.size) : '—'} + + + {result.mode != null ? formatNumericMode(result.mode) : '—'} + + + {result.mtime != null ? formatMtime(result.mtime) : '—'} + + {result.total_lines != null ? ( + {result.total_lines} + ) : null} + {result.is_utf8 != null ? ( + + {result.is_utf8 ? 'lossless' : 'lossy'} + + ) : null} + {result.total_lines == null ? ( + + · too large to line-count + + ) : null} +
+ ) +} + +/** File content. Numbered reads arrive with literal `N→` prefixes baked + into each line (absolute numbers that feed update-file line ops), so + they pass through a plain `
<pre>` instead of the highlighter — never
+    double-number. */
+function ContentBody({ entry }: { entry: ReadEntryModel }) {
+  const r = entry.result
+  if (!r?.success) return null
+  if (entry.requested.stat) return 
+  const content = typeof r.content === 'string' ? r.content : null
+  if (content === null) {
+    return (
+      
+ · no content +
+ ) + } + if (content === '') { + // Window past EOF / empty file — both succeed with empty content. + return ( +
+ · empty +
+ ) + } + const lang = entry.requested.numbered + ? null + : inferLangFromPath(entry.requested.path) + if (lang) return + return ( +
+      {content}
+    
+ ) +} + +/* ---------------- modes ---------------- */ + +function SingleRead({ + entry, + budget, + running, +}: { + entry: ReadEntryModel + budget: number | null + running?: boolean +}) { + const r = entry.result + const hasFooter = + r?.success === true && + !entry.requested.stat && + (r.size != null || + r.mode != null || + r.mtime != null || + r.total_lines != null || + r.is_utf8 === false || + r.more_lines === true) + + return ( +
+
+ + file + + + {entry.requested.path} + + + {budget != null ? ( + // Per-call max_output_bytes override on full reads. + {formatBytes(budget)} + ) : null} + {running ? ( + + · reading… + + ) : null} +
+ + + + {hasFooter && r ? ( +
+ + {r.mode != null ? ( + {formatNumericMode(r.mode)} + ) : null} + {r.mtime != null ? ( + {formatMtime(r.mtime)} + ) : null} + +
+ ) : null} +
+ ) +} + +/** Batch budget exhaustion is a per-entry C213 — earlier entries may have + succeeded while later ones failed individually. Render each honestly. */ +function BatchEntry({ + entry, + running, +}: { + entry: ReadEntryModel + running?: boolean +}) { + const { requested, result } = entry + const failed = result != null && !result.success + return ( +
+
+ {requested.path} + + + {failed ? ( + + + {result.error?.code ?? 'err'} + + ) : null} + {result == null && !running ? ( + + ) : null} +
+ {failed ? ( + + ) : ( + + )} + {entry.result?.more_lines === true ? ( +
+ +
+ ) : null} +
+ ) +} + +function BatchRead({ + entries, + running, +}: { + entries: ReadEntryModel[] + running?: boolean +}) { + return ( +
+
+ {entries.length} + {running ? ( + + · reading… + + ) : null} +
+ {entries.map((entry) => ( + + ))} +
+ ) +} + +/* ---------------- view ---------------- */ + +export function ReadFileView({ input, output, running }: ReadFileViewProps) { + const req = safeParseRequest(readFileRequestSchema, input) + if (!req) return null + const resp = + output != null ? safeParseResponse(readFileResponseSchema, output) : null + const entries = deriveReadEntries(req, resp) + if (entries.length === 0) return null + + if (isBatchRequest(req)) { + return + } + return ( + + ) +} diff --git a/console/web/src/components/chat/coder/SearchView.tsx b/console/web/src/components/chat/coder/SearchView.tsx new file mode 100644 index 00000000..35c9a8ae --- /dev/null +++ b/console/web/src/components/chat/coder/SearchView.tsx @@ -0,0 +1,365 @@ +/** + * `coder::search` — grep-style results (workers/coder/src/functions/search.rs). + * Content matches grouped by file with a line-number gutter and dimmed + * before/after context; path matches render as a second section. The + * `truncated` flag is the headline honesty signal: the wire sets it on + * the match cap OR the response byte budget, and the fix is to refine + * the query / add include_globs — there is no pagination. + */ +import type { ReactNode } from 'react' +import { renderWithHighlight } from '@/components/chat/sandbox/highlight' +import { Chip, FooterPill } from '@/components/chat/sandbox/terminal/Terminal' +import { cn } from '@/lib/utils' +import { + type ContentMatch, + type PathMatch, + type SearchRequest, + safeParseRequest, + safeParseResponse, + searchRequestSchema, + searchResponseSchema, +} from './parsers' + +interface SearchViewProps { + input: unknown + output?: unknown + running?: boolean +} + +export function SearchView({ input, output, running }: SearchViewProps) { + const req = safeParseRequest(searchRequestSchema, input) + if (!req) return null + const resp = + output != null ? safeParseResponse(searchResponseSchema, output) : null + + const groups = resp ? 
groupContentMatches(resp.content_matches) : [] + const pathMatches = resp?.path_matches ?? [] + const hasMatches = groups.length > 0 || pathMatches.length > 0 + const isRegex = !!req.regex + const ignoreCase = !!req.ignore_case + + return ( +
+
+ {`"${req.query}"`} + + {resp ? ( + + {formatMatchCount(resp.content_matches.length, pathMatches.length)} + + ) : null} + {resp?.truncated ? ( + results truncated + ) : null} + {running ? ( + + · searching… + + ) : null} +
+ + {resp?.truncated ? ( +
+ results truncated — refine the query or add include_globs rather than + paginating +
+ ) : null} + + {resp ? ( + hasMatches ? ( + <> + {groups.length > 0 ? ( + + ) : null} + {pathMatches.length > 0 ? ( + 0 ? 'border-t border-rule-2' : undefined + } + /> + ) : null} + + ) : ( +
+ · no matches +
+ ) + ) : null} +
+ ) +} + +/** Request-derived honesty chips: every knob that narrowed (or widened) + what the agent actually searched. */ +function RequestChips({ req }: { req: SearchRequest }) { + const context = formatContextRequest( + req.context_lines_before, + req.context_lines_after, + ) + return ( + <> + {req.regex ? regex : null} + {req.ignore_case ? case-insensitive : null} + {req.path ? {req.path} : null} + {req.include_globs?.length ? ( + {req.include_globs.join(' ')} + ) : null} + {req.exclude_globs?.length ? ( + {req.exclude_globs.join(' ')} + ) : null} + {/* Walking into .git / node_modules / target is unusual enough to + warn-tint — results may include generated or vendored code. */} + {req.use_default_excludes === false ? ( + default excludes off + ) : null} + {req.search_content === false ? paths only : null} + {req.search_paths === false ? content only : null} + {context ? {context} : null} + {req.max_matches != null ? ( + {req.max_matches} + ) : null} + + ) +} + +function SectionLabel({ children }: { children: ReactNode }) { + return ( +
+ {children} +
+ ) +} + +interface HighlightOpts { + query: string + isRegex: boolean + ignoreCase: boolean +} + +function ContentSection({ + groups, + query, + isRegex, + ignoreCase, +}: { groups: ContentGroup[] } & HighlightOpts) { + return ( +
+ content matches + {groups.map((group) => ( +
+
{group.path}
+ {/* 3-col grid: line | column | text — auto-sized gutters keep + line numbers aligned per file group. */} +
+ {buildGroupRows(group.matches).map((row) => ( + + ))} +
+
+ ))} +
+ ) +} + +function MatchRowCells({ + row, + query, + isRegex, + ignoreCase, +}: { row: MatchRow } & HighlightOpts) { + if (row.kind === 'gap') { + return ( + <> + + + + + ) + } + const isMatch = row.kind === 'match' + return ( + <> + {/* select-none gutters so copying a block yields just the code. */} + + {row.line} + + + {isMatch && row.column != null ? `:${row.column}` : ''} + +
+        
+          {isMatch
+            ? renderWithHighlight(row.text, query, { isRegex, ignoreCase })
+            : row.text}
+        
+      
+ + ) +} + +function PathSection({ + paths, + query, + isRegex, + ignoreCase, + className, +}: { paths: PathMatch[]; className?: string } & HighlightOpts) { + return ( +
+ path matches + {paths.map((p) => ( +
+ {renderWithHighlight(p.path, query, { isRegex, ignoreCase })} +
+ ))} +
+ ) +} + +/* ---------------- pure derivation helpers (unit-tested) ---------------- */ + +export interface ContentGroup { + path: string + matches: ContentMatch[] +} + +/** Group content matches by file, preserving wire (walk) order for both + files and the matches within each file. */ +export function groupContentMatches(matches: ContentMatch[]): ContentGroup[] { + const groups: ContentGroup[] = [] + const byPath = new Map() + for (const m of matches) { + const existing = byPath.get(m.path) + if (existing) { + existing.matches.push(m) + } else { + const group: ContentGroup = { path: m.path, matches: [m] } + byPath.set(m.path, group) + groups.push(group) + } + } + return groups +} + +export interface MatchRow { + key: string + kind: 'match' | 'context' | 'gap' + /** 1-based file line; null for gap rows. */ + line: number | null + /** Match column — only on `match` rows. */ + column?: number + text: string +} + +/** + * Flatten one file's matches into display rows. Context line numbers are + * derived by offset from the match line (`before` ends at line-1, `after` + * starts at line+1 — the wire omits both arrays entirely when empty). A + * `gap` row marks non-contiguous blocks so context lines from different + * matches don't read as one continuous excerpt. + */ +export function buildGroupRows(matches: ContentMatch[]): MatchRow[] { + const rows: MatchRow[] = [] + let prevEnd: number | null = null + let ordinal = 0 + for (const m of matches) { + const before = m.before ?? [] + const after = m.after ?? 
[] + const start = m.line - before.length + if (prevEnd !== null && start > prevEnd + 1) { + rows.push({ key: `gap:${ordinal++}`, kind: 'gap', line: null, text: '' }) + } + before.forEach((text, i) => { + rows.push({ + key: `ctx:${ordinal++}`, + kind: 'context', + line: start + i, + text, + }) + }) + rows.push({ + key: `match:${ordinal++}`, + kind: 'match', + line: m.line, + column: m.column, + text: m.text, + }) + after.forEach((text, i) => { + rows.push({ + key: `ctx:${ordinal++}`, + kind: 'context', + line: m.line + 1 + i, + text, + }) + }) + prevEnd = m.line + after.length + } + return rows +} + +/** Header chip for requested context: `±2` when symmetric, else the + grep-flavoured `-B +A` parts that were actually asked for. */ +export function formatContextRequest( + before?: number | null, + after?: number | null, +): string | null { + const b = before ?? 0 + const a = after ?? 0 + if (b === 0 && a === 0) return null + if (b === a) return `±${b}` + const parts: string[] = [] + if (b > 0) parts.push(`-${b}`) + if (a > 0) parts.push(`+${a}`) + return parts.join(' ') +} + +/** Count pill text. Content hits are counted as "lines" on purpose — + the wire reports at most ONE match per line, so a line with five + hits is still a single ContentMatch. */ +export function formatMatchCount( + contentLines: number, + pathHits: number, +): string { + const parts: string[] = [] + if (contentLines > 0) { + parts.push(`${contentLines} ${contentLines === 1 ? 'line' : 'lines'}`) + } + if (pathHits > 0) { + parts.push(`${pathHits} ${pathHits === 1 ? 
'path' : 'paths'}`) + } + if (parts.length === 0) return '0 matches' + return parts.join(' · ') +} diff --git a/console/web/src/components/chat/coder/TreeView.tsx b/console/web/src/components/chat/coder/TreeView.tsx new file mode 100644 index 00000000..45f09166 --- /dev/null +++ b/console/web/src/components/chat/coder/TreeView.tsx @@ -0,0 +1,220 @@ +/** + * `coder::tree` — recursive snapshot of a folder subtree, rendered as an + * indented monospace tree. Wire quirks (tree.rs): the root node carries + * NO path — the response `path` IS the root's path (never join root.name + * onto it); child paths derive as parent + "/" + name. `non_accessible` + * is omitted when false (the schema defaults it), `children`/`truncated` + * are omitted when absent, and default-excluded FILES are silently + * dropped — only excluded dirs come back as childless stub nodes. + */ +import { formatBytes } from '@/components/chat/sandbox/format' +import { Chip } from '@/components/chat/sandbox/terminal/Terminal' +import { + Tooltip, + TooltipContent, + TooltipTrigger, +} from '@/components/ui/Tooltip' +import { iconForEntry, LockedBadge } from './entryShared' +import { + joinEntryPath, + safeParseRequest, + safeParseResponse, + type TreeNode, + type TruncationInfo, + treeRequestSchema, + treeResponseSchema, +} from './parsers' + +interface TreeViewProps { + input: unknown + output?: unknown + running?: boolean +} + +export function TreeView({ input, output, running }: TreeViewProps) { + const req = safeParseRequest(treeRequestSchema, input) + if (!req) return null + const resp = + output != null ? safeParseResponse(treeResponseSchema, output) : null + const summary = resp ? summariseTree(resp.root) : null + + return ( +
+
+ {resp?.path ?? req.path ?? '.'} + {req.max_depth != null ? ( + {req.max_depth} + ) : null} + {req.per_folder_limit != null ? ( + {req.per_folder_limit} + ) : null} + {req.use_default_excludes === false ? ( + + off + + ) : null} + {summary ? ( + <> + {summary.dirs} + {summary.files} + {summary.truncated > 0 ? ( + + {summary.truncated} + + ) : null} + + ) : null} + {running ? ( + + · scanning… + + ) : null} +
+ + {resp ? ( +
+ +
+ ) : ( +
+ {running ? '· scanning…' : '· no tree data'} +
+ )} +
+ ) +} + +interface TreeRowsProps { + node: TreeNode + /** Canonical absolute path of THIS node (root's = the response `path`). */ + path: string + depth: number +} + +/** One node row, then its children, then any truncation/empty stub. */ +function TreeRows({ node, path, depth }: TreeRowsProps) { + const isDir = node.kind === 'dir' + const Icon = iconForEntry(node.kind, node.name) + const children = node.children ?? [] + // `children: []` is a genuinely empty dir; absent children on a dir + // means not descended (depth cut / exclude) and carries a stub instead. + const isEmptyDir = + isDir && node.children != null && children.length === 0 && !node.truncated + + return ( + <> +
+ + + + {isDir ? `${node.name}/` : node.name} + + {node.non_accessible ? : null} + {!isDir ? ( + + {formatBytes(node.size)} + + ) : null} +
+ {children.map((child) => ( + + ))} + {node.truncated ? ( + + ) : null} + {isEmptyDir ? ( +
+ · empty +
+ ) : null} + + ) +} + +/** + * Dimmed annotation row for a cut-off subtree. The label is the + * at-a-glance reason; the wire's pre-written `hint` (next-step guidance) + * renders in the tooltip. + */ +function TruncationStub({ + info, + depth, +}: { + info: TruncationInfo + depth: number +}) { + return ( +
+ + + + + ⋯ {truncationLabel(info)} + + + {info.hint} + +
+ ) +} + +/** Two spaces per level; root (depth 0) gets none. */ +function Indent({ depth }: { depth: number }) { + if (depth <= 0) return null + return ( + + {' '.repeat(depth)} + + ) +} + +export interface TreeSummary { + dirs: number + files: number + /** Nodes carrying a truncation stub — the at-a-glance audit signal. */ + truncated: number +} + +/** Single walk over the in-memory tree for the header chips. Symlink and + "other" nodes count in neither bucket. Includes the root itself. */ +export function summariseTree(node: TreeNode): TreeSummary { + const self: TreeSummary = { + dirs: node.kind === 'dir' ? 1 : 0, + files: node.kind === 'file' ? 1 : 0, + truncated: node.truncated ? 1 : 0, + } + return (node.children ?? []).reduce((acc, child) => { + const sub = summariseTree(child) + return { + dirs: acc.dirs + sub.dirs, + files: acc.files + sub.files, + truncated: acc.truncated + sub.truncated, + } + }, self) +} + +/** + * Stub label per truncation reason. `total` is populated ONLY for + * "per_folder_limit" — depth cuts never peek into the folder, so they + * have nothing to count. Unknown reasons render verbatim (the schema + * keeps `reason` a plain string for forward tolerance). + */ +export function truncationLabel(info: TruncationInfo): string { + switch (info.reason) { + case 'per_folder_limit': + return info.total != null + ? 
`${info.shown}/${info.total} children shown — paginate with list-folder` + : `${info.shown} children shown — paginate with list-folder` + case 'max_depth': + return 'max depth — subtree not explored' + case 'default_exclude': + return 'excluded by default — not descended' + default: + return info.reason + } +} diff --git a/console/web/src/components/chat/coder/UpdateFileView.tsx b/console/web/src/components/chat/coder/UpdateFileView.tsx index fff4add3..3b9c1df0 100644 --- a/console/web/src/components/chat/coder/UpdateFileView.tsx +++ b/console/web/src/components/chat/coder/UpdateFileView.tsx @@ -1,23 +1,44 @@ /** - * `coder::update-file` — Pierre diff when the wire response includes - * `before` / `after` snapshots; op-summary table for pending/running or - * when snapshots are omitted (binary / oversize). + * `coder::update-file` — per-op echo rendering (v0.4.1 wire). + * + * The before/after full-file diff is GONE from the wire: each applied op + * returns a bounded post-apply snapshot (`OpEcho`) instead. Done state + * renders one section per file (path + ops/applied/lines chips) followed + * by echo blocks keyed by op_index; replace ops emit up to 5 site echoes + * sharing one op_index (`total_replacements` is the only record of the + * extras). The op-summary table remains for pending/running and as the + * fallback when the response is missing or unparseable. Per-file + * atomicity: a failed file carries empty echoes and only `error` — + * sibling files in the batch still apply, so batches render per file. + * + * Echoes render "half-diff" styled: lines the op INSERTED are tagged added + * (green `+` gutter); old text is not on the wire, so update_lines/remove + * ops get a single amber (`text-warn`) `−` stub at the seam instead of + * real deletion rows. Replace-site echoes are entirely post-replace text — + * every line is added. 
Addition ranges anchor on POST-APPLY coordinates + * reconstructed from the echo itself, falling back to mapping the + * request's own line-op deltas when the echo head-clamps (see + * `postRegionFirst`), so multi-op batches that shift later regions still + * tag the right lines. */ import { TriangleAlert } from 'lucide-react' -import { Chip, FooterPill } from '@/components/chat/sandbox/terminal/Terminal' +import { Chip } from '@/components/chat/sandbox/terminal/Terminal' import { Tooltip, TooltipContent, TooltipTrigger, } from '@/components/ui/Tooltip' -import { CoderFileDiff } from './CoderDiff' +import { cn } from '@/lib/utils' import { formatUpdateOp, + type OpEcho, safeParseRequest, safeParseResponse, truncateInline, type UpdateFileRequest, - type UpdateFileResponse, + type UpdateFileResult, + type UpdateFileSpec, + type UpdateOp, updateFileRequestSchema, updateFileResponseSchema, } from './parsers' @@ -29,46 +50,500 @@ interface UpdateFileViewProps { preview?: boolean } -function hasDiffSnapshot(result: { - success: boolean - before?: string - after?: string -}): result is { success: true; before: string; after: string } { +/* ---------------- pure echo derivation (unit-tested) ---------------- */ + +export interface EchoGroup { + opIndex: number + echoes: OpEcho[] +} + +/** + * Group echoes by `op_index`, preserving wire order (`build_echoes` emits + * them sorted by op_index; replace sites keep match order). Replace ops + * are the only multi-echo producers — every echo in a >1 group is a + * separate match site of the same op. 
+ */ +export function groupEchoesByOp(echoes: readonly OpEcho[]): EchoGroup[] { + const groups: EchoGroup[] = [] + const byOpIndex = new Map<number, OpEcho[]>() + for (const echo of echoes) { + const bucket = byOpIndex.get(echo.op_index) + if (bucket) { + bucket.push(echo) + } else { + const fresh = [echo] + byOpIndex.set(echo.op_index, fresh) + groups.push({ opIndex: echo.op_index, echoes: fresh }) + } + } + return groups +} + +export type EchoRow = + | { kind: 'line'; lineNo: number; text: string; added: boolean } + | { kind: 'elision'; count: number } + | { kind: 'stub'; verb: 'replaced' | 'removed'; label: string } + +/** + * Mirror of `update_file.rs::split_content` (Rust `str::lines()`): a + * trailing `\n` does NOT produce a final empty line, `""` yields ZERO + * lines, and CRLF `\r` is trimmed without affecting the count. This is + * the K in the addition range [anchor, anchor+K-1] — drift here mistags + * the boundary context lines. + */ +export function contentLineCount(content: string): number { + if (content === '') return 0 + const parts = content.split('\n') + return parts[parts.length - 1] === '' ? parts.length - 1 : parts.length +} + +/** Mirror of `update_file.rs::ECHO_CONTEXT` — context lines above/below a + line op's region; load-bearing for `postRegionFirst` reconstruction. */ +const ECHO_CONTEXT = 2 + +/** Mirror of `update_file.rs::anchor` — a line op's first affected + ORIGINAL line (the bottom-up application sort key). Replace ops have + no line anchor. */ +function lineOpAnchor(op: UpdateOp): number | null { + switch (op.op) { + case 'insert': + return op.at_line + case 'remove': + case 'update_lines': + return op.from_line + case 'replace': + return null + } +} + +/** Mirror of the `MutationEvent` delta a line op records + (`record_line_op_events`): the line-count change it causes. Replace + deltas depend on server-side match positions — not computable here. 
*/ +function lineOpDelta(op: UpdateOp): number | null { + switch (op.op) { + case 'insert': + return contentLineCount(op.content) + case 'remove': + return -(op.to_line - op.from_line + 1) + case 'update_lines': + return contentLineCount(op.content) - (op.to_line - op.from_line + 1) + case 'replace': + return null + } +} + +/** + * POST-APPLY first line of the op's echo region, reconstructed from the + * wire. `build_line_echo` sets `from_line = post_first − ECHO_CONTEXT` + * whenever the subtraction doesn't clamp at the file head, so for + * `from_line > 1` the anchor is exact even when other ops in a multi-op + * file shifted this region (Rust maps every anchor through later mutation + * events — `map_through_events`). At `from_line === 1` the head clamp + * erases the offset (post_first ∈ {1..ECHO_CONTEXT+1}); the wire alone + * cannot disambiguate, but the REQUEST can when the file's batch is + * line-op-only: line ops apply bottom-up and overlap validation keeps + * their covers disjoint, so `map_through_events` collapses to a plain + * delta sum — only siblings with a strictly SMALLER anchor apply after + * this op and shift its region (strictness also excludes the op itself; + * anchors are distinct because each lies in its op's disjoint cover). + * Batches containing a `replace` keep the ORIGINAL-anchor fallback: + * regex ops run after all line ops but their match positions (and thus + * deltas) exist only server-side — accepted residual: a newline-count- + * changing replacement above a head-of-file region can mistag at most + * the clamp window (pinned in tests). Callers without the request's op + * list also degrade to the original anchor. 
+ */ +function postRegionFirst( + echo: OpEcho, + originalFirst: number, + op: UpdateOp, + fileOps?: readonly UpdateOp[], +): number { + if (echo.from_line > 1) return echo.from_line + ECHO_CONTEXT + const ownAnchor = lineOpAnchor(op) + if ( + ownAnchor === null || + !fileOps || + fileOps.some((o) => o.op === 'replace') + ) { + return originalFirst + } + return fileOps.reduce((mapped, sibling) => { + const anchor = lineOpAnchor(sibling) + const delta = lineOpDelta(sibling) + return anchor !== null && delta !== null && anchor < ownAnchor + ? mapped + delta + : mapped + }, originalFirst) +} + +/** + * Which absolute (post-apply) line numbers did this op INSERT? The range + * start comes from `postRegionFirst` (post-apply coordinates), the length + * K from the op's content. Replace-site echoes carry zero context + * (`build_site_echo`), so every echoed line is post-replace text. + */ +function addedLinePredicate( + echo: OpEcho, + op?: UpdateOp, + fileOps?: readonly UpdateOp[], +): (lineNo: number) => boolean { + if (!op) return () => false + switch (op.op) { + case 'replace': + return () => true + case 'insert': { + const first = postRegionFirst(echo, op.at_line, op, fileOps) + const last = first + contentLineCount(op.content) - 1 + return (lineNo) => lineNo >= first && lineNo <= last + } + case 'update_lines': { + const first = postRegionFirst(echo, op.from_line, op, fileOps) + const last = first + contentLineCount(op.content) - 1 + return (lineNo) => lineNo >= first && lineNo <= last + } + case 'remove': + return () => false + } +} + +/** + * Deletion stub for ops that destroyed original text: the old lines are + * not on the wire, so a single `−` row stands in for them. insert is a + * pure addition and replace's old text is carried by the op tooltip + * (`formatUpdateOp`) — neither gets a stub. 
The LABEL keeps original + * coordinates (it names destroyed original lines); the SEAM is post-apply: + * update_lines anchors where its additions begin, remove one past its + * recorded anchor (`record_line_op_events` anchors remove at + * `from_line − 1`, the surviving line above the cut). + */ +function deletionStub( + echo: OpEcho, + op?: UpdateOp, + fileOps?: readonly UpdateOp[], +): { row: EchoRow; anchorLine: number } | null { + if (!op || (op.op !== 'update_lines' && op.op !== 'remove')) return null + const verb = op.op === 'update_lines' ? 'replaced' : 'removed' + const range = + op.from_line === op.to_line + ? `L${op.from_line}` + : `L${op.from_line}–L${op.to_line}` + const anchorLine = + op.op === 'update_lines' + ? postRegionFirst(echo, op.from_line, op, fileOps) + : postRegionFirst(echo, op.from_line - 1, op, fileOps) + 1 + return { + row: { kind: 'stub', verb, label: `${verb} original ${range}` }, + anchorLine, + } +} + +/** Place the stub at the diff seam: before the first row at/after the + post-apply seam line (i.e. between leading context and the additions). */ +function insertStub( + rows: readonly EchoRow[], + stub: EchoRow, + anchorLine: number, +): EchoRow[] { + const idx = rows.findIndex((r) => r.kind === 'line' && r.lineNo >= anchorLine) + if (idx < 0) return [...rows, stub] + return [...rows.slice(0, idx), stub, ...rows.slice(idx)] +} + +/** + * Numbered rows for one echo, with the elision divider placed where the + * wire cut the middle. `update_file.rs` always elides symmetrically: line + * ops keep the first/last ECHO_HEAD_TAIL (8+8) lines (`build_line_echo`), + * replace sites keep exactly the region's first and last line + * (`build_site_echo`) — so the divider sits after ceil(lines/2) and tail + * numbering resumes at `from_line + head + elided`. 
When `op` is given, + * rows inside the op's POST-APPLY addition range (`postRegionFirst`) are + * tagged `added` (the range may span the elided middle — head and tail + * rows tag independently) and update_lines/remove contribute a deletion + * stub at the seam. `fileOps` (the file's FULL op batch from the request) + * lets head-clamped echoes map the anchor through sibling line-op deltas + * instead of falling back to original coordinates. + */ +export function echoRows( + echo: OpEcho, + op?: UpdateOp, + fileOps?: readonly UpdateOp[], +): EchoRow[] { + const isAdded = addedLinePredicate(echo, op, fileOps) + const lineRow = (text: string, lineNo: number): EchoRow => ({ + kind: 'line' as const, + lineNo, + text, + added: isAdded(lineNo), + }) + + const elided = echo.elided ?? 0 + let rows: EchoRow[] + if (elided <= 0) { + rows = echo.lines.map((text, i) => lineRow(text, echo.from_line + i)) + } else { + const headLen = Math.ceil(echo.lines.length / 2) + const head = echo.lines + .slice(0, headLen) + .map((text, i) => lineRow(text, echo.from_line + i)) + const tail = echo.lines + .slice(headLen) + .map((text, i) => lineRow(text, echo.from_line + headLen + elided + i)) + rows = [...head, { kind: 'elision' as const, count: elided }, ...tail] + } + + const stub = deletionStub(echo, op, fileOps) + return stub ? insertStub(rows, stub.row, stub.anchorLine) : rows +} + +/* ---------------- rendering ---------------- */ + +function EchoLines({ + echo, + op, + fileOps, +}: { + echo: OpEcho + op?: UpdateOp + fileOps?: readonly UpdateOp[] +}) { + const rows = echoRows(echo, op, fileOps) + if (rows.length === 0) { + return ( +
+ · empty echo +
+ ) + } + return ( +
+ {rows.map((row) => { + if (row.kind === 'elision') { + return ( + /* At most one elision divider per echo — the key is unique. */ +
+ + + + {row.count} {row.count === 1 ? 'line' : 'lines'} elided + +
+ ) + } + if (row.kind === 'stub') { + return ( + /* At most one deletion stub per echo — the key is unique. */ +
+ + + + {row.label} + {row.verb === 'replaced' ? ( + · old text not echoed + ) : null} + +
+ ) + } + return ( +
+ + {row.added ? '+' : ''} + + + {row.lineNo} + + + {row.text} + +
+ ) + })} +
+ ) +} + +function EchoGroupBlock({ + group, + op, + fileOps, +}: { + group: EchoGroup + op?: UpdateOp + fileOps?: readonly UpdateOp[] +}) { + // total_replacements is duplicated on each site of the op. + const total = group.echoes[0]?.total_replacements + const extra = + typeof total === 'number' && total > group.echoes.length + ? total - group.echoes.length + : 0 + return ( - result.success && - typeof result.before === 'string' && - typeof result.after === 'string' +
+
+ {op ? ( + + + + op {group.opIndex} + + + + {formatUpdateOp(op)} + + + ) : ( + + op {group.opIndex} + + )} +
+ {group.echoes.map((echo, i) => ( +
+ {echo.total_replacements != null ? ( +
+ {echo.total_replacements} replaced +
+ ) : null} + +
+ ))} + {extra > 0 ? ( + // Sites are capped at 5 — extras have NO echo, only this count. +
+ · {extra} more {extra === 1 ? 'replacement' : 'replacements'} not + echoed +
+ ) : null} +
) } -function OpSummaryTable({ - req, - resp, - running, - preview, +/** Failed file: echoes is [] and only `error` is populated; applied / + new_line_count are on the wire but meaningless — grayed out. The + prescriptive message renders in full (never swallow agent guidance). */ +function FileFailureRow({ result }: { result: UpdateFileResult }) { + return ( +
+
+ {result.path} + {result.error ? ( + + + + + + {result.error.code} + + + + + {result.error.message} + + + ) : ( + err + )} + + {result.applied} + + + {result.new_line_count} + +
+ {result.error ? ( +
+ {result.error.message} +
+ ) : null} +
+ ) +} + +function FileEchoSection({ + file, + result, }: { - req: UpdateFileRequest - resp: UpdateFileResponse | null - running?: boolean - preview?: boolean + file: UpdateFileSpec + result: UpdateFileResult }) { - const totalApplied = resp?.results.reduce( - (n, r) => n + (r.success ? r.applied : 0), - 0, + if (!result.success) return + const groups = groupEchoesByOp(result.echoes) + + return ( +
+
+ {/* Canonical absolute path from the result, not the request. */} + {result.path} + {file.ops.length} + {result.applied} + {result.new_line_count} + {result.echoes_truncated ? ( + + truncated + + ) : null} +
+ {groups.length > 0 ? ( + groups.map((group) => ( + + )) + ) : ( +
+ · no echoes +
+ )} + {result.echoes_truncated ? ( +
+ ⋯ echo budget (~4 KiB) exhausted before all ops — coder::read-file to + inspect +
+ ) : null} +
) +} +/** + * Request-side op summary — pending approval, running, and the fallback + * when the response is missing or unparseable. + */ +function OpSummaryTable({ req }: { req: UpdateFileRequest }) { return ( - - {req.files.map((file, i) => { - const result = resp?.results[i] + {req.files.map((file) => { const opSummary = file.ops .map((op) => { const head = formatUpdateOp(op) @@ -89,51 +564,10 @@ function OpSummaryTable({ - ) })} - {resp && typeof totalApplied === 'number' ? ( - - - - - ) : null}
path opsstatus
{opSummary} - {result?.success ? ( - - → {result.new_line_count} lines - - ) : null} - - {preview || running ? ( - - ) : result?.success ? ( - - {result.applied} ok - - ) : result?.error ? ( - - - - - err - - - {result.error} - - ) : ( - err - )}
- total applied - - 0 ? 'accent' : 'default'}> - {`${totalApplied} ops`} - -
) @@ -152,87 +586,42 @@ export function UpdateFileView({ ? safeParseResponse(updateFileResponseSchema, output) : null - const showDiffs = - !preview && - !running && - resp?.results.some((r) => hasDiffSnapshot(r)) === true + const showEchoes = !preview && !running && resp !== null + const failedCount = resp?.results.filter((r) => !r.success).length ?? 0 return (
{req.files.length} + {showEchoes && failedCount > 0 ? ( + + {failedCount} + + ) : null} {running ? ( · applying… - ) : showDiffs ? ( - diff ) : null}
- {showDiffs ? ( + {showEchoes && resp ? ( req.files.map((file, i) => { - const result = resp?.results[i] - if (result && !result.success) { + const result = resp.results[i] + if (!result) { return (
- {file.path} - {result.error ? ( - - - - - err - - - {result.error} - - ) : ( - err - )} -
- ) - } - - if (result && hasDiffSnapshot(result)) { - return ( -
-
- - {file.path} - - {file.ops.length} - {result.applied} - {result.new_line_count} -
- + {file.path} · no result
) } - - return ( -
- {file.path} · no diff snapshot -
- ) + return }) ) : ( - + )}
) diff --git a/console/web/src/components/chat/coder/__tests__/ReadFileView.test.ts b/console/web/src/components/chat/coder/__tests__/ReadFileView.test.ts new file mode 100644 index 00000000..39b2a421 --- /dev/null +++ b/console/web/src/components/chat/coder/__tests__/ReadFileView.test.ts @@ -0,0 +1,200 @@ +import { describe, expect, it } from 'vitest' +import { readFileRequestSchema, readFileResponseSchema } from '../parsers' +import { + deriveReadEntries, + formatNumericMode, + isBatchRequest, + nextWindowStart, + normalizeReadTarget, + windowLabel, +} from '../ReadFileView' + +describe('normalizeReadTarget', () => { + it('expands a bare path string to whole-file defaults', () => { + expect(normalizeReadTarget('src/lib.rs')).toEqual({ + path: 'src/lib.rs', + lineFrom: null, + lineTo: null, + numbered: false, + stat: false, + }) + }) + + it('collapses nullish object fields to one shape', () => { + expect( + normalizeReadTarget({ + path: 'src/config.rs', + line_from: 1, + line_to: 30, + numbered: true, + }), + ).toEqual({ + path: 'src/config.rs', + lineFrom: 1, + lineTo: 30, + numbered: true, + stat: false, + }) + }) + + it('keeps explicit null window fields as null', () => { + const t = normalizeReadTarget({ + path: 'a.txt', + line_from: null, + stat: true, + }) + expect(t.lineFrom).toBeNull() + expect(t.lineTo).toBeNull() + expect(t.stat).toBe(true) + }) +}) + +describe('windowLabel', () => { + it('returns null for non-windowed reads', () => { + expect(windowLabel(null, null)).toBeNull() + }) + + it('renders a bounded window', () => { + expect(windowLabel(10, 50)).toBe('L10–50') + }) + + it('renders an open-ended window to EOF', () => { + expect(windowLabel(40, null)).toBe('L40–EOF') + }) + + it('defaults line_from to 1 when only line_to is set (wire rule)', () => { + expect(windowLabel(null, 30)).toBe('L1–30') + }) +}) + +describe('nextWindowStart', () => { + it('advances past the returned window', () => { + expect(nextWindowStart(40, 11)).toBe(51) + }) + + 
it('treats a missing line_from as 1 (full read cut by budget)', () => { + expect(nextWindowStart(null, 25)).toBe(26) + }) + + it('tolerates missing lines_returned', () => { + expect(nextWindowStart(10, null)).toBe(10) + expect(nextWindowStart(10, undefined)).toBe(10) + }) +}) + +describe('formatNumericMode', () => { + it('decodes the lower 9 permission bits', () => { + expect(formatNumericMode(420)).toBe('rw-r--r--') // 0o644 + expect(formatNumericMode(493)).toBe('rwxr-xr-x') // 0o755 + }) + + it('masks a full st_mode down to the permission bits', () => { + expect(formatNumericMode(0o100644)).toBe('rw-r--r--') + }) + + it('falls back to an em dash on junk', () => { + expect(formatNumericMode(-1)).toBe('—') + expect(formatNumericMode(Number.NaN)).toBe('—') + expect(formatNumericMode(1.5)).toBe('—') + }) +}) + +describe('deriveReadEntries — single-path mode', () => { + it('folds top-level scalars into one synthetic success entry', () => { + const req = readFileRequestSchema.parse({ + path: 'src/main.rs', + line_from: 10, + line_to: 50, + }) + const resp = readFileResponseSchema.parse({ + path: '/work/project/src/main.rs', + content: 'fn main() {}\n', + is_utf8: true, + lines_returned: 1, + more_lines: true, + size: 2048, + mode: 420, + mtime: 1750000000, + }) + const entries = deriveReadEntries(req, resp) + expect(isBatchRequest(req)).toBe(false) + expect(entries).toHaveLength(1) + expect(entries[0].requested).toEqual({ + path: 'src/main.rs', + lineFrom: 10, + lineTo: 50, + numbered: false, + stat: false, + }) + expect(entries[0].result?.success).toBe(true) + expect(entries[0].result?.path).toBe('/work/project/src/main.rs') + expect(entries[0].result?.more_lines).toBe(true) + // total_lines absent on the wire = not fully traversed, NOT 0. 
+ expect(entries[0].result?.total_lines).toBeUndefined() + }) + + it('keeps result null while the response is missing (running)', () => { + const req = readFileRequestSchema.parse({ path: 'a.txt' }) + const entries = deriveReadEntries(req, null) + expect(entries).toHaveLength(1) + expect(entries[0].result).toBeNull() + }) + + it('returns no entries when neither path nor paths is set (C210)', () => { + const req = readFileRequestSchema.parse({}) + expect(deriveReadEntries(req, null)).toEqual([]) + }) +}) + +describe('deriveReadEntries — batch mode', () => { + it('aligns results to targets by index, mixed success and failure', () => { + const req = readFileRequestSchema.parse({ + paths: [ + 'src/lib.rs', + { path: 'src/config.rs', line_from: 1, line_to: 30, numbered: true }, + { path: 'huge.bin', stat: true }, + ], + }) + const resp = readFileResponseSchema.parse({ + results: [ + { + path: '/work/project/src/lib.rs', + success: true, + content: 'pub mod config;\n', + is_utf8: true, + lines_returned: 1, + total_lines: 1, + more_lines: false, + size: 16, + }, + { + path: '/work/project/src/config.rs', + success: false, + error: { + code: 'C213', + message: + 'batch_read_budget_bytes (1048576) exhausted after 1048560 bytes — retry this entry in its own call', + }, + }, + ], + }) + const entries = deriveReadEntries(req, resp) + expect(isBatchRequest(req)).toBe(true) + expect(entries).toHaveLength(3) + expect(entries[0].requested.path).toBe('src/lib.rs') + expect(entries[0].result?.success).toBe(true) + expect(entries[1].requested.numbered).toBe(true) + expect(entries[1].result?.success).toBe(false) + expect(entries[1].result?.error?.code).toBe('C213') + // Fewer results than targets — trailing entries stay pending/null. 
+ expect(entries[2].requested.stat).toBe(true) + expect(entries[2].result).toBeNull() + }) + + it('keeps every result null when the response is missing', () => { + const req = readFileRequestSchema.parse({ paths: ['a.txt', 'b.txt'] }) + const entries = deriveReadEntries(req, null) + expect(entries).toHaveLength(2) + expect(entries.every((e) => e.result === null)).toBe(true) + }) +}) diff --git a/console/web/src/components/chat/coder/__tests__/SearchView.test.ts b/console/web/src/components/chat/coder/__tests__/SearchView.test.ts new file mode 100644 index 00000000..58db15d5 --- /dev/null +++ b/console/web/src/components/chat/coder/__tests__/SearchView.test.ts @@ -0,0 +1,146 @@ +import { describe, expect, it } from 'vitest' +import type { ContentMatch } from '../parsers' +import { + buildGroupRows, + formatContextRequest, + formatMatchCount, + groupContentMatches, +} from '../SearchView' + +function match( + overrides: Partial<ContentMatch> & { line: number }, +): ContentMatch { + return { + path: '/repo/src/main.rs', + column: 1, + text: `line ${overrides.line}`, + ...overrides, + } +} + +describe('groupContentMatches', () => { + it('groups by path preserving first-seen file order', () => { + const groups = groupContentMatches([ + match({ line: 3, path: '/repo/a.rs' }), + match({ line: 7, path: '/repo/b.rs' }), + match({ line: 9, path: '/repo/a.rs' }), + ]) + expect(groups.map((g) => g.path)).toEqual(['/repo/a.rs', '/repo/b.rs']) + expect(groups[0].matches.map((m) => m.line)).toEqual([3, 9]) + expect(groups[1].matches.map((m) => m.line)).toEqual([7]) + }) + + it('returns an empty list for no matches', () => { + expect(groupContentMatches([])).toEqual([]) + }) +}) + +describe('buildGroupRows', () => { + it('emits a single match row when no context is present', () => { + const rows = buildGroupRows([match({ line: 12, column: 5 })]) + expect(rows).toHaveLength(1) + expect(rows[0]).toMatchObject({ + kind: 'match', + line: 12, + column: 5, + text: 'line 12', + }) + }) + + it('derives 
context line numbers by offset from the match line', () => { + const rows = buildGroupRows([ + match({ line: 10, before: ['a', 'b'], after: ['c'] }), + ]) + expect(rows.map((r) => [r.kind, r.line])).toEqual([ + ['context', 8], + ['context', 9], + ['match', 10], + ['context', 11], + ]) + expect(rows.map((r) => r.text)).toEqual(['a', 'b', 'line 10', 'c']) + }) + + it('inserts a gap row between non-contiguous blocks only', () => { + const rows = buildGroupRows([ + match({ line: 5, after: ['x'] }), + // previous block ends at 6, next starts at 18 (before[] of 2) → gap + match({ line: 20, before: ['y', 'z'] }), + // contiguous with 20 → no gap + match({ line: 21 }), + ]) + expect(rows.map((r) => r.kind)).toEqual([ + 'match', + 'context', + 'gap', + 'context', + 'context', + 'match', + 'match', + ]) + }) + + it('omits the gap when context windows touch or overlap', () => { + const rows = buildGroupRows([ + match({ line: 5, after: ['x'] }), // ends at 6 + match({ line: 7 }), // starts at 7 — contiguous + ]) + expect(rows.some((r) => r.kind === 'gap')).toBe(false) + }) + + it('assigns unique keys across rows', () => { + const rows = buildGroupRows([ + match({ line: 10, before: ['a'], after: ['b'] }), + match({ line: 10, before: ['a'], after: ['b'] }), + ]) + const keys = rows.map((r) => r.key) + expect(new Set(keys).size).toBe(keys.length) + }) + + it('only carries column on match rows', () => { + const rows = buildGroupRows([ + match({ line: 4, column: 9, before: ['ctx'] }), + ]) + const context = rows.find((r) => r.kind === 'context') + const hit = rows.find((r) => r.kind === 'match') + expect(context?.column).toBeUndefined() + expect(hit?.column).toBe(9) + }) +}) + +describe('formatContextRequest', () => { + it('returns null when no context was requested (absent, null, or 0)', () => { + expect(formatContextRequest(undefined, undefined)).toBeNull() + expect(formatContextRequest(null, null)).toBeNull() + expect(formatContextRequest(0, 0)).toBeNull() + }) + + it('collapses 
symmetric context to ±N', () => { + expect(formatContextRequest(2, 2)).toBe('±2') + }) + + it('shows only the sides that were asked for', () => { + expect(formatContextRequest(3, null)).toBe('-3') + expect(formatContextRequest(null, 1)).toBe('+1') + expect(formatContextRequest(2, 4)).toBe('-2 +4') + }) +}) + +describe('formatMatchCount', () => { + it('reports content hits as lines (one ContentMatch per line on wire)', () => { + expect(formatMatchCount(1, 0)).toBe('1 line') + expect(formatMatchCount(12, 0)).toBe('12 lines') + }) + + it('reports path hits with their own unit', () => { + expect(formatMatchCount(0, 1)).toBe('1 path') + expect(formatMatchCount(0, 3)).toBe('3 paths') + }) + + it('joins both kinds with a separator', () => { + expect(formatMatchCount(12, 3)).toBe('12 lines · 3 paths') + }) + + it('falls back to "0 matches" when nothing hit', () => { + expect(formatMatchCount(0, 0)).toBe('0 matches') + }) +}) diff --git a/console/web/src/components/chat/coder/__tests__/UpdateFileView.test.ts b/console/web/src/components/chat/coder/__tests__/UpdateFileView.test.ts new file mode 100644 index 00000000..d6288862 --- /dev/null +++ b/console/web/src/components/chat/coder/__tests__/UpdateFileView.test.ts @@ -0,0 +1,519 @@ +import { describe, expect, it } from 'vitest' +import type { OpEcho, UpdateOp } from '../parsers' +import { contentLineCount, echoRows, groupEchoesByOp } from '../UpdateFileView' + +function lineEcho(overrides: Partial<OpEcho> = {}): OpEcho { + return { + op_index: 0, + from_line: 1, + lines: ['pub mod utils;'], + ...overrides, + } +} + +describe('groupEchoesByOp', () => { + it('returns no groups for a failed file (echoes always [])', () => { + expect(groupEchoesByOp([])).toEqual([]) + }) + + it('keeps one group per line op in wire order', () => { + const groups = groupEchoesByOp([ + lineEcho({ op_index: 0, from_line: 1 }), + lineEcho({ op_index: 1, from_line: 10 }), + ]) + expect(groups.map((g) => g.opIndex)).toEqual([0, 1]) + 
expect(groups[0]?.echoes).toHaveLength(1) + }) + + it('groups replace sites sharing an op_index, sites in match order', () => { + const groups = groupEchoesByOp([ + lineEcho({ op_index: 0, from_line: 1 }), + lineEcho({ op_index: 2, from_line: 10, total_replacements: 7 }), + lineEcho({ op_index: 2, from_line: 55, total_replacements: 7 }), + ]) + expect(groups).toHaveLength(2) + expect(groups[1]?.opIndex).toBe(2) + expect(groups[1]?.echoes.map((e) => e.from_line)).toEqual([10, 55]) + }) +}) + +describe('contentLineCount (parity with update_file.rs split_content)', () => { + it('mirrors Rust str::lines() — trailing \\n adds no line', () => { + expect(contentLineCount('')).toBe(0) + expect(contentLineCount('a')).toBe(1) + expect(contentLineCount('a\n')).toBe(1) + expect(contentLineCount('a\nb')).toBe(2) + expect(contentLineCount('a\nb\n')).toBe(2) + expect(contentLineCount('\n')).toBe(1) + expect(contentLineCount('a\n\n')).toBe(2) + }) + + it('CRLF \\r never affects the count', () => { + expect(contentLineCount('a\r\nb\r\n')).toBe(2) + expect(contentLineCount('\r\n')).toBe(1) + }) +}) + +describe('echoRows (no op — neutral rows, wire numbering)', () => { + it('numbers lines sequentially from from_line when nothing is elided', () => { + const rows = echoRows(lineEcho({ from_line: 10, lines: ['a', 'b', 'c'] })) + expect(rows).toEqual([ + { kind: 'line', lineNo: 10, text: 'a', added: false }, + { kind: 'line', lineNo: 11, text: 'b', added: false }, + { kind: 'line', lineNo: 12, text: 'c', added: false }, + ]) + }) + + it('places the replace-site divider between first and last region line', () => { + // build_site_echo: region 10..=15 echoes first + last, elided = 4. 
+ const rows = echoRows( + lineEcho({ + from_line: 10, + lines: ['fn first() {', '}'], + elided: 4, + total_replacements: 7, + }), + ) + expect(rows).toEqual([ + { kind: 'line', lineNo: 10, text: 'fn first() {', added: false }, + { kind: 'elision', count: 4 }, + // Tail resumes after the 4 elided inner lines: region ends at L15. + { kind: 'line', lineNo: 15, text: '}', added: false }, + ]) + }) + + it('resumes line-op tail numbering after the elided middle (8+8 split)', () => { + // build_line_echo: 26-line region keeps first/last ECHO_HEAD_TAIL (8). + const lines = Array.from({ length: 16 }, (_, i) => `l${i}`) + const rows = echoRows(lineEcho({ from_line: 100, lines, elided: 10 })) + expect(rows).toHaveLength(17) + expect(rows[7]).toEqual({ + kind: 'line', + lineNo: 107, + text: 'l7', + added: false, + }) + expect(rows[8]).toEqual({ kind: 'elision', count: 10 }) + expect(rows[9]).toEqual({ + kind: 'line', + lineNo: 118, + text: 'l8', + added: false, + }) + expect(rows[16]).toEqual({ + kind: 'line', + lineNo: 125, + text: 'l15', + added: false, + }) + }) + + it('degrades an elided echo with no lines to just the divider', () => { + const rows = echoRows(lineEcho({ from_line: 1, lines: [], elided: 3 })) + expect(rows).toEqual([{ kind: 'elision', count: 3 }]) + }) +}) + +describe('echoRows — diff tagging (half-diff)', () => { + it('insert: tags [at_line, at_line+K-1] added with trailing-\\n content, no stub', () => { + // K = 2 ("x\ny\n" — trailing newline must NOT count a third line, or + // the trailing context line L12 would be mistagged as added). + const op: UpdateOp = { op: 'insert', at_line: 10, content: 'x\ny\n' } + // build_line_echo: region [10,11] ±2 context → L8..L13. 
+ const echo = lineEcho({ + from_line: 8, + lines: ['c1', 'c2', 'x', 'y', 'c3', 'c4'], + }) + expect(echoRows(echo, op)).toEqual([ + { kind: 'line', lineNo: 8, text: 'c1', added: false }, + { kind: 'line', lineNo: 9, text: 'c2', added: false }, + { kind: 'line', lineNo: 10, text: 'x', added: true }, + { kind: 'line', lineNo: 11, text: 'y', added: true }, + { kind: 'line', lineNo: 12, text: 'c3', added: false }, + { kind: 'line', lineNo: 13, text: 'c4', added: false }, + ]) + }) + + it('update_lines: tags [from_line, from_line+K-1] and stubs at the seam', () => { + // K = 3 from "a\nb\nc\n" — replaces original L5–L6 (2 lines → 3). + const op: UpdateOp = { + op: 'update_lines', + from_line: 5, + to_line: 6, + content: 'a\nb\nc\n', + } + // Post-apply region [5,7] ±2 context → L3..L9. + const echo = lineEcho({ + from_line: 3, + lines: ['c1', 'c2', 'a', 'b', 'c', 'c3', 'c4'], + }) + expect(echoRows(echo, op)).toEqual([ + { kind: 'line', lineNo: 3, text: 'c1', added: false }, + { kind: 'line', lineNo: 4, text: 'c2', added: false }, + // Stub sits at the seam: after leading context, before additions. 
+ { kind: 'stub', verb: 'replaced', label: 'replaced original L5–L6' }, + { kind: 'line', lineNo: 5, text: 'a', added: true }, + { kind: 'line', lineNo: 6, text: 'b', added: true }, + { kind: 'line', lineNo: 7, text: 'c', added: true }, + { kind: 'line', lineNo: 8, text: 'c3', added: false }, + { kind: 'line', lineNo: 9, text: 'c4', added: false }, + ]) + }) + + it('collapses the stub range to a single line number when from==to', () => { + const op: UpdateOp = { + op: 'update_lines', + from_line: 7, + to_line: 7, + content: 'z', + } + const echo = lineEcho({ from_line: 5, lines: ['c1', 'c2', 'z', 'c3'] }) + const stub = echoRows(echo, op).find((r) => r.kind === 'stub') + expect(stub).toEqual({ + kind: 'stub', + verb: 'replaced', + label: 'replaced original L7', + }) + }) + + it('tags an addition range spanning the elided middle (head AND tail)', () => { + // update_lines L100–L101 with 20 new lines → post-apply region + // [100,119]; ±2 context → [98,121] = 24 lines > ECHO_MAX_LINES(20) + // → first 8 (L98..105) + last 8 (L114..121), elided 8. + const content = `${Array.from({ length: 20 }, (_, i) => `n${i}`).join('\n')}\n` + const op: UpdateOp = { + op: 'update_lines', + from_line: 100, + to_line: 101, + content, + } + // Wire keeps head 8 (L98..L105 = c1,c2,n0..n5) + tail 8 + // (L114..L121 = n14..n19,c3,c4); L106..L113 (n6..n13) elided. + const echo = lineEcho({ + from_line: 98, + lines: [ + 'c1', + 'c2', + 'n0', + 'n1', + 'n2', + 'n3', + 'n4', + 'n5', + 'n14', + 'n15', + 'n16', + 'n17', + 'n18', + 'n19', + 'c3', + 'c4', + ], + elided: 8, + }) + const rows = echoRows(echo, op) + expect(rows).toHaveLength(18) + // Leading context, then the stub at the seam. 
+ expect(rows[0]).toEqual({ + kind: 'line', + lineNo: 98, + text: 'c1', + added: false, + }) + expect(rows[1]).toEqual({ + kind: 'line', + lineNo: 99, + text: 'c2', + added: false, + }) + expect(rows[2]).toEqual({ + kind: 'stub', + verb: 'replaced', + label: 'replaced original L100–L101', + }) + // Head additions L100..L105 — every one tagged. + expect(rows[3]).toEqual({ + kind: 'line', + lineNo: 100, + text: 'n0', + added: true, + }) + expect(rows[8]).toEqual({ + kind: 'line', + lineNo: 105, + text: 'n5', + added: true, + }) + expect(rows[9]).toEqual({ kind: 'elision', count: 8 }) + // Tail resumes INSIDE the addition range: L114..L119 still added. + expect(rows[10]).toEqual({ + kind: 'line', + lineNo: 114, + text: 'n14', + added: true, + }) + expect(rows[15]).toEqual({ + kind: 'line', + lineNo: 119, + text: 'n19', + added: true, + }) + // Trailing context past the range end (119) stays neutral. + expect(rows[16]).toEqual({ + kind: 'line', + lineNo: 120, + text: 'c3', + added: false, + }) + expect(rows[17]).toEqual({ + kind: 'line', + lineNo: 121, + text: 'c4', + added: false, + }) + }) + + it('remove: every echoed line stays neutral, stub marks the seam', () => { + const op: UpdateOp = { op: 'remove', from_line: 10, to_line: 12 } + // Remove anchors at from_line-1 = L9; ±2 context → L7..L11. + const echo = lineEcho({ + from_line: 7, + lines: ['c1', 'c2', 'c3', 'c4', 'c5'], + }) + expect(echoRows(echo, op)).toEqual([ + { kind: 'line', lineNo: 7, text: 'c1', added: false }, + { kind: 'line', lineNo: 8, text: 'c2', added: false }, + { kind: 'line', lineNo: 9, text: 'c3', added: false }, + // The removal seam: original L10–L12 are gone; final L10 is the + // line that used to be L13. 
+ { kind: 'stub', verb: 'removed', label: 'removed original L10–L12' }, + { kind: 'line', lineNo: 10, text: 'c4', added: false }, + { kind: 'line', lineNo: 11, text: 'c5', added: false }, + ]) + }) + + it('remove at EOF: stub appends when no echoed line is at/after from_line', () => { + const op: UpdateOp = { op: 'remove', from_line: 40, to_line: 42 } + // Tail removal: echo is only the surviving lines above the cut. + const echo = lineEcho({ from_line: 37, lines: ['c1', 'c2', 'c3'] }) + const rows = echoRows(echo, op) + expect(rows[3]).toEqual({ + kind: 'stub', + verb: 'removed', + label: 'removed original L40–L42', + }) + }) + + it('replace: every site line is added (zero-context echo), no stub', () => { + const op: UpdateOp = { + op: 'replace', + pattern: 'foo', + replacement: 'bar', + } + const echo = lineEcho({ + from_line: 40, + lines: ['const a = bar', 'const b = bar'], + total_replacements: 3, + }) + expect(echoRows(echo, op)).toEqual([ + { kind: 'line', lineNo: 40, text: 'const a = bar', added: true }, + { kind: 'line', lineNo: 41, text: 'const b = bar', added: true }, + ]) + }) + + it('replace: elided multi-line site tags first AND last line added', () => { + const op: UpdateOp = { + op: 'replace', + pattern: 'fn first\\(.*?\\n\\}', + replacement: 'fn first() {\n…\n}', + dot_matches_newline: true, + } + const echo = lineEcho({ + from_line: 10, + lines: ['fn first() {', '}'], + elided: 4, + total_replacements: 1, + }) + expect(echoRows(echo, op)).toEqual([ + { kind: 'line', lineNo: 10, text: 'fn first() {', added: true }, + { kind: 'elision', count: 4 }, + { kind: 'line', lineNo: 15, text: '}', added: true }, + ]) + }) + + it('insert with empty content tags nothing (K = 0, like Rust)', () => { + const op: UpdateOp = { op: 'insert', at_line: 5, content: '' } + const echo = lineEcho({ from_line: 3, lines: ['c1', 'c2', 'c3'] }) + expect(echoRows(echo, op).every((r) => r.kind === 'line' && !r.added)).toBe( + true, + ) + }) +}) + +describe('echoRows — multi-op 
anchor reconstruction (post-apply coords)', () => { + it('shifted update_lines: tags the post-apply line, not the original anchor (Rust echo_two_line_ops_offset_correctness)', () => { + // File 1..10; op 0 inserts "X" at L2 (+1), op 1 updates original L5 → + // "FIVE" now sits at POST-APPLY L6. Wire echo for op 1 (pinned from the + // Rust test): from_line 4 = post lines 4..8. The anchor reconstructs as + // from_line + ECHO_CONTEXT = 6; original-coords math would mistag the + // context line "4" (post L5) and misplace the stub before it. + const op: UpdateOp = { + op: 'update_lines', + from_line: 5, + to_line: 5, + content: 'FIVE', + } + const echo = lineEcho({ + op_index: 1, + from_line: 4, + lines: ['3', '4', 'FIVE', '6', '7'], + }) + expect(echoRows(echo, op)).toEqual([ + { kind: 'line', lineNo: 4, text: '3', added: false }, + { kind: 'line', lineNo: 5, text: '4', added: false }, + { kind: 'stub', verb: 'replaced', label: 'replaced original L5' }, + { kind: 'line', lineNo: 6, text: 'FIVE', added: true }, + { kind: 'line', lineNo: 7, text: '6', added: false }, + { kind: 'line', lineNo: 8, text: '7', added: false }, + ]) + }) + + it('head-clamped echo (from_line === 1) without fileOps falls back to original coords', () => { + // Op 0 of the same Rust test: its own anchor is unshifted (the other + // op sits strictly below), but build_line_echo's head clamp collapses + // from_line to 1, so wire reconstruction is impossible. Without the + // request's op list the original at_line is the only anchor — exact + // here, and "X" (post L2) tags added. 
+ const op: UpdateOp = { op: 'insert', at_line: 2, content: 'X' } + const echo = lineEcho({ + op_index: 0, + from_line: 1, + lines: ['1', 'X', '2', '3'], + }) + expect(echoRows(echo, op)).toEqual([ + { kind: 'line', lineNo: 1, text: '1', added: false }, + { kind: 'line', lineNo: 2, text: 'X', added: true }, + { kind: 'line', lineNo: 3, text: '2', added: false }, + { kind: 'line', lineNo: 4, text: '3', added: false }, + ]) + }) + + it('shifted remove: stub seam follows the post-apply anchor', () => { + // File 1..10; insert "X" at L2 (+1), then remove original L8–L9. + // Rust anchors remove at from_line−1 = 7, mapped +1 → post L8 ("7"); + // echo from_line 8−2 = 6, post lines 6..9 = 5,6,7,10. The seam is + // post L9: stub lands between "7" and "10", not before post L8. + const op: UpdateOp = { op: 'remove', from_line: 8, to_line: 9 } + const echo = lineEcho({ + op_index: 1, + from_line: 6, + lines: ['5', '6', '7', '10'], + }) + expect(echoRows(echo, op)).toEqual([ + { kind: 'line', lineNo: 6, text: '5', added: false }, + { kind: 'line', lineNo: 7, text: '6', added: false }, + { kind: 'line', lineNo: 8, text: '7', added: false }, + { kind: 'stub', verb: 'removed', label: 'removed original L8–L9' }, + { kind: 'line', lineNo: 9, text: '10', added: false }, + ]) + }) + + it('head-clamp + line-op-only batch: maps the anchor through sibling deltas (request disambiguates the clamp)', () => { + // Insert "Z" at L1 (+1) + update_lines original L2 → post L3 "TWO". + // The echo head-clamps (from_line 1) so wire reconstruction is out, + // but the batch is line-op-only and the view holds the full request: + // post anchor = 2 + (+1 from the smaller-anchor insert) = 3 — "TWO" + // tags added and the stub lands at the true seam, not before "1". 
+ const insertOp: UpdateOp = { op: 'insert', at_line: 1, content: 'Z' } + const op: UpdateOp = { + op: 'update_lines', + from_line: 2, + to_line: 2, + content: 'TWO', + } + const echo = lineEcho({ + op_index: 1, + from_line: 1, + lines: ['Z', '1', 'TWO', '3', '4'], + }) + expect(echoRows(echo, op, [insertOp, op])).toEqual([ + { kind: 'line', lineNo: 1, text: 'Z', added: false }, + { kind: 'line', lineNo: 2, text: '1', added: false }, + { kind: 'stub', verb: 'replaced', label: 'replaced original L2' }, + { kind: 'line', lineNo: 3, text: 'TWO', added: true }, + { kind: 'line', lineNo: 4, text: '3', added: false }, + { kind: 'line', lineNo: 5, text: '4', added: false }, + ]) + }) + + it('head-clamp mapping ignores siblings at larger anchors (they apply first, strictly below)', () => { + // record_line_op_events applies bottom-up: the remove (anchor 5) runs + // BEFORE the insert (anchor 1), so it is not a "later event" and must + // not shift the insert's region. File 1..6: insert "X" at L1 (+1), + // remove original L5–L6 → final X,1,2,3,4. Echo region [1,1] +ctx. + const op: UpdateOp = { op: 'insert', at_line: 1, content: 'X' } + const below: UpdateOp = { op: 'remove', from_line: 5, to_line: 6 } + const echo = lineEcho({ + op_index: 0, + from_line: 1, + lines: ['X', '1', '2'], + }) + const added = echoRows(echo, op, [op, below]) + .filter((r) => r.kind === 'line' && r.added) + .map((r) => (r.kind === 'line' ? r.text : '')) + expect(added).toEqual(['X']) + }) + + it('head-clamp + remove: the stub seam maps through the sibling insert', () => { + // File 1..6: insert "X" at L1 (+1), remove original L3–L4 → final + // X,1,2,5,6. Remove anchors at from_line−1 = 2, mapped +1 → post L3 + // ("2"); echo from_line 3−2 = 1 head-clamps. The seam is post L4: + // stub between "2" and "5" — original coords would misplace it + // before "2". 
+ const insertOp: UpdateOp = { op: 'insert', at_line: 1, content: 'X' } + const op: UpdateOp = { op: 'remove', from_line: 3, to_line: 4 } + const echo = lineEcho({ + op_index: 1, + from_line: 1, + lines: ['X', '1', '2', '5', '6'], + }) + expect(echoRows(echo, op, [insertOp, op])).toEqual([ + { kind: 'line', lineNo: 1, text: 'X', added: false }, + { kind: 'line', lineNo: 2, text: '1', added: false }, + { kind: 'line', lineNo: 3, text: '2', added: false }, + { kind: 'stub', verb: 'removed', label: 'removed original L3–L4' }, + { kind: 'line', lineNo: 4, text: '5', added: false }, + { kind: 'line', lineNo: 5, text: '6', added: false }, + ]) + }) + + it('ACCEPTED RESIDUAL: a newline-adding replace above a head-of-file region defeats the mapping', () => { + // File 1..5: replace "1" → "Z\n1" (+1, matches post-line-op L1) + + // update_lines original L2 → post L3 "TWO". Regex ops run AFTER all + // line ops and their match positions (thus deltas) exist only + // server-side, so any replace in the batch forces the original-anchor + // fallback: post L2 ("1", context) mistags instead of "TWO". Bounded + // to the clamp window (first ECHO_CONTEXT+1 post-apply lines) — + // pinned so the tradeoff stays visible; see postRegionFirst. + const replaceOp: UpdateOp = { + op: 'replace', + pattern: '1', + replacement: 'Z\n1', + } + const op: UpdateOp = { + op: 'update_lines', + from_line: 2, + to_line: 2, + content: 'TWO', + } + const echo = lineEcho({ + op_index: 1, + from_line: 1, + lines: ['Z', '1', 'TWO', '3', '4'], + }) + const added = echoRows(echo, op, [replaceOp, op]) + .filter((r) => r.kind === 'line' && r.added) + .map((r) => (r.kind === 'line' ? 
r.text : '')) + expect(added).toEqual(['1']) // known off-by-shift; ideal would be ['TWO'] + }) +}) diff --git a/console/web/src/components/chat/coder/__tests__/parsers.test.ts b/console/web/src/components/chat/coder/__tests__/parsers.test.ts index 20643ba6..4a6e9669 100644 --- a/console/web/src/components/chat/coder/__tests__/parsers.test.ts +++ b/console/web/src/components/chat/coder/__tests__/parsers.test.ts @@ -1,17 +1,36 @@ import { describe, expect, it } from 'vitest' import { + CODER_FUNCTION_IDS, CODER_MUTATE_FUNCTION_IDS, + contentMatchSchema, createFileRequestSchema, createFileResponseSchema, deleteFileRequestSchema, deleteFileResponseSchema, formatUpdateOp, + infoRequestSchema, + infoResponseSchema, + isCoderFunction, isCoderMutateFunction, + joinEntryPath, + listFolderRequestSchema, + listFolderResponseSchema, + moveFileRequestSchema, + moveFileResponseSchema, + readFileRequestSchema, + readFileResponseSchema, safeParseRequest, safeParseResponse, + searchRequestSchema, + searchResponseSchema, + treeRequestSchema, + treeResponseSchema, + truncateInline, unwrapEnvelope, updateFileRequestSchema, updateFileResponseSchema, + updateOpReplaceSchema, + wireErrorSchema, } from '../parsers' function wrap<T>(details: T) { @@ -22,41 +41,120 @@ function wrap<T>(details: T) { } } -describe('isCoderMutateFunction', () => { +describe('isCoderFunction', () => { it('matches every id in the explicit allowlist', () => { + for (const id of CODER_FUNCTION_IDS) { + expect(isCoderFunction(id)).toBe(true) + } + expect(CODER_FUNCTION_IDS).toHaveLength(9) + }) + + it('rejects same-prefix non-members and other families', () => { + expect(isCoderFunction('coder::nonexistent')).toBe(false) + expect(isCoderFunction('sandbox::exec')).toBe(false) + }) +}) + +describe('isCoderMutateFunction', () => { + it('matches every id in the explicit allowlist, including move', () => { for (const id of CODER_MUTATE_FUNCTION_IDS) { expect(isCoderMutateFunction(id)).toBe(true) } + 
expect(isCoderMutateFunction('coder::move')).toBe(true) }) - it('rejects unrelated ids', () => { + it('rejects read-only coder ids and other families', () => { expect(isCoderMutateFunction('coder::read-file')).toBe(false) + expect(isCoderMutateFunction('coder::search')).toBe(false) expect(isCoderMutateFunction('sandbox::exec')).toBe(false) }) }) +describe('wireErrorSchema', () => { + it('accepts a structured {code, message} error', () => { + const r = wireErrorSchema.safeParse({ + code: 'C217', + message: 'already exists — pass overwrite: true', + }) + expect(r.success).toBe(true) + }) + + it('rejects entries missing the message', () => { + expect(wireErrorSchema.safeParse({ code: 'C211' }).success).toBe(false) + }) +}) + +describe('infoRequestSchema / infoResponseSchema', () => { + it('accepts an empty discovery request (null coerced to {})', () => { + expect(safeParseRequest(infoRequestSchema, undefined)).toEqual({}) + }) + + it('parses a wrapped full config payload', () => { + const r = safeParseResponse( + infoResponseSchema, + wrap({ + base_paths: ['/work/project', '/tmp/coder-cache'], + primary_root: '/work/project', + batch_read_budget_bytes: 1048576, + max_output_bytes: 131072, + max_read_bytes: 2097152, + max_write_bytes: 2097152, + default_exclude_globs: ['node_modules/**', '.git/**'], + non_accessible_globs: ['.env', 'secrets/**'], + list_default_page_size: 100, + list_max_page_size: 500, + search_default_max_line_bytes: 512, + search_default_max_matches: 100, + search_response_budget_bytes: 65536, + tree_default_depth: 3, + tree_per_folder_limit: 50, + version: '0.4.1', + }), + ) + expect(r?.primary_root).toBe('/work/project') + expect(r?.base_paths[0]).toBe(r?.primary_root) + expect(r?.non_accessible_globs).toContain('.env') + }) + + it('rejects a response missing a required budget field', () => { + expect( + safeParseResponse(infoResponseSchema, { version: '0.4.1' }), + ).toBeNull() + }) +}) + describe('createFileRequestSchema', () => { - it('accepts 
a batched create request', () => { + it('accepts the golden batched request', () => { const r = safeParseRequest(createFileRequestSchema, { - files: [{ path: 'src/main.ts', content: 'export {}\n' }], + files: [ + { content: 'pub mod utils;\n', overwrite: false, path: 'src/lib.rs' }, + { + content: '# scratch notes\n', + overwrite: true, + path: '/tmp/scratch/notes.md', + }, + ], }) - expect(r?.files[0]?.path).toBe('src/main.ts') + expect(r?.files).toHaveLength(2) + expect(r?.files[0]?.path).toBe('src/lib.rs') }) - it('rejects empty files array', () => { - expect(safeParseRequest(createFileRequestSchema, { files: [] })).toBeNull() + it('accepts an empty files array (no minItems in the golden; runtime rejects)', () => { + const r = safeParseRequest(createFileRequestSchema, { files: [] }) + expect(r?.files).toHaveLength(0) }) }) describe('createFileResponseSchema', () => { - it('parses raw success', () => { + it('parses raw success with error omitted (not null)', () => { const r = safeParseResponse(createFileResponseSchema, { - results: [{ path: 'a.txt', success: true, bytes_written: 5 }], + results: [{ path: '/work/a.txt', success: true, bytes_written: 5 }], }) expect(r?.results[0]?.bytes_written).toBe(5) + expect(r?.results[0]?.error).toBeUndefined() }) - it('parses wrapped partial failure', () => { + it('parses wrapped partial failure with a structured WireError', () => { const r = safeParseResponse( createFileResponseSchema, wrap({ @@ -65,37 +163,43 @@ describe('createFileResponseSchema', () => { path: '.env', success: false, bytes_written: 0, - error: 'C211: not accessible', + error: { + code: 'C211', + message: 'not accessible — matches non_accessible_globs', + }, }, - { path: 'ok.txt', success: true, bytes_written: 1 }, + { path: '/work/ok.txt', success: true, bytes_written: 1 }, ], }), ) expect(r?.results).toHaveLength(2) expect(r?.results[0]?.success).toBe(false) + expect(r?.results[0]?.error?.code).toBe('C211') + expect(r?.results[1]?.error).toBeUndefined() 
}) }) describe('updateFileRequestSchema', () => { - it('accepts mixed ops', () => { + it('accepts the golden request with all four op kinds', () => { const r = safeParseRequest(updateFileRequestSchema, { files: [ { - path: 'lib.rs', + path: 'src/lib.rs', ops: [ - { op: 'insert', at_line: 1, content: '// header\n' }, + { op: 'insert', at_line: 1, content: '// generated by coder\n' }, { op: 'remove', from_line: 10, to_line: 12 }, { op: 'update_lines', - from_line: 4, - to_line: 4, - content: 'fn main() {}\n', + from_line: 5, + to_line: 7, + content: 'pub fn hello() {\n println!("hello");\n}\n', }, { op: 'replace', - pattern: 'foo', - replacement: 'bar', - ignore_case: true, + pattern: '// BEGIN legacy.*?// END legacy', + replacement: '// removed', + dot_matches_newline: true, + expect_matches: 1, }, ], }, @@ -106,55 +210,603 @@ describe('updateFileRequestSchema', () => { if (!ops?.[0] || !ops[3]) throw new Error('expected ops') expect(formatUpdateOp(ops[0])).toBe('insert @ L1') expect(formatUpdateOp(ops[3])).toContain('replace') + expect(formatUpdateOp(ops[3])).toContain('(expect 1)') + }) + + it('accepts a null expect_matches (replace all unconditionally)', () => { + const op = updateOpReplaceSchema.safeParse({ + op: 'replace', + pattern: 'foo', + replacement: 'bar', + expect_matches: null, + }) + expect(op.success).toBe(true) + if (op.success) expect(formatUpdateOp(op.data)).not.toContain('expect') + }) + + // Golden-valid: no minItems on ops. The runtime answers with a per-file + // C210 ('ops must not be empty'), so the request must parse for the + // structured per-file failure view to render. 
+ it('accepts a file spec with empty ops (runtime rejects per-file with C210)', () => { + const r = safeParseRequest(updateFileRequestSchema, { + files: [{ path: 'src/lib.rs', ops: [] }], + }) + expect(r?.files[0]?.ops).toHaveLength(0) }) }) describe('updateFileResponseSchema', () => { - it('unwraps harness envelope', () => { + it('unwraps the harness envelope and parses echoes', () => { const payload = { results: [ - { path: 'a.txt', success: true, applied: 2, new_line_count: 10 }, + { + path: '/work/src/lib.rs', + success: true, + applied: 2, + new_line_count: 42, + echoes: [ + { + op_index: 0, + from_line: 1, + lines: ['// generated by coder', 'pub mod utils;'], + }, + ], + echoes_truncated: false, + }, ], } expect(unwrapEnvelope(wrap(payload))).toEqual(payload) const r = safeParseResponse(updateFileResponseSchema, wrap(payload)) expect(r?.results[0]?.applied).toBe(2) + expect(r?.results[0]?.echoes[0]?.lines).toHaveLength(2) + expect(r?.results[0]?.echoes[0]?.elided).toBeUndefined() }) - it('parses before/after snapshots', () => { + it('parses replace-site echoes with elided + total_replacements', () => { const r = safeParseResponse(updateFileResponseSchema, { results: [ { - path: 'src/index.ts', + path: '/work/src/lib.rs', success: true, applied: 1, - new_line_count: 12, - before: 'const x = 1\n', - after: 'const x = 2\n', + new_line_count: 120, + echoes: [ + { + op_index: 0, + from_line: 10, + lines: ['fn first() {', '}'], + elided: 4, + total_replacements: 7, + }, + { + op_index: 0, + from_line: 55, + lines: ['fn second() {}'], + total_replacements: 7, + }, + ], + echoes_truncated: true, }, ], }) - expect(r?.results[0]?.before).toBe('const x = 1\n') - expect(r?.results[0]?.after).toBe('const x = 2\n') + expect(r?.results[0]?.echoes[0]?.elided).toBe(4) + expect(r?.results[0]?.echoes[0]?.total_replacements).toBe(7) + expect(r?.results[0]?.echoes[1]?.elided).toBeUndefined() + expect(r?.results[0]?.echoes_truncated).toBe(true) + }) + + it('parses a failed file 
with empty echoes and a WireError', () => { + const r = safeParseResponse(updateFileResponseSchema, { + results: [ + { + path: '/work/src/gone.rs', + success: false, + applied: 0, + new_line_count: 0, + echoes: [], + echoes_truncated: false, + error: { + code: 'C210', + message: 'expect_matches: 1 but pattern matched 0 times', + }, + }, + ], + }) + expect(r?.results[0]?.echoes).toEqual([]) + expect(r?.results[0]?.error?.code).toBe('C210') + }) + + it('rejects results missing the always-present echoes fields', () => { + expect( + safeParseResponse(updateFileResponseSchema, { + results: [ + { path: 'a.txt', success: true, applied: 1, new_line_count: 3 }, + ], + }), + ).toBeNull() }) }) describe('deleteFileRequestSchema', () => { - it('accepts recursive batch delete', () => { + it('accepts the golden recursive batch delete', () => { const r = safeParseRequest(deleteFileRequestSchema, { - paths: ['tmp/', 'old.txt'], + paths: ['src/old_module.rs', 'build/artifacts'], recursive: true, }) expect(r?.recursive).toBe(true) expect(r?.paths).toHaveLength(2) }) + + it('accepts an empty paths array (no minItems in the golden; runtime rejects)', () => { + const r = safeParseRequest(deleteFileRequestSchema, { paths: [] }) + expect(r?.paths).toHaveLength(0) + }) }) describe('deleteFileResponseSchema', () => { - it('parses idempotent miss', () => { + it('parses idempotent miss (already absent, not a deletion)', () => { const r = safeParseResponse(deleteFileResponseSchema, { - results: [{ path: 'gone.txt', success: true, removed: false }], + results: [{ path: '/work/gone.txt', success: true, removed: false }], }) + expect(r?.results[0]?.success).toBe(true) expect(r?.results[0]?.removed).toBe(false) + expect(r?.results[0]?.error).toBeUndefined() + }) + + it('parses a C210 refusal to delete an allowed root', () => { + const r = safeParseResponse(deleteFileResponseSchema, { + results: [ + { + path: '/work/project', + success: false, + removed: false, + error: { + code: 'C210', + 
message: 'refusing to delete an allowed root', + }, + }, + ], + }) + expect(r?.results[0]?.error?.code).toBe('C210') + }) +}) + +describe('moveFileRequestSchema', () => { + it('accepts the golden batched move', () => { + const r = safeParseRequest(moveFileRequestSchema, { + files: [ + { from: 'src/old_name.rs', to: 'src/new_name.rs' }, + { + from: 'build/output.bin', + overwrite: true, + to: '/tmp/coder-cache/output.bin', + }, + ], + }) + expect(r?.files).toHaveLength(2) + expect(r?.files[1]?.overwrite).toBe(true) + }) + + it('accepts an empty files array (no minItems in the golden; runtime rejects)', () => { + const r = safeParseRequest(moveFileRequestSchema, { files: [] }) + expect(r?.files).toHaveLength(0) + }) +}) + +describe('moveFileResponseSchema', () => { + it('parses a wrapped mixed batch: moved, no-op self-move, C217 failure', () => { + const r = safeParseResponse( + moveFileResponseSchema, + wrap({ + results: [ + { + from: '/work/src/old_name.rs', + to: '/work/src/new_name.rs', + success: true, + moved: true, + }, + { + from: '/work/same.rs', + to: '/work/same.rs', + success: true, + moved: false, + }, + { + from: '/work/build/output.bin', + to: '/tmp/coder-cache/output.bin', + success: false, + moved: false, + error: { + code: 'C217', + message: 'destination exists — pass overwrite: true', + }, + }, + ], + }), + ) + expect(r?.results[0]?.moved).toBe(true) + // success + !moved = no-op self-move, render as "unchanged". 
+ expect(r?.results[1]?.success).toBe(true) + expect(r?.results[1]?.moved).toBe(false) + expect(r?.results[1]?.error).toBeUndefined() + expect(r?.results[2]?.error?.code).toBe('C217') + }) +}) + +describe('readFileRequestSchema', () => { + it('accepts the golden single-path windowed request (numbered)', () => { + const r = safeParseRequest(readFileRequestSchema, { + line_from: 10, + line_to: 50, + path: 'src/main.rs', + numbered: true, + }) + expect(r?.path).toBe('src/main.rs') + expect(r?.line_from).toBe(10) + expect(r?.numbered).toBe(true) + }) + + it('accepts the golden batch with mixed string and object targets', () => { + const r = safeParseRequest(readFileRequestSchema, { + paths: [ + 'src/lib.rs', + { line_from: 1, line_to: 30, path: 'src/config.rs' }, + { path: 'assets/logo.bin', stat: true }, + ], + }) + expect(r?.paths).toHaveLength(3) + expect(r?.paths?.[0]).toBe('src/lib.rs') + }) +}) + +describe('readFileResponseSchema', () => { + it('parses a single-path full read (scalars set, results omitted)', () => { + const r = safeParseResponse( + readFileResponseSchema, + wrap({ + path: '/work/src/main.rs', + content: 'fn main() {}\n', + is_utf8: true, + lines_returned: 1, + total_lines: 1, + more_lines: false, + size: 13, + mode: 420, + mtime: 1760000000, + }), + ) + expect(r?.content).toBe('fn main() {}\n') + expect(r?.results).toBeUndefined() + }) + + it('distinguishes absent total_lines (not traversed) from zero', () => { + const r = safeParseResponse(readFileResponseSchema, { + path: '/work/big.log', + content: 'line 40\nline 41\n', + is_utf8: true, + lines_returned: 2, + more_lines: true, + size: 9999999, + }) + expect(r?.more_lines).toBe(true) + expect(r?.total_lines).toBeUndefined() + }) + + it('parses a stat-only probe on a file too large to line-count', () => { + // total_lines / is_utf8 stay absent beyond max_read_bytes; stat SUCCEEDS. 
+ const r = safeParseResponse(readFileResponseSchema, { + path: '/work/huge.bin', + lines_returned: 0, + more_lines: false, + size: 50000000, + mode: 420, + mtime: 1760000000, + }) + expect(r?.content).toBeUndefined() + expect(r?.lines_returned).toBe(0) + expect(r?.is_utf8).toBeUndefined() + expect(r?.total_lines).toBeUndefined() + }) + + it('tolerates explicit nulls for optional fields', () => { + const r = safeParseResponse(readFileResponseSchema, { + path: '/work/a.txt', + content: null, + total_lines: null, + }) + expect(r?.content).toBeNull() + }) + + it('parses a batch with a per-entry C213 budget failure mid-stream', () => { + const r = safeParseResponse(readFileResponseSchema, { + results: [ + { + path: '/work/src/lib.rs', + success: true, + content: 'pub mod utils;\n', + is_utf8: true, + lines_returned: 1, + total_lines: 1, + more_lines: false, + size: 15, + mode: 420, + mtime: 1760000000, + }, + { + path: '/work/src/big.rs', + success: false, + error: { + code: 'C213', + message: + 'batch_read_budget_bytes (1048576) exhausted after 1048561 bytes — read this file individually', + }, + }, + ], + }) + expect(r?.results).toHaveLength(2) + expect(r?.results?.[0]?.success).toBe(true) + expect(r?.results?.[1]?.error?.code).toBe('C213') + // size omitted when the budget died before the file was opened. 
+ expect(r?.results?.[1]?.size).toBeUndefined() + }) +}) + +describe('searchRequestSchema', () => { + it('accepts the golden scoped content search with context lines', () => { + const r = safeParseRequest(searchRequestSchema, { + context_lines_after: 2, + context_lines_before: 2, + include_globs: ['**/*.rs'], + path: 'src', + query: 'fn handle', + search_content: true, + search_paths: false, + }) + expect(r?.query).toBe('fn handle') + expect(r?.context_lines_before).toBe(2) + }) + + it('accepts a minimal query-only request', () => { + const r = safeParseRequest(searchRequestSchema, { query: 'TODO' }) + expect(r?.query).toBe('TODO') + expect(r?.regex).toBeUndefined() + }) + + it('rejects a request without query', () => { + expect(safeParseRequest(searchRequestSchema, { path: 'src' })).toBeNull() + }) +}) + +describe('searchResponseSchema', () => { + it('parses content matches with and without context arrays', () => { + const r = safeParseResponse( + searchResponseSchema, + wrap({ + content_matches: [ + { + path: '/work/src/main.rs', + line: 12, + column: 4, + text: 'fn handle_request() {', + before: ['// router entry', ''], + after: [' let body = read();', ' respond(body)'], + }, + { + path: '/work/src/lib.rs', + line: 1, + column: 1, + text: 'fn handle() {}', + }, + ], + path_matches: [{ path: '/work/src/handlers.rs' }], + truncated: true, + }), + ) + expect(r?.content_matches[0]?.before).toHaveLength(2) + // Omitted context = zero context, not an empty array. 
+ expect(r?.content_matches[1]?.before).toBeUndefined() + expect(r?.path_matches[0]?.path).toBe('/work/src/handlers.rs') + expect(r?.truncated).toBe(true) + }) + + it('rejects a content match missing the column', () => { + expect( + contentMatchSchema.safeParse({ path: 'a.rs', line: 1, text: 'x' }) + .success, + ).toBe(false) + }) +}) + +describe('treeRequestSchema', () => { + it('accepts the golden depth-limited request and an empty request', () => { + const r = safeParseRequest(treeRequestSchema, { max_depth: 3, path: '.' }) + expect(r?.max_depth).toBe(3) + expect(safeParseRequest(treeRequestSchema, {})).toEqual({}) + }) +}) + +describe('treeResponseSchema', () => { + it('parses a nested tree with truncation stubs and wire omissions', () => { + const r = safeParseResponse( + treeResponseSchema, + wrap({ + path: '/work/project', + root: { + name: 'project', + kind: 'dir', + size: 0, + mtime: 1760000000, + children: [ + // File node: children/truncated omitted, non_accessible omitted. + { name: 'README.md', kind: 'file', size: 1024, mtime: 1760000000 }, + { + name: 'src', + kind: 'dir', + size: 0, + mtime: 1760000000, + children: [ + { + name: 'main.rs', + kind: 'file', + size: 2048, + mtime: 1760000000, + }, + ], + truncated: { + reason: 'per_folder_limit', + shown: 1, + total: 73, + hint: 'use coder::list-folder to paginate src', + }, + }, + // Default-exclude stub: childless dir, no total. + { + name: 'node_modules', + kind: 'dir', + size: 0, + mtime: 1760000000, + truncated: { + reason: 'default_exclude', + shown: 0, + hint: 'pass use_default_excludes: false to descend', + }, + }, + { + name: '.env', + kind: 'file', + size: 64, + mtime: 1760000000, + non_accessible: true, + }, + ], + }, + }), + ) + expect(r?.root.name).toBe('project') + // non_accessible omitted on the wire when false — defaulted by the schema. 
+ expect(r?.root.non_accessible).toBe(false) + expect(r?.root.children?.[0]?.non_accessible).toBe(false) + expect(r?.root.children?.[0]?.children).toBeUndefined() + expect(r?.root.children?.[1]?.truncated?.reason).toBe('per_folder_limit') + expect(r?.root.children?.[1]?.truncated?.total).toBe(73) + expect(r?.root.children?.[2]?.truncated?.total).toBeUndefined() + expect(r?.root.children?.[3]?.non_accessible).toBe(true) + }) +}) + +describe('listFolderRequestSchema', () => { + it('accepts the golden paged request and an empty request', () => { + const r = safeParseRequest(listFolderRequestSchema, { + page: 1, + page_size: 50, + path: 'src', + }) + expect(r?.page_size).toBe(50) + expect(safeParseRequest(listFolderRequestSchema, {})).toEqual({}) + }) + + it('accepts a null page_size (config default applies)', () => { + const r = safeParseRequest(listFolderRequestSchema, { page_size: null }) + expect(r?.page_size).toBeNull() + }) +}) + +describe('listFolderResponseSchema', () => { + it('parses a page with non_accessible always present per entry', () => { + const r = safeParseResponse( + listFolderResponseSchema, + wrap({ + path: '/work/project/src', + entries: [ + { + name: 'main.rs', + kind: 'file', + size: 2048, + mtime: 1760000000, + non_accessible: false, + }, + { + name: 'secrets', + kind: 'dir', + size: 0, + mtime: 1760000000, + non_accessible: true, + }, + ], + page: 1, + page_size: 50, + total: 73, + has_more: true, + }), + ) + expect(r?.entries).toHaveLength(2) + expect(r?.entries[1]?.non_accessible).toBe(true) + expect(r?.has_more).toBe(true) + // Entries carry basenames only — full path is derived. 
+ const first = r?.entries[0] + if (!r || !first) throw new Error('expected entry') + expect(joinEntryPath(r.path, first.name)).toBe('/work/project/src/main.rs') + }) + + it('rejects entries missing the always-present non_accessible', () => { + expect( + safeParseResponse(listFolderResponseSchema, { + path: '/work', + entries: [{ name: 'a', kind: 'file', size: 1, mtime: 1 }], + page: 1, + page_size: 50, + total: 1, + has_more: false, + }), + ).toBeNull() + }) +}) + +describe('formatUpdateOp', () => { + it('labels line ops with 1-based ranges', () => { + expect(formatUpdateOp({ op: 'remove', from_line: 3, to_line: 9 })).toBe( + 'remove L3–9', + ) + }) + + it('shows regex flags and expect_matches on replace ops', () => { + expect( + formatUpdateOp({ + op: 'replace', + pattern: 'foo', + replacement: 'bar', + ignore_case: true, + dot_matches_newline: true, + expect_matches: 1, + }), + ).toBe('replace /foo/is → bar (expect 1)') + }) + + it('surfaces expect_matches: 0 (assert absence)', () => { + expect( + formatUpdateOp({ + op: 'replace', + pattern: 'legacy', + replacement: '', + expect_matches: 0, + }), + ).toBe("replace /legacy/ → '' (expect 0)") + }) +}) + +describe('truncateInline', () => { + it('collapses whitespace and appends an ellipsis past the cap', () => { + expect(truncateInline('a b\n\tc')).toBe('a b c') + const long = 'x'.repeat(60) + expect(truncateInline(long)).toHaveLength(48) + expect(truncateInline(long).endsWith('…')).toBe(true) + }) +}) + +describe('joinEntryPath', () => { + it('joins basenames onto parent paths without doubling slashes', () => { + expect(joinEntryPath('/work/src', 'main.rs')).toBe('/work/src/main.rs') + expect(joinEntryPath('/', 'etc')).toBe('/etc') + expect(joinEntryPath('', 'rel.txt')).toBe('rel.txt') }) }) diff --git a/console/web/src/components/chat/coder/__tests__/treeListFolderViews.test.ts b/console/web/src/components/chat/coder/__tests__/treeListFolderViews.test.ts new file mode 100644 index 00000000..68f9d750 --- /dev/null 
+++ b/console/web/src/components/chat/coder/__tests__/treeListFolderViews.test.ts @@ -0,0 +1,159 @@ +import { describe, expect, it } from 'vitest' +import { pageLabel } from '../ListFolderView' +import type { TreeNode } from '../parsers' +import { summariseTree, truncationLabel } from '../TreeView' + +describe('summariseTree', () => { + it('counts the root itself plus all descendants by kind', () => { + const root: TreeNode = { + name: 'project', + kind: 'dir', + size: 0, + mtime: 0, + non_accessible: false, + children: [ + { + name: 'src', + kind: 'dir', + size: 0, + mtime: 0, + non_accessible: false, + children: [ + { + name: 'main.rs', + kind: 'file', + size: 120, + mtime: 0, + non_accessible: false, + }, + { + name: 'link', + kind: 'symlink', + size: 0, + mtime: 0, + non_accessible: false, + }, + ], + }, + { + name: 'README.md', + kind: 'file', + size: 42, + mtime: 0, + non_accessible: false, + }, + ], + } + // Symlinks count in neither bucket. + expect(summariseTree(root)).toEqual({ dirs: 2, files: 2, truncated: 0 }) + }) + + it('tallies truncation stubs across the whole tree', () => { + const root: TreeNode = { + name: 'project', + kind: 'dir', + size: 0, + mtime: 0, + non_accessible: false, + truncated: { + reason: 'per_folder_limit', + shown: 1, + total: 9, + hint: 'use coder::list-folder', + }, + children: [ + { + name: 'node_modules', + kind: 'dir', + size: 0, + mtime: 0, + non_accessible: false, + truncated: { + reason: 'default_exclude', + shown: 0, + hint: 'pass use_default_excludes: false', + }, + }, + ], + } + expect(summariseTree(root)).toEqual({ dirs: 2, files: 0, truncated: 2 }) + }) + + it('handles a single-file root (no children on the wire)', () => { + const root: TreeNode = { + name: 'main.rs', + kind: 'file', + size: 120, + mtime: 0, + non_accessible: false, + } + expect(summariseTree(root)).toEqual({ dirs: 0, files: 1, truncated: 0 }) + }) +}) + +describe('truncationLabel', () => { + it('shows shown/total for per_folder_limit (the only 
reason with total)', () => { + expect( + truncationLabel({ + reason: 'per_folder_limit', + shown: 50, + total: 120, + hint: 'use coder::list-folder', + }), + ).toBe('50/120 children shown — paginate with list-folder') + }) + + it('tolerates a per_folder_limit stub missing total', () => { + expect( + truncationLabel({ + reason: 'per_folder_limit', + shown: 50, + hint: 'use coder::list-folder', + }), + ).toBe('50 children shown — paginate with list-folder') + }) + + it('says "not explored" for max_depth (no total on depth cuts)', () => { + expect( + truncationLabel({ + reason: 'max_depth', + shown: 0, + hint: 'increase max_depth', + }), + ).toBe('max depth — subtree not explored') + }) + + it('marks default_exclude stubs as not descended', () => { + expect( + truncationLabel({ + reason: 'default_exclude', + shown: 0, + hint: 'pass use_default_excludes: false', + }), + ).toBe('excluded by default — not descended') + }) + + it('renders unknown reasons verbatim (forward tolerance)', () => { + expect( + truncationLabel({ reason: 'budget_exhausted', shown: 3, hint: 'retry' }), + ).toBe('budget_exhausted') + }) +}) + +describe('pageLabel', () => { + it('derives total pages from total entries and effective page size', () => { + expect(pageLabel(2, 50, 120)).toBe('2/3') + }) + + it('handles exact division', () => { + expect(pageLabel(1, 50, 100)).toBe('1/2') + }) + + it('clamps an empty folder to one page', () => { + expect(pageLabel(1, 50, 0)).toBe('1/1') + }) + + it('falls back to the bare page when page_size is 0', () => { + expect(pageLabel(1, 0, 10)).toBe('1') + }) +}) diff --git a/console/web/src/components/chat/coder/entryShared.tsx b/console/web/src/components/chat/coder/entryShared.tsx new file mode 100644 index 00000000..2f697074 --- /dev/null +++ b/console/web/src/components/chat/coder/entryShared.tsx @@ -0,0 +1,47 @@ +/** + * Entry rendering shared by `coder::tree` and `coder::list-folder` — both + * surface the same `EntryKind` + `non_accessible` wire 
vocabulary, so the + * icon pick and the "locked" marker live here once. + */ +import { + File, + FileText, + Folder, + Link as LinkIcon, + TriangleAlert, +} from 'lucide-react' +import { + Tooltip, + TooltipContent, + TooltipTrigger, +} from '@/components/ui/Tooltip' +import type { EntryKind } from './parsers' + +/** "Visible but locked" marker for non_accessible entries (C211 on use). */ +export function LockedBadge() { + return ( + + + + + locked + + + + matches non_accessible_globs — listed, but coder::* ops on it return + C211 + + + ) +} + +/** Mirror FsLsView's icon picks: symlink → link, dir → folder, known + text/code extensions → FileText, everything else (incl. "other") → File. */ +export function iconForEntry(kind: EntryKind, name: string) { + if (kind === 'symlink') return LinkIcon + if (kind === 'dir') return Folder + const lower = name.toLowerCase() + if (/\.(md|txt|json|yml|yaml|toml|csv|log)$/.test(lower)) return FileText + if (/\.(js|jsx|ts|tsx|py|rs|go|rb|sh|bash)$/.test(lower)) return FileText + return File +} diff --git a/console/web/src/components/chat/coder/index.tsx b/console/web/src/components/chat/coder/index.tsx index 6da97778..1f155738 100644 --- a/console/web/src/components/chat/coder/index.tsx +++ b/console/web/src/components/chat/coder/index.tsx @@ -3,7 +3,17 @@ import { parseSandboxErrorDisplay } from '@/components/chat/sandbox/parsers' import type { FunctionCallMessage } from '@/types/chat' import { CreateFilePreview, CreateFileView } from './CreateFileView' import { DeleteFilePreview, DeleteFileView } from './DeleteFileView' -import { isCoderMutateFunction, unwrapEnvelope } from './parsers' +import { InfoView } from './InfoView' +import { ListFolderView } from './ListFolderView' +import { MovePreview, MoveView } from './MoveView' +import { + isCoderFunction, + isCoderMutateFunction, + unwrapEnvelope, +} from './parsers' +import { ReadFileView } from './ReadFileView' +import { SearchView } from './SearchView' +import { TreeView } 
from './TreeView' import { UpdateFilePreview, UpdateFileView } from './UpdateFileView' export function CoderFunctionIdLabel({ functionId }: { functionId: string }) { @@ -20,7 +30,7 @@ export function CoderFunctionIdLabel({ functionId }: { functionId: string }) { } function tryRender(message: FunctionCallMessage): React.ReactNode | null { - if (!isCoderMutateFunction(message.functionId)) return null + if (!isCoderFunction(message.functionId)) return null if (message.pendingApproval) return null const input = unwrapEnvelope(message.input) @@ -41,11 +51,26 @@ function tryRender(message: FunctionCallMessage): React.ReactNode | null { return case 'coder::delete-file': return + case 'coder::move': + return + case 'coder::read-file': + return + case 'coder::search': + return + case 'coder::tree': + return + case 'coder::list-folder': + return + case 'coder::info': + return default: return null } } +/** Only the mutators (create/update/delete/move) gate on approval — the + * read-side functions never reach the pending state, so they have no + * Preview components to dispatch to. */ function tryRenderPreview( message: FunctionCallMessage, ): React.ReactNode | null { @@ -58,12 +83,15 @@ function tryRenderPreview( return case 'coder::delete-file': return + case 'coder::move': + return default: return null } } export const CoderToolView = { + isCoderFunction, isCoderMutateFunction, tryRender, tryRenderRunning: tryRender, diff --git a/console/web/src/components/chat/coder/parsers.ts b/console/web/src/components/chat/coder/parsers.ts index 0164d1f0..716e583e 100644 --- a/console/web/src/components/chat/coder/parsers.ts +++ b/console/web/src/components/chat/coder/parsers.ts @@ -1,12 +1,21 @@ /** - * Zod schemas + envelope helpers for batched `coder::*` mutators. + * Zod schemas + envelope helpers for every `coder::*` function (v0.4.1 wire). 
* - * Wire source: + * Wire source (one Rust handler per function): * workers/coder/src/functions/create_file.rs -> CreateFileInput/Output * workers/coder/src/functions/update_file.rs -> UpdateFileInput/Output * workers/coder/src/functions/delete_file.rs -> DeleteFileInput/Output + * workers/coder/src/functions/move_file.rs -> MoveFileInput/Output + * workers/coder/src/functions/read_file.rs -> ReadFileInput/Output + * workers/coder/src/functions/search.rs -> SearchInput/Output + * workers/coder/src/functions/tree.rs -> TreeInput/Output + * workers/coder/src/functions/list_folder.rs -> ListFolderInput/Output + * workers/coder/src/functions/info.rs -> InfoInput/Output * * Schemas are non-strict so additive wire fields don't break the UI. + * Every response-side Rust `Option` carries `skip_serializing_if = + * "Option::is_none"`, so unset fields are OMITTED (never `null`) — we + * model them as `.nullish()` for forward tolerance either way. */ import { z } from 'zod' import { @@ -17,10 +26,36 @@ import { export { safeParseRequest, safeParseResponse, unwrapEnvelope } +/* ---------------- function ids ---------------- */ + +/** Explicit allowlist — never prefix-match (would be accidentally too broad). */ +export const CODER_FUNCTION_IDS = [ + 'coder::create-file', + 'coder::update-file', + 'coder::delete-file', + 'coder::move', + 'coder::read-file', + 'coder::search', + 'coder::tree', + 'coder::list-folder', + 'coder::info', +] as const +export type CoderFunctionId = (typeof CODER_FUNCTION_IDS)[number] + +const CODER_FUNCTION_ID_SET: ReadonlySet = new Set( + CODER_FUNCTION_IDS, +) + +export function isCoderFunction(id: string): id is CoderFunctionId { + return CODER_FUNCTION_ID_SET.has(id) +} + +/** Mutators gate approval previews; everything else is read-only. 
*/ export const CODER_MUTATE_FUNCTION_IDS = [ 'coder::create-file', 'coder::update-file', 'coder::delete-file', + 'coder::move', ] as const export type CoderMutateFunctionId = (typeof CODER_MUTATE_FUNCTION_IDS)[number] @@ -32,27 +67,88 @@ export function isCoderMutateFunction(id: string): id is CoderMutateFunctionId { return CODER_MUTATE_FUNCTION_ID_SET.has(id) } +/* ---------------- shared ---------------- */ + +/** + * Per-entry structured error shared by every coder response. `code` is + * stable for branching (C210 bad input, C211 not-found-or-denied, + * C213 size/budget exceeded, C217 already-exists); `message` is + * agent-oriented and names the corrective next call — show verbatim. + */ +export const wireErrorSchema = z.object({ + code: z.string(), + message: z.string(), +}) +export type WireError = z.infer + +/** `list_folder.rs::EntryKind` / `tree.rs::NodeKind` — drives icons. */ +export const entryKindSchema = z.enum(['file', 'dir', 'symlink', 'other']) +export type EntryKind = z.infer + +/* ---------------- info ---------------- */ + +/** `info.rs::InfoInput` — pure discovery call, zero arguments. */ +export const infoRequestSchema = z.object({}) +export type InfoRequest = z.infer + +/** `info.rs::InfoOutput` — all 16 fields required, none nullable. */ +export const infoResponseSchema = z.object({ + /** Canonical absolute allowed roots; index 0 is the primary root. */ + base_paths: z.array(z.string()), + /** Convenience duplicate of `base_paths[0]`. */ + primary_root: z.string(), + batch_read_budget_bytes: z.number(), + max_output_bytes: z.number(), + max_read_bytes: z.number(), + max_write_bytes: z.number(), + /** Noise filter only (bypassable) — distinct from non_accessible_globs. */ + default_exclude_globs: z.array(z.string()), + /** Access-protected: listable but not readable/writable (C211). 
*/ + non_accessible_globs: z.array(z.string()), + list_default_page_size: z.number(), + list_max_page_size: z.number(), + search_default_max_line_bytes: z.number(), + search_default_max_matches: z.number(), + search_response_budget_bytes: z.number(), + tree_default_depth: z.number(), + tree_per_folder_limit: z.number(), + version: z.string(), +}) +export type InfoResponse = z.infer + /* ---------------- create-file ---------------- */ export const createFileSpecSchema = z.object({ path: z.string(), content: z.string(), + /** Octal permission bits as a string, e.g. "0644". */ mode: z.string().optional(), + /** Create missing parent dirs (handler default true). */ parents: z.boolean().optional(), + /** When false (default), refuse existing paths with C217. */ overwrite: z.boolean().optional(), }) export type CreateFileSpec = z.infer +/** + * No `.min(1)` on any request array (here or below): the goldens pin no + * minItems. Empty batches (and `ops: []`) are runtime rejections — + * top-level error or per-entry C210 — and those exchanges must still + * render structurally instead of falling back to raw JSON. + */ export const createFileRequestSchema = z.object({ - files: z.array(createFileSpecSchema).min(1), + files: z.array(createFileSpecSchema), }) export type CreateFileRequest = z.infer export const createFileResultSchema = z.object({ + /** Canonical absolute; caller's input verbatim when resolution failed. */ path: z.string(), success: z.boolean(), + /** 0 on failure. */ bytes_written: z.number(), - error: z.string().optional(), + /** Omitted on success. */ + error: wireErrorSchema.nullish(), }) export type CreateFileResult = z.infer @@ -65,6 +161,7 @@ export type CreateFileResponse = z.infer export const updateOpInsertSchema = z.object({ op: z.literal('insert'), + /** Insert BEFORE this 1-based line; lines+1 appends to EOF. 
*/ at_line: z.number(), content: z.string(), }) @@ -85,8 +182,13 @@ export const updateOpUpdateLinesSchema = z.object({ export const updateOpReplaceSchema = z.object({ op: z.literal('replace'), pattern: z.string(), + /** `$1`/`${name}` capture refs; literal `$` must be `$$`. */ replacement: z.string(), ignore_case: z.boolean().optional(), + /** When true `.` also matches \n (multi-line region replaces). */ + dot_matches_newline: z.boolean().optional(), + /** Mismatch fails the file with C210; 0 asserts absence; null/omit = all. */ + expect_matches: z.number().nullish(), }) export const updateOpSchema = z.discriminatedUnion('op', [ @@ -99,23 +201,47 @@ export type UpdateOp = z.infer export const updateFileSpecSchema = z.object({ path: z.string(), - ops: z.array(updateOpSchema).min(1), + ops: z.array(updateOpSchema), }) export type UpdateFileSpec = z.infer export const updateFileRequestSchema = z.object({ - files: z.array(updateFileSpecSchema).min(1), + files: z.array(updateFileSpecSchema), }) export type UpdateFileRequest = z.infer +/** + * `update_file.rs::OpEcho` — bounded post-apply snapshot per op. Line ops + * echo the region ±2 context lines; replace ops emit one echo per match + * site (max 5, no context) sharing the same `op_index`. + */ +export const opEchoSchema = z.object({ + /** 0-based index into the request's ops array. */ + op_index: z.number(), + /** 1-based first echoed line, AFTER all ops applied. */ + from_line: z.number(), + lines: z.array(z.string()), + /** Middle lines elided when the echoed region is large. */ + elided: z.number().nullish(), + /** Replace-site echoes only: total replacements across the file. */ + total_replacements: z.number().nullish(), +}) +export type OpEcho = z.infer + export const updateFileResultSchema = z.object({ path: z.string(), + /** Per-file atomic: all ops or none. */ success: z.boolean(), + /** Ops applied — only meaningful when success. 
*/ applied: z.number(), + /** Final line count — only meaningful when success. */ new_line_count: z.number(), - before: z.string().optional(), - after: z.string().optional(), - error: z.string().optional(), + /** Always present on the wire; empty array on failure. */ + echoes: z.array(opEchoSchema), + /** True when the ~4 KiB echo budget cut echoes short — read-file to inspect. */ + echoes_truncated: z.boolean(), + /** Omitted on success. */ + error: wireErrorSchema.nullish(), }) export type UpdateFileResult = z.infer @@ -127,16 +253,20 @@ export type UpdateFileResponse = z.infer /* ---------------- delete-file ---------------- */ export const deleteFileRequestSchema = z.object({ - paths: z.array(z.string()).min(1), + paths: z.array(z.string()), + /** Required for non-empty dirs; files and empty dirs ignore it. */ recursive: z.boolean().optional(), }) export type DeleteFileRequest = z.infer export const deleteFileResultSchema = z.object({ path: z.string(), + /** Missing paths are idempotent SUCCESSES. */ success: z.boolean(), + /** False on idempotent missing-path success — "already absent". */ removed: z.boolean(), - error: z.string().optional(), + /** Omitted on success. C210 = refusing to delete an allowed root. */ + error: wireErrorSchema.nullish(), }) export type DeleteFileResult = z.infer @@ -145,6 +275,260 @@ export const deleteFileResponseSchema = z.object({ }) export type DeleteFileResponse = z.infer +/* ---------------- move ---------------- */ + +export const moveFileSpecSchema = z.object({ + from: z.string(), + to: z.string(), + /** When false (default), refuse existing destinations with C217. */ + overwrite: z.boolean().optional(), + /** Create missing parent dirs of the destination. 
*/ + parents: z.boolean().optional(), +}) +export type MoveFileSpec = z.infer + +export const moveFileRequestSchema = z.object({ + files: z.array(moveFileSpecSchema), +}) +export type MoveFileRequest = z.infer + +export const moveFileResultSchema = z.object({ + from: z.string(), + to: z.string(), + success: z.boolean(), + /** False for a no-op self-move — render as "unchanged", not moved. */ + moved: z.boolean(), + /** Omitted on success. C210 messages may name a corrected target path. */ + error: wireErrorSchema.nullish(), +}) +export type MoveFileResult = z.infer + +export const moveFileResponseSchema = z.object({ + results: z.array(moveFileResultSchema), +}) +export type MoveFileResponse = z.infer + +/* ---------------- read-file ---------------- */ + +/** Batch entry: bare path string = whole-file read, or per-entry options. */ +export const readTargetSchema = z.union([ + z.string(), + z.object({ + path: z.string(), + line_from: z.number().nullish(), + line_to: z.number().nullish(), + /** Prefix lines with absolute `N→` numbers (feeds update-file ops). */ + numbered: z.boolean().optional(), + /** Metadata-only probe; no batch budget consumed. */ + stat: z.boolean().optional(), + }), +]) +export type ReadTarget = z.infer + +/** + * `read_file.rs::ReadFileInput` — runtime enforces `path` XOR `paths` + * (both or neither → C210); the schema itself has no required fields. + */ +export const readFileRequestSchema = z.object({ + path: z.string().nullish(), + paths: z.array(readTargetSchema).nullish(), + /** 1-based inclusive window start; switches to windowed streaming. */ + line_from: z.number().nullish(), + /** 1-based inclusive window end; omit = read to EOF (byte-capped). */ + line_to: z.number().nullish(), + numbered: z.boolean().optional(), + /** Metadata probe — exclusive with window/numbered/max_output_bytes. */ + stat: z.boolean().optional(), + /** Per-call override of the full-read context budget. 
*/ + max_output_bytes: z.number().nullish(), +}) +export type ReadFileRequest = z.infer + +/** `read_file.rs::ReadEntryResult` — only `path` + `success` required. */ +export const readEntryResultSchema = z.object({ + path: z.string(), + success: z.boolean(), + content: z.string().nullish(), + is_utf8: z.boolean().nullish(), + lines_returned: z.number().nullish(), + /** Present only when the stream reached EOF — absent ≠ 0. */ + total_lines: z.number().nullish(), + more_lines: z.boolean().nullish(), + /** FILE size from metadata, not content size. */ + size: z.number().nullish(), + /** Unix permission bits, lower 9 bits of st_mode. */ + mode: z.number().nullish(), + mtime: z.number().nullish(), + /** Only when success:false. C213 budget errors name the batch budget. */ + error: wireErrorSchema.nullish(), +}) +export type ReadEntryResult = z.infer + +/** + * `read_file.rs::ReadFileOutput` — every field optional/omitted-when-unset. + * Single-path mode populates the scalars and omits `results`; batch mode + * populates only `results`. + */ +export const readFileResponseSchema = z.object({ + path: z.string().nullish(), + content: z.string().nullish(), + is_utf8: z.boolean().nullish(), + lines_returned: z.number().nullish(), + total_lines: z.number().nullish(), + more_lines: z.boolean().nullish(), + size: z.number().nullish(), + mode: z.number().nullish(), + mtime: z.number().nullish(), + results: z.array(readEntryResultSchema).nullish(), +}) +export type ReadFileResponse = z.infer + +/* ---------------- search ---------------- */ + +export const searchRequestSchema = z.object({ + /** Regex when `regex: true`, else literal substring. */ + query: z.string(), + regex: z.boolean().optional(), + ignore_case: z.boolean().optional(), + /** Folder scoping the walk; default "." = primary root. 
*/ + path: z.string().optional(), + include_globs: z.array(z.string()).optional(), + exclude_globs: z.array(z.string()).optional(), + use_default_excludes: z.boolean().optional(), + search_content: z.boolean().optional(), + search_paths: z.boolean().optional(), + /** Max 10 — larger → C210. */ + context_lines_before: z.number().nullish(), + context_lines_after: z.number().nullish(), + max_matches: z.number().nullish(), + max_line_bytes: z.number().nullish(), +}) +export type SearchRequest = z.infer + +/** `search.rs::ContentMatch` — one per matching LINE (first match only). */ +export const contentMatchSchema = z.object({ + path: z.string(), + line: z.number(), + column: z.number(), + /** Truncated to max_line_bytes; never spans newlines. */ + text: z.string(), + /** Omitted when empty — absence means zero context. */ + before: z.array(z.string()).nullish(), + after: z.array(z.string()).nullish(), +}) +export type ContentMatch = z.infer + +export const pathMatchSchema = z.object({ + path: z.string(), +}) +export type PathMatch = z.infer + +export const searchResponseSchema = z.object({ + content_matches: z.array(contentMatchSchema), + path_matches: z.array(pathMatchSchema), + /** Hit max_matches or the response budget — refine, don't paginate. */ + truncated: z.boolean(), +}) +export type SearchResponse = z.infer + +/* ---------------- tree ---------------- */ + +export const treeRequestSchema = z.object({ + path: z.string().optional(), + /** Root node is depth 0; falls back to config tree_default_depth. */ + max_depth: z.number().nullish(), + per_folder_limit: z.number().nullish(), + /** False lists everything (excluded dirs otherwise become stubs). */ + use_default_excludes: z.boolean().optional(), +}) +export type TreeRequest = z.infer + +/** `tree.rs::TruncationInfo` — why a folder's children were cut off. */ +export const truncationInfoSchema = z.object({ + /** "per_folder_limit" | "max_depth" | "default_exclude". 
*/ + reason: z.string(), + /** Children actually returned. */ + shown: z.number(), + /** Populated ONLY when reason == "per_folder_limit". */ + total: z.number().nullish(), + /** Pre-written next-step guidance — render it. */ + hint: z.string(), +}) +export type TruncationInfo = z.infer + +/** + * `tree.rs::TreeNode` (recursive). Wire caveats: `non_accessible` is + * skipped when false (defaulted here); `children`/`truncated` omitted + * when absent. The ROOT node's path is the top-level `path` — never + * join root.name onto it; child path = parent path + "/" + name. + */ +export interface TreeNode { + name: string + kind: EntryKind + size: number + mtime: number + non_accessible: boolean + children?: TreeNode[] | null + truncated?: TruncationInfo | null +} + +export const treeNodeSchema: z.ZodType = z.lazy(() => + z.object({ + name: z.string(), + kind: entryKindSchema, + size: z.number(), + mtime: z.number(), + non_accessible: z.boolean().default(false), + children: z.array(treeNodeSchema).nullish(), + truncated: truncationInfoSchema.nullish(), + }), +) + +export const treeResponseSchema = z.object({ + /** Canonical absolute path of the requested folder (= root node's path). */ + path: z.string(), + root: treeNodeSchema, +}) +export type TreeResponse = z.infer + +/* ---------------- list-folder ---------------- */ + +/** `list_folder.rs::ListFolderInput` — empty request lists primary root. */ +export const listFolderRequestSchema = z.object({ + path: z.string().optional(), + /** 1-based. */ + page: z.number().optional(), + /** Falls back to list_default_page_size; capped by list_max_page_size. */ + page_size: z.number().nullish(), +}) +export type ListFolderRequest = z.infer + +/** `list_folder.rs::DirEntry` — all fields always present on the wire. */ +export const dirEntrySchema = z.object({ + /** Basename only — join onto the response `path` for the full path. 
*/ + name: z.string(), + kind: entryKindSchema, + size: z.number(), + mtime: z.number(), + /** Listed but locked: coder::* ops on it return C211. */ + non_accessible: z.boolean(), +}) +export type DirEntry = z.infer + +export const listFolderResponseSchema = z.object({ + path: z.string(), + /** Sorted by name, this page only. */ + entries: z.array(dirEntrySchema), + page: z.number(), + /** Effective size used (after default fill / cap clamp). */ + page_size: z.number(), + total: z.number(), + has_more: z.boolean(), +}) +export type ListFolderResponse = z.infer + +/* ---------------- formatting helpers ---------------- */ + /** Human-readable one-liner for a single update op (approval + done views). */ export function formatUpdateOp(op: UpdateOp): string { switch (op.op) { @@ -154,8 +538,16 @@ export function formatUpdateOp(op: UpdateOp): string { return `remove L${op.from_line}–${op.to_line}` case 'update_lines': return `update_lines L${op.from_line}–${op.to_line}` - case 'replace': - return `replace /${op.pattern}/ → ${op.replacement || "''"}` + case 'replace': { + // Regex-style flags: i = ignore_case, s = dot_matches_newline. + const flags = `${op.ignore_case ? 'i' : ''}${op.dot_matches_newline ? 's' : ''}` + const base = `replace /${op.pattern}/${flags} → ${op.replacement || "''"}` + // expect 0 asserts absence — still worth surfacing. + if (op.expect_matches === null || op.expect_matches === undefined) { + return base + } + return `${base} (expect ${op.expect_matches})` + } default: return 'unknown op' } @@ -166,3 +558,14 @@ export function truncateInline(text: string, max = 48): string { if (oneLine.length <= max) return oneLine return `${oneLine.slice(0, max - 1)}…` } + +/** + * Join a list-folder/tree entry `name` onto its parent's canonical path. + * Entries carry basenames only; the wire contract says derive + * `parent + "/" + name` (tolerating a trailing slash on the parent). 
+ */ +export function joinEntryPath(parentPath: string, name: string): string { + if (parentPath === '') return name + if (parentPath.endsWith('/')) return `${parentPath}${name}` + return `${parentPath}/${name}` +} diff --git a/console/web/src/components/chat/sandbox/FsGrepView.tsx b/console/web/src/components/chat/sandbox/FsGrepView.tsx index f18b371d..8fda9759 100644 --- a/console/web/src/components/chat/sandbox/FsGrepView.tsx +++ b/console/web/src/components/chat/sandbox/FsGrepView.tsx @@ -1,4 +1,4 @@ -import type * as React from 'react' +import { renderWithHighlight } from './highlight' import { fsGrepRequestSchema, fsGrepResponseSchema, @@ -53,11 +53,11 @@ export function FsGrepView({ input, output }: FsGrepViewProps) {
                 
-                  {renderWithHighlight(
-                    m.content,
-                    req.data.pattern,
-                    !!req.data.ignore_case,
-                  )}
+                  {/* sandbox grep patterns are always regexes on the wire */}
+                  {renderWithHighlight(m.content, req.data.pattern, {
+                    isRegex: true,
+                    ignoreCase: !!req.data.ignore_case,
+                  })}
                 
               
@@ -67,64 +67,3 @@ export function FsGrepView({ input, output }: FsGrepViewProps) { ) } - -/** Best-effort substring/regex highlight. The daemon uses the Rust - `regex` crate; JS regex is a superset for the simple cases agents - use (TODO|FIXME, identifiers). Falls back to substring matching - if the pattern doesn't compile as a JS regex. */ -function renderWithHighlight( - line: string, - pattern: string, - ignoreCase: boolean, -): React.ReactNode { - if (!pattern) return line - let re: RegExp | null = null - try { - re = new RegExp(pattern, ignoreCase ? 'gi' : 'g') - } catch { - re = null - } - if (re) { - const parts: React.ReactNode[] = [] - let last = 0 - let n = 0 - for (const hit of line.matchAll(re)) { - const start = hit.index ?? 0 - const text = hit[0] - // Skip zero-width matches that would otherwise loop forever. - if (text.length === 0) continue - if (start > last) parts.push(line.slice(last, start)) - parts.push( - - {text} - , - ) - last = start + text.length - n++ - if (n > 200) break - } - if (last < line.length) parts.push(line.slice(last)) - return parts - } - // Substring fallback for patterns the JS regex engine rejects. - const needle = ignoreCase ? pattern.toLowerCase() : pattern - const hay = ignoreCase ? 
line.toLowerCase() : line - const parts: React.ReactNode[] = [] - let i = 0 - let n = 0 - while (i < line.length) { - const j = hay.indexOf(needle, i) - if (j === -1) { - parts.push(line.slice(i)) - break - } - if (j > i) parts.push(line.slice(i, j)) - parts.push( - - {line.slice(j, j + pattern.length)} - , - ) - i = j + pattern.length - } - return parts -} diff --git a/console/web/src/components/chat/sandbox/__tests__/highlight.test.ts b/console/web/src/components/chat/sandbox/__tests__/highlight.test.ts new file mode 100644 index 00000000..66c2f64d --- /dev/null +++ b/console/web/src/components/chat/sandbox/__tests__/highlight.test.ts @@ -0,0 +1,132 @@ +/** + * Pins the fiddly branches of the shared grep highlighter: the literal + * (isRegex: false) path must never fire regex metacharacters, broken + * regexes fall back to substring matching, zero-width matches are + * skipped instead of looping, the highlight-count cap stops marking + * but keeps the tail text, no-match lines pass through unhighlighted, + * and case-insensitive substring slicing preserves the line's + * original casing. + */ +import { isValidElement, type ReactNode } from 'react' +import { describe, expect, it } from 'vitest' +import { renderWithHighlight } from '../highlight' + +/** Flatten the ReactNode result into round-trip text + highlighted runs. */ +function flatten(node: ReactNode): { text: string; highlights: string[] } { + const parts = Array.isArray(node) ? 
node : [node] + const highlights: string[] = [] + let text = '' + for (const part of parts) { + if (typeof part === 'string') { + text += part + } else if (isValidElement<{ children: string }>(part)) { + text += part.props.children + highlights.push(part.props.children) + } + } + return { text, highlights } +} + +describe('renderWithHighlight', () => { + it('returns the line untouched for an empty query', () => { + expect( + renderWithHighlight('keep me', '', { isRegex: false, ignoreCase: false }), + ).toBe('keep me') + }) + + it('passes the line through unhighlighted when nothing matches', () => { + const literal = flatten( + renderWithHighlight('no hits here', 'absent', { + isRegex: false, + ignoreCase: false, + }), + ) + expect(literal.text).toBe('no hits here') + expect(literal.highlights).toEqual([]) + + const regex = flatten( + renderWithHighlight('no hits here', 'absent\\d+', { + isRegex: true, + ignoreCase: false, + }), + ) + expect(regex.text).toBe('no hits here') + expect(regex.highlights).toEqual([]) + }) + + it('treats the query as a literal when isRegex is false', () => { + // "a.b" as a regex would also match "axb"; the literal path must not. + const { text, highlights } = flatten( + renderWithHighlight('axb a.b axb', 'a.b', { + isRegex: false, + ignoreCase: false, + }), + ) + expect(text).toBe('axb a.b axb') + expect(highlights).toEqual(['a.b']) + }) + + it('matches as a regex when isRegex is true', () => { + const { text, highlights } = flatten( + renderWithHighlight('TODO: fix FIXME later', 'TODO|FIXME', { + isRegex: true, + ignoreCase: false, + }), + ) + expect(text).toBe('TODO: fix FIXME later') + expect(highlights).toEqual(['TODO', 'FIXME']) + }) + + it('falls back to substring matching when the regex does not compile', () => { + // "(" is invalid JS regex; Rust `regex` would reject it too, but the + // fallback keeps the view honest for any engine mismatch. 
+ const { text, highlights } = flatten( + renderWithHighlight('call(arg)', '(', { + isRegex: true, + ignoreCase: false, + }), + ) + expect(text).toBe('call(arg)') + expect(highlights).toEqual(['(']) + }) + + it('skips zero-width regex matches without dropping real ones', () => { + // "x*" matches the empty string at every position; only the real "x" + // run should highlight and the call must terminate. + const { text, highlights } = flatten( + renderWithHighlight('axxb', 'x*', { isRegex: true, ignoreCase: false }), + ) + expect(text).toBe('axxb') + expect(highlights).toEqual(['xx']) + }) + + it('renders no highlights when every regex match is zero-width', () => { + const { text, highlights } = flatten( + renderWithHighlight('bbb', 'a*', { isRegex: true, ignoreCase: false }), + ) + expect(text).toBe('bbb') + expect(highlights).toEqual([]) + }) + + it('caps highlighted matches but keeps the unhighlighted tail', () => { + const line = 'a'.repeat(300) + const { text, highlights } = flatten( + renderWithHighlight(line, 'a', { isRegex: true, ignoreCase: true }), + ) + // The loop breaks once the counter passes 200, so exactly 201 runs + // are marked; the remaining text still round-trips unhighlighted. 
+ expect(highlights).toHaveLength(201) + expect(text).toBe(line) + }) + + it('preserves original casing when slicing case-insensitive substrings', () => { + const { text, highlights } = flatten( + renderWithHighlight('ToDo and TODO, todone', 'todo', { + isRegex: false, + ignoreCase: true, + }), + ) + expect(text).toBe('ToDo and TODO, todone') + expect(highlights).toEqual(['ToDo', 'TODO', 'todo']) + }) +}) diff --git a/console/web/src/components/chat/sandbox/highlight.tsx b/console/web/src/components/chat/sandbox/highlight.tsx new file mode 100644 index 00000000..131e3862 --- /dev/null +++ b/console/web/src/components/chat/sandbox/highlight.tsx @@ -0,0 +1,85 @@ +/** + * Best-effort match highlighting for grep-style views — shared by coder + * SearchView and sandbox FsGrepView (grep patterns are always regexes, + * so FsGrepView passes `isRegex: true`). The daemons use the Rust + * `regex` crate; JS regex covers the simple patterns agents use and we + * fall back to substring matching when the pattern doesn't compile — or + * when `isRegex` is false, since the query is then a literal and regex + * metacharacters must not fire. + */ +import type { ReactNode } from 'react' + +export interface HighlightOptions { + /** Treat `query` as a regex; false = literal substring match. */ + isRegex: boolean + ignoreCase: boolean +} + +export function renderWithHighlight( + line: string, + query: string, + { isRegex, ignoreCase }: HighlightOptions, +): ReactNode { + if (!query) return line + if (isRegex) { + let re: RegExp | null = null + try { + re = new RegExp(query, ignoreCase ? 'gi' : 'g') + } catch { + re = null + } + if (re) return highlightRegex(line, re) + } + return highlightSubstring(line, query, ignoreCase) +} + +function highlightRegex(line: string, re: RegExp): ReactNode { + const parts: ReactNode[] = [] + let last = 0 + let n = 0 + for (const hit of line.matchAll(re)) { + const start = hit.index ?? 
0 + const text = hit[0] + // Skip zero-width matches that would otherwise loop forever. + if (text.length === 0) continue + if (start > last) parts.push(line.slice(last, start)) + parts.push( + + {text} + , + ) + last = start + text.length + n++ + if (n > 200) break + } + if (last < line.length) parts.push(line.slice(last)) + return parts +} + +// Substring path: literal queries and patterns the JS regex engine rejects. +function highlightSubstring( + line: string, + query: string, + ignoreCase: boolean, +): ReactNode { + const needle = ignoreCase ? query.toLowerCase() : query + const hay = ignoreCase ? line.toLowerCase() : line + const parts: ReactNode[] = [] + let i = 0 + let n = 0 + while (i < line.length) { + const j = hay.indexOf(needle, i) + if (j === -1) { + parts.push(line.slice(i)) + break + } + if (j > i) parts.push(line.slice(i, j)) + parts.push( + + {line.slice(j, j + query.length)} + , + ) + i = j + query.length + } + return parts +} diff --git a/console/web/src/components/chat/worker/WorkerStatusView.tsx b/console/web/src/components/chat/worker/WorkerStatusView.tsx new file mode 100644 index 00000000..9dc9d910 --- /dev/null +++ b/console/web/src/components/chat/worker/WorkerStatusView.tsx @@ -0,0 +1,210 @@ +import { + ActionLine, + Chip, + MetaRow, + StatusPill, +} from '@/components/chat/sandbox/shared' +import { + safeParseRequest, + safeParseResponse, + statusState, + type WorkerStatusState, + workerStatusRequestSchema, + workerStatusResponseSchema, +} from './parsers' + +interface WorkerStatusViewProps { + input: unknown + output: unknown + running?: boolean +} + +/* Pill copy + variant for each of the four derived states. 
The variant + * intentionally mirrors the daemon's own severity ordering: + * running -> accent (alive) + * provisioning -> default (not running, no log signal — the daemon's own + * "likely still provisioning" branch; label tracks + * the hint, neutral variant never over-promises a + * healthy boot since the tail may be rotated) + * stopped -> warn (crashed / never booted; failure in stderr_tail) + * not-installed -> alert (not declared in project config at all) */ +const STATE_PRESENTATION: Record< + WorkerStatusState, + { label: string; variant: 'default' | 'accent' | 'warn' | 'alert' } +> = { + running: { label: 'running', variant: 'accent' }, + provisioning: { label: 'provisioning', variant: 'default' }, + stopped: { label: 'stopped', variant: 'warn' }, + 'not-installed': { label: 'not installed', variant: 'alert' }, +} + +export function WorkerStatusView({ + input, + output, + running, +}: WorkerStatusViewProps) { + const req = safeParseRequest(workerStatusRequestSchema, input) + if (!req) return null + + if (running) { + return ( +
+ + + + +
+ {`· checking ${req.name}…`} +
+
+ ) + } + + const resp = safeParseResponse(workerStatusResponseSchema, output) + if (!resp) { + // Request parsed but response didn't: never null out; show the name we + // already know plus an unavailable affordance so the row stays legible. + return ( +
+ + + + + +
+ ) + } + + const state = statusState(resp) + const presentation = STATE_PRESENTATION[state] + + return ( +
+ + + + {resp.version ? : null} + {typeof resp.pid === 'number' ? : null} + + + + + {resp.logs_dir ? : null} + + +
+ ) +} + +function TitleRow({ name }: { name: string }) { + return ( + + + {name} + + + ) +} + +/* The hint is the agent-facing next-step guidance: surface it as an advisory + * ActionLine, never tuck it away. */ +function HintRow({ hint }: { hint: string }) { + return ( + + + {hint} + + + ) +} + +function LogsDirRow({ path }: { path: string }) { + return ( +
+ logs + {path} +
+ ) +} + +/* Labeled monospace log pane. Empty tail collapses to a quiet ghost row + * (never a blank box); line order is preserved verbatim. stderr is toned + * apart from stdout so failures read at a glance. */ +function LogPane({ + label, + lines, + tone, +}: { + label: string + lines: string[] + tone: 'warn' | 'ink' +}) { + const labelTone = tone === 'warn' ? 'text-warn' : 'text-ink-faint' + return ( +
+
+ + {label} + +
+ {lines.length === 0 ? ( + + ) : ( + // Newline-joined into one
<pre> so order + blank lines survive
+        // verbatim without per-line index keys (log lines can repeat).
+        
+          {lines.join('\n')}
+        
+ )} +
+ ) +} + +function GhostRow({ label }: { label: string }) { + return ( +
+ {`· ${label}`} +
+ ) +} + +function TypeChip({ workerType }: { workerType: string }) { + return ( + + type + {workerType} + + ) +} + +function VersionChip({ version }: { version: string }) { + return ( + + + version + + {version} + + ) +} + +function PidChip({ pid }: { pid: number }) { + return ( + + pid + {pid} + + ) +} + +function InstalledChip({ installed }: { installed: boolean }) { + return ( + + + {installed ? 'installed' : 'not installed'} + + + ) +} diff --git a/console/web/src/components/chat/worker/__tests__/parsers.test.ts b/console/web/src/components/chat/worker/__tests__/parsers.test.ts index 14e78b86..494320df 100644 --- a/console/web/src/components/chat/worker/__tests__/parsers.test.ts +++ b/console/web/src/components/chat/worker/__tests__/parsers.test.ts @@ -3,8 +3,10 @@ import { isWorkerFunction, safeParseRequest, safeParseResponse, + statusState, unwrapEnvelope, WORKER_FUNCTION_IDS, + type WorkerStatusResponse, workerAddRequestSchema, workerAddResponseSchema, workerClearResponseSchema, @@ -13,6 +15,8 @@ import { workerRemoveResponseSchema, workerStartRequestSchema, workerStartResponseSchema, + workerStatusRequestSchema, + workerStatusResponseSchema, workerStopRequestSchema, workerStopResponseSchema, workerUpdateResponseSchema, @@ -76,6 +80,137 @@ describe('worker::list', () => { }) }) +describe('worker::status', () => { + it('round-trips the request example', () => { + expect( + safeParseRequest(workerStatusRequestSchema, { name: 'pdfkit' }), + ).toEqual({ name: 'pdfkit' }) + }) + + it('rejects a request missing name', () => { + expect(safeParseRequest(workerStatusRequestSchema, {})).toBeNull() + }) + + it('round-trips a wrapped running StatusOutcome', () => { + const outcome = { + name: 'pdfkit', + installed: true, + worker_type: 'oci', + running: true, + pid: 28943, + version: '1.0.0', + logs_dir: '/Users/anderson/.iii/logs/pdfkit', + stderr_tail: [], + stdout_tail: ['[pdfkit] listening on :4101'], + hint: 'worker is healthy; trigger it with `pdfkit::render`.', 
+ } + const parsed = safeParseResponse(workerStatusResponseSchema, wrap(outcome)) + expect(parsed).toMatchObject({ + name: 'pdfkit', + installed: true, + running: true, + pid: 28943, + version: '1.0.0', + }) + expect(parsed?.stdout_tail).toEqual(['[pdfkit] listening on :4101']) + }) + + it('parses null pid / version / logs_dir (engine builtin)', () => { + const parsed = safeParseResponse(workerStatusResponseSchema, { + name: 'iii-stream', + installed: true, + worker_type: 'builtin', + running: true, + pid: null, + version: null, + logs_dir: null, + stderr_tail: [], + stdout_tail: [], + hint: 'engine builtin; managed by the engine process.', + }) + expect(parsed?.pid).toBeNull() + expect(parsed?.version).toBeNull() + expect(parsed?.logs_dir).toBeNull() + }) + + it('defaults omitted tails to empty arrays', () => { + const parsed = safeParseResponse(workerStatusResponseSchema, { + name: 'pdfkit', + installed: true, + worker_type: 'oci', + running: true, + hint: 'healthy', + }) + expect(parsed?.stderr_tail).toEqual([]) + expect(parsed?.stdout_tail).toEqual([]) + }) + + it('parses a not-installed StatusOutcome', () => { + const parsed = safeParseResponse(workerStatusResponseSchema, { + name: 'ghost', + installed: false, + worker_type: 'not-installed', + running: false, + pid: null, + version: null, + logs_dir: null, + stderr_tail: [], + stdout_tail: [], + hint: 'not declared in config.yaml; run `worker::add` first.', + }) + expect(parsed?.installed).toBe(false) + expect(parsed?.worker_type).toBe('not-installed') + }) +}) + +describe('statusState', () => { + function outcome(over: Partial<WorkerStatusResponse>): WorkerStatusResponse { + return { + name: 'pdfkit', + installed: true, + worker_type: 'oci', + running: false, + pid: null, + version: null, + logs_dir: null, + stderr_tail: [], + stdout_tail: [], + hint: 'x', + ...over, + } + } + + it('derives "not-installed" when installed is false', () => { + expect( + statusState(outcome({ installed: false, worker_type: 'not-installed' })), +
).toBe('not-installed') + }) + + it('derives "running" when installed and running', () => { + expect(statusState(outcome({ running: true }))).toBe('running') + }) + + it('derives "provisioning" when down with both tails empty', () => { + // No-tail down is the daemon's own "installed but not running and no logs + // yet — likely still provisioning" branch, so the label matches its hint. + // The presentation layer keeps this on the neutral pill variant (a rotated + // tail could still hide a crash) — derivation just tracks the hint wording. + expect(statusState(outcome({ running: false }))).toBe('provisioning') + }) + + it('derives "stopped" when down with a non-empty stderr tail', () => { + expect( + statusState(outcome({ running: false, stderr_tail: ['npm ERR! boom'] })), + ).toBe('stopped') + }) + + it('derives "stopped" when down with only stdout output', () => { + expect( + statusState(outcome({ running: false, stdout_tail: ['booting…'] })), + ).toBe('stopped') + }) +}) + describe('worker::start', () => { it('parses a request', () => { expect( diff --git a/console/web/src/components/chat/worker/index.tsx b/console/web/src/components/chat/worker/index.tsx index 7e0ff7b0..9edc4d84 100644 --- a/console/web/src/components/chat/worker/index.tsx +++ b/console/web/src/components/chat/worker/index.tsx @@ -11,6 +11,7 @@ import { WorkerStopView, WorkerUpdateView, } from './WorkerOpView' +import { WorkerStatusView } from './WorkerStatusView' export function WorkerFunctionIdLabel({ functionId }: { functionId: string }) { if (!functionId.startsWith('worker::')) { @@ -43,6 +44,10 @@ function tryRender(message: FunctionCallMessage): React.ReactNode | null { switch (message.functionId) { case 'worker::list': return + case 'worker::status': + return ( + + ) case 'worker::start': return case 'worker::stop': diff --git a/console/web/src/components/chat/worker/parsers.ts b/console/web/src/components/chat/worker/parsers.ts index 8671b1b7..185c3bff 100644 --- 
a/console/web/src/components/chat/worker/parsers.ts +++ b/console/web/src/components/chat/worker/parsers.ts @@ -26,6 +26,7 @@ export const WORKER_FUNCTION_IDS = [ 'worker::start', 'worker::stop', 'worker::list', + 'worker::status', 'worker::clear', 'worker::schema', ] as const @@ -79,6 +80,63 @@ export const workerListResponseSchema = z.object({ }) export type WorkerListResponse = z.infer<typeof workerListResponseSchema> +/* ---------------- worker::status ---------------- */ + +export const workerStatusRequestSchema = z.object({ + name: z.string(), +}) +export type WorkerStatusRequest = z.infer<typeof workerStatusRequestSchema> + +/** + * StatusOutcome (flat, post-`unwrapEnvelope`). Rust `Option` => + * `.nullable().optional()` to mirror `workerEntrySchema` (pid/version). + * `stderr_tail` / `stdout_tail` default to `[]` so an absent key never + * collapses the log panes — the host always sends arrays, but the worker + * namespace has shipped omitted-tail payloads before. + */ +export const workerStatusResponseSchema = z.object({ + name: z.string(), + installed: z.boolean(), + worker_type: z.string(), + running: z.boolean(), + pid: z.number().nullable().optional(), + version: z.string().nullable().optional(), + logs_dir: z.string().nullable().optional(), + stderr_tail: z.array(z.string()).optional().default([]), + stdout_tail: z.array(z.string()).optional().default([]), + hint: z.string(), +}) +export type WorkerStatusResponse = z.infer<typeof workerStatusResponseSchema> + +/** + * The four-way status derived from a StatusOutcome. The Rust contract + * (`StatusOutcome.running`) is explicit: running=false covers BOTH + * install/boot AND a crash, and `stderr_tail` is the documented + * discriminator ("check stderr_tail to tell which"). 
So a down worker with + * a log tail is a `stopped` failure (stderr carries npm/boot errors), while + * a down worker with NO tail is the daemon's own "installed but not running + * and no logs yet — likely still provisioning" branch (worker_manager_daemon + * build_status).
We label that no-tail case `provisioning` to match the + * daemon hint rendered on the same screen, but keep it on the neutral + * `default` pill variant (never the reassuring accent/ok families): empty + * tails can also be rotated/omitted (the host has shipped omitted-tail + * payloads — see the StatusResponse schema comment), so the neutral variant + * avoids over-promising a healthy boot while the label stays truthful to the + * hint. Pure + unit-tested here rather than in the component. + */ +export type WorkerStatusState = + | 'not-installed' + | 'running' + | 'stopped' + | 'provisioning' + +export function statusState(resp: WorkerStatusResponse): WorkerStatusState { + if (!resp.installed) return 'not-installed' + if (resp.running) return 'running' + const hasTail = resp.stderr_tail.length > 0 || resp.stdout_tail.length > 0 + return hasTail ? 'stopped' : 'provisioning' +} + /* ---------------- worker::add ---------------- */ export const workerAddRequestSchema = z.object({ diff --git a/console/web/src/index.css b/console/web/src/index.css index a71951d9..ecf454c6 100644 --- a/console/web/src/index.css +++ b/console/web/src/index.css @@ -28,6 +28,9 @@ --color-alert: #c43e1c; --color-warn: #a87a00; + /* Diff additions (coder update-file echoes) — the palette's only green. + Desaturated to sit with alert/warn; 5.3:1 on cream at 12px (AA). 
*/ + --color-ok: #356f3d; --radius-none: 0px; --radius-full: 9999px; @@ -50,6 +53,7 @@ --color-rule-2: #1f1e1c; --color-accent: #3ea8ff; --color-accent-fg: #111110; + --color-ok: #5fae6a; } @layer base { diff --git a/console/web/src/stories/fixtures/coder-fixtures.ts b/console/web/src/stories/fixtures/coder-fixtures.ts index 638bc96a..1604788d 100644 --- a/console/web/src/stories/fixtures/coder-fixtures.ts +++ b/console/web/src/stories/fixtures/coder-fixtures.ts @@ -2,11 +2,23 @@ import type { FunctionCallMessage } from '@/types/chat' import { wrapHarness } from './sandbox-fixtures' const now = Date.now() +/** Fixed mtime so stories render stable dates (2026-03-14T00:53:20Z). */ +const MTIME = 1773456800 function byteLen(text: string): number { return new TextEncoder().encode(text).length } +/** Render the wire's `numbered: true` `N→` prefixes for a 1-based window. */ +function numberWindow(text: string, from: number, to: number): string { + const body = text + .split('\n') + .slice(from - 1, to) + .map((line, i) => `${from + i}→${line}`) + .join('\n') + return `${body}\n` +} + /** Minimal demo worker — mirrors the quick-start in workers/iii/skills/SKILL.md. */ const DEMO_WORKER_TS = `import { registerWorker } from 'iii-sdk' @@ -36,7 +48,7 @@ iii.registerFunction( ) ` -/** Substantial skill excerpt — enough lines for Pierre to show markdown + TS fences. */ +/** Substantial skill excerpt — enough lines for markdown + TS fences. 42 lines. */ const III_SKILL_MD = `--- name: iii description: >- @@ -81,49 +93,8 @@ iii.registerFunction('demo::add', async (payload: { a: number; b: number }) => { > a better fit. 
` -const DEMO_WORKER_AFTER_TS = `import type { Logger } from 'iii-sdk' -import { registerWorker } from 'iii-sdk' - -const iii = registerWorker(process.env.III_ENGINE_URL!, { - workerName: 'demo', - invocationTimeoutMs: 30_000, -}) - -iii.registerFunction( - 'demo::sum', - async (payload: { a: number; b: number }) => { - return { c: payload.a + payload.b } - }, - { - description: 'Add two numbers.', - request_format: { - type: 'object', - properties: { - a: { type: 'number' }, - b: { type: 'number' }, - }, - required: ['a', 'b'], - }, - response_format: { - type: 'object', - properties: { c: { type: 'number' } }, - required: ['c'], - }, - }, -) -` - -/** Skill doc after expanding the discovery section (from top-level.md). */ -const III_SKILL_MD_AFTER_DISCOVERY = III_SKILL_MD.replace( - `## Need a capability? Discover before you build - -1. **Look at what is already registered** — \`engine::functions::list\`. -2. **Search the public registry** — \`directory::registry::workers::list\`. -3. **Build a worker** — only when steps 1 and 2 come up empty. - -> Discover in order. Don't jump to a worker you remember; the registry may hold -> a better fit.`, - `## Need a capability? Discover before you build — in this order +/** Replacement for the discovery section (SKILL.md lines 35–42). */ +const DISCOVERY_SECTION_V2 = `## Need a capability? Discover before you build — in this order The most common harness mistake is reimplementing something that already exists. Work the steps in order; stop at the first that satisfies the need. @@ -145,13 +116,27 @@ If a registered function fits, just call it: ## Trust runtime probes over introspection -\`engine::*::list\` reads can come back empty for blurred reasons. **Disambiguate -with a runtime probe** — call the function with \`iii.trigger(...)\`. 
If the probe -succeeds, the registration is live regardless of what \`*::list\` reported.`, -) +Probe with \`iii.trigger(...)\` before re-registering: a successful call proves +the registration is live regardless of what \`engine::*::list\` reported. +` -const III_SKILL_MD_WITH_HARNESS_BANNER = ` -${III_SKILL_MD}` +const DISCOVERY_V2_LINES = DISCOVERY_SECTION_V2.replace(/\n$/, '').split('\n') + +/* The update_lines op replaces SKILL.md lines 35..42 (the old discovery + * section through EOF). The post-apply echo region is the new section plus + * 2 lines of leading context (the table row + blank at 33–34); there is no + * trailing context at EOF. update_file.rs::build_line_echo keeps the first + * and last 8 lines and elides the middle. */ +const DISCOVERY_ECHO_REGION = [ + '| Trigger | A `(type, config, function_id)` triple. | A worker + a caller |', + '', + ...DISCOVERY_V2_LINES, +] +const DISCOVERY_ECHO_LINES = [ + ...DISCOVERY_ECHO_REGION.slice(0, 8), + ...DISCOVERY_ECHO_REGION.slice(-8), +] +const DISCOVERY_ECHO_ELIDED = DISCOVERY_ECHO_REGION.length - 16 const DEMO_PACKAGE_JSON = `{ "name": "demo-worker", @@ -166,40 +151,13 @@ const DEMO_PACKAGE_JSON = `{ } ` -const SKILL_OVERWRITE_MD = `--- -name: iii -description: >- - WebSocket-routed worker mesh — updated discovery guidance for harness agents. ---- - -# iii - -## Need a capability? Discover before you build — in this order - -The most common harness mistake is reimplementing something that already exists. -Work the steps in order; stop at the first that satisfies the need. - -**1. Look at what is already registered in the engine.** - -\`\`\`jsonc -// engine::functions::list — every function on this engine. -// Filter with { prefix: 'svc::' } or { search: 'resize' }. -// engine::workers::list — every connected worker. -\`\`\` - -If a registered function fits, just call it: -\`iii.trigger({ function_id, payload })\`. - -**2. Search the public registry** via \`directory::registry::workers::list\`. - -**3. 
Build a worker** only when steps 1 and 2 both come up empty. - -## Trust runtime probes over introspection - -\`engine::*::list\` reads can come back empty for blurred reasons. **Disambiguate -with a runtime probe** — call the function with \`iii.trigger(...)\`. If the probe -succeeds, the registration is live regardless of what \`*::list\` reported. -` +/** The one C211 wording (error.rs::C211_SUFFIX) — message carries no code prefix. */ +function c211(path: string) { + return { + code: 'C211', + message: `${path}: not found or not accessible. Verify the path with coder::list-folder or coder::tree.`, + } +} function base( id: string, @@ -220,7 +178,9 @@ function base( } } -/** Hero fixture: new TypeScript worker with JSON Schema metadata — rich TS diff. */ +/* ---------------- create-file ---------------- */ + +/** Hero fixture: new TypeScript worker with JSON Schema metadata — rich TS body. */ export const coderCreateSingle = base( 'coder-create-1', 'coder::create-file', @@ -238,7 +198,7 @@ export const coderCreateSingle = base( { results: [ { - path: 'workers/demo/src/index.ts', + path: '/work/workers/demo/src/index.ts', success: true, bytes_written: byteLen(DEMO_WORKER_TS), }, @@ -263,7 +223,7 @@ export const coderCreateSkillDoc = base( { results: [ { - path: 'workers/iii/skills/SKILL.md', + path: '/work/workers/iii/skills/SKILL.md', success: true, bytes_written: byteLen(III_SKILL_MD), }, @@ -271,7 +231,7 @@ export const coderCreateSkillDoc = base( }, ) -/** Multi-file scaffold — two diffs stacked in one call. */ +/** Multi-file scaffold — two bodies stacked in one call. 
*/ export const coderCreateMultiScaffold = base( 'coder-create-scaffold', 'coder::create-file', @@ -292,12 +252,12 @@ export const coderCreateMultiScaffold = base( wrapHarness({ results: [ { - path: 'workers/demo/src/index.ts', + path: '/work/workers/demo/src/index.ts', success: true, bytes_written: byteLen(DEMO_WORKER_TS), }, { - path: 'workers/demo/package.json', + path: '/work/workers/demo/package.json', success: true, bytes_written: byteLen(DEMO_PACKAGE_JSON), }, @@ -305,6 +265,8 @@ export const coderCreateMultiScaffold = base( }), ) +/** Per-entry WireError: .env matches non_accessible_globs → C211 for that + * entry only; the path echoes the caller's input verbatim (resolution failed). */ export const coderCreateMultiPartialFail = base( 'coder-create-multi', 'coder::create-file', @@ -327,10 +289,10 @@ export const coderCreateMultiPartialFail = base( path: '.env', success: false, bytes_written: 0, - error: 'C211: path is not accessible', + error: c211('.env'), }, { - path: 'workers/demo/.gitignore', + path: '/work/workers/demo/.gitignore', success: true, bytes_written: byteLen('node_modules/\ndist/\n'), }, @@ -338,30 +300,6 @@ export const coderCreateMultiPartialFail = base( }, ) -/** Overwrite an existing skill — File view with substantial markdown body. */ -export const coderCreateOverwrite = base( - 'coder-create-ow', - 'coder::create-file', - { - files: [ - { - path: 'workers/iii/skills/SKILL.md', - content: SKILL_OVERWRITE_MD, - overwrite: true, - }, - ], - }, - wrapHarness({ - results: [ - { - path: 'workers/iii/skills/SKILL.md', - success: true, - bytes_written: byteLen(SKILL_OVERWRITE_MD), - }, - ], - }), -) - export const coderCreatePending = base( 'coder-create-pending', 'coder::create-file', @@ -394,6 +332,9 @@ export const coderCreateRunning = base( { running: true }, ) +/* ---------------- update-file ---------------- */ + +/** Insert + update_lines + replace in one file; per-op post-apply echoes. 
*/ export const coderUpdateMixedOps = base( 'coder-update-ops', 'coder::update-file', @@ -418,6 +359,7 @@ export const coderUpdateMixedOps = base( op: 'replace', pattern: 'demo::add', replacement: 'demo::sum', + expect_matches: 1, }, ], }, @@ -426,18 +368,48 @@ export const coderUpdateMixedOps = base( { results: [ { - path: 'workers/demo/src/index.ts', + path: '/work/workers/demo/src/index.ts', success: true, applied: 3, - new_line_count: 35, - before: DEMO_WORKER_TS, - after: DEMO_WORKER_AFTER_TS, + new_line_count: 30, + echoes: [ + { + op_index: 0, + from_line: 1, + lines: [ + "import type { Logger } from 'iii-sdk'", + "import { registerWorker } from 'iii-sdk'", + '', + ], + }, + { + op_index: 1, + from_line: 2, + lines: [ + "import { registerWorker } from 'iii-sdk'", + '', + 'const iii = registerWorker(process.env.III_ENGINE_URL!, {', + " workerName: 'demo',", + ' invocationTimeoutMs: 30_000,', + '})', + '', + 'iii.registerFunction(', + ], + }, + { + op_index: 2, + from_line: 10, + lines: [" 'demo::sum',"], + total_replacements: 1, + }, + ], + echoes_truncated: false, }, ], }, ) -/** Markdown skill — expanded discovery section with jsonc example block. */ +/** Markdown skill — section rewrite whose line-op echo elides the middle. */ export const coderUpdateSkillDiscovery = base( 'coder-update-skill', 'coder::update-file', @@ -448,31 +420,9 @@ export const coderUpdateSkillDiscovery = base( ops: [ { op: 'update_lines', - from_line: 34, - to_line: 41, - content: `## Need a capability? Discover before you build — in this order - -The most common harness mistake is reimplementing something that already exists. -Work the steps in order; stop at the first that satisfies the need. - -**1. Look at what is already registered in the engine.** - -\`\`\`jsonc -// engine::functions::list — every function on this engine. -// engine::workers::list — every connected worker. -\`\`\` - -If a registered function fits, just call it: -\`iii.trigger({ function_id, payload })\`. 
- -**2. Search the public registry** via \`directory::registry::workers::list\`. - -**3. Build a worker** only when steps 1 and 2 both come up empty. - -## Trust runtime probes over introspection - -Disambiguate with a runtime probe — call \`iii.trigger(...)\` before re-registering. -`, + from_line: 35, + to_line: 42, + content: DISCOVERY_SECTION_V2, }, ], }, @@ -481,40 +431,51 @@ Disambiguate with a runtime probe — call \`iii.trigger(...)\` before re-regist wrapHarness({ results: [ { - path: 'workers/iii/skills/SKILL.md', + path: '/work/workers/iii/skills/SKILL.md', success: true, applied: 1, - new_line_count: 52, - before: III_SKILL_MD, - after: III_SKILL_MD_AFTER_DISCOVERY, + new_line_count: 34 + DISCOVERY_V2_LINES.length, + echoes: [ + { + op_index: 0, + from_line: 33, + lines: DISCOVERY_ECHO_LINES, + elided: DISCOVERY_ECHO_ELIDED, + }, + ], + echoes_truncated: false, }, ], }), ) -/** Two-file batch: worker refactor + package.json version bump. */ -export const coderUpdateMultiFile = base( - 'coder-update-multi', +/** Replace-site echoes: a multi-line region (first+last line, inner elided) + * and a bulk rename whose 12 matches exceed the 5-site echo cap. 
*/ +export const coderUpdateReplaceSites = base( + 'coder-update-sites', 'coder::update-file', { files: [ { - path: 'workers/demo/src/index.ts', + path: 'workers/demo/src/adapters.ts', ops: [ { op: 'replace', - pattern: 'demo::add', - replacement: 'demo::sum', + pattern: '// BEGIN legacy exports.*?// END legacy exports', + replacement: + "// BEGIN adapter exports (generated)\nexport { LibkrunAdapter } from './libkrun'\nexport type { AdapterBootArgs } from './types'\n// END adapter exports (generated)", + dot_matches_newline: true, + expect_matches: 1, }, ], }, { - path: 'workers/demo/package.json', + path: 'workers/demo/src/events.ts', ops: [ { op: 'replace', - pattern: '"iii-sdk": "\\^0.12.0"', - replacement: '"iii-sdk": "^0.13.0"', + pattern: 'emitLegacyEvent', + replacement: 'emitEvent', }, ], }, @@ -523,25 +484,42 @@ export const coderUpdateMultiFile = base( { results: [ { - path: 'workers/demo/src/index.ts', + path: '/work/workers/demo/src/adapters.ts', success: true, applied: 1, - new_line_count: 35, - before: DEMO_WORKER_TS, - after: DEMO_WORKER_TS.replace("'demo::add'", "'demo::sum'"), + new_line_count: 31, + echoes: [ + { + op_index: 0, + from_line: 12, + lines: [ + '// BEGIN adapter exports (generated)', + '// END adapter exports (generated)', + ], + elided: 2, + total_replacements: 1, + }, + ], + echoes_truncated: false, }, { - path: 'workers/demo/package.json', + path: '/work/workers/demo/src/events.ts', success: true, applied: 1, - new_line_count: 10, - before: DEMO_PACKAGE_JSON, - after: DEMO_PACKAGE_JSON.replace('^0.12.0', '^0.13.0'), + new_line_count: 188, + echoes: [18, 44, 71, 102, 130].map((line) => ({ + op_index: 0, + from_line: line, + lines: [` emitEvent('boot', { vmId })`], + total_replacements: 12, + })), + echoes_truncated: false, }, ], }, ) +/** Per-entry WireError on one file; the other applies and echoes normally. 
*/ export const coderUpdatePartialFail = base( 'coder-update-fail', 'coder::update-file', @@ -570,15 +548,23 @@ export const coderUpdatePartialFail = base( success: false, applied: 0, new_line_count: 0, - error: 'C211: not accessible', + echoes: [], + echoes_truncated: false, + error: c211('.env'), }, { - path: 'workers/iii/skills/SKILL.md', + path: '/work/workers/iii/skills/SKILL.md', success: true, applied: 1, - new_line_count: 48, - before: III_SKILL_MD, - after: III_SKILL_MD_WITH_HARNESS_BANNER, + new_line_count: 43, + echoes: [ + { + op_index: 0, + from_line: 1, + lines: ['', '---', 'name: iii'], + }, + ], + echoes_truncated: false, }, ], }), @@ -605,18 +591,21 @@ export const coderUpdatePending = base( { pendingApproval: true }, ) +/* ---------------- delete-file ---------------- */ + export const coderDeleteRecursive = base( 'coder-delete-rec', 'coder::delete-file', { paths: ['workers/demo/dist/', 'workers/demo/.turbo/'], recursive: true }, { results: [ - { path: 'workers/demo/dist/', success: true, removed: true }, - { path: 'workers/demo/.turbo/', success: true, removed: true }, + { path: '/work/workers/demo/dist', success: true, removed: true }, + { path: '/work/workers/demo/.turbo', success: true, removed: true }, ], }, ) +/** success + !removed = idempotent "already absent". */ export const coderDeleteIdempotent = base( 'coder-delete-miss', 'coder::delete-file', @@ -640,6 +629,503 @@ export const coderDeleteRunning = base( { running: true }, ) +/* ---------------- move ---------------- */ + +/** Rename + cross-root move + no-op self-move (success + !moved = unchanged). 
*/ +export const coderMoveBatch = base( + 'coder-move-batch', + 'coder::move', + { + files: [ + { from: 'workers/demo/src/index.ts', to: 'workers/demo/src/main.ts' }, + { + from: 'workers/demo/build/output.tar.gz', + to: '/tmp/coder-cache/output.tar.gz', + overwrite: true, + }, + { from: 'workers/demo/notes.md', to: './workers/demo/notes.md' }, + ], + }, + wrapHarness({ + results: [ + { + from: '/work/workers/demo/src/index.ts', + to: '/work/workers/demo/src/main.ts', + success: true, + moved: true, + }, + { + from: '/work/workers/demo/build/output.tar.gz', + to: '/tmp/coder-cache/output.tar.gz', + success: true, + moved: true, + }, + { + from: '/work/workers/demo/notes.md', + to: '/work/workers/demo/notes.md', + success: true, + moved: false, + }, + ], + }), +) + +/** Per-entry C217: destination exists and overwrite was not passed. */ +export const coderMovePartialFail = base( + 'coder-move-fail', + 'coder::move', + { + files: [ + { from: 'workers/demo/src/index.ts', to: 'workers/demo/src/main.ts' }, + { from: 'workers/demo/README.md', to: 'workers/demo/docs/README.md' }, + ], + }, + { + results: [ + { + from: '/work/workers/demo/src/index.ts', + to: '/work/workers/demo/src/main.ts', + success: true, + moved: true, + }, + { + from: '/work/workers/demo/README.md', + to: '/work/workers/demo/docs/README.md', + success: false, + moved: false, + error: { + code: 'C217', + message: + '/work/workers/demo/docs/README.md already exists; pass overwrite=true to replace', + }, + }, + ], + }, +) + +export const coderMovePending = base( + 'coder-move-pending', + 'coder::move', + { + files: [ + { from: 'workers/demo/src/index.ts', to: 'workers/demo/src/main.ts' }, + ], + }, + undefined, + { pendingApproval: true }, +) + +/* ---------------- read-file ---------------- */ + +/** Single-path full read — scalar response fields, fully traversed. 
*/ +export const coderReadSingle = base( + 'coder-read-single', + 'coder::read-file', + { path: 'workers/demo/src/index.ts' }, + { + path: '/work/workers/demo/src/index.ts', + content: DEMO_WORKER_TS, + is_utf8: true, + lines_returned: 26, + total_lines: 26, + more_lines: false, + size: byteLen(DEMO_WORKER_TS), + mode: 0o644, + mtime: MTIME, + }, +) + +/** Numbered window — `N→` prefixes from the wire, absent total_lines + * (the stream never reached EOF), more_lines feeding the next-window hint. */ +export const coderReadWindowNumbered = base( + 'coder-read-window', + 'coder::read-file', + { + path: 'workers/iii/skills/SKILL.md', + line_from: 26, + line_to: 33, + numbered: true, + }, + { + path: '/work/workers/iii/skills/SKILL.md', + content: numberWindow(III_SKILL_MD, 26, 33), + is_utf8: true, + lines_returned: 8, + total_lines: null, + more_lines: true, + size: byteLen(III_SKILL_MD), + mode: 0o644, + mtime: MTIME, + }, +) + +/** Stat probe on an over-cap file: size/mode/mtime populate, but + * total_lines/is_utf8 stay null (file exceeds max_read_bytes) — a SUCCESS. */ +export const coderReadStat = base( + 'coder-read-stat', + 'coder::read-file', + { path: 'logs/engine.jsonl', stat: true }, + { + path: '/work/logs/engine.jsonl', + content: null, + is_utf8: null, + lines_returned: 0, + total_lines: null, + more_lines: false, + size: 18_874_368, + mode: 0o644, + mtime: MTIME, + }, +) + +/** Batch read: bare-string and object targets, plus a per-entry C211. 
*/ +export const coderReadBatch = base( + 'coder-read-batch', + 'coder::read-file', + { + paths: [ + 'workers/demo/package.json', + { + path: 'workers/iii/skills/SKILL.md', + line_from: 1, + line_to: 6, + numbered: true, + }, + '.env', + ], + }, + wrapHarness({ + results: [ + { + path: '/work/workers/demo/package.json', + success: true, + content: DEMO_PACKAGE_JSON, + is_utf8: true, + lines_returned: 11, + total_lines: 11, + more_lines: false, + size: byteLen(DEMO_PACKAGE_JSON), + mode: 0o644, + mtime: MTIME, + }, + { + path: '/work/workers/iii/skills/SKILL.md', + success: true, + content: numberWindow(III_SKILL_MD, 1, 6), + is_utf8: true, + lines_returned: 6, + total_lines: null, + more_lines: true, + size: byteLen(III_SKILL_MD), + mode: 0o644, + mtime: MTIME, + }, + { path: '.env', success: false, error: c211('.env') }, + ], + }), +) + +/* ---------------- search ---------------- */ + +/** Content matches with before/after context — straight-to-update-file flow. */ +export const coderSearchContext = base( + 'coder-search-ctx', + 'coder::search', + { + query: 'registerFunction', + path: 'workers', + include_globs: ['**/*.ts'], + context_lines_before: 1, + context_lines_after: 2, + search_paths: false, + }, + { + content_matches: [ + { + path: '/work/workers/demo/src/index.ts', + line: 5, + column: 5, + text: 'iii.registerFunction(', + before: [''], + after: [ + " 'demo::add',", + ' async (payload: { a: number; b: number }) => {', + ], + }, + { + path: '/work/workers/todo/src/index.ts', + line: 12, + column: 5, + text: "iii.registerFunction('todo::add', async (payload: AddTodo) => {", + before: ['// register the public surface'], + after: [' const todo = await store.add(payload)', ' return { todo }'], + }, + ], + path_matches: [], + truncated: false, + }, +) + +/** Budget-truncated search — `truncated: true` means refine, don't paginate. 
*/ +export const coderSearchTruncated = base( + 'coder-search-trunc', + 'coder::search', + { query: 'TODO', ignore_case: true, use_default_excludes: false }, + wrapHarness({ + content_matches: [ + { + path: '/work/workers/demo/src/index.ts', + line: 1, + column: 4, + text: '// TODO: wire Logger from iii-sdk', + }, + { + path: '/work/workers/todo/src/store.ts', + line: 7, + column: 27, + text: 'export async function add(todo: AddTodo): Promise {', + }, + { + path: '/work/workers/todo/node_modules/iii-sdk/dist/index.js', + line: 1402, + column: 19, + text: '/* eslint-disable todo-comments */', + }, + ], + path_matches: [ + { path: '/work/workers/todo' }, + { path: '/work/workers/todo/TODO.md' }, + ], + truncated: true, + }), +) + +/* ---------------- tree ---------------- */ + +/** Snapshot with all three truncation stubs (default_exclude, max_depth, + * per_folder_limit + total) and a non_accessible leaf. Hints verbatim from + * tree.rs. Every node carries `non_accessible` explicitly — the golden + * requires it (tree.rs's omit-when-false serde skip is a worker-side + * serializer/golden split; the parser's `.default(false)` covers the live + * wire and is unit-tested in parsers.test.ts). 
*/ +export const coderTreeSnapshot = base( + 'coder-tree-snap', + 'coder::tree', + { path: 'workers/demo', max_depth: 2, per_folder_limit: 5 }, + { + path: '/work/workers/demo', + root: { + name: 'demo', + kind: 'dir', + size: 288, + mtime: MTIME, + non_accessible: false, + children: [ + { + name: '.env', + kind: 'file', + size: 64, + mtime: MTIME, + non_accessible: true, + }, + { + name: 'fixtures', + kind: 'dir', + size: 4096, + mtime: MTIME, + non_accessible: false, + children: [ + { + name: 'add.json', + kind: 'file', + size: 212, + mtime: MTIME, + non_accessible: false, + }, + { + name: 'boot.json', + kind: 'file', + size: 198, + mtime: MTIME, + non_accessible: false, + }, + { + name: 'echo.json', + kind: 'file', + size: 240, + mtime: MTIME, + non_accessible: false, + }, + { + name: 'list.json', + kind: 'file', + size: 187, + mtime: MTIME, + non_accessible: false, + }, + { + name: 'sum.json', + kind: 'file', + size: 224, + mtime: MTIME, + non_accessible: false, + }, + ], + truncated: { + reason: 'per_folder_limit', + shown: 5, + total: 48, + hint: 'use coder::list-folder for paginated access to all entries', + }, + }, + { + name: 'node_modules', + kind: 'dir', + size: 4096, + mtime: MTIME, + non_accessible: false, + truncated: { + reason: 'default_exclude', + shown: 0, + hint: 'folder matches default_exclude_globs (coder::info lists them); re-call coder::tree with use_default_excludes: false to descend', + }, + }, + { + name: 'package.json', + kind: 'file', + size: byteLen(DEMO_PACKAGE_JSON), + mtime: MTIME, + non_accessible: false, + }, + { + name: 'src', + kind: 'dir', + size: 160, + mtime: MTIME, + non_accessible: false, + children: [ + { + name: 'adapters.ts', + kind: 'file', + size: 1184, + mtime: MTIME, + non_accessible: false, + }, + { + name: 'index.ts', + kind: 'file', + size: byteLen(DEMO_WORKER_TS), + mtime: MTIME, + non_accessible: false, + }, + { + name: 'lib', + kind: 'dir', + size: 96, + mtime: MTIME, + non_accessible: false, + truncated: { + 
reason: 'max_depth', + shown: 0, + hint: 'raise max_depth or call coder::tree with this path as the new root', + }, + }, + ], + }, + ], + }, + }, +) + +/* ---------------- list-folder ---------------- */ + +/** Page 2 of a paginated listing; one non_accessible entry; has_more on. */ +export const coderListFolderPage = base( + 'coder-list-page', + 'coder::list-folder', + { path: 'workers', page: 2, page_size: 5 }, + { + path: '/work/workers', + entries: [ + { + name: 'echo', + kind: 'dir', + size: 192, + mtime: MTIME, + non_accessible: false, + }, + { + name: 'iii', + kind: 'dir', + size: 256, + mtime: MTIME, + non_accessible: false, + }, + { + name: 'notes.local.md', + kind: 'file', + size: 1832, + mtime: MTIME, + non_accessible: false, + }, + { + name: 'secrets', + kind: 'dir', + size: 96, + mtime: MTIME, + non_accessible: true, + }, + { + name: 'todo', + kind: 'dir', + size: 224, + mtime: MTIME, + non_accessible: false, + }, + ], + page: 2, + page_size: 5, + total: 23, + has_more: true, + }, +) + +/* ---------------- info ---------------- */ + +/** Pure discovery — values mirror config.rs defaults plus a two-root jail. 
*/ +export const coderInfo = base( + 'coder-info', + 'coder::info', + {}, + wrapHarness({ + base_paths: ['/work', '/tmp/coder-cache'], + primary_root: '/work', + batch_read_budget_bytes: 1_048_576, + max_output_bytes: 131_072, + max_read_bytes: 10_485_760, + max_write_bytes: 10_485_760, + default_exclude_globs: [ + '**/.git/**', + '**/node_modules/**', + '**/target/**', + '**/dist/**', + '**/.venv/**', + '**/__pycache__/**', + ], + non_accessible_globs: ['**/.env', '**/.env.*', '**/secrets/**'], + list_default_page_size: 100, + list_max_page_size: 1000, + search_default_max_line_bytes: 4096, + search_default_max_matches: 1000, + search_response_budget_bytes: 262_144, + tree_default_depth: 4, + tree_per_folder_limit: 50, + version: '0.4.1', + }), +) + +/* ---------------- top-level error ---------------- */ + export const coderGateError = base( 'coder-gate-err', 'coder::create-file', @@ -671,16 +1157,27 @@ export const coderFixtures = [ coderCreateSkillDoc, coderCreateMultiScaffold, coderCreateMultiPartialFail, - coderCreateOverwrite, coderCreatePending, coderCreateRunning, coderUpdateMixedOps, coderUpdateSkillDiscovery, - coderUpdateMultiFile, + coderUpdateReplaceSites, coderUpdatePartialFail, coderUpdatePending, coderDeleteRecursive, coderDeleteIdempotent, coderDeleteRunning, + coderMoveBatch, + coderMovePartialFail, + coderMovePending, + coderReadSingle, + coderReadWindowNumbered, + coderReadStat, + coderReadBatch, + coderSearchContext, + coderSearchTruncated, + coderTreeSnapshot, + coderListFolderPage, + coderInfo, coderGateError, ] as const diff --git a/console/web/src/stories/fixtures/worker-fixtures.ts b/console/web/src/stories/fixtures/worker-fixtures.ts index 6dbfd25e..fc9095b4 100644 --- a/console/web/src/stories/fixtures/worker-fixtures.ts +++ b/console/web/src/stories/fixtures/worker-fixtures.ts @@ -77,6 +77,99 @@ export const workerListRunning = base( { running: true }, ) +/* ---------------- worker::status ---------------- */ + +/** Healthy managed OCI 
worker: running, real pid, version, both tails populated. */ +export const workerStatusRunning = base( + 'worker-status-running', + 'worker::status', + { name: 'pdfkit' }, + wrapHarness({ + name: 'pdfkit', + installed: true, + worker_type: 'oci', + running: true, + pid: 28943, + version: '1.0.0', + logs_dir: '/Users/anderson/.iii/logs/pdfkit', + stderr_tail: [], + stdout_tail: [ + '[pdfkit] booting render pool', + '[pdfkit] listening on :4101', + ], + hint: 'worker is healthy; trigger it with `pdfkit::render`.', + }), +) + +/** Crashed worker: installed, not running, failure captured in stderr_tail. */ +export const workerStatusStopped = base( + 'worker-status-stopped', + 'worker::status', + { name: 'pdfkit' }, + { + name: 'pdfkit', + installed: true, + worker_type: 'local', + running: false, + pid: null, + version: '1.0.0', + logs_dir: '/Users/anderson/.iii/logs/pdfkit', + stderr_tail: [ + 'npm ERR! code ELIFECYCLE', + 'npm ERR! pdfkit@1.0.0 start: `node server.js`', + 'npm ERR! Exit status 1', + ], + stdout_tail: [], + hint: 'worker crashed on boot; inspect stderr then `worker::start pdfkit`.', + }, +) + +/** Mid-boot worker: installed, not running, both tails empty (still provisioning). */ +export const workerStatusProvisioning = base( + 'worker-status-provisioning', + 'worker::status', + { name: 'pdfkit' }, + { + name: 'pdfkit', + installed: true, + worker_type: 'oci', + running: false, + pid: null, + version: null, + logs_dir: null, + stderr_tail: [], + stdout_tail: [], + hint: 'worker is provisioning; re-check `worker::status pdfkit` shortly.', + }, +) + +/** Unknown worker: not declared in config.yaml. 
*/ +export const workerStatusNotInstalled = base( + 'worker-status-not-installed', + 'worker::status', + { name: 'ghost' }, + { + name: 'ghost', + installed: false, + worker_type: 'not-installed', + running: false, + pid: null, + version: null, + logs_dir: null, + stderr_tail: [], + stdout_tail: [], + hint: 'not declared in config.yaml; run `worker::add` first.', + }, +) + +export const workerStatusRunningLoading = base( + 'worker-status-loading', + 'worker::status', + { name: 'pdfkit' }, + undefined, + { running: true }, +) + /* ---------------- worker::start ---------------- */ export const workerStartDone = base( @@ -247,6 +340,11 @@ export const workerFixtures = [ workerListRunningOnly, workerListEmpty, workerListRunning, + workerStatusRunning, + workerStatusStopped, + workerStatusProvisioning, + workerStatusNotInstalled, + workerStatusRunningLoading, workerStartDone, workerStartNoPid, workerStopDone, diff --git a/console/web/src/stories/playground/scenarios/coder-mutate.ts b/console/web/src/stories/playground/scenarios/coder-mutate.ts index 6b101835..ebcb4cb0 100644 --- a/console/web/src/stories/playground/scenarios/coder-mutate.ts +++ b/console/web/src/stories/playground/scenarios/coder-mutate.ts @@ -1,4 +1,7 @@ -import { coderCreateSkillDoc } from '@/stories/fixtures/coder-fixtures' +import { + coderCreateSkillDoc, + coderTreeSnapshot, +} from '@/stories/fixtures/coder-fixtures' import { makeBackend, streamAssistant, @@ -10,7 +13,16 @@ export const coderMutate = makeBackend( 'coder-mutate', async function* (_prompt, _mode, _model, opts) { const signal = opts?.signal - yield* streamThought('scaffolding the iii skill doc…', { signal }) + yield* streamThought('scouting the workers tree, then scaffolding…', { + signal, + }) + yield* streamFcall({ + functionId: 'coder::tree', + input: coderTreeSnapshot.input, + output: coderTreeSnapshot.output, + waitMs: 500, + signal, + }) yield* streamFcall({ functionId: 'coder::create-file', input: coderCreateSkillDoc.input, 
diff --git a/console/web/src/stories/playground/scenarios/coder-update.ts b/console/web/src/stories/playground/scenarios/coder-update.ts index 1ba1d82c..75c7fe78 100644 --- a/console/web/src/stories/playground/scenarios/coder-update.ts +++ b/console/web/src/stories/playground/scenarios/coder-update.ts @@ -1,4 +1,8 @@ -import { coderUpdateSkillDiscovery } from '@/stories/fixtures/coder-fixtures' +import { + coderReadWindowNumbered, + coderSearchContext, + coderUpdateSkillDiscovery, +} from '@/stories/fixtures/coder-fixtures' import { makeBackend, streamAssistant, @@ -10,7 +14,21 @@ export const coderUpdate = makeBackend( 'coder-update', async function* (_prompt, _mode, _model, opts) { const signal = opts?.signal - yield* streamThought('expanding the discovery section in SKILL.md…', { + yield* streamThought('locating the discovery section in SKILL.md…', { + signal, + }) + yield* streamFcall({ + functionId: 'coder::search', + input: coderSearchContext.input, + output: coderSearchContext.output, + waitMs: 500, + signal, + }) + yield* streamFcall({ + functionId: 'coder::read-file', + input: coderReadWindowNumbered.input, + output: coderReadWindowNumbered.output, + waitMs: 500, signal, }) yield* streamFcall({ diff --git a/console/web/src/stories/playground/scenarios/index.ts b/console/web/src/stories/playground/scenarios/index.ts index 3e7a24ae..0d36bf71 100644 --- a/console/web/src/stories/playground/scenarios/index.ts +++ b/console/web/src/stories/playground/scenarios/index.ts @@ -60,7 +60,7 @@ export const SCENARIOS: PlaygroundScenario[] = [ id: 'coder-mutate', label: 'coder · mutate', description: - 'one coder::create-file call writing workers/iii/skills/SKILL.md with Pierre diff.', + 'coder::tree scout, then coder::create-file writing workers/iii/skills/SKILL.md.', group: 'happy paths', preferredMode: 'agent', backend: coderMutate, @@ -69,7 +69,7 @@ export const SCENARIOS: PlaygroundScenario[] = [ id: 'coder-update', label: 'coder · update', description: - 
'coder::update-file on workers/iii/skills/SKILL.md with before/after Pierre diff.', + 'coder::search → numbered coder::read-file window → coder::update-file with post-apply echoes.', group: 'happy paths', preferredMode: 'agent', backend: coderUpdate, diff --git a/shell/src/fs/host.rs b/shell/src/fs/host.rs index f02b57b3..c5db6475 100644 --- a/shell/src/fs/host.rs +++ b/shell/src/fs/host.rs @@ -289,6 +289,12 @@ fn lexical_operand_with(path: &str, host_root_canon: Option<&Path>) -> PathBuf { normalize_lexical(p) } +// MIRROR-INVARIANT: `canonicalize_with_fallback` + `normalize_lexical` +// below are mirrored in `coder/src/path/mod.rs` (the coder worker's +// PathResolver). The two implementations are the same jail-safety +// algorithm and MUST evolve in lockstep — port any fix in one file to +// the other. +// Canonical case matrix for this invariant: `coder/tests/parity.rs`. /// Resolve `p` to a canonical path that is symlink-free for every existing /// ancestor, even when `p` itself doesn't yet exist. The naive fallback — /// "canonicalize, on ENOENT use the lexical path" — is a jail-escape vector