diff --git a/CLAUDE.md b/CLAUDE.md
index a7f2b35..975df67 100644
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -7,7 +7,7 @@ Self-hosted Function-as-a-Service (FaaS) for homelab and on-premises use. Users
 ```bash
 # Docker (recommended)
 docker compose up -d
-# → dashboard at http://localhost:8443
+# → dashboard at http://localhost:3000  (compose maps host 3000 → container 8443)
 
 # Dev mode (frontend hot-reload + backend auto-restart)
 make dev
@@ -16,14 +16,14 @@ make dev
 ## Build Commands
 
 ```bash
-make build          # backend binary → build/orva  (calls adapters-embed)
+make build          # backend binary → build/orva  (calls adapters-embed + docs-embed)
 make build-all      # embed UI then build           (full release artifact)
-make test           # cd backend && go test ./...
-make lint           # cd backend && go vet ./...
+make test           # go test -count=1 ./...  (from repo root)
+make lint           # go vet ./...  (from repo root)
 make ui             # cd frontend && npm install && npm run build
 make embed          # build UI, copy dist/ → backend/internal/server/ui_dist/
-make cli            # static CLI binary → build/orva-cli (current OS)
-make cli-all        # cross-compile CLI: linux/amd64, linux/arm64, darwin/arm64
+make cli            # static CLI binary → build/orva (current OS)
+make cli-all        # cross-compile CLI: linux/{amd64,arm64}, darwin/{amd64,arm64}, windows/{amd64,arm64}
 make adapters-embed # sync runtimes/ → backend/cmd/orva/adapters/ (auto-called by build)
 make docs-embed     # sync docs/reference.md → mcp + frontend (auto-called by build/ui)
 make clean          # remove build/ and embedded artefacts
@@ -38,7 +38,7 @@ backend/          Go server (see backend/CLAUDE.md)
   internal/       Server packages (config, database, pool, proxy, mcp, …)
   runtimes/       Runtime adapter source: node, python
 cli/              Slim standalone CLI codebase (see cli/CLAUDE.md)
-  cmd/orva/       Slim CLI entry point (no server packages — ~12 MB binary)
+  cmd/orva/       Slim CLI entry point (no server packages — ~20 MB binary)
   commands/       Cobra subcommand library — single source of truth for
                   both binaries (server imports it for its CLI surface)
 internal/         Shared utilities accessible to both backend/ and cli/
@@ -47,7 +47,8 @@ internal/         Shared utilities accessible to both backend/ and cli/
 frontend/         Vue 3 dashboard (see frontend/CLAUDE.md)
 docs/             Operator and developer documentation (see docs/CLAUDE.md)
 scripts/          Installers (install.sh = server, install-cli.{sh,ps1} = CLI),
-                  Docker entrypoint, systemd unit, OpenRC unit
+                  Docker entrypoint (entrypoint.sh); the systemd + OpenRC units
+                  are emitted inline by install.sh, not separate files
 test/             Shell-based integration test suite (see test/CLAUDE.md)
   cli/            CLI-specific tests (build matrix, install-cli, upgrade, command-tree)
   install/        Server-install e2e harness (privileged systemd-in-docker)
@@ -58,7 +59,7 @@ Dockerfile        Multi-stage image (dev and production — single file)
 
 ## Data & Configuration
 
-- **Data dir**: `/var/lib/orva` (Docker volume `orva_data`) — contains `orva.db` (SQLite WAL) and `functions/<id>/versions/`
+- **Data dir**: `/var/lib/orva` (Docker volume `orva-data`) — contains `orva.db` (SQLite WAL) and `functions/<id>/versions/`
 - **Server config**: env vars or `/etc/orva/config.yaml`; full reference in `docs/CONFIG.md`
 - **CLI config**: `~/.orva/config.yaml` with `endpoint` and `api_key`
 
@@ -101,9 +102,9 @@ Every server binary stamps three variables via `-X` ldflags at link time. They f
 
 | Variable | Source | Example |
 |---|---|---|
-| `internal/version.Version`   | git tag on release; `git describe` in dev   | `v2026.05.15` |
-| `internal/version.Commit`    | `git rev-parse --short HEAD` (CI: `${GITHUB_SHA::7}`) | `1be3399` |
-| `internal/version.BuildTime` | `date -u +%Y-%m-%dT%H:%M:%SZ` at link time   | `2026-05-15T14:20:34Z` |
+| `backend/internal/version.Version`   | git tag on release; `git describe` in dev   | `v2026.06.14` |
+| `backend/internal/version.Commit`    | `git rev-parse --short HEAD` (CI: `${GITHUB_SHA::7}`) | `1be3399` |
+| `backend/internal/version.BuildTime` | `date -u +%Y-%m-%dT%H:%M:%SZ` at link time   | `2026-05-15T14:20:34Z` |
 
 Go silently ignores unknown `-X` targets, so renaming the version package or any of its variables MUST be done in lock-step across `Makefile`, `Dockerfile`, and `.github/workflows/release.yml` — otherwise the binary ships with defaults (`"dev"` / `"unknown"`) and the dashboard's Build info card lights up red flags.
 
@@ -114,4 +115,4 @@ Go silently ignores unknown `-X` targets, so renaming the version package or any
 - **UI is embedded** in the Go binary via `//go:embed ui_dist`; `make build` alone reuses the last embedded snapshot. Run `make build-all` (or `make embed` first) to pick up frontend changes.
 - **nsjail required on Linux** for sandbox invocations; the server starts without it but every invocation fails until it is installed.
 - **Firewall (nft) probe is lazy** — the nftables package does not probe on import; it probes on first use via `sync.Once`, so CLI invocations do not trigger nft warnings.
-- **Docs single source:** `docs/reference.md` is the canonical Orva reference markdown. `make docs-embed` ships copies to `backend/internal/mcp/reference.md` (embedded by the `get_orva_docs` MCP tool) and `frontend/public/docs.md` (served at `/docs.md` and read by the dashboard's Copy as Markdown button). Both consumers serve identical bytes.
+- **Docs single source:** `docs/reference.md` is the canonical Orva reference markdown. `make docs-embed` ships copies to `backend/internal/mcp/reference.md` (embedded by the `get_orva_docs` MCP tool), `frontend/public/docs.md` (served at `/docs.md` and read by the dashboard's Copy as Markdown button), and `cli/commands/reference.md` (embedded into the CLI, served by `orva docs`). All three consumers serve identical bytes.
diff --git a/backend/CLAUDE.md b/backend/CLAUDE.md
index 54462c5..96bd653 100644
--- a/backend/CLAUDE.md
+++ b/backend/CLAUDE.md
@@ -39,7 +39,7 @@ go vet ./...
 | `metrics` | Prometheus-text counters + histograms (no external deps, atomic ops) |
 | `secrets` | AES-256-GCM encrypted secrets per function |
 | `scheduler` | Cron runner (`robfig/cron/v3`) |
-| `mcp` | MCP server (go-sdk); 70 operator-management tools OR channel-mode (one tool per bundled function, invoke-only). Auth accepts API keys, OAuth 2.1 access tokens, OR channel tokens. |
+| `mcp` | MCP server (go-sdk); 72 operator-management tools OR channel-mode (one tool per bundled function, invoke-only). Auth accepts API keys, OAuth 2.1 access tokens, OR channel tokens. |
 | `oauth` | OAuth 2.1 authorization server (RFC 7591 DCR + RFC 8414 metadata + PKCE S256 + RFC 8707 resource indicators + RFC 7009 revocation). Lets claude.ai/ChatGPT add `/mcp` as a custom connector via the browser. Connected apps + sessions managed at `/api/v1/oauth/connected-apps` and `/api/v1/auth/sessions` and surfaced in the dashboard's Settings page. DCR default scope is `read invoke write admin`. |
 | `auth` | Shared `Principal` type (Kind=api_key / oauth / channel + ID/Label/Perms/Channel). Both REST middleware and MCP auth resolve the inbound bearer to a `*Principal`; downstream code (activity log, MCP tool registration) consumes the Kind directly. |
 | `trace` | Causal-trace collector + span lifecycle (W3C `traceparent` interop, outlier detection). See `docs/TRACING.md`. |
@@ -50,7 +50,6 @@ go vet ./...
 | `server` | HTTP router + middleware chain + all handlers |
 | `server/events` | SSE event hub + outbound webhook fanout |
 | `server/handlers` | One file per resource group; `respond/` sub-package |
-| `cli` | Shared `Client` + `Config` for CLI subcommands |
 | `backup` | `SnapshotDB` / `ArchiveTo` / `RestoreFrom` helpers |
 | `version` | Single source of truth for the version string |
 | `ai` | In-product AI chat assistant. `Manager` (service layer) wires the SQLite store, the secrets cipher (provider-key encryption), the in-process tool registry, the embedded Bifrost LLM gateway (`ai/llm`), and the agentic loop (`ai/agent`). Served at `/api/v1/ai/*` by `server/ai_handler.go` (SSE for chat/approval). The agent's `defaultSystemPrompt` const lives in `ai/manager.go` — it's a Go **raw string, so it must stay backtick-free** (escape any fenced-code examples by description, not literal ```). |
@@ -59,21 +58,21 @@ go vet ./...
 
 The canonical UUIDv7 generator (`ids`) and HTTP client (`client`) live at **repo-root** `internal/ids/` and `internal/client/` — shared with the slim CLI codebase, not under `backend/internal/`.
 
-## CLI Commands (`cmd/orva/`)
+## CLI Commands (`cli/commands/`)
 
-All Cobra subcommands share one binary with the server. `orva serve` starts the daemon; every other command is a CLI client that reads `~/.orva/config.yaml`.
+All Cobra subcommands share one binary with the server. `orva serve` starts the daemon; every other command is a CLI client that reads `~/.orva/config.yaml`. The command library lives at repo-root `cli/commands/` (NOT under `backend/cmd/orva/`, which holds only `main.go`/`serve.go`/`setup.go`/`init_cmd.go` + the embedded `adapters/`); both binaries register it via `commands.NewRoot()`. See `cli/CLAUDE.md`.
 
-Key files: `deploy.go`, `diff.go`, `functions.go`, `invoke.go`, `logs.go`, `cron.go`, `kv.go`, `jobs.go`, `secrets.go`, `webhooks.go`, `routes.go`, `keys.go`, `system.go`, `activity.go`, `completion.go`.
+Key files: `deploy.go`, `deployments.go`, `diff.go`, `rollback.go`, `functions.go`, `invoke.go`, `logs.go`, `executions.go`, `cron.go`, `kv.go`, `jobs.go`, `secrets.go`, `webhooks.go`, `routes.go`, `dns.go`, `firewall.go`, `fixtures.go`, `channels.go`, `traces.go`, `pool.go`, `keys.go`, `system.go`, `backup.go`, `activity.go`, `chat.go`, `docs.go`, `completion.go`.
 
 ## Key Patterns
 
-**Handler responses**: always use `respond.JSON(w, status, val)` / `respond.Error(w, status, "SLUG", "message")` from `server/handlers/respond/`.
+**Handler responses**: always use `respond.JSON(w, status, val)` / `respond.Error(w, status, "SLUG", "message", requestID)` from `server/handlers/respond/` (the last arg is the request ID, often `RequestID(r.Context())` or `""`).
 
 **Invocation funnel**: HTTP, cron, jobs, and F2F calls all go through `Worker.Dispatch()` (sync response) or `Worker.DispatchEx()` (multi-frame streaming). Never invoke nsjail directly from handlers.
 
 **Async DB writes**: execution rows use `database.AsyncInsertExecution*` batch writers — no synchronous DB calls on the hot proxy path.
 
-**Name resolution**: functions can be referenced by UUID or by name. Use `resolveFnID(db, nameOrID)` from `handlers/functions_helpers.go`.
+**Name resolution**: functions can be referenced by UUID or by name. Use the handler method `(h *FunctionHandler) resolveFnID(idOrName string) (string, bool)` in `handlers/functions.go` (sibling copies on `FixtureHandler` / `KVOperatorHandler` / `InboundWebhookHandler`).
 
 **Streaming wire protocol**: `response_start` → `chunk` (base64 body data) → `response_end` frames over the worker's stdin/stdout pipe. `proxy.Forward()` owns the write-loop.
 
@@ -98,4 +97,4 @@ SQLite WAL mode. All migrations in `internal/database/migrations.go` — additiv
 - **AI conversation editing is destructive-tail:** editing or deleting a chat message (`EditMessage` / `DeleteMessage` in `server/ai_handler.go`, backed by `database.DeleteMessagesFromSeq`) truncates the conversation at that message's `seq` — it deletes that message and every message + tool call after it, then (for edit) re-runs the turn. There is no branching history; the tail is gone. `Regenerate` is the same truncate-then-rerun on the last assistant turn.
 - **AI turns are one-per-conversation:** the `ai.Manager` holds a keyed try-lock (`tryLockConv`/`unlockConv`) acquired by every mutating entry point (Chat, Resume, RegenerateLast, EditAndResend, DeleteMessageFrom). An overlapping turn on the same conversation is rejected — SSE `error` for streaming paths, `ai.ErrConversationBusy` → 409 for the JSON delete. `database.InsertMessage` assigns `seq` atomically inside the INSERT (`MAX(seq)+1` subquery); never split it back into a SELECT-then-INSERT.
 - **AI gateway lifecycle:** `ai.Manager.Close()` releases the embedded Bifrost pools and is called from `Server.Shutdown` (via `s.router.ai`). The gateway is built lazily and rebuilt on provider-config change (`invalidateClient`).
-- **Docs single source:** `docs/reference.md` is the canonical Orva reference markdown (~53 KB). `make docs-embed` syncs it to `backend/internal/mcp/reference.md` (embedded by the `get_orva_docs` MCP tool) and `frontend/public/docs.md` (served at `/docs.md` for the dashboard's Copy as Markdown button). Edit the canonical file then run `make docs-embed`; the Vue Docs page is the rendered version (separate templates) and must be updated alongside if content changes.
+- **Docs single source:** `docs/reference.md` is the canonical Orva reference markdown (~68 KB). `make docs-embed` syncs it to `backend/internal/mcp/reference.md` (embedded by the `get_orva_docs` MCP tool), `frontend/public/docs.md` (served at `/docs.md` for the dashboard's Copy as Markdown button), and `cli/commands/reference.md` (embedded into the slim CLI, served by `orva docs`). Edit the canonical file, then run `make docs-embed`; the Vue Docs page is the rendered version (separate templates) and must be updated alongside it if content changes.
diff --git a/cli/CLAUDE.md b/cli/CLAUDE.md
index 841ad6a..35151e7 100644
--- a/cli/CLAUDE.md
+++ b/cli/CLAUDE.md
@@ -45,7 +45,8 @@ cli/
     ├── upgrade.go        # `orva upgrade` (self-update via go-selfupdate)
     ├── webhooks.go       # `orva webhooks …`
     ├── commands_test.go  # command-tree + flag-presence tests
-    ├── chat_test.go      # chat SSE drive + approval-flow tests (httptest)
+    ├── chat_test.go      # chat SSE drive + approval-flow + idle/EOF tests (httptest)
+    ├── upgrade_test.go   # `orva upgrade` decision logic + asset-filter tests
     ├── reference.md      # GENERATED — embedded by docs.go (make docs-embed)
     └── theme/            # lipgloss color palette (theme.New(enabled))
 ```
diff --git a/docs/CLAUDE.md b/docs/CLAUDE.md
index 8cb078a..9749f86 100644
--- a/docs/CLAUDE.md
+++ b/docs/CLAUDE.md
@@ -19,7 +19,7 @@ Human-maintained reference documentation. Keep these in sync when changing API s
 | `SECURITY.md` | Threat model, nsjail sandbox isolation, network firewall (nftables) |
 | `SUPPORT.md` | Support matrix — distros, kernels, container runtimes |
 | `TRACING.md` | Causal trace model, propagation, W3C interop, outlier detection |
-| `reference.md` | **Canonical** Orva reference (~53 KB GFM markdown) — single source of truth shipped to the dashboard's Copy-as-Markdown button (via `frontend/public/docs.md`) and the `get_orva_docs` MCP tool (via `backend/internal/mcp/reference.md`). `make docs-embed` syncs both copies. Uses `{{ORIGIN}}` placeholders that consumers substitute at runtime. |
+| `reference.md` | **Canonical** Orva reference (~68 KB GFM markdown) — single source of truth shipped to the dashboard's Copy-as-Markdown button (via `frontend/public/docs.md`), the `get_orva_docs` MCP tool (via `backend/internal/mcp/reference.md`), and the slim CLI's `orva docs` command (via `cli/commands/reference.md`). `make docs-embed` syncs all three copies. Uses `{{ORIGIN}}` placeholders that consumers substitute at runtime. |
 
 ## Update Triggers
 
diff --git a/frontend/CLAUDE.md b/frontend/CLAUDE.md
index ac56c8a..6d7699e 100644
--- a/frontend/CLAUDE.md
+++ b/frontend/CLAUDE.md
@@ -54,6 +54,6 @@ After `npm run build`, run `make embed` from the repo root to copy `dist/` into
 - Dev proxy: `vite.config.js` proxies `/api` and `/auth` to `http://localhost:8443`. Direct `/fn/`, `/webhook/`, and `/metrics` calls in dev must be made to `:8443` directly — they are not proxied through Vite.
 - `src/stores/events.js` opens a persistent SSE connection on mount and reconnects automatically on drop. Dashboard widgets subscribe to this store — they do not open their own connections.
 - All AI prompt and clipboard operations (`aiPrompts.js`) are purely client-side — no source code is sent over the network.
-- The `Editor.vue` test pane sends requests through the backend (`POST /api/v1/functions/{id}/invoke`) rather than directly to `/fn/` — this ensures auth and capture still apply.
+- The `Editor.vue` test pane invokes the function directly at `/fn/<id>` via `invokeFunctionFull` (the `fnClient` in `src/api/client.js`, baseURL `/fn`) so the method/path/headers/body from the Postman-style pane round-trip exactly. `fnClient`'s request interceptor still injects the `X-Orva-API-Key` header, and all `/fn/` traffic passes through the backend proxy, so auth and execution capture still apply. (Note: `/fn/` is NOT under `/api/v1`, so it needs a separate client.)
 - **AI streaming reactivity (load-bearing):** `stores/ai.js` tracks the streaming assistant message by **index** (`curIdx`) and writes every delta back through the reactive array via `patchAssistant()` (rebuilds `parts` immutably, then `timeline.value[curIdx] = next`). Never hold a raw object reference and mutate `parts[i].text +=` — Vue 3 tracks the array proxy, not the raw ref, so per-token mutations silently fail to re-render. The same index-write rule applies to `tool_result` frames.
 - **AI markdown rendering:** `MessagePart.vue` splits markdown into ordered segments so top-level fenced code becomes a real `<CodeBlock>` while prose stays HTML; parsing is throttled to ~12/s (leading + trailing edge) during streaming. Tables inherit the body font size (no shrink) so tabular output matches prose; the system prompt steers the model toward prose/bullets and reserves tables for genuinely tabular data.
diff --git a/scripts/CLAUDE.md b/scripts/CLAUDE.md
index 8b0b5fa..fa88e1b 100644
--- a/scripts/CLAUDE.md
+++ b/scripts/CLAUDE.md
@@ -12,10 +12,10 @@ Support scripts for deployment and installation. None of these are called by the
 
 ## Gotchas
 
-- `entrypoint.sh` **always overwrites** `adapter.js` / `adapter.py` from the image on every container start — this ensures runtime upgrades roll out even when the user mounts a persistent `orva_data` volume.
+- `entrypoint.sh` **always overwrites** `adapter.js` / `adapter.py` from the image on every container start — this ensures runtime upgrades roll out even when the user mounts a persistent `orva-data` volume.
 - `install.sh` embeds the systemd/OpenRC units and `uninstall.sh`; the bare-metal install writes them to `$PREFIX/share/orva/scripts/` and the generated uninstaller to the same path. Edit the heredocs in `install.sh` — there is no separate unit file.
 - `install.sh --cli-only` installs only the `orva` CLI binary to `/usr/local/bin/orva` — no systemd unit, no rootfs, no service user. Use this on operator laptops or CI runners that talk to a remote Orva over HTTPS.
-- Mode/option precedence is flag > env > interactive prompt > default. Key knobs: `--version`/`ORVA_VERSION` (pin a release), `--dry-run`/`ORVA_INSTALL_DRYRUN=1` (detect only), `--no-pkg`/`ORVA_NO_PKG=1` (skip system packages), `--runtime`/`ORVA_DOCKER_RUNTIME` (force the Docker runtime), `ORVA_SKIP_VERIFY=1` (bypass checksum verification — air-gapped mirrors only; verification is fail-closed otherwise).
-- Downloaded assets (orva, nsjail, rootfs, CLI) are SHA-256 verified against `checksums.txt`; a missing checksum **aborts** the install unless `ORVA_SKIP_VERIFY=1`.
+- Mode/option precedence is flag > env > interactive prompt > default. Key knobs: `--version`/`ORVA_VERSION` (pin a release), `--dry-run`/`ORVA_INSTALL_DRYRUN=1` (detect only), `--no-pkg`/`ORVA_NO_PKG=1` (skip system packages), `--runtime`/`ORVA_DOCKER_RUNTIME` (force the Docker runtime). There is **no** checksum-bypass env var — `ORVA_SKIP_VERIFY` is referenced in a stale `install.sh` comment but is not implemented.
+- Downloaded assets (orva, nsjail, rootfs, CLI) are SHA-256 verified against `checksums.txt`. A checksum **mismatch** aborts the install. A *missing* checksum entry only warns and proceeds in `install.sh` (`verify()` is fail-open on a missing entry); `install-cli.sh` is stricter and aborts when the entry is missing.
 - `build-rootfs.sh` produces large tarballs (~hundreds of MB); run only when updating the rootfs base image or adding system libraries.
 - Cross-distro installer tests: `test/install/matrix.sh` (fast, unprivileged — shellcheck + POSIX parse + dry-run + real CLI install across 6 distros) and the privileged systemd-in-docker harness under `test/install/`. CI: `.github/workflows/install-e2e.yml`.
diff --git a/test/CLAUDE.md b/test/CLAUDE.md
index b68341b..86de7be 100644
--- a/test/CLAUDE.md
+++ b/test/CLAUDE.md
@@ -1,67 +1,79 @@
 # test/
 
-Shell-based integration test suite. Tests run against a **live Orva instance** — they do not start their own server. The backend must be running before executing any test.
+Shell-based integration tests. They run against a **live Orva instance** — they
+do not start their own server. The backend must already be running. (The
+comprehensive, self-spinning suite is the Python one under `test/e2e/` — see
+below; these shell scripts are the ad-hoc checks against a running instance.)
 
-## Running
+## Config
 
-```bash
-# Run everything (writes summary to run-all-results.tsv)
-./test/run-all.sh
+Most scripts read `BASE_URL` + `API_KEY` from the environment (the default port
+is **18443**, not 8443):
 
-# Individual suites
-./test/api-smoke.sh        # core API round-trips
-./test/auth-test.sh        # API key auth + permissions
-./test/rollback-test.sh    # deploy → rollback → redeploy
-./test/routes-test.sh      # custom HTTP route mapping
-./test/secrets-test.sh     # secret injection into sandbox
-./test/egress-test.sh      # nftables outbound allow/deny
-./test/errors-test.sh      # error response shapes
-./test/loadtest.sh         # sustained concurrency
-./test/atscale.sh          # ramp-load scaling
-./test/onboarding-flow.sh  # full user onboarding scenario
-./test/heavy-deploy-test.sh # large deploy + streaming response
-./test/ceiling.sh          # sandbox concurrency ceiling
+```bash
+export BASE_URL=http://localhost:18443
+export API_KEY=orva_...
 ```
 
-## Config
+Most scripts **hard-require** `API_KEY` and do NOT fall back to
+`~/.orva/config.yaml`. Two exceptions — `sdk-test.sh` and `tracing-test.sh` —
+also accept `ORVA_ENDPOINT`/`ORVA_API_KEY` and fall back to `~/.orva/config.yaml`.
+`loadtest.sh` and `ceiling.sh` are standalone (own args / hardcoded host).
 
-Tests read from environment variables, falling back to the CLI config:
+## Running
 
 ```bash
-export ORVA_ENDPOINT=http://localhost:8443
-export ORVA_API_KEY=orva_...
-```
+# Umbrella suite — requires API_KEY set; writes run-all-results.tsv. Runs:
+# secrets, routes, heavy-deploy, onboarding, errors, rollback, egress, auth,
+# tracing, atscale. (api-smoke / loadtest / ceiling / sdk-test are run individually.)
+./test/run-all.sh
 
-If neither is set, tests fall back to `~/.orva/config.yaml`.
+# Individual suites
+./test/api-smoke.sh         # fast smoke of public REST endpoints
+./test/auth-test.sh         # per-function auth_mode + rate limiting
+./test/rollback-test.sh     # deploy → rollback → redeploy
+./test/routes-test.sh       # custom HTTP route mapping
+./test/secrets-test.sh      # secret injection into sandbox
+./test/egress-test.sh       # per-function network_mode toggle (none vs egress)
+./test/errors-test.sh       # error response shapes
+./test/tracing-test.sh      # causal-trace propagation
+./test/sdk-test.sh          # runtime SDK surface (kv.incr/cas, jobs, …)
+./test/onboarding-flow.sh   # browser onboarding/auth flow via curl (no deploy/invoke)
+./test/heavy-deploy-test.sh # large deploy + streaming response
+./test/loadtest.sh          # multi-phase RPS benchmark (hey)
+./test/atscale.sh           # multi-function deploy + isolation verification
+./test/ceiling.sh <api-key> <fn-id> [base-url]  # throughput ceiling ramp
+```
 
 ## Test Files
 
 | File | What it covers |
 |---|---|
-| `api-smoke.sh` | Functions CRUD, deploy, invoke, KV, cron, jobs, webhooks, fixtures, replay |
-| `auth-test.sh` | API key creation/deletion, permission scopes, rate limiting |
+| `api-smoke.sh` | Fast smoke of public REST endpoints: system health/metrics, auth/status, functions CRUD, deploy-inline, invoke, deployments, keys, routes — status-family checks, not deep coverage |
+| `auth-test.sh` | Per-function `auth_mode` (none / platform_key / signed HMAC), `rate_limit_per_min` (429 + Retry-After), invalid auth_mode → 400 VALIDATION |
 | `rollback-test.sh` | Version history, rollback endpoint, redeploy after rollback |
 | `routes-test.sh` | Custom route registration, path-matching, method filtering |
-| `secrets-test.sh` | Secret set/get, injection as env vars inside sandbox |
-| `egress-test.sh` | Firewall allow-list enforcement (allowed vs blocked domains) |
+| `secrets-test.sh` | Secret set/get, injection as env vars inside the sandbox |
+| `egress-test.sh` | Per-function `network_mode` toggle (none = blocked, egress = allowed) end-to-end; runs unconditionally (no auto-skip) |
 | `errors-test.sh` | 4xx/5xx shapes, SLUG codes, user-visible error messages |
-| `loadtest.sh` | Concurrent invocations over 30s with `wrk` or `curl` |
-| `atscale.sh` | Graduated load ramp; results in `atscale-results.tsv` |
-| `onboarding-flow.sh` | Login → deploy → invoke → KV → clean up end-to-end |
-| `heavy-deploy-test.sh` | 5 MB+ deploy + streaming chunked response validation |
-| `ceiling.sh` | Confirms max-concurrent sandbox limit is enforced |
 | `tracing-test.sh` | Causal-trace propagation across HTTP / cron / jobs / F2F |
-| `fixtures/` | Saved JSON payloads used as test inputs by some suites |
+| `sdk-test.sh` | Runtime SDK surface — kv.incr/cas, kv.list cursor, jobs idempotency, etc. (Node + Python handlers); uses `ORVA_ENDPOINT`/`ORVA_API_KEY` or `~/.orva/config.yaml` |
+| `loadtest.sh` | Multi-phase load test with `hey` (`-n`/`-c`): hello, mixed Node/Python, CPU, slow (500ms), error phases |
+| `atscale.sh` | Multi-function deploy + isolation: deploy 20 mixed fns, idle-RAM baseline, hammer 5 with `hey` asserting cross-fn isolation + autoscaler scale counts; TSV to stdout |
+| `ceiling.sh` | Sustained-load ramp (120s/step after a 60s warmup) to find the real throughput ceiling; emits CSV (rps/p50/p95/p99/err/mem). Positional args: `<api-key> <fn-id> [base-url]` |
+| `onboarding-flow.sh` | Auth/session onboarding flow via curl: onboard → session cookie → /auth/me → refresh (token rotation) → logout → SQLite persistence. No deploy/invoke/KV |
+| `heavy-deploy-test.sh` | Large deploy + streaming chunked response validation |
 
 ## Subdirectories
 
+- `e2e/` — comprehensive programmatic **Python (stdlib-only)** E2E suite; spins its own fresh isolated Docker container via `cd test/e2e && python3 run.py`. Covers the full server API + CLI + AI assistant and is the source-of-truth "does everything still work as spec'd" suite. See `test/e2e/CLAUDE.md`.
 - `cli/` — CLI-only harnesses: build matrix, install-cli, upgrade round-trip, command-tree golden diff.
-- `install/` — Server-install e2e (privileged systemd-in-docker across distros + Kata flow).
-- `kata-bench/` — Measures Kata-runtime overhead vs runc baseline.
-- `fixtures/` — Reusable function bodies (Node + Python) consumed by the test suites.
+- `install/` — server-install e2e (privileged systemd-in-docker across distros + Kata flow).
+- `kata-bench/` — benchmarks runc vs kata vs kata-clh (cold-start + ceiling ramp); includes `aggregate.py` + `extended-functional.sh`.
+- `fixtures/` — reusable function handler sources (`node-*/handler.js`, `python-*/handler.py`) deployed by the suites.
 
 ## Notes
 
 - Tests are additive and idempotent where possible — they create resources with unique names and clean up after themselves.
-- `egress-test.sh` requires nftables to be active on the host; it is skipped automatically if nft is absent.
+- `egress-test.sh` asserts the per-function `network_mode` toggle; outbound isolation depends on the host firewall/pasta being configured.
 - `heavy-deploy-test.sh` logs are saved to `heavy-deploy-stream.log` for inspection after the run.
diff --git a/test/e2e/CLAUDE.md b/test/e2e/CLAUDE.md
index 45a8783..bc84afc 100644
--- a/test/e2e/CLAUDE.md
+++ b/test/e2e/CLAUDE.md
@@ -92,7 +92,7 @@ This suite is meant to **grow on every change**:
     cached keys and only evicted on expiry, never on delete. Fixed by sharing
     the cache and evicting on delete (`test_keys.py` now asserts a revoked key
     returns 401 immediately).
-  - REST `GET/DELETE /functions/{id}` resolve by UUID only, not name.
+  - REST `GET /functions/{id}` resolves by UUID only, not name (the handler calls `Registry.Get` directly); `DELETE` and most other id-taking routes resolve by UUID OR name via `resolveFnID`.
   - REST `create_function` is lenient (fills defaults), unlike the strict MCP
     tool — validation tests must use genuinely invalid input.
 - New feature/endpoint/CLI command → it is **not done** until it has a module/cases here.
@@ -108,7 +108,8 @@ This suite is meant to **grow on every change**:
   via `CLIRunner`, confirming CLI↔server parity for each surface.
 - **nsjail-dependent** scenarios (real function **deploy-build** + **invoke**) only
   run when the container's kernel allows nested sandboxing; otherwise they `skip()`.
-- **Function lookup:** REST `GET/DELETE /functions/{id}` resolve by **UUID only**;
-  capture the id from create responses (the agent/MCP layer also accepts a name).
+- **Function lookup:** REST `GET /functions/{id}` resolves by **UUID only**
+  (capture the id from create responses); `DELETE` and most other id-taking
+  routes accept a UUID **or** a name via `resolveFnID`, as does the agent/MCP layer.
 - **Settings** is a shared singleton row; the mock helpers snapshot/restore it so
   modules don't contaminate each other's view of "defaults".