diff --git a/.agents/skills/generate-sandbox-policy/examples.md b/.agents/skills/generate-sandbox-policy/examples.md index 7bf98dd6..ede51113 100644 --- a/.agents/skills/generate-sandbox-policy/examples.md +++ b/.agents/skills/generate-sandbox-policy/examples.md @@ -2,6 +2,13 @@ Examples organized by detail tier — from minimal (just host + intent) to full (complete API docs). +> **TLS note:** TLS termination is automatic. The proxy auto-detects TLS by +> peeking the first bytes of each connection, so there is no need to specify +> `tls: terminate` in policies. The `tls: terminate` and `tls: passthrough` +> values are deprecated. If you have an edge case where auto-detection must +> be bypassed, you can set `tls: skip` to disable TLS interception for that +> endpoint. + --- ## Minimal Tier Examples (host + intent, no API docs) @@ -23,7 +30,7 @@ network_policies: - { path: /usr/local/bin/claude } ``` -No `protocol`, `tls`, `rules`, or `access` — this is pure L4 (host:port + binary identity check). +No `protocol`, `rules`, or `access` — this is pure L4 (host:port + binary identity check). 
--- @@ -41,7 +48,6 @@ network_policies: - host: api.github.com port: 443 protocol: rest - tls: terminate enforcement: enforce access: read-only binaries: @@ -64,7 +70,6 @@ network_policies: - host: integrate.api.nvidia.com port: 443 protocol: rest - tls: terminate enforcement: enforce access: full binaries: @@ -109,13 +114,11 @@ network_policies: - host: api.github.com port: 443 protocol: rest - tls: terminate enforcement: enforce access: read-only - host: api.gitlab.com port: 443 protocol: rest - tls: terminate enforcement: enforce access: read-only binaries: @@ -155,7 +158,6 @@ network_policies: - host: api.openai.com port: 443 protocol: rest - tls: terminate enforcement: enforce rules: - allow: @@ -202,7 +204,6 @@ network_policies: - host: integrate.api.nvidia.com port: 443 protocol: rest - tls: terminate enforcement: enforce rules: - allow: @@ -236,7 +237,6 @@ network_policies: - host: api.github.com port: 443 protocol: rest - tls: terminate enforcement: enforce rules: - allow: @@ -291,7 +291,6 @@ Endpoints: - Methods: GET, HEAD, OPTIONS only - Paths: All paths (user wants to browse freely) - This maps exactly to the `read-only` preset -- Port 443 + L7 rules → needs `tls: terminate` ### Output @@ -303,7 +302,6 @@ network_policies: - host: api.github.com port: 443 protocol: rest - tls: terminate enforcement: enforce access: read-only binaries: @@ -343,7 +341,6 @@ Endpoints: - Scope: `integrate.api.nvidia.com:443` - Methods: POST on `/v1/chat/completions`, GET on `/v1/models` and `/v1/models/*` - No preset fits — need explicit rules -- Port 443 + L7 → `tls: terminate` - Two binaries ### Output @@ -356,7 +353,6 @@ network_policies: - host: integrate.api.nvidia.com port: 443 protocol: rest - tls: terminate enforcement: enforce rules: - allow: @@ -490,7 +486,6 @@ paths: - Tasks: GET, POST, PUT, DELETE on `/projects/*/tasks` and `/projects/*/tasks/*` - Members: GET only on `/projects/*/members` - Admin: No rules = denied by default -- Port 443 + L7 → `tls: terminate` 
### Output @@ -502,7 +497,6 @@ network_policies: - host: pm-api.example.com port: 443 protocol: rest - tls: terminate enforcement: enforce rules: # Projects — full CRUD @@ -606,7 +600,6 @@ network_policies: - host: metrics.corp.com port: 443 protocol: rest - tls: terminate enforcement: enforce rules: - allow: @@ -747,7 +740,6 @@ An exact IP is treated as `/32` — only that specific address is permitted. - host: api.github.com port: 443 protocol: rest - tls: terminate enforcement: enforce access: read-only binaries: @@ -849,7 +841,6 @@ network_policies: - host: api.github.com port: 443 protocol: rest - tls: terminate enforcement: enforce access: read-only binaries: @@ -861,7 +852,6 @@ network_policies: - host: api.anthropic.com port: 443 protocol: rest - tls: terminate enforcement: enforce access: full binaries: diff --git a/architecture/gateway-security.md b/architecture/gateway-security.md index f6598626..4989f69b 100644 --- a/architecture/gateway-security.md +++ b/architecture/gateway-security.md @@ -420,7 +420,7 @@ This section defines the primary attacker profiles, what the current design prot Separate from the cluster mTLS infrastructure, each sandbox has an independent TLS capability for inspecting outbound HTTPS traffic. This is documented here for completeness because it involves a distinct, per-sandbox PKI. -When a sandbox policy configures `tls: terminate` on an endpoint, the sandbox proxy performs TLS man-in-the-middle inspection: +The sandbox proxy automatically detects and terminates TLS on outbound HTTPS connections by peeking the first bytes of each tunnel. This enables credential injection and L7 inspection without requiring explicit policy configuration. The proxy performs TLS man-in-the-middle inspection: 1. **Ephemeral sandbox CA**: a per-sandbox CA (`CN=OpenShell Sandbox CA, O=OpenShell`) is generated at sandbox startup. This CA is completely independent of the cluster mTLS CA. 2. 
**Trust injection**: the sandbox CA is written to the sandbox filesystem and injected via `NODE_EXTRA_CA_CERTS` and `SSL_CERT_FILE` so processes inside the sandbox trust it. diff --git a/architecture/policy-advisor.md b/architecture/policy-advisor.md index 064567a1..19edb001 100644 --- a/architecture/policy-advisor.md +++ b/architecture/policy-advisor.md @@ -59,7 +59,7 @@ The `mechanistic_mapper` module (`crates/openshell-sandbox/src/mechanistic_mappe - Port recognition (well-known ports like 443, 5432 get a boost) - SSRF origin (SSRF denials get lower confidence) 6. Generates security notes for private IPs, database ports, and ephemeral port ranges -7. If L7 request samples are present, generates specific L7 rules (method + path) with `protocol: rest` and `tls: terminate` (plumbed but not yet fed data — see issue #205) +7. If L7 request samples are present, generates specific L7 rules (method + path) with `protocol: rest` (TLS termination is automatic — no `tls` field needed). Plumbed but not yet fed data — see issue #205. The mapper runs in `flush_proposals_to_gateway` after the aggregator drains. It produces `PolicyChunk` protos that are sent alongside the raw `DenialSummary` protos to the gateway. diff --git a/architecture/sandbox.md b/architecture/sandbox.md index a9d80ac8..1117d0f7 100644 --- a/architecture/sandbox.md +++ b/architecture/sandbox.md @@ -27,10 +27,10 @@ All paths are relative to `crates/openshell-sandbox/src/`. 
| `sandbox/linux/seccomp.rs` | Syscall filtering via BPF on `SYS_socket` | | `bypass_monitor.rs` | Background `/dev/kmsg` reader for iptables bypass detection events | | `sandbox/linux/netns.rs` | Network namespace creation, veth pair setup, bypass detection iptables rules, cleanup on drop | -| `l7/mod.rs` | L7 types (`L7Protocol`, `TlsMode`, `EnforcementMode`, `L7EndpointConfig`), config parsing, validation, access preset expansion | +| `l7/mod.rs` | L7 types (`L7Protocol`, `TlsMode`, `EnforcementMode`, `L7EndpointConfig`), config parsing, validation, access preset expansion, deprecated `tls` value handling | | `l7/inference.rs` | Inference API pattern detection (`detect_inference_pattern()`), HTTP request/response parsing and formatting for intercepted inference connections | -| `l7/tls.rs` | Ephemeral CA generation (`SandboxCa`), per-hostname leaf cert cache (`CertCache`), TLS termination/connection helpers | -| `l7/relay.rs` | Protocol-aware bidirectional relay with per-request OPA evaluation | +| `l7/tls.rs` | Ephemeral CA generation (`SandboxCa`), per-hostname leaf cert cache (`CertCache`), TLS termination/connection helpers, `looks_like_tls()` auto-detection | +| `l7/relay.rs` | Protocol-aware bidirectional relay with per-request OPA evaluation, credential-injection-only passthrough relay | | `l7/rest.rs` | HTTP/1.1 request/response parsing, body framing (Content-Length, chunked), deny response generation | | `l7/provider.rs` | `L7Provider` trait and `L7Request`/`BodyLength` types | @@ -674,11 +674,26 @@ sequenceDiagram else All IPs public P->>U: TCP connect (resolved addrs) P-->>S: HTTP/1.1 200 Connection Established - alt L7 config present - P->>P: TLS termination / protocol detection - P->>P: Per-request L7 evaluation - else L4-only + alt tls: skip P->>P: copy_bidirectional (raw tunnel) + else Auto-detect + P->>P: Peek first bytes + alt TLS detected + P->>P: TLS terminate (MITM) + alt L7 config present + P->>P: relay_with_inspection (per-request L7 
evaluation) + else No L7 config + P->>P: relay_passthrough_with_credentials (credential injection) + end + else HTTP detected + alt L7 config present + P->>P: relay_with_inspection + else No L7 config + P->>P: relay_passthrough_with_credentials + end + else Neither TLS nor HTTP + P->>P: copy_bidirectional (raw tunnel) + end end end end @@ -876,20 +891,45 @@ flowchart TD `ResolvedRoute` has a custom `Debug` implementation in `crates/openshell-router/src/config.rs` that redacts the `api_key` field, printing `[REDACTED]` instead of the actual value. This prevents key leakage in log output and debug traces. -### Post-decision: L7 dispatch or raw tunnel (`Allow` path) +### Post-decision: auto-TLS detection, L7 dispatch, or raw tunnel (`Allow` path) -After a CONNECT is allowed, the SSRF check passes, and the upstream TCP connection is established: +After a CONNECT is allowed, the SSRF check passes, and the upstream TCP connection is established, the proxy determines how to handle the tunnel traffic. TLS detection is automatic — the proxy peeks the first bytes of the client stream to decide. 1. **Query L7 config**: `query_l7_config()` asks the OPA engine for `matched_endpoint_config`. If the endpoint has a `protocol` field, parse it into `L7EndpointConfig`. -2. **L7 inspection** (if config present): - - Clone the OPA engine for per-tunnel evaluation (`clone_engine_for_tunnel()`) - - Build `L7EvalContext` with host, port, policy name, binary path, ancestors, cmdline paths - - Branch on TLS mode: - - `TlsMode::Terminate`: MITM via `tls_terminate_client()` + `tls_connect_upstream()`, then `relay_with_inspection()` - - `TlsMode::Passthrough`: Peek first bytes on raw TCP; if `looks_like_http()` matches, run `relay_with_inspection()`; reject on protocol mismatch +2. **Check for `tls: skip`**: If the endpoint has `tls: skip`, bypass all auto-detection and relay raw bytes via `copy_bidirectional()`. This is the escape hatch for client-cert mTLS or non-standard protocols. -3. 
**L4-only** (no L7 config): `tokio::io::copy_bidirectional()` for a raw tunnel +3. **Peek and auto-detect**: Read up to 8 bytes from the client stream via `TcpStream::peek()`. Classify the traffic using `looks_like_tls()` (checks for TLS ClientHello record: byte 0 = `0x16`, bytes 1-2 = TLS version `0x03xx`) and `looks_like_http()` (checks for HTTP method prefix). + +4. **TLS detected** (`is_tls = true`): + - Terminate TLS unconditionally via `tls_terminate_client()` + `tls_connect_upstream()`. This happens for all HTTPS endpoints, not just those with L7 config. + - If L7 config is present: clone the OPA engine (`clone_engine_for_tunnel()`), run `relay_with_inspection()` for per-request policy evaluation. + - If no L7 config: run `relay_passthrough_with_credentials()` — parses HTTP minimally to inject credentials (via `SecretResolver`) and log requests, but does not evaluate L7 OPA rules. This enables credential injection on all HTTPS endpoints without requiring `protocol` in the policy. + - If TLS state is not configured: fall back to raw `copy_bidirectional()` with a warning. + +5. **Plaintext HTTP detected** (`is_http = true`, `is_tls = false`): + - If L7 config present: clone OPA engine, run `relay_with_inspection()` directly on the plaintext streams. + - If no L7 config: run `relay_passthrough_with_credentials()` for credential injection and observability. + +6. **Neither TLS nor HTTP**: Raw `copy_bidirectional()` tunnel (binary protocols, SSH-over-CONNECT, etc.). + +```mermaid +flowchart TD + A["CONNECT allowed + upstream connected"] --> B["Query L7 config"] + B --> C{"tls: skip?"} + C -- Yes --> D["Raw copy_bidirectional"] + C -- No --> E["Peek first bytes"] + E --> F{"looks_like_tls?"} + F -- Yes --> G["TLS terminate client + upstream"] + G --> H{"L7 config?"} + H -- Yes --> I["relay_with_inspection"] + H -- No --> J["relay_passthrough_with_credentials<br/>(credential injection, no L7 rules)"] + F -- No --> K{"looks_like_http?"} + K -- Yes --> L{"L7 config?"} + L -- Yes --> M["relay_with_inspection"] + L -- No --> N["relay_passthrough_with_credentials"] + K -- No --> O["Raw copy_bidirectional<br/>(binary protocol)"] +``` ## L7 Protocol-Aware Inspection @@ -918,7 +958,7 @@ flowchart LR | Type | Definition | Purpose | |------|-----------|---------| | `L7Protocol` | `Rest`, `Sql` | Supported application protocols | -| `TlsMode` | `Passthrough`, `Terminate` | TLS handling strategy | +| `TlsMode` | `Auto` (default), `Skip` | TLS handling strategy — `Auto` peeks first bytes and terminates if TLS is detected; `Skip` bypasses detection entirely | | `EnforcementMode` | `Audit`, `Enforce` | What to do on L7 deny (log-only vs block) | | `L7EndpointConfig` | `{ protocol, tls, enforcement }` | Per-endpoint L7 configuration | | `L7Decision` | `{ allowed, reason, matched_rule }` | Result of L7 evaluation | @@ -943,19 +983,19 @@ Expansion happens in `expand_access_presets()` before the Rego engine loads the **Errors** (block startup): - `rules` and `access` both specified on same endpoint - `protocol` specified without `rules` or `access` -- `tls: terminate` without a `protocol` - `protocol: sql` with `enforcement: enforce` (SQL parsing not available in v1) - Empty `rules` array (would deny all traffic) **Warnings** (logged): -- `protocol: rest` on port 443 without `tls: terminate` (L7 rules ineffective on encrypted traffic) +- `tls: terminate` or `tls: passthrough` on any endpoint (deprecated — TLS termination is now automatic; use `tls: skip` to disable) +- `tls: skip` with L7 rules on port 443 (L7 inspection cannot work on encrypted traffic) - Unknown HTTP method in rules -### TLS termination +### TLS termination (auto-detect) **File:** `crates/openshell-sandbox/src/l7/tls.rs` -TLS termination enables the proxy to inspect HTTPS traffic by performing MITM decryption. +TLS termination is automatic. The proxy peeks the first bytes of every CONNECT tunnel and terminates TLS whenever a ClientHello is detected. This enables credential injection and L7 inspection on all HTTPS endpoints without requiring explicit `tls: terminate` in the policy. 
The `tls` field defaults to `Auto`; use `tls: skip` to opt out entirely (e.g., for client-cert mTLS to upstream). **Ephemeral CA lifecycle:** 1. At sandbox startup, `SandboxCa::generate()` creates a self-signed CA (CN: "OpenShell Sandbox CA") using `rcgen` @@ -963,19 +1003,38 @@ TLS termination enables the proxy to inspect HTTPS traffic by performing MITM de 3. The sandbox CA cert path is set as `NODE_EXTRA_CA_CERTS` (additive for Node.js) 4. The combined bundle is set as `SSL_CERT_FILE`, `REQUESTS_CA_BUNDLE`, `CURL_CA_BUNDLE` (replaces defaults for OpenSSL, Python requests, curl) +**TLS auto-detection** (`looks_like_tls()`): +- Peeks up to 8 bytes from the client stream +- Checks for TLS ClientHello pattern: byte 0 = `0x16` (ContentType::Handshake), byte 1 = `0x03` (TLS major version), byte 2 ≤ `0x04` (minor version, covering SSL 3.0 through TLS 1.3) +- Returns `false` for plaintext HTTP, SSH, or other binary protocols + **Per-hostname leaf cert generation:** - `CertCache` maps hostnames to `CertifiedLeaf` structs (cert chain + private key) - First request for a hostname generates a leaf cert signed by the sandbox CA via `rcgen` - Cache has a hard limit of 256 entries; on overflow, the entire cache is cleared (sufficient for sandbox scale) - Each leaf cert chain contains two certs: the leaf and the CA -**Connection flow:** +**Connection flow (when TLS is detected):** 1. `tls_terminate_client()`: Accept TLS from the sandboxed client using a `ServerConfig` with the hostname-specific leaf cert. ALPN: `http/1.1`. 2. `tls_connect_upstream()`: Connect TLS to the real upstream using a `ClientConfig` with Mozilla root CAs (`webpki_roots`). ALPN: `http/1.1`. -3. Proxy now holds plaintext on both sides and runs `relay_with_inspection()`. +3. Proxy now holds plaintext on both sides. If L7 config is present, runs `relay_with_inspection()`. Otherwise, runs `relay_passthrough_with_credentials()` for credential injection without L7 evaluation. 
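The detection bytes and the relay choice described above can be sketched as pure functions over the peeked bytes. This is a minimal illustration, not the code in `l7/tls.rs` or the proxy: the `Relay` enum, `choose_relay()`, and the method list inside `looks_like_http()` are hypothetical names and assumptions, and the sketch leaves out the actual TLS termination step and the fallback used when TLS state is not configured.

```rust
/// Hypothetical summary of the three relay paths named in the text.
#[derive(Debug, PartialEq, Eq)]
enum Relay {
    /// Raw byte copy (the `copy_bidirectional` path).
    Raw,
    /// Per-request L7 evaluation (the `relay_with_inspection` path).
    Inspect,
    /// Credential injection only (the `relay_passthrough_with_credentials` path).
    CredentialsOnly,
}

/// TLS record header check: handshake content type (0x16), major version 3,
/// minor version 0..=4 (SSL 3.0 through TLS 1.3).
fn looks_like_tls(peek: &[u8]) -> bool {
    peek.len() >= 3 && peek[0] == 0x16 && peek[1] == 0x03 && peek[2] <= 0x04
}

/// Assumed stand-in for the HTTP method-prefix check; the real method list
/// may differ.
fn looks_like_http(peek: &[u8]) -> bool {
    const METHODS: [&[u8]; 7] = [
        b"GET ", b"HEAD ", b"POST ", b"PUT ", b"DELETE ", b"PATCH ", b"OPTIONS ",
    ];
    METHODS.iter().any(|m| peek.starts_with(m))
}

/// Pick a relay path from the `tls: skip` flag, the presence of L7 config,
/// and the first bytes peeked from the client stream.
fn choose_relay(skip_tls: bool, has_l7_config: bool, peek: &[u8]) -> Relay {
    if skip_tls {
        // Explicit opt-out: no detection at all.
        return Relay::Raw;
    }
    if looks_like_tls(peek) || looks_like_http(peek) {
        // TLS (after termination) and plaintext HTTP share the same choice.
        if has_l7_config {
            Relay::Inspect
        } else {
            Relay::CredentialsOnly
        }
    } else {
        // Neither TLS nor HTTP: binary protocol, relay raw.
        Relay::Raw
    }
}
```

Under these assumptions, `choose_relay(false, false, b"POST /v1")` selects the credential-injection relay, while any call with `skip_tls = true` yields the raw tunnel regardless of what the client sends.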
System CA bundles are searched at well-known paths: `/etc/ssl/certs/ca-certificates.crt` (Debian/Ubuntu), `/etc/pki/tls/certs/ca-bundle.crt` (RHEL), `/etc/ssl/ca-bundle.pem` (openSUSE), `/etc/ssl/cert.pem` (Alpine/macOS). +### Credential-injection-only relay + +**File:** `crates/openshell-sandbox/src/l7/relay.rs` (`relay_passthrough_with_credentials()`) + +When TLS is auto-terminated but no L7 policy (`protocol` + `access`/`rules`) is configured on the endpoint, the proxy enters a passthrough mode that still provides value: it parses HTTP requests minimally to rewrite credential placeholders (via `SecretResolver`) and logs each request for observability. This relay: + +1. Reads each HTTP request from the client via `RestProvider::parse_request()` +2. Logs the request method, path, host, and port at `info!()` level (tagged `"HTTP relay (credential injection)"`) +3. Forwards the request to upstream via `relay_http_request_with_resolver()`, which rewrites headers containing `openshell:resolve:env:*` placeholders with actual provider credential values +4. Relays the upstream response back to the client +5. Loops for HTTP keep-alive; exits on client close or non-reusable response + +This enables credential injection on all HTTPS endpoints automatically, without requiring the policy author to add `protocol: rest` and `access: full` just to get credentials injected. + ### REST protocol provider **File:** `crates/openshell-sandbox/src/l7/rest.rs` diff --git a/architecture/security-policy.md b/architecture/security-policy.md index b63179c4..44898d70 100644 --- a/architecture/security-policy.md +++ b/architecture/security-policy.md @@ -326,7 +326,7 @@ Controls which filesystem paths the sandboxed process can access. Enforced via L **Working directory**: When `include_workdir` is `true` and a `--workdir` is specified, the working directory path is appended to `read_write` if not already present. See `crates/openshell-sandbox/src/sandbox/linux/landlock.rs` -- `apply()`. 
-**TLS directory**: When network proxy mode is active with TLS termination enabled, the directory `/etc/openshell-tls` is automatically appended to `read_only` so sandbox processes can read the ephemeral CA certificate files. +**TLS directory**: When network proxy mode is active, the directory `/etc/openshell-tls` is automatically appended to `read_only` so sandbox processes can read the ephemeral CA certificate files (used for auto-TLS termination). ```yaml filesystem_policy: @@ -433,7 +433,7 @@ Each endpoint defines a network destination and, optionally, L7 inspection behav | `port` | `integer` | _(required)_ | TCP port to match. Mutually exclusive with `ports` — if both are set, `ports` takes precedence. See [Multi-Port Endpoints](#multi-port-endpoints). | | `ports` | `integer[]`| `[]` | Multiple TCP ports to match. When non-empty, the endpoint covers all listed ports. Backwards compatible with `port`. See [Multi-Port Endpoints](#multi-port-endpoints). | | `protocol` | `string` | `""` | Application protocol for L7 inspection. See [Behavioral Trigger: L7 Inspection](#behavioral-trigger-l7-inspection). | -| `tls` | `string` | `"passthrough"` | TLS handling mode. See [Behavioral Trigger: TLS Termination](#behavioral-trigger-tls-termination). | +| `tls` | `string` | `""` (auto) | TLS handling mode. Absent or empty: auto-detect and terminate TLS if detected. `"skip"`: bypass TLS detection entirely. `"terminate"` and `"passthrough"` are deprecated (treated as auto). See [Behavioral Trigger: TLS Handling](#behavioral-trigger-tls-handling). | | `enforcement` | `string` | `"audit"` | L7 enforcement mode: `"enforce"` or `"audit"` | | `access` | `string` | `""` | Shorthand preset for common L7 rule sets. Mutually exclusive with `rules`. | | `rules` | `L7Rule[]` | `[]` | Explicit L7 allow rules. Mutually exclusive with `access`. 
| @@ -528,7 +528,7 @@ network_policies: - { path: /usr/bin/curl } ``` -Host wildcards compose with all other endpoint features — L7 inspection, TLS termination, multi-port, and `allowed_ips`: +Host wildcards compose with all other endpoint features — L7 inspection, auto-TLS termination, multi-port, and `allowed_ips`: ```yaml network_policies: @@ -538,7 +538,6 @@ network_policies: - host: "*.example.com" port: 8080 protocol: rest - tls: terminate enforcement: enforce rules: - allow: @@ -597,7 +596,6 @@ network_policies: - host: "*.example.com" ports: [443, 8443] protocol: rest - tls: terminate enforcement: enforce access: read-only binaries: @@ -773,25 +771,32 @@ resp = httpx.get("http://10.86.8.223:8000/screenshot/", proxy="http://10.200.0.1:3128") ``` -### Behavioral Trigger: TLS Termination +### Behavioral Trigger: TLS Handling **Trigger**: The `tls` field on a `NetworkEndpoint`. +TLS termination is automatic. The proxy peeks the first bytes of every CONNECT tunnel and terminates TLS whenever a ClientHello is detected. This removes the need for explicit `tls: terminate` in policy — all HTTPS connections are automatically terminated for credential injection and (when configured) L7 inspection. + | Condition | Behavior | | ------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | -| `tls` absent or `"passthrough"` | For L7 endpoints: the proxy inspects plaintext only. For HTTPS endpoints (port 443), L7 rules will not be evaluated because the traffic is encrypted. A validation warning is emitted. 
| -| `tls: "terminate"` | The proxy performs MITM TLS termination: it presents a dynamically-generated certificate (signed by an ephemeral per-sandbox CA) to the client, decrypts the traffic, inspects the plaintext HTTP, then re-encrypts to upstream using real root CAs (webpki-roots). | +| `tls` absent or `""` (default) | **Auto-detect**: The proxy peeks the first bytes of the tunnel. If TLS is detected (ClientHello pattern), the proxy terminates TLS transparently (MITM), enabling credential injection and L7 inspection. If plaintext HTTP is detected, the proxy inspects directly. If neither, traffic is relayed raw. | +| `tls: "skip"` | **Explicit opt-out**: No TLS detection, no termination, no credential injection. The tunnel is a raw `copy_bidirectional` relay. Use for client-cert mTLS to upstream or non-standard binary protocols. | +| `tls: "terminate"` *(deprecated)* | Treated as auto-detect. Emits a deprecation warning: "TLS termination is now automatic. Use `tls: skip` to explicitly disable." | +| `tls: "passthrough"` *(deprecated)* | Treated as auto-detect. Emits the same deprecation warning. | -**Prerequisites for TLS termination**: +**Prerequisites for TLS termination (auto-detect path)**: -- The `protocol` field must also be set. `tls: terminate` without `protocol` is rejected at validation time. - The sandbox supervisor generates an ephemeral CA at startup (`SandboxCa::generate()`) and writes it to `/etc/openshell-tls/`. - Trust store environment variables are set on the child process: `NODE_EXTRA_CA_CERTS`, `SSL_CERT_FILE`, `REQUESTS_CA_BUNDLE`, `CURL_CA_BUNDLE`. - A combined CA bundle (system CAs + sandbox CA) is written to `/etc/openshell-tls/ca-bundle.pem` so `SSL_CERT_FILE` replaces the default trust store while still trusting real CAs. **Certificate caching**: Per-hostname leaf certificates are cached (up to 256 entries, then the entire cache is cleared). See `crates/openshell-sandbox/src/l7/tls.rs` -- `CertCache`. 
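The clear-on-overflow behaviour of the leaf certificate cache can be illustrated with a small map-backed sketch. Everything here is hypothetical apart from the 256-entry limit stated above: `LeafCache`, `Leaf`, and `get_or_generate()` are illustrative names, the `Leaf` struct is a placeholder for the real cert chain and private key, and clearing just before the insert that would exceed the limit is an assumption about the exact overflow point.

```rust
use std::collections::HashMap;

/// Hard limit taken from the text above.
const MAX_ENTRIES: usize = 256;

/// Placeholder for a cached leaf; the real cache stores a cert chain and key.
#[derive(Clone, Debug, PartialEq)]
struct Leaf {
    hostname: String,
}

/// Hypothetical hostname-keyed cache that is wiped when it fills up.
struct LeafCache {
    entries: HashMap<String, Leaf>,
    /// Counts how many leaves were generated (cache misses).
    generated: usize,
}

impl LeafCache {
    fn new() -> Self {
        LeafCache { entries: HashMap::new(), generated: 0 }
    }

    /// Return the cached leaf for `hostname`, generating one on a miss.
    fn get_or_generate(&mut self, hostname: &str) -> Leaf {
        if let Some(leaf) = self.entries.get(hostname) {
            return leaf.clone();
        }
        // Assumed overflow rule: drop everything rather than evict one entry.
        if self.entries.len() >= MAX_ENTRIES {
            self.entries.clear();
        }
        self.generated += 1;
        let leaf = Leaf { hostname: hostname.to_string() };
        self.entries.insert(hostname.to_string(), leaf.clone());
        leaf
    }
}
```

Clearing the whole map trades a burst of regeneration for very simple bookkeeping, which is plausible for a sandbox that talks to a small set of hosts; an LRU policy would be the usual alternative if the working set were larger.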
-**Validation warning**: When `protocol: rest` is set on port 443 without `tls: terminate`, the validator emits a warning: "L7 rules won't be evaluated on encrypted traffic without `tls: terminate`". +**Credential injection**: When TLS is auto-terminated but no L7 policy is configured (no `protocol` field), the proxy enters a passthrough relay that rewrites credential placeholders in HTTP headers (via `SecretResolver`) and logs requests for observability, but does not evaluate L7 OPA rules. This means credential injection works on all HTTPS endpoints automatically. + +**Validation warnings**: +- `tls: terminate` or `tls: passthrough`: deprecated, emits a warning. +- `tls: skip` with `protocol: rest` on port 443: emits a warning ("L7 inspection cannot work on encrypted traffic"). ### Behavioral Trigger: Enforcement Mode @@ -897,9 +902,10 @@ sequenceDiagram Note over Proxy: Query L7 config for matched endpoint Proxy->>OPA: query_endpoint_config(host, port, binary) - OPA-->>Proxy: {protocol: rest, tls: terminate, enforcement: enforce} + OPA-->>Proxy: {protocol: rest, enforcement: enforce} - Note over Proxy: TLS termination (if configured) + Note over Proxy: Auto-detect TLS (peek first bytes) + Note over Proxy: TLS ClientHello detected → terminate Client->>Proxy: TLS ClientHello Proxy-->>Client: TLS ServerHello (ephemeral cert for host) Note over Proxy: Decrypt client traffic @@ -944,7 +950,6 @@ The following validation rules are enforced during policy loading (both file mod | ---------------------------------------------- | ------------------------------------------------------------------------------------------ | | Both `rules` and `access` on the same endpoint | `rules and access are mutually exclusive` | | `protocol` set without `rules` or `access` | `protocol requires rules or access to define allowed traffic` | -| `tls: terminate` without `protocol` | `TLS termination requires a protocol for L7 inspection` | | `protocol: sql` with `enforcement: enforce` | `SQL 
enforcement requires full SQL parsing (not available in v1). Use enforcement: audit.` | | `rules: []` (empty list) | `rules list cannot be empty (would deny all traffic). Use access: full or remove rules.` | | Host wildcard is bare `*` or `**` | `host wildcard '*' matches all hosts; use specific patterns like '*.example.com'` | @@ -965,7 +970,8 @@ These errors are returned by the gateway's `UpdateSandboxPolicy` handler and rej | Condition | Warning Message | | ---------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------- | -| `protocol: rest` on port 443 without `tls: terminate` | `L7 rules won't be evaluated on encrypted traffic without tls: terminate` | +| `tls: terminate` or `tls: passthrough` on any endpoint | `'tls: {value}' is deprecated; TLS termination is now automatic. Use 'tls: skip' to disable.` | +| `tls: skip` with L7 rules on port 443 | `'tls: skip' with L7 rules on port 443 — L7 inspection cannot work on encrypted traffic` | | Host wildcard with ≤2 labels (e.g., `*.com`) | `host wildcard '*.com' is very broad (covers all subdomains of a TLD)` | | Unknown HTTP method in rules (not GET/HEAD/POST/PUT/DELETE/PATCH/OPTIONS/\*) | `Unknown HTTP method '{method}'. 
Standard methods: GET, HEAD, POST, PUT, DELETE, PATCH, OPTIONS.` | @@ -1152,14 +1158,13 @@ network_policies: binaries: - { path: /usr/local/bin/claude } - # L7 + TLS termination: Full access with HTTPS inspection + # L7 + auto-TLS: Full access with HTTPS inspection (TLS terminated automatically) claude_code_inspected: name: claude_code_inspected endpoints: - host: api.anthropic.com port: 443 protocol: rest - tls: terminate enforcement: enforce access: full binaries: @@ -1244,14 +1249,13 @@ network_policies: binaries: - { path: /usr/bin/curl } - # Multi-port with L7: same L7 rules applied across two ports + # Multi-port with L7: same L7 rules applied across two ports (TLS auto-terminated) multi_port_l7: name: multi_port_l7 endpoints: - host: api.internal.svc ports: [8080, 9090] protocol: rest - tls: terminate enforcement: enforce access: read-only binaries: diff --git a/architecture/system-architecture.md b/architecture/system-architecture.md index 290d27c6..5ea92064 100644 --- a/architecture/system-architecture.md +++ b/architecture/system-architecture.md @@ -118,7 +118,7 @@ graph TB NetNS -- "proxied traffic" --> Proxy Proxy -- "policy evaluation" --> OPA Proxy -- "inference requests" --> InferenceRouter - Proxy -- "TLS inspection<br/>(optional L7)" --> CertCache + Proxy -- "Auto TLS termination<br/>+ optional L7 inspection" --> CertCache %% ============================================================ %% CONNECTIONS: Sandbox --> Gateway (control plane) @@ -128,7 +128,7 @@ graph TB %% ============================================================ %% CONNECTIONS: Sandbox --> External (via proxy) %% ============================================================ - Proxy -- "HTTPS<br/>(TLS passthrough<br/>or MITM)" --> Anthropic + Proxy -- "HTTPS<br/>(auto TLS termination)" --> Anthropic Proxy -- "HTTPS" --> OpenAI Proxy -- "HTTPS" --> NVIDIA_API Proxy -- "HTTPS" --> GitHub @@ -193,7 +193,7 @@ graph TB 3. **File Sync**: tar archives streamed over the SSH tunnel (no rsync dependency). -4. **Sandbox to External**: All agent outbound traffic is forced through the HTTP CONNECT proxy (10.200.0.1:3128) via a network namespace veth pair. OPA/Rego policies evaluate every connection. Optional TLS MITM enables L7 inspection. +4. **Sandbox to External**: All agent outbound traffic is forced through the HTTP CONNECT proxy (10.200.0.1:3128) via a network namespace veth pair. OPA/Rego policies evaluate every connection. TLS is automatically detected and terminated for credential injection; endpoints with `protocol` configured also get L7 request-level inspection. 5. **Inference Routing**: Inference requests are handled inside the sandbox by the openshell-router (not through the gateway). The gateway provides route configuration and credentials via gRPC; the sandbox executes HTTP requests directly to inference backends. diff --git a/crates/openshell-sandbox/data/sandbox-policy.rego b/crates/openshell-sandbox/data/sandbox-policy.rego index 61393e15..ded7c8c1 100644 --- a/crates/openshell-sandbox/data/sandbox-policy.rego +++ b/crates/openshell-sandbox/data/sandbox-policy.rego @@ -277,7 +277,8 @@ endpoint_matches_request(ep, network) if { ep.ports[_] == network.port } -# An endpoint has extended config if it specifies L7 protocol or allowed_ips. +# An endpoint has extended config if it specifies L7 protocol, allowed_ips, +# or an explicit tls mode (e.g. tls: skip). 
endpoint_has_extended_config(ep) if { ep.protocol } @@ -285,3 +286,7 @@ endpoint_has_extended_config(ep) if { endpoint_has_extended_config(ep) if { count(object.get(ep, "allowed_ips", [])) > 0 } + +endpoint_has_extended_config(ep) if { + ep.tls +} diff --git a/crates/openshell-sandbox/src/l7/mod.rs b/crates/openshell-sandbox/src/l7/mod.rs index 9b9ae473..09e54788 100644 --- a/crates/openshell-sandbox/src/l7/mod.rs +++ b/crates/openshell-sandbox/src/l7/mod.rs @@ -31,14 +31,16 @@ impl L7Protocol { } } -/// TLS handling mode for L7-inspected endpoints. +/// TLS handling mode for proxy connections. #[derive(Debug, Clone, Copy, PartialEq, Eq, Default)] pub enum TlsMode { - /// No TLS termination — L7 inspection on plaintext only. + /// Auto-detect TLS by peeking the first bytes. If TLS is detected, + /// terminate it transparently. This is the default for all endpoints. #[default] - Passthrough, - /// Proxy terminates TLS, inspects plaintext, re-encrypts to upstream. - Terminate, + Auto, + /// Explicit opt-out: raw tunnel with no TLS termination and no credential + /// injection. Use for client-cert mTLS to upstream or non-standard protocols. + Skip, } /// Enforcement mode for L7 policy decisions. @@ -85,8 +87,22 @@ pub fn parse_l7_config(val: &regorus::Value) -> Option<L7EndpointConfig> { let protocol = L7Protocol::parse(&protocol_val)?; let tls = match get_object_str(val, "tls").as_deref() { - Some("terminate") => TlsMode::Terminate, - _ => TlsMode::Passthrough, + Some("skip") => TlsMode::Skip, + Some("terminate") => { + tracing::warn!( + "'tls: terminate' is deprecated; TLS termination is now automatic. \ + Use 'tls: skip' to explicitly disable. This field will be removed in a future version." + ); + TlsMode::Auto + } + Some("passthrough") => { + tracing::warn!( + "'tls: passthrough' is deprecated; TLS termination is now automatic. \ + Use 'tls: skip' to explicitly disable. This field will be removed in a future version." 
+ ); + TlsMode::Auto + } + _ => TlsMode::Auto, }; let enforcement = match get_object_str(val, "enforcement").as_deref() { @@ -101,6 +117,18 @@ pub fn parse_l7_config(val: &regorus::Value) -> Option { }) } +/// Parse the `tls` field from an endpoint config, independent of L7 protocol. +/// +/// Used to check for `tls: skip` even on L4-only endpoints (no `protocol` +/// field) that explicitly opt out of TLS auto-detection. +pub fn parse_tls_mode(val: &regorus::Value) -> TlsMode { + match get_object_str(val, "tls").as_deref() { + Some("skip") => TlsMode::Skip, + Some("terminate") | Some("passthrough") => TlsMode::Auto, // deprecation logged by parse_l7_config + _ => TlsMode::Auto, + } +} + /// Extract a string value from a regorus object. fn get_object_str(val: &regorus::Value, key: &str) -> Option { let key_val = regorus::Value::String(key.into()); @@ -209,10 +237,17 @@ pub fn validate_l7_policies(data_json: &serde_json::Value) -> (Vec, Vec< )); } - // tls: terminate requires protocol - if tls == "terminate" && protocol.is_empty() { - errors.push(format!( - "{loc}: TLS termination requires a protocol for L7 inspection" + // Deprecated tls values: warn but don't error + if tls == "terminate" || tls == "passthrough" { + warnings.push(format!( + "{loc}: 'tls: {tls}' is deprecated; TLS termination is now automatic. Use 'tls: skip' to disable." 
+ )); + } + + // tls: skip with L7 on port 443 won't work + if tls == "skip" && !protocol.is_empty() && ports.contains(&443) { + warnings.push(format!( + "{loc}: 'tls: skip' with L7 rules on port 443 — L7 inspection cannot work on encrypted traffic" )); } @@ -234,12 +269,9 @@ pub fn validate_l7_policies(data_json: &serde_json::Value) -> (Vec, Vec< )); } - // port 443 + rest + no tls: terminate - if protocol == "rest" && ports.contains(&443) && tls != "terminate" { - warnings.push(format!( - "{loc}: L7 rules won't be evaluated on encrypted traffic without `tls: terminate`" - )); - } + // port 443 + rest + tls: skip — L7 won't work (already handled above) + // The old warning about missing `tls: terminate` is no longer needed + // because TLS termination is now automatic. // Validate HTTP methods in rules if has_rules && protocol == "rest" { @@ -350,7 +382,8 @@ mod tests { .unwrap(); let config = parse_l7_config(&val).unwrap(); assert_eq!(config.protocol, L7Protocol::Rest); - assert_eq!(config.tls, TlsMode::Terminate); + // "terminate" is deprecated and treated as Auto. 
+ assert_eq!(config.tls, TlsMode::Auto); assert_eq!(config.enforcement, EnforcementMode::Enforce); } @@ -362,10 +395,20 @@ mod tests { .unwrap(); let config = parse_l7_config(&val).unwrap(); assert_eq!(config.protocol, L7Protocol::Rest); - assert_eq!(config.tls, TlsMode::Passthrough); + assert_eq!(config.tls, TlsMode::Auto); assert_eq!(config.enforcement, EnforcementMode::Audit); } + #[test] + fn parse_l7_config_skip() { + let val = regorus::Value::from_json_str( + r#"{"protocol": "rest", "tls": "skip", "host": "api.example.com", "port": 443}"#, + ) + .unwrap(); + let config = parse_l7_config(&val).unwrap(); + assert_eq!(config.tls, TlsMode::Skip); + } + #[test] fn parse_l7_config_no_protocol() { let val = @@ -436,35 +479,41 @@ mod tests { } #[test] - fn validate_tls_terminate_requires_protocol() { + fn validate_tls_terminate_deprecated_warning() { let data = serde_json::json!({ "network_policies": { "test": { "endpoints": [{ "host": "api.example.com", "port": 443, - "tls": "terminate" + "tls": "terminate", + "protocol": "rest", + "access": "full" }], "binaries": [] } } }); - let (errors, _warnings) = validate_l7_policies(&data); + let (errors, warnings) = validate_l7_policies(&data); assert!( - errors - .iter() - .any(|e| e.contains("TLS termination requires")) + errors.is_empty(), + "deprecated tls should not error: {errors:?}" + ); + assert!( + warnings.iter().any(|w| w.contains("deprecated")), + "should warn about deprecated tls: {warnings:?}" ); } #[test] - fn validate_port_443_rest_no_tls_warns() { + fn validate_tls_skip_with_l7_on_443_warns() { let data = serde_json::json!({ "network_policies": { "test": { "endpoints": [{ "host": "api.example.com", "port": 443, + "tls": "skip", "protocol": "rest", "access": "read-only" }], @@ -473,7 +522,35 @@ mod tests { } }); let (_errors, warnings) = validate_l7_policies(&data); - assert!(warnings.iter().any(|w| w.contains("tls: terminate"))); + assert!( + warnings.iter().any(|w| w.contains("tls: skip")), + "should warn 
about skip + L7 on 443: {warnings:?}" + ); + } + + #[test] + fn validate_port_443_rest_no_tls_no_warning() { + // With auto-TLS, no warning is needed for port 443 + rest without + // explicit tls field — TLS will be auto-detected. + let data = serde_json::json!({ + "network_policies": { + "test": { + "endpoints": [{ + "host": "api.example.com", + "port": 443, + "protocol": "rest", + "access": "read-only" + }], + "binaries": [] + } + } + }); + let (errors, warnings) = validate_l7_policies(&data); + assert!(errors.is_empty(), "should have no errors: {errors:?}"); + assert!( + !warnings.iter().any(|w| w.contains("tls")), + "should have no tls warnings with auto-detect: {warnings:?}" + ); } #[test] @@ -681,7 +758,8 @@ mod tests { } #[test] - fn validate_ports_array_rest_443_warns() { + fn validate_ports_array_rest_443_no_warning() { + // With auto-TLS, no warning needed for ports array containing 443. let data = serde_json::json!({ "network_policies": { "test": { @@ -695,10 +773,11 @@ mod tests { } } }); - let (_errors, warnings) = validate_l7_policies(&data); + let (errors, warnings) = validate_l7_policies(&data); + assert!(errors.is_empty(), "should have no errors: {errors:?}"); assert!( - warnings.iter().any(|w| w.contains("tls: terminate")), - "REST on port 443 without tls:terminate should warn, got warnings: {warnings:?}" + !warnings.iter().any(|w| w.contains("tls")), + "should have no tls warnings with auto-detect: {warnings:?}" ); } } diff --git a/crates/openshell-sandbox/src/l7/relay.rs b/crates/openshell-sandbox/src/l7/relay.rs index c1c5bb27..61828047 100644 --- a/crates/openshell-sandbox/src/l7/relay.rs +++ b/crates/openshell-sandbox/src/l7/relay.rs @@ -229,3 +229,72 @@ fn evaluate_l7_request( Ok((allowed, reason)) } + +/// Relay HTTP traffic with credential injection only (no L7 OPA evaluation). +/// +/// Used when TLS is auto-terminated but no L7 policy (`protocol` + `access`/`rules`) +/// is configured. 
Parses HTTP requests minimally to rewrite credential +/// placeholders and log requests for observability, then forwards everything. +pub async fn relay_passthrough_with_credentials( + client: &mut C, + upstream: &mut U, + ctx: &L7EvalContext, +) -> Result<()> +where + C: AsyncRead + AsyncWrite + Unpin + Send, + U: AsyncRead + AsyncWrite + Unpin + Send, +{ + let provider = crate::l7::rest::RestProvider; + let mut request_count: u64 = 0; + let resolver = ctx.secret_resolver.as_deref(); + + loop { + // Read next request from client. + let req = match provider.parse_request(client).await { + Ok(Some(req)) => req, + Ok(None) => break, // Client closed connection. + Err(e) => { + if is_benign_connection_error(&e) { + break; + } + return Err(e); + } + }; + + request_count += 1; + + // Log for observability. + let has_creds = resolver.is_some(); + info!( + host = %ctx.host, + port = ctx.port, + method = %req.action, + path = %req.target, + credentials_injected = has_creds, + request_num = request_count, + "HTTP_REQUEST", + ); + + // Forward request with credential rewriting. + let keep_alive = + crate::l7::rest::relay_http_request_with_resolver(&req, client, upstream, resolver) + .await?; + + // Relay response back to client. 
+ let reusable = + crate::l7::rest::relay_response_to_client(upstream, client, &req.action).await?; + + if !keep_alive || !reusable { + break; + } + } + + debug!( + host = %ctx.host, + port = ctx.port, + total_requests = request_count, + "Credential injection relay completed" + ); + + Ok(()) +} diff --git a/crates/openshell-sandbox/src/l7/rest.rs b/crates/openshell-sandbox/src/l7/rest.rs index 26b8d4e8..ebb34957 100644 --- a/crates/openshell-sandbox/src/l7/rest.rs +++ b/crates/openshell-sandbox/src/l7/rest.rs @@ -191,7 +191,8 @@ where BodyLength::None => {} } upstream.flush().await.into_diagnostic()?; - relay_response(&req.action, upstream, client).await + let (reusable, _) = relay_response(&req.action, upstream, client).await?; + Ok(reusable) } /// Send a 403 Forbidden JSON deny response. @@ -416,11 +417,27 @@ fn find_crlf(buf: &[u8], start: usize) -> Option { /// /// Returns `true` if the upstream connection is reusable (keep-alive), /// `false` if it was consumed (read-until-EOF or `Connection: close`). +/// Relay an HTTP response from upstream back to the client. +/// +/// Returns `true` if the connection should stay alive for further requests. 
+pub(crate) async fn relay_response_to_client( + upstream: &mut U, + client: &mut C, + request_method: &str, +) -> Result +where + U: AsyncRead + Unpin, + C: AsyncWrite + Unpin, +{ + let (reusable, _status) = relay_response(request_method, upstream, client).await?; + Ok(reusable) +} + async fn relay_response( request_method: &str, upstream: &mut U, client: &mut C, -) -> Result +) -> Result<(bool, u16)> where U: AsyncRead + Unpin, C: AsyncWrite + Unpin, @@ -441,7 +458,7 @@ where if !buf.is_empty() { client.write_all(&buf).await.into_diagnostic()?; } - return Ok(false); + return Ok((false, 0)); } buf.extend_from_slice(&tmp[..n]); @@ -474,7 +491,7 @@ where .await .into_diagnostic()?; client.flush().await.into_diagnostic()?; - return Ok(!server_wants_close); + return Ok((!server_wants_close, status_code)); } // No explicit framing (no Content-Length, no Transfer-Encoding). @@ -494,7 +511,7 @@ where } relay_until_eof(upstream, client).await?; client.flush().await.into_diagnostic()?; - return Ok(false); + return Ok((false, status_code)); } // No Connection: close — an HTTP/1.1 keep-alive server that omits // framing headers has an empty body. Forward headers and continue @@ -505,7 +522,7 @@ where .await .into_diagnostic()?; client.flush().await.into_diagnostic()?; - return Ok(true); + return Ok((true, status_code)); } // Forward response headers + any overflow body bytes @@ -538,7 +555,7 @@ where // loop will exit via the normal error path. Exiting early here would // tear down the CONNECT tunnel before the client can detect the close, // causing ~30 s retry delays in clients like `gh`. - Ok(true) + Ok((true, status_code)) } /// Parse the HTTP status code from a response status line. 
@@ -841,7 +858,7 @@ mod tests { .await .expect("relay_response should not deadlock"); - let reusable = result.expect("relay_response should succeed"); + let (reusable, _status) = result.expect("relay_response should succeed"); assert!(!reusable, "connection consumed by read-until-EOF"); client_write.shutdown().await.unwrap(); @@ -879,7 +896,7 @@ mod tests { .await .expect("must not block when no Connection: close"); - let reusable = result.expect("relay_response should succeed"); + let (reusable, _status) = result.expect("relay_response should succeed"); assert!(reusable, "keep-alive implied, connection reusable"); client_write.shutdown().await.unwrap(); @@ -912,7 +929,7 @@ mod tests { .await .expect("HEAD relay must not deadlock waiting for body"); - let reusable = result.expect("relay_response should succeed"); + let (reusable, _status) = result.expect("relay_response should succeed"); assert!(reusable, "HEAD response should be reusable"); client_write.shutdown().await.unwrap(); @@ -942,7 +959,7 @@ mod tests { .await .expect("204 relay must not deadlock"); - let reusable = result.expect("relay_response should succeed"); + let (reusable, _status) = result.expect("relay_response should succeed"); assert!(reusable, "204 response should be reusable"); client_write.shutdown().await.unwrap(); @@ -974,7 +991,7 @@ mod tests { .await .expect("must not block when chunked body is complete in overflow"); - let reusable = result.expect("relay_response should succeed"); + let (reusable, _status) = result.expect("relay_response should succeed"); assert!(reusable, "connection should be reusable"); client_write.shutdown().await.unwrap(); @@ -1010,7 +1027,7 @@ mod tests { .await .expect("must not block when chunked response has trailers"); - let reusable = result.expect("relay_response should succeed"); + let (reusable, _status) = result.expect("relay_response should succeed"); assert!(reusable, "chunked response should be reusable"); client_write.shutdown().await.unwrap(); @@ 
-1045,7 +1062,7 @@ mod tests { .await .expect("normal relay must not deadlock"); - let reusable = result.expect("relay_response should succeed"); + let (reusable, _status) = result.expect("relay_response should succeed"); assert!(reusable, "Content-Length response should be reusable"); client_write.shutdown().await.unwrap(); @@ -1073,7 +1090,7 @@ mod tests { .await .expect("relay must not deadlock"); - let reusable = result.expect("relay_response should succeed"); + let (reusable, _status) = result.expect("relay_response should succeed"); // With explicit framing, Connection: close is still reported as reusable // so the relay loop continues. The *next* upstream write will fail and // exit the loop via the normal error path. diff --git a/crates/openshell-sandbox/src/l7/tls.rs b/crates/openshell-sandbox/src/l7/tls.rs index 25b9c8e7..4ec0de03 100644 --- a/crates/openshell-sandbox/src/l7/tls.rs +++ b/crates/openshell-sandbox/src/l7/tls.rs @@ -259,6 +259,31 @@ pub fn parse_pem_certs(path: &Path) -> Result>> { .into_diagnostic() } +/// Peek the first bytes of a stream and determine if it looks like a TLS +/// ClientHello handshake. +/// +/// A TLS record starts with: +/// - byte 0: `0x16` (ContentType::Handshake) +/// - bytes 1-2: TLS version (0x0301 = TLS 1.0, 0x0302 = TLS 1.1, 0x0303 = TLS 1.2/1.3) +/// +/// Returns `true` if the peeked bytes match the TLS handshake pattern. +/// Returns `false` for plaintext HTTP, raw binary, or insufficient data. 
+pub fn looks_like_tls(peek: &[u8]) -> bool { + if peek.len() < 3 { + return false; + } + // ContentType::Handshake + if peek[0] != 0x16 { + return false; + } + // TLS version major must be 0x03 (SSL 3.0 / TLS 1.x) + if peek[1] != 0x03 { + return false; + } + // TLS version minor: 0x00 (SSL 3.0) through 0x04 (TLS 1.3 record layer) + peek[2] <= 0x04 +} + #[cfg(test)] mod tests { use super::*; @@ -309,6 +334,42 @@ mod tests { assert_eq!(cache_inner.len(), 1); } + #[test] + fn looks_like_tls_valid_clienthello() { + // TLS 1.0 ClientHello + assert!(looks_like_tls(&[0x16, 0x03, 0x01, 0x00, 0x05])); + // TLS 1.2 + assert!(looks_like_tls(&[0x16, 0x03, 0x03, 0x01, 0x00])); + // TLS 1.3 record layer (minor 0x01, but hello advertises 1.3 via extension) + assert!(looks_like_tls(&[0x16, 0x03, 0x01])); + // SSL 3.0 + assert!(looks_like_tls(&[0x16, 0x03, 0x00])); + } + + #[test] + fn looks_like_tls_rejects_http() { + assert!(!looks_like_tls(b"GET / HTTP/1.1")); + assert!(!looks_like_tls(b"POST /api")); + assert!(!looks_like_tls(b"CONNECT host:443")); + } + + #[test] + fn looks_like_tls_rejects_short_input() { + assert!(!looks_like_tls(&[])); + assert!(!looks_like_tls(&[0x16])); + assert!(!looks_like_tls(&[0x16, 0x03])); + } + + #[test] + fn looks_like_tls_rejects_non_tls_binary() { + // SSH protocol + assert!(!looks_like_tls(b"SSH-2.0-OpenSSH")); + // Random binary + assert!(!looks_like_tls(&[0xFF, 0xFE, 0x00])); + // Wrong content type + assert!(!looks_like_tls(&[0x17, 0x03, 0x03])); // Application data, not handshake + } + #[test] fn upstream_config_alpn() { let _ = rustls::crypto::ring::default_provider().install_default(); diff --git a/crates/openshell-sandbox/src/mechanistic_mapper.rs b/crates/openshell-sandbox/src/mechanistic_mapper.rs index e3567321..e5ae6497 100644 --- a/crates/openshell-sandbox/src/mechanistic_mapper.rs +++ b/crates/openshell-sandbox/src/mechanistic_mapper.rs @@ -114,7 +114,6 @@ pub async fn generate_proposals(summaries: &[DenialSummary]) -> Vec port: 
*port, ports: vec![*port], protocol: "rest".to_string(), - tls: "terminate".to_string(), enforcement: "enforce".to_string(), rules: l7_rules, allowed_ips: allowed_ips.clone(), @@ -606,7 +605,8 @@ mod tests { // L7 fields should be set. assert_eq!(ep.protocol, "rest"); - assert_eq!(ep.tls, "terminate"); + // tls field is no longer set (auto-detection handles it). + assert!(ep.tls.is_empty()); assert_eq!(ep.enforcement, "enforce"); // Should have L7 rules. diff --git a/crates/openshell-sandbox/src/proxy.rs b/crates/openshell-sandbox/src/proxy.rs index 7fc267e0..bab9b41f 100644 --- a/crates/openshell-sandbox/src/proxy.rs +++ b/crates/openshell-sandbox/src/proxy.rs @@ -523,110 +523,131 @@ async fn handle_tcp_connection( engine = "opa", policy = %policy_str, reason = "", - connect_msg, + "{connect_msg}", ); - if let Some(l7_config) = l7_config { - // Clone engine for per-tunnel L7 evaluation (cheap: shares compiled policy via Arc) - let tunnel_engine = opa_engine.clone_engine_for_tunnel().unwrap_or_else(|e| { - warn!(error = %e, "Failed to clone OPA engine for L7, falling back to L4-only"); - // This shouldn't happen, but if it does fall through to copy_bidirectional - regorus::Engine::new() - }); + // Determine effective TLS mode. Check the raw endpoint config for + // `tls: skip` independently of L7 config (which requires `protocol`). 
+ let effective_tls_skip = + query_tls_mode(&opa_engine, &decision, &host_lc, port) == crate::l7::TlsMode::Skip; - let ctx = crate::l7::relay::L7EvalContext { - host: host_lc.clone(), - port, - policy_name: matched_policy.clone().unwrap_or_default(), - binary_path: decision - .binary - .as_ref() - .map(|p| p.to_string_lossy().into_owned()) - .unwrap_or_default(), - ancestors: decision - .ancestors - .iter() - .map(|p| p.to_string_lossy().into_owned()) - .collect(), - cmdline_paths: decision - .cmdline_paths - .iter() - .map(|p| p.to_string_lossy().into_owned()) - .collect(), - secret_resolver: secret_resolver.clone(), - }; + // Build L7 eval context (shared by TLS-terminated and plaintext paths). + let ctx = crate::l7::relay::L7EvalContext { + host: host_lc.clone(), + port, + policy_name: matched_policy.clone().unwrap_or_default(), + binary_path: decision + .binary + .as_ref() + .map(|p| p.to_string_lossy().into_owned()) + .unwrap_or_default(), + ancestors: decision + .ancestors + .iter() + .map(|p| p.to_string_lossy().into_owned()) + .collect(), + cmdline_paths: decision + .cmdline_paths + .iter() + .map(|p| p.to_string_lossy().into_owned()) + .collect(), + secret_resolver: secret_resolver.clone(), + }; - if l7_config.tls == crate::l7::TlsMode::Terminate { - // TLS termination: MITM decrypt, inspect, re-encrypt - if let Some(ref tls) = tls_state { - let l7_result = async { - let mut tls_client = - crate::l7::tls::tls_terminate_client(client, tls, &host_lc).await?; - let mut tls_upstream = crate::l7::tls::tls_connect_upstream( - upstream, - &host_lc, - tls.upstream_config(), - ) - .await?; - // No protocol detection needed — ALPN proves HTTP + if effective_tls_skip { + // tls: skip — raw tunnel, no termination, no credential injection. 
+ debug!( + host = %host_lc, + port = port, + "tls: skip — bypassing TLS auto-detection, raw tunnel" + ); + let _ = tokio::io::copy_bidirectional(&mut client, &mut upstream) + .await + .into_diagnostic()?; + return Ok(()); + } + + // Auto-detect TLS by peeking the first bytes. + let mut peek_buf = [0u8; 8]; + let n = client.peek(&mut peek_buf).await.into_diagnostic()?; + if n == 0 { + return Ok(()); + } + + let is_tls = crate::l7::tls::looks_like_tls(&peek_buf[..n]); + let is_http = crate::l7::rest::looks_like_http(&peek_buf[..n]); + + if is_tls { + // TLS detected — terminate unconditionally. + if let Some(ref tls) = tls_state { + let tls_result = async { + let mut tls_client = + crate::l7::tls::tls_terminate_client(client, tls, &host_lc).await?; + let mut tls_upstream = + crate::l7::tls::tls_connect_upstream(upstream, &host_lc, tls.upstream_config()) + .await?; + + if let Some(ref l7_config) = l7_config { + // L7 inspection on terminated TLS traffic. + let tunnel_engine = + opa_engine.clone_engine_for_tunnel().unwrap_or_else(|e| { + warn!(error = %e, "Failed to clone OPA engine for L7, falling back to relay-only"); + regorus::Engine::new() + }); crate::l7::relay::relay_with_inspection( - &l7_config, + l7_config, std::sync::Mutex::new(tunnel_engine), &mut tls_client, &mut tls_upstream, &ctx, ) .await - }; - if let Err(e) = l7_result.await { - if is_benign_relay_error(&e) { - debug!( - host = %host_lc, - port = port, - error = %e, - "TLS L7 connection closed" - ); - } else { - warn!( - host = %host_lc, - port = port, - error = %e, - "TLS L7 relay error" - ); - } - } - } else { - warn!( - host = %host_lc, - port = port, - "TLS termination requested but TLS state not configured, falling back to L4" - ); - let _ = tokio::io::copy_bidirectional(&mut client, &mut upstream) + } else { + // No L7 config — relay with credential injection only. 
+ crate::l7::relay::relay_passthrough_with_credentials( + &mut tls_client, + &mut tls_upstream, + &ctx, + ) .await - .into_diagnostic()?; - } - } else { - // Plaintext: protocol detection via peek on raw TcpStream - if l7_config.protocol == crate::l7::L7Protocol::Rest { - let mut peek_buf = [0u8; 8]; - let n = client.peek(&mut peek_buf).await.into_diagnostic()?; - if n == 0 { - return Ok(()); } - if !crate::l7::rest::looks_like_http(&peek_buf[..n]) { + }; + if let Err(e) = tls_result.await { + if is_benign_relay_error(&e) { + debug!( + host = %host_lc, + port = port, + error = %e, + "TLS connection closed" + ); + } else { warn!( host = %host_lc, port = port, - policy = %ctx.policy_name, - "Expected REST protocol but received non-matching bytes. Connection rejected." + error = %e, + "TLS relay error" ); - return Err(miette::miette!( - "Protocol mismatch: expected HTTP but received non-HTTP bytes" - )); } } + } else { + warn!( + host = %host_lc, + port = port, + "TLS detected but TLS state not configured, falling back to raw tunnel" + ); + let _ = tokio::io::copy_bidirectional(&mut client, &mut upstream) + .await + .into_diagnostic()?; + } + } else if is_http { + // Plaintext HTTP detected. 
+ if let Some(ref l7_config) = l7_config { + let tunnel_engine = opa_engine.clone_engine_for_tunnel().unwrap_or_else(|e| { + warn!(error = %e, "Failed to clone OPA engine for L7, falling back to relay-only"); + regorus::Engine::new() + }); if let Err(e) = crate::l7::relay::relay_with_inspection( - &l7_config, + l7_config, std::sync::Mutex::new(tunnel_engine), &mut client, &mut upstream, @@ -635,30 +656,39 @@ async fn handle_tcp_connection( .await { if is_benign_relay_error(&e) { - debug!( - host = %host_lc, - port = port, - error = %e, - "L7 connection closed" - ); + debug!(host = %host_lc, port = port, error = %e, "L7 connection closed"); } else { - warn!( - host = %host_lc, - port = port, - error = %e, - "L7 relay error" - ); + warn!(host = %host_lc, port = port, error = %e, "L7 relay error"); + } + } + } else { + // Plaintext HTTP, no L7 config — relay with credential injection. + if let Err(e) = crate::l7::relay::relay_passthrough_with_credentials( + &mut client, + &mut upstream, + &ctx, + ) + .await + { + if is_benign_relay_error(&e) { + debug!(host = %host_lc, port = port, error = %e, "HTTP relay closed"); + } else { + warn!(host = %host_lc, port = port, error = %e, "HTTP relay error"); } } } - return Ok(()); + } else { + // Neither TLS nor HTTP — raw binary relay. + debug!( + host = %host_lc, + port = port, + "Non-TLS non-HTTP traffic detected, raw tunnel" + ); + let _ = tokio::io::copy_bidirectional(&mut client, &mut upstream) + .await + .into_diagnostic()?; } - // L4-only: raw bidirectional copy (existing behavior) - let _ = tokio::io::copy_bidirectional(&mut client, &mut upstream) - .await - .into_diagnostic()?; - Ok(()) } @@ -1143,6 +1173,38 @@ fn query_l7_config( } } +/// Query the TLS mode for an endpoint, independent of L7 config. +/// +/// This extracts `tls: skip` from the endpoint even when no `protocol` is set. 
+fn query_tls_mode( + engine: &OpaEngine, + decision: &ConnectDecision, + host: &str, + port: u16, +) -> crate::l7::TlsMode { + let has_policy = match &decision.action { + NetworkAction::Allow { matched_policy } => matched_policy.is_some(), + _ => false, + }; + if !has_policy { + return crate::l7::TlsMode::Auto; + } + + let input = crate::opa::NetworkInput { + host: host.to_string(), + port, + binary_path: decision.binary.clone().unwrap_or_default(), + binary_sha256: String::new(), + ancestors: decision.ancestors.clone(), + cmdline_paths: decision.cmdline_paths.clone(), + }; + + match engine.query_endpoint_config(&input) { + Ok(Some(val)) => crate::l7::parse_tls_mode(&val), + _ => crate::l7::TlsMode::Auto, + } +} + /// Check if an IP address is internal (loopback, private RFC1918, or link-local). /// /// This is a defense-in-depth check to prevent SSRF via the CONNECT proxy. diff --git a/crates/openshell-sandbox/testdata/sandbox-policy.yaml b/crates/openshell-sandbox/testdata/sandbox-policy.yaml index 76ad39b2..297face2 100644 --- a/crates/openshell-sandbox/testdata/sandbox-policy.yaml +++ b/crates/openshell-sandbox/testdata/sandbox-policy.yaml @@ -45,7 +45,6 @@ network_policies: - host: github.com port: 443 protocol: rest - tls: terminate enforcement: enforce rules: - allow: diff --git a/docs/reference/policy-schema.md b/docs/reference/policy-schema.md index decece75..cb37d0ba 100644 --- a/docs/reference/policy-schema.md +++ b/docs/reference/policy-schema.md @@ -160,7 +160,7 @@ Each endpoint defines a reachable destination and optional inspection rules. | `host` | string | Yes | Hostname or IP address. Supports wildcards: `*.example.com` matches any subdomain. | | `port` | integer | Yes | TCP port number. | | `protocol` | string | No | Set to `rest` to enable HTTP request inspection. Omit for TCP passthrough. | -| `tls` | string | No | TLS handling mode. `terminate` decrypts TLS at the proxy for inspection. 
`passthrough` forwards encrypted traffic without inspection. Only relevant when `protocol` is `rest`. | +| `tls` | string | No | TLS handling mode. The proxy auto-detects TLS by peeking the first bytes of each connection and terminates it when `protocol` is `rest`, so this field is optional in most cases. Set to `skip` to disable auto-detection for edge cases such as client-certificate mTLS or non-standard protocols. The values `terminate` and `passthrough` are deprecated and log a warning; they are still accepted for backward compatibility but have no effect on behavior. | | `enforcement` | string | No | `enforce` actively blocks disallowed requests. `audit` logs violations but allows traffic through. | | `access` | string | No | HTTP access level. One of `read-only`, `read-write`, or `full`. Mutually exclusive with `rules`. | | `rules` | list of rule objects | No | Fine-grained per-method, per-path allow rules. Mutually exclusive with `access`. | @@ -216,7 +216,6 @@ network_policies: - host: api.github.com port: 443 protocol: rest - tls: terminate enforcement: enforce access: read-only binaries: diff --git a/docs/sandboxes/policies.md b/docs/sandboxes/policies.md index 191c9e79..565a7a4c 100644 --- a/docs/sandboxes/policies.md +++ b/docs/sandboxes/policies.md @@ -57,7 +57,6 @@ network_policies: - host: api.example.com port: 443 protocol: rest - tls: terminate enforcement: enforce access: full binaries: @@ -73,7 +72,7 @@ Dynamic sections can be updated on a running sandbox with `openshell policy set` | `filesystem_policy` | Static | Controls which directories the agent can access on disk. Paths are split into `read_only` and `read_write` lists. Any path not listed in either list is inaccessible. Set `include_workdir: true` to automatically add the agent's working directory to `read_write`. [Landlock LSM](https://docs.kernel.org/security/landlock.html) enforces these restrictions at the kernel level. 
| | `landlock` | Static | Configures Landlock LSM enforcement behavior. Set `compatibility` to `best_effort` (use the highest ABI the host kernel supports) or `hard_requirement` (fail if the required ABI is unavailable). | | `process` | Static | Sets the OS-level identity for the agent process. `run_as_user` and `run_as_group` default to `sandbox`. Root (`root` or `0`) is rejected. The agent also runs with seccomp filters that block dangerous system calls. | -| `network_policies` | Dynamic | Controls network access for ordinary outbound traffic from the sandbox. Each block has a name, a list of endpoints (host, port, protocol, and optional rules), and a list of binaries allowed to use those endpoints.
Every outbound connection except `https://inference.local` goes through the proxy, which queries the {doc}`policy engine <../about/architecture>` with the destination and calling binary. A connection is allowed only when both match an entry in the same policy block.
For endpoints with `protocol: rest` and `tls: terminate`, each HTTP request is also checked against that endpoint's `rules` (method and path).
Endpoints without `protocol` or `tls` allow the TCP stream through without inspecting payloads.
If no endpoint matches, the connection is denied. Configure managed inference separately through {doc}`../inference/configure`. | +| `network_policies` | Dynamic | Controls network access for ordinary outbound traffic from the sandbox. Each block has a name, a list of endpoints (host, port, protocol, and optional rules), and a list of binaries allowed to use those endpoints.
Every outbound connection except `https://inference.local` goes through the proxy, which queries the {doc}`policy engine <../about/architecture>` with the destination and calling binary. A connection is allowed only when both match an entry in the same policy block.
For endpoints with `protocol: rest`, the proxy auto-detects TLS and terminates it so each HTTP request is checked against that endpoint's `rules` (method and path).
Endpoints without `protocol` allow the TCP stream through without inspecting payloads.
If no endpoint matches, the connection is denied. Configure managed inference separately through {doc}`../inference/configure`. | ## Apply a Custom Policy @@ -207,7 +206,7 @@ Allow `pip install` and `uv pip install` to reach PyPI: - { path: /usr/local/bin/uv } ``` -Endpoints without `protocol` or `tls` use TCP passthrough — the proxy allows the stream without inspecting payloads. +Endpoints without `protocol` use TCP passthrough, where the proxy allows the stream without inspecting payloads. :::: ::::{tab-item} Granular rules @@ -224,7 +223,6 @@ For an end-to-end walkthrough that combines this policy with a GitHub credential - host: api.github.com port: 443 protocol: rest - tls: terminate enforcement: enforce rules: - allow: @@ -253,7 +251,7 @@ For an end-to-end walkthrough that combines this policy with a GitHub credential - { path: /usr/bin/gh } ``` -Endpoints with `protocol: rest` and `tls: terminate` enable HTTP request inspection — the proxy decrypts TLS and checks each HTTP request against the `rules` list. +Endpoints with `protocol: rest` enable HTTP request inspection. The proxy auto-detects TLS on HTTPS endpoints, terminates it, and checks each HTTP request against the `rules` list. :::: ::::: diff --git a/docs/tutorials/first-network-policy.md b/docs/tutorials/first-network-policy.md index 44f5b14d..5011ac89 100644 --- a/docs/tutorials/first-network-policy.md +++ b/docs/tutorials/first-network-policy.md @@ -127,14 +127,13 @@ network_policies: - host: api.github.com port: 443 protocol: rest - tls: terminate enforcement: enforce access: read-only binaries: - { path: /usr/bin/curl } ``` -The `filesystem_policy`, `landlock`, and `process` sections preserve the default sandbox settings. This is required because `policy set` replaces the entire policy. The `network_policies` section is the key part: `curl` may make GET, HEAD, and OPTIONS requests to `api.github.com` over HTTPS. Everything else is denied. 
The proxy terminates TLS using `tls: terminate` to inspect each HTTP request and enforce the `read-only` access preset at the method level. +The `filesystem_policy`, `landlock`, and `process` sections preserve the default sandbox settings. This is required because `policy set` replaces the entire policy. The `network_policies` section is the key part: `curl` may make GET, HEAD, and OPTIONS requests to `api.github.com` over HTTPS. Everything else is denied. The proxy auto-detects TLS on HTTPS endpoints and terminates it to inspect each HTTP request and enforce the `read-only` access preset at the method level. Apply it: diff --git a/docs/tutorials/github-sandbox.md b/docs/tutorials/github-sandbox.md index 9798b629..9372ca29 100644 --- a/docs/tutorials/github-sandbox.md +++ b/docs/tutorials/github-sandbox.md @@ -230,7 +230,7 @@ network_policies: claude_code: name: claude-code endpoints: - - { host: api.anthropic.com, port: 443, protocol: rest, enforcement: enforce, access: full, tls: terminate } + - { host: api.anthropic.com, port: 443, protocol: rest, enforcement: enforce, access: full } - { host: statsig.anthropic.com, port: 443 } - { host: sentry.io, port: 443 } - { host: raw.githubusercontent.com, port: 443 } @@ -257,7 +257,6 @@ network_policies: - host: github.com port: 443 protocol: rest - tls: terminate enforcement: enforce rules: - allow: @@ -280,7 +279,6 @@ network_policies: - host: api.github.com port: 443 protocol: rest - tls: terminate enforcement: enforce rules: # GraphQL API (used by gh CLI) @@ -337,8 +335,8 @@ The following table summarizes the two GitHub-specific blocks: | Block | Endpoint | Behavior | |---|---|---| -| `github_git` | `github.com:443` | Git Smart HTTP protocol with TLS termination. Permits `info/refs` (clone/fetch), `git-upload-pack` (fetch data), and `git-receive-pack` (push) for the specified repository. Denies all operations on unlisted repositories. | -| `github_api` | `api.github.com:443` | REST API with TLS termination. 
Permits all HTTP methods for the specified repository and GraphQL queries. Denies API access to unlisted repositories. | +| `github_git` | `github.com:443` | Git Smart HTTP protocol. The proxy auto-detects and terminates TLS to inspect requests. Permits `info/refs` (clone/fetch), `git-upload-pack` (fetch data), and `git-receive-pack` (push) for the specified repository. Denies all operations on unlisted repositories. | +| `github_api` | `api.github.com:443` | REST API. The proxy auto-detects and terminates TLS to inspect requests. Permits all HTTP methods for the specified repository and GraphQL queries. Denies API access to unlisted repositories. | The remaining blocks (`claude_code`, `nvidia_inference`, `pypi`, `vscode`) are identical to the {doc}`default policy `. The default policy's `github_ssh_over_https` and `github_rest_api` blocks are replaced by the `github_git` and `github_api` blocks above, which grant write access to the specified repository. Sandbox behavior outside of GitHub operations is unchanged. diff --git a/examples/sandbox-policy-quickstart/policy.yaml b/examples/sandbox-policy-quickstart/policy.yaml index d90406c1..6bb0cb7d 100644 --- a/examples/sandbox-policy-quickstart/policy.yaml +++ b/examples/sandbox-policy-quickstart/policy.yaml @@ -26,7 +26,6 @@ network_policies: - host: api.github.com port: 443 protocol: rest - tls: terminate enforcement: enforce access: read-only binaries: diff --git a/scripts/smoke-test-network-policy.sh b/scripts/smoke-test-network-policy.sh new file mode 100755 index 00000000..383bde0f --- /dev/null +++ b/scripts/smoke-test-network-policy.sh @@ -0,0 +1,481 @@ +#!/usr/bin/env bash +# SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved. 
+# SPDX-License-Identifier: Apache-2.0 + +# ============================================================================= +# Network Policy Smoke Test +# ============================================================================= +# +# End-to-end smoke test for sandbox network policies, TLS auto-termination, +# credential injection, and L4/L7 enforcement. Uses GitHub's API as the target. +# +# Prerequisites: +# - A running OpenShell gateway (`openshell status` shows Healthy) +# - GITHUB_TOKEN or GH_TOKEN env var set with a valid GitHub token +# - The `openshell` CLI on PATH +# +# Usage: +# GITHUB_TOKEN=ghp_xxx ./scripts/smoke-test-network-policy.sh +# +# What it tests: +# +# Phase 1 — L4 allow/deny (no L7 rules): +# Creates a sandbox with L4-only policy for api.github.com. +# - curl api.github.com/zen -> should succeed (TLS auto-terminated) +# - curl httpbin.org -> should be blocked (implicit deny) +# +# Phase 2 — L7 enforcement (method + path rules): +# Creates a sandbox with read-only L7 enforcement. +# - GET /zen -> should succeed +# - POST /user/repos -> should be blocked (403) +# +# Phase 3 — Credential injection: +# Creates a sandbox with provider attached and full L7 access. +# - curl /user (Authorization header carries the placeholder $GITHUB_TOKEN) +# -> should return authenticated response +# (proxy rewrites the placeholder to the real token via TLS MITM) +# +# Phase 4 — tls: skip escape hatch: +# Creates a sandbox with tls: skip. +# - curl /zen -> should succeed (raw tunnel, no auth needed) +# - curl /user -> should get 401 (no credential injection) +# +# After all tests, sandboxes are kept alive for log inspection. +# The script prompts before cleanup. 
+# +# ============================================================================= +# +# Embedded Policy YAMLs +# ============================================================================= +# +# POLICY_L4_ONLY (L4 allow api.github.com:443, deny everything else): +# network_policies: +# github_api: +# endpoints: [{ host: api.github.com, port: 443 }] +# binaries: [{ path: /usr/bin/curl }] +# +# POLICY_L7_READONLY (L7 read-only enforcement): +# network_policies: +# github_api: +# endpoints: +# - host: api.github.com +# port: 443 +# protocol: rest +# enforcement: enforce +# access: read-only +# binaries: [{ path: /usr/bin/curl }] +# +# POLICY_CRED_INJECT (L7 full access, provider credential injection): +# Same as L7 but with access: full +# +# POLICY_TLS_SKIP (L4 with tls: skip — raw tunnel): +# network_policies: +# github_api: +# endpoints: [{ host: api.github.com, port: 443, tls: skip }] +# binaries: [{ path: /usr/bin/curl }] +# +# ============================================================================= + +set -uo pipefail +# Note: NOT using set -e so we can capture exit codes without exiting. + +RED='\033[0;31m' +GREEN='\033[0;32m' +YELLOW='\033[0;33m' +BOLD='\033[1m' +RESET='\033[0m' + +PASS_COUNT=0 +FAIL_COUNT=0 + +pass() { ((PASS_COUNT++)); echo -e " ${GREEN}PASS${RESET} $1"; } +fail() { ((FAIL_COUNT++)); echo -e " ${RED}FAIL${RESET} $1\n $2"; } +header() { echo -e "\n${BOLD}=== $1 ===${RESET}"; } + +PROVIDER_NAME="smoke-test-github" +SANDBOXES=() +POLICY_DIR="" + +# Resolve token from GITHUB_TOKEN or GH_TOKEN +TOKEN="${GITHUB_TOKEN:-${GH_TOKEN:-}}" + +# --------------------------------------------------------------------------- +# Preflight +# --------------------------------------------------------------------------- + +header "Preflight" + +if [[ -z "$TOKEN" ]]; then + echo -e "${RED}Error: GITHUB_TOKEN or GH_TOKEN env var is required${RESET}" + exit 1 +fi +echo " Token is set" + +if ! 
openshell status >/dev/null 2>&1; then + echo -e "${RED}Error: No healthy gateway. Run: openshell gateway start${RESET}" + exit 1 +fi +echo " Gateway is healthy" + +POLICY_DIR=$(mktemp -d) +echo " Policy dir: $POLICY_DIR" + +# --------------------------------------------------------------------------- +# Helpers +# --------------------------------------------------------------------------- + +write_policy() { + local name="$1" + local file="$POLICY_DIR/${name}.yaml" + cat > "$file" + echo "$file" +} + +# Create a sandbox with --keep and a sleep, wait for Ready. +create_sandbox() { + local name="$1" + shift + local provider_flag=("$@") + + echo " Creating sandbox: $name" + # Guarded expansion: under `set -u`, expanding an empty array is an + # "unbound variable" error on bash older than 4.4. + openshell sandbox create --name "$name" --keep ${provider_flag[@]+"${provider_flag[@]}"} \ + -- sh -c "echo Ready && sleep 3600" >/dev/null 2>&1 & + local pid=$! + + local attempts=0 + while [[ $attempts -lt 40 ]]; do + if openshell sandbox list 2>/dev/null | grep -q "$name.*Ready"; then + echo " Sandbox $name is Ready" + SANDBOXES+=("$name") + # Kill the blocking create process (sandbox stays alive with --keep) + kill "$pid" 2>/dev/null || true + wait "$pid" 2>/dev/null || true + return 0 + fi + sleep 2 + ((attempts++)) + done + + echo -e " ${RED}TIMEOUT waiting for $name${RESET}" + kill "$pid" 2>/dev/null || true + wait "$pid" 2>/dev/null || true + return 1 +} + +# Run a command inside a sandbox via SSH. 
+sandbox_exec() { + local name="$1" + shift + + local ssh_config + ssh_config=$(openshell sandbox ssh-config "$name" 2>/dev/null) + local ssh_host + ssh_host=$(echo "$ssh_config" | grep "^Host " | awk '{print $2}') + local ssh_config_file="$POLICY_DIR/ssh_config_${name}" + echo "$ssh_config" > "$ssh_config_file" + + ssh -F "$ssh_config_file" \ + -o StrictHostKeyChecking=no \ + -o UserKnownHostsFile=/dev/null \ + -o LogLevel=ERROR \ + -o ConnectTimeout=15 \ + "$ssh_host" "$@" 2>&1 +} + +# --------------------------------------------------------------------------- +# Write policies +# --------------------------------------------------------------------------- + +POLICY_L4=$(write_policy l4-only <<'YAML' +version: 1 +filesystem_policy: + include_workdir: true + read_only: [/usr, /lib, /proc, /dev/urandom, /app, /etc, /var/log] + read_write: [/sandbox, /tmp, /dev/null] +landlock: + compatibility: best_effort +process: + run_as_user: sandbox + run_as_group: sandbox +network_policies: + github_api: + name: github-api-l4 + endpoints: + - host: api.github.com + port: 443 + binaries: + - { path: /usr/bin/curl } +YAML +) + +POLICY_L7_RO=$(write_policy l7-readonly <<'YAML' +version: 1 +filesystem_policy: + include_workdir: true + read_only: [/usr, /lib, /proc, /dev/urandom, /app, /etc, /var/log] + read_write: [/sandbox, /tmp, /dev/null] +landlock: + compatibility: best_effort +process: + run_as_user: sandbox + run_as_group: sandbox +network_policies: + github_api: + name: github-api-l7-readonly + endpoints: + - host: api.github.com + port: 443 + protocol: rest + enforcement: enforce + access: read-only + binaries: + - { path: /usr/bin/curl } +YAML +) + +POLICY_CRED=$(write_policy l7-cred-inject <<'YAML' +version: 1 +filesystem_policy: + include_workdir: true + read_only: [/usr, /lib, /proc, /dev/urandom, /app, /etc, /var/log] + read_write: [/sandbox, /tmp, /dev/null] +landlock: + compatibility: best_effort +process: + run_as_user: sandbox + run_as_group: sandbox 
+network_policies: + github_api: + name: github-api-cred-inject + endpoints: + - host: api.github.com + port: 443 + protocol: rest + enforcement: enforce + access: full + binaries: + - { path: /usr/bin/curl } +YAML +) + +POLICY_SKIP=$(write_policy tls-skip <<'YAML' +version: 1 +filesystem_policy: + include_workdir: true + read_only: [/usr, /lib, /proc, /dev/urandom, /app, /etc, /var/log] + read_write: [/sandbox, /tmp, /dev/null] +landlock: + compatibility: best_effort +process: + run_as_user: sandbox + run_as_group: sandbox +network_policies: + github_api: + name: github-api-skip + endpoints: + - host: api.github.com + port: 443 + tls: skip + binaries: + - { path: /usr/bin/curl } +YAML +) + +# --------------------------------------------------------------------------- +# Phase 0: Provider setup +# --------------------------------------------------------------------------- + +header "Phase 0: Provider Setup" + +openshell provider delete "$PROVIDER_NAME" >/dev/null 2>&1 || true + +if openshell provider create \ + --name "$PROVIDER_NAME" \ + --type github \ + --credential "GITHUB_TOKEN=$TOKEN" >/dev/null 2>&1; then + pass "Provider '$PROVIDER_NAME' created" +else + fail "Provider creation failed" "" + exit 1 +fi + +# --------------------------------------------------------------------------- +# Phase 1: L4 allow/deny +# --------------------------------------------------------------------------- + +header "Phase 1: L4 Allow/Deny (no L7 rules, TLS auto-terminated)" + +SB1="smoke-l4" +if create_sandbox "$SB1"; then + echo " Setting L4-only policy..." + openshell policy set "$SB1" --policy "$POLICY_L4" >/dev/null 2>&1 + echo " Waiting for policy propagation (15s)..." 
+ sleep 15 + + # Test 1: L4 allow + echo " Running: curl api.github.com/zen" + output=$(sandbox_exec "$SB1" "curl -s -o /dev/null -w '%{http_code}' --max-time 10 https://api.github.com/zen") + if [[ "$output" == *"200"* ]]; then + pass "L4 allow: curl to api.github.com succeeded (HTTP 200)" + else + fail "L4 allow: expected HTTP 200" "got: $output" + fi + + # Test 2: L4 deny (implicit deny for httpbin.org) + echo " Running: curl httpbin.org (should be blocked)" + output=$(sandbox_exec "$SB1" "curl -s -o /dev/null -w '%{http_code}' --max-time 10 https://httpbin.org/get" || true) + if [[ "$output" == *"403"* || "$output" == *"000"* || -z "$output" ]]; then + pass "L4 deny: curl to httpbin.org blocked" + else + fail "L4 deny: expected connection failure" "got: $output" + fi +else + fail "L4 sandbox creation failed" "" + fail "L4 deny test skipped" "sandbox not created" +fi + +# --------------------------------------------------------------------------- +# Phase 2: L7 enforcement +# --------------------------------------------------------------------------- + +header "Phase 2: L7 Enforcement (read-only, TLS auto-terminated)" + +SB2="smoke-l7" +if create_sandbox "$SB2"; then + echo " Setting L7 read-only policy..." + openshell policy set "$SB2" --policy "$POLICY_L7_RO" >/dev/null 2>&1 + echo " Waiting for policy propagation (15s)..." 
+ sleep 15 + + # Test 3: L7 allow (GET) + echo " Running: GET /zen" + output=$(sandbox_exec "$SB2" "curl -s -o /dev/null -w '%{http_code}' --max-time 10 https://api.github.com/zen") + if [[ "$output" == *"200"* ]]; then + pass "L7 allow: GET /zen succeeded (read-only allows GET)" + else + fail "L7 allow: expected HTTP 200 for GET" "got: $output" + fi + + # Test 4: L7 deny (POST blocked by read-only) + echo " Running: POST /user/repos (should be blocked)" + output=$(sandbox_exec "$SB2" "curl -s -o /dev/null -w '%{http_code}' --max-time 10 -X POST https://api.github.com/user/repos -d '{\"name\":\"should-not-create\"}'" || true) + if [[ "$output" == *"403"* ]]; then + pass "L7 deny: POST blocked by read-only enforcement" + else + fail "L7 deny: expected HTTP 403 for POST" "got: $output" + fi +else + fail "L7 sandbox creation failed" "" + fail "L7 deny test skipped" "sandbox not created" +fi + +# --------------------------------------------------------------------------- +# Phase 3: Credential injection +# --------------------------------------------------------------------------- + +header "Phase 3: Credential Injection (provider attached, TLS auto-terminated)" + +SB3="smoke-cred" +if create_sandbox "$SB3" --provider "$PROVIDER_NAME"; then + echo " Setting L7 full policy..." + openshell policy set "$SB3" --policy "$POLICY_CRED" >/dev/null 2>&1 + echo " Waiting for policy propagation (15s)..." + sleep 15 + + # Test 5: Credential injection — curl /user using the placeholder env var. + # The sandbox process sees GITHUB_TOKEN=openshell:resolve:env:GITHUB_TOKEN + # in its environment. When curl sends this as an Authorization header, + # the proxy's SecretResolver rewrites the placeholder to the real token. 
+ echo " Running: curl /user -H 'Authorization: token \$GITHUB_TOKEN'" + output=$(sandbox_exec "$SB3" 'curl -s --max-time 10 -H "Authorization: token $GITHUB_TOKEN" https://api.github.com/user' || true) + if [[ "$output" == *"login"* ]]; then + pass "Credential injection: /user returned authenticated response" + elif [[ "$output" == *"401"* || "$output" == *"Unauthorized"* ]]; then + fail "Credential injection: got 401 (placeholder may have leaked)" "$output" + else + fail "Credential injection: unexpected response" "$output" + fi +else + fail "Credential injection sandbox creation failed" "" +fi + +# --------------------------------------------------------------------------- +# Phase 4: tls: skip escape hatch +# --------------------------------------------------------------------------- + +header "Phase 4: tls: skip (raw tunnel, no MITM)" + +SB4="smoke-skip" +if create_sandbox "$SB4" --provider "$PROVIDER_NAME"; then + echo " Setting tls: skip policy..." + openshell policy set "$SB4" --policy "$POLICY_SKIP" >/dev/null 2>&1 + echo " Waiting for policy propagation (15s)..." + sleep 15 + + # Test 6: L4 connection succeeds (raw tunnel, /zen needs no auth) + echo " Running: curl /zen (should succeed via raw tunnel)" + output=$(sandbox_exec "$SB4" "curl -s -o /dev/null -w '%{http_code}' --max-time 10 https://api.github.com/zen" || true) + if [[ "$output" == *"200"* ]]; then + pass "tls: skip: L4 connection succeeded (raw tunnel)" + else + fail "tls: skip: expected 200 for /zen" "got: $output" + fi + + # Test 7: Credential injection does NOT work with tls: skip. + # The placeholder leaks verbatim since there's no MITM to rewrite it. 
+ echo " Running: curl /user with \$GITHUB_TOKEN (should fail, placeholder leaks)" + output=$(sandbox_exec "$SB4" 'curl -s --max-time 10 -H "Authorization: token $GITHUB_TOKEN" https://api.github.com/user' || true) + if [[ "$output" == *"401"* || "$output" == *"Unauthorized"* || "$output" == *"Bad credentials"* ]]; then + pass "tls: skip: /user returned 401 (credential injection bypassed)" + elif [[ "$output" == *"login"* ]]; then + fail "tls: skip: /user was authenticated (MITM should be disabled)" "$output" + else + pass "tls: skip: /user not authenticated (expected)" + fi +else + fail "tls: skip sandbox creation failed" "" +fi + +# --------------------------------------------------------------------------- +# Results +# --------------------------------------------------------------------------- + +header "Results" +echo -e " ${GREEN}Passed: ${PASS_COUNT}${RESET}" +echo -e " ${RED}Failed: ${FAIL_COUNT}${RESET}" +echo "" + +if [[ ${#SANDBOXES[@]} -gt 0 ]]; then + echo -e "${BOLD}Sandboxes kept for inspection:${RESET}" + for sb in "${SANDBOXES[@]}"; do + echo " - $sb" + done + echo "" + echo "Inspect logs with:" + echo " openshell logs --source sandbox" + echo "" + + read -r -p "Delete all smoke test sandboxes and provider? [y/N] " answer + if [[ "$answer" =~ ^[Yy]$ ]]; then + echo "" + for sb in "${SANDBOXES[@]}"; do + openshell sandbox delete "$sb" >/dev/null 2>&1 && echo " Deleted $sb" || true + done + openshell provider delete "$PROVIDER_NAME" >/dev/null 2>&1 && echo " Deleted provider $PROVIDER_NAME" || true + else + echo " Sandboxes left running. Clean up manually:" + echo " openshell sandbox delete --all" + echo " openshell provider delete $PROVIDER_NAME" + fi +fi + +# Clean up temp files +rm -rf "$POLICY_DIR" + +echo "" +if [[ $FAIL_COUNT -gt 0 ]]; then + echo -e "${RED}${BOLD}SMOKE TEST FAILED${RESET}" + exit 1 +else + echo -e "${GREEN}${BOLD}SMOKE TEST PASSED${RESET}" + exit 0 +fi
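The smoke test repeats the same status-code pattern matches in several phases (`*"200"*` for an allowed request; `*"403"*`, `*"000"*`, or empty output for a blocked one). As a minimal sketch, those checks could be factored into two small predicates. The names `is_allowed` and `is_blocked` are hypothetical and do not appear in the script; the matching is copied from the script's own substring checks and assumes the input is the output of `curl -w '%{http_code}'`.

```shell
#!/usr/bin/env bash
# Hypothetical helpers, not part of the smoke-test script in this diff.
# Both take the captured output of `curl -s -o /dev/null -w '%{http_code}'`.

# Succeeds when the request went through (the script treats any output
# containing "200" as success).
is_allowed() {
  local code="${1:-}"
  [[ "$code" == *"200"* ]]
}

# Succeeds when the request looks blocked: a 403 from the proxy, curl's
# "000" (no HTTP response received), or no output at all.
is_blocked() {
  local code="${1:-}"
  [[ -z "$code" || "$code" == *"403"* || "$code" == *"000"* ]]
}
```

With these in place, a check such as the Phase 1 deny test could read `if is_blocked "$output"; then pass "..."; else fail "..." "got: $output"; fi`, which keeps the pass/fail reporting unchanged while stating the accepted codes in one place.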