feat: add remote-cli docker troubleshooting#175
Conversation
Co-authored-by: Son Dao <son.dao@katalon.com>
Co-authored-by: Son Dao <son.dao@katalon.com>
There was a problem hiding this comment.
Pull request overview
This PR adds a tightly-scoped Docker troubleshooting surface to Thor by routing agent-visible docker ... commands through remote-cli (server-side policy enforcement), while keeping the real Docker CLI and host Docker socket mounted only inside remote-cli.
Changes:
- Add
POST /exec/dockertoremote-cliwith an allowlist policy (ps,logs,stats) and NDJSON streaming support. - Add an OpenCode-side
dockershim that forwards toremote-cliwhile preserving normal Docker CLI syntax. - Wire Docker CLI + socket into the
remote-clicontainer only, document operator setup (DOCKER_SOCKET_GID), and add targeted unit/e2e coverage.
Reviewed changes
Copilot reviewed 14 out of 15 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| scripts/test-e2e.sh | Adds an e2e check that OpenCode’s docker is a shim allowing only the approved subcommands. |
| README.md | Documents the new Docker troubleshooting integration surface + env var. |
| packages/remote-cli/src/policy.ts | Adds validateDockerArgs allowlist/denylist enforcement. |
| packages/remote-cli/src/policy.test.ts | Adds unit tests for the Docker policy validator. |
| packages/remote-cli/src/index.ts | Adds /exec/docker NDJSON streaming endpoint backed by policy + exec streaming. |
| packages/remote-cli/src/exec.ts | Extends streaming exec helper to support aborting via AbortSignal. |
| packages/remote-cli/src/docker.test.ts | Adds endpoint tests validating allow/deny behavior and NDJSON streaming output. |
| docs/plan/2026053101_docker-remote-cli-access.md | Design/rollout plan for the Docker troubleshooting surface. |
| docs/feat/security-model.md | Updates security model to call out Docker socket isolation and allowlist behavior. |
| Dockerfile | Installs Docker CLI only in remote-cli image; adds OpenCode docker shim only. |
| docker/opencode/config/skills/docker/SKILL.md | Adds an agent skill doc for the supported Docker troubleshooting commands. |
| docker/opencode/bin/docker | Adds shim that forwards docker ... to remote-cli. |
| docker-compose.yml | Mounts /var/run/docker.sock only into remote-cli and adds group_add for non-root access. |
| .github/workflows/core-e2e.yml | Exports DOCKER_SOCKET_GID for CI compose runs. |
| .env.example | Documents optional DOCKER_SOCKET_GID configuration. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| "opencode docker inspect returns remote-cli policy error" \ | ||
| "output: ${opencode_docker_inspect_output:0:500}" | ||
|
|
||
| # ── 2. Remote-cli git/gh auth ──────────────────────────────────────────────── |
There was a problem hiding this comment.
Fixed in 25cc4a9 — renumbered the following section header so the e2e step output stays consistent.
AI-generated — verify before acting. View Thor context
| export function validateDockerArgs(args: string[]): string | null { | ||
| if (!Array.isArray(args) || args.length === 0) { | ||
| return "args must be a non-empty array"; | ||
| } | ||
|
|
||
| for (const arg of args) { | ||
| const flag = arg.split("=")[0]; | ||
| const deniedFlag = arg.startsWith("-H") && arg.length > 2 ? "-H" : flag; | ||
| if (DENIED_DOCKER_DAEMON_FLAGS.has(deniedFlag)) { | ||
| return `flag "${deniedFlag}" is not allowed for docker`; | ||
| } | ||
| } | ||
|
|
||
| const subcommand = args[0]; | ||
| if (!ALLOWED_DOCKER_SUBCOMMANDS.has(subcommand)) { | ||
| return `"docker ${subcommand}" is not allowed — only docker ps, docker logs, and docker stats are permitted`; | ||
| } | ||
|
|
||
| return null; | ||
| } |
There was a problem hiding this comment.
Fixed in 25cc4a9 — validateDockerArgs now rejects non-string entries before iterating, so malformed payloads return a clean policy error instead of throwing.
AI-generated — verify before acting. View Thor context
| app.post("/exec/docker", async (req, res) => { | ||
| const writeNdjson = (chunk: ExecStreamEvent) => { | ||
| res.write(JSON.stringify(chunk) + "\n"); | ||
| }; |
There was a problem hiding this comment.
Fixed in 5e9af69 — NDJSON writes now go through a safe writer that no-ops after the response is ended/destroyed and swallows late write errors. I reused it across the streaming routes and added regression coverage for the safe-writer behavior.
AI-generated — verify before acting. View Thor context
| export function validateDockerArgs(args: string[]): string | null { | ||
| if (!Array.isArray(args) || args.length === 0 || !args.every((arg) => typeof arg === "string")) { | ||
| return "args must be a non-empty array"; | ||
| } |
There was a problem hiding this comment.
Fixed in the latest branch head — the Docker policy now returns args must be a non-empty string array, and I added a route-level regression test covering non-string args.
AI-generated — verify before acting. View Thor context
Co-authored-by: Son Dao <son.dao@katalon.com>
|
|
||
| const write = (chunk: ExecStreamEvent) => { | ||
| res.write(JSON.stringify(chunk) + "\n"); | ||
| }; | ||
| const write = createSafeNdjsonWriter(res); | ||
|
|
||
| await withNdjsonHeartbeat(write, async () => { | ||
| const exitCode = await execCommandStream("scoutqa", args, "/workspace", { |
There was a problem hiding this comment.
Thanks — I’m confirming with the original requester before making further PR changes.
AI-generated — verify before acting. View Thor context
There was a problem hiding this comment.
Fixed in 3f8db0d — the scoutqa streaming catch path now reuses createSafeNdjsonWriter(...) for its exit chunk, so it no longer bypasses the disconnect-safe writer on late errors.
AI-generated — verify before acting. View Thor context
| describe("validateDockerArgs", () => { | ||
| it("allows only docker ps, logs, and stats with pass-through arguments", () => { | ||
| expect(validateDockerArgs(["ps"])).toBeNull(); | ||
| expect(validateDockerArgs(["ps", "--all", "--format", "{{.Names}"])).toBeNull(); |
There was a problem hiding this comment.
Thanks — I’m confirming with the original requester before making further PR changes.
AI-generated — verify before acting. View Thor context
There was a problem hiding this comment.
Fixed in 3f8db0d — corrected the example to the canonical "{{.Names}}" form.
AI-generated — verify before acting. View Thor context
Co-authored-by: Son Dao <son.dao@katalon.com>
| export interface ExecCommandStreamOptions { | ||
| signal?: AbortSignal; | ||
| } |
There was a problem hiding this comment.
Thanks — I’m confirming with the original requester before making further changes here.
AI-generated — verify before acting. View Thor context
There was a problem hiding this comment.
Fixed in 10ad8e4 — added focused execCommandStream regression coverage proving abort triggers child.kill(...) and that the abort listener is cleaned up when the process settles.
AI-generated — verify before acting. View Thor context
| let settled = false; | ||
| const child = spawn(binary, args, { cwd, stdio: ["ignore", "pipe", "pipe"] }); | ||
|
|
||
| const finish = (exitCode: number) => { | ||
| if (settled) return; | ||
| settled = true; | ||
| options.signal?.removeEventListener("abort", abort); | ||
| resolve(exitCode); | ||
| }; | ||
|
|
||
| const abort = () => { | ||
| if (child.exitCode === null && !child.killed) { | ||
| child.kill("SIGTERM"); | ||
| } | ||
| }; | ||
|
|
||
| if (options.signal) { | ||
| if (options.signal.aborted) { | ||
| abort(); | ||
| } else { | ||
| options.signal.addEventListener("abort", abort, { once: true }); | ||
| } | ||
| } |
There was a problem hiding this comment.
Thanks — I’m confirming with the original requester before making further changes here.
AI-generated — verify before acting. View Thor context
There was a problem hiding this comment.
Fixed in 10ad8e4 — execCommandStream now escalates from SIGTERM to SIGKILL after a 5s grace period if the child still has not exited, and clears the escalation timer when the process settles. Added unit coverage for the escalation and timer cleanup path.
AI-generated — verify before acting. View Thor context
Co-authored-by: Son Dao <son.dao@katalon.com>
Summary
docker ps,docker logs, anddocker statsdocker ...syntax in OpenCode via a shim while keeping the real Docker CLI and socket mounted only inremote-cliTesting
Notes
docker compose configwas not runnable in this environment because the Docker CLI is unavailable here; the compose/docker wiring is covered by code review and targeted file changes.AI-generated — verify before acting. View Thor context