docs: add instructions for running coding agents offline #281
d-morrison wants to merge 11 commits into
Add a new section "Running Coding Agents Offline" to the AI tools chapter. Covers Ollama setup, connecting Positron and VS Code (via Continue) to a local model, using local models on HPC/SLURM clusters with port forwarding, and data-privacy considerations.
Co-authored-by: Douglas Ezra Morrison <d-morrison@users.noreply.github.com>
Pull request overview
Adds documentation to the AI tools chapter for using coding agents without internet access,
focusing on running local models (Ollama) and integrating them into common editor workflows.
Changes:
- Adds a new include file describing offline/local-agent workflows (Ollama setup, editor integration, HPC usage, privacy).
- Inserts a new “Running Coding Agents Offline” section into `ai-tools.qmd` that includes the new content.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| ai-tools/running-agents-offline.qmd | New offline/local-agent guidance (Ollama, Positron/VS Code integration, HPC workflow, privacy notes). |
| ai-tools.qmd | Adds a new section entry and includes the new offline guidance file. |
@claude address feedback
Claude finished @d-morrison's task in 1m 32s
Addressing Copilot Review Feedback
All 4 Copilot review comments have been addressed in commit
- Remove comparative claim ("most common") about Ollama
- Add security caution for piping remote install script to sh
- Fix SSH port-forwarding example to target compute node, not login node
- Replace direct file link with @sec-slurm cross-reference
Co-authored-by: Douglas Ezra Morrison <d-morrison@users.noreply.github.com>
@claude review
Claude finished @d-morrison's task in 1m 26s
PR Review
Overall this is a well-structured, useful addition. The previous Copilot feedback has been addressed. I found a few issues worth fixing before merge.
| Severity | Issue | Location |
|---|---|---|
| Bug | `qwen2.5-coder:32b` VRAM comment says ~8 GB (should be ~20 GB) | line 50 |
| Bug | SSH tunnel fails unless `OLLAMA_HOST=0.0.0.0` is set on compute node | lines 109–119 |
| Minor | Step 2 of HPC workflow missing actual shell command | lines 109–111 |
| Minor | `llama3.3` default is 70B; no VRAM warning given | line 54 |
The two bugs should be addressed before merge; the minor items are suggestions.
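The SSH tunnel bug comes down to two commands that must agree: the server on the compute node has to listen on a non-loopback address, and the local forward has to target that node and port. The following is a minimal dry-run sketch of that pairing, not the PR's actual text. `gpu-node-01` and `user@cluster.example.edu` are the placeholder names that appear later in this thread, and the helper function names are hypothetical. The script only prints the commands so they can be inspected before anything is run.

```bash
#!/usr/bin/env bash
# Dry-run sketch: print (do not execute) the two commands that must agree on host and port.
# PORT, NODE, and LOGIN are illustrative placeholders.
PORT=11434
NODE=gpu-node-01
LOGIN=user@cluster.example.edu

# Compute node: listen on all interfaces so connections arriving from the login node are accepted.
serve_cmd() {
  printf 'OLLAMA_HOST=0.0.0.0:%s ollama serve\n' "$PORT"
}

# Local machine: forward localhost:PORT through the login node to NODE:PORT.
tunnel_cmd() {
  printf 'ssh -N -L %s:%s:%s %s\n' "$PORT" "$NODE" "$PORT" "$LOGIN"
}

serve_cmd
tunnel_cmd
```

If the server were left on its default loopback binding, the forwarded connection from the login node would presumably be refused, which is the failure the review describes.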
@claude address review comments
Claude finished @d-morrison's task in 1m 11s
Addressing Review Comments
All four issues from the review have been addressed:
- Move ~8 GB VRAM note to qwen2.5-coder:7b line (was incorrectly on 32b)
- Fix qwen2.5-coder:32b VRAM requirement to ~20 GB
- Add VRAM warning for llama3.3 (70B variant, ~40 GB)
- Add OLLAMA_HOST=0.0.0.0:11434 command to HPC step 2 so SSH tunnel can reach compute node
Co-authored-by: Douglas Ezra Morrison <d-morrison@users.noreply.github.com>
@claude review
Claude finished @d-morrison's task in 1m 52s
PR Review
All four issues from the previous review and all four Copilot comments have been addressed. The document is in good shape. I found two practical gaps that are likely to trip up readers, plus two minor notes.
| Severity | Issue | Location |
|---|---|---|
| Important | SSH tunnel session lifetime not explained; may confuse beginners | lines 119–124 |
| Important | No warning about HPC home-directory quota for large model files | lines 100–106 |
| Minor | Positron API key field may need a placeholder value | line 82 |
| Minor | `curl \| less` can behave oddly in some terminals | line 30 |
The two important items are practical gaps that will affect readers attempting to follow the guide on real HPC systems. The minor items are polish.
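One possible way to handle the quota gap is to choose the model directory before pulling anything. The sketch below assumes a `/scratch/$USER` layout, which varies by cluster, and uses a hypothetical helper named `pick_models_dir`. `OLLAMA_MODELS` and the `~/.ollama/models` default are the names used in this PR.

```bash
#!/usr/bin/env bash
# Sketch: prefer a writable scratch directory over $HOME for model storage.
# The /scratch/<user> convention is an assumption; check your cluster's documentation.
pick_models_dir() {
  local scratch_root="${1:-/scratch}"
  local user="${USER:-$(id -un)}"
  if [ -d "$scratch_root/$user" ] && [ -w "$scratch_root/$user" ]; then
    printf '%s\n' "$scratch_root/$user/ollama-models"
  else
    # Fall back to Ollama's default location inside the home directory.
    printf '%s\n' "$HOME/.ollama/models"
  fi
}

OLLAMA_MODELS="$(pick_models_dir)"
export OLLAMA_MODELS
echo "Models will be stored in: $OLLAMA_MODELS"
```

The variable has to be set in the environment of the process that runs `ollama serve`, so a job script would need the same export.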
@claude review
Claude finished @d-morrison's task in 1m 53s
PR Review
| Severity | Issue | Status |
|---|---|---|
| Important | SSH tunnel session lifetime unexplained | Not yet addressed |
| Important | No HPC home-directory quota warning | Not yet addressed |
| New | Ollama binary installation on cluster not covered | New finding |
| Minor | Positron API key "leave blank" needs a fallback suggestion | Not yet addressed |
| Minor | `curl \| less` terminal compatibility | Not yet addressed |
The two important items and the new gap are the highest priority before merge.
Claude Code CLI (local session) is working on this; paws off until I'm done.
- Add a callout that the Ollama binary must be installed on the cluster first, with no-root options (module load / static binary / admins).
- Warn about HPC home-directory quotas and show OLLAMA_MODELS redirect to scratch, with a matching rsync target.
- Explain the SSH tunnel must stay open and add a background -N -f form.
- Positron API key: offer a placeholder fallback for clients that reject an empty field.
- Replace 'curl | less' with save-then-page for terminal robustness.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Addressed all five items from the latest review:
The four inline items (comparative 'common way',
@claude review
Claude finished @d-morrison's task in 1m 14s
PR Review
All five items from the previous review have been addressed. The document is in good shape.
Verification of Previous Findings
New Observations
Minor: OLLAMA_HOST=0.0.0.0:11434 ollama serve
On a shared compute node, binding to all interfaces means any other user logged in to the same node can send requests to the Ollama API. This is unlikely to be a critical issue in practice (HPC jobs typically get exclusive node allocation), but a one-sentence note would be appropriate: "If you are on a shared node, be aware that
Nit: callout wording in the
The sentence currently reads:
The parenthetical explains why not to pipe to
# macOS / Linux (one-liner from ollama.com)
curl -fsSL https://ollama.com/install.sh | sh

Local models require significant hardware (typically a GPU with ≥8 GB VRAM),
are generally less capable than frontier cloud models,
and may produce lower-quality results on complex tasks.

Set `OLLAMA_HOST=0.0.0.0` so the SSH tunnel from the login node can reach the port:

(`-N` runs no remote command, `-f` backgrounds ssh after authenticating;
stop it later with `kill` or `pkill -f 11434:gpu-node-01`.)
Claude Code CLI (local session) is working on this; paws off until I'm done.
- Note that OLLAMA_HOST=0.0.0.0 exposes the port to other users on a shared compute node (rarely an issue with exclusive node allocation).
- Rephrase the install-script callout to lead with the save-first workaround instead of a caveat about the piping approach it no longer uses.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Addressed both new observations.
@claude review
Claude finished @d-morrison's task in 1m 26s
PR Review
Both observations from the previous review are fixed. Three items from the latest Copilot review (2026-06-03) remain open.
Verification of Previous Observations
Remaining Copilot Review Items (2026-06-03)
Bug:
The callout at lines 29–37 explicitly tells readers not to pipe directly and to review the script first or use a safer alternative. The code block immediately below then shows:
# macOS / Linux (one-liner from ollama.com)
curl -fsSL https://ollama.com/install.sh | sh
This is internally inconsistent: the caution warns against the pattern, then the code block demonstrates it. Copilot flagged this in the latest review and it is still present. The code block should either be removed (the caution already gives the save-first command and mentions
# After reviewing the script (see caution above):
curl -fsSL https://ollama.com/install.sh -o install.sh && sh install.sh
Minor: GPU requirement is overstated; CPU-only is possible (line 13)
Copilot noted that Ollama can run entirely on CPU (just more slowly) and that hardware requirements vary by model size and quantization. Stating that a GPU is "typically required" is too strong. Suggested rephrasing:
Minor: stop it later with `kill` or `pkill -f 11434:gpu-node-01`
Copilot flagged that
ssh -N -f -L 11434:gpu-node-01:11434 user@cluster.example.edu
# To stop: record the PID above, then:
kill <PID>
# Or match only SSH background tunnels more specifically:
pkill -f "ssh.*-N.*11434:gpu-node-01"
| Severity | Issue | Status |
|---|---|---|
| Bug | `curl \| sh` code block contradicts the preceding caution | Not yet addressed |
| Minor | GPU requirement overstated; CPU-only inference is possible | Not yet addressed |
| Minor | `pkill -f 11434:gpu-node-01` pattern too broad | Not yet addressed |
The curl | sh inconsistency is the highest-priority item — the current text simultaneously warns against an unsafe pattern and then demonstrates it. The two minor items are polish. The document is otherwise in good shape and close to merge-ready.
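The breadth of the `pkill -f` pattern can be checked without killing anything. In the sketch below, `grep -E` stands in for `pgrep -f`/`pkill -f`, which also match an extended regular expression against the full command line. The three sample command lines are invented for illustration.

```bash
#!/usr/bin/env bash
# Sketch: count how many sample command lines each pattern would match.
# The samples are hypothetical; only the first is the SSH tunnel we want to stop.
samples='ssh -N -f -L 11434:gpu-node-01:11434 user@cluster.example.edu
vim notes-11434:gpu-node-01.txt
rsync -a models/ gpu-node-01:/scratch/models/'

broad='11434:gpu-node-01'
narrow='ssh.*-N.*11434:gpu-node-01'

count_matches() {
  # grep -c prints the number of matching lines; "--" keeps a leading "-" in a pattern from being parsed as an option.
  printf '%s\n' "$samples" | grep -cE -- "$1"
}

echo "broad pattern matches:  $(count_matches "$broad")"
echo "narrow pattern matches: $(count_matches "$narrow")"
```

With these samples the broad pattern matches two lines (the tunnel and the unrelated editor session) and the narrow one matches only the tunnel. On a real system, running `pgrep -fl` with the same pattern before `pkill` gives the equivalent preview.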
Claude Code CLI (local session) is working on this; paws off until I'm done.
- Replace the 'curl ... | sh' one-liner with a download-then-run two-step form so the install command no longer contradicts the save-first caution above it.
- Soften the GPU 'requirement' wording: GPU works best, but CPU-only inference is possible (slower) and needs vary with model size/quantization.
- Replace the over-broad 'pkill -f 11434:gpu-node-01' with a full-command pattern plus a pgrep/PID alternative so unrelated processes aren't killed.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Addressed the 3 remaining items from the latest review (commit fb93c8a):
The earlier Copilot items on the SSH forward target (already routes through the login node to
@claude review
Claude finished @d-morrison's task in 2m 1s
PR Review
All three items from the previous review are correctly addressed. The document is in good shape and close to merge-ready. Two minor observations below.
Verification of Previous Findings
New Observations
Nit: download-and-run command still chains without a review pause (line 43)
# macOS / Linux: download first, review it (see caution above), then run
curl -fsSL https://ollama.com/install.sh -o install.sh && sh install.sh
The
curl -fsSL https://ollama.com/install.sh -o install.sh
less install.sh # review before running
sh install.sh
Not a blocker, but the current form makes it easy to bypass the step the callout is there to enforce.
Minor:
The callout in step 1 (lines 128–144) tells readers to set
OLLAMA_HOST=0.0.0.0:11434 ollama serve
with no mention of
# If you redirected model storage (see quota warning above):
OLLAMA_HOST=0.0.0.0:11434 OLLAMA_MODELS=/scratch/$USER/ollama-models ollama serve
Summary
The document is otherwise clean and all previous findings are resolved. Both items above are polish; neither is a blocker.
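If a pause between download and execution is wanted in script form, one option is to gate the run on an explicit confirmation. This is only a sketch: `run_after_review` is a hypothetical helper, not something Ollama provides, and the download and paging steps are left as comments so that loading the snippet has no side effects.

```bash
#!/usr/bin/env bash
# Sketch: run a previously downloaded script only after the user types "yes".
run_after_review() {
  local script="$1" answer
  [ -f "$script" ] || { echo "no such file: $script" >&2; return 2; }
  printf 'Have you reviewed %s and want to run it? Type "yes": ' "$script" >&2
  read -r answer
  if [ "$answer" = "yes" ]; then
    sh "$script"
  else
    echo "Skipped $script" >&2
    return 1
  fi
}

# Intended use, one step at a time:
#   curl -fsSL https://ollama.com/install.sh -o install.sh
#   less install.sh
#   run_after_review install.sh
```

Unlike the `&&` chain, nothing runs unless the reader answers the prompt, which keeps the review step from being skipped by accident.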
Done with my local session; unclaiming. (Re-triggered @claude review on the latest commit.)
Claude finished @d-morrison's task in 2m 3s
PR Review
All three items from the previous Claude review have been addressed in commit
Verification of Previous Findings
Remaining Open Items (from June 3 Copilot review, 23:00 UTC)
Minor: Callout warns about piping into
The caution currently reads:
Copilot's comment: the text warns about piping to
Minor: rsync copies full
# Ollama stores models in ~/.ollama/models by default
rsync -a ~/.ollama/ user@cluster.example.edu:~/.ollama/
Copilot's comment: the rsync copies the entire
Note: the quota-warning callout below (line 139) correctly uses
Summary
Both items are minor polish rather than functional blockers. The document is otherwise in solid shape; the HPC workflow, tunnel lifetime guidance, quota warning, and binary-install note are all well-written.
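To see what the whole-directory rsync would carry along besides model files, one can list the top-level entries that sit next to `models/`. The helper name `extra_entries` is hypothetical, and what the directory actually contains depends on the Ollama version and how it has been used.

```bash
#!/usr/bin/env bash
# Sketch: list top-level entries of an Ollama directory other than models/,
# i.e. what "rsync -a ~/.ollama/ ..." would copy in addition to the model files.
extra_entries() {
  local dir="${1:-$HOME/.ollama}"
  [ -d "$dir" ] || return 0
  find "$dir" -mindepth 1 -maxdepth 1 ! -name models -exec basename {} \; | sort
}

extra_entries
# To copy only the models, point rsync at the subdirectory instead, for example:
#   rsync -a ~/.ollama/models/ user@cluster.example.edu:~/.ollama/models/
```

An empty listing would mean the two rsync forms transfer the same files. Anything printed is extra material the comment in the reviewed snippet does not mention.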
Before running any remote install script, review it first.
Instead of piping directly into `less` (which can behave oddly in some
terminal emulators), save it first and then page it:
`curl -fsSL https://ollama.com/install.sh -o install.sh && less install.sh`.

# On a machine with internet access
ollama pull qwen2.5-coder:7b
# Ollama stores models in ~/.ollama/models by default
rsync -a ~/.ollama/ user@cluster.example.edu:~/.ollama/
Claude Code CLI (local session) is working on this; paws off until I'm done.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
# macOS / Linux: download first, review it (see caution above), then run
curl -fsSL https://ollama.com/install.sh -o install.sh && sh install.sh

# Ollama stores models in ~/.ollama/models by default
rsync -a ~/.ollama/ user@cluster.example.edu:~/.ollama/

# Store models outside your home directory (recommended on HPC)
export OLLAMA_MODELS=/scratch/$USER/ollama-models
rsync -a ~/.ollama/models/ user@cluster.example.edu:/scratch/$USER/ollama-models/

If you are on a shared compute node, be aware that binding to `0.0.0.0`
exposes the Ollama port to other users on that host; most HPC clusters
grant exclusive node allocation, so this is rarely a concern in practice.

# Smaller, faster; requires ~8 GB VRAM; works on most machines with a modern GPU or Apple Silicon
ollama pull qwen2.5-coder:7b

# More capable; requires ~20 GB VRAM
ollama pull qwen2.5-coder:32b

# Alternatively, a general-purpose model (70B variant; requires ~40 GB VRAM---unusable on most single GPUs)
ollama pull llama3.3
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@claude review
Claude finished @d-morrison's task in 2m 59s
PR Review
The latest commit (
| Severity | Issue | Status |
|---|---|---|
| Minor | Install snippet chains download+run without a review pause | Not yet addressed |
| Minor | Callout warns about `less` instead of `sh` as the security risk | Not yet addressed |
| Minor | rsync copies full `~/.ollama/` but comment says model files only | Not yet addressed |
| Minor | `export OLLAMA_MODELS` mixed with local rsync in same code block | Not yet addressed |
| Minor | Step 2 `ollama serve` missing `OLLAMA_MODELS` for quota-redirect users | Not yet addressed |
| Minor | `0.0.0.0` note implies exclusive node allocation is the norm | Not yet addressed |
| Nit | VRAM figures unsourced; will become outdated | Not yet addressed |
| Nit | VRAM figures unsourced; will become outdated | Not yet addressed |
None of these are functional blockers on their own — the document gives correct, actionable guidance. The most confusing items for readers are the mixed local/cluster code block (#4 and #5 together) and the less vs sh misdirection in the callout (#2).
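On the unsourced VRAM figures, a back-of-the-envelope estimate shows where numbers of that size can come from. The sketch assumes roughly 4 bits per weight, a common quantization level that may not match what a given model tag actually ships. It counts only the weights, so the real memory footprint is higher.

```bash
#!/usr/bin/env bash
# Sketch: approximate size of the model weights alone, in GB.
# Usage: weights_gb <parameters in billions> <bits per weight>
# The KV cache, context length, and runtime overhead all add to this figure.
weights_gb() {
  LC_ALL=C awk -v p="$1" -v b="$2" 'BEGIN { printf "%.1f\n", p * b / 8 }'
}

for size in 7 32 70; do
  echo "${size}B at 4 bits/weight: ~$(weights_gb "$size" 4) GB of weights"
done
```

Under that assumption the weights come to about 3.5 GB, 16 GB, and 35 GB, so the ~8 GB, ~20 GB, and ~40 GB figures quoted in the snippet leave headroom for cache and overhead. Stating the assumed quantization next to each figure would make the numbers easier to update later.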
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
```bash
# macOS / Linux: download first, review it (see caution above), then run
curl -fsSL https://ollama.com/install.sh -o install.sh && sh install.sh
```
- Callout now leads with the real risk (piping curl into sh/bash), with the less-vs-saved-file note as a secondary aside.
- Split the mixed local/cluster code block into a local rsync block and a cluster-side OLLAMA_MODELS export block, each labeled.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Adds a new "Running Coding Agents Offline" section to the AI tools chapter, covering Ollama setup, Positron/VS Code integration, HPC cluster usage with SSH port forwarding, and data-privacy considerations.
Closes #280
Generated with Claude Code