docs: add instructions for running coding agents offline by d-morrison · Pull Request #281 · UCD-SERG/lab-manual

d-morrison · 2026-05-24T22:55:04Z

Adds a new "Running Coding Agents Offline" section to the AI tools chapter, covering Ollama setup, Positron/VS Code integration, HPC cluster usage with SSH port forwarding, and data-privacy considerations.

Closes #280

Generated with Claude Code

Add a new section "Running Coding Agents Offline" to the AI tools chapter. Covers Ollama setup, connecting Positron and VS Code (via Continue) to a local model, using local models on HPC/SLURM clusters with port forwarding, and data-privacy considerations. Co-authored-by: Douglas Ezra Morrison <d-morrison@users.noreply.github.com>

Copilot

Pull request overview

Adds documentation to the AI tools chapter for using coding agents without internet access,
focusing on running local models (Ollama) and integrating them into common editor workflows.

Changes:

Adds a new include file describing offline/local-agent workflows (Ollama setup, editor integration, HPC usage, privacy).
Inserts a new “Running Coding Agents Offline” section into ai-tools.qmd that includes the new content.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.

File	Description
ai-tools/running-agents-offline.qmd	New offline/local-agent guidance (Ollama, Positron/VS Code integration, HPC workflow, privacy notes).
ai-tools.qmd	Adds a new section entry and includes the new offline guidance file.

d-morrison · 2026-05-24T23:00:18Z

@claude address feedback

claude · 2026-05-24T23:00:31Z

Claude finished @d-morrison's task in 1m 32s —— View job

Addressing Copilot Review Feedback

Read ai-tools/running-agents-offline.qmd
Fix "most common way" → "a common way" (line 23)
Add security caution for piping remote script to sh (line 32)
Fix SSH port-forwarding example to target compute node (line 107)
Use @sec-slurm cross-reference instead of direct file link (line 118)
Commit and push changes

All 4 copilot review comments have been addressed in commit fc679bf:

"most common" claim — Changed to "a common way" to avoid an unsupported comparative claim.
Install script security — Added a .callout-caution block before the curl-pipe-sh one-liner, advising users to review the script first and pointing to brew install ollama and the Ollama releases page as safer alternatives.
SSH port-forwarding target — Fixed the example from localhost:11434 (login node) to gpu-node-01:11434, so the tunnel forwards through the login node to the actual compute node running ollama serve, with a comment to replace the placeholder hostname.
SLURM cross-reference — Replaced [SLURM chapter](slurm.qmd) with @sec-slurm to match the convention used elsewhere in the manual (e.g., data-masking.qmd).
|

- Remove comparative claim ("most common") about Ollama - Add security caution for piping remote install script to sh - Fix SSH port-forwarding example to target compute node, not login node - Replace direct file link with @sec-slurm cross-reference Co-authored-by: Douglas Ezra Morrison <d-morrison@users.noreply.github.com>

github-actions · 2026-05-24T23:02:34Z

PR Preview Action v1.8.1-2-g6ad689f
🚀 View preview at https://UCD-SERG.github.io/lab-manual/pr-preview/pr-281/
Built to branch `gh-pages` at 2026-06-18 03:27 UTC. Preview will be ready when the GitHub Pages deployment is complete.

d-morrison · 2026-05-25T00:19:57Z

@claude review

claude · 2026-05-25T00:20:12Z

Claude finished @d-morrison's task in 1m 26s —— View job

PR Review

Overall this is a well-structured, useful addition. The previous Copilot feedback has been addressed. I found a few issues worth fixing before merge.

`ai-tools/running-agents-offline.qmd`

Bug — incorrect VRAM claim for qwen2.5-coder:32b (line 50)

# More capable; requires ~8 GB VRAM
ollama pull qwen2.5-coder:32b

A 32B-parameter model in 4-bit quantization requires roughly 20 GB VRAM, not 8 GB. The ~8 GB VRAM comment belongs to the 7b model above. This should either:

Move the VRAM comment to the 7b line, or
State the correct requirement for 32b (e.g., ~20 GB VRAM)

Fix this →

Bug — SSH tunnel will silently fail without OLLAMA_HOST (lines 112–119)

The HPC section has the tunnel:

ssh -L 11434:gpu-node-01:11434 user@cluster.example.edu

This forwards localhost:11434 → login-node → gpu-node-01:11434. However, by default Ollama binds only to 127.0.0.1 on the compute node, so gpu-node-01:11434 as seen from the login node is unreachable — the connection will be refused. Users need to set OLLAMA_HOST=0.0.0.0:11434 when starting the server on the compute node:

OLLAMA_HOST=0.0.0.0:11434 ollama serve

This should be noted in step 2 or alongside the ssh command in step 3.

Fix this →

Minor — step 2 in HPC section lacks an actual command (line 109–111)

2. **Start Ollama on a compute node** (or an interactive session)
   using the pre-downloaded model files—no internet required.

This step describes an action but shows no command. A code block here (even just ollama serve) would make the workflow concrete and scannable. This also links naturally to the OLLAMA_HOST fix above.

Minor — llama3.3 pulls a 70B model without a VRAM warning (line 54)

# Alternatively, a general-purpose model
ollama pull llama3.3

llama3.3 defaults to the 70B variant, which requires ~40 GB VRAM — unusable on most single GPUs. This may be surprising to readers following the section's theme of "smaller/faster" models. Either note the VRAM requirement, suggest a smaller variant (e.g., llama3.2:3b), or remove this example.

`ai-tools.qmd`

The integration change (lines 75–77) is clean. The {#sec-ai-offline} anchor is good practice and the placement immediately after the "Firewall and Network Configuration" section is logical.

Summary

Severity	Issue	Location
Bug	`qwen2.5-coder:32b` VRAM comment says ~8 GB (should be ~20 GB)	line 50
Bug	SSH tunnel fails unless `OLLAMA_HOST=0.0.0.0` is set on compute node	lines 109–119
Minor	Step 2 of HPC workflow missing actual shell command	lines 109–111
Minor	`llama3.3` default is 70B; no VRAM warning given	line 54

The two bugs should be addressed before merge; the minor items are suggestions.

d-morrison · 2026-05-25T00:25:51Z

@claude address review comments

claude · 2026-05-25T00:26:04Z

Claude finished @d-morrison's task in 1m 11s —— View job

Addressing Review Comments

Read ai-tools/running-agents-offline.qmd
Fix qwen2.5-coder:32b VRAM comment (8 GB → ~20 GB); move ~8 GB note to 7b line
Add OLLAMA_HOST=0.0.0.0:11434 ollama serve command to HPC step 2
HPC step 2 now has actual shell command + note explaining why 0.0.0.0 is required
Add VRAM warning for llama3.3 (70B variant; ~40 GB VRAM)
Commit and push (41fe8cb)

All four issues from the review have been addressed:

Issue	Fix
`qwen2.5-coder:32b` claimed ~8 GB VRAM (should be ~20 GB)	Moved ~8 GB note to 7b line; updated 32b to ~20 GB
SSH tunnel silently fails without `OLLAMA_HOST=0.0.0.0`	Added `OLLAMA_HOST=0.0.0.0:11434 ollama serve` code block to step 2 with explanation
HPC step 2 missing shell command	Added `OLLAMA_HOST=0.0.0.0:11434 ollama serve` code block
`llama3.3` pulls 70B with no VRAM warning	Added "(70B variant; requires ~40 GB VRAM—unusable on most single GPUs)" to comment

|

- Move ~8 GB VRAM note to qwen2.5-coder:7b line (was incorrectly on 32b) - Fix qwen2.5-coder:32b VRAM requirement to ~20 GB - Add VRAM warning for llama3.3 (70B variant, ~40 GB) - Add OLLAMA_HOST=0.0.0.0:11434 command to HPC step 2 so SSH tunnel can reach compute node Co-authored-by: Douglas Ezra Morrison <d-morrison@users.noreply.github.com>

d-morrison · 2026-05-25T01:02:38Z

@claude review

claude · 2026-05-25T01:02:50Z

Claude finished @d-morrison's task in 1m 52s —— View job

PR Review

All four issues from the previous review and all four Copilot comments have been addressed. The document is in good shape. I found two practical gaps that are likely to trip up readers, plus two minor notes.

`ai-tools/running-agents-offline.qmd`

Important — SSH tunnel session must stay open; no guidance given (lines 119–124)

The SSH command in step 3:

ssh -L 11434:gpu-node-01:11434 user@cluster.example.edu

opens an interactive login shell. The tunnel only lives as long as that shell does — if the user closes the terminal or the session times out, the tunnel drops and the editor silently loses connection. Readers who are new to SSH tunneling will hit this and not know why. Either add a note ("keep this terminal open while using your editor") or show the non-blocking form:

ssh -N -f -L 11434:gpu-node-01:11434 user@cluster.example.edu

(-N = no remote command, -f = background before executing)

Fix this →

Important — HPC home-directory quota not mentioned (lines 100–106)

The rsync step copies models to ~/.ollama/ on the cluster:

rsync -a ~/.ollama/ user@cluster.example.edu:~/.ollama/

On most HPC systems, home directories have tight quotas (commonly 10–50 GB), while qwen2.5-coder:7b is ~4 GB and qwen2.5-coder:32b is ~20 GB. Users who follow this literally may fill their home quota and break other things on the cluster.

Consider adding a note that OLLAMA_MODELS can redirect model storage to a scratch or project filesystem:

# Store models outside your home directory (recommended on HPC)
export OLLAMA_MODELS=/scratch/$USER/ollama-models

and adjust the rsync target accordingly.

Fix this →

Minor — Positron API key field may not accept blank (line 82)

Step 4 says "leave the API key blank". Some OpenAI-compatible clients require a non-empty value in that field even if the server ignores it. Worth adding a fallback: "leave the API key blank, or enter any placeholder value (e.g., ollama)".

Minor — curl | less may not page correctly in all terminals (line 30)

The caution block suggests:

curl -fsSL https://ollama.com/install.sh | less

Piping directly to less works in most terminals but can behave oddly in some HPC environments or terminal emulators. A slightly more robust alternative is curl -fsSL https://ollama.com/install.sh -o install.sh && less install.sh, which saves the file first. Not a blocker, but worth noting.

`ai-tools.qmd`

The include and section header are clean. {#sec-ai-offline} follows the existing anchoring convention.

Summary

Severity	Issue	Location
Important	SSH tunnel session lifetime not explained; may confuse beginners	lines 119–124
Important	No warning about HPC home-directory quota for large model files	lines 100–106
Minor	Positron API key field may need a placeholder value	line 82
Minor	`curl \| less` can behave oddly in some terminals	line 30

The two important items are practical gaps that will affect readers attempting to follow the guide on real HPC systems. The minor items are polish.

d-morrison · 2026-06-03T22:03:12Z

@claude review

claude · 2026-06-03T22:04:22Z

Claude finished @d-morrison's task in 1m 53s —— View job

PR Review

The two bugs from the previous review (VRAM comment, OLLAMA_HOST) have been resolved. However, the two "Important" items from the most recent review are still present in the current file, along with the two "Minor" items. I also found one new gap worth noting.

`ai-tools/running-agents-offline.qmd`

Important — SSH tunnel session lifetime still not explained (lines 117–126)

The current text:

# Replace gpu-node-01 with your actual compute node hostname
ssh -L 11434:gpu-node-01:11434 user@cluster.example.edu

There is still no guidance that this terminal must remain open while the editor is in use. A new user will close the terminal and silently lose the Ollama connection. Please add either a note ("keep this terminal open") or show the background form:

ssh -N -f -L 11434:gpu-node-01:11434 user@cluster.example.edu

Fix this →

Important — HPC home-directory quota still not warned about (lines 99–107)

The current rsync step:

rsync -a ~/.ollama/ user@cluster.example.edu:~/.ollama/

copies model files into ~/.ollama/ on the cluster. qwen2.5-coder:7b is ~4 GB and qwen2.5-coder:32b is ~20 GB. Most HPC home directories have tight quotas (10–50 GB) and filling them breaks other jobs. A callout or inline note should warn about this and show how to redirect model storage:

# Store models outside your home directory (recommended on HPC)
export OLLAMA_MODELS=/scratch/$USER/ollama-models

The rsync target would need to match the chosen path.

Fix this →

New — Ollama binary installation on the cluster not addressed (lines 93–135)

The HPC section shows how to rsync model files to the cluster but never addresses how to install the Ollama binary on the cluster. On most HPC systems there is no root access, so the standard curl | sh installer may fail or install to the wrong location. Users who follow this guide step-by-step will have model files but no ollama serve to run. The section should either:

Briefly note that Ollama must be installed on the cluster (e.g., download a static binary from the Ollama releases page), or
Instruct users to check with their HPC admins or use module load ollama if available.

Fix this →

Minor — Positron API key field still only says "leave blank" (line 74)

Set the base URL to `http://localhost:11434/v1` and leave the API key blank.

Some OpenAI-compatible clients reject an empty API key field. Suggest adding a fallback: "leave the API key blank, or enter any placeholder value such as ollama."

Minor — curl | less terminal compatibility (line 31)

The caution block advises:

curl -fsSL https://ollama.com/install.sh | less

Piping directly to less can behave oddly in some HPC terminal emulators. A slightly more robust alternative:

curl -fsSL https://ollama.com/install.sh -o install.sh && less install.sh

Not a blocker, but worth noting for the HPC audience this section targets.

`ai-tools.qmd`

The {#sec-ai-offline} anchor and include placement remain clean and correct.

Summary

Severity	Issue	Status
Important	SSH tunnel session lifetime unexplained	Not yet addressed
Important	No HPC home-directory quota warning	Not yet addressed
New	Ollama binary installation on cluster not covered	New finding
Minor	Positron API key "leave blank" needs a fallback suggestion	Not yet addressed
Minor	`curl \| less` terminal compatibility	Not yet addressed

The two important items and the new gap are the highest priority before merge.

d-morrison · 2026-06-03T22:15:44Z

Claude Code CLI (local session) is working on this — paws off until I'm done.

- Add a callout that the Ollama binary must be installed on the cluster first, with no-root options (module load / static binary / admins). - Warn about HPC home-directory quotas and show OLLAMA_MODELS redirect to scratch, with a matching rsync target. - Explain the SSH tunnel must stay open and add a background -N -f form. - Positron API key: offer a placeholder fallback for clients that reject an empty field. - Replace 'curl | less' with save-then-page for terminal robustness. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

d-morrison · 2026-06-03T22:19:06Z

Addressed all five items from the latest review:

Important — SSH tunnel session lifetime: added a note that the tunnel terminal must stay open while the editor is in use, plus a background ssh -N -f -L ... alternative (with how to kill it).
Important — HPC home-directory quota: added a callout warning that models are large (~4 GB / ~20 GB) and home quotas are tight, showing how to redirect storage with OLLAMA_MODELS=/scratch/$USER/ollama-models and a matching rsync target.
New — Ollama binary install on the cluster: added a callout before step 1 that the binary must be installed on the host running ollama serve, with no-root options (module load ollama, HPC admins, or a static binary from the releases page).
Minor — Positron API key 'leave blank': added a placeholder fallback (ollama) for clients that reject an empty field.
Minor — curl | less compatibility: replaced with curl ... -o install.sh && less install.sh.

The four inline items (comparative 'common way', curl | sh security caution, SSH forwarding through the login node to the compute node, and the @sec-slurm cross-reference) were already resolved in the current file.

@claude review

claude · 2026-06-03T22:21:41Z

Claude finished @d-morrison's task in 1m 14s —— View job

PR Review

All five items from the previous review have been addressed. The document is in good shape.

Verification of Previous Findings

Severity	Issue	Status
Important	SSH tunnel session lifetime unexplained	Fixed — lines 161–171 add a "must stay open" note and the background `ssh -N -f` alternative with kill instructions
Important	No HPC home-directory quota warning	Fixed — lines 126–142 add a `.callout-warning` with `OLLAMA_MODELS` redirect example and corrected rsync target
New	Ollama binary install on cluster not covered	Fixed — lines 101–112 add a `.callout-note` before step 1 covering `module load`, HPC admin, and static binary options
Minor	Positron API key "leave blank" needs fallback	Fixed — line 77 now says "or, if the client rejects an empty field, enter any placeholder value such as `ollama`"
Minor	`curl \| less` terminal compatibility	Fixed — line 33 now uses `curl ... -o install.sh && less install.sh`

New Observations

Minor — OLLAMA_HOST=0.0.0.0 may expose the API to other cluster users (line 149)

OLLAMA_HOST=0.0.0.0:11434 ollama serve

On a shared compute node, binding to all interfaces means any other user logged in to the same node can send requests to the Ollama API. This is unlikely to be a critical issue in practice (HPC jobs typically get exclusive node allocation), but a one-sentence note would be appropriate: "If you are on a shared node, be aware that 0.0.0.0 exposes the port to other users on that host."

Fix this →

Nit — callout wording in the curl | less caution is slightly circular (lines 31–33)

The sentence currently reads:

Save it and open it in a pager (piping curl straight into less can behave oddly in some terminal emulators): curl -fsSL https://ollama.com/install.sh -o install.sh && less install.sh.

The parenthetical explains why not to pipe to less, but then the very command shown pipes to less via the save-first workaround — which is fine, but the parenthetical reads as a caveat about an approach the callout itself no longer uses. Consider dropping the parenthetical or rephasing to "instead of piping directly into less (which can behave oddly), save it first and then page it".

`ai-tools.qmd`

The section anchor {#sec-ai-offline} and include placement remain clean.

Summary

The two bugs and new gap from earlier reviews are resolved. The only remaining item of any substance is the OLLAMA_HOST=0.0.0.0 shared-node security note (minor). The document is ready to merge; the two observations above are polish, not blockers.

Copilot

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.

+# macOS / Linux (one-liner from ollama.com)
+curl -fsSL https://ollama.com/install.sh | sh


+Local models require significant hardware (typically a GPU with ≥8 GB VRAM),
+are generally less capable than frontier cloud models,
+and may produce lower-quality results on complex tasks.


+   Set `OLLAMA_HOST=0.0.0.0` so the SSH tunnel from the login node can reach the port:
+


+   (`-N` runs no remote command, `-f` backgrounds ssh after authenticating;
+   stop it later with `kill` or `pkill -f 11434:gpu-node-01`.)


d-morrison · 2026-06-03T22:30:06Z

Claude Code CLI (local session) is working on this — paws off until I'm done.

- Note that OLLAMA_HOST=0.0.0.0 exposes the port to other users on a shared compute node (rarely an issue with exclusive node allocation). - Rephrase the install-script callout to lead with the save-first workaround instead of a caveat about the piping approach it no longer uses. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

d-morrison · 2026-06-03T22:31:38Z

Addressed both new observations.

Shared-node security (line ~149): Added a sentence after the OLLAMA_HOST=0.0.0.0 block noting that binding to 0.0.0.0 exposes the port to other users on a shared compute node, with the caveat that exclusive node allocation makes this rare in practice.
curl | less callout nit (lines ~31–33): Rephrased to lead with the save-first workaround ("Instead of piping directly into less (which can behave oddly)…, save it first and then page it") so it no longer reads as a caveat about an approach the callout doesn't use.

@claude review

claude · 2026-06-03T22:31:50Z

Claude finished @d-morrison's task in 1m 26s —— View job

PR Review

Both observations from the previous review are fixed. Three items from the latest Copilot review (2026-06-03) remain open.

Verification of Previous Observations

Severity	Issue	Status
Minor	`OLLAMA_HOST=0.0.0.0` shared-node security note	Fixed — lines 152–154 add the note with an "exclusive allocation" caveat
Nit	`curl \| less` callout phrasing was circular	Fixed — callout now leads with "Instead of piping directly into `less`…, save it first and then page it"

Remaining Copilot Review Items (2026-06-03)

Bug — curl | sh code block contradicts the preceding caution (line 41)

The callout at lines 29–37 explicitly tells readers to not pipe directly and to review the script first or use a safer alternative. The code block immediately below then shows:

# macOS / Linux (one-liner from ollama.com)
curl -fsSL https://ollama.com/install.sh | sh

This is internally inconsistent: the caution warns against the pattern, then the code block demonstrates it. Copilot flagged this in the latest review and it is still present. The code block should either be removed (the caution already gives the save-first command and mentions brew install ollama) or replaced with the safe form:

# After reviewing the script (see caution above):
curl -fsSL https://ollama.com/install.sh -o install.sh && sh install.sh

Fix this →

Minor — GPU requirement is overstated; CPU-only is possible (line 13)

Local models require significant hardware (typically a GPU with ≥8 GB VRAM)

Copilot noted that Ollama can run entirely on CPU (just more slowly) and that hardware requirements vary by model size and quantization. Stating that a GPU is "typically required" is too strong. Suggested rephrasing:

"Local models work best with a GPU (≥8 GB VRAM for smaller models); CPU-only inference is possible but significantly slower."

Fix this →

Minor — pkill -f 11434:gpu-node-01 pattern is too broad (line 175)

stop it later with `kill` or `pkill -f 11434:gpu-node-01`

Copilot flagged that -f 11434:gpu-node-01 matches against the full process command line and could accidentally kill unrelated SSH processes that happen to contain that string. A safer approach is to capture the PID when launching:

ssh -N -f -L 11434:gpu-node-01:11434 user@cluster.example.edu
# To stop: record the PID above, then:
kill <PID>
# Or match only SSH background tunnels more specifically:
pkill -f "ssh.*-N.*11434:gpu-node-01"

Fix this →

`ai-tools.qmd`

The section anchor and include placement remain clean.

Summary

Severity	Issue	Status
Bug	`curl \| sh` code block contradicts the preceding caution	Not yet addressed
Minor	GPU requirement overstated; CPU-only inference is possible	Not yet addressed
Minor	`pkill -f 11434:gpu-node-01` pattern too broad	Not yet addressed

The curl | sh inconsistency is the highest-priority item — the current text simultaneously warns against an unsafe pattern and then demonstrates it. The two minor items are polish. The document is otherwise in good shape and close to merge-ready.

d-morrison · 2026-06-03T22:55:02Z

Claude Code CLI (local session) is working on this — paws off until I'm done.

- Replace the 'curl ... | sh' one-liner with a download-then-run two-step form so the install command no longer contradicts the save-first caution above it. - Soften the GPU 'requirement' wording: GPU works best, but CPU-only inference is possible (slower) and needs vary with model size/quantization. - Replace the over-broad 'pkill -f 11434:gpu-node-01' with a full-command pattern plus a pgrep/PID alternative so unrelated processes aren't killed. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

d-morrison · 2026-06-03T22:57:31Z

Addressed the 3 remaining items from the latest review (commit fb93c8a):

curl ... | sh contradicted the caution (line 41) — replaced the one-liner with the download-then-run two-step form (curl ... -o install.sh && sh install.sh), consistent with the save-first caution above it.
GPU requirement overstated (line 13) — rephrased: local models work best with a GPU (≥8 GB VRAM for smaller models), but CPU-only inference is possible (slower), and needs vary with model size/quantization.
pkill -f 11434:gpu-node-01 too broad (line 175) — replaced with a full-command pattern (pkill -f "ssh.*-N.*11434:gpu-node-01") plus a safer pgrep/PID alternative.

The earlier Copilot items on the SSH forward target (already routes through the login node to gpu-node-01) and the @sec-slurm cross-reference (already used at line 185) were resolved in prior rounds.

@claude review

claude · 2026-06-03T22:58:16Z

Claude finished @d-morrison's task in 2m 1s —— View job

PR Review

All three items from the previous review are correctly addressed. The document is in good shape and close to merge-ready. Two minor observations below.

Verification of Previous Findings

Severity	Issue	Status
Bug	`curl \| sh` code block contradicted the preceding caution	Fixed — line 43 now uses `curl ... -o install.sh && sh install.sh`; callout rephrased to lead with save-first
Minor	GPU requirement overstated; CPU-only inference is possible	Fixed — lines 13–15 now say "work best with a GPU…; CPU-only is possible but slower; needs vary with size/quantization"
Minor	`pkill -f 11434:gpu-node-01` pattern too broad	Fixed — lines 178–181 now use `pkill -f "ssh.-N.11434:gpu-node-01"` with the `pgrep`/PID alternative

New Observations

Nit — download-and-run command still chains without a review pause (line 43)

# macOS / Linux: download first, review it (see caution above), then run
curl -fsSL https://ollama.com/install.sh -o install.sh && sh install.sh

The && chains download directly to execution — a reader who just pastes and runs this block still skips the review the callout recommends. Splitting into two separate lines would match the save-then-review-then-run intent:

curl -fsSL https://ollama.com/install.sh -o install.sh
less install.sh  # review before running
sh install.sh

Not a blocker, but the current form makes it easy to bypass the step the callout is there to enforce.

Fix this →

Minor — OLLAMA_MODELS path not reflected in step 2's ollama serve command (lines 133–151)

The callout in step 1 (lines 128–144) tells readers to set OLLAMA_MODELS=/scratch/$USER/ollama-models and says "Set the same OLLAMA_MODELS value before running ollama serve." Step 2 then shows:

OLLAMA_HOST=0.0.0.0:11434 ollama serve

with no mention of OLLAMA_MODELS. A reader who followed the callout's advice must figure out on their own how to combine both env vars. Adding a note or combined example to step 2 would close this gap:

# If you redirected model storage (see quota warning above):
OLLAMA_HOST=0.0.0.0:11434 OLLAMA_MODELS=/scratch/$USER/ollama-models ollama serve

Fix this →

Summary

Severity	Issue	Status
Nit	Install code block chains download+run without a review pause	New
Minor	Step 2 `ollama serve` missing `OLLAMA_MODELS` for readers who followed the quota callout	New

The document is otherwise clean and all previous findings are resolved. Both items above are polish — neither is a blocker.

d-morrison · 2026-06-03T22:58:48Z

Done with my local session — unclaiming. (Re-triggered @claude review on the latest commit.)

claude · 2026-06-03T23:00:15Z

Claude finished @d-morrison's task in 2m 3s —— View job

PR Review

All three items from the previous Claude review have been addressed in commit fb93c8a. Two items from the June 3 Copilot review (23:00 UTC) are still open.

Verification of Previous Findings

Severity	Issue	Status
Bug	`curl \| sh` code block contradicted the preceding caution	Fixed — line 43 now uses `curl ... -o install.sh && sh install.sh`
Minor	GPU requirement overstated	Fixed — lines 13–15 now say "work best with a GPU... CPU-only inference is possible but significantly slower"
Minor	`pkill -f 11434:gpu-node-01` pattern too broad	Fixed — lines 177–181 now use `pkill -f "ssh.-N.11434:gpu-node-01"` and offer the `pgrep`/PID alternative

Remaining Open Items (from June 3 Copilot review, 23:00 UTC)

Minor — Callout warns about piping into less, but the actual risk is piping into sh (line 33–35)

The caution currently reads:

Before running any remote install script, review it first.
Instead of piping directly into less (which can behave oddly in some terminal emulators), save it first and then page it: ...

Copilot's comment: the text warns about piping to less (a viewer — completely safe) rather than piping to sh/bash (the actual security risk). The phrasing "Instead of piping directly into less" reads as if less is the dangerous step, not sh. The callout structure should lead with the sh-pipe risk, then mention the less-pipe compatibility note as a separate sub-point. For example:

Before running any remote install script, pipe it to sh is a security risk — review it first. Save it and page it to inspect it (less piped directly can behave oddly in some terminals): curl -fsSL https://ollama.com/install.sh -o install.sh && less install.sh.

Fix this →

Minor — rsync copies full ~/.ollama/ but comment says only model files (lines 124–125)

# Ollama stores models in ~/.ollama/models by default
rsync -a ~/.ollama/ user@cluster.example.edu:~/.ollama/

Copilot's comment: the rsync copies the entire ~/.ollama/ directory (which includes config, manifests, blobs, and other state), while the comment implies only model files are being transferred. This is inconsistent and may copy more than intended. The rsync source should be narrowed to ~/.ollama/models/ with a matching destination, or the comment should be updated to reflect that the full directory is intentionally copied.

Note: the quota-warning callout below (line 139) correctly uses ~/.ollama/models/ as the source, so the fix is to make the primary step 1 rsync consistent with it.

Fix this →

Summary

Severity	Issue	Status
Minor	Callout warns about `less` instead of `sh` as the security risk	Not yet addressed
Minor	rsync copies full `~/.ollama/` but comment says only model files	Not yet addressed

Both items are minor polish rather than functional blockers. The document is otherwise in solid shape; the HPC workflow, tunnel lifetime guidance, quota warning, and binary-install note are all well-written.

Copilot

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

+Before running any remote install script, review it first.
+Instead of piping directly into `less` (which can behave oddly in some
+terminal emulators), save it first and then page it:
+`curl -fsSL https://ollama.com/install.sh -o install.sh && less install.sh`.


+   # On a machine with internet access
+   ollama pull qwen2.5-coder:7b
+   # Ollama stores models in ~/.ollama/models by default
+   rsync -a ~/.ollama/ user@cluster.example.edu:~/.ollama/
+   ```


d-morrison · 2026-06-17T17:20:29Z

Claude Code CLI (local session) is working on this — paws off until I'm done.

…0524-1932

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Copilot

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 5 comments.

+# macOS / Linux: download first, review it (see caution above), then run
+curl -fsSL https://ollama.com/install.sh -o install.sh && sh install.sh


+   # Ollama stores models in ~/.ollama/models by default
+   rsync -a ~/.ollama/ user@cluster.example.edu:~/.ollama/
+   ```


+   # Store models outside your home directory (recommended on HPC)
+   export OLLAMA_MODELS=/scratch/$USER/ollama-models
+   rsync -a ~/.ollama/models/ user@cluster.example.edu:/scratch/$USER/ollama-models/


+   If you are on a shared compute node, be aware that binding to `0.0.0.0`
+   exposes the Ollama port to other users on that host; most HPC clusters
+   grant exclusive node allocation, so this is rarely a concern in practice.


+# Smaller, faster; requires ~8 GB VRAM; works on most machines with a modern GPU or Apple Silicon
+ollama pull qwen2.5-coder:7b
+
+# More capable; requires ~20 GB VRAM
+ollama pull qwen2.5-coder:32b
+
+# Alternatively, a general-purpose model (70B variant; requires ~40 GB VRAM---unusable on most single GPUs)
+ollama pull llama3.3


Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

d-morrison · 2026-06-18T02:59:08Z

@claude review

claude · 2026-06-18T02:59:24Z

Claude finished @d-morrison's task in 2m 59s —— View job

PR Review

The latest commit (08040ae) fixed a broken link in the Ollama OpenAI-compatibility URL — good catch. The five issues flagged in the June 17 Copilot review remain open in the current file, along with two items carried over from the earlier Claude reviews.

`ai-tools/running-agents-offline.qmd`

Minor — install code block still chains download and run without a review pause (line 43)

# macOS / Linux: download first, review it (see caution above), then run
curl -fsSL https://ollama.com/install.sh -o install.sh && sh install.sh

The && runs sh install.sh immediately after the download — a reader who copy-pastes this skips the review the callout recommends. The comment says "review it (see caution above)" but the chained command makes that easy to miss. Splitting into separate lines matches the intent:

curl -fsSL https://ollama.com/install.sh -o install.sh
less install.sh   # review before running
sh install.sh

Fix this →

Minor — callout warns about piping into less rather than sh as the security risk (lines 33–35)

Instead of piping directly into `less` (which can behave oddly in some
terminal emulators), save it first and then page it:

less is a viewer — completely safe to pipe into. The actual security risk is piping into sh/bash. The current phrasing misleads readers into thinking less is the dangerous step. The callout should lead with the sh-pipe risk and mention the less-compatibility note as a secondary ergonomic aside. For example:

Before running any remote install script, review it first — piping directly into sh or bash executes it without inspection. Save it first and then page it (note: piping curl directly into less can behave oddly in some terminals): curl -fsSL https://ollama.com/install.sh -o install.sh && less install.sh.

Fix this →

Minor — rsync copies full ~/.ollama/ but comment says only model files (line 125)

# Ollama stores models in ~/.ollama/models by default
rsync -a ~/.ollama/ user@cluster.example.edu:~/.ollama/

The comment specifically names ~/.ollama/models, but the rsync source is ~/.ollama/ (the entire directory, including config, manifest blobs, and other state). The quota-warning callout immediately below already uses the correct narrowed path (~/.ollama/models/ → /scratch/$USER/ollama-models/). The primary step 1 command should match:

rsync -a ~/.ollama/models/ user@cluster.example.edu:~/.ollama/models/

Fix this →

Minor — export OLLAMA_MODELS appears in the same block as the local rsync command (lines 137–140)

# Store models outside your home directory (recommended on HPC)
export OLLAMA_MODELS=/scratch/$USER/ollama-models
rsync -a ~/.ollama/models/ user@cluster.example.edu:/scratch/$USER/ollama-models/

This code block mixes a cluster-side environment variable (export OLLAMA_MODELS=..., which sets where ollama serve looks for models on the cluster) with a local rsync command (which runs on the user's workstation). A reader could easily export OLLAMA_MODELS locally and then wonder why it has no effect. The two actions should be separated — either into distinct blocks with labels ("On your local machine:" / "On the cluster:") or by moving the export note to step 2 where ollama serve is launched.

Fix this →

Minor — OLLAMA_MODELS not shown in step 2's ollama serve command (lines 150–151)

OLLAMA_HOST=0.0.0.0:11434 ollama serve

The quota-warning callout in step 1 says "Set the same OLLAMA_MODELS value before running ollama serve", but step 2's code block only shows OLLAMA_HOST. A reader who followed the quota redirect must piece together how to combine both env vars. A combined example (even as a commented alternative) closes this gap:

# If you redirected model storage (see quota warning above):
OLLAMA_HOST=0.0.0.0:11434 OLLAMA_MODELS=/scratch/$USER/ollama-models ollama serve

Fix this →

Minor — 0.0.0.0 risk downplayed; scheduler exclusivity is not universal (lines 154–156)

If you are on a shared compute node, be aware that binding to `0.0.0.0`
exposes the Ollama port to other users on that host; most HPC clusters
grant exclusive node allocation, so this is rarely a concern in practice.

"Most HPC clusters grant exclusive node allocation" is not universally true — shared-node jobs (e.g., array jobs, interactive sessions with small resource requests) are common on many systems. Presenting this as "rarely a concern in practice" may give users false confidence. A safer framing is to recommend checking with the cluster documentation or requesting exclusive allocation when in doubt:

If you are on a shared compute node, be aware that binding to 0.0.0.0 exposes the Ollama port to other users on that host. Check your cluster's scheduler documentation to confirm whether your job gets exclusive node access, or request it explicitly (e.g., --exclusive in SLURM).

Fix this →

Nit — VRAM figures are specific but unsourced (lines 51–58)

# Smaller, faster; requires ~8 GB VRAM; works on most machines with a modern GPU or Apple Silicon
# More capable; requires ~20 GB VRAM
# 70B variant; requires ~40 GB VRAM---unusable on most single GPUs

These numbers (especially "unusable on most single GPUs") will become outdated as models are re-quantized and new hardware ships. Linking to the Ollama model page or noting that these are approximate figures for default quantization would give readers a stable source to check current requirements.

`ai-tools.qmd`

The section anchor {#sec-ai-offline} and include placement remain clean.

Summary

Severity	Issue	Status
Minor	Install snippet chains download+run without a review pause	Not yet addressed
Minor	Callout warns about `less` instead of `sh` as the security risk	Not yet addressed
Minor	rsync copies full `~/.ollama/` but comment says model files only	Not yet addressed
Minor	`export OLLAMA_MODELS` mixed with local rsync in same code block	Not yet addressed
Minor	Step 2 `ollama serve` missing `OLLAMA_MODELS` for quota-redirect users	Not yet addressed
Minor	`0.0.0.0` note implies exclusive node allocation is the norm	Not yet addressed
Nit	VRAM figures unsourced; will become outdated	Not yet addressed

None of these are functional blockers on their own — the document gives correct, actionable guidance. The most confusing items for readers are the mixed local/cluster code block (#4 and #5 together) and the less vs sh misdirection in the callout (#2).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Copilot

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

+```bash
+# macOS / Linux: download first, review it (see caution above), then run
+curl -fsSL https://ollama.com/install.sh -o install.sh && sh install.sh
+```


- Callout now leads with the real risk (piping curl into sh/bash), with the less-vs-saved-file note as a secondary aside. - Split the mixed local/cluster code block into a local rsync block and a cluster-side OLLAMA_MODELS export block, each labeled. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Copilot AI review requested due to automatic review settings May 24, 2026 22:55

Copilot started reviewing on behalf of d-morrison May 24, 2026 22:55 View session

Copilot AI reviewed May 24, 2026

View reviewed changes

Comment thread ai-tools/running-agents-offline.qmd Outdated

Comment thread ai-tools/running-agents-offline.qmd

Comment thread ai-tools/running-agents-offline.qmd Outdated

Comment thread ai-tools/running-agents-offline.qmd Outdated

Copilot AI review requested due to automatic review settings May 25, 2026 00:27

d-morrison removed the request for review from Copilot May 25, 2026 00:27

Copilot AI review requested due to automatic review settings June 3, 2026 22:18

Copilot started reviewing on behalf of d-morrison June 3, 2026 22:19 View session

Copilot AI reviewed Jun 3, 2026

View reviewed changes

Copilot AI review requested due to automatic review settings June 3, 2026 22:57

Copilot started reviewing on behalf of d-morrison June 3, 2026 22:57 View session

Copilot AI reviewed Jun 3, 2026

View reviewed changes

d-morrison and others added 2 commits June 17, 2026 10:20

Merge remote-tracking branch 'origin/main' into claude/issue-280-2026…

d35dadd

…0524-1932

#281: normalize em dashes to ASCII in running-agents-offline.qmd

035ddfc

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Copilot AI review requested due to automatic review settings June 17, 2026 17:51

Copilot started reviewing on behalf of d-morrison June 17, 2026 17:52 View session

Copilot AI reviewed Jun 17, 2026

View reviewed changes

#281: fix broken ollama OpenAI-compat link (404 -> stable blog URL)

08040ae

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

#281: rsync only the models/ dir to match the comment and quota note

2c854a4

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Copilot AI review requested due to automatic review settings June 18, 2026 03:06

Copilot started reviewing on behalf of d-morrison June 18, 2026 03:06 View session

Copilot AI reviewed Jun 18, 2026

View reviewed changes

Comment thread ai-tools/running-agents-offline.qmd

Comment on lines +41 to +44

```bash

# macOS / Linux: download first, review it (see caution above), then run

curl -fsSL https://ollama.com/install.sh -o install.sh && sh install.sh

```

		# macOS / Linux (one-liner from ollama.com)
		curl -fsSL https://ollama.com/install.sh \| sh

		Set `OLLAMA_HOST=0.0.0.0` so the SSH tunnel from the login node can reach the port:

		(`-N` runs no remote command, `-f` backgrounds ssh after authenticating;
		stop it later with `kill` or `pkill -f 11434:gpu-node-01`.)

		# macOS / Linux: download first, review it (see caution above), then run
		curl -fsSL https://ollama.com/install.sh -o install.sh && sh install.sh

Conversation

d-morrison commented May 24, 2026

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

d-morrison commented May 24, 2026

Uh oh!

claude Bot commented May 24, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Addressing Copilot Review Feedback

Uh oh!

github-actions Bot commented May 24, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Built to branch gh-pages at 2026-06-18 03:27 UTC. Preview will be ready when the GitHub Pages deployment is complete.

Uh oh!

d-morrison commented May 25, 2026

Uh oh!

claude Bot commented May 25, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Review

ai-tools/running-agents-offline.qmd

ai-tools.qmd

Summary

Uh oh!

d-morrison commented May 25, 2026

Uh oh!

claude Bot commented May 25, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Addressing Review Comments

Uh oh!

d-morrison commented May 25, 2026

Uh oh!

claude Bot commented May 25, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Review

ai-tools/running-agents-offline.qmd

ai-tools.qmd

Summary

Uh oh!

d-morrison commented Jun 3, 2026

Uh oh!

claude Bot commented Jun 3, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Review

ai-tools/running-agents-offline.qmd

ai-tools.qmd

Summary

Uh oh!

d-morrison commented Jun 3, 2026

Uh oh!

d-morrison commented Jun 3, 2026

Uh oh!

claude Bot commented Jun 3, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Review

Verification of Previous Findings

New Observations

ai-tools.qmd

Summary

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

d-morrison commented Jun 3, 2026

Uh oh!

d-morrison commented Jun 3, 2026

Uh oh!

claude Bot commented Jun 3, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Review

Verification of Previous Observations

Remaining Copilot Review Items (2026-06-03)

claude Bot commented May 24, 2026 •

edited

Loading

github-actions Bot commented May 24, 2026 •

edited

Loading

Built to branch `gh-pages` at 2026-06-18 03:27 UTC.
Preview will be ready when the GitHub Pages deployment is complete.

claude Bot commented May 25, 2026 •

edited

Loading

`ai-tools/running-agents-offline.qmd`

`ai-tools.qmd`

claude Bot commented May 25, 2026 •

edited

Loading

claude Bot commented May 25, 2026 •

edited

Loading

`ai-tools/running-agents-offline.qmd`

`ai-tools.qmd`

claude Bot commented Jun 3, 2026 •

edited

Loading

`ai-tools/running-agents-offline.qmd`

`ai-tools.qmd`

claude Bot commented Jun 3, 2026 •

edited

Loading

`ai-tools.qmd`

claude Bot commented Jun 3, 2026 •

edited

Loading

`ai-tools.qmd`

claude Bot commented Jun 3, 2026 •

edited

Loading

claude Bot commented Jun 3, 2026 •

edited

Loading

claude Bot commented Jun 18, 2026 •

edited

Loading

`ai-tools/running-agents-offline.qmd`

`ai-tools.qmd`