docs: add instructions for running coding agents offline #281

Open: d-morrison wants to merge 11 commits into main from claude/issue-280-20260524-1932


@d-morrison

Member

Adds a new "Running Coding Agents Offline" section to the AI tools chapter, covering Ollama setup, Positron/VS Code integration, HPC cluster usage with SSH port forwarding, and data-privacy considerations.

Closes #280

Generated with Claude Code

Add a new section "Running Coding Agents Offline" to the AI tools
chapter. Covers Ollama setup, connecting Positron and VS Code (via
Continue) to a local model, using local models on HPC/SLURM clusters
with port forwarding, and data-privacy considerations.

Co-authored-by: Douglas Ezra Morrison <d-morrison@users.noreply.github.com>
Copilot AI review requested due to automatic review settings May 24, 2026 22:55

Copilot AI left a comment

Contributor

Pull request overview

Adds documentation to the AI tools chapter for using coding agents without internet access,
focusing on running local models (Ollama) and integrating them into common editor workflows.

Changes:

  • Adds a new include file describing offline/local-agent workflows (Ollama setup, editor integration, HPC usage, privacy).
  • Inserts a new “Running Coding Agents Offline” section into ai-tools.qmd that includes the new content.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| ai-tools/running-agents-offline.qmd | New offline/local-agent guidance (Ollama, Positron/VS Code integration, HPC workflow, privacy notes). |
| ai-tools.qmd | Adds a new section entry and includes the new offline guidance file. |

Comment thread ai-tools/running-agents-offline.qmd Outdated
Comment thread ai-tools/running-agents-offline.qmd
Comment thread ai-tools/running-agents-offline.qmd Outdated
Comment thread ai-tools/running-agents-offline.qmd Outdated
@d-morrison

Member Author

@claude address feedback

@claude

claude Bot commented May 24, 2026

Contributor

Claude finished @d-morrison's task in 1m 32s.


Addressing Copilot Review Feedback

  • Read ai-tools/running-agents-offline.qmd
  • Fix "most common way" → "a common way" (line 23)
  • Add security caution for piping remote script to sh (line 32)
  • Fix SSH port-forwarding example to target compute node (line 107)
  • Use @sec-slurm cross-reference instead of direct file link (line 118)
  • Commit and push changes

All 4 Copilot review comments have been addressed in commit fc679bf:

  1. "most common" claim — Changed to "a common way" to avoid an unsupported comparative claim.
  2. Install script security — Added a .callout-caution block before the curl-pipe-sh one-liner, advising users to review the script first and pointing to brew install ollama and the Ollama releases page as safer alternatives.
  3. SSH port-forwarding target — Fixed the example from localhost:11434 (login node) to gpu-node-01:11434, so the tunnel forwards through the login node to the actual compute node running ollama serve, with a comment to replace the placeholder hostname.
  4. SLURM cross-reference — Replaced [SLURM chapter](slurm.qmd) with @sec-slurm to match the convention used elsewhere in the manual (e.g., data-masking.qmd).

- Remove comparative claim ("most common") about Ollama
- Add security caution for piping remote install script to sh
- Fix SSH port-forwarding example to target compute node, not login node
- Replace direct file link with @sec-slurm cross-reference

Co-authored-by: Douglas Ezra Morrison <d-morrison@users.noreply.github.com>
@github-actions

github-actions Bot commented May 24, 2026

Contributor
PR Preview Action v1.8.1-2-g6ad689f

🚀 View preview at
https://UCD-SERG.github.io/lab-manual/pr-preview/pr-281/

Built to branch gh-pages at 2026-06-18 03:27 UTC.
Preview will be ready when the GitHub Pages deployment is complete.

@d-morrison

Member Author

@claude review

@claude

claude Bot commented May 25, 2026

Contributor

Claude finished @d-morrison's task in 1m 26s.


PR Review

  • Gather context
  • Read changed files
  • Review ai-tools/running-agents-offline.qmd
  • Review ai-tools.qmd
  • Post findings

Overall this is a well-structured, useful addition. The previous Copilot feedback has been addressed. I found a few issues worth fixing before merge.


ai-tools/running-agents-offline.qmd

Bug — incorrect VRAM claim for qwen2.5-coder:32b (line 50)

# More capable; requires ~8 GB VRAM
ollama pull qwen2.5-coder:32b

A 32B-parameter model in 4-bit quantization requires roughly 20 GB VRAM, not 8 GB. The ~8 GB VRAM comment belongs to the 7b model above. This should either:

  • Move the VRAM comment to the 7b line, or
  • State the correct requirement for 32b (e.g., ~20 GB VRAM)
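
As a rough cross-check on these figures, the weight footprint of a quantized model can be estimated as parameters times bits per weight, divided by 8. The sketch below is a back-of-the-envelope calculation, not an Ollama feature: the function name `approx_weight_gb` is made up for illustration, and it counts weights only, so actual VRAM use will be higher once the KV cache and runtime overhead are included.

```shell
#!/bin/sh
# Rough estimate of model weight size in GB (weights only).
# Assumption: size ~= parameters (in billions) * bits per weight / 8.
# KV cache and runtime overhead are NOT included, so real VRAM needs are higher.
approx_weight_gb() {
  # $1 = parameters in billions, $2 = bits per weight (e.g. 4 for 4-bit quantization)
  LC_ALL=C awk -v params="$1" -v bits="$2" 'BEGIN { printf "%.1f\n", params * bits / 8 }'
}

approx_weight_gb 7 4    # 7B model at 4-bit
approx_weight_gb 32 4   # 32B model at 4-bit
```

For a 32B model at 4-bit this gives about 16 GB for the weights alone, which is in line with the ~20 GB figure above once overhead is added, and well beyond an 8 GB card.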


Bug — SSH tunnel will silently fail without OLLAMA_HOST (lines 112–119)

The HPC section has the tunnel:

ssh -L 11434:gpu-node-01:11434 user@cluster.example.edu

This forwards localhost:11434 → login-node → gpu-node-01:11434. However, by default Ollama binds only to 127.0.0.1 on the compute node, so gpu-node-01:11434 as seen from the login node is unreachable and the connection will be refused. Users need to set OLLAMA_HOST=0.0.0.0:11434 when starting the server on the compute node:

OLLAMA_HOST=0.0.0.0:11434 ollama serve

This should be noted in step 2 or alongside the ssh command in step 3.


Minor — step 2 in HPC section lacks an actual command (line 109–111)

2. **Start Ollama on a compute node** (or an interactive session)
   using the pre-downloaded model files—no internet required.

This step describes an action but shows no command. A code block here (even just ollama serve) would make the workflow concrete and scannable. This also links naturally to the OLLAMA_HOST fix above.


Minor — llama3.3 pulls a 70B model without a VRAM warning (line 54)

# Alternatively, a general-purpose model
ollama pull llama3.3

llama3.3 defaults to the 70B variant, which requires ~40 GB VRAM — unusable on most single GPUs. This may be surprising to readers following the section's theme of "smaller/faster" models. Either note the VRAM requirement, suggest a smaller variant (e.g., llama3.2:3b), or remove this example.


ai-tools.qmd

The integration change (lines 75–77) is clean. The {#sec-ai-offline} anchor is good practice and the placement immediately after the "Firewall and Network Configuration" section is logical.


Summary

| Severity | Issue | Location |
| --- | --- | --- |
| Bug | qwen2.5-coder:32b VRAM comment says ~8 GB (should be ~20 GB) | line 50 |
| Bug | SSH tunnel fails unless OLLAMA_HOST=0.0.0.0 is set on compute node | lines 109–119 |
| Minor | Step 2 of HPC workflow missing actual shell command | lines 109–111 |
| Minor | llama3.3 default is 70B; no VRAM warning given | line 54 |

The two bugs should be addressed before merge; the minor items are suggestions.

@d-morrison

Member Author

@claude address review comments

@claude

claude Bot commented May 25, 2026

Contributor

Claude finished @d-morrison's task in 1m 11s.


Addressing Review Comments

  • Read ai-tools/running-agents-offline.qmd
  • Fix qwen2.5-coder:32b VRAM comment (8 GB → ~20 GB); move ~8 GB note to 7b line
  • Add OLLAMA_HOST=0.0.0.0:11434 ollama serve command to HPC step 2
  • HPC step 2 now has actual shell command + note explaining why 0.0.0.0 is required
  • Add VRAM warning for llama3.3 (70B variant; ~40 GB VRAM)
  • Commit and push (41fe8cb)

All four issues from the review have been addressed:

| Issue | Fix |
| --- | --- |
| qwen2.5-coder:32b claimed ~8 GB VRAM (should be ~20 GB) | Moved ~8 GB note to 7b line; updated 32b to ~20 GB |
| SSH tunnel silently fails without OLLAMA_HOST=0.0.0.0 | Added OLLAMA_HOST=0.0.0.0:11434 ollama serve code block to step 2 with explanation |
| HPC step 2 missing shell command | Added OLLAMA_HOST=0.0.0.0:11434 ollama serve code block |
| llama3.3 pulls 70B with no VRAM warning | Added "(70B variant; requires ~40 GB VRAM—unusable on most single GPUs)" to comment |

- Move ~8 GB VRAM note to qwen2.5-coder:7b line (was incorrectly on 32b)
- Fix qwen2.5-coder:32b VRAM requirement to ~20 GB
- Add VRAM warning for llama3.3 (70B variant, ~40 GB)
- Add OLLAMA_HOST=0.0.0.0:11434 command to HPC step 2 so SSH tunnel can reach compute node

Co-authored-by: Douglas Ezra Morrison <d-morrison@users.noreply.github.com>
Copilot AI review requested due to automatic review settings May 25, 2026 00:27
@d-morrison d-morrison removed the request for review from Copilot May 25, 2026 00:27
@d-morrison

Member Author

@claude review

@claude

claude Bot commented May 25, 2026

Contributor

Claude finished @d-morrison's task in 1m 52s.


PR Review

  • Gather context
  • Read changed files
  • Review ai-tools/running-agents-offline.qmd
  • Review ai-tools.qmd
  • Post findings

All four issues from the previous review and all four Copilot comments have been addressed. The document is in good shape. I found two practical gaps that are likely to trip up readers, plus two minor notes.


ai-tools/running-agents-offline.qmd

Important — SSH tunnel session must stay open; no guidance given (lines 119–124)

The SSH command in step 3:

ssh -L 11434:gpu-node-01:11434 user@cluster.example.edu

opens an interactive login shell. The tunnel only lives as long as that shell does — if the user closes the terminal or the session times out, the tunnel drops and the editor silently loses connection. Readers who are new to SSH tunneling will hit this and not know why. Either add a note ("keep this terminal open while using your editor") or show the non-blocking form:

ssh -N -f -L 11434:gpu-node-01:11434 user@cluster.example.edu

(-N = no remote command, -f = background before executing)
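
Once the tunnel is up, one way to confirm it works is to request Ollama's model list through the forwarded port. The sketch below assumes `curl` is installed locally and that the tunnel listens on `localhost:11434` as in the command above; `check_ollama` and `OLLAMA_BASE_URL` are names made up for this example, not Ollama settings.

```shell
#!/bin/sh
# Report whether an Ollama server answers on the (forwarded) local port.
# OLLAMA_BASE_URL is a variable name chosen for this sketch, not an Ollama setting.
check_ollama() {
  base_url="${OLLAMA_BASE_URL:-http://localhost:11434}"
  # /api/tags lists the models the server has available;
  # -f makes curl exit non-zero on HTTP errors, --max-time bounds the wait.
  if curl -fsS --max-time 5 "${base_url}/api/tags" >/dev/null 2>&1; then
    echo "reachable"
  else
    echo "unreachable"
  fi
}
```

Running `check_ollama` after starting the tunnel prints `reachable` or `unreachable`; the latter usually means the tunnel has dropped or the server is not listening on an interface the login node can reach.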


Important — HPC home-directory quota not mentioned (lines 100–106)

The rsync step copies models to ~/.ollama/ on the cluster:

rsync -a ~/.ollama/ user@cluster.example.edu:~/.ollama/

On most HPC systems, home directories have tight quotas (commonly 10–50 GB), while qwen2.5-coder:7b is ~4 GB and qwen2.5-coder:32b is ~20 GB. Users who follow this literally may fill their home quota and break other things on the cluster.

Consider adding a note that OLLAMA_MODELS can redirect model storage to a scratch or project filesystem:

# Store models outside your home directory (recommended on HPC)
export OLLAMA_MODELS=/scratch/$USER/ollama-models

and adjust the rsync target accordingly.
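
A matching rsync target might look like the sketch below. The user name, host, and scratch path are placeholders carried over from the examples above, and `rsync_models_cmd` is a hypothetical helper that only prints the command so it can be checked before anything is copied. It assumes the local models live in Ollama's default `~/.ollama/models` directory.

```shell
#!/bin/sh
# Hypothetical values: replace with your own cluster login, host, and scratch path.
CLUSTER_USER="user"
CLUSTER_HOST="cluster.example.edu"
# Spell out the remote path: a local $USER may differ from the cluster username.
REMOTE_MODELS="/scratch/${CLUSTER_USER}/ollama-models"

# Print the rsync command so it can be inspected before it is run.
rsync_models_cmd() {
  printf 'rsync -a %s/ %s@%s:%s/\n' \
    "${HOME}/.ollama/models" "$CLUSTER_USER" "$CLUSTER_HOST" "$REMOTE_MODELS"
}

rsync_models_cmd
```

On the cluster, `OLLAMA_MODELS` would then be set to the same path as `REMOTE_MODELS` before starting the server.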


Minor — Positron API key field may not accept blank (line 82)

Step 4 says "leave the API key blank". Some OpenAI-compatible clients require a non-empty value in that field even if the server ignores it. Worth adding a fallback: "leave the API key blank, or enter any placeholder value (e.g., ollama)".


Minor — curl | less may not page correctly in all terminals (line 30)

The caution block suggests:

curl -fsSL https://ollama.com/install.sh | less

Piping directly to less works in most terminals but can behave oddly in some HPC environments or terminal emulators. A slightly more robust alternative is curl -fsSL https://ollama.com/install.sh -o install.sh && less install.sh, which saves the file first. Not a blocker, but worth noting.


ai-tools.qmd

The include and section header are clean. {#sec-ai-offline} follows the existing anchoring convention.


Summary

| Severity | Issue | Location |
| --- | --- | --- |
| Important | SSH tunnel session lifetime not explained; may confuse beginners | lines 119–124 |
| Important | No warning about HPC home-directory quota for large model files | lines 100–106 |
| Minor | Positron API key field may need a placeholder value | line 82 |
| Minor | curl \| less can behave oddly in some terminals | line 30 |

The two important items are practical gaps that will affect readers attempting to follow the guide on real HPC systems. The minor items are polish.

@d-morrison

Member Author

@claude review

@claude

claude Bot commented Jun 3, 2026

Contributor

Claude finished @d-morrison's task in 1m 53s.


PR Review

  • Gather context
  • Read changed files
  • Review ai-tools/running-agents-offline.qmd
  • Review ai-tools.qmd
  • Post findings

The two bugs from the previous review (VRAM comment, OLLAMA_HOST) have been resolved. However, the two "Important" items from the most recent review are still present in the current file, along with the two "Minor" items. I also found one new gap worth noting.


ai-tools/running-agents-offline.qmd

Important — SSH tunnel session lifetime still not explained (lines 117–126)

The current text:

# Replace gpu-node-01 with your actual compute node hostname
ssh -L 11434:gpu-node-01:11434 user@cluster.example.edu

There is still no guidance that this terminal must remain open while the editor is in use. A new user will close the terminal and silently lose the Ollama connection. Please add either a note ("keep this terminal open") or show the background form:

ssh -N -f -L 11434:gpu-node-01:11434 user@cluster.example.edu


Important — HPC home-directory quota still not warned about (lines 99–107)

The current rsync step:

rsync -a ~/.ollama/ user@cluster.example.edu:~/.ollama/

copies model files into ~/.ollama/ on the cluster. qwen2.5-coder:7b is ~4 GB and qwen2.5-coder:32b is ~20 GB. Most HPC home directories have tight quotas (10–50 GB) and filling them breaks other jobs. A callout or inline note should warn about this and show how to redirect model storage:

# Store models outside your home directory (recommended on HPC)
export OLLAMA_MODELS=/scratch/$USER/ollama-models

The rsync target would need to match the chosen path.


New — Ollama binary installation on the cluster not addressed (lines 93–135)

The HPC section shows how to rsync model files to the cluster but never addresses how to install the Ollama binary on the cluster. On most HPC systems there is no root access, so the standard curl | sh installer may fail or install to the wrong location. Users who follow this guide step-by-step will have model files but no ollama serve to run. The section should either:

  • Briefly note that Ollama must be installed on the cluster (e.g., download a static binary from the Ollama releases page), or
  • Instruct users to check with their HPC admins or use module load ollama if available.
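
Before following the rest of the HPC steps, a reader could check whether an `ollama` executable is already available on the cluster. The sketch below is one minimal way to do that; `have_ollama` is a name invented for this example.

```shell
#!/bin/sh
# Report whether an `ollama` executable is on the current PATH.
have_ollama() {
  if command -v ollama >/dev/null 2>&1; then
    echo "found"
  else
    echo "missing"
  fi
}

have_ollama
```

If this prints `missing` on a cluster that uses environment modules, `module avail ollama` shows whether an admin-provided module exists. Whether one does is site-specific.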


Minor — Positron API key field still only says "leave blank" (line 74)

Set the base URL to `http://localhost:11434/v1` and leave the API key blank.

Some OpenAI-compatible clients reject an empty API key field. Suggest adding a fallback: "leave the API key blank, or enter any placeholder value such as ollama."


Minor — curl | less terminal compatibility (line 31)

The caution block advises:

curl -fsSL https://ollama.com/install.sh | less

Piping directly to less can behave oddly in some HPC terminal emulators. A slightly more robust alternative:

curl -fsSL https://ollama.com/install.sh -o install.sh && less install.sh

Not a blocker, but worth noting for the HPC audience this section targets.


ai-tools.qmd

The {#sec-ai-offline} anchor and include placement remain clean and correct.


Summary

| Severity | Issue | Status |
| --- | --- | --- |
| Important | SSH tunnel session lifetime unexplained | Not yet addressed |
| Important | No HPC home-directory quota warning | Not yet addressed |
| New | Ollama binary installation on cluster not covered | New finding |
| Minor | Positron API key "leave blank" needs a fallback suggestion | Not yet addressed |
| Minor | curl \| less terminal compatibility | Not yet addressed |

The two important items and the new gap are the highest priority before merge.

@d-morrison

Member Author

Claude Code CLI (local session) is working on this — paws off until I'm done.

- Add a callout that the Ollama binary must be installed on the cluster
  first, with no-root options (module load / static binary / admins).
- Warn about HPC home-directory quotas and show OLLAMA_MODELS redirect to
  scratch, with a matching rsync target.
- Explain the SSH tunnel must stay open and add a background -N -f form.
- Positron API key: offer a placeholder fallback for clients that reject
  an empty field.
- Replace 'curl | less' with save-then-page for terminal robustness.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings June 3, 2026 22:18
@d-morrison

Member Author

Addressed all five items from the latest review:

  • Important — SSH tunnel session lifetime: added a note that the tunnel terminal must stay open while the editor is in use, plus a background ssh -N -f -L ... alternative (with how to kill it).
  • Important — HPC home-directory quota: added a callout warning that models are large (~4 GB / ~20 GB) and home quotas are tight, showing how to redirect storage with OLLAMA_MODELS=/scratch/$USER/ollama-models and a matching rsync target.
  • New — Ollama binary install on the cluster: added a callout before step 1 that the binary must be installed on the host running ollama serve, with no-root options (module load ollama, HPC admins, or a static binary from the releases page).
  • Minor — Positron API key 'leave blank': added a placeholder fallback (ollama) for clients that reject an empty field.
  • Minor — curl | less compatibility: replaced with curl ... -o install.sh && less install.sh.

The four inline items (comparative 'common way', curl | sh security caution, SSH forwarding through the login node to the compute node, and the @sec-slurm cross-reference) were already resolved in the current file.

@claude review

@claude

claude Bot commented Jun 3, 2026

Contributor

Claude finished @d-morrison's task in 1m 14s.


PR Review

  • Gather context
  • Read changed files
  • Review ai-tools/running-agents-offline.qmd
  • Review ai-tools.qmd
  • Post findings

All five items from the previous review have been addressed. The document is in good shape.


Verification of Previous Findings

| Severity | Issue | Status |
| --- | --- | --- |
| Important | SSH tunnel session lifetime unexplained | Fixed: lines 161–171 add a "must stay open" note and the background ssh -N -f alternative with kill instructions |
| Important | No HPC home-directory quota warning | Fixed: lines 126–142 add a .callout-warning with OLLAMA_MODELS redirect example and corrected rsync target |
| New | Ollama binary install on cluster not covered | Fixed: lines 101–112 add a .callout-note before step 1 covering module load, HPC admin, and static binary options |
| Minor | Positron API key "leave blank" needs fallback | Fixed: line 77 now says "or, if the client rejects an empty field, enter any placeholder value such as ollama" |
| Minor | curl \| less terminal compatibility | Fixed: line 33 now uses curl ... -o install.sh && less install.sh |

New Observations

Minor — OLLAMA_HOST=0.0.0.0 may expose the API to other cluster users (line 149)

OLLAMA_HOST=0.0.0.0:11434 ollama serve

On a shared compute node, binding to all interfaces means any other user logged in to the same node can send requests to the Ollama API. This is unlikely to be a critical issue in practice (HPC jobs typically get exclusive node allocation), but a one-sentence note would be appropriate: "If you are on a shared node, be aware that 0.0.0.0 exposes the port to other users on that host."


Nit — callout wording in the curl | less caution is slightly circular (lines 31–33)

The sentence currently reads:

Save it and open it in a pager (piping curl straight into less can behave oddly in some terminal emulators): curl -fsSL https://ollama.com/install.sh -o install.sh && less install.sh.

The parenthetical explains why not to pipe to less, but the command shown still ends in less via the save-first workaround. That is fine, but the parenthetical reads as a caveat about an approach the callout itself no longer uses. Consider dropping the parenthetical or rephrasing to "instead of piping directly into less (which can behave oddly), save it first and then page it".


ai-tools.qmd

The section anchor {#sec-ai-offline} and include placement remain clean.


Summary

The two bugs and new gap from earlier reviews are resolved. The only remaining item of any substance is the OLLAMA_HOST=0.0.0.0 shared-node security note (minor). The document is ready to merge; the two observations above are polish, not blockers.

Copilot AI left a comment

Contributor

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.

Comment thread ai-tools/running-agents-offline.qmd Outdated
Comment on lines +40 to +41
# macOS / Linux (one-liner from ollama.com)
curl -fsSL https://ollama.com/install.sh | sh
Comment thread ai-tools/running-agents-offline.qmd Outdated
Comment on lines +13 to +15
Local models require significant hardware (typically a GPU with ≥8 GB VRAM),
are generally less capable than frontier cloud models,
and may produce lower-quality results on complex tasks.
Comment on lines +146 to +147
Set `OLLAMA_HOST=0.0.0.0` so the SSH tunnel from the login node can reach the port:

Comment thread ai-tools/running-agents-offline.qmd Outdated
Comment on lines +170 to +171
(`-N` runs no remote command, `-f` backgrounds ssh after authenticating;
stop it later with `kill` or `pkill -f 11434:gpu-node-01`.)
@d-morrison

Member Author

Claude Code CLI (local session) is working on this — paws off until I'm done.

- Note that OLLAMA_HOST=0.0.0.0 exposes the port to other users on a
  shared compute node (rarely an issue with exclusive node allocation).
- Rephrase the install-script callout to lead with the save-first
  workaround instead of a caveat about the piping approach it no longer uses.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@d-morrison

Member Author

Addressed both new observations.

  1. Shared-node security (line ~149): Added a sentence after the OLLAMA_HOST=0.0.0.0 block noting that binding to 0.0.0.0 exposes the port to other users on a shared compute node, with the caveat that exclusive node allocation makes this rare in practice.
  2. curl | less callout nit (lines ~31–33): Rephrased to lead with the save-first workaround ("Instead of piping directly into less (which can behave oddly)…, save it first and then page it") so it no longer reads as a caveat about an approach the callout doesn't use.

@claude review

@claude

claude Bot commented Jun 3, 2026

Contributor

Claude finished @d-morrison's task in 1m 26s.


PR Review

  • Gather context
  • Read changed files
  • Review ai-tools/running-agents-offline.qmd
  • Review ai-tools.qmd
  • Post findings

Both observations from the previous review are fixed. Three items from the latest Copilot review (2026-06-03) remain open.


Verification of Previous Observations

| Severity | Issue | Status |
| --- | --- | --- |
| Minor | OLLAMA_HOST=0.0.0.0 shared-node security note | Fixed: lines 152–154 add the note with an "exclusive allocation" caveat |
| Nit | curl \| less callout phrasing was circular | Fixed: callout now leads with "Instead of piping directly into less…, save it first and then page it" |

Remaining Copilot Review Items (2026-06-03)

Bug — curl | sh code block contradicts the preceding caution (line 41)

The callout at lines 29–37 explicitly tells readers to not pipe directly and to review the script first or use a safer alternative. The code block immediately below then shows:

# macOS / Linux (one-liner from ollama.com)
curl -fsSL https://ollama.com/install.sh | sh

This is internally inconsistent: the caution warns against the pattern, then the code block demonstrates it. Copilot flagged this in the latest review and it is still present. The code block should either be removed (the caution already gives the save-first command and mentions brew install ollama) or replaced with the safe form:

# After reviewing the script (see caution above):
curl -fsSL https://ollama.com/install.sh -o install.sh && sh install.sh


Minor — GPU requirement is overstated; CPU-only is possible (line 13)

Local models require significant hardware (typically a GPU with ≥8 GB VRAM)

Copilot noted that Ollama can run entirely on CPU (just more slowly) and that hardware requirements vary by model size and quantization. Stating that a GPU is "typically required" is too strong. Suggested rephrasing:

"Local models work best with a GPU (≥8 GB VRAM for smaller models); CPU-only inference is possible but significantly slower."


Minor — pkill -f 11434:gpu-node-01 pattern is too broad (line 175)

stop it later with `kill` or `pkill -f 11434:gpu-node-01`

Copilot flagged that -f 11434:gpu-node-01 matches against the full process command line and could accidentally kill unrelated SSH processes that happen to contain that string. A safer approach is to capture the PID when launching:

ssh -N -f -L 11434:gpu-node-01:11434 user@cluster.example.edu
# To stop: record the PID above, then:
kill <PID>
# Or match only SSH background tunnels more specifically:
pkill -f "ssh.*-N.*11434:gpu-node-01"
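
To see why the narrower pattern is safer, the two patterns can be tried against sample command lines. The sketch below uses `grep -E` as a stand-in for `pkill -f`, which also matches an extended regular expression against the full command line. The `matches` helper and the `vim` command line are made up for illustration.

```shell
#!/bin/sh
# Show which command lines each pattern would match, using grep -E as a
# stand-in for pkill -f (both match an extended regex against the full line).
matches() {
  # $1 = pattern, $2 = sample command line
  if printf '%s\n' "$2" | grep -Eq -- "$1"; then
    echo "match"
  else
    echo "no match"
  fi
}

tunnel='ssh -N -f -L 11434:gpu-node-01:11434 user@cluster.example.edu'
# Hypothetical unrelated process whose arguments happen to contain the same string.
other='vim notes-11434:gpu-node-01.txt'

matches '11434:gpu-node-01' "$other"            # broad pattern hits the unrelated process
matches 'ssh.*-N.*11434:gpu-node-01' "$other"   # narrower pattern does not
matches 'ssh.*-N.*11434:gpu-node-01' "$tunnel"  # narrower pattern still finds the tunnel
```

`pgrep -f` accepts the same pattern and only lists PIDs, so it can be used to preview what `pkill -f` would stop.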


ai-tools.qmd

The section anchor and include placement remain clean.


Summary

| Severity | Issue | Status |
| --- | --- | --- |
| Bug | curl \| sh code block contradicts the preceding caution | Not yet addressed |
| Minor | GPU requirement overstated; CPU-only inference is possible | Not yet addressed |
| Minor | pkill -f 11434:gpu-node-01 pattern too broad | Not yet addressed |

The curl | sh inconsistency is the highest-priority item — the current text simultaneously warns against an unsafe pattern and then demonstrates it. The two minor items are polish. The document is otherwise in good shape and close to merge-ready.

@d-morrison

Member Author

Claude Code CLI (local session) is working on this — paws off until I'm done.

- Replace the 'curl ... | sh' one-liner with a download-then-run two-step form
  so the install command no longer contradicts the save-first caution above it.
- Soften the GPU 'requirement' wording: GPU works best, but CPU-only inference
  is possible (slower) and needs vary with model size/quantization.
- Replace the over-broad 'pkill -f 11434:gpu-node-01' with a full-command
  pattern plus a pgrep/PID alternative so unrelated processes aren't killed.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings June 3, 2026 22:57
@d-morrison

Member Author

Addressed the 3 remaining items from the latest review (commit fb93c8a):

  • curl ... | sh contradicted the caution (line 41) — replaced the one-liner with the download-then-run two-step form (curl ... -o install.sh && sh install.sh), consistent with the save-first caution above it.
  • GPU requirement overstated (line 13) — rephrased: local models work best with a GPU (≥8 GB VRAM for smaller models), but CPU-only inference is possible (slower), and needs vary with model size/quantization.
  • pkill -f 11434:gpu-node-01 too broad (line 175) — replaced with a full-command pattern (pkill -f "ssh.*-N.*11434:gpu-node-01") plus a safer pgrep/PID alternative.

The earlier Copilot items on the SSH forward target (already routes through the login node to gpu-node-01) and the @sec-slurm cross-reference (already used at line 185) were resolved in prior rounds.

@claude review

@claude

claude Bot commented Jun 3, 2026

Contributor

Claude finished @d-morrison's task in 2m 1s.


PR Review

  • Gather context
  • Read changed files
  • Verify the 3 addressed items from previous review
  • Check for new issues
  • Post findings

All three items from the previous review are correctly addressed. The document is in good shape and close to merge-ready. Two minor observations below.


Verification of Previous Findings

| Severity | Issue | Status |
| --- | --- | --- |
| Bug | curl \| sh code block contradicted the preceding caution | Fixed: line 43 now uses curl ... -o install.sh && sh install.sh; callout rephrased to lead with save-first |
| Minor | GPU requirement overstated; CPU-only inference is possible | Fixed: lines 13–15 now say "work best with a GPU…; CPU-only is possible but slower; needs vary with size/quantization" |
| Minor | pkill -f 11434:gpu-node-01 pattern too broad | Fixed: lines 178–181 now use pkill -f "ssh.*-N.*11434:gpu-node-01" with the pgrep/PID alternative |

New Observations

Nit — download-and-run command still chains without a review pause (line 43)

# macOS / Linux: download first, review it (see caution above), then run
curl -fsSL https://ollama.com/install.sh -o install.sh && sh install.sh

The && chains download directly to execution — a reader who just pastes and runs this block still skips the review the callout recommends. Splitting into two separate lines would match the save-then-review-then-run intent:

curl -fsSL https://ollama.com/install.sh -o install.sh
less install.sh  # review before running
sh install.sh

Not a blocker, but the current form makes it easy to bypass the step the callout is there to enforce.


Minor — OLLAMA_MODELS path not reflected in step 2's ollama serve command (lines 133–151)

The callout in step 1 (lines 128–144) tells readers to set OLLAMA_MODELS=/scratch/$USER/ollama-models and says "Set the same OLLAMA_MODELS value before running ollama serve." Step 2 then shows:

OLLAMA_HOST=0.0.0.0:11434 ollama serve

with no mention of OLLAMA_MODELS. A reader who followed the callout's advice must figure out on their own how to combine both env vars. Adding a note or combined example to step 2 would close this gap:

# If you redirected model storage (see quota warning above):
OLLAMA_HOST=0.0.0.0:11434 OLLAMA_MODELS=/scratch/$USER/ollama-models ollama serve


Summary

| Severity | Issue | Status |
| --- | --- | --- |
| Nit | Install code block chains download+run without a review pause | New |
| Minor | Step 2 ollama serve missing OLLAMA_MODELS for readers who followed the quota callout | New |

The document is otherwise clean and all previous findings are resolved. Both items above are polish — neither is a blocker.

@d-morrison

Member Author

Done with my local session — unclaiming. (Re-triggered @claude review on the latest commit.)

@claude

claude Bot commented Jun 3, 2026

Contributor

Claude finished @d-morrison's task in 2m 3s.


PR Review

  • Gather context
  • Read changed files
  • Verify previous findings are addressed
  • Check for new issues
  • Post findings

All three items from the previous Claude review have been addressed in commit fb93c8a. Two items from the June 3 Copilot review (23:00 UTC) are still open.


Verification of Previous Findings

| Severity | Issue | Status |
| --- | --- | --- |
| Bug | curl \| sh code block contradicted the preceding caution | Fixed: line 43 now uses curl ... -o install.sh && sh install.sh |
| Minor | GPU requirement overstated | Fixed: lines 13–15 now say "work best with a GPU... CPU-only inference is possible but significantly slower" |
| Minor | pkill -f 11434:gpu-node-01 pattern too broad | Fixed: lines 177–181 now use pkill -f "ssh.*-N.*11434:gpu-node-01" and offer the pgrep/PID alternative |

Remaining Open Items (from June 3 Copilot review, 23:00 UTC)

Minor — Callout warns about piping into less, but the actual risk is piping into sh (line 33–35)

The caution currently reads:

Before running any remote install script, review it first.
Instead of piping directly into less (which can behave oddly in some terminal emulators), save it first and then page it: ...

Copilot's comment: the text warns about piping to less (a viewer — completely safe) rather than piping to sh/bash (the actual security risk). The phrasing "Instead of piping directly into less" reads as if less is the dangerous step, not sh. The callout structure should lead with the sh-pipe risk, then mention the less-pipe compatibility note as a separate sub-point. For example:

Piping a remote install script straight into sh is a security risk, so review it before running it. Save it and page through it to inspect it (piping curl directly into less can behave oddly in some terminals): curl -fsSL https://ollama.com/install.sh -o install.sh && less install.sh.



Minor — rsync copies full ~/.ollama/ but comment says only model files (lines 124–125)

# Ollama stores models in ~/.ollama/models by default
rsync -a ~/.ollama/ user@cluster.example.edu:~/.ollama/

Copilot's comment: the rsync copies the entire ~/.ollama/ directory (which includes config, manifests, blobs, and other state), while the comment implies only model files are being transferred. This is inconsistent and may copy more than intended. The rsync source should be narrowed to ~/.ollama/models/ with a matching destination, or the comment should be updated to reflect that the full directory is intentionally copied.

Note: the quota-warning callout below (line 139) correctly uses ~/.ollama/models/ as the source, so the fix is to make the primary step 1 rsync consistent with it.



Summary

| Severity | Issue | Status |
| --- | --- | --- |
| Minor | Callout warns about `less` instead of `sh` as the security risk | Not yet addressed |
| Minor | rsync copies full `~/.ollama/` but comment says only model files | Not yet addressed |

Both items are minor polish rather than functional blockers. The document is otherwise in solid shape; the HPC workflow, tunnel lifetime guidance, quota warning, and binary-install note are all well-written.

Copilot AI left a comment

Contributor


Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

Comment thread ai-tools/running-agents-offline.qmd Outdated
Comment on lines +32 to +35
Before running any remote install script, review it first.
Instead of piping directly into `less` (which can behave oddly in some
terminal emulators), save it first and then page it:
`curl -fsSL https://ollama.com/install.sh -o install.sh && less install.sh`.
Comment on lines +122 to +126
# On a machine with internet access
ollama pull qwen2.5-coder:7b
# Ollama stores models in ~/.ollama/models by default
rsync -a ~/.ollama/ user@cluster.example.edu:~/.ollama/
```
@d-morrison

Member Author

Claude Code CLI (local session) is working on this — paws off until I'm done.

Copilot AI review requested due to automatic review settings June 17, 2026 17:51

Copilot AI left a comment

Contributor


Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 5 comments.

Comment on lines +42 to +43
# macOS / Linux: download first, review it (see caution above), then run
curl -fsSL https://ollama.com/install.sh -o install.sh && sh install.sh
Comment on lines +124 to +126
# Ollama stores models in ~/.ollama/models by default
rsync -a ~/.ollama/ user@cluster.example.edu:~/.ollama/
```
Comment thread ai-tools/running-agents-offline.qmd Outdated
Comment on lines +137 to +139
# Store models outside your home directory (recommended on HPC)
export OLLAMA_MODELS=/scratch/$USER/ollama-models
rsync -a ~/.ollama/models/ user@cluster.example.edu:/scratch/$USER/ollama-models/
Comment on lines +154 to +156
If you are on a shared compute node, be aware that binding to `0.0.0.0`
exposes the Ollama port to other users on that host; most HPC clusters
grant exclusive node allocation, so this is rarely a concern in practice.
Comment on lines +51 to +58
# Smaller, faster; requires ~8 GB VRAM; works on most machines with a modern GPU or Apple Silicon
ollama pull qwen2.5-coder:7b

# More capable; requires ~20 GB VRAM
ollama pull qwen2.5-coder:32b

# Alternatively, a general-purpose model (70B variant; requires ~40 GB VRAM---unusable on most single GPUs)
ollama pull llama3.3
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@d-morrison

Member Author

@claude review

@claude

claude Bot commented Jun 18, 2026

Contributor

Claude finished @d-morrison's task in 2m 59s —— View job


PR Review

  • Gather context
  • Read changed files
  • Review ai-tools/running-agents-offline.qmd
  • Review ai-tools.qmd
  • Post findings

The latest commit (08040ae) fixed a broken link in the Ollama OpenAI-compatibility URL — good catch. The five issues flagged in the June 17 Copilot review remain open in the current file, along with two items carried over from the earlier Claude reviews.


ai-tools/running-agents-offline.qmd

Minor — install code block still chains download and run without a review pause (line 43)

# macOS / Linux: download first, review it (see caution above), then run
curl -fsSL https://ollama.com/install.sh -o install.sh && sh install.sh

The && runs sh install.sh immediately after the download — a reader who copy-pastes this skips the review the callout recommends. The comment says "review it (see caution above)" but the chained command makes that easy to miss. Splitting into separate lines matches the intent:

curl -fsSL https://ollama.com/install.sh -o install.sh
less install.sh   # review before running
sh install.sh



Minor — callout warns about piping into less rather than sh as the security risk (lines 33–35)

Instead of piping directly into `less` (which can behave oddly in some
terminal emulators), save it first and then page it:

less is a viewer — completely safe to pipe into. The actual security risk is piping into sh/bash. The current phrasing misleads readers into thinking less is the dangerous step. The callout should lead with the sh-pipe risk and mention the less-compatibility note as a secondary ergonomic aside. For example:

Before running any remote install script, review it first — piping directly into sh or bash executes it without inspection. Save it first and then page it (note: piping curl directly into less can behave oddly in some terminals): curl -fsSL https://ollama.com/install.sh -o install.sh && less install.sh.



Minor — rsync copies full ~/.ollama/ but comment says only model files (line 125)

# Ollama stores models in ~/.ollama/models by default
rsync -a ~/.ollama/ user@cluster.example.edu:~/.ollama/

The comment specifically names ~/.ollama/models, but the rsync source is ~/.ollama/ (the entire directory, including config, manifest blobs, and other state). The quota-warning callout immediately below already uses the correct narrowed paths (~/.ollama/models/ → /scratch/$USER/ollama-models/). The primary step 1 command should match:

rsync -a ~/.ollama/models/ user@cluster.example.edu:~/.ollama/models/



Minor — export OLLAMA_MODELS appears in the same block as the local rsync command (lines 137–140)

# Store models outside your home directory (recommended on HPC)
export OLLAMA_MODELS=/scratch/$USER/ollama-models
rsync -a ~/.ollama/models/ user@cluster.example.edu:/scratch/$USER/ollama-models/

This code block mixes a cluster-side environment variable (export OLLAMA_MODELS=..., which sets where ollama serve looks for models on the cluster) with a local rsync command (which runs on the user's workstation). A reader could easily export OLLAMA_MODELS locally and then wonder why it has no effect. The two actions should be separated — either into distinct blocks with labels ("On your local machine:" / "On the cluster:") or by moving the export note to step 2 where ollama serve is launched.
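
The confusion described above comes down to environment scope: an export in your local shell is not carried into a new SSH session by default, so setting OLLAMA_MODELS on the workstation does not change where ollama serve looks on the cluster. A minimal sketch of that behavior follows, using env -i (which starts a child process with an empty environment) as a local stand-in for a fresh login on the cluster. The /scratch/demo path is hypothetical.

```shell
#!/bin/sh
# Local shell: the export affects only this shell and its child processes.
export OLLAMA_MODELS=/scratch/demo/ollama-models

# Stand-in for a fresh login shell on the cluster: env -i starts the child
# with an empty environment, much as a new SSH session does not inherit
# your local exports by default.
remote_view=$(env -i /bin/sh -c 'echo "${OLLAMA_MODELS:-unset}"')

echo "local:  $OLLAMA_MODELS"
echo "remote: $remote_view"
```

The second line reports the variable as unset, which is why the export belongs next to the ollama serve command on the cluster rather than beside the local rsync.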



Minor — OLLAMA_MODELS not shown in step 2's ollama serve command (lines 150–151)

OLLAMA_HOST=0.0.0.0:11434 ollama serve

The quota-warning callout in step 1 says "Set the same OLLAMA_MODELS value before running ollama serve", but step 2's code block only shows OLLAMA_HOST. A reader who followed the quota redirect must piece together how to combine both env vars. A combined example (even as a commented alternative) closes this gap:

# If you redirected model storage (see quota warning above):
OLLAMA_HOST=0.0.0.0:11434 OLLAMA_MODELS=/scratch/$USER/ollama-models ollama serve



Minor — 0.0.0.0 risk downplayed; scheduler exclusivity is not universal (lines 154–156)

If you are on a shared compute node, be aware that binding to `0.0.0.0`
exposes the Ollama port to other users on that host; most HPC clusters
grant exclusive node allocation, so this is rarely a concern in practice.

"Most HPC clusters grant exclusive node allocation" is not universally true — shared-node jobs (e.g., array jobs, interactive sessions with small resource requests) are common on many systems. Presenting this as "rarely a concern in practice" may give users false confidence. A safer framing is to recommend checking with the cluster documentation or requesting exclusive allocation when in doubt:

If you are on a shared compute node, be aware that binding to 0.0.0.0 exposes the Ollama port to other users on that host. Check your cluster's scheduler documentation to confirm whether your job gets exclusive node access, or request it explicitly (e.g., --exclusive in SLURM).



Nit — VRAM figures are specific but unsourced (lines 51–58)

# Smaller, faster; requires ~8 GB VRAM; works on most machines with a modern GPU or Apple Silicon
# More capable; requires ~20 GB VRAM
# 70B variant; requires ~40 GB VRAM---unusable on most single GPUs

These numbers (especially "unusable on most single GPUs") will become outdated as models are re-quantized and new hardware ships. Linking to the Ollama model page or noting that these are approximate figures for default quantization would give readers a stable source to check current requirements.
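
As a rough cross-check on figures like these, the memory needed for the weights alone is approximately parameter count × bits per weight ÷ 8. The sketch below assumes 4-bit quantization, which is an assumption and not something the document under review states, and it ignores the KV cache and runtime overhead, so real requirements will be higher. The function name is hypothetical.

```shell
#!/bin/sh
# Estimate weight memory in GB: billions of parameters * bits per weight / 8.
# Ignores KV cache, activations, and runtime overhead.
estimate_weight_gb() {
    awk -v params="$1" -v bits="$2" 'BEGIN { printf "%.1f\n", params * bits / 8 }'
}

# Parameter counts (in billions) for the three models named in the comments.
for size in 7 32 70; do
    echo "${size}B at 4-bit: ~$(estimate_weight_gb "$size" 4) GB for weights"
done
```

Under that assumption the estimates come out at about 3.5, 16, and 35 GB, all below the quoted ~8, ~20, and ~40 GB. That is at least compatible with the quoted figures including headroom, but the Ollama model page remains the better source for current requirements.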


ai-tools.qmd

The section anchor {#sec-ai-offline} and include placement remain clean.


Summary

| Severity | Issue | Status |
| --- | --- | --- |
| Minor | Install snippet chains download+run without a review pause | Not yet addressed |
| Minor | Callout warns about `less` instead of `sh` as the security risk | Not yet addressed |
| Minor | rsync copies full `~/.ollama/` but comment says model files only | Not yet addressed |
| Minor | `export OLLAMA_MODELS` mixed with local rsync in same code block | Not yet addressed |
| Minor | Step 2 `ollama serve` missing `OLLAMA_MODELS` for quota-redirect users | Not yet addressed |
| Minor | `0.0.0.0` note implies exclusive node allocation is the norm | Not yet addressed |
| Nit | VRAM figures unsourced; will become outdated | Not yet addressed |

None of these are functional blockers on their own — the document gives correct, actionable guidance. The most confusing items for readers are the mixed local/cluster code block (#4 and #5 together) and the less vs sh misdirection in the callout (#2).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings June 18, 2026 03:06

Copilot AI left a comment

Contributor


Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

Comment on lines +41 to +44
```bash
# macOS / Linux: download first, review it (see caution above), then run
curl -fsSL https://ollama.com/install.sh -o install.sh && sh install.sh
```
- Callout now leads with the real risk (piping curl into sh/bash), with the
  less-vs-saved-file note as a secondary aside.
- Split the mixed local/cluster code block into a local rsync block and a
  cluster-side OLLAMA_MODELS export block, each labeled.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>