From 321962f393e8fb8f6660fb24c84b72bec902e08c Mon Sep 17 00:00:00 2001 From: Paritosh Dixit Date: Mon, 23 Mar 2026 00:41:57 +0000 Subject: [PATCH 1/6] docs(spark): add local Ollama inference setup section Add step-by-step instructions for setting up local inference with Ollama on DGX Spark, covering NVIDIA runtime verification, Ollama install and model pre-load, OLLAMA_HOST=0.0.0.0 configuration, and sandbox connection with verification. Fixes #314, #385 --- spark-install.md | 81 ++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 81 insertions(+) diff --git a/spark-install.md b/spark-install.md index 140b36d02..71bc36b72 100644 --- a/spark-install.md +++ b/spark-install.md @@ -95,6 +95,87 @@ newgrp docker # or log out and back in nemoclaw onboard ``` +## Setup Local Inference (Ollama) + +Use this to run inference locally on the DGX Spark's GPU instead of routing to NVIDIA cloud. + +### Step 1: Verify NVIDIA Container Runtime + +```bash +docker run --rm --runtime=nvidia --gpus all ubuntu nvidia-smi +``` + +If this fails, configure the NVIDIA runtime and restart Docker: + +```bash +sudo nvidia-ctk runtime configure --runtime=docker +sudo systemctl restart docker +``` + +### Step 2: Install Ollama + +```bash +curl -fsSL https://ollama.com/install.sh | sh +``` + +Verify it is running: + +```bash +curl http://localhost:11434 +``` + +### Step 3: Pull and Pre-load a Model + +Download Nemotron 3 Super 120B (~87 GB; may take several minutes): + +```bash +ollama pull nemotron-3-super:120b +``` + +Run it briefly to pre-load weights into unified memory, then exit: + +```bash +ollama run nemotron-3-super:120b +# type /bye to exit +``` + +### Step 4: Configure Ollama to Listen on All Interfaces + +By default Ollama binds to `127.0.0.1`, which is not reachable from inside the sandbox container. 
Configure it to listen on all interfaces:
+
+```bash
+sudo mkdir -p /etc/systemd/system/ollama.service.d
+printf '[Service]\nEnvironment="OLLAMA_HOST=0.0.0.0"\n' | sudo tee /etc/systemd/system/ollama.service.d/override.conf
+
+sudo systemctl daemon-reload
+sudo systemctl restart ollama
+```
+
+Verify Ollama is listening on `0.0.0.0`:
+
+```bash
+sudo netstat -nap | grep 11434
+```
+
+### Step 5: Install OpenShell and NemoClaw
+
+```bash
+curl -LsSf https://raw.githubusercontent.com/NVIDIA/OpenShell/main/install.sh | sh
+curl -fsSL https://nvidia.com/nemoclaw.sh | bash
+```
+
+When prompted for **Inference options**, select **Local Ollama**, then select the model you pulled.
+
+### Step 6: Connect and Test
+
+```bash
+# Connect to the sandbox
+nemoclaw my-assistant connect
+
+# Inside the sandbox, talk to the agent
+openclaw agent --agent main --local -m "Which model and GPU are in use?" --session-id test
+```
+
 ## Known Issues
 
 | Issue | Status | Workaround |

From 13bbde0627fc26d6e0a940f73816b420091b4325 Mon Sep 17 00:00:00 2001
From: Paritosh Dixit
Date: Mon, 23 Mar 2026 01:19:13 +0000
Subject: [PATCH 2/6] docs(spark): prefer ss over netstat for listener
 verification

netstat requires net-tools, which is not installed by default on Ubuntu
24.04. ss from iproute2 is available by default and is more reliable for
verifying listening sockets.

Signed-off-by: Paritosh Dixit
---
 spark-install.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/spark-install.md b/spark-install.md
index 71bc36b72..09632dceb 100644
--- a/spark-install.md
+++ b/spark-install.md
@@ -97,7 +97,7 @@ nemoclaw onboard
 
 ## Setup Local Inference (Ollama)
 
-Use this to run inference locally on the DGX Spark's GPU instead of routing to NVIDIA cloud.
+Use this to run inference locally on the DGX Spark's GPU instead of routing to the cloud.
### Step 1: Verify NVIDIA Container Runtime @@ -151,10 +151,10 @@ sudo systemctl daemon-reload sudo systemctl restart ollama ``` -Verify Ollama is listening on `0.0.0.0`: +Verify Ollama is listening on all interfaces: ```bash -sudo netstat -nap | grep 11434 +ss -tlnp | grep 11434 ``` ### Step 5: Install OpenShell and NemoClaw From a9dbc13e8855c50e06595bbd6295ee5102983a7f Mon Sep 17 00:00:00 2001 From: Paritosh Dixit Date: Mon, 23 Mar 2026 01:25:17 +0000 Subject: [PATCH 3/6] docs(spark): add direct inference.local check in Step 6 Add explicit curl to https://inference.local/v1/models inside the sandbox to validate the proxy route before running the agent. This prevents fallback paths from masking regressions in the fix for #314. --- spark-install.md | 12 +++++++++++- 1 file changed, 11 insertions(+), 1 deletion(-) diff --git a/spark-install.md b/spark-install.md index 09632dceb..78e76c3e9 100644 --- a/spark-install.md +++ b/spark-install.md @@ -171,8 +171,18 @@ When prompted for **Inference options**, select **Local Ollama**, then select th ```bash # Connect to the sandbox nemoclaw my-assistant connect +``` + +Inside the sandbox, first verify `inference.local` is reachable directly (must use HTTPS — the proxy intercepts `CONNECT inference.local:443`): -# Inside the sandbox, talk to the agent +```bash +curl -s https://inference.local/v1/models +# Expected: JSON response listing the configured model +``` + +Then talk to the agent: + +```bash openclaw agent --agent main --local -m "Which model and GPU are in use?" --session-id test ``` From 8d02c4d693cd9a8b835deb2942211cf1405eaf99 Mon Sep 17 00:00:00 2001 From: Paritosh Dixit Date: Mon, 23 Mar 2026 01:33:13 +0000 Subject: [PATCH 4/6] docs(spark): fail fast on non-200 from inference.local probe Use curl -sf so the check exits non-zero on HTTP errors (403, 503, etc.), preventing a silent 403 from masking a proxy routing regression. 
Signed-off-by: Paritosh Dixit --- spark-install.md | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/spark-install.md b/spark-install.md index 78e76c3e9..0fce0eaef 100644 --- a/spark-install.md +++ b/spark-install.md @@ -161,7 +161,7 @@ ss -tlnp | grep 11434 ```bash curl -LsSf https://raw.githubusercontent.com/NVIDIA/OpenShell/main/install.sh | sh -curl -fsSL https://nvidia.com/nemoclaw.sh | bash +curl -fsSL https://www.nvidia.com/nemoclaw.sh | bash ``` When prompted for **Inference options**, select **Local Ollama**, then select the model you pulled. @@ -176,8 +176,9 @@ nemoclaw my-assistant connect Inside the sandbox, first verify `inference.local` is reachable directly (must use HTTPS — the proxy intercepts `CONNECT inference.local:443`): ```bash -curl -s https://inference.local/v1/models +curl -sf https://inference.local/v1/models # Expected: JSON response listing the configured model +# Exits non-zero on HTTP errors (403, 503, etc.) — failure here indicates a proxy routing regression ``` Then talk to the agent: From f4b660efb43ab5b86e36fe2b02e382f5a1f6ffe7 Mon Sep 17 00:00:00 2001 From: Paritosh Dixit Date: Mon, 23 Mar 2026 23:57:06 +0000 Subject: [PATCH 5/6] docs: Move local ollama inference section up Signed-off-by: Paritosh Dixit --- spark-install.md | 133 +++++++++++++++++++++++------------------------ 1 file changed, 64 insertions(+), 69 deletions(-) diff --git a/spark-install.md b/spark-install.md index 1d65fb95c..406e54047 100644 --- a/spark-install.md +++ b/spark-install.md @@ -19,7 +19,7 @@ curl -LsSf https://raw.githubusercontent.com/NVIDIA/OpenShell/main/install.sh | git clone https://github.com/NVIDIA/NemoClaw.git cd NemoClaw -# Spark-specific setup +# Spark-specific setup (For details see [What's Different on Spark](#whats-different-on-spark)) sudo ./scripts/setup-spark.sh # Install NemoClaw using the NemoClaw/install.sh: @@ -29,65 +29,21 @@ sudo ./scripts/setup-spark.sh curl -fsSL https://www.nvidia.com/nemoclaw.sh | bash ``` 
-## What's Different on Spark - -DGX Spark ships **Ubuntu 24.04 + Docker 28.x** but no k8s/k3s. OpenShell embeds k3s inside a Docker container, which hits two problems on Spark: - -### 1. Docker permissions - -```text -Error in the hyper legacy client: client error (Connect) - Permission denied (os error 13) -``` - -**Cause**: Your user isn't in the `docker` group. -**Fix**: `setup-spark` runs `usermod -aG docker $USER`. You may need to log out and back in (or `newgrp docker`) for it to take effect. - -### 2. cgroup v2 incompatibility - -```text -K8s namespace not ready -openat2 /sys/fs/cgroup/kubepods/pids.max: no -Failed to start ContainerManager: failed to initialize top level QOS containers -``` - -**Cause**: Spark runs cgroup v2 (Ubuntu 24.04 default). OpenShell's gateway container starts k3s, which tries to create cgroup v1-style paths that don't exist. The fix is `--cgroupns=host` on the container, but OpenShell doesn't expose that flag. - -**Fix**: `setup-spark` sets `"default-cgroupns-mode": "host"` in `/etc/docker/daemon.json` and restarts Docker. This makes all containers use the host cgroup namespace, which is what k3s needs. 
- -## Manual Setup (if setup-spark doesn't work) - -### Fix Docker cgroup namespace +## Verifying Your Install ```bash -# Check if you're on cgroup v2 -stat -fc %T /sys/fs/cgroup/ -# Expected: cgroup2fs - -# Add cgroupns=host to Docker daemon config -sudo python3 -c " -import json, os -path = '/etc/docker/daemon.json' -d = json.load(open(path)) if os.path.exists(path) else {} -d['default-cgroupns-mode'] = 'host' -json.dump(d, open(path, 'w'), indent=2) -" - -# Restart Docker -sudo systemctl restart docker -``` - -### Fix Docker permissions +# Check sandbox is running +nemoclaw my-assistant connect -```bash -sudo usermod -aG docker $USER -newgrp docker # or log out and back in +# Inside the sandbox, talk to the agent: +openclaw agent --agent main --local -m "hello" --session-id test ``` -### Then run the onboard wizard +## Uninstall (perform this before re-installing) ```bash -nemoclaw onboard +# Uninstall NemoClaw (Remove OpenShell sandboxes, gateway, NemoClaw providers, related Docker containers, images, volumes and configs) +nemoclaw uninstall ``` ## Setup Local Inference (Ollama) @@ -182,6 +138,61 @@ Then talk to the agent: openclaw agent --agent main --local -m "Which model and GPU are in use?" --session-id test ``` +## What's Different on Spark + +DGX Spark ships **Ubuntu 24.04 + Docker 28.x** but no k8s/k3s. OpenShell embeds k3s inside a Docker container, which hits two problems on Spark: + +### 1. Docker permissions + +```text +Error in the hyper legacy client: client error (Connect) + Permission denied (os error 13) +``` + +**Cause**: Your user isn't in the `docker` group. +**Fix**: `setup-spark` runs `usermod -aG docker $USER`. You may need to log out and back in (or `newgrp docker`) for it to take effect. + +### 2. 
cgroup v2 incompatibility + +```text +K8s namespace not ready +openat2 /sys/fs/cgroup/kubepods/pids.max: no +Failed to start ContainerManager: failed to initialize top level QOS containers +``` + +**Cause**: Spark runs cgroup v2 (Ubuntu 24.04 default). OpenShell's gateway container starts k3s, which tries to create cgroup v1-style paths that don't exist. The fix is `--cgroupns=host` on the container, but OpenShell doesn't expose that flag. + +**Fix**: `setup-spark` sets `"default-cgroupns-mode": "host"` in `/etc/docker/daemon.json` and restarts Docker. This makes all containers use the host cgroup namespace, which is what k3s needs. + +## Manual Setup (if setup-spark doesn't work) + +### Fix Docker cgroup namespace + +```bash +# Check if you're on cgroup v2 +stat -fc %T /sys/fs/cgroup/ +# Expected: cgroup2fs + +# Add cgroupns=host to Docker daemon config +sudo python3 -c " +import json, os +path = '/etc/docker/daemon.json' +d = json.load(open(path)) if os.path.exists(path) else {} +d['default-cgroupns-mode'] = 'host' +json.dump(d, open(path, 'w'), indent=2) +" + +# Restart Docker +sudo systemctl restart docker +``` + +### Fix Docker permissions + +```bash +sudo usermod -aG docker $USER +newgrp docker # or log out and back in +``` + ## Known Issues | Issue | Status | Workaround | @@ -192,22 +203,6 @@ openclaw agent --agent main --local -m "Which model and GPU are in use?" 
--session-id test
 | Image pull failure (k3s can't find built image) | OpenShell bug | `openshell gateway destroy && openshell gateway start`, re-run setup |
 | GPU passthrough | Untested on Spark | Should work with `--gpu` flag if NVIDIA Container Toolkit is configured |
 
-## Verifying Your Install
-
-```bash
-# Check sandbox is running
-openshell sandbox list
-# Should show: nemoclaw Ready
-
-# Test the agent
-openshell sandbox connect nemoclaw
-# Inside sandbox:
-nemoclaw-start openclaw agent --agent main --local -m 'hello' --session-id test
-
-# Monitor network egress (separate terminal)
-openshell term
-```
-
 ## Architecture Notes
 
 ```text

From 9ac0867cebb9bf3b003fe653a14d4ac2cd52eada Mon Sep 17 00:00:00 2001
From: Paritosh Dixit
Date: Tue, 24 Mar 2026 13:21:54 +0000
Subject: [PATCH 6/6] docs: Resolved review comments

Signed-off-by: Paritosh Dixit
---
 spark-install.md | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/spark-install.md b/spark-install.md
index 406e54047..0898ae801 100644
--- a/spark-install.md
+++ b/spark-install.md
@@ -105,12 +105,16 @@ sudo systemctl restart ollama
 Verify Ollama is listening on all interfaces:
 
 ```bash
-ss -tlnp | grep 11434
+sudo ss -tlnp | grep 11434
 ```
 
 ### Step 5: Install OpenShell and NemoClaw
 
 ```bash
+# If OpenShell and NemoClaw are already installed, uninstall them. A fresh NemoClaw install will run onboard with local inference options.
+nemoclaw uninstall
+
+# Install OpenShell and NemoClaw
 curl -LsSf https://raw.githubusercontent.com/NVIDIA/OpenShell/main/install.sh | sh
 curl -fsSL https://www.nvidia.com/nemoclaw.sh | bash
 ```
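
The Step 4 listener check (`sudo ss -tlnp | grep 11434`) can also be made fail-fast for provisioning scripts. A minimal sketch, assuming typical iproute2 output; the `check_ollama_bind` helper and the sample `ss` lines below are illustrative, not part of the installer, and real `ss` field order can vary slightly between versions:

```shell
#!/bin/sh
# Succeed (exit 0) only if an `ss` listing shows TCP port 11434 bound to a
# wildcard address (0.0.0.0, *, or [::]) rather than loopback only.
check_ollama_bind() {
    # $1: output of `ss -tln` (or `sudo ss -tlnp`)
    printf '%s\n' "$1" | grep -Eq 'LISTEN.*(0\.0\.0\.0|\*|\[::\]):11434'
}

# Illustrative samples; on the Spark itself, feed in: sudo ss -tlnp | grep 11434
wildcard='LISTEN 0 4096 0.0.0.0:11434 0.0.0.0:*'
loopback='LISTEN 0 4096 127.0.0.1:11434 0.0.0.0:*'

check_ollama_bind "$wildcard" && echo "reachable from the sandbox"
check_ollama_bind "$loopback" || echo "loopback only; rerun Step 4"
```

Run under `set -e`, a check like this aborts provisioning instead of letting a loopback-only bind pass silently and surface later as a sandbox connection failure.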