This file tells Claude Code how to install and run mac code on any Mac with Apple Silicon.
mac code is a local AI coding agent (like Claude Code) that runs entirely on your Mac using a local LLM via llama.cpp. No cloud, no API keys, no cost.
- macOS with Apple Silicon (M1/M2/M3/M4, 16GB+ RAM recommended)
- Homebrew installed (
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)") - Python 3.10+
- Go 1.25+ (
brew install go)
brew install llama.cpppip3 install huggingface-hub rich --break-system-packagesmkdir -p ~/models
python3 -c "
from huggingface_hub import hf_hub_download
hf_hub_download('unsloth/Qwen3.5-35B-A3B-GGUF',
'Qwen3.5-35B-A3B-UD-IQ2_M.gguf', local_dir='$HOME/models/')
"python3 -c "
from huggingface_hub import hf_hub_download
hf_hub_download('unsloth/Qwen3.5-9B-GGUF',
'Qwen3.5-9B-Q4_K_M.gguf', local_dir='$HOME/models/')
"cd <this-repo-directory>
git clone https://github.com/sipeed/picoclaw.git
cd picoclaw && make deps && make build && cd ..mkdir -p ~/.picoclaw/workspace
cp config.example.json ~/.picoclaw/config.jsonFor the 35B MoE model (default — 30 tok/s, SSD paging):
llama-server \
--model ~/models/Qwen3.5-35B-A3B-UD-IQ2_M.gguf \
--port 8000 --host 127.0.0.1 \
--flash-attn on --ctx-size 12288 \
--cache-type-k q4_0 --cache-type-v q4_0 \
--n-gpu-layers 99 --reasoning off -np 1 -t 4For the 9B model (64K context, tool calling):
llama-server \
--model ~/models/Qwen3.5-9B-Q4_K_M.gguf \
--port 8000 --host 127.0.0.1 \
--flash-attn on --ctx-size 65536 \
--cache-type-k q4_0 --cache-type-v q4_0 \
--n-gpu-layers 99 --reasoning off -t 4python3 agent.pyagent.py— Main agent TUI with auto-routing, slash commands, web search, toolschat.py— Lightweight streaming chat (no tools, direct to LLM)dashboard.py— Real-time server monitor (tok/s, slots, memory)config.example.json— PicoClaw config with DuckDuckGo search + fetch MCP serverssetup.sh— One-command install script (alternative to manual steps)
Two models, one agent:
- 35B MoE (IQ2_M) — Default. 30 tok/s via SSD paging, 12K context. The breakthrough: a 35B model on a $600 Mac mini.
- 9B (Q4_K_M) — 64K context with quantized KV cache. Persistent context via MLX (save/load in 0.0003s, R2 sync).
Both use text-based intent routing (not JSON tool calling). Switch with /model 9b or /model 35b.
- GPU OOM after long sessions: Reboot the Mac to clear Metal GPU memory, then restart the server
- Context overflow errors: Clear PicoClaw sessions:
rm -rf ~/.picoclaw/workspace/sessions/ - PicoClaw not found: Make sure you built it in step 5 and the binary is at
picoclaw/build/picoclaw-darwin-arm64 - Model download fails: Ensure
huggingface-hubis installed and you have ~11 GB free disk space
- Models:
~/models/ - PicoClaw config:
~/.picoclaw/config.json - PicoClaw sessions:
~/.picoclaw/workspace/sessions/ - PicoClaw binary:
<repo>/picoclaw/build/picoclaw-darwin-arm64