# Mac Mini + Windows PC LLM Setup

Turn your Mac Mini into a lean coding machine powered by a Windows PC LLM server. The perfect setup for developers who want AI assistance without slowing down their Mac.
This repository contains everything you need to set up a dual-machine AI coding assistant:
- Mac Mini (8GB): Fast coding interface, stays responsive
- Windows PC (32GB): Heavy LLM processing, runs large models
```
Mac Mini (Client)                    Windows PC (Server)
┌────────────────────────┐          ┌───────────────────────┐
│ 💻 VS Code/Cursor      │ ◄──────► │ 🧠 LM Studio          │
│ ⌨️ Terminal            │   LAN    │ 🤖 32B+ Models        │
│ 📝 Code Helper         │          │ ⚡ 32GB RAM           │
│ Fast & Responsive      │          │ Heavy Processing      │
└────────────────────────┘          └───────────────────────┘
```
## Benefits

- ✅ The Mac Mini never slows down (no models loaded locally)
- ✅ Run huge models (32B, 70B) on the Windows PC
- ✅ Better AI quality (larger models)
- ✅ All 8GB of Mac RAM stays free for coding
- ✅ Simple scripts for easy use
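Under the hood there's no magic: the Mac talks to the server's OpenAI-compatible API over plain HTTP on the LAN. A minimal sketch of such a request (the IP address and model name are placeholders; LM Studio serves on port 1234 by default):

```bash
# Ask the model on the Windows PC a question, straight from the Mac Mini
curl http://192.168.1.50:1234/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen2.5-coder:32b",
    "messages": [{"role": "user", "content": "Write a bash one-liner to find TODOs"}]
  }'
```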
## Quick Start

### Step 1: Windows PC (Server)

Go to the Windows PC and follow [windows-server/WINDOWS_SERVER_SETUP.md](windows-server/WINDOWS_SERVER_SETUP.md).

TL;DR:

```powershell
# Download LM Studio from https://lmstudio.ai
# Open LM Studio → Local Server → Start Server
# Note your IP address: ipconfig
```

### Step 2: Mac Mini (Client)

```bash
# Clone this repo (or download scripts)
git clone https://github.com/YOUR_USERNAME/mac-mini-llm-setup.git
cd mac-mini-llm-setup
# Make scripts executable
chmod +x scripts/*.sh
# Copy scripts to home directory
cp scripts/*.sh ~/
cp configs/ollama_gpu_config.example ~/.ollama_gpu_config
# Connect to Windows server
~/connect-to-llm-server.sh
# Enter Windows PC IP when prompted
```

### Step 3: Start Using

```bash
# Interactive chat
~/chat-with-server.sh
# Code assistance
~/code-helper.sh
# VS Code integration
# Install "Continue" extension
# Configure with docs/SETUP_GUIDE.md
```

## Repository Structure

```
mac-mini-llm-setup/
├── README.md                         # This file
├── docs/                             # Documentation
│   ├── SETUP_GUIDE.md                # Detailed setup instructions
│   ├── LLM_SERVER_ARCHITECTURE.md    # Architecture overview
│   ├── GPU_OPTIMIZATION_GUIDE.md     # Local GPU optimization (optional)
│   ├── GPU_TEST_RESULTS.md           # Performance benchmarks
│   └── FINAL_SUMMARY.md              # Complete summary
├── scripts/                          # Mac Mini client scripts
│   ├── connect-to-llm-server.sh      # Connect to Windows server
│   ├── chat-with-server.sh           # Interactive AI chat
│   ├── code-helper.sh                # Code assistance tool
│   ├── optimize_gpu.sh               # Local GPU optimization
│   └── gpu_monitor.sh                # GPU monitoring tool
├── windows-server/                   # Windows PC server setup
│   └── WINDOWS_SERVER_SETUP.md       # Complete Windows setup guide
└── configs/                          # Configuration examples
    └── ollama_gpu_config.example     # GPU config template
```
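All the client scripts share one piece of state: the server address that `connect-to-llm-server.sh` saves into `~/.llm_server_config` (sourced in the Quick Reference below). A purely hypothetical sketch of what that file might contain; the actual variable names are defined by the script:

```bash
# Hypothetical contents of ~/.llm_server_config (written by connect-to-llm-server.sh)
export LLM_SERVER_IP="192.168.1.50"    # Windows PC's LAN address
export LLM_SERVER_PORT="1234"          # 1234 for LM Studio, 11434 for Ollama
export LLM_SERVER_URL="http://${LLM_SERVER_IP}:${LLM_SERVER_PORT}/v1"
```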
## Usage

### Code Helper

```bash
# In terminal:
~/code-helper.sh

# Choose:
# 1) Explain code
# 2) Review code
# 3) Generate code
# 4) Fix bugs
# 5) Optimize code
# 6) Generate commit message
# 7) Custom question
```

### Interactive Chat

```bash
~/chat-with-server.sh
# Chat with AI powered by a 32B model on the Windows PC
# Ask questions, debug issues, brainstorm ideas
```

### VS Code Integration

Install the "Continue" extension and configure:
```json
{
  "models": [{
    "title": "Windows Server - Qwen 32B",
    "provider": "openai",
    "model": "qwen2.5-coder:32b",
    "apiBase": "http://YOUR_WINDOWS_IP:1234/v1",
    "apiKey": "sk-dummy"
  }]
}
```

Then use Cmd+L in VS Code for instant AI assistance!
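Continue talks to the same OpenAI-compatible endpoint the shell scripts use, so you can sanity-check the `apiBase` value from the Mac before wiring up the extension (replace `YOUR_WINDOWS_IP` with your server's address):

```bash
# Should return a JSON list of the models currently loaded on the server
curl http://YOUR_WINDOWS_IP:1234/v1/models
```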
## What's Included

Mac Mini client scripts:
- ✅ `connect-to-llm-server.sh` - Auto-configure the connection to the Windows PC
- ✅ `chat-with-server.sh` - Interactive chat interface
- ✅ `code-helper.sh` - 7 code assistance modes
- ✅ `optimize_gpu.sh` - Local GPU optimization (optional)
- ✅ `gpu_monitor.sh` - Real-time GPU monitoring

Windows server setup:
- ✅ LM Studio support - Easy GUI setup
- ✅ Ollama support - Command-line option
- ✅ 32B+ model support - Run huge models
- ✅ Auto-start configuration - Start on boot (see the sketch after this list)
- ✅ Firewall configuration - One-command setup
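The auto-start details live in the Windows setup guide; as one possible approach for Ollama, a scheduled task that launches the server at logon (a sketch with a hypothetical task name, not necessarily the guide's exact method):

```powershell
# Start "ollama serve" automatically at logon
schtasks /create /tn "Ollama Server" /tr "ollama serve" /sc onlogon
```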
## Performance Comparison

Local models on the Mac Mini (8GB):
- Model size: 7B max
- Speed: 20-30 tok/sec
- RAM usage: High (6-7GB)
- Mac responsiveness: Slows down

With the Windows server:
- Model size: Up to 70B+
- Speed: 40-60 tok/sec (32B model)
- Mac RAM usage: Low (2-3GB)
- Mac responsiveness: Always fast ✨
## Documentation

| Document | Description |
|---|---|
| [SETUP_GUIDE.md](docs/SETUP_GUIDE.md) | Step-by-step setup instructions |
| [WINDOWS_SERVER_SETUP.md](windows-server/WINDOWS_SERVER_SETUP.md) | Complete Windows PC setup |
| [LLM_SERVER_ARCHITECTURE.md](docs/LLM_SERVER_ARCHITECTURE.md) | Architecture & use cases |
| [GPU_OPTIMIZATION_GUIDE.md](docs/GPU_OPTIMIZATION_GUIDE.md) | Local GPU optimization |
| [FINAL_SUMMARY.md](docs/FINAL_SUMMARY.md) | Complete project summary |
## Requirements

Mac Mini:
- macOS (any recent version)
- 8GB+ RAM
- Network connection to the Windows PC

Windows PC:
- Windows 10/11
- 16GB+ RAM (32GB recommended)
- LM Studio or Ollama installed

Network:
- Both machines on the same local network
- Firewall allows port 1234 (LM Studio) or 11434 (Ollama); see the example rule below
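If you run Ollama instead of LM Studio, the firewall rule shown in Troubleshooting below applies with port 11434 swapped in; for example (PowerShell as administrator):

```powershell
# Allow inbound connections to Ollama on the LAN
New-NetFirewallRule -DisplayName "Ollama" -Direction Inbound -LocalPort 11434 -Protocol TCP -Action Allow
```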
## Quick Reference

```bash
# Mac Mini:
~/connect-to-llm-server.sh      # First-time setup
~/chat-with-server.sh           # Interactive chat
~/code-helper.sh                # Code assistance
source ~/.llm_server_config     # Load server settings
```

```powershell
# Windows PC (PowerShell):
ipconfig        # Get IP address
ollama serve    # Start Ollama server
ollama list     # List available models
```

## Troubleshooting

On the Windows PC:

```powershell
# Check if server is running
curl http://localhost:1234/v1/models # LM Studio
curl http://localhost:11434/api/tags # Ollama
# Check firewall
New-NetFirewallRule -DisplayName "LM Studio" -Direction Inbound -LocalPort 1234 -Protocol TCP -Action Allow
```

On the Mac Mini:

```bash
# Test connection
ping YOUR_WINDOWS_IP
curl http://YOUR_WINDOWS_IP:1234/v1/models
# Reconfigure
~/connect-to-llm-server.sh
```

## Recommended Models

For a 32GB Windows PC:
Coding models:
- `qwen2.5-coder:32b` - Best code quality
- `codellama:34b` - Meta's coding model
- `deepseek-coder:33b` - Strong reasoning

Larger general models:
- `llama3.1:70b` - Huge context window
- `qwen2.5:72b` - Top-tier reasoning
- `mixtral:8x22b` - Fast multi-expert

Lighter options:
- `qwen2.5-coder:14b` - Good balance
- `llama3.1:8b` - Quick responses
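With Ollama, any of these models is one command away on the Windows PC. A short sketch (model tag as listed above; `OLLAMA_HOST` makes the server listen on the LAN rather than localhost only):

```powershell
# On the Windows PC: download a model
ollama pull qwen2.5-coder:32b

# Listen on all interfaces so the Mac Mini can connect, then start the server
$env:OLLAMA_HOST = "0.0.0.0"
ollama serve
```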
## Daily Workflow

- Morning: The Windows PC auto-starts LM Studio/Ollama
- Code: The Mac Mini stays fast; all processing happens on Windows
- AI help: Cmd+L in VS Code, or `~/code-helper.sh` in the terminal
- Results: Better quality (32B models) plus a faster Mac
## Who Is This For?

- ✅ Full-stack developers
- ✅ Anyone with limited Mac RAM
- ✅ Teams sharing an LLM server
- ✅ People who want the best AI quality
## Contributing

Found a bug or have a suggestion? Open an issue or PR!

## License

MIT License - use freely!
Built for developers who want the best of both worlds:
- Fast, responsive Mac for coding
- Powerful Windows PC for AI processing
Ready to supercharge your coding workflow? Start with [docs/SETUP_GUIDE.md](docs/SETUP_GUIDE.md)! 🚀