# Mac Mini + Windows PC LLM Setup

Turn your Mac Mini into a lean coding machine powered by a Windows PC LLM server. The perfect setup for developers who want AI assistance without slowing down their Mac.
This repository contains everything you need to set up a dual-machine AI coding assistant:
- Mac Mini (8GB): Fast coding interface, stays responsive
- Windows PC (32GB): Heavy LLM processing, runs large models
```
Mac Mini (Client)                    Windows PC (Server)
┌────────────────────────┐          ┌───────────────────────┐
│ 💻 VS Code/Cursor      │ ◄──────► │ 🧠 LM Studio          │
│ ⌨️ Terminal            │   LAN    │ 🤖 32B+ Models        │
│ 📝 Code Helper         │          │ ⚡ 32GB RAM           │
│ Fast & Responsive      │          │ Heavy Processing      │
└────────────────────────┘          └───────────────────────┘
```
## Benefits

- ✅ The Mac Mini never slows down (no models loaded locally)
- ✅ Run huge models (32B, 70B) on the Windows PC
- ✅ Better AI quality (larger models)
- ✅ All 8GB of Mac RAM stays free for coding
- ✅ Simple scripts for easy use
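Under the hood there's no magic: the Mac talks to the server's OpenAI-compatible API over plain HTTP on the LAN. A minimal sketch of such a request (the IP address and model name are placeholders; LM Studio serves on port 1234 by default):

```bash
# Ask the model on the Windows PC a question, straight from the Mac Mini
curl http://192.168.1.50:1234/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen2.5-coder:32b",
    "messages": [{"role": "user", "content": "Write a bash one-liner to find TODOs"}]
  }'
```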
## Quick Start

### Step 1: Windows PC (Server)

Go to the Windows PC and follow [windows-server/WINDOWS_SERVER_SETUP.md](windows-server/WINDOWS_SERVER_SETUP.md).

TL;DR:

```powershell
# Download LM Studio from https://lmstudio.ai
# Open LM Studio → Local Server → Start Server
# Note your IP address: ipconfig
```

### Step 2: Mac Mini (Client)

```bash
# Clone this repo (or download scripts)
git clone https://github.com/YOUR_USERNAME/mac-mini-llm-setup.git
cd mac-mini-llm-setup
# Make scripts executable
chmod +x scripts/*.sh
# Copy scripts to home directory
cp scripts/*.sh ~/
cp configs/ollama_gpu_config.example ~/.ollama_gpu_config
# Connect to Windows server
~/connect-to-llm-server.sh
# Enter Windows PC IP when prompted
```

### Step 3: Start Using

```bash
# Interactive chat
~/chat-with-server.sh
# Code assistance
~/code-helper.sh
# VS Code integration
# Install "Continue" extension
# Configure with docs/SETUP_GUIDE.md
```

## Repository Structure

```
mac-mini-llm-setup/
├── README.md                         # This file
├── docs/                             # Documentation
│   ├── SETUP_GUIDE.md                # Detailed setup instructions
│   ├── LLM_SERVER_ARCHITECTURE.md    # Architecture overview
│   ├── GPU_OPTIMIZATION_GUIDE.md     # Local GPU optimization (optional)
│   ├── GPU_TEST_RESULTS.md           # Performance benchmarks
│   └── FINAL_SUMMARY.md              # Complete summary
├── scripts/                          # Mac Mini client scripts
│   ├── connect-to-llm-server.sh      # Connect to Windows server
│   ├── chat-with-server.sh           # Interactive AI chat
│   ├── code-helper.sh                # Code assistance tool
│   ├── optimize_gpu.sh               # Local GPU optimization
│   └── gpu_monitor.sh                # GPU monitoring tool
├── windows-server/                   # Windows PC server setup
│   └── WINDOWS_SERVER_SETUP.md       # Complete Windows setup guide
└── configs/                          # Configuration examples
    └── ollama_gpu_config.example     # GPU config template
```
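All the client scripts share one piece of state: the server address that `connect-to-llm-server.sh` saves into `~/.llm_server_config` (sourced in the Quick Reference below). A purely hypothetical sketch of what that file might contain; the actual variable names are defined by the script:

```bash
# Hypothetical contents of ~/.llm_server_config (written by connect-to-llm-server.sh)
export LLM_SERVER_IP="192.168.1.50"    # Windows PC's LAN address
export LLM_SERVER_PORT="1234"          # 1234 for LM Studio, 11434 for Ollama
export LLM_SERVER_URL="http://${LLM_SERVER_IP}:${LLM_SERVER_PORT}/v1"
```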
## Usage

### Code Helper

```bash
# In terminal:
~/code-helper.sh

# Choose:
# 1) Explain code
# 2) Review code
# 3) Generate code
# 4) Fix bugs
# 5) Optimize code
# 6) Generate commit message
# 7) Custom question
```

### Interactive Chat

```bash
~/chat-with-server.sh
# Chat with AI powered by a 32B model on the Windows PC
# Ask questions, debug issues, brainstorm ideas
```

### VS Code Integration

Install the "Continue" extension and configure:
```json
{
  "models": [{
    "title": "Windows Server - Qwen 32B",
    "provider": "openai",
    "model": "qwen2.5-coder:32b",
    "apiBase": "http://YOUR_WINDOWS_IP:1234/v1",
    "apiKey": "sk-dummy"
  }]
}
```

Then use Cmd+L in VS Code for instant AI assistance!
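Continue talks to the same OpenAI-compatible endpoint the shell scripts use, so you can sanity-check the `apiBase` value from the Mac before wiring up the extension (replace `YOUR_WINDOWS_IP` with your server's address):

```bash
# Should return a JSON list of the models currently loaded on the server
curl http://YOUR_WINDOWS_IP:1234/v1/models
```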
## What's Included

Mac Mini client scripts:
- ✅ `connect-to-llm-server.sh` - Auto-configure the connection to the Windows PC
- ✅ `chat-with-server.sh` - Interactive chat interface
- ✅ `code-helper.sh` - 7 code assistance modes
- ✅ `optimize_gpu.sh` - Local GPU optimization (optional)
- ✅ `gpu_monitor.sh` - Real-time GPU monitoring

Windows server setup:
- ✅ LM Studio support - Easy GUI setup
- ✅ Ollama support - Command-line option
- ✅ 32B+ model support - Run huge models
- ✅ Auto-start configuration - Start on boot (see the sketch after this list)
- ✅ Firewall configuration - One-command setup
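The auto-start details live in the Windows setup guide; as one possible approach for Ollama, a scheduled task that launches the server at logon (a sketch with a hypothetical task name, not necessarily the guide's exact method):

```powershell
# Start "ollama serve" automatically at logon
schtasks /create /tn "Ollama Server" /tr "ollama serve" /sc onlogon
```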
## Performance Comparison

Local models on the Mac Mini (8GB):
- Model size: 7B max
- Speed: 20-30 tok/sec
- RAM usage: High (6-7GB)
- Mac responsiveness: Slows down

With the Windows server:
- Model size: Up to 70B+
- Speed: 40-60 tok/sec (32B model)
- Mac RAM usage: Low (2-3GB)
- Mac responsiveness: Always fast ✨
## Documentation

| Document | Description |
|---|---|
| [SETUP_GUIDE.md](docs/SETUP_GUIDE.md) | Step-by-step setup instructions |
| [WINDOWS_SERVER_SETUP.md](windows-server/WINDOWS_SERVER_SETUP.md) | Complete Windows PC setup |
| [LLM_SERVER_ARCHITECTURE.md](docs/LLM_SERVER_ARCHITECTURE.md) | Architecture & use cases |
| [GPU_OPTIMIZATION_GUIDE.md](docs/GPU_OPTIMIZATION_GUIDE.md) | Local GPU optimization |
| [FINAL_SUMMARY.md](docs/FINAL_SUMMARY.md) | Complete project summary |
## Requirements

Mac Mini:
- macOS (any recent version)
- 8GB+ RAM
- Network connection to the Windows PC

Windows PC:
- Windows 10/11
- 16GB+ RAM (32GB recommended)
- LM Studio or Ollama installed

Network:
- Both machines on the same local network
- Firewall allows port 1234 (LM Studio) or 11434 (Ollama); see the example rule below
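If you run Ollama instead of LM Studio, the firewall rule shown in Troubleshooting below applies with port 11434 swapped in; for example (PowerShell as administrator):

```powershell
# Allow inbound connections to Ollama on the LAN
New-NetFirewallRule -DisplayName "Ollama" -Direction Inbound -LocalPort 11434 -Protocol TCP -Action Allow
```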
## Quick Reference

```bash
# Mac Mini:
~/connect-to-llm-server.sh      # First-time setup
~/chat-with-server.sh           # Interactive chat
~/code-helper.sh                # Code assistance
source ~/.llm_server_config     # Load server settings
```

```powershell
# Windows PC (PowerShell):
ipconfig        # Get IP address
ollama serve    # Start Ollama server
ollama list     # List available models
```

## Troubleshooting

On the Windows PC:

```powershell
# Check if server is running
curl http://localhost:1234/v1/models # LM Studio
curl http://localhost:11434/api/tags # Ollama
# Check firewall
New-NetFirewallRule -DisplayName "LM Studio" -Direction Inbound -LocalPort 1234 -Protocol TCP -Action Allow
```

On the Mac Mini:

```bash
# Test connection
ping YOUR_WINDOWS_IP
curl http://YOUR_WINDOWS_IP:1234/v1/models
# Reconfigure
~/connect-to-llm-server.sh
```

## Recommended Models

For a 32GB Windows PC:
Coding models:
- `qwen2.5-coder:32b` - Best code quality
- `codellama:34b` - Meta's coding model
- `deepseek-coder:33b` - Strong reasoning

Larger general models:
- `llama3.1:70b` - Huge context window
- `qwen2.5:72b` - Top-tier reasoning
- `mixtral:8x22b` - Fast multi-expert

Lighter options:
- `qwen2.5-coder:14b` - Good balance
- `llama3.1:8b` - Quick responses
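With Ollama, any of these models is one command away on the Windows PC. A short sketch (model tag as listed above; `OLLAMA_HOST` makes the server listen on the LAN rather than localhost only):

```powershell
# On the Windows PC: download a model
ollama pull qwen2.5-coder:32b

# Listen on all interfaces so the Mac Mini can connect, then start the server
$env:OLLAMA_HOST = "0.0.0.0"
ollama serve
```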
## Daily Workflow

- Morning: The Windows PC auto-starts LM Studio/Ollama
- Code: The Mac Mini stays fast; all processing happens on Windows
- AI help: Cmd+L in VS Code, or `~/code-helper.sh` in the terminal
- Results: Better quality (32B models) plus a faster Mac
## Who Is This For?

- ✅ Full-stack developers
- ✅ Anyone with limited Mac RAM
- ✅ Teams sharing an LLM server
- ✅ People who want the best AI quality
## Contributing

Found a bug or have a suggestion? Open an issue or PR!

## License

MIT License - use freely!
Built for developers who want the best of both worlds:
- Fast, responsive Mac for coding
- Powerful Windows PC for AI processing
Ready to supercharge your coding workflow? Start with [docs/SETUP_GUIDE.md](docs/SETUP_GUIDE.md)! 🚀