OwLLM is an open platform to build, deploy, and run custom AI agent teams — on your hardware, your VPS, or in a VM, 24/7. Bring your own models: local, cloud, or both. Fine-tune. Quantize. Abliterate. Red-team. Automate.
> [!IMPORTANT]
> OwLLM Desktop currently ships for Windows 10/11 (x64) only. macOS (Apple Silicon + Intel) and Linux (x86_64) builds are coming via the already-configured cross-platform CI. Watch the repo for release notifications.
Most AI tools give you a chatbox. OwLLM gives you a workforce.
You compose teams of specialised agents — an orchestrator that plans, a coder that writes, a critic that reviews, a researcher that fact-checks — and they collaborate on real tasks in parallel. Each team is a graph of roles + prompts you define. The 18 teams shipped in this repo are starter samples, not the menu.
| What OwLLM gives you that others don't | |
|---|---|
| 🧩 Build your own teams | Compose agents from 8 base roles + custom prompts. Visual graph builder. Hot-updates through this repo — push a team JSON, it lands on every installed app. |
| ☁️ Cloud OR local — same teams | No 4090? Plug in Claude / GPT / Gemini / Kimi keys, teams work identically. Have a GPU? Run open-weight models locally and stop paying per token. Mix both in the same conversation. |
| 🎓 Fine-tune any model | Full LoRA + Unsloth + TRL pipeline. Drop a JSONL, watch loss curves, save adapters. Works on consumer GPUs (8 GB+). |
| 🔬 Abliterate for safety research | Orthogonalise weights against refusal directions. Generate adversarial datasets. Train better safety classifiers. The honest tools the field actually needs. |
| 🛠 GGUF + quantization built-in | Convert HF safetensors → GGUF, quantize Q4/Q5/Q6/Q8/F16. Ship custom models anyone with llama.cpp can run. |
| 🛡 Red-team capable | Compose adversarial agent teams whose job is to find vulnerabilities — in models, code, apps. Pair with fine-tuning to train defenders. |
| 🔒 OS-level isolation (Win/Mac/Linux) | Flip it on and every tool your agents run — shell, file writes, edits, search, and the cloud CLIs (Claude/Codex/Gemini/Kimi) — runs inside a real Linux sandbox: WSL2 on Windows, a Lima VM on macOS, bubblewrap on Linux (Mac/Linux beta). The project lives on the sandbox filesystem, so a model that runs rm -rf or writes outside it cannot touch your real drive or home. Projects are isolated by default and the toolchain auto-installs. Your provider logins — the CLIs and every API key — auto-sync into the sandbox, so isolated cloud agents just work; the Accounts page tests each provider on both host and sandbox. Connect GitHub to clone/push private repos from inside; convert a project isolated↔not anytime. Code page, agentic teams, and the fine-tuning chat are all covered; the rest stays native. |
| 🔌 MCP-first tooling | Plug in any Model Context Protocol server (filesystem, git, browser, Postgres, GitHub…). Keyless DuckDuckGo web search is auto-installed on first run — no API key, no card. Engine-agnostic: any search MCP you add is used automatically. Curated packs per team. |
| 🏠 Run anywhere | Desktop today. Headless on a $5/mo VPS, 24/7 — on the roadmap. Containerised / VM — on the roadmap. Your agents, your hardware, your terms. |
OwLLM organises teams into the ten categories below; eight of them ship starter teams today. All of them are forkable and remixable: they're templates, not the menu. The real product is the team builder.
| Category | What teams here do | Starter samples |
|---|---|---|
| 🛠 Code | Architect → code → critic → refactor; bug hunting; reviews | code_artisan, dev_squad, code_reviewer, bug_hunter |
| 🔬 Research | Multi-source synthesis with real citations, fact-checking | research_lab, learning_tutor |
| 📊 Data | SQL → notebook → viz → narrative | data_analyst |
| 🎨 Design | Product → UX → tech → critique | product_studio |
| ✍️ Writing | Outline → draft → edit → SEO → publish | writers_room, social_desk |
| 🤝 Ops | Triage → respond → schedule → digest | secretary, concierge, customer_support |
| 💼 Personal | Calendar, finance, health, home automation | finance, health_coach, smart_home |
| 🌐 Social | Outreach, support, community management | sales_outreach, n8n_workflow_builder |
| 🛡 Safety / Red-team | Adversarial dataset generation, jailbreak research, refusal probing | (build your own — see data/teams/SCHEMA.md) |
| 🎮 Gamify | Agent-vs-agent, achievements, arena | (in progress — Q4 2026) |
Browse the 18 starter teams → · Build your own →
- Open Studio in the desktop app
- Drop in agents: orchestrator + 1..N specialists (coder, critic, researcher, brainstormer, devops, documentation, operator, …)
- Wire the dispatch graph (orchestrator → coder → critic → back to orchestrator)
- Write each agent's system prompt
- Save → team appears in your picker
- Publish to the community via a PR against data/teams/. Your team becomes one-click installable for every other user
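To make the steps above concrete, here is a minimal sketch of what a team definition and a sanity check for its dispatch graph might look like. Every field name here (`id`, `agents`, `role`, `prompt`, `dispatch`) is an assumption made for illustration; the real format is whatever data/teams/SCHEMA.md specifies. The rule that a team has exactly one orchestrator is also an assumption drawn from the "orchestrator + 1..N specialists" wording.

```python
import json

# Hypothetical team definition. Field names are illustrative only;
# the authoritative schema lives in data/teams/SCHEMA.md.
team = {
    "id": "my_review_squad",
    "agents": [
        {"name": "lead", "role": "orchestrator", "prompt": "Plan the work and delegate."},
        {"name": "dev", "role": "coder", "prompt": "Write the code for each task."},
        {"name": "reviewer", "role": "critic", "prompt": "Review the code and list defects."},
    ],
    # Dispatch graph as directed edges: orchestrator -> coder -> critic -> orchestrator.
    "dispatch": [["lead", "dev"], ["dev", "reviewer"], ["reviewer", "lead"]],
}


def validate_team(team: dict) -> list[str]:
    """Return a list of problems; an empty list means the team looks consistent."""
    problems = []
    names = [agent["name"] for agent in team["agents"]]
    if len(set(names)) != len(names):
        problems.append("agent names must be unique")
    orchestrators = [a for a in team["agents"] if a["role"] == "orchestrator"]
    if len(orchestrators) != 1:
        problems.append("exactly one orchestrator is expected")
    for src, dst in team["dispatch"]:
        for endpoint in (src, dst):
            if endpoint not in names:
                problems.append(f"edge references unknown agent: {endpoint}")
    for agent in team["agents"]:
        if not agent["prompt"].strip():
            problems.append(f"agent {agent['name']} has an empty system prompt")
    return problems


print(validate_team(team))
print(json.dumps(team, indent=2))
```

Running a check like this before opening a PR catches dangling edges and empty prompts early, whatever the final schema turns out to require.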
LoRA pipeline with Unsloth, TRL, PEFT, bitsandbytes. Llama / Qwen / Mistral / Gemma — anything on HuggingFace. Live loss curves, graceful Stop preserves checkpoints, resume-from-checkpoint and resume-adapter both supported. Runs on a 12 GB GPU.
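A quick back-of-the-envelope calculation shows why LoRA fits on consumer hardware: instead of updating a full weight matrix, it trains two small low-rank factors. The sketch below uses plain arithmetic only; the layer size (4096 x 4096) and rank (16) are illustrative assumptions, not OwLLM defaults.

```python
def lora_trainable_params(d_in: int, d_out: int, rank: int) -> int:
    """Parameters in the two LoRA factors: A (rank x d_in) and B (d_out x rank)."""
    return rank * (d_in + d_out)


# Illustrative numbers only: one square projection layer and a common rank.
d_in = d_out = 4096
rank = 16

lora = lora_trainable_params(d_in, d_out, rank)
full = d_in * d_out  # parameters updated by full fine-tuning of the same layer

print(f"LoRA: {lora:,} trainable vs {full:,} full ({lora / full:.2%})")
```

For this layer the adapter trains under 1% of the parameters, which is also why saved adapters are small compared with the base model.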
Orthogonalise weight matrices against refusal directions (the Labonne / Arditi technique, packaged). Use cases:
- AI safety labs training refusal classifiers need cleanly-uncensored teacher models
- Red teams need models that don't sandbag jailbreak tests
- Academic research on alignment failure modes
The corpus prep + abliteration script ship together.
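The core linear algebra is small enough to show on a toy matrix. The sketch below removes the component along a given direction from a weight matrix, so the matrix can no longer write anything along that direction. It is an illustration in pure Python, not the script that ships with OwLLM: the function name is made up, the direction is assumed to be already known, and how the direction is estimated or which matrices are modified is outside this example.

```python
import math


def orthogonalize(weights, direction):
    """Project `direction` out of the output space of `weights`.

    weights:   list of rows, shape d_out x d_in
    direction: vector of length d_out
    Computes W' = W - r (r^T W), where r is the unit-normalised direction.
    """
    norm = math.sqrt(sum(x * x for x in direction))
    r = [x / norm for x in direction]
    n_rows, n_cols = len(weights), len(weights[0])
    # r^T W: how strongly each column of W writes along the direction.
    along = [sum(r[i] * weights[i][j] for i in range(n_rows)) for j in range(n_cols)]
    return [
        [weights[i][j] - r[i] * along[j] for j in range(n_cols)]
        for i in range(n_rows)
    ]


# Toy example: a 2 x 2 matrix and an arbitrary direction.
W = [[1.0, 2.0], [3.0, 4.0]]
direction = [1.0, 1.0]
W_new = orthogonalize(W, direction)

# After the projection, every column of W_new is orthogonal to the direction.
residual = [sum(direction[i] * W_new[i][j] for i in range(2)) for j in range(2)]
print(W_new, residual)
```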
Convert HF safetensors → GGUF, quantize to Q4_K_M / Q5_K_M / Q6_K / Q8_0 / F16. The same pipeline that gives you tiny, fast custom models others can run on llama.cpp / Ollama / LM Studio.
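To make the size trade-off concrete, here is a minimal sketch of the idea behind Q8_0, the simplest of the formats listed above: weights are grouped into blocks of 32, and each block stores one scale plus 32 signed 8-bit integers. This is a pure-Python illustration, not OwLLM's or llama.cpp's code. The function names are made up for the example, and the scale is kept as a Python float rather than the 16-bit float used on disk. The K-quant formats (Q4_K_M, Q5_K_M, Q6_K) use a more elaborate scheme that is not shown.

```python
BLOCK_SIZE = 32  # Q8_0 groups weights into blocks of 32


def quantize_q8_0(block):
    """Quantize one block of floats to (scale, list of ints in [-127, 127])."""
    if len(block) != BLOCK_SIZE:
        raise ValueError(f"expected {BLOCK_SIZE} values, got {len(block)}")
    amax = max(abs(x) for x in block)
    if amax == 0.0:
        return 0.0, [0] * BLOCK_SIZE
    scale = amax / 127.0
    return scale, [round(x / scale) for x in block]


def dequantize_q8_0(scale, quants):
    """Recover approximate floats from a quantized block."""
    return [scale * q for q in quants]


# On disk a block is one 2-byte scale plus 32 one-byte integers.
BYTES_PER_BLOCK = 2 + BLOCK_SIZE
BITS_PER_WEIGHT = BYTES_PER_BLOCK * 8 / BLOCK_SIZE  # 8.5, versus 16 for F16

weights = [(i - 16) / 10 for i in range(BLOCK_SIZE)]  # toy data
scale, quants = quantize_q8_0(weights)
restored = dequantize_q8_0(scale, quants)
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print(f"bits/weight={BITS_PER_WEIGHT}, scale={scale:.5f}, max error={max_error:.5f}")
```

The rounding error per weight is at most half the block's scale, which is the price paid for roughly halving the file size relative to F16.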
Build a team whose role is to probe another model. The output is a labelled dataset of jailbreak attempts, refusal patterns, and edge cases, which you can offer to AI safety labs or use to train your own filters.
You don't need a 4090. Many users will never have one.
- Cloud-only: Plug in Claude / GPT / Gemini / Kimi API keys. Teams work identically. ~30 MB install, runs on any laptop.
- Local + cloud mix: Have a 3060? Run Llama for the bulk, hand off to Claude for the hard parts in the same conversation. Save 90% on tokens.
- Local-only: Have a 4090? Never touch a cloud API. Privacy by default. Stop paying per token forever.
Same teams. Same agent definitions. Same UI. The model layer is just plumbing.
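One way to picture "the model layer is just plumbing" is a small routing function that sends routine steps to a local model and hard steps to a cloud model, falling back to whichever kind is configured. This is a hypothetical sketch: the `Backend` class, the `difficulty` score, and the 0.7 threshold are all invented for illustration, and nothing here describes how OwLLM actually routes requests.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Backend:
    name: str
    kind: str  # "local" or "cloud"


def pick_backend(difficulty: float, backends: list[Backend], threshold: float = 0.7) -> Backend:
    """Prefer local for routine steps and cloud for hard ones.

    Falls back to the other kind when the preferred one is not configured,
    so cloud-only and local-only setups use the same code path.
    """
    if not backends:
        raise ValueError("no backends configured")
    local = [b for b in backends if b.kind == "local"]
    cloud = [b for b in backends if b.kind == "cloud"]
    preferred, fallback = (cloud, local) if difficulty >= threshold else (local, cloud)
    return (preferred or fallback)[0]


# Placeholder backend names, not real model identifiers.
mixed = [Backend("local-model", "local"), Backend("cloud-model", "cloud")]
print(pick_backend(0.2, mixed).name)  # routine step
print(pick_backend(0.9, mixed).name)  # hard step
```

Because the team definition never mentions a backend, the same team runs unchanged in all three setups; only the list passed to the router differs.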
| Mode | Status | Use case |
|---|---|---|
| Desktop (Windows) | ✅ shipped | Daily-driver AI workstation on your laptop |
| Desktop (macOS / Linux) | 🔜 Q3 2026 | Mac / Ubuntu users |
| Headless on VPS (24/7) | 🔜 Q4 2026 | Run your custom teams on a $5/mo box. Reach them via Telegram, web, API. Always-on agentic services. |
| Containerised / VM | 🔜 Q4 2026 | Drop OwLLM into your existing infra. |
The team definitions, role prompts, MCP configs, and model selections are all portable across deployment modes — build a team once, run it anywhere.
- Download OwLLM.Desktop.Setup.exe (~30 MB; one file, that's it).
- Run OwLLM-Desktop-Setup-x64.exe. Windows SmartScreen may flag it the first time (the binary isn't EV-signed yet); click "More info" → "Run anyway".
- On first launch, a hardware-aware wizard opens. It detects your hardware and offers the modules that fit:
  - Local Inference (~33 MB CPU / ~32 MB Vulkan / ~285 MB CUDA): only needed if you want local models
  - Audio / Speech-to-Text (~148 MB): for voice messages, mic input
  - Fine-tuning (~12 GB): only if you'll train models
  - MCP toolchain (~260 MB): only if you want browser / git / postgres MCP servers
Cloud-only? Skip the wizard entirely and just enter your API keys in Settings. The shell alone is enough for cloud-model chat + agent orchestration.
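The wizard's job can be sketched as a small decision function: given what was detected and what you want to do, pick modules and add up the download. The sizes below come from the list above; everything else (the `Hardware` fields, the module identifiers, and the rule that fine-tuning is only offered on a CUDA-capable GPU) is an assumption for illustration, not the wizard's real logic.

```python
from dataclasses import dataclass

# Approximate download sizes in MB, taken from the module list above.
MODULE_SIZES_MB = {
    "local-inference-cpu": 33,
    "local-inference-vulkan": 32,
    "local-inference-cuda": 285,
    "audio": 148,
    "fine-tuning": 12 * 1024,
    "mcp-toolchain": 260,
}


@dataclass
class Hardware:
    has_cuda_gpu: bool
    has_vulkan: bool


def recommend_modules(hw, local_models=False, voice=False, training=False, mcp_servers=False):
    """Return the module identifiers a wizard might offer for this machine."""
    picks = []
    if local_models:
        if hw.has_cuda_gpu:
            picks.append("local-inference-cuda")
        elif hw.has_vulkan:
            picks.append("local-inference-vulkan")
        else:
            picks.append("local-inference-cpu")
    if voice:
        picks.append("audio")
    if training and hw.has_cuda_gpu:  # assumption: training needs a CUDA GPU
        picks.append("fine-tuning")
    if mcp_servers:
        picks.append("mcp-toolchain")
    return picks


def total_size_mb(picks):
    """Sum the approximate download size of the chosen modules."""
    return sum(MODULE_SIZES_MB[name] for name in picks)


laptop = Hardware(has_cuda_gpu=False, has_vulkan=True)
picks = recommend_modules(laptop, local_models=True, mcp_servers=True)
print(picks, total_size_mb(picks), "MB")
```

A cloud-only user maps to all options left off, which yields an empty list and no extra download beyond the shell.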
Three independent update streams — small, fast, no full reinstalls:
- Shell auto-updates via Tauri's signed updater
- Modules (llama backend, fine-tune env, audio, MCP) check + swap per-launch
- Data layer (team templates, role prompts, model profiles, MCP recommendations) hot-pulls from data/ in this repo on launch. A new team you contribute today reaches every installed app within minutes, with no rebuild.
That's why the data/ tree is open and community-driven even though the app binaries are closed-source.
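A data-layer hot-pull of this kind can be sketched as a manifest comparison: hash what is installed, compare with what the repo publishes, and fetch only the difference. The code below is a hypothetical illustration using only the standard library. The manifest shape (path to SHA-256 digest), the function names, and the example file paths are assumptions; the app's actual update mechanism is not documented here.

```python
import hashlib


def file_digest(data: bytes) -> str:
    """SHA-256 hex digest used to detect changed files."""
    return hashlib.sha256(data).hexdigest()


def plan_sync(local: dict[str, str], remote: dict[str, str]) -> dict[str, list[str]]:
    """Compare path -> digest manifests and list what to download or delete."""
    download = sorted(path for path, digest in remote.items() if local.get(path) != digest)
    delete = sorted(path for path in local if path not in remote)
    return {"download": download, "delete": delete}


# Hypothetical manifests; the paths are examples, not a guaranteed repo layout.
local_manifest = {
    "data/teams/code_artisan.json": file_digest(b"v1"),
    "data/teams/old_team.json": file_digest(b"v1"),
}
remote_manifest = {
    "data/teams/code_artisan.json": file_digest(b"v2"),  # changed upstream
    "data/teams/my_review_squad.json": file_digest(b"v1"),  # newly contributed
}

print(plan_sync(local_manifest, remote_manifest))
```

Only changed or new files are transferred, which is what keeps a per-launch check cheap compared with reinstalling the app.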
- Multi-agent dispatch with worktree isolation
- Modular installer + hardware-aware wizard
- MCP-first tool architecture
- Fine-tuning + abliteration pipeline
- GGUF / quantization pipeline
- Telegram bridge
- WSL tool isolation — agents run their tools inside Ubuntu, off your Windows drive
- Cloud CLIs inside the sandbox — Claude/Codex/Gemini/Kimi run isolated too
- Connect GitHub — isolated agents clone private repos + push from inside the sandbox
- Auto login-sync — codex/claude/gemini/kimi + every API key mirrored into the sandbox
- Convert projects isolated↔not from the header; Accounts tests host + sandbox
- [~] Mac/Linux isolation (beta) — Lima VM (macOS) + bubblewrap (Linux), same model as WSL
- Visual team builder — Q3 2026
- macOS + Linux desktop — Q3 2026
- 24/7 headless / VPS mode — Q4 2026
- Container / VM deployment — Q4 2026
- Gamification (agent-vs-agent arena, achievements) — Q4 2026 (in progress)
- WhatsApp bridge — Q4 2026
- Vision models (LLaVA / Pixtral) — Q4 2026
- Voice output (TTS) — Q1 2027
- Public team marketplace — Q1 2027
Track active work in Discussions → Roadmap.
- Indie devs & founders — your AI workforce, not a SaaS subscription
- AI safety researchers — abliteration, red-team teams, adversarial dataset gen
- Model creators — fine-tune, quantize, ship GGUFs
- Automation builders — replace n8n / Zapier with agents that understand meaning
- Privacy-bound teams — legal, medical, defence, regulated industries
- Agencies — run custom client agent teams 24/7 (when VPS mode lands)
- Power users — anyone tired of generic chatboxes
- 💬 GitHub Discussions — Q&A, show what you built, roadmap input
- 🐛 Issues — bug reports (use the template)
- 🎨 Contributing — agent teams, roles, translations, docs
Repository contents (agent teams, role definitions, registry, schemas, docs): MIT — fork freely, share team packs, build on it.
Application binaries via Releases: see EULA.md. Source for the application itself is not currently public.
Standing on the shoulders of: llama.cpp, whisper.cpp, Tauri, Unsloth, Model Context Protocol, and the open-weight model creators (Meta, Alibaba, Mistral, Google, DeepSeek), plus Anthropic for their safety research.
If you build something cool with OwLLM, share it in Discussions → Show & Tell. Stars are how this category proves itself worth investing in.