Stop MLX kernel panics from rebooting your Mac.
MetalGuard is a GPU safety layer for MLX on Apple Silicon. Running MLX models can trip a bug in Apple's IOGPUFamily GPU driver that kernel-panics and reboots your entire Mac instead of just failing the process. MetalGuard catches the conditions that trigger that bug — before they reach the kernel.
pip install metal-guard · zero dependencies · macOS / Apple Silicon · MIT
Current version: v1.1.0 — see CHANGELOG.md for release history.
You ran an MLX model and your Mac suddenly restarted. That is not a hardware fault and not your mistake — it is a known bug in Apple's GPU driver. Here is the fix, start to finish:
1. Open Terminal. Press ⌘ + Space, type Terminal, press Enter.
2. Install metal-guard — copy this line, paste it into Terminal, press Enter:
   ```bash
   pip install metal-guard
   ```

   It has zero dependencies, so it installs in seconds and cannot fail with a missing-package error.
3. Run it — type this, press Enter:
   ```bash
   metal-guard
   ```

   metal-guard reads the panic report your Mac just wrote, explains in plain language what happened, and offers to install a one-line protection so it does not happen again. Answer y when it asks.
That's it. The next time an MLX model would have panicked your Mac, metal-guard pauses it with an explanation instead.
No pip? `pip` comes with Python. If `pip install metal-guard` says command not found, install Python from python.org first, then try again. You can also use `pipx`: `pipx install metal-guard`.
- Diagnoses the panic. Reads the macOS panic report, identifies which Apple driver bug it was, and explains it in plain words — no kernel-log decoding required.
- Prevents the next one. A reversible shell guard routes risky MLX runs through a cooldown check; models known to panic are flagged before they load.
- Contains the damage. Runs MLX in an isolated subprocess, narrows the race windows that trigger the bug, and refuses to restart straight into a panic loop after a reboot.
- Stays out of the way. Zero dependencies, advisory by default, and every gate has an off switch.
MetalGuard is a workaround, not a cure — the root bug is inside Apple's driver and only Apple can fix it. What MetalGuard does is take your Mac from "reboots without warning" to "pauses with an explanation."
Apple's Metal GPU driver on Apple Silicon has a bug: when GPU memory management fails, the kernel panics the entire machine instead of gracefully killing the process.
panic(cpu 4 caller 0xfffffe0032a550f8):
"completeMemory() prepare count underflow" @IOGPUMemory.cpp:492
Any workflow that loads and unloads MLX models in sequence can trip it — the driver's internal reference count underflows and the machine reboots. This is not your code's fault. It is a driver-level bug with no fix timeline. See ml-explore/mlx-lm#883.
| Workload | Risk | Why |
|---|---|---|
| Single-model server (LM Studio) | Low | One model, no switching |
| Multi-model pipeline | High | Every load/unload transition can panic |
| Long-running server (mlx_lm.server) | High | KV cache grows unbounded, Metal buffers accumulate |
| Agent framework + tool calling | High | 50–100 short generate() calls per conversation |
| 24/7 daemon | Critical | Memory drift over days, no natural cleanup point |
Searched for one of these error strings? You're in the right place.
If your Mac is panicking / rebooting while running MLX and you searched for any of these, MetalGuard is built for you:
IOGPUMemory.cpp:492 completeMemory() prepare count underflow · IOGPUMemory.cpp:550 kernel panic · kIOGPUCommandBufferCallbackErrorOutOfMemory · mlx::core::gpu::check_error → std::terminate → abort (SIGABRT) · mlx::core::metal::GPUMemoryAllocator / fPendingMemorySet · IOGPUGroupMemory.cpp:219 pending memory set panic · IOGPUGroupMemory::remove_memory_object memory object not found · mlx_lm.generate crashes mid-inference · mlx_lm.server OOM kernel panic / Mac reboot · com.apple.iokit.IOGPUFamily in a panic report · AGX_RELAX_CDM_CTXSTORE_TIMEOUT · GPU watchdog killing MLX on MacBook · M1 / M2 / M3 / M4 (Max / Ultra / Pro) kernel panic · long-context (≥ 65k) prefill triggers reboot · back-to-back MLX model loads cause IOGPU underflow panic.
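
The strings above can be matched mechanically. The sketch below classifies a panic-report excerpt by substring. It is an illustration only, not metal-guard's implementation (the library exposes `detect_panic_signature` for this): `SIGNATURES` and `classify_panic_text` are hypothetical names, and the pairing of each marker with a signature name is an assumption based on the strings and signature names this README lists.

```python
from typing import Optional

# Hypothetical marker table: signature name -> substrings that identify it.
# The markers are taken from the error strings listed above; the matching
# rules are an assumption, not metal-guard's actual classifier.
SIGNATURES = {
    "prepare_count_underflow": ("prepare count underflow",),
    "pending_memory_set": ("fPendingMemorySet", "pending memory set"),
    "remove_memory_object": ("remove_memory_object",),
    "ctxstore_timeout": ("AGX_RELAX_CDM_CTXSTORE_TIMEOUT",),
    "metal_oom": ("kIOGPUCommandBufferCallbackErrorOutOfMemory",),
}


def classify_panic_text(text: str) -> Optional[str]:
    """Return the first signature whose marker appears in text, else None."""
    lowered = text.lower()
    for name, markers in SIGNATURES.items():
        if any(marker.lower() in lowered for marker in markers):
            return name
    return None


sample = '"completeMemory() prepare count underflow" @IOGPUMemory.cpp:492'
print(classify_panic_text(sample))
```

Matching is case-insensitive and returns the first hit, so a report carrying two markers is classified by table order.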
```bash
pip install metal-guard
```

This gives you the `metal-guard` and `mlx-safe-python` command-line tools. To keep it isolated from your other Python packages, use pipx instead: `pipx install metal-guard`.
pip install metal-guard also installs the metal_guard Python package:
```python
import metal_guard as mg

verdict = mg.evaluate_panic_cooldown()
print(verdict.exit_code, verdict.reason)
```

To work from a source checkout and run the tests:

```bash
git clone https://github.com/Harperbot/metal-guard.git
cd metal-guard
pip install -e ".[test]"
pytest -q
```

To check that the install works:

```
$ metal-guard panic-gate
🟢 PROCEED no recent IOGPU panics
24h=0 72h=0
$ metal-guard status
metal-guard 1.1.0 🟢 OK
mode defensive — defensive mode (default)
panics 0 in last 72h
```

If `metal-guard` is not found after install, your pip `--user` bin directory is probably not on PATH; `python3 -m metal_guard_cli panic-gate` works as a fallback.
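
For launcher scripts, the gate's exit code is the useful part. The sketch below maps it to an action using the values this README documents for `evaluate_panic_cooldown()` (0 = proceed, 2 = cooldown, 3 or higher = gate broken). It assumes the `metal-guard panic-gate` command exits with that same code; `interpret_gate_exit` and `gate_allows_launch` are hypothetical helper names, and failing closed when the CLI is missing is a design choice of this sketch.

```python
import shutil
import subprocess


def interpret_gate_exit(code: int) -> str:
    """Map a panic-gate exit code to an action, per the documented values."""
    if code == 0:
        return "proceed"
    if code == 2:
        return "cooldown"
    if code >= 3:
        return "gate-broken"
    return "unknown"  # not documented; callers should treat it conservatively


def gate_allows_launch() -> bool:
    """Run `metal-guard panic-gate`; allow a launch only on an explicit proceed."""
    exe = shutil.which("metal-guard")
    if exe is None:
        return False  # assumption: fail closed when the CLI is not on PATH
    result = subprocess.run([exe, "panic-gate"], capture_output=True, text=True)
    return interpret_gate_exit(result.returncode) == "proceed"


print(interpret_gate_exit(2))
```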
| Command | What it does |
|---|---|
| `metal-guard` | First-run wizard: scan for the recent panic, explain it, offer protection |
| `metal-guard diagnose` | Scan for recent kernel panics and explain them (no changes made) |
| `metal-guard guard install` | Install the reversible shell guard (see below) |
| `metal-guard guard uninstall` / `status` | Remove the shell guard / report its state |
| `metal-guard panic-gate` | Cooldown verdict, for use in launchd / CI scripts |
| `metal-guard status` | Full status snapshot |
| `metal-guard postmortem <dir>` | Collect a diagnostic bundle after a panic |
metal-guard guard install adds a single delimited block to your shell rc (~/.zshrc or ~/.bashrc) that routes interactive-shell python / python3 through mlx-safe-python. While a panic cooldown is active, MLX runs are paused automatically; otherwise they pass straight through. It is fully reversible — metal-guard guard uninstall removes the block cleanly — and covers interactive terminals only (Terminal, iTerm, VS Code), never launchd jobs or scripts. Disable without uninstalling: export METALGUARD_SHELL_GUARD_DISABLED=1.
```python
from metal_guard import metal_guard, require_cadence_clear, CircuitBreaker

# `thread`, `generate`, `model`, `tokenizer` and `p` come from your own code.

# Refuse back-to-back loads, and refuse new workers after a panic cluster
require_cadence_clear("mlx-community/gemma-4-26b-a4b-it-4bit")
CircuitBreaker().check()

# Register GPU-bound threads so cleanup waits for them
metal_guard.register_thread(thread)
metal_guard.wait_for_threads()

# Safe unload, OOM-protected inference, pre-load headroom check
metal_guard.safe_cleanup()  # gc + flush GPU + cooldown
result = metal_guard.oom_protected(generate, model, tokenizer, prompt=p)
metal_guard.ensure_headroom(model_name="my-model-8bit")
```

Hardware-aware defaults in one line:

```python
from metal_guard import MetalGuard, metal_guard

config = MetalGuard.recommended_config()
metal_guard.start_watchdog(warn_pct=config["watchdog_warn_pct"],
                           critical_pct=config["watchdog_critical_pct"])
```

Every API is listed under Reference below.
If you ship an MLX-based app, server, or backend, embedding metal-guard means your users are protected from kernel panics without installing or configuring anything themselves — the most reliable way to reach users who would never find a safety tool on their own.
1. Add it as a dependency. metal-guard has zero third-party runtime dependencies, so adding it cannot pull in a conflicting package or break your build:

   ```toml
   # pyproject.toml
   dependencies = ["metal-guard>=1.1,<2"]
   ```

2. Guard the panic-prone transitions. Wrap model load, unload, and back-to-back inference with the API above: at minimum `require_cadence_clear()` before a load and `metal_guard.safe_cleanup()` after an unload.
3. Fail safe, not loud. metal-guard's gates raise typed exceptions (e.g. SpawnRefused, MLXLockConflict) instead of letting a panic reboot the machine — catch them and degrade gracefully, such as falling back to an API model.
4. (Optional) Explain panics to your users. After a reboot, call metal_guard.parse_panic_reports() and show users the same plain-language explanation the CLI gives — turning a mysterious crash into a handled event.
metal-guard follows semantic versioning; pin to a compatible range.
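
The "fail safe, not loud" step can be sketched as follows. To keep the example runnable on its own, `SpawnRefused` and `MLXLockConflict` are defined here as stand-ins for metal-guard's exceptions of the same names; in a real integration you would import them from the library. `generate_with_fallback` and both callables are hypothetical.

```python
from typing import Callable


class SpawnRefused(Exception):
    """Stand-in for metal-guard's SpawnRefused, so the sketch runs alone."""


class MLXLockConflict(Exception):
    """Stand-in for metal-guard's MLXLockConflict."""


def generate_with_fallback(
    prompt: str,
    local_generate: Callable[[str], str],
    fallback_generate: Callable[[str], str],
) -> str:
    """Try the local MLX path first; degrade to the fallback if a gate refuses."""
    try:
        return local_generate(prompt)
    except (SpawnRefused, MLXLockConflict) as exc:
        # A gate said no: treat it as a handled event, not a crash.
        print(f"local model unavailable ({type(exc).__name__}); using fallback")
        return fallback_generate(prompt)


def refused(prompt: str) -> str:
    raise SpawnRefused("model is flagged as panic-prone")


print(generate_with_fallback("hi", refused, lambda p: f"remote:{p}"))
```

Only the typed gate exceptions are caught, so unrelated bugs in the local path still surface normally.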
A community-curated list of MLX models that kernel-panic Apple Silicon Macs in production — with hardware contexts, root-cause hypotheses, and verified workarounds.
Apple's driver bug has no fix timeline. But which models trigger it under which workloads is community-knowable — it is just scattered across GitHub issues, LM Studio bug reports, Discord screenshots, and panic-full-*.panic files nobody publishes. MetalGuard gives that knowledge a structured home:
```python
from metal_guard import check_known_panic_model, warn_if_known_panic_model

model_id = "mlx-community/gemma-4-31b-it-8bit"
advisory = check_known_panic_model(model_id)
if advisory is not None:
    print(advisory["recommendation"])
    # → "metal-guard narrows the race window but does NOT eliminate panic on
    #    this model. Switch backend (Ollama / llama.cpp) or pivot to an MoE variant."

warn_if_known_panic_model(model_id)  # fire-and-forget, per-process dedup
```

Each entry carries the `panic_signature` (the exact `IOGPUMemory.cpp:NNN` line to match), reproductions (hardware / RAM / time-to-panic / workload), community cross-references, an actionable recommendation, and upstream issue links.
Hit a panic on a specific model with metal-guard fully engaged? Your data point is valuable — open a Known Panic Model report. The registry is intentionally conservative: entries require a confirmed reproduction or a clear upstream issue, so working models are not falsely blacklisted.
The registry records models known to panic — it cannot record models nobody has reported yet, and every entry reflects what was observed up to a point in time. A model's absence from the registry is not a safety certificate — it just means no one has reported it here. If you want to run a local model that isn't listed, test it yourself first on your own hardware and workload; if it panics, report it so the next person is warned.
The panic landscape also moves in the other direction. The root bug is upstream, and upstream is not standing still — recent MLX releases have already merged mitigations (e.g. mlx#3348, a thread-local CommandEncoder), and a future MLX or macOS release could narrow or close the bug entirely. When that happens, a registry entry's "switch backend" advice becomes unnecessary — and metal-guard's check_version_advisories() and observer mode (METALGUARD_MODE=observer, which relaxes the defensive layers once a fixed MLX runtime is installed) are how you track it. Treat the registry and these advisories as a point-in-time snapshot, not a permanent verdict — re-check against the MLX and macOS versions you actually run.
MetalGuard is organised as defence layers (L1–L13) — a defence-in-depth onion: L1–L8 narrow race windows during a run, L9 + L11 short-circuit just before a kernel-level abort, L10 + L12 handle recovery after a panic + reboot, and L13 surfaces it all as a JSON snapshot. See CHANGELOG.md for when each layer landed and the incident that motivated it.
Register any thread that touches Metal so cleanup waits for GPU work to finish before mx.clear_cache().
| API | What it does |
|---|---|
| `metal_guard.register_thread(thread)` | Add a GPU-bound thread to the registry |
| `metal_guard.wait_for_threads(timeout=None) -> int` | Block until registered threads finish; returns count still alive |
Ordered cleanup that avoids the "main thread freed while worker thread still generating" race — the original panic root cause.
| API | What it does |
|---|---|
| `metal_guard.flush_gpu()` | `mx.eval(sync)` + `mx.clear_cache()`; only safe after `wait_for_threads()` |
| `metal_guard.safe_cleanup()` | Full sequence: wait → gc.collect → flush → cooldown |
| `metal_guard.guarded_cleanup()` | Context manager that runs `safe_cleanup()` on exit |
| `kv_cache_clear_on_pressure(available_gb, growth_rate_gb_per_min)` | Ready-made `on_pressure` callback for the KV monitor |
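
The ordering is the whole point of these two layers, so here is a minimal stdlib-only sketch of it. `CleanupSequencer` is a hypothetical stand-in, not metal-guard's class: the `flush` callable is a placeholder for the real GPU flush, and the 5-second join timeout is an arbitrary choice for the example.

```python
import gc
import threading
import time
from typing import Callable, List, Optional


class CleanupSequencer:
    """Hypothetical stand-in showing the wait -> gc -> flush -> cooldown order."""

    def __init__(self, flush: Callable[[], None], cooldown_sec: float = 0.0):
        self._threads: List[threading.Thread] = []
        self._flush = flush          # placeholder for the GPU flush step
        self._cooldown_sec = cooldown_sec
        self.steps: List[str] = []   # records the order, for inspection

    def register_thread(self, thread: threading.Thread) -> None:
        self._threads.append(thread)

    def wait_for_threads(self, timeout: Optional[float] = None) -> int:
        """Join registered threads; return how many are still alive."""
        for t in self._threads:
            t.join(timeout)
        self.steps.append("wait")
        return sum(t.is_alive() for t in self._threads)

    def safe_cleanup(self) -> bool:
        """Free buffers only once no registered thread is still running."""
        if self.wait_for_threads(timeout=5.0) > 0:
            return False             # a worker is still running: do not free
        gc.collect()
        self.steps.append("gc")
        self._flush()
        self.steps.append("flush")
        time.sleep(self._cooldown_sec)
        self.steps.append("cooldown")
        return True


seq = CleanupSequencer(flush=lambda: None)
worker = threading.Thread(target=time.sleep, args=(0.05,))
worker.start()
seq.register_thread(worker)
print(seq.safe_cleanup(), seq.steps)
```

If a worker outlives the timeout, the sketch skips the flush entirely rather than freeing memory underneath it.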
Turn the raw C++ Metal OOM into a catchable Python exception with automatic cleanup and optional retry.
| API | What it does |
|---|---|
| `metal_guard.oom_protected(fn, *args, max_retries=1, **kwargs)` | Run with OOM catch → cleanup → retry |
| `metal_guard.oom_protected_context()` | Context-manager variant |
| `metal_guard.is_metal_oom(exc) -> bool` | Classify an arbitrary exception |
| `MetalOOMError` | Catchable exception, carries `MemoryStats` |
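
The catch → cleanup → retry pattern can be illustrated without MLX. The sketch below is a simplified stand-in, not the library's code: it classifies OOM by message text (an assumption), takes the cleanup step as an explicit `cleanup` argument (hypothetical, not part of the documented signature), and defines its own `MetalOOMError` so it runs alone.

```python
from typing import Any, Callable, List


class MetalOOMError(RuntimeError):
    """Stand-in for a catchable OOM exception."""


def is_oom(exc: BaseException) -> bool:
    """Assumed heuristic: classify by message text. Real detection may differ."""
    msg = str(exc).lower()
    return "out of memory" in msg or "outofmemory" in msg


def oom_protected(fn: Callable[..., Any], *args: Any,
                  cleanup: Callable[[], None] = lambda: None,
                  max_retries: int = 1, **kwargs: Any) -> Any:
    """Run fn; on an OOM-looking failure, clean up and retry up to max_retries."""
    attempt = 0
    while True:
        try:
            return fn(*args, **kwargs)
        except Exception as exc:
            if not is_oom(exc):
                raise                 # unrelated errors pass through untouched
            cleanup()                 # free memory before retrying or giving up
            if attempt >= max_retries:
                raise MetalOOMError(str(exc)) from exc
            attempt += 1


calls: List[str] = []


def flaky(prompt: str) -> str:
    """Fails with an OOM-like error once, then succeeds."""
    calls.append(prompt)
    if len(calls) == 1:
        raise RuntimeError("kIOGPUCommandBufferCallbackErrorOutOfMemory")
    return f"ok:{prompt}"


print(oom_protected(flaky, "hello"))
```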
Refuse loads that will not fit, with model-size estimation from the HF model ID.
| API | What it does |
|---|---|
| `metal_guard.can_fit(model_size_gb, overhead_gb=2.0) -> bool` | Non-raising check |
| `metal_guard.require_fit(model_size_gb, model_name, overhead_gb=2.0)` | Clean up, then raise `MemoryError` if it still won't fit |
| `MetalGuard.estimate_model_size_from_name(name)` (static) | Parse param count + quantisation → GB estimate |
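
As a rough illustration of "param count + quantisation → GB", the sketch below parses tokens such as `26b` and `4bit` out of a model ID. The rule (size = parameters × bits / 8, defaulting to 16 bits when no `Nbit` token is present) and the explicit `available_gb` argument are assumptions of this example; the library's estimator may use different heuristics.

```python
import re
from typing import Optional


def estimate_model_size_gb(name: str) -> Optional[float]:
    """Estimate weight size in GB from tokens like '26b' and '4bit'.

    Assumed rule: billions of parameters x bits / 8. Covers weights only,
    not KV cache or activations. Returns None when no size token is found.
    """
    params_b: Optional[float] = None
    bits = 16  # assumed default when the name carries no quantisation token
    for token in re.split(r"[-_/]", name.lower()):
        if params_b is None and re.fullmatch(r"\d+(\.\d+)?b", token):
            params_b = float(token[:-1])
        quant = re.fullmatch(r"(\d+)bit", token)
        if quant:
            bits = int(quant.group(1))
    if params_b is None:
        return None
    return params_b * bits / 8


def can_fit(model_size_gb: float, available_gb: float,
            overhead_gb: float = 2.0) -> bool:
    """Non-raising check: model plus overhead must fit in available memory."""
    return model_size_gb + overhead_gb <= available_gb


size = estimate_model_size_gb("mlx-community/gemma-4-26b-a4b-it-4bit")
print(size, can_fit(size, available_gb=16.0))
```

Tokens are matched whole, so the `a4b` active-parameter marker in an MoE name is not mistaken for the total parameter count.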
For mlx_lm.server, agent frameworks, and 24/7 daemons.
| API | What it does |
|---|---|
| `metal_guard.memory_stats() -> MemoryStats` | Snapshot (active / peak / limit / available / pct) |
| `metal_guard.is_pressure_high(threshold_pct=67.0) -> bool` | Quick pressure check |
| `metal_guard.ensure_headroom(model_name, threshold_pct=67.0)` | Clean up if pressure high, no-op otherwise |
| `metal_guard.start_watchdog(interval_secs, warn_pct, critical_pct, on_critical)` | Drift watchdog with escalating response |
| `metal_guard.start_kv_cache_monitor(interval_secs, headroom_gb, growth_rate_warn, on_pressure)` | KV growth monitor, fires before OOM |
| `bench_scoped_load(model_id, ...)` | Context manager for sequential benchmark runs; guarantees unload before next load |
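
The two monitors reduce to two small calculations: a pressure percentage compared against escalating thresholds, and a growth rate in GB per minute. A minimal sketch follows; the `MemoryStats` fields shown here are a subset chosen for the example, and the 85% critical threshold in the usage line is a made-up value (only 67% appears in this README).

```python
from dataclasses import dataclass


@dataclass
class MemoryStats:
    """Reduced, hypothetical snapshot: just what the checks below need."""
    active_gb: float
    limit_gb: float

    @property
    def pct(self) -> float:
        return 100.0 * self.active_gb / self.limit_gb


def watchdog_level(stats: MemoryStats, warn_pct: float, critical_pct: float) -> str:
    """Escalating response: 'ok', then 'warn', then 'critical'."""
    if stats.pct >= critical_pct:
        return "critical"
    if stats.pct >= warn_pct:
        return "warn"
    return "ok"


def growth_rate_gb_per_min(prev_gb: float, prev_ts: float,
                           cur_gb: float, cur_ts: float) -> float:
    """KV growth between two samples, with timestamps in seconds."""
    elapsed_min = (cur_ts - prev_ts) / 60.0
    return (cur_gb - prev_gb) / elapsed_min


print(watchdog_level(MemoryStats(14.0, 20.0), warn_pct=67.0, critical_pct=85.0))
print(growth_rate_gb_per_min(1.0, 0.0, 2.0, 30.0))
```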
Runtime-selectable defensive vs observer posture, so you can A/B upstream mitigations without changing code.
| API | What it does |
|---|---|
| `current_mode() -> str` | `"defensive"` (default) or `"observer"` |
| `is_defensive()` / `is_observer() -> bool` | Convenience predicates |
| `describe_mode() -> dict` | Mode name, description, env var |
Run MLX in a fresh multiprocessing child so a kernel-level abort cannot kill the parent.
| API | What it does |
|---|---|
| `MLXSubprocessRunner(model_id, ...)` | Persistent worker subprocess, respawns on crash |
| `call_model_isolated(model_id, prompt, ...)` | One-shot helper: spawn → generate → shut down |
| `shutdown_all_workers()` | Force-terminate any runners tracked at exit |
| `SubprocessCrashError` / `SubprocessTimeoutError` | Typed failures for callers |
| `SpawnRefused` | Raised at runner construction when the model's advisory tier is `panic` (override: `METALGUARD_LOCAL_PANIC_MODEL_BLOCK_DISABLED=1`) |
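
To show why isolation contains a native abort, the sketch below runs code in a fresh interpreter and simulates a SIGABRT with `os.abort()`. It uses `subprocess` rather than the `multiprocessing` child the library describes, purely to keep the example short; `run_isolated` and `IsolatedResult` are hypothetical names.

```python
import subprocess
import sys
from dataclasses import dataclass


@dataclass
class IsolatedResult:
    ok: bool
    returncode: int
    stdout: str


def run_isolated(code: str, timeout: float = 30.0) -> IsolatedResult:
    """Run Python source in a fresh interpreter so an abort cannot kill the caller."""
    try:
        proc = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True, text=True, timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run has already killed the child at this point.
        return IsolatedResult(ok=False, returncode=-1, stdout="")
    return IsolatedResult(proc.returncode == 0, proc.returncode, proc.stdout.strip())


good = run_isolated("print('generated text')")
bad = run_isolated("import os; os.abort()")  # simulates a native SIGABRT
print(good.ok, bad.ok)
```

The parent survives the child's abort and sees a non-zero return code it can turn into a typed error. This contains process-level aborts only; a true kernel panic still takes the machine down, which is why the other layers exist.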
File lock under MLX_LOCK_PATH so bench / server / pipeline never initialise Metal on the same box simultaneously.
| API | What it does |
|---|---|
| `acquire_mlx_lock(label, force=False)` | Raise `MLXLockConflict` if held; `force=True` SIGTERMs the holder with timeout + cooldown |
| `release_mlx_lock() -> bool` | Release if this process holds it |
| `read_mlx_lock() -> dict \| None` | Non-blocking inspect; self-heals stale + zombie holders |
| `mlx_exclusive_lock(label)` | Context manager: acquire on enter, release on exit |
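
A cross-process lock of this kind can be built on an advisory `flock`. The sketch below is one possible shape (POSIX only), not the library's implementation: `exclusive_lock` and `LockConflict` are hypothetical names, and it omits the stale-holder healing and `force` behaviour described in the table.

```python
import fcntl
import os
import tempfile
from contextlib import contextmanager


class LockConflict(RuntimeError):
    """Stand-in for a typed 'lock already held' error."""


@contextmanager
def exclusive_lock(path: str, label: str):
    """Hold an advisory flock on path for the duration of the block."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        try:
            fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError as exc:
            raise LockConflict(f"{label}: lock at {path} is already held") from exc
        os.ftruncate(fd, 0)
        os.write(fd, f"{os.getpid()} {label}\n".encode())  # record the holder
        yield
    finally:
        os.close(fd)  # closing the descriptor releases the flock


lock_path = os.path.join(tempfile.mkdtemp(), "mlx.lock")
with exclusive_lock(lock_path, "bench"):
    try:
        with exclusive_lock(lock_path, "server"):
            conflict = False
    except LockConflict:
        conflict = True
print(conflict)
```

Because the kernel drops the lock when the holder's descriptor closes, a crashed holder cannot leave it stuck, although the holder line written to the file may then be stale.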
The last line of defence after the first eight layers — written in response to a kernel panic that lived below the SIGABRT layer: by the time Python saw anything, the machine had already rebooted. The only fix is to avoid the trigger.
| API | What it does |
|---|---|
| `CadenceGuard(path=None, *, min_interval_sec=180)` | Persisted per-model load-timestamp store |
| `require_cadence_clear(model_id, *, min_interval_sec=180)` | Atomic check + mark; raises `CadenceViolation` if a load happened too recently |
| `parse_panic_reports(directory=None, *, since_ts=None)` | Scan macOS panic reports (`/Library/Logs/DiagnosticReports`, `/var/db/PanicReporter`, `~/Library/...`; `.panic` + `.ips`) and classify |
| `ingest_panics_jsonl(*, report_dir=None, jsonl_path=None) -> int` | Dedupe-append to `~/.cache/metal-guard/panics.jsonl` |
| `CircuitBreaker(*, window_sec=3600, panic_threshold=2, cooldown_sec=3600)` | Refuse new workers after a panic cluster |
| `detect_panic_signature(text) -> (name, explanation)` | Classify a panic log: `prepare_count_underflow` / `pending_memory_set` / `remove_memory_object` / `ctxstore_timeout` / `metal_oom` |
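
The cadence check is a persisted "last load" timestamp per model. A minimal sketch of that idea is below. It is not the library's code: this version takes an explicit `path` and an injectable `now` for testing (both hypothetical), and unlike the documented "atomic check + mark" it does not guard against two processes racing on the file.

```python
import json
import os
import tempfile
import time
from typing import Optional


class CadenceViolation(RuntimeError):
    """Stand-in for the typed error raised when a load comes too soon."""


def require_cadence_clear(model_id: str, path: str, *,
                          min_interval_sec: float = 180,
                          now: Optional[float] = None) -> None:
    """Check the last recorded load of model_id, then record this one."""
    now = time.time() if now is None else now
    try:
        with open(path) as fh:
            stamps = json.load(fh)
    except (FileNotFoundError, json.JSONDecodeError):
        stamps = {}
    last = stamps.get(model_id)
    if last is not None and now - last < min_interval_sec:
        wait = min_interval_sec - (now - last)
        raise CadenceViolation(f"{model_id}: loaded {now - last:.0f}s ago, wait {wait:.0f}s")
    stamps[model_id] = now
    tmp = path + ".tmp"
    with open(tmp, "w") as fh:
        json.dump(stamps, fh)
    os.replace(tmp, path)  # rename so readers never see a partial file


store = os.path.join(tempfile.mkdtemp(), "cadence.json")
require_cadence_clear("model-a", store, now=1000.0)
try:
    require_cadence_clear("model-a", store, now=1060.0)  # only 60 s later
    blocked = False
except CadenceViolation:
    blocked = True
print(blocked)
```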
After a kernel panic + reboot, launchd auto-respawns plists ~14 minutes later — and the next MLX workload can immediately re-trigger the bug. L10 reads the macOS panic reports and applies a staircase cooldown (1 panic → 2h; ≥2 in 24h or ≥3 in 72h → lockout requiring an explicit ack).
| API | What it does |
|---|---|
| `evaluate_panic_cooldown() -> CooldownVerdict` | Stdlib-only; `verdict.exit_code` ∈ {0=proceed, 2=cooldown, ≥3=gate broken} |
| `scan_recent_panics(hours=72.0) -> list[PanicRecord]` | AND-pattern IOGPU-panic scan |
| `ack_panic_lockout()` | Clear an active lockout |
| `metal-guard panic-gate` / `metal-guard ack` | CLI wrappers for launchd scripts |
Env: METALGUARD_PANIC_COOLDOWN_STAGE1_H / _LOCKOUT_24H_N / _LOCKOUT_72H_N / _LOCKOUT_MAX_H / _GATE_DISABLED=1.
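
The staircase rule described above (1 panic → 2 h; ≥2 in 24 h or ≥3 in 72 h → lockout) can be written as a small pure function. This is a sketch under stated assumptions, not the library's logic: it assumes a lockout reports the same exit code 2 as a cooldown, it ignores acknowledgements and the maximum lockout duration, and `cooldown_verdict` with its parameter names is hypothetical.

```python
from typing import List, Tuple

HOUR = 3600.0


def cooldown_verdict(panic_times: List[float], now: float, *,
                     stage1_hours: float = 2.0,
                     lockout_24h_n: int = 2,
                     lockout_72h_n: int = 3) -> Tuple[int, str]:
    """Return (exit_code, reason): 0 = proceed, 2 = cooldown or lockout."""
    in_24h = [t for t in panic_times if 0 <= now - t <= 24 * HOUR]
    in_72h = [t for t in panic_times if 0 <= now - t <= 72 * HOUR]
    if len(in_24h) >= lockout_24h_n or len(in_72h) >= lockout_72h_n:
        return 2, f"lockout: 24h={len(in_24h)} 72h={len(in_72h)}, ack required"
    if in_72h:
        remaining = stage1_hours * HOUR - (now - max(in_72h))
        if remaining > 0:
            return 2, f"cooldown: {remaining / 60:.0f} min remaining"
    return 0, f"proceed: 24h={len(in_24h)} 72h={len(in_72h)}"


now = 100 * HOUR
print(cooldown_verdict([now - 1 * HOUR], now))
```

A single panic one hour ago yields a cooldown with an hour left; the same panic three hours ago lets the run proceed.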
Pre-panic signal: a SUBPROC_PRE breadcrumb without a matching SUBPROC_POST after 90 s strongly suggests Metal is stuck — kill the worker before the kernel does.
| API | What it does |
|---|---|
| `scan_orphan_subproc_pre(threshold_sec=90.0) -> list[OrphanPre]` | FIFO-paired PRE↔POST scan over the breadcrumb tail |
| `metal-guard orphan-scan [--threshold-sec N]` | CLI wrapper |
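
FIFO pairing means each `SUBPROC_POST` closes the oldest still-open `SUBPROC_PRE`; whatever stays open past the threshold is an orphan. The sketch below illustrates that. The breadcrumb line format it parses (`<epoch seconds> <EVENT> <label>`) is invented for the example, since this README does not specify the real format, and `scan_orphan_pre` is a hypothetical name.

```python
from collections import deque
from typing import Deque, List, Tuple


def scan_orphan_pre(lines: List[str], now: float,
                    threshold_sec: float = 90.0) -> List[Tuple[float, str]]:
    """Return (timestamp, label) for each PRE older than threshold with no POST."""
    pending: Deque[Tuple[float, str]] = deque()
    for line in lines:
        parts = line.split(maxsplit=2)
        if len(parts) < 2:
            continue                  # skip malformed lines
        try:
            ts = float(parts[0])
        except ValueError:
            continue
        label = parts[2] if len(parts) > 2 else ""
        if parts[1] == "SUBPROC_PRE":
            pending.append((ts, label))
        elif parts[1] == "SUBPROC_POST" and pending:
            pending.popleft()         # FIFO: a POST closes the oldest open PRE
    return [(ts, label) for ts, label in pending if now - ts > threshold_sec]


log = [
    "1000.0 SUBPROC_PRE gen-1",
    "1004.5 SUBPROC_POST gen-1",
    "1010.0 SUBPROC_PRE gen-2",
]
print(scan_orphan_pre(log, now=1200.0))
```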
After a panic + reboot, collects the diagnostic bundle into one directory: panic files (capped), the breadcrumb-log tail, panics.jsonl history, mx.metal stats, and an index.md summary — and writes a sentinel cooldown so L10 defers further runs even if the panic reports rotate out.
| API | What it does |
|---|---|
| `run_postmortem(output_dir) -> dict` | Full orchestration; returns paths + panic count |
| `metal-guard postmortem <output_dir>` | CLI wrapper (kill-switch: `METALGUARD_POSTMORTEM_DISABLED=1`) |
Versioned JSON snapshot for cross-process consumers (menu-bar apps, dashboards, ssh inspection) that should not import metal_guard directly.
| API | What it does |
|---|---|
| `get_status_snapshot(*, include_panics=True, breadcrumb_lines=20) -> dict` | Aggregate memory / KV monitor / panics / lock holder / mode / L10 verdict |
| `write_status_snapshot(out_path=None)` | Atomic write to `~/.cache/metal-guard/status.json` |
| `metal-guard status-write [--once \| --interval 30]` | CLI / daemon wrapper |
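
An atomic write matters here because other processes read the snapshot while it is being refreshed. One common way to get that, shown below as a sketch, is to write a temp file in the same directory and rename it over the target. The function name and the snapshot fields are hypothetical; the real schema is not described in this README.

```python
import json
import os
import tempfile


def write_snapshot_atomically(snapshot: dict, out_path: str) -> None:
    """Write JSON to a temp file beside the target, then rename over it."""
    directory = os.path.dirname(out_path) or "."
    os.makedirs(directory, exist_ok=True)
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as fh:
            json.dump(snapshot, fh)
            fh.flush()
            os.fsync(fh.fileno())
        os.replace(tmp_path, out_path)  # atomic on POSIX within one filesystem
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)         # do not leave partial files behind
        raise


out = os.path.join(tempfile.mkdtemp(), "status.json")
write_snapshot_atomically({"schema": 1, "mode": "defensive", "panics_72h": 0}, out)
with open(out) as fh:
    print(json.load(fh)["mode"])
```

A reader polling the file sees either the previous snapshot or the new one, never a half-written document.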
| API | What it does |
|---|---|
| `MetalGuard.detect_hardware() -> dict` (static) | Chip, GPU memory, recommended working set, tier, IOGPUFamily kext version |
| `MetalGuard.recommended_config() -> dict` (classmethod) | Safe defaults for every layer on the detected hardware |
| `check_version_advisories(packages=None) -> list[dict]` | Warn if installed (mlx, mlx-lm, mlx-vlm, transformers) versions trip a known advisory |
| `install_upstream_defensive_patches(force=False) -> dict[str, bool]` | Idempotent, version-gated monkey-patches for known upstream regressions |
| `audit_wired_limit() -> dict` | Flag dangerous `iogpu.wired_limit_mb` overrides (mlx-lm#1047) |
| `read_gpu_driver_version() -> str \| None` | IOGPUFamily kext version (mlx#3186) |
| API | What it does |
|---|---|
| `lookup_dims(model_id)` / `estimate_prefill_peak_alloc_gb(...)` / `require_prefill_fit(...)` | GQA-aware prefill ceiling; refuse a prefill before a 30 GB single-alloc panic |
| `recommend_chunk_size(...)` / `describe_prefill_plan(...)` | Advisory prefill chunking |
| `KVGrowthTracker(...)` | Per-request cumulative KV guard; catches a runaway request the global monitor misses |
| `detect_process_mode() -> ProcessMode` | `"server"` / `"embedded"` / `"notebook"` / `"cli"` / `"subprocess_worker"` |
| `format_panic_for_apple_feedback(forensics, ...)` | Ready-to-paste Apple Feedback Assistant report |
| `metal_guard.breadcrumb(msg)` | Write an fsync'd line to the breadcrumb log |
Persistent state lives under `~/.cache/metal-guard/`: `cadence.json` (CadenceGuard), `panics.jsonl` (panic archive), `breaker.json` (CircuitBreaker), and `status.json` (L13 snapshot). The breadcrumb log defaults to `logs/metal_breadcrumb.log`; override via `MetalGuard(breadcrumb_path=...)`.
┌─────────────────────────────────────────────────┐
│ Your Application Code │
│ Agent loop / Server / Pipeline / Daemon │
└──────────────────┬──────────────────────────────┘
┌──────────────────▼──────────────────────────────┐
│ MetalGuard │
│ L9 Cadence + CircuitBreaker refuse bad loads │
│ L8 Process lock cross-process │
│ L7 Subprocess isolation panic-isolated │
│ L5 Watchdogs drift alerts │
│ L3 OOM recovery catch + retry │
│ L2 Safe cleanup gc + flush │
│ L1 Thread registry wait before free │
│ L10–L13 cooldown / postmortem / status │
└──────────────────┬──────────────────────────────┘
┌──────────────────▼──────────────────────────────┐
│ MLX + Metal Driver │
│ ⚠️ Driver bug: panics instead of OOM │
└─────────────────────────────────────────────────┘
If you engage every defence and still see repeat panics on the same model, the race window is wider than a userspace layer can narrow. Two escape hatches, by ROI:
- Switch backend. Ollama and `llama.cpp` use Metal under the hood but run a persistent-worker architecture that sidesteps the subprocess teardown race entirely. You lose some raw throughput; you gain "doesn't panic the machine."
- Pivot to an MoE model. Mixture-of-Experts variants (e.g. `mlx-community/gemma-4-26b-a4b-it-4bit`) have a smaller active-parameter footprint per forward pass and a narrower KV trajectory. Community reports converge on MoE as the most reliable same-ecosystem workaround.
MetalGuard is complementary to both — CadenceGuard still helps whenever you hot-swap models.
One hard-learned SOP note. Anything that imports torch, mlx, mlx_lm, mlx_vlm, sentence_transformers, transformers, diffusers, or accelerate initialises the Metal backend and can hit the same kernel bug — even an interactive version-check command. During an active cooldown, use pip show <pkg> or python -c "import importlib.metadata as m; print(m.version('<pkg>'))"; never python -c "import <ml-package>; print(<ml-package>.__version__)".
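
The same safe version check, as a reusable function. It reads installed-package metadata through the standard library and never imports the package itself, so it does not initialise Metal. `installed_version` is a hypothetical helper name.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Read a package's version from its metadata without importing it."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


for pkg in ("mlx", "mlx-lm", "transformers"):
    print(pkg, installed_version(pkg))
```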
The root bug lives inside Apple's IOGPUFamily kext (mlx#3186) and cannot be patched from Python. MetalGuard lowers the trigger rate (avoids the known trigger paths), contains the blast radius (subprocess isolation), and prevents post-reboot cascades (CircuitBreaker). It does not eliminate panics — especially the uncatchable completion-handler abort (mlx#3390) that fires before any Python signal handler. One production box went from ~1.4 panics/day to zero over a 24 h window after L9 landed — but that is risk-reduction, not elimination. Until Apple ships a fixed kext, this is the upper bound of what a Python-side layer can do.
| Issue | Problem | Layer |
|---|---|---|
| mlx#3186 | IOGPUFamily kernel panic (canonical) | L1/L2/L8/L9 |
| mlx#3346 | fPendingMemorySet second signature | detect_panic_signature + L9 |
| mlx#3348 | CommandEncoder thread-local (merged) | Advisory-gated observer mode |
| mlx#3390 | Uncatchable completion-handler abort | L7 subprocess isolation |
| mlx-lm#883 | Kernel panic from KV cache growth | L1 thread + L2 safe cleanup |
| mlx-lm#854 | Server OOM crash | L3 oom_protected + L5 |
| mlx-lm#1047 | wired_limit correlation with panics | audit_wired_limit |
MIT