A high-performance API server that exposes OpenAI-compatible endpoints for MLX models. Built in Python on the FastAPI framework, it offers an efficient, scalable way to run MLX-based vision and language models locally.
macMLX brings local LLM inference to Apple Silicon with a first-class native macOS experience. No cloud, no telemetry, no Electron — just your Mac running models at full speed.
Defensive layer for mlx / mlx-lm / mlx-vlm on Apple Silicon. Prevents IOGPUFamily kernel panics, SIGABRT, and Mac reboots from MLX inference. Includes a community-curated registry of known-panic models (KNOWN_PANIC_MODELS) with hardware contexts, root-cause hypotheses, and verified workarounds.
Local AI for Apple Silicon. Chat with open LLMs (Qwen, Gemma, DeepSeek, Mistral, Llama…) fully offline on your Mac via Apple MLX — private by design, zero telemetry.
RETIX is a local-first, MLX-based vision CLI for agents that need to inspect screenshots, extract visible text, and verify visual claims with deterministic output.
OpenAI-compatible Qwen3-Omni server on Apple Silicon (MLX / mlx-vlm). Accepts text, image, audio, and video input; returns text or tool calls. Single-flight queue, resident model, sync-only.
OpenAI-compatible multi-model Qwen server for Apple Silicon (MLX / mlx-vlm): Qwen3-Omni + Qwen3.6-27B in one process, memory-budgeted resident cache, text/image/audio/video in, text/tool-calls out.
Extract structured information from documents, convert images to Markdown, and generate templates with the NuMind NuExtract3 model on Apple Silicon with MLX.
Analyze medical text, 2D images (e.g. chest X-ray), 3D CT volumes, and whole-slide pathology images with the Google MedGemma 1.5 model on Apple Silicon with MLX.