
Submit: Rapid-MLX — OpenAI-compatible local LLM server for Apple Silicon (2-4x faster than Ollama) #194

@raullenchai

Description


Tool Submission: Rapid-MLX

Name: Rapid-MLX
URL: https://github.com/raullenchai/Rapid-MLX
Category: Developer Tools / AI / Local LLM Inference
License: Apache-2.0
Language: Python

What is Rapid-MLX?

An OpenAI-compatible local LLM inference server built specifically for Apple Silicon. It delivers 2–4x faster token generation than Ollama by running directly on MLX (Apple's ML framework) with a highly optimized streaming pipeline.

Key Features

  • OpenAI-compatible API — drop-in replacement, works with any OpenAI SDK client
  • 2–4x faster than Ollama on Apple Silicon (M1/M2/M3/M4)
  • Tool calling — full function/tool calling support for agentic workflows
  • Reasoning models — streaming `<think>` token support (Qwen3, DeepSeek-R1, etc.)
  • Vision & Audio — multimodal model support
  • Structured output — JSON schema enforcement
  • Prompt caching — persistent KV cache across requests for faster multi-turn chats
  • Speculative decoding (MTP) — 1.4x additional decode speedup on supported models
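Because the server speaks the OpenAI chat-completions protocol, a tool-calling request looks exactly like one sent to the hosted API. Here is a minimal sketch of the request body; the model id (`qwen3.5-9b`) and the `get_weather` tool are illustrative assumptions, not Rapid-MLX defaults:

```python
import json

# Sketch of an OpenAI-style tool-calling request body.
# Model id and tool definition are assumptions for illustration only.
payload = {
    "model": "qwen3.5-9b",
    "messages": [
        {"role": "user", "content": "What's the weather in Cupertino?"}
    ],
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "get_weather",  # hypothetical tool
                "description": "Look up current weather for a city.",
                "parameters": {
                    "type": "object",
                    "properties": {"city": {"type": "string"}},
                    "required": ["city"],
                },
            },
        }
    ],
    "tool_choice": "auto",
}

body = json.dumps(payload)
```

The same body works through any OpenAI SDK by passing `tools=` and `tool_choice=` to `chat.completions.create`.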

Install

```shell
# Homebrew (macOS)
brew install raullenchai/rapid-mlx/rapid-mlx

# pip
pip install rapid-mlx
```
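Once installed and running, any OpenAI-style client can talk to the server. A minimal stdlib sketch of a chat-completion request follows; the port (8080), model id, and placeholder API key are assumptions for illustration — check the project README for the actual defaults:

```python
import json
import urllib.request

# Assumed endpoint; Rapid-MLX's actual default host/port may differ.
BASE_URL = "http://localhost:8080/v1"

payload = {
    "model": "qwen3.5-9b",  # model id assumed for illustration
    "messages": [{"role": "user", "content": "Why is the sky blue?"}],
    "max_tokens": 64,
}

req = urllib.request.Request(
    f"{BASE_URL}/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={
        "Content-Type": "application/json",
        "Authorization": "Bearer not-needed",  # local servers typically ignore keys
    },
)

# With the server running, send the request:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```

Pointing an existing OpenAI SDK client at the same base URL works identically, since the endpoint is a drop-in replacement.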

Why it's relevant to developers

Local LLM inference on Mac has historically been bottlenecked by Ollama's overhead. Rapid-MLX bypasses that by integrating directly with Apple's MLX framework, giving developers a fully OpenAI-compatible server that runs substantially faster — making local AI development and testing much more practical on MacBooks and Mac Studios.

Benchmark (Qwen3.5-9B, M3 Ultra)

| Engine    | Tokens/sec |
|-----------|------------|
| Rapid-MLX | ~95 tok/s  |
| mlx-lm    | ~90 tok/s  |
| Ollama    | ~23 tok/s  |

Rapid-MLX is roughly 4x faster than Ollama on this configuration.


Happy to provide any additional info or assets needed.
