MLX implementations of Hugging Face-style models for Apple Silicon.
Install:

```bash
pip install mlx-transformers
```

For local development:

```bash
pip install -r requirements.txt
pip install -e .
```

Example usage:

```python
import mlx.core as mx
from transformers import AutoConfig, AutoTokenizer

from mlx_transformers.models import BertModel

model_name = "sentence-transformers/all-MiniLM-L6-v2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
config = AutoConfig.from_pretrained(model_name)

model = BertModel(config)
model.from_pretrained(model_name)

inputs = tokenizer("Hello from MLX", return_tensors="np")
inputs = {k: mx.array(v) for k, v in inputs.items()}
outputs = model(**inputs)
```

Quantized loading:
```python
model.from_pretrained(
    model_name,
    quantize=True,
    group_size=64,
    bits=4,
    mode="affine",
)
```

Supported models:

- BERT
- RoBERTa
- XLM-RoBERTa
- LLaMA
- Phi
- Phi-3
- Qwen3
- Qwen3-VL
- OpenELM
- Persimmon
- Fuyu
- Gemma3
- M2M100 / NLLB
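The `group_size`, `bits`, and `mode="affine"` arguments in the quantized-loading example map onto standard group-wise affine quantization: each group of weights gets its own scale and offset, and values are stored as small integers. The sketch below is a generic NumPy illustration of that idea, not MLX's actual quantization kernel; the function names are hypothetical.

```python
import numpy as np

def affine_quantize(w, group_size=64, bits=4):
    # Hypothetical illustration: each group of `group_size` weights gets
    # its own (scale, offset) pair; values are stored in `bits` bits.
    levels = 2**bits - 1
    groups = w.reshape(-1, group_size)
    lo = groups.min(axis=1, keepdims=True)
    hi = groups.max(axis=1, keepdims=True)
    scale = (hi - lo) / levels
    scale = np.where(scale == 0, 1.0, scale)  # guard constant groups
    q = np.round((groups - lo) / scale).astype(np.uint8)
    return q, scale, lo

def affine_dequantize(q, scale, lo, shape):
    # Reconstruct approximate weights from integers + per-group params.
    return (q * scale + lo).reshape(shape)

w = np.random.default_rng(0).normal(size=(8, 64)).astype(np.float32)
q, scale, lo = affine_quantize(w, group_size=64, bits=4)
w_hat = affine_dequantize(q, scale, lo, w.shape)
```

Smaller `group_size` or more `bits` lowers the reconstruction error at the cost of more storage for the per-group scales and offsets.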
Phi-3:

```bash
python examples/text_generation/phi3_generation.py \
  --model-name microsoft/Phi-3-mini-4k-instruct \
  --prompt "Explain attention masking." \
  --max-tokens 128 \
  --temp 0.0
```

Qwen3-VL:
```bash
python examples/text_generation/qwen3_vl_generation.py \
  --model-name Qwen/Qwen3-VL-2B-Instruct \
  --image-url "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/tasks/car.jpg" \
  --prompt "Describe the image." \
  --max-tokens 128 \
  --temp 0.0
```

NLLB:
```bash
python examples/translation/nllb_translation.py \
  --model_name facebook/nllb-200-distilled-600M \
  --revision refs/pr/45 \
  --source_language English \
  --target_language Yoruba \
  --text_to_translate "Let us translate text to Yoruba"
```

Chat UI:
```bash
cd chat
bash start.sh
```

Benchmark:
```bash
python examples/text_generation/benchmark_generation.py --help
```

Run the tests:

```bash
python -m unittest
```

Some models are gated on Hugging Face. Set `HF_TOKEN` if needed.
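The `--temp 0.0` flag used in the generation examples selects greedy decoding; a positive temperature sharpens or flattens the next-token distribution before sampling. A minimal NumPy sketch of that convention (not the repository's actual sampler; logits are made up):

```python
import numpy as np

def sample_token(logits, temp=0.0, seed=0):
    # Hypothetical helper: temp == 0 means greedy (argmax) decoding.
    if temp == 0.0:
        return int(np.argmax(logits))
    # Otherwise scale logits by 1/temp, softmax, and sample.
    z = (logits - logits.max()) / temp
    probs = np.exp(z) / np.exp(z).sum()
    return int(np.random.default_rng(seed).choice(len(logits), p=probs))

logits = np.array([1.0, 3.0, 2.0])
print(sample_token(logits, temp=0.0))  # 1 (greedy argmax)
```

Lower temperatures concentrate probability on the top logits; higher temperatures make the distribution closer to uniform.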