Write your vision pipeline once. Swap any model (HuggingFace, ONNX, Torchvision) without changing a line of code.
For ML engineers and CV practitioners who want YOLO-like simplicity with HuggingFace-scale model choice. MATA is a task-centric computer vision framework built on three ideas:
- **Universal model loading**: load any model by HuggingFace ID, local ONNX file, or config alias with one API
- **Composable graph pipelines**: wire Detect → Segment → Embed into typed DAGs with parallel execution, conditional branching, and control flow
- **Zero-shot everything**: CLIP classify, GroundingDINO detect, SAM segment, no training required
**One-liner inference**: any HuggingFace model, three lines:

```python
import mata

result = mata.run("detect", "image.jpg", model="facebook/detr-resnet-50")
for det in result.instances:
    print(f"{det.label_name}: {det.score:.2f} at {det.bbox}")
```

**Multi-task graph pipeline**: MATA's unique power. Compose tasks into typed, parallel workflows:
```python
import mata
from mata.nodes import Detect, Filter, PromptBoxes, Fuse

result = mata.infer(
    image="image.jpg",
    graph=[
        Detect(using="detector", text_prompts="cat . dog", out="dets"),
        Filter(src="dets", score_gt=0.3, out="filtered"),
        PromptBoxes(using="segmenter", dets="filtered", out="masks"),
        Fuse(dets="filtered", masks="masks", out="final"),
    ],
    providers={
        "detector": mata.load("detect", "IDEA-Research/grounding-dino-tiny"),
        "segmenter": mata.load("segment", "facebook/sam-vit-base"),
    }
)
```

**CLI**: run from the terminal, no script needed:
```bash
mata run detect image.jpg --model facebook/detr-resnet-50 --conf 0.4 --save
mata track video.mp4 --model facebook/detr-resnet-50 --tracker botsort --save
mata recognize person.jpg --gallery gallery.npz --model openai/clip-vit-base-patch32
```

Install from PyPI:

```bash
pip install datamata
```

For GPU acceleration, install PyTorch with CUDA first:

```bash
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu126
pip install datamata
```

See INSTALLATION.md for the CUDA version table, optional dependencies (ONNX, barcode, notebook, Valkey), and troubleshooting.
```python
result = mata.run("detect", "image.jpg", model="facebook/detr-resnet-50", threshold=0.4)
for det in result.instances:
    print(f"{det.label_name}: {det.score:.2f} at {det.bbox}")
```

```python
result = mata.run("classify", "image.jpg", model="microsoft/resnet-50")
print(f"Top-1: {result.top1.label_name} ({result.top1.score:.2%})")
```

```python
result = mata.run("segment", "image.jpg",
    model="facebook/mask2former-swin-tiny-coco-instance", threshold=0.5)
instances = result.get_instances()
```

```python
result = mata.run("depth", "image.jpg",
    model="depth-anything/Depth-Anything-V2-Small-hf")
result.save("depth.png", colormap="magma")
```

| Task | One-liner | Guide |
|---|---|---|
| OCR | `mata.run("ocr", "doc.jpg", model="easyocr")` | OCR Guide |
| Tracking | `mata.track("video.mp4", model="...", tracker="botsort")` | Tracking Guide |
| VLM | `mata.run("vlm", "img.jpg", model="Qwen/Qwen3-VL-2B-Instruct", prompt="...")` | VLM Guide |
| Embedding | `mata.run("embed", "img.jpg", model="openai/clip-vit-base-patch32")` | Embed Example |
| Barcode | `mata.run("barcode", "img.jpg", model="pyzbar")` | Barcode Examples |
| Recognition | `mata.run("recognize", "img.jpg", gallery=gallery, model="...")` | Recognition Guide |
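The recognition task matches a query embedding against a gallery of known identities. The matching step itself is nearest-neighbour search by cosine similarity. A minimal stdlib sketch of that idea (the gallery vectors, `recognize` helper, and 0.8 threshold are invented for illustration, not MATA's API):

```python
import math

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def recognize(query, gallery, threshold=0.8):
    """Return (identity, score) for the best gallery match, or (None, score)."""
    best_name, best_score = None, -1.0
    for name, ref in gallery.items():
        score = cosine(query, ref)
        if score > best_score:
            best_name, best_score = name, score
    return (best_name, best_score) if best_score >= threshold else (None, best_score)

gallery = {"alice": [0.9, 0.1, 0.0], "bob": [0.0, 1.0, 0.2]}
name, score = recognize([0.85, 0.15, 0.05], gallery)  # closest to "alice"
```

In practice the gallery would hold real model embeddings (e.g. from the CLIP embed task above), and the threshold trades false accepts against false rejects.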
Compose multi-task workflows as typed directed graphs. Run independent tasks in parallel for a 1.5-3x speedup:

```python
import mata
from mata.nodes import Detect, Classify, EstimateDepth, Fuse
from mata.core.graph import Graph

result = mata.infer(
    image="scene.jpg",
    graph=Graph("scene_analysis").parallel([
        Detect(using="detector", out="dets"),
        Classify(using="classifier", text_prompts=["indoor", "outdoor"], out="cls"),
        EstimateDepth(using="depth", out="depth"),
    ]).then(
        Fuse(dets="dets", classification="cls", depth="depth", out="scene")
    ),
    providers={
        "detector": mata.load("detect", "facebook/detr-resnet-50"),
        "classifier": mata.load("classify", "openai/clip-vit-base-patch32"),
        "depth": mata.load("depth", "depth-anything/Depth-Anything-V2-Small-hf"),
    }
)
```

Control flow primitives (v1.9.5): EarlyExit, While, and Graph.add(condition=...) for quality gates, feedback loops, and adaptive pipelines.
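These control-flow primitives all reduce to the same idea: a node's successors only run when a predicate over intermediate results allows it. A toy scheduler illustrating an EarlyExit-style quality gate (function names like `quality_gate` and the dict-based state are invented for this sketch; see the Graph API Reference for the real primitives):

```python
def run_graph(steps, state):
    """Run steps in order; a step may return "exit" to stop the pipeline early."""
    for step in steps:
        if step(state) == "exit":
            break
    return state

def detect(state):
    state["dets"] = [{"label": "cat", "score": 0.2}]  # stand-in for a real detector

def quality_gate(state):
    # quality gate: stop if no detection clears the confidence threshold
    if not any(d["score"] > 0.5 for d in state["dets"]):
        state["status"] = "low-confidence, exited early"
        return "exit"

def segment(state):
    state["masks"] = ["..."]  # only runs when the gate passes

state = run_graph([detect, quality_gate, segment], {})
```

Here the gate fires (the fake detection scores 0.2), so the segmentation step never executes; a While-style loop would instead re-run a sub-list of steps until its predicate flips.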
Pre-built presets for common workflows:

```python
from mata.presets import grounding_dino_sam, full_scene_analysis

result = mata.infer("image.jpg", grounding_dino_sam(), providers={...})
```

See Graph API Reference | Cookbook | Examples.
Perform any vision task without training, just by providing text prompts:

```python
# Classify into arbitrary categories
result = mata.run("classify", "image.jpg",
    model="openai/clip-vit-base-patch32",
    text_prompts=["cat", "dog", "bird"])

# Detect objects by description
result = mata.run("detect", "image.jpg",
    model="IDEA-Research/grounding-dino-tiny",
    text_prompts="red apple . green apple . banana")

# Segment anything with point/box/text prompts
result = mata.run("segment", "image.jpg",
    model="facebook/sam-vit-base",
    point_prompts=[(320, 240, 1)])
```

See the Zero-Shot Guide for CLIP, GroundingDINO, OWL-ViT, SAM, and SAM3 details.
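Under the hood, CLIP-style zero-shot classification embeds the image once, embeds one text prompt per label, and softmaxes the scaled cosine similarities. A stdlib sketch of that scoring step, with tiny made-up 2-D vectors standing in for real CLIP embeddings (the `zero_shot_scores` helper is illustrative, not MATA's API):

```python
import math

def softmax(xs):
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def zero_shot_scores(image_emb, text_embs):
    """Cosine similarity of the image against each prompt, scaled and softmaxed."""
    def cos(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        return dot / (math.hypot(*a) * math.hypot(*b))
    # CLIP multiplies similarities by a learned logit scale (~100) before softmax
    return softmax([100.0 * cos(image_emb, t) for t in text_embs])

prompts = ["cat", "dog", "bird"]
probs = zero_shot_scores([1.0, 0.1], [[0.9, 0.2], [0.1, 1.0], [0.5, 0.5]])
best = prompts[probs.index(max(probs))]  # "cat" for these toy vectors
```

Because the label set lives entirely in the text prompts, changing categories is free: no retraining, just new strings.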
Track objects across video with persistent IDs, ReID, and streaming support:

```python
# One-liner video tracking
results = mata.track("video.mp4",
    model="facebook/detr-resnet-50", tracker="botsort", conf=0.3, save=True)

# Memory-efficient streaming for RTSP / long videos
for result in mata.track("rtsp://camera/stream",
        model="facebook/detr-resnet-50", stream=True):
    print(f"Active tracks: {len(result.instances)}")

# Appearance-based ReID: recover IDs after occlusion
results = mata.track("video.mp4", model="facebook/detr-resnet-50",
    reid_model="openai/clip-vit-base-patch32")
```

ByteTrack and BotSort are fully vendored, with no external tracking dependencies. See the Tracking Guide for a ByteTrack vs BotSort comparison, cross-camera ReID, and YAML config.
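Both vendored trackers are built around the same core step: associate this frame's detections with existing tracks by bounding-box overlap (ByteTrack adds a second pass over low-score boxes; BotSort adds appearance cues). A minimal greedy IoU association sketch, not MATA's implementation, with the 0.3 threshold chosen arbitrarily:

```python
def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0

def associate(tracks, detections, iou_thresh=0.3):
    """Greedily match detection indices to track indices by descending IoU."""
    pairs = sorted(
        ((iou(t, d), ti, di)
         for ti, t in enumerate(tracks)
         for di, d in enumerate(detections)),
        reverse=True)
    matches, used_t, used_d = [], set(), set()
    for score, ti, di in pairs:
        if score >= iou_thresh and ti not in used_t and di not in used_d:
            matches.append((ti, di))
            used_t.add(ti)
            used_d.add(di)
    return matches

tracks = [(0, 0, 10, 10), (50, 50, 60, 60)]
dets = [(52, 51, 61, 60), (1, 0, 11, 10)]
print(associate(tracks, dets))  # [(0, 1), (1, 0)]
```

Production trackers replace the greedy loop with Hungarian matching and add motion prediction (Kalman filters), but the IoU cost matrix is the same starting point.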
```bash
mata run detect image.jpg --model facebook/detr-resnet-50 --conf 0.4 --save
mata run classify image.jpg --model microsoft/resnet-50 --json
mata run vlm image.jpg --model Qwen/Qwen3-VL-2B-Instruct --prompt "Describe this"
mata track video.mp4 --model facebook/detr-resnet-50 --tracker botsort --save
mata val detect --data coco.yaml --model facebook/detr-resnet-50
mata --version
```

All subcommands support `--help`. See CLI Examples.
MATA works with any model from HuggingFace Transformers, Torchvision, or local ONNX/TorchScript files. Tested and recommended models:
| Task | Representative Models | Runtimes |
|---|---|---|
| Detection | DETR, RT-DETR, GroundingDINO, OWL-ViT, RetinaNet, Faster R-CNN, FCOS, SSD | PyTorch, ONNX, TorchScript, Torchvision |
| Classification | ResNet, ViT, ConvNeXt, EfficientNet, Swin, CLIP (zero-shot) | PyTorch, ONNX, TorchScript |
| Segmentation | Mask2Former, MaskFormer, SAM, SAM3 (zero-shot) | PyTorch |
| Depth | Depth Anything V1/V2 | PyTorch |
| VLM | Qwen3-VL, MedGemma, Florence-2, LLaVA-NeXT, SmolVLM, Moondream2, + 3 more | PyTorch |
| OCR | EasyOCR, PaddleOCR, Tesseract, GOT-OCR2, TrOCR | PyTorch |
| Embedding | CLIP, DINOv2, OSNet | PyTorch, ONNX |
| Barcode | pyzbar, zxing-cpp | Native |
See Supported Models for model IDs, benchmarks, and runtime compatibility matrix.
- **Training-first workflows**: `mata.train()` is in beta (v2.0.0b1). If training is your primary need today, consider HuggingFace Trainer directly.
- **Edge / mobile deployment**: TensorRT and TFLite export are planned but not yet available.
- **Single-model, maximum-throughput serving**: MATA's adapter layer adds ~1-2 ms of overhead. For bare-metal speed on one model, use the runtime directly.
```
        mata.run() / mata.load() / mata.infer()
                         |
        UniversalLoader (5-strategy auto-detection)
                         |
Task Adapters (HuggingFace / ONNX / TorchScript / Torchvision)
            |                            |
VisionResult (single-task)     Graph System (multi-task)
            |                            |
      Runtime Layer        Parallel scheduler + control flow
                         |
        Export (JSON / CSV / image overlay / crops)
```
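The adapter layer in the diagram is why `mata.run()` never needs to know which runtime backs a model: a registry maps each runtime to an adapter that normalizes its backend's output into one result shape. An illustrative stdlib sketch of that pattern (all class and function names here are invented, not MATA's internals):

```python
ADAPTERS = {}

def register(runtime):
    """Class decorator: map a runtime name to its adapter class."""
    def wrap(cls):
        ADAPTERS[runtime] = cls
        return cls
    return wrap

@register("onnx")
class OnnxAdapter:
    def predict(self, image):
        # a real adapter would run an ONNX Runtime session here
        return {"runtime": "onnx", "instances": []}

@register("huggingface")
class HFAdapter:
    def predict(self, image):
        # a real adapter would call a Transformers pipeline here
        return {"runtime": "huggingface", "instances": []}

def run(image, runtime):
    """Dispatch to whichever adapter handles this runtime."""
    return ADAPTERS[runtime]().predict(image)

result = run("image.jpg", "onnx")
```

Because every adapter returns the same result shape, downstream stages (the graph system, export) stay runtime-agnostic; adding a new backend is just one more registry entry.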
See CHANGELOG.md for full version history.
- **v2.0 (Q2 2026)**: Training module (`mata.train()`), TensorRT, mobile export, breaking API cleanup
- **v2.x**: HuggingFace Hub model recommendations, KACA CNN integration, V2L HyperLoRA research
- **v2.5+**: 3D vision, edge deployment, Auto-ML
- Quickstart Guide: get running in 5 minutes
- Notebook Examples: interactive Jupyter tutorials
- Graph Cookbook: multi-task pipeline recipes
- Real-World Scenarios: 20 industry-ready pipelines
- Quick Reference: export, config, and validation cheat sheet
- Validation Guide: mAP, accuracy, and depth metrics against COCO / ImageNet / DIODE
Apache License 2.0. See LICENSE and NOTICE.
MATA does not distribute model weights. Models fetched via `mata.load()` are governed by their own licenses (Apache 2.0, MIT, CC-BY-NC, etc.). You are responsible for complying with model-specific terms.
Contributions welcome. See CONTRIBUTING.md for guidelines (Apache 2.0 compatibility, >80% test coverage, Black formatting, type hints).
Built on HuggingFace Transformers, PyTorch, and ONNX Runtime.
