
MATA | Model-Agnostic Task Architecture

Write your vision pipeline once. Swap any model (HuggingFace, ONNX, Torchvision) without changing a line of code.


For ML engineers and CV practitioners who want YOLO-like simplicity with HuggingFace-scale model choice. MATA is a task-centric computer vision framework built on three ideas:

  1. Universal model loading: load any model by HuggingFace ID, local ONNX file, or config alias with one API
  2. Composable graph pipelines: wire Detect → Segment → Embed into typed DAGs with parallel execution, conditional branching, and control flow
  3. Zero-shot everything: CLIP classify, GroundingDINO detect, SAM segment; no training required

See It in Action

One-call inference: run any HuggingFace model in a few lines:

import mata

result = mata.run("detect", "image.jpg", model="facebook/detr-resnet-50")
for det in result.instances:
    print(f"{det.label_name}: {det.score:.2f} at {det.bbox}")

Multi-task graph pipeline, MATA's distinctive capability: compose tasks into typed, parallel workflows:

import mata
from mata.nodes import Detect, Filter, PromptBoxes, Fuse

result = mata.infer(
    image="image.jpg",
    graph=[
        Detect(using="detector", text_prompts="cat . dog", out="dets"),
        Filter(src="dets", score_gt=0.3, out="filtered"),
        PromptBoxes(using="segmenter", dets="filtered", out="masks"),
        Fuse(dets="filtered", masks="masks", out="final"),
    ],
    providers={
        "detector":  mata.load("detect", "IDEA-Research/grounding-dino-tiny"),
        "segmenter": mata.load("segment", "facebook/sam-vit-base"),
    }
)

CLI: run from the terminal, no script needed:

mata run detect image.jpg --model facebook/detr-resnet-50 --conf 0.4 --save
mata track video.mp4 --model facebook/detr-resnet-50 --tracker botsort --save
mata recognize person.jpg --gallery gallery.npz --model openai/clip-vit-base-patch32

Installation

pip install datamata

For GPU acceleration, install PyTorch with CUDA first:

pip install torch torchvision --index-url https://download.pytorch.org/whl/cu126
pip install datamata

See INSTALLATION.md for CUDA version table, optional dependencies (ONNX, barcode, notebook, Valkey), and troubleshooting.

Core Tasks

Detection

result = mata.run("detect", "image.jpg", model="facebook/detr-resnet-50", threshold=0.4)
for det in result.instances:
    print(f"{det.label_name}: {det.score:.2f} at {det.bbox}")

Classification

result = mata.run("classify", "image.jpg", model="microsoft/resnet-50")
print(f"Top-1: {result.top1.label_name} ({result.top1.score:.2%})")

Segmentation

result = mata.run("segment", "image.jpg",
    model="facebook/mask2former-swin-tiny-coco-instance", threshold=0.5)
instances = result.get_instances()

Depth Estimation

result = mata.run("depth", "image.jpg",
    model="depth-anything/Depth-Anything-V2-Small-hf")
result.save("depth.png", colormap="magma")

And More

Task | One-liner | Guide
OCR | mata.run("ocr", "doc.jpg", model="easyocr") | OCR Guide
Tracking | mata.track("video.mp4", model="...", tracker="botsort") | Tracking Guide
VLM | mata.run("vlm", "img.jpg", model="Qwen/Qwen3-VL-2B-Instruct", prompt="...") | VLM Guide
Embedding | mata.run("embed", "img.jpg", model="openai/clip-vit-base-patch32") | Embed Example
Barcode | mata.run("barcode", "img.jpg", model="pyzbar") | Barcode Examples
Recognition | mata.run("recognize", "img.jpg", gallery=gallery, model="...") | Recognition Guide
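
Recognition works by comparing a query embedding against a gallery of known embeddings. Stripped of any MATA specifics (the helpers below are an illustrative sketch, not the library's recognize API), the matching step reduces to cosine similarity over the gallery:

```python
import math

# Illustrative sketch of gallery matching; not MATA's recognize API.
def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def match_gallery(query, gallery, threshold=0.5):
    """Return (best_label, score), or (None, threshold) if nothing clears it."""
    best_label, best_score = None, threshold
    for label, embedding in gallery.items():
        score = cosine_similarity(query, embedding)
        if score > best_score:
            best_label, best_score = label, score
    return best_label, best_score

# Toy 3-d embeddings; real CLIP image embeddings are 512-d.
gallery = {"alice": [1.0, 0.0, 0.2], "bob": [0.0, 1.0, 0.1]}
label, score = match_gallery([0.9, 0.1, 0.2], gallery)
print(label)  # alice
```

The threshold is what separates "unknown person" from a weak match; tuning it per gallery is usually necessary.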

What Makes MATA Different

Graph Pipelines

Compose multi-task workflows as typed directed graphs. Run independent tasks in parallel for 1.5-3x speedup:

from mata.nodes import Detect, Classify, EstimateDepth, Fuse
from mata.core.graph import Graph

result = mata.infer(
    image="scene.jpg",
    graph=Graph("scene_analysis").parallel([
        Detect(using="detector", out="dets"),
        Classify(using="classifier", text_prompts=["indoor", "outdoor"], out="cls"),
        EstimateDepth(using="depth", out="depth"),
    ]).then(
        Fuse(dets="dets", classification="cls", depth="depth", out="scene")
    ),
    providers={
        "detector": mata.load("detect", "facebook/detr-resnet-50"),
        "classifier": mata.load("classify", "openai/clip-vit-base-patch32"),
        "depth": mata.load("depth", "depth-anything/Depth-Anything-V2-Small-hf"),
    }
)
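
The parallel stage in a scheduler like this can be mimicked framework-free with a thread pool. A minimal sketch, where the task bodies and names are hypothetical stand-ins for real model calls:

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for detector / classifier / depth model calls.
def detect(image):
    return {"dets": ["cat"]}

def classify(image):
    return {"cls": "indoor"}

def depth(image):
    return {"depth": "map"}

def run_parallel(image, tasks):
    """Run independent tasks concurrently and merge their outputs by name."""
    results = {}
    with ThreadPoolExecutor() as pool:
        futures = {name: pool.submit(fn, image) for name, fn in tasks.items()}
        for name, future in futures.items():
            results[name] = future.result()  # propagate any task exception
    return results

out = run_parallel("scene.jpg", {"detect": detect, "classify": classify, "depth": depth})
```

Thread-level parallelism pays off here because model inference releases the GIL inside the underlying runtime; for pure-Python work a process pool would be needed instead.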

Control flow primitives (v1.9.5): EarlyExit, While, and Graph.add(condition=...) for quality gates, feedback loops, and adaptive pipelines.
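
The idea behind an early-exit quality gate can be sketched framework-free. The toy executor below illustrates the concept only; it is not MATA's Graph implementation, and the step names and state keys are hypothetical:

```python
# Toy linear pipeline with an early-exit quality gate. Illustrative only,
# not MATA's Graph / EarlyExit API.
def run_pipeline(image, steps, quality_gate=None):
    """Run named steps in order; stop as soon as the gate accepts a result."""
    state = {"image": image}
    for name, fn in steps:
        state[name] = fn(state)
        if quality_gate and quality_gate(state[name]):
            state["exited_at"] = name  # record where the pipeline stopped
            break
    return state

# Hypothetical steps: try a cheap detector first, an expensive one second.
steps = [
    ("fast_detect", lambda state: {"score": 0.92}),
    ("slow_detect", lambda state: {"score": 0.99}),
]

state = run_pipeline("image.jpg", steps, quality_gate=lambda r: r["score"] > 0.9)
print(state["exited_at"])  # the cheap detector already cleared the gate
```

The same shape generalizes to a While loop: instead of breaking on success, re-run a step until the gate passes or an iteration budget runs out.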

Pre-built presets for common workflows:

from mata.presets import grounding_dino_sam, full_scene_analysis
result = mata.infer("image.jpg", grounding_dino_sam(), providers={...})

See Graph API Reference | Cookbook | Examples

Zero-Shot Vision

Perform any vision task without training by providing text prompts:

# Classify into arbitrary categories
result = mata.run("classify", "image.jpg",
    model="openai/clip-vit-base-patch32",
    text_prompts=["cat", "dog", "bird"])

# Detect objects by description
result = mata.run("detect", "image.jpg",
    model="IDEA-Research/grounding-dino-tiny",
    text_prompts="red apple . green apple . banana")

# Segment anything with point/box/text prompts
result = mata.run("segment", "image.jpg",
    model="facebook/sam-vit-base",
    point_prompts=[(320, 240, 1)])

See Zero-Shot Guide for CLIP, GroundingDINO, OWL-ViT, SAM, and SAM3 details.

Object Tracking

Track objects across video with persistent IDs, ReID, and streaming support:

# One-liner video tracking
results = mata.track("video.mp4",
    model="facebook/detr-resnet-50", tracker="botsort", conf=0.3, save=True)

# Memory-efficient streaming for RTSP / long videos
for result in mata.track("rtsp://camera/stream",
                         model="facebook/detr-resnet-50", stream=True):
    print(f"Active tracks: {len(result.instances)}")

# Appearance-based ReID: recover IDs after occlusion
results = mata.track("video.mp4", model="facebook/detr-resnet-50",
    reid_model="openai/clip-vit-base-patch32")

ByteTrack and BotSort are fully vendored, with no external tracking dependencies. See Tracking Guide for a ByteTrack vs BotSort comparison, cross-camera ReID, and YAML config.
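
Under the hood, trackers in this family associate new detections to live tracks by bounding-box overlap. As a framework-free illustration (a greedy simplification; real ByteTrack uses Hungarian assignment plus a Kalman motion model), the IoU association step looks like this:

```python
def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0

def associate(tracks, detections, iou_thresh=0.3):
    """Greedily match each track ID to its best-overlapping unused detection."""
    matches, used = {}, set()
    for tid, tbox in tracks.items():
        best_j, best_iou = None, iou_thresh
        for j, dbox in enumerate(detections):
            if j in used:
                continue
            score = iou(tbox, dbox)
            if score > best_iou:
                best_j, best_iou = j, score
        if best_j is not None:
            matches[tid] = best_j
            used.add(best_j)
    return matches

tracks = {1: (10, 10, 50, 50), 2: (100, 100, 150, 150)}
dets = [(12, 11, 52, 49), (200, 200, 240, 240)]
print(associate(tracks, dets))  # {1: 0}
```

Track 2 finds no detection above the threshold, so a real tracker would mark it "lost" and keep it alive for a few frames; that grace period is what ReID embeddings extend across longer occlusions.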

Command-Line Interface

mata run detect image.jpg --model facebook/detr-resnet-50 --conf 0.4 --save
mata run classify image.jpg --model microsoft/resnet-50 --json
mata run vlm image.jpg --model Qwen/Qwen3-VL-2B-Instruct --prompt "Describe this"
mata track video.mp4 --model facebook/detr-resnet-50 --tracker botsort --save
mata val detect --data coco.yaml --model facebook/detr-resnet-50
mata --version

All subcommands support --help. See CLI Examples.

Supported Models

MATA works with any model from HuggingFace Transformers, Torchvision, or local ONNX/TorchScript files. Tested and recommended models:

Task | Representative Models | Runtimes
Detection | DETR, RT-DETR, GroundingDINO, OWL-ViT, RetinaNet, Faster R-CNN, FCOS, SSD | PyTorch, ONNX, TorchScript, Torchvision
Classification | ResNet, ViT, ConvNeXt, EfficientNet, Swin, CLIP (zero-shot) | PyTorch, ONNX, TorchScript
Segmentation | Mask2Former, MaskFormer, SAM, SAM3 (zero-shot) | PyTorch
Depth | Depth Anything V1/V2 | PyTorch
VLM | Qwen3-VL, MedGemma, Florence-2, LLaVA-NeXT, SmolVLM, Moondream2, + 3 more | PyTorch
OCR | EasyOCR, PaddleOCR, Tesseract, GOT-OCR2, TrOCR | PyTorch
Embedding | CLIP, DINOv2, OSNet | PyTorch, ONNX
Barcode | pyzbar, zxing-cpp | Native

See Supported Models for model IDs, benchmarks, and runtime compatibility matrix.

When NOT to Use MATA

  • Training-first workflows: mata.train() is in beta (v2.0.0b1). If training is your primary need today, consider HuggingFace Trainer directly.
  • Edge / mobile deployment: TensorRT and TFLite export are planned but not yet available.
  • Single-model, maximum-throughput serving: MATA's adapter layer adds ~1-2 ms of overhead. For bare-metal speed on one model, use the runtime directly.

Architecture

mata.run() / mata.load() / mata.infer()
         |
   UniversalLoader (5-strategy auto-detection)
         |
   Task Adapters (HuggingFace / ONNX / TorchScript / Torchvision)
         |                          |
   VisionResult (single-task)   Graph System (multi-task)
         |                          |
   Runtime Layer              Parallel scheduler + control flow
         |
   Export (JSON / CSV / image overlay / crops)
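
The UniversalLoader box above resolves a model reference by trying strategies in order until one matches. A minimal sketch of that pattern, where the probe functions, alias table, and returned spec shape are all hypothetical rather than the actual loader internals:

```python
# Illustrative strategy-chain loader; names and shapes are hypothetical,
# not MATA's UniversalLoader source.
ALIASES = {"detr": "facebook/detr-resnet-50"}

def by_local_file(ref):
    """Probe: a path ending in .onnx is a local ONNX model."""
    return {"runtime": "onnx", "ref": ref} if ref.endswith(".onnx") else None

def by_alias(ref):
    """Probe: a short config alias expands to a full model ID."""
    return {"runtime": "hf", "ref": ALIASES[ref]} if ref in ALIASES else None

def by_hub_id(ref):
    """Probe: a 'namespace/name' string is treated as a HuggingFace Hub ID."""
    return {"runtime": "hf", "ref": ref} if "/" in ref else None

STRATEGIES = [by_local_file, by_alias, by_hub_id]

def resolve(ref):
    """Try each strategy in order; the first non-None spec wins."""
    for strategy in STRATEGIES:
        spec = strategy(ref)
        if spec is not None:
            return spec
    raise ValueError(f"cannot resolve model reference: {ref!r}")

print(resolve("model.onnx"))
print(resolve("facebook/detr-resnet-50"))
```

Ordering matters: file probes must run before Hub-ID probes, or a local path containing a slash would be misread as a Hub ID.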

Roadmap

See CHANGELOG.md for full version history.

  • v2.0 (Q2 2026): training module (mata.train()), TensorRT, mobile export, breaking API cleanup
  • v2.x: HuggingFace Hub model recommendations, KACA CNN integration, V2L HyperLoRA research
  • v2.5+: 3D vision, edge deployment, Auto-ML

License

Apache License 2.0. See LICENSE and NOTICE.

MATA does not distribute model weights. Models fetched via mata.load() are governed by their own licenses (Apache 2.0, MIT, CC-BY-NC, etc.). You are responsible for complying with model-specific terms.

Contributing

Contributions welcome. See CONTRIBUTING.md for guidelines (Apache 2.0 compatibility, >80% test coverage, Black formatting, type hints).

Acknowledgments

Built on HuggingFace Transformers, PyTorch, and ONNX Runtime.
