This directory contains runnable client examples for MATA-SERVER.
| Script | Transport | Tasks covered |
|---|---|---|
| `rest_infer.py` | REST (`POST /v1/infer`) | detect, classify, segment |
| `rest_vlm.py` | REST (`POST /v1/infer`) | vlm |
| `ws_video_infer.py` | WebSocket (`WS /v1/stream/{id}`) | detect, segment, classify, vlm |
Prerequisites — a MATA-SERVER instance must be reachable before running any example. Start one locally with:
```bash
MATA_SERVER_AUTH_MODE=none mataserver serve
```

All examples default to `127.0.0.1:8110`. Pass `--host`/`--port` to override.
## rest_infer.py

Runs single-shot inference against the REST API using a base64-encoded image payload. Covers the three classic vision tasks — object detection, image classification, and instance segmentation.
```bash
pip install requests

# Run all three tasks with the default model set
python examples/rest_infer.py --image examples/images/coco_cat_remote.jpg

# Run a single task
python examples/rest_infer.py \
    --image examples/images/coco_cat_remote.jpg \
    --task detect \
    --model PekingU/rtdetr_r18vd

# Zero-shot open-vocabulary detection with text prompts
python examples/rest_infer.py \
    --image examples/images/coco_cat_remote.jpg \
    --task detect \
    --model google/owlv2-base-patch16-ensemble \
    --prompts "cat,dog,remote control"
```

| Argument | Default | Description |
|---|---|---|
| `--image` | (required) | Path to an image file |
| `--task` | all three | One of `detect`, `classify`, `segment` |
| `--model` | task default | HuggingFace model repo ID; defaults to the bundled task-to-model map |
| `--prompts` | — | Comma-separated text prompts for zero-shot / open-vocabulary models |
| `--host` | `127.0.0.1` | Server hostname |
| `--port` | `8110` | Server port |
| Task | Default model |
|---|---|
| detect | `PekingU/rtdetr_r18vd` |
| classify | `google/vit-base-patch16-224` |
| segment | `facebook/mask2former-swin-tiny-coco-instance` |
Example output:

```text
--- Task: DETECT | Model: PekingU/rtdetr_r18vd ---
3 detection(s)
  [0.94] cat     bbox=[42, 10, 380, 470]
  [0.81] remote  bbox=[200, 300, 260, 420]
  [0.57] couch   bbox=[0, 250, 480, 480]
Full response keys: ['schema_version', 'task', 'model', 'timestamp', 'detections']

--- Task: CLASSIFY | Model: google/vit-base-patch16-224 ---
5 class(es)
  [0.84] tabby cat
  [0.07] Egyptian cat
  ...

--- Task: SEGMENT | Model: facebook/mask2former-swin-tiny-coco-instance ---
4 segment(s)
  [0.91] cat  bbox=[42, 10, 380, 470]
  ...
```
## rest_vlm.py

Sends an image and a natural-language prompt to a Visual Language Model (VLM) via the REST API and prints the generated response text.
```bash
pip install requests

# Basic question about an image
python examples/rest_vlm.py \
    --image examples/images/coco_cat_remote.jpg \
    --prompt "What do you see in this image?"

# Control generation parameters
python examples/rest_vlm.py \
    --image examples/images/coco_cat_remote.jpg \
    --prompt "List every object you can identify." \
    --max-tokens 256 \
    --temperature 0.3

# Use a different VLM
python examples/rest_vlm.py \
    --image examples/images/coco_cat_remote.jpg \
    --prompt "Describe the scene in one sentence." \
    --model Qwen/Qwen2.5-VL-7B-Instruct
```

| Argument | Default | Description |
|---|---|---|
| `--image` | (required) | Path to an image file |
| `--prompt` | `"Describe this image."` | Natural-language question or instruction |
| `--model` | `Qwen/Qwen2.5-VL-3B-Instruct` | VLM model repo ID |
| `--max-tokens` | — | Maximum number of tokens to generate |
| `--temperature` | — | Sampling temperature (0.0 = greedy, higher = more creative) |
| `--host` | `127.0.0.1` | Server hostname |
| `--port` | `8110` | Server port |
Example output:

```text
--- VLM Inference ---
Model  : Qwen/Qwen2.5-VL-3B-Instruct
Prompt : 'What do you see in this image?'
Response:
The image shows a cat sitting on a couch next to a remote control. The cat appears to be relaxed and is looking towards the camera.
```
## ws_video_infer.py

Streams a local video file to MATA-SERVER over a WebSocket connection and prints inference results frame-by-frame as they arrive.
Implements the full session lifecycle:
1. `POST /v1/sessions` — create a streaming session and receive a `session_id`
2. `WS /v1/stream/{session_id}` — connect and stream binary-encoded frames
3. `DELETE /v1/sessions/{session_id}` — clean up after streaming ends
Frames are encoded using the MATA binary wire format: a 13-byte header (frame_id uint32 BE + timestamp float64 BE + encoding uint8) followed by JPEG bytes.
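That framing is straightforward to reproduce with Python's `struct` module: 4 + 8 + 1 bytes of big-endian header, then the JPEG payload. A minimal sketch (the numeric value of the `encoding` byte for JPEG is an assumption here; the streaming-protocol doc is authoritative):

```python
import struct

# 13-byte header, big-endian: frame_id uint32, timestamp float64, encoding uint8.
HEADER_FMT = ">IdB"
assert struct.calcsize(HEADER_FMT) == 13


def encode_frame(frame_id: int, timestamp: float, jpeg: bytes,
                 encoding: int = 1) -> bytes:
    """Pack one frame: 13-byte header followed by the JPEG bytes.

    NOTE: encoding=1 for JPEG is an assumed value, not confirmed.
    """
    return struct.pack(HEADER_FMT, frame_id, timestamp, encoding) + jpeg


def decode_header(frame: bytes) -> tuple:
    """Unpack (frame_id, timestamp, encoding) from a wire-format frame."""
    return struct.unpack(HEADER_FMT, frame[:struct.calcsize(HEADER_FMT)])
```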
```bash
pip install aiohttp opencv-python

# Object detection on a video
python examples/ws_video_infer.py \
    --video examples/videos/cup.mp4 \
    --task detect

# Limit to first 60 frames and cap the send rate
python examples/ws_video_infer.py \
    --video examples/videos/cup.mp4 \
    --task detect \
    --max-frames 60 \
    --fps-limit 15

# Use the "latest" frame policy — server always processes the newest frame
# (drops intermediate frames when inference is slower than send rate)
python examples/ws_video_infer.py \
    --video examples/videos/cup.mp4 \
    --model PekingU/rtdetr_r18vd \
    --task detect \
    --frame-policy latest

# Authenticated server
python examples/ws_video_infer.py \
    --video examples/videos/cup.mp4 \
    --task detect \
    --api-key my-secret-key
```

| Argument | Default | Description |
|---|---|---|
| `--video` | (required) | Path to a video file (mp4, avi, etc.) |
| `--task` | (required) | Inference task: `detect`, `segment`, `classify`, `vlm`, etc. |
| `--model` | `PekingU/rtdetr_r18vd` | HuggingFace model repo ID |
| `--max-frames` | 0 (all) | Maximum frames to send; 0 = entire video |
| `--fps-limit` | 0 (native) | Cap send rate in frames per second; 0 = no limit |
| `--frame-policy` | `queue` | `queue` (process every frame in order) or `latest` (skip stale frames) |
| `--api-key` | — | Bearer token for authenticated servers |
| `--host` | `127.0.0.1` | Server hostname |
| `--port` | `8110` | Server port |
| Policy | Behaviour | Best for |
|---|---|---|
| `queue` | Every frame is queued and processed in order. No frames are dropped. | Offline analysis, accuracy-critical tasks |
| `latest` | When the server is busy, older queued frames are dropped and only the most recent is kept. | Real-time / live-stream scenarios |
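Conceptually, the `latest` policy behaves like a one-slot buffer on the server side: a new frame silently evicts any stale frame still waiting. A small illustrative sketch, not the server's actual implementation:

```python
from collections import deque


class LatestFrameBuffer:
    """One-slot frame buffer modelling the `latest` policy: offering a new
    frame evicts any frame still waiting to be processed."""

    def __init__(self) -> None:
        self._slot = deque(maxlen=1)  # maxlen=1 drops the old entry on append

    def offer(self, frame) -> None:
        """Called by the receive loop for every incoming frame."""
        self._slot.append(frame)

    def take(self):
        """Called by the inference loop; returns the newest frame or None."""
        return self._slot.popleft() if self._slot else None
```

Under `queue`, the equivalent structure would be an unbounded FIFO, which is why no frames are dropped but latency grows whenever inference falls behind the send rate.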
Example output:

```text
[1/3] Creating session model='PekingU/rtdetr_r18vd' task='detect' frame_policy='queue'
      session_id : sess_a1b2c3d4e5f6
[2/3] Streaming 120/120 frames @ 30.0 fps
      [frame 0] 2 detections
      [frame 1] 2 detections
      [frame 2] 3 detections
      ...
      Sent 120 frames. Waiting for results…
      Sent    : 120 frames in 4.01s (29.9 fps)
      Received: 118 results | 0 dropped | 0 errors
[3/3] Deleting session sess_a1b2c3d4e5f6
      Session deleted (204)
```
| File | Description |
|---|---|
| `images/coco_cat_remote.jpg` | COCO-style photo with a cat and a TV remote — used by `rest_infer.py` and `rest_vlm.py` |
| `videos/cup.mp4` | Short clip of a cup — used by `ws_video_infer.py` |
- API reference — full endpoint specs and request/response schemas
- Streaming protocol — binary frame format and WebSocket lifecycle
- Deployment guide — Docker, GPU, and production configuration
- Root README — project overview, quick-start, and CLI reference