This directory contains runnable client examples for MATA-SERVER.
| Script | Transport | Tasks covered |
|---|---|---|
| `rest_infer.py` | REST (`POST /v1/infer`) | detect, classify, segment |
| `rest_vlm.py` | REST (`POST /v1/infer`) | vlm |
| `ws_video_infer.py` | WebSocket (`WS /v1/stream/{id}`) | detect, segment, classify, vlm |
Prerequisites — a MATA-SERVER instance must be reachable before running any example. Start one locally with:
```bash
MATA_SERVER_AUTH_MODE=none mataserver serve
```

All examples default to `127.0.0.1:8110`. Pass `--host`/`--port` to override.
## rest_infer.py

Runs single-shot inference against the REST API using a base64-encoded image payload. Covers the three classic vision tasks — object detection, image classification, and instance segmentation.
```bash
pip install requests

# Run all three tasks with the default model set
python examples/rest_infer.py --image examples/images/coco_cat_remote.jpg

# Run a single task
python examples/rest_infer.py \
    --image examples/images/coco_cat_remote.jpg \
    --task detect \
    --model PekingU/rtdetr_r18vd

# Zero-shot open-vocabulary detection with text prompts
python examples/rest_infer.py \
    --image examples/images/coco_cat_remote.jpg \
    --task detect \
    --model google/owlv2-base-patch16-ensemble \
    --prompts "cat,dog,remote control"
```

| Argument | Default | Description |
|---|---|---|
| `--image` | (required) | Path to an image file |
| `--task` | all three | One of `detect`, `classify`, `segment` |
| `--model` | task default | HuggingFace model repo ID; defaults to the bundled task-to-model map |
| `--prompts` | — | Comma-separated text prompts for zero-shot / open-vocabulary models |
| `--host` | `127.0.0.1` | Server hostname |
| `--port` | `8110` | Server port |
| Task | Default model |
|---|---|
| detect | `PekingU/rtdetr_r18vd` |
| classify | `google/vit-base-patch16-224` |
| segment | `facebook/mask2former-swin-tiny-coco-instance` |
Example output:

```text
--- Task: DETECT | Model: PekingU/rtdetr_r18vd ---
3 detection(s)
  [0.94] cat     bbox=[42, 10, 380, 470]
  [0.81] remote  bbox=[200, 300, 260, 420]
  [0.57] couch   bbox=[0, 250, 480, 480]
Full response keys: ['schema_version', 'task', 'model', 'timestamp', 'detections']

--- Task: CLASSIFY | Model: google/vit-base-patch16-224 ---
5 class(es)
  [0.84] tabby cat
  [0.07] Egyptian cat
  ...

--- Task: SEGMENT | Model: facebook/mask2former-swin-tiny-coco-instance ---
4 segment(s)
  [0.91] cat  bbox=[42, 10, 380, 470]
  ...
```
## rest_vlm.py

Sends an image and a natural-language prompt to a Visual Language Model (VLM) via the REST API and prints the generated response text.
```bash
pip install requests

# Basic question about an image
python examples/rest_vlm.py \
    --image examples/images/coco_cat_remote.jpg \
    --prompt "What do you see in this image?"

# Control generation parameters
python examples/rest_vlm.py \
    --image examples/images/coco_cat_remote.jpg \
    --prompt "List every object you can identify." \
    --max-tokens 256 \
    --temperature 0.3

# Use a different VLM
python examples/rest_vlm.py \
    --image examples/images/coco_cat_remote.jpg \
    --prompt "Describe the scene in one sentence." \
    --model Qwen/Qwen2.5-VL-7B-Instruct
```

| Argument | Default | Description |
|---|---|---|
| `--image` | (required) | Path to an image file |
| `--prompt` | `"Describe this image."` | Natural-language question or instruction |
| `--model` | `Qwen/Qwen2.5-VL-3B-Instruct` | VLM model repo ID |
| `--max-tokens` | — | Maximum number of tokens to generate |
| `--temperature` | — | Sampling temperature (0.0 = greedy, higher = more creative) |
| `--host` | `127.0.0.1` | Server hostname |
| `--port` | `8110` | Server port |
Example output:

```text
--- VLM Inference ---
Model  : Qwen/Qwen2.5-VL-3B-Instruct
Prompt : 'What do you see in this image?'
Response:
The image shows a cat sitting on a couch next to a remote control. The cat appears to be relaxed and is looking towards the camera.
```
## ws_video_infer.py

Streams a local video file to MATA-SERVER over a WebSocket connection and prints inference results frame-by-frame as they arrive.
Implements the full session lifecycle:
1. `POST /v1/sessions` — create a streaming session and receive a `session_id`
2. `WS /v1/stream/{session_id}` — connect and stream binary-encoded frames
3. `DELETE /v1/sessions/{session_id}` — clean up after streaming ends
Frames are encoded using the MATA binary wire format: a 13-byte header (frame_id uint32 BE + timestamp float64 BE + encoding uint8) followed by JPEG bytes.
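That framing is straightforward to reproduce with Python's `struct` module: 4 + 8 + 1 bytes of big-endian header, then the JPEG payload. A minimal sketch (the numeric value of the `encoding` byte for JPEG is an assumption here; the streaming-protocol doc is authoritative):

```python
import struct

# 13-byte header, big-endian: frame_id uint32, timestamp float64, encoding uint8.
HEADER_FMT = ">IdB"
assert struct.calcsize(HEADER_FMT) == 13


def encode_frame(frame_id: int, timestamp: float, jpeg: bytes,
                 encoding: int = 1) -> bytes:
    """Pack one frame: 13-byte header followed by the JPEG bytes.

    NOTE: encoding=1 for JPEG is an assumed value, not confirmed.
    """
    return struct.pack(HEADER_FMT, frame_id, timestamp, encoding) + jpeg


def decode_header(frame: bytes) -> tuple:
    """Unpack (frame_id, timestamp, encoding) from a wire-format frame."""
    return struct.unpack(HEADER_FMT, frame[:struct.calcsize(HEADER_FMT)])
```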
```bash
pip install aiohttp opencv-python

# Object detection on a video
python examples/ws_video_infer.py \
    --video examples/videos/cup.mp4 \
    --task detect

# Limit to first 60 frames and cap the send rate
python examples/ws_video_infer.py \
    --video examples/videos/cup.mp4 \
    --task detect \
    --max-frames 60 \
    --fps-limit 15

# Use the "latest" frame policy — server always processes the newest frame
# (drops intermediate frames when inference is slower than send rate)
python examples/ws_video_infer.py \
    --video examples/videos/cup.mp4 \
    --model PekingU/rtdetr_r18vd \
    --task detect \
    --frame-policy latest

# Authenticated server
python examples/ws_video_infer.py \
    --video examples/videos/cup.mp4 \
    --task detect \
    --api-key my-secret-key
```

| Argument | Default | Description |
|---|---|---|
| `--video` | (required) | Path to a video file (mp4, avi, etc.) |
| `--task` | (required) | Inference task: `detect`, `segment`, `classify`, `vlm`, etc. |
| `--model` | `PekingU/rtdetr_r18vd` | HuggingFace model repo ID |
| `--max-frames` | 0 (all) | Maximum frames to send; 0 = entire video |
| `--fps-limit` | 0 (native) | Cap send rate in frames per second; 0 = no limit |
| `--frame-policy` | `queue` | `queue` (process every frame in order) or `latest` (skip stale frames) |
| `--api-key` | — | Bearer token for authenticated servers |
| `--host` | `127.0.0.1` | Server hostname |
| `--port` | `8110` | Server port |
| Policy | Behaviour | Best for |
|---|---|---|
| `queue` | Every frame is queued and processed in order. No frames are dropped. | Offline analysis, accuracy-critical tasks |
| `latest` | When the server is busy, older queued frames are dropped and only the most recent is kept. | Real-time / live-stream scenarios |
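Conceptually, the `latest` policy behaves like a one-slot buffer on the server side: a new frame silently evicts any stale frame still waiting. A small illustrative sketch, not the server's actual implementation:

```python
from collections import deque


class LatestFrameBuffer:
    """One-slot frame buffer modelling the `latest` policy: offering a new
    frame evicts any frame still waiting to be processed."""

    def __init__(self) -> None:
        self._slot = deque(maxlen=1)  # maxlen=1 drops the old entry on append

    def offer(self, frame) -> None:
        """Called by the receive loop for every incoming frame."""
        self._slot.append(frame)

    def take(self):
        """Called by the inference loop; returns the newest frame or None."""
        return self._slot.popleft() if self._slot else None
```

Under `queue`, the equivalent structure would be an unbounded FIFO, which is why no frames are dropped but latency grows whenever inference falls behind the send rate.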
Example output:

```text
[1/3] Creating session model='PekingU/rtdetr_r18vd' task='detect' frame_policy='queue'
      session_id : sess_a1b2c3d4e5f6
[2/3] Streaming 120/120 frames @ 30.0 fps
      [frame 0] 2 detections
      [frame 1] 2 detections
      [frame 2] 3 detections
      ...
      Sent 120 frames. Waiting for results…
      Sent    : 120 frames in 4.01s (29.9 fps)
      Received: 118 results | 0 dropped | 0 errors
[3/3] Deleting session sess_a1b2c3d4e5f6
      Session deleted (204)
```
| File | Description |
|---|---|
| `images/coco_cat_remote.jpg` | COCO-style photo with a cat and a TV remote — used by `rest_infer.py` and `rest_vlm.py` |
| `videos/cup.mp4` | Short clip of a cup — used by `ws_video_infer.py` |
- API reference — full endpoint specs and request/response schemas
- Streaming protocol — binary frame format and WebSocket lifecycle
- Deployment guide — Docker, GPU, and production configuration
- Root README — project overview, quick-start, and CLI reference