
# Examples

This directory contains runnable client examples for MATA-SERVER.

| Script | Transport | Tasks covered |
| --- | --- | --- |
| `rest_infer.py` | REST (`POST /v1/infer`) | detect, classify, segment |
| `rest_vlm.py` | REST (`POST /v1/infer`) | vlm |
| `ws_video_infer.py` | WebSocket (`WS /v1/stream/{id}`) | detect, segment, classify, vlm |

## Prerequisites

A MATA-SERVER instance must be reachable before running any example. Start one locally with:

```bash
MATA_SERVER_AUTH_MODE=none mataserver serve
```

All examples default to `127.0.0.1:8110`. Pass `--host` / `--port` to override.


## rest_infer.py

Runs single-shot inference against the REST API using a base64-encoded image payload.
Covers the three classic vision tasks: object detection, image classification, and instance segmentation.

### Requirements

```bash
pip install requests
```

### Usage

```bash
# Run all three tasks with the default model set
python examples/rest_infer.py --image examples/images/coco_cat_remote.jpg

# Run a single task
python examples/rest_infer.py \
  --image examples/images/coco_cat_remote.jpg \
  --task detect \
  --model PekingU/rtdetr_r18vd

# Zero-shot open-vocabulary detection with text prompts
python examples/rest_infer.py \
  --image examples/images/coco_cat_remote.jpg \
  --task detect \
  --model google/owlv2-base-patch16-ensemble \
  --prompts "cat,dog,remote control"
```

### Arguments

| Argument | Default | Description |
| --- | --- | --- |
| `--image` | (required) | Path to an image file |
| `--task` | all three | One of `detect`, `classify`, `segment` |
| `--model` | task default | HuggingFace model repo ID; defaults to the bundled task-to-model map |
| `--prompts` | | Comma-separated text prompts for zero-shot / open-vocabulary models |
| `--host` | `127.0.0.1` | Server hostname |
| `--port` | `8110` | Server port |

### Default models

| Task | Default model |
| --- | --- |
| detect | `PekingU/rtdetr_r18vd` |
| classify | `google/vit-base-patch16-224` |
| segment | `facebook/mask2former-swin-tiny-coco-instance` |

### Sample output

```text
--- Task: DETECT  |  Model: PekingU/rtdetr_r18vd ---
  3 detection(s)
    [0.94] cat  bbox=[42, 10, 380, 470]
    [0.81] remote  bbox=[200, 300, 260, 420]
    [0.57] couch  bbox=[0, 250, 480, 480]
  Full response keys: ['schema_version', 'task', 'model', 'timestamp', 'detections']

--- Task: CLASSIFY  |  Model: google/vit-base-patch16-224 ---
  5 class(es)
    [0.84] tabby cat
    [0.07] Egyptian cat
    ...

--- Task: SEGMENT  |  Model: facebook/mask2former-swin-tiny-coco-instance ---
  4 segment(s)
    [0.91] cat  bbox=[42, 10, 380, 470]
    ...
```

## rest_vlm.py

Sends an image and a natural-language prompt to a Visual Language Model (VLM) via the REST API and prints the generated response text.

### Requirements

```bash
pip install requests
```

### Usage

```bash
# Basic question about an image
python examples/rest_vlm.py \
  --image examples/images/coco_cat_remote.jpg \
  --prompt "What do you see in this image?"

# Control generation parameters
python examples/rest_vlm.py \
  --image examples/images/coco_cat_remote.jpg \
  --prompt "List every object you can identify." \
  --max-tokens 256 \
  --temperature 0.3

# Use a different VLM
python examples/rest_vlm.py \
  --image examples/images/coco_cat_remote.jpg \
  --prompt "Describe the scene in one sentence." \
  --model Qwen/Qwen2.5-VL-7B-Instruct
```

### Arguments

| Argument | Default | Description |
| --- | --- | --- |
| `--image` | (required) | Path to an image file |
| `--prompt` | `"Describe this image."` | Natural-language question or instruction |
| `--model` | `Qwen/Qwen2.5-VL-3B-Instruct` | VLM model repo ID |
| `--max-tokens` | | Maximum number of tokens to generate |
| `--temperature` | | Sampling temperature (0.0 = greedy, higher = more creative) |
| `--host` | `127.0.0.1` | Server hostname |
| `--port` | `8110` | Server port |

### Sample output

```text
--- VLM Inference ---
  Model  : Qwen/Qwen2.5-VL-3B-Instruct
  Prompt : 'What do you see in this image?'

  Response:
The image shows a cat sitting on a couch next to a remote control. The cat appears to be relaxed and is looking towards the camera.
```

## ws_video_infer.py

Streams a local video file to MATA-SERVER over a WebSocket connection and prints inference results frame by frame as they arrive.
Implements the full session lifecycle:

1. `POST /v1/sessions` creates a streaming session and returns a `session_id`
2. `WS /v1/stream/{session_id}` connects and streams binary-encoded frames
3. `DELETE /v1/sessions/{session_id}` cleans up after streaming ends

Frames are encoded using the MATA binary wire format: a 13-byte header (`frame_id` uint32 BE + `timestamp` float64 BE + `encoding` uint8) followed by JPEG bytes.
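The 13-byte header can be packed with Python's `struct` module; the format string `>IdB` is big-endian uint32 + float64 + uint8 with no padding (4 + 8 + 1 = 13 bytes). The numeric value used for the JPEG encoding code below is an assumption for illustration.

```python
import struct

# MATA binary wire format: frame_id uint32 BE, timestamp float64 BE,
# encoding uint8, followed by the JPEG payload.
HEADER_FMT = ">IdB"   # ">" = big-endian, no padding; total size 13 bytes
JPEG_ENCODING = 0     # assumption: actual encoding codes are server-defined


def encode_frame(frame_id: int, timestamp: float, jpeg_bytes: bytes) -> bytes:
    """Prefix JPEG bytes with the 13-byte stream header."""
    header = struct.pack(HEADER_FMT, frame_id, timestamp, JPEG_ENCODING)
    return header + jpeg_bytes


def decode_frame(data: bytes) -> tuple[int, float, int, bytes]:
    """Split a wire message back into header fields and payload."""
    frame_id, timestamp, encoding = struct.unpack_from(HEADER_FMT, data, 0)
    return frame_id, timestamp, encoding, data[struct.calcsize(HEADER_FMT):]
```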

### Requirements

```bash
pip install aiohttp opencv-python
```

### Usage

```bash
# Object detection on a video
python examples/ws_video_infer.py \
  --video examples/videos/cup.mp4 \
  --task detect

# Limit to first 60 frames and cap the send rate
python examples/ws_video_infer.py \
  --video examples/videos/cup.mp4 \
  --task detect \
  --max-frames 60 \
  --fps-limit 15

# Use the "latest" frame policy: the server always processes the newest frame
# (drops intermediate frames when inference is slower than the send rate)
python examples/ws_video_infer.py \
  --video examples/videos/cup.mp4 \
  --model PekingU/rtdetr_r18vd \
  --task detect \
  --frame-policy latest

# Authenticated server
python examples/ws_video_infer.py \
  --video examples/videos/cup.mp4 \
  --task detect \
  --api-key my-secret-key
```

### Arguments

| Argument | Default | Description |
| --- | --- | --- |
| `--video` | (required) | Path to a video file (mp4, avi, etc.) |
| `--task` | (required) | Inference task: `detect`, `segment`, `classify`, `vlm`, etc. |
| `--model` | `PekingU/rtdetr_r18vd` | HuggingFace model repo ID |
| `--max-frames` | `0` (all) | Maximum frames to send; 0 = entire video |
| `--fps-limit` | `0` (native) | Cap send rate in frames per second; 0 = no limit |
| `--frame-policy` | `queue` | `queue` (process every frame in order) or `latest` (skip stale frames) |
| `--api-key` | | Bearer token for authenticated servers |
| `--host` | `127.0.0.1` | Server hostname |
| `--port` | `8110` | Server port |

### Frame policies

| Policy | Behaviour | Best for |
| --- | --- | --- |
| `queue` | Every frame is queued and processed in order; no frames are dropped. | Offline analysis, accuracy-critical tasks |
| `latest` | When the server is busy, older queued frames are dropped and only the most recent is kept. | Real-time / live-stream scenarios |
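The difference between the two policies can be sketched with a bounded queue: a `latest` queue overwrites the pending frame instead of letting a backlog build up. This is a conceptual model, not MATA-SERVER's actual implementation.

```python
from collections import deque


class FrameQueue:
    """Conceptual model of the two frame policies."""

    def __init__(self, policy: str = "queue"):
        # "queue":  unbounded FIFO, every frame survives.
        # "latest": maxlen=1, pushing a new frame silently drops the old one.
        self.frames = deque(maxlen=1 if policy == "latest" else None)

    def push(self, frame):
        self.frames.append(frame)

    def pop(self):
        return self.frames.popleft() if self.frames else None
```

With `policy="queue"`, three pushed frames come back in order; with `policy="latest"`, only the most recent frame remains when the consumer falls behind.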

### Sample output

```text
[1/3] Creating session  model='PekingU/rtdetr_r18vd'  task='detect'  frame_policy='queue'
  session_id : sess_a1b2c3d4e5f6

[2/3] Streaming 120/120 frames @ 30.0 fps
  [frame    0] 2 detections
  [frame    1] 2 detections
  [frame    2] 3 detections
  ...
  Sent 120 frames. Waiting for results…

  Sent    : 120 frames in 4.01s (29.9 fps)
  Received: 118 results | 0 dropped | 0 errors

[3/3] Deleting session sess_a1b2c3d4e5f6
  Session deleted (204)
```

## Sample assets

| File | Description |
| --- | --- |
| `images/coco_cat_remote.jpg` | COCO-style photo with a cat and a TV remote; used by `rest_infer.py` and `rest_vlm.py` |
| `videos/cup.mp4` | Short clip of a cup; used by `ws_video_infer.py` |

## Further reading