
# Deployer — Local LLM Model Serving

Scripts for deploying vLLM-based local LLM servers on GPU machines.

## URL Configuration

All endpoint URLs in these scripts use placeholder values. You must configure your actual server hostname before use.

### Required Environment Variables

| Variable | Description | Example |
| --- | --- | --- |
| `DEPLOYER_HOST` | GPU server hostname | `my-gpu-server` |
| `DEPLOYER_PHI4_URL` | Override Phi-4 endpoint URL | `http://my-gpu-server:8001/v1` |
| `DEPLOYER_QWEN3_URL` | Override Qwen3 endpoint URL | `http://my-gpu-server:8002/v1` |
| `POETRY_BIN` | Path to `poetry` binary | `/usr/local/bin/poetry` |
| `POETRY_VENV` | Path to virtualenv activate script | `/path/to/venv/bin/activate` |
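
A minimal sketch of the assumed precedence between these variables: an explicit `*_URL` override wins, otherwise the URL is built from `DEPLOYER_HOST` plus a per-model port. The ports are taken from the Example column above; the exact resolution logic in the scripts may differ.

```python
import os

# Assumed precedence: an explicit *_URL override wins; otherwise the URL is
# built from DEPLOYER_HOST plus a per-model port (ports from the table above).
host = os.environ.get("DEPLOYER_HOST", "localhost")
phi4_url = os.environ.get("DEPLOYER_PHI4_URL", f"http://{host}:8001/v1")
qwen3_url = os.environ.get("DEPLOYER_QWEN3_URL", f"http://{host}:8002/v1")
print(phi4_url)
print(qwen3_url)
```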

## Setup

1. Edit `config.sh` with your GPU allocation and model paths.
2. Set environment variables (or edit the scripts directly):

   ```bash
   export DEPLOYER_HOST=my-gpu-server
   export POETRY_VENV=/path/to/venv/bin/activate
   ```

3. Start servers: `bash deployer/start.sh`
4. Check status: `bash deployer/status.sh`
5. Test endpoints: `python deployer/test_endpoints.py` (a quick manual check is sketched after this list)
6. Stop servers: `bash deployer/stop.sh`
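
Once the servers are up, you can sanity-check an endpoint by hand. vLLM serves an OpenAI-compatible API, so a plain HTTP request works; the model name below is an assumption, so confirm it first via `GET <url>/models`.

```python
import json
import os
from urllib.request import Request, urlopen

# Endpoint from the environment; falls back to the example URL above.
url = os.environ.get("DEPLOYER_PHI4_URL", "http://my-gpu-server:8001/v1")

payload = {
    "model": "phi-4",  # assumed model name; confirm via GET <url>/models
    "messages": [{"role": "user", "content": "Say hello."}],
    "max_tokens": 32,
}
req = Request(
    url + "/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urlopen(req) as resp:
    print(json.load(resp)["choices"][0]["message"]["content"])
```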

## Scripts

| Script | Purpose |
| --- | --- |
| `config.sh` | Server configuration (GPU allocation, model names, ports) |
| `start.sh` | Start vLLM instances and request router |
| `stop.sh` | Stop all running instances |
| `status.sh` | Check health of all endpoints |
| `test_endpoints.py` | Comprehensive endpoint testing |
| `benchmark.py` | Latency and throughput benchmarking |
| `request_router.py` | Load-balancing proxy for multi-instance models |
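
`request_router.py` balances requests across multiple instances of the same model. Its actual implementation is not shown here; the sketch below only illustrates the round-robin idea, with hypothetical backend ports.

```python
import itertools
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import Request, urlopen

# Hypothetical backend instances of one model; real ports come from config.sh.
BACKENDS = itertools.cycle(["http://localhost:8001", "http://localhost:8002"])

class RoundRobinProxy(BaseHTTPRequestHandler):
    def do_POST(self):
        # Pick the next backend and forward the request body unchanged.
        backend = next(BACKENDS)
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length)
        req = Request(backend + self.path, data=body,
                      headers={"Content-Type": "application/json"})
        with urlopen(req) as resp:
            payload = resp.read()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(payload)

if __name__ == "__main__":
    HTTPServer(("", 8000), RoundRobinProxy).serve_forever()
```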

## Error Messages

If you see `RuntimeError: Deployer endpoint not configured`, it means the placeholder URLs have not been replaced. Set the required environment variables listed above.
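
Internally this check is likely a simple guard. A sketch of what it might look like, assuming a placeholder sentinel; the token below is hypothetical, not the one the scripts actually ship with.

```python
import os

# Hypothetical sentinel; the real scripts may ship a different placeholder.
PLACEHOLDER_HOST = "your-gpu-server"

def require_configured(url: str) -> str:
    """Raise the error above if the URL still contains the placeholder."""
    if PLACEHOLDER_HOST in url:
        raise RuntimeError("Deployer endpoint not configured")
    return url

phi4_url = require_configured(
    os.environ.get("DEPLOYER_PHI4_URL", f"http://{PLACEHOLDER_HOST}:8001/v1")
)
```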