Scripts for deploying vLLM-based local LLM servers on GPU machines.
All endpoint URLs in these scripts use placeholder values. You must configure your actual server hostname before use.
| Variable | Description | Example |
|---|---|---|
| `DEPLOYER_HOST` | GPU server hostname | `my-gpu-server` |
| `DEPLOYER_PHI4_URL` | Override Phi-4 endpoint URL | `http://my-gpu-server:8001/v1` |
| `DEPLOYER_QWEN3_URL` | Override Qwen3 endpoint URL | `http://my-gpu-server:8002/v1` |
| `POETRY_BIN` | Path to poetry binary | `/usr/local/bin/poetry` |
| `POETRY_VENV` | Path to virtualenv activate script | `/path/to/venv/bin/activate` |
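As a hypothetical sketch (not the scripts' actual code), this is how an endpoint URL could be resolved from the variables above: use the explicit override if set, otherwise build the URL from `DEPLOYER_HOST`. The port numbers mirror the examples in the table and are assumptions.

```python
import os

def resolve_endpoints():
    # Fall back to URLs derived from DEPLOYER_HOST when no override is set.
    # Ports 8001/8002 follow the table's examples; adjust to your config.sh.
    host = os.environ.get("DEPLOYER_HOST", "my-gpu-server")
    return {
        "phi4": os.environ.get("DEPLOYER_PHI4_URL", f"http://{host}:8001/v1"),
        "qwen3": os.environ.get("DEPLOYER_QWEN3_URL", f"http://{host}:8002/v1"),
    }

endpoints = resolve_endpoints()
print(endpoints["phi4"])  # e.g. http://my-gpu-server:8001/v1
```

An explicit `DEPLOYER_*_URL` always wins over the host-derived default, so per-model overrides remain possible on a shared host.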
1. Edit `config.sh` with your GPU allocation and model paths.
2. Set environment variables (or edit the scripts directly):

   ```sh
   export DEPLOYER_HOST=my-gpu-server
   export POETRY_VENV=/path/to/venv/bin/activate
   ```

3. Start servers: `bash deployer/start.sh`
4. Check status: `bash deployer/status.sh`
5. Test endpoints: `python deployer/test_endpoints.py`
6. Stop servers: `bash deployer/stop.sh`
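After starting the servers, a quick liveness check can be done against vLLM's OpenAI-compatible API, which exposes `GET /v1/models`. This sketch is in the spirit of `test_endpoints.py` but is not its actual code; the base URL is the placeholder from the table above.

```python
import json
import urllib.request

def check_endpoint(base_url: str, timeout: float = 5.0) -> bool:
    """Return True if the server answers /models with a non-empty model list."""
    try:
        with urllib.request.urlopen(f"{base_url}/models", timeout=timeout) as resp:
            data = json.load(resp)
            return bool(data.get("data"))
    except OSError:
        # Connection refused, DNS failure, timeout: endpoint is not healthy.
        return False

if __name__ == "__main__":
    print(check_endpoint("http://my-gpu-server:8001/v1"))
```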
| Script | Purpose |
|---|---|
| `config.sh` | Server configuration (GPU allocation, model names, ports) |
| `start.sh` | Start vLLM instances and request router |
| `stop.sh` | Stop all running instances |
| `status.sh` | Check health of all endpoints |
| `test_endpoints.py` | Comprehensive endpoint testing |
| `benchmark.py` | Latency and throughput benchmarking |
| `request_router.py` | Load-balancing proxy for multi-instance models |
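The core idea behind a load-balancing proxy like `request_router.py` can be reduced to backend selection. This is a minimal round-robin sketch under assumed backend URLs, not the router's actual implementation (which also proxies the HTTP traffic):

```python
import itertools

class RoundRobin:
    """Cycle through backend URLs so requests spread evenly across instances."""

    def __init__(self, backends):
        self._cycle = itertools.cycle(backends)

    def next_backend(self) -> str:
        return next(self._cycle)

# Two hypothetical vLLM instances serving the same model on different ports.
rr = RoundRobin(["http://my-gpu-server:8001/v1", "http://my-gpu-server:8003/v1"])
print(rr.next_backend())  # alternates between the two backends on each call
```

Round-robin is the simplest policy; a real router might instead pick the backend with the fewest in-flight requests.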
If you see `RuntimeError: Deployer endpoint not configured`, the placeholder
URLs have not been replaced. Set the required environment variables listed
above.
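A guard that produces this kind of error might look like the following. This is a hypothetical sketch: the variable name, placeholder markers, and message wording are assumptions, not the scripts' actual code.

```python
import os

# Strings that indicate a URL was never customized (assumed markers).
PLACEHOLDER_MARKERS = ("your-server", "CHANGEME")

def require_endpoint(var: str) -> str:
    """Fail fast if the endpoint variable is unset or still a placeholder."""
    url = os.environ.get(var, "")
    if not url or any(m in url for m in PLACEHOLDER_MARKERS):
        raise RuntimeError(f"Deployer endpoint not configured: set {var}")
    return url
```

Failing at startup with a named variable is friendlier than letting requests time out against a placeholder host later.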