Scripts for deploying vLLM-based local LLM servers on GPU machines.
All endpoint URLs in these scripts use placeholder values. You must configure your actual server hostname before use.
| Variable | Description | Example |
|---|---|---|
| `DEPLOYER_HOST` | GPU server hostname | `my-gpu-server` |
| `DEPLOYER_PHI4_URL` | Override Phi-4 endpoint URL | `http://my-gpu-server:8001/v1` |
| `DEPLOYER_QWEN3_URL` | Override Qwen3 endpoint URL | `http://my-gpu-server:8002/v1` |
| `POETRY_BIN` | Path to poetry binary | `/usr/local/bin/poetry` |
| `POETRY_VENV` | Path to virtualenv activate script | `/path/to/venv/bin/activate` |
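As a hypothetical sketch (not the scripts' actual code), this is how an endpoint URL could be resolved from the variables above: use the explicit override if set, otherwise build the URL from `DEPLOYER_HOST`. The port numbers mirror the examples in the table and are assumptions.

```python
import os

def resolve_endpoints():
    # Fall back to URLs derived from DEPLOYER_HOST when no override is set.
    # Ports 8001/8002 follow the table's examples; adjust to your config.sh.
    host = os.environ.get("DEPLOYER_HOST", "my-gpu-server")
    return {
        "phi4": os.environ.get("DEPLOYER_PHI4_URL", f"http://{host}:8001/v1"),
        "qwen3": os.environ.get("DEPLOYER_QWEN3_URL", f"http://{host}:8002/v1"),
    }

endpoints = resolve_endpoints()
print(endpoints["phi4"])  # e.g. http://my-gpu-server:8001/v1
```

An explicit `DEPLOYER_*_URL` always wins over the host-derived default, so per-model overrides remain possible on a shared host.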
1. Edit `config.sh` with your GPU allocation and model paths.
2. Set environment variables (or edit the scripts directly):

   ```sh
   export DEPLOYER_HOST=my-gpu-server
   export POETRY_VENV=/path/to/venv/bin/activate
   ```

3. Start servers: `bash deployer/start.sh`
4. Check status: `bash deployer/status.sh`
5. Test endpoints: `python deployer/test_endpoints.py`
6. Stop servers: `bash deployer/stop.sh`
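After starting the servers, a quick liveness check can be done against vLLM's OpenAI-compatible API, which exposes `GET /v1/models`. This sketch is in the spirit of `test_endpoints.py` but is not its actual code; the base URL is the placeholder from the table above.

```python
import json
import urllib.request

def check_endpoint(base_url: str, timeout: float = 5.0) -> bool:
    """Return True if the server answers /models with a non-empty model list."""
    try:
        with urllib.request.urlopen(f"{base_url}/models", timeout=timeout) as resp:
            data = json.load(resp)
            return bool(data.get("data"))
    except OSError:
        # Connection refused, DNS failure, timeout: endpoint is not healthy.
        return False

if __name__ == "__main__":
    print(check_endpoint("http://my-gpu-server:8001/v1"))
```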
| Script | Purpose |
|---|---|
| `config.sh` | Server configuration (GPU allocation, model names, ports) |
| `start.sh` | Start vLLM instances and request router |
| `stop.sh` | Stop all running instances |
| `status.sh` | Check health of all endpoints |
| `test_endpoints.py` | Comprehensive endpoint testing |
| `benchmark.py` | Latency and throughput benchmarking |
| `request_router.py` | Load-balancing proxy for multi-instance models |
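The core idea behind a load-balancing proxy like `request_router.py` can be reduced to backend selection. This is a minimal round-robin sketch under assumed backend URLs, not the router's actual implementation (which also proxies the HTTP traffic):

```python
import itertools

class RoundRobin:
    """Cycle through backend URLs so requests spread evenly across instances."""

    def __init__(self, backends):
        self._cycle = itertools.cycle(backends)

    def next_backend(self) -> str:
        return next(self._cycle)

# Two hypothetical vLLM instances serving the same model on different ports.
rr = RoundRobin(["http://my-gpu-server:8001/v1", "http://my-gpu-server:8003/v1"])
print(rr.next_backend())  # alternates between the two backends on each call
```

Round-robin is the simplest policy; a real router might instead pick the backend with the fewest in-flight requests.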
If you see `RuntimeError: Deployer endpoint not configured`, the placeholder
URLs have not been replaced. Set the required environment variables listed
above.
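A guard that produces this kind of error might look like the following. This is a hypothetical sketch: the variable name, placeholder markers, and message wording are assumptions, not the scripts' actual code.

```python
import os

# Strings that indicate a URL was never customized (assumed markers).
PLACEHOLDER_MARKERS = ("your-server", "CHANGEME")

def require_endpoint(var: str) -> str:
    """Fail fast if the endpoint variable is unset or still a placeholder."""
    url = os.environ.get(var, "")
    if not url or any(m in url for m in PLACEHOLDER_MARKERS):
        raise RuntimeError(f"Deployer endpoint not configured: set {var}")
    return url
```

Failing at startup with a named variable is friendlier than letting requests time out against a placeholder host later.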