ops: set-model.ps1 — atomic vLLM model swap on a running host#14
Open
jieyao-MilestoneHub wants to merge 1 commit into
Drops a different model into the running vLLM container without a
redeploy. Composes with the existing day-to-day ops loop
(setup-ssh.ps1 → fix-and-start.ps1 → develop / experiment →
restore-idle-protection.ps1) and shares the same conventions:
tag-based instance discovery, IPv4-validated EIP, hardened SSH
options, idempotent re-runs, deterministic exit codes.
Threat model
T1. Argument injection. Every parameter is whitelist-validated
client-side via PowerShell ValidatePattern / ValidateSet /
ValidateRange. Validation re-runs on the remote host as defence
in depth. Values transit as a single ConvertTo-Json blob via fd 3
to bash -s; the remote script reads them with jq, so they never
enter the shell-tokenisation surface.
T2. Compose-file corruption. The edit is atomic: an awk state machine
rewrites a copy, ``docker compose config -q`` validates the
result, then ``mv`` swaps it in. The state machine only touches
the ``vllm:`` service's ``command:`` list; sibling services and
other keys are untouched. Originals are preserved under
``/opt/llm-gateway/deploy/.swap-history/`` for forensic diff /
manual rollback. Optional flags (--quantization /
--tool-call-parser) are dropped when set to "none" and inserted
immediately after --max-model-len when newly added.
T3. No source-code mutation over SSH. Before any swap, a READ-ONLY
pre-flight searches the deployed registry.py for the requested
served_model_name. Unknown names exit 1 with a pointer to open a
registry PR. Mutating Python source over SSH would defeat the
gateway's normal code-review path.
T4. Concurrent operators. ``flock -n
/var/lock/llm-gateway-set-model.lock`` makes parallel swaps fail
fast instead of interleaving compose edits.
T5. Bearer-token leakage. The script never reads .env, never logs
secrets, and sends no traffic to the bearer-gated /v1 surface.
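A minimal sketch of how the T2 edit sequence and the T4 lock might fit together; it is an illustration, not the script's actual code. The helper names (validate_compose, rewrite_model, swap_compose), the two-space compose indentation, and the choice to rewrite only the value after "- --model" are all assumptions made for the example.

```shell
#!/usr/bin/env bash
set -eu

# Default validator; assumes the Docker Compose v2 CLI is installed.
validate_compose() {
  docker compose -f "$1" config -q
}

# Hypothetical helper: print the compose file with the value that follows
# "- --model" replaced, but only inside the command list of the vllm service.
rewrite_model() {
  awk -v model="$2" '
    # Service keys sit at two-space indent: track whether we are in vllm.
    /^  [^ #][^:]*:/   { in_vllm = ($0 ~ /^  vllm:/); in_cmd = 0 }
    # Keys of a service sit at four-space indent: track the command list.
    /^    [^ -][^:]*:/ { in_cmd = (in_vllm && $0 ~ /^    command:/) }
    # The list item right after "- --model" is the value to replace.
    in_cmd && want_value { sub(/- .*/, "- " model); want_value = 0; print; next }
    in_cmd && /^ *- --model *$/ { want_value = 1 }
    { print }
  ' "$1"
}

# Hypothetical helper: swap_compose <compose-file> <model> <lock-file> [validator]
swap_compose() {
  local compose="$1" model="$2" lock="$3" validator="${4:-validate_compose}"
  local dir
  dir=$(dirname "$compose")
  (
    # Fail fast when another operator already holds the lock on fd 9.
    flock -n 9 || { echo "another swap is in progress" >&2; exit 1; }
    # Preserve the original for forensic diff / manual rollback.
    mkdir -p "$dir/.swap-history" || exit 1
    cp -p -- "$compose" "$dir/.swap-history/$(basename "$compose").$(date +%s)" || exit 1
    # Work on a copy in the same directory so the final mv is a rename.
    tmp=$(mktemp "$dir/.compose.XXXXXX") || exit 1
    cp -p -- "$compose" "$tmp" || exit 1
    if rewrite_model "$compose" "$model" > "$tmp" && "$validator" "$tmp"; then
      mv -- "$tmp" "$compose"
    else
      rm -f -- "$tmp"
      exit 1
    fi
  ) 9>"$lock"
}
```

Under these assumptions, a call such as swap_compose "$COMPOSE_FILE" "$MODEL" /var/lock/llm-gateway-set-model.lock either replaces the file in one rename or leaves it untouched. A failed validation or a held lock returns non-zero before the live file is modified.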
Idempotency. Re-running with the currently-loaded model is a no-op: the
script reads the existing compose file, finds every requested flag
already in place, verifies that /ready returns 200, and exits 0 without
touching anything.
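The no-op decision described above could look roughly like the sketch below. The helper names flags_in_place and is_noop are hypothetical, and the readiness probe is passed in as a command because the /ready URL is not given here. For brevity the check scans the whole compose file rather than only the vllm service.

```shell
#!/usr/bin/env bash
set -eu

# Hypothetical helper: flags_in_place <compose-file> <flag> <value> [<flag> <value> ...]
# Succeeds only if every flag is a list item immediately followed by its value.
flags_in_place() {
  local file="$1"
  shift
  while [ "$#" -ge 2 ]; do
    awk -v flag="$1" -v val="$2" '
      # A list item directly after the flag is its value; strip YAML quotes.
      prev_is_flag && /^ *- / {
        v = $0; sub(/^ *- */, "", v); gsub(/"/, "", v)
        if ((v "") == (val "")) found = 1
      }
      # Remember whether the current line is the flag itself.
      { f = $0; sub(/^ *- */, "", f); prev_is_flag = (f == flag) }
      END { exit(found ? 0 : 1) }
    ' "$file" || return 1
    shift 2
  done
}

# Hypothetical helper: is_noop <compose-file> <ready-probe-command> <flag> <value> ...
# Returns 0 when the swap can be skipped: flags match and the probe succeeds.
is_noop() {
  local compose="$1" ready_probe="$2"
  shift 2
  flags_in_place "$compose" "$@" && "$ready_probe"
}
```

A caller would supply its own probe, for example a small function wrapping curl -fsS against the host's /ready endpoint, and exit 0 early when is_noop succeeds.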
SOLID
- SRP: one responsibility — atomic compose-flag rewrite + service
recreate. Registry validation is read-only; no source mutation.
- OCP: adding a new vLLM flag = one new awk branch + one
verify_present call + one ValidatePattern. Adding a new candidate
model = a separate registry PR; this script does not change.
- DIP: discovery is tag-based; the script never hardcodes
instance/EIP. Operators inject -InstanceId / -Eip for non-standard
layouts without touching code.
Summary
Adds scripts/ops/set-model.ps1, which drops a different model into the running vLLM container without a redeploy. Composes with the existing day-to-day ops loop (setup-ssh.ps1 → fix-and-start.ps1 → develop / experiment → restore-idle-protection.ps1) and shares the same conventions: tag-based instance discovery, IPv4-validated EIP, hardened SSH options, idempotent re-runs, deterministic exit codes.
Use cases
- … --gpu-memory-utilization, --max-model-len, --tool-call-parser) for an ablation, with auto-archived rollback.
- … -PrePull) so model-load latency does not pollute timings.
Threat model
T1. Argument injection. Every parameter is whitelist-validated client-side (ValidatePattern / ValidateSet / ValidateRange). Validation re-runs on the remote host as defence in depth. Values transit as a single ConvertTo-Json blob via fd 3 to bash -s; the remote script reads them with jq, so they never enter the shell-tokenisation surface.
T2. Compose-file corruption. The edit is atomic: an awk state machine rewrites a copy, docker compose config -q validates, then mv swaps it in. The state machine only touches the vllm: service's command: list; sibling services and other keys are untouched. Originals are preserved under /opt/llm-gateway/deploy/.swap-history/ for forensic diff / manual rollback. Optional flags (--quantization / --tool-call-parser) are dropped when none and inserted immediately after --max-model-len when newly added.
T3. No source-code mutation over SSH. Before any swap, a READ-ONLY pre-flight searches the deployed registry.py for the requested served_model_name (grep -F). Unknown names exit 1 with a pointer to open a registry PR. Mutating Python source over SSH would defeat the gateway's normal code-review path.
T4. Concurrent operators. flock -n /var/lock/llm-gateway-set-model.lock makes parallel swaps fail fast instead of interleaving compose edits.
T5. Bearer-token leakage. The script never reads .env, never logs secrets, runs no traffic against the bearer-gated /v1 surface.
Idempotency
Re-running with the currently-loaded model is a no-op: read the existing compose, find every requested flag already in place, verify /ready is 200, exit 0 without touching anything.
Exit codes
- … /ready 200
- … docker compose up failed
- … /ready not 200 within budget
These are stable: adding a new code is a breaking change for any caller that branches on values; prefer reusing an existing one.
Test plan
This repo's CI is Python-only (no PSScriptAnalyzer / Pester). The script's correctness is verified by manual smoke against a real dev EC2 instance:
- … /ready reaffirmed
- … /ready 200 within budget
- … -PrePull and a fresh model; expect HF cache populated, no service recreation, exit 0
- … -Model 'foo/bar; rm -rf /'; expect ValidatePattern rejection client-side, no SSH attempted
Footprint
No code outside scripts/ops/. No CI changes. No Python touch.