chore(deps): update dependency vllm to v0.19.0 [security] #791
Open
renovate[bot] wants to merge 1 commit into main from
This PR contains the following updates:
vllm: `0.18.0` → `0.19.0`

Warning
Some dependencies could not be looked up. Check the Dependency Dashboard for more information.
GitHub Vulnerability Alerts
CVE-2026-34756
Summary
A Denial of Service vulnerability exists in the vLLM OpenAI-compatible API server. Due to the lack of upper-bound validation on the `n` parameter in the `ChatCompletionRequest` and `CompletionRequest` Pydantic models, an unauthenticated attacker can send a single HTTP request with an astronomically large `n` value. This completely blocks the Python `asyncio` event loop and causes immediate Out-Of-Memory crashes by allocating millions of request object copies on the heap before the request even reaches the scheduling queue.

Details
The root cause of this vulnerability lies in the missing upper bound checks across the request parsing and asynchronous scheduling layers:
- In `vllm/entrypoints/openai/chat_completion/protocol.py`, the `n` parameter is defined simply as an integer without any `pydantic.Field` constraints for an upper bound.
- When the API request is converted to internal `SamplingParams` in `vllm/sampling_params.py`, the `_verify_args` method only checks the lower bound (`self.n < 1`), entirely omitting an upper-bound check.
- When the malicious request reaches the core engine (`vllm/v1/engine/async_llm.py`), the engine attempts to fan out the request `n` times to generate identical independent sequences within a synchronous loop.
- Because Python's `asyncio` runs on a single thread and event loop, this monolithic `for`-loop monopolizes the CPU thread. The server stops responding to all other connections (including liveness probes). Simultaneously, the memory allocator is overwhelmed by cloning millions of request object instances via `copy(request)`, driving the host's Resident Set Size (RSS) up by gigabytes per second until the OS OOM-killer terminates the vLLM process.

Impact
Vulnerability Type: Resource Exhaustion / Denial of Service
Impacted Parties:
- Any deployment exposing the OpenAI-compatible API server (`vllm.entrypoints.openai.api_server`), which happens to be the primary entrypoint for OpenAI-compatible setups.

Because this vulnerability exploits the control plane rather than the data plane, an unauthenticated remote attacker can achieve a high success rate in taking down production inference hosts with a single HTTP request. This effectively circumvents any hardware-level capacity planning and conventional bandwidth stress limitations.
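As a sketch of the missing validation described above (the `MAX_N` constant and `verify_n` helper are illustrative, not vLLM's actual code), an upper-bound check next to the existing lower-bound check would reject an oversized `n` before any fan-out happens:

```python
# Hypothetical sketch of bounds-checking the sampling parameter `n`.
# vLLM's actual `_verify_args` rejects only n < 1; adding an upper bound
# rejects abusive values before the engine clones the request n times.
MAX_N = 128  # illustrative limit, not vLLM's real value

def verify_n(n: int) -> None:
    if n < 1:
        raise ValueError(f"n must be at least 1, got {n}.")
    if n > MAX_N:
        raise ValueError(f"n must be at most {MAX_N}, got {n}.")
```

Because the check runs at parameter-verification time, the oversized request is refused before it reaches the scheduling queue or the `asyncio` event loop.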
CVE-2026-34753
Summary
A Server Side Request Forgery (SSRF) vulnerability in `download_bytes_from_url` allows any actor who can control batch input JSON to make the vLLM batch runner issue arbitrary HTTP/HTTPS requests from the server, without any URL validation or domain restrictions. This can be used to target internal services (e.g. cloud metadata endpoints or internal HTTP APIs) reachable from the vLLM host.
Details
Vulnerable component
The vulnerable logic is in the batch runner entrypoint `vllm/entrypoints/openai/run_batch.py`, function `download_bytes_from_url`.

Key properties:

- It accepts multiple URL schemes (`data`, `http`, `https`).
- For `http`/`https`, it directly calls `session.get(url)` on the provided string.
- The multimodal media path has its own connector (`MediaConnector`), which implements an explicit domain allowlist; `download_bytes_from_url` does not reuse that protection.

URL controllability
The `url` argument is fully controlled by batch input JSON via the `file_url` field of `BatchTranscriptionRequest`/`BatchTranslationRequest`. There is no restriction on the domain, IP, or port of `file_url` in these models.

The batch runner reads each line of the input file (`args.input_file`), parses it as JSON, and constructs a `BatchTranscriptionRequest`/`BatchTranslationRequest`. Whatever `file_url` appears in that JSON line becomes `batch_request_body.file_url`, which is passed directly into `download_bytes_from_url`.

So the data flow is:
1. The attacker supplies an input line whose JSON body contains an arbitrary `body.file_url`.
2. `BatchRequestInput`/`BatchTranscriptionRequest`/`BatchTranslationRequest` parse that JSON and store `file_url` verbatim.
3. `make_transcription_wrapper` calls `download_bytes_from_url(batch_request_body.file_url)`.
4. `download_bytes_from_url`'s HTTP/HTTPS branch issues `aiohttp.ClientSession().get(url)` to that attacker-controlled URL with no further validation.

This is a classic SSRF pattern: a server-side component makes arbitrary HTTP requests to a URL string taken from untrusted input.
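For illustration, a single malicious batch input line could look like the following. The JSON shape is inferred from the model names above and is an assumption, not vLLM's exact schema; the URL is an arbitrary attacker choice targeting a cloud metadata endpoint:

```python
import json

# One line of a hypothetical batch input file: the attacker points
# file_url at an internal metadata service, and the batch runner
# would fetch it verbatim with no domain restriction.
malicious_line = json.dumps({
    "custom_id": "req-1",
    "url": "/v1/audio/transcriptions",
    "body": {"file_url": "http://169.254.169.254/latest/meta-data/"},
})
```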
Comparison with safer code
The project already contains a safer URL-handling path for multimodal media in `vllm/multimodal/media/connector.py`, which demonstrates the intent to mitigate SSRF via domain allowlists and URL normalization.
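A minimal sketch of such an allowlist check (the `ALLOWED_DOMAINS` set and `validate_url` helper are hypothetical names, not vLLM's actual `MediaConnector` API):

```python
from urllib.parse import urlparse

# Illustrative domain allowlist in the spirit of the MediaConnector
# protection: only explicitly approved hosts may be fetched.
ALLOWED_DOMAINS = {"cdn.example.com"}  # assumed example domain

def validate_url(url: str) -> str:
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https"):
        raise ValueError(f"unsupported scheme: {parsed.scheme!r}")
    if parsed.hostname not in ALLOWED_DOMAINS:
        raise ValueError(f"domain not allowlisted: {parsed.hostname!r}")
    return url
```

Calling a validator like this before `session.get(url)` would block requests to internal services such as `169.254.169.254`.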
`download_bytes_from_url` does not reuse this allowlist or any equivalent validation, even though it also fetches user-provided URLs.

CVE-2026-34755
Summary
The `VideoMediaIO.load_base64()` method at `vllm/multimodal/media/video.py:51-62` splits `video/jpeg` data URLs by comma to extract individual JPEG frames, but does not enforce a frame-count limit. The `num_frames` parameter (default: 32), which is enforced by the `load_bytes()` code path at lines 47-48, is completely bypassed in the `video/jpeg` base64 path. An attacker can send a single API request containing thousands of comma-separated base64-encoded JPEG frames, causing the server to decode all frames into memory and crash with OOM.
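A minimal sketch of the missing limit (the `split_frames` helper is hypothetical, not vLLM's actual code): capping the comma-split frame list at `num_frames` before any decoding would neutralize the amplification described below.

```python
# Hypothetical fix sketch for the video/jpeg base64 path: enforce the
# same num_frames limit that load_bytes() already respects, before any
# base64 decoding or numpy allocation takes place.
def split_frames(data: str, num_frames: int = 32) -> list[str]:
    frames = data.split(",")
    if len(frames) > num_frames:
        raise ValueError(
            f"video/jpeg payload has {len(frames)} frames; "
            f"limit is {num_frames}")
    return frames
```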
Vulnerable code
The `load_bytes()` path (lines 47-48) properly delegates to a video loader that respects `self.num_frames` (default 32). The `load_base64("video/jpeg", ...)` path bypasses this limit entirely: `data.split(",")` produces an unbounded list, and every frame is decoded into a numpy array.

video/jpeg is part of vLLM's public API
`video/jpeg` is a vLLM-specific MIME type, not IANA-registered. However, it is part of the public API surface:

- `encode_video_url()` at `vllm/multimodal/utils.py:96-108` generates `data:video/jpeg;base64,...` URLs.
- `tests/entrypoints/openai/test_video.py:62` and `tests/entrypoints/test_chat_utils.py:153` both use this format.

Memory amplification
Each JPEG frame decodes to a full numpy array. For 640x480 RGB images, each frame is ~921 KB decoded (640 × 480 × 3 bytes); 5000 frames is ~4.6 GB. `np.stack()` then creates an additional copy. The compressed JPEG payload is small (~100 KB for 5000 frames) but decompresses to gigabytes.

Data flow
`connector.py:91` uses `split(",", 1)`, which splits on only the first comma. All remaining commas stay in `data` and are later split by `video.py:54`.

Comparison with existing protections
| Code path | Frame limit |
|---|---|
| `load_bytes()` (binary video) | `num_frames` (default 32) |
| `load_base64("video/jpeg", ...)` | none; `data.split(",")` is unbounded |

Release Notes
vllm-project/vllm (vllm)
v0.19.0 (Compare Source)
vLLM v0.19.0
Highlights
This release features 448 commits from 197 contributors (54 new)!
`transformers>=5.5.0` is required. We recommend using the pre-built docker image `vllm/vllm-openai:gemma4` for out-of-the-box usage.

Model Support
`--lora-target-modules` to restrict LoRA to specific modules (#34984), `language_model_only` respected (#37375), Mistral3 fix (#36928), Qwen3.5 fix (#36976), out-of-tree ops replacement (#37181).

Engine Core
`--speculative-config` (#37880), Eagle3 drafter `quant_config` propagation (#37280), Eagle3 `norm_before_fc` propagation (#38111).

Hardware & Performance
Large Scale Serving
Quantization
API & Frontend
`/v1/chat/completions/batch` for batched chat completions (#38011). `--lora-target-modules` (#34984), `-sc` shorthand for `--speculative-config` (#38380). `--calculate-kv-scales` (#37201), `score` task (#37537), pooling multi-task support (#37956), `reasoning_content` message field removed (#37480).

Security
`VLLM_MAX_N_SEQUENCES` environment variable to enforce sequence limits (#37952).

Dependencies
V0 Deprecation
`--disable-frontend-multiprocessing` (#37612).

New Contributors
v0.18.1 (Compare Source)
This is a patch release on top of v0.18.0 to address a few issues:
Configuration
📅 Schedule: Branch creation - "" (UTC), Automerge - At any time (no schedule defined).
🚦 Automerge: Disabled by config. Please merge this manually once you are satisfied.
♻ Rebasing: Whenever PR is behind base branch, or you tick the rebase/retry checkbox.
🔕 Ignore: Close this PR and you won't be reminded about this update again.
This PR was generated by Mend Renovate. View the repository job log.