From 94ea278571fac6d0de2f408c27dc301ea42c8b55 Mon Sep 17 00:00:00 2001
From: jiao
Date: Sun, 3 May 2026 03:07:06 +0800
Subject: [PATCH 1/5] docs: add SECURITY.md and NOTICE
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Two OSS hygiene additions surfaced by an internal security review:

* **SECURITY.md** — points reporters at GitHub Security Advisories
  (private disclosure), documents a 90-day coordinated disclosure
  window, names what's in scope vs. upstream, and records the small
  set of accepted trade-offs (IPv4-literal SSRF blocklist; chunked
  refused with 411; loopback-bind threat model) so reporters do not
  re-file them as vulnerabilities. GitHub surfaces the file as a
  banner on the public repo.

* **NOTICE** — explicit Apache-2.0 attribution for the vLLM upstream
  the gateway proxies in front of. The README mentions vLLM in prose;
  the NOTICE file is the canonical legal-attribution location and
  makes clear that vLLM is not redistributed by this package and
  references are nominative.

Co-Authored-By: Claude Opus 4.7 (1M context)
---
 NOTICE      | 16 +++++++++++
 SECURITY.md | 80 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 96 insertions(+)
 create mode 100644 NOTICE
 create mode 100644 SECURITY.md

diff --git a/NOTICE b/NOTICE
new file mode 100644
index 0000000..1fc6711
--- /dev/null
+++ b/NOTICE
@@ -0,0 +1,16 @@
+llm-gateway
+Copyright 2026 CoreNovus contributors
+
+Licensed under the MIT License — see LICENSE for the full text.
+
+This product is a thin FastAPI proxy in front of vLLM, an open-source
+LLM serving engine (https://github.com/vllm-project/vllm) developed
+by the vLLM project and licensed under the Apache License, Version 2.0:
+
+    https://www.apache.org/licenses/LICENSE-2.0
+
+This product does not redistribute the vLLM engine itself; vLLM is
+expected to be supplied by the operator (vendor Docker container) at
+deploy time. References to vLLM in code, documentation, and examples
+are nominative and do not imply endorsement by or affiliation with
+the vLLM project.
diff --git a/SECURITY.md b/SECURITY.md
new file mode 100644
index 0000000..ba28c9c
--- /dev/null
+++ b/SECURITY.md
@@ -0,0 +1,80 @@
+# Security policy
+
+## Reporting a vulnerability
+
+Please report suspected security vulnerabilities **privately** via
+[GitHub Security Advisories][advisories].
+
+[advisories]: https://github.com/CoreNovus/llm-gateway/security/advisories/new
+
+Do **not** open a public GitHub issue for security reports.
+
+We aim to acknowledge within 5 business days. Please include:
+
+- A clear description of the vulnerability and its impact
+- Reproduction steps or a minimal proof-of-concept
+- Affected versions / commits / deployment shape (loopback vs. exposed)
+
+If a public exploit is already in circulation, say so in the report —
+we accelerate the disclosure window in that case.
+
+## Disclosure policy
+
+We follow a 90-day coordinated disclosure window:
+
+1. Day 0 — report received, triage begins
+2. Day 5 — acknowledgement + initial severity assessment shared with reporter
+3. Day 90 — fix released, advisory published, reporter credited (if desired)
+
+If the fix lands sooner, the advisory ships with the release. If
+active exploitation is observed, the window is shortened by mutual
+agreement with the reporter.
+
+## Scope
+
+**In scope:**
+
+- Source code under `llm_gateway/`
+- Container / systemd artefacts under `deploy/`
+- Pinned dependencies in `pyproject.toml` / `poetry.lock`
+
+**Out of scope** (please report to upstream):
+
+- vLLM (`https://github.com/vllm-project/vllm`) — engine internals
+- FastAPI / Starlette / httpx / pydantic / uvicorn — framework dependencies
+- Operator-controlled configuration. Empty `BEARER_TOKEN` disables
+  auth; this is documented behaviour for local dev and is gated by
+  `__main__._require_bearer_token` on the production entry path.
+
+## Known limitations (documented, not vulnerabilities)
+
+The following are accepted trade-offs rather than bugs. Filing them
+as vulnerabilities is welcome, but the response will point back here:
+
+- The `vllm_upstream_url` SSRF blocklist only checks IPv4 literals
+  against a known-metadata-IP set. Hostnames that resolve to those
+  IPs (DNS rebinding) are not caught at config-load time; the httpx
+  transport must enforce this on its own if the threat model
+  requires it.
+- Chunked transfer-encoding requests are refused with 411 Length
+  Required rather than length-counted at the ASGI layer. Honest JSON
+  clients (httpx, openai-python, langchain-openai, curl) always set
+  `Content-Length`, so the practical impact is zero. Operators who
+  must accept chunked uploads need a proxy in front that materialises
+  `Content-Length`, or to extend `BodySizeLimitMiddleware` to a
+  streaming ASGI implementation.
+- The gateway binds to `127.0.0.1` by default and assumes an SSH
+  tunnel as the network-level boundary. Lifting the bind to
+  `0.0.0.0` without re-reading the threat model is an operator
+  misconfiguration, not a gateway vulnerability.
+
+## Hardening recommendations for operators
+
+Beyond the in-process defences this package ships with, deploy-time
+hardening lives in `deploy/`:
+
+- Container hardening: `cap_drop: ALL`, `no-new-privileges`,
+  `read_only` rootfs, non-root UID 1001.
+- Systemd unit: same posture for non-container deploys.
+- Operator-side `--limit-max-requests` / kernel-level limits cover
+  HTTP-protocol-level abuse below this middleware's reach.
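The IPv4-literal limitation described in the SECURITY.md hunk above can be illustrated with a minimal sketch. The helper name `check_upstream_url` and the one-entry `METADATA_IPS` set are hypothetical; the gateway's actual validator and blocklist are not part of this patch. The point it shows is that only the literal host string is parsed, so a hostname that resolves to a metadata IP is never looked up.

```python
import ipaddress
from urllib.parse import urlsplit

# Hypothetical blocklist; the gateway's real set is not shown in this patch.
# 169.254.169.254 is the well-known cloud instance-metadata address.
METADATA_IPS = frozenset({ipaddress.IPv4Address("169.254.169.254")})


def check_upstream_url(url: str) -> str:
    """Reject an upstream URL whose host is a blocklisted IPv4 literal.

    Hypothetical helper: it inspects only the literal host string and
    performs no DNS resolution, mirroring the documented limitation.
    """
    host = urlsplit(url).hostname
    if host is None:
        raise ValueError(f"upstream URL has no host: {url!r}")
    try:
        addr = ipaddress.IPv4Address(host)
    except ipaddress.AddressValueError:
        # Not an IPv4 literal (a hostname, or IPv6): passes through unchecked.
        return url
    if addr in METADATA_IPS:
        raise ValueError(f"upstream URL targets a metadata IP: {host}")
    return url
```

Because `ipaddress.IPv4Address` raises `AddressValueError` for anything that is not an IPv4 literal, a hostname such as `metadata.example.internal` falls through the `except` branch and is accepted, which is exactly the DNS-rebinding gap the policy leaves to the httpx transport.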

From 004990677727c1d6e50c13a37e9e87f4eb8df52c Mon Sep 17 00:00:00 2001
From: jiao
Date: Sun, 3 May 2026 03:07:30 +0800
Subject: [PATCH 2/5] chore: add .pre-commit-config.yaml mirroring CI gates
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Adds the same set of formatters / linters / secret-scanners that CI
runs, so a contributor sees lint failures locally before pushing
rather than waiting for a CI red.

Hooks:
- pre-commit-hooks @ v4.6.0 — trailing whitespace, EOF newline,
  YAML/TOML syntax, large-file gate (512 KB), merge-conflict markers,
  LF line-ending fixer (defends against Windows contributor CRLF
  noise mixing into the repo).
- ruff @ v0.7.0 — same version pinned in pyproject.toml dev-deps.
  Auto-fix on commit; manual review of the diff before push.
- black @ 26.3.1 — same version pinned in pyproject.toml dev-deps.
- detect-secrets @ v1.5.0 — baseline file pattern is the standard
  ``.secrets.baseline``; first-run generates it via
  ``detect-secrets scan > .secrets.baseline``.

Install per-checkout with ``pre-commit install``. Run ad-hoc with
``pre-commit run --all-files``. Versions are pinned to exact tags so
behaviour is reproducible and bumps are deliberate.

Co-Authored-By: Claude Opus 4.7 (1M context)
---
 .pre-commit-config.yaml | 45 +++++++++++++++++++++++++++++++++++++++++
 1 file changed, 45 insertions(+)
 create mode 100644 .pre-commit-config.yaml

diff --git a/.pre-commit-config.yaml b/.pre-commit-config.yaml
new file mode 100644
index 0000000..3e1ff9f
--- /dev/null
+++ b/.pre-commit-config.yaml
@@ -0,0 +1,45 @@
+# Pre-commit hooks — the same gates CI runs, but local and fast.
+#
+# Install once per checkout:
+#   pre-commit install
+#
+# Run all hooks ad hoc:
+#   pre-commit run --all-files
+#
+# Versions pinned to exact SHA / tag so contributors get reproducible
+# behaviour; bump deliberately rather than tracking ``main``.
+
+repos:
+  - repo: https://github.com/pre-commit/pre-commit-hooks
+    rev: v4.6.0
+    hooks:
+      - id: trailing-whitespace
+      - id: end-of-file-fixer
+      - id: check-yaml
+      - id: check-toml
+      - id: check-added-large-files
+        args: ["--maxkb=512"]
+      - id: check-merge-conflict
+      - id: mixed-line-ending
+        args: ["--fix=lf"]
+
+  - repo: https://github.com/astral-sh/ruff-pre-commit
+    # Match the ruff version pinned in pyproject.toml's dev-deps so
+    # local + CI behaviour does not drift between operators.
+    rev: v0.7.0
+    hooks:
+      - id: ruff
+        args: ["--fix"]
+
+  - repo: https://github.com/psf/black-pre-commit-mirror
+    rev: 26.3.1
+    hooks:
+      - id: black
+
+  - repo: https://github.com/Yelp/detect-secrets
+    rev: v1.5.0
+    hooks:
+      - id: detect-secrets
+        args: ["--baseline", ".secrets.baseline"]
+        # Never scan the baseline itself; it records known findings by design.
+        exclude: ^\.secrets\.baseline$

From 679ca5107a00fc6868be1a12df17ecb0d1312405 Mon Sep 17 00:00:00 2001
From: jiao
Date: Sun, 3 May 2026 03:07:58 +0800
Subject: [PATCH 3/5] chore: add CI workflow (ruff / black / pyright / pytest)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

GitHub's default-setup CodeQL was the only automated check on PRs;
broken format / lint / type / test regressions slipped past review
because nothing stopped them at the seam between author and merge.

Add ``ci.yml`` covering a Python 3.11 + 3.12 matrix:
- Poetry install with a poetry.lock-keyed cache
- ``ruff check`` (lint)
- ``black --check`` (format gate, no auto-fix in CI)
- ``pyright`` (type check on llm_gateway/)
- ``pytest tests/unit/ -v`` (existing unit suite)

Concurrency group cancels superseded runs on the same ref so a
fast-following push does not waste runners.

``pip-audit`` is deliberately not bundled into this workflow — it
deserves its own scheduled cron to surface new CVEs decoupled from
the PR lifecycle, and is queued as a follow-up.

Co-Authored-By: Claude Opus 4.7 (1M context)
---
 .github/workflows/ci.yml | 54 ++++++++++++++++++++++++++++++++++++++++
 1 file changed, 54 insertions(+)
 create mode 100644 .github/workflows/ci.yml

diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml
new file mode 100644
index 0000000..3113e63
--- /dev/null
+++ b/.github/workflows/ci.yml
@@ -0,0 +1,54 @@
+name: CI
+
+on:
+  push:
+    branches: [main]
+  pull_request:
+    branches: [main]
+
+# Cancel in-flight runs on the same ref when a new commit lands.
+concurrency:
+  group: ${{ github.workflow }}-${{ github.ref }}
+  cancel-in-progress: true
+
+jobs:
+  test:
+    name: test (Python ${{ matrix.python-version }})
+    runs-on: ubuntu-latest
+    strategy:
+      fail-fast: false
+      matrix:
+        # Match the supported range declared in pyproject.toml
+        # (``requires-python = ">=3.11,<4.0"`` + classifier list).
+        python-version: ["3.11", "3.12"]
+    steps:
+      - uses: actions/checkout@v4
+
+      - name: Set up Python ${{ matrix.python-version }}
+        uses: actions/setup-python@v5
+        with:
+          python-version: ${{ matrix.python-version }}
+
+      - name: Install Poetry
+        run: pipx install poetry
+
+      - name: Cache Poetry virtualenv
+        uses: actions/cache@v4
+        with:
+          path: ~/.cache/pypoetry/virtualenvs
+          key: poetry-${{ matrix.python-version }}-${{ hashFiles('poetry.lock') }}
+
+      - name: Install dependencies
+        run: poetry install --with dev --no-interaction --no-ansi
+
+      - name: Lint (ruff)
+        run: poetry run ruff check llm_gateway/ tests/
+
+      - name: Format check (black)
+        run: poetry run black --check llm_gateway/ tests/
+
+      - name: Type check (pyright)
+        run: poetry run pyright llm_gateway/
+
+      - name: Tests (pytest)
+        run: poetry run pytest tests/unit/ -v

From 5211e3509c641d6c2bcd77d2acf06e12a7d4ae2c Mon Sep 17 00:00:00 2001
From: jiao
Date: Sun, 3 May 2026 11:40:20 +0800
Subject: [PATCH 4/5] fix(api): annotate _safe_token_count value as Any
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The CI workflow added in this PR surfaced a pyright error on
chat_completions.py:143. Pyright rejects `int(value or 0)` when
`value` is typed as `object`, because `object` does not conform to
`ConvertibleToInt` (str | Buffer | SupportsInt | SupportsIndex).

`Any` is the honest type for this defensive helper, since the
function exists precisely to handle arbitrary upstream JSON; the
`try/except (TypeError, ValueError)` already covers every runtime
shape mismatch.

No behaviour change — pyright type relaxation only.

Co-Authored-By: Claude Opus 4.7 (1M context)
---
 llm_gateway/api/chat_completions.py | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/llm_gateway/api/chat_completions.py b/llm_gateway/api/chat_completions.py
index d95c757..2902d66 100644
--- a/llm_gateway/api/chat_completions.py
+++ b/llm_gateway/api/chat_completions.py
@@ -23,7 +23,7 @@
 # deferred-string evaluation). Same constraint as ``api/health.py``.
 
 from collections.abc import Callable
-from typing import Annotated
+from typing import Annotated, Any
 
 from fastapi import APIRouter, Depends, status
 from fastapi.responses import JSONResponse, Response, StreamingResponse
@@ -134,10 +134,17 @@ def _record_token_usage(metrics: Metrics, model: str, response: dict) -> None:
     metrics.tokens_completion_total.labels(model=model).inc(completion)
 
 
-def _safe_token_count(value: object) -> int:
+def _safe_token_count(value: Any) -> int:
     """Coerce an upstream token-count field to a non-negative ``int``.
 
     Returns 0 for ``None`` / missing / non-numeric / negative values.
+
+    Parameter is typed ``Any`` (not ``object``) because the value comes
+    straight from a parsed-JSON upstream payload — pyright rejects
+    ``int(object | Literal[0])`` since plain ``object`` does not
+    conform to ``ConvertibleToInt`` (str | Buffer | SupportsInt |
+    SupportsIndex). The ``try/except`` already covers every runtime
+    shape, so the static type can be honest about that.
     """
     try:
         count = int(value or 0)

From a9ee9526a93b9aee7fe4707ec410ce73e722ff20 Mon Sep 17 00:00:00 2001
From: jiao
Date: Sun, 3 May 2026 11:43:19 +0800
Subject: [PATCH 5/5] fix(tests): update rate-limit digest expectation to 32 hex chars
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Companion to PR #6 / commit f40049e, which bumped the bearer-token
rate-limit key truncation from `[:16]` (64 bits) to `[:32]`
(128 bits). The existing test
`test_middleware_passes_request_when_limiter_allows` still asserted
the old 16-hex slice, so the production code drifted from the test
silently — PR #6 had no CI gate at the time it merged (the CI
workflow lands in this PR), and the gap surfaced once CI ran.

The companion regression-guard test
`test_middleware_key_hash_does_not_contain_plaintext_token` is
agnostic to digest length and stays unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context)
---
 tests/unit/test_rate_limit.py | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/tests/unit/test_rate_limit.py b/tests/unit/test_rate_limit.py
index 0121ff6..d4d2c71 100644
--- a/tests/unit/test_rate_limit.py
+++ b/tests/unit/test_rate_limit.py
@@ -179,7 +179,9 @@ def test_middleware_passes_request_when_limiter_allows() -> None:
     assert response.status_code == 200
     # The bearer is hashed before becoming the bucket key — the
     # plaintext token must not appear in the bucket dict.
-    expected_digest = hashlib.sha256(b"t1").hexdigest()[:16]
+    # 32 hex chars (128 bits) — bumped from 16 to close a token-grinding
+    # collision attack. See PR #6 / commit f40049e.
+    expected_digest = hashlib.sha256(b"t1").hexdigest()[:32]
     assert limiter.calls == [f"token:{expected_digest}"]
 
 
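The bucket-key format that the updated test asserts can be sketched as a small helper. `rate_limit_key` and `DIGEST_HEX_CHARS` are hypothetical names, and UTF-8 encoding of the bearer token before hashing is an assumption; the middleware's real implementation is not shown in this patch.

```python
import hashlib

# Hypothetical constant: 32 hex chars x 4 bits per char = 128 bits of digest.
DIGEST_HEX_CHARS = 32


def rate_limit_key(bearer_token: str) -> str:
    """Build a rate-limit bucket key that never contains the plaintext token.

    Hypothetical helper; assumes the token is UTF-8 encoded before hashing.
    """
    digest = hashlib.sha256(bearer_token.encode("utf-8")).hexdigest()
    return f"token:{digest[:DIGEST_HEX_CHARS]}"
```

Each hex character encodes 4 bits, so the old `[:16]` slice kept 64 bits of the SHA-256 digest and `[:32]` keeps 128, matching the figures in the commit message above.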