From 94ea278571fac6d0de2f408c27dc301ea42c8b55 Mon Sep 17 00:00:00 2001
From: jiao <yhocotw31016@gmail.com>
Date: Sun, 3 May 2026 03:07:06 +0800
Subject: [PATCH 1/5] docs: add SECURITY.md and NOTICE
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Two OSS hygiene additions surfaced by an internal security review:

* **SECURITY.md** — points reporters at GitHub Security Advisories
  (private disclosure), documents a 90-day coordinated disclosure
  window, names what's in scope vs. upstream, and records the small
  set of accepted trade-offs (IPv4-literal SSRF blocklist; chunked
  refused with 411; loopback-bind threat model) so reporters do not
  re-file them as vulnerabilities. GitHub surfaces the file as a
  banner on the public repo.

* **NOTICE** — explicit Apache-2.0 attribution for the vLLM upstream
  that the gateway sits in front of. The README mentions vLLM only in
  prose; the NOTICE file is the canonical legal-attribution location.
  It makes clear that this package does not redistribute vLLM and that
  references to it are nominative.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---
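Reviewer note (not part of the commit message): the IPv4-literal SSRF
blocklist trade-off recorded in SECURITY.md can be illustrated with a
minimal sketch. The names `METADATA_IPS` and `is_blocked_upstream` are
hypothetical and the blocklist contents are an assumption; this is not
the gateway's actual config validator. It only shows why a hostname
that later resolves to a metadata IP is not caught at config-load time.

```python
import ipaddress
from urllib.parse import urlsplit

# Hypothetical blocklist: the well-known link-local cloud metadata address.
METADATA_IPS = frozenset({ipaddress.IPv4Address("169.254.169.254")})


def is_blocked_upstream(url: str) -> bool:
    """Return True when the URL host is an IPv4 literal on the blocklist."""
    host = urlsplit(url).hostname
    if host is None:
        return False
    try:
        addr = ipaddress.IPv4Address(host)
    except ipaddress.AddressValueError:
        # Not an IPv4 literal (hostname or IPv6 literal). No DNS lookup
        # happens here, which is the documented gap.
        return False
    return addr in METADATA_IPS
```

Under these assumptions a literal metadata IP is refused, while a
hostname passes the check regardless of what it resolves to later.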
 NOTICE      | 16 +++++++++++
 SECURITY.md | 80 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 96 insertions(+)
 create mode 100644 NOTICE
 create mode 100644 SECURITY.md

diff --git a/NOTICE b/NOTICE
new file mode 100644
index 0000000..1fc6711
--- /dev/null
+++ b/NOTICE
@@ -0,0 +1,16 @@
+llm-gateway
+Copyright 2026 CoreNovus contributors
+
+Licensed under the MIT License — see LICENSE for the full text.
+
+This product is a thin FastAPI proxy in front of vLLM, an open-source
+LLM serving engine (https://github.com/vllm-project/vllm) developed
+by the vLLM project and licensed under the Apache License, Version 2.0:
+
+    https://www.apache.org/licenses/LICENSE-2.0
+
+This product does not redistribute the vLLM engine itself; vLLM is
+expected to be supplied by the operator (vendor Docker container) at
+deploy time. References to vLLM in code, documentation, and examples
+are nominative and do not imply endorsement by or affiliation with
+the vLLM project.
diff --git a/SECURITY.md b/SECURITY.md
new file mode 100644
index 0000000..ba28c9c
--- /dev/null
+++ b/SECURITY.md
@@ -0,0 +1,80 @@
+# Security policy
+
+## Reporting a vulnerability
+
+Please report suspected security vulnerabilities **privately** via
+[GitHub Security Advisories][advisories].
+
+[advisories]: https://github.com/CoreNovus/llm-gateway/security/advisories/new
+
+Do **not** open a public GitHub issue for security reports.
+
+We aim to acknowledge within 5 business days. Please include:
+
+- A clear description of the vulnerability and its impact
+- Reproduction steps or a minimal proof-of-concept
+- Affected versions / commits / deployment shape (loopback vs. exposed)
+
+If a public exploit is already in circulation, say so in the report —
+we accelerate the disclosure window in that case.
+
+## Disclosure policy
+
+We follow a 90-day coordinated disclosure window:
+
+1. Day 0 — report received, triage begins
+2. Day 5 — acknowledgement + initial severity assessment shared with reporter
+3. Day 90 — fix released, advisory published, reporter credited (if desired)
+
+If the fix lands sooner, the advisory ships with the release. If
+active exploitation is observed, the window is shortened by mutual
+agreement with the reporter.
+
+## Scope
+
+**In scope:**
+
+- Source code under `llm_gateway/`
+- Container / systemd artefacts under `deploy/`
+- Pinned dependencies in `pyproject.toml` / `poetry.lock`
+
+**Out of scope** (please report to upstream):
+
+- vLLM (`https://github.com/vllm-project/vllm`) — engine internals
+- FastAPI / Starlette / httpx / pydantic / uvicorn — framework dependencies
+- Operator-controlled configuration. Empty `BEARER_TOKEN` disables
+  auth; this is documented behaviour for local dev and is gated by
+  `__main__._require_bearer_token` on the production entry path.
+
+## Known limitations (documented, not vulnerabilities)
+
+The following are accepted trade-offs rather than bugs. Filing them
+as vulnerabilities is welcome but the response will point back here:
+
+- The `vllm_upstream_url` SSRF blocklist only checks IPv4 literals
+  against a known-metadata-IP set. Hostnames that resolve to those
+  IPs (DNS rebinding) are not caught at config-load time; the httpx
+  transport must enforce this on its own if the threat model
+  requires it.
+- Chunked transfer-encoding requests are refused with 411 Length
+  Required rather than length-counted at the ASGI layer. Honest JSON
+  clients (httpx, openai-python, langchain-openai, curl) always set
+  `Content-Length` so the practical impact is zero. Operators who
+  must accept chunked uploads need a proxy in front that materialises
+  `Content-Length`, or to extend `BodySizeLimitMiddleware` to a
+  streaming ASGI implementation.
+- The gateway binds to `127.0.0.1` by default and assumes an SSH
+  tunnel as the network-level boundary. Lifting the bind to
+  `0.0.0.0` without re-reading the threat model is an operator
+  misconfiguration, not a gateway vulnerability.
+
+## Hardening recommendations for operators
+
+Beyond the in-process defences this package ships with, deploy-time
+hardening lives in `deploy/`:
+
+- Container hardening: `cap_drop: ALL`, `no-new-privileges`,
+  `read_only` rootfs, non-root UID 1001.
+- Systemd unit: same posture for non-container deploys.
+- Operator-side `--limit-max-requests` / kernel-level limits cover
+  HTTP-protocol-level abuse below this middleware's reach.

From 004990677727c1d6e50c13a37e9e87f4eb8df52c Mon Sep 17 00:00:00 2001
From: jiao <yhocotw31016@gmail.com>
Date: Sun, 3 May 2026 03:07:30 +0800
Subject: [PATCH 2/5] chore: add .pre-commit-config.yaml mirroring CI gates
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Adds the same set of formatters / linters / secret-scanners that CI
runs, so a contributor sees lint failures locally before pushing
rather than waiting for a red CI run.

Hooks:
- pre-commit-hooks @ v4.6.0 — trailing whitespace, EOF newline,
  YAML/TOML syntax, large-file gate (512 KB), merge-conflict markers,
  LF line-ending fixer (defends against Windows contributor CRLF
  noise mixing into the repo).
- ruff @ v0.7.0 — same version pinned in pyproject.toml dev-deps.
  Auto-fix on commit; manual review of the diff before push.
- black @ 26.3.1 — same version pinned in pyproject.toml dev-deps.
- detect-secrets @ v1.5.0 — baseline file pattern is the standard
  ``.secrets.baseline``; first-run generates it via
  ``detect-secrets scan > .secrets.baseline``.

Install per-checkout with ``pre-commit install``. Run ad-hoc with
``pre-commit run --all-files``. Versions are pinned to exact tags so
behaviour is reproducible and bumps are deliberate.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---
 .pre-commit-config.yaml | 45 +++++++++++++++++++++++++++++++++++++++++
 1 file changed, 45 insertions(+)
 create mode 100644 .pre-commit-config.yaml

diff --git a/.pre-commit-config.yaml b/.pre-commit-config.yaml
new file mode 100644
index 0000000..3e1ff9f
--- /dev/null
+++ b/.pre-commit-config.yaml
@@ -0,0 +1,45 @@
+# Pre-commit hooks — the same gates CI runs, but local and fast.
+#
+# Install once per checkout:
+#     pre-commit install
+#
+# Run all hooks ad hoc:
+#     pre-commit run --all-files
+#
+# Versions are pinned to exact tags so contributors get reproducible
+# behaviour; bump deliberately rather than tracking ``main``.
+
+repos:
+  - repo: https://github.com/pre-commit/pre-commit-hooks
+    rev: v4.6.0
+    hooks:
+      - id: trailing-whitespace
+      - id: end-of-file-fixer
+      - id: check-yaml
+      - id: check-toml
+      - id: check-added-large-files
+        args: ["--maxkb=512"]
+      - id: check-merge-conflict
+      - id: mixed-line-ending
+        args: ["--fix=lf"]
+
+  - repo: https://github.com/astral-sh/ruff-pre-commit
+    # Match the ruff version pinned in pyproject.toml's dev-deps so
+    # local + CI behaviour does not drift between operators.
+    rev: v0.7.0
+    hooks:
+      - id: ruff
+        args: ["--fix"]
+
+  - repo: https://github.com/psf/black-pre-commit-mirror
+    rev: 26.3.1
+    hooks:
+      - id: black
+
+  - repo: https://github.com/Yelp/detect-secrets
+    rev: v1.5.0
+    hooks:
+      - id: detect-secrets
+        args: ["--baseline", ".secrets.baseline"]
+        # Do not scan the baseline file itself when it is first committed.
+        exclude: ^\.secrets\.baseline$

From 679ca5107a00fc6868be1a12df17ecb0d1312405 Mon Sep 17 00:00:00 2001
From: jiao <yhocotw31016@gmail.com>
Date: Sun, 3 May 2026 03:07:58 +0800
Subject: [PATCH 3/5] chore: add CI workflow (ruff / black / pyright / pytest)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

GitHub's default-setup CodeQL was the only automated check on PRs;
broken format / lint / type / test regressions slipped past review
because nothing stopped them at the seam between author and merge.

Add ``ci.yml`` covering Python 3.11 + 3.12 matrix:

- Poetry install with poetry.lock-keyed cache
- ``ruff check`` (lint)
- ``black --check`` (format gate, no auto-fix in CI)
- ``pyright`` (type check on llm_gateway/)
- ``pytest tests/unit/ -v`` (existing unit suite)

Concurrency group cancels superseded runs on the same ref so a
fast-following push does not waste runners.

Pip-audit is deliberately not bundled into this workflow — it
deserves its own scheduled cron to surface new CVEs decoupled from
PR lifecycle, and is queued as a follow-up.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---
 .github/workflows/ci.yml | 54 ++++++++++++++++++++++++++++++++++++++++
 1 file changed, 54 insertions(+)
 create mode 100644 .github/workflows/ci.yml

diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml
new file mode 100644
index 0000000..3113e63
--- /dev/null
+++ b/.github/workflows/ci.yml
@@ -0,0 +1,54 @@
+name: CI
+
+on:
+  push:
+    branches: [main]
+  pull_request:
+    branches: [main]
+
+# Cancel in-flight runs on the same ref when a new commit lands.
+concurrency:
+  group: ${{ github.workflow }}-${{ github.ref }}
+  cancel-in-progress: true
+
+jobs:
+  test:
+    name: test (Python ${{ matrix.python-version }})
+    runs-on: ubuntu-latest
+    strategy:
+      fail-fast: false
+      matrix:
+        # Match the supported range declared in pyproject.toml
+        # (``requires-python = ">=3.11,<4.0"`` + classifier list).
+        python-version: ["3.11", "3.12"]
+    steps:
+      - uses: actions/checkout@v4
+
+      - name: Set up Python ${{ matrix.python-version }}
+        uses: actions/setup-python@v5
+        with:
+          python-version: ${{ matrix.python-version }}
+
+      - name: Install Poetry
+        run: pipx install poetry
+
+      - name: Cache Poetry virtualenv
+        uses: actions/cache@v4
+        with:
+          path: ~/.cache/pypoetry/virtualenvs
+          key: poetry-${{ matrix.python-version }}-${{ hashFiles('poetry.lock') }}
+
+      - name: Install dependencies
+        run: poetry install --with dev --no-interaction --no-ansi
+
+      - name: Lint (ruff)
+        run: poetry run ruff check llm_gateway/ tests/
+
+      - name: Format check (black)
+        run: poetry run black --check llm_gateway/ tests/
+
+      - name: Type check (pyright)
+        run: poetry run pyright llm_gateway/
+
+      - name: Tests (pytest)
+        run: poetry run pytest tests/unit/ -v

From 5211e3509c641d6c2bcd77d2acf06e12a7d4ae2c Mon Sep 17 00:00:00 2001
From: jiao <yhocotw31016@gmail.com>
Date: Sun, 3 May 2026 11:40:20 +0800
Subject: [PATCH 4/5] fix(api): annotate _safe_token_count value as Any
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The CI workflow added in this PR surfaced a pyright error on
chat_completions.py:143: pyright rejects `int(value or 0)` when `value`
is typed as `object`, because `object` does not conform to
ConvertibleToInt (str | Buffer | SupportsInt | SupportsIndex).

`Any` is the honest type for this defensive helper since the function
exists precisely to handle arbitrary upstream JSON; the
`try/except (TypeError, ValueError)` already covers every runtime
shape mismatch. No behaviour change — pyright type relaxation only.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---
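Reviewer note (not part of the commit message): a self-contained sketch
of the helper's shape, for reading alongside the hunk below. Only the
signature, the docstring intent, and the `int(value or 0)` line come
from the diff; the `except` body and the clamp to zero are assumptions
inferred from the docstring, and the function is renamed
`safe_token_count` to mark it as illustrative.

```python
from typing import Any


def safe_token_count(value: Any) -> int:
    """Coerce a token-count field to a non-negative int, defaulting to 0."""
    try:
        # With ``value: object`` pyright rejects this call; ``Any`` opts
        # out of the static check and leaves validation to runtime.
        count = int(value or 0)
    except (TypeError, ValueError):
        # Assumption: non-numeric input falls back to 0.
        return 0
    # Assumption: negative counts clamp to 0, per the docstring.
    return max(count, 0)
```

For example, `safe_token_count(None)` and `safe_token_count("abc")`
both fall back to 0, while `safe_token_count("12")` yields 12.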
 llm_gateway/api/chat_completions.py | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/llm_gateway/api/chat_completions.py b/llm_gateway/api/chat_completions.py
index d95c757..2902d66 100644
--- a/llm_gateway/api/chat_completions.py
+++ b/llm_gateway/api/chat_completions.py
@@ -23,7 +23,7 @@
 # deferred-string evaluation). Same constraint as ``api/health.py``.
 
 from collections.abc import Callable
-from typing import Annotated
+from typing import Annotated, Any
 
 from fastapi import APIRouter, Depends, status
 from fastapi.responses import JSONResponse, Response, StreamingResponse
@@ -134,10 +134,17 @@ def _record_token_usage(metrics: Metrics, model: str, response: dict) -> None:
         metrics.tokens_completion_total.labels(model=model).inc(completion)
 
 
-def _safe_token_count(value: object) -> int:
+def _safe_token_count(value: Any) -> int:
     """Coerce an upstream token-count field to a non-negative ``int``.
 
     Returns 0 for ``None`` / missing / non-numeric / negative values.
+
+    Parameter is typed ``Any`` (not ``object``) because the value comes
+    straight from a parsed-JSON upstream payload — pyright rejects
+    ``int(object | Literal[0])`` since plain ``object`` does not
+    conform to ``ConvertibleToInt`` (str | Buffer | SupportsInt |
+    SupportsIndex). The ``try/except`` already covers every runtime
+    shape, so the static type can be honest about that.
     """
     try:
         count = int(value or 0)

From a9ee9526a93b9aee7fe4707ec410ce73e722ff20 Mon Sep 17 00:00:00 2001
From: jiao <yhocotw31016@gmail.com>
Date: Sun, 3 May 2026 11:43:19 +0800
Subject: [PATCH 5/5] fix(tests): update rate-limit digest expectation to 32
 hex chars
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Companion to PR #6 / commit f40049e which bumped the bearer-token
rate-limit key truncation from `[:16]` (64 bits) to `[:32]` (128
bits). The existing test
`test_middleware_passes_request_when_limiter_allows` still asserted
the old 16-hex slice, so the test silently fell out of sync with the
production code. PR #6 had no CI gate when it merged (the CI workflow
lands in this PR), and the gap surfaced once CI ran.

The companion regression-guard test
`test_middleware_key_hash_does_not_contain_plaintext_token` is
agnostic to digest length and stays unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---
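Reviewer note (not part of the commit message): the digest arithmetic
behind the expectation change, as a small sketch. `rate_limit_key` and
`digest_bits` are hypothetical helpers written for illustration; only
the SHA-256 hex truncation and the `token:` key prefix are taken from
the test in the hunk below.

```python
import hashlib


def rate_limit_key(token: str, hex_chars: int = 32) -> str:
    """Build a bucket key from a truncated SHA-256 digest of the token."""
    digest = hashlib.sha256(token.encode("utf-8")).hexdigest()[:hex_chars]
    return f"token:{digest}"


def digest_bits(hex_chars: int) -> int:
    """Each hex character encodes 4 bits of the digest."""
    return 4 * hex_chars
```

With these definitions, a 16-character slice keeps 64 bits of the
digest and a 32-character slice keeps 128 bits, which matches the
figures quoted in the commit message.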
 tests/unit/test_rate_limit.py | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/tests/unit/test_rate_limit.py b/tests/unit/test_rate_limit.py
index 0121ff6..d4d2c71 100644
--- a/tests/unit/test_rate_limit.py
+++ b/tests/unit/test_rate_limit.py
@@ -179,7 +179,9 @@ def test_middleware_passes_request_when_limiter_allows() -> None:
     assert response.status_code == 200
     # The bearer is hashed before becoming the bucket key — the
     # plaintext token must not appear in the bucket dict.
-    expected_digest = hashlib.sha256(b"t1").hexdigest()[:16]
+    # 32 hex chars (128 bits) — bumped from 16 to close a token-grinding
+    # collision attack. See PR #6 / commit f40049e.
+    expected_digest = hashlib.sha256(b"t1").hexdigest()[:32]
     assert limiter.calls == [f"token:{expected_digest}"]