fix(gateway): cap request body size at 8 MiB; return 413 instead of OOM #51

Open
YOMXXX wants to merge 1 commit into Tencent:main from YOMXXX:fix/gateway-body-size-limit

@YOMXXX commented May 18, 2026

Summary

Fix an unbounded request-body DoS in the TDAI Gateway: parseJsonBody buffered every incoming body into a Buffer[] and only validated the JSON at end-of-stream, so a single oversized or never-ending POST could grow the daemon heap until OOM. On a no-token (Hermes backward-compat) daemon, any local process can trigger this without auth.

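Fix

  • Cap each request body at TDAI_GATEWAY_MAX_BODY_BYTES (default 8 MiB: enough for a /seed bulk import of several hundred historical sessions, and far below the point where a typical Node process would OOM).
  • Fail fast: if the request carries a Content-Length that declares more than the limit, respond 413 before reading any body bytes.
  • Streaming guard: if the client lies (small Content-Length, larger actual body) or sends Transfer-Encoding: chunked without a Content-Length, the running byte count still trips the limit.
  • On hitting the limit, call req.pause() rather than destroying the stream, so the dispatcher can finish writing the 413 response on the same socket and Node then closes the keep-alive connection normally. Destroying it directly would leave the response unwritable and degrade to a 500.
  • Add PayloadTooLargeError. The catch block in handleRequest recognizes it and maps it to HTTP 413 plus a warn log instead of the 500 fallback, which matters most for clients that retry on 5xx but not on 4xx.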
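
The fast-fail check, streaming guard, and pause-on-limit behavior described above could be sketched roughly as follows. This is not the PR's actual diff: `checkDeclaredLength`, `CappedAccumulator`, `readBodyWithCap`, and the fields and message of `PayloadTooLargeError` are hypothetical names and shapes chosen for illustration.

```typescript
import type { IncomingMessage } from "node:http";

// Hypothetical shape; the PR names this class but does not show its fields.
class PayloadTooLargeError extends Error {
  readonly limit: number;
  constructor(limit: number) {
    super(`Request body exceeds ${limit} bytes`);
    Object.setPrototypeOf(this, PayloadTooLargeError.prototype);
    this.name = "PayloadTooLargeError";
    this.limit = limit;
  }
}

// Fast-fail: reject on the declared size alone, before any bytes are read.
function checkDeclaredLength(header: string | undefined, limit: number): void {
  const declared = Number(header);
  if (header !== undefined && Number.isFinite(declared) && declared > limit) {
    throw new PayloadTooLargeError(limit);
  }
}

// Streaming guard: counts bytes as they arrive, so a lying or absent
// Content-Length cannot bypass the cap.
class CappedAccumulator {
  private readonly chunks: Buffer[] = [];
  private total = 0;
  private readonly limit: number;
  constructor(limit: number) {
    this.limit = limit;
  }
  push(chunk: Buffer): void {
    this.total += chunk.length;
    if (this.total > this.limit) throw new PayloadTooLargeError(this.limit);
    this.chunks.push(chunk);
  }
  toBuffer(): Buffer {
    return Buffer.concat(this.chunks);
  }
}

// Wires both checks to a request stream (assumed wiring, not the PR's code).
function readBodyWithCap(req: IncomingMessage, limit: number): Promise<Buffer> {
  return new Promise<Buffer>((resolve, reject) => {
    try {
      checkDeclaredLength(req.headers["content-length"], limit);
    } catch (err) {
      reject(err);
      return;
    }
    const acc = new CappedAccumulator(limit);
    let settled = false;
    req.on("data", (chunk: Buffer) => {
      if (settled) return;
      try {
        acc.push(chunk);
      } catch (err) {
        settled = true;
        req.pause(); // pause, not destroy: the socket stays usable for the 413
        reject(err);
      }
    });
    req.on("end", () => {
      if (!settled) {
        settled = true;
        resolve(acc.toBuffer());
      }
    });
    req.on("error", (err) => {
      if (!settled) {
        settled = true;
        reject(err);
      }
    });
  });
}
```

Keeping the byte counting in a small synchronous class makes the cap logic testable without opening a socket; only `readBodyWithCap` touches the stream.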

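Environment variable parsing is tolerant: if TDAI_GATEWAY_MAX_BODY_BYTES is non-numeric, empty, or negative, the gateway falls back to the default cap, so a misconfiguration cannot keep the daemon from starting.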
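
A minimal sketch of such a tolerant parser is shown below. `resolveMaxBodyBytes` and `DEFAULT_MAX_BODY_BYTES` are hypothetical names. The PR only states that non-numeric, empty, and negative values fall back to the default; treating zero and fractional values the same way is an extra assumption of this sketch.

```typescript
const DEFAULT_MAX_BODY_BYTES = 8 * 1024 * 1024; // 8 MiB default from the PR

// Returns the configured cap, or the default when the raw value is unusable.
function resolveMaxBodyBytes(raw: string | undefined): number {
  if (raw === undefined || raw.trim() === "") {
    return DEFAULT_MAX_BODY_BYTES; // unset or empty
  }
  const parsed = Number(raw);
  // Assumption: zero and fractional values are also rejected here.
  if (!Number.isInteger(parsed) || parsed <= 0) {
    return DEFAULT_MAX_BODY_BYTES; // non-numeric, negative, zero, fractional
  }
  return parsed;
}
```

A caller would typically resolve the cap once at construction time, for example `resolveMaxBodyBytes(process.env.TDAI_GATEWAY_MAX_BODY_BYTES)`.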

Tests

Added src/gateway/__tests__/body-size.test.ts (7 cases):

  1. ✅ Small body within the limit → routed normally
  2. ✅ Content-Length > limit → 413 (fast-fail, no buffering)
  3. ✅ Transfer-Encoding: chunked body > limit → 413 or ECONNRESET (streaming guard)
  4. ✅ 413 goes through the dispatcher path and does not degrade to 500
  5. ✅ 413 response body includes the specific limit value (exceeds 1024 bytes)
  6. ✅ TDAI_GATEWAY_MAX_BODY_BYTES=50 takes effect (small cap → 413)
  7. ✅ TDAI_GATEWAY_MAX_BODY_BYTES=not-a-number falls back to the default 8 MiB

✓ npx vitest run src/gateway/__tests__/body-size.test.ts → 7/7 passed
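
The 413-not-500 contract checked by cases 4 and 5 amounts to an error-to-status mapping along the following lines. This is a sketch under assumptions: `toErrorResponse`, the `ErrorResponse` shape, and the exact message text are hypothetical, and `PayloadTooLargeError` is re-declared only so the block stands alone.

```typescript
// Hypothetical error shape; the PR does not show the class body.
class PayloadTooLargeError extends Error {
  readonly limit: number;
  constructor(limit: number) {
    super(`Request body exceeds ${limit} bytes`);
    Object.setPrototypeOf(this, PayloadTooLargeError.prototype);
    this.name = "PayloadTooLargeError";
    this.limit = limit;
  }
}

// Assumed response shape, for illustration only.
interface ErrorResponse {
  status: number;
  body: { error: string };
}

// Maps a caught error to an HTTP response: cap violations become 413 and
// carry the limit in the message; everything else falls back to 500.
function toErrorResponse(err: unknown): ErrorResponse {
  if (err instanceof PayloadTooLargeError) {
    return { status: 413, body: { error: err.message } };
  }
  return { status: 500, body: { error: "Internal server error" } };
}
```

Returning a 4xx here is what keeps clients that retry only on 5xx from resending the same oversized payload.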

Scope

DCO

Commit c27498b Signed-off-by: 李冠辰 <liguanchen@xiaomi.com>

`parseJsonBody` buffered every incoming request body into an unbounded
`Buffer[]` and only validated JSON at end-of-stream. A single oversized
or never-ending POST to /capture, /seed, /search/* etc. could grow the
daemon's heap until it OOM-crashed — and on a no-token (Hermes
backward-compat) daemon, any local process can trigger this without
authentication.

Fix:

- Cap each request body at `TDAI_GATEWAY_MAX_BODY_BYTES` (default 8 MiB
  — generous for /seed payloads with hundreds of historical sessions,
  small enough that a single request cannot OOM a typical node process).
- Fail fast when a present `Content-Length` declares more than the cap,
  before any buffering.
- Track running total during streaming, so a client that lies about
  Content-Length (or omits it via Transfer-Encoding: chunked) is still
  caught.
- Pause the request stream (not destroy) on cap hit, so the dispatcher
  can still write the 413 response on the same socket; Node closes the
  keep-alive connection after `res.end()`.
- New `PayloadTooLargeError` lets `handleRequest` map cap violations to
  HTTP 413 + a warn log, instead of falling through to the generic 500
  branch that retrying clients would interpret as a transient server bug.

Malformed env values (`TDAI_GATEWAY_MAX_BODY_BYTES=not-a-number`, empty,
negative) fall back to the default cap so a misconfig cannot brick the
daemon.

Tests: new src/gateway/__tests__/body-size.test.ts — 7 cases covering
under-cap happy path, Content-Length fast-fail, chunked / lying-CL
streaming guard, 413-not-500 dispatch contract, env override at
construction time, malformed-env fallback.

Signed-off-by: 李冠辰 <liguanchen@xiaomi.com>

@Maxwell-Code07 (Collaborator)

Hi @YOMXXX,

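We have received this PR fixing the Gateway request body size limit. Thank you for finding and fixing the issue! We will review it and get back to you.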
