fix(plugin): make before_message_write hook a no-op by default (cache-friendly)#52

Open
YOMXXX wants to merge 1 commit into Tencent:main from YOMXXX:fix/cache-friendly-default

YOMXXX commented May 18, 2026

Summary

Make the before_message_write hook a no-op by default for <relevant-memories> stripping, so the LLM prompt prefix stays stable and prompt-cache hits are restored across sub-agent / replay boundaries. The legacy strip behavior is preserved as an opt-in behind TDAI_STRIP_RELEVANT_MEMORIES_ON_WRITE=1. Closes #11.

Root cause

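The hook at index.ts:591 strips the <relevant-memories> block from user messages before they are written to the session JSONL. The hook's comment gives two reasons:

  1. Keep the transcript clean (replay should not see recall artifacts)
  2. Prevent an L0 feedback loop (recalled content must not be re-recorded by L0 as if the user had said it)

But (2) is already handled independently by sanitizeText() at src/core/conversation/l0-recorder.ts:254: every L0-bound message passes through sanitizeText, which strips <relevant-memories> (along with several other injected tags). The hook's strip therefore serves only (1), and it is the sole source of the cache miss in #11.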

Fix

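  • New default: the hook no longer strips <relevant-memories>, so the user message is written to the session JSONL intact → when a sub-agent / replay re-reads the transcript, it lines up with the in-memory effectivePrompt prefix → LLM cache hit.
  • Legacy fallback: TDAI_STRIP_RELEVANT_MEMORIES_ON_WRITE=1 restores stripping. The env var is read on every hook invocation, so switching at runtime does not require restarting the host.
  • Unchanged:
    • auto-recall.ts still injects <relevant-memories> into the current turn's user message
    • l0-recorder.ts: sanitizeText still strips memories at L0 capture (the real feedback-loop defense)
    • sanitize.ts itself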
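
The runtime toggle can be illustrated with a minimal sketch. The names isStripEnabled, Env, and STRIP_FLAG are hypothetical, and the environment is passed in as a plain object to keep the example self-contained; the PR states only that the literal value "1" enables stripping and that the variable is read on every hook invocation.

```typescript
// Hypothetical helper: not taken from the PR's diff.
type Env = Record<string, string | undefined>;

const STRIP_FLAG = "TDAI_STRIP_RELEVANT_MEMORIES_ON_WRITE";

// Look the flag up on every call (no caching), so a change to the
// variable takes effect on the next hook invocation.
function isStripEnabled(env: Env): boolean {
  // Strict comparison: only the literal "1" opts in to the legacy strip.
  return env[STRIP_FLAG] === "1";
}

const env: Env = {};
const whenUnset = isStripEnabled(env); // flag unset: hook stays a no-op

env[STRIP_FLAG] = "1";
const whenOne = isStripEnabled(env); // flipped at runtime, no restart needed

env[STRIP_FLAG] = "TRUE";
const whenLoose = isStripEnabled(env); // anything other than "1" counts as unset
```

Reading the value at call time rather than at registration time is what lets an operator flip the behavior on a running host.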

Theoretical cache-hit model

Consider an N-turn conversation in which each user message carries a 1-3 KB <relevant-memories> block. On the sub-agent / replay path:

| Strip behavior | LLM sees on turn N+1 | Cache prefix vs in-memory turn N |
| --- | --- | --- |
| Strip (current) | User msgs 1..N without memories | Differs in every user msg → miss |
| Don't strip (new) | User msgs 1..N with original memories | Identical → prefix cache hit |
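
The two rows can be reproduced with a toy comparison. This is an illustration only: real providers match cached prefixes on tokens, not characters, and the system prompt, separator, and message text below are made up.

```typescript
// Length of the shared leading run of two strings (character-level stand-in
// for the token-level prefix match a prompt cache performs).
function commonPrefixLength(a: string, b: string): number {
  const limit = Math.min(a.length, b.length);
  let i = 0;
  while (i < limit && a[i] === b[i]) i++;
  return i;
}

const system = "You are a helpful assistant.\n";
const memories = "<relevant-memories>user prefers dark mode</relevant-memories>\n";
const turns = ["How do I change the theme?", "And the font size?"];

// What the model saw in memory: every user message carries its memories block.
const inMemory = system + turns.map((t) => memories + t).join("\n---\n");

// What a sub-agent / replay reads back from the transcript under each policy.
const replayStripped = system + turns.join("\n---\n");
const replayKept = system + turns.map((t) => memories + t).join("\n---\n");

// Stripped: the prefixes diverge at the first user message.
const sharedWhenStripped = commonPrefixLength(inMemory, replayStripped);
// Kept: the replayed transcript matches the in-memory prompt in full.
const sharedWhenKept = commonPrefixLength(inMemory, replayKept);
```

In this toy setup the stripped replay shares only the system prompt with the in-memory prompt, while the unstripped replay shares everything.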

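Cost of not stripping: N × (1-3 KB) of extra prompt content. With N ≤ 30 and memories ≤ 3 KB, that is ≤ 90 KB ≈ 25 K tokens, far below modern context-window limits (128 K – 1 M) and clearly smaller than the latency and per-token price penalty of a first-token cache miss (a typical cache hit costs ~0.1× the uncached price).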
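
The estimate can be checked with a short back-of-the-envelope calculation. The bytes-per-token ratio (3.6) is an assumption chosen only because it reproduces the "90 KB ≈ 25 K tokens" figure, the 40 000-token base transcript is hypothetical, and the cost function assumes a cache hit bills the whole prefix at the cited ~0.1× rate; real tokenizers and provider pricing vary.

```typescript
// Assumption: ~3.6 bytes per token; 1 KB is taken as 1000 bytes.
const BYTES_PER_TOKEN = 3.6;

function extraTokens(turns: number, memoryBytesPerTurn: number): number {
  return Math.round((turns * memoryBytesPerTurn) / BYTES_PER_TOKEN);
}

// Worst case from the text: 30 turns, 3 KB of memories per turn.
const worstCaseBytes = 30 * 3 * 1000;
const worstCaseTokens = extraTokens(30, 3 * 1000);
const shareOf128k = worstCaseTokens / 128000;

// Relative input cost of re-sending a prefix, in units of the uncached
// per-token price. hitMultiplier defaults to the ~0.1x figure cited above.
function prefixCost(tokens: number, cacheHit: boolean, hitMultiplier = 0.1): number {
  return cacheHit ? tokens * hitMultiplier : tokens;
}

const baseTokens = 40000; // hypothetical transcript size without memories
const costStripped = prefixCost(baseTokens, false); // shorter prompt, cache miss
const costKept = prefixCost(baseTokens + worstCaseTokens, true); // longer prompt, cache hit
```

Under these assumptions the longer, cache-hitting prompt is cheaper than the shorter, cache-missing one; the comparison would flip only if the memories outweighed the rest of the prefix by roughly 9 to 1.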

Refactor

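Extract the hook callback into an exported pure function, maybeStripRelevantMemoriesOnWrite(message), and reduce the hook registration to a one-line wrapper. The helper can then be unit-tested independently of the OpenClaw runtime, and the hook's semantics ("inspect the message and decide whether to replace its content") map one-to-one onto the helper's return type ({ content } | null).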
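
A helper of this shape might look roughly like the sketch below. It is not the PR's code: the Message and TextPart types, the regex, and the explicit env parameter (the actual helper takes only message) are assumptions made so the example runs on its own.

```typescript
// Assumed message shapes; the real OpenClaw types are not shown in the PR.
type TextPart = { type: string; text?: string };
type Message = { role: string; content: string | TextPart[] };
type Env = Record<string, string | undefined>;

// Assumed pattern: one or more <relevant-memories> blocks plus trailing whitespace.
const MEMORIES_BLOCK = /<relevant-memories>[\s\S]*?<\/relevant-memories>\s*/g;

function stripMemories(text: string): string {
  return text.replace(MEMORIES_BLOCK, "");
}

// Returns replacement content, or null when the message should be written as-is.
function maybeStripRelevantMemoriesOnWrite(
  message: Message,
  env: Env,
): { content: string | TextPart[] } | null {
  // Default path: no-op unless the operator opted in with the literal "1".
  if (env["TDAI_STRIP_RELEVANT_MEMORIES_ON_WRITE"] !== "1") return null;
  // Role guard: only user messages carry injected memories.
  if (message.role !== "user") return null;

  if (typeof message.content === "string") {
    const stripped = stripMemories(message.content);
    return stripped === message.content ? null : { content: stripped };
  }

  // Parts content: rewrite only the parts that actually contain a block.
  let changed = false;
  const parts: TextPart[] = [];
  for (const part of message.content) {
    if (typeof part.text === "string") {
      const text = stripMemories(part.text);
      if (text !== part.text) {
        changed = true;
        parts.push({ ...part, text });
        continue;
      }
    }
    parts.push(part);
  }
  return changed ? { content: parts } : null;
}

const userMsg: Message = {
  role: "user",
  content: "<relevant-memories>likes tea</relevant-memories>\nWhat should I drink?",
};
const optIn: Env = { TDAI_STRIP_RELEVANT_MEMORIES_ON_WRITE: "1" };

const byDefault = maybeStripRelevantMemoriesOnWrite(userMsg, {}); // written unchanged
const legacy = maybeStripRelevantMemoriesOnWrite(userMsg, optIn); // memories removed
```

Because the function depends only on its arguments, each scenario can be exercised with a plain object and no runtime mock.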

Tests

Added src/__tests__/before-message-write.test.ts (main has no src tests yet; same approach as #39 / #42 / #51), with 11 cases:

| # | Scenario | Expected |
| --- | --- | --- |
| 1 | env unset + user msg contains memories | helper returns null |
| 2 | env=1 + string content + contains memories | stripped |
| 3 | env=1 + parts content + one part contains memories | only that part is stripped |
| 4 | env=1 + role=assistant + contains memories | untouched |
| 5 | env=1 + user msg without memories tag | returns null |
| 6 | env is "true" / "yes" / "0" / "1 " / "" / "TRUE" (anything other than the literal "1" is treated as unset) | returns null (it.each, 6 rows) |

✓ npx vitest run src/__tests__/before-message-write.test.ts → 11/11 passed

Compatibility

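  • Hermes plugin path: unaffected. Hermes fetches recalled content through the Gateway HTTP /recall endpoint and assembles its own prompt; it does not go through the OpenClaw before_message_write hook.
  • Claude Code plugin path: unaffected. The cc plugin injects recalled content via additionalContext in cc's UserPromptSubmit hook, which is fully independent of the OpenClaw hook.
  • OpenClaw users who depend on a clean transcript (log shipping / audit / standalone replay tools): set TDAI_STRIP_RELEVANT_MEMORIES_ON_WRITE=1 to restore the old behavior.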

Out of scope

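  • Measuring real LLM cache hit rates: there is no representative CI environment. The theoretical model is given above; the Tencent team can run their own A/B validation before merging if they wish.
  • Finer-grained per-turn stripping ("strip only the memories from turns other than the most recent"): the current OpenClaw hook interface cannot see the history, so this is not implementable.
  • Changes to the L0 capture path: it is already correct.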

DCO

The commit carries Signed-off-by: 李冠辰 <liguanchen@xiaomi.com>

…-friendly)

The before_message_write hook in index.ts unconditionally stripped
<relevant-memories> tags from user messages before persisting to the
session JSONL. This destroyed LLM prompt-prefix stability across
sub-agent / replay boundaries: each turn's in-memory effectivePrompt
contained the memories block but the same turn re-read from the JSONL
did not — every cross-boundary prompt suffered a first-token cache
miss. Reported in Tencent#11 by @yunhao-tech; similar direction suggested by
@changxu21-spec.

The hook's stated dual purpose was (1) transcript cleanliness and
(2) anti-feedback-loop. (2) is already handled independently by
sanitizeText() in src/core/conversation/l0-recorder.ts:254 on every
L0-bound message, so the hook strip is strictly about (1) and is
strictly the cause of the cache miss.

Default behavior change: the hook becomes a no-op. The transcript now
preserves <relevant-memories> blocks; prompt cache hits across
boundaries; long agent loops no longer accumulate first-token latency.

Opt-in legacy strip: set TDAI_STRIP_RELEVANT_MEMORIES_ON_WRITE=1. Env
is read on every hook invocation (no constructor-time caching), so
operators can flip behavior without restarting OpenClaw / Hermes.

Strip logic is factored into an exported helper
maybeStripRelevantMemoriesOnWrite(message) so it can be unit-tested
without mocking the OpenClaw runtime.

Tests: new src/__tests__/before-message-write.test.ts — 11 cases
covering env-unset default, env=1 strip for string/parts content,
role guard, no-tag short-circuit, and a 6-row it.each over
non-literal-"1" env values.

Unchanged:
- src/core/hooks/auto-recall.ts still injects <relevant-memories>.
- src/core/conversation/l0-recorder.ts still strips via sanitizeText
  on L0 capture (true anti-loop defense).
- src/utils/sanitize.ts itself.

Hermes plugin path and Claude Code plugin path are both unaffected
(neither goes through the OpenClaw before_message_write hook).

Closes Tencent#11.

Signed-off-by: 李冠辰 <liguanchen@xiaomi.com>

Successfully merging this pull request may close these issues.

before_message_write stripping <relevant-memories> lowers the prompt cache hit rate in multi-turn conversations
