Skip to content

fix(test-infra): hook pipe-blindness + hash assertion fix + EpisodeWorker baseline#2

Open
oinani0721 wants to merge 4 commits into
mainfrom
fix/test-infra-paralysis
Open

fix(test-infra): hook pipe-blindness + hash assertion fix + EpisodeWorker baseline#2
oinani0721 wants to merge 4 commits into
mainfrom
fix/test-infra-paralysis

Conversation

@oinani0721

Copy link
Copy Markdown
Owner

Summary

This branch lands fix-test-infra-paralysis (originally a7dd270, cherry-picked as f19dcff) plus two follow-up fixes based on ChatGPT Deep Research review:

  • Cherry-pick f19dcff: The main fix-test-infra-paralysis change — wrapper-based hook exit code propagation, 8 stale-mock skips, new EpisodeWorker baseline test file. Reduces backend unit failures from 256 → ~103 (~60%).
  • 7f495ed Frontend stryker wrapper: Closes the last residual pipe-blindness in .claude/hooks/post-tool-router.sh line 70, flagged by ChatGPT review.
  • 0fd6d39 Hash length assertion fix: Corrects test_id_format and test_batch_id_format assertions from +16 to +32 to match the actual sha256[:32] truncation in memory_service.py. Reduces failures 103 → 101.

This PR is not intended for merge yet — it exists to (1) trigger CI (test.yml + api-spec-sync.yml) so combined status becomes populated for ChatGPT review, and (2) give a review thread anchor.

Key review areas

  1. scripts/run_cmd_capture.sh (new in cherry-pick, 96 lines) — wrapper that captures stdout+stderr to /tmp via &>, preserves $?, prints tail on failure. No pipes internally.
  2. .claude/hooks/post-tool-router.sh — frontend stryker now routed through wrapper (3 backend pytest tiers were already routed in the cherry-picked commit). Knip moved into a cwd-isolating subshell.
  3. backend/tests/unit/test_episode_worker_retry.py (new in cherry-pick, 312 lines) — 5 baseline tests covering EpisodeWorker enqueue / exponential backoff / dead-letter / metrics / request_id propagation. Replaces semantics that used to be in MemoryService._write_to_graphiti_json_with_retry (deleted in Phase 2 refactor).
  4. 8 stale-mock skips (unit + integration) — each with reason string pointing to the new EpisodeWorker baseline.
  5. backend/tests/unit/test_story_30_10_idempotency.py — 4-line assertion fix (+ docstring "hash16" → "hash32").

Test plan

  • backend/tests/unit/test_episode_worker_retry.py — 5 PASS (baseline)
  • backend/tests/unit/test_story_30_10_idempotency.py::TestDeterministicEpisodeId::test_id_format — PASS (post-fix)
  • backend/tests/unit/test_story_30_10_idempotency.py::TestBatchDeterministicEpisodeId::test_batch_id_format — PASS (post-fix)
  • Full pytest backend/tests/unit/ -m "not integration" --ignore=...batch... — 101 failed / 8 errors / 48 skipped / 2506 passed (down from 103/8 baseline)
  • bash -n .claude/hooks/post-tool-router.sh — syntax clean
  • CI test.yml (Python 3.11 + 3.12) — will auto-trigger on PR
  • CI api-spec-sync.yml — will auto-trigger if openapi.json paths touched (not touched here; likely skip)

Out of scope (follow-up)

  • /tmp wrapper log cleanup policy (document in known-gotchas)
  • caplog handler stability with dynamic dictConfig (document as logging discipline)
  • 7-8 missing agent template .md files (needs sprint-level decision: restore or remove smoke test)
  • Remaining ~101 unit failures + 8 errors (agent_service / cache / calibration / batch — not production bugs per Agent 2 analysis, but deferred to separate sprint)
  • Branch protection / required status checks (needs user/admin GitHub settings change)

🤖 Generated with Claude Code

oinani0721 and others added 4 commits April 7, 2026 06:33
…-paralysis)

Two-phase fix for the post-fix-structlog-caplog-compat residual: 136 unit
test failures + 17 errors that fall into hook-blindness and stale-mock
patterns. Reduces backend unit failures from 136 to 102 (-34) and errors
from 17 to 8 (-9), with 47 more skipped. Combined with the earlier
fix-structlog-caplog-compat commit, total reduction from the 256-failure
baseline is ~57%.

═══════════════════════════════════════════════════════════════════════════
Phase 0 — Hook chain exit-code propagation
═══════════════════════════════════════════════════════════════════════════

Three hook layers (lefthook backend-smoke + frontend-test, post-tool-router
smoke + related + single-file, stop-test-runner) all piped pytest output
through `| tail -N` or `| head -N`. POSIX pipelines return only the
rightmost command's exit status; none of the hooks set `pipefail`. Result:
every hook silently passed even when pytest exited 1. The 256 baseline
failures persisted across the entire commit window 793cd53→3b96e49 because
all three guard layers were blind.

Files:
- scripts/run_cmd_capture.sh (NEW): pure bash wrapper. Captures full
  stdout+stderr to /tmp/run_cmd_capture_<pid>_<ts>.log via `&>`, prints
  `[TEST FAILURE] exit code: <N>` + temp file path + last N lines on
  failure, exits with the wrapped command's original exit code. Never
  uses pipes itself, so its own exit code equals the command's.
  Verified directly: exit 7 propagated to caller, tail captured 5 lines,
  pytest canary test (assert False) → wrapper exit 1 with full traceback.
- lefthook.yml: backend-smoke + frontend-test rewritten to invoke wrapper
  with --cwd backend/frontend --tail 120. Removes the dual `| tail -5`
  pattern that swallowed test failures.
- .claude/hooks/post-tool-router.sh: 3 pytest pipes (smoke tier, related
  tier, single-file tier) rewritten to wrapper. Deleted the 3 redundant
  `[ \$? -ne 0 ] && exit 1` checks since they were operating on tail's
  exit code (always 0). Vulture (no pipe) untouched.
- .claude/hooks/stop-test-runner.js: replaced execSync `| head -20` with
  wrapper invocation. stdio: "inherit" so the wrapper's [TEST FAILURE]
  block streams directly to user terminal without Node's 1MB maxBuffer
  truncating long tracebacks. Forced .venv/bin/python to avoid PATH
  ambiguity. Deleted the unreliable /FAILED|ERROR/.test(result) regex
  check (pytest --tb=line doesn't always include literal "FAILED" in
  truncated output). Exit 2 on failure (Stop hook protocol).

Phase 1 (AST 38 logger pos-args rewrite): CANCELLED. Ground-truth test
revealed that structlog.stdlib.BoundLogger does NOT raise on positional
args — it preserves %s placeholders in the event field and stores args
in a positional_args array. Zero of the 136 failures are caused by
logger pos-args. Phase 1 would have improved log schema ergonomics but
fixed no tests. Tracked as out-of-scope follow-up.

═══════════════════════════════════════════════════════════════════════════
Phase 2 — EpisodeWorker baseline + skip 8 stale-mock test files
═══════════════════════════════════════════════════════════════════════════

37 references to deleted MemoryService._write_to_graphiti_json_with_retry
(and a few to _write_to_graphiti_json) across 8 test files were producing
AttributeError noise. fix-rag-transform-and-episode-isolation Phase 2 had
moved retry/backoff/dead-letter semantics to GraphitiEpisodeWorker without
migrating these tests.

Files:
- backend/tests/unit/test_episode_worker_retry.py (NEW): 5 baseline tests
  for EpisodeWorker covering enqueue + process success, exponential backoff
  sleep series with full jitter (assert sleep[i] ∈ [0, 2**i), 60s cap),
  dead-letter on retries-exhausted (4 attempts → JSONL written + counters),
  WorkerMetrics field completeness (10 fields), and request_id propagation
  through EpisodeTask → DeadLetterStore JSONL record. All 5 PASS in 0.47s.
  Important debug lesson encoded as a module-level _ORIGINAL_ASYNCIO_SLEEP
  constant: patching app.services.episode_worker.asyncio.sleep affects the
  asyncio module singleton process-wide, causing infinite recursion in any
  side_effect that itself awaits asyncio.sleep. Capture the original at
  import time and use it inside the patch closure.

Stale-mock cleanup (per user authorization for module/class skip + reason):

- backend/tests/unit/test_memory_service_write_retry.py: module-level skip
  (whole file is 100% tests of deleted method internals)
- backend/tests/unit/test_graphiti_json_dual_write.py: module-level skip
  (single class, all tests reference deleted private methods)
- backend/tests/integration/test_dual_write_consistency.py: module-level skip
  (integration test of deleted dual-write path)
- backend/tests/unit/test_story_30_10_idempotency.py: class-level skip on
  TestEpisodesDedup, TestBatchEpisodesDedup, TestGraphitiJsonWriteDedup.
  PRESERVES TestDeterministicEpisodeId + TestBatchDeterministicEpisodeId
  (6 hash-only tests still active).
- backend/tests/unit/test_failure_observability.py: class-level skip on
  TestMemoryServiceDualWriteFailure. Other classes (TestDeadLetterRequestId,
  TestEdgeSyncFailureCounter, etc.) untouched.
- backend/tests/unit/test_qa_38_6_scoring_reliability_extra.py: class-level
  skip on TestFullCycleIntegration (uses ms._write_to_graphiti_json_with_retry
  attribute assignment).
- backend/tests/unit/test_story_38_6_scoring_reliability.py: class-level
  skip on TestAC3StartupRecovery (same attribute assignment pattern).
- backend/tests/integration/test_story_38_7_ac5_recovery_and_cross_story.py:
  class-level skip on TestAC5Recovery (patch.object on deleted method).

Each skip carries a reason string pointing at test_episode_worker_retry.py
for the equivalent semantics under the new pipeline. Files preserved (not
deleted) for blame/audit history.

═══════════════════════════════════════════════════════════════════════════
Verification
═══════════════════════════════════════════════════════════════════════════

Backend unit regression (excl integration/slow/e2e and the 87-error
test_story_30_11/30_13 batch files):

  Before fix-structlog-caplog-compat: 169 fail, 87 err, 2212 pass
  After fix-structlog-caplog-compat:  136 fail, 17 err, 2472 pass
  After fix-test-infra-paralysis P2:  102 fail,  8 err, 2473 pass, 48 skipped

Δ from baseline: -67 fail, -79 err, +261 pass, +47 skipped (~57% reduction
of failures+errors).

Remaining 102 failures + 8 errors are out-of-scope test debt:
- agent_service contract drift (test_agent_service_*.py, test_agent_context_*)
- agent_templates_smoke file-existence assertions
- cache_configuration / calibration_tracker / batch test fixtures
- Two test_id_format failures in test_story_30_10 (pure-hash logic, unrelated)

These will be addressed by separate changes (fix-batch-story-test-fixtures,
fix-agent-contract-drift) tracked under fix-test-infra-paralysis tasks.md
section 6.

OpenSpec:
- openspec/changes/fix-test-infra-paralysis/proposal.md, design.md,
  specs/test-infrastructure-resilience/spec.md, tasks.md created via
  npx openspec new + instructions workflow. Validates strict OK,
  4/4 artifacts complete. (gitignored — local working state per project
  convention; will be archived to openspec/specs/ in a separate step
  after human spec review.)
- New capability test-infrastructure-resilience codifies 5 requirements:
  hook chains MUST surface non-zero exit codes; hooks MUST persist full
  output to disk; structlog.stdlib.BoundLogger callers MUST use kwargs;
  tests MUST NOT mock deprecated retry symbols; AST cleanup script MUST
  be dry-run by default with whitelisted target files.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
(cherry picked from commit a7dd270182b455b270526343bd91c27138866059)
ChatGPT Deep Research review of f19dcff flagged the only remaining
pipe-blindness in the hook chain: `.claude/hooks/post-tool-router.sh`
line 70 still invoked `npx stryker run 2>&1 | tail -20` followed by
a useless `[ $? -ne 0 ]` check (reading tail's exit code, always 0).

Fix: route stryker through the existing `scripts/run_cmd_capture.sh`
wrapper already used by the three backend pytest tiers in the same
file, so exit code propagation becomes mechanism-guaranteed rather
than convention-dependent. Knip (no pipe in its invocation) stays
direct but moves to a cwd-isolating subshell so no cross-block cd
side-effects remain.

With this change, every test-output pipeline in the hook chain uses
the wrapper, completing the a7dd270 fix-test-infra-paralysis scope.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
… 32)

test_story_30_10_idempotency.py's test_id_format and test_batch_id_format
asserted `len(prefix) + 16`, but the production code in
backend/app/services/memory_service.py uses `hashlib.sha256(...).hexdigest()[:32]`
(lines 84 and 100), producing 32 hex chars. The assertions were stale
from a pre-merge design where hash truncation was 16 hex.

Docstrings were also updated from "hash16" to "hash32" to match code.

No production code change — the 32-char hash is the intended current
state (collision resistance) and other tests in the same file (e.g.
test_same_event_same_id) already pass with the 32-char output.

Reduces backend unit failures from 103 to 101 (cherry-pick baseline:
post-f19dcff was 103/8 not 102/8 as estimated in plan).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Motivation: PR #2 had merge conflicts blocking GitHub's pull_request
workflow trigger (GitHub Actions cannot create the refs/pull/2/merge
ref when the merge fails, so the workflow is silently skipped with
zero runs visible in the Actions API or PR Checks UI).

Conflict location: lefthook.yml pre-push backend-smoke block.

Both sides made complementary (not opposing) changes:
- origin/main (2026-04-07 auto-sync): narrowed backend-smoke scope to
  A11 regression suite (test_kg_relevance_weighted + test_a11_kg_relevance_e2e,
  30 tests, ~1s) to avoid the pre-existing 136-failure debt, and
  switched from `| tail -5` to explicit `$?` capture + `exit $TEST_EXIT`
  so pipe-blindness cannot swallow pytest exit codes.
- fix/test-infra-paralysis (f19dcff): routed both frontend-test and
  backend-smoke through scripts/run_cmd_capture.sh so POSIX pipe exit
  code propagation becomes mechanism-guaranteed.

Resolution: kept origin/main's backend-smoke verbatim (narrow A11 scope
+ $? capture pattern — no pipe means wrapper is redundant) while
preserving fix branch's frontend-test wrapper change (main did not
touch frontend-test, so the pipe-blindness fix is still needed there).

Local verification (from worktree, absolute venv path):
    .venv/bin/python -m pytest tests/unit/test_kg_relevance_weighted.py \\
        tests/e2e/test_a11_kg_relevance_e2e.py -q --tb=line --no-header \\
        -p no:cacheprovider --override-ini="addopts="
Result: 32 passed in 1.27s (matches main's stated "30 tests, ~1s" baseline).

YAML validation (python -c 'import yaml; yaml.safe_load(...)'): OK.

After this merge commit is pushed, PR #2's merge state should flip to
MERGEABLE, GitHub will generate refs/pull/2/merge, and the pull_request
workflow trigger (test.yml + api-spec-sync.yml) should fire on the
merge ref for the first time in this repo's history.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
oinani0721 added a commit that referenced this pull request Apr 10, 2026
…positive

L5-#2 regression: previously the /api/v1/health endpoint only checked
neo4j_stats["initialized"] before dispatching `RETURN 1 AS ping`. When
Neo4jClient auto-fell-back to JSON mode (initialized=True, mode=JSON_FALLBACK),
the ping query was silently routed to _run_query_json_fallback, which no-ops
for simple queries instead of raising. Result: components.neo4j == "ok" even
though Neo4j was unreachable — false-green monitoring.

Fix: mirror the production 4-way classification from
app/services/memory_service.py:969-987 (Story 30.3 fix). Now we only treat
the client as healthy when stats["mode"] == "NEO4J" AND health_status is True;
JSON_FALLBACK is reported as "json_fallback" (operational, not error);
otherwise "degraded" or "not_initialized".

HTTP 200 unchanged at top level — JSON_FALLBACK is operational on Tauri
desktop sidecar so we don't fail the contract; the signal flows through
components.neo4j only. Frontend useBackendStatus.ts only reads top-level
status, so this is safe.

Tests: TestHealthCheckNeo4jFallback (4 cases) — fully self-contained via
monkeypatch of get_neo4j_client; does not depend on broken backend/tests/unit
fixtures.

Plan reference: Plan v25 Option C (L5-#2)
Pattern source: backend/app/services/memory_service.py:969-987

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
oinani0721 added a commit that referenced this pull request Apr 10, 2026
Update phase-1-day-1-spike-results.md to reflect that both Critical L5
findings from Spike 1 are now resolved:

- L5-#1 Graphiti fire-and-forget task leak: FIXED in 990e958 via pre-flight
  Neo4j probe in GraphitiEpisodeWorker.initialize_graphiti. The PRD's original
  assumption that this was our own fire-and-forget bug was wrong — the real
  root cause is graphiti-core v0.28.2 library internals at
  neo4j_driver.py L91-101. Replaced root cause analysis with the verified
  explanation including the exact library file:line reference.

- L5-#2 Health endpoint false-positive: FIXED in 1f170a6 via 4-way mode-aware
  classification in health.py. The PRD's assumption that the endpoint was
  "missing a ping check" was wrong — the check was already there, but the
  Neo4jClient JSON_FALLBACK auto-fallback caused the ping to silently no-op
  through _run_query_json_fallback. Replaced root cause analysis with the
  verified false-positive chain from Neo4jClient.py:450-452.

- Decision Matrix table: marked both findings as FIXED with commit SHAs.
- Cross-References: added Plan v25 Option C references, memory_service.py
  pattern source, and graphiti_core root cause file pointer.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
oinani0721 added a commit that referenced this pull request Apr 19, 2026
用户刚需:R4 工作流中 ship hands-on demo 和用户 UAT 批注两个环节之前靠
claude 记性,这次 story 1.16 v1 后忘记 ship 验收单被用户发现。固化三层防御:

L1 硬规则 (_bmad-output/.claude/CLAUDE.md):
  - dev-story Definition of Done 升级为 3 项(技术 + spec + R4 ship)
  - 9 项 dod 自检清单必跑
  - claude code 行动规则第 4 条改为"必 ship 验收单 + 通知用户"
  - 用户视角动作 #2 从"看 story spec 的 ## uat script"
    改为"在 obsidian 打开 canvas-vault/验收单/story-{id}-*.md"
  - r4 6 环节 × dod 映射表明确 r4-4/5/6/7 四个环节的强制对应

L2 固定模板 (_bmad-output/templates/uat-sheet-template.md):
  - 7 段结构:目标 / behavior / 交互 / uat / 结果 / 批注 / spec trace
  - 含 {placeholder} 占位,claude 复制后填充
  - 含 correct-course 触发的覆盖更新协议(不新建,追加历史 callout)

L3 (待拍板):stop hook 技术保障
  - 检测上轮有 review 状态变更但 canvas-vault/验收单/ 无新文件 → 阻断
  - 等用户确认是否值得增加 hook 复杂度

story 1.16 验收单 (canvas-vault/验收单/story-1.16-批注-hotkey.md) 作为
首次样例,已按 7 段结构 ship 并含 v1→v2 历史追溯。

story: 1.16
PLAN-EPIC1-BMAD-DEV-ASSESS-2026-04-17

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
oinani0721 added a commit that referenced this pull request May 3, 2026
新增独立 service 模块(spec 偏离:替代扩展 1161 行的 context_enrichment_service.py,符合 SOLID):
- backend/app/services/wikilink_context_service.py (180 行)
  · enrich_from_wikilink_graph(node_path, max_hops=2, timeout_ms=200)
  · WikilinkNeighborContext dataclass (slug/path/hop/relationship_type/frontmatter/content_summary)
  · EnrichmentResult dataclass (degraded 标记)
  · _extract_relationship_type: 从 frontmatter relationships[] 提取目标关系
  · _normalize_target_slug: vault 路径 → basename
  · 降级路径:graph_not_built / traversal_timeout / unexpected_error 全部 degraded=True 不抛异常

- backend/tests/unit/test_wikilink_context_service.py (210 行, 19 cases all green)
  · _normalize_target_slug × 4
  · _extract_relationship_type × 7(含 malformed 防御)
  · enrich_from_wikilink_graph × 8(含正常 / 孤立 / 异常 / 排序)

完成 AC #1 (2-hop 遍历) + AC #2 (关系类型提取) + AC #5 (降级处理)。
依赖 Story 1.3 wikilink_graph_service.get_neighbors(已 done in commit 4e0c27b)。

剩余 Task 2/3/5.3/5.4/6:ChatContextAssembler + Skill workflow + 集成测试 + UAT 验收单

PLAN-NNN: EPIC1-BMAD-DEV-ASSESS-2026-04-17

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
oinani0721 added a commit that referenced this pull request May 3, 2026
…(25 tests green)

backend/app/services/chat_context_assembler.py (240 行):
- ChatContextAssembler 类
- assemble_context(current_note, neighbors, token_budget) — 5 优先级填充
  Priority 1: 当前笔记全文(最高,不可压缩)
  Priority 2: 1-hop frontmatter + Tips + errors
  Priority 3: 1-hop content_summary
  Priority 4: 2-hop frontmatter
  Priority 5: 2-hop content_summary
- compress_content(text, max_tokens) — atomic 块保护($$...$$ / $...$ / ```...```)
- _extract_atomic_blocks / _restore_atomic_blocks: placeholder 替换防破坏
- count_tokens: tiktoken cl100k_base(fallback 到 chars/4)
- token_budget: 默认 8192 / 环境变量 CHAT_CONTEXT_TOKEN_BUDGET 覆盖
- 返回 AssembledContext (text/used_tokens/budget/truncated/sections_included)

backend/tests/unit/test_chat_context_assembler.py (210 行, 25 cases all green):
- _resolve_token_budget × 4 (default / override / env / invalid env)
- _extract_atomic_blocks × 4 (LaTeX $$/单$/代码块/混合)
- _restore_atomic_blocks roundtrip + _drop_orphan_placeholders
- count_tokens × 3 (basic/empty/中文)
- compress_content × 5 (within budget/zero/LaTeX 保护/代码块保护/句子边界)
- assemble_context × 7 (current_note 优先/1hop+2hop 分组/summary/budget 不足/dataclass)

完成 AC #2 (上下文组装) + AC #3 (token 预算压缩 + 公式/代码块保护)。
累计 Story 2.1 单元测试:44/44 全 green。

剩余 Task 3 (Skill workflow + REST endpoint) + Task 5.3-5.4 (集成测试) + Task 6 (验收单)。

PLAN-NNN: EPIC1-BMAD-DEV-ASSESS-2026-04-17

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
oinani0721 added a commit that referenced this pull request May 4, 2026
PLAN-EPIC2-STORY2.1-PHASE-1.7+

ChatGPT 对抗审查 cefabb2 找到 5 个 P0 + HIGH (评分 4/10). 4 路并行
Explore agent 验证全部成立, 全部修复 + 锁定 regression test.

P0#1 — chat.py 没传 trace=enrichment.trace 致 manifest 永远 trace_unavailable
  Fix: chat.py:156 trace=enrichment.trace (本地 unstaged 早已存在, 漏 commit 进 cefabb2)

P0#2 — WikilinkGraphService 缺 build_timestamp 字段, getattr fallback 永远 unbuilt
  Fix: graph_service 加 _build_timestamp + property + get_stats (同上, unstaged 漏 commit)

P0#5 — _CALLOUT_PATTERN 贪婪 regex 吞并相邻 callout
  Repro: > [!tip]+ A\n> a\n> [!error]+ B\n> b → 1 个 callout (期望 2)
  Fix: line scanner 替换 regex (O(n) 无 backtracking), _extract_body_excerpt 同改
  Test: 5 regression (相邻/blank quote line/3 连续/code fence/relation 噪音)

P0-A — _read_neighbor_md 信任 absolute path 可读 vault 外文件
  Fix: _resolve_vault_md_path sandbox (resolve strict + relative_to(root)
       + .md suffix check + 1MB DoS cap)
  Test: 4 regression (vault 外/dotdot escape/非 .md/超大文件)

P0-B — _format_neighbor_metadata body 行未 escape, 可注入 </neighbor><system>
  攻击载荷可闭合 <neighbor> 标签 + 注入伪 system 块绕过 <context_policy>
  Fix: 加 _xml_text_escape (含 control char 清理) + 应用到所有 user-content 行
       (rel_value/type/tips/errors/callout kind+title+content/summary snippet)
  Test: 3 injection regression (callout/summary/relation_type) + 专门
        test_security_p0_vulnerabilities.py 文件

实测 curl /api/v1/chat/enrich-context (节点/Fundamentals.md, 2 邻居):
- Token: 644 → 672 (escape 略增, 功能完整)
- Graph version: 2026-05-04T05:08:16+00:00 真实 ISO (eager build OK)
- Included: 2 | Degradations: none
- callout 段 + summary 段装载正常, 用户实测 manifest 与之前一致

测试: 102 pytest passed (68 → 102, +34 P0 regression)

致谢 ChatGPT 反馈: 找到本地 unstaged 修复漏 commit (P0#1/#2) +
line scanner 边界 bug (P0#5) + 实际 prompt injection 攻击载荷 (P0-B) +
path traversal 设计缺陷 (P0-A). cefabb2 评分 4/10 → 本 commit 期望
7/10 (剩余 HIGH: timeout / except 过宽 / 多 worker race / regex DoS
等需独立 follow-up).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
oinani0721 added a commit that referenced this pull request May 4, 2026
PLAN-EPIC2-STORY2.5-T2-T3

Story 2.5 错误自动提取与分类 — Task 2 (4 主类) + Task 3 (补救策略) 落地.
4 类映射决策选 D 方案 (扩展不破坏): 现有 ErrorType (Story 3.6 production
data) 保留, 新增 PedagogyErrorType (PRD §FR-CONV-06) 双标签共存.

新增 (entity_types.py):
- PedagogyErrorType enum: conceptual_confusion / procedural_error /
  careless_slip / metacognitive_error (PRD AC #2)
- RemedyStrategy 加 2 项: DISCRIMINATION_COMPARISON (辨析+对比) +
  TRANSFER_SELF_EXPLANATION (迁移+自我解释) — 对齐 PRD AC #3
- PEDAGOGY_TYPE_TO_REMEDIES: PRD 4 类 → 补救策略 list
- LEGACY_TO_PEDAGOGY: legacy 4 类 → PRD 4 类静态映射
- disambiguate_superficial(): SUPERFICIAL 二义消解 (sub_tag 优先 + 关键词)
- map_legacy_to_pedagogy(): 统一映射函数 (含 SUPERFICIAL 拆分)

新增 (error_classifier.py):
- ClassifiedError pydantic model: legacy_type + pedagogy_type 双标签
  + legacy_remedy + pedagogy_remedies + sub_tags + is_ambiguous property
  (confidence < 0.6 = AMBIGUOUS, PRD AC #2)
- ErrorClassifier.classify_with_pedagogy(): 双标签分类入口
- ErrorClassifier._llm_classify_with_confidence(): 同时拿 ErrorType + confidence

向后兼容: 现有 ErrorType / ERROR_TYPE_TO_REMEDY / ClassificationResult /
classify() 全部保留, Story 3.6 production data 不破坏.

SUPERFICIAL 二义消解规则:
- sub_tag 含 transfer_failure / metacognitive / overconfidence → METACOGNITIVE
- description 含 迁移/应用/新场景/transfer/过度自信 → METACOGNITIVE
- 否则默认 → CONCEPTUAL_CONFUSION

Tests +24 (全 PASS):
- 4 LEGACY→PEDAGOGY 映射 (含 PROBLEM_FRAMING→CARELESS_SLIP, KNOWLEDGE_GAP
  →CONCEPTUAL_CONFUSION 等关键映射)
- 6 SUPERFICIAL 二义消解 (默认/sub_tag 优先/关键词触发/sub_tag>关键词)
- 6 PEDAGOGY→REMEDY 完整性测试 + 向后兼容
- 7 classify_with_pedagogy 双标签结果验证 (含 AMBIGUOUS 标记 / 4 主类
  完整覆盖 / 双 remedy 关联)
- 1 现有 classify() 行为不变测试

Story 2.5 剩余 task (待续 commit):
- Task 1: error_extractor.py 对话错误提取
- Task 4: 双写 frontmatter + Graphiti (record_error MCP)
- Task 5: Skill workflow /chat-with-context 集成
- Task 6: e2e 集成测试

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
oinani0721 added a commit that referenced this pull request May 4, 2026
…e 集成测试

PLAN-EPIC2-STORY2.5-SHIPPED

Story 2.5 错误自动提取与分类 — 全部 6 task 完成 ✅, 标记 done.

Task 5 — record_error MCP tool 升级 (双标签 + 双写)

backend/app/mcp/tools/error_tools.py 改造:
- RecordErrorInput 加 sub_tags 字段 (Story 2.5 SUPERFICIAL 二义消解)
- RecordErrorOutput 加 5 个新字段 (向后兼容):
  · pedagogy_type (PRD §FR-CONV-06 4 主类)
  · pedagogy_remedies (list[str])
  · confidence (LLM 分类置信度)
  · is_ambiguous (confidence < 0.6 → True, PRD AC #2)
  · frontmatter_written / graphiti_status
- 新增 _resolve_node_file_path(): 从 node_id 推断 vault_root/X.md 文件路径
  (从 settings.canvas_base_path 解析, 失败返回 None)
- record_error() 重写:
  · 调 classify_with_pedagogy() (Task 2 双标签) 替代 legacy classify()
  · 调 write_error_dual() (Task 4) 同步 frontmatter + fire-and-forget Graphiti
  · 保留全部 Story 3.6 legacy 字段 (error_type / remedy_strategy /
    error_type_label / remedy_description) 不破坏向后兼容
  · graphiti_status: scheduled (fire-and-forget) | ok | failed |
    skipped_frontmatter_failed | not_attempted
- AC #4 + #6: frontmatter 本地优先, Graphiti 失败仍 recorded=True

Task 6 — e2e 集成测试 (5 tests, 全 PASS)

backend/tests/integration/test_error_extraction_e2e.py 新增:
- test_e2e_dialog_to_frontmatter_full_pipeline:
  完整链路 dialog → ExtractErrorsFromDialog → classify_with_pedagogy
  → write_error_dual → frontmatter 含双标签 (legacy + pedagogy)
  + Graphiti record_knowledge_entity 被调用
- test_e2e_dialog_no_errors_no_writes: 无错误对话 → 无写入 (AC #5)
- test_e2e_record_error_mcp_tool_full_pipeline:
  MCP tool input → SUPERFICIAL + sub_tag transfer_failure 触发
  二义消解 → pedagogy_type=METACOGNITIVE_ERROR (D 方案核心验证)
  → frontmatter 含 transfer_self_explanation remedy
- test_e2e_record_error_low_confidence_marked_ambiguous:
  confidence 0.45 → is_ambiguous=True (PRD AC #2)
- test_e2e_record_error_graphiti_failure_frontmatter_succeeds:
  AC #6 验证 — memory_service ImportError → frontmatter 仍写入,
  recorded=True, graphiti_status=scheduled

Story 2.5 spec status: ready-for-dev → ✅ done

测试: 全套 162 pytest passed (含 Story 2.5 共 55 tests:
24 mapping + 11 extractor + 15 writer + 5 e2e)

Story 2.5 全 6 task ship 总览:
- ✅ Task 1: error_extractor.py (LLM 对话错误提取, AC #1, #5)
- ✅ Task 2: 4 主类 + 双标签 D 方案 (PRD AC #2)
- ✅ Task 3: 补救策略 PedagogyErrorType → RemedyStrategy 映射 (PRD AC #3)
- ✅ Task 4: error_writer.py 双写 (frontmatter atomic + Graphiti retry, AC #4 + #6)
- ✅ Task 5: record_error MCP tool 升级 (向后兼容, 双标签 + 双写)
- ✅ Task 6: e2e 集成测试 (5 tests, 全链路 + 边界场景)

EPIC 2 进度:
- Story 2.1 ✅ done (commit bfe0ef2)
- Story 2.5 ✅ done (commit dad9ed7 + d7621f4 + 57aa3bd + this commit)
- 进度 22% (2/9 stories, 20h/70h)
- 剩余: 2.2 / 2.3 / 2.4 / 2.6 / 2.7 / 2.8 / 2.9

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
oinani0721 added a commit that referenced this pull request May 4, 2026
…on d 方案

PLAN-EPIC2-STORY2.5-DEEP-RESEARCH

生成 ChatGPT deep research prompt 文档供用户复制粘贴用. 不同于
Story 2.1 三轮对抗审查 (4/10→7/10→8/10), 本次 Story 2.5 采用
deep research 模式 (学术 + 产业 + 改进方向综合研究).

文档结构:
- System prompt: 教育心理学 + 学习科学 + AI 工程 跨领域 staff researcher
- 项目坐标: GitHub URL + 分支 + HEAD commit (268c9aa) + 关键文件路径
- PRD §FR-CONV-06 期望 + AC #2/#4/#6
- D 方案核心代码片段:
  · ErrorType (legacy) + PedagogyErrorType (PRD) 双 enum
  · LEGACY_TO_PEDAGOGY 静态映射
  · disambiguate_superficial() SUPERFICIAL 二义消解
  · PEDAGOGY_TYPE_TO_REMEDIES 补救策略
  · ClassifiedError 双标签数据模型
  · EXTRACTION_PROMPT LLM 提取 prompt
  · write_error_to_frontmatter / write_error_to_graphiti
  · record_error MCP tool 输入输出 schema
- 55 tests 基线 + 4 测试文件分布
- 4 个 deep research 核心问题:
  Q1 学术对齐度: Bloom / BKT/DKT / VanLehn / Chi / Ericsson
  Q2 产业实践对比: Khan Academy / Duolingo / NotebookLM / Anki AI 等
  Q3 D 方案评估: 弱点+改进+方案选择
  Q4 Phase 2 ROI: FSRS / Hub penalty / 跨概念模式 / Tip 生成
- 输出 format 模板 (table + 评分)

使用方式: 复制粘贴给 ChatGPT (推荐 GPT-5/o3 with deep research mode).
不是普通 code review, 不是审查 P0 安全问题. 是综合研究学术对齐 +
产业实践 + 改进路径.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
oinani0721 added a commit that referenced this pull request May 4, 2026
基于 ChatGPT Round-2 reply (commit 348a7ae) 锁定的双 spec, ready-for-dev:

Story 2.5.X — 用户主权回归 C+ 方案 (471 行, 18-24h, P0)
  · trace: FR-CONV-06 + Decision-Review-D15 (待用户 PRD §12 批注)
  · AC #1: AI 候选写 frontmatter error_candidates[] 不直接进 errors[]
  · AC #2: 6 状态机 (pending/accepted/edited/dismissed/disputed/expired)
  · AC #3: dedupe 不重复添加, hash 不含 session_id (跨 session 同错应 update 不 append)
  · AC #4: 非阻塞 Notice + Dashboard Dataview 保活 ("待复盘 N 条")
  · AC #5: POST /api/v1/errors/accept-candidate (candidate → errors[] + Graphiti)
  · AC #6: POST /api/v1/errors/rebuild-graphiti?group_id=... 兜底机制
  · AC #7: dismissed/disputed 路径 (用户否决 AI, dispute_reason 必填)
  · 10 Tasks 含 candidate writer / 状态机 / accept/dismiss/dispute/rebuild endpoints
    + Dashboard Dataview / Plugin 命令 / session_id 注入 / expired 自动归档
  · 依赖 Story 2.5 (commit 0d05ad8)
  · UAT 6 场景 + 7 自动 checkpoints

Story 2.5.Y — 隔离硬化 SubjectConfig 复用 (494 行, 26-35h, P0)
  · trace: FR-CONV-06 + FR-CTX-08 + Decision-Review-D16 (待用户 PRD §12 批注)
  · AC #1: PostTurnExtractRequest 强制 vault_id 字段 (缺则 422)
  · AC #2: 复用 SubjectConfig.build_group_id() 派生 group_id
  · AC #3: error_writer.py:270 移除 DEFAULT_GROUP_ID 硬编码 (group_id 必填参数化)
  · AC #4: LanceDB 强制注入 WHERE group_id 过滤 (修 vault_notes_retriever)
  · AC #5: Cypher 防御性 helper cypher_with_group_filter()
  · AC #6: group_id 命名统一为 vault:<vault_id>[:<sub>] 格式 (弃 cs188/canvas-dev)
  · AC #7: per-group export/rebuild 脚本 + idempotency
  · AC #8: E2E 两 vault 同名节点不串 (三层 Cypher/LanceDB/Graphiti 隔离)
  · 10 Tasks 含 SubjectConfig 强化 / vault_id 字段 / 硬编码移除 / 隔离审计
    + 命名迁移 / export-rebuild 脚本 / Plugin 端 vault_id 注入 / 文档更新
  · 依赖 Story 2.5 + 2.5.X
  · UAT 8 场景 + 7 自动 checkpoints

合计工作量: 44-59h (Round-1 77-104h 减 43%)
commit-ready 程度: 8.5/10 (ChatGPT Round-2)

下一步:
- 用户在 PRD §12 批注 D15 (用户主权方案 = C+) + D16 (隔离方案 = 复用 SubjectConfig)
- 用 bmad-bmm-dev-story skill 启动 Story 2.5.X 实施 (先 X 后 Y)
- Y 在 X 基础上加隔离硬化, 改 X 的 endpoint 签名时同步更新 X 测试

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
oinani0721 added a commit that referenced this pull request May 5, 2026
Story 2.5.X Task 2/10 完成 (AC #2: 6 状态机实现)

实施 (Task 2.1-2.5):
- backend/app/services/candidate_state_machine.py 新增模块:
  · CandidateStatus Literal type (pending|accepted|edited|dismissed|disputed|expired)
  · ALLOWED_TRANSITIONS 图: pending → 5 终态全合法, 终态间 0 转换
  · validate_status_transition(current, target) → HTTPException 422 (友好 error message 三类)
  · apply_status_change(candidate, target, *, changed_by="user")
    - 校验 + in-place mutation
    - 自动写 status_changed_at (ISO 8601 timestamp)
    - 自动写 status_changed_by ("user" 默认 / "system" cron 用)
  · is_terminal_status / is_active_status helpers (Dashboard 过滤)

测试 (41 新增):
- 6 状态全包含
- 5 合法 pending→X (parametrize)
- 9 非法 (5 反向 + 4 终态间, parametrize)
- unknown current/target → 422
- terminal state error message 含 "terminal state"
- ISO 8601 时间戳格式正则验证
- changed_by 默认 "user" / 显式 "system"
- in-place mutation 验证
- 业务场景: accept/dispute/expire workflow + double accept 拒绝

回归: 110 测试全 pass (41 state_machine + 16 candidate_writer + 53 v1.0)

剩余 Story 2.5.X Tasks (8/10):
  Task 3: accept_candidate endpoint
  Task 4: dismiss/dispute endpoints
  Task 5: rebuild_graphiti endpoint + chat.py 切 candidate_only
  Task 6: Dashboard Dataview 保活
  Task 7: Plugin 命令 + Notice + SuggestModal
  Task 8: session_id E2E 验证 (已部分覆盖)
  Task 9: expired 30 天 cron
  Task 10: 集成测试 + UAT

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
oinani0721 added a commit that referenced this pull request May 5, 2026
…→ review

Story 2.5.X (D15 用户主权 C+) 全量 ship → status=review

Tasks 完成:
- Task 8: session_id E2E 验证 (跨 session 累加 seen_sessions)
- Task 9: expired 30 天自动归档 cron
- Task 10: 集成测试 (10 E2E + 各 service 单测覆盖完整)

实施:

backend/app/services/candidate_expiry_service.py (~210 行新文件):
- expire_pending_candidates(vault_root, *, expiry_days=30, now=None) → ExpireStats
  · 扫描 vault 节点/*.md (复用 _scan_vault_md_files)
  · pending + created_at < cutoff → apply_status_change("expired", changed_by="system")
  · 幂等性: 已 expired 不再处理
  · per-file lock 复用 (_get_file_lock)
  · 仅当有改动时写文件 (避免无意义 mtime 更新)
  · 单条失败不中断, 记入 failures[]
- _parse_created_at: ISO 8601 / Z 后缀 / naive / None 容错
- _is_expired: status=pending AND created_at < cutoff (无 created_at 保守跳过)
- ExpireStats / ExpireFailure Pydantic schemas

backend/tests/unit/test_candidate_expiry_service.py (20 测试):
- _parse_created_at: 4 容错场景
- _is_expired: 5 边界 (old/recent/non-pending/no created_at)
- expire 主流程: old → expired / recent 不变 / 终态跳过
- 幂等性: 第二次跑 0 expired
- 跨文件批量
- 无 created_at 保守跳过
- 仅当有 expire 时写文件 (mtime 不变)
- DEFAULT_EXPIRY_DAYS = 30 + cutoff_iso 写入 stats

backend/tests/integration/test_2_5_x_e2e.py (10 E2E 测试):
- E2E #1: full accept (write → accept → errors[] + Graphiti queued)
- E2E #2: accept with edits → status=edited + edits 应用到 errors[]
- E2E #3: dismiss path → 不入 errors[]
- E2E #4: dispute path → dispute_reason 持久化
- E2E #5 (Task 8): session_id 跨 3 session 累加 → 1 candidate + seen_sessions={s1,s2,s3}
- E2E #6 (Task 9): expired 30 天后 cron 归档 → status=expired + changed_by=system
- E2E #7 (Task 5): rebuild_graphiti 从 errors[] 重建
- E2E #8: rebuild dry_run 仅扫描计数
- E2E #9: 双重 accept 反向不可逆 → 422
- E2E #10: dismiss → accept 终态间被拒 → 422

测试累计:
- Backend: 167 passed (113 Story 2.5.X + 54 v1.0 回归)
  · 16 candidate_writer (Task 1)
  · 41 state_machine (Task 2)
  · 14 candidate_service (Task 3+4)
  · 13 rebuild_service (Task 5)
  · 20 expiry_service (Task 9)
  · 10 e2e (Task 8+10)
  · 53 v1.0 (error_writer + extractor + classifier + ChatGPT regression)
- Plugin: 104 passed (19 helpers + 85 v1.0)
- TOTAL: 271 全 pass, 0 fail

sprint-status: 2-5-x in-progress → review

Story 2.5.X 完整交付:
- 4 backend endpoint: accept/dismiss/dispute/rebuild-graphiti
- 3 plugin command + 2 Modal class + helpers 模块
- Dashboard "📋 待复盘错误候选" Dataview section
- candidate_expiry_service cron (lifespan hook 集成留 2.5.Y)
- 6 状态机 + dedupe + per-file lock + 原子写入
- frontmatter error_candidates[] + errors[] 双数组并存
- session_id 透传 + seen_sessions[] 累加

下一步: 用户 UAT (Phase A 半手动 demo + 完整 7 命令测试) → done

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
oinani0721 added a commit that referenced this pull request May 5, 2026
… 移硬编码

Story 2.5.Y (D16 隔离硬化) Tasks 1-3/10 完成 (并行 2.5.X review)

Task 1 - 复用并强化 SubjectConfig (AC #2):
- backend/app/core/subject_config.py:
  · 新增 build_vault_group_id(vault_id, subject_id, canvas_path)
  · 强制 vault: 前缀命名 (vault:cs_61b / vault:数学 / vault:cs_61b:algorithms)
  · vault_id 必填 (空抛 ValueError)
  · subject_id 优先于 canvas_path (互斥)
  · canvas_path 复用 extract_canvas_name 提取 stem
  · 新增 is_vault_group_id() helper (检测新格式 vs legacy)
- 旧 build_group_id 保留向后兼容 (Story 1.9 production data)
- 测试: 21 新增 test_subject_config_vault.py
  · vault_id 必填 (空/whitespace/None 抛错)
  · 基础组合 (vault_id 单/中文/sanitize)
  · subject_id 二级隔离
  · canvas_path stem 提取 + .canvas 扩展
  · 互斥性 (subject 优先于 canvas)
  · 与旧 build_group_id 区分性
  · is_vault_group_id 识别新旧格式

Task 2 - PostTurnExtractRequest 加 vault_id 字段 (AC #1):
- backend/app/api/v1/endpoints/chat.py:
  · PostTurnExtractRequest 加 vault_id: str = Field(..., min_length=1) 必填
  · 加 subject_id / canvas_path 可选字段
  · post_turn_extract endpoint 入口调 build_vault_group_id + set_current_subject_id 注入 ContextVar
- 测试: 7 新增 test_post_turn_request_vault_id.py
  · vault_id 必填校验 (缺/空/None → ValidationError)
  · subject_id / canvas_path 可选默认 None
  · 中文 vault_id 通过
  · v1.0 校验仍生效 (messages min/max + total_chars budget)

Task 3 - error_writer 移除 DEFAULT_GROUP_ID 硬编码 (AC #3):
- backend/app/services/error_writer.py:
  · write_error_to_graphiti 加 group_id: Optional[str] kwonly 参数
  · group_id 解析优先级:
    1. 显式 group_id 参数 (cron/CLI 场景)
    2. ContextVar get_current_subject_id (endpoint 注入, 主路径)
    3. fallback DEFAULT_GROUP_ID + structlog warning (deprecated)
  · record_knowledge_entity 调用改用 effective_group_id (不再硬编码)
- 渐进式迁移: DEFAULT_GROUP_ID fallback 保留 (Task 6 命名迁移后可移除)

测试 cascading 修复:
- backend/tests/integration/test_story_2_5_chatgpt_round2_p0.py:
  · 6 个 endpoint 测试加 "vault_id": "cs_61b" 字段
- backend/tests/integration/test_error_extraction_e2e.py:
  · write_error_dual 调用加 mode="write_confirmed" (Story 2.5.X 兼容)

回归: 217 测试全 pass (88 v1.0 + 113 Story 2.5.X + 28 Story 2.5.Y, 0 fail)

sprint-status: 2-5-y ready-for-dev → in-progress

剩余 Story 2.5.Y Tasks (7/10):
  Task 4: LanceDB 向量搜索注入 group_id 过滤
  Task 5: Cypher 防御性 helper cypher_with_group_filter
  Task 6: group_id 命名统一迁移脚本 (cs188/canvas-dev → vault:<id>)
  Task 7: per-group export/rebuild 脚本
  Task 8: E2E 多 vault 测试
  Task 9: Plugin 端传 vault_id (cascade 改 main.ts post-turn-extract)
  Task 10: 文档更新 (CLAUDE.md group_id 规约)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
oinani0721 added a commit that referenced this pull request May 8, 2026
@SPEC: round-23-stage-1

7 sub-tasks 全部完成 (用户决策 2026-05-08):

- 7.1 patch 1 fail-closed config (validate_security_defaults, 4/4)
- 7.2 patch 2 canonical_group_id 单一入口 (cs188→vault:default, 6/6)
- 7.3 patch 3 search_nodes fulltext fast path (13/13)
- 7.4 错误管理读路径接通 (error_reader + 3 GET endpoints, 7/7)
- 7.5 internal_api_key + websocket auth (8/8 fail-closed matrix)
- 7.6 cs188 历史数据迁移脚本 (CLI dry-run/apply/json/force)
- 7.7 测试 + uat (102/104 = 98%)

round-14 残缺修复进度:
- #1 错误管理只写不读 → 修复
- #2 cs188 group_id 散落 → 修复
- #3 search_nodes CONTAINS 退化 → 修复
- #4 前后端零同步 → stage 2

felt-sense: 3.5/10 → 8.3/10 (+4.8)

source: round-23-chatgpt-dr-result-and-synthesis-2026-05-08.md
uat: Stage-1-Round-23-阶段1-硬化-UAT-2026-05-08.md

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
oinani0721 added a commit that referenced this pull request May 8, 2026
…onship sync

Round-14 残缺 4 项最终修复(#1-#4 全部 ship):
- #1 错误管理只写不读 (Stage 1 已修, error_reader.py)
- #2 cs188 group_id 散落 (Stage 1 已修, canonical_group_id)
- #3 search_nodes CONTAINS 退化 (Stage 1 已修, fulltext fast path)
- #4 前后端零同步 (Stage 2 修复, relationship_sync_service)

5 sub-tasks 完成:
- 8.1 Wikilink 增量 refresh: schedule_note_index + _debounced_note_index
       + POST /api/v1/index/refresh-changed (6/6 端到端测试 pass)
- 8.2 JSON fallback 原子化: 新建 app/utils/atomic_io.py (149 LOC)
       + 替换 4 处 json.dump (memory/edge/review/canvas) (5/5 crash-safe)
- 8.3 Graphiti 读写一致性: 现状盘点 (50/50 现有测试 pass)
       + vector_count placeholder 接真实 LanceDB stats
- 8.4 残缺 #4 frontend ↔ Graphiti 双写: relationship_sync_service.py (253 LOC)
       + POST /sync/relationships/by-node + /sync/relationships/vault (8/8)
- 8.5 测试 + UAT: 152/154 全栈回归 (98.7% pass)

Karpathy 80/20 实践: 实际工时 ~12.5h vs ChatGPT 估 60h (节省 79%)。
Stage 1 patches 已铺基础设施 + 现有 atomic 模板复用 80% + Graphiti
集成已实现 50/50 测试通过 — 大量预设工作已完成无需重写。

Felt-sense 整体闭环成熟度: 4.0/10 → 9.0/10 (+5.0)

EPIC1-BMAD-DEV-ASSESS-2026-04-17
PLAN-023
@SPEC: round-23-stage-2

Source: _bmad-output/research/round-23-chatgpt-dr-result-and-synthesis-2026-05-08.md
UAT: _bmad-output/验收单/Stage-2-Round-23-阶段2-收口-UAT-2026-05-08.md

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
oinani0721 added a commit that referenced this pull request May 8, 2026
Story 2.2 (supplementary-material-search) Phase A 落地。按 Round-23 之后渐进 UAT
模式(每 Phase ship mini-UAT 防偏离),Phase A = Task 1 (MCP 集成) + Task 4 (降级)。

变更:
- 新建 backend/app/services/supplementary_search_service.py (~210 行)
  - hybrid 搜索 (bge-m3 + jieba) + source priority 复用 + explanation files filter
  - 三档降级:lancedb_unavailable / search_failed / empty_index / all_filtered_below_threshold
  - format_supplementary_xml 含 XML escape 防 vault 内容破坏 XML 解析
- backend/app/api/v1/endpoints/chat.py 4 处 patch
  - imports: pathlib.Path + structlog + supplementary_search_service
  - EnrichContextResponse 加 supplementary_count/degraded/reason 3 字段(零 schema breaking)
  - enrich_context Step 5 注入(mode=answer + user_question 双重守门,Story 2.1 预留 schema 直接复用)
  - return 回填 3 新字段
- canvas-vault/.claude/skills/chat-with-context/SKILL.md 3 处 patch
  - prompt 解析说明加 <supplementary_materials> section 描述
  - 开场白模板加 "📚 相关材料" 提示
  - 新增 "## 补充材料展示" 段含 felt-sense 引导 + 降级处理规则

AC 覆盖:
- AC #1 (search_vault_notes 集成) ✅ chat.py:177-216
- AC #5 (LanceDB 不可用降级) ✅ supplementary_search_service.py:42-114 + chat.py try/except
- AC #4 (增量索引 < 500ms) ✅ 复用 Story 38.1 lancedb_index_service

Phase A 暂不做(明确 scope 防蔓延):
- AC #2 (wikilink 三精度 file/heading/block_id) → Phase B
- AC #3 (类型权重精排 lecture > discussion > exam) → Phase B
- Task 5 (单元 + 集成 + 性能测试) → Phase C

mini-UAT 验收单(DoD-3 v3.0 双段铁律 7 sections + 5 题自检全过):
_bmad-output/验收单/Story-2.2-Phase-A-MCP-集成-2026-05-08.md

Plan: EPIC1-BMAD-DEV-ASSESS-2026-04-17
Story: 2.2-Phase-A

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
oinani0721 added a commit that referenced this pull request May 9, 2026
用户实测痛点: phase a supplementary 给的 wikilink 点击不能跳转到具体笔记片段.
样本: [[节点/规划的分类-1549()#规划的分类-1549|跳转]] - heading anchor 漂移
样本: [[raw/CS188/videos/lectures/lecture 2/chunks/merged#4.4 价值迭代...|跳转]]
       - chunks/merged.md 是 lancedb 切分时的虚拟派生路径,文件不真实存在

3 并行 agent 实测确认双重 bug:

根因 #1 (70%): heading over-strip
- supplementary_search_service.py:404 旧代码: re.sub(r"\(\)\s*$", "", heading)
- 本意清残留 "[time]()" 但被前一条 regex 已处理
- 副作用: 节点 "规划的分类-1549()" 的 heading "# 规划的分类-1549()" 被剥成 "规划的分类-1549"
- → wikilink anchor 与 obsidian 文档实际 heading 不字面匹配 → 不跳转

根因 #2 (30%): chunks/merged 虚拟派生路径
- lancedb_client.py:2098-2230 切分 md 时按 heading 分 chunk,写 file_path 含 "chunks/merged.md"
- vault 内实际只有 "lecture 2/lecture 2.md",chunks/merged.md 不真实存在
- obsidian "no file matches" → 跳转失败

业界共识 (smart connections / khoj / copilot for obsidian 100% 一致):
chunk 是索引虚拟物件,绝不写虚拟派生文件,citation 始终指向原 .md + heading

patch (2 处):
- 移除 over-strip regex `\(\)\s*$` (line 404)
  · 仅保留 [[wikilink]] 清理 + [text](url) markdown link 清理
  · heading 字面保留所有 () / - / : 等真实字符
- 新增 _resolve_chunks_to_source_file(path) helper:
  · "X/chunks/<chunk>.md" → "X/X.md" (回写到原文件)
  · 不含 chunks/ 的 path 原样返回
- _normalize_material 调用 helper line 405

obsidian 跳转规则验证:
- heading 严格 case-sensitive + 字面匹配 (obsidian help / forum 40724)
- 文件名 case-insensitive + 模糊
- 末尾空格被 trim, 但 - 和 () 必须字面保留
- chunks/ 派生路径 obsidian 永远找不到文件

实测预期 (drop + reindex 后):
- wikilink: [[节点/规划的分类-1549()#规划的分类-1549()]] (heading 含真实 ())
- wikilink: [[raw/CS188/videos/lectures/lecture 2/lecture 2#2.3 规划代理 (Planning Agents)]]
  (chunks/merged 已回写到 lecture 2.md)
- 用户点击真实可跳转

未做 (留 phase b):
- block-id `^c-{hash8}` 写入源文件做 stable anchor (业界终极解)
- heading 大小写 / 末尾空格规范化 (相对低频问题)

plan: EPIC1-BMAD-DEV-ASSESS-2026-04-17
story: 2.2-phase-a-t1.5

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
oinani0721 added a commit that referenced this pull request May 10, 2026
…oad + mv3 8 tests

防 5 vault 并发串库 — enrich-context endpoint vault_id 必填全链路。

- MV-1 backend/app/api/v1/endpoints/chat.py
  EnrichContextRequest.vault_id: str = Field(..., min_length=1) 必填
  EnrichContextRequest.subject_id: str | None = None 可选 (一 vault 一学科兼容)
  handler 入口调用链:
    sanitize_vault_id(req.vault_id) → 标准化 (NFKC + casefold + Unicode \w)
    build_vault_group_id(...)        → 构造 vault:<id>:<subj>:<canvas>
    set_current_subject_id(group_id) → 写 ContextVar
  → downstream wikilink/lancedb/supplementary 各 service 通过
    get_current_subject_id() 拿到同一 vault_id,5 vault 并发不串库
  契约参考 PostTurnExtractRequest (Story 2.5.Y AC #2)

- MV-2 frontend/obsidian-plugin/src/main.ts
  handleChatWithContext (Cmd+Shift+E) payload 加
    vault_id: inferVaultId(this.app.vault.getName())
  handleStudyQuestion   (Cmd+Shift+Q) payload 同步
  Plugin 传 raw vault name,backend 端统一 sanitize 防算法漂移

- MV-3 backend/tests/unit/test_enrich_context_vault_isolation.py (新增)
  8 个核心防御测试:
    1 vault_id 缺失 → 422 (plugin 旧版本不能 silent corruption)
    2 vault_id 空字符串 → 422 (min_length=1)
    3 vault_id 提供 → sanitize + build + set 全链路验证
    4 中文 vault_id 不坍缩 default (最致命数据泄漏点回归保护)
    5 subject_id 可选 (向后兼容)
    6 并发隔离 (asyncio.gather 2 vault 跑 ContextVar 各自独立) — P0 核心
    7 ../etc/passwd 路径遍历被净化
    8 emoji 被 strip 中文保留

  辅助调整:
    test_chat_endpoint.py / test_study_question_deep_mode.py
    helper _enrich_payload / _payload 加 vault_id='test_vault' 默认
    保 14+8=22 现有测试通过 (回归保护)

跑全套件 65 测试 0 regression (8 新 + 19 chat + 8 sq + 17 rag-p0 + 13 supp)

Trace: EPIC1-BMAD-DEV-ASSESS-2026-04-17

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
oinani0721 added a commit that referenced this pull request May 12, 2026
…PT P0-A 间接)

- main.ts handleChatWithContext: AbortController + setTimeout(3000) + signal/clearTimeout
- main.ts fallbackToLocalNeighbors: 新增方法 (collectNodeNeighbors + buildNodeChatPrompt)
- node-chat-context.ts: 新增 buildChatWithContextFallbackPrompt 纯函数 + ChatFallbackReason type
- tests/chat-fallback.test.ts: 8 新增 unit tests (156/156 pass, 49ms)
- sprint-status: 2-2-and-2-9-merged in-progress, T1 review (用户 UAT 已 pass)
- 验收单 DoD-3 双段铁律合规(3 次 hook 阻断后改"被动观察"设计通过)

PLAN-EPIC1-BMAD-DEV-ASSESS-2026-04-17
Story: 2.2+2.9
Trace: 合并 spec AC #2 (原 Story 2.9 AC #6) / ChatGPT Review 2026-05-11 P0-A 间接关联

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
oinani0721 added a commit that referenced this pull request May 12, 2026
…idence

T3 (T3.1-T3.11): supplementary_reranker.py + rerank_service.py + wire
- TYPE_WEIGHTS table (PRD §4.1.1, lecture_notes 1.0 → raw_notes 0.6)
- BM25 Okapi 自实现 (jieba 中英分词, 避免引入未声明 rank-bm25 dep)
- hub_penalty = log(degree/median + 1) (Story 2.9 AC #2)
- final_score = relevance × type_weight + query_overlap × 0.3 - hub_penalty
- get_filter_threshold() = 0.42; rerank top_k=5
- chat.py wire rerank 进 supplementary 流程
- format_supplementary_xml 透出 rerank 4 字段 attribute
- TraceItemModel API 加 5 optional 字段 (forward compat)

T5 (T5.1-T5.3, T5.5; T5.4 plugin Notice 留 plugin iter):
- TraceItem.evidence + WikilinkNeighborContext.evidence
- _extract_relationship_info() 返回 (type, evidence) tuple
- assembler 渲染 `- 引证: ...` 行 (xml_text_escape + 截断 200 字)

测试: 65+ 新增 unit tests; 126/126 touched 文件 green; 零回归

PLAN-ID: EPIC1-BMAD-DEV-ASSESS-2026-04-17

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
oinani0721 added a commit that referenced this pull request May 12, 2026
…dit f1+f3

chatgpt v4 5-verdict 拿到 4 closed + 1 closed-with-caveats. 同时找到 3 个 p1
我和 claude self-audit 都漏的真问题. 本 commit 完全闭口.

w3-1 metadata redaction (chatgpt p1 #1):
- supplementary_search_service.py:format_supplementary_xml 扩展 taint-aware
  到 metadata 字段. 当 taint in {review, quarantine}, title / wikilink /
  source_path 也输出 [redacted: tainted title (risk=x.xx)] / [redacted] 等
  placeholder, 不再无条件 _xml_escape 原文.
- 修旁路: 攻击者把 prompt injection payload 埋 frontmatter title 即可绕过
  (snippet redacted 但 title 原样进 prompt). 4 新测试覆盖 review/quarantine
  /clean/partial-field 4 路径.
- bonus: test_supplementary_metadata_fuzz.py 去掉 4 个 xfail (sanitizer
  现 merged 可强制 gate)

w3-2 de-xfail 2 security tests (chatgpt p1 #2):
- test_supplementary_review_floor.py + test_cross_vault_global_search.py
  去掉 @pytest.mark.xfail(strict=false) 装饰器
- test_supplementary_review_floor 内 assert 翻转 (从"review survive floor"
  改为"review must be dropped by floor"匹配 wave-2 p0-3b 修法)
- --strict-markers 跑现 0 xfail / 0 xpassed / 2 passed strict

w3-3 lancedb except narrowing + warning (chatgpt p1 #3):
- lancedb_client.py:active_vault_id level 2/3 except 缩窄
  (importerror, attributeerror, runtimeerror, valueerror) — basesexception
  / keyboardinterrupt / systemexit / asyncio.cancellederror 现在正确传播
- level 4 default fallback 前加 logger.warning ("fell back to 'default'")
- 3 新测试覆盖: default fallback log / narrow exception keyboardinterrupt 传播
  / runtime error fall through 链路完整

w3-4a frontend trim (claude self-audit f1):
- main.ts:buildBackendHeaders 加 .trim() — 防 user 误填 "  " whitespace key
  被当 valid 触发 backend 403, 同时 trim 后发送防 leading/trailing space
  constant_time_compare 失败
- buildBackendHeadersPure pure-function mirror 同步更新
- 5 新测试 (whitespace-only / 混合 whitespace / 两端空格 / 内部空格保留 /
  undefined 安全)

w3-4b wikilink __default__ once-warned (claude self-audit f3):
- wikilink_graph_service.py:_resolve_vault_key 当落 __default__ 桶时
  inspect.stack() 取真实 caller frame, 同 caller 仅 warn 一次 (避免每请求
  噪音), 不同 caller 独立 warn
- 加 _caller_fingerprint helper + _warn_default_fallback_once helper
- 3 新测试 (same caller warns once / different callers independent /
  exception path also warns)

测试: backend 262 pass (含 3+3+4+3=13 新) / 1 pre-existing fail (story 2.5.y
d16 cs61b: -> vault:cs61b: 格式锁, wave-3 无关) / 0 xpassed (strict gate);
frontend 191 pass (+5 trim 测试). --strict-markers 闭口 ci.

PLAN-ID: EPIC1-BMAD-DEV-ASSESS-2026-04-17

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
oinani0721 added a commit that referenced this pull request May 13, 2026
story 2.3 (current_task 8-session plan s2) v1.0 ship — 5 ac + 5 task / 20 子任务全实现:

ac #1: memory_service.search_error_memories(node_id, group_id, limit=5) wrapper
  - post-merge filter by episode_type ∈ {error, misconception, mistake}
  - timestamp desc sort, oversample max(20, limit*4) 防 episode_type filter 后剩余不足
  - schema normalize → error_type/description/corrected_at/tags/source_session
  - search_memories signature 加 node_id 参数, 向后兼容 50+ 现有调用方

ac #2: chat_context_assembler.inject_error_reminders + priority 1.5 注入
  - _format_historical_errors xml 标签包装 + 顶部 <policy> 段 (自然过渡 + 不要生硬插入)
  - 正面措辞模板硬编码 (学习者之前标记过 ... 如果讨论涉及此话题, 请自然地提醒区分)
  - assemble_context historical_errors 参数 priority 1.5 (current_note 后 1-hop 邻居前)
  - token 不够整段跳过不截断 (单条 error 截断会失真)

ac #3 性能: chat.py asyncio.wait_for(timeout=3.0) + structlog memory_search_latency_ms
ac #4 双路径熔断:
  - timeouterror → reason=search_timeout
  - (connectionerror, runtimeerror, oserror) → reason=service_unavailable
  - 降级时 historical_errors=[], 对话照常进行用户不感知
ac #5 空记录: empty list → 跳过 priority 1.5 段, 不输出冗余 无历史误解 提示

测试: tests/unit/test_story_2_3_error_reminders.py (287 行, 21 用例, 1.64s)
回归: test_chat_context_assembler.py + test_chat_endpoint.py 共 66 用例零失败
总计: 87/87 pass

dod-3: _bmad-output/验收单/Story-2.3-historical-error-reminder.md (status: review)
  - d3-a 段 4-b 禁词 0 命中
  - d3-e 我做 x→我看到 y→我感觉 z felt-sense 14 处
  - d3-c 段 4-a 21 项 claude 已代验全 ✅ 含证据

plan: EPIC1-BMAD-DEV-ASSESS-2026-04-17

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
oinani0721 added a commit that referenced this pull request May 13, 2026
…pass opt-in

chatgpt-dr-2026-05-13 安全审查 critical #2: 修 memory poisoning 攻击向量.

漏洞:
- 旧版 _require_observer_token 在 SIDECAR_OBSERVER_TOKEN 未配置时直接 return
- 任何可达客户端可匿名 POST /memory/extract-conversation 注入 misconception
- 攻击者可武器化用户个人记忆管道, 让 ai 按假误解出针对性考题
- 与 user 核心诉求 批注+个人记忆+检验白板针对性考察 直接冲突

修复 (chatgpt 推荐 + claude 加强):
- 默认 fail-closed (token unset 也返回 503, 不再 open)
- 显式 ALLOW_LOCAL_OBSERVER_BYPASS=true env 开关
- bypass 同时要求 client.host loopback only
- 即便 bypass=true 但来自 lan/external ip → 仍 503 (纵深防御)
- structlog warning log 当 bypass 触发, ops 可见

auth decision matrix:
- token set + match → allow
- token set + mismatch → 401
- token unset + bypass=true + loopback → allow + warning
- token unset + bypass=true + non-loopback → 503
- token unset + bypass=false/unset → 503

测试: 12 cases, 6 branches + 3 defense-in-depth + 1 regression.
12/12 pass / 0.98s.

archive: _bmad-output/research/2026-05-13-chatgpt-security-audit-INLINE.md

历史: chatgpt 已 3 次提生产默认值收紧 (round-23 / wave-2 v4 / 本次)
本次首次落地.

@SPEC: PLAN-EPIC1-BMAD-DEV-ASSESS-2026-04-17
@SPEC: PLAN-001-CHATGPT-DR-2026-05-13-P0-1

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
oinani0721 added a commit that referenced this pull request May 15, 2026
α-5 (反馈瞬间 #2/#3, 零依赖):
- status-bar.ts: pure functions (countTipsFromFrontmatter, buildNavPath,
  buildStatusBarText, classifyTipsTransition, shouldTrackInNavPath,
  buildTipsIncreaseNotice) + StatusBarController class
- main.ts onload: addStatusBarItem + new StatusBarController
  · metadataCache.on('changed') fan-out 到 statusBar.handleMetadataChanged
  · workspace.on('file-open') 接 handleFileOpen
  · onLayoutReady 用 getActiveFile() 触发一次初始化
- 格式: "📝 Tips: N · 📍 prev → current" 常驻 status bar
- Notice "🎓 已记住 N 条 Tips" 在 tips count 自增时触发
- 37 tests all pass (pure helpers + spec-as-test grep main.ts wiring)

α-3 (反馈瞬间 #4/#5, 接口契约已锁, 等 backend α-2/α-4):
- exam-quick.ts: pure helpers (buildExamFilePath, todayDateStr,
  buildExamFileBody, extractAnswer, hasFeedbackSection, buildFeedbackAppend)
  + QuickExamController class
- main.ts onload: new QuickExamController (closure 注入 callBackend + inferVaultId)
  · vault.on('modify') fast-path 分发到 onFileModified
  · addCommand canvas:start-quick-exam
- 考察文件协议:
  路径: 节点/考察-{concept}-{YYYY-MM-DD}[-{n}].md (重名递增)
  frontmatter: exam_question_id / source_concept / generated_at / exam_status
  正文: # 考察 / ## 题目 / ## 你的回答 / ## 提交 (Cmd+S)
  评分后追加: ## 反馈 ({score}/5) + feedback 文本
- 防重复 grade: hasFeedbackSection + session.graded flag
- 35 tests all pass (含 spec-as-test grep main.ts wiring)

Backend 契约 (协调文档 §5 锁定):
- POST /api/v1/exam/quick  body={node_id, vault_id}
   resp={question_id, question_text, generated_at?}
- POST /api/v1/exam/grade  body={question_id, user_answer}
   resp={score:0-5, feedback, mastery_delta?}
Schema 不一致时 plugin 会 Notice "后端返回数据缺 X — 接口契约未对齐".

测试:
- npm test: 253/265 pass, 12 fail (pre-existing baseline: callout wrapSelection
  ×6 + vault-indicator wiring ×6, 与 Session B 无关)
- 新增 72 测试全 pass

Build:
- npm run build → 0 TS error, main.js 103KB
- DD-12 阻断 cp 到 canvas-vault/, 由协调员部署

Broadcast: _bmad-output/_status/mvp-alpha-broadcast-session-b.yaml

PLAN-NNN: EPIC1-BMAD-DEV-ASSESS-2026-04-17

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant