fix(test-infra): hook pipe-blindness + hash assertion fix + EpisodeWorker baseline#2
Open
oinani0721 wants to merge 4 commits into
Open
fix(test-infra): hook pipe-blindness + hash assertion fix + EpisodeWorker baseline#2oinani0721 wants to merge 4 commits into
oinani0721 wants to merge 4 commits into
Conversation
…-paralysis) Two-phase fix for the post-fix-structlog-caplog-compat residual: 136 unit test failures + 17 errors that fall into hook-blindness and stale-mock patterns. Reduces backend unit failures from 136 to 102 (-34) and errors from 17 to 8 (-9), with 47 more skipped. Combined with the earlier fix-structlog-caplog-compat commit, total reduction from the 256-failure baseline is ~57%. ═══════════════════════════════════════════════════════════════════════════ Phase 0 — Hook chain exit-code propagation ═══════════════════════════════════════════════════════════════════════════ Three hook layers (lefthook backend-smoke + frontend-test, post-tool-router smoke + related + single-file, stop-test-runner) all piped pytest output through `| tail -N` or `| head -N`. POSIX pipelines return only the rightmost command's exit status; none of the hooks set `pipefail`. Result: every hook silently passed even when pytest exited 1. The 256 baseline failures persisted across the entire commit window 793cd53→3b96e49 because all three guard layers were blind. Files: - scripts/run_cmd_capture.sh (NEW): pure bash wrapper. Captures full stdout+stderr to /tmp/run_cmd_capture_<pid>_<ts>.log via `&>`, prints `[TEST FAILURE] exit code: <N>` + temp file path + last N lines on failure, exits with the wrapped command's original exit code. Never uses pipes itself, so its own exit code equals the command's. Verified directly: exit 7 propagated to caller, tail captured 5 lines, pytest canary test (assert False) → wrapper exit 1 with full traceback. - lefthook.yml: backend-smoke + frontend-test rewritten to invoke wrapper with --cwd backend/frontend --tail 120. Removes the dual `| tail -5` pattern that swallowed test failures. - .claude/hooks/post-tool-router.sh: 3 pytest pipes (smoke tier, related tier, single-file tier) rewritten to wrapper. Deleted the 3 redundant `[ \$? -ne 0 ] && exit 1` checks since they were operating on tail's exit code (always 0). Vulture (no pipe) untouched. - .claude/hooks/stop-test-runner.js: replaced execSync `| head -20` with wrapper invocation. stdio: "inherit" so the wrapper's [TEST FAILURE] block streams directly to user terminal without Node's 1MB maxBuffer truncating long tracebacks. Forced .venv/bin/python to avoid PATH ambiguity. Deleted the unreliable /FAILED|ERROR/.test(result) regex check (pytest --tb=line doesn't always include literal "FAILED" in truncated output). Exit 2 on failure (Stop hook protocol). Phase 1 (AST 38 logger pos-args rewrite): CANCELLED. Ground-truth test revealed that structlog.stdlib.BoundLogger does NOT raise on positional args — it preserves %s placeholders in the event field and stores args in a positional_args array. Zero of the 136 failures are caused by logger pos-args. Phase 1 would have improved log schema ergonomics but fixed no tests. Tracked as out-of-scope follow-up. ═══════════════════════════════════════════════════════════════════════════ Phase 2 — EpisodeWorker baseline + skip 8 stale-mock test files ═══════════════════════════════════════════════════════════════════════════ 37 references to deleted MemoryService._write_to_graphiti_json_with_retry (and a few to _write_to_graphiti_json) across 8 test files were producing AttributeError noise. fix-rag-transform-and-episode-isolation Phase 2 had moved retry/backoff/dead-letter semantics to GraphitiEpisodeWorker without migrating these tests. Files: - backend/tests/unit/test_episode_worker_retry.py (NEW): 5 baseline tests for EpisodeWorker covering enqueue + process success, exponential backoff sleep series with full jitter (assert sleep[i] ∈ [0, 2**i), 60s cap), dead-letter on retries-exhausted (4 attempts → JSONL written + counters), WorkerMetrics field completeness (10 fields), and request_id propagation through EpisodeTask → DeadLetterStore JSONL record. All 5 PASS in 0.47s. Important debug lesson encoded as a module-level _ORIGINAL_ASYNCIO_SLEEP constant: patching app.services.episode_worker.asyncio.sleep affects the asyncio module singleton process-wide, causing infinite recursion in any side_effect that itself awaits asyncio.sleep. Capture the original at import time and use it inside the patch closure. Stale-mock cleanup (per user authorization for module/class skip + reason): - backend/tests/unit/test_memory_service_write_retry.py: module-level skip (whole file is 100% tests of deleted method internals) - backend/tests/unit/test_graphiti_json_dual_write.py: module-level skip (single class, all tests reference deleted private methods) - backend/tests/integration/test_dual_write_consistency.py: module-level skip (integration test of deleted dual-write path) - backend/tests/unit/test_story_30_10_idempotency.py: class-level skip on TestEpisodesDedup, TestBatchEpisodesDedup, TestGraphitiJsonWriteDedup. PRESERVES TestDeterministicEpisodeId + TestBatchDeterministicEpisodeId (6 hash-only tests still active). - backend/tests/unit/test_failure_observability.py: class-level skip on TestMemoryServiceDualWriteFailure. Other classes (TestDeadLetterRequestId, TestEdgeSyncFailureCounter, etc.) untouched. - backend/tests/unit/test_qa_38_6_scoring_reliability_extra.py: class-level skip on TestFullCycleIntegration (uses ms._write_to_graphiti_json_with_retry attribute assignment). - backend/tests/unit/test_story_38_6_scoring_reliability.py: class-level skip on TestAC3StartupRecovery (same attribute assignment pattern). - backend/tests/integration/test_story_38_7_ac5_recovery_and_cross_story.py: class-level skip on TestAC5Recovery (patch.object on deleted method). Each skip carries a reason string pointing at test_episode_worker_retry.py for the equivalent semantics under the new pipeline. Files preserved (not deleted) for blame/audit history. ═══════════════════════════════════════════════════════════════════════════ Verification ═══════════════════════════════════════════════════════════════════════════ Backend unit regression (excl integration/slow/e2e and the 87-error test_story_30_11/30_13 batch files): Before fix-structlog-caplog-compat: 169 fail, 87 err, 2212 pass After fix-structlog-caplog-compat: 136 fail, 17 err, 2472 pass After fix-test-infra-paralysis P2: 102 fail, 8 err, 2473 pass, 48 skipped Δ from baseline: -67 fail, -79 err, +261 pass, +47 skipped (~57% reduction of failures+errors). Remaining 102 failures + 8 errors are out-of-scope test debt: - agent_service contract drift (test_agent_service_*.py, test_agent_context_*) - agent_templates_smoke file-existence assertions - cache_configuration / calibration_tracker / batch test fixtures - Two test_id_format failures in test_story_30_10 (pure-hash logic, unrelated) These will be addressed by separate changes (fix-batch-story-test-fixtures, fix-agent-contract-drift) tracked under fix-test-infra-paralysis tasks.md section 6. OpenSpec: - openspec/changes/fix-test-infra-paralysis/proposal.md, design.md, specs/test-infrastructure-resilience/spec.md, tasks.md created via npx openspec new + instructions workflow. Validates strict OK, 4/4 artifacts complete. (gitignored — local working state per project convention; will be archived to openspec/specs/ in a separate step after human spec review.) - New capability test-infrastructure-resilience codifies 5 requirements: hook chains MUST surface non-zero exit codes; hooks MUST persist full output to disk; structlog.stdlib.BoundLogger callers MUST use kwargs; tests MUST NOT mock deprecated retry symbols; AST cleanup script MUST be dry-run by default with whitelisted target files. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> (cherry picked from commit a7dd270182b455b270526343bd91c27138866059)
ChatGPT Deep Research review of f19dcff flagged the only remaining pipe-blindness in the hook chain: `.claude/hooks/post-tool-router.sh` line 70 still invoked `npx stryker run 2>&1 | tail -20` followed by a useless `[ $? -ne 0 ]` check (reading tail's exit code, always 0). Fix: route stryker through the existing `scripts/run_cmd_capture.sh` wrapper already used by the three backend pytest tiers in the same file, so exit code propagation becomes mechanism-guaranteed rather than convention-dependent. Knip (no pipe in its invocation) stays direct but moves to a cwd-isolating subshell so no cross-block cd side-effects remain. With this change, every test-output pipeline in the hook chain uses the wrapper, completing the a7dd270 fix-test-infra-paralysis scope. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
… 32) test_story_30_10_idempotency.py's test_id_format and test_batch_id_format asserted `len(prefix) + 16`, but the production code in backend/app/services/memory_service.py uses `hashlib.sha256(...).hexdigest()[:32]` (lines 84 and 100), producing 32 hex chars. The assertions were stale from a pre-merge design where hash truncation was 16 hex. Docstrings were also updated from "hash16" to "hash32" to match code. No production code change — the 32-char hash is the intended current state (collision resistance) and other tests in the same file (e.g. test_same_event_same_id) already pass with the 32-char output. Reduces backend unit failures from 103 to 101 (cherry-pick baseline: post-f19dcff was 103/8 not 102/8 as estimated in plan). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Motivation: PR #2 had merge conflicts blocking GitHub's pull_request workflow trigger (GitHub Actions cannot create the refs/pull/2/merge ref when the merge fails, so the workflow is silently skipped with zero runs visible in the Actions API or PR Checks UI). Conflict location: lefthook.yml pre-push backend-smoke block. Both sides made complementary (not opposing) changes: - origin/main (2026-04-07 auto-sync): narrowed backend-smoke scope to A11 regression suite (test_kg_relevance_weighted + test_a11_kg_relevance_e2e, 30 tests, ~1s) to avoid the pre-existing 136-failure debt, and switched from `| tail -5` to explicit `$?` capture + `exit $TEST_EXIT` so pipe-blindness cannot swallow pytest exit codes. - fix/test-infra-paralysis (f19dcff): routed both frontend-test and backend-smoke through scripts/run_cmd_capture.sh so POSIX pipe exit code propagation becomes mechanism-guaranteed. Resolution: kept origin/main's backend-smoke verbatim (narrow A11 scope + $? capture pattern — no pipe means wrapper is redundant) while preserving fix branch's frontend-test wrapper change (main did not touch frontend-test, so the pipe-blindness fix is still needed there). Local verification (from worktree, absolute venv path): .venv/bin/python -m pytest tests/unit/test_kg_relevance_weighted.py \\ tests/e2e/test_a11_kg_relevance_e2e.py -q --tb=line --no-header \\ -p no:cacheprovider --override-ini="addopts=" Result: 32 passed in 1.27s (matches main's stated "30 tests, ~1s" baseline). YAML validation (python -c 'import yaml; yaml.safe_load(...)'): OK. After this merge commit is pushed, PR #2's merge state should flip to MERGEABLE, GitHub will generate refs/pull/2/merge, and the pull_request workflow trigger (test.yml + api-spec-sync.yml) should fire on the merge ref for the first time in this repo's history. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
oinani0721
added a commit
that referenced
this pull request
Apr 10, 2026
…positive L5-#2 regression: previously the /api/v1/health endpoint only checked neo4j_stats["initialized"] before dispatching `RETURN 1 AS ping`. When Neo4jClient auto-fell-back to JSON mode (initialized=True, mode=JSON_FALLBACK), the ping query was silently routed to _run_query_json_fallback, which no-ops for simple queries instead of raising. Result: components.neo4j == "ok" even though Neo4j was unreachable — false-green monitoring. Fix: mirror the production 4-way classification from app/services/memory_service.py:969-987 (Story 30.3 fix). Now we only treat the client as healthy when stats["mode"] == "NEO4J" AND health_status is True; JSON_FALLBACK is reported as "json_fallback" (operational, not error); otherwise "degraded" or "not_initialized". HTTP 200 unchanged at top level — JSON_FALLBACK is operational on Tauri desktop sidecar so we don't fail the contract; the signal flows through components.neo4j only. Frontend useBackendStatus.ts only reads top-level status, so this is safe. Tests: TestHealthCheckNeo4jFallback (4 cases) — fully self-contained via monkeypatch of get_neo4j_client; does not depend on broken backend/tests/unit fixtures. Plan reference: Plan v25 Option C (L5-#2) Pattern source: backend/app/services/memory_service.py:969-987 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
oinani0721
added a commit
that referenced
this pull request
Apr 10, 2026
Update phase-1-day-1-spike-results.md to reflect that both Critical L5 findings from Spike 1 are now resolved: - L5-#1 Graphiti fire-and-forget task leak: FIXED in 990e958 via pre-flight Neo4j probe in GraphitiEpisodeWorker.initialize_graphiti. The PRD's original assumption that this was our own fire-and-forget bug was wrong — the real root cause is graphiti-core v0.28.2 library internals at neo4j_driver.py L91-101. Replaced root cause analysis with the verified explanation including the exact library file:line reference. - L5-#2 Health endpoint false-positive: FIXED in 1f170a6 via 4-way mode-aware classification in health.py. The PRD's assumption that the endpoint was "missing a ping check" was wrong — the check was already there, but the Neo4jClient JSON_FALLBACK auto-fallback caused the ping to silently no-op through _run_query_json_fallback. Replaced root cause analysis with the verified false-positive chain from Neo4jClient.py:450-452. - Decision Matrix table: marked both findings as FIXED with commit SHAs. - Cross-References: added Plan v25 Option C references, memory_service.py pattern source, and graphiti_core root cause file pointer. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
oinani0721
added a commit
that referenced
this pull request
Apr 19, 2026
用户刚需:R4 工作流中 ship hands-on demo 和用户 UAT 批注两个环节之前靠 claude 记性,这次 story 1.16 v1 后忘记 ship 验收单被用户发现。固化三层防御: L1 硬规则 (_bmad-output/.claude/CLAUDE.md): - dev-story Definition of Done 升级为 3 项(技术 + spec + R4 ship) - 9 项 dod 自检清单必跑 - claude code 行动规则第 4 条改为"必 ship 验收单 + 通知用户" - 用户视角动作 #2 从"看 story spec 的 ## uat script" 改为"在 obsidian 打开 canvas-vault/验收单/story-{id}-*.md" - r4 6 环节 × dod 映射表明确 r4-4/5/6/7 四个环节的强制对应 L2 固定模板 (_bmad-output/templates/uat-sheet-template.md): - 7 段结构:目标 / behavior / 交互 / uat / 结果 / 批注 / spec trace - 含 {placeholder} 占位,claude 复制后填充 - 含 correct-course 触发的覆盖更新协议(不新建,追加历史 callout) L3 (待拍板):stop hook 技术保障 - 检测上轮有 review 状态变更但 canvas-vault/验收单/ 无新文件 → 阻断 - 等用户确认是否值得增加 hook 复杂度 story 1.16 验收单 (canvas-vault/验收单/story-1.16-批注-hotkey.md) 作为 首次样例,已按 7 段结构 ship 并含 v1→v2 历史追溯。 story: 1.16 PLAN-EPIC1-BMAD-DEV-ASSESS-2026-04-17 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
oinani0721
added a commit
that referenced
this pull request
May 3, 2026
新增独立 service 模块(spec 偏离:替代扩展 1161 行的 context_enrichment_service.py,符合 SOLID): - backend/app/services/wikilink_context_service.py (180 行) · enrich_from_wikilink_graph(node_path, max_hops=2, timeout_ms=200) · WikilinkNeighborContext dataclass (slug/path/hop/relationship_type/frontmatter/content_summary) · EnrichmentResult dataclass (degraded 标记) · _extract_relationship_type: 从 frontmatter relationships[] 提取目标关系 · _normalize_target_slug: vault 路径 → basename · 降级路径:graph_not_built / traversal_timeout / unexpected_error 全部 degraded=True 不抛异常 - backend/tests/unit/test_wikilink_context_service.py (210 行, 19 cases all green) · _normalize_target_slug × 4 · _extract_relationship_type × 7(含 malformed 防御) · enrich_from_wikilink_graph × 8(含正常 / 孤立 / 异常 / 排序) 完成 AC #1 (2-hop 遍历) + AC #2 (关系类型提取) + AC #5 (降级处理)。 依赖 Story 1.3 wikilink_graph_service.get_neighbors(已 done in commit 4e0c27b)。 剩余 Task 2/3/5.3/5.4/6:ChatContextAssembler + Skill workflow + 集成测试 + UAT 验收单 PLAN-NNN: EPIC1-BMAD-DEV-ASSESS-2026-04-17 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
oinani0721
added a commit
that referenced
this pull request
May 3, 2026
…(25 tests green) backend/app/services/chat_context_assembler.py (240 行): - ChatContextAssembler 类 - assemble_context(current_note, neighbors, token_budget) — 5 优先级填充 Priority 1: 当前笔记全文(最高,不可压缩) Priority 2: 1-hop frontmatter + Tips + errors Priority 3: 1-hop content_summary Priority 4: 2-hop frontmatter Priority 5: 2-hop content_summary - compress_content(text, max_tokens) — atomic 块保护($$...$$ / $...$ / ```...```) - _extract_atomic_blocks / _restore_atomic_blocks: placeholder 替换防破坏 - count_tokens: tiktoken cl100k_base(fallback 到 chars/4) - token_budget: 默认 8192 / 环境变量 CHAT_CONTEXT_TOKEN_BUDGET 覆盖 - 返回 AssembledContext (text/used_tokens/budget/truncated/sections_included) backend/tests/unit/test_chat_context_assembler.py (210 行, 25 cases all green): - _resolve_token_budget × 4 (default / override / env / invalid env) - _extract_atomic_blocks × 4 (LaTeX $$/单$/代码块/混合) - _restore_atomic_blocks roundtrip + _drop_orphan_placeholders - count_tokens × 3 (basic/empty/中文) - compress_content × 5 (within budget/zero/LaTeX 保护/代码块保护/句子边界) - assemble_context × 7 (current_note 优先/1hop+2hop 分组/summary/budget 不足/dataclass) 完成 AC #2 (上下文组装) + AC #3 (token 预算压缩 + 公式/代码块保护)。 累计 Story 2.1 单元测试:44/44 全 green。 剩余 Task 3 (Skill workflow + REST endpoint) + Task 5.3-5.4 (集成测试) + Task 6 (验收单)。 PLAN-NNN: EPIC1-BMAD-DEV-ASSESS-2026-04-17 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
oinani0721
added a commit
that referenced
this pull request
May 4, 2026
PLAN-EPIC2-STORY2.1-PHASE-1.7+ ChatGPT 对抗审查 cefabb2 找到 5 个 P0 + HIGH (评分 4/10). 4 路并行 Explore agent 验证全部成立, 全部修复 + 锁定 regression test. P0#1 — chat.py 没传 trace=enrichment.trace 致 manifest 永远 trace_unavailable Fix: chat.py:156 trace=enrichment.trace (本地 unstaged 早已存在, 漏 commit 进 cefabb2) P0#2 — WikilinkGraphService 缺 build_timestamp 字段, getattr fallback 永远 unbuilt Fix: graph_service 加 _build_timestamp + property + get_stats (同上, unstaged 漏 commit) P0#5 — _CALLOUT_PATTERN 贪婪 regex 吞并相邻 callout Repro: > [!tip]+ A\n> a\n> [!error]+ B\n> b → 1 个 callout (期望 2) Fix: line scanner 替换 regex (O(n) 无 backtracking), _extract_body_excerpt 同改 Test: 5 regression (相邻/blank quote line/3 连续/code fence/relation 噪音) P0-A — _read_neighbor_md 信任 absolute path 可读 vault 外文件 Fix: _resolve_vault_md_path sandbox (resolve strict + relative_to(root) + .md suffix check + 1MB DoS cap) Test: 4 regression (vault 外/dotdot escape/非 .md/超大文件) P0-B — _format_neighbor_metadata body 行未 escape, 可注入 </neighbor><system> 攻击载荷可闭合 <neighbor> 标签 + 注入伪 system 块绕过 <context_policy> Fix: 加 _xml_text_escape (含 control char 清理) + 应用到所有 user-content 行 (rel_value/type/tips/errors/callout kind+title+content/summary snippet) Test: 3 injection regression (callout/summary/relation_type) + 专门 test_security_p0_vulnerabilities.py 文件 实测 curl /api/v1/chat/enrich-context (节点/Fundamentals.md, 2 邻居): - Token: 644 → 672 (escape 略增, 功能完整) - Graph version: 2026-05-04T05:08:16+00:00 真实 ISO (eager build OK) - Included: 2 | Degradations: none - callout 段 + summary 段装载正常, 用户实测 manifest 与之前一致 测试: 102 pytest passed (68 → 102, +34 P0 regression) 致谢 ChatGPT 反馈: 找到本地 unstaged 修复漏 commit (P0#1/#2) + line scanner 边界 bug (P0#5) + 实际 prompt injection 攻击载荷 (P0-B) + path traversal 设计缺陷 (P0-A). cefabb2 评分 4/10 → 本 commit 期望 7/10 (剩余 HIGH: timeout / except 过宽 / 多 worker race / regex DoS 等需独立 follow-up). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
oinani0721
added a commit
that referenced
this pull request
May 4, 2026
PLAN-EPIC2-STORY2.5-T2-T3 Story 2.5 错误自动提取与分类 — Task 2 (4 主类) + Task 3 (补救策略) 落地. 4 类映射决策选 D 方案 (扩展不破坏): 现有 ErrorType (Story 3.6 production data) 保留, 新增 PedagogyErrorType (PRD §FR-CONV-06) 双标签共存. 新增 (entity_types.py): - PedagogyErrorType enum: conceptual_confusion / procedural_error / careless_slip / metacognitive_error (PRD AC #2) - RemedyStrategy 加 2 项: DISCRIMINATION_COMPARISON (辨析+对比) + TRANSFER_SELF_EXPLANATION (迁移+自我解释) — 对齐 PRD AC #3 - PEDAGOGY_TYPE_TO_REMEDIES: PRD 4 类 → 补救策略 list - LEGACY_TO_PEDAGOGY: legacy 4 类 → PRD 4 类静态映射 - disambiguate_superficial(): SUPERFICIAL 二义消解 (sub_tag 优先 + 关键词) - map_legacy_to_pedagogy(): 统一映射函数 (含 SUPERFICIAL 拆分) 新增 (error_classifier.py): - ClassifiedError pydantic model: legacy_type + pedagogy_type 双标签 + legacy_remedy + pedagogy_remedies + sub_tags + is_ambiguous property (confidence < 0.6 = AMBIGUOUS, PRD AC #2) - ErrorClassifier.classify_with_pedagogy(): 双标签分类入口 - ErrorClassifier._llm_classify_with_confidence(): 同时拿 ErrorType + confidence 向后兼容: 现有 ErrorType / ERROR_TYPE_TO_REMEDY / ClassificationResult / classify() 全部保留, Story 3.6 production data 不破坏. SUPERFICIAL 二义消解规则: - sub_tag 含 transfer_failure / metacognitive / overconfidence → METACOGNITIVE - description 含 迁移/应用/新场景/transfer/过度自信 → METACOGNITIVE - 否则默认 → CONCEPTUAL_CONFUSION Tests +24 (全 PASS): - 4 LEGACY→PEDAGOGY 映射 (含 PROBLEM_FRAMING→CARELESS_SLIP, KNOWLEDGE_GAP →CONCEPTUAL_CONFUSION 等关键映射) - 6 SUPERFICIAL 二义消解 (默认/sub_tag 优先/关键词触发/sub_tag>关键词) - 6 PEDAGOGY→REMEDY 完整性测试 + 向后兼容 - 7 classify_with_pedagogy 双标签结果验证 (含 AMBIGUOUS 标记 / 4 主类 完整覆盖 / 双 remedy 关联) - 1 现有 classify() 行为不变测试 Story 2.5 剩余 task (待续 commit): - Task 1: error_extractor.py 对话错误提取 - Task 4: 双写 frontmatter + Graphiti (record_error MCP) - Task 5: Skill workflow /chat-with-context 集成 - Task 6: e2e 集成测试 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
oinani0721
added a commit
that referenced
this pull request
May 4, 2026
…e 集成测试 PLAN-EPIC2-STORY2.5-SHIPPED Story 2.5 错误自动提取与分类 — 全部 6 task 完成 ✅, 标记 done. Task 5 — record_error MCP tool 升级 (双标签 + 双写) backend/app/mcp/tools/error_tools.py 改造: - RecordErrorInput 加 sub_tags 字段 (Story 2.5 SUPERFICIAL 二义消解) - RecordErrorOutput 加 5 个新字段 (向后兼容): · pedagogy_type (PRD §FR-CONV-06 4 主类) · pedagogy_remedies (list[str]) · confidence (LLM 分类置信度) · is_ambiguous (confidence < 0.6 → True, PRD AC #2) · frontmatter_written / graphiti_status - 新增 _resolve_node_file_path(): 从 node_id 推断 vault_root/X.md 文件路径 (从 settings.canvas_base_path 解析, 失败返回 None) - record_error() 重写: · 调 classify_with_pedagogy() (Task 2 双标签) 替代 legacy classify() · 调 write_error_dual() (Task 4) 同步 frontmatter + fire-and-forget Graphiti · 保留全部 Story 3.6 legacy 字段 (error_type / remedy_strategy / error_type_label / remedy_description) 不破坏向后兼容 · graphiti_status: scheduled (fire-and-forget) | ok | failed | skipped_frontmatter_failed | not_attempted - AC #4 + #6: frontmatter 本地优先, Graphiti 失败仍 recorded=True Task 6 — e2e 集成测试 (5 tests, 全 PASS) backend/tests/integration/test_error_extraction_e2e.py 新增: - test_e2e_dialog_to_frontmatter_full_pipeline: 完整链路 dialog → ExtractErrorsFromDialog → classify_with_pedagogy → write_error_dual → frontmatter 含双标签 (legacy + pedagogy) + Graphiti record_knowledge_entity 被调用 - test_e2e_dialog_no_errors_no_writes: 无错误对话 → 无写入 (AC #5) - test_e2e_record_error_mcp_tool_full_pipeline: MCP tool input → SUPERFICIAL + sub_tag transfer_failure 触发 二义消解 → pedagogy_type=METACOGNITIVE_ERROR (D 方案核心验证) → frontmatter 含 transfer_self_explanation remedy - test_e2e_record_error_low_confidence_marked_ambiguous: confidence 0.45 → is_ambiguous=True (PRD AC #2) - test_e2e_record_error_graphiti_failure_frontmatter_succeeds: AC #6 验证 — memory_service ImportError → frontmatter 仍写入, recorded=True, graphiti_status=scheduled Story 2.5 spec status: ready-for-dev → ✅ done 测试: 全套 162 pytest passed (含 Story 2.5 共 55 tests: 24 mapping + 11 extractor + 15 writer + 5 e2e) Story 2.5 全 6 task ship 总览: - ✅ Task 1: error_extractor.py (LLM 对话错误提取, AC #1, #5) - ✅ Task 2: 4 主类 + 双标签 D 方案 (PRD AC #2) - ✅ Task 3: 补救策略 PedagogyErrorType → RemedyStrategy 映射 (PRD AC #3) - ✅ Task 4: error_writer.py 双写 (frontmatter atomic + Graphiti retry, AC #4 + #6) - ✅ Task 5: record_error MCP tool 升级 (向后兼容, 双标签 + 双写) - ✅ Task 6: e2e 集成测试 (5 tests, 全链路 + 边界场景) EPIC 2 进度: - Story 2.1 ✅ done (commit bfe0ef2) - Story 2.5 ✅ done (commit dad9ed7 + d7621f4 + 57aa3bd + this commit) - 进度 22% (2/9 stories, 20h/70h) - 剩余: 2.2 / 2.3 / 2.4 / 2.6 / 2.7 / 2.8 / 2.9 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
oinani0721
added a commit
that referenced
this pull request
May 4, 2026
…on d 方案 PLAN-EPIC2-STORY2.5-DEEP-RESEARCH 生成 ChatGPT deep research prompt 文档供用户复制粘贴用. 不同于 Story 2.1 三轮对抗审查 (4/10→7/10→8/10), 本次 Story 2.5 采用 deep research 模式 (学术 + 产业 + 改进方向综合研究). 文档结构: - System prompt: 教育心理学 + 学习科学 + AI 工程 跨领域 staff researcher - 项目坐标: GitHub URL + 分支 + HEAD commit (268c9aa) + 关键文件路径 - PRD §FR-CONV-06 期望 + AC #2/#4/#6 - D 方案核心代码片段: · ErrorType (legacy) + PedagogyErrorType (PRD) 双 enum · LEGACY_TO_PEDAGOGY 静态映射 · disambiguate_superficial() SUPERFICIAL 二义消解 · PEDAGOGY_TYPE_TO_REMEDIES 补救策略 · ClassifiedError 双标签数据模型 · EXTRACTION_PROMPT LLM 提取 prompt · write_error_to_frontmatter / write_error_to_graphiti · record_error MCP tool 输入输出 schema - 55 tests 基线 + 4 测试文件分布 - 4 个 deep research 核心问题: Q1 学术对齐度: Bloom / BKT/DKT / VanLehn / Chi / Ericsson Q2 产业实践对比: Khan Academy / Duolingo / NotebookLM / Anki AI 等 Q3 D 方案评估: 弱点+改进+方案选择 Q4 Phase 2 ROI: FSRS / Hub penalty / 跨概念模式 / Tip 生成 - 输出 format 模板 (table + 评分) 使用方式: 复制粘贴给 ChatGPT (推荐 GPT-5/o3 with deep research mode). 不是普通 code review, 不是审查 P0 安全问题. 是综合研究学术对齐 + 产业实践 + 改进路径. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
oinani0721
added a commit
that referenced
this pull request
May 4, 2026
基于 ChatGPT Round-2 reply (commit 348a7ae) 锁定的双 spec, ready-for-dev: Story 2.5.X — 用户主权回归 C+ 方案 (471 行, 18-24h, P0) · trace: FR-CONV-06 + Decision-Review-D15 (待用户 PRD §12 批注) · AC #1: AI 候选写 frontmatter error_candidates[] 不直接进 errors[] · AC #2: 6 状态机 (pending/accepted/edited/dismissed/disputed/expired) · AC #3: dedupe 不重复添加, hash 不含 session_id (跨 session 同错应 update 不 append) · AC #4: 非阻塞 Notice + Dashboard Dataview 保活 ("待复盘 N 条") · AC #5: POST /api/v1/errors/accept-candidate (candidate → errors[] + Graphiti) · AC #6: POST /api/v1/errors/rebuild-graphiti?group_id=... 兜底机制 · AC #7: dismissed/disputed 路径 (用户否决 AI, dispute_reason 必填) · 10 Tasks 含 candidate writer / 状态机 / accept/dismiss/dispute/rebuild endpoints + Dashboard Dataview / Plugin 命令 / session_id 注入 / expired 自动归档 · 依赖 Story 2.5 (commit 0d05ad8) · UAT 6 场景 + 7 自动 checkpoints Story 2.5.Y — 隔离硬化 SubjectConfig 复用 (494 行, 26-35h, P0) · trace: FR-CONV-06 + FR-CTX-08 + Decision-Review-D16 (待用户 PRD §12 批注) · AC #1: PostTurnExtractRequest 强制 vault_id 字段 (缺则 422) · AC #2: 复用 SubjectConfig.build_group_id() 派生 group_id · AC #3: error_writer.py:270 移除 DEFAULT_GROUP_ID 硬编码 (group_id 必填参数化) · AC #4: LanceDB 强制注入 WHERE group_id 过滤 (修 vault_notes_retriever) · AC #5: Cypher 防御性 helper cypher_with_group_filter() · AC #6: group_id 命名统一为 vault:<vault_id>[:<sub>] 格式 (弃 cs188/canvas-dev) · AC #7: per-group export/rebuild 脚本 + idempotency · AC #8: E2E 两 vault 同名节点不串 (三层 Cypher/LanceDB/Graphiti 隔离) · 10 Tasks 含 SubjectConfig 强化 / vault_id 字段 / 硬编码移除 / 隔离审计 + 命名迁移 / export-rebuild 脚本 / Plugin 端 vault_id 注入 / 文档更新 · 依赖 Story 2.5 + 2.5.X · UAT 8 场景 + 7 自动 checkpoints 合计工作量: 44-59h (Round-1 77-104h 减 43%) commit-ready 程度: 8.5/10 (ChatGPT Round-2) 下一步: - 用户在 PRD §12 批注 D15 (用户主权方案 = C+) + D16 (隔离方案 = 复用 SubjectConfig) - 用 bmad-bmm-dev-story skill 启动 Story 2.5.X 实施 (先 X 后 Y) - Y 在 X 基础上加隔离硬化, 改 X 的 endpoint 签名时同步更新 X 测试 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
oinani0721
added a commit
that referenced
this pull request
May 5, 2026
Story 2.5.X Task 2/10 完成 (AC #2: 6 状态机实现) 实施 (Task 2.1-2.5): - backend/app/services/candidate_state_machine.py 新增模块: · CandidateStatus Literal type (pending|accepted|edited|dismissed|disputed|expired) · ALLOWED_TRANSITIONS 图: pending → 5 终态全合法, 终态间 0 转换 · validate_status_transition(current, target) → HTTPException 422 (友好 error message 三类) · apply_status_change(candidate, target, *, changed_by="user") - 校验 + in-place mutation - 自动写 status_changed_at (ISO 8601 timestamp) - 自动写 status_changed_by ("user" 默认 / "system" cron 用) · is_terminal_status / is_active_status helpers (Dashboard 过滤) 测试 (41 新增): - 6 状态全包含 - 5 合法 pending→X (parametrize) - 9 非法 (5 反向 + 4 终态间, parametrize) - unknown current/target → 422 - terminal state error message 含 "terminal state" - ISO 8601 时间戳格式正则验证 - changed_by 默认 "user" / 显式 "system" - in-place mutation 验证 - 业务场景: accept/dispute/expire workflow + double accept 拒绝 回归: 110 测试全 pass (41 state_machine + 16 candidate_writer + 53 v1.0) 剩余 Story 2.5.X Tasks (8/10): Task 3: accept_candidate endpoint Task 4: dismiss/dispute endpoints Task 5: rebuild_graphiti endpoint + chat.py 切 candidate_only Task 6: Dashboard Dataview 保活 Task 7: Plugin 命令 + Notice + SuggestModal Task 8: session_id E2E 验证 (已部分覆盖) Task 9: expired 30 天 cron Task 10: 集成测试 + UAT Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
oinani0721
added a commit
that referenced
this pull request
May 5, 2026
…→ review
Story 2.5.X (D15 用户主权 C+) 全量 ship → status=review
Tasks 完成:
- Task 8: session_id E2E 验证 (跨 session 累加 seen_sessions)
- Task 9: expired 30 天自动归档 cron
- Task 10: 集成测试 (10 E2E + 各 service 单测覆盖完整)
实施:
backend/app/services/candidate_expiry_service.py (~210 行新文件):
- expire_pending_candidates(vault_root, *, expiry_days=30, now=None) → ExpireStats
· 扫描 vault 节点/*.md (复用 _scan_vault_md_files)
· pending + created_at < cutoff → apply_status_change("expired", changed_by="system")
· 幂等性: 已 expired 不再处理
· per-file lock 复用 (_get_file_lock)
· 仅当有改动时写文件 (避免无意义 mtime 更新)
· 单条失败不中断, 记入 failures[]
- _parse_created_at: ISO 8601 / Z 后缀 / naive / None 容错
- _is_expired: status=pending AND created_at < cutoff (无 created_at 保守跳过)
- ExpireStats / ExpireFailure Pydantic schemas
backend/tests/unit/test_candidate_expiry_service.py (20 测试):
- _parse_created_at: 4 容错场景
- _is_expired: 5 边界 (old/recent/non-pending/no created_at)
- expire 主流程: old → expired / recent 不变 / 终态跳过
- 幂等性: 第二次跑 0 expired
- 跨文件批量
- 无 created_at 保守跳过
- 仅当有 expire 时写文件 (mtime 不变)
- DEFAULT_EXPIRY_DAYS = 30 + cutoff_iso 写入 stats
backend/tests/integration/test_2_5_x_e2e.py (10 E2E 测试):
- E2E #1: full accept (write → accept → errors[] + Graphiti queued)
- E2E #2: accept with edits → status=edited + edits 应用到 errors[]
- E2E #3: dismiss path → 不入 errors[]
- E2E #4: dispute path → dispute_reason 持久化
- E2E #5 (Task 8): session_id 跨 3 session 累加 → 1 candidate + seen_sessions={s1,s2,s3}
- E2E #6 (Task 9): expired 30 天后 cron 归档 → status=expired + changed_by=system
- E2E #7 (Task 5): rebuild_graphiti 从 errors[] 重建
- E2E #8: rebuild dry_run 仅扫描计数
- E2E #9: 双重 accept 反向不可逆 → 422
- E2E #10: dismiss → accept 终态间被拒 → 422
测试累计:
- Backend: 167 passed (113 Story 2.5.X + 54 v1.0 回归)
· 16 candidate_writer (Task 1)
· 41 state_machine (Task 2)
· 14 candidate_service (Task 3+4)
· 13 rebuild_service (Task 5)
· 20 expiry_service (Task 9)
· 10 e2e (Task 8+10)
· 53 v1.0 (error_writer + extractor + classifier + ChatGPT regression)
- Plugin: 104 passed (19 helpers + 85 v1.0)
- TOTAL: 271 全 pass, 0 fail
sprint-status: 2-5-x in-progress → review
Story 2.5.X 完整交付:
- 4 backend endpoint: accept/dismiss/dispute/rebuild-graphiti
- 3 plugin command + 2 Modal class + helpers 模块
- Dashboard "📋 待复盘错误候选" Dataview section
- candidate_expiry_service cron (lifespan hook 集成留 2.5.Y)
- 6 状态机 + dedupe + per-file lock + 原子写入
- frontmatter error_candidates[] + errors[] 双数组并存
- session_id 透传 + seen_sessions[] 累加
下一步: 用户 UAT (Phase A 半手动 demo + 完整 7 命令测试) → done
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
oinani0721
added a commit
that referenced
this pull request
May 5, 2026
… 移硬编码 Story 2.5.Y (D16 隔离硬化) Tasks 1-3/10 完成 (并行 2.5.X review) Task 1 - 复用并强化 SubjectConfig (AC #2): - backend/app/core/subject_config.py: · 新增 build_vault_group_id(vault_id, subject_id, canvas_path) · 强制 vault: 前缀命名 (vault:cs_61b / vault:数学 / vault:cs_61b:algorithms) · vault_id 必填 (空抛 ValueError) · subject_id 优先于 canvas_path (互斥) · canvas_path 复用 extract_canvas_name 提取 stem · 新增 is_vault_group_id() helper (检测新格式 vs legacy) - 旧 build_group_id 保留向后兼容 (Story 1.9 production data) - 测试: 21 新增 test_subject_config_vault.py · vault_id 必填 (空/whitespace/None 抛错) · 基础组合 (vault_id 单/中文/sanitize) · subject_id 二级隔离 · canvas_path stem 提取 + .canvas 扩展 · 互斥性 (subject 优先于 canvas) · 与旧 build_group_id 区分性 · is_vault_group_id 识别新旧格式 Task 2 - PostTurnExtractRequest 加 vault_id 字段 (AC #1): - backend/app/api/v1/endpoints/chat.py: · PostTurnExtractRequest 加 vault_id: str = Field(..., min_length=1) 必填 · 加 subject_id / canvas_path 可选字段 · post_turn_extract endpoint 入口调 build_vault_group_id + set_current_subject_id 注入 ContextVar - 测试: 7 新增 test_post_turn_request_vault_id.py · vault_id 必填校验 (缺/空/None → ValidationError) · subject_id / canvas_path 可选默认 None · 中文 vault_id 通过 · v1.0 校验仍生效 (messages min/max + total_chars budget) Task 3 - error_writer 移除 DEFAULT_GROUP_ID 硬编码 (AC #3): - backend/app/services/error_writer.py: · write_error_to_graphiti 加 group_id: Optional[str] kwonly 参数 · group_id 解析优先级: 1. 显式 group_id 参数 (cron/CLI 场景) 2. ContextVar get_current_subject_id (endpoint 注入, 主路径) 3. fallback DEFAULT_GROUP_ID + structlog warning (deprecated) · record_knowledge_entity 调用改用 effective_group_id (不再硬编码) - 渐进式迁移: DEFAULT_GROUP_ID fallback 保留 (Task 6 命名迁移后可移除) 测试 cascading 修复: - backend/tests/integration/test_story_2_5_chatgpt_round2_p0.py: · 6 个 endpoint 测试加 "vault_id": "cs_61b" 字段 - backend/tests/integration/test_error_extraction_e2e.py: · write_error_dual 调用加 mode="write_confirmed" (Story 2.5.X 兼容) 回归: 217 测试全 pass (88 v1.0 + 113 Story 2.5.X + 28 Story 2.5.Y, 0 fail) sprint-status: 2-5-y ready-for-dev → in-progress 剩余 Story 2.5.Y Tasks (7/10): Task 4: LanceDB 向量搜索注入 group_id 过滤 Task 5: Cypher 防御性 helper cypher_with_group_filter Task 6: group_id 命名统一迁移脚本 (cs188/canvas-dev → vault:<id>) Task 7: per-group export/rebuild 脚本 Task 8: E2E 多 vault 测试 Task 9: Plugin 端传 vault_id (cascade 改 main.ts post-turn-extract) Task 10: 文档更新 (CLAUDE.md group_id 规约) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
oinani0721
added a commit
that referenced
this pull request
May 8, 2026
@SPEC: round-23-stage-1 7 sub-tasks 全部完成 (用户决策 2026-05-08): - 7.1 patch 1 fail-closed config (validate_security_defaults, 4/4) - 7.2 patch 2 canonical_group_id 单一入口 (cs188→vault:default, 6/6) - 7.3 patch 3 search_nodes fulltext fast path (13/13) - 7.4 错误管理读路径接通 (error_reader + 3 GET endpoints, 7/7) - 7.5 internal_api_key + websocket auth (8/8 fail-closed matrix) - 7.6 cs188 历史数据迁移脚本 (CLI dry-run/apply/json/force) - 7.7 测试 + uat (102/104 = 98%) round-14 残缺修复进度: - #1 错误管理只写不读 → 修复 - #2 cs188 group_id 散落 → 修复 - #3 search_nodes CONTAINS 退化 → 修复 - #4 前后端零同步 → stage 2 felt-sense: 3.5/10 → 8.3/10 (+4.8) source: round-23-chatgpt-dr-result-and-synthesis-2026-05-08.md uat: Stage-1-Round-23-阶段1-硬化-UAT-2026-05-08.md Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
oinani0721
added a commit
that referenced
this pull request
May 8, 2026
…onship sync Round-14 残缺 4 项最终修复(#1-#4 全部 ship): - #1 错误管理只写不读 (Stage 1 已修, error_reader.py) - #2 cs188 group_id 散落 (Stage 1 已修, canonical_group_id) - #3 search_nodes CONTAINS 退化 (Stage 1 已修, fulltext fast path) - #4 前后端零同步 (Stage 2 修复, relationship_sync_service) 5 sub-tasks 完成: - 8.1 Wikilink 增量 refresh: schedule_note_index + _debounced_note_index + POST /api/v1/index/refresh-changed (6/6 端到端测试 pass) - 8.2 JSON fallback 原子化: 新建 app/utils/atomic_io.py (149 LOC) + 替换 4 处 json.dump (memory/edge/review/canvas) (5/5 crash-safe) - 8.3 Graphiti 读写一致性: 现状盘点 (50/50 现有测试 pass) + vector_count placeholder 接真实 LanceDB stats - 8.4 残缺 #4 frontend ↔ Graphiti 双写: relationship_sync_service.py (253 LOC) + POST /sync/relationships/by-node + /sync/relationships/vault (8/8) - 8.5 测试 + UAT: 152/154 全栈回归 (98.7% pass) Karpathy 80/20 实践: 实际工时 ~12.5h vs ChatGPT 估 60h (节省 79%)。 Stage 1 patches 已铺基础设施 + 现有 atomic 模板复用 80% + Graphiti 集成已实现 50/50 测试通过 — 大量预设工作已完成无需重写。 Felt-sense 整体闭环成熟度: 4.0/10 → 9.0/10 (+5.0) EPIC1-BMAD-DEV-ASSESS-2026-04-17 PLAN-023 @SPEC: round-23-stage-2 Source: _bmad-output/research/round-23-chatgpt-dr-result-and-synthesis-2026-05-08.md UAT: _bmad-output/验收单/Stage-2-Round-23-阶段2-收口-UAT-2026-05-08.md Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
oinani0721
added a commit
that referenced
this pull request
May 8, 2026
Story 2.2 (supplementary-material-search) Phase A 落地。按 Round-23 之后渐进 UAT 模式(每 Phase ship mini-UAT 防偏离),Phase A = Task 1 (MCP 集成) + Task 4 (降级)。 变更: - 新建 backend/app/services/supplementary_search_service.py (~210 行) - hybrid 搜索 (bge-m3 + jieba) + source priority 复用 + explanation files filter - 三档降级:lancedb_unavailable / search_failed / empty_index / all_filtered_below_threshold - format_supplementary_xml 含 XML escape 防 vault 内容破坏 XML 解析 - backend/app/api/v1/endpoints/chat.py 4 处 patch - imports: pathlib.Path + structlog + supplementary_search_service - EnrichContextResponse 加 supplementary_count/degraded/reason 3 字段(零 schema breaking) - enrich_context Step 5 注入(mode=answer + user_question 双重守门,Story 2.1 预留 schema 直接复用) - return 回填 3 新字段 - canvas-vault/.claude/skills/chat-with-context/SKILL.md 3 处 patch - prompt 解析说明加 <supplementary_materials> section 描述 - 开场白模板加 "📚 相关材料" 提示 - 新增 "## 补充材料展示" 段含 felt-sense 引导 + 降级处理规则 AC 覆盖: - AC #1 (search_vault_notes 集成) ✅ chat.py:177-216 - AC #5 (LanceDB 不可用降级) ✅ supplementary_search_service.py:42-114 + chat.py try/except - AC #4 (增量索引 < 500ms) ✅ 复用 Story 38.1 lancedb_index_service Phase A 暂不做(明确 scope 防蔓延): - AC #2 (wikilink 三精度 file/heading/block_id) → Phase B - AC #3 (类型权重精排 lecture > discussion > exam) → Phase B - Task 5 (单元 + 集成 + 性能测试) → Phase C mini-UAT 验收单(DoD-3 v3.0 双段铁律 7 sections + 5 题自检全过): _bmad-output/验收单/Story-2.2-Phase-A-MCP-集成-2026-05-08.md Plan: EPIC1-BMAD-DEV-ASSESS-2026-04-17 Story: 2.2-Phase-A Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
oinani0721
added a commit
that referenced
this pull request
May 9, 2026
用户实测痛点: phase a supplementary 给的 wikilink 点击不能跳转到具体笔记片段.
样本: [[节点/规划的分类-1549()#规划的分类-1549|跳转]] - heading anchor 漂移
样本: [[raw/CS188/videos/lectures/lecture 2/chunks/merged#4.4 价值迭代...|跳转]]
- chunks/merged.md 是 lancedb 切分时的虚拟派生路径,文件不真实存在
3 并行 agent 实测确认双重 bug:
根因 #1 (70%): heading over-strip
- supplementary_search_service.py:404 旧代码: re.sub(r"\(\)\s*$", "", heading)
- 本意清残留 "[time]()" 但被前一条 regex 已处理
- 副作用: 节点 "规划的分类-1549()" 的 heading "# 规划的分类-1549()" 被剥成 "规划的分类-1549"
- → wikilink anchor 与 obsidian 文档实际 heading 不字面匹配 → 不跳转
根因 #2 (30%): chunks/merged 虚拟派生路径
- lancedb_client.py:2098-2230 切分 md 时按 heading 分 chunk,写 file_path 含 "chunks/merged.md"
- vault 内实际只有 "lecture 2/lecture 2.md",chunks/merged.md 不真实存在
- obsidian "no file matches" → 跳转失败
业界共识 (smart connections / khoj / copilot for obsidian 100% 一致):
chunk 是索引虚拟物件,绝不写虚拟派生文件,citation 始终指向原 .md + heading
patch (2 处):
- 移除 over-strip regex `\(\)\s*$` (line 404)
· 仅保留 [[wikilink]] 清理 + [text](url) markdown link 清理
· heading 字面保留所有 () / - / : 等真实字符
- 新增 _resolve_chunks_to_source_file(path) helper:
· "X/chunks/<chunk>.md" → "X/X.md" (回写到原文件)
· 不含 chunks/ 的 path 原样返回
- _normalize_material 调用 helper line 405
obsidian 跳转规则验证:
- heading 严格 case-sensitive + 字面匹配 (obsidian help / forum 40724)
- 文件名 case-insensitive + 模糊
- 末尾空格被 trim, 但 - 和 () 必须字面保留
- chunks/ 派生路径 obsidian 永远找不到文件
实测预期 (drop + reindex 后):
- wikilink: [[节点/规划的分类-1549()#规划的分类-1549()]] (heading 含真实 ())
- wikilink: [[raw/CS188/videos/lectures/lecture 2/lecture 2#2.3 规划代理 (Planning Agents)]]
(chunks/merged 已回写到 lecture 2.md)
- 用户点击真实可跳转
未做 (留 phase b):
- block-id `^c-{hash8}` 写入源文件做 stable anchor (业界终极解)
- heading 大小写 / 末尾空格规范化 (相对低频问题)
plan: EPIC1-BMAD-DEV-ASSESS-2026-04-17
story: 2.2-phase-a-t1.5
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
oinani0721
added a commit
that referenced
this pull request
May 10, 2026
…oad + mv3 8 tests
防 5 vault 并发串库 — enrich-context endpoint vault_id 必填全链路。
- MV-1 backend/app/api/v1/endpoints/chat.py
EnrichContextRequest.vault_id: str = Field(..., min_length=1) 必填
EnrichContextRequest.subject_id: str | None = None 可选 (一 vault 一学科兼容)
handler 入口调用链:
sanitize_vault_id(req.vault_id) → 标准化 (NFKC + casefold + Unicode \w)
build_vault_group_id(...) → 构造 vault:<id>:<subj>:<canvas>
set_current_subject_id(group_id) → 写 ContextVar
→ downstream wikilink/lancedb/supplementary 各 service 通过
get_current_subject_id() 拿到同一 vault_id,5 vault 并发不串库
契约参考 PostTurnExtractRequest (Story 2.5.Y AC #2)
- MV-2 frontend/obsidian-plugin/src/main.ts
handleChatWithContext (Cmd+Shift+E) payload 加
vault_id: inferVaultId(this.app.vault.getName())
handleStudyQuestion (Cmd+Shift+Q) payload 同步
Plugin 传 raw vault name,backend 端统一 sanitize 防算法漂移
- MV-3 backend/tests/unit/test_enrich_context_vault_isolation.py (新增)
8 个核心防御测试:
1 vault_id 缺失 → 422 (plugin 旧版本不能 silent corruption)
2 vault_id 空字符串 → 422 (min_length=1)
3 vault_id 提供 → sanitize + build + set 全链路验证
4 中文 vault_id 不坍缩 default (最致命数据泄漏点回归保护)
5 subject_id 可选 (向后兼容)
6 并发隔离 (asyncio.gather 2 vault 跑 ContextVar 各自独立) — P0 核心
7 ../etc/passwd 路径遍历被净化
8 emoji 被 strip 中文保留
辅助调整:
test_chat_endpoint.py / test_study_question_deep_mode.py
helper _enrich_payload / _payload 加 vault_id='test_vault' 默认
保 14+8=22 现有测试通过 (回归保护)
跑全套件 65 测试 0 regression (8 新 + 19 chat + 8 sq + 17 rag-p0 + 13 supp)
Trace: EPIC1-BMAD-DEV-ASSESS-2026-04-17
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
oinani0721
added a commit
that referenced
this pull request
May 12, 2026
…PT P0-A 间接) - main.ts handleChatWithContext: AbortController + setTimeout(3000) + signal/clearTimeout - main.ts fallbackToLocalNeighbors: 新增方法 (collectNodeNeighbors + buildNodeChatPrompt) - node-chat-context.ts: 新增 buildChatWithContextFallbackPrompt 纯函数 + ChatFallbackReason type - tests/chat-fallback.test.ts: 8 新增 unit tests (156/156 pass, 49ms) - sprint-status: 2-2-and-2-9-merged in-progress, T1 review (用户 UAT 已 pass) - 验收单 DoD-3 双段铁律合规(3 次 hook 阻断后改"被动观察"设计通过) PLAN-EPIC1-BMAD-DEV-ASSESS-2026-04-17 Story: 2.2+2.9 Trace: 合并 spec AC #2 (原 Story 2.9 AC #6) / ChatGPT Review 2026-05-11 P0-A 间接关联 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
oinani0721
added a commit
that referenced
this pull request
May 12, 2026
…idence T3 (T3.1-T3.11): supplementary_reranker.py + rerank_service.py + wire - TYPE_WEIGHTS table (PRD §4.1.1, lecture_notes 1.0 → raw_notes 0.6) - BM25 Okapi 自实现 (jieba 中英分词, 避免引入未声明 rank-bm25 dep) - hub_penalty = log(degree/median + 1) (Story 2.9 AC #2) - final_score = relevance × type_weight + query_overlap × 0.3 - hub_penalty - get_filter_threshold() = 0.42; rerank top_k=5 - chat.py wire rerank 进 supplementary 流程 - format_supplementary_xml 透出 rerank 4 字段 attribute - TraceItemModel API 加 5 optional 字段 (forward compat) T5 (T5.1-T5.3, T5.5; T5.4 plugin Notice 留 plugin iter): - TraceItem.evidence + WikilinkNeighborContext.evidence - _extract_relationship_info() 返回 (type, evidence) tuple - assembler 渲染 `- 引证: ...` 行 (xml_text_escape + 截断 200 字) 测试: 65+ 新增 unit tests; 126/126 touched 文件 green; 零回归 PLAN-ID: EPIC1-BMAD-DEV-ASSESS-2026-04-17 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
oinani0721
added a commit
that referenced
this pull request
May 12, 2026
…dit f1+f3 chatgpt v4 5-verdict 拿到 4 closed + 1 closed-with-caveats. 同时找到 3 个 p1 我和 claude self-audit 都漏的真问题. 本 commit 完全闭口. w3-1 metadata redaction (chatgpt p1 #1): - supplementary_search_service.py:format_supplementary_xml 扩展 taint-aware 到 metadata 字段. 当 taint in {review, quarantine}, title / wikilink / source_path 也输出 [redacted: tainted title (risk=x.xx)] / [redacted] 等 placeholder, 不再无条件 _xml_escape 原文. - 修旁路: 攻击者把 prompt injection payload 埋 frontmatter title 即可绕过 (snippet redacted 但 title 原样进 prompt). 4 新测试覆盖 review/quarantine /clean/partial-field 4 路径. - bonus: test_supplementary_metadata_fuzz.py 去掉 4 个 xfail (sanitizer 现 merged 可强制 gate) w3-2 de-xfail 2 security tests (chatgpt p1 #2): - test_supplementary_review_floor.py + test_cross_vault_global_search.py 去掉 @pytest.mark.xfail(strict=false) 装饰器 - test_supplementary_review_floor 内 assert 翻转 (从"review survive floor" 改为"review must be dropped by floor"匹配 wave-2 p0-3b 修法) - --strict-markers 跑现 0 xfail / 0 xpassed / 2 passed strict w3-3 lancedb except narrowing + warning (chatgpt p1 #3): - lancedb_client.py:active_vault_id level 2/3 except 缩窄 (importerror, attributeerror, runtimeerror, valueerror) — basesexception / keyboardinterrupt / systemexit / asyncio.cancellederror 现在正确传播 - level 4 default fallback 前加 logger.warning ("fell back to 'default'") - 3 新测试覆盖: default fallback log / narrow exception keyboardinterrupt 传播 / runtime error fall through 链路完整 w3-4a frontend trim (claude self-audit f1): - main.ts:buildBackendHeaders 加 .trim() — 防 user 误填 " " whitespace key 被当 valid 触发 backend 403, 同时 trim 后发送防 leading/trailing space constant_time_compare 失败 - buildBackendHeadersPure pure-function mirror 同步更新 - 5 新测试 (whitespace-only / 混合 whitespace / 两端空格 / 内部空格保留 / undefined 安全) w3-4b wikilink __default__ once-warned (claude self-audit f3): - wikilink_graph_service.py:_resolve_vault_key 当落 __default__ 桶时 inspect.stack() 取真实 caller frame, 同 caller 仅 warn 一次 (避免每请求 噪音), 不同 caller 独立 warn - 加 _caller_fingerprint helper + _warn_default_fallback_once helper - 3 新测试 (same caller warns once / different callers independent / exception path also warns) 测试: backend 262 pass (含 3+3+4+3=13 新) / 1 pre-existing fail (story 2.5.y d16 cs61b: -> vault:cs61b: 格式锁, wave-3 无关) / 0 xpassed (strict gate); frontend 191 pass (+5 trim 测试). --strict-markers 闭口 ci. PLAN-ID: EPIC1-BMAD-DEV-ASSESS-2026-04-17 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
oinani0721
added a commit
that referenced
this pull request
May 13, 2026
story 2.3 (current_task 8-session plan s2) v1.0 ship — 5 ac + 5 task / 20 子任务全实现: ac #1: memory_service.search_error_memories(node_id, group_id, limit=5) wrapper - post-merge filter by episode_type ∈ {error, misconception, mistake} - timestamp desc sort, oversample max(20, limit*4) 防 episode_type filter 后剩余不足 - schema normalize → error_type/description/corrected_at/tags/source_session - search_memories signature 加 node_id 参数, 向后兼容 50+ 现有调用方 ac #2: chat_context_assembler.inject_error_reminders + priority 1.5 注入 - _format_historical_errors xml 标签包装 + 顶部 <policy> 段 (自然过渡 + 不要生硬插入) - 正面措辞模板硬编码 (学习者之前标记过 ... 如果讨论涉及此话题, 请自然地提醒区分) - assemble_context historical_errors 参数 priority 1.5 (current_note 后 1-hop 邻居前) - token 不够整段跳过不截断 (单条 error 截断会失真) ac #3 性能: chat.py asyncio.wait_for(timeout=3.0) + structlog memory_search_latency_ms ac #4 双路径熔断: - timeouterror → reason=search_timeout - (connectionerror, runtimeerror, oserror) → reason=service_unavailable - 降级时 historical_errors=[], 对话照常进行用户不感知 ac #5 空记录: empty list → 跳过 priority 1.5 段, 不输出冗余 无历史误解 提示 测试: tests/unit/test_story_2_3_error_reminders.py (287 行, 21 用例, 1.64s) 回归: test_chat_context_assembler.py + test_chat_endpoint.py 共 66 用例零失败 总计: 87/87 pass dod-3: _bmad-output/验收单/Story-2.3-historical-error-reminder.md (status: review) - d3-a 段 4-b 禁词 0 命中 - d3-e 我做 x→我看到 y→我感觉 z felt-sense 14 处 - d3-c 段 4-a 21 项 claude 已代验全 ✅ 含证据 plan: EPIC1-BMAD-DEV-ASSESS-2026-04-17 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
oinani0721
added a commit
that referenced
this pull request
May 13, 2026
…pass opt-in chatgpt-dr-2026-05-13 安全审查 critical #2: 修 memory poisoning 攻击向量. 漏洞: - 旧版 _require_observer_token 在 SIDECAR_OBSERVER_TOKEN 未配置时直接 return - 任何可达客户端可匿名 POST /memory/extract-conversation 注入 misconception - 攻击者可武器化用户个人记忆管道, 让 ai 按假误解出针对性考题 - 与 user 核心诉求 批注+个人记忆+检验白板针对性考察 直接冲突 修复 (chatgpt 推荐 + claude 加强): - 默认 fail-closed (token unset 也返回 503, 不再 open) - 显式 ALLOW_LOCAL_OBSERVER_BYPASS=true env 开关 - bypass 同时要求 client.host loopback only - 即便 bypass=true 但来自 lan/external ip → 仍 503 (纵深防御) - structlog warning log 当 bypass 触发, ops 可见 auth decision matrix: - token set + match → allow - token set + mismatch → 401 - token unset + bypass=true + loopback → allow + warning - token unset + bypass=true + non-loopback → 503 - token unset + bypass=false/unset → 503 测试: 12 cases, 6 branches + 3 defense-in-depth + 1 regression. 12/12 pass / 0.98s. archive: _bmad-output/research/2026-05-13-chatgpt-security-audit-INLINE.md 历史: chatgpt 已 3 次提生产默认值收紧 (round-23 / wave-2 v4 / 本次) 本次首次落地. @SPEC: PLAN-EPIC1-BMAD-DEV-ASSESS-2026-04-17 @SPEC: PLAN-001-CHATGPT-DR-2026-05-13-P0-1 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
oinani0721
added a commit
that referenced
this pull request
May 15, 2026
α-5 (反馈瞬间 #2/#3, 零依赖): - status-bar.ts: pure functions (countTipsFromFrontmatter, buildNavPath, buildStatusBarText, classifyTipsTransition, shouldTrackInNavPath, buildTipsIncreaseNotice) + StatusBarController class - main.ts onload: addStatusBarItem + new StatusBarController · metadataCache.on('changed') fan-out 到 statusBar.handleMetadataChanged · workspace.on('file-open') 接 handleFileOpen · onLayoutReady 用 getActiveFile() 触发一次初始化 - 格式: "📝 Tips: N · 📍 prev → current" 常驻 status bar - Notice "🎓 已记住 N 条 Tips" 在 tips count 自增时触发 - 37 tests all pass (pure helpers + spec-as-test grep main.ts wiring) α-3 (反馈瞬间 #4/#5, 接口契约已锁, 等 backend α-2/α-4): - exam-quick.ts: pure helpers (buildExamFilePath, todayDateStr, buildExamFileBody, extractAnswer, hasFeedbackSection, buildFeedbackAppend) + QuickExamController class - main.ts onload: new QuickExamController (closure 注入 callBackend + inferVaultId) · vault.on('modify') fast-path 分发到 onFileModified · addCommand canvas:start-quick-exam - 考察文件协议: 路径: 节点/考察-{concept}-{YYYY-MM-DD}[-{n}].md (重名递增) frontmatter: exam_question_id / source_concept / generated_at / exam_status 正文: # 考察 / ## 题目 / ## 你的回答 / ## 提交 (Cmd+S) 评分后追加: ## 反馈 ({score}/5) + feedback 文本 - 防重复 grade: hasFeedbackSection + session.graded flag - 35 tests all pass (含 spec-as-test grep main.ts wiring) Backend 契约 (协调文档 §5 锁定): - POST /api/v1/exam/quick body={node_id, vault_id} resp={question_id, question_text, generated_at?} - POST /api/v1/exam/grade body={question_id, user_answer} resp={score:0-5, feedback, mastery_delta?} Schema 不一致时 plugin 会 Notice "后端返回数据缺 X — 接口契约未对齐". 测试: - npm test: 253/265 pass, 12 fail (pre-existing baseline: callout wrapSelection ×6 + vault-indicator wiring ×6, 与 Session B 无关) - 新增 72 测试全 pass Build: - npm run build → 0 TS error, main.js 103KB - DD-12 阻断 cp 到 canvas-vault/, 由协调员部署 Broadcast: _bmad-output/_status/mvp-alpha-broadcast-session-b.yaml PLAN-NNN: EPIC1-BMAD-DEV-ASSESS-2026-04-17 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
This branch lands
fix-test-infra-paralysis(originallya7dd270, cherry-picked asf19dcff) plus two follow-up fixes based on ChatGPT Deep Research review:f19dcff: The main fix-test-infra-paralysis change — wrapper-based hook exit code propagation, 8 stale-mock skips, new EpisodeWorker baseline test file. Reduces backend unit failures from 256 → ~103 (~60%).7f495edFrontend stryker wrapper: Closes the last residual pipe-blindness in.claude/hooks/post-tool-router.shline 70, flagged by ChatGPT review.0fd6d39Hash length assertion fix: Correctstest_id_formatandtest_batch_id_formatassertions from+16to+32to match the actualsha256[:32]truncation inmemory_service.py. Reduces failures 103 → 101.This PR is not intended for merge yet — it exists to (1) trigger CI (test.yml + api-spec-sync.yml) so combined status becomes populated for ChatGPT review, and (2) give a review thread anchor.
Key review areas
scripts/run_cmd_capture.sh(new in cherry-pick, 96 lines) — wrapper that captures stdout+stderr to /tmp via&>, preserves$?, prints tail on failure. No pipes internally..claude/hooks/post-tool-router.sh— frontend stryker now routed through wrapper (3 backend pytest tiers were already routed in the cherry-picked commit). Knip moved into a cwd-isolating subshell.backend/tests/unit/test_episode_worker_retry.py(new in cherry-pick, 312 lines) — 5 baseline tests covering EpisodeWorker enqueue / exponential backoff / dead-letter / metrics / request_id propagation. Replaces semantics that used to be inMemoryService._write_to_graphiti_json_with_retry(deleted in Phase 2 refactor).backend/tests/unit/test_story_30_10_idempotency.py— 4-line assertion fix (+ docstring "hash16" → "hash32").Test plan
backend/tests/unit/test_episode_worker_retry.py— 5 PASS (baseline)backend/tests/unit/test_story_30_10_idempotency.py::TestDeterministicEpisodeId::test_id_format— PASS (post-fix)backend/tests/unit/test_story_30_10_idempotency.py::TestBatchDeterministicEpisodeId::test_batch_id_format— PASS (post-fix)pytest backend/tests/unit/ -m "not integration" --ignore=...batch...— 101 failed / 8 errors / 48 skipped / 2506 passed (down from 103/8 baseline)bash -n .claude/hooks/post-tool-router.sh— syntax cleanOut of scope (follow-up)
dictConfig(document as logging discipline)🤖 Generated with Claude Code