feat(memory): wiki self-healing — cao memory heal (Phase 4 U1)#306
fanhongy wants to merge 6 commits
Turn lint findings into fixes: orphan/contradiction/stale_claim repaired under --apply, poison dual-gated, graph_density flag-only. Dry-run default, audit trail, .heal.lock, bounded caps. Closes #297.
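The gating in this description can be sketched as a small decision function. This is a minimal illustration, not code from `wiki_healer.py`: the `gate()` helper, its return strings, and the three groupings are assumptions made for the example; only the issue-type names and the flag semantics come from the PR text.

```python
from typing import Literal

Decision = Literal["apply", "plan", "flag"]

# Issue types named in this PR; the grouping into sets is an illustrative assumption.
REPAIRABLE = {"orphan_page", "contradiction", "stale_claim"}
DUAL_GATED = {"poison_frequency"}
FLAG_ONLY = {"graph_density"}


def gate(issue_type: str, apply: bool = False, aggressive: bool = False) -> Decision:
    """Decide whether a finding is mutated, only planned (dry-run), or only flagged."""
    if issue_type in FLAG_ONLY:
        return "flag"  # never mutated, whatever flags are passed
    if issue_type in DUAL_GATED:
        # Both flags are required; either one alone stays a dry-run plan.
        return "apply" if (apply and aggressive) else "plan"
    if issue_type in REPAIRABLE:
        return "apply" if apply else "plan"
    raise ValueError(f"unknown issue type: {issue_type}")
```

With both flags left at their defaults every repairable finding resolves to `"plan"`, which matches the dry-run default stated above.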
f6d3bf7 to 3115067
Pull request overview
Adds a “self-healing” backend workflow for the memory wiki by introducing a wiki_healer service and a new cao memory heal CLI command that turns wiki_lint.run_lint() findings into planned/applied remediation actions (dry-run by default), with audit logging for each mutation.
Changes:
- Introduces `services/wiki_healer.py` with dry-run planning, `--apply` gating, per-type/run-level caps, and a `.heal.lock` concurrency guard.
- Adds `cao memory heal` CLI command plus CLI tests, and extends the audit-log sync whitelist for the new heal events.
- Adds a dedicated unit test suite for healer behavior across issue types (including gating, caps, and lock contention) and updates memory docs.
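A `.heal.lock` guard of the kind listed above could look roughly like the following. This is a sketch under assumptions: the `heal_lock()` context manager and the `HealLockBusy` exception are hypothetical names, and the real service may structure its locking differently. The only details taken from the PR are the lock file name and the use of a non-blocking `flock`.

```python
import fcntl
import os
from contextlib import contextmanager
from typing import Iterator


class HealLockBusy(RuntimeError):
    """Raised when another heal run already holds the lock (hypothetical name)."""


@contextmanager
def heal_lock(wiki_dir: str) -> Iterator[None]:
    """Hold an exclusive, non-blocking flock on <wiki_dir>/.heal.lock."""
    path = os.path.join(wiki_dir, ".heal.lock")
    fd = os.open(path, os.O_CREAT | os.O_RDWR, 0o644)
    try:
        try:
            # LOCK_NB makes a contended lock fail immediately instead of waiting.
            fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError as exc:
            raise HealLockBusy(path) from exc
        try:
            yield
        finally:
            fcntl.flock(fd, fcntl.LOCK_UN)
    finally:
        os.close(fd)
```

Failing fast on contention lets the CLI report a lock conflict to the user rather than hang behind another run. `fcntl` is POSIX-only, so this sketch does not cover Windows.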
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| `src/cli_agent_orchestrator/services/wiki_healer.py` | Core implementation of wiki “heal” planning/apply logic, per-issue-type fixers, caps, and locking. |
| `src/cli_agent_orchestrator/cli/commands/memory.py` | Adds `cao memory heal` command that runs lint then invokes healer with formatting options. |
| `src/cli_agent_orchestrator/services/audit_log.py` | Adds new heal-related event types to the SYNC audit whitelist. |
| `test/services/test_wiki_healer.py` | New end-to-end/unit tests for healer behavior, including dry-run invariants, fixes, caps, and lock conflict. |
| `test/cli/commands/test_memory.py` | CLI-level tests for heal flag plumbing, filtering, output formats, and lock conflict surfacing. |
| `test/services/test_audit_log.py` | Updates whitelist expectation to include heal events. |
| `docs/memory.md` | Documents delivery status and references the new healing functionality. |
    @click.option(
        "--scope",
        type=click.Choice([s.value for s in MemoryScope], case_sensitive=False),
        default="project",
        show_default=True,
        help="Scope to heal.",
    )

    if cap is not None and (
        sum(1 for a in actions if _action_src_type(a.issue_type) == t) >= cap
    ):
        truncated_by_type[t] = truncated_by_type.get(t, 0) + 1

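The cap check in the hunk above can be illustrated in isolation. The sketch below is an assumption-laden simplification, not the PR's code: `plan_with_caps()` is a hypothetical helper, it tracks per-type usage with a `Counter` instead of re-summing the action list, and it works on bare issue-type strings. The `MAX_HEAL_ACTIONS = 200` value is the run-level cap stated in the PR summary.

```python
from collections import Counter
from typing import Dict, Iterable, List, Optional, Tuple

MAX_HEAL_ACTIONS = 200  # run-level cap stated in the PR summary


def plan_with_caps(
    issue_types: Iterable[str],
    per_type_cap: Dict[str, Optional[int]],
    max_actions: int = MAX_HEAL_ACTIONS,
) -> Tuple[List[str], Dict[str, int]]:
    """Return (accepted issue types, truncated count per type)."""
    accepted: List[str] = []
    used: Counter = Counter()
    truncated: Dict[str, int] = {}
    for t in issue_types:
        cap = per_type_cap.get(t)  # None means "no per-type cap"
        over_type = cap is not None and used[t] >= cap
        over_run = len(accepted) >= max_actions
        if over_type or over_run:
            # Count what was dropped so the report can say so explicitly.
            truncated[t] = truncated.get(t, 0) + 1
            continue
        accepted.append(t)
        used[t] += 1
    return accepted, truncated
```

Returning the truncation counts alongside the accepted work is what lets a report state how many findings were left unhandled instead of dropping them silently.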
    action = await healer(svc, issue, scope, scope_id, db)
    actions.append(action)
    n_this_type += 1
    actions_applied += 1
    db.commit()

        action = await healer(svc, issue, scope, scope_id, db)
        actions.append(action)
        n_this_type += 1
        actions_applied += 1
        db.commit()
    except Exception as e:
        # Roll the whole group back; surface as errored actions so the
        # report is truthful (no silent partial-commit).
        logger.warning("heal batch group rolled back type=%s: %s", t, type(e).__name__)

    return HealAction(
        "orphan_pruned",
        key,
        status="applied",
        description="file already absent; index/metadata cleaned",
    )

**In progress:** Phase 4 U1 wiki self-healing adds `cao memory heal`, which consumes the
Phase 3 lint findings and applies a fix per issue type (dry-run by default, `--apply` to
mutate, full audit trail). It lives on `feat/wiki-self-healing` and is not yet PR'd.
…ing, scope choice
Buffer per-action audit payloads and emit only after the group commit succeeds, so a rolled-back heal never records a false mutation (notably `poison_frequency`). Exclude skipped no-ops from cap budget in both apply and dry-run. Restrict `heal --scope` to global/project. Doc + wording fixes.
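The commit-then-emit ordering described in this commit message can be sketched with the standard library's `sqlite3`. Everything here is illustrative: `apply_group()`, the `AuditPayload` tuple shape, and the synchronous callbacks are assumptions for the example, whereas the PR's healers are `async` and run on a shared session.

```python
import sqlite3
from typing import Callable, Dict, List, Tuple

AuditPayload = Tuple[str, str, Dict[str, str]]  # (event_type, message, details)


def apply_group(
    db: sqlite3.Connection,
    fixes: List[Callable[[sqlite3.Connection], AuditPayload]],
    emit: Callable[[AuditPayload], None],
) -> bool:
    """Run all fixes in one transaction; emit audit events only after commit."""
    pending: List[AuditPayload] = []
    try:
        for fix in fixes:
            # Each fix mutates on the shared connection and returns its audit payload.
            pending.append(fix(db))
        db.commit()
    except Exception:
        # Nothing was committed, so the buffered payloads are discarded unemitted.
        db.rollback()
        return False
    for payload in pending:
        emit(payload)
    return True
```

Because emission happens strictly after `commit()`, a failure in any fix leaves both the database and the audit trail untouched for that group.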
    async def _heal_orphan(
        svc: MemoryService, issue: LintIssue, scope: str, scope_id: Optional[str], db: Any
    ) -> HealAction:
        """Delete wiki file + index line + SQLite row for one orphan_page issue.

        The SQLite row delete runs on the SHARED ``db`` session (no per-issue
        commit); the batch transaction is committed once by the caller.
        """
        try:

This is a good call. I will add the verification before deleting the memory wiki file.
    if pre_strip is None:
        # No paragraph matched — leave content unchanged, audit + skipped. This
        # records a read-only outcome (nothing mutated), so it is safe to emit
        # regardless of the batch commit; buffer it for uniformity.
        return HealAction(
            "stale_claim_pruned",
            key,
            status="skipped",
            description=f"stale id {stale_id} not found in article",
            pre_strip_paragraph=None,
            audit=(
                "stale_claim_pruned",
                f"no paragraph found for {stale_id} in {key}",
                {
                    "key": key,
                    "scope": scope,
                    "scope_id": scope_id or "",
                    "stale_identifier": stale_id,
                    "pre_strip_paragraph": "",
                },
            ),
        )

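For context on the `pre_strip is None` branch above, a paragraph strip that returns the removed text could be sketched as follows. This is an assumption, not the PR's implementation: `strip_stale_paragraph()`, the blank-line paragraph split, the substring match, and the `MAX_STASH_CHARS` value are all hypothetical choices for illustration.

```python
from typing import Optional, Tuple

MAX_STASH_CHARS = 2000  # hypothetical size cap for the stashed paragraph


def strip_stale_paragraph(content: str, stale_id: str) -> Tuple[str, Optional[str]]:
    """Remove the first blank-line-delimited paragraph that mentions stale_id.

    Returns (new_content, pre_strip_paragraph). The second item is None when
    no paragraph matched, in which case content is returned unchanged.
    """
    paragraphs = content.split("\n\n")
    for i, para in enumerate(paragraphs):
        if stale_id in para:
            remaining = paragraphs[:i] + paragraphs[i + 1 :]
            # Stash a size-capped copy so the removal can be reviewed later.
            return "\n\n".join(remaining), para[:MAX_STASH_CHARS]
    return content, None
```

Returning `None` for the stash gives the caller an unambiguous signal to record a skipped, read-only outcome, as the hunk does.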
    audit=(
        "orphan_pruned",
        f"deleted orphan wiki file: {key}",
        {
            "key": key,
            "scope": scope,
            "scope_id": scope_id or "",
            "file_path": str(wiki_path),
        },
    ),

| — | **Phase 4 U1 — wiki self-healing** (`cao memory heal`): turn lint findings into fixes, dry-run by default | 🟡 In progress — branch `feat/wiki-self-healing` |
| — | **Phase 4 — import/export, federation** | ⏳ Pending — not yet split into a PR |

**What works on `main` today:** store, recall, forget, four scopes, SQLite-indexed BM25
search, 3-factor recall scoring, CLI inspection, MCP tools, retention/cleanup, all Phase 2.5
hardening, auto-injection into provider config files, LLM wiki compaction, cross-references,
`cao memory lint` detectors, the daily audit log, and the memory Web UI.

**In progress:** Phase 4 U1 wiki self-healing adds `cao memory heal`, which consumes the
Phase 3 lint findings and applies a fix per issue type (dry-run by default, `--apply` to
mutate, full audit trail). It is up for review on `feat/wiki-self-healing` (PR #306).
I will update the doc in a separate commit. The doc is a bit messy at the moment and needs a formal lint pass.
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
run_lint(scope="project") returns orphans across all project containers but LintIssue carries no scope_id, so _heal_orphan rebuilt the path in the current container — a key collision could delete a live memory. Guard now skips delete when a SQLite row or index entry exists for this (scope, scope_id). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
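The guard described in this commit message can be sketched as a pre-delete check. The helper name `safe_to_delete_orphan()`, the `memories` table, and its column names are assumptions made for the example; the real schema and index format are not shown in this PR page.

```python
import sqlite3
from typing import Optional, Set


def safe_to_delete_orphan(
    db: sqlite3.Connection,
    index_keys: Set[str],
    key: str,
    scope: str,
    scope_id: Optional[str],
) -> bool:
    """Return True only if neither the DB nor the index knows this key here."""
    # Hypothetical schema: one row per memory, keyed by (key, scope, scope_id).
    row = db.execute(
        "SELECT 1 FROM memories WHERE key = ? AND scope = ? AND scope_id = ?",
        (key, scope, scope_id or ""),
    ).fetchone()
    return row is None and key not in index_keys
```

Checking the current container's own row and index entry before deleting means a same-named key reported from another project container is treated as a live memory and left alone.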
Remove the maintainer Delivery Status table; fix stale claims (recall search_mode/sort_by + 3-factor scoring, per-scope injection caps, project identity precedence chain, plugin config-file injection); document lint, compact, and heal CLI commands. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Summary
Phase 3 shipped wiki lint detectors (`cao memory lint`) that report drift but offer no path from a finding to a fix. This adds the other half: `services/wiki_healer.py` + a `cao memory heal` CLI that consumes the exact `LintIssue` list `run_lint()` produces and applies a fix per issue type.

Dry-run by default, explicit `--apply` to mutate, one awaited audit event per mutation. Backend only: no web UI, no new detectors. Closes #297.

Fixes per issue type
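| `issue_type` | Fix | Gate | Audit event |
|---|---|---|---|
| `orphan_page` | Delete wiki file + `index.md` line + SQLite row | `--apply` | `orphan_pruned` |
| `contradiction` | Keep the newer `updated_at` article, forget the loser | `--apply` | `contradiction_resolved` |
| `stale_claim` | Strip the paragraph containing the stale identifier | `--apply` | `stale_claim_pruned` |
| `poison_frequency` | Zero `access_count` | `--apply --aggressive` (dual gate) | `poison_access_zeroed` |
| `graph_density` | Flag only, no mutation | | |

Design invariants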
- `apply=False` mutates nothing; poison needs `--apply --aggressive`.
- SQL-authoritative: re-query by `(key, scope, scope_id)` and trust the DB, never the `LintIssue` payload.
- `.heal.lock` (`flock`, `LOCK_NB`), separate from the index lock.
- `MAX_HEAL_ACTIONS=200` + per-type caps; truncation reported, never silent.
- `stale_claim` stashes the pre-strip paragraph (size-capped) in the audit record.
- On an `updated_at` tie, keep the lexicographically-smaller key (reproducible, never order-dependent).

CLI
Testing
- `test/services/test_wiki_healer.py`: 27 tests (dry-run read-only, each fix, dual gate, caps/truncation, SQL-authoritative, lock conflict, deterministic tiebreak, unparseable-skip).
- `mypy`/`black`/`isort` clean on touched files.
- The 2 failures in the full suite (`test_bm25_performance_within_budget`, `test_real_kiro_initialization_and_idle`) are pre-existing flakes unrelated to this change.

Acceptance (#297)
- `heal()` covers all five issue types; the two flag-only/gated types documented.
- No mutation without `--apply`.
- Concurrent heal runs cannot corrupt `index.md` or the DB (`.heal.lock`).

Out of scope (separate issues)
Daily heal cron / `flow_service` wiring; memory import/export (Phase 4 U2); cross-project federation (Phase 4 U3); any web UI surface.

🤖 Generated with Claude Code