
feat(memory): wiki self-healing — cao memory heal (Phase 4 U1)#306

Open
fanhongy wants to merge 6 commits into main from feat/wiki-self-healing


@fanhongy

Contributor

Summary

Phase 3 shipped wiki lint detectors (cao memory lint) that report drift but offer no path from a finding to a fix. This adds the other half: services/wiki_healer.py + a cao memory heal CLI that consumes the exact LintIssue list run_lint() produces and applies a fix per issue type.

Dry-run by default, explicit --apply to mutate, one awaited audit event per mutation. Backend only — no web UI, no new detectors. Closes #297.

Fixes per issue type

| issue_type | Fix | Gate | Audit event |
|---|---|---|---|
| orphan_page | delete wiki file + index.md line + SQLite row | --apply | orphan_pruned |
| contradiction | keep the article with the newer updated_at, forget the loser | --apply | contradiction_resolved |
| stale_claim | strip the paragraph naming the stale path/symbol (atomic rewrite); stash the pre-strip paragraph in the audit record | --apply | stale_claim_pruned |
| poison_frequency | zero access_count | --apply --aggressive (dual gate) | poison_access_zeroed |
| graph_density | flag-only, never mutates | n/a | (none) |
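A minimal sketch of how the Gate column above could be evaluated for a single issue. `AUDIT_EVENT`, `plan_status`, and the returned status strings are hypothetical illustrations, not the names used in wiki_healer.py.

```python
from typing import Optional

# Hypothetical mirror of the fix table: issue_type -> audit event on apply.
# None marks a flag-only type that never mutates.
AUDIT_EVENT: dict[str, Optional[str]] = {
    "orphan_page": "orphan_pruned",
    "contradiction": "contradiction_resolved",
    "stale_claim": "stale_claim_pruned",
    "poison_frequency": "poison_access_zeroed",
    "graph_density": None,
}


def plan_status(issue_type: str, apply: bool, aggressive: bool) -> str:
    """Return what a heal run would do with one issue under the given flags."""
    if AUDIT_EVENT[issue_type] is None:
        return "flagged"  # graph_density: report only, never mutate
    if not apply:
        return "planned"  # dry-run default: listed in the plan, nothing changes
    if issue_type == "poison_frequency" and not aggressive:
        return "gated"  # dual gate: --apply alone is not enough
    return "apply"
```

For example, `plan_status("poison_frequency", apply=True, aggressive=False)` yields `"gated"`, which is the behavior the dual gate is meant to guarantee.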

Design invariants

  • Dry-run default: apply=False mutates nothing; poison_frequency needs --apply --aggressive.
  • SQL row authoritative: contradiction/poison fixes re-read the DB row by (key, scope, scope_id) and trust the DB, never the LintIssue payload.
  • Atomic per-issue-type batch: each group's DB writes run in one transaction; a partial failure rolls that group back while the other groups proceed.
  • Concurrency guard: a dedicated .heal.lock (flock, LOCK_NB), separate from the index lock.
  • Bounded blast radius: run-level MAX_HEAL_ACTIONS=200 plus per-type caps; truncation is reported, never silent.
  • Recovery field: stale_claim stashes the pre-strip paragraph (size-capped) in the audit record.
  • Deterministic contradiction tiebreak: on a same-second updated_at tie, keep the lexicographically smaller key (reproducible, never order-dependent).
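The contradiction tiebreak can be sketched as a pure function over the two rows re-read from SQL. This assumes `updated_at` is compared at one-second granularity (hence the microsecond truncation); `pick_winner` and the `(key, updated_at)` tuple shape are hypothetical, not the PR's code.

```python
from datetime import datetime

Row = tuple[str, datetime]  # (key, updated_at) re-read from the SQL row


def pick_winner(a: Row, b: Row) -> tuple[str, str]:
    """Return (keep_key, forget_key) for two contradicting articles."""
    # Assumption: timestamps are compared at one-second granularity.
    ta = a[1].replace(microsecond=0)
    tb = b[1].replace(microsecond=0)
    if ta != tb:
        keep, forget = (a, b) if ta > tb else (b, a)
    else:
        # Same-second tie: the lexicographically smaller key wins, so the
        # result does not depend on which article lint listed first.
        keep, forget = (a, b) if a[0] < b[0] else (b, a)
    return keep[0], forget[0]
```

Because the tie branch compares keys rather than argument positions, swapping `a` and `b` never changes the outcome.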

CLI

cao memory heal --scope <s> [--apply] [--issue-type <t>] [--aggressive] [--format table|json]

Testing

  • test/services/test_wiki_healer.py — 27 tests (dry-run read-only, each fix, dual gate, caps/truncation, SQL-authoritative, lock conflict, deterministic tiebreak, unparseable-skip).
  • CLI + audit-whitelist tests added.
  • Built via a design → implement → 3-lens adversarial review → fix workflow, then run through a pre-PR gate seeded with the bug-classes Copilot caught on prior memory PRs (timestamp-serialization, same-second-tie, plan/apply drift — all fixed).
  • Targeted suites green; mypy/black/isort clean on touched files. The 2 failures in the full suite (test_bm25_performance_within_budget, test_real_kiro_initialization_and_idle) are pre-existing flakes unrelated to this change.

Acceptance (#297)

  • heal() covers all five issue types; the flag-only (graph_density) and dual-gated (poison_frequency) types are documented.
  • Default invocation is a dry-run plan; no mutation without --apply.
  • Every applied fix emits an audit event.
  • Run-level + per-type caps enforced; exceeding them truncates with a clear report.
  • Concurrent heal + store does not corrupt index.md or the DB (.heal.lock).
  • 0 regressions in existing lint / memory tests.
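The `.heal.lock` guard can be sketched with the standard-library `fcntl.flock` using `LOCK_EX | LOCK_NB`, so a second heal fails fast instead of waiting behind the first. This is POSIX-only, and `heal_lock` / `HealLockError` are hypothetical names, not the PR's code.

```python
import contextlib
import fcntl
import os
from pathlib import Path


class HealLockError(RuntimeError):
    """Raised when another heal run already holds .heal.lock (hypothetical)."""


@contextlib.contextmanager
def heal_lock(wiki_dir: Path):
    """Hold an exclusive, non-blocking flock on <wiki_dir>/.heal.lock."""
    lock_path = wiki_dir / ".heal.lock"
    fd = os.open(lock_path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        try:
            fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError as e:
            raise HealLockError(f"heal already running: {lock_path}") from e
        yield lock_path
    finally:
        # Closing the descriptor also releases the flock.
        os.close(fd)
```

Using a separate lock file (rather than the index lock) keeps a long heal from blocking ordinary `store` calls that only need the index lock.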

Out of scope (separate issues)

Daily heal cron / flow_service wiring; memory import/export (Phase 4 U2); cross-project federation (Phase 4 U3); any web UI surface.

🤖 Generated with Claude Code

Turn lint findings into fixes: orphan/contradiction/stale_claim repaired
under --apply, poison dual-gated, graph_density flag-only. Dry-run default,
audit trail, .heal.lock, bounded caps. Closes #297.

Copilot AI left a comment

Contributor


Pull request overview

Adds a “self-healing” backend workflow for the memory wiki by introducing a wiki_healer service and a new cao memory heal CLI command that turns wiki_lint.run_lint() findings into planned/applied remediation actions (dry-run by default), with audit logging for each mutation.

Changes:

  • Introduces services/wiki_healer.py with dry-run planning, --apply gating, per-type/run-level caps, and a .heal.lock concurrency guard.
  • Adds cao memory heal CLI command plus CLI tests, and extends the audit-log sync whitelist for the new heal events.
  • Adds a dedicated unit test suite for healer behavior across issue types (including gating, caps, and lock contention) and updates memory docs.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 6 comments.

| File | Description |
|---|---|
| src/cli_agent_orchestrator/services/wiki_healer.py | Core implementation of wiki “heal” planning/apply logic, per-issue-type fixers, caps, and locking. |
| src/cli_agent_orchestrator/cli/commands/memory.py | Adds cao memory heal command that runs lint then invokes healer with formatting options. |
| src/cli_agent_orchestrator/services/audit_log.py | Adds new heal-related event types to the SYNC audit whitelist. |
| test/services/test_wiki_healer.py | New end-to-end/unit tests for healer behavior, including dry-run invariants, fixes, caps, and lock conflict. |
| test/cli/commands/test_memory.py | CLI-level tests for heal flag plumbing, filtering, output formats, and lock conflict surfacing. |
| test/services/test_audit_log.py | Updates whitelist expectation to include heal events. |
| docs/memory.md | Documents delivery status and references the new healing functionality. |


Comment on lines +365 to +371
```python
@click.option(
    "--scope",
    type=click.Choice([s.value for s in MemoryScope], case_sensitive=False),
    default="project",
    show_default=True,
    help="Scope to heal.",
)
```
Comment on lines +711 to +714
```python
if cap is not None and (
    sum(1 for a in actions if _action_src_type(a.issue_type) == t) >= cap
):
    truncated_by_type[t] = truncated_by_type.get(t, 0) + 1
```
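The cap check quoted above counts every prior action of the same type; the follow-up commit excludes skipped no-ops from that budget. A minimal sketch of that rule, combining the run-level `MAX_HEAL_ACTIONS=200` from the PR description with a per-type cap. `Action`, `admit`, and the status strings are hypothetical, and real per-type cap values are not stated in the PR.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional

MAX_HEAL_ACTIONS = 200  # run-level cap quoted in the PR description


@dataclass
class Action:
    issue_type: str
    status: str  # e.g. "applied", "planned", "skipped"


def admit(
    actions: list[Action],
    issue_type: str,
    per_type_cap: Optional[int],
    truncated: Counter,
) -> bool:
    """Return True if one more action of issue_type fits both caps.

    Skipped no-ops do not consume budget. A rejected action is counted in
    ``truncated`` so the report can say how many issues were left unhandled.
    """
    spent = [a for a in actions if a.status != "skipped"]
    n_type = sum(1 for a in spent if a.issue_type == issue_type)
    over_run = len(spent) >= MAX_HEAL_ACTIONS
    over_type = per_type_cap is not None and n_type >= per_type_cap
    if over_run or over_type:
        truncated[issue_type] += 1
        return False
    return True
```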
Comment on lines +788 to +792
```python
action = await healer(svc, issue, scope, scope_id, db)
actions.append(action)
n_this_type += 1
actions_applied += 1
db.commit()
```
Comment on lines +788 to +796
```python
    action = await healer(svc, issue, scope, scope_id, db)
    actions.append(action)
    n_this_type += 1
    actions_applied += 1
    db.commit()
except Exception as e:
    # Roll the whole group back; surface as errored actions so the
    # report is truthful (no silent partial-commit).
    logger.warning("heal batch group rolled back type=%s: %s", t, type(e).__name__)
```
Comment on lines +301 to +306
```python
return HealAction(
    "orphan_pruned",
    key,
    status="applied",
    description="file already absent; index/metadata cleaned",
)
```
Comment thread docs/memory.md Outdated
Comment on lines +27 to +29
**In progress:** Phase 4 U1 wiki self-healing adds `cao memory heal`, which consumes the
Phase 3 lint findings and applies a fix per issue type (dry-run by default, `--apply` to
mutate, full audit trail). It lives on `feat/wiki-self-healing` and is not yet PR'd.
@haofeif added the enhancement (New feature or request) label on Jun 16, 2026
…ing, scope choice

Buffer per-action audit payloads and emit only after the group commit
succeeds, so a rolled-back heal never records a false mutation (notably
poison_frequency). Exclude skipped no-ops from cap budget in both apply
and dry-run. Restrict `heal --scope` to global/project. Doc + wording fixes.
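The buffering fix described in this commit can be sketched with the standard-library `sqlite3` module standing in for the PR's `db` session: every fix in a group writes through one connection, audit payloads are held back, and they are emitted only if `commit()` succeeds. The real healer is async and uses its own session and audit API; `apply_group`, the fix callables, and the `(event, detail)` payload shape are hypothetical.

```python
import sqlite3
from typing import Callable

AuditPayload = tuple[str, str]  # (event, detail); hypothetical shape
Fix = Callable[[sqlite3.Connection], AuditPayload]


def apply_group(
    conn: sqlite3.Connection,
    fixes: list[Fix],
    emit: Callable[[str, str], None],
) -> bool:
    """Apply one issue-type group in a single transaction.

    Audit payloads are buffered and emitted only after commit succeeds, so a
    rolled-back group never logs a mutation that did not happen.
    """
    buffered: list[AuditPayload] = []
    try:
        for fix in fixes:
            buffered.append(fix(conn))  # mutates via conn, no per-fix commit
        conn.commit()
    except Exception:
        conn.rollback()  # undo every write in this group
        return False
    for event, detail in buffered:
        emit(event, detail)
    return True
```

Emitting after the commit trades a small window (commit succeeded, audit write failed) for never recording a mutation that was rolled back, which is the failure mode the commit message calls out for poison_frequency.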

Copilot AI left a comment

Contributor


Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 6 comments.

Comment on lines +271 to +279
```python
async def _heal_orphan(
    svc: MemoryService, issue: LintIssue, scope: str, scope_id: Optional[str], db: Any
) -> HealAction:
    """Delete wiki file + index line + SQLite row for one orphan_page issue.

    The SQLite row delete runs on the SHARED ``db`` session (no per-issue
    commit); the batch transaction is committed once by the caller.
    """
    try:
```

Contributor Author


This is a good call. I will add the verification before deleting the memory wiki file.

Comment on lines +514 to +535
```python
if pre_strip is None:
    # No paragraph matched — leave content unchanged, audit + skipped. This
    # records a read-only outcome (nothing mutated), so it is safe to emit
    # regardless of the batch commit; buffer it for uniformity.
    return HealAction(
        "stale_claim_pruned",
        key,
        status="skipped",
        description=f"stale id {stale_id} not found in article",
        pre_strip_paragraph=None,
        audit=(
            "stale_claim_pruned",
            f"no paragraph found for {stale_id} in {key}",
            {
                "key": key,
                "scope": scope,
                "scope_id": scope_id or "",
                "stale_identifier": stale_id,
                "pre_strip_paragraph": "",
            },
        ),
    )
```
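The stale_claim fix (strip the paragraph naming the stale identifier, rewrite atomically, stash the removed text) can be sketched with the standard library alone. This assumes paragraphs are separated by blank lines and that only the first matching paragraph is removed; `strip_stale_paragraph` and `MAX_STASH_CHARS` are hypothetical, and the PR's actual stash size cap is not stated.

```python
import contextlib
import os
import tempfile
from pathlib import Path
from typing import Optional

MAX_STASH_CHARS = 2000  # hypothetical size cap for the audit recovery field


def strip_stale_paragraph(path: Path, stale_id: str) -> Optional[str]:
    """Drop the first paragraph that mentions stale_id and rewrite atomically.

    Returns the removed paragraph (truncated for the audit record), or None
    when nothing matched, in which case the file is not touched.
    """
    paragraphs = path.read_text(encoding="utf-8").split("\n\n")
    idx = next((i for i, p in enumerate(paragraphs) if stale_id in p), None)
    if idx is None:
        return None
    pre_strip = paragraphs.pop(idx)
    # Write to a temp file in the same directory, then os.replace() it over
    # the original so readers never observe a half-written article.
    fd, tmp = tempfile.mkstemp(dir=path.parent, prefix=".heal-", suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            f.write("\n\n".join(paragraphs))
        os.chmod(tmp, path.stat().st_mode & 0o777)  # keep original permissions
        os.replace(tmp, path)
    except BaseException:
        with contextlib.suppress(FileNotFoundError):
            os.unlink(tmp)
        raise
    return pre_strip[:MAX_STASH_CHARS]
```

The `None` return corresponds to the skipped branch quoted above: nothing is mutated, so the audit record carries an empty pre-strip paragraph.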
Comment on lines +313 to +322
```python
audit=(
    "orphan_pruned",
    f"deleted orphan wiki file: {key}",
    {
        "key": key,
        "scope": scope,
        "scope_id": scope_id or "",
        "file_path": str(wiki_path),
    },
),
```
Comment thread test/services/test_wiki_healer.py
Comment thread docs/memory.md Outdated
Comment on lines +19 to +29
| — | **Phase 4 U1 — wiki self-healing** (`cao memory heal`): turn lint findings into fixes, dry-run by default | 🟡 In progress — branch `feat/wiki-self-healing` |
| — | **Phase 4 — import/export, federation** | ⏳ Pending — not yet split into a PR |

**What works on `main` today:** store, recall, forget, four scopes, SQLite-indexed BM25
search, 3-factor recall scoring, CLI inspection, MCP tools, retention/cleanup, all Phase 2.5
hardening, auto-injection into provider config files, LLM wiki compaction, cross-references,
`cao memory lint` detectors, the daily audit log, and the memory Web UI.

**In progress:** Phase 4 U1 wiki self-healing adds `cao memory heal`, which consumes the
Phase 3 lint findings and applies a fix per issue type (dry-run by default, `--apply` to
mutate, full audit trail). It is up for review on `feat/wiki-self-healing` (PR #306).

@fanhongy, Jun 18, 2026

Contributor Author


I will use a separate commit to update the doc. The doc is a bit messy at the moment and needs a formal lint pass.

Comment thread src/cli_agent_orchestrator/services/wiki_healer.py
fanhongy and others added 4 commits June 18, 2026 14:27
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
run_lint(scope="project") returns orphans across all project containers
but LintIssue carries no scope_id, so _heal_orphan rebuilt the path in the
current container — a key collision could delete a live memory. Guard now
skips delete when a SQLite row or index entry exists for this (scope,
scope_id).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
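The orphan guard from the commit above (skip the delete when a SQLite row or index entry for the key exists in the current `(scope, scope_id)`) can be sketched as a predicate. It assumes a hypothetical `memory(key, scope, scope_id)` table and that the container's index.md mentions live keys verbatim; `safe_to_delete_orphan` is not the PR's function name.

```python
import sqlite3
from typing import Optional


def safe_to_delete_orphan(
    conn: sqlite3.Connection,
    index_text: str,
    key: str,
    scope: str,
    scope_id: Optional[str],
) -> bool:
    """Return True only when nothing live refers to key in this container.

    The index check is a plain substring test and therefore conservative:
    a false match can only cause a skipped delete, never a wrong one.
    """
    row = conn.execute(
        "SELECT 1 FROM memory WHERE key = ? AND scope = ? AND scope_id = ?",
        (key, scope, scope_id or ""),
    ).fetchone()
    if row is not None:
        return False  # live row here: the lint finding came from another container
    return key not in index_text
```

Erring toward "skip" matches the commit's intent: a key collision across project containers should leave the live memory alone rather than delete it.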
Remove the maintainer Delivery Status table; fix stale claims (recall
search_mode/sort_by + 3-factor scoring, per-scope injection caps, project
identity precedence chain, plugin config-file injection); document lint,
compact, and heal CLI commands.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Labels

enhancement New feature or request


Development

Successfully merging this pull request may close these issues.

[Enhancement] Wiki self-healing: turn lint findings into fixes

3 participants