Automate the KB structural audit-and-fix loop #43

# PRD: Automate the KB structural audit-and-fix loop

*Draft generated by the `to-prd` skill from the 2026-05-11 KB 30-day audit session.*

## Claim Ledger

Every load-bearing claim in this PRD must trace to a canon source (a real file path, a real script output, or a real KB note) and must remain backtestable after implementation lands, per `KB/80_GOVERNANCE/CLAIM_BACKTEST_RULES.md` (the companion artifact to this PRD).

| Claim | Primary source | Backtest method |
|---|---|---|
| `kb_backtest.py` exists at `KB/80_GOVERNANCE/kb_backtest.py` and emits `ERROR:` lines with the format `ERROR: <relpath>: <message>` | `KB/80_GOVERNANCE/kb_backtest.py` (Reporter.print, line ~85) | `python KB/80_GOVERNANCE/kb_backtest.py --recent 30 --include-drafts --no-rest` and inspect stdout format |
| Backtest categories cited in this PRD correspond to real error messages the script emits | `kb_backtest.py:check_source_note`, `check_wikilinks`, `check_routing` | grep `kb_backtest.py` for `reporter.error(` literals; each fix-category in the PRD must match one |
| `03_TECHNICAL_CORE/scripts/test_output_provenance.py` runs a fresh pipeline into a temp directory and performs three classes of check (source-pattern grep, manifest field provenance, cross-fixture leak) | `test_output_provenance.py` docstring lines 1-39 | read the file's docstring directly |
| `AI_USAGE_POLICY.md` forbids autonomous promotion to `status: verified` | `KB/80_GOVERNANCE/AI_USAGE_POLICY.md` lines 44, 51, 68-69 | grep the file for "verified" |
| The 2026-05-11 audit found 60 ERROR findings across 93 targets, all closed | `KB/log.md` 2026-05-11 entry; original baseline output saved during the audit session | re-run `kb_backtest.py --recent 30 --include-drafts --no-rest` and confirm 0 findings on the post-fix vault |
| Five maps exist with the names cited (`ARCO_home`, `NCOR_map`, `positioning_map`, `semantic-infrastructure_context-and-plan`, `open_questions_map`) | `KB/90_MAPS/` directory listing; `kb_backtest.py:MAP_TARGETS` constant | `ls KB/90_MAPS/`; grep `kb_backtest.py` for `MAP_TARGETS` |
| `KB/log.md` entries use the format `## [YYYY-MM-DD] operation \| description` | `KB/CLAUDE.md` "Log:" line; `KB/log.md` existing entries | read either file |

If any row in this ledger fails its backtest method, the PRD must not be published, or the failing claim must be revised and re-verified.

*Ledger verified 2026-05-11 against canon. Every row's backtest method was executed during PRD authoring; one initially-hallucinated row ("kb_backtest.py uses fixture-style assertions") was caught and removed in revision. See `KB/log.md` 2026-05-11 entry.*

## Problem Statement

The ARCO Obsidian KB lives at `KB/` and is governed by `KB/CLAUDE.md`, `KB/80_GOVERNANCE/VAULT_SPEC.md`, and the `KB/80_GOVERNANCE/kb_backtest.py` structural checker. Today, when a periodic audit (e.g. `--recent 30 --include-drafts`) is run, the script emits a flat list of `ERROR:` lines — frequently dozens at a time. Each error must then be read, classified by the human, and hand-fixed across source notes, person stubs, gap notes, map files, `NOTE_INDEX.md`, and `open_questions_map.md`.

The recent 30-day audit (2026-05-11) produced 60 ERROR findings across 93 governed targets in a single `kb_backtest.py --recent 30 --include-drafts --no-rest` run, and required a substantial hand-fix pass: changing `#tier2` → `#tier3` on four compiled notes, replacing two `[[extension_protocol]]` wikilinks with backtick repo-path references to `docs/agent/extension_protocol.md`, adding source-note entries to four map files (`ARCO_home`, `NCOR_map`, `positioning_map`, `semantic-infrastructure_context-and-plan`), creating three missing person stubs (Hendler, Lassila, Henninger) plus one gap stub (`gap_drift-handling`), and appending six entries to `NOTE_INDEX.md`. Almost every fix category was deterministic, and almost none required the human to make a *semantic* judgement. That a 60-error pass still needs a human in the loop is the problem.

## Solution

A `kb-audit-fix` skill (and underlying library) that:

1. Runs `kb_backtest.py` to produce findings.
2. Classifies each finding into a fix-category.
3. Auto-applies the deterministic fix-categories without human input.
4. Emits a structured human queue for the ambiguous categories (semantic content like Relevance-section prose, new person-stub fields).
5. Re-runs `kb_backtest.py` to verify convergence.
6. Composes and appends a `## [YYYY-MM-DD] review |` block to `KB/log.md` matching the existing operation-log conventions.
7. Exits nonzero if the human queue is non-empty, so the loop can gate CI or a periodic agent run.
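The exit behavior in step 7 can be sketched as a small pure function. This is a minimal sketch, not the skill's actual implementation: the name `exit_code` and its parameters are hypothetical, the 0/1/2 values follow the Orchestrator description in this PRD, and the handling of "findings remain but the queue is empty" is an assumption, since the PRD does not define that case.

```python
def exit_code(post_fix_findings: int, queue_size: int, infra_failed: bool = False) -> int:
    """Map the outcome of one audit-fix pass to a process exit code."""
    if infra_failed:
        # Infrastructure failure (for example, the backtest could not run).
        return 2
    if queue_size > 0:
        # Items are waiting on a human, so the loop has not converged.
        return 1
    # ASSUMPTION: findings that survive the pass with an empty queue are
    # also treated as non-convergence. The PRD leaves this case open.
    return 0 if post_fix_findings == 0 else 1
```

A CI job or periodic agent run could gate on this value directly, treating any nonzero result as "do not proceed".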

The intended end state: the human reviews the human queue (typically 3-10 items per pass) and authors the semantic content for those items only. Everything structurally deterministic — tier tags, NOTE_INDEX, map routing entries, repo-path wikilink replacements, `open_questions_map` routing — is automated.

**Backtestability acceptance criterion.** Every claim the fixer writes into the vault (auto-routed map bullets, NOTE_INDEX one-liners, log entries) must trace to a real datapoint in the source note it describes. Auto-routed bullets are tagged with a greppable marker comment so a follow-up backtest pass can find them and confirm the bullet text matches the source note's Summary first paragraph. This pattern dogfoods `KB/80_GOVERNANCE/CLAIM_BACKTEST_RULES.md`: the fixer is itself an artifact whose outputs must pass a claim ledger.
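The follow-up check described above (bullet text must match the source note's Summary first paragraph) could look roughly like the sketch below. It assumes the Summary lives under a `## Summary` heading and that paragraphs are separated by blank lines. Both function names are hypothetical and are not part of `kb_backtest.py`.

```python
import re

# ASSUMPTION: the Summary section starts at a line reading "## Summary"
# and runs until the next level-2 heading or the end of the note.
_SUMMARY_RE = re.compile(
    r"^## Summary[ \t]*\n(.*?)(?=^## |\Z)",
    flags=re.MULTILINE | re.DOTALL,
)


def summary_first_paragraph(note_text: str) -> str:
    """Return the first Summary paragraph with whitespace collapsed, or ''."""
    match = _SUMMARY_RE.search(note_text)
    if not match:
        return ""
    body = match.group(1).strip()
    first = body.split("\n\n", 1)[0]
    return " ".join(first.split())


def bullet_matches_summary(bullet: str, note_text: str) -> bool:
    """True if the bullet contains the note's first Summary paragraph verbatim."""
    summary = summary_first_paragraph(note_text)
    return bool(summary) and summary in " ".join(bullet.split())
```

Because both functions are pure, the same extraction can be shared by the bullet generator and the backtest that later verifies its output.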

## User Stories

1. As the KB maintainer, I want to run a single command and have all deterministic structural defects in recently added notes fixed automatically, so that I only spend time on the semantically ambiguous fixes.
2. As the KB maintainer, I want a structured human queue of ambiguous fixes (with the suggested shape pre-filled), so that I can resolve them quickly without re-reading the backtest output.
3. As the KB maintainer, I want the audit-fix pass to be idempotent, so that re-running it on a clean vault is a no-op.
4. As the KB maintainer, I want the audit-fix pass to leave a `log.md` entry matching existing conventions, so that the operation history is preserved and grep-parseable.
5. As the KB maintainer, I want the pass to exit nonzero when the human queue is non-empty, so that I can gate CI or a periodic agent run on convergence.
6. As the KB maintainer, I want notes with `status: compiled` carrying `#tier2` to be auto-demoted to `#tier3`, so that the trust-model invariant ("promotion to verified is always a manual human step") is mechanically enforced.
7. As the KB maintainer, I want invalid `source_kind` values (e.g. `marketing`, `primary`) flagged as a human-queue item rather than auto-rewritten, so that the semantic mismatch is brought to my attention instead of silently coerced.
8. As the KB maintainer, I want missing-but-required frontmatter fields like `people: []` to be auto-added with empty default values, so that schema-compliance is restored without losing the audit signal.
9. As the KB maintainer, I want broken wikilinks that point to repo files (e.g. `[[extension_protocol]]` for `docs/agent/extension_protocol.md`) to be auto-rewritten as backtick repo-path references, so that the KB respects the "no `KB/` paths in `repo_refs`" rule and the inverse.
10. As the KB maintainer, I want broken wikilinks that are filename mismatches (e.g. `[[source-note_steal-this-deck-kgc-2026-recap]]` when the file is `steal-this-deck-kgc-2026-recap.md`) to be auto-resolved against the vault stem index, so that obvious typos are fixed without my involvement.
11. As the KB maintainer, I want broken wikilinks that look like person stubs (`[[person_<name>]]` where no stub exists) to be queued for human creation with a person-stub skeleton pre-filled, so that I can fill in org/role/relationship in one minute per person.
12. As the KB maintainer, I want broken wikilinks that look like template placeholders (e.g. `[[person_name]]` in `profile-dossier-template.md`) detected and either cleared or skipped via a per-file allowlist, so that templates do not generate false-positive errors.
13. As the KB maintainer, I want notes missing required sections (`## Relevance to ARCO`, `## Relevance to NCOR / Sales`, `## Open Questions`) to be queued for human prose, with a section-header stub auto-inserted at the correct position, so that I can write only the prose and not worry about placement.
14. As the KB maintainer, I want notes missing from `NOTE_INDEX.md` to be auto-appended with a one-line description derived from the note's title and Summary section, so that index coverage stays complete.
15. As the KB maintainer, I want notes that declare a `supports: [[map]]` edge but are not back-linked from that map to be auto-added to the map's source-note section, with the bullet text composed from the note's Summary first paragraph, so that map routing reciprocates the support edge without prose drift.
16. As the KB maintainer, I want notes that are tagged `positioning`, `ncor`, or `semantic-infrastructure` but missing from the corresponding map to be auto-routed identically to the supports-edge case, so that tag-based routing is enforced symmetrically.
17. As the KB maintainer, I want notes with non-empty `## Open Questions` sections that are missing from `open_questions_map.md` to be auto-appended into a "Recent Sources (pending triage)" section, so that they appear in the central question surface immediately while leaving thematic placement for a follow-up human pass.
18. As the KB maintainer, I want the auto-fixer to write a single transactional edit per file, so that re-running mid-failure does not produce partial state.
19. As the KB maintainer, I want each fix-category to be independently disablable via a flag, so that I can run a "tier-tags only" pass when I am confident about the rest of the vault.
20. As the KB maintainer, I want the human queue rendered both as a structured JSON file and as a human-readable markdown table, so that I can either eyeball it or feed it to a follow-up agent.
21. As the KB maintainer, I want a `--dry-run` mode that lists the edits a real run would make without writing them, so that I can preview a large pass before committing to it.
22. As the KB maintainer, I want the post-fix backtest re-run to be diffed against the pre-fix backtest, so that I can see exactly which findings the pass closed and which it could not.
23. As the KB maintainer, I want the `log.md` entry to enumerate the categories, counts, and files-edited, so that the operation history is auditable and matches the format of existing manual review entries.
24. As the KB maintainer, I want a fix-category for "missing tier tag entirely" (note has no `tier2`/`tier3` tag at all), so that schema completeness is enforced beyond just "wrong tier".
25. As the KB maintainer, I want the human queue items to carry a stable identifier per finding, so that a subsequent agent can mark items resolved without reprocessing the full backtest.
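As an illustration of how small a deterministic fixer can be, story 6 (demote `#tier2` to `#tier3` on `status: compiled` notes) might be sketched as follows. This is a sketch under stated assumptions: frontmatter is delimited by `---` lines, `status` appears as a plain `status: compiled` line, and the tier tag appears literally as `#tier2` in the note text. The name `demote_tier` is hypothetical.

```python
import re

# ASSUMPTION: YAML frontmatter sits at the very top between two '---' lines.
_FRONTMATTER_RE = re.compile(r"\A---\n(.*?)\n---\n", re.DOTALL)
_STATUS_COMPILED_RE = re.compile(r"^status:[ \t]*compiled[ \t]*$", re.MULTILINE)
# Match '#tier2' as a whole tag, not as part of '#tier20' or '##tier2'.
_TIER2_RE = re.compile(r"(?<![\w#])#tier2\b")


def demote_tier(note_text: str) -> str:
    """Return note_text with #tier2 demoted to #tier3 if the note is compiled.

    Notes without frontmatter, or with any other status, are returned unchanged,
    so the function never promotes anything and is safe to re-run.
    """
    frontmatter = _FRONTMATTER_RE.match(note_text)
    if not frontmatter:
        return note_text
    if not _STATUS_COMPILED_RE.search(frontmatter.group(1)):
        return note_text
    return _TIER2_RE.sub("#tier3", note_text)
```

Running the function twice gives the same result as running it once, which is the per-fixer form of the idempotence asked for in story 3.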

## Implementation Decisions

### Modules

Six deep modules, each independently testable.

1. **Finding classifier.** Input: raw stdout from `kb_backtest.py`. Output: typed `Finding` records grouped by fix-category. Fix-categories: `wrong-tier-tag`, `missing-tier-tag`, `invalid-source-kind`, `missing-frontmatter-field`, `broken-wikilink-template-placeholder`, `broken-wikilink-repo-path`, `broken-wikilink-filename-mismatch`, `broken-wikilink-needs-stub`, `broken-wikilink-other`, `missing-required-section`, `missing-note-index-entry`, `missing-map-routing-supports`, `missing-map-routing-tag`, `missing-open-questions-routing`. Classifier is a pure function over the backtest output text plus the vault stem index — no I/O during classification.

2. **Auto-fixer registry.** A registry mapping fix-category → fixer. Each fixer takes `Finding[]` of its category plus vault root and returns a list of file-edit operations (path + before/after string, or "queue this for human" record). Deterministic fixers in this PRD: `wrong-tier-tag` (compiled → tier3), `missing-frontmatter-field` (add empty default), `broken-wikilink-repo-path` (wikilink → backtick repo-path), `broken-wikilink-filename-mismatch` (resolve via stem index), `missing-note-index-entry` (append with derived one-liner), `missing-map-routing-supports` (append generated bullet to map), `missing-map-routing-tag` (same), `missing-open-questions-routing` (append to a pending-triage section). Human-queue fixers in this PRD: `invalid-source-kind`, `broken-wikilink-needs-stub`, `broken-wikilink-template-placeholder` (with allowlist override), `missing-required-section`. Each fixer is independently disablable via flag.

3. **Map-routing entry generator.** Input: source-note frontmatter dict + map type (one of `ARCO_home`, `NCOR_map`, `positioning_map`, `semantic-infrastructure_context-and-plan`, `open_questions_map`). Output: one-line markdown bullet in the format the target map already uses. Pure function. The bullet text is composed from the note's Summary first paragraph (deterministic) and carries a marker comment so a human review pass can find auto-routed bullets.

4. **Human queue emitter.** Input: ambiguous `Finding[]` records. Output: two artifacts — (a) `KB/40_REVIEWS/auto-fix-queue_YYYY-MM-DD.json` and (b) a markdown rendering of the same in `KB/40_REVIEWS/auto-fix-queue_YYYY-MM-DD.md`. Each queue item carries a stable id derived from `(finding-category, file-path, finding-message-hash)`, the suggested fix-shape, and a placeholder block the human can fill in. Items are mutually independent so a follow-up agent can resolve them in any order.

5. **Log entry composer.** Input: pre-fix `Finding[]`, post-fix `Finding[]`, edit operations applied, human queue size. Output: a markdown block in the format `## [YYYY-MM-DD] review | KB audit-fix pass (N→M findings)` matching the existing `KB/log.md` conventions. Pure function.

6. **Orchestrator.** Wires (1) → (2) → (3 used internally by 2) → (4) → re-run backtest → (5) → append-to-log. Exits 0 if post-fix findings count is 0 AND human queue is empty; exits 1 if human queue is non-empty; exits 2 on infrastructure failure.
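The parsing step at the front of module 1 could be sketched as below, using the `ERROR: <relpath>: <message>` line format recorded in the Claim Ledger. This is a minimal sketch: the `Finding` fields, the function names, and the substring-based rule table are assumptions, and no real `kb_backtest.py` message text is reproduced here.

```python
import re
from dataclasses import dataclass
from typing import List, Sequence, Tuple

# Format from the Claim Ledger: "ERROR: <relpath>: <message>".
# The path is matched lazily, so the split happens at the first ": ".
_ERROR_LINE = re.compile(r"^ERROR: (?P<path>.+?): (?P<message>.+)$")


@dataclass(frozen=True)
class Finding:
    path: str
    message: str


def parse_findings(stdout: str) -> List[Finding]:
    """Extract Finding records from captured backtest stdout (no I/O)."""
    findings = []
    for line in stdout.splitlines():
        match = _ERROR_LINE.match(line.strip())
        if match:
            findings.append(Finding(match.group("path"), match.group("message")))
    return findings


def classify(finding: Finding, rules: Sequence[Tuple[str, str]]) -> str:
    """Return the category of the first rule whose substring is in the message.

    ASSUMPTION: rules are (message_substring, fix_category) pairs supplied by
    the caller; the real mapping must be derived from the script's messages.
    """
    for substring, category in rules:
        if substring in finding.message:
            return category
    return "unclassified"
```

Keeping the rule table as data means a change in the backtest's wording is absorbed in one place, which matches the "reuse, don't reimplement" decision.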

### Architectural decisions

- **Reuse, don't reimplement, the backtest.** The classifier consumes `kb_backtest.py`'s stdout as a stable contract. Backtest itself is not modified; if its output format changes, the classifier's parser updates in one place.
- **Idempotence by construction.** A second run on a clean vault produces zero findings → zero edits → no log entry. This is achieved by short-circuiting when the initial backtest reports no findings.
- **No silent semantic rewrites.** `invalid-source-kind` is queued for the human, not auto-corrected, because the right value depends on what the source actually is.
- **Per-file transactional edits.** When multiple findings target the same file, all edits for that file are computed against the pre-fix file content and written in a single `Edit`-style operation, so a mid-run failure leaves clean state.
- **Stable finding ids.** `(category, file-path, sha1(message))` so a follow-up agent or human can mark items resolved against the queue without re-running backtest.
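The per-file transactional edit could be approximated as follows: compute the full post-fix text in memory, write it to a temporary file in the same directory, then swap it into place with `os.replace`. This is a sketch under assumptions: edits are `(before, after)` string pairs that do not overlap, and the helper names `apply_edits` and `write_atomically` are hypothetical.

```python
import os
import tempfile
from pathlib import Path
from typing import List, Tuple


def apply_edits(original: str, edits: List[Tuple[str, str]]) -> str:
    """Apply (before, after) replacements to the pre-fix text, in memory only.

    ASSUMPTION: edits do not overlap, so applying them one after another
    matches computing them all against the pre-fix content.
    """
    text = original
    for before, after in edits:
        if before not in text:
            # Fail before anything is written so the file stays untouched.
            raise ValueError(f"edit target not found: {before!r}")
        text = text.replace(before, after, 1)
    return text


def write_atomically(path: Path, new_text: str) -> None:
    """Write new_text to path via a temp file and a single rename."""
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8", newline="") as handle:
            handle.write(new_text)
        os.replace(tmp_name, path)  # replaces the target in one step
    except BaseException:
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise
```

If any edit cannot be applied, `apply_edits` raises before `write_atomically` is reached, so the file on disk is either fully pre-fix or fully post-fix.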

### Schema additions

- `KB/40_REVIEWS/auto-fix-queue_YYYY-MM-DD.json` — list of queue items, each with `{id, category, path, finding, suggested_fix_shape, status: "pending" | "resolved"}`.
- A marker comment in each auto-routed map bullet, greppable for the human-review-prose follow-up pass.
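A queue item with a stable id might be built as in the sketch below. The field names come from the schema above and the id ingredients from the "Stable finding ids" decision. The colon-joined string form of the id and the function names are assumptions.

```python
import hashlib
import json
from typing import Dict


def finding_id(category: str, path: str, message: str) -> str:
    """Derive a stable id from (category, file-path, sha1(message)).

    ASSUMPTION: the three parts are joined with ':'; the PRD fixes the
    ingredients of the id but not its string representation.
    """
    digest = hashlib.sha1(message.encode("utf-8")).hexdigest()
    return f"{category}:{path}:{digest}"


def queue_item(category: str, path: str, message: str, suggested_fix_shape: str) -> Dict[str, str]:
    """Build one queue record using the field names from the schema above."""
    return {
        "id": finding_id(category, path, message),
        "category": category,
        "path": path,
        "finding": message,
        "suggested_fix_shape": suggested_fix_shape,
        "status": "pending",
    }


def render_queue(items) -> str:
    """Serialize queue items as the JSON list the schema describes."""
    return json.dumps(list(items), indent=2, sort_keys=True)
```

Because the id depends only on the finding itself, the same finding yields the same id on every run, which is what lets a follow-up agent mark items resolved without re-running the backtest.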

### Interaction with existing discipline

- Respects `AI_USAGE_POLICY.md`: never sets `status: verified` or `#tier2`. Only demotes erroneously elevated tier tags.
- Respects `VAULT_SPEC.md` post-write checklist for new entries (NOTE_INDEX, map routing, person-stub discipline).
- Respects `KB/CLAUDE.md` "after every ingest/lint/query/review/restructure" log discipline.
- Respects the "Wikilinks in frontmatter arrays are graph edges" rule (no unquoted bare strings; preserves `"[[...]]"` form).

## Testing Decisions

A good test for this module set exercises the **observable behavior** of each module: feed a fixture in, assert the artifact out. No assertions on internal call order, no mocks of the file system at the unit level (use real tmpdir fixtures), no patching of the backtest script (use captured stdout strings as fixture data). Tests should keep passing under a future refactor that changes the internal call graph but preserves the external contract.
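One way to assert "fixture in, artifact out" against real temp directories is to snapshot both trees and compare them. The helpers below are a hedged sketch: `snapshot_tree` and `diff_trees` are hypothetical names, and limiting the snapshot to `*.md` files is an assumption about what a KB fixture contains.

```python
from pathlib import Path
from typing import Dict


def snapshot_tree(root: Path) -> Dict[str, str]:
    """Map each markdown file's path (relative to root) to its text."""
    return {
        path.relative_to(root).as_posix(): path.read_text(encoding="utf-8")
        for path in sorted(root.rglob("*.md"))
    }


def diff_trees(actual: Dict[str, str], expected: Dict[str, str]) -> Dict[str, str]:
    """Return {relative_path: problem} for every file that differs."""
    problems = {}
    for rel in sorted(set(actual) | set(expected)):
        if rel not in actual:
            problems[rel] = "missing"
        elif rel not in expected:
            problems[rel] = "unexpected"
        elif actual[rel] != expected[rel]:
            problems[rel] = "content differs"
    return problems
```

A fixer test would then copy the pre-state fixture into a temp directory, run the fixer on it, and assert that `diff_trees(snapshot_tree(tmp), snapshot_tree(expected_dir))` is empty. That assertion depends only on the files produced, not on how the fixer is structured internally.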

**Modules under test (user-confirmed):**

1. **Finding classifier.** Given captured `kb_backtest.py` stdout strings as fixtures (one fixture per fix-category, plus one mixed fixture covering all categories), assert the returned `Finding[]` has the right categories and counts. Prior art: there is no existing fixture-style test for `kb_backtest.py` itself — this PRD introduces the pattern. The closest existing fixture-driven test surface in the repo is `03_TECHNICAL_CORE/scripts/test_output_provenance.py`, which runs pipeline output against committed fixtures (see "Auto-fixer registry" below).

2. **Auto-fixer registry, per-category.** For each deterministic fixer, a fixture pair (pre-state KB tmpdir + expected post-state KB tmpdir). Run the fixer, diff actual vs expected. Each fix-category is its own test case. Prior art: `03_TECHNICAL_CORE/scripts/test_output_provenance.py` — per its docstring, that test runs a fresh pipeline into a temp directory and performs three classes of check: source-pattern grep for forbidden Python patterns, manifest field provenance (graph_backed values must match a re-run of the declared SPARQL source query), and cross-fixture leak detection. The pattern this PRD borrows: temp-directory test execution, committed fixture inputs, declarative manifest of what counts as correct, and explicit fail-by-design when the contract is not yet met.

3. **Map-routing entry generator.** Given a frontmatter dict + map type, assert the bullet string matches an expected snapshot. Pure function — cheapest tests in the bundle. Use parameterized cases covering each of the five map types and at least three frontmatter shapes (rich `supports`, only `tags`, both).

4. **Log entry composer.** Given before/after `Finding[]` counts + edit ops, assert the produced markdown block matches the existing `KB/log.md` format conventions (header line shape, category enumeration, files-edited summary). Snapshot-style test.
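For the log entry composer, a snapshot-style test needs a pure function to call. A minimal sketch is shown below. Only the header line shape (`## [YYYY-MM-DD] review | KB audit-fix pass (N→M findings)`) comes from this PRD. The body layout, the function name, and the parameters are assumptions and would need to be aligned with real `KB/log.md` entries.

```python
from datetime import date
from typing import Dict, List


def compose_log_entry(
    day: date,
    pre_count: int,
    post_count: int,
    closed_by_category: Dict[str, int],
    files_edited: List[str],
    queue_size: int,
) -> str:
    """Compose a markdown log block for one audit-fix pass (pure function)."""
    header = (
        f"## [{day.isoformat()}] review | "
        f"KB audit-fix pass ({pre_count}→{post_count} findings)"
    )
    lines = [header, ""]
    # ASSUMPTION: one bullet per category, sorted for a stable snapshot.
    for category in sorted(closed_by_category):
        lines.append(f"- {category}: {closed_by_category[category]}")
    lines.append(f"- files edited: {len(files_edited)}")
    lines.append(f"- human queue: {queue_size}")
    return "\n".join(lines) + "\n"
```

Passing the date in as an argument, rather than reading the clock inside the function, keeps the output reproducible so a snapshot test can compare it byte for byte.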

**Modules NOT under test in this PRD (user-confirmed):**

- Human queue emitter — out of scope for first pass; behavior is "write JSON + markdown" and reasonable to validate by inspection during initial iterations.
- Orchestrator — integration-level; can be exercised via a single end-to-end test on a small KB tmpdir fixture later, but not in scope for module-level testing.

## Out of Scope

- **Modifying `kb_backtest.py` itself.** The audit-fix loop consumes its output. Backtest remains the source of truth for what counts as a finding.
- **Semantic content authoring.** The fixer does not write `## Relevance to ARCO` prose, does not draft Argument Deposits rows, does not invent person-stub `relationship` fields. Those land in the human queue.
- **Promotion to verified.** Tier elevation (`#tier3` → `#tier2`, `status: compiled` → `status: verified`) remains a manual human step per `AI_USAGE_POLICY.md`. The fixer only ever demotes wrongly-elevated tier tags.
- **REST visibility checks.** The fixer does not depend on or invoke the Obsidian Local REST API. REST checks remain a separate concern exercised by `kb_backtest.py --require-rest`.
- **Repository-side files outside `KB/`.** The fixer reads but never writes `docs/agent/*`, `03_TECHNICAL_CORE/*`, or any non-KB area of the repo.
- **`00_INBOX_RAW/`.** Raw inbox notes are intentionally ungoverned. The fixer skips them entirely.
- **Multi-vault support.** Single ARCO KB only. The vault root is `KB/` relative to the repo root, period.
- **GUI / Obsidian plugin.** CLI-only. Obsidian sees the resulting file changes on its next refresh; no integration with Obsidian's command palette.
- **LLM-authored fixes for the human queue.** A follow-up PRD can specify an agent that drains the human queue by drafting prose for human review. Not in scope here.
- **Backporting fixes to git history.** The fixer operates on the current working tree only. Historical drift in past commits is not addressed.
