PRD: Automate the KB structural audit-and-fix loop
Draft generated by the to-prd skill from the 2026-05-11 KB 30-day audit session.
Claim Ledger
Every load-bearing claim in this PRD must trace to a canon source — a real file path, a real script output, a real KB note — and must remain backtestable after implementation lands. Per KB/80_GOVERNANCE/CLAIM_BACKTEST_RULES.md (companion artifact to this PRD).
| Claim | Primary source | Backtest method |
| --- | --- | --- |
| kb_backtest.py exists at KB/80_GOVERNANCE/kb_backtest.py and emits ERROR: lines with the format ERROR: <relpath>: <message> | KB/80_GOVERNANCE/kb_backtest.py (Reporter.print, line ~85) | python KB/80_GOVERNANCE/kb_backtest.py --recent 30 --include-drafts --no-rest and inspect stdout format |
| Backtest categories cited in this PRD correspond to real error messages the script emits | kb_backtest.py:check_source_note, check_wikilinks, check_routing | grep kb_backtest.py for reporter.error( literals; each fix-category in the PRD must match one |
| 03_TECHNICAL_CORE/scripts/test_output_provenance.py runs a fresh pipeline into a temp directory and performs three classes of check (source-pattern grep, manifest field provenance, cross-fixture leak) | test_output_provenance.py docstring lines 1-39 | read the file's docstring directly |
| AI_USAGE_POLICY.md forbids autonomous promotion to status: verified | KB/80_GOVERNANCE/AI_USAGE_POLICY.md lines 44, 51, 68-69 | grep the file for "verified" |
| The 2026-05-11 audit found 60 ERROR findings across 93 targets, all closed | KB/log.md 2026-05-11 entry; original baseline output saved during the audit session | re-run kb_backtest.py --recent 30 --include-drafts --no-rest and confirm 0 findings on the post-fix vault |
| Five maps exist with the names cited (ARCO_home, NCOR_map, positioning_map, semantic-infrastructure_context-and-plan, open_questions_map) | KB/90_MAPS/ directory listing; kb_backtest.py:MAP_TARGETS constant | ls KB/90_MAPS/; grep kb_backtest.py for MAP_TARGETS |
| KB/log.md entries use the format ## [YYYY-MM-DD] operation \| description | KB/CLAUDE.md "Log:" line; KB/log.md existing entries | read either file |
If any row in this ledger fails its backtest method, the PRD must not be published, or the failing claim must be revised and re-verified.
Ledger verified 2026-05-11 against canon. Every row's backtest method was executed during PRD authoring; one initially-hallucinated row ("kb_backtest.py uses fixture-style assertions") was caught and removed in revision. See KB/log.md 2026-05-11 entry.
Problem Statement
The ARCO Obsidian KB lives at KB/ and is governed by KB/CLAUDE.md, KB/80_GOVERNANCE/VAULT_SPEC.md, and the KB/80_GOVERNANCE/kb_backtest.py structural checker. Today, when a periodic audit (e.g. --recent 30 --include-drafts) is run, the script emits a flat list of ERROR: lines — frequently dozens at a time. Each error must then be read, classified by the human, and hand-fixed across source notes, person stubs, gap notes, map files, NOTE_INDEX.md, and open_questions_map.md.
The recent 30-day audit (2026-05-11) produced 60 ERROR findings across 93 governed targets in a single kb_backtest.py --recent 30 --include-drafts --no-rest run, and required a substantial hand-fix pass: changing #tier2 → #tier3 on four compiled notes, replacing two [[extension_protocol]] wikilinks with backtick repo-path references to docs/agent/extension_protocol.md, adding source-note entries to four map files (ARCO_home, NCOR_map, positioning_map, semantic-infrastructure_context-and-plan), creating three missing person stubs (Hendler, Lassila, Henninger) plus one gap stub (gap_drift-handling), and appending six entries to NOTE_INDEX.md. Almost every category was deterministic, and almost none of the fixes required a semantic judgement from the human. The problem is that a 60-error pass of this kind still needs a human in the loop for every fix.
Solution
A kb-audit-fix skill (and underlying library) that:
- Runs kb_backtest.py to produce findings.
- Classifies each finding into a fix-category.
- Auto-applies the deterministic fix-categories without human input.
- Emits a structured human queue for the ambiguous categories (semantic content like Relevance-section prose, new person-stub fields).
- Re-runs kb_backtest.py to verify convergence.
- Composes and appends a ## [YYYY-MM-DD] review | block to KB/log.md matching the existing operation-log conventions.
- Exits nonzero if the human queue is non-empty, so the loop can gate CI or a periodic agent run.
The intended end state: the human reviews the human queue (typically 3-10 items per pass) and authors the semantic content for those items only. Everything structurally deterministic — tier tags, NOTE_INDEX, map routing entries, repo-path wikilink replacements, open_questions_map routing — is automated.
Backtestability acceptance criterion. Every claim the fixer writes into the vault — auto-routed map bullets, NOTE_INDEX one-liners, log entries — must trace to a real datapoint in the source note it describes. Auto-routed bullets are tagged with <!-- auto-routed YYYY-MM-DD; review prose --> so a follow-up backtest pass can grep for them and confirm the bullet text matches the source note's Summary first paragraph. This pattern dogfoods KB/80_GOVERNANCE/CLAIM_BACKTEST_RULES.md: the fixer is itself an artifact whose outputs must pass a claim ledger.
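A follow-up pass could locate the tagged bullets with a small regex scan. The sketch below is illustrative only: the function name find_auto_routed and the sample map layout are assumptions, and only the marker format itself comes from this PRD. Comparing the bullet text against the source note's Summary is left out because the map bullet format is not specified here.

```python
import re
from datetime import date

# Marker format taken from this PRD: <!-- auto-routed YYYY-MM-DD; review prose -->
MARKER_RE = re.compile(
    r"<!--\s*auto-routed\s+(\d{4})-(\d{2})-(\d{2});\s*review prose\s*-->"
)


def find_auto_routed(map_text: str) -> list[tuple[int, date, str]]:
    """Return (line_number, routed_on, bullet_without_marker) for each tagged line."""
    hits = []
    for lineno, line in enumerate(map_text.splitlines(), start=1):
        m = MARKER_RE.search(line)
        if m is None:
            continue
        routed_on = date(int(m.group(1)), int(m.group(2)), int(m.group(3)))
        bullet = MARKER_RE.sub("", line).strip()
        hits.append((lineno, routed_on, bullet))
    return hits


# HYPOTHETICAL map excerpt; the real bullet layout lives in KB/90_MAPS/.
sample = (
    "## Source notes\n"
    "- [[some-note]]: hand-written bullet\n"
    "- [[other-note]]: summary text <!-- auto-routed 2026-05-11; review prose -->\n"
)
print(find_auto_routed(sample))
```

Returning the bullet text with the marker stripped keeps the later comparison step a plain string check.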
User Stories
- As the KB maintainer, I want to run a single command and have all deterministic structural defects in recently added notes fixed automatically, so that I only spend time on the semantically ambiguous fixes.
- As the KB maintainer, I want a structured human queue of ambiguous fixes (with the suggested shape pre-filled), so that I can resolve them quickly without re-reading the backtest output.
- As the KB maintainer, I want the audit-fix pass to be idempotent, so that re-running it on a clean vault is a no-op.
- As the KB maintainer, I want the audit-fix pass to leave a log.md entry matching existing conventions, so that the operation history is preserved and grep-parseable.
- As the KB maintainer, I want the pass to exit nonzero when the human queue is non-empty, so that I can gate CI or a periodic agent run on convergence.
- As the KB maintainer, I want notes with status: compiled carrying #tier2 to be auto-demoted to #tier3, so that the trust-model invariant ("promotion to verified is always a manual human step") is mechanically enforced.
- As the KB maintainer, I want invalid source_kind values (e.g. marketing, primary) flagged as a human-queue item rather than auto-rewritten, so that the semantic mismatch is brought to my attention instead of silently coerced.
- As the KB maintainer, I want missing-but-required frontmatter fields like people: [] to be auto-added with empty default values, so that schema-compliance is restored without losing the audit signal.
- As the KB maintainer, I want broken wikilinks that point to repo files (e.g. [[extension_protocol]] for docs/agent/extension_protocol.md) to be auto-rewritten as backtick repo-path references, so that the KB respects the "no KB/ paths in repo_refs" rule and the inverse.
- As the KB maintainer, I want broken wikilinks that are filename mismatches (e.g. [[source-note_steal-this-deck-kgc-2026-recap]] when the file is steal-this-deck-kgc-2026-recap.md) to be auto-resolved against the vault stem index, so that obvious typos are fixed without my involvement.
- As the KB maintainer, I want broken wikilinks that look like person stubs ([[person_<name>]] where no stub exists) to be queued for human creation with a person-stub skeleton pre-filled, so that I can fill in org/role/relationship in one minute per person.
- As the KB maintainer, I want broken wikilinks that look like template placeholders (e.g. [[person_name]] in profile-dossier-template.md) detected and either cleared or skipped via a per-file allowlist, so that templates do not generate false-positive errors.
- As the KB maintainer, I want notes missing required sections (## Relevance to ARCO, ## Relevance to NCOR / Sales, ## Open Questions) to be queued for human prose, with a section-header stub auto-inserted at the correct position, so that I can write only the prose and not worry about placement.
- As the KB maintainer, I want notes missing from NOTE_INDEX.md to be auto-appended with a one-line description derived from the note's title and Summary section, so that index coverage stays complete.
- As the KB maintainer, I want notes that declare a supports: [[map]] edge but are not back-linked from that map to be auto-added to the map's source-note section, with the bullet text composed from the note's Summary first paragraph, so that map routing reciprocates the support edge without prose drift.
- As the KB maintainer, I want notes that are tagged positioning, ncor, or semantic-infrastructure but missing from the corresponding map to be auto-routed identically to the supports-edge case, so that tag-based routing is enforced symmetrically.
- As the KB maintainer, I want notes with non-empty ## Open Questions sections that are missing from open_questions_map.md to be auto-appended into a "Recent Sources (pending triage)" section, so that they appear in the central question surface immediately while leaving thematic placement for a follow-up human pass.
- As the KB maintainer, I want the auto-fixer to write a single transactional edit per file, so that re-running mid-failure does not produce partial state.
- As the KB maintainer, I want each fix-category to be independently disablable via a flag, so that I can run a "tier-tags only" pass when I am confident about the rest of the vault.
- As the KB maintainer, I want the human queue rendered both as a structured JSON file and as a human-readable markdown table, so that I can either eyeball it or feed it to a follow-up agent.
- As the KB maintainer, I want a --dry-run mode that lists the edits a real run would make without writing them, so that I can preview a large pass before committing to it.
- As the KB maintainer, I want the post-fix backtest re-run to be diffed against the pre-fix backtest, so that I can see exactly which findings the pass closed and which it could not.
- As the KB maintainer, I want the log.md entry to enumerate the categories, counts, and files-edited, so that the operation history is auditable and matches the format of existing manual review entries.
- As the KB maintainer, I want a fix-category for "missing tier tag entirely" (note has no tier2/tier3 tag at all), so that schema completeness is enforced beyond just "wrong tier".
- As the KB maintainer, I want the human queue items to carry a stable identifier per finding, so that a subsequent agent can mark items resolved without reprocessing the full backtest.
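The transactional-edit and --dry-run stories above could share one write path. What follows is a minimal sketch under stated assumptions, not the planned implementation: apply_file_edit is a hypothetical helper, and the temp-file-then-os.replace approach is one common way to avoid partial writes (the swap is atomic on POSIX when both paths are on the same filesystem).

```python
import os
import tempfile
from pathlib import Path


def apply_file_edit(path: Path, new_text: str, dry_run: bool = False) -> bool:
    """Replace the whole file in one step; return True if the content would change."""
    old_text = path.read_text(encoding="utf-8")
    if old_text == new_text:
        return False  # idempotent: nothing to do on an already-fixed file
    if dry_run:
        return True  # report the pending edit without touching the file
    # Write to a temp file in the same directory, then swap it into place.
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            tmp.write(new_text)
        os.replace(tmp_name, path)
    except BaseException:
        # Clean up the temp file so a failed run leaves no debris behind.
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise
    return True


with tempfile.TemporaryDirectory() as vault:
    note = Path(vault) / "note.md"
    note.write_text("Body #tier2\n", encoding="utf-8")
    print(apply_file_edit(note, "Body #tier3\n", dry_run=True))  # previews only
    print(apply_file_edit(note, "Body #tier3\n"))  # writes the file
    print(apply_file_edit(note, "Body #tier3\n"))  # second run is a no-op
```

Because all edits for a file are folded into one new_text before the call, a failure mid-run leaves each file either fully old or fully new.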
Implementation Decisions
Modules
Six deep modules, each independently testable.
- Finding classifier. Input: raw stdout from kb_backtest.py. Output: typed Finding records grouped by fix-category. Fix-categories: wrong-tier-tag, missing-tier-tag, invalid-source-kind, missing-frontmatter-field, broken-wikilink-template-placeholder, broken-wikilink-repo-path, broken-wikilink-filename-mismatch, broken-wikilink-needs-stub, broken-wikilink-other, missing-required-section, missing-note-index-entry, missing-map-routing-supports, missing-map-routing-tag, missing-open-questions-routing. Classifier is a pure function over the backtest output text plus the vault stem index: no I/O during classification.
- Auto-fixer registry. A registry mapping fix-category → fixer. Each fixer takes Finding[] of its category plus vault root and returns a list of file-edit operations (path + before/after string, or "queue this for human" record). Deterministic fixers in this PRD: wrong-tier-tag (compiled → tier3), missing-frontmatter-field (add empty default), broken-wikilink-repo-path (wikilink → backtick repo-path), broken-wikilink-filename-mismatch (resolve via stem index), missing-note-index-entry (append with derived one-liner), missing-map-routing-supports (append generated bullet to map), missing-map-routing-tag (same), missing-open-questions-routing (append to a pending-triage section). Human-queue fixers in this PRD: invalid-source-kind, broken-wikilink-needs-stub, broken-wikilink-template-placeholder (with allowlist override), missing-required-section. Each fixer is independently disablable via flag.
- Map-routing entry generator. Input: source-note frontmatter dict + map type (one of ARCO_home, NCOR_map, positioning_map, semantic-infrastructure_context-and-plan, open_questions_map). Output: one-line markdown bullet in the format the target map already uses. Pure function. The bullet text is composed from the note's Summary first paragraph (deterministic), with a marker like <!-- auto-routed YYYY-MM-DD; review prose --> so a human review pass can find them.
- Human queue emitter. Input: ambiguous Finding[] records. Output: two artifacts: (a) KB/40_REVIEWS/auto-fix-queue_YYYY-MM-DD.json and (b) a markdown rendering of the same in KB/40_REVIEWS/auto-fix-queue_YYYY-MM-DD.md. Each queue item carries a stable id derived from (finding-category, file-path, finding-message-hash), the suggested fix-shape, and a placeholder block the human can fill in. Items are mutually independent so a follow-up agent can resolve them in any order.
- Log entry composer. Input: pre-fix Finding[], post-fix Finding[], edit operations applied, human queue size. Output: a markdown block in the format ## [YYYY-MM-DD] review | KB audit-fix pass (N→M findings) matching the existing KB/log.md conventions. Pure function.
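- Orchestrator. Wires the modules in the order listed above: finding classifier (1) → auto-fixer registry (2), which calls the map-routing entry generator (3) internally → human queue emitter (4) → re-run backtest → log entry composer (5) → append to log. Exits 0 if post-fix findings count is 0 AND human queue is empty; exits 1 if human queue is non-empty; exits 2 on infrastructure failure.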
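To make the registry shape concrete, here is a minimal sketch of one deterministic fixer (wrong-tier-tag) registered by category. It simplifies the contract described above: the fixer takes a path and the note text rather than Finding[] plus vault root, so that it stays a pure function. The names EditOp, fix_wrong_tier_tag, and FIXERS are hypothetical, and the sketch assumes the status appears as a literal status: compiled line and the tag as a literal #tier2.

```python
import re
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class EditOp:
    """One whole-file edit: the path plus before/after content."""
    path: str
    before: str
    after: str


# ASSUMPTION: status is written as a "status: compiled" line, tier as "#tier2".
STATUS_COMPILED_RE = re.compile(r"^status:\s*compiled\s*$", re.MULTILINE)
TIER2_RE = re.compile(r"#tier2\b")


def fix_wrong_tier_tag(path: str, note_text: str) -> list[EditOp]:
    """Demote #tier2 to #tier3, but only on notes that declare status: compiled."""
    if STATUS_COMPILED_RE.search(note_text) is None:
        return []
    after = TIER2_RE.sub("#tier3", note_text)
    return [EditOp(path, note_text, after)] if after != note_text else []


# Registry keyed by fix-category; disabling a category means dropping its key.
FIXERS: dict[str, Callable[[str, str], list[EditOp]]] = {
    "wrong-tier-tag": fix_wrong_tier_tag,
}

note = "---\nstatus: compiled\n---\nBody #tier2\n"
for op in FIXERS["wrong-tier-tag"]("path/to/note.md", note):
    print(op.path, repr(op.after))
```

The fixer only ever rewrites #tier2 to #tier3 and never the reverse, which keeps it in line with the no-promotion rule.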
Architectural decisions
- Reuse, don't reimplement, the backtest. The classifier consumes kb_backtest.py's stdout as a stable contract. Backtest itself is not modified; if its output format changes, the classifier's parser updates in one place.
- Idempotence by construction. A second run on a clean vault produces zero findings → zero edits → no log entry. Achieve this by short-circuiting on the backtest result.
- No silent semantic rewrites. invalid-source-kind is queued for the human, not auto-corrected, because the right value depends on what the source actually is.
- Per-file transactional edits. When multiple findings target the same file, all edits for that file are computed against the pre-fix file content and written in a single Edit-style operation, so a mid-run failure leaves clean state.
- Stable finding ids. (category, file-path, sha1(message)) so a follow-up agent or human can mark items resolved against the queue without re-running backtest.
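The stdout contract and the stable-id rule can be illustrated together. The sketch below parses the ERROR: <relpath>: <message> line format cited in the Claim Ledger and derives an id from (category, file-path, sha1(message)). It is a sketch under assumptions: ParsedError and parse_errors are hypothetical names, the "|" separator in the id is an arbitrary choice, the sample message text is invented, and the regex assumes a relpath never contains ": ".

```python
import hashlib
import re
from dataclasses import dataclass

# Line format from the Claim Ledger: ERROR: <relpath>: <message>
# ASSUMPTION: <relpath> itself never contains the sequence ": ".
ERROR_RE = re.compile(r"^ERROR: (?P<path>.+?): (?P<message>.+)$")


@dataclass(frozen=True)
class ParsedError:
    path: str
    message: str

    def stable_id(self, category: str) -> str:
        """Join category, path, and sha1(message); the separator is an assumption."""
        digest = hashlib.sha1(self.message.encode("utf-8")).hexdigest()
        return f"{category}|{self.path}|{digest}"


def parse_errors(stdout: str) -> list[ParsedError]:
    """Keep only ERROR lines; any other stdout line is ignored."""
    parsed = []
    for line in stdout.splitlines():
        m = ERROR_RE.match(line.strip())
        if m is not None:
            parsed.append(ParsedError(m.group("path"), m.group("message")))
    return parsed


# HYPOTHETICAL output: the real message literals live in kb_backtest.py.
sample_stdout = (
    "ERROR: path/to/note.md: hypothetical message: detail\n"
    "some non-error line\n"
)
for err in parse_errors(sample_stdout):
    print(err.stable_id("wrong-tier-tag"))
```

Hashing only the message keeps the id stable across runs as long as the script emits the same text for the same defect, which is why the parser is the single place to update if the output format changes.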
Schema additions
- KB/40_REVIEWS/auto-fix-queue_YYYY-MM-DD.json: list of queue items, each with {id, category, path, finding, suggested_fix_shape, status: "pending" | "resolved"}.
- Marker comment in auto-routed map bullets: <!-- auto-routed YYYY-MM-DD; review prose -->. Greppable for the human-review-prose follow-up pass.
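As an illustration of the queue file, the sketch below types one item and serializes the list. The field names come from the schema above; treating every field as a string, the helper names render_queue and mark_resolved, and all sample values are assumptions.

```python
import json
from typing import Literal, TypedDict


class QueueItem(TypedDict):
    id: str
    category: str
    path: str
    finding: str
    suggested_fix_shape: str
    status: Literal["pending", "resolved"]


def render_queue(items: list[QueueItem]) -> str:
    """Serialize deterministically so identical queues produce identical files."""
    return json.dumps(items, indent=2, sort_keys=True, ensure_ascii=False) + "\n"


def mark_resolved(items: list[QueueItem], item_id: str) -> list[QueueItem]:
    """Return a copy of the queue with the matching item flipped to resolved."""
    return [
        {**item, "status": "resolved"} if item["id"] == item_id else item
        for item in items
    ]


# HYPOTHETICAL item; values are placeholders, not real audit output.
queue: list[QueueItem] = [
    {
        "id": "invalid-source-kind|path/to/note.md|placeholder-hash",
        "category": "invalid-source-kind",
        "path": "path/to/note.md",
        "finding": "hypothetical finding text",
        "suggested_fix_shape": "source_kind: <value chosen by the human>",
        "status": "pending",
    }
]
print(render_queue(mark_resolved(queue, queue[0]["id"])))
```

Resolving by id alone is what lets a follow-up agent update the queue without re-running the backtest.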
Interaction with existing discipline
- Respects AI_USAGE_POLICY.md: never sets status: verified or #tier2. Only demotes erroneously elevated tier tags.
- Respects VAULT_SPEC.md post-write checklist for new entries (NOTE_INDEX, map routing, person-stub discipline).
- Respects KB/CLAUDE.md "after every ingest/lint/query/review/restructure" log discipline.
- Respects the "Wikilinks in frontmatter arrays are graph edges" rule (no unquoted bare strings; preserves "[[...]]" form).
Testing Decisions
A good test for this module set exercises the observable behavior of each module: feed a fixture in, assert the artifact out. No assertions on internal call order, no mocks of the file system at the unit level (use real tmpdir fixtures), no patching of the backtest script (use captured stdout strings as fixture data). Tests should keep passing under a future refactor that changes the internal call graph but preserves the external contract.
Modules under test (user-confirmed):
- Finding classifier. Given captured kb_backtest.py stdout strings as fixtures (one fixture per fix-category, plus one mixed fixture covering all categories), assert the returned Finding[] has the right categories and counts. Prior art: there is no existing fixture-style test for kb_backtest.py itself, so this PRD introduces the pattern. The closest existing fixture-driven test surface in the repo is 03_TECHNICAL_CORE/scripts/test_output_provenance.py, which runs pipeline output against committed fixtures (see "Auto-fixer registry" below).
- Auto-fixer registry, per-category. For each deterministic fixer, a fixture pair (pre-state KB tmpdir + expected post-state KB tmpdir). Run the fixer, diff actual vs expected. Each fix-category is its own test case. Prior art: 03_TECHNICAL_CORE/scripts/test_output_provenance.py. Per its docstring, that test runs a fresh pipeline into a temp directory and performs three classes of check: source-pattern grep for forbidden Python patterns, manifest field provenance (graph_backed values must match a re-run of the declared SPARQL source query), and cross-fixture leak detection. The pattern this PRD borrows: temp-directory test execution, committed fixture inputs, declarative manifest of what counts as correct, and explicit fail-by-design when the contract is not yet met.
- Map-routing entry generator. Given a frontmatter dict + map type, assert the bullet string matches an expected snapshot. Pure function, so these are the cheapest tests in the bundle. Use parameterized cases covering each of the five map types and at least three frontmatter shapes (rich supports, only tags, both).
- Log entry composer. Given before/after Finding[] counts + edit ops, assert the produced markdown block matches the existing KB/log.md format conventions (header line shape, category enumeration, files-edited summary). Snapshot-style test.
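A snapshot-style test for the log entry composer might look like the sketch below. It covers only the header line, whose shape is given in this PRD; compose_header is a hypothetical name, and the body of the entry (category enumeration, files-edited summary) is omitted because its exact layout is defined by existing KB/log.md entries rather than here. The 60→0 figures reuse the 2026-05-11 audit numbers from the Claim Ledger.

```python
from datetime import date


def compose_header(day: date, before: int, after: int) -> str:
    """Build the header line in the ## [YYYY-MM-DD] review | ... shape."""
    return (
        f"## [{day.isoformat()}] review | "
        f"KB audit-fix pass ({before}→{after} findings)"
    )


def test_header_snapshot() -> None:
    # The snapshot is a literal string, so any format drift fails loudly.
    expected = "## [2026-05-11] review | KB audit-fix pass (60→0 findings)"
    assert compose_header(date(2026, 5, 11), 60, 0) == expected


test_header_snapshot()
print(compose_header(date(2026, 5, 11), 60, 0))
```

Because the composer is a pure function of its inputs, the test needs no tmpdir or fixture files.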
Modules NOT under test in this PRD (user-confirmed):
- Human queue emitter — out of scope for first pass; behavior is "write JSON + markdown" and reasonable to validate by inspection during initial iterations.
- Orchestrator — integration-level; can be exercised via a single end-to-end test on a small KB tmpdir fixture later, but not in scope for module-level testing.
Out of Scope
- Modifying kb_backtest.py itself. The audit-fix loop consumes its output. Backtest remains the source of truth for what counts as a finding.
- Semantic content authoring. The fixer does not write ## Relevance to ARCO prose, does not draft Argument Deposits rows, does not invent person-stub relationship fields. Those land in the human queue.
- Promotion to verified. Tier elevation (#tier3 → #tier2, status: compiled → status: verified) remains a manual human step per AI_USAGE_POLICY.md. The fixer only ever demotes wrongly-elevated tier tags.
- REST visibility checks. The fixer does not depend on or invoke the Obsidian Local REST API. REST checks remain a separate concern exercised by kb_backtest.py --require-rest.
- Repository-side files outside KB/. The fixer reads but never writes docs/agent/*, 03_TECHNICAL_CORE/*, or any non-KB area of the repo.
- 00_INBOX_RAW/. Raw inbox notes are intentionally ungoverned. The fixer skips them entirely.
- Multi-vault support. Single ARCO KB only. The vault root is KB/ relative to the repo root, period.
- GUI / Obsidian plugin. CLI-only. Obsidian sees the resulting file changes on its next refresh; no integration with Obsidian's command palette.
- LLM-authored fixes for the human queue. A follow-up PRD can specify an agent that drains the human queue by drafting prose for human review. Not in scope here.
- Backporting fixes to git history. The fixer operates on the current working tree only. Historical drift in past commits is not addressed.