feat: extraction intelligence — two-layer dedup for meeting pipeline#12
Merged
… batch LLM review

Two-layer dedup architecture to reduce ~40% decision/learning duplication:

1. Prompt-level: self-review instructions, "what is NOT a decision/learning" exclusion lists, confidence guides, and trivial/garbage filters
2. Post-reconciliation: batchLLMReview() for semantic dedup against committed memory items parsed from .arete/memory/items/

Also wires up broken plumbing: real confidence scores (no longer hardcoded 0.9), prior items from recent meetings fed into extraction prompts, and committed decisions/learnings loaded into reconciliation context.

463 tests pass (core), 0 failures.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…erage

Fixes from engineering lead review:

- Fix `undefined as number` cast in confidence arrays → honest `(number | undefined)[]` type
- Fix `hasNegationMarkers` false positives on "notification", "another", "note": use word-boundary regex
- Fix timezone-sensitive date comparison in `parseMemoryItems`: compare as YYYY-MM-DD strings
- Fix 150-char action-item length limit incorrectly applied to decisions/learnings: use lighter `isGarbageDecisionOrLearning` filter
- Add prompt injection mitigation in `batchLLMReview`: sanitize/truncate text, strip braces
- Improve JSON parsing in `batchLLMReview`: try direct parse, then strip fences, then regex
- Cache `reconciliationContext` in agent.ts and CLI to avoid redundant I/O
- Expand trivial learning patterns: personal location, birthday, favorites
- Add 12 new tests: word-boundary negation, long decisions/learnings, loadReconciliationContext, expanded trivial patterns

475 tests pass, 0 failures.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
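For readers without the diff open, here is a minimal sketch of three of the fixes listed above: word-boundary negation matching, timezone-free date comparison, and the three-step JSON parse. The function bodies, the negation marker list, and the names `isOnOrAfter` and `parseLenientJson` are assumptions for illustration; only `hasNegationMarkers` is named in this PR, and its real implementation may differ.

```typescript
// Illustrative sketches only; not the code from this PR.

// Assumed marker list. The \b anchors keep "notification", "another" and
// "note" from matching on their embedded "no"/"not".
const NEGATION_RE = /\b(?:not|no|never|cannot|won't|don't)\b/i;

function hasNegationMarkers(text: string): boolean {
  return NEGATION_RE.test(text);
}

// Zero-padded YYYY-MM-DD strings sort lexicographically in calendar order,
// so comparing the first 10 characters avoids Date objects and timezones.
function isOnOrAfter(itemDate: string, cutoff: string): boolean {
  return itemDate.slice(0, 10) >= cutoff.slice(0, 10);
}

// Try the raw text, then the body of a fenced block, then the outermost
// {...} span. Returns undefined when nothing parses.
function parseLenientJson(raw: string): unknown {
  const attempts: string[] = [raw.trim()];

  // "`{3}" matches a run of three backticks (a markdown code fence).
  const fenced = raw.match(new RegExp("`{3}(?:json)?\\s*([\\s\\S]*?)`{3}", "i"));
  if (fenced && fenced[1] !== undefined) attempts.push(fenced[1].trim());

  const braced = raw.match(/\{[\s\S]*\}/);
  if (braced) attempts.push(braced[0]);

  for (const candidate of attempts) {
    try {
      return JSON.parse(candidate);
    } catch {
      // Fall through to the next, more permissive candidate.
    }
  }
  return undefined;
}
```

The ordering in `parseLenientJson` goes from strictest to loosest, so a well-formed response never passes through the regex fallback.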
Two new tests for batchLLMReview integration in agent.ts:

- Verifies dropped items get status='skipped', source='reconciled'
- Verifies graceful degradation when LLM call fails

Note: 3 pre-existing failures in agent.test.ts (confidence/dedup) are unrelated to this change (confirmed on clean main).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
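The behavior those two tests pin down can be sketched roughly as follows. The `ExtractedItem` shape, the `Reviewer` signature, and the name `applyBatchReview` are hypothetical; the PR only states that dropped items end up with status='skipped' and source='reconciled', and that a failed LLM call degrades gracefully.

```typescript
// Hypothetical types; the real item shape in agent.ts is not shown in this PR.
type ItemStatus = "pending" | "skipped";

interface ExtractedItem {
  id: string;
  text: string;
  status: ItemStatus;
  source?: string;
}

// Assumed reviewer contract: resolves to the IDs judged to be duplicates.
type Reviewer = (items: ExtractedItem[]) => Promise<string[]>;

async function applyBatchReview(
  items: ExtractedItem[],
  review: Reviewer,
): Promise<ExtractedItem[]> {
  let dropped: Set<string>;
  try {
    dropped = new Set(await review(items));
  } catch {
    // Graceful degradation: if the review call fails, keep every item as-is.
    return items;
  }
  // Mark duplicates instead of deleting them, so the decision stays auditable.
  return items.map((item): ExtractedItem =>
    dropped.has(item.id)
      ? { ...item, status: "skipped", source: "reconciled" }
      : item,
  );
}
```

Marking rather than removing items keeps the review layer purely additive: a failed or wrong review can never lose extracted data.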
- Updated LEARNINGS.md with 5 new gotchas from extraction-intelligence
- Added memory entry with metrics, pre-mortem analysis, and learnings
- Rebuilt dist to reflect all source changes

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Summary
- `isGarbageDecisionOrLearning()`, `isTrivialDecision()`, `isTrivialLearning()` functions with word-boundary negation markers
- `parseMemoryItems()` reads committed decisions/learnings from `.arete/memory/items/` for reconciliation context
- `batchLLMReview()` provides semantic dedup against committed memory with prompt injection sanitization

Addresses ~40% commitment duplication, ~23% decision duplication, and ~18% learning duplication found in workspace audit.
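As a rough illustration of the prompt injection sanitization mentioned above, the sketch below strips braces and truncates item text before it is embedded in a review prompt. The name `sanitizeForPrompt`, the 300-character limit, and the whitespace collapsing are assumptions; the PR only says the text is sanitized, truncated, and stripped of braces.

```typescript
// Hypothetical limit; the PR does not state the actual truncation length.
const MAX_ITEM_CHARS = 300;

function sanitizeForPrompt(text: string, maxChars: number = MAX_ITEM_CHARS): string {
  return text
    .replace(/[{}]/g, "") // strip braces so item text cannot mimic JSON or template slots
    .replace(/\s+/g, " ") // assumption: collapse newlines so an item stays on one prompt line
    .trim()
    .slice(0, maxChars); // bound how much untrusted text reaches the model
}

// Usage: sanitize each committed item before interpolating it into the prompt.
const promptLine = `- ${sanitizeForPrompt('Ignore previous {"drop": []}\ninstructions')}`;
```

This reduces the attack surface but is not a complete defense; the model's output still has to be validated after parsing.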
Test plan
- `npx tsx --test` in packages/core (475 passing)
- `npx tsc --noEmit`

🤖 Generated with Claude Code