feat: extraction intelligence — two-layer dedup for meeting pipeline by johnkoht · Pull Request #12 · johnkoht/arete

johnkoht · 2026-04-09T19:07:06Z

Summary

Prompt hardening: Added "What is NOT a decision/learning" exclusion sections and self-review instructions to the extraction prompt, reducing low-signal items at source
Confidence parsing: Decisions and learnings now carry real confidence values from the LLM (was hardcoded 0.9), enabling threshold-based filtering
Trivial/garbage filters: New isGarbageDecisionOrLearning(), isTrivialDecision(), isTrivialLearning() functions with word-boundary negation markers
Memory loading: parseMemoryItems() reads committed decisions/learnings from .arete/memory/items/ for reconciliation context
Batch LLM review: Post-reconciliation batchLLMReview() provides semantic dedup against committed memory with prompt injection sanitization
Prior items wiring: Backend loads recent meeting items into the extraction exclusion list

Addresses ~40% commitment duplication, ~23% decision duplication, and ~18% learning duplication found in workspace audit.

Stats

Metric	Value
Tasks	9/9 (100% first-attempt)
Tests added	~48 new
Tests passing	475 (core) + 45 (backend)
Files changed	26 (+1,731 / -99)

Test plan

Core tests pass: npx tsx --test in packages/core (475 passing)
Backend tests pass (45 passing)
TypeScript typecheck clean (npx tsc --noEmit)
Code review: 11 fixes applied (type safety, edge cases, word-boundary regression)
Manual: Process an unprocessed meeting and verify reduced duplication + varying confidence scores

🤖 Generated with Claude Code

… batch LLM review Two-layer dedup architecture to reduce ~40% decision/learning duplication: 1. Prompt-level: self-review instructions, "what is NOT a decision/learning" exclusion lists, confidence guides, and trivial/garbage filters 2. Post-reconciliation: batchLLMReview() for semantic dedup against committed memory items parsed from .arete/memory/items/ Also wires up broken plumbing: real confidence scores (no longer hardcoded 0.9), prior items from recent meetings fed into extraction prompts, and committed decisions/learnings loaded into reconciliation context. 463 tests pass (core), 0 failures. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…erage Fixes from engineering lead review: - Fix `undefined as number` cast in confidence arrays → honest `(number | undefined)[]` type - Fix `hasNegationMarkers` false positives on "notification", "another", "note" — use word-boundary regex - Fix timezone-sensitive date comparison in `parseMemoryItems` — compare as YYYY-MM-DD strings - Fix 150-char action-item length limit incorrectly applied to decisions/learnings — use lighter `isGarbageDecisionOrLearning` filter - Add prompt injection mitigation in `batchLLMReview` — sanitize/truncate text, strip braces - Improve JSON parsing in `batchLLMReview` — try direct parse, then strip fences, then regex - Cache `reconciliationContext` in agent.ts and CLI to avoid redundant I/O - Expand trivial learning patterns — personal location, birthday, favorites - Add 12 new tests: word-boundary negation, long decisions/learnings, loadReconciliationContext, expanded trivial patterns 475 tests pass, 0 failures. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Two new tests for batchLLMReview integration in agent.ts: - Verifies dropped items get status='skipped', source='reconciled' - Verifies graceful degradation when LLM call fails Note: 3 pre-existing failures in agent.test.ts (confidence/dedup) are unrelated to this change (confirmed on clean main). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

- Updated LEARNINGS.md with 5 new gotchas from extraction-intelligence - Added memory entry with metrics, pre-mortem analysis, and learnings - Rebuilt dist to reflect all source changes Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

johnkoht and others added 4 commits April 9, 2026 14:06

johnkoht merged commit 095aee3 into main Apr 9, 2026
4 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: extraction intelligence — two-layer dedup for meeting pipeline#12

feat: extraction intelligence — two-layer dedup for meeting pipeline#12
johnkoht merged 4 commits intomainfrom
worktree-extraction-intelligence

johnkoht commented Apr 9, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

johnkoht commented Apr 9, 2026

Summary

Stats

Test plan

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant