
feat: incremental analysis — watch mode, parse only changed files #27

@bntvllnt

Summary

Add incremental parsing that re-analyzes only the files changed since the last index, plus a watch mode for real-time updates.

Motivation

  • Full re-parse on every run is wasteful for large repos
  • codebase-memory-mcp caches persistently and updates incrementally
  • Augment Code takes 27 min for its initial index, then under 20 s for incremental updates
  • Watch mode enables real-time metrics during development
  • Essential for scaling to large repos (1000+ files)

Approach

  1. Diff-based parsing: Compare the current git HEAD with the HEAD recorded in the cached index
  2. Selective re-parse: Only re-parse the files listed by git diff --name-only
  3. Graph patching: Remove the old nodes/edges for changed files, then add the new ones
  4. Metric recomputation: PageRank and betweenness are global and need a full recompute, but blast radius can be scoped to the changed files and their transitive dependents
  5. Watch mode: codebase-intelligence watch <path> re-indexes on file save
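Steps 2 to 4 can be sketched roughly as follows. This is a minimal TypeScript sketch under stated assumptions, not the project's implementation: it assumes the index holds an in-memory map from each file path to the set of files it imports, and DepGraph, ParseImports, changedFilesFromGitDiff, patchGraph, and blastRadius are all hypothetical names.

```typescript
// Hypothetical in-memory dependency graph: file path -> files it imports.
type DepGraph = Map<string, Set<string>>;

// Hypothetical parser hook: returns the import targets of a single file.
type ParseImports = (file: string) => string[];

// Turn the output of `git diff --name-only ...` into a list of paths.
function changedFilesFromGitDiff(output: string): string[] {
  return output
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line.length > 0);
}

// Graph patching: drop each changed file's stale node and outgoing edges, then
// re-parse it. Deleted files (exists() === false) are also removed as import
// targets of unchanged files. Unchanged files are never re-parsed.
function patchGraph(
  graph: DepGraph,
  changed: string[],
  parse: ParseImports,
  exists: (file: string) => boolean,
): void {
  for (const file of changed) {
    graph.delete(file);
    if (exists(file)) {
      graph.set(file, new Set(parse(file)));
    } else {
      graph.forEach((deps) => deps.delete(file));
    }
  }
}

// Scoped blast radius: the changed files plus every file that transitively
// imports one of them. Builds a reverse-edge index, then runs a BFS that
// starts only from the changed files.
function blastRadius(graph: DepGraph, changed: string[]): Set<string> {
  const importers = new Map<string, string[]>();
  graph.forEach((deps, file) => {
    deps.forEach((dep) => {
      const list = importers.get(dep) ?? [];
      list.push(file);
      importers.set(dep, list);
    });
  });

  const affected = new Set<string>(changed);
  const queue = [...changed];
  while (queue.length > 0) {
    const current = queue.shift()!;
    for (const importer of importers.get(current) ?? []) {
      if (!affected.has(importer)) {
        affected.add(importer);
        queue.push(importer);
      }
    }
  }
  return affected;
}
```

When collecting the changed list, git diff --name-only --no-renames <cachedHead> compares the cached commit against the working tree, so it also lists uncommitted edits to tracked files, and it reports a rename as a deletion plus an addition so the old path's node gets removed. Untracked new files would still need git ls-files --others --exclude-standard.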

Challenges

  • PageRank and betweenness are global metrics — can't be updated incrementally
  • Graph structure changes (new imports) affect transitive metrics
  • Tradeoff: approximate incremental metrics vs exact full recompute
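One possible middle ground for PageRank (an assumption, not something this issue specifies) is to keep the full recompute but warm-start power iteration from the cached scores. The sketch below is a standard damped PageRank in TypeScript; pageRank and its parameters are hypothetical, and 0.85 is the conventional damping factor. Every iteration still visits every node and edge, which is exactly why the metric is global; the warm start typically only cuts the number of iterations when the graph changed little. Betweenness is not covered here.

```typescript
// Hypothetical in-memory dependency graph: file path -> files it imports.
type DepGraph = Map<string, Set<string>>;

interface PageRankResult {
  scores: Map<string, number>;
  iterations: number;
}

// Damped PageRank by power iteration (edge direction: importer -> imported).
// `init` optionally warm-starts from cached scores; nodes missing from it start
// at 1/N. Edges to files outside the graph are ignored; files with no in-graph
// imports are dangling and spread their score uniformly.
function pageRank(
  graph: DepGraph,
  init?: Map<string, number>,
  damping = 0.85,
  tol = 1e-10,
  maxIter = 1000,
): PageRankResult {
  const nodes = Array.from(graph.keys());
  const n = nodes.length;
  if (n === 0) return { scores: new Map(), iterations: 0 };

  // Starting vector: cached scores where available, uniform otherwise, normalized to sum 1.
  let scores = new Map<string, number>();
  let total = 0;
  for (const v of nodes) {
    const s = init?.get(v) ?? 1 / n;
    scores.set(v, s);
    total += s;
  }
  for (const v of nodes) {
    scores.set(v, total > 0 ? scores.get(v)! / total : 1 / n);
  }

  // In-graph out-links, computed once per call.
  const outLinks = new Map<string, string[]>();
  for (const v of nodes) {
    outLinks.set(v, Array.from(graph.get(v)!).filter((w) => graph.has(w)));
  }

  for (let iter = 1; iter <= maxIter; iter++) {
    const next = new Map<string, number>();
    for (const v of nodes) next.set(v, (1 - damping) / n);
    let dangling = 0;
    for (const v of nodes) {
      const outs = outLinks.get(v)!;
      const s = scores.get(v)!;
      if (outs.length === 0) {
        dangling += s;
      } else {
        for (const w of outs) next.set(w, next.get(w)! + (damping * s) / outs.length);
      }
    }
    // Add the dangling share and measure the L1 change against the previous vector.
    let delta = 0;
    for (const v of nodes) {
      const value = next.get(v)! + (damping * dangling) / n;
      next.set(v, value);
      delta += Math.abs(value - scores.get(v)!);
    }
    scores = next;
    if (delta < tol) return { scores, iterations: iter };
  }
  return { scores, iterations: maxIter };
}
```

Because cold and warm runs stop at the same convergence threshold, their scores should agree to within a small multiple of that threshold, which is the kind of tolerance the acceptance criteria call for.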

Acceptance Criteria

  • --incremental flag skips unchanged files
  • watch CLI command for real-time updates
  • 10x faster re-analysis for small changes on large repos
  • Correctness: incremental results match full recompute (within tolerance for global metrics)
  • Tests comparing incremental vs full results
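The comparison tests could check graph structure exactly and global metrics only within a tolerance. A minimal sketch, assuming both runs produce the in-memory maps used above; sameGraph, maxMetricDiff, incrementalMatchesFull, and the default tolerance are hypothetical.

```typescript
// Hypothetical in-memory dependency graph: file path -> files it imports.
type DepGraph = Map<string, Set<string>>;

// Exact structural check: same files, same imports per file.
function sameGraph(a: DepGraph, b: DepGraph): boolean {
  if (a.size !== b.size) return false;
  let same = true;
  a.forEach((deps, file) => {
    const other = b.get(file);
    if (other === undefined || other.size !== deps.size) {
      same = false;
      return;
    }
    deps.forEach((dep) => {
      if (!other.has(dep)) same = false;
    });
  });
  return same;
}

// Tolerance check for global metrics: largest per-file absolute difference.
// A file scored in only one run counts as an infinite difference.
function maxMetricDiff(a: Map<string, number>, b: Map<string, number>): number {
  if (a.size !== b.size) return Infinity;
  let worst = 0;
  a.forEach((x, file) => {
    const y = b.get(file);
    worst = y === undefined ? Infinity : Math.max(worst, Math.abs(x - y));
  });
  return worst;
}

// Hypothetical acceptance check: structure must match exactly, metrics within tol.
function incrementalMatchesFull(
  full: { graph: DepGraph; metrics: Map<string, number> },
  incremental: { graph: DepGraph; metrics: Map<string, number> },
  tol = 1e-6,
): boolean {
  return (
    sameGraph(full.graph, incremental.graph) &&
    maxMetricDiff(full.metrics, incremental.metrics) <= tol
  );
}
```

Keeping the two checks separate makes a failing test say whether patching produced the wrong graph or whether only the global metrics drifted.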

Priority

Long-term. Essential for large-repo adoption, but current caching handles most cases.

Labels

enhancement (New feature or request)
