This document is for contributors and curious engineers who want to understand why the code is structured this way.
Traditional RAG (Retrieval-Augmented Generation) treats code as text chunks. It splits files by line count or character limit and throws chunks at a vector database.
LVM treats code as a Directed Acyclic Graph (DAG) of typed Symbols. Every function, class, and interface is a node. Every call-site or type reference is a directed edge. The LLM navigates this graph on demand — it never receives more than it needs.
This distinction matters for three reasons:
- Correctness — Semantic boundaries (function start/end) are always respected. No half-functions, no split type definitions.
- Determinism — Every symbol has a stable `@LVM-ID`. The AI can say "replace the body of `fn_loginUser_42`" instead of "find the function on line 47."
- Efficiency — Compressed skeletons are 10–20x smaller than raw source, and the compression ratio improves as the codebase grows.
```
┌────────────────────────────────────────────────────────────────────┐
│                            Request Path                            │
│                                                                    │
│   File System ──► IgnoreManager ──► tree-sitter Parser             │
│                                             │                      │
│                                        ScanResult                  │
│                                    (symbols + imports)             │
│                                             │                      │
│                      ┌──────────────────────┴────────────┐         │
│                 SymbolGraph                      SymbolResolver    │
│                (DAG of IDs)                   (name → CodeSymbol)  │
│                      └──────────────────────┬────────────┘         │
│                                             │                      │
│                                      CondenserEngine               │
│                                             │                      │
│        ┌────────────────────────────────────┼──────────────┐       │
│    CLI Scan                            MCP Server     VS Code Ext  │
│  (efficiency                      (hydrate_context,  (Ghost Mode,  │
│    report)                          get_skeleton)     status bar)  │
└────────────────────────────────────────────────────────────────────┘
```
Responsibility: Prune the file tree before any parsing happens.
Loads rules from (in priority order):
- Hard-coded defaults (`node_modules`, `dist`, `*.min.js`, etc.)
- `.gitignore` in the project root
- `.lvmignore` in the project root

Uses the `ignore` npm package, which implements the `.gitignore` spec, so matching behavior is identical to what developers expect from git.
Why this is first: Without pruning, scanning a React project takes ~30 seconds (node_modules has 50k+ JS files). With pruning, it takes < 500ms.
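The rule layering can be sketched in miniature (hypothetical class and names; the real IgnoreManager delegates glob matching to the `ignore` npm package rather than hand-rolling regexes):

```typescript
// Miniature sketch of IgnoreManager's rule layering (hypothetical class;
// the real implementation delegates matching to the `ignore` npm package).
type RuleSource = "default" | ".gitignore" | ".lvmignore";

// Convert a simple glob (only `*` supported here) into a segment-anchored RegExp.
function globToRegExp(glob: string): RegExp {
  const escaped = glob
    .replace(/[.+^${}()|[\]\\]/g, "\\$&") // escape regex metacharacters
    .replace(/\*/g, "[^/]*");             // `*` matches within one path segment
  return new RegExp(`(^|/)${escaped}($|/)`);
}

class MiniIgnoreManager {
  private rules: { pattern: RegExp; source: RuleSource }[] = [];

  constructor(defaults: string[], gitignore: string[] = [], lvmignore: string[] = []) {
    // Priority order: hard-coded defaults, then .gitignore, then .lvmignore.
    const layers: [string[], RuleSource][] = [
      [defaults, "default"],
      [gitignore, ".gitignore"],
      [lvmignore, ".lvmignore"],
    ];
    for (const [patterns, source] of layers)
      for (const p of patterns) this.rules.push({ pattern: globToRegExp(p), source });
  }

  // True if any rule matches; matching paths are pruned before parsing.
  isIgnored(relPath: string): boolean {
    return this.rules.some((r) => r.pattern.test(relPath));
  }
}

const mgr = new MiniIgnoreManager(["node_modules", "dist", "*.min.js"], ["coverage"]);
console.log(mgr.isIgnored("node_modules/react/index.js")); // true
console.log(mgr.isIgnored("src/auth.ts"));                 // false
```

Because pruning runs before the parser ever opens a file, a matching directory costs one string test instead of fifty thousand parses.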
Responsibility: Convert raw source code into a structured ScanResult.
Uses tree-sitter's C-native parser via Node bindings. The traversal is a single depth-first walk — O(n) where n = AST node count.
In one pass it captures:
| Capture | Node Types |
|---|---|
| Import map | import_declaration |
| Definitions | function_declaration, method_definition, arrow_function, class_declaration, interface_declaration, type_alias_declaration |
| Call dependencies | call_expression (base identifier extracted) |
| Type dependencies | type_identifier |
The context stack pattern:
```typescript
const contextStack: CodeSymbol[] = [];

// When we ENTER a definition → push onto stack
contextStack.push(currentSymbol);

// When we see a call_expression → attribute to stack top
activeParent.dependencies.push(calleeName);

// When we EXIT a definition → pop from stack
contextStack.pop();
```

This gives us lexical scoping for free, without any complex scope-resolution logic.
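To see the pattern end-to-end, here is a self-contained toy version (the node shape is a hypothetical stand-in; the real walk visits tree-sitter nodes):

```typescript
// Toy AST walk demonstrating the context-stack pattern.
// ToyNode is hypothetical; the real parser traverses tree-sitter nodes.
interface ToyNode {
  kind: "function_declaration" | "call_expression" | "other";
  name?: string;       // definition name or callee name
  children?: ToyNode[];
}

interface ToySymbol { name: string; dependencies: string[] }

function collectSymbols(root: ToyNode): ToySymbol[] {
  const symbols: ToySymbol[] = [];
  const contextStack: ToySymbol[] = [];

  const walk = (node: ToyNode): void => {
    const isDefinition = node.kind === "function_declaration";
    if (isDefinition) {
      const sym: ToySymbol = { name: node.name ?? "anon", dependencies: [] };
      symbols.push(sym);
      contextStack.push(sym); // ENTER a definition → push
    }
    if (node.kind === "call_expression" && contextStack.length > 0) {
      const activeParent = contextStack[contextStack.length - 1];
      activeParent.dependencies.push(node.name ?? "?"); // attribute to stack top
    }
    for (const child of node.children ?? []) walk(child); // single DFS pass
    if (isDefinition) contextStack.pop(); // EXIT a definition → pop
  };

  walk(root);
  return symbols;
}

// loginUser calls hashPassword; the nested helper gets its own entry.
const ast: ToyNode = {
  kind: "function_declaration", name: "loginUser", children: [
    { kind: "call_expression", name: "hashPassword" },
    { kind: "function_declaration", name: "helper", children: [
      { kind: "call_expression", name: "log" },
    ]},
  ],
};
console.log(collectSymbols(ast));
```

Because attribution always goes to the top of the stack, the `log` call inside the nested `helper` lands on `helper`, not on `loginUser`.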
Responsibility: Maintain the DAG of all indexed symbols.
```
┌──────────────┐    calls    ┌──────────────────┐
│  loginUser   │ ──────────► │   hashPassword   │
└──────────────┘             └──────────────────┘
       │ uses type                   │ uses type
       ▼                             ▼
┌──────────────┐             ┌──────────────────┐
│ Credentials  │             │      string      │
└──────────────┘             └──────────────────┘
```
Key method: getRequiredContext(id, depth)
BFS traversal up to depth hops. At depth 0 you get the target. At depth 1 you get the target + all direct dependencies. At depth 2 you get the full subgraph.
This powers the "smart hydration" feature — the LLM can request a function and automatically receive its type definitions without a second round-trip.
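A minimal sketch of the traversal, assuming a plain adjacency map of symbol IDs (the graph literal below is illustrative, not LVM's actual storage):

```typescript
// Sketch of getRequiredContext: bounded BFS over an adjacency map of symbol IDs.
function getRequiredContext(
  edges: Map<string, string[]>,
  id: string,
  depth: number,
): Set<string> {
  const seen = new Set<string>([id]);
  let frontier = [id];
  for (let hop = 0; hop < depth; hop++) {
    const next: string[] = [];
    for (const node of frontier) {
      for (const dep of edges.get(node) ?? []) {
        if (!seen.has(dep)) { seen.add(dep); next.push(dep); }
      }
    }
    frontier = next; // advance exactly one hop per iteration
  }
  return seen;
}

const edges = new Map<string, string[]>([
  ["loginUser", ["hashPassword", "Credentials"]],
  ["hashPassword", ["string"]],
]);
console.log(getRequiredContext(edges, "loginUser", 0)); // only the target
console.log(getRequiredContext(edges, "loginUser", 1)); // target + direct dependencies
console.log(getRequiredContext(edges, "loginUser", 2)); // the full subgraph
```

The `seen` set doubles as the result and the cycle guard, so the traversal terminates even if the dependency graph acquires a cycle.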
Responsibility: Map a raw identifier string to its CodeSymbol definition, possibly across files.
Resolution priority:
1. Local definition (same file) → unambiguous
2. Named import match → follow the import path
3. Single global match → safe assumption for SaaS monorepos
4. null → ambiguous, skip dependency link
The resolver is what makes multi-file dependency linking work. Without it, userService found in controller.ts would never be connected to the class definition in services/user.ts.
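The priority order can be sketched as a pure function (the data shapes here are hypothetical simplifications of the real resolver's inputs):

```typescript
// Sketch of the resolution priority; Def and the two maps are hypothetical shapes.
interface Def { id: string; filePath: string }

function resolve(
  name: string,
  currentFile: string,
  defsByName: Map<string, Def[]>,     // all indexed definitions, keyed by name
  importsInFile: Map<string, string>, // imported name → source file path
): Def | null {
  const candidates = defsByName.get(name) ?? [];
  // 1. Local definition (same file) → unambiguous
  const local = candidates.find((d) => d.filePath === currentFile);
  if (local) return local;
  // 2. Named import match → follow the import path
  const importedFrom = importsInFile.get(name);
  if (importedFrom) {
    const imported = candidates.find((d) => d.filePath === importedFrom);
    if (imported) return imported;
  }
  // 3. Single global match → safe assumption
  if (candidates.length === 1) return candidates[0];
  // 4. Ambiguous (or unknown) → skip the dependency link
  return null;
}

const defs = new Map<string, Def[]>([
  ["userService", [{ id: "services/user.ts:userService:10", filePath: "services/user.ts" }]],
]);
const imports = new Map<string, string>([["userService", "services/user.ts"]]);
console.log(resolve("userService", "controller.ts", defs, imports)?.id);
```

Returning `null` on ambiguity is deliberate: a missing edge only costs a hydration round-trip, while a wrong edge would feed the LLM the wrong definition.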
Responsibility: The public API. Orchestrates all other components.
Key design decisions:
- `indexSource` is async and parallel — uses `Promise.all` over directory entries for fast bulk indexing
- Incremental re-indexing — `indexFile(path)` can be called on a single file (used by VS Code's on-save hook) without re-scanning the whole project
- Stats accumulation — `rawTokenTotal` and `condensedTokenTotal` grow across all indexed files, powering the efficiency report without storing raw source in memory
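The parallel scan boils down to a `Promise.all` fan-out; a minimal sketch with a stubbed-out `indexFile` (signatures are hypothetical, and the real `indexSource` walks directory entries rather than taking a flat file list):

```typescript
// Sketch of bulk indexing as a Promise.all fan-out over file paths.
async function indexSource(
  files: string[],
  indexFile: (path: string) => Promise<number>, // resolves to a per-file symbol count
): Promise<number> {
  const counts = await Promise.all(files.map((f) => indexFile(f))); // index concurrently
  return counts.reduce((sum, c) => sum + c, 0);
}

// Stub standing in for the real single-file indexer.
const fakeIndexFile = async (_path: string): Promise<number> => 2;

indexSource(["a.ts", "b.ts", "c.ts"], fakeIndexFile).then((total) => {
  console.log(total); // 6
});
```

Because each file parse is independent, the same `indexFile` function serves both the bulk path and the single-file on-save path.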
Responsibility: Translate between the MCP JSON-RPC protocol and the CondenserEngine.
The server exposes three tools and one prompt:
- `hydrate_context` — core hydration with optional depth
- `get_skeleton` — file-level skeleton view
- `efficiency_report` — session stats
- `lvm-system` prompt — auto-injects operating instructions so the LLM knows how to use the tools
The system prompt is delivered via MCP's prompts endpoint, which means Claude Desktop injects it automatically — the user doesn't need to copy-paste anything.
```typescript
interface CodeSymbol {
  id: string;             // "src/auth.ts:loginUser:42" (stable, deterministic)
  name: string;           // "loginUser"
  type: SymbolType;       // 'function' | 'class' | 'interface' | ...
  filePath: string;       // "/abs/path/src/auth.ts"
  startLine: number;      // 0-indexed
  endLine: number;
  signature: string;      // Everything before the opening brace
  fullBody: string;       // The complete node text (stored locally, never sent unless hydrated)
  dependencies: string[]; // Raw identifier names found inside this symbol
  tokenCount: number;     // Estimated token cost of fullBody
}
```

Symbol IDs follow the format `<absolute-file-path>:<symbol-name>:<start-char-offset>`.
The offset (not line number) is used because it's stable even if comments are added above a function. Line numbers shift; character positions within a symbol do not.
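Hypothetical helpers showing how such an ID could be built and split back apart (splitting from the right keeps file paths that themselves contain `:` intact):

```typescript
// Hypothetical helpers mirroring the documented "<path>:<name>:<offset>" format.
function makeSymbolId(filePath: string, name: string, startOffset: number): string {
  return `${filePath}:${name}:${startOffset}`;
}

function parseSymbolId(id: string): { filePath: string; name: string; startOffset: number } {
  // Pop from the right: the absolute path may contain ':' (e.g. Windows drive letters).
  const parts = id.split(":");
  const startOffset = Number(parts.pop());
  const name = parts.pop() ?? "";
  return { filePath: parts.join(":"), name, startOffset };
}

const id = makeSymbolId("/abs/path/src/auth.ts", "loginUser", 1042);
console.log(parseSymbolId(id));
```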
| Operation | Complexity | Typical time (10k file repo) |
|---|---|---|
| `indexSource` (full scan) | O(n·k) where k = avg file size | ~2–4s |
| `indexFile` (single file) | O(k) | < 50ms |
| `generateSkeleton` | O(s) where s = symbols in file | < 1ms |
| `hydrateSymbol` | O(1) (hashmap lookup) | < 0.1ms |
| `getRequiredContext` | O(V + E) BFS | < 5ms |
- Anonymous functions — Arrow functions without a named assignment get an `anon_<offset>` ID. These work correctly but produce less readable IDs.
- Dynamic imports — `require()` and `import()` calls are not traced (only static `import` declarations are mapped).
- Metaprogramming — Decorators and `Proxy`-based code cannot be statically resolved.
- Language coverage — Currently TypeScript/JavaScript only. The parser module is designed for extension via tree-sitter grammar swapping.
- Install the tree-sitter grammar: `npm install tree-sitter-python`
- Create `packages/core/src/parser/python.ts` mirroring `tree-sitter-logic.ts`
- Map Python-specific node types (`def`, `class`, `import_from`) to `SymbolType`
- Register the new extension in `IgnoreManager.PARSEABLE_EXTENSIONS`
- Add a dispatch in `CondenserEngine.indexFile` based on file extension
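The final dispatch step can be sketched as a small extension-to-parser registry (the registry shape and parser stubs are hypothetical):

```typescript
// Sketch of extension-based dispatch in CondenserEngine.indexFile
// (hypothetical registry; real parsers wrap tree-sitter grammars).
type ParseFn = (source: string) => string[]; // simplified: returns symbol names

const parserByExtension = new Map<string, ParseFn>([
  [".ts", (src) => [`ts-symbols:${src.length}`]],
  [".py", (src) => [`py-symbols:${src.length}`]], // registered when the Python grammar lands
]);

function dispatchParse(filePath: string, source: string): string[] {
  const ext = filePath.slice(filePath.lastIndexOf("."));
  const parse = parserByExtension.get(ext);
  if (!parse) throw new Error(`no parser registered for ${ext}`);
  return parse(source);
}

console.log(dispatchParse("main.py", "def f(): pass")); // handled by the .py parser
```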
See CONTRIBUTING.md for the full guide.