feat: multi-language support via tree-sitter (Python, Go) #20

@bntvllnt

Summary

Add tree-sitter-based parsing for Python and Go alongside the existing TypeScript Compiler API parser. TypeScript-only support locks out 80%+ of the market.

Motivation

  • codebase-memory-mcp supports 64 languages via tree-sitter and ships as a single Go binary
  • CKB uses SCIP indexes for multi-language support
  • CodePathfinder is Python-only but proves demand for non-TS analysis
  • Every perspective in the competitive analysis flagged single-language support as the existential risk
  • Python is the most common language in AI agent codebases

Approach

Phase 1: Parser Abstraction

  • Extract parser interface from current TS Compiler API implementation
  • Define ParsedFile contract that both parsers satisfy
  • Keep TS Compiler API for TypeScript (higher quality: type resolution, call sites)
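
The abstraction above could look roughly like the following. This is a minimal sketch, not the project's actual types: the issue names `ParsedFile` but does not define it, so every field shown here, along with `Parser`, `ParserRegistry`, and `forPath`, is a hypothetical name chosen for illustration.

```typescript
// Hypothetical shapes: the issue names ParsedFile but does not define its fields.
type Language = "typescript" | "python" | "go";

interface ParsedSymbol {
  name: string;
  kind: "function" | "class" | "struct";
  line: number; // 1-based line of the declaration
}

interface ParsedFile {
  path: string;
  language: Language;
  symbols: ParsedSymbol[];
  imports: string[]; // module specifiers as written in source
  exports: string[]; // names visible to other files
}

// Contract that both the TS Compiler API parser and a tree-sitter parser would satisfy.
interface Parser {
  readonly language: Language;
  readonly extensions: readonly string[]; // e.g. [".ts", ".tsx"]
  parse(path: string, source: string): ParsedFile;
}

// Routes each file to a parser by extension, so .ts/.tsx can stay on the
// TS Compiler API while .py and .go go to tree-sitter-backed parsers.
class ParserRegistry {
  private readonly byExtension = new Map<string, Parser>();

  register(parser: Parser): void {
    for (const ext of parser.extensions) {
      this.byExtension.set(ext, parser);
    }
  }

  // Returns undefined for unsupported files so callers can skip them.
  forPath(path: string): Parser | undefined {
    const dot = path.lastIndexOf(".");
    return dot < 0 ? undefined : this.byExtension.get(path.slice(dot));
  }
}
```

Keeping the routing in one registry means the "no quality regression" criterion for TypeScript reduces to checking which parser is registered for `.ts` and `.tsx`.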

Phase 2: tree-sitter Integration

  • Add tree-sitter + language grammars as dependencies
  • Implement parser for Python (functions, classes, imports, exports)
  • Implement parser for Go (functions, structs, imports, packages)
  • Map tree-sitter AST nodes to existing ParsedFile interface
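
To illustrate the mapping step, here is a sketch of walking a Python module's top-level nodes. It is written against a small structural interface (`TsNode`, a hypothetical name) that declares only the node members used here, so it runs without the tree-sitter dependency. The node type and field names (`function_definition`, `class_definition`, `decorated_definition`, `import_statement`, `import_from_statement`, `aliased_import`, `module_name`) are taken from the tree-sitter-python grammar as I understand it and should be checked against the grammar version actually installed. `PyParsed` is a simplified stand-in for the real `ParsedFile`.

```typescript
// Minimal structural view of a tree-sitter syntax node (assumed subset).
interface TsNode {
  type: string;
  text: string;
  startPosition: { row: number; column: number };
  namedChildren: TsNode[];
  childForFieldName(name: string): TsNode | null;
}

interface PySymbol { name: string; kind: "function" | "class"; line: number }
interface PyParsed { symbols: PySymbol[]; imports: string[] }

// Collects top-level definitions and imports from a Python `module` root node.
// Nested functions and methods are intentionally ignored in this sketch.
function mapPythonModule(root: TsNode): PyParsed {
  const out: PyParsed = { symbols: [], imports: [] };
  for (const node of root.namedChildren) {
    // A decorated definition wraps the real node in its `definition` field.
    const def = node.type === "decorated_definition"
      ? node.childForFieldName("definition") ?? node
      : node;
    switch (def.type) {
      case "function_definition":
      case "class_definition": {
        const name = def.childForFieldName("name");
        if (name) {
          out.symbols.push({
            name: name.text,
            kind: def.type === "class_definition" ? "class" : "function",
            line: def.startPosition.row + 1, // tree-sitter rows are 0-based
          });
        }
        break;
      }
      case "import_from_statement": {
        const mod = def.childForFieldName("module_name");
        if (mod) out.imports.push(mod.text);
        break;
      }
      case "import_statement": {
        // `import a, b as c` yields one named child per imported module.
        for (const child of def.namedChildren) {
          const target = child.type === "aliased_import"
            ? child.childForFieldName("name")
            : child;
          if (target) out.imports.push(target.text);
        }
        break;
      }
    }
  }
  return out;
}
```

Because nothing here resolves types, any call edges derived from such output would fall in the "text-inferred" column of the tradeoffs table.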

Phase 3: Graph Unification

  • Files from all supported languages live in the same graph
  • Cross-language edges (e.g., Python calling TS via API)
  • Metrics compute identically regardless of source language
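
One way to read "metrics compute identically" is that metric code only ever sees language-neutral nodes and edges. The sketch below assumes import targets have already been resolved to file paths (the resolution step itself is out of scope here), and all names (`FileNode`, `Edge`, `buildEdges`, `fanIn`) are hypothetical. The `confidence` tag mirrors the "Call site confidence" row of the tradeoffs table.

```typescript
interface FileNode {
  path: string;
  language: string;
  imports: string[]; // assumed to be already-resolved target file paths
}

type Confidence = "type-resolved" | "text-inferred";

interface Edge {
  from: string;
  to: string;
  confidence: Confidence;
}

// Builds edges between files in the repo; imports that point outside the
// repo (external packages) are skipped.
function buildEdges(files: FileNode[]): Edge[] {
  const known = new Set(files.map((f) => f.path));
  const edges: Edge[] = [];
  for (const file of files) {
    for (const target of file.imports) {
      if (!known.has(target)) continue;
      edges.push({
        from: file.path,
        to: target,
        // Only the TS Compiler API path can resolve types in this design.
        confidence: file.language === "typescript" ? "type-resolved" : "text-inferred",
      });
    }
  }
  return edges;
}

// Fan-in per file: number of incoming edges. Nothing here inspects `language`,
// so the metric behaves the same for TypeScript, Python, and Go files.
function fanIn(files: FileNode[], edges: Edge[]): Map<string, number> {
  const counts = new Map<string, number>();
  for (const file of files) counts.set(file.path, 0);
  for (const edge of edges) {
    counts.set(edge.to, (counts.get(edge.to) ?? 0) + 1);
  }
  return counts;
}
```

Carrying the confidence on the edge instead of the node lets downstream consumers decide whether to weight or filter text-inferred edges without changing the metric code.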

Tradeoffs

| Aspect               | TS Compiler API | tree-sitter            |
|----------------------|-----------------|------------------------|
| Type resolution      | Full            | None                   |
| Call site confidence | type-resolved   | text-inferred only     |
| Language support     | TypeScript only | 64+ languages          |
| Parse speed          | Slower          | Faster                 |
| Dependency           | typescript npm  | tree-sitter + grammars |

Acceptance Criteria

  • Python files parsed: functions, classes, imports, exports
  • Go files parsed: functions, structs, imports, packages
  • Mixed-language repos produce unified graph
  • All existing metrics compute on non-TS files
  • TS Compiler API still used for .ts/.tsx files (no quality regression)
  • Tests for Python and Go parsing with real fixture files

Priority

Short-term — Critical for market viability. Start with Python.

Metadata

Labels

enhancement (New feature or request)
