Project Structure

This document outlines the modular architecture of Smart Coding MCP.

Directory Structure

smart-coding-mcp/
├── index.js                    # Main entry point, MCP server setup
├── package.json                # Package configuration
├── config.json                 # User configuration
├── LICENSE                     # MIT License
├── README.md                   # Project documentation
├── EXAMPLES.md                 # Usage examples
├── .gitignore                  # Git ignore rules
│
├── lib/                        # Core libraries
│   ├── config.js              # Configuration loader
│   ├── cache.js               # Embeddings cache management
│   └── utils.js               # Shared utilities (chunking, similarity)
│
├── features/                   # Pluggable features
│   ├── hybrid-search.js       # Semantic search feature
│   ├── index-codebase.js      # Code indexing feature
│   └── clear-cache.js         # Cache management feature
│
└── scripts/                    # Utility scripts
    └── clear-cache.js         # Cache management utility

Module Responsibilities

index.js

  • MCP server initialization
  • Feature registry and orchestration
  • Tool request routing
  • Global state management (embedder, cache)

lib/config.js

  • Loads and validates configuration from config.json
  • Provides default configuration values
  • Resolves file paths

lib/cache.js

  • EmbeddingsCache class
  • Manages persistence of embedding vectors
  • File hash tracking for change detection
  • Load/save operations for disk cache
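
The hash-tracking idea above can be sketched as follows. This is a minimal in-memory illustration, not the project's actual class: the name EmbeddingsCacheSketch, its fields, and its methods are all hypothetical, and disk persistence is reduced to a JSON-serializable snapshot.

```javascript
// Hypothetical in-memory shape; the real EmbeddingsCache also reads and writes a disk cache.
class EmbeddingsCacheSketch {
  constructor() {
    this.fileHashes = new Map(); // file path -> content hash at last index
    this.vectorStore = []; // entries shaped like { file, content, vector }
  }

  // A file needs (re)indexing when it is new or its hash has changed.
  needsIndexing(file, hash) {
    return this.fileHashes.get(file) !== hash;
  }

  // Replace all chunks stored for a file and record its new hash.
  setFile(file, hash, chunks) {
    this.vectorStore = this.vectorStore.filter((entry) => entry.file !== file);
    for (const chunk of chunks) {
      this.vectorStore.push({ file, ...chunk });
    }
    this.fileHashes.set(file, hash);
  }

  // Serializable snapshot that a save step could write to disk as JSON.
  toJSON() {
    return {
      fileHashes: Object.fromEntries(this.fileHashes),
      vectorStore: this.vectorStore,
    };
  }

  // Rebuild a cache from a snapshot produced by toJSON().
  static fromJSON(snapshot) {
    const cache = new EmbeddingsCacheSketch();
    cache.fileHashes = new Map(Object.entries(snapshot.fileHashes));
    cache.vectorStore = [...snapshot.vectorStore];
    return cache;
  }
}
```

Comparing a stored hash against a fresh one is what lets later runs skip unchanged files without re-embedding them.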

lib/utils.js

  • cosineSimilarity() - Vector similarity calculation
  • hashContent() - MD5 hashing for change detection
  • smartChunk() - Language-aware code chunking
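
For orientation, the first two helpers could look roughly like this. It is a sketch under assumptions: vectors are plain numeric arrays, the hash is returned as a hex string, and a zero-magnitude vector scores 0. The actual signatures in lib/utils.js may differ, and smartChunk() is omitted because its behavior is not specified here.

```javascript
import { createHash } from "node:crypto";

// Cosine similarity between two equal-length numeric vectors.
// Returns 0 when either vector has zero magnitude (an assumption of this sketch).
function cosineSimilarity(a, b) {
  if (a.length !== b.length) {
    throw new Error("Vectors must have the same length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dot / denom;
}

// MD5 hex digest of a file's content, used only to detect changes (not for security).
function hashContent(content) {
  return createHash("md5").update(content).digest("hex");
}
```

MD5 is adequate here because the digest only answers "did this file change?"; nothing relies on it being collision-resistant.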

features/hybrid-search.js

  • HybridSearch class
  • Combines semantic and exact matching
  • Weighted scoring algorithm
  • Result formatting with relevance scores
  • MCP tool: semantic_search

features/index-codebase.js

  • CodebaseIndexer class
  • File discovery via glob patterns
  • Incremental indexing
  • File watcher for real-time updates
  • MCP tool: index_codebase

Adding New Features

To extend with a new feature:

1. Create Feature Module

Create features/my-feature.js:

export class MyFeature {
  constructor(embedder, cache, config) {
    this.embedder = embedder;
    this.cache = cache;
    this.config = config;
  }

  async execute(params) {
    // Implementation
    return {
      /* results */
    };
  }
}

export function getToolDefinition(config) {
  return {
    name: "my_tool",
    description: "What this tool does",
    inputSchema: {
      type: "object",
      properties: {
        param1: { type: "string", description: "..." },
      },
      required: ["param1"],
    },
  };
}

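// Unwrap the tool-call arguments, run the feature, and wrap the result as MCP text content.
export async function handleToolCall(request, instance) {
  // Fall back to an empty object when the client sends no arguments.
  const params = request.params.arguments ?? {};
  const result = await instance.execute(params);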

  return {
    content: [
      {
        type: "text",
        text: JSON.stringify(result, null, 2),
      },
    ],
  };
}

2. Register in index.js

import * as MyFeature from "./features/my-feature.js";

// In initialize():
const myFeature = new MyFeature.MyFeature(embedder, cache, config);

// Add to features array:
const features = [
  // ... existing features
  {
    module: MyFeature,
    instance: myFeature,
    handler: MyFeature.handleToolCall,
  },
];

3. Done!

The feature will automatically:

  • Be listed in MCP tool discovery
  • Handle incoming tool requests
  • Have access to embeddings and cache
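
How a features array like the one in step 2 can drive both discovery and routing is sketched below. The helper names listTools and routeToolCall are hypothetical, and the sketch deliberately leaves out the MCP SDK wiring; it only assumes each entry has the module, instance, and handler fields shown above.

```javascript
// List every registered tool, as a tool-discovery handler might.
function listTools(features, config) {
  return features.map((feature) => feature.module.getToolDefinition(config));
}

// Route a tool call to the feature whose definition carries the requested name.
async function routeToolCall(features, config, request) {
  for (const feature of features) {
    const definition = feature.module.getToolDefinition(config);
    if (definition.name === request.params.name) {
      return feature.handler(request, feature.instance);
    }
  }
  // No feature matched: return an error result instead of throwing.
  return {
    content: [{ type: "text", text: `Unknown tool: ${request.params.name}` }],
    isError: true,
  };
}
```

Because both functions only read the array, registering a feature is enough to make it discoverable and callable.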

Configuration Flow

  1. User creates/edits config.json
  2. lib/config.js loads configuration on startup
  3. The loaded configuration is merged with the defaults
  4. The merged configuration is passed to every feature via its constructor
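
The merge step can be illustrated with a short sketch. The key names come from the Optimization Tips section of this document; the default values, DEFAULT_CONFIG, mergeConfig, and loadConfigFromText are hypothetical placeholders rather than the contents of lib/config.js.

```javascript
// Hypothetical defaults; only the key names appear elsewhere in this document.
const DEFAULT_CONFIG = {
  chunkSize: 15,
  watchFiles: true,
  excludePatterns: ["**/node_modules/**"],
  fileExtensions: ["js", "ts"],
};

// Merge user settings over the defaults. Non-object input falls back to defaults.
function mergeConfig(userConfig) {
  const isPlainObject =
    userConfig !== null && typeof userConfig === "object" && !Array.isArray(userConfig);
  return isPlainObject ? { ...DEFAULT_CONFIG, ...userConfig } : { ...DEFAULT_CONFIG };
}

// Parse raw config.json text; a parse failure is logged and yields the defaults.
function loadConfigFromText(text) {
  try {
    return mergeConfig(JSON.parse(text));
  } catch (error) {
    console.error(`Invalid config, using defaults: ${error.message}`);
    return { ...DEFAULT_CONFIG };
  }
}
```

A shallow merge like this replaces whole arrays such as excludePatterns rather than concatenating them, which is a design choice the real loader may or may not share.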

Data Flow

Indexing Flow

User code files
    ↓
glob pattern matching
    ↓
smartChunk() - split into chunks
    ↓
embedder - generate vectors
    ↓
EmbeddingsCache - store in memory + disk
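
The pipeline above, reduced to a single file, might be sketched like this. The chunk and embed functions are injected stand-ins for smartChunk() and the embedder, and store stands in for the cache; indexFile itself is a hypothetical name.

```javascript
// Index one file's content: chunk -> embed -> store.
// `chunk`, `embed`, and `store` are injected stand-ins, not the project's real objects.
async function indexFile(file, content, { chunk, embed, store }) {
  const chunks = chunk(content, file);
  for (const text of chunks) {
    const vector = await embed(text); // one vector per chunk
    store.push({ file, content: text, vector });
  }
  return chunks.length; // number of chunks indexed for this file
}
```

Injecting the three collaborators keeps the flow testable without downloading an embedding model.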

Search Flow

User query
    ↓
embedder - query to vector
    ↓
cosineSimilarity() - score all chunks
    ↓
exact match boost - adjust scores
    ↓
sort and filter - top N results
    ↓
format output - markdown with syntax highlighting
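
The scoring steps in the middle of this flow can be sketched as below. The weights, the case-insensitive substring check used as the "exact match", and the name rankChunks are assumptions made for illustration; the actual weighting in features/hybrid-search.js is not specified here.

```javascript
// Compact cosine similarity for equal-length vectors (0 for zero-magnitude input).
function cosine(a, b) {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denom = Math.sqrt(normA * normB);
  return denom === 0 ? 0 : dot / denom;
}

// Rank stored chunks against a query. The default weights are hypothetical.
function rankChunks(query, queryVector, chunks, options = {}) {
  const { topN = 5, semanticWeight = 1, exactMatchBoost = 0.5 } = options;
  const needle = query.toLowerCase();
  return chunks
    .map((chunk) => {
      let score = semanticWeight * cosine(queryVector, chunk.vector);
      // Boost chunks that literally contain the query text.
      if (chunk.content.toLowerCase().includes(needle)) {
        score += exactMatchBoost;
      }
      return { ...chunk, score };
    })
    .sort((a, b) => b.score - a.score) // highest score first
    .slice(0, topN);
}
```

The boost lets a chunk that literally contains the query outrank one that is only semantically close, which is the point of combining the two signals.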

Performance Considerations

Caching Strategy

  • First Run: Download model (~90MB), index all files, save cache
  • Subsequent Runs: Load cache from disk, only index changed files
  • File Changes: Incremental updates via file watcher

Memory Usage

Approximate memory usage:

  • Base (Node.js + libraries): ~50MB
  • Embedding model: ~100MB
  • Vector store: ~10KB per code chunk
  • Example: 1000 files × 20 chunks/file = 20,000 chunks, so the vector store alone takes ~200MB (on top of the base and model figures above)
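
The arithmetic behind these figures can be written out as a small estimator. It uses the approximate numbers listed above as defaults and decimal units (1MB = 1,000,000 bytes); estimateMemoryMB is a hypothetical helper, not part of the project.

```javascript
// Rough memory estimate from the approximate figures in this section.
function estimateMemoryMB({
  files,
  chunksPerFile,
  bytesPerChunk = 10_000, // ~10KB per code chunk
  baseMB = 50, // Node.js + libraries
  modelMB = 100, // embedding model
}) {
  const vectorStoreMB = (files * chunksPerFile * bytesPerChunk) / 1_000_000;
  return { vectorStoreMB, totalMB: baseMB + modelMB + vectorStoreMB };
}

const estimate = estimateMemoryMB({ files: 1000, chunksPerFile: 20 });
// estimate.vectorStoreMB is 200 and estimate.totalMB is 350
```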

Optimization Tips

  • Reduce chunkSize for large codebases
  • Disable watchFiles if not needed
  • Use excludePatterns aggressively
  • Limit fileExtensions to relevant types

Future Feature Ideas

Potential features to add following this architecture:

  1. Code Complexity Analysis

    • Cyclomatic complexity scoring
    • Technical debt detection
  2. Pattern Detection

    • Anti-pattern identification
    • Best practice recommendations
  3. Documentation Generation

    • Auto-generate function docs
    • README generation from code
  4. Refactoring Suggestions

    • Code smell detection
    • Automated fix suggestions
  5. Test Coverage Analysis

    • Identify untested code paths
    • Generate test templates
  6. Dependency Analysis

    • Import/export graph
    • Dead code detection

Each feature would follow the same pattern:

  • Class in features/ directory
  • Access to embedder, cache, config
  • MCP tool definition and handler
  • Registration in feature array

Testing Strategy

Recommended testing approach:

  1. Unit Tests: lib/ modules

    • Test utilities in isolation
    • Mock dependencies
  2. Integration Tests: features/

    • Test with sample codebases
    • Verify MCP tool contracts
  3. E2E Tests: Full workflow

    • Index → Search → Results
    • File watching behavior
    • Cache persistence

Error Handling

Each module follows defensive error handling:

  • Config errors → use defaults
  • File read errors → log and skip
  • Embedding errors → retry or skip chunk
  • Cache errors → log but continue
  • Unknown tools → return helpful error message

All errors are logged to stderr so that log output stays out of the MCP protocol stream.
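
The "retry or skip" policy for embedding errors could be sketched as follows. The helper name retryOrSkip, the retry count, and the use of null as the "skip" signal are assumptions for illustration only.

```javascript
// Run an async operation, retrying on failure; resolve to null to signal "skip".
async function retryOrSkip(operation, { retries = 2, label = "operation" } = {}) {
  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      return await operation();
    } catch (error) {
      // Log to stderr only, keeping stdout free for protocol traffic.
      console.error(`[${label}] attempt ${attempt + 1} failed: ${error.message}`);
    }
  }
  return null; // every attempt failed: the caller skips this item and continues
}
```

A caller would check for null and move on to the next chunk, so a single bad input cannot abort the whole indexing run.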