- Implemented Ollama integration for AI-powered code explanations with GPU acceleration
  - New `ollama_llm.py` module with `OllamaLLM` class for communication with local Ollama service
  - Added `--ollama` flag to enable Ollama-enhanced query responses
  - Added `--ollama-model` parameter to specify which Ollama model to use (default: llama3)
  - Added `--max-tokens` parameter to control response length
  - New `generate_answer_ollama()` function in `rag.py` for Ollama-based RAG responses
  - Comprehensive documentation in `docs/ollama_integration.md`
  - Full test suite for Ollama integration in `tests/test_ollama.py`, `tests/test_rag_ollama.py`, and `tests/test_cli_ollama.py`
- Implemented `cmd_query()` CLI function for semantic search with support for hybrid search and documentation merging
- Enhanced autoplan with index-driven section discovery:
  - `_discover_key_modules()`: Identifies core modules by symbol count
  - `_discover_key_classes()`: Finds important classes by structural complexity
  - `_discover_key_functions()`: Locates critical functions via call graph analysis
  - `_discover_patterns()`: Detects architectural patterns (testing, configuration, API, data layer)
- Added `PlanningRule` dataclass for declarative section planning rules
- Refactored `generate_plan()` to support both rule-based and index-discovered sections
- Autoplan now generates documentation sections dynamically from indexed codebase rather than hardcoded templates
- `generate_plan()` accepts optional `index_dir` parameter to enable index-based discovery
- Improved code organization with separation of concerns in planning logic
- README.md updated to reflect index-driven documentation planning and Ollama integration
- CLI argument parser enhanced with new Ollama-specific options
- Fixed missing `cmd_query` function that was causing NameError in CLI
- Added missing imports (`json`, `sqlite3`) to cli.py
- Fixed unpacking error in query result formatting (6 fields instead of 7)
- External library documentation indexing (PyPI + local site-packages)
- New public Python wrappers: `Indexer` and `Searcher` for programmatic indexing and querying
- New `semindex.docs` package with automated documentation generation
- Added `scripts/gen_docs.py` for generating wiki documentation
- Implemented `LocalLLM` with automatic model download for offline documentation generation
- Added `OpenAICompatibleLLM` for remote LLM integration
- Extended `store.py` schema to manage docs-specific tables and FAISS index
- `cli.py` updated to support docs indexing and retrieval
- Improved error handling for external service failures
- Added a pluggable language adapter registry in `semindex.languages` that powers automatic discovery of supported file types and supports runtime registration of custom adapters.
- Introduced an optional Tree-sitter powered JavaScript adapter that is registered when `tree_sitter_languages` is available, expanding multi-language indexing support.
- Expanded Tree-sitter backed adapters to cover `javascript`, `java`, `typescript`, `csharp`, `cpp`, `c`, `go`, `php`, `shell`, `rust`, and `ruby` (12 languages total with extras) and upgraded the JavaScript adapter to emit class/function symbols via the Tree-sitter AST.
- External library documentation indexing (PyPI + local site-packages). Docs are parsed (HTML/Markdown), normalized, embedded, and stored in dedicated tables (`doc_packages`, `doc_pages`, `doc_vectors`) and a separate FAISS index `docs.faiss`. CLI: `--include-docs` for `index` and `query`, with `--docs-weight` to control ranking merge.
- New public Python wrappers: `Indexer` (`semindex.indexer.Indexer`) and `Searcher` (`semindex.search.Searcher`) for programmatic indexing and querying, including hybrid search and optional docs merging.
- New `semindex.docs` package exposing `generate_plan()`, graph builders, and Mermaid utilities to power automated documentation.
- Added `scripts/gen_docs.py` CLI for generating wiki documentation from graphs, repo statistics, and LLM-authored narratives.
- Implemented `LocalLLM` with automatic TinyLlama GGUF download (configurable via `SEMINDEX_LLM_*` env vars) for offline documentation generation.
- Added `remote_llm.py` with `OpenAICompatibleLLM` + `resolve_groq_config()` to integrate Groq/OpenAI-compatible endpoints, and surfaced `--remote-llm` CLI options in `gen_docs.py`.
- Added automated planner tests in `tests/test_autoplan.py` and CLI coverage in `tests/test_gen_docs_cli.py`.
- Incremental indexing now reuses the adapter registry so mixed-language repositories are handled consistently in both fresh and incremental runs.
- Extended `store.py` schema and index reset logic to manage docs-specific tables and FAISS index.
- `cli.py` updated to optionally index docs after code indexing and to merge doc results at query time.
- `README.md` and `ROADMAP.md` updated with the documentation generator workflow, LLM configuration, and dependency-group guidance.
- `pyproject.toml` now credits OpenSource Syndicate as the author and introduces a `languages` dependency group for uv-based installs.
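
For readers unfamiliar with the pattern, the pluggable language adapter registry mentioned in this release can be pictured roughly as follows. This is a minimal sketch, not the actual `semindex.languages` code: `register_adapter`, `adapter_for`, `supported_extensions`, and `toy_python_adapter` are hypothetical names chosen for illustration, and the real adapters return richer symbol objects than plain strings.

```python
from pathlib import Path
from typing import Callable, Dict, List, Optional

# Hypothetical adapter type: takes source text, returns a list of symbol names.
Adapter = Callable[[str], List[str]]

# Extension -> adapter mapping; this is the "registry".
_REGISTRY: Dict[str, Adapter] = {}


def register_adapter(extensions: List[str], adapter: Adapter) -> None:
    """Map each file extension (e.g. ".py") to an adapter at runtime."""
    for ext in extensions:
        _REGISTRY[ext.lower()] = adapter


def adapter_for(path: str) -> Optional[Adapter]:
    """Return the adapter registered for the file's extension, if any."""
    return _REGISTRY.get(Path(path).suffix.lower())


def supported_extensions() -> List[str]:
    """Extensions an indexer could discover automatically."""
    return sorted(_REGISTRY)


def toy_python_adapter(source: str) -> List[str]:
    """Toy adapter: collects names that follow 'def ' or 'class '."""
    names = []
    for line in source.splitlines():
        stripped = line.strip()
        for keyword in ("def ", "class "):
            if stripped.startswith(keyword):
                rest = stripped[len(keyword):]
                names.append(rest.split("(")[0].split(":")[0].strip())
    return names


# Runtime registration of a custom adapter.
register_adapter([".py"], toy_python_adapter)
```

Because lookup goes through a single mapping, fresh and incremental runs can share the same dispatch path, which is the consistency benefit the entry on incremental indexing describes.
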
0.2.0 - 2025-10-06
- Hybrid search functionality combining dense vector search and keyword search using Elasticsearch
- Reciprocal Rank Fusion (RRF) to combine search results from multiple sources
- Semantic-aware chunking using CAST-like algorithm for better context preservation
- Incremental indexing to only re-process changed files using file hashing
- `--hybrid` flag to enable hybrid search in query command
- `--chunking` option to select between symbol-based and semantic chunking
- `--similarity-threshold` parameter for controlling semantic chunking sensitivity
- `--incremental` flag for incremental indexing mode
- Comprehensive test suite for all new features
- Enhanced CLI to support new hybrid search and chunking options
- Updated indexing process to support both fresh and incremental indexing
- Improved error handling for Elasticsearch connection failures
- Refactored chunking module to support both traditional and semantic-aware chunking
- Added proper hash-based comparison for Symbol class
- Fixed Symbol hashability issue that was causing errors in dictionary lookups
- Fixed various mocking issues in tests to properly isolate functionality
- Added elasticsearch dependency for keyword search functionality
- Added sentence-transformers for semantic similarity calculations
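
As background for the Reciprocal Rank Fusion entry in this release: RRF merges several ranked lists by giving each document a score of `1 / (k + rank)` per list and summing. The sketch below illustrates the general technique only. The function name `reciprocal_rank_fusion` and the constant `k = 60` (a commonly used default) are assumptions, not necessarily what semindex uses.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def reciprocal_rank_fusion(rankings: Iterable[List[str]], k: int = 60) -> List[str]:
    """Fuse ranked lists of document ids into one list, best first."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        # Ranks are 1-based: the top hit contributes 1 / (k + 1).
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by descending score; break ties by id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


dense_hits = ["a", "b", "c"]    # e.g. from vector search
keyword_hits = ["b", "d", "a"]  # e.g. from keyword search
fused = reciprocal_rank_fusion([dense_hits, keyword_hits])
```

Documents that appear in both lists ("a" and "b" here) rise above those found by only one retriever, without needing the two score scales to be comparable.
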
0.1.0 - 2025-10-05
- Initial release of semindex
- AST-based parsing for Python code
- Vector search using FAISS
- CLI interface for indexing and querying
- Basic chunking by function/class boundaries
- SQLite for metadata storage
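
To illustrate what chunking by function/class boundaries with AST-based parsing can mean in practice, here is a minimal sketch using Python's standard `ast` module. It is an assumption-laden illustration rather than semindex's implementation: `chunk_by_definitions` is a hypothetical name, only top-level definitions are handled, and decorator lines are not included in the chunk text.

```python
import ast
from typing import List, Tuple


def chunk_by_definitions(source: str) -> List[Tuple[str, str]]:
    """Return (name, source_text) for each top-level function or class."""
    tree = ast.parse(source)
    lines = source.splitlines()
    chunks = []
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            # lineno is 1-based; end_lineno (Python 3.8+) is inclusive.
            text = "\n".join(lines[node.lineno - 1 : node.end_lineno])
            chunks.append((node.name, text))
    return chunks


example = (
    "import os\n"
    "\n"
    "def f(x):\n"
    "    return x + 1\n"
    "\n"
    "class C:\n"
    "    def m(self):\n"
    "        pass\n"
)
chunks = chunk_by_definitions(example)
```

Each resulting chunk could then be embedded and stored alongside its metadata.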