Thank you for your interest in contributing to Concept-RAG! This document provides guidelines and information for contributors.
Before creating an issue, please:
- Search existing issues to avoid duplicates
- Include relevant details:
- Node.js version (node --version)
- Operating system
- Steps to reproduce
- Expected vs actual behavior
- Error messages or logs
- Sample documents (if applicable)
Enhancement suggestions are welcome! Please:
- Check the roadmap to see if it's already planned
- Describe the use case - explain why this would be useful
- Provide examples - show how it would work
- Consider backwards compatibility - how it affects existing users
We love pull requests! Here's the process:
- Discuss major changes - Open an issue first for significant changes
- Check existing PRs - Someone might already be working on it
- Review the codebase - Understand the existing patterns and architecture
# Fork and clone the repository
git clone https://github.com/YOUR_USERNAME/concept-rag.git
cd concept-rag
# Install dependencies
npm install
# Build the project
npm run build
# Set up WordNet (required for development)
pip3 install nltk
python3 -c "import nltk; nltk.download('wordnet'); nltk.download('omw-1.4')"
# Configure environment
cp .env.example .env
# Edit .env and add your OpenRouter API key
Create a branch:
git checkout -b feature/your-feature-name
# or
git checkout -b fix/your-bug-fix
Follow code style:
- Use TypeScript strict mode
- Follow existing naming conventions
- Add type definitions for all functions
- Use meaningful variable and function names
- Keep functions focused and concise
Write tests (when applicable):
# Run tests
npm test
Update documentation:
- Update README.md if adding features
- Update USAGE.md for user-facing changes
- Add JSDoc comments for new functions
- Update api-reference.md if adding/modifying tools
Test your changes:
# Build
npm run build
# Test with sample documents
npx tsx hybrid_fast_seed.ts --dbpath ./test-db --filesdir sample-docs --overwrite
# Test with MCP Inspector
npx @modelcontextprotocol/inspector dist/conceptual_index.js ./test-db
Write clear, descriptive commit messages:
# Good examples
git commit -m "Add support for markdown document processing"
git commit -m "Fix concept extraction for documents >200k tokens"
git commit -m "Update README with new tool examples"
# Less helpful
git commit -m "Update code"
git commit -m "Fix bug"
git commit -m "Changes"
Format: <type>: <description>
Types:
- feat: New feature
- fix: Bug fix
- docs: Documentation changes
- refactor: Code refactoring
- test: Adding or updating tests
- chore: Maintenance tasks
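The format above can be checked mechanically, for example in a local commit hook. The following is a minimal sketch, not part of the project: `parseCommitSubject` and `COMMIT_TYPES` are hypothetical names, and the rule that exactly one space follows the colon is an assumption.

```typescript
// Commit types listed in this guide.
const COMMIT_TYPES = ["feat", "fix", "docs", "refactor", "test", "chore"] as const;
type CommitType = (typeof COMMIT_TYPES)[number];

interface ParsedCommit {
  type: CommitType;
  description: string;
}

// Hypothetical helper: parses "<type>: <description>" and returns null
// when the subject line does not match the format or uses an unknown type.
function parseCommitSubject(subject: string): ParsedCommit | null {
  const match = /^([a-z]+): (\S.*)$/.exec(subject.trim());
  if (match === null) return null;
  const type = match[1] ?? "";
  const description = match[2] ?? "";
  if (!(COMMIT_TYPES as readonly string[]).includes(type)) return null;
  return { type: type as CommitType, description };
}
```

For example, `parseCommitSubject("fix: handle empty documents")` yields an object with type `fix`, while `parseCommitSubject("Update code")` yields `null`.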
Push your branch:
git push origin feature/your-feature-name
Create the pull request:
- Use a clear, descriptive title
- Reference related issues (e.g., "Fixes #123")
- Describe what changed and why
- Include screenshots/examples if relevant
- List any breaking changes
PR checklist:
- Code builds successfully (npm run build)
- Tests pass (if applicable)
- Documentation updated
- No API keys or secrets in code
- Follows existing code style
- Breaking changes documented
concept-rag/
├── src/
│ ├── conceptual_index.ts # MCP server entry point
│ ├── concepts/ # Concept extraction & management
│ │ ├── concept_extractor.ts # LLM-based extraction
│ │ ├── concept_index.ts # Concept indexing
│ │ ├── concept_chunk_matcher.ts # Matching concepts to chunks
│ │ └── query_expander.ts # Query expansion with WordNet
│ ├── lancedb/ # Database clients
│ │ └── conceptual_search_client.ts
│ ├── wordnet/ # WordNet integration
│ │ └── wordnet_service.ts # Python NLTK bridge
│ └── tools/ # MCP tools
│ ├── conceptual_registry.ts # Tool registration
│ └── operations/ # Individual tools
├── hybrid_fast_seed.ts # Database seeding script
├── prompts/ # LLM prompts (editable!)
│ └── concept-extraction.txt # Concept extraction prompt
└── scripts/ # CLI utilities
    ├── extract_concepts.ts
    ├── rebuild_indexes.ts
    └── repair_missing_concepts.ts
Concept Extraction (src/concepts/concept_extractor.ts)
- Uses Claude Sonnet 4.5 for extraction
- Multi-pass processing for large documents
- Formal concept model for quality
Search Engine (src/lancedb/conceptual_search_client.ts)
- Multi-signal hybrid ranking
- Vector + BM25 + concept + WordNet
- LanceDB for vector storage
MCP Tools (src/tools/operations/)
- Each tool is a separate operation
- Registered in conceptual_registry.ts
- Follows the BaseTool interface
WordNet Integration (src/wordnet/wordnet_service.ts)
- Python subprocess bridge
- Synonym expansion
- Hierarchical concept navigation
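To make the multi-signal hybrid ranking listed under Search Engine more concrete, here is one common way to combine several signals: a normalized weighted sum. This is only a sketch under stated assumptions. The weights, the `SignalScores` shape, and the function names are hypothetical, and the actual logic in conceptual_search_client.ts may combine the signals differently.

```typescript
// The four signals named in this guide; each is assumed to be normalized to [0, 1].
interface SignalScores {
  vector: number;
  bm25: number;
  concept: number;
  wordnet: number;
}

// Illustrative weights only (an assumption, not the project's real values).
const ASSUMED_WEIGHTS: SignalScores = { vector: 0.4, bm25: 0.3, concept: 0.2, wordnet: 0.1 };

function hybridScore(scores: SignalScores, weights: SignalScores = ASSUMED_WEIGHTS): number {
  const keys: (keyof SignalScores)[] = ["vector", "bm25", "concept", "wordnet"];
  let total = 0;
  let weightSum = 0;
  for (const key of keys) {
    total += weights[key] * scores[key];
    weightSum += weights[key];
  }
  // Divide by the weight sum so the result stays in [0, 1] even if weights do not sum to 1.
  return weightSum === 0 ? 0 : total / weightSum;
}

function rankByHybridScore<T extends { scores: SignalScores }>(candidates: T[]): T[] {
  // Copy before sorting so the caller's array is not mutated.
  return [...candidates].sort((a, b) => hybridScore(b.scores) - hybridScore(a.scores));
}
```

With this shape, a new ranking signal would mean one more field in `SignalScores` and one more weight, which keeps such a change easy to review.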
- Modularity: Each component has a single responsibility
- Type Safety: Full TypeScript with strict mode
- Error Handling: Graceful degradation and clear error messages
- Performance: Incremental processing, efficient indexing
- Extensibility: Easy to add new tools or modify extraction
// ✅ Good: Clear types, descriptive names
export async function extractConcepts(
  content: string,
  model: string = DEFAULT_MODEL
): Promise<ExtractedConcepts> {
  // Implementation
}
// ❌ Avoid: Unclear types, vague names
export async function extract(c: string, m?: string): Promise<any> {
  // Implementation
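}
// ✅ Good: Specific error messages
try {
  await processDocument(path);
} catch (error) {
  // In strict mode the caught value is `unknown`, so narrow it before reading .message
  const message = error instanceof Error ? error.message : String(error);
  console.error(`Failed to process document ${path}:`, message);
  throw new Error(`Document processing failed: ${message}`);
}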
// ❌ Avoid: Silent failures or generic errors
try {
  await processDocument(path);
} catch (error) {
  // Silent failure
}
// ✅ Good: Clear JSDoc with examples
/**
 * Extracts concepts from document content using LLM.
 *
 * @param content - Document text to analyze
 * @param model - OpenRouter model name (default: Claude Sonnet 4.5)
 * @returns Extracted concepts organized by type
 *
 * @example
 * ```ts
 * const concepts = await extractConcepts(documentText);
 * console.log(concepts.primary_concepts);
 * ```
 */
export async function extractConcepts(
  content: string,
  model: string = DEFAULT_MODEL
): Promise<ExtractedConcepts> {
  // Implementation
}
Test with sample documents:
- Use sample-docs/ for testing
- Test with various PDF types (text, scanned, corrupted)
- Verify concept extraction quality
Test MCP integration:
- Use MCP Inspector for interactive testing
- Test all 10 tools
- Verify error handling
Test edge cases:
- Empty documents
- Very large documents (>100k tokens)
- Documents with special characters
- Corrupted PDFs
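Some of the edge cases above can be generated as in-memory text fixtures instead of being stored as files. The sketch below is illustrative only: `buildEdgeCaseFixtures` and `estimateTokens` are hypothetical helpers, and the figure of roughly 4 characters per token is a rough heuristic for English text, not the tokenizer the project actually uses. Corrupted PDFs are not covered here because they need real binary files.

```typescript
interface DocumentFixture {
  name: string;
  content: string;
}

// Rough heuristic (an assumption): about 4 characters per token for English text.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

function buildEdgeCaseFixtures(largeTokenTarget: number = 100000): DocumentFixture[] {
  const sentence = "Concept extraction should handle long documents gracefully. ";
  // Repeat the sentence until the estimated token count exceeds the target.
  const repeats = Math.ceil((largeTokenTarget * 4) / sentence.length) + 1;
  return [
    { name: "empty", content: "" },
    { name: "large", content: sentence.repeat(repeats) },
    { name: "special-characters", content: "Ünïcödé “quotes” 日本語 <tag> & 😀" },
  ];
}
```

Each fixture can then be fed to whatever extraction entry point is under test, with the fixture name used in the test title.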
When adding tests (future feature):
// test/concept_extraction.test.ts
describe('Concept Extraction', () => {
  it('should extract concepts from sample document', async () => {
    const concepts = await extractConcepts(sampleText);
    expect(concepts.primary_concepts.length).toBeGreaterThan(0);
  });
});
Never commit API keys!
// ✅ Good: Use environment variables
const apiKey = process.env.OPENROUTER_API_KEY;
# ✅ Good: Add to .gitignore
echo ".env" >> .gitignore
- Review dependencies before adding
- Use npm audit to check for vulnerabilities
- Keep dependencies up to date
- Never log document content
- Sanitize file paths in logs
- Be mindful of PII in error messages
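The logging guidelines above could be applied with small helpers like the following. This is a sketch, not existing project code: `sanitizePathForLog` and `describeDocumentForLog` are hypothetical names. It assumes that the file name alone is safe to log. If file names themselves may contain personal data, hashing or omitting them would be the safer choice.

```typescript
// Keeps only the file name so logs do not expose directory layout or user names.
// Handles both POSIX ("/") and Windows ("\") separators.
function sanitizePathForLog(filePath: string): string {
  const parts = filePath.split(/[\\/]/).filter((part) => part.length > 0);
  const last = parts.pop();
  return last ?? "<unknown>";
}

// Describes a document by size only; the content itself never reaches the log.
function describeDocumentForLog(filePath: string, content: string): string {
  return `${sanitizePathForLog(filePath)} (${content.length} chars)`;
}
```

A log call would then pass `describeDocumentForLog(path, content)` rather than the raw path or text.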
- Add automated tests
- Support for more document formats (DOCX, TXT, Markdown)
- Performance benchmarks
- CLI improvements
- Better error recovery
- Support for other embedding models
- Configurable concept extraction prompts
- Export formats (CSV, Excel)
- Visualization tools for concepts
- Integration with other MCP clients
- Video tutorials
- More examples
- API reference
- Architecture diagrams
- Performance optimization guide
- Web UI for exploring concepts
- Concept similarity analysis
- Document clustering
- Multi-language support
- Incremental re-indexing
- Improve documentation and examples
- Add more test cases
- Fix typos or formatting
- Update dependencies
- Add code comments
- Add support for new document formats
- Improve error handling
- Optimize database queries
- Add CLI features
- Refactor existing code
- Implement new MCP tools
- Enhance concept extraction algorithms
- Add new search ranking signals
- Optimize performance
- Design new features
- GitHub Issues: For bugs and feature requests
- Discussions: For questions and ideas
- Pull Requests: For code review and feedback
- Issues: We'll respond within 2-3 days
- Pull requests: We'll review within 1 week
- Security issues: We'll respond within 24 hours
- Be respectful and inclusive
- Welcome newcomers
- Accept constructive criticism
- Focus on what's best for the project
- Show empathy towards others
- Harassment or discrimination
- Trolling or insulting comments
- Personal or political attacks
- Publishing others' private information
- Other unprofessional conduct
By contributing to Concept-RAG, you agree that your contributions will be licensed under the MIT License.
Contributors will be:
- Credited in release notes
- Recognized in the project README
Thank you for contributing to Concept-RAG! 🎉