naveenkanaparthi-git/contextual-code-search

Contextual Code Search

Problem Statement

Developers often need to search large codebases for snippets that implement a specific piece of functionality or concept. Traditional keyword-based search matches only literal terms, so it misses code whose wording differs from the query, and finding the right snippet becomes slow and unreliable. Contextual Code Search addresses this with semantic search over vector embeddings: code snippets and natural language queries are mapped into the same vector space, and the snippets closest to the query are returned.

Features

  • Semantic Search: Find relevant code snippets based on natural language queries.
  • Advanced Embeddings: Uses transformer models to generate code embeddings.
  • API Service: Integrates search functionality into development environments via a FastAPI service.
  • Efficient Storage & Retrieval: Indexes embedding vectors with Faiss for fast, scalable similarity search.
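
The retrieval flow behind these features (embed the text, normalize, score by similarity, return the top matches) can be illustrated with a minimal sketch. It is not the project's implementation: `embed` here is a toy bag-of-words stand-in for a transformer encoder and captures no real semantics, the brute-force scoring loop stands in for a Faiss index, and the `SNIPPETS` corpus and all function names are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text: str) -> dict[str, float]:
    """Toy stand-in for a transformer encoder: L2-normalized token counts."""
    counts = Counter(re.findall(r"[a-z0-9]+", text.lower()))
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {tok: c / norm for tok, c in counts.items()} if norm else {}


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity; for unit-length vectors this is just the dot product."""
    return sum(weight * b.get(tok, 0.0) for tok, weight in a.items())


def search(query: str, corpus: list[tuple[str, str]], top_k: int = 3) -> list[tuple[float, str]]:
    """Return the top_k (score, code) pairs whose description best matches the query."""
    q = embed(query)
    scored = [(cosine(q, embed(description)), code) for description, code in corpus]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)[:top_k]


# Hypothetical corpus of (description, code) pairs, for illustration only.
SNIPPETS = [
    ("convert a list of pairs into a dictionary", "dict(pairs)"),
    ("sort a list of strings by length", "sorted(strings, key=len)"),
    ("read a file into a string", "Path(path).read_text()"),
]

results = search("transform a list into a dictionary", SNIPPETS, top_k=2)
for score, code in results:
    print(f"{score:.3f}  {code}")
```

In a setup like the one this README describes, the toy encoder would be replaced by a transformer model's embeddings and the scoring loop by a Faiss index over the normalized vectors; the embed, score, and rank steps stay the same.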

Installation

To set up the Contextual Code Search project, follow these steps:

  1. Clone the repository:

    git clone https://github.com/naveenkanaparthi-git/contextual-code-search.git
    cd contextual-code-search
  2. Build the Docker container:

    docker build -t contextual-code-search .
  3. Run the Docker container:

    docker run -p 8000:8000 contextual-code-search

Usage

Once the Docker container is running, the API is available on port 8000 and accepts search queries as JSON:

Example: Search for a Code Snippet

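import requests

# Send a natural-language query to the search endpoint exposed by the container.
response = requests.post(
    "http://localhost:8000/search",
    json={"query": "transform a list into a dictionary"},
    timeout=30,
)
response.raise_for_status()  # fail early on HTTP errors
results = response.json()

# Print each returned match.
for result in results:
    print(result)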

Project Structure

contextual-code-search/
│
├── app/
│   ├── main.py
│   ├── models.py
│   └── utils.py
│
├── data/
│   └── code_snippets.csv
│
├── tests/
│   ├── test_api.py
│   └── test_embeddings.py
│
├── Dockerfile
├── requirements.txt
└── README.md

Tech Stack

  • Python 3.11: As the primary programming language.
  • Pandas: For data manipulation and analysis.
  • Faiss: For efficient similarity search and clustering of dense vectors.
  • Transformers: For generating vector embeddings using state-of-the-art models.
  • FastAPI: For building the API service.
  • Docker: For containerization and easy deployment.

Contributing

We welcome contributions from the community! To contribute, please follow these steps:

  1. Fork the repository.
  2. Create a new branch (git checkout -b feature-branch).
  3. Make your changes and commit them (git commit -m 'Add new feature').
  4. Push to the branch (git push origin feature-branch).
  5. Open a Pull Request.

Please ensure that your code follows the project's coding standards and includes tests for any new features.

License

This project is licensed under the MIT License. See the LICENSE file for more information.
