An enterprise-grade, conversational codebase intelligence agent built on Streamlit, LangChain, and Google Gemini. It acts as an engineering copilot that can interpret legacy architectures (Fortran, COBOL, ALGOL) as well as modern languages (Rust, Python, Go), with all indexing performed locally.
Instead of forcing you to navigate a massive legacy codebase manually, the app vectorizes raw code files with the gemini-embedding-2 model and stores the indexes locally with FAISS (CPU). Questions are answered through conversation-aware Retrieval-Augmented Generation (RAG) backed by the large context window of gemini-2.5-flash.
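For orientation, the core of that ingestion path can be approximated in a few lines of LangChain. This is a minimal sketch under assumed names: the embedding model id, chunk sizes, and the `ingest_codebase` helper are illustrative, not the app's actual code.

```python
# Minimal ingestion sketch (assumed model id and chunk sizes, not the app's exact code).
from langchain_core.documents import Document
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_google_genai import GoogleGenerativeAIEmbeddings
from langchain_community.vectorstores import FAISS

def ingest_codebase(files: dict[str, str], alias: str) -> None:
    """Embed raw code files and persist a FAISS index under the codebase alias."""
    docs = [Document(page_content=src, metadata={"source": path})
            for path, src in files.items()]
    # Split large source files into overlapping chunks for retrieval.
    chunks = RecursiveCharacterTextSplitter(
        chunk_size=1000, chunk_overlap=100).split_documents(docs)
    embeddings = GoogleGenerativeAIEmbeddings(model="models/embedding-001")
    FAISS.from_documents(chunks, embeddings).save_local(f"vector_stores/{alias}")
```

Saving each index under its own alias folder is what keeps projects isolated from one another at query time.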
- Multi-Project Management Dashboard: Upload and ingest multiple distinct codebases. The UI keeps each project's file state separate, so you can query a `COBOL-Banking` node and then switch to a `Rust-Microservice` node without polluting the response-generation context.
- Generative Query Architecture (RAG): Every answer cites the local source files it was retrieved from, so you can view the exact snippet the model derived its understanding of the architecture from.
- Implicit Multilingual Abstraction: Because files are ingested as raw text, the pipeline supports any language's syntax out of the box, from C++, Dart, and LISP to plain-text documentation.
- Interactive Deletion Control: The dashboard includes a single-click Delete Codebase button for purging a specific persistent vector store once a task is complete.
- General Chat Copilot with Context Guardrails: A global `General Chat` module handles abstract programming questions and short standalone code snippets (roughly five lines or fewer). It retains the full conversation history while refusing non-programming queries (politics, cooking, and the like) through prompt guardrails.
- Stateless Automated Security Purge: To limit local caching and cloud hosting costs, the main application loop checks on every run whether midnight (00:00) has passed and, if so, wipes all stored vectors and documents (see the sketch below). No background daemon is required, which keeps the mechanism compatible with Streamlit Community Cloud instances.
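That purge can be implemented without any scheduler: on each Streamlit rerun, compare today's date against a stored marker and wipe the storage directories when the day rolls over. The following is a minimal sketch; the marker file and helper name are illustrative assumptions, not the app's exact code.

```python
# Stateless daily purge sketch: runs on every Streamlit rerun, no daemon needed.
# The marker-file mechanism is an assumption, not necessarily the app's code.
import datetime
import pathlib
import shutil

MARKER = pathlib.Path(".last_purge")

def purge_if_new_day() -> None:
    today = datetime.date.today().isoformat()
    last = MARKER.read_text().strip() if MARKER.exists() else ""
    if last != today:
        # Wipe all persisted vectors and cached uploads, then recreate the dirs.
        for d in ("vector_stores", "uploaded_data"):
            shutil.rmtree(d, ignore_errors=True)
            pathlib.Path(d).mkdir(exist_ok=True)
        MARKER.write_text(today)
```

Because Streamlit reruns the script on every interaction, calling a function like `purge_if_new_day()` at the top of `app.py` is enough to guarantee the check fires on the first interaction after midnight.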
Make sure the following prerequisites are available in your development environment:
- Python 3.10+
- An active `Google Generative AI` developer key provisioned for Flash endpoint access.
```bash
git clone https://github.com/your-username/ai-legacy-code-system.git
cd ai-legacy-code-system
```

It is strongly recommended to keep dependencies isolated in a standard Python virtual environment.
```bash
# Create and activate the virtual environment
python -m venv venv
venv\Scripts\activate  # use `source venv/bin/activate` on Unix
```
```bash
# Install the project dependencies
pip install -r requirements.txt
```

Copy the provided `.env.example` template to a `.env` file. This keeps your API credentials out of public Git history via the standard `.gitignore` entry.
```bash
cp .env.example .env
```

Then open `.env` in your editor of choice and set your API key:

```
GOOGLE_API_KEY="AIzaSyXXXXXXXXXXXXXXXXXX"
```
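Inside the app the key is then read from the process environment; with python-dotenv that looks roughly like the following (a sketch, assuming the project loads its configuration this way):

```python
# Sketch: load .env and expose the key to the Google GenAI client.
import os
from dotenv import load_dotenv

load_dotenv()  # reads GOOGLE_API_KEY from .env into the process environment
api_key = os.environ["GOOGLE_API_KEY"]
```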
Because the app manages its purge schedule internally, all you need to do is launch the UI:
```bash
streamlit run app.py
```

- `app.py`: Primary Streamlit entry point handling UI interactions, embedding pipeline runs, and the automated daily purge.
- `vector_stores/`: Hosts the persistent `.faiss` vector indexes, one per codebase alias, created on upload.
- `uploaded_data/`: Caches a mirror of each uploaded project's file tree, preserving copies of the original code.
Note: Both storage directories are created and initialized automatically at runtime.
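To illustrate how a saved index under `vector_stores/` is consumed at query time, here is a minimal retrieval sketch. The model ids, alias, `k`, and prompt are illustrative assumptions; `allow_dangerous_deserialization` is what recent LangChain versions require when loading a local FAISS index.

```python
# Query-time sketch: load a per-alias FAISS index and answer with cited sources.
# Alias, model ids, and prompt wording are assumptions for illustration.
from langchain_google_genai import GoogleGenerativeAIEmbeddings, ChatGoogleGenerativeAI
from langchain_community.vectorstores import FAISS

embeddings = GoogleGenerativeAIEmbeddings(model="models/embedding-001")
vs = FAISS.load_local("vector_stores/COBOL-Banking", embeddings,
                      allow_dangerous_deserialization=True)

question = "Where is the interest calculation performed?"
hits = vs.as_retriever(search_kwargs={"k": 4}).invoke(question)
context = "\n\n".join(f"[{d.metadata['source']}]\n{d.page_content}" for d in hits)

llm = ChatGoogleGenerativeAI(model="gemini-2.5-flash")
answer = llm.invoke(
    f"Answer from the context below and cite file names.\n{context}\n\nQ: {question}")
print(answer.content)
print("Sources:", {d.metadata["source"] for d in hits})
```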