Fullstack Gen AI PDF Context-Aware Chatbot

Overview

This is a full-stack Gen AI application that functions as a PDF context-aware chatbot. Users can upload PDF documents and ask questions related to their content. The application leverages a Large Language Model (LLM) to provide relevant answers based on the PDF context.

Setup

For Backend:

Navigate to the backend directory:
```
cd backend
```
Create a virtual environment:
```
python -m venv venv
```
Activate the virtual environment:
- Windows:
```
venv\Scripts\activate
```
- Linux/macOS:
```
source venv/bin/activate
```
Install the required dependencies:
```
pip install -r requirements.txt
```
Run the FastAPI application:
```
uvicorn main:app --reload --host 0.0.0.0 --port 8000
```

For Frontend:

Navigate to the frontend directory:
```
cd frontend
```
Install the npm dependencies:
```
npm install
```
Start the development server:
```
npm run dev
```

API Endpoints

/upload (POST)

Receives files via a POST request.
Extracts text content from the uploaded PDF using PyMuPDF.
Divides the extracted text into smaller chunks.
Generates vector embeddings for each chunk using LangChain.
Runs text extraction and embedding in separate threads so that large files and CPU-intensive work are handled efficiently.
Stores file metadata (e.g., filename, upload timestamp) in a SQLite database.
Saves the generated vector embeddings in a FAISS index.
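
The chunking and threading steps above can be illustrated with a minimal sketch. It uses only the standard library and is not the project's actual code: `chunk_text`, `process_upload`, and the `chunk_size`/`overlap` defaults are hypothetical names and values chosen for illustration, and the real backend may split text differently (for example with a LangChain text splitter).

```python
import asyncio


def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into fixed-size character chunks that overlap slightly.

    Hypothetical helper: the sizes are illustrative, not the project's settings.
    """
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this chunk already reaches the end of the text
    return chunks


async def process_upload(text: str) -> list[str]:
    """Run the blocking chunking step in a worker thread (assumed design)."""
    # asyncio.to_thread keeps the event loop free to serve other requests
    # while the blocking function runs in a separate thread.
    return await asyncio.to_thread(chunk_text, text)


if __name__ == "__main__":
    chunks = asyncio.run(process_upload("a" * 1200))
    print([len(c) for c in chunks])
```

The overlap keeps a sentence that straddles a chunk boundary visible in both neighbouring chunks, which tends to help retrieval later on.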

/ask (POST)

Receives the user's question in the POST request body.
Retrieves the text chunks whose embeddings in the FAISS index are most similar to the question.
Combines the retrieved context (from the PDF) and the user's question.
Sends this combined information to the LLM.
Returns the LLM's response to the user.
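
The retrieve-then-prompt flow above can be sketched as follows. This is a stand-in, not the project's implementation: it replaces the FAISS search with a brute-force cosine similarity over a small in-memory list, and `top_k_chunks`, `build_prompt`, the toy embeddings, and the prompt wording are all assumptions made for illustration.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k_chunks(query_vec: list[float],
                 index: list[tuple[str, list[float]]],
                 k: int = 2) -> list[str]:
    """Return the k chunks whose embeddings are most similar to the query.

    Brute-force stand-in for a vector index lookup (hypothetical helper).
    """
    ranked = sorted(
        index,
        key=lambda item: cosine_similarity(query_vec, item[1]),
        reverse=True,
    )
    return [chunk for chunk, _ in ranked[:k]]


def build_prompt(question: str, context_chunks: list[str]) -> str:
    """Combine retrieved context and the question into one prompt string."""
    context = "\n\n".join(context_chunks)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )


if __name__ == "__main__":
    # Toy index with made-up 3-dimensional embeddings.
    toy_index = [
        ("Invoices are due in 30 days.", [1.0, 0.0, 0.0]),
        ("The office is in Berlin.", [0.0, 1.0, 0.0]),
        ("Late payments incur a 2% fee.", [0.9, 0.1, 0.0]),
    ]
    chunks = top_k_chunks([1.0, 0.0, 0.0], toy_index, k=2)
    print(build_prompt("When are invoices due?", chunks))
```

In the real application the query vector would come from the same embedding model used at upload time, and the resulting prompt would be sent to the LLM.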

Tools and Frameworks

Frontend:
- Vite: A fast build tool and development server.
- Tailwind CSS: A utility-first CSS framework.
- React: A JavaScript library for building user interfaces.
Backend:
- FastAPI: A modern, high-performance web framework for building APIs with Python.
- PyMuPDF: A Python library for working with PDF and XPS documents.
- LangChain: A framework for developing applications powered by language models.
Database:
- FAISS: A library for efficient similarity search over dense vectors, used here as the vector store.
- SQLite: A lightweight, disk-based database for storing metadata.
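
As an illustration of the metadata store, here is a minimal sketch using Python's built-in `sqlite3` module. The `documents` table, its columns, and the `init_db`/`save_metadata` helpers are assumptions for illustration only; the project's actual schema is not shown in this README.

```python
import sqlite3
from datetime import datetime, timezone


def init_db(conn: sqlite3.Connection) -> None:
    """Create the (hypothetical) documents table if it does not exist yet."""
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS documents (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            filename TEXT NOT NULL,
            uploaded_at TEXT NOT NULL
        )
        """
    )
    conn.commit()


def save_metadata(conn: sqlite3.Connection, filename: str) -> int:
    """Insert one row with the filename and a UTC timestamp; return the row id."""
    uploaded_at = datetime.now(timezone.utc).isoformat()
    cur = conn.execute(
        "INSERT INTO documents (filename, uploaded_at) VALUES (?, ?)",
        (filename, uploaded_at),
    )
    conn.commit()
    return cur.lastrowid


# An in-memory database keeps the example self-contained;
# a real deployment would pass a file path instead.
conn = sqlite3.connect(":memory:")
init_db(conn)
doc_id = save_metadata(conn, "report.pdf")
```

Parameterized queries (the `?` placeholders) avoid SQL injection through user-supplied filenames.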

Demo Video

AppDemo.mp4
