HackNexus AI Document Assistant - Backend

Production-ready FastAPI backend for document Q&A using RAG (Retrieval-Augmented Generation) with Gemini AI.

Features

📄 Document Upload: PDF and DOCX support with validation
🔍 Semantic Search: FAISS vector database for fast similarity search
🤖 AI-Powered Q&A: Gemini 2.5 Flash for accurate, grounded answers
🎯 Source Citations: Returns page numbers and snippets
🔒 Error Handling: Comprehensive validation and error responses
🌐 CORS Enabled: Ready for frontend integration

Tech Stack

Framework: FastAPI + Uvicorn
AI: Google Gemini 2.0 Flash (via google-genai SDK)
Vector DB: FAISS for embedding storage
Embeddings: Sentence Transformers (all-MiniLM-L6-v2)
Document Parsing: PyPDF2, python-docx

Setup Instructions

1. Install Dependencies

cd backend
pip install -r requirements.txt

2. Configure Environment

Create a .env file in the backend directory:

GEMINI_API_KEY=your_gemini_api_key_here

Get your API key from: https://aistudio.google.com/apikey
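As a minimal sketch of how the key could be read at startup, assuming the backend pulls it from the process environment (the function name load_gemini_api_key is hypothetical and may not match what config.py actually does), failing fast on a missing or placeholder value gives a clearer error than a rejected Gemini call later on:

```python
import os


def load_gemini_api_key(env=None):
    """Return the Gemini API key, or raise a clear error if it is missing.

    `env` defaults to os.environ; passing a dict makes the function easy to test.
    """
    env = os.environ if env is None else env
    key = env.get("GEMINI_API_KEY", "").strip()
    # Treat the placeholder from the sample .env as "not configured".
    if not key or key == "your_gemini_api_key_here":
        raise RuntimeError("GEMINI_API_KEY is not set; add it to backend/.env")
    return key
```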

3. Run the Server

uvicorn app.main:app --reload --env-file .env

The API will be available at: http://localhost:8000

API Endpoints

Health Check

GET /

Response:

{
  "status": "Backend running successfully",
  "team": "HackNexus",
  "service": "AI Document Assistant"
}

Upload Document

POST /upload/
Content-Type: multipart/form-data

Request:

file: PDF or DOCX file (max 20MB)

Response:

{
  "message": "Document uploaded and indexed successfully",
  "total_chunks": 42
}
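
The upload call can be sketched from a Python client using only the standard library. This is an illustration rather than part of the project: build_upload_request is a hypothetical helper, and the form field name "file" is taken from the request description above.

```python
import urllib.request
import uuid


def build_upload_request(base_url, filename, file_bytes, content_type):
    """Build a multipart/form-data POST for the /upload/ endpoint."""
    boundary = uuid.uuid4().hex
    # One form part named "file", followed by the closing boundary.
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/upload/",
        data=head + file_bytes + tail,
        method="POST",
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
    )
```

With the server running, passing the result to urllib.request.urlopen and decoding the body as JSON should yield the message and total_chunks fields shown above.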

Ask Question

POST /query/
Content-Type: application/json

Request:

{
  "question": "What is the refund policy?"
}

Response:

{
  "answer": "Customers can request a refund within 30 days of purchase.",
  "confidence": 0.82,
  "sources": [
    {
      "page": 3,
      "snippet": "Refunds accepted within 30 days of purchase..."
    }
  ]
}
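
A client-side sketch of the same exchange, using only the standard library. The helper names build_query_request and format_answer are hypothetical; the field names come from the request and response examples above.

```python
import json
import urllib.request


def build_query_request(base_url, question):
    """Build the JSON POST for the /query/ endpoint."""
    payload = json.dumps({"question": question}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/query/",
        data=payload,
        method="POST",
        headers={"Content-Type": "application/json"},
    )


def format_answer(response):
    """Render the answer on one line, then one line per cited source."""
    lines = [f"{response['answer']} (confidence {response['confidence']:.2f})"]
    for source in response.get("sources", []):
        lines.append(f"  p.{source['page']}: {source['snippet']}")
    return "\n".join(lines)
```

With the server running, the request can be sent with urllib.request.urlopen and the decoded JSON body passed to format_answer.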

Project Structure

backend/
├── app/
│   ├── main.py              # FastAPI app with CORS
│   ├── config.py            # Environment & settings
│   ├── models/
│   │   └── schemas.py       # Pydantic models
│   ├── routes/
│   │   ├── upload.py        # Document upload endpoint
│   │   └── query.py         # Q&A endpoint
│   ├── services/
│   │   ├── document_loader.py   # PDF/DOCX parsing
│   │   ├── vector_store.py      # FAISS operations
│   │   └── qa_engine.py         # Gemini RAG pipeline
│   └── utils/
│       └── references.py    # Helper functions
├── data/
│   ├── uploads/             # Temporary file storage
│   └── vector_db/           # FAISS index storage
├── .env                     # Environment variables
├── requirements.txt         # Python dependencies
└── README.md               # This file

Key Features Explained

1. CORS Configuration

Allows frontend connections from multiple origins
Handles preflight requests
Returns proper headers even on errors

2. Document Processing

Validates file types and sizes
Extracts text page-by-page
Creates overlapping chunks for better context
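
Overlapping chunking can be sketched as a sliding window over each page's text. The sizes here (500 characters with a 100-character overlap) and the function names are assumptions for illustration; the values and splitting strategy used in document_loader.py may differ.

```python
def chunk_text(text, chunk_size=500, overlap=100):
    """Split text into windows of chunk_size characters that share `overlap` characters."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a window reaches the end, so no redundant tail chunk is emitted.
        if start + chunk_size >= len(text):
            break
    return chunks


def chunk_pages(pages, chunk_size=500, overlap=100):
    """pages is a list of (page_number, text); each chunk keeps its page number."""
    return [
        {"page": page_number, "text": chunk}
        for page_number, text in pages
        for chunk in chunk_text(text, chunk_size, overlap)
    ]
```

Keeping the page number on every chunk is what later allows an answer to cite its source page.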

3. Vector Storage

Generates 384-dim embeddings
Stores in FAISS for fast retrieval
Persists to disk automatically

4. RAG Pipeline

Question → Embedding
Similarity search → Top 3 chunks
Build context with page numbers
Gemini generates grounded answer
Return with confidence & sources
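
The retrieval and context-building steps can be sketched in plain Python. This is a stand-in, not the project's code: it ranks chunks with a brute-force cosine similarity instead of a FAISS index, it assumes each chunk already carries an embedding in a "vector" field, and the prompt wording is invented for illustration. The embedding and Gemini calls are left out.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k_chunks(query_vector, chunks, k=3):
    """Return the k chunks whose vectors are most similar to the query vector."""
    ranked = sorted(
        chunks,
        key=lambda c: cosine_similarity(query_vector, c["vector"]),
        reverse=True,
    )
    return ranked[:k]


def build_context(chunks):
    """Prefix each chunk with its page number so the answer can cite it."""
    return "\n\n".join(f"[Page {c['page']}] {c['text']}" for c in chunks)


def build_prompt(question, context):
    """Assemble a grounded prompt (hypothetical wording)."""
    return (
        "Answer using only the context below. "
        "If the answer is not in the context, say so.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
```

In the real pipeline the query vector would come from the same embedding model used at indexing time, since vectors from different models are not comparable.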

Troubleshooting

CORS Errors

Make sure your frontend's origin (scheme, host, and port) is listed in the CORS middleware configuration in app/main.py.
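
To check what the backend returns for a given origin without involving a browser, a preflight request can be reproduced by hand. The sketch below uses only the standard library; the helper names are hypothetical, and http://localhost:3000 is an assumed frontend origin.

```python
import urllib.request


def build_preflight_request(url, origin, method="POST"):
    """Build the OPTIONS request a browser sends before a cross-origin call."""
    return urllib.request.Request(
        url,
        method="OPTIONS",
        headers={
            "Origin": origin,
            "Access-Control-Request-Method": method,
            "Access-Control-Request-Headers": "content-type",
        },
    )


def origin_allowed(response_headers, origin):
    """True if the response headers permit the given origin."""
    allowed = response_headers.get("Access-Control-Allow-Origin")
    return allowed in ("*", origin)


# Usage against a running server (a rejected preflight may raise HTTPError):
# req = build_preflight_request("http://localhost:8000/query/", "http://localhost:3000")
# with urllib.request.urlopen(req) as resp:
#     print(origin_allowed(resp.headers, "http://localhost:3000"))
```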

Gemini API Errors

Check your API key in .env
Verify internet connection
Check quota limits at Google AI Studio

Upload Fails

Check file size (<20MB)
Verify file extension (.pdf or .docx)
Check backend logs for errors
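
The first two checks can be run locally before uploading. This is a sketch under assumptions: validate_upload is a hypothetical helper, 20MB is read as 20 * 1024 * 1024 bytes, and a file of exactly that size is treated as acceptable. The backend's own validation may draw these lines differently.

```python
import os

MAX_UPLOAD_BYTES = 20 * 1024 * 1024  # assumed interpretation of the 20MB limit
ALLOWED_EXTENSIONS = {".pdf", ".docx"}


def validate_upload(filename, size_bytes):
    """Return None if the file looks acceptable, otherwise an error message."""
    extension = os.path.splitext(filename)[1].lower()
    if extension not in ALLOWED_EXTENSIONS:
        return f"Unsupported file type: {extension or 'none'}"
    if size_bytes > MAX_UPLOAD_BYTES:
        return "File exceeds the 20 MB limit"
    return None
```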

No Documents Found

Upload a document first
Check that the FAISS index was created in data/vector_db/

Environment Variables

Variable	Required	Description
`GEMINI_API_KEY`	✅ Yes	Google Gemini API key

Development

Run in Debug Mode

uvicorn app.main:app --reload --env-file .env --log-level debug

View Logs

All operations print to console with emoji indicators:

✅ Success
❌ Error
⚠️ Warning
🔄 Processing
📦 Loading

Production Deployment

Before deploying to production, work through the following:

Use proper secrets management
Add authentication
Implement rate limiting
Add request logging
Use PostgreSQL for metadata
Deploy with Docker

License

MIT
