Document Intelligence System

A full-stack system that lets you upload documents, search their content, and get AI-powered summaries, key points, and question answering — built with FastAPI, React, and NVIDIA AI.

Features

Document Upload: Support for both raw text and file uploads (.txt, .pdf, etc.).
Smart Processing: Automatic text cleaning, chunking, and keyword extraction.
AI Analysis: Powered by NVIDIA AI for summaries and key point extraction.
Contextual Q&A: Ask questions about any document and get answers with a confidence level (high/medium/low).
Keyword Search: Fast search across titles, content, and extracted tags.
Modern UI: Clean, dark-themed dashboard built with React.

File Structure

doc-intelligence/
│
├── main.py                        # Backend entry point
├── app/                           # FastAPI Backend
│   ├── routes/                    # API Endpoints (Documents, AI)
│   ├── models/                    # Database Models (SQLAlchemy)
│   ├── services/                  # Business Logic (AI, Processing)
│   └── schemas.py                 # Pydantic Schemas
│
├── frontend/                      # React UI
│   ├── src/                       # React components and styles
│   └── public/                    # Static assets
│
├── documents.db                   # SQLite database
├── .env                           # Configuration (API keys)
└── requirements.txt               # Backend dependencies

Setup

1. Backend Setup

# Install dependencies
pip install -r requirements.txt

# Configure .env (Add your NVIDIA_API_KEY)
# NVIDIA_API_KEY=your_key_here
# DATABASE_URL=sqlite:///./documents.db

# Run the server
uvicorn main:app --reload

Backend runs at: http://localhost:8000
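As a rough illustration of how the two settings above could be read at startup, here is a minimal sketch that uses only the standard library. The function names `parse_env_text` and `get_settings` are hypothetical, and the project itself may load `.env` differently (for example through a dotenv package); only the variable names and the SQLite default come from this README.

```python
import os
from pathlib import Path


def parse_env_text(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines; blank lines and '#' comments are skipped."""
    values: dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def get_settings(path: str = ".env") -> dict[str, str]:
    """Hypothetical helper: real environment variables win over the .env file."""
    env_path = Path(path)
    file_values = parse_env_text(env_path.read_text(encoding="utf-8")) if env_path.is_file() else {}
    api_key = os.environ.get("NVIDIA_API_KEY", file_values.get("NVIDIA_API_KEY", ""))
    db_url = os.environ.get(
        "DATABASE_URL", file_values.get("DATABASE_URL", "sqlite:///./documents.db")
    )
    if not api_key:
        raise RuntimeError("NVIDIA_API_KEY is not set")
    return {"NVIDIA_API_KEY": api_key, "DATABASE_URL": db_url}
```

Failing early when the key is missing tends to give a clearer error than a rejected API call later on.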

2. Frontend Setup

cd frontend
npm install
npm start

Frontend runs at: http://localhost:3000

API Reference

Upload a Document

POST /documents

Accepts multipart/form-data for both raw text and file uploads.

Generate AI Summary

POST /documents/{id}/summary

Generates a 2-3 sentence summary, key points, and suggested tags.

Ask a Question

POST /documents/{id}/query

Selects relevant context by keyword overlap, then answers the question via NVIDIA AI.

Get All Documents

GET /documents
GET /documents?skip=0&limit=10
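The `skip` and `limit` parameters behave like an offset and a page size. The sketch below shows that mapping on a plain list; the function name `paginate` and the default `limit` of 10 are assumptions for illustration, and the backend presumably applies the equivalent offset and limit in its database query instead.

```python
def paginate(items: list, skip: int = 0, limit: int = 10) -> list:
    """Return at most `limit` items, starting after the first `skip` items."""
    if skip < 0 or limit < 0:
        raise ValueError("skip and limit must be non-negative")
    return items[skip : skip + limit]


# Third page of a 25-item collection, 10 items per page.
page = paginate(list(range(25)), skip=20, limit=10)
```

Requesting past the end of the collection simply yields a shorter (or empty) page rather than an error.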

Get Document by ID

GET /documents/1

Search Documents

GET /documents/search?q=IoT

Response:

[
  {
    "id": 1,
    "title": "IoT Overview",
    "keywords": ["internet", "devices", "network"],
    "snippet": "...The Internet of Things (IoT) refers to the network of physical devices...",
    "created_at": "2024-01-15T10:30:00"
  }
]

Generate AI Summary

POST /documents/1/summary

Response:

{
  "document_id": 1,
  "title": "IoT Overview",
  "summary": "This document provides an overview of the Internet of Things, explaining how physical devices connect and communicate over networks.",
  "key_points": [
    "IoT connects physical devices to the internet",
    "Sensors collect and transmit data",
    "Applications include smart homes and industrial automation"
  ],
  "suggested_tags": ["iot", "sensors", "networking", "automation"]
}
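Language models do not always return clean JSON, so a service producing the response above needs some tolerance. The following is a minimal sketch of one way to parse model output into this shape with a plain-text fallback. The function name `parse_summary_response` and the exact fallback behaviour are assumptions, not the code in `app/services/ai_service.py`.

```python
import json
import re


def parse_summary_response(raw: str) -> dict:
    """Extract a JSON object from model output; fall back to treating it as plain text."""
    fallback = {"summary": raw.strip(), "key_points": [], "suggested_tags": []}
    # Models sometimes wrap JSON in extra prose, so grab the outermost {...} span.
    match = re.search(r"\{.*\}", raw, flags=re.DOTALL)
    if not match:
        return fallback
    try:
        data = json.loads(match.group(0))
    except json.JSONDecodeError:
        return fallback
    points = data.get("key_points")
    tags = data.get("suggested_tags")
    return {
        "summary": str(data.get("summary", "")).strip(),
        "key_points": [str(p) for p in points] if isinstance(points, list) else [],
        "suggested_tags": [str(t).lower() for t in tags] if isinstance(tags, list) else [],
    }
```

With this approach a malformed reply still produces a usable `summary` field instead of a server error.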

Ask a Question

POST /documents/1/query
Content-Type: application/json

{
  "question": "What are the main applications of IoT?"
}

Response:

{
  "document_id": 1,
  "question": "What are the main applications of IoT?",
  "answer": "According to the document, IoT is applied in smart homes, industrial automation, and healthcare monitoring.",
  "confidence": "high",
  "source_chunks": [
    "IoT applications span smart homes, factories, and medical devices..."
  ]
}
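For scripting against this endpoint, the request above can be built with the Python standard library. This is a small client-side sketch: the helper names `build_query_request` and `ask` are hypothetical, and the base URL assumes the local backend address given in the Setup section.

```python
import json
import urllib.request


def build_query_request(base_url: str, document_id: int, question: str) -> urllib.request.Request:
    """Build the POST /documents/{id}/query request with a JSON body."""
    url = f"{base_url.rstrip('/')}/documents/{document_id}/query"
    body = json.dumps({"question": question}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(base_url: str, document_id: int, question: str) -> dict:
    """Send the request and decode the JSON response (requires a running backend)."""
    request = build_query_request(base_url, document_id, question)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))


# Example, with the backend running locally:
# result = ask("http://localhost:8000", 1, "What are the main applications of IoT?")
# print(result["answer"], result["confidence"])
```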

Get Query History

GET /documents/1/history

Delete a Document

DELETE /documents/1

Running Tests

pytest tests/ -v

Approach

Processing Pipeline

Each uploaded document passes through a lightweight pipeline before it is stored:

Clean: strips extra whitespace and normalizes characters.
Chunk: splits content into ~1000-character overlapping segments at sentence boundaries.
Extract keywords: frequency-based keyword extraction after removing stop words.

Two further steps persist AI results after the document is stored:

Store Summaries: generated AI summaries are saved in the DB for persistence and speed.
Save History: every user query and AI answer is logged in a history table.

This keeps the system fast and feature-rich without needing a vector database.
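The clean, chunk, and keyword steps can be sketched in a few lines of standard-library Python. This is an illustrative approximation, not the contents of `app/services/processing.py`: the function names, the NFKC normalization, the regex sentence splitter, the one-sentence overlap, and the short stop-word list are all assumptions.

```python
import re
import unicodedata
from collections import Counter

# Tiny illustrative stop-word list; a real list would be much longer.
STOP_WORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "are", "for", "on", "with"}


def clean_text(text: str) -> str:
    """Normalize Unicode forms and collapse runs of whitespace."""
    text = unicodedata.normalize("NFKC", text)
    return re.sub(r"\s+", " ", text).strip()


def chunk_text(text: str, max_chars: int = 1000, overlap_sentences: int = 1) -> list[str]:
    """Group sentences into chunks of roughly max_chars, repeating the last
    sentence(s) of each chunk at the start of the next one as overlap."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text) if s]
    chunks: list[str] = []
    current: list[str] = []
    for sentence in sentences:
        candidate = " ".join(current + [sentence])
        if current and len(candidate) > max_chars:
            chunks.append(" ".join(current))
            current = current[-overlap_sentences:] if overlap_sentences > 0 else []
        current.append(sentence)
    if current:
        chunks.append(" ".join(current))
    return chunks


def extract_keywords(text: str, top_n: int = 10) -> list[str]:
    """Return the most frequent words that are not stop words."""
    words = re.findall(r"[a-z][a-z0-9]+", text.lower())
    counts = Counter(w for w in words if w not in STOP_WORDS and len(w) > 2)
    return [word for word, _ in counts.most_common(top_n)]
```

Because chunks break only at sentence boundaries, a chunk can run slightly over `max_chars`, which matches the "~1000" wording above.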

AI Integration

NVIDIA AI handles two tasks:

Summary endpoint — given the document content, it returns a JSON with a summary, key points, and suggested tags. Results are cached in the DB so repeated calls don't re-invoke the API.
Query endpoint — the most relevant chunks are selected using keyword overlap scoring, then passed to the model with the question. It returns an answer and a confidence level (high/medium/low).
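Keyword overlap scoring for the query endpoint can be illustrated as follows. This is a sketch under stated assumptions: the scoring formula (the fraction of question terms found in a chunk), the `top_k` default, the stop-word list, and all function names are hypothetical, and the project's actual scoring may differ.

```python
import re

# Small illustrative stop-word list so that filler words do not count as matches.
STOP_WORDS = {"what", "are", "is", "the", "a", "an", "of", "and", "in", "to", "how", "does"}


def tokenize(text: str) -> set[str]:
    """Lowercase the text and return its distinct non-stop-word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower())) - STOP_WORDS


def score_chunk(question: str, chunk: str) -> float:
    """Fraction of the question's terms that also appear in the chunk."""
    question_terms = tokenize(question)
    if not question_terms:
        return 0.0
    return len(question_terms & tokenize(chunk)) / len(question_terms)


def select_chunks(question: str, chunks: list[str], top_k: int = 3) -> list[str]:
    """Return up to top_k chunks with a non-zero score, best first.
    Ties keep the original document order."""
    scored = [(score_chunk(question, chunk), index, chunk) for index, chunk in enumerate(chunks)]
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [chunk for score, _, chunk in scored[:top_k] if score > 0]
```

The selected chunks would then be placed in the prompt alongside the question and can be returned as `source_chunks` in the response.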

Search

Search uses simple string matching across title, content, and extracted keywords. It needs no vector embeddings or FAISS index, which is sufficient at this scale.
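A minimal in-memory sketch of this kind of string-matching search is shown below. It assumes case-insensitive substring matching and a fixed-width snippet around the first hit in the content; the function names and the dictionary-based document shape are hypothetical stand-ins for the real database query.

```python
def make_snippet(content: str, term: str, radius: int = 40) -> str:
    """Return text around the first occurrence of term, with '...' where it was cut."""
    index = content.lower().find(term.lower())
    if index == -1:
        # Term matched the title or keywords only; show the start of the content.
        start, end = 0, min(len(content), 2 * radius)
    else:
        start = max(0, index - radius)
        end = min(len(content), index + len(term) + radius)
    prefix = "..." if start > 0 else ""
    suffix = "..." if end < len(content) else ""
    return f"{prefix}{content[start:end]}{suffix}"


def search_documents(documents: list[dict], q: str) -> list[dict]:
    """Match q against title, content, and keywords using substring tests."""
    term = q.strip().lower()
    if not term:
        return []
    results = []
    for doc in documents:
        fields = [doc["title"], doc["content"], *doc.get("keywords", [])]
        if any(term in field.lower() for field in fields):
            results.append(
                {
                    "id": doc["id"],
                    "title": doc["title"],
                    "keywords": doc.get("keywords", []),
                    "snippet": make_snippet(doc["content"], term),
                }
            )
    return results
```

A linear scan like this is fine for a small collection; larger collections would usually push the matching into the database.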

AI Usage Notes

Where	What was used	What was customized
`app/services/ai_service.py`	NVIDIA AI API	Prompt design, JSON parsing, confidence scoring, fallback handling
`app/services/processing.py`	Written manually	Full custom implementation — chunking logic, keyword extraction, stop words list
`app/routes/`	Written manually	All routing, caching logic, history saving
README examples	AI (assisted)	All curl examples verified manually

The processing pipeline (processing.py) is entirely hand-written. AI was used only for the summary and Q&A features, where NVIDIA AI's language understanding adds real value.

Bonus Features Implemented

Confidence score on every query response (low / medium / high)
Query history saved and retrievable per document
Summary caching — avoids redundant API calls
Source chunks returned with each answer (citation-style)
Auto-generated Swagger UI at /docs
