Documentify

Documentify is a local AI SaaS platform for analyzing large documents such as legal contracts, research papers, financial reports, and policy documents. It runs entirely on open-source tooling with Flutter on the frontend and FastAPI on the backend, using Ollama, SentenceTransformers, and ChromaDB for local-first RAG.

What it does

  • Upload and process PDF, DOCX, and TXT documents
  • Generate executive summaries, key insights, and document metrics
  • Detect contract and policy risks with section-linked references
  • Run grounded Q&A over a single document or an entire knowledge base
  • Search semantically across multiple uploaded documents
  • Compare two documents for structural differences
  • Generate research study packs: simple explanations, key contributions, slides, flashcards, and quizzes
  • Run an autonomous document agent to synthesize multi-document reports
  • Export analysis reports as Markdown, text, or PDF

System Architecture

See [architecture.md](docs/architecture.md) for the diagram.

```mermaid
flowchart TD
    A[Flutter Web + Mobile] --> B[FastAPI API]
    B --> C[Document Parsing]
    C --> D[Chunking]
    D --> E[SentenceTransformers]
    E --> F[ChromaDB]
    F --> G[Retriever]
    G --> H[Ollama Local LLM]
    H --> I[Insights + Q&A + Research + Agent Reports]
```
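The retriever step in the pipeline can be illustrated with a minimal pure-Python sketch. The real project stores SentenceTransformers vectors in ChromaDB; here, brute-force cosine similarity over an in-memory list of `(chunk, vector)` pairs stands in for that, and the function names are illustrative:

```python
import math

def cosine(a, b):
    # cosine similarity between two equal-length vectors
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query_vec, index, top_k=3):
    # index: list of (chunk_text, vector) pairs; return the top_k closest chunks
    scored = sorted(index, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [text for text, _ in scored[:top_k]]
```

The retrieved chunks are what gets packed into the Ollama prompt, which is why chunking quality directly affects answer grounding.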

Folder Structure

```
Documentify/
├── lib/
│   ├── core/
│   ├── models/
│   ├── screens/
│   ├── services/
│   └── widgets/
├── backend/
│   ├── app/
│   │   ├── agent/
│   │   ├── core/
│   │   ├── document_parser/
│   │   ├── embeddings/
│   │   ├── rag/
│   │   ├── research_mode/
│   │   ├── risk_analysis/
│   │   ├── routes/
│   │   ├── schemas/
│   │   └── services/
│   ├── data/
│   ├── Dockerfile
│   └── requirements.txt
├── docs/
│   └── architecture.md
├── docker-compose.yml
└── Dockerfile.web
```

Backend Implementation

The FastAPI backend provides a local-first document intelligence pipeline:

  • Parsing: PyMuPDF, pdfplumber, python-docx, and optional local OCR via pytesseract
  • Chunking: section-aware semantic chunking with overlap
  • Embeddings: SentenceTransformers using all-MiniLM-L6-v2
  • Vector retrieval: persistent ChromaDB
  • LLM inference: Ollama with open-source models such as mistral
  • Risk analysis: domain heuristics for contracts, financial reports, and policy documents
  • Research mode: study materials and presentation outputs
  • Autonomous agent: retrieval + synthesis workflow for cross-document reporting
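The chunking-with-overlap idea can be sketched in a few lines. This is a toy word-window version; the window size, overlap, and splitting granularity here are illustrative, not the project's actual parameters:

```python
def chunk_text(text, chunk_size=200, overlap=50):
    # Split into windows of ~chunk_size words; each window overlaps the
    # previous one by `overlap` words so context survives chunk boundaries.
    # Assumes overlap < chunk_size.
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        window = words[start:start + chunk_size]
        if window:
            chunks.append(" ".join(window))
        if start + chunk_size >= len(words):
            break
    return chunks
```

Overlap trades a little index size for robustness: a sentence that straddles a boundary still appears intact in at least one chunk.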

API Endpoints

  • POST /upload-document
  • POST /process-document
  • GET /documents
  • GET /document-summary
  • GET /key-insights
  • GET /risk-analysis
  • POST /chat
  • GET /search
  • POST /research-mode
  • POST /agent-analysis
  • POST /compare-documents
  • GET /export-report
  • GET /health
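A minimal client-side sketch of how these endpoints might be addressed. The base URL matches the default local backend; the query parameter names are hypothetical, not confirmed from the API schemas:

```python
from urllib.parse import urlencode

API_BASE_URL = "http://127.0.0.1:8000"  # default local backend

def endpoint_url(path, **params):
    # Build a full URL for one of the endpoints above, e.g. GET /search.
    query = f"?{urlencode(params)}" if params else ""
    return f"{API_BASE_URL}{path}{query}"
```

For example, `endpoint_url("/search", q="indemnity")` yields `http://127.0.0.1:8000/search?q=indemnity`.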

Flutter Frontend Implementation

The Flutter app is designed as a modern AI SaaS dashboard with Material 3 and a dark visual theme:

  • 3-panel desktop layout: sidebar, semantic document viewer, AI insights rail
  • Responsive mobile/tablet experience with modal insights and dedicated chat
  • Startup-style login and onboarding screen
  • Upload workflow with processing pipeline states
  • Research mode with slides, flashcards, and quiz UI
  • Semantic viewer with section highlights and jump-to-reference behavior
  • Integrated chat for grounded document Q&A

API Integration

The frontend calls the local FastAPI backend through lib/services/api_service.dart. If the backend is not running yet, the app falls back to rich demo data so the interface still behaves like a polished product during development.

Set the backend URL at build/run time if needed:

```sh
flutter run --dart-define=API_BASE_URL=http://127.0.0.1:8000
```

Local Setup

1. Start Ollama

Install Ollama locally, then pull an open-source model:

```sh
ollama serve
ollama pull mistral
```

2. Run the backend

```sh
cd backend
python3 -m venv .venv
.venv/bin/python -m pip install -r requirements.txt
./run_dev.sh
```

Optional: enable OCR for scanned PDFs

The parser is OCR-ready and will automatically fall back to local OCR for low-text, image-heavy PDF pages when Tesseract is available.
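A sketch of what such a fallback decision can look like. The character threshold and function name are illustrative, not the parser's actual values:

```python
def should_ocr_page(extracted_text, min_chars=40):
    # A page whose text layer yields almost nothing is likely a scanned
    # image, so route it to Tesseract rather than trusting the text layer.
    return len(extracted_text.strip()) < min_chars
```

Pages that pass the threshold keep their fast native text extraction; only the image-heavy outliers pay the OCR cost.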

macOS:

```sh
brew install tesseract
```

Linux:

```sh
sudo apt-get install tesseract-ocr tesseract-ocr-eng
```

Optional environment variables:

```sh
export OCR_ENABLED=1
export OCR_LANGUAGE=eng
export OCR_RENDER_DPI=220
export OCR_MIN_CONFIDENCE=35
```

If Tesseract is installed in a non-standard location, set:

```sh
export TESSERACT_CMD=/full/path/to/tesseract
```
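One way the backend might read these variables is sketched below. The defaults mirror the values shown above, but treat the function name and the disabled-by-default assumption as illustrative rather than the project's actual config loading:

```python
import os

def load_ocr_config(env=None):
    # Read OCR settings from the environment, with the defaults shown above.
    # OCR_ENABLED defaults to off here (an assumption, not a documented fact).
    env = os.environ if env is None else env
    return {
        "enabled": env.get("OCR_ENABLED", "0") == "1",
        "language": env.get("OCR_LANGUAGE", "eng"),
        "render_dpi": int(env.get("OCR_RENDER_DPI", "220")),
        "min_confidence": int(env.get("OCR_MIN_CONFIDENCE", "35")),
        "tesseract_cmd": env.get("TESSERACT_CMD"),  # None -> use PATH lookup
    }
```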

3. Run the Flutter app

```sh
flutter pub get
flutter run -d chrome --dart-define=API_BASE_URL=http://127.0.0.1:8000
```

Docker Configuration

The repo includes:

  • [backend/Dockerfile](backend/Dockerfile)
  • [Dockerfile.web](Dockerfile.web)
  • [docker-compose.yml](docker-compose.yml)

Bring the full stack up with:

```sh
docker compose up --build
```

Services:

  • Frontend web app: http://localhost:8080
  • FastAPI backend: http://localhost:8000
  • Ollama: http://localhost:11434
  • Backend container includes tesseract-ocr for scanned-PDF OCR fallback

Notes

  • All models and libraries in this project are free and open source.
  • The backend is designed to run locally with no paid API dependency.
  • The frontend includes demo fallback data to keep the product presentable before the full local model stack is installed.
