# Documentify

Documentify is a local AI SaaS platform for analyzing large documents such as legal contracts, research papers, financial reports, and policy documents. It runs entirely on open-source tooling with Flutter on the frontend and FastAPI on the backend, using Ollama, SentenceTransformers, and ChromaDB for local-first RAG.

## What it does

- Upload and process PDF, DOCX, and TXT documents
- Generate executive summaries, key insights, and document metrics
- Detect contract and policy risks with section-linked references
- Run grounded Q&A over a single document or an entire knowledge base
- Search semantically across multiple uploaded documents
- Compare two documents for structural differences
- Generate research study packs: simple explanations, key contributions, slides, flashcards, and quizzes
- Run an autonomous document agent to synthesize multi-document reports
- Export analysis reports as Markdown, text, or PDF

## System Architecture

See [architecture.md](docs/architecture.md) for the diagram.

```mermaid
flowchart TD
    A[Flutter Web + Mobile] --> B[FastAPI API]
    B --> C[Document Parsing]
    C --> D[Chunking]
    D --> E[SentenceTransformers]
    E --> F[ChromaDB]
    F --> G[Retriever]
    G --> H[Ollama Local LLM]
    H --> I[Insights + Q&A + Research + Agent Reports]
```

## Folder Structure

```text
Documentify/
├── lib/
│   ├── core/
│   ├── models/
│   ├── screens/
│   ├── services/
│   └── widgets/
├── backend/
│   ├── app/
│   │   ├── agent/
│   │   ├── core/
│   │   ├── document_parser/
│   │   ├── embeddings/
│   │   ├── rag/
│   │   ├── research_mode/
│   │   ├── risk_analysis/
│   │   ├── routes/
│   │   ├── schemas/
│   │   └── services/
│   ├── data/
│   ├── Dockerfile
│   └── requirements.txt
├── docs/
│   └── architecture.md
├── docker-compose.yml
└── Dockerfile.web
```

## Backend Implementation

The FastAPI backend provides a local-first document intelligence pipeline:

- Parsing: PyMuPDF, pdfplumber, python-docx, and optional local OCR via pytesseract
- Chunking: section-aware semantic chunking with overlap
- Embeddings: SentenceTransformers using `all-MiniLM-L6-v2`
- Vector retrieval: persistent ChromaDB
- LLM inference: Ollama with open-source models such as `mistral`
- Risk analysis: domain heuristics for contracts, financial reports, and policy documents
- Research mode: study materials and presentation outputs
- Autonomous agent: retrieval + synthesis workflow for cross-document reporting
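The chunking step can be sketched in plain Python. The window size and overlap values below are illustrative defaults, not the backend's actual parameters, and the real section-aware chunker would additionally respect section boundaries:

```python
# Minimal sketch of chunking with overlap; chunk_size/overlap are
# illustrative values, not the backend's configured defaults.
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 100) -> list[str]:
    """Split text into fixed-size character windows that overlap,
    so context near chunk boundaries is not lost at retrieval time."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    chunks = []
    step = chunk_size - overlap
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk:
            chunks.append(chunk)
        if start + chunk_size >= len(text):
            break
    return chunks
```

Each chunk would then be embedded with SentenceTransformers and stored in ChromaDB alongside its section metadata.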

## API Endpoints

- `POST /upload-document`
- `POST /process-document`
- `GET /documents`
- `GET /document-summary`
- `GET /key-insights`
- `GET /risk-analysis`
- `POST /chat`
- `GET /search`
- `POST /research-mode`
- `POST /agent-analysis`
- `POST /compare-documents`
- `GET /export-report`
- `GET /health`

## Flutter Frontend Implementation

The Flutter app is designed as a modern AI SaaS dashboard with Material 3 and a dark visual system:

- 3-panel desktop layout: sidebar, semantic document viewer, AI insights rail
- Responsive mobile/tablet experience with modal insights and dedicated chat
- Startup-style login and onboarding screen
- Upload workflow with processing pipeline states
- Research mode with slides, flashcards, and quiz UI
- Semantic viewer with section highlights and jump-to-reference behavior
- Integrated chat for grounded document Q&A

## API Integration

The frontend calls the local FastAPI backend through `lib/services/api_service.dart`. If the backend is not running yet, the app falls back to rich demo data so the interface still behaves like a polished product during development.
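The fallback pattern can be sketched generically (in Python here for brevity; the real implementation is Dart in `lib/services/api_service.dart`, and the names below are illustrative):

```python
# Hedged sketch of the demo-fallback pattern: try the live backend,
# fall back to bundled demo data when it is unreachable. Function and
# parameter names are illustrative, not the app's actual API.
def fetch_with_fallback(fetch, demo_data):
    """Return live backend data, or demo data when the backend is down."""
    try:
        return fetch()
    except OSError:
        # Connection refused / network errors surface as OSError here.
        return demo_data
```

The key design point is that every screen renders the same way from either source, so the UI can be developed and demoed before the model stack is installed.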

Set the backend URL at build/run time if needed:

```sh
flutter run --dart-define=API_BASE_URL=http://127.0.0.1:8000
```

## Local Setup

### 1. Start Ollama

Install Ollama locally, then pull an open-source model:

```sh
ollama serve
ollama pull mistral
```

### 2. Run the backend

```sh
cd backend
python3 -m venv .venv
.venv/bin/python -m pip install -r requirements.txt
./run_dev.sh
```

### Optional: enable OCR for scanned PDFs

The parser is OCR-ready and will automatically fall back to local OCR for low-text, image-heavy PDF pages when Tesseract is available.
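The fallback decision can be sketched as a simple heuristic. The function name and character threshold below are illustrative, not the parser's actual logic:

```python
# Hedged sketch of the "low-text page" check: a page whose extracted
# text is nearly empty is likely image-heavy and worth sending to OCR.
# The 50-character threshold is an assumed value for illustration.
def needs_ocr(extracted_text: str, min_chars: int = 50) -> bool:
    """Return True when a PDF page yields too little extractable text,
    suggesting it should fall back to local Tesseract OCR."""
    return len(extracted_text.strip()) < min_chars
```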

macOS:

```sh
brew install tesseract
```

Linux:

```sh
sudo apt-get install tesseract-ocr tesseract-ocr-eng
```

Optional environment variables:

```sh
export OCR_ENABLED=1
export OCR_LANGUAGE=eng
export OCR_RENDER_DPI=220
export OCR_MIN_CONFIDENCE=35
```

If Tesseract is installed in a non-standard location, set:

```sh
export TESSERACT_CMD=/full/path/to/tesseract
```

### 3. Run the Flutter app

```sh
flutter pub get
flutter run -d chrome --dart-define=API_BASE_URL=http://127.0.0.1:8000
```

## Docker Configuration

The repo includes:

- [backend/Dockerfile](backend/Dockerfile)
- [Dockerfile.web](Dockerfile.web)
- [docker-compose.yml](docker-compose.yml)

Bring the full stack up with:

```sh
docker compose up --build
```

Services:

- Frontend web app: http://localhost:8080
- FastAPI backend: http://localhost:8000
- Ollama: http://localhost:11434

The backend container includes `tesseract-ocr`, so the scanned-PDF OCR fallback works out of the box.

## Notes

- All models and libraries in this project are free and open source.
- The backend is designed to run locally with no paid API dependency.
- The frontend includes demo fallback data to keep the product presentable before the full local model stack is installed.