MorphIQ

The operating system for document compliance workflows.

MorphIQ is a local-first document scanning and compliance platform. It takes uploaded or scanned documents, converts them into searchable PDFs, classifies them with AI, extracts structured fields, routes them through human verification, and presents approved records in a tenant-scoped portal with compliance tracking and issue handling.

Architecture

Capture -> OCR / AI Pipeline -> Review / Verification -> Portal / Compliance

ScanStation handles capture and upload.
OCR / AI Pipeline preprocesses files, runs OCR, and performs structured extraction.
ReviewStation supports verification, correction, merge, and split workflows.
Portal provides authenticated, tenant-scoped access to documents, compliance state, and issue workflows.

Tech Stack

Layer	Technology
OCR pipeline	Tesseract, OCRmyPDF, ImageMagick
AI classification & extraction	Gemini Flash
Backend / API	Python 3, Flask, Flask-Login
Database	SQLite (`portal.db`)
Document processing	pypdf, ReportLab, pdfminer
Frontend	Vanilla JS, PDF.js (no framework)
Testing	pytest, Playwright

Key Features

AI Document Pipeline

Multiple supported document types with type-specific extraction prompts
Completeness scoring and attention flags for low-confidence results
Batch re-processing support
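
As a rough illustration of completeness scoring and attention flags, the sketch below scores a document by the fraction of required fields that were filled and flags it when the score falls below a cutoff. The function names, the field names, and the 0.8 cutoff are all assumptions made for this example; the actual scoring logic in the pipeline may differ.

```python
from typing import Mapping, Optional, Sequence


def completeness(fields: Mapping[str, Optional[str]], required: Sequence[str]) -> float:
    """Fraction of required fields that have a non-empty value."""
    if not required:
        return 1.0
    filled = sum(1 for name in required if (fields.get(name) or "").strip())
    return filled / len(required)


def needs_attention(score: float, threshold: float = 0.8) -> bool:
    """Flag a document for review when its score is below the (assumed) cutoff."""
    return score < threshold


# Hypothetical extraction result with one required field left empty.
extracted = {"issue_date": "2024-01-15", "expiry_date": "", "engineer": "A. Example"}
required_fields = ["issue_date", "expiry_date", "engineer"]

score = completeness(extracted, required_fields)
print(round(score, 2), needs_attention(score))  # 0.67 True
```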

Human-in-the-Loop Verification

Side-by-side extracted fields and source document review
Verification gates for required data
Merge and split support for multi-page documents

Compliance Workflows

Certificate tracking and expiry monitoring
Property or account-level status views
Portal issue workflow for challenged documents and rework handling
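
Expiry monitoring can be pictured as a simple date comparison. The following is a minimal sketch, assuming a 30-day warning window and three status labels; neither the window nor the labels come from the project itself.

```python
from datetime import date, timedelta


def expiry_status(expiry: date, today: date, warn_days: int = 30) -> str:
    """Classify a certificate as 'expired', 'expiring_soon', or 'valid'."""
    if expiry < today:
        return "expired"
    if expiry - today <= timedelta(days=warn_days):
        return "expiring_soon"
    return "valid"


# Fixed reference date so the example is reproducible.
today = date(2025, 1, 1)
print(expiry_status(date(2024, 12, 31), today))  # expired
print(expiry_status(date(2025, 1, 20), today))   # expiring_soon
print(expiry_status(date(2025, 6, 1), today))    # valid
```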

Tenant-Scoped Portal

Authenticated access with client scoping
Document search, filter, and review flows
Pack building and export support
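
Client scoping usually comes down to filtering every query by the authenticated client's identifier. The sketch below shows the idea with the standard-library sqlite3 module and an in-memory database. The `documents` table, its columns, and the `documents_for_client` helper are hypothetical; the real `portal.db` schema is not described here.

```python
import sqlite3

# In-memory database with an assumed schema, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE documents (id INTEGER PRIMARY KEY, client_id TEXT, title TEXT)")
conn.executemany(
    "INSERT INTO documents (client_id, title) VALUES (?, ?)",
    [("client_a", "doc-1"), ("client_b", "doc-2"), ("client_a", "doc-3")],
)


def documents_for_client(db: sqlite3.Connection, client_id: str) -> list[str]:
    """Return the titles visible to one client, using a bound parameter."""
    rows = db.execute(
        "SELECT title FROM documents WHERE client_id = ? ORDER BY id",
        (client_id,),
    )
    return [title for (title,) in rows]


print(documents_for_client(conn, "client_a"))  # ['doc-1', 'doc-3']
```

Binding `client_id` as a parameter rather than formatting it into the SQL string keeps the tenant filter safe from injection.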

Setup

See docs/SETUP_GUIDE.md for full instructions.

Prerequisites

Python 3.11+
Tesseract OCR
OCRmyPDF
ImageMagick
Gemini API key (set in .env - see .env.example)

Quick start (Windows)

pip install -r requirements.txt
copy .env.example .env
REM edit .env and add your Gemini key and portal secret
Start_System_v2.bat

Project Structure

MorphIQ/Product/
|-- scan_station.html
|-- review_station.html
|-- viewer.html
|-- server.py
|-- auto_ocr_watch.py
|-- ai_prefill.py
|-- sync_to_portal.py
|-- export_client.py
|-- portal_new/
|-- Templates/
|-- Clients/                # runtime data, gitignored
|-- scripts/
|-- docs/
|-- tests/
|-- Start_System_v2.bat
|-- Stop_System.bat
`-- setup_check.bat

Running Evaluations

The eval/ package measures the AI pipeline (detection + extraction) against a synthetic golden dataset and gates merges on configurable quality thresholds.

make eval
:: or, on Windows without make:
python eval/golden/generate_golden.py
python -m eval.run_eval

run_eval builds the dataset if needed, runs the tasks in parallel, writes a report to eval/report/latest/ (open index.html), and exits non-zero if any threshold gate is breached.

Flags

:: run a single task (detection|extraction|pipeline)
python -m eval.run_eval --only detection
:: call the real Gemini API (needs GEMINI_API_KEY)
python -m eval.run_eval --live
:: parallel worker threads
python -m eval.run_eval --workers 4
:: skip writing the HTML/MD/JSON report
python -m eval.run_eval --no-report

How it works. The golden PDFs are synthetic, so the correct answer for every case is known. Recorded Gemini-style responses (with deterministic injected errors) are committed in eval/golden/manifest.json, so CI can replay them fully offline: no API key is needed and results are identical on every run. --live swaps in the real model to measure actual quality. The PDFs themselves are generated on demand and gitignored.
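
The replay idea can be sketched as follows. The manifest layout, the case ids, and the `load_recorded` and `get_response` helpers are assumptions made for illustration; the real manifest.json may be organized differently.

```python
import json
from typing import Callable, Optional

# Hypothetical manifest shape: one recorded response per case id.
# case-002 carries a deliberately wrong recorded answer (an "injected error").
MANIFEST_JSON = """
{"cases": [
  {"id": "case-001", "expected_type": "type_a", "recorded_response": {"doc_type": "type_a"}},
  {"id": "case-002", "expected_type": "type_b", "recorded_response": {"doc_type": "type_a"}}
]}
"""


def load_recorded(manifest_text: str) -> dict[str, dict]:
    """Index recorded responses by case id."""
    manifest = json.loads(manifest_text)
    return {case["id"]: case["recorded_response"] for case in manifest["cases"]}


def get_response(
    case_id: str,
    recorded: dict[str, dict],
    live_call: Optional[Callable[[str], dict]] = None,
) -> dict:
    """Replay the recorded response unless a live callable is supplied."""
    if live_call is not None:
        return live_call(case_id)
    return recorded[case_id]


recorded = load_recorded(MANIFEST_JSON)
print(get_response("case-002", recorded))  # {'doc_type': 'type_a'}
```

Because the offline path only reads committed data, two runs over the same manifest produce the same answers, injected errors included.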

Thresholds (override via environment variables; CI fails on breach):

Metric	Env var	Default
Detection accuracy	`EVAL_MIN_DETECTION_ACC`	0.90
Required-field recall	`EVAL_MIN_FIELD_RECALL`	0.85
Completeness Pearson r	`EVAL_MIN_COMPLETENESS_R`	0.85
`needs_attention` F1	`EVAL_MIN_ATTENTION_F1`	0.80

python -m eval.ci_gate re-applies the current env thresholds to the last results.json without re-running the eval.
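
A threshold gate of this kind might look like the sketch below. The env var names and defaults are taken from the table above. The metric keys, the `breached_gates` helper, and the sample numbers are hypothetical and are not real eval results.

```python
import os
from typing import Mapping, Optional

# Env var -> (assumed metric key in the results, default threshold from the table).
GATES = {
    "EVAL_MIN_DETECTION_ACC": ("detection_accuracy", 0.90),
    "EVAL_MIN_FIELD_RECALL": ("required_field_recall", 0.85),
    "EVAL_MIN_COMPLETENESS_R": ("completeness_pearson_r", 0.85),
    "EVAL_MIN_ATTENTION_F1": ("needs_attention_f1", 0.80),
}


def breached_gates(
    results: Mapping[str, float],
    env: Optional[Mapping[str, str]] = None,
) -> list[str]:
    """Return the env var name of every gate whose metric is below its threshold."""
    env = os.environ if env is None else env
    breaches = []
    for var, (metric, default) in GATES.items():
        threshold = float(env.get(var, default))
        if results[metric] < threshold:
            breaches.append(var)
    return breaches


# Illustrative numbers only.
results = {
    "detection_accuracy": 0.93,
    "required_field_recall": 0.82,
    "completeness_pearson_r": 0.91,
    "needs_attention_f1": 0.84,
}
print(breached_gates(results, {}))                                 # ['EVAL_MIN_FIELD_RECALL']
print(breached_gates(results, {"EVAL_MIN_FIELD_RECALL": "0.80"}))  # []
```

A caller would then exit non-zero when the returned list is non-empty, which is what lets CI fail the run.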

Expanding the dataset. Edit eval/golden/generate_golden.py (bump CASES_PER_TYPE, adjust EDGE_LAYOUT, or add value pools), then regenerate and commit the manifest:

python eval/golden/generate_golden.py
git add eval/golden/manifest.json

CI (.github/workflows/eval.yml) runs this offline on every PR to main and nightly, uploads the HTML report as an artifact, and posts the summary as a PR comment.

Status

Active development - pre-launch.

The core flow is operational:

capture
OCR
AI extraction
verification
portal delivery
issue handling

Current emphasis is on broadening test coverage, refining internal rework workflows, and keeping the repo clean, reusable, and deployment-ready.

Reference docs

Testing

python -m pytest tests -q
npm install
npm run playwright:install
npm run test:smoke
python scripts/scan_tracked_secrets.py

Repo Hygiene

Secrets belong in .env, not Git.
Runtime databases, logs, and generated test artifacts stay out of version control.
Sample data should stay synthetic and clearly marked as such.
Public docs should avoid internal machine paths, operational notes, and client-specific details.
