🔍 Intelligent Forensic Document Auditor (Local GenAI)

An automated data integrity tool that identifies discrepancies and unauthorized alterations in high-stakes corporate documents. It was developed as a Proof of Concept (POC) for the Digital Accelerator division, with all processing kept on the local machine to preserve data privacy and eliminate cloud-based API costs.

🎯 Target Use Case

Forensics: Identifying tampered text, modified dates, or altered financial figures in official records.
Legal/Finance: Automating the review of complex, multi-page contracts (10+ paragraphs) for data consistency.
Process Automation: Reducing manual auditing time by 90% while maintaining enterprise-grade data security.
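
To make the forensic use case concrete, here is a minimal sketch of a deterministic baseline check. It is not how this project works (the auditor relies on a local LLM with retrieval); it is a regex-based comparison swapped in purely for illustration. The names `FACT_RE`, `extract_facts`, and `diff_facts` are hypothetical, and the pattern only covers ISO-style dates and dollar amounts.

```python
import re

# Hypothetical pattern: ISO dates (2024-01-31) or dollar amounts ($1,200.50).
FACT_RE = re.compile(r"\d{4}-\d{2}-\d{2}|\$\d+(?:,\d{3})*(?:\.\d+)?")


def extract_facts(text):
    """Return every date or dollar amount in the text, in order of appearance."""
    return FACT_RE.findall(text)


def diff_facts(original, suspect):
    """Compare two versions of a document.

    Returns (missing, unexpected): facts found only in the original,
    and facts found only in the suspect copy.
    """
    original_facts = extract_facts(original)
    suspect_facts = extract_facts(suspect)
    missing = [f for f in original_facts if f not in suspect_facts]
    unexpected = [f for f in suspect_facts if f not in original_facts]
    return missing, unexpected


if __name__ == "__main__":
    original = "Payment of $1,200.50 is due on 2024-01-31."
    suspect = "Payment of $1,900.50 is due on 2024-03-31."
    missing, unexpected = diff_facts(original, suspect)
    print("Only in original:", missing)
    print("Only in suspect: ", unexpected)
```

A check like this only catches literal changes to dates and figures. Reworded or relocated clauses are the kind of alteration that the retrieval-plus-LLM approach is meant to address.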

🛠️ Tech Stack

Language: Python 3.11+
LLM Orchestration: LangChain (LCEL)
Local LLM: Llama 3.2 (via Ollama)
Embeddings: nomic-embed-text (Local via Ollama)
Vector Database: ChromaDB (with MMR search logic)
Frontend: Streamlit for data visualization and interaction

🚀 Key Technical Solutions

Privacy-First Architecture: By using Ollama, the system processes documents entirely offline, making it suitable for sensitive internal communications.
Advanced Retrieval: Uses Maximum Marginal Relevance (MMR) search with $k=6$ retrieved chunks, so that results stay relevant to the query without repeating near-identical passages in long-form documents.
Stability Engineering: Resolved Windows-specific environment conflicts, including telemetry bugs and file-locking errors, through dynamic session pathing.
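
Two of the points above can be sketched in plain Python. This is an illustration under stated assumptions, not the project's actual code. `mmr_select` shows the general MMR selection rule (in LangChain, a vector store retriever can be asked for it with `search_type="mmr"`). `new_session_dir` assumes that "dynamic session pathing" means giving each session its own freshly named persistence directory, so that a directory still locked by an earlier run is never reused. Both function names, the `chroma_db_` prefix, and the `lambda_mult=0.5` default are assumptions made for this example.

```python
import math
import tempfile
import time
from pathlib import Path


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mmr_select(query_vec, doc_vecs, k=6, lambda_mult=0.5):
    """Return indices of up to k documents chosen by Maximum Marginal Relevance.

    Each step picks the candidate that maximises
    lambda_mult * sim(query, doc) - (1 - lambda_mult) * max sim(doc, selected).
    """
    relevance = [cosine(query_vec, d) for d in doc_vecs]
    selected = []
    candidates = list(range(len(doc_vecs)))
    while candidates and len(selected) < k:

        def score(i):
            # Penalise candidates that resemble something already chosen.
            redundancy = max(
                (cosine(doc_vecs[i], doc_vecs[j]) for j in selected),
                default=0.0,
            )
            return lambda_mult * relevance[i] - (1 - lambda_mult) * redundancy

        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


def new_session_dir(base_dir, prefix="chroma_db_"):
    """Create and return a new directory named <prefix><unix timestamp>.

    If that name already exists, the number is incremented until it is free.
    """
    base = Path(base_dir)
    stamp = int(time.time())
    path = base / f"{prefix}{stamp}"
    while path.exists():
        stamp += 1
        path = base / f"{prefix}{stamp}"
    path.mkdir(parents=True)
    return path


if __name__ == "__main__":
    query = [1.0, 0.2]
    docs = [[1.0, 0.0], [0.98, 0.05], [0.6, 0.8]]  # docs 0 and 1 are near-duplicates
    print(mmr_select(query, docs, k=2))                   # [1, 2]
    print(mmr_select(query, docs, k=2, lambda_mult=1.0))  # [1, 0]
    with tempfile.TemporaryDirectory() as tmp:
        print(new_session_dir(tmp).name)
```

With `lambda_mult=1.0` the selection reduces to plain similarity ranking and returns both near-duplicates. Lowering it trades some relevance for diversity, which is the behaviour MMR is chosen for when a long document contains many similar passages.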

📋 Setup & Installation

Install Ollama: Download from ollama.com and pull the required models:
```
ollama pull llama3.2
ollama pull nomic-embed-text
```

Environment Setup:

```
python -m venv audit_env
# Windows
.\audit_env\Scripts\activate
# Install dependencies
pip install -r requirements.txt
```

Run Application:
```
streamlit run app.py
```

Developer: Sethukb
Portfolio: codes-by-sethu.github.io/PORTFOLIO/

