Omni-Doc is a robust local RAG (Retrieval-Augmented Generation) system engineered to handle massive, high-complexity technical documentation.
While standard AI loaders often fail on large-scale PDFs (3,000+ pages) due to dense tables, JSON schemas, and non-standard formatting, Omni-Doc uses a "Mechanical Split" architecture to guarantee 100% stability.
- Orchestration: LangGraph (State-machine reasoning)
- Data Framework: LlamaIndex (Data indexing & retrieval)
- LLM: Gemma 3:4B via Ollama
- Embeddings: Nomic-Embed-Text via Ollama
- Schema: Pydantic for structured, cited responses
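As a sketch of what a Pydantic schema for structured, cited responses might look like (the class and field names here are illustrative assumptions, not Omni-Doc's actual schema):

```python
from pydantic import BaseModel, Field


class Citation(BaseModel):
    """A pointer back into the source documentation."""
    source_file: str
    chunk_id: int
    quote: str = Field(description="Verbatim text supporting the answer")


class CitedAnswer(BaseModel):
    """An answer the agent may only emit together with its citations."""
    answer: str
    citations: list[Citation]
```

Validating the LLM's output against a schema like this rejects responses that omit citations before they ever reach the user.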
Omni-Doc is built to bypass the hard constraints of local embedding servers:
- Deterministic Splitting: We ignore unreliable "semantic" breaks in favor of strict, fixed-length character chunks. This ensures no single payload ever exceeds the API context limit.
- Purified Nodes: By stripping all metadata during the embedding phase, we maximize the available token space for actual content.
- Stateful Verification: Using a graph-based workflow, the agent must prove its answers exist within the documentation before responding.
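The deterministic-splitting step above can be sketched in a few lines; the chunk size and overlap values here are illustrative, not Omni-Doc's actual configuration:

```python
def mechanical_split(text: str, chunk_size: int = 1000, overlap: int = 100) -> list[str]:
    """Split text into fixed-length character chunks with a small overlap.

    No chunk can exceed chunk_size, so no single embedding payload can
    exceed the server's context limit, regardless of document structure.
    """
    if chunk_size <= overlap:
        raise ValueError("chunk_size must exceed overlap")
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]
```

Because the split ignores semantic boundaries, the overlap keeps a sentence that straddles a chunk edge fully visible in at least one chunk.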
Ensure Ollama is installed and the models are pulled:
```shell
# Start the Ollama container (substitute your container name)
sudo docker container start {name of container}

# Pull the required models
ollama pull gemma3:4b
ollama pull nomic-extract-text
```