GitHub - NYN-05/Report_Generation: Evidence-first, AI-powered report generation system using multi-agent architecture with fact extraction, knowledge graphs, and hallucination detection.

Most AI report generators hallucinate — they treat your documents as inspiration, not constraint. This project flips that: extract facts first, generate second.

Typical AI Generator	This System
Writes first, cites later	Extracts facts before writing a word
Hallucinates metrics & citations	Validates every claim against a fact store
Requires expensive cloud APIs	Runs on local Ollama — zero API cost
Black-box generation	Transparent regex + LLM pipeline

Area	What It Does
Fact Extraction	Regex discovery of metrics, algorithms, datasets, results — 13 fact types
Hybrid RAG	BM25 + vector search (ChromaDB) with CrossEncoder reranking
Knowledge Clustering	LLM-powered grouping of facts into coherent concept clusters
Hallucination Detection	Multi-pattern validation flags unsupported claims before export
Quality Scoring	Evidence fidelity, traceability, and hallucination risk scoring
DOCX Export	Cover page, TOC, evidence appendix, centralized style management
Web Enrichment	Optional DuckDuckGo / Tavily to fill evidence gaps automatically

Quick Start

pip install -r requirements.txt
ollama serve
python -m src.main "Deep Learning for Intrusion Detection" --knowledge-dir knowledge/

Flag	What It Does
`topic`	Report topic (required)
`--knowledge-dir`	Reference documents directory
`--output FILE`	Output path (default: `output/output.docx`)
`--min-coverage N`	Evidence coverage threshold (default: 0.3)
`--web-search`	Enable fact enrichment via web search
`--tavly`	Use Tavily API (requires `TAVILY_API_KEY`)

Project Structure

src/
├─ main.py                  # Pipeline orchestration
├─ facts/                   # Fact extraction, validation, store
├─ evidence/                # Web search enrichment
├─ generator/               # Ollama synthesis, blueprint, planning
├─ analysis/                # Knowledge clustering & coverage audit
├─ validation/              # Hallucination detection engine
├─ quality/                 # Pre/post generation quality metrics
├─ document/                # DOCX generation with style management
├─ ingestion/               # Parsing, chunking, embedding
├─ retrieval/               # BM25 + vector search + reranker
├─ providers/               # Ollama provider with retry & circuit breaker
├─ collection/              # Wikipedia knowledge collection
└─ core/                    # Config, logging, state, utils

Dependencies

Required	Optional
Ollama (runtime), python-docx, chromadb	sentence-transformers, rank-bm25, duckduckgo_search, docx2pdf, Jinja2

Roadmap

Done: Fact extraction, fact store, hybrid RAG, knowledge clustering, hallucination detection, quality scoring, DOCX export

Next: LLM-powered fact extraction, multi-agent orchestration, knowledge graph, dashboard, parallel execution

pytest tests/

Evidence over generation. Every sentence traces to a verifiable source.
github.com/NYN-05/Report_Generation

Name		Name	Last commit message	Last commit date
Latest commit History 46 Commits
assets		assets
knowledge		knowledge
prompts		prompts
skills		skills
src		src
tests		tests
.gitignore		.gitignore
Dockerfile		Dockerfile
README.md		README.md
docker-compose.yml		docker-compose.yml
memory_hub_state.json		memory_hub_state.json
requirements.txt		requirements.txt
run_demo.ps1		run_demo.ps1

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Quick Start

Project Structure

Dependencies

Roadmap

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Quick Start

Project Structure

Dependencies

Roadmap

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages