GitHub - RudraanshBhati/GraphRAG: An end-to-end GraphRAG system that combines semantic search and knowledge graphs using LangChain, Neo4j, and OpenAI to reason over unstructured documents.

GraphRAG: Knowledge Graph–Driven Retrieval Augmented Generation

This repository contains an end-to-end GraphRAG system that transforms unstructured documents into a queryable knowledge graph combined with semantic search.

Instead of relying only on vector similarity, this project integrates knowledge graphs and embeddings to enable deeper reasoning, structured retrieval, and more grounded AI-generated answers.

🚀 What This Project Does

Ingests real-world unstructured documents (PDFs, reports, manuals)

Splits large text into semantically meaningful chunks

Extracts entities and relationships using an LLM

Stores structured knowledge in Neo4j

Generates OpenAI embeddings for semantic search

Performs hybrid retrieval using:

Vector similarity search

Graph-based relationships

Optionally generates answers using an LLM (GraphRAG)
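The chunking step can be pictured with a small standard-library sketch. It is a simplified stand-in rather than the RecursiveCharacterTextSplitter the project uses: it cuts fixed-size windows instead of splitting on separators, and the name `chunk_text` and the sizes are assumptions chosen for illustration.

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into fixed-size windows that share `overlap` characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")

    step = chunk_size - overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks


# Hypothetical sample text; any string works.
sample = "GraphRAG links entities across chunks. " * 20
chunks = chunk_text(sample, chunk_size=120, overlap=30)
```

The overlap means the tail of one chunk reappears at the head of the next, so an entity mention that straddles a boundary is still seen whole in at least one chunk.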

🧠 Why GraphRAG?

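Traditional RAG systems use vector databases to retrieve text that is similar to the query. This works well for finding relevant passages, but it does not capture explicit relationships between entities, which limits reasoning that spans several documents or chunks.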

GraphRAG combines:

Vector embeddings → semantic relevance

Knowledge graphs → relationships and reasoning

This enables:

Multi-hop reasoning

Relationship-aware retrieval

Better explainability

More reliable answers for complex documents

🏗️ System Architecture (High-Level)

Document Loader: Loads unstructured documents using LangChain document loaders.

Text Chunking: Splits documents using RecursiveCharacterTextSplitter with overlap to preserve context.

Entity & Relation Extraction: Uses an LLM to extract structured entities and relationships in strict JSON format.

Knowledge Graph Storage (Neo4j): Stores the following:

Document nodes

Chunk nodes

Entity nodes

Relationships between entities and chunks

Embedding Generation: Converts chunks into dense vectors using OpenAI’s embedding model.

Vector Indexing: Creates a Neo4j native vector index for fast similarity search.

Hybrid Query Pipeline

Semantic retrieval using vector search

Context expansion using graph traversal

Optional answer generation using LangChain chains
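The hybrid query pipeline can be sketched in memory to show how the two retrieval stages fit together. This is an illustration under stated assumptions, not the project's code: the toy `CHUNKS` data, the three-dimensional vectors, and the function names are all hypothetical, and the real pipeline would run both stages against Neo4j.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Toy data (hypothetical): chunk id -> (embedding, entities mentioned in the chunk)
CHUNKS = {
    "c1": ([1.0, 0.0, 0.0], {"Pump A"}),
    "c2": ([0.0, 1.0, 0.0], {"Pump A", "Valve B"}),
    "c3": ([0.0, 0.0, 1.0], {"Valve B"}),
    "c4": ([0.0, 0.6, 0.8], {"Sensor C"}),
}


def vector_search(query_vec, k=1):
    """Stage 1: rank chunks by similarity to the query embedding."""
    ranked = sorted(CHUNKS, key=lambda cid: cosine(query_vec, CHUNKS[cid][0]), reverse=True)
    return ranked[:k]


def expand_via_graph(seed_ids):
    """Stage 2: add chunks that mention any entity found in the seed chunks."""
    seed_entities = set().union(*(CHUNKS[cid][1] for cid in seed_ids))
    related = [
        cid for cid, (_, entities) in CHUNKS.items()
        if cid not in seed_ids and entities & seed_entities
    ]
    return list(seed_ids) + related


def hybrid_retrieve(query_vec, k=1):
    return expand_via_graph(vector_search(query_vec, k))


print(hybrid_retrieve([0.9, 0.1, 0.0]))  # ['c1', 'c2']
```

Here `c2` is nearly orthogonal to the query and would not be returned by vector search alone; it is pulled in because it shares the entity "Pump A" with the top-ranked chunk. The sketch expands by one hop only.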

🛠️ Tech Stack

Python

LangChain – orchestration, loaders, splitters, chains

Neo4j – knowledge graph + vector index

OpenAI

text-embedding-3-large for embeddings (3072 dimensions)

gpt-4o-mini for extraction and answer generation
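Because the embedding model produces 3072-dimensional vectors, the Neo4j vector index has to be created with the same dimension. The helper below only builds the Cypher statement; it is a sketch that assumes a recent Neo4j 5 release with the `CREATE VECTOR INDEX` command, and the index name `chunk_embeddings`, the `Chunk` label, and the `embedding` property are hypothetical names, not confirmed by the repository.

```python
EMBEDDING_DIMENSIONS = 3072  # text-embedding-3-large, per the tech stack above


def vector_index_cypher(
    index_name: str = "chunk_embeddings",  # hypothetical index name
    label: str = "Chunk",                  # hypothetical node label
    prop: str = "embedding",               # hypothetical property name
    dimensions: int = EMBEDDING_DIMENSIONS,
    similarity: str = "cosine",
) -> str:
    """Build a Cypher statement that creates a vector index if it is missing."""
    return (
        f"CREATE VECTOR INDEX {index_name} IF NOT EXISTS\n"
        f"FOR (c:{label}) ON c.{prop}\n"
        "OPTIONS {indexConfig: {\n"
        f"  `vector.dimensions`: {dimensions},\n"
        f"  `vector.similarity_function`: '{similarity}'\n"
        "}}"
    )


print(vector_index_cypher())
```

If the embedding model is swapped for one with a different output size, the `dimensions` argument has to change with it, otherwise stored vectors will not match the index configuration.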

📦 Installation

git clone <repository-url>
cd graph-rag
pip install -r requirements.txt

🔐 Environment Variables

Create a .env file in the project root:

OPENAI_API_KEY=your_openai_key
NEO4J_URI=neo4j://127.0.0.1:7687
NEO4J_USERNAME=neo4j
NEO4J_PASSWORD=your_password

Make sure .env is added to .gitignore.
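A small check at startup can catch a missing variable before any API or database call is made. The sketch below uses only the standard library; `load_settings` and `REQUIRED_KEYS` are hypothetical names, and how the notebook actually reads its configuration is not stated here.

```python
import os

# The four variables listed in the .env example above.
REQUIRED_KEYS = ("OPENAI_API_KEY", "NEO4J_URI", "NEO4J_USERNAME", "NEO4J_PASSWORD")


def load_settings(env=None) -> dict:
    """Return the required settings, or raise if any is missing or empty."""
    env = os.environ if env is None else env
    missing = [key for key in REQUIRED_KEYS if not env.get(key)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {key: env[key] for key in REQUIRED_KEYS}
```

If python-dotenv is installed, calling its `load_dotenv()` before `load_settings()` would copy the values from .env into the process environment first.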

▶️ Usage (High Level)

Load and chunk documents

Extract entities and relationships

Store data in Neo4j

Generate and store embeddings

Create vector index

Query using:

Vector-only retrieval

GraphRAG pipeline with LLM
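The extraction and storage steps depend on the LLM returning well-formed JSON, so it helps to validate that output before writing to the graph. The sketch below assumes a JSON shape with `entities` and `relationships` keys and `name`, `source`, `target`, `type` fields; that schema, the `Entity` label, and the `RELATED_TO` relationship are assumptions for illustration, not the project's confirmed format.

```python
import json


def parse_extraction(raw: str) -> tuple[list[dict], list[dict]]:
    """Parse LLM output and drop relationships that point at unknown entities."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object at the top level")

    entities = [e for e in data.get("entities", []) if isinstance(e, dict) and e.get("name")]
    names = {e["name"] for e in entities}
    relationships = [
        r for r in data.get("relationships", [])
        if isinstance(r, dict)
        and r.get("source") in names
        and r.get("target") in names
        and r.get("type")
    ]
    return entities, relationships


# Cypher cannot take a relationship type as a parameter, so this sketch
# stores the extracted type as a property on a generic relationship.
MERGE_RELATIONSHIP = (
    "MERGE (a:Entity {name: $source}) "
    "MERGE (b:Entity {name: $target}) "
    "MERGE (a)-[:RELATED_TO {type: $type}]->(b)"
)

# Hypothetical LLM output; the second relationship references a missing entity.
raw = json.dumps({
    "entities": [{"name": "Pump A"}, {"name": "Valve B"}],
    "relationships": [
        {"source": "Pump A", "target": "Valve B", "type": "FEEDS"},
        {"source": "Pump A", "target": "Unknown", "type": "FEEDS"},
    ],
})
entities, relationships = parse_extraction(raw)
```

Each surviving relationship dictionary can be passed as the parameters of `MERGE_RELATIONSHIP` when running the statement through a Neo4j driver session. Using MERGE rather than CREATE keeps repeated ingestion runs from duplicating nodes.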
