RudraanshBhati/GraphRAG

GraphRAG: Knowledge Graph–Driven Retrieval Augmented Generation

This repository contains an end-to-end GraphRAG system that transforms unstructured documents into a queryable knowledge graph combined with semantic search.

Instead of relying only on vector similarity, this project integrates knowledge graphs and embeddings to enable deeper reasoning, structured retrieval, and more grounded AI-generated answers.

🚀 What This Project Does

Ingests real-world unstructured documents (PDFs, reports, manuals)

Splits large text into semantically meaningful chunks

Extracts entities and relationships using an LLM

Stores structured knowledge in Neo4j

Generates OpenAI embeddings for semantic search

Performs hybrid retrieval using:

Vector similarity search

Graph-based relationships

Optionally generates answers using an LLM (GraphRAG)
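To make the chunking step concrete, here is a minimal pure-Python sketch of splitting text into overlapping windows. It is a simplified stand-in for illustration only: it cuts at fixed character offsets rather than at semantic boundaries, and the function name `chunk_text` and its default sizes are assumptions, not values taken from this repository.

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into fixed-size character windows that overlap.

    Hypothetical helper: a fixed-window stand-in, not the splitter used
    by the project.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


if __name__ == "__main__":
    for piece in chunk_text("abcdefghij", chunk_size=4, overlap=2):
        print(piece)
```

Each chunk repeats the last `overlap` characters of the previous one, so a sentence that straddles a boundary still appears whole in at least one chunk when it is shorter than the overlap.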

🧠 Why GraphRAG?

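Traditional RAG systems use vector databases to retrieve text that is similar to the query. This works well for finding relevant passages, but it does not model explicit relationships between entities, so questions that span several passages are hard to answer.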

GraphRAG combines:

Vector embeddings → semantic relevance

Knowledge graphs → relationships and reasoning

This enables:

Multi-hop reasoning

Relationship-aware retrieval

Better explainability

More reliable answers for complex documents

🏗️ System Architecture (High-Level)

Document Loader: Loads unstructured documents using LangChain document loaders.

Text Chunking: Splits documents using RecursiveCharacterTextSplitter with overlap to preserve context.

Entity & Relation Extraction: Uses an LLM to extract structured entities and relationships in strict JSON format.

Knowledge Graph Storage (Neo4j): Stores the following:

Document nodes

Chunk nodes

Entity nodes

Relationships between entities and chunks

Embedding Generation: Converts chunks into dense vectors using OpenAI’s embedding model.

Vector Indexing: Creates a Neo4j native vector index for fast similarity search.

Hybrid Query Pipeline

Semantic retrieval using vector search

Context expansion using graph traversal

Optional answer generation using LangChain chains
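The extraction stage depends on the LLM returning strict JSON, so it helps to validate each response before writing anything to the graph. The sketch below shows one way to do that. The JSON schema (`entities` with `name`/`type`, `relationships` with `source`/`target`/`type`) and the function name `parse_extraction` are assumptions made for illustration; the repository's actual prompt and schema may differ.

```python
import json


def parse_extraction(raw: str) -> tuple[list[dict], list[dict]]:
    """Validate one LLM response and return (entities, relationships).

    Assumed (hypothetical) schema:
      {"entities": [{"name": ..., "type": ...}],
       "relationships": [{"source": ..., "target": ..., "type": ...}]}
    """
    data = json.loads(raw)  # raises json.JSONDecodeError on non-JSON output
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object at the top level")

    entities = []
    seen = set()  # lower-cased names, used to drop case-only duplicates
    for item in data.get("entities", []):
        name = str(item["name"]).strip()
        key = name.lower()
        if name and key not in seen:
            seen.add(key)
            entities.append({"name": name, "type": item.get("type", "Unknown")})

    relationships = []
    for item in data.get("relationships", []):
        source = str(item["source"]).strip()
        target = str(item["target"]).strip()
        # Keep only edges whose endpoints were both extracted as entities.
        if source.lower() in seen and target.lower() in seen:
            relationships.append(
                {"source": source, "target": target, "type": item["type"]}
            )
    return entities, relationships


if __name__ == "__main__":
    sample = json.dumps({
        "entities": [
            {"name": "Neo4j", "type": "Database"},
            {"name": "LangChain", "type": "Library"},
        ],
        "relationships": [
            {"source": "LangChain", "target": "Neo4j", "type": "CONNECTS_TO"},
        ],
    })
    print(parse_extraction(sample))
```

Dropping relationships that point at unknown entities keeps the graph free of dangling edges; a stricter pipeline might instead re-prompt the model when validation fails.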

🛠️ Tech Stack

Python

LangChain – orchestration, loaders, splitters, chains

Neo4j – knowledge graph + vector index

OpenAI

text-embedding-3-large for embeddings (3072 dimensions)

gpt-4o-mini for extraction and answer generation
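Because text-embedding-3-large produces 3072-dimensional vectors, the Neo4j vector index has to be created with the same dimension. The helper below builds a `CREATE VECTOR INDEX` statement as a string, following the Neo4j 5.x Cypher syntax as I understand it. The index name `chunk_embeddings`, the `Chunk` label, and the `embedding` property are hypothetical names, and the supported maximum dimension depends on the Neo4j version, so check this against your server before relying on it.

```python
def vector_index_cypher(
    index_name: str,
    label: str,
    prop: str,
    dimensions: int,
    similarity: str = "cosine",
) -> str:
    """Build a Cypher statement that creates a Neo4j vector index.

    All names passed in are illustrative; they are not taken from the repo.
    """
    if similarity not in {"cosine", "euclidean"}:
        raise ValueError("similarity must be 'cosine' or 'euclidean'")
    if dimensions <= 0:
        raise ValueError("dimensions must be positive")
    # Identifiers are interpolated into the query, so restrict them to
    # plain identifier characters to avoid malformed or injected Cypher.
    for ident in (index_name, label, prop):
        if not ident.isidentifier():
            raise ValueError(f"invalid identifier: {ident!r}")

    return (
        f"CREATE VECTOR INDEX {index_name} IF NOT EXISTS "
        f"FOR (n:{label}) ON (n.{prop}) "
        "OPTIONS {indexConfig: {"
        f"`vector.dimensions`: {dimensions}, "
        f"`vector.similarity_function`: '{similarity}'"
        "}}"
    )


if __name__ == "__main__":
    print(vector_index_cypher("chunk_embeddings", "Chunk", "embedding", 3072))
```

The resulting string can be passed to whatever Neo4j client the project uses. If the index dimension and the embedding dimension disagree, similarity queries will not work as intended.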

📦 Installation

git clone [repository URL]
cd graph-rag
pip install -r requirements.txt

🔐 Environment Variables

Create a .env file in the project root:

OPENAI_API_KEY=your_openai_key
NEO4J_URI=neo4j://127.0.0.1:7687
NEO4J_USERNAME=neo4j
NEO4J_PASSWORD=your_password

Make sure .env is added to .gitignore.
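A small startup check can catch a missing variable before any API call is made. This is a sketch under assumptions: the function name `load_settings` is hypothetical, and it reads from the process environment, so the values from `.env` must already be loaded (for example with the python-dotenv package, if the project uses it).

```python
import os
from typing import Mapping

# The four variables listed in the .env file above.
REQUIRED_KEYS = ("OPENAI_API_KEY", "NEO4J_URI", "NEO4J_USERNAME", "NEO4J_PASSWORD")


def load_settings(env: Mapping[str, str] = os.environ) -> dict[str, str]:
    """Return the required settings, failing fast if any are missing or empty.

    Hypothetical helper. `env` defaults to the process environment; a plain
    dict can be passed instead, which is convenient in tests.
    """
    missing = [key for key in REQUIRED_KEYS if not env.get(key)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {key: env[key] for key in REQUIRED_KEYS}


if __name__ == "__main__":
    try:
        settings = load_settings()
        print("Connecting to", settings["NEO4J_URI"])
    except RuntimeError as err:
        print(err)
```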

▶️ Usage (High Level)

Load and chunk documents

Extract entities and relationships

Store data in Neo4j

Generate and store embeddings

Create vector index

Query using:

Vector-only retrieval

GraphRAG pipeline with LLM
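The difference between the two query modes can be illustrated without a database. The sketch below uses in-memory dictionaries as a stand-in for Neo4j: a cosine-similarity ranking plays the role of the vector index, and a one-hop lookup through shared entities plays the role of graph traversal. All names and the toy data are hypothetical, and the real pipeline would run these steps as Neo4j queries.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def vector_search(query_vec, chunk_vecs, k=2):
    """Vector-only retrieval: ids of the k chunks most similar to the query."""
    ranked = sorted(
        chunk_vecs,
        key=lambda cid: cosine(query_vec, chunk_vecs[cid]),
        reverse=True,
    )
    return ranked[:k]


def expand_with_graph(seed_chunks, chunk_entities, entity_chunks):
    """Graph expansion: add chunks that share an entity with a seed chunk."""
    result = list(seed_chunks)
    for cid in seed_chunks:
        for entity in chunk_entities.get(cid, []):
            for neighbor in entity_chunks.get(entity, []):
                if neighbor not in result:
                    result.append(neighbor)
    return result


# Toy data standing in for stored embeddings and chunk-entity relationships.
chunk_vecs = {"c1": [1.0, 0.0], "c2": [0.0, 1.0], "c3": [0.7, 0.7]}
chunk_entities = {"c1": ["Neo4j"], "c2": ["Neo4j", "OpenAI"], "c3": ["LangChain"]}
entity_chunks = {"Neo4j": ["c1", "c2"], "OpenAI": ["c2"], "LangChain": ["c3"]}

if __name__ == "__main__":
    seeds = vector_search([1.0, 0.1], chunk_vecs, k=1)
    print("vector-only:", seeds)
    print("with graph expansion:", expand_with_graph(seeds, chunk_entities, entity_chunks))
```

In this toy data, chunk `c2` is nearly orthogonal to the query vector, so vector-only retrieval with `k=1` misses it. Graph expansion still reaches it because it mentions the same entity as `c1`. The expanded chunk set is what would be handed to the LLM as context in the GraphRAG mode.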
