GraphRAG: Knowledge Graph–Driven Retrieval Augmented Generation
This repository contains an end-to-end GraphRAG system that transforms unstructured documents into a queryable knowledge graph combined with semantic search.
Instead of relying only on vector similarity, this project integrates knowledge graphs and embeddings to enable deeper reasoning, structured retrieval, and more grounded AI-generated answers.
🚀 What This Project Does
Ingests real-world unstructured documents (PDFs, reports, manuals)
Splits large text into semantically meaningful chunks
Extracts entities and relationships using an LLM
Stores structured knowledge in Neo4j
Generates OpenAI embeddings for semantic search
Performs hybrid retrieval using:
Vector similarity search
Graph-based relationships
Optionally generates answers using an LLM (GraphRAG)
🧠 Why GraphRAG?
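Traditional RAG systems use vector databases to retrieve text that is semantically similar to the query. This works well for locating relevant passages, but similarity search alone does not represent explicit relationships between entities, which makes questions that span several connected facts harder to answer.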
GraphRAG combines:
Vector embeddings → semantic relevance
Knowledge graphs → relationships and reasoning
This enables:
Multi-hop reasoning
Relationship-aware retrieval
Better explainability
More reliable answers for complex documents
🏗️ System Architecture (High-Level)
Document Loader: Loads unstructured documents using LangChain document loaders.
Text Chunking: Splits documents using RecursiveCharacterTextSplitter with overlap to preserve context.
Entity & Relation Extraction: Uses an LLM to extract structured entities and relationships in strict JSON format.
Knowledge Graph Storage (Neo4j): Stores the following in the graph:
Document nodes
Chunk nodes
Entity nodes
Relationships between entities and chunks
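One way to picture the storage step is as a function that turns a single chunk's extraction result into parameterized Cypher `MERGE` statements. The sketch below is an illustration under assumptions, not the repository's code: the labels `Document`, `Chunk`, and `Entity`, the relationship types `HAS_CHUNK`, `MENTIONS`, and `RELATED_TO`, the property names, and the JSON keys `entities` and `relations` are all hypothetical. It only builds the statements; executing them would require a Neo4j driver session.

```python
import json


def build_merge_statements(doc_id, chunk_id, chunk_text, extraction_json):
    """Turn one chunk's extraction result into (cypher, params) pairs.

    Assumed (hypothetical) JSON shape:
    {"entities": [{"name": ..., "type": ...}],
     "relations": [{"source": ..., "type": ..., "target": ...}]}
    """
    data = json.loads(extraction_json)
    statements = []

    # Document and chunk nodes, linked by an assumed HAS_CHUNK relationship.
    statements.append((
        "MERGE (d:Document {id: $doc_id}) "
        "MERGE (c:Chunk {id: $chunk_id}) "
        "SET c.text = $text "
        "MERGE (d)-[:HAS_CHUNK]->(c)",
        {"doc_id": doc_id, "chunk_id": chunk_id, "text": chunk_text},
    ))

    # One entity node per extracted entity, linked to the chunk that mentions it.
    for entity in data.get("entities", []):
        statements.append((
            "MATCH (c:Chunk {id: $chunk_id}) "
            "MERGE (e:Entity {name: $name}) "
            "SET e.type = $type "
            "MERGE (c)-[:MENTIONS]->(e)",
            {
                "chunk_id": chunk_id,
                "name": entity["name"],
                "type": entity.get("type", "Unknown"),
            },
        ))

    # Entity-to-entity relations. The extracted relation type is stored as a
    # property because a plain $param cannot stand in for a relationship type.
    for rel in data.get("relations", []):
        statements.append((
            "MERGE (a:Entity {name: $source}) "
            "MERGE (b:Entity {name: $target}) "
            "MERGE (a)-[r:RELATED_TO {type: $type}]->(b)",
            {"source": rel["source"], "target": rel["target"], "type": rel["type"]},
        ))
    return statements
```

Using `MERGE` rather than `CREATE` keeps the load idempotent, so re-running ingestion on the same document should not duplicate nodes.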
Embedding Generation: Converts chunks into dense vectors using OpenAI’s embedding model.
Vector Indexing: Creates a Neo4j native vector index for fast similarity search.
Hybrid Query Pipeline:
Semantic retrieval using vector search
Context expansion using graph traversal
Optional answer generation using LangChain chains
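The two retrieval stages of the hybrid query pipeline can be illustrated without any external service. The following is a minimal in-memory sketch, assuming chunks are held in plain dictionaries; the names `cosine`, `hybrid_retrieve`, `chunk_vecs`, and `chunk_entities` are hypothetical. In the real system the similarity search and the traversal would run inside Neo4j, and the toy two-dimensional vectors stand in for real embeddings.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def hybrid_retrieve(query_vec, chunk_vecs, chunk_entities, k=1):
    """Return (seed_ids, expanded_ids).

    chunk_vecs:     {chunk_id: embedding vector}
    chunk_entities: {chunk_id: set of entity names mentioned in the chunk}
    """
    # Stage 1: semantic retrieval, ranking chunks by cosine similarity.
    ranked = sorted(
        chunk_vecs,
        key=lambda cid: cosine(query_vec, chunk_vecs[cid]),
        reverse=True,
    )
    seeds = ranked[:k]

    # Stage 2: context expansion, one hop through shared entities.
    seed_entities = set()
    for cid in seeds:
        seed_entities |= chunk_entities.get(cid, set())
    expanded = [
        cid
        for cid in chunk_vecs
        if cid not in seeds and chunk_entities.get(cid, set()) & seed_entities
    ]
    return seeds, expanded


chunk_vecs = {"c1": [1.0, 0.0], "c2": [0.0, 1.0], "c3": [0.1, 0.9]}
chunk_entities = {"c1": {"Neo4j"}, "c2": {"Neo4j", "Cypher"}, "c3": {"OpenAI"}}
seeds, expanded = hybrid_retrieve([0.9, 0.1], chunk_vecs, chunk_entities, k=1)
print(seeds, expanded)  # ['c1'] ['c2']
```

Here `c2` scores lowest on vector similarity, yet it is pulled into the context because it shares the entity `Neo4j` with the top hit. That is the kind of relationship-aware retrieval the graph adds on top of plain vector search.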
🛠️ Tech Stack
Python
LangChain – orchestration, loaders, splitters, chains
Neo4j – knowledge graph + vector index
OpenAI
text-embedding-3-large for embeddings (3072 dimensions)
gpt-4o-mini for extraction and answer generation
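The embedding dimension matters because the Neo4j vector index has to be created with the same size as the vectors it stores. As a sketch, the helper below assembles a `CREATE VECTOR INDEX` statement in the form used by recent Neo4j 5 releases. The index name `chunk_embeddings`, the `Chunk` label, the `embedding` property, and the choice of cosine similarity are assumptions, not values confirmed by this repository.

```python
EMBEDDING_DIMENSIONS = 3072  # output size of text-embedding-3-large


def vector_index_cypher(
    index_name="chunk_embeddings",  # hypothetical index name
    label="Chunk",                  # hypothetical node label
    prop="embedding",               # hypothetical property holding the vector
    dimensions=EMBEDDING_DIMENSIONS,
):
    """Build a Cypher statement that creates a vector index if it is missing."""
    return (
        f"CREATE VECTOR INDEX {index_name} IF NOT EXISTS "
        f"FOR (c:{label}) ON (c.{prop}) "
        "OPTIONS {indexConfig: {"
        f"`vector.dimensions`: {dimensions}, "
        "`vector.similarity_function`: 'cosine'}}"
    )


print(vector_index_cypher())
```

If the embedding model is swapped for one with a different output size, the index would need to be dropped and recreated with the matching `dimensions` value.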
📦 Installation
git clone <repository-url>
cd graph-rag
pip install -r requirements.txt
🔐 Environment Variables
Create a .env file in the project root:
OPENAI_API_KEY=your_openai_key
NEO4J_URI=neo4j://127.0.0.1:7687
NEO4J_USERNAME=neo4j
NEO4J_PASSWORD=your_password
Make sure .env is added to .gitignore.
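These values have to reach the process environment before the pipeline can use them. A package such as python-dotenv is a common way to load a `.env` file, though whether this repository uses it is an assumption. The sketch below covers only the validation step, using the standard library: it reads the four variables listed above and fails early with a clear message if any are missing. The `Settings` class and `settings_from_env` function are hypothetical names.

```python
import os
from dataclasses import dataclass

REQUIRED_KEYS = ("OPENAI_API_KEY", "NEO4J_URI", "NEO4J_USERNAME", "NEO4J_PASSWORD")


@dataclass(frozen=True)
class Settings:
    openai_api_key: str
    neo4j_uri: str
    neo4j_username: str
    neo4j_password: str


def settings_from_env(env=None):
    """Read the required variables from a mapping (defaults to os.environ)."""
    env = os.environ if env is None else env
    # Treat unset and empty values the same way: both count as missing.
    missing = [key for key in REQUIRED_KEYS if not env.get(key)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return Settings(
        openai_api_key=env["OPENAI_API_KEY"],
        neo4j_uri=env["NEO4J_URI"],
        neo4j_username=env["NEO4J_USERNAME"],
        neo4j_password=env["NEO4J_PASSWORD"],
    )
```

Accepting a mapping argument keeps the function easy to test with a plain dictionary instead of the real environment.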
▶️ Usage
Run the pipeline in this order:
Load and chunk documents
Extract entities and relationships
Store data in Neo4j
Generate and store embeddings
Create vector index
Query using:
Vector-only retrieval
GraphRAG pipeline with LLM
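The difference between the two query modes can be expressed as two Cypher queries against the vector index. This is a sketch under assumptions: it uses Neo4j's `db.index.vector.queryNodes` procedure, while the `MENTIONS` relationship, the `Entity` and `Chunk` labels, the `text` and `name` properties, and the `build_query` helper are hypothetical and may not match the repository's schema.

```python
# Vector-only retrieval: the top-k chunks by embedding similarity.
VECTOR_ONLY = (
    "CALL db.index.vector.queryNodes($index_name, $k, $embedding) "
    "YIELD node, score "
    "RETURN node.text AS text, score "
    "ORDER BY score DESC"
)

# Graph-expanded retrieval: the same top-k chunks, plus the entities they
# mention and other chunks that mention those entities.
GRAPH_EXPANDED = (
    "CALL db.index.vector.queryNodes($index_name, $k, $embedding) "
    "YIELD node, score "
    "OPTIONAL MATCH (node)-[:MENTIONS]->(e:Entity)<-[:MENTIONS]-(neighbor:Chunk) "
    "WHERE neighbor <> node "
    "RETURN node.text AS text, score, "
    "collect(DISTINCT e.name) AS entities, "
    "collect(DISTINCT neighbor.text) AS related_chunks "
    "ORDER BY score DESC"
)


def build_query(mode, index_name, embedding, k=5):
    """Return (cypher, params) for the 'vector' or 'graph' retrieval mode."""
    queries = {"vector": VECTOR_ONLY, "graph": GRAPH_EXPANDED}
    if mode not in queries:
        raise ValueError(f"Unknown mode: {mode!r}")
    return queries[mode], {"index_name": index_name, "k": k, "embedding": embedding}
```

In the GraphRAG mode, the returned `text`, `entities`, and `related_chunks` would then be assembled into the context passed to the LLM for answer generation.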