lightrag-bigquery

Google Cloud BigQuery storage backend for LightRAG.

This package provides four BigQuery-backed storage classes as an external plugin; no modifications to the LightRAG source code are required.

Storage	Class	Description
KV	`BigQueryKVStorage`	Key-value storage with JSON serialization
Vector	`BigQueryVectorStorage`	Vector storage with cosine similarity search
Graph	`BigQueryGraphStorage`	Graph storage with BigQuery Property Graph support
DocStatus	`BigQueryDocStatusStorage`	Document processing status tracking

Installation

pip install lightrag-hku
pip install git+https://github.com/ksmin23/lightrag-bigquery.git@v0.1.0

Quick Start

import asyncio
import lightrag_bigquery
from lightrag import LightRAG, QueryParam
from lightrag.llm.openai import gpt_4o_mini_complete, openai_embed

# Register BigQuery storage classes with LightRAG
lightrag_bigquery.register()

async def main():
    rag = LightRAG(
        working_dir="./rag_storage",
        llm_model_func=gpt_4o_mini_complete,
        embedding_func=openai_embed,
        kv_storage="BigQueryKVStorage",
        vector_storage="BigQueryVectorStorage",
        graph_storage="BigQueryGraphStorage",
        doc_status_storage="BigQueryDocStatusStorage",
        addon_params={
            "bigquery_project_id": "my-project",
            # Dataset IDs may contain only letters, digits, and underscores
            "bigquery_dataset_id": "my_dataset",
        },
    )

    await rag.initialize_storages()
    await rag.ainsert("Your document text here")
    result = await rag.aquery("Your question", param=QueryParam(mode="hybrid"))
    print(result)
    await rag.finalize_storages()

asyncio.run(main())

Configuration

BigQuery connection settings can be provided via addon_params or environment variables. An environment variable is used as a fallback only when the corresponding addon_params key is not set.

addon_params key	Environment Variable	Description
`bigquery_project_id`	`BIGQUERY_PROJECT` or `GOOGLE_CLOUD_PROJECT`	GCP project ID
`bigquery_dataset_id`	`BIGQUERY_DATASET`	BigQuery dataset ID
`bigquery_graph_name`	`BIGQUERY_GRAPH_NAME`	Property graph name (default: `lightrag_knowledge_graph`)
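
The fallback order in the table can be sketched as follows. This is a minimal illustration, not the package's actual code: the function name `resolve_bigquery_config` and the returned dictionary keys are hypothetical, and checking `BIGQUERY_PROJECT` before `GOOGLE_CLOUD_PROJECT` is an assumption based on the order in which the table lists them.

```python
import os
from typing import Mapping, Optional

DEFAULT_GRAPH_NAME = "lightrag_knowledge_graph"  # default from the table above


def resolve_bigquery_config(
    addon_params: Mapping[str, str],
    env: Optional[Mapping[str, str]] = None,
) -> dict:
    """Hypothetical helper: addon_params win, environment variables are the fallback."""
    env = os.environ if env is None else env

    def pick(key, env_names, default=None):
        # 1) explicit addon_params value
        value = addon_params.get(key)
        if value:
            return value
        # 2) first non-empty environment variable (order is an assumption)
        for name in env_names:
            if env.get(name):
                return env[name]
        # 3) built-in default, if any
        return default

    return {
        "project_id": pick("bigquery_project_id", ("BIGQUERY_PROJECT", "GOOGLE_CLOUD_PROJECT")),
        "dataset_id": pick("bigquery_dataset_id", ("BIGQUERY_DATASET",)),
        "graph_name": pick("bigquery_graph_name", ("BIGQUERY_GRAPH_NAME",), DEFAULT_GRAPH_NAME),
    }


config = resolve_bigquery_config(
    {"bigquery_project_id": "my-project"},
    env={"GOOGLE_CLOUD_PROJECT": "other-project", "BIGQUERY_DATASET": "my_dataset"},
)
print(config)
```

Here the project ID comes from addon_params even though the environment also defines one, the dataset ID falls back to the environment, and the graph name falls back to its default.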

Using Environment Variables

export GOOGLE_CLOUD_PROJECT=my-project
export BIGQUERY_DATASET=my_dataset

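import lightrag_bigquery
from lightrag import LightRAG

# Project and dataset are read from the environment variables exported above
lightrag_bigquery.register()
rag = LightRAG(
    kv_storage="BigQueryKVStorage",
    vector_storage="BigQueryVectorStorage",
    graph_storage="BigQueryGraphStorage",
    doc_status_storage="BigQueryDocStatusStorage",
    # ... other arguments (working_dir, llm_model_func, embedding_func) as in Quick Start
)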

LLM Authentication

LLM and embedding authentication is handled by LightRAG core, not by this package. Choose one of the following options depending on your LLM provider:

Option	Environment Variable	Description
Gemini via Vertex AI	`GOOGLE_GENAI_USE_VERTEXAI=true`	Uses Application Default Credentials (ADC). No API key needed. Recommended on GCP.
Gemini via AI Studio	`GEMINI_API_KEY`	Uses a Gemini API key from AI Studio.
OpenAI	`OPENAI_API_KEY`	Uses an OpenAI API key.

Note: When using Vertex AI mode, LightRAG's gemini.py checks that the value is the string "true", compared case-insensitively ("True" and "TRUE" also work). Values like "1" or "yes" will not activate Vertex AI mode.
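
The behaviour described in the note can be illustrated with a small sketch. This is not the code from LightRAG's gemini.py; the function name `vertex_ai_enabled` is hypothetical and only mirrors the comparison described above.

```python
import os
from typing import Mapping, Optional


def vertex_ai_enabled(env: Optional[Mapping[str, str]] = None) -> bool:
    """Hypothetical check: only the string "true" (any letter case) enables Vertex AI."""
    env = os.environ if env is None else env
    return env.get("GOOGLE_GENAI_USE_VERTEXAI", "").lower() == "true"


for value in ("true", "TRUE", "1", "yes"):
    print(value, vertex_ai_enabled({"GOOGLE_GENAI_USE_VERTEXAI": value}))
```

Under this comparison, "true" and "TRUE" enable Vertex AI mode, while "1" and "yes" do not.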

Prerequisites

GCP Authentication

gcloud auth application-default login

Or with a service account key:

export GOOGLE_APPLICATION_CREDENTIALS=/path/to/service-account-key.json

BigQuery Dataset

The dataset is created automatically during initialization (CREATE SCHEMA IF NOT EXISTS). Tables and the property graph are also created automatically.
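
As a rough sketch of the DDL involved, the statement can be assembled as below. The helper name `create_schema_sql` is hypothetical and the package may build its statement differently; the validation reflects BigQuery's rule that dataset IDs contain only letters, digits, and underscores (up to 1,024 characters), so an ID such as `my-dataset` is rejected.

```python
import re

# BigQuery dataset IDs: letters, digits, and underscores, up to 1,024 characters
_DATASET_ID = re.compile(r"[A-Za-z0-9_]{1,1024}")


def create_schema_sql(project_id: str, dataset_id: str) -> str:
    """Hypothetical helper that builds the idempotent dataset-creation statement."""
    if not _DATASET_ID.fullmatch(dataset_id):
        raise ValueError(f"invalid BigQuery dataset ID: {dataset_id!r}")
    # Backticks are needed because project IDs usually contain hyphens
    return f"CREATE SCHEMA IF NOT EXISTS `{project_id}.{dataset_id}`"


print(create_schema_sql("my-project", "lightrag"))
```

Because of `IF NOT EXISTS`, running the statement against an existing dataset is a no-op, which is what makes automatic creation safe to repeat on every initialization.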

If you prefer to create the dataset manually:

export BIGQUERY_DATASET=lightrag

bq --location=US mk --dataset $GOOGLE_CLOUD_PROJECT:$BIGQUERY_DATASET

Project Structure

lightrag-bigquery/
├── pyproject.toml
├── src/
│   └── lightrag_bigquery/
│       ├── __init__.py                         # register() and public exports
│       ├── client.py                           # BigQueryClientManager and helpers
│       └── storage.py                          # All 4 storage class implementations
└── examples/
    ├── .env.example                            # Environment variable template
    ├── _config.py                              # Shared configuration loader
    ├── requirements.txt
    ├── basic_usage.py
    ├── env_var_config.py
    ├── batch_insert_and_query.py
    └── knowledge_graph_exploration.py

Design Decisions

Decision	Approach	Rationale
Sync vs Async	Synchronous BigQuery SDK wrapped with `asyncio.to_thread`	BigQuery Python SDK is synchronous; avoids blocking the event loop
Upsert	`MERGE INTO ... WHEN MATCHED / NOT MATCHED`	BigQuery lacks `INSERT OR UPDATE`; MERGE is the idiomatic alternative
Workspace Isolation	Column-based filtering (`WHERE workspace = @ws`)	Avoids DDL proliferation from per-workspace tables
Property Graph	BigQuery Property Graph (`CREATE PROPERTY GRAPH`)	Native graph support for nodes and edges
Embedding Type	`ARRAY<FLOAT64>`	BigQuery has no dedicated vector type; embeddings are stored as float arrays, which `COSINE_DISTANCE` accepts directly
Vector Search	`COSINE_DISTANCE()` in `ORDER BY`	Simple and universal; `VECTOR_SEARCH` with IVF index can be added later
Client Reuse	Singleton `BigQueryClientManager`	Shares a single BigQuery client across all storage classes
Primary Key	`PRIMARY KEY (id) NOT ENFORCED`	BigQuery does not enforce primary keys; used as advisory hints
Fuzzy Search	`LIKE`-based pattern matching	BigQuery standard SQL; sufficient for entity label search
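
The first three rows of the table (async wrapping, `MERGE`-based upsert, and workspace filtering) can be sketched together as below. This is an illustration under stated assumptions, not the package's implementation: the column names `id`, `workspace`, and `data`, the table name, and the functions `build_upsert_sql`, `run_query_blocking`, and `upsert` are all hypothetical, and `run_query_blocking` is a stand-in that sleeps instead of calling BigQuery.

```python
import asyncio
import time


def build_upsert_sql(table: str) -> str:
    """Build a MERGE statement that updates an existing row or inserts a new one."""
    return (
        f"MERGE INTO `{table}` AS T\n"
        "USING (SELECT @id AS id, @ws AS workspace, @data AS data) AS S\n"
        "ON T.id = S.id AND T.workspace = S.workspace\n"
        "WHEN MATCHED THEN UPDATE SET data = S.data\n"
        "WHEN NOT MATCHED THEN\n"
        "  INSERT (id, workspace, data) VALUES (S.id, S.workspace, S.data)"
    )


def run_query_blocking(sql: str, params: dict) -> dict:
    """Stand-in for a synchronous BigQuery call; sleeps instead of using the network."""
    time.sleep(0.05)
    return {"sql": sql, "params": params}


async def upsert(table: str, workspace: str, doc_id: str, data: str) -> dict:
    sql = build_upsert_sql(table)
    params = {"id": doc_id, "ws": workspace, "data": data}
    # Run the blocking call in a worker thread so the event loop stays responsive
    return await asyncio.to_thread(run_query_blocking, sql, params)


async def main() -> list:
    table = "my-project.my_dataset.kv_store"  # hypothetical table name
    return await asyncio.gather(
        upsert(table, "default", "doc-1", '{"text": "a"}'),
        upsert(table, "default", "doc-2", '{"text": "b"}'),
    )


results = asyncio.run(main())
print(results[0]["sql"])
```

Matching on both `id` and `workspace` in the `ON` clause keeps rows from different workspaces separate within one table, which is the point of column-based isolation. `asyncio.to_thread` requires Python 3.9 or later.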

License

MIT
