Google Cloud BigQuery storage backend for LightRAG.
This package provides four BigQuery-backed storage classes as an external plugin, so no modifications to LightRAG source code are required.
| Storage | Class | Description |
|---|---|---|
| KV | `BigQueryKVStorage` | Key-value storage with JSON serialization |
| Vector | `BigQueryVectorStorage` | Vector storage with cosine similarity search |
| Graph | `BigQueryGraphStorage` | Graph storage with BigQuery Property Graph support |
| DocStatus | `BigQueryDocStatusStorage` | Document processing status tracking |
```bash
pip install lightrag-hku
pip install git+https://github.com/ksmin23/lightrag-bigquery.git@v0.1.0
```

```python
import asyncio

import lightrag_bigquery
from lightrag import LightRAG, QueryParam
from lightrag.llm.openai import gpt_4o_mini_complete, openai_embed

# Register BigQuery storage classes with LightRAG
lightrag_bigquery.register()


async def main():
    rag = LightRAG(
        working_dir="./rag_storage",
        llm_model_func=gpt_4o_mini_complete,
        embedding_func=openai_embed,
        kv_storage="BigQueryKVStorage",
        vector_storage="BigQueryVectorStorage",
        graph_storage="BigQueryGraphStorage",
        doc_status_storage="BigQueryDocStatusStorage",
        addon_params={
            "bigquery_project_id": "my-project",
            # BigQuery dataset IDs allow only letters, digits, and underscores
            "bigquery_dataset_id": "my_dataset",
        },
    )
    await rag.initialize_storages()
    try:
        await rag.ainsert("Your document text here")
        result = await rag.aquery("Your question", param=QueryParam(mode="hybrid"))
        print(result)
    finally:
        # Release storage resources even if insert or query fails
        await rag.finalize_storages()


asyncio.run(main())
```

BigQuery connection settings can be provided via `addon_params` or environment variables. Environment variables are used as a fallback when the corresponding `addon_params` key is not set.
| addon_params key | Environment Variable | Description |
|---|---|---|
| `bigquery_project_id` | `BIGQUERY_PROJECT` or `GOOGLE_CLOUD_PROJECT` | GCP project ID |
| `bigquery_dataset_id` | `BIGQUERY_DATASET` | BigQuery dataset ID |
| `bigquery_graph_name` | `BIGQUERY_GRAPH_NAME` | Property graph name (default: `lightrag_knowledge_graph`) |
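The fallback order can be sketched as follows. This is a minimal illustration, not the package's code: the helper name `resolve_bigquery_config` is hypothetical, and checking `BIGQUERY_PROJECT` before `GOOGLE_CLOUD_PROJECT` is an assumption based on the order in the table.

```python
import os
from typing import Mapping, Optional


def resolve_bigquery_config(
    addon_params: Mapping[str, str],
    env: Mapping[str, str] = os.environ,
) -> dict:
    """Hypothetical helper: addon_params first, then environment variables."""

    def pick(key: str, *env_vars: str, default: Optional[str] = None) -> Optional[str]:
        # An explicit addon_params value always wins
        if addon_params.get(key):
            return addon_params[key]
        # Otherwise try each environment variable in order (assumed precedence)
        for name in env_vars:
            if env.get(name):
                return env[name]
        return default

    return {
        "project_id": pick("bigquery_project_id", "BIGQUERY_PROJECT", "GOOGLE_CLOUD_PROJECT"),
        "dataset_id": pick("bigquery_dataset_id", "BIGQUERY_DATASET"),
        "graph_name": pick(
            "bigquery_graph_name",
            "BIGQUERY_GRAPH_NAME",
            default="lightrag_knowledge_graph",
        ),
    }


cfg = resolve_bigquery_config(
    {"bigquery_dataset_id": "my_dataset"},
    env={"GOOGLE_CLOUD_PROJECT": "env-project", "BIGQUERY_DATASET": "env_dataset"},
)
print(cfg)
# {'project_id': 'env-project', 'dataset_id': 'my_dataset', 'graph_name': 'lightrag_knowledge_graph'}
```

Here the dataset comes from `addon_params`, the project falls back to `GOOGLE_CLOUD_PROJECT`, and the graph name falls back to its default.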
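```bash
export GOOGLE_CLOUD_PROJECT=my-project
export BIGQUERY_DATASET=my_dataset
```

```python
import lightrag_bigquery
from lightrag import LightRAG

lightrag_bigquery.register()

# Project and dataset are read from the environment variables above
rag = LightRAG(
    kv_storage="BigQueryKVStorage",
    vector_storage="BigQueryVectorStorage",
    graph_storage="BigQueryGraphStorage",
    doc_status_storage="BigQueryDocStatusStorage",
    # ... other LightRAG arguments (e.g. working_dir, llm_model_func, embedding_func)
)
```

LLM and embedding authentication is handled by LightRAG core, not by this package. Choose one of the following options depending on your LLM provider: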
| Option | Environment Variable | Description |
|---|---|---|
| Gemini via Vertex AI | `GOOGLE_GENAI_USE_VERTEXAI=true` | Uses Application Default Credentials (ADC). No API key needed. Recommended on GCP. |
| Gemini via AI Studio | `GEMINI_API_KEY` | Uses a Gemini API key from AI Studio. |
| OpenAI | `OPENAI_API_KEY` | Uses an OpenAI API key. |
Note: LightRAG's `gemini.py` enables Vertex AI mode only when `GOOGLE_GENAI_USE_VERTEXAI` equals the string `"true"` (case-insensitive). Values such as `"1"` or `"yes"` will not activate Vertex AI mode.
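A minimal sketch of the check described in the note. It approximates the documented rule and is not a copy of LightRAG's `gemini.py`; the helper name `vertex_ai_enabled` is hypothetical, and details such as whitespace handling may differ in the real code.

```python
import os
from typing import Mapping


def vertex_ai_enabled(env: Mapping[str, str] = os.environ) -> bool:
    # Hypothetical approximation: only "true" (any letter case) enables Vertex AI
    return env.get("GOOGLE_GENAI_USE_VERTEXAI", "").lower() == "true"


for value in ["true", "TRUE", "True", "1", "yes"]:
    print(value, vertex_ai_enabled({"GOOGLE_GENAI_USE_VERTEXAI": value}))
# true True
# TRUE True
# True True
# 1 False
# yes False
```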
To authenticate with Google Cloud using Application Default Credentials:

```bash
gcloud auth application-default login
```

Or with a service account key:

```bash
export GOOGLE_APPLICATION_CREDENTIALS=/path/to/service-account-key.json
```

The dataset is created automatically during initialization (`CREATE SCHEMA IF NOT EXISTS`), as are the tables and the property graph.
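For reference, an idempotent dataset-creation statement might be built as in the sketch below. The helper `create_dataset_ddl` and the `location` option are assumptions for illustration (mirroring the `--location=US` used in the manual command), and the package's actual DDL may differ. The validation reflects BigQuery's naming rule that dataset IDs contain only letters, digits, and underscores.

```python
import re


def create_dataset_ddl(project_id: str, dataset_id: str, location: str = "US") -> str:
    """Hypothetical helper that builds an idempotent CREATE SCHEMA statement."""
    # BigQuery dataset IDs may contain only letters, digits, and underscores
    if not re.fullmatch(r"[A-Za-z0-9_]+", dataset_id):
        raise ValueError(f"invalid BigQuery dataset ID: {dataset_id!r}")
    # Backticks allow project IDs that contain hyphens to be used as identifiers
    return (
        f"CREATE SCHEMA IF NOT EXISTS `{project_id}`.`{dataset_id}` "
        f"OPTIONS(location = '{location}')"
    )


print(create_dataset_ddl("my-project", "my_dataset"))
# CREATE SCHEMA IF NOT EXISTS `my-project`.`my_dataset` OPTIONS(location = 'US')
```

`IF NOT EXISTS` makes repeated initialization safe: running the statement against an existing dataset is a no-op rather than an error.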
If you prefer to create the dataset manually:
```bash
export BIGQUERY_DATASET=lightrag
bq --location=US mk --dataset $GOOGLE_CLOUD_PROJECT:$BIGQUERY_DATASET
```

```
lightrag-bigquery/
├── pyproject.toml
├── src/
│   └── lightrag_bigquery/
│       ├── __init__.py        # register() and public exports
│       ├── client.py          # BigQueryClientManager and helpers
│       └── storage.py         # All 4 storage class implementations
└── examples/
    ├── .env.example           # Environment variable template
    ├── _config.py             # Shared configuration loader
    ├── requirements.txt
    ├── basic_usage.py
    ├── env_var_config.py
    ├── batch_insert_and_query.py
    └── knowledge_graph_exploration.py
```
| Decision | Approach | Rationale |
|---|---|---|
| Sync vs Async | Synchronous BigQuery SDK wrapped with `asyncio.to_thread` | The BigQuery Python SDK is synchronous; running calls in worker threads avoids blocking the event loop |
| Upsert | `MERGE INTO ... WHEN MATCHED / NOT MATCHED` | BigQuery lacks `INSERT OR UPDATE`; `MERGE` is the idiomatic alternative |
| Workspace Isolation | Column-based filtering (`WHERE workspace = @ws`) | Avoids DDL proliferation from per-workspace tables |
| Property Graph | BigQuery Property Graph (`CREATE PROPERTY GRAPH`) | Native graph support for nodes and edges |
| Embedding Type | `ARRAY<FLOAT64>` | BigQuery's standard representation for embeddings; accepted directly by `COSINE_DISTANCE` |
| Vector Search | `COSINE_DISTANCE()` in `ORDER BY` | Simple brute-force search that needs no index; `VECTOR_SEARCH` with an IVF index can be added later |
| Client Reuse | Singleton `BigQueryClientManager` | Shares a single BigQuery client across all storage classes |
| Primary Key | `PRIMARY KEY (id) NOT ENFORCED` | BigQuery does not enforce primary keys; they serve as advisory hints |
| Fuzzy Search | `LIKE`-based pattern matching | Standard BigQuery SQL; sufficient for entity label search |
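The `asyncio.to_thread` pattern from the first row can be sketched as follows, assuming Python 3.9+. `blocking_query` is a hypothetical stand-in that sleeps instead of calling BigQuery; in real code the blocking call would be something like `client.query(sql).result()` from `google-cloud-bigquery`.

```python
import asyncio
import time


def blocking_query(sql: str) -> list:
    # Hypothetical stand-in for a synchronous call such as client.query(sql).result()
    time.sleep(0.1)
    return [sql]


async def run_query(sql: str) -> list:
    # Run the blocking call in a worker thread so the event loop stays responsive
    return await asyncio.to_thread(blocking_query, sql)


async def main() -> list:
    # The three calls overlap in separate threads instead of running back to back
    return await asyncio.gather(*(run_query(f"SELECT {i}") for i in range(3)))


start = time.perf_counter()
results = asyncio.run(main())
print(results, f"elapsed: {time.perf_counter() - start:.2f}s")
```

Because each call sleeps in its own thread, the total elapsed time stays close to a single call's duration rather than the sum of all three.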
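A sketch of the `MERGE`-based upsert combined with column-based workspace filtering. The table layout (`workspace`, `id`, `data` columns) and the helper names `build_kv_upsert` and `build_kv_get` are assumptions for illustration, not the package's actual schema or functions.

```python
def build_kv_upsert(table: str) -> str:
    """Hypothetical upsert for a KV table keyed by (workspace, id)."""
    return (
        f"MERGE `{table}` AS target\n"
        "USING (SELECT @workspace AS workspace, @id AS id, @data AS data) AS source\n"
        "ON target.workspace = source.workspace AND target.id = source.id\n"
        "WHEN MATCHED THEN\n"
        "  UPDATE SET data = source.data\n"
        "WHEN NOT MATCHED THEN\n"
        "  INSERT (workspace, id, data) VALUES (source.workspace, source.id, source.data)"
    )


def build_kv_get(table: str) -> str:
    """Hypothetical lookup that only sees rows from one workspace."""
    return (
        f"SELECT id, data FROM `{table}` "
        "WHERE workspace = @workspace AND id IN UNNEST(@ids)"
    )


print(build_kv_upsert("my-project.my_dataset.kv_store"))
print(build_kv_get("my-project.my_dataset.kv_store"))
```

With `google-cloud-bigquery`, the named parameters would be bound through `bigquery.QueryJobConfig(query_parameters=[...])`, using `bigquery.ScalarQueryParameter` for `@workspace`, `@id`, and `@data` and `bigquery.ArrayQueryParameter` for `@ids`. Keeping the workspace in the `ON` clause means two workspaces can hold the same `id` without colliding.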
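The brute-force vector search ranks rows by `COSINE_DISTANCE`, which BigQuery defines as 1 minus cosine similarity (0 for vectors pointing the same way, 2 for opposite directions). The sketch below mirrors that ranking in plain Python next to a hypothetical query builder; the helper `build_vector_query` and the column names `content` and `embedding` are assumptions.

```python
import math
from typing import Sequence


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    # Same definition as BigQuery's COSINE_DISTANCE: 1 - cosine similarity
    # (assumes neither vector is all zeros)
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def build_vector_query(table: str, top_k: int) -> str:
    # Hypothetical brute-force top-k query scoped to one workspace
    return (
        "SELECT id, content, "
        "COSINE_DISTANCE(embedding, @query_embedding) AS distance "
        f"FROM `{table}` WHERE workspace = @workspace "
        f"ORDER BY distance LIMIT {int(top_k)}"
    )


rows = {"a": [1.0, 0.0], "b": [0.6, 0.8], "c": [-1.0, 0.0]}
query = [1.0, 0.0]
ranked = sorted(rows, key=lambda k: cosine_distance(rows[k], query))
print(ranked)  # ['a', 'b', 'c']
print(build_vector_query("my-project.my_dataset.vectors", 5))
```

This scans every row in the workspace, which is why the table notes that `VECTOR_SEARCH` with an IVF index is a possible later optimization.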
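A minimal sketch of the shared-client idea, assuming a thread-safe lazy singleton. `ClientManager` and `get_client` are hypothetical names, not the package's `BigQueryClientManager` API; in real use the factory would be something like `lambda: bigquery.Client(project=project_id)`.

```python
import threading
from typing import Any, Callable, Optional


class ClientManager:
    """Hypothetical process-wide holder for one shared client instance."""

    _client: Optional[Any] = None
    _lock = threading.Lock()

    @classmethod
    def get_client(cls, factory: Callable[[], Any]) -> Any:
        # Double-checked locking: cheap fast path, safe first initialization
        if cls._client is None:
            with cls._lock:
                if cls._client is None:
                    cls._client = factory()
        return cls._client


calls = []


def fake_factory() -> object:
    # Stand-in for constructing a real BigQuery client
    calls.append(1)
    return object()


c1 = ClientManager.get_client(fake_factory)
c2 = ClientManager.get_client(fake_factory)
print(c1 is c2, len(calls))  # True 1
```

The lock matters because storage calls run in worker threads via `asyncio.to_thread`, so two threads could otherwise race to create the first client.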
License: MIT