robertmu/pg-lakebase

pg-lakebase

The Unified Lakebase Extension Suite for PostgreSQL.

pg-lakebase makes PostgreSQL a first-class citizen in the modern Lakebase ecosystem. It implements high-performance Table Access Methods (TAM) in Rust, with Foreign Data Wrappers (FDW) as a planned second interface, and pairs them with a dedicated local caching storage service. Together these let PostgreSQL query and manage open table formats with native-like performance and semantics.

The current runnable extension is pg-iceberg-am, a PostgreSQL Table Access Method (TAM) for Apache Iceberg tables. It uses pg-lakebase-core for the TAM framework, iceberg-lite for Iceberg metadata and file format logic, and pgrx for PostgreSQL integration.

Current State

  • pg-iceberg-am is the primary SQL-facing extension. Its local Iceberg table storage path is the default and most exercised path, using PostgreSQL's local file APIs and a custom WAL resource manager for crash recovery.
  • Object storage is available through distributed tablespaces backed by pg-lakebase-storage, a Unix-socket cache service. The storage layer supports AWS S3, S3-compatible endpoints, Google Cloud Storage, and Azure Blob Storage.
  • pg-lakebase-core currently exposes TAM framework primitives. FDW support is still a project direction, not a completed public API.

Architecture Overview

                    PostgreSQL backend
                          |
                          |  pgrx hooks (TAM / FDW)
                          v
                   +------------------+      +---------------------+
                   | pg-iceberg-am    | ---> | pg-lakebase-core    |
                   | (Iceberg TAM)    |      | (framework traits)  |
                   +------------------+      +---------------------+
                     /              \
        local storage                object storage
        (VFD + WAL)                  (Unix domain socket)
               /                            \
              v                              v
    local filesystem          +-------------------------------+
                              |     pg-lakebase-storage       |
                              |  transport | protocol | conn  |
                              |  service   | backend  | cache |
                              +-------------------------------+
                                    |                |
                                    v                v
                              local disk cache   S3 / S3-compatible / GCS / Azure
                              (redb + files)     (object_store)

pg-iceberg-am supports two storage paths depending on the tablespace:

  • Local storage: reads and writes go directly through PostgreSQL's Virtual File Descriptor (VFD) system with optional WAL logging for crash consistency.
  • Object storage: the database process communicates with pg-lakebase-storage over Unix domain sockets. Reads of cached files use a local pread fast path that bypasses the socket entirely; control operations (open, head, miss fetch, upload) go over the socket. Cache misses are transparently fetched from AWS S3, S3-compatible endpoints, Google Cloud Storage, or Azure Blob Storage. Writes go through an explicit stage → commit flow tied to database transaction boundaries.
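The object-storage read path can be sketched in a few lines of Rust. This is a minimal, Unix-only model and not the pg-lakebase-storage API: `CachedReader`, `fetch_miss`, and `demo` are hypothetical names, and the closure only stands in for the miss-fetch request that the real backend sends over the Unix socket. Once a file is in the local cache, every read is a positional `pread` (`FileExt::read_at`) with no socket round trip.

```rust
use std::fs::{self, File};
use std::io;
use std::os::unix::fs::FileExt;
use std::path::{Path, PathBuf};

/// Hypothetical reader modelling the object-storage read path: cached files are
/// read with pread (`FileExt::read_at`); only a miss triggers a control call.
struct CachedReader<F>
where
    F: FnMut(&str, &Path) -> io::Result<()>,
{
    cache_dir: PathBuf,
    /// Stand-in for the "miss fetch" request sent over the Unix socket; it must
    /// leave the object's bytes at the given cache path.
    fetch_miss: F,
}

impl<F> CachedReader<F>
where
    F: FnMut(&str, &Path) -> io::Result<()>,
{
    fn read_at(&mut self, key: &str, offset: u64, len: usize) -> io::Result<Vec<u8>> {
        let path = self.cache_dir.join(key);
        if !path.exists() {
            // Slow path: ask the (simulated) storage service to populate the cache.
            (self.fetch_miss)(key, &path)?;
        }
        // Fast path: positional read on the local cache file, no socket involved.
        let file = File::open(&path)?;
        let mut buf = vec![0u8; len];
        let mut filled = 0;
        while filled < len {
            let n = file.read_at(&mut buf[filled..], offset + filled as u64)?;
            if n == 0 {
                break; // end of file
            }
            filled += n;
        }
        buf.truncate(filled);
        Ok(buf)
    }
}

/// Reads one object twice and returns both slices plus the number of misses.
fn demo() -> io::Result<(String, String, u32)> {
    let dir = std::env::temp_dir().join(format!("lakebase-cache-sketch-{}", std::process::id()));
    let _ = fs::remove_dir_all(&dir); // start from an empty cache
    fs::create_dir_all(&dir)?;
    let mut misses = 0u32;
    let (first, second) = {
        let mut reader = CachedReader {
            cache_dir: dir.clone(),
            fetch_miss: |key: &str, dest: &Path| {
                misses += 1;
                fs::write(dest, format!("contents of {key}"))
            },
        };
        let first = reader.read_at("data-00001.parquet", 0, 8)?; // miss, then pread
        let second = reader.read_at("data-00001.parquet", 12, 64)?; // hit: pread only
        (first, second)
    };
    fs::remove_dir_all(&dir)?;
    Ok((
        String::from_utf8_lossy(&first).into_owned(),
        String::from_utf8_lossy(&second).into_owned(),
        misses,
    ))
}

fn main() -> io::Result<()> {
    let (first, second, misses) = demo()?;
    println!("{first:?} {second:?} misses={misses}");
    Ok(())
}
```

In this sketch only the first read pays for the miss. The second read of the same file is served entirely by `read_at`, which is the property the fast path is meant to preserve.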

Object-storage tablespaces intentionally use the PostgreSQL tablespace name as the storage-service store_id, so cache and staging paths remain readable on disk. Because that name is part of the storage identity, renaming a distributed tablespace is unsupported.
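A minimal in-memory model shows why the tablespace name matters and how the stage → commit flow lines up with transaction boundaries. Everything here is hypothetical: the `Store` type, the `<root>/<store_id>/staging/<xid>/<key>` and `<root>/<store_id>/cache/<key>` layouts, and the `/var/lib/lakebase` root are illustrative assumptions, not the service's actual on-disk format.

```rust
use std::collections::HashMap;
use std::path::{Path, PathBuf};

/// Hypothetical, in-memory model of one distributed tablespace's store.
/// The store_id is the tablespace name, so it appears verbatim in every path.
struct Store {
    root: PathBuf,
    store_id: String,
    staged: HashMap<u32, Vec<String>>, // xid -> object keys staged by that transaction
    committed: Vec<String>,            // object keys published by committed transactions
}

impl Store {
    fn new(root: &Path, store_id: &str) -> Self {
        Store {
            root: root.to_path_buf(),
            store_id: store_id.to_string(),
            staged: HashMap::new(),
            committed: Vec::new(),
        }
    }

    /// Illustrative cache location for an object key.
    fn cache_path(&self, key: &str) -> PathBuf {
        self.root.join(&self.store_id).join("cache").join(key)
    }

    /// Stage a write for transaction `xid`; nothing is visible until commit.
    fn stage(&mut self, xid: u32, key: &str) -> PathBuf {
        self.staged.entry(xid).or_default().push(key.to_string());
        self.root.join(&self.store_id).join("staging").join(xid.to_string()).join(key)
    }

    /// Called at transaction commit: publish everything the transaction staged.
    fn commit(&mut self, xid: u32) -> usize {
        let keys = self.staged.remove(&xid).unwrap_or_default();
        let published = keys.len();
        self.committed.extend(keys);
        published
    }

    /// Called at transaction abort: discard the staged writes.
    fn abort(&mut self, xid: u32) {
        self.staged.remove(&xid);
    }
}

fn main() {
    let root = Path::new("/var/lib/lakebase"); // hypothetical service root
    let mut store = Store::new(root, "lake_s3");

    let staged = store.stage(100, "data-1.parquet");
    store.stage(100, "data-2.parquet");
    store.stage(101, "data-3.parquet");
    println!("staged at {}", staged.display());

    println!("xid 100 published {} objects", store.commit(100));
    store.abort(101);
    println!("committed: {:?}", store.committed);

    // A renamed tablespace would get a new store_id and therefore new paths.
    let renamed = Store::new(root, "lake_s3_renamed");
    println!(
        "{} vs {}",
        store.cache_path("data-1.parquet").display(),
        renamed.cache_path("data-1.parquet").display()
    );
}
```

Because the store_id is embedded in every path, a store opened under a new name computes different cache and staging locations and cannot find objects written under the old one. That is the situation the rename restriction avoids.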

Tablespace options currently expose protocol=s3, protocol=gcs, and protocol=azure; use protocol=s3 with a custom endpoint for S3-compatible services.
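The mapping from tablespace options to a storage target can be sketched as a small resolver. The `Target` enum and `resolve_target` function are hypothetical; only the `protocol` and `endpoint` option names and the three protocol values come from this README. The endpoint URL in the example is a placeholder for any S3-compatible service.

```rust
use std::collections::HashMap;

/// Hypothetical classification of a distributed tablespace's storage target,
/// derived from its `protocol` and optional `endpoint` options.
#[derive(Debug, PartialEq)]
enum Target {
    AwsS3,
    S3Compatible { endpoint: String },
    Gcs,
    Azure,
}

fn resolve_target(options: &[(&str, &str)]) -> Result<Target, String> {
    let opts: HashMap<&str, &str> = options.iter().copied().collect();
    let protocol = opts.get("protocol").ok_or("missing protocol option")?;
    match *protocol {
        // S3-compatible services reuse protocol=s3 and differ only by endpoint.
        "s3" => Ok(match opts.get("endpoint") {
            Some(endpoint) => Target::S3Compatible { endpoint: endpoint.to_string() },
            None => Target::AwsS3,
        }),
        "gcs" => Ok(Target::Gcs),
        "azure" => Ok(Target::Azure),
        other => Err(format!("unsupported protocol '{other}' (expected s3, gcs, or azure)")),
    }
}

fn main() {
    let tablespaces: [&[(&str, &str)]; 3] = [
        &[("protocol", "s3"), ("bucket", "my-lake-bucket"), ("region", "us-east-1")],
        &[("protocol", "s3"), ("bucket", "my-lake-bucket"), ("endpoint", "http://localhost:9000")],
        &[("protocol", "hdfs")],
    ];
    for options in tablespaces {
        println!("{:?}", resolve_target(options));
    }
}
```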

Distributed tablespace credentials are currently stored in pg_tablespace.spcoptions. They are redacted from Rust Debug output, but the catalog value itself is not encrypted; production deployments should prefer credential references, IAM-style ambient credentials, or another secret manager once that integration exists.
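Redacting credentials from `Debug` output is commonly done with a newtype whose `Debug` impl prints a placeholder. The sketch below assumes that approach; `Secret`, `S3Credentials`, and the field names are hypothetical and not taken from pg-lakebase. It also illustrates the caveat above: redaction changes how the value is formatted, not how it is stored.

```rust
use std::fmt;

/// Hypothetical wrapper for a credential value taken from spcoptions.
/// Its Debug output never contains the secret; Display is deliberately not implemented.
struct Secret(String);

impl Secret {
    /// Explicit, greppable access to the raw value (e.g. when building a client).
    fn expose(&self) -> &str {
        &self.0
    }
}

impl fmt::Debug for Secret {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str("[REDACTED]")
    }
}

/// Hypothetical credential set for a protocol=s3 tablespace.
#[derive(Debug)]
struct S3Credentials {
    access_key_id: String,
    secret_access_key: Secret,
}

fn sample_credentials() -> S3Credentials {
    S3Credentials {
        access_key_id: "AKIAEXAMPLE".to_string(),
        secret_access_key: Secret("wJalrEXAMPLEKEY".to_string()),
    }
}

fn main() {
    let creds = sample_credentials();
    // Prints: S3Credentials { access_key_id: "AKIAEXAMPLE", secret_access_key: [REDACTED] }
    println!("{creds:?}");
    // The raw value is still there; redaction only affects formatting.
    println!("secret length: {}", creds.secret_access_key.expose().len());
}
```

Leaving `Display` unimplemented means `{}` formatting of a `Secret` fails to compile, so the only way to reach the raw value is the explicit `expose()` call.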

Workspace

Crate                   Purpose
pg-iceberg-am           PostgreSQL extension implementing the Iceberg table access method.
pg-lakebase-core        Framework crate for PostgreSQL TAM implementations.
pg-lakebase-core-tests  PostgreSQL integration tests (#[pg_test]) for pg-lakebase-core.
pg-lakebase-macros      Procedural macro support, including #[pg_table_am].
iceberg-lite            Synchronous, PostgreSQL-friendly Iceberg library used by the TAM.
pg-lakebase-storage     Local object-storage caching service library.
xtask                   Workspace maintenance commands: test-all, isolation.

Requirements

  • Rust 1.95.0 or later
  • PostgreSQL 17, including server development files, or a pgrx-managed PostgreSQL 17 downloaded during setup
  • cargo-pgrx 0.18.0

Setup

Register PostgreSQL 17 with pgrx. Use either an existing pg_config or let pgrx download PostgreSQL:

cargo pgrx init --pg17=/path/to/pg_config
# or
cargo pgrx init --pg17=download

Build

Build the Iceberg extension crate:

cargo build --package pg-iceberg-am

Install and Run

Install the extension into the PostgreSQL instance you want to use. Pass the target PostgreSQL 17 pg_config, whether it comes from pgrx-managed PostgreSQL or an existing PostgreSQL installation:

cargo pgrx install --package pg-iceberg-am --pg-config /path/to/pg_config

Then start or restart PostgreSQL with shared_preload_libraries='pg_iceberg_am'. For a pgrx-managed PostgreSQL 17:

cargo pgrx start pg17 \
  --package pg-iceberg-am \
  --postgresql-conf "shared_preload_libraries='pg_iceberg_am'"

cargo pgrx connect pg17 --package pg-iceberg-am

If the pgrx-managed PostgreSQL instance is already running, stop it before starting it again so shared_preload_libraries is applied.

For an existing PostgreSQL 17, update postgresql.conf:

shared_preload_libraries = 'pg_iceberg_am'

Then restart PostgreSQL and connect to the target database.

Testing

After modifying code, run the standard test suite:

cargo xtask test-all pg17

This runs unit tests, pgrx tests, SQL regression, and isolation tests.

Regression SQL lives in pg-iceberg-am/tests/pg_regress/sql, isolation specs in pg-iceberg-am/tests/isolation/specs, and isolation results are written to target/isolation/pg17/output_iso/.

Package

Build a distributable directory of extension artifacts:

cargo pgrx package --package pg-iceberg-am --pg-config "$(cargo pgrx info pg-config pg17)"

Use package when you want to copy the extension artifacts into an image, VM, or distro package instead of installing directly into a local PostgreSQL installation.

Usage

Create the extension once in each database that uses Iceberg tables:

CREATE EXTENSION IF NOT EXISTS pg_iceberg_am;

Create a local Iceberg table in PostgreSQL's default tablespace:

CREATE TABLE events (
    id int,
    payload text,
    created_at timestamp
) USING iceberg;

INSERT INTO events VALUES
    (1, 'hello', now()),
    (2, 'lakebase', now());

SELECT * FROM events ORDER BY id;

To use a regular PostgreSQL local tablespace, create the tablespace first and then place the Iceberg table in it:

CREATE TABLESPACE lake_local LOCATION '/path/to/local/tablespace';

CREATE TABLE local_events (
    id int,
    payload text
) USING iceberg TABLESPACE lake_local;

To use object storage, create a distributed tablespace and then place the Iceberg table in it. PostgreSQL still requires a local LOCATION directory for the tablespace metadata.

CREATE TABLESPACE lake_s3 LOCATION '/path/to/local/tablespace' WITH (
    protocol = 's3',
    bucket = 'my-lake-bucket',
    region = 'us-east-1'
);

CREATE TABLE object_events (
    id int,
    payload text
) USING iceberg TABLESPACE lake_s3;

INSERT INTO object_events VALUES
    (1, 'hello'),
    (2, 'lakebase');

SELECT * FROM object_events ORDER BY id;

For S3-compatible services, keep protocol = 's3' and add an endpoint option that points at the service.

License

This project is licensed under the Apache License 2.0. See LICENSE for details.
