The Unified Lakebase Extension Suite for PostgreSQL.
pg-lakebase makes PostgreSQL a first-class citizen in the modern Lakebase ecosystem. By implementing high-performance Table Access Methods (TAM) and Foreign Data Wrappers (FDW) in Rust — backed by a dedicated local caching storage service — it allows PostgreSQL to query and manage open table formats with native-like performance and semantics.
The current runnable extension is pg-iceberg-am, a PostgreSQL Table Access
Method (TAM) for Apache Iceberg tables. It uses pg-lakebase-core for the TAM
framework, iceberg-lite for Iceberg metadata and file format logic, and pgrx
for PostgreSQL integration.
- pg-iceberg-am is the primary SQL-facing extension. Its local Iceberg table storage path is the default and most exercised path, using PostgreSQL's local file APIs and a custom WAL resource manager for crash recovery.
- Object storage is available through distributed tablespaces backed by pg-lakebase-storage, a Unix-socket cache service. The storage layer supports AWS S3, S3-compatible endpoints, Google Cloud Storage, and Azure Blob Storage.
- pg-lakebase-core currently exposes TAM framework primitives. FDW support is still a project direction, not a completed public API.
                PostgreSQL backend
                        |
                        | pgrx hooks (TAM / FDW)
                        v
+------------------+      +---------------------+
|  pg-iceberg-am   | ---> |  pg-lakebase-core   |
|  (Iceberg TAM)   |      | (framework traits)  |
+------------------+      +---------------------+
       /        \
local storage    object storage
(VFD + WAL)      (Unix domain socket)
     /                \
    v                  v
local filesystem   +-------------------------------+
                   |      pg-lakebase-storage      |
                   | transport | protocol | conn   |
                   | service   | backend  | cache  |
                   +-------------------------------+
                        |                |
                        v                v
              local disk cache   S3 / S3-compatible / GCS / Azure
              (redb + files)     (object_store)
pg-iceberg-am supports two storage paths depending on the tablespace:
- Local storage: reads and writes go directly through PostgreSQL's Virtual File Descriptor (VFD) system with optional WAL logging for crash consistency.
- Object storage: the database process communicates with pg-lakebase-storage over Unix domain sockets. Reads of cached files use a local pread fast path that bypasses the socket entirely; control operations (open, head, miss fetch, upload) go over the socket. Cache misses are transparently fetched from AWS S3, S3-compatible endpoints, Google Cloud Storage, or Azure Blob Storage. Writes go through an explicit stage → commit flow tied to database transaction boundaries.
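The split between the cached-read fast path and the socket-backed miss path can be sketched roughly as follows. This is an illustration only, not the project's code: the CacheService trait, fetch_on_miss, read_object, and the flat cache_dir/key layout are all hypothetical names, and the socket round-trip is abstracted behind the trait.

```rust
use std::fs::File;
use std::io;
use std::os::unix::fs::FileExt;
use std::path::{Path, PathBuf};

/// Hypothetical stand-in for the socket round-trip to the cache service.
/// On a miss, the service fetches the object and reports where the cached
/// copy now lives on local disk.
pub trait CacheService {
    fn fetch_on_miss(&self, key: &str) -> io::Result<PathBuf>;
}

/// Fill `buf` with bytes starting at `offset` from the object named `key`.
pub fn read_object<S: CacheService>(
    cache_dir: &Path,
    service: &S,
    key: &str,
    offset: u64,
    buf: &mut [u8],
) -> io::Result<()> {
    // Assumed layout: cached objects live directly under `cache_dir`.
    let local = cache_dir.join(key);
    let file = match File::open(&local) {
        // Fast path: the file is already cached, so the service is not called.
        Ok(f) => f,
        // Miss path: ask the service to populate the cache, then read locally.
        Err(e) if e.kind() == io::ErrorKind::NotFound => {
            File::open(service.fetch_on_miss(key)?)?
        }
        Err(e) => return Err(e),
    };
    // Positional read (pread on Unix); does not move a shared file cursor.
    file.read_exact_at(buf, offset)
}
```

In this sketch a cache hit never calls into the service, which mirrors the idea that only control operations and misses pay for a socket round-trip.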
Object-storage tablespaces intentionally use the PostgreSQL tablespace name as
the storage-service store_id, so cache and staging paths remain readable on
disk. Because that name is part of the storage identity, renaming a distributed
tablespace is unsupported.
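To see why a rename is a problem, consider a minimal sketch in which the store_id is a path component. The directory layout and the cache_path helper below are assumptions made for illustration; the actual on-disk layout of pg-lakebase-storage may differ.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical layout: <cache_root>/<store_id>/<object_key>.
pub fn cache_path(cache_root: &Path, tablespace_name: &str, object_key: &str) -> PathBuf {
    // The tablespace name is used verbatim as the store_id.
    let store_id = tablespace_name;
    cache_root.join(store_id).join(object_key)
}
```

Under this assumed layout, the same object key resolves to a different path once the tablespace name changes, so previously cached and staged files would no longer be found.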
Tablespace options currently expose protocol=s3, protocol=gcs, and
protocol=azure; use protocol=s3 with a custom endpoint for S3-compatible
services.
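One way to model these options is a three-value protocol enum plus an optional endpoint. The sketch below is not taken from the codebase: Protocol, Backend, and resolve_backend are hypothetical names, and exact, case-sensitive matching of the option value is an assumption.

```rust
use std::str::FromStr;

/// The three protocol values the tablespace options currently expose.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Protocol {
    S3,
    Gcs,
    Azure,
}

impl FromStr for Protocol {
    type Err = String;

    fn from_str(value: &str) -> Result<Self, Self::Err> {
        match value {
            "s3" => Ok(Protocol::S3),
            "gcs" => Ok(Protocol::Gcs),
            "azure" => Ok(Protocol::Azure),
            other => Err(format!("unsupported protocol: {other}")),
        }
    }
}

/// Hypothetical resolved backend: an S3-compatible service is plain S3 plus
/// a custom endpoint, not a separate protocol value.
#[derive(Debug, PartialEq, Eq)]
pub enum Backend {
    AwsS3,
    S3Compatible { endpoint: String },
    Gcs,
    Azure,
}

pub fn resolve_backend(protocol: Protocol, endpoint: Option<&str>) -> Backend {
    match (protocol, endpoint) {
        (Protocol::S3, Some(url)) => Backend::S3Compatible { endpoint: url.to_owned() },
        (Protocol::S3, None) => Backend::AwsS3,
        // This sketch ignores any endpoint for the other protocols.
        (Protocol::Gcs, _) => Backend::Gcs,
        (Protocol::Azure, _) => Backend::Azure,
    }
}
```

Keeping S3-compatible services under protocol=s3 means the option surface stays at three values while the endpoint alone selects the target service.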
Distributed tablespace credentials are currently stored in
pg_tablespace.spcoptions. They are redacted from Rust Debug output, but the
catalog value itself is not encrypted; production deployments should prefer
credential references, IAM-style ambient credentials, or another secret manager
once that integration exists.
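Redaction from Debug output is commonly done with a wrapper type that implements Debug by hand. The following is a minimal sketch of that pattern, not the project's actual types: Secret, TablespaceOptions, and the secret_key field are hypothetical names.

```rust
use std::fmt;

/// Wrapper that keeps a secret usable in code but hides it from `{:?}`.
#[derive(Clone, PartialEq, Eq)]
pub struct Secret(String);

impl Secret {
    pub fn new(value: impl Into<String>) -> Self {
        Secret(value.into())
    }

    /// Explicit accessor, so every use of the raw value is easy to audit.
    pub fn expose(&self) -> &str {
        &self.0
    }
}

impl fmt::Debug for Secret {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Never print the inner value, regardless of formatter flags.
        f.write_str("[REDACTED]")
    }
}

/// Hypothetical option struct; field names are illustrative only.
#[derive(Debug)]
pub struct TablespaceOptions {
    pub bucket: String,
    pub secret_key: Option<Secret>,
}
```

This only protects log and panic output. As noted above, the value stored in the catalog is still plaintext.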
| Crate | Purpose |
|---|---|
| pg-iceberg-am | PostgreSQL extension implementing the Iceberg table access method. |
| pg-lakebase-core | Framework crate for PostgreSQL TAM implementations. |
| pg-lakebase-core-tests | PostgreSQL integration tests (#[pg_test]) for pg-lakebase-core. |
| pg-lakebase-macros | Procedural macro support, including #[pg_table_am]. |
| iceberg-lite | Synchronous, PostgreSQL-friendly Iceberg library used by the TAM. |
| pg-lakebase-storage | Local object-storage caching service library. |
| xtask | Workspace maintenance commands: test-all, isolation. |
- Rust 1.95.0 or later
- PostgreSQL 17, including server development files, or a pgrx-managed PostgreSQL 17 downloaded during setup
- cargo-pgrx 0.18.0
Register PostgreSQL 17 with pgrx. Use either an existing pg_config or let
pgrx download PostgreSQL:
cargo pgrx init --pg17=/path/to/pg_config
# or
cargo pgrx init --pg17=download
Build the Iceberg extension crate:
cargo build --package pg-iceberg-am
Install the extension into the PostgreSQL instance you want to use. Pass the
target PostgreSQL 17 pg_config, whether it comes from pgrx-managed PostgreSQL
or an existing PostgreSQL installation:
cargo pgrx install --package pg-iceberg-am --pg-config /path/to/pg_config
Then start or restart PostgreSQL with shared_preload_libraries='pg_iceberg_am'.
For a pgrx-managed PostgreSQL 17:
cargo pgrx start pg17 \
  --package pg-iceberg-am \
  --postgresql-conf "shared_preload_libraries='pg_iceberg_am'"
cargo pgrx connect pg17 --package pg-iceberg-am
If the pgrx-managed PostgreSQL instance is already running, stop it before
starting it again so shared_preload_libraries is applied.
For an existing PostgreSQL 17, update postgresql.conf:
shared_preload_libraries = 'pg_iceberg_am'
Then restart PostgreSQL and connect to the target database.
After modifying code, run the standard test suite:
cargo xtask test-all pg17
This runs unit tests, pgrx tests, SQL regression, and isolation tests.
Regression SQL lives in pg-iceberg-am/tests/pg_regress/sql,
isolation specs in pg-iceberg-am/tests/isolation/specs,
and isolation results are written to target/isolation/pg17/output_iso/.
Build a distributable directory of extension artifacts:
cargo pgrx package --package pg-iceberg-am --pg-config "$(cargo pgrx info pg-config pg17)"
Use package when you want to copy the extension artifacts into an image, VM,
or distro package instead of installing directly into a local PostgreSQL
installation.
Create the extension once in each database that uses Iceberg tables:
CREATE EXTENSION IF NOT EXISTS pg_iceberg_am;
Create a local Iceberg table in PostgreSQL's default tablespace:
CREATE TABLE events (
    id int,
    payload text,
    created_at timestamp
) USING iceberg;
INSERT INTO events VALUES
    (1, 'hello', now()),
    (2, 'lakebase', now());
SELECT * FROM events ORDER BY id;
To use a regular PostgreSQL local tablespace, create the tablespace first and then place the Iceberg table in it:
CREATE TABLESPACE lake_local LOCATION '/path/to/local/tablespace';
CREATE TABLE local_events (
    id int,
    payload text
) USING iceberg TABLESPACE lake_local;
To use object storage, create a distributed tablespace and then place the
Iceberg table in it. PostgreSQL still requires a local LOCATION directory for
the tablespace metadata.
CREATE TABLESPACE lake_s3 LOCATION '/path/to/local/tablespace' WITH (
    protocol = 's3',
    bucket = 'my-lake-bucket',
    region = 'us-east-1'
);
CREATE TABLE object_events (
    id int,
    payload text
) USING iceberg TABLESPACE lake_s3;
INSERT INTO object_events VALUES
    (1, 'hello'),
    (2, 'lakebase');
SELECT * FROM object_events ORDER BY id;
For S3-compatible services, keep protocol = 's3' and set endpoint.
This project is licensed under the Apache License 2.0. See LICENSE for details.