
feat: introduce Result Service using Lakekeeper as REST catalog for Iceberg #4242

Closed
mengw15 wants to merge 52 commits into apache:main from mengw15:Restful-Catalog4


mengw15 (Contributor) commented Feb 27, 2026

What changes were proposed in this PR?

This PR introduces Lakekeeper as a REST catalog service for Iceberg, replacing direct JDBC catalog connections. Key changes include:

  • Lakekeeper bootstrap script (bin/bootstrap-lakekeeper.sh): automates Lakekeeper setup with MinIO as the S3-compatible storage backend.
  • Iceberg catalog migration (Scala & Python): updated both the Scala and Python code to connect through the Lakekeeper REST catalog instead of a direct JDBC connection.
  • Single-node deployment: updated bin/single-node/docker-compose.yml to include Lakekeeper and MinIO services.
  • Kubernetes deployment: added a Lakekeeper init job and external-name services for service discovery, and exposed Lakekeeper to the computing-unit pool.
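On the Python side, switching from a JDBC catalog to a REST catalog mostly comes down to the catalog properties the client is given. The sketch below assumes PyIceberg as the client library and assumes Lakekeeper's default local address (`http://localhost:8181/catalog`); the warehouse name `texera`, the helper names, and any MinIO endpoint or credentials are hypothetical placeholders, not values taken from this PR.

```python
from typing import Dict, Optional


def build_rest_catalog_props(
    uri: str = "http://localhost:8181/catalog",  # assumed Lakekeeper default address
    warehouse: str = "texera",                    # hypothetical warehouse name
    s3_endpoint: Optional[str] = None,            # e.g. a local MinIO endpoint (assumption)
    s3_access_key: Optional[str] = None,
    s3_secret_key: Optional[str] = None,
) -> Dict[str, str]:
    """Build PyIceberg properties for a REST catalog instead of a JDBC/SQL catalog."""
    props = {"type": "rest", "uri": uri, "warehouse": warehouse}
    # Only pass S3 FileIO settings when given; a REST catalog may vend storage credentials itself.
    if s3_endpoint:
        props["s3.endpoint"] = s3_endpoint
    if s3_access_key and s3_secret_key:
        props["s3.access-key-id"] = s3_access_key
        props["s3.secret-access-key"] = s3_secret_key
    return props


def connect(name: str = "texera", **kwargs):
    """Open the catalog; requires PyIceberg and a running Lakekeeper instance."""
    # Imported lazily so the property builder above works without PyIceberg installed.
    from pyiceberg.catalog import load_catalog

    return load_catalog(name, **build_rest_catalog_props(**kwargs))
```

With Lakekeeper running, `connect(s3_endpoint="http://localhost:9000").list_namespaces()` would be a quick smoke test. Keeping the catalog location in a single `uri` property is what lets the same code run against single-node Docker Compose and Kubernetes, where only the address changes.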

Post-merge setup for developers

After this PR is merged, each developer needs to perform the following one-time setup:

  1. Create the Lakekeeper database

psql -f sql/texera_lakekeeper.sql

  2. Download the Lakekeeper binary

Go to the Lakekeeper releases page and download the binary for your platform.
Place it anywhere on your machine; its path is configured in the next step.

  3. Configure bin/bootstrap-lakekeeper.sh

Edit the User Configuration section at the top of the script:

LAKEKEEPER_BINARY_PATH: path to the downloaded Lakekeeper binary
LAKEKEEPER__PG_DATABASE_URL_READ / LAKEKEEPER__PG_DATABASE_URL_WRITE: PostgreSQL connection URLs in the format postgres://username:password@hostname:5432/texera_lakekeeper
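A common pitfall with the postgres://username:password@hostname:5432/texera_lakekeeper format is a password containing characters such as `@` or `:`, which break URL parsing unless percent-encoded. The hypothetical helper below is a minimal sketch of building a safe value for these variables, assuming the consuming URL parser percent-decodes credentials, as standard URL parsers do.

```python
from urllib.parse import quote, unquote, urlsplit


def pg_url(user: str, password: str, host: str, port: int = 5432,
           database: str = "texera_lakekeeper") -> str:
    """Build a postgres:// URL, percent-encoding the credentials (hypothetical helper)."""
    # safe='' also encodes '/', ':' and '@', which would otherwise split the URL incorrectly.
    return f"postgres://{quote(user, safe='')}:{quote(password, safe='')}@{host}:{port}/{database}"


def check_pg_url(url: str) -> dict:
    """Parse the URL back into its parts as a sanity check before pasting it into the script."""
    parts = urlsplit(url)
    return {
        "user": unquote(parts.username or ""),
        "password": unquote(parts.password or ""),
        "host": parts.hostname,
        "port": parts.port,
        "database": parts.path.lstrip("/"),
    }
```

With a single PostgreSQL instance (no read replica), the same URL would typically be used for both the READ and WRITE variables.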

  4. Run the bootstrap script

./bin/bootstrap-lakekeeper.sh

This will start Lakekeeper, create the default project, set up the MinIO bucket, and create the warehouse.
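These bootstrap steps are order-dependent: the project and warehouse can only be created once the freshly started Lakekeeper process accepts requests. The sketch below shows that "wait, then configure" pattern in Python as an illustration, not a transcription of bin/bootstrap-lakekeeper.sh. The function names are hypothetical, and a health URL such as `http://localhost:8181/health` is an assumption about Lakekeeper's default port and route.

```python
import time
import urllib.request
from typing import Callable


def wait_until_ready(probe: Callable[[], bool], timeout_s: float = 60.0,
                     interval_s: float = 1.0, sleep=time.sleep,
                     clock=time.monotonic) -> bool:
    """Poll `probe` until it returns True or the timeout expires (hypothetical helper)."""
    deadline = clock() + timeout_s
    while True:
        try:
            if probe():
                return True
        except Exception:
            pass  # server not accepting connections yet; keep polling
        if clock() >= deadline:
            return False
        sleep(interval_s)


def http_probe(url: str) -> Callable[[], bool]:
    """Build a probe that treats an HTTP 200 from `url` as 'ready'."""
    def probe() -> bool:
        with urllib.request.urlopen(url, timeout=2) as resp:
            return resp.status == 200
    return probe
```

A bootstrap flow would call `wait_until_ready(http_probe(health_url))` after launching the binary and only then create the project, bucket, and warehouse. Injecting `sleep` and `clock` keeps the retry logic testable without a running server.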

Any related issues, documentation, discussions?

Closes #4126

How was this PR tested?

Tested manually on a single-node Docker Compose deployment and on a Kubernetes cluster.

Was this PR authored or co-authored using generative AI tooling?

Co-authored with AI.

github-actions bot added the ci label (changes related to CI) on Mar 2, 2026
mengw15 marked this pull request as ready for review on March 4, 2026, 23:52
mengw15 changed the title from "feat: introduce Lakekeeper as REST catalog for Iceberg storage" to "feat: introduce Result Service using Lakekeeper as REST catalog for Iceberg storage" on Mar 5, 2026
mengw15 changed the title from "feat: introduce Result Service using Lakekeeper as REST catalog for Iceberg storage" to "feat: introduce Result Service using Lakekeeper as REST catalog for Iceberg" on Mar 5, 2026
mengw15 (Contributor, Author) commented Mar 5, 2026

@bobbai00

bobbai00 (Contributor) commented:

Since multiple smaller PRs have been raised, I will close this one.

bobbai00 closed this on Mar 17, 2026

Labels

build, ci (changes related to CI), common, ddl-change (changes to the TexeraDB DDL), dependencies (pull requests that update a dependency file), engine, python, service


Development

Successfully merging this pull request may close these issues.

Migrate to Result Service and MinIO for Execution Results

3 participants