DuckDB Paimon Extension 🦆

This extension enables DuckDB to read and query Apache Paimon format data directly — no ETL pipelines, no Flink/Spark clusters required. Just open a DuckDB shell and run SQL against your Paimon tables.

Like DuckDB's existing lake-format extensions (e.g. Iceberg), duckdb-paimon brings DuckDB's powerful local analytics to the Paimon data lake ecosystem.

About Apache Paimon

Apache Paimon is a lake format that enables building a Realtime Lakehouse Architecture with Flink and Spark for both streaming and batch operations. It innovatively combines lake format and LSM structure, bringing realtime streaming updates into the lake architecture.

Implementation

This extension is built on top of paimon-cpp, an open-source C++ library that provides native access to Paimon format data. It is the first library that brings native Paimon read/write capabilities to the C++ ecosystem.

  • Zero JVM dependency — No Java runtime required. Pure C++ implementation means minimal memory footprint and instant startup.
  • Apache Arrow data exchange — Data flows between paimon-cpp and DuckDB via Apache Arrow, the industry standard for columnar in-memory data, enabling zero-copy transfers with no serialization overhead.
  • Parallel scan architecture — Paimon tables are split into independent Splits, and DuckDB's multi-threaded execution engine reads them in parallel to fully utilize multi-core CPUs.
  • Secure credential management — OSS credentials are managed through DuckDB's native Secret Manager with scope isolation and automatic key redaction.
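For instance, the degree of scan parallelism simply follows DuckDB's own thread setting, with no Paimon-specific tuning needed (a sketch using the sample table bundled under data/):

```sql
-- Independent Paimon splits are read by DuckDB's worker threads;
-- raising the thread count raises scan parallelism.
SET threads = 8;
SELECT count(*) FROM paimon_scan('./data/testdb.db/testtbl');
```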

Features

  • Read Paimon table data (local and remote OSS)
  • Projection pushdown optimization
  • Predicate pushdown optimization
  • Multiple file formats supported for both manifest and data files
  • Catalog ATTACH support
  • DuckDB Secret-based OSS credential management
  • Snapshot history inspection
  • Snapshot-based time travel queries
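The pushdown optimizations require no special syntax; a plain query over the bundled sample table exercises both (column names as shown in the Usage section below):

```sql
-- Only f0 and f3 are materialized (projection pushdown), and the
-- f1 filter is evaluated inside the Paimon scan (predicate pushdown).
SELECT f0, f3
FROM paimon_scan('./data/testdb.db/testtbl')
WHERE f1 = 2;
```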

Use Cases

Lightweight Ad-hoc Queries on Realtime Lakehouses

Data is written into Paimon by Flink in real time. Analysts can query it directly on OSS using DuckDB + duckdb-paimon — no compute cluster needed, reducing query latency from minutes to seconds.

Data Validation & Quality Checks

Use DuckDB in CI/CD pipelines to run data quality assertions on Paimon tables, verifying that Flink job outputs meet expectations. Lightweight, fast, and dependency-free.
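As a sketch, a CI step might run an assertion query and fail the build when it returns a non-zero count (the table path and the amount column here are hypothetical):

```sql
-- Hypothetical quality check: orders written by Flink must have
-- positive amounts; the CI script fails the build if bad_rows > 0.
SELECT count(*) AS bad_rows
FROM paimon_scan('oss://my-bucket/warehouse/prod.db/orders')
WHERE amount <= 0;
```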

Data Exploration & Debugging

Data engineers developing Flink jobs can instantly inspect the current state of Paimon tables using DuckDB Shell, quickly locating data issues — far more efficient than launching a Flink SQL Client.

Cross-format Federated Queries

DuckDB natively supports Parquet, CSV, JSON, Iceberg, and more. Combined with duckdb-paimon, you can JOIN Paimon tables with other data sources without any data movement:

-- Join a Paimon orders table with a local CSV dimension table
SELECT o.order_id, o.amount, c.customer_name
FROM paimon_scan('oss://...', 'db', 'orders') o
JOIN read_csv('customers.csv') c ON o.customer_id = c.id;

Development Guide

Building

Clone the repository with submodules:

git clone --recurse-submodules https://github.com/polardb/duckdb-paimon.git
cd duckdb-paimon

--recurse-submodules pulls DuckDB and paimon-cpp, which are required to build the extension.

Build in release mode:

GEN=ninja make

Or build in debug mode:

GEN=ninja make debug

Running the Tests

# Release
make test

# Debug
make test_debug

Usage

The examples below use sample data bundled in the data/ directory of this repository. Start the DuckDB shell with the extension pre-loaded:

./build/release/duckdb

Query Local Paimon Tables

Pass the table path directly to paimon_scan, or use separate warehouse / database / table arguments:

SELECT * FROM paimon_scan('./data/testdb.db/testtbl');
┌─────────┬───────┬───────┬────────┐
│   f0    │  f1   │  f2   │   f3   │
│ varchar │ int32 │ int32 │ double │
├─────────┼───────┼───────┼────────┤
│ Alice   │     1 │     0 │   11.0 │
│ Bob     │     1 │     1 │   12.1 │
│ Cathy   │     1 │     2 │   13.2 │
│ David   │     2 │     0 │   21.0 │
│ Eve     │     2 │     1 │   22.1 │
│ Frank   │     2 │     2 │   23.2 │
│ Grace   │     3 │     0 │   31.0 │
│ Henry   │     3 │     1 │   32.1 │
│ Iris    │     3 │     2 │   33.2 │
└─────────┴───────┴───────┴────────┘

-- SELECT * FROM paimon_scan('./data', 'testdb', 'testtbl');

Query Remote OSS Paimon Tables

First create a secret to supply OSS credentials, then query using either a full table path or separate warehouse / database / table arguments:

-- Configure OSS credentials
CREATE SECRET my_oss (
    TYPE paimon,
    key_id 'your-access-key-id',
    secret 'your-access-key-secret',
    endpoint 'oss-cn-hangzhou.aliyuncs.com'
);

SELECT * FROM paimon_scan('oss://your-bucket/warehouse/your_db.db/your_table');
SELECT * FROM paimon_scan('oss://your-bucket/warehouse', 'your_db', 'your_table');
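Secrets can also be restricted to a path prefix with DuckDB's standard SCOPE option, so the credentials are only applied to matching oss:// paths (bucket name hypothetical):

```sql
-- Scoped secret: used only for paths under oss://prod-bucket/
CREATE SECRET prod_oss (
    TYPE paimon,
    key_id 'prod-access-key-id',
    secret 'prod-access-key-secret',
    endpoint 'oss-cn-hangzhou.aliyuncs.com',
    SCOPE 'oss://prod-bucket/'
);
```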

Attach as Catalog

ATTACH a Paimon warehouse as a catalog to browse and query all databases and tables inside it with standard DuckDB SQL:

ATTACH './data' AS my_catalog (TYPE paimon);

SHOW ALL TABLES;
SELECT * FROM my_catalog.testdb.testtbl;

-- For an OSS warehouse:
-- ATTACH 'oss://my-bucket/warehouse' AS my_catalog (TYPE paimon);
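Once attached, the warehouse is also visible through DuckDB's standard catalog metadata, which is handy for scripted discovery:

```sql
-- List every table the attached Paimon catalog exposes
SELECT table_schema, table_name
FROM information_schema.tables
WHERE table_catalog = 'my_catalog';
```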

Inspect Snapshot History

Use paimon_snapshots to list all snapshots of a Paimon table — useful for auditing commit history, diagnosing data issues, or identifying a snapshot ID for time-travel queries:

SELECT snapshot_id, commit_kind, commit_time, total_record_count
FROM paimon_snapshots('./data/testdb.db/testtbl')
ORDER BY snapshot_id;
┌─────────────┬─────────────┬─────────────────────────┬────────────────────┐
│ snapshot_id │ commit_kind │       commit_time       │ total_record_count │
│    int64    │   varchar   │        timestamp        │       int64        │
├─────────────┼─────────────┼─────────────────────────┼────────────────────┤
│           1 │ APPEND      │ 2026-01-15 10:48:23.486 │                  3 │
│           2 │ APPEND      │ 2026-01-15 10:48:23.509 │                  6 │
│           3 │ APPEND      │ 2026-01-15 10:48:23.528 │                  9 │
└─────────────┴─────────────┴─────────────────────────┴────────────────────┘

-- SELECT snapshot_id, commit_kind, commit_time, total_record_count
-- FROM paimon_snapshots('oss://your-bucket/warehouse', 'your_db', 'your_table')
-- ORDER BY snapshot_id;
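For scripting, the newest snapshot ID can be picked up directly, e.g. as input to a time-travel query:

```sql
-- Latest committed snapshot of the sample table
SELECT max(snapshot_id) AS latest_snapshot
FROM paimon_snapshots('./data/testdb.db/testtbl');
```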

Time Travel Queries

Query a historical version of a table by snapshot ID or by timestamp. Use paimon_snapshots first to identify the snapshot you want.

-- Read from a specific snapshot (6 rows — state after the second append)
SELECT * FROM paimon_scan('./data/testdb.db/testtbl', snapshot_from_id=2);

-- Read from a point in time (returns the snapshot active at that moment)
SELECT * FROM paimon_scan('./data/testdb.db/testtbl', snapshot_from_timestamp=TIMESTAMP '2026-01-15 10:48:23.5');

When using an ATTACHed catalog, the same functionality is available via DuckDB's native AT clause:

ATTACH './data' AS my_catalog (TYPE paimon);

-- AT (VERSION => snapshot_id)
SELECT count(*) FROM my_catalog.testdb.testtbl AT (VERSION => 2);

-- AT (TIMESTAMP => point_in_time)
SELECT count(*) FROM my_catalog.testdb.testtbl AT (TIMESTAMP => TIMESTAMP '2026-01-15 10:48:23.5');
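Snapshots can also be compared against each other. Assuming append-only commits (as in the sample data), a plain set difference shows the rows a commit added; this is an illustrative sketch, not a dedicated API:

```sql
-- Rows appended between snapshot 1 and snapshot 2
SELECT * FROM paimon_scan('./data/testdb.db/testtbl', snapshot_from_id=2)
EXCEPT
SELECT * FROM paimon_scan('./data/testdb.db/testtbl', snapshot_from_id=1);
```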


Join the Community

We welcome contributions and discussions! If you have questions, ideas, or want to connect with other users and developers, join our DingTalk community group.

Generated from duckdb/extension-template