Add ClickHouse data loader #37

Open
fordN wants to merge 2 commits into main from ford/clickhouse-loader

fordN (Contributor) commented Feb 3, 2026

Load your data into a ClickHouse instance!

Summary

  • Add ClickHouse loader with native PyArrow integration via insert_arrow()
  • MergeTree table engine with bloom filter index on _amp_batch_id
  • Full Arrow ↔ ClickHouse type mapping
  • Reorg handling with synchronous mutations
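A rough sketch of the kind of SQL the MergeTree and reorg bullets describe. Everything here beyond `_amp_batch_id`, MergeTree, and the bloom filter index is an assumption: the helper names `build_create_table` and `build_reorg_delete`, the example columns, and the idea that a reorg deletes rows whose `_amp_batch_id` is in an invalidated set are all hypothetical, not taken from the PR. ClickHouse runs `ALTER TABLE ... DELETE` as an asynchronous mutation by default; the `mutations_sync` setting (1 = wait on the current server, 2 = wait on all replicas) is one way to make it synchronous.

```python
def build_create_table(table: str, columns: dict[str, str], order_by: str) -> str:
    """Hypothetical helper: MergeTree DDL with a bloom filter skip index
    on the _amp_batch_id column (which must appear in `columns`)."""
    cols = ',\n    '.join(f'{name} {ch_type}' for name, ch_type in columns.items())
    return (
        f'CREATE TABLE IF NOT EXISTS {table} (\n'
        f'    {cols},\n'
        f'    INDEX idx_amp_batch_id _amp_batch_id TYPE bloom_filter GRANULARITY 1\n'
        f') ENGINE = MergeTree()\n'
        f'ORDER BY {order_by}'
    )


def build_reorg_delete(table: str, batch_ids: list[str]) -> tuple[str, dict[str, int]]:
    """Hypothetical helper: a DELETE mutation for invalidated batches, plus the
    per-query settings that make the mutation wait until it has completed."""
    if not batch_ids:
        raise ValueError('batch_ids must not be empty')
    # Escape backslashes and single quotes for ClickHouse string literals.
    literals = ', '.join(
        "'" + b.replace('\\', '\\\\').replace("'", "\\'") + "'" for b in batch_ids
    )
    sql = f'ALTER TABLE {table} DELETE WHERE _amp_batch_id IN ({literals})'
    # mutations_sync = 2: block until the mutation finishes on all replicas.
    return sql, {'mutations_sync': 2}
```

The bloom filter index lets ClickHouse skip granules that cannot contain a given `_amp_batch_id`, which is what makes a point-lookup delete like this affordable on a table ordered by some other key.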

Depends on PR #32
Resolves #5

fordN self-assigned this Feb 3, 2026
fordN changed the base branch from main to ford/generalize-loader-tests February 3, 2026 01:26
fordN force-pushed the ford/clickhouse-loader branch 2 times, most recently from 53b0caa to b8287fa February 3, 2026 01:42
edgeandnode deleted a comment from github-actions bot Feb 3, 2026
fordN force-pushed the ford/clickhouse-loader branch from b8287fa to 042ef56 February 3, 2026 01:43
fordN marked this pull request as draft February 3, 2026 04:09
fordN added the loaders label Feb 5, 2026
Base automatically changed from ford/generalize-loader-tests to main February 9, 2026 22:00
fordN added 2 commits February 9, 2026 18:48
- Add `make test-clickhouse` target
- Fix testcontainer readiness check (remove broken wait_for_logs, rely on
  built-in HTTP health check)
- Import container classes independently so missing driver packages don't
  block other containers
- Fix ClickHouse test config credentials to match container defaults
- Remove calls to nonexistent _record_connection_opened/_closed methods
- Set supports_overwrite=False since ClickHouse only supports APPEND mode
- Fix base test_batch_loading to respect supports_overwrite config
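A minimal sketch of the "import container classes independently" change, assuming each container class is imported through its own guarded import. `optional_import` is a hypothetical helper; the module paths are those used by testcontainers-python, but whether the PR's test fixtures import them exactly this way is an assumption.

```python
import importlib
from typing import Any, Optional


def optional_import(module_name: str, attr: str) -> Optional[Any]:
    """Hypothetical helper: return module_name.attr, or None if the module
    (or a driver package it depends on) cannot be imported."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:  # also covers ModuleNotFoundError
        return None
    return getattr(module, attr, None)


# Each import is guarded on its own, so a missing driver for one backend
# leaves the other container classes usable.
ClickHouseContainer = optional_import('testcontainers.clickhouse', 'ClickHouseContainer')
PostgresContainer = optional_import('testcontainers.postgres', 'PostgresContainer')
```

A test module can then skip a backend whose class is `None` instead of failing at collection time.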
fordN force-pushed the ford/clickhouse-loader branch from f6b056a to 3184056 February 10, 2026 02:50
fordN marked this pull request as ready for review February 10, 2026 02:54
fordN requested a review from incrypto32 February 16, 2026 20:34
incrypto32 (Member) left a comment

LGTM! Just some minor comments that aren't really about this PR.

Comment on lines +41 to +43
```python
def __post_init__(self):
    if self.connection_params is None:
        self.connection_params = {}
```

Note for a future PR: this is consistent with the other loaders, but we should probably clean this up across the board and use `field(default_factory=dict)` instead.
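A minimal sketch of the suggested cleanup, assuming a hypothetical `ClickHouseConfig` dataclass that has at least the `host` and `connection_params` fields referenced in this PR. `field(default_factory=dict)` creates a fresh dict per instance, so the `None` default and the `__post_init__` fix-up both go away.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class ClickHouseConfig:
    """Hypothetical config shape; only `host` and `connection_params` mirror the PR."""
    host: str
    # default_factory builds a new dict for every instance, so no
    # __post_init__ is needed and no dict is shared between instances.
    connection_params: dict[str, Any] = field(default_factory=dict)
```

A side benefit is that the field can be typed as a plain `dict` rather than an `Optional` that is only ever `None` before `__post_init__` runs.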

Comment on lines +85 to +87
```python
def _get_required_config_fields(self) -> list[str]:
    """Return required configuration fields"""
    return ['host']  # Only host is truly required, others have defaults
```

Note for a future PR: all loaders could accept their typed config class directly instead of a raw dict. The base class already handles the conversion internally, so this would be a small change per loader. It would also eliminate `_get_required_config_fields`.
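A sketch of why a typed config makes `_get_required_config_fields` redundant: a dataclass field with no default is already required by the generated constructor. `BaseLoader`, `ClickHouseLoader`, `ClickHouseConfig`, and the `config_class` attribute below are hypothetical stand-ins, not the project's actual classes.

```python
from dataclasses import dataclass, field
from typing import Any, Generic, TypeVar


@dataclass
class ClickHouseConfig:
    host: str  # no default, so the dataclass constructor already requires it
    connection_params: dict[str, Any] = field(default_factory=dict)


ConfigT = TypeVar('ConfigT')


class BaseLoader(Generic[ConfigT]):
    config_class: type  # hypothetical: set by each concrete loader

    def __init__(self, config: 'ConfigT | dict[str, Any]') -> None:
        # Accept the typed config as-is; convert a raw dict only for
        # backwards compatibility. A missing required key raises TypeError.
        if isinstance(config, dict):
            config = self.config_class(**config)
        self.config = config


class ClickHouseLoader(BaseLoader[ClickHouseConfig]):
    config_class = ClickHouseConfig
```

Here `ClickHouseLoader({})` fails inside the dataclass constructor because `host` is missing, which is the check `_get_required_config_fields` performs by hand today.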

```python
# Declare loader capabilities
SUPPORTED_MODES = {LoadMode.APPEND}
REQUIRES_SCHEMA_MATCH = False
SUPPORTS_TRANSACTIONS = False  # ClickHouse uses eventual consistency
```

Minor: `SUPPORTS_TRANSACTIONS = False`, but `load_batch_transactional` is implemented and will be called regardless, since the base class doesn't check this flag. It looks like the flag is unused; is it meant to be purely decorative metadata?
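One way the flag could become meaningful is for the base class to dispatch on it. This is a hypothetical sketch: `SUPPORTS_TRANSACTIONS` and `load_batch_transactional` come from the review, while `load_batch`, `load_batch_non_transactional`, and the string return values are illustrative stand-ins.

```python
from typing import Any


class BaseLoader:
    SUPPORTS_TRANSACTIONS = False

    def load_batch(self, batch: Any) -> str:
        # Consult the capability flag instead of always taking one path.
        if self.SUPPORTS_TRANSACTIONS:
            return self.load_batch_transactional(batch)
        return self.load_batch_non_transactional(batch)

    def load_batch_transactional(self, batch: Any) -> str:
        raise NotImplementedError

    def load_batch_non_transactional(self, batch: Any) -> str:
        raise NotImplementedError


class ClickHouseLoader(BaseLoader):
    SUPPORTS_TRANSACTIONS = False  # ClickHouse uses eventual consistency

    def load_batch_transactional(self, batch: Any) -> str:
        return 'transactional'

    def load_batch_non_transactional(self, batch: Any) -> str:
        return 'non-transactional'
```

The alternative is to drop the flag entirely; either way, the declared capability and the code path actually taken would then agree.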

2 participants