Commit e1397c0

materialize-clickhouse: new connector
1 parent d9acbbb commit e1397c0

3 files changed

Lines changed: 97 additions & 0 deletions

Lines changed: 92 additions & 0 deletions
@@ -0,0 +1,92 @@
# ClickHouse

This connector materializes Estuary collections into tables in a ClickHouse database.

[ClickHouse](https://clickhouse.com/) is a column-oriented OLAP database designed for real-time analytics. This connector writes directly to ClickHouse using the native protocol.

Estuary also provides a [Dekaf-based integration](./Dekaf/clickhouse.md) for users who prefer to ingest via ClickPipes.
## Prerequisites

To use this connector, you'll need:

* A ClickHouse database (self-hosted or ClickHouse Cloud) with a user that has permissions to create tables and write data.
* Network access to the ClickHouse native protocol (port 9000 by default); the connector does not use the HTTP interface on port 8123.
* At least one Estuary collection.

:::tip
If you haven't yet captured your data from its external source, start at the beginning of the [guide to create a dataflow](../../../guides/create-dataflow.md). You'll be referred back to this connector-specific documentation at the appropriate steps.
:::
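The database user from the first prerequisite can be provisioned ahead of time. This is a minimal sketch, assuming the hypothetical names `flow_user` and `my_database`; the exact set of grants your deployment needs may differ:

```sql
-- Hypothetical user and database names; run as a ClickHouse admin.
CREATE USER IF NOT EXISTS flow_user IDENTIFIED BY 'secret';

-- Allow the connector to create tables and write data in the target database.
GRANT CREATE TABLE, SELECT, INSERT ON my_database.* TO flow_user;
```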
## Configuration

To use this connector, begin with data in one or more Estuary collections. Use the properties below to configure a ClickHouse materialization, which will direct the contents of these collections into ClickHouse tables.

### Properties

#### Endpoint
| Property | Title | Description | Type | Required/Default |
|---|---|---|---|---|
| **`/address`** | Address | Host and port of the database, in the form `host[:port]`. Port 9000 is used if no port is specified. | string | Required |
| **`/credentials`** | Authentication | | object | Required |
| **`/credentials/auth_type`** | Auth Type | Authentication type. Must be `user_password`. | string | Required |
| **`/credentials/username`** | Username | Database username. | string | Required |
| **`/credentials/password`** | Password | Database password. | string | Required |
| **`/database`** | Database | Name of the ClickHouse database to materialize to. | string | Required |
| `/hardDelete` | Hard Delete | If enabled, the connector inserts tombstone rows with `_is_deleted = 1` when source documents are deleted, causing them to be excluded from `FINAL` queries. By default, source deletions are ignored at the destination. | boolean | `false` |
#### Bindings

| Property | Title | Description | Type | Required/Default |
|---|---|---|---|---|
| **`/table`** | Table | Name of the database table to materialize to. The connector will create the table if it doesn't already exist. | string | Required |
### Sample

```yaml
materializations:
  ${PREFIX}/${mat_name}:
    endpoint:
      connector:
        config:
          address: clickhouse.example.com:9000
          credentials:
            auth_type: user_password
            username: flow_user
            password: secret
          database: my_database
        image: ghcr.io/estuary/materialize-clickhouse:v1
    bindings:
      - resource:
          table: my_table
        source: ${PREFIX}/${source_collection}
```
## ReplacingMergeTree and FINAL

The connector creates tables using the [ReplacingMergeTree engine](https://clickhouse.com/docs/engines/table-engines/mergetree-family/replacingmergetree). Updated records are inserted as new rows alongside the old versions; ClickHouse deduplicates them later during background merges.

Your queries should use the `FINAL` modifier to get deduplicated results, and include the predicate `_is_deleted = 0` to ignore deleted records:

```sql
SELECT * FROM my_table FINAL WHERE _is_deleted = 0;
```
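To see why `FINAL` matters, compare a plain scan with a `FINAL` scan. This is a sketch assuming the `my_table` example; exact counts depend on merge timing:

```sql
-- May count multiple versions of the same row that background
-- merges have not yet collapsed, plus tombstone rows:
SELECT count() FROM my_table;

-- Deduplicated, with deleted records excluded:
SELECT count() FROM my_table FINAL WHERE _is_deleted = 0;
```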
## Hard deletes

All tables are created with `_version` (UInt64) and `_is_deleted` (UInt8) columns, which are used internally by the `ReplacingMergeTree` engine.

If you set `hardDelete: true` in the endpoint configuration, the connector inserts a **tombstone row** when a source document is deleted. The tombstone has `_is_deleted = 1`, the same key columns as the original row, and zero values for all other columns. The `ReplacingMergeTree` engine uses `_is_deleted` to hide these rows from `FINAL` queries, and eventually removes the tombstoned records from the table during background merges.
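For illustration only (the connector's generated DDL may differ), a table with a hypothetical key column `id` and one value column could take roughly this shape, passing `_version` and `_is_deleted` as the engine's version and deletion-marker parameters:

```sql
-- Hypothetical sketch of a connector-managed table.
CREATE TABLE my_database.my_table
(
    id          UInt64,
    name        String,
    _version    UInt64,   -- highest version wins during deduplication
    _is_deleted UInt8     -- 1 marks a tombstone hidden from FINAL queries
)
ENGINE = ReplacingMergeTree(_version, _is_deleted)
ORDER BY id;
```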
## Soft deletes not supported

This connector does not support soft deletes. Unless hard deletes are enabled, source deletions are ignored at the destination.
## Delta updates not supported

This connector does not support [delta updates](/concepts/materialization/#delta-updates). Only standard (merge) mode is supported.

site/docs/reference/Connectors/materialization-connectors/Dekaf/clickhouse.md

Lines changed: 2 additions & 0 deletions
@@ -3,6 +3,8 @@
This connector materializes Estuary collections as Kafka-compatible messages that a ClickHouse Kafka consumer can read. [ClickHouse](https://clickhouse.com/) is a real-time analytical database and warehouse.

Estuary also provides a [direct materialization with ClickHouse](../ClickHouse.md).

## Prerequisites

To use this connector, you'll need:

site/docs/reference/Connectors/materialization-connectors/README.md

Lines changed: 3 additions & 0 deletions
@@ -59,6 +59,9 @@ In the future, other open-source materialization connectors from third parties c
* Bytewax
  * [Configuration](./Dekaf/bytewax.md)
* ClickHouse
  * [Configuration](./ClickHouse.md)
  * Package - ghcr.io/estuary/materialize-clickhouse:v1
* ClickHouse (Dekaf)
  * [Configuration](./Dekaf/clickhouse.md)
* CSV Files in GCS
  * [Configuration](./google-gcs-csv.md)
