A Zig binding for chdb - the embedded ClickHouse database engine. This library provides a safe and convenient way to interact with ClickHouse directly from Zig, leveraging the language's memory safety features and type system.
chdb-zig wraps the C API of chdb, giving you access to a full-featured SQL database that runs in-process without needing to manage a separate server. Whether you need to query Parquet files, create in-memory tables, or perform complex analytical queries, chdb-zig makes it straightforward.
Here's a simple example that creates a table and queries it:
    const std = @import("std");
    const chdb_zig = @import("chdb_zig");

    pub fn main() !void {
        var gpa: std.heap.GeneralPurposeAllocator(.{}) = .{};
        const allocator = gpa.allocator();
        defer _ = gpa.deinit();

        // Initialize a connection with options
        const options = chdb_zig.ChdbConnectionOptions{
            .UseMultiQuery = true,
            .Path = "my_database.db",
        };
        const conn = try chdb_zig.initConnection(allocator, options);
        defer conn.deinit();

        // Create a table
        try conn.execute(@constCast("CREATE TABLE IF NOT EXISTS test (id Int32, name String) " ++
            "ENGINE = MergeTree() ORDER BY id"));
        try conn.execute(@constCast("INSERT INTO test (id,name) VALUES (1,'Alice'), (2,'Bob')"));

        // Query the database
        var result = try conn.query(@constCast("SELECT * FROM test"));
        // Register cleanup before the early return so a failed result is released too
        defer result.deinit();
        if (!result.isSuccess()) {
            std.debug.print("Query failed: {?s}\n", .{result.getError()});
            return;
        }

        // Iterate through results
        var iter = result.iter(allocator);
        while (iter.nextRow()) |row| {
            std.debug.print("Row: {s}\n", .{row});
        }
    }

chdb can also query remote data sources directly. Here's an example that loads a Parquet file from a URL:
    const query =
        \\CREATE TABLE IF NOT EXISTS parquet_data ENGINE = MergeTree()
        \\ORDER BY tuple()
        \\AS SELECT * FROM url('https://datasets.clickhouse.com/hits_compatible/athena_partitioned/hits_0.parquet');
    ;
    try conn.execute(@constCast(query));

    // Now query the data
    var result = try conn.query(@constCast(
        "SELECT URL, COUNT(*) FROM parquet_data " ++
            "GROUP BY URL ORDER BY COUNT(*) DESC LIMIT 10"
    ));
    defer result.deinit();

    // Access query statistics
    std.debug.print("Elapsed time: {d}ms\n", .{result.elapsedTime()});
    std.debug.print("Rows read: {d}\n", .{result.rowsRead()});
    std.debug.print("Bytes read: {d}\n", .{result.bytesRead()});

The ChdbConnectionOptions struct configures how the connection behaves:
UseMultiQuery - Enable support for multiple queries in a single statement
Path - File path for persistent storage (omit for an in-memory database)
LogLevel - Set logging verbosity (e.g., "debug", "info")
CustomArgs - Pass additional command-line arguments to chdb

A connection provides the following methods:

execute() - Run a query and discard the result
query() - Run a query and return results
queryStreaming() - Stream large result sets

After executing a query, you get a ChdbResult object with the following methods:
iter() - Get an iterator over result rows (NDJSON format)
isSuccess() - Check if the query succeeded
getError() - Retrieve the error message if the query failed
elapsedTime() - Query execution time
rowsRead() - Number of rows processed
bytesRead() - Number of bytes read
storageRowsRead() - Rows read from disk
storageBytesRead() - Bytes read from disk

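Since rows come back as NDJSON, each row is one JSON object on its own line. The sketch below shows how such a line can be turned into a typed value with the standard library's std.json. It does not use chdb-zig at all: the User struct and the parseUser helper are hypothetical names chosen to match the test table from the first example, and the sample row is an assumption about what the output looks like.

```zig
const std = @import("std");

// Hypothetical row type matching the `test` table (id Int32, name String).
const User = struct {
    id: i32,
    name: []const u8,
};

/// Parses one NDJSON line into a `User`.
/// The caller must call `deinit()` on the returned value to free it.
fn parseUser(allocator: std.mem.Allocator, line: []const u8) !std.json.Parsed(User) {
    return std.json.parseFromSlice(User, allocator, line, .{
        // Tolerate columns that the struct does not declare.
        .ignore_unknown_fields = true,
    });
}

pub fn main() !void {
    var gpa: std.heap.GeneralPurposeAllocator(.{}) = .{};
    defer _ = gpa.deinit();

    // Assumed shape of a single row; real output depends on the query.
    const parsed = try parseUser(gpa.allocator(), "{\"id\":1,\"name\":\"Alice\"}");
    defer parsed.deinit();

    std.debug.print("id={d} name={s}\n", .{ parsed.value.id, parsed.value.name });
}
```

With the default parse options, string fields may point into the input line rather than into freshly allocated memory, so the line has to outlive the parsed value.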
The iterator returned by result.iter() provides several methods for traversing and processing query results. For optimal performance with multiple allocations, use an arena allocator:
    var arena = std.heap.ArenaAllocator.init(allocator);
    defer arena.deinit();
    var iter = result.iter(arena.allocator());

Basic Iteration:
nextRow() - Get next row as a raw slice (zero-copy, no allocation)
nextAs(T) - Get next row parsed as type T (allocates)
rowCount() - Total number of rows in result
reset() - Reset iterator to beginning
rowAt(index) - Get row at a specific index (zero-copy)
maxMemoryUsage() - Maximum memory needed for results

Batch Operations:
takeOwned(count) - Take next N rows (allocates owned slice)
takeAsOwned(T, count) - Take next N rows parsed as type T
sliceOwned(start, end) - Get rows in range as owned slice
sliceAsOwned(T, start, end) - Get rows in range parsed as type T
selectOwned(predicate) - Filter rows with predicate function
selectAsOwned(T, predicate) - Filter and parse rows with predicate

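To make the idea of predicate-based selection concrete, here is a standalone sketch that filters raw NDJSON lines with a predicate function and returns an owned slice. It is not the library's implementation, and the exact signature of selectOwned is not shown in this README; selectLines and mentionsAlice are hypothetical names used only for illustration.

```zig
const std = @import("std");

/// Returns the non-empty lines of `ndjson` for which `predicate` returns true.
/// The outer slice is owned by the caller; each element still points into `ndjson`.
fn selectLines(
    allocator: std.mem.Allocator,
    ndjson: []const u8,
    predicate: *const fn ([]const u8) bool,
) ![][]const u8 {
    // First pass: count matches so the result is allocated exactly once.
    var count: usize = 0;
    var it = std.mem.splitScalar(u8, ndjson, '\n');
    while (it.next()) |line| {
        if (line.len != 0 and predicate(line)) count += 1;
    }

    const out = try allocator.alloc([]const u8, count);

    // Second pass: copy the matching line slices into the result.
    var i: usize = 0;
    it.reset();
    while (it.next()) |line| {
        if (line.len != 0 and predicate(line)) {
            out[i] = line;
            i += 1;
        }
    }
    return out;
}

// Example predicate: keep rows whose raw text contains "Alice".
fn mentionsAlice(line: []const u8) bool {
    return std.mem.indexOf(u8, line, "Alice") != null;
}

pub fn main() !void {
    var gpa: std.heap.GeneralPurposeAllocator(.{}) = .{};
    defer _ = gpa.deinit();
    const allocator = gpa.allocator();

    const data =
        \\{"id":1,"name":"Alice"}
        \\{"id":2,"name":"Bob"}
    ;
    const rows = try selectLines(allocator, data, mentionsAlice);
    defer allocator.free(rows);

    for (rows) |row| std.debug.print("match: {s}\n", .{row});
}
```

Only the outer slice is allocated here, which mirrors the "owned slice" wording above: the caller frees the slice, while the row bytes stay in the original buffer.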
Arena Allocators:
Methods ending in Owned or using generic type parsing (As) perform allocations. For best performance, pass an arena allocator to iter(). This allows all allocations to be freed in a single call to deinit(), rather than fragmenting memory with individual allocations:
    // Good - all allocations freed together
    var arena = std.heap.ArenaAllocator.init(allocator);
    defer arena.deinit();
    var iter = result.iter(arena.allocator());
    const rows = try iter.takeAsOwned(User, 100);
    // rows is valid until arena.deinit()

Without an arena, allocations may fragment memory and you'll need to manage cleanup individually.
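The arena pattern can be tried without a database. The following sketch parses several NDJSON lines into structs using only the standard library and releases every allocation with a single arena.deinit(). The User struct and the parseAll helper are hypothetical, and this is not how chdb-zig parses rows internally; it only illustrates why one arena is convenient when many small allocations share a lifetime.

```zig
const std = @import("std");

// Hypothetical row type used for illustration.
const User = struct {
    id: i32,
    name: []const u8,
};

/// Parses every non-empty NDJSON line into a `User`, allocating only from `arena`.
/// Nothing is freed individually; the caller releases everything via the arena.
fn parseAll(arena: std.mem.Allocator, ndjson: []const u8) ![]User {
    // Count rows first so the result slice is allocated once.
    var count: usize = 0;
    var it = std.mem.splitScalar(u8, ndjson, '\n');
    while (it.next()) |line| {
        if (line.len != 0) count += 1;
    }

    const users = try arena.alloc(User, count);
    var i: usize = 0;
    it.reset();
    while (it.next()) |line| {
        if (line.len == 0) continue;
        users[i] = try std.json.parseFromSliceLeaky(User, arena, line, .{
            // Copy strings into the arena so they do not alias the input buffer.
            .allocate = .alloc_always,
        });
        i += 1;
    }
    return users;
}

pub fn main() !void {
    var gpa: std.heap.GeneralPurposeAllocator(.{}) = .{};
    defer _ = gpa.deinit();

    var arena = std.heap.ArenaAllocator.init(gpa.allocator());
    defer arena.deinit(); // frees the slice and every parsed string at once

    const users = try parseAll(
        arena.allocator(),
        "{\"id\":1,\"name\":\"Alice\"}\n{\"id\":2,\"name\":\"Bob\"}\n",
    );
    for (users) |u| std.debug.print("{d}: {s}\n", .{ u.id, u.name });
}
```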
This library uses Zig's allocator pattern. You should always defer cleanup:
    var gpa: std.heap.GeneralPurposeAllocator(.{}) = .{};
    const allocator = gpa.allocator();
    defer _ = gpa.deinit();

    const conn = try chdb_zig.initConnection(allocator, options);
    defer conn.deinit();

    var result = try conn.query(@constCast(query));
    defer result.deinit();

Add chdb-zig to your build.zig.zon:
    .dependencies = .{
        .chdb_zig = .{
            .url = "https://github.com/s0und0fs1lence/chdb-zig/archive/refs/tags/0.0.4.tar.gz",
            // The hash below is a placeholder, not a real value.
            // Running `zig fetch --save <url>` records the correct hash for you.
            .hash = "12200c7a3c6b8e9f1d2a3b4c5d6e7f8g9h0i1j2k3l4m5n6o7p8q9r0s1t2u3v4w5x6y7z8",
        },
    },

In your build.zig, add the dependency to your executable:
    const chdb_dep = b.dependency("chdb_zig", .{
        .target = target,
        .optimize = optimize,
    });

    // Get the module from the dependency
    const chdb_module = chdb_dep.module("chdb_zig");
    chdb_module.link_libc = true;

    // Add the module to your executable's imports
    exe.root_module.addImport("chdb_zig", chdb_module);

Now you can import and use chdb-zig in your code:
    const chdb_zig = @import("chdb_zig");

Contributions are welcome. Feel free to open issues or submit pull requests.
Licensed under the Apache License, Version 2.0. See the LICENSE file for details.