guillesd/duckdb-streaming-patterns

DuckDB Streaming Patterns

This repository demonstrates several patterns for streaming data from Kafka into DuckDB (and DuckLake) using Python and Spark. It provides practical examples for ingesting, processing, and aggregating event data in real time, suitable for analytics and for prototyping modern data pipelines. For a full write-up and context, read the blog post (TODO).

Repository Structure

Getting Started

Prerequisites

  • Python 3.8+
  • Docker (for running Kafka)
  • Java (for Spark)
  • Install dependencies:
    pip install -r requirements.txt
    

Start Kafka Broker

Start the Kafka broker using Docker Compose:

docker compose up -d

Produce Events

Run the producer to generate random user events:

python scripts/producer.py --bootstrap-servers localhost:9092 --topic my_topic --duration 60
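
For orientation, a producer of this kind could look roughly like the sketch below. It is not the repository's scripts/producer.py: the event fields, the EVENT_TYPES list, the throttle rate, and the choice of the confluent-kafka client are all assumptions made for illustration.

```python
import json
import random
import time
import uuid

# Hypothetical event vocabulary; the repository's producer may use other fields.
EVENT_TYPES = ["page_view", "click", "purchase"]


def make_event(rng: random.Random, now: float) -> dict:
    """Build one random user event as a plain dict."""
    return {
        "event_id": str(uuid.UUID(int=rng.getrandbits(128))),
        "user_id": rng.randint(1, 100),
        "event_type": rng.choice(EVENT_TYPES),
        "ts": now,
    }


def encode_event(event: dict) -> bytes:
    """Serialize an event to UTF-8 JSON, the payload sent to Kafka."""
    return json.dumps(event).encode("utf-8")


def produce(bootstrap_servers: str, topic: str, duration: float) -> int:
    """Send random events to `topic` for `duration` seconds; return the count."""
    # Imported lazily so the helpers above work without a Kafka client installed.
    # Assumption: the confluent-kafka package is the client in use.
    from confluent_kafka import Producer

    producer = Producer({"bootstrap.servers": bootstrap_servers})
    rng = random.Random()
    sent = 0
    deadline = time.time() + duration
    while time.time() < deadline:
        event = make_event(rng, time.time())
        producer.produce(topic, key=str(event["user_id"]), value=encode_event(event))
        producer.poll(0)  # serve delivery callbacks without blocking
        sent += 1
        time.sleep(0.01)  # throttle to roughly 100 events per second
    producer.flush()  # block until all queued messages are delivered
    return sent
```

Calling produce("localhost:9092", "my_topic", 60) would mirror the command above, assuming a broker is reachable at that address.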

Run a Streaming Pattern

Example: Run Pattern 1.1 (DuckDB streaming and aggregation)

python pipelines/pattern_1_1.py --bootstrap-servers localhost:9092 --topic my_topic --duration 60
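
The general idea of streaming into DuckDB and aggregating can be sketched as micro-batches: collect a handful of Kafka messages, insert them in one call, then query. The sketch below is an illustration under assumptions, not the contents of pipelines/pattern_1_1.py; the events table, its columns, and the events.duckdb file name are hypothetical. The helpers accept any connection object that offers execute and executemany, as DuckDB's Python connection does.

```python
import json


def ensure_table(con) -> None:
    """Create the (hypothetical) events table if it does not exist yet."""
    con.execute(
        "CREATE TABLE IF NOT EXISTS events ("
        "user_id INTEGER, event_type VARCHAR, ts DOUBLE)"
    )


def insert_batch(con, messages) -> int:
    """Decode JSON payloads and insert them in one call; return the row count."""
    rows = []
    for raw in messages:
        event = json.loads(raw)
        rows.append((event["user_id"], event["event_type"], event["ts"]))
    if rows:  # skip the call entirely for an empty micro-batch
        con.executemany("INSERT INTO events VALUES (?, ?, ?)", rows)
    return len(rows)


def counts_by_type(con):
    """Aggregate the events ingested so far: one (event_type, count) per type."""
    return con.execute(
        "SELECT event_type, COUNT(*) AS n FROM events "
        "GROUP BY event_type ORDER BY event_type"
    ).fetchall()


def open_database(path: str = "events.duckdb"):
    """Open a DuckDB database file and make sure the table exists."""
    import duckdb  # imported lazily; requires the duckdb package

    con = duckdb.connect(path)
    ensure_table(con)
    return con
```

A consumer loop would call insert_batch once per polled batch and counts_by_type whenever fresh aggregates are needed. Batching matters here because row-at-a-time inserts are generally a poor fit for a columnar engine.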

Other patterns can be run similarly (add --bootstrap-servers, --topic, and --duration as needed):

  • DuckLake:
    python pipelines/pattern_1_2.py --bootstrap-servers localhost:9092 --topic my_topic --duration 60
    
  • Spark:
    python pipelines/pattern_2.py --bootstrap-servers localhost:9092 --topic my_topic --duration 60
    
  • Tributary:
    python pipelines/bonus_pattern.py --bootstrap-servers localhost:9092 --topic my_topic --duration 60
    

Cleanup

To remove generated databases and Kafka topics:

python scripts/cleanup.py --bootstrap-servers localhost:9092 --topic my_topic
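
As a rough sketch of what such a cleanup involves (not the repository's scripts/cleanup.py), the two steps are deleting local database files and deleting the topic through Kafka's admin API. The file patterns and the use of the confluent-kafka AdminClient are assumptions.

```python
from pathlib import Path


def remove_databases(directory: str, patterns=("*.duckdb", "*.duckdb.wal")) -> list:
    """Delete files matching the (assumed) database patterns; return their names."""
    removed = []
    for pattern in patterns:
        for path in sorted(Path(directory).glob(pattern)):
            if path.is_file():
                path.unlink()
                removed.append(path.name)
    return removed


def delete_topic(bootstrap_servers: str, topic: str, timeout: float = 30.0) -> None:
    """Delete a Kafka topic and wait for the broker to confirm."""
    # Assumption: the confluent-kafka package is installed.
    from confluent_kafka.admin import AdminClient

    admin = AdminClient({"bootstrap.servers": bootstrap_servers})
    futures = admin.delete_topics([topic])  # maps topic name to a future
    for future in futures.values():
        future.result(timeout)  # raises if the deletion failed
```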

Notes

  • The default Kafka topic is my_topic. You can change this via command-line arguments.
  • DuckLake and Tributary extensions are installed automatically by the scripts.
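
The command-line arguments used throughout this README could be declared with argparse along the following lines. This is a minimal sketch: only the my_topic default is stated above, while the localhost:9092 and 60-second defaults are assumptions taken from the example commands.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Declare the three flags shared by the example commands."""
    parser = argparse.ArgumentParser(description="Run a streaming pattern.")
    parser.add_argument(
        "--bootstrap-servers",
        default="localhost:9092",  # assumed default, taken from the examples
        help="Kafka bootstrap servers as host:port",
    )
    parser.add_argument(
        "--topic",
        default="my_topic",  # documented default topic
        help="Kafka topic to read from or write to",
    )
    parser.add_argument(
        "--duration",
        type=int,
        default=60,  # assumed default, in seconds
        help="How long to run, in seconds",
    )
    return parser
```

With this parser, build_parser().parse_args(["--topic", "other_topic"]) overrides the topic and leaves the other two values at their defaults.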

About

Some ways in which you can use DuckDB for streaming analytics
