
Add state to track topic offsets and read more efficiently #12

@guillesd

Right now, every time a scan is triggered, a new consumer reads the whole topic. For large volumes you would ideally keep some state so that a later scan of the same topic loads only the messages that arrived since the previous scan. I'm thinking of this in the context of a view, so that views become efficient (this is close to a materialized view on top of a stream?):

create view raw_events as select * exclude message, decode(message)::json as message FROM tributary_scan_topic('my_topic', "bootstrap.servers" := 'localhost:9092');
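
A minimal sketch of the kind of state described above, written in Python rather than as extension code. It assumes a topic can be modeled as a mapping from partition number to a list of messages, where a message's offset is its list index. For each (topic, partition) it stores the next offset to read, following Kafka's convention that a committed offset points at the first unread message. `OffsetState` and `incremental_scan` are hypothetical names and are not part of tributary.

```python
from dataclasses import dataclass, field

# Hypothetical stand-in for a Kafka topic: partition id -> list of messages.
# A message's offset is its index in the partition's list.
Topic = dict[int, list[bytes]]


@dataclass
class OffsetState:
    """Next offset to read per (topic, partition), Kafka-style."""
    next_offsets: dict[tuple[str, int], int] = field(default_factory=dict)

    def get(self, topic: str, partition: int) -> int:
        # No stored state yet means the scan starts at offset 0.
        return self.next_offsets.get((topic, partition), 0)

    def commit(self, topic: str, partition: int, next_offset: int) -> None:
        self.next_offsets[(topic, partition)] = next_offset


def incremental_scan(name: str, topic: Topic, state: OffsetState) -> list[tuple[int, int, bytes]]:
    """Return (partition, offset, message) rows that earlier scans did not see."""
    rows = []
    for partition, messages in sorted(topic.items()):
        start = state.get(name, partition)
        for offset in range(start, len(messages)):
            rows.append((partition, offset, messages[offset]))
        # Remember where this scan stopped so the next one skips these rows.
        state.commit(name, partition, len(messages))
    return rows


topic = {0: [b"a", b"b"], 1: [b"c"]}
state = OffsetState()
first = incremental_scan("my_topic", topic, state)   # all three messages
topic[0].append(b"d")                                 # a new message arrives
second = incremental_scan("my_topic", topic, state)  # [(0, 2, b'd')]
```

The second scan returns only the message produced after the first one, which is the behavior that would make a view like `raw_events` cheap to query repeatedly.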

Scans in statements that are not associated with a view should perhaps still consume the topic from offset 0. I'm unsure what is best here.
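
One possible policy, sketched under the assumption that stored offsets are keyed by view name as well as by topic and partition: a scan issued through a view resumes from that view's stored offset, while an ad hoc scan ignores stored state and starts at offset 0. `resolve_start_offset` and the key layout are hypothetical.

```python
from typing import Optional

EARLIEST = 0  # offset 0: read the partition from the beginning


def resolve_start_offset(
    stored: dict[tuple[str, str, int], int],
    topic: str,
    partition: int,
    view_name: Optional[str],
) -> int:
    """Choose where a scan starts reading one partition.

    Hypothetical policy: scans issued through a view resume from the offset
    stored under that view's name; ad hoc scans (view_name is None) always
    start from offset 0 and never touch stored state.
    """
    if view_name is None:
        return EARLIEST
    return stored.get((view_name, topic, partition), EARLIEST)


stored = {("raw_events", "my_topic", 0): 42}
resolve_start_offset(stored, "my_topic", 0, "raw_events")  # 42
resolve_start_offset(stored, "my_topic", 0, None)          # 0
```

Keying by view name gives each view over the same topic its own cursor, so two views never steal messages from each other; the trade-off is one stored offset per view and partition.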

Persisted state would also make it possible to resume reading a topic after a process has been interrupted:

  • A DuckDB process reads from a Kafka topic and persists its state (maybe as a table).
  • The process is interrupted.
  • The process restarts, loads the state from that table, and continues where it left off?
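
The three steps above can be sketched with a small offsets table. This assumes a hypothetical table `tributary_offsets(topic, partition_id, next_offset)` and uses Python's built-in `sqlite3` as a stand-in for a DuckDB table; the schema and helper names are illustrative only.

```python
import os
import sqlite3
import tempfile

# Hypothetical schema; sqlite3 stands in for a persistent DuckDB table.
SCHEMA = """
CREATE TABLE IF NOT EXISTS tributary_offsets (
    topic        TEXT    NOT NULL,
    partition_id INTEGER NOT NULL,
    next_offset  INTEGER NOT NULL,
    PRIMARY KEY (topic, partition_id)
)
"""


def open_state(path: str) -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def save_offsets(conn: sqlite3.Connection, topic: str, offsets: dict[int, int]) -> None:
    """Persist the next offset to read for each partition of `topic`."""
    with conn:  # commits the transaction if no exception is raised
        conn.executemany(
            "INSERT OR REPLACE INTO tributary_offsets VALUES (?, ?, ?)",
            [(topic, p, o) for p, o in offsets.items()],
        )


def load_offsets(conn: sqlite3.Connection, topic: str) -> dict[int, int]:
    """Load saved offsets; partitions without a row are absent (treat as 0)."""
    rows = conn.execute(
        "SELECT partition_id, next_offset FROM tributary_offsets "
        "WHERE topic = ? ORDER BY partition_id",
        (topic,),
    ).fetchall()
    return dict(rows)


db_path = os.path.join(tempfile.mkdtemp(), "state.db")

conn = open_state(db_path)                       # process reads some messages
save_offsets(conn, "my_topic", {0: 120, 1: 87})  # and persists its state
conn.close()                                     # process is interrupted

conn = open_state(db_path)                       # process restarts
resumed = load_offsets(conn, "my_topic")         # {0: 120, 1: 87}
conn.close()
```

When the offsets are saved matters: saving them before the scanned rows are durably handled can lose messages after a crash, while saving them afterwards can replay some messages on restart.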

Anyway, these are just some thoughts! Thanks.
