feat: add realtime-box with Lookup Catalog sync workflows #440

Open
toru-takahashi wants to merge 1 commit into treasure-data:main from toru-takahashi:feat/realtime-box

@toru-takahashi

Summary

Adds a new realtime-box/ directory for Treasure AI RT 2.0 workflow templates, starting with the Lookup Catalog sync workflow.

realtime-box/lookup-catalog-sync/

Two variants for syncing cdp_lookup_catalog tables to RT 2.0 internal storage:

manual/ — for environments with fewer than 5 tables or where explicit column control is needed

  • lookup_catalog_sync.dig — iterates over a configured table list
  • queries/ — SQL for digest initialization, change extraction, count check, and digest update

table-discovery/ — for environments with many tables or frequently changing schemas (requires additional feature flag, contact Treasure AI Support)

  • lookup_catalog_sync.dig — auto-discovers tables via information_schema
  • sync_table.dig — reusable single-table sync logic
  • scripts/generate_sql.py — type-aware JSON payload SQL generator supporting array<varchar>, array<bigint>, array<double>, scalar float artifact fix, and NULL element preservation
  • queries/discover_tables.sql — excludes _wf_* internal tables
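
To illustrate what "type-aware" payload generation could look like, here is a minimal sketch. It is not the contents of scripts/generate_sql.py. It assumes a Trino-style engine where `CAST(... AS JSON)`, `MAP(ARRAY[...], ARRAY[...])`, and `json_format` are available, and the helper names `json_value_expr` and `payload_sql` are hypothetical. It covers only the type dispatch and does not attempt the scalar float artifact fix.

```python
import re

# Types this sketch accepts; the real generator may support more.
SCALAR_TYPES = {"varchar", "bigint", "double"}
ARRAY_TYPE = re.compile(r"^array<(varchar|bigint|double)>$")


def quote_ident(name: str) -> str:
    """Double-quote an identifier, escaping embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def json_value_expr(column: str, data_type: str) -> str:
    """Return a SQL expression rendering one column as a JSON value.

    Assumption: casting an array to JSON keeps NULL elements as JSON null,
    which is the documented Trino behavior.
    """
    normalized = data_type.strip().lower().replace(" ", "")
    if normalized in SCALAR_TYPES or ARRAY_TYPE.match(normalized):
        return f"CAST({quote_ident(column)} AS JSON)"
    raise ValueError(f"unsupported type for column {column!r}: {data_type}")


def payload_sql(columns: list[tuple[str, str]]) -> str:
    """Build one expression that serializes the given columns as a JSON object."""
    keys = ", ".join("'" + name.replace("'", "''") + "'" for name, _ in columns)
    values = ", ".join(json_value_expr(name, dtype) for name, dtype in columns)
    return f"json_format(CAST(MAP(ARRAY[{keys}], ARRAY[{values}]) AS JSON))"
```

Rejecting unknown types up front, rather than falling back to a string cast, means a schema change surfaces as an error at SQL generation time instead of as a malformed payload.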

Both variants use hash-based change detection so only changed rows are uploaded on each run, and use a consistent _wf_ prefix for all internal/temporary tables.
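
The hash-based change detection described above can be sketched roughly as follows. This is an illustration only: the workflows in this PR do the work in SQL against a digest table, whereas this sketch uses in-memory Python dicts, and `row_digest` and `detect_changes` are hypothetical names.

```python
import hashlib
import json


def row_digest(row: dict) -> str:
    """Hash a row in a stable way: sorted keys, compact separators."""
    payload = json.dumps(row, sort_keys=True, separators=(",", ":"), default=str)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def detect_changes(rows: list[dict], key_column: str, stored_digests: dict):
    """Return (rows whose digest is new or different, refreshed digest map)."""
    changed = []
    new_digests = {}
    for row in rows:
        key = row[key_column]
        digest = row_digest(row)
        new_digests[key] = digest
        # Upload only when the key is unseen or its content hash differs.
        if stored_digests.get(key) != digest:
            changed.append(row)
    return changed, new_digests


# First run: nothing stored yet, so every row counts as changed.
rows = [{"id": "a", "price": 10}, {"id": "b", "price": 20}]
changed, digests = detect_changes(rows, "id", {})

# Second run: only the row whose content changed is selected.
rows[1] = {"id": "b", "price": 25}
changed_again, digests = detect_changes(rows, "id", digests)
```

Persisting the digest map only after a successful upload keeps a failed run from hiding changed rows on the next run.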

Test plan

  • Run manual/ workflow against a test cdp_lookup_catalog table and verify records appear in RT 2.0
  • Run table-discovery/ workflow and verify automatic table detection works
  • Verify incremental run uploads only changed records
  • Verify NULL key rows cause a clear error before upload
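
The last check could be expressed as a small guard that runs before any upload. The sketch below is hypothetical: the function name, the error message, and the use of Python rather than the workflow's count-check SQL are all assumptions.

```python
def assert_no_null_keys(rows: list[dict], key_column: str, table: str) -> None:
    """Raise before upload if any row lacks a value in the key column."""
    null_count = sum(1 for row in rows if row.get(key_column) is None)
    if null_count:
        raise ValueError(
            f"{table}: {null_count} row(s) have NULL in key column "
            f"{key_column!r}; aborting before upload"
        )


# Passes silently when every row has a key.
assert_no_null_keys([{"id": "a"}, {"id": "b"}], "id", "products")
```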

Generated with Treasure Work

Adds realtime-box/lookup-catalog-sync with two variants:

manual/
  lookup_catalog_sync.dig    — iterates configured tables with explicit
                               column definitions
  queries/                   — SQL for digest init, extract, count, update

table-discovery/
  lookup_catalog_sync.dig    — auto-discovers tables via information_schema
  sync_table.dig             — reusable single-table sync called per table
  scripts/generate_sql.py    — type-aware JSON payload SQL generator
                               (supports array<varchar/bigint/double>,
                                float artifact fix, NULL element handling)
  queries/discover_tables.sql — excludes _wf_* internal tables

Both variants implement hash-based change detection (only changed rows
are uploaded on each run) and use the _wf_ prefix for internal tables.

Co-Authored-By: Treasure Work <291137728+treasure-work@users.noreply.github.com>