Preserve historical leaks data across post-processor runs#208

Merged
dmarulli merged 1 commit into main from dmarulli/preserve-historical-leaks on Mar 10, 2026
Conversation

Collaborator

@dmarulli commented Mar 9, 2026

Summary

  • The leaks post-processor currently uses CREATE OR REPLACE TABLE to rebuild the leaks detail table on every run, wiping all data outside the 30-day computation window
  • This replaces that with a DELETE + INSERT scoped to the window, so historical leak detection results are preserved
  • The agg table still rebuilds from the full detail table each run (now includes full history)
  • Adds a CREATE TABLE IF NOT EXISTS initialization step for new orgs, which won't yet have a leaks table to delete from or insert into
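A minimal sketch of the pattern described above, using sqlite3 as a stand-in for Snowflake (table and column names here are hypothetical, not the actual leaks schema): rows outside the 30-day window survive the run, while rows inside it are replaced with fresh results.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Initialization step: safe for new orgs that have no leaks table yet.
cur.execute("CREATE TABLE IF NOT EXISTS leaks (flowtime_ts TEXT, meter_id TEXT)")

# Seed one historical row (outside the window) and one stale row inside it.
cur.execute("INSERT INTO leaks VALUES ('2025-01-01', 'm1-historical')")
cur.execute("INSERT INTO leaks VALUES ('2026-03-01', 'm2-stale')")

# 30 days before a hypothetical run date of 2026-03-10.
window_start = "2026-02-08"

# DELETE + INSERT scoped to the computation window, instead of
# CREATE OR REPLACE TABLE: history outside the window is preserved.
cur.execute("DELETE FROM leaks WHERE flowtime_ts >= ?", (window_start,))
cur.execute("INSERT INTO leaks VALUES ('2026-03-01', 'm2-fresh')")

rows = sorted(r[1] for r in cur.execute("SELECT flowtime_ts, meter_id FROM leaks"))
print(rows)  # historical row kept, stale in-window row replaced
```

With CREATE OR REPLACE TABLE, the 2025 row would have been wiped on every run; here only the in-window rows are rebuilt.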

Known issue: leaks_cadc_south_tahoe

This org's leaks table has an older schema (e.g. FLOWTIME instead of FLOWTIME_TS) from before column aliases were added to the leaks detection query in snowflake.py. The South Tahoe DAG is not currently running, so this is not a problem. If it's ever reactivated, the old table should be dropped first so CREATE TABLE IF NOT EXISTS can recreate it with the current schema.

Replace CREATE OR REPLACE TABLE with DELETE+INSERT scoped to
the 30-day window so previous leak detection results are retained.
The agg table still rebuilds from the full detail table each run.
@dmarulli dmarulli marked this pull request as ready for review March 9, 2026 21:02
@dmarulli dmarulli requested a review from jaime-82 March 9, 2026 21:03
@dmarulli dmarulli merged commit fd4cfec into main Mar 10, 2026
2 checks passed