🎉 Add Owl lightweight pipeline runner by Marigold · Pull Request #6038 · owid/etl

Marigold · 2026-05-06T10:48:21Z

Summary

Adds Owl, a lightweight pipeline runner that can live alongside the existing OWID ETL without taking over the etl package name.

The key design choice is that Owl uses its own code tree (owl_steps/) but writes outputs into the existing ETL artifact layout, so existing tooling can consume them:

snapshots → data/snapshots/<namespace>/<snapshot-version>/<dataset>__<snapshot>.parquet
datasets → data/garden/<namespace>/<version>/<dataset>/
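
The two path templates above can be read as small path helpers. This is only a sketch of the layout, not Owl's code: the names `DATA_DIR`, `snapshot_path` and `dataset_dir` are hypothetical, and the `garden` channel is hard-coded because it is the only channel the description mentions.

```python
from pathlib import Path

# Hypothetical root; the description only says outputs land under data/.
DATA_DIR = Path("data")


def snapshot_path(namespace: str, snapshot_version: str, dataset: str, snapshot: str) -> Path:
    """Build data/snapshots/<namespace>/<snapshot-version>/<dataset>__<snapshot>.parquet."""
    return DATA_DIR / "snapshots" / namespace / snapshot_version / f"{dataset}__{snapshot}.parquet"


def dataset_dir(namespace: str, version: str, dataset: str) -> Path:
    """Build data/garden/<namespace>/<version>/<dataset>/ (garden channel assumed)."""
    return DATA_DIR / "garden" / namespace / version / dataset


print(dataset_dir("biodiversity", "2026-04-16", "cherry_blossom").as_posix())
```

Keeping both templates in one place would make it easier to check that Owl and the existing ETL tooling agree on where artifacts live.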

Owl step versions use Python-friendly folders such as v20260416, translated to ETL-style versions such as 2026-04-16.
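
The version translation can be sketched as follows. The function names are hypothetical and this is not the code in lib/owl; it only illustrates the vYYYYMMDD to YYYY-MM-DD mapping described above, with validation of the date added as an assumption.

```python
import re
from datetime import date

# Folder names look like "v20260416": a literal "v" followed by YYYYMMDD.
_FOLDER_RE = re.compile(r"v(\d{4})(\d{2})(\d{2})")


def folder_to_etl_version(folder: str) -> str:
    """Translate an Owl folder name such as 'v20260416' into '2026-04-16'."""
    match = _FOLDER_RE.fullmatch(folder)
    if match is None:
        raise ValueError(f"expected a folder like vYYYYMMDD, got {folder!r}")
    year, month, day = (int(part) for part in match.groups())
    # date() raises ValueError for impossible dates such as month 13.
    return date(year, month, day).isoformat()


def etl_version_to_folder(version: str) -> str:
    """Translate an ETL-style version such as '2026-04-16' into 'v20260416'."""
    return "v" + date.fromisoformat(version).strftime("%Y%m%d")


print(folder_to_etl_version("v20260416"))
print(etl_version_to_folder("2026-04-16"))
```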

Example layout:

owl_steps/biodiversity/cherry_blossom/v20260416/step.py
owl_steps/biodiversity/cherry_blossom/v20260416/meta.yml

data/garden/biodiversity/2026-04-16/cherry_blossom/

What is included

New local editable package: owid-owl under lib/owl
New CLI entry point: .venv/bin/owl
New Make target: make owl
Two example Owl steps:
- biodiversity/cherry_blossom/v20260416
- space/near_earth_asteroids/v20260416

How to try it

.venv/bin/owl snapshot
.venv/bin/owl run
.venv/bin/owl run biodiversity/cherry_blossom
.venv/bin/owl run biodiversity/cherry_blossom/2026-04-16

Expected generated outputs include:

data/garden/biodiversity/2026-04-16/cherry_blossom/index.json
data/garden/biodiversity/2026-04-16/cherry_blossom/cherry_blossom.feather
data/garden/space/2026-04-16/near_earth_asteroids/index.json

A second owl run should report the datasets/actions as up to date.
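
One way such an up-to-date check could work is a content-hash stamp. This is a sketch under assumptions: the description does not say how Owl computes staleness, and `fingerprint`, `is_up_to_date`, `write_stamp` and the stamp file name are all hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path


def fingerprint(paths: list[Path]) -> str:
    """Hash the path and bytes of every input file, in a stable order."""
    digest = hashlib.sha256()
    for path in sorted(paths):
        digest.update(path.as_posix().encode())
        digest.update(path.read_bytes())
    return digest.hexdigest()


def is_up_to_date(stamp: Path, inputs: list[Path]) -> bool:
    """An action is fresh when its stamp matches the current input fingerprint."""
    return stamp.exists() and stamp.read_text() == fingerprint(inputs)


def write_stamp(stamp: Path, inputs: list[Path]) -> None:
    """Record the fingerprint after a successful run."""
    stamp.parent.mkdir(parents=True, exist_ok=True)
    stamp.write_text(fingerprint(inputs))


# Demo in a throwaway directory: no stamp, then unchanged, then edited.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    step = root / "step.py"
    step.write_text("print('v1')\n")
    stamp = root / ".cache" / "owl" / "stamps" / "cherry_blossom.stamp"

    first = is_up_to_date(stamp, [step])
    write_stamp(stamp, [step])
    second = is_up_to_date(stamp, [step])
    step.write_text("print('v2')\n")
    third = is_up_to_date(stamp, [step])

print(first, second, third)  # False True False
```

Hashing contents rather than comparing modification times avoids spurious reruns after a fresh checkout, at the cost of reading every input file.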

Validation performed

.venv/bin/owl snapshot
.venv/bin/owl run
.venv/bin/owl run a second time, to check staleness detection
.venv/bin/owl --help
.venv/bin/etl --help to confirm the existing ETL CLI still resolves
Commit hook ran successfully, including lint/format/type checks

Takeover notes

This is intended as a first working prototype rather than a final framework API.

Important implementation details:

owl.project.parse_step_file() owns the mapping from owl_steps/<namespace>/<dataset>/vYYYYMMDD/step.py to ETL path parts.
owl.snapshot.Snapshot writes raw data into data/snapshots/ as Parquet files.
owl.dataset.Dataset writes OWID catalog datasets with owid.catalog.Dataset.create_empty(...).add(...).save().
Owl metadata currently supports the lightweight "datasets:" / "columns:" shape from the prototype and maps each column's role into VariableMeta.additional_info.
Action stamps are kept under .cache/owl/stamps/ so they do not pollute data/.
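
To illustrate the first item, here is a sketch of the kind of mapping a step-file parser has to perform. It is not the implementation of owl.project.parse_step_file(): the name `parse_step_path`, the `StepParts` type and the error handling are assumptions made for illustration.

```python
import re
from pathlib import PurePosixPath
from typing import NamedTuple


class StepParts(NamedTuple):
    namespace: str
    dataset: str
    version: str  # ETL-style version, e.g. "2026-04-16"


_VERSION_RE = re.compile(r"v(\d{4})(\d{2})(\d{2})")


def parse_step_path(step_file: str) -> StepParts:
    """Map owl_steps/<namespace>/<dataset>/vYYYYMMDD/step.py to ETL path parts."""
    parts = PurePosixPath(step_file).parts
    if len(parts) != 5 or parts[0] != "owl_steps" or parts[4] != "step.py":
        raise ValueError(f"not an Owl step file: {step_file!r}")
    _, namespace, dataset, folder, _ = parts
    match = _VERSION_RE.fullmatch(folder)
    if match is None:
        raise ValueError(f"expected a version folder like vYYYYMMDD, got {folder!r}")
    # Join the captured year, month and day with dashes.
    return StepParts(namespace, dataset, "-".join(match.groups()))


print(parse_step_path("owl_steps/biodiversity/cherry_blossom/v20260416/step.py"))
```

Rejecting paths that do not match the expected shape, instead of guessing, keeps malformed folders from silently producing outputs in the wrong place.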

Likely follow-ups:

Decide whether Owl should keep its own sync command or rely entirely on existing ETL publish/sync tooling.
Tighten metadata mapping from Owl-style metadata to full catalog metadata.
Decide whether @Dataset(channel="...") is enough for non-garden outputs or whether channel belongs in folder/config.
Add focused unit tests for path parsing, version translation, staleness, and catalog output shape.
Decide whether example steps should stay in this PR or move to docs/examples once the package shape is reviewed.

Marigold · 2026-05-06T10:48:50Z

@codex review

owidbot · 2026-05-06T10:51:22Z

Quick links (staging server):

Site Dev	Site Preview	Admin	Wizard	Docs	Docs Preview

Login: ssh owid@staging-site-feature-owl-lightweight-pipe

chart-diff: ✅

No charts for review.

data-diff: ✅ No differences found

Edited: 2026-05-19 16:15:46 UTC
Execution time: 6.03 seconds

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 54d3105be0

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you:

- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review"

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-05-06T10:55:27Z

+    for parquet_path in sorted(data_dir.rglob("*.parquet")):
+        rel = parquet_path.relative_to(data_dir)
+        parts = rel.with_suffix("").parts  # e.g. ("worldbank", "population")
+        if len(parts) != 2:


Handle Owl's catalog dataset layout when indexing

When build_catalog() is used after owl run, it will skip every normal Owl output: Dataset.run() saves catalog datasets under data/garden/<namespace>/<version>/<dataset>/... via ds.add(table), while this loop only looks for *.parquet and then discards anything whose relative path is not exactly two components like source/name. With the default catalog format this finds no files at all, and even if parquet output is enabled the relative path has five components, so the generated catalog is empty for the datasets this commit writes.
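
A possible direction, sketched under assumptions: discover catalog datasets by their index.json file instead of by *.parquet files at a fixed depth. The function name `find_catalog_datasets` is hypothetical, and the sketch assumes every saved dataset folder contains an index.json, as the expected outputs listed in the PR description suggest. It does not reproduce the rest of build_catalog().

```python
import tempfile
from pathlib import Path


def find_catalog_datasets(data_dir: Path) -> list[tuple[str, ...]]:
    """Return (channel, namespace, version, dataset) for each dataset folder found."""
    found = []
    for index_path in sorted(data_dir.rglob("index.json")):
        parts = index_path.parent.relative_to(data_dir).parts
        # Keep only folders shaped like <channel>/<namespace>/<version>/<dataset>.
        if len(parts) == 4:
            found.append(parts)
    return found


# Demo: recreate the layout from the PR description in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    data_dir = Path(tmp) / "data"
    for rel in (
        "garden/biodiversity/2026-04-16/cherry_blossom",
        "garden/space/2026-04-16/near_earth_asteroids",
    ):
        folder = data_dir / rel
        folder.mkdir(parents=True)
        (folder / "index.json").write_text("{}")
    datasets = find_catalog_datasets(data_dir)

print(datasets)
```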


github-actions Bot assigned Marigold May 6, 2026

chatgpt-codex-connector Bot reviewed May 6, 2026


Marigold added 4 commits May 14, 2026 13:30

🎉🤖 Add Owl lightweight pipeline runner (f449158)
🎉 Add Owl lightweight pipeline runner (b612c69)
✨🤖 Refine Owl snapshots and actions (4268539)
✨🤖 Align Owl metadata with catalog schema (ef63275)

Marigold force-pushed the feature/owl-lightweight-pipeline branch from 26697c9 to ef63275 Compare May 14, 2026 12:02

🐛🤖 Fix duplicate project table (65dcb16)

Marigold wants to merge 5 commits into master from feature/owl-lightweight-pipeline
