Skip to content

Create placeholder field in empty target deploy schemas#9431

Merged
BenWu merged 5 commits into
mainfrom
benwu/target-schema-placeholder
May 22, 2026
Merged

Create placeholder field in empty target deploy schemas#9431
BenWu merged 5 commits into
mainfrom
benwu/target-schema-placeholder

Conversation

@BenWu
Copy link
Copy Markdown
Contributor

@BenWu BenWu commented May 22, 2026

Description

For target deploys, empty schemas (fields: []) are being created when the dry run to get schemas fails. The empty schema causes issues since the downstream tables and views can't query it. Example failed run: https://github.com/mozilla/bigquery-etl/actions/runs/26305494345/job/77442626593?pr=9429. This is blocking #9429

Creating a schema with one field allows at least SELECT * views to work. I think this hasn't come up before because there are very few cases where it would fail and they haven't been edited so they didn't get stage deployed. buildhub2 is one example that I used as a test failure: https://github.com/mozilla/bigquery-etl/actions/runs/26310264998/job/77457419956

Reviewer, please follow this checklist

@github-actions

This comment has been minimized.

@github-actions
Copy link
Copy Markdown
Contributor

Integration report for "Create placeholder field in empty target deploy schemas"

sql.diff

Click to expand!
diff -bur --no-dereference --new-file /tmp/workspace/main-generated-sql/sql/moz-fx-data-shared-prod/telemetry/buildhub2/view.sql /tmp/workspace/generated-sql/sql/moz-fx-data-shared-prod/telemetry/buildhub2/view.sql
--- /tmp/workspace/main-generated-sql/sql/moz-fx-data-shared-prod/telemetry/buildhub2/view.sql	2026-05-22 20:39:05.989816566 +0000
+++ /tmp/workspace/generated-sql/sql/moz-fx-data-shared-prod/telemetry/buildhub2/view.sql	2026-05-22 20:38:59.458788857 +0000
@@ -4,6 +4,7 @@
 CREATE OR REPLACE VIEW
   `moz-fx-data-shared-prod.telemetry.buildhub2`
 AS
+-- test
 SELECT
   *
 FROM

Link to full diff

@BenWu BenWu marked this pull request as ready for review May 22, 2026 20:41
@BenWu BenWu requested a review from a team as a code owner May 22, 2026 20:41
Copy link
Copy Markdown
Contributor

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR changes _fetch_stub_schema in bigquery_etl/util/target.py so that, when both client.get_table and Schema.for_table's dry-run fail to produce a non-empty schema for an unmanaged stub-target dependency, it writes a one-field placeholder schema (_bqetl_stub_placeholder STRING NULLABLE) instead of an empty fields: []. The function now also treats a successful-but-empty-fields response from either source as a failure and falls through to the placeholder. A new TestFetchStubSchema class covers the dry-run-empty-fields path.

Cross-cutting observations:

  • The intent — letting downstream SELECT * stage views survive an inaccessible source — is well scoped and the implementation is small and localized. The control flow (early-return on success, single placeholder write at the bottom) is a clean refactor of the previous nested try/except.
  • The main correctness concern is that Schema.for_table already swallows its own exceptions and returns cls({"fields": []}) (bigquery_etl/schema/__init__.py:109), so the new except Exception as e: for_table_err = e arm is effectively unreachable in normal use. The warning will print dry-run: None even though for_table printed the real error to stdout earlier — left inline.
  • Test coverage exercises one of three new fall-through paths (get_table raises + for_table returns empty fields). The empty-bq_table.schema branch and the both-raise branch aren't covered; adding parametrized cases would lock the placeholder behavior in.
  • Minor: typo in the test name, and _PLACEHOLDER_STUB_SCHEMA is a shared mutable module global — fine for today's read-only usage, easy to harden.

Reviewer checklist items applicable here (issue ref, schema impact, restricted namespace) are satisfied: the PR links #9429, doesn't add new query fields, and touches utility code rather than restricted SQL namespaces.

Comment thread bigquery_etl/util/target.py Outdated
Comment on lines +622 to +635
for_table_err: Optional[Exception] = None
try:
Schema.for_table(
schema = Schema.for_table(
project=project,
dataset=dataset,
table=name,
id_token=id_token,
partitioned_by=resolve_partition_for(sql_dir, project, dataset, name),
).to_yaml_file(out_path)
except Exception as for_table_err:
print(
f"Warning: Could not fetch schema for {project}.{dataset}.{name}: "
f"get_table: {get_table_err}; dry-run: {for_table_err}"
)
if schema.schema.get("fields"):
schema.to_yaml_file(out_path)
return
except Exception as e:
for_table_err = e
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issue: Schema.for_table already catches its own exceptions internally and returns cls({"fields": []}) on failure (see bigquery_etl/schema/__init__.py:109-111) — it prints the underlying error and swallows it. That means this except branch is rarely taken in practice, and the warning emitted below will almost always show dry-run: None, hiding the actual reason the dry-run failed. The genuinely informative signal is the empty-fields fallthrough, not the exception. Consider either (a) re-raising in Schema.for_table and letting this caller decide how to log, or (b) dropping for_table_err and tightening the warning so readers don't chase a None.

Comment thread tests/util/test_target.py
Comment thread tests/util/test_target.py
Comment thread bigquery_etl/util/target.py Outdated
f"get_table: {get_table_err}; dry-run: {for_table_err}. "
f"Writing placeholder schema for stage deploy."
)
Schema(_PLACEHOLDER_STUB_SCHEMA).to_yaml_file(out_path)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thought: downstream consumers that aren't simple SELECT * views (e.g., a view that references a specific column by name, or a query that joins on a particular field) will still fail against this placeholder — they'll just fail later in the deploy with a more obscure error than "empty schema." The PR description acknowledges the scope is SELECT * views, so this is fine, but it might be worth a short note in the warning ("downstream views that reference specific columns will still fail") so the deploy log makes the limitation obvious. Optional.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is true but they fail at the same spot, just with the different error message so I don't think it makes a difference

Comment thread bigquery_etl/util/target.py Outdated
@BenWu BenWu added this pull request to the merge queue May 22, 2026
Merged via the queue into main with commit 202e67e May 22, 2026
36 checks passed
@BenWu BenWu deleted the benwu/target-schema-placeholder branch May 22, 2026 21:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants