[DENG-11143] Create fxa_fastly_logs dataset and sql generator#9429
[DENG-11143] Create fxa_fastly_logs dataset and sql generator#9429BenWu wants to merge 7 commits into
Conversation
There was a problem hiding this comment.
Summary
Adds a new fxa_fastly_logs dataset and an SQL generator that creates authorized views (8 services × 2 envs = 16 views) over Fastly CDN logs that live in the FxA-owned projects. The shape matches the upstream Fastly resource layout (one dataset per table on the FxA side), and the choice to use authorized views instead of dataset syndication is explained in the PR description.
Overall assessment
The generator is small, focused, and follows the same shape as other generators in sql_generators/ (click command, jinja templates, write_sql + reformat). Nothing here looks structurally wrong. Comments below are mostly nits and a couple of small robustness/documentation improvements:
- A typo in the module docstring (
"fo Fastly"). - The dataset description appears truncated mid-sentence.
- Per-view
metadata.yamltemplate doesn't includeowners—Metadata.from_filetolerates the omission, but recommended practices suggest including them. - The
prod_prod/nonprod_stagesource-table naming pattern is non-obvious and would benefit from a one-line explanatory comment. metadata.yamlis written viaPath.write_textand implicitly relies onwrite_sqlhaving created the directory first — fragile if reordered.- Hardcoded
fxa_fastly_logsinview.sqlcould be replaced by{{ dataset }}to keep theDATASETconstant as the single source of truth. use_cloud_functionis accepted but unused (consistent with the generator-all contract; a brief note would clarify intent).
Checklist items that apply to this PR:
- Bug reference (
DENG-11143) is in the title. ✅ - New dataset addition — the PR description references the upstream Terraform definition and justifies why dataset syndication isn't used, which addresses the "is this data already available elsewhere?" item.
- No restricted-namespace schema changes.
Nothing here looks blocking; humans should decide on the open question about source-table naming and whether per-view owners are required for this dataset.
| @@ -0,0 +1,82 @@ | |||
| """Generate authorized views in fxa_fastly_logs fo Fastly CDN logs.""" | |||
There was a problem hiding this comment.
nit: Typo in module docstring — "fo Fastly" should be "for Fastly".
| """Generate authorized views in fxa_fastly_logs fo Fastly CDN logs.""" | |
| """Generate authorized views in fxa_fastly_logs for Fastly CDN logs.""" |
| @@ -0,0 +1,10 @@ | |||
| friendly_name: Firefox Accounts Fastly Logs | |||
| description: |- | |||
| Fastly CDN logs for Firefox Accounts. Contains prod and non-prod | |||
There was a problem hiding this comment.
nit: The description appears to be cut off mid-sentence ("Contains prod and non-prod"). Consider completing it, e.g. "Contains prod and non-prod environments." or similar — the friendly description shows up in BigQuery / data catalog UIs.
| # (env, stage) pairs. env is the GCP/deployment env (prod/nonprod) | ||
| ENVS: list[tuple[str, str]] = [ | ||
| ("prod", "prod"), | ||
| ("nonprod", "stage"), | ||
| ] |
There was a problem hiding this comment.
question: Worth double-checking the source-table naming against the upstream bq.tf. For prod, this renders as moz-fx-fxa-prod.fxa_<service>_prod_prod_fastly_cdn_logs.fastly (the prod_prod is unusual). For nonprod it's moz-fx-fxa-nonprod.fxa_<service>_nonprod_stage_fastly_cdn_logs.fastly. If this is intentional (Fastly's convention puts both deployment-env and stage in the dataset name), a one-line comment explaining the mapping would help future readers — the current # env is the GCP/deployment env comment doesn't quite explain why both env and stage show up in the dataset name.
There was a problem hiding this comment.
that's just how it be
| metadata = metadata_template.render(service=service, env=env, stage=stage) | ||
| metadata_path = ( | ||
| output_dir / target_project / DATASET / view_name / "metadata.yaml" | ||
| ) | ||
| metadata_path.write_text(metadata) |
There was a problem hiding this comment.
nit: metadata_path.write_text(...) works here only because the preceding write_sql happens to create the same directory. If those two operations are ever reordered or split, this will FileNotFoundError. Consider either calling metadata_path.parent.mkdir(parents=True, exist_ok=True) first, or moving the write_sql call right above this so the coupling is obvious.
| type=click.Path(file_okay=False), | ||
| ) | ||
| @use_cloud_function_option | ||
| def generate(target_project, output_dir, use_cloud_function): |
There was a problem hiding this comment.
nit: use_cloud_function is accepted (so generate all can pass it through) but unused in this generator. That's consistent with other generators that don't need a schema fetch, so it's fine — but a short noqa-style note (# Accepted for parity with the generate-all interface; not used here.) would make the intent obvious.
| @@ -0,0 +1,7 @@ | |||
| CREATE OR REPLACE VIEW | |||
| `{{ target_project }}.fxa_fastly_logs.{{ env }}_{{ service }}` | |||
There was a problem hiding this comment.
nit: The dataset name fxa_fastly_logs is hardcoded here but lives as a DATASET constant in __init__.py. Consider passing it through as {{ dataset }} so the single source of truth is the Python constant — minor, but it avoids a rename hazard.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
Integration report for "DENG-11143 Create fxa_fastly_logs dataset and sql generator"
|
Description
Authorized views for the tables from https://github.com/mozilla/webservices-infra/blob/18f0473be3cd78dab20752f92daddc8ee7d2f7c4/fxa/tf/modules/resources/bq.tf#L191. Doing this instead of dataset syndication because there are 16 datasets with one table each, which is how fastly does it
Goes with this change https://github.com/mozilla/webservices-infra/pull/11117
Related Tickets & Documents
Reviewer, please follow this checklist