DENG-11155 - Newtab Widget aggregation table Update#9434
Conversation
There was a problem hiding this comment.
This PR widens the grain of firefox_desktop_derived.newtab_widgets_daily_v1 from (submission_date, widget_name) to (submission_date, app_version, os, channel, locale, country, widget_name), expands the user_action value lists feeding widget_setting_change_count and widget_utility_action_count (clock, teams, schedule, hour-format, timer-reset, expand/collapse, etc.), adds clustering on (channel, country), and introduces a new widget_enabled_clients metric sourced from metrics.string_list.newtab_widgets_enabled_list. checks.sql, metadata.yaml, schema.yaml, and README.md are updated to match.
Cross-cutting observations
- Schema vs. query drift. The new
widget_enabled_clientscolumn is inquery.sqloutput but missing fromschema.yaml. Per the reviewer checklist, schema must be updated when fields are added — this is the most actionable item before merge. widget_enabled_clientssemantics. The new CTE pulls frommetrics.string_list.newtab_widgets_enabled_listrather than the event stream, thenLEFT JOINs back on a seven-key tuple includingwidget_name. This is worth scrutinizing: (a) the twowidget_namenamespaces (event extra vs. string_list value) may not line up identically; (b) clients with the widget enabled but no widget event on the day will be silently dropped; (c) it's a second full scan ofnewtab_v1. See the inline comment on the CTE for details.- Grain change is a real behavior change, not just a refactor. Multiplying by
app_version × os × channel × locale × countrywill inflate row counts substantially (sparsely, but still). Worth a sanity-check on size and on whether downstream Looker explores assume the old grain. Since the table is still v1 and only recently introduced, an in-place change is reasonable, but the doc string "(initial version)" in the README is no longer literally true. - Documentation consistency. Several docs were partially updated for the new grain but a few spots still reference the old shape: the Mermaid diagram label, the
is_uniquecomment inchecks.sql, and the shortenedmetadata.yamldescription (which dropped some useful detail). Inline notes on each. - Schema descriptions. A few of the newly-added field descriptions appear to be copy-pasted boilerplate that doesn't actually describe the stored value (
app_versionsays "1.0.3"-style string but stores an integer major version;countrysays "name of the country" but stores a 2-letter code).
No tests live under tests/sql/.../newtab_widgets_daily_v1/; expanding test coverage is out of scope for this PR but worth flagging given the grain change and the cross-source join.
| mode: NULLABLE | ||
| description: Set of language- and/or country-based preferences for a user interface. | ||
| - name: country | ||
| type: STRING |
There was a problem hiding this comment.
issue: widget_enabled_clients is selected in query.sql (line 176) but is not declared in schema.yaml. The reviewer checklist explicitly calls this out: "If adding a new field to a query, ensure that the schema and dependent downstream schemas have been updated." Please add a widget_enabled_clients entry (INTEGER, NULLABLE) so the deploy reflects the new column. Otherwise the schema deploy will either drop the column or fail validation depending on how schema_update_option is configured.
| - name: app_version | ||
| type: INTEGER | ||
| mode: NULLABLE | ||
| description: User visible version string (e.g. "1.0.3") for the browser. |
There was a problem hiding this comment.
issue: the description does not match the value being stored. The query stores mozfun.norm.browser_version_info(client_info.app_display_version).major_version cast to INT64 — i.e. just the integer major version (e.g. 142), not the user-visible display string like "1.0.3". Suggest updating the description to something like "Major version of the browser, parsed from client_info.app_display_version." so downstream consumers don't mis-interpret the column.
| - name: country | ||
| type: STRING | ||
| mode: NULLABLE | ||
| description: Name of the country in which the activity took place, as determined | ||
| by the IP geolocation. |
There was a problem hiding this comment.
issue: the description says "Name of the country" but the column stores normalized_country_code — a 2-letter ISO country code, not the country name. Recommend wording it as "ISO 3166-1 alpha-2 country code, derived from IP geolocation (normalized_country_code)."
| - name: os | ||
| type: STRING | ||
| mode: NULLABLE | ||
| description: Set to "Other" if this message contained an unrecognized OS name. |
There was a problem hiding this comment.
nitpick: this description only states the fallback behavior ("Set to 'Other' if this message contained an unrecognized OS name.") and doesn't actually describe what the field is. Consider something like "Normalized operating system name (e.g. Windows, Mac, Linux); Other if unrecognized.". Same shape applies to locale and channel on the surrounding lines — they read as fragments rather than column definitions.
| @@ -115,4 +188,7 @@ FROM | |||
| aggregated AS agg | |||
| LEFT JOIN | |||
| user_action_array AS uaa | |||
| USING (submission_date, widget_name) | |||
| USING (submission_date, widget_name, app_version, os, channel, locale, country) | |||
| LEFT JOIN | |||
| enabled_users_aggregate AS eua | |||
| USING (submission_date, widget_name, app_version, os, channel, locale, country) | |||
There was a problem hiding this comment.
issue: there are a few concerns with how enabled_users_aggregate is wired in:
-
Row loss for enable-only clients.
euais joined withLEFT JOIN agg → eua, so any(date, app_version, os, channel, locale, country, widget_name)combination that appears inmetrics.string_list.newtab_widgets_enabled_listbut has nowidgets_impression / widgets_user_event / widgets_enabledevent on that day will be silently dropped. Ifwidget_enabled_clientsis meant to count all clients with the widget enabled (which is what the PR description and README suggest), this likely under-counts. ConsiderFULL OUTER JOIN(withCOALESCEon the join keys) — or document the intentional exclusion in the README under "Notes & Conventions". -
Different
widget_namenamespaces.widget_events.widget_namecomes fromevent.extra['widget_name'], whileeua.widget_namecomes from values insidemetrics.string_list.newtab_widgets_enabled_list. If those two surfaces use different identifiers (case, slug vs. display name, etc.) the join key will silently miss — andwidget_enabled_clientswill come back NULL for every row. Worth verifying with a sample query before merging that the value sets actually intersect. -
Duplicate scan of
newtab_v1. The same source table is scanned a second time here with the same partition filter; the dimensions and aggregation could be folded into the existingaggregatedCTE by addingCOUNT(DISTINCT IF(<client has widget enabled>, client_id, NULL))against anUNNESTjoined inwidget_events. Not required, but it would halve the scan cost and remove the join-key risk in (2).
| ```mermaid | ||
| flowchart TD | ||
| A[Firefox Desktop Newtab Ping<br/>`firefox_desktop_stable.newtab_v1`] -->|filter @submission_date<br/>UNNEST events · widget events only<br/>GROUP BY widget_name| B[**This query**] | ||
| B --> C[Partitioned table<br/>time: `submission_date`<br/>cluster: `widget_name`] | ||
| B --> C[Partitioned table<br/>time: `submission_date`<br/>cluster: `widget_name`, `channel`, `country`] |
There was a problem hiding this comment.
nitpick: the Mermaid label still reads GROUP BY widget_name but the actual GROUP BY now includes submission_date, app_version, os, channel, locale, country, widget_name. Worth updating the diagram so it stays consistent with the grain table above and the How It Works step 3.
| description: |- | ||
| Daily aggregation of Firefox New Tab widget telemetry. | ||
| One row per widget per day, capturing impressions, user interactions, | ||
| enable/disable actions, and engagement metrics derived from the unified | ||
| widget telemetry shape in newtab_v1. | ||
| A daily aggregation of newtab actions on widgets (example: impressions, usage actions, clicks, etc...) | ||
| for Firefox desktop, partitioned by day. |
There was a problem hiding this comment.
suggestion: the new description is shorter and less precise than the one it replaces — it drops the explicit mention of impressions / user interactions / enable–disable / engagement metrics and that the data is derived from newtab_v1, and adds an etc... ellipsis. Since this description surfaces in dataset catalogs (Looker, BigQuery UI), it's worth keeping the more descriptive previous wording and just appending the new dimensions, e.g.:
description: |-
Daily aggregation of Firefox New Tab widget telemetry derived from
newtab_v1. One row per widget per day, broken down by app_version,
os, channel, locale, and country, capturing impressions, user
interactions, enable/disable actions, and engagement metrics.| clustering: | ||
| fields: | ||
| - widget_name | ||
| - channel | ||
| - country |
There was a problem hiding this comment.
question: clustering order matters — BigQuery filters most efficiently on the leading clustering keys. The example query in README.md filters on country (and downstream Looker analyses are likely to filter on country/channel first), but the current order is widget_name, channel, country. If country/channel are expected to appear in WHERE more often than widget_name, consider reordering as country, channel, widget_name (or similar) to maximize pruning. Worth confirming the expected query patterns before locking this in, since cluster order is a deploy-time choice.
| agg.widget_name, | ||
| submission_date, | ||
| widget_name, | ||
| eua.widget_enabled_clients, |
There was a problem hiding this comment.
nitpick: widget_engaged_clients and widget_enabled_clients differ by two letters and represent very different populations (engaged-via-event vs. enabled-via-string-list). Easy to misuse downstream. Recommend either renaming the new column to something less collidable (e.g. widget_active_enabled_clients, clients_with_widget_enabled) or adding an explicit row to the Notes & Conventions section of the README that contrasts the two definitions side-by-side, the same way widget_link_click_count is contrasted with widgets_visit_daily_v1.
Integration report for "DENG-11155 - Newtab Widget aggregation table Update"
|
Description
This PR updates the
firefox_desktop_derived.newtab_widgets_daily_v1, a daily aggregation of Firefox New Tab widget telemetry fromfirefox_desktop_stable.newtab_v1with new segments and metrics.New segments being added are
(app_version, os, channel, locale, country).A new metric
widget_enabled_clientsis added to track the number of clients with widget enabled.Existing metrics (
widget_setting_change_count, widget_utility_action_count)are updated to include more user actions.Includes update to:
query.sqlmetadata.yamlschema.yamlchecks.sqlREADME.mdRelated Tickets & Documents
Reviewer, please follow this checklist