-
Notifications
You must be signed in to change notification settings - Fork 1
BigQuery for Modeling Data
The WFRC BigQuery environment serves as the centralized processing platform for the agency’s modeling data, including preprocessing of the 2023 Household Travel Survey (HTS) for Travel Demand Model (TDM) and related use cases.
This framework ensures that all modeling datasets—whether raw, consultant-delivered, or WFRC-produced—are organized, consistent, and securely managed in a single, scalable environment.
- Platform: Google BigQuery — a SQL-based cloud data warehouse.
- Purpose: Hosts a unified environment for processing and storing survey and model data (e.g., HTS, TDM, ABM, and commercial vehicle datasets).
- Architecture: Cloud-native and modular, designed to scale and adapt without structural overhaul.
-
Single Access Point: All datasets are stored within a unified BigQuery project (
wfrc-modeling-data), ensuring consistency, ease of maintenance, and streamlined collaboration. -
Modular Design: Each dataset represents a specific processing stage (
src_,ext_,prd_), promoting clear lineage and simplified governance. - Persistent Views: SQL-based views are centrally maintained and automatically propagate updates to all users, supporting consistent analytics across teams.
- Integrated Spatial Processing: Spatial joins (e.g., TAZ assignments or coordinate-based joins) are applied centrally during preprocessing, preserving spatial integrity and reducing the need for repetitive spatial queries by users.
- Role-Based Access Control (RBAC): Access is governed by roles, restricting sensitive operations to authorized users and aligning with WFRC’s data governance standards.
- Protection of Sensitive Geolocation Data: Personally identifiable location data (e.g., latitude/longitude, projected X/Y coordinates, Census block group IDs) is excluded from TDM-preprocessed tables to protect respondent privacy while preserving analytical utility.
- Cross-Functional Accessibility: Built for planners, analysts, and engineers alike—providing approachable yet powerful access to shared datasets.
- Web GUI Access: Users can query, preview, and manage datasets directly through the BigQuery web interface in the Google Cloud Console—ideal for interactive analysis and data exploration without coding.
- Python API Access: Analysts and developers can connect using the BigQuery Python client library, supporting reproducible workflows, automated processing, and integration with Jupyter notebooks or other analytical pipelines.
- Hybrid Use Case Support: Both methods access the same datasets and permission structure, ensuring a unified and consistent environment across graphical and programmatic workflows.
Each dataset name follows this structure:
<stage>_<source>_<project_or_model>_<content>_<year>
| Element | Meaning | Example |
|---|---|---|
stage |
Stage of data in lifecycle |
src, ext, prd
|
source |
Data source (not used for ) |
rsg, udot, locus
|
project_or_model |
Model system or project name |
tdm, abm, lu
|
content |
Content of data |
household_travel_survey, commercial_vehicle, taz
|
year (optional) |
Primary data year | 2023 |
Example:
src_rsg_household_travel_survey_2023
| Stage | Meaning | Description | Owned by |
|---|---|---|---|
src |
Source | Data ingested from external systems or raw sources (e.g., HTS survey, UDOT feeds, Locus exports). | External system |
ext |
External | Consultant-delivered or partner-processed datasets not yet WFRC-final. | Consultant / Partner |
prd |
Production | WFRC-approved, final datasets ready for modeling or publication. | WFRC |
stg |
Staging | Temporary or intermediate processing steps (optional use). | WFRC |
Example data flow:
src_rsg_household_travel_survey_2023 → ext_rsg_hts_2023 → prd_tdm_hts_2023
| Term | Abbreviation | Example |
|---|---|---|
| Travel Demand Model | tdm |
prd_tdm_taz_2021 |
| Activity-Based Model | abm |
prd_abm_lu_2025 |
| Real Estate Market Model | remm |
prd_remm_parcel_2025 |
For content at the src level use full descriptions and at all other levels use abbreviations.
| Term | Abbreviation | Example |
|---|---|---|
| Household Travel Survey | hts |
prd_tdm_hts_2023 |
| On-Board Survey | obs |
prd_tdm_obs_2023 |
| Land Use | lu |
prd_abm_lu_2025 |
| Personal Vehicle | pv |
ext_tdm_pv_2023 |
| Commercial Vehicle | cv |
ext_tdm_cv_2023 |
| Transportation Analysis Zone | taz |
prd_tdm_taz_2021 |
wfrc-modeling-data
├── src_rsg_household_travel_survey_2023
├── src_locus_commercial_vehicle_2023
├── ext_insight_cv_2023
├── ext_utah_on_board_survey_2023
├── prd_tdm_hts_2023
├── prd_tdm_taz
├── prd_abm_lu_2025
- Keep names lowercase and use underscores (_) as separators.
- Include year when datasets are versioned or time-specific.
- Use descriptive names in source dataset for clarity.
- Use abbreviations for descriptive names for production tables.
- Never overwrite production (
prd) datasets — use another dataset for new year/version.