debiasRdata

debiasRdata is the optional empirical data companion package for debiasR. It keeps empirical travel-to-work data out of the main methods package, so debiasR can stay small, method-focused, and CRAN-friendly.

The package supplies documented data assets only. Bias adjustment, modelling, validation, and example-data loading logic live in debiasR.

Contributions are made through pull requests. See CONTRIBUTING.md for the review workflow and data-release checks.

Installation

Install the package from GitHub with pak:

pak::pak("de-bias/debiasRdata")

If you do not use pak, install with remotes instead:

remotes::install_github("de-bias/debiasRdata")

Included datasets

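Dataset names identify the geography explicitly: msoa for MSOA assets and lad for LAD/LTLA assets, either as a prefix (msoa_..., lad_...) or, for the coverage tables, as a suffix (coverage_msoa, coverage_lad). Census benchmarks carry an additional census_ prefix.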

msoa_OD_travel2work contains observed mobile-phone-derived MSOA travel-to-work OD flows from Zenodo record https://doi.org/10.5281/zenodo.13327082.

census_msoa_OD_travel2work contains the matching Census 2021 ODWP01EW MSOA workplace-flow benchmark from the Office for National Statistics via Nomis.

lad_OD_travel2work contains the same observed mobile-phone-derived flows aggregated to LAD22 origin-destination pairs with the ONS OA_to_LSOA_to_MSOA_to_LAD_Dec_2021_EW_V3.csv lookup.

census_lad_OD_travel2work contains the Census 2021 ODWP01EW LTLA workplace-flow benchmark. The observed LAD aggregate uses LAD22CD from the ONS lookup; the Census LAD/LTLA benchmark uses the authority codes supplied directly by ODWP01EW_LTLA.csv.

lad_centroids contains ONS LAD December 2021 representative coordinates. It lets debiasR compute selected-area LAD distances without packaging a full OD distance matrix.

msoa_centroids contains MSOA representative coordinates. It lets debiasR compute selected-area MSOA distances without packaging a full OD distance matrix.

msoa_covariates and lad_covariates contain selected Census 2021 topic-summary covariates for MSOA and LAD/LTLA areas.

coverage_lad contains LAD/LTLA benchmark population counts and mobile-phone-derived active-user counts for coverage-bias examples.

coverage_msoa contains the corresponding MSOA benchmark population counts and mobile-phone-derived active-user counts for coverage-bias examples.

The four OD datasets use the same normalized schema:

| Column | Type | Meaning |
| --- | --- | --- |
| origin | character | Origin MSOA or LAD/LTLA code |
| destination | character | Destination MSOA or LAD/LTLA code |
| flow | numeric | Non-negative OD flow |
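
Because the observed and Census tables share this schema, they can be checked and joined on the OD pair directly. The following is a minimal sketch using invented toy tables in place of msoa_OD_travel2work and census_msoa_OD_travel2work; the helper check_od() and all codes and values are hypothetical and are not part of debiasRdata or debiasR.

```r
# Toy stand-ins for the packaged OD tables (codes and flows are invented).
observed <- data.frame(
  origin      = c("A01", "A01", "A02"),
  destination = c("A02", "A03", "A01"),
  flow        = c(12, 5, 8),
  stringsAsFactors = FALSE
)
census <- data.frame(
  origin      = c("A01", "A02", "A02"),
  destination = c("A02", "A01", "A03"),
  flow        = c(120, 90, 40),
  stringsAsFactors = FALSE
)

# Hypothetical helper: check a table against the documented schema.
check_od <- function(od) {
  stopifnot(
    identical(names(od), c("origin", "destination", "flow")),
    is.character(od$origin),
    is.character(od$destination),
    is.numeric(od$flow),
    all(od$flow >= 0)
  )
  invisible(od)
}
check_od(observed)
check_od(census)

# Full outer join on the OD pair; a pair missing from one source gets NA.
od_compare <- merge(
  observed, census,
  by = c("origin", "destination"),
  all = TRUE,
  suffixes = c("_observed", "_census")
)
od_compare
```

Keeping unmatched pairs (all = TRUE) makes gaps between the two sources visible instead of silently dropping them; whether an analysis should keep or drop such pairs depends on the method being applied.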

The two covariate datasets use this schema:

| Column | Type | Meaning |
| --- | --- | --- |
| area | character | MSOA or LAD/LTLA code |
| name | character | Area name |
| year | integer | Census year |
| per_ukborn | numeric | Percentage UK-born |
| per_age_20.29 | numeric | Percentage aged 20-29 |
| per_age_70plus | numeric | Percentage aged 70 or over |
| per_level4 | numeric | Percentage with level 4 qualifications |
| per_hh_no_centralheat | numeric | Percentage of households without central heating |
| per_NS_SeC_L13_routine | numeric | Percentage in NS-SeC L13 routine occupations |
| rural_pct | numeric | Percentage rural |

The coverage datasets use this schema:

| Column | Type | Meaning |
| --- | --- | --- |
| date | integer | Reference year |
| name | character | MSOA or LAD/LTLA name |
| code | character | MSOA or LAD/LTLA code |
| population | integer | Benchmark population count |
| user_count | numeric | Mobile-phone-derived active-user count |
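
As an illustration of how the coverage columns relate, the sketch below computes a simple per-area coverage rate (active users divided by benchmark population) on an invented toy table with the documented schema. The names coverage_rate and relative_coverage and the definitions behind them are assumptions made for this example; they are not necessarily the measures debiasR uses.

```r
# Toy table following the coverage schema (all values are invented).
coverage <- data.frame(
  date       = c(2021L, 2021L, 2021L),
  name       = c("Area one", "Area two", "Area three"),
  code       = c("Z01", "Z02", "Z03"),
  population = c(10000L, 25000L, 8000L),
  user_count = c(1500, 2500, 2000),
  stringsAsFactors = FALSE
)

# Share of the benchmark population observed as active users, per area.
coverage$coverage_rate <- coverage$user_count / coverage$population

# Overall rate across all areas, used as a reference level.
overall_rate <- sum(coverage$user_count) / sum(coverage$population)

# Values above 1 mark areas that are over-represented relative to the whole.
coverage$relative_coverage <- coverage$coverage_rate / overall_rate
coverage
```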

Usage

library(debiasRdata)

head(msoa_OD_travel2work)
head(census_msoa_OD_travel2work)
head(lad_OD_travel2work)
head(census_lad_OD_travel2work)
head(lad_centroids)
head(msoa_centroids)
head(msoa_covariates)
head(lad_covariates)
head(coverage_lad)
head(coverage_msoa)

debiasR consumes these objects conditionally, that is, when debiasRdata is available, through its example-data loader:

library(debiasR)

ex <- debiasR_example_data(n_areas = 25)
names(ex)

Compressed normalized CSV files are also shipped in inst/extdata (installed under extdata/ in the package directory) and can be located with:

debiasRdata_path("msoa_OD_travel2work")
debiasRdata_path("census_msoa_OD_travel2work")
debiasRdata_path("lad_OD_travel2work")
debiasRdata_path("census_lad_OD_travel2work")
debiasRdata_path("lad_centroids")
debiasRdata_path("msoa_centroids")
debiasRdata_path("msoa_covariates")
debiasRdata_path("lad_covariates")
debiasRdata_path("coverage_lad")
debiasRdata_path("coverage_msoa")

The helper only locates installed files. It does not download data during package use.
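
Since the helper only returns a location, reading the file is left to the caller. A minimal sketch follows, assuming debiasRdata_path() returns a single file path and that the compressed CSV can be read by read.csv(); the wrapper read_packaged_csv() is hypothetical and not part of either package.

```r
# Hypothetical wrapper around the debiasRdata_path() helper shown above.
# Assumption: the helper returns one path to a compressed CSV file.
read_packaged_csv <- function(name) {
  # Degrade gracefully when the optional data package is absent.
  if (!requireNamespace("debiasRdata", quietly = TRUE)) {
    message("debiasRdata is not installed; returning NULL.")
    return(NULL)
  }
  path <- debiasRdata::debiasRdata_path(name)
  stopifnot(length(path) == 1L, file.exists(path))
  # read.csv() opens the path through file(), which detects common
  # compression formats such as gzip when reading.
  utils::read.csv(path, stringsAsFactors = FALSE)
}

od <- read_packaged_csv("msoa_OD_travel2work")
if (!is.null(od)) str(od)
```

Guarding with requireNamespace() keeps the calling code usable whether or not the optional data package is installed.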

Sources and licenses

The mobile-phone-derived table is derived from:

  • Title: Anonymised human location data for urban mobility research
  • DOI: https://doi.org/10.5281/zenodo.13327082
  • Source file: msoa_OD_travel2work.csv.gz
  • License: Creative Commons Attribution 4.0 International (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
  • Packaged modifications: columns were normalized to origin, destination, and flow; county columns were removed; OD pairs were validated and aggregated where needed.

The MSOA Census benchmark is derived from:

  • Table: Census 2021 ODWP01EW, MSOA workplace-flow origin-destination table
  • Source institution: Office for National Statistics, accessed via Nomis
  • License: Open Government Licence v3.0, https://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
  • Packaged modifications: rows were filtered to people working in the UK but not working at or from home; non-MSOA destination codes were removed; columns were normalized to origin, destination, and flow.

The LAD-level observed table is derived from the Zenodo MSOA travel-to-work table using the ONS OA_to_LSOA_to_MSOA_to_LAD_Dec_2021_EW_V3.csv lookup. The lookup is published by ONS under the Open Government Licence v3.0.
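
The aggregation step described above can be sketched as follows. This is an illustration on invented toy data, not the package's build script: the lookup column names MSOA21CD and LAD22CD follow the ONS naming used elsewhere in this README, while the codes, flows, and the assumption that the lookup has one row per output area are made up for the example.

```r
# Toy MSOA-level OD table (codes and flows are invented).
msoa_od <- data.frame(
  origin      = c("M1", "M1", "M2", "M3"),
  destination = c("M2", "M3", "M3", "M1"),
  flow        = c(10, 4, 6, 3),
  stringsAsFactors = FALSE
)

# Toy lookup with one row per output area, so MSOA codes repeat.
lookup <- data.frame(
  MSOA21CD = c("M1", "M1", "M2", "M3"),
  LAD22CD  = c("L1", "L1", "L1", "L2"),
  stringsAsFactors = FALSE
)

# Reduce to one row per MSOA and confirm each MSOA maps to a single LAD.
msoa_to_lad <- unique(lookup[, c("MSOA21CD", "LAD22CD")])
stopifnot(!anyDuplicated(msoa_to_lad$MSOA21CD))

# Recode both ends of each flow to LAD, then sum flows per LAD pair.
lad_od <- data.frame(
  origin      = msoa_to_lad$LAD22CD[match(msoa_od$origin, msoa_to_lad$MSOA21CD)],
  destination = msoa_to_lad$LAD22CD[match(msoa_od$destination, msoa_to_lad$MSOA21CD)],
  flow        = msoa_od$flow,
  stringsAsFactors = FALSE
)
lad_od <- aggregate(flow ~ origin + destination, data = lad_od, FUN = sum)
lad_od
```

In this toy case the total flow is unchanged by aggregation. With real data, MSOA codes missing from the lookup would produce NA pairs that aggregate() drops, so comparing totals before and after is a useful check.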

The LAD/LTLA Census benchmark is derived from:

  • Table: Census 2021 ODWP01EW, LTLA workplace-flow origin-destination table
  • Source file: ODWP01EW_LTLA.csv in the Nomis Census 2021 origin-destination zip
  • Source institution: Office for National Statistics, accessed via Nomis
  • License: Open Government Licence v3.0, https://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
  • Packaged modifications: rows were filtered to people working in the UK but not working at or from home; columns were normalized to origin, destination, and flow; valid England/Wales local authority codes were retained.

The LAD centroid table is derived from:

  • Boundary: ONS Local Authority Districts December 2021 GB boundary file LAD_Dec_2021_GB_BFC_2022.gpkg
  • Source institution: Office for National Statistics
  • License: Open Government Licence v3.0, https://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
  • Packaged modifications: LAD code, LAD name, British National Grid easting / northing, and longitude / latitude fields were retained and renamed to area, name, easting, northing, longitude, and latitude.

The MSOA centroid table is derived from the local DEBIAS geography centroid input msoa_centroids.rds, available under the Open Government Licence v3.0. The packaged table keeps MSOA code, MSOA name, British National Grid easting / northing, and longitude / latitude fields under the same schema as lad_centroids.

The covariate tables are derived from:

  • Collection: Census 2021 topic-summary inputs
  • Source files: combined-data-msoa.csv, combined-data-lad.csv, and dictionary.csv
  • Source institution: Office for National Statistics
  • License: Open Government Licence v3.0, https://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
  • Packaged modifications: geography.code was renamed to area, geography to name, and date to year; the seven requested label-5 covariates were retained; source column per_age_20-29 was renamed to the label-5/R-facing name per_age_20.29.
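
The renaming listed above can be illustrated with a short sketch. The toy input below mimics only the four renamed source columns; its values are invented, and the code is an illustration and not the package's actual build script.

```r
# Toy input using the source column names (values are invented).
# check.names = FALSE keeps the non-syntactic name "per_age_20-29" intact.
raw <- data.frame(
  geography.code  = c("Z01", "Z02"),
  geography       = c("Area one", "Area two"),
  date            = c(2021L, 2021L),
  `per_age_20-29` = c(14.2, 11.8),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Source name -> packaged name, as described in the list above.
rename_map <- c(
  "geography.code" = "area",
  "geography"      = "name",
  "date"           = "year",
  "per_age_20-29"  = "per_age_20.29"
)

# Rename only the columns present in the map; others are left untouched.
hits <- names(raw) %in% names(rename_map)
names(raw)[hits] <- unname(rename_map[names(raw)[hits]])
names(raw)
```

Renaming per_age_20-29 to per_age_20.29 gives a syntactic R name, so the column can be accessed with $ without backticks.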

The LAD coverage table is derived from the same Zenodo mobile-phone-derived data release used for the packaged observed travel-to-work OD assets, licensed under Creative Commons Attribution 4.0 International (CC BY 4.0). The packaged table keeps benchmark population counts, renames the LAD-level aggregate active-user count to user_count, sets date to 2021, and maps LAD/LTLA codes from lad_covariates by area name. It contains LAD/LTLA-level aggregate counts only; it does not contain individual records, device identifiers, raw trajectories, or raw mobile-phone data.

The MSOA coverage table is derived from the local DEBIAS coverage_msoa.rda active-population-bias output. The active-user counts are derived from the same Zenodo mobile-phone-derived data release used for the packaged observed travel-to-work OD assets, licensed under Creative Commons Attribution 4.0 International (CC BY 4.0). Benchmark population counts are ONS-derived under the Open Government Licence v3.0. The packaged table keeps date, MSOA name, benchmark population, and aggregate active-user user_count, renames MSOA21CD to code, and removes rows without valid MSOA codes. It contains MSOA-level aggregate counts only; it does not contain individual records, device identifiers, raw trajectories, or raw mobile-phone data.

Build metadata, checksums, row counts, and transformation notes are recorded in inst/metadata/source-metadata.json.

Planned data

Full OD distance tables are not packaged. For LAD examples, debiasR computes selected-area distances from lad_centroids. For MSOA examples, debiasR can compute selected-area distances from msoa_centroids.
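
To show why centroids are enough for selected-area work, the sketch below builds a small pairwise distance matrix from easting and northing columns. It uses an invented toy table with the centroid column names and straight-line distance on British National Grid coordinates in metres; this is an assumption for illustration and not necessarily how debiasR computes distances.

```r
# Toy centroid table (codes and coordinates are invented).
centroids <- data.frame(
  area     = c("X1", "X2", "X3", "X4"),
  name     = c("Area one", "Area two", "Area three", "Area four"),
  easting  = c(530000, 533000, 530000, 400000),
  northing = c(180000, 184000, 190000, 300000),
  stringsAsFactors = FALSE
)

# Only the selected areas are used, so the matrix stays small.
selected <- c("X1", "X2", "X3")
sel <- centroids[match(selected, centroids$area), ]

# Pairwise Euclidean distance in metres, converted to kilometres.
xy <- as.matrix(sel[, c("easting", "northing")])
dist_km <- as.matrix(dist(xy)) / 1000
dimnames(dist_km) <- list(sel$area, sel$area)
dist_km
```

A matrix over n selected areas has n^2 entries, which is why computing distances on demand scales better than shipping a table for every possible OD pair.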
