debiasRdata is the optional empirical data companion package for
debiasR. It keeps empirical
travel-to-work data out of the main methods package, so debiasR can stay
small, method-focused, and CRAN-friendly.
The package supplies documented data assets only. Bias adjustment, modelling,
validation, and example-data loading logic live in debiasR.
Contributions are made through pull requests. See
CONTRIBUTING.md for the review workflow and data-release
checks.
Install the package from GitHub with pak:

    pak::pak("de-bias/debiasRdata")

If you do not use pak, install with remotes instead:

    remotes::install_github("de-bias/debiasRdata")

Dataset names identify the geography explicitly: msoa_... for MSOA assets and
lad_... for LAD/LTLA assets, with a census_ prefix for Census benchmarks.
msoa_OD_travel2work contains observed mobile-phone-derived MSOA
travel-to-work OD flows from Zenodo record
https://doi.org/10.5281/zenodo.13327082.
census_msoa_OD_travel2work contains the matching Census 2021 ODWP01EW MSOA
workplace-flow benchmark from the Office for National Statistics via Nomis.
lad_OD_travel2work contains the same observed mobile-phone-derived flows
aggregated to LAD22 origin-destination pairs with the ONS
OA_to_LSOA_to_MSOA_to_LAD_Dec_2021_EW_V3.csv lookup.
census_lad_OD_travel2work contains the Census 2021 ODWP01EW LTLA
workplace-flow benchmark. The observed LAD aggregate uses LAD22CD from the
ONS lookup; the Census LAD/LTLA benchmark uses the authority codes supplied
directly by ODWP01EW_LTLA.csv.
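
Because each observed table has a matching Census benchmark, the two can be compared pair by pair. The following is a minimal sketch on toy data, assuming both tables carry origin, destination, and flow columns; the area codes, flow values, and the ratio column are illustrative and are not part of the package.

```r
# Toy stand-ins for an observed OD table and its Census benchmark.
observed <- data.frame(
  origin = c("A", "A", "B"),
  destination = c("B", "C", "A"),
  flow = c(10, 5, 8),
  stringsAsFactors = FALSE
)
census <- data.frame(
  origin = c("A", "B", "B"),
  destination = c("B", "A", "C"),
  flow = c(40, 20, 12),
  stringsAsFactors = FALSE
)

# Full outer join on the OD pair; a pair missing from one table gets NA there.
compared <- merge(
  observed, census,
  by = c("origin", "destination"),
  all = TRUE,
  suffixes = c("_observed", "_census")
)

# Ratio of observed to benchmark flow (NA where either side is missing).
compared$ratio <- compared$flow_observed / compared$flow_census
compared
```

Keeping unmatched pairs (all = TRUE) makes gaps in either source visible instead of silently dropping them.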
lad_centroids contains ONS LAD December 2021 representative coordinates. It
lets debiasR compute selected-area LAD distances without packaging a full OD
distance matrix.
msoa_centroids contains MSOA representative coordinates. It lets debiasR
compute selected-area MSOA distances without packaging a full OD distance
matrix.
msoa_covariates and lad_covariates contain selected Census 2021
topic-summary covariates for MSOA and LAD/LTLA areas.
coverage_lad contains LAD/LTLA benchmark population counts and
mobile-phone-derived active-user counts for coverage-bias examples.
coverage_msoa contains the corresponding MSOA benchmark population counts and
mobile-phone-derived active-user counts for coverage-bias examples.
The four OD datasets use the same normalized schema:
| Column | Type | Meaning |
|---|---|---|
| origin | character | Origin MSOA or LAD/LTLA code |
| destination | character | Destination MSOA or LAD/LTLA code |
| flow | numeric | Non-negative OD flow |
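
The schema above can be turned into a quick sanity check before an OD table is passed on to other code. This is a sketch only: check_od_schema() is a hypothetical helper, not a function exported by debiasRdata, and the toy rows are made up.

```r
# Hypothetical helper (not part of debiasRdata): check the documented OD schema.
check_od_schema <- function(od) {
  stopifnot(
    is.data.frame(od),
    all(c("origin", "destination", "flow") %in% names(od)),
    is.character(od$origin),
    is.character(od$destination),
    is.numeric(od$flow),
    all(od$flow >= 0, na.rm = TRUE)
  )
  invisible(TRUE)
}

# Toy rows in the documented schema; codes and flows are illustrative.
toy_od <- data.frame(
  origin = c("E02000001", "E02000001"),
  destination = c("E02000001", "E02000977"),
  flow = c(120, 3),
  stringsAsFactors = FALSE
)
check_od_schema(toy_od)
```

With the package installed, the same call could be applied to any of the four OD datasets.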
The two covariate datasets use this schema:
| Column | Type | Meaning |
|---|---|---|
| area | character | MSOA or LAD/LTLA code |
| name | character | Area name |
| year | integer | Census year |
| per_ukborn | numeric | Percentage UK-born |
| per_age_20.29 | numeric | Percentage aged 20-29 |
| per_age_70plus | numeric | Percentage aged 70 or over |
| per_level4 | numeric | Percentage with level 4 qualifications |
| per_hh_no_centralheat | numeric | Percentage of households without central heating |
| per_NS_SeC_L13_routine | numeric | Percentage in NS-SeC L13 routine occupations |
| rural_pct | numeric | Percentage rural |
The coverage datasets use this schema:
| Column | Type | Meaning |
|---|---|---|
| date | integer | Reference year |
| name | character | MSOA or LAD/LTLA name |
| code | character | MSOA or LAD/LTLA code |
| population | integer | Benchmark population count |
| user_count | numeric | Mobile-phone-derived active-user count |
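
One simple way to use the population and user_count columns is to compute a per-area coverage rate. The sketch below runs on toy data in the coverage schema. The coverage_rate column and its definition (active users divided by benchmark population) are assumptions for illustration; debiasR may define coverage bias differently.

```r
# Toy table in the documented coverage schema; all values are illustrative.
toy_coverage <- data.frame(
  date = c(2021L, 2021L, 2021L),
  name = c("Area A", "Area B", "Area C"),
  code = c("X1", "X2", "X3"),
  population = c(1000L, 2000L, 500L),
  user_count = c(50, 40, 30),
  stringsAsFactors = FALSE
)

# Share of the benchmark population observed as active users in each area.
toy_coverage$coverage_rate <- toy_coverage$user_count / toy_coverage$population

# Pooled rate across all areas, for comparison with the per-area rates.
overall_rate <- sum(toy_coverage$user_count) / sum(toy_coverage$population)

toy_coverage[, c("code", "coverage_rate")]
overall_rate
```

Areas whose rate sits well above or below the pooled rate are the ones where the mobile-phone sample over- or under-represents the benchmark population.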

    library(debiasRdata)
    head(msoa_OD_travel2work)
    head(census_msoa_OD_travel2work)
    head(lad_OD_travel2work)
    head(census_lad_OD_travel2work)
    head(lad_centroids)
    head(msoa_centroids)
    head(msoa_covariates)
    head(lad_covariates)
    head(coverage_lad)
    head(coverage_msoa)

debiasR consumes these objects conditionally:

    library(debiasR)
    ex <- debiasR_example_data(n_areas = 25)
    names(ex)

Compressed normalized CSV files are also installed under inst/extdata and can
be located with:

    debiasRdata_path("msoa_OD_travel2work")
    debiasRdata_path("census_msoa_OD_travel2work")
    debiasRdata_path("lad_OD_travel2work")
    debiasRdata_path("census_lad_OD_travel2work")
    debiasRdata_path("lad_centroids")
    debiasRdata_path("msoa_centroids")
    debiasRdata_path("msoa_covariates")
    debiasRdata_path("lad_covariates")
    debiasRdata_path("coverage_lad")
    debiasRdata_path("coverage_msoa")

The helper only locates installed files. It does not download data during package use.
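
Once a file has been located, it can be read with base R. The sketch below writes a small gzip-compressed CSV to a temporary file so that it runs without the package installed. It assumes that debiasRdata_path() returns a single path to a .csv.gz file, which this README does not state explicitly.

```r
# Stand-in for a path returned by debiasRdata_path(); the .csv.gz extension
# is an assumption based on the "compressed normalized CSV" description.
path <- tempfile(fileext = ".csv.gz")

toy_od <- data.frame(
  origin = c("A", "B"),
  destination = c("B", "A"),
  flow = c(10, 8)
)

# Write a small gzip-compressed CSV so the example is self-contained.
con <- gzfile(path, open = "wt")
write.csv(toy_od, con, row.names = FALSE)
close(con)

# read.csv() decompresses gzip files transparently when given a file path.
od <- read.csv(
  path,
  colClasses = c(origin = "character", destination = "character", flow = "numeric")
)
str(od)
```

Setting colClasses keeps area codes as character, so codes are never coerced to numbers or factors on import.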
The mobile-phone-derived table is derived from:
- Title: Anonymised human location data for urban mobility research
- DOI: https://doi.org/10.5281/zenodo.13327082
- Source file: msoa_OD_travel2work.csv.gz
- License: Creative Commons Attribution 4.0 International (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
- Packaged modifications: columns were normalized to origin, destination, and flow; county columns were removed; OD pairs were validated and aggregated where needed.
The MSOA Census benchmark is derived from:
- Table: Census 2021 ODWP01EW, MSOA workplace-flow origin-destination table
- Source institution: Office for National Statistics, accessed via Nomis
- License: Open Government Licence v3.0, https://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
- Packaged modifications: rows were filtered to people working in the UK but not working at or from home; non-MSOA destination codes were removed; columns were normalized to origin, destination, and flow.
The LAD-level observed table is derived from the Zenodo MSOA travel-to-work
table using the ONS OA_to_LSOA_to_MSOA_to_LAD_Dec_2021_EW_V3.csv lookup. The
lookup is published by ONS under the Open Government Licence v3.0.
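
The aggregation from MSOA to LAD pairs can be sketched as follows. This is an illustration on toy data, not the package's build script. The lookup column names MSOA21CD and LAD22CD are assumed to match the ONS file, and all codes and flows are made up.

```r
# Toy MSOA-level flows in the normalized OD schema.
msoa_flows <- data.frame(
  origin = c("M1", "M2", "M3", "M1"),
  destination = c("M3", "M3", "M1", "M2"),
  flow = c(10, 5, 7, 2),
  stringsAsFactors = FALSE
)

# Toy lookup. The real lookup has one row per output area, so MSOA-to-LAD
# pairs repeat; unique() reduces it to one row per MSOA.
lookup <- unique(data.frame(
  MSOA21CD = c("M1", "M1", "M2", "M3"),
  LAD22CD = c("L1", "L1", "L1", "L2"),
  stringsAsFactors = FALSE
))

# Map each end of the OD pair to its LAD code.
lad_flows <- data.frame(
  origin = lookup$LAD22CD[match(msoa_flows$origin, lookup$MSOA21CD)],
  destination = lookup$LAD22CD[match(msoa_flows$destination, lookup$MSOA21CD)],
  flow = msoa_flows$flow,
  stringsAsFactors = FALSE
)

# Sum flows over each LAD origin-destination pair.
lad_od <- aggregate(flow ~ origin + destination, data = lad_flows, FUN = sum)
lad_od
```

Flows between two MSOAs in the same LAD collapse into a within-LAD pair (origin equal to destination) rather than being dropped.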
The LAD/LTLA Census benchmark is derived from:
- Table: Census 2021 ODWP01EW, LTLA workplace-flow origin-destination table
- Source file: ODWP01EW_LTLA.csv in the Nomis Census 2021 origin-destination zip
- Source institution: Office for National Statistics, accessed via Nomis
- License: Open Government Licence v3.0, https://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
- Packaged modifications: rows were filtered to people working in the UK but not working at or from home; columns were normalized to origin, destination, and flow; valid England/Wales local authority codes were retained.
The LAD centroid table is derived from:
- Boundary: ONS Local Authority Districts December 2021 GB boundary file LAD_Dec_2021_GB_BFC_2022.gpkg
- Source institution: Office for National Statistics
- License: Open Government Licence v3.0, https://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
- Packaged modifications: LAD code, LAD name, British National Grid easting / northing, and longitude / latitude fields were retained and renamed to area, name, easting, northing, longitude, and latitude.
The MSOA centroid table is derived from the local DEBIAS geography centroid
input msoa_centroids.rds, available under the Open Government Licence v3.0.
The packaged table keeps MSOA code, MSOA name, British National Grid easting /
northing, and longitude / latitude fields under the same schema as
lad_centroids.
The covariate tables are derived from:
- Collection: Census 2021 topic-summary inputs
- Source files: combined-data-msoa.csv, combined-data-lad.csv, and dictionary.csv
- Source institution: Office for National Statistics
- License: Open Government Licence v3.0, https://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
- Packaged modifications: geography.code was renamed to area, geography to name, and date to year; the seven requested label-5 covariates were retained; source column per_age_20-29 was renamed to the label-5/R-facing name per_age_20.29.
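
The rename from per_age_20-29 to per_age_20.29 follows R's rules for syntactic names: a hyphen is not allowed in a bare column name, so the source name would need backticks in every expression. The short sketch below shows that base R's make.names() produces the same R-facing name. Whether the package build actually used make.names() is not stated here.

```r
source_name <- "per_age_20-29"

# The hyphen makes the source name non-syntactic, so make.names() changes it.
is_syntactic <- identical(make.names(source_name), source_name)

# make.names() replaces each invalid character with ".".
r_name <- make.names(source_name)

is_syntactic
r_name
```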
The LAD coverage table is derived from the same Zenodo mobile-phone-derived data
release used for the packaged observed travel-to-work OD assets, licensed under
Creative Commons Attribution 4.0 International (CC BY 4.0). The packaged table
keeps benchmark population counts, renames the LAD-level aggregate active-user
count to user_count, sets date to 2021, and maps LAD/LTLA codes from
lad_covariates by area name. It contains LAD/LTLA-level aggregate counts only;
it does not contain individual records, device identifiers, raw trajectories, or
raw mobile-phone data.
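
Mapping codes by area name can be sketched with an exact match. The example below uses toy data and is not the package's build code; the area names and codes are made up. It also shows why name-based joins need a check for unmatched rows.

```r
# Toy coverage rows that carry names but no codes.
toy_coverage <- data.frame(
  name = c("Area A", "Area B", "Area Z"),
  population = c(1000L, 2000L, 500L),
  user_count = c(50, 40, 30),
  stringsAsFactors = FALSE
)

# Toy covariate table supplying the code (area) for each name.
toy_covariates <- data.frame(
  area = c("X1", "X2", "X3"),
  name = c("Area A", "Area B", "Area C"),
  stringsAsFactors = FALSE
)

# Look up each coverage row's code by exact area-name match.
idx <- match(toy_coverage$name, toy_covariates$name)
toy_coverage$code <- toy_covariates$area[idx]

# Names with no match yield NA and need manual review.
unmatched <- toy_coverage$name[is.na(toy_coverage$code)]
unmatched
```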
The MSOA coverage table is derived from the local DEBIAS
coverage_msoa.rda active-population-bias output. The active-user counts are
derived from the same Zenodo mobile-phone-derived data release used for the
packaged observed travel-to-work OD assets, licensed under Creative Commons
Attribution 4.0 International (CC BY 4.0). Benchmark population counts are
ONS-derived under the Open Government Licence v3.0. The packaged table keeps
date, MSOA name, benchmark population, and the aggregate active-user count
user_count, renames MSOA21CD to code, and removes rows without valid MSOA
codes. It contains MSOA-level aggregate counts only; it does not contain
individual records, device identifiers, raw trajectories, or raw mobile-phone
data.
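
The code clean-up step can be sketched as a pattern filter followed by a rename. The rule used in the package build is not spelled out here. The example assumes a valid code is an England or Wales MSOA code of the form E02 or W02 followed by six digits; is_msoa_code() is a hypothetical helper and the rows are toy data.

```r
# Toy input with one missing, one out-of-scope, and one malformed code.
toy_msoa <- data.frame(
  MSOA21CD = c("E02000001", "W02000001", NA, "S02000001", "E0200001"),
  population = c(1500L, 1800L, 900L, 1200L, 1100L),
  user_count = c(60, 45, 20, 30, 25),
  stringsAsFactors = FALSE
)

# Hypothetical validity rule: E02 or W02 plus six digits. grepl() returns
# FALSE for NA, so missing codes are dropped as well.
is_msoa_code <- function(code) grepl("^[EW]02[0-9]{6}$", code)

cleaned <- toy_msoa[is_msoa_code(toy_msoa$MSOA21CD), ]

# Rename the source column to the packaged name.
names(cleaned)[names(cleaned) == "MSOA21CD"] <- "code"
cleaned
```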
Build metadata, checksums, row counts, and transformation notes are recorded in
inst/metadata/source-metadata.json.
Full OD distance tables are not packaged. For LAD examples, debiasR computes
selected-area distances from lad_centroids. For MSOA examples, debiasR can
compute selected-area distances from msoa_centroids.
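
Computing distances only for the areas in use keeps the result small: n selected areas need an n by n matrix instead of a table covering every OD pair. The sketch below assumes straight-line distance on British National Grid coordinates in metres, using the area, easting, and northing columns of the centroid schema. The method debiasR actually uses is not stated here, and the coordinates are toy values.

```r
# Toy centroids in the documented schema (coordinates in metres).
toy_centroids <- data.frame(
  area = c("X1", "X2", "X3"),
  easting = c(530000, 533000, 530000),
  northing = c(180000, 184000, 192000),
  stringsAsFactors = FALSE
)

# Restrict to the areas of interest before computing any distances.
selected <- c("X1", "X2")
sel <- toy_centroids[toy_centroids$area %in% selected, ]

# Pairwise Euclidean distances, converted from metres to kilometres.
coords <- as.matrix(sel[, c("easting", "northing")])
rownames(coords) <- sel$area
dist_km <- as.matrix(dist(coords)) / 1000
dist_km
```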