- Guidance for Code Lists
Just as a CSVW file or a qb:Dataset is a distribution of a dcat:Dataset, so is a skos:ConceptScheme. We recommend that statisticians serialise code lists as CSVW for their primary distribution, and generate other distributions from that source.
TODO: Versioning of code lists is a hard problem to solve. As ONS doesn't directly control many code lists, we suggest encouraging people to adopt the versioning convention of the code list as the slug.
To encourage the reuse of code lists, we recommend adopting the naming convention used by the code list's maintainer, e.g. UKSIC-2007 for the UK Standard Industrial Classification of Economic Activities 2007, or ISIC-4 for the International Standard Industrial Classification Revision 4 published by the United Nations in 2008.
TODO: Some stuff here around the styling of labels used in a taxonomy.
TODO: Some UN best practices for creating classifications here.
Every qb:DimensionProperty must have a skos:ConceptScheme associated with it which is related using the qb:codeList property. The skos:ConceptScheme is used to define the list of codes used by the dimension.
All codes which are part of a codelist must have type skos:Concept and be related to the codelist using the skos:inScheme property.
classDiagram
class Dimension {
a qb:DimensionProperty
}
class Codelist {
a skos:ConceptScheme
}
class Code {
a skos:Concept
}
Dimension --> "1" Codelist : qb.codeList
Codelist --> "1..*" Code : skos.hasTopConcept
Code --> "1" Codelist : skos.inScheme
Code --> "1..*" Code : skos.narrower
Code "1" <-- Code : skos.broader
We recommend the use of skos:ConceptScheme, skos:Concept.
We recommend codelists have IRIs of one of the following forms:
http://{domain}/codelist/{codelist_slug}
http://{domain}/codelist/{codelist_slug}/{edition_year}
For example:
http://data.gov.uk/codelist/some-codelist
http://data.gov.uk/codelist/sitc/2022
| Property | Requirement level | Notes |
|---|---|---|
| dcterms:title | mandatory | See title |
| dcterms:description | mandatory | See description |
| dcterms:publisher | mandatory | See publishers |
| dcterms:license | mandatory | See license |
| dcat:distribution | mandatory | See distribution |
| dcterms:creator | recommended | See creator |
| dcat:contactPoint | recommended | See contact point |
| dcterms:issued | recommended | See issued |
| dcterms:modified | recommended | See modified |
| dcat:keyword | recommended | See keyword |
| dcat:theme | recommended | See themes |
http://{domain}/codelist/{codelist_slug}/code/{code_slug}
For example:
http://data.gov.uk/codelist/some-codelist/code/some-code
http://data.gov.uk/codelist/standard-international-trade-classification/revision-4/code/01
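The IRI patterns above can be sketched as a pair of small helpers. This is only an illustration: the function names `codelist_iri` and `code_iri` are hypothetical, and percent-encoding the slugs with `urllib.parse.quote` is an assumption rather than something this guidance mandates.

```python
from typing import Optional
from urllib.parse import quote


def codelist_iri(domain: str, codelist_slug: str, edition: Optional[str] = None) -> str:
    """Build http://{domain}/codelist/{codelist_slug}, with an optional edition segment."""
    iri = f"http://{domain}/codelist/{quote(codelist_slug, safe='')}"
    if edition is not None:
        iri += f"/{quote(str(edition), safe='')}"
    return iri


def code_iri(codelist: str, code_slug: str) -> str:
    """Build {codelist}/code/{code_slug} for a single code within a codelist."""
    # Assumption: the slug is percent-encoded so it stays a single path segment.
    return f"{codelist}/code/{quote(code_slug, safe='')}"


some_codelist = codelist_iri("data.gov.uk", "some-codelist")
sitc_2022 = codelist_iri("data.gov.uk", "sitc", "2022")
print(some_codelist)
print(sitc_2022)
print(code_iri(some_codelist, "some-code"))
```

Building the code IRI from the codelist IRI keeps every code under its codelist's namespace, so the two cannot drift apart.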
| Property | Requirement level | Notes |
|---|---|---|
| skos:inScheme | mandatory | See codelists |
| rdfs:label | mandatory | See titles |
| skos:prefLabel | mandatory | See titles |
| skos:notation | mandatory | |
| skos:broader | recommended | See hierarchical codelists |
| skos:narrower | recommended | See hierarchical codelists |
| skos:related | recommended | See correspondence between codelists |
| skos:exactMatch | recommended | See correspondence between codelists |
| skos:closeMatch | recommended | See correspondence between codelists |
| skos:broadMatch | recommended | See correspondence between codelists |
| skos:altLabel | optional | |
| skos:hiddenLabel | optional | |
Hierarchies in codelists must be indicated by using the skos:broader and skos:narrower predicates.
The codes at the top of the hierarchy (those which have no skos:broader relationships) must be related to the codelist using the skos:hasTopConcept property.
flowchart TD
Animals((Animals codelist)) -->|skos:hasTopConcept| animals((animals))
Animals((Animals codelist)) -->|rdf:type| skos:ConceptScheme((skos:ConceptScheme))
animals -->|skos:narrower| mammals((mammals))
mammals -->|skos:broader| animals
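As a rough illustration of the hierarchy rules above, the sketch below derives the top concepts and the inverse skos:narrower links from a plain mapping of each code to its broader code. The mapping, the extra `birds` and `dogs` codes, and the function names are all hypothetical; this is not a replacement for an RDF toolkit.

```python
from collections import defaultdict

# Hypothetical input: each code mapped to its broader (parent) code,
# or None when the code has no skos:broader relationship.
broader = {"animals": None, "mammals": "animals", "birds": "animals", "dogs": "mammals"}


def top_concepts(broader_of):
    """Codes with no broader code; the codelist needs skos:hasTopConcept for each."""
    return sorted(code for code, parent in broader_of.items() if parent is None)


def narrower_index(broader_of):
    """Invert the broader links so each parent lists its skos:narrower codes."""
    index = defaultdict(list)
    for code, parent in broader_of.items():
        if parent is not None:
            index[parent].append(code)
    return {parent: sorted(children) for parent, children in index.items()}


def dangling_parents(broader_of):
    """Parents that are referenced but never defined as codes themselves."""
    return sorted({p for p in broader_of.values() if p is not None and p not in broader_of})


print(top_concepts(broader))
print(narrower_index(broader))
print(dangling_parents(broader))
```

A non-empty result from `dangling_parents` would suggest a typo in a parent reference, which is worth checking before publishing the codelist.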
Producers may use dcat:qualifiedRelation (or a sub-property of it) to relate a codelist to a statistical dataset.
TODO: Coin the IRI for relating a dataset to a codelist.
Wherever possible, statisticians should aim to reuse codes from common codelists. However, they may wish to combine or alter codes within a codelist for reasons of statistical suppression or quality. In doing so, they create a variant of an official codelist which is customised to suit their needs.
Codelists should be related to their variants using the xkos:variant property.
For example, some statistics which make use of the Standard Industrial Classification (SIC) have changed some categories and included the following notations for these codes within their statistical output:
- 11.01-06
- 20.11 + 20.13
- 20.14+20.16+20.17+20.6
- 20.15 /1
- 20.15 /2
- 24.4-5 (not 24.42 nor 24.46)
- 33 (not 33.15-16)
TODO: How to express these sorts of semantics using XKOS/SKOS or OWL?
Notations like these, while intended to be descriptive, may be confusing or inappropriate for inclusion in an IRI. When extending a codelist with custom codes, we recommend generating new notations which are:
- Similar in style and convention to the codelist which is being extended.
- Free of clashes with any current (or future) notations which feature in the codelist.
We may achieve this in several ways:
- Assigning new codes to large unused digits (such as 99) or unused letters (e.g. X or Z),
- Combining new or related digits with an unused character (e.g. 33x or 33.15x, X.1, X.2 etc.),
- Appending /1 through /9 when creating new subdivisions of an already existing code.
When extending an already established codelist, the creator of the new codelist must familiarise themselves with the existing codelist, how it is structured and ensure their extension does not introduce any clashes with existing codes.
| Example | Possible notation | Notes |
|---|---|---|
| 11.01-06 | 11.0X, 11.0Z, 11.9x | Expresses the sum of 11.01 through to 11.06, which are subdivisions of 11.0. 11.07 is occupied by another category. |
| 20.15 /1 | 20.15/1 | Expresses a custom subdivision of an already existing category, 20.15. |
| 20.14+20.16+20.17+20.6 | 20.X, 20.Z, 20.9x | Expresses the sum of 20.14, 20.16 and 20.17, which are subdivisions of 20.1, along with 20.6. |
| 24.4-5 (not 24.42 nor 24.46) | 24.X, 24.Z, 24.9x | Expresses the sum of 24.4 and 24.5 but excluding the subdivisions 24.42 and 24.46. |
| 33 (not 33.15-16) | 33X, 33Z, 33.9x | Expresses the sum of 33.1 and 33.2, which are subdivisions of 33, excluding 33.15 and 33.16. |
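A quick automated check can catch some clashes before a custom notation is published. The sketch below is a minimal example under stated assumptions: the character rule (letters, digits, `.` and `/` only) is illustrative rather than official, the `existing` list is a small hand-picked subset, and `review_notations` is a hypothetical helper. It cannot predict clashes with future editions of the official codelist.

```python
import re

# Assumption: notations are limited to letters, digits, "." and "/" so that
# they can be placed in an IRI path without escaping. Illustrative only.
SAFE_NOTATION = re.compile(r"[A-Za-z0-9./]+")


def review_notations(existing, proposed):
    """Map each problematic proposed notation to the reason it was flagged."""
    taken = {notation.casefold() for notation in existing}
    problems = {}
    for notation in proposed:
        if not SAFE_NOTATION.fullmatch(notation):
            problems[notation] = "contains characters unsuitable for an IRI"
        elif notation.casefold() in taken:
            problems[notation] = "clashes with an existing notation"
    return problems


existing = ["11.01", "11.06", "11.07", "20.15", "33", "33.15", "33.16"]
proposed = ["11.0X", "20.15/1", "11.07", "24.4-5 (not 24.42 nor 24.46)"]
print(review_notations(existing, proposed))
```

The comparison is case-insensitive so that, for example, `33x` and `33X` are treated as the same notation.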
We can use CSVW as a convenient way to create a codelist, represented in RDF using SKOS.
For example, take the Standard International Trade Classification (SITC):
0 Food and live animals
├─ 00 Live animals other than animals of division 03
│ ├─ 001 Live animals other than animals of division 03
├─ 01 Meat and meat preparations
│ ├─ 011 Meat of bovine animals, fresh, chilled or frozen
│ ├─ 012 Other meat and edible meat offal
│ ├─ 016 Meat, edible meat offal, salted, dried; flours, meals
│ ├─ 017 Meat, edible meat offal, prepared, preserved, n.e.s
│ ├─ ...
├─ 02 Dairy products and birds' eggs
│ ├─ 022 Milk, cream and milk products (excluding butter, cheese)
│ ├─ ...
We can create a CSV representation of the different classifications along with the hierarchy as follows:
| notation | label | comment | parent_notation |
|---|---|---|---|
| 0 | Food and live animals | ... | |
| 00 | Live animals other than animals of division 03 | ... | 0 |
| 001 | Live animals other than animals of division 03 | ... | 00 |
| 01 | Meat and meat preparations | ... | 0 |
| 011 | Meat of bovine animals, fresh, chilled or frozen | ... | 01 |
| 012 | Other meat and edible meat offal | ... | 01 |
| 016 | Meat, edible meat offal, salted, dried; flours, meals | ... | 01 |
| 017 | Meat, edible meat offal, prepared, preserved, n.e.s | ... | 01 |
| ... | ... | ... | |
| 02 | Dairy products and birds' eggs | ... | 0 |
| 022 | Milk, cream and milk products (excluding butter, cheese) | ... | 02 |
| ... | ... | ... | |
We are able to create a CSVW file which can be used to create a codelist. Note the use of virtual columns to assert the type and the relationship between the concepts and the concept scheme.
{
"@context": "http://www.w3.org/ns/csvw",
"@id": "http://data.gov.uk/codelist/standard-international-trade-classification/revision-4.csv",
"@type": "Table",
"url": "http://data.gov.uk/codelist/standard-international-trade-classification/revision-4.csv",
"tableSchema": {
"columns": [
{
"titles": "notation",
"name": "notation",
"required": true,
"propertyUrl": "skos:notation"
},
{
"titles": "label",
"name": "label",
"required": true,
"propertyUrl": "rdfs:label"
},
{
"titles": "comment",
"name": "comment",
"required": false,
"propertyUrl": "rdfs:comment"
},
{
"titles": "parent_notation",
"name": "parent_notation",
"required": false,
"propertyUrl": "skos:broader",
"valueUrl": "http://data.gov.uk/codelist/standard-international-trade-classification/revision-4/{+parent_notation}"
},
{
"virtual": true,
"propertyUrl": "skos:inScheme",
"valueUrl": "http://data.gov.uk/codelist/standard-international-trade-classification/revision-4"
},
{
"virtual": true,
"propertyUrl": "rdf:type",
"valueUrl": "skos:Concept"
}
],
"aboutUrl": "http://data.gov.uk/codelist/standard-international-trade-classification/revision-4/{+notation}"
}
}
Performing csv2rdf on this CSVW produces RDF like:
<http://data.gov.uk/codelist/standard-international-trade-classification/revision-4/0> a skos:Concept ;
skos:notation "0" ;
rdfs:label "Food and live animals" ;
rdfs:comment "..." ;
skos:inScheme <http://data.gov.uk/codelist/standard-international-trade-classification/revision-4> ;
.
<http://data.gov.uk/codelist/standard-international-trade-classification/revision-4/00> a skos:Concept ;
skos:notation "00" ;
rdfs:label "Live animals other than animals of division 03" ;
rdfs:comment "..." ;
skos:broader <http://data.gov.uk/codelist/standard-international-trade-classification/revision-4/0> ;
skos:inScheme <http://data.gov.uk/codelist/standard-international-trade-classification/revision-4> ;
.
# etc...
A limitation of using CSVW to produce a skos:ConceptScheme is the inability to set both skos:narrower and skos:broader relationships concurrently, and to set the skos:hasTopConcept relationship. When loading a skos:ConceptScheme generated from CSVW in this way, we serialise these additional relationships using CONSTRUCT queries in SPARQL.
The following SPARQL query produces skos:narrower relationships:
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
CONSTRUCT {
?broaderConcept skos:narrower ?concept.
}
WHERE {
?conceptScheme a skos:ConceptScheme .
?concept
skos:inScheme ?conceptScheme;
skos:broader ?broaderConcept.
FILTER NOT EXISTS {
?broaderConcept skos:narrower ?concept.
}
}
The following SPARQL query produces skos:hasTopConcept relationships:
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
CONSTRUCT {
?conceptScheme skos:hasTopConcept ?concept.
}
WHERE {
?conceptScheme a skos:ConceptScheme .
?concept skos:inScheme ?conceptScheme.
FILTER NOT EXISTS {
# Find concepts which don't have anything broader, they are by definition topConcepts.
?concept skos:broader ?broaderConcept.
}
FILTER NOT EXISTS {
# Ensure we don't add topConcept where it is already set.
?conceptScheme skos:hasTopConcept ?concept.
}
}
Statisticians may wish to report statistics against multiple classifications. Doing so may be difficult, as different classifications typically use different namespaces for their IRIs.
For example, consider a dataset which mixes codes from the NUTS geography codelist with codes from the ONS geography codelist.
| geography_code | geography_label | value |
|---|---|---|
| UKC | North East, England | ... |
| UKD | North West, England | ... |
| E92000001 | England | ... |
The NUTS codes have IRIs which are maintained by Eurostat, such as http://data.europa.eu/nuts/code/UKC, whereas the ONS geography codes have IRIs which are maintained by the ONS, such as http://statistics.data.gov.uk/id/statistical-geography/E92000001.
We map the cells of the dataset to RDF by using the valueUrl CSVW property. Only a single valueUrl can be applied to all the cells in a column. This is problematic, as the IRIs we wish to map to have different bases. Setting valueUrl to http://data.europa.eu/nuts/code/{geography_code} would result in a non-existent identifier http://data.europa.eu/nuts/code/E92000001 appearing in the RDF output.
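The problem described above can be reproduced with a few lines of code. This is a simplified stand-in for CSVW URI-template expansion, not a csv2rdf implementation: the hypothetical `expand_value_url` helper only substitutes plain `{column}` placeholders using Python's `str.format`.

```python
# Rows taken from the example dataset above (values omitted).
rows = [
    {"geography_code": "UKC", "geography_label": "North East, England"},
    {"geography_code": "UKD", "geography_label": "North West, England"},
    {"geography_code": "E92000001", "geography_label": "England"},
]

# A single valueUrl template is applied to every cell in the column.
NUTS_TEMPLATE = "http://data.europa.eu/nuts/code/{geography_code}"


def expand_value_url(template, row):
    """Substitute {column} placeholders with the row's cell values."""
    return template.format(**row)


iris = [expand_value_url(NUTS_TEMPLATE, row) for row in rows]
for iri in iris:
    print(iri)
```

The last IRI printed combines the NUTS namespace with an ONS geography code, which is exactly the non-existent identifier the paragraph above warns about.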
We address this by creating new identifiers for each of the codes under a shared namespace, and using skos:exactMatch relations to relate these new identifiers to the more commonly used identifiers. For example,
<http://data.gov.uk/dataset/some-dataset/codelist/geography/code/E92000001>
skos:exactMatch <http://statistics.data.gov.uk/id/statistical-geography/E92000001> ;
.
If using a CSVW to create a codelist, then the skos:exactMatch relationships can be expressed by adding an additional column to the CSV:
| notation | label | same_as | |
|---|---|---|---|
| UKC | North East, England | http://data.europa.eu/nuts/code/UKC | ... |
| UKD | North West, England | http://data.europa.eu/nuts/code/UKD | ... |
| E92000001 | England | http://statistics.data.gov.uk/id/statistical-geography/E92000001 | ... |
The additional column would have the following specification inside the CSVW:
{
"titles": "same_as",
"name": "same_as",
"required": true,
"propertyUrl": "skos:exactMatch",
"valueUrl": "{+same_as}"
}
This would result in the following RDF:
<http://data.gov.uk/dataset/some-dataset/codelist/geography/code/UKC> a skos:Concept ;
skos:notation "UKC" ;
rdfs:label "North East, England" ;
skos:prefLabel "North East, England" ;
skos:inScheme <http://data.gov.uk/dataset/some-dataset/codelist/geography> ;
skos:exactMatch <http://data.europa.eu/nuts/code/UKC> ;
.
# ...
<http://data.gov.uk/dataset/some-dataset/codelist/geography/code/E92000001> a skos:Concept ;
skos:notation "E92000001" ;
rdfs:label "England" ;
skos:prefLabel "England" ;
skos:inScheme <http://data.gov.uk/dataset/some-dataset/codelist/geography> ;
skos:exactMatch <http://statistics.data.gov.uk/id/statistical-geography/E92000001> ;
.
Warning: This section needs further work.
TODO: Add stuff about xkos
TODO: Some example using HMRC guidance
For example, the Combined Nomenclature (CN8) is a classification of commodities of trade. It is updated in legislation each year.
HMRC publishes each annual edition of CN8 and provides correspondence tables between the different years' editions.
| 2021 code | 2022 code |
|---|---|
| 0208 90 98 | 0208 90 98 |
| 0208 90 98 | 0410 10 10 |
| 0210 99 39 | 0210 99 39 |
| 0210 99 39 | 0410 10 99 |
| 0210 99 90 | 0210 99 90 |
| 0210 99 90 | 0410 10 91 |
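The correspondence table above is one-to-many: each 2021 code appears against more than one 2022 code. As a minimal sketch, the rows can be grouped per source code before being expressed as associations. The helper names are hypothetical, and stripping the spaces from the codes is an assumption made so that they can be used as identifiers.

```python
from collections import defaultdict

# Rows copied from the correspondence table above: (2021 code, 2022 code).
rows = [
    ("0208 90 98", "0208 90 98"),
    ("0208 90 98", "0410 10 10"),
    ("0210 99 39", "0210 99 39"),
    ("0210 99 39", "0410 10 99"),
    ("0210 99 90", "0210 99 90"),
    ("0210 99 90", "0410 10 91"),
]


def group_correspondence(pairs):
    """Group (source, target) pairs into one entry per source code, spaces removed."""
    grouped = defaultdict(list)
    for source, target in pairs:
        grouped[source.replace(" ", "")].append(target.replace(" ", ""))
    return dict(grouped)


def split_codes(grouped):
    """Source codes which correspond to more than one target code."""
    return sorted(code for code, targets in grouped.items() if len(targets) > 1)


grouped = group_correspondence(rows)
print(grouped)
print(split_codes(grouped))
```

Each key in `grouped` corresponds to one source concept, and its list holds the target concepts for that association.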
<> a xkos:Correspondence ;
xkos:compares
<http://data.gov.uk/codelist/combined-nomenclature/2022>,
<http://data.gov.uk/codelist/combined-nomenclature/2021> ;
xkos:madeOf <> ;
.
<> a xkos:ConceptAssociation ;
xkos:sourceConcept <http://data.gov.uk/codelist/combined-nomenclature/02089098> ;
xkos:targetConcept
<http://data.gov.uk/codelist/combined-nomenclature/02089098> ,
<http://data.gov.uk/codelist/combined-nomenclature/04101010> ;
.
Prefer using IRIs from the http://statistics.data.gov.uk vocabulary, based on ONS geography codes.
| Label | IRI |
|---|---|
| United Kingdom | http://statistics.data.gov.uk/id/statistical-geography/K02000001 |
| Great Britain | http://statistics.data.gov.uk/id/statistical-geography/K03000001 |
| England and Wales | http://statistics.data.gov.uk/id/statistical-geography/K04000001 |
| England | http://statistics.data.gov.uk/id/statistical-geography/E92000001 |
| Northern Ireland | http://statistics.data.gov.uk/id/statistical-geography/N92000002 |
| Scotland | http://statistics.data.gov.uk/id/statistical-geography/S92000003 |
| Wales | http://statistics.data.gov.uk/id/statistical-geography/W92000004 |
Data providers should prefer using IRIs from the Dublin Core Collection Description Frequency Vocabulary, http://purl.org/cld/freq/.
Common options include:
| Label | IRI |
|---|---|
| Annual | http://purl.org/cld/freq/annual |
| Quarterly | http://purl.org/cld/freq/quarterly |
| Monthly | http://purl.org/cld/freq/monthly |
| Weekly | http://purl.org/cld/freq/weekly |
| Daily | http://purl.org/cld/freq/daily |
| Label | IRI |
|---|---|
| Open Government Licence v3.0 | http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/ |
| Open Government Licence v2.0 | http://www.nationalarchives.gov.uk/doc/open-government-licence/version/2/ |
| Open Government Licence v1.0 | http://www.nationalarchives.gov.uk/doc/open-government-licence/version/1/ |
GOV.UK provides a list of government organisations, which can be used to populate the dcterms:publisher and dcterms:creator properties.
For example: https://www.gov.uk/government/organisations/office-for-national-statistics.
Data providers should adopt the Analysis Function guidance for statistical markers.
TODO: realistically change this from Analysis Function to SDMX, as the latter will be around in 10 years guaranteed.
| Label | Notation | IRI |
|---|---|---|
| Break in time series | [b] | http://data.gov.uk/codelist/statistical-markers/code/[b] |
| Confidential | [c] | http://data.gov.uk/codelist/statistical-markers/code/[c] |
| Estimated | [e] | http://data.gov.uk/codelist/statistical-markers/code/[e] |
| Earliest revision | [er] | http://data.gov.uk/codelist/statistical-markers/code/[er] |
| Forecast | [f] | http://data.gov.uk/codelist/statistical-markers/code/[f] |
| Low | [low] | http://data.gov.uk/codelist/statistical-markers/code/[low] |
| Not significant | [ns] | http://data.gov.uk/codelist/statistical-markers/code/[ns] |
| Provisional | [p] | http://data.gov.uk/codelist/statistical-markers/code/[p] |
| Revised | [r] | http://data.gov.uk/codelist/statistical-markers/code/[r] |
| Significance level of 0.05 | [s] | http://data.gov.uk/codelist/statistical-markers/code/[s] |
| Significance level of 0.01 | [ss] | http://data.gov.uk/codelist/statistical-markers/code/[ss] |
| Significance level of 0.001 | [sss] | http://data.gov.uk/codelist/statistical-markers/code/[sss] |
| Low reliability | [u] | http://data.gov.uk/codelist/statistical-markers/code/[u] |
| None recorded in survey | [w] | http://data.gov.uk/codelist/statistical-markers/code/[w] |
| Not available | [x] | http://data.gov.uk/codelist/statistical-markers/code/[x] |
| Not applicable | [z] | http://data.gov.uk/codelist/statistical-markers/code/[z] |
| Label | IRI |
|---|---|
| Business, Trade and International Development | http://osr.statisticsauthority.gov.uk/themes/business-trade-international-development |
| Children, Education and Skills | http://osr.statisticsauthority.gov.uk/themes/children-education-skills |
| Crime and Security | http://osr.statisticsauthority.gov.uk/themes/crime-security |
| Economy | http://osr.statisticsauthority.gov.uk/themes/economy |
| Health and Social Care | http://osr.statisticsauthority.gov.uk/themes/health-social-care |
| Housing, Planning and Local Services | http://osr.statisticsauthority.gov.uk/themes/housing-planning-local-services |
| Labour Market and Welfare | http://osr.statisticsauthority.gov.uk/themes/labour-market-welfare |
| Population and Society | http://osr.statisticsauthority.gov.uk/themes/population-society |
| Transport, Environment and Climate Change | http://osr.statisticsauthority.gov.uk/themes/transport-environment-climate-change |
TODO: Cover media types from IANA
| Label | IRI |
|---|---|
| CSV | http://www.w3.org/ns/iana/media-types/text/csv#Resource |
| JSON | http://www.w3.org/ns/iana/media-types/application/json#Resource |
| Turtle | http://www.w3.org/ns/iana/media-types/text/turtle#Resource |
There are a variety of different ways that time can be represented in your data. Below are some examples:
| period_type | period_code | period_label |
|---|---|---|
| day | 1999-12-31 | 31-December-1999 |
For calendar day data we require the period_type to be day. The period_code is the year, then the month, then the day. The period_label is the day, then the month written in full, then the year, which helps with human readability.
| period_type | period_code | period_label |
|---|---|---|
| month | 2020-01 | January-2020 |
For monthly data covering a calendar month we require the period_type to be month. The period_code is the year followed by the two-digit number of the month. The period_label is intended to be more human readable, so it shows the month's full name followed by the year.
| period_type | period_code | period_label |
|---|---|---|
| quarter | 2020-Q1 | 2020-Q1 |
For quarterly data covering a calendar quarter we require the period_type to be quarter. The period_code and period_label must be the same: the year followed by the quarter.
| period_type | period_code | period_label |
|---|---|---|
| year | 2020 | 2020 |
For calendar year data we require the period_type to be year. The period_code and period_label must be the same: just the year.
| period_type | period_code | period_label |
|---|---|---|
| government-year | 2020-2021 | 2020-2021 |
For a government year, which starts in April, we require the period_type to be government-year. The period_code and period_label must be the same: the year in which the period starts followed by the year in which it ends.
| period_type | period_code | period_label |
|---|---|---|
| gregorian-interval | 2001-04-01T00:00:00/P2M | Apr-May 2001 |
A Gregorian interval can be used if the time frame of your data does not conform to a standard period. It can also be used for monthly, quarterly and yearly data, though with slightly reduced clarity. We recommend using only the start-and-duration form. In the example above, the start date is 1 April 2001 and P2M gives the length of the period, here 2 months. For further details on how to construct a Gregorian interval, please refer to the ISO 8601 Durations section on Wikipedia.
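The calendar-based period conventions above can be sketched as small formatting helpers. This is a minimal example, not an official implementation: the function names are hypothetical, the day label is not zero-padded (the guidance does not say either way), and Gregorian intervals are deliberately left out.

```python
from datetime import date

# English month names are spelled out so labels do not depend on the locale.
MONTHS = (
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
)


def day_period(d: date):
    """Return (period_type, period_code, period_label) for a calendar day."""
    return ("day", d.isoformat(), f"{d.day}-{MONTHS[d.month - 1]}-{d.year}")


def month_period(year: int, month: int):
    if not 1 <= month <= 12:
        raise ValueError("month must be between 1 and 12")
    return ("month", f"{year}-{month:02d}", f"{MONTHS[month - 1]}-{year}")


def quarter_period(year: int, quarter: int):
    if not 1 <= quarter <= 4:
        raise ValueError("quarter must be between 1 and 4")
    code = f"{year}-Q{quarter}"
    return ("quarter", code, code)


def year_period(year: int):
    return ("year", str(year), str(year))


def government_year_period(start_year: int):
    """A government year starting in April of start_year."""
    code = f"{start_year}-{start_year + 1}"
    return ("government-year", code, code)


print(day_period(date(1999, 12, 31)))
print(month_period(2020, 1))
print(government_year_period(2020))
```

Each helper returns the three columns in the same order as the example tables, so the tuples can be written straight to a CSV row.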
| geography_code | geography_label | geography_type |
|---|---|---|
| K02000001 | United Kingdom | Country |
| E92000001 | England | Nation |
| E12000001 | North East | Region |
| E06000047 | County Durham | County or Unitary Authority |
| E07000088 | Gosport | Local Authority District |
| E14001252 | Gosport | Westminster Constituency |
The table above shows the variety of geography types that can be represented in your data. The important thing is that in the geography code column each entry has its own identifiable code.
| age_code | age_label |
|---|---|
| Y_GE16 | Aged 16 years and over |
| Y16T24 | Aged 16 to 24 |
| Y25T34 | Aged 25 to 34 |
| Y35T44 | Aged 35 to 44 |
| Y45T54 | Aged 45 to 54 |
| Y55T74 | Aged 55 to 74 |
| Y_GE75 | Aged 75 and over |
The examples in the table above show the best way to represent different age categories. This has come from the Statistical Data and Metadata eXchange (SDMX) guidelines 1
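The age codes in the table follow a regular pattern, which can be sketched as below. The pattern is inferred only from the rows shown (`Y{lower}T{upper}` for a closed range and `Y_GE{lower}` for an open-ended one); the hypothetical `age_code` helper does not cover any other code forms that the SDMX guidelines may define.

```python
from typing import Optional


def age_code(lower: int, upper: Optional[int] = None) -> str:
    """Build an age code for 'lower to upper', or 'lower and over' when upper is None."""
    if lower < 0:
        raise ValueError("lower must not be negative")
    if upper is None:
        return f"Y_GE{lower}"
    if upper < lower:
        raise ValueError("upper must not be less than lower")
    return f"Y{lower}T{upper}"


bands = [(16, None), (16, 24), (25, 34), (35, 44), (45, 54), (55, 74), (75, None)]
codes = [age_code(lower, upper) for lower, upper in bands]
print(codes)
```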
| sex_code | sex_label |
|---|---|
| F | Female |
| M | Male |
| _N | Non response |
| _O | Other |
| _U | Unknown |
| _Z | Not applicable |
The examples in the table above show the best way to represent different sex categories. This has come from the Statistical Data and Metadata eXchange (SDMX) guidelines 2