Currently, the PGM JSON (and MessagePack) serialization format (https://power-grid-model.readthedocs.io/en/stable/user_manual/serialization.html#json-serialization-format-specification) is entirely row-major. That is, all attributes of a single component have to be specified together.
It would be nice to also support columnar component data. This requires changes to both the specification and the C++ (de)serialization.
NOTE: data in the existing format can still be loaded as either row-based or columnar data. This issue is specifically about the serialized representation.
NOTE: we probably need to bump the JSON schema version from 1.0 to 2.0.
Proposal for format
Row-based homogeneous component data (existing)
Introduced in #320
{
  "version": "1.0",
  "type": "input",
  "is_batch": false,
  "attributes": {
    "sym_load": ["id", "p_specified"],
    "asym_load": ["id", "p_specified"]
  },
  "data": {
    "sym_load": [
      [1, 1.0e4],
      [2, 1.1e5]
    ],
    "asym_load": [
      [4, [1.0e4, 1.1e4, 1.2e4]],
      [3, [1.1e5, 1.2e5, 1.3e5]]
    ]
  }
}
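A minimal Python sketch (standard library only) of how a reader could expand row-based homogeneous data: the top-level "attributes" entry supplies the names, and each positional row supplies the values in the same order. `expand_homogeneous` is a hypothetical helper for illustration, not part of the power-grid-model API or its C++ deserializer.

```python
import json


def expand_homogeneous(document: dict) -> dict:
    """Hypothetical helper: turn positional rows into {attribute: value} dicts."""
    attributes = document["attributes"]
    result = {}
    for component, rows in document["data"].items():
        names = attributes[component]  # pre-specified attribute order for this component
        expanded = []
        for row in rows:
            # Each positional row must provide exactly one value per declared attribute.
            if len(row) != len(names):
                raise ValueError(f"{component}: expected {len(names)} values, got {len(row)}")
            expanded.append(dict(zip(names, row)))
        result[component] = expanded
    return result


doc = json.loads("""
{
  "version": "1.0", "type": "input", "is_batch": false,
  "attributes": {"sym_load": ["id", "p_specified"], "asym_load": ["id", "p_specified"]},
  "data": {
    "sym_load": [[1, 1.0e4], [2, 1.1e5]],
    "asym_load": [[4, [1.0e4, 1.1e4, 1.2e4]], [3, [1.1e5, 1.2e5, 1.3e5]]]
  }
}
""")
print(expand_homogeneous(doc)["sym_load"])
# [{'id': 1, 'p_specified': 10000.0}, {'id': 2, 'p_specified': 110000.0}]
```

The sketch only covers components listed in "attributes"; it does not handle documents that mix homogeneous and inhomogeneous components.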
Row-based inhomogeneous component data (current)
Introduced in #320
{
  "version": "1.0",
  "type": "input",
  "is_batch": false,
  "attributes": {},
  "data": {
    "sym_load": [
      {"id": 1, "p_specified": 1.0e4},
      {"id": 2, "p_specified": 1.1e5}
    ],
    "asym_load": [
      {"id": 4, "p_specified": [1.0e4, 1.1e4, 1.2e4]},
      {"id": 3, "p_specified": [1.1e5, 1.2e5, 1.3e5]}
    ]
  }
}
Columnar component data (this issue)
{
  "version": "1.0",
  "type": "input",
  "is_batch": false,
  "attributes": {},
  "data": {
    "sym_load": {
      "id": [1, 2],
      "p_specified": [1.0e4, 1.1e5]
    },
    "asym_load": {
      "id": [3, 4],
      "p_specified": [
        [1.0e4, 1.1e4, 1.2e4],
        [1.1e5, 1.2e5, 1.3e5]
      ]
    }
  }
}
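Columnar component data is the transpose of row-based inhomogeneous data. The hypothetical helpers below sketch both directions in Python; they are illustrations, not the power-grid-model implementation. Filling an attribute that a row omits with `None` is an assumption made only so that every column keeps one entry per component. The reverse direction rejects columns of unequal length, because their values could no longer be matched to IDs.

```python
def rows_to_columns(rows: list[dict]) -> dict[str, list]:
    """Hypothetical helper: transpose inhomogeneous rows into the proposed columnar layout."""
    names: list[str] = []
    for row in rows:
        for name in row:
            if name not in names:  # keep first-seen attribute order
                names.append(name)
    # Assumption: an attribute that a row leaves out becomes None,
    # so every column ends up with exactly one entry per component.
    return {name: [row.get(name) for row in rows] for name in names}


def columns_to_rows(columns: dict[str, list]) -> list[dict]:
    """Hypothetical helper: transpose columnar data back into inhomogeneous rows."""
    lengths = {len(values) for values in columns.values()}
    if len(lengths) > 1:
        # Columns of different lengths cannot be mapped back to IDs unambiguously.
        raise ValueError(f"columns have different lengths: {sorted(lengths)}")
    count = lengths.pop() if lengths else 0
    return [{name: values[i] for name, values in columns.items()} for i in range(count)]


sym_load_rows = [{"id": 1, "p_specified": 1.0e4}, {"id": 2, "p_specified": 1.1e5}]
sym_load_columns = rows_to_columns(sym_load_rows)
print(sym_load_columns)  # {'id': [1, 2], 'p_specified': [10000.0, 110000.0]}
assert columns_to_rows(sym_load_columns) == sym_load_rows  # round trip
```

Applied to the sym_load rows of the row-based example, this reproduces the sym_load entry of the columnar example.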
Row-based homogeneous batch component data (existing)
Introduced in #320
{
  "version": "1.0",
  "type": "update",
  "is_batch": true,
  "attributes": {
    "sym_load": ["id", "p_specified"],
    "asym_load": ["id", "p_specified"]
  },
  "data": [ // scenarios
    { // scenario 0
      "sym_load": [
        [1, 1.0e4],
        [2, 1.1e5]
      ],
      "asym_load": [
        [3, [1.0e4, 1.1e4, 1.2e4]],
        [4, [1.1e5, 1.2e5, 1.3e5]]
      ]
    },
    { // scenario 1
      "sym_load": [
        [1, 2.0e4],
        [2, 2.1e5]
      ],
      "asym_load": [
        [3, [2.0e4, 2.1e4, 2.2e4]],
        [4, [2.1e5, 2.2e5, 2.3e5]]
      ]
    }
  ]
}
Row-based inhomogeneous batch component data (current)
Introduced in #320
{
  "version": "1.0",
  "type": "update",
  "is_batch": true,
  "attributes": {},
  "data": [ // scenarios
    { // scenario 0
      "sym_load": [
        {"id": 1, "p_specified": 1.0e4},
        {"id": 2, "p_specified": 1.1e5}
      ],
      "asym_load": [
        {"id": 3, "p_specified": [1.0e4, 1.1e4, 1.2e4]},
        {"id": 4, "p_specified": [1.1e5, 1.2e5, 1.3e5]}
      ]
    },
    { // scenario 1
      "sym_load": [
        {"id": 1, "p_specified": 2.0e4},
        {"id": 2, "p_specified": 2.1e5}
      ],
      "asym_load": [
        {"id": 3, "p_specified": [2.0e4, 2.1e4, 2.2e4]},
        {"id": 4, "p_specified": [2.1e5, 2.2e5, 2.3e5]}
      ]
    }
  ]
}
Scenario-major columnar batch serialization format (this issue)
A hybrid of the current batch serialization format and columnar component data: the batch is still a list of scenarios, but each component within a scenario is columnar.
{
  "version": "1.0",
  "type": "update",
  "is_batch": true,
  "attributes": {},
  "data": [ // scenarios
    { // scenario 0
      "sym_load": {
        "id": [1, 2],
        "p_specified": [1.0e4, 1.1e5]
      },
      "asym_load": {
        "id": [3, 4],
        "p_specified": [
          [1.0e4, 1.1e4, 1.2e4],
          [1.1e5, 1.2e5, 1.3e5]
        ]
      }
    },
    { // scenario 1
      "sym_load": {
        "id": [1, 2],
        "p_specified": [2.0e4, 2.1e5]
      },
      "asym_load": {
        "id": [3, 4],
        "p_specified": [
          [2.0e4, 2.1e4, 2.2e4],
          [2.1e5, 2.2e5, 2.3e5]
        ]
      }
    }
  ]
}
Component-major columnar batch serialization format
Going columnar all the way: each component maps every attribute to a list with one column per scenario.
{
  "version": "1.0",
  "type": "update",
  "is_batch": true,
  "attributes": {},
  "data": {
    "sym_load": {
      "id": [ // scenarios
        [1, 2], // scenario 0
        [1, 2]  // scenario 1
      ],
      "p_specified": [ // scenarios
        [1.0e4, 1.1e5], // scenario 0
        [2.0e4, 2.1e5]  // scenario 1
      ]
    },
    "asym_load": {
      "id": [ // scenarios
        [3, 4], // scenario 0
        [3, 4]  // scenario 1
      ],
      "p_specified": [ // scenarios
        [ // scenario 0
          [1.0e4, 1.1e4, 1.2e4],
          [1.1e5, 1.2e5, 1.3e5]
        ],
        [ // scenario 1
          [2.0e4, 2.1e4, 2.2e4],
          [2.1e5, 2.2e5, 2.3e5]
        ]
      ]
    }
  }
}
Additional considerations
- Within a component, all columns must have the same length; columns of varying length would make the mapping of IDs to the other attributes inconsistent.
- It is still possible to leave some attributes unspecified.
- Pre-specifying the attributes is still allowed but brings no benefit, just as for row-based inhomogeneous data.
- There is no data format conflict: the current format has "node": Array, while the proposed format has "node": Object. Everything remains backwards compatible.
- We cannot use a columnar equivalent of the row-based homogeneous format, in which the attributes are not specified at the component level, because that would cause a data format conflict and name clashes.
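The backwards-compatibility argument rests on telling a JSON array (row-based) apart from a JSON object (columnar) for each component. The Python sketch below shows that check and uses it to regroup scenario-major columnar batch data into the component-major layout. Both function names are hypothetical and not part of power-grid-model. The sketch assumes every scenario lists the same components and attributes; a real implementation would need a rule for components absent from a scenario (for example, an empty column), which this sketch simply rejects.

```python
from typing import Any


def component_layout(component_data: Any) -> str:
    """Hypothetical helper: classify one component's serialized data by its JSON type."""
    # A JSON array (Python list) is the existing row-based layout;
    # a JSON object (Python dict) is the proposed columnar layout.
    if isinstance(component_data, list):
        return "row-based"
    if isinstance(component_data, dict):
        return "columnar"
    raise TypeError(f"unexpected component data type: {type(component_data).__name__}")


def scenario_major_to_component_major(scenarios: list[dict]) -> dict:
    """Hypothetical helper: regroup scenario-major columnar batch data per component."""
    result: dict[str, dict[str, list]] = {}
    for scenario in scenarios:
        for component, columns in scenario.items():
            if component_layout(columns) != "columnar":
                raise ValueError(f"{component}: expected columnar data")
            target = result.setdefault(component, {})
            for name, values in columns.items():
                # Each attribute collects one column per scenario.
                target.setdefault(name, []).append(values)
    # Assumption: every scenario lists the same components and attributes;
    # otherwise the per-scenario lists would not line up.
    for component, columns in result.items():
        for name, per_scenario in columns.items():
            if len(per_scenario) != len(scenarios):
                raise ValueError(f"{component}.{name} is missing in some scenarios")
    return result


scenarios = [
    {"sym_load": {"id": [1, 2], "p_specified": [1.0e4, 1.1e5]}},
    {"sym_load": {"id": [1, 2], "p_specified": [2.0e4, 2.1e5]}},
]
print(component_layout([[1, 1.0e4], [2, 1.1e5]]))  # row-based
print(scenario_major_to_component_major(scenarios))
# {'sym_load': {'id': [[1, 2], [1, 2]], 'p_specified': [[10000.0, 110000.0], [20000.0, 210000.0]]}}
```

Because the check looks only at the JSON type of each component entry, existing row-based documents are never misread as columnar.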