
[FEATURE] Columnar JSON serialization format #1381

@mgovers

Description


Currently, the PGM JSON (and MessagePack) formats (https://power-grid-model.readthedocs.io/en/stable/user_manual/serialization.html#json-serialization-format-specification) are row-major. That is, all attributes of a single component must be specified together.

It would be nice to also be able to specify columnar component data. This addition needs to be made to the specification and the C++ (de-)serialization.

NOTE: the existing format can still be loaded as either row-based or columnar data. This issue is specifically about the serialized data.

NOTE: we probably need to bump the JSON schema version from 1.0 to 2.0

Proposal for format

Row-based homogeneous component data (existing)

Introduced in #320

{
  "version": "1.0",
  "type": "input",
  "is_batch": false,
  "attributes": {
    "sym_load": ["id", "p_specified"],
    "asym_load": ["id", "p_specified"]
  },
  "data": {
    "sym_load": [
      [1, 1.0e4],
      [2, 1.1e5]
    ],
    "asym_load": [
      [4, [1.0e4, 1.1e4, 1.2e4]],
      [3, [1.1e5, 1.2e5, 1.3e5]]
    ]
  }
}
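
To illustrate how the "attributes" header pairs with the positional rows above, here is a minimal Python sketch that expands each row into an explicit attribute-to-value mapping. The function name expand_homogeneous is hypothetical and not part of the power-grid-model API, and the sketch assumes a non-batch document in which every row is positional.

```python
def expand_homogeneous(document):
    """Pair each positional row with the attribute names declared for its component.

    Assumption: non-batch document, every row is a list (homogeneous form).
    """
    attributes = document["attributes"]
    expanded = {}
    for component, rows in document["data"].items():
        names = attributes[component]
        expanded[component] = [dict(zip(names, row)) for row in rows]
    return expanded


document = {
    "version": "1.0",
    "type": "input",
    "is_batch": False,
    "attributes": {
        "sym_load": ["id", "p_specified"],
        "asym_load": ["id", "p_specified"],
    },
    "data": {
        "sym_load": [[1, 1.0e4], [2, 1.1e5]],
        "asym_load": [
            [4, [1.0e4, 1.1e4, 1.2e4]],
            [3, [1.1e5, 1.2e5, 1.3e5]],
        ],
    },
}

expanded = expand_homogeneous(document)
```

The result has the same shape as the "data" section of the row-based inhomogeneous form.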

Row-based inhomogeneous component data (current)

Introduced in #320

{
  "version": "1.0",
  "type": "input",
  "is_batch": false,
  "attributes": {},
  "data": {
    "sym_load": [
      {"id": 1, "p_specified": 1.0e4},
      {"id": 2, "p_specified": 1.1e5}
    ],
    "asym_load": [
      {"id": 4, "p_specified": [1.0e4, 1.1e4, 1.2e4]},
      {"id": 3, "p_specified": [1.1e5, 1.2e5, 1.3e5]}
    ]
  }
}

Columnar component data (this issue)

{
  "version": "1.0",
  "type": "input",
  "is_batch": false,
  "attributes": {},
  "data": {
    "sym_load": {
      "id": [1, 2],
      "p_specified": [1.0e4, 1.1e5]
    },
    "asym_load": {
      "id": [4, 3],
      "p_specified": [
        [1.0e4, 1.1e4, 1.2e4], 
        [1.1e5, 1.2e5, 1.3e5]
      ]
    }
  }
}
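
As a rough illustration of the relation between the row-based inhomogeneous form and the proposed columnar form, the following Python sketch turns a list of per-component rows into one list per attribute. The helper rows_to_columns is hypothetical (it is not an existing power-grid-model function), and it assumes that every row of a component specifies the same attributes.

```python
def rows_to_columns(rows):
    """Convert [{attribute: value}, ...] into {attribute: [value, ...]}.

    Assumption: all rows specify the same attributes; otherwise the
    columns would end up with different lengths, so we raise instead.
    """
    if not rows:
        return {}
    attributes = list(rows[0])
    columns = {name: [] for name in attributes}
    for row in rows:
        if set(row) != set(attributes):
            raise ValueError("rows specify different attributes")
        for name in attributes:
            columns[name].append(row[name])
    return columns


row_based = {
    "sym_load": [
        {"id": 1, "p_specified": 1.0e4},
        {"id": 2, "p_specified": 1.1e5},
    ],
    "asym_load": [
        {"id": 4, "p_specified": [1.0e4, 1.1e4, 1.2e4]},
        {"id": 3, "p_specified": [1.1e5, 1.2e5, 1.3e5]},
    ],
}

columnar = {component: rows_to_columns(rows) for component, rows in row_based.items()}
```

Row order is preserved, so position i in every column of a component still refers to the same component instance.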

Row-based homogeneous batch component data (existing)

Introduced in #320

{
  "version": "1.0",
  "type": "update",
  "is_batch": true,
  "attributes": {
    "sym_load": ["id", "p_specified"],
    "asym_load": ["id", "p_specified"]
  },
  "data": [                         // scenarios
    {                               // scenario 0
      "sym_load": [
        [1, 1.0e4],
        [2, 1.1e5]
      ],
      "asym_load": [
        [3, [1.0e4, 1.1e4, 1.2e4]],
        [4, [1.1e5, 1.2e5, 1.3e5]]
      ]
    },
    {                               // scenario 1
      "sym_load": [
        [1, 2.0e4],
        [2, 2.1e5]
      ],
      "asym_load": [
        [3, [2.0e4, 2.1e4, 2.2e4]],
        [4, [2.1e5, 2.2e5, 2.3e5]]
      ]
    }
  ]
}

Row-based inhomogeneous batch component data (current)

Introduced in #320

{
  "version": "1.0",
  "type": "update",
  "is_batch": true,
  "attributes": {},
  "data": [                                              // scenarios
    {                                                    // scenario 0
      "sym_load": [
        {"id": 1, "p_specified": 1.0e4},
        {"id": 2, "p_specified": 1.1e5}
      ],
      "asym_load": [
        {"id": 3, "p_specified": [1.0e4, 1.1e4, 1.2e4]},
        {"id": 4, "p_specified": [1.1e5, 1.2e5, 1.3e5]}
      ]
    },
    {                                                    // scenario 1
      "sym_load": [
        {"id": 1, "p_specified": 2.0e4},
        {"id": 2, "p_specified": 2.1e5}
      ],
      "asym_load": [
        {"id": 3, "p_specified": [2.0e4, 2.1e4, 2.2e4]},
        {"id": 4, "p_specified": [2.1e5, 2.2e5, 2.3e5]}
      ]
    }
  ]
}

Scenario-major columnar batch serialization format (this issue)

A hybrid of the current batch serialization format and columnar component data: the outer array still lists the scenarios, and each scenario holds columnar component data.

{
  "version": "1.0",
  "type": "update",
  "is_batch": true,
  "attributes": {},
  "data": [                           // scenarios
    {                                 // scenario 0
      "sym_load": {
        "id": [1, 2],
        "p_specified": [1.0e4, 1.1e5]
      },
      "asym_load": {
        "id": [3, 4],
        "p_specified": [
          [1.0e4, 1.1e4, 1.2e4], 
          [1.1e5, 1.2e5, 1.3e5]
        ]
      }
    },
    {                                 // scenario 1
      "sym_load": {
        "id": [1, 2],
        "p_specified": [2.0e4, 2.1e5]
      },
      "asym_load": {
        "id": [3, 4],
        "p_specified": [
          [2.0e4, 2.1e4, 2.2e4], 
          [2.1e5, 2.2e5, 2.3e5]
        ]
      }
    }
  ]
}

Component-major columnar batch serialization format

Taking columnar component data all the way: the components form the outer level, and each attribute column holds one entry per scenario.

{
  "version": "1.0",
  "type": "update",
  "is_batch": true,
  "attributes": {},
  "data": {
    "sym_load": {
      "id": [   // scenarios
        [1, 2], // scenario 0
        [1, 2]  // scenario 1
      ],
      "p_specified": [  // scenarios
        [1.0e4, 1.1e5], // scenario 0
        [2.0e4, 2.1e5]  // scenario 1
      ]
    },
    "asym_load": {
      "id": [   // scenarios
        [3, 4], // scenario 0
        [3, 4]  // scenario 1
      ],
      "p_specified": [           // scenarios
        [                        // scenario 0
          [1.0e4, 1.1e4, 1.2e4], 
          [1.1e5, 1.2e5, 1.3e5]
        ],
        [                        // scenario 1
          [2.0e4, 2.1e4, 2.2e4], 
          [2.1e5, 2.2e5, 2.3e5]
        ]
      ]
    }
  }
}
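
The two columnar batch layouts carry the same information and differ only in nesting order. The Python sketch below regroups a scenario-major "data" array into the component-major shape. The function name is hypothetical, and the sketch assumes that every scenario lists the same components and attributes; it raises otherwise.

```python
def scenario_major_to_component_major(scenarios):
    """Regroup [{component: {attr: values}}, ...] into {component: {attr: [values, ...]}}.

    Assumption: every scenario contains the same components and attributes.
    """
    result = {}
    for index, scenario in enumerate(scenarios):
        for component, columns in scenario.items():
            target = result.setdefault(component, {})
            for attribute, values in columns.items():
                per_scenario = target.setdefault(attribute, [])
                if len(per_scenario) != index:
                    # The attribute was missing in an earlier scenario.
                    raise ValueError(f"{component}.{attribute} is not present in every scenario")
                per_scenario.append(values)
    for component, columns in result.items():
        for attribute, per_scenario in columns.items():
            if len(per_scenario) != len(scenarios):
                # The attribute was missing in a later scenario.
                raise ValueError(f"{component}.{attribute} is not present in every scenario")
    return result


scenario_major = [
    {"sym_load": {"id": [1, 2], "p_specified": [1.0e4, 1.1e5]}},  # scenario 0
    {"sym_load": {"id": [1, 2], "p_specified": [2.0e4, 2.1e5]}},  # scenario 1
]

component_major = scenario_major_to_component_major(scenario_major)
```

Here component_major["sym_load"]["p_specified"] holds one inner list per scenario, in scenario order.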

Additional considerations

  • In columnar component data, all columns of a component must have the same length; otherwise the mapping between IDs and the other attributes becomes inconsistent.
  • It is still possible to leave some attributes unspecified.
  • Pre-specifying the attributes is still allowed but has no benefit, just as for inhomogeneous data.
  • There is no data format conflict: in the current format a component maps to an Array ("node": Array), whereas in the added format it maps to an Object ("node": Object). Everything is backwards compatible.
  • We cannot use an equivalent format that omits the attribute names at component level, as row-based homogeneous data does, because that would cause a data format conflict and name clashes.
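
The Array-versus-Object distinction and the equal-length requirement can be sketched in Python as follows. Both function names are hypothetical and only illustrate the checks a deserializer might perform; they are not taken from the C++ implementation.

```python
def component_layout(component_data):
    """Tell the layouts apart by JSON type: Object means columnar, Array means row-based."""
    if isinstance(component_data, dict):
        return "columnar"
    if isinstance(component_data, list):
        return "row-based"
    raise TypeError("component data must be a JSON Array or Object")


def column_length(columns):
    """Return the shared column length, or raise if the columns differ in length."""
    lengths = {name: len(values) for name, values in columns.items()}
    if len(set(lengths.values())) > 1:
        raise ValueError(f"columns differ in length: {lengths}")
    return next(iter(lengths.values()), 0)


row_based = [{"id": 1, "p_specified": 1.0e4}, {"id": 2, "p_specified": 1.1e5}]
columnar = {"id": [1, 2], "p_specified": [1.0e4, 1.1e5]}

layouts = (component_layout(row_based), component_layout(columnar))
n_components = column_length(columnar)
```

A column set such as {"id": [1, 2], "p_specified": [1.0e4]} would be rejected, because the second ID would have no matching value.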
