[FEATURE] Columnar JSON serialization format #1381

Currently, the PGM JSON and MessagePack serialization formats (https://power-grid-model.readthedocs.io/en/stable/user_manual/serialization.html#json-serialization-format-specification) are row-major. That is, all attributes of a single component have to be specified together.

It would be nice to also be able to specify columnar component data. This addition needs to be made to the specification and the C++ (de-)serialization.

NOTE: data serialized in the existing format can still be loaded as either row-based or columnar data. This issue is specifically about the layout of the serialized data.

NOTE: we probably need to bump the JSON schema version from `1.0` to `2.0`

## Proposal for format

### Row-based homogeneous component data (existing)

Introduced in https://github.com/PowerGridModel/power-grid-model/issues/320

```json
{
  "version": "1.0",
  "type": "input",
  "is_batch": false,
  "attributes": {
    "sym_load": ["id", "p_specified"],
    "asym_load": ["id", "p_specified"]
  },
  "data": {
    "sym_load": [
      [1, 1.0e4],
      [2, 1.1e5]
    ],
    "asym_load": [
      [4, [1.0e4, 1.1e4, 1.2e4]],
      [3, [1.1e5, 1.2e5, 1.3e5]]
    ]
  }
}
```

### Row-based inhomogeneous component data (current)

Introduced in https://github.com/PowerGridModel/power-grid-model/issues/320

```json
{
  "version": "1.0",
  "type": "input",
  "is_batch": false,
  "attributes": {},
  "data": {
    "sym_load": [
      {"id": 1, "p_specified": 1.0e4},
      {"id": 2, "p_specified": 1.1e5}
    ],
    "asym_load": [
      {"id": 4, "p_specified": [1.0e4, 1.1e4, 1.2e4]},
      {"id": 3, "p_specified": [1.1e5, 1.2e5, 1.3e5]}
    ]
  }
}
```

### Columnar component data (this issue)

```json
{
  "version": "1.0",
  "type": "input",
  "is_batch": false,
  "attributes": {},
  "data": {
    "sym_load": {
      "id": [1, 2],
      "p_specified": [1.0e4, 1.1e5]
    },
    "asym_load": {
      "id": [3, 4],
      "p_specified": [
        [1.0e4, 1.1e4, 1.2e4], 
        [1.1e5, 1.2e5, 1.3e5]
      ]
    }
  }
}
```
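To make the relation to the row-based inhomogeneous example concrete, here is a minimal Python sketch of the conversion. `rows_to_columns` is a hypothetical helper, not part of the PGM API, and it assumes every row of a component carries the same attributes (the existing inhomogeneous format does not require that).

```python
from typing import Any


def rows_to_columns(rows: list[dict[str, Any]]) -> dict[str, list[Any]]:
    """Turn row-based records into one list per attribute.

    Hypothetical helper: it requires every row to carry the same attributes,
    so that all resulting columns end up with the same length.
    """
    if not rows:
        return {}
    names = list(rows[0])
    for row in rows:
        if set(row) != set(names):
            raise ValueError(f"row {row!r} does not match attributes {names}")
    return {name: [row[name] for row in rows] for name in names}


# The "data" member of a row-based inhomogeneous dataset.
row_based = {
    "sym_load": [
        {"id": 1, "p_specified": 1.0e4},
        {"id": 2, "p_specified": 1.1e5},
    ],
    "asym_load": [
        {"id": 3, "p_specified": [1.0e4, 1.1e4, 1.2e4]},
        {"id": 4, "p_specified": [1.1e5, 1.2e5, 1.3e5]},
    ],
}

# The "data" member of the proposed columnar dataset.
columnar = {component: rows_to_columns(rows) for component, rows in row_based.items()}
```

Rows with differing attributes are rejected here on purpose; how the real serializer should treat them is left open.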

### Row-based homogeneous batch component data (existing)

Introduced in https://github.com/PowerGridModel/power-grid-model/issues/320

```jsonc
{
  "version": "1.0",
  "type": "update",
  "is_batch": true,
  "attributes": {
    "sym_load": ["id", "p_specified"],
    "asym_load": ["id", "p_specified"]
  },
  "data": [                         // scenarios
    {                               // scenario 0
      "sym_load": [
        [1, 1.0e4],
        [2, 1.1e5]
      ],
      "asym_load": [
        [3, [1.0e4, 1.1e4, 1.2e4]],
        [4, [1.1e5, 1.2e5, 1.3e5]]
      ]
    },
    {                               // scenario 1
      "sym_load": [
        [1, 2.0e4],
        [2, 2.1e5]
      ],
      "asym_load": [
        [3, [2.0e4, 2.1e4, 2.2e4]],
        [4, [2.1e5, 2.2e5, 2.3e5]]
      ]
    }
  ]
}
```

### Row-based inhomogeneous batch component data (current)

Introduced in https://github.com/PowerGridModel/power-grid-model/issues/320

```jsonc
{
  "version": "1.0",
  "type": "update",
  "is_batch": true,
  "attributes": {},
  "data": [                                              // scenarios
    {                                                    // scenario 0
      "sym_load": [
        {"id": 1, "p_specified": 1.0e4},
        {"id": 2, "p_specified": 1.1e5}
      ],
      "asym_load": [
        {"id": 3, "p_specified": [1.0e4, 1.1e4, 1.2e4]},
        {"id": 4, "p_specified": [1.1e5, 1.2e5, 1.3e5]}
      ]
    },
    {                                                    // scenario 1
      "sym_load": [
        {"id": 1, "p_specified": 2.0e4},
        {"id": 2, "p_specified": 2.1e5}
      ],
      "asym_load": [
        {"id": 3, "p_specified": [2.0e4, 2.1e4, 2.2e4]},
        {"id": 4, "p_specified": [2.1e5, 2.2e5, 2.3e5]}
      ]
    }
  ]
}
```

### Scenario-major columnar batch serialization format (this issue)

A hybrid of the current batch serialization format and columnar component data: `data` remains an array of scenarios, and each scenario holds columnar component data.

```jsonc
{
  "version": "1.0",
  "type": "update",
  "is_batch": true,
  "attributes": {},
  "data": [                           // scenarios
    {                                 // scenario 0
      "sym_load": {
        "id": [1, 2],
        "p_specified": [1.0e4, 1.1e5]
      },
      "asym_load": {
        "id": [3, 4],
        "p_specified": [
          [1.0e4, 1.1e4, 1.2e4], 
          [1.1e5, 1.2e5, 1.3e5]
        ]
      }
    },
    {                                 // scenario 1
      "sym_load": {
        "id": [1, 2],
        "p_specified": [2.0e4, 2.1e5]
      },
      "asym_load": {
        "id": [3, 4],
        "p_specified": [
          [2.0e4, 2.1e4, 2.2e4], 
          [2.1e5, 2.2e5, 2.3e5]
        ]
      }
    }
  ]
}
```

### Component-major columnar batch serialization format

Going columnar all the way: `data` is keyed by component and attribute, and each attribute holds one array per scenario.

```jsonc
{
  "version": "1.0",
  "type": "update",
  "is_batch": true,
  "attributes": {},
  "data": {
    "sym_load": {
      "id": [   // scenarios
        [1, 2], // scenario 0
        [1, 2]  // scenario 1
      ],
      "p_specified": [  // scenarios
        [1.0e4, 1.1e5], // scenario 0
        [2.0e4, 2.1e5]  // scenario 1
      ]
    },
    "asym_load": {
      "id": [   // scenarios
        [3, 4], // scenario 0
        [3, 4]  // scenario 1
      ],
      "p_specified": [           // scenarios
        [                        // scenario 0
          [1.0e4, 1.1e4, 1.2e4], 
          [1.1e5, 1.2e5, 1.3e5]
        ],
        [                        // scenario 1
          [2.0e4, 2.1e4, 2.2e4], 
          [2.1e5, 2.2e5, 2.3e5]
        ]
      ]
    }
  }
}
```
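The two batch layouts carry the same information as long as every scenario specifies the same components and attributes. A rough Python sketch of the transposition, with `to_component_major` as a hypothetical helper that simply rejects anything else:

```python
from typing import Any

ScenarioMajor = list[dict[str, dict[str, list[Any]]]]
ComponentMajor = dict[str, dict[str, list[list[Any]]]]


def to_component_major(scenarios: ScenarioMajor) -> ComponentMajor:
    """Transpose scenario-major columnar batch data into component-major form.

    Assumption of this sketch: an attribute must appear in every scenario,
    otherwise the per-scenario lists could not be indexed by scenario number.
    """
    result: ComponentMajor = {}
    for index, scenario in enumerate(scenarios):
        for component, columns in scenario.items():
            for attribute, values in columns.items():
                per_scenario = result.setdefault(component, {}).setdefault(attribute, [])
                if len(per_scenario) != index:
                    raise ValueError(f"{component}.{attribute} is missing in an earlier scenario")
                per_scenario.append(values)
    for component, columns in result.items():
        for attribute, per_scenario in columns.items():
            if len(per_scenario) != len(scenarios):
                raise ValueError(f"{component}.{attribute} is missing in a later scenario")
    return result


scenario_major = [
    {"sym_load": {"id": [1, 2], "p_specified": [1.0e4, 1.1e5]}},  # scenario 0
    {"sym_load": {"id": [1, 2], "p_specified": [2.0e4, 2.1e5]}},  # scenario 1
]
component_major = to_component_major(scenario_major)
```

Whether the component-major format should support attributes that are only present in some scenarios is a design question this sketch does not answer.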

## Additional considerations

* All columns of a component must have the same length, because columns of varying length would make the mapping of IDs to the other attributes inconsistent.
* It is still possible to leave some attributes unspecified.
* Pre-specifying the attributes in `attributes` is still allowed, but has no benefit, just as for row-based inhomogeneous data.
* There is no data format conflict, because in the current format a component maps to an Array (`"node": Array`), while in the added format it maps to an Object (`"node": Object`). Everything is backwards compatible.
* We cannot use an equivalent format in which the attributes are not specified at component level, as is done for row-based homogeneous data, because that would cause a data format conflict and name clashes.
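
The Array-versus-Object distinction and the equal-length rule can be illustrated with a short Python sketch. Both functions are hypothetical and only look at one component's entry within a single dataset or scenario; the actual check would live in the C++ deserializer.

```python
import json
from typing import Any


def component_layout(component_data: Any) -> str:
    """Classify one component's entry purely by its JSON type."""
    if isinstance(component_data, list):
        return "row-based"  # existing format: "node": Array
    if isinstance(component_data, dict):
        return "columnar"  # proposed format: "node": Object
    raise TypeError(f"unexpected component data of type {type(component_data).__name__}")


def check_columns(columns: dict[str, list[Any]]) -> int:
    """Return the number of component instances, or raise if columns differ in length."""
    lengths = {name: len(values) for name, values in columns.items()}
    if len(set(lengths.values())) > 1:
        raise ValueError(f"columns differ in length: {lengths}")
    return next(iter(lengths.values()), 0)


existing = json.loads('[{"id": 1, "p_specified": 1.0e4}, {"id": 2, "p_specified": 1.1e5}]')
proposed = json.loads('{"id": [1, 2], "p_specified": [1.0e4, 1.1e5]}')

layouts = (component_layout(existing), component_layout(proposed))
size = check_columns(proposed)
```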
