Skip to content

New code matrix packet types #21

@maksverver

Description

@maksverver

Currently, the Par3 spec supports three types of code matrices:

  1. Cauchy Matrix Packet
  2. Sparse Random Matrix Packet (generated from a specific PRNG)
  3. Explicit Matrix Packet

In practice, these require an O(N^3) matrix inversion for recovery, and an O(N^2) matrix multiplication for both generation and recovery. (The sparse random matrix is more efficient during generation; not sure if this also holds for recovery.)

However it looks like the state of the art has improved, with some fast implementations of Reed-Solomon codes being proposed:

The first two provide the same redundancy as the Cauchy matrix, but can be support generation and recovery in O(N log N) time, making them much faster for large number of blocks. Leopard-RS claims to encode at 450 MB/s and decode at 190 MB/s, which is really fast.

The latter two should be even faster, though like sparse random matrices, they might need slightly more recovery blocks than the number of errors in a cohort.

It might be possible to support some of these coding schemes with an explicit matrix packet (I haven't delved deeply enough into the implementation to figure it out; the math is a bit intimidating), but even if true, that leaves two problems:

  1. The explicit matrix packet takes up a lot of space, especially when interleaving is involved (see Missing feature: subslice hashes #11).
  2. It may not be trivial to recover an efficient algorithm from an explicit packet.

Which brings me to the main question: should Par3 support any of these coding schemes explicitly? If so, which ones? And if so, should we have have explicit support for interleaving?

It's worth noting that @Yutaka-Sawada already implemented a Leopard matrix packet:

The FFT matrix packet has a type value of "PAR FFT\0" (ASCII). The packet's body contains the following:

*Table: FFT Packet Body Contents*

| Length (bytes) | Type         | Description                          |
|---------------:|:-------------|:-------------------------------------|
|        8       | unsigned int | Index of first input block           |
|        8       | unsigned int | Index of last input block plus 1     |
|        1       | signed int   | Max number of blocks (as power of 2) |
|      0 ~ 8     | byte[]       | Number of interleaving blocks        |

One solution is to just add this to the spec. But that leads to a side question: if Leopard-RS is an option, how useful are the other coding schemes? Are e.g. the sparse random codes sufficiently faster to justify using them instead?

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions