
Should differential encoding be used? #15

@qwertie

I'm not sure what the general attitude in the CG is toward compression, but I remember that early on there was a strong focus on minimizing file size. With that in mind: general-purpose compression algorithms like Deflate don't handle arbitrary numbers well. They cannot, for instance, predict that an increasing sequence will keep increasing. A common workaround is to store each number as the difference from the previous one; e.g., instead of [5, 8, 11, 21], store [5, 3, 3, 10]. This makes the numbers smaller, which tends to improve the compression ratio because it concentrates probability mass (e.g., in arithmetic or Huffman coding) on a small set of small values.
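A minimal sketch of differential (delta) encoding as described above, assuming the first value is stored as-is (i.e., relative to 0). The function names are illustrative only:

```python
from itertools import accumulate


def delta_encode(values: list[int]) -> list[int]:
    """Keep the first value, then store each value minus its predecessor."""
    # Pair each value with the one before it (0 for the first element).
    return [cur - prev for prev, cur in zip([0] + values, values)]


def delta_decode(deltas: list[int]) -> list[int]:
    """Invert delta_encode with a running sum."""
    return list(accumulate(deltas))


print(delta_encode([5, 8, 11, 21]))  # [5, 3, 3, 10]
print(delta_decode([5, 3, 3, 10]))   # [5, 8, 11, 21]
```

Decoding is a single running sum, so the reader-side cost of this scheme is negligible.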

The current format is quite natural:

the u32 byte offset of the hinted instruction from the first instruction of the function.

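But since the list must be sorted, every difference between consecutive offsets is non-negative, so it would be straightforward to use differential encoding instead: store each offset relative to the previous hint's offset rather than to the start of the function.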
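To make the size effect concrete: the Wasm binary format stores u32 values as unsigned LEB128, so any value below 128 fits in a single byte, and small deltas can save bytes even before a general-purpose compressor runs. The sketch below compares both layouts for a hypothetical list of sorted hint offsets. `encode_offsets` and its "first offset relative to 0" convention are assumptions for illustration, not part of the current format.

```python
def uleb128(n: int) -> bytes:
    """Unsigned LEB128, the variable-length encoding Wasm uses for u32 values."""
    if not 0 <= n <= 0xFFFFFFFF:
        raise ValueError("value does not fit in a u32")
    out = bytearray()
    while True:
        byte = n & 0x7F  # low 7 bits
        n >>= 7
        if n:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_offsets(offsets: list[int], differential: bool) -> bytes:
    """Hypothetical layout: one LEB128 u32 per hint, absolute or delta."""
    if any(b < a for a, b in zip(offsets, offsets[1:])):
        raise ValueError("hint offsets must be sorted")
    if differential:
        # First offset is relative to 0, later ones to the previous hint.
        values = [b - a for a, b in zip([0] + offsets, offsets)]
    else:
        values = offsets
    return b"".join(uleb128(v) for v in values)


# Hypothetical byte offsets of hinted instructions within one function.
offsets = [130, 141, 200, 5000, 5012]
print(len(encode_offsets(offsets, differential=False)))  # 10
print(len(encode_offsets(offsets, differential=True)))   # 7
```

Here every absolute offset is at least 128 and needs two bytes, while three of the five deltas fit in one byte. How much this helps on real modules would depend on how densely hints are placed.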

I haven't been very active in the Wasm community, but I remember that in the beginning there was an idea of having higher-level Wasm binary encodings, outside the core spec and implemented by third parties, that could be used to optimize file size. Did that ever actually happen? If not, optimizing items in the core spec for compressibility starts to look more worthwhile.
