Support for RleListMatrix (?) #62

@jonocarroll

Related to #27, though I note that the following now works:

library(DelayedArray)
DelayedArray(
  matrix(
    IntegerList(
      c(list(c(1L, 1L)), list(c(1L,1L)), list(c(1L,1L)), list(c(2L,2L)))
    ), 
    nrow = 2, ncol = 2)
)
#> <2 x 2> matrix of class DelayedMatrix and type "list":
#>      [,1] [,2]
#> [1,] 1, 1 1, 1
#> [2,] 1, 1 2, 2

Created on 2020-02-28 by the reprex package (v0.3.0)

Is there a motivation to support RleListMatrix? For the same use case as above, I'm using VariantAnnotation to build a CompressedVcf object, which holds matrices of lists. In many cases the list elements are NA, so it may be efficient to store them as an Rle-derived object. I haven't verified that such a structure would actually benefit from Rle: would the elements be sufficiently contiguous?
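One rough way to probe the contiguity question is to count runs of identical adjacent cells in column-major order, which is the order a run-length encoding of the flattened matrix would see. The sketch below uses base R only. The toy matrix `m` and the helper `count_runs()` are hypothetical stand-ins, not part of DelayedArray or VariantAnnotation, and real genotype data may behave quite differently.

```r
# Hypothetical toy data: a 4 x 3 list matrix that is mostly NA.
cells <- rep(list(NA_integer_), 12)
cells[[5]]  <- c(1L, 2L)
cells[[6]]  <- c(1L, 2L)
cells[[11]] <- 3L
m <- matrix(cells, nrow = 4, ncol = 3)

# Hypothetical helper: number of runs of identical adjacent elements,
# walking the list matrix by linear (column-major) index.
count_runs <- function(x) {
  n <- length(x)
  if (n == 0L) return(0L)
  same_as_prev <- vapply(
    seq_len(n)[-1L],
    function(i) identical(x[[i]], x[[i - 1L]]),
    logical(1)
  )
  1L + sum(!same_as_prev)
}

n_runs <- count_runs(m)
# Fraction of cells that would need their own run; lower means that
# run-length encoding has more to gain.
run_fraction <- n_runs / length(m)
```

A run fraction close to 1 would suggest that an Rle-style encoding has little to gain, while a value near 0 would suggest long stretches of repeated cells.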

My workaround at the moment is to collapse the list elements into single delimited strings, in which case DelayedArray or RleMatrix work out of the box. In this case the string concatenation shrinks the matrix object by a factor of ~8 (potentially due to global string pooling). Converting to RleMatrix shrinks it by a further factor of ~16, for a total compression of ~128x from matrix of lists to character RleMatrix. If RleListMatrix were able to provide a comparable benefit without converting to strings, that could be very useful.
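A minimal sketch of this kind of workaround, assuming a small toy list matrix rather than real VCF data. `collapse_cells()` is a hypothetical helper and the `","` delimiter is an arbitrary choice. The base R `rle()` call only shows how many runs the collapsed strings produce; the optional last step builds an `RleArray` when DelayedArray and S4Vectors are installed. The size reductions quoted above came from real data and are not reproduced here.

```r
# Hypothetical toy data: a 3 x 2 matrix whose cells are integer vectors.
cells <- list(NA_integer_, NA_integer_, c(1L, 1L),
              c(1L, 1L), c(2L, 2L), NA_integer_)
m <- matrix(cells, nrow = 3, ncol = 2)

# Hypothetical helper: paste each cell into one delimited string,
# keeping the matrix shape. Note that NA becomes the string "NA".
collapse_cells <- function(x, sep = ",") {
  out <- vapply(
    seq_along(x),
    function(i) paste(x[[i]], collapse = sep),
    character(1)
  )
  dim(out) <- dim(x)
  out
}

chr <- collapse_cells(m)

# Base R run-length encoding of the flattened (column-major) strings.
runs <- rle(as.vector(chr))

# Optional: build a run-length-encoded matrix if the Bioconductor
# packages are available.
if (requireNamespace("DelayedArray", quietly = TRUE) &&
    requireNamespace("S4Vectors", quietly = TRUE)) {
  rle_mat <- DelayedArray::RleArray(
    S4Vectors::Rle(as.vector(chr)),
    dim = dim(chr)
  )
}
```

Recovering the original vectors would need a matching `strsplit()` step plus type conversion, which is part of the overhead a native RleListMatrix could avoid.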

I'll link another issue to this one specific to VariantAnnotation, but I thought I'd check if this was a) possible; b) useful; and c) of interest.

Ping @lawremi who first proposed investigating support for this structure.
