longForm,SummarizedExperiment #85

@lgatto

Coming back to the longForm discussed here, MultiAssayExperiment (MAE) now defines the following methods:

> suppressPackageStartupMessages(library(MultiAssayExperiment))
Warning message:
replacing previous import ‘S4Arrays::read_block’ by ‘DelayedArray::read_block’ when loading ‘SummarizedExperiment’
> showMethods("longForm")
Function: longForm (package BiocGenerics)
object="ANY"
object="ExperimentList"
object="MultiAssayExperiment"

ANY implicitly defines additional methods ...

> getMethod("longForm", "ANY")
Method Definition:

function (object, ...)
{
    .local <- function (object, colDataCols, i = 1L, ...)
    {
        rowNAMES <- rownames(object)
        if (is.null(rowNAMES))
            rowNames <- as.character(seq_len(nrow(object)))
        if (is(object, "ExpressionSet"))
            object <- Biobase::exprs(object)
        if (is(object, "SummarizedExperiment") || is(object,
            "RaggedExperiment"))
            object <- assay(object, i = i)
        BiocBaseUtils::checkInstalled("reshape2")
        res <- reshape2::melt(object, varnames = c("rowname",
            "colname"), value.name = "value")
        if (!is.character(res[["rowname"]]))
            res[["rowname"]] <- as.character(res[["rowname"]])
        res
    }
    .local(object, ...)
}
<bytecode: 0x59175b2aefc8>
<environment: namespace:MultiAssayExperiment>

Signatures:
        object
target  "ANY"
defined "ANY"

... including for a SummarizedExperiment

> nrows <- 5; ncols <- 2
> counts <- matrix(runif(nrows * ncols, 1, 1e4), nrows)
> colData <- DataFrame(Treatment=c("ChIP", "Input"), row.names=LETTERS[1:2])
> se0 <- SummarizedExperiment(assays=SimpleList(counts=counts), colData=colData)
> longForm(se0)
   rowname colname    value
1        1       A 1888.095
2        2       A 3194.072
3        3       A 7372.889
4        4       A 2488.492
5        5       A 7293.829
6        1       B 5895.799
7        2       B 9025.518
8        3       B 1884.100
9        4       B 3057.519
10       5       B 5762.292
> showMethods("longForm")
Function: longForm (package BiocGenerics)
object="ANY"
object="ExperimentList"
object="MultiAssayExperiment"
object="SummarizedExperiment"
    (inherited from: object="ANY")

Suggestion 1

I suggest implementing longForm,SummarizedExperiment in the SummarizedExperiment package.

Suggestion 2

I also suggest allowing longForm to return all assays in a single long table, ideally by default.

Current behaviour:

> assay(se0, "counts2") <- assay(se0) * 10
> longForm(se0, i = 1)
   rowname colname    value
1        1       A 1888.095
2        2       A 3194.072
3        3       A 7372.889
4        4       A 2488.492
5        5       A 7293.829
6        1       B 5895.799
7        2       B 9025.518
8        3       B 1884.100
9        4       B 3057.519
10       5       B 5762.292
> longForm(se0, i = 2)
   rowname colname    value
1        1       A 18880.95
2        2       A 31940.72
3        3       A 73728.89
4        4       A 24884.92
5        5       A 72938.29
6        1       B 58957.99
7        2       B 90255.18
8        3       B 18841.00
9        4       B 30575.19
10       5       B 57622.92

I would find it useful for longForm(se0) to return all assays at once:

> longForm(se0)
DataFrame with 20 rows and 4 columns
      rowname  colname     value   assayName
    <integer> <factor> <numeric> <character>
1           1        A   1888.10      counts
2           2        A   3194.07      counts
3           3        A   7372.89      counts
4           4        A   2488.49      counts
5           5        A   7293.83      counts
...       ...      ...       ...         ...
16          1        B   58958.0     counts2
17          2        B   90255.2     counts2
18          3        B   18841.0     counts2
19          4        B   30575.2     counts2
20          5        B   57622.9     counts2
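
To make the request concrete, here is a minimal sketch of how all assays could be stacked into one long table. The helper name `longFormAllAssays()` is hypothetical and not part of any package; it only relies on `assay()`, `assays()`, `assayNames()` and `DataFrame()`. Unlike the output above, it returns `rowname` and `colname` as character vectors, and it falls back to positional names when the object or its assays are unnamed.

```r
library(SummarizedExperiment)

## Hypothetical helper (illustration only): stack every assay of a
## SummarizedExperiment into one long DataFrame with an assayName column.
longFormAllAssays <- function(object) {
    nms <- assayNames(object)
    if (is.null(nms))
        nms <- as.character(seq_along(assays(object)))
    rn <- rownames(object)
    if (is.null(rn))
        rn <- as.character(seq_len(nrow(object)))
    cn <- colnames(object)
    if (is.null(cn))
        cn <- as.character(seq_len(ncol(object)))
    res <- lapply(seq_along(nms), function(k) {
        m <- as.matrix(assay(object, k))
        ## as.vector() unrolls the matrix column by column, so row names
        ## cycle fastest and each column name is repeated nrow(m) times.
        DataFrame(rowname = rep(rn, times = ncol(m)),
                  colname = rep(cn, each = nrow(m)),
                  value = as.vector(m),
                  assayName = rep(nms[k], length(m)))
    })
    do.call(rbind, res)
}

## Usage on a small object with two assays
se <- SummarizedExperiment(
    assays = SimpleList(counts = matrix(1:10, nrow = 5)),
    colData = DataFrame(Treatment = c("ChIP", "Input"),
                        row.names = LETTERS[1:2]))
assay(se, "counts2") <- assay(se) * 10
longFormAllAssays(se)
```

Keeping an `i` argument alongside this would preserve the current single-assay behaviour for callers who want it.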

Suggestion 3

I also think these long tables should incorporate colData and rowData columns.

Here's an example for a colData variable:

> longFormSE(se0, colvars = "Treatment")
DataFrame with 20 rows and 5 columns
      rowname  colname     value   assayName   Treatment
    <integer> <factor> <numeric> <character> <character>
1           1        A   1888.10      counts        ChIP
2           2        A   3194.07      counts        ChIP
3           3        A   7372.89      counts        ChIP
4           4        A   2488.49      counts        ChIP
5           5        A   7293.83      counts        ChIP
...       ...      ...       ...         ...         ...
16          1        B   58958.0     counts2       Input
17          2        B   90255.2     counts2       Input
18          3        B   18841.0     counts2       Input
19          4        B   30575.2     counts2       Input
20          5        B   57622.9     counts2       Input

A rowData variable:

> rowData(se0)$X <- letters[1:5]
> longFormSE(se0, rowvars = "X")
DataFrame with 20 rows and 5 columns
      rowname  colname     value   assayName           X
    <integer> <factor> <numeric> <character> <character>
1           1        A 9418.8870      counts           a
2           2        A 6657.9743      counts           b
3           3        A 1240.3003      counts           c
4           4        A 1278.6833      counts           d
5           5        A   27.7678      counts           e
...       ...      ...       ...         ...         ...
16          1        B   10652.9     counts2           a
17          2        B   34444.3     counts2           b
18          3        B   48373.1     counts2           c
19          4        B   21214.1     counts2           d
20          5        B   85826.3     counts2           e

And both, of course:

> longFormSE(se0, colvars = "Treatment", rowvars = "X")
DataFrame with 20 rows and 6 columns
      rowname  colname     value   assayName   Treatment           X
    <integer> <factor> <numeric> <character> <character> <character>
1           1        A 9418.8870      counts        ChIP           a
2           2        A 6657.9743      counts        ChIP           b
3           3        A 1240.3003      counts        ChIP           c
4           4        A 1278.6833      counts        ChIP           d
5           5        A   27.7678      counts        ChIP           e
...       ...      ...       ...         ...         ...         ...
16          1        B   10652.9     counts2       Input           a
17          2        B   34444.3     counts2       Input           b
18          3        B   48373.1     counts2       Input           c
19          4        B   21214.1     counts2       Input           d
20          5        B   85826.3     counts2       Input           e
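
For discussion, the behaviour shown in these examples could be sketched as follows. This is not the `longFormSE()` used above, whose implementation is not shown here; `longFormSketch()` is a hypothetical name, and its `colvars` and `rowvars` arguments simply mirror the calls in the examples. It assumes the requested variables exist in `colData` and `rowData` and that their names do not clash with each other or with the four core columns.

```r
library(SummarizedExperiment)

## Hypothetical sketch (illustration only): long table over all assays,
## optionally annotated with selected colData and rowData columns.
longFormSketch <- function(object, colvars = NULL, rowvars = NULL) {
    stopifnot(all(colvars %in% names(colData(object))),
              all(rowvars %in% names(rowData(object))))
    nr <- nrow(object)
    nc <- ncol(object)
    rn <- rownames(object)
    if (is.null(rn)) rn <- as.character(seq_len(nr))
    cn <- colnames(object)
    if (is.null(cn)) cn <- as.character(seq_len(nc))
    nms <- assayNames(object)
    if (is.null(nms)) nms <- as.character(seq_along(assays(object)))
    ## Row and column index of each value once a matrix is unrolled
    ## column by column with as.vector().
    ri <- rep(seq_len(nr), times = nc)
    ci <- rep(seq_len(nc), each = nr)
    res <- lapply(seq_along(nms), function(k) {
        df <- DataFrame(rowname = rn[ri],
                        colname = cn[ci],
                        value = as.vector(as.matrix(assay(object, k))),
                        assayName = rep(nms[k], nr * nc))
        ## Look up annotations through the same indices as the values.
        for (v in colvars) df[[v]] <- colData(object)[[v]][ci]
        for (v in rowvars) df[[v]] <- rowData(object)[[v]][ri]
        df
    })
    do.call(rbind, res)
}

## Usage
se <- SummarizedExperiment(
    assays = SimpleList(counts = matrix(1:10, nrow = 5)),
    colData = DataFrame(Treatment = c("ChIP", "Input"),
                        row.names = LETTERS[1:2]))
rowData(se)$X <- letters[1:5]
longFormSketch(se, colvars = "Treatment", rowvars = "X")
```

Indexing the annotations with `ri` and `ci` avoids a merge step and keeps the rows in the same order as the melted values.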

I would be happy to provide an initial implementation and unit test.

What do you think, @hpages @LiNk-NY?
