Description
I recently observed a substantial difference in the time rowData() takes on SE and RangedSE instances:
> example(RangedSummarizedExperiment)
*** output flushed ***
> se <- as(rse, "SummarizedExperiment")
> microbenchmark::microbenchmark(rowData(se), rowData(rse))
Unit: microseconds
expr min lq mean median uq max neval
rowData(se) 129.264 134.4920 138.2764 136.8520 141.9285 154.541 100
rowData(rse) 598.731 621.2365 638.1464 633.3445 646.5985 1074.851 100
This is because rowData(), or more precisely mcols() on an RSE, first gets the rowRanges and then that object's mcols.
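To make the two access paths concrete, here is a minimal sketch that builds a small toy RSE (not the object created by example() above), converts it to a plain SE, and times the accessors side by side. The toy data, the object names (`rr`, `cd`, `counts`) and the `feature_id` column are assumptions made for illustration; timings will vary by machine and package version.

```r
# Sketch only: assumes SummarizedExperiment and microbenchmark are installed.
library(SummarizedExperiment)

set.seed(1)

# Toy row ranges with a single metadata column (hypothetical 'feature_id').
rr <- GRanges(
    seqnames = rep(c("chr1", "chr2"), c(50, 150)),
    ranges = IRanges(floor(runif(200, 1e5, 1e6)), width = 100),
    strand = sample(c("+", "-"), 200, replace = TRUE),
    feature_id = sprintf("ID%03d", 1:200)
)

# Toy assay and column annotations.
counts <- matrix(runif(200 * 6, 1, 1e4), nrow = 200)
cd <- DataFrame(Treatment = rep(c("ChIP", "Input"), 3),
                row.names = LETTERS[1:6])

# Ranged object, and the same data as a plain SummarizedExperiment.
rse <- SummarizedExperiment(assays = list(counts = counts),
                            rowRanges = rr, colData = cd)
se <- as(rse, "SummarizedExperiment")

# On the RSE the row annotations live on the row ranges, so the third
# expression spells out the extra step that rowData(rse) goes through.
microbenchmark::microbenchmark(
    rowData(se),
    rowData(rse),
    mcols(rowRanges(rse)),
    times = 100
)
```

If the explanation above holds, the last two expressions should show timings of the same order, both slower than rowData(se).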
I discovered this because I noticed significant differences when manipulating SE and SCE objects. One reason is that SCE inherits from RSE, and hence suffers from the above overhead.
> sce <- as(se, "SingleCellExperiment")
> microbenchmark::microbenchmark(mcols(se), mcols(rse), mcols(sce))
Unit: microseconds
expr min lq mean median uq max neval
mcols(se) 123.715 137.6745 163.4424 150.011 177.8125 412.293 100
mcols(rse) 589.028 635.6750 683.9597 667.770 716.7420 1086.609 100
mcols(sce) 2664.261 2835.4035 3029.9824 2935.945 3115.0290 4567.160 100
I understand that there are historical reasons why SCE derives from RSE, but would it be conceivable to have the main SCE derive from SE, and to have a RangedSCE, just as there is a RangedSE? I can imagine this would be a lot of work, with a risk of breaking things.
I am not really requesting any change here, just asking out of curiosity. Because of this overhead, we are considering dropping SCE for single-cell proteomics in favour of SEs, and only switching to SCE when needed (for example when the reducedDim slot is populated in the downstream analyses).