MassSpectrumOnDisk Functionality #60
base: master
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -7,3 +7,5 @@ | |
| ^hooks$ | ||
| ^playground$ | ||
| ^revdep$ | ||
| ^.*\.Rproj$ | ||
| ^\.Rproj\.user$ | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -2,3 +2,4 @@ | |
| *.so | ||
| *.o | ||
| *.rds | ||
| .Rproj.user | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -47,15 +47,24 @@ importFrom("utils", | |
| "relist", | ||
| "tail") | ||
|
|
||
| importFrom("matter", | ||
| "matter_vec", | ||
| "matter_fc") | ||
|
Owner
Is there any code that uses
Author
If I remove it, a warning is always thrown when building the package:
||
|
|
||
| importClassesFrom("matter", | ||
| "matter_vec", | ||
| "matter_fc") | ||
|
|
||
| exportClasses("MassPeaks", | ||
| "MassSpectrum") | ||
| "MassSpectrum", | ||
| "MassSpectrumOnDisk") | ||
|
|
||
| export("alignSpectra", | ||
| "averageMassSpectra", | ||
| "binPeaks", | ||
| "createMassPeaks", | ||
| "createMassSpectrum", | ||
| "createMassSpectrumOnDisk", | ||
| "determineWarpingFunctions", | ||
| "filterPeaks", | ||
| "findEmptyMassObjects", | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,15 +1,29 @@ | ||
| ## basic class for all spectra based information | ||
|
|
||
| ## Set a class union to extend slots | ||
| # type to matter Objects | ||
| setClassUnion("NumericOrOnDisk", c("numeric", "matter_vec")) | ||
|
|
||
|
|
||
| setClass("AbstractMassObject", | ||
| slots=list(mass="numeric", intensity="numeric", | ||
| slots=list(mass="NumericOrOnDisk", | ||
| intensity="NumericOrOnDisk", | ||
| metaData="list"), | ||
| prototype=list(mass=numeric(), intensity=numeric(), | ||
| prototype=list(mass=numeric(), | ||
| intensity=numeric(), | ||
| metaData=list()), | ||
| contains="VIRTUAL") | ||
|
|
||
| ## represent a spectrum | ||
| setClass("MassSpectrum", | ||
| contains="AbstractMassObject") | ||
|
|
||
| ## represent an On-disk spectrum | ||
| setClass("MassSpectrumOnDisk", | ||
| slots = list(path = "character"), | ||
|
Owner
I don't understand why this slot is needed. Isn't the file path part of the
Author
Agreed, but I added it for convenience, to avoid calling additional functions. If we opt to remove matter altogether, then it will have to stay.
||
| prototype = list(path = character()), | ||
| contains="AbstractMassObject") | ||
|
|
||
| ## represent a peak list from a single spectrum | ||
| setClass("MassPeaks", | ||
| slots=list(snr="numeric"), | ||
|
|
||
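The class union in this file is what lets the `mass` and `intensity` slots hold either an in-memory numeric vector or an on-disk vector. The following is a minimal sketch of that mechanism, not the PR's code: `ToyDiskVec` is a hypothetical stand-in for `matter_vec` (so the example runs without `matter`), and `ToyNumericOrOnDisk` and `ToySpectrum` are illustrative names.

```r
library(methods)

## Hypothetical stand-in for matter_vec: it only records where the data live.
setClass("ToyDiskVec", slots=list(path="character", n="integer"))

## A slot typed with the union accepts any member class of the union.
setClassUnion("ToyNumericOrOnDisk", c("numeric", "ToyDiskVec"))

setClass("ToySpectrum",
         slots=list(mass="ToyNumericOrOnDisk",
                    intensity="ToyNumericOrOnDisk"),
         prototype=list(mass=numeric(), intensity=numeric()))

## Both an in-memory and an "on-disk" spectrum are valid objects.
inMemory <- new("ToySpectrum", mass=c(1, 2, 3), intensity=c(10, 20, 30))
onDisk <- new("ToySpectrum",
              mass=new("ToyDiskVec", path="mass.bin", n=3L),
              intensity=new("ToyDiskVec", path="intensity.bin", n=3L))

## A class outside the union fails the slot check when the object is created.
rejected <- inherits(try(new("ToySpectrum", mass="a", intensity="b"),
                         silent=TRUE), "try-error")
```

One consequence of this design is that every class inheriting the slots, including the existing in-memory ones, may now receive either type, so code that touches the slots directly has to cope with both.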
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -130,6 +130,60 @@ if (is.null(getGeneric("totalIonCurrent"))) { | |
| } | ||
| ## end of MassSpectrum | ||
|
|
||
| ## MassSpectrumOnDisk | ||
|
Owner
All these generics are already defined. No need to add any of them again.
Author
I thought every class has its own generics. I see your point now; I'll remove them.
||
| if (is.null(getGeneric("approxfun"))) { | ||
| setGeneric("approxfun", | ||
| function(x, y=NULL, method="linear", yleft, yright, rule=1, f=0, | ||
| ties=mean) | ||
| standardGeneric("approxfun")) | ||
| } | ||
| if (is.null(getGeneric("calibrateIntensity"))) { | ||
| setGeneric("calibrateIntensity", | ||
| function(object, ...) standardGeneric("calibrateIntensity")) | ||
| } | ||
| if (is.null(getGeneric("detectPeaks"))) { | ||
| setGeneric("detectPeaks", | ||
| function(object, ...) standardGeneric("detectPeaks")) | ||
| } | ||
| if (is.null(getGeneric("estimateBaseline"))) { | ||
| setGeneric("estimateBaseline", | ||
| function(object, method=c("SNIP", "ConvexHull", "Median"), ...) | ||
| standardGeneric("estimateBaseline")) | ||
| } | ||
| if (is.null(getGeneric("estimateNoise"))) { | ||
| setGeneric("estimateNoise", | ||
| function(object, ...) standardGeneric("estimateNoise")) | ||
| } | ||
| if (is.null(getGeneric(".findLocalMaxima"))) { | ||
| setGeneric(".findLocalMaxima", | ||
| function(object, halfWindowSize=20L) | ||
| standardGeneric(".findLocalMaxima")) | ||
| } | ||
| if (is.null(getGeneric(".findLocalMaximaLogical"))) { | ||
| setGeneric(".findLocalMaximaLogical", | ||
| function(object, halfWindowSize=20L) | ||
| standardGeneric(".findLocalMaximaLogical")) | ||
| } | ||
| if (is.null(getGeneric("isRegular"))) { | ||
| setGeneric("isRegular", | ||
| function(object, ...) standardGeneric("isRegular")) | ||
| } | ||
| if (is.null(getGeneric("removeBaseline"))) { | ||
| setGeneric("removeBaseline", | ||
| function(object, ...) standardGeneric("removeBaseline")) | ||
| } | ||
| if (is.null(getGeneric("smoothIntensity"))) { | ||
| setGeneric("smoothIntensity", | ||
| function(object, ...) | ||
| standardGeneric("smoothIntensity")) | ||
| } | ||
| if (is.null(getGeneric("totalIonCurrent"))) { | ||
| setGeneric("totalIonCurrent", | ||
| function(object) standardGeneric("totalIonCurrent")) | ||
| } | ||
| ## end of MassSpectrumOnDisk | ||
|
|
||
|
|
||
| ## MassPeaks | ||
| if (is.null(getGeneric("labelPeaks"))) { | ||
| setGeneric("labelPeaks", function(object, ...) standardGeneric("labelPeaks")) | ||
|
|
||
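On the review point in this file that the generics are already defined: in S4 a generic is created once, and each class only registers a method on it. A minimal sketch with hypothetical names (`ToyBase`, `ToyInMemory`, `ToyOnDisk`, `toyTotal`), none of which exist in the package:

```r
library(methods)

setClass("ToyBase", slots=list(intensity="numeric"), contains="VIRTUAL")
setClass("ToyInMemory", contains="ToyBase")
setClass("ToyOnDisk", slots=list(path="character"), contains="ToyBase")

## Create the generic once, using the same guard as the package sources.
if (is.null(getGeneric("toyTotal"))) {
    setGeneric("toyTotal", function(object) standardGeneric("toyTotal"))
}

## Each class then registers its own method on the existing generic.
setMethod("toyTotal", "ToyInMemory", function(object) sum(object@intensity))
setMethod("toyTotal", "ToyOnDisk", function(object) sum(object@intensity))

totalInMemory <- toyTotal(new("ToyInMemory", intensity=c(1, 2, 3)))
totalOnDisk <- toyTotal(new("ToyOnDisk", intensity=c(4, 5),
                            path="spectrum.bin"))
```

Repeating the guarded `setGeneric()` block for a second class would be a no-op, because `getGeneric()` already finds the generic; only the `setMethod()` calls are needed.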
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -6,7 +6,20 @@ setMethod(f="approxfun", | |
| if (isEmpty(x)) { | ||
| function(x)rep.int(NA, length(x)) | ||
| } else { | ||
| approxfun(x=x@mass, y=x@intensity, method=method, | ||
| approxfun(x=mass(x), y=intensity(x), method=method, | ||
| yleft=yleft, yright=yright, rule=rule, f=f, ties=ties) | ||
| } | ||
| }) | ||
|
|
||
| ## MassSpectrumOnDisk | ||
| setMethod(f="approxfun", | ||
| signature=signature(x="MassSpectrumOnDisk"), | ||
|
Owner
After changing
I tend to the first solution (then the signature has to be changed into
Author
You mean that instead of defining a method for every
||
| definition=function(x, y=NULL, method="linear", yleft, yright, | ||
| rule=1L, f=0L, ties=mean) { | ||
| if (isEmpty(x)) { | ||
| function(x)rep.int(NA, length(x)) | ||
| } else { | ||
| approxfun(x=mass(x), y=intensity(x), method=method, | ||
| yleft=yleft, yright=yright, rule=rule, f=f, ties=ties) | ||
| } | ||
| }) | ||
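The review thread in this file is truncated, so the exact proposal is not recoverable here. One reading, stated as an assumption, is that once the method body uses accessors instead of direct slot access, it can be defined a single time on a shared parent class and inherited by both spectrum classes. A sketch of that idea with hypothetical names (`ToyAbstract`, `ToySpec`, `ToySpecOnDisk`, `toyMass`, `toyIntensity`, `toyApproxfun`):

```r
library(methods)

setClass("ToyAbstract", slots=list(mass="numeric", intensity="numeric"),
         contains="VIRTUAL")
setClass("ToySpec", contains="ToyAbstract")
setClass("ToySpecOnDisk", slots=list(path="character"),
         contains="ToyAbstract")

## Accessors hide how the values are stored.
setGeneric("toyMass", function(object) standardGeneric("toyMass"))
setGeneric("toyIntensity", function(object) standardGeneric("toyIntensity"))
setMethod("toyMass", "ToyAbstract", function(object) object@mass)
setMethod("toyIntensity", "ToyAbstract", function(object) object@intensity)

## One method on the virtual parent serves every subclass.
setGeneric("toyApproxfun", function(x, ...) standardGeneric("toyApproxfun"))
setMethod("toyApproxfun", "ToyAbstract", function(x, ...) {
    stats::approxfun(x=toyMass(x), y=toyIntensity(x), ...)
})

f <- toyApproxfun(new("ToySpecOnDisk", mass=c(1, 2, 3),
                      intensity=c(10, 20, 30), path="spectrum.bin"))
g <- toyApproxfun(new("ToySpec", mass=c(1, 2, 3), intensity=c(10, 20, 30)))
```

A subclass that stores its data differently would then only override the accessors, not every method that uses them.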
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -21,6 +21,30 @@ setMethod(f="calibrateIntensity", | |
| }) | ||
| }) | ||
|
|
||
| ## MassSpectrumOnDisk | ||
| setMethod(f="calibrateIntensity", | ||
| signature=signature(object="MassSpectrumOnDisk"), | ||
|
Owner
Same problem as for
Author
Noted.
||
| definition=function(object, | ||
| method=c("TIC", "PQN", "median"), | ||
| range, ...) { | ||
|
|
||
| method <- match.arg(method) | ||
|
|
||
| switch(method, | ||
| "TIC" = , | ||
| "median" = { | ||
| .transformIntensity(object, fun=.calibrateIntensitySimple, | ||
| offset=0L, | ||
| scaling=.scalingFactor(object, method=method, | ||
| range=range)) | ||
| }, | ||
| "PQN" = { | ||
| stop(dQuote("PQN"), | ||
| " is not supported for a single MassSpectrum object.") | ||
| }) | ||
| }) | ||
|
|
||
|
|
||
| ## list | ||
| setMethod(f="calibrateIntensity", | ||
| signature=signature(object="list"), | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -28,13 +28,51 @@ setMethod(f="detectPeaks", | |
| metaData=object@metaData) | ||
| }) | ||
|
|
||
| ## list | ||
|
|
||
|
|
||
|
|
||
| ## MassSpectrumOnDisk | ||
| setMethod(f="detectPeaks", | ||
| signature=signature(object="MassSpectrumOnDisk"), | ||
|
Owner
Same problem as for
Author
Noted.
||
| definition=function(object, halfWindowSize=20L, | ||
| method=c("MAD", "SuperSmoother"), SNR=2L, ...) { | ||
|
|
||
| tmpMass <- mass(object) | ||
| tmpIntensity <- intensity(object) | ||
|
|
||
| ## empty spectrum? | ||
| if (.isEmptyWarning(object)) { | ||
| return(createMassPeaks(mass=tmpMass, intensity=tmpIntensity, | ||
| metaData=object@metaData)) | ||
| } | ||
|
|
||
| ## estimate noise | ||
| noise <- .estimateNoise(x=tmpMass, y=tmpIntensity, method=method, ...) | ||
|
|
||
| ## find local maxima | ||
| isLocalMaxima <- .findLocalMaximaLogical(object, | ||
| halfWindowSize=halfWindowSize) | ||
|
|
||
| ## include only local maxima which are above the noise | ||
| isAboveNoise <- tmpIntensity > (SNR * noise) | ||
|
|
||
| peakIdx <- which(isAboveNoise & isLocalMaxima) | ||
|
|
||
| createMassPeaks(mass=tmpMass[peakIdx], | ||
| intensity=tmpIntensity[peakIdx], | ||
| snr=tmpIntensity[peakIdx] / noise[peakIdx], | ||
| metaData=object@metaData) | ||
| }) | ||
|
|
||
| ## list | ||
| setMethod(f="detectPeaks", | ||
| signature=signature(object="list"), | ||
| definition=function(object, ...) { | ||
|
|
||
| ## test arguments | ||
| .stopIfNotIsMassSpectrumList(object) | ||
|
|
||
| .mapply(detectPeaks, object, ...) | ||
| }) | ||
|
|
||
| ## test arguments | ||
| .stopIfNotIsMassSpectrumList(object) | ||
|
|
||
| .mapply(detectPeaks, object, ...) | ||
| }) | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -12,3 +12,18 @@ setMethod(f="estimateBaseline", | |
| intensity=.estimateBaseline(x=object@mass, y=object@intensity, | ||
| method=method, ...)) | ||
| }) | ||
|
|
||
| ## MassSpectrumOnDisk | ||
| setMethod(f="estimateBaseline", | ||
| signature=signature(object="MassSpectrumOnDisk"), | ||
|
Owner
Same problem as for
||
| definition=function(object, method=c("SNIP", "TopHat", "ConvexHull", | ||
| "median"), | ||
| ...) { | ||
| if (.isEmptyWarning(object)) { | ||
| return(NA) | ||
| } | ||
|
|
||
| cbind(mass=mass(object), | ||
| intensity=.estimateBaseline(x=mass(object), y=intensity(object), | ||
| method=method, ...)) | ||
| }) | ||
This is necessary to trick R to install dependencies from Bioconductor, right? (I am not sure that this works/will be accepted on CRAN)
Thanks for your reply/review Sebastian.
I just tested this; it does not work, although I found a lot of people suggesting this trick! I see your point on removing the dependency on matter. As you showed in your `onDiskVec` example, we just need to write and read one vector at a time. The `onDiskVec` should have `path`, `offset`, `size` and `length` slots to account for the `imzML` datatype.
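To make the proposed slots concrete, here is a minimal sketch of a vector that lives in a byte range of a binary file. It is not the `OnDiskVector` class from the branch mentioned in this thread; the class `ToyOnDiskVec` and the helpers `toyWrite()` and `toyRead()` are hypothetical, and the sketch assumes values are stored as doubles.

```r
library(methods)

## Hypothetical class using the four slots proposed above.
setClass("ToyOnDiskVec",
         slots=list(path="character",   # file holding the values
                    offset="numeric",   # byte offset of the first value
                    size="integer",     # bytes per value (4L or 8L)
                    length="integer"))  # number of values

## Append a vector to a file and return a handle describing its byte range.
toyWrite <- function(x, path, size=8L) {
    offset <- if (file.exists(path)) file.size(path) else 0
    con <- file(path, "ab")
    on.exit(close(con))
    writeBin(as.double(x), con, size=size)
    new("ToyOnDiskVec", path=path, offset=offset, size=size,
        length=length(x))
}

## Read one vector back by seeking to its offset.
toyRead <- function(v) {
    con <- file(v@path, "rb")
    on.exit(close(con))
    seek(con, where=v@offset, origin="start")
    readBin(con, what="double", n=v@length, size=v@size)
}

## Two vectors share one file, as mass and intensity arrays can in imzML data.
path <- tempfile(fileext=".bin")
massVec <- toyWrite(c(100.5, 200.25, 300.125), path)
intensityVec <- toyWrite(c(1, 2, 3), path)
```

Because each handle carries its own offset and length, several vectors can share one file, and only the requested vector is read into memory.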
OK, if that doesn't work, I don't really want to "Depend" on or "Import" a Bioconductor package if it is not really necessary. If it were possible to move it to "Suggests", that would be fine (but I am not sure that this is really possible).
I added a new branch: https://github.com/sgibb/MALDIquant/tree/OnDiskVector
If you like, you could use and play with the new `OnDiskVector` class. It should work for both non-imzML and imzML data (it supports single files and offsets):
The implementation can be found in https://github.com/sgibb/MALDIquant/blob/OnDiskVector/R/OnDiskVector-class.R
(Currently I have no time to really finish this. There is still the problem with `odv2 <- odv` and subsequent, differing changes to both vectors on the same file.)
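On the "Suggests" question raised in this comment: the usual pattern for a suggested package is to test for it at run time with `requireNamespace()` and fall back when it is missing. The sketch below shows only that run-time check; the helper names `toyHasPackage()` and `toyBackend()` are hypothetical, and the sketch does not settle whether a class imported in NAMESPACE (as in this PR) can come from a suggested package.

```r
## Hypothetical helper: TRUE if the package can be loaded, without attaching it.
toyHasPackage <- function(pkg) {
    requireNamespace(pkg, quietly=TRUE)
}

## Choose a storage backend depending on whether the optional package exists.
toyBackend <- function(pkg="matter") {
    if (toyHasPackage(pkg)) "on-disk" else "in-memory"
}

backendWithStats <- toyBackend("stats")                  # ships with R
backendMissing <- toyBackend("toyPackageThatIsNotInstalled")
```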
I use a similar approach to the one we discussed for `MSnbase` in lgatto/MSnbase#429. TL;DR: we store a modification counter in the object and in an additional file (in `MSnbase` we use the same file, but that won't work for existing idb files). The modification counter is incremented if `[<-` is called, e.g.:
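The example that originally followed this comment is not preserved here. As a stand-in, the following is a minimal sketch of the modification-counter idea under stated assumptions; it is not the code from the branch or from `MSnbase`. The class `ToyCountedVec` and its helpers are hypothetical, the counter lives in a sidecar file named `<path>.mod`, and the values are stored with `saveRDS()` purely to keep the sketch short.

```r
library(methods)

## Hypothetical handle: values live in `path`, the counter in `<path>.mod`.
setClass("ToyCountedVec", slots=list(path="character", modCount="integer"))

toyCounterFile <- function(x) paste0(x@path, ".mod")
toyDiskCount <- function(x) as.integer(readLines(toyCounterFile(x), n=1L))

toyCountedVec <- function(values, path=tempfile()) {
    saveRDS(values, path)
    x <- new("ToyCountedVec", path=path, modCount=0L)
    writeLines("0", toyCounterFile(x))
    x
}

## Refuse access through a handle whose counter differs from the one on disk.
toyCheck <- function(x) {
    if (x@modCount != toyDiskCount(x)) {
        stop("file was modified through another object")
    }
    invisible(TRUE)
}

setMethod("[", "ToyCountedVec", function(x, i, j, ..., drop=TRUE) {
    toyCheck(x)
    readRDS(x@path)[i]
})

## Every replacement increments the counter in the object and on disk.
setReplaceMethod("[", "ToyCountedVec", function(x, i, j, ..., value) {
    toyCheck(x)
    values <- readRDS(x@path)
    values[i] <- value
    saveRDS(values, x@path)
    x@modCount <- x@modCount + 1L
    writeLines(as.character(x@modCount), toyCounterFile(x))
    x
})

odv <- toyCountedVec(c(1, 2, 3))
odv2 <- odv    # second handle on the same file
odv[1] <- 10   # counter is now 1 in odv and on disk, still 0 in odv2
stale <- inherits(try(odv2[1], silent=TRUE), "try-error")
```

In this sketch the stale copy `odv2` fails loudly instead of silently reading or overwriting data changed through `odv`, which is the `odv2 <- odv` problem raised earlier in the thread.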
Thanks for your reply Sebastian & sorry for the delayed reply.
So the way to go now is to replicate what I did with the matter package, but using the `OnDiskVector` class on the newly opened branch, right?
If you are satisfied with the `OnDiskVector` class, that would be the way to go.