Interpreting the output #14

@jemunro

Hello,

Are you able to give some guidance on how to interpret the output?
For example:

INFO:splitStrain.py has started.
INFO:sample name: SAMEA1100847.ERR2509676.recal.bam
INFO:reference name: Chromosome, reference length: 4411532
INFO:regionStart: 100, regionEnd: 4000000
INFO:depth threshold percent: 75
INFO:entropy threshold: 0.0
INFO:using gff: tuberculosis.filtered-intervals.gff
INFO:Likelihood Ratio Statistic: -2*log(LR) = 12495, treshold: 1920
INFO:using the model:GMM
file    alpha   min_LR_thresh   LR_statistic    log-p-value     p-value proportions
SAMEA1100847.ERR2509676.recal.bam       0.05    1920    12495   -14.367 0.000   0.83 0.17

How should I interpret this? I note that the p-value is reported as 0.000. Does this mean that multiple strains are confidently detected?
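To make the question concrete, here is a minimal sketch of how I am currently reading the result row. It is not taken from splitStrain itself: `parse_result_row` is a hypothetical helper, I am assuming the columns are whitespace-separated and that `proportions` holds all remaining fields, and comparing `LR_statistic` against `min_LR_thresh` is only my guess at how a call would be made. I also do not know the base of `log-p-value`, so both the natural-log and base-10 readings are computed.

```python
import math

# Column names copied from the header line of the output above.
HEADER = "file alpha min_LR_thresh LR_statistic log-p-value p-value proportions".split()


def parse_result_row(row: str) -> dict:
    """Hypothetical helper: split one result row into named fields.

    Assumes whitespace-separated columns, with every field after
    'p-value' belonging to 'proportions'.
    """
    fields = row.split()
    n_fixed = len(HEADER) - 1  # all columns except 'proportions'
    record = dict(zip(HEADER[:n_fixed], fields[:n_fixed]))
    for key in HEADER[1:n_fixed]:  # every fixed column except 'file' is numeric
        record[key] = float(record[key])
    record["proportions"] = [float(x) for x in fields[n_fixed:]]
    return record


row = "SAMEA1100847.ERR2509676.recal.bam       0.05    1920    12495   -14.367 0.000   0.83 0.17"
record = parse_result_row(row)

# Assumption: the statistic is compared directly against the reported threshold.
exceeds_threshold = record["LR_statistic"] > record["min_LR_thresh"]

# The base of log-p-value is unknown to me, so try both readings.
p_if_natural_log = math.exp(record["log-p-value"])
p_if_log10 = 10 ** record["log-p-value"]

print("LR_statistic exceeds min_LR_thresh:", exceeds_threshold)
print("p-value if natural log:", p_if_natural_log)
print("p-value if log10:", p_if_log10)
print("proportions:", record["proportions"])
```

Under either reading of the log, the p-value is far below 0.001, so I take the printed 0.000 to be a rounded display rather than an exact zero. Please correct me if that is wrong.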

In the manuscript (doi:10.1099/mgen.0.000607), it is mentioned that the ROC curves are generated using the likelihood ratio. Is that equivalent to the LR_statistic above? Is there a recommended threshold for the LR_statistic to discriminate between pure and mixed infections?

Thank you.
