optimise VariantSpark for large sample size (n>50K) #204

@natwine

VariantSpark is currently optimised for reasonably small sample sizes (n = 100 to 5,000) and large numbers of variants (e.g. 42 million), i.e. 'wide' datasets. When working on phenotypes in UKBB, e.g. CAD, we have sample sizes of ~50K at our disposal, and VariantSpark has a long run time (~3 days) at that scale. As we expect genomic cohorts to keep growing, it is worth considering how we can optimise VariantSpark for larger sample sizes (50K and above).
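
To give a rough sense of the scale involved, here is a back-of-envelope sketch of how a dense samples-by-variants genotype matrix grows with n. It assumes one byte per genotype call and pairs the 42 million variant example with both sample sizes. Both are illustrative assumptions, not VariantSpark's actual encoding or the real UKBB variant count, and `genotype_matrix_bytes` is a hypothetical helper.

```python
def genotype_matrix_bytes(n_samples: int, n_variants: int, bytes_per_call: int = 1) -> int:
    """Size in bytes of a dense samples x variants genotype matrix.

    bytes_per_call=1 is an assumption for illustration only.
    """
    return n_samples * n_variants * bytes_per_call


N_VARIANTS = 42_000_000  # example variant count quoted in the description

for n in (5_000, 50_000):
    size_gb = genotype_matrix_bytes(n, N_VARIANTS) / 1e9
    print(f"n={n}: ~{size_gb:,.0f} GB uncompressed")
```

Under these assumptions the raw matrix goes from roughly 210 GB at n = 5,000 to roughly 2.1 TB at n = 50,000. That is only the data volume; it says nothing about how run time scales.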
