Optimise VariantSpark for large sample sizes (n > 50K) #204

VariantSpark is currently optimised for reasonably small sample sizes (n = 100-5,000) and large numbers of variants (e.g. 42 million), i.e. 'wide' datasets. When working on phenotypes in UK Biobank (UKBB), e.g. CAD, we have sample sizes of ~50K at our disposal, and VariantSpark has a long run time (~3 days) at that scale. As we expect genomic cohorts to keep growing, it is worth considering how we can optimise VariantSpark for larger sample sizes (50K and above).
