VariantSpark is currently optimised for reasonably small sample sizes (n = 100-5,000) and large numbers of variants (e.g. 42 million), i.e. 'wide' datasets. When working on phenotypes in UKBB, e.g. CAD, we have sample sizes of ~50K at our disposal, and VariantSpark has a long run time (~3 days) at that scale. As we expect genomic cohorts to keep growing, it is worth considering how we can optimise VariantSpark for larger sample sizes (50K and above).