
estimation of footprint to assemble a large plant genome #3

@dcopetti

Hello,
we would like to use Ra to assemble a ~5 Gb plant genome (the genome is actually 2.5 Gb in size, but it is highly heterozygous, so we want to separate the two alleles into distinct contigs). I have about 45x coverage (relative to 5 Gb) of ONT data (N50 15 kb, QV > 7), and we wonder whether there is a way to predict the size of the minimap2 output file and to optimize the alignment step, since it is the most expensive one.
For example, the output could be simplified by adding or tweaking the minimap2 options -X, -p and -N,
and sensitivity could be adjusted with -A, -B, -O, -E and -z.
Do you think there is room in that step to increase the specificity of the alignments and so decrease the footprint and computation time?
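
To make the scale of the question concrete, here is a rough back-of-envelope sketch in Python. Only the 5 Gb target size and the 45x coverage come from the numbers above. The mean read length, the number of overlaps reported per read, and the bytes per overlap record are hypothetical placeholders (only the N50 is known, not the mean), so the results are illustrative and not measurements.

```python
# Back-of-envelope estimate of input volume and overlap-file size.
# Values marked ASSUMPTION are hypothetical placeholders, not measurements.

GENOME_SIZE_BP = 5_000_000_000  # ~5 Gb target (from the post)
COVERAGE = 45                   # 45x of 5 Gb (from the post)

MEAN_READ_LEN_BP = 10_000       # ASSUMPTION: mean length is unknown, only N50 = 15 kb
OVERLAPS_PER_READ = 10          # ASSUMPTION: depends on coverage, repeats and filters
BYTES_PER_RECORD = 100          # ASSUMPTION: rough size of one overlap line on disk


def total_bases(genome_size_bp: int, coverage: int) -> int:
    """Total sequenced bases implied by a given coverage."""
    return genome_size_bp * coverage


def estimated_reads(bases: int, mean_read_len_bp: int) -> int:
    """Approximate read count from total bases and mean read length."""
    return bases // mean_read_len_bp


def estimated_overlap_bytes(n_reads: int, overlaps_per_read: int,
                            bytes_per_record: int) -> int:
    """Approximate overlap-file size as reads x overlaps x bytes per record."""
    return n_reads * overlaps_per_read * bytes_per_record


if __name__ == "__main__":
    bases = total_bases(GENOME_SIZE_BP, COVERAGE)
    reads = estimated_reads(bases, MEAN_READ_LEN_BP)
    size = estimated_overlap_bytes(reads, OVERLAPS_PER_READ, BYTES_PER_RECORD)
    print(f"total bases:   {bases / 1e9:.0f} Gb")
    print(f"approx. reads: {reads / 1e6:.1f} M")
    print(f"overlap file:  {size / 1e9:.1f} GB (under the assumptions above)")
```

The first figure (225 Gb of sequence) follows directly from the stated coverage. The other two scale linearly with the placeholder values, so they would need to be replaced with numbers measured on the actual data before drawing any conclusion.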

Lastly, can you estimate the memory usage for an assembly on a machine with up to 36 CPUs (72 threads with hyperthreading) and at most 500 GB of RAM? Will the polishing and graph construction steps be more memory-demanding than the alignment step?
Thanks,
Dario
