Hi,
I noticed you are using a combination of databases, including RNAcentral, Rfam, Ensembl and nt.
May I ask why you chose these databases?
Specifically, RNAcentral should be a superset of Rfam and Ensembl. Although nt is not part of RNAcentral, it should be very similar to the ENA database, which is also a subset of RNAcentral.
Also, what data deduplication pipeline is applied to remove the redundancy?