Should filtering all but one genome lead to an error? #300

@bernt-matthias

I received two similar dRep error reports from users. In both cases it looks like only a single genome passes filtering, which makes the distance matrix calculation fail.

Should such cases really produce an error, or should dRep just return the single genome that passed the filter?

I am also wondering what happens if all genomes are filtered out.
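
To make the question concrete, here is a minimal sketch of what "just return the single genome" could look like. The names `assign_primary_clusters` and `cluster_fn` are hypothetical and not part of dRep; raising a clear message for the zero-genome case is only one possible choice.

```python
from typing import Callable, Dict, List


def assign_primary_clusters(
    genomes: List[str],
    cluster_fn: Callable[[List[str]], Dict[str, int]],
) -> Dict[str, int]:
    """Hypothetical guard: skip clustering when fewer than two genomes remain."""
    if len(genomes) == 0:
        # Nothing passed filtering; fail with an explicit message
        # instead of a traceback from deep inside the clustering code.
        raise ValueError("No genomes passed filtering; nothing to dereplicate.")
    if len(genomes) == 1:
        # A single genome is trivially its own cluster.
        return {genomes[0]: 1}
    # Two or more genomes: delegate to the real clustering routine.
    return cluster_fn(genomes)
```

With a guard like this, the one-genome case would bypass the distance matrix entirely, and the all-filtered case would end with a readable message.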

***************************************************
    ..:: dRep dereplicate Step 1. Filter ::..
***************************************************

Will filter the genome list
6 genomes were input to dRep
Calculating genome info of genomes
100.00% of genomes passed length filtering
Running prodigal
Running checkM
16.67% of genomes passed checkM filtering



..:: dRep dereplicate Step 2. Cluster ::..





Running primary clustering
Running pair-wise MASH clustering
Traceback (most recent call last):
File "/usr/local/bin/dRep", line 32, in <module>
Controller().parseArguments(args)
File "/usr/local/lib/python3.11/site-packages/drep/controller.py", line 100, in parseArguments
self.dereplicate_operation(**vars(args))
File "/usr/local/lib/python3.11/site-packages/drep/controller.py", line 48, in dereplicate_operation
drep.d_workflows.dereplicate_wrapper(kwargs['work_directory'],**kwargs)
File "/usr/local/lib/python3.11/site-packages/drep/d_workflows.py", line 37, in dereplicate_wrapper
drep.d_cluster.controller.d_cluster_wrapper(wd, **kwargs)
File "/usr/local/lib/python3.11/site-packages/drep/d_cluster/controller.py", line 184, in d_cluster_wrapper
GenomeClusterController(workDirectory, **kwargs).main()
File "/usr/local/lib/python3.11/site-packages/drep/d_cluster/controller.py", line 32, in main
self.run_primary_clustering()
File "/usr/local/lib/python3.11/site-packages/drep/d_cluster/controller.py", line 100, in run_primary_clustering
Mdb, Cdb, cluster_ret = drep.d_cluster.compare_utils.all_vs_all_MASH(self.Bdb, self.wd.get_dir('MASH'), **self.kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/drep/d_cluster/compare_utils.py", line 115, in all_vs_all_MASH
Cdb, cluster_ret = cluster_mash_database(Mdb, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/drep/d_cluster/compare_utils.py", line 282, in cluster_mash_database
Cdb, linkage = drep.d_cluster.cluster_utils.cluster_hierarchical(linkage_db, linkage_method= P_Lmethod, 
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/drep/d_cluster/cluster_utils.py", line 153, in cluster_hierarchical
linkage = scipy.cluster.hierarchy.linkage(arr, method= linkage_method)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/scipy/cluster/hierarchy.py", line 1033, in linkage
n = int(distance.num_obs_y(y))
^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/scipy/spatial/distance.py", line 2742, in num_obs_y
raise ValueError("The number of observations cannot be determined on "
ValueError: The number of observations cannot be determined on an empty distance matrix.
***************************************************
    ..:: dRep dereplicate Step 1. Filter ::..
***************************************************
Will filter the genome list
76 genomes were input to dRep
Calculating genome info of genomes
1.32% of genomes passed length filtering


..:: dRep dereplicate Step 2. Cluster ::..



Running primary clustering
Running pair-wise MASH clustering
Traceback (most recent call last):
File "/usr/local/bin/dRep", line 32, in <module>
Controller().parseArguments(args)
File "/usr/local/lib/python3.11/site-packages/drep/controller.py", line 100, in parseArguments
self.dereplicate_operation(**vars(args))
File "/usr/local/lib/python3.11/site-packages/drep/controller.py", line 48, in dereplicate_operation
drep.d_workflows.dereplicate_wrapper(kwargs['work_directory'],**kwargs)
File "/usr/local/lib/python3.11/site-packages/drep/d_workflows.py", line 37, in dereplicate_wrapper
drep.d_cluster.controller.d_cluster_wrapper(wd, **kwargs)
File "/usr/local/lib/python3.11/site-packages/drep/d_cluster/controller.py", line 184, in d_cluster_wrapper
GenomeClusterController(workDirectory, **kwargs).main()
File "/usr/local/lib/python3.11/site-packages/drep/d_cluster/controller.py", line 32, in main
self.run_primary_clustering()
File "/usr/local/lib/python3.11/site-packages/drep/d_cluster/controller.py", line 100, in run_primary_clustering
Mdb, Cdb, cluster_ret = drep.d_cluster.compare_utils.all_vs_all_MASH(self.Bdb, self.wd.get_dir('MASH'), **self.kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/drep/d_cluster/compare_utils.py", line 115, in all_vs_all_MASH
Cdb, cluster_ret = cluster_mash_database(Mdb, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/drep/d_cluster/compare_utils.py", line 282, in cluster_mash_database
Cdb, linkage = drep.d_cluster.cluster_utils.cluster_hierarchical(linkage_db, linkage_method= P_Lmethod, 
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/drep/d_cluster/cluster_utils.py", line 153, in cluster_hierarchical
linkage = scipy.cluster.hierarchy.linkage(arr, method= linkage_method)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/scipy/cluster/hierarchy.py", line 1033, in linkage
n = int(distance.num_obs_y(y))
^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/scipy/spatial/distance.py", line 2742, in num_obs_y
raise ValueError("The number of observations cannot be determined on "
ValueError: The number of observations cannot be determined on an empty distance matrix.
cat: can't open 'outdir/data/checkM/checkM_outdir/checkm.log': No such file or directory
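
The last frames of both tracebacks look consistent with clustering a single genome: a 1x1 distance matrix has no pairwise entries, so its condensed form is empty. Below is a minimal sketch of that failure, assuming the matrix is condensed with `scipy.spatial.distance.squareform` before `linkage` is called (an assumption about dRep's internals; `try_linkage` is a hypothetical helper used only for illustration).

```python
import numpy as np
from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import squareform


def try_linkage(n_genomes):
    """Build an all-zero n x n distance matrix and attempt hierarchical clustering.

    Returns (number of pairwise distances, error message or None).
    """
    square = np.zeros((n_genomes, n_genomes))
    # The condensed form holds n * (n - 1) / 2 pairwise distances,
    # so it is empty when only one genome is left.
    condensed = squareform(square)
    try:
        linkage(condensed, method="average")
    except ValueError as err:
        return condensed.size, str(err)
    return condensed.size, None


for n in (1, 2):
    size, error = try_linkage(n)
    print(f"{n} genome(s): {size} pairwise distance(s), error: {error}")
```

With two genomes there is one pairwise distance and the clustering call goes through; with one genome the condensed array is empty and `linkage` raises a `ValueError`.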
