Cannot retrieve some cluster files #20

@L40S38

Hi.

I ran the evaluation commands for the Vertex dataset and the ProSPECCTS dataset.
Both failed with almost the same error, shown below.

(I exported STRUCTURE_DATA_DIR=$DEEPLYTOUGH/datasets_structure. I have also omitted the leading path to the repository from the output.)

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 5324k  100 5324k    0     0  1471k      0  0:00:03  0:00:03 --:--:-- 1472k
INFO:datasets.vertex:Preprocessing: downloading data and extracting pockets, this will take time.
INFO:root:cluster file path: DeeplyTough/datasets_structure/bc-30.out
WARNING:root:Cluster definition not found, will download a fresh one.
WARNING:root:However, this will very likely lead to silent incompatibilities with any old 'pdbcode_mappings.pickle' files! Please better remove those manually.
Traceback (most recent call last):
  File "DeeplyTough/deeplytough/scripts/vertex_benchmark.py", line 68, in <module>
    main()
  File "DeeplyTough/deeplytough/scripts/vertex_benchmark.py", line 32, in main
    database.preprocess_once()
  File "DeeplyTough/deeplytough/datasets/vertex.py", line 49, in preprocess_once
    clusterer = RcsbPdbClusters(identity=30)
  File "DeeplyTough/deeplytough/misc/utils.py", line 248, in __init__
    self._fetch_cluster_file()
  File "DeeplyTough/deeplytough/misc/utils.py", line 262, in _fetch_cluster_file
    self._download_cluster_sets(cluster_file_path)
  File "DeeplyTough/deeplytough/misc/utils.py", line 253, in _download_cluster_sets
    request.urlretrieve(f'https://cdn.rcsb.org/resources/sequence/clusters/bc-{self.identity}.out', cluster_file_path)
  File "anaconda3/envs/deeplytough/lib/python3.6/urllib/request.py", line 248, in urlretrieve
    with contextlib.closing(urlopen(url, data)) as fp:
  File "anaconda3/envs/deeplytough/lib/python3.6/urllib/request.py", line 223, in urlopen
    return opener.open(url, data, timeout)
  File "anaconda3/envs/deeplytough/lib/python3.6/urllib/request.py", line 532, in open
    response = meth(req, response)
  File "anaconda3/envs/deeplytough/lib/python3.6/urllib/request.py", line 642, in http_response
    'http', request, response, code, msg, hdrs)
  File "anaconda3/envs/deeplytough/lib/python3.6/urllib/request.py", line 570, in error
    return self._call_chain(*args)
  File "anaconda3/envs/deeplytough/lib/python3.6/urllib/request.py", line 504, in _call_chain
    result = func(*args)
  File "anaconda3/envs/deeplytough/lib/python3.6/urllib/request.py", line 650, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 404: Not Found

Evaluation on the TOUGH-M1 dataset succeeded, so I suspect that a URL needed for the Vertex and ProSPECCTS evaluations has expired. The 404 comes from the cluster file download (bc-30.out from cdn.rcsb.org).
Would you mind checking this?
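
In case it helps with triage, here is a minimal sketch of how the download step could fail more gracefully. It is not the DeeplyTough implementation: the function names `cluster_url` and `fetch_cluster_file` and the `retrieve` parameter are hypothetical, and only the URL pattern is taken from the traceback above. The sketch reuses a cluster file that already exists locally, and otherwise turns the HTTP error into a readable log message instead of an unhandled exception.

```python
import logging
import os
from urllib import error, request


def cluster_url(identity):
    # URL pattern copied from the traceback; it currently returns 404 for me.
    return f'https://cdn.rcsb.org/resources/sequence/clusters/bc-{identity}.out'


def fetch_cluster_file(identity, cluster_file_path, retrieve=request.urlretrieve):
    """Return True if the cluster file exists locally after this call.

    `retrieve` is injectable (hypothetical parameter) so the function can be
    exercised without network access.
    """
    if os.path.exists(cluster_file_path):
        # A previously downloaded or manually placed file is reused as-is.
        return True
    url = cluster_url(identity)
    try:
        retrieve(url, cluster_file_path)
    except error.HTTPError as exc:
        logging.error(
            'Could not download %s (HTTP %s). Place the cluster file at %s manually.',
            url, exc.code, cluster_file_path)
        return False
    return True
```

With something like this, a cluster file copied by hand into $STRUCTURE_DATA_DIR would be picked up without any download attempt, which might work as a stopgap until the URL is updated.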
