User home directories are not a good place to store data.
The cache location is configurable in the Hugging Face interface with something like this:
import datasets
datasets.load_dataset("sjyhne/mapai_training_data", cache_dir="/mnt/experiment-3/huggingface")
However, there seems to be a bug in mapai_training_data.py: it cannot properly handle a non-default directory.
I would suggest adding an optional cache_dir argument to the create_dataset function, propagated
into load_dataset, and also fixing the bug in the Hugging Face dataset.
It can be hacked around, but in the long term, especially after the competition, it would be nice to have a conformant dataset
that can easily be used in the future.
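A minimal sketch of the suggested change, assuming create_dataset currently takes no other required arguments (its real signature is not shown here, so treat the parameters as hypothetical). The `loader` parameter is an addition for illustration only; it lets the propagation be checked without downloading anything.

```python
from typing import Any, Callable, Optional

DATASET_NAME = "sjyhne/mapai_training_data"


def create_dataset(
    cache_dir: Optional[str] = None,
    loader: Optional[Callable[..., Any]] = None,  # hypothetical hook, for testing only
) -> Any:
    """Load the dataset, optionally into a non-default cache directory."""
    if loader is None:
        # Imported lazily so the function can be exercised without the library.
        import datasets

        loader = datasets.load_dataset

    kwargs = {}
    if cache_dir is not None:
        # Only pass cache_dir when the caller asked for one, so the
        # default behaviour stays exactly as it is today.
        kwargs["cache_dir"] = cache_dir
    return loader(DATASET_NAME, **kwargs)
```

With this in place, create_dataset(cache_dir="/mnt/experiment-3/huggingface") would forward the directory to load_dataset, while create_dataset() keeps the current default.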
Reproduction:
Possible cause:
- Very likely, in the _split_generators function the current working directory is not the one that is assumed,
so the os.makedirs calls refer to the wrong location.
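If that is indeed the cause, one way around it is to build every output path from an explicit base directory instead of relying on the working directory. The sketch below only illustrates the idea; make_split_dir and its arguments are hypothetical names, not code from mapai_training_data.py.

```python
import os


def make_split_dir(base_dir: str, split_name: str) -> str:
    """Create <base_dir>/<split_name> independently of the current working dir."""
    # Resolve the base once, so a later os.chdir() cannot change the target.
    path = os.path.join(os.path.abspath(base_dir), split_name)
    os.makedirs(path, exist_ok=True)
    return path
```

In the loader script, base_dir would presumably be the directory the download manager extracted the data into, but that is an assumption about the script's structure.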