Is your proposal related to a problem?
Dataset interfaces that take a single JSON file as input may have to implement different validation logic.
For example, the current integrity validators, which simply read each JSON file and check that it is not corrupted, do not work for a dataset whose samples are JSON objects inside a larger JSON array.
The samples are no longer individual files but objects within one huge, lazily loaded JSON file.
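As a rough illustration of the mismatch (the function names, the `ijson` streaming library, and the per-sample check are assumptions for the sketch, not the project's actual code), the per-file integrity check would have to become a per-object check over a streamed array:

```python
import json
import ijson  # third-party streaming JSON parser; an assumption, not a known project dependency

def file_is_valid(path: str) -> bool:
    """Current style: read one standalone JSON file and check that it parses."""
    try:
        with open(path, "r", encoding="utf-8") as f:
            json.load(f)
        return True
    except (OSError, json.JSONDecodeError):
        return False

def valid_array_samples(path: str):
    """Array style: stream objects out of one large top-level JSON array,
    keeping only well-formed samples, without loading the whole file."""
    with open(path, "rb") as f:
        for sample in ijson.items(f, "item"):  # "item" = each element of the array
            if isinstance(sample, dict):  # stand-in for a real integrity check
                yield sample
```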
Describe the solution you'd like
Define a proper way of integrating validation logic for every kind of dataset we support.
Currently, validators act like filters: they can exclude specific samples from training. This behavior should be specified further and solidified.
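A minimal sketch of what that unified, filter-style contract could look like, assuming samples are plain Python objects; `Validator` and `filter_samples` are hypothetical names, not existing interfaces:

```python
from typing import Any, Callable, Iterable, Iterator

# Hypothetical unified interface: a validator is a predicate over samples,
# so the same filtering logic applies whether samples come from individual
# JSON files or from objects streamed out of one large JSON array.
Validator = Callable[[Any], bool]

def filter_samples(samples: Iterable[Any], validators: list[Validator]) -> Iterator[Any]:
    """Yield only the samples that pass every validator; failing samples
    are excluded from training rather than raising an error."""
    for sample in samples:
        if all(validator(sample) for validator in validators):
            yield sample
```

Because the filter is expressed over samples rather than files, the same validator list could wrap a directory of standalone JSON files or a stream of objects from a single lazily loaded array.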