The example dataset used as a baseline for PROTzilla features unfortunately uses a non-standard format for its evidence.txt file: the modified sequence strings contain only abbreviated identifiers (e.g. ac instead of Acetylation), which standard parsers do not understand. The translation is straightforward, however, and could be enabled via a checkbox in the step's form. When checked, a preprocessing pass would replace all abbreviations with their canonical identifiers, after which the standard parsing should be able to handle the result. See the pandas documentation on replacing strings in a Series for how to achieve this.
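The preprocessing pass described above could look roughly like the following. This is a minimal sketch, not PROTzilla's actual implementation: the function name, the mapping dictionary, the `ox` entry, the column name `Modified sequence`, and the assumption that abbreviations appear in parentheses inside the sequence string are all illustrative and would need to be checked against the real dataset.

```python
import re

import pandas as pd

# Hypothetical mapping from abbreviated to canonical modification names.
# Only "ac" -> "Acetylation" comes from the text above; "ox" is an assumed example.
ABBREVIATION_TO_CANONICAL = {
    "ac": "Acetylation",
    "ox": "Oxidation",
}


def expand_modification_abbreviations(
    sequences: pd.Series,
    mapping: dict[str, str] = ABBREVIATION_TO_CANONICAL,
) -> pd.Series:
    """Replace parenthesised abbreviations such as "(ac)" with canonical names."""
    # Match only known abbreviations, and only when they are wrapped in
    # parentheses, so that residue letters in the sequence are never touched.
    alternatives = "|".join(re.escape(abbreviation) for abbreviation in mapping)
    pattern = re.compile(r"\((" + alternatives + r")\)")
    return sequences.str.replace(
        pattern,
        lambda match: f"({mapping[match.group(1)]})",
        regex=True,
    )


# Small made-up stand-in for the relevant column of evidence.txt.
evidence = pd.DataFrame(
    {"Modified sequence": ["_(ac)AAPEPTIDEK_", "_PEPTM(ox)IDEK_", "_PEPTIDEK_"]}
)
evidence["Modified sequence"] = expand_modification_abbreviations(
    evidence["Modified sequence"]
)
```

Restricting the pattern to parenthesised, known abbreviations keeps the replacement from altering amino-acid letters or modifications that are already in canonical form. In the step itself, the call would run only when the checkbox is ticked, before the standard parser is invoked.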