Hi,
Thanks for developing this tool, and for a well-written paper that covers even the finest details.
I am brand new to the field of NER and relation extraction. Please bear with my naive questions.
Some context on what I have tried for mining gene sequence variants:
I have tried the Pubtator3 API for gene sequence variant annotations. However, I found several inconsistent variant annotations for the infon 'DNAMutation'. In particular, the annotations often mistake the following for gene sequence variants: catalogue numbers for chemicals or equipment, and sometimes a variant mentioned in the title of a reference. This seems to occur mainly in papers that list few variants in the full text (fewer than 5) and in papers focusing on species other than humans.
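To make the pattern above concrete, here is a minimal sketch of how such mentions could be grouped by the section they appear in. It assumes the annotations arrive as BioC-style JSON already loaded into a Python dict. The field names `section_type` and `subtype`, the section labels `REF` and `METHODS`, and the example document are assumptions for illustration only, not confirmed Pubtator3 output.

```python
from collections import Counter

# Hypothetical section labels where a "variant" is more likely a catalogue
# number or a cited title than a finding of the paper itself.
SUSPECT_SECTIONS = {"REF", "METHODS"}


def variant_mentions_by_section(document):
    """Count 'DNAMutation' annotations per section of one BioC-style document."""
    counts = Counter()
    for passage in document.get("passages", []):
        section = passage.get("infons", {}).get("section_type", "UNKNOWN")
        for annotation in passage.get("annotations", []):
            infons = annotation.get("infons", {})
            # Assumption: the label may be stored under "type" or "subtype".
            if "DNAMutation" in (infons.get("type"), infons.get("subtype")):
                counts[section] += 1
    return counts


def suspect_fraction(counts):
    """Fraction of variant mentions that fall in the suspect sections."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    suspect = sum(n for section, n in counts.items() if section in SUSPECT_SECTIONS)
    return suspect / total


# Made-up document: one mention each in Methods, Results and References.
example_document = {
    "passages": [
        {"infons": {"section_type": "TITLE"}, "annotations": []},
        {
            "infons": {"section_type": "METHODS"},
            "annotations": [{"text": "A1234G", "infons": {"type": "DNAMutation"}}],
        },
        {
            "infons": {"section_type": "RESULTS"},
            "annotations": [{"text": "c.35G>A", "infons": {"type": "DNAMutation"}}],
        },
        {
            "infons": {"section_type": "REF"},
            "annotations": [{"text": "c.76A>T", "infons": {"type": "DNAMutation"}}],
        },
    ]
}

counts = variant_mentions_by_section(example_document)
print(dict(counts))
print(round(suspect_fraction(counts), 2))
```

A high fraction from the suspect sections in a paper with only a handful of variants is the kind of case I am describing; it does not prove the annotations are wrong, it only flags them for manual review.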
Questions specific to BioNExt:
I find the performance of the 'Tagger' relative to Pubtator3, as reported in the BioNExt paper, intriguing. I have two questions about BioNExt:
a) Does the Tagger algorithm account for the cases I described in the previous paragraph and correct for such false positives?
b) In the paper, under the Linker methodology for genes: "Additionally, if no organism is found, we consider the organism to be human (code 9606)."
This was also a problem with Pubtator3, even when the organism name was clearly mentioned in the sentences surrounding the gene mention in the paper in question. Is it possible to improve on this in BioNExt?
Hoping to use this tool in my work. Thanks.