String models exploit biases in MoleculeNet SMILES dialect to inflate performance #2

Following up on a conversation with Meng Liu, I'm linking this bug here. I have confirmed it for ClinTox, but other datasets may be affected as well:
https://github.com/deepchem/moleculenet/issues/15
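One quick way to look for this kind of bias in another dataset is to check, per label, how often the raw SMILES strings already match RDKit's canonical form. A large gap between classes would suggest that the surface form of the string carries label information a string model could pick up. This is a minimal sketch, assuming RDKit is installed and that you already have parallel lists of SMILES strings and labels; `canonical_fraction_by_label` is a hypothetical helper, not part of DeepChem or MoleculeNet.

```python
from collections import defaultdict

from rdkit import Chem


def canonical_fraction_by_label(smiles_list, labels):
    """Hypothetical diagnostic: fraction of input SMILES that are already in
    RDKit canonical form, grouped by label."""
    counts = defaultdict(lambda: [0, 0])  # label -> [n_canonical, n_parsed]
    for smi, label in zip(smiles_list, labels):
        mol = Chem.MolFromSmiles(smi)
        if mol is None:
            continue  # skip strings RDKit cannot parse
        is_canonical = Chem.MolToSmiles(mol) == smi
        counts[label][0] += int(is_canonical)
        counts[label][1] += 1
    return {label: n_can / n_tot for label, (n_can, n_tot) in counts.items()}
```

For example, if one class is written as `CCO`/`c1ccccc1` (canonical) and the other as `OCC`/`C1=CC=CC=C1` (non-canonical), the helper returns `1.0` for the first label and `0.0` for the second, even though the molecules are identical.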

One possible set of fixes:
- Refactor the input-parsing code so it is shared across all models
- Add SMILES canonicalization to input parsing: `from rdkit import Chem; Chem.MolToSmiles(Chem.MolFromSmiles(smiles), canonical=True)`
- Re-run the string-based models on all benchmarks
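The first two items could be combined into a single entry point that every string model calls, so all of them see the same canonical dialect regardless of how the dataset wrote the molecule. This is a sketch under the assumption that RDKit is available; `canonicalize_smiles` and `parse_inputs` are hypothetical names, not existing DeepChem functions. It raises on unparsable input rather than passing `None` into `Chem.MolToSmiles`, which the one-liner above would do silently.

```python
from rdkit import Chem


def canonicalize_smiles(smiles: str) -> str:
    """Return RDKit canonical SMILES, raising if the input cannot be parsed."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"RDKit could not parse SMILES: {smiles!r}")
    # canonical=True is RDKit's default; spelled out to match the issue text.
    return Chem.MolToSmiles(mol, canonical=True)


def parse_inputs(smiles_list):
    """Hypothetical shared parser: every string-based model would call this
    so that all models are trained and evaluated on the same dialect."""
    return [canonicalize_smiles(s) for s in smiles_list]
```

With this in place, `parse_inputs(["OCC", "CCO"])` yields `["CCO", "CCO"]`, so two spellings of ethanol can no longer be told apart by a string model.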