
Handling more SMILES #27

@chem-william


Is there any interest in covering SMILES strings a bit more extensively? I don't think all of them strictly follow the OpenSMILES specification, but RDKit seems to explicitly want to handle them.
Out of the following list of SMILES,

        CCC
        C1CCC2(CC1)CO2
        c1ccc[se]1
        c1ccc[te]1
        C0CC0
        C/C(/C=C2\\Sc1ccc(cc1N\\2C))=C5\\SC4=NccN4C\\5=O
        [HH]
        [2HH]
        [HH2-]
        [2HH2-]
        b1ccccc1
        C[Rf]C
        [C:1]
        [C:0]
        [si]1cccc[si]1
        [asH]1cccc1
        [Db][Sg][Bh][Hs][Mt][Ds][Rg][Cn][Nh][Fl][Mc][Lv][Ts][Og]
        [Uun][Uuu][Uub][Uut][Uuq][Uup][Uuh][Uus][Uuo]
        ['Db']['Sg']['Bh']['Hs']['Mt']['Ds']['Rg']['Cn']['Nh']['Fl']['Mc']['Lv']['Ts']['Og']
        C[Fe@TH](O)(Cl)F
        C[Fe@TH1](O)(Cl)F
        C[Fe@SP](O)(Cl)F
        C[Fe@SP1](O)(Cl)F
        C[Fe@TB](O)(Cl)(Br)F
        C[Fe@TB10](O)(Cl)(Br)F
        C[Fe@OH](O)(Cl)(Br)(N)F
        C[Fe@OH20](O)(Cl)(Br)(N)F
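The `[Uun]` to `[Uuo]` line uses IUPAC temporary systematic symbols for elements 110 to 118 (the elements written as Ds through Og in the line before it). These symbols spell the atomic number digit by digit (n=0, u=1, b=2, t=3, q=4, p=5, h=6, s=7, o=8, e=9), so a parser could decode them without a lookup table. A minimal Rust sketch, assuming a hypothetical helper `systematic_atomic_number` that is not part of Purr:

```rust
/// Decode an IUPAC temporary systematic symbol such as "Uun" into its atomic
/// number. Hypothetical helper for illustration; not part of Purr.
pub fn systematic_atomic_number(symbol: &str) -> Option<u32> {
    let bytes = symbol.as_bytes();
    // Three letters: one uppercase followed by two lowercase.
    if bytes.len() != 3
        || !bytes[0].is_ascii_uppercase()
        || !bytes[1..].iter().all(u8::is_ascii_lowercase)
    {
        return None;
    }
    let mut number = 0;
    for &b in bytes {
        // Each letter is the initial of a numerical root:
        // nil=0, un=1, bi=2, tri=3, quad=4, pent=5, hex=6, sept=7, oct=8, enn=9.
        let digit = match b.to_ascii_lowercase() {
            b'n' => 0, b'u' => 1, b'b' => 2, b't' => 3, b'q' => 4,
            b'p' => 5, b'h' => 6, b's' => 7, b'o' => 8, b'e' => 9,
            _ => return None,
        };
        number = number * 10 + digit;
    }
    // A leading "N" (nil) would not give a three-digit atomic number.
    (number >= 100).then_some(number)
}
```

Whether Purr should accept these at all, or normalize them to the current symbols, is a separate design choice; the sketch only shows that decoding is mechanical.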

these are not understood by Purr:

[src/main.rs:64:13] s = "c1ccc[te]1"
[src/main.rs:64:13] s = "[si]1cccc[si]1"
[src/main.rs:64:13] s = "[Uun][Uuu][Uub][Uut][Uuq][Uup][Uuh][Uus][Uuo]"
[src/main.rs:64:13] s = "['Db']['Sg']['Bh']['Hs']['Mt']['Ds']['Rg']['Cn']['Nh']['Fl']['Mc']['Lv']['Ts']['Og']"
[src/main.rs:64:13] s = "C[Fe@TH](O)(Cl)F"
[src/main.rs:64:13] s = "C[Fe@SP](O)(Cl)F"
[src/main.rs:64:13] s = "C[Fe@TB](O)(Cl)(Br)F"
[src/main.rs:64:13] s = "C[Fe@OH](O)(Cl)(Br)(N)F"
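In the output above, the stereo tags that fail are the ones without a numeric suffix; the numbered forms (`@TH1`, `@SP1`, `@TB10`, `@OH20`) do not appear. OpenSMILES only defines numbered forms (`@TH1`/`@TH2`, `@AL1`/`@AL2`, `@SP1` to `@SP3`, `@TB1` to `@TB20`, `@OH1` to `@OH30`), so accepting a bare `@TH` means picking a default index. The sketch below assumes a default of 1; that default, and the names `StereoClass` and `parse_stereo_tag`, are hypothetical and not taken from RDKit or Purr.

```rust
/// Chirality classes with explicit indices in OpenSMILES.
#[derive(Debug, PartialEq)]
pub enum StereoClass {
    Th, // tetrahedral
    Al, // allenal
    Sp, // square planar
    Tb, // trigonal bipyramidal
    Oh, // octahedral
}

impl StereoClass {
    /// Highest permitted index, e.g. @TB1 to @TB20.
    fn max_index(&self) -> u8 {
        match self {
            StereoClass::Th | StereoClass::Al => 2,
            StereoClass::Sp => 3,
            StereoClass::Tb => 20,
            StereoClass::Oh => 30,
        }
    }
}

/// Parse the text after '@' in a bracket atom, such as "TH1", "OH20" or a bare "SP".
/// Returns the class and index, or None if the tag is not valid.
pub fn parse_stereo_tag(tag: &str) -> Option<(StereoClass, u8)> {
    let class = match tag.get(..2)? {
        "TH" => StereoClass::Th,
        "AL" => StereoClass::Al,
        "SP" => StereoClass::Sp,
        "TB" => StereoClass::Tb,
        "OH" => StereoClass::Oh,
        _ => return None,
    };
    let digits = &tag[2..];
    let index = if digits.is_empty() {
        1 // ASSUMPTION: a bare tag such as "@TH" is read as "@TH1"
    } else if digits.len() <= 2
        && !digits.starts_with('0')
        && digits.bytes().all(|b| b.is_ascii_digit())
    {
        digits.parse::<u8>().ok()?
    } else {
        return None; // e.g. "TH01", "OH100" or "TB1x"
    };
    // Reject indices outside the range defined for this class.
    (1..=class.max_index()).contains(&index).then_some((class, index))
}
```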

I'm aware that Balsa is supposed to supersede Purr, but Balsa handles even fewer of these cases than Purr does.

I'd be happy to take a shot at implementing these cases.
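For `[te]` and `[si]`, OpenSMILES only lists `b`, `c`, `n`, `o`, `p`, `s`, `se` and `as` as aromatic symbols inside brackets, which is consistent with `[se]` and `[asH]` parsing while `[te]` and `[si]` do not. A minimal sketch of gating the extra symbols behind an opt-in flag so that strict parsing stays the default; `aromatic_bracket_element` and its `extended` parameter are hypothetical names, not Purr's API:

```rust
/// Map an aromatic (lowercase) symbol inside brackets to its element symbol.
/// The unguarded arms are the OpenSMILES set; `extended` additionally admits
/// symbols outside the spec, such as "te" and "si".
/// Hypothetical helper and flag, for illustration only.
pub fn aromatic_bracket_element(symbol: &str, extended: bool) -> Option<&'static str> {
    let element = match symbol {
        "b" => "B",
        "c" => "C",
        "n" => "N",
        "o" => "O",
        "p" => "P",
        "s" => "S",
        "se" => "Se",
        "as" => "As",
        // Non-standard aromatic symbols, only accepted when opted in.
        "te" if extended => "Te",
        "si" if extended => "Si",
        _ => return None,
    };
    Some(element)
}
```

Keeping the extensions behind a flag lets callers choose between strict OpenSMILES behaviour and RDKit-style leniency, rather than silently widening what the parser accepts.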
