Fix CP02712: replace corrupted methane SMILES with correct ANP structure #2

Open
anagnorisis2peripeteia wants to merge 1 commit into dfwlab:main from anagnorisis2peripeteia:fix/cp02712-anp-smiles

Conversation

@anagnorisis2peripeteia

Fixes #1

What was wrong

CP02712 (Atrial Natriuretic Peptide, 28-residue human form) had its SMILES field populated with 28 disconnected methane molecules:

C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C

This appears to have been caused by a pipeline bug where one C fragment was emitted per residue when the SMILES lookup failed, rather than producing the actual structure. All downstream computed properties (InChI, InChIKey, formula, MW, fingerprints, etc.) were therefore computed from methane instead of ANP.
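
The failure mode described above leaves a recognizable signature: every dot-separated fragment of the SMILES is a bare `C`. A minimal sketch of a check that could flag other affected entries follows. The function name and the `min_fragments` threshold are hypothetical and are not part of the existing pipeline.

```python
def looks_like_placeholder_smiles(smiles: str, min_fragments: int = 2) -> bool:
    """Return True if the SMILES consists only of disconnected single-carbon fragments.

    Hypothetical helper: a real peptide SMILES never splits into nothing but
    bare "C" fragments, so this pattern suggests a failed lookup upstream.
    """
    fragments = smiles.strip().split(".")
    return len(fragments) >= min_fragments and all(f == "C" for f in fragments)


# The corrupt CP02712 value: 28 methane fragments, one per residue.
corrupt = ".".join(["C"] * 28)

print(looks_like_placeholder_smiles(corrupt))    # True
print(looks_like_placeholder_smiles("CC(=O)O"))  # False (acetic acid, one fragment)
```

A single `C` is not flagged, since methane on its own is a valid one-fragment SMILES.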

What this PR changes

All fields in Peptide_structrure_info.xlsx for CP02712 have been recomputed from the correct structure:

| Field | Old (corrupt) | New (correct) |
| --- | --- | --- |
| SMILES | C.C.C.C... (×28) | PubChem CID 16129708 isomeric SMILES |
| Formula | | C₁₂₇H₂₀₃N₄₅O₃₉S₃ |
| Exact_Mass | 448.88 | 3078.44 |
| InChIKey | YTQVSWRCVXPAKI-... | NSQLIUXCMFBZME-MPVJKSABSA-N |
| Heavy_Atom_Count | 28 | 214 |
| Number_of_Rings | 0 | 4 |
| All fingerprints | (methane) | recomputed with RDKit |

The 3D structure evidence (PDB|1ANP), complex structure references, and other non-SMILES fields were left unchanged as they appear correct.
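
The Heavy_Atom_Count and Exact_Mass values can be cross-checked from the molecular formula alone, without RDKit. The sketch below is an independent sanity check, not the code used to regenerate the spreadsheet. The helper names are hypothetical, the monoisotopic masses are standard values rounded to seven decimals, and `C28H112` is simply 28 × CH4 written as one formula.

```python
import re

# Monoisotopic masses (u) of the most abundant isotopes; only the elements
# needed for this entry are listed.
MONOISOTOPIC = {
    "C": 12.0,
    "H": 1.00782503,
    "N": 14.0030740,
    "O": 15.9949146,
    "S": 31.9720707,
}


def parse_formula(formula: str) -> dict:
    """Parse a flat formula such as 'C127H203N45O39S3' into element counts."""
    counts = {}
    for element, n in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] = counts.get(element, 0) + (int(n) if n else 1)
    return counts


def heavy_atom_count(counts: dict) -> int:
    """Count every atom that is not hydrogen."""
    return sum(n for element, n in counts.items() if element != "H")


def exact_mass(counts: dict) -> float:
    """Sum monoisotopic masses over all atoms."""
    return sum(MONOISOTOPIC[element] * n for element, n in counts.items())


anp = parse_formula("C127H203N45O39S3")
methane_x28 = parse_formula("C28H112")  # 28 disconnected CH4 molecules

print(heavy_atom_count(anp), round(exact_mass(anp), 2))                  # 214 3078.44
print(heavy_atom_count(methane_x28), round(exact_mass(methane_x28), 2))  # 28 448.88
```

Both pairs match the old and new columns of the table, which supports the reading that the old values were computed from 28 methane molecules.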

Source

Notes

The Peptide_basic_info.xlsx and Peptide_sequence_info.xlsx entries for CP02712 appear to be correct (the sequence SLRRSSCFGGRMDRIGAQSGLGCNSFRY is intact); only the structure info file was affected.
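
As a further consistency check, the formula can be derived from the sequence itself. This is a sketch under stated assumptions: free N- and C-termini, no modifications, and a single intramolecular disulfide between the only two cysteines (positions 7 and 23). The residue table covers only the amino acids that occur in this sequence, and the function names are hypothetical.

```python
from collections import Counter

# Elemental composition of each residue (free amino acid minus one H2O).
# Only the residues present in the CP02712 sequence are included.
RESIDUES = {
    "G": {"C": 2, "H": 3, "N": 1, "O": 1},
    "A": {"C": 3, "H": 5, "N": 1, "O": 1},
    "S": {"C": 3, "H": 5, "N": 1, "O": 2},
    "L": {"C": 6, "H": 11, "N": 1, "O": 1},
    "I": {"C": 6, "H": 11, "N": 1, "O": 1},
    "R": {"C": 6, "H": 12, "N": 4, "O": 1},
    "C": {"C": 3, "H": 5, "N": 1, "O": 1, "S": 1},
    "F": {"C": 9, "H": 9, "N": 1, "O": 1},
    "M": {"C": 5, "H": 9, "N": 1, "O": 1, "S": 1},
    "D": {"C": 4, "H": 5, "N": 1, "O": 3},
    "Q": {"C": 5, "H": 8, "N": 2, "O": 2},
    "N": {"C": 4, "H": 6, "N": 2, "O": 2},
    "Y": {"C": 9, "H": 9, "N": 1, "O": 2},
}


def peptide_formula(sequence: str, disulfide_bonds: int = 0) -> dict:
    """Element counts for a linear peptide, minus 2 H per disulfide bond."""
    total = Counter()
    for aa in sequence:
        total.update(RESIDUES[aa])
    total.update({"H": 2, "O": 1})        # terminal H and OH (one water)
    total["H"] -= 2 * disulfide_bonds     # each S-S bond removes two thiol H
    return dict(total)


def hill_string(counts: dict) -> str:
    """Format counts in Hill order: C, H, then the rest alphabetically."""
    order = [e for e in ("C", "H") if e in counts]
    order += sorted(e for e in counts if e not in ("C", "H"))
    return "".join(f"{e}{counts[e] if counts[e] != 1 else ''}" for e in order)


sequence = "SLRRSSCFGGRMDRIGAQSGLGCNSFRY"
cys_positions = [i + 1 for i, aa in enumerate(sequence) if aa == "C"]

print(len(sequence), cys_positions)                                # 28 [7, 23]
print(hill_string(peptide_formula(sequence, disulfide_bonds=1)))   # C127H203N45O39S3
```

The result agrees with the formula recorded in the structure info file, so the sequence file and the corrected structure entry are consistent with each other.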

CP02712 (Atrial Natriuretic Peptide) had its SMILES field corrupted with
28 disconnected methane fragments (C.C.C...C) instead of the actual
28-residue disulfide-cyclic peptide structure.

Corrected fields (all recomputed from the correct structure):
- SMILES: PubChem CID 16129708 isomeric SMILES
- NS_InChI / InChIKey
- Formula: C127H203N45O39S3 (was CH4 x28)
- Exact_Mass, TPSA, LogP, Heavy_Atom_Count, HBD/HBA, rings, etc.
- RDKit / Daylight-like / Morgan / MACCS fingerprints

Fixes dfwlab#1

Development

Successfully merging this pull request may close these issues.

CP02712: SMILES field contains corrupted data (28x methane) instead of Atrial Natriuretic Peptide
