tiny generator

do you love generating random molecules, but are you sick and tired of the dependencies, black boxes and bloated weights of contemporary methods? then this is the script for you!

using a privileged set of ChEMBL-mined reduced graphs and fragments, this script generates random molecules using nothing but string operations. the output is (highly non-canonical) RDKit-parsable SMILES strings.

example usage

python tiny_generator.py 1000 0.5

this generates 1000 molecules with a weight adjustment of 0.5 and prints them out. nothing else is printed, so you can also redirect the output to a .smi file:

python tiny_generator.py 1000 0.5 > random_molecules.smi

the weight adjustment parameter should be around 1. lower values give more weight to uncommon fragments and thus more diverse molecules; higher values give more weight to common fragments and thus less diverse molecules.

you can also use it from your own script:
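this README does not say how the weight adjustment is applied internally. one scheme that matches the description is to raise each fragment's frequency to the power of the adjustment before sampling. the sketch below illustrates that assumed scheme only: the fragment counts are made up, and `adjusted_probabilities` and `sample_fragments` are hypothetical helpers, not functions from tiny_generator.py.

```python
import random

def adjusted_probabilities(counts, adjustment):
    """Turn raw fragment counts into sampling probabilities.

    ASSUMPTION: the adjustment acts as an exponent on the raw count.
    This is one scheme consistent with the README, not the script's code.
    """
    weights = {frag: n ** adjustment for frag, n in counts.items()}
    total = sum(weights.values())
    return {frag: w / total for frag, w in weights.items()}

def sample_fragments(counts, adjustment, k, seed=0):
    """Draw k fragments according to the adjusted probabilities."""
    rng = random.Random(seed)
    probs = adjusted_probabilities(counts, adjustment)
    frags = list(probs)
    return rng.choices(frags, weights=[probs[f] for f in frags], k=k)

# made-up counts, for illustration only
counts = {"c1ccccc1": 1000, "C(=O)O": 100, "C1CC1": 10}

for adjustment in (0.5, 1.0, 2.0):
    probs = adjusted_probabilities(counts, adjustment)
    print(adjustment, {frag: round(p, 3) for frag, p in probs.items()})
```

under this assumption an adjustment below 1 flattens the distribution, so the rare fragment is drawn more often, and an adjustment above 1 sharpens it.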

import tiny_generator
gm=tiny_generator.gen_mols(10,0.5)

example output

the output is non-canonical SMILES, such as:

C%10C.C%10CO%11.C%111=C%12C(=O)c2ccccc2C1=O.Cl%12
C%10(=O)OCCCC.C%101CCCN1%11.c%111ccc%12c%13c1.C%12(=O)OCC.c%131cn%14cn1.C%14

these look funny but are completely grammatical and parsable. the equivalent canonical versions of these molecules are:

CCCCOC1=C(Cl)C(=O)c2ccccc2C1=O
CCCCOC(=O)C1CCCN1c1ccc(C(=O)OCC)c(-c2cn(C)cn2)c1
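the strings above parse because a SMILES ring-closure label such as %10 bonds the two atoms that carry it, even when they sit on opposite sides of a `.` separator. here is a minimal sketch of joining fragments this way with plain string operations. the `{0}`, `{1}` slot format and the `join_fragments` helper are assumptions made for illustration and are not taken from tiny_generator.py.

```python
def join_fragments(templates, bonds, first_label=10):
    """Join fragment templates into one dot-separated SMILES string.

    templates: SMILES fragments with {0}, {1}, ... slots on attachment atoms
    bonds: pairs of (fragment_index, slot_index) that should be bonded
    """
    # one label table per fragment: slot index -> ring-closure label
    slots = [dict() for _ in templates]
    for k, ((frag_a, slot_a), (frag_b, slot_b)) in enumerate(bonds):
        number = first_label + k
        if number > 99:
            raise ValueError("two-digit ring-closure labels stop at %99")
        label = f"%{number}"
        # the same label on both atoms creates the bond between them
        slots[frag_a][slot_a] = label
        slots[frag_b][slot_b] = label

    parts = []
    for template, filled in zip(templates, slots):
        labels = [filled[i] for i in range(len(filled))]
        parts.append(template.format(*labels))
    return ".".join(parts)

# fragments of the first example molecule, written as hypothetical templates
templates = ["C{0}C", "C{0}CO{1}", "C{0}1=C{1}C(=O)c2ccccc2C1=O", "Cl{0}"]
bonds = [((0, 0), (1, 0)), ((1, 1), (2, 0)), ((2, 1), (3, 0))]

print(join_fragments(templates, bonds))
```

starting the labels at %10 keeps them clear of the single-digit ring closures already used inside the fragments.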

for quick rendering of SMILES with an RDKit backend, I recommend pasting them into the RDKit.js web demo.

some stats

time to generate 1 million molecules (weight adjustment 0.5): 76.1 sec on 1 CPU core (Intel® Core™ i5-6600 CPU @ 3.30GHz × 4). ~99.9% of the strings were unique.
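a string-level uniqueness figure like the one above can be checked on any output file with a few lines. this is a sketch: `unique_fraction` is a hypothetical helper, not part of tiny_generator.py.

```python
def unique_fraction(smiles_lines):
    """Return the fraction of non-empty lines that are distinct strings."""
    strings = [line.strip() for line in smiles_lines if line.strip()]
    if not strings:
        return 0.0
    return len(set(strings)) / len(strings)

# small in-memory example; pass an open .smi file object for real output
example = ["C%10C.C%10O", "C%10C.C%10O", "C%10C.C%10N"]
print(unique_fraction(example))
```

because the output is non-canonical, two different strings can still encode the same molecule, so this number is an upper bound on the fraction of unique molecules. canonicalize the strings first if you need molecule-level uniqueness.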

a word of warning

currently, the merging strategy is purely random. this means some of the molecules will contain unstable functional groups such as haloamines and aminals.

dehaenw/tinygenerator

Folders and files

Latest commit

History

Repository files navigation

tiny generator

example usage

example output

some stats

a word of warning

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages