MOLRAPTOR is a pre-stable modular pipeline for fetching, curating, and encoding molecular datasets using PubChem data and RDKit's Morgan fingerprinting algorithm, designed for cheminformatics workflows and phase 1 machine learning applications in computational drug discovery.

```text
MOLRAPTOR/
├── .github/workflows/
│   ├── ci.yml
│   ├── docs.yml
│   └── publish-to-pypi.yml
├── docs/
│   ├── stylesheets/
│   │   └── extra.css
│   ├── api.md
│   ├── cli.md
│   ├── configuration.md
│   ├── index.md
│   ├── installation.md
│   ├── quickstart.md
│   └── release.md
├── examples/
│   └── example_config.yaml
├── molraptor/
│   ├── __init__.py
│   ├── cli.py
│   ├── config.py
│   ├── curate.py
│   ├── fetch.py
│   ├── fingerprint.py
│   ├── fp_integrity.py
│   ├── pipeline.py
│   ├── pubchem.py
│   ├── result_manager.py
│   ├── validators.py
│   └── version.py
├── tests/
│   ├── __init__.py
│   ├── conftest.py
│   ├── test_public_api.py
│   └── test_version.py
├── .gitignore
├── CHANGELOG.md
├── CITATION.cff
├── COPYING
├── COPYING.LESSER
├── environment.yml
├── LICENSE
├── mkdocs.yml
├── pyproject.toml
└── README.md
```

| Item | Value |
|---|---|
| Project | MOLRAPTOR |
| PyPI distribution | `molraptor` |
| Import package | `molraptor` |
| CLI | `molraptor` |
| Version | 0.1.1 |
| License | LGPL-3.0-or-later |
| Status | alpha / pre-stable |
The live documentation is published at:
https://nanobiostructuresrg.github.io/molraptor/
Key pages:
After PyPI publication:

```bash
python -m pip install molraptor
```

For local development:

```bash
git clone https://github.com/NanoBiostructuresRG/molraptor.git
cd molraptor
python -m pip install -e .
```

For development and documentation tools:

```bash
python -m pip install -e ".[dev]"
python -m pip install -e ".[docs]"
```

Run the pipeline with the bundled example configuration:

```bash
molraptor run --config examples/example_config.yaml
```

Run from Python:
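
```python
from molraptor import MolraptorConfig, run

# Load settings from the bundled example YAML file
config = MolraptorConfig.load("examples/example_config.yaml")
# Run the pipeline with these settings
run(config)
```

| MOLRAPTOR does | MOLRAPTOR does not |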
|---|---|
| Fetch molecular properties from PubChem. | Train machine learning models. |
| Curate and validate chemical datasets. | Perform dimensionality reduction. |
| Generate Morgan fingerprints via RDKit. | Support non-PubChem data sources (yet). |
| Output ML-ready `.npy` and `.csv` artifacts. | Handle 3D molecular structures. |
| Log failed CIDs for reproducibility. | Support alternative fingerprint types (yet). |
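
The table notes that MOLRAPTOR logs failed CIDs for reproducibility, but the README does not say how. The following is a minimal sketch of one general pattern, not MOLRAPTOR's implementation: a hypothetical per-CID callable `fetch_one` (an assumption, not the actual `molraptor.fetch` API) is applied to every CID, each failure is recorded with its reason instead of aborting the run, and the failures are rendered as tab-separated lines.

```python
from __future__ import annotations

from typing import Callable, Iterable


def fetch_all(
    cids: Iterable[int], fetch_one: Callable[[int], dict]
) -> tuple[dict[int, dict], list[tuple[int, str]]]:
    """Apply the hypothetical fetch_one to each CID, collecting failures instead of stopping."""
    results: dict[int, dict] = {}
    failed: list[tuple[int, str]] = []
    for cid in cids:
        try:
            results[cid] = fetch_one(cid)
        except Exception as exc:  # any error marks this CID as failed; the loop continues
            failed.append((cid, f"{type(exc).__name__}: {exc}"))
    return results, failed


def format_failed_log(failed: list[tuple[int, str]]) -> str:
    """Render failures as 'CID<TAB>reason' lines for a plain-text log file."""
    return "".join(f"{cid}\t{reason}\n" for cid, reason in failed)
```

The failure text could then be written to a hypothetical `logs/failed_cids.tsv` with `Path(...).write_text(format_failed_log(failed))`. Keeping failures separate from results means a rerun can target exactly the CIDs that failed.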

```bash
molraptor --help
molraptor run --help
molraptor --version
```

Common commands:

```bash
molraptor run
molraptor run --config examples/example_config.yaml
molraptor run --config examples/example_config.yaml --verbose
```

Public API imports:

```python
from molraptor import MolraptorConfig
from molraptor import validate_config
from molraptor import run
from molraptor import DataValidator
from molraptor import __version__
```

Modules not listed above are importable directly, but they are not part of the public contract and may change before 1.0.
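
Because anything outside the public names may change before 1.0, downstream code may want to detect whether it is running against a 0.x release. A minimal stdlib sketch, assuming the PyPI distribution name `molraptor` listed earlier: `importlib.metadata.version` reads the installed version, and a major version of 0 is treated as pre-stable. The helper names are hypothetical; with MOLRAPTOR installed, `from molraptor import __version__` gives the package's own version string.

```python
from __future__ import annotations

import re
from importlib.metadata import PackageNotFoundError, version


def parse_release(ver: str) -> tuple[int, ...]:
    """Extract the leading numeric release segment, e.g. '0.2.0rc1' -> (0, 2, 0)."""
    match = re.match(r"\d+(?:\.\d+)*", ver)
    if match is None:
        raise ValueError(f"unrecognised version string: {ver!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def is_pre_stable(ver: str) -> bool:
    """Treat any 0.x release as pre-stable, matching the 'may change before 1.0' policy."""
    return parse_release(ver)[0] == 0


def installed_molraptor_version() -> str | None:
    """Return the installed molraptor version, or None if the distribution is absent."""
    try:
        return version("molraptor")
    except PackageNotFoundError:
        return None
```

A dependent project could check `is_pre_stable(...)` on the installed version at startup and warn, or simply pin an exact release such as `molraptor==0.1.1` in its own requirements.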

```text
data/
└── dataset.csv   <- CSV with PubChem CIDs and labels
```

Minimum required columns: `PubChem CID`, `Label`.
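
Before a run, it can help to confirm that `dataset.csv` has the two required columns. A minimal stdlib sketch, assuming a comma-separated file with a header row, positive integer CIDs, and non-empty labels; the function name `check_dataset` is hypothetical, and the exported `DataValidator` may apply different or stricter rules.

```python
from __future__ import annotations

import csv
import io

REQUIRED_COLUMNS = ("PubChem CID", "Label")


def _is_positive_int(text: str) -> bool:
    # ASCII digits only, so values such as '-5' or '²' are rejected before int()
    return text.isascii() and text.isdigit() and int(text) > 0


def check_dataset(text: str) -> list[str]:
    """Return human-readable problems found in CSV text; an empty list means it passed."""
    reader = csv.DictReader(io.StringIO(text))
    header = reader.fieldnames or []
    missing = [col for col in REQUIRED_COLUMNS if col not in header]
    if missing:
        return [f"missing column(s): {', '.join(missing)}"]

    problems: list[str] = []
    # start=2 because line 1 is the header; assumes one record per line
    # (no blank lines or multi-line quoted fields)
    for line_no, row in enumerate(reader, start=2):
        cid = (row["PubChem CID"] or "").strip()
        if not _is_positive_int(cid):
            problems.append(f"line {line_no}: invalid CID {cid!r}")
        if not (row["Label"] or "").strip():
            problems.append(f"line {line_no}: empty label")
    return problems
```

For a file on disk: `check_dataset(Path("data/dataset.csv").read_text(encoding="utf-8"))`, with `Path` imported from `pathlib`.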

```text
artifacts/
├── morgan_fp.csv      # Morgan fingerprints (human-readable)
├── morgan_db_*.npy    # Morgan fingerprints (NumPy array, shape: N × size)
├── labels.npy         # Target labels (NumPy array, shape: (N,))
└── summary.txt        # Execution report
```

Here N is the number of molecules in the output and `size` is the fingerprint length in bits.
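
The fingerprint array has a fixed number of columns because Morgan bit-vector fingerprints are folded to a fixed length. Conceptually, RDKit hashes each circular atom environment to an integer identifier and sets the bit at that identifier modulo the vector length. The stdlib sketch below illustrates only that folding step, with made-up identifiers standing in for RDKit's; it is not RDKit's implementation and does not compute real atom environments. Since both arrays have N rows, row i of the fingerprint matrix presumably corresponds to entry i of `labels.npy`.

```python
from __future__ import annotations


def fold_to_bits(feature_ids: list[int], size: int = 2048) -> list[int]:
    """Fold integer feature identifiers into a fixed-length 0/1 vector of `size` bits."""
    bits = [0] * size
    for fid in feature_ids:
        # Different identifiers can collide on the same bit; the bit simply stays 1.
        bits[fid % size] = 1
    return bits


def build_matrix(per_molecule_ids: list[list[int]], size: int = 2048) -> list[list[int]]:
    """Stack one folded vector per molecule, giving an N x size matrix."""
    return [fold_to_bits(ids, size) for ids in per_molecule_ids]
```

With `size=8`, identifiers 3 and 11 both land on bit 3 while 6 lands on bit 6, so `fold_to_bits([3, 11, 6], size=8)` gives `[0, 0, 0, 1, 0, 0, 1, 0]`. Such collisions are the price of a fixed-width array that NumPy and ML libraries can consume directly. The default of 2048 in the sketch is a common choice, not necessarily MOLRAPTOR's setting.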
Local inputs and generated artifacts such as `data/`, `artifacts/`, and `logs/` are intentionally ignored by Git.
The current dev/v0.1.1 branch targets:

```bash
python -m pytest tests/ -v
mkdocs build --strict
python -m build --no-isolation
python -m twine check dist/*
molraptor --help
molraptor run --help
molraptor --version
```

If you use MOLRAPTOR in your research, please cite it using the metadata in `CITATION.cff`.
Developed by Flavio F. Contreras-Torres, Tecnologico de Monterrey.
This project is licensed under the terms of the GNU Lesser General Public License v3.0 or later.
SPDX identifier: LGPL-3.0-or-later