

MOLRAPTOR: Molecular Learning via Rapid Processing of Topological Representations


MOLRAPTOR is a pre-stable modular pipeline for fetching, curating, and encoding molecular datasets using PubChem data and RDKit's Morgan fingerprinting algorithm, designed for cheminformatics workflows and phase 1 machine learning applications in computational drug discovery.

Project Structure

MOLRAPTOR/
├── .github/workflows/
│   ├── ci.yml
│   ├── docs.yml
│   └── publish-to-pypi.yml
├── docs/
│   ├── stylesheets/
│   │   └── extra.css
│   ├── api.md
│   ├── cli.md
│   ├── configuration.md
│   ├── index.md
│   ├── installation.md
│   ├── quickstart.md
│   └── release.md
├── examples/
│   └── example_config.yaml
├── molraptor/
│   ├── __init__.py
│   ├── cli.py
│   ├── config.py
│   ├── curate.py
│   ├── fetch.py
│   ├── fingerprint.py
│   ├── fp_integrity.py
│   ├── pipeline.py
│   ├── pubchem.py
│   ├── result_manager.py
│   ├── validators.py
│   └── version.py
├── tests/
│   ├── __init__.py
│   ├── conftest.py
│   ├── test_public_api.py
│   └── test_version.py
├── .gitignore
├── CHANGELOG.md
├── CITATION.cff
├── COPYING
├── COPYING.LESSER
├── environment.yml
├── LICENSE
├── mkdocs.yml
├── pyproject.toml
└── README.md

Project Identity

Project: MOLRAPTOR
PyPI distribution: molraptor
Import package: molraptor
CLI: molraptor
Version: 0.1.1
License: LGPL-3.0-or-later
Status: alpha / pre-stable

Documentation

The live documentation is published at:

https://nanobiostructuresrg.github.io/molraptor/


Installation

Once the package is published on PyPI:

python -m pip install molraptor

For local development:

git clone https://github.com/NanoBiostructuresRG/molraptor.git
cd molraptor
python -m pip install -e .

To install the optional development and documentation extras:

python -m pip install -e ".[dev]"
python -m pip install -e ".[docs]"
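After either install route, one way to confirm which version of the `molraptor` distribution is active in the current environment is the standard-library `importlib.metadata` module. The helper name `installed_version` below is hypothetical and not part of the package; it only reads installed metadata and does not import `molraptor`. The value it reports is expected to match `molraptor --version`.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str = "molraptor") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent.

    Hypothetical helper: it queries package metadata only and never
    imports the package itself.
    """
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version()
    if found is None:
        print("molraptor is not installed in this environment")
    else:
        print(f"molraptor {found}")
```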

Quick Start

Run the pipeline with the bundled example configuration:

molraptor run --config examples/example_config.yaml

Run from Python:

from molraptor import MolraptorConfig, run

config = MolraptorConfig.load("examples/example_config.yaml")
run(config)

Scope

MOLRAPTOR does                            MOLRAPTOR does not
----------------------------------------  --------------------------------------------
Fetch molecular properties from PubChem.  Train machine learning models.
Curate and validate chemical datasets.    Perform dimensionality reduction.
Generate Morgan fingerprints via RDKit.   Support non-PubChem data sources (yet).
Output ML-ready .npy and .csv artifacts.  Handle 3D molecular structures.
Log failed CIDs for reproducibility.      Support alternative fingerprint types (yet).
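The PubChem fetching and failed-CID logging listed in the scope above can be pictured with PubChem's public PUG REST interface, which returns selected properties for a comma-separated list of CIDs. The sketch below only builds request URLs and identifies CIDs missing from a response so they can be logged; it assumes a PUG REST property query and is not MOLRAPTOR's `fetch.py` implementation. The helper names and the default batch size of 100 are hypothetical.

```python
from typing import Iterable, Iterator, List

PUG_REST = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def batched(cids: List[int], size: int = 100) -> Iterator[List[int]]:
    """Split a CID list into consecutive batches (hypothetical default size)."""
    if size < 1:
        raise ValueError("size must be positive")
    for start in range(0, len(cids), size):
        yield cids[start:start + size]


def property_url(cids: Iterable[int], properties: Iterable[str]) -> str:
    """Build a PUG REST URL requesting `properties` for `cids` as JSON."""
    cid_part = ",".join(str(int(c)) for c in cids)
    prop_part = ",".join(properties)
    return f"{PUG_REST}/compound/cid/{cid_part}/property/{prop_part}/JSON"


def missing_cids(requested: Iterable[int], returned: Iterable[int]) -> List[int]:
    """Return requested CIDs absent from a response, sorted, for a failure log."""
    return sorted(set(requested) - set(returned))


if __name__ == "__main__":
    cids = [2244, 3672, 999999999]
    for batch in batched(cids, size=2):
        print(property_url(batch, ["MolecularFormula", "MolecularWeight"]))
    print(missing_cids(cids, [2244, 3672]))  # -> [999999999]
```

Batching keeps each URL short, and writing the output of `missing_cids` to a log file is one simple way to make a partial fetch reproducible.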

CLI

molraptor --help
molraptor run --help
molraptor --version

Common commands:

molraptor run
molraptor run --config examples/example_config.yaml
molraptor run --config examples/example_config.yaml --verbose

Public API

from molraptor import MolraptorConfig
from molraptor import validate_config
from molraptor import run
from molraptor import DataValidator
from molraptor import __version__

Modules not listed above are importable directly but are not part of the public contract and may change before 1.0.

Input Format

data/
└── dataset.csv      <- CSV with PubChem CIDs and labels

Minimum required columns: PubChem CID, Label.
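Before running the pipeline, it can help to confirm that the input CSV has the two required columns and that every CID is a positive integer. The sketch below uses only the standard-library `csv` module; it assumes the header names are literally `PubChem CID` and `Label`, and the helper name `check_input_csv` is hypothetical. MOLRAPTOR's own validation (for example `DataValidator`) may apply different or stricter rules.

```python
import csv
import io
from typing import List, TextIO

# Assumed header names, taken from the "Minimum required columns" note.
REQUIRED_COLUMNS = ("PubChem CID", "Label")


def _is_positive_int(text: str) -> bool:
    """True if text parses as an integer greater than zero."""
    try:
        return int(text) > 0
    except ValueError:
        return False


def check_input_csv(handle: TextIO) -> List[str]:
    """Return problems found in the input CSV; an empty list means it looks usable."""
    reader = csv.DictReader(handle)
    header = reader.fieldnames or []
    missing = [col for col in REQUIRED_COLUMNS if col not in header]
    if missing:
        return [f"missing column(s): {', '.join(missing)}"]

    problems: List[str] = []
    # Line 1 is the header; this numbering assumes no multi-line quoted fields.
    for line_no, row in enumerate(reader, start=2):
        cid = (row["PubChem CID"] or "").strip()
        if not _is_positive_int(cid):
            problems.append(f"line {line_no}: invalid CID {cid!r}")
        if not (row["Label"] or "").strip():
            problems.append(f"line {line_no}: empty label")
    return problems


if __name__ == "__main__":
    sample = io.StringIO("PubChem CID,Label\n2244,1\nabc,0\n3672,\n")
    print(check_input_csv(sample))
```

To check a real file, pass a handle opened with `newline=""`, as the `csv` module recommends.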

Outputs

artifacts/
├── morgan_fp.csv          # Morgan fingerprints (human-readable)
├── morgan_db_*.npy        # Morgan fingerprints (NumPy array, shape: N × fingerprint size)
├── labels.npy             # Target labels (NumPy array, shape: (N,))
└── summary.txt            # Execution report
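Because the fingerprint matrix and the label vector are written to separate files, a downstream script should check that they line up row for row before training. The sketch below assumes NumPy is installed, that exactly one `morgan_db_*.npy` file sits in `artifacts/`, and that both arrays follow the shapes listed above; the function names `check_alignment` and `load_artifacts` are hypothetical.

```python
from pathlib import Path
from typing import Tuple

import numpy as np


def check_alignment(fingerprints: np.ndarray, labels: np.ndarray) -> None:
    """Raise ValueError unless fingerprints is (N, size) and labels is (N,)."""
    if fingerprints.ndim != 2:
        raise ValueError(f"fingerprints must be 2-D, got shape {fingerprints.shape}")
    if labels.ndim != 1:
        raise ValueError(f"labels must be 1-D, got shape {labels.shape}")
    if fingerprints.shape[0] != labels.shape[0]:
        raise ValueError(
            f"row mismatch: {fingerprints.shape[0]} fingerprints "
            f"vs {labels.shape[0]} labels"
        )


def load_artifacts(artifact_dir: str = "artifacts") -> Tuple[np.ndarray, np.ndarray]:
    """Load the Morgan fingerprint matrix and labels, then verify they align."""
    root = Path(artifact_dir)
    fp_files = sorted(root.glob("morgan_db_*.npy"))
    if len(fp_files) != 1:
        raise FileNotFoundError(
            f"expected one morgan_db_*.npy in {root}, found {len(fp_files)}"
        )
    fingerprints = np.load(fp_files[0])
    labels = np.load(root / "labels.npy")
    check_alignment(fingerprints, labels)
    return fingerprints, labels
```

The returned pair can then be used as a feature matrix and target vector by whichever training code sits downstream, since model training is outside MOLRAPTOR's scope.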

Local inputs and generated artifacts such as data/, artifacts/, and logs/ are intentionally ignored by Git.

Validation

The current dev/v0.1.1 branch is expected to pass the following checks:

python -m pytest tests/ -v
mkdocs build --strict
python -m build --no-isolation
python -m twine check dist/*
molraptor --help
molraptor run --help
molraptor --version
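The checks above can also be run in sequence from a single script that stops at the first failing command, roughly what a CI job does. This is a minimal sketch using the standard-library `subprocess` and `shlex` modules; the `run_checks` helper is hypothetical and not part of the repository, and it assumes the commands are run from the repository root with the `dev` and `docs` extras installed. No shell is involved, so `dist/*` reaches twine unexpanded and relies on twine expanding the pattern itself.

```python
import shlex
import subprocess
from typing import Optional, Sequence

# Commands copied from the Validation section; run them from the repository root.
CHECKS = (
    "python -m pytest tests/ -v",
    "mkdocs build --strict",
    "python -m build --no-isolation",
    "python -m twine check dist/*",  # passed literally; twine expands the glob
    "molraptor --help",
    "molraptor run --help",
    "molraptor --version",
)


def run_checks(commands: Sequence[str] = CHECKS) -> Optional[str]:
    """Run each command in order; return the first failing one, or None."""
    for command in commands:
        print(f"$ {command}", flush=True)
        try:
            result = subprocess.run(shlex.split(command))
        except FileNotFoundError:  # executable not found on PATH
            return command
        if result.returncode != 0:
            return command
    return None
```

Calling `run_checks()` with no arguments runs the full list and returns `None` only if every command exits with status 0.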

Citation

If you use MOLRAPTOR in your research, please cite it using the metadata in CITATION.cff.

Author

Developed by Flavio F. Contreras-Torres (Tecnologico de Monterrey).

License

This project is licensed under the terms of the GNU Lesser General Public License v3.0 or later.

SPDX identifier: LGPL-3.0-or-later
