CHAMANP: Curation and Hierarchical Analysis for Molecular Annotation of Natural Products

CHAMANP is a pre-stable Python package for systematic, reproducible curation and preparation of natural-product molecular datasets.

It sits between raw molecular tables and analysis-ready cheminformatics or machine-learning inputs. Current development uses COCONUT-style data as the reference workflow, but CHAMANP is intended to remain reusable for other tabular natural-product sources with SMILES and collection metadata.

The name intentionally echoes "shaman": CHAMANP interprets a molecular table through a collection taxonomy, helping users separate compounds that belong to one or several source collections.

Status

CHAMANP is published as an Alpha-stage, pre-stable package. The current public API is usable, but the project is still hardening its documentation contract, release workflow, and stable public boundary.

The current public package doorway is:

from chamanp import ChamanpConfig, ChamanpResult, validate_config, run

The minimal CLI is available as:

chamanp --version
chamanp check-config my-chamanp-profile.toml
chamanp run my-chamanp-profile.toml
python -m chamanp --version

Pipeline, chamanp._core, and chamanp._utils are private implementation details and should not be treated as stable public API.

Install

pip install chamanp

CHAMANP targets Python 3.11 and 3.12. Runtime dependencies are declared in pyproject.toml and include pandas, numpy, and RDKit. RDKit is the most platform-sensitive dependency; conda/mamba may still be useful for local scientific environments.

Documentation

The full user documentation is published with MkDocs Material:

Home introduces the project, dataset contract, taxonomy concept, citation, and license.
Usage covers installation, CLI usage, TOML profiles, and Python examples.
API Reference documents the current public API generated with mkdocstrings.
Changelog records release notes.

This README is intentionally concise. Detailed examples, TOML profile guidance, and API examples live in the documentation site.

What CHAMANP Does

At a high level, CHAMANP can:

validate configuration before chemical processing begins;
curate SMILES with RDKit;
filter compounds by exact collection labels from a taxonomy;
handle duplicate structures and optional stereochemistry-related duplicate removal;
generate RDKit Morgan fingerprints;
record invalid SMILES rows for traceability;
write curated tables, filtered tables, valid-molecule metadata, fingerprint matrices, and preparation reports.

CHAMANP does not currently provide molecular property prediction, docking, virtual screening, or a stable public API.

Minimal Orientation

A typical run starts from:

a CSV table with canonical_smiles, collections, and selected metadata columns;
a collection taxonomy JSON file;
a TOML profile or ChamanpConfig with input paths, target collections, fingerprint settings, selected columns, and output locations.

The recommended user path is documented in the Usage page. Source checkouts also include examples/example_chamanp.csv and source_data/coconut_taxonomy.json for the reference walkthrough.

Repository Structure

CHAMANP/
|-- chamanp/                 # Package namespace and public API doorway
|   |-- __init__.py          # Public exports
|   |-- __main__.py          # python -m chamanp entry point
|   |-- cli.py               # Minimal public CLI
|   |-- config.py            # ChamanpConfig
|   |-- pipeline.py          # validate_config/run doorway
|   |-- result.py            # ChamanpResult
|   |-- version.py           # Package version source
|   |-- _core/               # Private pipeline implementation
|   `-- _utils/              # Private utilities
|-- docs/                    # MkDocs Material documentation
|-- examples/                # Small example input data
|-- source_data/             # Source-data policy and taxonomy reference
|-- tests/                   # Focused tests
|-- main.py                  # Repository workflow entry point
|-- pyproject.toml           # Package metadata and dependencies
|-- CHANGELOG.md             # Release notes
|-- CITATION.cff             # Citation metadata
`-- README.md                # Repository overview

Generated outputs are written under artifacts/, which is ignored by Git.

Development

Install the package in editable mode from a source checkout:

python -m pip install -e ".[dev,docs]"

Build the documentation:

mkdocs build --strict

Run tests:

python -m pytest tests

The CI workflow runs tests, package build checks, smoke installs, CLI checks, and a strict documentation build. The documentation deployment workflow publishes the MkDocs site to GitHub Pages.

Citation

Use CITATION.cff as the authoritative machine-readable citation metadata. Citation metadata is updated with each release.

License

This project is licensed under the terms of the GNU Lesser General Public License v3.0 or later. SPDX identifier: LGPL-3.0-or-later.

Name		Name	Last commit message	Last commit date
Latest commit History 160 Commits
.github/workflows		.github/workflows
chamanp		chamanp
docs		docs
examples		examples
source_data		source_data
tests		tests
.gitignore		.gitignore
CHANGELOG.md		CHANGELOG.md
CITATION.cff		CITATION.cff
COPYING		COPYING
COPYING.LESSER		COPYING.LESSER
LICENSE		LICENSE
README.md		README.md
config.py		config.py
environment.yml		environment.yml
main.py		main.py
mkdocs.yml		mkdocs.yml
pyproject.toml		pyproject.toml
requirements-dev.txt		requirements-dev.txt
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

CHAMANP: Curation and Hierarchical Analysis for Molecular Annotation of Natural Products

Status

Install

Documentation

What CHAMANP Does

Minimal Orientation

Repository Structure

Development

Citation

License

About

Licenses found

Uh oh!

Releases 21

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

CHAMANP: Curation and Hierarchical Analysis for Molecular Annotation of Natural Products

Status

Install

Documentation

What CHAMANP Does

Minimal Orientation

Repository Structure

Development

Citation

License

About

Topics

Resources

License

Licenses found

Uh oh!

Stars

Watchers

Forks

Releases 21

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages