ProQSAR — automatic pipeline for quantitative structure–activity relationship (QSAR) modeling
ProQSAR is a toolkit for end-to-end QSAR modeling: data standardization, featurization, splitting, model training, uncertainty estimation, and evaluation. It is designed for reproducible experiments, continuous integration, and easy integration into ML/CADD pipelines. Full documentation is available on ReadTheDocs.
- Data standardization and sanitization (SMILES normalization, valence checks, tautomer/charge handling).
- Modular featurizers: fingerprints, descriptors, learned featurizers (pluggable API).
- Flexible dataset splitting: random, scaffold, stratified.
- Built-in pipelines for training and evaluation with uncertainty estimation.
- Simple CLI and Python API for reproducible experiments and batch processing.
- CI-tested with unit tests and example notebooks.
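
To illustrate what a scaffold split in the list above is meant to achieve, here is a minimal sketch in plain Python. It is not ProQSAR's API: the function name `scaffold_split`, its parameters, and the toy scaffold labels are all hypothetical, and the scaffold identifiers are assumed to have been computed beforehand (for example with a cheminformatics library). The sketch uses a greedy, largest-group-first assignment, which is one common convention; the key property is that all molecules sharing a scaffold land in the same partition.

```python
from collections import defaultdict


def scaffold_split(scaffolds, test_fraction=0.2):
    """Split sample indices so that no scaffold appears in both partitions.

    `scaffolds` holds one precomputed scaffold identifier per molecule
    (hypothetical input). Returns (train_indices, test_indices).
    """
    # Group sample indices by their scaffold identifier.
    groups = defaultdict(list)
    for index, scaffold in enumerate(scaffolds):
        groups[scaffold].append(index)

    # Largest groups first; ties are broken by name so the result is deterministic.
    ordered = sorted(groups.items(), key=lambda item: (-len(item[1]), item[0]))

    # Maximum number of samples allowed in the training partition.
    max_train = int(round((1.0 - test_fraction) * len(scaffolds)))

    train, test = [], []
    for _, members in ordered:
        # Assign each group as a whole; a scaffold is never divided.
        if len(train) + len(members) <= max_train:
            train.extend(members)
        else:
            test.extend(members)
    return sorted(train), sorted(test)


# Toy data: made-up scaffold labels, not real chemistry.
scaffolds = ["benzene", "benzene", "benzene", "pyridine", "pyridine",
             "indole", "benzene", "furan", "indole", "benzene"]
train_idx, test_idx = scaffold_split(scaffolds, test_fraction=0.2)
print("train:", train_idx)
print("test:", test_idx)
```

Because whole groups are assigned at once, the realized test fraction can differ from the requested one when scaffold families are large; a random split does not have this constraint but lets near-identical molecules leak across partitions.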
Choose your preferred installation method.
From PyPI
pip install proqsar

From conda (anaconda.org/tieulongphan)
conda install -c tieulongphan proqsar

Docker (containerized)
docker pull tieulongphan/proqsar:latest
# run an example container (bind-mount your project directory)
docker run --rm -v "$(pwd)":/workspace -w /workspace tieulongphan/proqsar:latest proqsar --help

From source (developer)
git clone https://github.com/Medicine-Artificial-Intelligence/proqsar.git
cd proqsar
pip install -e ".[dev]"

Thanks for your interest in contributing! A quick checklist:
- Fork the repository and create a feature branch.
- Implement your changes and include unit tests.
- Run linting and tests locally (pre-commit, flake8, pytest).
- Open a Pull Request describing the change and add tests/examples.
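
As a rough guide to the "include unit tests" step above, the sketch below shows the shape of a small pytest-style test module. The helper `random_split` is hypothetical and defined inline only so the example is self-contained; it is not part of ProQSAR, and the project's actual test layout may differ. pytest collects functions whose names start with `test_` from files named `test_*.py`.

```python
import random


def random_split(n_samples, test_fraction=0.2, seed=0):
    """Hypothetical helper: shuffle indices reproducibly and cut them in two."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_test = int(round(test_fraction * n_samples))
    return sorted(indices[n_test:]), sorted(indices[:n_test])


def test_split_covers_every_sample_once():
    # Train and test together must contain each index exactly once.
    train, test = random_split(50, test_fraction=0.2)
    assert len(test) == 10
    assert sorted(train + test) == list(range(50))


def test_split_is_reproducible_for_a_fixed_seed():
    # The same seed must always produce the same partition.
    assert random_split(50, seed=7) == random_split(50, seed=7)
```

Tests like these check properties (coverage, reproducibility) rather than exact outputs, so they stay valid if the shuffling details change.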
If you use ProQSAR in research, please cite the project. Example BibTeX placeholder:
@misc{proqsar2025,
title = {ProQSAR: Automatic pipeline for QSAR modeling},
author = {Tuyet-Minh Phan and Tieu-Long Phan and Phuoc-Chung Nguyen Van and contributors},
year = {2025},
howpublished = {\url{https://github.com/Medicine-Artificial-Intelligence/proqsar}}
}

This project is licensed under the MIT License; see the License file for details.
This work has received support from the Korea International Cooperation Agency (KOICA) under the project entitled “Education and Research Capacity Building Project at University of Medicine and Pharmacy at Ho Chi Minh City”, conducted from 2024 to 2025 (Project No. 2021-00020-3).
