Reproducibility package for the paper:
FORMULA-Sigma: a unified web platform integrating classical DoE, mixture experiments, machine-learning surrogates, and probabilistic design space for pharmaceutical formulation. Salah A. Alshehade, Iqbal H. Jebril, Mulham Alfatama, Maryam Nabavi Fard, Ahmad Naoras Bitar (submitted to International Journal of Pharmaceutics).
This repository contains the standalone validation harness and the five curated retrospective case datasets used to validate FORMULA-Sigma. Starting from the curated per-run case data, anyone can reproduce every numerical validation claim in the paper (cross-tool numerical agreement, retrospective reproduction of five published pharmaceutical DoE studies, and synthetic ground-truth benchmarks) without any access to the FORMULA-Sigma platform.
Note on source data. We distribute only our curated extraction of each study (the cleaned per-run design/response CSVs and the ingestion YAMLs), not the original article files. The PubMed Central identifiers are listed below so you can retrieve each original publication under its own license.
- It is a self-contained Python harness that re-implements the platform's design, fitting and optimisation computations directly on open libraries (pyDOE3, statsmodels, scikit-learn, pymoo, SciPy), plus the curated case data. Because it does not depend on the platform engine, it also serves as an independent cross-check of that engine.
- It is not the FORMULA-Sigma platform itself. The platform engine is
proprietary and is not part of this release. A small number of figures in
the paper (Figure 8 and the supplementary figures) are rendered from the live
platform and are provided here only as static assets in
figures_static/; they are not regenerated by this harness.
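
The idea of an independent cross-check can be illustrated with a minimal sketch. It is not taken from the harness: it fits one full quadratic response-surface model by two separate numerical routes and asserts that the coefficients agree within a tolerance. The design, the coefficient values, the tolerance, and the helper names (`quadratic_model_matrix`, `fit_lstsq`, `fit_normal_equations`, `max_abs_disagreement`) are all hypothetical and use NumPy only; the harness itself compares against statsmodels and pyDOE3.

```python
import numpy as np

def quadratic_model_matrix(x1, x2):
    """Columns: intercept, x1, x2, x1*x2, x1^2, x2^2 (full quadratic, two factors)."""
    return np.column_stack([np.ones_like(x1), x1, x2, x1 * x2, x1**2, x2**2])

def fit_lstsq(X, y):
    """Route A: least squares via NumPy's SVD-based solver."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

def fit_normal_equations(X, y):
    """Route B: solve the normal equations (X'X) b = X'y directly."""
    return np.linalg.solve(X.T @ X, X.T @ y)

def max_abs_disagreement(a, b):
    """Largest absolute coefficient difference between the two routes."""
    return float(np.max(np.abs(a - b)))

# Hypothetical 3x3 design in coded units (equivalent to a two-factor
# face-centred central composite design with a single centre point).
levels = np.array([-1.0, 0.0, 1.0])
x1, x2 = [g.ravel() for g in np.meshgrid(levels, levels)]
X = quadratic_model_matrix(x1, x2)

# Made-up "true" coefficients plus a little seeded noise, for illustration only.
true_coef = np.array([50.0, 4.0, -3.0, 1.5, -2.0, 0.8])
rng = np.random.default_rng(0)
y = X @ true_coef + rng.normal(scale=0.1, size=X.shape[0])

coef_a = fit_lstsq(X, y)
coef_b = fit_normal_equations(X, y)
gap = max_abs_disagreement(coef_a, coef_b)

# Assumed tolerance; a real cross-tool check would choose its own.
assert gap < 1e-9, f"routes disagree by {gap:.3e}"
```

The two routes share no solver code, so agreement to near machine precision is evidence that neither contains an implementation error in the model matrix or the fit.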
This repository is dual-licensed:
| Part | License | Where |
|---|---|---|
| Harness source code (04_experiments/) | MIT | LICENSE |
| Case datasets (03_data/) | CC-BY-4.0 | LICENSE-DATA.md |
Both licenses permit commercial use. The CC-BY-4.0 terms cover our curation
of the data only; the underlying numerical values are reproduced from published
PubMed Central records and remain subject to the rights of their original
authors. See LICENSE-DATA.md for the per-case provenance and
attribution requirements.
```
03_data/                  Five curated case datasets (cleaned per-run CSV + ingestion YAML provenance) [CC-BY-4.0]
04_experiments/           Validation harness (Python package) [MIT]
  common/                 Standalone reference implementations (designs, models, optimisers, PDS, plotting)
  e01_cross_tool/         Cross-tool numerical agreement vs statsmodels / pyDOE3
  e02_retrospective/      Reproduction of the five published case studies
  e03_synthetic/          Synthetic ground-truth benchmarks (BO, PDS, robust, Pareto)
  e04_reproducibility/    Hash-pinned outputs + verify_hashes.py + make_paper.sh
  figures/                Builders for Figures 1-7, the Farooqi anomaly figure, and the PRISMA figure
  tests/                  pytest smoke + unit tests
05_results/               Hash-pinned reference outputs (CSV / JSON / regenerated figures)
figures_static/           Figure 8, supplementary figures and graphical abstract (platform-rendered, static)
```
Requires Python >= 3.10.
```bash
# 1. Install the open-library dependencies (no package install needed;
#    the scripts run in place and make_paper.sh handles the working directory)
pip install -r requirements.txt

# 2. Reproduce everything (cross-tool + retrospective + synthetic + Figures 1-7),
#    then verify outputs against the hash snapshot
bash 04_experiments/e04_reproducibility/make_paper.sh all

# Or just verify the committed outputs against the hash snapshot
bash 04_experiments/e04_reproducibility/make_paper.sh --check
```

A full rebuild runs in roughly 30 minutes on a 2024 laptop and is deterministic:
every output is SHA-256 pinned in 04_experiments/e04_reproducibility/expected_hashes.txt,
and verify_hashes.py fails if any artefact drifts.
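
The hash-pinning check can be sketched as follows. This is an illustration of the general technique, not the contents of verify_hashes.py: it assumes a manifest with one `<sha256-hex>  <relative/path>` entry per line (the layout written by `sha256sum`), which may differ from the actual format of expected_hashes.txt. The function names `sha256_of` and `find_drift` are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path

def sha256_of(path, chunk_size=1 << 20):
    """Stream a file through SHA-256 and return the hex digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def find_drift(manifest_text, root):
    """Return relative paths that are missing or whose hash differs.

    Assumed manifest layout: '<sha256-hex>  <relative/path>' per line;
    blank lines and lines starting with '#' are skipped.
    """
    drifted = []
    for line in manifest_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        expected, rel_path = line.split(maxsplit=1)
        target = Path(root) / rel_path
        if not target.is_file() or sha256_of(target) != expected.lower():
            drifted.append(rel_path)
    return drifted

# Demo on a throwaway file: pin its hash, then change one byte.
with tempfile.TemporaryDirectory() as tmp:
    artefact = Path(tmp) / "result.csv"
    artefact.write_bytes(b"run,response\n1,0.5\n")
    manifest = f"{sha256_of(artefact)}  result.csv\n"

    clean = find_drift(manifest, tmp)      # nothing has changed yet
    artefact.write_bytes(b"run,response\n1,0.6\n")
    drifted = find_drift(manifest, tmp)    # the edited file is reported
```

A non-empty result corresponds to the failure case described above. Byte-level pinning only works when every output is bit-for-bit deterministic, so a single changed digit in a CSV is enough to trip the check.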
| Artefact | Reproducible from this repo? |
|---|---|
| e01-e04 numerical results (cross-tool, retrospective, synthetic) | Yes |
| Figures 1-7, Tables 2-6 | Yes |
| Figure 8, Supplementary Figures 1-5, graphical abstract | No - platform-rendered, shipped as static assets in figures_static/ |
| Case | Study | PMC record |
|---|---|---|
| 1 | Farooqi et al. 2020 (osmotic-pump CR tablet, CCD) | PMC7705261 |
| 1b | Akhtar et al. 2024 (bilayer SR+IR tablet) | PMC10837631 |
| 2 | Arif et al. 2022 (levosulpiride NLC, mixture-process) | PMC9695558 |
| 3 | Boscolo et al. 2023 (UDCA nanosuspension, BBD) | PMC10458560 |
| 4 | Nemr et al. 2022 (ocular bilosomes, categorical D-optimal) | PMC9477486 |
Please cite the original studies in addition to this repository when reusing the data.
If you use this harness or the case datasets, please cite the paper (see
CITATION.cff) and this archive: Zenodo DOI 10.5281/zenodo.20577231 (concept DOI; version paper-v1.0 = 10.5281/zenodo.20577232).
FORMULA-Sigma is a proprietary web platform. This repository documents and validates it but does not distribute it. For access to the platform, contact the corresponding author (Mulham Alfatama, mulham@unisza.edu.my).