
salahalsh/formula-sigma


FORMULA-Sigma - validation harness and case datasets


Reproducibility package for the paper:

FORMULA-Sigma: a unified web platform integrating classical DoE, mixture experiments, machine-learning surrogates, and probabilistic design space for pharmaceutical formulation. Salah A. Alshehade, Iqbal H. Jebril, Mulham Alfatama, Maryam Nabavi Fard, Ahmad Naoras Bitar (submitted to International Journal of Pharmaceutics).

This repository contains the standalone validation harness and the five curated retrospective case datasets used to validate FORMULA-Sigma. From the curated per-run case data alone, anyone can reproduce every numerical validation claim in the paper (cross-tool numerical agreement, retrospective reproduction of five published pharmaceutical DoE studies, and synthetic ground-truth benchmarks) without any access to the FORMULA-Sigma platform.

Note on source data. We distribute only our curated extraction of each study (the cleaned per-run design/response CSVs and the ingestion YAMLs), not the original article files. The PubMed Central identifiers are listed below so you can retrieve each original publication under its own license.

What this repository is (and is not)

  • It is a self-contained Python harness that re-implements the platform's design, fitting and optimisation computations directly on top of open-source libraries (pyDOE3, statsmodels, scikit-learn, pymoo, SciPy), together with the curated case data. Because it does not depend on the platform engine, it also serves as an independent cross-check of that engine.
  • It is not the FORMULA-Sigma platform itself. The platform engine is proprietary and is not part of this release. A small number of figures associated with the paper (Figure 8, the supplementary figures and the graphical abstract) are rendered from the live platform; they are provided here only as static assets in figures_static/ and are not regenerated by this harness.
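The cross-check idea rests on the fact that the same computation, done by two independent implementations, should agree to within floating-point tolerance. A minimal sketch of such a check, assuming a two-factor rotatable central composite design (CCD) and a full quadratic response-surface model: the same synthetic data are fitted once with SVD-based np.linalg.lstsq and once with an explicit QR solve, and the largest coefficient difference is reported. The helpers ccd, quadratic_model_matrix and cross_tool_agreement are hypothetical and are not the harness's API; the harness's own cross-tool experiments compare against statsmodels and pyDOE3 rather than two NumPy solvers.

```python
import itertools

import numpy as np

# Hypothetical helpers for illustration only; not the harness's API.


def ccd(k: int, n_center: int = 4) -> np.ndarray:
    """Rotatable central composite design in coded units."""
    alpha = (2 ** k) ** 0.25  # rotatable axial distance for a full 2^k factorial core
    factorial = np.array(list(itertools.product([-1.0, 1.0], repeat=k)))
    axial = np.vstack([s * alpha * np.eye(k)[i] for i in range(k) for s in (-1.0, 1.0)])
    center = np.zeros((n_center, k))
    return np.vstack([factorial, axial, center])


def quadratic_model_matrix(X: np.ndarray) -> np.ndarray:
    """Columns: intercept, linear terms, two-factor interactions, pure quadratics."""
    n, k = X.shape
    cols = [np.ones(n)]
    cols += [X[:, i] for i in range(k)]
    cols += [X[:, i] * X[:, j] for i, j in itertools.combinations(range(k), 2)]
    cols += [X[:, i] ** 2 for i in range(k)]
    return np.column_stack(cols)


def cross_tool_agreement(X: np.ndarray, y: np.ndarray) -> float:
    """Fit the same quadratic model two independent ways; return max |coef difference|."""
    M = quadratic_model_matrix(X)
    beta_svd, *_ = np.linalg.lstsq(M, y, rcond=None)  # SVD-based least squares
    Q, R = np.linalg.qr(M)                             # reduced QR: M = Q R
    beta_qr = np.linalg.solve(R, Q.T @ y)              # solve R beta = Q^T y
    return float(np.max(np.abs(beta_svd - beta_qr)))


# Synthetic response from known coefficients plus noise (illustrative values).
rng = np.random.default_rng(0)
X = ccd(k=2)
true_beta = np.array([50.0, 3.0, -2.0, 1.5, -4.0, -1.0])
y = quadratic_model_matrix(X) @ true_beta + rng.normal(scale=0.5, size=len(X))
print(f"max |beta_svd - beta_qr| = {cross_tool_agreement(X, y):.2e}")
```

Comparing coefficients rather than predictions is the stricter choice: two fits can predict almost identically while disagreeing on individual terms when the design is poorly conditioned.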

Licensing (please read)

This repository is dual-licensed:

Part                                    License     Where
Harness source code (04_experiments/)   MIT         LICENSE
Case datasets (03_data/)                CC-BY-4.0   LICENSE-DATA.md

Both licenses permit commercial use. The CC-BY-4.0 terms cover our curation of the data only; the underlying numerical values are reproduced from published PubMed Central records and remain subject to the rights of their original authors. See LICENSE-DATA.md for the per-case provenance and attribution requirements.

Repository layout

03_data/            Five curated case datasets (cleaned per-run CSV + ingestion YAML provenance)     [CC-BY-4.0]
04_experiments/     Validation harness (Python package)                                              [MIT]
  common/           Standalone reference implementations (designs, models, optimisers, PDS, plotting)
  e01_cross_tool/   Cross-tool numerical agreement vs statsmodels / pyDOE3
  e02_retrospective/  Reproduction of the five published case studies
  e03_synthetic/    Synthetic ground-truth benchmarks (BO, PDS, robust, Pareto)
  e04_reproducibility/  SHA-256 hash snapshot (expected_hashes.txt) + verify_hashes.py + make_paper.sh
  figures/          Builders for Figures 1-7, the Farooqi anomaly figure, and the PRISMA figure
  tests/            pytest smoke + unit tests
05_results/         Hash-pinned reference outputs (CSV / JSON / regenerated figures)
figures_static/     Figure 8, supplementary figures and graphical abstract (platform-rendered, static)
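Synthetic benchmarks such as those in e03_synthetic/ are useful because the ground truth is known exactly, so an estimator can be scored against it rather than against another tool. A minimal sketch of that idea for a probabilistic design space (PDS), under stated assumptions: a hypothetical two-factor surface f(x), Gaussian residual noise with known σ, and a one-sided specification y ≥ L. The exact probability of meeting spec is then Φ((f(x) - L)/σ), so a Monte Carlo estimate can be checked point by point. A real PDS would also propagate model-parameter uncertainty, which this sketch deliberately omits; the surface, the limit and the function names are illustrative and are not taken from the harness.

```python
import math

import numpy as np

# Hypothetical ground-truth surface in coded units; not one of the paper's benchmarks.


def true_response(x: np.ndarray) -> np.ndarray:
    return 80.0 + 5.0 * x[..., 0] - 3.0 * x[..., 1] - 4.0 * x[..., 0] ** 2


def exact_prob_in_spec(x, lower: float, sigma: float) -> float:
    """Exact P(y >= lower) for y = f(x) + N(0, sigma^2), i.e. Phi((f(x) - lower) / sigma)."""
    z = float(true_response(np.asarray(x, dtype=float)) - lower) / sigma
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def mc_prob_in_spec(x, lower: float, sigma: float,
                    n_draws: int = 200_000, seed: int = 0) -> float:
    """Monte Carlo estimate of the same probability (residual noise only)."""
    rng = np.random.default_rng(seed)
    y = true_response(np.asarray(x, dtype=float)) + rng.normal(0.0, sigma, size=n_draws)
    return float(np.mean(y >= lower))


# Score the Monte Carlo estimator against ground truth on a 3x3 grid of settings.
grid = [(a, b) for a in (-1.0, 0.0, 1.0) for b in (-1.0, 0.0, 1.0)]
errors = [
    abs(mc_prob_in_spec(x, lower=78.0, sigma=2.0) - exact_prob_in_spec(x, lower=78.0, sigma=2.0))
    for x in grid
]
print(f"max |MC - exact| over grid = {max(errors):.4f}")
```

With 200,000 draws the Monte Carlo standard error is at most about 0.0011 per point, so a tolerance of 0.01 separates genuine estimator bugs from sampling noise.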

Quickstart

Requires Python >= 3.10.

# 1. install the open-library dependencies (no package install needed;
#    the scripts run in place and make_paper.sh handles the working directory)
pip install -r requirements.txt

# 2. reproduce everything (cross-tool + retrospective + synthetic + Figures 1-7),
#    then verify outputs against the hash snapshot
bash 04_experiments/e04_reproducibility/make_paper.sh all

# or just verify the committed outputs against the hash snapshot
bash 04_experiments/e04_reproducibility/make_paper.sh --check

A full rebuild runs in roughly 30 minutes on a 2024 laptop and is deterministic: every output is SHA-256 pinned in 04_experiments/e04_reproducibility/expected_hashes.txt, and verify_hashes.py fails if any artefact drifts.
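A minimal sketch of the kind of drift check described above, assuming (for illustration only) that expected_hashes.txt uses sha256sum-style lines of the form "<hex digest>  <relative path>". The real manifest format and the logic of verify_hashes.py may differ; the function names sha256_of and find_drift and the artefact name table2.csv are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 so large artefacts need not fit in memory."""
    h = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_drift(manifest: Path, root: Path) -> list[str]:
    """Return manifest paths that are missing or whose SHA-256 no longer matches.

    Assumed (hypothetical) manifest format: '<64-hex digest>  <relative path>',
    one entry per line; blank lines and '#' comments are skipped.
    """
    drifted = []
    for line in manifest.read_text().splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        expected, rel = line.split(maxsplit=1)
        rel = rel.strip().lstrip("*")  # sha256sum marks binary mode with a leading '*'
        target = root / rel
        if not target.is_file() or sha256_of(target) != expected.lower():
            drifted.append(rel)
    return drifted


# Demonstration in a throwaway directory: pin one artefact, then simulate drift.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    artefact = root / "table2.csv"  # hypothetical artefact name
    artefact.write_text("run,y\n1,42.0\n")
    manifest = root / "expected_hashes.txt"
    manifest.write_text(f"{sha256_of(artefact)}  table2.csv\n")
    before = find_drift(manifest, root)      # nothing has changed yet
    artefact.write_text("run,y\n1,42.1\n")   # a one-digit change in the output
    after = find_drift(manifest, root)
print(before, after)
```

Because a single changed byte alters the digest, a check like this flags any regenerated artefact that is not bit-for-bit identical, which is why the rebuild itself has to be deterministic.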

What is reproducible here

Artefact                                                             Reproducible from this repo?
e01-e04 numerical results (cross-tool, retrospective, synthetic)     Yes
Figures 1-7, Tables 2-6                                              Yes
Figure 8, Supplementary Figures 1-5, graphical abstract              No (platform-rendered; shipped as static assets in figures_static/)

Case data provenance

Case  Study                                                         PMC record
1     Farooqi et al. 2020 (osmotic-pump CR tablet, CCD)             PMC7705261
1b    Akhtar et al. 2024 (bilayer SR+IR tablet)                     PMC10837631
2     Arif et al. 2022 (levosulpiride NLC, mixture-process)         PMC9695558
3     Boscolo et al. 2023 (UDCA nanosuspension, BBD)                PMC10458560
4     Nemr et al. 2022 (ocular bilosomes, categorical D-optimal)    PMC9477486

Please cite the original studies in addition to this repository when reusing the data.

Citation

If you use this harness or the case datasets, please cite the paper (see CITATION.cff) and this archive: Zenodo DOI 10.5281/zenodo.20577231 (concept DOI; version paper-v1.0 = 10.5281/zenodo.20577232).

The platform

FORMULA-Sigma is a proprietary web platform. This repository documents and validates it but does not distribute it. For access to the platform, contact the corresponding author (Mulham Alfatama, mulham@unisza.edu.my).

About

Validation harness and curated case datasets for the FORMULA-Sigma pharmaceutical formulation platform. Reproducibility package for the methods paper (harness: MIT; datasets: CC-BY-4.0). Platform engine is proprietary.
