FORMULA-Sigma - validation harness and case datasets

Reproducibility package for the paper:

FORMULA-Sigma: a unified web platform integrating classical DoE, mixture experiments, machine-learning surrogates, and probabilistic design space for pharmaceutical formulation. Salah A. Alshehade, Iqbal H. Jebril, Mulham Alfatama, Maryam Nabavi Fard, Ahmad Naoras Bitar (submitted to International Journal of Pharmaceutics).

This repository contains the standalone validation harness and the five curated retrospective case datasets used to validate FORMULA-Sigma. It lets anyone reproduce, from the curated per-run case data, every numerical validation claim in the paper - cross-tool numerical agreement, retrospective reproduction of five published pharma DoE studies, and synthetic ground-truth benchmarks - without any access to the FORMULA-Sigma platform.

Note on source data. We distribute only our curated extraction of each study (the cleaned per-run design/response CSVs and the ingestion YAMLs), not the original article files. The PubMed Central identifiers are listed below so you can retrieve each original publication under its own license.

What this repository is (and is not)

It is a self-contained Python harness that re-implements the platform's design, fitting and optimisation computations directly on open libraries (pyDOE3, statsmodels, scikit-learn, pymoo, SciPy), plus the curated case data. Because it does not depend on the platform engine, it also serves as an independent cross-check of that engine.
It is not the FORMULA-Sigma platform itself. The platform engine is proprietary and is not part of this release. A small number of figures in the paper (Figure 8 and the supplementary figures) are rendered from the live platform and are provided here only as static assets in figures_static/; they are not regenerated by this harness.
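
As a toy illustration of what an independent cross-check means in practice, the sketch below fits the same small factorial model by two unrelated routes and confirms that the coefficients agree within a tolerance. It is not taken from the harness: the design, the response values, the tolerance, and every function name are invented for the example, and it uses only the Python standard library rather than the libraries listed above.

```python
"""Toy cross-check: fit a 2^2 factorial model two independent ways and compare."""
from itertools import product
import math

# Coded 2^2 full factorial design (levels -1/+1) with made-up responses.
runs = list(product((-1, 1), repeat=2))   # (-1,-1), (-1,1), (1,-1), (1,1)
y = [52.0, 60.0, 71.0, 85.0]              # hypothetical responses, one per run


def model_row(x1, x2):
    """Model-matrix row for y = b0 + b1*x1 + b2*x2 + b12*x1*x2."""
    return [1.0, float(x1), float(x2), float(x1 * x2)]


X = [model_row(x1, x2) for x1, x2 in runs]
n, p = len(X), len(X[0])

# Route 1: classical contrast formula, valid because the columns are orthogonal.
coef_contrast = [sum(row[j] * yi for row, yi in zip(X, y)) / n for j in range(p)]


# Route 2: solve the normal equations (X'X) b = X'y by Gauss-Jordan elimination.
def solve(A, b):
    m = len(b)
    M = [A[i][:] + [b[i]] for i in range(m)]          # augmented matrix
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]           # partial pivoting
        for r in range(m):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [a - f * c for a, c in zip(M[r], M[col])]
    return [M[i][m] / M[i][i] for i in range(m)]


XtX = [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]
Xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(p)]
coef_normal = solve(XtX, Xty)


def agree(a, b, rel_tol=1e-9, abs_tol=1e-12):
    """True when every pair of coefficients matches within the tolerance."""
    return all(math.isclose(u, v, rel_tol=rel_tol, abs_tol=abs_tol)
               for u, v in zip(a, b))


print(coef_contrast, agree(coef_contrast, coef_normal))
```

The point of the pattern is that the two routes share no code, so agreement is evidence that neither contains an implementation error.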

Licensing (please read)

This repository uses two licenses, one for the code and one for the data:

| Part | License | Where |
| --- | --- | --- |
| Harness source code (`04_experiments/`) | MIT | `LICENSE` |
| Case datasets (`03_data/`) | CC-BY-4.0 | `LICENSE-DATA.md` |

Both licenses permit commercial use. The CC-BY-4.0 terms cover our curation of the data only; the underlying numerical values are reproduced from published PubMed Central records and remain subject to the rights of their original authors. See LICENSE-DATA.md for the per-case provenance and attribution requirements.

Repository layout

03_data/            Five curated case datasets (cleaned per-run CSV + ingestion YAML provenance)     [CC-BY-4.0]
04_experiments/     Validation harness (Python package)                                              [MIT]
  common/           Standalone reference implementations (designs, models, optimisers, PDS, plotting)
  e01_cross_tool/   Cross-tool numerical agreement vs statsmodels / pyDOE3
  e02_retrospective/  Reproduction of the five published case studies
  e03_synthetic/    Synthetic ground-truth benchmarks (BO, PDS, robust, Pareto)
  e04_reproducibility/  Hash-pinned outputs + verify_hashes.py + make_paper.sh
  figures/          Builders for Figures 1-7, the Farooqi anomaly figure, and the PRISMA figure
  tests/            pytest smoke + unit tests
05_results/         Hash-pinned reference outputs (CSV / JSON / regenerated figures)
figures_static/     Figure 8, supplementary figures and graphical abstract (platform-rendered, static)

Quickstart

Requires Python >= 3.10.

# 1. install the open-library dependencies (no package install needed;
#    the scripts run in place and make_paper.sh handles the working directory)
pip install -r requirements.txt

# 2. reproduce everything (cross-tool + retrospective + synthetic + Figures 1-7),
#    then verify outputs against the hash snapshot
bash 04_experiments/e04_reproducibility/make_paper.sh all

# or just verify the committed outputs against the hash snapshot
bash 04_experiments/e04_reproducibility/make_paper.sh --check

A full rebuild runs in roughly 30 minutes on a 2024 laptop and is deterministic: every output is SHA-256 pinned in 04_experiments/e04_reproducibility/expected_hashes.txt, and verify_hashes.py fails if any artefact drifts.
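
For readers who want to see what SHA-256 pinning involves, here is a minimal sketch of a drift check. It is not the repository's `verify_hashes.py`, whose internals are not described here. The manifest format (one `<hex digest>  <relative path>` pair per line, as written by `sha256sum`) and the function names `sha256_of`, `parse_manifest`, and `find_drift` are assumptions made for the example.

```python
"""Sketch of a SHA-256 pin check; the real verify_hashes.py may differ."""
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large outputs need not fit in memory."""
    h = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def parse_manifest(text: str) -> dict[str, str]:
    """Parse an ASSUMED manifest format: '<hex digest>  <relative path>'."""
    pins = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue                      # skip blanks and comment lines
        digest, rel_path = line.split(maxsplit=1)
        pins[rel_path] = digest.lower()
    return pins


def find_drift(root: Path, pins: dict[str, str]) -> list[str]:
    """Return the pinned paths that are missing or whose hash has changed."""
    drifted = []
    for rel_path, expected in sorted(pins.items()):
        target = root / rel_path
        if not target.is_file() or sha256_of(target) != expected:
            drifted.append(rel_path)
    return drifted
```

A wrapper script built on this would read the manifest, call `find_drift`, and exit with a non-zero status if the returned list is non-empty.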

What is reproducible here

| Artefact | Reproducible from this repo? |
| --- | --- |
| e01-e04 numerical results (cross-tool, retrospective, synthetic) | Yes |
| Figures 1-7, Tables 2-6 | Yes |
| Figure 8, Supplementary Figures 1-5, graphical abstract | No - platform-rendered, shipped as static assets in `figures_static/` |

Case data provenance

| Case | Study | PMC record |
| --- | --- | --- |
| 1 | Farooqi et al. 2020 (osmotic-pump CR tablet, CCD) | PMC7705261 |
| 1b | Akhtar et al. 2024 (bilayer SR+IR tablet) | PMC10837631 |
| 2 | Arif et al. 2022 (levosulpiride NLC, mixture-process) | PMC9695558 |
| 3 | Boscolo et al. 2023 (UDCA nanosuspension, BBD) | PMC10458560 |
| 4 | Nemr et al. 2022 (ocular bilosomes, categorical D-optimal) | PMC9477486 |

Please cite the original studies in addition to this repository when reusing the data.
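
To retrieve the original publications, the PMC identifiers in the table above can be turned into article links. The sketch below is a convenience only and is not part of the harness. The dictionary name, the helper `pmc_url`, and the URL pattern (the public PubMed Central article address form) are assumptions made for the example.

```python
"""Sketch: map each case to its PubMed Central article URL."""

# Case ID -> PMC record, copied from the provenance table above.
PMC_RECORDS = {
    "1": "PMC7705261",
    "1b": "PMC10837631",
    "2": "PMC9695558",
    "3": "PMC10458560",
    "4": "PMC9477486",
}


def pmc_url(pmc_id: str) -> str:
    """Build an article URL, assuming the usual PMC address pattern."""
    if not (pmc_id.startswith("PMC") and pmc_id[3:].isdigit()):
        raise ValueError(f"not a PMC identifier: {pmc_id!r}")
    return f"https://pmc.ncbi.nlm.nih.gov/articles/{pmc_id}/"


for case, record in PMC_RECORDS.items():
    print(f"Case {case}: {pmc_url(record)}")
```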

Citation

If you use this harness or the case datasets, please cite the paper (see CITATION.cff) and this archive: Zenodo DOI 10.5281/zenodo.20577231 (concept DOI; version paper-v1.0 = 10.5281/zenodo.20577232).

The platform

FORMULA-Sigma is a proprietary web platform. This repository documents and validates it but does not distribute it. For access to the platform, contact the corresponding author (Mulham Alfatama, mulham@unisza.edu.my).
