# dmlx: Declarative Machine Learning eXperiments
dmlx is a declarative framework for machine learning (ML) experiments.
Typically, ML codebases use the standard Python library `argparse` to parse
parameters from the command line and pass them deep into the models and
other components. dmlx standardizes this process and provides an elegant
framework for experiment declaration and basic management, with the
following main features:
- **Declarative Experiment Components**: Declarative interfaces are provided for defining reusable and reproducible experiment components and hyperparameters, such as the model path, the dataset getter, and the random seed.
- **`click`-powered Command Line Interface**: `click` is integrated to provide powerful command-line functionality, including parameter properties.
- **Automatic Parameter Collection**: Parameter properties are wired to command-line inputs and collected for experiment reproducibility.
- **Experiment Archive Management**: Archive directories are automatically created to hold experiment data for further analysis.
- **ML Framework Independent**: dmlx does not depend on any ML framework, so you can use whatever ML framework you like (PyTorch/TensorFlow/scikit-learn/...).
An example ML codebase using dmlx is illustrated below:

```
my_innovative_approach/
├── model/
│   ├── baseline.py
│   └── ours.py
├── dataset/
│   ├── dataset_foo.py
│   └── dataset_bar.py
├── experiments/
│   └── ...
├── approach.py
├── train.py
└── analyze.py
```
Firstly, models are defined as submodules of the `model` module, and dataset
loaders are defined as submodules of the `dataset` module. These components
should expect normal Python arguments; the component factories defined later
with `component()` will parse command-line parameters and pass the resulting
arguments to the real components.

```python
# model/xxx.py
class Model:
    def __init__(self, alpha: float, beta: float, ...) -> None:
        ...
```

```python
# dataset/dataset_yyy.py
def get_dataset_yyy(...):
    ...
```
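For instance, a concrete model module might look like the following sketch (the file name `model/ours.py` and the toy behavior are hypothetical; only the `alpha` hyperparameter comes from the command-line example later in this document):

```python
# model/ours.py -- a hypothetical concrete model module.
# The factory created by component() will call Model(alpha=...) with
# arguments parsed from the command-line locator string.
class Model:
    def __init__(self, alpha: float = 0.1) -> None:
        self.alpha = alpha

    def __call__(self, x):
        # Toy "forward pass": scale the input by alpha.
        return self.alpha * x
```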
Secondly, the components (models/datasets) and other parameters can be
declared as properties on a composed approach using dmlx. The parameter
properties, declared with `argument()` and `option()`, define corresponding
command-line parameters and store them as instance attributes. The component
properties, declared with `component()`, create the actual component objects
and store them as instance attributes.

```python
# approach.py
from dmlx.context import argument, option, component

class Approach:
    model = component(
        argument("model_locator", default="ours"),  # click argument
        "model",   # module base
        "Model",   # default factory name
    )
    dataset = component(
        option("dataset_locator", "-d", "--dataset"),  # click option
        "dataset",  # module base
    )
    epochs = option("-e", "--epochs", type=int, default=800)  # click option

    def run(self):
        for epoch in range(self.epochs):
            for x, y_true in self.dataset:
                y_pred = self.model(x)
                yield x, y_true, y_pred
```
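Conceptually, such parameter properties can be approximated in plain Python with a descriptor that holds a default and a per-instance value. The sketch below is purely illustrative (the names `Param` and `Demo` are hypothetical, and dmlx's real, `click`-backed implementation differs):

```python
# A simplified, pure-Python stand-in for dmlx-style parameter properties.
class Param:
    """Descriptor holding a default value and per-instance overrides."""

    def __init__(self, default=None):
        self.default = default

    def __set_name__(self, owner, name):
        # Remember the attribute name this property is bound to.
        self.name = name

    def __get__(self, instance, owner=None):
        if instance is None:
            return self
        return instance.__dict__.get(self.name, self.default)

    def __set__(self, instance, value):
        instance.__dict__[self.name] = value

class Demo:
    # Analogous in spirit to option("-e", "--epochs", type=int, default=800).
    epochs = Param(default=800)
```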
Thirdly, `dmlx.experiment.Experiment` can be used to declare your experiment.
The experiment object will create an underlying `click` command, and the
experiment context will collect the parameters (`model_locator`,
`dataset_locator` and `epochs`) and wire them to command-line inputs.

```python
# train.py
from dmlx.experiment import Experiment

experiment = Experiment()
with experiment.context():
    from approach import Approach

@experiment.main()
def main(**args):
    experiment.init()
    approach = Approach()
    with (experiment.path / "train.log").open("w") as log_file:
        for x, y_true, y_pred in approach.run():
            metrics = compute_metrics(y_pred, y_true)
            log_file.write(repr(metrics) + "\n")
    approach.model.save(experiment.path / "model.bin")

experiment.run()
```
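Note that `compute_metrics` in `train.py` is user-supplied rather than part of dmlx. A minimal hypothetical sketch, assuming predictions and labels are comparable sequences and that a plain dict (whose `repr()` fits on one line) is used as the metrics object:

```python
# A hypothetical compute_metrics helper for train.py above.
# Any object with a one-line repr() works for the log file; a dict is simple.
def compute_metrics(y_pred, y_true):
    correct = sum(1 for p, t in zip(y_pred, y_true) if p == t)
    return {"accuracy": correct / max(len(y_true), 1)}
```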
Finally, you can invoke `train.py` from the command line to actually conduct
the experiment. Component parameters accept string locators in the form
`path.to.module[:factory_name][?[k_0=v_0][;k_n=v_n...]]`, with values parsed
by `json.loads`.

```shell
python train.py 'ours?alpha=0.1' \
    --dataset 'dataset_foo:get_dataset_foo?
        version = "2.0";
        shots = 5;
        # ...
    ' \
    --epochs 500
```
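To make the locator grammar concrete, here is a standalone parser sketch. It illustrates the documented form only and is not dmlx's internal code; the `parse_locator` name and the handling of `#`-comments are assumptions based on the shell example above:

```python
import json

def parse_locator(locator: str):
    """Split 'module[:factory][?k=v[;k=v...]]' into its parts.

    Values are parsed with json.loads, matching the documented behavior;
    whitespace and '#'-comments inside the query part are stripped first.
    """
    module, _, query = locator.partition("?")
    module, _, factory = module.partition(":")
    kwargs = {}
    for pair in query.split(";"):
        pair = pair.split("#", 1)[0].strip()  # drop trailing comments
        if not pair:
            continue
        key, _, value = pair.partition("=")
        kwargs[key.strip()] = json.loads(value.strip())
    return module, factory or None, kwargs
```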
After calling `experiment.init()`, an experiment directory is created in
`experiments/`, to which `experiment.path` points, and the experiment meta
is dumped into `meta.json` in that directory. Extra data can also be saved
to the experiment directory, as shown in `train.py`, where a log file
`train.log` holding epoch metrics and a model archive `model.bin` are
created. This experiment archive can then be loaded to perform extensive
inspections, such as visualization and further statistical analysis, where
properties defined on `Approach` will be automatically restored:

```python
# analyze.py
from dmlx.experiment import Experiment

experiment = Experiment()
with experiment.context():
    from approach import Approach

@experiment.main()
def main(**args):
    print("Loaded args:", args)
    print("Loaded meta:", experiment.meta)
    approach = Approach()
    approach.model.load(experiment.path / "model.bin")
    # Now, `args`, `approach.model`, `approach.dataset` and other properties
    # are all restored, ready for extensive inspections.

experiment.load("/path/to/the/experiment")
```
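Since `train.py` writes one `repr(metrics)` line per step, an analysis script can recover the metric objects with `ast.literal_eval`, assuming the metrics are plain Python literals such as dicts (the `load_metrics` helper below is a hypothetical sketch, not a dmlx API):

```python
import ast

def load_metrics(log_text: str):
    """Parse repr()-ed metrics lines (as written by train.py) back into
    Python objects, one per non-empty line."""
    return [
        ast.literal_eval(line)
        for line in log_text.splitlines()
        if line.strip()
    ]
```

In `analyze.py`, this could be applied to `(experiment.path / "train.log").read_text()` to plot metrics over training steps.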