Code submission for CS 325B, wealth changes group. GitHub: https://github.com/rosikand/CS325B-wealth-changes. Please contact rsikand@stanford.edu with any questions or for help running the code.
Code base structure design (optional)
The repository is structured to be modular and configurable, which makes modeling iterations more efficient. For example, changing the learning rate requires no code change: it is done by setting the lr parameter in a config. We use Python dataclasses for the configs. All configs are placed in src/configs.py, with DefaultConfig in that file serving as the default. To work correctly with the rest of the codebase, each config parameter must follow certain conventions. For example, the encoder parameter must be one of the encoders supported in the codebase.
Secondly, we use class-based experiments, contained in src/experiments.py, to run the main functionality of each experiment. The main experiment class takes in a config object and contains the training loop, val loop, and test loop. Each experiment class inherits from a trainer module specified in src/trainer.py.
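As a rough illustration of the dataclass-based config idea, here is a minimal sketch. It is not the contents of src/configs.py: the default values, the LowLRConfig class, and the __post_init__ validation are assumptions made for the example. Only the parameter names and the allowed values for country and optimizer come from the parameter list in this document.

```python
from dataclasses import dataclass


@dataclass
class DefaultConfig:
    # Placeholder defaults chosen for illustration only; the real
    # DefaultConfig in src/configs.py may use different values.
    country: str = "malawi"
    optimizer: str = "adam"
    lr: float = 1e-3
    epochs: int = 10

    def __post_init__(self):
        # Hypothetical convention check: reject values the codebase
        # would not know how to handle.
        if self.country not in {"malawi", "mozambique"}:
            raise ValueError(f"unsupported country: {self.country}")
        if self.optimizer not in {"adam", "adamw"}:
            raise ValueError(f"unsupported optimizer: {self.optimizer}")


@dataclass
class LowLRConfig(DefaultConfig):
    # Hypothetical variant: only the learning rate differs from the default.
    lr: float = 1e-4
```

With this pattern, a new experiment variant is a short subclass that overrides a few fields, and invalid values fail at construction time rather than partway through a run.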
In addition, several other files are included for organization purposes. The dataset classes are specified in src/datasets.py and the models are specified in src/models.py.
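The relationship between the trainer module, the experiment classes, and the config can be sketched as follows. This is a simplified stand-in written with plain Python numbers in place of real batches and models; the class names Trainer, ToyExperiment, and ToyConfig and the method names are hypothetical and do not reflect the actual interfaces in src/trainer.py or src/experiments.py.

```python
from dataclasses import dataclass


@dataclass
class ToyConfig:
    # Hypothetical config holding only the number of epochs.
    epochs: int = 2


class Trainer:
    """Hypothetical base class owning the generic epoch loop."""

    def __init__(self, config):
        self.config = config
        self.history = []

    def train_step(self, batch):
        raise NotImplementedError

    def evaluate(self, batches):
        raise NotImplementedError

    def train(self, train_batches, val_batches):
        # One epoch = one full pass over the training batches,
        # followed by an evaluation on the val batches.
        for epoch in range(self.config.epochs):
            losses = [self.train_step(b) for b in train_batches]
            train_loss = sum(losses) / len(losses)
            val_loss = self.evaluate(val_batches)
            self.history.append((epoch, train_loss, val_loss))
        return self.history


class ToyExperiment(Trainer):
    """Hypothetical experiment: supplies the per-batch logic only."""

    def train_step(self, batch):
        # Pretend each batch value is already its loss.
        return float(batch)

    def evaluate(self, batches):
        return sum(float(b) for b in batches) / len(batches)
```

The design choice illustrated here is that the base class owns the loop structure while each experiment only defines what happens per batch, so swapping experiments does not require rewriting the loop.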
Finally, to train, validate, or test a model, the user should run src/run.py, the runner script that puts everything together. The user must specify two command line arguments when running src/run.py: the config class they wish to use and whether to train, test, or val the model. For example, training a model with the default config can be done with the following command:
$ python3 run.py -config DefaultConfig -train
Note that every training run consists of the specified number of epochs, where each epoch is a full traversal of the training set followed by an evaluation on the val set. At the end of the run, the model is evaluated one final time on both the val set and the test set.
A user should specify whether they want to train the model or only perform evaluation on the val or test set using the -train, -val, and -test flags respectively when running run.py. Note that the user should only specify one of them per process.
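The command line interface described above could be parsed along the following lines. This is a sketch using the standard argparse module, not the actual code in run.py: the build_parser and selected_mode helpers are hypothetical, and the real script may handle its arguments differently. A mutually exclusive group is one way to enforce the rule that only one of -train, -val, and -test is given per process.

```python
import argparse


def build_parser():
    # Hypothetical parser mirroring the flags described in this document.
    parser = argparse.ArgumentParser(description="Runner script sketch")
    parser.add_argument("-config", required=True,
                        help="name of the config class, e.g. DefaultConfig")
    # Exactly one mode flag must be given per process.
    mode = parser.add_mutually_exclusive_group(required=True)
    mode.add_argument("-train", action="store_true")
    mode.add_argument("-val", action="store_true")
    mode.add_argument("-test", action="store_true")
    return parser


def selected_mode(args):
    # Map the parsed boolean flags back to a single mode string.
    if args.train:
        return "train"
    if args.val:
        return "val"
    return "test"


if __name__ == "__main__":
    cli_args = build_parser().parse_args()
    print(cli_args.config, selected_mode(cli_args))
```

With this sketch, passing two mode flags at once makes argparse exit with a usage error instead of silently picking one.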
Footnote: the trainer.py module is adapted from the torchplate package experiment.py module. torchplate was created by one of the authors of this project.
Config class parameters
- country (string): one of {malawi, mozambique}
- log (bool): whether to log to wandb. Requires you to provide your wandb API key.
- description (string): description of the experiment for documenting the output log.
- seed (int): random seed for reproducibility.
- use_seed (bool): whether to use the seed.
- experiment (class): which experiment class to use from experiments.py.
- model_name (string): name of model to use from models.py.
- encoder (string): name of encoder to use from models.py.
- default_run_name (string): for wandb logging purposes. Random 5-digit number + timestamp.
- experiment_name (string): for wandb logging purposes.
- label_normalize (bool): whether to normalize labels.
- optimizer (string): name of optimizer to use. One of {adam, adamw}
- lr (float): learning rate.
- schedule_lr (bool): whether to use a learning rate scheduler.
- weight_decay (float): weight decay for l2 regularization.
- scheduler_step_size (int): step size for lr scheduler.
- scheduler_gamma (float): gamma for lr scheduler.
- verbose (bool): whether to print progress information to the log during training. Note that this parameter is buggy and doesn't work fully as intended.
- epochs (int): number of epochs to train for.
- resize_size (tuple): size to resize images to.
- year_1_dir (string): relative path to year 1 directory.
- year_2_dir (string): relative path to year 2 directory.
- train_filemap_path (string): relative path to train filemap.
- val_filemap_path (string): relative path to val filemap.
- test_filemap_path (string): relative path to test filemap.
- normalization_constants (string): name of normalization constants to use. Options:
  - Malawi: {standard_malawi, group_malawi, district_malawi}
  - Mozambique: {standard_mozambique}
- percent_of_dataset_to_train (float): percent of dataset to use for training.
- percent_of_dataset_to_val (float): percent of dataset to use for validation.
- latent_dim (int): latent dimension of the model, i.e. the encoder output dimension. The effective dimension is really 2x this value due to concatenation.
- hidden_dim (int): hidden dimension of MLP head.
- save_best_model (bool): whether to save the best model during the course of training.
- save_at_end (bool): whether to save the model at the end of training.
- model_checkpoint_path (string): path to model checkpoint to load from.
- image_augmentation (bool): whether to use image augmentation.
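To make the interaction of lr, schedule_lr, scheduler_step_size, and scheduler_gamma concrete, here is a small sketch of a step-decay schedule. It assumes the scheduler multiplies the learning rate by gamma every step_size epochs, which is how step-style schedulers such as PyTorch's StepLR behave; this document does not state which scheduler the codebase actually uses, so treat the formula as an assumption. The lr_at_epoch helper is hypothetical.

```python
def lr_at_epoch(lr, epoch, schedule_lr, scheduler_step_size, scheduler_gamma):
    """Learning rate in effect at a given epoch (0-indexed).

    Assumes step decay: the rate is multiplied by scheduler_gamma once
    every scheduler_step_size epochs. This is an illustrative assumption,
    not a statement about the repository's scheduler.
    """
    if not schedule_lr:
        # Without a scheduler the learning rate stays constant.
        return lr
    num_decays = epoch // scheduler_step_size
    return lr * scheduler_gamma ** num_decays


if __name__ == "__main__":
    # Example values are placeholders, not the repository's defaults.
    for e in (0, 9, 10, 25):
        print(e, lr_at_epoch(0.1, e, True, 10, 0.5))
```

Under this assumption, a smaller scheduler_gamma or a smaller scheduler_step_size both make the learning rate fall faster over the course of the run.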
- Create a Python virtual environment.
- Install the requirements in the venv using
$ pip install -r requirements.txt
- cd into the program directory:
$ cd code
- Set the paths to your data, relative to the current directory, in configs.py in lines 18-21.
- Run experiment with desired config. As of now, we support the best model (log) run via running code/main.sh.