This page details the supported optimizers and learning rate schedulers available in DeepFense.
## Optimizers

Optimizers are defined in the `optimizer` section under `training` in the config.
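For orientation, here is a minimal sketch of where the block sits, assuming the nesting described above (the `adamw` values are illustrative only):

```yaml
training:
  optimizer:      # optimizer block lives here
    type: adamw
    args:
      lr: 1.0e-5
```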
### Adam (`adam`)

Standard Adam optimizer.
Configuration Signature:

```yaml
optimizer:
  type: adam
  args:
    lr: float
    weight_decay: float
    betas: [float, float]
```

Parameters:
- `lr` - (float) Learning rate (default: `1e-6`).
- `weight_decay` - (float) L2 penalty (default: `1e-4`).
- `betas` - (tuple) Coefficients for computing running averages of the gradient and its square (default: `(0.9, 0.999)`).
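These arguments mirror PyTorch's Adam. Assuming `type: adam` dispatches to `torch.optim.Adam` with `args` passed through as keyword arguments (an assumption; this page does not specify the backend), the config maps to a call like:

```python
import torch

# Hedged sketch: assuming `type: adam` maps to torch.optim.Adam and
# `args` are forwarded as keyword arguments.
model = torch.nn.Linear(10, 2)  # stand-in model for illustration
optimizer = torch.optim.Adam(
    model.parameters(),
    lr=1e-4,             # args.lr
    weight_decay=1e-4,   # args.weight_decay
    betas=(0.9, 0.999),  # args.betas
)
```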
Example:

```yaml
optimizer:
  type: adam
  args:
    lr: 0.0001
    weight_decay: 0.0001
```

### AdamW (`adamw`)

Adam with decoupled weight decay. Generally recommended over Adam for transformer-based models (Wav2Vec2, etc.).
Configuration Signature:

```yaml
optimizer:
  type: adamw
  args:
    lr: float
    weight_decay: float
    betas: [float, float]
```

Parameters:
- `lr` - (float) Learning rate (default: `1e-6`).
- `weight_decay` - (float) Weight decay coefficient (default: `1e-4`).
- `betas` - (tuple) Coefficients for the running averages, as for Adam (default: `(0.9, 0.999)`).
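As with Adam above, and again assuming the `adamw` type dispatches to `torch.optim.AdamW` (an assumption), the decoupled decay is why noticeably larger `weight_decay` values such as `0.01` are common here:

```python
import torch

# Hedged sketch: assuming `type: adamw` maps to torch.optim.AdamW.
# AdamW applies weight decay directly to the weights rather than folding
# it into the gradient, so values like 0.01 are typical.
model = torch.nn.Linear(10, 2)  # stand-in model for illustration
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-6, weight_decay=0.01)
```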
Example:

```yaml
optimizer:
  type: adamw
  args:
    lr: 0.000001
    weight_decay: 0.01
```

### SGD (`sgd`)

Stochastic Gradient Descent.
Configuration Signature:

```yaml
optimizer:
  type: sgd
  args:
    lr: float
    momentum: float
    weight_decay: float
```

Parameters:
- `lr` - (float) Learning rate.
- `momentum` - (float) Momentum factor (default: `0.9`).
- `weight_decay` - (float) L2 penalty (default: `1e-4`).
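For reference, a toy sketch of the update these arguments control, assuming the standard PyTorch-style formulation of SGD with momentum (an assumption; the page does not define the update rule):

```python
# Hedged sketch of SGD with momentum and weight decay, PyTorch-style:
#   g <- grad + weight_decay * w
#   v <- momentum * v + g
#   w <- w - lr * v
lr, momentum, weight_decay = 0.01, 0.9, 1e-4
w, v, grad = 1.0, 0.0, 0.5   # toy scalar weight, velocity, gradient
g = grad + weight_decay * w
v = momentum * v + g
w = w - lr * v
print(w)  # 0.994999
```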
Example:

```yaml
optimizer:
  type: sgd
  args:
    lr: 0.01
    momentum: 0.9
```

## Schedulers

Schedulers adjust the learning rate during training. They are defined in the `scheduler` section under `training`.
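Putting the two together, a minimal sketch of a `training` block with both sections (values illustrative only):

```yaml
training:
  optimizer:
    type: adamw
    args:
      lr: 1.0e-5
  scheduler:
    type: cosine
    args:
      T_max: 100
```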
### StepLR (`step_lr`)

Decays the learning rate by `gamma` every `step_size` epochs.
Configuration Signature:

```yaml
scheduler:
  type: step_lr
  args:
    step_size: int
    gamma: float
```

Parameters:
- `step_size` - (int) Period of learning rate decay (default: `10`).
- `gamma` - (float) Multiplicative factor of learning rate decay (default: `0.1`).
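A small worked example of the resulting schedule (this is the standard step-decay rule, as in PyTorch's `StepLR`):

```python
# lr is multiplied by gamma once every step_size epochs:
#   lr(epoch) = base_lr * gamma ** (epoch // step_size)
base_lr, step_size, gamma = 0.0001, 15, 0.1
for epoch in (0, 14, 15, 30, 45):
    print(epoch, base_lr * gamma ** (epoch // step_size))
# 0 -> 1e-4, 14 -> 1e-4, 15 -> 1e-5, 30 -> 1e-6, 45 -> 1e-7
```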
Example:

```yaml
scheduler:
  type: step_lr
  args:
    step_size: 15
    gamma: 0.1
```

### Cosine Annealing (`cosine`)

Sets the learning rate using a cosine annealing schedule.
Configuration Signature:

```yaml
scheduler:
  type: cosine
  args:
    T_max: int
    eta_min: float
```

Parameters:
- `T_max` - (int) Maximum number of iterations (usually set to the total number of epochs).
- `eta_min` - (float) Minimum learning rate (default: `0`).
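The curve it follows is the standard cosine annealing formula (as in PyTorch's `CosineAnnealingLR`); a quick sketch, with the base learning rate taken from the optimizer:

```python
import math

# lr(t) = eta_min + (base_lr - eta_min) * (1 + cos(pi * t / T_max)) / 2
base_lr, T_max, eta_min = 0.0001, 100, 1e-7
for epoch in (0, 50, 100):
    lr = eta_min + (base_lr - eta_min) * (1 + math.cos(math.pi * epoch / T_max)) / 2
    print(epoch, lr)
# starts at base_lr, passes ~base_lr/2 at T_max/2, ends at eta_min
```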
Example:

```yaml
scheduler:
  type: cosine
  args:
    T_max: 100
    eta_min: 0.0000001
```

### Exponential (`exponential`)

Decays the learning rate of each parameter group by `gamma` every epoch.
Configuration Signature:

```yaml
scheduler:
  type: exponential
  args:
    gamma: float
```

Parameters:
- `gamma` - (float) Multiplicative factor of learning rate decay (default: `0.9`).
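Equivalently, a worked example of the per-epoch decay:

```python
# lr is multiplied by gamma after every epoch:
#   lr(epoch) = base_lr * gamma ** epoch
base_lr, gamma = 0.01, 0.95
for epoch in (0, 1, 10, 50):
    print(epoch, base_lr * gamma ** epoch)
# 0 -> 0.01, 1 -> 0.0095, 10 -> ~0.006, 50 -> ~0.00077
```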
Example:

```yaml
scheduler:
  type: exponential
  args:
    gamma: 0.95
```