
Optimizers & Schedulers

This page details the supported optimizers and learning rate schedulers available in DeepFense.

Optimizers

Optimizers are defined in the optimizer block under training in the config file.
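For orientation, here is a minimal sketch of how such a block could be dispatched to a constructor, assuming DeepFense builds on PyTorch's torch.optim (the build_optimizer helper and the dict-shaped config are illustrative assumptions, not the documented DeepFense API):

import torch

def build_optimizer(model, config):
    # Illustrative factory: map the "type" key to a torch.optim class and
    # pass the "args" mapping through as keyword arguments.
    registry = {
        "adam": torch.optim.Adam,
        "adamw": torch.optim.AdamW,
        "sgd": torch.optim.SGD,
    }
    optimizer_cls = registry[config["type"]]
    return optimizer_cls(model.parameters(), **config.get("args", {}))

# Usage, mirroring the YAML examples below:
# optimizer = build_optimizer(model, {"type": "adam",
#                                     "args": {"lr": 1e-4, "weight_decay": 1e-4}})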

1. Adam (adam)

Standard Adam optimizer.

Configuration Signature:

optimizer:
  type: adam
  args:
    lr: float
    weight_decay: float
    betas: [float, float]

Parameters:

  • lr - (float) Learning rate (default: 1e-6).
  • weight_decay - (float) L2 penalty (default: 1e-4).
  • betas - (tuple) Coefficients for computing running averages of the gradient and its square (default: (0.9, 0.999)).

Example:

optimizer:
  type: adam
  args:
    lr: 0.0001
    weight_decay: 0.0001

2. AdamW (adamw)

Adam with decoupled weight decay. Generally recommended over Adam for transformer-based models (Wav2Vec2, etc.).

Configuration Signature:

optimizer:
  type: adamw
  args:
    lr: float
    weight_decay: float
    betas: [float, float]

Parameters:

  • lr - (float) Learning rate (default: 1e-6).
  • weight_decay - (float) Weight decay coefficient (default: 1e-4).
  • betas - (tuple) Coefficients for computing running averages of the gradient and its square (default: (0.9, 0.999)).

Example:

optimizer:
  type: adamw
  args:
    lr: 0.000001
    weight_decay: 0.01
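To make "decoupled" concrete, the sketch below contrasts one simplified update step of each optimizer (bias correction omitted; adam_step and adamw_step are illustrative helpers, not DeepFense functions). In Adam the L2 penalty enters the gradient and is rescaled by the adaptive denominator; in AdamW the decay acts on the weights directly:

import torch

def adam_step(theta, grad, m, v, lr, wd, b1=0.9, b2=0.999, eps=1e-8):
    # Classic Adam: weight decay is folded into the gradient, so it gets
    # rescaled by the adaptive denominator like everything else.
    g = grad + wd * theta
    m = b1 * m + (1 - b1) * g
    v = b2 * v + (1 - b2) * g ** 2
    return theta - lr * m / (v.sqrt() + eps), m, v

def adamw_step(theta, grad, m, v, lr, wd, b1=0.9, b2=0.999, eps=1e-8):
    # AdamW: weight decay is applied to the weights directly, decoupled
    # from the adaptive rescaling.
    m = b1 * m + (1 - b1) * grad
    v = b2 * v + (1 - b2) * grad ** 2
    return theta - lr * (m / (v.sqrt() + eps) + wd * theta), m, v

This decoupling is also why typical weight_decay values differ between the two optimizers; note the 0.01 in the AdamW example above versus 0.0001 in the Adam example.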

3. SGD (sgd)

Stochastic Gradient Descent.

Configuration Signature:

optimizer:
  type: sgd
  args:
    lr: float
    momentum: float
    weight_decay: float

Parameters:

  • lr - (float) Learning rate.
  • momentum - (float) Momentum factor (default: 0.9).
  • weight_decay - (float) L2 penalty (default: 1e-4).

Example:

optimizer:
  type: sgd
  args:
    lr: 0.01
    momentum: 0.9

Schedulers

Schedulers adjust the learning rate during training. They are defined in the scheduler block under training.
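As with optimizers, here is a hedged sketch of how a scheduler block might map onto PyTorch's torch.optim.lr_scheduler, stepped once per epoch (build_scheduler and the loop shape are illustrative assumptions, not the documented DeepFense API):

import torch

def build_scheduler(optimizer, config):
    # Illustrative factory; the real DeepFense wiring may differ.
    registry = {
        "step_lr": torch.optim.lr_scheduler.StepLR,
        "cosine": torch.optim.lr_scheduler.CosineAnnealingLR,
        "exponential": torch.optim.lr_scheduler.ExponentialLR,
    }
    return registry[config["type"]](optimizer, **config.get("args", {}))

# Typical epoch loop: the scheduler advances once per epoch, after all
# optimizer.step() calls for that epoch.
# for epoch in range(num_epochs):
#     train_one_epoch(model, loader, optimizer)
#     scheduler.step()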

1. Step LR (step_lr)

Decays the learning rate by gamma every step_size epochs.

Configuration Signature:

scheduler:
  type: step_lr
  args:
    step_size: int
    gamma: float

Parameters:

  • step_size - (int) Period of learning rate decay (default: 10).
  • gamma - (float) Multiplicative factor of learning rate decay (default: 0.1).

Example:

scheduler:
  type: step_lr
  args:
    step_size: 15
    gamma: 0.1
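As a quick sanity check of what this example produces (plain arithmetic; the 1e-3 base rate and the step_lr_at helper are illustrative):

def step_lr_at(epoch, base_lr=1e-3, step_size=15, gamma=0.1):
    # The lr is multiplied by gamma once every step_size epochs.
    return base_lr * gamma ** (epoch // step_size)

# epochs  0-14: 1e-3
# epochs 15-29: 1e-4
# epochs 30-44: 1e-5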

2. Cosine Annealing (cosine)

Sets the learning rate of each parameter group following a cosine annealing schedule.

Configuration Signature:

scheduler:
  type: cosine
  args:
    T_max: int
    eta_min: float

Parameters:

  • T_max - (int) Maximum number of iterations (usually set to the total number of training epochs).
  • eta_min - (float) Minimum learning rate (default: 0).

Example:

scheduler:
  type: cosine
  args:
    T_max: 100
    eta_min: 0.0000001
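The underlying curve is the standard cosine annealing form; below is a small sketch of the closed form, taking the optimizer's base learning rate as eta_max (cosine_lr is an illustrative helper):

import math

def cosine_lr(epoch, eta_max, eta_min=0.0, T_max=100):
    # Half a cosine period: starts at eta_max, ends at eta_min at T_max.
    return eta_min + 0.5 * (eta_max - eta_min) * (1 + math.cos(math.pi * epoch / T_max))

# cosine_lr(0, 1e-4)   -> 1e-4  (start)
# cosine_lr(50, 1e-4)  -> 5e-5  (midpoint)
# cosine_lr(100, 1e-4) -> 0.0   (eta_min)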

3. Exponential LR (exponential)

Decays the learning rate of each parameter group by gamma every epoch.

Configuration Signature:

scheduler:
  type: exponential
  args:
    gamma: float

Parameters:

  • gamma - (float) Multiplicative factor of learning rate decay (default: 0.9).

Example:

scheduler:
  type: exponential
  args:
    gamma: 0.95
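Because gamma compounds once per epoch, the rate after n epochs is simply base_lr * gamma ** n; a one-line illustrative sketch:

def exp_lr_at(epoch, base_lr, gamma=0.95):
    # 0.95**10 ≈ 0.60 and 0.95**50 ≈ 0.077, so the lr falls to ~60% of
    # its base value after 10 epochs and ~8% after 50.
    return base_lr * gamma ** epoch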