Changes of step_length #67

Huadangfan · 2026-03-21T02:47:48Z


How does `step_method: 1` work?

Hi everyone, I have some questions about `step_method`. With `step_method: 0`, the step length decays if the objective function increases. With `step_method: 1`, if slowness is the only model parameter being inverted, I guess the two methods are equivalent (since the gradient angle can only be 0° or 180°). If so, whenever the step length is halved (`step_length_change` is `[0.5, 1.2]` here), the objective function should have increased. However, I cannot find any corresponding change in the objective function values (shown in obj_function.txt below). So how does `step_method: 1` work when the step length becomes smaller? In other words, with `step_method: 1`, does a reduced step length imply that the objective function increased?

# parameters for optim_method 0 (gradient_descent)
  optim_method_0:
    step_method: 1  # method used to adjust the step length. 0: according to the objective function; 1: according to the gradient direction
    # if step_method: 0 and the objective function increases, step length -> step length * step_length_decay
    step_length_decay: 0.9 # default: 0.9
    # if step_method: 1 and the angle between the current and previous gradients is greater than step_length_gradient_angle,
    # step length -> step length * step_length_change[0]; otherwise, step length -> step length * step_length_change[1]
    step_length_gradient_angle: 120 # default: 120.0
    step_length_change: [0.5, 1.2] # default: [0.5,1.2]

obj_function.txt

 # iter,        type,      obj(22796016),   obj_abs(4926707),obj_cs_dif(17869309),      obj_cr_dif(0),        obj_tele(0),           res(mean/std),       res_abs(mean/std),    res_cs_dif(mean/std),    res_cr_dif(mean/std),      res_tele(mean/std),        step_length,
      0, model update,         3.2139e+07,        8.05505e+06,         2.4084e+07,                  0,                  0,      -0.003647/1.187366,       0.025630/1.278405,      -0.011718/1.160882,                 0.0/0.0,                 0.0/0.0,               0.01,
      1, model update,        3.22001e+07,        8.00446e+06,        2.41956e+07,                  0,                  0,      -0.001147/1.188498,       0.025614/1.274383,      -0.008525/1.163597,                 0.0/0.0,                 0.0/0.0,              0.012,
      2, model update,        3.20015e+07,        7.97533e+06,        2.40262e+07,                  0,                  0,      -0.003735/1.184823,       0.027684/1.272017,      -0.012397/1.159481,                 0.0/0.0,                 0.0/0.0,              0.006,

Answered by JingChen-Thu


JingChen-Thu · 2026-03-22T01:46:51Z

JingChen-Thu
Mar 22, 2026
Maintainer

@Huadangfan
Thanks for your interest in the difference between these two strategies. Here is some information for your reference:

For method 0, the step length decreases at the (n+1)-th iteration if the objective function at the n-th iteration is greater than that at the (n-1)-th iteration. In that case, the step length is too large for the linear approximation (the first-order Taylor expansion) to hold, so a smaller step length is preferred.

For method 1, the step length decreases at the (n+1)-th iteration if the gradient at the n-th iteration differs significantly from that at the (n-1)-th iteration. For example, if the angle between the two kernels is greater than the default threshold of 120 degrees, the model update direction has roughly reversed. In this case, the model update is oscillating, and a smaller step length is preferred.
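The two rules above can be sketched in a few lines of Python. This is only an illustration of the behaviour described in the parameter file comments, not the actual TomoATT code: the function name `next_step_length` and its arguments are hypothetical, and leaving the step length unchanged in method 0 when the objective function does not increase is an assumption.

```python
def next_step_length(step, step_method, *, obj_prev=None, obj_curr=None,
                     angle_deg=None, decay=0.9, angle_threshold=120.0,
                     change=(0.5, 1.2)):
    """Return the step length for the next iteration (illustrative sketch)."""
    if step_method == 0:
        # Method 0 looks only at the objective function:
        # shrink the step if the objective increased.
        # Keeping the step otherwise is an assumption of this sketch.
        return step * decay if obj_curr > obj_prev else step
    if step_method == 1:
        # Method 1 looks only at the angle between the current and
        # previous gradients; the objective function is not consulted.
        factor = change[0] if angle_deg > angle_threshold else change[1]
        return step * factor
    raise ValueError("step_method must be 0 or 1")


# Hypothetical angles chosen to mimic the step_length column in
# obj_function.txt (0.01, then 0.012, then 0.006).
step = 0.01
step = next_step_length(step, 1, angle_deg=30.0)   # below threshold: times 1.2
step = next_step_length(step, 1, angle_deg=150.0)  # above threshold: times 0.5
print(step)
```

Under this reading, the step length in method 1 can grow even when the objective function rises slightly, as between iterations 0 and 1 in the log, because only the gradient angle is tested. The angles 30 and 150 degrees are made up for the example; the log does not record the actual angles.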

2 replies

Huadangfan Mar 22, 2026
Author

Thanks, I got it. So if we only update the slowness in our model parameters, then the gradient vector of this iteration should only differ from the direction of the previous iteration by being either parallel or antiparallel, right?

JingChen-Thu Mar 23, 2026
Maintainer

> Thanks, I got it. So if we only update the slowness in our model parameters, then the gradient vector of this iteration should only differ from the direction of the previous iteration by being either parallel or antiparallel, right?

In fact, we define the angle between two vectors $v_1$ and $v_2$ (here, the current and previous gradients) as

$$\theta = \arccos\left(\frac{v_1 \cdot v_2}{|v_1|\,|v_2|}\right).$$

If the angle is greater than a threshold, e.g., the default value of 120 degrees, we consider the gradient to be roughly reversed. This strategy also applies to anisotropy.
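As a minimal sketch of this definition, assuming each gradient is stored as one value per grid node, the angle can be computed as below. The function name `gradient_angle_deg` and the sample vectors are hypothetical and do not come from TomoATT.

```python
import math


def gradient_angle_deg(g_curr, g_prev):
    """Angle in degrees between two gradient vectors of equal length."""
    dot = sum(a * b for a, b in zip(g_curr, g_prev))
    norm_curr = math.sqrt(sum(a * a for a in g_curr))
    norm_prev = math.sqrt(sum(b * b for b in g_prev))
    if norm_curr == 0.0 or norm_prev == 0.0:
        raise ValueError("angle is undefined for a zero gradient")
    # Clamp to [-1, 1] to guard against round-off before calling acos.
    cos_theta = max(-1.0, min(1.0, dot / (norm_curr * norm_prev)))
    return math.degrees(math.acos(cos_theta))


# A single-node gradient can only be parallel or antiparallel.
print(gradient_angle_deg([2.0], [-3.0]))

# With several nodes, intermediate angles are possible even when
# slowness is the only inverted parameter.
print(gradient_angle_deg([1.0, 1.0, 0.0], [-1.0, 0.5, 1.0]))
```

Under that assumption, the angle is restricted to 0 or 180 degrees only for a one-component vector. With many nodes it can take any value in between, so the 120-degree threshold is a meaningful test for slowness-only inversions as well.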
