I just came across a strange problem. I slightly modified some parts of adam.lua as follows:
-- Initialization
state.t = state.t or 0
-- Exponential moving average of gradient values
state.m = state.m or x.new(x:size()):zero()
-- Exponential moving average of squared gradient values
state.v = state.v or x.new(x:size()):zero()
-- A tmp tensor to hold the sqrt(v) + epsilon
state.denom = state.denom or x.new(x:size()):zero()
-- (3) learning rate decay (annealing)
local clr = lr / (1 + state.t*lrd)
state.t = state.t + 1
local biasCorrection1 = 1 - beta1^state.t
local biasCorrection2 = 1 - beta2^state.t
-- (1) evaluate f(x) and df/dx
local fx, dfdx = opfunc(x)
-- (2) weight decay
if wd ~= 0 then
  dfdx:add(wd, x)
end
I changed the order of (1), (2) and (3), and placed
local biasCorrection1 = 1 - beta1^state.t
local biasCorrection2 = 1 - beta2^state.t
after state.t = state.t + 1. With these changes, the training losses are no longer reproducible across runs, even though I used the same seed. If I add a print() between state.t = state.t + 1 and local biasCorrection1 = 1 - beta1^state.t (the exact placement is sketched below), then I get the same training losses across multiple runs. The original adam.lua produces the same results across multiple runs.
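To make the placement concrete, here is the relevant part of my modified excerpt again with the print() inserted between the step counter update and the bias corrections (what is printed is just an example; the surrounding lines are copied from the code above):

-- (3) learning rate decay (annealing)
local clr = lr / (1 + state.t*lrd)

state.t = state.t + 1

-- with a print() added at this point, e.g. of the step counter,
-- repeated runs give me identical training losses again
print(state.t)

local biasCorrection1 = 1 - beta1^state.t
local biasCorrection2 = 1 - beta2^state.t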
Does anyone have any idea what might be happening?