If a parameter does not affect a particular component of the model, there is no point re-evaluating that component to obtain the derivative of the overall model. For example, take an `a1 + a2` model and imagine wanting the derivative of the output w.r.t. `a1.p1`. Under the hood, the last output of `a2` should be cached and reused (provided no other parameters were altered). I suppose this is the equivalent of putting `AutoCache` on everything?
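To make the caching idea concrete, here is a minimal Python sketch. `Component`, `SumModel` and `partial_derivative` are hypothetical names for illustration only, not the library's API, and the derivative is taken by central finite differences as an assumption. I'm not claiming this matches how `AutoCache` actually works; it only shows that when `a1.p1` is perturbed, `a2` is evaluated once and then served from its cache.

```python
import math


class Component:
    """Hypothetical model component that caches its last output.

    The cache key is (x, parameter values), so the component is only
    re-evaluated when x or one of its own parameters changes.
    """

    def __init__(self, func, **params):
        self.func = func
        self.params = dict(params)
        self.n_evals = 0  # number of times func was actually called
        self._cache_key = None
        self._cache_value = None

    def __call__(self, x):
        key = (x, tuple(sorted(self.params.items())))
        if key != self._cache_key:
            self.n_evals += 1
            self._cache_value = self.func(x, **self.params)
            self._cache_key = key
        return self._cache_value


class SumModel:
    """Hypothetical `a1 + a2` composite model."""

    def __init__(self, a1, a2):
        self.a1 = a1
        self.a2 = a2

    def __call__(self, x):
        return self.a1(x) + self.a2(x)


def partial_derivative(model, component, name, x, h=1e-6):
    """Central finite difference of model(x) w.r.t. component.params[name]."""
    original = component.params[name]
    component.params[name] = original + h
    f_plus = model(x)
    component.params[name] = original - h
    f_minus = model(x)
    component.params[name] = original  # restore the parameter value
    return (f_plus - f_minus) / (2 * h)


# Usage: a1 = p1 * x**2, a2 = sin(p2 * x)
a1 = Component(lambda x, p1: p1 * x**2, p1=3.0)
a2 = Component(lambda x, p2: math.sin(p2 * x), p2=0.5)
model = SumModel(a1, a2)

model(2.0)  # first call evaluates both components once
d = partial_derivative(model, a1, "p1", 2.0)  # approximately x**2 = 4.0

# a1 was re-evaluated for each perturbed p1; a2 came from its cache.
print(d, a1.n_evals, a2.n_evals)
```

Keying the cache on the parameter values (rather than a "dirty" flag) keeps the sketch simple, at the cost of building a tuple on every call.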
Alternatively, since the derivative of `f(x) + c` w.r.t. `x` is just `f'(x)`, additive components that do not depend on the parameter can be dropped entirely without changing the result of the derivative.
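A small Python check of this point, using the same hypothetical `a1 = p1 * x**2` and `a2 = sin(p2 * x)` components and central finite differences (both assumptions for illustration). Because `a2` is constant in `p1`, differentiating `a1 + a2` gives the same result as differentiating `a1` alone, so `a2` would not even need to be evaluated, let alone cached.

```python
import math


def a1(x, p1):
    return p1 * x**2


def a2(x, p2):
    return math.sin(p2 * x)


def central_diff(f, p, h=1e-6):
    """Central finite-difference derivative of f at p."""
    return (f(p + h) - f(p - h)) / (2 * h)


x, p1, p2 = 2.0, 3.0, 0.5

# Derivative of the full a1 + a2 model w.r.t. p1 ...
d_full = central_diff(lambda p: a1(x, p) + a2(x, p2), p1)

# ... and of a1 alone: a2 plays the role of the constant c and drops out.
d_a1_only = central_diff(lambda p: a1(x, p), p1)

print(d_full, d_a1_only)  # both approximately x**2 = 4.0
```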