https://code-first-ml.github.io/book2/notebooks/ml_softwares/2017-08-12-linear-regression-adagrad-vs-gd.html
Also see: the "Why Momentum Works" blog post on Distill.