Introduce automatic differentiation

Right now, I compute the gradient using finite differences, which takes `2N` simulations per epoch, where `N` is the dimension of the state space.
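
To make the `2N` count concrete, here is a minimal sketch. It assumes the current scheme is central differences (which is what `2N` suggests). `simulate` is a hypothetical stand-in for one full simulation that returns a scalar loss, not the project's actual function.

```python
def simulate(x):
    """Hypothetical stand-in for one full simulation; returns a scalar loss."""
    simulate.calls += 1  # count how many simulations are run
    return sum((xi - 1.0) ** 2 for xi in x)


simulate.calls = 0


def finite_difference_gradient(f, x, h=1e-6):
    """Central-difference gradient: two evaluations of f per dimension."""
    grad = []
    for i in range(len(x)):
        x_plus = list(x)
        x_plus[i] += h
        x_minus = list(x)
        x_minus[i] -= h
        grad.append((f(x_plus) - f(x_minus)) / (2.0 * h))
    return grad


x0 = [0.0, 2.0, 3.0]  # N = 3
grad = finite_difference_gradient(simulate, x0)
n_calls = simulate.calls  # 2 * N simulations for a single gradient
```

With `N = 3`, one gradient costs six calls to `simulate`, and that cost grows linearly with `N`.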

Using reverse-mode automatic differentiation would reduce this burden to a single simulation per epoch (one forward pass plus one backward pass), independent of `N`.
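
A rough sketch of what this could look like, assuming the simulation can be written with JAX operations. The `simulate` function below is a hypothetical toy (explicit Euler on `dx/dt = -x` followed by a scalar loss), not the real simulation; `n_steps` and `dt` are illustrative parameters.

```python
import jax
import jax.numpy as jnp


def simulate(x0, n_steps=50, dt=0.1):
    """Hypothetical toy simulation: explicit Euler on dx/dt = -x, then a scalar loss."""
    x = x0
    for _ in range(n_steps):
        x = x + dt * (-x)
    return jnp.sum(x ** 2)


# Reverse-mode AD: one forward simulation plus one backward pass
# yields the loss and the gradient with respect to every entry of x0.
loss_and_grad = jax.value_and_grad(simulate)

x0 = jnp.array([1.0, 2.0, 3.0])
loss, grad = loss_and_grad(x0)
```

The backward pass typically costs a small constant multiple of the forward simulation and needs the intermediate states kept in memory, so the saving over finite differences grows with `N`.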