For some reason, using this policy type produces a lot of NaNs, but only during training. It seems to work fine when I just run the environment.