Hi, I think there is a bug at the line referenced below:
For an RNN model whose last layer before the softmax has shape `[B * N * D]` with time steps `N > 1`, I believe the `squeeze` does not have any effect. Maybe it is meant for batch size `B = 1`? If that is the case, `squeeze(0)` might be a better choice.
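To make the `squeeze` behaviour concrete, here is a minimal sketch. The tensors and sizes are made up for illustration and are not taken from `nce.py`; it only assumes standard `torch.Tensor.squeeze` semantics.

```python
import torch

B, N, D = 4, 3, 5  # hypothetical sizes, not taken from nce.py

# No dimension has size 1, so squeeze() leaves the shape unchanged.
full = torch.zeros(B, N, D)
full_squeezed = full.squeeze()

# With B = 1 and N = 1, squeeze() drops both unit dimensions,
# while squeeze(0) drops only the batch dimension.
single = torch.zeros(1, 1, D)
single_squeezed_all = single.squeeze()
single_squeezed_dim0 = single.squeeze(0)

print(tuple(full_squeezed.shape))
print(tuple(single_squeezed_all.shape))
print(tuple(single_squeezed_dim0.shape))
```

If the intent is to remove only the batch dimension, the explicit `squeeze(0)` avoids silently dropping a time dimension that happens to have size 1.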
I am using your code to predict the last state (in other words, `N = 1`). The `squeeze` here gives `model_loss.shape = (B, 1)` and `noise_loss.shape = (B,)`, so the total `loss.shape` becomes `(B, B)` through broadcasting, when I think it should be `(B, 1)`.
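The shape mismatch can be reproduced in isolation. This is only a sketch: `model_loss` and `noise_loss` are stand-in zero tensors with the shapes reported above, not the values computed in `nce.py`, and the `view_as` call is one possible way to align them, not necessarily the fix the repository should adopt.

```python
import torch

B = 4  # hypothetical batch size

model_loss = torch.zeros(B, 1)  # stand-in with the reported shape (B, 1)
noise_loss = torch.zeros(B)     # stand-in with the reported shape (B,)

# (B, 1) + (B,) broadcasts (B,) as (1, B), producing a (B, B) result.
broadcast_sum = model_loss + noise_loss

# Reshaping noise_loss to match model_loss keeps the sum at (B, 1).
aligned_sum = model_loss + noise_loss.view_as(model_loss)

print(tuple(broadcast_sum.shape))
print(tuple(aligned_sum.shape))
```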
Pytorch-NCE/nce.py, line 198 in 862afc6