I found LOSS_CONSISTENCY in your code, but it does not appear in your paper. I ran the code on my own dataset, and LOSS_CONSISTENCY seems to be unstable during training. Is LOSS_CONSISTENCY meant to minimize the difference between the joint distribution and the marginal distributions of the multiple variational encoders?