Laplace smoothing for EMA codebook update #272

@VaishnaviSPatil

Hi,

I understand that to calculate the normalized weights for the embeddings, we divide by the Laplace-smoothed cluster sizes, as seen in the code here.

However, for embeddings whose cluster sizes are zero, Laplace smoothing replaces those sizes with a very small value (a function of epsilon). When these smoothed cluster sizes are used to normalize (by dividing the running ema_dw by the smoothed cluster size) and update the embeddings, the embeddings with zero cluster sizes are updated to very large values. These updated embeddings then have an even lower probability of being chosen in the future.
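For concreteness, here is a minimal sketch of the update I am describing, assuming the standard EMA codebook formulation (as in the VQ-VAE reference implementation); the variable names are illustrative, not the exact identifiers in the code linked above:

```python
import torch
import torch.nn.functional as F

# Illustrative sketch of the EMA codebook update with Laplace smoothing.
# Names (ema_cluster_size, ema_dw, embed) are my own, not the repo's.

decay, eps = 0.99, 1e-5
K, D, N = 512, 64, 1024            # codebook size, embedding dim, batch size

ema_cluster_size = torch.zeros(K)  # running count of assignments per code
ema_dw = torch.zeros(K, D)         # running sum of encoder outputs per code
embed = torch.randn(K, D)          # codebook

# One update step, given hard assignments `onehot` (N, K) and
# encoder outputs `z` (N, D):
onehot = F.one_hot(torch.randint(0, K, (N,)), K).float()
z = torch.randn(N, D)

ema_cluster_size.mul_(decay).add_(onehot.sum(0), alpha=1 - decay)
ema_dw.mul_(decay).add_(onehot.t() @ z, alpha=1 - decay)

# Laplace smoothing: a code with zero assignments gets a smoothed
# cluster size of roughly eps * n / (n + K * eps), a tiny positive number.
n = ema_cluster_size.sum()
smoothed = (ema_cluster_size + eps) / (n + K * eps) * n

# The division in question: for dead codes, ema_dw only decays toward
# zero, while `smoothed` is already tiny, so the ratio can blow up.
embed = ema_dw / smoothed.unsqueeze(1)
```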

Is my understanding of this issue correct, or am I missing something? If it is correct, is there a way to mitigate this and obtain a higher perplexity score?

Thanks!
