Codebook collapse is a common problem in training deep generative models with discrete representation spaces, such as Vector Quantized Variational Autoencoders (VQ-VAEs). We observe that the same problem arises for the alternatively designed discrete variational autoencoders (dVAEs), whose encoder directly learns a distribution over the codebook embeddings to represent the data. We hypothesize that using the softmax function to obtain a probability distribution causes codebook collapse by assigning overconfident probabilities to the best matching codebook elements. In this paper, we propose a novel way to incorporate evidential deep learning (EDL) in place of softmax to combat the codebook collapse problem of dVAE. In contrast to softmax, our approach evidentially monitors the significance of attaining the probability distribution over the codebook embeddings. Our experiments on various datasets show that our model, called EdVAE, mitigates codebook collapse while improving reconstruction performance, and enhances codebook usage compared to dVAE- and VQ-VAE-based models. Our code can be found at https://github.com/ituvisionlab/EdVAE.