Codebook collapse is a common problem when training deep generative models with discrete representation spaces, such as Vector Quantized Variational Autoencoders (VQ-VAEs). We observe that the same problem arises in discrete variational autoencoders (dVAEs), an alternative design whose encoder directly learns a distribution over the codebook embeddings to represent the data. We hypothesize that using the softmax function to obtain this probability distribution causes codebook collapse by assigning overconfident probabilities to the best-matching codebook elements. In this paper, we propose a novel way to incorporate evidential deep learning (EDL) in place of softmax to combat codebook collapse in dVAEs. In contrast to softmax, our approach evidentially monitors the significance of attaining the probability distribution over the codebook embeddings. Experiments on various datasets show that our model, called EdVAE, mitigates codebook collapse while improving reconstruction performance, and enhances codebook usage compared to dVAE- and VQ-VAE-based models.
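The contrast between softmax and an evidential treatment of the encoder output can be illustrated with a small NumPy sketch. This is not EdVAE's actual formulation, which the abstract does not specify. It assumes the generic EDL recipe in which non-negative evidence parameterizes a Dirichlet distribution (alpha = evidence + 1), uses softplus as a hypothetical choice of evidence function, and uses perplexity of the average code distribution as an assumed proxy for codebook usage. All function names are illustrative.

```python
import numpy as np

def softmax_probs(logits):
    """Standard softmax over the codebook axis (last axis)."""
    z = logits - logits.max(axis=-1, keepdims=True)  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def evidential_probs(logits):
    """Generic EDL-style probabilities (assumed form, not the paper's exact method).

    Evidence is made non-negative with softplus (a hypothetical choice);
    alpha = evidence + 1 gives Dirichlet concentration parameters.
    Returns the Dirichlet mean and a per-sample uncertainty K / sum(alpha).
    """
    evidence = np.logaddexp(0.0, logits)              # softplus(logits) >= 0
    alpha = evidence + 1.0
    strength = alpha.sum(axis=-1, keepdims=True)      # Dirichlet strength S
    probs = alpha / strength                          # expected probabilities
    uncertainty = logits.shape[-1] / strength         # lies in (0, 1)
    return probs, uncertainty

def perplexity(probs):
    """exp(entropy) of the batch-averaged code distribution.

    Used here as an assumed codebook-usage proxy: it equals the codebook
    size K when codes are used uniformly and approaches 1 under collapse.
    """
    avg = probs.mean(axis=0)
    entropy = -(avg * np.log(avg + 1e-12)).sum()
    return float(np.exp(entropy))

# One encoder output that strongly favors the first of four codebook entries.
logits = np.array([[5.0, 1.0, 0.0, 0.0]])
p_soft = softmax_probs(logits)
p_evid, u = evidential_probs(logits)
print("softmax max prob:   ", p_soft.max())
print("evidential max prob:", p_evid.max(), "uncertainty:", u.item())
```

On the same logits, softmax concentrates nearly all of the probability mass on the best-matching entry, while the Dirichlet mean stays flatter and carries an explicit uncertainty term. This is the kind of overconfidence the hypothesis above attributes to softmax.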