Vector quantization is a widely used method for mapping continuous representations to a discrete space, with important applications in tokenization for generative models, information bottlenecking, and many other machine learning tasks. The Vector Quantized Variational Autoencoder (VQ-VAE) is a variational autoencoder that uses discrete embeddings as its latent representation. We generalize this technique, enriching the probabilistic framework with a Gaussian mixture as the underlying generative model (GM-VQ). The framework leverages a codebook of latent means and adaptive variances to capture complex data distributions. This principled formulation avoids the heuristics and strong assumptions that VQ-VAE requires to address training instability and improve codebook utilization, and it integrates the benefits of both discrete and continuous representations within a variational Bayesian framework. Furthermore, by introducing the \textit{Aggregated Categorical Posterior Evidence Lower Bound} (ALBO), we offer a principled alternative optimization objective that aligns the variational distributions with the generative model. Our experiments demonstrate that GM-VQ improves codebook utilization and reduces information loss without relying on handcrafted heuristics.
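Concretely, the generative model underlying GM-VQ can be read as a Gaussian mixture over the latent space. As a schematic sketch (our notation here; the paper's exact parameterization may differ):
\[
p(z) \;=\; \sum_{k=1}^{K} \pi_k \, \mathcal{N}\!\left(z \mid \mu_k, \sigma_k^2 I\right),
\]
where the means $\mu_1, \dots, \mu_K$ constitute the codebook, the $\sigma_k^2$ are the adaptive variances, and the categorical mixture assignment plays the role of the discrete code, so quantizing a continuous encoding amounts to posterior inference over the mixture components.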