Vector Quantized Variational Autoencoders (VQ-VAEs) leverage self-supervised learning through reconstruction tasks to represent continuous vectors by the closest vectors in a codebook. However, issues such as codebook collapse persist in VQ models. To address them, existing approaches employ implicit static codebooks or jointly optimize the entire codebook, but both strategies constrain the codebook's learning capability and reduce reconstruction quality. In this paper, we propose Group-VQ, which performs group-wise optimization on the codebook: each group is optimized independently, while codes within a group are optimized jointly. This design improves the trade-off between codebook utilization and reconstruction performance. Additionally, we introduce a training-free codebook resampling method that allows the codebook size to be adjusted after training. In image reconstruction experiments under various settings, Group-VQ demonstrates improved performance on reconstruction metrics, and the post-training codebook resampling method provides the desired flexibility in adjusting the codebook size.
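To make the core idea concrete, the following is a minimal NumPy sketch of group-wise codebook optimization. It is an illustration under stated assumptions, not the paper's implementation: the codebook is partitioned into equal-sized groups, quantization is the standard nearest-neighbor lookup over the full codebook, and the per-group update shown here is a simple centroid-style step applied independently to each group. All function names (`quantize`, `group_update`) and the update rule are hypothetical stand-ins for the actual Group-VQ optimization.

```python
import numpy as np

rng = np.random.default_rng(0)

K, G, D = 64, 4, 8                    # codebook size, number of groups, code dim
codebook = rng.normal(size=(K, D))
groups = np.split(np.arange(K), G)    # partition code indices into G groups

def quantize(z):
    """Standard VQ lookup: map each row of z to its nearest codebook entry."""
    d = ((z[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)  # (N, K) distances
    idx = d.argmin(1)
    return codebook[idx], idx

def group_update(z, idx, lr=0.5):
    """Illustrative per-group step: each group independently moves its codes
    toward the mean of the inputs assigned to them; other groups are untouched."""
    for g in groups:
        for k in g:
            sel = idx == k
            if sel.any():
                codebook[k] += lr * (z[sel].mean(0) - codebook[k])

z = rng.normal(size=(32, D))
zq, idx = quantize(z)                 # quantized vectors and code assignments
group_update(z, idx)                  # one independent update per group
```

Because each group's update touches only its own rows of the codebook, the groups can in principle be optimized with separate objectives or schedules, which is the degree of freedom Group-VQ exploits.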