This work examines the challenges of training neural networks with vector quantization using straight-through estimation. We find that a primary cause of training instability is the discrepancy between the distribution of the model embeddings and that of the code vectors. We identify the factors that contribute to this issue, including the sparsity of the codebook gradients and the asymmetric nature of the commitment loss, which lead to misaligned code-vector assignments. We propose to address this issue via an affine re-parameterization of the code vectors. Additionally, we introduce an alternating optimization scheme to reduce the gradient error introduced by straight-through estimation. Moreover, we propose an improvement to the commitment loss to ensure better alignment between the codebook representation and the model embedding. These optimization methods improve the mathematical approximation of straight-through estimation and, ultimately, model performance. We demonstrate the effectiveness of our methods on several common model architectures, such as AlexNet, ResNet, and ViT, across various tasks, including image classification and generative modeling.
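To make the gradient-sparsity argument concrete, the following is a minimal sketch in plain Python, not the authors' implementation. It assumes one possible form of affine re-parameterization, in which every code vector is written as `scale * base_code + shift` with a single scalar `scale` and a single `shift` vector shared by all codes. The function names, the toy data, and the step size of 0.5 are all hypothetical. Under these assumptions, a gradient step on the shared shift moves every code vector toward the embeddings, including a code that was never selected and would otherwise receive no gradient.

```python
def affine_codes(base_codes, scale, shift):
    """Apply one shared affine map (assumed form) to every base code vector."""
    return [[scale * c + b for c, b in zip(code, shift)] for code in base_codes]


def nearest(z, codes):
    """Index of the code vector closest to z in squared Euclidean distance."""
    def sq_dist(code):
        return sum((zi - ci) ** 2 for zi, ci in zip(z, code))
    return min(range(len(codes)), key=lambda i: sq_dist(codes[i]))


def shift_gradient(embeddings, codes):
    """Gradient of mean ||z - q(z)||^2 w.r.t. the shared shift, assignments held fixed."""
    dim = len(embeddings[0])
    grad = [0.0] * dim
    for z in embeddings:
        q = codes[nearest(z, codes)]
        for d in range(dim):
            grad[d] += -2.0 * (z[d] - q[d]) / len(embeddings)
    return grad


# Toy setup: the embeddings sit far from both code vectors.
base_codes = [[-1.0, 0.0], [1.0, 0.0]]
embeddings = [[3.5, 5.0], [6.5, 5.0]]
scale, shift = 1.0, [0.0, 0.0]

codes = affine_codes(base_codes, scale, shift)
before = [nearest(z, codes) for z in embeddings]  # code 0 is never selected

# One gradient step on the shared shift only (hypothetical step size 0.5).
grad = shift_gradient(embeddings, codes)
shift = [b - 0.5 * g for b, g in zip(shift, grad)]

codes = affine_codes(base_codes, scale, shift)
after = [nearest(z, codes) for z in embeddings]  # both codes are now in use

print(before, after, codes)
```

In this toy case, updating the individual code vectors directly would leave code 0 untouched, because no embedding is assigned to it. The shared shift instead carries the whole codebook toward the embedding distribution in a single step.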