Vector-quantized networks (VQNs) have exhibited remarkable performance across various tasks, yet they are prone to training instability, which complicates training by requiring techniques such as careful initialization and model distillation. In this study, we identify the local-minima issue as the primary cause of this instability. To address it, we replace the nearest-neighbor search with an optimal transport method to achieve a more globally informed assignment. We introduce OptVQ, a novel vector quantization method that employs the Sinkhorn algorithm to solve the optimal transport problem, thereby enhancing the stability and efficiency of training. To mitigate the influence of diverse data distributions on the Sinkhorn algorithm, we apply a straightforward yet effective normalization strategy. Comprehensive experiments on image reconstruction tasks demonstrate that OptVQ achieves 100% codebook utilization and surpasses current state-of-the-art VQNs in reconstruction quality.
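To make the core idea concrete, the following is a minimal NumPy sketch of how an entropy-regularized optimal transport assignment via Sinkhorn iterations could replace the usual nearest-neighbor lookup in vector quantization. The function name, the per-dimension normalization, and all hyperparameters (`eps`, `n_iters`) are illustrative assumptions, not the paper's actual implementation; enforcing a uniform marginal over the codes is what encourages full codebook utilization.

```python
import numpy as np

def sinkhorn_assign(z, codebook, eps=1.0, n_iters=50):
    """Assign features z (N, d) to codebook entries (K, d) via
    entropy-regularized optimal transport (Sinkhorn iterations).
    Hyperparameters are illustrative, not from the paper."""
    # Per-dimension normalization to reduce sensitivity to the data
    # distribution (a stand-in for the paper's normalization strategy).
    z = (z - z.mean(0)) / (z.std(0) + 1e-6)
    c = (codebook - codebook.mean(0)) / (codebook.std(0) + 1e-6)

    # Squared-L2 cost matrix between samples and codes.
    cost = np.sum(z**2, 1)[:, None] - 2.0 * z @ c.T + np.sum(c**2, 1)[None]
    cost = cost - cost.min()             # shift for numerical stability
    K = np.exp(-cost / eps)              # Gibbs kernel

    a = np.full(len(z), 1.0 / len(z))    # uniform marginal over samples
    b = np.full(len(c), 1.0 / len(c))    # uniform marginal over codes
    u = np.ones_like(a)
    for _ in range(n_iters):             # alternating marginal scaling
        v = b / (K.T @ u + 1e-12)
        u = a / (K @ v + 1e-12)

    P = u[:, None] * K * v[None]         # transport plan; rows = soft assignments
    return P.argmax(1)                   # hard code index per sample
```

In contrast to a plain `argmin` over distances, every column of the transport plan is pushed toward equal total mass, so no code can be starved during training.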