Vector Quantization (VQ) is a well-known technique in deep learning for extracting informative discrete latent representations. VQ-embedded models have shown impressive results in a range of applications, including image and speech generation. VQ operates as a parametric K-means algorithm that quantizes each input using a single codebook vector in the forward pass. While powerful, this technique faces practical challenges, including codebook collapse, non-differentiability, and lossy compression. To mitigate these issues, we propose Soft Convex Quantization (SCQ) as a direct substitute for VQ. SCQ works like a differentiable convex optimization (DCO) layer: in the forward pass, we solve for the optimal convex combination of codebook vectors that quantizes the inputs. In the backward pass, we leverage differentiability through the optimality conditions of the forward solution. We then introduce a scalable relaxation of the SCQ optimization and demonstrate its efficacy on the CIFAR-10, GTSRB, and LSUN datasets. We train powerful SCQ autoencoder models that significantly outperform matched VQ-based architectures, observing order-of-magnitude improvements in image reconstruction and codebook usage with comparable quantization runtime.
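To make the contrast between the two forward passes concrete, the following is a minimal sketch in plain Python. It is not the authors' implementation: the function names (`vq_quantize`, `scq_quantize`, `project_to_simplex`), the toy codebook, and the choice of projected gradient descent as the solver are all assumptions made for illustration. The paper describes a DCO layer and a scalable relaxation, neither of which is reproduced here, and the backward pass is omitted.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def vq_quantize(z, codebook):
    """Hard VQ: pick the single nearest codebook vector (K-means-style assignment)."""
    idx = min(range(len(codebook)), key=lambda k: squared_distance(z, codebook[k]))
    return idx, list(codebook[idx])


def project_to_simplex(v):
    """Euclidean projection of v onto {p : p >= 0, sum(p) = 1} (sort-based method)."""
    u = sorted(v, reverse=True)
    cumulative, theta = 0.0, 0.0
    for j, u_j in enumerate(u, start=1):
        cumulative += u_j
        t = (cumulative - 1.0) / j
        if u_j - t > 0:
            theta = t  # keep the threshold from the last index that satisfies the test
    return [max(x - theta, 0.0) for x in v]


def combine(weights, codebook):
    """Weighted sum of codebook vectors."""
    dim = len(codebook[0])
    return [sum(w * c[d] for w, c in zip(weights, codebook)) for d in range(dim)]


def scq_quantize(z, codebook, steps=500, lr=0.1):
    """Soft quantization sketch: minimize 0.5 * ||sum_k p_k c_k - z||^2 over the simplex.

    Assumption: solved with projected gradient descent, a generic stand-in
    for the convex solver used in the paper.
    """
    k = len(codebook)
    weights = [1.0 / k] * k  # start from the uniform convex combination
    for _ in range(steps):
        residual = [q - x for q, x in zip(combine(weights, codebook), z)]
        # Gradient w.r.t. p_k is the inner product <c_k, residual>.
        grad = [sum(c_d * r_d for c_d, r_d in zip(c, residual)) for c in codebook]
        weights = project_to_simplex([w - lr * g for w, g in zip(weights, grad)])
    return weights, combine(weights, codebook)


# Hypothetical toy data: a 3-entry codebook in 2-D and one input inside its convex hull.
codebook = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
z = [0.2, 0.6]

hard_index, hard_output = vq_quantize(z, codebook)
soft_weights, soft_output = scq_quantize(z, codebook)

print("VQ  index:", hard_index, "error:", squared_distance(hard_output, z))
print("SCQ weights:", soft_weights, "error:", squared_distance(soft_output, z))
```

In this toy case the input lies inside the convex hull of the codebook, so the convex combination can reconstruct it almost exactly, while the single nearest codebook vector cannot. Inputs outside the hull would still incur some error under either scheme.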