In theory, vector quantization (VQ) is always better than scalar quantization (SQ) in terms of rate-distortion (R-D) performance. Recent state-of-the-art methods for neural image compression are mainly based on nonlinear transform coding (NTC) with uniform scalar quantization, and they forgo the benefits of VQ because its complexity grows exponentially. In this paper, we first study several toy sources and show that, although modern neural networks considerably improve the compression performance of SQ through nonlinear transforms, an insurmountable gap between SQ and VQ remains. We therefore propose a VQ-centered framework for neural image compression named Nonlinear Vector Transform Coding (NVTC). NVTC addresses the critical complexity issue of VQ through (1) a multi-stage quantization strategy and (2) nonlinear vector transforms. In addition, we apply entropy-constrained VQ in the latent space to adaptively determine the quantization boundaries for joint rate-distortion optimization, which improves performance both theoretically and experimentally. Compared with previous NTC approaches, NVTC demonstrates superior rate-distortion performance, faster decoding speed, and a smaller model size. Our code is available at https://github.com/USTC-IMCL/NVTC
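To make the two quantization ideas named above more concrete, the following is a minimal sketch and not the NVTC implementation. It combines an entropy-constrained assignment rule (pick the codeword minimizing squared error plus `lam` times the ideal code length) with multi-stage quantization, where each stage quantizes the residual left by the previous one. The codebooks, probabilities, function names, and the value of `lam` are all hypothetical, chosen only for illustration; the nonlinear vector transforms are omitted entirely.

```python
import math


def ecvq_encode(x, codebook, probs, lam):
    """Return the index minimizing distortion + lam * rate for vector x."""
    best_index, best_cost = -1, math.inf
    for i, (c, p) in enumerate(zip(codebook, probs)):
        distortion = sum((xi - ci) ** 2 for xi, ci in zip(x, c))
        rate = -math.log2(p)  # ideal code length in bits for codeword i
        cost = distortion + lam * rate
        if cost < best_cost:
            best_index, best_cost = i, cost
    return best_index


def multistage_encode(x, stages, lam):
    """Quantize x stage by stage; each stage codes the previous residual."""
    residual = list(x)
    indices = []
    for codebook, probs in stages:
        i = ecvq_encode(residual, codebook, probs, lam)
        indices.append(i)
        residual = [r - c for r, c in zip(residual, codebook[i])]
    return indices


def multistage_decode(indices, stages):
    """Reconstruct by summing the selected codeword of every stage."""
    dim = len(stages[0][0][0])
    recon = [0.0] * dim
    for i, (codebook, _) in zip(indices, stages):
        recon = [a + b for a, b in zip(recon, codebook[i])]
    return recon


# Hypothetical two-stage setup: a coarse codebook followed by a fine one.
STAGES = [
    ([(0.0, 0.0), (4.0, 4.0), (4.0, 0.0), (0.0, 4.0)], [0.25, 0.25, 0.25, 0.25]),
    ([(0.0, 0.0), (1.0, 1.0), (-1.0, -1.0), (1.0, -1.0)], [0.7, 0.1, 0.1, 0.1]),
]

x = (4.9, 5.2)
for lam in (0.0, 1.0):
    idx = multistage_encode(x, STAGES, lam)
    print(lam, idx, multistage_decode(idx, STAGES))
```

With `lam = 0.0` the second stage picks the nearest fine codeword `(1.0, 1.0)`; with `lam = 1.0` it falls back to the more probable, cheaper zero codeword, which is how the rate term shifts the quantization boundaries. Two stages of 4 codewords each give up to 16 distinct reconstructions while searching only 8 codewords, which illustrates why a multi-stage design keeps the search cost manageable.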