In this paper, we consider the problem of bit allocation in Neural Video Compression (NVC). First, we reveal a fundamental relationship between bit allocation in NVC and Semi-Amortized Variational Inference (SAVI). Specifically, we show that SAVI with a GoP (Group-of-Pictures)-level likelihood is equivalent to pixel-level bit allocation with a precise rate \& quality dependency model. Based on this equivalence, we establish a new paradigm of bit allocation using SAVI. Unlike previous bit allocation methods, our approach requires no empirical model and is thus optimal. Moreover, as the original SAVI using gradient ascent applies only to single-level latent variables, we extend SAVI to multi-level latent models such as NVC by recursively applying back-propagation through gradient ascent. Finally, we propose a tractable approximation for practical implementation. Our method can be applied to scenarios where performance outweighs encoding speed, and it serves as an empirical bound on the R-D performance of bit allocation. Experimental results show that current state-of-the-art bit allocation algorithms still leave room for improvement of $\approx 0.5$ dB PSNR relative to ours. Code is available at \url{https://github.com/tongdaxu/Bit-Allocation-Using-Optimization}.
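The core idea of semi-amortized inference mentioned in the abstract can be illustrated with a minimal toy sketch: an amortized encoder proposes an initial latent, and that latent is then refined per input by gradient steps on a rate-distortion objective. Everything here is an assumption made for illustration and is not the authors' implementation: the element-wise linear decoder `w`, the imperfect encoder gains `a`, the standard Gaussian prior used as a rate proxy, and the function names `rd_loss`, `rd_grad`, `amortized_encoder`, and `savi_refine` are all hypothetical. The sketch covers only a single-level latent for a single input; it does not model the GoP-level likelihood or the multi-level extension described above.

```python
def rd_loss(y, x, w, lam):
    """Toy rate-distortion loss R + lam * D (assumed form, for illustration).

    Rate proxy: negative log-density of y under a standard Gaussian prior,
    up to an additive constant. Distortion: squared error of the
    element-wise linear decoder x_hat[i] = w[i] * y[i].
    """
    rate = 0.5 * sum(yi * yi for yi in y)
    dist = sum((xi - wi * yi) ** 2 for xi, wi, yi in zip(x, w, y))
    return rate + lam * dist


def rd_grad(y, x, w, lam):
    """Analytic gradient of rd_loss with respect to the latent y."""
    return [yi - 2.0 * lam * wi * (xi - wi * yi) for xi, wi, yi in zip(x, w, y)]


def amortized_encoder(x, a):
    """Hypothetical imperfect encoder: a fixed element-wise gain on x."""
    return [ai * xi for ai, xi in zip(a, x)]


def savi_refine(x, w, a, lam, steps=50, lr=0.05):
    """Start from the amortized latent, then refine it by gradient descent.

    Minimizing the loss is the same as gradient ascent on its negative,
    which plays the role of the ELBO in this toy setting.
    Returns the refined latent and the loss value after each step.
    """
    y = amortized_encoder(x, a)
    history = [rd_loss(y, x, w, lam)]
    for _ in range(steps):
        g = rd_grad(y, x, w, lam)
        y = [yi - lr * gi for yi, gi in zip(y, g)]
        history.append(rd_loss(y, x, w, lam))
    return y, history


if __name__ == "__main__":
    x = [1.0, -2.0, 0.5]
    w = [1.0, 0.5, 1.5]
    a = [0.5, 0.5, 0.5]
    y, history = savi_refine(x, w, a, lam=4.0)
    print("loss before refinement:", history[0])
    print("loss after refinement: ", history[-1])
```

Because this toy loss is quadratic, the refined latent can be checked against the closed-form minimizer $y_i^\ast = 2\lambda w_i x_i / (1 + 2\lambda w_i^2)$; in a real codec no such closed form exists, which is why iterative refinement is used at the cost of encoding time.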