In this paper, we consider the problem of bit allocation in Neural Video Compression (NVC). First, we reveal a fundamental relationship between bit allocation in NVC and Semi-Amortized Variational Inference (SAVI). Specifically, we show that SAVI with a GoP (Group-of-Pictures)-level likelihood is equivalent to pixel-level bit allocation with a precise rate \& quality dependency model. Based on this equivalence, we establish a new paradigm of bit allocation using SAVI. Unlike previous bit allocation methods, our approach requires no empirical model and is thus optimal. Moreover, as the original SAVI using gradient ascent only applies to a single-level latent, we extend SAVI to multi-level latents, as in NVC, by recursively applying back-propagation through gradient ascent. Finally, we propose a tractable approximation for practical implementation. Our method can be applied to scenarios where performance outweighs encoding speed, and serves as an empirical bound on the R-D performance of bit allocation. Experimental results show that current state-of-the-art bit allocation algorithms still have room for an improvement of $\approx 0.5$ dB PSNR compared with ours. Code is available at \url{https://github.com/tongdaxu/Bit-Allocation-Using-Optimization}.
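To make the semi-amortized idea concrete, the following is a minimal sketch on a toy problem, not the NVC method itself. It assumes a single-level latent, a linear decoder `W`, a deliberately crude linear encoder `E`, a Gaussian prior as the rate proxy, and squared error as distortion; all names (`rd_loss`, `savi_refine`, `lam`) are hypothetical. The encoder output is used only as an initialization, and the latent is then refined per input by gradient steps on the rate-distortion objective (written here as descent on a loss, which corresponds to ascent on the negated objective).

```python
import numpy as np

def rd_loss(z, x, W, lam):
    """Toy rate-distortion objective: rate proxy + lam * distortion."""
    rate = 0.5 * np.sum(z ** 2)            # -log N(z; 0, I) up to a constant
    distortion = np.sum((x - W @ z) ** 2)  # squared reconstruction error
    return rate + lam * distortion

def rd_grad(z, x, W, lam):
    """Analytic gradient of rd_loss with respect to the latent z."""
    return z - 2.0 * lam * W.T @ (x - W @ z)

def savi_refine(x, W, E, lam=10.0, steps=1000):
    """Semi-amortized inference: amortized init, then per-sample refinement."""
    z = E @ x  # amortized step: the encoder's one-shot guess
    # The loss is quadratic, so its Hessian is constant; a step size of
    # 1 / (largest eigenvalue) guarantees a monotonically decreasing loss.
    hessian = np.eye(W.shape[1]) + 2.0 * lam * W.T @ W
    step = 1.0 / np.linalg.eigvalsh(hessian).max()
    for _ in range(steps):
        z = z - step * rd_grad(z, x, W, lam)
    return z

lam = 10.0
rng = np.random.default_rng(0)
W = rng.normal(size=(8, 3))   # toy linear decoder (assumption)
E = 0.05 * W.T                # crude amortized encoder (assumption)
x = rng.normal(size=8)        # one toy "frame"

z_init = E @ x
z_ref = savi_refine(x, W, E, lam=lam)

# Closed-form minimizer of the quadratic loss, used only as a reference.
z_opt = np.linalg.solve(np.eye(3) + 2.0 * lam * W.T @ W, 2.0 * lam * W.T @ x)

loss_init = rd_loss(z_init, x, W, lam)
loss_ref = rd_loss(z_ref, x, W, lam)
loss_opt = rd_loss(z_opt, x, W, lam)
print(f"amortized: {loss_init:.4f}  refined: {loss_ref:.4f}  optimum: {loss_opt:.4f}")
```

The gap between the amortized and refined losses is what per-sample optimization recovers in this toy setting. With multi-level latents, as in NVC, later latents depend on earlier ones, so a plain per-latent update like this one would not account for those dependencies.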