State-of-the-art neural video codecs have outperformed the most sophisticated traditional codecs in rate-distortion (RD) performance in certain cases. However, deploying them in practical applications remains challenging for two major reasons: 1) cross-platform computational errors arising from floating-point operations can lead to inaccurate decoding of the bitstream; 2) the high computational complexity of the encoding and decoding processes makes real-time performance difficult to achieve. In this paper, we propose a real-time cross-platform neural video codec that can efficiently decode 720P video bitstreams produced on other encoding platforms using a consumer-grade GPU. First, to resolve the codec inconsistency caused by the uncertainty of floating-point calculations across platforms, we design a calibration transmitting system that guarantees consistent quantization of entropy parameters between the encoding and decoding stages. Parameters at risk of transboundary quantization between encoding and decoding are identified in the encoding stage, and their coordinates are delivered in an auxiliary bitstream, so that these inconsistent parameters can be processed properly in the decoding stage. Furthermore, to reduce the bitrate of the auxiliary bitstream, we rectify the distribution of entropy parameters using a piecewise Gaussian constraint. Second, to meet the computational limits on the decoding side required for a real-time video codec, we design a lightweight model. A series of efficiency techniques enables our model to reach a decoding speed of 25 FPS on an NVIDIA RTX 2080 GPU. Experimental results demonstrate that our model achieves real-time decoding of 720P videos encoded on another platform. Furthermore, the real-time model achieves up to 24.2% BD-rate improvement in terms of PSNR over the H.265 anchor.
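To make the calibration idea concrete, the following is a minimal sketch and not the authors' implementation. It assumes a uniform quantizer for the entropy parameters, a hypothetical step size `STEP`, and a hypothetical safety margin `EPS` larger than any cross-platform floating-point drift. The encoder flags values that lie close to a quantization boundary and sends only their coordinates. At those coordinates, both sides apply an assumed tie-breaking rule (take the bin just below the nearest boundary), so small numerical differences can no longer flip the result. All function names are illustrative.

```python
import numpy as np

STEP = 0.25  # hypothetical quantization step for entropy parameters
EPS = 1e-3   # hypothetical margin (in units of STEP), assumed to exceed platform drift


def quantize(params, step=STEP):
    """Plain nearest-bin quantization; returns integer bin indices."""
    return np.floor(params / step + 0.5).astype(np.int64)


def find_risky_coords(params, step=STEP, eps=EPS):
    """Encoder side: flat indices of values within eps of a bin boundary."""
    frac = params.reshape(-1) / step + 0.5
    dist = np.abs(frac - np.round(frac))  # distance to nearest boundary
    return np.flatnonzero(dist < eps)


def quantize_calibrated(params, coords, step=STEP):
    """Both sides: quantize normally, but at flagged coordinates use the
    bin just below the nearest boundary (an assumed tie-breaking rule)."""
    flat = params.reshape(-1)
    q = quantize(flat, step)
    nearest_boundary = np.round(flat[coords] / step + 0.5).astype(np.int64)
    q[coords] = nearest_boundary - 1
    return q.reshape(params.shape)


if __name__ == "__main__":
    # Encoder-side parameters; two of them sit almost exactly on a boundary.
    enc = np.array([0.30, 0.125 - 1e-7, 0.80, 0.375 + 1e-7])
    # Decoder-side parameters with a tiny simulated cross-platform drift.
    dec = enc + np.array([1e-7, 2e-7, -1e-7, -2e-7])

    coords = find_risky_coords(enc)  # would travel in the auxiliary bitstream
    print("plain match:     ", np.array_equal(quantize(enc), quantize(dec)))
    print("calibrated match:", np.array_equal(
        quantize_calibrated(enc, coords), quantize_calibrated(dec, coords)))
```

In this sketch the size of the auxiliary stream grows with the number of flagged coordinates. This is consistent with the abstract's motivation for reshaping the parameter distribution so that fewer values fall near boundaries, although the piecewise Gaussian constraint itself is not modeled here.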