Object Segmentation-Assisted Inter Prediction for Versatile Video Coding

Block-based inter prediction is widely adopted in modern video coding standards and delivers high compression efficiency. Natural videos, however, usually contain multiple moving objects of arbitrary shape, which produce complex motion fields that are difficult to represent compactly. The Versatile Video Coding (VVC) standard tackles this problem with more flexible block partitioning, but the more flexible partitions cost more overhead bits to signal and still cannot take arbitrary shapes. To address this limitation, we propose an object segmentation-assisted inter prediction method (SAIP), in which objects in the reference frames are segmented using advanced segmentation techniques. With a proper indication, the object segmentation mask is translated from the reference frame to the current frame, where it serves as an arbitrary-shaped partition into different regions without any additional signaling. Using the segmentation mask, motion compensation is performed separately for each region, which achieves higher prediction accuracy. The segmentation mask is further used to code the motion vectors of the different regions more efficiently. Moreover, the segmentation mask is taken into account in the joint rate-distortion optimization of motion estimation and partition estimation, so that the motion vectors of the different regions and the partition are derived more accurately. The proposed method is implemented in the VVC reference software, VTM version 12.0. Experimental results on the common test sequences show BD-rate reductions of up to 1.98%, 1.14%, and 0.79%, and of 0.82%, 0.49%, and 0.37% on average, under the Low-delay P, Low-delay B, and Random Access configurations, respectively.
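The mask-guided motion compensation described above can be illustrated with a small sketch. This is not the VTM implementation, and the abstract does not specify these details: the function names `shift` and `saip_predict`, the binary two-region mask, the integer-pel motion vectors, the `(dy, dx)` convention, and the edge clamping (standing in for reference picture padding) are all assumptions made for illustration.

```python
def shift(frame, dy, dx):
    """Integer-pel translation with edge clamping (hypothetical helper).

    Returns out where out[y][x] = frame[y + dy][x + dx], with coordinates
    clamped to the frame, mimicking padded reference pictures.
    """
    h, w = len(frame), len(frame[0])
    return [
        [frame[min(max(y + dy, 0), h - 1)][min(max(x + dx, 0), w - 1)]
         for x in range(w)]
        for y in range(h)
    ]


def saip_predict(ref, ref_mask, mask_mv, mv_fg, mv_bg):
    """Sketch of segmentation-assisted prediction for two regions.

    ref      : reference frame (list of rows of sample values)
    ref_mask : binary object mask segmented on the reference frame
    mask_mv  : displacement that carries the mask into the current frame
               (assumed here; the abstract only says "a proper indication")
    mv_fg    : motion vector for the object region
    mv_bg    : motion vector for the remaining region
    """
    mask = shift(ref_mask, *mask_mv)   # mask in current-frame coordinates
    pred_fg = shift(ref, *mv_fg)       # motion compensation for the object
    pred_bg = shift(ref, *mv_bg)       # motion compensation for the rest
    # Pick, per sample, the prediction of the region the mask assigns it to.
    return [
        [fg if m else bg for fg, bg, m in zip(row_fg, row_bg, row_m)]
        for row_fg, row_bg, row_m in zip(pred_fg, pred_bg, mask)
    ]


# Toy example: an object (value 9) entering from the left moves one sample
# to the right over a static textured background (values 1, 2, 3).
ref = [[9, 9, 1, 2, 3] for _ in range(3)]
ref_mask = [[1, 1, 0, 0, 0] for _ in range(3)]
pred = saip_predict(ref, ref_mask, mask_mv=(0, -1), mv_fg=(0, -1), mv_bg=(0, 0))
print(pred[0])  # [9, 9, 9, 2, 3]
```

In this toy case neither motion vector alone reproduces the row `[9, 9, 9, 2, 3]`: applying `(0, -1)` to the whole block gives `[9, 9, 9, 1, 2]`, and `(0, 0)` gives `[9, 9, 1, 2, 3]`. Selecting the prediction per region through the translated mask recovers both the object and the background, which is the intuition behind performing motion compensation separately for each region.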
