In existing restoration-oriented Video Frame Interpolation (VFI) approaches, motion estimation between neighboring frames plays a crucial role. However, estimation accuracy remains a challenge, primarily because of the inherent ambiguity in identifying corresponding areas in adjacent frames for interpolation. Distinguishing different regions before motion estimation is therefore important for improving accuracy. In this paper, we introduce a novel solution that applies open-world segmentation models, e.g., SAM2 (Segment Anything Model 2), to the input frames to derive Region-Distinguishable Priors (RDPs). These RDPs are represented as spatially varying Gaussian mixtures, which distinguish an arbitrary number of areas in a single unified modality. RDPs can be integrated into existing motion-based VFI methods to enhance the features used for motion estimation, through our plug-and-play Hierarchical Region-aware Feature Fusion Module (HRFFM). HRFFM incorporates RDPs into multiple hierarchical stages of the VFI encoder, using RDP-guided Feature Normalization (RDPFN) in a residual learning manner. With HRFFM and RDPs, the features within the VFI encoder exhibit similar representations for matched regions in neighboring frames, thus improving the synthesis of intermediate frames. Extensive experiments demonstrate that HRFFM consistently enhances VFI performance across various scenes.
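To make the two ideas in the abstract concrete, the following is a minimal single-channel sketch, not the authors' implementation. It assumes that a segmentation model has already produced an integer label grid, that each region is assigned its own Gaussian from which its pixels are sampled, and that the prior modulates normalized features through a per-pixel scale and shift that is added back to the input. The function names `region_prior` and `prior_guided_norm`, the scalar weights `w_gamma` and `w_beta`, and the parameter ranges are all hypothetical; in a real network the scale and shift would be learned, multi-channel projections.

```python
import random
import statistics


def region_prior(labels, seed=0):
    """Map an integer label grid (H x W) to a single-channel prior grid.

    Assumption: region k gets its own Gaussian (mean_k, std_k), and every
    pixel of that region draws its value from it.
    """
    rng = random.Random(seed)
    ids = sorted({k for row in labels for k in row})
    params = {k: (rng.uniform(-1.0, 1.0), rng.uniform(0.05, 0.2)) for k in ids}
    return [[rng.gauss(*params[k]) for k in row] for row in labels]


def prior_guided_norm(feat, prior, w_gamma, w_beta, eps=1e-5):
    """Residual, prior-modulated normalization of one feature channel (H x W)."""
    flat = [v for row in feat for v in row]
    mean = statistics.fmean(flat)
    std = statistics.pstdev(flat)
    out = []
    for f_row, p_row in zip(feat, prior):
        out.append([
            # The prior sets a per-pixel scale and shift on the normalized
            # feature; the original feature is added back (residual path).
            f + (w_gamma * p) * (f - mean) / (std + eps) + w_beta * p
            for f, p in zip(f_row, p_row)
        ])
    return out


# Hypothetical toy input: two regions (left and right half of a 4x4 frame).
labels = [[0, 0, 1, 1] for _ in range(4)]
prior = region_prior(labels)

rng = random.Random(1)
feat = [[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(4)]

out = prior_guided_norm(feat, prior, w_gamma=0.1, w_beta=0.1)
print(len(out), len(out[0]))  # 4 4
```

Because the modulation is residual, setting both weights to zero returns the input features unchanged. That property is what would let such a module be attached to an existing encoder without disturbing it at initialization.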