In this work, we first propose a fully differentiable Many-to-Many (M2M) splatting framework for efficient frame interpolation. Given a frame pair, we estimate multiple bidirectional flows and use them to forward-warp pixels directly to the desired time step, then fuse any overlapping pixels. As a result, each source pixel renders multiple target pixels and each target pixel can be synthesized from a larger area of visual context, establishing a many-to-many splatting scheme that is robust to undesirable artifacts. For each input frame pair, M2M incurs only minuscule computational overhead when interpolating an arbitrary number of in-between frames, and thus achieves fast multi-frame interpolation. However, directly warping and fusing pixels in the intensity domain is sensitive to the quality of motion estimation and may suffer from limited representational capacity. To improve interpolation accuracy, we further extend M2M into an M2M++ framework by introducing a flexible Spatial Selective Refinement (SSR) component, which allows computational efficiency to be traded for interpolation quality and vice versa. Instead of refining the entire interpolated frame, SSR processes only the difficult regions selected under the guidance of an estimated error map, thereby avoiding redundant computation. Evaluation on multiple benchmark datasets shows that our method improves efficiency while maintaining competitive video interpolation quality, and that its computational cost can be adjusted up or down as needed.
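To make the many-to-many splatting idea concrete, the following is a minimal NumPy sketch, not the implementation evaluated in this work. It assumes linear motion (each flow is scaled by the time step `t`), uses nearest-neighbor rounding for brevity (a fully differentiable version would instead distribute each pixel bilinearly over its neighbors), and fuses overlapping pixels by a weighted average. The function name `splat_many_to_many` and the `weights` argument are hypothetical and introduced only for illustration.

```python
import numpy as np

def splat_many_to_many(frame, flows, t, weights=None):
    """Forward-warp `frame` (H, W, C) to time t using K flows (K, H, W, 2).

    Illustrative sketch: each source pixel is splatted K times (once per
    flow), and overlapping contributions at a target pixel are fused by a
    weighted average. Returns the warped frame and a mask of filled pixels.
    """
    h, w, c = frame.shape
    k = flows.shape[0]
    if weights is None:
        # Assumption: uniform fusion weights when none are supplied.
        weights = np.ones((k, h, w), dtype=np.float64)

    acc = np.zeros((h, w, c), dtype=np.float64)   # weighted color sum
    norm = np.zeros((h, w), dtype=np.float64)     # sum of weights
    ys, xs = np.mgrid[0:h, 0:w]

    for i in range(k):
        # Linear-motion assumption: scale the full flow by t, then round
        # to the nearest target pixel (non-differentiable simplification).
        tx = np.rint(xs + t * flows[i, ..., 0]).astype(int)
        ty = np.rint(ys + t * flows[i, ..., 1]).astype(int)
        valid = (tx >= 0) & (tx < w) & (ty >= 0) & (ty < h)
        wgt = weights[i][valid]
        # np.add.at accumulates correctly when several sources hit one target.
        np.add.at(acc, (ty[valid], tx[valid]), frame[valid] * wgt[:, None])
        np.add.at(norm, (ty[valid], tx[valid]), wgt)

    out = np.zeros_like(acc)
    filled = norm > 0
    out[filled] = acc[filled] / norm[filled][:, None]
    return out, filled
```

In this sketch the flows are consumed only inside the splatting loop, so producing another in-between frame amounts to calling the function again with a different `t`, which illustrates why the cost per additional frame can stay small once the flows are estimated.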
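The region selection behind Spatial Selective Refinement can be sketched in the same spirit. The abstract does not specify the selection criterion, so the example below makes its own assumptions: the error map is split into non-overlapping square patches, each patch is scored by its mean estimated error, and the highest-scoring fraction is selected for refinement. The names `select_refinement_patches`, `patch`, and `ratio` are hypothetical.

```python
import numpy as np

def select_refinement_patches(error_map, patch=4, ratio=0.25):
    """Return (row, col) patch-grid indices of the highest-error patches.

    Illustrative sketch: `ratio` is the fraction of patches to refine, so a
    larger value spends more compute on refinement and a smaller one less.
    At least one patch is always returned.
    """
    h, w = error_map.shape
    gh, gw = h // patch, w // patch
    # Mean error per non-overlapping patch (any ragged border is cropped).
    tiles = error_map[:gh * patch, :gw * patch].reshape(gh, patch, gw, patch)
    score = tiles.mean(axis=(1, 3))
    k = max(1, int(round(ratio * gh * gw)))
    # Flattened indices of the k largest scores, highest first.
    top = np.argsort(score, axis=None)[::-1][:k]
    return np.stack(np.unravel_index(top, score.shape), axis=1)
```

Only the returned patches would then be passed to a refinement network, while the remaining regions keep the splatted result unchanged.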