SplatFlow: Learning Multi-frame Optical Flow via Splatting

Occlusion remains a central challenge in optical flow estimation (OFE). Despite the significant recent progress brought by deep learning, most deep learning OFE methods still struggle with occlusions; two-frame methods in particular cannot handle them correctly because occluded regions have no visual correspondences. Multi-frame settings can potentially mitigate this issue. However, multi-frame OFE (MOFE) remains underexplored, and the few existing studies are either designed specifically for pyramid backbones or obtain the previous frame's aligned features, such as the correlation volume and optical flow, through time-consuming backward flow calculation or a non-differentiable forward warping transformation. This study proposes SplatFlow, an efficient MOFE framework that addresses these shortcomings. SplatFlow introduces a differentiable splatting transformation to align the previous frame's motion feature and designs a Final-to-All embedding method to feed the aligned motion feature into the current frame's estimation, thereby remodeling existing two-frame backbones. SplatFlow is both efficient and more accurate, as it handles occlusions properly. Extensive experimental evaluations show that SplatFlow substantially outperforms all published methods on the KITTI2015 and Sintel benchmarks. On the Sintel benchmark in particular, SplatFlow achieves errors of 1.12 (clean pass) and 2.07 (final pass), error reductions of 19.4% and 16.2%, respectively, relative to the previous best submitted results. The code for SplatFlow is available at https://github.com/wwsource/SplatFlow.
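To make the idea of a splatting transformation concrete, the following is a minimal NumPy sketch of bilinear summation splatting, one common form of forward splatting. It is an illustration under stated assumptions, not the SplatFlow implementation: the function name `splat_forward`, the `(H, W, C)` feature layout, and the `(dx, dy)` flow convention are all hypothetical choices made here. Each source pixel is pushed along its flow vector and its feature is distributed over the four nearest target pixels with bilinear weights. Because those weights are piecewise-linear in the flow, the same scatter-add written in an autodiff framework admits gradients with respect to both the features and the flow; this NumPy version shows only the forward pass.

```python
import numpy as np

def splat_forward(feat, flow):
    """Bilinear summation splatting (illustrative sketch, not the paper's code).

    feat: (H, W, C) features of the previous frame.
    flow: (H, W, 2) displacement per source pixel, assumed ordered as (dx, dy).
    Returns the splatted features (H, W, C) and the accumulated weights (H, W).
    """
    h, w, _ = feat.shape
    ys, xs = np.mgrid[0:h, 0:w]
    tx = xs + flow[..., 0]  # target x coordinate of every source pixel
    ty = ys + flow[..., 1]  # target y coordinate of every source pixel
    x0 = np.floor(tx).astype(int)
    y0 = np.floor(ty).astype(int)

    out = np.zeros(feat.shape, dtype=np.float64)
    weight = np.zeros((h, w), dtype=np.float64)
    # Visit the four integer neighbours of each (tx, ty) target location.
    for dy in (0, 1):
        for dx in (0, 1):
            xi, yi = x0 + dx, y0 + dy
            wgt = (1.0 - np.abs(tx - xi)) * (1.0 - np.abs(ty - yi))
            valid = (xi >= 0) & (xi < w) & (yi >= 0) & (yi < h)
            # np.add.at accumulates correctly when several sources hit one target.
            np.add.at(out, (yi[valid], xi[valid]), feat[valid] * wgt[valid][:, None])
            np.add.at(weight, (yi[valid], xi[valid]), wgt[valid])
    return out, weight

# Usage: shift a small feature map one pixel to the right.
feat = np.arange(12, dtype=np.float64).reshape(3, 4, 1)
flow = np.zeros((3, 4, 2))
flow[..., 0] = 1.0
shifted, hits = splat_forward(feat, flow)
```

Target pixels that receive no contribution keep a weight of zero (the first column in the usage example), which is one simple way to flag regions that have no counterpart in the previous frame. Whether and how the accumulated weights are used for normalization is a design choice left open in this sketch.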
