Optical Flow Estimation in 360$^\circ$ Videos: Dataset, Model and Application

Optical flow estimation has been a long-standing and fundamental problem in the computer vision community. However, despite the advances of optical flow estimation in perspective videos, its 360$^\circ$ video counterpart remains in its infancy, primarily due to the shortage of benchmark datasets and the failure of existing methods to accommodate the omnidirectional nature of 360$^\circ$ videos. We propose the first perceptually realistic 360$^\circ$ field-of-view video benchmark dataset, namely FLOW360, with 40 different videos and 4,000 video frames. We then conduct a comprehensive characteristic analysis and extensive comparisons with existing datasets, demonstrating FLOW360's perceptual realism, uniqueness, and diversity. Moreover, we present a novel Siamese representation Learning framework for Omnidirectional Flow (SLOF) estimation, which is trained in a contrastive manner via a hybrid loss that combines siamese contrastive and optical flow losses. By training the model on random rotations of the input omnidirectional frames, our proposed contrastive scheme accommodates the omnidirectional nature of optical flow estimation in 360$^\circ$ videos, resulting in significantly reduced prediction errors. We further demonstrate the effectiveness of the learning scheme by extending our siamese learning scheme and omnidirectional optical flow estimation to the egocentric activity recognition task, where the classification accuracy is boosted up to $\sim$26%. To summarize, we study optical flow estimation in 360$^\circ$ videos from the perspectives of benchmark dataset, learning model, and practical application. The FLOW360 dataset and code are available at https://siamlof.github.io.
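The hybrid objective described above can be illustrated with a minimal NumPy sketch. It is not the SLOF implementation: the abstract does not give the loss formulas, so the endpoint-error flow term, the negative-cosine similarity term, the `weight` parameter, and all function names below are assumptions chosen for illustration. The sketch also restricts the random rotation to a yaw rotation, which in an equirectangular frame is a circular shift along the width axis; general 3D rotations would require spherical resampling and a change of flow coordinates.

```python
import numpy as np


def yaw_rotate(equirect, shift_px):
    """Rotate an equirectangular array (H, W, C) about the vertical axis.

    A yaw rotation corresponds to a circular shift along the width axis.
    """
    return np.roll(equirect, shift_px, axis=1)


def endpoint_error(pred_flow, gt_flow):
    """Mean Euclidean distance between predicted and ground-truth flow vectors."""
    return float(np.mean(np.linalg.norm(pred_flow - gt_flow, axis=-1)))


def negative_cosine(feat_a, feat_b, eps=1e-8):
    """Negative cosine similarity between two flattened representations.

    Assumed stand-in for the siamese contrastive term; -1 means identical direction.
    """
    a, b = feat_a.ravel(), feat_b.ravel()
    return float(-np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + eps))


def hybrid_loss(pred_a, pred_b, gt_a, gt_b, feat_a, feat_b, weight=1.0):
    """Flow loss averaged over the two rotated views plus a weighted similarity term.

    `weight` is a hypothetical balancing factor, not a value from the paper.
    """
    flow_term = 0.5 * (endpoint_error(pred_a, gt_a) + endpoint_error(pred_b, gt_b))
    similarity_term = negative_cosine(feat_a, feat_b)
    return flow_term + weight * similarity_term
```

In a training loop, the two views would be the same frame pair under two different random rotations, with the ground-truth flow rotated consistently. For a pure yaw rotation the flow field is shifted by the same number of pixels and its vector values are unchanged.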
