In this paper, we propose an approach for view-time interpolation of stereo videos. We build upon X-Fields, which uses a convolutional decoder to approximate an interpolatable mapping between input coordinates and 2D RGB images. Our main contribution is to identify the sources of the problems that arise when applying X-Fields to this setting and to propose novel techniques that overcome them. Specifically, we observe that X-Fields struggles to implicitly interpolate disparities for large-baseline cameras. We therefore propose multi-plane disparities to reduce the spatial distance of objects between the stereo views. Moreover, we propose non-uniform time coordinates to handle non-linear motion and sudden motion spikes in videos. We additionally introduce several simple but important improvements over X-Fields. We demonstrate that our approach produces better results than the state of the art, while running at near real-time rates and having low memory and storage costs.