Stereo video retargeting aims to resize a stereo video to a desired aspect ratio. The quality of a retargeted video depends heavily on its spatial, temporal, and disparity coherence, all of which can be degraded by the retargeting process. Due to the lack of a publicly accessible annotated dataset, there is little research on deep learning-based methods for stereo video retargeting. This paper proposes an unsupervised deep learning-based stereo video retargeting network. Our model first detects the salient objects and then shifts and warps all objects so as to minimize the distortion of the salient parts of the stereo frames. We use 1D convolution for shifting the salient objects and design a stereo video Transformer to assist the retargeting process. To train the network, we use the parallax attention mechanism to fuse the left and right views and feed the retargeted frames to a reconstruction module that maps them back to the input frames. Because the training signal comes from reconstructing the inputs rather than from annotations, the network is trained in an unsupervised manner. Extensive qualitative and quantitative experiments and ablation studies on the KITTI Stereo 2012 and 2015 datasets demonstrate the advantages of the proposed method over existing state-of-the-art methods. The code is available at https://github.com/z65451/SVR/.
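To illustrate why a 1D convolution is a natural operator for shifting content horizontally, the following minimal sketch convolves each image row with a one-hot kernel, which translates the row by a fixed number of pixels. This is not the authors' implementation: the abstract does not specify the layers, and a trained network would learn its kernels rather than use a hand-built one. The names `shift_kernel` and `shift_rows` and the `radius` parameter are hypothetical and chosen only for this example.

```python
import numpy as np


def shift_kernel(shift, radius):
    """Build a one-hot 1D kernel of length 2*radius + 1 (hypothetical helper).

    Convolving a row with this kernel moves its content by `shift` pixels
    (positive values move content to the right).
    """
    if abs(shift) > radius:
        raise ValueError("shift must satisfy |shift| <= radius")
    kernel = np.zeros(2 * radius + 1)
    kernel[radius + shift] = 1.0
    return kernel


def shift_rows(image, shift, radius=3):
    """Shift every row of a 2D array horizontally via 1D convolution.

    Assumes the image width is at least the kernel length, so that
    mode="same" returns rows of the original width. Content pushed past
    the border is dropped and the vacated pixels are zero-filled.
    """
    kernel = shift_kernel(shift, radius)
    if image.shape[1] < kernel.size:
        raise ValueError("image width must be >= kernel length")
    return np.stack([np.convolve(row, kernel, mode="same") for row in image])


if __name__ == "__main__":
    frame = np.zeros((2, 10))
    frame[:, 3] = 1.0                      # a single bright column at x = 3
    moved = shift_rows(frame, shift=2)     # the column should land at x = 5
    print(np.argmax(moved, axis=1))
```

A soft (non one-hot) kernel would blend neighbouring pixels instead of copying one, which is roughly what lets a learned 1D convolution express sub-pixel or spatially varying shifts.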