Stereo image super-resolution (StereoSR) has attracted significant attention in recent years owing to the widespread deployment of dual cameras in mobile phones, autonomous vehicles, and robots. In this work, we propose a new StereoSR method, named SwinFSR. It extends SwinIR, a network originally designed for single-image restoration, and incorporates frequency-domain knowledge obtained through the Fast Fourier Convolution (FFC). Specifically, to gather global information effectively, we modify the Residual Swin Transformer Blocks (RSTBs) in SwinIR so that they explicitly incorporate frequency-domain knowledge via the FFC, and we use the resulting Residual Swin Fourier Transformer Blocks (RSFTBs) for feature extraction. In addition, for efficient and accurate fusion of the stereo views, we propose a new cross-attention module, referred to as RCAM. It achieves highly competitive performance at a lower computational cost than state-of-the-art cross-attention modules. Extensive experiments and ablation studies demonstrate the effectiveness and efficiency of the proposed SwinFSR.
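To make the frequency-domain idea concrete, the following is a minimal NumPy sketch of the spectral transform used in the global branch of the FFC. It follows the general recipe of the original FFC design and is not SwinFSR's actual RSFTB. The recipe is a real 2D FFT, a pointwise (1x1) channel mixing on the stacked real and imaginary parts, a ReLU, and an inverse FFT. Because every frequency coefficient depends on every spatial position, a single pointwise operation in the frequency domain already has an image-wide receptive field. The function name `spectral_transform`, the `(C, H, W)` layout, and the random weight matrix are illustrative assumptions. Batch normalization and the local convolutional branch of the FFC are omitted.

```python
import numpy as np


def spectral_transform(x: np.ndarray, weight: np.ndarray, relu: bool = True) -> np.ndarray:
    """Simplified FFC-style global branch (hypothetical helper, not SwinFSR's code).

    x:      real feature map of shape (C, H, W)
    weight: (2C, 2C) matrix acting as a 1x1 convolution in the frequency domain
    """
    C, H, W = x.shape
    # Real 2D FFT over the spatial axes -> complex tensor of shape (C, H, W // 2 + 1).
    spec = np.fft.rfft2(x, axes=(-2, -1))
    # Stack real and imaginary parts along the channel axis -> (2C, H, W // 2 + 1).
    stacked = np.concatenate([spec.real, spec.imag], axis=0)
    # Pointwise (1x1) channel mixing, applied independently at every frequency.
    mixed = np.einsum("oc,chw->ohw", weight, stacked)
    if relu:
        # The original FFC also applies batch normalization before this ReLU (omitted here).
        mixed = np.maximum(mixed, 0.0)
    # Reassemble the complex spectrum and return to the spatial domain.
    spec_out = mixed[:C] + 1j * mixed[C:]
    return np.fft.irfft2(spec_out, s=(H, W), axes=(-2, -1))


rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8, 8))
w = rng.standard_normal((8, 8)) / np.sqrt(8)
y = spectral_transform(x, w)
print(y.shape)  # (4, 8, 8)
```

With an identity weight and no activation, the transform reduces to an FFT round trip and returns its input. With a learned weight, changing one input pixel changes the output at distant pixels, which is the global context that the RSFTBs are meant to exploit.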
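For the stereo-fusion step, a common design in StereoSR networks, such as parallax attention, restricts cross-view attention to a single image row. For rectified stereo pairs, corresponding pixels lie on the same horizontal epipolar line. The NumPy sketch below shows a bidirectional row-wise cross-attention with residual connections. It illustrates the general idea under stated assumptions and is not the RCAM module itself. The name `row_cross_attention`, the `(H, W, C)` layout, and the absence of learned query/key/value projections and normalization are all simplifications introduced here. Restricting attention to rows reduces the score computation from O((HW)^2 C) for full attention to O(H W^2 C).

```python
import numpy as np


def _softmax(scores: np.ndarray) -> np.ndarray:
    # Numerically stable softmax over the last axis.
    scores = scores - scores.max(axis=-1, keepdims=True)
    e = np.exp(scores)
    return e / e.sum(axis=-1, keepdims=True)


def row_cross_attention(left: np.ndarray, right: np.ndarray):
    """Bidirectional cross-attention restricted to epipolar rows (hypothetical sketch).

    left, right: rectified stereo feature maps of shape (H, W, C).
    Returns fused (left, right) maps of the same shape.
    """
    H, W, C = left.shape
    # Scores between every left pixel and every right pixel in the same row: (H, W, W).
    scores = np.einsum("hwc,hvc->hwv", left, right) / np.sqrt(C)
    # Left pixels attend to right pixels along each row.
    left_from_right = np.einsum("hwv,hvc->hwc", _softmax(scores), right)
    # Right pixels attend to left pixels, reusing the transposed score matrix.
    right_from_left = np.einsum("hvw,hwc->hvc", _softmax(scores.transpose(0, 2, 1)), left)
    # Residual fusion keeps each view's own features.
    return left + left_from_right, right + right_from_left


rng = np.random.default_rng(0)
L = rng.standard_normal((4, 16, 8))
R = np.roll(L, shift=-2, axis=1)  # toy right view: horizontally shifted copy (constant disparity)
L_out, R_out = row_cross_attention(L, R)
print(L_out.shape, R_out.shape)  # (4, 16, 8) (4, 16, 8)
```

Each output row depends only on the matching rows of the two views. This is what keeps the cost linear in H rather than quadratic in the number of pixels.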