Multi-stage strategies are frequently employed in image restoration tasks. While transformer-based methods have exhibited high efficiency in single-image super-resolution tasks, they have not yet shown significant advantages over CNN-based methods in stereo super-resolution tasks. This can be attributed to two key factors: first, current single-image super-resolution transformers cannot exploit the complementary information between the two stereo views; second, the performance of transformers typically relies on sufficient data, a condition that common stereo-image super-resolution algorithms lack. To address these issues, we propose a Hybrid Transformer and CNN Attention Network (HTCAN), which uses a transformer-based network for single-image enhancement and a CNN-based network for stereo information fusion. Furthermore, we employ a multi-patch training strategy and larger window sizes to activate more input pixels for super-resolution. We also revisit other advanced techniques, such as data augmentation, data ensemble, and model ensemble, to reduce overfitting and data bias. Finally, our approach achieved a score of 23.90 dB and won Track 1 of the NTIRE 2023 Stereo Image Super-Resolution Challenge.
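The data ensemble mentioned above can be illustrated with a minimal sketch of flip-based test-time averaging for a stereo pair. This is not the authors' implementation: the abstract does not say which transforms are used, so the choice of vertical and horizontal flips, the function name `self_ensemble`, and the `model(left, right)` interface are all assumptions made for illustration. The one stereo-specific detail is that a horizontal mirror reverses the disparity direction, so the two views are swapped whenever they are mirrored.

```python
import numpy as np

def self_ensemble(model, left, right):
    """Average stereo SR predictions over flip variants (hypothetical helper).

    model: callable (left, right) -> (sr_left, sr_right), assumed interface.
    left, right: arrays of shape (H, W, C).
    """
    outs_l, outs_r = [], []
    for vflip in (False, True):
        for hflip in (False, True):
            l, r = left, right
            if vflip:
                # Vertical flip keeps the left/right geometry intact.
                l, r = l[::-1], r[::-1]
            if hflip:
                # Horizontal mirror reverses disparity, so swap the views.
                l, r = r[:, ::-1], l[:, ::-1]
            sl, sr = model(l, r)
            # Undo the transforms in reverse order to realign the outputs.
            if hflip:
                sl, sr = sr[:, ::-1], sl[:, ::-1]
            if vflip:
                sl, sr = sl[::-1], sr[::-1]
            outs_l.append(sl)
            outs_r.append(sr)
    # Mean over the four aligned predictions for each view.
    return np.mean(outs_l, axis=0), np.mean(outs_r, axis=0)
```

A model ensemble would follow the same pattern, averaging the aligned outputs of several independently trained networks instead of several transformed inputs.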