Compared with CNN-based methods, Transformer-based methods achieve impressive image restoration results owing to their ability to model long-range dependencies. However, how to apply Transformer-based methods to blind super-resolution (SR), and how to make an SR network adapt to degradation information, remain open problems. In this paper, we propose a new degradation-aware self-attention-based Transformer model that incorporates contrastive learning into the Transformer network to learn degradation representations of input images with unknown noise. Specifically, we integrate both CNN and Transformer components into the SR network: a CNN modulated by the degradation information first extracts local features, and a degradation-aware Transformer then extracts global semantic features. We evaluate the proposed model on several popular large-scale benchmark datasets and achieve state-of-the-art performance compared with existing methods. In particular, our method yields a PSNR of 32.43 dB on the Urban100 dataset at $\times$2 scale, 0.94 dB higher than DASR, and 26.62 dB at $\times$4 scale, a 0.26 dB improvement over KDSR, setting a new benchmark in this area. Source code is available at: https://github.com/I2-Multimedia-Lab/DSAT/tree/main.
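The results above are reported in PSNR, so it may help to see how a decibel gain maps onto reconstruction error. The following is a minimal sketch of the standard PSNR definition, not the evaluation code from the linked repository. The function names (`mse`, `psnr`, `mse_ratio_from_gain`), the 8-bit peak value of 255, and the flat-list image representation are assumptions made for illustration; the abstract does not state the evaluation protocol (for example, which color channel is used or whether borders are cropped).

```python
import math


def mse(reference, restored):
    """Mean squared error between two equally sized flat pixel sequences."""
    if len(reference) != len(restored) or len(reference) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum((r - s) ** 2 for r, s in zip(reference, restored))
    return total / len(reference)


def psnr(reference, restored, peak=255.0):
    """Standard PSNR in dB; `peak` is the assumed maximum pixel value."""
    err = mse(reference, restored)
    if err == 0:
        return math.inf  # identical images: PSNR is unbounded
    return 10.0 * math.log10(peak ** 2 / err)


def mse_ratio_from_gain(gain_db):
    """Ratio MSE_new / MSE_baseline implied by a PSNR gain on a single image."""
    return 10.0 ** (-gain_db / 10.0)


if __name__ == "__main__":
    # Hypothetical 4-pixel example, purely to exercise the functions.
    hr = [10, 20, 30, 40]
    sr = [12, 18, 33, 39]
    print(f"PSNR: {psnr(hr, sr):.2f} dB")
    for gain in (0.94, 0.26):  # gains quoted in the abstract
        print(f"{gain} dB gain -> MSE ratio {mse_ratio_from_gain(gain):.3f}")
```

Under this definition, a 0.94 dB gain corresponds to an MSE roughly 19.5% lower than the baseline, and a 0.26 dB gain to roughly 5.8% lower. This reading holds per image only; benchmark PSNR is usually averaged over the images in a dataset, so the conversion is approximate for the reported figures.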