Transformer-based methods have shown impressive performance in image restoration tasks such as image super-resolution (SR) and denoising. However, attribution analysis reveals that these networks can utilize only a limited spatial range of the input information, which implies that the potential of the Transformer is not yet fully exploited in existing networks. To activate more input pixels for better restoration, we propose a new Hybrid Attention Transformer (HAT). It combines channel attention with window-based self-attention, thereby making use of their complementary advantages. Moreover, to better aggregate cross-window information, we introduce an overlapping cross-attention module that enhances the interaction between neighboring window features. In the training stage, we additionally adopt a same-task pre-training strategy to further exploit the potential of the model. Extensive experiments demonstrate the effectiveness of the proposed modules. We further scale up the model to show that performance on the SR task can be greatly improved. In addition, we extend HAT to more image restoration applications, including real-world image super-resolution, Gaussian image denoising, and image compression artifact reduction. Experiments on benchmark and real-world datasets demonstrate that HAT achieves state-of-the-art performance both quantitatively and qualitatively. Code and models are publicly available at https://github.com/XPixelGroup/HAT.
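To make the idea of overlapping cross-window interaction concrete, the following is a minimal sketch, not the released HAT implementation. It assumes that queries come from non-overlapping windows while keys and values come from the same windows enlarged by a fixed margin and clipped at the image border. The function names `window_bounds` and `box_area`, the `overlap` parameter, and the clipping behavior are all hypothetical choices made for illustration; the abstract does not specify them.

```python
def window_bounds(height, width, window, overlap=0):
    """Return (query_box, key_box) pairs for every window on a height x width grid.

    Boxes are (top, left, bottom, right) with exclusive bottom/right.
    query_box: a non-overlapping window of side `window`.
    key_box:   the same window enlarged by `overlap` pixels on each side,
               clipped to the grid (an assumption made for this sketch).
    """
    pairs = []
    for top in range(0, height, window):
        for left in range(0, width, window):
            bottom = min(top + window, height)
            right = min(left + window, width)
            query_box = (top, left, bottom, right)
            key_box = (
                max(top - overlap, 0),
                max(left - overlap, 0),
                min(bottom + overlap, height),
                min(right + overlap, width),
            )
            pairs.append((query_box, key_box))
    return pairs


def box_area(box):
    """Number of pixels covered by a (top, left, bottom, right) box."""
    top, left, bottom, right = box
    return (bottom - top) * (right - left)


if __name__ == "__main__":
    # Each query window sees more pixels once its key window overlaps its neighbors.
    for query_box, key_box in window_bounds(8, 8, window=4, overlap=2):
        print(query_box, "attends to", key_box, "covering", box_area(key_box), "pixels")
```

With `overlap=0` each key box equals its query box, which corresponds to plain window-based self-attention; a positive `overlap` lets every window attend to pixels that belong to its neighbors.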