Transformer-based image restoration networks have recently achieved promising improvements over convolutional neural networks owing to their parameter-independent global interactions. To lower computational cost, existing works generally restrict self-attention to non-overlapping windows. However, each group of tokens then always comes from a dense area of the image. We regard this as a dense attention strategy, since token interactions are confined to dense regions, and it can result in restricted receptive fields. To address this issue, we propose the Attention Retractable Transformer (ART) for image restoration, which includes both dense and sparse attention modules in the network. The sparse attention module allows tokens from sparse areas to interact and thus provides a wider receptive field. Furthermore, alternating dense and sparse attention modules greatly enhances the representation ability of the Transformer while providing retractable attention on the input image. We conduct extensive experiments on image super-resolution, denoising, and JPEG compression artifact reduction tasks. Experimental results validate that the proposed ART outperforms state-of-the-art methods on various benchmark datasets both quantitatively and visually. Code and models are available at https://github.com/gladzhang/ART.
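The contrast between dense and sparse token grouping can be illustrated with a minimal NumPy sketch. It is not taken from the ART code base: the function names, the `window` and `interval` parameters, and the stride-based reading of "sparse areas" are assumptions made for illustration. It only assigns group ids to token positions and measures how far each group reaches; no attention is computed.

```python
import numpy as np


def dense_groups(h, w, window):
    """Assign each token to a contiguous window x window patch (dense grouping)."""
    assert h % window == 0 and w % window == 0, "sketch assumes divisible sizes"
    rows, cols = np.indices((h, w))
    return (rows // window) * (w // window) + (cols // window)


def sparse_groups(h, w, interval):
    """Assign tokens to groups whose members sit `interval` positions apart.

    Assumption: "sparse" is read here as strided sampling of the token grid.
    """
    assert h % interval == 0 and w % interval == 0, "sketch assumes divisible sizes"
    rows, cols = np.indices((h, w))
    return (rows % interval) * interval + (cols % interval)


def group_extent(group_ids, gid):
    """Height and width of the bounding box covered by one group's tokens."""
    rows, cols = np.nonzero(group_ids == gid)
    return int(rows.max() - rows.min() + 1), int(cols.max() - cols.min() + 1)


if __name__ == "__main__":
    dense = dense_groups(8, 8, window=4)
    sparse = sparse_groups(8, 8, interval=2)
    # Both schemes put 16 tokens in a group, so attention cost per group matches.
    print(np.bincount(dense.ravel()), np.bincount(sparse.ravel()))
    print(group_extent(dense, 0), group_extent(sparse, 0))
```

On an 8×8 token grid, both groupings yield four groups of 16 tokens, but a dense group spans a 4×4 patch while a sparse group spans 7×7, which is the sense in which sparse grouping widens the receptive field at the same per-group cost.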