Transformers have recently gained considerable popularity in low-level vision tasks, including image super-resolution (SR). These networks apply self-attention along different dimensions, either spatial or channel, and achieve impressive performance. This inspires us to combine the two dimensions in a Transformer for more powerful representation capability. Based on this idea, we propose a novel Transformer model, the Dual Aggregation Transformer (DAT), for image SR. DAT aggregates features across the spatial and channel dimensions in a dual manner: inter-block and intra-block. Specifically, we alternately apply spatial and channel self-attention in consecutive Transformer blocks. This alternating strategy enables DAT to capture global context and realize inter-block feature aggregation. Furthermore, we propose the adaptive interaction module (AIM) and the spatial-gate feed-forward network (SGFN) to achieve intra-block feature aggregation. AIM complements the two self-attention mechanisms from their corresponding dimensions. Meanwhile, SGFN introduces additional non-linear spatial information into the feed-forward network. Extensive experiments show that DAT surpasses current methods. Code and models are available at https://github.com/zhengchen1999/DAT.
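To make the inter-block idea concrete, the following is a minimal NumPy sketch of alternating spatial and channel self-attention over a flattened feature map. It is an illustration under stated assumptions, not the released DAT implementation: the function names, the single-head global attention, the scaling factors, the random projection weights, and the plain residual connections are all hypothetical choices made here, and normalization layers, AIM, and SGFN are omitted.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def spatial_self_attention(x, wq, wk, wv):
    """x: (N, C) tokens. The attention map is (N, N): positions attend to positions."""
    q, k, v = x @ wq, x @ wk, x @ wv
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]), axis=-1)  # (N, N)
    return attn @ v                                           # (N, C)

def channel_self_attention(x, wq, wk, wv):
    """x: (N, C) tokens. The attention map is (C, C): channels attend to channels."""
    q, k, v = x @ wq, x @ wk, x @ wv
    # Scaling by sqrt(N) is an assumption of this sketch.
    attn = softmax(q.T @ k / np.sqrt(q.shape[0]), axis=-1)   # (C, C)
    return v @ attn.T                                         # (N, C)

def alternating_blocks(x, params):
    """Apply spatial attention in even blocks and channel attention in odd blocks."""
    for i, (wq, wk, wv) in enumerate(params):
        attn_fn = spatial_self_attention if i % 2 == 0 else channel_self_attention
        x = x + attn_fn(x, wq, wk, wv)  # residual connection (assumed)
    return x

# Hypothetical sizes: 16 spatial positions, 8 channels, 4 consecutive blocks.
rng = np.random.default_rng(0)
N, C = 16, 8
x = rng.standard_normal((N, C))
params = [tuple(rng.standard_normal((C, C)) * 0.1 for _ in range(3)) for _ in range(4)]
y = alternating_blocks(x, params)
print(y.shape)  # (16, 8)
```

The only difference between the two attention functions is which axis forms the attention map: (N, N) over positions for spatial attention, (C, C) over channels for channel attention. Alternating them lets successive blocks mix information along both dimensions while keeping the feature shape unchanged.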