Transformer architectures have exhibited remarkable performance in image super-resolution (SR). Because self-attention (SA) in Transformers has quadratic computational complexity, existing methods tend to apply SA within local regions to reduce overhead. However, this local design restricts the exploitation of global context, which is crucial for accurate image reconstruction. In this work, we propose the Recursive Generalization Transformer (RGT) for image SR, which captures global spatial information and is suitable for high-resolution images. Specifically, we propose recursive-generalization self-attention (RG-SA). It recursively aggregates input features into representative feature maps and then uses cross-attention to extract global information. Meanwhile, the channel dimensions of the attention matrices (query, key, and value) are further scaled to mitigate redundancy in the channel domain. Furthermore, we combine RG-SA with local self-attention to enhance the exploitation of global context, and propose hybrid adaptive integration (HAI) for module integration. HAI allows direct and effective fusion between features at different levels (local or global). Extensive experiments demonstrate that RGT outperforms recent state-of-the-art methods quantitatively and qualitatively. Code is released at https://github.com/zhengchen1999/RGT.
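The RG-SA idea in the abstract (recursively aggregate the input into a small representative map, then let every input position attend to that map through cross-attention) can be illustrated with a minimal plain-Python sketch. It rests on several assumptions that are not taken from the paper: 2x2 average pooling stands in for the learned aggregation, the query/key/value projections and the channel scaling are omitted, and all function names (`aggregate_once`, `recursive_aggregate`, `cross_attention`, `rg_sa_sketch`) are hypothetical. It is not the released RGT implementation.

```python
import math


def aggregate_once(feat):
    """Average each 2x2 block of an H x W x C nested list (assumed stand-in
    for the learned aggregation). Returns an (H//2) x (W//2) x C map."""
    h, w, c = len(feat), len(feat[0]), len(feat[0][0])
    return [[[(feat[2 * i][2 * j][k] + feat[2 * i + 1][2 * j][k]
               + feat[2 * i][2 * j + 1][k] + feat[2 * i + 1][2 * j + 1][k]) / 4.0
              for k in range(c)]
             for j in range(w // 2)]
            for i in range(h // 2)]


def recursive_aggregate(feat, target):
    """Apply aggregate_once repeatedly until either side is <= target."""
    while len(feat) > target and len(feat[0]) > target:
        feat = aggregate_once(feat)
    return feat


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys_values):
    """Scaled dot-product attention where each query token attends over all
    representative tokens. Keys and values share the same tokens because
    this sketch has no learned projections."""
    scale = 1.0 / math.sqrt(len(queries[0]))
    out = []
    for q in queries:
        scores = [scale * sum(a * b for a, b in zip(q, kv)) for kv in keys_values]
        weights = softmax(scores)
        out.append([sum(wt * kv[k] for wt, kv in zip(weights, keys_values))
                    for k in range(len(q))])
    return out


def flatten(feat):
    """H x W x C nested list -> list of H*W tokens, each of length C."""
    return [token for row in feat for token in row]


def rg_sa_sketch(feat, target=2):
    """Queries come from the full-resolution input; keys and values come
    from the recursively aggregated representative map."""
    rep = recursive_aggregate(feat, target)
    return cross_attention(flatten(feat), flatten(rep))


if __name__ == "__main__":
    # Toy 8 x 8 feature map with 4 channels.
    feat = [[[float(i + j + k) for k in range(4)] for j in range(8)]
            for i in range(8)]
    out = rg_sa_sketch(feat, target=2)
    print(len(out), "output tokens, each with", len(out[0]), "channels")
```

In this toy setting the attention score matrix has (H·W) × (h·w) entries, where h × w is the size of the representative map, rather than the (H·W) × (H·W) entries of full global self-attention. For the 8 × 8 input and 2 × 2 representative map above, that is 64 × 4 scores instead of 64 × 64.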