Transformer architectures have exhibited remarkable performance in image super-resolution (SR). Because self-attention (SA) in Transformers has computational complexity quadratic in the number of tokens, existing methods tend to apply SA within local regions to reduce overhead. However, this local design restricts the exploitation of global context, which is crucial for accurate image reconstruction. In this work, we propose the Recursive Generalization Transformer (RGT) for image SR, which can capture global spatial information and is suitable for high-resolution images. Specifically, we propose the recursive-generalization self-attention (RG-SA). It recursively aggregates input features into representative feature maps and then uses cross-attention to extract global information. Meanwhile, the channel dimensions of the attention matrices (query, key, and value) are further scaled to mitigate redundancy in the channel domain. Furthermore, we combine RG-SA with local self-attention to enhance the exploitation of global context, and propose hybrid adaptive integration (HAI) for module integration. HAI allows direct and effective fusion between features at different levels (local or global). Extensive experiments demonstrate that RGT outperforms recent state-of-the-art methods quantitatively and qualitatively. Code and pre-trained models are available at https://github.com/zhengchen1999/RGT.
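To make the cost argument concrete, the following is a rough back-of-the-envelope sketch, not the authors' implementation. It counts query-key pairs for three attention layouts: full global SA, window-based local SA, and cross-attention against a small set of recursively aggregated representative tokens. The function names, the stride-based downsampling rule, and the `stride` and `target` parameters are all illustrative assumptions; the abstract does not specify how the aggregation is performed.

```python
import math


def attention_pairs_global(h, w):
    """Query-key pairs for full self-attention over an h x w feature map."""
    n = h * w
    return n * n  # quadratic in the number of tokens


def attention_pairs_window(h, w, window):
    """Query-key pairs when each token attends only within a window x window region."""
    n = h * w
    return n * window * window  # linear in n, but no global context


def recursive_aggregate_size(h, w, stride, target):
    """Hypothetical aggregation rule: downsample by `stride` until both sides <= target.

    Returns (h', w', steps), where steps is the number of recursive reductions.
    """
    if stride < 2:
        raise ValueError("stride must be at least 2")
    steps = 0
    while max(h, w) > target:
        h = math.ceil(h / stride)
        w = math.ceil(w / stride)
        steps += 1
    return h, w, steps


def attention_pairs_cross(h, w, stride=4, target=8):
    """Query-key pairs when all h*w tokens query the aggregated representative map."""
    rh, rw, _ = recursive_aggregate_size(h, w, stride, target)
    return h * w * rh * rw  # linear in n, while every token sees a global summary


if __name__ == "__main__":
    for size in (64, 128):
        print(
            size,
            attention_pairs_global(size, size),
            attention_pairs_window(size, size, window=8),
            attention_pairs_cross(size, size),
        )
```

Under these assumptions, a 64 x 64 feature map needs 16,777,216 pairs for global SA, 262,144 for 8 x 8 windows, and 65,536 for cross-attention against a 4 x 4 representative map. The counts ignore the channel dimension and the cost of the aggregation itself, so they indicate scaling behavior only.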