Transformer-based methods have demonstrated excellent performance on image super-resolution tasks, surpassing conventional convolutional neural networks. However, existing work typically restricts self-attention computation to non-overlapping windows to reduce computational cost, which means that Transformer-based networks can only use input information from a limited spatial range. This paper therefore proposes a novel Hybrid Multi-Axis Aggregation network (HMA) to better exploit the latent information in features. HMA is constructed by stacking Residual Hybrid Transformer Blocks (RHTB) and Grid Attention Blocks (GAB). On the one hand, RHTB combines channel attention and self-attention to enhance non-local feature fusion and produce more visually pleasing results. On the other hand, GAB performs cross-domain information interaction to jointly model similar features and obtain a larger receptive field. In addition, a novel pre-training method is designed for the super-resolution training phase to further enhance the model's representation capability, and extensive experiments validate the effectiveness of the proposed model. The experimental results show that HMA outperforms state-of-the-art methods on benchmark datasets. We provide code and models at https://github.com/korouuuuu/HMA.
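The abstract contrasts window-restricted self-attention with the grid attention used in GAB but does not give the partitioning details. The sketch below illustrates the difference under an assumption: GAB is taken to group pixels in a MaxViT-style grid, so that each group collects pixels spaced evenly across the whole feature map, whereas window attention groups only spatially adjacent pixels. The functions `window_partition`, `grid_partition`, and `group_self_attention` are hypothetical names; the attention uses a single head and identity Q/K/V projections for brevity, and none of this is taken from the HMA code release.

```python
import numpy as np


def window_partition(x, ws):
    """Split an (H, W, C) map into non-overlapping ws x ws windows.

    Returns (num_windows, ws*ws, C); each group holds spatially adjacent pixels.
    """
    H, W, C = x.shape
    x = x.reshape(H // ws, ws, W // ws, ws, C)
    x = x.transpose(0, 2, 1, 3, 4)  # (H//ws, W//ws, ws, ws, C)
    return x.reshape(-1, ws * ws, C)


def grid_partition(x, g):
    """Split an (H, W, C) map into groups on a g x g grid (assumed GAB-style grouping).

    Each group gathers g*g pixels spaced H//g rows and W//g columns apart,
    so a single group spans the whole feature map.
    """
    H, W, C = x.shape
    x = x.reshape(g, H // g, g, W // g, C)
    x = x.transpose(1, 3, 0, 2, 4)  # (H//g, W//g, g, g, C)
    return x.reshape(-1, g * g, C)


def group_self_attention(tokens):
    """Single-head dot-product self-attention applied independently within each group.

    Identity Q/K/V projections are an assumption made to keep the sketch short.
    """
    d = tokens.shape[-1]
    scores = tokens @ tokens.transpose(0, 2, 1) / np.sqrt(d)
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability for softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ tokens


# Use (row, col) coordinates as "features" to see which pixels share a group.
H = W = 8
coords = np.stack(np.meshgrid(np.arange(H), np.arange(W), indexing="ij"), axis=-1)

windows = window_partition(coords, ws=4)
grids = grid_partition(coords, g=4)
print(sorted(set(windows[0][:, 0].tolist())))  # rows covered by the first window
print(sorted(set(grids[0][:, 0].tolist())))    # rows covered by the first grid group

# Attention cost per group is identical (16 tokens each); only the spatial spread differs.
feats = np.random.rand(H, W, 3)
out = group_self_attention(grid_partition(feats, g=4))
print(out.shape)
```

With an 8×8 map, the first 4×4 window covers rows 0 to 3 only, while the first grid group draws rows and columns {0, 2, 4, 6}. Both groupings attend over 16 tokens per group, so the grid variant enlarges the spatial range seen by one attention call without increasing its per-group cost.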