Transformer-based models have revolutionized image super-resolution (SR) by harnessing their inherent ability to capture complex contextual features. The overlapping rectangular shifted-window technique is now common practice in transformer-based SR models for improving the quality and robustness of image upscaling. However, it suffers from distortion at window boundaries and offers only a limited number of unique shifting modes. To overcome these weaknesses, we propose a non-overlapping triangular window technique that works in sync with the rectangular one to mitigate boundary-level distortion and give the model access to more unique shifting modes. Building on this technique, we propose the Composite Fusion Attention Transformer (CFAT), which combines triangular-rectangular window-based local attention with channel-based global attention for image super-resolution. As a result, CFAT activates attention on more image pixels and captures long-range, multi-scale features to improve SR performance. Extensive experiments and an ablation study demonstrate the effectiveness of CFAT in the SR domain. Our proposed model achieves a significant 0.7 dB performance improvement over other state-of-the-art SR architectures.
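To make the idea of non-overlapping triangular windows concrete, the following is a minimal sketch and not the CFAT implementation. It assumes, purely for illustration, that a square window is split along its two diagonals into four triangles and that attention is restricted to pixels sharing a triangle. The function names `triangular_window_labels` and `same_triangle_mask`, the label order, and the tie-breaking rule for pixels lying on a diagonal are all hypothetical choices; the abstract does not specify any of them.

```python
def triangular_window_labels(size):
    """Assign every pixel of a size x size window to one of four triangles.

    Assumed labelling (illustrative only): 0 = top, 1 = right,
    2 = bottom, 3 = left. Pixels lying exactly on a diagonal are
    assigned by the strict/non-strict comparisons below, so each
    pixel receives exactly one label and the triangles never overlap.
    """
    if size < 2 or size % 2 != 0:
        raise ValueError("size must be an even integer >= 2")
    centre = (size - 1) / 2.0  # both diagonals pass through this point
    labels = []
    for row in range(size):
        y = row - centre
        label_row = []
        for col in range(size):
            x = col - centre
            above_main = y < x    # above the top-left to bottom-right diagonal
            above_anti = y < -x   # above the top-right to bottom-left diagonal
            if above_main and above_anti:
                label = 0         # top triangle
            elif above_main:
                label = 1         # right triangle
            elif not above_anti:
                label = 2         # bottom triangle
            else:
                label = 3         # left triangle
            label_row.append(label)
        labels.append(label_row)
    return labels


def same_triangle_mask(labels):
    """Boolean attention mask over flattened pixels.

    mask[i][j] is True when pixels i and j belong to the same triangle,
    i.e. when pixel i would be allowed to attend to pixel j.
    """
    flat = [label for row in labels for label in row]
    return [[a == b for b in flat] for a in flat]


if __name__ == "__main__":
    for row in triangular_window_labels(4):
        print(row)
```

With this tie-breaking rule the four triangles are not equal in size, because diagonal pixels are pushed toward particular triangles; a balanced split would need a different convention. In a real model, a mask of this kind would typically be applied to the attention logits before the softmax.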