Transformer-based methods have shown remarkable potential in single image super-resolution (SISR) because they can effectively capture long-range dependencies. However, most current research in this area focuses on designing transformer blocks that capture global information and overlooks the value of incorporating high-frequency priors, which we believe could be beneficial. Through a series of experiments, we found that transformer structures are better at capturing low-frequency information but have limited capacity to construct high-frequency representations compared with their convolutional counterparts. We therefore propose the cross-refinement adaptive feature modulation transformer (CRAFT), which integrates the strengths of convolutional and transformer structures. It comprises three key components: the high-frequency enhancement residual block (HFERB) for extracting high-frequency information, the shift rectangle window attention block (SRWAB) for capturing global information, and the hybrid fusion block (HFB) for refining the global representation. Experiments on multiple datasets demonstrate that CRAFT outperforms state-of-the-art methods by up to 0.29 dB while using fewer parameters. The source code will be made available at: https://github.com/AVC2-UESTC/CRAFT-SR.git.
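The comparison between low- and high-frequency information above presupposes some way of separating an image or feature map into frequency bands. The following is a minimal sketch, assuming an ideal circular low-pass mask in the centered 2-D Fourier domain. This is an illustrative choice, not the analysis protocol used for CRAFT, and the function names and the `radius` cutoff are hypothetical.

```python
import numpy as np


def split_frequency_bands(image: np.ndarray, radius: int) -> tuple[np.ndarray, np.ndarray]:
    """Split a 2-D array into low- and high-frequency parts (hypothetical helper).

    Assumption: an ideal circular mask of the given radius around the DC
    component of the centered FFT defines the low-frequency band.
    """
    h, w = image.shape
    spectrum = np.fft.fftshift(np.fft.fft2(image))  # move DC to the center
    yy, xx = np.ogrid[:h, :w]
    cy, cx = h // 2, w // 2
    mask = (yy - cy) ** 2 + (xx - cx) ** 2 <= radius ** 2  # low-pass region
    low = np.fft.ifft2(np.fft.ifftshift(spectrum * mask)).real
    high = image - low  # the residual is the complementary high-pass band
    return low, high


def high_frequency_ratio(image: np.ndarray, radius: int) -> float:
    """Fraction of signal energy that falls outside the low-pass mask."""
    _, high = split_frequency_bands(image, radius)
    total = float(np.sum(image ** 2))
    return float(np.sum(high ** 2)) / total if total > 0 else 0.0


if __name__ == "__main__":
    flat = np.ones((8, 8))  # only a DC component
    checker = (np.indices((8, 8)).sum(axis=0) % 2) * 2.0 - 1.0  # only Nyquist energy
    print(high_frequency_ratio(flat, radius=2))
    print(high_frequency_ratio(checker, radius=2))
```

A flat image keeps essentially all its energy in the low band, while a checkerboard sits entirely in the high band. By Parseval's theorem, the two bands partition the total energy. Applying such a measure to the feature maps of a convolutional branch and a transformer branch is one plausible way to probe the kind of frequency bias described above.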