Exploring Frequency-Inspired Optimization in Transformer for Efficient Single Image Super-Resolution

Transformer-based methods have shown remarkable potential in single image super-resolution (SISR) by effectively extracting long-range dependencies. However, most current research in this area has prioritized the design of transformer blocks to capture global information, while overlooking the potential benefit of incorporating high-frequency priors. In a series of experiments, we find that transformer structures are more adept at capturing low-frequency information but have limited capacity for constructing high-frequency representations compared with their convolutional counterparts. Our proposed solution, the cross-refinement adaptive feature modulation transformer (CRAFT), integrates the strengths of both convolutional and transformer structures. It comprises three key components: the high-frequency enhancement residual block (HFERB) for extracting high-frequency information, the shift rectangle window attention block (SRWAB) for capturing global information, and the hybrid fusion block (HFB) for refining the global representation. To address the inherent complexity of transformer structures, we introduce a frequency-guided post-training quantization (PTQ) method aimed at improving CRAFT's efficiency. The method incorporates adaptive dual clipping and boundary refinement. To broaden its applicability, we further extend our PTQ strategy to serve as a general quantization method for transformer-based SISR techniques. Experimental results show that CRAFT outperforms current state-of-the-art methods in both full-precision and quantized settings, and they support the efficacy and generality of our PTQ strategy.
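To make the idea of clipping with two independent bounds more concrete, the following is a minimal sketch of uniform quantization in which the lower and upper clipping bounds are searched separately to minimize reconstruction error. It is an illustration under stated assumptions, not the PTQ method of this work: the function names (quantize, mse, search_dual_clip), the grid search, the use of the median as the shrink target, and the mean squared error criterion are all hypothetical choices made for the example, and the frequency guidance and boundary refinement steps are not modeled.

```python
import random


def quantize(values, lower, upper, bits=4):
    """Clip values to [lower, upper], quantize uniformly, and dequantize."""
    levels = 2 ** bits - 1
    scale = (upper - lower) / levels
    out = []
    for v in values:
        clipped = min(max(v, lower), upper)
        code = round((clipped - lower) / scale)  # integer code in [0, levels]
        out.append(lower + code * scale)
    return out


def mse(a, b):
    """Mean squared error between two equally long sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def search_dual_clip(values, bits=4, steps=20):
    """Grid-search the two clipping bounds independently (hypothetical scheme).

    Each bound is shrunk from the observed min/max toward the median, and the
    pair with the lowest reconstruction error is returned as
    (error, lower, upper). The pair (min, max) is part of the grid, so the
    result is never worse than plain min/max quantization.
    """
    lo0, hi0 = min(values), max(values)
    if lo0 == hi0:  # constant input: nothing to quantize
        return 0.0, lo0, hi0
    center = sorted(values)[len(values) // 2]
    best = None
    for i in range(steps):
        lower = lo0 + (center - lo0) * i / steps
        for j in range(steps):
            upper = hi0 - (hi0 - center) * j / steps
            err = mse(values, quantize(values, lower, upper, bits))
            if best is None or err < best[0]:
                best = (err, lower, upper)
    return best


if __name__ == "__main__":
    random.seed(0)
    # Synthetic activations with an asymmetric tail (assumed for illustration).
    data = [random.gauss(0.0, 1.0) for _ in range(2000)] + [8.0, 9.5]
    baseline = mse(data, quantize(data, min(data), max(data)))
    err, lower, upper = search_dual_clip(data)
    print("min/max error:", baseline)
    print("dual-clip error:", err, "bounds:", (lower, upper))
```

Searching the two bounds separately matters when a distribution is asymmetric: a single symmetric threshold either wastes quantization levels on a sparse tail or clips the dense side too aggressively.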
