Transformer-based Super-Resolution (SR) models have recently advanced image reconstruction quality, yet challenges remain due to their computational complexity and an over-reliance on large patch sizes, which constrain fine-grained detail enhancement. In this work, we propose TaylorIR to address these limitations by using a 1×1 patch size, enabling pixel-level processing in any transformer-based SR model. To mitigate the substantial computational demands this imposes under the traditional self-attention mechanism, we employ the TaylorShift attention mechanism, a memory-efficient alternative based on Taylor series expansion that achieves full token-to-token interactions with linear complexity. Experimental results demonstrate that our approach achieves new state-of-the-art SR performance while reducing memory consumption by up to 60% compared to traditional self-attention-based transformers.
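To make the linear-complexity claim concrete, the sketch below shows one way attention built on a second-order Taylor expansion of the softmax kernel (exp(x) ≈ 1 + x + x²/2) can be rearranged so the N×N score matrix is never materialized. This is a minimal illustration under stated assumptions: the function name `taylor_linear_attention`, the single-head unbatched layout, and the 1/√d scaling are illustrative choices, not the authors' released TaylorShift implementation.

```python
# A minimal sketch, assuming a second-order Taylor expansion of the softmax
# kernel exp(x) ~= 1 + x + x^2/2; single head, no batch dim, for clarity.
# Names are illustrative, not the released TaylorShift code.
import torch

def taylor_linear_attention(q, k, v):
    """q, k, v: (N, d) query/key/value tensors for one attention head.

    Rearranges the Taylor-expanded attention so the N x N score matrix is
    never formed: cost is linear in the token count N (cubic in d here).
    """
    n, d = q.shape
    q = q / (d ** 0.5)                            # conventional 1/sqrt(d) score scaling

    # Zeroth-order sums over keys/values.
    v_sum = v.sum(dim=0)                          # (d,)   sum_j v_j
    k_sum = k.sum(dim=0)                          # (d,)   sum_j k_j
    # First-order moment: q_i @ (K^T V) = sum_j (q_i . k_j) v_j.
    kv = k.t() @ v                                # (d, d)
    # Second-order moments: (q . k)^2 = (q ⊗ q) . (k ⊗ k), so precompute
    # sum_j (k_j ⊗ k_j) ⊗ v_j and sum_j (k_j ⊗ k_j).
    kkv = torch.einsum('na,nb,nc->abc', k, k, v)  # (d, d, d)
    kk_sum = k.t() @ k                            # (d, d)

    # Numerator: sum_j [1 + q_i.k_j + (q_i.k_j)^2 / 2] v_j for every query i.
    num = (v_sum.unsqueeze(0)
           + q @ kv
           + 0.5 * torch.einsum('na,nb,abc->nc', q, q, kkv))
    # Denominator: the matching normalizer sum_j [1 + q_i.k_j + (q_i.k_j)^2 / 2].
    den = (n
           + q @ k_sum
           + 0.5 * torch.einsum('na,nb,ab->n', q, q, kk_sum))
    return num / den.unsqueeze(-1)

# Example: 1x1 patches of a 64x64 feature map give N = 4096 pixel tokens.
q = torch.randn(4096, 32)
k = torch.randn(4096, 32)
v = torch.randn(4096, 32)
out = taylor_linear_attention(q, k, v)            # (4096, 32)
```

Because 1 + x + x²/2 is strictly positive, the normalizer never vanishes, and the per-query output depends on the keys and values only through a handful of precomputed moments, which is what keeps the cost linear in the number of pixel tokens.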