Given the broad application of infrared technology across diverse fields, super-resolution of infrared images with deep learning is receiving increasing attention. Current Transformer-based methods achieve impressive results in image super-resolution, but their reliance on the self-attention mechanism intrinsic to the Transformer architecture means that images are treated as one-dimensional sequences, neglecting their inherent two-dimensional structure. Moreover, infrared images exhibit a uniform pixel distribution and a limited gradient range, which makes it difficult for a model to capture effective feature information. To address these issues, we propose a Transformer model termed Large Kernel Transformer (LKFormer). Specifically, we design a Large Kernel Residual Attention (LKRA) module with linear complexity, which mainly employs depth-wise convolution with large kernels to perform non-local feature modeling, replacing the standard self-attention layer. Additionally, we devise a novel feed-forward network structure, the Gated-Pixel Feed-Forward Network (GPFN), to strengthen LKFormer's ability to manage the information flow within the network. Comprehensive experimental results show that our method surpasses state-of-the-art techniques, using fewer parameters while yielding considerably better performance. The source code will be available at https://github.com/sad192/large-kernel-Transformer.
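The linear-complexity claim for the large-kernel depth-wise convolution can be illustrated with a rough operation count. The sketch below is not taken from the LKFormer code: the function names, the channel count of 64, and the kernel size of 31 are hypothetical values chosen for illustration, and the attention count ignores the query, key, and value projections. It compares multiply-accumulate counts for global self-attention, which grow quadratically with the number of pixels, against a depth-wise convolution, which grows linearly.

```python
def self_attention_macs(height, width, channels):
    """Approximate multiply-accumulates for global self-attention over all pixels.

    Counts only the score matrix (Q times K transposed) and the weighted sum
    over V; the linear projections are ignored (a simplifying assumption).
    """
    n = height * width
    return 2 * n * n * channels


def depthwise_conv_macs(height, width, channels, kernel_size):
    """Multiply-accumulates for a depth-wise k x k convolution.

    Assumes stride 1 and 'same' padding, so every pixel of every channel
    is computed from one k x k window of its own channel.
    """
    n = height * width
    return n * channels * kernel_size * kernel_size


if __name__ == "__main__":
    channels = 64      # hypothetical feature width
    kernel_size = 31   # hypothetical "large" kernel, not stated in the abstract
    for side in (32, 64, 128):
        attn = self_attention_macs(side, side, channels)
        conv = depthwise_conv_macs(side, side, channels, kernel_size)
        print(f"{side}x{side}: attention={attn:,} MACs, depth-wise conv={conv:,} MACs")
```

Under these assumptions, doubling the image side length multiplies the attention cost by 16 but the depth-wise convolution cost by only 4, which is the sense in which the convolutional substitute scales linearly in the number of pixels.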