Given the broad application of infrared technology across diverse fields, deep-learning-based super-resolution of infrared images is receiving increasing attention. Current Transformer-based methods achieve impressive results in image super-resolution, but their reliance on the self-attention mechanism intrinsic to the Transformer architecture means that images are treated as one-dimensional sequences, neglecting their inherent two-dimensional structure. Moreover, infrared images exhibit a uniform pixel distribution and a limited gradient range, which makes it difficult for a model to capture effective feature information. To address these issues, we propose a Transformer model termed Large Kernel Transformer (LKFormer). Specifically, we design a Large Kernel Residual Depth-wise Convolutional Attention (LKRDA) module with linear complexity. It mainly employs depth-wise convolution with large kernels to perform non-local feature modeling, thereby replacing the standard self-attention layer. Additionally, we devise a novel feed-forward network structure, the Gated-Pixel Feed-Forward Network (GPFN), to strengthen LKFormer's ability to manage the information flow within the network. Comprehensive experimental results show that our method surpasses state-of-the-art techniques, using fewer parameters while yielding considerably better performance.
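The abstract's central idea, replacing self-attention with a large-kernel depth-wise convolution whose cost grows linearly with the number of pixels, can be illustrated with a minimal pure-Python sketch. The abstract does not specify the internals of LKRDA, so the function names, the residual gating form `x + x * conv(x)`, and the simplified cost formulas below are all assumptions made for illustration; this is not the authors' implementation.

```python
def depthwise_conv2d(x, kernels):
    """Depth-wise 2D convolution (cross-correlation, as is usual in deep
    learning) with zero padding, so the output has the same size as the input.

    x:       list of C channels, each an H x W list of lists of floats
    kernels: list of C kernels, each k x k with k odd; channel c only ever
             sees kernels[c], which is what makes the operation depth-wise
    """
    out = []
    for channel, kernel in zip(x, kernels):
        h, w, k = len(channel), len(channel[0]), len(kernel)
        pad = k // 2
        result = [[0.0] * w for _ in range(h)]
        for i in range(h):
            for j in range(w):
                acc = 0.0
                for di in range(k):
                    for dj in range(k):
                        ii, jj = i + di - pad, j + dj - pad
                        # positions outside the image contribute zero
                        if 0 <= ii < h and 0 <= jj < w:
                            acc += kernel[di][dj] * channel[ii][jj]
                result[i][j] = acc
        out.append(result)
    return out


def conv_attention(x, kernels):
    """Hypothetical residual convolutional attention: x + x * conv(x).

    The convolution output acts as a per-pixel attention map; the exact
    form used in LKRDA is not given in the abstract (assumption).
    """
    attn = depthwise_conv2d(x, kernels)
    return [
        [[v + v * a for v, a in zip(row_x, row_a)]
         for row_x, row_a in zip(ch_x, ch_a)]
        for ch_x, ch_a in zip(x, attn)
    ]


def depthwise_macs(h, w, c, k):
    """Multiply-accumulates of a k x k depth-wise conv: linear in h * w."""
    return h * w * c * k * k


def self_attention_macs(h, w, c):
    """Multiply-accumulates of QK^T plus attention-times-V over h * w tokens,
    ignoring the linear projections and softmax: quadratic in h * w."""
    n = h * w
    return 2 * n * n * c
```

Under these simplified counts, doubling both the height and the width of the feature map multiplies the depth-wise convolution cost by 4 but the self-attention cost by 16, which is the sense in which a convolutional attention module can be described as having linear complexity. The convolution also operates directly on the two-dimensional grid, so no flattening of the image into a one-dimensional token sequence is needed.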