SCTransNet: Spatial-channel Cross Transformer Network for Infrared Small Target Detection

Infrared small target detection (IRSTD) has recently benefited greatly from U-shaped neural models. However, because existing techniques largely overlook effective global information modeling, they struggle when the target closely resembles the background. We present a Spatial-channel Cross Transformer Network (SCTransNet) that places spatial-channel cross transformer blocks (SCTBs) on top of long-range skip connections to address this challenge. In the proposed SCTBs, the outputs of all encoders interact through a cross transformer to generate mixed features, which are redistributed to all decoders to reinforce the semantic differences between the target and clutter at full scales. Specifically, each SCTB contains two key elements: (a) spatial-embedded single-head channel-cross attention (SSCA), which exchanges local spatial features and full-level global channel information to eliminate ambiguity among the encoders and facilitate high-level semantic associations of the images, and (b) a complementary feed-forward network (CFN), which enhances feature discriminability via a multi-scale strategy and cross-spatial-channel information interaction to promote beneficial information transfer. SCTransNet thus encodes the semantic differences between targets and backgrounds, strengthening its internal representation for accurate detection of small infrared targets. Extensive experiments on three public datasets, NUDT-SIRST, NUAA-SIRST, and IRSTD-1k, demonstrate that the proposed SCTransNet outperforms existing IRSTD methods. Our code will be made public at https://github.com/xdFai.
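
To make the idea of single-head channel-cross attention more concrete, the following is a minimal NumPy sketch and not the authors' implementation. It assumes that the features of every encoder level have already been embedded onto a common grid of N tokens, so level i is an array of shape (C_i, N). Queries come from one level, while keys and values come from the channel-wise concatenation of all levels, so the attention map relates channels to channels across the full hierarchy. The function name `channel_cross_attention`, the random projection matrices standing in for learned weights, the 1/sqrt(N) scaling, and the residual connection are all illustrative assumptions; the spatial embedding that SSCA adds and the CFN are omitted.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def channel_cross_attention(features, rng):
    """Single-head channel-wise cross attention over multi-level features.

    features: list of arrays, level i has shape (C_i, N) with a shared N.
    rng: NumPy random generator used for the stand-in projection weights.
    Returns the mixed features (same shapes as the inputs) and the
    per-level attention maps of shape (C_i, C_sum).
    """
    n_tokens = features[0].shape[1]
    concat = np.concatenate(features, axis=0)  # (C_sum, N)
    c_sum = concat.shape[0]

    # Hypothetical random projections in place of learned weights.
    w_k = rng.standard_normal((c_sum, c_sum)) / np.sqrt(c_sum)
    w_v = rng.standard_normal((c_sum, c_sum)) / np.sqrt(c_sum)
    keys = w_k @ concat    # (C_sum, N)
    values = w_v @ concat  # (C_sum, N)

    mixed, attn_maps = [], []
    for feat in features:
        c_i = feat.shape[0]
        w_q = rng.standard_normal((c_i, c_i)) / np.sqrt(c_i)
        queries = w_q @ feat  # (C_i, N)
        # Channel-to-channel similarity; the scaling is an assumption.
        scores = queries @ keys.T / np.sqrt(n_tokens)  # (C_i, C_sum)
        attn = softmax(scores, axis=-1)
        # Each output channel is a weighted mix of channels from all levels.
        mixed.append(feat + attn @ values)  # (C_i, N), with residual
        attn_maps.append(attn)
    return mixed, attn_maps


rng = np.random.default_rng(0)
# Four hypothetical encoder levels with 8, 16, 32 and 64 channels, 64 tokens each.
feats = [rng.standard_normal((c, 64)) for c in (8, 16, 32, 64)]
mixed, maps = channel_cross_attention(feats, rng)
print([m.shape for m in mixed])
print(maps[0].shape)
```

Because the attention map has size C_i by C_sum rather than N by N, its cost grows with the number of channels and not with the number of spatial positions, which is one reason channel-wise attention is attractive for fusing features along skip connections.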
