The recent Segment Anything Model (SAM) is a significant advance in natural image segmentation, with strong zero-shot performance that suits a variety of downstream image segmentation tasks. However, applying the pretrained SAM directly to the Infrared Small Target Detection (IRSTD) task falls short of satisfactory performance because of a notable domain gap between natural and infrared images. Unlike a visible-light camera, a thermal imager reveals an object's temperature distribution by capturing infrared radiation, and small targets often show only a subtle temperature transition at their boundaries. To address this gap, we propose IRSAM, a model for IRSTD that improves SAM's encoder-decoder architecture to learn better feature representations of infrared small objects. Specifically, we design a Perona-Malik diffusion (PMD)-based block and incorporate it into multiple levels of SAM's encoder to help it capture essential structural features while suppressing noise. Additionally, we devise a Granularity-Aware Decoder (GAD) that fuses the multi-granularity features from the encoder to capture structural information that may be lost in long-distance modeling. Extensive experiments on public datasets, including NUAA-SIRST, NUDT-SIRST, and IRSTD-1K, validate the design choices of IRSAM and show its significant superiority over representative state-of-the-art methods. The source code is available at: github.com/IPIC-Lab/IRSAM.
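For readers unfamiliar with Perona-Malik diffusion, the sketch below shows the classical image-domain scheme: an iterative smoothing step whose strength is reduced wherever the local intensity difference is large, so that noise is averaged out while edges are kept. It is only an illustration of the underlying idea, not the PMD block used in IRSAM, whose construction the abstract does not specify. The function name `perona_malik` and the parameters `n_iter`, `kappa`, and `step` are assumptions made for this example.

```python
import numpy as np


def perona_malik(image, n_iter=10, kappa=0.1, step=0.2):
    """Classical Perona-Malik anisotropic diffusion on a 2-D array.

    Illustrative only: parameter names and defaults are assumptions,
    not values taken from IRSAM.
    """
    u = np.asarray(image, dtype=np.float64).copy()
    for _ in range(n_iter):
        # Replicate the border so that differences across it are zero.
        p = np.pad(u, 1, mode="edge")
        # Differences to the four nearest neighbours (N, S, W, E).
        diffs = (
            p[:-2, 1:-1] - u,
            p[2:, 1:-1] - u,
            p[1:-1, :-2] - u,
            p[1:-1, 2:] - u,
        )
        # Edge-stopping function g(d) = exp(-(d / kappa)^2):
        # close to 1 for small differences, close to 0 across strong edges.
        flux = sum(np.exp(-((d / kappa) ** 2)) * d for d in diffs)
        # Explicit update; step <= 0.25 keeps the 4-neighbour scheme stable.
        u += step * flux
    return u


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic test image: a vertical step edge plus mild Gaussian noise.
    img = np.zeros((32, 32))
    img[:, 16:] = 1.0
    img += rng.normal(scale=0.02, size=img.shape)
    out = perona_malik(img)
    print("noise std before:", img[:, :16].std())
    print("noise std after: ", out[:, :16].std())
```

Here `kappa` sets the difference below which a region is treated as smooth: noise-level variations diffuse almost freely, while the unit-height step is far above `kappa` and receives essentially no smoothing.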