Infrared small target detection (IRSTD) faces significant challenges due to low signal-to-noise ratio (SNR), small target size, and complex, cluttered backgrounds. Although recent DETR-based detectors benefit from global context modeling, they exhibit notable performance degradation on IRSTD. We revisit this phenomenon and reveal that, under the self-attention mechanism, the target-relevant embeddings of infrared small targets are inevitably overwhelmed by dominant background features, leading to unreliable query initialization and inaccurate target localization. To address this issue, we propose SEF-DETR, a novel framework that refines query initialization for IRSTD. Specifically, SEF-DETR consists of three components: Frequency-guided Patch Screening (FPS), Dynamic Embedding Enhancement (DEE), and Reliability-Consistency-aware Fusion (RCF). The FPS module leverages the Fourier spectrum of local patches to construct a target-relevant density map, suppressing background-dominated features. DEE strengthens multi-scale representations in a target-aware manner, while RCF further refines object queries by enforcing spatial-frequency consistency and reliability. Extensive experiments on three public IRSTD datasets demonstrate that SEF-DETR achieves superior detection performance compared to state-of-the-art methods, delivering a robust and efficient solution for the infrared small target detection task.
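The abstract does not say how FPS turns a patch's Fourier spectrum into a density value, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes that a small target adds high-frequency energy to its patch, and it scores each non-overlapping patch by its spectral energy above a radial cutoff. The function name `patch_frequency_density` and the parameters `patch` and `cutoff` are hypothetical choices made for illustration.

```python
import numpy as np

def patch_frequency_density(image, patch=8, cutoff=0.25):
    """Illustrative per-patch score: spectral energy above a radial cutoff.

    Assumption (not from the paper): small targets show up as
    high-frequency energy inside the patch that contains them.
    """
    h, w = image.shape
    gh, gw = h // patch, w // patch
    # Crop to a multiple of the patch size, then split into patches
    # with shape (gh, gw, patch, patch).
    tiles = image[:gh * patch, :gw * patch].astype(np.float64)
    tiles = tiles.reshape(gh, patch, gw, patch).transpose(0, 2, 1, 3)
    # Subtract each patch mean so the DC term carries no energy.
    tiles = tiles - tiles.mean(axis=(2, 3), keepdims=True)
    # Power spectrum of every patch.
    power = np.abs(np.fft.fft2(tiles, axes=(-2, -1))) ** 2
    # Radial frequency of each FFT bin, in cycles per pixel.
    f = np.fft.fftfreq(patch)
    radius = np.sqrt(f[:, None] ** 2 + f[None, :] ** 2)
    high = radius >= cutoff
    # Sum the energy of the high-frequency bins in each patch.
    energy = power[..., high].sum(axis=-1)
    # Normalize to [0, 1]; an all-flat image yields an all-zero map.
    peak = energy.max()
    return energy / peak if peak > 0 else np.zeros_like(energy)

# Usage: a smooth horizontal ramp as background plus one bright pixel.
img = np.tile(0.01 * np.arange(32, dtype=np.float64), (32, 1))
img[20, 20] += 1.0
density = patch_frequency_density(img, patch=8)
row, col = np.unravel_index(np.argmax(density), density.shape)
print(density.shape, (int(row), int(col)))
```

In this toy case the patch containing the bright pixel receives the highest score, while the smooth-background patches score close to zero. A map of this kind could then be used to down-weight background-dominated features before query initialization, which is the role the abstract assigns to FPS.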