Efficient Fourier Filtering Network with Contrastive Learning for UAV-based Unaligned Bi-modal Salient Object Detection

Unmanned aerial vehicle (UAV)-based bi-modal salient object detection (BSOD) aims to segment salient objects in a scene by exploiting complementary cues in unaligned RGB and thermal image pairs. However, the high computational cost of existing UAV-based BSOD models limits their deployment on real-world UAV devices. To address this problem, we propose an efficient Fourier filtering network with contrastive learning that is both real-time and accurate. Specifically, we first design a semantic contrastive alignment loss that aligns the two modalities at the semantic level and facilitates mutual refinement in a parameter-free way. Second, inspired by the fast Fourier transform, which captures global relevance in log-linear complexity, we propose synchronized alignment fusion, which aligns and fuses bi-modal features in the channel and spatial dimensions through a hierarchical filtering mechanism. Our proposed model, AlignSal, reduces the number of parameters by 70.0%, decreases floating-point operations by 49.4%, and increases inference speed by 152.5% compared to the cutting-edge BSOD model MROS. Extensive experiments on the UAV RGB-T 2400 dataset and three weakly aligned datasets demonstrate that AlignSal runs in real time and outperforms sixteen state-of-the-art BSOD models in accuracy and generalizability on most evaluation metrics. In addition, our ablation studies verify AlignSal's potential to boost the performance of existing aligned BSOD models on UAV-based unaligned data. The code is available at: https://github.com/JoshuaLPF/AlignSal.
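The two ideas named in the abstract, a parameter-free contrastive alignment loss and filtering in the Fourier domain, can be illustrated with a minimal sketch. This is not AlignSal's implementation: the abstract does not specify the form of the loss or of the filters, so the InfoNCE-style formulation, the function names (`contrastive_alignment_loss`, `fourier_filter`), the fixed `temperature`, and the 1-D toy signal are all assumptions made for illustration. A naive discrete Fourier transform stands in for the fast Fourier transform so that the example needs only the standard library; it computes the same result, only more slowly.

```python
import cmath
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(p * q for p, q in zip(a, b))
    norm_a = math.sqrt(sum(p * p for p in a))
    norm_b = math.sqrt(sum(q * q for q in b))
    return dot / (norm_a * norm_b)


def contrastive_alignment_loss(rgb_feats, thermal_feats, temperature=0.1):
    """InfoNCE-style loss (assumed form, not the paper's exact loss).

    The i-th RGB vector is pulled toward the i-th thermal vector and
    pushed away from the others. The loss has no learnable parameters;
    `temperature` is a fixed hyperparameter.
    """
    total = 0.0
    for i, rgb in enumerate(rgb_feats):
        logits = [cosine(rgb, th) / temperature for th in thermal_feats]
        peak = max(logits)  # subtract the max for numerical stability
        log_denominator = peak + math.log(sum(math.exp(l - peak) for l in logits))
        total += log_denominator - logits[i]
    return total / len(rgb_feats)


def dft(x, inverse=False):
    """Naive O(N^2) DFT; an FFT yields the same values in O(N log N)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = sum(x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n) for t in range(n))
        out.append(acc / n if inverse else acc)
    return out


def fourier_filter(signal, weights):
    """Scale each frequency by `weights`, then transform back.

    Every output sample depends on every input sample, which is the
    global mixing that frequency-domain filtering provides. Only the
    real part is returned, so `weights` should be conjugate-symmetric
    if an exactly real result is required.
    """
    spectrum = dft(signal)
    filtered = [s * w for s, w in zip(spectrum, weights)]
    return [v.real for v in dft(filtered, inverse=True)]


if __name__ == "__main__":
    rgb = [[1.0, 0.0], [0.0, 1.0]]
    thermal = [[0.9, 0.1], [0.1, 0.9]]
    print("aligned loss:", contrastive_alignment_loss(rgb, thermal))
    print("swapped loss:", contrastive_alignment_loss(rgb, thermal[::-1]))
    # Keeping only the zero-frequency term spreads the mean to all positions.
    print("DC-only filter:", fourier_filter([1.0, 2.0, 3.0, 4.0], [1, 0, 0, 0]))
```

In this toy setting, the loss is small when corresponding RGB and thermal vectors point in similar directions and large when the pairing is swapped. In a real network the filter weights would typically be learned and applied to 2-D feature maps rather than fixed and applied to a 1-D list.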
