Reliable unmanned aerial vehicle (UAV) detection is critical for autonomous airspace monitoring but remains challenging when integrating sensor streams that differ substantially in resolution, perspective, and field of view. Conventional fusion methods, such as wavelet-based, Laplacian-based, and decision-level approaches, often fail to preserve spatial correspondence across modalities and suffer from annotation inconsistencies, limiting their robustness in real-world settings. This study introduces two fusion strategies, Registration-aware Guided Image Fusion (RGIF) and Reliability-Gated Modality-Attention Fusion (RGMAF), designed to overcome these limitations. RGIF employs Enhanced Correlation Coefficient (ECC)-based affine registration combined with guided filtering to maintain thermal saliency while enhancing structural detail. RGMAF integrates affine and optical-flow registration with a reliability-weighted attention mechanism that adaptively balances thermal contrast and visual sharpness. Experiments were conducted on the Multi-Sensor and Multi-View Fixed-Wing (MMFW)-UAV dataset, which comprises 147,417 annotated air-to-air frames collected from infrared, wide-angle, and zoom sensors. Among single-modality detectors, YOLOv10x demonstrated the most stable cross-domain performance and was selected as the detection backbone for evaluating fused imagery. RGIF improved the visual baseline by 2.13% in mAP@50 (reaching 97.65%), while RGMAF attained the highest recall, 98.64%. These findings show that registration-aware and reliability-adaptive fusion provides a robust framework for integrating heterogeneous modalities, substantially enhancing UAV detection performance in multimodal environments.
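To make the idea of reliability-weighted fusion concrete, the following is a minimal sketch, not the RGMAF implementation. It assumes the thermal and visual frames are already registered to the same grid and scaled to [0, 1]. The reliability proxy (local gradient energy), the softmax weighting, and the names `gradient_energy`, `reliability_weighted_fusion`, and `temperature` are all illustrative assumptions; the abstract does not specify how reliability is computed.

```python
import numpy as np


def gradient_energy(img: np.ndarray) -> np.ndarray:
    """Hypothetical reliability proxy: squared gradient magnitude per pixel."""
    gy, gx = np.gradient(img.astype(np.float64))
    return gx ** 2 + gy ** 2


def reliability_weighted_fusion(thermal, visual, temperature=1.0, eps=1e-8):
    """Fuse two co-registered grayscale images with per-pixel softmax weights.

    Assumption: both inputs share one shape and are scaled to [0, 1].
    Returns the fused image and the (2, H, W) weight maps.
    """
    thermal = np.asarray(thermal, dtype=np.float64)
    visual = np.asarray(visual, dtype=np.float64)
    if thermal.shape != visual.shape:
        raise ValueError("inputs must be registered to the same shape")

    # Stack per-modality reliability scores and rescale them to [0, 1].
    scores = np.stack([gradient_energy(thermal), gradient_energy(visual)])
    scores = scores / (scores.max() + eps)

    # Numerically stable softmax across the modality axis.
    logits = scores / temperature
    logits = logits - logits.max(axis=0, keepdims=True)
    weights = np.exp(logits)
    weights = weights / weights.sum(axis=0, keepdims=True)

    # Per-pixel convex combination of the two modalities.
    fused = weights[0] * thermal + weights[1] * visual
    return fused, weights
```

Because the weights sum to one at every pixel, each fused value lies between the two input values there. A lower `temperature` pushes the weights toward a hard selection of the modality with the stronger local structure, while a higher value approaches plain averaging.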