Detecting Unmanned Aerial Vehicles (UAVs) in low-altitude environments is essential for perception and defense systems but remains highly challenging due to complex backgrounds, camouflage, and multimodal interference. In real-world scenarios, UAVs are frequently visually blended with surrounding structures such as buildings, vegetation, and power lines, resulting in low contrast, weak boundaries, and strong confusion with cluttered background textures. Existing UAV detection datasets, though diverse, are not specifically designed to capture these camouflage and complex-background challenges, which limits progress toward robust real-world perception. To fill this gap, we construct UAV-CB, a new RGB-thermal (RGB-T) UAV detection dataset deliberately curated to emphasize complex low-altitude backgrounds and camouflage characteristics. Furthermore, we propose the Local Frequency Bridge Network (LFBNet), which models features in localized frequency space to bridge both the frequency-spatial fusion gap and the cross-modality discrepancy gap in RGB-T fusion. Extensive experiments on UAV-CB and public benchmarks demonstrate that LFBNet achieves state-of-the-art detection performance and strong robustness under camouflaged and cluttered conditions, offering a frequency-aware perspective on multimodal UAV perception in real-world applications.
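The abstract does not specify how LFBNet builds its localized frequency representation. The following is a purely illustrative sketch and not the authors' method. One common way to get a "local" frequency view of a feature map is to split it into non-overlapping windows and apply a 2D FFT to each window. The fusion rule shown here is a hypothetical choice: it averages the RGB and thermal amplitude spectra and takes the phase of their summed spectrum. The names `to_windows`, `from_windows`, and `local_frequency_fuse`, and the window size `w`, are hypothetical. Inputs are assumed to be single-channel `(H, W)` NumPy arrays with `H` and `W` divisible by `w`.

```python
import numpy as np


def to_windows(x: np.ndarray, w: int) -> np.ndarray:
    """Split an (H, W) map into non-overlapping w x w windows -> (H//w, W//w, w, w)."""
    h, wd = x.shape
    if h % w or wd % w:
        raise ValueError("H and W must be divisible by the window size w")
    return x.reshape(h // w, w, wd // w, w).transpose(0, 2, 1, 3)


def from_windows(win: np.ndarray) -> np.ndarray:
    """Inverse of to_windows: (nh, nw, w, w) -> (nh*w, nw*w)."""
    nh, nw, w, _ = win.shape
    return win.transpose(0, 2, 1, 3).reshape(nh * w, nw * w)


def local_frequency_fuse(rgb: np.ndarray, thermal: np.ndarray, w: int = 8) -> np.ndarray:
    """Hypothetical RGB-T fusion in a windowed (local) frequency domain.

    Not LFBNet: an illustrative assumption only.
    """
    # Per-window 2D FFT gives a localized frequency representation of each modality.
    f_rgb = np.fft.fft2(to_windows(rgb, w), axes=(-2, -1))
    f_thr = np.fft.fft2(to_windows(thermal, w), axes=(-2, -1))
    # Hypothetical fusion rule: mean amplitude, phase of the summed spectrum.
    amp = 0.5 * (np.abs(f_rgb) + np.abs(f_thr))
    phase = np.angle(f_rgb + f_thr)
    fused = amp * np.exp(1j * phase)
    # Back to the spatial domain per window; the fused spectrum stays Hermitian
    # for real inputs, so the imaginary part is numerical noise and is dropped.
    out = np.fft.ifft2(fused, axes=(-2, -1)).real
    return from_windows(out)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    rgb = rng.standard_normal((32, 32))
    thermal = rng.standard_normal((32, 32))
    fused = local_frequency_fuse(rgb, thermal, w=8)
    print(fused.shape)
```

The windowing keeps frequency analysis local, so a small, low-contrast target affects only the spectra of the windows it occupies rather than being diluted in a global spectrum. When both inputs are identical, the fusion returns the input unchanged, which is a simple sanity check on the transform round trip.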