Transmission line defect detection remains challenging for automated UAV inspection because small-scale defects dominate, backgrounds are complex, and illumination varies. Despite recent progress, existing RGB-based detectors struggle to distinguish geometrically subtle defects from visually similar background structures when chromatic contrast is limited. This paper proposes CMAFNet, a Cross-Modal Alignment and Fusion Network that integrates RGB appearance and depth geometry through a principled purify-then-fuse paradigm. CMAFNet consists of two components: a Semantic Recomposition Module, which performs dictionary-based feature purification with a learned codebook to suppress modality-specific noise while preserving defect-discriminative information, and a Contextual Semantic Integration Framework, which captures global spatial dependencies with partial-channel attention to strengthen structural semantic reasoning. Position-wise normalization within the purification stage enforces explicit, reconstruction-driven cross-modal alignment, ensuring that heterogeneous features are statistically compatible before fusion. Extensive experiments on the TLRGBD benchmark, in which 94.5% of instances are small objects, show that CMAFNet achieves 32.2% mAP@50 and 12.5% APs, outperforming the strongest baseline by 9.8 and 4.0 percentage points, respectively. A lightweight variant reaches 24.8% mAP@50 at 228 FPS with only 4.9M parameters, surpassing all evaluated YOLO-based detectors and matching transformer-based methods at substantially lower computational cost.
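The abstract does not give the module's equations, so the following is only a minimal sketch of one plausible reading of purify-then-fuse, written in pure Python. It rests on assumptions that are not stated in the source: position-wise normalization is taken to be zero-mean, unit-variance scaling across channels at each spatial position; a single codebook is shared by both modalities; atoms are weighted by a softmax over dot-product similarities with a temperature `tau`; and fusion is an element-wise sum. The names `position_norm`, `purify`, `reconstruction_error`, and `purify_then_fuse` are hypothetical, and the codebook here is hand-set rather than learned.

```python
import math
from typing import List

Vector = List[float]


def position_norm(x: Vector, eps: float = 1e-5) -> Vector:
    """Normalize one spatial position's channel vector to zero mean, unit variance (assumed form)."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def purify(x: Vector, codebook: List[Vector], tau: float = 1.0) -> Vector:
    """Re-express a normalized feature as a softmax-weighted mix of codebook atoms."""
    z = position_norm(x)
    # Similarity of the normalized feature to every atom, sharpened by temperature tau.
    scores = [sum(a * b for a, b in zip(z, atom)) / tau for atom in codebook]
    weights = softmax(scores)
    # Reconstruction is a convex combination of atoms, so it lies in the codebook's convex hull.
    return [sum(w * atom[c] for w, atom in zip(weights, codebook)) for c in range(len(x))]


def reconstruction_error(x: Vector, codebook: List[Vector], tau: float = 1.0) -> float:
    """Squared error between the normalized input and its reconstruction (a hypothetical training signal)."""
    z = position_norm(x)
    r = purify(x, codebook, tau)
    return sum((a - b) ** 2 for a, b in zip(z, r))


def purify_then_fuse(rgb: List[Vector], depth: List[Vector],
                     codebook: List[Vector], tau: float = 1.0) -> List[Vector]:
    """Purify each modality per position with a shared codebook, then fuse by element-wise sum."""
    fused = []
    for f_rgb, f_depth in zip(rgb, depth):
        p_rgb = purify(f_rgb, codebook, tau)
        p_depth = purify(f_depth, codebook, tau)
        fused.append([a + b for a, b in zip(p_rgb, p_depth)])
    return fused


# Toy example: 2 channels, 2 hand-set atoms (a learned codebook in practice).
codebook = [[1.0, -1.0], [-1.0, 1.0]]
rgb = [[3.0, 1.0]]
depth = [[30.0, 10.0]]  # same channel pattern, 10x the magnitude
fused = purify_then_fuse(rgb, depth, codebook, tau=0.1)
print(fused)  # prints [[2.0, -2.0]]
```

With `tau=0.1`, both the RGB feature and the ten-times-larger depth feature are assigned almost entirely to the first atom, so each purified vector is `[1.0, -1.0]` and the fused result is `[[2.0, -2.0]]`. Without the normalization step, the similarity scores would scale with the raw feature magnitude, and the depth branch would produce much sharper atom assignments than the RGB branch. A real network would operate on tensors, learn the codebook end to end, and likely use a trained fusion layer instead of a plain sum.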