The rapid proliferation of unmanned aerial vehicles (UAVs) has highlighted the importance of robust and efficient object detection in diverse aerial scenarios. Detecting small objects under complex conditions, however, remains a significant challenge. To address this, we present DGE-YOLO, an enhanced YOLO-based detection framework designed to effectively fuse multi-modal information. We introduce a dual-branch architecture for modality-specific feature extraction, enabling the model to process both infrared and visible images. To further enrich semantic representation, we incorporate an Efficient Multi-scale Attention (EMA) mechanism that enhances feature learning across spatial scales. Additionally, we replace the conventional neck with a Gather-and-Distribute (GD) module to mitigate information loss during feature aggregation. Extensive experiments on the DroneVehicle dataset demonstrate that DGE-YOLO outperforms state-of-the-art methods, validating its effectiveness in multi-modal UAV object detection.
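To make the dual-branch idea concrete, the following is a minimal PyTorch sketch of modality-specific stems whose feature maps are fused at each scale. The class names, channel widths, and 1x1 fusion convolutions are illustrative assumptions for exposition only, not the DGE-YOLO implementation, which additionally applies EMA attention and routes the fused features through the GD neck.

```python
# Minimal sketch of a dual-branch, multi-modal backbone, assuming PyTorch.
# DualBranchBackbone, ConvBlock, and the fusion scheme are hypothetical
# placeholders, not the authors' code.
import torch
import torch.nn as nn


class ConvBlock(nn.Module):
    """Conv-BN-SiLU block with stride 2, the basic unit of YOLO-style stems."""

    def __init__(self, c_in, c_out, k=3, s=2):
        super().__init__()
        self.conv = nn.Conv2d(c_in, c_out, k, s, k // 2, bias=False)
        self.bn = nn.BatchNorm2d(c_out)
        self.act = nn.SiLU()

    def forward(self, x):
        return self.act(self.bn(self.conv(x)))


class DualBranchBackbone(nn.Module):
    """Two modality-specific branches with per-scale feature fusion."""

    def __init__(self, channels=(32, 64, 128)):
        super().__init__()

        def stem(c_in):
            layers, c_prev = [], c_in
            for c in channels:
                layers.append(ConvBlock(c_prev, c))
                c_prev = c
            return nn.ModuleList(layers)

        self.ir_branch = stem(1)    # infrared input: 1 channel
        self.rgb_branch = stem(3)   # visible input: 3 channels
        # 1x1 convs fuse the concatenated modality features at each scale.
        self.fuse = nn.ModuleList(
            nn.Conv2d(2 * c, c, kernel_size=1) for c in channels
        )

    def forward(self, ir, rgb):
        feats = []
        for ir_layer, rgb_layer, fuse in zip(self.ir_branch,
                                             self.rgb_branch, self.fuse):
            ir, rgb = ir_layer(ir), rgb_layer(rgb)
            feats.append(fuse(torch.cat([ir, rgb], dim=1)))
        return feats  # multi-scale fused features, consumed by the neck


# Usage: paired infrared/visible inputs yield fused features at three scales.
ir = torch.randn(1, 1, 256, 256)
rgb = torch.randn(1, 3, 256, 256)
for f in DualBranchBackbone()(ir, rgb):
    print(f.shape)
```

Fusing per scale rather than only at the deepest layer keeps modality-specific detail available to every detection head, which matters for the small objects the abstract highlights.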