Visual detection of Micro Air Vehicles (MAVs) has attracted increasing attention in recent years owing to its importance in a wide range of applications. Existing MAV detection methods assume that the training and test sets follow the same distribution. As a result, when deployed in a new domain, these detectors suffer significant performance degradation caused by the domain discrepancy. In this paper, we study the problem of cross-domain MAV detection. Our contributions are threefold. 1) We propose a Multi-MAV-Multi-Domain (M3D) dataset consisting of both simulated and real images. Compared with existing datasets, it is more comprehensive, covering rich scenes, diverse MAV types, and various viewing angles. Based on this dataset, we establish a new benchmark for cross-domain MAV detection. 2) We propose a Noise Suppression Network (NSN) built on a pseudo-labeling framework and a large-to-small training procedure. To suppress pseudo-label noise, which is difficult to handle, the network includes two novel modules. The first is a prior-based curriculum learning module that assigns adaptive thresholds to pseudo labels of different difficulty. The second is a masked copy-paste augmentation module that pastes truly labeled MAVs onto unlabeled target images, thereby reducing pseudo-label noise. 3) Extensive experiments show that the proposed method outperforms state-of-the-art methods. In particular, it achieves an mAP of 46.9% (+5.8%), 50.5% (+3.7%), and 61.5% (+11.3%) on simulation-to-real adaptation, cross-scene adaptation, and cross-camera adaptation, respectively.
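The abstract names two noise-suppression ideas without giving their details. The following is a minimal NumPy sketch of the generic techniques under our own assumptions, not the NSN implementation. Pseudo boxes are filtered with a per-difficulty score threshold, and a labeled MAV patch is pasted into a target image through a binary mask, with its tight box added as a true label. The function names, the integer "difficulty" binning, and the threshold values are hypothetical. How NSN derives its thresholds from priors is not specified here, so the thresholds are simply taken as given.

```python
import numpy as np


def filter_pseudo_labels(boxes, scores, difficulty, thresholds):
    """Keep pseudo boxes whose confidence meets the threshold of their difficulty bin.

    boxes: (N, 4) array of [x1, y1, x2, y2]; scores: (N,) confidences;
    difficulty: (N,) integer bin per box (hypothetical, e.g. binned by box size);
    thresholds: (K,) array, one threshold per difficulty bin (assumed given).
    """
    keep = scores >= thresholds[difficulty]
    return boxes[keep], scores[keep]


def masked_copy_paste(target_img, target_boxes, patch, mask, top_left):
    """Paste a truly labeled object patch into a target image through a binary mask.

    Only pixels where `mask` is True are copied, so the patch background does not
    overwrite the target scene. The tight box around the pasted pixels is appended
    to the target's boxes as a ground-truth label.
    """
    y, x = top_left
    h, w = mask.shape
    if y < 0 or x < 0 or y + h > target_img.shape[0] or x + w > target_img.shape[1]:
        raise ValueError("patch does not fit inside the target image")
    if not mask.any():
        raise ValueError("mask selects no pixels")

    out = target_img.copy()
    region = out[y:y + h, x:x + w]  # a view into `out`, so writes land in `out`
    region[mask] = patch[mask]

    # Tight box from the masked pixels, in target-image coordinates.
    ys, xs = np.nonzero(mask)
    new_box = np.array([[x + xs.min(), y + ys.min(), x + xs.max() + 1, y + ys.max() + 1]],
                       dtype=float)
    boxes = np.concatenate([np.asarray(target_boxes, dtype=float).reshape(-1, 4), new_box])
    return out, boxes
```

For example, pasting an 8×8 white patch whose mask covers rows 2 to 5 and columns 1 to 6 at position (10, 20) of a black 64×64 image adds the box `[21, 12, 27, 16]`. Using the mask-derived box rather than the patch bounds keeps the added label tight around the object.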