The development of multi-modal object detection for Unmanned Aerial Vehicles (UAVs) typically relies on large amounts of pixel-aligned multi-modal image data. However, existing datasets suffer from limited modalities, high construction costs, and imprecise annotations. To address these issues, we propose UEMM-Air, a synthetic multi-modal UAV-based object detection dataset. Specifically, we simulate various UAV flight scenarios and object types using the Unreal Engine (UE). We then design the UAV's flight logic to automatically collect data from different scenarios, perspectives, and altitudes. Finally, we propose a novel heuristic automatic annotation algorithm to generate accurate object detection labels. In total, UEMM-Air consists of 20k pairs of images with 5 modalities and precise annotations. Moreover, we conduct extensive experiments and establish new benchmark results on our dataset. We find that models pre-trained on UEMM-Air perform better on downstream tasks than models pre-trained on other similar datasets. The dataset is publicly available (https://github.com/1e12Leon/UEMM-Air) to support research on multi-modal UAV object detection models.