This technical report presents the first-place model for UG2+, a task in the CVPR 2024 UAV Tracking and Pose-Estimation Challenge. The challenge requires drone detection, UAV-type classification, and 2D/3D trajectory estimation under extreme weather conditions, using multi-modal sensor information that includes stereo vision, multiple LiDARs, radars, and audio arrays. Leveraging these inputs, we propose a multi-modal UAV detection, classification, and 3D tracking method. For classification, we propose a novel pipeline that incorporates sequence fusion, region of interest (ROI) cropping, and keyframe selection, combined with state-of-the-art classification techniques and post-processing steps that improve accuracy and robustness. Our pose estimation pipeline consists of three modules: dynamic point analysis, a multi-object tracker, and trajectory completion. Extensive experiments validate the effectiveness and precision of our approach. We also propose a novel dataset pre-processing method and conduct a comprehensive ablation study of our design. Our method achieved the best performance in both classification and tracking on the MMUAD dataset. The code and configuration of our method are available at https://github.com/dtc111111/Multi-Modal-UAV.
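The abstract names a trajectory-completion module but does not describe how it works. As a hedged illustration only, assuming completion means filling frames where the tracker produced no 3D estimate, the sketch below uses per-axis linear interpolation over time. This is our assumption, not necessarily the authors' method, and `complete_trajectory` is a hypothetical name.

```python
import numpy as np


def complete_trajectory(timestamps, positions):
    """Hypothetical sketch: fill missing 3D positions (rows containing NaN)
    by per-axis linear interpolation over time. Frames before the first or
    after the last valid estimate take the nearest valid value, because
    np.interp clamps to the edge values outside the sampled range.
    Assumes timestamps are sorted in increasing order."""
    t = np.asarray(timestamps, dtype=float)
    p = np.asarray(positions, dtype=float)
    valid = ~np.isnan(p).any(axis=1)  # frames with a complete (x, y, z) estimate
    if not valid.any():
        raise ValueError("trajectory has no valid positions to interpolate from")
    out = p.copy()
    for axis in range(p.shape[1]):
        out[:, axis] = np.interp(t, t[valid], p[valid, axis])
    return out


nan = float("nan")
t = [0.0, 0.1, 0.2, 0.3, 0.4]
pos = [[0, 0, 10], [nan, nan, nan], [2, 0, 10], [nan, nan, nan], [4, 2, 12]]
filled = complete_trajectory(t, pos)
```

Here the gaps at t = 0.1 and t = 0.3 become [1, 0, 10] and [3, 1, 11]. Linear interpolation is the simplest choice; a motion model such as a Kalman smoother would better handle long gaps or maneuvering targets.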