Computer vision-based object detection is a key modality for advanced Detect-And-Avoid systems that enable autonomous UAV flight missions. Standard object detection frameworks do not predict the depth of an object, yet this information is crucial for avoiding collisions. In this paper, we propose several novel extensions to state-of-the-art methods for monocular detection of long-range objects in images. First, we propose Sigmoid and ReLU-like encodings for modeling depth estimation as a regression task. Second, we frame depth estimation as a classification problem and introduce a Soft-Argmax function in the calculation of the training loss. We apply these extensions to the YOLOX object detection framework as an example and evaluate their performance on the Amazon Airborne Object Tracking dataset. In addition, we introduce the Fitness score, a new metric that jointly assesses object detection and depth estimation performance. Our results show that the proposed methods outperform state-of-the-art approaches on both the existing metrics and the proposed one.
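The abstract names the two depth heads but does not define them, so the following is only a minimal sketch of how such heads are commonly decoded, not the authors' formulation. The depth range `D_MIN`/`D_MAX`, the uniform bin layout, the temperature `beta`, and both function names are assumptions introduced for illustration.

```python
import numpy as np

# Hypothetical depth range in metres; these values are not from the paper.
D_MIN, D_MAX = 10.0, 1000.0


def sigmoid_decode(raw):
    """Regression view: squash an unbounded output into (D_MIN, D_MAX)."""
    return D_MIN + (D_MAX - D_MIN) / (1.0 + np.exp(-raw))


def soft_argmax_depth(logits, bin_centers, beta=1.0):
    """Classification view: expected depth under softmax(beta * logits).

    Unlike a hard argmax, this expectation is differentiable, so it can
    be used inside a training loss.
    """
    # Subtract the maximum before exponentiating for numerical stability.
    z = beta * (logits - np.max(logits, axis=-1, keepdims=True))
    p = np.exp(z)
    p /= p.sum(axis=-1, keepdims=True)
    return (p * bin_centers).sum(axis=-1)


# Five uniformly spaced depth bins (an assumed discretisation).
bin_centers = np.linspace(D_MIN, D_MAX, 5)
logits = np.array([0.0, 0.0, 8.0, 0.0, 0.0])

depth_cls = soft_argmax_depth(logits, bin_centers)
depth_reg = sigmoid_decode(0.0)
```

With logits symmetric around the middle bin, the Soft-Argmax estimate coincides with that bin's centre; asymmetric logits would pull the estimate toward neighbouring bins, which is what allows a classification head to output depths between bin centres.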