This paper introduces a novel approach to video object detection and tracking on Unmanned Aerial Vehicles (UAVs). By incorporating metadata, the proposed approach creates a memory map of object locations in real-world coordinates, providing a more robust and interpretable representation of object locations in both image space and the real world. We use this representation to boost confidences, resulting in improved performance on several temporal computer vision tasks, such as video object detection, short- and long-term single- and multi-object tracking, and video anomaly detection. These findings confirm the benefits of metadata in enhancing the capabilities of UAVs in temporal computer vision and pave the way for further advancements in this area.
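To make the idea of a world-coordinate memory map concrete, the following is a minimal sketch and not the paper's implementation. It assumes a nadir-pointing pinhole camera over flat ground, with the image's upward direction aligned to the UAV heading. The names `FrameMeta`, `pixel_to_world`, and `MemoryMap`, the grid cell size, and the additive boosting rule are all hypothetical choices made for illustration.

```python
import math
from dataclasses import dataclass, field


@dataclass
class FrameMeta:
    """Hypothetical per-frame metadata; field names are illustrative."""
    east_m: float       # UAV position in a local metric frame (east)
    north_m: float      # UAV position in a local metric frame (north)
    altitude_m: float   # height above ground, assumed flat
    heading_rad: float  # yaw, measured clockwise from north
    focal_px: float     # focal length in pixels
    cx: float           # principal point, x
    cy: float           # principal point, y


def pixel_to_world(u, v, meta):
    """Project a pixel onto the ground plane (assumes a nadir-pointing camera)."""
    # Metric offsets in the camera frame: similar triangles with altitude / focal length.
    right = (u - meta.cx) * meta.altitude_m / meta.focal_px
    forward = (meta.cy - v) * meta.altitude_m / meta.focal_px
    # Rotate the camera-frame offsets by the heading into east/north.
    s, c = math.sin(meta.heading_rad), math.cos(meta.heading_rad)
    east = meta.east_m + right * c + forward * s
    north = meta.north_m - right * s + forward * c
    return east, north


@dataclass
class MemoryMap:
    """Grid over world coordinates storing the best confidence seen per cell."""
    cell_m: float = 2.0   # assumed cell size in metres
    boost: float = 0.2    # assumed boosting weight
    cells: dict = field(default_factory=dict)

    def _key(self, east, north):
        return (math.floor(east / self.cell_m), math.floor(north / self.cell_m))

    def update_and_boost(self, east, north, conf):
        """Return conf raised by prior evidence in the same cell, then store conf."""
        key = self._key(east, north)
        prior = self.cells.get(key, 0.0)
        boosted = min(1.0, conf + self.boost * prior)
        self.cells[key] = max(prior, conf)
        return boosted


# Usage: a detection at the same world location in a later frame is boosted.
meta = FrameMeta(0.0, 0.0, 100.0, 0.0, 1000.0, 960.0, 540.0)
memory = MemoryMap()
e, n = pixel_to_world(1060.0, 540.0, meta)
first = memory.update_and_boost(e, n, 0.5)
second = memory.update_and_boost(e, n, 0.4)
```

Because the map is indexed by world position rather than pixel position, a detection can be matched to earlier evidence even after the camera has moved, which is the property the abstract attributes to the metadata-based representation.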