In this paper, we address the challenge of multi-object tracking (MOT) in moving Unmanned Aerial Vehicle (UAV) scenarios, where irregular flight trajectories, such as hovering, turning left/right, and moving up/down, lead to significantly greater complexity than fixed-camera MOT. Specifically, changes in the scene background not only render traditional frame-to-frame object IOU association methods ineffective but also introduce significant view shifts in the objects, which further complicates tracking. To overcome these issues, we propose a novel, universal HomView-MOT framework, which, for the first time, harnesses the view Homography inherent in changing scenes to solve MOT challenges in moving environments, incorporating Homographic Matching and View-Centric concepts. We introduce a Fast Homography Estimation (FHE) algorithm for rapid computation of Homography matrices between video frames, enabling object View-Centric ID Learning (VCIL) that leverages multi-view Homography to learn cross-view ID features. Concurrently, our Homographic Matching Filter (HMF) maps object bounding boxes from different frames onto a common view plane for a more physically realistic IOU association. Extensive experiments demonstrate that these innovations allow HomView-MOT to achieve state-of-the-art performance on the prominent UAV MOT datasets VisDrone and UAVDT.
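The core idea behind homography-compensated association — warping a previous frame's bounding boxes through the inter-frame homography before computing IOU — can be sketched as follows. This is a minimal illustration only, not the paper's FHE or HMF implementation; the helper names `warp_box` and `iou` and the example boxes are hypothetical.

```python
import numpy as np

def warp_box(H, box):
    """Map an axis-aligned box (x1, y1, x2, y2) through a 3x3 homography
    and return the axis-aligned bounding box of the warped corners."""
    x1, y1, x2, y2 = box
    corners = np.array([[x1, y1, 1.0], [x2, y1, 1.0],
                        [x2, y2, 1.0], [x1, y2, 1.0]]).T
    warped = H @ corners
    warped = warped[:2] / warped[2]          # perspective divide
    xs, ys = warped
    return (xs.min(), ys.min(), xs.max(), ys.max())

def iou(a, b):
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

# Toy homography: pure camera translation by (10, 5) pixels.
H = np.array([[1.0, 0.0, 10.0],
              [0.0, 1.0, 5.0],
              [0.0, 0.0, 1.0]])
prev_box = (100, 100, 150, 150)
curr_box = (110, 105, 160, 155)   # same static object after camera motion

print(iou(prev_box, curr_box))                # naive frame-to-frame IOU: 0.5625
print(iou(warp_box(H, prev_box), curr_box))   # homography-compensated IOU: 1.0
```

With camera motion compensated, the warped box aligns with the current detection and the IOU recovers to 1.0, whereas the naive association score is degraded by the background shift — the failure mode of fixed-camera IOU association that the abstract describes.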