In Multiple Object Tracking (MOT), objects often exhibit non-linear motion, accelerating and decelerating with irregular direction changes. Tracking-by-detection (TBD) trackers with Kalman Filter motion prediction work well in pedestrian-dominant scenarios but fall short in complex situations where multiple objects perform diverse, non-linear motion simultaneously. To tackle complex non-linear motion, we propose a real-time diffusion-based MOT approach named DiffMOT. Specifically, for the motion predictor component, we propose a novel Decoupled Diffusion-based Motion Predictor (D$^2$MP). It models the entire distribution of the various motions present in the data as a whole. It also predicts an individual object's motion conditioned on that object's historical motion information. Furthermore, it optimizes the diffusion process to use far fewer sampling steps. As an MOT tracker, DiffMOT runs in real time at 22.7 FPS and outperforms the state of the art on the DanceTrack and SportsMOT datasets with $62.3\%$ and $76.2\%$ HOTA, respectively. To the best of our knowledge, DiffMOT is the first to introduce a diffusion probabilistic model into MOT to tackle non-linear motion prediction.
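The limitation of linear motion models noted above can be illustrated with a minimal sketch. It is not DiffMOT or D$^2$MP, and it is not a full Kalman Filter: it reproduces only the constant-velocity extrapolation that such a filter's prediction step performs, with no measurement update or noise model. The function names and the 1-D trajectories are hypothetical and chosen purely for illustration.

```python
def constant_velocity_predict(prev, curr):
    """Extrapolate the next position assuming the velocity stays constant."""
    return curr + (curr - prev)


def prediction_errors(positions):
    """Absolute one-step-ahead errors of constant-velocity extrapolation."""
    errors = []
    for t in range(2, len(positions)):
        predicted = constant_velocity_predict(positions[t - 2], positions[t - 1])
        errors.append(abs(positions[t] - predicted))
    return errors


# Hypothetical 1-D trajectories (e.g. a box-centre x-coordinate per frame).
linear = [2.0 * t for t in range(6)]            # constant velocity
accelerating = [0.5 * t * t for t in range(6)]  # constant acceleration a = 1

linear_errors = prediction_errors(linear)
accel_errors = prediction_errors(accelerating)

print("linear:", linear_errors)
print("accelerating:", accel_errors)
```

Under these assumptions the extrapolation is exact for the constant-velocity track, while the accelerating track incurs an error in every frame equal to the acceleration times the squared frame interval. A learned, history-conditioned predictor is meant to close this kind of gap.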