Beyond Kalman Filters: Deep Learning-Based Filters for Improved Object Tracking

Traditional tracking-by-detection systems typically employ Kalman filters (KF) for state estimation. However, the KF requires domain-specific design choices and is ill-suited to non-linear motion patterns. To address these limitations, we propose two data-driven filtering methods. The first employs a Bayesian filter with a trainable motion model to predict an object's future location, and combines these predictions with observations from an object detector to improve bounding-box prediction accuracy. It also dispenses with most of the domain-specific design choices characteristic of the KF. The second, an end-to-end trainable filter, goes a step further by learning to correct detector errors, which further reduces the need for domain expertise. Additionally, we introduce a range of motion model architectures, based on Recurrent Neural Networks, Neural Ordinary Differential Equations, and Conditional Neural Processes, that are combined with the proposed filtering methods. Our extensive evaluation across multiple datasets demonstrates that the proposed filters outperform the traditional KF in object tracking, especially for non-linear motion patterns, the use case to which they are best suited. A noise-robustness analysis of the filters also yields positive results. We further propose a new cost function for associating observations with tracks. Our tracker, which combines this association cost with the proposed filters, outperforms the conventional SORT method and other motion-based trackers in multi-object tracking according to multiple metrics on the motion-rich DanceTrack and SportsMOT datasets.
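
To make the idea of combining a motion-model prediction with a detector observation concrete, the following is a minimal sketch of a Bayesian update under simplifying assumptions that the abstract does not state: both the prediction and the detection are treated as independent Gaussians with diagonal covariance over the box coordinates, and the detector observes the box directly. The function name `fuse_gaussian`, the box layout, and all numeric values are hypothetical and chosen for illustration only; the filters proposed in the paper may differ.

```python
def fuse_gaussian(pred_mean, pred_var, obs_mean, obs_var):
    """Fuse a predicted box with an observed box, one coordinate at a time.

    Assumption (not from the abstract): each coordinate is an independent
    Gaussian, so the posterior is the normalized product of two Gaussians.
    """
    fused_mean, fused_var = [], []
    for mp, vp, mo, vo in zip(pred_mean, pred_var, obs_mean, obs_var):
        # The gain weights the observation by the relative uncertainty
        # of the prediction: a noisy detection yields a small gain.
        gain = vp / (vp + vo)
        fused_mean.append(mp + gain * (mo - mp))
        fused_var.append((1.0 - gain) * vp)
    return fused_mean, fused_var


# Hypothetical box as (center x, center y, width, height), in pixels.
pred_mean = [100.0, 50.0, 40.0, 80.0]   # stand-in for a learned motion model's output
pred_var = [4.0, 4.0, 1.0, 1.0]         # stand-in for its predicted variance
obs_mean = [104.0, 52.0, 40.0, 82.0]    # stand-in for a detector's box
obs_var = [4.0, 12.0, 1.0, 3.0]         # assumed detector noise variance

mean, var = fuse_gaussian(pred_mean, pred_var, obs_mean, obs_var)
print(mean)
print(var)
```

In this sketch the fused estimate moves toward whichever source has the lower variance, and the fused variance is never larger than the predicted one. In a data-driven filter, the predicted mean and variance would come from a trained motion model rather than from hand-set constants.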
