Self-supervised multi-object trackers have tremendous potential because they can learn from raw, domain-specific data. However, their re-identification accuracy still falls short of that of their supervised counterparts. We hypothesize that this shortfall results from formulating self-supervised objectives that are limited to single frames or frame pairs. Such formulations do not capture enough variation in visual appearance to learn consistent re-identification features for autonomous driving when the frame rate is low or object dynamics are high. In this work, we propose a training objective that enables self-supervised learning of re-identification features from multiple sequential frames by enforcing consistent association scores across short and long timescales. We perform extensive evaluations demonstrating that re-identification features trained on longer sequences significantly reduce ID switches on standard autonomous driving datasets compared to existing self-supervised learning methods, which are limited to training on frame pairs. Using our proposed SubCo loss function, we set a new state of the art among self-supervised methods and even perform on par with fully supervised learning methods.