Without manually annotated identities, unsupervised multi-object trackers struggle to learn reliable feature embeddings. This makes the similarity-based inter-frame association stage error-prone as well, giving rise to an uncertainty problem. The uncertainty accumulates frame by frame and prevents trackers from learning feature embeddings that stay consistent over time. To avoid this problem, recent work has adopted self-supervised techniques, but these fail to capture temporal relations, so the inter-frame uncertainty persists. This paper argues that although the uncertainty problem is inevitable, the uncertainty itself can be leveraged to improve the learned consistency. Specifically, an uncertainty-based metric is developed to verify and rectify risky associations. The resulting accurate pseudo-tracklets strengthen the learning of feature consistency. Accurate tracklets also allow temporal information to be incorporated into spatial transformations: this paper proposes a tracklet-guided augmentation strategy that simulates tracklets' motion and adopts a hierarchical uncertainty-based sampling mechanism for hard sample mining. The resulting unsupervised MOT framework, U2MOT, is shown to be effective on the MOT-Challenges and VisDrone-MOT benchmarks, where it achieves state-of-the-art (SOTA) performance among published supervised and unsupervised trackers.
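To make the idea of verifying risky associations concrete, the following is a minimal sketch and not the metric defined in the paper. It assumes, purely for illustration, that the uncertainty of a matched track-detection pair can be approximated by the margin between the pair's cosine similarity and the strongest competing similarity in the same row or column. The function names `association_margins` and `flag_risky` and the threshold `min_margin` are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two embedding vectors given as lists."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def association_margins(track_embs, det_embs, matches):
    """Return {(track, det): margin} for each matched pair.

    The margin is the matched similarity minus the best rival similarity,
    where rivals are the other detections for the same track and the other
    tracks for the same detection. A small margin means an ambiguous match.
    This margin is an illustrative stand-in, not the paper's metric.
    """
    sim = [[cosine(t, d) for d in det_embs] for t in track_embs]
    margins = {}
    for ti, di in matches:
        rivals = [sim[ti][j] for j in range(len(det_embs)) if j != di]
        rivals += [sim[i][di] for i in range(len(track_embs)) if i != ti]
        best_rival = max(rivals) if rivals else -1.0
        margins[(ti, di)] = sim[ti][di] - best_rival
    return margins


def flag_risky(margins, min_margin=0.2):
    """List the matched pairs whose margin falls below the assumed threshold."""
    return [pair for pair, m in margins.items() if m < min_margin]
```

Pairs returned by `flag_risky` would be the candidates for rectification, for example by re-matching them against longer tracklet history. How that rectification is actually performed is specified in the paper itself and is not reproduced here.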