Following the tracking-by-attention paradigm, this paper introduces an object-centric, transformer-based framework for tracking in 3D. Traditional model-based tracking approaches account for the geometric effect of object motion and ego motion between frames with a geometric motion model. Inspired by this, we propose S.T.A.R.-Track, which uses a novel latent motion model (LMM) to additionally adjust object queries for changes in viewing direction and lighting conditions directly in the latent space, while still modeling the geometric motion explicitly. Combined with a novel learnable track embedding that aids in modeling the existence probability of tracks, this results in a generic tracking framework that can be integrated with any query-based detector. Extensive experiments on the nuScenes benchmark demonstrate the benefits of our approach, showing state-of-the-art performance for DETR3D-based trackers while drastically reducing the number of identity switches.
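To make the explicit geometric part of such a motion model concrete, the following is a minimal sketch and not the S.T.A.R.-Track implementation. It assumes a constant-velocity object model and a planar ego rotation (yaw only); the function name `compensate_motion` and all of its parameters are hypothetical and chosen for illustration. The latent adjustment of object queries performed by the LMM is learned and is not covered here.

```python
import math


def compensate_motion(center, velocity, dt, ego_translation, ego_yaw):
    """Predict an object's center in the current ego frame (illustrative only).

    center, velocity: (x, y, z) tuples expressed in the previous ego frame.
    dt:               time between the two frames in seconds.
    ego_translation:  (x, y, z) of the current ego origin, expressed in the
                      previous ego frame (assumed to be known from odometry).
    ego_yaw:          rotation of the current ego frame relative to the
                      previous one, in radians about the z axis.
    """
    # Object motion: constant-velocity prediction in the previous ego frame.
    px = center[0] + velocity[0] * dt
    py = center[1] + velocity[1] * dt
    pz = center[2] + velocity[2] * dt

    # Ego motion: express the predicted point in the current ego frame by
    # applying the inverse rigid transform R^T (p - t).
    dx = px - ego_translation[0]
    dy = py - ego_translation[1]
    dz = pz - ego_translation[2]
    c, s = math.cos(ego_yaw), math.sin(ego_yaw)
    return (c * dx + s * dy, -s * dx + c * dy, dz)
```

For example, an object 10 m ahead moving forward at 1 m/s, observed again 0.5 s later after the ego vehicle has driven 2 m straight ahead, would be predicted at 8.5 m ahead in the new ego frame.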