Many query-based approaches for 3D Multi-Object Tracking (MOT) adopt the tracking-by-attention paradigm, using track queries for identity-consistent detection and object queries for identity-agnostic track spawning. Tracking-by-attention, however, entangles detection and tracking in a single query embedding that must serve both tasks, which is sub-optimal. Other approaches resemble the tracking-by-detection paradigm: they detect objects using decoupled track and detection queries and then associate them in a separate step. These methods, however, do not leverage the synergies between the detection and association tasks. Combining the strengths of both paradigms, we introduce ADA-Track, a novel end-to-end framework for 3D MOT from multi-view cameras. We introduce a learnable data association module based on edge-augmented cross-attention, leveraging appearance and geometric features. Furthermore, we integrate this association module into the decoder layer of a DETR-based 3D detector, enabling simultaneous DETR-like query-to-image cross-attention for detection and query-to-query cross-attention for data association. By stacking these decoder layers, queries are refined alternately for the detection and association tasks, effectively harnessing the task dependencies. We evaluate our method on the nuScenes dataset and demonstrate the advantage of our approach compared to the two previous paradigms. Code is available at https://github.com/dsx0511/ADA-Track.
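To make the idea of edge-augmented cross-attention more concrete, the following is a minimal NumPy sketch of one plausible form of it, not the ADA-Track implementation. It assumes that track queries attend to detection queries and that a per-pair edge feature (for example, an appearance or geometric cue for each track-detection pair) is projected to a scalar bias on the attention logits. The function name, the weight matrices `w_q`, `w_k`, `w_v`, `w_e`, and the single-head residual update are all illustrative assumptions.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def edge_augmented_cross_attention(track_q, det_q, edge_feat, w_q, w_k, w_v, w_e):
    """Hypothetical single-head edge-augmented cross-attention.

    track_q:   (T, d)      track query embeddings
    det_q:     (D, d)      detection query embeddings
    edge_feat: (T, D, d_e) per-pair edge features (assumed given)
    w_q, w_k, w_v: (d, d)  projection matrices (assumed learnable)
    w_e:       (d_e,)      projects each edge feature to a scalar logit bias
    """
    d = track_q.shape[-1]
    q = track_q @ w_q                     # (T, d)
    k = det_q @ w_k                       # (D, d)
    v = det_q @ w_v                       # (D, d)
    logits = (q @ k.T) / np.sqrt(d)       # (T, D) scaled dot-product scores
    logits = logits + edge_feat @ w_e     # add edge-derived bias, still (T, D)
    attn = softmax(logits, axis=-1)       # each track's weights over detections
    updated_track_q = track_q + attn @ v  # residual update of track queries
    return updated_track_q, attn
```

In this sketch the attention matrix `attn` can be read as a soft track-to-detection assignment; how the actual method turns such scores into associations and supervises them is described in the paper and repository, not here.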