Offline 3D multi-object tracking (MOT) is a critical component of the 4D auto-labeling (4DAL) process. It enhances pseudo-labels generated by high-performance detectors by incorporating temporal context. However, existing offline 3D MOT approaches are direct extensions of online frameworks and fail to fully exploit the advantages of the offline setting. Moreover, these methods often depend on fixed upstream components and customized architectures, which limits their adaptability. To address these limitations, we propose Offline-Poly, a general offline 3D MOT method based on a tracking-centric design. We introduce a standardized paradigm termed Tracking-by-Tracking (TBT), which operates exclusively on arbitrary off-the-shelf tracking outputs and produces offline-refined tracklets. This formulation decouples the offline tracker from any specific upstream detector or tracker. Under the TBT paradigm, Offline-Poly accepts one or more coarse tracking results and processes them through a structured pipeline comprising pre-processing, hierarchical matching and fusion, and tracklet refinement. Each module is designed to capitalize on the two fundamental properties of offline tracking: resource unconstrainedness, which permits global optimization beyond real-time limits, and future observability, which enables tracklet reasoning over the full temporal horizon. Offline-Poly first eliminates short-term ghost tracklets and re-identifies fragmented segments using global scene context. It then constructs scene-level similarity to associate tracklets across multiple input sources. Finally, it refines tracklets by jointly leveraging local and global motion patterns. On nuScenes, Offline-Poly achieves state-of-the-art (SOTA) performance with 77.6% AMOTA. On KITTI, it achieves leading results with 83.00% HOTA. Comprehensive experiments further validate the flexibility, generalizability, and modular effectiveness of Offline-Poly.
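To make the TBT input/output contract concrete, the following is a minimal sketch of a tracking-in, tracking-out refinement step. It is not the Offline-Poly implementation: the `Observation` type, the function names, the length-based ghost filter, and the centered moving average are all assumptions chosen for illustration. It only shows how an offline stage can consume tracklets from any upstream tracker, discard very short tracklets, and smooth positions using both past and future frames (future observability).

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Observation:
    """Hypothetical 2D box center of one object at one frame."""
    frame: int
    x: float
    y: float


# Hypothetical tracker output: track id -> observations sorted by frame.
Tracklets = Dict[int, List[Observation]]


def drop_short_tracklets(tracks: Tracklets, min_len: int = 3) -> Tracklets:
    """Assumed ghost filter: discard tracklets with fewer than min_len observations."""
    return {tid: obs for tid, obs in tracks.items() if len(obs) >= min_len}


def smooth_centered(obs: List[Observation], half_window: int = 1) -> List[Observation]:
    """Centered moving average; the window includes future frames,
    which is only possible in the offline setting."""
    smoothed = []
    for i, o in enumerate(obs):
        lo = max(0, i - half_window)
        hi = min(len(obs), i + half_window + 1)
        window = obs[lo:hi]
        smoothed.append(
            Observation(
                frame=o.frame,
                x=sum(w.x for w in window) / len(window),
                y=sum(w.y for w in window) / len(window),
            )
        )
    return smoothed


def refine_offline(tracks: Tracklets, min_len: int = 3, half_window: int = 1) -> Tracklets:
    """Tracking in, tracking out: filter short tracklets, then smooth the rest."""
    kept = drop_short_tracklets(tracks, min_len)
    return {tid: smooth_centered(obs, half_window) for tid, obs in kept.items()}
```

Because the function consumes and returns the same `Tracklets` structure, it can sit behind any upstream tracker without knowing how the tracklets were produced, which is the decoupling the TBT paradigm describes. The matching-and-fusion stage across multiple input sources is omitted here.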