Object detection and trajectory forecasting play a crucial role in scene understanding for autonomous driving. These tasks are typically executed in a cascade, which makes them prone to compounding errors. Furthermore, the interface between the two tasks is usually very thin, creating a lossy information bottleneck. To address these challenges, our approach formulates the union of the two tasks as a trajectory refinement problem, where the first pose is the detection (current time) and the subsequent poses are the waypoints of the multiple forecasts (future time). To tackle this unified task, we design a refinement transformer that infers the presence, pose, and multi-modal future behaviors of objects directly from LiDAR point clouds and high-definition maps. We call this model DeTra, short for object Detection and Trajectory forecasting. In our experiments, we observe that \ourmodel{} outperforms the state of the art on Argoverse 2 Sensor and the Waymo Open Dataset by a large margin across a broad range of metrics. Last but not least, we perform extensive ablation studies that show the value of refinement for this task, confirm that every proposed component contributes positively to its performance, and validate our key design choices.
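To make the unified formulation concrete, the following is a minimal sketch of the data layout it implies: one current-time pose (the detection) prepended to each of K forecast modes, giving K trajectories of T + 1 poses that can all be refined together. It assumes an (x, y, heading) bird's-eye-view pose and replaces the refinement transformer with a generic residual update function. `ObjectHypothesis`, `Refiner`, and `refine` are hypothetical names chosen for illustration, not DeTra's actual interface, and the presence score is held fixed here for simplicity.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Assumed pose parameterization: (x, y, heading) in a bird's-eye-view frame.
Pose = Tuple[float, float, float]


@dataclass
class ObjectHypothesis:
    """Hypothetical container for one object in the unified task.

    detection: pose at the current time (t = 0).
    forecasts: K modes, each a list of T future waypoints (t = 1..T).
    """
    presence: float               # confidence that the object exists
    detection: Pose
    forecasts: List[List[Pose]]

    def trajectories(self) -> List[List[Pose]]:
        # Each mode becomes a (T + 1)-pose trajectory whose first pose is
        # the detection, so both tasks share a single representation.
        return [[self.detection] + list(mode) for mode in self.forecasts]


# Hypothetical stand-in for one refinement block: maps (time index, pose)
# to a residual offset. A real model would compute this from LiDAR and
# map features; here it can be any callable.
Refiner = Callable[[int, Pose], Pose]


def _add(pose: Pose, delta: Pose) -> Pose:
    # Element-wise residual update of a pose.
    return (pose[0] + delta[0], pose[1] + delta[1], pose[2] + delta[2])


def refine(hyp: ObjectHypothesis, step: Refiner, num_steps: int) -> ObjectHypothesis:
    """Apply num_steps residual updates to the detection and all waypoints."""
    det = hyp.detection
    modes = [list(mode) for mode in hyp.forecasts]  # copy; input is not mutated
    for _ in range(num_steps):
        det = _add(det, step(0, det))  # current-time pose (t = 0)
        modes = [
            [_add(p, step(t, p)) for t, p in enumerate(mode, start=1)]
            for mode in modes
        ]
    return ObjectHypothesis(hyp.presence, det, modes)
```

Refining the detection and the future waypoints in one loop is the point of the formulation: the current-time pose is not frozen before forecasting begins, so later refinement steps can still correct it.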