In autonomous driving, addressing occlusion scenarios is crucial yet challenging. Robust surrounding perception is essential for handling occlusions and aiding motion planning. State-of-the-art models fuse LiDAR and camera data to produce impressive perception results, but detecting occluded objects remains challenging. In this paper, we emphasize the crucial role of temporal cues by integrating them alongside these modalities to address this challenge. We propose a novel approach for bird's-eye-view semantic grid segmentation that leverages sequential sensor data to achieve robustness against occlusions. Our model extracts information from the sensor readings using attention operations and aggregates it into a lower-dimensional latent representation, thus enabling the processing of multi-step inputs at each prediction step. Moreover, we show how the model can be directly applied to forecast the evolution of traffic scenes and be seamlessly integrated into a motion planner for trajectory planning. On semantic segmentation tasks, we evaluate our model on the nuScenes dataset and show that it outperforms other baselines, with particularly large margins on occluded and partially occluded vehicles. Additionally, on the motion planning task, we are among the first teams to train and evaluate on nuPlan, a cutting-edge large-scale dataset for motion planning.
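The attention-based aggregation into a lower-dimensional latent described above can be illustrated with a minimal NumPy sketch of Perceiver-style cross-attention, where a fixed-size set of latent vectors repeatedly attends to per-step sensor tokens. All names, dimensions, and the random projection weights here are illustrative assumptions, not the paper's actual architecture.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax over the given axis
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attend(latent, tokens, d):
    # latent: (L, d) fixed-size queries; tokens: (N, d) sensor features.
    # Random projections stand in for learned weights (illustrative only).
    rng = np.random.default_rng(0)
    Wq, Wk, Wv = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))
    q, k, v = latent @ Wq, tokens @ Wk, tokens @ Wv
    attn = softmax(q @ k.T / np.sqrt(d))   # (L, N) attention weights
    return latent + attn @ v               # residual update of the latent

# Aggregate a 3-step sensor sequence into one fixed-size latent:
# each time step may contribute a different number of tokens, but the
# latent stays (L, d), so later modules see a constant-size input.
d, L = 32, 16
latent = np.zeros((L, d))
for t, n_tokens in enumerate([200, 150, 180]):
    sensor_tokens = np.random.default_rng(t).standard_normal((n_tokens, d))
    latent = cross_attend(latent, sensor_tokens, d)
print(latent.shape)  # (16, 32)
```

The key design point this sketch shows is that the cost of attending to each step's tokens is linear in the token count, while the latent size is constant, which is what makes processing multi-step inputs at every prediction step tractable.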