Driving scene understanding aims to extract comprehensive scene information from sensor data and to provide a basis for downstream tasks, making it indispensable for the safety of self-driving vehicles. It is commonly approached through specific perception tasks such as object detection and scene graph generation. However, the outputs of these tasks amount only to samples drawn from the high-dimensional scene features and are therefore insufficient to represent the full scenario. In addition, the objective of perception tasks is inconsistent with human driving, which focuses only on what may affect the ego-trajectory. We therefore propose an end-to-end Interpretable Implicit Driving Scene Understanding (II-DSU) model that extracts implicit high-dimensional scene features as the scene understanding result under the guidance of a planning module, and that validates the plausibility of this understanding through auxiliary perception tasks used for visualization. Experimental results on CARLA benchmarks show that our approach achieves new state-of-the-art performance and obtains scene features that embody richer driving-relevant scene information, enabling superior downstream planning performance.