Perception Helps Planning: Facilitating Multi-Stage Lane-Level Integration via Double-Edge Structures

Planning for autonomous driving must account for essential traffic elements such as lanes, intersections, traffic regulations, and dynamic agents. However, traditional end-to-end planning methods often overlook these elements, which likely leads to inefficient driving and non-compliance with traffic regulations. In this work, we integrate the perception of these elements into the planning task. To this end, we propose Perception Helps Planning (PHP), a novel framework that reconciles lane-level planning with perception. This integration keeps planning inherently aligned with traffic constraints, thus facilitating safe and efficient driving. Specifically, PHP uses both edges of a lane for planning and perception, taking into account the 3D positions of the two lane edges together with attributes for lane intersections, lane directions, lane occupancy, and planning. Algorithmically, the pipeline has three stages. First, a transformer encodes multi-camera images to extract the above features and predict lane-level perception results. Next, a hierarchical feature early-fusion module refines the features to predict planning attributes. Finally, a double-edge interpreter applies a late-fusion process, designed specifically to integrate lane-level perception and planning information, and generates the vehicle control signals. Experiments on three CARLA benchmarks show driving-score improvements of 27.20%, 33.47%, and 15.54%, respectively, over existing algorithms, achieving state-of-the-art performance, with the system running at up to 22.57 FPS.
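To make the double-edge idea more concrete, the following minimal Python sketch shows one way a lane could be represented by its two 3D edges plus the attributes named above (intersection, direction, occupancy, planning), and how a simple rule-based stand-in for the late-fusion step might turn such lanes into waypoints. It is not the PHP implementation: the class `DoubleEdgeLane`, the functions `centerline` and `select_waypoints`, the field names, and the midpoint rule are all assumptions introduced here for illustration, and the learned transformer and fusion modules are omitted entirely.

```python
from dataclasses import dataclass
from typing import List, Tuple

Point3D = Tuple[float, float, float]


@dataclass
class DoubleEdgeLane:
    """Hypothetical container for one lane described by its two edges.

    Field names are illustrative assumptions, not taken from the paper.
    """
    left_edge: List[Point3D]       # 3D points sampled along the left edge
    right_edge: List[Point3D]      # 3D points sampled along the right edge
    is_intersection: bool = False  # lane-intersection attribute
    direction: str = "forward"     # lane-direction attribute (assumed label)
    occupied: bool = False         # lane-occupancy attribute
    planned: bool = False          # planning attribute: lane chosen for the ego vehicle


def centerline(lane: DoubleEdgeLane) -> List[Point3D]:
    """Return the midpoints between corresponding left and right edge points."""
    return [
        tuple((l + r) / 2.0 for l, r in zip(p_left, p_right))
        for p_left, p_right in zip(lane.left_edge, lane.right_edge)
    ]


def select_waypoints(lanes: List[DoubleEdgeLane]) -> List[Point3D]:
    """Rule-based stand-in for late fusion (an assumption, not the paper's method).

    Keeps lanes that are marked as planned and are not occupied,
    then concatenates their centerline points as waypoints.
    """
    waypoints: List[Point3D] = []
    for lane in lanes:
        if lane.planned and not lane.occupied:
            waypoints.extend(centerline(lane))
    return waypoints


if __name__ == "__main__":
    ego_lane = DoubleEdgeLane(
        left_edge=[(-1.75, 0.0, 0.0), (-1.75, 5.0, 0.0)],
        right_edge=[(1.75, 0.0, 0.0), (1.75, 5.0, 0.0)],
        planned=True,
    )
    blocked_lane = DoubleEdgeLane(
        left_edge=[(1.75, 0.0, 0.0), (1.75, 5.0, 0.0)],
        right_edge=[(5.25, 0.0, 0.0), (5.25, 5.0, 0.0)],
        planned=True,
        occupied=True,
    )
    print(select_waypoints([ego_lane, blocked_lane]))
```

In this sketch the occupied lane is dropped and only the free planned lane contributes waypoints. A downstream controller would then convert those waypoints into steering and throttle commands, a step left out here.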
