Understanding road geometry is a critical component of the autonomous vehicle (AV) stack. While high-definition (HD) maps can readily provide such information, they suffer from high labeling and maintenance costs. Accordingly, many recent works have proposed methods for estimating HD maps online from sensor data. The vast majority of recent approaches encode multi-camera observations into an intermediate representation, e.g., a bird's eye view (BEV) grid, and produce vector map elements via a decoder. While this architecture is performant, it decimates much of the information encoded in the intermediate representation, preventing downstream tasks (e.g., behavior prediction) from leveraging it. In this work, we propose exposing the rich internal features of online map estimation methods and show how they enable more tightly integrating online mapping with trajectory forecasting. In doing so, we find that directly accessing internal BEV features yields up to 73% faster inference speeds and up to 29% more accurate predictions on the real-world nuScenes dataset.
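To make the proposed data flow concrete, the following is a minimal sketch of how a trajectory forecaster could consume internal BEV features directly, rather than a decoded vector map. It uses single-head dot-product attention in which per-agent queries attend over flattened BEV grid cells; all dimensions, names, and the attention formulation are illustrative assumptions, not the paper's actual architecture.

```python
import numpy as np

# Hypothetical dimensions for illustration (not taken from the paper).
H, W, C = 50, 50, 64   # BEV grid height, width, and feature channels
N_AGENTS, D = 4, 64    # number of agents to forecast, agent embedding size

rng = np.random.default_rng(0)
bev_features = rng.standard_normal((H * W, C))    # flattened BEV feature grid
agent_queries = rng.standard_normal((N_AGENTS, D))  # one query per agent

def attend(queries, keys_values):
    """Single-head dot-product attention: each agent attends over BEV cells."""
    scores = queries @ keys_values.T / np.sqrt(queries.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ keys_values

# Direct route: the forecaster gathers map context straight from the
# intermediate BEV representation, skipping vector-map decoding entirely.
agent_map_context = attend(agent_queries, bev_features)
print(agent_map_context.shape)  # (4, 64)
```

The design point this illustrates: decoding to vector map elements compresses the BEV grid into a handful of polylines, whereas attending over the grid itself lets the predictor draw on the full intermediate representation.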