A self-driving vehicle (SDV) must be able to perceive its surroundings and predict the future behavior of other traffic participants. Existing works either perform object detection followed by trajectory forecasting of the detected objects, or predict dense occupancy and flow grids for the whole scene. The former poses a safety concern because the number of detections must be kept low for efficiency, which sacrifices object recall. The latter is computationally expensive due to the high dimensionality of the output grid, and suffers from the limited receptive field inherent to fully convolutional networks. Furthermore, both approaches spend substantial computation on predicting areas or objects that might never be queried by the motion planner. This motivates our unified approach to perception and future prediction, which implicitly represents occupancy and flow over time with a single neural network. Our method avoids unnecessary computation, as it can be queried directly by the motion planner at continuous spatio-temporal locations. Moreover, we design an architecture that overcomes the limited receptive field of previous explicit occupancy prediction methods by adding an efficient yet effective global attention mechanism. Through extensive experiments in both urban and highway settings, we demonstrate that our implicit model outperforms the current state of the art. For more information, visit the project website: https://waabi.ai/research/implicito.
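To make the idea of querying at continuous spatio-temporal locations concrete, the following is a minimal sketch and not the architecture described in the abstract. It assumes a bird's-eye-view feature map is already available, interpolates it bilinearly at each query location, and feeds the result together with the query (x, y, t) to a tiny MLP that returns an occupancy probability and a 2D flow vector. The names `bilinear_sample`, `ToyImplicitDecoder`, and `query` are hypothetical, the weights are random rather than trained, and the global attention mechanism is omitted.

```python
import numpy as np


def bilinear_sample(feat, xy):
    """Interpolate feat (H, W, C) at continuous pixel coordinates xy (N, 2), given as (x, y)."""
    h, w, _ = feat.shape
    x = np.clip(xy[:, 0], 0.0, w - 1.0)
    y = np.clip(xy[:, 1], 0.0, h - 1.0)
    # Top-left corner of the enclosing cell, kept inside the grid.
    x0 = np.minimum(np.floor(x).astype(int), w - 2)
    y0 = np.minimum(np.floor(y).astype(int), h - 2)
    x1, y1 = x0 + 1, y0 + 1
    wx = (x - x0)[:, None]
    wy = (y - y0)[:, None]
    top = (1 - wx) * feat[y0, x0] + wx * feat[y0, x1]
    bottom = (1 - wx) * feat[y1, x0] + wx * feat[y1, x1]
    return (1 - wy) * top + wy * bottom


class ToyImplicitDecoder:
    """Hypothetical stand-in for an implicit occupancy-flow decoder (random, untrained weights)."""

    def __init__(self, feat_dim, hidden=32, seed=0):
        rng = np.random.default_rng(seed)
        in_dim = feat_dim + 3  # interpolated feature plus the (x, y, t) query
        self.w1 = rng.normal(0.0, 0.1, (in_dim, hidden))
        self.b1 = np.zeros(hidden)
        self.w2 = rng.normal(0.0, 0.1, (hidden, 3))  # 1 occupancy logit + 2 flow components
        self.b2 = np.zeros(3)

    def query(self, feat, points):
        """points: (N, 3) array of (x, y, t). Returns occupancy (N,) and flow (N, 2)."""
        points = np.asarray(points, dtype=float)
        z = bilinear_sample(feat, points[:, :2])
        hidden = np.maximum(0.0, np.concatenate([z, points], axis=1) @ self.w1 + self.b1)
        out = hidden @ self.w2 + self.b2
        occupancy = 1.0 / (1.0 + np.exp(-out[:, 0]))  # sigmoid maps the logit to a probability
        flow = out[:, 1:]
        return occupancy, flow


# A planner only evaluates the points it cares about, not a dense grid.
feat = np.random.default_rng(1).normal(size=(16, 16, 8))  # placeholder feature map
decoder = ToyImplicitDecoder(feat_dim=8)
queries = np.array([[3.5, 7.25, 0.0], [3.5, 7.25, 1.5], [10.0, 2.0, 3.0]])
occ, flow = decoder.query(feat, queries)
```

The cost of a call scales with the number of query points rather than with the resolution of an output grid, which is the property the abstract attributes to the implicit formulation.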