Motion prediction is pivotal for autonomous driving systems: it supplies the information a vehicle needs to choose a behavior strategy within its surroundings. Existing motion prediction techniques mostly predict the future trajectory of each agent in the scene individually, based on that agent's past trajectory. In this paper, we introduce an end-to-end neural network method that predicts the future behavior of all dynamic objects in the environment jointly, using the occupancy map and the motion flow of the scene. We investigate several alternatives for constructing a deep encoder-decoder model called OFMPNet. The model takes as input a sequence of bird's-eye-view road images, an occupancy grid, and prior motion flow. Its encoder can incorporate transformer, attention-based, or convolutional units, and its decoder can use convolutional modules, recurrent blocks, or both. Additionally, we propose a novel time-weighted motion flow loss, which substantially decreases end-point error. Our approach achieves state-of-the-art results on the Waymo Occupancy and Flow Prediction benchmark, with a Soft IoU of 52.1% and an AUC of 76.75% on Flow-Grounded Occupancy.
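To make the two quantities named in the abstract concrete, the following is a minimal sketch, not the OFMPNet implementation. It assumes the common Soft IoU definition (soft intersection over soft union of occupancy probabilities) and assumes that "time-weighted" means a per-waypoint weight applied to the mean end-point error, with weights that grow linearly with the prediction horizon. The abstract does not specify the weighting scheme, so the function names `soft_iou` and `time_weighted_flow_loss` and the default weights are hypothetical.

```python
import math


def soft_iou(pred, truth):
    """Soft IoU between predicted occupancy probabilities and ground truth.

    Both arguments are flat sequences of values in [0, 1], one per grid cell.
    """
    inter = sum(p * t for p, t in zip(pred, truth))
    union = sum(pred) + sum(truth) - inter
    return inter / union if union > 0 else 0.0


def time_weighted_flow_loss(pred_flow, true_flow, weights=None):
    """Weighted mean end-point error over future waypoints.

    pred_flow and true_flow are lists over T waypoints; each waypoint is a
    list of (dx, dy) flow vectors, one per grid cell.
    """
    num_steps = len(pred_flow)
    if weights is None:
        # Assumption: later waypoints get linearly larger weights.
        weights = [(t + 1) / num_steps for t in range(num_steps)]
    total = 0.0
    for w, pred_t, true_t in zip(weights, pred_flow, true_flow):
        # Mean Euclidean distance between predicted and true flow vectors.
        epe = sum(
            math.hypot(px - gx, py - gy)
            for (px, py), (gx, gy) in zip(pred_t, true_t)
        ) / len(pred_t)
        total += w * epe
    # Normalize so the result stays on the scale of an end-point error.
    return total / sum(weights)
```

Passing `weights=[1.0] * T` recovers the unweighted mean end-point error, which gives a baseline for comparing against any chosen weighting.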