Efficient Baselines for Motion Prediction in Autonomous Driving

Motion Prediction (MP) of multiple surrounding agents is a crucial task in arbitrarily complex environments, from simple robots to Autonomous Driving Stacks (ADS). Current techniques tackle this problem using end-to-end pipelines, where the input data is usually a rendered top view of the physical information and the past trajectories of the most relevant agents; leveraging this information is a must to obtain optimal performance. In that sense, a reliable ADS must produce reasonable predictions on time. However, although many approaches use simple ConvNets and LSTMs to obtain the social latent features, State-Of-The-Art (SOTA) models might be too complex for real-time applications when using both sources of information (map and past trajectories), and they are hard to interpret, especially with regard to the physical information. Moreover, the performance of such models highly depends on the number of available inputs for each particular traffic scenario, which are expensive to obtain, particularly annotated High-Definition (HD) maps. In this work, we propose several efficient baselines for the well-known Argoverse 1 Motion Forecasting Benchmark. We aim to develop compact models using SOTA techniques for MP, including attention mechanisms and Graph Neural Networks (GNNs). Our lightweight models use standard social information and interpretable map information, such as points from the driveable area and plausible centerlines, obtained through a novel preprocessing step based on kinematic constraints, in contrast to black-box CNN-based or overly complex graph methods for map encoding. They generate plausible multimodal trajectories, achieving on-par accuracy with fewer operations and parameters than other SOTA methods. Our code is publicly available at https://github.com/Cram3r95/mapfe4mp .
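To make the idea of a kinematics-based map preprocessing step more concrete, the following is a minimal sketch and not the authors' implementation. It assumes a constant-acceleration model, a 10 Hz sampling rate and a 3 s prediction horizon (the Argoverse 1 setting), and an acceleration bound of 3 m/s² chosen purely for illustration. The function names `estimate_travel_distance` and `trim_centerline` are hypothetical. The sketch estimates how far an agent could plausibly travel from its observed trajectory, then keeps only the portion of a candidate centerline within that distance.

```python
import numpy as np


def estimate_travel_distance(past_xy, dt=0.1, horizon_s=3.0, max_accel=3.0):
    """Estimate the distance (m) an agent may cover over the horizon.

    Assumption: constant acceleration, estimated from the observed speeds
    and clipped to +/- max_accel (an illustrative bound, not from the paper).
    """
    past_xy = np.asarray(past_xy, dtype=float)
    steps = np.diff(past_xy, axis=0)                # per-step displacement
    speeds = np.linalg.norm(steps, axis=1) / dt     # per-step speed (m/s)
    v = speeds[-1]                                  # most recent speed
    if len(speeds) > 1:
        # Average acceleration over the observed window.
        a = (speeds[-1] - speeds[0]) / (dt * (len(speeds) - 1))
    else:
        a = 0.0
    a = float(np.clip(a, -max_accel, max_accel))
    t = horizon_s
    if a < 0:
        # A decelerating agent stops at v / |a| and does not move backwards.
        t = min(horizon_s, v / -a)
    return v * t + 0.5 * a * t ** 2


def trim_centerline(centerline_xy, start_xy, distance):
    """Keep centerline points within `distance` (arc length) of the agent."""
    centerline_xy = np.asarray(centerline_xy, dtype=float)
    offsets = centerline_xy - np.asarray(start_xy, dtype=float)
    start = int(np.argmin(np.linalg.norm(offsets, axis=1)))  # closest point
    segment = centerline_xy[start:]
    seg_lengths = np.linalg.norm(np.diff(segment, axis=0), axis=1)
    arc = np.concatenate([[0.0], np.cumsum(seg_lengths)])    # cumulative length
    return segment[arc <= distance]
```

With a candidate centerline given as an array of (x, y) points, calling `trim_centerline(centerline, past_xy[-1], estimate_travel_distance(past_xy))` yields a short, physically reachable polyline that a lightweight encoder could consume instead of a rendered map image.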
