Efficient Baselines for Motion Prediction in Autonomous Driving

Motion Prediction (MP) of multiple surrounding agents is a crucial task in arbitrarily complex environments, from simple robots to Autonomous Driving Stacks (ADS). Current techniques tackle this problem with end-to-end pipelines whose input is usually a rendered top view of the physical information plus the past trajectories of the most relevant agents; leveraging both is a must to obtain optimal performance. In that sense, a reliable ADS must produce reasonable predictions on time. However, although many approaches use simple ConvNets and LSTMs to obtain the social latent features, State-Of-The-Art (SOTA) models that use both sources of information (map and past trajectories) might be too complex for real-time applications, and they are poorly interpretable, especially with respect to the physical information. Moreover, the performance of such models highly depends on the number of inputs available for each particular traffic scenario, and these inputs are expensive to obtain, particularly annotated High-Definition (HD) maps. In this work, we propose several efficient baselines for the well-known Argoverse 1 Motion Forecasting Benchmark. We aim to develop compact models using SOTA techniques for MP, including attention mechanisms and Graph Neural Networks (GNNs). Our lightweight models use standard social information and interpretable map information, such as points from the drivable area and plausible centerlines obtained through a novel preprocessing step based on kinematic constraints, in contrast to black-box CNN-based or overly complex graph-based methods for map encoding. They generate plausible multimodal trajectories, achieving on-par accuracy with fewer operations and parameters than other SOTA methods. Our code is publicly available at https://github.com/Cram3r95/mapfe4mp.
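
The abstract only names a preprocessing step "based on kinematic constraints" for selecting drivable-area points and plausible centerlines; its details are not given here, and the authors' actual procedure is in the linked repository. As an illustration only, the sketch below assumes one simple kinematic constraint: an agent cannot travel farther than a distance bound derived from its current speed and an assumed maximum acceleration over the prediction horizon, so map points outside that disc are discarded. It assumes Argoverse 1's 10 Hz sampling (dt = 0.1 s) and 3 s prediction horizon. The function names and the 2 m/s² acceleration bound are hypothetical and are not taken from the paper.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def current_speed(obs: Sequence[Point], dt: float = 0.1) -> float:
    """Finite-difference speed (m/s) from the last two observed positions.

    Assumes 10 Hz sampling (dt = 0.1 s), as in Argoverse 1.
    """
    (x0, y0), (x1, y1) = obs[-2], obs[-1]
    return math.hypot(x1 - x0, y1 - y0) / dt


def reachable_radius(speed: float, horizon: float = 3.0,
                     max_accel: float = 2.0) -> float:
    """Upper bound on the distance travelled within `horizon` seconds.

    Hypothetical constraint: the agent accelerates at most `max_accel` m/s^2,
    so d <= v * T + 0.5 * a_max * T^2.
    """
    return speed * horizon + 0.5 * max_accel * horizon ** 2


def filter_map_points(points: Sequence[Point], obs: Sequence[Point],
                      horizon: float = 3.0, max_accel: float = 2.0,
                      dt: float = 0.1) -> List[Point]:
    """Keep only map points (e.g. drivable-area samples) inside the reachable
    disc centred on the agent's last observed position."""
    origin = obs[-1]
    radius = reachable_radius(current_speed(obs, dt), horizon, max_accel)
    return [p for p in points if math.dist(p, origin) <= radius]


def filter_centerlines(centerlines: Sequence[Sequence[Point]],
                       obs: Sequence[Point], horizon: float = 3.0,
                       max_accel: float = 2.0,
                       dt: float = 0.1) -> List[List[Point]]:
    """Truncate each candidate centerline to its reachable points and drop
    centerlines that have no reachable point at all."""
    kept = []
    for centerline in centerlines:
        reachable = filter_map_points(centerline, obs, horizon, max_accel, dt)
        if reachable:
            kept.append(reachable)
    return kept
```

For example, an agent observed for 2 s (20 samples) moving 1 m every 0.1 s has a speed of 10 m/s, giving a reachable radius of 10 · 3 + 0.5 · 2 · 3² = 39 m, so a map point 30 m ahead is kept while one 50 m ahead is dropped. A fixed radius is the simplest possible constraint; a fuller filter might also use heading or lane connectivity, which this sketch deliberately omits.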
