Real-time, accurate prediction of human steering behaviors has wide applications, from developing intelligent traffic systems to deploying autonomous driving systems in both real and simulated worlds. In this paper, we present ContextVAE, a context-aware approach for multi-modal vehicle trajectory prediction. Built upon the backbone architecture of a timewise variational autoencoder, ContextVAE employs a dual attention mechanism in its observation encoding that accounts for the environmental context and the dynamic agents' states in a unified way. By utilizing features extracted from semantic maps during agent state encoding, our approach takes into account both the social features exhibited by agents in the scene and the physical environment constraints to generate map-compliant and socially-aware trajectories. We perform extensive testing on the nuScenes prediction challenge, the Lyft Level 5 dataset, and the Waymo Open Motion Dataset to show the effectiveness of our approach and its state-of-the-art performance. On all tested datasets, ContextVAE models are fast to train and provide high-quality multi-modal predictions in real time. Our code is available at: https://github.com/xupei0610/ContextVAE.