Real-time, accurate prediction of human steering behaviors has wide applications, from developing intelligent traffic systems to deploying autonomous driving systems in both real and simulated worlds. In this paper, we present ContextVAE, a context-aware approach for multi-modal vehicle trajectory prediction. Built upon the backbone architecture of a timewise variational autoencoder, ContextVAE employs a dual attention mechanism for observation encoding that accounts for environmental context information and dynamic agents' states in a unified way. By utilizing features extracted from semantic maps during agent state encoding, our approach takes into account both the social features exhibited by agents in the scene and the constraints of the physical environment to generate map-compliant and socially aware trajectories. We perform extensive testing on the nuScenes prediction challenge, the Lyft Level 5 dataset, and the Waymo Open Motion Dataset to show the effectiveness of our approach and its state-of-the-art performance. On all tested datasets, ContextVAE models are fast to train and provide high-quality multi-modal predictions in real time.