Accurate and robust trajectory prediction of neighboring agents is critical for autonomous vehicles navigating complex scenes. Most methods proposed in recent years are deep learning-based because of their strength in encoding complex interactions. However, these methods often generate implausible predictions, since they rely heavily on past observations and cannot effectively capture transient and contingent interactions from sparse samples. In this paper, we propose a hierarchical hybrid framework of deep learning (DL) and reinforcement learning (RL) for multi-agent trajectory prediction, to address the challenge of predicting motions shaped by multi-scale interactions. In the DL stage, the traffic scene is divided into multiple intermediate-scale heterogeneous graphs, on which Transformer-style GNNs are applied to encode heterogeneous interactions at the intermediate and global levels. In the RL stage, we divide the traffic scene into local sub-scenes using the key future points predicted in the DL stage. To emulate the motion planning procedure and thereby produce trajectory predictions, a Transformer-based Proximal Policy Optimization (PPO) algorithm incorporating a vehicle kinematics model is devised to plan motions under the dominant influence of microscopic interactions. A multi-objective reward is designed to balance agent-centric accuracy and scene-wise compatibility. Experimental results show that our proposal matches state-of-the-art methods on the Argoverse forecasting benchmark. The visualized results further reveal that the hierarchical learning framework captures multi-scale interactions and improves the feasibility and compliance of the predicted trajectories.
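The abstract states that the PPO planner is combined with a vehicle kinematics model and scored by a multi-objective reward, but it does not name the model or the reward terms. The following is a minimal sketch under explicit assumptions: it assumes a kinematic bicycle model whose actions are (acceleration, steering angle), and a hypothetical reward that adds a negative average displacement error (ADE) for agent-centric accuracy to a proximity-conflict penalty for scene-wise compatibility. The names `bicycle_step`, `rollout`, and `reward`, as well as the wheelbase, time step, weights, and safety distance, are illustrative and are not the authors' settings.

```python
import math
from dataclasses import dataclass


@dataclass
class VehicleState:
    x: float    # position [m]
    y: float    # position [m]
    yaw: float  # heading [rad]
    v: float    # speed [m/s]


def bicycle_step(s: VehicleState, accel: float, steer: float,
                 wheelbase: float = 2.8, dt: float = 0.1) -> VehicleState:
    """One forward-Euler step of a kinematic bicycle model (assumed model)."""
    x = s.x + s.v * math.cos(s.yaw) * dt
    y = s.y + s.v * math.sin(s.yaw) * dt
    yaw = s.yaw + s.v / wheelbase * math.tan(steer) * dt
    v = max(0.0, s.v + accel * dt)  # clamp speed: no reversing
    return VehicleState(x, y, yaw, v)


def rollout(s0: VehicleState, actions, **kw):
    """Turn a sequence of (accel, steer) actions into (x, y) waypoints."""
    traj, s = [], s0
    for accel, steer in actions:
        s = bicycle_step(s, accel, steer, **kw)
        traj.append((s.x, s.y))
    return traj


def reward(pred, gt, others, w_acc=1.0, w_col=5.0, safe_dist=2.0):
    """Hypothetical multi-objective reward: accuracy term plus compatibility term."""
    # Agent-centric accuracy: average displacement error to the ground truth.
    ade = sum(math.dist(p, g) for p, g in zip(pred, gt)) / len(gt)
    # Scene-wise compatibility: fraction of time steps at which the agent is
    # closer than safe_dist to any other agent's predicted position.
    conflicts = sum(
        1 for t, p in enumerate(pred)
        if any(math.dist(p, o[t]) < safe_dist for o in others)
    )
    return -w_acc * ade - w_col * conflicts / len(pred)
```

For example, `rollout(VehicleState(0.0, 0.0, 0.0, 10.0), [(0.0, 0.0)] * 3)` drives straight ahead at 10 m/s and yields waypoints near (1, 0), (2, 0), (3, 0). Routing the policy's actions through the kinematic step, rather than letting it emit raw coordinates, keeps every planned trajectory physically drivable. The weights `w_acc` and `w_col` determine how accuracy is traded against compatibility.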