Autonomous driving in highly dynamic environments requires predicting the future behaviors of surrounding vehicles (SVs) and making safe and effective decisions. However, modeling the inherent coupling between the prediction and decision-making modules has been a long-standing challenge, especially when computational efficiency must be maintained. To tackle these problems, we propose a novel integrated intention prediction and decision-making approach that explicitly models this coupling while remaining computationally efficient. Specifically, a spectrum attention net is designed to predict the intentions of SVs by capturing the trend of each frequency component over time and the interrelations among these components. The intention prediction module computes quickly because the predicted intentions are not decoded into trajectories at execution time. Furthermore, the proximal policy optimization (PPO) algorithm is employed to address the non-stationarity problem in the framework: the clipping mechanism in its objective function keeps each policy update modest. Building on these developments, the intention prediction and decision-making modules are integrated through joint learning. Experiments in representative traffic scenarios show that the proposed integrated framework outperforms several deep reinforcement learning (DRL) baselines in terms of success rate, efficiency, and safety in driving tasks.
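The clipping mechanism that the abstract credits with keeping policy updates modest can be illustrated with the standard PPO clipped surrogate objective. The following is a minimal sketch of that general formula, not the authors' implementation: the function name `ppo_clipped_objective`, its argument names, and the default `clip_eps=0.2` are assumptions made for illustration, and the abstract does not state which clipping range was used.

```python
import math


def ppo_clipped_objective(new_log_probs, old_log_probs, advantages, clip_eps=0.2):
    """Mean PPO clipped surrogate objective (a quantity to maximize).

    All names and the default clip_eps are illustrative assumptions;
    the abstract does not specify them.
    """
    total = 0.0
    for new_lp, old_lp, adv in zip(new_log_probs, old_log_probs, advantages):
        # Probability ratio between the updated policy and the old policy.
        ratio = math.exp(new_lp - old_lp)
        # Restrict the ratio to the interval [1 - clip_eps, 1 + clip_eps].
        clipped_ratio = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
        # Taking the minimum gives a pessimistic bound, so the objective
        # offers no reward for moving the ratio beyond the clipping range.
        total += min(ratio * adv, clipped_ratio * adv)
    return total / len(advantages)
```

For a single sample with a positive advantage of 1.0 and a probability ratio of 1.5, the clipped term caps the contribution at about 1.2 rather than 1.5, which is how the objective discourages large jumps away from the previous policy.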