This paper proposes an adaptive behavioral decision-making method for autonomous vehicles (AVs) in complex merging scenarios. Building on non-cooperative game theory, we develop a vehicle interaction behavior model that defines the key traffic elements and integrates a multifactorial reward function. Maximum entropy inverse reinforcement learning (IRL) is used to optimize the behavior model parameters: the optimal parameters are obtained from the interaction behavior feature vectors and the behavior probabilities output by the vehicle interaction model. We further propose a behavioral decision-making method that adapts to dynamic environments. By establishing a mapping model between multiple environmental variables and the model parameters, it enables online learning and identification of the parameters and outputs the interactive behavior probabilities of AVs. Quantitative analysis using naturalistic driving datasets (highD and exiD) and real-vehicle test data shows that the model's decisions are highly consistent with those of human drivers. Across 188 tested interaction scenarios, the average human-like similarity rate is 81.73%, reaching 83.12% on the highD dataset. In 145 dynamic interactions, the method's decisions match human decisions at a rate of 77.12%, with 6913 consistent instances. In real-vehicle tests, the method achieves a 72.73% similarity rate with no safety violations. These results demonstrate that the proposed method enables AVs to make informed, adaptive behavioral decisions in interactive environments.
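The abstract states that behavior probabilities are derived from a multifactorial reward and that the model parameters are fitted with maximum entropy IRL. The sketch below illustrates only that core idea, under simplifying assumptions that are not taken from the paper: a linear reward theta · f over a small discrete action set (index 0 = merge, index 1 = yield), a Boltzmann (maximum entropy) choice model, and gradient ascent on the log-likelihood of observed human choices. The game-theoretic coupling between vehicles, the paper's actual feature definitions, and the environment-to-parameter mapping are not reproduced; the feature names (a hypothetical gap term and speed-gain term) and the demonstration data are invented for illustration.

```python
import math
from typing import List, Sequence, Tuple

# A demonstration: (feature vector of each candidate action, index of the chosen action).
# Hypothetical features per action: [gap_term, speed_gain_term]; action 0 = merge, 1 = yield.
Demo = Tuple[List[List[float]], int]

# Invented toy data; repeated scenarios with different choices mimic stochastic human behavior.
DEMOS: List[Demo] = [
    ([[1.0, 0.8], [2.0, 0.0]], 0),
    ([[1.0, 0.8], [2.0, 0.0]], 1),
    ([[0.2, 0.9], [1.5, 0.0]], 1),
    ([[1.5, 0.5], [2.0, 0.1]], 0),
    ([[0.3, 0.7], [1.8, 0.0]], 1),
    ([[0.3, 0.7], [1.8, 0.0]], 0),
]


def action_probs(theta: Sequence[float], feats: List[List[float]]) -> List[float]:
    """Maximum entropy (Boltzmann) choice model: P(a) is proportional to exp(theta . f(a))."""
    scores = [sum(t * x for t, x in zip(theta, f)) for f in feats]
    m = max(scores)  # subtract the max score for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def log_likelihood(theta: Sequence[float], demos: List[Demo]) -> float:
    """Log-probability of the observed choices under the model."""
    return sum(math.log(action_probs(theta, feats)[chosen]) for feats, chosen in demos)


def fit_maxent(demos: List[Demo], dim: int, lr: float = 0.5, iters: int = 500) -> List[float]:
    """Gradient ascent on the average log-likelihood.

    Per demonstration, the gradient is the feature vector of the chosen action
    minus the feature expectation under the current model.
    """
    theta = [0.0] * dim
    for _ in range(iters):
        grad = [0.0] * dim
        for feats, chosen in demos:
            p = action_probs(theta, feats)
            for k in range(dim):
                expected = sum(pi * f[k] for pi, f in zip(p, feats))
                grad[k] += feats[chosen][k] - expected
        theta = [t + lr * g / len(demos) for t, g in zip(theta, grad)]
    return theta


if __name__ == "__main__":
    theta = fit_maxent(DEMOS, dim=2)
    print("fitted theta:", theta)
    # Behavior probabilities for a new (hypothetical) merging scenario.
    print("P(merge), P(yield):", action_probs(theta, [[0.8, 0.6], [2.0, 0.0]]))
```

Because the choice model is a softmax over a linear reward, the log-likelihood is concave in theta, so plain gradient ascent with a modest step size is sufficient for this toy setting; the conflicting repeated scenarios keep the optimum finite.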