This paper proposes an adaptive behavioral decision-making method for autonomous vehicles (AVs) in complex merging scenarios. Drawing on non-cooperative game theory, we develop a vehicle interaction behavior model that defines the key traffic elements and integrates a multifactor reward function. The parameters of the behavior model are optimized with maximum entropy inverse reinforcement learning (IRL): the best-matching parameters are obtained from the interaction behavior feature vector and the behavior probabilities output by the vehicle interaction model. We further propose a behavioral decision-making method that adapts to dynamic environments. By establishing a mapping model between multiple environmental variables and the model parameters, the method learns and identifies the parameters online and outputs the interaction behavior probabilities of AVs. Quantitative analysis on naturalistic driving datasets (highD and exiD) and on real-vehicle test data validates the model's high consistency with human decision-making. Across 188 tested interaction scenarios, the average human-like similarity rate is 81.73%, reaching 83.12% on the highD dataset. In 145 dynamic interactions, the method matches human decisions at a rate of 77.12%, with 6913 consistent instances. In real-vehicle tests, the method achieves 72.73% similarity with a 0% safety violation rate. The results demonstrate that the proposed method enables AVs to make informed, adaptive behavior decisions in interactive environments.
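As an illustration of how behavior probabilities and parameter fitting can fit together, the following minimal Python sketch fits reward weights by a one-step maximum entropy formulation. It is not the authors' implementation: the abstract does not specify the reward features, the game-theoretic structure, or the optimizer. The sketch assumes a linear reward over a hypothetical two-element feature vector (safety, efficiency), a softmax over candidate behaviors, and plain gradient ascent; all function names and demonstration values are invented for illustration.

```python
import math


def behavior_probabilities(theta, features):
    """Softmax over linear rewards: P(a) is proportional to exp(theta . f(a)).

    `features` holds one feature vector per candidate behavior.
    """
    rewards = [sum(t * f for t, f in zip(theta, fa)) for fa in features]
    peak = max(rewards)  # subtract the maximum for numerical stability
    exps = [math.exp(r - peak) for r in rewards]
    total = sum(exps)
    return [e / total for e in exps]


def log_likelihood(theta, demos):
    """Sum of log-probabilities of the demonstrated (human) choices."""
    return sum(
        math.log(behavior_probabilities(theta, features)[chosen])
        for features, chosen in demos
    )


def fit_maxent(demos, n_features, lr=0.1, epochs=500):
    """Gradient ascent on the log-likelihood.

    The gradient is the observed feature vector minus the feature vector
    expected under the current behavior probabilities.
    """
    theta = [0.0] * n_features
    for _ in range(epochs):
        grad = [0.0] * n_features
        for features, chosen in demos:
            probs = behavior_probabilities(theta, features)
            for k in range(n_features):
                expected = sum(p * fa[k] for p, fa in zip(probs, features))
                grad[k] += features[chosen][k] - expected
        theta = [t + lr * g / len(demos) for t, g in zip(theta, grad)]
    return theta


# Hypothetical demonstrations: each entry is (per-behavior features, chosen index).
# Behavior 0 = yield, behavior 1 = merge; features = (safety, efficiency).
demos = [
    ([[0.9, 0.2], [0.3, 0.8]], 0),
    ([[0.8, 0.1], [0.6, 0.9]], 1),
    ([[0.7, 0.3], [0.2, 0.6]], 0),
]

theta = fit_maxent(demos, n_features=2)
print("fitted weights:", theta)
print("P(yield), P(merge):", behavior_probabilities(theta, demos[0][0]))
```

In this simplified setting the fitted weights play the role of the "optimal matching parameters", and a similarity rate of the kind reported above would correspond to the fraction of scenarios in which the most probable behavior equals the human choice.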