Understanding the interaction between different road users is critical for road safety and automated vehicles (AVs). Existing mathematical models of this interaction are mostly based on either cognitive or machine learning (ML) approaches. However, current cognitive models cannot simulate road user trajectories in general scenarios, while ML models take a high-level perspective that does not focus on the mechanisms generating the behavior, which can cause them to miss important human-like behaviors. Here, we develop a model of human pedestrian crossing decisions based on computational rationality, an approach that uses deep reinforcement learning (RL) to learn boundedly optimal behavior policies given human constraints, in our case a model of the limited human visual system. We show that the proposed combined cognitive-RL model captures human-like patterns of gap acceptance and crossing initiation time. Interestingly, our model's decisions are sensitive not only to the time gap but also to the speed of the approaching vehicle, something that has been described as a "bias" in human gap acceptance behavior. However, our results suggest that this is instead a rational adaptation to human perceptual limitations. Moreover, we demonstrate an approach to accounting for individual differences in computational rationality models, by conditioning the RL policy on the parameters of the human constraints. Our results demonstrate the feasibility of generating more human-like road user behavior by combining RL with cognitive models.