As intelligent robots like autonomous vehicles are increasingly deployed in the presence of people, the extent to which these systems should leverage model-based game-theoretic planners versus data-driven policies for safe, interaction-aware motion planning remains an open question. Existing dynamic game formulations assume all agents are task-driven and behave optimally. In reality, however, humans tend to deviate from the decisions prescribed by these models, and their behavior is better approximated under a noisy-rational paradigm. In this work, we investigate a principled methodology for blending a data-driven reference policy with an optimization-based game-theoretic policy. We formulate KLGame, a non-cooperative dynamic game with Kullback-Leibler (KL) regularization toward a general, stochastic, and possibly multimodal reference policy. Our method incorporates, for each decision maker, a tunable parameter that permits modulation between task-driven and data-driven behaviors. We propose an efficient algorithm for computing multimodal approximate feedback Nash equilibrium strategies of KLGame in real time. Through a series of simulated and real-world autonomous driving scenarios, we demonstrate that KLGame policies can more effectively incorporate guidance from the reference policy and account for noisily rational human behaviors than non-regularized baselines.
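The abstract does not state the regularized objective explicitly. A plausible per-agent form, under assumed notation (state $x_t$, agent $i$'s policy $\pi^i$, stage cost $c^i$, reference policy $\pi^i_{\mathrm{ref}}$, and tunable weight $\lambda^i \geq 0$, none of which are fixed by the text above), is the KL-regularized expected cost:

$$
\min_{\pi^i}\; \mathbb{E}\left[\sum_{t=0}^{T} c^i\big(x_t, u^1_t, \ldots, u^N_t\big) \;+\; \lambda^i\, \mathrm{KL}\!\left(\pi^i_t(\cdot \mid x_t)\,\big\|\, \pi^i_{\mathrm{ref}}(\cdot \mid x_t)\right)\right],
\qquad u^i_t \sim \pi^i_t(\cdot \mid x_t).
$$

In such a formulation, $\lambda^i$ is the tunable parameter the abstract describes: as $\lambda^i \to 0$ the agent behaves purely task-optimally, while as $\lambda^i \to \infty$ its policy collapses onto the data-driven reference.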