Dynamic game theory is an increasingly popular tool for modeling multi-agent interactions, such as those between humans and robots. Game-theoretic models presume that each agent wishes to minimize a private cost function that depends on others' actions. These games typically evolve over a fixed time horizon, which specifies the degree to which all agents care about the distant future. In practical settings, however, decision-makers may vary in their degree of short-sightedness. We conjecture that quantifying and estimating each agent's short-sightedness from online data will enable safer and more efficient interactions with other agents. To this end, we frame this inference problem as an inverse dynamic game. We consider a specific parametrization of each agent's objective function that smoothly interpolates between myopic and farsighted planning. Games of this form are readily transformed into parametric mixed complementarity problems; we exploit the directional differentiability of solutions to these problems with respect to their hidden parameters in order to estimate agents' short-sightedness. We conduct several experiments simulating human behavior at a real-world crosswalk. The results demonstrate that by explicitly inferring agents' short-sightedness, we can recover more accurate game-theoretic models, which ultimately allow us to make better predictions of agents' behavior. Specifically, our predictions of myopic behavior are up to 30% more accurate than those of the baseline.
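To make the idea of interpolating between myopic and farsighted planning concrete, here is a minimal sketch. It assumes a scalar discount factor `gamma` in [0, 1] that weights the stage cost at time t by `gamma**t`. This is one common choice and is an assumption made for illustration; the abstract does not state which parametrization is used. The function name `discounted_cost` and the stage costs are hypothetical.

```python
from typing import Sequence


def discounted_cost(stage_costs: Sequence[float], gamma: float) -> float:
    """Return sum_t gamma**t * c_t over the horizon.

    Assumed, illustrative parametrization: gamma = 0 keeps only the
    first stage cost (myopic), gamma = 1 weights every stage equally
    (farsighted), and values in between interpolate smoothly.
    """
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("gamma must lie in [0, 1]")
    return sum((gamma ** t) * c for t, c in enumerate(stage_costs))


# Hypothetical stage costs over a three-step horizon.
stage_costs = [1.0, 2.0, 4.0]

myopic = discounted_cost(stage_costs, 0.0)      # only the first stage counts
halfway = discounted_cost(stage_costs, 0.5)     # later stages are down-weighted
farsighted = discounted_cost(stage_costs, 1.0)  # all stages count equally
```

Under this assumption, the inverse problem described above would amount to estimating each agent's `gamma` from observed behavior rather than fixing it in advance.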