Robots deployed to the real world must be able to interact with other agents in their environment. Dynamic game theory provides a powerful mathematical framework for modeling scenarios in which agents have individual objectives and interactions evolve over time. However, a key limitation of such techniques is that they require a-priori knowledge of all players' objectives. In this work, we address this issue by proposing a novel method for learning players' objectives in continuous dynamic games from noise-corrupted, partial state observations. Our approach learns objectives by coupling the estimation of unknown cost parameters of each player with inference of unobserved states and inputs through Nash equilibrium constraints. By coupling past state estimates with future state predictions, our approach is amenable to simultaneous online learning and prediction in receding horizon fashion. We demonstrate our method in several simulated traffic scenarios in which we recover players' preferences for, e.g., desired travel speed and collision-avoidance behavior. Results show that our method reliably estimates game-theoretic models from noise-corrupted data that closely matches ground-truth objectives, consistently outperforming state-of-the-art approaches.
翻译:面向真实世界部署的机器人必须能够与环境中的其他智能体进行交互。动态博弈论为建模智能体具有个体目标且交互随时间演变的场景提供了强大的数学框架。然而,此类方法的关键局限性在于需要预先知晓所有参与者的目标。本研究通过提出一种新颖方法解决这一问题,该方法能从含噪声的部分状态观测中学习连续动态博弈中玩家的目标。我们的方法通过将每位玩家未知成本参数的估计与通过纳什均衡约束推断未观测状态和输入相结合来学习目标。通过将过去状态估计与未来状态预测耦合,我们的方法适用于以滚动时域方式进行在线同步学习与预测。我们在多个模拟交通场景中验证了该方法,成功恢复了玩家对期望行驶速度和避碰行为等偏好的参数。结果表明,我们的方法能从含噪声数据中可靠地估计出与真实目标高度吻合的博弈论模型,其性能始终优于现有最优方法。