Robots deployed to the real world must be able to interact with other agents in their environment. Dynamic game theory provides a powerful mathematical framework for modeling scenarios in which agents have individual objectives and interactions evolve over time. However, a key limitation of such techniques is that they require a-priori knowledge of all players' objectives. In this work, we address this issue by proposing a novel method for learning players' objectives in continuous dynamic games from noise-corrupted, partial state observations. Our approach learns objectives by coupling the estimation of unknown cost parameters of each player with inference of unobserved states and inputs through Nash equilibrium constraints. By coupling past state estimates with future state predictions, our approach is amenable to simultaneous online learning and prediction in receding horizon fashion. We demonstrate our method in several simulated traffic scenarios in which we recover players' preferences for, e.g., desired travel speed and collision-avoidance behavior. Results show that our method reliably estimates game-theoretic models from noise-corrupted data that closely matches ground-truth objectives, consistently outperforming state-of-the-art approaches.
翻译:部署至现实世界的机器人必须能够与环境中的其他智能体进行交互。动态博弈理论为建模智能体具有个体目标且交互随时间演化的场景提供了强大的数学框架。然而,这类技术的关键局限性在于需要预先知晓所有参与者的目标。本研究通过提出一种新颖方法来解决该问题:在连续动态博弈中,从含噪声的部分状态观测中学习参与者目标。该方法通过纳什均衡约束,将每位参与者未知成本参数的估计与未观测状态和输入的推理相耦合。通过将过往状态估计与未来状态预测相结合,该方法能够以滚动时域方式实现同步在线学习与预测。我们在多个模拟交通场景中验证了该方法,能够恢复参与者对目标行驶速度、碰撞规避行为等偏好。结果表明,本方法能够从含噪声数据中可靠地估计出与真实目标高度吻合的博弈模型,其性能始终优于现有先进方法。