When multiple agents interact in a common environment, each agent's actions affect the future decisions of the others, and noncooperative dynamic games naturally capture this coupling. In interactive motion planning, however, agents typically do not have access to a complete model of the game, e.g., because the other players' objectives are unknown. Therefore, we consider the inverse game problem, in which some properties of the game are unknown a priori and must be inferred from observations. Existing maximum likelihood estimation (MLE) approaches to solving inverse games provide only point estimates of the unknown parameters without quantifying uncertainty, and they perform poorly when many different parameter values explain the observed behavior equally well. To address these limitations, we take a Bayesian perspective and construct posterior distributions over game parameters. To render inference tractable, we employ a variational autoencoder (VAE) with an embedded differentiable game solver. This structured VAE can be trained from an unlabeled dataset of observed interactions, naturally handles continuous, multi-modal distributions, and supports efficient sampling from the inferred posteriors without computing game solutions at runtime. Extensive evaluations in simulated driving scenarios demonstrate that the proposed approach successfully learns the prior and posterior objective distributions, provides more accurate objective estimates than MLE baselines, and facilitates safer and more efficient game-theoretic motion planning.
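To make the structured-VAE idea concrete, the following is a minimal sketch under strong simplifying assumptions, not the method's actual implementation. A hypothetical two-player game with scalar actions and quadratic costs, whose Nash equilibrium has a closed form, stands in for the embedded differentiable game solver. The encoder is replaced by fixed outputs `mu` and `log_sigma`, the prior over objectives is assumed to be a fixed standard normal (rather than learned), and observation noise is assumed Gaussian. The names `solve_game`, `log_likelihood`, `elbo`, `sample_posterior`, `COUPLING`, and `OBS_STD` are all illustrative. The sketch only evaluates a Monte Carlo evidence lower bound (ELBO); gradient-based training is omitted.

```python
import math
import random

# Assumed constants for this toy illustration (not from the paper).
COUPLING = 1.0   # weight c on the interaction cost between the two players
OBS_STD = 0.1    # std of the Gaussian noise on observed actions


def solve_game(theta1, theta2, c=COUPLING):
    """Closed-form Nash equilibrium of a hypothetical two-player game.

    Player i picks a scalar action u_i to minimize
        (u_i - theta_i)**2 + c * (u_i - u_j)**2,
    so its best response is u_i = (theta_i + c * u_j) / (1 + c).
    Solving both best responses jointly gives the equilibrium below.
    This stands in for the embedded differentiable game solver.
    """
    a = 1.0 / (1.0 + c)
    b = c / (1.0 + c)
    denom = 1.0 - b * b
    u1 = a * (theta1 + b * theta2) / denom
    u2 = a * (theta2 + b * theta1) / denom
    return u1, u2


def log_likelihood(obs, theta):
    """Decoder: log p(obs | theta), Gaussian noise around the equilibrium."""
    eq = solve_game(*theta)
    norm = math.log(OBS_STD * math.sqrt(2.0 * math.pi))
    return sum(-0.5 * ((y - m) / OBS_STD) ** 2 - norm for y, m in zip(obs, eq))


def elbo(obs, mu, log_sigma, num_samples=256, rng=None):
    """Monte Carlo ELBO for a diagonal-Gaussian posterior q(theta | obs).

    mu and log_sigma play the role of encoder outputs; the prior is
    assumed to be a standard normal N(0, I).
    """
    if rng is None:
        rng = random.Random(0)
    sigma = [math.exp(s) for s in log_sigma]
    recon = 0.0
    for _ in range(num_samples):
        # Reparameterization trick: theta = mu + sigma * eps, eps ~ N(0, I).
        theta = [m + s * rng.gauss(0.0, 1.0) for m, s in zip(mu, sigma)]
        recon += log_likelihood(obs, theta)
    recon /= num_samples
    # Closed-form KL( N(mu, sigma^2) || N(0, 1) ), summed over dimensions.
    kl = 0.5 * sum(s * s + m * m - 1.0 - 2.0 * math.log(s) for m, s in zip(mu, sigma))
    return recon - kl


def sample_posterior(mu, log_sigma, n, rng):
    """Draw n posterior samples of theta; no game is solved here."""
    return [
        [m + math.exp(s) * rng.gauss(0.0, 1.0) for m, s in zip(mu, log_sigma)]
        for _ in range(n)
    ]


if __name__ == "__main__":
    obs = solve_game(1.0, -0.5)  # noiseless observation from true theta = (1.0, -0.5)
    print(elbo(obs, mu=[1.0, -0.5], log_sigma=[-3.0, -3.0]))  # posterior near the truth
    print(elbo(obs, mu=[0.0, 0.0], log_sigma=[-3.0, -3.0]))   # posterior far from the truth
```

In this toy setting, a posterior concentrated near the objectives that generated the observation earns a higher ELBO than one centered elsewhere. Note that `sample_posterior` draws objectives directly from the Gaussian posterior and never calls `solve_game`: the solver is needed only inside the ELBO during training, which mirrors the point above that posterior sampling does not require computing game solutions at runtime.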