We study a setting in which two players play a (possibly approximate) Nash equilibrium of a bimatrix game, while a learner observes only their actions and has no knowledge of the equilibrium or the underlying game. A natural question is whether the learner can rationalize the observed behavior by inferring the players' payoff functions. Rather than producing a single payoff estimate, inverse game theory aims to identify the entire set of payoffs consistent with the observed behavior, which supports downstream tasks such as counterfactual analysis and mechanism design in applications like auctions, pricing, and security games. We focus on the problem of estimating the set of feasible payoffs with high probability and up to precision $ε$ in the Hausdorff metric. We provide the first minimax-optimal rates for both exact and approximate equilibrium play, in zero-sum as well as general-sum games. Our results provide learning-theoretic foundations for set-valued payoff inference in multi-agent environments.