We extend the formalism of Conjectural Variations games to Stackelberg games involving multiple leaders and a single follower. A common assumption when solving these nonconvex games is that the leaders compute their strategies with perfect knowledge of the follower's best response. In practice, however, the leaders may have little to no knowledge of the other players' reactions. To deal with this lack of knowledge, we assume that each leader can form conjectures about the other players' best responses and update its strategy based on these conjectures. Our contributions are twofold: (i) On the theoretical side, we introduce the concept of Conjectural Stackelberg Equilibrium, keeping our formalism conjecture-agnostic, with Stackelberg Equilibrium being a refinement of it. (ii) On the algorithmic side, we introduce a two-stage algorithm with convergence guarantees, which allows the leaders to first learn conjectures on a training data set and then update their strategies. Theoretical results are illustrated numerically.