Consider a strongly monotone game in which each player's utility function consists of a reward function plus a linear term in each dimension, whose coefficients are controlled by a manager. Gradient play converges to a unique Nash equilibrium (NE), which in general does not optimize the global objective. The global performance at the NE can be improved by imposing linear constraints on it, yielding a generalized Nash equilibrium (GNE). We therefore want the manager to control the coefficients so as to impose the desired constraints on the NE. Doing so directly, however, requires knowing the players' reward functions and action sets, and obtaining this game information is infeasible in a large-scale network and violates user privacy. To overcome this, we propose a simple algorithm that learns to shift the NE to satisfy the linear constraints by adjusting the controlled coefficients online. Our algorithm requires only the violation of the linear constraints as feedback and needs no knowledge of the reward functions or action sets. We prove that it converges with probability 1 to the set of GNE induced by the coupled linear constraints, and we then establish an $L^2$ convergence rate of near-$O(t^{-1/4})$.
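The feedback mechanism described above can be illustrated with a minimal sketch, not the paper's exact algorithm: two players with quadratic rewards run gradient play, while the manager adjusts a single linear coefficient (a dual price) online using only the violation of a coupled linear constraint. All rewards, step sizes, and the constraint are illustrative assumptions.

```python
import numpy as np

# Hedged sketch, assuming quadratic rewards r_i(x_i) = -0.5*(x_i - c_i)^2
# and a manager-controlled linear term -lam*x_i in each player's utility.
c = np.array([3.0, 2.0])    # players' unconstrained optima (assumed)
b = 4.0                     # coupled linear constraint: x1 + x2 <= b (assumed)
x = np.zeros(2)             # players' actions
lam = 0.0                   # manager-controlled coefficient (dual price)
eta_x, eta_lam = 0.1, 0.05  # step sizes (assumed)

for t in range(2000):
    # Players: gradient ascent on u_i(x) = -0.5*(x_i - c_i)**2 - lam*x_i.
    x += eta_x * (-(x - c) - lam)
    # Manager: observes only the constraint violation, never the rewards
    # or action sets, and nudges the price to shift the NE.
    violation = x.sum() - b
    lam = max(0.0, lam + eta_lam * violation)

# At the shifted equilibrium x_i = c_i - lam, so x.sum() is driven to b.
print(x, lam, x.sum())
```

Here the unconstrained NE is x = (3, 2) with total 5 > b; the learned price lam = 0.5 moves the NE to (2.5, 1.5), which is exactly on the constraint boundary.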