In this work, we introduce a new variant of online gradient descent that provably converges to Nash equilibria and simultaneously attains sublinear regret for the class of congestion games in the semi-bandit feedback setting. The convergence rates of our method depend only polynomially on the number of players and the number of facilities, and not on the size of the action set, which can be exponentially large in the number of facilities. Moreover, the running time of our method is polynomial in the size of the implicit description of the game. As a result, our work answers an open question from (Du et al., 2022).