In this paper, we study last-iterate convergence of learning algorithms in bilinear saddle-point problems, a preferable notion of convergence that captures the day-to-day behavior of learning dynamics. We focus on the challenging setting where players select actions from compact convex sets and receive only bandit feedback. Our main contribution is the design of an uncoupled learning algorithm that guarantees last-iterate convergence to the Nash equilibrium with high probability. We establish a convergence rate of $\tilde{O}(T^{-1/4})$ up to polynomial factors in problem parameters. Crucially, our proposed algorithm is computationally efficient, requiring only an efficient linear optimization oracle over the players' compact action sets. The algorithm is obtained by combining techniques from experimental design and the classic Follow-The-Regularized-Leader (FTRL) framework, with a carefully chosen regularizer function tailored to the geometry of the action set of each learner.
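To make the FTRL framework mentioned above concrete, the following is a minimal, self-contained sketch of an FTRL update — not the paper's algorithm (which operates over general compact convex sets with bandit feedback and an experimental-design component), but the classic instance with an entropic regularizer on the probability simplex, where the FTRL minimization has a closed-form softmax solution. The names `ftrl_entropic`, `eta`, and `loss_vectors` are illustrative, not from the paper.

```python
import math

def ftrl_entropic(loss_vectors, eta=0.1):
    """Follow-The-Regularized-Leader with an entropic regularizer on the
    probability simplex. At round t the iterate is
        x_t = argmin_x  sum_{s<t} <g_s, x> + (1/eta) * sum_i x_i log x_i,
    whose closed form is a softmax of the negated cumulative losses.
    (Illustrative sketch only; the paper's setting is more general.)"""
    n = len(loss_vectors[0])
    cumulative = [0.0] * n          # running sum of linear loss vectors
    iterates = []
    for g in loss_vectors:
        # Closed-form FTRL iterate: softmax(-eta * cumulative losses)
        weights = [math.exp(-eta * c) for c in cumulative]
        z = sum(weights)
        iterates.append([w / z for w in weights])
        # Observe the round's loss vector, then update the cumulative sum
        cumulative = [c + gi for c, gi in zip(cumulative, g)]
    return iterates

# Example: two actions; action 0 consistently suffers higher loss,
# so the iterates shift probability mass toward action 1 over time.
losses = [[1.0, 0.0]] * 50
iters = ftrl_entropic(losses, eta=0.2)
```

The regularizer is what the abstract's last sentence tailors to each player's action-set geometry; swapping the entropy above for a different strongly convex function changes the argmin and hence the update rule.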