We develop a flexible stochastic approximation framework for analyzing the long-run behavior of learning in games, both continuous and finite. The proposed analysis template covers a wide array of popular learning algorithms. These include gradient-based methods, the exponential/multiplicative weights algorithm for learning in finite games, and optimistic and bandit variants of these methods. Beyond providing a unified view of these algorithms, our framework yields several new convergence results, both asymptotic and in finite time, in continuous as well as finite games. Specifically, we provide a range of criteria for identifying classes of Nash equilibria and sets of action profiles that are attracting with high probability. We also introduce the notion of coherence, a game-theoretic property that includes strict and sharp equilibria and that leads to convergence in finite time. Importantly, our analysis applies both to oracle-based methods and to bandit, payoff-based methods, that is, methods in which players only observe their realized payoffs.
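The contrast between oracle-based and payoff-based (bandit) feedback can be made concrete with a small illustrative sketch. It is not taken from the paper. It assumes a two-player finite game, a constant step size, and one common bandit estimator: an importance-weighted estimate of the payoff vector with explicit uniform exploration. The paper's own estimators and step-size schedules may differ, and the function name `exp_weights` and its parameters are hypothetical. Each player accumulates payoff scores through the stochastic approximation step $Y_{n+1} = Y_n + \gamma \hat v_n$ and plays the logit (exponential weights) choice $x = \Lambda(Y)$. The example game has a strict Nash equilibrium in which both players choose action 0.

```python
import numpy as np


def softmax(y):
    """Logit choice map: exponential weights over cumulative scores y."""
    z = np.exp(y - y.max())  # subtract the max for numerical stability
    return z / z.sum()


def exp_weights(payoffs, n_iter=5000, step=0.05, feedback="oracle", eps=0.1, seed=0):
    """Exponential weights in a two-player finite game (hypothetical sketch).

    payoffs[i][a_i, a_j] is player i's payoff when i plays a_i and the opponent plays a_j.
    feedback="oracle": each player observes its full payoff vector against the
        opponent's current mixed strategy.
    feedback="bandit": each player only observes its realized payoff and builds an
        importance-weighted estimate of its payoff vector (assumed estimator).
    """
    rng = np.random.default_rng(seed)
    n_actions = [payoffs[0].shape[0], payoffs[1].shape[0]]
    scores = [np.zeros(n_actions[0]), np.zeros(n_actions[1])]
    for _ in range(n_iter):
        x = [softmax(s) for s in scores]
        if feedback == "oracle":
            # Expected payoff of each pure action against the opponent's mixed strategy.
            v = [payoffs[0] @ x[1], payoffs[1] @ x[0]]
        else:
            # Mix in uniform exploration so every action keeps probability >= eps / |A_i|.
            p = [(1 - eps) * xi + eps / len(xi) for xi in x]
            a = [rng.choice(len(pi), p=pi) for pi in p]
            v = []
            for i, j in ((0, 1), (1, 0)):
                est = np.zeros(n_actions[i])
                # Only the realized payoff u_i(a_i, a_j) is observed; reweight by 1 / p_i(a_i).
                est[a[i]] = payoffs[i][a[i], a[j]] / p[i][a[i]]
                v.append(est)
        # Stochastic approximation step on the scores: Y <- Y + step * v_hat.
        scores = [s + step * vi for s, vi in zip(scores, v)]
    return [softmax(s) for s in scores]


# Symmetric game in which action 0 strictly dominates action 1,
# so (0, 0) is the unique, strict Nash equilibrium.
U = np.array([[3.0, 1.0], [2.0, 0.0]])
x_oracle = exp_weights([U, U], feedback="oracle")
x_bandit = exp_weights([U, U], feedback="bandit")
print(x_oracle, x_bandit)
```

In this game, action 0 always earns exactly one unit more than action 1, so under oracle feedback the score gap grows deterministically. Under bandit feedback, the importance-weighted estimate has the same expected gap but adds zero-mean noise. With these (assumed) parameters, both runs concentrate on the strict equilibrium.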