Regret Matching+ (RM+) and its variants are important algorithms for solving large-scale games. However, their strong practical performance is still not well understood theoretically. Moreover, recent advances on fast convergence in games are limited to no-regret algorithms, such as online mirror descent, that satisfy stability. In this paper, we first give counterexamples showing that RM+ and its predictive version can be unstable, which might cause other players to suffer large regret. We then provide two fixes: restarting, and chopping off the positive orthant that RM+ works in. We show that these fixes are sufficient to get $O(T^{1/4})$ individual regret and $O(1)$ social regret in normal-form games via RM+ with predictions. We also apply our stabilizing techniques to clairvoyant updates in the uncoupled learning setting for RM+ and prove desirable results akin to recent works on Clairvoyant online mirror descent. Our experiments show the advantages of our algorithms over vanilla RM+-based algorithms in matrix and extensive-form games.
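To make the notion of instability concrete, the following is a minimal sketch of one vanilla RM+ step as the update is commonly stated: accumulate instantaneous regrets, clip them at zero, and normalize to obtain the next strategy. The function names `normalize` and `rm_plus_step` are hypothetical and chosen for illustration only, and the sketch does not implement the restarting or chopping fixes described above.

```python
def normalize(regrets):
    """Map a nonnegative regret vector to a strategy on the simplex.

    Falls back to the uniform strategy when all regrets are zero
    (a common convention, assumed here).
    """
    total = sum(regrets)
    n = len(regrets)
    if total <= 0.0:
        return [1.0 / n] * n
    return [r / total for r in regrets]


def rm_plus_step(regrets, strategy, utilities):
    """One vanilla RM+ update for a single player.

    regrets:   current clipped cumulative regrets (all >= 0)
    strategy:  mixed strategy played this round
    utilities: utility of each pure action this round
    """
    # Expected utility of the strategy that was actually played.
    expected = sum(x * u for x, u in zip(strategy, utilities))
    # Add instantaneous regret, then clip at zero (the "+" in RM+).
    new_regrets = [max(r + u - expected, 0.0)
                   for r, u in zip(regrets, utilities)]
    return new_regrets, normalize(new_regrets)


# Near the origin, two almost identical regret vectors map to
# opposite pure strategies, so the strategy can jump between rounds.
near_a = normalize([1e-9, 0.0])
near_b = normalize([0.0, 1e-9])

# Far from the origin, the same small perturbation barely matters.
far_a = normalize([1.0, 1.0])
far_b = normalize([1.0, 1.0 + 1e-9])
```

In this sketch `near_a` and `near_b` differ by the maximum possible amount even though their inputs are only about `1e-9` apart, while `far_a` and `far_b` are nearly identical. This suggests why keeping the regret vector away from the origin, which is what restarting and chopping off part of the orthant aim at, can restore stability.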