In two-player zero-sum games, the learning dynamic based on optimistic Hedge achieves one of the best-known regret upper bounds among strongly-uncoupled learning dynamics. With an appropriately chosen learning rate, both the social and individual regrets are bounded by $O(\log(mn))$, where $m$ and $n$ are the numbers of actions of the two players. This study investigates whether this dependence on $m$ and $n$ is optimal for optimistic Hedge. To this end, we first refine the existing regret analysis and show that, in the strongly-uncoupled setting where the opponent's number of actions is known, both the social and individual regret bounds can be improved to $O(\sqrt{\log m \log n})$. In this analysis, we express the regret upper bound as an optimization problem over the learning rates and the coefficients of certain negative terms, which enables a refined analysis of the leading constants. We then show, by providing algorithm-dependent individual regret lower bounds, that neither the existing social regret bound nor the new social and individual regret upper bounds can be further improved for optimistic Hedge. Importantly, the social regret upper and lower bounds match exactly, including the constant factor in the leading term. Finally, building on these results, we improve the last-iterate convergence rate and the dynamic regret of a learning dynamic based on optimistic Hedge, and complement these bounds with matching algorithm-dependent dynamic regret lower bounds.
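For readers unfamiliar with the dynamic, the following is a minimal sketch of optimistic Hedge in self-play on a payoff matrix, together with the individual and social regrets discussed above. It is an illustration under stated assumptions, not the analysis or tuning used in this work: the function names `optimistic_hedge_selfplay` and `_hedge_weights` are hypothetical, the row player is assumed to minimize $x^\top A y$ while the column player maximizes it, and the learning rates `eta_x` and `eta_y` are left as free parameters rather than set to the values that yield the bounds stated here.

```python
import math


def _hedge_weights(scores, eta):
    """Return weights proportional to exp(-eta * score), computed stably."""
    lo = min(scores)
    w = [math.exp(-eta * (s - lo)) for s in scores]
    z = sum(w)
    return [v / z for v in w]


def optimistic_hedge_selfplay(A, T, eta_x, eta_y):
    """Run optimistic Hedge for both players on the m-by-n payoff matrix A.

    Assumptions (not taken from the source): the row player minimizes
    x^T A y, the column player maximizes it, and eta_x, eta_y are
    placeholder learning rates.
    Returns (regret_x, regret_y, last_x, last_y).
    """
    m, n = len(A), len(A[0])
    cum_x, cum_y = [0.0] * m, [0.0] * n    # cumulative loss vectors
    last_x, last_y = [0.0] * m, [0.0] * n  # most recent loss, used as prediction
    total = 0.0                            # sum over rounds of x_t^T A y_t
    x, y = [1.0 / m] * m, [1.0 / n] * n
    for _ in range(T):
        # Optimistic step: count the most recent loss vector twice.
        x = _hedge_weights([c + l for c, l in zip(cum_x, last_x)], eta_x)
        y = _hedge_weights([c + l for c, l in zip(cum_y, last_y)], eta_y)
        # Each player observes only its own loss vector.
        loss_x = [sum(A[i][j] * y[j] for j in range(n)) for i in range(m)]
        loss_y = [-sum(A[i][j] * x[i] for i in range(m)) for j in range(n)]
        total += sum(x[i] * loss_x[i] for i in range(m))
        cum_x = [c + l for c, l in zip(cum_x, loss_x)]
        cum_y = [c + l for c, l in zip(cum_y, loss_y)]
        last_x, last_y = loss_x, loss_y
    # Individual regret: realized loss minus the loss of the best fixed action.
    regret_x = total - min(cum_x)
    regret_y = -total - min(cum_y)
    return regret_x, regret_y, x, y
```

The social regret is the sum `regret_x + regret_y`. In a zero-sum game it equals $T$ times the duality gap of the time-averaged strategies, so it is always nonnegative. Each player's update uses only its own loss vectors.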