We study players that run no-regret learning algorithms in a variety of two-player games. We ask whether a player obtains higher utility by having stronger regret guarantees, or by playing against an opponent with weaker regret guarantees. We consider a hierarchy of algorithms from weakest to strongest: uniform random play, no-regret, and no-swap-regret. We find, counterintuitively, that in many games no-swap-regret is the worse choice for a player and yields higher utility for the opponent. We trace this phenomenon to a difference in effective learning rate between the two algorithms: no-swap-regret algorithms learn $N$ times more slowly than no-regret algorithms. To address this, we attempt to equalize the learning rates, which brings the utilities of no-regret and no-swap-regret players closer together. Finally, we show that for certain random games with $7$ actions per player, no-swap-regret algorithms can perform noticeably better than no-regret algorithms in a way that cannot be explained away by unfairly adjusted learning rates.
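To make the kind of pairing in this hierarchy concrete, the following is a minimal sketch of a no-regret player facing a uniform random opponent. It uses Hedge (multiplicative weights), a standard no-regret algorithm; the abstract does not say which algorithms the paper uses, so this choice is an assumption. The payoff matrix `PAYOFF`, the function names, and the learning rate `eta` are likewise hypothetical and chosen only for illustration.

```python
import math
import random

# Hypothetical 3x3 payoff matrix for the row player, entries in [0, 1].
PAYOFF = [
    [1.0, 0.5, 0.9],
    [0.0, 1.0, 0.2],
    [0.6, 0.0, 0.5],
]


def hedge_weights(cum_payoffs, eta):
    """Return the Hedge distribution for the given cumulative payoffs."""
    top = max(cum_payoffs)  # subtract the max for numerical stability
    weights = [math.exp(eta * (c - top)) for c in cum_payoffs]
    total = sum(weights)
    return [w / total for w in weights]


def play(payoff, rounds, eta, seed=0):
    """Row player runs Hedge; column player picks actions uniformly at random.

    Returns (average expected utility, average external regret) for the
    row player, where regret is measured against the best fixed action
    in hindsight.
    """
    rng = random.Random(seed)
    n_rows, n_cols = len(payoff), len(payoff[0])
    cum = [0.0] * n_rows  # what each fixed row action would have earned
    earned = 0.0          # expected payoff of the Hedge player so far
    for _ in range(rounds):
        p = hedge_weights(cum, eta)
        j = rng.randrange(n_cols)  # uniform random opponent
        earned += sum(p[i] * payoff[i][j] for i in range(n_rows))
        for i in range(n_rows):
            cum[i] += payoff[i][j]
    return earned / rounds, (max(cum) - earned) / rounds


if __name__ == "__main__":
    T = 2000
    # Textbook Hedge tuning for payoffs in [0, 1]; an assumption, not the paper's.
    eta = math.sqrt(8 * math.log(len(PAYOFF)) / T)
    utility, regret = play(PAYOFF, T, eta)
    print(f"average utility: {utility:.3f}, average regret: {regret:.4f}")
```

With this tuning, the standard Hedge analysis bounds the average external regret by $\sqrt{\ln(n) / (2T)}$ for $n$ actions and $T$ rounds, regardless of the opponent's play. Changing `eta` is the simplest way to see how the learning rate affects the utility the player obtains.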