In iterative approaches to empirical game-theoretic analysis (EGTA), the strategy space is expanded incrementally based on analysis of intermediate game models. A common approach to strategy exploration, represented by the double oracle algorithm, is to add strategies that best-respond to a current equilibrium. This approach may suffer from overfitting and other limitations, which led the developers of the policy-space response oracle (PSRO) framework for iterative EGTA to generalize the target of best response, employing what they term meta-strategy solvers (MSSs). Noting that many MSSs can be viewed as perturbed or approximated versions of Nash equilibrium, we adopt an explicit regularization perspective on the specification and analysis of MSSs. We propose a novel MSS called regularized replicator dynamics (RRD), which simply truncates the replicator dynamics process based on a regret criterion. We show that RRD is more adaptive than existing MSSs and outperforms them in various games. We extend our study to three-player games, for which the size of the payoff matrix is cubic in the number of strategies, so exhaustively evaluating profiles may not be feasible. We propose a profile search method that can identify solutions from incomplete models, and combine this with iterative model construction using a regularized MSS. Finally, and most importantly, we show through experiments that the regret of best-response targets has a strong influence on the performance of strategy exploration, which provides an explanation for the effectiveness of regularization in PSRO.
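To make the idea of truncating replicator dynamics on a regret criterion concrete, the following is a minimal sketch and not the authors' implementation. It assumes a two-player normal-form game given by payoff matrices `A` (row player) and `B` (column player), a simple Euler discretization of replicator dynamics, and a uniform starting profile. The names `profile_regret`, `regularized_rd`, `regret_threshold`, `step`, and `max_iters` are hypothetical and chosen only for illustration.

```python
import numpy as np


def profile_regret(A, B, x, y):
    """Largest payoff gain either player could obtain by deviating from (x, y)."""
    u_row = A @ y    # row player's payoff for each pure strategy against y
    u_col = B.T @ x  # column player's payoff for each pure strategy against x
    return max(u_row.max() - x @ u_row, u_col.max() - y @ u_col)


def regularized_rd(A, B, regret_threshold, step=0.01, max_iters=100_000):
    """Run discretized replicator dynamics, stopping early once regret is low enough.

    Assumption: the truncation rule is 'stop when the profile's regret is at or
    below regret_threshold'. A larger threshold stops earlier, leaving the
    profile closer to the uniform starting point.
    """
    m, n = A.shape
    x = np.full(m, 1.0 / m)  # uniform initial mixed strategy (row player)
    y = np.full(n, 1.0 / n)  # uniform initial mixed strategy (column player)
    for _ in range(max_iters):
        if profile_regret(A, B, x, y) <= regret_threshold:
            break  # truncate the dynamics: regret criterion is met
        u_row = A @ y
        u_col = B.T @ x
        # Euler step of replicator dynamics: grow strategies that beat the average.
        x = x + step * x * (u_row - x @ u_row)
        y = y + step * y * (u_col - y @ u_col)
        # Guard against numerical drift off the probability simplex.
        x = np.clip(x, 0.0, None)
        x = x / x.sum()
        y = np.clip(y, 0.0, None)
        y = y / y.sum()
    return x, y


if __name__ == "__main__":
    # Prisoner's dilemma payoffs (illustrative numbers): strategy 1 dominates.
    A = np.array([[3.0, 0.0], [5.0, 1.0]])
    B = A.T
    x, y = regularized_rd(A, B, regret_threshold=0.1)
    print("row strategy:", x)
    print("column strategy:", y)
    print("regret:", profile_regret(A, B, x, y))
```

In this sketch the threshold acts as the regularization knob: a threshold of zero would let the dynamics run toward an equilibrium where they converge, while a positive threshold returns a mixed profile that still places weight on strategies an exact equilibrium might exclude.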