Policy Space Response Oracles (PSRO) combines game-theoretic equilibrium computation with learning and is effective at approximating Nash equilibria in zero-sum games. However, its computational cost has become a significant obstacle to practical application. Our analysis shows that game simulation is the primary bottleneck in PSRO's runtime. To address this issue, we introduce the concept of Simulation-Free PSRO and summarize existing methods that instantiate it. We further propose a novel Dynamic Window-based Simulation-Free PSRO, which replaces the strategy set maintained in PSRO with a strategy window. Because the number of strategies in the window is bounded, opponent strategy selection is simplified and the best response becomes more robust. To keep the window within its size limit, we use Nash Clustering to select the strategy to be eliminated. Experiments across various environments demonstrate that the Dynamic Window mechanism significantly reduces exploitability compared to existing methods while remaining compatible with them. Our code is available at https://github.com/enochliu98/SF-PSRO.
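To make the strategy-window idea concrete, the following is a minimal sketch of a size-bounded window with a pluggable elimination rule. It is not the implementation from the linked repository: the class name `StrategyWindow`, the `select_eviction` callback, and the choice to let a newly added strategy compete for elimination are all assumptions made for illustration. The abstract states that Nash Clustering selects the strategy to eliminate; here a simple "evict the oldest" rule stands in for it, since the abstract does not specify how the clustering is computed.

```python
from typing import Callable, List, Optional, Sequence


class StrategyWindow:
    """Hypothetical bounded container of strategies (illustrative only)."""

    def __init__(self, max_size: int,
                 select_eviction: Callable[[Sequence[str]], int]) -> None:
        if max_size < 1:
            raise ValueError("max_size must be at least 1")
        self.max_size = max_size
        # Callback returning the index of the strategy to eliminate.
        # In the paper this role is played by Nash Clustering.
        self.select_eviction = select_eviction
        self.strategies: List[str] = []

    def add(self, strategy: str) -> Optional[str]:
        """Add a strategy; if the window overflows, eliminate one and return it."""
        self.strategies.append(strategy)
        if len(self.strategies) > self.max_size:
            index = self.select_eviction(self.strategies)
            return self.strategies.pop(index)
        return None


def evict_oldest(strategies: Sequence[str]) -> int:
    """Placeholder rule (NOT Nash Clustering): always eliminate the oldest entry."""
    return 0


window = StrategyWindow(max_size=3, select_eviction=evict_oldest)
evicted = [window.add(f"pi_{i}") for i in range(5)]
```

With a window size of 3 and five added strategies, the window ends up holding only the three most recent ones under this placeholder rule. Swapping `evict_oldest` for a rule based on Nash Clustering would change which strategy is removed, but not the bound on the window size.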