Computing a Nash equilibrium is the key challenge in normal-form games with large strategy spaces, and open-ended learning frameworks offer an efficient approach to it. In this work, we propose A-PSRO (Advantage Policy Space Response Oracle), a unified open-ended learning framework for both zero-sum and general-sum games. In particular, we introduce the advantage function as an enhanced evaluation metric for strategies, which gives agents in normal-form games a unified learning objective. We prove that the advantage function has favorable properties and is connected to the Nash equilibrium, so it can serve as an objective that guides agents to learn strategies efficiently. Our experiments show that A-PSRO considerably reduces exploitability in zero-sum games and increases rewards in general-sum games, significantly outperforming previous PSRO algorithms.
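For readers unfamiliar with the exploitability metric mentioned above, the following is a minimal sketch of how it is commonly computed in a two-player zero-sum normal-form game. It is not the paper's implementation and does not implement the advantage function. The names `exploitability` and `RPS_PAYOFF` are assumptions introduced here for illustration, and the sketch uses the convention that exploitability is the sum of both players' best-response gains (some works halve this value).

```python
from typing import Sequence

# Hypothetical example game: rock-paper-scissors payoffs for the row player.
# Rows and columns are ordered (rock, paper, scissors); the column player
# receives the negative of each entry because the game is zero-sum.
RPS_PAYOFF = [
    [0, -1, 1],
    [1, 0, -1],
    [-1, 1, 0],
]


def exploitability(
    payoff: Sequence[Sequence[float]],
    row_strategy: Sequence[float],
    col_strategy: Sequence[float],
) -> float:
    """Sum of both players' best-response gains in a zero-sum matrix game.

    The value is zero exactly when the strategy pair is a Nash equilibrium
    and positive otherwise.
    """
    n_rows, n_cols = len(payoff), len(payoff[0])
    # Row player's expected payoff for each pure row against col_strategy.
    row_values = [
        sum(payoff[i][j] * col_strategy[j] for j in range(n_cols))
        for i in range(n_rows)
    ]
    # Row player's expected payoff for each pure column against row_strategy;
    # the column player wants this to be as small as possible.
    col_values = [
        sum(row_strategy[i] * payoff[i][j] for i in range(n_rows))
        for j in range(n_cols)
    ]
    # (max row value - v) + (v - min column value), where v is the current
    # expected payoff; v cancels out of the sum.
    return max(row_values) - min(col_values)
```

Under this convention, the uniform strategy pair in rock-paper-scissors has zero exploitability, while a pair in which both players always play rock does not, since each player could gain by deviating.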