Function approximation (FA) has been a critical component in solving large zero-sum games. Yet little attention has been paid to FA in solving \textit{general-sum} extensive-form games, even though they are widely regarded as computationally more challenging than their fully competitive or fully cooperative counterparts. A key challenge is that, for many equilibria in general-sum games, there is no simple analogue of the state value function used in Markov Decision Processes and zero-sum games. In this paper, we propose learning the \textit{Enforceable Payoff Frontier} (EPF), a generalization of the state value function to general-sum games. We approximate the optimal \textit{Stackelberg extensive-form correlated equilibrium} by representing EPFs with neural networks and training them with appropriate backup operations and loss functions. This is the first method that applies FA to the Stackelberg setting, allowing us to scale to much larger games while still enjoying performance guarantees based on FA error. Additionally, our proposed method guarantees incentive compatibility and is easy to evaluate without depending on self-play or approximate best-response oracles.