Whether a PTAS (polynomial-time approximation scheme) exists for equilibria of games has been an open question, one that relates to core questions in three fields: the practicality of methods in algorithmic game theory, the question of whether PPAD $=$ FP for these two classes in computational complexity theory, and non-stationarity and the curse of multiagency in MARL (multi-agent reinforcement learning). This paper presents our discovery of sufficient and necessary conditions for iterations based on dynamic programming and line search to approximate perfect equilibria of dynamic games, from which we construct a method proven to be an FPTAS (fully polynomial-time approximation scheme) for non-singular perfect equilibria of dynamic games. Since for almost any given dynamic game all of its perfect equilibria are non-singular, this indicates that FP$\subseteq$PPAD$\subseteq$Almost-FP. Our discovery consists of cone-interior dynamic programming and primal-dual unbiased regret minimization, which fit into existing theories by degeneration in a structure-preserving manner. The former enables a dynamic programming operator to iteratively converge to a perfect equilibrium, based on a concept called the policy cone. The latter enables an interior-point line search to approximate a Nash equilibrium, based on two concepts called primal-dual bias and the unbiased central variety, thereby solving a subproblem of the former. The validity of our discovery is cross-corroborated by a combination of theorem proofs, graphs of the three main concepts, and experimental results.
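To make the regret-minimization idea behind equilibrium approximation concrete, the following is a minimal, generic sketch: two Hedge (multiplicative-weights) learners in self-play on a small zero-sum matrix game, whose time-averaged strategies form an approximate Nash equilibrium. This is a standard textbook construction for illustration only; it is not the paper's primal-dual unbiased regret minimization, and the payoff matrix `A`, learning rate `eta`, and iteration count are hypothetical choices.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over a logit vector."""
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def hedge_selfplay(A, iterations=50_000, eta=0.02):
    """Run two Hedge learners in self-play on the zero-sum matrix game A,
    where the row player maximizes x^T A y and the column player minimizes it.
    Returns the time-averaged strategies, which approximate a Nash equilibrium."""
    m, n = A.shape
    lx, ly = np.zeros(m), np.zeros(n)      # cumulative payoff logits
    x_avg, y_avg = np.zeros(m), np.zeros(n)
    for _ in range(iterations):
        x, y = softmax(lx), softmax(ly)
        x_avg += x
        y_avg += y
        lx += eta * (A @ y)                # row player weights its gains up
        ly -= eta * (A.T @ x)              # column player weights its losses down
    return x_avg / iterations, y_avg / iterations

# A 2x2 zero-sum game (hypothetical example) whose unique equilibrium
# mixes (0.4, 0.6) for both players, with game value 0.2.
A = np.array([[2.0, -1.0],
              [-1.0, 1.0]])
x_bar, y_bar = hedge_selfplay(A)

# Exploitability: the most either player could gain by deviating unilaterally;
# it shrinks as the average regrets of both learners shrink.
value = x_bar @ A @ y_bar
eps = max((A @ y_bar).max() - value, value - (A.T @ x_bar).min())
```

Standard regret bounds give average regret on the order of $\ln n/(\eta T) + O(\eta)$ per player, and the exploitability of the averaged strategy pair is bounded by the sum of the two players' average regrets, so `eps` becomes small as `iterations` grows.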