We develop value iteration-based algorithms to solve, in a unified manner, different classes of combinatorial zero-sum games with mean-payoff type rewards. These algorithms rely on an oracle that evaluates the dynamic programming operator up to a given precision. We show that the number of calls to the oracle needed to determine exact optimal (positional) strategies is, up to a factor polynomial in the dimension, of order R/sep. Here the "separation" sep is the minimal difference between distinct values arising from strategies, and R is a metric estimate involving the norm of approximate sub- and super-eigenvectors of the dynamic programming operator. We illustrate this method with two applications. The first is a new proof, with improved complexity estimates, of a theorem of Boros, Elbassioni, Gurvich and Makino showing that turn-based mean-payoff games with a fixed number of random positions can be solved in pseudo-polynomial time. The second concerns entropy games, a model introduced by Asarin, Cervelle, Degorre, Dima, Horn and Kozyakin. The rank of an entropy game is the maximal rank among all the ambiguity matrices determined by strategies of the two players. We show that entropy games of fixed rank, in their original formulation, can be solved in polynomial time, and that an extension of entropy games incorporating weights can be solved in pseudo-polynomial time under the same fixed-rank condition.
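The interplay between value iteration and separation can be seen in the simplest setting: deterministic turn-based mean-payoff games with no random positions. The following is a minimal sketch, not the authors' algorithm. It replaces their oracle and R/sep analysis with the classical Zwick-Paterson bound |T^k(0)_i / k - chi_i| <= 2nW/k, where n is the number of nodes and W the largest absolute integer weight. Values are cycle means, so each is a fraction with denominator at most n, and two distinct such fractions differ by at least 1/n^2. Once the error drops below half this separation, rounding recovers the exact values. The graph encoding (`owner`, `edges`) and the function names are hypothetical, and every node is assumed to have at least one outgoing edge.

```python
from fractions import Fraction


def shapley(owner, edges, x):
    """One application of the dynamic programming (Shapley) operator T.

    Hypothetical encoding: owner[i] is "max" or "min"; edges[i] is a
    non-empty list of (successor, integer weight) pairs.
    """
    y = []
    for i, succ in enumerate(edges):
        options = [w + x[j] for j, w in succ]
        y.append(max(options) if owner[i] == "max" else min(options))
    return y


def mean_payoff_values(owner, edges):
    """Exact mean-payoff values by value iteration plus rounding.

    Sketch following the Zwick-Paterson approach, not the paper's method.
    """
    n = len(owner)
    W = max(abs(w) for succ in edges for _, w in succ)
    # The error after k steps is at most 2nW/k; taking k > 4 n^3 W makes it
    # strictly smaller than 1/(2 n^2), i.e. half the separation.
    k = 4 * n**3 * W + 1
    x = [0] * n  # exact integer arithmetic: x = T^k(0)
    for _ in range(k):
        x = shapley(owner, edges, x)
    # Each value is the unique fraction with denominator <= n closest to x_i / k.
    return [Fraction(xi, k).limit_denominator(n) for xi in x]


if __name__ == "__main__":
    # Node 0 (Max): self-loop of weight 1, or edge to node 1 of weight 0.
    # Node 1 (Min): edge to node 0 of weight 3, or self-loop of weight 2.
    owner = ["max", "min"]
    edges = [[(0, 1), (1, 0)], [(0, 3), (1, 2)]]
    print(mean_payoff_values(owner, edges))  # [Fraction(3, 2), Fraction(3, 2)]
```

In the example, Max's best positional choice is to move to node 1, and Min's reply is to return to node 0. The resulting 2-cycle has mean (0 + 3) / 2 = 3/2. The iteration count grows like n^3 W, which is pseudo-polynomial. This mirrors, in a much cruder form, the R/sep dependence stated above.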