We devise a policy-iteration algorithm for deterministic two-player discounted and mean-payoff games that runs in polynomial time with high probability on any input where each payoff is chosen independently from a sufficiently random distribution. This includes the case where an arbitrary set of payoffs has been perturbed by Gaussian noise, showing for the first time that deterministic two-player games can be solved efficiently in the sense of smoothed analysis. More generally, we define a condition number for deterministic discounted and mean-payoff games, and show that our algorithm runs in time polynomial in this condition number. Our result confirms a conjecture of Boros et al., which was originally claimed as a theorem and later retracted. It stands in contrast with a recent counterexample by Christ and Yannakakis, which shows that Howard's policy-iteration algorithm does not run in smoothed polynomial time on stochastic single-player mean-payoff games. Our approach is inspired by the analysis of random optimal assignment instances by Frieze and Sorkin, and by the analysis of bias-induced policies for mean-payoff games by Akian, Gaubert and Hochart.