Stochastic two-player games model systems whose environment is both adversarial and stochastic. The adversarial part of the environment is modeled by a player (Player 2) who tries to prevent the system (Player 1) from achieving its objective. We consider finitary versions of the traditional mean-payoff objective, replacing the long-run average of the payoffs by payoff averages computed over a finite sliding window. Two variants have been considered: in one, the maximum window length is fixed and given, while in the other, it is not fixed but is required to be bounded. For both variants, we present complexity bounds and algorithmic solutions for computing strategies for Player 1 that ensure the objective is satisfied with positive probability, with probability 1, or with probability at least $p$. The solution crucially relies on a reduction to the special case of nonstochastic two-player games. We give a general characterization of prefix-independent objectives for which this reduction holds. The positive and almost-sure decision problems are in ${\sf PTIME}$ for the fixed variant and in ${\sf NP \cap coNP}$ for the bounded variant. For arbitrary $p$, the decision problem is in ${\sf NP \cap coNP}$ for both variants, thus matching the bounds for simple stochastic games. By our reduction, the memory requirements for both players in stochastic games are also the same as for nonstochastic games. Further, for nonstochastic games, we improve upon the upper bound on the memory requirement of Player 1 and the lower bound on the memory requirement of Player 2. To the best of our knowledge, this is the first work to consider stochastic games with finitary quantitative objectives.
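To make the sliding-window objective concrete, here is a minimal Python sketch that checks the fixed-window condition on a single ultimately periodic play (a finite prefix followed by a cycle repeated forever). It assumes integer payoffs on edges, a threshold of 0, and the prefix-independent form of the objective in which, from some point on, every window must close within `max_len` steps; a window opened at a position closes once some segment of length at most `max_len` starting there has a nonnegative average (equivalently, a nonnegative sum). The function names are hypothetical, and this only evaluates one play; it is not the game-solving algorithm described above.

```python
from typing import Sequence


def good_window_at(cycle: Sequence[int], start: int, max_len: int) -> bool:
    """True if the window opened at `start` in the infinite play cycle^omega
    closes within `max_len` steps, i.e. some segment of length l with
    1 <= l <= max_len starting at `start` has payoff sum >= 0 (average >= 0)."""
    total = 0
    n = len(cycle)
    for l in range(max_len):
        total += cycle[(start + l) % n]  # wrap around: the play repeats the cycle
        if total >= 0:
            return True
    return False


def lasso_satisfies_fixed_window(cycle: Sequence[int], max_len: int) -> bool:
    """Fixed-window check for a play of the form prefix . cycle^omega.
    Assuming the prefix-independent objective, the prefix can be ignored, and by
    periodicity it suffices to test one window start per cycle position."""
    if not cycle:
        raise ValueError("cycle must be non-empty")
    return all(good_window_at(cycle, i, max_len) for i in range(len(cycle)))
```

For example, with the cycle `[-1, 2]` every window closes within 2 steps but not within 1, so `lasso_satisfies_fixed_window([-1, 2], 2)` is `True` while `lasso_satisfies_fixed_window([-1, 2], 1)` is `False`. A cycle with negative total payoff, such as `[-3, 1, 1]`, has a window (the one opened at its first edge) that never closes, so the check fails for every `max_len`.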