Stochastic two-player games model systems with an environment that is both adversarial and stochastic. The environment is modeled by a player (Player 2) who tries to prevent the system (Player 1) from achieving its objective. We consider finitary versions of the traditional mean-payoff objective, replacing the long-run average of the payoffs by a payoff average computed over a finite sliding window. Two variants have been considered: in one, the maximum window length is fixed and given, while in the other, it is not fixed but is required to be bounded. For both variants, we present complexity bounds and algorithmic solutions for computing strategies for Player 1 that ensure the objective is satisfied with positive probability, with probability $1$, or with probability at least $p$, regardless of the strategy of Player 2. The solution crucially relies on a reduction to the special case of non-stochastic two-player games. We give a general characterization of prefix-independent objectives for which this reduction holds. By our reduction, the memory requirements of both players in stochastic games are also the same as in non-stochastic games. Moreover, for non-stochastic games, we improve upon the upper bound on the memory requirement of Player 1 and upon the lower bound on the memory requirement of Player 2.
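To make the sliding-window idea concrete, the following is a minimal sketch, assuming integer payoffs on the edges of a single play and a threshold of 0 unless one is given. A window starting at position i "closes" within the maximum length if some prefix of it, of length 1 up to `max_len`, has average payoff at least the threshold. Because the length is positive, this is the same as requiring the sum of `payoff - threshold` to be non-negative. The functions `window_closes` and `all_windows_close` are hypothetical illustrations of the fixed-length variant on a finite prefix only. They do not solve games or compute strategies, and they are not the paper's algorithm.

```python
from typing import Sequence


def window_closes(payoffs: Sequence[int], i: int, max_len: int, threshold: int = 0) -> bool:
    """Hypothetical helper: True if some window starting at position i, of
    length 1..max_len, has an average payoff >= threshold."""
    total = 0
    for j in range(i, min(i + max_len, len(payoffs))):
        # Average >= threshold  <=>  sum of (payoff - threshold) >= 0,
        # since the window length (j - i + 1) is positive.
        total += payoffs[j] - threshold
        if total >= 0:
            return True
    return False


def all_windows_close(payoffs: Sequence[int], max_len: int, threshold: int = 0) -> bool:
    """Hypothetical check of the fixed-window condition on a finite prefix.

    Only positions whose full window of max_len fits inside the prefix are
    checked; later positions are undetermined on a finite prefix. If the
    prefix is shorter than max_len, the check is vacuously True.
    """
    if max_len < 1:
        raise ValueError("max_len must be at least 1")
    return all(
        window_closes(payoffs, i, max_len, threshold)
        for i in range(len(payoffs) - max_len + 1)
    )


if __name__ == "__main__":
    play = [-1, 2, -1, 2]
    print(all_windows_close(play, max_len=2))  # each -1 is repaired by the next 2
    print(all_windows_close(play, max_len=1))  # a window of length 1 at -1 never closes
```

In the bounded variant, no `max_len` is given in advance. Instead, one asks whether some finite bound works, which a check on a finite prefix like this one cannot decide on its own.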