Stochastic two-player games model systems whose environment is both adversarial and stochastic. The adversarial part of the environment is modeled by a player (Player 2) who tries to prevent the system (Player 1) from achieving its objective. We consider finitary versions of the traditional mean-payoff objective, which replace the long-run average of the payoffs with the average payoff computed over a finite sliding window. Two variants have been considered: in one, the maximum window length is fixed and given, while in the other, it is not fixed but is required to be bounded. For both variants, we present complexity bounds and algorithmic solutions for computing strategies for Player 1 that ensure the objective is satisfied with positive probability, with probability 1, or with probability at least $p$, regardless of the strategy of Player 2. Our solution relies crucially on a reduction to the special case of non-stochastic two-player games. We give a general characterization of the prefix-independent objectives for which this reduction holds. By the same reduction, the memory requirements of both players in stochastic games are the same as in non-stochastic games. Moreover, for non-stochastic games, we improve upon the upper bound for the memory requirement of Player 1 and upon the lower bound for the memory requirement of Player 2.
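To make the fixed-window variant concrete, the following minimal sketch checks a finite sequence of payoffs against a sliding-window condition. It assumes the usual formulation from the window mean-payoff literature, which the abstract does not spell out: a window starting at a position "closes" if, within at most `max_len` steps, the average payoff collected since that position reaches a threshold (0 by default). The function names, the `threshold` parameter, and the sample payoffs are hypothetical and chosen only for illustration; the sketch evaluates a single play and is not the game-solving algorithm of the paper.

```python
def window_closes(payoffs, start, max_len, threshold=0.0):
    """Return True if some window of length 1..max_len starting at
    index `start` has average payoff >= threshold (assumed definition)."""
    total = 0.0
    for k, payoff in enumerate(payoffs[start:start + max_len], start=1):
        total += payoff
        # average >= threshold  <=>  sum >= threshold * window length
        if total >= threshold * k:
            return True
    return False


def open_positions(payoffs, max_len, threshold=0.0):
    """List the positions whose window fails to close within max_len steps.

    Only positions with a full max_len payoffs ahead of them are judged,
    since a finite prefix cannot decide the remaining ones.
    """
    last = len(payoffs) - max_len
    return [
        i for i in range(last + 1)
        if not window_closes(payoffs, i, max_len, threshold)
    ]


# Hypothetical payoff sequence along one play of the game.
sample = [-1, 2, -3, 1, 1, 1, -1, 1]
print(open_positions(sample, max_len=2))
print(open_positions(sample, max_len=4))
```

In this sample, the window opened at index 2 (payoff -3) needs four steps to recover, so it stays open under a maximum length of 2 but closes under a maximum length of 4. The bounded variant would instead ask whether some finite maximum length works, without fixing it in advance.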