Stochastic two-player games model systems with an environment that is both adversarial and stochastic. In this paper, we study the expected value of the window mean-payoff measure in stochastic games. The window mean-payoff measure strengthens the classical mean-payoff measure by measuring the mean-payoff over a window of bounded length that slides along an infinite path. Two variants have been considered: in one variant, the maximum window length is fixed and given, while in the other, it is not fixed but is required to be bounded. For both variants, we show that the problem of deciding whether the expected value is at least a given threshold is in NP $\cap$ coNP. The result follows from guessing the expected values of the vertices, partitioning them into so-called value classes, and proving that a short certificate for the expected values exists. Finally, we also show that the memory required by the players to play optimally is no more than that in non-stochastic two-player games with the corresponding window objectives.
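To make the sliding-window idea concrete, the following is a minimal sketch on a finite prefix of payoffs. It assumes the formulation commonly used for window objectives, in which a window starting at a position "closes" if, for some length of at most the maximum window length, the mean payoff over that window is at least the threshold. The function names `window_closes` and `good_positions` are hypothetical and chosen for illustration only. The measure studied in the paper is defined on infinite paths in a stochastic game, so this sketch illustrates only the per-position check, not the expected value or the NP $\cap$ coNP procedure.

```python
from fractions import Fraction


def window_closes(payoffs, start, max_len, threshold):
    """Return True if some window of length 1..max_len beginning at `start`
    has mean payoff >= threshold. Only windows that fit inside the finite
    prefix `payoffs` are considered (an assumption of this sketch)."""
    total = Fraction(0)
    for k in range(1, max_len + 1):
        if start + k > len(payoffs):
            break  # the window would run past the end of the prefix
        total += payoffs[start + k - 1]
        # mean over k steps >= threshold  <=>  total >= threshold * k
        if total >= threshold * k:
            return True
    return False


def good_positions(payoffs, max_len, threshold):
    """For each position where a full window of length max_len still fits,
    report whether its window closes within max_len steps."""
    last_start = len(payoffs) - max_len
    return [
        window_closes(payoffs, i, max_len, threshold)
        for i in range(last_start + 1)
    ]


if __name__ == "__main__":
    prefix = [-1, 3, -2, -2, 5]
    print(good_positions(prefix, 2, 0))  # [True, True, False, True]
    print(good_positions(prefix, 3, 0))  # [True, True, True]
```

With maximum window length 2 and threshold 0, the window starting at position 2 does not close, because both the one-step and two-step sums are negative. Raising the maximum length to 3 lets that window absorb the later payoff of 5 and close. This is why the fixed-length and bounded-length variants can assign different values to the same path.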