We consider finite-state Markov decision processes with the combined Energy-MeanPayoff objective. The controller tries to avoid running out of energy while simultaneously attaining a strictly positive mean payoff in a second dimension. We show that finite memory suffices for almost surely winning strategies for the Energy-MeanPayoff objective. This is in contrast to the closely related Energy-Parity objective, where almost surely winning strategies require infinite memory in general. We show that exponential memory is sufficient (even for deterministic strategies) and necessary (even for randomized strategies) for almost surely winning the Energy-MeanPayoff objective. The upper bound holds even if the strictly positive mean payoff part of the objective is generalized to multidimensional strictly positive mean payoff. Finally, it is decidable in pseudo-polynomial time whether an almost surely winning strategy exists.
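For context, a minimal formal sketch of the two parts of the objective, under assumed (standard, but not stated in this abstract) notation: a run of the MDP produces a two-dimensional integer reward $r_i = (r_i^1, r_i^2)$ at each step, and the controller starts with an initial energy credit $c \ge 0$.

% Sketch of the combined objective over a run with step rewards r_i = (r_i^1, r_i^2);
% the names Energy, MeanPayoff and the credit parameter c are assumed notation.
\begin{align*}
\mathrm{Energy}(c) &: \quad c + \sum_{i=0}^{n-1} r_i^{1} \;\ge\; 0 \quad \text{for all } n \ge 0,\\
\mathrm{MeanPayoff}_{>0} &: \quad \liminf_{n \to \infty} \frac{1}{n} \sum_{i=0}^{n-1} r_i^{2} \;>\; 0,\\
\mathrm{EnergyMeanPayoff}(c) &:= \mathrm{Energy}(c) \,\cap\, \mathrm{MeanPayoff}_{>0}.
\end{align*}

Almost surely winning then means that some strategy makes this conjunction of events hold with probability 1.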