Nondeterministic good-for-MDPs (GFM) automata are for MDP model checking and reinforcement learning what good-for-games (GFG) automata are for reactive synthesis: a more compact alternative to deterministic automata that displays nondeterminism, but only so much that it can be resolved locally, such that a syntactic product can be analysed. GFM has recently been introduced as a property for reinforcement learning, where the simpler Büchi acceptance conditions it allows to use is key. However, while there are classic and novel techniques to obtain automata that are GFM, there has not been a decision procedure for checking whether or not an automaton is GFM. We show that GFM-ness is decidable and provide an EXPTIME decision procedure as well as a PSPACE-hardness proof. We also compare the succinctness of GFM automata with other types of automata with restricted nondeterminism. The first natural comparison point are GFG automata. Deterministic automata are GFG, and GFG automata are GFM, but not vice versa. This raises the question of how these classes relate in terms of succinctness. GFG automata are known to be exponentially more succinct than deterministic automata, but the gap between GFM and GFG automata as well as the gap between ordinary nondeterministic automata and those that are GFM have been open. We establish that these gaps are exponential, and sharpen this result by showing that the latter gap remains exponential when restricting the nondeterministic automata to separating safety or unambiguous reachability automata.
翻译:非确定性的适用于马尔可夫决策过程的自动机对于MDP模型检验和强化学习而言,正如适用于博弈的自动机对于反应式综合一样:它是一种比确定性自动机更紧凑的替代方案,虽然表现出非确定性,但其非确定性程度仅限于可在局部消解,从而能够对语法积进行分析。GFM最近被引入作为强化学习的一种性质,其允许使用的更简单的Büchi接受条件至关重要。然而,尽管存在经典和新颖的技术来获得GFM自动机,但一直缺乏用于检验一个自动机是否具有GFM性质的判定过程。我们证明了GFM性质是可判定的,并给出了一个EXPTIME判定过程以及一个PSPACE难度的证明。我们还比较了GFM自动机与其他类型具有受限非确定性的自动机的简洁性。第一个自然的比较点是GFG自动机。确定性自动机是GFG的,且GFG自动机是GFM的,但反之则不成立。这引出了这些类别在简洁性方面如何关联的问题。已知GFG自动机比确定性自动机指数级更简洁,但GFM与GFG自动机之间的差距,以及普通非确定性自动机与GFM自动机之间的差距,此前尚未明确。我们证明这些差距是指数级的,并通过证明当将非确定性自动机限制为分离的安全性自动机或无歧义可达性自动机时,后一差距仍保持指数级,从而强化了这一结果。