Partially observable Markov decision processes (POMDPs) form a prominent model for uncertainty in sequential decision making. We are interested in constructing algorithms with theoretical guarantees to determine whether the agent has a strategy ensuring a given specification with probability 1. This well-studied problem is known to be undecidable already for very simple omega-regular objectives, because of the difficulty of reasoning on uncertain events. We introduce a revelation mechanism which restricts information loss by requiring that almost surely the agent has eventually full information of the current state. Our main technical results are to construct exact algorithms for two classes of POMDPs called weakly and strongly revealing. Importantly, the decidable cases reduce to the analysis of a finite belief-support Markov decision process. This yields a conceptually simple and exact algorithm for a large class of POMDPs.
翻译:部分可观测马尔可夫决策过程(POMDP)是序列决策中不确定性建模的重要框架。本研究致力于构建具有理论保证的算法,以判定智能体是否存在确保以概率1满足给定规约的策略。由于不确定性事件推理的困难,该被深入研究的问题即使对于非常简单的Omega正则目标也已被证明是不可判定的。我们引入了一种启示机制,通过要求智能体几乎必然最终获得当前状态的完整信息来限制信息损失。我们的主要技术成果是为两类分别称为弱启示与强启示的POMDP构建精确算法。重要的是,这些可判定情形可归约为有限信念支撑马尔可夫决策过程的分析。这为一大类POMDP提供了概念简洁且精确的算法。