Unobservable Markov decision processes (UMDPs) are a prominent mathematical framework for modeling sequential decision-making problems. A central question in their computational analysis is decidability, that is, whether an algorithm solving a given problem exists at all. In general, for UMDPs with the long-run average objective, both computing the value exactly and approximating it are undecidable. Building on matrix product theory and ergodic properties, we introduce a novel subclass of UMDPs, termed ergodic UMDPs. Our main result shows that approximating the value is decidable within this subclass. The exact problem, however, remains undecidable even there. Finally, we discuss the main challenges in extending these results to partially observable Markov decision processes.