Markov decision processes can be viewed as transformers of probability distributions. While this view is useful from a practical standpoint to reason about trajectories of distributions, basic reachability and safety problems are known to be computationally intractable (i.e., Skolem-hard) to solve in such models. Further, we show that even for simple examples of MDPs, strategies for safety objectives over distributions can require infinite memory and randomization. In light of this, we present a novel overapproximation approach to synthesize strategies in an MDP, such that a safety objective over the distributions is met. More precisely, we develop a new framework for template-based synthesis of certificates as affine distributional and inductive invariants for safety objectives in MDPs. We provide two algorithms within this framework. One can only synthesize memoryless strategies, but has relative completeness guarantees, while the other can synthesize general strategies. The runtime complexity of both algorithms is in PSPACE. We implement these algorithms and show that they can solve several non-trivial examples.
翻译:马尔可夫决策过程可被视为概率分布的变换器。虽然这一视角从实践角度有助于分析分布轨迹,但已知在此类模型中求解基本可达性与安全性问题在计算上具有难解性(即Skolem困难)。进一步地,我们证明即使对于MDP的简单实例,针对分布安全目标的策略可能需要无限记忆与随机化。鉴于此,我们提出一种新颖的过近似方法,用于在MDP中综合满足分布安全目标的策略。具体而言,我们构建了一个新框架,通过模板化综合将仿射分布归纳不变量作为安全目标的证书。该框架提供两种算法:一种仅能综合无记忆策略但具有相对完备性保证,另一种则可综合通用策略。两种算法的运行时复杂度均属于PSPACE。我们实现了这些算法,并证明其能解决多个非平凡实例。