We present a general framework for applying learning algorithms and heuristical guidance to the verification of Markov decision processes (MDPs), based on the ideas of Br\'azdil, T. et al. (2014). Verification of Markov Decision Processes Using Learning Algorithms. The primary goal of the techniques presented in that work is to improve performance by avoiding an exhaustive exploration of the state space, guided by heuristics. This approach is significantly extended in this work. Several details of the base theory are refined and errors are fixed. Section 1.3 provides an overview of all differences. The presented framework focuses on probabilistic reachability, which is a core problem in verification, and is instantiated in two distinct scenarios. The first assumes that full knowledge of the MDP is available, in particular precise transition probabilities. It performs a heuristic-driven partial exploration of the model, yielding precise lower and upper bounds on the required probability. The second tackles the case where we may only sample the MDP without knowing the exact transition dynamics. Here, we obtain probabilistic guarantees, again in terms of both the lower and upper bounds, which provides efficient stopping criteria for the approximation. In particular, the latter is an extension of statistical model-checking (SMC) for unbounded properties in MDPs. In contrast to other related approaches, we do not restrict our attention to time-bounded (finite-horizon) or discounted properties, nor assume any particular structural properties of the MDP.
翻译:我们基于Brázdil, T.等人(2014)提出的思想,构建了一个将学习算法与启发式引导应用于马尔可夫决策过程(MDP)验证的通用框架。该文献中提出的技术主要通过启发式引导避免状态空间的穷举搜索,从而提升验证性能。本研究对该方法进行了显著扩展:完善了基础理论的若干细节,修正了原有错误,第1.3节完整列出了所有差异。本框架聚焦于概率可达性这一验证领域的核心问题,并在两种不同场景中实现。第一种场景假设完全了解MDP模型(包括精确转移概率),通过启发式驱动的部分模型探索,获得所需概率的精确上下界。第二种场景针对仅能对MDP进行采样而未知精确转移动态的情况,通过概率性保证(同样包含上下界)为近似计算提供高效终止条件。特别地,后者是将统计模型检验(SMC)方法扩展至MDP无界属性验证的创新应用。与现有方法不同,我们既不局限于时间有界(有限时域)或折扣属性,也不假定MDP具有任何特定结构性质。