Partially Observable Markov Decision Processes (POMDPs) are the standard framework for decision-making under uncertainty. While sampling-based methods scale well, they lack formal correctness guarantees, making them unsuitable for safety-critical applications. Conversely, formal synthesis techniques provide correctness-by-construction but often struggle with scalability, as general POMDP synthesis is undecidable. To bridge this gap, we propose a synthesis framework that integrates sampling, automata learning, and model-checking. Inspired by Angluin's $L^*$ algorithm, our approach utilizes sampling as a membership oracle and model-checking as an equivalence oracle. This enables the synthesis of finite-state controllers with formal guarantees, provided the sampling-induced policy is regular. We establish a relative completeness result for this framework. Experimental results from our prototypical implementation demonstrate that this method successfully solves threshold-safety problems that remain challenging for existing formal synthesis tools. We believe our algorithm serves as a valuable component in a portfolio approach to tackling the inherent difficulty of POMDP synthesis problems.
翻译:部分可观测马尔可夫决策过程(POMDP)是不确定性下决策的标准框架。尽管基于采样的方法具有良好的可扩展性,但它们缺乏形式化的正确性保证,因此不适用于安全关键型应用。相反,形式化综合技术虽能提供“构建即正确”的保障,但通常难以扩展,因为一般POMDP的综合问题不可判定。为弥合这一差距,我们提出了一种集成采样、自动机学习和模型检验的综合框架。受Angluin的$L^*$算法启发,我们的方法将采样作为成员查询函数,将模型检验作为等价查询函数。这使得在采样诱导的策略是正则的前提下,能够综合出具有形式化保证的有限状态控制器。我们为该框架建立了相对完备性结果。来自原型实现的实验结果表明,该方法能够成功解决现有形式化综合工具难以处理的阈值安全问题。我们相信,该算法可作为组合方法中的关键组件,用于应对POMDP综合问题的固有难度。