The de-facto standard approach in MDP verification is based on value iteration (VI). We propose compositional VI, a framework for model checking compositional MDPs, that addresses efficiency while maintaining soundness. Concretely, compositional MDPs naturally arise from the combination of individual components, and their structure can be expressed using, e.g., string diagrams. Towards efficiency, we observe that compositional VI repeatedly verifies individual components. We propose a technique called Pareto caching that allows to reuse verification results, even for previously unseen queries. Towards soundness, we present two stopping criteria: one generalizes the optimistic value iteration paradigm and the other uses Pareto caches in conjunction with recent baseline algorithms. Our experimental evaluations shows the promise of the novel algorithm and its variations, and identifies challenges for future work.
翻译:马尔可夫决策过程(MDP)验证的事实标准方法基于值迭代(VI)。我们提出组合VI——一种用于验证组合MDP的框架,旨在兼顾效率与正确性。具体而言,组合MDP由多个单独组件的组合自然产生,其结构可通过(例如)字符串图进行表达。为提升效率,我们观察到组合VI会反复验证单个组件。我们提出一种名为帕累托缓存的技术,该技术能够重用验证结果,即使对于前所未见的查询场景亦适用。在正确性方面,我们提出两种停止准则:一种是对乐观值迭代范式的推广,另一种则将帕累托缓存与近期基线算法结合使用。实验评估表明,所提新算法及其变体具有应用前景,并指出了未来研究面临的挑战。