A labelled Markov decision process (MDP) is a labelled Markov chain with nondeterminism; that is, a labelled MDP together with a strategy induces a labelled Markov chain. The model is related to interval Markov chains. Motivated by applications to the verification of probabilistic noninterference in security, we study the problem of minimising probabilistic bisimilarity distances of labelled MDPs: do there exist strategies such that the probabilistic bisimilarity distance between the induced labelled Markov chains is less than a given rational number? We consider this question for both memoryless and general strategies. We show that the distance minimisation problem is ExTh(R)-complete (that is, complete for the existential theory of the reals) for memoryless strategies and undecidable for general strategies. We also study the computational complexity of the qualitative problem of making the distance less than one. This problem is known to be NP-complete for memoryless strategies; we show that it is EXPTIME-complete for general strategies.
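To illustrate the statement that a labelled MDP together with a strategy induces a labelled Markov chain, the following is a minimal sketch. It assumes a memoryless, randomised strategy. The toy MDP, the state and action names, and the helper `induce_chain` are all hypothetical and do not come from the paper. The sketch only resolves the nondeterminism and does not compute any bisimilarity distance.

```python
from fractions import Fraction

# Hypothetical toy labelled MDP (not from the paper): every state carries a
# label, and every action is a probability distribution over successor states.
labels = {"s0": "a", "s1": "b", "s2": "b"}
actions = {
    "s0": {
        "left": {"s1": Fraction(1)},
        "right": {"s1": Fraction(1, 2), "s2": Fraction(1, 2)},
    },
    "s1": {"stay": {"s1": Fraction(1)}},
    "s2": {"stay": {"s2": Fraction(1)}},
}


def induce_chain(actions, strategy):
    """Resolve nondeterminism with a memoryless randomised strategy.

    strategy[state][action] is the probability of picking that action in
    that state. The result maps each state to its successor distribution
    in the induced Markov chain; state labels are left unchanged.
    """
    chain = {}
    for state, choice in strategy.items():
        row = {}
        for action, weight in choice.items():
            for succ, prob in actions[state][action].items():
                # Mix the action's distribution according to the strategy.
                row[succ] = row.get(succ, Fraction(0)) + weight * prob
        assert sum(row.values()) == 1, "each row must be a distribution"
        chain[state] = row
    return chain


# Assumed strategy: in s0 choose "left" with probability 1/3, else "right".
strategy = {
    "s0": {"left": Fraction(1, 3), "right": Fraction(2, 3)},
    "s1": {"stay": Fraction(1)},
    "s2": {"stay": Fraction(1)},
}
chain = induce_chain(actions, strategy)
```

Exact rationals are used here so that the induced transition probabilities can be compared without floating-point error. A general (history-dependent) strategy would need the choice to depend on the path so far and is not covered by this sketch.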