This paper studies the influence of probabilism and non-determinism on a quantitative aspect X of the execution of a system modeled as a Markov decision process (MDP). To this end, the novel notion of demonic variance is introduced: for a random variable X in an MDP M, it is defined as 1/2 times the maximal expected squared distance between the values of X in two independent executions of M, where the non-deterministic choices are also resolved independently by two distinct schedulers. The demonic variance is shown to be between 1 and 2 times as large as the maximal variance of X in M achievable by a single scheduler. This allows defining a non-determinism score for M and X that measures how strongly the difference of X between two executions of M can be influenced by the non-deterministic choices. Properties of MDPs M with extremal values of the non-determinism score are established. Further, the algorithmic problems of computing the maximal variance and the demonic variance are investigated for two random variables, namely weighted reachability and accumulated rewards. In the process, the structure of schedulers maximizing the variance and of scheduler pairs realizing the demonic variance is also analyzed.
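To make the definition concrete, the following minimal sketch evaluates both quantities on a toy example. It assumes a hypothetical one-step MDP with a single state and exactly two actions, each fixing a distribution of X; all names (`ACTIONS`, `max_variance`, `demonic_variance`) are illustrative and not taken from the paper. The grid search over randomized schedulers is a brute-force stand-in, not the algorithm developed in the paper. In this one-step setting the expected squared distance is bilinear in the two schedulers' distributions, so enumerating pairs of deterministic choices suffices for the demonic variance.

```python
from itertools import product

# Hypothetical one-step MDP: each action fixes a distribution of X,
# given as a list of (value, probability) pairs.
ACTIONS = {
    "a": [(0.0, 1.0)],  # X = 0 with certainty
    "b": [(1.0, 1.0)],  # X = 1 with certainty
}

def mean(dist):
    return sum(p * x for x, p in dist)

def variance(dist):
    m = mean(dist)
    return sum(p * (x - m) ** 2 for x, p in dist)

def half_expected_sq_distance(d1, d2):
    # 1/2 * E[(X1 - X2)^2] for independent X1 ~ d1 and X2 ~ d2.
    return 0.5 * sum(p * q * (x - y) ** 2
                     for (x, p), (y, q) in product(d1, d2))

def mix(d1, d2, w):
    # Distribution of X under a scheduler that picks d1 with probability w.
    return [(x, w * p) for x, p in d1] + [(x, (1 - w) * p) for x, p in d2]

def demonic_variance(actions):
    # Maximize over pairs of deterministic choices (sufficient here
    # because the objective is bilinear in the two distributions).
    dists = list(actions.values())
    return max(half_expected_sq_distance(d1, d2)
               for d1, d2 in product(dists, repeat=2))

def max_variance(actions, steps=1000):
    # Brute-force grid over randomized schedulers; assumes exactly two actions.
    d1, d2 = actions.values()
    return max(variance(mix(d1, d2, k / steps)) for k in range(steps + 1))

v_max = max_variance(ACTIONS)
v_dem = demonic_variance(ACTIONS)
print(v_max, v_dem)
```

In this example a single scheduler maximizes the variance by randomizing evenly between the two actions, while the demonic variance is realized by two schedulers that deterministically pick different actions. The demonic variance is twice the maximal variance here, the upper end of the range stated in the abstract.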