Linear Temporal Logic (LTL) is a formal language for specifying complex objectives in planning problems modeled as Markov Decision Processes (MDPs). The planning problem aims to find the optimal policy that maximizes the probability of satisfying the LTL objective. One way to solve this problem is to use a surrogate reward with two discount factors together with dynamic programming, which bypasses the graph analysis used in traditional model checking. The surrogate reward is designed so that its value function represents the satisfaction probability. However, when one of the discount factors is set to $1$ for higher accuracy, the convergence of dynamic programming is no longer guaranteed by the standard one-step contraction argument. This work shows that a multi-step contraction always holds for the dynamic programming updates, guaranteeing that the approximate value function converges exponentially to the true value function. Thus, the computation of the satisfaction probability is guaranteed.
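For concreteness, the following is a minimal sketch of value iteration with a two-discount surrogate reward of the kind described above; the function name, data layout, and parameter choices are illustrative assumptions, not the paper's implementation. It assigns reward $1-\gamma_B$ and discount $\gamma_B$ to accepting states of a product MDP and reward $0$ with discount $\gamma$ elsewhere, so the fixed point approximates the satisfaction probability as both discounts approach $1$.

```python
import numpy as np

def surrogate_value_iteration(P, accepting, gamma=1.0, gamma_B=0.99,
                              num_iters=10000, tol=1e-8):
    """Value iteration with a two-discount surrogate reward (illustrative sketch).

    P         : list of (S x S) transition matrices, one per action (hypothetical layout).
    accepting : boolean array of length S marking accepting states of the product MDP.
    gamma     : discount used in non-accepting states (set to 1 for higher accuracy).
    gamma_B   : discount used in accepting states.
    """
    num_actions = len(P)
    num_states = P[0].shape[0]

    # Per-state reward and discount induced by the surrogate construction.
    reward = np.where(accepting, 1.0 - gamma_B, 0.0)
    discount = np.where(accepting, gamma_B, gamma)

    V = np.zeros(num_states)
    for _ in range(num_iters):
        # Bellman optimality backup: maximize the surrogate return over actions.
        Q = np.stack([reward + discount * (P[a] @ V) for a in range(num_actions)])
        V_new = Q.max(axis=0)
        if np.max(np.abs(V_new - V)) < tol:
            break
        V = V_new
    return V
```

With $\gamma = 1$ the backup is not a one-step contraction in non-accepting states, which is precisely the setting where the multi-step contraction argument is needed to justify that the iterates above still converge exponentially.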