Value Iteration is a widely used algorithm for solving Markov Decision Processes (MDPs). Although its convergence properties have been studied extensively, prior analyses focus primarily on convergence in the infinity norm. In this work, we use absolute probability sequences to develop a new line of analysis and examine the algorithm's convergence in the $L^2$ norm, offering a complementary perspective on its behavior and performance.
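As a point of reference, the following is a minimal sketch of standard synchronous value iteration on a small, randomly generated MDP. It records the error against an approximate fixed point in both the infinity norm and the unweighted Euclidean $L^2$ norm. Several pieces are illustrative assumptions rather than anything taken from this work: the random MDP, the helper names (`bellman_update`, `random_mdp`, `value_iteration`), the unweighted $L^2$ norm, and the use of 500 iterations as a stand-in for $V^*$. The sketch only measures the two errors. It does not implement the absolute-probability-sequence analysis.

```python
import math
import random


def bellman_update(V, P, R, gamma):
    """One synchronous Bellman optimality update.

    P[a][s][t]: probability of moving from state s to state t under action a.
    R[a][s]:    expected immediate reward for taking action a in state s.
    """
    n_actions, n_states = len(P), len(V)
    return [
        max(
            R[a][s] + gamma * sum(P[a][s][t] * V[t] for t in range(n_states))
            for a in range(n_actions)
        )
        for s in range(n_states)
    ]


def sup_norm(x):
    """Infinity norm: largest absolute component."""
    return max(abs(v) for v in x)


def l2_norm(x):
    """Unweighted Euclidean norm (an assumption; weighted variants also exist)."""
    return math.sqrt(sum(v * v for v in x))


def random_mdp(n_states, n_actions, seed=0):
    """Hypothetical random MDP with rewards in [0, 1), for illustration only."""
    rng = random.Random(seed)
    P, R = [], []
    for _ in range(n_actions):
        rows = []
        for _ in range(n_states):
            w = [rng.random() for _ in range(n_states)]
            total = sum(w)
            rows.append([x / total for x in w])  # normalize to a distribution
        P.append(rows)
        R.append([rng.random() for _ in range(n_states)])
    return P, R


def value_iteration(P, R, gamma, iters):
    """Run `iters` Bellman updates from V_0 = 0 and return all iterates."""
    V = [0.0] * len(P[0])
    history = [V]
    for _ in range(iters):
        V = bellman_update(V, P, R, gamma)
        history.append(V)
    return history


if __name__ == "__main__":
    gamma = 0.9
    P, R = random_mdp(n_states=5, n_actions=2)
    # Approximate V* by running far more iterations than we inspect.
    V_star = value_iteration(P, R, gamma, 500)[-1]
    for k, V in enumerate(value_iteration(P, R, gamma, 30)):
        err = [v - w for v, w in zip(V, V_star)]
        if k % 5 == 0:
            print(f"k={k:2d}  sup={sup_norm(err):.3e}  l2={l2_norm(err):.3e}")
```

The infinity-norm error shrinks by at least a factor of $\gamma$ per step, because the Bellman optimality operator is a $\gamma$-contraction in that norm. For $n$ states, $\|x\|_2 \le \sqrt{n}\,\|x\|_\infty$, so this classical argument yields only a loose $L^2$ bound that carries a $\sqrt{n}$ factor.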