In many branches of engineering, the Banach contraction mapping theorem is employed to establish the convergence of certain deterministic algorithms. Randomized versions of these algorithms have been developed and have proved useful in data-driven problems. In one class of randomized algorithms, the contraction map is approximated at each iteration by an operator that uses independent and identically distributed (i.i.d.) samples of certain random variables. This yields iterated random operators acting on an initial point in a complete metric space, which generate a Markov chain. In this paper, we develop a new proof technique based on stochastic dominance, called probabilistic contraction analysis, for establishing the convergence in probability of Markov chains generated by such iterated random operators in a certain limiting regime. The methods developed in this paper provide a general framework for understanding the convergence of a wide variety of Monte Carlo methods in which a contractive property is present. As representative applications of this general framework, we use the convergence result to establish the convergence of fitted value iteration and fitted relative value iteration in Markov decision problems with continuous state and action spaces.
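The iterated-random-operator scheme described above can be illustrated with a minimal sketch. It assumes a hypothetical two-state, two-action MDP with known transition probabilities. The paper itself treats continuous state and action spaces, so this finite toy is not its fitted value iteration. At each iteration, a random operator T_n replaces the expectation in the Bellman operator with an average over n fresh i.i.d. samples of the next state. The names `P`, `R`, `GAMMA`, `empirical_bellman` and the other helpers are illustrative assumptions, not taken from the paper.

```python
import random

# Hypothetical 2-state, 2-action MDP used only for illustration.
# P[s][a] is the distribution over next states; R[s][a] is the reward.
P = [[[0.9, 0.1], [0.2, 0.8]],
     [[0.5, 0.5], [0.0, 1.0]]]
R = [[1.0, 0.0],
     [0.0, 2.0]]
GAMMA = 0.9
STATES = range(len(P))
ACTIONS = range(len(P[0]))


def exact_bellman(v):
    """Exact Bellman operator T, a GAMMA-contraction in the sup norm."""
    return [max(R[s][a] + GAMMA * sum(p * v[t] for t, p in enumerate(P[s][a]))
                for a in ACTIONS)
            for s in STATES]


def empirical_bellman(v, n, rng):
    """Random operator T_n: the expectation over next states is replaced by
    an average over n i.i.d. sampled next states (fresh samples per call)."""
    out = []
    for s in STATES:
        best = float("-inf")
        for a in ACTIONS:
            samples = rng.choices(range(len(P[s][a])), weights=P[s][a], k=n)
            q = R[s][a] + GAMMA * sum(v[t] for t in samples) / n
            best = max(best, q)
        out.append(best)
    return out


def iterate(op, v0, k):
    """Apply the operator op to the starting point v0 k times."""
    v = list(v0)
    for _ in range(k):
        v = op(v)
    return v


def sup_dist(u, w):
    """Sup-norm distance between two value vectors."""
    return max(abs(a - b) for a, b in zip(u, w))


if __name__ == "__main__":
    # Fixed point of the exact operator, approximated by many exact iterations.
    v_star = iterate(exact_bellman, [0.0, 0.0], 500)
    rng = random.Random(0)
    for n in (1, 10, 100, 1000):
        v_k = iterate(lambda v: empirical_bellman(v, n, rng), [0.0, 0.0], 100)
        print(n, round(sup_dist(v_k, v_star), 3))
```

Each run of the empirical iteration is one path of a Markov chain on the value vectors. As the sample size n grows, the distance between the k-th iterate and the fixed point of the exact operator is expected to shrink. This is the kind of limiting-regime convergence in probability that the analysis is meant to establish.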