We propose SmoothCruiser, a new planning algorithm for estimating the value function in entropy-regularized Markov decision processes and two-player games, given a generative model of the environment. SmoothCruiser exploits the smoothness of the Bellman operator promoted by the regularization to achieve a problem-independent sample complexity of order $\tilde{\mathcal{O}}(1/\epsilon^4)$ for a desired accuracy $\epsilon$, whereas for non-regularized settings no known algorithm has guaranteed polynomial sample complexity in the worst case.
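To illustrate what smoothness of a regularized Bellman operator can look like, the sketch below assumes the common log-sum-exp form of entropy regularization with a temperature `lam`; the abstract does not state the exact operator, so this form, the function names, and the sample Q-values are all assumptions made for illustration. It is not the SmoothCruiser algorithm itself, only the backup step whose gradient is a softmax distribution and which approaches the hard maximum as `lam` shrinks.

```python
import math


def soft_max_value(q_values, lam):
    """Assumed entropy-regularized backup: lam * log(sum(exp(q / lam)))."""
    m = max(q_values)  # shift by the max for numerical stability
    return m + lam * math.log(sum(math.exp((q - m) / lam) for q in q_values))


def soft_max_policy(q_values, lam):
    """Gradient of soft_max_value with respect to q_values (a softmax)."""
    m = max(q_values)
    weights = [math.exp((q - m) / lam) for q in q_values]
    total = sum(weights)
    return [w / total for w in weights]


if __name__ == "__main__":
    q = [1.0, 0.5, -0.2]  # hypothetical action values at a single state
    for lam in (1.0, 0.1, 0.01):
        value = soft_max_value(q, lam)
        policy = soft_max_policy(q, lam)
        print(f"lam={lam}: value={value:.4f}, policy={[round(p, 4) for p in policy]}")
```

Under this assumed form, the regularized value always lies between `max(q)` and `max(q) + lam * log(len(q))`, so a larger `lam` gives a smoother backup at the cost of a larger gap from the unregularized maximum.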