Stochastic dual dynamic programming is a cutting-plane type algorithm for multi-stage stochastic optimization that originated about 30 years ago. Despite its popularity in practice, no analysis of the convergence rate of this method exists. In this paper, we first establish the number of iterations, i.e., the iteration complexity, required by a basic dynamic cutting plane method for solving relatively simple multi-stage optimization problems, by introducing novel mathematical tools including the saturation of search points. We then refine these basic tools and establish the iteration complexity of both deterministic and stochastic dual dynamic programming methods for solving more general multi-stage stochastic optimization problems under the standard stage-wise independence assumption. Our results indicate that the complexity of some deterministic variants of these methods increases mildly with the number of stages $T$; in fact, it depends linearly on $T$ for discounted problems. Therefore, they are efficient for strategic decision making that involves a large number of stages but a relatively small number of decision variables in each stage. Since they do not explicitly discretize the state and action spaces, these methods might also be pertinent to the related reinforcement learning and stochastic control areas.