We consider the problem of nonlinear stochastic optimal control. This problem is thought to be fundamentally intractable owing to Bellman's ``curse of dimensionality''. We present a result showing that repeatedly solving an open-loop deterministic problem from the current state, with progressively shorter horizons, as in Model Predictive Control (MPC), yields a feedback policy that is within $O(\epsilon^4)$ of the true global stochastic optimal policy, where $\epsilon$ is a perturbation parameter modulating the noise. We show that the optimal deterministic feedback problem has a perturbation structure, in that higher-order terms of the feedback law do not affect lower-order terms, and that this structure is lost in the optimal stochastic feedback problem. Consequently, solving the Stochastic Dynamic Programming problem is highly susceptible to noise even when it is tractable, and in practice the MPC-type feedback law offers superior performance even for stochastic systems.
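For concreteness, the MPC-type feedback law referred to above can be sketched as follows; the notation ($f$ for the deterministic dynamics, $\ell$ for the stage cost, $\varphi$ for the terminal cost, $T$ for the horizon) is generic and is used here only for illustration, not fixed by the statement above. At time $t$ and current state $x$, one solves the deterministic open-loop problem over the remaining horizon and applies only its first control,
\[
\pi_{\mathrm{MPC}}(x,t) = u_t^{*}, \qquad
(u_t^{*},\dots,u_{T-1}^{*}) \in \arg\min_{u_t,\dots,u_{T-1}}\;
\varphi(x_T) + \sum_{s=t}^{T-1} \ell(x_s,u_s)
\quad \text{s.t.}\quad x_{s+1} = f(x_s,u_s),\; x_t = x,
\]
before re-solving from the next realized state over the shorter horizon $T-(t+1)$, so that the noise enters only through the measured state at each re-solve.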