The solution to a stochastic optimal control problem can be determined by computing the value function from a discretization of the associated Hamilton-Jacobi-Bellman equation. Alternatively, the problem can be reformulated in terms of a pair of forward-backward SDEs, which makes Monte Carlo techniques applicable. More recently, the problem has also been viewed from the perspective of forward and reverse-time SDEs and their associated Fokker-Planck equations, an approach closely related to techniques used in diffusion-based generative models. Forward and reverse-time formulations express the value function as the ratio of two probability density functions: one stemming from a forward McKean-Vlasov SDE and the other from a reverse McKean-Vlasov SDE. In this paper, we extend this approach to a more general class of stochastic optimal control problems and combine it with ensemble Kalman filter-type and diffusion map approximation techniques to obtain efficient and robust particle-based algorithms.