The solution to a stochastic optimal control problem can be determined by computing the value function from a discretisation of the associated Hamilton-Jacobi-Bellman equation. Alternatively, the problem can be reformulated in terms of a pair of forward-backward SDEs, which makes Monte Carlo techniques applicable. More recently, the problem has also been viewed from the perspective of forward and reverse-time SDEs and their associated Fokker-Planck equations, an approach closely related to techniques used in score-based generative models. Forward and reverse-time formulations express the value function as the ratio of two probability density functions: one stemming from a forward McKean-Vlasov SDE and the other from a reverse McKean-Vlasov SDE. In this note, we extend this approach to a more general class of stochastic optimal control problems and combine it with ensemble Kalman filter-type and diffusion map approximation techniques in order to obtain efficient and robust particle-based algorithms.