The solution to a stochastic optimal control problem can be determined by computing the value function from a discretisation of the associated Hamilton-Jacobi-Bellman equation. Alternatively, the problem can be reformulated in terms of a pair of forward-backward SDEs, which makes Monte Carlo techniques applicable. More recently, the problem has also been viewed from the perspective of forward and reverse-time SDEs and their associated Fokker-Planck equations. This approach is closely related to techniques used in score-based generative models. Forward and reverse-time formulations express the value function as the ratio of two probability density functions: one stemming from a forward SDE and the other from a reverse-time SDE. In this note, we extend this approach to a more general class of stochastic optimal control problems and combine it with ensemble Kalman filter-type approximation techniques in order to obtain an efficient and robust numerical scheme.
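For orientation, the connection between forward and reverse-time SDEs that underlies score-based generative models can be sketched as follows. The notation (drift $f$, constant scalar diffusion $\sigma$, marginal density $p_t$, Brownian motions $W_t$ and $\bar{W}_t$) is assumed here for illustration and is not taken from the note itself, whose setting may be more general.

```latex
% Forward SDE on [0, T] and the Fokker-Planck equation for its marginal density p_t
% (illustrative notation; constant scalar diffusion sigma is a simplifying assumption)
\begin{align}
  \mathrm{d}X_t &= f(X_t, t)\,\mathrm{d}t + \sigma\,\mathrm{d}W_t, \\
  \partial_t p_t &= -\nabla_x \cdot \bigl(p_t\, f\bigr) + \frac{\sigma^2}{2}\,\Delta_x p_t .
\end{align}
% Reverse-time SDE, integrated backward from t = T to t = 0, with the same marginals p_t;
% \bar{W}_t denotes a reverse-time Brownian motion and \nabla_x \log p_t is the score
\begin{equation}
  \mathrm{d}X_t = \bigl[f(X_t, t) - \sigma^2\,\nabla_x \log p_t(X_t)\bigr]\,\mathrm{d}t
                  + \sigma\,\mathrm{d}\bar{W}_t .
\end{equation}
```

The score term $\nabla_x \log p_t$ is the quantity estimated in score-based generative modelling. In this sketch it is simply assumed to be available.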