In this paper, a highly parallel and derivative-free martingale neural network learning method is proposed to solve Hamilton-Jacobi-Bellman (HJB) equations arising from stochastic optimal control problems (SOCPs), as well as general quasilinear parabolic partial differential equations (PDEs). In both cases, the PDEs are recast in a martingale formulation so that the loss functions do not require computing the gradient or Hessian matrix of the PDE solution, while the implementation can be parallelized across both the temporal and spatial domains. Moreover, the martingale conditions for the PDEs are enforced using a Galerkin method in conjunction with adversarial learning techniques, eliminating the need for direct computation of the conditional expectations associated with the martingale property. For SOCPs, a derivative-free implementation of the maximum principle for optimal controls is also introduced. Numerical results demonstrate the effectiveness and efficiency of the proposed method, which accurately solves HJB equations and quasilinear parabolic PDEs in dimensions as high as 10,000.