In this paper, we propose SOC-MartNet, a martingale-based neural network for solving high-dimensional Hamilton-Jacobi-Bellman (HJB) equations, where no explicit expression is needed for the infimum of the Hamiltonian, $\inf_{u \in U} H(t,x,u,z,p)$, and stochastic optimal control problems (SOCPs) with controls on both drift and volatility. We reformulate the HJB equation for the value function as a training problem for two neural networks, one for the value function and one for the optimal control, with the help of two stochastic processes: a Hamiltonian process and a cost process. The control and value networks are trained so that the associated Hamiltonian process is minimized, satisfying the minimum principle of a feedback SOCP, and the cost process becomes a martingale, which ensures that the value network solves the corresponding HJB equation. To enforce the martingale property of the cost process, we employ an adversarial network and construct a loss function that characterizes the projection property of the conditional expectation defining a martingale. Numerical results show that SOC-MartNet is effective and efficient for solving HJB-type equations and SOCPs in dimensions up to 2000, using a small number of training epochs (fewer than 20) or stochastic gradient iterations (fewer than 2000).
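The martingale condition used in the loss can be illustrated on a toy problem. A process M is a martingale when the increment M_{t+dt} - M_t is uncorrelated with every test function of the current state, so the empirical mean of rho(X_t) * (M_{t+dt} - M_t) should vanish. The sketch below is not the SOC-MartNet implementation: it assumes a one-dimensional Brownian motion W, a single fixed test function rho = cos standing in for the adversarial network, and hypothetical names (`martingale_residual`, `res_martingale`, `res_drifting`). It compares the martingale W_t^2 - t against the non-martingale W_t^2.

```python
import numpy as np


def martingale_residual(m_now, m_next, test_values):
    """Squared Monte Carlo estimate of E[rho(X_t) * (M_{t+dt} - M_t)].

    The value is close to zero when M behaves like a martingale with
    respect to the chosen test function rho (passed in as test_values).
    """
    increments = m_next - m_now
    return float(np.mean(test_values * increments) ** 2)


rng = np.random.default_rng(0)
n_paths, t, dt = 200_000, 1.0, 0.1

# Sample Brownian motion at times t and t + dt (toy state process).
w_now = np.sqrt(t) * rng.standard_normal(n_paths)
w_next = w_now + np.sqrt(dt) * rng.standard_normal(n_paths)

# Fixed test function; an adversarial network would instead be trained
# to make the residual as large as possible.
rho = np.cos(w_now)

# M_t = W_t^2 - t is a martingale; M_t = W_t^2 has drift and is not.
res_martingale = martingale_residual(w_now**2 - t, w_next**2 - (t + dt), rho)
res_drifting = martingale_residual(w_now**2, w_next**2, rho)

print(f"residual for W^2 - t: {res_martingale:.2e}")
print(f"residual for W^2:     {res_drifting:.2e}")
```

In this toy setting, an odd test function such as tanh would give a near-zero residual even for W_t^2, because the drift is then averaged against a function with zero mean. That is one way to see why searching over test functions, rather than fixing one, is useful.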