In this work, we propose SOC-MartNet, a martingale-based neural network for solving high-dimensional Hamilton-Jacobi-Bellman (HJB) equations in which no explicit expression is needed for the Hamiltonian $\inf_{u \in U} H(t,x,u,z,p)$, and for solving stochastic optimal control problems (SOCPs) with controls on both drift and volatility. We reformulate the HJB equation as a stochastic neural network learning process: a control network and a value network are trained such that the associated Hamiltonian process is minimized and the cost process becomes a martingale. To enforce the martingale property of the cost process, we employ an adversarial network and construct a loss function based on the projection property of conditional expectations. The control/value networks and the adversarial network are then trained adversarially, so that the cost process is driven towards a martingale and the minimum principle is satisfied for the control. Numerical results show that SOC-MartNet is effective and efficient for solving HJB-type equations and SOCPs in dimensions up to $500$ within a small number of training epochs.
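
The projection property of conditional expectations that underlies the martingale loss can be illustrated on a toy example. A process $M$ is a martingale with respect to a driving process $W$ if $\mathbb{E}[\rho(W_t)(M_{t+\Delta t} - M_t)] = 0$ for every bounded test function $\rho$, so the largest such moment over a family of test functions measures how far $M$ is from being a martingale. The sketch below is not the SOC-MartNet loss itself: it assumes a one-dimensional Brownian motion, replaces the adversarial network with three hand-picked test functions, and uses hypothetical names (`simulate_pairs`, `TEST_FUNCTIONS`, `worst_case_defect`) chosen only for this illustration.

```python
import math
import random


def simulate_pairs(n_paths=100_000, t=1.0, dt=0.1, seed=0):
    """Sample (W_t, W_{t+dt}) pairs for a standard Brownian motion W."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n_paths):
        w_t = rng.gauss(0.0, math.sqrt(t))
        w_next = w_t + rng.gauss(0.0, math.sqrt(dt))
        pairs.append((w_t, w_next))
    return pairs


# Assumption: a small fixed family of test functions stands in for the
# trained adversarial network described in the abstract.
TEST_FUNCTIONS = (lambda x: 1.0, math.cos, math.tanh)


def worst_case_defect(process, t=1.0, dt=0.1, pairs=None):
    """Estimate max over rho of |E[rho(W_t) * (M_{t+dt} - M_t)]|.

    `process(s, w)` returns the value M_s of the candidate process when W_s = w.
    """
    if pairs is None:
        pairs = simulate_pairs(t=t, dt=dt)
    defects = []
    for rho in TEST_FUNCTIONS:
        total = sum(
            rho(w0) * (process(t + dt, w1) - process(t, w0)) for w0, w1 in pairs
        )
        defects.append(abs(total / len(pairs)))
    return max(defects)


if __name__ == "__main__":
    # W_t^2 - t is a martingale; W_t^2 alone has drift dt over each step.
    print("W^2 - t:", worst_case_defect(lambda s, w: w * w - s))
    print("W^2    :", worst_case_defect(lambda s, w: w * w))
```

Under these assumptions, the estimate for $W_t^2 - t$ should be close to zero up to Monte Carlo error, while the estimate for $W_t^2$ should be close to $\Delta t$, since $\mathbb{E}[W_{t+\Delta t}^2 - W_t^2 \mid W_t] = \Delta t$. In an adversarial setting the maximization over a fixed family would be replaced by training a network to play the role of $\rho$.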