In stochastic gradient descent (SGD) for sequential simulations such as neural stochastic differential equations, the Multilevel Monte Carlo (MLMC) method is known to offer better theoretical computational complexity than the naive Monte Carlo approach. In practice, however, MLMC scales poorly on massively parallel computing platforms such as modern GPUs because its parallel complexity is large: it is equivalent to that of the naive Monte Carlo method. To address this issue, we propose the delayed MLMC gradient estimator, which drastically reduces the parallel complexity of MLMC by recycling gradient components computed in earlier steps of SGD. The proposed estimator provably reduces the average parallel complexity per iteration at the cost of a slightly worse per-iteration convergence rate. In our numerical experiments, we use a deep hedging example to demonstrate the superior parallel complexity of our method compared to standard MLMC in SGD.
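The recycling idea can be illustrated with a minimal sketch. This is not the estimator as specified in the paper: the refresh schedule (level l is recomputed every 2**l iterations), the function names `delayed_mlmc_sgd` and `toy_grad_level`, and the quadratic toy objective are all assumptions chosen for illustration. The only point is that coarse-level gradient components are recomputed at every step while finer-level components are reused from earlier iterations.

```python
import numpy as np


def delayed_mlmc_sgd(grad_level, theta0, num_levels, num_steps, lr):
    """SGD in which the level-l gradient component is refreshed every 2**l steps.

    grad_level(level, theta) is a hypothetical user-supplied callable returning
    an estimate of the gradient of the level-`level` MLMC correction term.
    The 2**l refresh period is an assumption made for this sketch.
    """
    theta = np.asarray(theta0, dtype=float).copy()
    cached = [np.zeros_like(theta) for _ in range(num_levels)]
    refreshes = [0] * num_levels  # how often each level was recomputed
    for t in range(num_steps):
        for level in range(num_levels):
            # Every level is refreshed at t = 0; afterwards, finer levels
            # are refreshed less often and their stale values are reused.
            if t % (2 ** level) == 0:
                cached[level] = grad_level(level, theta)
                refreshes[level] += 1
        # The MLMC gradient is the telescoping sum of the level components.
        theta = theta - lr * sum(cached)
    return theta, refreshes


# Toy problem (assumed): level corrections shrink like 2**(-level) and
# every level pulls theta toward the same target, with small noise.
rng = np.random.default_rng(0)
target = np.array([1.0, -2.0])


def toy_grad_level(level, theta):
    noise = 0.01 * rng.standard_normal(theta.shape)
    return 2.0 ** (-level) * (theta - target + noise)


theta, refreshes = delayed_mlmc_sgd(
    toy_grad_level, np.zeros(2), num_levels=3, num_steps=200, lr=0.1
)
print(theta, refreshes)
```

In this sketch the finest level is evaluated least often, which is where the saving in average per-iteration cost would come from; the price is that some components of each update are computed at stale parameter values, in line with the slightly worse per-iteration convergence rate stated above.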