In stochastic gradient descent (SGD) for sequential simulations, such as neural stochastic differential equations, the Multilevel Monte Carlo (MLMC) method is known to offer better theoretical computational complexity than the naive Monte Carlo approach. In practice, however, MLMC scales poorly on massively parallel computing platforms such as modern GPUs, because its parallel complexity is equivalent to that of the naive Monte Carlo method. To address this issue, we propose the delayed MLMC gradient estimator, which drastically reduces the parallel complexity of MLMC by recycling gradient components computed at earlier steps of SGD. The proposed estimator provably reduces the average parallel complexity per iteration at the cost of a slightly worse per-iteration convergence rate. In our numerical experiments, we use deep hedging as an example to demonstrate that our method achieves lower parallel complexity than standard MLMC in SGD.
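The recycling idea can be illustrated with a small toy sketch. Everything concrete below is an assumption made for illustration only: the geometric Brownian motion model, the quadratic loss, the per-level sample counts, and especially the refresh schedule (level `l` recomputed only every `2**l` iterations, otherwise reusing its cached, stale correction), which is a hypothetical choice and not necessarily the schedule analyzed in the paper. Parallel complexity is approximated here by the number of sequential Euler steps in the finest level refreshed at each iteration.

```python
import math
import random

# Hypothetical toy model: S follows geometric Brownian motion
# dS = MU * S dt + SIGMA * S dW on [0, 1], discretized with Euler steps.
MU, SIGMA, S0 = 0.05, 0.2, 1.0


def coupled_paths(level, rng):
    """Terminal values (fine, coarse) driven by one shared Brownian path.

    The fine path uses 2**level Euler steps; the coarse path uses half as
    many, built from pairwise sums of the same increments. At level 0
    there is no coarse path, so None is returned in its place.
    """
    n = 2 ** level
    h = 1.0 / n
    dw = [rng.gauss(0.0, math.sqrt(h)) for _ in range(n)]
    fine = S0
    for w in dw:
        fine += MU * fine * h + SIGMA * fine * w
    if level == 0:
        return fine, None
    coarse = S0
    for i in range(0, n, 2):
        coarse += MU * coarse * 2 * h + SIGMA * coarse * (dw[i] + dw[i + 1])
    return fine, coarse


def grad_sample(theta, s):
    # Per-sample gradient of the hypothetical loss 0.5 * (theta * s - 1) ** 2.
    return s * (theta * s - 1.0)


def level_correction(theta, level, rng):
    """MLMC level term: mean of g(fine) - g(coarse), or mean of g at level 0."""
    n_samples = max(8, 64 // 2 ** level)  # fewer samples on finer levels
    total = 0.0
    for _ in range(n_samples):
        fine, coarse = coupled_paths(level, rng)
        total += grad_sample(theta, fine)
        if coarse is not None:
            total -= grad_sample(theta, coarse)
    return total / n_samples


def delayed_mlmc_sgd(theta, max_level, steps, lr, rng):
    """SGD in which level lvl is recomputed only every 2**lvl iterations.

    Between refreshes, the cached correction computed at an earlier theta
    is reused. Returns the final theta and, per iteration, the length of
    the longest sequential simulation run (a proxy for parallel depth).
    """
    cache = [0.0] * (max_level + 1)  # last computed correction per level
    depth = []
    for t in range(steps):
        refreshed = [lvl for lvl in range(max_level + 1) if t % 2 ** lvl == 0]
        for lvl in refreshed:
            cache[lvl] = level_correction(theta, lvl, rng)
        depth.append(2 ** max(refreshed))
        theta -= lr * sum(cache)  # telescoping sum of (partly stale) levels
    return theta, depth
```

For example, `delayed_mlmc_sgd(0.5, max_level=3, steps=8, lr=0.1, rng=random.Random(0))` yields a per-iteration depth of `[8, 1, 2, 1, 4, 1, 2, 1]`, an average of 2.5 Euler steps, whereas standard MLMC refreshes every level at every iteration and always needs 8. The price is that the finer corrections are evaluated at older parameter values, which is the source of the slightly worse per-iteration convergence described above.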