Physics-informed neural networks (PINNs) are infamous for being hard to train. Recently, second-order methods based on natural gradient and Gauss-Newton methods have shown promising performance, improving the accuracy achieved by first-order methods by several orders of magnitude. While promising, the proposed methods only scale to networks with a few thousand parameters due to the high computational cost of evaluating, storing, and inverting the curvature matrix. We propose Kronecker-factored approximate curvature (KFAC) for PINN losses that greatly reduces the computational cost and allows scaling to much larger networks. Our approach goes beyond the established KFAC for traditional deep learning problems as it captures contributions from a PDE's differential operator that are crucial for optimization. To establish KFAC for such losses, we use Taylor-mode automatic differentiation to describe the differential operator's computation graph as a forward network with shared weights. This allows us to apply KFAC thanks to a recently developed general formulation for networks with weight sharing. Empirically, we find that our KFAC-based optimizers are competitive with expensive second-order methods on small problems, scale more favorably to higher-dimensional neural networks and PDEs, and consistently outperform first-order methods and LBFGS.
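To make the Taylor-mode idea concrete, the following is a minimal illustrative sketch (not the authors' implementation) of how a PDE differential operator such as the Laplacian can be evaluated by a forward pass that propagates value, first-, and second-derivative streams through a small MLP. The helper names (`init_mlp`, `taylor_laplacian`) are hypothetical; the point is that each linear layer applies the same weight matrix to all Taylor coefficients, which is the weight-sharing structure that the proposed KFAC variant exploits.

```python
# Hedged sketch: Taylor-mode forward propagation of (value, 1st, 2nd derivative)
# through an MLP to obtain the Laplacian. Each layer's weight W acts on all
# three streams, i.e. the operator's computation graph is a forward net with
# shared weights. All helper names here are illustrative assumptions.
import jax
import jax.numpy as jnp

def init_mlp(key, sizes):
    """Random tanh-MLP parameters (hypothetical helper)."""
    params = []
    for din, dout in zip(sizes[:-1], sizes[1:]):
        key, k = jax.random.split(key)
        params.append((jax.random.normal(k, (dout, din)) / jnp.sqrt(din),
                       jnp.zeros(dout)))
    return params

def mlp(params, x):
    """Scalar network output u(x)."""
    for W, b in params[:-1]:
        x = jnp.tanh(W @ x + b)
    W, b = params[-1]
    return (W @ x + b)[0]

def taylor_laplacian(params, x):
    """Sum of second directional derivatives along each coordinate e_i,
    computed with a single forward (Taylor-mode) pass per direction."""
    d, lap = x.shape[0], 0.0
    for i in range(d):
        z, dz, ddz = x, jnp.zeros(d).at[i].set(1.0), jnp.zeros(d)
        for ell, (W, b) in enumerate(params):
            # The SAME weight W is applied to value, 1st- and 2nd-derivative
            # coefficients -> weight sharing across the Taylor streams.
            z_lin, dz_lin, ddz_lin = W @ z + b, W @ dz, W @ ddz
            if ell < len(params) - 1:           # tanh on hidden layers
                s = jnp.tanh(z_lin)
                s1 = 1.0 - s**2                 # tanh'
                s2 = -2.0 * s * s1              # tanh''
                z, dz, ddz = s, s1 * dz_lin, s2 * dz_lin**2 + s1 * ddz_lin
            else:
                z, dz, ddz = z_lin, dz_lin, ddz_lin
        lap += ddz[0]
    return lap

x = jnp.array([0.3, -0.7])
params = init_mlp(jax.random.PRNGKey(0), [2, 16, 16, 1])
# Cross-check against the Hessian trace from standard autodiff.
print(taylor_laplacian(params, x),
      jnp.trace(jax.hessian(lambda y: mlp(params, y))(x)))
```

In this view, the derivative streams play the role of additional "shared-weight" evaluations of each layer, which is what allows a KFAC-style Kronecker factorization (in the spirit of KFAC for weight-sharing architectures) to be applied to the curvature of PINN losses.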