Understanding Loss Landscapes of Neural Network Models in Solving Partial Differential Equations

Solving partial differential equations (PDEs) by parametrizing their solutions with neural networks (NNs) has become popular in the past few years. However, different types of loss functions can be proposed for the same PDE. For the Poisson equation, the loss function can be based on the variational (energy) formulation or on the least squares method, which leads to the deep Ritz model and the deep Galerkin model, respectively. The loss landscapes of these different models give rise to different practical performance when training the NN parameters. To investigate and understand these practical differences, we propose to compare the loss landscapes of these models, which are both high-dimensional and highly non-convex. In such settings, roughness describes the non-convexity better than traditional eigenvalue analysis does. We contribute to the landscape comparison by proposing a roughness index that quantitatively describes the heuristic notion of the "roughness" of a landscape around minimizers. This index is based on random projections and on the variance of the (normalized) total variation of the projected one-dimensional functions, and it is efficient to compute. A large roughness index hints at an oscillatory landscape profile, which poses a severe challenge for first-order optimization methods. We apply this index to the two models for the Poisson equation, and our empirical results consistently show that the landscapes of the deep Galerkin method around its local minimizers are less rough than those of the deep Ritz method, which supports the observed gain in accuracy of the deep Galerkin method.
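To make the two loss functions concrete, the block below writes them in a commonly used form for the Poisson problem $-\Delta u = f$ in $\Omega$ with homogeneous Dirichlet data. This is a sketch under assumptions: $u_\theta$ denotes the NN with parameters $\theta$, and the boundary condition is assumed to be enforced by a penalty term with a hypothetical weight $\beta > 0$; the boundary treatment and weighting used in the paper may differ.

```latex
% Deep Ritz: minimize the energy functional of the Poisson problem
\mathcal{L}_{\mathrm{Ritz}}(\theta)
  = \int_{\Omega} \Big( \tfrac{1}{2} \lvert \nabla u_\theta(x) \rvert^{2} - f(x)\, u_\theta(x) \Big)\, dx
  + \beta \int_{\partial\Omega} u_\theta(x)^{2}\, ds

% Deep Galerkin: minimize the squared PDE residual (least squares)
\mathcal{L}_{\mathrm{Galerkin}}(\theta)
  = \int_{\Omega} \big( \Delta u_\theta(x) + f(x) \big)^{2}\, dx
  + \beta \int_{\partial\Omega} u_\theta(x)^{2}\, ds
```

Both losses target the same solution, but the Ritz loss involves only first derivatives of $u_\theta$, while the least squares loss involves the Laplacian, so the two induce different landscapes over $\theta$.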
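The roughness index described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the random directions are assumed to be unit-norm Gaussian vectors, each one-dimensional profile is sampled on a symmetric interval `[-radius, radius]` around the point `theta_star`, and the total variation is assumed to be normalized by the range (max minus min) of each profile. The function names `normalized_total_variation` and `roughness_index` and all default parameter values are hypothetical.

```python
import numpy as np


def normalized_total_variation(values):
    # Hypothetical normalization: total variation of the sampled 1D profile
    # divided by its range (max - min), making the result scale-invariant.
    values = np.asarray(values, dtype=float)
    tv = float(np.sum(np.abs(np.diff(values))))
    spread = float(values.max() - values.min())
    if spread == 0.0:
        return 0.0  # a constant profile has no variation
    return tv / spread


def roughness_index(loss, theta_star, num_directions=50, radius=1.0,
                    num_points=101, seed=0):
    # Sample random unit directions d, evaluate the 1D profile
    # t -> loss(theta_star + t * d) on [-radius, radius], and return the
    # variance of the normalized total variation across directions.
    rng = np.random.default_rng(seed)
    theta_star = np.asarray(theta_star, dtype=float)
    ts = np.linspace(-radius, radius, num_points)
    tvs = []
    for _ in range(num_directions):
        d = rng.standard_normal(theta_star.shape)
        d /= np.linalg.norm(d)  # unit-norm random direction (assumption)
        profile = [loss(theta_star + t * d) for t in ts]
        tvs.append(normalized_total_variation(profile))
    return float(np.var(tvs))


if __name__ == "__main__":
    # Toy comparison: an isotropic quadratic bowl versus the same bowl with a
    # high-frequency ripple added (both are hypothetical test losses).
    smooth = lambda th: float(np.sum(th ** 2))
    rough = lambda th: float(np.sum(th ** 2) + 0.1 * np.sum(np.sin(50.0 * th)))
    theta0 = np.zeros(10)
    print("smooth:", roughness_index(smooth, theta0))
    print("rough: ", roughness_index(rough, theta0))
```

For the isotropic bowl every direction produces the same profile, so the normalized total variation is the same for all directions and the index is essentially zero; adding the ripple makes the profiles, and hence their total variation, depend on the direction, which yields a positive index. Each direction costs `num_points` loss evaluations, which is what keeps the index cheap compared with Hessian-based analysis in high dimensions.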
