Visualizing the loss landscapes of physics-informed neural networks

Training a neural network requires navigating a high-dimensional, non-convex loss surface to find parameters that minimize the loss. In many ways, it is surprising that optimizers such as stochastic gradient descent and Adam can reliably locate minima that perform well on both training and test data. To understand why training succeeds, a "loss landscape" community has emerged to study the geometry of the loss function and the dynamics of optimization, often using visualization techniques. However, these studies have mostly been limited to machine learning for image classification. In the newer field of physics-informed machine learning, little work has been done to visualize the landscapes of losses defined not by regression to large data sets, but by differential operators acting on state fields discretized by neural networks. In this work, we provide a comprehensive review of the loss landscape literature, as well as a discussion of the few existing physics-informed works that investigate the loss landscape. We then use a number of the surveyed techniques to empirically investigate the landscapes defined by the Deep Ritz and squared residual forms of the physics loss function. We find that the loss landscapes of physics-informed neural networks have many of the same properties as the data-driven classification problems studied in the literature. Unexpectedly, we find that the two formulations of the physics loss often give rise to similar landscapes, which appear smooth, well-conditioned, and convex in the vicinity of the solution. The purpose of this work is to introduce the loss landscape perspective to the scientific machine learning community, to compare the Deep Ritz and strong form losses, and to challenge prevailing intuitions about the complexity of the loss landscapes of physics-informed networks.
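To make the two loss forms named above concrete, the following is a minimal sketch that evaluates both on a grid of parameters. It is not the setup used in this work. The test problem (a 1D Poisson equation with a sine source), the two-parameter sine ansatz standing in for a neural network, the midpoint quadrature, and all function names are assumptions chosen for illustration. Because the ansatz is linear in its parameters, both losses are exactly quadratic here, a guarantee a neural network parameterization does not have.

```python
import math

# Toy problem (assumed for illustration): -u''(x) = f(x) on (0, 1),
# u(0) = u(1) = 0, with f(x) = pi^2 sin(pi x). Exact solution: u(x) = sin(pi x).
N_QUAD = 200
XS = [(i + 0.5) / N_QUAD for i in range(N_QUAD)]  # midpoint quadrature nodes


def f(x):
    return math.pi ** 2 * math.sin(math.pi * x)


# Hypothetical two-parameter ansatz u = a sin(pi x) + b sin(2 pi x).
# It satisfies the boundary conditions for every (a, b).
def u(a, b, x):
    return a * math.sin(math.pi * x) + b * math.sin(2 * math.pi * x)


def u_dx(a, b, x):
    return (a * math.pi * math.cos(math.pi * x)
            + 2 * b * math.pi * math.cos(2 * math.pi * x))


def u_dxx(a, b, x):
    return (-a * math.pi ** 2 * math.sin(math.pi * x)
            - 4 * b * math.pi ** 2 * math.sin(2 * math.pi * x))


def strong_form_loss(a, b):
    """Squared residual form: integral of (u'' + f)^2 over (0, 1)."""
    return sum((u_dxx(a, b, x) + f(x)) ** 2 for x in XS) / N_QUAD


def deep_ritz_loss(a, b):
    """Deep Ritz (energy) form: integral of (0.5 * u'^2 - f * u) over (0, 1)."""
    return sum(0.5 * u_dx(a, b, x) ** 2 - f(x) * u(a, b, x) for x in XS) / N_QUAD


def landscape(loss, a_vals, b_vals):
    """Evaluate a loss on a 2D parameter grid; rows index b, columns index a."""
    return [[loss(a, b) for a in a_vals] for b in b_vals]


def argmin_on_grid(grid, a_vals, b_vals):
    """Return the (a, b) grid point with the smallest loss value."""
    best = min((grid[j][i], i, j)
               for j in range(len(b_vals)) for i in range(len(a_vals)))
    return a_vals[best[1]], b_vals[best[2]]


a_vals = [i / 10 for i in range(21)]       # a in [0, 2]
b_vals = [-1 + i / 10 for i in range(21)]  # b in [-1, 1]

strong_grid = landscape(strong_form_loss, a_vals, b_vals)
ritz_grid = landscape(deep_ritz_loss, a_vals, b_vals)

print("strong-form minimizer:", argmin_on_grid(strong_grid, a_vals, b_vals))
print("Deep Ritz minimizer:  ", argmin_on_grid(ritz_grid, a_vals, b_vals))
```

The two grids can be passed to any contour-plotting routine to compare the surfaces side by side. In this toy setting both losses share the same minimizer but differ in minimum value and curvature: the strong form loss vanishes at the solution, while the Deep Ritz energy is negative there.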
