The Gauss-Newton (GN) matrix plays an important role in machine learning, most notably through its use as a preconditioner for a wide family of popular adaptive methods that speed up optimization. It also provides key insights into the optimization landscape of neural networks. In the context of deep neural networks, understanding the GN matrix requires studying the interaction between the different weight matrices as well as the dependencies introduced by the data, which makes its analysis challenging. In this work, we take a first step towards theoretically characterizing the conditioning of the GN matrix in neural networks. We establish tight bounds on the condition number of the GN matrix in deep linear networks of arbitrary depth and width, and we extend them to two-layer ReLU networks. We further expand the analysis to additional architectural components, such as residual connections and convolutional layers. Finally, we empirically validate the bounds and uncover valuable insights into the influence of the analyzed architectural components.
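To make the central object concrete, the following minimal NumPy sketch (ours, not taken from the paper) assembles the Gauss-Newton matrix G = JᵀJ of a deep linear network f(x) = W_L ⋯ W_1 x under squared-error loss and reports its condition number over the nonzero eigenvalues. The function name, the Kronecker-product Jacobian construction, and the tolerance-based cutoff for the rank-deficient part of the spectrum are illustrative choices on our part.

```python
import numpy as np

def gn_condition_number(Ws, X, tol=1e-10):
    """Condition number of the Gauss-Newton matrix G = J^T J of a deep
    linear network f(x) = W_L ... W_1 x under squared-error loss,
    taken over the nonzero eigenvalues (G is typically rank-deficient
    when the parameter count exceeds n_samples * d_out).

    Ws: list [W_1, ..., W_L] of weight matrices, applied in order.
    X : (n, d_in) data matrix, one sample per row.
    """
    L = len(Ws)
    d_out = Ws[-1].shape[0]
    n = X.shape[0]

    # Forward activations: A[l] has columns a_l = W_l ... W_1 x per sample.
    A = [X.T]
    for W in Ws:
        A.append(W @ A[-1])

    # Suffix products P[l] = W_L ... W_{l+1}, with P[L] = identity,
    # so that the network output is y = P[l] @ W_l @ a_{l-1}.
    P = [None] * (L + 1)
    P[L] = np.eye(d_out)
    for l in range(L - 1, 0, -1):
        P[l] = P[l + 1] @ Ws[l]

    # Jacobian of the stacked outputs w.r.t. vec(W_l), one block per layer:
    # dy/dvec(W_l) = a_{l-1}^T (kron) P_l  (column-major vec convention).
    blocks = []
    for l in range(1, L + 1):
        Jl = np.vstack(
            [np.kron(A[l - 1][:, i][None, :], P[l]) for i in range(n)]
        )
        blocks.append(Jl)
    J = np.hstack(blocks)  # shape: (n * d_out, total #parameters)

    G = J.T @ J
    eig = np.linalg.eigvalsh(G)          # ascending eigenvalues
    nz = eig[eig > tol * eig[-1]]        # drop the (numerically) zero part
    return nz[-1] / nz[0]

# Example: a two-layer linear network on random Gaussian data.
rng = np.random.default_rng(0)
Ws = [rng.standard_normal((8, 5)), rng.standard_normal((4, 8))]
X = rng.standard_normal((20, 5))
print(gn_condition_number(Ws, X))
```

Because the eigenvalues of JᵀJ are invariant to the ordering of the parameter vector, the particular vec convention used to build the Jacobian blocks does not affect the reported condition number.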