Model reparametrization, which follows the change-of-variable rule of calculus, is a popular way to improve the training of neural nets. But it can also be problematic, since it can induce inconsistencies in, e.g., Hessian-based flatness measures, optimization trajectories, and modes of probability densities. This complicates downstream analyses: for example, one cannot definitively relate flatness to generalization, since an arbitrary reparametrization changes their relationship. In this work, we study the invariance of neural nets under reparametrization from the perspective of Riemannian geometry. From this point of view, invariance is an inherent property of any neural net, provided that one explicitly represents the metric and uses the correct associated transformation rules. This is important because, although the metric is always present, it is often implicitly assumed to be the identity, dropped from the notation, and then lost under reparametrization. We discuss the implications for measuring the flatness of minima, for optimization, and for probability-density maximization. Finally, we explore some interesting directions where invariance is useful.
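The central claim, that invariance is recovered once the metric is written down and transformed along with the parameters, can be illustrated on a toy problem. The sketch below is not taken from the paper: the quadratic loss, the diagonal rescaling `theta[i] = c[i] * psi[i]`, and all variable names are assumptions chosen for illustration. It compares a Hessian-trace flatness measure and a single gradient-descent step in both parametrizations, once with the metric silently treated as the identity and once with the transformed metric `J^T G J`.

```python
# Toy quadratic loss L(theta) = 0.5 * sum_i a[i] * theta[i]**2 (illustrative assumption).
a = [1.0, 4.0]
# Hypothetical reparametrization theta[i] = c[i] * psi[i]; its Jacobian J is diag(c).
c = [10.0, 0.5]

def to_theta(psi):
    return [c_i * p for c_i, p in zip(c, psi)]

def grad_theta(theta):
    return [a_i * t for a_i, t in zip(a, theta)]

def grad_psi(psi):
    # Chain rule: dL/dpsi_i = c_i * dL/dtheta_i.
    return [c_i * g for c_i, g in zip(c, grad_theta(to_theta(psi)))]

# Hessians are diagonal here, so only the diagonals are stored.
hess_theta = list(a)                                  # H = diag(a)
hess_psi = [c_i ** 2 * a_i for c_i, a_i in zip(c, a)]  # J^T H J

# Metric: identity in theta-coordinates, J^T G J = diag(c^2) in psi-coordinates.
metric_theta = [1.0, 1.0]
metric_psi = [c_i ** 2 for c_i in c]
identity = [1.0, 1.0]

# Flatness as the plain Hessian trace: depends on the parametrization.
naive_trace_theta = sum(hess_theta)
naive_trace_psi = sum(hess_psi)

# Flatness as trace(G^{-1} H): the same number in both parametrizations.
invariant_trace_theta = sum(h / g for h, g in zip(hess_theta, metric_theta))
invariant_trace_psi = sum(h / g for h, g in zip(hess_psi, metric_psi))

def step(point, grad, metric, lr):
    # Gradient descent preconditioned by the inverse (diagonal) metric.
    return [p - lr * g / m for p, g, m in zip(point, grad, metric)]

lr = 0.1
theta0 = [1.0, 1.0]
psi0 = [t / c_i for t, c_i in zip(theta0, c)]  # the same point in psi-coordinates

theta1 = step(theta0, grad_theta(theta0), metric_theta, lr)
psi1_naive = step(psi0, grad_psi(psi0), identity, lr)     # metric dropped
psi1_metric = step(psi0, grad_psi(psi0), metric_psi, lr)  # metric transformed

print("plain Hessian trace:", naive_trace_theta, "vs", naive_trace_psi)
print("trace(G^-1 H):      ", invariant_trace_theta, "vs", invariant_trace_psi)
print("theta step:                 ", theta1)
print("psi step, identity metric:  ", to_theta(psi1_naive))
print("psi step, transformed metric:", to_theta(psi1_metric))
```

In this sketch the plain Hessian trace and the identity-metric update both change under the rescaling, whereas `trace(G^{-1} H)` and the metric-preconditioned update agree across the two parametrizations once mapped back to `theta`. The reparametrization here is linear and diagonal, so the agreement of the update step is exact; for a general nonlinear reparametrization one would expect it to hold only to first order in the step size.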