Model reparametrization, i.e., transforming the parameter space via a bijective differentiable map, is a popular way to improve the training of neural networks. But reparametrizations are also problematic, since they induce inconsistencies in, e.g., Hessian-based flatness measures, optimization trajectories, and modes of probability density functions. This complicates downstream analyses; for example, one cannot make a definitive statement about the connection between flatness and generalization. In this work, we study which quantities of neural nets are invariant under reparametrization from the perspective of Riemannian geometry. We show that this invariance is an inherent property of any neural net, as long as one acknowledges the assumptions about the metric, which is always present, albeit often implicitly, and uses the correct transformation rules under reparametrization. We discuss the implications for measuring the flatness of minima, for optimization, and for probability-density maximization, along with applications in studying the biases of optimizers and in Bayesian inference.
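To make the claim about Hessian-based flatness concrete, the following is a minimal illustrative sketch and is not taken from the paper. It assumes a toy quadratic loss with Hessian `A`, an identity metric `G` in the original coordinates, and a linear reparametrization `theta = J @ psi`; all of these names and values are hypothetical. The sketch compares the plain Hessian trace, which changes under the reparametrization, with the trace of the metric-corrected operator `G^{-1} H`, which does not change once the metric is transformed by the same rule as the Hessian.

```python
import numpy as np

# Hessian of a toy quadratic loss L(theta) = 0.5 * theta^T A theta
# (hypothetical values, chosen only for illustration).
A = np.array([[2.0, 0.5],
              [0.5, 1.0]])

# Metric in the original coordinates. The identity is assumed here,
# mirroring the common implicit choice.
G = np.eye(2)

# Linear bijective reparametrization theta = J @ psi (hypothetical values).
J = np.array([[3.0, 0.0],
              [1.0, 0.5]])

# Under this linear map, the Hessian and the metric both transform
# as (0, 2)-tensors: M -> J^T M J.
H_psi = J.T @ A @ J
G_psi = J.T @ G @ J

# Naive flatness measure: trace of the Hessian, ignoring the metric.
naive_theta = np.trace(A)
naive_psi = np.trace(H_psi)

# Metric-aware flatness measure: trace of G^{-1} H in each coordinate system.
aware_theta = np.trace(np.linalg.solve(G, A))
aware_psi = np.trace(np.linalg.solve(G_psi, H_psi))

print("naive trace:       ", naive_theta, "->", naive_psi)
print("metric-aware trace:", aware_theta, "->", aware_psi)
```

The naive trace differs between the two parametrizations, whereas the metric-aware trace agrees up to floating-point error, because `G_psi^{-1} H_psi = J^{-1} (G^{-1} A) J` is a similarity transform of `G^{-1} A`. The linear map is a simplifying assumption; for a nonlinear reparametrization the same tensor rule for the Hessian holds only at critical points.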