Despite recent algorithmic advances, we still lack principled ways to leverage the well-documented rescaling symmetries in ReLU neural network parameters. Although two suitably rescaled sets of weights implement the same function, their training dynamics can differ dramatically. To offer a fresh perspective on exploiting this phenomenon, we build on the recent path-lifting framework, which provides a compact factorization of ReLU networks. We introduce a geometrically motivated criterion for rescaling neural network parameters; minimizing it leads to a conditioning strategy that aligns a kernel in the path-lifting space with a chosen reference. We derive an efficient algorithm to perform this alignment. In the context of random network initialization, we analyze how the architecture and the initialization scale jointly affect the output of the proposed method. Numerical experiments illustrate its potential to speed up training.
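To make the rescaling symmetry concrete, the following minimal sketch checks it on a toy two-layer ReLU network. It is only an illustration of the symmetry itself and does not implement the path-lifting criterion or the alignment algorithm. The names `forward`, `rescale`, and `lambdas`, the layer sizes, and the loss 0.5 * ||y||^2 are assumptions chosen for the example.

```python
import random

def relu(v):
    return [max(0.0, t) for t in v]

def matvec(W, x):
    return [sum(w * t for w, t in zip(row, x)) for row in W]

def forward(W1, b1, W2, x):
    """Toy two-layer ReLU network: x -> W2 @ relu(W1 @ x + b1)."""
    hidden = relu([a + b for a, b in zip(matvec(W1, x), b1)])
    return matvec(W2, hidden), hidden

def rescale(W1, b1, W2, lambdas):
    """Multiply the incoming weights and bias of hidden neuron i by
    lambdas[i] > 0 and divide its outgoing weights by the same factor."""
    W1s = [[lam * w for w in row] for lam, row in zip(lambdas, W1)]
    b1s = [lam * b for lam, b in zip(lambdas, b1)]
    W2s = [[w / lam for w, lam in zip(row, lambdas)] for row in W2]
    return W1s, b1s, W2s

# Hypothetical sizes and random weights, chosen only for illustration.
rng = random.Random(0)
d_in, d_hidden, d_out = 3, 4, 2
W1 = [[rng.gauss(0, 1) for _ in range(d_in)] for _ in range(d_hidden)]
b1 = [rng.gauss(0, 1) for _ in range(d_hidden)]
W2 = [[rng.gauss(0, 1) for _ in range(d_hidden)] for _ in range(d_out)]
x = [rng.gauss(0, 1) for _ in range(d_in)]

lambdas = [0.1, 2.0, 10.0, 0.5]  # one positive factor per hidden neuron
W1s, b1s, W2s = rescale(W1, b1, W2, lambdas)

y, h = forward(W1, b1, W2, x)
ys, hs = forward(W1s, b1s, W2s, x)

# Same function: the outputs agree up to floating-point error.
max_output_gap = max(abs(a - b) for a, b in zip(y, ys))

# Different dynamics: for the loss 0.5 * ||y||^2, dL/dW2[j][i] = y[j] * h[i],
# so the gradient on the outgoing weights of neuron i is scaled by lambdas[i].
grad_W2 = [[yj * hi for hi in h] for yj in y]
grad_W2s = [[yj * hi for hi in hs] for yj in ys]
```

The invariance follows from the positive homogeneity of ReLU, relu(λz) = λ relu(z) for λ > 0. The gradients nevertheless differ between the two parameterizations, which is why the choice of rescaling can matter for gradient-based training.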