We propose automatic optimisation methods for the normalised parameters of neural networks that account for the geometry of the underlying matrix manifold. Layerwise weight normalisation with respect to the Frobenius norm is used to bound the Lipschitz constant and to improve gradient reliability, so that the trained networks are suitable for control applications. Our approach first initialises the network and normalises the data with respect to the $\ell^{2}$-$\ell^{2}$ gain of the initialised network. The proposed algorithms then update the parameters via the exponential map on high-dimensional spheres. Given an update direction, such as the negative Riemannian gradient, we propose two ways to determine the stepsize for descent. The first algorithm applies automatic differentiation to the objective function along the update curve defined on the combined manifold of spheres, so that directional second-order derivative information can be used without explicitly constructing the Hessian. The second algorithm uses the majorisation-minimisation framework with an architecture-aware majorisation for neural networks. Together, these developments remove the need for manual tuning and scheduling of the learning rate, providing an automated pipeline for optimising normalised neural networks.
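The update structure described in the abstract can be illustrated with a minimal sketch, assuming a single layer whose weight matrix is flattened into a vector of unit Frobenius norm, so that it lies on a unit sphere. The helper names (`riemannian_grad`, `exp_map`), the toy linear objective, and the fixed stepsize `t` are assumptions made for illustration only; in particular, the sketch does not implement either of the two stepsize rules proposed in the paper.

```python
import math


def dot(a, b):
    """Euclidean inner product of two flattened weight vectors."""
    return sum(x * y for x, y in zip(a, b))


def norm(a):
    """Euclidean norm, i.e. the Frobenius norm of the unflattened matrix."""
    return math.sqrt(dot(a, a))


def riemannian_grad(w, g):
    """Project the Euclidean gradient g onto the tangent space of the
    unit sphere at w (w is assumed to have unit norm)."""
    c = dot(w, g)
    return [gi - c * wi for gi, wi in zip(g, w)]


def exp_map(w, d, t):
    """Follow the geodesic from w in tangent direction d for stepsize t.
    The result stays on the unit sphere, so no re-normalisation is needed."""
    n = norm(d)
    if n == 0.0:
        return list(w)  # zero direction: stay where we are
    return [math.cos(t * n) * wi + math.sin(t * n) * di / n
            for wi, di in zip(w, d)]


# Toy example (hypothetical): minimise f(w) = -<a, w> over the unit sphere.
w = [0.6, 0.0, 0.0, 0.8]       # flattened 2x2 weight with Frobenius norm 1
a = [0.0, 1.0, 0.0, 0.0]
euclid_grad = [-ai for ai in a]  # gradient of f at w
d = [-x for x in riemannian_grad(w, euclid_grad)]  # negative Riemannian gradient
t = 0.5                        # fixed stepsize, assumed here for illustration
w_new = exp_map(w, d, t)
```

Because the update moves along a geodesic, `w_new` keeps unit norm by construction. Replacing the fixed `t` with a value derived from derivatives of `t -> f(exp_map(w, d, t))`, or from a majoriser of that function, is where the stepsize rules described above would enter.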