Very deep neural networks achieve state-of-the-art performance by extracting rich, hierarchical features. Yet training them via backpropagation is often hindered by vanishing or exploding gradients. Existing remedies, such as orthogonal or variance-preserving initialisation and residual architectures, enable more stable gradient propagation and the training of deeper models. In this work, we introduce a unified mathematical framework that describes a broad class of nonlinear feedforward and residual networks whose input-to-output Jacobian matrices are exactly orthogonal almost everywhere. This constraint enforces perfect dynamical isometry, allowing the resulting networks to train efficiently despite being very deep. Our formulation not only recovers standard architectures as particular cases but also yields new designs that match the trainability of residual networks without relying on conventional skip connections. We provide experimental evidence that perfect Jacobian orthogonality at initialisation is sufficient to stabilise training and achieve competitive performance. We compare this strategy to networks regularised to maintain Jacobian orthogonality and obtain comparable results. We further extend our analysis to a class of networks well approximated by those with orthogonal Jacobians and introduce networks whose Jacobians are partial isometries. These generalised models are shown to retain the favourable trainability properties.
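As a rough numerical illustration of the property referred to above (a minimal sketch under simplifying assumptions, not the construction introduced in this work), the snippet below composes randomly drawn orthogonal weight matrices into a deep linear map and checks that every singular value of the resulting input-to-output Jacobian equals one, which is the defining condition of perfect dynamical isometry. The width, depth, and seeds are arbitrary illustrative choices, and the nonlinear case treated in the paper is not reproduced here.

```python
# Minimal sketch: with orthogonal weight matrices and no nonlinearity, the
# input-to-output Jacobian of a deep network is the product of the weights
# and is therefore itself orthogonal, i.e. all its singular values equal 1
# (perfect dynamical isometry). Width, depth, and seeds are illustrative.
import numpy as np
from scipy.stats import ortho_group

width, depth = 64, 50
weights = [ortho_group.rvs(width, random_state=i) for i in range(depth)]

# Jacobian of the linear map x -> W_depth ... W_2 W_1 x.
jacobian = np.linalg.multi_dot(weights[::-1])

singular_values = np.linalg.svd(jacobian, compute_uv=False)
print(singular_values.min(), singular_values.max())  # both approximately 1.0
```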