Training neural networks via backpropagation is often hindered by vanishing or exploding gradients. In this work, we design architectures that mitigate these issues by analyzing and controlling the network Jacobian. We first provide a unified characterization of a class of networks with orthogonal Jacobians, which both includes known architectures and yields new trainable designs. We then introduce the relaxed notion of persistent subspace orthogonality, which applies to a broader class of networks whose Jacobians are isometries only on a non-trivial subspace. We propose practical mechanisms to enforce this condition and empirically show that it is necessary for sufficiently preserving gradient norms during backpropagation, thereby enabling the training of very deep networks. We support our theory with extensive experiments.
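To make the two conditions concrete, the following is a minimal formal sketch; the notation J, g, and S is ours and assumes a square layer Jacobian, so the paper's precise definitions may differ.

% Orthogonal Jacobian: J is an isometry on the whole space, so the
% backpropagated gradient J^T g keeps its norm exactly.
\[
J^\top J = J J^\top = I
\quad\Longrightarrow\quad
\lVert J^\top g \rVert_2 = \lVert g \rVert_2 \quad \text{for every gradient } g.
\]
% Persistent subspace orthogonality (relaxed): J is an isometry only
% on a non-trivial subspace S, so norms are preserved for vectors in S.
\[
\lVert J v \rVert_2 = \lVert v \rVert_2 \quad \text{for all } v \in S, \qquad S \neq \{0\}.
\]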