Herein, the topics of (natural) gradient descent, data decorrelation, and approximate methods for backpropagation are brought into dialogue. Natural gradient descent illuminates how gradient vectors, which point in directions of steepest descent, can be improved by accounting for the local curvature of the loss landscape. We extend this perspective and show that, to fully address the problem highlighted by natural gradients in neural networks, one must recognise that correlations in the inputs to any linear transformation, including the node activities at every layer of a neural network, induce a non-orthonormal relationship between the model's parameters. Addressing this requires decorrelating the inputs at each individual layer of the network. We review a range of methods proposed for decorrelating and whitening node outputs, and contribute a novel method particularly suited to distributed computing and computational neuroscience. Implementing decorrelation within multi-layer neural networks, we show not only that training via backpropagation is sped up significantly, but also that existing approximations of backpropagation, which have previously failed catastrophically, become performant once more. This offers a route forward for approximate gradient-descent methods that had been discarded, for training approaches on analogue and neuromorphic hardware, and potentially for insights into the efficacy and utility of decorrelation processes in the brain.
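To make the core operation concrete, the following is a minimal sketch of one standard way to decorrelate (here, fully whiten) the inputs to a layer via ZCA whitening; the specific matrix construction and the synthetic data are illustrative assumptions, not the paper's proposed method.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "layer inputs": 256 samples, 8 correlated features.
X = rng.normal(size=(256, 8)) @ rng.normal(size=(8, 8))

# ZCA whitening: transform the features so their covariance is (near) identity.
Xc = X - X.mean(axis=0)                      # center
cov = Xc.T @ Xc / (Xc.shape[0] - 1)          # sample covariance
eigvals, eigvecs = np.linalg.eigh(cov)       # cov is symmetric PSD
W = eigvecs @ np.diag(1.0 / np.sqrt(eigvals + 1e-12)) @ eigvecs.T
Xw = Xc @ W                                  # decorrelated inputs

cov_w = Xw.T @ Xw / (Xw.shape[0] - 1)
print(np.allclose(cov_w, np.eye(8), atol=1e-6))  # → True
```

Applied at every layer of a network, a transform of this kind removes the correlations between node activities that, as argued above, make the parameter directions non-orthonormal.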