The backpropagation algorithm remains the dominant and most successful method for training deep neural networks (DNNs). At the same time, training DNNs at scale comes at significant computational cost and therefore carries a high carbon footprint. Converging evidence suggests that input decorrelation may speed up deep learning. However, this has not yet translated into substantial improvements in training efficiency for large-scale DNNs, mainly because of the challenge of enforcing fast and stable network-wide decorrelation. Here, we show for the first time that much more efficient training of very deep neural networks using decorrelated backpropagation is feasible. To achieve this, we use a novel algorithm that induces network-wide input decorrelation with minimal computational overhead. By combining this algorithm with careful optimizations, we obtain a more than two-fold speed-up and higher test accuracy compared to backpropagation when training an 18-layer deep residual network. This demonstrates that decorrelation provides exciting prospects for efficient deep learning at scale.
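To make the notion of input decorrelation referred to above concrete, the sketch below shows one common way to decorrelate a layer's inputs: a learnable matrix R is applied to the inputs and updated so that the off-diagonal covariance of the transformed activations is driven toward zero. This is a minimal illustration under assumed settings (random data, a single layer, the chosen learning rate and step count are hypothetical), not the exact algorithm or optimizations used in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical layer input: a batch of correlated features.
batch_size, n_features = 256, 32
mixing = rng.standard_normal((n_features, n_features)) / np.sqrt(n_features)
x = rng.standard_normal((batch_size, n_features)) @ mixing  # correlated inputs

# Learnable decorrelation matrix R, initialised to the identity.
R = np.eye(n_features)
lr = 0.02  # hypothetical learning rate for this toy example


def off_diag_cov(z: np.ndarray) -> np.ndarray:
    """Empirical covariance of z with the diagonal removed."""
    cov = (z.T @ z) / z.shape[0]
    return cov - np.diag(np.diag(cov))


before = np.abs(off_diag_cov(x @ R.T)).mean()

for _ in range(500):
    z = x @ R.T                    # decorrelated activations z = R x
    # Update that shrinks the off-diagonal covariance of z toward zero.
    R -= lr * off_diag_cov(z) @ R

after = np.abs(off_diag_cov(x @ R.T)).mean()
print(f"mean |off-diagonal covariance|: before={before:.4f}, after={after:.4f}")
```

In a network-wide scheme, one such decorrelating transform would sit in front of each layer's weights and be updated alongside the regular backpropagation step; the overhead then depends on how cheaply these updates can be computed, which is the efficiency question the abstract addresses.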