The backpropagation algorithm remains the dominant and most successful method for training deep neural networks (DNNs). At the same time, training DNNs at scale comes at significant computational cost and therefore carries a high carbon footprint. Converging evidence suggests that input decorrelation may speed up deep learning. However, this has not yet translated into substantial improvements in training efficiency for large-scale DNNs, mainly because of the challenge of enforcing fast and stable network-wide decorrelation. Here, we show for the first time that much more efficient training of very deep neural networks using decorrelated backpropagation is feasible. To achieve this, we use a novel algorithm that induces network-wide input decorrelation with minimal computational overhead. By combining this algorithm with careful optimizations, we obtain a more than two-fold speed-up and higher test accuracy compared to backpropagation when training an 18-layer deep residual network. This demonstrates that decorrelation provides exciting prospects for efficient deep learning at scale.
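To make the notion of input decorrelation referred to above concrete, the sketch below shows one common way to decorrelate a layer's inputs: a learnable matrix R is applied to the inputs and updated so that the off-diagonal covariance of the transformed activations is driven toward zero. This is a minimal illustration under assumed settings (random data, a single layer, the chosen learning rate and step count are hypothetical), not the exact algorithm or optimizations used in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical layer input: a batch of correlated features.
batch_size, n_features = 256, 32
mixing = rng.standard_normal((n_features, n_features)) / np.sqrt(n_features)
x = rng.standard_normal((batch_size, n_features)) @ mixing  # correlated inputs

# Learnable decorrelation matrix R, initialised to the identity.
R = np.eye(n_features)
lr = 0.02  # hypothetical learning rate for this toy example


def off_diag_cov(z: np.ndarray) -> np.ndarray:
    """Empirical covariance of z with the diagonal removed."""
    cov = (z.T @ z) / z.shape[0]
    return cov - np.diag(np.diag(cov))


before = np.abs(off_diag_cov(x @ R.T)).mean()

for _ in range(500):
    z = x @ R.T                    # decorrelated activations z = R x
    # Update that shrinks the off-diagonal covariance of z toward zero.
    R -= lr * off_diag_cov(z) @ R

after = np.abs(off_diag_cov(x @ R.T)).mean()
print(f"mean |off-diagonal covariance|: before={before:.4f}, after={after:.4f}")
```

In a network-wide scheme, one such decorrelating transform would sit in front of each layer's weights and be updated alongside the regular backpropagation step; the overhead then depends on how cheaply these updates can be computed, which is the efficiency question the abstract addresses.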