Scaling Supervised Local Learning with Augmented Auxiliary Networks

Deep neural networks are typically trained with global error signals that are backpropagated (BP) end-to-end, which is not only biologically implausible but also suffers from the update locking problem and consumes a large amount of memory. Local learning, which updates each layer independently through a gradient-isolated auxiliary network, offers a promising alternative that addresses these problems. However, existing local learning methods still show a large accuracy gap relative to their BP counterparts, particularly for large-scale networks. This gap arises from the weak coupling between local layers and their subsequent network layers, as there is no gradient communication across layers. To tackle this issue, we put forward an augmented local learning method, dubbed AugLocal. AugLocal constructs each hidden layer's auxiliary network by uniformly selecting a small subset of layers from its subsequent network layers to enhance their synergy. We also propose to linearly reduce the depth of auxiliary networks as the hidden layer goes deeper, ensuring sufficient network capacity while reducing the computational cost of auxiliary networks. Our extensive experiments on four image classification datasets (i.e., CIFAR-10, SVHN, STL-10, and ImageNet) demonstrate that AugLocal can effectively scale up to tens of local layers with accuracy comparable to BP-trained networks while reducing GPU memory usage by around 40%. The proposed AugLocal method therefore opens up a myriad of opportunities for training high-performance deep neural networks on resource-constrained platforms. Code is available at https://github.com/ChenxiangMA/AugLocal.
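The two construction rules named in the abstract (uniform selection from subsequent layers, and auxiliary depth that shrinks linearly with the hidden layer's position) can be illustrated with a small sketch. This is not the authors' implementation: the function names `auxiliary_depth` and `select_uniform`, the parameters `d_max` and `d_min`, the 1-based layer indexing, the rounding rule, and the choice to always include the final layer are all assumptions made for illustration.

```python
def auxiliary_depth(layer_idx, num_layers, d_max, d_min=1):
    """Hypothetical linear schedule: depth d_max at the first hidden layer,
    shrinking to d_min at the last hidden layer (index num_layers - 1)."""
    span = max(num_layers - 2, 1)          # number of steps between first and last hidden layer
    frac = (layer_idx - 1) / span          # 0.0 at the first layer, 1.0 at the last hidden layer
    depth = d_max - frac * (d_max - d_min)
    return max(d_min, int(depth + 0.5))    # round half up, never below d_min


def select_uniform(layer_idx, num_layers, depth):
    """Pick `depth` evenly spaced layer indices from the layers that follow
    `layer_idx`. Assumption: the final layer is always part of the selection."""
    subsequent = list(range(layer_idx + 1, num_layers + 1))
    depth = min(depth, len(subsequent))    # cannot pick more layers than exist
    if depth == 1:
        return [subsequent[-1]]
    step = (len(subsequent) - 1) / (depth - 1)
    return [subsequent[int(i * step + 0.5)] for i in range(depth)]


if __name__ == "__main__":
    num_layers, d_max = 10, 4
    for l in range(1, num_layers):
        d = auxiliary_depth(l, num_layers, d_max)
        print(l, d, select_uniform(l, num_layers, d))
```

Under these assumptions, early hidden layers receive deeper auxiliary networks that sample widely across the rest of the network, while the last hidden layers receive only the final layer, which is where the reduction in auxiliary computation would come from.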

