Deep neural networks are conventionally trained with end-to-end backpropagation, which lacks biological plausibility and suffers from update locking during parameter updates, leading to significant GPU memory consumption. Supervised local learning segments the network into multiple local blocks, each updated by an independent auxiliary network. However, these methods cannot replace end-to-end training due to lower accuracy: gradients propagate only within each local block, so no information is exchanged between blocks. To address this issue and establish information transfer across blocks, we propose a Momentum Auxiliary Network (MAN) that introduces a dynamic interaction mechanism. MAN leverages an exponential moving average (EMA) of the parameters of adjacent local blocks to enhance information flow. This auxiliary network, updated through EMA, helps bridge the informational gap between blocks. Nevertheless, we observe that directly applying EMA parameters has certain limitations due to feature discrepancies among local blocks. To overcome this, we introduce learnable biases, further boosting performance. We validate our method on four image classification datasets (CIFAR-10, STL-10, SVHN, ImageNet), attaining superior performance and substantial memory savings. Notably, our method reduces GPU memory usage by more than 45\% on ImageNet compared to end-to-end training, while achieving higher accuracy. The Momentum Auxiliary Network thus offers a new perspective on supervised local learning. Our code is available at: https://github.com/JunhaoSu0/MAN.
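The EMA update and learnable-bias idea described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the class name `MomentumAuxiliary`, the helper `ema_update`, the momentum value, and the placement of the bias are all assumptions made for clarity.

```python
import copy

import torch
import torch.nn as nn


def ema_update(aux_block: nn.Module, next_block: nn.Module,
               momentum: float = 0.999) -> None:
    """Move the auxiliary network's parameters toward the adjacent local
    block's parameters via an exponential moving average (no gradients)."""
    with torch.no_grad():
        for p_aux, p_next in zip(aux_block.parameters(),
                                 next_block.parameters()):
            p_aux.mul_(momentum).add_(p_next, alpha=1.0 - momentum)


class MomentumAuxiliary(nn.Module):
    """Hypothetical auxiliary head: an EMA copy of the next local block,
    plus a learnable bias intended to absorb the feature discrepancy
    between adjacent blocks."""

    def __init__(self, next_block: nn.Module, feature_dim: int):
        super().__init__()
        self.ema_block = copy.deepcopy(next_block)
        for p in self.ema_block.parameters():
            p.requires_grad_(False)  # updated only via ema_update, not SGD
        self.bias = nn.Parameter(torch.zeros(feature_dim))  # learnable bias

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.ema_block(x) + self.bias
```

In this sketch, after each training step one would call `ema_update(aux.ema_block, next_block)` so the auxiliary head tracks the downstream block's parameters, while only the bias (and the local block itself) receives gradient updates.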