Deep neural networks (DNNs) are typically trained with an end-to-end (E2E) paradigm, which presents several challenges: high GPU memory consumption, training inefficiency, and difficulty in model parallelization. Recent research has sought to address these issues, with one promising approach being local learning. This method partitions the backbone network into gradient-isolated modules and manually designs auxiliary networks to train each local module. Existing methods often neglect the exchange of information between local modules, leading to myopic behavior and a performance gap relative to E2E training. To address these limitations, we propose the Multilaminar Leap Augmented Auxiliary Network (MLAAN). Specifically, MLAAN comprises Multilaminar Local Modules (MLM) and Leap Augmented Modules (LAM). MLM captures both local and global features through independent and cascaded auxiliary networks, alleviating the performance degradation caused by insufficient global features. However, overly simplistic auxiliary networks can impede MLM's ability to capture global information. To address this, we further design LAM, an enhanced auxiliary network that uses the Exponential Moving Average (EMA) method to facilitate information exchange between local modules, thereby mitigating the shortsightedness that results from inadequate interaction. The synergy between MLM and LAM yields strong performance. Our experiments on the CIFAR-10, STL-10, SVHN, and ImageNet datasets show that MLAAN can be seamlessly integrated into existing local learning frameworks, significantly enhancing their performance and even surpassing E2E training, while also reducing GPU memory consumption.
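To make the EMA-based exchange concrete, the following is a minimal illustrative sketch, not the authors' implementation: it shows how a shared EMA copy of parameters could act as a slowly-moving summary that gradient-isolated local modules read from without backpropagating through one another. The function name `ema_update`, the decay value, and the toy scalar parameters are all assumptions for illustration.

```python
# Hypothetical sketch of EMA-based information sharing between
# gradient-isolated local modules (names and values are illustrative).

def ema_update(ema_params, local_params, decay=0.99):
    """Blend one local module's parameters into a shared EMA copy.

    The EMA copy changes slowly, so later modules can consult it as a
    stable summary of earlier modules' state, with no gradient flow
    across module boundaries.
    """
    return [decay * e + (1.0 - decay) * p
            for e, p in zip(ema_params, local_params)]

# Toy usage: two scalar "parameters", updated twice with decay 0.5.
shared = [0.0, 0.0]                           # EMA state visible to all modules
for step_params in ([1.0, 2.0], [1.0, 2.0]):  # updates from a local module
    shared = ema_update(shared, step_params, decay=0.5)
```

A higher decay (e.g. 0.99) would make the shared copy smoother and less sensitive to any single local update, which is the usual trade-off when EMA is used to stabilize cross-module signals.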