End-to-end (E2E) training approaches are commonly plagued by high memory consumption, low training efficiency, difficulty in model parallelization, and poor biological plausibility. Local learning has emerged as a promising alternative to E2E training, yet conventional local learning methods fall short of high model accuracy because of insufficient interaction between local modules. In this paper, we introduce a new method termed Scaling Supervised Local Learning with Multilaminar Leap Augmented Auxiliary Network (MLAAN). MLAAN couples an innovative supervised local learning scheme with a robust reinforcement module; this dual-component design allows MLAAN to integrate smoothly with established local learning techniques and thereby strengthen the underlying methods. Concretely, the method captures the model's local and global features separately by constructing an independent auxiliary network and a cascaded auxiliary network, and it incorporates a leap augmented module that counteracts the reduced learning capacity typically caused by weaker supervision. This architecture not only enriches the exchange of information among local modules but also mitigates the model's tendency toward short-sightedness. Experimental evaluations on four benchmark datasets, CIFAR-10, STL-10, SVHN, and ImageNet, demonstrate that integrating MLAAN into existing supervised local learning methods yields significant improvements over the original methodologies. Notably, MLAAN enables local learning methods to comprehensively outperform end-to-end training in peak performance while saving GPU memory.
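To make the setting concrete, the following is a minimal sketch of the supervised local-learning baseline that MLAAN builds upon: the network is split into sequential modules, each trained solely by its own auxiliary classifier, with `detach()` blocking gradient flow across module boundaries. The abstract does not specify the internals of MLAAN's independent/cascaded auxiliary networks or the leap augmented module, so those are omitted here; the module split, auxiliary-head shape, and hyperparameters below are illustrative assumptions, not the paper's exact configuration.

```python
# Minimal sketch of supervised local learning with per-module auxiliary
# classifiers (the baseline MLAAN augments). All architectural choices
# here are illustrative assumptions.
import torch
import torch.nn as nn

class LocalModule(nn.Module):
    """One locally trained block: a conv stage plus its own auxiliary head."""
    def __init__(self, in_ch, out_ch, num_classes):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 3, stride=2, padding=1),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
        )
        # Auxiliary network: provides a local supervised signal so no
        # gradient needs to cross module boundaries.
        self.aux_head = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
            nn.Linear(out_ch, num_classes),
        )

    def forward(self, x):
        h = self.block(x)
        return h, self.aux_head(h)

def local_training_step(modules, optimizers, x, y, criterion):
    """Each module updates from its own auxiliary loss; detach() severs
    the end-to-end gradient path between consecutive modules."""
    for module, opt in zip(modules, optimizers):
        h, logits = module(x)
        loss = criterion(logits, y)
        opt.zero_grad()
        loss.backward()
        opt.step()
        x = h.detach()  # next module sees features but receives no gradient
    return loss.item()

if __name__ == "__main__":
    modules = nn.ModuleList([
        LocalModule(3, 32, 10),
        LocalModule(32, 64, 10),
        LocalModule(64, 128, 10),
    ])
    optimizers = [torch.optim.SGD(m.parameters(), lr=0.1) for m in modules]
    x, y = torch.randn(8, 3, 32, 32), torch.randint(0, 10, (8,))
    print(local_training_step(modules, optimizers, x, y, nn.CrossEntropyLoss()))
```

Because each module's gradients stay local, activations for earlier modules need not be retained for a full backward pass, which is the source of the GPU-memory savings the abstract reports; MLAAN's contribution is to restore inter-module information exchange on top of this decoupled scheme.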