Current state-of-the-art deep networks are all trained with backpropagation. In this paper, we explore alternatives to full backpropagation in the form of blockwise learning rules, leveraging the latest developments in self-supervised learning. We show that a blockwise pretraining procedure, which independently trains the four main blocks of layers of a ResNet-50 with the Barlow Twins loss function at each block, performs almost as well as end-to-end backpropagation on ImageNet: a linear probe trained on top of our blockwise pretrained model obtains a top-1 classification accuracy of 70.48%, only 1.1 percentage points below that of an end-to-end pretrained network (71.57%). We perform extensive experiments to understand the impact of the different components of our method and explore a variety of adaptations of self-supervised learning to the blockwise paradigm, building an exhaustive understanding of the critical avenues for scaling local learning rules to large networks, with implications ranging from hardware design to neuroscience.
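As a rough illustration of the objective named above, the following is a minimal pure-Python sketch of the Barlow Twins loss for two batches of embeddings computed from two augmented views of the same inputs. It is a sketch under stated assumptions, not the implementation used in the paper: the function name `barlow_twins_loss`, the default trade-off weight `lam`, the stabilizer `eps`, and the list-of-rows input layout are all illustrative choices.

```python
import math


def barlow_twins_loss(z_a, z_b, lam=5e-3, eps=1e-12):
    """Barlow Twins loss for two batches of embeddings.

    z_a, z_b: lists of n rows with d features each (two views of one batch).
    lam: weight of the redundancy-reduction term (illustrative default).
    eps: small constant guarding against division by zero (assumption).
    """
    n, d = len(z_a), len(z_a[0])

    def standardize(z):
        # Normalize each feature to zero mean and unit variance over the batch.
        # Returns d columns, each a list of n values.
        cols = []
        for j in range(d):
            col = [row[j] for row in z]
            mean = sum(col) / n
            std = math.sqrt(sum((v - mean) ** 2 for v in col) / n)
            cols.append([(v - mean) / (std + eps) for v in col])
        return cols

    a, b = standardize(z_a), standardize(z_b)

    on_diag = 0.0   # invariance term: diagonal entries should be 1
    off_diag = 0.0  # redundancy term: off-diagonal entries should be 0
    for i in range(d):
        for j in range(d):
            # Cross-correlation between feature i of view A and feature j of view B.
            c_ij = sum(x * y for x, y in zip(a[i], b[j])) / n
            if i == j:
                on_diag += (1.0 - c_ij) ** 2
            else:
                off_diag += c_ij ** 2
    return on_diag + lam * off_diag
```

In a blockwise setup of the kind described, a loss of this form would be evaluated once per block on that block's output, with each block updated from its own loss only. How the block outputs are pooled or projected before the loss is applied is not specified here and is left out of the sketch.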