This work identifies a simple pre-training mechanism that leads to representations exhibiting better continual and transfer learning. This mechanism, the repeated resetting of the weights in the last layer, which we nickname "zapping", was originally designed for a meta-continual-learning procedure, yet we show that it is surprisingly applicable in many settings beyond both meta-learning and continual learning. In our experiments, we transfer a pre-trained image classifier to a new set of classes in a few shots. We show that our zapping procedure improves transfer accuracy, speeds up adaptation, or both, in standard fine-tuning as well as continual learning settings, while remaining simple to implement and computationally efficient. In many cases, combining zapping with sequential learning achieves performance on par with state-of-the-art meta-learning without the expensive higher-order gradients. An intuitive explanation for the effectiveness of zapping is that representations trained with repeated zapping learn features that can rapidly adapt to newly initialized classifiers. Such an approach may be considered a computationally cheaper form of, or an alternative to, meta-learning rapidly adaptable features with higher-order gradients. This adds to recent work on the usefulness of resetting neural network parameters during training, and invites further investigation of this mechanism.
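As a rough illustration of the mechanism described above, the following sketch shows what repeatedly resetting last-layer weights during pre-training might look like. It is a minimal sketch under stated assumptions, not the authors' implementation: the names `init_last_layer`, `zap`, `pretrain_with_zapping`, and `train_step`, the Gaussian initialization, the fixed `zap_every` schedule, and the option to reset only selected class rows are all hypothetical choices made for illustration.

```python
import random


def init_last_layer(num_classes, feat_dim, rng, scale=0.01):
    """Fresh last-layer weights: one row of small random values per class.

    The Gaussian initialization and `scale` are assumptions for this sketch.
    """
    return [[rng.gauss(0.0, scale) for _ in range(feat_dim)]
            for _ in range(num_classes)]


def zap(last_layer, rng, class_ids=None, scale=0.01):
    """Re-initialize last-layer rows in place.

    class_ids=None resets every row; otherwise only the listed rows.
    Earlier layers (the learned representation) are not touched.
    """
    rows = range(len(last_layer)) if class_ids is None else class_ids
    for c in rows:
        last_layer[c] = [rng.gauss(0.0, scale) for _ in last_layer[c]]
    return last_layer


def pretrain_with_zapping(num_steps, zap_every, num_classes, feat_dim,
                          train_step, seed=0):
    """Schematic pre-training loop that zaps the last layer periodically.

    `train_step(last_layer, step)` is a hypothetical callback standing in
    for one ordinary gradient update of the whole network.
    """
    rng = random.Random(seed)
    last_layer = init_last_layer(num_classes, feat_dim, rng)
    zap_count = 0
    for step in range(num_steps):
        if step % zap_every == 0:
            zap(last_layer, rng)  # forget the classifier, keep the features
            zap_count += 1
        train_step(last_layer, step)
    return last_layer, zap_count
```

In this sketch, only the classifier weights are discarded at each reset, so the earlier layers are repeatedly forced to support a freshly initialized classifier, which matches the intuition given in the abstract. The actual reset schedule and the choice of which weights to reset would follow the paper's procedure rather than the fixed interval assumed here.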