This paper introduces a new biologically inspired training method named Continual Learning through Adjustment Suppression and Sparsity Promotion (CLASSP). CLASSP is based on two main principles observed in neuroscience, particularly in the context of synaptic transmission and Long-Term Potentiation (LTP). The first principle is a decay rate on weight adjustments, implemented as a generalization of the AdaGrad optimization algorithm: weights that have received many updates receive lower learning rates, since they likely encode important information about previously seen data. However, this principle alone produces a diffuse distribution of updates throughout the model, because it favors updating weights that have not been updated before, whereas a sparse update distribution is preferable so that some weights remain unassigned for future tasks. The second principle therefore introduces a threshold on the loss gradient: a weight is updated only if the loss gradient with respect to that weight exceeds a certain threshold, i.e., only weights with a significant impact on the current loss are updated, which promotes sparse learning. Both principles mirror phenomena observed in LTP, namely a threshold effect and a gradual saturation of potentiation. CLASSP is implemented as a Python/PyTorch class, making it applicable to any model. When compared with Elastic Weight Consolidation (EWC) on computer vision and sentiment analysis datasets, CLASSP demonstrates superior performance in terms of accuracy and memory footprint.
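Since the reference implementation is not reproduced here, the following is a minimal sketch of how the two principles could be combined in a PyTorch optimizer. The class name `CLASSPSketch` and the hyperparameters `threshold`, `power`, and `eps` are illustrative assumptions, not the authors' API; `power=0.5` recovers an AdaGrad-style square-root decay.

```python
import torch
from torch.optim import Optimizer

class CLASSPSketch(Optimizer):
    """Illustrative sketch of the two CLASSP principles; a hypothetical
    reconstruction, not the authors' reference implementation."""

    def __init__(self, params, lr=0.01, threshold=1e-3, power=0.5, eps=1e-10):
        defaults = dict(lr=lr, threshold=threshold, power=power, eps=eps)
        super().__init__(params, defaults)

    @torch.no_grad()
    def step(self, closure=None):
        for group in self.param_groups:
            for p in group["params"]:
                if p.grad is None:
                    continue
                state = self.state[p]
                if "accum" not in state:
                    # Per-weight history of accumulated squared gradients.
                    state["accum"] = torch.zeros_like(p)
                accum = state["accum"]
                grad = p.grad
                # Principle 2 (sparsity promotion): update only weights whose
                # loss-gradient magnitude exceeds the threshold.
                mask = grad.abs() > group["threshold"]
                # Accumulate history only where an update actually occurs.
                accum.add_(torch.where(mask, grad.pow(2),
                                       torch.zeros_like(grad)))
                # Principle 1 (adjustment suppression): the effective learning
                # rate decays with the accumulated update history; power=0.5
                # reduces to AdaGrad's square-root decay.
                denom = accum.pow(group["power"]).add_(group["eps"])
                update = group["lr"] * grad / denom
                p.sub_(torch.where(mask, update, torch.zeros_like(update)))
```

A sketch like this would be used like any other PyTorch optimizer, e.g. `opt = CLASSPSketch(model.parameters(), lr=0.01)` inside the usual `loss.backward(); opt.step(); opt.zero_grad()` training loop.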