Our brains extract durable, generalizable knowledge from transient experiences of the world. Artificial neural networks come nowhere close to this ability. When tasked with learning to classify objects by training on non-repeating video frames in temporal order (online stream learning), models that learn well from shuffled datasets catastrophically forget old knowledge upon learning new stimuli. We propose a new continual learning algorithm, Compositional Replay Using Memory Blocks (CRUMB), which mitigates forgetting by replaying feature maps reconstructed by combining generic parts. CRUMB concatenates trainable and re-usable "memory block" vectors to compositionally reconstruct feature map tensors in convolutional neural networks. Storing the indices of memory blocks used to reconstruct new stimuli enables memories of the stimuli to be replayed during later tasks. This reconstruction mechanism also primes the neural network to minimize catastrophic forgetting by biasing it towards attending to information about object shapes more than information about image textures, and stabilizes the network during stream learning by providing a shared feature-level basis for all training examples. These properties allow CRUMB to outperform an otherwise identical algorithm that stores and replays raw images, while occupying only 3.6% as much memory. We stress-tested CRUMB alongside 13 competing methods on 7 challenging datasets. To address the limited number of existing online stream learning datasets, we introduce 2 new benchmarks by adapting existing datasets for stream learning. With only 3.7-4.1% as much memory and 15-43% as much runtime, CRUMB mitigates catastrophic forgetting more effectively than the state-of-the-art. Our code is available at https://github.com/MorganBDT/crumb.git.
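The reconstruction-and-replay idea described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation, and it rests on several assumptions that the abstract does not state: the feature vector at each spatial location is split into equal-length chunks, each chunk is replaced by its nearest memory block under squared Euclidean distance, and the memory blocks are fixed rather than trained. The names `encode`, `decode`, and `blocks` are hypothetical.

```python
import numpy as np


def encode(feature_map, blocks):
    """Map a (C, H, W) feature map to memory-block indices of shape (H, W, C // d).

    Assumption: nearest-block matching by squared Euclidean distance.
    """
    c, h, w = feature_map.shape
    n_blocks, d = blocks.shape
    if c % d != 0:
        raise ValueError("channel count must be a multiple of the block length")
    # Split the channel vector at every spatial location into chunks of length d.
    chunks = feature_map.transpose(1, 2, 0).reshape(-1, d)
    # Pairwise squared distances between chunks and memory blocks.
    dists = ((chunks[:, None, :] - blocks[None, :, :]) ** 2).sum(axis=2)
    return dists.argmin(axis=1).reshape(h, w, c // d)


def decode(indices, blocks):
    """Rebuild a (C, H, W) feature map by concatenating the indexed memory blocks."""
    h, w, k = indices.shape
    d = blocks.shape[1]
    # blocks[indices] has shape (H, W, k, d); merging the last two axes
    # concatenates the k blocks along the channel dimension.
    return blocks[indices].reshape(h, w, k * d).transpose(2, 0, 1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    blocks = rng.normal(size=(16, 8))           # 16 memory blocks of length 8
    fmap = rng.normal(size=(32, 4, 4))          # stand-in for a CNN feature map
    idx = encode(fmap, blocks).astype(np.uint8)  # only these indices are stored
    replayed = decode(idx, blocks)               # approximate feature map for replay
    print(idx.shape, replayed.shape, idx.nbytes, fmap.astype(np.float32).nbytes)
```

Only the integer indices need to be kept for each stored example. The shared `blocks` array is stored once and reused for every reconstruction, which is where the memory saving in a scheme of this kind comes from.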