Our brains extract durable, generalizable knowledge from transient experiences of the world. Artificial neural networks come nowhere close: when tasked with learning to classify objects by training on non-repeating video frames in temporal order (online stream learning), models that learn well from shuffled datasets catastrophically forget old knowledge upon learning new stimuli. We propose a new continual learning algorithm, Compositional Replay Using Memory Blocks (CRUMB), which mitigates forgetting by replaying feature maps reconstructed by recombining generic parts. Just as crumbs together form a loaf of bread, we concatenate trainable and re-usable "memory block" vectors to compositionally reconstruct feature map tensors in convolutional neural networks. CRUMB stores the indices of memory blocks used to reconstruct new stimuli, enabling replay of specific memories during later tasks. CRUMB's memory blocks are tuned to enhance replay: a single feature map stored, reconstructed, and replayed by CRUMB mitigates forgetting during video stream learning more effectively than an entire image, even though it occupies only 3.6% as much memory. We stress-tested CRUMB alongside 13 competing methods on 5 challenging datasets. To address the limited number of existing online stream learning datasets, we introduce 2 new benchmarks by adapting existing datasets for stream learning. With about 4% of the memory and 20% of the runtime, CRUMB mitigates catastrophic forgetting more effectively than the prior state-of-the-art. Our code is available at https://github.com/MorganBDT/crumb.git.
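To make the compositional reconstruction concrete, the following is a minimal NumPy sketch of the encode/decode step described above, not the authors' implementation. It assumes that each spatial location of a feature map is split along the channel axis into fixed-length chunks, and that each chunk is replaced by its nearest memory block under squared Euclidean distance. The abstract does not specify the similarity measure, the block length, or the number of blocks, so those choices, along with the names `encode`, `decode`, and `codebook`, are hypothetical. Training of the memory blocks is omitted.

```python
import numpy as np


def encode(feature_map, codebook):
    """Map a feature map to the indices of its nearest memory blocks.

    feature_map: array of shape (C, H, W).
    codebook:    array of shape (K, D) holding K memory blocks of length D,
                 where C must be divisible by D.
    Returns an integer array of shape (H, W, C // D).
    """
    c, h, w = feature_map.shape
    _, d = codebook.shape
    if c % d != 0:
        raise ValueError("channel count must be divisible by block length")
    # Split the channel vector at each spatial location into chunks of length D.
    chunks = feature_map.transpose(1, 2, 0).reshape(-1, d)
    # Squared Euclidean distance from every chunk to every memory block
    # (an assumed similarity measure, chosen here for simplicity).
    dists = ((chunks[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    return dists.argmin(axis=1).reshape(h, w, c // d)


def decode(indices, codebook):
    """Rebuild a (C, H, W) feature map by concatenating the indexed blocks."""
    h, w, n = indices.shape
    d = codebook.shape[1]
    blocks = codebook[indices]  # shape (H, W, n, D)
    return blocks.reshape(h, w, n * d).transpose(2, 0, 1)


# Illustrative sizes only; none of these values come from the paper.
rng = np.random.default_rng(0)
codebook = rng.normal(size=(16, 8))        # 16 memory blocks of length 8
feature_map = rng.normal(size=(32, 4, 4))  # C=32, H=4, W=4

indices = encode(feature_map, codebook)    # this is what gets stored
replayed = decode(indices, codebook)       # this is what gets replayed

print("stored integers:", indices.size)
print("original floats:", feature_map.size)
print("reconstruction shape:", replayed.shape)
```

Only the integer indices need to be stored per stimulus, since the shared memory blocks are reused across all stored feature maps. This is the source of the memory savings in this sketch, although the exact ratio depends on sizes that are not given here.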