Continual Learning (CL) is a highly relevant setting that has gained traction in recent machine learning research. Among CL works, architectural and hybrid strategies are particularly effective due to their potential to adapt the model architecture as new tasks are presented. However, many existing solutions do not efficiently exploit model sparsity and are prone to capacity saturation due to their inefficient use of available weights, which limits the number of learnable tasks. In this paper, we propose TinySubNets (TSN), a novel architectural CL strategy that addresses these issues through a unique combination of pruning with different sparsity levels, adaptive quantization, and weight sharing. Pruning identifies a subset of weights that preserve model performance, making less relevant weights available for future tasks. Adaptive quantization allows a single weight to be separated into multiple parts which can be assigned to different tasks. Weight sharing between tasks boosts the exploitation of capacity and task similarity, allowing for the identification of a better trade-off between model accuracy and capacity. These features allow TSN to efficiently leverage the available capacity, enhance knowledge transfer, and reduce computational resource consumption. Experimental results involving common benchmark CL datasets and scenarios show that our proposed strategy achieves higher accuracy than existing state-of-the-art CL strategies. Moreover, our strategy significantly improves model capacity exploitation. Code released at: https://github.com/lifelonglab/tinysubnets.
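The two mechanisms named above, magnitude-based pruning to free weights for future tasks and uniform quantization of the surviving weights, can be sketched as follows. This is a minimal illustrative sketch, not the TSN implementation; the function names and the choice of uniform quantization are assumptions for illustration.

```python
import numpy as np

def magnitude_prune(weights, sparsity):
    """Return a boolean mask keeping the largest-magnitude fraction
    (1 - sparsity) of weights; zeroed positions become free capacity
    that later tasks can claim."""
    k = int(np.ceil(sparsity * weights.size))
    if k == 0:
        return np.ones(weights.shape, dtype=bool)
    # k-th smallest magnitude acts as the pruning threshold
    threshold = np.partition(np.abs(weights).ravel(), k - 1)[k - 1]
    return np.abs(weights) > threshold

def quantize(weights, n_levels):
    """Uniformly quantize weights onto n_levels values over their range,
    so each stored value occupies fewer distinct states."""
    lo, hi = weights.min(), weights.max()
    if hi == lo:
        return weights.copy()
    step = (hi - lo) / (n_levels - 1)
    return lo + np.round((weights - lo) / step) * step

rng = np.random.default_rng(0)
w = rng.normal(size=(10, 10))
mask = magnitude_prune(w, sparsity=0.5)   # roughly half the weights survive
w_q = quantize(w * mask, n_levels=8)      # surviving weights on 8 levels
```

In TSN, such freed (masked-out) positions and the reduced per-weight representation are what allow several tasks to share one backbone without exhausting its capacity.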