Parameter regularization and parameter allocation methods are effective in overcoming catastrophic forgetting in lifelong learning. However, they treat all tasks in a sequence uniformly and ignore differences in learning difficulty across tasks. As a result, parameter regularization methods suffer significant forgetting when a new task differs greatly from previously learned tasks, and parameter allocation methods incur unnecessary parameter overhead when learning simple tasks. In this paper, we propose Parameter Allocation & Regularization (PAR), which adaptively selects an appropriate strategy for each task, either parameter allocation or regularization, based on its learning difficulty. A task is easy for a model that has already learned tasks related to it, and hard otherwise. We propose a divergence estimation method based on the Nearest-Prototype distance to measure task relatedness using only features of the new task. Moreover, we propose a time-efficient, relatedness-aware, sampling-based architecture search strategy to reduce the parameter overhead of allocation. Experimental results on multiple benchmarks demonstrate that, compared with state-of-the-art methods, our method is scalable and significantly reduces model redundancy while improving performance. Further qualitative analysis indicates that PAR obtains reasonable task relatedness.
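To make the selection idea concrete, the following is a minimal illustrative sketch, not the estimator or implementation used in PAR. It assumes each learned task is summarized by stored class prototypes (mean feature vectors), scores a new task by the average log-ratio between each new feature's distance to the nearest old prototype and its distance to the nearest prototype of the new task itself, and then picks a strategy with a threshold. The function names, the exact score, and the threshold value are all hypothetical choices made for illustration.

```python
import math
from statistics import fmean


def class_prototypes(features, labels):
    """Return one prototype (mean feature vector) per class label."""
    groups = {}
    for x, y in zip(features, labels):
        groups.setdefault(y, []).append(x)
    return [[fmean(col) for col in zip(*xs)] for xs in groups.values()]


def nearest_distance(x, prototypes):
    """Euclidean distance from x to its nearest prototype."""
    return min(math.dist(x, p) for p in prototypes)


def divergence_score(new_features, new_labels, old_prototypes, eps=1e-12):
    """Hypothetical relatedness score: mean log(nu / rho), where rho is the
    distance to the nearest new-task prototype and nu is the distance to the
    nearest old-task prototype. Larger values suggest a less related task."""
    new_protos = class_prototypes(new_features, new_labels)
    log_ratios = []
    for x in new_features:
        rho = nearest_distance(x, new_protos)
        nu = nearest_distance(x, old_prototypes)
        log_ratios.append(math.log((nu + eps) / (rho + eps)))
    return fmean(log_ratios)


def choose_strategy(score, threshold=1.0):
    """Assumed rule: related (easy) tasks reuse parameters with regularization;
    unrelated (hard) tasks get newly allocated parameters."""
    return "regularization" if score < threshold else "allocation"


# Toy new task with two classes in a 2-D feature space.
new_features = [[0.0, 0.1], [0.0, -0.1], [1.0, 0.1], [1.0, -0.1]]
new_labels = [0, 0, 1, 1]

# An old task whose prototypes coincide with the new classes (related),
# and one whose single prototype lies far away (unrelated).
related = divergence_score(new_features, new_labels, [[0.0, 0.0], [1.0, 0.0]])
unrelated = divergence_score(new_features, new_labels, [[10.0, 10.0]])

print(choose_strategy(related), choose_strategy(unrelated))
```

In this toy setting the related task yields a score near zero and the distant task a clearly larger one, so the two cases fall on opposite sides of the assumed threshold. Only new-task samples are passed through the scoring function; old tasks enter solely through their stored prototypes.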