Continual learning (CL) is a fundamental topic in machine learning, where the goal is to train a model on continuously arriving data and tasks. Due to memory limits, we cannot store all of the historical data, and therefore confront the ``catastrophic forgetting'' problem: the performance on previous tasks can degrade substantially because the corresponding information is missing in later stages of training. Although a number of elegant methods have been proposed, catastrophic forgetting still cannot be well avoided in practice. In this paper, we study the problem from the gradient perspective, aiming to develop an effective algorithm that calibrates the gradient at each update step of the model; that is, our goal is to guide the model to update in the right direction when a large amount of historical data is unavailable. Our idea is partly inspired by the seminal stochastic variance reduction methods (e.g., SVRG and SAGA), which reduce the variance of gradient estimates in stochastic gradient descent algorithms. A further benefit is that our approach can serve as a general tool, which can be incorporated into several existing popular CL methods to achieve better performance. We also conduct a set of experiments on several benchmark datasets to evaluate the performance in practice.
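To make the variance-reduction inspiration concrete, the following is a minimal sketch of the standard SVRG update (not the paper's CL algorithm) on a toy least-squares objective: an outer loop takes a snapshot and computes the full gradient there, and the inner loop corrects each stochastic gradient with the snapshot's gradient, which shrinks the variance of the update direction. The objective, data sizes, and step size are illustrative assumptions.

```python
import numpy as np

# Illustrative SVRG sketch on F(w) = (1/n) * sum_i (x_i . w - y_i)^2.
# This is generic SVRG, not the gradient-calibration method of the paper.
rng = np.random.default_rng(0)
n, d = 200, 5
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true

def grad_i(w, i):
    # Gradient of the i-th loss term: 2 * (x_i . w - y_i) * x_i
    return 2.0 * (X[i] @ w - y[i]) * X[i]

def full_grad(w):
    # Full-batch gradient of F at w
    return 2.0 * X.T @ (X @ w - y) / n

w = np.zeros(d)
eta, epochs, m = 0.01, 30, n  # step size, outer loops, inner-loop length
for _ in range(epochs):
    w_snap = w.copy()        # snapshot point
    mu = full_grad(w_snap)   # full gradient at the snapshot
    for _ in range(m):
        i = rng.integers(n)
        # Variance-reduced stochastic gradient: unbiased estimate of the
        # full gradient, with variance vanishing as w, w_snap -> optimum.
        g = grad_i(w, i) - grad_i(w_snap, i) + mu
        w -= eta * g

print(float(np.linalg.norm(w - w_true)))
```

In the CL setting described above, an analogous correction term built from (limited) historical information would play the role of the snapshot gradient, calibrating each update so that it does not drift away from the previous tasks.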