Hessian Aware Low-Rank Weight Perturbation for Continual Learning

Continual learning aims to learn a sequence of tasks without forgetting the knowledge acquired from earlier ones. In this work, we propose the Hessian Aware Low-Rank Perturbation (HALRP) algorithm for continual learning. We model the parameter transitions across sequential tasks as weight matrix transformations and apply a low-rank approximation to the task-adaptive parameters in each layer of the neural network. We theoretically establish a quantitative relationship between the Hessian and the proposed low-rank approximation. The approximation ranks are then determined globally according to the marginal increment of the empirical loss, which is estimated from the layer-specific gradient and the low-rank approximation error. Furthermore, we control model capacity by pruning less important parameters to limit parameter growth. We conduct extensive experiments on various benchmarks, including a dataset with large-scale tasks, and compare against recent state-of-the-art methods to demonstrate the effectiveness and scalability of our method. Empirical results show that it performs better across these benchmarks, especially in robustness to task order and in mitigating forgetting. The source code is at https://github.com/lijiaqi/HALRP.
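To make the low-rank idea concrete, the following is a minimal NumPy sketch of approximating the difference between a task's weights and the base weights with a truncated SVD. It is an illustration only, not the HALRP implementation from the repository: the function name `low_rank_perturbation`, the matrix shapes, and the synthetic rank-4 weight change are all assumptions, and the sketch does not reproduce the paper's Hessian-aware, gradient-based rank selection. It only shows the approximation error that such a selection rule would take as input.

```python
import numpy as np


def low_rank_perturbation(w_base, w_task, rank):
    """Approximate (w_task - w_base) by a rank-`rank` product u_r @ v_r.

    Hypothetical helper for illustration; uses a plain truncated SVD.
    Returns the two factors and the Frobenius norm of the discarded part.
    """
    delta = w_task - w_base
    u, s, vt = np.linalg.svd(delta, full_matrices=False)
    u_r = u[:, :rank] * s[:rank]  # absorb singular values into the left factor
    v_r = vt[:rank, :]
    # For a truncated SVD, the Frobenius error equals the root of the sum
    # of the squared discarded singular values.
    residual = float(np.sqrt(np.sum(s[rank:] ** 2)))
    return u_r, v_r, residual


# Synthetic example (assumed shapes): the task changes the base weights
# by an exactly rank-4 matrix.
rng = np.random.default_rng(0)
w_base = rng.standard_normal((64, 32))
true_delta = rng.standard_normal((64, 4)) @ rng.standard_normal((4, 32))
w_task = w_base + true_delta

u_r, v_r, residual = low_rank_perturbation(w_base, w_task, rank=4)
w_approx = w_base + u_r @ v_r  # per-task weights rebuilt from the factors

# A smaller rank keeps fewer parameters but leaves a nonzero error.
u_2, v_2, residual_2 = low_rank_perturbation(w_base, w_task, rank=2)
```

In this sketch the per-task storage is `u_r.size + v_r.size` values rather than a full 64 x 32 matrix, which is the kind of trade-off between parameter growth and approximation error that a rank-selection rule has to balance.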

