Parameter-Efficient Fine-Tuning for Continual Learning: A Neural Tangent Kernel Perspective

Parameter-efficient fine-tuning for continual learning (PEFT-CL) has shown promise in adapting pre-trained models to sequential tasks while mitigating catastrophic forgetting. However, the mechanisms that dictate continual performance in this paradigm remain poorly understood. To address this, we undertake a rigorous analysis of PEFT-CL dynamics using Neural Tangent Kernel (NTK) theory and derive metrics relevant to continual scenarios. With NTK as a mathematical analysis tool, we recast the challenge of test-time forgetting as quantifiable generalization gaps during training, and we identify three key factors that influence these gaps and the performance of PEFT-CL: training sample size, task-level feature orthogonality, and regularization. Building on these findings, we introduce NTK-CL, a novel framework that eliminates task-specific parameter storage while adaptively generating task-relevant features. In line with the theoretical guidance, NTK-CL triples the feature representation of each sample, theoretically and empirically reducing the magnitude of both task-interplay and task-specific generalization gaps. Grounded in NTK analysis, our approach imposes an adaptive exponential moving average mechanism and constraints on task-level feature orthogonality, maintaining intra-task NTK forms while attenuating inter-task NTK forms. Finally, by fine-tuning the optimizable parameters with appropriate regularization, NTK-CL achieves state-of-the-art performance on established PEFT-CL benchmarks. This work provides a theoretical foundation for understanding and improving PEFT-CL models and offers insights into the interplay between feature representation, task orthogonality, and generalization, contributing to the development of more efficient continual learning systems.
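As a rough illustration of why task-level feature orthogonality can attenuate inter-task NTK terms, consider a minimal sketch that is not the NTK-CL implementation. It assumes a frozen backbone with a trainable linear head f(x) = w · φ(x), so the gradient with respect to w is φ(x) and the empirical NTK reduces to an inner product of features. The feature vectors and the function names below are hypothetical, chosen only for this example.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def head_gradient(phi):
    """Gradient of f(x) = w . phi(x) with respect to w, which is phi(x) itself.

    Assumption: only the linear head w is trainable; the backbone is frozen.
    """
    return list(phi)


def empirical_ntk(feats_a, feats_b):
    """Kernel matrix with entries K[i][j] = <grad f(a_i), grad f(b_j)>."""
    grads_a = [head_gradient(p) for p in feats_a]
    grads_b = [head_gradient(p) for p in feats_b]
    return [[dot(ga, gb) for gb in grads_b] for ga in grads_a]


# Hypothetical features: task 1 occupies the first two coordinates,
# task 2 the last two, so the two tasks span orthogonal subspaces.
task1_feats = [[1.0, 0.5, 0.0, 0.0], [0.2, 1.0, 0.0, 0.0]]
task2_feats = [[0.0, 0.0, 0.7, 0.3], [0.0, 0.0, 0.1, 0.9]]

intra_task = empirical_ntk(task1_feats, task1_feats)  # within task 1
inter_task = empirical_ntk(task1_feats, task2_feats)  # task 1 vs. task 2

print("intra-task NTK:", intra_task)
print("inter-task NTK:", inter_task)
```

Under these assumptions the inter-task block vanishes while the intra-task block does not, which is the qualitative behavior the abstract describes. For a nonlinear model the gradients would come from automatic differentiation and the inter-task terms would generally be reduced rather than exactly zero.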

