We develop a continual learning method for pretrained models that \emph{requires no access to old-task data}, addressing a practical barrier in foundation model adaptation where pretraining distributions are often unavailable. Our key observation is that pretrained networks exhibit substantial \emph{geometric redundancy}, and that this redundancy can be exploited in two complementary ways. First, redundant neurons provide a proxy for dominant pretraining-era feature directions, enabling the construction of approximately protected update subspaces directly from pretrained weights. Second, redundancy offers a natural bias for \emph{where} to place plasticity: by restricting updates to a subset of redundant neurons and constraining the remaining degrees of freedom, we obtain update families with reduced functional drift on the old-data distribution and improved worst-case retention guarantees. These insights lead to \textsc{PLATE} (\textbf{Pla}sticity-\textbf{T}unable \textbf{E}fficient Adapters), a continual learning method requiring no past-task data that provides explicit control over the plasticity-retention trade-off. PLATE parameterizes each layer with a structured low-rank update $\Delta W = B A Q^\top$, where $B$ and $Q$ are computed once from pretrained weights and kept frozen, and only $A$ is trained on the new task. The code is available at \url{https://github.com/SalesforceAIResearch/PLATE}.
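The update structure $\Delta W = B A Q^\top$ can be sketched in a few lines. The sketch below is illustrative only: the abstract does not specify how $B$ and $Q$ are derived from the pretrained weights, so we substitute a plain truncated SVD of $W$ as a stand-in for the paper's redundancy-based construction; the function name \texttt{plate\_factors} and the rank choice are our own.

```python
import numpy as np

def plate_factors(W, r):
    """Illustrative sketch, not the paper's exact construction: derive
    frozen factors B and Q for a PLATE-style update dW = B @ A @ Q.T.
    Here the top-r singular subspaces of the pretrained weight W stand
    in for the redundancy-derived bases described in the abstract."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    B = U[:, :r]   # frozen output-side basis, shape (d_out, r)
    Q = Vt[:r].T   # frozen input-side basis,  shape (d_in, r)
    return B, Q

rng = np.random.default_rng(0)
W = rng.standard_normal((64, 32))   # pretrained layer weight (toy)
r = 4                               # adapter rank

B, Q = plate_factors(W, r)          # computed once, then frozen
A = np.zeros((r, r))                # only A is trained on the new task
dW = B @ A @ Q.T                    # structured low-rank update
assert dW.shape == W.shape
```

With $A$ initialized to zero, the adapted layer $W + \Delta W$ starts exactly at the pretrained weights; training then moves only the $r \times r$ matrix $A$, keeping updates inside the frozen subspace spanned by $B$ and $Q$.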