We develop a continual learning method for pretrained models that \emph{requires no access to old-task data}, addressing a practical barrier in foundation model adaptation where pretraining distributions are often unavailable. Our key observation is that pretrained networks exhibit substantial \emph{geometric redundancy}, and that this redundancy can be exploited in two complementary ways. First, redundant neurons provide a proxy for dominant pretraining-era feature directions, enabling the construction of approximately protected update subspaces directly from pretrained weights. Second, redundancy offers a natural bias for \emph{where} to place plasticity: by restricting updates to a subset of redundant neurons and constraining the remaining degrees of freedom, we obtain update families with reduced functional drift on the old-data distribution and improved worst-case retention guarantees. These insights lead to \textsc{PLATE} (\textbf{Pla}sticity-\textbf{T}unable \textbf{E}fficient Adapters), a continual learning method requiring no past-task data that provides explicit control over the plasticity-retention trade-off. PLATE parameterizes each layer with a structured low-rank update $\Delta W = B A Q^\top$, where $B$ and $Q$ are computed once from pretrained weights and kept frozen, and only $A$ is trained on the new task. The code is available at https://github.com/SalesforceAIResearch/PLATE.
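To make the parameterization $\Delta W = B A Q^\top$ concrete, the following NumPy sketch builds frozen factors $B$ and $Q$ from a pretrained weight matrix and trains only $A$. It is an illustration under stated assumptions, not the released implementation: the redundancy score (largest absolute cosine similarity to another neuron), the choice of $Q$ as the lowest-energy right singular directions of $W$, and the function names `build_frozen_factors`, `delta_w`, and `train_A` are all hypothetical stand-ins for the procedures described in the paper.

```python
import numpy as np

def build_frozen_factors(W, n_plastic, q_rank):
    """Hypothetical construction of B and Q from pretrained weights W (d_out x d_in)."""
    d_out, _ = W.shape
    # ASSUMPTION: a neuron (row of W) counts as redundant when some other
    # row is nearly parallel to it; the paper's criterion may differ.
    rows = W / (np.linalg.norm(W, axis=1, keepdims=True) + 1e-12)
    sim = np.abs(rows @ rows.T)
    np.fill_diagonal(sim, 0.0)
    redundancy = sim.max(axis=1)
    plastic = np.sort(np.argsort(-redundancy)[:n_plastic])
    # B selects the plastic neurons: shape (d_out, n_plastic).
    B = np.zeros((d_out, n_plastic))
    B[plastic, np.arange(n_plastic)] = 1.0
    # ASSUMPTION: Q spans the input directions carrying the least energy
    # under W, so dominant pretrained directions stay outside its span.
    _, _, Vt = np.linalg.svd(W, full_matrices=True)
    Q = Vt[-q_rank:].T  # shape (d_in, q_rank), orthonormal columns
    return B, Q

def delta_w(B, A, Q):
    """Structured update: Delta W = B A Q^T."""
    return B @ A @ Q.T

def train_A(W, B, Q, X, T, steps=50, lr=0.1):
    """Gradient descent on a squared loss; only A is updated."""
    A = np.zeros((B.shape[1], Q.shape[1]))  # zero init, so Delta W = 0 at the start
    n = X.shape[1]
    losses = []
    for _ in range(steps):
        R = (W + delta_w(B, A, Q)) @ X - T      # residual on the new task
        losses.append(0.5 * np.sum(R ** 2) / n)
        A -= lr * (B.T @ R @ X.T @ Q) / n       # dL/dA = B^T R X^T Q / n
    return A, losses

# Toy data: a 12x12 "pretrained" layer with one duplicated neuron.
rng = np.random.default_rng(0)
W = rng.standard_normal((12, 12))
W[5] = W[1]
B, Q = build_frozen_factors(W, n_plastic=2, q_rank=4)
X = rng.standard_normal((12, 64))   # new-task inputs
T = rng.standard_normal((12, 64))   # new-task targets
A, losses = train_A(W, B, Q, X, T)
dW = delta_w(B, A, Q)
```

In this sketch the rows of `dW` outside the selected neurons are exactly zero, and `dW` vanishes on any input orthogonal to the span of `Q`, which is how the frozen factors confine where and along which directions the layer can change. Increasing `n_plastic` or `q_rank` enlarges the trainable family and is one plausible way to read the plasticity-retention control mentioned above.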