Continual learning is increasingly important because it lets language models acquire and refine knowledge and skills in a scalable way. However, existing methods face strict limitations in real-world scenarios, such as reliance on experience replay, optimization constraints, and the need for a task ID at inference time. In this study, we introduce the Scalable Language Model (SLM) to overcome these limitations in a more challenging and general setting, representing a significant step toward practical continual learning. Specifically, we propose Joint Adaptive Re-Parameterization (JARe), integrated with Dynamic Task-related Knowledge Retrieval (DTKR), to adapt a language model to specific downstream tasks. This approach leverages the task distribution within the vector space, aiming for a smooth continual learning process. Our method demonstrates state-of-the-art performance on diverse backbones and benchmarks, achieving effective continual learning in both full-set and few-shot scenarios with minimal forgetting. Moreover, while prior research has focused primarily on a single task type such as classification, our study goes further and uses a large language model, LLaMA-2, to explore the effects across diverse domains and task types, so that a single language model can be scaled effectively to broader applications.
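The abstract names the two components but does not spell out their mechanics. As a rough illustration only, the sketch below assumes that retrieval is a cosine-similarity lookup over stored key vectors and that re-parameterization adds a similarity-weighted combination of stored weight increments to the base weights. The function names, the memory layout, and the weighting rule are all hypothetical and are not taken from the paper.

```python
import math

def cosine(a, b):
    """Cosine similarity between two non-zero vectors (assumed non-zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def retrieve(query, memory, top_k=2):
    """Return the top_k (similarity, delta) pairs whose keys best match the query.

    `memory` is a hypothetical list of (key_vector, weight_increment) pairs,
    one or more per previously learned task.
    """
    scored = [(cosine(query, key), delta) for key, delta in memory]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]

def reparameterize(base, retrieved):
    """Add a similarity-weighted average of the retrieved increments to `base`.

    Negative similarities are clipped to zero; this weighting rule is an
    assumption made for illustration.
    """
    total = sum(max(score, 0.0) for score, _ in retrieved)
    if total == 0.0:
        return list(base)  # nothing relevant retrieved: keep base weights
    adapted = list(base)
    for score, delta in retrieved:
        weight = max(score, 0.0) / total
        for i, d in enumerate(delta):
            adapted[i] += weight * d
    return adapted

# Toy memory with two tasks; keys and increments are made-up values.
memory = [
    ([1.0, 0.0], [0.5, 0.0, 0.0]),  # task A
    ([0.0, 1.0], [0.0, 0.5, 0.0]),  # task B
]
base = [1.0, 1.0, 1.0]
query = [1.0, 0.0]  # stands in for an embedding of the incoming input

adapted = reparameterize(base, retrieve(query, memory, top_k=1))
print(adapted)
```

Because the query is derived from the input itself, a lookup of this kind needs no task ID at inference time, which is one of the constraints the abstract sets out to remove.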