Investigating Continual Pretraining in Large Language Models: Insights and Implications

This paper studies the evolving field of Continual Learning (CL) in large language models (LLMs), with a focus on developing strategies for efficient and sustainable training. Our primary emphasis is on continual domain-adaptive pretraining. This process is designed to let LLMs integrate new information from various domains while retaining previously learned knowledge and improving cross-domain knowledge transfer, without relying on domain-specific identification. Previous studies mostly concentrate on a limited selection of tasks or domains and primarily aim to address forgetting. Our research instead evaluates how well LLMs adapt to changing data landscapes in practical scenarios. To this end, we introduce a new benchmark that measures the adaptability of LLMs to these evolving data environments and offers a comprehensive framework for evaluation. We examine the impact of model size on learning efficacy and forgetting, as well as how the order and similarity of incoming domains affect knowledge transfer within these models. Our findings uncover several key insights: (i) when the sequence of domains shows semantic similarity, continual pretraining enables LLMs to specialize better in the current domain than stand-alone fine-tuning does, (ii) training across a diverse range of domains enhances both backward and forward knowledge transfer, and (iii) smaller models are particularly sensitive to continual pretraining, exhibiting the highest rates of both forgetting and learning. We posit that our research marks a shift towards a more realistic benchmark for investigating CL in LLMs and could play a key role in guiding future research in the field.
