Large Language Models (LLMs) have recently demonstrated remarkable proficiency across a wide range of tasks. Given their powerful applications in numerous fields, LLM development has surged. A common practice in developing LLMs is to continue pre-training a model that has already been fine-tuned. However, this can lead to catastrophic forgetting. In this work, we investigate the forgetting that occurs during continual pre-training of an existing fine-tuned LLM. We evaluate the impact of continual pre-training on the fine-tuned LLM across several dimensions, including output format, knowledge, and reliability. Experimental results highlight the non-trivial challenge of mitigating catastrophic forgetting during continual pre-training, especially the repetition issue.